A sustainable funding model for core data resources in the life sciences

Science relies on data. What is less obvious is that the majority of life science databases in the world have less than one year of funding secured, and that two-thirds of the public databases that existed 15 years ago have now disappeared. If no solution is found, essential knowledge and the associated investments will be lost. A recent study conducted by the SIB Swiss Institute of Bioinformatics and supported by ELIXIR compares existing funding models and identifies a candidate scheme under which the costs of core data resources worldwide could be covered by less than 1% of the total amount already dedicated to research grants in the life sciences.

Box 1. Curated databases: adding value has a cost

Data resources can be very expensive, especially those offering a high added value.

Expert curation of knowledgebases, for example, ensures a highly accurate and reliable source of scientific information. But it also makes such infrastructures more costly than simple archives.

Box 2. UniProt as a case study

Since its creation in 1986, the UniProt knowledgebase has passed through various funding models, which makes it a particularly interesting case for the study.

It started in 1986 as a research project at the University of Geneva, funded through a Swiss National Science Foundation (SNSF) research grant.

It then went through a funding crisis, which was resolved by the creation of an institutional framework for the knowledgebase: the SIB Swiss Institute of Bioinformatics. At that time, the Swiss government funded 50% of the resource, while the rest of the budget was covered by licences sold to commercial users.

Currently, UniProt is freely accessible to all users and is supported by a consortium formed by SIB, the European Bioinformatics Institute (EMBL-EBI), and the Protein Information Resource (PIR). The UniProt consortium has three main funders: the Swiss government, the NIH and the EMBL.


With the increasing volume of data generated every year in the life sciences, data resources are becoming more and more essential for research and medicine.

Unfortunately, most databases are evaluated as research projects rather than as research infrastructure components, and rely on short-term grants that run out well before the planning horizon of the resource.

While numerous international initiatives have been launched to find solutions to this issue, it remains unclear which would be the most appropriate to ensure the long-term sustainability of data resources. Additionally, these initiatives relate mainly to archives, storage and data stewardship, while discussions on curated databases (e.g. knowledgebases; see Box 1) are still in their infancy.

Therefore, with the support of ELIXIR, the European life science infrastructure (see ELIXIR Data Platform), SIB conducted a study to address funding concerns for knowledgebases.

Identifying the pros and cons of existing funding models

“In our study, we have applied existing or possible funding models to a real case, the Universal Protein Resource (UniProt, the key resource for protein sequences and functional information knowledge, see Box 2), and we investigated the pros and cons for each model”, says Chiara Gabella, first author on the study. “Most models present inconsistencies with open access policies, for example those relying on a total or partial paywall, or present issues in terms of equity, by favouring specific user categories and penalising others. Other models are consistent with those principles, but are not sufficient to cover the total costs. They can however be used as a complementary income source”.

The Infrastructure Model: an efficient business model for the sustainability of data resources

Among the 12 models evaluated, the analysis identifies the Infrastructure Model as a sustainable funding scheme for all core data resources in the life sciences (see ELIXIR’s recent list of Core Resources). Under such a scheme, funding agencies set aside a fixed percentage of their research grants, which is subsequently redistributed to core data resources according to well-defined selection criteria.

This model, which is compatible with the principles of open science, is in line with several international initiatives such as those of the Human Frontier Science Program Organization (HFSPO) and the OECD Global Science Forum (GSF) project.

“We have estimated that less than 1% of the total amount already dedicated to research grants in the life sciences would be sufficient to cover the costs of the core data resources worldwide, including both knowledgebases and deposition databases”, concludes Gabella.

Provided that a suitable governance structure is established, the Infrastructure Model therefore represents an extremely efficient business model for the long-term sustainability of data resources, one that encourages equity, internationality and economic dependability.


