Science relies on data. What is less obvious is that the majority of life science databases in the world have less than one year of funding secured, and that two-thirds of the public databases that existed 15 years ago have now disappeared. If no solution is found, essential knowledge and the associated investments will be lost. A recent study by the SIB Swiss Institute of Bioinformatics, supported by ELIXIR, compares existing funding models and identifies a candidate scheme under which the costs of core data resources worldwide could be covered by less than 1% of the total amount already dedicated to research grants in the life sciences.
With the increasing volume of data generated every year in the life sciences, data resources are becoming more and more essential for research and medicine.
Unfortunately, most databases are evaluated as research projects rather than as components of research infrastructure, and rely on short-term grants that run out well before the planning horizon of the resource.
While numerous international initiatives have been launched to address this issue, it remains unclear which would be the most appropriate to ensure the long-term sustainability of data resources. Additionally, these initiatives relate mainly to archives, storage and data stewardship, while discussions on curated databases (e.g. knowledgebases; see Box 1) are still in their infancy.
Identifying the pros and cons of existing funding models
“In our study, we have applied existing or possible funding models to a real case, the Universal Protein Resource (UniProt, the key resource for protein sequences and functional information; see Box 2), and we investigated the pros and cons of each model”, says Chiara Gabella, first author of the study. “Most models present inconsistencies with open access policies, for example those relying on a total or partial paywall, or present issues in terms of equity, by favouring specific user categories and penalising others. Other models are consistent with those principles, but are not sufficient to cover the total costs. They can, however, be used as a complementary income source”.
The Infrastructure Model: an efficient business model for the sustainability of data resources
Among the 12 models evaluated, the analysis identifies the Infrastructure Model as a sustainable funding scheme for all core data resources in the life sciences (see ELIXIR’s recent list of Core Resources). Under such a scheme, funding agencies set aside a fixed percentage of their research grants, which is subsequently redistributed to core data resources according to well-defined selection criteria.
This model, which is compatible with the principles of open science, is in line with several international initiatives, such as those of the Human Frontier Science Program Organization (HFSPO) and the OECD Global Science Forum (GSF) project.
“We have estimated that less than 1% of the total amount already dedicated to research grants in the life sciences would be sufficient to cover the costs of the core data resources worldwide, including both knowledgebases and deposition databases”, concludes Gabella.
Provided a suitable governance structure is established, the Infrastructure Model therefore represents an extremely efficient business model for the long-term sustainability of data resources, one that encourages equity, internationality and economic dependability.
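The mechanism of the Infrastructure Model (a fixed set-aside followed by redistribution) can be illustrated with a minimal sketch. Everything in it is hypothetical: the agency names, budgets, the 1% rate and the resource scores are made-up placeholders, not figures from the study, and the proportional-to-score split is only one assumed way of applying "well-defined selection criteria".

```python
from dataclasses import dataclass


@dataclass
class Agency:
    """A funding agency (hypothetical) with its total research grant budget."""
    name: str
    grant_budget: float  # arbitrary currency units


def set_aside_pool(agencies, rate):
    """Sum the amounts reserved when every agency sets aside `rate` of its grants."""
    if not 0 <= rate <= 1:
        raise ValueError("rate must be a fraction between 0 and 1")
    return sum(a.grant_budget * rate for a in agencies)


def redistribute(pool, scores):
    """Split the pool among resources in proportion to their selection score.

    Proportional allocation is an assumption made for illustration; the study
    only states that redistribution follows well-defined selection criteria.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive number")
    return {name: pool * score / total for name, score in scores.items()}


# Hypothetical inputs: two agencies and two core data resources.
agencies = [Agency("Agency A", 1_000_000), Agency("Agency B", 3_000_000)]
pool = set_aside_pool(agencies, rate=0.01)  # a 1% set-aside, chosen for illustration
shares = redistribute(pool, {"Resource X": 3, "Resource Y": 1})

for name, amount in shares.items():
    print(f"{name}: {amount:.2f}")
```

In this sketch the rate is applied uniformly across agencies, so each contributes in proportion to its own grant budget.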
Gabella C, Durinx C and Appel R. Funding knowledgebases: Towards a sustainable funding model for the UniProt use case [version 1; referees: awaiting peer review]. F1000Research 2017, 6(ELIXIR):2051 doi: 10.12688/f1000research.12989.1