With the current exponential growth of biomedical data, there is a crucial need to organize and check them, as well as make them accessible to both humans and computers. This activity is called biocuration. Online biocurated databases play a central role in the whole process. On the occasion of the 30th anniversary of Swiss-Prot, the curated section of the UniProt Knowledgebase, Ioannis Xenarios, Head of the Swiss-Prot group at SIB, was asked about the current and future challenges for the knowledgebase and for his group.

What do you think is the main challenge of UniProtKB/Swiss-Prot today?
Nowadays, terabytes of new data are generated every week. Our main challenge is to transform these “big data” into smart and actionable data. This can only be done by striking the right balance between the expertise of our biocurators and the automation performed by computers. In other words, we need to feed the algorithms with high-quality, robustly identified and well-structured data, and then use these algorithms to scale up to the millions of entries that we have to deal with. The real challenge we have today is thus to make the best use of human expertise in the whole process of dealing with massive amounts of complex data.

What will be the challenges of UniProtKB/Swiss-Prot tomorrow?
With the development of artificial intelligence, there is a real danger of relying solely on technologies and automatic systems to solve everything, and of leaving humans out of the process; this also applies to the life sciences and biomedical research. Therefore, the challenge for Swiss-Prot in the future is actually the same as it is today: making sure we can survive in such a machine-driven environment by correctly structuring the “big data”, while also using human expertise to provide the “reference set” of information and knowledge to the database users.

Which initiatives have been taken by the SIB Swiss-Prot group to move forward?
“Big data” are useful if they are accessible to scientists, but also if the different pieces of information from various fields are connected or interlinked. Our goal is to facilitate data exchange between scientists. To this end, we have reinforced our collaborative work with scientific journals and publishers to properly cross-reference, within the text body of articles, the different pieces of evidence or knowledge that we have curated inside Swiss-Prot. For instance, the SIB Swiss-Prot group is collaborating with the European Molecular Biology Organization (EMBO) journal to develop a resource called SourceData. In this project, we are attempting to structure figure annotations, since figures contain most of the evidence held within a biology paper.
We also try to identify emerging fields and needs, and collaborate with their experts to transform the knowledge they have accumulated over the years into well-structured databases. SwissLipids, Rhea and ViralZone are all resources resulting from such collaborative work between the SIB Swiss-Prot group and experts. Here again, our aim is to help build bridges between fields. For instance, the expert-annotated database of chemical reactions, Rhea, links enzymes and chemistry, thus becoming the cornerstone resource for balanced metabolic reactions. SwissLipids, for its part, bridges lipids with biochemistry and helps fill a gap in the study and structuring of lipid knowledge.
Finally, the SIB Swiss-Prot group, together with Vital-IT, is also involved in public-private partnerships, as illustrated by the Innovative Medicines Initiative (IMI), which is funded by both the EU and pharma companies. The IMI takes advantage, either directly or indirectly, of the expertise provided by the SIB Swiss-Prot group to improve access to knowledge for scientists and clinicians.