SIB’s Swiss-Prot Group co-director Alan Bridge on the future of biocuration

As announced in May, Alan Bridge is taking up the co-director role of SIB’s Swiss-Prot Group, succeeding Lydie Bougueleret. He gives us his vision of the future for a unique profession, biocuration, and for the Swiss-Prot Group, one of its best-known exponents.

Alan Bridge, a short bio

Biologist by trade, with experience in molecular and cell biology in the field of cancer research

  • 1970: Birth in Preston, United Kingdom
  • 1998: Post-doc at ISREC, the Swiss Institute for Experimental Cancer Research. There he meets members of a small group of bioinformaticians of the nascent SIB, a group which later would become Vital-IT...
  • 2004: Joins Swiss-Prot as a biocurator in the quality assurance department
  • 2008: Becomes jointly responsible for the integration of all manually curated UniProtKB/Swiss-Prot entries
  • 2009: Becomes Head of all transversal programs of the Swiss-Prot group
  • 2017: Becomes co-director of the Swiss-Prot Group

About SIB’s Swiss-Prot Group

  • Created in 1986 by Amos Bairoch
  • Based in Geneva
  • Some 60 employees
  • Develops and maintains a number of internationally renowned resources including:
    • UniProt, the reference resource for protein sequence and functional information, produced in collaboration with PIR (USA) and EMBL-EBI (UK), and for which the Swiss-Prot group provides the majority of the expert-curated content;
    • PROSITE and HAMAP, resources for protein classification and annotation;
    • ENZYME, a resource for enzyme nomenclature;
    • Rhea, a resource of biochemical reactions;
    • SwissLipids, a resource for lipids and their biology;
    • ViralZone, a resource of viral biology.

“we might expect something of a renaissance in biocuration as biology evolves (...) towards a data-driven science of systems”

How would you describe the work of a biocurator to your non-biologist friends?

A curator (from the Latin: curare, meaning "to take care") is responsible for assembling, managing, and presenting some type of collection. While curation is often associated with cultural and scientific institutions such as galleries or museums, even early scientists – including Charles Darwin – were obliged to perform curatorial work in order to organize and maintain their own collections of physical specimens and manuscripts, drawings, and notes... collections which were subsequently bestowed on museums and incorporated into larger curated collections.
Biocuration is a modern re-interpretation of this fundamental aspect of biological research, and could be defined as the curation of biological data and knowledge in forms that can be handled by computational analysis.
Biocuration might also be considered as something of a lost art among biologists, but is an essential aspect of biological data management and will be a central pillar of emerging data life cycle management plans. So we might expect something of a renaissance in biocuration as biology evolves further from its reductionist roots towards a data-driven science of systems.

What is the big issue that biocuration is trying to solve?

Biocuration aims to facilitate data preservation and reuse, both by scientists and machines, thereby maximizing the value of biological data.
Biocuration should be an integral part of any research project and data management plan, and an activity that biologists are familiar with and even perform themselves.

What were the three major changes that occurred in the group since you joined in 2004?

During my time at Swiss-Prot the group has evolved enormously. We have seen the emergence of new resources – like Rhea, SwissLipids, and ViralZone; new partners, including consortia like IMEx (The International Molecular Exchange Consortium) and Gene Ontology, journals (such as The EMBO Journal), and a number of companies; and new ways of working, where biocuration expertise now forms a central element of data management and data exploitation plans (e.g. through computational modelling) for research projects.
We might say that the Swiss-Prot Group has evolved from a resource development group into a competency centre for biocuration and knowledge management.

“Biocuration aims to (...) maximize the value of biological data”

What will biocuration at Swiss-Prot be like in 10 years?

Biocuration by human experts will remain central to the development of high quality knowledgebases at Swiss-Prot, but we might expect that the work of curators will be more effectively supported by developments in other fields, such as improved machine learning algorithms for automated text-mining of publications, the emergence of structured publications compatible with computational analysis (like the SourceData initiative of EMBO), and possibly even curation of publications by their authors (again probably assisted by machines).
In recent years we have also witnessed an increasing demand for biocuration from researchers – with biocuration now an essential element of the data management plan for complex multidisciplinary projects such as the IMIDIA and RHAPSODY projects of the Innovative Medicines Initiative (IMI), and a crucial base for the long-term preservation of knowledge from projects like LipidX (SystemsX.ch). Biocurators at Swiss-Prot – and other SIB groups – will be increasingly called upon to help lead and shape these types of efforts.

“The Swiss-Prot group has evolved from a resource development group into a competency centre for biocuration and knowledge management”

In your new function, you will be working hand in hand with Ioannis Xenarios, also Group Leader of Vital-IT, the other largest SIB group. How important is the synergy between Vital-IT and Swiss-Prot?

Together our two groups have a very broad range of skills and expertise, not only in biocuration and knowledge management but also in software development, high performance computing, algorithm development, bioinformatics, web and interface design, data analysis, and many other areas.
This allows us to identify and respond to opportunities using the right mix of talent from a very broad and deep pool.
We have recently combined our efforts to launch new resources (SwissLipids, with SystemsX.ch and LipidX), develop improved tools for genome annotation (accelerated PROSITE), support the development of enhanced scientific publications (The SourceData initiative of The EMBO Journal), and develop robust data curation and management plans (IMIDIA, RHAPSODY).

“Biocuration by human experts will remain central to the development of high quality knowledgebases”

What are the biggest challenges facing Swiss-Prot today?

Aside from the obvious scientific challenges (ensuring that our resources continue to keep pace with emerging trends in biological research), one of the main challenges we face is one that will be familiar to many developers of knowledge resources, data repositories, and bioinformatics infrastructures: to ensure stable long term funding for our resources.
Resources such as UniProtKB/Swiss-Prot are used by thousands of researchers worldwide every day, but are supported by only a few countries – Switzerland (SERI), the US (NIH), and a number of European countries and partners (EMBL) being the main funders in this case.
This is neither equitable nor sustainable, and a better, fairer system is needed, one where an international coalition of funders supports core data resources according to objective criteria (such as their usage, research budget or Gross Domestic Product).
One of the goals of initiatives such as ELIXIR and the International Coalition to Sustain Core Data Resources – to which SIB is actively committed – is to create such a system. While this work is moving in the right direction, it will probably require years before any type of agreement is reached between the international funding agencies involved.
In the meantime, we are currently waiting on the results of our latest NIH application...

“One of the main challenges we face is (...) to ensure stable long term funding for our resources”

The SIB Swiss Institute of Bioinformatics was created in 1998 to ensure the financial sustainability of Swiss-Prot. What do you expect SIB’s role will be in the coming years in the field of bioinformatics resources?

SIB was a very early pioneer in the development of bioinformatics infrastructures and resources – like the Swiss-Prot database – and in developing mechanisms to fund them.
Initiatives like ELIXIR are attempting to reproduce this success at an international level, and we expect that SIB will continue to be an active advocate for Swiss bioinformatics inside ELIXIR, helping to foster recognition and to create a more sustainable funding landscape for the future.
On a scientific level the SIB provides a fantastic environment for collaboration between developers of world-class bioinformatics resources – such as neXtProt, STRING and STITCH, SWISS-MODEL, SwissDrugDesign, and others – and we will continue to explore synergies and common ground with these resources.
As one example, we will begin to annotate enzymatic reaction data in UniProtKB from 2018 using the Rhea resource of biochemical reactions, which provides an explicit representation of chemical structures. This will further enhance the semantic search capacities of neXtProt, protein-metabolite networks in resources like STRING/STITCH, drug metabolism data for SwissDrugDesign, enzyme annotation for glycobiology and glycomics resources, and so on.

Could you tell us about a concrete example where the biocuration expertise from Swiss-Prot was used in a project and led to major scientific advances?

We showcased one example at a recent ELIXIR meeting in Brussels, where UniProtKB/Swiss-Prot was used in the discovery of a new form of cellulase.
Cellulase is an enzyme crucial for the production of biofuels from cellulose – an abundant source of renewable biomass – and is considered an industrial blockbuster with a potential market value of billions of Swiss francs worldwide. By using UniProtKB/Swiss-Prot for only a few minutes, a consortium of European researchers were able to identify a new thermostable cellulase suitable for industrial applications by screening metagenomic data from hot springs.
This example highlights not only the value of UniProtKB/Swiss-Prot, but also some of the difficulties we can encounter when trying to quantify the impact of such resources from simple usage figures. There is clearly a pressing need to develop more accurate and complete indicators of economic and scientific benefit, and to meet this need SIB has commissioned a study on economic impact and resource indicators. We eagerly await the results in 2018.

“By using UniProtKB/Swiss-Prot for only a few minutes, (...) researchers were able to identify a new thermostable cellulase suitable for industrial applications”