With over 20,000 protein-coding genes in humans, discovering those that are linked to a given disease amounts to finding a molecular needle in a haystack.
Gene networks serve as useful shortcuts for this purpose, as they document the possible topologies of association between genes, based, for example, on their function. New genes that are functionally similar to known disease-associated ones can therefore be uncovered. A plethora of bioinformatics network tools is available to let scientists explore gene network topologies. But which are the best-performing networks?
The SIB and ELIXIR Core Resource STRING, developed and maintained by the SIB Group of Christian von Mering at the University of Zurich, outperforms most competing networks, according to a recent study. Led by researchers from the University of California San Diego and published in Cell Systems, the study provides a systematic comparison of 21 publicly available interaction networks. STRING performed best in the majority of tests, for example in average ranked performance on the recovery of gene sets from genome-wide association studies (GWAS).
This study should make it easier for scientists conducting human disease research to select the appropriate tools.
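The idea of uncovering new candidate genes from known disease-associated ones can be illustrated with a minimal "guilt by association" sketch. It is not the method used in the study: the gene names, edge weights and the `rank_candidates` function are all hypothetical, and the scoring (summing the weights of edges to known disease genes) is only one simple way to rank neighbours in a network.

```python
from collections import defaultdict

# Hypothetical toy network: (gene_a, gene_b, edge_weight).
# Gene names and weights are invented for illustration only.
EDGES = [
    ("GENE_A", "GENE_B", 0.9),
    ("GENE_A", "GENE_C", 0.4),
    ("GENE_B", "GENE_D", 0.8),
    ("GENE_C", "GENE_D", 0.7),
    ("GENE_D", "GENE_E", 0.3),
]


def rank_candidates(edges, seeds):
    """Rank non-seed genes by the summed weight of their edges to seed genes."""
    seeds = set(seeds)
    scores = defaultdict(float)
    for a, b, weight in edges:
        # Only edges linking exactly one seed gene to a non-seed gene count.
        if a in seeds and b not in seeds:
            scores[b] += weight
        elif b in seeds and a not in seeds:
            scores[a] += weight
    # Highest score first; ties are broken alphabetically by gene name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Assume GENE_A and GENE_B are already known to be disease-associated.
ranking = rank_candidates(EDGES, seeds={"GENE_A", "GENE_B"})
for gene, score in ranking:
    print(gene, round(score, 2))
```

In this toy example, genes with no direct link to a known disease gene are not ranked at all, which is one reason the quality and coverage of the underlying network matter.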
What is STRING?
STRING is a database and online resource providing users with all known and predicted functional interactions between proteins. Such interactions include, for example, physical binding and regulatory interactions. It currently includes information on 9,643,763 proteins from 2,031 organisms. The crucial feature of STRING, in addition to its comprehensive data integration (from experimental and curated data, as well as text-mining predictions), is that it provides a confidence score for each interaction. STRING is jointly developed by the SIB, EMBL Heidelberg and the University of Copenhagen.
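A per-interaction confidence score lets users trade coverage for reliability by keeping only interactions above a chosen cutoff. The following is a minimal sketch of that idea, not STRING's own interface: the `Interaction` record, the protein names, the scores, the assumed 0 to 1 scale and the 0.7 cutoff are all illustrative assumptions.

```python
from typing import List, NamedTuple


class Interaction(NamedTuple):
    protein_a: str
    protein_b: str
    confidence: float  # assumed here to lie between 0 and 1


# Hypothetical interaction records; names and scores are invented.
INTERACTIONS = [
    Interaction("PROT_1", "PROT_2", 0.95),
    Interaction("PROT_1", "PROT_3", 0.42),
    Interaction("PROT_2", "PROT_4", 0.71),
    Interaction("PROT_3", "PROT_4", 0.15),
]


def filter_by_confidence(
    interactions: List[Interaction], min_confidence: float
) -> List[Interaction]:
    """Keep only interactions whose confidence is at least min_confidence."""
    return [i for i in interactions if i.confidence >= min_confidence]


# The cutoff is an arbitrary choice for this example.
high_confidence = filter_by_confidence(INTERACTIONS, min_confidence=0.7)
for interaction in high_confidence:
    print(interaction.protein_a, interaction.protein_b, interaction.confidence)
```

Raising the cutoff yields a smaller, more reliable network; lowering it keeps more interactions at the cost of including weaker evidence.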
Huang J K et al. Systematic evaluation of molecular networks for discovery of disease genes. Cell Systems, 2018. https://doi.org/10.1016/j.cels.2018.03.001