Abstract
Although genome-scale technologies have benefited from statistical measures of data quality, extracting biologically relevant pathways from high-throughput proteomics data remains a challenge. Here we develop a quantitative method for evaluating proteomics data. We present a logistic regression approach that uses statistical and topological descriptors to predict the biological relevance of protein-protein interactions obtained from high-throughput screens for yeast. Other sources of information, including mRNA expression, genetic interactions and database annotations, are subsequently used to validate the model predictions without bias or cross-pollution. Novel topological statistics show hierarchical organization of the network of high-confidence interactions: protein complex interactions extend one to two links, and genetic interactions represent an even finer scale of organization. Knowledge of the maximum number of links that indicates a significant correlation between protein pairs (correlation distance) enables the integrated analysis of proteomics data with data from genetics and gene expression. The type of analysis presented will be essential for analyzing the growing amount of genomic and proteomics data in model organisms and humans.
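As a rough illustration of the logistic regression idea described above, the following minimal sketch fits a confidence model to a toy set of interactions. The two descriptors (number of screens reporting the pair, number of shared interaction partners), the training values, and the function names `fit_logistic` and `confidence` are all hypothetical; they are not taken from the article, whose actual descriptors and fitting procedure are not given in this abstract.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights by plain batch gradient descent."""
    n = len(features)
    n_feat = len(features[0])
    weights = [0.0] * n_feat
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Prediction error for this interaction
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def confidence(x, weights, bias):
    """Predicted probability that an interaction is biologically relevant."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical descriptors per interaction (assumed for illustration only):
# [number of screens reporting the pair, number of shared interaction partners]
X = [[1, 0], [1, 1], [2, 0], [1, 0], [2, 3], [3, 2], [1, 4], [3, 5]]
# Hypothetical labels: 1 = treated as a true interaction, 0 = treated as spurious
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic(X, y)
high = confidence([3, 4], weights, bias)  # well-supported pair
low = confidence([1, 0], weights, bias)   # weakly supported pair
print(f"well-supported pair: {high:.2f}, weakly supported pair: {low:.2f}")
```

In this sketch the fitted probability plays the role of a confidence score: pairs scoring above a chosen threshold would be kept as high-confidence interactions. In practice a statistics library would typically replace the hand-written gradient descent.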
Acknowledgements
J.S.B. acknowledges his colleagues at CuraGen who generated much of the data analyzed here and whose discussions have been enjoyable and productive.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
About this article
Cite this article
Bader, J., Chaudhuri, A., Rothberg, J. et al. Gaining confidence in high-throughput protein interaction networks. Nat Biotechnol 22, 78–85 (2004). https://doi.org/10.1038/nbt924
DOI: https://doi.org/10.1038/nbt924