  • Article

Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages

Abstract

We introduce a general computational method, applicable on a genome-wide scale, for the systematic discovery of uncharacterized cellular systems. Quantitative analysis of the coinheritance of pairs of genes among different organisms, calculated using phylogenetic profiles, allows the prediction of thousands of functional linkages between the corresponding proteins. A comparison of these functional linkages to known pathways reveals that calculated linkages are comparable in accuracy to genome-wide yeast two-hybrid screens or mass spectrometry interaction assays. In aggregate, these linkages describe the structure of large-scale networks, with the resulting yeast network composed of 3,875 linkages among 804 proteins, and the resulting pathogenic Escherichia coli network composed of 2,043 linkages among 828 proteins. The search of such networks for groups of uncharacterized, linked proteins led to the identification of 27 novel cellular systems from one nonpathogenic and three pathogenic bacterial genomes.
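
The coinheritance analysis described above can be illustrated with a small sketch. The abstract does not say which statistic is used to score pairs of phylogenetic profiles, so this example assumes mutual information between binary presence/absence vectors. The function names, the toy profiles, and the threshold of 0.5 bits are all hypothetical and are not taken from the article.

```python
from itertools import combinations
from math import log2


def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length binary profiles."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(a)
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            # Joint and marginal frequencies estimated from the profiles.
            p_xy = sum(1 for i, j in zip(a, b) if i == x and j == y) / n
            p_x = sum(1 for i in a if i == x) / n
            p_y = sum(1 for j in b if j == y) / n
            if p_xy > 0:
                mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


def predict_linkages(profiles, threshold):
    """Return (gene1, gene2, score) for every pair scoring at or above threshold."""
    links = []
    for g1, g2 in combinations(sorted(profiles), 2):
        score = mutual_information(profiles[g1], profiles[g2])
        if score >= threshold:
            links.append((g1, g2, score))
    return links


# Toy data (assumption): 1 = homolog present in a genome, 0 = absent.
profiles = {
    "geneA": [1, 1, 0, 0, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 0, 1, 0, 1, 0],  # same inheritance pattern as geneA
    "geneC": [1, 0, 1, 0, 1, 0, 1, 0],  # weakly related pattern
}
links = predict_linkages(profiles, threshold=0.5)
print(links)
```

In this toy case only the geneA/geneB pair passes the threshold, because their profiles are identical. A real analysis would need a threshold calibrated against known pathways, as the abstract's comparison to experimental assays suggests.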

Figure 1: A schematic representation of the systematic method for identification of novel cellular systems.
Figure 2: Two measures of the quality of functional linkages are presented.
Figure 3: Predicted genome-wide protein networks for yeast (ref. 23).
Figure 4: Predicted genome-wide protein networks for pathogenic E. coli O157:H7 (ref. 46).
Figure 5: Clusters representing potentially new pathways selected from reconstructions of genome-wide interaction networks of four different organisms.
Figure 6: The phylogenetic profiles drawn for the core components of the gene clusters in Figure 5.
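
The search for groups of uncharacterized, linked proteins mentioned in the abstract and in the caption of Figure 5 can be sketched as a graph traversal. The preview does not describe how the clusters were actually selected, so this example assumes a simple rule: keep only linkages between uncharacterized proteins and report connected components above a minimum size. The function name, the `min_size` parameter, and the toy protein identifiers are hypothetical.

```python
from collections import defaultdict


def uncharacterized_clusters(links, characterized, min_size=3):
    """Connected components of the linkage graph restricted to uncharacterized proteins."""
    graph = defaultdict(set)
    for a, b in links:
        # Keep an edge only if neither endpoint has a known function.
        if a not in characterized and b not in characterized:
            graph[a].add(b)
            graph[b].add(a)

    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first traversal collecting one component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters


# Toy network (assumption): "u*" are uncharacterized, "k*" are characterized.
links = [("u1", "u2"), ("u2", "u3"), ("u3", "k1"), ("u4", "u5"), ("k1", "k2")]
characterized = {"k1", "k2"}
clusters = uncharacterized_clusters(links, characterized, min_size=3)
print(clusters)
```

Here the only reported cluster is u1, u2, u3; the u4/u5 pair is dropped for being smaller than `min_size`. Other selection rules, such as requiring dense interconnection, would be equally plausible readings of the abstract.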

References

  1. Marcotte, E.M. Computational genetics: finding function by non-homology methods. Curr. Opin. Struct. Biol. 10, 359–365 (2000).

  2. Huynen, M., Snel, B., Lathe, W. & Bork, P. Exploitation of gene context. Curr. Opin. Struct. Biol. 10, 366–370 (2000).

  3. Pellegrini, M., Marcotte, E.M., Thompson, M.J., Eisenberg, D. & Yeates, T.O. Assigning protein functions by comparative analysis: protein phylogenetic profiles. Proc. Natl. Acad. Sci. USA 96, 4285–4288 (1999).

  4. Overbeek, R., Fonstein, M., D'Souza, M., Pusch, G. & Maltsev, N. The use of gene clusters to infer functional coupling. Proc. Natl. Acad. Sci. USA 96, 2896–2901 (1999).

  5. Tavazoie, S., Hughes, J.D., Campbell, M.J., Cho, R.J. & Church, G.M. Systematic determination of genetic network architecture. Nat. Genet. 22, 281–285 (1999).

  6. Marcotte, E.M., Pellegrini, M., Ng, H.-L., Rice, D.W., Yeates, T.O. & Eisenberg, D. Detecting protein function and protein-protein interactions from genome sequences. Science 285, 751–753 (1999).

  7. Enright, A.J., Iliopoulos, I., Kyrpides, N.C. & Ouzounis, C.A. Protein interaction maps for complete genomes based on gene fusion events. Nature 402, 86–90 (1999).

  8. Marcotte, E.M., Pellegrini, M., Thompson, M.J., Yeates, T. & Eisenberg, D. A combined algorithm for genome-wide prediction of protein function. Nature 402, 83–86 (1999).

  9. Dandekar, T., Snel, B., Huynen, M. & Bork, P. Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem. Sci. 23, 324–328 (1998).

  10. Eisenberg, D., Marcotte, E.M., Xenarios, I. & Yeates, T.O. Protein function in the post-genomic era. Nature 405, 823–826 (2000).

  11. Salgado, H., Moreno-Hagelsieb, G., Smith, T.F. & Collado-Vides, J. Operons in Escherichia coli: genomic analyses and predictions. Proc. Natl. Acad. Sci. USA 97, 6652–6657 (2000).

  12. Thompson, H.G.R., Harris, J.W., Wold, B.J., Quake, S.R. & Brody, J.P. Identification and confirmation of a module of coexpressed genes. Genome Res. 12, 1517–1522 (2002).

  13. Tong, A.H. et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368 (2001).

  14. Uetz, P. et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403, 623–627 (2000).

  15. Ito, T. et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. USA 98, 4569–4574 (2001).

  16. Ho, Y. et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180–183 (2002).

  17. Gavin, A. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147 (2002).

  18. Pavlidis, P., Weston, J., Cai, J. & Grundy, W.N. Learning gene functional classifications from multiple data types. J. Comput. Biol. 9, 401–411 (2002).

  19. Shannon, C.E. A mathematical theory of communication. Bell System Technical Journal 27, 379–423, 623–656 (1948).

  20. Korber, B.T.M., Farber, R.M., Wolpert, D.H. & Lapedes, A.S. Covariation of mutations in the V3 loop of human immunodeficiency virus type 1 envelope protein: an information theoretic analysis. Proc. Natl. Acad. Sci. USA 90, 7176–7180 (1993).

  21. Huynen, M., Snel, B., Lathe, W. & Bork, P. Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res. 10, 1204–1210 (2000).

  22. Blattner, F.R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1474 (1997).

  23. Goffeau, A. et al. The yeast genome directory. Nature 387, Supplement (1997).

  24. Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).

  25. Xenarios, I. et al. DIP: the database of interacting proteins: 2001 update. Nucleic Acids Res. 29, 239–241 (2001).

  26. McAteer, S., Coulson, A., McLennan, N. & Masters, M. The lytB gene of Escherichia coli is essential and specifies a product needed for isoprenoid biosynthesis. J. Bacteriol. 183, 7403–7407 (2001).

  27. Cunningham, F.X. Jr., Lafond, T.P. & Gantt, E. Evidence of a role for LytB in the nonmevalonate pathway of isoprenoid biosynthesis. J. Bacteriol. 182, 5841–5848 (2000).

  28. Takahashi, S., Kuzuyama, T., Watanabe, H. & Seto, H. A 1-deoxy-D-xylulose 5-phosphate reductoisomerase catalyzing the formation of 2-C-methyl-D-erythritol 4-phosphate in an alternative nonmevalonate pathway for terpenoid biosynthesis. Proc. Natl. Acad. Sci. USA 95, 9879–9884 (1998).

  29. Herz, S. et al. Biosynthesis of terpenoids: YgbB protein converts 4-diphosphocytidyl-2C-methyl-D-erythritol 2-phosphate to 2-C-methyl-D-erythritol 2,4-cyclodiphosphate. Proc. Natl. Acad. Sci. USA 97, 2486–2490 (2000).

  30. Delneri, D., Gardner, D.C., Bruschi, C.V. & Oliver, S.G. Disruption of seven hypothetical aryl alcohol dehydrogenase genes from Saccharomyces cerevisiae and construction of a multiple knock-out strain. Yeast 15, 1681–1689 (1999).

  31. Traff, K.L., Jonsson, L.J. & Hahn-Hagerdal, B. Putative xylose and arabinose reductases in Saccharomyces cerevisiae. Yeast 19, 1233–1241 (2002).

  32. Galperin, M.Y., Nikolskaya, A.N. & Koonin, E.V. Novel domains of the prokaryotic two-component signal transduction systems. FEMS Microbiol. Lett. 203, 11–21 (2001).

  33. Amabile-Cuevas, C.F. & Demple, B. Molecular characterization of the soxRS genes of Escherichia coli: two genes control a superoxide stress regulon. Nucleic Acids Res. 19, 4479–4484 (1991).

  34. Gentschev, I., Dietrich, G. & Goebel, W. The E. coli α-hemolysin secretion system and its use in vaccine development. Trends Microbiol. 1, 39–45 (2002).

  35. Braun, V. & Braun, M. Active transport of iron and siderophore antibiotics. Curr. Opin. Microbiol. 2, 194–201 (2002).

  36. Bouveret, E. et al. Analysis of the Escherichia coli Tol–Pal and TonB systems by periplasmic production of Tol, TonB, colicin, or phage capsid soluble domains. Biochimie 84, 413–421 (2002).

  37. Garrett, T.A., Que, N.L. & Raetz, C.R. Accumulation of a lipid A precursor lacking the 4'-phosphate following inactivation of the Escherichia coli lpxK gene. J. Biol. Chem. 273, 12457–12465 (1998).

  38. Tzeng, Y.L., Datta, A., Kolli, V.K., Carlson, R.W. & Stephens, D.S. Endotoxin of Neisseria meningitidis composed only of intact lipid A: inactivation of the meningococcal 3-deoxy-D-manno-octulosonic acid transferase. J. Bacteriol. 184, 2379–2388 (2002).

  39. Rodriguez, E., Banchio, C., Diacovich, L., Bibb, M. & Gramajo, H. Role of an essential acyl coenzyme A carboxylase in the primary and secondary metabolism of Streptomyces coelicolor A3(2). Appl. Environ. Microbiol. 9, 4166–4176 (2001).

  40. Grundling, A., Manson, M. & Young, R. Holins kill without warning. Proc. Natl. Acad. Sci. USA 98, 9348–9352 (2001).

  41. Mengin-Lecreulx, D., van Heijenoort, J. & Park, J.T. Identification of the mpl gene encoding UDP-N-acetylmuramate: L-alanyl-gamma-D-glutamyl-meso-diaminopimelate ligase in Escherichia coli and its role in recycling of cell wall peptidoglycan. J. Bacteriol. 178, 5347–5352 (1996).

  42. Eisen, J.A. & Wu, M. Phylogenetic analysis and gene functional predictions: phylogenomics in action. Theor. Popul. Biol. 61, 481–487 (2002).

  43. Vert, J.P. A tree kernel to analyse phylogenetic profiles. Bioinformatics 1, 276–284 (2002).

  44. Verjovsky Marcotte, C.J. & Marcotte, E.M. Predicting functional linkages from gene fusions with confidence. Appl. Bioinformatics 1, 37–44 (2002).

  45. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

  46. Perna, N.T. et al. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409, 529–533 (2001).

Acknowledgements

This work was supported by grants from the Welch Foundation (F-1515), the Texas Advanced Research Program, a Camille and Henry Dreyfus New Faculty Award, the National Science Foundation (EIA-0219061) and a Packard Fellowship.

Author information

Corresponding author

Correspondence to Edward M Marcotte.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Date, S., Marcotte, E. Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages. Nat Biotechnol 21, 1055–1062 (2003). https://doi.org/10.1038/nbt861

  • DOI: https://doi.org/10.1038/nbt861
