Abstract
Species phylogenies derived from comparisons of single genes are rarely consistent with each other, due to horizontal gene transfer (ref. 1), unrecognized paralogy and highly variable rates of evolution (ref. 2). The advent of completely sequenced genomes allows the construction of a phylogeny that is less sensitive to such inconsistencies and more representative of whole genomes than are single-gene trees. Here, we present a distance-based phylogeny (ref. 3) constructed on the basis of gene content, rather than on sequence identity, of 13 completely sequenced genomes of unicellular species. The similarity between two species is defined as the number of genes that they have in common divided by their total number of genes. In this type of phylogenetic analysis, evolutionary distance can be interpreted in terms of evolutionary events such as the acquisition and loss of genes, whereas the underlying properties (the gene content) can be interpreted in terms of function. As such, it takes a position intermediate to phylogenies based on single genes and phylogenies based on phenotypic characteristics. Although our comprehensive genome phylogeny is independent of phylogenies based on the level of sequence identity of individual genes, it correlates with the standard reference of prokaryotic phylogeny based on sequence similarity of 16S rRNA (ref. 4). Thus, shared gene content between genomes is quantitatively determined by phylogeny, rather than by phenotype, and horizontal gene transfer has only a limited role in determining the gene content of genomes.
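The similarity measure described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not settle: each genome is represented as a set of gene-family identifiers, "total number of genes" is read as the gene count of the smaller of the two genomes, and distance is taken as one minus similarity. The function names and the toy genomes are hypothetical and contain no real data.

```python
from itertools import combinations


def gene_content_similarity(genes_a, genes_b):
    """Fraction of shared genes between two genomes.

    Assumption: the denominator is the gene count of the smaller genome.
    The abstract only says "their total number of genes".
    """
    smaller = min(len(genes_a), len(genes_b))
    if smaller == 0:
        raise ValueError("each genome needs at least one gene")
    return len(genes_a & genes_b) / smaller


def distance_matrix(genomes):
    """Pairwise distances, assuming distance = 1 - similarity."""
    names = sorted(genomes)
    dist = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        d = 1.0 - gene_content_similarity(genomes[a], genomes[b])
        dist[(a, b)] = dist[(b, a)] = d  # the measure is symmetric
    return names, dist


# Toy, made-up gene-family sets (hypothetical, for illustration only).
toy_genomes = {
    "species_A": {"g1", "g2", "g3", "g4"},
    "species_B": {"g1", "g2", "g3", "g5", "g6"},
    "species_C": {"g1", "g7"},
}

names, dist = distance_matrix(toy_genomes)
for a, b in combinations(names, 2):
    print(f"{a} vs {b}: distance {dist[(a, b)]:.2f}")
```

A matrix of this kind could then be passed to a distance-based tree-building method such as neighbor joining (ref. 3). Normalizing by the smaller genome is one plausible choice, because it keeps a small genome from appearing distant from every other genome merely because it has few genes.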
References
Doolittle, W.F. & Logsdon, J.M. Archaeal genomics: do Archaea have a mixed heritage? Curr. Biol. 8, R209–R211 (1998).
Huynen, M.A. & Bork, P. Measuring genome evolution. Proc. Natl Acad. Sci. USA 95, 5849–5856 (1998).
Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987).
Olsen, G.J., Woese, C.R. & Overbeek, R. The winds of (evolutionary) change: breathing new life into microbiology. J. Bacteriol. 176, 1–6 (1994).
Fitch, W.M. Distinguishing homologous from analogous proteins. Syst. Zool. 19, 99–110 (1970).
Maidak, B.L. et al. The RDP (Ribosomal Database Project). Nucleic Acids Res. 25, 109–111 (1997).
Klenk, H. & Zillig, W. DNA-dependent RNA polymerase subunit B as a tool for phylogenetic reconstructions: branching topology of the archaeal domain. J. Mol. Evol. 38, 420–432 (1994).
Baldauf, S., Palmer, J.D. & Doolittle, W.F. The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny. Proc. Natl Acad. Sci. USA 93, 7749–7754 (1996).
Gruber, T.M. & Bryant, D.A. Molecular systematic studies of eubacteria, using σ70-type σ factors of group 1 and group 2. J. Bacteriol. 179, 1734–1747 (1997).
Huynen, M.A., Dandekar, T. & Bork, P. Differential genome analysis applied to the species-specific features of Helicobacter pylori. FEBS Lett. 426, 1–5 (1998).
Lawrence, J.G. & Ochman, H. Molecular archaeology of the Escherichia coli genome. Proc. Natl Acad. Sci. USA 95, 9413–9417 (1998).
Smith, T. & Waterman, M.S. Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981).
Pearson, W. Empirical statistical estimates for sequence similarity searches. J. Mol. Biol. 276, 71–84 (1998).
Brenner, S., Chothia, C. & Hubbard, T.J. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc. Natl Acad. Sci. USA 95, 6073–6078 (1998).
Tatusov, R.L., Koonin, E.V. & Lipman, D.J. A genomic perspective on protein families. Science 278, 631–637 (1997).
Fleischmann, R. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae. Science 269, 496–512 (1995).
Fraser, C.M. et al. The minimal gene complement of Mycoplasma genitalium. Science 270, 397–403 (1995).
Kaneko, T. et al. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res. 3, 109–136 (1996).
Bult, C.J. et al. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273, 1058–1072 (1996).
Blattner, F.R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462 (1997).
Smith, D.R. et al. Complete genome sequence of Methanobacterium thermoautotrophicum ΔH: functional analysis and comparative genomics. J. Bacteriol. 179, 7135–7155 (1997).
Tomb, J.-F. et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388, 539–547 (1997).
Klenk, H.P. et al. The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 390, 364–370 (1997).
Kunst, F. et al. The complete genome sequence of the Gram-positive bacterium Bacillus subtilis. Nature 390, 249–256 (1997).
Fraser, C.M. et al. Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390, 580–586 (1997).
Mewes, H.W. et al. Overview of the yeast genome. Nature 387, 7–65 (1997).
Deckert, G. et al. The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature 392, 353–358 (1998).
Kawarabayasi, Y. et al. Complete sequence and gene organization of the genome of a hyper-thermophilic archaebacterium Pyrococcus horikoshii OT3. DNA Res. 5, 55–76 (1998).
Wu, C.F.J. Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Stat. 14, 1261–1295 (1986).
Himmelreich, R., Plagens, H., Hilbert, H., Reiner, B. & Herrmann, R. Comparative analysis of the genomes of the bacteria Mycoplasma pneumoniae and Mycoplasma genitalium. Nucleic Acids Res. 24, 4420–4449 (1996).
Acknowledgements
This work was supported by BMBF.
Snel, B., Bork, P. & Huynen, M. Genome phylogeny based on gene content. Nat Genet 21, 108–110 (1999). https://doi.org/10.1038/5052
DOI: https://doi.org/10.1038/5052