Trends in Genetics
Volume 17, Issue 3, 1 March 2001, Pages 113-120

Research update
How many genes in Arabidopsis come from cyanobacteria? An estimate from 386 protein phylogenies

https://doi.org/10.1016/S0168-9525(00)02209-5

Abstract

It is well known that chloroplasts and mitochondria donated many genes to nuclear chromosomes during evolution – but how many is ‘many’? A sample of 3961 Arabidopsis nuclear protein-coding genes was compared with the complete set of proteins from yeast and 17 reference prokaryotic genomes, including one cyanobacterium (the lineage from which plastids arose). The analysis of 386 phylogenetic trees distilled from these data suggests that between ∼400 (1.6%) and ∼2200 (9.2%) of Arabidopsis nuclear genes stem from cyanobacteria. Both the degree of conservation preserved in protein sequences and lateral gene transfer between free-living prokaryotes pose substantial challenges to genome phylogenetics.

Section snippets

Automated sequence filtering for automated phylogenetic analysis

How does one obtain and evaluate a large number of protein phylogenies starting from 3961 proteins and 18 reference genomes using automated procedures? Our strategy is outlined in Fig. 1. (By the time this article appears, the complete Arabidopsis genome will have been published, but those data were not available when we embarked upon this work.) We obtained 3961 annotated Arabidopsis nuclear-encoded nonredundant proteins (kindly provided by H-W. Mewes, MIPS, Munich). All of the proteins from

What assumptions are involved here?

The simplest criterion for inferring a cyanobacterial origin of a nuclear-encoded Arabidopsis protein would be a common branch for the Synechocystis and Arabidopsis proteins in a phylogenetic tree, regardless of how the rest of the tree hatches out, as outlined above and in Fig. 2a. But expecting to find this branch for a genuinely cyanobacterial gene in the Arabidopsis genome entails quite a few assumptions that are usually made implicitly. It is worthwhile spelling them out.
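The criterion described above can be illustrated with a short Python sketch. It is not the authors' pipeline: it assumes each tree is available as a Newick string, and the leaf labels (`Ath`, `Syn`, `Eco`, `Bsu`, `Mtu`) are hypothetical placeholders. Because the trees are treated as unrooted, the check asks whether some branch separates the Arabidopsis and Synechocystis leaves from all other leaves, whichever way the Newick string happens to be written.

```python
def parse_newick(text):
    """Parse a Newick string into nested lists of leaf names.

    Branch lengths and internal node labels are ignored.
    """
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos].split(":")[0]

    def read_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [read_node()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(read_node())
            pos += 1  # consume ")"
            read_label()  # skip any internal label or branch length
            return children
        return read_label()

    return read_node()


def collect_clades(node, found):
    """Return the leaf set below `node`, adding every clade to `found`."""
    if isinstance(node, str):
        leaves = frozenset([node])
    else:
        leaves = frozenset().union(*(collect_clades(c, found) for c in node))
    found.add(leaves)
    return leaves


def has_common_branch(newick, group):
    """True if one branch separates `group` from all remaining leaves."""
    found = set()
    all_leaves = collect_clades(parse_newick(newick), found)
    group = frozenset(group)
    if not group <= all_leaves:
        return False
    # In an unrooted tree the split may be written as the group itself
    # or as its complement, depending on where the string is "rooted".
    return group in found or (all_leaves - group) in found


if __name__ == "__main__":
    tree = "((Ath:0.1,Syn:0.2):0.05,Eco:0.3,(Bsu:0.2,Mtu:0.25):0.1);"
    print(has_common_branch(tree, {"Ath", "Syn"}))
```

A test of this kind says nothing about branch support or about how the rest of the tree resolves, which is exactly why the assumptions behind it deserve to be spelled out.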

The first of these

A brief summary of 386 protein phylogenies

Given these considerations, what did we find? Sixty-three of 386 alignments investigated yielded a topology in which the Synechocystis and Arabidopsis homologues shared a unique common branch [designated here as (Ath,Syn), using standard phylogenetic shorthand]. Two such examples are shown in Fig. 2b, CrtE and NifU, along with one example, RfbD, that did not contain the (Ath,Syn) branch. The remaining 323 trees (386 minus 63) did not contain the (Ath,Syn) branch.
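The counts quoted here and in the abstract can be cross-checked with a few lines of Python. This is only an arithmetic sketch: the function names are illustrative, and the observation that 63 of 3961 sampled proteins comes to roughly 1.6% is a consistency check against the abstract's lower bound, not a statement of how the authors derived it. How the upper bound was obtained is not stated in this excerpt.

```python
SAMPLED_PROTEINS = 3961   # Arabidopsis proteins in the sample (from the text)
TREES = 386               # phylogenies analysed (from the text)
ATH_SYN_TREES = 63        # trees containing the (Ath,Syn) branch (from the text)


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def implied_total(count, pct):
    """Total gene number implied by a count and the percentage it represents."""
    return round(count / (pct / 100.0))


if __name__ == "__main__":
    print("trees without (Ath,Syn):", TREES - ATH_SYN_TREES)
    print("share of trees with (Ath,Syn), %:", percent(ATH_SYN_TREES, TREES))
    print("share of sampled proteins, %:", percent(ATH_SYN_TREES, SAMPLED_PROTEINS))
    # Genome sizes implied by the abstract's rounded bounds
    print("implied gene total (lower bound):", implied_total(400, 1.6))
    print("implied gene total (upper bound):", implied_total(2200, 9.2))
```

Both bounds in the abstract imply a nuclear gene total in the region of 24,000 to 25,000, so the two percentages refer to the same assumed genome size within rounding.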

Are the 323 proteins whose trees

Branches, trees and horizontal gene transfer

A cyanobacterial branching for some plant nuclear genes makes good biological sense because it is well known that plastids descend from cyanobacteria that took up permanent residence within their host and transferred many genes to the nucleus [1, 3]. So when we see an Arabidopsis protein branching with a cyanobacterial homologue in a tree, we infer a cyanobacterial origin of the plant nuclear gene – no problem. But, applying exactly the same logic, for example, to the Bacillus and Mycobacterium

Acknowledgements

We thank H-W. Mewes for making data available, and K. Henze, M. Hoffmeister and several students for critical comments on the paper. T.R. is currently at Epigenomics AG, Kastanienallee 24, 10435 Berlin, Germany.

References (28)

  • M. Nei. Phylogenetic analysis in molecular evolutionary genetics. Annu. Rev. Genet. (1996)
  • P.J. Lockhart. Spectral analysis, systematic bias, and the evolution of chloroplasts. Mol. Biol. Evol. (1999)
  • B. Bölter. Origin of a chloroplast protein importer. Proc. Natl. Acad. Sci. USA (1998)
  • W.F. Doolittle. Phylogenetic classification and the universal tree. Science (1999)