Trends in Genetics
Research update
How many genes in Arabidopsis come from cyanobacteria? An estimate from 386 protein phylogenies
Automated sequence filtering for automated phylogenetic analysis
How does one obtain and evaluate a large number of protein phylogenies starting from 3961 proteins and 18 reference genomes using automated procedures? Our strategy is outlined in Fig. 1. (By the time this article appears, the complete Arabidopsis genome will have been published, but those data were not available when we embarked upon this work.) We obtained 3961 annotated Arabidopsis nuclear-encoded nonredundant proteins (kindly provided by H-W. Mewes, MIPS, Munich). All of the proteins from
What assumptions are involved here?
The simplest criterion for inferring a cyanobacterial origin of a nuclear-encoded Arabidopsis protein would be a common branch for the Synechocystis and Arabidopsis proteins in a phylogenetic tree, regardless of how the rest of the tree branches, as outlined above and in Fig. 2a. But expecting to find this branch for a genuinely cyanobacterial gene in the Arabidopsis genome entails quite a few assumptions that are usually made implicitly. It is worthwhile spelling them out.
The first of these
A brief summary of 386 protein phylogenies
Given these considerations, what did we find? Sixty-three of 386 alignments investigated yielded a topology in which the Synechocystis and Arabidopsis homologues shared a unique common branch [designated here as (Ath,Syn), using standard phylogenetic shorthand]. Two such examples are shown in Fig. 2b, CrtE and NifU, along with one example, RfbD, that did not contain the (Ath,Syn) branch. The remaining 323 trees (386 minus 63) did not contain the (Ath,Syn) branch.
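As an illustration of the (Ath,Syn) criterion, the following is a minimal sketch, not the procedure used in this study. It assumes each tree is available as a Newick string with one sequence per taxon, and that the leaves carry hypothetical labels such as `Ath` and `Syn`. Because the trees are treated as unrooted, the branch is counted as present when either {Ath, Syn} or its complement forms a clade in the written representation. The function names `parse_newick`, `leaf_sets` and `has_branch` are introduced here for illustration only.

```python
def parse_newick(text):
    """Parse a Newick string into nested tuples of leaf names (topology only)."""
    text = "".join(text.split()).rstrip(";")  # drop whitespace and the final ";"
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        # Keep the name, discard any ":branch_length" suffix.
        return text[start:pos].split(":")[0]

    def read_clade():
        nonlocal pos
        if text[pos] != "(":
            return read_label()       # a leaf
        pos += 1                      # consume "("
        children = [read_clade()]
        while text[pos] == ",":
            pos += 1
            children.append(read_clade())
        pos += 1                      # consume ")"
        read_label()                  # discard support value / branch length
        return tuple(children)

    return read_clade()


def leaf_sets(tree):
    """Return (leaves under this node, leaf sets of every clade under it)."""
    if isinstance(tree, str):
        single = frozenset([tree])
        return single, [single]
    leaves, clades = frozenset(), []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def has_branch(newick, pair=("Ath", "Syn")):
    """True if the unrooted tree has a branch separating `pair` from all other leaves."""
    all_leaves, clades = leaf_sets(parse_newick(newick))
    target = frozenset(pair)
    if not target <= all_leaves:
        return False
    complement = all_leaves - target
    # The split {pair} | {rest} appears as one side or the other,
    # depending on where the Newick string happens to be rooted.
    return any(c == target or c == complement for c in clades)


# Tally reported in the text: 63 of 386 trees contain the (Ath,Syn) branch.
with_branch, total = 63, 386
without_branch = total - with_branch
fraction = with_branch / total
```

For example, `has_branch("((Ath,Syn),(Bsu,Mtu),Eco);")` returns `True`, whereas `has_branch("((Ath,Bsu),(Syn,Mtu),Eco);")` returns `False`. With the counts given above, `without_branch` is 323 and `fraction` is about 0.163, that is, roughly 16% of the trees examined.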
Are the 323 proteins whose trees
Branches, trees and horizontal gene transfer
A cyanobacterial branching for some plant nuclear genes makes good biological sense because it is well known that plastids descend from cyanobacteria that took up permanent residence within their host and transferred many genes to the nucleus [1,3]. So when we see an Arabidopsis protein branching with a cyanobacterial homologue in a tree, we infer a cyanobacterial origin of the plant nuclear gene – no problem. But, applying exactly the same logic, for example, to the Bacillus and Mycobacterium
Acknowledgements
We thank H-W. Mewes for making data available, and K. Henze, M. Hoffmeister and several students for critical comments on the paper. T.R. is currently at Epigenomics AG, Kastanienallee 24, 10435 Berlin, Germany.
References (28)
A prediction of the size and evolutionary origin of the proteome of chloroplasts of Arabidopsis. Trends Plant Sci. (2000)
Rickettsiae and Chlamydiae – evidence of horizontal gene transfer and gene exchange. Trends Genet. (1999)
et al. The family of light-harvesting related proteins: was the harvesting of light their primary function? Gene (2000)
Why have organelles retained genomes? Trends Genet. (1999)
et al. Gene transfer from organelles to the nucleus: How much, what happens and why? Plant Physiol. (1998)
Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res. (1996)
Gene transfer to the nucleus and the evolution of chloroplasts. Nature (1998)
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997)
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. (1994)
et al. Computer Science Monographs, No. 28. MOLPHY Version 2.3: Programs for Molecular Phylogenetics Based on Maximum Likelihood (1996)