Trends in Ecology & Evolution
Opinion
Large-scale sequencing and the new animal phylogeny
Introduction
Knowledge of phylogeny is important to biologists for several reasons; most directly, it tells us about the pattern of evolutionary relationships, revealing the historical pattern of speciation and divergence and enabling us to classify life according to a logical, informative, evolutionary scheme. Equally significant, however, is the central place of phylogeny in comparative biology: knowledge of the phylogeny of animals, for example, also tells us about the pattern of evolution of the heritable characteristics of those animals and, therefore, how the great diversity of animals evolved.
The phylogeny of the animals is currently incompletely resolved and has undergone major reorganisations over the past few years, mainly as a result of analyses of rRNA gene sequences 1, 2 (Box 1). The most significant changes concern the relative positions of those organisms that lack a true coelom (acoelomates and pseudocoelomates; see Glossary), including platyhelminths and nematodes, which were considered to have emerged early on during bilaterian evolution, and those that have a coelom, which had traditionally been considered to define a monophyletic group (Coelomata). In the new rRNA-based scheme 3, 4, the coelom is no longer central to the phylogeny. According to this new animal phylogeny (Box 1), the acoelomate platyhelminths now belong to Lophotrochozoa, a clade that includes mainly coelomate animals, such as annelids and mollusks; the pseudocoelomate nematodes are grouped with coelomate arthropods in a second group called Ecdysozoa. Although this revised rRNA-based phylogeny is widely cited and supported by independent evidence 5, 6, 7, there are several studies of non-rRNA gene sequences that have found support for the older idea of a monophyletic Coelomata 8, 9, 10, 11.
Collecting a large number of orthologous gene sequences is an important prerequisite for reconciling these data sets to achieve a reliable resolution of the animal phylogeny. Here, we argue that the most economical approach for data collection is to sequence ∼5000 expressed sequence tags (ESTs) from a wide diversity of organisms, as opposed to the more usual targeted PCR-based sequencing approach or complete genome sequencing (Box 2).
The impact of genomics
The contradictions that exist among different single gene studies 5, 6, 7, 8, 9, 10 probably occur as a result of stochastic errors owing to the limited amount of information available in single genes (Box 3). One popular approach to overcome these errors has been to mine the complete genomes of model organisms such as Caenorhabditis elegans, Homo sapiens and Drosophila melanogaster. Such a phylogenomic approach was expected to provide the information necessary to generate a fully resolved and
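The idea that conflicts among single-gene studies reflect stochastic error, rather than a consistently misleading signal, can be illustrated with a deliberately simplified toy model. The sketch below assumes, hypothetically, that each informative site independently supports the true four-taxon topology with probability `p_true` and each of the two alternative topologies with an equal, smaller probability, and that the topology supported by the most sites is chosen. It is not a phylogenetic inference method; the function name `recovery_rate` and all parameter values are illustrative assumptions.

```python
import random


def recovery_rate(n_sites, p_true=0.4, n_reps=1000, seed=1):
    """Toy model of stochastic error (hypothetical, not a real tree-building method).

    Each informative site independently supports the true four-taxon topology
    with probability p_true and each of the two alternative topologies with
    probability (1 - p_true) / 2. The "inferred" tree is the topology supported
    by the most sites; ties are counted as failures.
    Returns the fraction of replicates in which the true topology wins.
    """
    rng = random.Random(seed)
    p_alt = (1.0 - p_true) / 2.0
    wins = 0
    for _ in range(n_reps):
        counts = [0, 0, 0]  # sites supporting [true, alternative A, alternative B]
        for _ in range(n_sites):
            r = rng.random()
            if r < p_true:
                counts[0] += 1
            elif r < p_true + p_alt:
                counts[1] += 1
            else:
                counts[2] += 1
        if counts[0] > max(counts[1], counts[2]):
            wins += 1
    return wins / n_reps


# Compare a short "single gene" with progressively longer concatenations.
for n in (10, 100, 1000):
    print(n, recovery_rate(n))
```

Under these assumptions the chance of recovering the true tree rises towards 1 as the number of sites grows, which is the intuition behind combining many genes. The toy model contains no systematic bias: if `p_true` were smaller than the probability of one of the alternatives, adding sites would instead converge on the wrong tree, which is the signature of systematic rather than stochastic error.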
Alternative approaches
The standard sampling approach widely used in systematics (i.e. targeted PCR and sequencing of selected orthologous genes) is one obvious way to proceed. Such an approach is already under way for some animal groups (e.g. the Tree of Life project, http://tolweb.org/tree/phylogeny.html). However, defining a priori which gene is a reliable marker (e.g. minimally affected by hidden paralogy or gene conversion and evolving homogeneously at an appropriate rate) is not easy. Moreover, different genes
Practical considerations
From an empirical point of view, sequencing 1000 randomly chosen ESTs from a non-normalized library has provided sufficient data to resolve the phylogenetic position of some unicellular eukaryotes 32, 33. At the very least, 1000 sequences are sufficient to show whether a given species will be difficult to locate owing to systematic biases, such as high evolutionary rate or a biased amino-acid composition. However, evaluating a priori the number of ESTs to be sequenced is difficult. In fact, it
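One reason the required number of ESTs is hard to predict is that a non-normalized library is dominated by a few highly expressed transcripts, so each additional EST yields diminishing returns in new genes. A minimal sketch, assuming a hypothetical Zipf-like abundance distribution (the k-th most abundant transcript has weight proportional to 1/k^s) over a hypothetical number of genes, computes the expected number of distinct genes tagged by N ESTs drawn at random with replacement, E = Σ_i [1 − (1 − p_i)^N], where p_i is the relative abundance of gene i. Real libraries will depart from this assumed distribution, which is exactly what makes a priori estimates uncertain.

```python
def zipf_abundances(n_genes, s=1.0):
    """Hypothetical relative transcript abundances: the k-th most expressed
    gene has weight proportional to 1 / k**s (a Zipf-like assumption)."""
    weights = [1.0 / (k ** s) for k in range(1, n_genes + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_distinct_genes(n_ests, abundances):
    """Expected number of distinct genes tagged by n_ests ESTs sampled at
    random (with replacement) from a non-normalized library:
    E = sum_i [1 - (1 - p_i) ** n_ests]."""
    return sum(1.0 - (1.0 - p) ** n_ests for p in abundances)


# Illustrative only: 10,000 genes with Zipf exponent 1 are assumed values.
abund = zipf_abundances(10000)
for n in (1000, 5000, 10000):
    print(n, round(expected_distinct_genes(n, abund)))
```

Changing the assumed exponent `s` (more or less skewed expression) changes the yield per EST substantially, which is one way to see why a pilot run of about 1000 ESTs is informative before committing to a larger sequencing effort.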
Missing data
One common concern relates to the presence of missing data in the concatenated alignment that results from the EST approach; some studies have demonstrated that incomplete taxa (i.e. taxa for which a large proportion of the data is missing) have an unstable placement in the phylogeny 36, 37, 38, 39. However, these studies considered only a limited total number of homologous characters (<1000), leaving very few positions that are known for the incomplete taxa. By contrast, computer simulations have shown
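Because the concern is the number of positions actually known for an incomplete taxon, not only its percentage of missing data, it can help to report both when assembling a concatenated alignment. The sketch below assumes per-gene alignments stored as hypothetical Python dictionaries (taxon name to aligned sequence), pads taxa absent from a gene with `?`, and treats `?`, `-` and `X` as missing; these coding conventions and the function names `concatenate` and `missing_summary` are illustrative assumptions, not the conventions of any particular software.

```python
MISSING = set("?-X")  # assumed coding for missing, gapped or unknown residues


def concatenate(genes, taxa):
    """Build a supermatrix by concatenating per-gene alignments.

    `genes` is a list of dicts mapping taxon -> aligned sequence; a taxon
    with no sequence for a gene is padded with '?' over that gene's length.
    """
    supermatrix = {t: [] for t in taxa}
    for aln in genes:
        length = len(next(iter(aln.values())))  # all rows of one gene share a length
        for t in taxa:
            supermatrix[t].append(aln.get(t, "?" * length))
    return {t: "".join(parts) for t, parts in supermatrix.items()}


def missing_summary(supermatrix):
    """Return {taxon: (number of known positions, fraction missing)}."""
    summary = {}
    for taxon, seq in supermatrix.items():
        known = sum(1 for c in seq if c not in MISSING)
        summary[taxon] = (known, 1.0 - known / len(seq))
    return summary


# Hypothetical toy data: two short genes, taxon C lacks the second gene.
genes = [
    {"A": "MKLV", "B": "MKIV", "C": "MRLV"},
    {"A": "GG-A", "B": "GGSA"},
]
print(missing_summary(concatenate(genes, ["A", "B", "C"])))
# {'A': (7, 0.125), 'B': (8, 0.0), 'C': (4, 0.5)}
```

In this toy case taxon C is 50% missing and has only 4 known positions; in a supermatrix of tens of thousands of positions the same percentage would still leave thousands of known positions, which is the distinction drawn above with the earlier studies based on fewer than 1000 characters.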
Future directions
EST sequencing is likely to have many uses beyond phylogenetics; indeed, few EST projects to date have been initiated for phylogenetic purposes. As well as sets of orthologous genes, the EST approach can provide other valuable information. From a phylogenetic point of view, comparisons of gene representation have already opened the way to the discovery of molecular signatures, such as the appearance of a novel gene by horizontal transfer from distantly related species [30], or shared gene
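Comparisons of gene representation across EST collections can be phrased as simple set operations on presence/absence data. As a hypothetical sketch (the function name, taxon names and gene identifiers are invented for illustration), the function below lists genes detected in every member of a candidate clade and in no taxon outside it, the kind of pattern a clade-specific gene gain would leave. Because an EST survey samples only expressed transcripts, absence may simply mean that a gene was not sampled, so such candidates would need confirmation from other evidence.

```python
def clade_signatures(gene_sets, clade):
    """Genes detected in every taxon of `clade` and in no taxon outside it.

    gene_sets maps taxon -> set of gene identifiers detected in its ESTs.
    Caution: absence from an EST set may only reflect incomplete sampling.
    """
    inside = [gene_sets[t] for t in clade]
    outside = [genes for t, genes in gene_sets.items() if t not in clade]
    shared_by_clade = set.intersection(*inside)
    seen_elsewhere = set().union(*outside)
    return shared_by_clade - seen_elsewhere


# Hypothetical presence/absence data for three taxa.
gene_sets = {
    "taxonA": {"g1", "g2", "g3"},
    "taxonB": {"g1", "g3"},
    "taxonC": {"g2", "g3"},
}
print(clade_signatures(gene_sets, ["taxonA", "taxonB"]))  # {'g1'}
```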
Acknowledgements
We thank Frédéric Delsuc, Nicolas Lartillot, Ken Halanych and four anonymous referees for helpful suggestions. H.P. gratefully acknowledges the financial support provided by Génome Québec, the Canadian Research Chair and the Université de Montréal.
Glossary
- Acoelomates: animals that lack a body cavity; includes platyhelminths, which are lophotrochozoans, and acoelomorph flatworms, which are thought to be the sister group of all other bilaterians.
- Bilateria: bilaterally symmetrical animals with three tissue layers (synonymous with triploblasts); includes most animal phyla. Cnidarians (jellyfish and hydra), ctenophores (comb jellies) and sponges are the most notable exceptions.
- Coelom: a mesodermal, epithelium-lined, fluid-filled body cavity.
- Coelomates: animals
References (70)
Comparative genomics of nematodes. Trends Genet. (2005)
Monophyly of primary photosynthetic eukaryotes: Green plants, red algae, and glaucophytes. Curr. Biol. (2005)
EST analysis of the cnidarian Acropora millepora reveals extensive gene loss and rapid sequence divergence in the model invertebrates. Curr. Biol. (2003)
Maintenance of ancestral complexity and non-metazoan genes in two basal cnidarians. Trends Genet. (2005)
Expressed sequence tags: alternative or complement to whole genome sequences? Trends Plant Sci. (2003)
Deciding among green plants for whole genome studies. Trends Plant Sci. (2002)
Mitochondrial genome data support the basal position of Acoelomorpha and the polyphyly of the Platyhelminthes. Mol. Phylogenet. Evol. (2004)
Evidence from 18S ribosomal DNA that the lophophorates are protostome animals. Science (1995)
Evidence for a clade of nematodes, arthropods and other moulting animals. Nature (1997)
The new view of animal phylogeny. Annu. Rev. Ecol. Evol. Syst. (2004)