Opinion
Large-scale sequencing and the new animal phylogeny

https://doi.org/10.1016/j.tree.2006.08.004

Although comparisons of gene sequences have revolutionised our understanding of the animal phylogenetic tree, it has become clear that, to avoid errors in tree reconstruction, a large number of genes from many species must be considered: too few genes and stochastic errors predominate, too few taxa and systematic errors appear. We argue here that, to gather many sequences from many taxa, the best use of resources is to sequence a small number of expressed sequence tags (1000–5000 per species) from as many taxa as possible. This approach counters both sources of error, gives the best hope of a well-resolved phylogeny of the animals and will act as a central resource for a carefully targeted genome sequencing programme.

Introduction

Knowledge of phylogeny is important to biologists for several reasons; most directly, it tells us about the pattern of evolutionary relationships, revealing the historical pattern of speciation and divergence and enabling us to classify life according to a logical, informative, evolutionary scheme. Equally significant, however, is the central place of phylogeny in comparative biology: knowledge of the phylogeny of animals, for example, also tells us about the pattern of evolution of the heritable characteristics of those animals and, therefore, how the great diversity of animals evolved.

The phylogeny of the animals is currently incompletely resolved and has undergone major reorganisations over the past few years, mainly as a result of analyses of rRNA gene sequences 1, 2 (Box 1). The most significant changes concern the relative positions of those organisms that lack a true coelom (acoelomates and pseudocoelomates; see Glossary), including platyhelminths and nematodes, which were considered to have emerged early on during bilaterian evolution, and those that have a coelom, which had traditionally been considered to define a monophyletic group (Coelomata). In the new rRNA-based scheme 3, 4, the coelom is no longer central to the phylogeny. According to this new animal phylogeny (Box 1), the acoelomate platyhelminths now belong to Lophotrochozoa, a clade that includes mainly coelomate animals, such as annelids and mollusks; the pseudocoelomate nematodes are grouped with coelomate arthropods in a second group called Ecdysozoa. Although this revised rRNA-based phylogeny is widely cited and supported by independent evidence 5, 6, 7, there are several studies of non-rRNA gene sequences that have found support for the older idea of a monophyletic Coelomata 8, 9, 10, 11.

Collecting a large number of orthologous gene sequences is an important prerequisite for reconciling these data sets to achieve a reliable resolution of the animal phylogeny. Here, we argue that the most economical approach for data collection is to sequence ∼5000 expressed sequence tags (ESTs) from a wide diversity of organisms, as opposed to the more usual targeted PCR-based sequencing approach or complete genome sequencing (Box 2).
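The point that too few genes leave stochastic error dominant can be made concrete with a simple sampling calculation. The sketch below is illustrative only: it assumes that alignment positions are independent, so that the observed proportion of differing sites between two sequences has a binomial standard error. The function name, the value p = 0.2 and the alignment lengths are hypothetical choices, not figures taken from the studies discussed here.

```python
import math


def p_distance_standard_error(p: float, n_sites: int) -> float:
    """Binomial standard error of an observed proportion of differing sites.

    Assumes independent sites; p is the true proportion of differences.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return math.sqrt(p * (1.0 - p) / n_sites)


# Hypothetical alignment lengths: roughly one gene, ten genes, one hundred genes.
for n_sites in (300, 3_000, 30_000):
    se = p_distance_standard_error(0.2, n_sites)
    print(f"{n_sites:>6} sites: standard error = {se:.4f}")
```

Under these assumptions, a tenfold increase in the number of sites reduces the sampling error by a factor of about √10. This addresses stochastic error only; adding sites does nothing by itself against the systematic errors that broader taxon sampling is meant to counter.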

The impact of genomics

The contradictions that exist among different single gene studies 5, 6, 7, 8, 9, 10 probably occur as a result of stochastic errors owing to the limited amount of information available in single genes (Box 3). One popular approach to overcome these errors has been to mine the complete genomes of model organisms such as Caenorhabditis elegans, Homo sapiens and Drosophila melanogaster. Such a phylogenomic approach was expected to provide the information necessary to generate a fully resolved and

Alternative approaches

The standard sampling approach widely used in systematics (i.e. targeted PCR and sequencing of selected orthologous genes) is one obvious way to proceed. Such an approach is already under way for some animal groups (e.g. the Tree of Life project, http://tolweb.org/tree/phylogeny.html). However, defining a priori which gene is a reliable marker (e.g. minimally affected by hidden paralogy or gene conversion and evolving homogeneously at an appropriate rate) is not easy. Moreover, different genes

Practical considerations

From an empirical point of view, sequencing 1000 randomly chosen ESTs from a non-normalized library has provided sufficient data to resolve the phylogenetic position of some unicellular eukaryotes 32, 33. At the very least, 1000 sequences are sufficient to show whether a given species will be difficult to locate owing to systematic biases, such as high evolutionary rate or a biased amino-acid composition. However, evaluating a priori the number of ESTs to be sequenced is difficult. In fact, it

Missing data

One common concern relates to the presence of missing data in the concatenated alignment that results from the EST approach; some studies have demonstrated that incomplete taxa (i.e. those for which the proportion of missing data is large) have an unstable placement in the phylogeny 36, 37, 38, 39. However, these studies considered only a limited total number of homologous characters (<1000), leaving very few positions that are known for the incomplete taxa. By contrast, computer simulations have shown
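How missing data arise in a concatenated alignment can be sketched as follows. This is a minimal illustration, not the procedure used in any of the cited studies: it assumes that each gene is already aligned, that a taxon absent from a gene is padded with "?", and that the taxon and gene names are hypothetical placeholders.

```python
from typing import Dict, List

MISSING = "?"  # assumed symbol for positions with no data


def concatenate(genes: Dict[str, Dict[str, str]], taxa: List[str]) -> Dict[str, str]:
    """Join per-gene alignments into one supermatrix, padding absent taxa."""
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for gene_name, alignment in genes.items():
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene_name}: sequences must share one length")
        length = lengths.pop()
        for taxon in taxa:
            # A taxon with no sequence for this gene contributes only '?'.
            parts[taxon].append(alignment.get(taxon, MISSING * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def missing_fraction(sequence: str) -> float:
    """Fraction of positions in a concatenated sequence that are missing."""
    return sequence.count(MISSING) / len(sequence)


# Toy data: three hypothetical genes sampled unevenly across three taxa.
genes = {
    "geneX": {"taxonA": "MKVL", "taxonB": "MKIL", "taxonC": "MRVL"},
    "geneY": {"taxonA": "GAST", "taxonB": "GASS"},
    "geneZ": {"taxonA": "PL", "taxonC": "PI"},
}
supermatrix = concatenate(genes, ["taxonA", "taxonB", "taxonC"])
for taxon, sequence in supermatrix.items():
    print(taxon, sequence, f"{missing_fraction(sequence):.0%} missing")
```

The quantity that matters for the argument here is not only the fraction of missing positions but also the absolute number of known positions per taxon, which grows with the total length of the concatenation.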

Future directions

EST sequencing is likely to have many uses beyond phylogenetic utility; indeed, few EST projects to date have been initiated for phylogenetic purposes. As well as sets of orthologous genes, the EST approach can provide other valuable information. From a phylogenetic point of view, comparisons of gene representation have already opened the way to the discovery of molecular signatures, such as the appearance of a novel gene by horizontal transfer from distantly related species 30, or shared gene

Acknowledgements

We thank Frédéric Delsuc, Nicolas Lartillot, Ken Halanych and four anonymous referees for helpful suggestions. H.P. gratefully acknowledges the financial support provided by Génome Québec, the Canadian Research Chair and the Université de Montréal.

Glossary

Acoelomates
animals that lack a body cavity; includes platyhelminths, which are lophotrochozoans, and acoelomorph flatworms, which are thought to be the sister group of all other bilaterians.
Bilateria
bilaterally symmetrical animals with three tissue layers (synonymous with triploblasts); includes most animal phyla. Cnidarians (jellyfish and hydra), ctenophores (comb jellies) and sponges are the most notable exceptions.
Coelom
a mesodermal, epithelium-lined, fluid-filled body cavity.
Coelomates
animals

References (70)

  • A. Adoutte. The new animal phylogeny: reliability and implications. Proc. Natl. Acad. Sci. U. S. A. (2000)
  • R. de Rosa. Hox genes in brachiopods and priapulids and protostome evolution. Nature (1999)
  • A. Haase. A tissue-specific marker of Ecdysozoa. Dev. Genes Evol. (2001)
  • I. Ruiz-Trillo. A phylogenetic analysis of myosin heavy chain type II sequences corroborates that Acoela and Nemertodermatida are basal bilaterians. Proc. Natl. Acad. Sci. U. S. A. (2002)
  • B. Hausdorf. Early evolution of the bilateria. Syst. Biol. (2000)
  • J.B. Dacks. Analyses of RNA Polymerase II genes from free-living protists: phylogeny, long-branch attraction, and the eukaryotic big bang. Mol. Biol. Evol. (2002)
  • D.S. Horner et al. Chaperonin 60 phylogeny provides further evidence for secondary loss of mitochondria among putative early-branching eukaryotes. Mol. Biol. Evol. (2001)
  • G.K. Philip. The Opisthokonta and the Ecdysozoa may not be clades: stronger support for the grouping of plant and animal than for animal and fungi and stronger support for the Coelomata than Ecdysozoa. Mol. Biol. Evol. (2005)
  • A. Rokas. Conflicting phylogenetic signals at the base of the metazoan tree. Evol. Dev. (2003)
  • J.E. Blair. The evolutionary position of nematodes. BMC Evol. Biol. (2002)
  • H. Dopazo. Phylogenomics and the number of characters required for obtaining an accurate phylogeny of eukaryote model species. Bioinformatics (2004)
  • Y.I. Wolf. Coelomata and not ecdysozoa: evidence from genome-wide phylogenetic analysis. Genome Res. (2004)
  • J.E. Blair. Evolutionary sequence analysis of complete eukaryote genomes. BMC Bioinform. (2005)
  • H. Philippe. Phylogenomics. Annu. Rev. Ecol. Evol. Syst. (2005)
  • J. Felsenstein. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. (1978)
  • H. Philippe. Multigene analyses of bilaterian animals corroborate the monophyly of Ecdysozoa, Lophotrochozoa, and Protostomia. Mol. Biol. Evol. (2005)
  • K.G. Field. Molecular phylogeny of the Animal Kingdom. Science (1988)
  • H. Philippe. Can the Cambrian explosion be inferred through molecular phylogeny? Development (1994)
  • M.D. Hendy et al. A framework for the quantitative study of evolutionary trees. Syst. Zool. (1989)
  • K.J. Peterson et al. Animal phylogeny and the ancestry of bilaterians: inferences from morphology and 18S rDNA gene sequences. Evol. Dev. (2001)
  • F. Delsuc. Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Genet. (2005)
  • H. Dopazo et al. Genome-scale evidence of the nematode-arthropod clade. Genome Biol. (2005)
  • S.W. Roy et al. Resolution of a deep animal divergence by the pattern of intron conservation. Proc. Natl. Acad. Sci. U. S. A. (2005)
  • N. Lartillot et al. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous...
  • D.M. Hillis. Is sparse taxon sampling a problem for phylogenetic inference? Syst. Biol. (2003)