Review
The supermatrix approach to systematics

https://doi.org/10.1016/j.tree.2006.10.002

Recent reviews of the construction of large phylogenies have focused on supertree methods that involve separate analyses of data sets and subsequent integration of the resulting trees. Here, we consider the alternative method of analyzing all character data simultaneously. Such ‘supermatrix’ analyses use information from each character directly and enable straightforward incorporation of diverse kinds of data, including characters from fossils. The approach has been extended by the development of new methods, including model-based techniques for analyzing heterogeneous data and hierarchical methods for constructing extremely large trees. Recent work also suggests that the problem of missing data in supermatrix analyses has been overstated. Although the supermatrix approach is not suited for all cases, we suggest that its inherent strengths will ensure that it will continue to have a central role in inferring large phylogenetic trees from diverse data.

Section snippets

Building ever-larger trees

In The Origin of Species [1], Charles Darwin established that descent with modification explains the similarities and differences among all organisms: from bacteria to orchids to humans, we are all connected in a single Tree of Life. Until recently, taxonomic specialists were restricted to reconstructing phylogenetic relationships among relatively few species using small numbers of characters. However, recent advances now enable phylogenetic analyses of thousands of taxa or characters (e.g. 2, 3

Basic strengths of the supermatrix approach

The supermatrix approach is defined by the direct, simultaneous use of all the character evidence from all included taxa (Figure 1). A basic advantage of the approach is simply that it uses this character evidence more fully in estimating the tree than do supertree methods; in supertree analyses, some of the character information within data sets is lost when sets of characters are summarized as trees 6, 13. This advantage of the supermatrix approach is so fundamental (more information is
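This is a minimal sketch of the data-structure side of a supermatrix. It assumes each data set is already aligned and stored as a mapping from taxon name to a character string. Data sets are concatenated column-wise. A taxon missing from a data set, such as a fossil known only from morphology, is padded with '?', a common missing-data symbol in formats such as NEXUS. The function name `build_supermatrix` and the example taxa, data-set names and characters are hypothetical.

```python
from typing import Dict, List

MISSING = "?"  # symbol for a character not scored in a taxon

def build_supermatrix(partitions: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Concatenate aligned data sets into one taxon-by-character supermatrix.

    `partitions` maps a data-set name to {taxon: character string}. Rows within
    one data set must share a length; taxa absent from a data set are padded
    with MISSING across that data set's columns.
    """
    taxa = sorted({taxon for block in partitions.values() for taxon in block})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for name in sorted(partitions):  # fixed, reproducible column order
        block = partitions[name]
        widths = {len(seq) for seq in block.values()}
        if len(widths) != 1:
            raise ValueError(f"data set {name!r} is empty or has unequal row lengths")
        width = widths.pop()
        for taxon in taxa:
            rows[taxon].append(block.get(taxon, MISSING * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}

# Hypothetical toy example: a fossil scored only for morphology,
# taxon C scored only for a (made-up) 4-site gene fragment.
data = {
    "morphology": {"A": "0110", "B": "0100", "Fossil": "1110"},
    "rbcL": {"A": "ACGT", "B": "ACGA", "C": "TCGA"},
}
for taxon, row in build_supermatrix(data).items():
    print(taxon, row)
```

Here the fossil receives the row `1110????` and taxon C the row `????TCGA`. Every taxon keeps all of its scored characters, and no data set is summarized as a tree before the combined analysis.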

The gene tree–species tree problem

The standard supermatrix approach implicitly assumes that all characters have experienced the same branching history. It is now well known that this assumption is not always valid. In particular, the gene tree for copies of a gene in different species might not match the species tree (the history of splitting of species lineages) because of hybridization, horizontal gene transfer, gene duplication and sorting of genetic polymorphisms among species lineages (lineage sorting) [22]. This ‘gene
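Lineage sorting, one of the processes listed above, can be quantified in the simplest case. This sketch assumes three species with species tree ((A,B),C), neutral loci, and no hybridization, horizontal transfer or duplication. Under the multispecies coalescent, the probability that a single locus yields the species-tree topology is 1 − (2/3)e^(−t), where t is the internal branch length in coalescent units. This is a standard textbook result, not a method from this article. The function names are hypothetical, and the simulation is included only as a cross-check of the formula.

```python
import math
import random

def p_concordant(t: float) -> float:
    """P(gene tree matches species tree ((A,B),C)) for one locus, three species.

    Assumes the multispecies coalescent with lineage sorting as the only source
    of conflict. `t` is the length of the internal (A,B) ancestral branch in
    coalescent units (generations / 2N for a diploid population of size N).
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t)

def simulate_concordance(t: float, n_loci: int, seed: int = 0) -> float:
    """Monte Carlo estimate of p_concordant(t) from independent simulated loci."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_loci):
        # The A and B lineages coalesce at rate 1 per coalescent unit.
        if rng.expovariate(1.0) < t:
            matches += 1  # coalesced in the (A,B) ancestor: gene tree matches
        elif rng.randrange(3) == 0:
            # All three lineages reached the ancestral population;
            # the (A,B) pair is the first to coalesce in 1 of 3 cases.
            matches += 1
    return matches / n_loci
```

For a short internal branch such as t = 0.1, `p_concordant` gives about 0.40, so roughly 60% of loci would be expected to conflict with the species tree. This is the situation in which the assumption that all characters share one branching history is most strained.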

New supermatrix methods

An equally weighted parsimony approach has been perhaps the most common mode of analysis in supermatrix studies (e.g. 18, 29). Proponents of parsimony seek maximally general explanations for similarities among taxa as results of inheritance from a common ancestor; the criterion of generality entails choosing a tree that minimizes the number of character-state changes required to explain character similarities [30]. Here, we describe several new methods that extend the capabilities of
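The tree-length criterion described above can be sketched with Fitch's algorithm for unordered characters. This algorithm is one standard way to count the minimum number of state changes a character requires on a fixed tree. The sketch makes several assumptions:

* states are single symbols;
* trees are rooted binary trees written as nested tuples (the count does not depend on where the root is placed);
* all characters have equal weight;
* '?' marks missing data and is treated as compatible with any observed state.

It only scores given trees and does not search tree space. All names are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rooted binary tree as nested pairs, e.g. (("A", "B"), ("C", "D")).
Tree = Union[str, Tuple["Tree", "Tree"]]
MISSING = "?"

def fitch_changes(tree: Tree, column: Dict[str, str], alphabet: FrozenSet[str]) -> int:
    """Fitch's algorithm: minimum number of state changes for one character.

    A missing entry ('?') is treated as compatible with every state in `alphabet`.
    """
    def visit(node: Tree) -> Tuple[FrozenSet[str], int]:
        if isinstance(node, str):
            state = column[node]
            return (alphabet if state == MISSING else frozenset([state])), 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:
            return shared, left_cost + right_cost
        # No shared state: one extra change is needed on one of the two child branches.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]

def tree_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Equally weighted parsimony length: changes summed over all columns."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        observed = frozenset(s for s in column.values() if s != MISSING)
        if not observed:
            continue  # a column with no scored taxa cannot add changes
        total += fitch_changes(tree, column, observed)
    return total
```

Consider `matrix = {"A": "11?0", "B": "1100", "C": "0011", "D": "0011"}`. The tree `(("A", "B"), ("C", "D"))` has length 4, while `(("A", "C"), ("B", "D"))` has length 7, so equally weighted parsimony prefers the first.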

Supermatrices and the Tree of Life

The Tree of Life as we know it includes ∼1.5 million living species [48] plus many thousands of extinct species. Despite the continuing development of methods that increase the speed of phylogenetic analyses (e.g. 40, 49, 50, 51, 52), it might never be possible to estimate the entire Tree of Life reliably using a single, standard supermatrix analysis [53]. Nonetheless, current methods used to construct very large trees suggest that a supermatrix approach can have a central role in assembling
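The size of tree space is one reason a single exhaustive analysis is out of reach. For n taxa there are (2n − 5)!! distinct unrooted, fully bifurcating labeled trees, that is, the product 3 × 5 × ⋯ × (2n − 5). Exhaustive evaluation therefore quickly becomes impossible, and large analyses rely on heuristic searches (compare Graham et al. and Goloboff in the reference list). This sketch simply evaluates that count, and the function name is hypothetical.

```python
def n_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating labeled trees: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count

for n in (4, 5, 10, 50):
    print(n, n_unrooted_trees(n))
```

It returns 3 for four taxa, 15 for five and 2,027,025 for ten. At 50 taxa the count is already on the order of 10^74.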

Supermatrices for the future

The supermatrix approach has proven to be a powerful method of combining diverse data to infer phylogenetic relationships. The approach directly uses the evidence provided by each character, often revealing emergent support that is hidden in separate analyses of data partitions, and easily accommodates different classes of character data. Regarding the latter point, molecular systematists sometimes forget that a tree of all known life must include many fossil taxa and, thus, must be derived in

Acknowledgements

We thank C. Hayashi, J. Kim, K. de Queiroz and three anonymous reviewers for discussion or comments on the article, and M. O’Leary and G. Giribet for providing the MorphoBank figure. J.G. was funded by NSF EAR0228629, DEB–0213171 and DEB–0212572.

Glossary

Bayesian phylogenetic method
any method that uses Bayesian inference in phylogenetic estimation. In Bayesian phylogenetic inference, the posterior probability of a tree (which, given certain assumptions, is the probability that the tree is correct) is proportional to the product of the tree's prior probability and its likelihood. The most commonly used Bayesian phylogenetic method uses a Markov chain Monte Carlo technique designed to visit different tree topologies with a frequency proportional to
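The entry above can be written out in symbols. This is a simplified sketch that treats branch lengths and substitution-model parameters as already integrated into the likelihood. The posterior probability of tree τ given data D follows from Bayes' theorem. The Metropolis–Hastings acceptance probability used in MCMC then shows why the intractable sum over all trees never has to be computed. The notation (τ, D, proposal density q) is introduced here for illustration.

```latex
P(\tau \mid D) = \frac{P(D \mid \tau)\, P(\tau)}{\sum_{\tau'} P(D \mid \tau')\, P(\tau')} \;\propto\; P(D \mid \tau)\, P(\tau)

\alpha(\tau \to \tau^{*}) = \min\!\left(1,\; \frac{P(D \mid \tau^{*})\, P(\tau^{*})}{P(D \mid \tau)\, P(\tau)} \cdot \frac{q(\tau \mid \tau^{*})}{q(\tau^{*} \mid \tau)}\right)
```

The denominator of the first expression cancels in the ratio. As a result, the chain needs only the prior and likelihood values for the current and proposed trees.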

References (73)

  • H. Glenner

    Bayesian inference of the metazoan phylogeny: a combined molecular and morphological approach

    Curr. Biol.

    (2004)
  • P.A. Goloboff

    Analyzing large data sets in reasonable times: solutions for composite optima

    Cladistics

    (1999)
  • J.S. Farris

    Parsimony jackknifing outperforms neighbor-joining

    Cladistics

    (1996)
  • R.L. Graham et al.

    Unlikelihood that minimal phylogenies for a realistic biological study can be constructed in reasonable computation time

    Math. Biosci.

    (1982)
  • J.J. Wiens

    Missing data and the design of phylogenetic analyses

    J. Biomed. Inform.

    (2006)
  • T. Grant et al.

    Data exploration in phylogenetic inference: scientific, heuristic, or neither

    Cladistics

    (2003)
  • C. Darwin

    On the Origin of Species

    (1859)
  • M. Källersjö

    Simultaneous parsimony jackknife analysis of 2538 rbcL DNA sequences reveals support for major clades of green plants, land plants, seed plants, and flowering plants

    Plant Syst. Evol.

    (1998)
  • A. Rokas

    Genome-scale approaches to resolving incongruence in molecular phylogenies

    Nature

    (2003)
  • A.G. Kluge

    A concern for evidence and a phylogenetic hypothesis of relationships among Epicrates (Boidae, Serpentes)

    Syst. Zool.

    (1989)
  • A.C. Driskell

    Prospects for building the Tree of Life from large sequence databases

    Science

    (2004)
  • J. Gatesy

    Inconsistencies in arguments for the supertree approach: supermatrices versus supertrees of Crocodylia

    Syst. Biol.

    (2004)
  • M.S.Y. Lee

    Molecular evidence and marine snake origins

    Biol. Lett.

    (2005)
  • R.J. Asher

    Relationships of endemic African mammals and their fossil relatives based on morphological and molecular evidence

    J. Mamm. Evol.

    (2003)
  • J.H. Geisler et al.

    Phylogenetic relationships of extinct cetartiodactyls: results of simultaneous analyses of molecular, morphological, and stratigraphic data

    J. Mamm. Evol.

    (2005)
  • F. Delsuc

    Phylogenomics and the reconstruction of the tree of life

    Nat. Rev. Genet.

    (2005)
  • A. de Queiroz

    Separate versus combined analysis of phylogenetic evidence

    Annu. Rev. Ecol. Syst.

    (1995)
  • M. Barrett

    Against consensus

    Syst. Zool.

    (1991)
  • R.G. Olmstead et al.

    Combining data in phylogenetic systematics: an empirical approach using three molecular data sets in the Solanaceae

    Syst. Biol.

    (1994)
  • C.L. Lambkin

    Partitioned Bremer support localizes significant conflict in bee flies (Diptera: Bombyliidae: Anthracinae)

    Invertebr. Syst.

    (2004)
  • N. Wahlberg

    Synergistic effects of combining morphological and molecular data in resolving the phylogeny of butterflies and skippers

    Proc. R. Soc. B

    (2005)
  • J.G. Burleigh

    Supertree bootstrapping methods for assessing phylogenetic variation among genes in genome-scale data sets

    Syst. Biol.

    (2006)
  • D.M. Hillis

    Taxonomic sampling, phylogenetic accuracy, and investigator bias

    Syst. Biol.

    (1998)
  • D.J. Zwickl et al.

    Increased taxon sampling greatly reduces phylogenetic error

    Syst. Biol.

    (2002)
  • W.P. Maddison

    Gene trees in species trees

    Syst. Biol.

    (1997)
  • W.P. Maddison et al.

    Inferring phylogeny despite incomplete lineage sorting

    Syst. Biol.

    (2006)