Properties of phylogenetic trees generated by Yule-type speciation models☆
Introduction
Phylogenetic trees are widely used in biology to represent evolutionary relationships between species. In these trees the leaves represent extant species, and the internal vertices represent hypothesised speciation events. There is much interest in the process of speciation, and in the extent to which, and the manner in which, the distribution of phylogenetic tree shapes can be modelled by a random process. Several simple stochastic models of speciation have been proposed, and several investigators have aimed to test or refine such models by comparing their predictions with published phylogenetic trees [1], [2], [3], [4], [5], [6], [7], [8], [9]. These models make predictions about the shape of the phylogenetic tree connecting the extant species. They can also provide prior probabilities for phylogenetic trees in Bayesian approaches to tree reconstruction [10], [11], [12], and they are used as a basis for calculating the probability of certain configurations under random speciation [13]. These probabilities may then be useful in testing hypotheses concerning the speciation process.
In this paper we consider only the models' predictions regarding the underlying discrete tree structure, without regard to the lengths of the edges. While such an approach may neglect some informative characteristics of the tree, it has two motivations: firstly, the predictions regarding the discrete tree remain valid under a much wider class of models (they are insensitive to underlying parameters) and, secondly, we are interested in isolating the information that is conveyed solely by the discrete tree shape.
We consider some properties of the Yule model, which is perhaps the simplest stochastic model for speciation, and then define and investigate an extension of this model. We begin by introducing some basic terminology for phylogenetic trees (Section 2). The Yule model is then introduced, and some of its properties are described (Section 3). We then consider the probability distribution of the number of edges separating the root of a tree from the most recent common ancestor of a randomly selected subset of size k (Section 4). Next, a maximum likelihood approach to edge-rooting an unrooted tree is presented, and simulation is used to show that even for large unrooted trees the approximate location of the root can be identified with high probability (Section 5). Following this, a modification of the Yule model is considered in which the rate of speciation of a lineage depends on the time back to the last speciation event on that lineage (Section 6). We show that this modified model reduces to the uniform model under the condition of 'explosive radiation'.
Terminology
Evolutionary relationships are generally represented by rooted or unrooted binary (phylogenetic) trees [14]. Such trees consist of uniquely labeled vertices of degree 1, called leaves, and unlabeled internal vertices of degree 3. A rooted tree contains, in addition, a root vertex of degree 2; in this way every non-leaf vertex can be regarded as having exactly two immediate descendants. We say a vertex v is a descendant of another vertex w if w lies on the path between v and the root vertex.
The Yule model
A simple model of speciation is to assume the exchangeability condition that, at any given time, each of the then-extant species is equally likely to give rise to one new species. The 'rate' of speciation may vary with time, or with the present and past number of species. We may also allow extinctions (or random sampling of extant taxa) provided that a similar exchangeability criterion applies – that is, whenever an extinction event occurs each of the then-extant species is equally likely to
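The exchangeability condition above can be illustrated with a minimal simulation sketch of the discrete tree shape only. It assumes a pure-birth process with no extinction or sampling, and the function name `yule_tree` and the list-based tree representation are illustrative choices, not taken from the paper.

```python
import random


def yule_tree(n, rng=None):
    """Grow a rooted binary tree shape with n leaves (hypothetical helper).

    children[v] is None if v is a leaf, otherwise a pair of child indices.
    Vertex 0 is the root. Edge lengths are ignored.
    """
    rng = rng or random.Random()
    children = [None]  # start with a single species (the root)
    leaves = [0]
    while len(leaves) < n:
        # Exchangeability: every extant species is equally likely to speciate.
        i = rng.randrange(len(leaves))
        v = leaves[i]
        a, b = len(children), len(children) + 1
        children.extend([None, None])
        children[v] = (a, b)
        # The chosen leaf becomes internal; its two children are new leaves.
        leaves[i] = a
        leaves.append(b)
    return children, leaves


# Example: one random shape on 6 extant species.
shape, extant = yule_tree(6, random.Random(42))
print(shape)
print(extant)
```

Because only the order of splits matters for the discrete shape, the sketch does not need a speciation rate; this is consistent with the remark that the rate may vary with time without affecting the exchangeability assumption.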
Depth of a most recent common ancestor
Suppose we evolve a rooted phylogenetic tree T on n extant species under the Yule model, and we select a random subset S of k extant species. Let Xn,k denote the number of edges separating the root of T from the vertex in T that corresponds to the most recent common ancestor (MRCA) of S. In this section we investigate the probability distribution of Xn,k for various values of k, particularly in the limit as n becomes large. Some of the reasons why a biologist might be interested in such
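The quantity Xn,k defined above can be explored empirically. The following is a rough Monte Carlo sketch, assuming the pure-birth Yule process described earlier and a uniformly chosen subset S; the helper names (`yule_parents`, `mrca_depth`, `estimate_distribution`) are hypothetical and it makes no claim to reproduce the paper's analytical results.

```python
import random
from collections import Counter


def yule_parents(n, rng):
    """Return (parent, leaves) for a random Yule tree shape with n leaves.

    parent[v] is the parent index of v, or None for the root (vertex 0).
    """
    parent = [None]
    leaves = [0]
    while len(leaves) < n:
        i = rng.randrange(len(leaves))  # each extant species equally likely
        v = leaves[i]
        a, b = len(parent), len(parent) + 1
        parent.extend([v, v])
        leaves[i] = a
        leaves.append(b)
    return parent, leaves


def mrca_depth(parent, subset):
    """Number of edges between the root and the MRCA of the given vertices."""
    def path_from_root(v):
        path = [v]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path[::-1]

    paths = [path_from_root(v) for v in subset]
    shared = 0
    # Walk down from the root for as long as all paths agree.
    for column in zip(*paths):
        if len(set(column)) != 1:
            break
        shared += 1
    return shared - 1  # shared vertices include the root itself


def estimate_distribution(n, k, trials, seed=0):
    """Empirical distribution of the MRCA depth for a random k-subset."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        parent, leaves = yule_parents(n, rng)
        counts[mrca_depth(parent, rng.sample(leaves, k))] += 1
    return {depth: c / trials for depth, c in sorted(counts.items())}


print(estimate_distribution(n=50, k=2, trials=2000))
```

Repeating the call for increasing n gives a rough picture of how the distribution behaves in the large-n limit that the section goes on to study.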
Rooting an unrooted tree
Typically, construction of an evolutionary tree for a set of species is a two-stage process. In the first stage, an unrooted tree is constructed from biological data of some sort. In the second stage, the unrooted tree is rooted at some point. Commonly this is done by outgroup comparison, or by using some auxiliary data (for example embryological or fossil data) [27].
However, in some circumstances an outgroup is not available, or the auxiliary data is unclear. Furthermore, the choice of outgroup
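One way to picture edge-rooting by likelihood is the sketch below. It is an assumption-laden illustration, not necessarily the procedure of Section 5: it scores each edge of an unrooted binary tree by the Yule probability of the rooted leaf-labelled tree obtained by placing the root on that edge, using the standard product formula P(T) = (2^(n-1) / n!) × ∏ 1/λ_v, where the product runs over internal vertices v and λ_v is the number of leaves below v minus one. The function name `yule_rooting_scores` and the adjacency-dictionary input format are hypothetical.

```python
import math
from functools import lru_cache


def yule_rooting_scores(adj):
    """Log Yule probability of each edge-rooting of an unrooted binary tree.

    adj maps each vertex label to a list of its neighbours; leaves have one
    neighbour, internal vertices three. Labels must be hashable and mutually
    comparable (e.g. all strings).
    """
    @lru_cache(maxsize=None)
    def side(v, came_from):
        """(leaf count, sum of log lambda) for the subtree at v away from came_from."""
        kids = [w for w in adj[v] if w != came_from]
        if not kids:
            return 1, 0.0
        leaves, cost = 0, 0.0
        for w in kids:
            sub_leaves, sub_cost = side(w, v)
            leaves += sub_leaves
            cost += sub_cost
        return leaves, cost + math.log(leaves - 1)

    n = sum(1 for v in adj if len(adj[v]) == 1)
    # Constant term log(2^(n-1) / n!), identical for every rooting.
    const = (n - 1) * math.log(2) - math.lgamma(n + 1)
    scores = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                _, cost_u = side(u, v)
                _, cost_v = side(v, u)
                # The new root has all n leaves below it, so lambda = n - 1.
                scores[(u, v)] = const - cost_u - cost_v - math.log(n - 1)
    return scores


# Example: the unrooted quartet tree ab|cd with internal vertices x and y.
quartet = {
    "a": ["x"], "b": ["x"], "c": ["y"], "d": ["y"],
    "x": ["a", "b", "y"], "y": ["x", "c", "d"],
}
scores = yule_rooting_scores(quartet)
best_edge = max(scores, key=scores.get)
print(best_edge, math.exp(scores[best_edge]))
```

In this small example the internal edge gives the balanced rooted tree and each pendant edge gives a caterpillar, so the sketch prefers the internal edge. The recursion is memoised per directed edge, which keeps the total work linear in the number of edges for modest tree sizes; very deep trees would need an iterative traversal.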
An extension of the Yule model
In the Yule model, at any time each existing species has the same probability of giving rise to a new species, and all lineages are treated exchangeably. Here we consider a simple modification of this model, in which the rate of speciation events on a given lineage is a function of the time back to the last speciation event on that lineage.
More precisely, we suppose that at time t=0 there is just one species, labeled s0, subject to a 2-state Markov process on state space {1,2}. Under this
Acknowledgements
We thank Charles Semple and the anonymous referees for some helpful comments on an earlier version of this paper.
References (35)
- et al., Distributions of cherries for two models of trees, Math. Biosci. (2000)
- Rooting molecular trees: problems and strategies, Biol. J. Linnean Soc. (1994)
- D. Aldous, Probability distributions on cladograms, in: D. Aldous, R. Pemantle (Eds.), Random Structures, vol. 76,...
- Probabilities of evolutionary trees, Syst. Biol. (1994)
- The probabilities of rooted tree-shapes generated by random bifurcation, Adv. Appl. Prob. (1971)
- Patterns in tree balance among cladistic, phenetic, and randomly generated phylogenetic trees, Evolution (1992)
- Patterns in phylogenetic tree balance with variable and evolving speciation rates, Evolution (1996)
- et al., Inferring the rates of branching and extinction from molecular phylogenies, Evolution (1995)
- et al., Inferring evolutionary process from phylogenetic tree shape, Quart. Rev. Biol. (1997)
- D. Aldous, Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today [online], Available:...
- Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference, Molecular Evolution
- Bayesian phylogenetic inference via Markov chain Monte Carlo methods, Biometrics
- Phylogenetic tree construction using Markov chain Monte Carlo, J. Am. Statist. Assoc.
- Probabilities of n-trees under two models: a demonstration that asymmetrical interior nodes are not improbable, Syst. Zool.
- Testing the stochasticity of patterns of organismal diversity: an improved null model, Am. Nat.
- The reconstructed evolutionary process, Philos. Trans. R. Soc. London B
☆ This research was supported by the New Zealand Marsden Fund (UOC-MIS-003).