Does gene flow destroy phylogenetic signal? The performance of three methods for estimating species phylogenies in the presence of gene flow
Introduction
The fundamental goal of systematics is to understand the process of lineage divergence that leads to the formation of new species. Since Maddison (1997), there has been growing acceptance among systematists that gene genealogies are not always congruent with species phylogenies (i.e. the actual pattern of lineage splitting and descent from common ancestors). It is now widely recognized that processes such as gene duplication (Fitch, 1970), lateral transfer (Cummings, 1994), and incomplete lineage sorting (Tajima, 1983; Takahata and Nei, 1985; Hudson, 1992) can lead to incongruence between gene trees and species trees, and empirical examples of each process exist (cf. Syring et al., 2007 for an example of incomplete lineage sorting). This realization has prompted the development of approaches designed to estimate species phylogenies despite the process that presumably caused the incongruence. For example, gene tree parsimony (Slowinski and Page, 1999) was developed to account for gene duplication, while the minimization of deep coalescence (MDC; Maddison, 1997), COAL (Degnan and Salter, 2005), and BEST (Edwards et al., 2007; Liu and Pearl, 2007) were designed in part to estimate species phylogeny when the discord between the gene trees and species tree results from the incomplete sorting of ancestral polymorphisms.
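To make the deep coalescence criterion concrete, the following is a minimal Python sketch of how the number of deep coalescences might be counted for one gene tree fitted into one candidate species tree. It is an illustration under stated assumptions, not the implementation used by any of the programs cited above: trees are written as nested tuples, each gene-tree tip is labelled with the species it was sampled from, and only one allele per species is assumed. The function names (`leaves`, `count_lineages`, `species_clades`, `deep_coalescences`) are hypothetical.

```python
def leaves(tree):
    """Return the set of tip labels under a node (a tip is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def count_lineages(gene_tree, clade):
    """Count gene lineages present at the top of the species branch that
    subtends `clade`: the number of maximal gene-tree subtrees whose tips
    all belong to `clade`."""
    if leaves(gene_tree) <= clade:
        return 1
    if isinstance(gene_tree, str):
        return 0  # a tip from a species outside the clade
    return sum(count_lineages(child, clade) for child in gene_tree)


def species_clades(tree, is_root=True):
    """Yield the tip set of every non-root node of the species tree."""
    if not is_root:
        yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from species_clades(child, False)


def deep_coalescences(gene_tree, species_tree):
    """Sum, over species-tree branches, the extra lineages (k - 1) that
    fail to coalesce within the branch."""
    return sum(
        max(count_lineages(gene_tree, clade) - 1, 0)
        for clade in species_clades(species_tree)
    )


species = (("A", "B"), "C")
print(deep_coalescences((("A", "B"), "C"), species))  # 0: concordant gene tree
print(deep_coalescences((("A", "C"), "B"), species))  # 1: one extra lineage
```

Under this criterion, a search would score each candidate species tree by summing the count across loci and prefer the tree with the smallest total; the search itself is omitted here.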
At the initial stages of divergence, incomplete lineage sorting is ubiquitous and likely produces the majority of gene tree-species tree discord among closely related lineages. This is a direct outcome of population-level processes; consequently, method developers have incorporated statistical models derived from the coalescent (Kingman, 1982; Hudson, 1990) into species-level phylogenetic analyses to account for these processes. However, in many empirical systems these same lineages also exchange migrants, particularly when they occur in sympatry. Since genetic polymorphism shared among lineages can result from either retained ancestral polymorphism or a gene copy introduced into the population via gene flow (Slatkin and Maddison, 1989), it is often difficult to determine which process produced the shared polymorphism. Fully statistical treatments of coalescence, gene flow, and divergence are currently available only for pairwise comparisons between two lineages (Nielsen and Wakeley, 2001; Hey and Nielsen, 2004; Hey and Nielsen, 2007; Hey, 2006).
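Why incomplete lineage sorting is so common shortly after divergence can be illustrated with the standard three-species coalescent result (cf. Pamilo and Nei, 1988; Rosenberg, 2002): with one allele sampled per species and no gene flow, the probability that a gene tree matches the species tree is 1 - (2/3)exp(-T), where T is the length of the internal species-tree branch in coalescent units. The Python sketch below evaluates that formula and checks it against a small simulation. It is an illustration only, not the simulation design used in this study; the function names, the constant-population-size assumption, and the scaling T = t/(2Ne) for a diploid locus are assumptions made for the example.

```python
import math
import random


def p_concordant(T):
    """Analytical probability that a three-taxon gene tree matches the
    species tree, for an internal branch of length T coalescent units."""
    return 1.0 - (2.0 / 3.0) * math.exp(-T)


def simulate_concordance(T, reps, seed):
    """Monte Carlo estimate of the same probability (no gene flow)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # The two sister lineages coalesce after an Exp(1) waiting time
        # (in coalescent units) once they share an ancestral population.
        if rng.expovariate(1.0) < T:
            hits += 1  # sorted within the internal branch
        elif rng.randrange(3) == 0:
            # All three lineages reach the root population; each of the
            # three pairs is equally likely to coalesce first.
            hits += 1
    return hits / reps


for T in (0.1, 0.5, 1.0, 2.0):
    print(T, round(p_concordant(T), 3), round(simulate_concordance(T, 20000, 1), 3))
```

For short internal branches the probability approaches 1/3, meaning that each of the three possible gene tree topologies becomes about equally likely, which is why recently diverged lineages are expected to show extensive discord even before gene flow is considered.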
It is an understatement to suggest that the biologist who wishes to estimate species phylogeny in a system where details such as (a) the number of lineages, (b) the relationship among lineages, and (c) the amount of gene flow are unclear is currently faced with a difficult task. Methods that estimate a species phylogeny using some approach derived from the coalescent must be robust to at least moderate levels of gene flow (e.g. levels that may not be easily recognizable) to be of any use to the majority of empirical biologists; otherwise, the use of such methods may result in spurious conclusions about the actual pattern of lineage divergence. The data we present in this manuscript were collected out of a desire to explore how the phylogenetic signal contained in DNA sequence data is affected by gene flow in recently diverged lineages. Does gene flow destroy phylogenetic signal entirely, or are some methods able to accurately estimate species phylogeny when some of the shared polymorphisms result from gene flow? In order to explore this issue, we evaluate approaches based on the coalescent that use estimated gene trees as input in an attempt to isolate gene flow as the sole factor affecting phylogenetic accuracy.
Statistical inference of species trees from gene trees
A renewed interest exists in the development and interpretation of statistical methods for the inference of species trees from gene trees (Maddison and Knowles, 2006). A myriad of innovative approaches have been developed (Slatkin and Maddison, 1989; Maddison, 1997; Page and Charleston, 1997; Slowinski and Page, 1999; Liu and Pearl, 2006; Edwards et al., 2007; Carstens and Knowles, 2007) and applied to empirical questions in phylogeography and systematics (Knowles and Carstens, 2007,
Performance of ESP-COAL
The type and magnitude of gene flow affected the ability to infer the correct ST topology using ESP-COAL (Fig. 2A). In general, models of historical gene flow did not greatly degrade the phylogenetic accuracy, regardless of the magnitude (Nem = 0.01–1.00) or duration (0.1xNe or 0.5xNe generations) of gene flow. The probability of identifying the correct ST never dipped below 0.70 for any parameter combination for either the parapatric or allopatric models. In contrast, phylogenetic accuracy was
Explanation of results
Incomplete lineage sorting has emerged as a common problem for phylogenetic inference at the species level. Given the volume of mathematical theory predicting this phenomenon (cf. Pamilo and Nei, 1988; Rosenberg, 2002; Rosenberg, 2003), this may not be surprising. Several methods of inferring species phylogenies from gene trees have incorporated the stochastic process of incomplete lineage sorting (Maddison, 1997; Degnan and Salter, 2005; Liu and Pearl, 2007). While these methods are clearly at
Acknowledgments
We thank Benjamin Hall and Wennie Chou for providing the Rhododendron DNA sequences. Special thanks to Gabriel Rosa and John Liechty for assistance with the Department of Plant Sciences computing cluster located at the University of California, Davis and with PERL scripting. We thank Amy Litt, Jeffrey Oliver, and one anonymous reviewer for providing insightful comments that significantly improved this manuscript.
References (57)
Transmission patterns of eukaryotic transposable elements: arguments for and against horizontal transfer. Trends Ecol. Evol. (1994)
Recent advances in assessing gene flow between diverging populations and species. Curr. Opin. Genet. Dev. (2006)
The coalescent. Stochastic Process. Appl. (1982)
Phylogeny and biogeography of Rhododendron subsection Pontica, a group with a tertiary relict distribution. Mol. Phylogenet. Evol. (2004)
From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Mol. Phylogenet. Evol. (1997)
The probability of topological concordance of gene trees and species trees. Theor. Popul. Biol. (2002)
Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Syst. Biol. (2006)
Phylogeography: The History and Formation of Species (2000)
Maximum likelihood estimation of migration rates and population numbers of two populations using a coalescent approach. Genetics (1999)
Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proc. Natl. Acad. Sci. USA (2001)
Comparison of species tree methods for reconstructing the phylogeny of bearded manakins (Aves: Pipridae: Manacus) from multilocus sequence data. Syst. Biol.
Integrating phylogenetic and population genetic analyses of multiple loci to test species divergence hypotheses in Passerina buntings. Genetics
Estimating phylogeny from gene tree probabilities in Melanoplus grasshoppers despite incomplete lineage sorting. Syst. Biol.
The Encyclopedia of Rhododendron Species
Gene tree distributions under the coalescent process. Evolution
Discordance of species trees with their most likely gene trees. PLoS Genet.
High-resolution species trees without concatenation. Proc. Natl. Acad. Sci. USA
SIMCOAL: a general coalescent program for the simulation of molecular data in interconnected populations with arbitrary demography. J. Hered.
Confidence limits on phylogenies: an approach using the bootstrap. Evolution
Inferring Phylogenies
Distinguishing homologous from analogous proteins. Syst. Zool.
The molecular systematics of Rhododendron (Ericaceae): a phylogeny based upon RPB2 gene sequences. Syst. Bot.
MsBayes: a flexible pipeline for comparative phylogeographic inference using approximate Bayesian computation (ABC). BMC Bioinformatics
Multilocus methods for estimating population sizes, migration rates and divergence time, with applications to the divergence of Drosophila pseudoobscura and D. persimilis. Genetics
Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proc. Natl. Acad. Sci. USA
Testing the constant-rate neutral allele model with protein sequence data. Evolution
Gene genealogies and the coalescent process
Gene trees, species trees and the segregation of ancestral alleles. Genetics