Key Points
- Population genetics can be used to study the history of natural populations. However, it is a difficult science because natural populations have complex geographies and histories.
- With the advent of DNA-sequence-based data sets drawn from natural populations, two main schools of study developed: the phylogeographic approach, which uses the data to estimate the evolutionary tree, or gene tree, and then attempts to interpret the history of the populations from which the samples came; and the summary-statistics approach, which is an outgrowth of mathematical population genetics and proceeds by mathematically fitting specific population-genetic models to the data.
- The phylogeographic approach has the advantage of not being constrained by specific models, and lends itself to exploratory types of analysis. However, it is highly dependent on gene-tree estimates, which are often incorrect. This method can be misleading if investigators focus on just a single gene or a stretch of tightly linked sequence, such as mitochondrial DNA, and overlook the large stochastic variance that arises among genes in populations.
- Summary-statistics approaches can be mathematically sophisticated and provide ways to compare models and assess the sources of variance in the process that gave rise to the data. However, these methods are often highly constrained by the available models and are difficult to apply if investigators have little knowledge of the locations and boundaries of populations in nature. Also, they do not usually take full advantage of all of the information that is available in the data.
- In recent years, a new family of methods has begun to offer the advantages of the phylogeographic approach, using all of the information in the data and allowing diverse models to be considered, together with the mathematical sophistication of the summary-statistics methods. These are probabilistic methods in which gene trees have a role, but in a framework in which they are used strictly in conjunction with their probability. As these methods continue to develop, they offer the promise of increased flexibility and applicability to a wide range of questions in the history of populations.
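The stochastic variance among genes mentioned in the key points can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes the simplest possible setting (one randomly mating population of constant size, neutral loci, no recombination within a locus) and uses the standard coalescent result that, while k lineages remain, the waiting time to the next coalescence is exponentially distributed with rate k(k-1)/2 when time is measured in units of 2N generations. The function names are hypothetical.

```python
import random
import statistics


def simulate_tmrca(n_samples, rng):
    """Simulate the time to the most recent common ancestor at one locus.

    Time is in units of 2N generations (an assumption of this sketch).
    While k lineages remain, the waiting time to the next coalescence
    is exponential with rate k * (k - 1) / 2.
    """
    total = 0.0
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2.0
        total += rng.expovariate(rate)
    return total


def tmrca_across_loci(n_samples, n_loci, seed=1):
    """Simulate independent (unlinked) loci that share one demographic history."""
    rng = random.Random(seed)
    return [simulate_tmrca(n_samples, rng) for _ in range(n_loci)]


def expected_tmrca(n_samples):
    """Theoretical mean tree depth, 2 * (1 - 1/n), in units of 2N generations."""
    return 2.0 * (1.0 - 1.0 / n_samples)


if __name__ == "__main__":
    times = tmrca_across_loci(n_samples=10, n_loci=10000)
    print("theoretical mean depth:", expected_tmrca(10))
    print("simulated mean depth:  ", statistics.mean(times))
    print("simulated std. dev.:   ", statistics.stdev(times))
    print("shallowest and deepest:", min(times), max(times))
```

Even though every simulated locus has passed through exactly the same population history, the tree depths vary widely from locus to locus, which is why an interpretation that rests on a single gene tree can mislead.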
Abstract
Natural populations, including those of humans, have complex geographies and histories. Studying how they evolve is difficult, but it is possible with population-based DNA sequence data. However, the study of structured populations is divided by two distinct schools of thought and analysis. The phylogeographic approach is fundamentally graphical and begins with a gene-tree estimate. By contrast, the more traditional approach of using summary statistics is fundamentally mathematical. Both approaches have limitations, but there is promise in newer probabilistic methods that offer the flexibility and data exploitation of the phylogeographic approach in an explicitly model-based mathematical framework.
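The idea of using gene trees only in conjunction with their probability can be written compactly. The expression below is a general sketch, not a formula quoted from the article; the symbols are notation chosen here: X is the sequence data, G a gene tree (topology and branch lengths), and Θ the parameters of the population model (for example, population sizes and migration rates).

```latex
L(\Theta) \;=\; P(X \mid \Theta) \;=\; \int_{G} P(X \mid G)\, P(G \mid \Theta)\, dG
\;\approx\; \frac{1}{M} \sum_{i=1}^{M} P(X \mid G_i),
\qquad G_i \sim P(G \mid \Theta)
```

Here P(G | Θ) is supplied by coalescent theory and P(X | G) by a mutation model. No single tree is treated as known: the likelihood averages over all trees, weighting each by its probability. The plain Monte Carlo average on the right is only the simplest way to approximate the integral; practical methods typically rely on importance sampling or Markov chain Monte Carlo.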
Acknowledgements
We are grateful to M. Hare, Y.-J. Won and two anonymous referees for helpful suggestions and corrections. This work was supported in part by a grant from the National Institutes of Health to J.H.
Glossary
- DEMOGRAPHIC HISTORY: The reproductive history of a population or group of populations. This can include population sizes, sex ratios, migration rates, population-splitting events, variation in reproductive rates and times among organisms, as well as variation over time in all of these quantities.
- POISSON DISTRIBUTION: A probability distribution that is commonly used to describe the frequency at which similar but independent events can be expected to occur over a given period of time.
- GENE EXCHANGE: The process by which genetic material is shared among organisms, which can occur through sexual reproduction or lateral genetic transfer.
- GENETIC DRIFT: Random changes in gene frequency in a population that occur when a finite number of progeny are formed by the random sampling of gametes from the parents.
- HARDY-WEINBERG: A classical mathematical principle in population genetics that describes the expected frequencies of genotypes for one locus after one generation of random mating if the allele frequencies in the parents are known.
- EVOLUTIONARY TREE: A graph or branching diagram that describes the pattern of evolutionary ancestry (historical relationships) among a group of organisms.
- GENE TREE: A graph or branching diagram that describes the pattern of ancestry among homologous DNA sequences from different individuals of a population or species.
- PHYLOGENETIC TREE: A graph or branching diagram that describes the pattern of ancestry among different species or other taxa.
- SYSTEMATICS: A branch of biology that deals with the classification of living organisms on the basis of their evolutionary relationships. This differs from 'taxonomy' as organisms are grouped on the basis of shared ancestry, not just on their similarities (which might or might not correspond to shared evolutionary history).
- COALESCENT THEORY: A mathematical approach that models the depths of gene trees for samples that are drawn from one or more closely related populations.
- ESTIMATOR: A method for calculating an estimate of a parameter in a model.
- SUMMARY STATISTIC: A number that is calculated from a data set, which represents much of the information in the data. For a set of DNA sequences, one commonly used summary statistic is S, which represents the number of variable sites in the sample. Summary statistics are often easier to use to fit models to data than would be the case with the data itself.
- OUTGROUP: A sample or group of samples that are included in an evolutionary tree because they are known, or assumed, to connect directly to the root of the tree (that is, to the node of the tree that represents the common ancestor of all samples in the tree).
- HOMOPLASY: Identical character states (for example, the same nucleotide base in a DNA sequence) that are not the result of common ancestry (not homologous), but arose independently in different ancestors by parallel or convergent mutations.
- LINKAGE BLOCK: A region of DNA that is inherited as a single unit owing to a lack of recombination, such as the mitochondrial DNA of metazoans. The histories of genes that are located in such regions are not independent, and are equally affected by all the selective forces that have acted anywhere in the linkage block.
- ALLOPATRIC DIFFERENTIATION: The process of divergence between populations or species that are geographically separated.
- INFERENCE KEY: A list of paired rules that are used for diagnosis or identification. Keys are a classic tool for identifying organisms to the species level, on the basis of the presence or absence of specific morphological characters or character states. A similar tool is used in nested-clade analysis to distinguish between different historical scenarios.
- HEURISTIC: A method of inference that relies on educated guesses or simplifications that limit the parameter space over which solutions are searched. This approach is not guaranteed to find the correct answer.
- STOCHASTIC VARIANCE: In the context of gene histories, this is the variation in gene trees and mutations among unlinked genes that have passed through the same demographic history of populations of organisms.
- MONOPHYLY: The property that is attributed to a group of samples in an evolutionary tree that all share the same common ancestor exclusive of other samples in the tree. A set of samples that constitute an entire branch on an evolutionary tree is said to be monophyletic.
- F-STATISTICS: A method of summary statistics that was devised by Sewall Wright to describe correlations among alleles that are sampled at different hierarchical levels (individuals, subpopulations and total populations). F-statistics are frequently used to describe the presence of population structure.
- SINGLETON MUTATIONS: Polymorphic sites in which a rare base is found in only one of the sampled sequences.
- POLYMORPHIC-SITE FREQUENCY DISTRIBUTION: A polymorphic site in a DNA sequence can be described by the frequency of one of its variable bases. The distribution of these values for all the polymorphic sites in a sample can be described using a histogram or bar chart. The shape of the histogram can provide qualitative information on the processes that are involved in the history of the sample.
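Three of the glossary entries above (the summary statistic S, singleton mutations and the polymorphic-site frequency distribution) describe quantities that are simple to compute from aligned sequences. The following is a minimal sketch under stated assumptions: the sequences are already aligned and of equal length, the function names and the toy alignment are hypothetical, and each polymorphic site is described by the count of its rarest base (the glossary only says "one of its variable bases", so this choice is an assumption).

```python
from collections import Counter


def polymorphic_columns(sequences):
    """Return one Counter of base counts for each variable alignment column."""
    if not sequences:
        raise ValueError("need at least one sequence")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    columns = (Counter(column) for column in zip(*sequences))
    return [counts for counts in columns if len(counts) > 1]


def segregating_sites(sequences):
    """S: the number of variable (polymorphic) sites in the sample."""
    return len(polymorphic_columns(sequences))


def singleton_sites(sequences):
    """Number of polymorphic sites at which some base occurs in exactly one sequence."""
    return sum(1 for counts in polymorphic_columns(sequences)
               if min(counts.values()) == 1)


def site_frequency_distribution(sequences):
    """Histogram mapping rarest-base count to the number of polymorphic sites."""
    histogram = Counter(min(counts.values())
                        for counts in polymorphic_columns(sequences))
    return dict(sorted(histogram.items()))


if __name__ == "__main__":
    # Toy alignment, invented purely for illustration.
    sample = ["ACGTAC", "ACGTTC", "ATGTAC", "ATGAAC"]
    print("S:", segregating_sites(sample))
    print("singleton sites:", singleton_sites(sample))
    print("frequency distribution:", site_frequency_distribution(sample))
```

In the toy alignment, three columns vary; at two of them the rare base appears in a single sequence, and at the third it appears in two sequences. Plotting the resulting histogram gives the bar chart that the last glossary entry refers to.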
Cite this article
Hey, J., Machado, C. The study of structured populations — new hope for a difficult and divided science. Nat Rev Genet 4, 535–543 (2003). https://doi.org/10.1038/nrg1112
DOI: https://doi.org/10.1038/nrg1112