Regular Article
Simulating Probability Distributions in the Coalescent

https://doi.org/10.1006/tpbi.1994.1023

Abstract

We describe algorithms for computing probability distributions of sample configurations under finite-sites models in population genetics. A particular interest is the development of computational methods for estimating substitution rates from DNA sequence data using likelihood techniques. The approach uses a recursion satisfied by the sampling probabilities to construct a Markov chain with a set of absorbing states in such a way that the required sampling distribution is the mean of a functional of the process up to the absorption time. This provides a conceptually simple framework for simulating the likelihood of the data for a set of parameter values. The method is particularly attractive in practice: it is simple to program and can be extended to cover other features of interest such as the infinitely-many-sites process, recombination, selection, and variable population size.
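
The idea of turning a sampling recursion into an absorbing Markov chain can be sketched on a small stand-in problem. The example below is not the authors' code. It assumes a K-allele mutation model with per-lineage mutation rate theta/2 and transition matrix P, and uses the recursion for unordered configurations that follows from the standard coalescent argument (the last event back in time is either a mutation or a coalescence). The function names `transitions` and `gt_estimate` are hypothetical. Each path multiplies together the total recursion weight at every visited configuration and, on absorption at a single gene, the stationary probability of that gene's type; the average over paths estimates the sampling probability.

```python
import random

def transitions(n, theta, P):
    """List (weight, previous_config) terms of the sampling recursion for n.

    Assumed model: K alleles, mutation rate theta/2 per lineage, matrix P.
    """
    size, K = sum(n), len(n)
    mut = theta / (size - 1 + theta)        # last event was a mutation
    coal = (size - 1) / (size - 1 + theta)  # last event was a coalescence
    out = []
    for j in range(K):
        if n[j] == 0:
            continue
        # Mutation: a type-j gene descends from a type-i gene (i == j allowed).
        for i in range(K):
            if P[i][j] == 0:
                continue
            m = list(n)
            m[j] -= 1
            m[i] += 1
            w = mut * (n[i] + 1 - (i == j)) / size * P[i][j]
            out.append((w, tuple(m)))
        # Coalescence: two type-j genes share a parent.
        if n[j] > 1:
            m = list(n)
            m[j] -= 1
            out.append((coal * (n[j] - 1) / (size - 1), tuple(m)))
    return out

def gt_estimate(n, theta, P, pi, reps, rng):
    """Monte Carlo estimate of the probability of configuration n."""
    total = 0.0
    for _ in range(reps):
        state, weight = tuple(n), 1.0
        while sum(state) > 1:               # absorb at a single gene
            moves = transitions(state, theta, P)
            weights = [w for w, _ in moves]
            weight *= sum(weights)          # functional accumulated on the path
            state = rng.choices([m for _, m in moves], weights=weights)[0]
        total += weight * pi[state.index(1)]  # stationary law of the last gene
    return total / reps

# Two alleles, parent-independent mutation, theta = 1 (illustrative values).
P = [[0.5, 0.5], [0.5, 0.5]]
pi = [0.5, 0.5]
print(gt_estimate((2, 1), 1.0, P, pi, 20000, random.Random(1)))
```

With parent-independent mutation, as in these illustrative parameters, the sampling distribution is known in closed form (Dirichlet-multinomial), which gives 0.1875 for the configuration (2, 1) and makes a convenient check on the estimate. For general P no closed form is available, which is where a simulation of this kind becomes useful.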

Cited by (227)

  • A characterisation of the reconstructed birth–death process through time rescaling

    2020, Theoretical Population Biology
    Citation Excerpt:

    We briefly review relevant known results which we will rely on throughout the paper. Secondly, the coalescent with variable population size, as described by Griffiths and Tavaré (1994), can be described as an inhomogeneous pure-death process, where the death rate is quadratic in the number of lineages and depends on a population size function. Because the death rate of the RRP is linear in the number of lineages, there is no population size function which would equate the two models.

  • Allele frequency spectra in structured populations: Novel-allele probabilities under the labelled coalescent

    2020, Theoretical Population Biology
    Citation Excerpt:

    Restricted to the infinite-alleles model of mutation, it can be solved inductively, by progressively incrementing sample size, number of distinct allelic classes, and number of singleton alleles, to produce AFS probabilities for samples derived from structured populations (Uyenoyama et al., 2019). A number of works have explored methods for approximating the distribution of the immediate ancestor of an observed sample in a variety of demographic contexts (e.g., Hoppe, 1987; Griffiths and Tavaré, 1994a,b; Stephens and Donnelly, 2000; Tavaré, 2004; De Iorio and Griffiths, 2004b). By addressing the distribution of the next-sampled gene, Stephens and Donnelly (2000) developed a more efficient class of importance sampling (IS) proposal distributions for generating genealogical histories.

  • Ancestral inference from haplotypes and mutations

    2018, Theoretical Population Biology

  • Computing the joint distribution of the total tree length across loci in populations with variable size

    2017, Theoretical Population Biology
    Citation Excerpt:

    Moreover, ancient and contemporary population structure can lead to the accumulation of private genetic variation in certain sub-populations. Methods to study genetic variation, or perform inference, in populations with varying size or more complex demographic histories have been developed based on the Wright–Fisher diffusion, describing the evolution of population allele frequencies forward in time (Griffiths, 2003; Živković et al., 2015; Gutenkunst et al., 2009; Excoffier et al., 2013), or the Coalescent process, a model for the genealogical relationship in a sample of individuals (Griffiths and Tavaré, 1994; Griffiths and Marjoram, 1996; Griffiths and Tavaré, 1998; Živković and Wiehe, 2008; Bhaskar et al., 2015; Kamm et al., 2017). A powerful representation of genetic variation data that has been used in this context is the Site-Frequency-Spectrum.
