Calibrating a coalescent simulation of human genome sequence variation

Stephen F. Schaffner (1, 5), Catherine Foo (1), Stacey Gabriel (1), David Reich (1, 2), Mark J. Daly (1), and David Altshuler (1, 2, 3, 4)
  1. Program in Medical and Population Genetics, The Broad Institute, Cambridge, Massachusetts 02139, USA
  2. Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA
  3. Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA
  4. Department of Molecular Biology and Diabetes Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA

Abstract

Population genetic models play an important role in human genetic research, connecting empirical observations about sequence variation with hypotheses about underlying historical and biological causes. More specifically, models are used to compare empirical measures of sequence variation, linkage disequilibrium (LD), and selection to expectations under a “null” distribution. In the absence of detailed information about human demographic history, and about variation in mutation and recombination rates, simulations have of necessity used arbitrary models, usually simple ones. With the advent of large empirical data sets, it is now possible to calibrate population genetic models with genome-wide data, permitting for the first time the generation of data that are consistent with empirical data across a wide range of characteristics. We present here the first such calibrated model and show that, while still arbitrary, it successfully generates simulated data (for three populations) that closely resemble empirical data in allele frequency, linkage disequilibrium, and population differentiation. No assertion is made about the accuracy of the proposed historical and recombination model, but its ability to generate realistic data meets a long-standing need among geneticists. We anticipate that this model, for which software is publicly available, and others like it will have numerous applications in empirical studies of human genetics.
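The abstract contrasts the calibrated model with the simple, arbitrary null models used in earlier simulations. As a rough illustration of what such a simple null model produces, the sketch below simulates the standard neutral coalescent (constant population size, one population, no recombination, infinite-sites mutation) and returns the allele frequency spectrum, whose expected value is θ/i for sites with i derived copies. This is a textbook model chosen only for illustration; it is not the calibrated model or the software described in this paper, and the function name `simulate_sfs` and the parameter values are hypothetical.

```python
import numpy as np


def simulate_sfs(n, theta, rng):
    """One replicate of the unfolded site frequency spectrum under the
    standard neutral coalescent (a simple null model, not the calibrated one).

    Time is measured in units of 2N generations and theta = 4*N*mu, so
    mutations fall on each branch at rate theta / 2.
    Returns an array whose entry i - 1 counts segregating sites whose
    derived allele is carried by i of the n sampled chromosomes.
    """
    sizes = [1] * n  # number of sampled descendants of each current lineage
    sfs = np.zeros(n, dtype=np.int64)
    while len(sizes) > 1:
        k = len(sizes)
        rate = k * (k - 1) / 2.0  # total coalescence rate with k lineages
        t = rng.exponential(1.0 / rate)  # waiting time to the next merger
        # Infinite-sites mutations on every branch during this interval;
        # a mutation on a lineage with s descendants has derived count s.
        for s in sizes:
            sfs[s] += rng.poisson(theta / 2.0 * t)
        # Merge two distinct lineages chosen uniformly at random.
        i, j = rng.choice(k, size=2, replace=False)
        merged = sizes[i] + sizes[j]
        sizes = [s for idx, s in enumerate(sizes) if idx not in (i, j)]
        sizes.append(merged)
    return sfs[1:]  # index 0 is never used


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, theta, reps = 10, 5.0, 3000
    mean_sfs = np.mean([simulate_sfs(n, theta, rng) for _ in range(reps)], axis=0)
    for i, observed in enumerate(mean_sfs, start=1):
        print(f"i={i}: simulated mean {observed:.2f}, expected {theta / i:.2f}")
```

Comparing summaries like this against empirical data is the kind of check the abstract describes; the calibrated model adds demographic history, multiple populations, and variable recombination, none of which this sketch attempts.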

Footnotes

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3709305. Freely available online through the Genome Research Immediate Open Access option.

  • (5) Corresponding author. E-mail sfs@broad.mit.edu; fax (617) 252-1902.

  • Received January 17, 2005.

  • Accepted May 17, 2005.