A comprehensive evaluation of SNP genotype imputation

  • Original Investigation
Human Genetics

Abstract

Genome-wide association studies have contributed significantly to the genetic dissection of complex diseases. To increase the power of existing marker sets even further, methods have been proposed to predict individual genotypes at untyped loci from other marker sets by imputation, usually employing HapMap data as a reference. Although various imputation algorithms are already used in practice, a comprehensive evaluation and comparison of these approaches, using genome-wide SNP data from one and the same population, is still lacking. We therefore investigated four publicly available programs for genotype imputation (BEAGLE, IMPUTE, MACH, and PLINK) using data from 449 German individuals genotyped in our laboratory for three genome-wide SNP sets [Affymetrix 5.0 (500 k), Affymetrix 6.0 (1,000 k), and Illumina 550 k]. We observed that HapMap-based imputation in a northern European population is powerful and reliable, even in highly variable genomic regions such as the extended MHC on chromosome 6p21. However, while genotype predictions were found to be highly accurate with all four programs, the number of SNPs for which imputation was actually carried out (‘imputation efficacy’) varied substantially. BEAGLE, IMPUTE, and MACH yielded nearly identical trade-offs between imputation accuracy and efficacy, whereas PLINK performed consistently worse. We nevertheless recommend either MACH or BEAGLE for practical use because these two programs are more user-friendly and generally require less memory than IMPUTE.
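
The trade-off between imputation accuracy and efficacy can be illustrated with a minimal sketch. It assumes that every imputed genotype carries a confidence score and that a genotype counts as imputed only when its score reaches a chosen threshold. The function name, the 0/1/2 allele-count coding, and the toy data are hypothetical, and the exact definitions used in the study are not stated in this abstract.

```python
from typing import Optional, Sequence, Tuple


def accuracy_and_efficacy(
    true_genotypes: Sequence[int],
    imputed_genotypes: Sequence[int],
    confidences: Sequence[float],
    threshold: float,
) -> Tuple[Optional[float], float]:
    """Return (accuracy, efficacy) at a given confidence threshold.

    Assumed definitions (illustrative, not taken from the paper):
      efficacy = share of genotypes whose confidence reaches the threshold
      accuracy = share of those retained genotypes that match the truth
    Accuracy is None when no genotype passes the threshold.
    """
    if not (len(true_genotypes) == len(imputed_genotypes) == len(confidences)):
        raise ValueError("all input sequences must have the same length")
    total = len(true_genotypes)
    if total == 0:
        raise ValueError("at least one genotype is required")

    # Keep only the genotype pairs that pass the confidence threshold.
    retained = [
        (truth, guess)
        for truth, guess, conf in zip(true_genotypes, imputed_genotypes, confidences)
        if conf >= threshold
    ]
    efficacy = len(retained) / total
    if not retained:
        return None, efficacy
    accuracy = sum(truth == guess for truth, guess in retained) / len(retained)
    return accuracy, efficacy


# Toy data: genotypes coded as minor-allele counts (0, 1, or 2).
truth = [0, 1, 2, 1, 0, 2]
imputed = [0, 1, 2, 2, 0, 1]
confidence = [0.99, 0.95, 0.90, 0.60, 0.85, 0.55]

for cutoff in (0.5, 0.8):
    acc, eff = accuracy_and_efficacy(truth, imputed, confidence, cutoff)
    print(f"threshold={cutoff}: accuracy={acc:.2f}, efficacy={eff:.2f}")
```

In this toy example, raising the threshold from 0.5 to 0.8 discards the two low-confidence (and incorrect) genotypes, so accuracy rises while efficacy falls. Sweeping the threshold over a range of values would trace out a trade-off curve of the kind the comparison refers to.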

Acknowledgments

The authors wish to thank all probands for participating in the study. We also thank Alfred Wagner and Simone Knief of the Computational Centre, Christian-Albrechts University Kiel, Germany, for their support. Thomas Wienker and Michael Steffens (IMBIE, University of Bonn, Germany) are acknowledged for performing the initial quality control of the genotype data. Marcus Will, Michael Wittig (both at the Institute of Clinical Molecular Biology, Kiel) and Olaf Junge (Institute of Medical Informatics and Statistics, Kiel) are gratefully acknowledged for expert technical help. We would like to thank Shaun Purcell (PNGU, Massachusetts General Hospital, Boston, MA, USA), Goncalo Abecasis and Yun Li (both at the Center for Statistical Genetics, University of Michigan, MI, USA), Brian Browning (Department of Statistics, University of Auckland, New Zealand), and Tim Becker (IMBIE, University of Bonn, Germany) for providing access to the latest versions of their software and for helpful discussions. This study was supported by the German Ministry of Education and Research (BMBF) through the National Genome Research Network (NGFN). The project received infrastructure support through the DFG excellence cluster “Inflammation at Interfaces”.

Author information

Corresponding author

Correspondence to Andre Franke.

Additional information

M. Nothnagel and D. Ellinghaus contributed equally to the manuscript.

M. Krawczak and A. Franke shared senior authorship.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF, 7.37 MB)

About this article

Cite this article

Nothnagel, M., Ellinghaus, D., Schreiber, S. et al. A comprehensive evaluation of SNP genotype imputation. Hum Genet 125, 163–171 (2009). https://doi.org/10.1007/s00439-008-0606-5

  • DOI: https://doi.org/10.1007/s00439-008-0606-5
