  • Protocol

Designing candidate gene and genome-wide case–control association studies

Abstract

This protocol describes how to design a genetic association case–control study, either focusing on a candidate gene (CG) or region or implementing a genome-wide approach. The steps described involve: (i) defining the case phenotype in adequate detail; (ii) checking the heritability of the disease in question; (iii) considering whether a population-based study is the appropriate design for the research question; (iv) selecting appropriate controls; (v) calculating the required sample size; and (vi) considering whether the study is a de novo or a replication study. General guidelines are given, as well as specific examples of a CG study and a genome-wide association study of type 2 diabetes. Software and websites used in this protocol include the International HapMap Consortium website, Genetic Power Calculator, CaT and SNPSpD. Running each of the programs takes only a few seconds; the rate-limiting steps are thinking through the designs and the parameters of the disease models.

Figure 1: Required number of cases (= number of controls) to detect varying disease allele frequencies and genotypic relative risks (GRRs) with 80% power.
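
The relationship summarized in Figure 1 can be illustrated with a rough calculation. The sketch below is not the method used to produce the figure and is not the algorithm of the Genetic Power Calculator; it is a textbook normal-approximation formula for comparing two allele frequencies, under assumptions that are ours: a multiplicative genotypic relative risk model, Hardy-Weinberg equilibrium, equal numbers of cases and controls, a directly genotyped disease allele, and control allele frequencies approximated by the population frequency (reasonable for a rare disease or unselected controls). The function names are hypothetical.

```python
import math
from statistics import NormalDist


def case_allele_freq(p, grr):
    """Disease-allele frequency among cases under a multiplicative model.

    Assumes Hardy-Weinberg equilibrium in the population, so case genotype
    frequencies are proportional to ((1 - p) + p * grr) ** 2.
    """
    return p * grr / (p * grr + (1.0 - p))


def cases_required(p, grr, alpha=0.05, power=0.80):
    """Approximate number of cases (= number of controls) for an allelic test.

    p     : population frequency of the disease allele
    grr   : genotypic relative risk per allele (multiplicative model)
    alpha : two-sided significance level
    power : desired power
    Assumption: control allele frequency is approximated by p.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if grr <= 0.0 or grr == 1.0:
        raise ValueError("grr must be positive and different from 1")

    p_case = case_allele_freq(p, grr)
    p_ctrl = p
    p_bar = (p_case + p_ctrl) / 2.0

    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)

    # Standard two-proportion sample size, counted in alleles per group.
    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * math.sqrt(p_case * (1.0 - p_case) + p_ctrl * (1.0 - p_ctrl))
    ) ** 2
    alleles_per_group = numerator / (p_case - p_ctrl) ** 2

    # Each individual contributes two alleles.
    return math.ceil(alleles_per_group / 2.0)


if __name__ == "__main__":
    for grr in (1.2, 1.5, 2.0):
        for p in (0.05, 0.2, 0.5):
            print(f"GRR={grr}, allele freq={p}: {cases_required(p, grr)} cases")
```

Passing a stricter threshold (for example `alpha=5e-8`) shows how sharply the required sample grows when many tests are performed. For real study planning, a dedicated tool such as those named in the abstract should be used, since this sketch ignores linkage disequilibrium between marker and causal variant, disease prevalence, and non-multiplicative models.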

Acknowledgements

We thank David Evans and John Broxholme for their help with Perl scripting. This work was supported by funding from the European Union (MolPAGE grant LSHG-512066) to K.T.Z. and from the Wellcome Trust to L.R.C.

Author information

Corresponding author

Correspondence to Krina T Zondervan.

Supplementary information

Supplementary Table 1

Selection of 18 tag SNPs in PPARγ (PDF 31 kb)

About this article

Cite this article

Zondervan, K., Cardon, L. Designing candidate gene and genome-wide case–control association studies. Nat Protoc 2, 2492–2501 (2007). https://doi.org/10.1038/nprot.2007.366

  • DOI: https://doi.org/10.1038/nprot.2007.366
