Abstract
This protocol describes how to design a genetic association case–control study, whether it focuses on a candidate gene (CG) or region or takes a genome-wide approach. The steps are: (i) defining the case phenotype in adequate detail; (ii) checking the heritability of the disease in question; (iii) considering whether a population-based study is the appropriate design for the research question; (iv) selecting appropriate controls; (v) calculating the sample size; and (vi) considering whether the study is de novo or a replication. General guidelines are given, along with specific examples of a CG study and a genome-wide association study of type 2 diabetes. Software and websites used in this protocol include the International HapMap Consortium website, Genetic Power Calculator, CaT and SNPSpD. Running each program takes only a few seconds; the rate-limiting steps are thinking through the designs and the parameters of the disease models.
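To make step (v) concrete, the sketch below estimates the power of a single-SNP allelic test in a case–control design. It is only an illustration under stated assumptions and does not reproduce the algorithms of Genetic Power Calculator or CaT. It assumes Hardy-Weinberg proportions, a multiplicative genotype relative risk, a directly genotyped causal variant and a normal approximation to the two-proportion test. All function names and the parameter values in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def allele_frequencies(prevalence, risk_allele_freq, genotype_rr):
    """Return (case, control) risk-allele frequencies.

    Assumes Hardy-Weinberg proportions in the population and a
    multiplicative model: penetrance f0, f0*r, f0*r**2 for 0, 1, 2 risk alleles.
    """
    p, r = risk_allele_freq, genotype_rr
    geno_freq = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    f0 = prevalence / (1 - p + p * r) ** 2  # baseline penetrance
    penetrance = [f0, f0 * r, f0 * r ** 2]
    if max(penetrance) > 1:
        raise ValueError("parameters imply a penetrance above 1")
    # Bayes' rule: genotype distribution conditional on disease status.
    case = [w * g / prevalence for w, g in zip(penetrance, geno_freq)]
    ctrl = [(1 - w) * g / (1 - prevalence) for w, g in zip(penetrance, geno_freq)]
    # Allele frequency = freq(two copies) + half of freq(one copy).
    return case[2] + 0.5 * case[1], ctrl[2] + 0.5 * ctrl[1]


def allelic_test_power(n_cases, n_controls, p_case, p_ctrl, alpha):
    """Approximate power of a two-sided test comparing allele frequencies."""
    n1, n2 = 2 * n_cases, 2 * n_controls  # alleles, not individuals
    pooled = (n1 * p_case + n2 * p_ctrl) / (n1 + n2)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    se_alt = sqrt(p_case * (1 - p_case) / n1 + p_ctrl * (1 - p_ctrl) / n2)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    diff = abs(p_case - p_ctrl)
    upper = _STD_NORMAL.cdf((diff - z_crit * se_null) / se_alt)
    lower = _STD_NORMAL.cdf((-diff - z_crit * se_null) / se_alt)
    return upper + lower


def cases_needed(p_case, p_ctrl, alpha, target_power=0.8, max_n=10**8):
    """Smallest n (equal numbers of cases and controls) reaching target_power."""
    lo, hi = 1, 2
    while allelic_test_power(hi, hi, p_case, p_ctrl, alpha) < target_power:
        lo, hi = hi, hi * 2
        if hi > max_n:
            raise ValueError("target power not reached below max_n")
    while hi - lo > 1:  # bisection: power(lo) < target <= power(hi)
        mid = (lo + hi) // 2
        if allelic_test_power(mid, mid, p_case, p_ctrl, alpha) >= target_power:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration.
    p_case, p_ctrl = allele_frequencies(
        prevalence=0.05, risk_allele_freq=0.2, genotype_rr=1.3
    )
    print("CG study, alpha=0.05:", cases_needed(p_case, p_ctrl, alpha=0.05))
    print("GWA study, alpha=5e-8:", cases_needed(p_case, p_ctrl, alpha=5e-8))
```

Comparing the two printed sample sizes shows how much a stringent genome-wide significance threshold inflates the number of cases required relative to a single candidate-variant test. A dedicated tool should still be used for the final calculation, since it can account for incomplete linkage disequilibrium between marker and causal variant.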
Acknowledgements
We thank David Evans and John Broxholme for their help with Perl scripting. This work was supported by funding from the European Union (MolPAGE grant LSHG-512066) to K.T.Z. and from the Wellcome Trust to L.R.C.
Supplementary information
Supplementary Table 1
Selection of 18 tag SNPs in PPARγ (PDF 31 kb)
Cite this article
Zondervan, K., Cardon, L. Designing candidate gene and genome-wide case–control association studies. Nat Protoc 2, 2492–2501 (2007). https://doi.org/10.1038/nprot.2007.366
DOI: https://doi.org/10.1038/nprot.2007.366
This article is cited by
- Transfer learning for genotype–phenotype prediction using deep learning models. BMC Bioinformatics (2022)
- Ocular morphologic traits in the American Cocker Spaniel may confer primary angle closure glaucoma susceptibility. Scientific Reports (2022)
- RNA-Seq reveals adaptive genetic potential of the rare Torrey pine (Pinus torreyana) in the face of Ips bark beetle outbreaks. Conservation Genetics (2021)
- The impact of disregarding family structure on genome-wide association analysis of complex diseases in cohorts with simple pedigrees. Journal of Applied Genetics (2020)
- Genetic Variation and Response to Neurocritical Illness: a Powerful Approach to Identify Novel Pathophysiological Mechanisms and Therapeutic Targets. Neurotherapeutics (2020)