  • Review Article

Detecting gene–gene interactions that underlie human diseases

Key Points

  • If interactions between genetic loci are not allowed for, they might reduce the power to detect genetic effects in association studies.

  • Statistical interaction corresponds to a departure from the additive effects of two or more variables in a linear model describing the relationship between an outcome and predictor variables.

  • A variety of methods can be used to test for statistical interaction between predictor variables that encode the genotype and an outcome variable corresponding to the disease phenotype.

  • Logistic regression is one method that can be used either to test for interaction, or to test for association while allowing for interaction.

  • Given genome-wide data, an exhaustive search is feasible for investigating two-way interactions (that is, all pairwise combinations of loci) but not for investigation of higher-order interactions.

  • Filtering approaches allow one to reduce the number of loci considered and thus the number of interaction tests performed.

  • Data-mining or machine-learning methods, such as random forests and Multifactor Dimensionality Reduction (MDR), can allow one to search through the space of possible interactions.

  • Bayesian model selection offers an alternative strategy for searching through the space of possible interactions.

  • The biological interpretation of statistical interactions is complex. The degree to which statistical interaction implies interaction or synergism in a causal sense might be extremely limited.
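The Key Points define statistical interaction as a departure from additivity in a linear model, and for a binary disease outcome logistic regression places that model on the log-odds scale. As a minimal sketch, assuming two loci each coded as a binary carrier indicator and a hypothetical table of penetrances (not data from this review), the interaction coefficient of the saturated model logit(p_ab) = b0 + bA·a + bB·b + bAB·a·b can be read directly off the four penetrances:

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def interaction_coefficient(pen: dict) -> float:
    """Interaction coefficient bAB of the saturated two-locus logistic model.

    pen[(a, b)] is the (hypothetical) penetrance for carrier status a at
    locus A and b at locus B, each 0 or 1. With
        logit(p_ab) = b0 + bA*a + bB*b + bAB*a*b
    the four penetrances determine all four coefficients, and
        bAB = logit(p11) - logit(p10) - logit(p01) + logit(p00).
    A value of 0 means the loci act additively on the log-odds scale.
    """
    return (logit(pen[(1, 1)]) - logit(pen[(1, 0)])
            - logit(pen[(0, 1)]) + logit(pen[(0, 0)]))


# Hypothetical model: risk is raised only in carriers at both loci.
only_double_carriers = {(0, 0): 0.01, (1, 0): 0.01, (0, 1): 0.01, (1, 1): 0.10}
print(interaction_coefficient(only_double_carriers))  # ln(11), about 2.398
```

The sketch only inverts known penetrances and performs no estimation. In a real case–control analysis the coefficients would be estimated by maximum likelihood, and the interaction could then be tested, for example, with a likelihood-ratio test comparing the models with and without the a·b term.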

Abstract

Following the identification of several disease-associated polymorphisms by genome-wide association (GWA) analysis, interest is now focusing on the detection of effects that, owing to their interaction with other genetic or environmental factors, might not be identified by using standard single-locus tests. In addition to increasing the power to detect associations, it is hoped that detecting interactions between loci will allow us to elucidate the biological and biochemical pathways that underpin disease. Here I provide a critical survey of the methods and related software packages currently used to detect the interactions between genetic loci that contribute to human genetic disease. I also discuss the difficulties in determining the biological relevance of statistical interactions.

Figure 1: Semi-exhaustive search of pairwise interactions between 89,294 SNPs.
Figure 2: Random Jungle analysis of 89,294 SNPs.
Figure 3: Multifactor Dimensionality Reduction (MDR) and Tuned ReliefF (TuRF) analysis of 6,113 SNPs.
Figure 4: Bayesian Epistasis Association Mapping (BEAM) analysis of 47,727 SNPs.

Acknowledgements

Support for this work was provided by the Wellcome Trust (Grant reference 074524). I thank J. Barrett for assistance with interpretation of the WTCCC Crohn's results, and the WTCCC for making their data freely available. I also thank J. Moore for useful discussions of data-mining methods in general and MDR in particular, and K. Keen for pointing out the origins of the term epistasis.

Supplementary information

Supplementary Box S1

Different models of interaction (PDF 253 kb)

Supplementary Box S2

Effects – interacting, independent or otherwise (PDF 294 kb)

Supplementary Table 1

Top pairwise interactions as detected from a --fast-epistasis analysis of the WTCCC Crohn's disease and control data using PLINK (PDF 164 kb)

Related links

DATABASES

OMIM

Crohn's disease

FURTHER INFORMATION

Heather J. Cordell's homepage

BEAM

MDR

Nature Reviews Genetics Series on Genome-wide association studies

Nature Reviews Genetics Series on Modelling

PLINK

Random Jungle

Glossary

Data mining

The process of extracting hidden patterns and potentially useful information from large amounts of data.

Machine learning

The ability of a program to learn from experience, that is, to modify its execution on the basis of newly acquired information. A major focus of machine-learning research is to automatically produce models (rules and patterns) from data.

Bayesian model selection

A statistical approach for selecting models by incorporating both prior distributions for parameters of the models and the observed experimental data.

Maximum likelihood

A statistical approach that is used to make inferences about the combination of parameter values that gives the greatest probability of obtaining the observed data.

Saturated

A term for a statistical model that is as full as possible (saturated) with parameters. Such a model is sometimes useful as it serves as a benchmark to quantify how well a simpler model (one with fewer parameters) fits the data.

Penetrance

The probability of displaying a particular phenotype (for example, succumbing to a disease) given that one has a specific genotype.

Marginal effects

The effects (for example, penetrances) of a single variable, averaged over the possible values taken by other variables. For one locus of a two-locus system, these could be calculated by averaging the two-locus penetrances over the three possible genotypes at the other locus, weighting each genotype by its frequency.
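The averaging in this definition can be sketched as a frequency-weighted row mean of a 3 × 3 penetrance table. This is a minimal illustration that assumes the genotypes at the two loci are independent of each other (for example, unlinked loci in linkage equilibrium); the penetrance table and the Hardy–Weinberg genotype frequencies in the example are hypothetical.

```python
def marginal_penetrances(pen, geno_freq_b):
    """Marginal penetrance of each genotype at locus A.

    pen[i][j] is the penetrance for i copies of the risk allele at locus A
    and j copies at locus B (i, j in 0, 1, 2). geno_freq_b[j] is the
    population frequency of genotype j at locus B. Each marginal value is
    the frequency-weighted average of one row of the table, which assumes
    the genotypes at the two loci are independent of each other.
    """
    total = sum(geno_freq_b)
    return [sum(p * f for p, f in zip(row, geno_freq_b)) / total for row in pen]


# Hypothetical "no marginal effect" model: allele frequency 0.5 at locus B,
# so the Hardy-Weinberg genotype frequencies are 0.25, 0.5 and 0.25.
pen = [[0.0, 0.1, 0.0],
       [0.1, 0.0, 0.1],
       [0.0, 0.1, 0.0]]
print(marginal_penetrances(pen, [0.25, 0.5, 0.25]))  # every entry is 0.05
```

Although the two-locus penetrances differ sharply, every locus-A genotype has the same marginal penetrance (and, by symmetry, so does every locus-B genotype), so a single-locus test would have no power to detect either locus in this toy model.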

Logistic regression model

A statistical model that is used when the outcome is binary. It relates the log odds of the probability of an event to a linear combination of the predictor variables.

Multinomial regression

A statistical approach, similar to logistic regression, which is used when the outcome takes one of several possible categorical values.

Confounding

A phenomenon whereby the measure of association between two variables is distorted because other variables, associated with both variables of interest, are not controlled for in the calculation.

Empirical Bayes procedure

A hierarchical model in which the hyperparameter is not a random variable but is estimated by another (often classical) method.

Information theory

A branch of applied mathematics involving the quantification of information.

Entropy

A key measure used in information theory that quantifies the uncertainty associated with a random variable. For example, a variable indicating the outcome from a toss of a coin will have less entropy than a variable indicating the outcome from a roll of a die (two versus six equally likely outcomes).
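The coin-versus-die comparison follows from the Shannon entropy H(X) = −Σ p(x) log2 p(x). The sketch below uses base-2 logarithms so that entropy is measured in bits (the choice of base is an assumption; natural logarithms give nats instead).

```python
import math


def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2 p) in bits; outcomes with p = 0 add nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


print(entropy_bits([0.5, 0.5]))    # 1.0 (fair coin)
print(entropy_bits([1 / 6] * 6))   # about 2.585, i.e. log2(6) (fair die)
```

For k equally likely outcomes the entropy is log2(k), which is why the die (six outcomes) carries more uncertainty than the coin (two).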

Permutation

This method is often used in hypothesis testing. An empirical distribution of a test statistic is obtained by randomly permuting the original sample many times (for example, shuffling case–control labels relative to the genotypes) and recalculating the value of the test statistic in each permuted data set. Each permuted data set is treated as a sample from the population under the null hypothesis.
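A permutation test for a single locus can be sketched as follows. The test statistic (difference in carrier frequency between cases and controls), the data and the (count + 1)/(permutations + 1) estimator are illustrative assumptions, not the procedure used in any particular package discussed in this review.

```python
import random


def carrier_freq_diff(carrier, is_case):
    """Carrier frequency in cases minus carrier frequency in controls."""
    cases = [c for c, y in zip(carrier, is_case) if y]
    controls = [c for c, y in zip(carrier, is_case) if not y]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_p_value(carrier, is_case, n_perm=10_000, seed=1):
    """Two-sided permutation p value for the carrier-frequency difference.

    Case-control labels are shuffled relative to the genotypes, which breaks
    any genotype-phenotype association (the null hypothesis) while keeping
    the numbers of cases, controls and carriers fixed.
    """
    rng = random.Random(seed)
    observed = abs(carrier_freq_diff(carrier, is_case))
    labels = list(is_case)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(carrier_freq_diff(carrier, labels)) >= observed:
            extreme += 1
    # Adding 1 to both counts keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical data: 50 cases (30 carriers) and 50 controls (10 carriers).
carrier = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40
is_case = [True] * 50 + [False] * 50
print(permutation_p_value(carrier, is_case, n_perm=2_000))
```

The same shuffling scheme works for any statistic, including interaction statistics, which is what makes permutation attractive when the null distribution is hard to derive analytically.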

Multiple testing

An analysis in which multiple hypotheses are tested. If a large number of tests are performed, the p value of any particular test must be interpreted in light of this fact, because the overall probability of making at least one type I error across all the tests increases with the number of tests.

Bonferroni correction

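The simplest correction of individual p values for multiple hypothesis testing can be calculated using $p_{\text{corrected}} = 1 - (1 - p_{\text{uncorrected}})^{n}$, in which $n$ is the number of hypotheses tested. This formula (strictly, the Šidák form of the correction) assumes that the hypotheses are all independent, and simplifies to the Bonferroni form $p_{\text{corrected}} \approx n\,p_{\text{uncorrected}}$ when $n\,p_{\text{uncorrected}} \ll 1$; the latter, capped at 1, remains a conservative bound even without independence.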
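The two forms of the correction can be compared numerically. This sketch computes the exact formula (often called the Šidák correction) stably with `math.log1p` and `math.expm1`, alongside n·p. Purely as an illustration of scale, it assumes every pair among the 89,294 SNPs of Figure 1 were tested; the p value used is hypothetical.

```python
import math


def sidak_corrected(p, n):
    """Exact correction assuming n independent tests: 1 - (1 - p)**n.

    log1p/expm1 keep the result accurate when p is tiny and n is huge,
    where the naive expression loses precision.
    """
    if p >= 1.0:
        return 1.0
    return -math.expm1(n * math.log1p(-p))


def bonferroni_corrected(p, n):
    """First-order approximation n * p, capped at 1."""
    return min(1.0, n * p)


# If every pair among the 89,294 SNPs of Figure 1 were tested:
n_pairs = math.comb(89_294, 2)
print(n_pairs)                                # 3986664571
print(bonferroni_corrected(1e-12, n_pairs))   # about 0.00399
print(sidak_corrected(1e-12, n_pairs))        # slightly smaller, about 0.00398
```

With roughly 4 × 10^9 pairwise tests, a nominal p value must be below about 1.25 × 10^-11 to reach an overall 5% level, which illustrates why filtering approaches that reduce the number of tests are attractive.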

Q–Q plot

A quantile–quantile plot is a diagnostic plot that can be used to compare the distribution of observed test statistics with the distribution expected under the null hypothesis. Those tests that lie significantly above the line of equality between observed and expected quantiles are considered significant in the context of the number of tests performed.
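The coordinates of a Q–Q plot of p values can be sketched as below. Assumptions: all p values lie in (0, 1], the expected null quantiles are taken as i/(n + 1) for i = 1..n (one common convention), both axes are on the −log10 scale, and the plotting itself is left out.

```python
import math


def qq_points(p_values):
    """Pairs of (expected, observed) -log10 p values for a Q-Q plot.

    Observed p values are sorted; the expected values are the null
    quantiles i / (n + 1), i = 1..n, of a Uniform(0, 1) distribution.
    Points are returned from least to most significant.
    """
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10(i / (n + 1)) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Hypothetical p values: two unremarkable tests and one very small one.
for exp_q, obs_q in qq_points([0.9, 0.5, 1e-8]):
    print(f"{exp_q:.3f}  {obs_q:.3f}")
```

Points near the diagonal are consistent with the null; the last point here (observed 8 against an expected value of about 0.6) lies far above it.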

High-dimensional data

Data that contain information on a large number of variables, albeit possibly measured in a small number of subjects or replicates.

Cross-validation

This approach involves partitioning a data set into smaller subsamples, performing an analysis in some of the subsamples and using the remaining, held-out subsample to measure or validate how well the analysis has performed. To reduce variability, multiple rounds of cross-validation are often performed using different partitions of the data, and the validation results are averaged over the rounds.
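One common scheme is k-fold cross-validation, in which each of k folds is held out once. The sketch below assumes n ≥ k, and the `fit` and `score` callables are hypothetical placeholders for whatever model (for example, an MDR or regression model) is being evaluated.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(n, k, fit, score, seed=0):
    """Average held-out score over k rounds of k-fold cross-validation.

    In round i, fold i is held out and the model is fitted on the other
    k - 1 folds. `fit(train_idx) -> model` and
    `score(model, test_idx) -> float` are caller-supplied (hypothetical)
    functions.
    """
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit(train_idx)
        scores.append(score(model, test_idx))
    return sum(scores) / k


# Toy usage: predict the training mean and score by mean squared error.
y = [float(v) for v in range(1, 11)]


def fit_mean(train_idx):
    return sum(y[j] for j in train_idx) / len(train_idx)


def mse(model, test_idx):
    return sum((y[j] - model) ** 2 for j in test_idx) / len(test_idx)


print(cross_validate(len(y), 5, fit_mean, mse))
```

Repeating the whole procedure with different seeds gives the "multiple rounds using different partitions" described in the definition.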

Overfitting

The phenomenon in which a complex model fits the current data set well by capturing random quirks present in that particular data set, and therefore cannot be generalized to future data sets as well as a simpler model might.

Bootstrap samples

These are data sets obtained by taking a random sample of the original data, usually with replacement and of the same size as the original data set. One then applies the same analysis as was applied to the real data. This is repeated many times, allowing one to assess the variability in results owing to random sampling.
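A minimal bootstrap sketch, assuming resampling with replacement to the original size and a percentile interval (one of several ways to summarize bootstrap replicates). The carrier data and the choice of statistic are hypothetical.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat, n_boot=2_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(data).

    Each replicate resamples len(data) observations with replacement,
    recomputes the statistic, and the interval is read off the sorted
    replicates at the alpha/2 and 1 - alpha/2 positions.
    """
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_idx = int(alpha / 2 * n_boot)
    return reps[lo_idx], reps[n_boot - 1 - lo_idx]


# Hypothetical carrier indicators for 100 controls, 40 of them carriers.
carriers = [1] * 40 + [0] * 60
print(bootstrap_percentile_ci(carriers, statistics.mean))
```

The spread of the replicates reflects only sampling variability in the data at hand; it does not correct for bias in the statistic itself.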

Frequentist

A statistical approach for testing hypotheses by assessing the strength of evidence for the hypothesis provided by the data.

Burn-in period

In Markov chain Monte Carlo analysis, a period at the start of the computation in which the values taken by the parameters are ignored when constructing the posterior distribution.
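The role of the burn-in period can be illustrated with a generic random-walk Metropolis sampler. This is not the sampler used by BEAM or any other package mentioned here; the one-dimensional Normal(3, 1) target, the deliberately poor starting value and the burn-in length of 1,000 iterations are all hypothetical choices.

```python
import math
import random


def metropolis(log_density, start, n_iter, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    Returns every state visited, including the burn-in period; the caller
    decides how many initial draws to discard.
    """
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    chain = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        log_p_prop = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_prop - log_p)):
            x, log_p = proposal, log_p_prop
        chain.append(x)
    return chain


# Toy target: Normal(mean 3, sd 1), started far away at -50.
chain = metropolis(lambda x: -0.5 * (x - 3.0) ** 2, start=-50.0, n_iter=20_000)
burn_in = 1_000
kept = chain[burn_in:]
print(sum(kept) / len(kept))  # sample mean of the retained draws; the target mean is 3
```

The early draws, while the chain travels from −50 towards the bulk of the target, would bias any posterior summary, which is why they are ignored.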

Compositional epistasis

The blocking of one allelic effect by an allele at another locus.

Statistical epistasis

The average effect of substitution of alleles at combinations of loci, with respect to the average genetic background of the population.

Functional epistasis

The molecular interactions that proteins and other genetic elements have with one another.

About this article

Cite this article

Cordell, H. Detecting gene–gene interactions that underlie human diseases. Nat Rev Genet 10, 392–404 (2009). https://doi.org/10.1038/nrg2579

  • DOI: https://doi.org/10.1038/nrg2579
