Trends in Genetics
Volume 20, Issue 12, December 2004, Pages 640-647

Genetics, statistics and human disease: analytical retooling for complexity

https://doi.org/10.1016/j.tig.2004.09.007

Molecular biologists and geneticists alike now acknowledge that most common human diseases with a genetic component are likely to have complex etiologies. Yet despite this belief, many statistical geneticists continue applying, in slightly new and different ways, methodologies that were developed to dissect much simpler etiologies. In this article, we characterize, with examples, the various factors that can complicate genetic analysis and demonstrate their shared features and how they affect genetic analysis. We describe a variety of approaches that are currently available, revealing methodological gaps and suggesting new directions for method development. Finally, we propose a comprehensive two-step approach to analysis that systematically addresses the different genetic factors that are likely to underlie complex diseases.

Section snippets

Categorization and analytical approaches

Each of the factors presented in Tables 1-3 complicates statistical analysis in one of two ways: either by creating heterogeneous, or competing, disease models (Tables 1 and 2), or by creating a multifactorial, interacting disease model (Table 3). The challenge of modeling the relationship between genetic and environmental risk factors (independent variables) and disease endpoints (dependent variables) is different for these two categories. Of course, what exacerbates the

Heterogeneity

For this category of factors, there are multiple independent (predictor) variables or else multiple dependent (outcome) variables that complicate the analysis by creating a heterogeneous model landscape. In the case of allelic or locus heterogeneity or phenocopy, multiple predictor variables (e.g. multiple alleles, multiple loci and/or environmental risk factors) are present, some of which might be unmeasured or unobserved and, therefore, unavailable for inclusion in the disease model. In the

Concluding remarks: retooling for the future

None of the aforementioned methodologies is superior in all respects for the range of complicating factors that might be present in any given dataset. Given the relative shortcomings of our current analyses in complex diseases, we need to extend greatly the range of available analytical tools. There is a crucial need for extensive reevaluation of existing methodologies for complex diseases, as well as for massive efforts in new method development. It is important that empirical studies be

Acknowledgements

We thank the reviewers, Marylyn Ritchie and Dan Hahs for their critical reading of this manuscript. This work was supported by NLM training grant T32 MH64913 and by NIH grants HL65234, AI59694, N532830 and A619085.

Glossary

Recombination fraction:
the probability that a parent will produce a recombinant offspring; the percentage of offspring in a family or dataset who are recombinants; a statistical measure of the distance between two loci.
Admixture:
the mixing of two or more subpopulations, having differing characteristics. If the subpopulations have different allele or genotype frequencies and have different disease frequencies it can result in spurious associations.
Lod score:
the log10 of the odds in favor of
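
The admixture entry above can be made concrete with a small worked example. The sketch below is illustrative only: the subpopulation sizes, carrier frequencies and disease frequencies are hypothetical numbers chosen for the arithmetic, and the helper names (`expected_table`, `odds_ratio`) are not from the article. Within each subpopulation the allele and the disease are constructed to be independent, yet pooling the two subpopulations produces an apparent association.

```python
from fractions import Fraction


def expected_table(n, carrier_freq, disease_freq):
    """Expected 2x2 counts when carrier status and disease are independent.

    Returns (carrier cases, carrier controls,
             non-carrier cases, non-carrier controls).
    """
    a = n * carrier_freq * disease_freq
    b = n * carrier_freq * (1 - disease_freq)
    c = n * (1 - carrier_freq) * disease_freq
    d = n * (1 - carrier_freq) * (1 - disease_freq)
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Odds ratio of disease for carriers versus non-carriers."""
    return (a * d) / (b * c)


# Hypothetical subpopulations that differ in BOTH allele and disease frequency.
pop1 = expected_table(1000, Fraction(4, 5), Fraction(1, 5))   # common allele, common disease
pop2 = expected_table(1000, Fraction(1, 5), Fraction(1, 20))  # rare allele, rare disease

# Admixed sample: simply add the two tables cell by cell.
pooled = tuple(x + y for x, y in zip(pop1, pop2))

print(f"subpopulation 1 OR: {float(odds_ratio(*pop1)):.2f}")  # 1.00
print(f"subpopulation 2 OR: {float(odds_ratio(*pop2)):.2f}")  # 1.00
print(f"pooled sample OR:   {float(odds_ratio(*pooled)):.2f}")  # 2.36
```

Under these assumed numbers the odds ratio is exactly 1 in each subpopulation but about 2.36 in the pooled sample, which is the kind of spurious association the glossary entry describes. Exact fractions are used only to keep the arithmetic free of floating-point noise.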
