Trends in Genetics
Genetics, statistics and human disease: analytical retooling for complexity
Categorization and analytical approaches
Each of the factors presented in Tables 1, 2 and 3 complicates statistical analysis in one of two ways – either by creating heterogeneous, or competing, disease models (Table 1, Table 2), or by creating a multifactorial, interacting disease model (Table 3). The challenge of modeling the relationship between genetic and environmental risk factors (independent variables) and disease endpoints (dependent variables) differs between these two categories. Of course, what exacerbates the
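The distinction between the two categories can be made concrete with a toy calculation. The sketch below is illustrative only and is not taken from the article: the two binary risk factors `A` and `B`, their independence, the carrier frequency of 0.5 and the penetrance values are all hypothetical assumptions. It contrasts a heterogeneous model, in which either factor alone raises risk, with an interacting model, in which risk depends on the combination of factors, and reports the marginal risk seen when only factor A is examined.

```python
from itertools import product

# Hypothetical penetrances (probability of disease), chosen only for illustration.
BASELINE_RISK = 0.05
ELEVATED_RISK = 0.50

# Heterogeneous (competing) model: carrying either factor alone raises risk.
heterogeneity = {
    (a, b): ELEVATED_RISK if (a or b) else BASELINE_RISK
    for a, b in product((0, 1), repeat=2)
}

# Interacting model: risk depends on the combination of factors
# (here, elevated only when exactly one factor is carried).
interaction = {
    (a, b): ELEVATED_RISK if a != b else BASELINE_RISK
    for a, b in product((0, 1), repeat=2)
}


def marginal_risk(penetrance, freq_b=0.5):
    """Disease risk by status at factor A, averaged over factor B.

    Assumes A and B are independent and that B is carried with
    frequency freq_b (both are assumptions of this sketch).
    """
    weights = {0: 1.0 - freq_b, 1: freq_b}
    return {
        a: sum(penetrance[(a, b)] * weights[b] for b in (0, 1))
        for a in (0, 1)
    }


het_marginal = marginal_risk(heterogeneity)
int_marginal = marginal_risk(interaction)
print("heterogeneity, risk by A status:", het_marginal)
print("interaction, risk by A status:  ", int_marginal)
```

Under these assumed values, factor A retains a visible marginal effect in the heterogeneous model, whereas in the interacting model carriers and non-carriers of A have the same marginal risk, so a one-factor-at-a-time analysis would see nothing at A.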
Heterogeneity
For this category of factors, there are multiple independent (predictor) variables or else multiple dependent (outcome) variables that complicate the analysis by creating a heterogeneous model landscape. In the case of allelic or locus heterogeneity or phenocopy, multiple predictor variables (e.g. multiple alleles, multiple loci and/or environmental risk factors) are present, some of which might be unmeasured or unobserved and, therefore, unavailable for inclusion in the disease model. In the
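One consequence of unmeasured causes can be sketched numerically. The example below is a hypothetical illustration, not the authors' analysis: it assumes that only a fraction of cases are attributable to a measured locus A, that the remaining cases arise from an unobserved cause and carry the A risk allele at the control frequency, and that the carrier frequencies shown are arbitrary. It computes the odds ratio that would be observed at locus A when all cases are pooled.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def pooled_odds_ratio(fraction_due_to_a,
                      carrier_freq_a_cases=0.6,
                      carrier_freq_controls=0.2):
    """Odds ratio at locus A when cases of mixed etiology are pooled.

    fraction_due_to_a: share of cases caused by locus A (hypothetical).
    Cases with the other, unmeasured cause are assumed to carry the
    A risk allele at the same frequency as controls.
    """
    pooled_case_freq = (
        fraction_due_to_a * carrier_freq_a_cases
        + (1.0 - fraction_due_to_a) * carrier_freq_controls
    )
    return odds(pooled_case_freq) / odds(carrier_freq_controls)


# The observed odds ratio shrinks as locus A explains fewer of the cases.
for fraction in (1.0, 0.5, 0.25):
    print(f"fraction of cases due to A = {fraction:.2f}: "
          f"odds ratio = {pooled_odds_ratio(fraction):.2f}")
```

With these assumed frequencies the odds ratio falls from about 6 when every case is due to locus A to about 2.7 when half are, which is one way of seeing why heterogeneity erodes the signal available to a single pooled disease model.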
Concluding remarks: retooling for the future
None of the aforementioned methodologies is superior in all respects for the range of complicating factors that might be present in any given dataset. Given the relative shortcomings of our current analyses in complex diseases, we need to extend greatly the range of available analytical tools. There is a crucial need for extensive reevaluation of existing methodologies for complex diseases, as well as for massive efforts in new method development. It is important that empirical studies be
Acknowledgements
We thank the reviewers, Marylyn Ritchie and Dan Hahs for their critical reading of this manuscript. This work was supported by NLM training grant T32 MH64913 and by NIH grants HL65234, AI59694, N532830 and A619085.
Glossary
- Recombination fraction:
- the probability that a parent will produce a recombinant offspring; the percentage of offspring in a family or dataset who are recombinants; a statistical measure of the distance between two loci.
- Admixture:
- the mixing of two or more subpopulations that have differing characteristics. If the subpopulations differ in allele or genotype frequencies and also in disease frequencies, admixture can result in spurious associations.
- Lod score:
- the log10 of the odds in favor of