Genetic association studies

doi:10.1016/S0140-6736(05)67424-7

The Lancet

Volume 366, Issue 9491, 24–30 September 2005, Pages 1121-1131

https://doi.org/10.1016/S0140-6736(05)67424-7

Summary

We review the rationale behind and discuss methods of design and analysis of genetic association studies. There are similarities between genetic association studies and classic epidemiological studies of environmental risk factors but there are also issues that are specific to studies of genetic risk factors such as the use of particular family-based designs, the need to account for different underlying genetic mechanisms, and the effect of population history. Association differs from linkage (covered elsewhere in this series) in that the alleles of interest will be the same across the whole population. As with other types of genetic epidemiological study, issues of design, statistical analysis, and interpretation are very important.

Section snippets

Direct association

The first of these forms of association is termed direct association, and studies of direct association target polymorphisms that are themselves putative causal variants. This type of study is the easiest to analyse and the most powerful, but the difficulty lies in identifying candidate polymorphisms. A mutation in a codon that leads to an amino acid change is a candidate causal variant. However, it is likely that many causal variants responsible for heritability of common complex

Indirect association

In the second type of association, the polymorphism is a surrogate for the causal locus and this type of association allows us to search for causal genes in indirect association studies. However, indirect associations are even weaker than the direct associations they reflect, and it will usually be necessary to type several surrounding markers to have a high chance of picking up the indirect association. Indirect association studies are more difficult to analyse, and there is still debate as to

Confounded association

The final type of association is that due to confounding by stratification and admixture (substructure) within the population. Confounding, as in the rest of epidemiology, raises the possibility either of generating false findings (positive confounding) or of obscuring true causal associations (negative confounding). However, although the problem of unobserved confounding is intractable in classic epidemiology, dictating limits on the size of causal effect that can be safely inferred from
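The way stratification can generate a false finding can be illustrated with a small arithmetic sketch. All counts below are hypothetical and chosen only for illustration: two subpopulations differ in both carrier frequency and in the ratio of cases to controls sampled, and the polymorphism has no effect on disease within either subpopulation.

```python
def odds_ratio(carrier_cases, noncarrier_cases, carrier_controls, noncarrier_controls):
    """Odds ratio for disease, comparing carriers with non-carriers."""
    return (carrier_cases * noncarrier_controls) / (noncarrier_cases * carrier_controls)


# Hypothetical counts, in the order:
# (carrier cases, non-carrier cases, carrier controls, non-carrier controls)
# Subpopulation A: carrier frequency 0.5, 200 cases and 100 controls.
subpop_a = (100, 100, 50, 50)
# Subpopulation B: carrier frequency 0.1, 100 cases and 200 controls.
subpop_b = (10, 90, 20, 180)

# Ignoring substructure amounts to adding the two tables cell by cell.
pooled = tuple(a + b for a, b in zip(subpop_a, subpop_b))

or_a = odds_ratio(*subpop_a)
or_b = odds_ratio(*subpop_b)
or_pooled = odds_ratio(*pooled)

print(f"subpopulation A: {or_a:.2f}")
print(f"subpopulation B: {or_b:.2f}")
print(f"pooled:          {or_pooled:.2f}")
```

Within each subpopulation the odds ratio is 1, yet the pooled table suggests an association, because carrier frequency and disease frequency both vary between the subpopulations.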

Direct association: patterns of genotype–phenotype relationship

We shall consider a diallelic locus that is related directly either to a quantitative trait or to a discrete trait such as the presence (prevalence) or occurrence (incidence) of a disease. Multiallelic loci lead to more complicated scenarios and generate tests with many degrees of freedom. Even in the simplest diallelic case, different patterns for the genotype–phenotype relationship must be considered. Since there are three possible genotypes, which have a natural order (1/1, 1/2, and 2/2), the

Linear dose-response modelling

In classic mendelian genetics of fully penetrant discrete traits, the description of an allele as dominant implies that the corresponding phenotype will occur irrespective of the number of copies of the allele carried. A recessive allele requires both copies to be present for the phenotype to be evident. In a diallelic system, if neither allele is dominant, 1/2 heterozygotes will display an intermediate phenotype. Fisher¹⁶ used the term dominance in a different way to describe the related
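The dominant, recessive, and intermediate patterns described here are commonly expressed as numeric codings of the three genotypes before a regression model is fitted. The sketch below is one such coding, not the authors' own; the function name `code_genotype` and the model labels are illustrative assumptions, and allele 2 is arbitrarily treated as the allele of interest.

```python
# Genotypes at a diallelic locus, written as in the text.
GENOTYPES = ("1/1", "1/2", "2/2")


def code_genotype(genotype, model):
    """Return a numeric covariate for a genotype under a given coding.

    Allele 2 is treated as the allele of interest (an assumption).
    """
    copies = genotype.split("/").count("2")  # number of copies of allele 2
    if model == "additive":
        # Linear dose-response: one unit per copy of allele 2.
        return copies
    if model == "dominant":
        # One copy is enough for the full effect.
        return 1 if copies >= 1 else 0
    if model == "recessive":
        # Both copies are required for any effect.
        return 1 if copies == 2 else 0
    raise ValueError(f"unknown model: {model}")


for model in ("additive", "dominant", "recessive"):
    print(model, [code_genotype(g, model) for g in GENOTYPES])
```

Under the additive coding the heterozygote sits midway between the two homozygotes on whatever scale the model uses, which corresponds to the intermediate phenotype described above.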

Epistasis

The general issue of dominance relates to the extent to which the joint effect of two alleles at a single autosomal locus might be different from the sum (or product in a multiplicative model) of the effects anticipated for each allele independently. A related issue is the degree to which the combined effect of alleles at two or more loci can reasonably be modelled by the individual locus contributions. The fact that inheritance of some traits could only be explained by joint action of two
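Whether the individual locus contributions are judged adequate depends on the scale on which they are combined. As a minimal sketch, assuming effects are expressed as relative risks against people who carry the risk genotype at neither locus, the expected joint effect under a sum and under a product can be compared directly. The function name and the numbers are hypothetical.

```python
def expected_joint_relative_risk(rr_a, rr_b, model):
    """Expected relative risk for carrying the risk genotype at both loci.

    rr_a and rr_b are the relative risks for each locus alone, both
    measured against the same baseline group (risk genotype at neither locus).
    """
    if model == "additive":
        # Excess risks add: (rr_a - 1) + (rr_b - 1) + 1.
        return rr_a + rr_b - 1.0
    if model == "multiplicative":
        # Relative risks multiply.
        return rr_a * rr_b
    raise ValueError(f"unknown model: {model}")


# Hypothetical single-locus relative risks and observed joint relative risk.
rr_a, rr_b, observed_joint = 2.0, 3.0, 6.0

for model in ("additive", "multiplicative"):
    expected = expected_joint_relative_risk(rr_a, rr_b, model)
    print(f"{model}: expected {expected:.1f}, observed {observed_joint:.1f}")
```

With these illustrative numbers the observed joint effect matches the multiplicative expectation but exceeds the additive one, so the same data would show statistical interaction on one scale and none on the other.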

Indirect association: patterns of linkage disequilibrium

The mapping of susceptibility genes for common complex disorders and genes for other common traits by the indirect method depends on the existence of association, at the population level, between the causal variants and nearby markers. Such association, because of the proximity of loci on the genome, is termed linkage disequilibrium. (Some use this term to describe any population-wide association between loci, whether due to proximity or to another reason such as population stratification and
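The strength of linkage disequilibrium between two diallelic loci is usually summarised with the coefficient D and two standardised forms, |D′| and r². The sketch below computes the standard definitions from four haplotype frequencies; the function name and the example frequencies are illustrative assumptions, not values from the article.

```python
def ld_measures(f11, f12, f21, f22):
    """Return (D, |D'|, r^2) for two diallelic loci.

    fxy is the frequency of the haplotype carrying allele x at the
    first locus and allele y at the second locus.
    """
    if abs(f11 + f12 + f21 + f22 - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p1 = f11 + f12  # frequency of allele 1 at the first locus
    q1 = f11 + f21  # frequency of allele 1 at the second locus
    if not (0.0 < p1 < 1.0 and 0.0 < q1 < 1.0):
        raise ValueError("both loci must be polymorphic")

    # D: excess of the 1-1 haplotype over its expectation under independence.
    d = f11 - p1 * q1

    # Largest magnitude D can take given the allele frequencies.
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))

    d_prime = abs(d) / d_max
    r_squared = d * d / (p1 * (1 - p1) * q1 * (1 - q1))
    return d, d_prime, r_squared


# Hypothetical haplotype frequencies.
d, d_prime, r_squared = ld_measures(0.4, 0.1, 0.1, 0.4)
print(f"D = {d:.3f}, |D'| = {d_prime:.3f}, r^2 = {r_squared:.3f}")
```

r² is the squared correlation between the alleles at the two loci, so it is the quantity most directly relevant when a marker stands in for an untyped causal variant.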

Study designs

Familiar epidemiological designs, such as population-based case-control or cohort designs,¹⁹,⁵² are often used for genetic association studies, and the data are analysed in much the same way, with risk factors such as smoking and obesity replaced by the presence or absence of a particular genetic polymorphism. Risk can be considered in terms of either a predisposing allele or genotype, or in terms of multiple categories of disease risk such as the risks associated with different alleles at

Statistical analysis

The analysis of data depends crucially on the study design. In the simplest case, familiar methods such as logistic regression, χ² tests of association, and odds ratios may be suitable. At a single marker, the issue arises as to whether to analyse on the basis of allele counts or genotype counts. Suppose we have case and control data for a single diallelic genetic locus (table 4). A simple χ² test for independence has 2 degrees of freedom. Two odds ratios can be calculated: af/be (for genotype
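The choice between genotype counts and allele counts can be made concrete with a short sketch. Table 4 is not reproduced here, so the counts below are hypothetical and the layout (cases and controls as rows, genotypes 1/1, 1/2, and 2/2 as columns) is an assumption. The code computes a Pearson χ² statistic for the 2 × 3 genotype table (2 degrees of freedom) and for the 2 × 2 allele table derived from it (1 degree of freedom), together with genotype-specific odds ratios that take 1/1 as the baseline.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def allele_counts(genotype_counts):
    """Convert (n_11, n_12, n_22) genotype counts to (allele 1, allele 2) counts."""
    n11, n12, n22 = genotype_counts
    return [2 * n11 + n12, n12 + 2 * n22]


# Hypothetical genotype counts in the order (1/1, 1/2, 2/2).
cases = [30, 50, 20]
controls = [50, 40, 10]

genotype_stat, genotype_df = pearson_chi_square([cases, controls])
allele_stat, allele_df = pearson_chi_square(
    [allele_counts(cases), allele_counts(controls)]
)

# Exact upper-tail probabilities for chi-square with 2 and 1 degrees of freedom.
genotype_p = math.exp(-genotype_stat / 2)
allele_p = math.erfc(math.sqrt(allele_stat / 2))

# Odds ratios for 1/2 and 2/2, each compared with the 1/1 baseline.
or_heterozygote = (cases[1] * controls[0]) / (cases[0] * controls[1])
or_homozygote = (cases[2] * controls[0]) / (cases[0] * controls[2])

print(f"genotype test: chi2 = {genotype_stat:.2f}, df = {genotype_df}, p = {genotype_p:.4f}")
print(f"allele test:   chi2 = {allele_stat:.2f}, df = {allele_df}, p = {allele_p:.4f}")
print(f"OR 1/2 vs 1/1 = {or_heterozygote:.2f}, OR 2/2 vs 1/1 = {or_homozygote:.2f}")
```

The allele-count test treats the two alleles of each person as independent observations, which doubles the apparent sample size; that treatment is generally justified only when genotypes are in Hardy-Weinberg proportions, so the genotype-based test is the safer default.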

Significance and importance

The standards of statistical proof that have become acceptable in the general biomedical literature are not appropriate for genetic association studies. Something akin to a multiple testing problem pervades the discipline, although there has been no clear consensus about how it should be dealt with. Approaches such as the Bonferroni correction are not appropriate because it is not the number of tests in any one investigation that is important. Rather it is that the vast majority of loci tested

References (110)

J Ioannidis et al. Genetic associations in large versus small studies: an empirical assessment. Lancet (2003)
C Hoggart et al. Control of confounding of genetic associations in stratified populations. Am J Hum Genet (2003)
J Pritchard et al. Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet (1999)
J Pritchard et al. Association mapping in structured populations. Am J Hum Genet (2000)
G Satten et al. Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. Am J Hum Genet (2001)
SA Bacanu et al. The power of genomic control. Am J Hum Genet (2000)
W Thompson. Effect modification and the limits of biological inference from epidemiologic data. J Clin Epidemiol (1991)
B Devlin et al. A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics (1995)
PC Sham et al. Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am J Hum Genet (2000)
CS Carlson et al. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet (2004)
N Risch et al. The future of genetic studies of complex human diseases. Science (1996)
The International HapMap project
Statistical Methods in Cancer Research. Volume I - The Analysis of Case-Control Studies. IARC Scientific Publications (1980)
D Clayton et al. Statistical Models in Epidemiology (1993)
P Sasieni. From genotypes to genes: doubling the sample size. Biometrics (1997)
W Bateson. Mendel's principles of heredity (1909)
P Phillips. The language of gene interaction. Genetics (1998)
H Cordell et al. Statistical modeling of interlocus interactions in a complex disease: Rejection of the multiplicative model of epistasis in type 1 diabetes. Genetics (2001)
H Cordell. Epistasis: what it means, what it doesn't mean, and statistical methods to detect it in humans. Hum Mol Genet (2002)
J Siemiatycki et al. Biological models and statistical interactions: an example from multistage carcinogenesis. Int J Epidemiol (1981)

