Main

The scientific community has, until recently, viewed with disappointment the results of genetic investigations of common diseases. Now, several widely publicized studies have generated an opposite impression. The staff of Science voted that identification of 'genes for mental illness' represented a top research 'breakthrough' of 2003 (ref. 1). deCODE Genetics, a company attempting the comprehensive identification of genetic contributions to common diseases in Iceland, has reported candidate genes underlying susceptibility to several diseases2,3,4,5. Considering the far more numerous publications describing negative or equivocal findings, it is uncertain how we should assess the current state of common-trait genetics. Have we won the war, or even a few battles? Or, more pessimistically, have we simply declared victory? To answer these questions we must first ask others. Has one strategy emerged as the best for identifying disease genes? How much and what kind of evidence is enough to establish linkage and association? Which samples should be used and how many markers are needed to genotype them? These questions plague investigators and reviewers.

Four factors fuel confusion in the field: (i) inadequate awareness that different approaches for genetic investigation are expected to yield different types of findings; (ii) insufficient recognition of the dependence of each approach on the availability of appropriate technologies and statistical methods; (iii) inconsistent standards for interpreting levels of statistical evidence; and (iv) nonstandardized strategies for choosing and evaluating phenotypes. Here we discuss each factor, suggesting possible means to diminish confusion, and review the current status of each approach.

Different approaches should yield different findings

For most common diseases, the literature includes findings from pedigree and affected sib-pair (ASP) linkage studies and from association studies of population samples6. Controversy regarding the utility of these different approaches springs from the incorrect impression that they are perfect substitutes for each other. Historically, the different mapping approaches arose from different 'traditions', and a dichotomy existed between rare and common diseases. In the former, the experimental roots of mendelism produced a biological orientation, even before the gene mapping era. In the latter, an epidemiological orientation has predominated, as exemplified by studies of schizophrenia. Before the early twentieth century, psychiatrists considered psychosis a single entity. Kraepelin separated dementia praecox (now schizophrenia) from manic-depressive psychosis (now bipolar disorder), largely based on observations of the aggregation of each syndrome in different families7. By the mid-twentieth century, twin studies established the genetic basis of schizophrenia8,9; numerous genetic epidemiologic investigations of the disorder continue to this day10. Epidemiologic traditions fostered the development of ASP and association approaches, which depend on epidemiologic data for estimating disease risks in relatives and in the population. Family studies of schizophrenia11,12,13 provided the risk data used as examples in laying out the rationale for ASP genome scans of common diseases14,15.

The aims of different mapping approaches also reflect the dichotomy between biological and epidemiological traditions. High-penetrance variants segregate in extended pedigrees; pedigree studies aim to identify such variants to illuminate biological pathways and processes. Population samples are best for identifying low-penetrance variants; association studies aim to elucidate the contribution of such variants to disease distributions observed in populations. ASP and association studies are more systematic than pedigree studies in ascertaining and recruiting subjects and have aims beyond gene mapping, such as identifying interactions between environmental and genetic variables.

Appropriate technology and statistics for each approach

Theoretical conceptualizations of particular approaches often came several years before technological and statistical advances that made such designs practical. An influential proposal to use mapped DNA markers for genome-wide linkage analysis of mendelian diseases16 preceded by three years the chromosomal localization of the gene mutated in Huntington disease (HD)17, which suggested the feasibility of such mapping. The mapping of HD fortuitously used one of the few DNA markers then available; this and other early linkage findings spurred the development of genetic maps, which made linkage studies of mendelian disorders routine. The consequent proliferation of linkage studies required computationally efficient methods for genome-wide statistical analysis and fostered the growth of the field of statistical genetics. The difficulty of isolating the first mapped genes18,19,20 spurred the development of physical maps and the adaptation of linkage disequilibrium (LD) analysis (previously a tool for basic population genetics) for fine-scale mapping20,21.

A similar mismatch between what was theoretically correct and what was technologically feasible characterizes the more recent history of mapping studies of common diseases. In the late 1980s, several linkage findings established the investigation of extended pedigrees as the predominant paradigm for mapping common diseases22,23,24. Investigators began to question this approach when findings could not be replicated and were recognized as false positives25,26. At the same time, theoretical statistical studies suggested ASP and association studies as alternatives to pedigree studies14,15,27. The questioning of pedigree approaches on theoretical grounds obscured the fact that, until recently, most pedigree studies of common diseases were underpowered and possibly tested too few markers. In the past few years, it has become feasible to genotype large pedigrees cheaply with many more microsatellites than were previously used for genome scans, and statistical programs that are now available permit efficient computation of linkage even in complex pedigrees28,29. These advances allowed a substantial increase in the scale of pedigree-based linkage studies2,4,30,31,32,33,34,35,36,37,38. Inadequate technology and statistical methodology have similarly hindered implementation of alternatives to pedigree-based mapping. Although it remains uncertain how association studies should be designed and analyzed, the technology for inexpensive high-throughput genotyping of single-nucleotide polymorphisms (SNPs) is maturing rapidly, accompanied by a surge of activity in the development of statistical methods.

Levels of statistical evidence

Statistical methodologies respond to what is technologically feasible, a fact that has guided determinations of the evidence levels required to consider linkage and association results statistically significant. In the premap era, when there were few markers and genotyping was expensive, the significance cut-off for linkage (lod score of 3) rested on two arguments. First, to minimize the costs of sample collection and genotyping, Morton39 proposed a sequential procedure for sampling and analyzing pedigrees until the evidence in favor of linkage with a marker (expressed as logarithm base 10 of the likelihood ratio) reached the level of 3. This threshold corresponds to a P value of 10−4, using the χ2 approximation for likelihood ratio tests and taking into account that this is a 'one-sided' test40. Such stringent criteria for significance guarded against biases introduced by the sequential sampling procedure. Second, Morton and others used Bayesian arguments to show that, even without adopting a sequential procedure, it was necessary to require such strong evidence to conclude in favor of linkage; given the availability of only a few markers, there is a very small prior probability that any one of these markers is linked to the gene of interest. Substantial evidence is needed to convert this low prior probability of linkage into a high posterior probability. For example, one calculation, based on genome length and the distance over which one could detect linkage between loci, determined a prior probability of 0.02 for linkage between a given locus of interest and a random genome location41,42. To obtain a posterior probability of ≥0.95, so that when one declares linkage there is a probability of ≤0.05 of being mistaken, one applies Bayes' theorem:

Pr(Linkage | Data) = [Pr(Data | Linkage) Pr(Linkage)] / [Pr(Data | Linkage) Pr(Linkage) + Pr(Data | NoLinkage) Pr(NoLinkage)]

Substituting a prior probability of linkage of 0.02, setting the posterior probability to 0.95 and solving for the likelihood ratio Pr(Data | Linkage) / Pr(Data | NoLinkage) shows that this ratio must be ≥1,000, corresponding to a lod score of 3. Originally, then, the stringent threshold for the lod score, or for the P value corresponding to this score, was to protect against too little search: too few pedigrees collected or too few markers tested. The numerous genome-wide linkage studies for mendelian disorders subsequently confirmed the low prior probability of linkage to any preselected single locus.
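This calculation is compact enough to verify directly. The following Python sketch (our illustration, not part of the original derivations) solves Bayes' theorem for the likelihood ratio required to reach a given posterior probability; with a prior of 0.02 and a target posterior of 0.95 it yields approximately 931, which rounds to the conventional ratio of 1,000 and lod score of 3.

```python
# Minimal sketch of the Bayesian argument behind the lod-3 threshold.
import math

def required_likelihood_ratio(prior: float, posterior: float) -> float:
    """Smallest likelihood ratio Pr(Data | Linkage) / Pr(Data | NoLinkage)
    that turns `prior` into `posterior` under Bayes' theorem."""
    return (posterior / (1 - posterior)) * ((1 - prior) / prior)

lr = required_likelihood_ratio(prior=0.02, posterior=0.95)
print(f"required likelihood ratio = {lr:,.0f}")              # ~931, rounded up to 1,000
print(f"corresponding lod score   = {math.log10(lr):.2f}")   # ~2.97, i.e. the familiar 3
```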

When genome-wide sets of mapped markers became available, the problem was reversed: too much search rather than too little. The prior probability that some marker in a genome-wide data set is linked to the locus of interest is 1. Because so many statistical tests are done, however, at least one test will probably yield a false positive result; therefore, one must correct for multiple comparisons. For example, consider a genome scan with 500 microsatellites. To control, at the 0.05 level, the global hypothesis of no linkage anywhere in the genome, we can use a correction level of 0.05/500 = 10−4 for each test, corresponding to a lod score of 3. Such a Bonferroni correction is too conservative when tests are dependent, as is the case in linkage studies done with denser marker sets, where intermarker distances are so small that linkage statistics pick up substantially the same information at adjacent markers. Several investigators used Gaussian process approximations for linkage statistics and determined that use of dense marker sets requires little additional adjustment of the lod score threshold, to 3–3.5 (refs. 43–45). These analyses highlighted the fact that the appropriate correction is based on the number of possible independent tests, rather than on the number of tests specifically carried out. That the several statistical arguments reviewed above suggested the same lod score cut-off has ensured universal acceptance of the above criteria for designating linkage 'significant'.
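The correspondence between the Bonferroni-corrected per-test level and the lod score of 3 can be checked with a short calculation; the sketch below (ours, using the one-sided χ2 approximation mentioned above) converts a per-test P value of 0.05/500 into the equivalent lod threshold.

```python
# Sketch: Bonferroni correction for a 500-marker genome scan, and the
# equivalent lod threshold under the one-sided chi-square approximation.
import math
from scipy.stats import norm

n_tests = 500
alpha_per_test = 0.05 / n_tests               # = 1e-4
z = norm.isf(alpha_per_test)                  # one-sided normal quantile, ~3.72
lod_threshold = z**2 / (2 * math.log(10))     # lod = chi-square statistic / (2 ln 10)
print(f"per-test alpha = {alpha_per_test:.0e}")   # 1e-04
print(f"lod threshold  = {lod_threshold:.2f}")    # ~3.0
```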

Although the field has not reached consensus on significance cut-offs for association studies, assessing such thresholds, as for linkage studies, requires consideration of prior probability and multiple comparisons. Most association studies so far have investigated small numbers of variants in one or a few candidate genes. For such studies, the need to correct for numerous comparisons is a minor issue, a fact that has led to acceptance of nonstringent significance cut-offs. In this case, however, the major problem is that of too little search. Determining appropriate cut-offs for gene association studies is analogous to determining significance for linkage in the premap era, when the acceptance of stringent lod score thresholds prevented the dissemination of false positive linkage results. The prior probability of association of a trait to a single candidate gene is much lower than the prior probability of linkage to such a gene, as association extends over much shorter genomic intervals than does linkage. If one makes the conservative simplifying assumption that the gene was picked at random from the 30,000 genes in the genome, the prior probability is 1/30,000 that a given candidate gene is associated with a trait. Using the same Bayesian arguments presented above for linkage, the likelihood ratio should be ≥550,000 to consider the association significant; assessing association with a χ2 statistic, which asymptotically approximates twice the natural logarithm of the likelihood ratio, this translates into a P value of ≤2.6 × 10−7. Almost no candidate association studies meet this threshold; usually, investigators (and readers) implicitly assume that meaningful prior evidence guides the selection of a candidate gene (i.e., that the prior probability of association is higher than 1/30,000). Estimates of prior probability are inherently subjective and hypothesis-based; the estimate proposed by Morton for linkage achieved acceptance because its assumptions rested on mendelian principles. For association studies, there is no comparable form of prior evidence that can be readily quantified as a probability. Unfortunately, the field has therefore largely chosen to ignore the need to apply stringent cut-offs for gene association studies, so that many of even the most highly publicized results are probably false positives.
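The same Bayesian arithmetic can be applied to the candidate-gene case. In the sketch below (ours, under the stated 1/30,000 prior), the formula yields a required likelihood ratio of ~5.7 × 105, in the same range as the ≥550,000 quoted above, and the corresponding χ2 conversion reproduces the 2.6 × 10−7 P-value cut-off.

```python
# Sketch: the Bayesian argument for a randomly chosen candidate gene
# (prior 1/30,000, target posterior 0.95), followed by conversion to a
# P-value cut-off via the chi-square approximation
# (statistic = 2 ln LR, 1 degree of freedom).
import math
from scipy.stats import chi2

prior, posterior = 1 / 30_000, 0.95
lr = (posterior / (1 - posterior)) * ((1 - prior) / prior)   # ~5.7e5
p_cutoff = chi2.sf(2 * math.log(lr), df=1)                   # ~2.6e-7
print(f"required likelihood ratio = {lr:,.0f}")
print(f"P-value cut-off           = {p_cutoff:.1e}")
```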

Journals could improve the reporting of gene association results by requiring explicit, critical and standardized descriptions of previous evidence for candidate genes. Investigators could propose estimates of prior probability based on such evidence; readers could judge whether these estimates are reasonable. Often such evidence will consist of results from similar association studies. Unequivocally positive results from a different population (i.e., results to be replicated) raise the prior probability; negative results from a similar study lower the prior probability. Authors could use the Genetic Association Database46 to provide a complete summary of previous association studies relevant to their publication.

Positional candidate genes reside in regions that showed linkage in earlier studies. Readers can judge the extent of such evidence—the strength of the linkage finding, the width of the lod score peak, the number of genes in the region, the degree of heterogeneity of the finding between different sets of families and whether linkage and association samples derived from similar populations. The existing evidence is usually softer for 'functional' candidate genes. For example, tryptophan hydroxylase (TPH1), which encodes a key protein in serotonin metabolism, has been widely investigated as a candidate for involvement in abnormal behavioral phenotypes. Yet the recent discovery of a new isoform of TPH47 has cast doubt on numerous association studies, which investigated an isoform that, it is now known, is not even expressed in brain serotonergic neurons. For functional candidate association studies, therefore, authors should be particularly cautious in assuming substantial prior probabilities. Readers must be able to evaluate whether earlier studies used to justify a selected candidate gene corrected for multiple comparisons, or whether any evidence argues against the candidate hypothesis. Some authors have suggested further that association studies should compute a false positive report probability for each result, incorporating the prior probability, the observed P value and the statistical power of the analysis48. Several factors (sample size, variant frequencies and effect size) determine power, and the utility of placing such emphasis on a single false positive report probability score is still unclear49.

The advent of genome-wide association studies will diminish the problem of searching too little and introduce the problem of searching too much. Some information needed to determine statistical cut-offs for such studies is still unavailable, particularly for LD mapping. We do not know how many markers are needed to bring the probability of having at least one marker associated with a disease to 1; unlike in linkage, this number will vary between populations. Furthermore, the structure of dependence between tests for association at nearby markers is unclear. Current initiatives, such as the International HapMap project and LD map–building efforts, may diminish this uncertainty50,51. Proposals advocating direct association studies using intragenic functional variants52 envision that 50,000–100,000 such SNPs will provide genome coverage. Such estimates supply an initial basis for considering statistical cut-offs; with a Bonferroni correction, one needs a P value of <5 × 10−7 to achieve significance. Although this cut-off may be too conservative if there is substantial dependence between the association tests at the various SNPs, we currently lack appropriate models for such possible dependence. Given that association-based genome scans aim to identify multiple genes of relatively small effect, some have proposed implementing a less strict definition of global error. The Bonferroni correction controls the probability of declaring at least one false association, known as the family-wise error rate. An alternative approach controlling the false discovery rate, the proportion of wrong associations among all the identified associations53, is receiving increasing attention54,55,56.
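For readers unfamiliar with the false discovery rate, the step-up procedure of Benjamini and Hochberg53 is easy to state; the following minimal Python sketch (ours) implements it for a vector of P values.

```python
# Minimal sketch of the Benjamini-Hochberg step-up procedure, which
# controls the false discovery rate: the expected proportion of false
# associations among all declared associations.
import numpy as np

def benjamini_hochberg(p_values: np.ndarray, q: float = 0.05) -> np.ndarray:
    """Boolean mask of hypotheses rejected at FDR level q."""
    m = len(p_values)
    order = np.argsort(p_values)
    passed = p_values[order] <= q * np.arange(1, m + 1) / m   # p_(k) <= q*k/m
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0  # largest such k
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True            # reject the k smallest P values
    return rejected

# Toy usage with five hypothetical association P values:
print(benjamini_hochberg(np.array([1e-8, 0.003, 0.04, 0.2, 0.9])))
# -> [ True  True False False False]
```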

An additional issue in association studies, one not faced in linkage studies, is that high-throughput SNP analysis creates the possibility of genotyping numerous variants in sizable candidate regions or for series of candidate genes. This situation has some characteristics of both too little search and too much search. As only limited segments of the genome are evaluated, one must account for the low prior probability of association, but one must also correct for multiple comparisons, taking into consideration the number of possible tests.

Choosing and evaluating phenotypes

Each mapping approach offers advantages and disadvantages for phenotyping. Investigating pedigrees permits collection of deeper phenotypic profiles than is feasible in a population sample; ongoing relationships with pedigree members facilitate extensive and longitudinal assessments. But the phenotypes assessed in a single pedigree may be idiosyncratic to that pedigree or to specific clinicians; this limits the feasibility of combined analyses of pedigrees sampled by different research groups. Large-scale cooperative ASP studies have fostered a more systematic approach to phenotyping, permitting comparability of phenotype definition and assessment between research groups. Association samples, which are easy to collect in clinical settings, may be 'convenience samples', in which phenotypic assessment is superficial. When sufficient resources are devoted to identifying and phenotyping subjects, however, population samples, such as those collected in large cohort studies57,58, have unmatched potential for providing generalizable information on a comprehensive array of phenotypic features59 and for enabling evaluation of phenotypic and environmental variation in relation to genotypic variation. Comprehensive phenotypic databases provide economies of scale for investigating common diseases, but the degree of systematization followed in identifying and phenotyping subjects will determine the utility of such databases.

In selecting phenotypes for linkage and association analyses, investigators must account for low prior probability and multiple comparisons, just as in selecting markers. The low prior probability of 'candidate phenotypes' is similar to the low prior probability of candidate genes. Consider functional variants in a gene implicated in an important biological pathway—for example, the repeat polymorphism in the serotonin transporter promoter region, which has been tested for association to a wide range of behavioral phenotypes, chosen based on their hypothesized physiological connection to serotonergic pathways60. Stringent statistical cut-offs are needed to offset the low prior probability that this variant will influence, among all possible phenotypes, the phenotype chosen by an investigator. Although it is not evident how one can estimate this prior probability, for some phenotypes there is better a priori evidence than for others. For example, it is more probable that the serotonin transporter variant influences phenotypes previously shown to be heritable than phenotypes not known to be heritable; for the latter, a significance cut-off of P < 0.05 is almost certainly too liberal. This low prior probability should inform the interpretation of recent associations reported between this variant and complicated phenotypes, such as functional brain imaging responses to emotional stimuli61 and depression-related phenotypes in interaction with stressful life events62.

Increasing the scale and variety of phenotypic data introduces additional statistical issues. If the analysis plan for considering different phenotypic categorizations is not specified in advance, then there is a risk of inflating the likelihood ratio (for either linkage or association tests) by maximizing the evidence according to disease definition63. The statistical problem of multiple comparisons occurs when researchers investigate multiple phenotypes in the same set of samples; this problem will be exacerbated when investigators begin to analyze comprehensive phenotype databases from large population samples. Applying Bonferroni corrections based on the number of phenotypes evaluated will probably result in exceedingly conservative conclusions: often phenotypes (and hence tests) will be correlated and one expects more than one phenotype to lead to positive mapping results. In this context, false discovery rate approaches may be particularly useful, possibly coupled with resampling procedures to take this dependency into account.
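As an illustration of such a resampling procedure, the sketch below (a purely hypothetical example with simulated data, not a published protocol) uses a max-statistic permutation scheme: shuffling the genotype vector preserves the correlation structure among phenotypes while breaking any genotype-phenotype relationship, yielding an empirical family-wise significance level that respects the dependence between tests.

```python
# Illustrative sketch: permutation-based family-wise threshold across
# correlated phenotypes. Shuffling the genotype vector keeps the
# phenotype correlation structure intact under the null.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 10                             # subjects; correlated phenotypes
genotype = rng.integers(0, 3, size=n)      # hypothetical SNP coded 0/1/2
phenotypes = rng.normal(size=(n, k))       # placeholder phenotype matrix

def max_abs_corr(g, y):
    """Largest |Pearson correlation| between genotype and any phenotype."""
    g_std = (g - g.mean()) / g.std()
    y_std = (y - y.mean(axis=0)) / y.std(axis=0)
    return np.abs(g_std @ y_std / len(g)).max()

observed = max_abs_corr(genotype, phenotypes)
null = np.array([max_abs_corr(rng.permutation(genotype), phenotypes)
                 for _ in range(1000)])
print(f"empirical family-wise P = {(null >= observed).mean():.3f}")
```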

Another statistical issue arises when authors report only a subset of the possible phenotypic categorizations of their data. Authors should indicate explicitly the phenotype combinations that yield negative mapping results and state how they use phenotypic information to guide the extension of pedigrees. In the absence of such information, readers may assume that the phenotypes used for genetic analyses are idiosyncratic and hard to relate to prior probabilities for linkage or association. Readers will be better able to evaluate mapping studies involving multiple phenotypic categorizations if they are provided with details about the procedures used in all stages of phenotyping and phenotype-genotype analyses. For example, in mapping stroke susceptibility loci, deCODE obtained the strongest evidence using unconventional phenotypic categorizations3,64. If the authors provided detailed phenotyping information on a website, readers could judge whether criticism of this approach is valid64 or simply another example of the gulf between biologic and epidemiologic approaches65.

Extended pedigree studies

Three recent developments have revived interest in pedigree studies of common diseases. First, theoretical studies have suggested that pedigree approaches may be the most powerful for identifying quantitative trait loci underlying disease phenotypes (endophenotypes)66,67; these quantitative trait loci may have a simpler genetic architecture than the disease diagnoses and may therefore be more straightforward to map. Endophenotype mapping has not yet been implemented in humans on a sufficient scale to judge its success. Second, new methods can efficiently compute linkage statistics in extended pedigrees28,29. Third, deCODE has published pedigree-based linkage findings for numerous diseases. deCODE obtained access to the Icelandic population, with its medical records and genealogy, and used this information to assemble large pedigrees. The genealogies enabled reconstruction of most connections between distantly related individuals. The medical records provided wide-ranging phenotypic information on most family members. deCODE focused on large pedigrees with distantly related affected members, who are expected to share shorter genome segments around a disease gene than the more closely related affected individuals in small pedigrees. Hence, deCODE conducted genome scans using denser marker sets than those used by most groups68, analyzing linkage in entire pedigrees with programs developed by its scientists69 and considering several different combinations of phenotypic information in these analyses.

Although the scale of deCODE's extended pedigree studies is unusual, numerous research groups are using similar approaches, mainly in families from relatively closed populations70,71,72,73,74,75. These populations exist throughout the world and are characterized by low immigration, low emigration and distribution over relatively small areas, so that most subjects and their medical records are available to investigators. From these communities it is feasible to obtain nearly complete genealogies, a crucial step in conducting adequately powered studies.

The power of pedigree studies is a topic of great current interest. Most of deCODE's studies have involved genotyping several hundred affected individuals, using >1,000 markers. Although each study has yielded interesting results, leading to fine-mapping and gene-identification efforts, several have failed to achieve clear statistical significance2,5,37,38. deCODE's experiences suggest two avenues for extended pedigree designs. First, for diseases that do not yield unequivocal linkage results in large samples, implementation of endophenotype mapping may be particularly attractive. Second, to obtain adequate power investigators may need to combine pedigree samples from different countries, perhaps from genetically related populations76.

For common diseases, the extended pedigree approach has so far failed, other than for rare early-onset forms of these diseases, to fulfill the expectation that it would identify high-penetrance variants that illuminate biological pathways77,78. Pedigree studies of common forms of these diseases have led to positional candidate association studies that have provided intriguing, but mainly statistically equivocal, evidence for variants that may have a role in disease susceptibility; variants identified so far do not have the biological effects of most mutations underlying mendelian disorders2,3,4,5,79. In this respect, the field is eagerly awaiting the results of fine-mapping studies for several diseases being undertaken by deCODE and others.

ASP studies

As genealogy-based pedigree studies require well-demarcated, stable populations, and as most phenotyped individuals live in other settings, the genetics field requires other paradigms. The ASP strategy enabled numerous investigators with access to well-phenotyped clinical samples to initiate linkage studies. Owing to the influence of Risch's theoretical work, the requirement for relatively few markers, and the development of improved statistical analysis programs80,81, this approach now predominates for genome-wide mapping of common diseases. The use of inadequate sample sizes probably explains why most published ASP studies have reported negative or equivocal results, particularly for phenotypes with small effect sizes (low genotype relative risk). The launching of so many underpowered studies exemplifies how the field incorrectly interprets the conclusions of theoretical studies but also reflects the substantial resources required to collect adequate ASP samples. By forming consortia to obtain such samples, investigators are beginning to obtain the results predicted by Risch and others. Crohn disease is an example. Independent ASP studies suggested that several possible loci on different chromosomes were involved in Crohn disease82,83,84. Many, but not all, studies implicated a locus on chromosome 16; some of these studies, on their own, barely highlighted this region. Formation of an international consortium to investigate more than 600 ASPs from these several studies generated an unequivocal linkage finding on chromosome 16 (IBD1; ref. 85), which led to identification of the gene underlying the linkage of IBD1 with susceptibility to Crohn disease (CARD15; ref. 86). This example shows that the ASP design is well suited for combining samples from different countries. Unlike extended pedigree approaches, ASP studies are readily coordinated between sites and do not depend on genealogic efforts. Compared with association studies, ASP approaches are robust to differences in the genetic composition of the study populations. One caveat concerns comparison of ASP genome scans. Given the typically sparse marker sets used in such scans (<500 markers), false negative results may result from excessive gaps in genome coverage, for example, if a particular marker fails in one of the data sets. The problem is exacerbated by the fact that different studies use different markers, as illustrated by recent scans for rheumatoid arthritis87,88. The increasing interest in combining data from different scans suggests the need to use denser, more uniform marker sets in future ASP studies.

The advantages of ASP designs for multi-site projects are now being exploited by large-scale studies that will support investigations beyond gene mapping. For example, the GenomEUtwin project89, incorporating almost one million twins from several countries, will be powerful for ASP linkage studies of several phenotypes. The extensive longitudinal data collected will permit evaluation of numerous environmental variables; even the richest pedigree is poorly suited to investigating questions relating to gene-environment interactions, and the non-independence of members of pedigrees complicates statistical analyses that are straightforward in ASPs.

Association studies

Association studies are the focus of much current interest. Genome-wide association studies to identify risk variants for common diseases have mainly been limited, to date, to recently founded population isolates, in which microsatellites detect LD over distances up to several centimorgans90,91. Although some remain skeptical about genome-wide LD mapping using SNPs92, identification of an identical asthma-associated SNP haplotype in Finland and Quebec indicates the tremendous potential of this approach, at least in population isolates93. A few unequivocal candidate-gene associations, such as that between the apolipoprotein E4 allele (ApoE4) and Alzheimer disease, illustrate the kind of information we can expect from successful association studies. Numerous studies showed that ApoE4 is the most important risk factor for Alzheimer disease94. Although this finding has generated fewer biological insights than has the identification, through pedigree studies, of genes implicated in rare, mendelian forms of Alzheimer disease78, it has transformed epidemiological and clinical investigation of dementia and related phenotypes. Consequently, it is now known that ApoE4 is associated with the age of onset of Alzheimer disease95, the process of cognitive decline in 'normal' aging96, altered magnetic resonance imaging findings in asymptomatic individuals97,98, risk of chronic traumatic brain injury in boxers99 and clinical outcome in survivors of traumatic brain injury100.

Conclusion

Developments in genotyping technology and statistical methodology will soon make adequately powered pedigree, ASP and association studies feasible for most common diseases. The specific biological and epidemiological questions that an investigator aims to answer will then dictate the choice of study design, and it will be difficult to justify designs (such as candidate gene association studies) merely because they are convenient or inexpensive. A currently practical step in this direction would be for funding agencies to start rejecting such justifications. They could also require investigators to indicate explicitly the reasoning, the evidence underlying the reasoning and the procedures to be used to address statistical issues of prior probability, power and multiple comparisons; journals could take the same stance with authors.

Insistence by the field on more stringent standards of evidence will encourage investigators, using any mapping approach, to increase sample sizes. In most cases this step will require combining samples from different sites. Incompatibility of phenotypic data between sites will probably impede this process; funding agencies should support efforts to standardize such data, both new and existing. The feasibility of combining samples may be specific to the setting. For example, extended pedigree studies will mainly be undertaken in well-demarcated populations and increasing sample size may require identifying suitable companion populations. The combination of samples between studies will also require greater efforts to ensure that the marker data from genome-wide analyses are compatible and provide complete genome coverage; this is already apparent for pedigree and ASP studies and will be even more important for association studies.