Abstract
Efforts to identify gene variants associated with susceptibility to common diseases use three approaches: linkage studies of extended pedigrees, linkage studies of affected sib pairs and association studies of population samples. The different aims of these study designs reflect their derivation from biological versus epidemiological traditions. Similar principles for determining the evidence levels required to consider the results statistically significant, however, apply to both linkage and association studies. Such determination requires explicit attention to the prior probability of particular findings, as well as appropriate correction for multiple comparisons. For most common diseases, increasing the sample size in a study is a crucial step in achieving statistically significant genetic mapping results. Recent studies suggest that the technology and statistical methodology will soon be available to make well-powered studies feasible using any of these approaches.
Main
The scientific community has, until recently, viewed with disappointment the results of genetic investigations of common diseases. Now, several widely publicized studies have generated an opposite impression. The staff of Science voted that identification of 'genes for mental illness' represented a top research 'breakthrough' of 2003 (ref. 1). deCODE Genetics, a company attempting to identify comprehensively the genetic contributions to common diseases in Iceland, has reported candidate genes for underlying susceptibility to several diseases2,3,4,5. Considering the far more numerous publications describing negative or equivocal findings, it is uncertain how we should assess the current state of common-trait genetics. Have we won the war, or even a few battles? Or, more pessimistically, have we simply declared victory? To answer these questions we must first ask others. Has one strategy emerged as the best for identifying disease genes? How much and what kind of evidence is enough to establish linkage and association? Which samples should be used and how many markers are needed to genotype them? These questions plague investigators and reviewers.
Four factors fuel confusion in the field: (i) inadequate awareness that different approaches for genetic investigation are expected to yield different types of findings; (ii) insufficient recognition of the dependence of each approach on the availability of appropriate technologies and statistical methods; (iii) inconsistent standards for interpreting levels of statistical evidence; and (iv) nonstandardized strategies for choosing and evaluating phenotypes. Here we discuss each factor, suggesting possible means to diminish confusion, and review the current status of each approach.
Different approaches should yield different findings
For most common diseases, the literature includes findings from pedigree and affected sib-pair (ASP) linkage studies and from association studies of population samples6. Controversy regarding the utility of these different approaches springs from the incorrect impression that they are perfect substitutes for each other. Historically, the different mapping approaches arose from different 'traditions', and a dichotomy existed between rare and common diseases. In the former, the experimental roots of mendelism produced a biological orientation, even before the gene mapping era. In the latter, an epidemiological orientation has predominated, as exemplified by studies of schizophrenia. Before the early twentieth century, psychiatrists considered psychosis a single entity. Kraepelin separated dementia precox (now schizophrenia) from manic depressive psychosis (now bipolar disorder), largely based on observations of the aggregation of each syndrome in different families7. By the mid-twentieth century, twin studies established the genetic basis of schizophrenia8,9; numerous genetic epidemiologic investigations of the disorder continue to this day10. Epidemiologic traditions fostered the development of ASP and association approaches, which depend on epidemiologic data for estimating disease risks in relatives and in the population. Family studies of schizophrenia11,12,13 provided the risk data used as examples in laying out the rationale for ASP genome scans of common diseases14,15.
The aims of different mapping approaches also reflect the dichotomy between biological and epidemiological traditions. High-penetrance variants segregate in extended pedigrees; pedigree studies aim to identify such variants to illuminate biological pathways and processes. Population samples are best for identifying low-penetrance variants; association studies aim to elucidate the contribution of such variants to disease distributions observed in populations. ASP and association studies are more systematic than pedigree studies in ascertaining and recruiting subjects and have aims beyond gene mapping, such as identifying interactions between environmental and genetic variables.
Appropriate technology and statistics for each approach
Theoretical conceptualizations of particular approaches often came several years before technological and statistical advances that made such designs practical. An influential proposal to use mapped DNA markers for genome-wide linkage analysis of mendelian diseases16 preceded by three years the chromosomal localization of the gene mutated in Huntington disease (HD)17, which suggested the feasibility of such mapping. The mapping of HD fortuitously used one of the few DNA markers then available; this and other early linkage findings spurred the development of genetic maps, which made linkage studies of mendelian disorders routine. The consequent proliferation of linkage studies required computationally efficient methods for genome-wide statistical analysis and fostered the growth of the field of statistical genetics. The difficulty of isolating the first mapped genes18,19,20 spurred the development of physical maps and the adaptation of linkage disequilibrium (LD) analysis (previously a tool for basic population genetics) for fine-scale mapping20,21.
A similar mismatch between what was theoretically correct and what was technologically feasible characterizes the more recent history of mapping studies of common diseases. In the late 1980s, several linkage findings established the investigation of extended pedigrees as the predominant paradigm for mapping common diseases22,23,24. Investigators began to question this approach when findings could not be replicated and were recognized as false positives25,26. At the same time, theoretical statistical studies suggested ASP and association studies as alternatives to pedigree studies14,15,27. The questioning of pedigree approaches on theoretical grounds obscured the fact that, until recently, most pedigree studies of common diseases were underpowered and possibly tested too few markers. In the past few years it has become feasible to genotype large pedigrees cheaply with many more microsatellites than were previously used for genome scans, and statistical programs now available permit efficient computation of linkage even in complex pedigrees28,29. These advances allowed a substantial increase in the scale of pedigree-based linkage studies2,4,30,31,32,33,34,35,36,37,38. Inadequate technology and statistical methodology have similarly hindered implementation of alternatives to pedigree-based mapping. Although it remains uncertain how association studies should be designed and analyzed, the technology for inexpensive high-throughput genotyping of single-nucleotide polymorphisms (SNPs) is maturing rapidly, accompanied by a surge of activity in the development of statistical methods.
Levels of statistical evidence
Statistical methodologies respond to what is technologically feasible, a fact that has guided determinations of the evidence levels required to consider linkage and association results statistically significant. In the premap era, when there were few markers and genotyping was expensive, the significance cut-off for linkage (lod score of 3) rested on two arguments. First, to minimize the costs of sample collection and genotyping, Morton39 proposed a sequential procedure for sampling and analyzing pedigrees until the evidence in favor of linkage with a marker (expressed as logarithm base 10 of the likelihood ratio) reached the level of 3. This threshold corresponds to a P value of 10⁻⁴, using the χ2 approximation for likelihood ratio tests and taking into account that this is a 'one-sided' test40. Such stringent criteria for significance guarded against biases introduced by the sequential sampling procedure. Second, Morton and others used Bayesian arguments to show that, even without a sequential procedure, such strong evidence was necessary to conclude in favor of linkage: given the availability of only a few markers, there is a very small prior probability that one of these markers is linked to the gene of interest. Substantial evidence is needed to convert this low prior probability of linkage into a high posterior probability. For example, one calculation based on genome length and the distance between loci over which one could detect linkage determined a prior probability of 0.02 for linkage between a given locus of interest and a random genome location41,42. To obtain a posterior probability of ≥0.95, so that when one declares linkage there is a probability of ≤0.05 of being mistaken, one applies Bayes' theorem:

$$\Pr(\text{Linkage}\mid\text{Data}) = \frac{\Pr(\text{Data}\mid\text{Linkage})\,\Pr(\text{Linkage})}{\Pr(\text{Data}\mid\text{Linkage})\,\Pr(\text{Linkage}) + \Pr(\text{Data}\mid\text{NoLinkage})\,\Pr(\text{NoLinkage})}$$
Substituting the prior probability of linkage of 0.02, setting the posterior probability to 0.95 and solving for the likelihood ratio Pr(Data | Linkage) / Pr(Data | NoLinkage) shows that this ratio must be ≥∼1,000, corresponding to a lod score of 3. Originally, then, the stringent threshold for the lod score, or for the P value corresponding to this score, protected against too little search: too few pedigrees collected or too few markers tested. The numerous genome-wide linkage studies for mendelian disorders subsequently confirmed the low prior probability of linkage to any preselected single locus.
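The odds form of Bayes' theorem (posterior odds = likelihood ratio × prior odds) makes this calculation easy to check. The short Python sketch below, whose function names are illustrative and not taken from any linkage package, solves for the smallest likelihood ratio that raises a prior of 0.02 to a posterior of 0.95, and confirms that a likelihood ratio of 1,000 (lod 3) does so.

```python
import math


def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Pr(Linkage | Data) from Pr(Linkage) and Pr(Data|Linkage)/Pr(Data|NoLinkage)."""
    numerator = likelihood_ratio * prior
    return numerator / (numerator + (1.0 - prior))


def required_likelihood_ratio(prior: float, posterior: float) -> float:
    """Smallest likelihood ratio that lifts `prior` to `posterior`.

    Uses the odds form of Bayes' theorem: posterior odds = LR * prior odds.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = posterior / (1.0 - posterior)
    return posterior_odds / prior_odds


lr = required_likelihood_ratio(prior=0.02, posterior=0.95)
print(f"required LR = {lr:.0f}, lod = {math.log10(lr):.2f}")
print(f"posterior at lod 3 = {posterior_probability(0.02, 1000.0):.3f}")
```

The exact requirement works out to a likelihood ratio of about 931 (lod ≈ 2.97), which rounds to the conventional threshold of 1,000; at a likelihood ratio of 1,000 the posterior probability is about 0.953.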
When genome-wide sets of mapped markers became available, the problem was reversed: too much search rather than too little. The prior probability that some marker in a genome-wide data set is linked to the locus of interest is 1. Because so many statistical tests are done, however, at least one test will probably yield a false positive result; therefore, one must correct for multiple comparisons. For example, consider a genome scan with 500 microsatellites. To control, at the 0.05 level, the global hypothesis of no linkage anywhere in the genome, we can use a correction level of 0.05/500 = 10⁻⁴ for each test, corresponding to a lod score of 3. Such a Bonferroni correction is too conservative when tests are dependent, as is the case in linkage studies done with denser marker sets, where intermarker distances are so small that linkage statistics capture substantially the same information at adjacent markers. Several investigators used Gaussian process approximations for linkage statistics and determined that the use of dense marker sets requires only a modest increase in the lod score threshold, to 3–3.5 (refs. 43–45). These analyses highlighted the fact that the appropriate correction is based on the number of possible independent tests, rather than on the number of tests actually carried out. That the several statistical arguments reviewed above suggested the same lod score cut-off has ensured universal acceptance of these criteria for designating linkage 'significant'.
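The correspondence between a per-test significance level and a lod score can be made concrete with a small Python sketch. It assumes, as in the text, a 1-df χ2 approximation in which the test statistic equals 2 ln(10) times the lod score, and a one-sided test, so that the P value is half the χ2 tail probability; the function names are illustrative only.

```python
import math
from statistics import NormalDist

LN10 = math.log(10.0)


def lod_to_one_sided_p(lod: float) -> float:
    """One-sided P value for a lod score under the 1-df chi-square approximation."""
    chi2 = 2.0 * LN10 * lod
    # Upper tail of a 1-df chi-square is erfc(sqrt(chi2 / 2)); halve it for one side.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


def one_sided_p_to_lod(p: float) -> float:
    """Lod score whose one-sided P value equals p (inverse of the function above)."""
    z = -NormalDist().inv_cdf(p)  # standard normal deviate with upper tail p
    return z * z / (2.0 * LN10)


def bonferroni_level(alpha: float, n_tests: int) -> float:
    """Per-test level that controls the global (family-wise) error at alpha."""
    return alpha / n_tests


per_test = bonferroni_level(0.05, 500)
print(f"per-test level {per_test:.0e} -> lod {one_sided_p_to_lod(per_test):.2f}")
print(f"lod 3 -> one-sided P {lod_to_one_sided_p(3.0):.2e}")
```

Under these assumptions a per-test level of 10⁻⁴ maps to a lod score of almost exactly 3, and a lod score of 3 maps back to a one-sided P value of about 1.0 × 10⁻⁴.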
Although the field has not reached consensus on significance cut-offs for association studies, assessing such thresholds, as for linkage studies, requires consideration of prior probability and multiple comparisons. Most association studies so far have investigated small numbers of variants in one or a few candidate genes. For such studies, the need to correct for numerous comparisons is a minor issue, a fact that has led to acceptance of nonstringent significance cut-offs. In this case, however, the major problem is that of too little search. Determining appropriate cut-offs for gene association studies is analogous to determining significance for linkage in the premap era, when the acceptance of stringent lod score thresholds prevented the dissemination of false positive linkage results. The prior probability of association of a trait with a single candidate gene is much lower than the prior probability of linkage to such a gene, as association extends over much shorter genomic intervals than does linkage. If one makes the conservative simplifying assumption that the gene was picked at random from the ∼30,000 genes in the genome, the prior probability is ∼1/30,000 that a given candidate gene is associated with a trait. Using the same Bayesian arguments presented above for linkage, the likelihood ratio should be ≥∼570,000 to consider the association significant; assessing association with a χ2 test, which asymptotically approximates twice the natural logarithm of the likelihood ratio, this translates into a P value of ≤2.6 × 10⁻⁷. Almost no candidate association studies meet this threshold; usually, investigators (and readers) implicitly assume that meaningful prior evidence guides the selection of a candidate gene (i.e., that the prior probability of association is higher than 1/30,000).
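The same odds calculation can be applied to a candidate gene. The Python sketch below assumes a posterior target of 0.95, a 1-df χ2 statistic equal to 2 ln(LR) and a two-sided P value; the priors of 1/1,000 and 1/100 are hypothetical values included only to show how stronger prior evidence relaxes the required P value, and the function names are illustrative.

```python
import math


def required_likelihood_ratio(prior: float, posterior: float = 0.95) -> float:
    """Likelihood ratio needed to move `prior` to `posterior` (odds form of Bayes)."""
    return (posterior / (1.0 - posterior)) / (prior / (1.0 - prior))


def chi2_1df_p_from_lr(lr: float) -> float:
    """Two-sided P value for a 1-df chi-square statistic equal to 2 ln(LR)."""
    chi2 = 2.0 * math.log(lr)
    return math.erfc(math.sqrt(chi2 / 2.0))


# 1/30,000 is the random-gene prior from the text; the others are hypothetical.
for prior in (1 / 30_000, 1 / 1_000, 1 / 100):
    lr = required_likelihood_ratio(prior)
    print(f"prior={prior:.2e}  LR>={lr:,.0f}  P<={chi2_1df_p_from_lr(lr):.1e}")
```

For the random-gene prior of 1/30,000 this gives a likelihood ratio of about 570,000 and a P value of about 2.6 × 10⁻⁷.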
Estimates of prior probability are inherently subjective and hypothesis-based; the estimate proposed by Morton for linkage achieved acceptance because its assumptions used mendelian principles. For association studies, there is no comparable form of prior evidence that can be readily quantified in a probability. Unfortunately, the field has therefore largely chosen to ignore the need to apply stringent cut-offs for gene association studies, so that many of even the most highly publicized results are probably false positives.
Journals could improve the reporting of gene association results by requiring explicit, critical and standardized descriptions of previous evidence for candidate genes. Investigators could propose estimates of prior probability based on such evidence; readers could judge whether these estimates are reasonable. Often such evidence will consist of results from similar association studies. Unequivocally positive results from a different population (i.e., results to be replicated) raise the prior probability; negative results from a similar study lower the prior probability. Authors could use the Genetic Association Database46 to provide a complete summary of previous association studies relevant to their publication.
Positional candidate genes reside in regions that showed linkage in earlier studies. Readers can judge the extent of such evidence: the strength of the linkage finding, the width of the lod score peak, the number of genes in the region, the degree of heterogeneity of the finding between different sets of families and whether linkage and association samples derived from similar populations. The existing evidence is usually softer for 'functional' candidate genes. For example, tryptophan hydroxylase (TPH1), which encodes a key enzyme in serotonin synthesis, has been widely investigated as a candidate for involvement in abnormal behavioral phenotypes. Yet the recent discovery of a new isoform of TPH (ref. 47) has cast doubt on numerous association studies, which investigated an isoform that, it is now known, is not even expressed in brain serotonergic neurons. For functional candidate association studies, therefore, authors should be particularly cautious in assuming substantial prior probabilities. Readers must be able to evaluate whether earlier studies used to justify a selected candidate gene corrected for multiple comparisons, or whether any evidence argues against the candidate hypothesis. Some authors have suggested further that association studies should compute a false positive report probability for each result, incorporating the prior probability, the observed P value and the statistical power of the analysis48. Several factors (sample size, variant frequencies and effect size) determine power, and the utility of placing such emphasis on a single false positive report probability score is still unclear49.
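A false positive report probability of the kind described above is commonly written as FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the significance level (or observed P value), 1 − β is the power and π is the prior probability of a true association. The Python sketch below implements that expression; the values α = 0.05, power 0.8 and the first two priors are hypothetical illustrations, not figures from any study.

```python
def false_positive_report_probability(alpha: float, power: float, prior: float) -> float:
    """Probability that a result declared significant at `alpha` is a false positive.

    alpha: significance level or observed P value
    power: probability of detecting a true association (1 - beta)
    prior: prior probability that the association is real
    """
    false_pos = alpha * (1.0 - prior)  # null true, test significant
    true_pos = power * prior           # association real, test significant
    return false_pos / (false_pos + true_pos)


# Hypothetical priors; 1/30,000 matches the random-gene prior discussed earlier.
for prior in (1e-1, 1e-3, 1 / 30_000):
    fprp = false_positive_report_probability(alpha=0.05, power=0.8, prior=prior)
    print(f"prior={prior:.1e}  FPRP={fprp:.3f}")
```

Even with good power, a nominal P value of 0.05 leaves the false positive report probability close to 1 when the prior is as low as 1/30,000, which is the sense in which a single score depends heavily on the assumed prior.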
The advent of genome-wide association studies will diminish the problem of searching too little and introduce the problem of searching too much. Some information needed to determine statistical cut-offs for such studies is still unavailable, particularly for LD mapping. We do not know how many markers are needed to bring the probability of having at least one marker associated with a disease to 1; unlike in linkage, this number will vary between populations. Furthermore, the structure of dependence between tests for association at nearby markers is unclear. Current initiatives, such as the International HapMap project and LD map–building efforts, may diminish this uncertainty50,51. Proposals advocating direct association studies using intragenic functional variants52 envision that ∼50,000–100,000 such SNPs will provide genome coverage. Such estimates supply an initial basis for considering statistical cut-offs; with a Bonferroni correction for 100,000 tests, one needs a P value of <5 × 10⁻⁷ (0.05/100,000) to achieve significance. Although this cut-off may be too conservative if there is substantial dependence between the association tests at the various SNPs, we currently lack appropriate models for such possible dependence. Given that association-based genome scans aim to identify multiple genes of relatively small effect, some have proposed implementing a less strict definition of global error. The Bonferroni correction controls the probability of declaring at least one false association, known as the family-wise error rate. An alternative approach, controlling the false discovery rate (the expected proportion of false associations among all declared associations)53, is receiving increasing attention54,55,56.
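The contrast between controlling the family-wise error rate and the false discovery rate can be illustrated with a small Python sketch. It assumes a hypothetical scan of 100,000 SNPs in which five invented P values stand in for true signals and the remaining P values are spread evenly over (0, 1) to stand in for null markers; the false discovery rate procedure shown is the Benjamini-Hochberg step-up rule, one standard way of controlling that rate.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test cut-off that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    # Find the largest rank k whose ordered P value is <= k * q / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


m = 100_000
signals = [1e-9, 2e-7, 8e-7, 1.5e-6, 4e-6]            # hypothetical true associations
n_null = m - len(signals)
nulls = [(i + 1) / (n_null + 1) for i in range(n_null)]  # evenly spread stand-in nulls
p_values = signals + nulls

bonf = bonferroni_threshold(0.05, m)
print("Bonferroni rejections:", sum(p <= bonf for p in p_values))
print("BH rejections:", len(benjamini_hochberg(p_values, q=0.05)))
```

With these toy values the Bonferroni threshold of 5 × 10⁻⁷ retains two SNPs, whereas the step-up rule at q = 0.05 retains four, illustrating why false discovery rate control appeals when many small effects are expected.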
An additional issue in association studies, one not faced in linkage studies, is that high-throughput SNP analysis creates the possibility of genotyping numerous variants in sizable candidate regions or for series of candidate genes. This situation has some characteristics of both too little search and too much search. As only limited segments of the genome are evaluated, one must account for the low prior probability of association, but one must also correct for multiple comparisons, taking into consideration the number of possible tests.
Choosing and evaluating phenotypes
Each mapping approach offers advantages and disadvantages for phenotyping. Investigating pedigrees permits collection of deeper phenotypic profiles than is feasible in a population sample; ongoing relationships with pedigree members facilitate extensive and longitudinal assessments. But the phenotypes assessed in a single pedigree may be idiosyncratic to that pedigree or to specific clinicians; this limits the feasibility of combined analyses of pedigrees sampled by different research groups. Large-scale cooperative ASP studies have fostered a more systematic approach to phenotyping, permitting comparability of phenotype definition and assessment between research groups. Association samples, which are easy to collect in clinical settings, may be 'convenience samples', in which phenotypic assessment is superficial. When sufficient resources are devoted to identifying and phenotyping subjects, however, population samples, such as those collected in large cohort studies57,58, have unmatched potential for providing generalizable information on a comprehensive array of phenotypic features59 and for enabling evaluation of phenotypic and environmental variation in relation to genotypic variation. Comprehensive phenotypic databases provide economies of scale for investigating common diseases, but the degree of systematization followed in identifying and phenotyping subjects will determine the utility of such databases.
In selecting phenotypes for linkage and association analyses, investigators must account for low prior probability and multiple comparisons, just as in selecting markers. The low prior probability of 'candidate phenotypes' is similar to the low prior probability of candidate genes. Consider functional variants in a gene implicated in an important biological pathway—for example, the repeat polymorphism in the serotonin transporter promoter region, which has been tested for association to a wide range of behavioral phenotypes, chosen based on their hypothesized physiological connection to serotonergic pathways60. Stringent statistical cut-offs are needed to offset the low prior probability that this variant will influence, among all possible phenotypes, the phenotype chosen by an investigator. Although it is not evident how one can estimate this prior probability, for some phenotypes there is better a priori evidence than for others. For example, it is more probable that the serotonin transporter variant influences phenotypes previously shown to be heritable than phenotypes not known to be heritable; for the latter, a significance cut-off of P < 0.05 is almost certainly too liberal. This low prior probability could influence interpretation of recent association results for this variant for complicated phenotypes. Examples include functional brain imaging results in response to emotional stimuli61 and depression-related phenotypes when related to stressful life events62.
Increasing the scale and variety of phenotypic data introduces additional statistical issues. If the analysis plan for considering different phenotypic categorizations is not specified in advance, then there is a risk of inflating the likelihood ratio (for either linkage or association tests) by maximizing the evidence according to disease definition63. The statistical problem of multiple comparisons occurs when researchers investigate multiple phenotypes in the same set of samples; this problem will be exacerbated when investigators begin to analyze comprehensive phenotype databases from large population samples. Applying Bonferroni corrections based on the number of phenotypes evaluated will probably result in exceedingly conservative conclusions: often phenotypes (and hence tests) will be correlated and one expects more than one phenotype to lead to positive mapping results. In this context, false discovery rate approaches may be particularly useful, possibly coupled with resampling procedures to take this dependency into account.
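One way to let resampling absorb the correlation among phenotypes is to permute genotypes across individuals while keeping each individual's phenotype vector intact. The Python sketch below uses a single-step max-statistic permutation adjustment (in the style of Westfall and Young), which controls the family-wise error rather than the false discovery rate; resampling can likewise be combined with false discovery rate procedures. The data, effect sizes, the absolute Pearson correlation as test statistic and all function names are hypothetical choices made for illustration.

```python
import math
import random


def abs_pearson(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def maxt_adjusted_p(genotype, phenotypes, n_perm=499, seed=1):
    """Single-step max-statistic permutation adjustment across phenotypes.

    Genotypes are shuffled across individuals while each individual's
    phenotypes stay together, so correlation among phenotypes is preserved.
    """
    rng = random.Random(seed)
    observed = [abs_pearson(genotype, ph) for ph in phenotypes]
    exceed = [0] * len(phenotypes)
    shuffled = list(genotype)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Largest statistic over all phenotypes in this permuted data set.
        max_stat = max(abs_pearson(shuffled, ph) for ph in phenotypes)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # Add-one correction keeps permutation P values strictly positive.
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Hypothetical data: 200 individuals, one SNP coded 0/1/2, three phenotypes.
rng = random.Random(42)
genotype = [rng.choice((0, 1, 2)) for _ in range(200)]
ph1 = [0.5 * g + rng.gauss(0, 1) for g in genotype]  # influenced by the SNP
ph2 = [0.8 * p + rng.gauss(0, 0.6) for p in ph1]      # correlated with ph1
ph3 = [rng.gauss(0, 1) for _ in genotype]             # unrelated noise
print(maxt_adjusted_p(genotype, [ph1, ph2, ph3]))
```

Because the maximum is taken over phenotypes within each permutation, correlated phenotypes such as ph1 and ph2 are penalized less than a Bonferroni correction for three independent tests would penalize them.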
Another statistical issue arises when authors report only some of the possible phenotypic assortment from their data. Authors should indicate explicitly the phenotype combinations that yield negative mapping results and state how they use phenotypic information to guide the extension of pedigrees. In the absence of such information, readers may assume that the phenotypes used for genetic analyses are idiosyncratic and hard to relate to prior probabilities for linkage or association. Readers will be better able to evaluate mapping studies involving multiple phenotypic categorizations if they are provided with details about the procedures used in all stages of phenotyping and phenotype-genotype analyses. For example, in mapping stroke susceptibility loci, deCODE obtained the strongest evidence using unconventional phenotypic categorizations3,64. If the authors provided detailed phenotyping information on a website, readers could judge whether criticism of this approach is valid64 or simply another example of the gulf between biologic and epidemiologic approaches65.
Extended pedigree studies
Three recent developments have revived interest in pedigree studies of common diseases. First, theoretical studies have suggested that pedigree approaches may be the most powerful for identifying quantitative trait loci underlying disease phenotypes (endophenotypes)66,67; these quantitative trait loci may have a simpler genetic architecture than the disease diagnoses and may therefore be more straightforward to map. Endophenotype mapping has not yet been implemented in humans on a sufficient scale to judge its success. Second, new methods can efficiently compute linkage statistics in extended pedigrees28,29. Third, deCODE has published pedigree-based linkage findings for numerous diseases. deCODE obtained access to the Icelandic population, with its medical records and genealogy, and used this information to assemble large pedigrees. The genealogies enabled reconstruction of most connections between distantly related individuals. The medical records provided wide-ranging phenotypic information on most family members. deCODE focuses on large pedigrees with distantly related affected members, who are expected to share shorter genome segments around a disease gene than the more closely related affected individuals in small pedigrees. Hence, deCODE conducted genome scans using denser marker sets than those used by most groups68, analyzing linkage in entire pedigrees with programs developed by its scientists69 and considering several different combinations of phenotypic information in these analyses.
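Why distant relationships call for denser markers can be seen from a standard approximation. Assuming the Haldane map function (crossovers occurring as a Poisson process with no interference) and ignoring chromosome ends, the segment shared identical by descent around a locus ends, on each side, at the first crossover in any of the m meioses separating two relatives, so its expected total length is about 200/m cM. The Python sketch below applies this; the 10 cM marker spacing is a hypothetical value for a sparse scan.

```python
def expected_shared_segment_cm(meioses: int) -> float:
    """Approximate expected length (cM) of the IBD segment around a shared locus.

    Assumes the Haldane map (no interference) and ignores chromosome ends:
    each side of the segment is exponential with mean 100/meioses cM.
    """
    if meioses < 1:
        raise ValueError("meioses must be positive")
    return 200.0 / meioses


# Meioses separating two relatives through a single common ancestor.
relationships = {"half sibs": 2, "first cousins": 4, "second cousins": 6, "third cousins": 8}
for name, m in relationships.items():
    length = expected_shared_segment_cm(m)
    # Expected number of markers inside the segment at a hypothetical 10 cM spacing.
    print(f"{name:15s} ~{length:5.1f} cM; ~{length / 10:.1f} markers at 10 cM spacing")
```

Under these assumptions the shared segment shrinks from about 100 cM for half sibs to about 25 cM for third cousins, so a sparse map places only a few markers inside the region that distantly related affected individuals actually share.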
Although the scale of deCODE's extended pedigree studies is unusual, numerous research groups are using similar approaches, mainly in families from relatively closed populations70,71,72,73,74,75. These populations exist throughout the world and are characterized by low immigration, low emigration and distribution over relatively small areas, so that most subjects and their medical records are available to investigators. From these communities it is feasible to obtain nearly complete genealogies, a crucial step in conducting adequately powered studies.
The power of pedigree studies is a topic of great current interest. Most of deCODE's studies have involved genotyping several hundred affected individuals, using >1,000 markers. Although each study has yielded interesting results, leading to fine-mapping and gene-identification efforts, several have failed to achieve clear statistical significance2,5,37,38. deCODE's experiences suggest two avenues for extended pedigree designs. First, for diseases that do not yield unequivocal linkage results in large samples, implementation of endophenotype mapping may be particularly attractive. Second, to obtain adequate power investigators may need to combine pedigree samples from different countries, perhaps from genetically related populations76.
For common diseases, the extended pedigree approach has so far failed, other than for rare early-onset forms of these diseases, to fulfill the expectation that it would identify high-penetrance variants that illuminate biological pathways77,78. Pedigree studies of common forms of these diseases have led to positional candidate association studies that have provided intriguing, but mainly statistically equivocal, evidence for variants that may have a role in disease susceptibility; variants identified so far do not have the biological effects of most mutations underlying mendelian disorders2,3,4,5,79. In this respect, the field is eagerly awaiting the results of fine-mapping studies for several diseases being undertaken by deCODE and others.
ASP studies
As genealogy-based pedigree studies require well-demarcated, stable populations, and as most phenotyped individuals live in other settings, the genetics field requires other paradigms. The ASP strategy enabled numerous investigators with access to well-phenotyped clinical samples to initiate linkage studies. Owing to the influence of Risch's theoretical work, the requirement for relatively few markers and the development of improved statistical analysis programs80,81, this approach now predominates for genome-wide mapping of common diseases. The use of inadequate sample sizes probably explains why most published ASP studies have reported negative or equivocal results, particularly for phenotypes with small effect sizes (low genotype relative risk). The launching of so many underpowered studies exemplifies how the field incorrectly interprets the conclusions of theoretical studies but also reflects the substantial resources required to collect adequate ASP samples. By forming consortia to obtain such samples, investigators are beginning to obtain the results predicted by Risch and others. Crohn disease is an example. Independent ASP studies suggested that several possible loci on different chromosomes were involved in Crohn disease82,83,84. Many, but not all, studies implicated a locus on chromosome 16; some of these studies, on their own, barely highlighted this region. Formation of an international consortium to investigate more than 600 ASPs from these several studies generated an unequivocal linkage finding on chromosome 16 (IBD1; ref. 85), which led to identification of the gene underlying the linkage of IBD1 with susceptibility to Crohn disease (CARD15; ref. 86). This example shows that the ASP design is well-suited for combining samples from different countries. Unlike extended pedigree approaches, ASP studies are readily coordinated between sites and do not depend on genealogic efforts. 
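The dependence of ASP results on sample size can be illustrated with the mean identity-by-descent (IBD) sharing test. Under no linkage, the proportion of alleles an affected sib pair shares IBD takes the values 0, 1/2 and 1 with probabilities 1/4, 1/2 and 1/4, giving a mean of 0.5 and a per-pair variance of 0.125. The Python sketch below assumes a hypothetical true mean sharing of 0.56 that is identical in every study, and converts z to an approximate lod score with z²/(2 ln 10); the study sizes are also hypothetical.

```python
import math

NULL_MEAN = 0.5   # expected IBD sharing proportion for an ASP under no linkage
NULL_VAR = 0.125  # per-pair variance of that proportion under no linkage


def asp_mean_test_z(mean_sharing: float, n_pairs: int) -> float:
    """z statistic for the ASP mean IBD-sharing test."""
    return (mean_sharing - NULL_MEAN) / math.sqrt(NULL_VAR / n_pairs)


def z_to_lod(z: float) -> float:
    """Approximate lod score for a one-sided test: z^2 / (2 ln 10), zero if z <= 0."""
    return max(z, 0.0) ** 2 / (2.0 * math.log(10.0))


# Hypothetical: six studies of 100 ASPs each, all with true mean sharing 0.56.
single = asp_mean_test_z(0.56, 100)
pooled = asp_mean_test_z(0.56, 600)
print(f"one study: z={single:.2f}, lod~{z_to_lod(single):.2f}")
print(f"pooled:    z={pooled:.2f}, lod~{z_to_lod(pooled):.2f}")
```

Under these assumptions a single study of 100 pairs gives z ≈ 1.7 (lod ≈ 0.6), whereas 600 pooled pairs give z ≈ 4.2 (lod ≈ 3.8). In real data each study's observed sharing would also vary by chance, which is why individual studies may only barely highlight a region that the pooled sample confirms.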
Compared with association studies, ASP approaches are robust to differences in the genetic composition of the study populations. One caveat concerns comparison of ASP genome scans. Given the typically sparse marker sets used in such scans (<500 markers), false negative results may result from excessive gaps in genome coverage, for example, if a particular marker fails in one of the data sets. The problem is exacerbated by the fact that different studies use different markers, as illustrated by recent scans for rheumatoid arthritis87,88. The increasing interest in combining data from different scans suggests the need to use denser, more uniform marker sets in future ASP studies.
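The effect of a failed marker on genome coverage can be checked mechanically. The Python sketch below, with a hypothetical chromosome length, marker map and gap threshold, lists the intervals on one chromosome that are left without a working marker once failed markers are removed; the function name is illustrative.

```python
def coverage_gaps(positions_cm, chrom_length_cm, failed=frozenset(), max_gap_cm=20.0):
    """Intervals (start, end) longer than max_gap_cm that contain no working marker.

    positions_cm: marker positions on one chromosome, in cM.
    failed: indices of markers whose genotyping failed in this data set.
    The chromosome ends (0 and chrom_length_cm) count as interval boundaries.
    """
    working = sorted(p for i, p in enumerate(positions_cm) if i not in failed)
    bounds = [0.0] + working + [chrom_length_cm]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b - a > max_gap_cm]


# Hypothetical 100 cM chromosome with a marker every 10 cM.
markers = [5.0 + 10.0 * i for i in range(10)]
print(coverage_gaps(markers, 100.0, max_gap_cm=15.0))
print(coverage_gaps(markers, 100.0, failed=frozenset({3, 4}), max_gap_cm=15.0))
```

Here the intact map has no gap longer than 15 cM, but losing two adjacent markers opens a 30 cM interval (25 to 55 cM) with no information. Running such a check on each scan's marker map before combining data would flag regions where one data set contributes nothing.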
The advantages of ASP designs for multi-site projects are now being exploited by large-scale studies that will support investigations beyond gene mapping. For example, the GenomEUtwin project89, incorporating almost one million twins from several countries, will be powerful for ASP linkage studies of several phenotypes. The extensive longitudinal data collected will permit evaluation of numerous environmental variables; even the richest pedigree is poorly suited to investigating questions relating to gene-environment interactions, and the non-independence of members of pedigrees complicates statistical analyses that are straightforward in ASPs.
Association studies
Association studies are the focus of much current interest. Genome-wide association studies to identify risk variants for common diseases have mainly been limited, to date, to recently founded population isolates, in which microsatellites detect LD over distances up to several centimorgans90,91. Although some remain skeptical about genome-wide LD mapping using SNPs92, identification of an identical asthma-associated SNP haplotype in Finland and Quebec indicates the tremendous potential of this approach, at least in population isolates93. A few unequivocal candidate-gene associations, such as that between the apolipoprotein E4 allele (ApoE4) and Alzheimer disease, illustrate the kind of information we can expect from successful association studies. Numerous studies showed that ApoE4 is the most important risk factor for Alzheimer disease94. Although this finding has generated fewer biological insights than has the identification, through pedigree studies, of genes implicated in rare, mendelian forms of Alzheimer disease78, it has transformed epidemiological and clinical investigation of dementia and related phenotypes. Consequently, it is now known that ApoE4 is associated with the age of onset of Alzheimer disease95, the process of cognitive decline in 'normal' aging96, altered magnetic resonance imaging findings in asymptomatic individuals97,98, risk of chronic traumatic brain injury in boxers99 and clinical outcome in survivors of traumatic brain injury100.
Conclusion
Developments in genotyping technology and statistical methodology will soon make adequately powered pedigree, ASP and association studies feasible for most common diseases. The specific biological and epidemiological questions that an investigator aims to answer will then dictate the choice of study design, and it will be difficult to justify a design (such as a candidate-gene association study) merely because it is convenient or inexpensive. A practical step that could be taken now would be for funding agencies to start rejecting such justifications. Agencies could also require investigators to state explicitly their rationale, the evidence underlying it and the procedures they will use to address the statistical issues of prior probability, power and multiple comparisons; journals could take the same stance with authors.
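Two widely used tools for the statistical issues named above are the false-positive report probability (FPRP) of Wacholder et al. (ref. 48), which combines the significance level, power and prior probability of a true association, and the Benjamini-Hochberg procedure for controlling the false discovery rate across many tests (ref. 53). The Python sketch below implements only the textbook formulas; the function names, prior, power and P values are illustrative assumptions, not values from any study.

```python
def fprp(alpha, power, prior):
    """False-positive report probability (after Wacholder et al. 2004).

    alpha: significance level; power: probability of detecting a true
    association at that level; prior: prior probability that the tested
    association is real.
    FPRP = alpha(1 - prior) / [alpha(1 - prior) + power * prior].
    """
    false_positives = alpha * (1 - prior)
    true_positives = power * prior
    return false_positives / (false_positives + true_positives)


def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up
    procedure, which controls the false discovery rate at level q
    (for independent or positively dependent tests)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / m ...
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    # ... and reject the hypotheses with the k smallest P values.
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Hypothetical candidate gene with a prior of 1 in 1,000. Power is held
    # at 0.8 in both cases, which at the stricter alpha implies a larger sample.
    print(f"FPRP at alpha = 0.05: {fprp(0.05, 0.8, 0.001):.3f}")
    print(f"FPRP at alpha = 1e-6: {fprp(1e-6, 0.8, 0.001):.4f}")
    # Step-up behaviour: 0.03 fails its own threshold but is still rejected.
    print(benjamini_hochberg([0.001, 0.03, 0.035, 0.9]))
```

Under these assumptions, a nominally significant result (P < 0.05) for a low-prior candidate is almost certainly a false positive (FPRP of about 0.98), whereas the same prior and power at a much stricter level give an FPRP near 0.001.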
Insistence by the field on more stringent standards of evidence will encourage investigators, whatever mapping approach they use, to increase sample sizes. In most cases, this will require combining samples from different sites. Incompatibility of phenotypic data between sites will probably impede this process; funding agencies should support efforts to standardize such data, both new and existing. The feasibility of combining samples may depend on the setting. For example, extended pedigree studies will mainly be undertaken in well-demarcated populations, so increasing sample size may require identifying suitable companion populations. Combining samples between studies will also require greater efforts to ensure that marker data from genome-wide analyses are compatible and provide complete genome coverage; this need is already apparent for pedigree and ASP studies and will be even more important for association studies.
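The link between stricter evidence thresholds and larger samples can be made concrete with the usual normal-approximation sample-size formula for comparing allele frequencies between cases and controls. The Python sketch below is a rough planning aid under stated assumptions (unrelated individuals, equal numbers of cases and controls, alleles treated as independent observations, a two-sided test); the function name `cases_needed`, the control allele frequency and the odds ratio are hypothetical.

```python
import math
from statistics import NormalDist


def cases_needed(p_control, odds_ratio, alpha, power=0.8):
    """Approximate number of cases, with an equal number of controls,
    needed to detect an allelic odds ratio with a two-sided test at level
    alpha (normal approximation; a planning sketch, not an exact method).
    """
    # Risk-allele frequency in cases implied by the odds ratio.
    odds_case = odds_ratio * p_control / (1 - p_control)
    p_case = odds_case / (1 + odds_case)
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    variance = p_case * (1 - p_case) + p_control * (1 - p_control)
    alleles_per_group = (
        (z_alpha + z_beta) ** 2 * variance / (p_case - p_control) ** 2
    )
    # Each person contributes two alleles.
    return math.ceil(alleles_per_group / 2)


if __name__ == "__main__":
    # Hypothetical marker: risk-allele frequency 0.3 in controls, OR 1.5.
    for alpha in (0.05, 1e-4, 1e-6):
        print(f"alpha = {alpha:g}: about {cases_needed(0.3, 1.5, alpha)} cases")
```

Because the required size scales with the square of the sum of the two normal quantiles, moving from alpha = 0.05 to alpha = 1e-6 roughly quadruples the number of cases in this example, which is one reason combining samples across sites becomes necessary.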
References
Anonymous. Breakthrough of the year: The runners-up. Science 302, 2039–2045 (2003).
Stefansson, H. et al. Neuregulin 1 and susceptibility to schizophrenia. Am. J. Hum. Genet. 71, 877–892 (2002).
Gretarsdottir, S. et al. The gene encoding phosphodiesterase 4D confers risk of ischemic stroke. Nat. Genet. 35, 131–138 (2003).
Styrkarsdottir, U. et al. Linkage of osteoporosis to chromosome 20p12 and association to BMP2. PLoS Biol. 1, E69 (2003).
Helgadottir, A. et al. The gene encoding 5-lipoxygenase activating protein confers risk of myocardial infarction and stroke. Nat. Genet. 36, 233–239 (2004).
King, R.A., Rotter, J.I. & Motulsky, A.G. The Genetic Basis of Common Diseases 2nd edn. (Oxford University Press, Oxford, 2002).
Kraepelin, E. Psychiatrie (J.A. Barth, Leipzig, 1896; reprinted by Arno, New York, 1976).
Kallmann, F.J. The Genetics of Schizophrenia (J.J. Augustin, New York, 1938).
Kallmann, F.J. The genetic theory of schizophrenia: an analysis of 691 schizophrenia twin index families. Am. J. Psychiatry 103, 309–322 (1946).
Sullivan, P.F., Kendler, K.S. & Neale, M.C. Schizophrenia as a complex trait: evidence from a meta-analysis of twin studies. Arch. Gen. Psychiatry 60, 1187–1192 (2003).
Gottesman, I.I. & Shields, J. A polygenic theory of schizophrenia. Proc. Natl. Acad. Sci. USA 58, 199–205 (1967).
McGue, M., Gottesman, I.I. & Rao, D.C. The transmission of schizophrenia under a multifactorial threshold model. Am. J. Hum. Genet. 35, 1161–1178 (1983).
Hanson, D.L. & Gottesman, I.I. Schizophrenia. in The Genetic Basis of Common Diseases 1st edn. (eds. King, R.A., Rotter, J.I. & Motulsky, A.G.) 816–836 (Oxford University Press, Oxford, 1992).
Risch, N. Linkage strategies for genetically complex traits. I. Multilocus models. Am. J. Hum. Genet. 46, 222–228 (1990).
Risch, N. Linkage strategies for genetically complex traits. II. The power of affected relative pairs. Am. J. Hum. Genet. 46, 229–241 (1990).
Botstein, D., White, R.L., Skolnick, M. & Davis, R.W. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32, 314–331 (1980).
Gusella, J.F. et al. A polymorphic DNA marker genetically linked to Huntington's disease. Nature 306, 234–238 (1983).
Rommens, J.M. et al. Identification of the cystic fibrosis gene: chromosome walking and jumping. Science 245, 1059–1065 (1989).
Riordan, J.R. et al. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science 245, 1066–1073 (1989).
Kerem, B. et al. Identification of the cystic fibrosis gene: genetic analysis. Science 245, 1073–1080 (1989).
Hastbacka, J. et al. Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nat. Genet. 2, 204–211 (1992).
Baron, M. et al. Genetic linkage between X-chromosome markers and bipolar affective illness. Nature 326, 289–292 (1987).
Egeland, J.A. et al. Bipolar affective disorders linked to DNA markers on chromosome 11. Nature 325, 783–787 (1987).
St George-Hyslop, P.H. et al. The genetic defect causing familial Alzheimer's disease maps on chromosome 21. Science 235, 885–890 (1987).
Kelsoe, J.R. et al. Re-evaluation of the linkage relationship between chromosome 11p loci and the gene for bipolar affective disorder in the Old Order Amish. Nature 342, 238–243 (1989).
Baron, M. et al. Diminished support for linkage between manic depressive illness and X-chromosome markers in three Israeli pedigrees. Nat. Genet. 3, 49–55 (1993).
Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, 1516–1517 (1996).
Sobel, E. & Lange, K. Descent graphs in pedigree analysis: applications to haplotyping, location scores, and marker-sharing statistics. Am. J. Hum. Genet. 58, 1323–1337 (1996).
Thompson, E. Statistical Inference from Genetic Data on Pedigrees (IMS, Beachwood, Ohio, 2000).
Gudmundsson, G. et al. Localization of a gene for peripheral arterial occlusive disease to chromosome 1p31. Am. J. Hum. Genet. 70, 586–592 (2002).
Gretarsdottir, S. et al. Localization of a susceptibility gene for common forms of stroke to 5q12. Am. J. Hum. Genet. 70, 593–603 (2002).
Kristjansson, K. et al. Linkage of essential hypertension to chromosome 18q. Hypertension 39, 1044–1049 (2002).
Hakonarson, H. et al. A major susceptibility gene for asthma maps to chromosome 14q24. Am. J. Hum. Genet. 71, 483–491 (2002).
Karason, A. et al. A susceptibility gene for psoriatic arthritis maps to chromosome 16q: evidence for imprinting. Am. J. Hum. Genet. 72, 125–131 (2003).
Thorgeirsson, T.E. et al. Anxiety with panic disorder linked to chromosome 9q in Iceland. Am. J. Hum. Genet. 72, 1221–1230 (2003).
Stefansson, S.E. et al. Genomewide scan for hand osteoarthritis: a novel mutation in matrilin-3. Am. J. Hum. Genet. 72, 1448–1459 (2003).
Reynisdottir, I. et al. Localization of a susceptibility gene for type 2 diabetes to chromosome 5q34-q35.2. Am. J. Hum. Genet. 73, 323–335 (2003).
Bjornsson, A. et al. Localization of a gene for migraine without aura to chromosome 4q21. Am. J. Hum. Genet. 73, 986–993 (2003).
Morton, N.E. Sequential tests for the detection of linkage. Am. J. Hum. Genet. 7, 277–318 (1955).
Ott, J. Analysis of Human Genetic Linkage 3rd edn. (Johns Hopkins University Press, Baltimore, 1999).
Renwick, J.H. The mapping of human chromosomes. Annu. Rev. Genet. 5, 81–120 (1971).
Elston, R.C. The prior probability of autosomal linkage. Ann. Hum. Genet. 38, 341–350 (1975).
Feingold, E., Brown, P.O. & Siegmund, D. Gaussian models for genetic linkage analysis using complete high-resolution maps of identity by descent. Am. J. Hum. Genet. 53, 234–251 (1993).
Dupuis, J., Brown, P.O. & Siegmund, D. Statistical methods for linkage analysis of complex traits from high-resolution maps of identity by descent. Genetics 140, 843–856 (1995).
Lander, E. & Kruglyak, L. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat. Genet. 11, 241–247 (1995).
Becker, K.G., Barnes, K.C., Bright, T.J. & Wang, S.A. The Genetic Association Database. Nat. Genet. 36, 431–432 (2004).
Walther, D.J. et al. Synthesis of serotonin by a second tryptophan hydroxylase isoform. Science 299, 76 (2003).
Wacholder, S., Chanock, S., Garcia-Closas, M., El Ghormli, L. & Rothman, N. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J. Natl. Cancer Inst. 96, 434–442 (2004).
Thomas, D.C. & Clayton, D.G. Betting odds and genetic associations. J. Natl. Cancer Inst. 96, 421–423 (2004).
The International HapMap Consortium. The International HapMap Project. Nature 426, 789–796 (2003).
Maniatis, N. et al. Positional cloning by linkage disequilibrium. Am. J. Hum. Genet. 74, 846–855 (2004).
Botstein, D. & Risch, N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat. Genet. 33 (Suppl.), 228–237 (2003).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Weller, J.I., Song, J.Z., Heyen, D.W., Lewin, H.A. & Ron, M. A new approach to the problem of multiple comparisons in the genetic dissection of complex traits. Genetics 150, 1699–1706 (1998).
Storey, J.D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100, 9440–9445 (2003).
Sabatti, C., Service, S. & Freimer, N. False discovery rate in linkage and association genome screens for complex disorders. Genetics 164, 829–833 (2003).
Rantakallio, P. Groups at risk in low birth weight infants and perinatal mortality. Acta Paediatr. Scand. Suppl. 193, 1+ (1969).
Jarvelin, M.R. et al. Ecological and individual predictors of birthweight in a northern Finland birth cohort 1986. Paediatr. Perinat. Epidemiol. 11, 298–312 (1997).
Freimer, N. & Sabatti, C. The human phenome project. Nat. Genet. 34, 15–21 (2003).
Glatt, C.E. & Freimer, N.B. Association analysis of candidate genes for neuropsychiatric disease: the perpetual campaign. Trends Genet. 18, 307–312 (2002).
Hariri, A.R. et al. Serotonin transporter genetic variation and the response of the human amygdala. Science 297, 400–403 (2002).
Caspi, A. et al. Influence of life stress on depression: moderation by a polymorphism in the 5-HTT gene. Science 301, 386–389 (2003).
Weeks, D.E., Lehner, T., Squires-Wheeler, E., Kaufmann, C. & Ott, J. Genet. Epidemiol. 7, 237–243 (1990).
Funalot, B., Varenne, O. & Mas, J.L. A call for accurate phenotype definition in the study of complex disorders. Nat. Genet. 36, 3 (2004).
Gulcher, J.R., Gretarsdottir, S., Kong, A. & Stefansson, K. Reply to “A call for accurate phenotype definition in the study of complex disorders.” Nat. Genet. 36, 3–4 (2004).
Williams, J.T. & Blangero, J. Power of variance component linkage analysis to detect quantitative trait loci. Ann. Hum. Genet. 63, 545–563 (1999).
Almasy, L. & Blangero, J. Endophenotypes as quantitative risk factors for psychiatric disease: rationale and study design. Am. J. Med. Genet. 105, 42–44 (2001).
Kong, A. et al. A high-resolution recombination map of the human genome. Nat. Genet. 31, 241–247 (2002).
Gudbjartsson, D.F., Jonasson, K., Frigge, M.L. & Kong, A. Allegro, a new computer program for multipoint linkage analysis. Nat. Genet. 25, 12–13 (2000).
Hovatta, I. et al. A genomewide screen for schizophrenia genes in an isolated Finnish subpopulation, suggesting multiple susceptibility loci. Am. J. Hum. Genet. 65, 1114–1124 (1999).
Ober, C., Tsalenko, A., Parry, R. & Cox, N.J. A second-generation genomewide screen for asthma-susceptibility alleles in a founder population. Am. J. Hum. Genet. 67, 1154–1162 (2000).
Garner, C. et al. Linkage analysis of a complex pedigree with severe bipolar disorder, using a Markov chain Monte Carlo method. Am. J. Hum. Genet. 68, 1061–1064 (2001).
Schoenberg, S.J. et al. Fine mapping of a multiple sclerosis locus to 2.5 Mb on chromosome 17q22-q24. Hum. Mol. Genet. 11, 2257–2267 (2002).
Abkevich, V. et al. Predisposition locus for major depression at chromosome 12q22-12q23.2. Am. J. Hum. Genet. 73, 1271–1281 (2003).
Abecasis, G.R. et al. Genomewide scan in families with schizophrenia from the founder population of Afrikaners reveals evidence for linkage and uniparental disomy on chromosome 1. Am. J. Hum. Genet. 74, 403–417 (2004).
Carvajal-Carmona, L.G. et al. Genetic demography of Antioquia (Colombia) and the Central Valley of Costa Rica. Hum. Genet. 112, 534–541 (2003).
de la Chapelle, A. & Peltomaki, P. The genetics of hereditary common cancers. Curr. Opin. Genet. Dev. 8, 298–303 (1998).
Nussbaum, R.L. & Ellis, C.E. Alzheimer's disease and Parkinson's disease. N. Engl. J. Med. 348, 1356–1364 (2003).
Hennah, W. et al. Haplotype transmission analysis provides evidence of association for DISC1 to schizophrenia and suggests sex-dependent effects. Hum. Mol. Genet. 12, 3151–3159 (2003).
Kruglyak, L. & Lander, E.S. Complete multipoint sib-pair analysis of qualitative and quantitative traits. Am. J. Hum. Genet. 57, 439–454 (1995).
Kruglyak, L., Daly, M.J., Reeve-Daly, M.P. & Lander, E.S. Parametric and nonparametric linkage analysis: a unified multipoint approach. Am. J. Hum. Genet. 58, 1347–1363 (1996).
Hugot, J.P. et al. Mapping of a susceptibility locus for Crohn's disease on chromosome 16. Nature 379, 821–823 (1996).
Satsangi, J. et al. Two stage genome-wide search in inflammatory bowel disease provides evidence for susceptibility loci on chromosomes 3, 7 and 12. Nat. Genet. 14, 199–202 (1996).
Cho, J.H. et al. Identification of novel susceptibility loci for inflammatory bowel disease on chromosomes 1p, 3q, and 4q: evidence for epistasis between 1p and IBD1. Proc. Natl. Acad. Sci. USA 95, 7502–7507 (1998).
Cavanaugh, J. et al. International collaboration provides convincing linkage replication in complex disease through analysis of a large pooled data set: Crohn disease and chromosome 16. Am. J. Hum. Genet. 68, 1165–1171 (2001).
Hugot, J.P. et al. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease. Nature 411, 599–603 (2001).
Jawaheer, D. et al. Screening the genome for rheumatoid arthritis susceptibility genes: a replication study and combined analysis of 512 multicase families. Arthritis Rheum. 48, 906–916 (2003).
MacKay, K. et al. Whole-genome linkage analysis of rheumatoid arthritis susceptibility loci in 252 affected sib pairs in the United Kingdom. Arthritis Rheum. 46, 632–639 (2002).
Peltonen, L. GenomEUtwin: a strategy to identify genetic influences on health and disease. Twin Res. 6, 354–360 (2003).
Ophoff, R.A. et al. Genomewide linkage disequilibrium mapping of severe bipolar disorder in a population isolate. Am. J. Hum. Genet. 71, 565–574 (2002).
Vaessen, N. et al. A genome-wide search for linkage-disequilibrium with type 1 diabetes in a recent genetically isolated population from the Netherlands. Diabetes 51, 856–859 (2002).
Terwilliger, J.D., Haghighi, F., Hiekkalinna, T.S. & Goring, H.H. A bias-ed assessment of the use of SNPs in human complex traits. Curr. Opin. Genet. Dev. 12, 726–734 (2002).
Laitinen, T. et al. Characterization of a common susceptibility locus for asthma-related traits. Science 304, 300–304 (2004).
Strittmatter, W.J. & Roses, A.D. Apolipoprotein E and Alzheimer's disease. Annu. Rev. Neurosci. 19, 53–77 (1996).
Meyer, M.R. et al. APOE genotype predicts when—not whether—one is predisposed to develop Alzheimer disease. Nat. Genet. 19, 321–322 (1998).
Bretsky, P. et al. The role of APOE-epsilon4 in longitudinal cognitive decline: MacArthur Studies of Successful Aging. Neurology 60, 1077–1081 (2003).
Bookheimer, S.Y. et al. Patterns of brain activation in people at risk for Alzheimer's disease. N. Engl. J. Med. 343, 450–456 (2000).
Small, G.W. et al. Cerebral metabolic and cognitive decline in persons at genetic risk for Alzheimer's disease. Proc. Natl. Acad. Sci. USA 97, 6037–6042 (2000).
Jordan, B.D. et al. Apolipoprotein E epsilon4 associated with chronic traumatic brain injury in boxing. JAMA 278, 136–140 (1997).
Friedman, G. et al. Apolipoprotein E-epsilon4 genotype predicts a poor outcome in survivors of traumatic brain injury. Neurology 52, 244–248 (1999).
Acknowledgements
We thank M. Karayiorgou, S. Service and S. Blower for comments on the manuscript. This work was supported by grants from the National Institutes of Health.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Cite this article
Freimer, N., Sabatti, C. The use of pedigree, sib-pair and association studies of common diseases for genetic mapping and epidemiology. Nat Genet 36, 1045–1051 (2004). https://doi.org/10.1038/ng1433
DOI: https://doi.org/10.1038/ng1433