Introduction

Stature, that is, adult body height, and body mass index (BMI) are typical complex traits that have a well-established genetic component as well as well-known environmental and lifestyle factors that act independently of, and perhaps in concert with, genetic components in determining the trait phenotype. Both stature and BMI have a relatively high proportion of variance explained by genetic components (heritability)1, 2 and should thus be amenable to genetic mapping of underlying quantitative trait loci (QTL).

Genetic linkage mapping has been highly successful in the study of Mendelian traits, where a single-gene mutation (or several closely linked mutations within the same gene) explains most of the observed trait variation. Linkage mapping focuses on observed recombinations and is thus useful for narrowing down the chromosomal region that contains loci influencing the trait. Association (or linkage disequilibrium) mapping, on the other hand, is used to directly identify the genetic variant(s) influencing the trait (or variants in linkage disequilibrium with them). Recently, with the advent of rapid and affordable genome-wide single-nucleotide polymorphism genotyping, genome-wide association (GWA) studies have been proposed as a more powerful approach than traditional linkage mapping for identifying genetic variants underlying complex traits.3 However, even though the GWA study design holds great promise for genetic discoveries, there is also a plethora of genome-wide marker data for family samples that has yet to be utilized for genetic linkage analyses of available phenotypes such as stature and BMI. As these family samples may still hold unexplored potential for genetic discoveries,4 they should be exploited to the fullest extent possible.

We obtained genome-wide genotype and phenotype data for 9371 individuals from 3032 families and performed genome-wide variance-components linkage analysis to locate QTL that influence stature and BMI. To our knowledge, this study represents the single largest family-based genome-wide scan published to date that is based on combining primary data instead of meta-analysis of primary results. We corroborate multiple previously reported QTL for stature and BMI and provide linkage evidence for two novel loci for stature. In addition, we were able to demonstrate population- and sex-specific genetic effects underlying these phenotypes.

Materials and methods

Family samples

The samples analyzed in the study consisted of four independent genome-wide screened family cohorts: the Joslin study of type 2 diabetes, the Family Blood Pressure Program (FBPP), systemic lupus erythematosus (SLE) families and the Cleveland Family Study. The numbers of families and individuals with genotype and phenotype information are listed in Table 1. Details of the patient ascertainment and genotyping can be found in the respective initial publications. To minimize within-family genetic heterogeneity, we discarded all families with members of multiple ethnicities in each sample before analysis.

Table 1 Number of individuals with genotype and phenotype information by cohort and ethnicity

From the Joslin study of type 2 diabetes, we included 45 families originally ascertained for maturity-onset diabetes of the young (MODY).5 As the majority of the sample consisted of European-American families (39 of 45), we excluded one Asian-American, two Hispanic and three African-American families before analysis. A total of 39 European-American families consisting of 282 individuals with genotype and phenotype information (152 females and 130 males) were included in all subsequent analyses.

The FBPP is a multicenter collaborative effort established by the National Heart, Lung, and Blood Institute (NHLBI) to localize and identify genes underlying blood pressure and hypertension.6 The FBPP consists of the African-American, Mexican-American, Asian and non-Hispanic white populations but because we had access to a limited number of Asian- and Mexican-American families from the other studies, we included only African-American and European-American families in our analyses. In total, we analyzed 1496 African-American families with 4009 individuals (2663 females and 1346 males) and 1151 European-American families with 3641 individuals (2004 females and 1637 males).

The SLE families were ascertained from the Lupus Family Registry and Repository (http://lupus.omrf.org). All SLE cases met 4 of the 11 criteria7 for SLE. These families included 73 African-American families with 281 individuals (231 females and 50 males) and 148 European-American families with 720 individuals (518 females and 202 males).8, 9 These families are partially included in the paper,10 which reports individuals from 56 African-American families, 92 European-American families and 12 families of other ethnic backgrounds. This paper builds on the samples described by Gray-McGuire et al8 and Moser et al9 with 34 additional families.

Participants from the Cleveland Family Study were ascertained from first or selected second-degree relatives of a proband with either laboratory diagnosed obstructive sleep apnea or neighborhood control of an affected proband. Families were selected for genotyping on the basis of genetic informativity, including multigenerational data or individuals from the extremes of the distribution of apnea phenotype.11, 12 These families included 59 African-American families with 176 individuals (100 females and 76 males) and 66 European-American families with 262 individuals (120 females and 142 males) with genotype and phenotype information.

Statistical analysis of the phenotype data

Phenotypic distributions were analyzed within each cohort using SPSS 14.0 (SPSS, Chicago, IL, USA). We excluded the phenotype information of all individuals less than 23 years of age, as they may still be growing and may not have reached their final adult height. However, the genotypes of these individuals were retained to maximize linkage phase information and to obtain more accurate allele frequency estimates. In each cohort, we removed phenotypic outliers (defined as individuals whose phenotypic value deviated 3 standard deviations or more from their respective ethnic- and sex-specific mean) because they may have disproportionate weight in the genetic analyses.13 Both stature and BMI were approximately normally distributed; thus, no transformations were applied before variance-components analyses.
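The outlier rule described above can be illustrated with a minimal Python sketch. This is not the SPSS procedure used in the study; the record layout (dictionaries with hypothetical `ethnicity`, `sex` and trait keys) and the function name are assumptions made for illustration, and the function is meant to be applied to one cohort at a time.

```python
from collections import defaultdict
from statistics import mean, stdev


def remove_outliers(records, trait, n_sd=3.0):
    """Keep records whose trait value lies less than n_sd sample standard
    deviations from the mean of their ethnicity- and sex-specific group.

    `records` is a list of dicts with the hypothetical keys
    "ethnicity", "sex" and `trait` (all names are illustrative).
    """
    # Collect trait values per (ethnicity, sex) group.
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["ethnicity"], rec["sex"])].append(rec[trait])

    # Mean and SD need at least two observations per group.
    stats = {g: (mean(v), stdev(v)) for g, v in groups.items() if len(v) > 1}

    kept = []
    for rec in records:
        group = (rec["ethnicity"], rec["sex"])
        if group not in stats:
            kept.append(rec)  # too few observations to judge
            continue
        m, s = stats[group]
        # Deviations of n_sd SDs or more are treated as outliers.
        if s == 0 or abs(rec[trait] - m) < n_sd * s:
            kept.append(rec)
    return kept
```

Note that with a sample standard deviation, a single extreme value can only exceed 3 standard deviations when the group contains more than about ten observations, so the rule is only meaningful for reasonably large groups.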

Familial relationship and genotype checks

As all genetic analyses performed assume that the familial relationship information is correct, we utilized genome-wide genotype data to verify relationships using the GRR program.14 The genotype and phenotype data of all pairs whose reported relationship was likely to have been misspecified were removed before analysis.

Mendelian consistency of the genotype data was verified with PedCheck15 and all inconsistencies were removed before genetic analysis. We also applied the multipoint method implemented in Merlin16 for detecting unlikely yet Mendelian-consistent genotypes and excluded the unlikely genotypes using the Pedwipe program included in the Merlin package. We tested each marker for deviation from Hardy–Weinberg equilibrium (HWE) with PedStats and excluded all markers deviating from HWE after controlling for multiple testing using Sidak's correction17 with α=0.05.

Genetic map construction

We used the published deCODE genetic map18 to assign the genetic location for each marker. For those markers that were not included in the original deCODE genetic map, we used the physical position for linear interpolation of the genetic location using the physical and genetic locations of the flanking deCODE markers. Building of the genetic map was automated using Cartographer.19
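The interpolation step can be sketched as follows. This is a minimal illustration rather than the Cartographer implementation; the function name and the `(physical position, genetic position)` tuple layout are assumptions.

```python
def interpolate_cm(bp, left, right):
    """Linearly interpolate the genetic position (cM) of a marker at
    physical position `bp` from two flanking mapped markers.

    `left` and `right` are (physical_bp, genetic_cM) tuples for the
    flanking markers (hypothetical layout, for illustration only).
    """
    (bp0, cm0), (bp1, cm1) = left, right
    if bp0 == bp1 or not bp0 <= bp <= bp1:
        raise ValueError("marker must lie between two distinct flanking markers")
    # Fraction of the physical interval covered up to the marker.
    frac = (bp - bp0) / (bp1 - bp0)
    return cm0 + frac * (cm1 - cm0)


# A marker halfway between flanking markers at 10.0 cM and 12.0 cM.
position = interpolate_cm(1_500_000, (1_000_000, 10.0), (2_000_000, 12.0))
```

The approach assumes a constant recombination rate between the two flanking markers, which is only an approximation over longer intervals.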

Variance-components linkage analyses

Variance-components linkage analyses were performed using Merlin. Before linkage analysis, marker allele frequencies were estimated from the observed genotype data by maximum likelihood, as implemented in Merlin, within each cohort and separately for European-Americans and African-Americans. The sex and age of the individuals at the time of phenotype determination were used as covariates in the variance-components model, and to control for heterogeneity between the cohorts, the cohort identifier and ethnicity were included as covariates in the combined analyses. As it has been shown that inclusion of ungenotyped and unphenotyped individuals that are not required for specification of pedigree structure introduces noise in the variance-components analyses,20 we used Merlin's trim option to automatically remove these individuals before genetic analysis. All Merlin analyses were automated using AUTOGSCAN.21

Results

We collected genome-wide genetic marker information for families ascertained for various adult-onset phenotypes and genotyped at the National Heart, Lung, and Blood Institute (NHLBI) Mammalian Genotyping Service. The total number of informative individuals with genome-wide genotype and phenotype data was 9371 from 3032 families (Table 1). The numbers of informative relative pairs for the variance-components analysis by sex and ethnicity are shown in Table 2, and the phenotypic characteristics are summarized in Table 3. The mean BMI and standard deviation were largest in the African-American female sample. In the European-Americans, the mean BMI was slightly smaller for females than for males, but the female standard deviation was larger. In males, the mean BMI was larger in European-Americans, but the standard deviation was larger in African-Americans. The means and standard deviations for stature were nearly identical between African-Americans and European-Americans in both females and males.

Table 2 Number of informative relative pairs, where both members have genotype and phenotype information available
Table 3 Phenotypic characteristics of the sample analyzed in the study

Similar to many other quantitative traits, stature and BMI are good examples of sexually dimorphic traits that may have sex-specific genetic backgrounds.22 Therefore, we also performed sex-limited linkage analyses attempting to localize such sex-specific QTL by ignoring the phenotype values from one sex, while retaining all available genotype data to maximize identity-by-descent information. The covariate-adjusted heritability estimates ranged from 0.49 to 0.66 for BMI and from 0.81 to 0.97 for stature in the African-American and European-American cohorts and the combined cohort (Table 4). These estimates are in good agreement with estimates from other studies.1, 2

Table 4 Heritability estimates from the Merlin variance-components analysis

The average map density of the multiallelic markers was 6.4 cM and the average entropy-based information content was 0.61 in the combined sample. The genome-wide linkage results for the African-American and European-American cohorts and the combined cohort are shown in Figure 1, and the chromosomal loci that produced evidence for linkage (multipoint LOD score ≥2) are summarized in Table 5. Multipoint LOD scores for chromosomes that showed evidence for linkage (multipoint LOD score ≥2) in one or more cohorts are shown in detail in Supplementary Figure 1.

Figure 1

(a) Genome-wide multipoint linkage results for stature with LOD scores on the y axis and cumulative cM on x axis. The left panel displays results for African-American families, the middle panel for European-American and the right panel results for all families combined. Black lines show results for females and males, red lines for females and blue lines for males. (b) Genome-wide multipoint linkage results for body mass index with LOD scores on the y axis and cumulative cM on x axis. The left panel displays results for African-American families, the middle panel for European-American and the right panel results for all families combined. Black lines show results for females and males, red lines for females and blue lines for males.

Table 5 Multipoint LOD scores for loci where evidence for linkage (LOD >2.0) was observed in one or more cohorts

Linkage results for BMI

We found evidence for two separate female-specific loci for BMI: one on 7q35 (LOD=2.93) and another on 11q22 (LOD=2.21). Interestingly, both loci also seem to be population-specific as the 7q35 locus shows linkage only in European-American females and the 11q22 locus only in African-American females. The males provide no evidence for linkage to BMI at these loci.

These LOD scores are the base-10 logarithm of the ratio of the likelihood of the observed data under the alternative hypothesis (the locus is linked to a QTL) to its likelihood under the null hypothesis (the locus is not linked to a QTL). A LOD score of 2.0 therefore means that the observed data are 100 times more likely under linkage than under no linkage, and a LOD score of 3.0 means that they are 1000 times more likely. LOD scores can also be converted to P-values: a LOD score of 2.0 corresponds roughly to a P-value of 10⁻³ and a LOD score of 3.0 to a P-value of 10⁻⁴. However, because of the multiple testing inherent in genome-wide linkage screens, which use 300–400 genetic markers, the LOD score thresholds usually considered suggestive and significant are 2.2 and 3.3, respectively.
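The conversions described above can be reproduced with a short Python sketch. It assumes the commonly used asymptotic null distribution for a variance-components LOD score, a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom; this assumption and the function names are ours and are not taken from the Merlin output.

```python
import math


def lod_to_likelihood_ratio(lod):
    """Likelihood ratio (linked vs. not linked) implied by a LOD score."""
    return 10.0 ** lod


def lod_to_pointwise_p(lod):
    """Approximate pointwise P-value for a LOD score, assuming that
    2*ln(10)*LOD follows a 50:50 mixture of a point mass at zero and a
    chi-square distribution with 1 degree of freedom under the null."""
    if lod <= 0:
        return 1.0
    chi2 = 2.0 * math.log(10.0) * lod
    # Survival function of chi-square(1 df) is erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


for lod in (2.0, 3.0):
    print(lod, lod_to_likelihood_ratio(lod), lod_to_pointwise_p(lod))
```

Under this assumption, LOD scores of 2.0 and 3.0 give pointwise P-values on the order of 10⁻³ and 10⁻⁴, in line with the rough correspondence stated in the text. These are pointwise values and do not account for genome-wide multiple testing.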

7q35 has previously been linked to BMI and other body composition-related phenotypes in the Quebec Family Study and in the Old Order Amish population.23, 24 This region also contains two highly interesting candidate genes for body composition phenotypes: leptin (LEP) and plasminogen activator inhibitor-1 (PAI1). LEP, originally identified from studying the C57BL/6J ob/ob mouse strain,25 plays a key role in the regulation of body weight by inhibiting food intake and stimulating energy expenditure. In humans, congenital LEP deficiency causes morbid early-onset obesity.25 The PAI1 was previously found to be associated with BMI in a female cohort in the Quebec Family Study,26 which is consistent with our female-specific linkage finding.

The 11q22 region has also been previously linked to BMI in Dutch, Nigerian and Old Order Amish cohorts.24, 27, 28 The linked region contains two highly relevant genes for various body composition-related phenotypes: uncoupling protein 2 (UCP2) and uncoupling protein 3 (UCP3). Both UCP2 and UCP3 have been linked with thermogenesis and energy balance26, 29 and also possess significant sequence similarity with the uncoupling protein 1 (UCP1), an important mediator of thermogenesis-mediated energy expenditure.

Linkage results for stature

For stature, loci on 11q23 (LOD=2.74), 12q12 (LOD=2.07), 15q25 (LOD=3.00), 15q26 (LOD=2.15), 18q23 (LOD=2.49) and 19q13 (LOD=2.18) showed evidence for linkage. The loci on 11q23, 12q12, 15q25 and 18q23 have previously been reported as linked to stature, whereas 15q26 and 19q13 have not.

The linkage observed at 11q23 is primarily driven by European-American males, who show the clearest evidence for linkage. This locus may not, however, be truly sex-specific, as females also show some evidence for linkage, and the total evidence for linkage is stronger when females and males are analyzed jointly. African-American families, on the other hand, exhibit virtually no evidence for linkage at this locus. Mukhopadhyay et al20 also reported a linkage finding consistent with our results in this region. Both African-American and European-American cohorts show evidence for linkage on chromosome 12, but the peak locations are slightly different. This linkage is contributed mostly by females; however, inclusion of males adds linkage evidence, suggesting a lack of sex-specific genetic effects. Interestingly, this peak overlaps with the best association signal from the first published GWA study for stature,30 which reported convincing association of the high-mobility group AT-hook 2 gene (HMGA2) with stature in the general population. This locus has also been previously reported to contain a QTL for stature in a Finnish cohort31 and contains several other interesting candidate genes for stature, such as SRY-box 5 (SOX5), vitamin D receptor (VDR) and collagen type 2 α-1 (COL2A1). 15q25 shows linkage only in European-American families, with contributions from both females and males. Linkage to stature has also been previously reported 13 cM upstream of this locus in an Australian sample (LOD score of 3.43 at 79 cM) by Perola et al.32 However, a seemingly distinct male-specific linkage on 15q26 was also observed in the European-American cohort. Combined, these linkage regions contain several relevant genes for stature, with aggrecan 1 (AGC1) and insulin-like growth factor I receptor (IGF1R) being the most noteworthy. There is some evidence that mutations in the IGF1R gene, resulting in IGF1 resistance, may underlie some cases of prenatal and postnatal growth failure.33

The male-specific linkage observed on 18q23 is provided roughly equally by the African-American and European-American cohorts, although the exact peak location differs slightly between the cohorts. Interestingly, this finding overlaps well with the male-specific linkage observed in our previous joint analysis of four Finnish cohorts19 and a male-driven linkage reported in the Framingham Heart Study.34 Chromosome 19 has not been previously linked to stature in other genome-wide screens. The linked region on 19q is very large and contains hundreds of known genes; however, the most prominent candidates are transforming growth factor-β 1 (TGFβ1) and a cluster of genes belonging to the insulin-like growth factor family of signaling molecules (IGFL1, IGFL2, IGFL3 and IGFL4).

Discussion

The success of the ‘positional cloning’ paradigm, which uses family-based linkage designs for gene identification in Mendelian traits, has yet to be paralleled for complex traits, where solid evidence for genetic variants has been extremely difficult to establish. Likely reasons for these shortcomings are (1) insufficient sample sizes and (2) heterogeneity from population to population with respect to either environmental or genetic factors. In this study, we aimed to locate QTL underlying the observed variation in stature and BMI and to overcome these hurdles by (1) maximizing the sample size by combining the primary genome-wide genotype data of four independent family-based studies and (2) dissecting genetic and environmental heterogeneity by also performing sex- and population-specific linkage analyses. Using this approach, we have performed, to our knowledge, the largest family-based genome-wide screen for stature and BMI that is not based on meta-analysis, and report two loci linked to variation in BMI and six loci linked to variation in stature, most of which add evidence to previously reported loci for these traits.

The genetic background of the continuous variation observed for stature and BMI may be either oligo- or polygenic in nature. There is some supporting evidence for major genes underlying stature and BMI in humans,35 although solid proof is still elusive, barring perhaps some exceptions such as the HMGA2 gene for stature30 as well as a variant upstream of the INSIG2 gene36, 37 and the FTO gene for BMI. These recent GWA results have demonstrated QTL that explain no more than 1% of the trait variance.30, 38 Mapping QTL of such small effects by linkage analysis is likely to require unrealistic sample sizes, while GWA studies may succeed. Therefore, it is not surprising that in our linkage study, we do not detect the HMGA2 region associated with stature or the FTO region associated with BMI.

The existence of major genes is critical for successful genetic mapping, at least in outbred populations.39 We believe that our results support Fisher's infinitesimal model, under which the genetic background of stature and BMI in the human population is controlled by a large number of genes, each having a minute effect on the phenotype. In our opinion, this would at least to some extent explain the relatively modest statistical evidence for linkage observed in this study as well as the lack of consistent findings in other studies (see http://www.genomeutwin.org/stature_gene_map.htm and http://obesitygene.pbrc.edu for an overview). Another factor that may have resulted in false-negative findings in this study and previous comparable studies is that many traditional linkage-based genome-wide screens contain a relatively low proportion of inheritance information due to relatively sparse genetic maps (>5 cM) and missing founder genotypes. Regenotyping these family samples using a high-density single-nucleotide polymorphism map to increase inheritance information has been shown to be a successful strategy in simulation40 and empirical studies.41

Family-based linkage studies are based on examining patterns of allele sharing within individual families and then summing up these results across the study sample. Therefore, they are less liable to allelic heterogeneity than association studies and might identify genes for a trait where differing variants may contribute to the trait variance in different populations. In such circumstances even a dense GWA study might miss the signal, especially if the direction of effect is different for allelic variants. Thus, although there are obvious technological advantages in GWA studies, there is still room for traditional linkage studies4 in identification of biologically important genes for traits such as stature and BMI, where allelic heterogeneity across populations is quite probable. Another important issue in study design arises from the fact that family-based linkage mapping and association mapping in unrelated individuals are optimal under very different genetic models, and therefore it is unwise to invest solely in one or the other, as we do not know a priori the genetic architecture of the trait we are interested in. Simplistically speaking, linkage studies are geared for relatively rare alleles with large effects within families (that may be of little effect in the population), whereas association studies are designed to detect common genetic variants that have smaller effects. In the case of rare monogenic disease, multiple rare variants at linked loci (allelic heterogeneity) seem to be the rule, not the exception.42 For common polygenic disease and quantitative traits this question is still unanswered: there are examples for both common43 and rare alleles,44 and theoretical and empirical studies suggest a role for both rare and common variants.45 One must also remember that GWA can be performed in family samples, although utilizing unrelated samples is more straightforward and powerful. However, due to the other beneficial qualities inherent to family samples, investigators already in possession of family samples should use them for GWA studies, as the loss of statistical power is relatively small.46 Considering these and other examples, it is clear that genome-wide linkage mapping in families and GWA mapping in population cohorts should be considered complementary, not alternative, strategies in mapping polygenic traits.

Body mass index is a derived variable that depends on both height and weight and is used mainly as a means of classifying people as underweight, normal or obese. Although BMI does not provide specific information on physiological intermediate phenotypes, such as basal metabolic rate, its widespread clinical use, ease of measurement, high reproducibility and significant heritability across populations make it an important phenotype for genetic analyses. Interest in this phenotype is exemplified by the Obesity Gene Map (http://obesitygene.pbrc.edu), which is currently reporting 169 linkages and 183 associations to BMI from various studies, across all human chromosomes. Although there are probably several true signals among those reported, it is plausible to assume that not all findings are due to the obvious polygenic background of BMI, but that many are just reflections of analyzing a variable that signals different biological backgrounds in different ascertainment schemes, populations and cohorts. The modest LOD scores we observed for BMI despite the large study sample likely reflect the heterogeneity of our study populations, and may suggest that there are relatively few common loci with strong effects for BMI across these populations. From the results of our analysis, we conclude that there is marked locus heterogeneity between males and females as well as between African-American and European-American cohorts. This is evident from the fact that some loci are linked only in sex-specific (BMI locus on 7q35 and stature loci on 12q and 18q) and/or population-specific (BMI loci on 7q and 11q and stature loci on 11q and 15q25) analyses. The relatively large sample size of even the subgroups reduces, but cannot eliminate, the probability that the observed linkages in these stratified analyses are spurious.

For some loci the benefit of combining a large number of families resulted in increased statistical significance; for example, the African-American and European-American families provide consistent evidence for linkage at the stature loci on 18q and 19q13.

It is well established that in the setting of genome-wide screening for trait loci combining the primary data from independent genome-wide screens is superior to meta-analytic approaches in terms of power to detect loci and reducing sources of variation.47 However, combining data across multiple cohorts may also increase genetic and environmental heterogeneity and thus hamper successful locus identification.48

Our results show the benefit of both (1) combining data to maximize the sample size and (2) minimizing heterogeneity by analyzing subgroups where within-group variation can be reduced. Our results, however, suggest that the latter may be a more fruitful approach in genetic mapping. This approach is analogous to utilizing special populations such as population isolates to reduce genetic and environmental heterogeneity in gene mapping efforts.