Introduction

Genome-wide association studies (GWASs) have significantly added to our understanding of the genetic basis of type 2 diabetes and related traits, including fasting plasma glucose (FPG), by identifying a number of genes potentially involved in the pathophysiology of this common complex disease [1]. However, the majority of FPG GWASs have been conducted in individuals of European descent, many of whom were included in a recent meta-analysis [2]. While this information has laid an important foundation, it remains necessary to investigate whether identified loci transfer across populations with different ancestral backgrounds [3] and whether novel variants can be identified, as recently demonstrated in populations of East Asian and Indian backgrounds [4, 5]. Here, we attempted to replicate published GWAS results for FPG in African-Americans from the metropolitan area of Washington, DC, USA.

Methods

Ethics statement

Ethical approval for the Howard University Family Study (HUFS) was obtained from the Howard University Institutional Review Board and written informed consent was obtained from each participant.

Study design

The individuals studied were unrelated non-diabetic participants over the age of 20 years (n = 927) enrolled in the HUFS. This population-based study of African-Americans in the Washington, DC metropolitan area has been previously described by Adeyemo et al. [6]. For the present study, participants with FPG ≥7 mmol/l or who were receiving treatment for diabetes were excluded. Additional characteristics of the cohort can be found in Electronic supplementary material (ESM) Table 1.

Genotyping

All 927 DNA samples were prepared and genotyped as described by Adeyemo et al. [6]. Briefly, all samples met the sample success rate threshold of 95%. Single nucleotide polymorphisms (SNPs) were excluded if they had a success rate of less than 95% (41,885 SNPs excluded), a minor allele frequency (MAF) ≤0.01 (19,154 SNPs excluded) or a p value for the test of Hardy-Weinberg equilibrium <10⁻³ (6,317 SNPs excluded). The current analysis focuses on the 808,465 autosomal SNPs that passed these filters. In addition, imputation was performed as reported by Shriner et al. [3]. We successfully imputed 1,506,100 SNPs using the Yoruba in Ibadan, Nigeria (YRI) reference panel and an additional 52,291 SNPs using the Centre d’Etude du Polymorphisme Humain (Utah residents with northern and western European ancestry) (CEU) reference panel, for a total of 2,366,856 experimentally determined and imputed SNPs.
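The three SNP-level exclusion criteria can be written as a single predicate. The following is a minimal sketch rather than the pipeline actually used; the `SnpQc` record and the function names are hypothetical, and only the thresholds (success rate below 95%, MAF ≤0.01, Hardy-Weinberg p value below 10⁻³) come from the text.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality-control summary (hypothetical record layout)."""
    snp_id: str
    call_rate: float  # fraction of samples successfully genotyped
    maf: float        # minor allele frequency
    hwe_p: float      # p value from the Hardy-Weinberg equilibrium test


def passes_qc(snp, min_call_rate=0.95, maf_floor=0.01, min_hwe_p=1e-3):
    """Return True if the SNP survives all three filters described in the text."""
    if snp.call_rate < min_call_rate:  # excluded: success rate below 95%
        return False
    if snp.maf <= maf_floor:           # excluded: MAF <= 0.01
        return False
    if snp.hwe_p < min_hwe_p:          # excluded: HWE p value < 10^-3
        return False
    return True


def filter_snps(snps):
    """Keep only the SNPs that pass quality control."""
    return [s for s in snps if passes_qc(s)]
```

Note that the MAF boundary is exclusive: a SNP with MAF exactly 0.01 is removed, whereas a SNP with a success rate of exactly 95% or a Hardy-Weinberg p value of exactly 10⁻³ is retained.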

Statistical analyses

FPG was log-transformed and values more than 3 SDs from the mean were winsorised (n = 8). All regression models assumed an additive genetic model and were adjusted for age, sex, BMI and one EIGENSTRAT axis. In separate analyses, we also adjusted for hypertension because of its known association with insulin resistance [7], but the effect was inconsistent: p values increased marginally for some SNPs and decreased marginally for others (data not shown).
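As an illustration of the transformation step, the sketch below log-transforms FPG and pulls values lying more than 3 SDs from the mean back to the 3 SD limits. The base of the logarithm and the SD estimator are not stated in the text; the natural log and the sample SD are assumptions here, and the function name is hypothetical.

```python
import math
import statistics


def winsorise_log_fpg(fpg_mmol_l, n_sd=3.0):
    """Log-transform FPG values and winsorise at mean +/- n_sd SDs.

    Assumptions (not stated in the text): natural log, sample SD.
    """
    logged = [math.log(v) for v in fpg_mmol_l]
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)
    lower, upper = mean - n_sd * sd, mean + n_sd * sd
    # Values inside the limits are unchanged; values outside are set to the limit.
    return [min(max(v, lower), upper) for v in logged]
```

Winsorising keeps every participant in the analysis while limiting the leverage of extreme values, which is why the sample size is unchanged by this step.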

Replication analysis was performed on SNPs identified in GWASs of FPG based on information in the National Human Genome Research Institute’s catalogue of published GWASs (www.genome.gov/gwastudies/). The query returned the reported SNPs, their respective p values and associated genes (ESM Table 2). If multiple studies reported the same SNP, only the report with the lowest p value was included in the present study. The returned results included 16 SNPs associated with FPG in the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) study [2]. The MAGIC SNPs were supplemented by 13 additional SNPs from previously published GWASs (ESM Table 2), for a total of 29 SNPs that we attempted to replicate in our African-American cohort.
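The rule for SNPs reported by more than one study can be sketched as follows. The record layout (keys `snp`, `p` and `study`) is a hypothetical one chosen for illustration and does not reflect the catalogue's actual export format.

```python
def lowest_p_per_snp(records):
    """Keep, for each SNP, the catalogue record with the lowest reported p value.

    Each record is a dict with (hypothetical) keys 'snp', 'p' and 'study'.
    """
    best = {}
    for rec in records:
        current = best.get(rec["snp"])
        if current is None or rec["p"] < current["p"]:
            best[rec["snp"]] = rec
    return list(best.values())
```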

Our replication effort occurred in two stages. In the first stage, we attempted to replicate the 20 of the 29 published SNPs that were available in the HUFS dataset (i.e. direct replication). For this stage, a SNP was considered replicated if the same SNP had a p value <0.05 in HUFS. In the second stage, we performed a local replication analysis in four steps. First, within a 500 kb window containing the query SNP, we defined a linkage disequilibrium (LD) block bounded by the SNPs most distant from the query SNP with r² ≥ 0.3. We used the HapMap CEU LD data (http://hapmap.ncbi.nlm.nih.gov/downloads/ld_data/2008-06/00README.TXT) for all SNPs except rs2166706, for which the Gujarati Indians in Houston, Texas, USA (GIH) reference dataset (http://hapmap.ncbi.nlm.nih.gov/downloads/ld_data/2008-09_phaseIII/00README.txt) was used to match the population of the original GWAS report [5]. Second, we estimated the covariance matrix for the block of SNPs using the HUFS genotype data. Third, the covariance matrix was spectrally decomposed and the effective degrees of freedom, \( N_{eff} \), were estimated using the relationship \( N_{eff} = \left( \sum\limits_{k = 1}^K \lambda_k \right)^2 / \sum\limits_{k = 1}^K \lambda_k^2 \), in which \( \lambda_k \) is the kth eigenvalue of the K × K covariance matrix for the K SNPs [8]. Fourth, the nominal significance threshold α = 0.05 was divided by \( N_{eff} \).
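The last three steps of the local replication can be sketched in a few lines, taking the effective degrees of freedom as the squared sum of the eigenvalues divided by the sum of their squares (the form under which K uncorrelated, equal-variance SNPs give a value of K). For a symmetric matrix, the sum of the eigenvalues equals the trace and the sum of the squared eigenvalues equals the sum of the squared entries, so this sketch avoids an explicit decomposition; that shortcut, the genotype coding (allele dosages 0/1/2) and the function names are choices made here for illustration, not details taken from the study.

```python
def covariance_matrix(genotypes):
    """Sample covariance matrix of K SNPs.

    genotypes: list of K lists, each holding one allele dosage (0/1/2)
    per individual; all K lists must have the same length.
    """
    n = len(genotypes[0])
    means = [sum(g) / n for g in genotypes]
    centred = [[x - m for x in g] for g, m in zip(genotypes, means)]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centred]
        for ci in centred
    ]


def effective_dof(cov):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues).

    Uses trace(C) for the numerator and the sum of squared entries of the
    symmetric matrix C for the denominator, which equal the eigenvalue sums.
    """
    trace = sum(cov[i][i] for i in range(len(cov)))
    sum_sq_entries = sum(v * v for row in cov for v in row)
    return trace ** 2 / sum_sq_entries


def block_threshold(cov, alpha=0.05):
    """Significance threshold for one LD block: alpha divided by N_eff."""
    return alpha / effective_dof(cov)
```

Two perfectly correlated SNPs count as one effective test, so the block threshold stays at 0.05, whereas K uncorrelated SNPs with equal variances recover the ordinary Bonferroni threshold of 0.05/K.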

Power calculations were carried out using the Quanto software package (Version 1.2.3, http://hydra.usc.edu/gxe/). Calculations were based on: continuous outcome; an independent individuals design; and a gene-only hypothesis. An additive inheritance model was applied for varying MAFs. MAFs were calculated based on HUFS data; for SNPs with no associated HUFS data, HapMap- or Perlegen-reported MAFs were used. The power for the present study was determined based on reported effect estimates for FPG for each reported MAGIC SNP [2].
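To make the inputs to such a calculation concrete, the sketch below approximates the power of a two-sided test for an additive SNP effect on a continuous trait in unrelated individuals. It is a normal approximation written for illustration and is not Quanto's algorithm; the function name and the `trait_sd` parameter (the SD of the outcome on the analysis scale) are assumptions. The variance explained by the SNP is taken as 2 × MAF × (1 − MAF) × β² divided by the trait variance, which assumes Hardy-Weinberg proportions.

```python
from statistics import NormalDist


def additive_power(n, maf, beta, trait_sd, alpha=0.05):
    """Approximate power to detect an additive per-allele effect `beta`.

    Normal approximation under stated assumptions; not Quanto's method.
    """
    # Fraction of trait variance explained under Hardy-Weinberg proportions.
    var_explained = 2 * maf * (1 - maf) * beta ** 2 / trait_sd ** 2
    # Expected value of the test statistic on the z scale.
    shift = (n * var_explained / (1 - var_explained)) ** 0.5
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in either tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)
```

With β = 0 the function returns the type I error rate α, and for a fixed effect size the power rises with both the sample size and the MAF (up to 0.5), which is why the same reported effect can be well powered at one locus and poorly powered at another.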

Results

Of the 16 SNPs recently reported in the MAGIC meta-analysis of over 122,000 participants [2], 12 were available for testing in the HUFS dataset (ESM Table 2). We directly replicated three SNPs (rs2191349, rs11558471 and rs4506565) located in or near the DGKB-TMEM195, SLC30A8 and TCF7L2 genes, respectively (Table 1). We also replicated SNPs from other GWASs for FPG: rs2722425 within ZMAT4 (p value = 0.024) and rs625643 (p value = 0.048), which is located in a region of unknown function on chromosome 1 (Table 1). SNPs from the remaining studies that did not directly replicate are not shown. We note that SNPs in C2CD4B, FADS1, GCK and G6PC2 from the MAGIC study and in IRS1, PDE4B and ATP8B4 from other GWASs could not be directly tested in the HUFS dataset owing to quality-control filters or a lack of genotyping or imputation data.

Table 1 SNPs that were reported in the MAGIC study and other GWASs of FPG that were directly analysed for replication in a cohort of African-Americans (the HUFS)

We also analysed SNPs that were in LD (r² ≥ 0.3) with each discovery SNP (ESM Fig. 1). This replication strategy, which queried a 500 kb window centred on the index SNP, yielded a total of 317 SNPs located in or near nine different genes or regions of unknown function (G6PC2, GCKR, MTNR1B, DGKB-TMEM195, TCF7L2, SLC30A8, AK024684, ZMAT4 and IRS1). Of the 317 SNPs tested, 38, distributed across all nine regions, were significantly associated with FPG after Bonferroni correction for multiple comparisons (Table 2).

Table 2 Significant SNPs and their effects (β) after Bonferroni correction that were in LD (r² ≥ 0.3) with reported SNPs from the MAGIC study and other GWASs of FPG

Based on reported effect sizes of the 14 MAGIC loci (excluding TCF7L2 and SLC30A8), the power was calculated for each SNP using the African-American MAFs where available (ESM Table 3). The estimated power for this study ranged from a low of 0.25 to a high of 0.99. The SNPs in the four genes most strongly powered (i.e. >90% power) in this study were either directly replicated (DGKB-TMEM195) or locally replicated (G6PC2, MTNR1B and GCKR) with markers in moderate LD (r² ≥ 0.4). The effect sizes for SNPs rs4506565 and rs11558471 (previously reported loci TCF7L2 and SLC30A8, respectively) were not reported in the MAGIC study.

Discussion

Chronically elevated FPG is a primary indicator of diabetes, making it an important barometer of the progression of impaired glucose metabolism. In this paper, we attempted to replicate, in nearly 1,000 African-Americans, GWAS loci for FPG that were reported as significant in populations of predominantly European ancestry. In light of the well-documented greater genetic diversity in populations with African ancestry [9, 10], our replication strategy focused not only on the reported SNPs but also on variants in LD with the reported SNPs.

We focused our replication analysis on the MAGIC study of over 122,000 participants to identify FPG-associated SNPs shared across ethnically diverse populations. In addition, we included SNPs from prior GWASs of FPG to add breadth to our replication pool, keeping in mind that potential differences in susceptibility loci between populations may exist [11]. Of the 12 SNPs reported by the MAGIC study that were directly testable in our African-American cohort, we replicated three SNPs within DGKB-TMEM195, TCF7L2 and SLC30A8. We also replicated SNPs in ZMAT4, which encodes a zinc finger, matrin type 4 protein identified in previous GWASs but not replicated in the MAGIC meta-analysis. Using the local (LD-based) replication strategy, we replicated additional SNPs in or near previously reported genes, including the insulin receptor substrate 1 gene.

Interestingly, comparison of the LD structure in HUFS with that in the HapMap reference samples CEU and YRI supports the utility of African-American population samples in refining association loci. For example, the covariance matrix generated for the local replication of rs625643 spans 40 kb and includes 16 SNPs. In HapMap CEU, nearly the entire region is in moderate LD, whereas in HapMap YRI two distinct LD blocks are observed (ESM Fig. 2) and lower (on average) r² values are observed between rs625643 and downstream SNPs (0.78 for CEU and 0.5 for YRI). As expected, African-American samples (i.e. HapMap African ancestry in Southwest USA [ASW] and the HUFS sample in this study) show an LD structure intermediate between those of CEU and YRI (ESM Fig. 3 and ESM Fig. 2). Furthermore, given the association signals in HUFS, a case can be made for narrowing the region of interest from 40 kb to the 3 kb between the locally replicated SNP rs671431 and the original discovery SNP (ESM Fig. 3).

We acknowledge that the interpretation of r² values is fluid when establishing which variants are in LD with each other, and that our correction for multiple comparisons may be overly conservative. By using r² ≥ 0.3, we attempted to capture a substantial portion of the SNPs in LD within the reference sample while maintaining confidence in the ability of related SNPs to serve as proxies [12]. In addition, a blanket search window of 500 kb allows for the capture of some of the unique characteristics of LD, including long-range LD, associated with admixed populations such as African-Americans [13]. To address the burden of potentially overcorrecting for multiple comparisons, our Bonferroni correction strategy was based on the estimation of effective degrees of freedom [8], which accounts for the covariance among HUFS SNPs in the LD block extracted on the basis of the CEU HapMap samples. We believe this approach describes the relationship among SNPs in LD within the queried window better than the very conservative assumption of independent effects for all tested SNPs.

We also acknowledge the limited ability of our study of about 1,000 participants to detect some of the very small effect sizes reported by the MAGIC study, which included more than 122,000 participants. However, this study had over 80% power to replicate similar effect sizes for 10 of the 14 SNPs reported by the MAGIC study (ESM Table 3); this is evident in this study’s ability to replicate several of the published GWAS variants for FPG. We caution that lack of replication in this study may be due to limited sample size, differences in the effect sizes of SNPs or differences in allele frequencies between populations of European and African ancestries.

The need to understand differential susceptibility to diseases at the population level makes the identification of risk factors for diabetes and its indicators, including FPG, particularly important in the African-American community and other ethnic groups, given their disproportionate rates of morbidity and mortality from diabetes and associated complications. Unfortunately, and for multiple reasons, GWASs aimed at identifying genetic variants associated with FPG and diabetes have so far focused predominantly on individuals of European ancestry. While the results from these studies have provided tremendous insight into the genetic architecture of the disease, recent studies of non-European populations have shown the utility of expanding the breadth of populations studied. Specifically, studies in East Asians allowed for a ‘wider net’ to be cast in the identification of type 2 diabetes susceptibility variants [4, 11]. The present study’s focus on individuals self-identified as African-Americans not only widens the net but also underscores the need for directed investigation of under-represented populations.