Introduction

Type 1 diabetes is one of the most widely studied complex genetic disorders, and the genes in the HLA region are reported to account for 40–50% of the familial aggregation of type 1 diabetes [1]. Age at onset of type 1 diabetes may modify the metabolic phenotype of the patients and may influence the risk of late complications of diabetes. For example, age at onset of type 1 diabetes significantly modifies the long-term risk of proliferative retinopathy. The highest risk for retinopathy is seen in age-at-onset group 5–14 years, whereas the lowest risk is in age-at-onset group 15–40 years [2]. Similarly, patients with onset of diabetes after age 15 have been observed to have a lower risk of diabetic nephropathy and end-stage renal disease than patients diagnosed during adolescence [3]. On the other hand, recent studies indicate higher mortality for type 1 diabetes patients diagnosed in late adolescence or adulthood than for patients diagnosed earlier [4].

A significant genetic component for age at onset has been reported, in particular a contribution by specific HLA alleles [5–7]. However, previous studies investigating the role of HLA alleles on age of onset have all come from a single cohort and have analysed sample sizes of only a few hundred patients at a time [5–7]. The aim of this study was to investigate HLA class I and class II classic loci genotyped in a large collection of patients from the Type 1 Diabetes Genetics Consortium (T1DGC) to assess their effect on age at onset of type 1 diabetes. We have studied different populations of European descent, focusing on the role of specific DRB1-DQB1 genotypes, DPB1 and HLA class I alleles that have been previously implicated in risk of type 1 diabetes or age at diagnosis. We have compared genetic prediction risk models for early age at onset (age <5 years) and for late age at onset (≥15 years).

Methods

Study participants

The Type 1 Diabetes Genetics Consortium (T1DGC) is a large, worldwide, collaborative study aimed at collecting and genotyping new type 1 diabetes families from multiple populations in a highly standardised fashion, to aid in the search for additional type 1 diabetes genes within and outside the HLA region [8]. An individual was designated as affected if he or she had documented type 1 diabetes with onset ≤37 years of age, had used insulin within 6 months of diagnosis and had no concomitant disease or disorder associated with diabetes. High-resolution HLA genotyping was performed at eight classic MHC loci by four genotyping centres using standardised typing protocols, reagents and quality control procedures [9]. In addition to the patient clinical samples collected by the T1DGC, genotyping was also carried out in existing clinical collections. Age at onset and high-resolution genotyping data were also available for samples and data collected outside of the T1DGC framework and were contributed for inclusion in various T1DGC projects, including the Danish, Human Biological Data Interchange (HBDI), Joslin and Sardinian collections.

Proband status

For the T1DGC collection, the proband was identified as the first child in the family diagnosed with type 1 diabetes, and the ‘proband’ variable within the data set records this status. For the existing cohorts, the criteria for proband assignment were not readily available for all pedigrees.

Allele selection and genotype coding

The genetic contribution of DRB1-DQB1 genotypes was encoded as DR3/DR4 = 4, DR3/DR3 = 3, DR4/DR4 = 2, DR4/DRx = 1, DR3/DRx = 1 and DRx/DRx = 0, where DR3 = DRB1*03:01-DQB1*02:01, DR4 = DRB1*04:01/2/4/5/8/13-DQB1*03:02 or 03:04 or 02:01, and x is any other haplotype, including DRB1*04:03 or other DRB1*04-carrying haplotypes with DQB1*03:01. Genotypes that included the highly protective allele DQB1*06:02, or the haplotypes DRB1*14:01-DQB1*05:03 or DRB1*07:01-DQB1*03:03, were categorised as DRx/DRx. This ranking was based on previous reports of predisposing, protective and neutral DR-DQ haplotypes [10]. HLA alleles at loci other than DRB1-DQB1 that have been convincingly implicated in risk of type 1 diabetes were also included in the model. These included DPB1 alleles 02:02, 03:01 and 04:02 and class I alleles A*24:02, A*02:01, A*11:01, A*30:02, A*32:01, A*66:01, B*18:01, B*35:02, B*57:01, C*03:03, B*39:06, B*44:03 and C*07:02 [6, 11, 12].
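The ordinal coding described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function name `dr_dq_score` and the `carries_protective` flag are hypothetical, and the sketch assumes each typed haplotype has already been assigned to one of the classes DR3, DR4 or DRx.

```python
# Scores for each unordered pair of haplotype classes, as listed in the text.
DR_DQ_SCORES = {
    ("DR3", "DR4"): 4,
    ("DR3", "DR3"): 3,
    ("DR4", "DR4"): 2,
    ("DR4", "DRx"): 1,
    ("DR3", "DRx"): 1,
    ("DRx", "DRx"): 0,
}


def dr_dq_score(hap1, hap2, carries_protective=False):
    """Return the DR-DQ genotype score (0 to 4) for two haplotype classes.

    hap1, hap2: "DR3", "DR4" or "DRx" (class assignment is assumed to be
    done upstream from the high-resolution DRB1-DQB1 typing).
    carries_protective: hypothetical flag, True when the genotype includes
    DQB1*06:02 or one of the protective haplotypes named in the text; such
    genotypes are treated as DRx/DRx.
    """
    if carries_protective:
        return DR_DQ_SCORES[("DRx", "DRx")]
    key = tuple(sorted((hap1, hap2)))  # make the lookup order-independent
    if key not in DR_DQ_SCORES:
        raise ValueError(f"unknown haplotype classes: {hap1!r}, {hap2!r}")
    return DR_DQ_SCORES[key]


print(dr_dq_score("DR4", "DR3"))  # prints 4
```

Sorting the pair before the lookup means DR3/DR4 and DR4/DR3 receive the same score, which matches the unordered genotype notation used in the text.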

Risk factor selection

Stepwise linear regressions were carried out using age at onset as the continuous outcome variable and including all the above genetic factors in addition to sex, cohort of origin and proband status.

Outcome variables

Having identified which HLA variables to include, logistic regressions were carried out in each cohort separately, adjusting for proband status and including the HLA alleles that were found to influence age at onset in the stepwise linear regression analyses. Two binary outcome variables were defined: ‘early age at onset’ coded as 1 if age at onset ≤5 (28.6% of patients) or 0 if age at onset >5 (71.4% of patients); ‘late age at onset’ coded as 1 if age at onset >15 (21.5% of patients) or 0 if age at onset ≤15 (78.5% of patients). These age cut-offs were defined based on the ages at which differences in rates of complications and mortality have been observed [3, 4].
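As a small illustration, the two binary outcomes can be derived from age at onset as follows. The function name `onset_outcomes` is hypothetical, and the cut-offs are taken exactly as stated in this paragraph (≤5 years for early onset, >15 years for late onset).

```python
from typing import Tuple


def onset_outcomes(age_at_onset: float) -> Tuple[int, int]:
    """Return (early, late) indicators for one patient.

    early = 1 if age at onset <= 5 years, else 0
    late  = 1 if age at onset > 15 years, else 0
    """
    early = 1 if age_at_onset <= 5 else 0
    late = 1 if age_at_onset > 15 else 0
    return early, late


for age in (3, 5, 10, 15, 16):
    print(age, onset_outcomes(age))
```

A patient diagnosed between these cut-offs is coded 0 for both outcomes, so the two indicators are not complements of each other.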

Inter-study heterogeneity

Inter-study heterogeneity was assessed using a DerSimonian and Laird random effects meta-analysis and computing the heterogeneity variance τ. The rmeta library in R was used (http://cran.r-project.org/web/packages/rmeta/rmeta.pdf).
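For readers unfamiliar with the method, the following Python sketch shows the standard DerSimonian and Laird moment estimator. It is not the rmeta implementation used in the study; the function name and the returned dictionary keys are assumptions made for illustration. The inputs are per-cohort effect estimates (for example, log odds ratios) and their variances.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird moment estimator.

    effects:   per-study estimates, e.g. log odds ratios
    variances: squared standard errors of those estimates
    Requires at least two studies.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need matching lists with at least two studies")

    # Fixed-effect (inverse-variance) pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Cochran's Q measures the spread of study effects around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Moment estimate of the between-study variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each study's own variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    return {"pooled": pooled, "se": se, "tau2": tau2, "tau": math.sqrt(tau2),
            "Q": q, "df": k - 1}


# Made-up numbers for illustration only, not data from this study.
result = dersimonian_laird([0.30, 0.10, 0.45], [0.02, 0.03, 0.05])
print(result)
```

The heterogeneity test compares Q with a χ² distribution on k − 1 degrees of freedom; the sketch returns both values and leaves the p value to a statistics library.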

Calibration and discrimination

The predictive power of a given diagnostic is usually summarised by a receiver operating characteristic (ROC) curve. In this type of analysis, subjects are ranked in descending order of their predicted risk and the cumulative proportion of subjects who develop disease (cases) is plotted against the corresponding cumulative proportion of the population, i.e. the sensitivity (true-positive fraction) is plotted on the y-axis against 1 − specificity (the false-positive fraction) on the x-axis [13]. A perfect diagnostic would be represented by a line that starts at the origin, travels up the y-axis to 1 and then runs horizontally across to an x-axis value of 1, thus having a total AUC of 1. A test with AUC = 0.5, on the other hand, has zero diagnostic value. Whereas discrimination examines the ability to correctly classify subjects into different groups, calibration assesses how closely the predicted probabilities reflect actual risk [13]. The calibration and discrimination abilities of the models were examined in the independent cohorts described above.
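The AUC has an equivalent rank-based reading: it is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The sketch below computes it that way in plain Python. The function name `roc_auc` is hypothetical and this is not the PredictABEL routine used in the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve by pairwise comparison.

    scores: predicted risks (higher means more likely to be a case)
    labels: 1 for cases, 0 for non-cases
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")

    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0   # case correctly ranked above non-case
            elif c == n:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(controls))


# Made-up scores: three of the four case/non-case pairs are ranked correctly.
print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # prints 0.75
```

The double loop is quadratic in sample size, which is acceptable for a sketch; a rank-sum formulation gives the same value more efficiently for large cohorts.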

A risk score was calculated for each individual using the logit equation:

$$ \text{logit} = \log_e\left( \frac{p}{1 - p} \right) = \alpha + \beta_1 X_1 + \ldots + \beta_i X_i $$

where p is the probability of the outcome (early or late age at onset), α is the constant and each \( \beta_i \) is the natural logarithm of the odds ratio for the corresponding predictor \( X_i \).

The logit operator maintains the linearity of the model and allows the calculation of a probability of the outcome (in this case early or late age at onset), given the different sets of predictors, according to \( p = \exp\left( \text{logit} \right)/\left( 1 + \exp\left( \text{logit} \right) \right) \). Thus, the higher the risk score, the greater the risk of the outcome.

The individuals were classified into subgroups according to their risk scores, and the observed and predicted frequencies of the outcome in each subgroup were calculated. The Hosmer–Lemeshow χ² statistic for goodness of fit was used for calibration, to compare observed and predicted risk [14]. Non-significant p values for this test indicate good calibration. Both discrimination and calibration of risk models were carried out using the PredictABEL package for R (http://cran.r-project.org/web/packages/PredictABEL/index.html).
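The calibration statistic can be sketched as follows. This is a plain-Python illustration of the usual Hosmer–Lemeshow construction, not the PredictABEL code used in the study; the function name, the default of ten equal-sized groups and the handling of ties are assumptions.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow chi-square statistic and its degrees of freedom.

    probs:    predicted probabilities of the outcome
    outcomes: observed outcomes (1 or 0)
    groups:   number of risk groups (ten is a common choice, assumed here)
    Returns (statistic, degrees_of_freedom) with df = groups used - 2.
    """
    # Rank individuals by predicted risk only, keeping input order for ties.
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)

    statistic = 0.0
    used = 0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # more groups than individuals
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # predicted number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / size
        denom = expected * (1.0 - mean_p)     # n_g * pi_g * (1 - pi_g)
        if denom == 0.0:
            raise ValueError("group with mean predicted risk of 0 or 1")
        statistic += (observed - expected) ** 2 / denom
        used += 1
    return statistic, used - 2


# Made-up example with two risk groups, for illustration only.
print(hosmer_lemeshow([0.2, 0.2, 0.8, 0.8], [0, 0, 1, 1], groups=2))
```

The statistic is referred to a χ² distribution with the returned degrees of freedom; a small statistic (non-significant p value) indicates that observed and predicted frequencies agree across risk groups.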

Results

The mean and standard deviation of age at onset by sex and proband status for each of the cohorts are summarised in Table 1. Genotyping and age at onset information for a total of 3,602 type 1 diabetes patients, corresponding to 1,801 affected sib pairs from the T1DGC, including the extant collection, were included. The overall range for age at onset was 0–37 years.

Table 1 Age at onset and sex distribution by cohort of the patients included in the study

Age at onset was found to be significantly higher in male (p < 0.005) than in female subjects. In the T1DGC collection, where the proband was defined as the first child to develop type 1 diabetes, a strong difference in age at onset is seen between probands and non-probands (Table 1). In the T1DGC collection and some of the pre-existing collections, proband status appears to be a confounding variable for younger onset.

A stepwise linear regression was carried out, which included DPB1 alleles 02:02, 03:01 and 04:02, class I alleles A*24:02, A*02:01, A*11:01, A*32:01, A*66:01, B*18:01, B*35:02, B*57:01, C*03:03, B*39:06, B*44:03 and C*07:02, adjusting for sex, cohort of origin and proband status. This analysis revealed that, for HLA, only the DR-DQ genotype, A*24:02 and B*39:06 contributed significantly to age of type 1 diabetes onset, with B*44:03 and B*18:01 nearing statistical significance (Table 2).

Table 2 Stepwise linear regression of factors associated with type 1 diabetes age at onset in 1,801 sib pairs

Individual population effects

The allele/genotype frequencies for DR3/4, A*24:02, B*18:01, B*39:06 and B*44:03 are shown in Table 3. The frequencies are stratified by early age at onset (age ≤5) vs not, by late age at onset (age ≥15) vs not and by early onset (age ≤5) vs late onset (age ≥15). Differences in allele and genotype frequencies are seen among the various populations, as is expected for the genes in the HLA region. The effect of these alleles and genotypes on early and late onset was assessed by multiple logistic regression including all genetic variables in the model in addition to sex and proband status. We observe a striking difference in the frequency of B*18:01 among Sardinian patients compared with the other groups (Table 3). A much higher frequency of certain DR3 haplotypes in this population, compared with other European populations, has already been reported [15].

Table 3 Comparison of HLA alleles associated with age of onset and their association with early and late age of onset in different populations

Using random-effects meta-analysis we investigated whether there was evidence of statistically significant heterogeneity between study cohorts, i.e. whether, regardless of the frequency, the effect on age of onset was different for each of the five genetic variables studied. We found no evidence of inter-study heterogeneity for any of the early-onset genetic effects nor for any of the early- vs late-onset associations, with the smallest p value for heterogeneity being p = 0.20. For both of these traits (early vs other and early vs late), meta-analyses of the genetic effects of B*18:01 and B*44:03 did not reach statistical significance; all other associations were statistically significant overall. For late age at onset we observed evidence of inter-study heterogeneity for the effect of B*39:06, yielding τ = 0.086 and p = 0.05. By meta-analysis the only genetic effects that were significantly associated with late age of onset were the DR-DQ genotype and A*24:02, indicating that these are the most consistent effects throughout the age of onset distribution. In the absence of significant heterogeneity within the T1DGC subcohorts, we have merged them for the risk-prediction analysis.

Risk-prediction models

We then assessed whether these HLA markers could predict early or late age at onset. A logistic regression on early-onset and late-onset outcomes was then fitted using three different models for each outcome: sex as the only risk factor; HLA as the only risk factor; and sex and HLA as risk factors. Early vs late models were not fitted given the small sample sizes involved.

The following models were fitted.

  1. Early age at onset

    $$ \begin{array}{l}
    \text{sex only: logit} = -1.00254 + 0.07075\,\text{sex}; \\
    \text{HLA only: logit} = -1.611245 + 0.12472\,\text{B*18:01} + 0.855414\,\text{B*39:06} - 0.3341\,\text{B*44:03} + 0.2947\,\text{A*24:02} + 0.1869\,\text{DR-DQ}; \text{ and} \\
    \text{HLA + sex: logit} = -1.91141 + 0.096\,\text{sex} + 0.1225\,\text{B*18:01} + 0.29309\,\text{A*24:02} + 0.8556\,\text{B*39:06} - 0.3394\,\text{B*44:03} + 0.18356\,\text{DR-DQ}.
    \end{array} $$

  2. Late age at onset

    $$ \begin{array}{l}
    \text{sex only: logit} = -1.3957 - 0.1790\,\text{sex}; \\
    \text{HLA only: logit} = -0.8281 - 0.008191\,\text{B*18:01} - 0.60403\,\text{B*39:06} + 0.11133\,\text{B*44:03} - 0.4537\,\text{A*24:02} + 0.18383\,\text{DR-DQ}; \text{ and} \\
    \text{HLA + sex: logit} = -0.6588 - 0.207\,\text{proband status} - 0.207\,\text{sex} - 0.0478\,\text{B*18:01} - 0.5087\,\text{A*24:02} - 0.4058\,\text{B*39:06} + 0.2339\,\text{B*44:03} - 0.1621\,\text{DR-DQ}.
    \end{array} $$
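To show how such an equation turns into a predicted probability, the sketch below evaluates the early-onset HLA-only model for one hypothetical patient. The coefficients are copied from the equation above. The predictor coding is an assumption, as it is not spelled out with the equations: class I alleles are taken as 1 for carriers and 0 otherwise, and DR-DQ as the 0 to 4 genotype score defined in the Methods. The function and variable names are illustrative.

```python
import math

# Early-onset, HLA-only model; coefficients as reported in the text.
EARLY_HLA_ONLY = {
    "intercept": -1.611245,
    "B*18:01": 0.12472,
    "B*39:06": 0.855414,
    "B*44:03": -0.3341,
    "A*24:02": 0.2947,
    "DR-DQ": 0.1869,
}


def logit_score(model, predictors):
    """Linear predictor: intercept plus the sum of beta_i * X_i.

    Predictors missing from the dictionary are treated as 0.
    """
    return model["intercept"] + sum(
        beta * predictors.get(name, 0)
        for name, beta in model.items()
        if name != "intercept"
    )


def predicted_probability(model, predictors):
    """Convert the logit to a probability: p = exp(logit) / (1 + exp(logit))."""
    return 1.0 / (1.0 + math.exp(-logit_score(model, predictors)))


# Hypothetical patient: DR3/DR4 genotype (score 4, assumed coding) carrying A*24:02.
patient = {"DR-DQ": 4, "A*24:02": 1}
print(logit_score(EARLY_HLA_ONLY, patient))
print(predicted_probability(EARLY_HLA_ONLY, patient))
```

Keeping each model as a dictionary of named coefficients makes it straightforward to swap in the sex-only or HLA + sex equations without changing the scoring functions.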

The risk discrimination and calibration results from these models in all cohorts are shown in Table 4.

Table 4 Validation of the risk-prediction models for type 1 diabetes age at onset using HLA information

The best prediction for early onset was seen in the Joslin cohort for an HLA-only model, yielding an AUC of 0.700 (Table 4). For late onset, the best AUC was seen in the Danish cohort for a model including both sex and HLA (AUC = 0.644). For all other cohorts and outcomes except one, at least one of the models had an AUC significantly higher than 0.5, the value at which the test would have no predictive value. In the Sardinian cohort, none of the risk-prediction models had an AUC significantly different from 0.5.

For late onset, the models that included sex and HLA had calibration problems in two of the cohorts because of the heterogeneous relationship between sex and age of onset among cohorts. We also note that the large confidence intervals for AUC in some of the cohorts are probably due to the smaller sample sizes available in those studies.

Discussion

In the current study we have investigated the role of high-resolution HLA genotypes on age at type 1 diabetes onset in various populations of European descent. To our knowledge, this is the first study to compare the role of HLA on age of onset in different populations.

We further investigated whether such genotyping information could have any predictive value for assessing the risk of very young age at onset in contrast to late onset for type 1 diabetes. Using the largest data set to date to address this question, we found that the strongest genetic contribution to age of onset appears to come from the DRB1-DQB1 genotypes, which also have the strongest influence on disease risk [10]. In addition, a few select class I alleles, notably A*24:02, B*18:01, B*39:06 and B*44:03, also influence age at onset. Of these, the most consistent effect is that of A*24:02, whereas the other class I alleles either do not influence specific cut-offs of age of onset (early vs late) or show evidence of strong heterogeneity across populations (e.g. B*39:06 for late age of onset).

In both the T1DGC collection and other independent extant collections, we find that genotypes for classic class I and class II HLA loci can have some modest predictive power for these two outcomes. On the one hand, this confirms the role of HLA polymorphism in influencing age at onset. On the other hand, it highlights that other risk factors not included in our models must also be influencing age at type 1 diabetes onset.

Current approaches for the prediction of type 1 diabetes in screening studies take advantage of the major genetic risk factors, genotyping for HLA-DR and HLA-DQ loci and screening for autoantibodies directed against islet-cell antigens [16]. For example, children who carry both of the highest-risk HLA haplotypes (DR3/DR4-DQB1*03:02) have a risk of approximately one in 20 for a diagnosis of type 1 diabetes by the age of 15 years [16]. The results presented here may help improve such models by also taking into account the role of genetic risk factors on age at onset.

We found that sex had little or no predictive value and that, because its relationship with age of onset was not consistent across cohorts, in most instances it did not improve the AUC. For the Danish and HBDI cohorts, where the difference in age of onset between sexes was strongest, the inclusion of sex did show a slight improvement, but not in an additive way. This is consistent with reports for the combination of genetic and non-genetic factors for other disease areas [17].

We note several study limitations. Our analyses have used data derived from affected sib pair cohorts of European descent, selecting for patients with a strong genetic contribution to type 1 diabetes and, therefore, possibly also to its age at diagnosis. The current data thus reflect the predictive value of HLA in a group of patients enriched for genetic risk. On the other hand, these data are relevant to clinical research, as studies of first-degree relatives (follow-up, prevention trials) involve those who have a family member already diagnosed with type 1 diabetes [16, 18], and genetic factors combined with other factors could be applied to the analysis of data from cohorts of relatives. In addition, these results highlight the differences between populations of European descent and illustrate the limits and the extent to which HLA may be helpful in predicting age at disease onset.

We have developed and calibrated three risk-prediction models for age at early and late onset of type 1 diabetes, based on five independent patient collections. We hope that these models may be used as pilots to lead further research in defining risk prediction for age at onset using other risk factors (e.g. environmental exposures, autoantibodies). The models may be applied at the individual level to predict the most likely category of age at onset (early or late), but also at the population level, with reference to other relative risks from published studies, to estimate the potential reduction in population risk that may be gained by primary prevention of any modifiable risk factors that influence type 1 diabetes and the ensuing complications.