Introduction

The prevalence of type 2 diabetes has been increasing in developing and developed countries, including China [1, 2]. Between 1986 and 1996, the prevalence of diabetes trebled in mainland China [3]. Sixty percent of the world’s diabetic population is expected to come from Asia [2]. Diabetes, mainly type 2, is the leading cause of end-stage renal disease (ESRD) [4], and Asian diabetic patients have one of the highest prevalence rates of albuminuria. In one multinational survey of Asian type 2 diabetic patients with hypertension, 40% had microalbuminuria and 20% had macroalbuminuria [5]. Consistent with these observations, Asians have higher incidences of ESRD compared with Caucasians [6].

Based on a prospective cohort of patients with type 2 diabetes, the United Kingdom Prospective Diabetes Study (UKPDS) has developed a series of risk engines, or equations, to estimate the absolute risk of complications such as coronary heart disease (CHD) [7] and stroke [8]. This type of risk assessment is widely accepted and is recommended in clinical guidelines [9, 10]. The UKPDS has also developed a risk equation for ESRD in diabetic populations as part of a health outcome model [11]. However, this equation was developed using a predominantly Caucasian population with only a limited set of risk factors. Previous epidemiological studies have shown that there is considerable variation in the rate of diabetes-related complications across different populations [12]. Hence, there is a need to develop a risk equation for ESRD for the Chinese population that would assist in identifying high-risk subjects for surveillance and intensive management [13].

In 1995, the Hong Kong Diabetes Registry was established as part of a continuous quality improvement programme to document all risk factors, complications and clinical outcomes of diabetic patients referred to the Prince of Wales Hospital, the majority being southern Chinese. Given its comprehensive nature, this well-established diabetes registry [14, 15] provides a unique opportunity to (1) identify risk factors for the occurrence of ESRD, and (2) develop ESRD risk equations for Chinese patients with type 2 diabetes that may also be applicable to diabetic patients at large, if externally validated.

Subjects and methods

Participants

The Prince of Wales Hospital, the teaching hospital of the Chinese University of Hong Kong, serves a population of over 1.2 million. The Hong Kong Diabetes Registry was established in 1995 and each week 30 to 50 diabetic patients attending our outpatient clinics, representative of the local diabetic populations, were recruited into the registry. The referral sources included general practitioners, community clinics, other specialty clinics, and patients discharged from the Prince of Wales Hospital or other regional hospitals. In this registry, 2.1% were newly diagnosed patients. Overall, the median number of years since the diagnosis of diabetes at the time of enrolment was 6 years (interquartile range 2–11 years). Enrolled patients with recent hospital admissions accounted for fewer than 10% of all referrals. The 4-h assessment of complications and risk factors was performed on an outpatient basis, modified from the European DIABCARE protocol [16]. Once a diabetic subject had undergone this comprehensive assessment, he/she was considered to have entered into this study cohort and was followed up until the time of death. Ethical approval for the study was obtained from the Chinese University of Hong Kong Clinical Research Ethics Committee. The Declaration of Helsinki was followed in the study and informed consent was obtained from all patients at the time of assessment for data analysis and research purposes.

For this analysis, the end-points, including data on hospital admissions, laboratory results (estimated GFR [eGFR]) and mortality, were censored on 31 December 2000. Details of all medical admissions of the cohort by that date were retrieved from the Hospital Authority Central Computer System, which records admissions to all public hospitals in Hong Kong. Mortality data from the Hong Kong Death Registry were also retrieved and all causes of death were further ascertained by review of case notes by an endocrinologist. These databases were matched by a unique identification number, the Hong Kong Identity Card number, which is compulsory for all residents in Hong Kong. Similarly, all drug-dispensing information and laboratory results, including serum creatinine data, were computerised and were retrieved from the Hospital Authority Central Computer System.

Because of the lack of a comprehensive primary care system and compulsory medical insurance scheme, the Hospital Authority, which is the governing body of all 42 public hospitals and 21 community-based general outpatient clinics in Hong Kong, provides about 95% of all inpatient bed days to the 6.8 million population of Hong Kong [17]. In this heavily subsidised system, most patients with chronic diseases, including diabetes, are regularly followed by the Hospital Authority system at intervals of 2 to 4 months, except those who have emigrated. Investigations and drug dispensing are done on site. A diabetic subject remained in this analysis until the date of death, as captured by the mortality data from the Hong Kong Death Registry, at which point he/she was considered to have exited the analysis.

Between 1995 and 2000, 4,799 diabetic patients were enrolled in this registry. From these patients, we excluded from the analysis 228 with type 1 diabetes, defined as acute presentation with diabetic ketoacidosis, heavy ketonuria (>3+) or continuous requirement for insulin within 1 year of diagnosis, and 79 with uncertain type 1 diabetes status. Thirty-seven patients who had an eGFR <15 ml min⁻¹ 1.73 m⁻² and 17 patients whose serum creatinine was not available at baseline were also excluded. Thus, 4,438 patients with type 2 diabetes who did not have ESRD at baseline were included in the analysis. Among them, 139 of the 187 patients who died from any cause were censored because they died of non-ESRD causes, and 4,140 were censored at the termination of the study.

In accordance with the International Classification of Diseases, Ninth Revision (ICD-9), ESRD was used as the end-point of this study and was defined as (1) death due to diabetes with renal manifestations or renal failure (ICD-9 codes 250.4, 585, 586); (2) hospitalisation due to non-fatal renal failure (ICD-9 codes 585 or 586); or (3) eGFR <15 ml min⁻¹ 1.73 m⁻² [18].

Clinical measurements

Details of assessment methods and definitions have been described previously [14, 15, 19]. In brief, on the day of assessment, patients attended the centre after at least 8 h of fasting. All patients underwent a 4-h assessment that included clinical examination, anthropometric measurements and laboratory investigations. This study used the abbreviated Modification of Diet in Renal Disease (MDRD) formula to estimate GFR (eGFR), expressed in ml min⁻¹ 1.73 m⁻²:

$$\text{eGFR} = 186 \times (\text{SCR} \times 0.011)^{-1.154} \times (\text{age})^{-0.203} \times (0.742\ \text{if female}),$$

where SCR is serum creatinine expressed in μmol/l (the factor 0.011 converts μmol/l to the mg/dl used in the original formula) and age is expressed in years. The serum creatinine level for calculation of eGFR on follow-up was retrieved from the Hospital Authority Central Computer System, which records all laboratory data generated from all public hospitals in Hong Kong. The actual age at measurement of serum creatinine was used in the calculation of eGFR on follow-up.
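The abbreviated MDRD calculation described above can be illustrated with a minimal Python sketch. The function name `egfr_mdrd` and its argument names are hypothetical and chosen for illustration only; the constants are those quoted in the text, including the rounded conversion factor 0.011.

```python
def egfr_mdrd(scr_umol_per_l: float, age_years: float, female: bool) -> float:
    """Abbreviated MDRD eGFR (ml/min per 1.73 m^2), using the constants in the text.

    The function and argument names are illustrative assumptions.
    """
    if scr_umol_per_l <= 0 or age_years <= 0:
        raise ValueError("serum creatinine and age must be positive")
    # 0.011 is the rounded umol/l -> mg/dl conversion factor quoted in the text.
    scr_mg_per_dl = scr_umol_per_l * 0.011
    egfr = 186.0 * scr_mg_per_dl ** -1.154 * age_years ** -0.203
    # The sex term multiplies the result by 0.742 for women.
    return egfr * 0.742 if female else egfr


def meets_egfr_endpoint(egfr: float) -> bool:
    """True when eGFR is below 15, the eGFR criterion used for ESRD in this study."""
    return egfr < 15.0


# Illustrative input only (not patient data from the registry).
example = egfr_mdrd(scr_umol_per_l=500.0, age_years=60.0, female=False)
print(round(example, 1), meets_egfr_endpoint(example))
```

Because the sex term is a simple multiplier, the estimate for a woman is always 0.742 times that for a man of the same age and serum creatinine.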

Laboratory assays

The complete blood picture was determined using a Beckman Coulter counter. Plasma glucose was measured with a hexokinase method (Hitachi 911 automated analyser; Boehringer Mannheim, Mannheim, Germany). HbA1c was measured with an automated ion-exchange chromatographic method (Biorad Laboratory, Hercules, CA, USA; reference range 5.1–6.4%). Inter- and intra-assay coefficients of variation for HbA1c were ≤3.1% at values below 6.5%. Total cholesterol (TC), triglyceride (TG) and HDL cholesterol (HDL-C) were measured by enzymatic methods with a Hitachi 911 automated analyser (Boehringer Mannheim) using reagent kits supplied by the manufacturer of the analyser. LDL cholesterol (LDL-C) was calculated by the Friedewald’s equation [20]. The precision of these assays was within the manufacturer’s specifications. Urinary creatinine (Jaffe’s kinetic method) and albumin (immunoturbidimetry method) were also measured with a Hitachi 911 analyser using reagent kits supplied by the manufacturer. The inter-assay coefficient of variation was 12.0% and 2.3% for urinary albumin concentrations of 8.0 mg/l and 68.8 mg/l, respectively. The lowest detection limit was 3.0 mg/l. Serum creatinine (Jaffe’s kinetic method) was measured with a Dimension AR system (Dade Behring, Deerfield, IL, USA).

Statistical analyses and risk function development

All data are expressed as mean (SD), median (interquartile range) or percentage, as appropriate. Rates were compared using the χ² test and means of pairs of variables were compared using Student’s t-test. The Wilcoxon two-sample test was used to test differences in distributions between pairs of variables that were not normally distributed.

Before developing the risk equation, the data were randomly divided into two databases. One half of the data (n=2,227) was used as the training data and the other half (n=2,211) was used as the test data. After verifying the assumption of proportionality, Cox proportional hazards regression analysis was used to obtain estimates of predictors of ESRD. The variables in the first Cox model included age, sex, BMI, known duration of diabetes, smoking status (ex and current), systolic BP, TC:HDL-C ratio, HbA1c, white blood cell count, retinopathy and peripheral vascular disease (PVD) status at baseline. A forward stepwise algorithm (p<0.10 for entry and p<0.05 for stay) was used to obtain model 1. Using significant variables obtained in the first model, another Cox model was constructed with additional entry of baseline eGFR, log10 urinary albumin:creatinine ratio (ACR) and haematocrit using the same forward stepwise algorithm to obtain model 2. Since haematocrit, eGFR and ACR were closely related to renal function, they were not included in model 1 in order to improve our understanding of possible cause–effect relationships. Then, drug effects were examined in the model using the forward stepwise algorithm and significant variables were included in the model (model 3). As drug effects on diabetic complications were expected to be consistent with their beneficial effects on risk profile [21], the number of months of drug use (lipid-lowering drugs, angiotensin II receptor blockers and ACE inhibitors [ACEIs]) after the date of enrolment was used as a covariate, and the number of months was further multiplied by q₁ and q₂, where q₁ was the months of drug use during the study as a proportion of the total observation months, and q₂ was 1 divided by the sum of months from stopping the drug to event time plus 1. For example, if during 24 months of follow-up an ACEI was used for 12 months, commencing at the 12th month (so that ACEIs were not used in the last month), the adjusted number of months of drug use was 12×(12/24)×{1/(1+1)}=3.
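The adjustment of drug-use months can be written out as a short Python sketch that reproduces the worked example in the text. The function name `adjusted_drug_months` and its parameter names are hypothetical; only the definitions of the two weights are taken from the description above.

```python
def adjusted_drug_months(months_used: float,
                         total_months: float,
                         months_since_stop: float) -> float:
    """Months of drug use weighted by the two factors defined in the text.

    q1 = months of drug use / total observation months
    q2 = 1 / (months from stopping the drug to event time + 1)
    The function and parameter names are illustrative assumptions.
    """
    if total_months <= 0:
        raise ValueError("total_months must be positive")
    if not 0 <= months_used <= total_months:
        raise ValueError("months_used must lie between 0 and total_months")
    if months_since_stop < 0:
        raise ValueError("months_since_stop must be non-negative")
    q1 = months_used / total_months
    q2 = 1.0 / (months_since_stop + 1.0)
    return months_used * q1 * q2


# Worked example from the text: 12 months of ACEI use within 24 months of
# follow-up, with the drug not used in the last month.
print(adjusted_drug_months(months_used=12, total_months=24, months_since_stop=1))
```

Under this weighting, a drug that was used for a small fraction of follow-up, or stopped long before the event, contributes little to the covariate.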

Based on the Cox model, the risk score (not absolute risk or a probability) of developing ESRD over a 4-year period from baseline is

$$\text{risk score} = X_{1} \times \beta_{1} + X_{2} \times \beta_{2} + \cdots + X_{p} \times \beta_{p},$$

where X₁, X₂, ..., Xₚ are baseline variables and β₁, β₂, ..., βₚ are, respectively, the estimated parameters of variables 1 to p from Cox models.
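The risk score is a linear predictor, that is, a weighted sum of baseline variables, and can be sketched in Python as follows. The coefficient values below are hypothetical placeholders chosen for easy arithmetic; they are not the estimates reported in Table 2, and the variable names are assumptions made for illustration.

```python
from typing import Mapping


def risk_score(covariates: Mapping[str, float],
               coefficients: Mapping[str, float]) -> float:
    """Linear predictor: sum of X_i * beta_i over the variables in the model."""
    missing = set(coefficients) - set(covariates)
    if missing:
        raise KeyError(f"missing covariates: {sorted(missing)}")
    return sum(coefficients[name] * covariates[name] for name in coefficients)


# HYPOTHETICAL coefficients for illustration only (not the Table 2 estimates).
HYPOTHETICAL_BETAS = {"haematocrit": -5.0, "log10_acr": 1.0, "egfr": -0.05}

# Illustrative baseline values for one imaginary patient.
patient = {"haematocrit": 0.40, "log10_acr": 2.0, "egfr": 60.0}

score = risk_score(patient, HYPOTHETICAL_BETAS)
print(score)
```

As stated in the text, the value obtained is a score for ranking patients, not an absolute risk or a probability.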

Validation

The validation of the risk equations was dependent on how correctly they ranked patients by their ESRD risk. Calibration was checked using the Hosmer–Lemeshow goodness of fit test. The area under the receiver operating characteristic curve (aROC) was used to indicate the discriminative power of the risk equations obtained, using the risk score of ESRD calculated from the equations [22]. The aROC varied from 0.5 to 1.0, a larger value indicating better performance of the equation. The aROC was compared between equations using an SAS macro for comparing the aROC for correlated samples (Mann–Whitney statistics) (available from http://ftp.sas.com/techsup/download/stat/roc.html). An aROC that took observation time and censoring status into account was calculated using the method described by Chambless and Diao [23]. As non-ESRD death was a competing risk for the ESRD end-point and the Kaplan–Meier estimator is not valid in this situation [24], the method described by Gooley et al. [24] was used to plot actual cumulative incidences of ESRD with stratification of the risk score at the cut-off point, enabling us to observe the discriminating ability of the risk score above or below the cut-off point over time.
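The link between the aROC and the Mann–Whitney statistic can be illustrated with a minimal Python sketch: the aROC equals the proportion of (event, non-event) pairs in which the patient with the event has the higher risk score, with ties counted as one half. This sketch ignores follow-up time and censoring, so it corresponds to the unadjusted aROC only and not to the Chambless and Diao method; the function name and the toy scores are assumptions.

```python
from typing import Sequence


def auc_mann_whitney(scores_events: Sequence[float],
                     scores_nonevents: Sequence[float]) -> float:
    """Unadjusted aROC as the Mann-Whitney probability of correct ranking."""
    if not scores_events or not scores_nonevents:
        raise ValueError("both groups must contain at least one score")
    concordant = 0.0
    for s_event in scores_events:
        for s_non in scores_nonevents:
            if s_event > s_non:
                concordant += 1.0   # event patient ranked higher: concordant pair
            elif s_event == s_non:
                concordant += 0.5   # tie counts as half
    return concordant / (len(scores_events) * len(scores_nonevents))


# Toy risk scores (not study data).
toy_events = [-4.0, -2.0]
toy_nonevents = [-6.0, -5.0, -3.0]
print(auc_mann_whitney(toy_events, toy_nonevents))
```

A value of 0.5 corresponds to ranking no better than chance, and 1.0 to every patient with an event scoring higher than every patient without one.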

Results

Characteristics of the study population

The total observation time was 12,774.8 person-years and the median follow-up time was 2.9 years (interquartile range 1.6–4.1). During the follow-up, 159 patients with type 2 diabetes (72 in the training data and 87 in the test data) who did not have ESRD at baseline developed ESRD, giving an incidence rate of 12.45 per 1,000 person-years (95% CI 10.52–14.37 per 1,000 person-years). Among these 159 patients who developed ESRD, 65 were hospitalised because of renal failure or died due to renal failure or diabetes with renal manifestations (ten of these 65 patients died) and 148 had an eGFR <15 ml min⁻¹ 1.73 m⁻². The total number exceeded 159 because of overlap among these definitions (54 overlapped, including seven deaths).
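The reported incidence rate can be checked with a few lines of Python. The text does not state how the confidence interval was obtained; the sketch below assumes a normal approximation with a binomial-type standard error, which happens to reproduce the reported limits. That choice, and the function name, are assumptions rather than the documented method.

```python
import math


def incidence_rate_per_1000(events: int, person_years: float, z: float = 1.96):
    """Rate per 1,000 person-years with an approximate 95% CI.

    ASSUMPTION: normal approximation with SE = sqrt(r * (1 - r) / person_years),
    where r = events / person_years. The source does not state its CI method.
    """
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    rate = events / person_years
    se = math.sqrt(rate * (1.0 - rate) / person_years)
    return tuple(1000.0 * value for value in (rate, rate - z * se, rate + z * se))


rate, lower, upper = incidence_rate_per_1000(events=159, person_years=12774.8)
print(round(rate, 2), round(lower, 2), round(upper, 2))  # 12.45 10.52 14.37
```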

Table 1 compares the baseline clinical and biochemical characteristics of patients who developed ESRD and those who did not develop ESRD. The differences in sex, HbA1c, BMI in the pooled data, and in waist-to-hip ratio in men, did not reach statistical significance at the 0.05 level. All other variables, including age, known duration of diabetes, systolic BP, diastolic BP, TC:HDL-C ratio, TG, LDL-C, white blood cell count, haematocrit, eGFR, ACR, serum creatinine and rates of retinopathy and PVD, were significantly different between the two groups at baseline.

Table 1 Comparison of baseline clinical and biochemical characteristics between groups of Chinese patients with type 2 diabetes who developed and those who did not develop end-stage renal disease (ESRD) during an observation period of 2.9 years

Risk factors for end-stage renal disease

Using a Cox model with a forward stepwise algorithm, without including eGFR, log10 ACR and haematocrit, the variables remaining in the model were: known duration of diabetes, systolic BP, log10 TC:HDL-C ratio and retinopathy (model 1, Table 2). When haematocrit was entered, other variables remained significant. Systolic BP, log10 TC:HDL-C ratio and known duration of diabetes were removed when eGFR was also entered into the model. After further entry of log10 ACR, retinopathy was removed from the model. In the final model, haematocrit, log10 ACR and eGFR were independent predictors of ESRD (model 2, Table 2). Use of ACEI was also a significant predictor. When ACEI was entered in the final model, the significance of the other variables remained (model 3, Table 2). Other treatments were not found to be predictive of ESRD in the cohort.

Table 2 Parameter estimates of risk equations for end-stage renal disease in Hong Kong Chinese patients with type 2 diabetes

Validation

The observed ESRD events and predicted ESRD events were not significantly different for the three ESRD risk equations (p>0.1000) (see Fig. 1 for data derived from the second risk equation). The aROC was 0.841 (95% CI 0.802–0.879) for the first risk equation (derived from model 1), 0.962 (95% CI 0.939–0.985) for the second risk equation (derived from model 2) and 0.965 (95% CI 0.945–0.985) for the third risk equation (derived from model 3). The aROC derived from the first risk equation was significantly smaller than the aROCs from either the second or the third risk equation (both p values <0.0001). The difference in aROC between the second and third equations was not significant (p=0.3377). The adjusted aROCs of the first, second and third risk equations were, respectively, 0.883, 0.967 and 0.977 over the 4 years of follow-up (the 75th percentile of follow-up time was rounded up to an integer; see Fig. 2 for data derived from the second risk equation). At the cut-off point of ≥−4.7068 for the second risk equation, the sensitivity was 96.8% and the specificity was 86.7%. When the cut-off point of ≥−3.5282 was used for the second equation, the sensitivity was 88.8% and the specificity was 94.7%. Similarly, for the third risk equation, at the cut-off point of ≥−4.5472, the sensitivity was 96.8% and the specificity was 87.8%. Using the cut-off point of ≥−3.1939, the sensitivity decreased to 87.0% and the specificity increased to 96.7%. Using the second risk equation, the actual ESRD incidence of patients with risk scores above or below the cut-off point of −4.7068 continued to diverge over the 4-year period of follow-up (Fig. 3).
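The trade-off between sensitivity and specificity at a given cut-off point can be illustrated with a minimal Python sketch. Patients with a risk score at or above the cut-off are classified as high risk, matching the "≥" convention used above. The function name and the toy scores and outcomes are assumptions; only the cut-off value −4.7068 is taken from the text.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(scores: Sequence[float],
                            developed_esrd: Sequence[bool],
                            cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when 'score >= cutoff' is called high risk."""
    if len(scores) != len(developed_esrd):
        raise ValueError("scores and outcomes must have the same length")
    tp = fn = tn = fp = 0
    for score, event in zip(scores, developed_esrd):
        high_risk = score >= cutoff
        if event and high_risk:
            tp += 1
        elif event:
            fn += 1
        elif high_risk:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one event and one non-event")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (not study data); the cut-off is the one quoted for the second equation.
toy_scores = [-6.0, -5.0, -4.5, -3.0, -2.0, -4.8]
toy_outcomes = [False, False, False, True, True, True]
sens, spec = sensitivity_specificity(toy_scores, toy_outcomes, cutoff=-4.7068)
print(sens, spec)
```

Raising the cut-off classifies fewer patients as high risk, which lowers sensitivity and raises specificity, the pattern reported for both the second and third equations.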

Fig. 1 Observed (light grey) ESRD events vs predicted (dark grey) ESRD events by deciles of the risk score derived from the second risk equation using the test subsample. p>0.1000, Hosmer–Lemeshow test

Fig. 2 Follow-up time and censoring-adjusted ROC curves estimating the predictive accuracy of the second ESRD risk equation in Chinese patients with type 2 diabetes using the validation subsample

Fig. 3 Actual cumulative incidence of end-stage renal disease over follow-up time, stratified by the cut-off point of the risk score from the second risk equation. Upper curve, group with a risk score of −4.7068 and above; lower curve, group with a risk score below −4.7068

Discussion

On the basis of a relatively large diabetes registry with well-documented clinical data, we have developed risk equations for calculating the risk score of ESRD over a 4-year period in patients with type 2 diabetes. The risk equation had good discriminative power and distinguished patients who developed ESRD from those who did not during a 4-year observational period. The discriminative accuracy was very high, with an aROC of 0.97 for the second risk equation and, to our knowledge, this is the highest aROC among published risk equations. Orford et al. [25] compared the predicted and observed numbers of cardiovascular events in 1,393 subjects in the USA and reported c-statistics of 0.60 and 0.58 using the Framingham risk equation and the European Society of Cardiology risk model, respectively. Liu et al. [26] developed a CHD risk equation for the general Chinese population and achieved a discriminative power of 0.71 for men and 0.74 for women. Guzder et al. [27] validated the Framingham CHD risk equation and the UKPDS CHD risk engines using a UK community-based cohort and found that the two sets of risk functions had similar discriminative power (Framingham, 0.66; UKPDS, 0.67). Compared with these published risk equations for CHD, the aROCs of the currently developed equation for ESRD are considerably higher, probably because of the comprehensive risk profiling at baseline and the high incidence of ESRD in our Chinese population with type 2 diabetes, despite a relatively short median observational period of 2.9 years.

The risk factors identified by these risk models were consistent with findings from other studies. Albuminuria is now proven to be predictive of ESRD on the basis of both epidemiological and interventional studies [28, 29]. In the landmark Reduction of End-points in NIDDM with the Angiotensin II Antagonist Losartan (RENAAL) study, reduction in proteinuria 6 months after randomisation to either losartan or placebo treatment was predictive of the future development of ESRD in patients with type 2 diabetes with overt nephropathy and renal impairment [30]. In Hong Kong Chinese patients with type 2 diabetes, albuminuria was an independent predictor of early mortality and deterioration in renal function [19, 31]. It is now widely accepted that microalbuminuria is a marker of generalised vascular damage due to risk factors such as elevated arterial pressure [32]. In diabetic patients, microalbuminuria may also herald the onset of overt nephropathy, which can further accelerate the process of vascular dysfunction [33]. On the basis of a meta-analysis, Bakris et al. [34] reported an inverse relationship between reduction in BP and rate of decline in eGFR, thus emphasising the critical importance of optimal BP control and reducing albuminuria to preserve renal function. In this connection, systolic BP was a predictor of ESRD in model 1, but was removed after adjustment for eGFR.

The pathogenesis of diabetic renal disease is complex and involves genetic, haemodynamic and metabolic factors [35]. In the Helsinki Heart Study [36], an increased ratio of LDL-C to HDL-C was associated with decline in renal function. In the Atherosclerosis Risk in Communities Study, Muntner et al. [37] reported that high TG and low HDL-C levels predicted a rise in serum creatinine. In this analysis, a high TC:HDL-C ratio was a significant predictor of ESRD, although its effect was influenced by haematocrit and ACR.

In support of the reported associations between retinopathy and renal disease in the WHO Multinational Study of Vascular Disease in Diabetes (MSVDD) [21], retinopathy was also found to be a risk factor for ESRD in model 1, as in the UKPDS outcome model [11]. Hackam [38] found that patients with PVD had a marked increase in the risk of stroke and death from cardiovascular causes compared with those without PVD. We had previously demonstrated that PVD was associated with an adverse metabolic profile and impaired renal function [39]. However, PVD was not identified as a risk factor in model 1. Its effect may be small and largely explainable by other risk factors. Of note, whereas retinopathy was a predictor of ESRD in our cohort, the predictive power of retinopathy was also explained by ACR.

The importance of anaemia in the development of ESRD is now increasingly appreciated [28, 40]. In the RENAAL study, which involved patients with type 2 diabetes and overt nephropathy, only haemoglobin, albuminuria and serum creatinine were predictive of ESRD after controlling for all confounding factors [28, 40]. Broom [41] argued that anaemia was an underestimated risk factor for diabetic renal disease. Ueda et al. [42] observed that insulin therapy, serum albumin, mean blood pressure and haemoglobin were independent and significant predictors of progression to ESRD, whereas HbA1c and serum cholesterol were not. Foley et al. [43] found that anaemia was an independent risk factor for mortality in ESRD, whereas Ishimura et al. [44] reported that increased serum creatinine and the presence of diabetes were independent risk factors for new onset of anaemia.

Reduced renal production of erythropoietin in diabetic patients with renal disease might be a cause of anaemia. On the other hand, Dean et al. [45] reported that erythropoiesis-stimulating proteins might curb the rate of decline in renal function in predialysis patients with chronic kidney disease. We speculate that anaemia may have dual roles in the pathogenesis of ESRD in diabetic populations, being a consequence of renal damage as well as a promoter of the decline of renal function. The use of erythropoietin in patients with moderate to severe renal impairment might further complicate the picture. However, this was not applicable to our local population, in whom the use of erythropoietin was uncommon. Indeed, only dialysis patients who are on a transplant waiting list with a haemoglobin concentration below 8.0 g/dl will be considered for subsidised erythropoietin therapy. Further studies are required to investigate the complex associations between haemoglobin and ESRD.

In agreement with recent studies which showed risk associations between smoking and renal failure in both diabetic and non-diabetic subjects [46], we found an increased rate of smoking (ex and current) in our patients who developed ESRD. However, the significance of this was lost when other covariates were controlled. Although genetic, in utero, perinatal and early childhood factors may all contribute to the development of diabetic nephropathy [47], our study demonstrates that, with comprehensive risk profiling, most incident cases of ESRD can be predicted with high accuracy.

This study has several limitations. First, the sample size was only moderate and the observation period was not long. Second, as with all risk equations, further validation is essential before our results can be put to clinical use. Third, during the development of UKPDS risk engines, it was argued that it was not necessary to include therapies, since their effects on diabetic complications were expected to be consistent with their beneficial effects on the risk profile [21]. In that study, the renoprotective actions of ACEI might be largely due to its BP-lowering effect [48]. Since inclusion of the use of ACEIs in the risk equation did not increase its discriminative power, the use of ACEIs was not included in the final risk equation (second risk equation). Another reason for not including drug treatments is that clinicians can never be certain about what drugs will be used in patients in the many years to come. Taking all results together, the concept of developing risk equations that take drug treatments into consideration, though attractive, has limited use in practice. This is comparable to risk prediction based on data from randomised clinical trials and meta-analyses. Fourth, only single biochemical measurements were used and regression dilution was not considered. Fifth, patients with chronic diseases in Hong Kong are predominantly followed up in public health-care systems, and the cohort was clinic-based rather than population-based. Therefore, our cohort might not be fully representative and care should be taken when the risk equation is used in other Hong Kong diabetic populations. Sixth, in the late 1980s and early 1990s, Hong Kong experienced a tide of emigration. However, by the time this study was initiated, the tide had almost ended. Moreover, as heavy users of health-care resources, patients with chronic diseases such as diabetes were less likely to be accepted as immigrants by other countries. Among the 4,151 diabetic patients who survived during the observation period (out of the total of 4,799 diabetic patients entered in this analysis between 1995 and 2000), 97.7% had laboratory investigations, pharmacy prescriptions or hospital admission records captured from the Hospital Authority Central Computer System, suggesting that they were receiving active medical care in hospital. Considering all data, the number of emigrated patients, if any, was expected to be small. Seventh, non-ESRD death is a competing risk for the ESRD end-point. In the presence of competing risks, proportional hazards models such as Cox and Weibull models still give valid results when employed to test the hazard ratio [49]. However, the cumulative distribution does not have any probabilistic interpretation [50]. Therefore, no further efforts were made to calculate the absolute risk of ESRD. Eighth, equations for calculating GFR have not been well validated in the Chinese population. Based on limited reports, the abbreviated MDRD equation is considered to be less biased than the Cockcroft–Gault equation for Chinese people [51].

In conclusion, despite these limitations, on the basis of data from a prospective cohort of Hong Kong Chinese patients with type 2 diabetes, we have developed a risk equation for ESRD with high accuracy. Given the rising burden of type 2 diabetes and ESRD, especially amongst Asian populations, including Chinese, validation of the equation in other populations could make an important contribution to public health.