INTRODUCTION

National cholesterol guidelines use a “Framingham model” to calculate a person’s 10-year risk of myocardial infarction or coronary death. Based on this risk, patients are categorized into different risk groups, which are used to guide treatment decisions.1 The Framingham model is a complex mathematical equation. To enable use in settings without calculators or computers, the formula for calculating risk was simplified into a point-based or “score sheet” system.2

Both the original and point-based versions of the Framingham model are endorsed by the National Cholesterol Education Program’s Adult Treatment Panel III (ATP III) guidelines.1,3 However, the simplified point-based system may yield less accurate risk estimates and, in turn, different treatment recommendations. The original system derives risk estimates from Cox regression models based on patient age, total and HDL cholesterol, systolic blood pressure, treatment for hypertension, and smoking status. For example, based on the original Framingham model equation, a 45-year-old male smoker with total cholesterol 170 mg/dl, HDL 38 mg/dl, and systolic blood pressure of 125 mmHg who is not on antihypertensive treatment would have a calculated 7% risk of myocardial infarction or coronary death over 10 years. In contrast, the point-based system assigns an integer point value to each level of each risk factor. These points are summed to derive a score, and the risk corresponding to that score is then read from a look-up table. For example, the same patient would receive 3 points for being aged 45–49 years, 5 points for smoking, and so forth, for a total score of 13 points, which corresponds to a 12% risk of a major coronary event over the next 10 years.
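To make the contrast concrete, the sketch below computes both estimates for the example patient. It is a minimal illustration, not the software used in this study: the function names are hypothetical, only the equations for men are shown, and the point tables and Cox-model coefficients are transcribed from the published ATP III score sheet and hard-CHD equation for men, an assumption that should be verified against the Framingham Heart Study source before any use. Capping age at 70 years in the age-by-smoking term is likewise an assumption; it does not affect this 45-year-old patient.

```python
import bisect
import math

# ASSUMPTION: tables and coefficients transcribed from the ATP III score sheet and
# hard-CHD equation for MEN; verify against the published source before use.
AGE_BANDS = [20, 35, 40, 45, 50, 55, 60, 65, 70, 75]   # lower bounds of age bands
AGE_POINTS = [-9, -4, 0, 3, 6, 8, 10, 11, 12, 13]
TC_CUTS = [160, 200, 240, 280]                          # total cholesterol rows (mg/dl)
TC_POINTS = [[0, 0, 0, 0, 0], [4, 3, 2, 1, 0], [7, 5, 3, 1, 0],
             [9, 6, 4, 2, 1], [11, 8, 5, 3, 1]]         # columns: ages 20-39, 40-49, ..., 70-79
SMOKER_POINTS = [8, 5, 3, 1, 1]                         # same age columns
HDL_CUTS, HDL_POINTS = [40, 50, 60], [2, 1, 0, -1]
SBP_CUTS = [120, 130, 140, 160]                         # systolic pressure rows (mmHg)
SBP_POINTS = {False: [0, 0, 1, 1, 2], True: [0, 1, 2, 2, 3]}  # keyed by "on treatment?"
RISK_BY_POINTS = {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 3, 8: 4,
                  9: 5, 10: 6, 11: 8, 12: 10, 13: 12, 14: 16, 15: 20, 16: 25}


def point_score_men(age, tc, hdl, sbp, treated, smoker):
    """Sum the integer points from the score sheet (age in whole years, 20-79)."""
    col = min(max(int(age - 30) // 10, 0), 4)          # age column for TC and smoking
    pts = AGE_POINTS[bisect.bisect_right(AGE_BANDS, age) - 1]
    pts += TC_POINTS[bisect.bisect_right(TC_CUTS, tc)][col]
    pts += SMOKER_POINTS[col] if smoker else 0
    pts += HDL_POINTS[bisect.bisect_right(HDL_CUTS, hdl)]
    pts += SBP_POINTS[bool(treated)][bisect.bisect_right(SBP_CUTS, sbp)]
    return pts


def points_to_risk_pct(points):
    """Look up 10-year risk (%); 0 stands for '<1%' and 30 for '>=30%'."""
    if points < 0:
        return 0
    return RISK_BY_POINTS.get(points, 30)               # 17 or more points -> '>=30%'


def original_risk_men(age, tc, hdl, sbp, treated, smoker):
    """10-year hard-CHD risk (fraction) from the Cox-model equation for men."""
    ln_age, ln_tc = math.log(age), math.log(tc)
    s = 1 if smoker else 0
    ln_age_smoke = math.log(min(age, 70))               # ASSUMPTION: age capped at 70 here
    lin = (52.00961 * ln_age + 20.014077 * ln_tc - 0.905964 * math.log(hdl)
           + 1.305784 * math.log(sbp) + 0.241549 * (1 if treated else 0)
           + 12.096316 * s - 4.605038 * ln_age * ln_tc
           - 2.84367 * ln_age_smoke * s - 2.93323 * ln_age ** 2 - 172.300168)
    return 1 - 0.9402 ** math.exp(lin)                  # 0.9402: baseline survival


patient = dict(age=45, tc=170, hdl=38, sbp=125, treated=False, smoker=True)
score = point_score_men(**patient)
print(score, points_to_risk_pct(score))            # 13 12
print(round(100 * original_risk_men(**patient)))   # 7
```

Because the score sheet replaces continuous values with bands, a patient near a band edge can receive a noticeably different estimate from each method, as this patient does (12% vs about 7%).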

The proliferation of computers and personal digital assistants (PDAs) in clinical medicine enables easy implementation of the original, mathematically complex Framingham model at the point of care. However, the point-based system remains widely used in both clinical practice and research, including in computerized risk prediction tools.4–8 Given that approximately 36 million persons in the US are eligible for lipid-lowering therapy, differences in classification could result in millions of persons receiving different lipid-lowering therapy depending on which model is used.9,10 In this study, we used nationally representative data to compare predicted risk between the original and point-based Framingham calculations and to determine the degree to which the point-based system stratifies patients into different risk groups.

METHODS

Data were obtained from the 2001–2006 waves of the National Health and Nutrition Examination Survey (NHANES). Respondents were selected and their data weighted to be representative of the non-institutionalized US population.11

Under ATP III guidelines, patients with known coronary heart disease (CHD) or CHD risk equivalents are considered high risk (>20% 10-year risk). Patients with 0–1 risk factors are considered low risk. Patients with two or more risk factors but no known CHD or risk equivalents are placed in an indeterminate risk group. The Framingham risk model is applied to patients in this indeterminate group to assess whether they are at “moderate” (<10%), “moderately high” (10–20%), or “high” (>20%) risk of myocardial infarction or coronary death in the next 10 years (Fig. 1).1
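The triage rules above can be sketched as a small function. This is a hypothetical helper written for illustration: it takes whether the patient has CHD or a risk equivalent, the number of risk factors, and (for patients with two or more) a Framingham 10-year risk in percent. Treating exactly 20% as “moderately high” follows the 10–20% wording and is an assumption about boundary handling.

```python
def atp3_risk_group(has_chd_or_equivalent, n_risk_factors, ten_year_risk_pct=None):
    """Assign an ATP III risk group (hypothetical helper; boundaries follow the text)."""
    if has_chd_or_equivalent:
        return "high"                        # CHD or CHD risk equivalent
    if n_risk_factors <= 1:
        return "low"                         # 0-1 risk factors: no Framingham scoring
    if ten_year_risk_pct is None:
        raise ValueError("2+ risk factors: a Framingham 10-year risk is required")
    if ten_year_risk_pct < 10:
        return "moderate"
    if ten_year_risk_pct <= 20:
        return "moderately high"             # assumption: exactly 20% counts as 10-20%
    return "high"


# Example patient from the Introduction: 7% (original) vs 12% (point-based)
print(atp3_risk_group(False, 2, 7), "|", atp3_risk_group(False, 2, 12))
```

For the Introduction’s example patient (two risk factors: smoking and HDL <40 mg/dl), the two estimates fall on opposite sides of the 10% cutpoint, so the same person lands in “moderate” under one model and “moderately high” under the other.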

Figure 1

Creation of analytic cohort and LDL treatment guidelines for patients undergoing Framingham-based risk stratification. *CHD risk equivalents included self-report of myocardial infarction, angina pectoris, diabetes mellitus, and stroke. †Persons taking lipid-lowering drugs ineligible for analysis. ‡CHD risk factors included: cigarette smoking, hypertension, HDL <40 mg/dl, family history of premature CHD, and age (>45 years for men, >55 years for women). Persons with HDL >60 mg/dl had 1 point subtracted from their risk factor sum score.

Our analyses focused on adults aged 20–79 years in this indeterminate group (the Framingham models were not designed for people over 79 years of age). We excluded the 11% of patients in this group who reported taking lipid-lowering therapy, because the Framingham models and risk stratification algorithms are not designed to predict risk in this population.

We used self-report data from NHANES to exclude subjects with CHD or risk equivalents, including myocardial infarction, angina pectoris, diabetes mellitus, or stroke. Next, we summed the number of risk factors for each subject, including current cigarette use, hypertension (defined by self-report or a documented blood pressure of ≥140 mmHg systolic or ≥90 mmHg diastolic), low HDL (<40 mg/dl), family history of CHD (history of heart attack or angina before age 50 years in close biological relatives), and age (men >45 years old, women >55 years old). Subjects with HDL >60 mg/dl had one point subtracted from their risk factor sum. Subjects with two or more risk factors formed our analytic cohort.
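The risk-factor count and cohort eligibility rules can be sketched as below. The `Subject` record and its field names are hypothetical and do not correspond to NHANES variable names; the diastolic pressure for the example patient is an assumed value, since the Introduction does not give one.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    """Hypothetical record layout; field names are illustrative, not NHANES variables."""
    male: bool
    age: int
    smoker: bool
    self_reported_htn: bool
    sbp: float          # mmHg
    dbp: float          # mmHg
    hdl: float          # mg/dl
    family_history_chd: bool


def risk_factor_count(s: Subject) -> int:
    """Count ATP III risk factors as defined in the text, minus 1 for HDL >60 mg/dl."""
    hypertension = s.self_reported_htn or s.sbp >= 140 or s.dbp >= 90
    age_factor = s.age > 45 if s.male else s.age > 55
    count = sum([s.smoker, hypertension, s.hdl < 40, s.family_history_chd, age_factor])
    if s.hdl > 60:
        count -= 1                       # protective HDL offsets one risk factor
    return count


def in_analytic_cohort(s: Subject, has_chd_or_equivalent: bool, on_lipid_therapy: bool) -> bool:
    """Eligibility: age 20-79, no CHD or equivalent, no lipid-lowering drug, 2+ risk factors."""
    return (20 <= s.age <= 79 and not has_chd_or_equivalent
            and not on_lipid_therapy and risk_factor_count(s) >= 2)


# Introduction's example man; diastolic pressure of 80 mmHg is an assumed value
example = Subject(male=True, age=45, smoker=True, self_reported_htn=False,
                  sbp=125, dbp=80, hdl=38, family_history_chd=False)
print(risk_factor_count(example), in_analytic_cohort(example, False, False))   # 2 True
```

Under these definitions the 45-year-old example man does not receive the age point (men must be over 45), so he qualifies on smoking and low HDL alone.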

We used multiple imputation by chained equations on the 2005–2006 wave of NHANES to account for the approximately 11% of eligible subjects with incomplete data, mostly due to missing blood pressure measurements and missing laboratory values for total and HDL cholesterol.12,13 Because the multiple imputation results were very similar to those obtained by analyzing only subjects with complete data, we conducted our main analyses for all three waves on subjects with complete data.
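After chained-equation imputation, an analysis is run on each completed dataset and the results are combined. The sketch below shows only that final combining step, using Rubin’s rules; it is not the Stata procedure used in this study, and the per-imputation estimates in the example are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances (squared SEs) by Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least 2 imputations with matching variances")
    q_bar = statistics.fmean(estimates)       # pooled point estimate
    w = statistics.fmean(variances)           # average within-imputation variance
    b = statistics.variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b                   # total variance of q_bar
    return q_bar, t


# Hypothetical misclassification proportions from 5 imputed datasets
est = [0.148, 0.152, 0.150, 0.147, 0.153]
var = [0.00008] * 5
q, t = pool_rubin(est, var)
print(round(q, 4), round(t ** 0.5, 4))   # pooled estimate and its standard error
```

The between-imputation term inflates the variance to reflect uncertainty about the missing values; when imputations agree closely, as the similarity to complete-case results suggests here, that term is small.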

Analyses

As the Framingham models are gender specific, we split our cohort into male and female groups. Using formulas published on the Framingham Heart Study website, for each subject we calculated the predicted 10-year risk of hard CHD events using the original and point-based models.14 Next, we determined which risk group each person in our cohort would be placed in under the original model and calculated the number of people reclassified into a higher or lower risk group by the point-based system. We term this shift between risk groups “misclassification,” insofar as such patients are misclassified relative to the original Framingham model. We evaluated differences in risk classification using kappa statistics and compared the estimated probabilities of misclassification into lower or higher ATP III risk groups using multinomial models.
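The agreement measures described above can be sketched as follows. This is a minimal, hypothetical illustration: it takes each subject’s risk group under both models plus an optional sampling weight, and returns the weighted shares reclassified upward and downward and Cohen’s kappa computed from weighted proportions. It does not reproduce the confidence intervals, design-based variance estimation, or multinomial models used in the analysis.

```python
from collections import defaultdict

GROUPS = ["moderate", "moderately high", "high"]   # ordered lowest to highest risk


def compare_classifications(original, point_based, weights=None):
    """Weighted shares moved up/down by the point-based model, and Cohen's kappa."""
    if weights is None:
        weights = [1.0] * len(original)
    rank = {g: i for i, g in enumerate(GROUPS)}
    total = sum(weights)
    up = down = agree = 0.0
    marg_orig, marg_point = defaultdict(float), defaultdict(float)
    for o, p, w in zip(original, point_based, weights):
        marg_orig[o] += w
        marg_point[p] += w
        if rank[p] > rank[o]:
            up += w                              # point-based places subject higher
        elif rank[p] < rank[o]:
            down += w                            # point-based places subject lower
        else:
            agree += w
    p_obs = agree / total                                        # observed agreement
    p_exp = sum(marg_orig[g] * marg_point[g] for g in GROUPS) / total ** 2  # chance agreement
    return {"up": up / total, "down": down / total,
            "kappa": (p_obs - p_exp) / (1 - p_exp)}


orig = ["moderate", "moderate", "moderately high", "high"]
point = ["moderate", "moderately high", "moderately high", "moderately high"]
print(compare_classifications(orig, point))   # up = down = 0.25, kappa about 0.27
```

Kappa corrects raw agreement for the agreement expected by chance given each model’s group distribution, which matters when most subjects fall in a single group.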

Finally, among patients misclassified by the point-based system, we determined whether this misclassification would affect guideline-based drug treatment recommendations. For this analysis, we used LDL levels, which NHANES measured only in participants examined in the morning session (1,079 of our 2,543 subjects). Among misclassified subjects, we determined whether LDL levels were above the threshold for starting drug therapy for the risk group assigned by one model but below the threshold for the risk group assigned by the other model.
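The threshold comparison can be sketched as below. The LDL cutpoints are assumptions based on a reading of the original ATP III report (drug therapy at ≥160 mg/dl for <10% risk and ≥130 mg/dl for 10–20% and high risk, treating the drug-optional 100–129 mg/dl range in high-risk patients as “no drug”); the 2004 optional targets are not reproduced and would be passed in through the `thresholds` parameter. The function name is hypothetical.

```python
# ASSUMPTION: LDL (mg/dl) at which drug therapy is recommended under the original
# ATP III cutpoints, by risk group; verify against the guideline before use.
STANDARD_DRUG_LDL = {"moderate": 160, "moderately high": 130, "high": 130}


def treatment_change(ldl, group_original, group_point, thresholds=STANDARD_DRUG_LDL):
    """'more' or 'less' if the point-based group flips the drug recommendation, else None."""
    drug_original = ldl >= thresholds[group_original]
    drug_point = ldl >= thresholds[group_point]
    if drug_point and not drug_original:
        return "more"        # point-based group would start drug therapy
    if drug_original and not drug_point:
        return "less"        # point-based group would withhold drug therapy
    return None


print(treatment_change(140, "moderate", "moderately high"))   # more
print(treatment_change(140, "high", "moderately high"))       # None
```

Under these assumed cutpoints, moving between “high” and “moderately high” never changes the drug recommendation; only misclassification across the 10% boundary can, and only for LDL levels between the two thresholds.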

All analyses were conducted using Stata 10.1 (StataCorp, College Station, TX) and were adjusted for subject weights and clustering effects using standard methods recommended by NHANES to make our results nationally representative.
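The weighted point estimates reported below (for example, a percentage and the corresponding number of US adults) come from summing sampling weights. The sketch shows only that step, with hypothetical inputs. Dividing each 2-year examination weight by the number of combined survey cycles reflects our understanding of NHANES guidance for pooling cycles and is an assumption here. Standard errors and confidence intervals additionally require the NHANES strata and primary sampling units (for example, Taylor-series linearization in Stata’s `svy` commands) and are not computed.

```python
def weighted_share_and_count(flags, weights, n_cycles=1):
    """Survey-weighted share of subjects with flag True and the implied population count."""
    if len(flags) != len(weights):
        raise ValueError("flags and weights must have the same length")
    adj = [w / n_cycles for w in weights]          # ASSUMPTION: rescale for pooled cycles
    total = sum(adj)                               # estimated population size
    hit = sum(w for f, w in zip(flags, adj) if f)  # estimated number with the flag
    return hit / total, hit


# Hypothetical: 4 subjects, flag = misclassified, weights pooled over 2 cycles
share, count = weighted_share_and_count([True, False, True, False],
                                        [1e6, 2e6, 3e6, 4e6], n_cycles=2)
print(round(share, 2), round(count))   # 0.4 2000000
```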

RESULTS

Among 11,967 subjects aged 20–79 years, 1,898 had known CHD or risk equivalents, 7,080 had 0 or 1 CHD risk factors, and 446 were taking a lipid-lowering drug. The remaining 2,543 subjects, representing 39 million adults, formed our analytic cohort (Fig. 1). About one-third of our analytic cohort was female, the median age was 48 years, and three-quarters were white (Table 1). Half were cigarette smokers, 56% had hypertension, 30% had a family history of CHD, 46% had HDL cholesterol <40 mg/dl, and slightly under half had LDL cholesterol levels of 130 mg/dl or greater. As calculated by the original model, 71% had a “moderate” risk of a major coronary event in the next 10 years (including 48% with <5% risk and 23% with 5–9% risk), 22% were at “moderately high” risk, and 7% were at “high” risk (data not shown in Table).

Table 1 Characteristics of Sample

Figure 2 shows the ATP III risk groups to which patients would be assigned had their risk been calculated by the original model vs. the point-based model. Compared with the original model, the point-based system misclassified 15% of subjects (95% CI, 13%–16%) into different ATP III risk groups, corresponding to 5.7 million people. Kappa for agreement in risk group stratification was 0.69. Misclassification disproportionately shifted patients into higher risk groups (P < 0.001), with 10% (95% CI, 9%–12%; 3.9 million people) misclassified into higher risk groups and 5% (95% CI, 4%–6%; 1.8 million people) into lower risk groups. Most upward misclassification originated among patients in the lowest risk group. The largest source of downward misclassification was subjects placed in the “high risk” group by the original Framingham model, of whom 45% (1.2 million of 2.7 million) were misclassified as “moderately high risk” or “moderate risk” under the point-based system.

Figure 2

Classification of Subjects into Risk Groups by the Point-Based and Original Model. *Cells to the right of the diagonal represent the point-based system estimating a higher risk than the original model. Cells to the left of the diagonal represent the point-based system estimating a lower risk than the original model. Overall, 2,543 subjects contributed data toward this table.

Patterns of misclassification varied significantly by gender and age. Overall, 17% of men (95% CI, 15%–19%; 4.3 million) and 11% of women (95% CI, 9%–14%; 1.4 million) were misclassified (P = 0.003 for difference between genders). Among those misclassified, 64% of men (95% CI, 58%–70%) and 80% of women (95% CI, 72%–87%) were shifted into higher risk groups by the point-based system. Results also varied by age group. Misclassification affected 7% of people aged 20–44 years (95% CI, 5%–9%; 1.0 million), 17% of those aged 45–64 years (95% CI, 15%–20%; 3.2 million), and 27% of those aged 65–79 years (95% CI, 23%–30%; 1.5 million; P < 0.001 for difference between age groups). This variation in misclassification patterns may be partly attributable to underlying differences in CHD risk between age and sex groups.

Next, we compared point estimates of risk generated by the original and point-based models. On average, the point-based system generated higher risk estimates than the original model, by a mean of 0.6% (95% CI, 0.5%–0.8%; SD 3.3%) in men and 0.4% (95% CI, 0.2%–0.6%; SD 2.5%) in women (Table 2 and e-Appendix). Differences between the models were often substantial for individual patients, and their magnitude grew as risk increased. For example, the median absolute risk difference for men at “moderate” risk (as predicted by the original model) was 1.0% (interquartile range [IQR], 0.5%–1.8%), increasing to 3.6% (IQR, 1.8%–6.4%) for those in the “high risk” group. Differences of more than 5% between the original and point-based models were common at higher levels of risk, occurring in 26% of subjects (95% CI, 23%–30%) whose risk exceeded 10% as calculated by the original model (data not shown in Table).
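The summary statistics described above can be sketched for a set of paired estimates. This is an unweighted illustration on hypothetical data (the reported figures are survey-weighted); differences are expressed in percentage points, computed as point-based minus original, and the function name and default cutoffs are assumptions chosen to mirror the text.

```python
import statistics


def summarize_differences(original_pct, point_pct, high_cut=10.0, big_diff=5.0):
    """Unweighted summary of point-based minus original 10-year risk (percentage points)."""
    diffs = [p - o for o, p in zip(original_pct, point_pct)]
    abs_diffs = [abs(d) for d in diffs]
    q1, _, q3 = statistics.quantiles(abs_diffs, n=4)      # quartile cut points
    high = [a for o, a in zip(original_pct, abs_diffs) if o > high_cut]
    return {
        "mean_diff": statistics.fmean(diffs),             # signed: >0 means point-based higher
        "median_abs_diff": statistics.median(abs_diffs),
        "iqr_abs_diff": (q1, q3),
        "share_big_among_high": (sum(a > big_diff for a in high) / len(high)
                                 if high else float("nan")),
    }


# Hypothetical paired estimates (%) for five subjects
print(summarize_differences([2, 4, 12, 15, 25], [3, 4, 16, 22, 19]))
```

The signed mean can be small even when individual absolute differences are large, because upward and downward differences partly cancel; this is why the absolute-difference median and IQR are reported alongside it.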

Table 2 Differences in Estimated Risk by Original Framingham Model and Point-Based System, by Level of Risk

Figure 3 shows results from the perspective of point-based scores. For each level of point-based risk, the box-and-whisker plot shows the distribution of risk estimates calculated by the original model. For example, consider women with a point-based risk estimate of 8%. The upper border of the box corresponds to the 75th percentile of risk estimates calculated by the original model. Since the upper border of the box lies at 10% risk, approximately one-quarter of women with point-based scores of 8% had original-model risk estimates of 10% or above. As such, one-quarter of women with a point-based score of 8% would have been reclassified from the “moderate risk” category (<10% risk) to the “moderately high risk” category (10–20% risk) had the original model been used instead. Overall, for women and men with point-based scores above 6%, reclassification into different ATP III risk strata was common. In contrast, very few subjects with point-based scores of 6% or less would have been classified into different ATP III risk groups had the original model been used instead of the point-based version.

Figure 3


Finally, we evaluated the potential impact of risk group misclassification on guideline-recommended treatment decisions. Under standard cutpoints of the original ATP III guidelines, 25% of subjects (95% CI, 17%–36%) misclassified by the point-based model would have had different drug treatment strategies recommended as a result of misclassification, with 18% (95% CI, 11%–26%) recommended for more intensive treatment and 7% (95% CI, 4%–12%) recommended for less intensive treatment (P = 0.01 for direction of treatment effects). Using more aggressive optional targets published in a 2004 update to ATP III, 46% of subjects (95% CI, 37%–56%) misclassified by the point-based model would have had their drug treatment recommendations changed as a result of misclassification, including 39% (95% CI, 31%–48%) recommended for more intensive therapy and 7% (95% CI, 4%–13%) for less intensive therapy (P < 0.001 for direction of treatment effects).

DISCUSSION

In this nationally representative study, the original and point-based Framingham models produced clinically meaningful differences in estimated CHD risk for many individuals and stratified substantial numbers of patients into different risk groups established by ATP III guidelines. Overall, the point-based system classified 15% of eligible Americans (5.7 million people) into different risk groups than the original Framingham model. Misclassification predominantly shifted patients into higher risk groups, with 10% of adults (3.9 million) misclassified into higher risk groups and 5% (1.8 million) into lower risk groups, and had the potential to affect drug treatment recommendations in 25–46% of affected subjects not currently taking lipid-lowering therapy. Because our analyses excluded study subjects with incomplete data, patients on lipid-lowering therapy, and patients who in clinical practice may receive Framingham risk prediction outside formal guideline criteria, our results likely underestimate the number of Americans potentially affected by differences between the point-based and original Framingham models.6

These discrepancies represent an ongoing challenge within a history of impressive advances in cardiovascular risk assessment. Beginning with a sum-of-risk-factors approach in the first report of the National Cholesterol Education Program, successive advances in modeling have improved clinicians’ ability to predict, and thereby better prevent, cardiovascular events.15–18 As predictive models became more complex and impractical to calculate by hand, point-based versions became necessary to facilitate their regular use. This need may persist in settings where computer-based risk calculators are not readily available at the point of care, as an imperfect system of risk prediction may be preferable to none at all. However, now that desktop and handheld computers are routine in clinical practice, there is limited need for predictive models that can be calculated with pen and paper.

Nonetheless, the point-based system remains in widespread use, including in risk calculators on websites and personal digital assistants, and such tools often do not state which model they use. Thus, the misclassifications of risk that we observed are likely common in clinical practice and may have substantial clinical and policy implications. Of particular note, over two-thirds of misclassifications moved patients into higher risk groups. Because guidelines recommend more aggressive treatment strategies for patients in higher risk groups, this misclassification may drive increases in the use of lipid-lowering medications. This may have some benefit in reducing cardiovascular event rates, although at the risk of increasing adverse drug reactions, patients’ medication burden, and clinician time and resources.19 In addition, there is debate over the utility of expanding drug therapy beyond NCEP guidelines.3,20–30

Also concerning is potential undertreatment for the 1.8 million people whom the point-based system misclassifies into lower risk groups, particularly the nearly 50% of people (1.2 million) at high coronary risk whom the point-based system triages into lower risk categories. Failure to define and pursue aggressive LDL goals in such patients may compound the widespread undertreatment of persons at high cardiovascular risk.10

Unfortunately, there does not appear to be a simple “fix” to correct the misclassification that occurred under the point-based system. Patterns of misclassification were complex, varying by underlying CHD risk, sex, and age. The population-level implications of misclassification also vary among age and sex groups. For example, a substantial majority (80%) of the 1.4 million misclassified women were misclassified into higher risk groups, largely reflecting the fact that most women had a calculated risk under 10% by the original model, so their only available direction for risk group misclassification was into a higher risk group. In contrast, while misclassification also predominantly placed men into higher risk groups, a substantial minority (36%) were shifted into lower risk groups, leaving them susceptible to undertreatment. This was particularly notable for the 2.5 million men at high risk of future coronary events, almost half of whom were misclassified by the point-based model into a lower risk group.

ATP III guidelines acknowledge that the original model gives more precise estimates of risk than the point-based one, but note that use of the point-based system “provide[s] a result that is accurate for clinical purposes.”31 While the differences in predicted risk between the models are small for the majority of patients, there are substantial numbers of patients for whom the two models produce clinically meaningful differences in predicted risk. National guidelines would benefit from acknowledging the discrepancies between the two models and from educating and guiding clinicians about preferred methods of risk stratification. More importantly, current guidelines should strongly consider endorsing the original model as the preferred method of risk calculation and as the sole appropriate option for computer- or PDA-based risk calculators. In addition, patients and clinicians who made treatment decisions based on the point-based system should consider recalculating risk based on the original Framingham model and, where appropriate, adjusting treatment plans accordingly.

Our results should be interpreted in the context of known limitations of the original Framingham model and previous evaluations of the NCEP risk stratification algorithm.6,7,32 The original Framingham model has only moderate ability to distinguish between persons who will or will not have future coronary events (areas under the receiver operating characteristic [ROC] curve in validation studies are mostly in the range of 0.65 to 0.75).33–36 This model was also developed in a mostly white, middle-class population, and validation studies have revealed that it overestimates CHD risk in a number of other populations.33–36 Other research suggests that Framingham-based risk assessment should be expanded to patients with 0 or 1 risk factors.6,7 In addition, national guidelines from other countries use versions of the Framingham model in different, often more conservative ways to guide lipid management.37 In the US, it is well documented that many patients, particularly those at high coronary risk, have LDL levels above current guideline recommendations.10 Thus, our findings should be interpreted as one piece of a larger challenge of appropriately identifying individuals’ coronary risk profiles and increasing adherence to treatment strategies optimally tailored to those individuals’ risk.

The next generation of cholesterol guidelines (ATP IV) is expected to be released in the near future, and these new guidelines will likely predict risk using a new model of global cardiovascular risk prediction that incorporates a broader range of cardiovascular outcomes.35 However, score-sheet versions of this model have already been developed and, if applied to guidelines, may result in problems similar to those that we observed.35 Thus, when simplifying future models of cardiovascular or other forms of risk, it will be essential to account for the practical effects of simplification on algorithm-based management decisions and to disseminate these analyses in peer-reviewed publications to maximize transparency.38

There are several limitations to our study. Our estimates of how many subjects would be recommended for changes in lipid-lowering therapy based on misclassification are approximate due to limited sample sizes, absence of data on potential lifestyle interventions, and potential inaccuracies in self-reported use of lipid-lowering medications. In addition, we did not evaluate users of lipid-lowering drugs, so we do not know what impact the choice of Framingham model would have had before they initiated drug treatment. Finally, our study did not have access to actual cardiovascular outcomes, so we are unable to determine the accuracy of these models for predicting cardiovascular events. Nonetheless, the original Framingham model is the de facto gold standard for ATP III-based risk prediction, and mathematically it is very unlikely that a point-based system derived from the original model would be more accurate than the original model itself.

In summary, the point-based Framingham risk prediction tool misclassifies millions of Americans into different ATP III risk groups compared with the original Framingham model, with 25–46% of affected subjects experiencing potential impacts on drug treatment recommendations. Guidelines and their associated risk prediction tools should account for the clinically meaningful differences that can arise between original and point-based models and the impact that these differences can have on treatment decisions. This will support the goal of a clinically consistent, transparent, and standardized approach to cardiovascular risk assessment.