Introduction

Health outcomes assessment has advanced to a point where generic health-related quality of life (HRQL) measures are often used to examine the health status of populations and the effects of medical interventions. Generic HRQL profile measures, such as the SF-36 Health Survey [1], provide multiple health domain scores (e.g., physical function, mental health, pain, vitality), but not an overall index score. Preference-based health index scores provide a single summary score assessing overall health-related quality of life and are useful as an outcome measure in clinical studies, for estimating quality-adjusted life years for economic evaluations, and for monitoring the health of populations. Preference-based HRQL instruments provide information on the value of different health states and can be used to estimate health outcomes for cost-effectiveness analyses [2, 3].

Several preference-based indexes have been developed, including the EuroQoL EQ-5D [4], the health utilities index (HUI) [5, 6], and the Quality of Well-Being Scale–Self Administered [7]. In addition, a preference-based score, the SF-6D, has been estimated from SF-36 items [8, 9]. Although each of these health indexes provides valuations on a 0 (dead) to 1 (perfect health) scale, they differ in health state classification systems, methods for preference assessment, and scoring algorithms. US normative data for these measures have been developed based on the National Health Measurement Study [2] and other national surveys [10, 11].

Previous studies have derived preference-based scores from generic HRQL profile measures [8, 9, 12, 13]. Lawrence and Fleishman [14] and Sullivan and Ghushchyan [15] discussed mapping the EQ-5D index from the SF-12 using nationally representative samples. Gray et al. [16] used regression analysis to explore the association between responses to the SF-12 and responses to each EQ-5D question, and found that both the US-based data (from the 2000 Medical Expenditure Panel Survey [MEPS]) and the UK-based data (from the 1996 Health Survey for England [HSE]) had similar demographic characteristics, as well as physical component summary (PCS) and mental component summary (MCS) scores, though HSE had higher mean EQ-5D index scores. Comparisons of EQ-5D index scores derived from the US and UK algorithms found that the US model predicted higher scores than the UK model for almost all EQ-5D health states, while the US model resulted in smaller gains in health preferences than the UK model for the large majority of simulated transitions between EQ-5D health states [17]. Mapping of the SF-12 to the EQ-5D index in a nationally representative US sample has also been conducted, though this was completed without the use of US population weights [18]. The use of mapping estimation methods allows patient preference scores to be derived from health status profiles based on the empirical relationship between these constructs, which is particularly useful when patient preference scores are unavailable.

Recently, the patient-reported outcomes measurement information system (PROMIS) project has developed several health domain item banks [19] and a short-form version of a global health questionnaire, a new generic health status measure based on the review of PROMIS item banks (Hays et al., submitted). Given that the PROMIS domain scores and global items are likely to see increased application in National Institutes of Health (NIH) and other studies, estimating health preference scores from the PROMIS measures will be useful for those studies where assessments of health preferences have not been included. These estimated health preference scores will be useful for economic analyses using data from studies that include the PROMIS domain and global short-form instruments. The objective of this study was to estimate health preference scores based on the EQ-5D index, using selected PROMIS domain scores and the summary scores from the PROMIS global short form. We also compared the estimated health preference scores to EQ-5D index scores from several US national surveys by age and gender groups [2, 10].

Methods

Study design

The PROMIS item banks were administered via web-based survey to a national internet panel maintained by Polimetrix (now YouGovPolimetrix; see http://www.polimetrix.com). The field test involved administering the item banks from five domains (i.e., pain, fatigue, physical functioning, social activities, emotional distress) to selected participants [19]. Some respondents were randomly assigned to complete different full item banks, that is, all the items within a defined domain-specific bank, such as physical function or fatigue. Other respondents were randomly assigned to block-form item samples consisting of sets of seven consecutive items from each of 14 subdomains in the five PROMIS health domains.

Study participants

The PROMIS sample was selected to be generally comparable to distributions of gender, age groups, race/ethnicity (white/African–American/Hispanic/other), and education (high school or less versus more than high school) based on the 2000 US census data (Liu et al., submitted). Study participants were identified from the Polimetrix internet panel and from selected clinical research centers. For the current study, the participants included subjects who completed the full item banks and those who completed the block-form item samples.

Wave 1 sample

Because of the number of item banks being tested in Wave 1, a complex data-collection strategy was employed. This strategy included two arms and a total sample size of 21,133. A total of 19,601 subjects were recruited by Polimetrix, with the remaining 1,532 subjects recruited by PROMIS research sites (Fig. 1). In the full-bank testing arm, 7,005 persons from the general population were administered two of the 14, 56-item, subdomain-specific PROMIS item banks. In the block testing arm, 14,128 individuals were administered randomly selected seven-item blocks measuring each of the 14 PROMIS-targeted subdomains. The PROMIS research sites and the Polimetrix sample included both community and clinical samples. The clinical samples included persons with heart disease (n = 1,156), cancer (n = 1,754), rheumatoid arthritis (n = 557), osteoarthritis (n = 918), psychiatric disorders (n = 1,193), COPD (n = 1,214), spinal cord injury (n = 531), and other conditions (n = 560).

Fig. 1 PROMIS Wave 1 sample

Measures

EQ-5D

The EQ-5D is a preference-based instrument designed to measure generic health status across five dimensions of health: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression, with three response levels (no problems, some problems, extreme problems) [4]. A unique EQ-5D health state is defined by combining one level from each of the five dimensions, and scores range from −0.109 to 1.0, with greater scores indicating better overall health. The calculation of the EQ-5D index scores was based on the valuation reported by Shaw et al. [13] that was derived from a large-scale survey of the US general population [10]. The EQ-5D also includes a single visual analogue scale (EQ-5D VAS) that was not used in this study.
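As an illustration of the classification system described above, five dimensions with three levels each yield 3^5 = 243 possible health states. The following minimal sketch enumerates them; the dimension labels and the function name `all_health_states` are illustrative choices, and no valuation weights are included because the scoring algorithm itself is not reproduced in this paper.

```python
from itertools import product

# Illustrative labels for the five EQ-5D dimensions (names are assumptions).
DIMENSIONS = ("mobility", "self_care", "usual_activities",
              "pain_discomfort", "anxiety_depression")
LEVELS = (1, 2, 3)  # 1 = no problems, 2 = some problems, 3 = extreme problems


def all_health_states():
    """Return every health state as a five-digit string such as '11223'."""
    return ["".join(str(level) for level in combo)
            for combo in product(LEVELS, repeat=len(DIMENSIONS))]


states = all_health_states()
print(len(states))            # 243
print(states[0], states[-1])  # 11111 33333
```

State 11111 (no problems on any dimension) corresponds to the top of the index range, and 33333 to the bottom.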

PROMIS global items

The PROMIS survey included ten global health items (Hays et al. submitted). One item was the general health question rating overall health on a poor-to-excellent scale. The remaining items covered quality of life, mental health, physical health (two items), pain, fatigue, social function (two items), and emotional distress. Based on these global items, Hays et al. (submitted) found evidence supporting two summary scores assessing physical and mental health. Mental health and physical health summary scores were developed from global items in factor and item response theory analyses conducted in the PROMIS Wave 1 sample. The PROMIS global items were administered to all participants in the Wave 1 sample (Fig. 1). The summary scores were calculated as sums of the relevant individual global items, and individual global item scores (untransformed) were included in the subsequent regression analyses.

Domain item banks

The PROMIS initial item banks were developed based on the published literature, clinician review, and qualitative research on patients with various health conditions (for more information, go to http://www.nihpromis.org). Existing domain-specific instruments were also reviewed for item content, and new items were developed for the PROMIS item banks [19, 20]. Content of the final set of physical function, fatigue, pain impact, anxiety, and depression items was revised based on the results of cognitive debriefing interviews [20]. For this study, we used the calibrated and available item banks measuring physical function, fatigue, pain impact, anxiety, and depression (www.nihpromis.org). The physical function item bank covered self-reported capability for upper extremity and lower extremity function. The fatigue bank was developed to cover both fatigue experience and impact. The pain impact domain included items on various impacts of pain on daily activities and function. The anxiety bank included various symptoms associated with anxiety, and the depression bank included items on depressed mood. Each item used a five- to six-level categorical response scale. The domain scores included in this analysis are T-scores derived from Theta scores from the item response theory calibrations. For the physical function domain scores, higher scores indicate better physical functioning. For the fatigue, pain impact, anxiety, and depression domain scores, higher scores indicate more severe impairment.
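The relationship between theta scores and T-scores can be sketched as follows, assuming the conventional T-score metric (mean 50 and SD 10 in the reference population). The exact rescaling applied in the PROMIS calibrations is not stated here, so the linear form and the function names below are assumptions for illustration only.

```python
def theta_to_t(theta: float) -> float:
    """Convert an IRT theta score (mean 0, SD 1) to a T-score.

    Assumes the conventional T-score metric: mean 50, SD 10.
    """
    return 50.0 + 10.0 * theta


def t_to_theta(t_score: float) -> float:
    """Inverse of theta_to_t under the same assumption."""
    return (t_score - 50.0) / 10.0


print(theta_to_t(0.0))   # 50.0
print(theta_to_t(-1.5))  # 35.0
```

Under this convention, a 10-point difference in a domain T-score corresponds to one standard deviation in the reference population.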

Other measures

Information on demographic characteristics was collected for the study participants (i.e., age, gender, race/ethnicity, education). Information was also collected on a number of chronic medical conditions in the Wave 1 sample. These chronic conditions were classified into groups of physical and mental health disorders.

Statistical analysis

A series of ordinary least squares (OLS) regression models were specified where EQ-5D index scores were predicted from different sets of PROMIS scores. First, three sets of regression models were performed using (1) all ten global items; (2) a subset of eight global items (reduced because of multicollinearity); and (3) a subset of eight global items (using alternative duplicative items). The Wave 1 analysis sample of 20,400 cases was separated into two randomly assigned split-half samples; the models were developed in the first sample and the analyses replicated in the second sample to confirm results. Second, we specified an OLS regression model using the PROMIS global item–based mental health and physical health summary scores to predict EQ-5D index scores in the block testing sample (n = 14,128). Finally, a regression analysis was performed including the T-scores for the PROMIS domain item banks for physical function, fatigue, pain impact, anxiety, and depression using the block design data. We selected these five domain banks because they (1) covered important patient-reported outcome constructs, including mental health and physical health; and (2) were calibrated and tested within the PROMIS project. Subjects were included in this analysis if they completed at least three items for each of the five relevant PROMIS domains and had an EQ-5D index score (n = 1,658). The domain scores included in this analysis are T-scores derived from Theta scores from the item response theory calibrations.
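The split-half procedure described above can be sketched in a few lines. This is not the analysis code used in the study: the data are synthetic stand-ins (random item responses on a 1 to 5 scale with made-up coefficients), and the helper names `fit_ols`, `predict`, and `adjusted_r2` are hypothetical. The sketch fits the same OLS model separately in two random halves and compares adjusted R-square and coefficients.

```python
import numpy as np


def fit_ols(X, y):
    """Return OLS coefficients (intercept first) for y regressed on X."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Predicted values from coefficients returned by fit_ols."""
    return coef[0] + X @ coef[1:]


def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R-square for a model with n_predictors slopes plus intercept."""
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Synthetic stand-in data: 8 items scored 1-5 and a made-up linear outcome.
rng = np.random.default_rng(0)
n, k = 2000, 8
X = rng.integers(1, 6, size=(n, k)).astype(float)
true_coef = rng.uniform(0.01, 0.04, size=k)
y = 0.2 + X @ true_coef + rng.normal(0.0, 0.05, size=n)

# Random split into two halves, then fit the same model in each half.
order = rng.permutation(n)
half_a, half_b = order[: n // 2], order[n // 2:]
coef_a = fit_ols(X[half_a], y[half_a])
coef_b = fit_ols(X[half_b], y[half_b])

r2_a = adjusted_r2(y[half_a], predict(coef_a, X[half_a]), k)
r2_b = adjusted_r2(y[half_b], predict(coef_b, X[half_b]), k)
print(f"adjusted R2: half A = {r2_a:.2f}, half B = {r2_b:.2f}")
print(f"largest coefficient difference = {np.max(np.abs(coef_a - coef_b)):.4f}")
```

Similar fit statistics and coefficients across the two halves suggest that the model is not tuned to idiosyncrasies of a single sample.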

We examined plots of residuals from the regression analyses and performed a Bland-Altman assessment of agreement [21] comparing the actual and predicted EQ-5D index scores. A range of agreement was defined as mean bias ± 2 standard deviation (SD) units. Intraclass correlation coefficients (ICCs) were calculated comparing actual and predicted EQ-5D scores.
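The agreement statistics described above can be illustrated with a minimal sketch. The limits of agreement follow the definition given here (mean bias ± 2 SD of the differences). The form of ICC used in the study is not stated, so the one-way random-effects ICC below is an assumption, and the function names and example values are hypothetical.

```python
import numpy as np


def bland_altman(actual, predicted, width=2.0):
    """Return (mean bias, lower limit, upper limit) of agreement.

    Limits are bias +/- width * SD of the paired differences.
    """
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - width * sd, bias + width * sd


def icc_oneway(actual, predicted):
    """One-way random-effects ICC(1,1) with two scores per subject.

    The choice of this ICC form is an assumption, not taken from the study.
    """
    scores = np.column_stack([actual, predicted]).astype(float)
    n, k = scores.shape
    subject_means = scores.mean(axis=1)
    grand_mean = scores.mean()
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((scores - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical actual and predicted index scores for six subjects.
actual = [0.84, 1.00, 0.71, 0.60, 0.83, 0.40]
predicted = [0.80, 0.93, 0.75, 0.66, 0.85, 0.52]

bias, lower, upper = bland_altman(actual, predicted)
print(f"bias = {bias:.3f}, limits of agreement = ({lower:.3f}, {upper:.3f})")
print(f"ICC = {icc_oneway(actual, predicted):.2f}")
```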

Estimated EQ-5D index scores were also compared with actual EQ-5D scores based on the PROMIS general population sample by gender and age groups and by type of chronic medical condition (n = 2,722; Liu et al. submitted). In addition, we compared the PROMIS estimated EQ-5D index scores to those reported in the Luo et al. [10] and Fryback et al. [2] studies by gender and age groups.

Results

Sample demographic characteristics

The PROMIS Wave 1 analysis sample consisted of 20,400 cases. A total of 733 cases were excluded; subjects with an average response time of less than 1 s per item (n = 573) and/or ten consecutive items with response time less than half a second (n = 192) were excluded from the analyses. The overall sample (n = 20,400) was 52% female. The sample mean age was ~53 years: 12% were 18–29 years; 12% were 30–39 years; 16% were 40–49 years; 32% were 50–64 years; and 28% were 65 years or older. The racial/ethnic breakdown was 80% white, 9% African–American, 9% Hispanic or Latino, and 2% other races (Asian/Pacific Islanders or Native Americans). Educational attainment ranged from less than high school (3%) to college or above (44%), with 39% reporting some college and 16% a high school diploma. The general population sample used in some analyses is a subset of the PROMIS Wave 1 sample. Sample characteristics have been found to be consistent with those of the 2000 US census data (Liu et al., submitted).

Regression analyses

To predict EQ-5D index scores, three regression models were run using the PROMIS data. Model 1 included the ten global items as predictors of the EQ-5D index score and had an adjusted R-square of 0.65 (Table 1). Possible multicollinearity among the ten global items was examined using collinearity diagnostics (tolerance, variance inflation, condition index, and proportion of variance). Some multicollinearity was detected, particularly among the general health and physical health items. These items were highly correlated (r = 0.90) and had low tolerance and high variance inflation factors. In Model 1, general health was a significant predictor (P = 0.0014), but physical health was not significant (P = 0.526). The social satisfaction item was not significant in Model 1 (P = 0.316), and was dropped from subsequent regression models.

Table 1 Results of regression analyses predicting EQ-5D index scores from PROMIS global items

Two additional models were run to assess the effect of including general health and physical health in separate models (Table 1). Model 2 included the general health item, and Model 3 included the physical health item. All global items included in Models 2 and 3 were significant predictors of EQ-5D index scores, and overall model statistics for both indicated good fit. General health (P < 0.0001) was a significant predictor in the absence of physical health (Model 2), while physical health (P = 0.0051) was significant when general health was not included (Model 3). All models accounted for ~65% of variability in EQ-5D index scores. The three models were repeated using the second split-half sample of Wave 1 data (R-square = 0.65). Results using the two separate samples were very similar, and there were no substantive differences.

The ICCs for each of the three models were 0.77. Review of residuals suggested reasonably good fit for all three models. For example, in Model 2 the mean residual was 0.0 (SE = 0.002), with 95% range limits of −0.17 and 0.14 based on the distribution of residuals. The other two models had similar results. The Bland–Altman analyses indicated that the 95% limits of agreement between the predicted and actual EQ-5D scores ranged from −0.20 to 0.20. The largest differences were observed at the upper extremes of EQ-5D scores (>0.90), and there was some evidence of overestimation in the lower range (<0.30).

A regression model including only the mental health and physical health summary scores based on the ten global items accounted for 57% of the variance in EQ-5D index scores. The unstandardized regression coefficients for both the mental health summary (b = 0.031, P < 0.0001) and physical health summary (b = 0.137, P < 0.0001) scores were significant in this model.

The regression model including PROMIS physical function, fatigue, pain impact, anxiety, and depression scores resulted in an adjusted R-square of 0.57. Domains were scored such that higher scores corresponded to higher levels of the attribute (e.g., better physical function, more fatigue). Results were in the expected directions for each domain. Regression coefficients for physical function (b = 0.0077, P < 0.0001), fatigue (b = −0.0021, P < 0.0001), pain impact (b = −0.0040, P < 0.0001), anxiety (b = −0.0023, P < 0.0001), and depression (b = −0.0022, P < 0.0001) were all statistically significant in the model.
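To make the size of these coefficients concrete, the sketch below computes the change in predicted EQ-5D index score for a given change in domain T-scores. The coefficients are those reported above; the intercept is not reported in the text, so only differences in predicted scores are computed, not absolute predictions. The function name `predicted_change` is hypothetical.

```python
# Unstandardized coefficients reported above (per 1-point change in T-score).
COEFFICIENTS = {
    "physical_function": 0.0077,
    "fatigue": -0.0021,
    "pain_impact": -0.0040,
    "anxiety": -0.0023,
    "depression": -0.0022,
}


def predicted_change(t_score_changes):
    """Change in predicted EQ-5D index for the given changes in T-scores.

    t_score_changes maps a domain name to its change in T-score points.
    The model intercept is not needed because only a difference is computed.
    """
    return sum(COEFFICIENTS[domain] * delta
               for domain, delta in t_score_changes.items())


# Effect of a 10-point change in each domain taken one at a time.
for domain in COEFFICIENTS:
    print(domain, round(predicted_change({domain: 10.0}), 3))
```

For example, holding the other domains constant, a 10-point higher physical function T-score corresponds to a predicted EQ-5D index about 0.077 higher, whereas a 10-point higher pain impact T-score corresponds to a predicted index about 0.040 lower.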

The ICC for the PROMIS domain model was 0.73. Review of residuals suggested reasonably good fit for this model. The mean residual was 0.0 (SE = 0.003), with 95% range limits of −0.20 and 0.15 based on the distribution of residuals. The Bland–Altman analysis indicated that the 95% limits of agreement between the predicted and actual EQ-5D scores ranged from −0.21 to 0.21. The largest differences were observed at lower EQ-5D scores (<0.40).

Comparison of predicted and actual EQ-5D index scores

In the PROMIS general population sample, there were few differences identified between actual and predicted EQ-5D index scores (Table 2). For the total sample, the actual mean EQ-5D score was 0.85 (SD = 0.16), compared with a predicted score of 0.85 (SD = 0.13). Differences in mean EQ-5D index scores by age and gender groups ranged from 0 to 0.02 points, with most (87%) deviations ≤ 0.01 points.

Table 2 Mean actual and predicted EQ-5D index scores by gender and age groups in PROMIS general population sample (n = 2,722)

EQ-5D index scores by medical conditions

Table 3 summarizes actual and predicted EQ-5D index scores by chronic conditions. As expected, subjects with both mental and physical conditions had the lowest EQ-5D preference scores (0.72–0.75). Subjects with no chronic conditions reported the best EQ-5D scores (0.92–0.94). Those with only physical or only mental conditions had preference scores between those of subjects with neither type and subjects with both types of chronic conditions.

Table 3 Mean actual and predicted EQ-5D index scores by disease classification in PROMIS general population sample (n = 2,722)

Comparison of PROMIS-predicted and reported EQ-5D index scores from other studies

We compared predicted EQ-5D index scores from the PROMIS general population sample to those for the gender and age groups reported in the Luo et al. [10] study (Table 4). In general, the preference scores were similar, although PROMIS sample females reported somewhat lower index scores and those aged 65 years and older reported higher index scores when compared with the gender and age groups in the Luo et al. [10] study.

Table 4 Comparing predicted PROMIS preference scores to Luo et al. [10]

The PROMIS-predicted EQ-5D index scores were also compared with the National Health Measurement Study (NHMS) EQ-5D data reported by Fryback et al. [2] (Table 5). Generally, the index scores were comparable for males (differences of 0.01 to 0.02 points) but varied more for females (differences of 0.01 to 0.08 points). The largest differences between the PROMIS and NHMS samples were for women aged 35–44 years (0.05 points) and 45–54 years (0.08 points). The preference scores for the older men and women were more comparable between the two samples.

Table 5 Comparing predicted PROMIS preference scores to Fryback et al. [2]

Discussion

We estimated EQ-5D index scores using the PROMIS global items and selected domain scores. Using different sets of the global items, we were able to account for 65% of the variance in preference scores. By comparison, about 57% of the variance in EQ-5D index scores was explained by the global item summary scores or selected PROMIS domain scores. These results are consistent with previous research in predicting health preference scores from HRQL profile measures [8, 14, 15, 18, 22]. For example, Lawrence and Fleishman [14] were able to explain 61–63% of the variance in EQ-5D index scores using SF-12 summary scores. Other researchers explained 58–63% of EQ-5D index scores using different HRQL measures [15, 18]. The availability of preference-based scores based on the PROMIS global items and domain scores enables potential application of these measures to population-based studies and economic evaluations. The main advantage of the PROMIS measures over other static health status measures is that the PROMIS domain item banks and scores allow flexibility in administration using either targeted short forms or computerized adaptive testing.

The estimated EQ-5D index scores based on PROMIS global items were comparable to those directly assessed using the EQ-5D in this sample. Based on the Bland-Altman and other analyses, there was evidence of some overestimation for EQ-5D scores under 0.40; however, the ICCs indicated good agreement (0.77). Differences between the predicted and actual index scores were between 0 and 0.02 points by gender and age groups. Most of the observed deviations were less than 0.01 points. These findings are encouraging and suggest that the predicted EQ-5D index scores may be applied to future studies. More importantly, the predicted EQ-5D index scores varied by presence of physical or mental conditions and were most impaired in those with both mental and physical conditions. The predicted EQ-5D scores based on the PROMIS domains were also comparable to the actual measured EQ-5D scores, and demonstrated similar levels of agreement to the PROMIS global items.

The general pattern of predicted EQ-5D index scores by gender and age groups seen in the PROMIS sample was comparable to those in other recent studies [2, 10]. There is a general decline in index scores by age, although the oldest age group (65–74 years) showed a small increase in preference scores compared with those aged 55–64 years. These findings are consistent with the observed EQ-5D index scores reported in Fryback et al. [2].

There were few differences between the PROMIS sample and the Luo et al. [10] study sample on preference scores. However, in the PROMIS sample women reported somewhat lower index scores, and those aged 65 years and older reported higher index scores compared with those in the Luo et al. [10] study. The Luo et al. [10] study used self-completion, as did the PROMIS study, and this may explain the comparability in mean scores. We found some differences by gender and age groups between the predicted EQ-5D index scores from the PROMIS sample and those in the NHMS [2]. The largest differences between the PROMIS and NHMS samples were for women aged 35–44 years and 45–54 years. The preference scores for the younger men and older men and women were comparable between the two samples. These observed differences may be due to different sampling strategies; the Fryback et al. [2] study over-sampled the elderly and ethnic minorities, while the PROMIS study attempted to recruit a representative national sample through an internet panel. Fryback et al. [2] weighted to account for this oversampling, but differences in response patterns and mode of administration (telephone interview vs. internet self-completion) may also have contributed to observed variability. Future research is needed to more carefully examine differences in the PROMIS-predicted EQ-5D index scores by ethnicity, gender, and age groups.

We recommend the PROMIS global item–based prediction equation as best for estimating EQ-5D scores if only one approach is considered. However, future application of these prediction equations depends on the incorporation of either the PROMIS global items or domain measures in future clinical and health services research studies. Given the flexibility of multi-domain short forms and computerized adaptive testing, the PROMIS domain item banks and domain scores may be very useful in clinical studies. The PROMIS global items have potential applications for large population-based and epidemiologic studies. The existing prediction equations allow flexibility to researchers depending on the PROMIS instruments included in their studies.

In general, if a researcher needs to include a preference-based health outcome measure in a study, the recommended approach is to include one of the direct (e.g., time trade-off, standard gamble) or indirect (e.g., EQ-5D, HUI) measures of health preferences. As we have demonstrated, it is possible to estimate a preference-based score using the PROMIS global items or domain scores when a preference-based instrument has not been included, for example, because of respondent burden or other issues. However, the researcher should recognize that this is a second-best approach and that primary data collection is recommended.

There are several limitations associated with these analyses. First, for analyses involving PROMIS global items, the ordinal nature of these measures may impact the coefficient estimation in the regression analyses. Second, the PROMIS data were all collected using a web-based survey, and there may be differences between the PROMIS sample and the US general population that may limit generalizability of these results. However, Liu et al. (submitted) found that the PROMIS sample was comparable in demographic characteristics and health status to samples from the US general population.

In summary, we predicted EQ-5D index scores based on the PROMIS global items and selected domain scores, and these predicted preference scores varied as expected by demographic characteristics and presence of mental or physical conditions in the PROMIS sample. The predicted index scores were generally comparable to other national samples by age and gender groups. Additional research is needed to further evaluate the validity of the predicted index scores and should also examine other possible approaches to mapping the PROMIS item banks, perhaps through item response theory analysis and the resultant theta scores or through health preference measures such as the EQ-5D, HUI, or direct utility measures. This study suggests that useful preference scores can be derived from the PROMIS measures, and these predicted EQ-5D index scores have applications in measuring the health of populations and estimating quality-adjusted life years for economic evaluations.