Original Article
A simple imputation algorithm reduced missing data in SF-12 health surveys

https://doi.org/10.1016/j.jclinepi.2004.06.005

Abstract

Objective

The SF-12 Health Survey is a 12-item questionnaire that yields two summary scores (physical and mental health). Neither score can be computed when an item is missing. We explored imputation methods for missing scores for this instrument.

Study design and setting

Using data from a population-based survey, we tested several ways of imputing simulated missing data.

Results

Among 1250 participants, 118 (9.4%) had at least one missing SF-12 item. Missing data were more common among women, older respondents, non-Swiss nationals, and health service users. Among the 1132 respondents with complete data, replacement of any item with the mean population item weight yielded good results: the mean correlation between imputed and true score was 0.979 for both the physical and mental score. Results remained satisfactory when up to three of the six key items for each score (items that contribute predominantly to a given score), and any number of non-key items, were replaced by the mean. Application of this imputation algorithm to the original survey reduced the proportion of missing scores to <1%. Respondents with incomplete surveys, hence imputed scores, had lower scores than respondents with complete data (physical score: 44.9 vs. 49.8, p < 0.001; mental score: 44.4 vs. 46.3, p = 0.064).

Conclusions

A simple imputation algorithm can substantially reduce the proportion of missing scores for the SF-12 health survey, and consequently reduce non-response bias.

Introduction

Valid research requires complete data. Missing values reduce statistical power, and more importantly, may cause selection bias. Many studies have documented differences between respondents and non-respondents [1], [2], [3], [4], [5], or between early and late respondents [5], [6], [7], in health research. However, even among study participants, substantial proportions of some variables may have missing values. Possible bias due to partial survey response has attracted only limited attention [7], [8], [9].

Incomplete data cause particular concern when multiple data elements are combined to form a single variable, such as a composite score, a clinical prediction rule, or a multi-item psychometric scale. Without imputation, a single missing item may cause the composite score to be missing. The more items are combined, the greater the problem: if the probability of a missing value were 2% per item in a composite score, and if missing item probabilities were mutually independent, the probability of a missing score value would be 10% for a 5-item score, 18% for a 10-item score, and 33% for a 20-item score.
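The figures above follow from the complement rule: with a per-item missing probability p and k independent items, the score is missing with probability 1 - (1 - p)^k. A minimal sketch of that arithmetic is given below; the function name is ours, chosen for illustration only.

```python
def prob_score_missing(p_item: float, n_items: int) -> float:
    """Probability that at least one of n_items items is missing,
    assuming each item is missing independently with probability p_item."""
    return 1.0 - (1.0 - p_item) ** n_items


# Reproduce the three cases quoted in the text (2% missing per item).
for n in (5, 10, 20):
    pct = round(prob_score_missing(0.02, n) * 100)
    print(f"{n}-item score: about {pct}% missing")
```

The independence assumption is the one stated in the text; if missing items cluster within the same respondents, the proportion of missing scores would be lower than this formula suggests.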

Imputation rules alleviate this problem by substituting an acceptable value for the missing element, hence salvaging useful information available in non-missing items. When the composite score is essentially a sum of similar parts, a common imputation rule is “if up to half of the items are missing, replace missing values with mean of available items; if more than half are missing, declare score as missing.” In theory, replacement with the respondent's mean is appropriate only for strictly parallel tests, that is, when all items have the same mean, variance, and correlation with the underlying latent variable [10]. In practice, this rule is often used even when these conditions do not apply, and with good results as long as missing values are fairly rare. For instance, this imputation method is recommended for the SF-36 health survey, which consists of scales based on items that have the same number of response options, but not necessarily the same distribution [11].
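The "half rule" quoted above can be sketched as follows for a score that is a plain sum of items. This is an illustration under our own assumptions, not code from any scoring manual: missing answers are represented as None, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def half_rule_score(items: Sequence[Optional[float]]) -> Optional[float]:
    """Sum score with person-mean imputation.

    If at most half of the items are missing, each missing item is replaced
    by the mean of the respondent's answered items. If more than half are
    missing (or there are no items), the score is declared missing (None).
    """
    answered = [x for x in items if x is not None]
    n_missing = len(items) - len(answered)
    if not items or 2 * n_missing > len(items):
        return None
    mean_answered = sum(answered) / len(answered)
    return sum(answered) + n_missing * mean_answered


print(half_rule_score([1, 2, None, 3]))        # one of four items missing
print(half_rule_score([1, None, None, None]))  # more than half missing
```

Because the replacement value is the respondent's own mean, the rule implicitly treats all items as interchangeable, which is why it is strictly justified only for parallel tests.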

When the composite score is derived through a more complex formula, or when items are not identically distributed, replacement with the mean of available items will not work. Often, no imputation rule is available. This is the case for the mental and physical component summary scores (MCS and PCS) based on the SF-36 [12] and the SF-12 [13], [14] health surveys. Each summary score is a sum of 36 or 12 item weights. As no imputation rules exist, a single missing item value will cause missing values for both summary scores. As a result, the SF-12 summary scores have rates of missing data among respondents that are frequently around 10%, and sometimes exceed 25% [15], [16], [17], [18], [19], [20], [21]. Corresponding figures are necessarily higher for the SF-36 summary scores. This issue is of concern to anyone who considers using these popular instruments.

In this article, we examine selection bias due to incomplete responses to an SF-12 survey, explore several ways of imputing summary scores when one or more items have a missing value, and propose a simple and effective imputation algorithm.

Section snippets

Survey

A mail survey of residents of the French-speaking Swiss canton of Vaud was conducted in 1996 to produce local population norms for health status questionnaires [22]. Participants were selected at random from the official resident file, in strata defined by age (20–29 years to 70–79 years) and sex, of 200 persons each (total 2400). Two follow-up mailings were sent to non-respondents.

Questionnaire content

The questionnaire included the COOP Charts [22], the SF-36 health survey [11], including the 12 items needed for

Response rate

Of the initial sample, 2327 persons were eligible, and 1329 (57.1%) returned the questionnaire. A further 79 questionnaires were eliminated because of a mismatch on age or sex between questionnaire data and the original database, leaving 1250 (53.7%) questionnaires for the analysis.

Missing items

All SF-12 items were filled by 1132 (90.6%) respondents; 118 (9.4%) respondents failed to answer at least one item. Most (N = 66) omitted only one item, 16 respondents omitted two items, 13 omitted three items, 8 four

Discussion

This study confirmed that partial completion of the SF-12 health status questionnaire may cause bias, as women, foreigners, the elderly, and users of health services were less likely to answer all 12 questions. To correct this problem, we propose a simple imputation algorithm: replace each missing value with the mean population weight for up to three key items and any number of non-key items. This algorithm works well in most situations. Imputation revealed that respondents with incomplete data had markedly lower
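The imputation rule proposed in this discussion can be sketched in a few lines, treating a summary score as a sum of item weights as described in the introduction. Everything specific in the sketch is an assumption made for illustration: the item names, the set of key items, and the mean population weights are placeholders that a user would have to supply from the scoring documentation and from their own reference population, and any additive constant used by the official scoring is left out.

```python
from typing import Dict, Optional, Set

MAX_MISSING_KEY_ITEMS = 3  # threshold stated in the text


def impute_summary_score(
    item_weights: Dict[str, Optional[float]],  # None marks a missing item
    mean_weights: Dict[str, float],            # hypothetical population means
    key_items: Set[str],                       # hypothetical key items for this score
) -> Optional[float]:
    """Sum of item weights, with missing weights replaced by population means.

    Returns None when more than three key items are missing; any number of
    missing non-key items is tolerated.
    """
    missing = [name for name, w in item_weights.items() if w is None]
    n_missing_key = sum(1 for name in missing if name in key_items)
    if n_missing_key > MAX_MISSING_KEY_ITEMS:
        return None
    return sum(
        mean_weights[name] if w is None else w
        for name, w in item_weights.items()
    )


# Toy data only: 12 items with made-up weights, the first six flagged as key.
names = [f"item{i}" for i in range(1, 13)]
means = {name: 0.5 for name in names}
keys = set(names[:6])
answers: Dict[str, Optional[float]] = {name: 1.0 for name in names}
answers["item1"] = None  # one key item missing
answers["item7"] = None  # one non-key item missing
print(impute_summary_score(answers, means, keys))
```

The function would be called once per summary score, since the physical and mental scores use different weights and different key items.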

References (27)

  • E.W. Wolfe. Using logistic regression to detect item-level non-response bias in surveys. J Appl Meas (2003)
  • R.F. DeVellis. Scale development: theory and applications. Sage Publications (1991)
  • J.E. Ware. SF-36 Health Survey. Manual & interpretation guide (1993)

Cited by (64)

    • Health status and quality of life in patients with diabetes in Switzerland

      2019, Primary Care Diabetes
      Citation excerpt:

      Scores range from 0 (lowest level of health) to 100 (highest level of health) and were initially calibrated so that 50 is the average score or norm for the US general population, with a standard deviation equalized to 10 [12]. In Switzerland, the PCS and MCS scores were respectively 49.8 (SD 8.6) and 46.7 (SD 10.1) in a sample of Swiss residents in the canton of Vaud from a study performed to establish local population norms for health status questionnaires [13]. Diabetes-specific QoL was assessed with the third version of the Audit of Diabetes-Dependent Quality of Life (ADDQoL) [14,15], a validated and widely recommended instrument with good psychometric properties [16–18].

    • Performance of a Bayesian Approach for Imputing Missing Data on the SF-12 Health-Related Quality-of-Life Measure

      2018, Value in Health
      Citation excerpt:

      Then, four other models were applied, as tested by Perneger and Burnand [13]. The four other methods of imputing weights associated with PCS-12 or MCS-12 were 1) missing data could be replaced by 0 (zero model [ZM]); 2) missing data could be replaced by the mean weight in the population (mean weight model [MWM]); 3) missing data could be replaced by the mean weight predicted from age and education (weight from regression with age model [WRAM]); and 4) missing data could be replaced by the mean weight predicted from the sum of weights of the remaining items (weight from regression with weight model [WRWM]) [13]. The latter two were derived from a linear regression model.
