Regression calibration for logistic regression with multiple surrogates for one exposure

https://doi.org/10.1016/j.jspi.2006.01.009

Abstract

Methods have been developed by several authors to address the problem of bias in regression coefficients due to errors in exposure measurement. These approaches typically assume that there is one surrogate for each exposure. Occupational exposures are quite complex and are often described by characteristics of the workplace and the amount of time that one has worked in a particular area. In this setting, there are several surrogates which are used to define an individual's exposure. To analyze this type of data, regression calibration methodology is extended to adjust the estimates of exposure-response associations for the bias and additional uncertainty due to exposure measurement error from multiple surrogates. The health outcome is assumed to be binary and related to the quantitative measure of exposure by a logistic link function. The model for the conditional mean of the quantitative exposure measurement in relation to job characteristics is assumed to be linear. This approach is applied to a cross-sectional epidemiologic study of lung function in relation to metal working fluid exposure and the corresponding exposure assessment study with quantitative measurements from personal monitors. A simulation study investigates the performance of the proposed estimator for various values of the baseline prevalence of disease, exposure effect and measurement error variance. The efficiency of the proposed estimator relative to the one proposed by Carroll et al. [1995. Measurement Error in Nonlinear Models. Chapman & Hall, New York] is evaluated numerically for the motivating example. User-friendly and fully documented Splus and SAS routines implementing these methods are available (http://www.hsph.harvard.edu/faculty/spiegelman/multsurr.html).

Introduction

Many researchers have proposed methods to adjust for errors in exposure measurement (Rosner et al., 1989, 1990, 1992; Armstrong, 1990; Carroll and Stefanski, 1990; Thomas et al., 1993; Kuha, 1994; Carroll et al., 1995; Lee and Sepanski, 1995; Lyles and Kupper, 1997). Two related methods have been referred to in the literature as regression calibration. The first approach was proposed by Rosner et al. (1989, 1990) and involves using a surrogate for the true exposure when fitting the primary regression model, i.e., the one which includes the parameter of interest, β. Based on asymptotic theory, the estimated parameters from this surrogate model were shown to converge, approximately, to functions of the parameters of interest, from which corrections were derived. A second approach was proposed by Carroll and Stefanski (1990). This approach involves substituting the conditional expectation of the true exposure given the observed surrogate and other covariates into the primary regression model, and using bootstrap or sandwich methods to adjust the standard errors of the estimates. Thurston et al. (2003) showed that these two methods are identical under fairly general circumstances for main study/external validation study designs. In such designs, the main study includes data on the outcome of interest, surrogates of the exposure and confounding covariates. The external validation study includes data on the exposure, the surrogates and confounding covariates, but does not contain outcome data. The external validation study may be conducted independently of the main study, as long as the measurement error model parameters estimated in the validation study can be assumed to be “transportable” to the main study, where they are used to correct the results for measurement error. The approach proposed in this paper follows that of Rosner et al. (1989, 1990).

In Rosner et al.'s regression calibration for main study/external validation study designs, the point and interval estimates of association are first obtained by fitting a logistic regression model

$$\operatorname{logit}[\Pr(D_i=1)]=\alpha_0+W_i\alpha_1+Z_i\alpha_2, \tag{1}$$

where $W_i$ is a vector of $r$ surrogates for exposure $X_i$ for individual $i$ ($i=1,2,\ldots,n_1$) in the main study, $Z_i$ is a vector of the $s$ covariates measured without error and the vectors $\alpha_1$ and $\alpha_2$ represent uncorrected log odds ratios describing a one unit increase in the model covariates. The parameter vectors $\alpha_1=(\alpha_{11},\alpha_{12},\ldots,\alpha_{1r})$ and $\alpha_2=(\alpha_{21},\alpha_{22},\ldots,\alpha_{2s})$ correspond to the covariates measured with and without error, respectively.

The logistic regression coefficients from Eq. (1) can then be adjusted for bias due to measurement error in a one-step procedure to obtain estimates of $\beta$ from the model $\operatorname{logit}[\Pr(D_i=1)]=\beta_0+X_i\beta_1+Z_i\beta_2$. Rosner et al. (1989, 1990) proposed that the point and interval estimates of the log odds ratios can be corrected for measurement error using the formula $\hat\beta_{RC}=\hat\Gamma_{RC}^{-1}\hat\alpha$, where the $(r+s)$ vector $\hat\alpha=(\hat\alpha_1,\hat\alpha_2)$ is estimated from (1) in the main study and $\hat\beta_{RC}$ are the corrected logistic regression coefficients. The $(r+s)\times(r+s)$ matrix $\Gamma_{RC}$ is estimated in the validation study by fitting the linear regression model

$$X_i=\gamma_0+W_i\gamma_1+Z_i\gamma_2+\varepsilon_i, \tag{2}$$

where $X_i$ are the perfectly measured exposure variables for individual $i$ ($i=1,2,\ldots,n_2$),

$$\Gamma_{RC}=\begin{pmatrix}\gamma_1 & 0\\ \gamma_2 & I\end{pmatrix},\qquad \gamma_1=(\gamma_{11},\gamma_{12},\ldots,\gamma_{1r}),\qquad \gamma_2=(\gamma_{21},\gamma_{22},\ldots,\gamma_{2s}),$$

and $\varepsilon_i$ is a random error with mean 0 and variance $\sigma^2_{X|W,Z}$. The variance of $\hat\beta_{RC}$ was derived using the multivariate delta method. As shown by Kuha (1994), this method is valid if either the outcome of interest is rare and $\varepsilon_i$ is Gaussian, or the measurement error is not severe, that is, if $\beta_1^{\top}\Sigma_{X|W,Z}\beta_1$ is small. In addition, this method assumes that for each exposure ($X$), a single surrogate ($W$) is measured, that is, that the dimensions of $X$ and $W$ are equal. In internal validation studies, in which the outcome is measured along with the surrogates, exposure and confounding covariates, $\hat\beta_{RC}$ can be combined with the maximum likelihood estimator obtained from the internal validation study using an inverse-variance weighted summary estimator, as long as sampling into the validation study is conditionally independent of $X$ given $(W,Z)$ (Spiegelman et al., 2001).
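As a minimal sketch of the correction above, consider the simplest case of one exposure, one surrogate (r = 1) and s error-free covariates. The block-triangular form of Γ_RC then gives β1 = α1/γ1 and β2 = α2 − γ2β1, and the delta method gives an approximate variance for β1 when the main and validation studies are independent. The function names and all numerical inputs below are hypothetical and chosen only for illustration; they are not taken from the paper or from the authors' Splus/SAS routines.

```python
from math import sqrt


def rc_correct(alpha1, alpha2, gamma1, gamma2):
    """Solve alpha = Gamma * beta for Gamma = [[gamma1, 0], [gamma2, I]].

    alpha1, gamma1: scalars for the single surrogate.
    alpha2, gamma2: lists of length s for the error-free covariates.
    """
    beta1 = alpha1 / gamma1
    beta2 = [a - g * beta1 for a, g in zip(alpha2, gamma2)]
    return beta1, beta2


def rc_var_beta1(alpha1, var_alpha1, gamma1, var_gamma1):
    """Delta-method variance of beta1 = alpha1 / gamma1.

    Assumes alpha1-hat (main study) and gamma1-hat (validation study)
    are independent, as in an external validation design.
    """
    return var_alpha1 / gamma1**2 + alpha1**2 * var_gamma1 / gamma1**4


# Hypothetical estimates, for illustration only.
alpha1, alpha2 = 0.30, [0.20]   # uncorrected log odds ratios from model (1)
gamma1, gamma2 = 0.60, [0.10]   # slopes from the linear model (2)

beta1, beta2 = rc_correct(alpha1, alpha2, gamma1, gamma2)
se_beta1 = sqrt(rc_var_beta1(alpha1, 0.01, gamma1, 0.0025))
print(beta1, beta2, se_beta1)
```

Because γ1 is typically below 1 in absolute value when the surrogate is noisy, the corrected β1 is larger in magnitude than α1, and its standard error is inflated by the uncertainty in γ1.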

In many occupational studies, multiple factors serve as surrogates for a single exposure. Exposure is generally described by characteristics of the workplace and the amount of time one has worked in particular areas. Often an industrial hygienist will conduct a detailed exposure assessment using personal or area monitors, resulting in a more quantitative measure of exposure. Without these more accurate exposure measurements, the use of alternative exposure assessments can result in substantial misclassification, reducing or obscuring the exposure–effect relationship (Smith, 1987). In small studies of acute health outcomes, current exposure may be measured for each subject. More commonly, personal exposure is measured only on a subset of the subjects and these values are then used to estimate average exposure by job or exposure zone. In retrospective studies of chronic diseases, long-term exposures are generally estimated by a weighted sum of many of these averages. Typically, individual exposure levels are assessed by classifying workers as exposed/not exposed or by assigning each worker an average exposure for their particular work area. The average values are used directly in the model relating exposure to the health outcome without any adjustment for the fact that the exposure values are estimated. Health effects models are also fit using the job characteristics as surrogates for exposure.

To accommodate the structure of this type of study, a regression calibration approach is proposed which extends the methods of Rosner et al. (1989, 1990). The quantitative exposure measure is assumed to be related to the health outcome by a logistic model. The regression of the quantitative exposure on the characteristics of the workplace is assumed to be linear. Results of a simulation study to assess the properties of the proposed estimator with small sample size and under deviations from the rare disease assumption are presented. The asymptotic efficiency of the proposed estimator relative to the corresponding Carroll estimator is evaluated numerically in a broad region of the parameter space centered around the parameters of the motivating example. Finally, we apply the proposed approach to an epidemiologic study assessing the relationship between the exposure to metal working fluids and respiratory function in the presence of measurement error.

A motivating example

An epidemiologic study jointly sponsored by the United Automobile Workers Union and the General Motors Corporation evaluated the relationship between current exposure to metal working fluids (MWF) and respiratory function (Greaves et al., 1997). This study was part of a larger evaluation that included an assessment of past MWF exposure and cancer-related mortality (Eisen et al., 1992; Hallock et al., 1994; Tolbert et al., 1992; Woskie et al., 1994). In this study, automobile workers from three

Methods

Suppose that the true exposure ($X$) and the perfectly measured covariates ($Z$) are related to the probability of binary outcome ($D$) by the logistic function $\operatorname{logit}[\Pr(D=1)]=\beta_0+X\beta_1+Z\beta_2$, where $\beta_2=(\beta_{21},\beta_{22},\ldots,\beta_{2s})$. In addition, we assume that the linear regression model given in (2) is appropriate to relate the $r$ surrogates in $W$ and the $s$ covariates in $Z$ to the true exposure. Standard regression diagnostic methods can be used to check the validity of the linear regression model (Belsley et al., 1980).
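The following worked step is a sketch, under the rare disease and Gaussian error assumptions cited from Kuha (1994), of how the logistic model in X and the linear model for X given (W, Z) combine into a logistic model in the surrogates. It is offered as an illustration of the reasoning and is not necessarily the derivation used in the full paper.

```latex
\begin{align*}
\Pr(D=1 \mid W, Z)
  &\approx \mathrm{E}\!\left[\exp(\beta_0 + X\beta_1 + Z\beta_2) \mid W, Z\right] \\
  &= \exp\!\left\{\beta_0 + \mathrm{E}[X \mid W, Z]\,\beta_1
       + \tfrac{1}{2}\beta_1^2\sigma^2_{X|W,Z} + Z\beta_2\right\} \\
  &= \exp\!\left\{\left(\beta_0 + \gamma_0\beta_1
       + \tfrac{1}{2}\beta_1^2\sigma^2_{X|W,Z}\right)
       + W(\gamma_1\beta_1) + Z(\beta_2 + \gamma_2\beta_1)\right\},
\end{align*}
% Matching coefficients with the surrogate model
% logit[Pr(D=1)] = alpha_0 + W alpha_1 + Z alpha_2:
\begin{equation*}
\alpha_{1j} \approx \gamma_{1j}\,\beta_1 \quad (j = 1,\ldots,r),
\qquad
\alpha_2 \approx \beta_2 + \gamma_2\,\beta_1 .
\end{equation*}
```

With a single exposure and r surrogates, each ratio α1j/γ1j is therefore an approximate estimate of the same scalar β1, which is why a rule for combining the r ratios into one summary estimate is needed.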

Application

To illustrate the application of these methods, 1040 (n1) of the workers in Greaves et al.'s epidemiologic study (1997) and 83 (n2) of the workers in the validation study (Woskie et al., 1994) who were either working in an area with no direct exposure to MWF or were exposed to synthetic or straight MWF were analyzed. Of the 1811 workers with complete data from the two studies, 236 were in plant 3 and of those not in plant 3, 452 were exposed to soluble MWF, leaving a total of 1123 workers (1040

A simulation study

Simulation studies were performed to assess the properties of the proposed estimator in small samples and under violations of the rare disease assumption and/or of the small measurement error parameter approximation ($\beta_1^2\sigma^2_{X|W}$). The performance was summarized using percent relative bias $100\times\left(\sum_{b=1}^{2000}\hat\beta_{1b}/2000-\beta_1\right)/\beta_1$, where $\hat\beta_{1b}$ is the estimated $\beta_1$ from the $b$th simulated data set and by coverage probability, the percentage of the 2000 generated data sets for which the 95% confidence interval (CI)
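The two performance summaries named here can be sketched as follows. The sketch assumes Wald-type intervals (estimate ± 1.96 standard errors), which is an assumption on our part since the interval construction is not shown in this excerpt; the function names and the toy inputs are hypothetical.

```python
def percent_relative_bias(estimates, true_beta1):
    """100 * (mean of the simulated estimates - true value) / true value."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * (mean_estimate - true_beta1) / true_beta1


def coverage_probability(estimates, std_errors, true_beta1, z=1.96):
    """Percentage of simulated data sets whose Wald interval
    estimate +/- z * se contains the true value (assumed interval type)."""
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_beta1 <= est + z * se
    )
    return 100.0 * covered / len(estimates)


# Toy inputs standing in for the 2000 simulated data sets.
estimates = [0.45, 0.55, 0.62, 0.38]
std_errors = [0.05, 0.05, 0.05, 0.05]
true_beta1 = 0.5

print(percent_relative_bias(estimates, true_beta1))
print(coverage_probability(estimates, std_errors, true_beta1))
```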

Carroll et al. (1995) approach

The approach proposed by Carroll et al. (1995) to adjust for estimated exposure values in the health effects model involves solving a non-standard set of estimating equations derived from a Taylor series approximation of the mean and variance functions. The method simplifies considerably with homoscedastic variance and an assumption that the measurement error variance is small. Then, the validation data (X,W,Z) can be used to estimate the measurement error coefficients (γ) and estimate the

Discussion

In this paper, a regression calibration approach is proposed that provides a single estimate of the exposure effect on a health outcome by combining the parameter estimates for each surrogate (such as job area, type of work) using inverse variance weights which minimize the variance of the summary estimator. Each of these single estimates of the exposure effect is adjusted for exposure measurement error. This approach assumes a linear measurement error model and a logistic regression model for
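The inverse-variance weighting described in this paragraph can be illustrated with a short sketch. It assumes each surrogate j yields its own measurement-error-corrected estimate of β1 (here taken as the ratio α1j/γ1j) and, as a simplification, treats those estimates as uncorrelated. The weights in the full paper may account for correlation between the per-surrogate estimates, so this is not presented as the authors' estimator. All names and numbers are hypothetical.

```python
def inverse_variance_summary(estimates, variances):
    """Combine estimates of one parameter with weights 1/variance.

    Returns the weighted summary estimate and its variance, which is
    1 / sum(weights) when the input estimates are uncorrelated.
    """
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    summary = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    return summary, 1.0 / total_weight


# Hypothetical per-surrogate inputs, for illustration only.
alpha1 = [0.30, 0.12]   # uncorrected log odds ratios, one per surrogate
gamma1 = [0.60, 0.20]   # validation-study slopes, one per surrogate
per_surrogate_beta1 = [a / g for a, g in zip(alpha1, gamma1)]
per_surrogate_var = [0.04, 0.16]   # assumed variances of the ratios

beta1_summary, var_summary = inverse_variance_summary(
    per_surrogate_beta1, per_surrogate_var
)
print(beta1_summary, var_summary)
```

The less precisely estimated ratio receives the smaller weight, so the summary is pulled toward the more precise surrogate-specific estimate and has a variance no larger than that of either input.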

Acknowledgments

The authors thank the members of the Measurement Error Working Group at the Harvard School of Public Health and Ruifeng Li for their valuable comments and suggestions. This research was supported by the following grants: NIEHS ES09411-03, NIOSH RO1 OH03489, NIEHS RO1 ES007036 and NIEHS 2P30ES00002.

References (28)

  • S.W. Thurston et al. Equivalence of regression calibration methods in main study/external validation study designs. J. Statist. Planning Inference (2003)
  • B.G. Armstrong. The effects of measurement errors in relative risk regressions. Amer. J. Epidemiol. (1990)
  • S.F. Arnold. The Theory of Linear Models and Multivariate Analysis (1981)
  • D.A. Belsley et al. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity (1980)
  • R.J. Carroll et al. Approximate quasi-likelihood estimation in models with surrogate predictors. J. Amer. Statist. Assoc. (1990)
  • R.J. Carroll et al. Measurement Error in Nonlinear Models (1995)
  • R.J. Cohen et al. Comparison of partially measured latent traits across nominal subgroups. J. Amer. Statist. Assoc. (1999)
  • E.A. Eisen et al. Mortality studies of machining fluid exposure in the automobile industry. I. A standardized mortality ratio analysis. Amer. J. Ind. Med. (1992)
  • I.A. Greaves et al. Respiratory health of automobile workers exposed to metal-working fluid aerosols: respiratory symptoms. Amer. J. Ind. Med. (1997)
  • M.F. Hallock et al. Estimation of historical exposures to machining fluids in the automotive industry. Amer. J. Ind. Med. (1994)
  • Kaaks, R., Riboli, E., Esteve, J., Van Kappel, A., Van Staveren, W., 1994. Estimating the accuracy of dietary...
  • V. Kipnis et al. Implications of a new dietary measurement error model for estimation of relative risk: application to four calibration studies. Amer. J. Epidemiol. (1999)
  • J. Kuha. Corrections for exposure measurement error in logistic regression models with an application to nutritional data. Statist. Med. (1994)
  • E.L. Lehmann. Theory of Point Estimation (1983)