Regression calibration for logistic regression with multiple surrogates for one exposure
Introduction
Many researchers have proposed methods to adjust for errors in exposure measurement (Rosner et al., 1989, 1990, 1992; Armstrong, 1990; Carroll and Stefanski, 1990; Thomas et al., 1993; Kuha, 1994; Carroll et al., 1995; Lee and Sepanski, 1995; Lyles and Kupper, 1997). Two related methods have been referred to in the literature as regression calibration. The first approach was proposed by Rosner et al. (1989, 1990) and involves using a surrogate for the true exposure when fitting the primary regression model, i.e., the one that includes the parameter of interest. Based on asymptotic theory, the estimated parameters from this surrogate model were shown to converge, approximately, to functions of the parameters of interest, from which corrections were derived. A second approach was proposed by Carroll and Stefanski (1990). This approach involves substituting the conditional expectation of the true exposure, given the observed surrogate and other covariates, into the primary regression model, and using bootstrap or sandwich methods to adjust the standard errors of the estimates. Thurston et al. (2003) showed that these two methods are identical under fairly general circumstances for main study/external validation study designs. In such designs, the main study includes data on the outcome of interest, surrogates of the exposure and confounding covariates. The external validation study includes data on the exposure, the surrogates and confounding covariates, but does not contain outcome data. External validation studies may be conducted independently of the main study, as long as the measurement error model parameters that are estimated in the validation study and then used to correct results from the main study can be assumed to be “transportable” to the main study. The approach proposed in this paper follows that of Rosner et al. (1989, 1990).
In Rosner et al.'s regression calibration for main study/external validation study designs, the point and interval estimates of association are first obtained by fitting a logistic regression model

$$\operatorname{logit} \Pr(D_i = 1 \mid W_i, Z_i) = \beta_0^* + \beta_1^{*\prime} W_i + \beta_2^{*\prime} Z_i, \qquad (1)$$

where $D_i$ is the binary outcome, $W_i$ is a vector of $r$ surrogates for exposure for individual $i$ ($i = 1, \ldots, n_1$) in the main study, $Z_i$ is a vector of the $s$ covariates measured without error, and the vectors $\beta_1^*$ and $\beta_2^*$ represent uncorrected log odds ratios describing a one-unit increase in the model covariates. The parameter vectors $\beta_1^*$ and $\beta_2^*$ correspond to the covariates measured with and without error, respectively.
The logistic regression coefficients from Eq. (1) can then be adjusted for bias due to measurement error in a one-step procedure to obtain estimates of $\beta = (\beta_1', \beta_2')'$ from the model $\operatorname{logit} \Pr(D_i = 1 \mid X_i, Z_i) = \beta_0 + \beta_1' X_i + \beta_2' Z_i$. Rosner et al. (1989, 1990) proposed that the point and interval estimates of the log odds ratios can be corrected for measurement error using the formula

$$\hat{\beta}' = \hat{\beta}^{*\prime} \hat{\Gamma}^{-1},$$

where the vector $\hat{\beta}^* = (\hat{\beta}_1^{*\prime}, \hat{\beta}_2^{*\prime})'$ is estimated from (1) in the main study and $\hat{\beta}$ are the corrected logistic regression coefficients. The matrix $\Gamma = \begin{pmatrix} \Gamma_1 & \Gamma_2 \\ 0 & I_s \end{pmatrix}$ is estimated in the validation study by fitting the linear regression model

$$X_i = \gamma_0 + \Gamma_1 W_i + \Gamma_2 Z_i + \varepsilon_i, \qquad (2)$$

where $X_i$ are the perfectly measured exposure variables for individual $i$ and $\varepsilon_i$ is a random error with mean 0 and variance $\Sigma$. The variance of $\hat{\beta}$ was derived using the multivariate delta method. As shown by Kuha (1994), this method is valid if either the outcome of interest is rare and $\varepsilon_i$ is Gaussian, or the measurement error is not severe, that is, if $\beta_1' \Sigma \beta_1$ is small. In addition, this method assumes that for each exposure $X$ a single surrogate $W$ is measured, that is, that the dimensions of $X_i$ and $W_i$ are equal. In internal validation studies, where the outcome is measured as well as the surrogates, exposure and confounding covariates, $\hat{\beta}$ can be combined with the maximum likelihood estimator obtained from the internal validation study using an inverse-variance weighted summary estimator, as long as sampling into the validation study satisfies a conditional independence assumption (Spiegelman et al., 2001).
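The one-step correction can be illustrated in its simplest special case. The following is a minimal sketch, assuming a single exposure, a single surrogate and no additional covariates, so that the correction reduces to dividing the uncorrected log odds ratio by the slope of the measurement error model, with a delta-method variance that treats the main study and external validation study estimates as independent. The function names and the numerical inputs are hypothetical and are not taken from the paper.

```python
import math


def correct_log_odds_ratio(beta_star, var_beta_star, gamma, var_gamma):
    """Correct an uncorrected log odds ratio for measurement error.

    Scalar special case (one exposure, one surrogate, no covariates):
    beta = beta_star / gamma, with a first-order delta-method variance.
    Assumes beta_star (main study) and gamma (external validation study)
    are estimated from independent samples.
    """
    if gamma == 0:
        raise ValueError("the measurement error slope must be non-zero")
    beta = beta_star / gamma
    var_beta = var_beta_star / gamma ** 2 + beta_star ** 2 * var_gamma / gamma ** 4
    return beta, var_beta


def wald_interval(beta, var_beta, z=1.96):
    """Return a Wald confidence interval on the log odds ratio scale."""
    half_width = z * math.sqrt(var_beta)
    return beta - half_width, beta + half_width


# Illustrative numbers only (not from the study).
beta, var_beta = correct_log_odds_ratio(
    beta_star=0.30, var_beta_star=0.01, gamma=0.6, var_gamma=0.0025
)
lower, upper = wald_interval(beta, var_beta)
print(beta, var_beta, lower, upper)
```

The second term of the variance carries the uncertainty in the validation study estimate, so the corrected interval is wider than one obtained by simply rescaling the uncorrected interval.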
In many occupational studies, multiple factors serve as surrogates for a single exposure. Exposure is generally described by characteristics of the workplace and the amount of time one has worked in particular areas. Often an industrial hygienist will conduct a detailed exposure assessment using personal or area monitors, resulting in a more quantitative measure of exposure. Without these more accurate exposure measurements, the use of alternative exposure assessments can result in substantial misclassification, reducing or obscuring the exposure–effect relationship (Smith, 1987). In small studies of acute health outcomes, current exposure may be measured for each subject. More commonly, personal exposure is measured only on a subset of the subjects and these values are then used to estimate average exposure by job or exposure zone. In retrospective studies of chronic diseases, long-term exposures are generally estimated by a weighted sum of many of these averages. Typically, individual exposure levels are assessed by classifying workers as exposed/not exposed or by assigning each worker an average exposure for their particular work area. The average values are used directly in the model relating exposure to the health outcome without any adjustment for the fact that the exposure values are estimated. Health effects models are also fit using the job characteristics as surrogates for exposure.
To accommodate the structure of this type of study, a regression calibration approach is proposed which extends the methods of Rosner et al. (1989, 1990). The quantitative exposure measure is assumed to be related to the health outcome by a logistic model. The regression of the quantitative exposure on the characteristics of the workplace is assumed to be linear. Results of a simulation study to assess the properties of the proposed estimator with small sample sizes and under deviations from the rare disease assumption are presented. The asymptotic efficiency of the proposed estimator relative to the corresponding Carroll estimator is evaluated numerically in a broad region of the parameter space centered around the parameters of the motivating example. Finally, we apply the proposed approach to an epidemiologic study assessing the relationship between exposure to metal working fluids and respiratory function in the presence of measurement error.
A motivating example
An epidemiologic study jointly sponsored by the United Automobile Workers Union and the General Motors Corporation evaluated the relationship between current exposure to metal working fluids (MWF) and respiratory function (Greaves et al., 1997). This study was part of a larger evaluation that included an assessment of past MWF exposure and cancer-related mortality (Eisen et al., 1992; Hallock et al., 1994; Tolbert et al., 1992; Woskie et al., 1994). In this study, automobile workers from three
Methods
Suppose that the true exposure $X_i$ and the perfectly measured covariates $Z_i$ are related to the probability of the binary outcome $D_i$ by the logistic function $\Pr(D_i = 1 \mid X_i, Z_i) = H(\beta_0 + \beta_1 X_i + \beta_2' Z_i)$, where $H(u) = 1/(1 + e^{-u})$. In addition, we assume that the linear regression model given in (2) is appropriate to relate the $r$ surrogates of $W_i$ and the $s$ covariates of $Z_i$ to the true exposure. Standard regression diagnostic methods can be used to check the validity of the linear regression model (Belsley et al., 1980).
Application
To illustrate the application of these methods, data were analyzed on 1040 ($n_1$) of the workers in the epidemiologic study of Greaves et al. (1997) and 83 ($n_2$) of the workers in the validation study (Woskie et al., 1994) who either worked in an area with no direct exposure to MWF or were exposed to synthetic or straight MWF. Of the 1811 workers with complete data from the two studies, 236 were in plant 3 and, of those not in plant 3, 452 were exposed to soluble MWF, leaving a total of 1123 workers (1040
A simulation study
Simulation studies were performed to assess the properties of the proposed estimator in small samples and under violations of the rare disease assumption and/or the small measurement error approximation. The performance was summarized using percent relative bias, $100 \times (\hat{\beta} - \beta)/\beta$, where $\hat{\beta}$ is the estimated $\beta$ from the simulated data set, and by coverage probability, the percentage of the 2000 generated data sets for which the 95% confidence interval (CI)
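The two summary measures can be computed as in the following minimal sketch. It assumes that percent relative bias is reported from the mean of the estimates across simulated data sets and that coverage counts the intervals containing the true parameter value; both function names and the toy inputs are hypothetical, and the paper's exact summaries may differ in detail.

```python
def percent_relative_bias(estimates, true_beta):
    """Percent relative bias of the mean estimate across simulated data sets."""
    if true_beta == 0:
        raise ValueError("relative bias is undefined when the true value is 0")
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * (mean_estimate - true_beta) / true_beta


def coverage_probability(intervals, true_beta):
    """Percentage of confidence intervals (lower, upper) containing true_beta."""
    hits = sum(1 for lower, upper in intervals if lower <= true_beta <= upper)
    return 100.0 * hits / len(intervals)


# Toy inputs standing in for the output of a simulation run.
toy_estimates = [0.9, 1.1, 1.3]
toy_intervals = [(0.5, 1.5), (1.1, 1.9), (0.2, 1.0), (1.2, 2.0)]
print(percent_relative_bias(toy_estimates, true_beta=1.0))
print(coverage_probability(toy_intervals, true_beta=1.0))
```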
Carroll et al. (1995) approach
The approach proposed by Carroll et al. (1995) to adjust for estimated exposure values in the health effects model involves solving a non-standard set of estimating equations derived from a Taylor series approximation of the mean and variance functions. The method simplifies considerably with homoscedastic variance and an assumption that the measurement error variance is small. Then, the validation data can be used to estimate the measurement error coefficients and estimate the
Discussion
In this paper, a regression calibration approach is proposed that provides a single estimate of the exposure effect on a health outcome by combining the parameter estimates for each surrogate (such as job area, type of work) using inverse variance weights which minimize the variance of the summary estimator. Each of these single estimates of the exposure effect is adjusted for exposure measurement error. This approach assumes a linear measurement error model and a logistic regression model for
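The combination step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it takes the per-surrogate corrected estimates and their covariance matrix as given and computes the weights proportional to $\Sigma^{-1}\mathbf{1}$, which minimize the variance of a weighted summary whose weights sum to one. When the covariance matrix is diagonal, these reduce to ordinary inverse-variance weights. The function names and numbers are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def combine_estimates(estimates, covariance):
    """Minimum-variance weighted summary of estimates of one parameter.

    Returns (summary, variance, weights), with weights proportional to
    covariance^{-1} 1 and summary variance 1 / (1' covariance^{-1} 1).
    """
    inv_cov_ones = solve_linear_system(covariance, [1.0] * len(estimates))
    total = sum(inv_cov_ones)
    weights = [v / total for v in inv_cov_ones]
    summary = sum(w * e for w, e in zip(weights, estimates))
    return summary, 1.0 / total, weights


# Two hypothetical per-surrogate estimates with uncorrelated errors.
summary, variance, weights = combine_estimates([0.4, 0.6], [[0.04, 0.0], [0.0, 0.01]])
print(summary, variance, weights)
```

Passing the full covariance matrix rather than only the variances matters when the per-surrogate estimates come from the same main study and validation study data, since they are then generally correlated.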
Acknowledgments
The authors thank the members of the Measurement Error Working Group at the Harvard School of Public Health and Ruifeng Li for their valuable comments and suggestions. This research was supported by the following grants: NIEHS ES09411-03, NIOSH RO1 OH03489, NIEHS RO1 ES007036 and NIEHS 2P30ES00002.
References (28)
- Thurston et al. (2003). Equivalence of regression calibration methods in main study/external validation study designs. J. Statist. Planning Inference.
- Armstrong (1990). The effects of measurement errors in relative risk regressions. Amer. J. Epidemiol.
- Arnold (1981). The Theory of Linear Models and Multivariate Analysis.
- Belsley et al. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity.
- Carroll and Stefanski (1990). Approximate quasi-likelihood estimation in models with surrogate predictors. J. Amer. Statist. Assoc.
- Carroll et al. (1995). Measurement Error in Nonlinear Models.
- et al. (1999). Comparison of partially measured latent traits across nominal subgroups. J. Amer. Statist. Assoc.
- Eisen et al. (1992). Mortality studies of machining fluid exposure in the automobile industry. I. A standardized mortality ratio analysis. Amer. J. Ind. Med.
- Greaves et al. (1997). Respiratory health of automobile workers exposed to metal-working fluid aerosols: respiratory symptoms. Amer. J. Ind. Med.
- Hallock et al. (1994). Estimation of historical exposures to machining fluids in the automotive industry. Amer. J. Ind. Med.