Original Article
Relative risks and confidence intervals were easily computed indirectly from multivariable logistic regression

https://doi.org/10.1016/j.jclinepi.2006.12.001

Abstract

Objective

To assess alternative statistical methods for estimating relative risks and their confidence intervals from multivariable binary regression when outcomes are common.

Study Design and Setting

We performed simulations on two hypothetical groups of patients in a single-center study, either randomized or cohort, and reanalyzed a published observational study. Outcomes of interest were the bias of relative risk estimates, coverage of 95% confidence intervals, and the Akaike information criterion.

Results

According to simulations, a commonly used method of computing confidence intervals for relative risk substantially overstates statistical significance in typical applications when outcomes are common. Generalized linear models other than logistic regression sometimes failed to converge, or produced estimated risks that exceeded 1.0. Conditional or marginal standardization using logistic regression and bootstrap resampling estimated risks within the [0,1] bounds and relative risks with appropriate confidence intervals.

Conclusion

Especially when outcomes are common, relative risks and confidence intervals are easily computed indirectly from multivariable logistic regression. Log-linear regression models, by contrast, are problematic when outcomes are common.

Introduction

Relative risk (RR) is a common measure of the effect of treatment or exposure on outcome in cohort studies. Estimating this simple ratio of the disease risk among the treated (or exposed) to that among the untreated (or unexposed), and an appropriate confidence interval, is a routine application of Mantel–Haenszel methods [1], provided the investigator needs to adjust for only one or two categorical factors. More commonly, however, the study calls for simultaneous adjustment of several factors, some of which are continuous, via multivariable regression modeling. As described in texts on biomedical statistics [2], logistic regression for binary outcome data produces an adjusted odds ratio (OR), not a relative risk. Although the OR has attractive mathematical properties, clinicians rarely think in terms of odds of disease or the OR as a measure of effect [3], [4]. If the outcome event is rare (risk under 10%) and the OR is small, the OR approximates the relative risk. But with more common outcomes, the OR is well known to be more extreme (farther from 1.0) than the relative risk for the same data [5], [6]. The controversy generated by the report of Schulman et al. on the effects of race and sex on physician referrals exemplifies this distortion [7], [8].
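To make the divergence between the OR and the relative risk concrete, the following minimal sketch computes the OR implied by a fixed relative risk of 2.0 at several baseline risks. The function name and the baseline risks are illustrative choices, not values taken from the article.

```python
def odds_ratio(p0, rr):
    """Odds ratio implied by baseline risk p0 and relative risk rr."""
    p1 = p0 * rr  # risk among the exposed
    if not (0.0 < p0 < 1.0 and 0.0 < p1 < 1.0):
        raise ValueError("both risks must lie strictly between 0 and 1")
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))

# Illustrative baseline risks; the relative risk is held at 2.0 throughout.
for p0 in (0.01, 0.10, 0.30):
    print(f"p0={p0:.2f}  RR=2.00  OR={odds_ratio(p0, 2.0):.2f}")
```

At a 1% baseline risk the OR stays close to 2.0, whereas at a 30% baseline risk the same relative risk corresponds to an OR of 3.5.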

Some authors [9] have converted ORs to relative risks by the simple relationship RR = OR/([1 − p0] + [p0 × OR]), where the OR came from the estimate of the logistic regression model, while the value of the baseline risk (p0) was estimated as the unadjusted risk in the reference group (in that case a hospital). The authors estimated the upper and lower bounds for the confidence interval by substituting for OR the upper and lower confidence bounds for the OR from the logistic regression. This method of estimating a confidence interval, known as the “method of substitution”, has been applied to other measures of association [10]. Subsequent criticism suggested, however, that the proposed confidence interval for relative risk would be too narrow because of its failure to account for variability in the baseline risk [11], [12]. Others arrived at the same conclusion independently in similar contexts [13].
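The conversion and the method of substitution described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the OR, its confidence bounds, and the baseline risk are made-up inputs rather than results from any study.

```python
def or_to_rr(odds_ratio, p0):
    """Conversion described above: RR = OR / ((1 - p0) + p0 * OR)."""
    return odds_ratio / ((1.0 - p0) + p0 * odds_ratio)

def substitution_ci(or_lower, or_upper, p0):
    """Method of substitution: convert each OR bound, holding p0 fixed."""
    return or_to_rr(or_lower, p0), or_to_rr(or_upper, p0)

# Hypothetical inputs, chosen for illustration only.
p0 = 0.30
or_hat, or_lo, or_hi = 3.5, 2.4, 5.1

rr_hat = or_to_rr(or_hat, p0)
rr_lo, rr_hi = substitution_ci(or_lo, or_hi, p0)
print(f"RR = {rr_hat:.2f} (substitution interval {rr_lo:.2f} to {rr_hi:.2f})")
```

Note that `substitution_ci` treats `p0` as a known constant. That is the feature the cited criticism identifies: sampling variability in the baseline risk never enters the interval.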

Estimating relative risk is also possible by means of alternative generalized linear models.

One proposed option, the log-binomial model, replaces the logit link in logistic regression with a log link but maintains the specification of a binomial distribution [12], [14]. Although this functional form estimates relative risk directly by simple exponentiation of the regression coefficient for the exposure of interest, the log link permits estimates of risk within the broader bounds [0,∞), whereas probabilities must fall within the bounds [0,1]. Because of this mismatch between the bounds of the model and the allowable outcome, Wacholder [15] proposed constraining the fitting algorithm to respect the [0,1] bounds. His algorithm has been incorporated into the Stata statistical package in the function “binreg” (Stata Corp., College Station, TX, 2001). Owing to the known problem of convergence with the log-binomial model, several authors [16], [17], [18] recently proposed Poisson regression, a generalized linear model with a log link and a Poisson distribution, together with the sandwich variance estimator, to produce confidence intervals with correct coverage. The issue of expected risks exceeding the [0,1] bounds remains, however. Greenland recently reviewed these articles in the broader context of the literature on standardization of estimates [19].
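The bounds problem is easy to see numerically. In the sketch below, the coefficients are hypothetical (a multiplier of 2.0 for exposure and 1.5 per unit of a continuous covariate, with a baseline of 0.3), and the two helper functions are illustrative names rather than part of any package.

```python
import math

def risk_log_link(intercept, coefs, x):
    """Predicted risk under a log link: exp(linear predictor)."""
    return math.exp(intercept + sum(c * v for c, v in zip(coefs, x)))

def risk_logit_link(intercept, coefs, x):
    """Predicted risk under a logit link: always strictly inside (0, 1)."""
    eta = intercept + sum(c * v for c, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients on the linear-predictor scale.
coefs = [math.log(2.0), math.log(1.5)]
patient = [1.0, 2.0]  # exposed, covariate two units above the reference

log_risk = risk_log_link(math.log(0.3), coefs, patient)
logit_risk = risk_logit_link(math.log(0.3 / 0.7), coefs, patient)
print(f"log link: {log_risk:.3f}   logit link: {logit_risk:.3f}")
```

Under the log link the multipliers compound to 0.3 × 2.0 × 1.5² = 1.35, an impossible risk, while the logit link maps the same kind of linear predictor back into the unit interval.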

In spite of the shortcomings of simplistic methods, they continue to appear in the literature. As of October 2005, almost 200 articles have cited and used the method of substitution outlined in 1998 by Zhang and Yu [9]. These applications, including those published in major medical journals [20], [21], involved common outcomes. The log-linear model with sandwich variance estimates outlined by Zou in 2004 has also begun to gain use, again in leading medical journals [22] for applications with common outcomes. Nevertheless, both methods, as we shall point out, suffer from theoretical as well as methodological problems.

We first demonstrate that confidence intervals generated from logistic regression and the method of substitution, at least as promulgated by Zhang and Yu, exhibit poor coverage for the intended applications of common outcomes. Then we explain why the log-binomial and the log-linear (Poisson) model options might also fail when outcomes are common. Finally, we build upon methodological literature on conditional and marginal standardization to demonstrate several options for using logistic regression to estimate relative risk.
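As a rough illustration of marginal standardization with bootstrap resampling, the sketch below fits a logistic model, averages the predicted risks with every patient set to exposed and then to unexposed, and takes the ratio. Everything here is an assumption made for illustration: the simulated cohort, its coefficients, the single covariate, the hand-written Newton-Raphson fitter, and the approximate percentile interval from 100 resamples. It is not the authors' code.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_logistic(X, y, max_iter=25, tol=1e-8):
    """Logistic regression coefficients by Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = expit(sum(b * v for b, v in zip(beta, xi)))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def marginal_rr(X, beta, exposure_col=1):
    """Ratio of mean predicted risks: all exposed vs. all unexposed."""
    def mean_risk(level):
        risks = []
        for xi in X:
            row = list(xi)
            row[exposure_col] = level
            risks.append(expit(sum(b * v for b, v in zip(beta, row))))
        return sum(risks) / len(risks)
    return mean_risk(1.0) / mean_risk(0.0)

# Hypothetical cohort: columns are intercept, exposure, one covariate.
rng = random.Random(2024)
X, y = [], []
for _ in range(300):
    exposed = float(rng.random() < 0.5)
    covariate = rng.gauss(0.0, 1.0)
    true_risk = expit(-1.0 + 1.0 * exposed + 0.5 * covariate)
    X.append([1.0, exposed, covariate])
    y.append(1.0 if rng.random() < true_risk else 0.0)

rr_hat = marginal_rr(X, fit_logistic(X, y))

# Approximate percentile bootstrap: resample patients, refit, restandardize.
boot = []
for _ in range(100):
    idx = [rng.randrange(len(X)) for _ in X]
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    boot.append(marginal_rr(Xb, fit_logistic(Xb, yb)))
boot.sort()
ci_lower, ci_upper = boot[2], boot[97]
print(f"marginal RR = {rr_hat:.2f} ({ci_lower:.2f} to {ci_upper:.2f})")
```

Because every predicted risk passes through the logistic function, both averaged risks stay inside the [0,1] bounds by construction. A real analysis would use far more bootstrap replicates than the 100 used here to keep the sketch fast.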

Section snippets

Simulations

We simulated data sets with known values for baseline risks (0.1, 0.2, and 0.3) and relative risk (1.25, 1.5, 1.75, and 2.0), and with 100, 500, and 1000 hypothetical patients split equally into two hypothetical groups, unexposed and exposed. Additional simulations assumed two unbalanced data sets: 100 unexposed and 400 exposed patients, or 100 exposed and 400 unexposed patients. The program next simulated the occurrence of disease at an expected rate among the unexposed (untreated) patients
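A data-generating step of this kind might look like the sketch below. Two points are assumptions rather than details confirmed by the excerpt: that the risk among the exposed is the baseline risk multiplied by the relative risk, and that every baseline risk was crossed with every relative risk and every sample-size design. The names are hypothetical.

```python
import itertools
import random

BASELINE_RISKS = (0.1, 0.2, 0.3)
RELATIVE_RISKS = (1.25, 1.5, 1.75, 2.0)
# (unexposed, exposed) group sizes: three balanced and two unbalanced designs.
GROUP_SIZES = ((50, 50), (250, 250), (500, 500), (100, 400), (400, 100))

def simulate_cohort(p0, rr, n_unexposed, n_exposed, rng):
    """Return (exposed, diseased) pairs, one per hypothetical patient."""
    patients = []
    # Assumption: risk among the exposed equals p0 * rr.
    for exposed, n, risk in ((0, n_unexposed, p0), (1, n_exposed, p0 * rr)):
        for _ in range(n):
            patients.append((exposed, int(rng.random() < risk)))
    return patients

# Assumption: a full cross of risks, relative risks, and designs.
scenarios = list(itertools.product(BASELINE_RISKS, RELATIVE_RISKS, GROUP_SIZES))

rng = random.Random(1)  # fixed seed so the sketch is reproducible
cohort = simulate_cohort(0.2, 1.5, 250, 250, rng)
events_unexposed = sum(d for e, d in cohort if e == 0)
events_exposed = sum(d for e, d in cohort if e == 1)
print(len(scenarios), len(cohort), events_unexposed, events_exposed)
```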

Method of substitution

In our simulations, the method of substitution advocated by Zhang and Yu generally produced inappropriately narrow 95% confidence intervals for relative risk (Table 1). Even for a low baseline risk (0.2) and a modest relative risk (2.0), the confidence intervals intended to have 95% coverage actually produced less than 90% coverage. For a given relative risk, coverage of the confidence intervals worsened as the baseline risk (p0) increased. For a given baseline risk, coverage deteriorated with
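The undercoverage can be reproduced in miniature. The sketch below is not the authors' simulation program: it assumes a balanced cohort with no covariates, in which case the logistic regression OR reduces to the cross-product ratio of the 2×2 table and its Wald interval to the Woolf interval. The function names, seed, number of replicates, and the continuity correction for empty cells are all illustrative choices.

```python
import math
import random

def substitution_covers(p0, rr, n_per_group, rng, z=1.96):
    """Simulate one balanced cohort; report whether the interval holds rr."""
    a = sum(rng.random() < p0 * rr for _ in range(n_per_group))  # exposed events
    c = sum(rng.random() < p0 for _ in range(n_per_group))       # unexposed events
    b, d = n_per_group - a, n_per_group - c
    if 0 in (a, b, c, d):  # continuity correction for empty cells (assumption)
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error
    p0_hat = c / (c + d)  # unadjusted risk in the reference group

    def to_rr(odds_ratio):
        return odds_ratio / ((1 - p0_hat) + p0_hat * odds_ratio)

    lower = to_rr(math.exp(log_or - z * se))
    upper = to_rr(math.exp(log_or + z * se))
    return lower <= rr <= upper

def coverage(p0, rr, n_per_group, n_sims=2000, seed=7):
    """Fraction of simulated intervals that contain the true relative risk."""
    rng = random.Random(seed)
    hits = sum(substitution_covers(p0, rr, n_per_group, rng) for _ in range(n_sims))
    return hits / n_sims

cov = coverage(p0=0.3, rr=2.0, n_per_group=250)
print(f"coverage of nominal 95% interval: {cov:.3f}")
```

Calling `coverage` with a lower `p0` or a relative risk closer to 1.0 gives a quick way to explore how coverage changes with baseline risk and effect size.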

Discussion

Confidence intervals are essential to support estimates of relative risk from multivariable regression models [32], [33]. Our simulations demonstrate why the method of substitution outlined by Zhang and Yu [9] and finding common use in leading journals fails in the very situations for which it was designed—when baseline risk and relative risk are not small. Confidence intervals are too narrow and therefore precision of estimates is overstated. By contrast, confidence intervals based on either

Acknowledgment

Funding: Support was provided in part by an Agency for Healthcare Research and Quality (AHRQ), Centers for Education and Research on Therapeutics cooperative agreement (U18 HS10399), and by Agency for Healthcare Research and Quality, Grant No. R03 HS 11481-01.

Competing interests: Dr. Berlin is employed by Johnson & Johnson, which markets products for treatment of wounds. Johnson & Johnson has provided no input to or support for this study.

References (41)

  • J. Zhang et al. What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA (1998)
  • L.E. Daly. Confidence limits made easy: interval estimation using a substitution method. Am J Epidemiol (1998)
  • L.A. McNutt et al. Correcting the odds ratio in cohort studies of common outcomes. JAMA (1999)
  • L.A. McNutt et al. Estimating the relative risk in cohort studies and clinical trials of common outcomes. Am J Epidemiol (2003)
  • L.M. Bjerre et al. Expressing the magnitude of adverse effects in case–control studies: “the number of patients needed to be treated for one additional patient to be harmed”. BMJ (2000)
  • S. Wacholder. Binomial regression in GLIM: estimating risk ratios and risk differences. Am J Epidemiol (1986)
  • G. Zou. A modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol (2004)
  • R.E. Carter et al. Quasi-likelihood estimation for relative risk regression models. Biostatistics (2005)
  • D. Spiegelman et al. Easy SAS calculations for risk and prevalence ratios and differences. Am J Epidemiol (2005)
  • S. Greenland. Model-based estimation of relative risks and other epidemiologic measures in studies of common outcomes and in case–control studies. Am J Epidemiol (2004)