A semi-parametric generalization of the Cox proportional hazards regression model: Inference and applications

https://doi.org/10.1016/j.csda.2010.06.010

Abstract

The assumption of proportional hazards (PH) fundamental to the Cox PH model may not always hold in practice. In this paper, we propose a generalization of the Cox PH model in terms of the cumulative hazard function taking a form similar to the Cox PH model, with the extension that the baseline cumulative hazard function is raised to a power function. Our model allows for interaction between covariates and the baseline hazard and it also includes, for the two-sample problem, the case of two Weibull distributions and two extreme value distributions differing in both scale and shape parameters. The partial likelihood approach cannot be applied here to estimate the model parameters. Instead, we use the full likelihood approach via a cubic B-spline approximation for the baseline hazard. A semi-automatic procedure for knot selection based on Akaike’s information criterion is developed. We illustrate the applicability of our approach using real-life data.

Introduction

The modeling and analysis of data in which the principal endpoint is the time until an event occurs is often of prime interest in medical and engineering studies. Typically, such an event is the onset of a disease or death itself as seen in clinical trials or failure of an item or a system as seen in industrial life testing. The time to an event is normally referred to as survival or failure time.

The primary goal in analyzing censored survival data is to assess the dependence of survival time on covariates. The secondary goal is the estimation of the underlying distribution of survival time. The Cox Proportional Hazards (PH) model (Cox, 1972) is a standard tool for exploring the association of covariates with survival time. An interesting feature of this model is that it is semi-parametric in the sense that it can be factored into a parametric part consisting of a regression parameter vector associated with the covariates and a non-parametric part that can be left completely unspecified.

In the Cox PH model, given a vector of possibly time-dependent covariates $z$, the hazard function at time $t$ is assumed to be of the form $\lambda(t \mid z) = \lambda_0(t)\, g(z)$, where $\lambda_0(t)$ is the baseline hazard function, denoting the hazard under no covariate effect, and $g(z)$ is a non-negative function of the covariate vector $z$, referred to as the risk function, such that $g(0) = 1$. The most commonly used form of the Cox PH model is $\lambda(t \mid z) = \lambda_0(t)\, e^{\beta z}$, where $\beta = (\beta_1, \ldots, \beta_p)$ is a $p$-vector of regression coefficients. The focus is on inference for $\beta$, with the baseline hazard function $\lambda_0(t)$, the non-parametric part, left completely unspecified.
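
As a minimal sketch of what the exponential form implies, the following Python fragment evaluates the hazard for two covariate vectors and checks that their ratio does not depend on $t$. The Weibull baseline, the coefficient values and the function names are assumptions chosen purely for illustration; the Cox PH model itself leaves $\lambda_0(t)$ unspecified, and $\beta z$ is read here as an inner product.

```python
import math

def weibull_baseline_hazard(t, shape=1.5, scale=2.0):
    """Hypothetical baseline hazard lambda_0(t); an assumption for illustration only."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def cox_hazard(t, z, beta, baseline=weibull_baseline_hazard):
    """Hazard lambda(t|z) = lambda_0(t) * exp(beta . z)."""
    linear_predictor = sum(b * x for b, x in zip(beta, z))
    return baseline(t) * math.exp(linear_predictor)

beta = [0.7, -0.3]   # illustrative coefficients, not estimates from any data set
z_a = [1.0, 2.0]     # two covariate vectors differing in the first component
z_b = [0.0, 2.0]

# The baseline cancels in the ratio, so it is the same at every time point.
ratios = [cox_hazard(t, z_a, beta) / cox_hazard(t, z_b, beta) for t in (0.5, 1.0, 5.0)]
print(ratios)
```

Each ratio equals $e^{0.7}$ up to rounding, since the two covariate vectors differ by one unit in the first component only. This time-constant ratio is the proportionality property on which the model rests.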

In spite of its semi-parametric feature, the Cox PH model implicitly assumes that the hazard and survival curves corresponding to two different values of the covariates do not cross. Although this assumption may be valid in many experimental settings, it has been found to be suspect in others. For example, if the treatment effect decreases with time, then one might expect the hazard curves corresponding to the treatment and control groups to converge. Other examples that indicate the presence of non-proportional hazards are given in Gore et al. (1984) and Tonak et al. (1979), among others.

In this paper, we describe a semi-parametric generalization of the Cox PH model which allows crossing of hazards as well as survival functions. In Section 2, we discuss its unique properties and place it within the context of censored survival data analysis. In Section 3, we describe an estimation procedure for this model using cubic B-spline approximations for the baseline hazard. We illustrate our method with real-life examples in Section 4 and provide some concluding remarks.
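
The abstract describes the generalization only in words: a cumulative hazard of Cox form with the baseline cumulative hazard raised to a power. A rough sketch of how such a model can produce crossing survival curves is given below. The specific form $\Lambda(t \mid z) = e^{\beta z}\,\{\Lambda_0(t)\}^{e^{\gamma z}}$, the parameter name `gamma`, the unit-exponential baseline $\Lambda_0(t) = t$ and the numerical values are all assumptions made for this illustration and are not taken from the text above.

```python
import math

def cumulative_hazard(t, z, beta, gamma, baseline_cumhaz=lambda t: t):
    """Assumed form: Lambda(t|z) = exp(beta*z) * Lambda_0(t) ** exp(gamma*z)."""
    return math.exp(beta * z) * baseline_cumhaz(t) ** math.exp(gamma * z)

def survival(t, z, beta, gamma):
    """Survival function S(t|z) = exp(-Lambda(t|z))."""
    return math.exp(-cumulative_hazard(t, z, beta, gamma))

beta, gamma = 0.5, math.log(2.0)   # illustrative values only

# With Lambda_0(t) = t, the groups z = 0 and z = 1 have equal cumulative
# hazards where t = exp(beta) * t ** exp(gamma), which gives the crossing time.
t_cross = math.exp(beta / (1.0 - math.exp(gamma)))

diff_before = survival(0.5 * t_cross, 1, beta, gamma) - survival(0.5 * t_cross, 0, beta, gamma)
diff_after = survival(2.0 * t_cross, 1, beta, gamma) - survival(2.0 * t_cross, 0, beta, gamma)
print(t_cross, diff_before, diff_after)
```

Under these assumptions the group with $z = 1$ has the higher survival probability before `t_cross` and the lower one afterwards, and both groups follow Weibull distributions that differ in scale and shape. Setting `gamma` to zero recovers the proportional form $\Lambda(t \mid z) = e^{\beta z}\,\Lambda_0(t)$.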

Section snippets

A semi-parametric generalization of the Cox PH model

We describe a semi-parametric generalization of the Cox PH model in which the hazard functions corresponding to different values of the covariates can cross. A special case of this model was originally introduced by Quantin et al. (1996) for the purpose of goodness-of-fit testing of the Cox PH model. Devarajan (2000) outlined the unique properties of this non-proportional hazards regression model as well as inference for this model using maximum penalized likelihood estimation, and provided a

Estimation for the non-proportional hazards model

The observed data consist of independent observations on the triple $(X, \delta, z)$, where $X$ is the minimum of a failure and censoring time pair $(T, C)$, $\delta = I(T \le C)$ is the indicator of the event that a failure has been observed, and $z = (z_1, \ldots, z_p)$ is a $p$-vector of covariates. The random variables $T$ and $C$ denote the survival and censoring times, respectively, which are assumed to be independent.
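
To make the data structure concrete, the sketch below simulates right-censored observations of this form and evaluates a full log-likelihood, which under independent censoring is the sum of $\delta \log \lambda(X \mid z) - \Lambda(X \mid z)$ over observations. It is only an illustration under strong simplifying assumptions: a single binary covariate, exponential failure and censoring times, and a constant baseline hazard standing in for the cubic B-spline approximation. The function names and parameter values are hypothetical.

```python
import math
import random

def simulate_censored(n, beta=0.5, censor_rate=0.3, seed=0):
    """Draw n triples (X, delta, z) with X = min(T, C) and delta = I(T <= C)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.choice([0.0, 1.0])                # single binary covariate (assumption)
        T = rng.expovariate(math.exp(beta * z))   # failure time with hazard exp(beta*z)
        C = rng.expovariate(censor_rate)          # censoring time, independent of T
        data.append((min(T, C), 1 if T <= C else 0, z))
    return data

def log_likelihood(data, beta, base_rate=1.0):
    """Sum of delta*log(hazard) - cumulative hazard, for a constant baseline hazard."""
    total = 0.0
    for X, delta, z in data:
        rate = base_rate * math.exp(beta * z)
        total += delta * math.log(rate) - rate * X
    return total

data = simulate_censored(200)
n_events = sum(delta for _, delta, _ in data)
print(n_events, log_likelihood(data, beta=0.5))
```

Censored observations contribute only through the cumulative hazard term, whereas observed failures also contribute the log hazard at the failure time.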

The fundamental assumption of proportionality of hazards in the Cox PH model (1.2) requires that the hazards ratio

Illustration of our methods

We illustrate estimation in the non-proportional hazards model (2.2) using real-life examples. All figures presented were created using the R statistical language and environment (R Development Core Team (2009), www.R-project.org).

Acknowledgements

The authors would like to thank the Associate Editor and referee for providing valuable comments that helped improve the presentation of this paper. The work of the first author was supported in part by NIH grant P30 CA 06927 and an appropriation from the Commonwealth of Pennsylvania.

References (24)

  • L. Bordes et al. Sequential estimation for semiparametric models with application to the proportional hazards model. Journal of Statistical Planning and Inference (2006)
  • H.-D.I. Wu et al. Heterogeneity and varying effect in hazards regression. Journal of Statistical Planning and Inference (2009)
  • S.C. Cheng et al. Predicting survival probabilities with semiparametric transformation models. Journal of the American Statistical Association (1997)
  • D.R. Cox. Regression models and life tables (with discussion). Journal of the Royal Statistical Society. Series B (1972)
  • J. Cuzick et al. Analysis of trials with treatment-individual interactions
  • C. de Boor. A Practical Guide to Splines (2001)
  • K. Devarajan. Inference for a non-proportional hazards regression model and applications. Ph.D. Dissertation.... (2000)
  • K. Devarajan et al. Goodness-of-fit testing for the Cox proportional hazards model
  • K. Devarajan et al. Testing for covariate effect in the Cox proportional hazards regression model. Communications in Statistics—Theory and Methods (2009)
  • S. Durrelman et al. Flexible regression models with cubic splines. Statistics in Medicine (1989)
  • S.M. Gore et al. Regression models and non-proportional hazards in the analysis of breast cancer survival. Applied Statistics (1984)
  • R.J. Gray. Spline-based tests in survival analysis. Biometrics (1994)