Abstract
Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data.
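The three estimators named above can be illustrated with a minimal sketch. It is not the authors' definition; it assumes a common construction in which each censoring indicator is replaced by a surrogate value and the result is plugged into a standard kernel-smoothed hazard estimator for censored data. The function names, the Epanechnikov kernel, the Nadaraya-Watson smoother, the fixed bandwidths `h` and `b`, the fallback value 0.5, and the clipping threshold 0.05 are all assumptions made for illustration. Here `x` is the observed time, `delta` the censoring indicator, and `xi` equals 1 when `delta` is observed.

```python
import numpy as np

def epanechnikov(u):
    # Epanechnikov kernel, supported on [-1, 1] (assumed choice of kernel).
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def nw_regression(x_eval, x, y, w, b):
    # Nadaraya-Watson estimate of E[y | x] at x_eval; rows with w = 0 are ignored.
    k = epanechnikov((x_eval[:, None] - x[None, :]) / b) * w[None, :]
    denom = k.sum(axis=1)
    num = (k * y[None, :]).sum(axis=1)
    # Fallback 0.5 where no weighted observation is within the bandwidth (arbitrary).
    return np.where(denom > 0, num / np.maximum(denom, 1e-12), 0.5)

def surrogate_indicators(x, delta, xi, b, method):
    # Replace each censoring indicator by a value usable when some are missing.
    d_obs = np.where(xi == 1, delta, 0.0)           # missing indicators set to 0
    m_hat = nw_regression(x, x, d_obs, xi.astype(float), b)  # est. P(delta=1 | x)
    if method == "regression":
        return m_hat
    if method == "imputation":
        return xi * d_obs + (1 - xi) * m_hat
    if method == "ipw":
        pi_hat = nw_regression(x, x, xi.astype(float), np.ones(len(x)), b)
        pi_hat = np.clip(pi_hat, 0.05, 1.0)         # guard against tiny weights
        return xi * d_obs / pi_hat + (1 - xi / pi_hat) * m_hat
    raise ValueError(f"unknown method: {method}")

def kernel_hazard(t_grid, x, d_star, h):
    # Kernel-smoothed increments d_star / (number at risk) of a cumulative hazard.
    order = np.argsort(x)
    xs, ds = x[order], d_star[order]
    n = len(xs)
    at_risk = n - np.arange(n)                      # n, n-1, ..., 1
    k = epanechnikov((t_grid[:, None] - xs[None, :]) / h) / h
    return (k * (ds / at_risk)[None, :]).sum(axis=1)

# Simulated example (hypothetical data): exponential death times with hazard 1,
# exponential censoring, and indicators missing completely at random.
rng = np.random.default_rng(0)
n = 1000
t = rng.exponential(1.0, n)
c = rng.exponential(2.0, n)
x = np.minimum(t, c)
delta = (t <= c).astype(float)
xi = rng.binomial(1, 0.7, n)
t_grid = np.linspace(0.3, 1.0, 8)

estimates = {
    m: kernel_hazard(t_grid, x, surrogate_indicators(x, delta, xi, b=0.3, method=m), h=0.3)
    for m in ("regression", "imputation", "ipw")
}
for name, est in estimates.items():
    print(name, np.round(est, 2))
```

In this simulated setting the true hazard is constant at 1, so the printed values give a rough sense of how each variant behaves. The grid starts at 0.3 to stay clear of the boundary at zero, where a symmetric kernel is biased. In practice the bandwidths would be chosen by a data-driven rule rather than fixed by hand.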
Cite this article
Wang, Q., Dinse, G.E. & Liu, C. Hazard function estimation with cause-of-death data missing at random. Ann Inst Stat Math 64, 415–438 (2012). https://doi.org/10.1007/s10463-010-0317-2
DOI: https://doi.org/10.1007/s10463-010-0317-2