The meta-analytic framework for the evaluation of surrogate endpoints in clinical trials
Introduction
The use of surrogate endpoints in the development of new therapies has always been very controversial, partly owing to a number of unfortunate historical instances where treatments showing a highly positive effect on a surrogate endpoint were ultimately shown to be detrimental to the subjects’ clinical outcome, and conversely, some instances of treatments conferring clinical benefit without measurable impact on presumed surrogates (Fleming and DeMets, 1996). For example, in cardiovascular disease, the unsettling discovery that the two major antiarrhythmic drugs encainide and flecainide reduced arrhythmia but caused a more than threefold increase in overall mortality stressed the need for caution in using non-validated surrogate markers in the evaluation of the possible clinical benefits of new drugs (CAST, 1989). On the other hand, the dramatic surge of the AIDS epidemic, the impressive therapeutic results obtained early on with zidovudine, and the pressure for an accelerated evaluation of new therapies have all led to the use of CD4 blood count, and later of viral load, as endpoints that replaced time to clinical events and overall survival (DeGruttola and Tu, 1994), in spite of serious concerns about their limitations as surrogate markers for clinically relevant endpoints (Lagakos and Hoth, 1992).
Throughout this paper, we use the terms “endpoint” and “marker” interchangeably to refer simply to some random variable that can be measured over the course of the disease process. Variables that are measured early in the course of the disease are often suggested as potential “surrogates” for those that are measured later. The following definitions reflect the commonly accepted use of various terms in the biomedical literature (Biomarkers Definitions Working Group, 2001). A clinical endpoint is a characteristic or variable that reflects how a patient feels, functions, or survives. A biomarker is a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. A surrogate endpoint is a biomarker that is intended to substitute for a clinical endpoint. A surrogate endpoint is expected to predict clinical benefit, harm, or lack thereof.
One important reason for the present interest in surrogate endpoints is the advent of a large number of biomarkers that closely reflect the disease process. An increasing number of new drugs have a well-defined mechanism of action at the molecular level, allowing drug developers to measure the effect of these drugs on the relevant biomarkers (Ferentz, 2002). There is increasing public pressure for new, promising drugs to be approved for marketing as rapidly as possible, and such approval will have to be based on biomarkers rather than on some long-term clinical endpoint (Lesko and Atkinson, 2001). As an illustration of this trend towards early decision-making, recently proposed clinical trial designs use treatment effects on a surrogate endpoint to screen for treatments that show insufficient promise to have a sizeable impact on survival (Royston et al., 2003). If the approval process is shortened, there will be a corresponding need for earlier detection of safety signals that could point to toxic problems with new drugs. It is a safe bet, therefore, that the evaluation of tomorrow's drugs will be based primarily on biomarkers, rather than on the longer-term, harder clinical endpoints that have dominated the development of new drugs until now.
It is therefore best to use validated surrogates, though one needs to reflect on the precise meaning and extent of validation (Schatzkin and Gail, 2002). As in many clinical decisions, statistical arguments will play a major role, but ought to be considered in conjunction with clinical and biological evidence. At the same time, surrogate endpoints can play different roles in different phases of drug development. While it may be more acceptable to use surrogates in early phases of research, one should be much more restrained in using them as substitutes for the true endpoint in pivotal phase III trials, since the latter might imply replacing the true endpoint by a surrogate for all future studies as well, a far-reaching decision. For a biomarker to be used as a “valid” surrogate, a number of conditions must be fulfilled. The ICH Guidelines on Statistical Principles for Clinical Trials state that “In practice, the strength of the evidence for surrogacy depends upon: (i) the biological plausibility of the relationship, (ii) the demonstration in epidemiological studies of the prognostic value of the surrogate for the clinical outcome and (iii) evidence from clinical trials that treatment effects on the surrogate correspond to effects on the clinical outcome” (International Conference on Harmonisation, 1998).
Two motivating case studies are introduced in Section 2. The meta-analytic evaluation framework is presented in Section 3, in the context of normally distributed outcomes. Extensions to a variety of non-Gaussian settings are discussed in Section 4. Efforts for unifying the scattered suite of validation measures are reviewed in Section 5. Implications for prediction of the effect in a new trial and for designing studies based on surrogates are the topics of Section 6.
Section snippets
A meta-analysis of five clinical trials in schizophrenia
The data come from a meta-analysis of five double-blind randomized clinical trials, comparing the effects of risperidone to conventional antipsychotic agents for the treatment of chronic schizophrenia. The treatment indicator for risperidone versus conventional treatment will be denoted by Z. Schizophrenia has long been recognized as a heterogeneous disorder with patients suffering from both “negative” and “positive” symptoms. Negative symptoms are characterized by deficits in cognitive,
A meta-analytic framework for normally distributed outcomes
Several methods have been suggested for the formal evaluation of surrogate markers, some based on a single trial and others, currently gaining momentum, of a meta-analytic nature. The first formal single-trial approach to validating markers is due to Prentice (1989), who gave a definition of the concept of a surrogate endpoint, followed by a series of operational criteria. Freedman et al. (1992) augmented Prentice's hypothesis-testing-based approach with the estimation paradigm, through the
Non-Gaussian endpoints
Statistically speaking, the surrogate endpoint and the clinical endpoint are realizations of random variables. As will be clear from the formalism in Section 3, one is in need of the joint distribution of these variables. The easiest, but not the only, situation is where both are Gaussian random variables, but one also encounters binary (e.g., counts over , tumor shrinkage), categorical (e.g., cholesterol levels , 200–, , tumor response as complete
Towards a unified approach
The longitudinal method of the previous section, while elegant, hinges upon normality. First using the likelihood reduction factor (LRF) (Section 5.1) and then an information-theoretic approach (Section 5.2), extension, and therefore unification, will be achieved.
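To make the idea concrete, the following Python sketch illustrates the kind of transformation on which measures of this type rely. It assumes the forms 1 − exp(−G²/n) for a likelihood-ratio-based measure and 1 − exp(−2I) for a mutual-information-based measure. These forms, the function names, and the numerical inputs are assumptions made for illustration only, since the definitions of Sections 5.1 and 5.2 are not reproduced in this excerpt. In the bivariate Gaussian case both expressions reduce to the squared correlation, which is the sense in which such measures can extend the normal-theory quantities.

```python
import math


def gaussian_mutual_information(rho: float) -> float:
    """Mutual information (in nats) of a bivariate Gaussian pair with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log(1.0 - rho ** 2)


def informational_r2(mutual_information: float) -> float:
    """Map a mutual information I >= 0 onto [0, 1) via 1 - exp(-2 I) (assumed form)."""
    if mutual_information < 0.0:
        raise ValueError("mutual information cannot be negative")
    return 1.0 - math.exp(-2.0 * mutual_information)


def likelihood_reduction(g2: float, n: int) -> float:
    """Map a likelihood-ratio statistic G^2 from n subjects onto [0, 1)
    via 1 - exp(-G^2 / n) (assumed single-trial building block)."""
    if n <= 0 or g2 < 0.0:
        raise ValueError("need n > 0 and g2 >= 0")
    return 1.0 - math.exp(-g2 / n)


if __name__ == "__main__":
    rho = 0.8  # hypothetical correlation between surrogate and true endpoint
    n = 100    # hypothetical number of subjects

    # Gaussian case: the information-based measure recovers rho squared.
    print(informational_r2(gaussian_mutual_information(rho)))

    # Normal linear regression: G^2 = -n * log(1 - R^2), so the
    # likelihood-based measure recovers the usual R^2 as well.
    g2 = -n * math.log(1.0 - rho ** 2)
    print(likelihood_reduction(g2, n))
```

Neither function requires normality: any model that yields a likelihood-ratio statistic or an estimate of mutual information can be passed through the same transformation, which is what makes measures of this form candidates for use across endpoint types.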
Prediction and design aspects
An important application of surrogacy evaluation is the prediction of treatment effect on the true endpoint without measuring the latter, supplemented with appropriate quantification of uncertainty. We will review the work done in this respect by Burzykowski and Buyse (2006).
Two components contribute to such a prediction: (a) information obtained in the validation process based on trials , used to fit models (1)–(2) and (b) the estimate of the effect of Z on S in a new trial
Concluding remarks
Over the years, a variety of surrogate marker evaluation strategies have been proposed, cast within a meta-analytic framework. With an increasing range of endpoint types considered, such as continuous, binary, time-to-event, and longitudinal endpoints, the scatter of types of measures proposed has also increased. Some of these measures are difficult to calculate from fully specified hierarchical models, which has sparked off the formulation of simplified strategies. We reviewed the ensuing
Acknowledgment
We gratefully acknowledge support from Belgian IUAP/PAI network “Statistical Techniques and Modeling for Complex Substantive Questions with Complex Data”.
References (40)
- et al. Reliability and validity of the positive and negative syndrome scale for schizophrenics. Psychiatr. Res. (1988)
- et al. Surrogate marker evaluation from an information theoretic perspective. Biometrics (2006)
- et al. Validation of surrogate markers in multiple randomized clinical trials with repeated measurements. Biometrical J. (2003)
- et al. Prentice's approach and the meta analytic paradigm: a reflection on the role of statistics in the evaluation of surrogate endpoints. Biometrics (2004)
- et al. A unifying approach for surrogate marker validation based on Prentice's criteria. Statist. Med. (2005)
Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin. Pharmacol. Ther. (2001)
- et al. Surrogate threshold effect: an alternative measure for meta-analytic surrogate endpoint validation. Pharm. Statist. (2006)
- et al. Validation of surrogate endpoints in multiple randomized clinical trials with failure-time endpoints. Appl. Statist. (2001)
- et al. The validation of surrogate endpoints using data from randomized clinical trials: a case-study in advanced colorectal cancer. J. Roy. Statist. Soc. Ser. A (2004)
- et al. The Evaluation of Surrogate Endpoints (2005)
The validation of surrogate endpoints in randomized experiments. Biometrics
The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics
Preliminary report: effect of encainide and flecainide on mortality in a randomized trial of arrhythmia suppression after myocardial infarction. N. Engl. J. Med.
A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika
Choice of units of analysis and modeling strategies in multilevel hierarchical models. Comput. Statist. Data Anal.
Elements of Information Theory
Global cross ratio models for bivariate, discrete, ordered responses. Biometrics
Meta-analysis for the evaluation of potential surrogate markers. Statist. Med.
Modelling progression of CD-4 lymphocyte count and its relationship to survival time. Biometrics
Integrating pharmacogenomics into drug development. Pharmacogenomics