The meta-analytic framework for the evaluation of surrogate endpoints in clinical trials

doi:10.1016/j.jspi.2007.06.005

Journal of Statistical Planning and Inference

Volume 138, Issue 2, 1 February 2008, Pages 432-449


Abstract

For a number of reasons, surrogate endpoints are considered instead of the so-called true endpoint in clinical studies, especially when such endpoints can be measured earlier, and/or with less burden for patient and experimenter. Surrogate endpoints may occur more frequently than their standard counterparts. For these reasons, it is not surprising that the use of surrogate endpoints in clinical practice is increasing.

Building on the seminal work of Prentice [1989. Surrogate endpoints in clinical trials: definitions and operational criteria. Statist. Med. 8, 431–440] and Freedman et al. [1992. Statistical validation of intermediate endpoints for chronic diseases. Statist. Med. 11, 167–178], Buyse et al. [2000. The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics 1, 49–67] framed the evaluation exercise within a meta-analytic setting, in an effort to overcome difficulties that necessarily surround evaluation efforts based on a single trial. In this paper, we review the meta-analytic approach for continuous outcomes and discuss extensions to non-normal and longitudinal settings, as well as proposals to unify the somewhat disparate collection of validation measures currently in use. Implications for design and for predicting the effect of treatment in a new trial, based on the surrogate, are discussed. Two case studies are analyzed, one in schizophrenia and one in ophthalmology.

Introduction

The use of surrogate endpoints in the development of new therapies has always been very controversial, partly owing to a number of unfortunate historical instances in which treatments showing a highly positive effect on a surrogate endpoint were ultimately shown to be detrimental to the subjects’ clinical outcome and, conversely, instances of treatments conferring clinical benefit without measurable impact on presumed surrogates (Fleming and DeMets, 1996). For example, in cardiovascular disease, the unsettling discovery that the two major antiarrhythmic drugs encainide and flecainide reduced arrhythmia but caused a more than threefold increase in overall mortality underscored the need for caution in using non-validated surrogate markers to evaluate the possible clinical benefits of new drugs (CAST, 1989). On the other hand, the dramatic surge of the AIDS epidemic, the impressive therapeutic results obtained early on with zidovudine, and the pressure for an accelerated evaluation of new therapies have all led to the use of CD4 blood count, and later of viral load, as endpoints replacing time to clinical events and overall survival (DeGruttola and Tu, 1994), in spite of serious concerns about their limitations as surrogate markers for clinically relevant endpoints (Lagakos and Hoth, 1992).

Throughout this paper, we use the terms “endpoint” and “marker” interchangeably to refer simply to some random variable that can be measured over the course of the disease process. Variables that are measured early in the course of the disease are often suggested as potential “surrogates” for those that are measured later. The following definitions reflect the commonly accepted use of various terms in the biomedical literature (Biomarkers Definitions Working Group, 2001). A clinical endpoint is a characteristic or variable that reflects how a patient feels, functions, or survives. A biomarker is a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. A surrogate endpoint is a biomarker that is intended to substitute for a clinical endpoint. A surrogate endpoint is expected to predict clinical benefit, harm, or lack thereof.

One important reason for the present interest in surrogate endpoints is the advent of a large number of biomarkers that closely reflect the disease process. An increasing number of new drugs have a well-defined mechanism of action at the molecular level, allowing drug developers to measure the effect of these drugs on the relevant biomarkers (Ferentz, 2002). There is increasing public pressure for new, promising drugs to be approved for marketing as rapidly as possible, and such approval will have to be based on biomarkers rather than on some long-term clinical endpoint (Lesko and Atkinson, 2001). As an illustration of this trend towards early decision-making, recently proposed clinical trial designs use treatment effects on a surrogate endpoint to screen for treatments that show insufficient promise to have a sizeable impact on survival (Royston et al., 2003). If the approval process is shortened, there will be a corresponding need for earlier detection of safety signals that could point to toxic problems with new drugs. It is a safe bet, therefore, that the evaluation of tomorrow's drugs will be based primarily on biomarkers, rather than on the longer-term, harder clinical endpoints that have dominated the development of new drugs until now.

It is therefore best to use validated surrogates, though one needs to reflect on the precise meaning and extent of validation (Schatzkin and Gail, 2002). As in many clinical decisions, statistical arguments will play a major role but ought to be considered in conjunction with clinical and biological evidence. At the same time, surrogate endpoints can play different roles in different phases of drug development. While it may be more acceptable to use surrogates in early phases of research, one should be much more restrained in using them as substitutes for the true endpoint in pivotal phase III trials, since this might imply replacing the true endpoint by the surrogate in all future studies as well, a far-reaching decision. For a biomarker to be used as a “valid” surrogate, a number of conditions must be fulfilled. The ICH Guidelines on Statistical Principles for Clinical Trials state that “In practice, the strength of the evidence for surrogacy depends upon: (i) the biological plausibility of the relationship, (ii) the demonstration in epidemiological studies of the prognostic value of the surrogate for the clinical outcome and (iii) evidence from clinical trials that treatment effects on the surrogate correspond to effects on the clinical outcome” (International Conference on Harmonisation, 1998).

Two motivating case studies are introduced in Section 2. The meta-analytic evaluation framework is presented in Section 3, in the context of normally distributed outcomes. Extensions to a variety of non-Gaussian settings are discussed in Section 4. Efforts for unifying the scattered suite of validation measures are reviewed in Section 5. Implications for prediction of the effect in a new trial and for designing studies based on surrogates are the topics of Section 6.


A meta-analysis of five clinical trials in schizophrenia

The data come from a meta-analysis of five double-blind randomized clinical trials, comparing the effects of risperidone to conventional antipsychotic agents for the treatment of chronic schizophrenia. The treatment indicator for risperidone versus conventional treatment will be denoted by Z. Schizophrenia has long been recognized as a heterogeneous disorder with patients suffering from both “negative” and “positive” symptoms. Negative symptoms are characterized by deficits in cognitive,

A meta-analytic framework for normally distributed outcomes

Several methods have been suggested for the formal evaluation of surrogate markers, some based on a single trial and others, currently gaining momentum, of a meta-analytic nature. The first formal single-trial approach to validating markers is due to Prentice (1989), who gave a definition of the concept of a surrogate endpoint, followed by a series of operational criteria. Freedman et al. (1992) augmented Prentice's hypothesis-testing-based approach with the estimation paradigm, through the
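Freedman et al.'s estimation-based quantity is usually written as the proportion of the treatment effect explained by the surrogate, $PE = (\beta - \beta_S)/\beta$, where $\beta$ is the effect of Z on T and $\beta_S$ is the same effect after adjusting for S. Because the snippet breaks off before stating it, the following is only a minimal Python sketch under stated assumptions: a single trial, linear models for a continuous T fitted by ordinary least squares, and simulated data. The helper names `ols_coefs` and `proportion_explained` are hypothetical, not from the paper.

```python
import numpy as np

def ols_coefs(y, *covariates):
    """OLS coefficients of y on an intercept plus the given covariates."""
    X = np.column_stack([np.ones(len(y))] + [np.asarray(c, dtype=float) for c in covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def proportion_explained(T, S, Z):
    """Proportion of the treatment effect on T explained by S (single trial)."""
    beta = ols_coefs(T, Z)[1]       # effect of Z on T, unadjusted
    beta_s = ols_coefs(T, Z, S)[1]  # effect of Z on T, adjusted for S
    return (beta - beta_s) / beta

# Hypothetical simulated trial (not the schizophrenia or ophthalmology data)
rng = np.random.default_rng(0)
n = 2000
Z = rng.integers(0, 2, n).astype(float)      # randomized treatment indicator
S = 1.0 * Z + rng.normal(size=n)             # surrogate
T = 0.8 * S + 0.2 * Z + rng.normal(size=n)   # true endpoint
print(proportion_explained(T, S, Z))
```

In this simulation the true values are $\beta = 1.0$ and $\beta_S = 0.2$, so the estimate should land near 0.8. Being a ratio of estimates, PE is not confined to [0, 1] and can be very imprecise when $\beta$ is small, one of the single-trial difficulties that the meta-analytic setting aims to overcome.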

Non-Gaussian endpoints

Statistically speaking, the surrogate endpoint and the clinical endpoint are realizations of random variables. As will be clear from the formalism in Section 3, one needs the joint distribution of these variables. The easiest, but not the only, situation is where both are Gaussian random variables, but one also encounters binary (e.g., CD4$^{+}$ counts over $500/\mathrm{mm}^{3}$, tumor shrinkage), categorical (e.g., cholesterol levels $<200$ mg/dl, 200–299 mg/dl, $\geq 300$ mg/dl, tumor response as complete
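For two binary endpoints, the joint distribution reduces to a 2×2 table, and one common summary of the association between S and T is the odds ratio (Dale's global cross-ratio reduces to it in the 2×2 case). The association measure the paper itself adopts for non-Gaussian pairs is not visible in this snippet, so the sketch below is an assumption-laden illustration: it pools all patients, whereas a surrogacy analysis would normally condition on treatment arm and trial, and it applies no continuity correction. The function name `odds_ratio` is hypothetical.

```python
import numpy as np

def odds_ratio(S, T):
    """Odds ratio of the 2x2 joint distribution of binary S and T (0/1 arrays)."""
    S = np.asarray(S, dtype=int)
    T = np.asarray(T, dtype=int)
    n11 = np.sum((S == 1) & (T == 1))
    n10 = np.sum((S == 1) & (T == 0))
    n01 = np.sum((S == 0) & (T == 1))
    n00 = np.sum((S == 0) & (T == 0))
    if n10 == 0 or n01 == 0:
        # Empty off-diagonal cell: odds ratio undefined without a correction
        raise ValueError("empty off-diagonal cell; odds ratio is not finite")
    return float(n11 * n00) / float(n10 * n01)

# Hypothetical dichotomized endpoints for eight patients
S = [1, 1, 1, 0, 0, 0, 1, 0]
T = [1, 1, 0, 0, 0, 1, 1, 0]
print(odds_ratio(S, T))
```

Here the table has 3 concordant-positive, 3 concordant-negative and one patient in each discordant cell, so the odds ratio is $3 \cdot 3 / (1 \cdot 1) = 9$; a value of 1 would indicate no association.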

Towards a unified approach

The longitudinal method of the previous section, while elegant, hinges upon normality. Using first the likelihood reduction factor (LRF) (Section 5.1) and then an information-theoretic approach (Section 5.2), we extend it beyond the Gaussian setting and thereby unify the validation measures.
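As it is usually defined, the LRF averages, over the N trials, a transformation of the likelihood-ratio statistic $G_i^2$ that compares a model for T given Z with one for T given Z and S in trial i: $\mathrm{LRF} = 1 - \frac{1}{N}\sum_{i=1}^{N} \exp(-G_i^2/n_i)$. The Python sketch below assumes Gaussian linear models fitted by maximum likelihood with the error variance profiled out, in which case $G_i^2 = n_i \log(\mathrm{RSS}_{\text{reduced}}/\mathrm{RSS}_{\text{full}})$. The data and function names are hypothetical. The information-theoretic measure of Section 5.2 is commonly written $R_h^2 = 1 - e^{-2I(T,S)}$, which likewise reduces to a squared correlation for Gaussian endpoints.

```python
import numpy as np

def _rss(y, X):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def likelihood_reduction_factor(T, S, Z, trial):
    """LRF = 1 - (1/N) * sum_i exp(-G2_i / n_i), averaged over the N trials.

    G2_i is the likelihood-ratio statistic comparing the Gaussian linear model
    T ~ Z with T ~ Z + S within trial i (error variance profiled out)."""
    terms = []
    for i in np.unique(trial):
        m = trial == i
        n_i = int(m.sum())
        ones = np.ones(n_i)
        rss_reduced = _rss(T[m], np.column_stack([ones, Z[m]]))
        rss_full = _rss(T[m], np.column_stack([ones, Z[m], S[m]]))
        g2 = n_i * np.log(rss_reduced / rss_full)
        terms.append(np.exp(-g2 / n_i))  # equals rss_full / rss_reduced
    return 1.0 - float(np.mean(terms))

# Hypothetical simulated meta-analysis: 5 trials, error correlation 0.6
rng = np.random.default_rng(0)
rho, n_per, n_trials = 0.6, 1000, 5
trial = np.repeat(np.arange(n_trials), n_per)
Z = rng.integers(0, 2, trial.size).astype(float)
e_s = rng.normal(size=trial.size)
e_t = rho * e_s + np.sqrt(1 - rho**2) * rng.normal(size=trial.size)
alpha = rng.normal(1.0, 0.3, n_trials)[trial]  # trial-specific effects on S
beta = 0.5 * alpha + 0.2                       # trial-specific effects on T
S = alpha * Z + e_s
T = beta * Z + e_t
print(likelihood_reduction_factor(T, S, Z, trial))
```

With correlated Gaussian errors, each ratio $\mathrm{RSS}_{\text{full}}/\mathrm{RSS}_{\text{reduced}}$ estimates $1-\rho^2$, so the printed LRF should be close to $\rho^2 = 0.36$; the point of the LRF is that the same recipe applies to any pair of likelihood-based models, not only Gaussian ones.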

Prediction and design aspects

An important application of surrogacy evaluation is the prediction of treatment effect on the true endpoint without measuring the latter, supplemented with appropriate quantification of uncertainty. We will review the work done in this respect by Burzykowski and Buyse (2006).

Two components contribute to such a prediction: (a) information obtained in the validation process based on trials $i = 1, \dots, N$ , used to fit models (1)–(2) and (b) the estimate of the effect of Z on S in a new trial $i = 0$
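The two components can be illustrated with a simplified two-stage sketch in Python. It assumes that models (1)–(2) have the fixed-effects form of Buyse et al. (2000), $S_{ij} = \mu_{Si} + \alpha_i Z_{ij} + \varepsilon_{Sij}$ and $T_{ij} = \mu_{Ti} + \beta_i Z_{ij} + \varepsilon_{Tij}$: the per-trial effects $\hat\alpha_i$ and $\hat\beta_i$ from trials $i = 1, \dots, N$ are estimated first, $\hat\beta_i$ is then regressed on $\hat\alpha_i$ (its $R^2$ serving as $R^2_{\text{trial}}$), and the fitted line is evaluated at the new trial's $\hat\alpha_0$. The prediction interval is the ordinary regression interval and ignores the estimation error in $\hat\alpha_i$ and $\hat\alpha_0$, which Burzykowski and Buyse (2006) take into account, so it is only an approximation. All names and data are hypothetical.

```python
import numpy as np
from scipy import stats

def effect(y, z):
    """Treatment effect as a difference in means (equals the OLS slope on a 0/1 z)."""
    return y[z == 1].mean() - y[z == 0].mean()

def trial_level_fit(S, T, Z, trial):
    """Stage 1: per-trial effects alpha_i (Z on S) and beta_i (Z on T).
    Stage 2: regress beta_i on alpha_i across trials; return R2_trial."""
    ids = np.unique(trial)
    a = np.array([effect(S[trial == i], Z[trial == i]) for i in ids])
    b = np.array([effect(T[trial == i], Z[trial == i]) for i in ids])
    lam1, lam0 = np.polyfit(a, b, 1)  # slope, intercept
    resid = b - (lam0 + lam1 * a)
    r2_trial = 1.0 - resid @ resid / np.sum((b - b.mean()) ** 2)
    return a, b, lam0, lam1, r2_trial

def predict_new_trial(a, b, lam0, lam1, alpha0, level=0.95):
    """Predicted beta_0 for a new trial with surrogate effect alpha0, plus an
    approximate prediction interval that treats a and b as error-free."""
    N = len(a)
    resid = b - (lam0 + lam1 * a)
    s = np.sqrt(resid @ resid / (N - 2))
    sxx = np.sum((a - a.mean()) ** 2)
    se = s * np.sqrt(1.0 + 1.0 / N + (alpha0 - a.mean()) ** 2 / sxx)
    t_crit = stats.t.ppf(0.5 + level / 2.0, df=N - 2)
    pred = lam0 + lam1 * alpha0
    return pred, (pred - t_crit * se, pred + t_crit * se)

# Hypothetical validation set: N = 20 trials of 200 patients each,
# with trial-level effects lying on the line beta_i = 0.5 + 2 * alpha_i
rng = np.random.default_rng(0)
N, n = 20, 200
trial = np.repeat(np.arange(N), n)
Z = np.tile(np.repeat([0.0, 1.0], n // 2), N)
alpha_i = rng.uniform(0.0, 2.0, N)
beta_i = 0.5 + 2.0 * alpha_i
S = alpha_i[trial] * Z + rng.normal(0.0, 0.5, trial.size)
T = beta_i[trial] * Z + rng.normal(0.0, 0.5, trial.size)

a, b, lam0, lam1, r2_trial = trial_level_fit(S, T, Z, trial)
pred, (lo, hi) = predict_new_trial(a, b, lam0, lam1, alpha0=1.0)
print(round(r2_trial, 3), round(pred, 2), (round(lo, 2), round(hi, 2)))
```

The same interval underlies the surrogate threshold effect of Burzykowski and Buyse (2006): the smallest effect on S in a new trial for which the prediction interval for $\beta_0$ excludes zero.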

Concluding remarks

Over the years, a variety of surrogate marker evaluation strategies have been proposed, cast within a meta-analytic framework. As the range of endpoint types considered has widened to include continuous, binary, time-to-event, and longitudinal endpoints, so has the scatter of proposed measures. Some of these measures are difficult to calculate from fully specified hierarchical models, which has sparked the formulation of simplified strategies. We reviewed the ensuing

Acknowledgment

We gratefully acknowledge support from Belgian IUAP/PAI network “Statistical Techniques and Modeling for Complex Substantive Questions with Complex Data”.

References (40)

S.R. Kay et al.
Reliability and validity of the positive and negative syndrome scale for schizophrenics
Psychiatr. Res.
(1988)
A. Alonso et al.
Surrogate marker evaluation from an information theoretic perspective
Biometrics
(2006)
A. Alonso et al.
Validation of surrogate markers in multiple randomized clinical trials with repeated measurements
Biometrical J.
(2003)
A. Alonso et al.
Prentice's approach and the meta analytic paradigm: a reflection on the role of statistics in the evaluation of surrogate endpoints
Biometrics
(2004)
A. Alonso et al.
A unifying approach for surrogate marker validation based on Prentice's criteria
Statist. Med.
(2005)
Biomarkers Definitions Working Group
Biomarkers and surrogate endpoints: preferred definitions and conceptual framework
Clin. Pharmacol. Ther.
(2001)
T. Burzykowski et al.
Surrogate threshold effect: an alternative measure for meta-analytic surrogate endpoint validation
Pharm. Statist.
(2006)
T. Burzykowski et al.
Validation of surrogate endpoints in multiple randomized clinical trials with failure-time endpoints
Appl. Statist.
(2001)
T. Burzykowski et al.
The validation of surrogate endpoints using data from randomized clinical trials: a case-study in advanced colorectal cancer
J. Roy. Statist. Soc. Ser. A
(2004)
T. Burzykowski et al.
The Evaluation of Surrogate Endpoints
(2005)
M. Buyse et al.
The validation of surrogate endpoints in randomized experiments
Biometrics
(1998)
M. Buyse et al.
The validation of surrogate endpoints in meta-analyses of randomized experiments
Biostatistics
(2000)
Cardiac Arrhythmia Suppression Trial (CAST) Investigators
Preliminary report: effect of encainide and flecainide on mortality in a randomized trial of arrhythmia suppression after myocardial infarction
N. Engl. J. Med.
(1989)
D.G. Clayton
A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence
Biometrika
(1978)
A.J. Cortiñas et al.
Choice of units of analysis and modeling strategies in multilevel hierarchical models
Comput. Statist. Data Anal.
(2004)
T. Cover et al.
Elements of Information Theory
(1991)
J.R. Dale
Global cross ratio models for bivariate, discrete, ordered responses
Biometrics
(1986)
M.J. Daniels et al.
Meta-analysis for the evaluation of potential surrogate markers
Statist. Med.
(1997)
V. DeGruttola et al.
Modelling progression of CD-4 lymphocyte count and its relationship to survival time
Biometrics
(1994)
A.E. Ferentz
Integrating pharmacogenomics into drug development
Pharmacogenomics
(2002)