Review Article
Discordance between reported intention-to-treat and per protocol analyses
Introduction
The defining feature of a controlled clinical trial (CT), which makes it experimental and distinguishes it from an observational study, is that it assesses the consequences of assigning an intervention to a patient [1]. Although some studies [2], [3] conclude that observational designs provide estimates of treatment effects not significantly different from those given by CTs, it is accepted [4] that establishing causality is more hazardous in observational settings, because the unknown assignment procedure introduces a higher degree of uncertainty: assignment may be related to uncontrolled covariates whose effects become confounded with the intervention effect. In a CT, the main source of uncertainty is attributed to chance, or sampling variation, and is therefore measured by standard errors. But when deviations from the protocol occur, the overall uncertainty is affected. Because the units of a CT are human beings with legal and ethical rights [5], they may make decisions that overlap and become confounded with the clinician's decisions. Other deviations may occur in the course of treatment, resulting in dropout, missing data, or protocol violations.
To manage those deviations, two strategies are commonly used: the intention-to-treat (ITT) principle states that every subject should be analyzed as if he or she had completely followed the scheduled design; the per protocol (PP) approach proposes including only those volunteers who adhered to the assigned intervention and completed the prespecified follow-up without any major protocol deviation. Given that the ITT analysis includes patients who, in fact, did not receive the experimental treatment, one would expect it to provide attenuated estimates of effect [6]. As ITT tries to preserve the experimental design, it has usually been recommended [7], [8], [9], [10] for nonequivalence trials, despite the need to impute outcome values for noncompliers with missing data. The dilemma with missing data is to distinguish between random and nonrandom missingness: because the randomness assumption rests on independence from nonobservable variables, it cannot be empirically contrasted, and missing data result in greater uncertainty about the trial conclusions.
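To make the contrast concrete, the following illustrative simulation computes the two risk-difference estimates on the same toy two-arm trial. All figures (control success rate, true effect, dropout fraction) and the failure-imputation rule for ITT are assumptions for the sketch, not data from the reviewed trials:

```python
import random

random.seed(1)

def simulate_trial(n_per_arm=1000, true_effect=0.15, dropout=0.20):
    """Toy two-arm trial with a binary outcome.

    Illustrative assumptions: control success rate 0.40; dropouts never
    benefit from treatment; under ITT, a noncompleter's missing outcome
    is imputed as failure, one common (conservative) convention.
    """
    records = []  # (arm, completed, success)
    for arm in ("control", "active"):
        p = 0.40 + (true_effect if arm == "active" else 0.0)
        for _ in range(n_per_arm):
            completed = random.random() > dropout
            # ITT failure imputation: noncompleters count as failures
            success = completed and random.random() < p
            records.append((arm, completed, success))
    return records

def success_rate(records, arm, completers_only):
    outcomes = [s for a, c, s in records
                if a == arm and (c or not completers_only)]
    return sum(outcomes) / len(outcomes)

trial = simulate_trial()
itt = success_rate(trial, "active", False) - success_rate(trial, "control", False)
pp = success_rate(trial, "active", True) - success_rate(trial, "control", True)
print(f"ITT risk difference: {itt:.3f}")  # attenuated toward zero
print(f"PP  risk difference: {pp:.3f}")  # near the true 0.15
```

Because noncompleters in both arms are counted as failures, the ITT difference is diluted by roughly the completion fraction, which is the attenuation discussed above.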
When studying adherence to the assigned intervention, we can distinguish between use effectiveness, which estimates the outcome under habitual conditions of administration ("proof of practice"), and method effectiveness ("efficacy"), which assesses the method's potential under ideal conditions with no protocol deviations ("proof of principle"). Shih and Quan [11] suggested that use effectiveness should be considered for management decisions involving a whole population; for clinicians treating individual patients, however, method effectiveness among completers, together with the probability of completion, may be more relevant. If the dropouts in a trial resemble future dropouts, and given a good definition of the studied, treated, and sick populations [12], it can be argued that a valid study-based ITT analysis will adequately address use effectiveness. On the other hand, as dropout may be related to outcome [13] and may have different causes in each treatment arm [14], the PP estimate that excludes protocol deviations will be biased [6], [15], [16], especially when the percentage of dropout is large [17]. Furthermore, compliance can interact with treatment, yielding better results for compliers in the active group but just the opposite (better results for noncompliers) in the control group [10]. Thus, as the PP estimate is not acceptable in cases of substantial dropout, it has been argued that a valid estimate of method effectiveness can be derived from the ITT estimate by taking into account the degree of noncompliance [16].
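The last point can be sketched numerically. Under the strong assumptions of all-or-none compliance and no treatment effect in noncompliers (the exclusion restriction), a compliance-adjusted (CACE-style) estimate of method effectiveness divides the ITT effect by the proportion of compliers; the figures below are hypothetical:

```python
def method_effectiveness(itt_effect, compliance):
    """Compliance-adjusted (CACE-style) estimate of method effectiveness.

    Strong assumptions: all-or-none compliance and no treatment effect
    in noncompliers (the exclusion restriction). Under them, the ITT
    effect is the true effect diluted by the proportion of compliers,
    so dividing recovers the effect among compliers.
    """
    if not 0.0 < compliance <= 1.0:
        raise ValueError("compliance must lie in (0, 1]")
    return itt_effect / compliance

# Hypothetical figures: an ITT risk difference of 0.12 observed with
# 80% compliance corresponds to a method effectiveness of 0.15.
print(round(method_effectiveness(0.12, 0.80), 3))
```

Unlike the PP estimate, this adjustment keeps all randomized patients in the denominator, so it does not break the randomized comparison; it does, however, stand or fall with the exclusion restriction.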
To summarize, our rationale is that point estimates and their standard errors are derived assuming random allocation together with complete and identical follow-up in the treatment arms. Our hypotheses are therefore that any deviation from the protocol design may generate two sources of error: bias in the estimates of the effect (systematic bias); and also, as Deeks et al. [4] pointed out, an underestimation of the real variability present (unpredictability bias), because standard errors account solely for random variation.
Our main objective is to study empirically the relationship between the ITT and PP estimators as reported by researchers in indexed medical journals and to quantify their degree of concordance.
Some authors [15], [18] have hypothesized a loss of power of the PP estimate due to its reduced sample size, although others have questioned whether this loss can be compensated for by its expectedly higher estimate [7]. Our second objective is to compare the statistical efficiency of the two approaches.
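A rough normal-approximation power calculation illustrates the trade-off just described. The proportions and sample sizes below are illustrative assumptions only, not figures from the review:

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test of proportions
    (normal approximation, equal arms, upper tail only)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    return 1 - NormalDist().cdf(z_crit - abs(p1 - p2) / se)

# Illustrative assumptions only: ITT keeps the full sample (150/arm)
# but sees an attenuated effect; PP keeps completers (120/arm) with a
# larger expected effect. Neither approach dominates in general.
print(f"ITT power: {power_two_proportions(0.32, 0.44, 150):.2f}")
print(f"PP  power: {power_two_proportions(0.40, 0.55, 120):.2f}")
```

With these particular numbers the larger PP effect outweighs its smaller sample; reversing that ranking only requires changing the assumed attenuation or dropout, which is why the question must be settled empirically.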
Data sources
We performed a systematic review of papers abstracted in PubMed, restricted to publication years 2001–2003, using the keywords "clinical trial," "intention to treat" (or ITT), and "per protocol" (or PP). The papers were manually checked to ensure that both the ITT and PP analyses had been performed on the primary endpoint. Finally, seeking homogeneity, we restricted the study to CTs comparing only two treatment groups with a binary response.
Data extraction
We recorded sample size and number of positive
Data selection
The initial search identified 162 papers, but only 127 were true randomized CTs analyzed both by ITT and PP. From these, 53 were excluded, mainly because they had more than two groups or did not analyze a binary response (see Fig. 1 for details). The final sample comprised 74 papers.
Sample description
There was large heterogeneity in sample size: the number of patients included in the ITT (PP) analyses ranged from 26 (21) to 5,792 (4,755), with a median of 155 (133). The percentage of losses ranged from 1.74 to
PP provides higher estimates
Our first conclusion is that, as expected, the PP analysis tends to provide, on average, higher estimates of effect than the ITT analysis [6], [16]. This result accords with the idea that losses do not retain the treatment effect and that missing data in CTs result in systematic differences between the approaches used to handle them [16].
Unpredictability
Our second conclusion concerns poor agreement: although the Lin reproducibility index was large, the discrepancy limits showed that both the ITT and PP
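For readers wishing to reproduce this kind of agreement analysis, a minimal sketch of Lin's concordance correlation coefficient and Bland-Altman style discrepancy limits follows. The paired effect estimates are invented for illustration, not data from the review:

```python
from math import sqrt
from statistics import mean, pvariance

def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements
    (population moments, matching the original definition)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return 2 * sxy / (pvariance(x, mx) + pvariance(y, my) + (mx - my) ** 2)

def discrepancy_limits(x, y):
    """Bland-Altman style 95% limits for the paired differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    md = mean(d)
    sd = sqrt(pvariance(d, md) * len(d) / (len(d) - 1))  # sample SD
    return md - 1.96 * sd, md + 1.96 * sd

# Invented paired risk-difference estimates for six hypothetical trials:
itt_est = [0.10, 0.05, 0.20, 0.12, 0.08, 0.15]
pp_est = [0.13, 0.09, 0.22, 0.18, 0.07, 0.21]
print(f"Lin CCC: {lin_ccc(itt_est, pp_est):.3f}")
low, high = discrepancy_limits(itt_est, pp_est)
print(f"95% discrepancy limits: ({low:.3f}, {high:.3f})")
```

Even with a fairly high concordance coefficient, limits this wide relative to typical effect sizes indicate that an individual trial's ITT and PP estimates can diverge materially, which is the pattern the paper reports.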
Conclusion
To conclude, we recommend first optimizing clinical plans to mitigate sample attrition [35], trying to avoid nonrandom errors, because "the best way of dealing with missing data is not to have them in the first place" [36]. Second, if nonrandom mechanisms are involved, the dropout mechanism should be carefully monitored [37], to allow statistical analysis reflecting the uncertainty introduced by protocol deviations: on modeling nonresponse, clinicians and statisticians should work together to
Acknowledgments
While taking full responsibility for possible errors, we gratefully acknowledge helpful reviews of earlier versions of this work by Drs. Mike Campbell, Francesc Cardellach, Josep Lluis Carrasco, Guadalupe Gómez, and Ian White, as well as two anonymous reviewers. We also appreciate Donald Rubin's suggestions for future work and Alan Pounds for English editing. E.C. was partially supported by grant FIS PI041945 from the “Instituto de Salud Carlos III.”
Contributors: N.P., C.B., and E.C. designed
References (37)
Diseño y análisis de un ensayo clínico: el aspecto más crítico. Med Clin (Barc) (2004)
Intention-to-treat approach to data from randomized controlled trials: a sensitivity analysis. J Clin Epidemiol (2003)
Statistical considerations in the intent-to-treat principle. Control Clin Trials (2000)
Intention-to-treat vs on-treatment analyses of clinical trial data. Control Clin Trials (1998)
In a randomized controlled trial, missing data led to biased results regarding anxiety. J Clin Epidemiol (2004)
The methods for handling missing data in clinical trials influence sample size requirements. J Clin Epidemiol (2004)
Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med (2000)
A comparison of observational studies and randomized, controlled trials. N Engl J Med (2000)
Evaluating non-randomised intervention studies. Health Technol Assess (2003)
What makes clinical research ethical? JAMA (2000)
The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. Ann Intern Med
International conference on harmonization: statistical principles for clinical trials. Fed Regist
Testing for treatment differences with dropouts present in clinical trials—a composite approach. Stat Med
Sick population—treated population: the need for a better definition. Eur J Clin Pharmacol
Informative noncompliance in endpoint trials. Curr Control Trials Cardiovasc Med
Handling missing data in clinical trials: an overview. Drug Inf J
Intention-to-treat analysis and the goals of clinical trials. Clin Pharmacol Ther
Effect of non-random missing data mechanisms in clinical trials. Stat Med