Review Article
Discordance between reported intention-to-treat and per protocol analyses
Introduction
The defining feature of a controlled clinical trial (CT), which makes it experimental and distinguishes it from an observational study, is that it assesses the consequences of assigning an intervention to a patient [1]. Although some studies [2], [3] conclude that observational designs provide estimates of treatment effects not significantly different from those given by CTs, it is accepted [4] that establishing causality is more hazardous in observational settings, because the unknown assignment procedure introduces a higher degree of uncertainty: assignment may be related to uncontrolled covariates whose effects become confounded with the intervention effect. In a CT, the main source of uncertainty is attributed to chance, or sampling variation, and is therefore measured by standard errors. But when deviations from the protocol occur, the overall uncertainty is affected. Because the units of a CT are human beings with legal and ethical rights [5], they may make decisions that overlap and become confounded with the clinician's decisions. Other deviations may occur in the course of treatment, resulting in dropout, missing data, or protocol violations.
To manage those deviations, two strategies are commonly used: the intention-to-treat (ITT) principle states that every subject should be analyzed as if he or she had completely followed the scheduled design; the per protocol (PP) approach proposes including only those volunteers who adhered to the assigned intervention and completed the prespecified follow-up without any major protocol deviation. Given that the ITT analysis includes patients who, in fact, did not receive the experimental treatment, one would expect it to provide attenuated estimates of effect [6]. As ITT tries to preserve the experimental design, it has usually been recommended [7], [8], [9], [10] for nonequivalence trials, despite the need to impute outcome values for noncompliers with missing data. The dilemma with missing data is to distinguish between random and nonrandom missingness: because the randomness assumption rests on independence from nonobservable variables, it cannot be empirically contrasted, and missing data result in greater uncertainty about the trial conclusions.
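To make the contrast concrete, the following illustrative simulation computes the two risk-difference estimates on the same toy two-arm trial. All figures (control success rate, true effect, dropout fraction) and the failure-imputation rule for ITT are assumptions for the sketch, not data from the reviewed trials:

```python
import random

random.seed(1)

def simulate_trial(n_per_arm=1000, true_effect=0.15, dropout=0.20):
    """Toy two-arm trial with a binary outcome.

    Illustrative assumptions: control success rate 0.40; dropouts never
    benefit from treatment; under ITT, a noncompleter's missing outcome
    is imputed as failure, one common (conservative) convention.
    """
    records = []  # (arm, completed, success)
    for arm in ("control", "active"):
        p = 0.40 + (true_effect if arm == "active" else 0.0)
        for _ in range(n_per_arm):
            completed = random.random() > dropout
            # ITT failure imputation: noncompleters count as failures
            success = completed and random.random() < p
            records.append((arm, completed, success))
    return records

def success_rate(records, arm, completers_only):
    outcomes = [s for a, c, s in records
                if a == arm and (c or not completers_only)]
    return sum(outcomes) / len(outcomes)

trial = simulate_trial()
itt = success_rate(trial, "active", False) - success_rate(trial, "control", False)
pp = success_rate(trial, "active", True) - success_rate(trial, "control", True)
print(f"ITT risk difference: {itt:.3f}")  # attenuated toward zero
print(f"PP  risk difference: {pp:.3f}")  # near the true 0.15
```

Because noncompleters in both arms are counted as failures, the ITT difference is diluted by roughly the completion fraction, which is the attenuation discussed above.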
When studying adherence to the assigned intervention, we can distinguish between use effectiveness, which estimates the outcome under habitual conditions of administration ("proof of practice"), and method effectiveness ("efficacy"), which assesses the method's potential under ideal conditions with no protocol deviations ("proof of principle"). Shih and Quan [11] suggested that use effectiveness should be considered for management decisions involving a whole population; for clinicians treating individual patients, however, method effectiveness among completers, together with the probability of completion, may be more relevant. If the dropouts in a trial resemble future dropouts, and given a good definition of the studied, treated, and sick populations [12], it can be argued that a valid study-based ITT analysis will adequately address use effectiveness. On the other hand, as dropout may be related to outcome [13] and may have different causes in each treatment arm [14], the PP estimate that excludes protocol deviations will be biased [6], [15], [16], especially when the percentage of dropout is large [17]. Furthermore, compliance can interact with treatment, yielding better results for compliers in the active group but just the opposite (better results for noncompliers) in the control group [10]. Thus, as the PP estimate is not acceptable in cases of substantial dropout, it has been argued that a valid estimate of method effectiveness can be derived from the ITT estimate by taking into account the degree of noncompliance [16].
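The last point can be sketched numerically. Under the strong assumptions of all-or-none compliance and no treatment effect in noncompliers (the exclusion restriction), a compliance-adjusted (CACE-style) estimate of method effectiveness divides the ITT effect by the proportion of compliers; the figures below are hypothetical:

```python
def method_effectiveness(itt_effect, compliance):
    """Compliance-adjusted (CACE-style) estimate of method effectiveness.

    Strong assumptions: all-or-none compliance and no treatment effect
    in noncompliers (the exclusion restriction). Under them, the ITT
    effect is the true effect diluted by the proportion of compliers,
    so dividing recovers the effect among compliers.
    """
    if not 0.0 < compliance <= 1.0:
        raise ValueError("compliance must lie in (0, 1]")
    return itt_effect / compliance

# Hypothetical figures: an ITT risk difference of 0.12 observed with
# 80% compliance corresponds to a method effectiveness of 0.15.
print(round(method_effectiveness(0.12, 0.80), 3))
```

Unlike the PP estimate, this adjustment keeps all randomized patients in the denominator, so it does not break the randomized comparison; it does, however, stand or fall with the exclusion restriction.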
To summarize, our rationale is that point estimates and their standard errors are derived assuming random allocation together with complete and identical follow-up in the treatment arms. Our hypotheses are therefore that any deviation from the protocol design may generate two sources of error: bias in the estimates of the effect (systematic bias); and also, as Deeks et al. [4] pointed out, an underestimation of the real variability present (unpredictability bias), because standard errors account solely for random variation.
Our main objective is to study empirically the relationship between the ITT and PP estimators as reported by researchers in indexed medical journals and to quantify their degree of concordance.
Some authors [15], [18] have hypothesized a loss of power of the PP estimate due to its reduced sample size, although others have questioned whether this loss can be compensated for by its expectedly higher estimate [7]. Our second objective is to compare the statistical efficiency of the two approaches.
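A rough normal-approximation power calculation illustrates the trade-off just described. The proportions and sample sizes below are illustrative assumptions only, not figures from the review:

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test of proportions
    (normal approximation, equal arms, upper tail only)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    return 1 - NormalDist().cdf(z_crit - abs(p1 - p2) / se)

# Illustrative assumptions only: ITT keeps the full sample (150/arm)
# but sees an attenuated effect; PP keeps completers (120/arm) with a
# larger expected effect. Neither approach dominates in general.
print(f"ITT power: {power_two_proportions(0.32, 0.44, 150):.2f}")
print(f"PP  power: {power_two_proportions(0.40, 0.55, 120):.2f}")
```

With these particular numbers the larger PP effect outweighs its smaller sample; reversing that ranking only requires changing the assumed attenuation or dropout, which is why the question must be settled empirically.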
Data sources
We performed a systematic review of papers abstracted in PubMed, restricted to publication years 2001–2003, using the keywords "clinical trial," "intention to treat" (or ITT), and "per protocol" (or PP). The papers were manually checked to ensure that both the ITT and PP analyses had been performed on the primary endpoint. Finally, seeking homogeneity, we restricted the study to CTs comparing only two treatment groups with a binary response.
Data extraction
We recorded sample size and number of positive
Data selection
The initial search identified 162 papers, but only 127 were true randomized CTs analyzed both by ITT and PP. From these, 53 were excluded, mainly because they had more than two groups or did not analyze a binary response (see Fig. 1 for details). The final sample comprised 74 papers.
Sample description
There was large heterogeneity in sample size: the number of patients included in the ITT (PP) analyses ranged from 26 (21) to 5,792 (4,755), with a median of 155 (133). The percentage of losses ranged from 1.74 to
PP provides higher estimates
Our first conclusion is that, as expected, the PP analysis tends to provide, on average, higher estimates of effect than the ITT analysis [6], [16]. This result accords with the idea that losses do not retain the treatment effect and that missing data in CTs result in systematic differences between the approaches used to handle them [16].
Unpredictability
Our second conclusion concerns poor agreement: although the Lin reproducibility index was large, the discrepancy limits showed that both the ITT and PP
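For readers wishing to reproduce this kind of agreement analysis, a minimal sketch of Lin's concordance correlation coefficient and Bland-Altman style discrepancy limits follows. The paired effect estimates are invented for illustration, not data from the review:

```python
from math import sqrt
from statistics import mean, pvariance

def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements
    (population moments, matching the original definition)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return 2 * sxy / (pvariance(x, mx) + pvariance(y, my) + (mx - my) ** 2)

def discrepancy_limits(x, y):
    """Bland-Altman style 95% limits for the paired differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    md = mean(d)
    sd = sqrt(pvariance(d, md) * len(d) / (len(d) - 1))  # sample SD
    return md - 1.96 * sd, md + 1.96 * sd

# Invented paired risk-difference estimates for six hypothetical trials:
itt_est = [0.10, 0.05, 0.20, 0.12, 0.08, 0.15]
pp_est = [0.13, 0.09, 0.22, 0.18, 0.07, 0.21]
print(f"Lin CCC: {lin_ccc(itt_est, pp_est):.3f}")
low, high = discrepancy_limits(itt_est, pp_est)
print(f"95% discrepancy limits: ({low:.3f}, {high:.3f})")
```

Even with a fairly high concordance coefficient, limits this wide relative to typical effect sizes indicate that an individual trial's ITT and PP estimates can diverge materially, which is the pattern the paper reports.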
Conclusion
To conclude, we recommend first optimizing clinical plans to mitigate sample attrition [35], trying to avoid nonrandom errors, because "the best way of dealing with missing data is not to have them in the first place" [36]. Second, if nonrandom mechanisms are involved, the dropout mechanism should be carefully monitored [37], to allow statistical analysis reflecting the uncertainty introduced by protocol deviations: on modeling nonresponse, clinicians and statisticians should work together to
Acknowledgments
While taking full responsibility for possible errors, we gratefully acknowledge helpful reviews of earlier versions of this work by Drs. Mike Campbell, Francesc Cardellach, Josep Lluis Carrasco, Guadalupe Gómez, and Ian White, as well as two anonymous reviewers. We also appreciate Donald Rubin's suggestions for future work and Alan Pounds for English editing. E.C. was partially supported by grant FIS PI041945 from the “Instituto de Salud Carlos III.”
Contributors: N.P., C.B., and E.C. designed
References (37)
Diseño y análisis de un ensayo clínico: el aspecto más crítico. Med Clin (Barc) (2004)
Intention-to-treat approach to data from randomized controlled trials: a sensitivity analysis. J Clin Epidemiol (2003)
Statistical considerations in the intent-to-treat principle. Control Clin Trials (2000)
Intention-to-treat vs on-treatment analyses of clinical trial data. Control Clin Trials (1998)
In a randomized controlled trial, missing data led to biased results regarding anxiety. J Clin Epidemiol (2004)
The methods for handling missing data in clinical trials influence sample size requirements. J Clin Epidemiol (2004)
Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med (2000)
A comparison of observational studies and randomized, controlled trials. N Engl J Med (2000)
Evaluating non-randomised intervention studies. Health Technol Assess (2003)
What makes clinical research ethical? JAMA (2000)
The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. Ann Intern Med
International conference on harmonization: statistical principles for clinical trials. Fed Regist
Testing for treatment differences with dropouts present in clinical trials—a composite approach. Stat Med
Sick population—treated population: the need for a better definition. Eur J Clin Pharmacol
Informative noncompliance in endpoint trials. Curr Control Trials Cardiovasc Med
Handling missing data in clinical trials: an overview. Drug Inf J
Intention-to-treat analysis and the goals of clinical trials. Clin Pharmacol Ther
Effect of non-random missing data mechanisms in clinical trials. Stat Med