Original Article
An empirical comparison of methods for meta-analysis of diagnostic accuracy showed hierarchical models are necessary
Introduction
There is growing acknowledgment of the need for systematic reviews of studies evaluating the accuracy of diagnostic and screening tests. Recent years have seen increasing numbers of such reviews being published: the Database of Abstracts of Reviews of Effects (DARE) maintained by the Centre for Reviews and Dissemination at the University of York [1] includes 27 diagnostic test accuracy reviews for 1998, increasing to 49 in 2003. The Cochrane Collaboration is planning to include reviews of test accuracy studies in the Cochrane Library. There has also been an increase in methodological work in this area [2], [3], [4], [5], [6], [7], [8].
Considerable uncertainty remains, however, over the best way to formally synthesize test accuracy studies. Statistical methodology is much more varied than in meta-analysis of therapeutic interventions. There are a number of different measures of diagnostic test accuracy, and a variety of ways of meta-analyzing them [7], [9], [10], [11]. These meta-analytic methods may produce either summary estimates of test characteristics (e.g., sensitivity, specificity, positive and negative likelihood ratios) or a summary receiver operating characteristic (SROC) curve.
A survey of reviews of diagnostic accuracy that were included in the Centre for Reviews and Dissemination's DARE up to 2002 [9] found that of 133 reviews in which meta-analysis was performed, 52% computed one or more summary measures of accuracy, 18% conducted only SROC analyses, and 30% did both. Of the 109 reviews that computed summary measures of accuracy, 89% used sensitivity and/or specificity, 24% used likelihood ratios, and 10% used predictive values. There are also several alternative ways of computing both SROC curves and summary measures of accuracy, differing in the weighting given to each study or whether a transformation is used [9]. There is a clear need for consensus on the most appropriate methods for meta-analysis of test accuracy studies.
Our aim in this paper was to compare two statistically rigorous methods involving hierarchical models, which are not widely used at present, with simpler and more commonly used methods, to assess whether the simpler methods are adequate in practice. We start by presenting brief results of a survey of the statistical methods used in recently published systematic reviews of test accuracy studies. We then review these methods and evaluate their results when applied to data from eight systematic reviews. We conclude with recommendations for best practice in future reviews.
Brief review of use of methods in the literature
Diagnostic systematic reviews published in 2003, the most recent complete year available, were identified from the DARE [1] (http://www.york.ac.uk/inst/crd/crddatabases.htm#DARE) maintained by the Centre for Reviews and Dissemination at the University of York. We extracted data on the summary accuracy measures presented and the meta-analysis methods used.
Of 49 systematic reviews of diagnostic accuracy published in 2003 and identified from the DARE database, 34 (69%) included a meta-analysis.
Review of methods for meta-analysis of diagnostic accuracy studies
Several features distinguish the results of diagnostic accuracy studies from those of studies of therapeutic interventions and necessitate different methods of meta-analysis. First, diagnostic accuracy is usually quantified by two measures, sensitivity and specificity (or positive and negative likelihood ratios), and cannot be reduced to a single summary measure such as a diagnostic odds ratio without losing important information [10]. Second, declaring a test result to be positive involves choosing a positivity threshold, and varying this threshold trades sensitivity off against specificity across studies.
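As a concrete illustration of the paired accuracy measures discussed above, the following sketch (added editorially, not part of the original article; the 2×2 counts are hypothetical) computes sensitivity, specificity, likelihood ratios, and the diagnostic odds ratio from a single study's 2×2 table:

```python
def accuracy_measures(tp, fp, fn, tn):
    """Standard test-accuracy measures from a 2x2 table.

    tp: diseased, test positive;  fn: diseased, test negative;
    fp: non-diseased, test positive;  tn: non-diseased, test negative.
    """
    sens = tp / (tp + fn)        # sensitivity (true positive rate)
    spec = tn / (tn + fp)        # specificity (true negative rate)
    lr_pos = sens / (1 - spec)   # positive likelihood ratio
    lr_neg = (1 - sens) / spec   # negative likelihood ratio
    dor = lr_pos / lr_neg        # diagnostic odds ratio = (tp*tn)/(fp*fn)
    return sens, spec, lr_pos, lr_neg, dor

# Hypothetical study: 100 diseased, 100 non-diseased subjects
sens, spec, lr_pos, lr_neg, dor = accuracy_measures(tp=90, fp=20, fn=10, tn=80)
```

Note that the diagnostic odds ratio collapses the (sensitivity, specificity) pair into one number, which is precisely the information loss referred to above.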
Empirical comparison of methods
We compared the results of meta-analysis based on the different proposed methods, using data from a sample of eight systematic reviews (Table 3) [23], [24], [25], [26], [27], [28], [29], [30]. These were purposively sampled from reviews in which one or more of the authors have been involved, to illustrate a variety of tests with different ranges of, and variability in, sensitivity and specificity. For each data set, we plotted the individual study results in ROC space. We applied each of the methods described above to these data sets and compared the resulting summary points and SROC curves.
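The commonly used Littenberg–Moses SROC approach compared in this study can be sketched as follows (an editorial illustration of the simple unweighted variant, not the authors' code; the study counts are invented). It regresses the log diagnostic odds ratio D on a "threshold" axis S, both built from logit-transformed rates, then back-transforms to a curve in ROC space:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def littenberg_moses(studies):
    """Unweighted Littenberg-Moses SROC regression.

    studies: list of (tp, fp, fn, tn) tuples; 0.5 is added to every
    cell as a continuity correction, as is conventional.
    Returns (a, b), the intercept and slope of D = a + b*S.
    """
    D, S = [], []
    for tp, fp, fn, tn in studies:
        tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
        sens = tp / (tp + fn)
        fpr = fp / (fp + tn)
        D.append(logit(sens) - logit(fpr))  # log diagnostic odds ratio
        S.append(logit(sens) + logit(fpr))  # proxy for threshold
    n = len(D)
    sbar, dbar = sum(S) / n, sum(D) / n
    b = (sum((s - sbar) * (d - dbar) for s, d in zip(S, D))
         / sum((s - sbar) ** 2 for s in S))  # ordinary least squares slope
    a = dbar - b * sbar
    return a, b

def sroc_sens(a, b, fpr):
    """Sensitivity on the fitted SROC curve at a given false positive rate.

    Rearranging D = a + b*S gives logit(sens) = (a + (1+b)*logit(fpr)) / (1-b).
    """
    return 1 / (1 + math.exp(-(a + (1 + b) * logit(fpr)) / (1 - b)))

a, b = littenberg_moses([(90, 20, 10, 80), (80, 10, 20, 90)])
```

Treating D and S as fixed quantities in an ordinary regression ignores sampling error in both axes and between-study heterogeneity, which is the core criticism the hierarchical methods address.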
Discussion
Our empirical comparison of the results of different meta-analytic methods applied to eight datasets showed that the commonly used Littenberg–Moses method of generating SROC curves, and simple pooling of sensitivity and specificity, can give results that differ markedly from those derived using the statistically rigorous bivariate/HSROC method involving hierarchical models. Separate random-effects meta-analysis of logit-transformed sensitivity and specificity gave summary points, but not SROC curves.
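The separate random-effects analysis mentioned above can be sketched as a DerSimonian–Laird meta-analysis of logit-transformed sensitivities, with the same procedure applied independently to specificities (an editorial illustration with made-up counts, not the authors' implementation):

```python
import math

def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects pooling of study-level effects."""
    w = [1 / v for v in variances]               # inverse-variance weights
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                # moment estimate of heterogeneity
    w_re = [1 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    return pooled, tau2

# Hypothetical studies as (true positives, false negatives) pairs
studies = [(90, 10), (70, 30), (85, 15)]
logits = [math.log(tp / fn) for tp, fn in studies]   # logit sensitivity
var_logits = [1 / tp + 1 / fn for tp, fn in studies] # approximate variance

pooled_logit, tau2 = dersimonian_laird(logits, var_logits)
pooled_sens = 1 / (1 + math.exp(-pooled_logit))      # back-transform
```

Because sensitivity and specificity are pooled independently here, this approach yields a summary point but ignores the correlation between the two measures, which is what the bivariate/HSROC model captures.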
Limitations of this study
The main limitation of the present study is that we applied the methods to a limited sample of eight data sets. However, our chosen data sets capture a range of features common in the literature: all exhibit substantial between-study variability, but they differ widely in the regions of ROC space in which the study estimates lie. There remains scope for future work comparing the methods both on a larger number of data sets, and on simulated data, where the true parameter values are known and the estimates can be checked against them.
Conclusions
We have reviewed methods for meta-analysis of diagnostic accuracy studies and compared their results when applied to the eight example data sets. The bivariate/HSROC method is the most statistically rigorous and can be used to give confidence and prediction regions and a summary ROC curve, in addition to the summary sensitivity and specificity. We believe this method should be adopted as the standard approach. The Littenberg–Moses method and separate random-effects meta-analysis did not match these results: the former can give markedly different estimates, and the latter yields summary points but no SROC curve.
Acknowledgments
This work was supported by the MRC Health Services Research Collaboration. Dr Bachmann's work (grants no. 3233B0-103182 and 3200B0-103183) was supported by the Swiss National Science Foundation.
References
- et al. Systematic reviews with individual patient data meta-analysis to evaluate diagnostic tests. Eur J Obstet Gynecol Reprod Biol (2003)
- et al. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol (1995)
- et al. Studies reporting ROC curves of diagnostic and prediction data can be incorporated into meta-analyses using corresponding odds ratios. J Clin Epidemiol (2007)
- et al. The conditional relative odds ratio provided less biased results for comparing diagnostic test accuracy in meta-analyses. J Clin Epidemiol (2004)
- et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol (2005)
- et al. Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. J Clin Epidemiol (2006)
- Empirical Bayes estimates generated in a hierarchical summary ROC analysis agreed closely with those of a full Bayesian analysis. J Clin Epidemiol (2004)
- et al. Assessment of the accuracy of diagnostic tests: the cross-sectional study. J Clin Epidemiol (2003)
- Database of abstracts of reviews of effects
- et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD Initiative. Ann Intern Med (2003)
- Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ
- Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol
- Evidence based diagnostics. BMJ
- Challenges in systematic reviews of diagnostic technologies. Ann Intern Med
- Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med
- A methodological review of how heterogeneity has been examined in systematic reviews of diagnostic test accuracy. Health Technol Assess
- Evaluations of diagnostic and screening tests
- Meta-analysis of diagnostic test accuracy assessment studies with varying number of thresholds. Biometrics
- A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics
- Estimating diagnostic accuracy from multiple conflicting reports: a new meta-analytic method. Med Decis Making
- Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med