Original Article
An empirical comparison of methods for meta-analysis of diagnostic accuracy showed hierarchical models are necessary

https://doi.org/10.1016/j.jclinepi.2007.09.013

Abstract

Objective

Meta-analysis of studies of the accuracy of diagnostic tests currently uses a variety of methods. Statistically rigorous hierarchical models require expertise and sophisticated software. We assessed whether any of the simpler methods can in practice give adequately accurate and reliable results.

Study Design and Setting

We reviewed six methods for meta-analysis of diagnostic accuracy: four simple commonly used methods (simple pooling, separate random-effects meta-analyses of sensitivity and specificity, separate meta-analyses of positive and negative likelihood ratios, and the Littenberg–Moses summary receiver operating characteristic [ROC] curve) and two more statistically rigorous approaches using hierarchical models (bivariate random-effects meta-analysis and hierarchical summary ROC curve analysis). We applied the methods to data from a sample of eight systematic reviews chosen to illustrate a variety of patterns of results.

Results

In each meta-analysis, there was substantial heterogeneity between the results of different studies. Simple pooling of results gave misleading summary estimates of sensitivity and specificity in some meta-analyses, and the Littenberg–Moses method produced summary ROC curves that diverged from those produced by more rigorous methods in some situations.

Conclusion

The closely related hierarchical summary ROC curve or bivariate models should be used as the standard method for meta-analysis of diagnostic accuracy.

Introduction

There is growing acknowledgment of the need for systematic reviews of studies evaluating the accuracy of diagnostic and screening tests. Recent years have seen increasing numbers of such reviews being published: the Database of Abstracts of Reviews of Effects (DARE) maintained by the Centre for Reviews and Dissemination at the University of York [1] includes 27 diagnostic test accuracy reviews for 1998, increasing to 49 in 2003. The Cochrane Collaboration is planning to include reviews of test accuracy studies in the Cochrane Library. There has also been an increase in methodological work in this area [2], [3], [4], [5], [6], [7], [8].

Considerable uncertainty remains, however, over the best way to formally synthesize test accuracy studies. Statistical methodology is much more varied than in meta-analysis of therapeutic interventions. There are a number of different measures of diagnostic test accuracy, and a variety of ways of meta-analyzing them [7], [9], [10], [11]. These meta-analytic methods may produce either summary estimates of test characteristics (e.g., sensitivity, specificity, positive and negative likelihood ratios) or a summary receiver operating characteristic (SROC) curve.
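
For concreteness, all of these paired measures derive directly from a study's 2×2 table. The short Python sketch below is purely illustrative (the function name and example counts are our own, not taken from any of the reviews discussed) and shows the standard definitions:

```python
def accuracy_measures(tp, fp, fn, tn):
    """Standard paired accuracy measures from a 2x2 table
    (tp/fp/fn/tn = true/false positives and negatives)."""
    sens = tp / (tp + fn)        # sensitivity: true positive rate
    spec = tn / (tn + fp)        # specificity: true negative rate
    lr_pos = sens / (1 - spec)   # positive likelihood ratio
    lr_neg = (1 - sens) / spec   # negative likelihood ratio
    dor = lr_pos / lr_neg        # diagnostic odds ratio
    return {"sensitivity": sens, "specificity": spec,
            "LR+": lr_pos, "LR-": lr_neg, "DOR": dor}

# Example: 90 TP, 10 FN, 30 FP, 170 TN gives sensitivity 0.90,
# specificity 0.85, LR+ 6.0, LR- ~0.12, DOR ~51.
print(accuracy_measures(tp=90, fp=30, fn=10, tn=170))
```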

A survey of reviews of diagnostic accuracy that were included in the Centre for Reviews and Dissemination's DARE up to 2002 [9] found that of 133 reviews in which meta-analysis was performed, 52% computed one or more summary measures of accuracy, 18% conducted only SROC analyses, and 30% did both. Of the 109 reviews that computed summary measures of accuracy, 89% used sensitivity and/or specificity, 24% used likelihood ratios, and 10% used predictive values. There are also several alternative ways of computing both SROC curves and summary measures of accuracy, differing in the weighting given to each study or whether a transformation is used [9]. There is a clear need for consensus on the most appropriate methods for meta-analysis of test accuracy studies.

Our aim in this paper was to compare two statistically rigorous methods involving hierarchical models, which are not widely used at present, with simpler, more commonly used methods, to assess whether the simpler methods are adequate in practice. We start by presenting brief results of a survey of the statistical methods used in recently published systematic reviews of test accuracy studies. We then review these methods and evaluate their results when applied to data from eight systematic reviews. We conclude with recommendations for best practice in future reviews.


Brief review of use of methods in the literature

Diagnostic systematic reviews published in 2003, the most recent complete available year, were identified from the DARE [1] (http://www.york.ac.uk/inst/crd/crddatabases.htm#DARE) maintained by the Centre for Reviews and Dissemination at the University of York. We extracted data on the summary accuracy measures presented and the meta-analysis methods used.

Of 49 systematic reviews of diagnostic accuracy published in 2003 and identified from the DARE database, 34 (69%) included a meta-analysis.

Review of methods for meta-analysis of diagnostic accuracy studies

Several features distinguish the results of diagnostic accuracy studies from those of studies of therapeutic interventions and necessitate different methods of meta-analysis. First, diagnostic accuracy is usually quantified by two measures, sensitivity and specificity (or positive and negative likelihood ratios), and cannot be reduced to a single summary measure such as a diagnostic odds ratio without losing important information [10]. Second, declaring a test result to be positive involves the choice of a positivity threshold, which may vary between studies and induces a trade-off between sensitivity and specificity.
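
Of the simple methods, the Littenberg–Moses approach is the most elaborate: each study contributes D = logit(TPR) − logit(FPR) (the log diagnostic odds ratio) and S = logit(TPR) + logit(FPR) (a proxy for the positivity threshold); D is regressed on S; and the fitted line is back-transformed to an SROC curve. The sketch below shows the unweighted variant with a 0.5 continuity correction, one of several published choices; the data and function names are illustrative only.

```python
import numpy as np

def littenberg_moses(tp, fp, fn, tn):
    """Unweighted Littenberg-Moses fit: regress D = logit(TPR) - logit(FPR)
    on S = logit(TPR) + logit(FPR) by ordinary least squares, after adding
    a 0.5 continuity correction to every cell (one common choice)."""
    tp, fp, fn, tn = (np.asarray(x, float) + 0.5 for x in (tp, fp, fn, tn))
    tpr = tp / (tp + fn)                   # per-study sensitivity
    fpr = fp / (fp + tn)                   # per-study 1 - specificity
    logit = lambda p: np.log(p / (1 - p))
    D = logit(tpr) - logit(fpr)            # log diagnostic odds ratio
    S = logit(tpr) + logit(fpr)            # threshold proxy
    b, a = np.polyfit(S, D, 1)             # OLS slope and intercept
    return a, b

def sroc_tpr(a, b, fpr):
    """SROC curve implied by D = a + b*S:
    logit(TPR) = (a + (1 + b) * logit(FPR)) / (1 - b)."""
    fpr = np.asarray(fpr, float)
    logit_tpr = (a + (1 + b) * np.log(fpr / (1 - fpr))) / (1 - b)
    return 1 / (1 + np.exp(-logit_tpr))

# Hypothetical data from five studies:
a, b = littenberg_moses(tp=[40, 60, 85, 30, 70], fp=[5, 12, 30, 2, 20],
                        fn=[10, 15, 5, 20, 10], tn=[95, 88, 70, 98, 80])
print(sroc_tpr(a, b, fpr=np.linspace(0.05, 0.5, 5)))
```

Because the ordinary least squares fit ignores both measurement error in S and between-study heterogeneity, its results can diverge from those of the hierarchical methods, as the empirical comparison below illustrates.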

Empirical comparison of methods

We compared the results of meta-analysis based on the different proposed methods, using data from a sample of eight systematic reviews (Table 3) [23], [24], [25], [26], [27], [28], [29], [30]. These were purposively sampled from reviews in which one or more of the authors have been involved, to illustrate a variety of tests with different ranges of, and variability in, sensitivity and specificity. For each data set, we plotted the individual study results in ROC space. We applied the methods above to each data set and compared the resulting summary estimates and SROC curves.
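
The bivariate model treats each study's pair (logit sensitivity, logit specificity) as a draw from a bivariate normal distribution with unknown means, between-study variances, and correlation. As a rough, self-contained illustration (not the implementation used in the paper, which would typically use exact binomial likelihoods in mixed-model software), the sketch below fits this model by maximum likelihood under the within-study normal approximation:

```python
import numpy as np
from scipy.optimize import minimize

def bivariate_meta(tp, fp, fn, tn):
    """Approximate bivariate random-effects meta-analysis of
    logit(sensitivity) and logit(specificity), fitted by maximum
    likelihood with a within-study normal approximation and a 0.5
    continuity correction in every cell."""
    tp, fp, fn, tn = (np.asarray(x, float) + 0.5 for x in (tp, fp, fn, tn))
    y = np.column_stack([np.log(tp / fn), np.log(tn / fp)])  # logit pairs
    v = np.column_stack([1/tp + 1/fn, 1/tn + 1/fp])          # approx. variances

    def nll(theta):  # negative log-likelihood (constants dropped)
        mu = theta[:2]
        t1, t2, rho = np.exp(theta[2]), np.exp(theta[3]), np.tanh(theta[4])
        total = 0.0
        for yi, vi in zip(y, v):
            S = np.array([[t1**2 + vi[0], rho * t1 * t2],
                          [rho * t1 * t2, t2**2 + vi[1]]])
            r = yi - mu
            total += 0.5 * (np.log(np.linalg.det(S)) + r @ np.linalg.solve(S, r))
        return total

    x0 = np.array([y[:, 0].mean(), y[:, 1].mean(), -1.0, -1.0, 0.0])
    fit = minimize(nll, x0, method="Nelder-Mead", options={"maxiter": 20000})
    expit = lambda x: 1 / (1 + np.exp(-x))
    return {"sensitivity": expit(fit.x[0]), "specificity": expit(fit.x[1]),
            "tau": np.exp(fit.x[2:4]), "rho": np.tanh(fit.x[4])}
```

The back-transformed means give the summary operating point, while the between-study variances and correlation underpin the prediction region and the implied SROC curve.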

Discussion

Our empirical comparison of the results of different meta-analytic methods applied to eight datasets showed that the commonly used Littenberg–Moses method of generating SROC curves, and simple pooling of sensitivity and specificity, can give results that differ markedly from those derived using the statistically rigorous bivariate/HSROC method involving hierarchical models. Separate random-effects meta-analysis of logit-transformed sensitivity and specificity gave summary points, but not SROC curves.
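
Separate random-effects meta-analysis of the logit-transformed measures can be sketched with the moment-based DerSimonian–Laird estimator; the paper does not commit to this particular estimator, so treat it as one common variant:

```python
import numpy as np

def dersimonian_laird(y, v):
    """DerSimonian-Laird random-effects pooling of effect estimates y
    (e.g., logit sensitivities) with within-study variances v.
    Returns the pooled mean, its standard error, and tau^2."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1 / v                                          # fixed-effect weights
    ybar = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - ybar) ** 2)                    # Cochran's Q
    tau2 = max(0.0, (Q - (len(y) - 1))
               / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    wr = 1 / (v + tau2)                                # random-effects weights
    mu = np.sum(wr * y) / np.sum(wr)
    return mu, np.sqrt(1 / np.sum(wr)), tau2
```

Applied once to logit sensitivities and once to logit specificities, this yields a summary point but, unlike the hierarchical models, neither an SROC curve nor any account of the correlation between the two measures.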

Limitations of this study

The main limitation of the present study is that we applied the methods to a limited sample of eight data sets. However, our chosen data sets capture a range of features common in the literature: all exhibit substantial between-study variability, but they differ widely in the regions of ROC space in which the study estimates lie. There remains scope for future work comparing the methods both on a larger number of data sets and on simulated data, where the true parameter values are known and can serve as a benchmark.

Conclusions

We have reviewed methods for meta-analysis of diagnostic accuracy studies and compared their results when applied to the eight example data sets. The bivariate/HSROC method is the most statistically rigorous and can be used to give confidence and prediction regions and a summary ROC curve, in addition to the summary sensitivity and specificity. We believe this method should be adopted as the standard approach. The Littenberg–Moses method and separate random-effects meta-analysis did not give reliable results in all the situations we examined.
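
To make the prediction region concrete: a 95% between-study prediction ellipse can be traced on the logit scale from the fitted bivariate parameters and back-transformed to ROC space. The sketch below reuses the between-study parameters and logit-scale means from the hypothetical bivariate_meta sketch above and, for simplicity, ignores uncertainty in the estimated means (a fuller version would add their covariance):

```python
import numpy as np
from scipy.stats import chi2

def prediction_region(mu_logit, tau, rho, level=0.95, n_points=200):
    """Trace a between-study prediction ellipse on the logit scale and map
    it to ROC space. mu_logit = (logit sensitivity, logit specificity);
    tau = between-study SDs; rho = between-study correlation.
    Returns (1 - specificity, sensitivity) coordinates for plotting."""
    mu_logit = np.asarray(mu_logit, float)
    Sigma = np.array([[tau[0]**2, rho * tau[0] * tau[1]],
                      [rho * tau[0] * tau[1], tau[1]**2]])
    L = np.linalg.cholesky(Sigma)            # requires |rho| < 1 and tau > 0
    radius = np.sqrt(chi2.ppf(level, df=2))  # chi-squared radius, 2 df
    t = np.linspace(0, 2 * np.pi, n_points)
    pts = mu_logit[:, None] + L @ (radius * np.vstack([np.cos(t), np.sin(t)]))
    expit = lambda x: 1 / (1 + np.exp(-x))
    return 1 - expit(pts[1]), expit(pts[0])
```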

Acknowledgments

This work was supported by the MRC Health Services Research Collaboration. Dr Bachmann's work (grant nos. 3233B0-103182 and 3200B0-103183) was supported by the Swiss National Science Foundation.

References (43)

  • J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ (2001)
  • W. Deville et al. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol (2002)
  • C. Gluud et al. Evidence based diagnostics. BMJ (2005)
  • A. Tatsioni et al. Challenges in systematic reviews of diagnostic technologies. Ann Intern Med (2005)
  • P. Whiting et al. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med (2004)
  • J. Dinnes et al. A methodological review of how heterogeneity has been examined in systematic reviews of diagnostic test accuracy. Health Technol Assess (2005)
  • J.J. Deeks. Evaluations of diagnostic and screening tests
  • V. Dukic et al. Meta-analysis of diagnostic test accuracy assessment studies with varying number of thresholds. Biometrics (2003)
  • R.M. Harbord et al. A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics (2007)
  • B. Littenberg et al. Estimating diagnostic accuracy from multiple conflicting reports: a new meta-analytic method. Med Decis Making (1993)
  • L.E. Moses et al. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med (1993)