Original Article
A systematic review finds that diagnostic reviews fail to incorporate quality despite available tools

https://doi.org/10.1016/j.jclinepi.2004.04.008

Abstract

Background and objective

To review existing quality assessment tools for diagnostic accuracy studies and to examine to what extent quality was assessed and incorporated in diagnostic systematic reviews.

Methods

Electronic databases were searched for tools to assess the quality of studies of diagnostic accuracy or guides for conducting, reporting or interpreting such studies. The Database of Abstracts of Reviews of Effects (DARE; 1995–2001) was used to identify systematic reviews of diagnostic studies to examine the practice of quality assessment of primary studies.

Results

Ninety-one quality assessment tools were identified. Only two provided details of tool development, and only a small proportion provided any indication of the aspects of quality they aimed to assess. None of the tools had been systematically evaluated. We identified 114 systematic reviews, of which 58 (51%) had performed an explicit quality assessment and were further examined. The majority of reviews used more than one method of incorporating quality.

Conclusion

Most tools to assess the quality of diagnostic accuracy studies do not start from a well-defined definition of quality. None has been systematically evaluated. The majority of existing systematic reviews fail to take differences in quality into account. Reviewers should consider quality as a possible source of heterogeneity.

Introduction

There is a growing interest in the systematic review and quantitative synthesis of research evaluating diagnostic tests. A crucial step in the process of reviewing is the critical appraisal of the quality of studies [1], [2], [3], [4]. Study quality can be used as a criterion for inclusion, or as a potential source of heterogeneity in study results. Study quality, however, has proven to be an elusive concept. Generally speaking, study quality can involve one or more of four aspects. The first is the potential for bias. What is the fit between the study design and the study purpose? Shortcomings in the design and conduct of diagnostic studies can lead to exaggerated estimates of diagnostic accuracy. Evidence is accumulating on the amount and direction of these potential threats to validity [5], [6]. The second aspect of quality is the conduct of the study. How well was the study planned, and what are the discrepancies between what was planned and what was actually performed? A third aspect of quality is the applicability of the results. Are the results of the study applicable to the clinical problem that the reader is interested in? A fourth aspect is the quality of reporting. Does the report or article contain the information needed to assess the potential for bias, the quality of conduct of the study, and the applicability of the results? Some people may see quality as involving all four aspects; others may see it as concerned with only one or two.
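To make the first aspect concrete, consider partial verification bias, in which all test-positive patients but only a fraction of test-negative patients receive the reference standard. A minimal Python sketch (not from the original article; the prevalence, accuracy values, and verification fraction are illustrative assumptions) shows how naive estimates computed from verified patients alone exaggerate sensitivity:

    # Partial verification bias: expected-count sketch (all values assumed).
    prevalence = 0.20          # assumed disease prevalence
    sensitivity = 0.80         # assumed true sensitivity of the index test
    specificity = 0.90         # assumed true specificity of the index test
    verify_negatives = 0.10    # assumed fraction of test-negatives verified
    n = 100_000                # cohort size (expected counts, no sampling noise)

    diseased = n * prevalence
    healthy = n - diseased
    tp = diseased * sensitivity          # true positives
    fn = diseased * (1 - sensitivity)    # false negatives
    tn = healthy * specificity           # true negatives
    fp = healthy * (1 - specificity)     # false positives

    # Naive estimates use only the verified patients: all test-positives,
    # but only a 10% sample of test-negatives.
    naive_sens = tp / (tp + verify_negatives * fn)
    naive_spec = (verify_negatives * tn) / (fp + verify_negatives * tn)

    print(f"true sensitivity {sensitivity:.2f} -> naive {naive_sens:.2f}")  # 0.80 -> 0.98
    print(f"true specificity {specificity:.2f} -> naive {naive_spec:.2f}")  # 0.90 -> 0.47

Under these assumed numbers, sensitivity is inflated from 0.80 to about 0.98 and specificity deflated from 0.90 to about 0.47; empirical work of the kind cited above [5], [6] quantifies such design-related distortions across real studies.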

Several tools for evaluating the quality of diagnostic studies, some quite different from one another, have been presented in the literature. We report the results of a systematic review of these existing tools for evaluating the quality of diagnostic test evaluations. In a second systematic review, we have documented the ways in which quality has been incorporated in published systematic reviews of diagnostic tests.
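As a hypothetical illustration of one way reviews can incorporate quality, the following Python sketch pools log diagnostic odds ratios with inverse-variance (fixed-effect) weights separately within "high" and "low" quality strata; the study names, 2×2 counts, and quality labels are invented for illustration and are not drawn from any review we examined:

    import math

    # Hypothetical studies: (name, tp, fp, fn, tn, quality label).
    studies = [
        ("A", 45,  5, 10, 90, "high"),
        ("B", 60, 12, 15, 80, "high"),
        ("C", 70,  4,  3, 50, "low"),
        ("D", 55,  6,  5, 65, "low"),
    ]

    def pooled_log_dor(rows):
        """Fixed-effect (inverse-variance) pooling of log diagnostic odds ratios."""
        num = den = 0.0
        for _, tp, fp, fn, tn, _ in rows:
            log_dor = math.log((tp * tn) / (fp * fn))
            var = 1/tp + 1/fp + 1/fn + 1/tn   # standard variance of the log DOR
            num += log_dor / var
            den += 1 / var
        return num / den

    for label in ("high", "low"):
        subset = [s for s in studies if s[-1] == label]
        print(f"{label}-quality pooled DOR: {math.exp(pooled_log_dor(subset)):.1f}")

If the two strata yield markedly different pooled estimates, quality is a plausible source of heterogeneity in study results and should be reported as such rather than averaged away.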

Materials and methods

For the first systematic review, electronic databases including Medline, Embase, Biosis, and the methodological databases of both the Centre for Reviews and Dissemination (CRD) and the Cochrane Collaboration were systematically searched from database inception to April 2001 to identify existing quality assessment tools. Full details of the search strategy are provided elsewhere [7]. Methodological experts in the area of diagnostic tests were contacted for additional studies. No language restrictions were applied.

Search results

A total of 91 quality-related tools were identified, of which 67 could be classified as tools designed to assess the quality of primary studies of diagnostic accuracy. Of these, 58 were described only within the methods section of a particular systematic review [8–47],

Discussion

These two systematic reviews highlight the large variation in tools currently used to assess the quality of studies of diagnostic accuracy, and the different ways in which systematic reviews have incorporated quality.

We were able to identify 91 quality assessment tools. Only two of these provided details on how they had been developed, and none of the tools appears to have been systematically evaluated. There is an obvious lack of agreement about what the term quality means and why it should be measured.

Conclusion

Existing tools used to assess the quality of diagnostic accuracy studies provide limited detail on the method for their development, and on the aspects of quality that the tools aimed to address. None of these tools appears to have been systematically evaluated. The wide variation in the existing tools to assess quality of diagnostic studies, and the various ways in which study quality differences have been taken into account in systematic reviews, show the need for guidance in this area.

Acknowledgments

Our thanks go to Kath Wright (CRD) for conducting literature searches. This review was commissioned and funded by the NHS R&D Health Technology Assessment Programme. The views expressed in this review are those of the authors and not necessarily those of the Standing Group, the Commissioning Group, or the Department of Health.

References (116)

  • M. Greiner et al., Epidemiologic issues in the validation of veterinary diagnostic tests, Prev Vet Med (2000)
  • J.E. Heffner, Evaluating diagnostic tests in the pleural space: differentiating transudates from exudates as a model, Clin Chest Med (1998)
  • J.J. Deeks, Using evaluations of diagnostic tests: understanding their limitations and making the most of available evidence, Ann Oncol (1999)
  • W.R. Mower, Evaluating bias and variability in diagnostic test reports, Ann Emerg Med (1999)
  • D.L. Sackett et al., The selection of diagnostic tests
  • R. Jaeschke et al., Users' guides to the medical literature. III. How to use an article about a diagnostic test. A. Are the results of the study valid?, JAMA (1994)
  • R. Jaeschke et al., Users' guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients?, JAMA (1994)
  • L. Irwig et al., Guidelines for meta-analyses evaluating diagnostic tests, Ann Intern Med (1994)
  • J.G. Lijmer et al., Empirical evidence of design-related bias in studies of diagnostic tests, JAMA (1999)
  • P. Whiting et al., Sources of variation and bias in studies of diagnostic accuracy: a systematic review, Ann Intern Med (2004)
  • P. Whiting et al., Development and validation of methods for assessing the quality of diagnostic accuracy studies, Health Technol Assess (2004)
  • K. Flynn et al., Positron emission tomography: systematic review. Report No. MTA94-001-02 (1996)
  • M.C. Reid et al., Use of methodological standards in diagnostic test research: getting better but still not good, JAMA (1995)
  • S.B. Sheps et al., The assessment of diagnostic tests: a survey of current medical research, JAMA (1984)
  • L.S. Cooper et al., The poor quality of early evaluations of magnetic resonance imaging, JAMA (1988)
  • C.A. Beam et al., Status of clinical MR evaluations 1985–1988: baseline and design for further assessments, Radiology (1991)
  • D.L. Kent et al., Disease, level of impact, and quality of research methods: three dimensions of clinical efficacy assessment applied to magnetic resonance imaging, Invest Radiol (1992)
  • P.F.W. Chien et al., The diagnostic accuracy of cervico-vaginal fetal fibronectin in predicting preterm delivery: an overview, Br J Obstet Gynaecol (1997)
  • M.T. Fahey et al., Meta-analysis of Pap test accuracy, Am J Epidemiol (1995)
  • J.E. Heffner et al., Pleural fluid chemical analysis in parapneumonic effusions: a meta-analysis, Am J Respir Crit Care Med (1995)
  • F.D.R. Hobbs et al., A review of near patient testing in primary care, Health Technol Assess (1997)
  • A.W. Lensing et al., 125I-fibrinogen leg scanning: reassessment of its role for the diagnosis of venous thrombosis in post-operative patients, Thromb Haemost (1993)
  • K. Radack et al., Is there a valid association between skin tags and colonic polyps: insights from a quantitative and methodologic analysis of the literature, J Gen Intern Med (1993)
  • M.D. Devous et al., SPECT brain imaging in epilepsy: a meta-analysis, J Nucl Med (1998)
  • D.K. Owens et al., Polymerase chain reaction for the diagnosis of HIV infection in adults: a meta-analysis with recommendations for clinical practice and study design, Ann Intern Med (1996)
  • J.K. Rao et al., The role of antineutrophil cytoplasmic antibody (c-ANCA) testing in the diagnosis of Wegener granulomatosis: a literature review and meta-analysis, Ann Intern Med (1995)
  • W.W. Reed et al., Sputum Gram's stain in community-acquired pneumococcal pneumonia: a meta-analysis, West J Med (1996)
  • P.S. Wells et al., Accuracy of ultrasound for the diagnosis of deep venous thrombosis in asymptomatic patients after orthopedic surgery: a meta-analysis, Ann Intern Med (1995)
  • J. Attia et al., Diagnosis of thyroid disease in hospitalized patients: a systematic review, Arch Intern Med (1999)
  • R.G. Badgett et al., How well can the chest radiograph diagnose left ventricular dysfunction?, J Gen Intern Med (1996)
  • D. Becker et al., D-dimer testing and acute venous thromboembolism, Arch Intern Med (1996)
  • K.A. Bradley et al., Alcohol screening questionnaires in women: a critical review, JAMA (1998)
  • F. Buntinx et al., The diagnostic value of macroscopic haematuria in diagnosing urological cancers: a meta-analysis, Fam Pract (1997)
  • A. Conde-Agudelo et al., Triple-marker test as screening for Down syndrome: a meta-analysis, Obstet Gynecol Surv (1998)
  • O. Da Silva et al., Accuracy of leukocyte indices and C-reactive protein for diagnosis of neonatal sepsis: a critical review, Pediatr Infect Dis J (1995)
  • M. De Bernardinis et al., Discriminant power and information content of Ranson's prognostic signs in acute pancreatitis: a meta-analytic study, Crit Care Med (1999)
  • S. Hallan et al., The accuracy of C-reactive protein in diagnosing acute appendicitis, Scand J Clin Lab Invest (1997)
  • C. Kearon et al., Noninvasive diagnosis of deep venous thrombosis, Ann Intern Med (1998)
  • M.J. Koelemay et al., Diagnosis of arterial disease of the lower extremities with duplex ultrasonography, Br J Surg (1996)
  • E.H. Koumans et al., Laboratory testing for Neisseria gonorrhoeae by recently introduced nonculture tests: a performance review with clinical and public health considerations, Clin Infect Dis (1998)