The use of “overall accuracy” to evaluate the validity of screening or diagnostic tests

  • Review
Journal of General Internal Medicine

Abstract

OBJECTIVE: Evaluations of screening or diagnostic tests sometimes incorporate measures of overall accuracy, diagnostic accuracy, or test efficiency. These terms refer to a single summary measure, calculated from a 2 × 2 contingency table, that gives the overall probability that a patient will be correctly classified by a screening or diagnostic test. We assessed the value of overall accuracy in studies of test validity, a topic that has not received adequate emphasis in the clinical literature.
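
As a minimal sketch of the definition above, overall accuracy can be computed from the four cells of a 2 × 2 table as the proportion of patients who are correctly classified. The function name and the counts below are hypothetical illustrations, not data from the studies reviewed.

```python
def overall_accuracy(tp, fp, fn, tn):
    """Proportion of all patients correctly classified by the test.

    tp, fp, fn, tn are the true-positive, false-positive,
    false-negative, and true-negative counts of a 2 x 2 table.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("2 x 2 table is empty")
    return (tp + tn) / total


# Hypothetical counts: 100 diseased and 900 non-diseased patients.
acc = overall_accuracy(tp=90, fp=90, fn=10, tn=810)
print(f"overall accuracy = {acc:.3f}")
```

In this made-up table, sensitivity is 90/100 and specificity is 810/900, so the single summary figure hides nothing; that stops being true once the two rates differ.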

DESIGN: Guided by previous reports, we summarize the issues concerning the use of overall accuracy. To document its use in contemporary studies, we searched the clinical literature for test evaluation studies published from 2000 to 2002 that reported overall accuracy derived from a 2 × 2 contingency table.

MEASUREMENTS AND MAIN RESULTS: Overall accuracy is the weighted average of a test’s sensitivity and specificity, where sensitivity is weighted by prevalence and specificity is weighted by the complement of prevalence. Overall accuracy becomes particularly problematic as a measure of validity as 1) the difference between sensitivity and specificity increases and/or 2) the prevalence deviates from 50%. Both situations lead to an increasing deviation between overall accuracy and either sensitivity or specificity. A summary of results from published studies (N=25) illustrated that the prevalence-dependent nature of overall accuracy has potentially negative consequences that can lead to a distorted impression of the validity of a screening or diagnostic test.
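
The weighted-average relationship described above can be sketched as follows. The sensitivity, specificity, and prevalence values are assumptions chosen only to illustrate the prevalence dependence; they are not taken from the 25 published studies.

```python
def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Overall accuracy as a prevalence-weighted average.

    Sensitivity is weighted by prevalence and specificity by
    (1 - prevalence), as stated in the abstract.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Hypothetical test with a large gap between sensitivity and specificity.
sensitivity, specificity = 0.50, 0.98
for prevalence in (0.05, 0.50, 0.95):
    acc = accuracy_from_rates(sensitivity, specificity, prevalence)
    print(f"prevalence {prevalence:.2f} -> overall accuracy {acc:.3f}")
```

Under these assumed values the same test has an overall accuracy of about 0.96 at 5% prevalence, 0.74 at 50%, and 0.52 at 95%, although its sensitivity and specificity never change. When sensitivity equals specificity, the weighted average equals that common value at every prevalence.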

CONCLUSIONS: Despite the intuitive appeal of overall accuracy as a single measure of test validity, its dependence on prevalence renders it inferior to the careful and balanced consideration of sensitivity and specificity.



Author information


Correspondence to Anthony J. Alberg PhD, MPH.

Additional information

This research was supported by funding from the National Institute on Aging (5U01AG018033), National Cancer Institute (5U01CA086308), and National Institute of Environmental Health Sciences (P30 ES03819). Dr. Alberg is a recipient of a K07 award from the National Cancer Institute (CA73790).

Cite this article

Alberg, A.J., Park, J.W., Hager, B.W. et al. The use of “overall accuracy” to evaluate the validity of screening or diagnostic tests. J GEN INTERN MED 19, 460–465 (2004). https://doi.org/10.1111/j.1525-1497.2004.30091.x

  • DOI: https://doi.org/10.1111/j.1525-1497.2004.30091.x
