Abstract
OBJECTIVE: Evaluations of screening or diagnostic tests sometimes incorporate measures of overall accuracy, diagnostic accuracy, or test efficiency. These terms refer to a single summary measurement calculated from 2 × 2 contingency tables that is the overall probability that a patient will be correctly classified by a screening or diagnostic test. We assessed the value of overall accuracy in studies of test validity, a topic that has not received adequate emphasis in the clinical literature.
DESIGN: Guided by previous reports, we summarize the issues concerning the use of overall accuracy. To document its use in contemporary studies, a search was performed for test evaluation studies published in the clinical literature from 2000 to 2002 in which overall accuracy derived from a 2×2 contingency table was reported.
MEASUREMENTS AND MAIN RESULTS: Overall accuracy is the weighted average of a test’s sensitivity and specificity, where sensitivity is weighted by prevalence and specificity is weighted by the complement of prevalence. Overall accuracy becomes particularly problematic as a measure of validity as 1) the difference between sensitivity and specificity increases and/or 2) the prevalence deviates away from 50%. Both situations lead to an increasing deviation between overall accuracy and either sensitivity or specificity. A summary of results from published studies (N=25) illustrated that the prevalence-dependent nature of overall accuracy has potentially negative consequences that can lead to a distorted impression of the validity of a screening or diagnostic test.
CONCLUSIONS: Despite the intuitive appeal of overall accuracy as a single measure of test validity, its dependence on prevalence renders it inferior to the careful and balanced consideration of sensitivity and specificity.
Similar content being viewed by others
References
Shapiro DE. The interpretation of diagnostic tests. Stat Methods Med Res. 1999;8:113–34.
Begg CB. Biases in the assessment of diagnostic tests. Stat Med. 1987;6:411–23.
Metz CE. Basic principles of ROC analysis. Semin Nucl Med. 1978;8:283–98.
Weiss N. Clinical Epidemiology: The Study of the Outcome of Illness. 2nd ed. New York, NY: Oxford University Press; 1996:20–1.
Grimes DA, Schulz KF. Uses and abuses of screening tests. Lancet. 2002;359:881–4.
Siberry GK. Conversion formulas and biostatistics. In: Siberry GK, Iannone R, eds. The Harriet Lane Handbook: A Manual for Pediatric House Officers. 15th ed. St. Louis, Mo: Mosby; 2000:181–6.
Galen RS, Gambino SR. Beyond Normality: The Predictive Value and Efficiency of Medical Diagnoses. New York, NY: John Wiley & Sons; 1975.
Wassertheil-Smoller S. Biostatistics and Epidemiology: A Primer for Health Professionals. 2nd ed. New York, NY: Springer-Verlag; 1995:118–28.
Nardin RA, Rutkove SB, Raynor EM. Diagnostic accuracy of electrodiagnostic testing in the evaluation of weakness. Muscle Nerve. 2002;26:201–5.
Tong MJ, Blatt LM, Kao VWC. Surveillance for hepatocellular carcinoma in patients with chronic viral hepatitis in the United States of America. J Gastroenterol Hepatol. 2001;16:553–9.
McFarland EG, Kim TK, Savino RM. Clinical assessment of three common tests for superior labral anterior-posterior lesions. Am J Sports Med. 2002;30:810–5.
Krettek C, Seekamp A, Kontopp H, Tscherne H. Hannover Fracture Scale ′98—re-evaluation and new perspectives of an established extremity salvage score. Injury. 2001;32:317–28.
Postema S, Pattynama P, van den Berg-Huysmans A, Peters LW, Kenter G, Trimbos JB. Effect of MRI on therapeutic decisions in invasive cervical carcinoma. Gynecol Oncol. 2000;79:485–9.
Yang WT, Lam WWM, Yu MY, Cheung TH, Metreweli C. Comparison of dynamic helical CT and dynamic MR imaging in the evaluation of pelvic lymph nodes in cervical carcinoma. Am J Roentgenol. 2000;175:759–66.
Tsatalpas P, Beuthein-Baumann B, Kropp J, et al. Diagnostic value of 18F-FDG positron emission tomography for detection and treatment control of malignant germ cell tumors. Urol Int. 2002;68:157–63.
Jee W, McCauley TR, Katz LD, Matheny JM, Ruwe PA, Daigneault JP. Superior labral anterior posterior (SLAP) lesions of the glenoid labrum: reliability and accuracy of MR arthrography for diagnosis. Radiology. 2001;218:127–32.
Koide Y, Yotsukura M, Yoshino H, Ishikawa K. Usefulness of QT dispersion immediately after exercise as an indicator of coronary stenosis independent of gender or exercise-induced ST-segment depression. Am J Cardiol. 2000;86:1312–7.
Aslam N, Banerjee S, Carr JV, Savvas M, Hooper R, Jurkovic D. Prospective evaluation of logistic regression models for the diagnosis of ovarian cancer. Obstet Gynecol. 2000;96:75–80.
Yeoh GPS, Chan KW. The diagnostic value of fine-needle aspiration cytology in the assessment of thyroid nodules: a retrospective 5-year analysis. Hong Kong Med J. 1999;5:140–4.
Vicini FA, Kestin LL, Martinez AA. The correlation of serial prostate specific antigen measurements with clinical outcome after external beam radiation therapy of patients for prostate carcinoma. Cancer. 2000;88:2305–18.
Elhendy A, van Domberg RT, Sozzi FB, Poldermans D, Bax JJ, Roelandt JRTC. Impact of hypertension on the accuracy of exercise stress myocardial perfusion imaging for the diagnosis of coronary artery disease. Heart. 2001;85:655–61.
Viegi G, Pedreschi M, Pistelli F, et al. Prevalence of airways obstruction in a general population: European Respiratory Society versus American Thoracic Society definition. Chest. 2000;117(suppl 2):339–45.
Nunes LW, Schnall MD, Orel SG. Update of breast MR imaging architectural interpretation model. Radiology. 2001;219:484–94.
Flamen P, Lerut A, Van Cutsem E, et al. Utility of positron emission tomography for the staging of patients with potentially operable esophageal carcinoma. J Clin Oncol. 2000;18:3202–10.
Sone S, Li F, Yang Z-G, et al. Characteristics of small lung cancers invisible on conventional chest radiography and detected by population based screening using spiral CT. Br J Radiol. 2000;73:137–45.
Wong BC, Wong WM, Wang WH, et al. An evaluation of invasive and non-invasive tests for the diagnosis of Helicobactor pylori infection in Chinese. Aliment Pharmacol Ther. 2001;15:505–11.
Lin WY, Chao TH, Wang SJ. Clinical features and gallium scan in the detection of post-surgical infection in the elderly. Eur J Nucl Med Mol Imaging. 2002;29:371–5.
Ahmad NA, Lewis JD, Ginsberg GG, Rosato EF, Morris JB, Kochman ML. EUS in preoperative staging of pancreatic cancer. Gastrointest Endosc. 2000;52:463–8.
Meyer PT, Schreckenberger M, Spetzger U, et al. Comparison of visual and ROI-based brain tumor grading using 18F-FDG PET: ROC analysis. Eur J Nucl Med Mol Imaging. 2001;28:165–74.
Ogawa K, Oida A, Sugimura H, et al. Clinical significance of blood brain natriuretic peptide level measurement in the detection of heart disease in untreated outpatients. Circ J. 2002;66:122–6.
Lokeshwar VB, Schroeder GL, Selzer MG, et al. Bladder tumor markers for monitoring recurrence and screening comparison of hyaluronic acid-hyaluronidase and BTA-stat tests. Cancer. 2002;95:61–72.
Gurleyik G, Gurleyik E, Cetinkaya F, Unalmiser S. Serum interleukin-6 measurement in the diagnosis of acute appendicitis. Aust NZ J Surg. 2002;72:665–7.
Greco M, Crippa F, Agresti R, et al. Axillary lymph node staging in breast cancer by 2-fluoro-2-deoxy-D-glucose-positron emission tomography: clinical evaluation and alternative management. J Natl Cancer Inst. 2001;93:630–5.
Colao A, Faggiano A, Pivonello R, et al. Inferior petrosal sinus sampling in the differential diagnosis of Cushing’s syndrome: results of an Italian multicenter study. Eur J Endocrinol. 2001;144:499–507.
Szklo M, Nieto FJ. Epidemiology: Beyond the Basics. Gaithersburg, Md: Aspen Publishers, Inc.; 2000.
Author information
Authors and Affiliations
Corresponding author
Additional information
This research was supported by funding from the National Institute of Aging (5U01AG018033), National Cancer Institute (5U01CA086308), and National Institute of Environmental Health Sciences (P30 ES03819). Dr. Alberg is a recipient of a KO7 award from the National Cancer Institute (CA73790).
Rights and permissions
About this article
Cite this article
Alberg, A.J., Park, J.W., Hager, B.W. et al. The use of “overall accuracy” to evaluate the validity of screening or diagnostic tests. J GEN INTERN MED 19, 460–465 (2004). https://doi.org/10.1111/j.1525-1497.2004.30091.x
Issue Date:
DOI: https://doi.org/10.1111/j.1525-1497.2004.30091.x