Abstract
Performance assessments are subject to many potential error sources. For performance-based assessments, including standardized patient (SP) examinations, these error sources, if left unchecked, can compromise the validity and reliability of scores. Quality assurance (QA) measures, both quantitative and qualitative, can be used to ensure that candidate scores are accurate and reasonably free from measurement error. The purpose of this paper is to outline several QA strategies that can be used to identify potential content- and score-related problems with SP assessments. These approaches include case analyses and various comparisons of primary and observer scores. Specific examples from the ECFMG Clinical Skills Assessment (CSA®) are used to educate the reader concerning appropriate statistical methods and legitimate data interpretations. The results presented in this investigation highlight the need for well-defined training regimes, regular feedback to those involved in rating/scoring performances, and detailed statistical analyses of all scores.
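One of the comparisons named above, primary versus observer scores, can be illustrated with a minimal sketch. It assumes dichotomous checklist items (1 = done, 0 = not done) recorded for the same encounter by the SP (primary) and by a second observer, and it summarizes their agreement with percent agreement and Cohen's kappa. The function names and the data are hypothetical and chosen only for illustration; this is not necessarily the statistic, and certainly not the data, used in the paper.

```python
from collections import Counter


def percent_agreement(primary, observer):
    """Proportion of checklist items on which the two raters agree."""
    if len(primary) != len(observer) or not primary:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(primary, observer))
    return matches / len(primary)


def cohens_kappa(primary, observer):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    n = len(primary)
    p_observed = percent_agreement(primary, observer)
    primary_counts = Counter(primary)
    observer_counts = Counter(observer)
    categories = set(primary) | set(observer)
    # Agreement expected by chance, from each rater's marginal proportions.
    p_expected = sum(
        primary_counts[c] * observer_counts[c] for c in categories
    ) / (n * n)
    if p_expected == 1.0:
        # Both raters used a single identical category; agreement is perfect.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical checklist recordings for one encounter (assumed data).
primary_scores = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
observer_scores = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]

agreement = percent_agreement(primary_scores, observer_scores)
kappa = cohens_kappa(primary_scores, observer_scores)
print(f"percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

A low kappa alongside high raw agreement would suggest that much of the apparent agreement is attributable to chance, which is one reason both figures are usually reported together.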
Cite this article
Boulet, J.R., McKinley, D.W., Whelan, G.P. et al. Quality Assurance Methods for Performance-Based Assessments. Adv Health Sci Educ Theory Pract 8, 27–47 (2003). https://doi.org/10.1023/A:1022639521218