Elsevier

The American Journal of Medicine

Volume 119, Issue 2, February 2006, Pages 166.e7-166.e16

Review
Current Concepts in Validity and Reliability for Psychometric Instruments: Theory and Application

https://doi.org/10.1016/j.amjmed.2005.10.036

Abstract

Validity and reliability relate to the interpretation of scores from psychometric instruments (eg, symptom scales, questionnaires, education tests, and observer ratings) used in clinical practice, research, education, and administration. Emerging paradigms replace prior distinctions of face, content, and criterion validity with the unitary concept “construct validity,” the degree to which a score can be interpreted as representing the intended underlying construct. Evidence to support the validity argument is collected from 5 sources:

  • Content: do instrument items completely represent the construct?

  • Response process: the relationship between the intended construct and the thought processes of subjects or observers

  • Internal structure: acceptable reliability and factor structure

  • Relations to other variables: correlation with scores from another instrument assessing the same construct

  • Consequences: do scores really make a difference?

Evidence should be sought from a variety of sources to support a given interpretation. Reliable scores are necessary, but not sufficient, for valid interpretation. Increased attention to the systematic collection of validity evidence for scores from psychometric instruments will improve assessments in research, patient care, and education.

Validity, constructs, and meaningful interpretation of instrument scores

Validity refers to “the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of tests.”19 In other words, validity describes how well one can legitimately trust the results of a test as interpreted for a specific purpose.

Many instruments measure a physical quantity such as height, blood pressure, or serum sodium. Interpreting the meaning of such results is straightforward.20 In contrast, results from assessments of patient symptoms,

Reliability: necessary, but not sufficient, for valid inferences

Reliability refers to the reproducibility or consistency of scores from one assessment to another.19 Reliability is a necessary, but not sufficient, component of validity.21, 29 An instrument that does not yield reliable scores does not permit valid interpretations. Imagine obtaining blood pressure readings of 185/100 mm Hg, 80/40 mm Hg, and 140/70 mm Hg in 3 consecutive measurements over a 3-minute period in an otherwise stable patient. How would we interpret these results? Given the wide
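The blood pressure illustration above can be made concrete with a few lines of arithmetic. The following is a minimal sketch, not the authors' method: it takes the three readings quoted in the text and summarizes their spread with the mean, sample standard deviation, and range. The choice of these statistics and the helper name `spread` are assumptions made for illustration only.

```python
from statistics import mean, stdev

# The three consecutive readings (systolic, diastolic) quoted in the text, in mm Hg.
readings = [(185, 100), (80, 40), (140, 70)]


def spread(values):
    """Summarize how much repeated measurements of one quantity disagree.

    Hypothetical helper: returns the mean, the sample standard deviation,
    and the range (max minus min) of the supplied values.
    """
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "range": max(values) - min(values),
    }


systolic = spread([s for s, _ in readings])
diastolic = spread([d for _, d in readings])

for label, summary in (("systolic", systolic), ("diastolic", diastolic)):
    print(
        f"{label}: mean={summary['mean']:.0f} mm Hg, "
        f"sd={summary['sd']:.1f} mm Hg, range={summary['range']} mm Hg"
    )
```

The systolic readings average 135 mm Hg but span 105 mm Hg, and the diastolic readings average 70 mm Hg with a span of 60 mm Hg. The averages look unremarkable on their own, yet spread of this size in a stable patient means that no single reading supports a trustworthy interpretation.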

Practical application of validity concepts in selecting an instrument

Consumers of previously developed psychometric instruments in clinical practice, research, or education need to carefully weigh the evidence supporting the validity of the interpretations they are trying to make. Scores from a popular instrument may not have evidence to justify their use. Many authors cite evidence from only one or two sources, such as reliability or correlation with another instrument’s scores, to support the validity of interpretations. Such instruments should be used with

Practical application of validity concepts in developing an instrument

When developing psychometric instruments, careful attention should again be given to each category of validity evidence in turn. To illustrate the application of these principles, we will discuss how evidence could be planned, collected, and documented when developing an assessment of clinical performance for internal medicine residents.

The first step in developing any instrument is to identify the construct and corresponding content. In our example we could look at residency program objectives

Conclusion

A clear understanding of validity and reliability in psychometric assessment is essential for practitioners in diverse medical settings. As Foster and Cone note, “Science rests on the adequacy of its measurement. Poor measures provide a weak foundation for research and clinical endeavors.”18 Validity concerns the degree to which scores reflect the intended underlying construct, and refers to the interpretation of results rather than the instrument itself. It is best viewed as a carefully

Acknowledgments

We thank Steven M. Downing, PhD (University of Illinois at Chicago, Department of Medical Education), for his insights and constructive critique.

References (65)

  • S.A. Kaplan et al.

    The American Urological Association symptom score in the evaluation of men with lower urinary tract symptoms: at 2 years of followup, does it work?

    J Urol

    (1996)
  • W. Knaus et al.

    The APACHE III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults

    Chest

    (1991)
  • J.A. Ewing

    Detecting alcoholism: the CAGE questionnaire

    JAMA

    (1984)
  • R.L. Spitzer et al.

    Utility of a new procedure for diagnosing mental disorders in primary care. The PRIME-MD 1000 study

    JAMA

    (1994)
  • U.E. Bauer et al.

    Changes in youth cigarette use and intentions following implementation of a tobacco control program: findings from the Florida Youth Tobacco Survey, 1998-2000

    JAMA

    (2000)
  • National Board of Medical Examiners. United States Medical Licensing Exam Bulletin. Produced by Federation of State...
  • J.J. Norcini et al.

    The mini-CEX: a method for assessing clinical skills

    Ann Intern Med

    (2003)
  • D.K. Litzelman et al.

    Factorial validation of a widely disseminated educational framework for evaluating clinical teachers

    Acad Med

    (1998)
  • Merriam-Webster Online. Available at: http://www.m-w.com/. Accessed March 10,...
  • D.L. Sackett et al.

    Evidence-Based Medicine: How to Practice and Teach EBM

    (1998)
  • J. Wallach

    Interpretation of Diagnostic Tests

    (2000)
  • T.J. Beckman et al.

    How reliable are assessments of clinical teaching? A review of the published instruments

    J Gen Intern Med

    (2004)
  • T.D. Shanafelt et al.

    Burnout and self-reported patient care in an internal medicine residency program

    Ann Intern Med

    (2002)
  • G.C. Alexander et al.

    Patient-physician communication about out-of-pocket costs

    JAMA

    (2003)
  • D. Pittet et al.

    Hand hygiene among physicians: performance, beliefs, and perceptions

    Ann Intern Med

    (2004)
  • S. Messick

    Validity

  • S.L. Foster et al.

    Validity issues in clinical assessment

    Psychol Assess

    (1995)
  • Standards for Educational and Psychological Testing

    (1999)
  • J.M. Bland et al.

    Statistics notes: validating scales and indexes

    BMJ

    (2002)
  • S.M. Downing

    Validity: on the meaningful interpretation of assessment data

    Med Educ

    (2003)
  • 2005 Certification Examination in Internal Medicine Information Booklet. Produced by American Board of Internal...
  • M.T. Kane

    An argument-based approach to validity

    Psychol Bull

    (1992)