Evaluation of a clinical test. II: Assessment of validity

https://doi.org/10.1016/S0306-5456(00)00128-5

Introduction

Part One of this commentary dealt with the reliability of a clinical test [1]; Part Two deals with the validity of a clinical test. Validity assesses whether the test actually measures what it purports to measure [2]. To measure validity, the measurements obtained from the test under study are compared with those obtained from a recognised reference standard [3,4]. There are three types of validity: content, criterion and construct validity [2]. However, we will consider only criterion validity, which is more relevant to the evaluation of clinical tests. Content and construct validity are important in psychometric tests and quality of life measurements. It is important to note that reliability and validity are closely related, for a test which is unreliable cannot be valid.

Validation involves comparing measurements obtained simultaneously using the test under study and the reference test. One difficulty with studies of validity is that the scales and units of measurement may differ between the test under study and the reference standard. Examples are shown in Table 1. The measurements of bladder volume by ultrasound and by bladder catheterisation are on the same (continuous) scale, and their units of measurement (ml) are also identical [5]. In contrast, pictorial assessment of menstrual blood loss and objective measurement of menstrual blood loss by the alkaline haematin method use different scales: the pictorial method uses a scale of ordered categories, whereas the alkaline haematin method uses a continuous scale [6]. In the fetal fibronectin test, preterm delivery can be considered the reference standard and cervical fetal fibronectin is the test under scrutiny [7,8]. Fetal fibronectin is measured on a continuous scale, but this is converted to a dichotomous scale using an optimum cutoff value to predict preterm delivery. This optimum cutoff value is determined from a receiver–operator characteristic (ROC) curve [9,10].
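The conversion of a continuous measurement to a dichotomous result can be illustrated with a small sketch. The text does not say which criterion defines the "optimum" point on the ROC curve; the code below assumes Youden's index (sensitivity + specificity - 1), one common choice among several. The data and the function names `roc_points` and `best_cutoff` are hypothetical and purely illustrative; they are not fetal fibronectin measurements.

```python
def roc_points(values, outcomes):
    """Return (cutoff, sensitivity, specificity) for every candidate cutoff.

    A result counts as test-positive when value >= cutoff.
    outcomes holds the reference standard: 1 = condition present, 0 = absent.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    points = []
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, o in zip(values, outcomes) if v >= cutoff and o)
        true_neg = sum(1 for v, o in zip(values, outcomes) if v < cutoff and not o)
        points.append((cutoff, true_pos / positives, true_neg / negatives))
    return points


def best_cutoff(values, outcomes):
    """Pick the cutoff maximising Youden's index (an assumed criterion)."""
    return max(roc_points(values, outcomes), key=lambda p: p[1] + p[2] - 1)


# Made-up measurements and reference-standard outcomes, for illustration only.
values = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
outcomes = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

cutoff, sensitivity, specificity = best_cutoff(values, outcomes)
print(cutoff, sensitivity, specificity)
```

Plotting sensitivity against (1 - specificity) for each tuple returned by `roc_points` gives the ROC curve itself. Other criteria, for example weighting false negatives more heavily than false positives, would select a different cutoff from the same curve.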

Design of a study of validity

In any study of validity, the method of recruitment, the blinding of measurements, and the descriptions of the study population and of the test under study, all of which have been described in relation to studies of reliability [1], are equally important. In addition, studies of validity require that the reference test be an appropriate one [3]. This is often described as the gold standard, but in reality it is usually a test that is generally acknowledged to be the best

Test under scrutiny and reference standard on the same scale

As with reliability, the quantitative assessment of validity depends on scales of measurement. When the scales of measurement are the same for the test under study and the reference test, the appropriate indices of validity are the same as those used for reliability. The objective is to estimate the agreement between the two tests. The appropriate statistical tests for validity are the kappa statistic for dichotomous scales, the weighted kappa statistic for ranked scales, and the limits of
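As an illustration of the kappa statistics named above, the following is a minimal sketch of Cohen's kappa for paired results from a test under study and a reference test. The function name `cohen_kappa`, the linear weighting scheme used for the weighted version, and the data are all assumptions made for this example; the source does not specify a weighting scheme, and quadratic weights are an equally common alternative.

```python
def cohen_kappa(test, reference, categories, weighted=False):
    """Cohen's kappa for paired ratings; linear weights when weighted=True.

    categories must list every category in rank order (at least two).
    """
    n = len(test)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Cross-tabulate test result (rows) against reference result (columns).
    table = [[0] * k for _ in range(k)]
    for t, r in zip(test, reference):
        table[index[t]][index[r]] += 1
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if not weighted:
            return 1.0 if i == j else 0.0
        return 1.0 - abs(i - j) / (k - 1)  # partial credit for near misses

    observed = sum(weight(i, j) * table[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(weight(i, j) * row_totals[i] * col_totals[j]
                   for i in range(k) for j in range(k)) / n ** 2
    # Undefined (division by zero) if chance agreement is already perfect.
    return (observed - expected) / (1 - expected)


# Made-up dichotomous results (1 = positive, 0 = negative).
test_dich = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
ref_dich = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
kappa_dich = cohen_kappa(test_dich, ref_dich, categories=[0, 1])

# Made-up ranked results on a three-level ordered scale (0 < 1 < 2).
test_rank = [0, 0, 1, 1, 2, 2]
ref_rank = [0, 1, 1, 1, 2, 1]
kappa_rank = cohen_kappa(test_rank, ref_rank, categories=[0, 1, 2], weighted=True)

print(round(kappa_dich, 3), round(kappa_rank, 3))
```

Kappa corrects the observed agreement for the agreement expected by chance from the marginal totals. The weighted form additionally gives partial credit when the two tests disagree by only one rank, which is why it suits ordered categories.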


References (27)

  • J.M. Higham et al. Assessment of menstrual blood loss using a pictorial chart. Br J Obstet Gynaecol (1990)
  • C.J. Lockwood et al. Fetal fibronectin in cervical and vaginal secretions as a predictor of preterm delivery. N Engl J Med (1991)
  • P.F.W. Chien et al. The diagnostic accuracy of cervico-vaginal fetal fibronectin in predicting preterm delivery: an overview. Br J Obstet Gynaecol (1995)