British Journal of Obstetrics and Gynaecology
Evaluation of a clinical test. II: Assessment of validity
Introduction
Part One of this commentary dealt with the reliability of a clinical test1; Part Two deals with its validity. Validity assesses whether the test actually measures what it purports to measure2. To measure validity, the measurements obtained from the test under study are compared with those obtained from a recognised reference standard3,4. There are three types of validity: content, criterion and construct validity2. We will consider only criterion validity, which is more relevant to the evaluation of clinical tests; content and construct validity are important in psychometric tests and quality of life measurements. Reliability and validity are closely related, because a test that is unreliable cannot be valid.
Validation involves comparing measurements obtained simultaneously, using the test under study and the reference test. One difficulty with studies of validity is that the scales and units of measurement may differ between the test under study and the reference standard. Examples are shown in Table 1. Measurements of bladder volume by ultrasound and by bladder catheterisation are on the same (continuous) scale, and their units of measurement (ml) are also identical5. By contrast, the pictorial method and the objective alkaline haematin method of measuring menstrual blood loss use different scales: the pictorial method uses a scale of ordered categories, whereas the alkaline haematin method uses a continuous scale6. In the fetal fibronectin test, preterm delivery can be considered the reference standard and cervical fetal fibronectin is the test under scrutiny7,8. Fetal fibronectin is measured on a continuous scale, but this is converted to a dichotomous scale using an optimum cutoff value to predict preterm delivery. This optimum cutoff value is determined from a receiver–operator characteristic curve9,10.
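The conversion of a continuous measurement to a dichotomous one can be illustrated with a minimal sketch. The text does not say how the optimum point on the receiver–operator characteristic curve is chosen; the sketch below assumes one common choice, the cutoff that maximises Youden's index (sensitivity + specificity - 1). The function names and the data are hypothetical and are not taken from the studies cited.

```python
def roc_points(values, outcomes):
    """Return (cutoff, sensitivity, specificity) for each candidate cutoff.

    A result counts as test-positive when value >= cutoff.
    outcomes holds 1 where the reference standard is positive, else 0.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    points = []
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, o in zip(values, outcomes) if v >= cutoff and o)
        true_neg = sum(1 for v, o in zip(values, outcomes) if v < cutoff and not o)
        points.append((cutoff, true_pos / positives, true_neg / negatives))
    return points


def best_cutoff(values, outcomes):
    """Pick the cutoff maximising Youden's index (an assumed criterion)."""
    return max(roc_points(values, outcomes), key=lambda p: p[1] + p[2] - 1)


# Hypothetical measurements (arbitrary units) and reference outcomes
# (1 = preterm delivery, 0 = term delivery); illustrative only.
measurements = [10, 20, 30, 40, 55, 60, 80, 120]
preterm = [0, 0, 0, 0, 1, 0, 1, 1]

cutoff, sensitivity, specificity = best_cutoff(measurements, preterm)
print(cutoff, sensitivity, specificity)
```

Other criteria for the optimum are possible, for example weighting sensitivity more heavily than specificity when a missed case is costly; the choice should be stated in any validation study.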
Design of a study of validity
In any study of validity, the method of recruitment, the blinding of measurements, and the descriptions of the study population and of the test under study, all of which have been described in relation to studies of reliability1, are equally important. In addition, studies of validity require that the reference test should be an appropriate one3. This is often described as the gold standard but in reality it is usually a test that is generally acknowledged to be the best
Test under scrutiny and reference standard on the same scale
As with reliability, the quantitative assessment of validity depends on scales of measurement. When the scales of measurement are the same for the test under study and the reference test, the appropriate indices of validity are the same as those used for reliability. The objective is to estimate the agreement between the two tests. The appropriate statistical tests for validity are the kappa statistic for dichotomous scales, the weighted kappa statistic for ranked scales, and the limits of
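As an illustration of agreement on a dichotomous scale, the following minimal sketch computes the unweighted kappa statistic between a test under study and a reference test. The function name and the paired results are hypothetical; the weighted form for ranked scales is not shown.

```python
def cohen_kappa(test, reference):
    """Unweighted kappa: agreement beyond that expected by chance.

    test and reference are equal-length lists of category labels.
    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(test)
    # Observed proportion of pairs on which the two tests agree.
    observed = sum(1 for t, r in zip(test, reference) if t == r) / n
    # Chance agreement from the marginal proportions of each category.
    categories = set(test) | set(reference)
    expected = sum(
        (test.count(c) / n) * (reference.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical paired results (1 = positive, 0 = negative); illustrative only.
test_results = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
reference_results = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]

kappa = cohen_kappa(test_results, reference_results)
print(round(kappa, 3))
```

Here the two tests agree on 8 of 10 pairs (0.80), chance agreement is 0.52, and kappa is therefore (0.80 - 0.52) / (1 - 0.52), or about 0.58.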
References (27)
- et al. Evaluating diagnostic tests. Baillière's Clin Obstet Gynaecol (1996)
- et al. The presence of cervical and vaginal fetal fibronectin predicts preterm delivery in an inner-city obstetric population. Am J Obstet Gynecol (1993)
- et al. Chlamydia trachomatis antibody testing is more accurate than hysterosalpingogram in predicting tubal factor infertility. Fertil Steril (1994)
- et al. Misleading author's inferences in obstetric diagnostic test literature. Am J Obstet Gynecol (1999)
- et al. Likelihood ratios with confidence: sample size estimation for diagnostic test studies. J Clin Epidemiol (1991)
- et al. Evaluation of a clinical test Part I: assessment of reliability. Br J Obstet Gynaecol (2001)
- et al. Health Measurement Scales: A Practical Guide to Their Development and Use (1995)
- Measuring measuring errors. Stat Med (1989)
- et al. Medical Statistics. A Commonsense Approach (1999)
- et al. The validity and reliability of real time ultrasound estimation of bladder volume in postnatal women. J Obstet Gynaecol (1996)