Validity and reliability of the on-road driving assessment with senior drivers
Introduction
When assessing driving performance of “at risk” drivers, validating off-road tests of driving competence or examining effectiveness of driving simulators, the on-road assessment has traditionally been the criterion measure. Because of its face validity, the on-road assessment is assumed to be the most accurate measure of driving competence. However, although it is widely used, there is a paucity of statistical evidence for the validity and reliability of the on-road assessment (Withaar et al., 2000).
Researchers and clinicians generally agree that an on-road assessment should be conducted on a standardized route in real traffic, in a vehicle with dual controls, and with responsibility for maintaining safety separated from responsibility for scoring driving performance (Fox et al., 1998, Mazer et al., 2004). Traditionally, the outcome of the on-road assessment has been determined by a gestalt decision based on overall driving performance, but a decision based on standard observation and scoring procedures is increasingly recommended (Di Stefano and Macdonald, 2003, Justiss et al., 2006, Odenheimer et al., 1994, Withaar et al., 2000). Different scoring procedures have been employed. Some researchers (Galski et al., 1990, Hunt et al., 1997, Justiss et al., 2006, Odenheimer et al., 1994) rated performance on specific manoeuvres, while others (Baldock et al., 2006, Dobbs et al., 1998, Janke and Eberhard, 1998, Staplin et al., 1998) weighted errors according to severity. These total scores were then compared with a gestalt decision.
The results of these studies have been mixed, with some finding that the gestalt decision is consistent with the scored decision (Baldock et al., 2006, Dobbs et al., 1998, Hunt et al., 1997, Justiss et al., 2006, Odenheimer et al., 1994, Staplin et al., 1998) and others finding that only some behaviours or errors are related to the decision (Galski et al., 1990, Janke and Eberhard, 1998). More recently, the need for driving instructor intervention, rather than error scores, was found to be predictive of failure (Di Stefano and Macdonald, 2003). Inconclusive results such as these raise questions about the theoretical construct being evaluated. The deconstruction of the complex task of driving into smaller component parts may result in the loss of critical information (i.e., the whole is greater than the sum of the parts). In addition, previous research used varying sample sizes, different client groups (healthy older drivers, “at risk” drivers and medically impaired drivers) and varying statistical analyses, all of which could have contributed to the lack of consistent findings.
Item response theory (IRT) with its focus on the items rather than multiple parameters has been identified as a means of addressing important issues associated with on-road evaluation (Justiss et al., 2006) including using ordinal rather than interval scores, uni- versus multi-dimensionality, sample-free measurement, and logicality of the item hierarchy. Rasch modeling, the simplest of the IRT models, is being used increasingly to evaluate tests of human performance to address these concerns. The relationship between person ability and item difficulty is the basis of Rasch analysis. It converts ordinal scores into interval scores, orders items and people on a continuum of difficulty and ability respectively, and examines goodness of fit of items and people along the line (Bond and Fox, 2001).
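As an illustration of the relationship between person ability and item difficulty described above, the dichotomous Rasch model is commonly written as follows. This is a sketch only: the symbols $\theta_n$ (ability of person $n$), $\delta_i$ (difficulty of item $i$) and $X_{ni}$ (scored response) are notation assumed here rather than taken from the study, and the study may have used a different variant of the model.

```latex
P(X_{ni} = 1 \mid \theta_n, \delta_i)
  = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)},
\qquad
\ln\frac{P(X_{ni} = 1)}{1 - P(X_{ni} = 1)} = \theta_n - \delta_i
```

Because the log-odds depend only on the difference $\theta_n - \delta_i$, persons and items can be placed on one common interval (logit) scale, which is what allows ordinal scores to be converted to interval measures and items and people to be ordered along a single continuum.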
The purpose of this study was twofold. First, we examined the psychometric properties of a standard on-road assessment with healthy older drivers and drivers with vision deficits using Rasch modeling. Second, we compared the outcome of the gestalt decision made by trained professionals with that based on weighted error scores from the standardized assessment.
Study design
The study used a prospective, masked, observational design approved by The University of Sydney Human Research Ethics Committee. It was part of a larger investigation of the relationship between vision and driving performance. Only Stage 1 of three stages is presented in this paper.
Participants
A group of 100 senior (≥60 years) volunteer drivers was recruited from the community and from referrals by ophthalmologists in Sydney, Australia. Community members were recruited through Probus Clubs (Senior Rotarians
Weighting of errors
The frequency of each error is recorded in Table 1. Applying a weighting of “1”, “5” and “10”, respectively for habitual, hazardous and critical errors yielded a separation index of 1.14, participant reliability index of .57 and poor item hierarchy and separation on the map, which was disappointing. A weighting of “3”, “5” and “10”, respectively yielded an improved separation index of 1.4, participant reliability index of .60 and an improved item hierarchy map. Finally, applying a weighting of
Discussion
The purpose of this study was to examine validity and reliability of the on-road driving assessment with healthy older drivers and those with vision deficits and to determine how accurately the total error score matched a gestalt decision for on-road driving performance. The findings yielded strong evidence for construct validity indicating that the on-road test measures a single theoretical construct, namely driving errors, indicative of driving safety. There was also strong evidence for
Limitations
Several limitations contribute to the need to exercise caution in generalizing the findings of this study. Firstly, the on-road assessment was shorter (duration of 20–30 min) than many others (duration of 45–60 min) reported in the literature. Most of the required elements were included, but Rasch analysis identified that a sufficiently demanding cognitive task was also required to separate levels of competent drivers. Secondly, there was a small number of participants with vision deficits (N = 20)
Conclusion and recommendation for future research
Rasch analysis of the on-road driving assessment provided strong evidence for construct validity and inter-rater reliability and limited evidence for internal reliability. The addition of more cognitively demanding items and using the assessment with a more varied population, specifically with less competent drivers, would reveal important additional information about the test's psychometric properties and provide avenues of further research. The total error score predicted the assessment
Recommendations for practice
To ensure validity and reliability, on-road driving assessments for senior drivers should be conducted over a standardized route using a vehicle with dual controls to ensure safety. It is recommended that clinicians record errors in performance, then categorize them as habitual, hazardous or critical and weight them by a factor of “1”, “2” or “5”, respectively, for severity of threat to safety. Although clinicians generally do not have access to software to enable them to statistically analyze
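The record, categorize and weight procedure recommended above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring tool: the function name, the dictionary layout and the example counts are hypothetical, and only the three error categories and the weights 1, 2 and 5 come from the text. No pass/fail cut-off is shown because none is given here.

```python
# Weights taken from the recommendation in the text:
# habitual = 1, hazardous = 2, critical = 5.
SEVERITY_WEIGHTS = {"habitual": 1, "hazardous": 2, "critical": 5}


def weighted_error_score(error_counts, weights=SEVERITY_WEIGHTS):
    """Return the total weighted error score for one assessment.

    error_counts maps an error category to the number of errors of that
    category recorded during the drive (hypothetical data layout).
    """
    unknown = set(error_counts) - set(weights)
    if unknown:
        raise ValueError(f"Unknown error categories: {sorted(unknown)}")
    return sum(weights[category] * count
               for category, count in error_counts.items())


# Made-up counts for a single driver, for illustration only.
example_counts = {"habitual": 4, "hazardous": 2, "critical": 1}
print(weighted_error_score(example_counts))  # 4*1 + 2*2 + 1*5 = 13
```

Passing a different `weights` mapping makes it straightforward to compare alternative weighting schemes on the same recorded errors.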
Acknowledgements
This study was funded by the Faculty of Health Sciences, The University of Sydney. Lynnette Kay's contribution was also funded by an Australian Postgraduate Award as the study was undertaken in partial fulfillment of the requirements for her Ph.D. The authors wish to thank the clinicians at Driver Rehabilitation and Fleet Safety Services and the Discipline of Orthoptics at The University of Sydney for their support in conducting the study.
References (23)
- et al. Self-regulation of driving and its relationship to driving ability among older adults. Accid. Anal. Prev. (2006)
- et al. Assessment of older drivers: relationships among on-road errors, medical conditions and test outcome. J. Safety Res. (2003)
- et al. A comparative approach to identify unsafe older drivers. Accid. Anal. Prev. (1998)
- et al. On-road assessment of driving competence after brain impairment: review of current practice and recommendations for a standardized examination. Arch. Phys. Med. Rehabil. (1998)
- Assessing older drivers. Two studies. J. Safety Res. (2001)
- et al. Assessing medically impaired older drivers in a licensing agency setting. Accid. Anal. Prev. (1998)
- et al. Confidence in, and self-rating of, driving ability among older drivers. Accid. Anal. Prev. (1998)
- et al. Fatal crash risk for older drivers at intersections. Accid. Anal. Prev. (1998)
- Badia, X., Prieto, L., Linacre, J.M., 2002. Differential item and test functioning. Rasch Measurement Transactions....
- et al. Applying the Rasch Model. (2001)
- An assessment of measures to predict the outcome of driving evaluations in patients with cerebral damage. Am. J. Occup. Ther.