The intra- and interrater reliability of the Action Research Arm test: A practical test of upper extremity function in patients with stroke

https://doi.org/10.1053/apmr.2001.18668

Abstract

van der Lee JH, de Groot V, Beckerman H, Wagenaar RC, Lankhorst GJ, Bouter LM. The intra- and interrater reliability of the Action Research Arm test: a practical test of upper extremity function in patients with stroke. Arch Phys Med Rehabil 2001;82:14-9. Objectives: To determine the intra- and interrater reliability of the Action Research Arm (ARA) test, to assess its ability to detect a minimal clinically important difference (MCID) of 5.7 points, and to identify less reliable test items. Design: Intrarater reliability of the sum scores and of individual items was assessed by comparing (1) the ratings of the laboratory measurements of 20 patients with the ratings of the same measurements recorded on videotape by the original rater, and (2) the repeated ratings of videotaped measurements by the same rater. Interrater reliability was assessed by comparing the ratings of the videotaped measurements of 2 raters. The resulting limits of agreement were compared with the MCID. Patients: Stratified sample, based on the intake ARA score, of 20 chronic stroke patients (median age, 62 yr; median time since stroke onset, 3.6 yr; mean intake ARA score, 29.2). Main Outcome Measures: Spearman's rank-order correlation coefficient (Spearman's rho); intraclass correlation coefficient (ICC); mean difference and limits of agreement, based on ARA sum scores; and weighted kappa, based on individual items. Results: All intra- and interrater Spearman's rho and ICC values were higher than .98. The mean difference between ratings was highest for the interrater pair (.75; 95% confidence interval, .02-1.48), suggesting a small systematic difference between raters. Intrarater limits of agreement were −1.66 to 2.26; interrater limits of agreement were −2.35 to 3.85. Median weighted kappas exceeded .92. Conclusion: The high intra- and interrater reliability of the ARA test was confirmed, as was its ability to detect a clinically relevant difference of 5.7 points.
© 2001 by the American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation
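
The comparison of limits of agreement with the MCID described in the abstract can be illustrated with a short Python sketch. It assumes the conventional definition of limits of agreement (mean difference ± 1.96 standard deviations of the paired differences), which the abstract does not spell out. The function names and the paired ratings are hypothetical and are not data from the study. Only the MCID of 5.7 points and the reported limits in the final lines are taken from the abstract.

```python
from statistics import mean, stdev

# MCID from the abstract: 5.7 points on the ARA sum score.
MCID = 5.7


def limits_of_agreement(ratings_a, ratings_b, z=1.96):
    """Return (mean difference, lower limit, upper limit) for paired ratings.

    Assumes the conventional formula: mean difference +/- z * SD of the
    paired differences, with z = 1.96 for approximate 95% limits.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("ratings must be paired")
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]
    d_bar = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return d_bar, d_bar - z * sd, d_bar + z * sd


def can_detect(lower, upper, mcid=MCID):
    """True if both limits of agreement lie strictly inside +/- mcid."""
    return max(abs(lower), abs(upper)) < mcid


# Hypothetical ARA sum scores (0-57) from two raters; NOT study data.
rater_1 = [12, 25, 33, 41, 57, 8, 19, 30, 45, 52]
rater_2 = [11, 25, 34, 40, 56, 8, 18, 31, 43, 51]

d_bar, lower, upper = limits_of_agreement(rater_1, rater_2)
print(f"mean difference: {d_bar:.2f}")
print(f"limits of agreement: {lower:.2f} to {upper:.2f}")
print("narrower than MCID:", can_detect(lower, upper))

# Limits reported in the abstract, checked against the same criterion.
print("intrarater:", can_detect(-1.66, 2.26))
print("interrater:", can_detect(-2.35, 3.85))
```

Under this criterion, a change in sum score larger than the limits of agreement is unlikely to be measurement error alone, so limits narrower than the MCID indicate that the test can detect a clinically relevant difference.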

Section snippets

Subjects

A subsample of 20 patients involved in a randomized clinical trial (RCT) on the effectiveness of forced-use treatment in chronic stroke patients served as the study population in the reliability study [12]. The RCT included 66 subjects who met the following inclusion criteria: (1) a history of a single stroke, at least 1 year previously, resulting in hemiparesis on the dominant side; (2) a minimum of 20° of active extension in the wrist and 10° of finger extension; (3) ARA

Results

The sample included 20 patients (9 men, 11 women; median age, 62 yr), with a median time since stroke of 3.6 years. The baseline characteristics are presented in Table 1.

Table 1: Intake Characteristics of Subjects (n = 20)

Characteristic                     Value
Median age in yr (IQR)             62 (52.5-71.8)
Median years since stroke (IQR)    3.6 (2.5-4.9)
Women                              11 (55%)
Diagnosis of hemorrhage            3 (15%)
Left-sided hemiparesis             6 (30%)
Sensory disorders present          10 (50%)
Hemineglect present                2 (10%)
Intake ARA score*                  29.2 ± 12.5
Intake FMA score*                  49.2 ± 9.9
*Intake (ie,

Discussion

Although the items on the ARA test are scored on an ordinal 4-point scale, performance on this test is usually expressed as a sum score, which is generally treated as an interval scale ranging from 0 to 57 [1-4, 6-8]. Statistical methods to estimate reliability for ordinal scales are different from those for interval or continuous scales, although the underlying statistical principles are similar [10]. Because the sum score of the ARA test is used in the analysis of clinical trials, this

Conclusion

The present study confirms the high intra- and interrater reliability of the ARA test in a population of chronic stroke patients with a moderate residual loss of arm function. It is capable of detecting a difference of 10% of its maximum possible sum score of 57 points, which is considered to be clinically relevant. To make a clear distinction between scores 2 and 3, it is recommended that an explicit criterion be applied to assess patients' contact with the back of the chair, in combination

Supported by the Netherlands Organization for Scientific Research (NWO) Council for Medical and Health Research (project no. 904-65-045).

No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the author(s) or upon any organization with which the author(s) is/are associated.

Reprint requests to Johanna H. Van der Lee, MD, Dept of Rehabilitation Medicine, University Hospital Vrije Universiteit, PO Box 7057, 1007 MB Amsterdam, The Netherlands.

a. SPSS, Inc, 233 S Wacker Dr, 11th Fl, Chicago, IL 60606.