SP16 The Effect of Standardization and Training on Inter- and Intra-Rater Reliability
of the Modified Ashworth Scale in Children with Cerebral Palsy
Nancy J. Clegg, BSN, MSN, PhD; Deborah A. Baldwin, BS; Sara Baldwin, BS; Carol Chambers, BS, MS, PCS, PT; Hun Epps, PT; Margie Goggans, BS;
ChanHee Jo, PhD; Charter Rushing, PhD, PT; Angela Shierk, PhD, OTR/L; Mauricio R. Delgado, MD
Texas Scottish Rite Hospital for Children • The University of Texas Southwestern Medical Center at Dallas, Texas
BACKGROUND:
The reliable measurement of spasticity is critical to monitor the progress of children with cerebral
palsy (CP) and to assess the efficacy of current treatments. Most studies of spasticity measurement
have consistently utilized the Ashworth Scale (Ashworth 1964) or Modified Ashworth Scale
(Bohannon & Smith 1987), which are actually tone or stiffness measuring scales that have become
a de facto criterion standard for spasticity measurement.
Bohannon and Smith (1987) revised the original Ashworth Scale (1964) to render the scale more
discrete. This new scale is commonly referred to as the Modified Ashworth Scale (MAS):
0 = no increase in muscle tone;
1 = slight increase in muscle tone, manifested by a catch and release or by minimal
resistance at the end of the range of motion when the affected part(s) is moved in flexion or
extension;
1+ = slight increase in muscle tone, manifested by a catch, followed by minimal resistance
throughout the remainder (less than half) of the ROM;
2 = more marked increase in muscle tone through most of the ROM, but affected part(s)
easily moved;
3 = considerable increase in muscle tone, passive movement difficult; and
4 = affected part(s) rigid in flexion or extension.
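The MAS levels are ordered categories rather than numbers ("1+" has no numeric value), so agreement statistics need them coded as ordinal ranks first. A minimal Python sketch of one possible coding follows. The names `MAS_LEVELS` and `mas_to_rank` are illustrative assumptions and are not taken from the study.

```python
# Ordered MAS levels as listed above, from lowest to highest tone.
MAS_LEVELS = ["0", "1", "1+", "2", "3", "4"]


def mas_to_rank(score: str) -> int:
    """Return the ordinal rank (0 to 5) of a MAS score written as text."""
    cleaned = score.strip()
    if cleaned not in MAS_LEVELS:
        raise ValueError(f"not a MAS level: {score!r}")
    return MAS_LEVELS.index(cleaned)


# Hypothetical scores, used only to show the coding.
ranks = [mas_to_rank(s) for s in ["0", "1+", "3"]]  # [0, 2, 4]
```

Coding "1+" as its own rank keeps the six levels evenly ordered. Whether the study coded the scale this way is not stated.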
GROUP A: TRAINED RATERS:
For the purposes of this study, raters in Group A were trained to perform MAS assessments using a
newly developed specialized training method. Raters in this group attended a lecture (PowerPoint
with video demonstrations), received a training manual (See Figure 1.), were given detailed data
collection tools for each joint (See Figure 2.), watched demonstrations of the new assessment
method, and participated in supervised practice sessions with clinic patients. These raters were given
specific instructions on dividing the available range of motion into quarters to better delineate the
scores on the scale:
• A MAS rating of “1” refers to a catch and release in the 4th quarter (last quarter) of the available
range of motion.
• A MAS rating of “1+” refers to a catch and release in the 3rd quarter of the available range of
motion.
• A MAS rating of “2” or “3” refers to an increase in muscle tone that begins in the 1st or 2nd quarter
of the available range of motion.
Following training and supervised practice, raters were certified in the assessment of tone using the
MAS in clinic on pediatric patients diagnosed with CP. (See Figure 3.)
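The quarter rule given to Group A can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's data collection tool: the function names, the use of joint angles in degrees, and the handling of a catch that falls exactly on a quarter boundary are all hypothetical. The rule as stated does not separate "2" from "3" and does not cover "0" or "4", so the sketch returns the candidate scores only.

```python
def quarter_of_range(start_deg: float, end_deg: float, catch_deg: float) -> int:
    """Return which quarter (1 to 4) of the available range contains the catch.

    start_deg and end_deg bound the available range in the direction of the
    passive movement; catch_deg is where resistance is first felt.
    """
    if start_deg == end_deg:
        raise ValueError("available range of motion is zero")
    fraction = (catch_deg - start_deg) / (end_deg - start_deg)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("catch angle lies outside the available range")
    # Assumption: a catch exactly on a boundary is assigned to the later quarter.
    return min(int(fraction * 4) + 1, 4)


def candidate_ratings(quarter: int) -> set:
    """Map a quarter to the MAS scores the Group A instructions allow for it."""
    if quarter == 4:
        return {"1"}       # catch and release in the last quarter
    if quarter == 3:
        return {"1+"}      # catch and release in the 3rd quarter
    if quarter in (1, 2):
        return {"2", "3"}  # increased tone beginning in the 1st or 2nd quarter
    raise ValueError("quarter must be 1, 2, 3 or 4")


# Hypothetical example: range from 0 to 120 degrees, catch felt at 70 degrees.
example = candidate_ratings(quarter_of_range(0, 120, 70))
```

Because the position is computed as a fraction of the available range, the same function works whether the limb is moved from a smaller to a larger angle or the reverse.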
Although the original scale has been modified twice, reliability studies have demonstrated varied
results across ages. Furthermore, there are limited studies in children with CP. (See Table 1.)

| Research study | Raters | Subjects | Muscle groups | Intra-rater reliability (ICC) |
| --- | --- | --- | --- | --- |
| Fosang et al. (2003) | N=6 | N=18 | 3 Lower Limb | -0.07 to 0.85 |
| Clopton et al. (2005) | N=5 | N=17 | 6 Upper/Lower Limb | 0.54 to 0.80 |
| Yam & Leung (2006) | N=2 | N=17 | 2 Lower Limb | Not performed |
| Mutlu et al. (2008) | N=3 | N=38 | 5 Lower Limb | 0.36 to 0.83 |
| Klingels et al. (2010) | N=2 | N=30 | 8 Upper Limb | 0.57 to 0.85 |
| Numanoğlu & Günel (2012) | N=1 | N=37 | 6 Upper/Lower Limb | 0.26 to 0.66 |
| Delgado et al. (2015) [In Process] | N=3 Untrained; N=3 Trained | N=17 | 4 Upper/Lower Limb | Untrained: 0.37 to 0.81; Trained: 0.71 to 0.85 |

Table 1: Literature review of MAS intra- and inter-rater reliability studies in children with cerebral palsy.
Figure 2: Data collection tool for Group A Trained Raters.
Recently, while executing another study using the MAS, “A phase III, multicenter, double-blind,
prospective, randomized, controlled, multiple treatment study assessing efficacy and safety of
Dysport® used in the treatment of upper limb spasticity in children” (Protocol Y-52-52120-153),
the investigators created more detailed instructions on using the MAS to enhance reliability of
measurements across multiple sites.
MATERIALS/METHODS:
The subjects were divided into three groups (6 patients per group) and were assessed at one of three
assigned times (AM, Noon, or PM) on both days, approximately 24 hours apart. During the assessment,
each clinician rated the subject’s muscle tone using the MAS. Four joints (elbow flexor, wrist flexor,
knee flexor and ankle plantarflexor) on each subject were measured by each rater on 2 consecutive
days. Each rater had ten minutes per subject to complete the four MAS assessments.
Raters were divided into 2 groups: Group A received specialized training in rating hypertonia using the
MAS following a standardized protocol; Group B performed MAS assessments following Bohannon & Smith’s
1987 published recommendations. The order of evaluations by examiners was randomized.
All raters received the article by Bohannon and Smith (1987) and were instructed to:
• Place the child in a supine position with the head midline.
• For the timing for the extension of the limb, use a ‘fast velocity’ of one second.
• Keep repeated movement cycles at a minimum, 5 to 8 times.
The primary outcome measures were the extent of agreement among all raters (inter-rater reliability) and
the extent of agreement between each rater’s 2 evaluations (intra-rater reliability). Statistical analysis
was performed using weighted kappas with 95% confidence intervals.
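For readers unfamiliar with the statistic, a weighted kappa for two sets of ordinal ratings can be sketched in Python as below. This is a generic two-rating version under stated assumptions. The poster does not say whether linear or quadratic weights were used, how agreement was pooled across the three raters in each group, or how the confidence intervals were obtained, so none of that is reproduced here. The function name and arguments are illustrative.

```python
def weighted_kappa(ratings_a, ratings_b, n_levels, weighting="linear"):
    """Cohen's weighted kappa for two equal-length lists of ordinal ranks.

    Ranks are integers from 0 to n_levels - 1 (for the MAS, six levels).
    weighting is "linear" or "quadratic"; which one the study used is unknown.
    """
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    if n_levels < 2:
        raise ValueError("need at least two levels")
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")

    n = len(ratings_a)
    # Observed proportions for every pair of levels.
    observed = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_levels)]
    col = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]

    power = 1 if weighting == "linear" else 2

    def disagreement(i, j):
        # 0 for identical levels, 1 for the two extremes of the scale.
        return (abs(i - j) / (n_levels - 1)) ** power

    levels = range(n_levels)
    observed_dis = sum(disagreement(i, j) * observed[i][j] for i in levels for j in levels)
    expected_dis = sum(disagreement(i, j) * row[i] * col[j] for i in levels for j in levels)
    if expected_dis == 0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1.0 - observed_dis / expected_dis


# Hypothetical ranks for one rater on two days (not study data).
day1 = [0, 2, 3, 1, 2, 4]
day2 = [0, 2, 2, 1, 3, 4]
intra_rater_kappa = weighted_kappa(day1, day2, n_levels=6)
```

Weighting matters for an ordinal scale such as the MAS because a disagreement between "1" and "1+" is penalized less than one between "1" and "4".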
Examiner certification form (Figure 3), column headings ELBOW and WRIST. Header fields: Assessor, Subject#.
Each item is marked Agree or Disagree for the left and right sides. Checklist items:
• Patient and family were comfortable throughout the examination
• Examiner gave patient age-appropriate instructions/explanations
• Patient safety was priority at all times
• Proper disinfectant measures were taken
• Correctly positioned patient’s head/trunk & maintained position
• Correctly stabilized opposite limb and maintained stabilization
• Correctly positioned distal and proximal limb to be measured
• Examiner correctly positioned his/her hands
• Relaxation technique applied correctly
OBJECTIVES:
The purpose of the current study was to examine the effect of standardization and training on inter-
and intra-rater reliability of the MAS in children with cerebral palsy. The study compared two groups
of raters: Group A with specialized training and Group B using the MAS per standard clinical protocol
following the recommendations of Bohannon and Smith (1987).
STUDY PARTICIPANTS/SETTING:
Seventeen children (mean age 10.9 years ±3.38) with hypertonia due to CP were recruited from the
neurology clinic population at a tertiary care facility, Texas Scottish Rite Hospital for Children (TSRHC).
The six raters included healthcare professionals who had specific experience in the management
of patients with hypertonia: 1 Neurologist, 1 Physician Assistant, 1 Occupational Therapist, and 3
Physical Therapists.
RESULTS:
Intra-rater reliability for the three healthcare providers in the trained group ranged from 0.73 to 0.85,
while that for the three healthcare providers in the untrained group ranged from 0.39 to 0.82. (See Table 2.)

INTRA-RATER RELIABILITY
| Group | Rater | Weighted Kappa | 95% Confidence Interval |
| --- | --- | --- | --- |
| Trained | #1 | 0.725 | (0.5665, 0.8355) |
| Trained | #2 | 0.749 | (0.5470, 0.8574) |
| Trained | #3 | 0.853 | (0.7320, 0.9277) |
| Untrained | #4 | 0.595 | (0.3698, 0.7597) |
| Untrained | #5 | 0.822 | (0.6976, 0.9023) |
| Untrained | #6 | 0.387 | (0.0882, 0.6320) |

Table 2: Intra-rater reliability for all healthcare providers with specific experience in the
management of patients with hypertonia: Three trained raters using a newly developed
standardized protocol (highlighted in blue) and three untrained raters performing assessments
using routine clinical practice (highlighted in tan).

Inter-rater reliability for the trained raters was 0.72 on Day 1 and 0.66 on Day 2. The inter-rater
reliability for the untrained raters was 0.23 on Day 1 and 0.27 on Day 2. (See Table 3.)

INTER-RATER RELIABILITY BY GROUP
| Group | Day | Weighted Kappa | 95% Confidence Interval |
| --- | --- | --- | --- |
| Trained | Day1 | 0.717 | (0.6040, 0.8133) |
| Trained | Day2 | 0.663 | (0.5386, 0.7626) |
| Untrained | Day1 | 0.233 | (0.1085, 0.3568) |
| Untrained | Day2 | 0.269 | (0.1415, 0.3906) |

Table 3: Inter-rater reliability on Day1 and Day2 for Group A Trained Raters using the new
standardized protocol (highlighted in blue) compared to Group B Untrained Raters using routine
clinical practice (highlighted in tan).
Figure 3: Examiner certification form for Group A Trained Raters.
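One descriptive way to read the Table 3 values is to check whether the 95% confidence intervals of the two groups overlap on each day. The Python sketch below does only that. It is not the statistical comparison used in the study, which the poster does not describe, and non-overlap of intervals is a rough, conservative indicator rather than a formal test. The variable and function names are illustrative.

```python
# Weighted kappas and 95% confidence intervals transcribed from Table 3.
# Assumption: the lower bound printed as "01415" is read as 0.1415.
table3 = {
    ("Trained", "Day1"): (0.717, (0.6040, 0.8133)),
    ("Trained", "Day2"): (0.663, (0.5386, 0.7626)),
    ("Untrained", "Day1"): (0.233, (0.1085, 0.3568)),
    ("Untrained", "Day2"): (0.269, (0.1415, 0.3906)),
}


def intervals_overlap(a, b):
    """Return True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


overlap_by_day = {
    day: intervals_overlap(table3[("Trained", day)][1], table3[("Untrained", day)][1])
    for day in ("Day1", "Day2")
}
```

With the values as transcribed, the trained and untrained intervals do not overlap on either day.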
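Table 1 (continued): inter-rater reliability by study.

| Research study | Inter-rater reliability (ICC) |
| --- | --- |
| Fosang et al. (2003) | 0.27 to 0.56 |
| Clopton et al. (2005) | 0.33 to 0.79 |
| Yam & Leung (2006) | 0.41 to 0.73 |
| Mutlu et al. (2008) | 0.61 to 0.87 |
| Klingels et al. (2010) | 0.52 to 0.83 |
| Numanoğlu & Günel (2012) | Not performed |
| Delgado et al. (2015) [In Process] | Untrained: 0.27 to 0.29; Trained: 0.716 to 0.717 |
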
Examiner certification form (Figure 3), column headings KNEE and ANKLE. Header field: Evaluator.
Each item is marked Agree or Disagree for the left and right sides. Checklist items:
• Max flexion accurate
• Max extension accurate
• Available range of motion calculated correctly
• Speed accurate: 1 second
• Ashworth: Rated resistance to passive movements of the joint correctly
• Ashworth: Recorded resistance to passive movements of the joint correctly
Figure 1: Excerpt from the MAS Training
Manual on assessment of the elbow flexors.
GROUP B: UNTRAINED RATERS:
For the purposes of this study, raters in Group B
were instructed to perform assessments using the
MAS per standard clinical protocol following the
recommendations of Bohannon and Smith (1987).
Raters attended a lecture to review the MAS and the
recommendations. The raters also received a data
collection tool for each joint. (See Figure 4.) These
raters were given the standard definitions provided
by Bohannon and Smith (1987) for scoring their
assessments:
• A MAS rating of “1” refers to a catch and release at
the end of the range of motion.
• A MAS rating of “1+” refers to a catch and release
throughout the remainder (less than half) of the
range of motion.
• A MAS rating of “2” or “3” refers to an increase in
muscle tone through most of the range of motion.
Raters in Group B were not provided supervised
practice sessions, nor were they certified in their
method of assessing tone using the MAS.
For inter-rater reliability by joint, the trained group had greater weighted kappa values compared
to the untrained group for each of the four joints. For the trained raters, the inter-rater reliability
ranged from 0.56 to 0.66. For the untrained raters, inter-rater reliability ranged from 0.11 to 0.41.
(See Table 4 and Figure 5.)

INTER-RATER RELIABILITY BY JOINT AND GROUP
| Joint | Group | Weighted Kappa | 95% Confidence Interval |
| --- | --- | --- | --- |
| Elbow | Trained | 0.6555 | (0.3572, 0.8604) |
| Elbow | Untrained | 0.4047 | (0.1148, 0.6717) |
| Wrist | Trained | 0.5643 | (0.2865, 0.7909) |
| Wrist | Untrained | 0.2817 | (-0.0086, 0.6257) |
| Knee | Trained | 0.6591 | (0.2675, 0.7995) |
| Knee | Untrained | 0.1943 | (0.0329, 0.3673) |
| Ankle | Trained | 0.6133 | (0.3772, 0.8326) |
| Ankle | Untrained | 0.1106 | (0.0244, 0.2586) |

Table 4: Inter-rater reliability for four joints comparing the Group A Trained Raters (highlighted in
blue) versus Group B Untrained Raters (highlighted in tan).
Figure 4: Data collection tool for Group B
Untrained Raters.
Figure 5: Inter-rater reliability by joint: weighted kappas for Group A Trained Raters (highlighted
in blue) versus Group B Untrained Raters (highlighted in tan).
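The per-joint comparison in Table 4 can be tabulated with a short Python sketch. It only rearranges the published weighted kappas (difference between groups and ordering of joints) and performs no statistical test. The variable names are illustrative assumptions.

```python
# Weighted kappas transcribed from Table 4.
table4 = {
    "Elbow": {"Trained": 0.6555, "Untrained": 0.4047},
    "Wrist": {"Trained": 0.5643, "Untrained": 0.2817},
    "Knee": {"Trained": 0.6591, "Untrained": 0.1943},
    "Ankle": {"Trained": 0.6133, "Untrained": 0.1106},
}

# Difference in weighted kappa between groups for each joint.
gaps = {
    joint: round(kappas["Trained"] - kappas["Untrained"], 4)
    for joint, kappas in table4.items()
}

# Joints ordered from lowest to highest untrained agreement.
lowest_untrained_first = sorted(table4, key=lambda j: table4[j]["Untrained"])

# Joint with the lowest agreement among the trained raters.
lowest_trained_joint = min(table4, key=lambda j: table4[j]["Trained"])
```

The ordering shows the untrained raters with the lowest agreement at the ankle, then the knee, then the wrist, and the trained raters with their lowest agreement at the wrist.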
DISCUSSION:
Training was associated with significant improvement in intra-rater reliability. Intra-rater reliability
for the healthcare providers in the trained group showed good to excellent agreement (0.73 to 0.85)
compared to raters in the untrained group (0.39 to 0.82).
Training was associated with significant improvement in inter-rater reliability. Inter-rater reliability
for the trained raters demonstrated good agreement (Day1=0.72, Day2=0.66). The inter-rater
reliability for the untrained raters showed poor agreement (Day1=0.23, Day2=0.27).
For inter-rater reliability by joint, the trained group showed greater reliability (0.56 to 0.66) compared
to the untrained group (0.11 to 0.41) for each of the four joints. As found in previous intra- and
inter-rater reliability studies of the MAS, there were differences in reliability for different muscle groups. As
expected, both trained and untrained raters had fewer differences with measurements for the elbow
joint. The trained raters had more challenges with wrist measurements. The untrained raters had
challenges with all four joints but had the greatest challenges with the ankle and knee followed closely
by the wrist.
Based on this study as well as existing evidence, when utilizing the MAS, standardizing assessment
techniques and providing detailed instruction and practice result in greater intra- and inter-rater
reliability. Key components of reliably measuring tone using the MAS include:
1. Understanding the MAS definitions and accurately calculating the quarters of available range
2. Keeping the patient’s body and limb position the same and stabilizing the proximal limb segment
3. Using standardized hand placement for the clinicians
4. Using a consistent speed of movement of one second for the full available range
5. Recognizing the different manifestations of muscle resistance
6. Identifying maximum flexion and maximum extension, and pinpointing the location
where resistance begins
7. Accurately recording all measurements and calculations
8. Practicing the standardized process to gain proficiency and accuracy, even for skilled
clinicians
CONCLUSIONS:
Training was associated with significant improvement in intra-rater reliability.
• Intra-rater reliability for the healthcare providers in the trained group showed good to excellent
agreement compared to varied results for raters in the untrained group.
Training was associated with significant improvement in inter-rater reliability.
• Inter-rater reliability for the trained raters demonstrated good agreement. The inter-rater
reliability for the untrained raters showed poor agreement.
Training was associated with significant improvement in inter-rater reliability by joint.
• For inter-rater reliability by joint, for each of the four joints assessed, the trained group showed
greater reliability compared to the untrained group.
• As expected, both trained and untrained raters had fewer differences with measurements for the
elbow joint.
• The trained raters had more challenges with wrist measurements but still demonstrated
greater inter-rater reliability than the untrained raters.
• The untrained raters had challenges with all four joints but had the greatest challenges with the
ankle and knee followed closely by the wrist.
With most pediatric studies of hypertonia due to CP using the MAS for tone measurement, it is
imperative to have reliable inter- and intra-rater measurements.
Standardization and training of investigators significantly improve the accuracy of hypertonia
assessment in children with CP, with obvious clinical and research implications.
REFERENCES:
Ashworth B. Preliminary trial of carisoprodol in multiple sclerosis. Practitioner 1964;192:540-2.
Bohannon RW, Smith MB. Interrater reliability of a Modified Ashworth Scale of muscle spasticity. Physical Therapy 1987;67:206-7.
Clopton N, Dutton J, Featherston T, Grigsby A, Mobley J, Melvin J. Interrater and intrarater reliability of the Modified Ashworth Scale in children with hypertonia. Pediatric
Physical Therapy 2005;17(4):268-74.
Fosang AL, Galea MP, McCoy AT, Reddihough DS. Measures of muscle and joint performance in the lower limb of children with cerebral palsy. Developmental Medicine & Child
Neurology 2003;45:664-670.
Gracies J-M, Burke K, Clegg NJ, Browne R, Rushing C, Fehlings D, Matthews D, Tilton A, Delgado MR. Reliability of the Tardieu Scale for assessing spasticity in children with
cerebral palsy. Archives of Physical Medicine and Rehabilitation 2010;91:421-8.
Klingels K, De Cock P, Molenaers G, Desloovere K, Huenaerts C, Jaspers E, Feys H. Upper limb motor and sensory impairments in children with hemiplegic cerebral palsy. Can
they be measured reliably? Disability and Rehabilitation 2010;32(5):409-16.
Mutlu A, Livanelioglu A, Gunel MK. Reliability of Ashworth and Modified Ashworth Scales in children with spastic cerebral palsy. BMC Musculoskeletal Disorders 2008;9:44.
Numanoğlu A, Günel MK. Intraobserver reliability of Modified Ashworth Scale and Modified Tardieu Scale in the assessment of spasticity in children with cerebral palsy. Acta
Orthopaedica et Traumatologica Turcica 2012;46(3):196-200.
Pandyan AD, Johnson GR, Price CI, Curless RH, Barnes MP, Rodgers H. A review of the properties and limitations of the Ashworth and Modified Ashworth Scales as measures of
spasticity. Clinical Rehabilitation 1999;13(5):373-83.
Yam WKL, Leung MSM. Interrater reliability of Modified Ashworth Scale and Modified Tardieu Scale in children with spastic cerebral palsy. Journal of Child Neurology
2006;21:1031-5.
PRESENTER:
Nancy J. Clegg, RN, CNS, PhD, CCRP
Childhood Motor Disorders Research Coordinator
Texas Scottish Rite Hospital for Children
CORRESPONDING AUTHOR:
Mauricio R. Delgado, MD, FRCPC, FAAN
Professor of Neurology and Neurotherapeutics
University of Texas Southwestern Medical Center at Dallas
Director of Pediatric Neurology
Texas Scottish Rite Hospital for Children
2222 Welborn Street
Dallas, Texas 75219
Office: 214-559-7831
Fax: 214-559-8383
[email protected]
Disclosure of Relevant Financial Relationships:
We have the following financial relationships to
disclose:
Grant/Research support from IPSEN.
Disclosure of Off-Label and/or investigative uses:
We will not discuss off label use and/or investigational
use in our presentation.