Print - Stroke

Intra- and Interrater Reliability of Ischemic Lesion Volume
Measurements on Diffusion-Weighted, Mean Transit Time
and Fluid-Attenuated Inversion Recovery MRI
Marie Luby, MEng, MS; Julie L. Bykowski, MD; Peter D. Schellinger, MD, PhD;
José G. Merino, MD; Steven Warach, MD, PhD
Downloaded from http://stroke.ahajournals.org/ by guest on June 14, 2017
Background and Purpose—We investigated the intra- and interrater reliability of ischemic lesion volumes measurements
assessed by different MRI sequences at various times from onset.
Methods—Ischemic lesion volumes were measured for intrarater reliability using diffusion-weighted (DWI), mean transit
time (MTT) perfusion and fluid-attenuated inversion recovery (FLAIR) MRI at chronic (⬎3 days from stroke onset)
time points. A single intrarater reader, blind to clinical information and time point, repeated the volume measurements
on two occasions separated by at least 1 week. Interrater reliability was also obtained in the second set of patients using
acute DWI, MTT and chronic FLAIR MRI. Four blinded readers performed these volume measurements. Average
deviations across repeat measurements per lesion and differences between sample means between the two measurements
were calculated globally, ie, across all sequences and time points, and per reader type for each sequence at each time
point.
Results—There was good concordance of the mean sample volumes of the 2 intrarater readings (deviations were ⬍4% and
2 mL globally, ⬍2% and 2 mL for DWI, ⬍6% and 7 mL for MTT, and ⬍2% and 1 mL for FLAIR). There was also
good concordance of the interrater readings (⬍5% and 2 mL globally).
Conclusions—Repeat measurements of stroke lesion volumes show excellent intra- and interrater concordance for DWI,
MTT and FLAIR at acute through chronic time points. (Stroke. 2006;37:000-000.)
Key Words: acute stroke 䡲 brain imaging 䡲 magnetic resonance 䡲 neuroradiology 䡲 thrombolysis
M
easurements of ischemic lesion volumes have been used
in numerous imaging studies. Investigations of stroke
outcomes and studies of the effects of therapeutic interventions
including randomized clinical trials have included lesion volumes measured acutely with diffusion (DWI) and perfusion
(PWI) weighted MRI and chronically with T2-weighted or
fluid-attenuated inversion recovery (FLAIR) MRI.1–9 The use of
volumetric measurements as an objective quantitative tool depends on intra- and interrater variability, but limited information
is available on the reliability of these measurements across MRI
sequence type and time from stroke onset. Reliabilities reported
have been restricted to small sample sizes using manual or
semiautomated techniques. Martel et al reported an intra- and
interobserver repeatability coefficient defined as the 95% CI, of
⬍6 mL for their semiautomatic method of measuring lesion
volume using DWI (N⫽10).10 Ritzl et al reported an interobserver error of ⬍3 mL for measurement of chronic FLAIR
volume (N⫽8).11 Baird et al and Barber et al both reported an
interrater variability of 5% with manual segmentation of lesion
volumes on DWI.12,13 The current study provides intra- and
interrater reliability estimates of lesion volume measurements,
which may be relevant to stroke trial design and lesion data
analysis.
Methods
Patients
This study is part of a prospective, natural history study of MRI in a
consecutive series of tissue plasminogen activator (tPA)-treated
patients at the National Institutes of Neurological Disorders and
Stroke (NINDS) and Suburban Hospital, Bethesda, Md.14,15 The
Institutional Review Board at NINDS and Suburban Hospital approved the study. From February 2000 through January 2005, 147
patients were treated with standard intravenous tPA; of those, 81
patients had an MRI before tPA treatment and were the subject of the
intrarater analysis. Patients were eligible for this analysis if they
received an MRI scan (“acute” scan) followed by standard intravenous tPA within 3 hours from stroke onset. The study design was to
obtain serial MRI approximately 3 hours after the acute scan
(“3-hour” scan), approximately 24 hours after the start of tPA
(“24-hour” scan), and then at 5, 30, and 90 days after stroke onset
(“chronic” scan). Chronic (⬎3 days from stroke onset) FLAIR
volumes were taken from the latest available FLAIR for a given
patient. Because of clinical care requirements, patient preferences, or
Received July 1, 2006; final revision received August 3, 2006; accepted August 15, 2006.
From the the National Institute of Neurological Disorders and Stroke, National Institutes of Health (M.L., J.L.B., J.G.M., S.W.), Bethesda, Maryland;
and Neurologische Klinik, Universitätsklinikum Erlangen (P.D.S.), Erlangen, Germany.
Correspondence to Marie Luby, MEng, MS, Section on Stroke Diagnostics and Therapeutics, National Institute of Neurological Disorders and Stroke,
National Institutes of Health, 10 Center Drive, Rm. B1D733, MSC 1063, Bethesda, MD 20892. E-mail: [email protected]
© 2006 American Heart Association, Inc.
Stroke is available at http://www.strokeaha.org
DOI: 10.1161/01.STR.0000249416.77132.1a
1
2
Stroke
December 2006
death, all posttreatment MRI time points were not obtained in all
patients. Only image time points and sequences with identifiable
lesions were included.
A second confirmatory sample for intrarater analysis was obtained
from August 1999 through July 2005 in 106 patients diagnosed with
an acute cerebrovascular event, ischemic stroke, or transient ischemic attack. These patients had an acute MRI on average within 14
hours of onset and were not treated with tPA. The study design was
identical to the tPA patients with the exception of the “3- and
24-hour” scans, which were not obtained. These same patients were
also measured for the interrater analysis.
Imaging Sequences
MRI sequences were performed on a GE 1.5-T clinical scanner
(Twinspeed; General Electric).
Diffusion-Weighted Imaging
Downloaded from http://stroke.ahajournals.org/ by guest on June 14, 2017
In this study, the DWI spin-echo planar sequence included 20
contiguous axial oblique slices with b⫽0 and b⫽1000 s/mm2,
isotropically weighted, using repetition time (TR)/echo time (TE)⫽
6000/72 ms, acquisition matrix of 128⫻128, 7-mm slice thickness,
and 24-cm field of view (FOV).
Perfusion-Weighted Imaging
In this study, the PWI gradient-echo planar sequence included 20
contiguous axial oblique slices with single-dose gadolinium injection
of 0.1 mmol/kg through a power injector using 25 phase measurements (2 seconds per phase measurement), TR/TE⫽2000/45 ms,
acquisition matrix of 64⫻64, 7-mm slice thickness, and 24-cm FOV.
Mean Transit Time
Mean transit time (MTT) maps were calculated from PWI using time
concentration curves in this study as the first moment of the time
concentration curves divided by the zeroeth moment.
Fluid-Attenuated Inversion Recovery
FLAIR images were used for chronic lesion measurement as the
suppression of cerebrospinal fluid in these images is advantageous when
delineating the ischemic lesions from the adjacent sulci or ventricles. In
this study, the high-resolution FLAIR sequence included 66 contiguous
axial oblique slices, TR/TE⫽9000/92 ms, TI⫽2200 ms, acquisition
matrix of 256⫻128, 2-mm slice thickness, and 24-cm FOV.
sary to identify positive DWI lesions. For MTT assessments, the
reader paid particular attention to exclude hyperintensities attributable to the typical susceptibility artifacts adjacent to the paranasal
sinuses. The reader assessed the MTT as not evaluable if the signal
drop from the contrast did not produce at least a 10% drop or if there
was significant patient motion causing inconsistency in the confirmation of the perfusion deficit. In evaluating FLAIR, the reader paid
particular attention to preexisting chronic lesions present on the acute
FLAIR images to avoid replication of these lesion areas in the
current FLAIR volume calculations.
Measurements from each sequence were performed independently of
the other time points except for the chronic FLAIR measurements in
which reference to the acute FLAIR sequence aided identification of the
index stroke. Within each time point, the reader had access to the other
sequences to aid in accurate identification of the index lesion.
Patients’ intra- and interrater measurements from single slices of
the acute DWI volumes are displayed in Figure 1. These same
patients’ acute MTT and chronic FLAIR measurements from the
corresponding single slices are displayed in Figures 2 and 3.
Statistical Analysis
Deviations between measurements were evaluated in two ways. The
sample means, globally (across all sequences and time points) and
within each reader type (intra- or interrater), time point, and
sequence were compared for the two reads. The deviation per lesion
was computed as the absolute value of the difference between the
two readings; percent deviation per lesion was the per lesion absolute
deviation divided by the average of the two readings. All volume
data were skewed heavily toward mild. Cube root transformation of
the raw volumes was performed to normalize the volumetric data.
For example, for intrarater, acute DWI volume sample had a
skewness of 2.1 and a kurtosis of 3.9; after transformation, the
skewness was 0.7 and the kurtosis was ⫺0.3. The absolute volume of
the difference and the percentage deviation of the two readings, raw
volume and cube root of volume, were calculated for each reader
type and sequence and then averaged across all patients.
Bland-Altman plots were generated for the raw volume data to
display the spread of data and the limits of agreement, specifically to
Image Analysis
Image analysis software (Cheshire; Perceptive Informatics) was used
for measurement of ischemic lesion volumes from DWI, MTT, and
FLAIR MRI sequences. Lesion volumes were measured at acute,
subacute (3 and 24 hours), and chronic time points. Readers blind to
clinical characteristics and time point used a semiautomated technique for initial identification of all the lesions and a manual editing
tool for final corrections to the lesion borders. All lesion areas on a
slice-by-slice basis were segmented with a semiautomated segmentation tool followed by manual editing. The semiautomated segmentation tool is based on a watershed method dependent on a series of
seed points and the subsequent sampled surrounding area placed by
the reader. The volumes were automatically produced by the multiplication of the slice thickness times the total lesion area. The
intrarater reader (M.L.) measured the volume of each lesion on two
occasions separated by at least 1 week. The intrarater reader trained
one medical student (J.B.) and two stroke neurologists (P.S., J.M.) to
perform interrater measurements using the same semiautomated
technique. For the interrater readings, one reader (M.L.) performed
one complete set of these measurements; however, the second set
was performed by any combination of two of the three additional
interrater readers with no repeat measurements. Only one measurement was produced by one of the additional interrater readers to
perform direct comparisons with the first set of measurements.
DWI lesion volumes were assessed on the affected slices with
hyperintense areas visible from the b⫽1000 mm/s2 images. The
reader paid particular attention to the typical locations of bilateral
artifact and produced apparent diffusion coefficient maps as neces-
Figure 1. (A) Intrarater volumetric measurements on single DWI
image (top left region of interest [ROI])—(top right ROI). (B) Interrater volumetric measurements on single DWI image (bottom left
ROI—(bottom right ROI) with arrow indicating main difference
between readings.
Luby et al
Downloaded from http://stroke.ahajournals.org/ by guest on June 14, 2017
Figure 2. (A) Intrarater volumetric measurements on single MTT
image (top left ROI)—(top right ROI). (B) Interrater volumetric
measurements on single MTT image (bottom left ROI)—(bottom
right ROI) with arrows indicating main differences between
readings.
illustrate how many of the averaged data points lie within 2 SDs from
the mean difference.16 The Bland-Altman plots were used to address
the key question of whether one set of volumetric measurements is
sufficiently representative or if two sets of measurements are
required for providing the most accurate results. The 95% confidence
limits are proposed as the repeatability coefficients of one type of
measurement for another, ie, one set of volumetric measurement is
sufficient rather than requiring two sets in this particular study.16
Results
In the intrarater analysis, DWI had measurable lesions in 68
patients at the acute time point, 62 at 3 hours and 41 at 24
hours. MTT images were measured in 62 patients at the acute
time point, 38 at 3 hours and 24 at 24 hours. Chronic FLAIR
images were measured in 46 patients. The volume statistics
by sequence, DWI, MTT, and FLAIR are listed in Table 1.
Only DWI, MTT, and FLAIR measurements in which at least
one volume measurement is nonzero are reported in Table 1
and all subsequent tables and figures. The corresponding
sample sizes are provided for each reader type, sequence, and
time point. The correlation coefficients and cube root of the
raw volume data are presented in supplemental Table I,
available online at http://stroke.ahajournals.org.
The confirmatory intrarater sample had measurable lesions in 98 patients at the acute DWI, 72 patients at the
acute MTT, and 28 patients at the chronic FLAIR time
points. The volume statistics for the confirmatory sample
are presented in Table 1.
In the interrater analysis, there were measurable lesions in
103 patients at the acute DWI, 77 patients at the acute MTT,
Reliability of Ischemic Lesion Volume Measurements
3
Figure 3. (A) Intrarater volumetric measurements on single
FLAIR image (top left ROI)—(top right ROI). (B) Interrater volumetric measurements on single FLAIR image (bottom left ROI)—
(bottom right ROI) with arrows indicating main differences
between readings.
and 29 patients at the chronic FLAIR time points. The volume
statistics by sequence, DWI, MTT, and FLAIR are listed in
Table 2 for the interrater analysis.
Sequence: Diffusion-Weighted Image
The mean volumes, absolute volume (mL) differences, and
percent deviations between the two intrarater readings (N⫽68,
62, and 41) for DWI at acute, 3-hour, and 24-hour time points
are reported in Table 1. Supplemental Figure I (panels A through
C; available online at http://stroke.ahajournals.org) display the
Bland-Altman plots for the DWI raw volume data for the intraacute, 3-hour, and 24-hour time points. The upper and lower
limits, shown as thick black solid lines, were calculated for each
plot to represent ⫾2 SD from the mean. A total of 90%, 92%,
and 90% data points were within these boundary limits for the
DWI data, respectively. There were no significant changes with
respect to time point in the reliability of the DWI data.
The confirmatory intrarater sample statistics are reported in
Table 1. Supplemental Figure I (panel D) contains the
Bland-Altman plot for the DWI raw volume data. At total of
96% data points were within the boundary limits.
The mean volumes, absolute volume (mL) differences, and
percent deviations along with the cube root statistics between
the two interrater readings at acute DWI (N⫽103) time points
are reported in Table 2. Supplemental Figure I (panel E)
displays the interrater acute DWI Bland-Altman plot. A total
of 94% data points were within the boundary limits. By
excluding DWI lesions ⬍10 mL in volume, the percent
4
Stroke
TABLE 1.
Sequence
DWI
MTT
FLAIR
December 2006
Intrarater Volume (mL) Statistics by Time Point and Sequence
Time Point
N
Read 1 Average (mL)
median (IQR)
Acute
68
21.09 7.74 (1.97–27.35)
Absolute Volume Difference (mL)
Percent Deviation
Mean⫾SD median (IQR)
Mean⫾SD median (IQR)
20.19 6.47 (1.8–23.7)
3.48⫾5.41 0.87 (0.19–4.69)
30.11⫾48.42 12.69 (0.05–0.27)
Read 2 Average (mL)
median (IQR)
3-hour
62
25.64 9.35 (2.8–34.0)
26.09 10.9 (3.35–38.22)
3.21⫾4.71 1.58 (0.43–4.5)
23.75⫾34.17 12.71 (0.07–0.24)
24-hour
41
61.81 16.26 (2.47–57.83)
60.17 15.85 (2.34–54.82)
3.31⫾6.15 0.74 (0.21–3.13)
29.02⫾54.15 8.99 (0.02–0.24)
Acute
62
119.22 94.72 (31.85–187.42)
112.38 97.88 (28.71–180.79)
13.98⫾23.09 6.0 (1.58–17.87)
18.13⫾29.77 9.08 (0.04–0.19)
3-hour
38
111.44 88.71 (22.18–184.27)
110.57 86.57 (11.77–164.32)
11.27⫾16.13 3.86 (0.98–17.13)
18.52⫾35.58 8.56 (0.02–0.15)
24-hour
24
93.89 74.99 (18.26–102.15)
88.5 62.16 (14.98–109.77)
11.32⫾14.61 7.74 (2.43–16.24)
Days 4–145
46
44.43 12.1 (1.06–52.98)
43.82 11.4 (0.98–53.66)
2.02⫾3.13 0.81 (0.14–2.33)
34.68⫾61.76 9.4 (0.03–0.2)
41.55⫾59.2 17.88 (0.07–0.4)
43.91⫾65.5 14.67 (0.04–0.56)
Confirmatory intrarater sample statistics
DWI
Acute
98
15.45 2.37 (0.8–12.72)
14.48 3.25 (0.79–13.29)
2.7⫾7.2 0.73 (0.16–2.1)
MTT
Acute
72
60.51 19.28 (1.78–76.69)
56.0 17.26 (2.18–80.32)
11.89⫾42.65 3.49 (0.93–8.15)
58.64⫾76.2 19.37 (0.07–0.93)
FLAIR
Days 5–90
28
6.08 1.76 (0.66–4.9)
1.31⫾2.32 0.53 (0.11–0.95)
45.82⫾60.97 17.73 (0.08–0.62)
6.3 1.82 (0.76–5.64)
Downloaded from http://stroke.ahajournals.org/ by guest on June 14, 2017
deviations (average, median, and SD of 25.01, 18.47, and
26.81, respectively) were comparable to those of the intrarater readings.
In addition to the measured samples, 11 and 12 acute DWI
sequences were evaluated separately as normal, zero volumes, for the intra- and interrater readings. Included in the
measured samples, there were 4 and 11 acute DWI sequences
with one intrarater and interrater readings as zero volume. An
additional five and four DWI sequences at 3 and 24 hours
were evaluated separately as normal for both intrarater
readings. There was one DWI sequence at 3 hours with one
intrarater read as normal. There were three DWI sequences at
24 hours with one intrarater reading as normal.
Sequence: Mean Transit Time
The mean volumes, absolute volume (mL) differences, and
percent deviations between the two intrarater readings
(N⫽62, 38, and 24) for MTT at acute, 3 hour, and 24 hour
time points are reported in Table 1. Supplemental Figure I
(panels F through H) display the Bland-Altman plots for the
MTT raw volume data for the intra- acute, 3-hour, and
24-hour time points. A total of 94%, 92%, and 92% data
points were within the boundary lines for the MTT data,
respectively. There were no significant changes with respect
to time point in the reliability of the MTT data.
The confirmatory intrarater sample statistics are reported in
Table 1. Supplemental Figure I (panel I) displays the BlandAltman plot for the MTT raw volume data. At total of 99%
data points were within the boundary limits.
TABLE 2.
Review
Type
The mean volumes, absolute volume (mL) differences, and
percent deviations along with the cube root statistics between
the two interrater readings at acute MTT (N⫽77) time points
are reported in Table 2. Supplemental Figure I (panel J)
displays the interrater acute MTT Bland-Altman plot. A total
of 95% data points were within the boundary limits. When
excluding the MTT deficits ⬍10 mL in volume, the percent
deviation statistics improved but were still not comparable to
the intrarater readings.
In addition to the measured samples, seven and 34 acute
MTT sequences were evaluated separately as normal, zero
volumes, for the intra- and interrater readings. Included in the
measured samples, there were one and 21 acute MTT sequences with one intrarater and interrater readings as zero
volume. An additional five and nine MTT sequences at 3 and
24 hours were evaluated separately as normal for both
intrarater readings. There were three MTT sequences at 3
hours with one intrarater read as normal.
Sequence: Fluid-Attenuated Inversion Recovery
The mean volumes, absolute volume (mL) differences, and
percent deviations between the two intrareadings (N⫽46) for
chronic FLAIR time points are reported in Table 1. Supplemental Figure I (panel K) displays the Bland-Altman plot for
the intrarater chronic FLAIR time points. A total of 91% data
points were within the boundary lines.
The confirmatory intrarater sample statistics are reported in
Table 1. Supplemental Figure I (panel L) contains the
Bland-Altman plot for the FLAIR raw volume data. A total of
89% data points were within the boundary limits.
Interrater Volume (mL) Statistics by Time Point and Sequence
Sequence
Time Point
N
Inter
DWI
Acute
103
Inter
DWI—cubed root
Acute
103
Inter
MTT
Acute
77
Inter
MTT—cubed root
Acute
77
Inter
FLAIR
Days 5–90
29
Inter
FLAIR—cubed root
Days 5–90
29
Read 1 Average (mL)
median (IQR)
14.7 1.9 (0.67–11.73)
Read 2 Average (mL)
median (IQR)
13.0 1.4 (0.61–8.47)
Absolute Volume Difference (mL)
Mean⫾SD median (IQR)
Percent Deviation
Mean⫾SD median (IQR)
2.4⫾4.67 0.63 (0.23–2.17)
51.83⫾59.0 29.79 (0.15–0.6)
0.26⫾0.32 0.15 (0.06–0.29)
31.98⫾59.37 9.99 (0.05–0.21)
56.61 14.1 (1.25–75.0)
55.81 18.11 (2.19–75.17)
19.42⫾34.57 5.65 (1.14–18.74)
82.96⫾78.87 39.66 (0.19–2.0)
0.72⫾0.73 0.42 (0.21–0.96)
64.75⫾84.34 13.38 (0.06–2.0)
6.08 1.77 (0.73–5.51)
4.28 1.75 (0.39–2.82)
2.55⫾4.93 0.51 (0.31–1.94)
78.77⫾63.09 62.52 (0.3–0.84)
0.33⫾0.25 0.27 (0.16–0.53)
46.84⫾65.14 21.48 (0.1–0.3)
Luby et al
Downloaded from http://stroke.ahajournals.org/ by guest on June 14, 2017
The mean volumes, absolute volume (mL) differences, and
percent deviations along with the cube root statistics between
the two interrater readings at chronic FLAIR (N⫽29) time
points are reported in Table 2. Supplemental Figure I (panel
M) displays the interrater FLAIR Bland-Altman plot. A total
of 93% data points were within the boundary limits. By
excluding the FLAIR lesions ⬍10 mL, the sample size was
reduced to N⫽6; however, the percent deviations (average,
median, and SD of 39.74, 39.73, and 27.69, respectively)
were then comparable to those of the intrarater readings.
In addition to the measured sample, 12 chronic FLAIR sequences were evaluated separately as normal, zero volumes, for
both intrarater readings. Included in the measured sample, there
were five FLAIR sequences with only one intrarater reading as zero
volume. There were no FLAIR sequences evaluated separately as
normal, zero volumes, for both interrater readings. Included in the
measured sample, there were four FLAIR sequences with only one
interrater reading as zero volume.
A subset of nine intrarater patients with all sequences and
time points was calculated for comparison purposes to Table
1. The mean volumes (N⫽9) of the two intrarater readings for
DWI were 35.3 mL and 35.9 mL, 46.47 mL and 42.14 mL,
and 78.53 mL and 75.89 mL at acute, 3 hours, and 24 hour
time points, respectively. The mean volumes (N⫽9) of the
two intrarater readings for MTT were 164.56 mL and 162.04
mL, 134.73 mL and 138.07 mL, and 86.39 mL and 83.83 mL
at acute, 3 hours, and 24 hour time points, respectively. The
mean volumes (N⫽9) of the two intrarater readings for
FLAIR were 58.05 mL and 58.59 mL. The volume statistics
of this subset of nine patients was consistent with those of the
larger group of patients contained in Table 1.
The global mean volumes (N⫽341) of the two intrarater
readings were 63.0 mL and 60.9 mL. The absolute volume
(mL) differences between the reads were 6.54⫾13.27/1.92
(0.4 to 6.86) (mean⫾SD/median [interquartile range {IQR}
25% to 75%]). The percent deviations between the reads were
26.94⫾46.51/11.14 (0.04 to 0.24).
The global mean volumes (N⫽198) of the two confirmatory intrarater readings were 30.5 mL and 28.4 mL. The
absolute volume (mL) differences between the two readings
were 5.84⫾26.52/1.06 (0.24 to 4.3) (mean⫾SD/median [IQR
25% to 75%]). The percent deviations between the reads were
48.37⫾66.26/18.28 (0.07 to 0.51).
The global mean volumes (N⫽209) of the two interrater
readings were 28.9 mL and 27.6 mL. The absolute volume
(mL) differences between the reads were 8.69⫾22.76/1.31
(0.34 to 5.98) (mean⫾SD/median [IQR 25% to 75%]). The
percent deviations between the reads were 67.04⫾68.87/
35.64 (0.18 to 0.87).
Discussion
The volume of injury, infarction, and ischemia has assumed an
increasingly important role in stroke research in general and in
stroke clinical trials in particular. Yet, relatively little attention
has been paid to the reliability of volumetric measures and the
contribution of measurement error to the overall variance of
these outcomes in stroke studies. Because measurement error
may potentially obscure true biologic effects and is potentially
Reliability of Ischemic Lesion Volume Measurements
5
controllable by the investigator, it is important to understand the
sources and magnitude of measurement error.
In the present study, technical variables that could affect
lesion volume measurement such as the MRI scanner type,
pulse sequence parameters, image processing, and analysis
software were held constant. Because the intrarater reader’s
experience performing stroke volume measurements with this
software has been extensive over a period of years, the skill
and consistency in measurement technique can be assumed to
be constant over the period of the intrarater study. Thus, we
have characterized the intrarater reproducibility independent
of other sources of measurement error.
In principle, programs to measure lesion volumes in a
completely automated fashion could eliminate this source of
error. However, because regions of abnormality on these MRI
sequences often lack sharp contrast boundaries with the
normal brain and may overlap in signal intensity with
nonpathologic structures, investigator judgment is needed to
avoid inaccuracies. The interrater study attempts to quantify
reader subjectivity, the main concern with interpretations of
the images by multiple readers. To reduce the measurement
error, the intrarater reader trained the three additional interrater readers in the same image processing techniques and
software. The interrater readers were also required to be
familiar with the clinical reading of serial stroke MRI scans.
Furthermore, the patients and imaging protocol used in the
interrater analysis were from the same time period, hospital,
and stroke population as the intrarater analysis.
The intrarater repeatability seen for DWI was ⬍1 mL at the
acute and 3-hour time points and ⬍2 mL at the 24-hour time
point. The overall intrarater percent difference ranged from
2% to 5% for DWI. The interrater percent difference was
⬍5% for acute DWI. For DWI, the error of the lesion volume
measurement was caused by two main sources. There were
six outliers readily identified by the Bland-Altman plots in
the interrater DWI readings caused by reader differences in
interpretation of subtle changes, differences in exclusion of
sulcal areas, and in one case, misidentification of a chronic
lesion as acute. Measurement error was also attributable to
significantly smaller lesions seen for the interrater readings.
The overall intrarater MTT percent difference ranged from
1% to 6%. The intrarater repeatability coefficient seen for
MTT was ⬍1 mL at the 3-hour time point but ⬍7 mL and ⬍6
mL at the acute and 24-hour time points, respectively. The
interrater repeatability coefficient seen for acute MTT was
⬍1 mL. The interrater MTT percent difference was ⬍1%.
The larger standard deviations seen with MTT are in part
attributable to the larger lesions present in this sequence and
the lower spatial resolution in the perfusion sequence acquisitions. This was most evidenced by the four outliers in the
interrater MTT readings, which are most readily identified by
the Bland-Altman plots.
The intrarater repeatability coefficient seen for FLAIR was
⬍0.6 mL and the overall percent difference of ⬍1.5%. The
most consistent lesion volume was seen with FLAIR attributable in part to the increased spatial resolution of this
sequence compared with DWI and MTT as well as the
resolution of edema at the chronic time point allowing for a
more stable measurement. The interrater repeatability coeffi-
6
Stroke
December 2006
Downloaded from http://stroke.ahajournals.org/ by guest on June 14, 2017
cient seen for FLAIR was ⬍2 mL but an overall percent
difference of ⬍50%. The decreased consistency in the interrater lesion volume measurement with FLAIR was mainly
attributable to the significantly smaller lesions seen (approximately 6 mL on average). The two outliers displayed in the
corresponding Bland-Altman plot in the interrater FLAIR
readings were caused by one reader, including areas of
cavitations that were excluded by the other reader. Overall,
for the DWI, MTT, and FLAIR volumes by requiring a
minimum lesion size, ie, 10 mL, the reproducibility of the
measurements, especially percent deviation, is improved.
Quantitative volumes by a single expert reader can provide
highly consistent and repeatable results of lesion volumes on
DWI, MTT, and FLAIR. The variability seen with the interrater
measurements increased compared with the intrarater measurements; however, they also demonstrated consistent and repeatable results. This study provides a resource of volumetric
statistics from a large homogenous stroke population to researchers interested in potential selection of particular sequences
and series of time points for further investigation.
Sources of Funding
This research was supported by the Division of Intramural Research
of the National Institute of Neurological Disorders and Stroke,
National Institutes of Health.
Disclosures
None.
References
1. Warach S. Tissue viability thresholds in acute stroke: the 4-factor model.
Stroke. 2001;32:2460 –2461.
2. Thijs VN, Lansberg MG, Beaulieu C, Marks MP, Moseley ME, Albers
GW. Is early ischemic lesion volume on diffusion-weighted imaging an
independent predictor of stroke outcome? A multivariate analysis. Stroke.
2000;31:2597–2602.
3. Baird AE, Dambrosia J, Janket S, Eichbaum Q, Chaves C, Silver B,
Barber PA, Parsons M, Darby D, Davis S, Caplan LR, Edelman RR,
Warach S. A three-item scale for the early prediction of stroke recovery.
Lancet. 2001;357:2095–2099.
4. Arenillas JF, Rovira A, Molina CA, Grive E, Montaner J, Alvarez-Sabin
J. Prediction of early neurological deterioration using diffusion- and
perfusion-weighted imaging in hyperacute middle cerebral artery ischemic stroke. Stroke. 2002;33:2197–2205.
5. Beaulieu C, de Crespigny A, Tong DC, Moseley ME, Albers GW, Marks
MP. Longitudinal magnetic resonance imaging study of perfusion and
diffusion in stroke: evolution of lesion volume and correlation with
clinical outcome. Ann Neurol. 1999;46:568 –578.
6. van Everdingen KJ, van der Grond J, Kappelle LJ, Ramos LMP, Mali
WPTM. Diffusion-weighted magnetic resonance imaging in acute stroke.
Stroke. 1998;29:1783–1790.
7. Warach S, Pettigrew LW, Dashe JF, Pullicino P, Lefkowitz DM, Sabounjian
L, Harnett K, Schwiderski U, Gammans R; Citicholine 010 Investigators.
Effect of citicholine on ischemic lesions as measured by diffusion-weighted
magnetic resonance imaging. Ann Neurol. 2000;48:713–722.
8. Hacke W, Albers G, Al-Rawi Y, Bogousslavsky J, Davalos A, Eliasziw M,
Fischer M, Furlan A, Kaste M, Lees KR, Soehngen M, Warach S. The Desmoteplase in Acute Ischemic Stroke Trial (DIAS). Stroke. 2005;36:66–73.
9. Warach S, Kaufman D, Chiu D, Devlin T, Luby M, Rashid A, Clayton L,
Kaste M, Lees KR, Sacco R, Fisher M. Effect of the glycine antagonist
gavestinel on cerebral infarcts in acute stroke patients, a randomized
placebo-controlled trial: the GAIN MRI Substudy. Cerebrovasc Dis.
2006;21:106 –111.
10. Martel AL, Allder SJ, Delay GS, Morgan PS, Moody AR. Measurement
of infarct volume in stroke patients using adaptive segmentation of
diffusion weighted MR images. Medical Image Computing and
Computer-Assisted Intervention—MICCAI ’99, LNCS. 1999;1679:22–31.
11. Ritzl A, Meisel S, Wittsack HJ, Fink GR, Siebler M, Modder U, Seitz RJ.
Development of brain infarct volume as assessed by magnetic resonance
imaging (MRI): follow-up of diffusion-weighted MRI lesions. J Magn
Reson Imaging. 2004;20:201–207.
12. Baird AE, Benfield A, Schlaug G, Siewert B, Lovblad KO, Edelman RR,
Warach S. Enlargement of human cerebral ischemic lesion volumes
measured by diffusion-weighted magnetic resonance imaging. Ann
Neurol. 1997;41:581–589.
13. Barber PA, Darby DG, Desmond PM, Yang Q, Gerraty RP, Jolley D, Donnan
GA, Tress BM, Davis SM. Prediction of stroke outcome with echoplanar
perfusion- and diffusion-weighted MRI. Neurology. 1998;51:418–426.
14. Chalela JA, Kang DW, Luby M, Ezzeddine M, Latour LL, Todd JW,
Dunn B, Warach S. Early magnetic resonance imaging findings in
patients receiving tissue plasminogen activator predict outcome: insights
into the pathophysiology of acute stroke in the thrombolysis era. Ann
Neurol. 2004;55:105–112.
15. Lee DK, Kim JS, Kwon SU, Yoo SH, Kang DW. Lesion patterns and
stroke mechanism in atherosclerotic middle cerebral artery disease: early
diffusion-weighted imaging study. Stroke. 2005;36:2583–2588.
16. Bland JM, Altman DG. Statistical methods for assessing agreement
between two methods of clinical measurement. Lancet. 1986;I:307–310.
Luby et al
TABLE I.
Sequence
DWI
MTT
FLAIR
Reliability of Ischemic Lesion Volume Measurements
Intrarater Volume (mL) Statistics by Time Point and Sequence—Cube Root of Raw Data
Absolute Volume Difference (mL)
Percent Deviation
Time Point
N
Mean⫾SD median (IQR)
Mean⫾SD median (IQR)
Spearman Coefficient
(raw volume data)
Acute
68
0.21⫾0.3 0.07 (0.03–0.28)
18.01⫾46.45 4.24 (0.02–0.09)
0.965
3-hour
62
0.17⫾0.28 0.08 (0.05–0.19)
10.34⫾26.22 4.24 (0.02–0.08)
0.961
24-hour
41
0.19⫾0.42 0.06 (0.02–0.17)
19.64⫾52.06 3.0 (0.01–0.08)
0.954
Acute
62
0.25⫾0.46 0.12 (0.06–0.25)
8.29⫾25.57 3.03 (0.01–0.06)
0.97
3-hour
38
0.2⫾0.27 0.08 (0.03–0.25)
9.78⫾32.37 2.85 (0.01–0.05)
0.989
24-hour
24
0.48⫾0.71 0.18 (0.04–0.42)
31.53⫾65.69 4.9 (0.01–0.19)
0.922
Days 4–145
46
0.13⫾0.15 0.08 (0.03–0.13)
26.17⫾61.79 3.12 (0.01–0.07)
0.991
7
Downloaded from http://stroke.ahajournals.org/ by guest on June 14, 2017
8
Stroke
December 2006
Downloaded from http://stroke.ahajournals.org/ by guest on June 14, 2017
Figure I. A through E, DWI sequence. A, Intrarater acute time point plot of mean volume versus volume difference between 2 sets of
measurements (raw numbers) where the thick black lines represent the upper and lower boundaries using ⫾2 SDs from the mean. B,
Intrarater 3-hour time point plot of mean volume versus volume difference between 2 sets of measurements (raw numbers) where the
thick black lines represent the upper and lower boundaries using ⫾2 SDs from the mean. C, Intrarater 24-hour time point plot of mean
volume versus volume difference between 2 sets of measurements (raw numbers) where the thick black lines represent the upper and
lower boundaries using ⫾2 SDs from the mean. D, Confirmatory intrarater acute time point plot of mean volume versus volume difference between 2 sets of measurements (raw numbers) where the thick black lines represent the upper and lower boundaries using ⫾2
SDs from the mean. E, Interrater acute time point plot of mean volume versus volume difference between 2 sets of measurements (raw
numbers) where the thick black lines represent the upper and lower boundaries using ⫾2 SDs from the mean. F through J, MTT
sequence. F, Intrarater acute time point plot of mean volume versus volume difference between 2 sets of measurements (raw numbers)
where the thick black lines represent the upper and lower boundaries using ⫾2 SDs from the mean. G, Intrarater 3-hour time point plot
of mean volume versus volume difference between 2 sets of measurements (raw numbers) where the thick black lines represent the
upper and lower boundaries using ⫾2 SDs from the mean. H, Intrarater 24-hour time point plot of mean volume versus volume difference between 2 sets of measurements (raw numbers) where the thick black lines represent the upper and lower boundaries using ⫾2
SDs from the mean. I, Confirmatory intrarater acute time point plot of mean volume versus volume difference between 2 sets of measurements (raw numbers) where the thick black lines represent the upper and lower boundaries using ⫾2 SDs from the mean. J, Interrater acute time point plot of mean volume versus volume difference between 2 sets of measurements (raw numbers) where the thick
black lines represent the upper and lower boundaries using ⫾2 SDs from the mean. K through M, FLAIR sequence. K, Intrarater 4-145–
day time point plot of mean volume versus volume difference between 2 sets of measurements (raw numbers) where the thick black
lines represent the upper and lower boundaries using ⫾2 SDs from the mean. L, Confirmatory intrarater 5–90 – day time point plot of
mean volume versus volume difference between 2 sets of measurements (raw numbers) where the thick black lines represent the upper
and lower boundaries using ⫾2 SDs from the mean. M, Interrater 5–90 – day time point plot of mean volume versus volume difference
between 2 sets of measurements (raw numbers) where the thick black lines represent the upper and lower boundaries using ⫾2 SDs
from the mean.
Luby et al
Reliability of Ischemic Lesion Volume Measurements
Downloaded from http://stroke.ahajournals.org/ by guest on June 14, 2017
Figure I. Continued
9
Intra- and Interrater Reliability of Ischemic Lesion Volume Measurements on
Diffusion-Weighted, Mean Transit Time and Fluid-Attenuated Inversion Recovery MRI
Marie Luby, Julie L. Bykowski, Peter D. Schellinger, José G. Merino and Steven Warach
Downloaded from http://stroke.ahajournals.org/ by guest on June 14, 2017
Stroke. published online November 2, 2006;
Stroke is published by the American Heart Association, 7272 Greenville Avenue, Dallas, TX 75231
Copyright © 2006 American Heart Association, Inc. All rights reserved.
Print ISSN: 0039-2499. Online ISSN: 1524-4628
The online version of this article, along with updated information and services, is located on the
World Wide Web at:
http://stroke.ahajournals.org/content/early/2006/11/02/01.STR.0000249416.77132.1a.citation
Permissions: Requests for permissions to reproduce figures, tables, or portions of articles originally published
in Stroke can be obtained via RightsLink, a service of the Copyright Clearance Center, not the Editorial Office.
Once the online version of the published article for which permission is being requested is located, click
Request Permissions in the middle column of the Web page under Services. Further information about this
process is available in the Permissions and Rights Question and Answer document.
Reprints: Information about reprints can be found online at:
http://www.lww.com/reprints
Subscriptions: Information about subscribing to Stroke is online at:
http://stroke.ahajournals.org//subscriptions/