COPYRIGHT © BY THE JOURNAL OF BONE AND JOINT SURGERY, INCORPORATED
FLANIGAN ET AL.
INTERRATER AND INTRARATER RELIABILITY OF ARTHROSCOPIC MEASUREMENTS
OF ARTICULAR CARTILAGE DEFECTS IN THE KNEE
http://dx.doi.org/10.2106/JBJS.16.01132
Appendix E-1: A Statistical Model for ICCs for a Measurement Method
The analysis of our goals required the development of an appropriate model that provides
a parsimonious representation for the various sources of variation that may contribute to the
variability in the 1,200 measurements made (10 knees, evaluated by 3 raters, using 4
measurement methods, at 5 locations within each knee, on 2 occasions). Initial exploratory
analyses suggested that the measurement error depended on the location and method used.
Accordingly, we considered the mixed-effect linear model:
$$Y_{ijk} - x_i = \alpha + \beta x_i + A_i + B_j + \varepsilon_{ijk}, \quad i = 1 \text{ to } 10,\; j = 1 \text{ to } 3,\; k = 1, 2$$
where $x_i$ is the gold-standard value for the i-th knee, $A_i$ is the random knee effect that accounts
for the correlation between multiple measurements made on the randomly chosen knee, $B_j$ is the
random rater effect that accounts for the correlation between multiple measurements made by the
randomly chosen rater, and $\varepsilon_{ijk}$ is the random measurement error associated with the k-th
replicate taken on the i-th knee by the j-th rater. It is generally assumed that $A_i$, $B_j$, and $\varepsilon_{ijk}$ are
independent and normally distributed with mean 0 and variances $\sigma_A^2$, $\sigma_B^2$, and $\sigma_e^2$. With these
model assumptions, and with $\sigma_x^2$ as the variance of the gold-standard values that are being
measured,
$$\mathrm{ICC}_1 = \frac{\beta^2 \sigma_x^2 + \sigma_A^2 + \sigma_B^2}{\beta^2 \sigma_x^2 + \sigma_A^2 + \sigma_B^2 + \sigma_e^2}$$
is the ICC for the measurements for the same knee made by a single rater (intrarater reliability)
and
$$\mathrm{ICC}_2 = \frac{\beta^2 \sigma_x^2 + \sigma_A^2}{\beta^2 \sigma_x^2 + \sigma_A^2 + \sigma_B^2 + \sigma_e^2}$$
is the ICC representing the interrater reliability that measures the correlation between the
measurements on the same knee by 2 randomly chosen raters.
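As a minimal sketch of how the two coefficients above follow from the variance components, the short Python function below evaluates both ratios directly. The function name, the parameter names, and the numeric inputs are hypothetical and chosen only for illustration; they are not estimates from this study, and the variance components themselves would have to come from fitting the mixed model in statistical software.

```python
def icc_from_components(var_knee, var_rater, var_error, beta=0.0, var_gold=0.0):
    """Return (ICC1, ICC2) from variance components.

    var_knee  -> sigma_A^2 (random knee effect)
    var_rater -> sigma_B^2 (random rater effect)
    var_error -> sigma_e^2 (replicate measurement error)
    beta, var_gold -> slope beta and variance sigma_x^2 of the gold-standard
                      values; leaving both at 0 drops the beta^2 * sigma_x^2 term.
    All names are illustrative assumptions, not taken from the paper.
    """
    if min(var_knee, var_rater, var_error, var_gold) < 0:
        raise ValueError("variance components must be non-negative")

    # Variance shared by any two measurements of the same knee.
    between_knee = beta ** 2 * var_gold + var_knee
    total = between_knee + var_rater + var_error
    if total <= 0:
        raise ValueError("total variance must be positive")

    icc1 = (between_knee + var_rater) / total  # same knee, same rater
    icc2 = between_knee / total                # same knee, two different raters
    return icc1, icc2


# Illustrative (made-up) inputs: beta^2 * sigma_x^2 = 0.25 * 8 = 2.
icc1, icc2 = icc_from_components(
    var_knee=4.0, var_rater=1.0, var_error=5.0, beta=0.5, var_gold=8.0
)
```

Because the rater variance appears in the numerator of ICC1 but not of ICC2, the interrater coefficient computed this way can never exceed the intrarater coefficient.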
Appendix E-2: Overall Intrarater and Interrater Reliability Coefficients
Overall intrarater reliability (ICC1) and interrater reliability (ICC2) were computed from all
femoral ratings, without the gold-standard values and without the tibial measurements, using the
following model.
Model:
$$Y_{ijk} = \mu + A_i + B_j + \varepsilon_{ijk}, \quad i = 1 \text{ to } 40,\; j = 1 \text{ to } 3,\; k = 1, 2$$
where $Y_{ijk}$ is the response corresponding to the i-th knee location (40 distinct knee locations,
excluding the 10 tibial locations), $A_i$ is the random knee-location effect that accounts for the
correlation between multiple measurements made on the randomly chosen knee location, $B_j$ is the
random rater effect that accounts for the correlation between multiple measurements made by the
randomly chosen rater, and $\varepsilon_{ijk}$ is the random measurement error associated with the k-th replicate taken
on the i-th knee location by the j-th rater. It is generally assumed that $A_i$, $B_j$, and $\varepsilon_{ijk}$ are
independent and normally distributed with mean 0 and variances $\sigma_A^2$, $\sigma_B^2$, and $\sigma_e^2$. With these
model assumptions, the intraclass correlation (ICC1) for measuring intrarater reliability is given
by
$$\mathrm{ICC}_1 = \frac{\sigma_A^2 + \sigma_B^2}{\sigma_A^2 + \sigma_B^2 + \sigma_e^2}$$
and the ICC2 that measures the interrater reliability of the measurements of the same knee
location by 2 randomly chosen raters is given by
$$\mathrm{ICC}_2 = \frac{\sigma_A^2}{\sigma_A^2 + \sigma_B^2 + \sigma_e^2}.$$