HB talk slides

Assessing Individual Agreement
Huiman X. Barnhart, Ph.D.
[email protected]
Associate Professor
Department of Biostatistics and Bioinformatics
Duke Clinical Research Institute
Duke University
Andrzej S. Kosinski, Ph.D.
Duke University
Michael Haber, Ph.D.
Department of Biostatistics, Emory University
This research is funded by R01 MH70028
Outline
• Introduction
• Data Examples
• Individual equivalence and individual agreement
• Individual equivalence criterion (IEC) and coefficient of
individual agreement (CIA)
• Application to data examples
• Discussion and Future research
Introduction
Accurate and precise measurement is important in clinical
diagnosis. A method comparison study or reliability study is
usually conducted to evaluate agreement between methods or
observers. We are often interested in
• Whether the methods/observers can be used interchangeably
at individual level
• Whether a new method that is easy to use can replace an
existing standard method that may be expensive or invasive
at individual level.
Introduction
Traditionally, if there is no reference method, assessing agreement
has been based on
• Intraclass correlation coefficient (ICC) (Inter- and Intra-ICC)
2
σB
Between-subject variability
ICC = 2
=
2
σB + σW
Total variability
for model Yij = αi + eij , j = 1, . . . , J
• Concordance correlation coefficient (CCC)
CCC
E(Yi1 − Yi2 )2
= 1−
E(Yi1 − Yi2 )2 |Yi1 , Yi2 are independent)
2σB1 σB2 ρµ
=
2 + σ2
2 + σ2
2
σB1
+
σ
+
(µ
−
µ
)
1
2
W1
B2
W2
for model Yij = µij + eij , j = 1, . . . , J
2
, ICC and CCC
• With fixed within-subject variability σW
2
increase as between-subject variability σB
increases.
10
6
4
2
0
2
4
6
8
10
0
10
0
2
4
6
8
10
6
8
10
ICC=CCC=0.97
6
4
2
0
0
2
4
6
8
ICC=CCC=0.91
8
10
ICC=CCC=0.8
8
10
0
2
4
6
8
ICC=CCC=0.0
0
2
4
6
8
10
0
2
4
Goniometer Example
n=29, K=3, J=2
Table 1. Comparison of Manual Goniometer and Electrogoniometer
Manual Goniometer Electro-goniometer
µ
1.437
0.046
2
σW
0.736
0.977
2
σB
53.8
51.4
Intra ICC
0.986
0.982
Inter ICC
0.961
CCC
0.945
Bland and Altman Blood Pressure example
(n=85, K=3, J=3)
µ
2
σW
2
σB
Intra ICC
Inter ICC
CCC
Observer 1
127.4
37.4
936.0
0.962
Observer 2
127.3
38.0
917.1
0.960
0.875
0.782
Machine
143.0
83.1
983.2
0.922
Introduction
• It is questionable whether ICC or CCC is adequate in
assessing agreement at individual level.
• We define “Individual Agreement” in the following sense:
individual difference between measurements from different
methods is close to the
difference between replicated measurements within a method
10
10
Method 2 replications: 5,6,5,6
Method 2 replications: 3,4,3,4
•
•
6
•
•
•
•
2
•
0
0
2
4
•
4
6
8
Method 1 replications: 7,7,8,8
8
Method 1 replications: 5,5,6,6
0
2
4
6
8
10
0
2
4
6
8
10
Individual Bioequivalence
• “Individual agreement” here is similar to individual
bioequivalence in bioequivalence studies that assess agreement
between a test drug against reference drug.
• Individual bioequivalence was first introduced by Anderson
and Hauck (1990) using probability criterion.
• Schall and Luus (1993) extended to general case that includes
the probability and moment criteria as special cases.
• FDA modified and adopted the moment criterion in their
guidelines (2001) for establishing individual bioequivalence
(IBE).
Individual Agreement
• The moment criterion for IBE compares the (expected)
squared individual difference between new and standard drugs
to the (expected) squared individual difference within
standard drug
• We propose to assess individual agreement by comparing the
(expected) sqaured individual difference between methods to
the (expected) squared individual difference within mehtod
due to replication.
Two cases are considered and compared:
(1) Existence of a reference method (or gold standard
measured with error)
(2) There is no reference method.
Individual Bioequivalence Criterion
FDA (2001)
Existence of a reference
• Reference-scaled IBC
E(YiT − YiR )2 − E(YiR − YiR0 )2
IBC =
≤ θI
2
E(YiR − YiR0 ) )/2
θI is the bioequivalence limit set by the regulatory agency.
• Yij = µij + ij , j =T (test drug), R (reference drug)
Within-subject: σ 2
= V ar(iT ), σ 2
= V ar(iR )
WT
WR
Between-subject: σ 2
= V ar(µiT ), σ 2
= V ar(µiR ).
BT
BR
Subject-by-formulation interaction: σ 2 = V ar(µiT − µiR )
D
2
2
2
−
σ
+ σW
(µT − µR )2 + σD
WR
T
≤ θI
IBC =
2
σW R
Proposed Individual Equivalence Criterion (IEC)
Existence of a reference
For a total of J methods with first J − 1 new methods and J
method as a reference
PJ−1
2
2
0)
(
E(Y
−
Y
)
)/(J
−
1)
−
E(Y
−
Y
ij
iJ
iJk
iJk
j=1
R
IEC =
≤ θI
2
E(YiJk − YiJk0 ) /2
PJ−1 2
PJ−1 2
PJ−1
2
j=1
IEC R =
(µj −µJ )
J−1
+
j=1
σD
jJ
J−1
2
σW
J
+
j=1
σW j
J−1
2
where Yij = µij + ij with σD
= V ar(µij − µiJ ).
jJ
2
− σW
J
≤ θI
Coefficient of Individual Agreement (CIA)
Existence of a reference
2
Within-subject variability
σW
J
=
.
CIA = 2
2
τ∗R + σ∗R
Total variability between methods
R
2
“True” inter-method variability: τ∗R
=
E(
PJ−1
j=1
2
Weighted within-method variability: σ∗R
= 21 (
2
CIA =
IEC R + 2
R
(µij −µiJ )2 /2)
PJ−1
J−1
j=1
2
σW
j
J−1
2
+ σW
J)
Proposed Individual Equivalence Criterion (IEC)
No reference
For J new methods
P
PJ−1 PJ
2
E(Yijk −Yijk0 )2
E(Yij −Yij 0 ) )
0
j
j=1
j =j+1
−
2
J(J−1)
J
N
P
IEC =
≤ θI .
2
j
E(Yijk −Yijk0 )
2J
IEC N
2τ∗2
= 2
σ∗
IEC N and CIAN
No reference
N
• CIA
=ψ=
σ∗2
τ∗2 +σ∗2
=
Average within-subject variability
Total variability between methods
(Haber et al, 2005)
• CIAN =
2
IEC N +2 .
• In general, IEC N ≤ IEC R and CIAR ≤ CIAN
Guideline on IEC and CIA values
• In general, we want to have low value of IEC and high value
of CIA to claim satisfactory individual agreement.
• FDA’s boundary: θI = 2.4948. This corresponds to
IEC ≤ 2.4948 or
CIA ≥ 0.445
i.e., the inter-method variability (τ∗2 ) is within 125% of the
within-subject variance (σ∗2 ).
Estimation and Inference
Based on method of moment,
• Existence of a reference
ˆ
IEC
R
2
2
− M SEW J )
+ σ̂∗R
2(τ̂∗R
=
,
M SEW J
PJ−1
2
σ̂∗R
M SEW j
j=1
=(
J −1
R
M SEW J
ˆ
CIA = 2
.
2
τ̂∗R + σ̂∗R
+ M SEW J )/2.
Yijk = µ + αi + ijk , j = 1, . . . , J.
2
τ̂∗R
=
P
j
(M SjJ − M SEjJ )
K(J − 1)
Yijk = µ + αi + γij + ijk , j = j,
or
.
J
Estimation and Inference
Based on method of moment,
• No reference
N
2(M S − M SE)
ˆ
IEC =
K ∗ M SE
N
K ∗ M SE
ˆ
CIA =
.
M S + (K − 1) ∗ M SE
τ̂∗2 =
M S − M SE
,
K
and
σ̂∗2 = M SE.
Yijk = µ + αi + γij + ijk ,
• Large sample distribution is possible. We use the bootstrap
percentile method for one sided confidence bound
Goniometer Example
n=29, K=3, J=2
Table 3. Comparison of Manual Goniometer and Electrogoniometer
Manual Goniometer
Electro-goniometer
µ
1.437
0.046
2
0.736
0.977
σW
2
σB
53.8
51.4
Intra ICC
0.986
0.982
IEC
CIA
Inter ICC
CCC
Individual Agreement
Reference: Manual Goniometer
No reference
estimate (95% bound)
Estimate (95% bound)
6.120 (9.896 upper)
4.975 (8.521 upper)
0.246 (0.168 lower)
0.287 ( 0.190 lower)
–
–
0.961
0.945
Table 4. Comparison of Observers and Automatic Machine in Measuring
Blood Pressure (n=85, K=3, J=3)
µ
2
σW
2
σB
Intra ICC
CIA
inter ICC
CCC
CIA
inter ICC
CCC
CIA
inter ICC
CCC
Observer 1
127.4
37.4
936.0
0.962
Observer 2
Machine
127.3
143.0
38.0
83.1
917.1
983.2
0.960
0.922
Individual Agreement
Reference: Observers
No reference
Overall results
0.111 (0.071 lower)
0.225 (0.152 lower)
–
0.875
–
0.782
Pairwise: Observer 1 vs. Observer 2
–
1.44 (1.426 lower)
–
0.989
–
0.973
Pairwise: Machine vs. observer 1
0.110 (0.070 lower)
0.178 (0.117 lower)
–
0.834
–
0.701
Discussion and Future Research
We proposed CIA for assessing individual agreement of continuous
measures between multiple methods for scenarios of existing
reference or no reference.
• Asymptotic distribution of CIA
• Covariates impact on CIA
• How does CIA relates to ICC/CCC?
• How to quantify the impact of between-subject
variability?
Discussion and Future Research
• How to assess individual agreement for binary or discrete
measures?
How does the CIA differ from Kappa for binary case?
• Replicated measures may be difficult to obtain. Repeated
measures over time are often taken.
How to assess (individual) agreement with repeated
measures data?
• How to design an agreement study with proper sample size
computation?
• How does the measurement error impact different study
designs?
Two group design or Matched design