Diagnostic Tests Issues with Incomplete or Sparse Data
Andrzej S. Kosinski
Department of Biostatistics and Bioinformatics
Duke University
and Duke Clinical Research Institute
[email protected]
Joint work with Huiman X. Barnhart
Patient Care and Outcomes Research Grant sponsored by American Heart
Association (AHA) Pharmaceutical Roundtable and Grant in Aid from AHA
NCSU. April 27, 2006
OUTLINE
• Dichotomous test measure
– Background and data
– Possible verification bias problem — missing data problem
– Point estimate approach with assumptions
– Two dimensional region estimate approach with no assumptions —
Test Ignorance Region
• Continuous test measure
– Receiver Operating Characteristic (ROC) curve
– Model derived predictive values
• Possible future work
BACKGROUND
Diagnostic test T (1=positive, 0=negative) evaluated by comparison with
the gold standard D (1=disease, 0=no disease).
            D=1     D=0
T = 1       A1      B1
T = 0       A2      B2
Performance of a test often measured by:

Sensitivity  P (T = 1|D = 1), estimated by A1 / (A1 + A2)
Specificity  P (T = 0|D = 0), estimated by B2 / (B1 + B2)
• Sensitivity and specificity are numbers between 0 and 1.
• Higher values are better.
• A random coin toss: sensitivity=0.5 and specificity=0.5.
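As a minimal illustration of these two definitions (the counts below are hypothetical, not from any study discussed in this talk), the estimates in Python:

```python
def sens_spec(A1, A2, B1, B2):
    """Estimate sensitivity and specificity from a fully verified 2x2 table.

    A1, A2: diseased (D=1) subjects with T=1 and T=0.
    B1, B2: non-diseased (D=0) subjects with T=1 and T=0.
    """
    sensitivity = A1 / (A1 + A2)   # P(T=1 | D=1)
    specificity = B2 / (B1 + B2)   # P(T=0 | D=0)
    return sensitivity, specificity

# Hypothetical counts: 90 true positives, 10 false negatives,
# 20 false positives, 80 true negatives
print(sens_spec(A1=90, A2=10, B1=20, B2=80))   # (0.9, 0.8)
```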
PROBLEM
The decision to verify often depends on:
• Recorded variables related to the disease status
• Unrecorded variables related to disease, and hence on the disease status itself.
⇒ Verified patients may be a biased sample from the population of interest.
⇒ Sensitivity and specificity estimates based only on complete data may be
biased.
⇒ Verification bias (also called work-up bias)
Ransohoff and Feinstein (1978, New England Journal of Medicine)
DICHOTOMOUS TEST DATA LAYOUT
Result of    Disease verified (R=1)        Disease not verified (R=0)
test T       D=1          D=0
T = 1        a1           b1               u1
T = 0        a2           b2               u2
T     D     R     frequency
1     1     1     a1
1     0     1     b1
0     1     1     a2
0     0     1     b2
1     NA    0     u1
0     NA    0     u2
⇒ Disease status D is missing for unverified patients: a missing data problem.
MISSING AT RANDOM (MAR) ASSUMPTION
P (R|D, T, X) = P (R|T, X) does not depend on D
Begg and Greenes (Biometrics, 1983)
Diamond (American Journal of Cardiology, 1986)
Cecil, Kosinski, Jones et al. (Journal of Clinical Epidemiology, 1996)
NOT IGNORABLE (NI) SITUATION
P (R|D, T, X) depends on D
Zhou (Communications in Statistics - Theory and Methods, 1993)
Baker (Biometrics, 1995)
Kosinski and Barnhart (Biometrics, 2003)
REGRESSION MODEL FRAMEWORK
Kosinski and Barnhart (Biometrics, 2003). Continuous or categorical
covariates.
L_obs = ∏_{i=1}^{N} [ P (Ri , Ti , Di |xi ) ]^{Ri} [ P (Ri , Ti |xi ) ]^{1−Ri}

      = ∏_{i=1}^{N} [ P (Di |xi ) P (Ti |Di , xi ) P (Ri |Ti , Di , xi ) ]^{Ri}
                  × [ Σ_{d=0}^{1} P (Di = d|xi ) P (Ti |Di = d, xi ) P (Ri |Ti , Di = d, xi ) ]^{1−Ri}
Disease component :                  logit P (Di = 1| xi ) = α′ z0i
Test component :                     logit P (Ti = 1| Di , xi ) = β′ z1i
Missing data mechanism component :   logit P (Ri = 1| Di , Ti , xi ) = γ′ z2i
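As one concrete, simplified illustration of this observed-data likelihood, the sketch below fits the dichotomous model with no covariates and a MAR missing data mechanism that depends on T only; letting the verification probability depend on D as well gives an NI model, but without covariates that saturated version is not identified from the six observed cell counts. The cell counts are the SPECT data shown on a later slide; this is an illustrative sketch, not the authors' software.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # inverse logit

# Observed cell counts (verified a1, b1, a2, b2 and unverified u1, u2); SPECT data
c = dict(a1=195, b1=232, a2=5, b2=39, u1=996, u2=1221)

def neg_loglik(theta):
    # Parameters on the logit scale: prevalence, sensitivity, specificity,
    # and P(R=1|T=1), P(R=1|T=0) (MAR: verification depends on T only).
    prev, sens, spec, r1, r0 = expit(theta)
    p11, p10 = prev * sens, (1 - prev) * (1 - spec)       # P(T=1, D=1), P(T=1, D=0)
    p01, p00 = prev * (1 - sens), (1 - prev) * spec       # P(T=0, D=1), P(T=0, D=0)
    ll  = c['a1'] * np.log(p11 * r1) + c['b1'] * np.log(p10 * r1)
    ll += c['a2'] * np.log(p01 * r0) + c['b2'] * np.log(p00 * r0)
    ll += c['u1'] * np.log((p11 + p10) * (1 - r1))        # D unobserved: sum over d
    ll += c['u2'] * np.log((p01 + p00) * (1 - r0))
    return -ll

fit = minimize(neg_loglik, x0=np.zeros(5), method='BFGS')
prev, sens, spec = expit(fit.x[:3])
print(round(sens, 3), round(spec, 3))   # about 0.819 and 0.592 (cf. model M-1 later)
```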
EXAMPLES OF MISSING DATA MECHANISM COMPONENT
MAR assumption
logit P (Ri = 1| Di , Ti , xi ) = γ0 + γ1 Ti + γ2 x1
Disease (D) not included as a variable.
NI (non-ignorable) missing data mechanism
logit P (Ri = 1| Di , Ti , xi ) = γ0 + γ1 Ti + γ2 Di + γ3 x1
“Non-differential non-ignorability” with respect to T and X.
logit P (Ri = 1| Di , Ti , xi ) = γ0 + γ1 Ti + γ2 Di + γ3 Ti Di + γ4 x1
“Differential non-ignorability” with respect to T .
“Non-differential non-ignorability” with respect to X.
SPECT DATA
Cecil, Kosinski, Jones et al. (1996, Journal of Clinical Epidemiology)
T: single-photon-emission computed tomography (SPECT) thallium stress test
(non-invasive diagnostic test)
D: coronary angiography (invasive gold standard)
Result of    Disease D verified (R=1)      Disease D not
test T       D=1          D=0              verified (R=0)    Total
T = 1        195          232              996               1423
T = 0          5           39              1221              1265
Total                                      2217              2688

82% = 2217/2688 not verified
“Naive” SENS = 195 / (195 + 5) = 97.5%
“Naive” SPEC = 39 / (39 + 232) = 14.4%
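A quick numerical check of these complete-case ("naive") figures, using only the verified cells:

```python
a1, b1, a2, b2 = 195, 232, 5, 39          # verified SPECT cells only
naive_sens = a1 / (a1 + a2)               # 195 / 200  = 0.975
naive_spec = b2 / (b1 + b2)               # 39  / 271 ≈ 0.144
print(f"naive sensitivity = {naive_sens:.3f}, naive specificity = {naive_spec:.3f}")
```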
SPECT DATA MODELS
Missing data mechanism component: logit P (R = 1| D, T, X1 , X2 , X3 )

               M-1 (MAR model)        M-2 (MAR model)        M-3 (NI model)
               γ       SE     P       γ       SE     P       γ       SE     P
Int            -3.323  0.15   <.001   -3.423  0.17   <.001   -3.315  0.37   <.001
T               2.476  0.16   <.001    2.476  0.17   <.001    3.183  0.45   <.001
Gender          —      —      —       -0.022  0.11   0.85     0.350  0.19   0.07
Stress mode     —      —      —       -0.176  0.12   0.13     0.114  0.17   0.50
Age ≥ 60        —      —      —        0.400  0.11   <.001    0.739  0.17   <.001
D               —      —      —       —       —      —       -2.054  1.02   0.043

Marginal estimate
        “Naive”   M-1                  M-2                  M-3
Sens    97.5%     81.9% (69.5, 94.3)   80.4% (68.6, 92.3)   65.3% (46.4, 84.1)
Spec    14.4%     59.2% (55.4, 63.0)   58.5% (54.8, 62.2)   64.5% (55.8, 73.3)
REGRESSION MODEL FRAMEWORK
• A flexible and general modeling framework
• Allows for categorical and continuous covariates
General Questions :
• Can observed data provide evidence for non-ignorability?
• Can we “test” for non-ignorability?
• Maybe all we can do is to fit NI models as a plausible alternative to a
MAR model.
“SOLUTION” TO PARTIAL VERIFICATION OF DISEASE
Assumptions, models, a Bayesian approach . . .
provide a point estimate (identifiability) for sensitivity and specificity.
BUT
• The data that could be used to check the assumptions are missing.
• Goodness-of-fit for models can be checked against the observed data only.
• The Bayesian approach is not a classic Bayesian situation: the observed data may not
be able to improve on the prior information, regardless of the sample size.
In the end one may need, or choose, to use assumptions, but we recommend starting
with a global sensitivity analysis first.
GLOBAL SENSITIVITY ANALYSIS
Consider ALL combinations of sensitivity and specificity plausible under the
observed data.
Test Ignorance Region (TIR)
“Ignorance” due to incompleteness of disease status verification.
Kosinski and Barnhart (Statistics in Medicine, 2003)
Horowitz and Manski (JASA, 2000)
Molenberghs et al. (Applied Statistics, 2001).
Result of    Disease verified (R=1)        Disease not verified (R=0)
test T       D=1          D=0              D=1        D=0
T = 1        a1           b1                     u1
T = 0        a2           b2                     u2

Consider the unknown p1 and p2 :
p1 = P (D = 1|T = 1, R = 0)
p2 = P (D = 1|T = 0, R = 0)
IDEALIZED COMPLETE VERIFICATION DATA
Result of    Disease verified (R=1)        Disease not verified (R=0)
test T       D=1          D=0              D=1            D=0
T = 1        a1           b1               p1 u1          u1 − p1 u1
T = 0        a2           b2               p2 u2          u2 − p2 u2
IDEALIZED COMPLETE VERIFICATION DATA
Result of    Disease D
test T       D=1                 D=0
T = 1        a1 + p1 u1          b1 + (1 − p1 )u1
T = 0        a2 + p2 u2          b2 + (1 − p2 )u2

p1 = P (D = 1|T = 1, R = 0)        p2 = P (D = 1|T = 0, R = 0)
SENS ≡ f1 (p1 , p2 ) = (a1 + p1 u1 ) / (a1 + p1 u1 + a2 + p2 u2 )

SPEC ≡ f2 (p1 , p2 ) = (b2 + (1 − p2 )u2 ) / (b1 + (1 − p1 )u1 + b2 + (1 − p2 )u2 )

(p1 , p2 ) ∈ [0, 1] × [0, 1].
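As a sketch of how the TIR can be traced numerically, one can evaluate f1 and f2 on a grid over the unit square; the SPECT counts are used here for concreteness.

```python
import numpy as np

a1, b1, u1 = 195, 232, 996    # T=1: verified D=1, verified D=0, unverified
a2, b2, u2 = 5, 39, 1221      # T=0

def f(p1, p2):
    """Sensitivity f1 and specificity f2 of the idealized complete data."""
    sens = (a1 + p1 * u1) / (a1 + p1 * u1 + a2 + p2 * u2)
    spec = (b2 + (1 - p2) * u2) / (b1 + (1 - p1) * u1 + b2 + (1 - p2) * u2)
    return sens, spec

grid = np.linspace(0, 1, 201)             # grid over [0,1] x [0,1]
p1, p2 = np.meshgrid(grid, grid)
sens, spec = f(p1, p2)                    # every (sens, spec) pair lies in the TIR
print(sens.min(), sens.max())             # sensitivities consistent with the observed data
print(spec.min(), spec.max())             # specificities consistent with the observed data
```

Plotting the resulting (sensitivity, specificity) pairs draws the Test Ignorance Region shown on the next slide.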
SPECT DATA
[Figure, two panels. Left: the unit square of "ignorance" about p1 = P(D=1|T=1,R=0) and p2 = P(D=1|T=0,R=0). Right: the corresponding Test Ignorance Region in the sensitivity-specificity plane.]
TIR FOR SPECT DATA

          Sensitivity   Specificity
A: MCAR   0.975         0.144
B: MAR    0.819         0.592

[Figure: the Test Ignorance Region in the sensitivity-specificity plane, with points A (MCAR) and B (MAR) marked and the images of the corners (p1=0,p2=0), (p1=1,p2=0), (p1=0,p2=1), (p1=1,p2=1) labelled.]
POSSIBLE PARAMETERIZATIONS
• Pattern mixture model approach: P (D, T, R) = P (D|T, R)P (T, R)
p1 = P (D = 1|T = 1, R = 0)
p2 = P (D = 1|T = 0, R = 0)
(p1 , p2 ) ∈ [0, 1] × [0, 1]
• Selection model approach: P (D, T, R) = P (R|T, D)P (T, D)
π1 = P (R = 1|T = 1, D = 1)
π2 = P (R = 1|T = 0, D = 1)
(π1 , π2 ) ∈ [a1 /(a1 + u1 ), 1] × [a2 /(a2 + u2 ), 1]
p1 u1 = a1 (1 − π1 )/π1
p2 u2 = a2 (1 − π2 )/π2
• Other parameterizations can be considered: Zhou 1993, odds ratios, etc.
(A numerical conversion between the two parameterizations above is sketched below.)
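A small sketch of the correspondence between the two parameterizations, using the identities above; the π values chosen here are arbitrary illustrations within the allowed ranges.

```python
a1, u1 = 195, 996     # T=1: verified with D=1, unverified
a2, u2 = 5, 1221      # T=0

def p_from_pi(pi1, pi2):
    """Map selection-model parameters (pi1, pi2) to pattern-mixture (p1, p2)."""
    p1 = a1 * (1 - pi1) / pi1 / u1      # from p1*u1 = a1*(1 - pi1)/pi1
    p2 = a2 * (1 - pi2) / pi2 / u2
    return p1, p2

# Arbitrary verification probabilities among the diseased, within the allowed ranges
print(p_from_pi(pi1=0.3, pi2=0.1))      # the implied (p1, p2) point in the unit square
```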
POSSIBLE PARAMETERIZATIONS
The choice of parameterization does not modify the TIR.
Assumptions can be equivalently expressed.
For example, the MAR estimates result from the assumptions
• p1 = a1 /(a1 + b1 ) and p2 = a2 /(a2 + b2 ),
• or, equivalently, π1 = (a1 + b1 )/n1 and π2 = (a2 + b2 )/n2 ,
where n1 = a1 + b1 + u1 and n2 = a2 + b2 + u2 are the T = 1 and T = 0 totals.
(A numerical check with the SPECT data follows below.)
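Plugging these MAR values of p1 and p2 into f1 and f2 from the TIR construction reproduces the marginal MAR estimates reported for the SPECT data:

```python
a1, b1, u1 = 195, 232, 996
a2, b2, u2 = 5, 39, 1221

p1 = a1 / (a1 + b1)     # MAR: P(D=1 | T=1, R=0) = P(D=1 | T=1, R=1)
p2 = a2 / (a2 + b2)     # MAR: P(D=1 | T=0, R=0) = P(D=1 | T=0, R=1)

sens = (a1 + p1 * u1) / (a1 + p1 * u1 + a2 + p2 * u2)
spec = (b2 + (1 - p2) * u2) / (b1 + (1 - p1) * u1 + b2 + (1 - p2) * u2)
print(round(sens, 3), round(spec, 3))    # 0.819 and 0.592, the MAR point B on the TIR
```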
“SOLUTION” TO PARTIAL VERIFICATION OF DISEASE
Test Ignorance Region (TIR) — No assumptions.
• More missingness results in more ignorance.
• Maybe it is fine to settle on a non-identifiable model? A region estimate
may be informative enough... Do we always need point estimates?
• Magnitude of non-identifiability? Does the size of the region reflect the amount
of ignorance due to the missing data?
We suggest separating
• the information provided by the observed data (region estimate) from
• the “information” provided by assumptions.
Only “information by assumption” allows one to “shrink” a purely data-based
region estimate into a point estimate.
SPECT DATA — Assumption about p1 and p2

[Figure: an assumed sub-region of (p1, p2) values in the unit square (left) and the corresponding sub-region of the Test Ignorance Region in the sensitivity-specificity plane (right).]
SPECT DATA — Assumption about disease prevalence

         D=1    D=0
T=1      195    232
T=0        5     39
R=0      996   1221

Possible disease prevalence range: 0.074 − 0.899

[Figure: "ignorance" about p1 and p2 (left) and the Test Ignorance Region (right), with the disease prevalence restricted to the range 0.074 − 0.200.]
[Two further versions of the same figure, with the disease prevalence restricted to 0.074 − 0.300 and to 0.200 − 0.300.]
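A sketch of how a prevalence assumption shrinks the TIR numerically: keep only the grid points whose implied disease prevalence falls in the assumed range (SPECT counts; the 0.074-0.200 restriction shown above is used as the example).

```python
import numpy as np

a1, b1, u1 = 195, 232, 996
a2, b2, u2 = 5, 39, 1221
n = a1 + b1 + u1 + a2 + b2 + u2                       # 2688 subjects in total

grid = np.linspace(0, 1, 201)
p1, p2 = np.meshgrid(grid, grid)
sens = (a1 + p1 * u1) / (a1 + p1 * u1 + a2 + p2 * u2)
spec = (b2 + (1 - p2) * u2) / (b1 + (1 - p1) * u1 + b2 + (1 - p2) * u2)
prev = (a1 + p1 * u1 + a2 + p2 * u2) / n              # implied disease prevalence

keep = (prev >= 0.074) & (prev <= 0.200)              # assumed prevalence range
print(sens[keep].min(), sens[keep].max())             # sensitivity range in the restricted TIR
print(spec[keep].min(), spec[keep].max())             # specificity range in the restricted TIR
```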
• General regression based approach
Point estimate BUT assumptions are needed
• Test Ignorance Region (TIR)
No assumptions BUT a region estimate
The TIR provides a fair summary of the information in the data set.
Overlaying assumption-derived point or sub-region estimates on the TIR gives an explicit
statement of the amount of information induced by a model or an assumption.
SPECT DATA
Result of    Disease D verified (R=1)      Disease D not
test T       D=1          D=0              verified (R=0)    Total
T = 1        195          232              996               1423
T = 0          5           39              1221              1265
SPECT DATA — No disease information (or latent disease)
Result of    Disease D verified (R=1)      Disease D not
test T       D=1          D=0              verified (R=0)    Total
T = 1          0            0              1423              1423
T = 0          0            0              1265              1265
SPECT DATA — Latent disease
         D=1    D=0
T=1        0      0
T=0        0      0
R=0     1423   1265

Possible disease prevalence range: 0.000 − 1.000

[Figure: "ignorance" about p1 and p2 (left) and the resulting Test Ignorance Region (right) when no disease verification is available.]
SPECT DATA — Latent disease - Assumption about prevalence

(Same latent-disease data as above; possible disease prevalence range 0.000 − 1.000.)

[Figure: the Test Ignorance Region with the disease prevalence restricted to 0.074 − 0.899.]
[The same figure with the disease prevalence restricted to 0.200 − 0.300.]
Data with continuous test measure

Result of    Disease verified (R=1)        Disease not verified (R=0)
test T       D=1          D=0              D=1        D=0
T = t1       a1           b1                     u1
T = t2       a2           b2                     u2
 ...          ...          ...                   ...
T = tN       aN           bN                     uN
Consider the unknown pi = P (D = 1|T = ti , R = 0), i = 1, 2, . . . , N.
IDEALIZED COMPLETE VERIFICATION DATA
Result of    Disease verified (R=1)        Disease not verified (R=0)
test T       D=1          D=0              D=1            D=0
T = t1       a1           b1               p1 u1          u1 − p1 u1
T = t2       a2           b2               p2 u2          u2 − p2 u2
 ...          ...          ...              ...            ...
T = tN       aN           bN               pN uN          uN − pN uN
Receiver Operating Characteristic (ROC) curve results from a plot of sensitivity
versus (1-specificity) for all cutpoints ti .
The Area Under the Curve (AUC) is then computed: the closer it is to 1, the better.
MAR estimate for ROC (Zhou, Biometrics, 1996) considers pi = ai /(ai + bi )
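A sketch of the MAR-filled ROC/AUC computation from the grouped layout above. The counts a_i, b_i, u_i below are hypothetical; the fill-in p_i = a_i/(a_i + b_i) follows the MAR assumption just described.

```python
import numpy as np

# Hypothetical grouped data at increasing cutpoints t_1 < ... < t_N
a = np.array([2.0, 5.0, 9.0, 14.0])     # verified diseased at each T = t_i
b = np.array([20.0, 15.0, 8.0, 3.0])    # verified non-diseased at each T = t_i
u = np.array([30.0, 25.0, 20.0, 10.0])  # unverified at each T = t_i

p = a / (a + b)                         # MAR fill-in: P(D=1 | T=t_i, R=0)
dis = a + p * u                         # idealized diseased counts
non = b + (1 - p) * u                   # idealized non-diseased counts

# "Test positive" means T >= cutpoint; accumulate counts from the largest t_i down.
sens = np.append(np.cumsum(dis[::-1])[::-1] / dis.sum(), 0.0)   # P(T >= t_i | D=1), plus the (0,0) point
fpr  = np.append(np.cumsum(non[::-1])[::-1] / non.sum(), 0.0)   # P(T >= t_i | D=0)

x, y = fpr[::-1], sens[::-1]            # ROC points ordered from (0,0) to (1,1)
auc = np.sum(np.diff(x) * (y[:-1] + y[1:]) / 2)                  # trapezoidal AUC
print(auc)
```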
Model-based estimation
Positive Predictive Value (PPV): P (D = 1|T = 1)
T = 1 means that X ≥ xcut.

PPV = P (D = 1|T = 1) = P (D = 1|X ≥ xcut) = P (D = 1, X ≥ xcut) / P (X ≥ xcut)
    = [ Σ_{i ≥ xcut} P (D = 1, X = i) ] / P (X ≥ xcut)
    = [ Σ_{i ≥ xcut} P (D = 1|X = i) P (X = i) ] / [ Σ_{i ≥ xcut} P (X = i) ]
• Logistic regression to relate presence of disease with the continuous test
measure X: P (D = 1|X) = 1/(1 + exp(−(α + βX)))
• Density estimation for the distribution of X.
(A sketch combining the two steps is given below.)
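A minimal sketch of the model-based PPV/NPV computation under assumed inputs: the data below are simulated purely for illustration, the logistic fit uses scikit-learn, and the empirical distribution of X stands in for a formal density estimate; the cutpoint is also illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated, purely illustrative inputs: x_all holds the test measure X for all
# subjects; D is observed only for the "verified" subset.
rng = np.random.default_rng(0)
x_all = rng.gamma(shape=2.0, scale=60.0, size=500)
d_all = (rng.uniform(size=500) < 1 / (1 + np.exp(2.4 - 0.0125 * x_all))).astype(int)
verified = rng.uniform(size=500) < 0.3
x_ver, d_ver = x_all[verified], d_all[verified]

# Step 1: logistic model P(D=1|X) fitted on the verified subjects (essentially unpenalized).
model = LogisticRegression(C=1e6).fit(x_ver.reshape(-1, 1), d_ver)
p_dis = model.predict_proba(x_all.reshape(-1, 1))[:, 1]   # P(D=1 | X = x_i) for every subject

# Step 2: average the fitted probabilities above and below a cutpoint, weighting by
# the empirical distribution of X (a stand-in for a density estimate).
xcut = 150.0
pos = x_all >= xcut
ppv = p_dis[pos].mean()          # sum_i P(D=1|X_i) / #{X_i >= xcut}
npv = (1 - p_dis[~pos]).mean()
print(round(ppv, 3), round(npv, 3))
```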
[Figure: fitted logistic model log P/(1−P) = −2.3914 + 0.0125 * X, where P = probability of angiography ≥ 50%; Hosmer and Lemeshow (H−L) goodness-of-fit test p-value = 0.81. The fitted curve is shown over a histogram (count) of X, with a cutpoint separating the NPV and PPV regions. Observed versus model/density-based (mdens) estimates: NPV obs = 0.8525, mdens = 0.8530; PPV obs = 0.3393, mdens = 0.3523.]
[Figure: the same fitted logistic model with a different cutpoint. Observed versus model/density-based (mdens) estimates: NPV obs = 0.8117, mdens = 0.8091; PPV obs = 0.8000, mdens = 0.7487.]
Possible work
• Dichotomous test measure
– TIR with covariates
– Relationship with latent class models
– Multiple tests
– Sub-unit measurements - various types of missingness
– Comparison of two tests utilizing the TIR concept
• Continuous test measure
– “TIR-like” approach to ROC and AUC?
– Is there a penalty for too dense a choice of cutpoints in a MAR
analysis of ROC and AUC?
– Complete development of model-based estimation of sensitivity and
specificity and comparison to the direct computation approach;
possible application to verification bias issues with ROC and AUC.