A Classification Tree with “Killer”

APPLYING THE CLASSIFICATION TREE
METHOD TO PREDICT A “KILLER” IN
CLINICAL DECISION-MAKING
Qiongqiong Liu MS, Isaac Li PhD, Yi Wang MS, Edward Tsai PhD
National Board of Osteopathic Medical Examiners, Chicago, Illinois
MedBiquitous Annual Conference
June 5-6 2017
Johns Hopkins School of Medicine
Baltimore, MD
COMLEX-USA LEVEL 3 EXAMINATION
Taken by post-graduate Doctors of Osteopathic Medicine (DOs) in their first
or second years of residency (OGME-1 or OGME-2)
Adapting to competency-based medical education
– Multiple Choice Question (MCQ)
– Clinical Decision Making (CDM)
– CDM cases assess the ability to apply medical knowledge and clinical skills
at specific decision points in patient safety
CLINICAL DECISION-MAKING (CDM) CASES
Key features (KF): the critical steps
required to appropriately diagnose
and treat a clinical case. Multiple
KFs are embedded in a CDM case.
“Hitting a killer”: a resident selects
a response on a KF that might harm
the patient. It is used as a proxy for
misdiagnosis/mistreatment.
SAMPLE CDM CASE
You are asked to see a 71-year-old female at the extended care facility because she
has lost 7.3 kg (16 lb) over the past 3 months without having been on a diet. She
was diagnosed with suspected Alzheimer disease 6 months ago. She was admitted to
the extended care facility because of her inability to take care of herself at home.
She was started on donepezil 4 months ago. The dose was increased after the first
month to the current dose of 10 mg once daily at bedtime. She has had
osteoarthritis and chronic knee pain in the past, for which she takes acetaminophen
with good results. She had surgery for a rectal prolapse 3 years ago. She has
experienced constipation for several years. She eats poorly and requires
encouragement from the staff. She is confined to a wheelchair when not in bed and
needs 2 people to assist her movement from bed to chair. She appears comfortable and has a
good fluid intake with no signs of urinary or respiratory issues, pain, fever, or chills. Vital
signs reveal:
Temperature: 36.4°C (97.6°F)
Blood pressure: 118/70 mmHg
Heart rate: 60/min
Respiratory rate: 14/min
EXTENDED MULTIPLE CHOICE (EMC) QUESTION
What test(s), if any, will you order for this patient at this time? You may select
up to 7 of the 13 options listed below; select the last option if no investigation
is needed.
A. barium enema
B. basic metabolic panel
C. complete blood count
D. creatinine level
E. CT scan of the abdomen and pelvis
F. electrolyte panel
G. fecal occult blood testing
H. glycosylated hemoglobin level
I. serum albumin level
J. thyroid-stimulating hormone level
K. ultrasonography of the abdomen
L. urinalysis
M. no investigation
Answer Key
Two required correct responses:
H: glycosylated hemoglobin
J: thyroid-stimulating hormone level
Zero points for selecting any of the following:
A: barium enema (Killer)
E: CT scan of the abdomen and pelvis (Killer)
K: ultrasonography of the abdomen (Killer)
More than 7 options (Overtreatment)
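The answer key can be read as a small scoring rule. The sketch below is illustrative only: the function name `score_emc_response`, the one-point-per-required-response credit, and the returned field names are assumptions, not the actual COMLEX-USA scoring algorithm. Only the required options (H, J), the killer options (A, E, K), and the limit of 7 selections come from the slide.

```python
REQUIRED = {"H", "J"}      # required correct responses (from the answer key)
KILLERS = {"A", "E", "K"}  # options flagged as killers (from the answer key)
MAX_SELECTIONS = 7         # selecting more than this counts as overtreatment


def score_emc_response(selected):
    """Score one resident's selections for the sample EMC question.

    Assumption: one point per required option selected. The slide only
    states that a killer or overtreatment yields zero points.
    """
    chosen = set(selected)
    hit_killer = bool(chosen & KILLERS)
    over_treatment = len(chosen) > MAX_SELECTIONS
    if hit_killer or over_treatment:
        points = 0
    else:
        points = len(chosen & REQUIRED)
    return {"points": points, "hit_killer": hit_killer,
            "over_treatment": over_treatment}
```

Under these assumptions, `score_emc_response(["H", "J", "C"])` earns 2 points with no flags, while `score_emc_response(["H", "J", "E"])` earns 0 points and sets `hit_killer`, which is the binary outcome the tree model tries to predict.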
MOTIVATION
Previous results: Hitting a killer was associated
with a lower total score on COMLEX-USA Level 3.
Further analysis: Is there any relationship between
hitting a killer and performance on COMLEX-USA
Level 1 and Level 2?
If we use residents’ exam performance to predict
this “killer hitting” behavior, which predictor will
be the most important in the model?
DATA AND METHOD
Software: a SAS high-performance procedure (HPSPLIT).
Goal: predict the probability of a “Killer” outcome.
Number of observations: around 800 residents.
Data: COMLEX-USA series standard scores, subscores, and demographic variables.
RESULTS: A CLASSIFICATION TREE WITH “KILLER” NODES
[Figure: Classification Tree for killer. Full tree diagram; node labels are not reproduced here. Legend: killer = 0, killer = 1.]
MODEL BUILDING PROCEDURE
The model is based on a partition of the predictor space into
non-overlapping segments, which correspond to the terminal
nodes or leaves of the tree. The partitioning is done
recursively, starting with the root node, which contains all the
data, and ending with the terminal nodes.
At each step of the recursion, the parent node is split
into child nodes through selection of a predictor
variable and a split value that minimize the variability in
the response across the child nodes.
The splitting rules that define the leaves
provide the information that is needed
to score new data.
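One step of this recursion can be sketched as follows. This is a minimal illustration, assuming Gini impurity as the measure of variability and a single numeric predictor. It is not the HPSPLIT implementation, and the function names and toy scores are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best rule `value < threshold`.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the parent node, the threshold is None.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, gini(labels))  # impurity of the unsplit parent node
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for x, y in pairs if x < threshold]
        right = [y for x, y in pairs if x >= threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical toy data: a standard score and whether a killer was hit (1) or not (0).
scores = [420, 450, 480, 510, 560, 600, 640, 700, 760]
killer = [1, 1, 1, 0, 1, 0, 0, 0, 0]
threshold, impurity = best_split(scores, killer)
```

On this toy data the selected rule is `score < 580.0`. A full tree repeats the same search over every predictor within each child node until a stopping rule is met, and the resulting chain of rules is what scores a new observation.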
RESULTS: EASY INTERPRETATION FROM A SUBTREE
Subtree Starting at Node=0
[Figure: subtree diagram. Legend: 1 = killer=0, 2 = killer=1. Split rules visible in the subtree:]
– school: A,AO,B,G,J,N,P,Q,W vs. BO,C,CO,D,DO,E,F,H,I,K,L,M,O,R,S,T,U,V,X,...
– l3_DI23: < 766.350 vs. >= 766.350
– l1_DI16: < 541.130 vs. >= 541.130
– l3_DI23: < 534.690 vs. >= 534.690
– l1_DI18: < 412.728 vs. >= 412.728
– l3_DI21: < 878.220 vs. >= 878.220
– school: BO,C,DO,F,H,I,... vs. CO,D,E,L,...
The results showed that school is an important factor on one
major branch, and that the psychiatry score from the Level 3
exam is another important predictor when it is lower than 534.69.
This tree model gives each layer predictive information with
certain rules and conditions, so we could investigate further
when we try to link these scores with school information.
RESULTS: MODEL-DATA FIT
Model-Based Confusion Matrix
Actual   Predicted 0   Predicted 1   Error Rate
0        365           13            0.0344
1        13            372           0.0338
Model-Based Fit Statistics for Selected Tree
No. of Leaves   ASE      Misclassification   Sensitivity   Specificity   Entropy   Gini   RSS     AUC
104             0.0277   0.0341              0.9662        0.9656        0.13      0.05   42.34   0.99
The fit statistics show a small misclassification rate (0.0341).
Because these statistics are model-based (computed on the training
data), they indicate that the model fits the data well.
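The reported rates follow directly from the confusion matrix counts (365, 13, 13, 372). The short check below recomputes them; the variable names are illustrative, and killer = 1 is treated as the positive class.

```python
# Counts from the model-based confusion matrix (rows: actual, columns: predicted).
tn, fp = 365, 13   # actual 0: predicted 0, predicted 1
fn, tp = 13, 372   # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp
sensitivity = tp / (tp + fn)           # share of actual killer = 1 predicted as 1
specificity = tn / (tn + fp)           # share of actual killer = 0 predicted as 0
misclassification = (fp + fn) / total  # overall error rate
error_rate_0 = fp / (tn + fp)          # error rate for actual killer = 0
error_rate_1 = fn / (tp + fn)          # error rate for actual killer = 1

print(round(sensitivity, 4), round(specificity, 4), round(misclassification, 4))
```

Rounded to four decimals, these reproduce the reported sensitivity (0.9662), specificity (0.9656), misclassification rate (0.0341), and per-class error rates (0.0344 and 0.0338).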
RESULTS: SPECIFICITY AND SENSITIVITY
[Figure: ROC Curve for killer. x-axis: 1 - Specificity; y-axis: Sensitivity; Training AUC 0.99.]
The sensitivity is the probability of correctly predicting
that a resident hit a killer: 0.9662.
The specificity is the probability of correctly predicting
that a resident did not hit a killer: 0.9656.
RESULTS: IMPORTANCE FACTOR IN THE MODEL
Variable Importance (Training)
Variable            Relative   Importance   Count
Biochemistry (L1)   0.6754     5.2538       8
Pediatric (L2)      0.6653     5.1759       8
Anatomy (L1)        0.5745     4.4694       7
In addition to the school effect, variable importance shows
that the disciplines of biochemistry and anatomy (Level 1)
and pediatrics (Level 2) are important predictors of
mistreatment/misdiagnosis. This finding calls attention to
the instruction of these subjects, particularly for those
COMs identified by the model.
SUMMARY OF THE TREE BUILDING
Advantages
– Useful for complex problem solving.
– No need to worry about model assumptions.
– Can be used to predict new data and is easy to visualize.
– The partition rules can be changed easily.
Disadvantages
1. Overfitting and underfitting, particularly for small data sets.
2. Strong correlations could lead the model to outcomes we did not expect.
THANK YOU!
REFERENCES
Bordage, G., Brailovsky, C., Carretier, H., & Page, G. (1995). Content validation of key features on a
national examination of clinical decision-making skills. Academic Medicine, 70, 276-81.
Holmboe, E. S., Ward, D. S., Reznick, R. K., Katsufrakis, P. J., Leslie, K. M., Patel, V. L., Ray, D. D., &
Nelson, E. A. (2011). Faculty development in assessment: The missing link in competency-based
medical education. Academic Medicine, 86, 460-467.
Medical Council of Canada (2012, August). Guidelines for the development of key feature problems
and test cases. Retrieved from http://mcc.ca/wp-content/uploads/CDM-Guidelines.pdf
Page, G., Bordage, G., & Allen, T. (1995). Developing key-feature problems and examinations to
assess clinical decision-making skills. Academic Medicine, 70, 194-201.
Rindler, S. E. (1979). Pitfalls in assessing test speededness. Journal of Educational Measurement, 16,
261–270.
SAS Institute Inc. (2015). SAS/STAT® 14.1 User’s Guide: High-Performance Procedures. Cary, NC:
SAS Institute Inc.