Talk - csiro

Why the naïve Bayesian classifier can
dominate the proper one
("All models are wrong but some are useful")
Hans-J. Lenz
Inst. f. Statistik und Ökonometrie
Freie Universität Berlin
hans-j.lenz@fu-berlin.de
Overview
1. The classification problem
2. Proper and naïve Bayesian classifier
3. The Gammerman & Thatcher (1991) study
4. Explanation
   1. Curse of dimension
   2. Estimation error
   3. Missing values
5. Short conclusion
Hans-J. Lenz FU Berlin Aug 2013
Entity-Relationship Model
[Diagram: ER model with the elements Patient, Doctor, Treatment case, Symptom (x), ICD, and Disease]
The classification problem in hospitals
• Given a new patient and his $p \in \mathbb{N}$ observed symptoms, classify him according to the ICD set of known diseases (illnesses) and prior information from the hospital's database system.
• $\Theta$: finite parameter (disease) space
• $\mathcal{X}$: finite sampling (symptom) space
DEF.: Diagnostic function
$\delta: \mathcal{X}^p \to \mathbb{R}_{[0,1]}$ with $\delta = \delta(\theta \text{ given } x)$ and $\theta \in \Theta$
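Bayesian classifier
• $\delta_B(x) = \delta^* = P(\theta^* \mid x)$ where $\theta^* = \arg\max_{\theta \in \Theta} P(\theta \mid x)$
• Bayes rule: $P(\theta \mid x) \propto P(x \mid \theta)\, P(\theta)$
   Likelihood: $P(x \mid \theta)$
   Prior information: $P(\theta)$
   Posterior information: $P(\theta \mid x)$
[Diagram: the observation x, the likelihood and the prior information enter Bayes rule, which yields the posterior information; Learning and Feedback connect the scheme to the Clinical Database]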
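The Bayes rule above can be sketched for a finite disease space as follows. This is a minimal illustration, not the method used in the talk: the disease names and all probabilities are hypothetical, and the likelihood values are assumed to be given for one fixed observed symptom vector x.

```python
def posterior(prior, likelihood):
    """Return P(theta | x) for every disease theta via Bayes' rule.

    prior:      dict mapping disease -> P(theta)
    likelihood: dict mapping disease -> P(x | theta) for the observed x
    """
    # Unnormalised posterior: P(x | theta) * P(theta)
    weights = {theta: likelihood[theta] * prior[theta] for theta in prior}
    evidence = sum(weights.values())  # P(x), the normalising constant
    if evidence == 0:
        raise ValueError("observed x has zero probability under every disease")
    return {theta: w / evidence for theta, w in weights.items()}


def bayes_classify(prior, likelihood):
    """Return (theta*, P(theta* | x)): the most probable disease and its posterior."""
    post = posterior(prior, likelihood)
    best = max(post, key=post.get)
    return best, post[best]


# Hypothetical numbers, for illustration only.
prior = {"flu": 0.7, "measles": 0.3}
likelihood = {"flu": 0.1, "measles": 0.6}  # P(x | theta) for one observed x
best, prob = bayes_classify(prior, likelihood)
print(best, round(prob, 2))
```

Although the prior favours the first disease in this toy example, the likelihood of the observed symptoms shifts the posterior towards the second one.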
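Proper Bayesian classifier
• $P_B(\theta \mid x) \propto P_x(x \mid \theta)\, P(\theta)$ given $x = (x_1, x_2, \dots, x_p)$
  (Note: p symptoms observed for disease $\theta$)
• Symptoms are not conditionally independent given the disease: $x_i \not\perp x_j \mid \theta$ for all cases
[Diagram: symptom ($x_1, x_2, \dots, x_p$), case, disease ($\theta$)]
• $\delta_B(x) = P_B(\theta^* \mid x)$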
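Naïve (Idiot) Bayesian classifier
• $P_{nB}(\theta \mid x) \propto \prod_i P_i(x_i \mid \theta)\, P(\theta)$ given $x = (x_1, x_2, \dots, x_p)$
• Symptoms are conditionally independent given the disease: $x_i \perp x_j \mid \theta$ for all cases
[Diagram: symptom ($x_1, x_2, \dots, x_p$), case, disease ($\theta$)]
• $\delta_{nB}(x) = P_{nB}(\theta^* \mid x)$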
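The naïve factorisation can be sketched in a few lines. The example below is only an illustration under stated assumptions: two hypothetical diseases "A" and "B", two binary symptoms, and invented per-symptom probabilities $P_i(x_i \mid \theta)$.

```python
import math


def naive_posterior(prior, cond, x):
    """Naive Bayes posterior: P_nB(theta | x) ∝ prod_i P_i(x_i | theta) * P(theta).

    prior: dict disease -> P(theta)
    cond:  dict disease -> list of dicts, cond[theta][i][v] = P_i(x_i = v | theta)
    x:     tuple of observed symptom values (x_1, ..., x_p)
    """
    weights = {}
    for theta, p_theta in prior.items():
        # Product of the per-symptom likelihoods (conditional independence).
        per_symptom = (cond[theta][i][v] for i, v in enumerate(x))
        weights[theta] = p_theta * math.prod(per_symptom)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("observed x has zero probability under every disease")
    return {theta: w / total for theta, w in weights.items()}


# Hypothetical two-disease, two-symptom example with binary symptoms.
prior = {"A": 0.5, "B": 0.5}
cond = {
    "A": [{0: 0.2, 1: 0.8}, {0: 0.5, 1: 0.5}],
    "B": [{0: 0.6, 1: 0.4}, {0: 0.9, 1: 0.1}],
}
post = naive_posterior(prior, cond, (1, 1))
theta_star = max(post, key=post.get)  # diagnosis theta*
print(theta_star, post[theta_star])
```

Only one small table per symptom and disease has to be estimated here, whereas the proper classifier needs the joint distribution of all symptoms for each disease.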
The Gammerman & Thatcher (1991) study
• UK hospital data sampled in 1988
• 9 diseases
• 135 symptoms
• Sample sizes
   $n_{Training} = 2000$
   $n_{Test} = 4387$
Method              Percentage of correct diagnoses +)
I   Physician       76
II  Idiot Bayes     74
III Proper Bayes    65 (!)
+) estimated from test set
Source: Gammerman and Thatcher (1988)
Curse of dimension
• ICD relates the finite disease space $\Theta$ to the power set of the finite sampling (symptom) space $\mathcal{X}$, i.e. ICD: $\Theta \to 2^{\mathcal{X}}$
• Note that the number of observed symptoms ($p_l$) varies from case to case ($l$) ⇒ missing value problem
• $P(\text{at least one missing value}) = 1 - (1 - P(\text{missing value}))^p$
[Figure "Curse of Dimension": P(at least one missing value) plotted against dimension d for Prob(missing value) = 5%; vertical axis from 0 to 1.2, horizontal axis from 1 to 120]
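The formula on this slide is easy to evaluate numerically. The sketch below uses the 5% missing-value probability from the slide; the function name and the chosen values of p are illustrative, and the formula itself presupposes that values go missing independently across symptoms.

```python
def p_at_least_one_missing(p, miss_prob=0.05):
    """P(at least one missing value) among p symptoms.

    Assumes each symptom is missing independently with probability miss_prob.
    """
    return 1.0 - (1.0 - miss_prob) ** p


# 135 is the number of symptoms in the study; the other values are arbitrary.
for p in (1, 10, 20, 60, 135):
    print(f"p = {p:3d}: {p_at_least_one_missing(p):.3f}")
```

With 135 symptoms, a complete record is the rare exception: almost every case has at least one missing value.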
Estimation of disease probabilities
• $N$: #(all cases)
• $x \in \mathcal{X}^d$: symptom vector for disease $\theta$ acc. to ICD
• $n(\theta, x)$: #(cases where symptom vector $x$ and disease $\theta$ are recorded)
• $\hat{P}(\theta \mid x) = \dfrac{n(\theta, x) / N}{n(x) / N} = \dfrac{n(\theta, x)}{n(x)}$ for all $\theta \in \Theta$
• O(N): one-pass table scan for fixed d!
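The one-pass estimate can be sketched with two counters. This is an illustration only: the case list is invented, and each case is assumed to be a pair (disease, symptom vector) with a complete, hashable symptom tuple.

```python
from collections import Counter


def estimate_posteriors(cases):
    """Estimate P^(theta | x) = n(theta, x) / n(x) in a single pass over the cases.

    cases: iterable of (theta, x) pairs, with x a hashable symptom tuple
    """
    joint = Counter()     # n(theta, x)
    marginal = Counter()  # n(x)
    for theta, x in cases:
        joint[(theta, x)] += 1
        marginal[x] += 1
    return {(theta, x): n / marginal[x] for (theta, x), n in joint.items()}


# Hypothetical mini database of four cases with two binary symptoms.
cases = [("A", (1, 0)), ("A", (1, 0)), ("B", (1, 0)), ("B", (0, 1))]
p_hat = estimate_posteriors(cases)
print(p_hat)
```

The last estimate in this toy example equals 1.0 but rests on a single case, which is exactly the weakly occupied cell situation.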
Estimation Error
• Missing value problem: $p_l \le p$ is not constant over all cases $l$ for each disease $\theta$, with known upper bound $p$ according to ICD
• Over-fitting effect: too large a value $p_{max}$ for $p$ (too many symptoms per disease $\theta$ considered)
• Sampling error in weakly occupied cells is increased:
  $\operatorname{var} \hat{P}(x, \theta) \sim 1 / m$, where $m$ is the absolute frequency in cell $(x, \theta)$
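How sparsely the cells are occupied can be made concrete with the study's figures (9 diseases, 135 symptoms, 2000 training cases). The sketch below additionally assumes that every symptom is binary, which the slides do not state; the function name is hypothetical.

```python
def parameter_counts(n_diseases, n_symptoms):
    """Free probabilities to estimate, ASSUMING binary symptoms.

    Proper Bayes: one joint table over all 2**p symptom patterns per disease.
    Naive Bayes:  one probability P(x_i = 1 | theta) per symptom and disease.
    """
    proper = n_diseases * (2 ** n_symptoms - 1)
    naive = n_diseases * n_symptoms
    return proper, naive


proper, naive = parameter_counts(n_diseases=9, n_symptoms=135)
n_training = 2000

# Upper bound on the share of symptom patterns that the training set can cover.
max_covered_fraction = n_training / 2 ** 135

print(f"proper Bayes parameters: {proper:.3e}")
print(f"naive Bayes parameters:  {naive}")
print(f"covered patterns (upper bound): {max_covered_fraction:.1e}")
```

Under this assumption nearly all cells of the joint table are empty, so m is tiny wherever it is not zero, while each naïve per-symptom estimate can draw on every training case of its disease.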
Short conclusion
• The magic triangle:
   modeling structural dependency
   missing values imputation
   estimation error
Thank you for your attention