Probability and Bayes

Does Naïve Bayes always work?
• No
Bayes’ Theorem with relative likelihood
• In the setting of diagnostic/evidential reasoning, hypotheses $H_i$ are connected to evidence/manifestations $E_1, \dots, E_m$ through the conditional probabilities $P(E_j \mid H_i)$
  – Known: the prior probability $P(H_i)$ of each hypothesis and the conditional probabilities $P(E_j \mid H_i)$
  – Want to compute: the posterior probability $P(H_i \mid E_j)$
• Bayes’ theorem (formula 1): $P(H_i \mid E_j) = P(H_i)\,P(E_j \mid H_i)\,/\,P(E_j)$
• If the purpose is to find which of the n hypotheses $H_1, \dots, H_n$ is most plausible given $E_j$, then we can ignore the denominator and rank them by their relative likelihoods
  $rel(H_i \mid E_j) = P(E_j \mid H_i)\,P(H_i)$
Relative likelihood
• $P(E_j)$ can be computed from $P(E_j \mid H_i)$ and $P(H_i)$ if we assume all hypotheses $H_1, \dots, H_n$ are ME and EXH
  $P(E_j) = P(E_j \wedge (H_1 \vee \dots \vee H_n))$  (by EXH)
  $\phantom{P(E_j)} = \sum_{i=1}^{n} P(E_j \wedge H_i)$  (by ME)
  $\phantom{P(E_j)} = \sum_{i=1}^{n} P(E_j \mid H_i)\,P(H_i)$
• Then we have another version of Bayes’ theorem:
  $P(H_i \mid E_j) = \dfrac{P(E_j \mid H_i)\,P(H_i)}{\sum_{k=1}^{n} P(E_j \mid H_k)\,P(H_k)} = \dfrac{rel(H_i \mid E_j)}{\sum_{k=1}^{n} rel(H_k \mid E_j)}$
  where $\sum_{k=1}^{n} P(E_j \mid H_k)\,P(H_k)$, the sum of the relative likelihoods of all n hypotheses, is a normalization factor
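As a quick illustration of this two-step recipe (compute relative likelihoods, then normalize), here is a minimal Python sketch; the priors and likelihoods in it are made-up numbers, not values from this lecture.

```python
# Minimal sketch: posterior via relative likelihoods (hypothetical numbers).
priors = {"H1": 0.6, "H2": 0.3, "H3": 0.1}        # P(H_i), assumed ME & EXH
likelihood_E = {"H1": 0.2, "H2": 0.7, "H3": 0.5}  # P(E | H_i)

# rel(H_i | E) = P(E | H_i) * P(H_i)
rel = {h: likelihood_E[h] * priors[h] for h in priors}

# Normalization factor: sum of relative likelihoods = P(E)
norm = sum(rel.values())

# Absolute posteriors P(H_i | E)
posterior = {h: rel[h] / norm for h in rel}

print(posterior)                           # approx. {'H1': 0.32, 'H2': 0.55, 'H3': 0.13}
print(max(posterior, key=posterior.get))   # most plausible hypothesis: 'H2'
```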
Naïve Bayesian Approach
• Knowledge base:
  – $E_1, \dots, E_m$: evidence/manifestations
  – $H_1, \dots, H_n$: hypotheses/disorders
  – $E_j$ and $H_i$ are binary, and the hypotheses form a ME & EXH set
  – $P(E_j \mid H_i)$, $i = 1, \dots, n$, $j = 1, \dots, m$: conditional probabilities
• Case input: $E_1, \dots, E_l$
• Find the hypothesis $H_i$ with the highest posterior probability $P(H_i \mid E_1, \dots, E_l)$
• By Bayes’ theorem: $P(H_i \mid E_1, \dots, E_l) = \dfrac{P(E_1, \dots, E_l \mid H_i)\,P(H_i)}{P(E_1, \dots, E_l)}$
• Assume all pieces of evidence are conditionally independent, given any hypothesis:
  $P(E_1, \dots, E_l \mid H_i) = \prod_{j=1}^{l} P(E_j \mid H_i)$
Absolute posterior probability
• The relative likelihood
  $rel(H_i \mid E_1, \dots, E_l) = P(E_1, \dots, E_l \mid H_i)\,P(H_i) = P(H_i) \prod_{j=1}^{l} P(E_j \mid H_i)$
• The absolute posterior probability
  $P(H_i \mid E_1, \dots, E_l) = \dfrac{rel(H_i \mid E_1, \dots, E_l)}{\sum_{k=1}^{n} rel(H_k \mid E_1, \dots, E_l)} = \dfrac{P(H_i) \prod_{j=1}^{l} P(E_j \mid H_i)}{\sum_{k=1}^{n} P(H_k) \prod_{j=1}^{l} P(E_j \mid H_k)}$
• Evidence accumulation (when new evidence is discovered):
  $rel(H_i \mid E_1, \dots, E_l, E_{l+1}) = P(E_{l+1} \mid H_i)\,rel(H_i \mid E_1, \dots, E_l)$
  $rel(H_i \mid E_1, \dots, E_l, \sim E_{l+1}) = (1 - P(E_{l+1} \mid H_i))\,rel(H_i \mid E_1, \dots, E_l)$
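A minimal Python sketch of this accumulation step, under the same ME & EXH and conditional-independence assumptions; all priors and conditional probabilities below are made-up illustrative numbers. Relative likelihoods are multiplied by $P(E_{l+1} \mid H_i)$ when new evidence is present, by $1 - P(E_{l+1} \mid H_i)$ when it is found absent, and normalized only when an absolute posterior is needed.

```python
# Naive Bayes with evidence accumulation (hypothetical numbers).
priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}      # P(H_i), assumed ME & EXH
p_e_given_h = {                                 # P(E_j | H_i)
    "E1": {"H1": 0.8, "H2": 0.1, "H3": 0.3},
    "E2": {"H1": 0.4, "H2": 0.9, "H3": 0.2},
}

# Start from the priors as the initial relative likelihoods.
rel = dict(priors)

def accumulate(rel, evidence, present=True):
    """Fold one new piece of evidence into the relative likelihoods."""
    return {
        h: rel[h] * (p_e_given_h[evidence][h] if present
                     else 1.0 - p_e_given_h[evidence][h])
        for h in rel
    }

rel = accumulate(rel, "E1", present=True)    # E1 observed
rel = accumulate(rel, "E2", present=False)   # E2 observed to be absent

# Normalize into absolute posteriors P(H_i | E1, ~E2).
norm = sum(rel.values())
posterior = {h: r / norm for h, r in rel.items()}
print(posterior)
```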
Discussion of Assumptions of Naïve Bayes
• Assumption 1: hypotheses are mutually exclusive and
exhaustive
– Single fault assumption (one and only one hypothesis must be
true)
– Multiple faults do occur in individual cases
– Can be viewed as an approximation of situations where
hypotheses are independent of each other and their prior
probabilities are very small
  $P(H_1 \wedge H_2) = P(H_1)\,P(H_2) \approx 0$ if both $P(H_1)$ and $P(H_2)$ are very small (see the numeric illustration after this list)
• Assumption 2: pieces of evidence are conditionally
independent of each other, given any hypothesis
– Manifestations themselves are not independent of each other; they are correlated by their common causes
– Reasonable under the single-fault assumption
– Not so when multiple faults are to be considered
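A hypothetical numeric illustration of the approximation behind Assumption 1 (the priors below are made up, not values from this lecture): with two independent hypotheses whose priors are both 0.01,

$P(H_1 \wedge H_2) = P(H_1)\,P(H_2) = 0.01 \times 0.01 = 10^{-4} \approx 0$

so treating the hypotheses as mutually exclusive discards very little probability mass.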
Limitations of the naïve Bayesian system
• Cannot handle hypotheses of multiple disorders well
  – Suppose $H_1, \dots, H_n$ are independent of each other
  – Consider a composite hypothesis $H_1 \wedge H_2$
  – How to compute the posterior probability (or relative likelihood) $P(H_1 \wedge H_2 \mid E_1, \dots, E_l)$?
  – Using Bayes’ theorem:
    $P(H_1 \wedge H_2 \mid E_1, \dots, E_l) = \dfrac{P(E_1, \dots, E_l \mid H_1 \wedge H_2)\,P(H_1 \wedge H_2)}{P(E_1, \dots, E_l)}$
    $P(H_1 \wedge H_2) = P(H_1)\,P(H_2)$ because they are independent
    $P(E_1, \dots, E_l \mid H_1 \wedge H_2) = \prod_{j=1}^{l} P(E_j \mid H_1 \wedge H_2)$, assuming the $E_j$ are independent given $H_1 \wedge H_2$
    But how to compute $P(E_j \mid H_1 \wedge H_2)$?
  – Assuming $H_1, \dots, H_n$ are independent given $E_1, \dots, E_l$, so that
    $P(H_1 \wedge H_2 \mid E_1, \dots, E_l) = P(H_1 \mid E_1, \dots, E_l)\,P(H_2 \mid E_1, \dots, E_l)$?
    But this is a very unreasonable assumption
Explanation example
(figure: Earthquake? Burglar? Alarm is on)
  – B: burglar, E: earthquake, A: alarm set off
  – E and B are independent
  – But when A is given, they become (adversely) dependent, because they become competitors to explain A
  – $P(B \mid A, E) \ll P(B \mid A)$: E explains away A
• Need a better representation and a better assumption
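To make the explaining-away effect above concrete, here is a small Python sketch that enumerates a joint distribution over B, E, A under the factorization P(B) P(E) P(A | B, E); every probability in it is a made-up illustrative number, not a value from this lecture.

```python
from itertools import product

# Hypothetical CPTs for the burglar/earthquake/alarm example (made-up numbers).
P_B = 0.01                      # P(B): burglary
P_E = 0.02                      # P(E): earthquake
P_A_given = {                   # P(A | B, E): alarm set off
    (True, True): 0.95, (True, False): 0.90,
    (False, True): 0.30, (False, False): 0.001,
}

def joint(b, e, a):
    """P(B=b, E=e, A=a) under the factorization P(B) P(E) P(A | B, E)."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p_a = P_A_given[(b, e)]
    return p * (p_a if a else 1 - p_a)

# P(B | A): marginalize over E.
p_b_given_a = (sum(joint(True, e, True) for e in (True, False)) /
               sum(joint(b, e, True) for b, e in product((True, False), repeat=2)))

# P(B | A, E): the earthquake already explains the alarm.
p_b_given_a_e = joint(True, True, True) / sum(joint(b, True, True) for b in (True, False))

print(f"P(B | A)    = {p_b_given_a:.3f}")     # approx. 0.57 with these numbers
print(f"P(B | A, E) = {p_b_given_a_e:.3f}")   # approx. 0.03: E explains away A
```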
• Naïve Bayes cannot handle causal chaining
  – Example. A: weather of the year; B: cotton production of the year; C: cotton price of next year
  – Observed: A influences C
  – The influence is not direct (A -> B -> C)
    $P(C \mid B, A) = P(C \mid B)$: instantiation of B blocks the influence of A on C
• Need a better representation and a better
assumption
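A quick numeric check of this blocking property on a hypothetical binary chain A -> B -> C (the CPT values below are made up):

```python
from itertools import product

# Hypothetical CPTs for a chain A -> B -> C (binary variables, made-up numbers).
P_A = 0.3                               # P(A = true)
P_B_given_A = {True: 0.8, False: 0.2}   # P(B = true | A)
P_C_given_B = {True: 0.9, False: 0.1}   # P(C = true | B)

def joint(a, b, c):
    """P(A=a, B=b, C=c) under the chain factorization P(A) P(B|A) P(C|B)."""
    pa = P_A if a else 1 - P_A
    pb = P_B_given_A[a] if b else 1 - P_B_given_A[a]
    pc = P_C_given_B[b] if c else 1 - P_C_given_B[b]
    return pa * pb * pc

def p_c_given(b, a=None):
    """P(C = true | B = b) or, if a is given, P(C = true | B = b, A = a)."""
    a_values = (True, False) if a is None else (a,)
    num = sum(joint(x, b, True) for x in a_values)
    den = sum(joint(x, b, c) for x in a_values for c in (True, False))
    return num / den

# Once B is instantiated, conditioning on A no longer changes C.
print(p_c_given(b=True))           # P(C | B)     = 0.9
print(p_c_given(b=True, a=True))   # P(C | B, A)  = 0.9
print(p_c_given(b=True, a=False))  # P(C | B, ~A) = 0.9
```

Because the chain factorization already encodes $P(C \mid B, A) = P(C \mid B)$, conditioning additionally on A changes nothing once B is fixed.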
Bayesian Networks and
Markov Models – applications in robotics
• Bayesian AI
• Bayesian Filters
• Kalman Filters
• Particle Filters
• Bayesian networks
• Decision networks
• Reasoning about changes over time
  • Dynamic Bayesian Networks
  • Markov models