Does Naïve Bayes always work?
• No

Bayes' Theorem with relative likelihood
• In the setting of diagnostic/evidential reasoning
  – hypotheses $H_1, \ldots, H_n$ with prior probabilities $P(H_i)$
  – evidence/manifestations $E_1, \ldots, E_j, \ldots, E_m$ with conditional probabilities $P(E_j \mid H_i)$
  – Known: the prior probability of each hypothesis $P(H_i)$ and the conditional probabilities $P(E_j \mid H_i)$
  – Want: the posterior probability $P(H_i \mid E_j)$
• Bayes' theorem (formula 1): $P(H_i \mid E_j) = P(H_i)\,P(E_j \mid H_i)\,/\,P(E_j)$
• If the purpose is to find which of the $n$ hypotheses $H_1, \ldots, H_n$ is most plausible given $E_j$, then we can ignore the denominator and rank them by the relative likelihood
  $rel(H_i \mid E_j) = P(E_j \mid H_i)\,P(H_i)$

Relative likelihood
• $P(E_j)$ can be computed from $P(E_j \mid H_i)$ and $P(H_i)$ if we assume all hypotheses $H_1, \ldots, H_n$ are mutually exclusive (ME) and exhaustive (EXH):
  $P(E_j) = P(E_j \wedge (H_1 \vee \ldots \vee H_n))$  (by EXH)
  $= \sum_{i=1}^{n} P(E_j \wedge H_i)$  (by ME)
  $= \sum_{i=1}^{n} P(E_j \mid H_i)\,P(H_i)$
• Then we have another version of Bayes' theorem:
  $P(H_i \mid E_j) = \dfrac{P(E_j \mid H_i)\,P(H_i)}{\sum_{k=1}^{n} P(E_j \mid H_k)\,P(H_k)} = \dfrac{rel(H_i \mid E_j)}{\sum_{k=1}^{n} rel(H_k \mid E_j)}$
  where $\sum_{k=1}^{n} P(E_j \mid H_k)\,P(H_k)$, the sum of the relative likelihoods of all $n$ hypotheses, is a normalization factor

Naïve Bayesian Approach
• Knowledge base:
  – $E_1, \ldots, E_m$: evidence/manifestations
  – $H_1, \ldots, H_n$: hypotheses/disorders
  – $E_j$ and $H_i$ are binary, and the hypotheses form a ME & EXH set
  – $P(E_j \mid H_i)$, $i = 1, \ldots, n$, $j = 1, \ldots, m$: conditional probabilities
• Case input: $E_1, \ldots, E_l$
• Find the hypothesis $H_i$ with the highest posterior probability $P(H_i \mid E_1, \ldots, E_l)$
• By Bayes' theorem: $P(H_i \mid E_1, \ldots, E_l) = \dfrac{P(E_1, \ldots, E_l \mid H_i)\,P(H_i)}{P(E_1, \ldots, E_l)}$
• Assume all pieces of evidence are conditionally independent given any hypothesis: $P(E_1, \ldots, E_l \mid H_i) = \prod_{j=1}^{l} P(E_j \mid H_i)$

Absolute posterior probability
• The relative likelihood
  $rel(H_i \mid E_1, \ldots, E_l) = P(E_1, \ldots, E_l \mid H_i)\,P(H_i) = P(H_i)\,\prod_{j=1}^{l} P(E_j \mid H_i)$
• The absolute posterior probability
  $P(H_i \mid E_1, \ldots, E_l) = \dfrac{rel(H_i \mid E_1, \ldots, E_l)}{\sum_{k=1}^{n} rel(H_k \mid E_1, \ldots, E_l)} = \dfrac{P(H_i)\,\prod_{j=1}^{l} P(E_j \mid H_i)}{\sum_{k=1}^{n} P(H_k)\,\prod_{j=1}^{l} P(E_j \mid H_k)}$
• Evidence accumulation (when new evidence is discovered)
  $rel(H_i \mid E_1, \ldots, E_l, E_{l+1}) = P(E_{l+1} \mid H_i)\,rel(H_i \mid E_1, \ldots, E_l)$
  $rel(H_i \mid E_1, \ldots, E_l, \sim E_{l+1}) = (1 - P(E_{l+1} \mid H_i))\,rel(H_i \mid E_1, \ldots, E_l)$

Discussion of Assumptions of Naïve Bayes
• Assumption 1: hypotheses are mutually exclusive and exhaustive
  – Single-fault assumption (one and only one hypothesis must be true)
  – Multiple faults do exist in individual cases
  – Can be viewed as an approximation of situations where the hypotheses are independent of each other and their prior probabilities are very small: $P(H_1 \wedge H_2) = P(H_1)\,P(H_2) \approx 0$ if both $P(H_1)$ and $P(H_2)$ are very small
• Assumption 2: pieces of evidence are conditionally independent of each other, given any hypothesis
  – Manifestations themselves are not independent of each other; they are correlated by their common causes
  – Reasonable under the single-fault assumption
  – Not so when multiple faults are to be considered
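The naïve Bayesian computation above (relative likelihoods, normalization, and evidence accumulation) can be sketched in a few lines of Python. This is only an illustrative sketch: the hypotheses, priors, and conditional probability tables below are invented values, not numbers from the slides.

```python
# Minimal sketch of the naive Bayesian approach described above.
# The hypotheses, priors, and conditional probabilities are invented
# illustrative values, not numbers from the slides.

priors = {"H1": 0.6, "H2": 0.3, "H3": 0.1}      # P(H_i); hypotheses assumed ME & EXH
cond = {                                         # P(E_j | H_i)
    "H1": {"E1": 0.8, "E2": 0.1, "E3": 0.3},
    "H2": {"E1": 0.4, "E2": 0.7, "E3": 0.2},
    "H3": {"E1": 0.1, "E2": 0.9, "E3": 0.6},
}

def relative_likelihood(h, observed):
    """rel(H_i | E_1..E_l) = P(H_i) * prod_j P(E_j | H_i)."""
    r = priors[h]
    for e in observed:
        r *= cond[h][e]
    return r

def posteriors(observed):
    """Absolute posteriors: normalize the relative likelihoods over all hypotheses."""
    rel = {h: relative_likelihood(h, observed) for h in priors}
    z = sum(rel.values())                        # normalization factor
    return {h: r / z for h, r in rel.items()}

def accumulate(rel_h, h, e_new, present=True):
    """Evidence accumulation: update rel(H_i | ...) when E_{l+1} (or ~E_{l+1}) is observed."""
    p = cond[h][e_new]
    return rel_h * (p if present else 1.0 - p)

post = posteriors(["E1", "E2"])
print(post)
print("most plausible hypothesis:", max(post, key=post.get))
```

Note that `accumulate` mirrors the two evidence-accumulation formulas: multiply by $P(E_{l+1} \mid H_i)$ when the new evidence is present, and by $1 - P(E_{l+1} \mid H_i)$ when it is absent.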
Limitations of the naïve Bayesian system
• Cannot handle hypotheses of multiple disorders well
  – Suppose $H_1, \ldots, H_n$ are independent of each other
  – Consider a composite hypothesis $H_1 \wedge H_2$
  – How do we compute the posterior probability (or relative likelihood) $P(H_1 \wedge H_2 \mid E_1, \ldots, E_l)$?
  – Using Bayes' theorem:
    $P(H_1 \wedge H_2 \mid E_1, \ldots, E_l) = \dfrac{P(E_1, \ldots, E_l \mid H_1 \wedge H_2)\,P(H_1 \wedge H_2)}{P(E_1, \ldots, E_l)}$
    $P(H_1 \wedge H_2) = P(H_1)\,P(H_2)$ because they are independent
    $P(E_1, \ldots, E_l \mid H_1 \wedge H_2) = \prod_{j=1}^{l} P(E_j \mid H_1 \wedge H_2)$, assuming the $E_j$ are independent given $H_1 \wedge H_2$
  – But how do we compute $P(E_j \mid H_1 \wedge H_2)$?
  – Assuming $H_1, \ldots, H_n$ are independent given $E_1, \ldots, E_l$ would give $P(H_1 \wedge H_2 \mid E_1, \ldots, E_l) = P(H_1 \mid E_1, \ldots, E_l)\,P(H_2 \mid E_1, \ldots, E_l)$, but this is a very unreasonable assumption

Explanation example – Earthquake? Burglar? Alarm is on
• B: burglary, E: earthquake, A: alarm set off
• E and B are independent
• But when A is given, they become (adversely) dependent, because they are competitors in explaining A
• $P(B \mid A, E) \ll P(B \mid A)$: E "explains away" A (the earthquake accounts for the alarm, so the burglary becomes much less likely; a numeric sketch of this effect appears at the end of this section)
• Need a better representation and a better assumption

• Naïve Bayes cannot handle causal chaining
  – Example: A: weather of the year; B: cotton production of the year; C: cotton price of next year
  – Observed: A influences C
  – The influence is not direct (A → B → C)
  – $P(C \mid B, A) = P(C \mid B)$: instantiation of B blocks the influence of A on C
• Need a better representation and a better assumption

Bayesian Networks and Markov Models – applications in robotics
• Bayesian AI
• Bayesian Filters
• Kalman Filters
• Particle Filters
• Bayesian networks
• Decision networks
• Reasoning about changes over time
• Dynamic Bayesian Networks
• Markov models
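The "explaining away" effect in the burglary/earthquake/alarm example above can be checked by enumerating a small joint distribution. The sketch below assumes made-up priors for B and E and an invented conditional probability table for the alarm, chosen only to make the effect visible; none of these numbers come from the slides.

```python
from itertools import product

# Illustrative "explaining away" computation for the burglary/earthquake/alarm
# example. All probabilities are assumed values for demonstration only.

P_B = 0.01              # P(burglary)
P_E = 0.02              # P(earthquake); B and E are marginally independent
P_A = {                 # P(alarm | B, E)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}

def joint(b, e, a):
    """P(B=b, E=e, A=a) for the three-node structure B -> A <- E."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return pb * pe * pa

def p_burglary_given(a, e=None):
    """P(B=true | A=a) or P(B=true | A=a, E=e), by enumerating the joint."""
    num = den = 0.0
    for b, ev in product([True, False], repeat=2):
        if e is not None and ev != e:
            continue            # condition on the observed earthquake value
        p = joint(b, ev, a)
        den += p
        if b:
            num += p
    return num / den

print("P(B | A)    =", p_burglary_given(a=True))          # ~0.58 with these numbers
print("P(B | A, E) =", p_burglary_given(a=True, e=True))  # ~0.03: E explains the alarm away
```

This dependence between B and E once A is observed is exactly what naïve Bayes cannot represent, and it is the kind of structure the Bayesian networks listed above are designed to capture.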