Bayesian Decision Theory
Shyh-Kang Jeng
Department of Electrical Engineering / Graduate Institute of Communication / Graduate Institute of Networking and Multimedia, National Taiwan University

Basic Assumptions
– The decision problem is posed in probabilistic terms
– All of the relevant probability values are known

State of Nature
– State of nature \omega: \omega_1 (sea bass) or \omega_2 (salmon)
– A priori probability (prior): P(\omega_1) is the probability that the next fish is sea bass; P(\omega_2) is the probability that the next fish is salmon
– Decision rule to judge just one fish: decide \omega_1 if P(\omega_1) > P(\omega_2); otherwise decide \omega_2

Class-Conditional Probability Density
(figure: class-conditional densities p(x \mid \omega_i))

Bayes Formula
P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}, \qquad p(x) = \sum_{j=1}^{2} p(x \mid \omega_j)\, P(\omega_j)
posterior = likelihood \times prior / evidence

Posterior Probabilities
(figure: posterior probabilities for P(\omega_1) = 2/3, P(\omega_2) = 1/3)

Bayes Decision Rule
Probability of error:
P(\text{error} \mid x) = P(\omega_1 \mid x) if we decide \omega_2; \; P(\omega_2 \mid x) if we decide \omega_1
P(\text{error}) = \int p(\text{error}, x)\, dx = \int P(\text{error} \mid x)\, p(x)\, dx
Bayes decision rule: decide \omega_1 if P(\omega_1 \mid x) > P(\omega_2 \mid x); otherwise decide \omega_2
Equivalently, decide \omega_1 if p(x \mid \omega_1) P(\omega_1) > p(x \mid \omega_2) P(\omega_2); otherwise decide \omega_2

Bayes Decision Theory (1/3)
– Categories: \omega_1, \ldots, \omega_c
– Actions: \alpha_1, \ldots, \alpha_a
– Loss function: \lambda(\alpha_i \mid \omega_j)
– Feature vector \mathbf{x}: a d-component vector

Bayes Decision Theory (2/3)
Bayes formula:
P(\omega_j \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_j)\, P(\omega_j)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j)\, P(\omega_j)
Conditional risk:
R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid \mathbf{x})

Bayes Decision Theory (3/3)
The decision function \alpha(\mathbf{x}) assumes one of the a values \alpha_1, \ldots, \alpha_a
Overall risk:
R = \int R(\alpha(\mathbf{x}) \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}
Bayes decision rule: compute the conditional risk
R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid \mathbf{x}), \qquad i = 1, \ldots, a
then select the action \alpha_i for which R(\alpha_i \mid \mathbf{x}) is minimum

Two-Category Classification
Conditional risk:
R(\alpha_1 \mid \mathbf{x}) = \lambda_{11} P(\omega_1 \mid \mathbf{x}) + \lambda_{12} P(\omega_2 \mid \mathbf{x})
R(\alpha_2 \mid \mathbf{x}) = \lambda_{21} P(\omega_1 \mid \mathbf{x}) + \lambda_{22} P(\omega_2 \mid \mathbf{x}), \qquad \lambda_{ij} = \lambda(\alpha_i \mid \omega_j)
Decision rule: decide \omega_1 if
(\lambda_{21} - \lambda_{11})\, p(\mathbf{x} \mid \omega_1)\, P(\omega_1) > (\lambda_{12} - \lambda_{22})\, p(\mathbf{x} \mid \omega_2)\, P(\omega_2)
Likelihood-ratio form: decide \omega_1 if
\frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}

Minimum-Error-Rate Classification
If action \alpha_i is taken and the true state of nature is \omega_j, the decision is correct if i = j and in error if i \neq j
The error rate (probability of error) is to be minimized
Symmetrical or zero-one loss function:
\lambda(\alpha_i \mid \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases}, \qquad i, j = 1, \ldots, c
Conditional risk:
R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid \mathbf{x}) = \sum_{j \neq i} P(\omega_j \mid \mathbf{x}) = 1 - P(\omega_i \mid \mathbf{x})

Minimum-Error-Rate Classification
(figure)

Mini-max Criterion
To perform well over a range of prior probabilities
Minimize the maximum possible overall risk
– So that the worst risk for any value of the priors is as small as possible

Mini-maximizing Risk
R = \int_{R_1} \left[ \lambda_{11} P(\omega_1) p(\mathbf{x} \mid \omega_1) + \lambda_{12} P(\omega_2) p(\mathbf{x} \mid \omega_2) \right] d\mathbf{x} + \int_{R_2} \left[ \lambda_{21} P(\omega_1) p(\mathbf{x} \mid \omega_1) + \lambda_{22} P(\omega_2) p(\mathbf{x} \mid \omega_2) \right] d\mathbf{x}
= \lambda_{22} + (\lambda_{12} - \lambda_{22}) \int_{R_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x} + P(\omega_1) \Big[ (\lambda_{11} - \lambda_{22}) + (\lambda_{21} - \lambda_{11}) \int_{R_2} p(\mathbf{x} \mid \omega_1)\, d\mathbf{x} - (\lambda_{12} - \lambda_{22}) \int_{R_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x} \Big]
Choosing the decision boundary so that the bracketed term vanishes makes the risk independent of P(\omega_1); the minimax risk is then
R_{mm} = \lambda_{22} + (\lambda_{12} - \lambda_{22}) \int_{R_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x}

Searching for Mini-max Boundary
(figure)

Neyman-Pearson Criterion
Minimize the overall risk subject to a constraint
Example: minimize the total risk subject to \int R(\alpha_i \mid \mathbf{x})\, d\mathbf{x} = \text{constant}

Discriminant Functions
A classifier assigns \mathbf{x} to class \omega_i if g_i(\mathbf{x}) > g_j(\mathbf{x}) for all j \neq i, where the g_i(\mathbf{x}) are called discriminant functions
A discriminant function for a Bayes classifier: g_i(\mathbf{x}) = -R(\alpha_i \mid \mathbf{x})
Two discriminant functions for minimum-error-rate classification:
g_i(\mathbf{x}) = P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{\sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j)\, P(\omega_j)}; \qquad g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i)

Discriminant Functions
(figure)

Two-Dimensional Two-Category Classifier
(figure)
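To make the machinery above concrete, here is a minimal sketch (Python/NumPy) of the general minimum-risk rule: evaluate the posteriors with Bayes formula, form the conditional risks R(\alpha_i \mid \mathbf{x}) from a loss matrix, and pick the action with the smallest risk. The priors 2/3 and 1/3 and the zero-one loss come from the slides; the likelihood values at the observed x are made-up illustrative numbers.

```python
import numpy as np

def bayes_decision(likelihoods, priors, loss):
    """Pick the action with minimum conditional risk R(a_i | x).

    likelihoods : p(x|w_j) for each class j, evaluated at the observed x
    priors      : P(w_j)
    loss        : loss[i, j] = lambda(a_i | w_j)
    """
    joint = likelihoods * priors          # p(x|w_j) P(w_j)
    posteriors = joint / joint.sum()      # Bayes formula: P(w_j|x)
    risks = loss @ posteriors             # R(a_i|x) = sum_j lambda_ij P(w_j|x)
    return np.argmin(risks), risks, posteriors

# Two classes (sea bass, salmon) with the slides' priors and zero-one loss;
# the likelihood values below are assumed for illustration.
likelihoods = np.array([0.05, 0.30])      # assumed p(x|w_1), p(x|w_2) at this x
priors = np.array([2 / 3, 1 / 3])         # priors from the slides' example
zero_one = np.array([[0.0, 1.0],
                     [1.0, 0.0]])

action, risks, post = bayes_decision(likelihoods, priors, zero_one)
print(post)    # P(w_1|x) = 0.25, P(w_2|x) = 0.75
print(action)  # index 1: decide w_2, the maximum-posterior class
```

With the zero-one loss the rule reduces to choosing the maximum posterior, which is exactly the minimum-error-rate classification described above; a different loss matrix shifts the decision threshold as in the likelihood-ratio form.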
Dichotomizers
Place a pattern in one of only two categories (cf. polychotomizers)
More common to define a single discriminant function g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x})
Some particular forms:
g(\mathbf{x}) = P(\omega_1 \mid \mathbf{x}) - P(\omega_2 \mid \mathbf{x})
g(\mathbf{x}) = \ln \frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}

Univariate Normal PDF
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right] \sim N(\mu, \sigma^2)

Distribution with Maximum Entropy and Central Limit Theorem
Entropy for a discrete distribution:
H = -\sum_{i=1}^{m} P_i \log_2 P_i \quad \text{(bits)}
Entropy for a continuous distribution:
H(p(x)) = -\int p(x) \ln p(x)\, dx \quad \text{(nats)}
Central limit theorem: the aggregate effect of the sum of a large number of small, independent random disturbances leads to a Gaussian distribution

Multivariate Normal PDF
p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^t \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right] \sim N(\boldsymbol{\mu}, \Sigma)
\boldsymbol{\mu} = E[\mathbf{x}]: d-component mean vector
\Sigma = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^t]: d-by-d covariance matrix

Linear Combination of Gaussian Random Variables
If p(\mathbf{x}) \sim N(\boldsymbol{\mu}, \Sigma) and \mathbf{y} = \mathbf{A}^t \mathbf{x}, then p(\mathbf{y}) \sim N(\mathbf{A}^t \boldsymbol{\mu}, \mathbf{A}^t \Sigma \mathbf{A})

Whitening Transform
\Phi: matrix whose columns are the orthonormal eigenvectors of \Sigma
\Lambda: diagonal matrix of the corresponding eigenvalues
Whitening transform: \mathbf{A}_w = \Phi \Lambda^{-1/2}, so that \mathbf{A}_w^t \Sigma \mathbf{A}_w = \mathbf{I}

Bivariate Gaussian PDF
(figure)

Mahalanobis Distance
Squared Mahalanobis distance:
r^2 = (\mathbf{x} - \boldsymbol{\mu})^t \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})
Volume of the hyperellipsoid of constant Mahalanobis distance r:
V = V_d\, |\Sigma|^{1/2} r^d, \qquad V_d = \begin{cases} \pi^{d/2} / (d/2)! & d \text{ even} \\ 2^d \pi^{(d-1)/2} \left( \frac{d-1}{2} \right)! \, / \, d! & d \text{ odd} \end{cases}

Discriminant Functions for Normal Density
For minimum-error-rate classification:
g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i)
For the normal density:
g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^t \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)

Case 1: \Sigma_i = \sigma^2 \mathbf{I}
g_i(\mathbf{x}) = -\frac{\|\mathbf{x} - \boldsymbol{\mu}_i\|^2}{2\sigma^2} + \ln P(\omega_i), \qquad \|\mathbf{x} - \boldsymbol{\mu}_i\|^2 = (\mathbf{x} - \boldsymbol{\mu}_i)^t (\mathbf{x} - \boldsymbol{\mu}_i)
Expanding, g_i(\mathbf{x}) = -\frac{1}{2\sigma^2} \left[ \mathbf{x}^t \mathbf{x} - 2 \boldsymbol{\mu}_i^t \mathbf{x} + \boldsymbol{\mu}_i^t \boldsymbol{\mu}_i \right] + \ln P(\omega_i); dropping the term \mathbf{x}^t \mathbf{x} common to all classes gives a linear discriminant:
g_i(\mathbf{x}) = \mathbf{w}_i^t \mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \frac{1}{\sigma^2} \boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2\sigma^2} \boldsymbol{\mu}_i^t \boldsymbol{\mu}_i + \ln P(\omega_i)

Decision Boundaries
g_i(\mathbf{x}) = g_j(\mathbf{x}) \;\Rightarrow\; \mathbf{w}^t (\mathbf{x} - \mathbf{x}_0) = 0
\mathbf{w} = \boldsymbol{\mu}_i - \boldsymbol{\mu}_j, \qquad \mathbf{x}_0 = \frac{1}{2} (\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\sigma^2}{\|\boldsymbol{\mu}_i - \boldsymbol{\mu}_j\|^2} \ln \frac{P(\omega_i)}{P(\omega_j)}\, (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)

Decision Boundaries when P(\omega_i) = P(\omega_j)
(figure)

Decision Boundaries when P(\omega_i) and P(\omega_j) are unequal
(figure)

Case 2: \Sigma_i = \Sigma
g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^t \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \ln P(\omega_i)
Dropping the quadratic term \mathbf{x}^t \Sigma^{-1} \mathbf{x} common to all classes:
g_i(\mathbf{x}) = \mathbf{w}_i^t \mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \Sigma^{-1} \boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2} \boldsymbol{\mu}_i^t \Sigma^{-1} \boldsymbol{\mu}_i + \ln P(\omega_i)

Decision Boundaries
g_i(\mathbf{x}) = g_j(\mathbf{x}) \;\Rightarrow\; \mathbf{w}^t (\mathbf{x} - \mathbf{x}_0) = 0
\mathbf{w} = \Sigma^{-1} (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j), \qquad \mathbf{x}_0 = \frac{1}{2} (\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\ln [P(\omega_i) / P(\omega_j)]}{(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^t \Sigma^{-1} (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)}\, (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)

Decision Boundaries
(figure)

Case 3: \Sigma_i arbitrary
g_i(\mathbf{x}) = \mathbf{x}^t \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^t \mathbf{x} + w_{i0}
\mathbf{W}_i = -\frac{1}{2} \Sigma_i^{-1}, \qquad \mathbf{w}_i = \Sigma_i^{-1} \boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2} \boldsymbol{\mu}_i^t \Sigma_i^{-1} \boldsymbol{\mu}_i - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)

Decision Boundaries for One-Dimensional Case
(figure)

Decision Boundaries for Two-Dimensional Case
(figure)

Decision Boundaries for Three-Dimensional Case (1/2)
(figure)

Decision Boundaries for Three-Dimensional Case (2/2)
(figure)

Decision Boundaries for Four Normal Distributions
(figure)

Example: Decision Regions for Two-Dimensional Gaussian Data
(figure)

Example: Decision Regions for Two-Dimensional Gaussian Data
\boldsymbol{\mu}_1 = \begin{pmatrix} 3 \\ 6 \end{pmatrix}, \quad \Sigma_1 = \begin{pmatrix} 1/2 & 0 \\ 0 & 2 \end{pmatrix}, \quad \boldsymbol{\mu}_2 = \begin{pmatrix} 3 \\ -2 \end{pmatrix}, \quad \Sigma_2 = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}
\Sigma_1^{-1} = \begin{pmatrix} 2 & 0 \\ 0 & 1/2 \end{pmatrix}, \quad \Sigma_2^{-1} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}, \qquad P(\omega_1) = P(\omega_2) = 0.5
Decision boundary: x_2 = 3.514 - 1.125\, x_1 + 0.1875\, x_1^2, not passing through the midpoint \begin{pmatrix} 3 \\ 2 \end{pmatrix}

Bayes Decision Compared with Other Decision Strategies
P(\text{error}) = P(\mathbf{x} \in R_2, \omega_1) + P(\mathbf{x} \in R_1, \omega_2) = \int_{R_2} p(\mathbf{x} \mid \omega_1)\, P(\omega_1)\, d\mathbf{x} + \int_{R_1} p(\mathbf{x} \mid \omega_2)\, P(\omega_2)\, d\mathbf{x}

Multicategory Case
Probability of being correct:
P(\text{correct}) = \sum_{i=1}^{c} \int_{R_i} p(\mathbf{x} \mid \omega_i)\, P(\omega_i)\, d\mathbf{x}
The Bayes classifier maximizes this probability by choosing the regions so that the integrand is maximal for all \mathbf{x}
– No other partitioning can yield a smaller probability of error
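Before turning to error bounds, here is a quick numerical check of the two-dimensional Gaussian example above: a sketch that builds the Case-3 quadratic discriminants g_i(\mathbf{x}) from the stated means and covariances and confirms that points on the quoted boundary x_2 = 3.514 - 1.125 x_1 + 0.1875 x_1^2 give g_1(\mathbf{x}) \approx g_2(\mathbf{x}). The probe values of x_1 are arbitrary.

```python
import numpy as np

# Means, covariances, and priors from the slides' example.
mu1, S1 = np.array([3.0, 6.0]), np.diag([0.5, 2.0])
mu2, S2 = np.array([3.0, -2.0]), np.diag([2.0, 2.0])
P1 = P2 = 0.5

def g(x, mu, S, prior):
    # g_i(x) = -1/2 (x-mu)^t S^-1 (x-mu) - 1/2 ln|S| + ln P(w_i),
    # dropping the (d/2) ln 2*pi term common to both classes.
    d = x - mu
    return (-0.5 * d @ np.linalg.inv(S) @ d
            - 0.5 * np.log(np.linalg.det(S)) + np.log(prior))

for x1 in [-2.0, 0.0, 3.0, 5.0]:            # arbitrary probe abscissas
    x2 = 3.514 - 1.125 * x1 + 0.1875 * x1**2  # boundary quoted on the slide
    x = np.array([x1, x2])
    # difference is ~0 on the boundary, up to rounding of the coefficients
    print(x1, g(x, mu1, S1, P1) - g(x, mu2, S2, P2))
```

The residuals are on the order of 1e-3, which is the rounding of the quoted coefficients; with exact coefficients the difference is identically zero.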
Error Bounds for Normal Densities
Full calculation of the error probability is difficult for the Gaussian case
– Especially in high dimensions
– Because of the discontinuous nature of the decision regions
An upper bound on the error can be obtained for the two-category case
– By approximating the error integral analytically

Chernoff Bound
\min[a, b] \le a^{\beta} b^{1-\beta} \quad \text{for } a, b \ge 0 \text{ and } 0 \le \beta \le 1
P(\text{error}) = \int \min[P(\omega_1 \mid \mathbf{x}), P(\omega_2 \mid \mathbf{x})]\, p(\mathbf{x})\, d\mathbf{x}, \qquad P(\omega_j \mid \mathbf{x})\, p(\mathbf{x}) = p(\mathbf{x} \mid \omega_j)\, P(\omega_j)
P(\text{error}) \le P^{\beta}(\omega_1)\, P^{1-\beta}(\omega_2) \int p^{\beta}(\mathbf{x} \mid \omega_1)\, p^{1-\beta}(\mathbf{x} \mid \omega_2)\, d\mathbf{x}
For normal densities:
\int p^{\beta}(\mathbf{x} \mid \omega_1)\, p^{1-\beta}(\mathbf{x} \mid \omega_2)\, d\mathbf{x} = e^{-k(\beta)}
k(\beta) = \frac{\beta(1-\beta)}{2} (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^t \left[ (1-\beta)\Sigma_1 + \beta \Sigma_2 \right]^{-1} (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1) + \frac{1}{2} \ln \frac{|(1-\beta)\Sigma_1 + \beta \Sigma_2|}{|\Sigma_1|^{1-\beta}\, |\Sigma_2|^{\beta}}

Bhattacharyya Bound
Set \beta = 1/2:
P(\text{error}) \le \sqrt{P(\omega_1) P(\omega_2)} \int \sqrt{p(\mathbf{x} \mid \omega_1)\, p(\mathbf{x} \mid \omega_2)}\, d\mathbf{x} = \sqrt{P(\omega_1) P(\omega_2)}\; e^{-k(1/2)}
k(1/2) = \frac{1}{8} (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^t \left[ \frac{\Sigma_1 + \Sigma_2}{2} \right]^{-1} (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1) + \frac{1}{2} \ln \frac{|(\Sigma_1 + \Sigma_2)/2|}{\sqrt{|\Sigma_1|\, |\Sigma_2|}}

Chernoff Bound and Bhattacharyya Bound
(figure)

Example: Error Bounds for Gaussian Distribution
Same distributions as in the two-dimensional Gaussian example:
\boldsymbol{\mu}_1 = \begin{pmatrix} 3 \\ 6 \end{pmatrix}, \quad \Sigma_1 = \begin{pmatrix} 1/2 & 0 \\ 0 & 2 \end{pmatrix}, \quad \boldsymbol{\mu}_2 = \begin{pmatrix} 3 \\ -2 \end{pmatrix}, \quad \Sigma_2 = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \qquad P(\omega_1) = P(\omega_2) = 0.5

Example: Error Bounds for Gaussian Distribution
Bhattacharyya bound
– k(1/2) = 4.11157
– P(\text{error}) \le 0.0082
Chernoff bound
– 0.008190, by numerical search over \beta
Error rate by numerical integration
– 0.0021
– Numerical integration is impractical in higher dimensions

Signal Detection Theory
Internal signal x in the detector
– Has mean \mu_2 when the external signal (pulse) is present
– Has mean \mu_1 when the external signal is not present
– p(x \mid \omega_i) \sim N(\mu_i, \sigma^2)

Signal Detection Theory
Discriminability: d' = \frac{|\mu_2 - \mu_1|}{\sigma}

Four Probabilities
Hit: P(x > x^* \mid x \in \omega_2)
False alarm: P(x > x^* \mid x \in \omega_1)
Miss: P(x < x^* \mid x \in \omega_2)
Correct reject: P(x < x^* \mid x \in \omega_1)

Receiver Operating Characteristic (ROC)
(figure)

Bayes Decision Theory: Discrete Features
Integrals over densities become sums over probabilities:
\int p(\mathbf{x} \mid \omega_i)\, d\mathbf{x} \;\rightarrow\; \sum_{\mathbf{x}} P(\mathbf{x} \mid \omega_i)
P(\omega_i \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{P(\mathbf{x})}, \qquad P(\mathbf{x}) = \sum_{i=1}^{c} P(\mathbf{x} \mid \omega_i)\, P(\omega_i)
Select the action \alpha^* = \arg\min_i R(\alpha_i \mid \mathbf{x})

Independent Binary Features
\mathbf{x} = (x_1, \ldots, x_d)^t, \qquad p_i = \Pr[x_i = 1 \mid \omega_1], \qquad q_i = \Pr[x_i = 1 \mid \omega_2]
Assume conditional independence:
P(\mathbf{x} \mid \omega_1) = \prod_{i=1}^{d} p_i^{x_i} (1 - p_i)^{1 - x_i}, \qquad P(\mathbf{x} \mid \omega_2) = \prod_{i=1}^{d} q_i^{x_i} (1 - q_i)^{1 - x_i}
Likelihood ratio:
\frac{P(\mathbf{x} \mid \omega_1)}{P(\mathbf{x} \mid \omega_2)} = \prod_{i=1}^{d} \left( \frac{p_i}{q_i} \right)^{x_i} \left( \frac{1 - p_i}{1 - q_i} \right)^{1 - x_i}

Discriminant Function
g(\mathbf{x}) = \sum_{i=1}^{d} \left[ x_i \ln \frac{p_i}{q_i} + (1 - x_i) \ln \frac{1 - p_i}{1 - q_i} \right] + \ln \frac{P(\omega_1)}{P(\omega_2)}
g(\mathbf{x}) = \sum_{i=1}^{d} w_i x_i + w_0
w_i = \ln \frac{p_i (1 - q_i)}{q_i (1 - p_i)}, \qquad w_0 = \sum_{i=1}^{d} \ln \frac{1 - p_i}{1 - q_i} + \ln \frac{P(\omega_1)}{P(\omega_2)}
Decide \omega_1 if g(\mathbf{x}) > 0 and \omega_2 if g(\mathbf{x}) \le 0

Example: Three-Dimensional Binary Data
P(\omega_1) = P(\omega_2) = 0.5, \qquad p_i = 0.8, \; q_i = 0.5, \; i = 1, 2, 3
w_i = \ln \frac{0.8\,(1 - 0.5)}{0.5\,(1 - 0.8)} = 1.3863
w_0 = \sum_{i=1}^{3} \ln \frac{1 - 0.8}{1 - 0.5} + \ln \frac{0.5}{0.5} = -2.75
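A short sketch reproducing the weights of the example just above: with p_i = 0.8, q_i = 0.5, and equal priors, the formulas for w_i and w_0 yield 1.3863 and -2.75, and the sign of g(x) classifies any binary vector. The two test vectors are arbitrary illustrations; setting p_3 = q_3, as in the variant that follows, would make w_3 = 0.

```python
import numpy as np

# Three-dimensional binary example from the slide, assuming conditional
# independence of the features:
#   w_i = ln[p_i(1-q_i) / (q_i(1-p_i))]
#   w_0 = sum_i ln[(1-p_i)/(1-q_i)] + ln[P(w1)/P(w2)]
p = np.array([0.8, 0.8, 0.8])   # Pr[x_i = 1 | w_1]
q = np.array([0.5, 0.5, 0.5])   # Pr[x_i = 1 | w_2]
P1 = P2 = 0.5

w = np.log(p * (1 - q) / (q * (1 - p)))
w0 = np.log((1 - p) / (1 - q)).sum() + np.log(P1 / P2)
print(w)    # each weight ~ 1.3863, as on the slide
print(w0)   # ~ -2.75

def decide(x):
    # decide w_1 if g(x) > 0, otherwise w_2
    return 1 if w @ x + w0 > 0 else 2

print(decide(np.array([1, 1, 1])))  # g = 3(1.3863) - 2.75 > 0  ->  class 1
print(decide(np.array([1, 0, 0])))  # g = 1.3863 - 2.75 < 0     ->  class 2
```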
Example: Three-Dimensional Binary Data
P(\omega_1) = P(\omega_2) = 0.5, \qquad p_i = 0.8, \; q_i = 0.5 \text{ for } i = 1, 2, \qquad p_3 = q_3 = 0.5
w_i = \ln \frac{0.8\,(1 - 0.5)}{0.5\,(1 - 0.8)} = 1.3863, \; i = 1, 2, \qquad w_3 = 0
w_0 = \sum_{i=1}^{2} \ln \frac{1 - 0.8}{1 - 0.5} + \ln \frac{0.5}{0.5} = -1.83

Illustration of Missing Features
(figure)

Decision with Missing Features
\mathbf{x} = [\mathbf{x}_g, \mathbf{x}_b], where \mathbf{x}_g are the good (measured) features and \mathbf{x}_b the bad (missing) ones
P(\omega_i \mid \mathbf{x}_g) = \frac{P(\omega_i, \mathbf{x}_g)}{P(\mathbf{x}_g)} = \frac{\int p(\omega_i, \mathbf{x}_g, \mathbf{x}_b)\, d\mathbf{x}_b}{\int p(\mathbf{x})\, d\mathbf{x}_b} = \frac{\int P(\omega_i \mid \mathbf{x}_g, \mathbf{x}_b)\, p(\mathbf{x}_g, \mathbf{x}_b)\, d\mathbf{x}_b}{\int p(\mathbf{x})\, d\mathbf{x}_b} = \frac{\int g_i(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}_b}{\int p(\mathbf{x})\, d\mathbf{x}_b}

Noisy Features
Noise model: p(\mathbf{x}_b \mid \mathbf{x}_t); assume \mathbf{x}_b is independent of \omega_i and \mathbf{x}_g given \mathbf{x}_t
P(\omega_i \mid \mathbf{x}_g, \mathbf{x}_b) = \frac{\int p(\omega_i, \mathbf{x}_g, \mathbf{x}_b, \mathbf{x}_t)\, d\mathbf{x}_t}{p(\mathbf{x}_g, \mathbf{x}_b)}
p(\omega_i, \mathbf{x}_g, \mathbf{x}_b, \mathbf{x}_t) = P(\omega_i \mid \mathbf{x}_g, \mathbf{x}_b, \mathbf{x}_t)\, p(\mathbf{x}_g, \mathbf{x}_b, \mathbf{x}_t) = P(\omega_i \mid \mathbf{x}_g, \mathbf{x}_t)\, p(\mathbf{x}_b \mid \mathbf{x}_t)\, p(\mathbf{x}_g, \mathbf{x}_t)
P(\omega_i \mid \mathbf{x}_g, \mathbf{x}_b) = \frac{\int P(\omega_i \mid \mathbf{x}_g, \mathbf{x}_t)\, p(\mathbf{x}_g, \mathbf{x}_t)\, p(\mathbf{x}_b \mid \mathbf{x}_t)\, d\mathbf{x}_t}{\int p(\mathbf{x}_g, \mathbf{x}_t)\, p(\mathbf{x}_b \mid \mathbf{x}_t)\, d\mathbf{x}_t} = \frac{\int g_i(\mathbf{x})\, p(\mathbf{x})\, p(\mathbf{x}_b \mid \mathbf{x}_t)\, d\mathbf{x}_t}{\int p(\mathbf{x})\, p(\mathbf{x}_b \mid \mathbf{x}_t)\, d\mathbf{x}_t}, \qquad \mathbf{x} = [\mathbf{x}_g, \mathbf{x}_t]

Example of Statistical Dependence and Independence
(figure illustrating whether p(x_1, x_3) = p(x_1)\, p(x_3))

Example of Causal Dependence
State of an automobile
– Temperature of the engine
– Pressure of the brake fluid
– Pressure of the air in the tires
– Voltages in the wires
– Oil temperature
– Coolant temperature
– Speed of the radiator fan

Bayesian Belief Nets (Causal Networks)
(figure)

Example: Belief Network for Fish
P(a_3, b_1, x_2, c_3, d_2) = P(a_3)\, P(b_1)\, P(x_2 \mid a_3, b_1)\, P(c_3 \mid x_2)\, P(d_2 \mid x_2) = 0.25 \times 0.6 \times 0.4 \times 0.5 \times 0.4 = 0.012

Simple Belief Network 1
P(d) = \sum_{a,b,c} P(a, b, c, d) = \sum_{a,b,c} P(a)\, P(b \mid a)\, P(c \mid b)\, P(d \mid c) = \sum_{c} P(d \mid c) \sum_{b} P(c \mid b) \sum_{a} P(b \mid a)\, P(a)

Simple Belief Network 2
P(h) = \sum_{e,f,g} P(e, f, g, h) = \sum_{e,f,g} P(e)\, P(f \mid e)\, P(g \mid e)\, P(h \mid f, g) = \sum_{e} P(e) \sum_{f,g} P(f \mid e)\, P(g \mid e)\, P(h \mid f, g)

Use of Bayes Belief Nets
Seek to determine some particular configuration of other variables
– Given the values of some of the variables (evidence)
Determine the values of several query variables \mathbf{x} given the evidence of all other variables \mathbf{e}:
P(\mathbf{x} \mid \mathbf{e}) = \frac{P(\mathbf{x}, \mathbf{e})}{P(\mathbf{e})} \propto P(\mathbf{x}, \mathbf{e})

Example
(figure)

Example
P(x_1 \mid c_1, b_2) = \frac{P(x_1, c_1, b_2)}{P(c_1, b_2)} \propto \sum_{a,d} P(a)\, P(b_2)\, P(x_1 \mid a, b_2)\, P(c_1 \mid x_1)\, P(d \mid x_1)
= P(b_2)\, P(c_1 \mid x_1) \left[ \sum_{k=1}^{4} P(a_k)\, P(x_1 \mid a_k, b_2) \right] \left[ P(d_1 \mid x_1) + P(d_2 \mid x_1) \right] = 0.114
Similarly P(x_2 \mid c_1, b_2) \propto 0.066; normalizing,
P(x_1 \mid c_1, b_2) = 0.63, \qquad P(x_2 \mid c_1, b_2) = 0.37

Naïve Bayes' Rule (Idiot Bayes' Rule)
When the dependency relationships among the features are unknown, we generally take the simplest assumption
– Features are conditionally independent given the category:
P(x \mid a, b) \propto P(x \mid a)\, P(x \mid b)
– Often works quite well

Applications in Medical Diagnosis
Uppermost nodes represent a fundamental biological agent
– Such as the presence of a virus or bacteria
Intermediate nodes describe diseases
– Such as flu or emphysema
Lowermost nodes describe the symptoms
– Such as high temperature or coughing
A physician enters measured values into the net and finds the most likely disease or cause

Compound Bayesian Decision
\boldsymbol{\omega} = (\omega(1), \ldots, \omega(n))^t, where each \omega(i) takes one of the values \omega_1, \ldots, \omega_c
\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)
P(\boldsymbol{\omega} \mid \mathbf{X}) = \frac{p(\mathbf{X} \mid \boldsymbol{\omega})\, P(\boldsymbol{\omega})}{p(\mathbf{X})} \propto p(\mathbf{X} \mid \boldsymbol{\omega})\, P(\boldsymbol{\omega})
Simplification:
p(\mathbf{X} \mid \boldsymbol{\omega}) = \prod_{i=1}^{n} p(\mathbf{x}_i \mid \omega(i))
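Returning to the fish belief network for a closing numerical check: a sketch that evaluates the slide's joint probability as the product of the chain factors, then normalizes the two unnormalized query values 0.114 and 0.066 into the posteriors 0.63 and 0.37. Only the factor values quoted on the slides are used; the full conditional probability tables are not given there.

```python
# Factors read off the fish belief-network slide (single CPT entries,
# not the full tables).
P_a3 = 0.25          # P(a_3)
P_b1 = 0.6           # P(b_1)
P_x2_a3b1 = 0.4      # P(x_2 | a_3, b_1)
P_c3_x2 = 0.5        # P(c_3 | x_2)
P_d2_x2 = 0.4        # P(d_2 | x_2)

# Chain factorization over the net: parents first, then children.
joint = P_a3 * P_b1 * P_x2_a3b1 * P_c3_x2 * P_d2_x2
print(joint)   # 0.012, matching the slide

# Normalizing the unnormalized query values from the query example:
u_x1, u_x2 = 0.114, 0.066            # P(x_i, c_1, b_2) up to a common factor
print(u_x1 / (u_x1 + u_x2))          # ~0.63 = P(x_1 | c_1, b_2)
print(u_x2 / (u_x1 + u_x2))          # ~0.37 = P(x_2 | c_1, b_2)
```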