Bayes Decision Theory

• Bayes Error Rate and Error Bounds
• Receiver Operating Characteristics
• Discrete Features
• Missing Features

Example of Bayes Decision Boundary

Two Gaussian distributions, each estimated from four data points:

  μ1 = [3, 6]ᵗ    Σ1 = [[1/2, 0], [0, 2]]
  μ2 = [3, −2]ᵗ   Σ2 = [[2, 0], [0, 2]]

Inverse matrices:

  Σ1⁻¹ = [[2, 0], [0, 1/2]]
  Σ2⁻¹ = [[1/2, 0], [0, 1/2]]

Assuming equal priors P(ω1) = P(ω2) = 0.5, the decision boundary is the parabola

  x2 = 3.514 − 1.125 x1 + 0.1875 x1²

[Figure: the two densities and the parabolic decision boundary in the (x1, x2) plane.]

Bayes Error Rate

Two-class case: for decision regions R1 (decide ω1) and R2 (decide ω2),

  P(error) = ∫_{R2} p(x | ω1) P(ω1) dx + ∫_{R1} p(x | ω2) P(ω2) dx

The Bayes decision rule gives the optimal regions, those minimizing this quantity. Stated this way for the multidimensional case, the regions R1 and R2 are not easy to specify.

Multi-class case: it is simpler to work with the probability of being correct,

  P(correct) = Σ_{i=1}^{c} ∫_{Ri} p(x | ωi) P(ωi) dx

[Figure: components of P(error) for equal priors and a non-optimal decision point x*.]

Error Bounds for Normal Densities

• Exact evaluation of the error integrals is difficult, since the decision regions may be discontinuous
• Instead, obtain analytic bounds on the error rate
• A useful inequality for deriving such bounds: the minimum of two nonnegative numbers is no larger than the square root of their product, and more generally

  min[a, b] ≤ a^β b^(1−β)   for a, b ≥ 0 and 0 ≤ β ≤ 1

• Proof: if a ≥ b then (a/b)^β ≥ 1, so (a/b)^β b ≥ b, i.e. a^β b^(1−β) ≥ b = min[a, b]; the case a < b is symmetric

Chernoff Bound

Applying the inequality to the error integral gives, for 0 ≤ β ≤ 1,

  P(error) ≤ P(ω1)^β P(ω2)^(1−β) ∫ p(x | ω1)^β p(x | ω2)^(1−β) dx

Note that the integral is over all of feature space: there is no need to impose integration limits corresponding to the decision regions. If the class-conditional densities are normal, the integral can be evaluated analytically:

  ∫ p(x | ω1)^β p(x | ω2)^(1−β) dx = e^(−k(β))

where

  k(β) = [β(1−β)/2] (μ2 − μ1)ᵗ [βΣ1 + (1−β)Σ2]⁻¹ (μ2 − μ1) + (1/2) ln( |βΣ1 + (1−β)Σ2| / (|Σ1|^β |Σ2|^(1−β)) )

The Chernoff bound is obtained by minimizing e^(−k(β)) over β, a one-dimensional search regardless of the dimensionality of x.

Bhattacharyya Bound

• Special case of the Chernoff bound with β = 1/2:

  P(error) ≤ √(P(ω1) P(ω2)) ∫ √( p(x | ω1) p(x | ω2) ) dx = √(P(ω1) P(ω2)) e^(−k(1/2))

Bhattacharyya versus Chernoff Error Bounds (as β is varied)

[Figure: the error bound as a function of β.] The Chernoff bound is never looser than the Bhattacharyya bound, since β = 1/2 is just one candidate in the minimization. Here the Chernoff bound occurs at β* = 0.66 and is slightly tighter than the Bhattacharyya bound (β = 0.5).

Example of Bhattacharyya bound with normal densities

For the two Gaussians of the earlier example (μ1 = [3, 6]ᵗ, Σ1 = [[1/2, 0], [0, 2]]; μ2 = [3, −2]ᵗ, Σ2 = [[2, 0], [0, 2]]),

  k(1/2) = 4.11,  so  P(error) ≤ √(0.5 × 0.5) e^(−4.11) ≈ 0.0082
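A minimal sketch in Python (assuming NumPy and SciPy; k_beta is an illustrative name, not a library routine) that computes k(β) for the example densities, evaluates the Bhattacharyya bound, and finds the Chernoff bound by one-dimensional minimization:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def k_beta(beta, mu1, cov1, mu2, cov2):
    """Chernoff exponent k(beta) for two Gaussian class-conditional densities."""
    diff = mu2 - mu1
    cov = beta * cov1 + (1.0 - beta) * cov2
    quad = 0.5 * beta * (1.0 - beta) * diff @ np.linalg.solve(cov, diff)
    logdet = 0.5 * np.log(np.linalg.det(cov)
                          / (np.linalg.det(cov1) ** beta
                             * np.linalg.det(cov2) ** (1.0 - beta)))
    return quad + logdet

# Parameters of the example above
mu1, cov1 = np.array([3.0, 6.0]), np.diag([0.5, 2.0])
mu2, cov2 = np.array([3.0, -2.0]), np.diag([2.0, 2.0])
prior1 = prior2 = 0.5

# Bhattacharyya bound: the Chernoff bound evaluated at beta = 1/2
k_half = k_beta(0.5, mu1, cov1, mu2, cov2)
print("k(1/2) =", round(k_half, 3))
print("Bhattacharyya bound =", round(np.sqrt(prior1 * prior2) * np.exp(-k_half), 4))

# Chernoff bound: minimize the full bound over beta in (0, 1)
bound = lambda b: prior1 ** b * prior2 ** (1 - b) * np.exp(-k_beta(b, mu1, cov1, mu2, cov2))
res = minimize_scalar(bound, bounds=(1e-6, 1 - 1e-6), method="bounded")
print("beta* =", round(res.x, 3), " Chernoff bound =", round(res.fun, 4))
```

For these parameters the mean term contributes (1/8)·64·(1/2) = 4 and the log-determinant term (1/2) ln 1.25 ≈ 0.11, reproducing k(1/2) ≈ 4.11 and the bound of about 0.0082 quoted above. Because the priors are equal, minimizing the bound is equivalent to maximizing k(β), which minimize_scalar does with a bounded one-dimensional search.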
Receiver Operating Characteristics

• The distance between Gaussian distributions is useful in experimental psychology, radar detection, and medical diagnosis
• Suppose we are interested in detecting a weak pulse, or a dim flash of light
• The detector's internal response x has mean value μ1 when the signal is absent and μ2 when the signal is present, with a common variance:

  p(x | ωi) ~ N(μi, σ²)

Four Types of Probability in Two-Class Discrimination

When no signal is present, p(x | ω1) ~ N(μ1, σ²); when the signal is present, p(x | ω2) ~ N(μ2, σ²). For a decision threshold x*, the four outcomes are:

• Hit: P(x > x* | ω2)
• False alarm: P(x > x* | ω1)
• Miss: P(x < x* | ω2)
• Correct rejection: P(x < x* | ω1)

The discriminability

  d′ = |μ2 − μ1| / σ

measures how separable the two distributions are; the decision threshold determines the probabilities of hit and false alarm.

ROC Curve

Plotting the probability of a hit against the probability of a false alarm as the threshold x* sweeps over all values traces out the receiver operating characteristic (ROC) curve; each value of d′ yields one curve. [Figure: ROC curves for several values of d′.]

ROCs need not be symmetric when the underlying distributions are not Gaussian. [Figure: an asymmetric ROC for non-Gaussian densities.]

Bayes Decision Theory: Discrete Features

• The components of x are binary or integer valued, so x can take only one of m discrete values v1, v2, …, vm
• Probability density functions are replaced by probabilities:

  P(ωj | x) = P(x | ωj) P(ωj) / P(x)   where   P(x) = Σ_{j=1}^{c} P(x | ωj) P(ωj)

Independent Binary Features

Two-category problem. Let x = [x1, x2, …, xd]ᵗ, where each xi is either 0 or 1, with probabilities

  pi = P(xi = 1 | ω1)
  qi = P(xi = 1 | ω2)

Assuming conditional independence of the features given the class,

  P(x | ω1) = Π_{i=1}^{d} pi^xi (1 − pi)^(1−xi)
  P(x | ω2) = Π_{i=1}^{d} qi^xi (1 − qi)^(1−xi)

so the likelihood ratio is

  P(x | ω1) / P(x | ω2) = Π_{i=1}^{d} (pi / qi)^xi ((1 − pi) / (1 − qi))^(1−xi)

Bayes Discriminant Function for Independent Binary Features

Taking the logarithm of the likelihood ratio plus the log prior ratio gives a linear discriminant:

  g(x) = Σ_{i=1}^{d} wi xi + w0

where

  wi = ln[ pi (1 − qi) / (qi (1 − pi)) ],   i = 1, …, d
  w0 = Σ_{i=1}^{d} ln[ (1 − pi) / (1 − qi) ] + ln[ P(ω1) / P(ω2) ]

Decide ω1 if g(x) > 0 and ω2 if g(x) ≤ 0.

Bayesian Decisions for 3-D Binary Data

Take P(ω1) = P(ω2) = 0.5 and pi = 0.8, qi = 0.5 for i = 1, 2, 3. Every wi is positive, and the decision surface g(x) = 0 is a plane cutting the unit cube. If instead p3 = q3, the third feature carries no class information, so w3 = 0 and the decision surface g(x) = 0 depends only on x1 and x2. [Figure: the decision plane within the unit cube for the two cases.] A numerical sketch follows below.
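A minimal sketch of the discriminant above in Python, assuming NumPy; binary_bayes_weights is an illustrative helper, not a library function:

```python
import numpy as np

def binary_bayes_weights(p, q, prior1=0.5, prior2=0.5):
    """Weights of the linear discriminant g(x) = w @ x + w0 for
    conditionally independent binary features, where
    p[i] = P(x_i = 1 | w1) and q[i] = P(x_i = 1 | w2)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    w = np.log(p * (1 - q) / (q * (1 - p)))
    w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(prior1 / prior2)
    return w, w0

# 3-D example: p_i = 0.8, q_i = 0.5 for i = 1, 2, 3, equal priors
w, w0 = binary_bayes_weights([0.8, 0.8, 0.8], [0.5, 0.5, 0.5])
print("w =", np.round(w, 3), " w0 =", round(w0, 3))  # each w_i = ln 4, w0 = 3 ln 0.4

# Evaluate the decision at every vertex of the unit cube
for x in np.ndindex(2, 2, 2):
    g = w @ np.array(x) + w0
    print(x, "-> decide", "w1" if g > 0 else "w2")
```

With these numbers g(x) > 0 exactly when at least two of the three features equal 1 (since 2 ln 4 ≈ 2.77 exceeds −w0 ≈ 2.75), so the plane g(x) = 0 slices the cube between those vertex sets. Passing p3 = q3 instead yields w3 = 0, and the decision no longer depends on x3.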
Missing and Noisy Features

• Features may be corrupted by a known noise source. Example: variability of the light source may degrade a measurement of lightness
• Features may be missing. Example: occlusion prevents measurement of length

Example of a Missing Feature

[Figure: four categories with equal priors and their class-conditional distributions over (x1, x2).] Suppose x1 is missing and the measured value of x2 is x̂2. We want to classify as ω2, since it has the largest likelihood at x̂2 once the missing x1 is integrated out. Simply substituting the mean of the missing feature (taken over all classes) can result in worse performance!

Missing Feature Analysis

Partition x into good features xg and bad (unknown or missing) features xb. Marginalize over all values of the missing features:

  P(ωi | xg) = ∫ P(ωi | xg, xb) p(xg, xb) dxb / ∫ p(xg, xb) dxb = ∫ gi(x) p(x) dxb / ∫ p(x) dxb

where gi(x) = P(ωi | x). This is the Bayes discriminant function for the observed features; a numerical sketch follows below.

Noisy Features

• Uncorrupted good features xg; observed corrupted values xb
• Noise model p(xb | xt), where xt is the true value underlying the observed value xb
• Assume that if xt were known, xb would be independent of ωi and xg. Then

  P(ωi | xg, xb) = ∫ P(ωi | xg, xt) p(xg, xt) p(xb | xt) dxt / ∫ p(xg, xt) p(xb | xt) dxt

The marginalization integral is now weighted by the noise model; when p(xb | xt) is uninformative, this reduces to the missing-feature case.
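A minimal Python sketch of the marginalization step, assuming a made-up three-class Gaussian example (the means, covariances, and the helper posterior_given_good are hypothetical, chosen only to illustrate the formula):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.integrate import quad

# Hypothetical setup: three classes with equal priors and 2-D Gaussian
# class-conditional densities; the feature vector is x = (x1, x2) and x1 is missing.
means  = [np.array([0.0, 0.0]), np.array([2.0, 1.0]), np.array([4.0, -1.0])]
covs   = [np.eye(2), np.diag([0.5, 1.5]), 2.0 * np.eye(2)]
priors = [1 / 3, 1 / 3, 1 / 3]

def posterior_given_good(x2):
    """P(w_i | x2): marginalize each class-conditional joint over the missing x1.
    Quadrature is used so the same code works for non-Gaussian density models."""
    nums = [pr * quad(lambda x1: multivariate_normal.pdf([x1, x2], m, c), -20, 20)[0]
            for m, c, pr in zip(means, covs, priors)]
    return np.array(nums) / np.sum(nums)

post = posterior_given_good(1.2)  # observed value of the good feature x2
print("P(w_i | x2 = 1.2):", np.round(post, 3))
print("decide class", np.argmax(post) + 1)
```

For Gaussian class-conditionals the inner integral has the closed form of the 1-D marginal in x2, so the quadrature is only for generality; the noisy-feature case would replace the flat integration over x1 with a weighting by the noise model p(xb | xt).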