Bayesian Classification
CS 650: Computer Vision
Bryan S. Morse, BYU Computer Science

Statistical Basis

Training: Class-Conditional Probabilities
- Suppose that we measure features for a large training set taken from class ω_i.
- Each of these training patterns has a different value x for the features. This can be written as the class-conditional probability p(x|ω_i).
In other words: how often do things in class ω_i exhibit features x?

Classification
When we classify, we measure the feature vector x and then ask: "Given that this pattern has features x, what is the probability that it belongs to class ω_i?" Mathematically, this is written as P(ω_i|x).

Why We Care About Conditional Probabilities
- Training gives us p(x|ω_i).
- But we want P(ω_i|x).
These are not the same! How are they related?

Bayes' Theorem (Revisited)
Generally:
    P(A|B) = \frac{P(B|A)\, P(A)}{P(B)}
For our purposes:
    P(\omega_i|x) = \frac{p(x|\omega_i)\, P(\omega_i)}{p(x)}

Definitions
- p(x|ω_i): class-conditional probability, or likelihood
- P(ω_i): a priori or prior probability
- p(x): evidence (usually ignored)
- P(ω_i|x): measurement-conditioned or posterior probability

The Bayesian Classifier

Structure of a Bayesian Classifier
Training: Measure p(x|ω_i) for each class.
Prior knowledge: Measure or estimate P(ω_i) in the general population. (Can sometimes aggregate the training set if it is a reasonable sampling of the population.)
Classification:
1. Measure the features x for the new pattern.
2. Calculate the posterior probability P(ω_i|x) for each class.
3. Choose the class with the largest posterior P(ω_i|x).

Example
Normally distributed class-conditional probabilities:
    p(x|\omega_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\frac{1}{2}(x-\mu_i)^2/\sigma_i^2}

From Probabilities to Discriminants: 1-D Case
We want to maximize
    P(\omega_i|x) = \frac{p(x|\omega_i)\, P(\omega_i)}{p(x)}
which is the same as maximizing p(x|ω_i) P(ω_i). For a normal distribution this is
    \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\frac{1}{2}(x-\mu_i)^2/\sigma_i^2}\, P(\omega_i)
Applying the logarithm:
    \log\frac{1}{\sqrt{2\pi}} - \log\sigma_i - \frac{1}{2}(x-\mu_i)^2/\sigma_i^2 + \log P(\omega_i)
Dropping the constant:
    \log P(\omega_i) - \log\sigma_i - \frac{1}{2}(x-\mu_i)^2/\sigma_i^2

Extending to Multiple Features
- Note that the key term for a 1-D normal distribution is (x-µ_i)²/σ_i², the squared distance from the mean measured in standard deviations.
- We can extend this to multiple features by simply normalizing each feature's "distance" by that feature's standard deviation and then using minimum-distance classification (remembering to use the priors as well); see the sketch below.
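To make the per-feature normalization concrete, here is a minimal NumPy sketch of this variance-normalized (diagonal-covariance) classifier. It is not from the original slides; the function names (fit_diag, classify_diag) and the dictionary-based data layout are illustrative assumptions.

```python
import numpy as np

def fit_diag(X_by_class):
    """Per-class mean and per-feature standard deviation.
    X_by_class maps a class label to an (n_samples, n_features) array."""
    return {c: (X.mean(axis=0), X.std(axis=0, ddof=1))
            for c, X in X_by_class.items()}

def diag_discriminant(x, mu, sigma, log_prior):
    # log P(w_i) - sum_j log sigma_ij - (1/2) sum_j ((x_j - mu_ij) / sigma_ij)^2
    return log_prior - np.sum(np.log(sigma)) - 0.5 * np.sum(((x - mu) / sigma) ** 2)

def classify_diag(x, params, priors):
    """Pick the class with the largest log-domain posterior."""
    return max(params, key=lambda c: diag_discriminant(x, *params[c], np.log(priors[c])))
```

With equal priors and equal per-feature variances across classes, this reduces to choosing the nearest class mean, i.e., plain minimum-distance classification.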
Extending to Multiple Features (continued)
- Some call normalizing each feature by its variance naive Bayes.
- So what's naive about it? It ignores relationships between the features.

Multivariate Normal Distributions

The Multivariate Normal Distribution
In multiple dimensions, the normal distribution takes the following form:
    p(x) = \frac{1}{(\sqrt{2\pi})^{d}\, |C|^{1/2}}\, e^{-\frac{1}{2}(x-m)^T C^{-1} (x-m)}
         = (2\pi)^{-d/2}\, |C|^{-1/2}\, e^{-\frac{1}{2}(x-m)^T C^{-1} (x-m)}
[See examples in Mathematica]

Multivariate Normal Bayesian Classification
For multiple classes, each class ω_i has its own
- mean vector m_i
- covariance matrix C_i
The class-conditional probabilities are
    p(x|\omega_i) = (2\pi)^{-d/2}\, |C_i|^{-1/2}\, e^{-\frac{1}{2}(x-m_i)^T C_i^{-1} (x-m_i)}

From Probabilities to Discriminants
We want to maximize
    P(\omega_i|x) = \frac{p(x|\omega_i)\, P(\omega_i)}{p(x)}
so maximize p(x|ω_i) P(ω_i), or equivalently log p(x|ω_i) + log P(ω_i). For a normal distribution this is
    -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|C_i| - \frac{1}{2}(x-m_i)^T C_i^{-1}(x-m_i) + \log P(\omega_i)
Dropping the constant term, maximize
    \log P(\omega_i) - \frac{1}{2}\log|C_i| - \frac{1}{2}(x-m_i)^T C_i^{-1}(x-m_i)

Mahalanobis Distance
- The expression (x-m_i)^T C^{-1}(x-m_i) can be thought of as \|x-m_i\|^2_{C^{-1}}.
- This looks like a squared distance, but the inverse covariance matrix C^{-1} acts like a metric (stretching factor) on the space.
- This is the Mahalanobis distance.
- Pattern recognition using multivariate normal distributions is simply a minimum (Mahalanobis) distance classifier.

Different Cases

Case 1: Identity Matrix
Suppose that the covariance matrix for all classes is the identity matrix (C_i = I, or more generally C_i = σ²I). The discriminant becomes
    g_i(x) = -\frac{1}{2}(x-m_i)^T(x-m_i) + \log P(\omega_i)
Assuming all classes ω_i are a priori equally likely,
    g_i(x) = -\frac{1}{2}(x-m_i)^T(x-m_i)
Ignoring the constant 1/2, we can use
    g_i(x) = -(x-m_i)^T(x-m_i)

Example: Equal Priors [figure]
Examples: Different Priors [figure]

Case 2: Same Covariance Matrix
- If each class has the same covariance matrix C,
    g_i(x) = -\frac{1}{2}(x-m_i)^T C^{-1}(x-m_i) + \log P(\omega_i)
- Loci of constant probability are hyperellipses oriented with the eigenvectors of C:
    eigenvectors: directions of the ellipse axes
    eigenvalues: variance (squared axis length) in the axis directions
- The decision boundaries are still hyperplanes, because the quadratic term x^T C^{-1} x is the same for every class and cancels when comparing discriminants. However, they may no longer be normal to the lines between the respective class means.

Examples [figure]

Case 3: Different Covariances for Each Class
- Suppose that each class has its own arbitrary covariance matrix (the most general case): C_i ≠ C_j.
- Loci of constant probability for each class are hyperellipses oriented with the eigenvectors of C_i for that class.
- Decision boundaries are quadratic: specifically, hyperellipses or hyperhyperboloids.
[See examples in Mathematica]

Examples: 2-D [figure]
Examples: 3-D [figure]
Example: Multiple Classes [figure]
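As a closing sketch, the general (Case 3) discriminant above translates almost line for line into NumPy. This is an illustrative implementation, not code from the course; names such as fit_mvn are my own, and solving against C_i with np.linalg.solve (rather than explicitly inverting it) is a numerical-stability choice.

```python
import numpy as np

def fit_mvn(X_by_class):
    """Per-class mean vector m_i and covariance matrix C_i from training data."""
    return {c: (X.mean(axis=0), np.cov(X, rowvar=False))
            for c, X in X_by_class.items()}

def mvn_discriminant(x, m, C, log_prior):
    """g_i(x) = log P(w_i) - (1/2) log|C_i| - (1/2) (x-m_i)^T C_i^{-1} (x-m_i)."""
    d = x - m
    _, logdet = np.linalg.slogdet(C)
    maha_sq = d @ np.linalg.solve(C, d)   # squared Mahalanobis distance
    return log_prior - 0.5 * logdet - 0.5 * maha_sq

def classify_mvn(x, params, priors):
    """Pick the class with the largest log-domain posterior."""
    return max(params, key=lambda c: mvn_discriminant(x, *params[c], np.log(priors[c])))
```

Cases 1 and 2 fall out as special cases: when every class shares the same covariance, the log|C_i| term is constant across classes and the classifier reduces to minimum Mahalanobis distance plus the priors.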
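Continuing the sketch above (and reusing its fit_mvn and classify_mvn definitions), a quick check on synthetic 2-D data, with all numbers invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two classes with different means and covariances (Case 3).
X_by_class = {
    "a": rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=200),
    "b": rng.multivariate_normal([3, 3], [[2.0, 0.8], [0.8, 0.5]], size=200),
}
params = fit_mvn(X_by_class)
priors = {"a": 0.5, "b": 0.5}
print(classify_mvn(np.array([2.5, 2.0]), params, priors))  # expected: "b"
```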