Ch 1. Introduction (Latter)
Pattern Recognition and Machine Learning, C. M. Bishop, 2006.
Summarized by J.W. Ha
Biointelligence Laboratory, Seoul National University
http://bi.snu.ac.kr/

Contents
1.4 The Curse of Dimensionality
1.5 Decision Theory
1.6 Information Theory

1.4 The Curse of Dimensionality
The High-Dimensionality Problem
Ex. Mixture of oil, water, and gas
- 3 classes (homogeneous, annular, laminar)
- 12 input variables
- Scatter plot of the input variables x6 and x7
- Task: predict the class of a new point x
- Simple, naïve approach: divide the input space into cells and assign the new point the majority class of its cell

The Shortcomings of the Naïve Approach
- The number of cells increases exponentially with the dimensionality.
- A very large training set is needed so that the cells are not empty.

Polynomial Curve Fitting (Order M)
- As the input dimensionality D increases, the number of coefficients grows proportionally to D^M.
The Volume of a High-Dimensional Sphere
- Concentrated in a thin shell near the surface.
Gaussian Distribution
- In high dimensions, the probability mass of a Gaussian likewise concentrates in a thin shell at a particular radius.

1.5 Decision Theory
Making Optimal Decisions
- Inference step & decision step
- Select the class with the higher posterior probability.
Minimizing the Misclassification Rate
- Objective: minimize p(mistake) = p(x ∈ R1, C2) + p(x ∈ R2, C1), i.e. the colored area around the decision boundary.

Minimizing the Expected Loss
- The damage caused by a misclassification differs from class to class.
- Introduce a loss function (cost function).
- Objective: minimize the expected loss.
The Reject Option
- Threshold θ
- Reject if the largest posterior probability is below θ (see the decision-step sketch at the end of this section).

Inference and Decision
- Three distinct approaches:
1. Generative models (obtain the posterior via class-conditional densities)
- Model the data distribution by calculating p(x|Ck) for each class.
- Obtain p(Ck) and p(x), then get the posterior p(Ck|x) by Bayes' rule.
- Can generate synthetic data points.
- Heavy computational overhead.
2. Discriminative models (model the posterior directly)
- Obtain the posterior p(Ck|x) directly.
- Classify new input data from the posterior.
- Appropriate when only classification is needed.
3. Discriminant functions
- Map the input x directly onto a class label.

Why compute the posterior?
1. Minimizing risk
- When the loss matrix changes frequently, only the decision step has to be redone.
2. Reject option
3. Compensating for class priors
- Useful when the class probabilities differ greatly (imbalanced classes).
- The posterior is proportional to the prior.
4. Combining models
- Split the problem into subproblems, obtain a posterior for each, and combine them.

Loss Functions for Regression
- Squared loss: E[L] = ∫∫ {y(x) − t}² p(x, t) dx dt; the minimizing solution is the conditional mean y(x) = E[t|x].
- Extends directly to a vector of multiple target variables.
Minkowski Loss
- Generalization of the squared loss: E[Lq] = ∫∫ |y(x) − t|^q p(x, t) dx dt.
- q = 2 gives the conditional mean, q = 1 the conditional median, and q → 0 the conditional mode.
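As a quick numerical check of the Minkowski-loss remark above (not from the slides), the sketch below searches for the constant prediction y that minimizes the empirical loss mean(|y − t|^q) on a skewed toy sample; the sample, grid resolution, and the helper name best_constant are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.exponential(scale=2.0, size=5_000)   # a skewed toy target sample

def best_constant(t, q, n_grid=2_000):
    """Constant prediction y minimizing the empirical Minkowski loss mean(|y - t|**q)."""
    grid = np.linspace(t.min(), t.max(), n_grid)
    losses = [np.mean(np.abs(y - t) ** q) for y in grid]
    return grid[int(np.argmin(losses))]

print(best_constant(t, q=2), t.mean())      # q = 2: the minimizer is (close to) the sample mean
print(best_constant(t, q=1), np.median(t))  # q = 1: the minimizer is (close to) the sample median
```

With q = 2 the minimizer tracks the sample mean and with q = 1 the sample median, mirroring the conditional-mean/median statement above.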
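Putting the classification part of Section 1.5 together, here is a minimal sketch of the decision step, assuming the posterior probabilities p(Ck|x) have already been obtained in the inference step. The loss-matrix values, class interpretation, and reject threshold are hypothetical, not taken from the slides or the book.

```python
import numpy as np

# Hypothetical loss matrix L[j, k]: cost of deciding class k when the true class is j
# (say class 0 = "disease", class 1 = "healthy"; the numbers are illustrative only).
loss_matrix = np.array([[0.0, 100.0],   # missing a true disease case is very costly
                        [1.0,   0.0]])  # a false alarm costs little

def decide(posterior, loss_matrix, reject_threshold=None):
    """Pick the class k minimizing the expected loss sum_j L[j, k] * p(C_j | x).

    With reject_threshold set, refuse to decide whenever the largest posterior
    probability falls below it (the reject option with threshold theta).
    """
    posterior = np.asarray(posterior, dtype=float)
    if reject_threshold is not None and posterior.max() < reject_threshold:
        return "reject"
    expected_loss = loss_matrix.T @ posterior   # one expected loss per candidate decision
    return int(np.argmin(expected_loss))

# Usage: the asymmetric loss overrides a plain argmax of the posterior,
# and an ambiguous posterior triggers the reject option.
print(decide([0.3, 0.7], loss_matrix))                          # -> 0, not argmax = 1
print(decide([0.55, 0.45], loss_matrix, reject_threshold=0.8))  # -> "reject"
```

With a symmetric 0/1 loss matrix this reduces to picking the class with the largest posterior, i.e. the misclassification-rate criterion above.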
1.6 Information Theory
Entropy
- Low-probability events correspond to high information content: h(x) = -log2 p(x).
- Entropy is the expectation of the information content: H[x] = -Σ_x p(x) log2 p(x).
- Higher entropy means larger uncertainty.

Maximum Entropy Configuration for a Continuous Variable
- Differential entropy: H[x] = -∫ p(x) ln p(x) dx
- Constraints: ∫ p(x) dx = 1, ∫ x p(x) dx = μ, ∫ (x - μ)² p(x) dx = σ²
- Adopt Lagrange multipliers and maximize
  -∫ p(x) ln p(x) dx + λ1 (∫ p(x) dx - 1) + λ2 (∫ x p(x) dx - μ) + λ3 (∫ (x - μ)² p(x) dx - σ²)
- The solution is p(x) = (1 / √(2πσ²)) exp(-(x - μ)² / (2σ²))
- The distribution that maximizes the differential entropy is the Gaussian.
Conditional Entropy: H[x, y] = H[y|x] + H[x]

Relative Entropy (Kullback-Leibler Divergence)
- Approximate an unknown distribution p(x) with an approximating distribution q(x):
  KL(p||q) = -∫ p(x) ln{q(x) / p(x)} dx
- By the convexity of -ln and Jensen's inequality, KL(p||q) ≥ 0, with equality if and only if p(x) = q(x).

Mutual Information
- The relative entropy between the joint distribution and the product of the marginals:
  I[x, y] = KL(p(x, y) || p(x) p(y))
- I[x, y] = H[x] - H[x|y] = H[y] - H[y|x]
- If x and y are independent, I[x, y] = 0.
- The reduction in the uncertainty about x by virtue of being told the value of y.
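To make the quantities in Section 1.6 concrete, the sketch below computes entropy, KL divergence, and mutual information for an arbitrary toy joint distribution (base-2 logarithms, so values are in bits) and checks numerically that I[x, y] = H[x] - H[x|y]. The table p_xy and the function names are illustrative only, not from the slides.

```python
import numpy as np

def entropy(p):
    """H[p] = -sum_i p_i log2 p_i, ignoring zero-probability events."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log2(p_i / q_i): how poorly q approximates p."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# A toy joint distribution p(x, y) over 2 x 3 states (values are illustrative only).
p_xy = np.array([[0.10, 0.30, 0.20],
                 [0.25, 0.05, 0.10]])
p_x = p_xy.sum(axis=1)          # marginal p(x)
p_y = p_xy.sum(axis=0)          # marginal p(y)

# Mutual information as the KL divergence between the joint distribution
# and the product of its marginals: I[x, y] = KL(p(x, y) || p(x) p(y)).
mi = kl_divergence(p_xy.ravel(), np.outer(p_x, p_y).ravel())

# Equivalent decomposition: I[x, y] = H[x] - H[x|y], with H[x|y] = H[x, y] - H[y].
h_x, h_y, h_xy = entropy(p_x), entropy(p_y), entropy(p_xy.ravel())
print(mi, h_x - (h_xy - h_y))   # the two values agree (up to rounding)
```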