From Last Meeting

Studied approximation of Corpora Callosa:
- by Fourier (raw and centered)
- by PCA

Fisher Linear Discrimination

Recall the toy problem: Show HDLSS/HDLSSod1Raw.ps
Want to find a "separating direction vector".
Recall PCA didn't work: Show HDLSS/HDLSSod1PCA.ps
Also the "difference between means" doesn't work: Show HDLSS/HDLSSod1Mdif.ps

Fisher Linear Discrimination

A view of Fisher Linear Discrimination:
Adjust the data to "make the covariance structure right".
Show HDLSS/HDLSSod1FLD.ps

Mathematical notation (vectors with dimension $d$):
Class 1: $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$    Class 2: $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$
Class centerpoints:
$\bar{X}^{(1)} = \frac{1}{n_1}\sum_{i=1}^{n_1} X_i^{(1)}$  and  $\bar{X}^{(2)} = \frac{1}{n_2}\sum_{i=1}^{n_2} X_i^{(2)}$

Fisher Linear Discrimination (cont.)

Covariances: $\hat{\Sigma}^{(j)} = \tilde{X}^{(j)} \tilde{X}^{(j)t}$, for $j = 1, 2$ (outer products),
based on the "normalized, centered data matrices":
$\tilde{X}^{(j)} = \frac{1}{\sqrt{n_j}}\left( X_1^{(j)} - \bar{X}^{(j)}, \ \ldots, \ X_{n_j}^{(j)} - \bar{X}^{(j)} \right)$
Note: use the "MLE" version of normalization, for simpler notation.
Terminology (useful later): the $\hat{\Sigma}^{(j)}$ are the "within class covariances".

Fisher Linear Discrimination (cont.)

Major assumption: class covariances are the same (or "similar").
Good estimate of the "common within class covariance"?
Show HDLSS/HDLSSod1FLD.ps
Pooled (weighted average) within class covariance:
$\hat{\Sigma}_w = \frac{n_1 \hat{\Sigma}^{(1)} + n_2 \hat{\Sigma}^{(2)}}{n_1 + n_2} = \tilde{X}\tilde{X}^t$
for the "full data matrix":
$\tilde{X} = \frac{1}{\sqrt{n}}\left( \sqrt{n_1}\,\tilde{X}^{(1)}, \ \sqrt{n_2}\,\tilde{X}^{(2)} \right)$

Fisher Linear Discrimination (cont.)

Note: $\hat{\Sigma}_w$ is similar to $\hat{\Sigma}$ from before,
- i.e. the "covariance matrix ignoring class labels";
- the important difference is the "class by class centering".
Again show HDLSS/HDLSSod1FLD.ps

Fisher Linear Discrimination (cont.)

Simple way to find the "correct covariance adjustment":
individually transform the subpopulations so they are "spherical" about their means,
$Y_i^{(j)} = \hat{\Sigma}_w^{-1/2} X_i^{(j)}$
Show HDLSS\HDLSSod1egFLD.ps
Then the "best separating hyperplane" is the "perpendicular bisector of the line between the means".

Fisher Linear Discrimination (cont.)

So in the transformed space, the separating hyperplane has:
Transformed normal vector:
$n_{TFLD} = \hat{\Sigma}_w^{-1/2}\bar{X}^{(1)} - \hat{\Sigma}_w^{-1/2}\bar{X}^{(2)} = \hat{\Sigma}_w^{-1/2}\left(\bar{X}^{(1)} - \bar{X}^{(2)}\right)$
Transformed intercept:
$\mu_{TFLD} = \frac{1}{2}\hat{\Sigma}_w^{-1/2}\bar{X}^{(1)} + \frac{1}{2}\hat{\Sigma}_w^{-1/2}\bar{X}^{(2)} = \hat{\Sigma}_w^{-1/2}\,\frac{1}{2}\left(\bar{X}^{(1)} + \bar{X}^{(2)}\right)$
Equation: $\left\{ y : \langle y, n_{TFLD}\rangle = \langle \mu_{TFLD}, n_{TFLD}\rangle \right\}$
Again show HDLSS\HDLSSod1egFLD.ps

Fisher Linear Discrimination (cont.)

Thus the discrimination rule is:
Given a new data vector $X_0$, choose Class 1 when
$\langle \hat{\Sigma}_w^{-1/2} X_0, \ n_{TFLD}\rangle \ \ge \ \langle \mu_{TFLD}, \ n_{TFLD}\rangle$,
i.e. (transforming back to the original space)
$\langle X_0, \ \hat{\Sigma}_w^{-1/2} n_{TFLD}\rangle \ \ge \ \langle \hat{\Sigma}_w^{1/2}\mu_{TFLD}, \ \hat{\Sigma}_w^{-1/2} n_{TFLD}\rangle$,
i.e. $\langle X_0, \ n_{FLD}\rangle \ \ge \ \langle \mu_{FLD}, \ n_{FLD}\rangle$
where:
$n_{FLD} = \hat{\Sigma}_w^{-1/2}\, n_{TFLD} = \hat{\Sigma}_w^{-1}\left(\bar{X}^{(1)} - \bar{X}^{(2)}\right)$
$\mu_{FLD} = \hat{\Sigma}_w^{1/2}\, \mu_{TFLD} = \frac{1}{2}\bar{X}^{(1)} + \frac{1}{2}\bar{X}^{(2)}$

Fisher Linear Discrimination (cont.)

Thus (in the original space) we have a separating hyperplane with:
Normal vector: $n_{FLD}$
Intercept: $\mu_{FLD}$
Again show HDLSS\HDLSSod1egFLD.ps
Next time: relate to Mahalanobis distance.
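As a concrete illustration of the rule just derived, here is a minimal numpy sketch, assuming two data matrices X1 and X2 with one observation per row; all names are illustrative, not from the slides. It uses the MLE normalization above, and a pseudoinverse in place of $\hat{\Sigma}_w^{-1}$ because $\hat{\Sigma}_w$ is singular in HDLSS settings; that regularization choice is an assumption made here, not part of the slides.

```python
import numpy as np

def fld_rule(X1, X2):
    """Sketch of the FLD normal vector and intercept.

    X1 : (n1, d) array of Class 1 observations (rows).
    X2 : (n2, d) array of Class 2 observations (rows).
    Returns (n_fld, mu_fld) so that a new x0 is assigned to
    Class 1 when  x0 @ n_fld >= mu_fld @ n_fld.
    """
    n1, n2 = len(X1), len(X2)
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)

    # Within-class covariances, MLE normalization (divide by n_j).
    S1 = (X1 - xbar1).T @ (X1 - xbar1) / n1
    S2 = (X2 - xbar2).T @ (X2 - xbar2) / n2

    # Pooled (weighted average) within-class covariance.
    Sw = (n1 * S1 + n2 * S2) / (n1 + n2)

    # Normal vector and intercept of the separating hyperplane.
    # pinv is used because Sigma_w is singular for HDLSS data
    # (this regularization choice is not part of the slides).
    n_fld = np.linalg.pinv(Sw) @ (xbar1 - xbar2)
    mu_fld = 0.5 * (xbar1 + xbar2)
    return n_fld, mu_fld

def classify(x0, n_fld, mu_fld):
    """Return 1 for Class 1, 2 for Class 2."""
    return 1 if x0 @ n_fld >= mu_fld @ n_fld else 2
```

With these in hand, a new vector $X_0$ is assigned to Class 1 exactly when $\langle X_0, n_{FLD}\rangle \ge \langle \mu_{FLD}, n_{FLD}\rangle$, the inner-product rule derived above.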
FLD Likelihood View

Assume: the class distributions are multivariate normal, $N\!\left(\mu^{(j)}, \Sigma_w\right)$
(a strong distributional assumption, plus a common covariance).
At a location $x_0$, the likelihood ratio for choosing between Class 1 and Class 2 is:
$LR\!\left(x_0, \mu^{(1)}, \mu^{(2)}, \Sigma_w\right) = \phi_{\Sigma_w}\!\left(x_0 - \mu^{(1)}\right) \big/ \phi_{\Sigma_w}\!\left(x_0 - \mu^{(2)}\right)$
where $\phi_{\Sigma_w}$ is the Gaussian density with covariance $\Sigma_w$.

FLD Likelihood View (cont.)

Simplifying, using the form of the Gaussian density
$\phi_{\Sigma_w}(x) = \frac{1}{(2\pi)^{d/2}\left|\Sigma_w\right|^{1/2}}\, e^{-x^t \Sigma_w^{-1} x / 2}$
gives (critically using the common covariance):
$LR\!\left(x_0, \mu^{(1)}, \mu^{(2)}, \Sigma_w\right) = e^{-\left(x_0 - \mu^{(1)}\right)^t \Sigma_w^{-1}\left(x_0 - \mu^{(1)}\right)/2 \ + \ \left(x_0 - \mu^{(2)}\right)^t \Sigma_w^{-1}\left(x_0 - \mu^{(2)}\right)/2}$
so
$-2\log LR\!\left(x_0, \mu^{(1)}, \mu^{(2)}, \Sigma_w\right) = \left(x_0 - \mu^{(1)}\right)^t \Sigma_w^{-1}\left(x_0 - \mu^{(1)}\right) - \left(x_0 - \mu^{(2)}\right)^t \Sigma_w^{-1}\left(x_0 - \mu^{(2)}\right)$

FLD Likelihood View (cont.)

But
$\left(x_0 - \mu^{(j)}\right)^t \Sigma_w^{-1}\left(x_0 - \mu^{(j)}\right) = x_0^t \Sigma_w^{-1} x_0 - 2\, x_0^t \Sigma_w^{-1} \mu^{(j)} + \mu^{(j)t}\Sigma_w^{-1}\mu^{(j)}$
so
$-2\log LR\!\left(x_0, \mu^{(1)}, \mu^{(2)}, \Sigma_w\right) = -2\, x_0^t \Sigma_w^{-1}\left(\mu^{(1)} - \mu^{(2)}\right) + \mu^{(1)t}\Sigma_w^{-1}\mu^{(1)} - \mu^{(2)t}\Sigma_w^{-1}\mu^{(2)}$
Thus $LR\!\left(x_0, \mu^{(1)}, \mu^{(2)}, \Sigma_w\right) > 1$, i.e. $-2\log LR\!\left(x_0, \mu^{(1)}, \mu^{(2)}, \Sigma_w\right) < 0$, exactly when
$x_0^t\, \Sigma_w^{-1}\left(\mu^{(1)} - \mu^{(2)}\right) \ > \ \frac{1}{2}\left(\mu^{(1)} + \mu^{(2)}\right)^t \Sigma_w^{-1}\left(\mu^{(1)} - \mu^{(2)}\right)$

FLD Likelihood View (cont.)

Replacing $\mu^{(1)}$, $\mu^{(2)}$ and $\Sigma_w$ by the maximum likelihood estimates $\bar{X}^{(1)}$, $\bar{X}^{(2)}$ and $\hat{\Sigma}_w$ gives the likelihood ratio discrimination rule:
Choose Class 1 when
$x_0^t\, \hat{\Sigma}_w^{-1}\left(\bar{X}^{(1)} - \bar{X}^{(2)}\right) \ \ge \ \frac{1}{2}\left(\bar{X}^{(1)} + \bar{X}^{(2)}\right)^t \hat{\Sigma}_w^{-1}\left(\bar{X}^{(1)} - \bar{X}^{(2)}\right)$
the same rule as above.

FLD Generalization I

Gaussian Likelihood Ratio Discrimination (a.k.a. "nonlinear discriminant analysis")
Idea: assume the class distributions are $N\!\left(\mu^{(j)}, \Sigma^{(j)}\right)$, with different covariances!
The likelihood ratio rule is a straightforward numerical calculation
(thus can easily do discrimination; a numerical sketch appears at the end of this section).

FLD Generalization I (cont.)

But we no longer have a "separating hyperplane" representation
(instead "regions determined by quadratics")
(fairly complicated case-wise calculations).
Graphical display: for each point, color as:
- yellow if assigned to Class 1
- cyan if assigned to Class 2
("intensity" is "strength of assignment")
Show PolyEmbed\PEod1FLDe1.ps

FLD Generalization I (cont.)

Toy examples:
1. Standard tilted point clouds: also show PolyEmbed\PEod1GLRe1.ps
- Both FLD and LR work well.
2. Donut: show PolyEmbed\PEdonFLDe1.ps & PEdonGLRe1.ps
- FLD poor (no separating plane can work)
- LR much better

FLD Generalization I (cont.)

3. Split X: show PolyEmbed\PExd3FLDe1.ps & PExd3GLRe1.ps
- neither works well
- although good separating surfaces exist
- they are not "from Gaussian likelihoods"
- so this is not "general quadratic discrimination"

FLD Generalization II

Different prior probabilities
Main idea: give different weights to the 2 classes,
i.e. assume they are not a priori equally likely.
Development is "straightforward":
- modified likelihood
- change of intercept in FLD
Might explore with toy examples, but time is short.

FLD Generalization III

Principal Discriminant Analysis
Idea: an FLD-like approach to more than two classes.
Assumption: class covariance matrices are the same (or similar)
(but not Gaussian, as for FLD).
Main idea: quantify the "location of the classes" by their means $\mu_1, \mu_2, \ldots, \mu_k$.

FLD Generalization III (cont.)

Simple way to find "interesting directions" among the means:
PCA on the set of means, i.e. eigen-analysis of the "between class covariance matrix"
$\hat{\Sigma}_B = \tilde{M}\tilde{M}^t$
where $\tilde{M} = \frac{1}{\sqrt{k}}\left(\bar{X}^{(1)} - \bar{X}_{overall}, \ \ldots, \ \bar{X}^{(k)} - \bar{X}_{overall}\right)$.
Aside: can show that (with class-size weights) the overall covariance decomposes into between class and within class parts, $\hat{\Sigma} = \hat{\Sigma}_B + \hat{\Sigma}_w$.

FLD Generalization III (cont.)

But PCA only works like a "mean difference";
expect we can improve by "taking the covariance into account".
Again show HDLSS\HDLSSod1egFLD.ps
Blind application of the above ideas suggests an eigen-analysis of:
$\hat{\Sigma}_w^{-1}\hat{\Sigma}_B$
(a numerical sketch appears at the end of this section).

FLD Generalization III (cont.)

There are:
- smarter ways to compute this (a "generalized eigenvalue" problem)
- other representations (this solves optimization problems)
Special case: for 2 classes, it reduces to standard FLD.
Good reference for more: Section 3.8 of
Duda, R. O., Hart, P. E. and Stork, D. G. (2001) Pattern Classification, Wiley.
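As noted above, the Gaussian likelihood ratio rule of Generalization I is a straightforward numerical calculation. The following is a minimal numpy sketch under those assumptions: each class is modeled as $N\!\left(\mu^{(j)}, \Sigma^{(j)}\right)$ with its own covariance, fit by MLE, with equal priors. Function and variable names are illustrative, and the sketch assumes the estimated covariances are nonsingular, so it does not address the HDLSS case.

```python
import numpy as np

def gaussian_log_density(x, mu, Sigma):
    """Log of the multivariate normal density N(mu, Sigma) at x."""
    d = len(mu)
    diff = x - mu
    sign, logdet = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

def gaussian_lr_classify(x0, X1, X2):
    """Gaussian likelihood ratio rule with class-specific covariances.

    Fits N(mu_j, Sigma_j) to each class by MLE and assigns x0 to
    Class 1 when the likelihood ratio is at least 1 (equal priors).
    Assumes the Sigma_j are nonsingular (not the HDLSS case);
    some regularization would be needed otherwise.
    """
    mus = [X.mean(axis=0) for X in (X1, X2)]
    Sigmas = [(X - m).T @ (X - m) / len(X) for X, m in zip((X1, X2), mus)]
    log_lr = (gaussian_log_density(x0, mus[0], Sigmas[0])
              - gaussian_log_density(x0, mus[1], Sigmas[1]))
    return 1 if log_lr >= 0 else 2
```

Unlike FLD, the resulting decision boundary is a quadratic surface rather than a hyperplane, as described in Generalization I above.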
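Finally, a minimal numpy sketch of the Principal Discriminant Analysis directions of Generalization III, the eigen-analysis of $\hat{\Sigma}_w^{-1}\hat{\Sigma}_B$; all names are illustrative. A pseudoinverse stands in for $\hat{\Sigma}_w^{-1}$ (a choice made here, not in the slides), and the smarter generalized-eigenvalue formulations mentioned above are not used.

```python
import numpy as np

def principal_discriminant_directions(classes):
    """Sketch of Principal Discriminant Analysis directions.

    classes : list of (n_j, d) arrays, one per class.
    Returns the eigenvectors of pinv(Sigma_w) @ Sigma_B,
    ordered by decreasing eigenvalue; these are the
    "interesting directions" among the class means.
    """
    n = sum(len(X) for X in classes)
    means = [X.mean(axis=0) for X in classes]
    overall = np.vstack(classes).mean(axis=0)

    # Pooled within-class covariance (MLE normalization).
    Sw = sum((X - m).T @ (X - m) for X, m in zip(classes, means)) / n

    # Between-class covariance: PCA on the set of class means.
    M = np.stack([m - overall for m in means], axis=1) / np.sqrt(len(classes))
    SB = M @ M.T

    # "Blind application" of the FLD idea: eigen-analysis of Sw^{-1} SB.
    # pinv handles a singular Sw (a choice made here, not prescribed
    # by the slides); generalized-eigenvalue solvers are smarter.
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ SB)
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order].real
```

For two classes the leading direction is proportional to the FLD direction $\hat{\Sigma}_w^{-1}\left(\bar{X}^{(1)} - \bar{X}^{(2)}\right)$, consistent with the special case noted above.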