ECE 471/571 – Lecture 10
Nonparametric Density Estimation – k-Nearest Neighbor (kNN)
02/20/17
ECE471/571, Hairong Qi

Different Approaches - More Detail
Pattern classification
- Statistical approach
  - Supervised
    - Basic concepts: Bayesian decision rule (MPP, LR, discriminant function)
    - Parametric learning (ML, BL)
    - Nonparametric learning (kNN)
    - NN (Perceptron, BP)
    - Dimensionality reduction: Fisher's linear discriminant, K-L transform (PCA)
    - Performance evaluation: ROC curve; TP, TN, FN, FP
    - Stochastic methods: local optimization (GD), global optimization (SA, GA)
  - Unsupervised
    - Basic concepts: distance
    - Agglomerative method
    - k-means
    - Winner-take-all
    - Kohonen maps
- Syntactic approach

Bayes Decision Rule (Recap)
- Maximum posterior probability (MPP):
  $P(\omega_j \mid x) = \frac{p(x \mid \omega_j) P(\omega_j)}{p(x)}$
  For a given x, if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$, then x belongs to class 1; otherwise, class 2.
- Likelihood ratio: with the conditional risk
  $R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j) P(\omega_j \mid x)$,
  if $R(\alpha_1 \mid x) < R(\alpha_2 \mid x)$, then decide $\omega_1$; that is,
  $\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}$
- Discriminant function:
  $g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$
  The classifier assigns a feature vector x to class $\omega_i$ if $g_i(x) > g_j(x)$ for all $j \ne i$.
- Also covered so far:
  - Estimating a Gaussian and a two-modal Gaussian
  - The three cases
  - Dimensionality reduction
  - Performance evaluation and the ROC curve

Motivation
- Estimate the density functions without assuming that the pdf has a particular form:
  $P(\omega_j \mid x) = \frac{p(x \mid \omega_j) P(\omega_j)}{p(x)}$

*Problem with Parzen Windows
- Discontinuous window function -> Gaussian
- The choice of h
- Still another issue: the volume is fixed:
  $p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n} \varphi\!\left(\frac{x - x_i}{h_n}\right)$

kNN (k-Nearest Neighbor)
- To estimate p(x) from n samples, we can center a cell at x and let it grow until it contains $k_n$ samples, where $k_n$ can be some function of n.
- Normally, we let $k_n = \sqrt{n}$.

kNN in Classification
- $p_n(x) = \frac{k_n / n}{V_n}$
- Given c training sets from c classes, the total number of samples is $n = \sum_{m=1}^{c} n_m$.
- *Given a point x at which we wish to determine the statistics, we find the hypersphere of volume V which just encloses k points from the combined set. If, within that volume, $k_m$ of those points belong to class m, then we estimate the class-conditional density, the prior, and the mixture density by
  $p(x \mid \omega_m) = \frac{k_m}{n_m V}, \quad P(\omega_m) = \frac{n_m}{n}, \quad p(x) = \frac{k}{n V}$

kNN Classification Rule
- $P(\omega_m \mid x) = \frac{p(x \mid \omega_m)\, P(\omega_m)}{p(x)} = \frac{\frac{k_m}{n_m V} \cdot \frac{n_m}{n}}{\frac{k}{n V}} = \frac{k_m}{k}$
- The decision rule tells us to look in a neighborhood of the unknown feature vector for k samples. If, within that neighborhood, more samples lie in class i than in any other class, we assign the unknown to class i.

Problems
- What kind of distance should be used to measure "nearest"?
  - The Euclidean metric is a reasonable measurement.
- Computational burden
  - Massive storage burden
  - Need to compute the distance from the unknown to all the stored samples

Computational Complexity of kNN
- Costly in both space (storage) and time (search).
- Algorithms for reducing the computational burden:
  - Computing partial distances
  - Prestructuring
  - Editing the stored prototypes

Method 1 – Partial Distance
- Calculate the distance using some subset r of the full d dimensions. If this partial distance is already too large, we do not need to compute further:
  $D_r(a, b) = \left( \sum_{k=1}^{r} (a_k - b_k)^2 \right)^{1/2}$
- The underlying assumption: what we know about the subspace is indicative of the full space.
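As a concrete illustration of the partial-distance idea, here is a minimal sketch, not from the slides: plain NumPy, with illustrative names such as partial_distance_nn. The squared distance is accumulated one dimension at a time, and a candidate prototype is abandoned as soon as its partial sum already exceeds the best distance found so far.

```python
import numpy as np

def partial_distance_nn(x, prototypes):
    """Nearest-neighbor search with partial-distance pruning.

    Accumulates the squared distance one dimension at a time (r = 1..d)
    and stops early once the partial sum exceeds the best squared
    distance seen so far, so the remaining dimensions are skipped.
    """
    best_idx, best_sq = -1, np.inf
    for idx, p in enumerate(prototypes):
        partial = 0.0
        for k in range(x.shape[0]):          # partial distance D_r over r dimensions
            diff = x[k] - p[k]
            partial += diff * diff
            if partial >= best_sq:           # already worse than the current best
                break                        # abandon this prototype
        else:                                # loop finished: full distance is smaller
            best_idx, best_sq = idx, partial
    return best_idx, np.sqrt(best_sq)

# Example: nearest stored prototype to a query point (toy data)
rng = np.random.default_rng(0)
protos = rng.normal(size=(1000, 10))
query = rng.normal(size=10)
idx, dist = partial_distance_nn(query, protos)
```

In the worst case this still scans all n prototypes, but each scan typically stops after examining only a few of the d dimensions.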
Method 2 – Prestructuring
- Prestructure the samples in the training set into a search tree in which samples are selectively linked.
- Distances are calculated between the testing sample and a few stored "root" samples; the search then considers only the samples linked to the closest root.
- The closest distance is no longer guaranteed.
- (figure: scatter of training samples from two classes, with a few "root" samples marked X)

Method 3 – Editing/Pruning/Condensing
- Eliminate "useless" samples during training.
  - For example, eliminate samples that are surrounded by training points of the same category label.
- Leave the decision boundaries unchanged.

Discussion
- Combining the three methods
- Other concerns:
  - The choice of k
  - The choice of metric

Distance (Metrics) Used by kNN
- Properties of a metric:
  - Nonnegativity: D(a, b) >= 0
  - Reflexivity: D(a, b) = 0 iff a = b
  - Symmetry: D(a, b) = D(b, a)
  - Triangle inequality: D(a, b) + D(b, c) >= D(a, c)
- Different metrics: the Minkowski metric
  $L_k(a, b) = \left( \sum_{i=1}^{d} |a_i - b_i|^k \right)^{1/k}$
  - $L_1(a, b) = \sum_{i=1}^{d} |a_i - b_i|$ is the Manhattan (city-block) distance.
  - $L_2(a, b) = \left( \sum_{i=1}^{d} |a_i - b_i|^2 \right)^{1/2}$ is the Euclidean distance.
  - When k goes to infinity, $L_\infty$ is the maximum of the projected distances onto each of the d coordinate axes.

Visualizing the Distances
- (figure only)
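Tying the classification rule $P(\omega_m \mid x) = k_m / k$ to the metrics above, the following is a minimal sketch, not the lecture's code: NumPy is assumed, and the function names and toy data are illustrative. It implements a kNN classifier with a pluggable Minkowski order p.

```python
import numpy as np

def minkowski(a, b, p=2):
    """L_p distance between a query a (d,) and rows of b (n, d).
    p=1 gives city-block, p=2 Euclidean, p=np.inf the max projected distance."""
    if np.isinf(p):
        return np.max(np.abs(b - a), axis=1)
    return np.sum(np.abs(b - a) ** p, axis=1) ** (1.0 / p)

def knn_classify(x, X_train, y_train, k=5, p=2):
    """Return the majority class among the k nearest training samples.

    Estimates P(omega_m | x) ~ k_m / k, where k_m is the number of the
    k nearest neighbors carrying label m.
    """
    d = minkowski(x, X_train, p)
    nearest = np.argsort(d)[:k]                      # indices of the k closest samples
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    posteriors = counts / k                          # k_m / k for each class present
    return labels[np.argmax(counts)], dict(zip(labels, posteriors))

# Toy usage with two Gaussian classes
rng = np.random.default_rng(1)
X0 = rng.normal(loc=0.0, size=(50, 2))
X1 = rng.normal(loc=2.0, size=(50, 2))
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 50 + [1] * 50)
label, post = knn_classify(np.array([1.8, 2.1]), X_train, y_train, k=7, p=2)
```

Setting p=1 or p=np.inf swaps in the city-block or max-projection metric without changing the decision rule itself, which is one way to study the effect of the metric choice discussed above.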