POLYTECHNIC UNIVERSITY OF VALENCIA

Lecture 5: Distance-based methods
Pattern Recognition

Alfons Juan-Císcar ([email protected], www.dsic.upv.es/∼ajuan)
Department of Computer Systems and Computation (DSIC), Polytechnic University of Valencia
Last update: September 21, 2010

Contents
5.1 Introduction
5.2 Metric spaces and distance functions
5.3 Minimum distance classifier
5.4 Nearest neighbor classifier
5.5 k-nearest neighbor classifier

5.1 Introduction

As their name indicates, distance-based methods require a distance function to be defined so as to measure the proximity between any pair of data points.

Posterior class probabilities are locally estimated from N prototypes (labeled training samples) as:

    p̂(c | x) = kc(x) / k                                        (5.1)

where k is a predefined number of nearest neighbors to be considered and kc(x) is the number of nearest neighbors of x that are labeled with c.

Using (5.1), the Bayes classifier can be approximated as:

    c*(x) = arg max_{c=1,...,C} p(c | x) ≈ arg max_{c=1,...,C} kc(x)    (5.2)

That is, x is assigned to the most voted class among its k nearest neighbors.

Example and justification

[Figure: N = 14 prototypes of three classes (•, ◦, △) in the plane; the ball of radius 2 around a test point x contains its k = 5 nearest neighbors.]

Estimating the class-conditional densities and the priors locally from the k nearest neighbors of x,

    p̂(x | c) = kc / (Nc V)        p̂(c) = Nc / N

where V is the volume of a ball centered at x that contains the k nearest neighbors, Bayes' rule gives

    p̂(c | x) = p̂(c) p̂(x | c) / Σ_{c'} p̂(c') p̂(x | c') = kc / k

In the example: N = 14, N• = 6, N◦ = 5, N△ = 3, k = 5, V = V(Ball(x, rad = 2)) = 4π, k•(x) = 3, k◦(x) = 1, k△(x) = 1. Hence

    p̂(•) = 6/14    p̂(x | •) = 3/(6V)    p̂(• | x) = 3/5
    p̂(◦) = 5/14    p̂(x | ◦) = 1/(5V)    p̂(◦ | x) = 1/5
    p̂(△) = 3/14    p̂(x | △) = 1/(3V)    p̂(△ | x) = 1/5

and therefore c*(x) ≈ •.

5.2 Metric spaces and distance functions

A metric space is a pair (E, d) comprising a set of points E and a mapping

    d : E × E → R                                                (5.3)

called metric or distance (function), satisfying:

    d(x, x) = 0                                                  (5.4)
    d(x, y) > 0   if x ≠ y                                       (5.5)
    d(x, y) = d(y, x)                                            (5.6)
    d(x, y) + d(y, z) ≥ d(x, z)                                  (5.7)

The last property is called the triangle inequality.

[Figure: three points x, y, z with the distances d(x, y), d(y, z) and d(x, z), illustrating the triangle inequality.]

5.2.1 Vectorial metrics: the Euclidean distance or L2

The Euclidean distance or L2 in R^D is defined as:

    d2(x, y) = sqrt( Σ_d (xd − yd)² )   for all x, y ∈ R^D       (5.8)

This is the most popular distance in R^D and the one we will use by default.

In the plane, the Euclidean ball has the form of a circle.

[Figure: the Euclidean ball B2(a, r), a circle of radius r centered at a.]

5.2.2 Vectorial metrics: distances L1 and L∞

The Euclidean distance sums squared differences, and thus it might depend too much on the dimensions (features) showing the largest differences.

The L1 metric is the fairest distance since it sums absolute differences:

    d1(x, y) = Σ_d |xd − yd|   for all x, y ∈ R^D                (5.9)

On the contrary, L∞ takes into account only the maximum difference in absolute value:

    d∞(x, y) = max_d |xd − yd|   for all x, y ∈ R^D              (5.10)

The balls L1, L2 and L∞ are tightly related.

[Figure: nested unit balls of the three metrics in the plane: B1 inside B2 inside B∞.]
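As an illustration (not part of the original slides), a minimal NumPy sketch of the three vectorial metrics just defined; the function names are our own choice:

    import numpy as np

    def d1(x, y):
        # L1 distance, eq. (5.9): sum of absolute differences
        return np.sum(np.abs(x - y))

    def d2(x, y):
        # L2 (Euclidean) distance, eq. (5.8): root of summed squared differences
        return np.sqrt(np.sum((x - y) ** 2))

    def dinf(x, y):
        # L-infinity distance, eq. (5.10): largest absolute difference
        return np.max(np.abs(x - y))

    x = np.array([1.0, 2.0, 5.0])
    y = np.array([4.0, 0.0, 5.0])
    print(d1(x, y), d2(x, y), dinf(x, y))   # 5.0  3.605...  3.0

All three run in O(D) time for D-dimensional vectors; d2 is the default choice in the rest of the lecture.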
5.2.3 Non-vectorial metrics: the edit distance

The edit distance computes the minimum number of elementary edit operations (insertion, deletion and substitution of distinct symbols) required to transform one string into another.

Example: d(paTernn, pattern) = 3

    _paTernn --p→p--> p_aTernn --a→a--> pa_Ternn --λ→t--> pat_Ternn --T→t--> patt_ernn
    --e→e--> patte_rnn --r→r--> patter_nn --n→λ--> patter_n --n→n--> pattern_

The three edit operations used are one insertion (λ→t), one substitution (T→t) and one deletion (n→λ).

It can be computed in O(|x| |y|) time by Dynamic Programming.

[Figure: dynamic-programming matrix between "paTernn" and "pattern"; horizontal moves are insertions (λ→b), vertical moves are deletions (a→λ) and diagonal moves are matches or substitutions (a→b).]

Example: edit distance between chain codes

[Figure: two dynamic-programming matrices between strings over the alphabet {0, 1, 2, 3} (chain codes such as 00303033033232322222112111010101), giving edit distances d(x, y) = 6 and d(x, y) = 12.]
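The following is a minimal Python sketch of this dynamic-programming computation with unit costs for insertion, deletion and substitution; the function name and the final check are ours, not part of the lecture:

    def edit_distance(x, y):
        # D[i][j] = edit distance between the prefixes x[:i] and y[:j]
        m, n = len(x), len(y)
        D = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            D[i][0] = i                       # i deletions
        for j in range(1, n + 1):
            D[0][j] = j                       # j insertions
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if x[i - 1] == y[j - 1] else 1    # match or substitution
                D[i][j] = min(D[i - 1][j] + 1,             # deletion
                              D[i][j - 1] + 1,             # insertion
                              D[i - 1][j - 1] + cost)      # substitution / match
        return D[m][n]

    print(edit_distance("paTernn", "pattern"))   # 3, as in the worked example above

The full matrix makes the O(|x| |y|) time and space cost explicit; if memory matters, only the current and previous rows need to be kept.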
5.3 Minimum distance classifier

In the simplest case, k = 1 and each class c is represented by a single prototype pc. The approximation (5.2) results in the so-called minimum distance classifier:

    c(x) = arg min_c d(x, pc)   for all x ∈ E                    (5.11)

In the Euclidean case, it is linear in x:

    c(x) = arg min_c ‖x − pc‖
         = arg min_c (x − pc)ᵗ (x − pc)
         = arg min_c xᵗx − 2 pcᵗ x + pcᵗ pc
         = arg max_c 2 pcᵗ x − pcᵗ pc
         = arg max_c gc(x)

with gc(x) = wcᵗ x + wc0, where wc = 2 pc and wc0 = −pcᵗ pc.

Learning: each class is usually represented by its sample mean or median.

Minimum distance classifier example

[Figure: one-dimensional example with prototypes p• = 2 and p◦ = 6; the linear discriminants g•(x) = 4x − 4 and g◦(x) = 12x − 36 cross at the class boundary x = 4.]

5.4 Nearest neighbor classifier

Consider the case k = 1 with no constraints on the number of prototypes from each class. The approximation (5.2) is then called the nearest neighbor classifier:

    c(x) = arg min_c min_{p ∈ Pc} d(x, p)   for all x ∈ E        (5.12)

where Pc is the set of prototypes that represents class c, c = 1, . . . , C.

In the Euclidean case, it is a piecewise linear function of x:

    c(x) = arg min_c min_{p ∈ Pc} ‖x − p‖
         = arg min_c min_{p ∈ Pc} (x − p)ᵗ (x − p)
         = arg min_c min_{p ∈ Pc} xᵗx − 2 pᵗ x + pᵗ p
         = arg min_c min_{p ∈ Pc} −2 pᵗ x + pᵗ p
         = arg max_c gc(x)

with gc(x) = max_{p ∈ Pc} 2 pᵗ x − pᵗ p.

Learning: each class is usually represented by all its available prototypes.

Nearest neighbor classifier example

[Figure: one-dimensional example with prototypes {1, 2, 3} of class • and {5, 7} of class ◦; the class discriminants g•(x) = max(2x − 1, 4x − 4, 6x − 9) and g◦(x) = max(10x − 25, 14x − 49) are piecewise linear.]

5.4.1 Boundaries in 2D: Voronoi diagrams

[Figure: Voronoi diagram of a synthetic two-class example (classes A and B); each cell is labeled with the class of its prototype, so the NN decision boundary is piecewise linear.]

[Figure: real example with 100 6s and 100 9s reduced to 1×2 images with 64 grey levels; each digit is a point in the plane of upper brightness (x1) versus lower brightness (x2), and the NN rule separates the two classes.]

5.4.2 Asymptotic NN probability of error

Let P* be the Bayes error and let P be the NN error when N → ∞. Then:

    P* ≤ P ≤ P* (2 − (C / (C − 1)) P*) ≤ 2 P*

[Figure: P as a function of P* for C = 2, 3, 5, 10 and 100 classes, together with the bounds P* and 2P*; the horizontal axis ranges from 0 to (C − 1)/C.]

5.5 k-nearest neighbor classifier

In the general case, (5.2) is known as the k-nearest neighbor classifier:

    c(x) = arg max_{c=1,...,C} kc(x)   for all x ∈ E             (5.13)

where kc(x) can be formally described as

    kc(x) = |k(x) ∩ Pc|                                          (5.14)

with Pc the set of prototypes from class c, and k(x) a set of k nearest neighbors of x; i.e.

    k(x) = arg min_{S ⊂ P, |S| = k} max_{p ∈ S} d(x, p)          (5.15)

Tie-breaking rule: decide among the tied classes by using the NN rule.

Learning: all available prototypes are typically used.

Note: the 2-NN classifier is equivalent to the (1-)NN classifier.

k-nearest neighbor classifier example

[Figure: the one-dimensional example with prototypes {1, 2, 3} of class • and {5, 7} of class ◦; the neighbor counts |3NN(x) ∩ {1, 2, 3}| and |3NN(x) ∩ {5, 7}| are plotted as functions of x.]

k-nearest neighbor classifier example (2)

[Figure: a test point and its five nearest prototypes in the plane, at distances √1, √2, √4, √5 and √8.]

    k   dist.   class   decision
    1   √1      •       1-NN → •
    2   √2      ◦       2-NN → •   (tie broken by the NN rule)
    3   √4      ◦       3-NN → ◦
    4   √5      •       4-NN → •   (tie broken by the NN rule)
    5   √8      ◦       5-NN → ◦

Asymptotic k-NN probability of error

The asymptotic k-NN probability of error tends to the Bayes error when the following conditions are satisfied:

    N → ∞                                                        (5.17)
    k → ∞                                                        (5.18)
    k / N → 0                                                    (5.19)

The last two conditions are satisfied by using, for instance, k = √N.

Excellent result: N → ∞ ⇒ Bayes error.

Problem 1: convergence might be really slow.
Problem 2: high computational cost.
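To make the rule concrete, here is a minimal NumPy sketch (not part of the original slides) of the k-NN classifier of Section 5.5, including the tie-breaking-by-NN rule; the function and variable names are our own, and the toy data reuse the one-dimensional example with prototypes {1, 2, 3} of class • and {5, 7} of class ◦:

    import numpy as np

    def knn_classify(x, prototypes, labels, k):
        # Euclidean distances to all N prototypes, eq. (5.8)
        dists = np.sqrt(((prototypes - x) ** 2).sum(axis=1))
        order = np.argsort(dists)                 # prototypes sorted by distance
        neigh = labels[order[:k]]                 # labels of the k nearest, eq. (5.14)
        classes, votes = np.unique(neigh, return_counts=True)
        tied = classes[votes == votes.max()]      # most voted class(es), eq. (5.13)
        if len(tied) == 1:
            return tied[0]
        # tie-breaking: among the tied classes, take the one of the nearest neighbor
        for lab in labels[order]:
            if lab in tied:
                return lab

    P = np.array([[1.0], [2.0], [3.0], [5.0], [7.0]])   # prototypes (class 0 = •, class 1 = ◦)
    y = np.array([0, 0, 0, 1, 1])
    print(knn_classify(np.array([4.0]), P, y, k=3))     # 0: two of the 3 nearest are class •

This brute-force version performs O(N) distance computations per test point, which is exactly the computational cost mentioned above; in practice it can be reduced with tree-based or approximate nearest-neighbor search, and the rule k ≈ √N gives a simple default for choosing k.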