Zero-error dissimilarity-based classifiers¹

Robert P.W. Duin and Elżbieta Pękalska
Delft Pattern Recognition Group, Faculty of Applied Sciences
Lorentzweg 1, 2628 CJ Delft, The Netherlands
[email protected]

1. The original paper was written in 2003 when the first author was visiting Anil Jain at MSU in East Lansing, USA. The paper was not accepted for publication as it 'tried to prove an obvious truth'. Nevertheless, it was a significant step for the authors, as it made clear to them that there should be, in some way, a continuous relation between the real-world objects and their representation in order to have the prospect of a zero-error classifier defined by a finite training set.

Abstract

We consider general non-Euclidean distance measures between real-world objects that need to be classified. It is assumed that objects are represented only by their distances to other objects. Conditions for zero-error dissimilarity-based classifiers are derived. Additional conditions are given under which the zero-error decision boundary is a continuous function of the distances to a finite set of training samples. These conditions concern the objects as well as the distance measure used. It is argued that they can be met in practice.

1 Introduction

In pattern recognition, real-world objects such as characters or faces are traditionally represented in a feature space. Generalisation is performed by training classifiers in such a space using a set of examples, the training set. In some applications, like shape recognition, good features may be difficult to find, and dissimilarities between objects are used for representation instead [1]. Usually the nearest neighbour rule is then used as a classifier. In our previous work we showed that, instead of the nearest neighbour rule, other, traditional classifiers such as Fisher's Linear Discriminant may also be used for generalisation. To that purpose we studied the use of dissimilarity spaces as well as Euclidean embeddings for representing the dissimilarities between the objects in the training set, as well as between the training objects and the test objects to be classified. We found that such dissimilarity-based classifiers may perform better and may be evaluated faster than the nearest neighbour rule.

A problem faced when building dissimilarity-based representations on distance measures used in practice is that such measures are frequently not Euclidean embeddable and are sometimes not even metric. Nevertheless, they may perform well, and it remains of practical interest to study their properties fundamentally. An intriguing aspect of the dissimilarity-based approach to pattern recognition is that the distance measures of interest, although not Euclidean, may be given other useful properties. One of them is that they may induce a so-called Hausdorff space, in which distances are zero if and only if the objects are identical. In case the real-world objects can be unambiguously labelled, which is rather natural for real-world objects, this opens the road to so-called zero-error classifiers. Such classifiers make use of the non-overlap of the classes and define the classification function in the gap between them. For datasets that are not Euclidean embeddable, however, we do not yet have a space in which this may be done; for such datasets the problem of constructing zero-error classifiers is not yet solved. In this paper we approach it fundamentally and start by defining a topology based on a neighbourhood system derived from a given dissimilarity measure.
We will define conditions for such measures and for the class of objects under consideration, and prove two fundamental theorems based on these conditions. First, it will be proved that under very general conditions on the dissimilarity measure, including non-metric measures, a zero-error classifier exists. Second, general conditions with respect to the class of objects will be given under which such a classifier can be based on a finite set of objects. Together, these two theorems show that a zero-error classifier may be constructed in practice using a finite training set.

2 Intuition

Before giving a formal proof, we give some intuition about what we are aiming at. Consider a two-class dissimilarity-based classification problem, e.g. face recognition from 2D images. Let, for each of the classes, the images of the two persons A and B be sampled from a set for which the parameters describing lighting conditions, position, facial expression, etc., vary continuously. Consider a dissimilarity measure D(x,y) between face images x and y that is continuous in the parameters generating x and y, and let D(x,y) = 0 iff x = y. Now the infinite set of images of A, as well as the infinite set of images of B, constitutes, under some conditions on D(•), a connected set A_D, respectively B_D, defined on D(•). Each image of A corresponds one-to-one to an element of A_D; similarly for B and B_D. Very similar images are close in such sets: between any two images of the same class there exists a connected path between their representing elements. For classification problems like the one given, it can be assumed that no two images from different classes, one from A and one from B, have a dissimilarity D < δ. Consequently, the two sets A_D and B_D do not intersect and are thereby entirely separable.

3 Conditions

We will prove the above formally if the following conditions hold for the dissimilarity measure D(x,y), ∀x,y ∈ X, in which X is the entire set of objects of interest.

Condition 3.1 D(x,y) ∈ ℜ, 0 ≤ D(x,y) < ∞.

Condition 3.2 D(x,y) is continuous in the parameters that generate the objects x and y.

Condition 3.3 D(x,y) = 0 iff x = y.

Condition 3.4 D(x,y) > δ, for some fixed δ, 0 < δ < ∞, iff (x∈A ∧ y∈B) ∨ (x∈B ∧ y∈A).

Lemma 3.1 For a sufficiently small ε > 0: ∀x∈A ∃y∈A, y ≠ x, with D(x,y) < ε; similarly for x, y ∈ B. This follows directly from conditions 3.2, 3.3 and 3.4.
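As an illustration of how these conditions constrain the data, their finite-sample consequences can be checked on a labelled dissimilarity matrix. The following Python sketch is ours and not part of the paper: the function name and input format are assumptions, and condition 3.2 (continuity) cannot be tested on a finite sample at all.

import numpy as np

def check_conditions(D, labels, delta):
    # Finite-sample check of the consequences of conditions 3.1, 3.3
    # and 3.4 (illustrative sketch; names and formats are assumed).
    # D[i, j] is the dissimilarity between objects i and j; labels[i]
    # is 'A' or 'B'. Condition 3.2 (continuity) is simply assumed.
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    # Condition 3.1: all dissimilarities are finite and non-negative.
    if not (np.all(np.isfinite(D)) and np.all(D >= 0)):
        return False
    # Condition 3.3 (necessary part): D(x, x) = 0 on the diagonal.
    if not np.allclose(np.diag(D), 0.0):
        return False
    # Condition 3.4: between-class dissimilarities all exceed delta;
    # the 'iff' also requires within-class ones not to exceed it.
    between = labels[:, None] != labels[None, :]
    within = (~between) & ~np.eye(len(labels), dtype=bool)
    return bool(np.all(D[between] > delta) and np.all(D[within] <= delta))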
4 The existence of a zero-error classifier

In this section we prove that, under the given conditions, there exists a classifier that classifies all x correctly. First we define a neighbourhood basis for x, and on top of that neighbourhoods and neighbourhood systems.

Definition 4.1 The neighbourhood basis of x, NB(x), is the set of all y with D(x,y) < ε, 0 < ε < δ.

Definition 4.2 If P(X) is the power set of X, then N∈P(X) is a neighbourhood of x if NB(x) ⊆ N.

Definition 4.3 The neighbourhood system N(x) is the collection of all neighbourhoods of x.

As a result, each x in A has in all of its neighbourhoods some y in A:

Lemma 4.1 ∀x∈A ∀N∈N(x) ∃y∈N with y ≠ x and y∈A. Proof: from definition 4.1 it follows that ∃y∈A with D(x,y) < ε is equivalent to ∃y∈NB(x), which is equivalent to ∃y∈N for all N∈N(x), as NB(x) ⊆ N for all N∈N(x) (definitions 4.2 and 4.3). Substitution in lemma 3.1 shows the result.

The neighbourhood basis of any x in A contains no points of B:

Lemma 4.2 ∀x∈A ∀y∈B: y∉NB(x) and x∉NB(y). This follows from condition 3.4 on the dissimilarities between objects in A and B, and from definition 4.1 of NB(x), since ε < δ.

So every x in A has a neighbourhood that contains no points of B:

Lemma 4.3 ∀x∈A ∃N∈N(x) with N∩B = ∅, and ∀x∈B ∃N∈N(x) with N∩A = ∅. This follows from choosing N = NB(x) and lemma 4.2.

This brings us to the central theorem of the paper: the existence of a rule that correctly classifies each x∈A as class_A and each x∈B as class_B.

Theorem 4.1 The following classifier correctly classifies every x∈A and every x∈B:
1. if ∃N∈N(x) (N∩A = ∅ ∧ N∩B = ∅), reject x;
2. else if ∃N∈N(x) N∩B = ∅, classify x as class_A;
3. else if ∃N∈N(x) N∩A = ∅, classify x as class_B;
4. else reject x.

Proof: Assume x∈A. By lemma 4.1, every N∈N(x) contains some y∈A with y ≠ x, so N∩A ≠ ∅ for all N and rule 1 does not apply. By lemma 4.3 there exists an N∈N(x) with N∩B = ∅, so rule 2 applies and x is classified as class_A. Rule 4 is never reached for x∈A, as all its neighbourhoods contain points of A and rule 2 already applies. Assume x∈B: the argument is similar, using rule 3 instead of rule 2, and x is classified as class_B. q.e.d.

5 Remarks

Theorem 4.1 merely shows that an error-free classifier exists. It does not describe how such a rule may be defined based on a finite set of examples; this is the topic of the next section. To prepare for that, we have based the theorem not just on the neighbourhood basis, which would have been possible, but formulated it more generally in terms of neighbourhood systems.

Rule 1 in theorem 4.1 ensures that objects not belonging to either of the two classes (x∉A∪B) are rejected. Some x∉A∪B, however, having a sufficiently small dissimilarity to objects in A or B, may also be classified as A or B and not be rejected. This does not contradict the theorem, as the theorem only concerns points x∈A∪B. Rule 4 takes care of objects x∉A∪B that have small dissimilarities to objects in A as well as in B, and rejects them.

Further, it can be proven that for each subset S of A the closure of S, cl(S), will be classified as A (objects on the border of A have neighbourhoods that contain no points of B, and vice versa). So the classes are closed sets: they contain their boundary.
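For the finite training sets considered in the next section, the rule of theorem 4.1 can be approximated by taking N = NB(x) with respect to the training objects. The following Python sketch illustrates this; it is our own illustration, not the authors' implementation, and the function name, input format and fixed radius eps are assumptions.

import numpy as np

def classify(d_to_A, d_to_B, eps):
    # Finite-sample sketch of the rule in theorem 4.1, using the
    # neighbourhood basis NB(x) of radius eps (0 < eps < delta).
    # d_to_A, d_to_B: dissimilarities from a test object x to the
    # training objects of classes A and B, respectively.
    meets_A = np.any(np.asarray(d_to_A) < eps)  # does NB(x) meet A?
    meets_B = np.any(np.asarray(d_to_B) < eps)  # does NB(x) meet B?
    if not meets_A and not meets_B:
        return 'reject'      # rule 1: NB(x) avoids both classes
    if not meets_B:
        return 'class_A'     # rule 2: NB(x) avoids B
    if not meets_A:
        return 'class_B'     # rule 3: NB(x) avoids A
    return 'reject'          # rule 4: x is close to both classes

For a sufficiently dense training set, an object x of class A has training objects of A within ε but none of B (lemmas 4.1 and 4.2), so rule 2 fires, exactly as in the proof of theorem 4.1.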
6 The decision function

We now prove that the classification function can be expressed in the dissimilarities to a finite set of points if the dissimilarity measure is bounded (every distance is finite) and the objects can be described by a finite set of parameters. We impose two additional conditions:

Condition 6.1 The number of parameters by which an arbitrary object can be generated is finite, say n.

Condition 6.2 The values of these object parameters are bounded.

Theorem 6.1 The classifier defined in theorem 4.1 can be based on a finite set of objects.

Sketch of a proof:
1. There exists a one-to-one mapping between the sets of objects, A or B, and a subset of ℜⁿ (condition 6.1).
2. The dissimilarities of a fixed, given object x to all possible objects constitute a continuous mapping ℜⁿ → ℜ (conditions 3.2 and 6.1).
3. As the object x itself is included in the set of all objects, there is a single point in ℜⁿ that maps onto the origin (condition 3.3).
4. Because of the continuity of the mapping (condition 3.2), a finite interval in ℜⁿ around x maps into [0,ε) in ℜ and thereby corresponds to the neighbourhood basis NB(x). Call this interval an object patch.
5. Because the object parameters are bounded (condition 6.2), the entire set of possible objects can be mapped onto a finite interval in ℜⁿ. Call this interval, for all objects of class A (B), the class patch of class A (B).
6. From 4 and 5 it follows that just a finite set of objects is needed to cover the class patches with object patches (a compactness argument: the class patches are bounded by condition 6.2 and, by the remark at the end of section 5, the classes are closed, so the cover by object patches admits a finite subcover).

As a result, the classifier can be written as a continuous function of a finite set of dissimilarities:

Theorem 6.2 The classifier defined in theorem 4.1 can be written as a continuous function of the dissimilarities to a finite set of objects.

Sketch of a proof:
1. Any point can be correctly classified by the nearest neighbour rule applied to the dissimilarities to the sets of objects that cover the class patches, P_A for class A and P_B for class B (theorem 6.1, step 6).
2. The continuous decision function

if Σ_{x∈P_A} exp(−D(x,y)/s) − Σ_{x∈P_B} exp(−D(x,y)/s) > 0, then y → class_A, else y → class_B

performs the same classification. It classifies any object correctly if s is sufficiently small, i.e. if 0 < s < (δ−ε)/log(max(|P_A|,|P_B|)), as for such a value of s the term corresponding to the nearest neighbour object dominates the sums.
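The decision function of theorem 6.2 is straightforward to evaluate from the dissimilarities of a new object to the covering sets. A minimal Python sketch, with our own (assumed) function name and input format:

import numpy as np

def continuous_decision(d_to_PA, d_to_PB, s):
    # Sketch of the decision function of theorem 6.2.
    # d_to_PA, d_to_PB: dissimilarities from object y to the finite
    # covering sets P_A and P_B; s is the scale parameter, assumed
    # to satisfy 0 < s < (delta - eps) / log(max(|P_A|, |P_B|)).
    f = (np.exp(-np.asarray(d_to_PA, dtype=float) / s).sum()
         - np.exp(-np.asarray(d_to_PB, dtype=float) / s).sum())
    return 'class_A' if f > 0 else 'class_B'

For sufficiently small s the nearest covering object dominates both sums, so the rule reproduces the nearest neighbour classification of step 1 while remaining a continuous function of the dissimilarities.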
7 Discussion

The given conditions may be met in many practical applications. For instance, if objects can be identified without confusion from a video screen using a finite set of pixels, then for many shape distance measures defined on these pixels the conditions are fulfilled. Consequently, for such applications a zero-error classifier exists (theorem 4.1), can be based on a finite set of objects (theorem 6.1), and can be realised by a continuous dissimilarity-based classifier (theorem 6.2). As the conditions on the dissimilarity measure are very mild (it does not even have to be a proper metric), such measures may be studied with great freedom, using all available knowledge of the application. This largely enables the inclusion of expert knowledge in statistical pattern recognition procedures.

Acknowledgements

This research was supported by the Dutch Organization for Scientific Research (NWO).

References

[1] L. da Fontoura Costa and R. Marcondes Cesar Jr., Shape Analysis and Classification, CRC Press, Boca Raton, 2001.
[2] E. Pekalska and R.P.W. Duin, Automatic pattern recognition by similarity representations - a novel approach, Electronics Letters, vol. 37, no. 3, 2001, 159-160.
[3] E. Pekalska and R.P.W. Duin, Similarity representations allow to build good classifiers, Pattern Recognition Letters, vol. 23, no. 8, 2002, 943-956.
[4] E. Pekalska, P. Paclik, and R.P.W. Duin, A generalized kernel approach to dissimilarity based classification, Journal of Machine Learning Research, vol. 2, no. 2, 2002, 175-211.
[5] E. Pekalska, D.M.J. Tax, and R.P.W. Duin, One-class LP classifier for dissimilarity representations, Advances in Neural Information Processing Systems 15 (NIPS 2002), Vancouver, 2003.
[6] R.P.W. Duin and E. Pekalska, Possibilities of zero-error recognition by dissimilarity representations (invited), in: J.M. Inesta, L. Mico (eds.), Pattern Recognition in Information Systems (Proc. PRIS2002, Alicante, April 2002), ICEIS Press, Setubal, Portugal, 2002, 20-32.
[7] A.K. Jain and D. Zongker, Representation and recognition of handwritten digits using deformable templates, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 12, 1997, 1386-1391.
[8] M.-P. Dubuisson and A.K. Jain, Modified Hausdorff distance for object matching, Proc. 12th IAPR International Conference on Pattern Recognition (Jerusalem, October 9-13, 1994), vol. 1, IEEE, Piscataway, NJ, 1994, 566-568.
[9] A.K. Jain, R.P.W. Duin, and J. Mao, Statistical Pattern Recognition: A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, 2000, 4-37.
[10] J.S. Sanchez, F. Pla, and F.J. Ferri, Prototype selection for the nearest neighbour rule through proximity graphs, Pattern Recognition Letters, vol. 18, no. 6, 1997, 507-513.
[11] U. Lipowezky, Selection of the optimal prototype subset for 1-NN classification, Pattern Recognition Letters, vol. 19, no. 10, 1998, 907-918.