CLUSTERING OF AGGREGATES BY THEIR DISTANCES AND PROXIMITIES

S. Dvoenko
Tula State University, 300600, Russia, Tula, Lenin Ave., 92, [email protected]
This paper was supported by the EU INTAS project PRINCESS 04-77-7347 and by RFBR grant 05-01-00679.

Variants of some well-known cluster analysis algorithms are proposed that produce unbiased clusterings from a distance or proximity matrix. The proposed algorithms are compared with the algorithms of extreme grouping of features. It is shown that the method of nonhierarchical partitions makes it possible to improve the quality of hierarchical partitions.

Introduction

In the cluster analysis problem the objects $\omega_i$, $i = 1, \ldots, N$ are usually considered as vectors $x_i = (x_{i1}, \ldots, x_{in})$ in an n-dimensional Euclidean feature space represented by the data matrix $X(N, n)$. According to the idea of cluster analysis, objects are locally concentrated in K clusters (classes, taxons). Well-known clustering algorithms (K-MEANS, ISODATA [1], the FOREL family [2]) are based on the idea of unbiased partitioning [3]. According to it, each cluster $\Omega_k$, $k = 1, \ldots, K$ is represented by a "representative" object $\tilde{x}_k$, while the center of the cluster is the "mean" object $\bar{x}_k$. A clustering is unbiased if $\tilde{x}_k = \bar{x}_k$ for all clusters; otherwise it is biased. So, one needs to appoint the "mean" objects as "representatives" and then recalculate new "mean" objects based on the minimal distances of objects to the "representatives".

But in the featureless case the "mean" object $\omega(\bar{x}_k)$ is not present in the Euclidean distance matrix $D(N, N)$ as a cluster center. So, the object $\bar{\omega}_k$ closest to the others in the cluster is usually used as the cluster center. But in the general case, even if $\tilde{\omega}_k = \bar{\omega}_k$ for all clusters, the clustering appears to be biased, because the center $x(\bar{\omega}_k)$ is not the "mean" object $\bar{x}_k$ in the unknown feature space.

Based on the cosine theorem, the scalar product relative to some $\omega_k$ taken as the point of origin, for a pair $\omega_i, \omega_j$ with distance $d_{ij} = d(\omega_i, \omega_j)$, can be calculated as $c_{ij} = (d_{ki}^2 + d_{kj}^2 - d_{ij}^2)/2$, with $c_{ii} = d_{ki}^2$ for $i = j$. So, the main diagonal of each matrix $C_l(N, N)$, $l = 1, \ldots, N$ consists of the squared distances from the origin $\omega_l$ to the other objects.

In the multidimensional scaling problem the unknown feature space needs to be restored. According to the idea of multidimensional scaling, a positive semi-definite matrix $C_l(N-1, N-1)$ with rank $n < N$ can be decomposed as $C_l = X X^T$ with $X(N-1, n)$ [4]. In the method of principal projections [5] the origin of the feature space to be restored is the center of gravity (centroid) of the objects $\omega_i$, $i = 1, \ldots, N$.

1. Unbiased clustering by distances

According to the method of principal projections, it is immediately proved that the cluster center $\bar{\omega}_k$ is represented by its distances to the other objects $\omega_i$, $i = 1, \ldots, N$ without restoring the unknown feature space, where $N_k$ is the number of objects in $\Omega_k$ and $\omega_p, \omega_q \in \Omega_k$:

$d^2(\omega_i, \bar{\omega}_k) = \frac{1}{N_k} \sum_{p=1}^{N_k} d_{ip}^2 - \frac{1}{2 N_k^2} \sum_{p=1}^{N_k} \sum_{q=1}^{N_k} d_{pq}^2$,

and the cluster dispersion is

$\sigma_k^2 = \frac{1}{N_k} \sum_{i=1}^{N_k} d^2(\omega_i, \bar{\omega}_k) = \frac{1}{2 N_k^2} \sum_{p=1}^{N_k} \sum_{q=1}^{N_k} d_{pq}^2$.

The K-MEANS algorithm can be presented as:
Step 0. Take K representatives $\tilde{\omega}_k^0$, $k = 1, \ldots, K$ as the objects most distant from each other.
Step s. 1. Reallocate objects: $\omega_i \in \Omega_k^s$ if $d(\omega_i, \tilde{\omega}_k^s) \le d(\omega_i, \tilde{\omega}_j^s)$ for all $j = 1, \ldots, K$, $j \ne k$.
2. Recalculate the centers $\bar{\omega}_k^s$, $k = 1, \ldots, K$ based on the distances $d(\omega_i, \bar{\omega}_k^s)$, $i = 1, \ldots, N$.
3. Stop if $\tilde{\omega}_k^s = \bar{\omega}_k^s$, $k = 1, \ldots, K$; else set $\tilde{\omega}_k^{s+1} = \bar{\omega}_k^s$, $s = s + 1$.

The FOREL algorithm can be presented as:
Step 0. Define a threshold r and set $\Omega_0 = \Omega$.
Step k. Step 0. Take an object $\omega_i \in \Omega_{k-1}$ as the representative $\tilde{\omega}_k^0$ of the cluster $\Omega_k^0$.
Step s. 1. Reallocate objects: $\omega_i \in \Omega_k^s$ if $d(\omega_i, \tilde{\omega}_k^s) \le r$.
2. Recalculate the center $\bar{\omega}_k^s$ based on the distances $d(\omega_i, \bar{\omega}_k^s)$, $\omega_i \in \Omega_k$.
3. If $\tilde{\omega}_k^s \ne \bar{\omega}_k^s$, then set $\tilde{\omega}_k^{s+1} = \bar{\omega}_k^s$, $s = s + 1$; else set $\Omega_{k+1} = \Omega_k \setminus \Omega_k^s$.
4. Stop if $\Omega_{k+1} = \emptyset$; else set $k = k + 1$.

So, the K-MEANS and FOREL (like 1-MEAN) algorithms are immediately implemented for distances to get an unbiased clustering.
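For illustration, a minimal sketch of the distance-only K-MEANS follows (Python/NumPy is used here for concreteness; the names kmeans_distances and dist2_to_center are mine, not the paper's, and the initialization is only one possible reading of Step 0). The key point is that the squared distance of every object to the implicit cluster "mean" is computed directly from the matrix of squared pairwise distances, exactly as in the formula above.

```python
import numpy as np

def dist2_to_center(D2, members):
    """Squared distance from every object to the implicit mean of `members`,
    computed from squared pairwise distances only (Section 1 formula):
    d^2(w_i, w_bar_k) = (1/Nk) sum_p d2[i,p] - (1/(2 Nk^2)) sum_{p,q} d2[p,q]."""
    Nk = len(members)
    return D2[:, members].sum(axis=1) / Nk - D2[np.ix_(members, members)].sum() / (2.0 * Nk**2)

def kmeans_distances(D2, K, max_iter=100):
    """Unbiased K-MEANS on an N x N matrix D2 of squared pairwise distances."""
    N = D2.shape[0]
    # Step 0: take K mutually distant objects as initial representatives
    reps = [int(np.argmax(D2.sum(axis=1)))]
    while len(reps) < K:
        reps.append(int(np.argmax(D2[:, reps].min(axis=1))))
    reps = np.array(reps)
    labels = np.zeros(N, dtype=int)
    for _ in range(max_iter):
        # 1. reallocate every object to its nearest representative
        labels = np.argmin(D2[:, reps], axis=1)
        # 2. recalculate centers: in each cluster, the object nearest to the implicit mean
        new_reps = reps.copy()
        for k in range(K):
            members = np.where(labels == k)[0]
            if len(members):
                d2c = dist2_to_center(D2, members)
                new_reps[k] = members[np.argmin(d2c[members])]
        # 3. stop when representatives coincide with centers
        if np.array_equal(new_reps, reps):
            break
        reps = new_reps
    return labels, reps
```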
2. Unbiased clustering by proximities

A positive semi-definite proximity matrix $S(N, N)$ with $s_{ij} = s(\omega_i, \omega_j) \ge 0$ can be used as a matrix of scalar products in a Euclidean space of not more than N dimensions. Relative to $\omega_k$ as an origin, $s_{ij} = (d_{ki}^2 + d_{kj}^2 - d_{ij}^2)/2$, $s_{ii} = d_{ki}^2$, and the distances are defined from the proximities as $d_{ij}^2 = s_{ii} + s_{jj} - 2 s_{ij}$.

Now the cluster center $\bar{\omega}_k$ can be represented by its proximities to the other objects $\omega_i$, $i = 1, \ldots, N$, where $N_k$ is the number of objects in $\Omega_k$:

$s(\omega_i, \bar{\omega}_k) = \frac{1}{N_k} \sum_{p=1}^{N_k} s_{ip}$, $\omega_p \in \Omega_k$,

and the cluster compactness, as the average proximity of its center to the other objects in the cluster, is

$\delta_k = \frac{1}{N_k} \sum_{i=1}^{N_k} s(\omega_i, \bar{\omega}_k) = \frac{1}{N_k^2} \sum_{i=1}^{N_k} \sum_{p=1}^{N_k} s_{ip}$, $\omega_i, \omega_p \in \Omega_k$.

The K-MEANS algorithm can be presented as:
Step 0. Take K representatives $\tilde{\omega}_k^0$, $k = 1, \ldots, K$ as the objects least close to each other.
Step s. 1. Reallocate objects: $\omega_i \in \Omega_k^s$ if $s(\omega_i, \tilde{\omega}_k^s) \ge s(\omega_i, \tilde{\omega}_j^s)$ for all $j = 1, \ldots, K$, $j \ne k$.
2. Recalculate the centers $\bar{\omega}_k^s$, $k = 1, \ldots, K$ based on the proximities $s(\omega_i, \bar{\omega}_k^s)$, $i = 1, \ldots, N$.
3. Stop if $\tilde{\omega}_k^s = \bar{\omega}_k^s$, $k = 1, \ldots, K$; else set $\tilde{\omega}_k^{s+1} = \bar{\omega}_k^s$, $s = s + 1$.

The FOREL algorithm can be presented as:
Step 0. Define a threshold $\rho$ and set $\Omega_0 = \Omega$.
Step k. Step 0. Take an object $\omega_i \in \Omega_{k-1}$ as the representative $\tilde{\omega}_k^0$ of the cluster $\Omega_k^0$.
Step s. 1. Reallocate objects: $\omega_i \in \Omega_k^s$ if $s(\omega_i, \tilde{\omega}_k^s) \ge \rho$.
2. Recalculate the center $\bar{\omega}_k^s$ based on the proximities $s(\omega_i, \bar{\omega}_k^s)$, $\omega_i \in \Omega_k$.
3. If $\tilde{\omega}_k^s \ne \bar{\omega}_k^s$, then set $\tilde{\omega}_k^{s+1} = \bar{\omega}_k^s$, $s = s + 1$; else set $\Omega_{k+1} = \Omega_k \setminus \Omega_k^s$.
4. Stop if $\Omega_{k+1} = \emptyset$; else set $k = k + 1$.

Unbiased clustering minimizes the cluster dispersion $\sigma_k^2$ and maximizes the compactness $\delta_k$:

$\sigma_k^2 = \frac{1}{2 N_k^2} \sum_{i=1}^{N_k} \sum_{j=1}^{N_k} d_{ij}^2 = \frac{1}{N_k} \sum_{i=1}^{N_k} s_{ii} - \frac{1}{N_k^2} \sum_{i=1}^{N_k} \sum_{j=1}^{N_k} s_{ij}$.

For normalized proximities $s_{ij} / \sqrt{s_{ii} s_{jj}}$ we have $s_{ii} = 1$, and therefore

$\sigma_k^2 = 1 - \frac{1}{N_k^2} \sum_{i=1}^{N_k} \sum_{j=1}^{N_k} s_{ij} = 1 - \delta_k$.

3. Unbiased clustering of features

Let the set of objects be a collection of features, the columns $X_j = (x_{1j}, \ldots, x_{Nj})^T$ of the data matrix $X(N, n) = (X_1, \ldots, X_n)$. In this case similar features are interconnected, or correlated with each other, and are represented by weighted scalar products in the correlation matrix $R(n, n)$. Grouping of the feature set represented by the squares or moduli of correlations can be realized as clustering of the features by proximities or by distances (after converting). K-MEANS and FOREL give an unbiased clustering and maximize the compactness ($\omega_i, \omega_p \in \Omega_k$, with $n_k$ the number of features in $\Omega_k$):

$\delta_k = \frac{1}{n_k} \sum_{i=1}^{n_k} r^2(\omega_i, \bar{\omega}_k) = \frac{1}{n_k^2} \sum_{i=1}^{n_k} \sum_{p=1}^{n_k} r_{ip}^2$, or $\delta_k = \frac{1}{n_k} \sum_{i=1}^{n_k} |r(\omega_i, \bar{\omega}_k)| = \frac{1}{n_k^2} \sum_{i=1}^{n_k} \sum_{p=1}^{n_k} |r_{ip}|$,

and maximize the functionals ($\omega_i \in \Omega_k$):

$I_1 = \sum_{k=1}^{K} n_k \delta_k = \sum_{k=1}^{K} \sum_{i=1}^{n_k} r^2(\omega_i, \bar{\omega}_k)$, $I_2 = \sum_{k=1}^{K} n_k \delta_k = \sum_{k=1}^{K} \sum_{i=1}^{n_k} |r(\omega_i, \bar{\omega}_k)|$.
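To make the proximity-based variant of Section 2 and its use for feature grouping in Section 3 concrete, here is a minimal sketch (again NumPy; the names kmeans_proximities and prox_to_center, and the use of np.corrcoef in the usage comment, are my own illustration rather than the paper's code). The center is represented by the average proximity $s(\omega_i, \bar{\omega}_k) = \frac{1}{N_k}\sum_p s_{ip}$, and the same routine groups features once S is built from the moduli or squares of correlations.

```python
import numpy as np

def prox_to_center(S, members):
    """Proximity of every object to the implicit center of `members` (Section 2):
    s(w_i, w_bar_k) = (1/Nk) sum_p S[i, p]."""
    return S[:, members].mean(axis=1)

def kmeans_proximities(S, K, max_iter=100):
    """Unbiased K-MEANS on an N x N positive semi-definite proximity matrix S."""
    N = S.shape[0]
    # Step 0: take K mutually least close objects as initial representatives
    reps = [int(np.argmin(S.sum(axis=1)))]
    while len(reps) < K:
        reps.append(int(np.argmin(S[:, reps].max(axis=1))))
    reps = np.array(reps)
    labels = np.zeros(N, dtype=int)
    for _ in range(max_iter):
        labels = np.argmax(S[:, reps], axis=1)             # 1. reallocate to the closest representative
        new_reps = reps.copy()
        for k in range(K):                                 # 2. recalculate centers
            members = np.where(labels == k)[0]
            if len(members):
                sc = prox_to_center(S, members)
                new_reps[k] = members[np.argmax(sc[members])]
        if np.array_equal(new_reps, reps):                 # 3. stop when representatives coincide with centers
            break
        reps = new_reps
    return labels, reps

# Hypothetical usage for feature grouping (Section 3):
# R = np.corrcoef(X, rowvar=False)                  # correlation matrix of the n features
# labels_mod, _ = kmeans_proximities(np.abs(R), K)  # moduli of correlations
# labels_sq, _  = kmeans_proximities(R ** 2, K)     # squares of correlations
```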
4. Extreme grouping like clustering

The well-known algorithms of extreme grouping of features [6], SQUARE and MODULUS, maximize the functionals for the "principal" factor $\pi_k$ and the "centroid" factor $\mu_k$ ($\omega_i \in \Omega_k$):

$J_1 = \sum_{k=1}^{K} \sum_{i=1}^{n_k} r^2(\omega_i, \pi_k)$, $J_2 = \sum_{k=1}^{K} \sum_{i=1}^{n_k} |r(\omega_i, \mu_k)|$.

Let the proximity matrix $S(n, n)$ consist of the squares $s_{ij} = r_{ij}^2$ or moduli $s_{ij} = |r_{ij}|$ of the correlations between features $\omega_i$ from the matrix $R(n, n)$. It can be proved that for a group $\Omega_k$ its center $\bar{\omega}_k$, its principal factor $\pi_k$, and its centroid factor $\mu_k$ can be presented by their proximities to the other features $\omega_i$:

$s(\omega_i, \bar{\omega}_k) = \frac{1}{n_k} \sum_{j=1}^{n_k} s_{ij}$, $s(\omega_i, \pi_k) = \sum_{j=1}^{n_k} \alpha_{kj} s_{ij}$, $s(\omega_i, \mu_k) = \sum_{j=1}^{n_k} s_{ij}$,

where $\alpha_k = (\alpha_{k1}, \ldots, \alpha_{k n_k})$ is the first eigenvector of the group's proximity matrix.

As a result, unbiased grouping by MODULUS is an unbiased clustering, but unbiased grouping by SQUARE is a biased clustering, and K-MEANS applied to distances or proximities between features is the same as MODULUS applied to group them. The subtle difference known to exist between the results of extreme grouping by SQUARE and by MODULUS can be explained by the difference between unbiased and biased clustering.

5. K clusters and groups problem

It is a well-known problem to define a suitable number of clusters. The first answer is: find a suitable number K and then obtain the clustering (grouping), for example, by K-MEANS, FOREL, SQUARE, or MODULUS. The second answer is: find a suitable clustering (grouping) and thereby obtain the number K, for example, by ISODATA, FOREL, hierarchical clustering [7, 8], or nonhierarchical clustering (grouping) [9].

The nonhierarchical divisive clustering algorithm can be represented as follows (a sketch is given at the end of this section): starting from K = 1, take the least compact subset (cluster, group) $\Omega_k$. Use the pair of most different objects in it as representatives and build a partition of $\Omega_k$ into two subsets $\Omega_k$ and $\Omega_{K+1}$ by some clustering algorithm. For the subsets $\Omega_k$ and $\Omega_{K+1}$ use the two new representatives $\tilde{\omega}_k$ and $\tilde{\omega}_{K+1}$, and for the others use the K−1 previously defined representatives $\tilde{\omega}_1, \ldots, \tilde{\omega}_{k-1}, \tilde{\omega}_{k+1}, \ldots, \tilde{\omega}_K$. Build in the same way a partition of all N objects into K+1 clusters. Stop when K = N.

The result is a sequence of partitions, starting from $\Omega$ and finishing with the single-element sets $\Omega_1, \ldots, \Omega_N$. Some partitions in this sequence form subsequences of hierarchical ones. Two partitions into K and K+1 sets form a hierarchy if breaking the least compact set into the two subsets $\Omega_k$ and $\Omega_{K+1}$ immediately gives an unbiased partition into $\Omega_1, \ldots, \Omega_{K+1}$. In this case the partition into K sets is a so-called "stable" partition for the number K. Stable partitions are used as the suitable ones to obtain the result of clustering (grouping) and the number K. Biased partitions taken as the basis for the nonhierarchical algorithm disintegrate (split into parts) the hierarchical subsequences of partitions, reducing the set of suitable numbers K.

The nonhierarchical clustering (grouping) algorithm is a superstructure (an "additional storey") over well-known iterative algorithms with a specified K (K-MEANS and FOREL for a feature space, the new ones for distances and proximities, SQUARE and MODULUS for correlations, etc.). This algorithm produces a dendrogram like hierarchical algorithms do, but it points to breaks in the hierarchy and gives a better partition at such points.
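The following sketch shows one possible organization of this divisive superstructure over the distance-based K-MEANS of Section 1 (NumPy again; divisive_sequence and kmeans_from_reps are my own names, the choice of the initial representative is only an illustration, and the detection of "stable" partitions against the hierarchical subsequence is omitted).

```python
import numpy as np

def dist2_to_center(D2, members):
    """d^2(w_i, mean of `members`) from squared pairwise distances only (Section 1)."""
    Nk = len(members)
    return D2[:, members].sum(axis=1) / Nk - D2[np.ix_(members, members)].sum() / (2.0 * Nk**2)

def kmeans_from_reps(D2, reps, max_iter=100):
    """Distance-based K-MEANS of Section 1 started from the given representatives."""
    reps = np.array(reps)
    labels = np.argmin(D2[:, reps], axis=1)
    for _ in range(max_iter):
        labels = np.argmin(D2[:, reps], axis=1)
        new_reps = reps.copy()
        for k in range(len(reps)):
            members = np.where(labels == k)[0]
            if len(members):
                d2c = dist2_to_center(D2, members)
                new_reps[k] = members[np.argmin(d2c[members])]
        if np.array_equal(new_reps, reps):
            break
        reps = new_reps
    return labels, reps

def divisive_sequence(D2, K_max):
    """Nonhierarchical divisive scheme (Section 5): split the least compact cluster
    by its two most distant objects, then repartition all N objects with K+1 seeds."""
    N = D2.shape[0]
    labels = np.zeros(N, dtype=int)
    # the single initial representative: the object closest to the overall mean
    reps = np.array([int(np.argmin(dist2_to_center(D2, np.arange(N))))])
    partitions = [labels.copy()]
    for K in range(1, min(K_max, N)):
        # cluster dispersions sigma_k^2 = (1/(2 Nk^2)) sum_{p,q in k} d2[p,q]
        disp = []
        for k in range(K):
            m = np.where(labels == k)[0]
            disp.append(D2[np.ix_(m, m)].sum() / (2.0 * len(m)**2) if len(m) > 1 else 0.0)
        worst = int(np.argmax(disp))                              # least compact cluster
        members = np.where(labels == worst)[0]
        sub = D2[np.ix_(members, members)]
        i, j = np.unravel_index(int(np.argmax(sub)), sub.shape)   # its most distant pair
        new_reps = [reps[k] for k in range(K) if k != worst] + [members[i], members[j]]
        labels, reps = kmeans_from_reps(D2, new_reps)             # partition into K+1 clusters
        partitions.append(labels.copy())
    return partitions
```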
Conclusion

Unlike the multidimensional scaling and factor analysis problems, featureless clustering does not need to restore the unknown feature space. So we leave out the crucial problem of building interpretable features. The developed algorithms are suitable for featureless clustering of the results of different types of pairwise comparison (expert judgments, protein sequences, signatures, etc.). These algorithms were tested on Fisher's iris data [10], Holzinger's psychological tests [11], etc.

References
1. Tou J.T., Gonzalez R.C. Pattern Recognition Principles. Addison-Wesley, 1981.
2. Zagoruiko N.G. Applied Methods of Data and Knowledge Analysis. Novosibirsk: Institute of Mathematics, 1999 (in Russian).
3. Schlesinger M. On the spontaneous recognition of patterns // Reading Automata. Kiev, 1965. P. 38-45 (in Russian).
4. Young G., Householder A.S. Discussion of a set of points in terms of their mutual distances // Psychometrika. 1938. Vol. 3. P. 19-22.
5. Torgerson W.S. Theory and Methods of Scaling. N.Y.: J. Wiley, 1958.
6. Braverman E.M. Methods of extreme grouping of parameters and the problem of finding essential factors // Avtomatika i Telemekhanika. 1970. No. 1. P. 123-132 (in Russian).
7. Sneath P. The application of computers to taxonomy // J. of General Microbiology. 1957. Vol. 17. P. 201-226.
8. Ward J. Hierarchical grouping to optimize an objective function // J. of the American Statistical Association. 1963. Vol. 58. P. 236-244.
9. Dvoenko S.D. Restoration of spaces in data by the method of nonhierarchical decompositions // Automation and Remote Control. 2001. Vol. 62, No. 3. P. 467-473.
10. Fisher R.A. The use of multiple measurements in taxonomic problems // Ann. Eugenics. 1936. Vol. 7. P. 179-188.
11. Harman H.H. Modern Factor Analysis. Chicago: Univ. of Chicago Press, 1976.