УДК 681

CLUSTERING OF AGGREGATES BY THEIR DISTANCES AND PROXIMITIES 1
S. Dvoenko 2
2 Tula State University, 300600, Russia, Tula, Lenin Ave., 92, [email protected]
1 This paper was supported by the EU INTAS project PRINCESS 04-77-7347 and by the RFBR grant 05-01-00679.
Variants of certain well-known cluster analysis algorithms are proposed that produce unbiased clusterings from a distance or proximity matrix. The proposed algorithms are compared with the algorithms of extreme grouping of features. It is proved that the method of nonhierarchical partitions makes it possible to improve the quality of hierarchical partitions.
Introduction
In the cluster-analysis problem the objects $\omega_i \in \Omega$, $i = 1,\dots,N$, are usually considered as vectors $x_i = (x_{i1},\dots,x_{in})$ in an $n$-dimensional Euclidean feature space represented by the data matrix $X(N, n)$. According to the cluster-analysis idea, objects are locally concentrated in $K$ clusters (classes, taxons). Well-known clustering algorithms (K-MEANS, ISODATA [1], the FOREL family [2]) are based on the idea of unbiased partitioning [3]. According to it, each cluster $\Omega_k$, $k = 1,\dots,K$, is represented by a "representative" object $\tilde x_k$, and the center of a cluster is represented by the "mean" object $\bar x_k$. The clustering is unbiased if $\tilde x_k = \bar x_k$ holds for all clusters; otherwise it is biased. So it is necessary to appoint the "mean" objects as "representatives" and to recalculate the new "mean" objects based on the minimal distances of objects to the "representatives".
But in the featureless case a "mean" object $\omega(\bar x_k)$ is not present in the Euclidean distance matrix $D(N, N)$ as a cluster center. So the object $\bar\omega_k$ closest to the others in the cluster is usually used as the cluster center. But in the general case, even if $\tilde\omega_k = \bar\omega_k$ holds for all clusters, the clustering appears to be biased, because the center $x(\bar\omega_k)$ is not the "mean" object $\bar x_k$ in the unknown feature space.
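A tiny numerical illustration of this bias may be useful (the one-dimensional toy data below are ours, not the paper's): for three collinear points the object closest to the others does not coincide with the true mean, which is absent from the distance matrix altogether.

```python
import numpy as np

# three collinear points at coordinates 0, 1 and 5
x = np.array([0.0, 1.0, 5.0])
D = np.abs(x[:, None] - x[None, :])     # Euclidean distance matrix D(N, N)

# the object "closest to the others": the row with the smallest sum of distances
closest = int(np.argmin(D.sum(axis=1)))

print(x[closest])   # 1.0 -- the usual featureless "center"
print(x.mean())     # 2.0 -- the true mean object is not present in D at all
```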
Based on the cosine theorem, the scalar product, relative to some $\omega_k \in \Omega$ taken as the point of origin, for a pair $\omega_i, \omega_j$ with distance $d_{ij} = d(\omega_i, \omega_j)$ can be calculated as $c_{ij} = (d_{ki}^2 + d_{kj}^2 - d_{ij}^2)/2$, where $c_{ii} = d_{ki}^2$ for $i = j$. So the main diagonals of the matrices $C_l(N, N)$, $l = 1,\dots,N$, contain the squared distances from the origins $\omega_l \in \Omega$ to the other objects.

In the multidimensional scaling problem the unknown feature space needs to be restored. According to the multidimensional scaling idea, the positive semi-definite matrix $C_l(N-1, N-1)$ with rank $n < N$ can be decomposed as $C_l = XX^T$ with $X(N-1, n)$ [4]. In the method of principal projections [5] the origin of the feature space to be restored is the center of gravity (centroid) of the objects $\omega_i \in \Omega$, $i = 1,\dots,N$.
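For illustration, the conversion of distances into scalar products is easy to state in code; a minimal numpy sketch (the function name and the toy data are ours) is:

```python
import numpy as np

def scalar_products(D, k):
    """c_ij = (d_ki^2 + d_kj^2 - d_ij^2) / 2 relative to object k taken as the origin."""
    D2 = D ** 2
    row = D2[k]
    return (row[:, None] + row[None, :] - D2) / 2.0

# toy distance matrix for four points on a line: 0, 1, 3, 6
x = np.array([0.0, 1.0, 3.0, 6.0])
D = np.abs(x[:, None] - x[None, :])

C0 = scalar_products(D, 0)
assert np.allclose(np.diag(C0), D[0] ** 2)      # diagonal = squared distances to the origin
print(np.linalg.eigvalsh(C0))                   # non-negative: C0 = X X^T (the MDS step)
```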
1. Unbiased clustering by distances
According to the method of principal projections it is immediately proved that the cluster center $\bar\omega_k$ is represented by its distances to the other objects $\omega_i \in \Omega$, $i = 1,\dots,N$, without restoring the unknown feature space, where
$N_k$ is the number of objects in $\Omega_k$:

$$d^2(\omega_i, \bar\omega_k) = \frac{1}{N_k}\sum_{p=1}^{N_k} d_{ip}^2 - \frac{1}{2N_k^2}\sum_{p=1}^{N_k}\sum_{q=1}^{N_k} d_{pq}^2,$$

and the cluster dispersion is

$$\sigma_k^2 = \frac{1}{N_k}\sum_{i=1}^{N_k} d^2(\omega_i, \bar\omega_k) = \frac{1}{2N_k^2}\sum_{p=1}^{N_k}\sum_{q=1}^{N_k} d_{pq}^2, \qquad \omega_i, \omega_p, \omega_q \in \Omega_k.$$

The K-MEANS algorithm can be presented as:
Step 0. Get $K$ representatives $\tilde\omega_k^0$, $k = 1,\dots,K$, as the objects most distant from each other.
Step s.
1. Reallocate objects: $\omega_i \in \Omega_k^s$, $j \ne k$, if $d(\omega_i, \tilde\omega_k^s) \le d(\omega_i, \tilde\omega_j^s)$, $j = 1,\dots,K$.
2. Recalculate the centers $\bar\omega_k^s$, $k = 1,\dots,K$, based on the distances $d(\omega_i, \bar\omega_k^s)$, $i = 1,\dots,N$.
3. Stop, if $\tilde\omega_k^s = \bar\omega_k^s$, $k = 1,\dots,K$; else $\tilde\omega_k^{s+1} = \bar\omega_k^s$, $s = s + 1$.

The FOREL algorithm can be presented as:
Step 0. Define a threshold $r$ and set $\Omega^0 = \Omega$.
Step k.
Step 0. Get an object $\omega_i \in \Omega^k$ as the representative $\tilde\omega_k^0$ of the cluster $\Omega_k^0$.
Step s.
1. Reallocate objects: $\omega_i \in \Omega_k^s$, if $d(\omega_i, \tilde\omega_k^s) \le r$.
2. Recalculate the center $\bar\omega_k^s$ based on the distances $d(\omega_i, \bar\omega_k^s)$, $\omega_i \in \Omega_k$.
3. If $\tilde\omega_k^s \ne \bar\omega_k^s$, then $\tilde\omega_k^{s+1} = \bar\omega_k^s$, $s = s + 1$; else $\Omega^{k+1} = \Omega^k \setminus \Omega_k^s$.
4. Stop, if $\Omega^{k+1} = \emptyset$; else $k = k + 1$.
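A compact numpy sketch of this distance-only K-MEANS may clarify the scheme; the greedy farthest-point initialization and all names are our own choices, not prescribed by the paper:

```python
import numpy as np

def dist2_to_center(D2, members):
    """d^2(w_i, cluster center) = mean_p d_ip^2 - (1/2) mean_{p,q} d_pq^2,
    computed from the squared distance matrix only (no feature space needed)."""
    return D2[:, members].mean(axis=1) - D2[np.ix_(members, members)].mean() / 2.0

def kmeans_by_distances(D, K, n_iter=100):
    """Unbiased K-MEANS on a distance matrix D(N, N)."""
    D2, N = D ** 2, D.shape[0]
    # Step 0: K representatives, greedily chosen as mutually distant objects
    reps = [int(np.argmax(D.sum(axis=1)))]
    while len(reps) < K:
        reps.append(int(np.argmax(D[:, reps].min(axis=1))))
    labels = np.argmin(D[:, reps], axis=1)          # allocate by distance to representatives
    for _ in range(n_iter):
        # recalculate the (implicit) centers and reallocate by distances to them
        dc = np.stack([dist2_to_center(D2, np.flatnonzero(labels == k))
                       if np.any(labels == k) else np.full(N, np.inf)
                       for k in range(K)], axis=1)
        new_labels = np.argmin(dc, axis=1)
        if np.array_equal(new_labels, labels):      # representatives coincide with centers
            break
        labels = new_labels
    return labels
```

The FOREL variant differs only in that a single cluster is grown around one representative with the threshold $r$ and then removed from the set.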
2. Unbiased clustering by proximities
A positive semi-definite proximity matrix $S(N, N)$ with $s_{ij} = s(\omega_i, \omega_j) \ge 0$ can be used as a matrix of scalar products in a Euclidean space of dimensionality not more than $N$. Relative to $\omega_k \in \Omega$ taken as an origin, where $s_{ij} = (d_{ki}^2 + d_{kj}^2 - d_{ij}^2)/2$ and $s_{ii} = d_{ki}^2$, distances are defined based on the proximities as $d_{ij}^2 = s_{ii} + s_{jj} - 2 s_{ij}$. So, the K-MEANS and FOREL (like 1-MEAN) algorithms are immediately implemented for such distances to get unbiased clusterings.

But now the cluster center $\bar\omega_k$ can also be represented by its proximities to the other objects $\omega_i \in \Omega$, $i = 1,\dots,N$, where $N_k$ is the number of objects in $\Omega_k$:

$$s(\omega_i, \bar\omega_k) = \frac{1}{N_k}\sum_{p=1}^{N_k} s_{ip}, \qquad \omega_p \in \Omega_k,$$

and the cluster compactness, as the average proximity of its center to the other objects in the cluster, is

$$\delta_k = \frac{1}{N_k}\sum_{i=1}^{N_k} s(\omega_i, \bar\omega_k) = \frac{1}{N_k^2}\sum_{i=1}^{N_k}\sum_{p=1}^{N_k} s_{ip}, \qquad \omega_i, \omega_p \in \Omega_k.$$

The K-MEANS algorithm can be presented as:
Step 0. Get $K$ representatives $\tilde\omega_k^0$, $k = 1,\dots,K$, as the objects least close to each other.
Step s.
1. Reallocate objects: $\omega_i \in \Omega_k^s$, $j \ne k$, if $s(\omega_i, \tilde\omega_k^s) \ge s(\omega_i, \tilde\omega_j^s)$, $j = 1,\dots,K$.
2. Recalculate the centers $\bar\omega_k^s$, $k = 1,\dots,K$, based on the proximities $s(\omega_i, \bar\omega_k^s)$, $i = 1,\dots,N$.
3. Stop, if $\tilde\omega_k^s = \bar\omega_k^s$, $k = 1,\dots,K$; else $\tilde\omega_k^{s+1} = \bar\omega_k^s$, $s = s + 1$.

The FOREL algorithm can be presented as:
Step 0. Define a threshold $\rho$ and set $\Omega^0 = \Omega$.
Step k.
Step 0. Get an object $\omega_i \in \Omega^k$ as the representative $\tilde\omega_k^0$ of the cluster $\Omega_k^0$.
Step s.
1. Reallocate objects: $\omega_i \in \Omega_k^s$, if $s(\omega_i, \tilde\omega_k^s) \ge \rho$.
2. Recalculate the center $\bar\omega_k^s$ based on the proximities $s(\omega_i, \bar\omega_k^s)$, $\omega_i \in \Omega_k$.
3. If $\tilde\omega_k^s \ne \bar\omega_k^s$, then $\tilde\omega_k^{s+1} = \bar\omega_k^s$, $s = s + 1$; else $\Omega^{k+1} = \Omega^k \setminus \Omega_k^s$.
4. Stop, if $\Omega^{k+1} = \emptyset$; else $k = k + 1$.
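The proximity-based quantities of this section translate directly into code; a short sketch (the helper names and the toy matrix are ours) of the conversion to distances, the proximity to a cluster center, and the compactness:

```python
import numpy as np

def proximities_to_squared_distances(S):
    """d_ij^2 = s_ii + s_jj - 2 s_ij for a proximity (scalar product) matrix S."""
    d = np.diag(S)
    return d[:, None] + d[None, :] - 2.0 * S

def proximity_to_center(S, members):
    """s(w_i, cluster center) = (1/N_k) * sum_p s_ip over the cluster members."""
    return S[:, members].mean(axis=1)

def compactness(S, members):
    """delta_k = (1/N_k^2) * sum_{i,p in cluster} s_ip."""
    return S[np.ix_(members, members)].mean()

# usage: a small proximity matrix where objects 0-2 form the tight cluster
S = np.array([[1.0, 0.8, 0.7, 0.1, 0.0],
              [0.8, 1.0, 0.9, 0.2, 0.1],
              [0.7, 0.9, 1.0, 0.1, 0.2],
              [0.1, 0.2, 0.1, 1.0, 0.6],
              [0.0, 0.1, 0.2, 0.6, 1.0]])
members = np.array([0, 1, 2])
print(proximity_to_center(S, members))   # larger for objects 0-2 than for 3-4
print(compactness(S, members))           # average within-cluster proximity
```

In the proximity version of K-MEANS, step 1 then reallocates by the largest proximity to the representatives instead of the smallest distance.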
3. Unbiased clustering of features
Well-known algorithms of extreme grouping of features [6], SQUARE and MODULUS, maximize functionals for the "principal" $\pi_k$ and "centroid" $\mu_k$ factors ($\omega_i \in \Omega_k$):

$$J_1 = \sum_{k=1}^{K}\sum_{i=1}^{n_k} r^2(\omega_i, \pi_k), \qquad J_2 = \sum_{k=1}^{K}\sum_{i=1}^{n_k} |r(\omega_i, \mu_k)|.$$

Unbiased clustering minimizes the cluster dispersion $\sigma_k^2$ and maximizes the compactness $\delta_k$:

$$\sigma_k^2 = \frac{1}{2N_k^2}\sum_{i=1}^{N_k}\sum_{j=1}^{N_k} d_{ij}^2 = \frac{1}{N_k}\sum_{i=1}^{N_k} s_{ii} - \frac{1}{N_k^2}\sum_{i=1}^{N_k}\sum_{j=1}^{N_k} s_{ij}.$$

For proximities normalized as $s_{ij} \to s_{ij}/\sqrt{s_{ii} s_{jj}}$, so that $s_{ii} = 1$, therefore

$$\sigma_k^2 = 1 - \frac{1}{N_k^2}\sum_{i=1}^{N_k}\sum_{j=1}^{N_k} s_{ij} = 1 - \delta_k.$$

Let the set of objects be a collection of features, i.e. the columns $X_j = (x_{1j},\dots,x_{Nj})^T$ of the data matrix $X(N, n) = (X_1,\dots,X_n)$. In this case similar features are interconnected, or correlated with each other, and are represented by weighted scalar products in the correlation matrix $R(n, n)$. Grouping of the feature set represented by squares or moduli of correlations can therefore be realized as clustering of the features by proximities or by distances (after converting).

K-MEANS and FOREL give unbiased clusterings and maximize the compactness ($\omega_i, \omega_p \in \Omega_k$, $n_k$ is the number of features in $\Omega_k$):

$$\delta_k = \frac{1}{n_k}\sum_{i=1}^{n_k} r^2(\omega_i, \bar\omega_k) = \frac{1}{n_k^2}\sum_{i=1}^{n_k}\sum_{p=1}^{n_k} r_{ip}^2, \qquad \delta_k = \frac{1}{n_k}\sum_{i=1}^{n_k} |r(\omega_i, \bar\omega_k)| = \frac{1}{n_k^2}\sum_{i=1}^{n_k}\sum_{p=1}^{n_k} |r_{ip}|,$$

and maximize the functionals ($\omega_i \in \Omega_k$):

$$I_1 = \sum_{k=1}^{K} n_k \delta_k = \sum_{k=1}^{K}\sum_{i=1}^{n_k} r^2(\omega_i, \bar\omega_k), \qquad I_2 = \sum_{k=1}^{K} n_k \delta_k = \sum_{k=1}^{K}\sum_{i=1}^{n_k} |r(\omega_i, \bar\omega_k)|.$$
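As a sketch of this construction (the function names and the choice of numpy are ours), the proximity matrix over features and the functionals $I_1$, $I_2$ can be computed as:

```python
import numpy as np

def feature_proximities(X, mode="square"):
    """Proximities between features: squares or moduli of their correlations."""
    R = np.corrcoef(X, rowvar=False)                 # correlation matrix R(n, n)
    return R ** 2 if mode == "square" else np.abs(R)

def grouping_functional(S, labels):
    """I = sum_k n_k * delta_k, with delta_k the average within-group proximity."""
    total = 0.0
    for k in np.unique(labels):
        m = np.flatnonzero(labels == k)
        total += len(m) * S[np.ix_(m, m)].mean()
    return total

# with mode="square" this is I_1, with mode="modulus" it is I_2
```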
4. Extreme grouping like clustering
Let the proximity matrix $S(n, n)$ consist of squares $s_{ij} = r_{ij}^2$ or moduli $s_{ij} = |r_{ij}|$ of the correlations between the features $\omega_i \in \Omega$ from the matrix $R(n, n)$. It can be proved that for a group $\Omega_k$ its center $\bar\omega_k$, principal $\pi_k$, and centroid $\mu_k$ factors can be represented by their proximities to the other features $\omega_i \in \Omega$:

$$s(\omega_i, \bar\omega_k) = \frac{1}{n_k}\sum_{j=1}^{n_k} s_{ij}, \qquad s(\omega_i, \pi_k) = \sum_{j=1}^{n_k} \alpha_{jk} s_{ij} = \lambda_k \alpha_{ik}, \qquad s(\omega_i, \mu_k) = \sum_{j=1}^{n_k} s_{ij},$$

where $\alpha_k = (\alpha_{1k},\dots,\alpha_{n_k k})$ is the 1st eigenvector, $\lambda_k$ being the corresponding eigenvalue.

As a result, unbiased grouping by MODULUS is unbiased clustering, but unbiased grouping by SQUARE is biased clustering, and K-MEANS for distances and proximities applied to cluster features gives the same result as MODULUS applied to group them. The somewhat subtle difference known between the results of extreme grouping by SQUARE and by MODULUS can thus be explained by the difference between unbiased and biased clustering.
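A quick numerical check of the middle identity above (assuming, as the formulas suggest, that $\alpha_k$ is the leading eigenvector of the within-group proximity matrix; the toy matrix is ours) also shows why proximities to the principal factor differ from proximities to the group center:

```python
import numpy as np

# a small within-group proximity matrix (e.g. squared correlations of 3 features)
S = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.7],
              [0.5, 0.7, 1.0]])

lam, A = np.linalg.eigh(S)                 # eigenvalues in ascending order
alpha, lam1 = A[:, -1], lam[-1]            # leading eigenvector and eigenvalue
alpha = alpha if alpha.sum() > 0 else -alpha

to_principal = S @ alpha                   # s(w_i, pi_k): equals lam1 * alpha
to_center = S.mean(axis=1)                 # s(w_i, center): (1/n_k) row sums

assert np.allclose(to_principal, lam1 * alpha)
print(to_principal / to_principal.max())   # proportional to eigenvector components
print(to_center / to_center.max())         # proportional to row sums -> generally different
```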
5. K clusters and groups problem
It is a well-known problem to define a suitable number of clusters. The first answer is: find a suitable number $K$ and then obtain the clustering (grouping) for it, for example, by K-MEANS, FOREL, SQUARE, or MODULUS. The second answer is: find a suitable clustering (grouping) and obtain the number $K$ from it, for example, by ISODATA, FOREL, hierarchical clustering [7, 8], or nonhierarchical clustering (grouping) [9].
The nonhierarchical divisive clustering algorithm can be represented as follows. Starting from $K = 1$, find the least compact subset (cluster, group) $\Omega_k$. Use the pair of most different objects in it as representatives and build a partition of $\Omega_k$ into two subsets $\Omega_k$ and $\Omega_{K+1}$ by some clustering algorithm. For the $K + 1$ subsets use the two new representatives $\tilde\omega_k$ and $\tilde\omega_{K+1}$ together with the $K - 1$ previously defined ones $\tilde\omega_1,\dots,\tilde\omega_{k-1}, \tilde\omega_{k+1},\dots,\tilde\omega_K$. Build in the same way a partition of all $N$ objects into $K + 1$ clusters. Stop when $K = N$.
The result is a sequence of partitions, starting from $\Omega_1 = \Omega$ and finishing with the single-element sets $\Omega_1,\dots,\Omega_N$. Some partitions in this sequence form subsequences of hierarchical ones. Two partitions into $K$ and $K + 1$ sets form a hierarchy if breaking the least compact set into two subsets $\Omega_k$ and $\Omega_{K+1}$ immediately gives an unbiased partition into $\Omega_1,\dots,\Omega_{K+1}$. In this case the partition into $K$ sets is a so-called "stable" partition for the number $K$.

Let us use stable partitions as the suitable ones to get the result of clustering (grouping) and the number $K$. Biased partitions taken as the basis for the nonhierarchical algorithm disintegrate (split into parts) the hierarchical subsequences of partitions, reducing the set of suitable numbers $K$.

The nonhierarchical clustering (grouping) algorithm is a superstructure (an "additional storey") over well-known iterative algorithms with a specified $K$ (K-MEANS and FOREL for a feature space, the new ones for distances and proximities, SQUARE and MODULUS for correlations, etc.). This algorithm gives a dendrogram like the hierarchical ones do, but it points to breaks in the hierarchy and gives a better partition at such moments. The sketch below illustrates the stability check.
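A rough sketch of the stability check described above (the helper names are ours; the full divisive loop that picks the pair of most different objects and re-runs K-MEANS is omitted):

```python
import numpy as np

def dispersion(D2, members):
    """sigma_k^2 = (1 / (2 N_k^2)) * sum_{p,q in cluster} d_pq^2."""
    return D2[np.ix_(members, members)].mean() / 2.0

def least_compact(D2, labels):
    """Label of the cluster with the largest dispersion (the one to split next)."""
    ks = np.unique(labels)
    return ks[int(np.argmax([dispersion(D2, np.flatnonzero(labels == k)) for k in ks]))]

def is_stable(D2, split_labels):
    """After splitting the least compact cluster, the partition is 'stable'
    if reallocating every object to the nearest implicit center moves nothing."""
    ks = np.unique(split_labels)
    dc = np.stack([D2[:, split_labels == k].mean(axis=1)
                   - D2[np.ix_(split_labels == k, split_labels == k)].mean() / 2.0
                   for k in ks], axis=1)
    return bool(np.array_equal(ks[np.argmin(dc, axis=1)], split_labels))
```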
Conclusion
Unlike the multidimensional scaling and factor analysis problems, featureless clustering does not need to restore the unknown feature space. So we leave out the crucial problem of building interpretable features. The developed algorithms are suitable for featureless clustering of the results of different types of pairwise comparison (expert evaluations, protein sequences, signatures, etc.). These algorithms were tested on Fisher's iris data [10], Holsinger's psychological tests [11], etc.
References
1. Tou J.T., Gonzalez R.C. Pattern Recognition Principles. Addison-Wesley, 1981.
2. Zagoruiko N.G. Applied Methods of Data and Knowledge Analysis. Novosibirsk: Institute of Mathematics, 1999 (in Russian).
3. Schlesinger M. About spontaneous recognition of patterns // Reading Automata. Kiev, 1965. P. 38-45 (in Russian).
4. Young G., Householder A.S. Discussion of a set of points in terms of their mutual distances // Psychometrika. 1938. Vol. 3. P. 19-22.
5. Torgerson W.S. Theory and Methods of Scaling. N.Y.: J. Wiley, 1958.
6. Braverman E.M. Methods of extreme grouping of parameters and the problem of finding essential factors // Avtomatika i Telemekhanika. 1970. No. 1. P. 123-132 (in Russian).
7. Sneath P. The application of computers to taxonomy // J. of General Microbiology. 1957. Vol. 17. P. 201-226.
8. Ward J. Hierarchical grouping to optimize an objective function // J. of the American Statistical Association. 1963. Vol. 58. P. 236-244.
9. Dvoenko S.D. Restoration of spaces in data by the method of nonhierarchical decompositions // Automation and Remote Control. 2001. Vol. 62, No. 3. P. 467-473.
10. Fisher R.A. The use of multiple measurements in taxonomic problems // Ann. Eugenics. 1936. Vol. 7. P. 179-188.
11. Harman H.H. Modern Factor Analysis. Chicago: Univ. of Chicago Press, 1976.