Chapter 6. Fuzzy Clustering
• Clustering:
Clustering essentially deals with the task of splitting a set of patterns into
a number of more-or-less homogeneous classes (clusters) with respect
to a suitable similarity measure such that the patterns belonging to any
one of the clusters are similar and the patterns of different clusters are
as dissimilar as possible.
~ Depending on the structure of the partition, two different kinds of
clustering can be distinguished:
(1) Hard clustering:
The partition is a set of disjoint subsets of patterns in such a way
that each object belongs to exactly one cluster.
(2) Fuzzy clustering:
Each object belongs to a cluster to a certain degree according to the
membership function of the cluster. The partition is fuzzy in the
sense that a single object can simultaneously belong to multiple
clusters.
• Hard clustering
(1) k-means algorithm:
~ "k" simply refers to the number of desired clusters and can be replaced by any desired index.
~ Feature vectors are repeatedly reassigned to clusters, and the cluster centroids are updated until no further reassignment is necessary.
~ Algorithm:
Initialization:
Randomly generate the K mean vectors (centroids) $\bar{X}_k$, $k = 1, 2, \ldots, K$, one for each cluster.
Recursion:
Step 1. For each feature vector $X$ in the training set, assign $X$ to cluster $C_{k^*}$, where
$$k^* = \arg\min_k \; d(X, \bar{X}_k)$$
and $d(\cdot, \cdot)$ represents some distance measure in the feature space, e.g.
Euclidean distance: $d(X, \bar{X}) = (X - \bar{X})^T (X - \bar{X})$
Mahalanobis distance: $d(X, \bar{X}) = (X - \bar{X})^T C_X^{-1} (X - \bar{X})$, where $C_X$ is the covariance matrix.
Step 2. Recompute the cluster centroids and return to Step 1 if any of the centroids changed from the last iteration. When a sample $X$ is added to cluster $k^*$, its centroid can be updated incrementally:
$$\bar{X}_{k^*}^{(t+1)} = \bar{X}_{k^*}^{(t)} + \frac{1}{N+1}\left(X - \bar{X}_{k^*}^{(t)}\right) = \frac{X + N \cdot \bar{X}_{k^*}^{(t)}}{N+1}$$
$N$: number of samples in cluster $k^*$ at time $t$.
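For concreteness, here is a minimal Python sketch of this recursion (NumPy assumed; function and variable names are illustrative, not from the original notes):

```python
import numpy as np

def k_means(X, K, max_iter=100, seed=0):
    """Hard k-means: alternate Step 1 (assignment) and Step 2 (centroid update)."""
    rng = np.random.default_rng(seed)
    # Initialization: take K random samples as the initial centroids.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iter):
        # Step 1: k* = argmin_k d(X, X_k), here with Euclidean distance.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 2: recompute centroids; keep the old one if a cluster went empty.
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):  # no centroid changed: stop
            break
        centroids = new_centroids
    return centroids, labels
```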
(2) Kohonen learning rule (winner-take-all learning rule)
Similarity matching:
$$k^* = \arg\min_k \; d(X, \bar{X}_k)$$
Updating:
$$\bar{X}_{k^*}^{(t+1)} = \bar{X}_{k^*}^{(t)} + \alpha(t)\left(X - \bar{X}_{k^*}^{(t)}\right)$$
$$\bar{X}_{k}^{(t+1)} = \bar{X}_{k}^{(t)}, \quad k = 1, 2, \ldots, K, \; k \neq k^*$$
$\alpha(t)$: a suitable learning constant at the $t$-th time step.
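A one-step sketch of this rule in the same style (the decaying choice $\alpha(t) = 1/(t+1)$ is an assumption here, not prescribed by the notes):

```python
import numpy as np

def winner_take_all_step(x, centroids, t):
    """One winner-take-all step: only the winning centroid moves toward x."""
    # Similarity matching: k* = argmin_k d(x, X_k).
    k_star = np.argmin(np.linalg.norm(centroids - x, axis=1))
    alpha = 1.0 / (t + 1)  # assumed decaying learning constant alpha(t)
    # Updating: X_{k*} <- X_{k*} + alpha(t)(x - X_{k*}); all other centroids stay.
    centroids[k_star] += alpha * (x - centroids[k_star])
    return centroids
```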
• Formulation of fuzzy clustering
~ Given $X = \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}$, $\vec{x}_i \in R^p$, perform a partition of $X$ into $C$ fuzzy sets w.r.t. a given criterion.
C: number of clusters
Criterion: optimization of an objective function
Result: a partition matrix $U$ s.t.
$$U = [u_{ij}], \quad i = 1 \sim C, \; j = 1 \sim n, \quad u_{ij} \in [0, 1]$$
$u_{ij}$: expresses the degree to which the element $\vec{x}_j$ belongs to the $i$-th cluster.
Additional constraints:
$$\sum_{i=1}^{C} u_{ij} = 1, \quad \text{for } j = 1 \sim n \qquad (1)$$
$$0 < \sum_{j=1}^{n} u_{ij} < n, \quad \text{for } i = 1 \sim C \qquad (2)$$
~ A general form of the objective function:
$$J(u_{ij}, \vec{v}_i) = \sum_{i=1}^{C} \sum_{j=1}^{n} g[w(\vec{x}_j), u_{ij}] \; d(\vec{x}_j, \vec{v}_i)$$
$w(\vec{x}_j)$: a priori weight for $\vec{x}_j$
$d(\vec{x}_j, \vec{v}_i)$: degree of similarity between $\vec{x}_j$ and $\vec{v}_i$ (the central vector of the $i$-th cluster); it should satisfy two axioms:
(a.1) $d(\vec{x}_j, \vec{v}_i) \geq 0$
(a.2) $d(\vec{x}_j, \vec{v}_i) = d(\vec{v}_i, \vec{x}_j)$
~ Fuzzy clustering can be formulated as:
Minimize $J(u_{ij}, \vec{v}_i)$, $i = 1, 2, \ldots, C$, $j = 1, 2, \ldots, n$, subject to Eqs. (1) and (2).
• Fuzzy c-means (FCM): [Bezdek, 1981]
Based on the objective function
$$J(u_{ij}, \vec{v}_i) = \sum_{i=1}^{C} \sum_{j=1}^{n} u_{ij}^m \, \|\vec{x}_j - \vec{v}_i\|^2, \quad m > 1 \qquad (3)$$
m: exponential weight; it influences the degree of fuzziness of the membership (partition) matrix.
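Eq. (3) translates directly into code; a minimal sketch (NumPy assumed, names illustrative):

```python
import numpy as np

def fcm_objective(X, U, V, m):
    """Eq. (3): J = sum_i sum_j u_ij^m ||x_j - v_i||^2.
    X: (n, p) data; U: (C, n) partition matrix; V: (C, p) cluster centers."""
    d2 = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) ** 2  # (C, n)
    return float(np.sum((U ** m) * d2))
```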
~ Derivation of $\vec{v}_i$ and $u_{ij}$:
1. $\vec{v}_p$ (for fixed $u_{ij}$):
$$0 = \frac{\partial J}{\partial \vec{v}_p} = \frac{\partial}{\partial \vec{v}_p}\left( \sum_{i=1}^{C} \sum_{j=1}^{n} u_{ij}^m \|\vec{x}_j - \vec{v}_i\|^2 \right) = \sum_{j=1}^{n} u_{pj}^m \frac{\partial}{\partial \vec{v}_p} (\vec{x}_j - \vec{v}_p)^T (\vec{x}_j - \vec{v}_p) = -2 \sum_{j=1}^{n} u_{pj}^m (\vec{x}_j - \vec{v}_p)$$
$$\Rightarrow \quad \vec{v}_p = \frac{1}{\sum_{j=1}^{n} (u_{pj})^m} \sum_{j=1}^{n} (u_{pj})^m \, \vec{x}_j, \quad p = 1, 2, \ldots, C \qquad (4)$$
2. $u_{ij}$ (for fixed $\vec{v}_i$): the constraint of Eq. (1) can be enforced by introducing a Lagrange multiplier $\lambda$:
$$J(u_{ij}, \vec{v}_i, \lambda) = \sum_{i=1}^{C} \sum_{j=1}^{n} u_{ij}^m \|\vec{x}_j - \vec{v}_i\|^2 - \lambda \left( \sum_{i=1}^{C} u_{ij} - 1 \right)$$
$$0 = \frac{\partial J}{\partial u_{pq}} = m (u_{pq})^{m-1} \|\vec{x}_q - \vec{v}_p\|^2 - \lambda \;\Rightarrow\; u_{pq} = \left( \frac{\lambda}{m} \right)^{\frac{1}{m-1}} \left( \frac{1}{\|\vec{x}_q - \vec{v}_p\|^2} \right)^{\frac{1}{m-1}} \qquad (5)$$
$$0 = \frac{\partial J}{\partial \lambda} \;\Rightarrow\; 1 = \sum_{i=1}^{C} u_{iq} = \left( \frac{\lambda}{m} \right)^{\frac{1}{m-1}} \sum_{i=1}^{C} \left( \frac{1}{\|\vec{x}_q - \vec{v}_i\|^2} \right)^{\frac{1}{m-1}} \;\Rightarrow\; \left( \frac{\lambda}{m} \right)^{\frac{1}{m-1}} = \frac{1}{\sum_{i=1}^{C} \left( 1 / \|\vec{x}_q - \vec{v}_i\|^2 \right)^{\frac{1}{m-1}}}$$
Substituting into Eq. (5):
$$u_{pq} = \frac{\left( 1 / \|\vec{x}_q - \vec{v}_p\|^2 \right)^{\frac{1}{m-1}}}{\sum_{i=1}^{C} \left( 1 / \|\vec{x}_q - \vec{v}_i\|^2 \right)^{\frac{1}{m-1}}}, \quad p = 1, 2, \ldots, C, \; q = 1, 2, \ldots, n \qquad (6)$$
~ The system described by Eqs. (4) and (6) cannot be solved analytically.
~ Solution: by an iterative approach.
Algorithm FCM: Fuzzy C-Means Algorithm
Step 1: Select $C$ ($2 \leq C \leq n$) and $m$ ($1 < m < \infty$); choose an initial partition $U^{(0)}$ and a termination criterion $\varepsilon$; set the iteration index $l$ to 0.
Step 2: Calculate the fuzzy cluster centers $\{\vec{v}_p^{\,(l)} \mid p = 1, 2, \ldots, C\}$ by using $U^{(l)}$ and Eq. (4).
Step 3: Calculate $U^{(l+1)}$ by using $\{\vec{v}_p^{\,(l)} \mid p = 1, 2, \ldots, C\}$ and Eq. (6).
Step 4: Calculate $\Delta = \| U^{(l+1)} - U^{(l)} \| = \max_{i,j} | u_{ij}^{(l+1)} - u_{ij}^{(l)} |$.
If $\Delta > \varepsilon$, set $l = l + 1$ and go to Step 2; if $\Delta \leq \varepsilon$, stop.
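Putting Eqs. (4) and (6) together gives a compact implementation. The sketch below (NumPy assumed, names illustrative) adds a small constant to the squared distances to avoid division by zero when a point coincides with a center:

```python
import numpy as np

def fcm(X, C, m=2.0, eps=0.01, max_iter=100, seed=0):
    """Fuzzy C-Means: alternate Eq. (4) (centers) and Eq. (6) (memberships)."""
    n = len(X)
    rng = np.random.default_rng(seed)
    # Random initial partition U^(0) satisfying the column-sum constraint (1).
    U = rng.random((C, n))
    U /= U.sum(axis=0)
    for _ in range(max_iter):
        Um = U ** m
        # Eq. (4): v_p = sum_j u_pj^m x_j / sum_j u_pj^m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Eq. (6): u_pq proportional to (1 / ||x_q - v_p||^2)^(1/(m-1))
        d2 = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) ** 2 + 1e-12
        U_new = (1.0 / d2) ** (1.0 / (m - 1))
        U_new /= U_new.sum(axis=0)
        # Step 4: Delta = max_ij |u_ij^(l+1) - u_ij^(l)|
        delta = np.abs(U_new - U).max()
        U = U_new
        if delta <= eps:
            break
    return U, V
```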
Ex: The data for the butterfly shown below are processed by a fuzzy 2-means algorithm with
$$U^{(0)} = \begin{bmatrix} 0.854 & 0.146 & 0.854 & 0.854 & \cdots & 0.854 \\ 0.146 & 0.854 & 0.146 & 0.146 & \cdots & 0.146 \end{bmatrix}_{2 \times 15}$$
$\varepsilon = 0.01$, $m = 1.25$.
Result: (figure of the resulting butterfly partition omitted)
Remark:
(1) As m increases, the fuzziness of the partition matrix increases (illustrated numerically below).
(2) As $m \to \infty$, $U \to \left[ \frac{1}{C} \right]$, i.e., every membership tends to $1/C$.
(3) In practice, choose m from the range [1.5, 30].
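Remarks (1) and (2) can be checked by evaluating Eq. (6) for a single point at increasing m; a toy sketch (values illustrative):

```python
import numpy as np

def memberships(d2, m):
    """Eq. (6) for a single point; d2 holds its squared distances to the C centers."""
    w = (1.0 / d2) ** (1.0 / (m - 1))
    return w / w.sum()

d2 = np.array([1.0, 4.0])  # a point whose squared distance to center 2 is 4x larger
for m in (1.25, 2.0, 10.0):
    print(m, memberships(d2, m))
# m = 1.25 gives a nearly crisp [0.996, 0.004]; m = 10 is already close to [1/2, 1/2].
```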
• Determination of the "correct" number of clusters, C
~ There is no exact solution to this question.
~ Some scalar measures of partition fuzziness have been used as validity indicators, e.g.:
1. Partition entropy:
$$H(U, C) = -\frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{C} u_{ij} \ln u_{ij}$$
2. Partition coefficient:
$$F(U, C) = \frac{1}{n} \sum_{i=1}^{C} \sum_{j=1}^{n} u_{ij}^2$$
3. Proportion exponent:
$$P(U, C) = -\ln \left\{ \prod_{j=1}^{n} \left[ \sum_{k=1}^{[u_j^{-1}]} (-1)^{k+1} \binom{C}{k} (1 - k u_j)^{C-1} \right] \right\}$$
where $u_j = \max_{1 \leq i \leq C} u_{ij}$ and $[u_j^{-1}]$ denotes the greatest integer $\leq 1/u_j$.
Remarks:
(1) $H = 0$ and $F = 1$ if $u_{ij} \in \{0, 1\}$ (hard partitioning).
(2) $H = \ln C$ and $F = \frac{1}{C}$ if $u_{ij} = \frac{1}{C}$ for all $i, j$.
(3) $0 \leq H \leq \ln C$, $\quad \frac{1}{C} \leq F \leq 1$, $\quad 0 \leq P \leq \infty$.
~ Assume that the clustering structure is better identified when more points concentrate around the cluster centers, i.e., the crisper the partition matrix is.
~ Based on the above assumption, the heuristic rules for selecting the best number of clusters C are:
$$\min_{C = 2, \ldots, n-1} \left\{ \min_{U \in \Omega_C} \left[ H(U, C) \right] \right\}, \qquad \max_{C = 2, \ldots, n-1} \left\{ \max_{U \in \Omega_C} \left[ F(U, C) \right] \right\}, \qquad \max_{C = 2, \ldots, n-1} \left\{ \max_{U \in \Omega_C} \left[ P(U, C) \right] \right\}$$
$\Omega_C$: the set of all optimal solutions for a given C.
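The two simpler indicators are straightforward to compute from a partition matrix; a minimal sketch (NumPy assumed; the proportion exponent is omitted here for brevity):

```python
import numpy as np

def partition_entropy(U):
    """H(U, C) = -(1/n) sum_j sum_i u_ij ln u_ij, with 0 ln 0 taken as 0.
    U: (C, n) partition matrix."""
    terms = np.where(U > 0, U * np.log(U), 0.0)
    return -terms.sum() / U.shape[1]

def partition_coefficient(U):
    """F(U, C) = (1/n) sum_i sum_j u_ij^2."""
    return (U ** 2).sum() / U.shape[1]
```

Running FCM for each candidate $C = 2, \ldots, n-1$ and applying the heuristic rules then reduces to taking the argmin of H (or the argmax of F) over C.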
Ex: Iris data.
The data set contains three categories of the species Iris.
$\varepsilon = 0.05$, $m = 40$; only one guess is used for $U^{(0)}$.
Fig.: Iris data set in the example (scatter plot omitted).
Table: Validity indicators in the example.

C                                  2      3      4      5      6
Partition entropy     H(U, C)    0.55   0.92   1.21   1.43   1.60
Partition coefficient F(U, C)    0.63   0.45   0.35   0.27   0.23
Proportion exponent   P(U, C)    109    113    111     62     68

Note that H is minimized and F is maximized at C = 2, while P peaks at C = 3, the actual number of Iris species.