Unsupervised possibilistic clustering

Pattern Recognition 39 (2006) 5 – 21
www.elsevier.com/locate/patcog
Unsupervised possibilistic clustering
Miin-Shen Yanga,∗ , Kuo-Lung Wub
a Department of Applied Mathematics, Chung Yuan Christian University, Chung-Li 32023, Taiwan, ROC
b Department of Information Management, Kun Shan University of Technology, Tainan 71023, Taiwan, ROC
Received 22 March 2004; received in revised form 4 April 2005; accepted 15 July 2005
Abstract
In fuzzy clustering, the fuzzy c-means (FCM) clustering algorithm is the best known and used method. Since the FCM memberships
do not always explain the degrees of belonging for the data well, Krishnapuram and Keller proposed a possibilistic approach to clustering
to correct this weakness of FCM. However, the performance of Krishnapuram and Keller’s approach depends heavily on the parameters.
In this paper, we propose another possibilistic clustering algorithm (PCA) which is based on the FCM objective function, the partition
coefficient (PC) and partition entropy (PE) validity indexes. The resulting membership becomes the exponential function, so that it is robust
to noise and outliers. The parameters in PCA can be easily handled. Also, the PCA objective function can be considered as a potential
function, or a mountain function, so that the prototypes of PCA can be correspondent to the peaks of the estimated function. To validate
the clustering results obtained through a PCA, we generalized the validity indexes of FCM. This generalization makes each validity index
workable in both fuzzy and possibilistic clustering models. By combining these generalized validity indexes, an unsupervised possibilistic
clustering is proposed. Some numerical examples and real data implementation on the basis of the proposed PCA and generalized validity
indexes show their effectiveness and accuracy.
䉷 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Keywords: Fuzzy clustering; Possibilistic clustering; Fuzzy c-means; Validity indexes; Fuzzy c-partitions; Possibilistic c-memberships; Robustness
1. Introduction
Cluster analysis is a method for clustering a data set into
groups of similar individuals. It is an approach towards unsupervised learning as well as one of the major techniques
in pattern recognition. The conventional (hard) clustering
methods restrict each point of the data set to exactly one
cluster. Since Zadeh [1] proposed fuzzy sets that produced
the idea of partial membership described by a membership
function, fuzzy clustering has been widely studied and applied in a variety of key areas (see Refs. [2–4]). In the
literature on fuzzy clustering, the fuzzy c-means (FCM)
clustering algorithm, proposed by Dunn [5] and extended
by Bezdek [2], is the most well-known and used method.
∗ Corresponding author. Tel.: +886 3 265 3100; fax: +886 3 265 3199.
E-mail address: [email protected] (M.-S. Yang).
Although FCM is a very useful clustering method, its memberships do not always correspond well to the degrees of
belonging of the data, and it may be inaccurate in a noisy
environment [6]. To improve this weakness of FCM, and to
produce memberships that have a good explanation of the
degrees of belonging for the data, Krishnapuram and Keller
[6] created a possibilistic approach to clustering which used
a possibilistic type of membership function to describe the
degree of belonging. They showed that algorithms with possibilistic memberships are more robust to noise and outliers
than FCM. The possibilistic clustering approach has also
been applied in shell clustering, boundary detection, surface
and function approximations (see Refs. [7–9]).
It is necessary to pre-assume the number c of clusters
for these hard, fuzzy and possibilistic clustering algorithms
(PCAS). In general, the cluster number c should be unknown. The problem for finding an optimal c is usually
called cluster validity. Once the partition is obtained by
a clustering method, the validity function can help us to
0031-3203/$30.00 䉷 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.patcog.2005.07.005
6
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
validate whether it accurately presents the structure of the
data set or not. The first proposed cluster validity functions associated with FCM are the partition coefficient
(PC) and partition entropy (PE) [2,10,11]. These indexes
use only membership functions and have the disadvantage of lack of connection to the geometrical structure
of the data. The validity indexes that explicitly take into
account the data geometrical properties include the FS
index proposed by Fukuyama and Sugeno [12], the XB
index proposed by Xie and Beni [13], the SC index proposed by Zahid et al. [14], FH V (fuzzy hyper-volume)
and PD (partition density) indexes proposed by Gath
and Geva [15], etc. By combining a validity function,
fuzzy clustering algorithms, such as FCM and alternative FCM [16], can become unsupervised fuzzy clustering
algorithms.
In real data analysis, noise and outliers are unavoidable. In
this situation, one may process a PCA. However, the existing
validity indexes will lose their efficiency in a possibilistic
clustering environment (i.e. the membership functions are
possibilistic types). In this paper we will discuss problems
of how to validate the clustering results obtained through a
PCA. Since possibilistic memberships relax the constraint
in the fuzzy memberships obtained by FCM, it can be made
sure that these existing validity indexes will not work in these
possibilistic clustering models. We will use a normalization
technique to generalize these existing indexes. This generalization makes each validity index workable in both fuzzy and
possibilistic clustering models. We shall also propose a new
PCA whose possibilistic memberships are exponential functions, and is of course robust to noise and outliers. Therefore,
an unsupervised possibilistic clustering method can be created by combining these generalized validity indexes with
a PCA.
This paper is organized as follows: in Section 2,
we review the FCM clustering algorithm and discuss the
effects of the parameters m (fuzzifier) and c (cluster number). We also review four existing validity indexes that are
the most indicative in the fuzzy clustering validity analysis.
In Section 3.1, we propose a PCA whose objective function is an extension of the FCM objective function in the
combination of the PC and PE validity indexes. We also
discuss the effects of the parameters m and c in PCA. In
Section 3.2, we give the robust properties of PCA based on
the influence function. In Section 3.3, we propose a normalization technique to generalize these existing indexes.
This generalization makes each validity index workable
by validating both fuzzy and possibilistic clusters. In Section 4, we use three real data sets to test the efficiency of
the original and generalized validity indexes by validating
the fuzzy and possibilistic clusters obtained through the
FCM and PCA, respectively. We also analyze the effects
of the parameter m by using three real data and by giving a suggestion of choosing m in both FCM and PCA.
Finally, the discussion and conclusions are presented in
Section 5.
2. Unsupervised fuzzy clustering
Since Zadeh [1] introduced the concept of fuzzy sets, a
great deal of research on fuzzy clustering has been conducted. Let X={x1 , . . . , xn } be a data set in an s-dimensional
Euclidean space R s with its ordinary Euclidean norm · and let c be a positive integer larger than one. A partition
of X into c clusters can be presented using mutually disjoint
sets X1 , . . . , Xc such that X1 ∪ · · · ∪ Xc = X or equivalently
by the indicator functions 1 , . . . , c such that i (x) = 1 if
x ∈ Xi and i (x) = 0 if x ∈
/ Xi for all i = 1, . . . , c. The set
of indicator functions
{1 , . . . , c }H = {{1 , . . . , c } | i (x) ∈ {0, 1}}
(1)
is called a hard c-partition which clusters X into c clusters.
Consider an extension to allow i (x) to be membership functions of fuzzy sets
i on X assuming values in the interval
[0, 1] such that ci=1 i (x) = 1 for all x in X. In this case,
{1 , . . . , c }F
= {1 , . . . , c } | i (x) ∈ [0, 1],
c
i (x) = 1
i=1
(2)
is called a fuzzy c-partition of X.
2.1. The FCM clustering algorithm
In the unsupervised learning literature, the FCM is the
best-known fuzzy clustering method. The FCM is an iterative
algorithm using the necessary conditions for a minimizer of
the FCM objective function JF CM with
JF CM (, a) =
n
c i=1 j =1
2
m
ij xj − ai ,
m > 1,
(3)
where = {1 , . . . , c } with the membership function
i defined as ij = i (xj ) is a fuzzy c-partition and
a = {a1 , . . . , ac } is the set of c cluster centers. The necessary conditions for a minimizer (, a) of JF CM are the
following update equations:
−1
c
xj − ai 2/(m−1)
,
ij =
xj − ak 2/(m−1)
k=1
i = 1, . . . , c, j = 1, . . . , n
and
(4)
n
m
j =1 ij xj
m
j =1 ij
ai = n
,
i = 1, . . . , c.
(5)
The weighting exponent m is called the fuzzifier which
can have an influence on the clustering performance of FCM
[17]. The influence of the weighting exponent m on the membership function i of Eq. (4) is shown in Fig. 1. This figure is produced by assuming that there are only two clusters
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
7
concept and property in any fuzzy or possibilistic clustering
methods. The FCM clustering algorithm is then summarized
as follows:
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
m = 10
m=3
m=2
m = 1.5
m = 1.1
0
1
2
3
Fuzzy c-means clustering algorithm
(0)
Initialize ai , i = 1, . . . , c and set > 0; set iteration
counter = 0;
(+1)
Step 1. Compute ij
using Eq. (4).
(+1)
4
Fig. 1. The membership functions of FCM with different weighting exponents m.
using Eq. (5).
Step 2. Compute ai
(+1)
()
− ai < .
Increment ; until maxi ai
2.2. Cluster validity for fuzzy clustering
1.0
c=2
0.9
c=3
0.8
c=4
0.7
c=5
0.6
0.5
c = 10
0.4
0.3
0.2
0.1
0.0
0
1
2
3
4
Fig. 2. The membership functions of FCM with different cluster
numbers c.
with centers 0 and 2. The curves with different m values are
the membership functions belonging to the cluster with center 0. When m = 1, the FCM will reduce to the traditional
hard c-means. When m tends to infinity, ij = 1/c for all i, j
and the sample mean will be a unique optimizer of JF CM .
In fact, this situation may occur for any specified m values,
and Yu et al. [18] proposed a theoretical upper bound for
m that can prevent the sample mean from being the unique
optimizer of JF CM .
Another parameter which also has an influence on ij is
the cluster number c. In general, we do not consider it to be
a parameter of the membership function. However, in fact,
the shapes of the membership functions change when the
cluster number c changes. The influence of c on ij is shown
in Fig. 2. The curve is the membership function of belonging
to the cluster with center 0. When c=2, the curve is made by
adding the cluster with center 2. That is, the set of the cluster
centers when c=2 is {0, 2}. When c=3, the curve is made by
adding the third cluster with center 2 and the set of the cluster
centers when c = 3 is {0, 2, 2}. The set of the cluster centers
when c = 4 is {0, 2, 2, 2}. The rest can be done in the same
way. The shapes of the FCM membership functions will
become steep when c increases. This is reasonable because it
can help FCM to find more clusters in a large cluster number
case. This analysis is important, and we should involve this
After a fuzzy c-partition is provided by a fuzzy clustering
algorithm such as FCM, we may ask whether it accurately
represents the structure of the data set or not. This is a cluster
validity problem. Since most of the fuzzy clustering methods need to pre-assume the number c of clusters, a validity
criterion for finding an optimal c, which can completely describe the data structure, becomes the most studied topic in
cluster validity. For a given cluster number range the validity measure is evaluated for each given cluster number, and
then an optimal number is chosen for these validity measures. We briefly review four existing indexes here.
(a) The first validity index associated with FCM is the
partition coefficient [2] defined by
P C(c) =
n
c
1 2
ij ,
n
(6)
i=1 j =1
where 1/c P C(c) 1. In general, we find an optimal cluster number c∗ by solving max2 c n−1 P C(c) to produce a
best clustering performance for the data set X.
(b) The PE [10,11] is defined by
n
c
1
PE(c) = −
ij log2 ij ,
n
(7)
i=1 j =1
where 0 PE(c)log2 c. In general, we find an optimal c∗
by solving min2 c n−1 PE(c) to produce a best clustering
performance for the data set X.
When ij = 1/c for all i, j (i.e. the sample mean is the
unique optimizer), PC(c)=1/c and PE(c)=log2 c. This may
be caused by the fault of algorithms, by unsuitable parameter
use, or by lack of data structure. Yu et al. [18] showed that if
m is larger than a theoretical upper boundary then the above
situation occurs.
The above indexes use only the membership functions,
and may have a monotone tendency with cluster number c.
This may be due to lack of connection to the geometrical
structure of data. The following indexes simultaneously take
into account the membership functions and the structure of
data.
8
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
(c) A validity function proposed by Fukuyama and Sugeno
[12] is defined by
FS(c) =
c
n
i=1 j =1
m
ij xj
− ai −
2
c
n
i=1 j =1
= JF CM (, a) + K(, a),
m
ij ai
− a
2
XB(c) =
=
i=1
n
m
j =1 ij xj
(8)
∀i .
(10)
We may call the memberships 1 , . . . , c in Eq. (10) as the
possibilistic c-memberships. To avoid trivial solutions, Krishnapuram and Keller [6] added a constraining term to FCM
and proposed the following possibilistic clustering objective
function:
J1 (, a) =
n
c i=1 j =1
+
2
m
ij xj − ai n
c i (1 − ij )m .
(11)
i=1 j =1
− a i 2
n mini,j ai − aj 2
JF CM (, a)/n
.
Sep(a)
{1 , . . . , c }P
= {1 , . . . , c } | max i (x) > 0,
i
where a = ci=1 ai /c. JF CM (, a) is the FCM objective
function which measures the compactness, and K(, a) measures the separation. In general, an optimal c∗ is found by
solving min2 c n−1 FS(c) to produce a best clustering performance for the data set X.
(d) A validity function proposed by Xie and Beni [13]
with m = 2 and then generalized by Pal and Bezdek [17] is
defined by
c
possibilistic type of membership function with
(9)
JF CM (, a) is a compactness measure, and Sep(a) is a separation measure. In general, an optimal c∗ is found by solving min2 c n−1 XB(c) to produce a best clustering performance for the data set X.
These four indexes are the most cited validity indexes for
fuzzy clustering. They have a common objective of finding an optimal c with each one of these c clusters, compact
and separated from other clusters. By combining the validity
functions, FCM becomes a completely unsupervised fuzzy
clustering algorithm. Note that since no single validity index is the best, a better way of using validity indexes to
solve the cluster validity problem is to consider all information proposed by all selected indexes, and then make an
optimal decision. However, in some situations such as in a
noisy and outlier environment, one may wish to process a
possibilistic clustering approach. But, when the membership
functions are of a possibility type, it can be made sure that
these existing validity indexes will not work. Before solving this problem, we will discuss the possibilistic clustering
approaches below.
3. Unsupervised possibilistic clustering
Although FCM is a very useful clustering method, its
memberships do not always correspond well to the degree
of belonging of the data, and may be inaccurate in a noisy
environment [6]. To improve this weakness of FCM, and
to produce memberships that have a good explanation for
the degree of belonging for the data, Krishnapuram
and
Keller [6] relaxed the constrained condition ci=1 i (x) = 1
of the fuzzy c-partition {1 , . . . , c }F in FCM to obtain a
They then created a possibilistic approach to clustering
which used a possibilistic c-membership of Eq. (10) to describe the degree of belonging on the basis of the objective
function (11). Afterward, Krishnapuram and Keller [19]
gave an alternative objective function for the possibilistic
clustering with:
J2 (, a) =
n
c ij xj − ai 2
i=1 j =1
+
n
c i (ij log ij − ij ).
(12)
i=1 j =1
Note that both (1 − ij )m and ij log ij − ij of Eqs. (11)
and (12) are monotone decreasing functions of ij . This
forces ij to be as large as possible to avoid the trivial solutions. However, the parameter i has a major influence
on the clustering results. The determination of the normalization parameter i is quite important. Krishnapuram and
Keller [6,19] recommended to select i as
n
m
2
j =1 ij xj − ai n
i = K
or
m
j =1 ij
2
xj ∈(i ) xj − ai i =
,
(13)
|xj ∈ (i ) |
where K ∈ (0, ∞) was typically chosen to be one, and
xj ∈ (i ) if {ij | j = 1, . . . , n}. Memberships obtained by minimizing J1 or J2 are possibilistic type. The
clustering performance using both J1 and J2 heavily depends on the chosen parameter i (see Refs. [19,20]).
On the other hand, it is also difficult to handle the parameter i in the real applications. In this section we
will propose a PCA whose performance can be easily
controlled and whose objective function can be properly
analyzed.
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
Now eliminate xj − ai 2 from the objective function JP CA
using Eq. (18) and we have:
3.1. A new possibilistic clustering algorithm
The objective function of our PCA is
JP CA (, a) =
n
c i=1 j =1
+
m
ij xj
√
m2 c
JP CA () = −
− ai 2
n c i=1 j =1
m
m
m
ij log ij − ij ,
(14)
where , m and c are all positive. The first term is equivalent
to the FCM objective function which requires the distances
from the feature vectors to the cluster centers to be as small
as possible. The second term is constructed by an analog
m
of the PE validity index (m
ij log ij ), and an analog of the
PC validity index (m
ij ). This constrained term will force the
ij to be as large as possible. The objective function (14)
extends the FCM objective function by combining
√ it with
the PC and PE validity indexes. The term /m2 c will be
discussed later.
Theorem 1. The necessary conditions for a minimizer (, a)
of the objective function (14) are the following update equations:
√
m cxj − ai 2
ij = exp −
,
i = 1, . . . , c,
j = 1, . . . , n
(15)
and
n
m
j =1 ij xj
m
j =1 ij
ai = n
,
i = 1, . . . , c.
(16)
Proof. Since no conditions are constrained on ij , minimizing JP CA with respect to ij is equivalent to minimizing
2
m
ij xj − ai +
m
m
m
log
−
√
ij
ij
ij
m2 c
(17)
with respect to ij . By differentiating Eq. (17) with respect
to ij and setting it to zero, we will have Eq. (15). Since
the second term of JP CA is independent of ai , minimizing
JP CA with respect to ai will be equivalent to minimizing
the first term of JP CA . This leads to Eq. (16). The insights of the objective function JP CA can be observed by solving the membership function (15) for xj −
ai 2 in terms of ij . We have that:
xj − ai 2 = −
ln m
ln ij
ij
√ = − 2√ .
m c
m c
9
(18)
n
c
m
ij .
√
m2 c
(19)
i=1 j =1
Thus, minimizing
or (JP CA ) is equivalent to maxic n JP CA
m
mizing i=1 j =1 ij . Therefore, the objective of JP CA
is to find prototypes such that the sum of the membership
function is maximized. Furthermore, for the special case of
m = 2, the PCA ends up producing the results that maximize
the PC validity index. On the other hand, if we insert the
membership functions (15) into the term ci=1 nj=1 m
ij ,
we have
√
m
n
c c n
m cxj − ai 2
m
ij =
exp −
.
i=1 j =1
i=1 j =1
(20)
We find that the right-hand term in Eq. (20) is the potential
function [21] and the mountain function [22]. Thus, minimizing JP CA (or JP CA ) is equivalent to maximizing the total potential function or the mountain function. Therefore,
the cluster centers obtained by PCA correspond to the peaks
of the potential function or the mountain function. Since the
potential function or mountain function can be looked upon
as a density function, the prototypes of PCA are equivalent
to the modes of the estimated density.
It is obvious that the exponential function in Eq. (15)
obtained by minimizing JP CA is a possibilistic type. The
parameter is a normalization term that measures the degree
of separation of the data set, and it is reasonable to define as the sample co-variance. That is:
n
n
2
j =1 xj − x
j =1 xj
=
with x =
.
(21)
n
n
The role of the parameter m in Eq. (15) is correspondent to
the fuzzifier m in FCM. If m in JP CA tends to zero, then
limm→0 ij = 1 and limm→0 ai = x for all i, j. In this case,
the sample mean will be the unique optimizer of the JP CA ,
and hence no clusters will be found, or we say that these c
clusters coincide to one cluster. If m tends to infinity, most
data points will have very small membership values even if
they are very close to one of these c cluster centers. In this
case, the membership function does not have a good corresponding explanation of the degree of belonging. In PCA,
the influence of the weighting exponent m on the membership functions ij is shown in Fig. 3. The curves of different
m values are the membership functions of belonging to the
cluster with center 0.
We also combine the cluster number c into the exponential membership function. In PCA, the influence of the cluster number c on the membership functions ij is shown in
Fig. 4. When c increases, the shape of the membership function will become more steep. This will allow us to find more
10
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
200
m = 1.1
m = 1.5
B
190
m=2
180
m=3
170
m = 10
160
A
150
140
0
1
2
3
Fig. 3. The membership functions of PCA with different weighting exponents m.
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
50
4
100
150
Fig. 5. Two clusters’ data set with a bridge point A and an outlying
point B.
Table 1
Data set and membership values of FCM and PCA in Fig. 5
c=2
c=3
X
Y
c=4
c=5
1
2
3
(0)
Initialize ai , i = 1, . . . , c and set > 0; set iteration
counter = 0;
(+1)
Step 1. Compute ij
using Eq. (15).
(+1)
using Eq. (16).
Step 2. Compute ai
(+1)
()
− ai < .
Increment ; until maxi ai
mu2
mu1
mu2
0.998
0.998
0.988
0.991
0.979
0.991
0.997
0.002
0.009
0.021
0.002
0.012
0.009
0.003
0.002
0.002
0.012
0.009
0.021
0.009
0.003
0.998
0.991
0.979
0.998
0.988
0.991
0.997
1.000
0.957
0.838
0.955
0.835
0.956
0.956
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
1.000
0.956
0.835
0.957
0.838
0.956
0.956
A
100
150
0.500
0.500
0.058
0.058
B
100
200
0.500
0.500
0.001
0.001
Fig. 4. The membership functions of PCA with different cluster
numbers c.
Possibilistic clustering algorithm
mu1
150
150
150
150
150
145
155
150
150
150
150
150
145
155
4
possible clusters in a large c case. Although the cluster number c is not directly considered as a parameter in FCM, it
still has an influence on the membership as shown in Fig. 2.
Therefore, we include the cluster number c in our exponential membership function.
The above
√ discussion is the reason why we multiply the
term /m2 c in the second term of JP CA . The parameter
can always be fixed as the sample co-variance. The parameter m can be specified by the users according to their
√
requirements. In general, we take m = 2. The parameter c
is used to control the steep degree of the membership functions. The main objective involving the parameter c in JP CA
is to make our algorithm more powerful in various data sets,
especially when solving the cluster validity problem. The
PCA is then summarized as follows:
PCA
60
65
70
55
50
60
60
140
145
150
135
130
140
140
c = 10
0
FCM
Similar to the possibilistic clustering approach proposed
by Krishnapuram and Keller [6], the proposed PCA also has
a more reasonable explanation for the degree of belonging
than FCM. In Fig. 5, there are two clusters with a bridge
point A and an outlying point B. FCM gives both A and B the
memberships 0.5 to these two clusters, although they should
have a different degree of belonging. These are shown in
Table 1. In PCA with the weighting exponent m = 2, the
bridge point A and the outlying point B are assigned different degrees of belonging with 0.058 and 0.001, respectively.
Note that the memberships obtained by PCA are of a possibilistic type as shown in Table 1. PCA can give us more
information about the data locations than FCM according to
these possibilistic membership values.
In fact, the parameters , m and c in the proposed PCA
algorithm play a similar role that may affect the degree of
belonging and the shape of the PCA membership functions
of Eq. (15). In Figs. 3 and 4, we demonstrate these effects of
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
3.2. The robust properties of PCA
0.15
FCM
Error rate
11
0.10
0.05
PCA
0.00
1
2
3
4
m
Fig. 6. The error rate curves of FCM and PCA for the Normal-4 data set
with respect to different weighting exponents m.
the parameters m and c on the PCA membership functions.
The parameter in PCA is used to reduce the influence of
the data scale so that it is set up as the sample variance with
Eq. (21). In Fig. 2, we had illustrated that FCM membership functions will become steep when the cluster number
c increases. The parameter c in PCA is used to adjust the
membership functions suitable for different cluster number
cases as shown in Fig. 4 so that the PCA with an actual
number c could have the clustering results to fit the structure of data well. In this case, the adjustment with c could
make the PCA more suitable for use for various data sets.
Thus, we need only to specify the parameter m in PCA so
that we could only focus on the discussion for the parameter m. We now use an example to show the influence of m
in PCA with a comparison to the influence of m in FCM for
the classification problem.
Example 1. We consider the Normal-4 data set that
was proposed by Pal and Bezdek [17]. Normal-4 is a
four-dimensional data set with a sample size n = 800
points consisting of 200 points from each of four clusters. The population mean vectors are 1 = (3, 0, 0, 0),
2 =(0, 3, 0, 0), 3 =(0, 0, 3, 0)and 4 =(0, 0, 0, 4) and the
variance–covariance matrix is i = I4 , an identity matrix,
i = 1, 2, 3, 4. Both FCM and PCA are implemented to obtain four clusters for the Normal-4 data set. Each data point
is classified according to the nearest neighbor to the cluster
center. Fig. 6 shows the classification error rate curves of
FCM (solid circle point) with m = 1.5.4 and PCA (circle
point) with m = 0.01.4. We find that both algorithms work
well when the parameter m is between 1.5 and 3. But FCM
does not work well when the parameter m is between 3 and
4 where PCA still works well for the Normal-4 data set. In
fact, Pal and Bezdek [17] had suggested that m in the interval
[1.5, 2.5] was generally recommended to be used for FCM.
More discussion about the parameter m including the robust
properties and clustering for an unknown cluster number
data set will be discussed in the next two subsections.
A good clustering method should have the robustness to
tolerate noise and outliers. In this subsection, we will give
the robust properties of the proposed PCA algorithm. We
use the influence function (see Ref. [23]) to show that the
proposed PCA cluster center update Eqs. (15) and (16) are
robust to noise and outliers. Let {x1 , . . . , xn } be an observed
data set of real numbers and be an unknown parameter to be
estimated. An M-estimator [23] is generated by minimizing
the form
n
(xj ; ),
(22)
j =1
where is an arbitrary function that can measure the loss of
xj and . Here, we are interested in a location estimate that
minimizes
n
(xj − )
(23)
j =1
and the M-estimator is generated by solving the equation
n
(xj − ) = 0,
(24)
j =1
where (xj − ) = (j/j)(xj − ).
The influence function or influence curve (IC) can help us
to assess the relative influence of an individual observation
toward the value of an estimate. The M-estimator has shown
that its influence function is proportional to its function.
In the location problem, we have the influence function of
an M-estimator with
I C(x; F, ) = (x − )
,
− ) dFX (x)
(x
(25)
where FX (x) denotes the distribution function of X. If the
influence function of an estimator is unbounded, a noise or
outliers might cause trouble. Similarly, if the function of
an estimator is unbounded, noise and outliers may also cause
trouble.
We have shown that minimizing JP CA is equivalent to
maximizing the total potential function (20). Now let the
loss between the data point xj and the ith cluster center ai be
√
m
m cxj − ai 2
(xj − ai ) = 1 − exp −
(26)
and
√
m √
m cxj − ai 2
m c
(xj − ai ) = exp −
(xj − ai ).
(27)
12
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
Maximizing the total
equiva potential function (20) is lent to minimizing nj=1 (xj − ai ). By solving nj=1 (xj − ai ) = 0, we have the result shown in Eq. (16) with the
membership function (15). Thus, the PCA cluster center
The phi function of PCA
2
m=5
m=2
1
m=1
0
-1
-2
10 20 30 40 50 60 70 80 90 100
xj
Fig. 7. The (phi) function of the PCA clustering algorithm.
estimate is an M-estimator with the loss function (26) and
the function (27). The function can be used to assess
the relative influence of an individual observation toward the
value of a cluster center estimate where the function of
PCA is shown in Fig. 7. We see that the influence of a single noisy point (far away from 50) is a monotone decreasing function of the weighting exponent m and the largest
influence occurs in the case when m tends to zero. In PCA,
however, limm→0 ij = 1 and limm→0 ai = x for all i, j. This
means that the sample mean will be the unique optimizer of
the PCA objective function JP CA with small m values. This
is reasonable because the sample mean is not robust to noise
or outliers so that adding an individual outlier will largely
shift the sample mean toward the outlier. Fig. 8 shows a
two-clusters data set with an outlier (50, 0). Fig. 8(a) shows
these phenomena. When m is small such as m = 1.5, the
two cluster center estimates (solid circle points) will be very
close to the sample mean. In Figs. 8(b) and (c) with m = 2
and 2.5, the cluster center estimates are not influenced by
this outlying point and present the clustering centers for the
data set well.
PCA m = 1.5
PCA m = 2
2
2
1
1
0
0
-1
-1
-2
-2
(a)
2
-3 -2 -1 0 1 2 3 4 5 6 7
(b)
PCA m = 2.5
2
1
1
0
0
-1
-1
-2
-2
(c)
-3 -2 -1 0 1 2 3 4 5 6 7
(d)
FCM m = 1.05
2
1
1
0
0
-1
-1
(e)
PCA m = 20
-3 -2 -1 0 1 2 3 4 5 6 7
FCM m = 2
2
-2
-3 -2 -1 0 1 2 3 4 5 6 7
-2
-3 -2 -1 0 1 2 3 4 5 6 7
(f)
-3 -2 -1 0 1 2 3 4 5 6 7
Fig. 8. The locations of cluster center estimates (solid circle points) obtained by PCA and FCM with different weighting exponents m where the outlier
(50, 0) is added in the data set.
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
In Fig. 7, if the added individual point is far away from
the cluster center estimate, it will have no influence when
m is small. In contrast, if the added individual point is close
to the cluster center estimate, it will have a large influence
when m is large as shown in Fig. 7. This can be explained by
denoting ˆ = max{i1 , . . . , in }, ij = ij /,
ˆ j = 1, . . . , n
and we have
n
m
j =1 ij xj
lim {ai } = lim n
m
m→∞
m→∞
j =1 ij
n m
xj
ij =1 xj
j =1 ij
= lim .
m = m→∞
n
ij =1 1
j =1 ij
(28)
When m becomes large, the cluster center estimate will be
the data point that has the largest membership value. In
other words, the cluster center estimate will be the data point
that is the closest to the initial value. This phenomenon is
shown in Fig. 8(d) with m = 20. The solid circle points
present the cluster center estimates that are close to the initial values. We also show the FCM clustering results with
m = 1.05 and 2 in Figs. 8(e) and (f), respectively. The
right-hand cluster center estimate is always influenced by
the noise and cannot present the location of the cluster
center well.
In the next subsection, we will discuss the validity problem in the possibilistic clustering environment. Thus, we can
combine the PCA with the cluster validity indexes to be an
unsupervised PCA.
3.3. The generalized cluster validity indexes
In general, the existing validity indexes are constructed
for solving the validity problems under a fuzzy clustering
environment. The most simple
examples are the PC and PE
indexes. Since we constrain nj=1 i (x) = 1 in FCM, it is
impossible for one data point to have simultaneous high
memberships with more than one cluster. Thus, a large value
of PC and a small value of PE will correspond to a wellvalidated data structure. Hence, maximizing PC and minimizing PE will give us a good cluster number estimate.
However, the data points in PCA may have simultaneously
high memberships to different clusters. For example, if the
data set has only one compact cluster, the PCA will find
two coincident clusters when c = 2 and each data point
will have a large and equal membership value to both clusters. According to this property, it can be seen that the validity values of PC and PE will increase when c increases
in a possibilistic clustering environment. Other existing indexes will also have some undesirable tendency. This will be
shown later.
In order to solve the validity problem in using the possibilistic clustering method, we use a normalization technique to generalize these existing validity indexes. We know
13
that the possibilistic clustering method is created by relaxing the condition of nj=1 ij = 1. Hence, it is more robust
to the noise and outliers. It also has a better explanation for
the degree of belonging than FCM. However, this relaxation
makes the fuzzy validity indexes lose their efficiency under
the possibilistic clustering environment. Therefore, we normalized the possibilistic c-memberships{1 , . . . , c }P to be
{1 , . . . , c }F such that the condition nj=1 ij = 1 can be
satisfied. We then generalized the fuzzy cluster validity indexes by replacing ij with ij as follows: suppose that we
have a set of a possibilistic c-membership {1 , . . . , c }P ;
the normalized possibilistic c-memberships is then defined
by {1 , . . . , c }F with
(x)
i (x) = c i
,
k=1 k (x)
i = 1, . . . , c.
(29)
Then the generalized PC, PE, FS and XB are defined by
replacing ij with
(xj )
ij = i (xj ) = c i
,
k=1 k (xj )
j = 1, . . . , n,
i = 1, . . . , c,
(30)
which are denoted by GPC, GPE, GFS and GXB, respectively. When these generalized validity
indexes are used in
a fuzzy clustering environment (i.e. ci=1 ij = 1), it is easy
to show that ij = ij in Eq. (23) and hence each generalized validity index is equivalent to their original form. This
is why we call these indexes generalized validity indexes.
Note that any existing validity indexes can also make this
generalization treat the cluster validity problem under a possibilistic clustering environment. Next, we use two examples
to demonstrate the performance of these generalized validity
indexes GPC, GPE, GFS and GXB.
Example 2. This is a 16-group data set as shown in
Fig. 9(a). Data points in each group are uniformly generated
from each rectangle. We call this data set the Uniform-16
data set. Uniform-16 is a two-dimensional data set with a
sample size n = 800 consisting of 50 points from each of
16 clusters. In order to compare the performance of these
generalized validity indexes GPC, GPE, GFS and GXB
for the Uniform-16 data set, we will implement FCM and
PCA from c = 2 to 25 so that the clustering results could
completely consider all situations for this 16-group data
set. According to the data structure of the Uniform-16 data
set, we may assign the c initial values from c = 1 to 25
with the locations and orders as shown in Fig. 10. Say for
examples, if c = 4, the initial cluster centers are located on
the four corners and if c = 16, the initial cluster centers are
uniformly located on 16 locations with the orders shown in
Fig. 10, etc. The results of the validity indexes PC, PE, FS
and XB by processing FCM are shown in the left portion
of Table 2. The PC indexes give the result that matches the
data structure well. According to Fig. 10, the result with
c = 4 may be another reasonable choice for the optimal c,
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
7
6
5
4
3
2
1
0
noise.Y
Y
14
0
1
2
3
4
5
6
7
0
X
(a)
7
6
5
4
3
2
1
0
1
2
3
4
5
6
7
noise.X
(b)
Fig. 9. (a) The Uniform-16 data set. (b) The Uniform-16 data set with 100 noisy points.
14
5
15
10
6
23
20
17
2
1
1
24
21
13
3
16
11
7
18
4
4
25
22
19
6
initial. Y
12
8
2
7
9
5
3
0
0
1
2
3
4
initial. X
5
6
7
Fig. 10. The locations and orders of the given initial values for the
Uniform-16 data set.
such as the FS and XB indexes have shown. We now process
the PCA with the results in using the original and generalized validity indexes which are shown in the center and
the right portion of Table 2, respectively. As we expected,
these original validity indexes present some undesirable
tendencies. The PC, PE and FS present a monotone tendency with c. The XB index presents an acceptable result.
However, all generalized validity indexes present reasonable results. The GPC, GPE, and GFS show that c = 4 is
optimal. The GXB index gives a result with c = 16 which
matches well with the data structure. Note that both FCM
and PCA are processed with the same initial values and the
same m = 2.
Table 2
Cluster validity for the Uniform-16 data set without noise
c
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
FCM
PCA
PC
PE
FS
XB
PC
PE
FS
XB
GPC
GPE
GFS
GXB
0.683
0.614
0.595
0.533
0.523
0.518
0.533
0.536
0.556
0.581
0.587
0.634
0.658
0.675
0.700
0.685
0.669
0.658
0.643
0.630
0.620
0.614
0.607
0.597
0.483
0.687
0.796
0.948
1.003
1.055
1.061
1.091
1.080
1.044
1.053
0.958
0.913
0.878
0.824
0.864
0.906
0.946
0.980
1.002
1.028
1.043
1.060
1.086
−1102
−2728
−3792
−3281
651
283
6022
15034
7430
10923
8667
49442
16237
6679
8160
10207
14768
13626
26246
30790
31237
789816
49237
43091
0.068
0.021
0.020
0.117
0.988
0.979
2.973
7.007
4.501
5.657
5.012
18.235
7.224
4.134
4.549
57.6
99.4
74.4
177.9
204.4
210.6
4584.3
318.6
277.6
0.178
0.211
0.258
0.308
0.356
0.405
0.450
0.496
0.542
0.588
0.633
0.678
0.722
0.764
0.807
0.848
0.890
0.932
0.974
1.014
1.055
1.096
1.136
1.174
0.293
0.224
0.249
0.293
0.357
0.409
0.433
0.454
0.490
0.520
0.530
0.540
0.549
0.560
0.568
0.586
0.603
0.618
0.624
0.637
0.650
0.649
0.661
0.661
−188
− 2051
− 3162
− 3512
− 3488
− 3569
− 3979
− 4373
− 4409
− 4496
− 4932
− 5271
− 5563
− 5980
− 6376
− 6396
− 6420
− 6437
− 6776
− 6779
− 6802
− 7462
− 7484
− 8145
2.42E−02
7.46E−03
5.76E−03
4.59E−02
4.36E−02
4.23E−02
4.15E−02
4.17E−02
4.15E−02
4.11E−02
4.07E−02
4.02E−02
4.02E−02
4.07E−02
4.11E−02
2.77E+06
3.71E+06
1.13E+07
1.65E+06
1.80E+08
2.22E+08
2.75E+09
3.66E+10
5.22E+12
0.862
0.919
0.957
0.854
0.790
0.725
0.700
0.681
0.667
0.652
0.659
0.677
0.693
0.706
0.723
0.709
0.697
0.694
0.678
0.676
0.676
0.654
0.639
0.616
0.221
0.146
0.097
0.251
0.367
0.485
0.547
0.589
0.622
0.660
0.659
0.641
0.622
0.611
0.588
0.616
0.639
0.657
0.680
0.692
0.705
0.732
0.754
0.784
3381
− 5506
− 9637
− 9165
− 7760
− 6637
− 6370
− 6135
− 5663
− 5211
− 5203
− 5466
− 5661
− 5831
− 5976
− 6012
− 6043
− 6102
− 5999
− 6081
− 6125
− 5791
− 5809
− 5393
5.79E−01
2.11E−01
1.06E−01
7.50E−01
5.33E−01
3.88E−01
3.45E−01
3.09E−01
2.34E−01
1.79E−01
1.57E−01
1.19E−01
9.84E−02
6.06E−02
3.82E−02
2.41E+06
3.02E+06
8.82E+06
1.21E+06
1.27E+08
1.51E+08
1.76E+09
2.21E+10
2.94E+12
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
15
Table 3
Cluster validity for the Uniform-16 data set with noise
c
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
FCM
PCA
PC
PE
0.686
0.621
0.592
0.531
0.521
0.524
0.518
0.525
0.537
0.565
0.583
0.592
0.617
0.645
0.666
0.647
0.629
0.616
0.605
0.594
0.587
0.576
0.568
0.561
0.478
0.676
0.798
0.951
1.010
1.041
1.092
1.111
1.116
1.077
1.060
1.055
1.008
0.949
0.906
0.952
0.995
1.034
1.06
1.099
1.109
1.137
1.157
1.173
FS
− 1137
− 3104
−4095
−3801
− 26
17689
1456
14401
8376
13843
7005
10218
12857
10564
11596
10030
18611
13819
15389
25766
28204
21264
22978
44734
XB
PC
PE
FS
XB
GPC
GPE
GFS
GXB
0.079
0.023
0.026
0.089
0.844
4.161
1.661
6.342
4.585
5.565
4.092
5.116
6.045
4.923
5.134
60.7
108.8
77.5
100.4
153.6
152.1
126.8
134.5
225.5
0.198
0.217
0.248
0.304
0.355
0.405
0.451
0.489
0.527
0.567
0.605
0.651
0.700
0.736
0.772
0.818
0.861
0.903
0.947
0.987
1.027
1.062
1.096
1.129
0.274
0.265
0.260
0.307
0.377
0.434
0.459
0.478
0.509
0.536
0.545
0.564
0.581
0.587
0.591
0.605
0.628
0.650
0.662
0.682
0.701
0.698
0.707
0.706
−198
−1848
−3305
−3699
−3663
−3760
−4250
−4641
−4682
−4774
−5221
−5575
−5909
−6354
−6780
−7155
−7181
−7197
−7543
−7526
−7551
−8207
−8225
−8933
2.89E−02
1.69E−02
6.57E−03
6.39E−02
5.88E−02
5.60E−02
5.26E−02
4.96E−02
4.79E−02
4.65E−02
4.53E−02
4.56E−02
4.62E−02
4.59E−02
4.56E−02
3.31E+06
3.09E+07
6.02E+08
8.39E+08
9.38E+08
1.02E+09
1.06E+09
1.09E+09
1.36E+14
0.854
0.869
0.949
0.836
0.773
0.709
0.685
0.670
0.660
0.648
0.654
0.666
0.679
0.692
0.709
0.691
0.673
0.666
0.660
0.652
0.650
0.632
0.617
0.597
0.238
0.218
0.108
0.278
0.395
0.512
0.570
0.605
0.631
0.664
0.664
0.657
0.643
0.630
0.607
0.638
0.673
0.699
0.721
0.736
0.753
0.776
0.797
0.822
5423
− 4426
−10475
− 9832
− 8339
− 7050
− 6766
− 6568
− 6119
− 5698
− 5705
− 5940
− 6106
− 6325
− 6525
− 6392
− 6368
− 6394
− 6395
− 6470
− 6494
− 6217
− 6252
− 5826
7.75E−01
3.62E−01
1.07E−01
9.15E−01
6.28E−01
4.39E−01
3.65E−01
3.14E−01
2.42E−01
1.89E−01
1.68E−01
1.27E−01
1.02E−01
6.72E−02
4.66−02E
3.12E+06
2.69E+07
4.97E+08
6.57E+08
6.92E+08
7.23E+08
7.27E+08
7.18E+08
8.55E+13
Table 4
Initial values for Normal-4 data set and their orders
Order
X1
X2
X3
X4
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
3
0
0
0
3
0
0
1
3
0
1
0
3
1
0
0
0
3
0
0
1
3
0
0
0
3
0
1
0
3
1
0
0
0
3
0
0
1
3
0
1
0
3
0
0
0
3
1
0
0
0
3
0
0
1
3
0
1
0
3
1
0
0
3
We now add 100 uniformly random noisy points to the
Uniform-16 data set as shown in Fig. 9(b). The x-coordinate
values of these noisy points are uniformly distributed over
the interval [−0.5, 3.5] and the y-coordinate values of
these noisy points are uniformly distributed over the interval [−0.5, 7.5]. These noisy points have an influence on
FCM, and the validity indexes are influenced correspondingly as shown on the left portion of Table 3. However,
as the right portion of Table 3 shows, the results of the
generalized validity indexes by processing PCA are not
influenced by these noisy points, and the GXB index still
presents the best result which matches the structure of
the data. The results of the original validity indexes by
processing PCA are shown in the center of Table 3. In
this example, PCA shows its robust property for noise,
not only in the clustering results but also in the validity
problems.
Example 3. We implement the Normal-4 data set that was
used in Example 1. The initial values and their orders are
shown in Table 4. If c = 3, the initial cluster centers are
the observations of orders 1, 2 and 3. If c = 4, the initial
cluster centers are the observations of orders 1, 2, 3 and
4, etc. Both FCM and PCA are processed with the same
initial values and m = 2. The validity results by processing
FCM are shown in the left portion of Table 5. Only the
FS index presents the result that matches the data structure.
The users should be careful about the fact that the FS index
often ended up in unreasonable results in the investigations
16
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
Table 5
Cluster validity for the Normal-4 data set
c
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
FCM
PCA
PC
PE
FS
XB
PC
PE
FS
XB
GPC
GPE
GFS
GXB
0.556
0.488
0.503
0.407
0.351
0.309
0.279
0.251
0.234
0.219
0.205
0.195
0.185
0.178
0.168
0.635
0.878
0.958
1.196
1.369
1.508
1.615
1.735
1.828
1.911
1.985
2.057
2.126
2.184
2.244
− 334
− 1145
− 2157
−1785
−1550
−1376
−1084
−1083
−1009
− 857
− 810
− 378
− 422
− 330
− 350
0.0281
0.0217
0.0224
0.0478
0.1178
0.2397
0.6435
0.4062
0.3469
0.5551
0.5960
1.1157
1.1008
1.4691
1.6172
0.123
0.145
0.167
0.185
0.194
0.204
0.216
0.226
0.231
0.238
0.247
0.254
0.257
0.262
0.270
0.237
0.290
0.346
0.395
0.443
0.483
0.526
0.563
0.595
0.622
0.655
0.691
0.719
0.741
0.768
−188
−467
−669
−709
−781
−878
−960
−995
−1074
−1151
−1200
−1158
−1154
−1190
−1208
1.62E −02
1.57E −02
1.64E −02
4.62E+05
3.83E+09
1.64E+11
4.89E+13
1.14E+18
2.93E+21
2.15E+22
2.16E+23
1.33E+13
1.49E+11
9.05E+09
8.83E+10
0.819
0.839
0.895
0.786
0.681
0.578
0.468
0.429
0.393
0.356
0.317
0.310
0.292
0.276
0.258
0.289
0.286
0.199
0.359
0.507
0.653
0.811
0.909
0.998
1.089
1.189
1.237
1.293
1.352
1.419
3939
350
− 2008
− 2229
− 2022
− 1578
− 1125
− 1150
− 1086
− 956
− 830
− 783
− 688
− 627
− 581
5.12E−01
3.20E−01
2.07E−01
4.96E+06
3.56E+10
1.32E+12
3.21E+14
6.85E+18
1.65E+22
1.13E+23
1.02E+24
5.98E+13
6.44E+11
3.76E+10
3.45E+11
Table 6
Cluster validity for the Normal-4 data set with uniform noise
c
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
FCM
PCA
PC
PE
FS
XB
PC
PE
FS
XB
GPC
GPE
GFS
GXB
0.524
0.428
0.444
0.461
0.377
0.324
0.289
0.257
0.231
0.214
0.199
0.189
0.190
0.179
0.169
0.669
0.964
1.057
1.084
1.302
1.472
1.597
1.710
1.829
1.922
2.006
2.078
2.085
2.161
2.225
− 133
− 690
− 1862
− 4684
− 3691
- 3068
− 2621
− 2322
− 2066
− 1898
− 1604
− 1530
− 2084
− 1933
− 1818
0.0453
0.0352
0.0166
0.0200
0.0420
0.0809
0.2113
0.2853
0.3719
0.4112
0.5435
0.4864
0.4489
0.4317
0.6594
0.219
0.252
0.295
0.330
0.357
0.380
0.408
0.432
0.449
0.465
0.485
0.504
0.516
0.528
0.545
0.427
0.494
0.555
0.614
0.673
0.723
0.778
0.829
0.881
0.927
0.977
1.024
1.072
1.114
1.160
373
− 289
− 777
− 965
− 1161
− 1396
− 1603
− 1712
− 1830
− 1985
− 2121
− 2201
− 2289
− 2408
− 2510
1.05E−01
6.30E−02
5.68E−02
5.83E+06
9.25E+06
2.83E+07
2.11E+07
3.91E+07
4.32E+08
5.62E+08
7.10E+08
2.75E+09
3.79E+09
6.81E+09
1.36E+10
0.659
0.664
0.710
0.641
0.570
0.494
0.416
0.385
0.355
0.323
0.292
0.275
0.259
0.242
0.225
0.512
0.594
0.560
0.678
0.790
0.901
1.016
1.096
1.174
1.252
1.333
1.394
1.453
1.513
1.575
11058
7536
4856
3935
3242
2550
2300
2101
1887
1648
1556
1471
1365
1244
1193
1.65E+00
9.13E−01
6.92E−01
6.40E+07
8.75E+07
2.18E+08
1.37E+08
2.36E+08
2.38E+09
2.74E+09
3.13E+09
1.15E+10
1.48E+10
2.46E+10
4.60E+10
of many researchers (see Refs. [14,17]). The results of the
original and generalized validity indexes by processing PCA
are shown in the center, and on the right portion in Table 5
respectively. As we expect, the original validity indexes lose
their efficiency in the possibilistic clustering environment.
However, the GPC, GPE and GXB indexes give the best
results that match the data structure.
We now add 100 uniform noisy points. Each coordinate
of the noisy points is generated from a uniform distribution
over the interval [0, 10]. The values of the validity indexes
by processing FCM and PCA are shown in Table 6. In this
noisy environment, the XB index with FCM gives us a good
result. However, it does not always work well in a noisefree environment as Table 5 shows. This may because the
data set is randomly generated. Thus, the results of the XB
index, by processing FCM is still influenced by the noise.
Both GPC and GXB present good results that match the data
structure and are not influenced by the noise. More examples
including the real data are presented in the next section.
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
4. Examples with real data
xj k as
In this section, the Iris real data set (see Refs. [24,25]) and
the other two real data sets from [26] will be implemented.
The first one is the Iris data set that has n = 150 points in
an s = 4-dimensional space that represents three clusters,
each with 50 points (see Refs. [24,25]). Two clusters have
substantial overlapping in Iris. Thus, one can argue c = 2 or
3 for Iris. The second real data set is the Glass [26] that has
n = 214 points in an s = 9-dimensional space that presents
six clusters. The third real data set is the Vowel [26] that has
n = 990 points in an s = 10-dimensional space that presents
eleven clusters
Most clustering problems are solved by minimizing
the constructed dispersion measures. For a data set in an
s-dimensional space, each dimension presents one characteristic of the data and the degrees of dispersions of each
characteristic are always different. Thus, the results of minimizing the total dispersion measure will discard the effects
of some characteristics, especially those that have a small
degree of dispersion. This situation occurs frequently, especially in a high-dimensional data set. To sufficiently use all
the information of the characteristics, we shall normalize
the data set. Suppose we have a data set X = {x1 , . . . , xn }
in an s-dimensional space with each xj = (xj 1 , . . . ,
xj s ), we normalize the data by replacing xj k with
xj k = 17
xj k − nl=1 xlk /n
,
2
n
n −
x
/n
/(n
−
1)
x
l=1 lk
l=1 lk
k = 1, . . . , s, j = 1, . . . , n.
(31)
After normalization, each characteristic of the data set will
have a common sample mean and dispersion measures.
We will normalize three real data sets before we analyze
them.
We analyze the cluster validity problem using the original
validity indexes (PC and XB) and the generalized validity
indexes (GPC and GXB) by processing FCM and PCA, respectively. We also look for the influences of the parameter
m on the validity problem in both FCM and PCA. The chosen m values for FCM are 1.1, 1.5, 2, 2.5, and 3. The chosen
Table 7
The theoretical upper bound of m for the FCM clustering algorithm
Data set
n
s
c
Upper bound of m
Iris
Glass
Vowel
150
214
990
4
9
10
3
6
11
infinity
3.1726
1.7787
Original data from Yu et al. [18].
Table 8
Cluster validity for the normalized Iris data
c
FCM
m = 1.1
m = 1.5
PC
XB
2
3
4
5
6
7
8
9
0.999
0.981
0.976
0.976
0.976
0.980
0.976
0.983
5.71E
2.32E
1.25E
1.57E
2.27E
4.21E
6.33E
5.19E
c
PCA
m= 1
GPC
GXB
m = 1.5
GPC
0.880
0.637
0.481
0.381
0.329
0.274
0.242
0.221
2.61E − 01
1.49E + 07
2.68E + 09
5.31E + 09
7.76E + 09
4.47E + 11
1.86E + 11
2.92E + 11
0.971
0.660
0.495
0.387
0.332
0.277
0.244
0.222
2
3
4
5
6
7
8
9
PC
+ 17
+ 17
+ 16
+ 16
+ 16
+ 15
+ 15
+ 15
0.952
0.887
0.848
0.801
0.780
0.787
0.761
0.767
m= 2
XB
m = 2.5
m= 3
PC
XB
PC
XB
PC
0.832
0.707
0.635
0.570
0.513
0.498
0.486
0.474
0.251
1.222
2.098
2.845
8.095
9.988
11.375
8.862
0.729
0.573
0.481
0.416
0.366
0.341
0.312
0.290
0.115
0.492
0.708
0.952
2.173
2.573
4.270
6.389
0.660
0.490
0.392
0.330
0.283
0.257
0.230
0.207
0.073
0.270
0.328
0.400
0.762
0.944
1.817
17.271
GXB
m= 2
GPC
GXB
m = 2.5
GPC
GXB
m= 3
GPC
GXB
1.44E − 01
6.15E + 05
3.49E + 09
3.67E + 09
2.14E + 09
8.58E + 10
6.24E + 10
2.79E + 10
0.988
0.665
0.498
0.388
0.333
0.278
0.244
0.222
1.53E − 01
8.74E + 04
8.76E + 08
2.72E + 09
1.12E + 10
1.66E + 12
4.00 E + 13
3.71E + 15
0.995
0.666
0.500
0.389
0.486
0.439
0.468
0.459
1.69E − 01
1.69E + 04
1.82E + 17
1.66E + 29
2.52E + 09
3.61E + 11
4.23E + 04
1.29E + 11
0.998
0.887
0.801
0.645
0.593
0.546
0.519
0.501
1.92E − 01
4.30E − 01
1.67E + 00
2.53E + 04
1.91E + 04
4.13E + 30
6.97E + 25
1.42E + 27
10.3
28.7
50.1
54.8
81.6
148.3
243.6
200.4
XB
18
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
Table 9
Cluster validity for the normalized Glass data
c
FCM
m = 1.1
m = 1.5
PC
XB
2
3
4
5
6
7
8
9
10
11
12
0.981
0.951
0.945
0.954
0.962
0.961
0.944
0.971
0.960
0.960
0.959
3.91E
3.95E
2.72E
1.21E
1.20 E
1.22E
3.67E
1.96E
3.32E
6.90 E
7.55E
c
PCA
m= 1
GPC
2
3
4
5
6
7
8
9
10
11
12
0.500
0.333
0.250
0.200
0.167
0.143
0.125
0.111
0.100
0.131
0.122
PC
+ 00
+ 03
+ 08
+ 14
+ 14
+ 14
+ 14
+ 14
+ 14
+ 14
+ 14
0.830
0.773
0.735
0.743
0.748
0.685
0.678
0.675
0.620
0.614
0.618
m= 2
XB
0.060
0.149
1.013
1.053
0.988
3.001
4.892
7.819
38.012
46.579
70.241
m = 1.5
GXB
6.12E
2.30 E
9.27E
4.74E
1.03E
2.44E
1.66E
2.28E
8.36E
1.08E
1.00 E
+ 08
+ 08
+ 09
+ 09
+ 10
+ 11
+ 12
+ 14
+ 21
+ 12
+ 12
m = 2.5
XB
PC
XB
PC
XB
0.617
0.540
0.487
0.447
0.370
0.373
0.332
0.318
0.288
0.283
0.288
0.133
0.194
0.288
0.247
1.650
2.114
3.292
3.287
2.953
3.480
3.676
0.500
0.409
0.335
0.261
0.218
0.188
0.170
0.172
0.155
0.141
0.115
2.13E + 03
1.88E − 01
1.92E − 01
1.43E + 02
1.17E + 02
8.26E + 06
4.38E + 06
6.35E + 04
2.49E + 04
6.08E + 04
1.31E + 05
0.500
0.340
0.278
0.207
0.173
0.148
0.139
0.123
0.111
0.108
0.092
4.52E+02
6.14E+01
9.96E+03
2.58E+03
3.85E+04
5.18E+06
1.65E+03
2.71E+03
1.43E+05
2.34E+12
2.02E+07
GPC
GXB
GPC
GXB
0.500
0.333
0.356
0.386
0.387
0.317
0.275
0.346
0.445
0.420
0.457
2.90 E
1.16E
8.81E
7.88E
1.77E
5.69E
1.75E
Inf
4.02E
1.78E
2.77E
0.500
0.333
0.550
0.587
0.595
0.498
0.535
0.531
0.563
0.569
0.592
8.59E
3.13E
1.40 E
5.71E
6.36E
5.56E
5.96E
4.16E
7.46E
1.17E
1.39E
m= 2
GPC
GXB
0.500
0.333
0.353
0.275
0.228
0.196
0.174
0.211
0.283
0.270
0.338
2.68E
1.45E
4.24E
1.29E
9.80 E
2.47E
4.65E
1.60 E
1.48E
1.88E
2.26E
+ 08
+ 09
+ 08
+ 09
+ 11
+ 13
+ 22
+ 14
+ 15
+ 15
+ 15
m= 3
PC
m = 2.5
+ 08
+ 08
+ 08
+ 08
+ 08
+ 16
+ 26
+ 19
+ 18
+ 19
m= 3
+ 08
+ 10
+ 12
+ 06
+ 06
+ 31
+ 13
+ 11
+ 11
+ 12
+ 12
GPC
GXB
0.836
0.543
0.821
0.849
0.866
0.640
0.556
0.552
0.581
0.584
0.577
1.06E+00
5.99E+06
1.09E+00
9.47E−01
9.02E−01
3.55E+11
9.52E+11
4.55E+12
3.27E+16
1.69E+15
2.30E+06
m values for PCA are 1, 1.5, 2, 2.5 and 3. In FCM, the most
suggested m values range within [1.5, 2.5]. Moreover, Yu et
al. [18] gave a theoretical upper boundary of m for FCM.
They showed that an m value which is larger than the theoretical upper boundary will have only one optimizer (i.e.
the sample mean) of the FCM objective function, no matter
what value c has. The theoretical upper boundaries for three
real data are shown in Table 7 according to Yu et al. [18]. We
shall also consider this theoretical result in our simulations.
sample mean being a unique optimizer when m is larger.
This phenomenon of sample mean being a unique optimizer
may occur in PCA when m is too small. However, the theoretical upper boundary of m for the normalized Iris data
is infinity as shown in Table 7. Therefore, the sample mean
will not be a unique optimizer of JF CM and the validity index PC will not be equal to 1/c for all c. Thus, both original and generalized validity indexes give good results in the
Iris data set.
Example 4 (Iris data set). The Iris data set (see Refs.
[24,25]) has n = 150 points in an s = 4-dimensional space.
It consists of three clusters. Two clusters have substantial
overlapping. Table 8 shows the cluster validity results for
the normalized Iris data set implemented by FCM and PCA.
The clustering algorithms with the selected parameter m and
the validity indexes PC, XB, GPC and GXB are combined
for different cluster numbers c. In FCM, both PC and XB
work well when m = 1.5, 2, 2.5 and 3. In PCA, both GPC
and GXB work well for all specified m values. In general,
we should use FCM carefully to avoid the situation of the
Example 5 (Glass data set). The Glass data set from Blake
and Merz [26] has n = 214 points in an s = 9-dimensional
space. It consists of six clusters. Table 9 shows the cluster validity for the normalized Glass data set. For given m
values from 1.1 to 3, PC and XB always show that c = 2
or 3 is optimal.Note that, when m = 2.5 and 3, P C(2) =
1/2. We suspect that the sample mean is a unique optimizer when c = 2 and we do not adopt the result provided
by the PC and XB when c = 2, or we say that c = 2 is
not suitable for this data set. Generally speaking, we will
not consider the results in the cases of P C(c) = 1/c. The
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
19
Table 10
Cluster validity for the normalized Vowel data
c
FCM
m = 1.1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
c
2
3
4
5
6
7
8
9
10
11
12
13
14
15
m = 1.5
m= 2
m = 2.5
PC
XB
PC
XB
PC
XB
0.958
0.933
0.942
0.923
0.917
0.924
0.910
0.920
0.912
0.915
0.921
0.918
0.919
0.924
0.0002
0.0001
0.0002
0.0136
0.1738
0.1207
0.0145
0.0260
0.1746
0.0069
0.0096
0.0126
0.0172
0.0261
0.608
0.508
0.463
0.416
0.377
0.335
0.340
0.319
0.310
0.290
0.306
0.293
0.294
0.296
0.008
0.011
0.010
0.020
0.026
0.078
0.024
0.165
0.094
0.261
0.105
0.550
230
3.524
0.500
0.333
0.250
0.200
0.167
0.143
0.125
0.111
0.100
0.091
0.083
0.077
0.071
0.067
3.89E
1.63E
9.38E
9.49E
1.42E
6.77E
5.56E
2.16E
7.96E
3.57E
1.02E
1.82E
6.41E
5.70 E
PCA
m= 1
m =2
m = 1.5
GPC
GXB
0.500
0.333
0.250
0.200
0.167
0.143
0.125
0.111
0.100
0.091
0.083
0.077
0.071
0.067
2.43E
1.48E
1.37E
1.60 E
7.80 E
1.11E
5.21E
2.55E
4.20 E
2.33E
1.35E
5.90 E
7.46E
4.00 E
+
+
+
+
+
+
+
+
+
+
+
+
+
+
09
09
09
09
08
09
09
09
09
09
10
09
09
09
+ 02
+ 03
+ 03
+ 04
+ 06
+ 05
+ 05
+ 07
+ 07
+ 07
+ 08
+ 11
+ 09
+ 08
GPC
GXB
0.500
0.333
0.250
0.200
0.167
0.143
0.187
0.170
0.157
0.147
0.205
0.254
0.243
0.212
7.71E
1.67E
2.85E
2.04E
1.08E
7.48E
4.43E
9.62E
6.54E
3.50 E
1.56E
1.22E
2.93E
Inf
+
+
+
+
+
+
+
+
+
+
+
+
+
08
08
08
10
11
12
11
12
18
24
21
26
27
GXB
0.500
0.688
0.685
0.434
0.402
0.403
0.422
0.410
0.404
0.397
0.381
0.375
0.365
0.360
1.90 E
8.09E
3.58E
8.84E
5.00 E
8.51E
8.63E
2.87E
3.69E
1.07E
8.35E
4.45E
6.20 E
8.07E
Example 6 (Vowel data set). The Vowel data set from Blake
and Merz [26] has n = 990 points in an s = 10-dimensional
space that has 11 clusters. Table 10 shows the cluster validity
PC
XB
PC
XB
0.500
0.333
0.250
0.200
0.167
0.143
0.125
0.111
0.100
0.091
0.083
0.077
0.071
0.067
3761
27772
20427
152493
516614
6261341
5671117
2705403
1863116
2507218
3085955
2512509
1664479
1440508
0.500
0.333
0.250
0.200
0.167
0.143
0.125
0.111
0.100
0.091
0.083
0.077
0.071
0.067
12651
46664
32445
99395
435968
836560
685560
425636
580590
331552
314035
357623
318837
263237
m = 2.5
GPC
same phenomenon can also be found in the GPC index.
If GP C(c) = 1/c, then these c clusters are coincidental
(may or may not be the sample mean). As m = 1 in PCA,
GP C(c) = 1/c for all c, and we discard this result. When
m = 1.5, 2 and 2.5, GP C(c) = 1/c as c = 2 and 3. Therefore, we only consider the case in c > 3. As m increases, the
results of the GPC and GXB give the correct cluster number (c = 6) of the data. In the normalized Glass data, the
algorithm PCA with the generalized validity indexes works
better than the algorithm FCM with the original validity
indexes.
m= 3
+ 07
+ 10
+ 11
+ 21
+ 14
+ 09
+ 08
+ 10
+ 11
+ 09
+ 11
+ 20
+ 31
+ 31
m= 3
GPC
GXB
0.500
0.493
0.579
0.574
0.431
0.467
0.484
0.456
0.463
0.487
0.452
0.437
0.422
0.412
9.45E
3.87E
1.23E
4.15E
2.48E
1.57E
2.50 E
1.13E
6.04E
8.03E
1.97E
2.74E
5.58E
2.30 E
+ 05
+ 06
+ 07
+ 07
+ 09
+ 10
+ 10
+ 11
+ 12
+ 14
+ 16
+ 17
+ 30
+ 31
GPC
GXB
0.500
0.553
0.619
0.644
0.610
0.524
0.567
0.545
0.547
0.488
0.450
0.435
0.424
0.385
3.18E
2.63E
7.83E
2.54E
2.04E
7.88E
5.04E
2.02E
2.45E
3.03E
2.69E
4.83E
3.82E
6.85E
+
+
+
+
+
+
+
+
+
+
+
+
+
+
06
07
08
00
00
14
11
10
10
10
16
19
28
30
for the normalized Vowel data set. When m = 2, 2.5 and 3,
the sample mean is a unique optimizer of JF CM and hence
P C(c) = 1/c for all c. Thus, we do not consider the results
obtained by FCM with m=2, 2.5 and 3. These results exactly
match the theoretical upper boundary 1.7787 for m which
are shown in Table 7. When m = 1.1 and 1.5, PC and XB
give the results that c = 2 or 3 is optimal. In the normalized
Glass and Vowel data, the PC and XB with FCM algorithm
have the tendency that a small value of c is preferred. In
PCA, GP C(c) = 1/c for all c when m = 1 so that we do not
adopt the result of m=1. As m=1.5 in PCA, GP C(c)=1/c
for c = 2.7. Only c > 7 is considerable. The optimal cluster
numbers obtained by the generalized validity GPC and GXB
indexes with the PCA algorithm for m = 2, 2.5 and 3 are
shown in Table 10. It is also difficult to match the cluster
20
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
number c = 11. We see that the Vowel data set seems to lack
a good structure.
5. Conclusions
We proposed a new PCA, and solved the problem for
validating the clusters obtained by PCA. The proposed
generalized validity indexes work well in both fuzzy and
possibilistic clustering environments, and the proposed PCA
objective function combines the FCM objective function
with the validity indexes PC and PE so that the exponential
membership functions are used to describe the degree of
belonging. We also embedded the cluster number c into
the membership functions to make PCA more powerful for
various data sets, especially for solving the cluster validity
problems. The cluster centers obtained by PCA attempted to
maximize the sum of the membership functions, and were
corresponding to the peaks of the potential function or the
mountain function. By combining the generalized validity
indexes, PCA became an unsupervised PCA.
Numerical examples showed that the validity results of
the generalized validity indexes are more accurate than the
original validity indexes in a noisy environment. This is
because the possibilistic clustering is more robust than the
fuzzy clustering when noisy points are presented. Moreover,
we presented the robust properties of the proposed PCA
based on the influence function. In the analysis of three real
data sets, the generalized validity indexes with PCA seemed
to work better than the original validity indexes with FCM.
According to the simulation results, we recommend that a
better choice of the fuzzifier m in FCM is 1.5 for a large
and high-dimensional data, and the parameter m = 2 in
PCA is recommended. Note that the exponential membership function of PCA with a too small or too large m value
cannot have a good explanation for degree of belonging.
Finally, the results of PCA depend on the initialization,
just as any clustering algorithm does. A general initialized technique for FCM and PCA shall be our further
research topic.
Acknowledgements
The authors are grateful to the anonymous reviewers for
their comments in improving the presentation of the paper.
This work was supported in part by the National Science
Council of Taiwan under Grant NSC-93-2118-M-033-001.
References
[1] L.A. Zadeh, Fuzzy sets, Inf. Control 8 (1965) 338–353.
[2] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function
Algorithms, Plenum Press, New York, 1981.
[3] M.S. Yang, A survey of fuzzy clustering, Math. Comput. Modell. 18
(1993) 1–16.
[4] F. Höppner, F. Klawonn, R. Kruse, T. Runkler, Fuzzy Cluster
Analysis: Methods for Classification Data Analysis and Image
Recognition, Wiley, New York, 1999.
[5] J.C. Dunn, A fuzzy relative of the ISODATA process and its use
in detecting compact, well-separated clusters, J. Cybern. 3 (1974)
32–57.
[6] R. Krishnapuram, J.M. Keller, A possibilistic approach to clustering,
IEEE Trans. Fuzzy Syst. 1 (1993) 98–110.
[7] R. Krishnapuram, H. Frigui, O. Nasraoui, Fuzzy and possibilistic
shell clustering algorithm and their application to boundary detection
and surface approximation, IEEE Trans. Fuzzy Syst. 3 (1995) 29–60.
[8] T.A. Runkler, J.C. Bezdek, Function approximation with polynomial
membership functions and alternating cluster estimation, Fuzzy Sets
Syst. 101 (1999) 207–218.
[9] T.A. Runkler, J.C. Bezdek, Alternating cluster estimation: a new tool
for clustering and function approximation, IEEE Trans. Fuzzy Syst.
7 (1999) 377–393.
[10] J.C. Bezdek, Numerical taxonomy with fuzzy sets, J. Math. Biol.
1 (1974) 57–71.
[11] J.C. Bezdek, Cluster validity with fuzzy sets, J. Cybern. 3 (1974)
58–73.
[12] Y. Fukuyama, M. Sugeno, A new method of choosing the number
of clusters for the fuzzy c-means method, Proceedings of the Fifth
Fuzzy Syst. Symposium, 1989, pp. 247–250.
[13] X.L. Xie, G. Beni, A validity measure for fuzzy clustering, IEEE
Trans. Pattern Anal. Mach. Intell. 13 (1991) 841–847.
[14] N. Zahid, M. Limouri, A. Essaid, A new cluster-validity for fuzzy
clustering, Pattern Recognition 32 (1999) 1089–1097.
[15] I. Gath, A.B. Geva, Unsupervised optimal fuzzy clustering, IEEE
Trans. Pattern Anal. Mach. Intell. 11 (1989) 73–781.
[16] K.L. Wu, M.S. Yang, Alternative c-means clustering algorithm,
Pattern Recognition 35 (2002) 2267–2278.
[17] N.R. Pal, J.C. Bezdek, On cluster validity for fuzzy c-means model,
IEEE Trans. Fuzzy Syst. 1 (1995) 370–379.
[18] J. Yu, Q. Cheng, H. Huang, Analysis of the weighting exponent
in the FCM, IEEE Trans. Syst. Man Cybern.—Part B 34 (2004)
634–638.
[19] R. Krishnapuram, J.M. Keller, The possibilistic c-means algorithm:
insights and recommendations, IEEE Trans. Fuzzy Syst. 4 (1996)
385–393.
[20] M. Barni, V. Cappellini, A. Mecocci, Comments on: a possibilistic
approach to clustering, IEEE Trans. Fuzzy Syst. 4 (1996) 393–396.
[21] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis,
Wiley, New York, 1973.
[22] R.R. Yager, D.P. Filev, Approximate clustering via the mountain
method, IEEE Trans. Syst., Man Cybern. 24 (1994) 1279–1284.
[23] P.J. Huber, Robust Statistics, Wiley, New York, 1991.
[24] E. Anderson, The Irises of the gaspe peninsula, Bull. Am. IRIS Soc.
59 (1935) 2–5.
[25] J.C. Bezdek, J.M. Keller, R. Krishnapuram, L.I. Kuncheva, N.R. Pal,
Will the Iris data please stand up?, IEEE Trans. Fuzzy Syst. 7 (1999)
368–369.
[26] C.L. Blake, C.J. Merz, UCI repository of machine learning
databases, a huge collection of artificial and real-world
data sets, 1998. Available from: http://www.ics.uci.edu/∼mlearn/
MLRepository.html.
About the Author—MIIN-SHEN YANG received his BS degree in mathematics from the Chung Yuan Christian University, Chungli, Taiwan, in 1977,
MS degree in applied mathematics from the National Chiao-Tung University, Hsinchu, Taiwan, in 1980, and his Ph.D. degree in statistics from the
University of South Carolina, Columbia, USA, in 1989.
M.-S. Yang, K.-L. Wu / Pattern Recognition 39 (2006) 5 – 21
21
In 1989, he joined the faculty of the Department of Applied Mathematics in the Chung Yuan Christian University as an Associate Proffesor, where, since
1994, he has been a Professor. From 1997 to 1998, he was a Visiting Professor with the Department of Industrial Engineering, University of Washington,
Seattle. His current research interests include applications of statistics, fuzzy clustering, pattern recognition, and neural fuzzy systems.
Dr. Yang is an Associate Editor of the IEEE Transactions on Fuzzy systems.
About the Author—KUO-LUNG WU received his BS degree in mathematics in 1997, the MS and Ph.D. degrees in applied mathematics in 2000 and
2003, all from the Chung Yuan Christian University, Chungli, Taiwan. Since 2003, he has been an Assistant Professor in the Department of Information
Management at Kun Shan University of Technology, Tainan, Taiwan. He is a member of the Phi Tau Phi Scholastic Honor Society of Taiwan. His
research interests include fuzzy theorem, cluster analysis, pattern recognition, and neural networks.