International Journal of Fuzzy Systems, Vol. 13, No. 2, June 2011

Different Objective Functions in Fuzzy c-Means Algorithms and Kernel-Based Clustering

Sadaaki Miyamoto

Corresponding Author: Sadaaki Miyamoto is with the Department of Risk Engineering, the University of Tsukuba, Ibaraki 305-8573, Japan. E-mail: [email protected]
Manuscript received June 2010; revised Nov. 2010; accepted Dec. 2010.

Abstract

An overview of fuzzy c-means clustering algorithms is given, focusing on three different objective functions: one using a regularized dissimilarity, one using an entropy-based term, and one for possibilistic clustering. Classification functions for these objective functions and their properties are studied. Fuzzy c-means algorithms using kernel functions are also discussed, together with kernelized cluster validity measures and numerical experiments. New kernel functions derived from the classification functions are studied as well.

Keywords: cluster validity measure, fuzzy c-means clustering, kernel functions, possibilistic clustering.

1. Introduction

Fuzzy clustering is well known not only in the fuzzy community but also in the related fields of data analysis, neural networks, and other areas of computational intelligence. Among the various techniques of clustering that use fuzzy concepts [16, 23, 30, 37], the term fuzzy clustering mostly refers to the fuzzy c-means clustering of Dunn and Bezdek [1, 2, 6, 7, 8, 13].

This paper gives an overview of this method. Nevertheless, we adopt a non-standard formulation: we begin from three different objective functions, none of which is exactly the same as the one by Dunn and Bezdek. By comparing the different objective functions and their solutions, we find theoretical properties of fuzzy c-means clustering: different fuzzy classifiers are derived from different solutions. Moreover, a generalization including a "cluster size" variable and a "covariance" variable is developed; this generalization is shown to be closely related to mixture distributions.

Kernel-based fuzzy c-means clustering is moreover studied with associated cluster validity measures. Many numerical simulations are used to evaluate whether or not the kernelized measures are adequate for ordinary ball-shaped clusters. Finally, a new class of kernel functions is proposed; they are derived from fuzzy c-means solutions. Illustrative examples are given.

2. Fuzzy c-Means Clustering

We first give three objective functions. Possibilistic clustering [18] is included as a variation of fuzzy c-means clustering.

A. Preliminary consideration

Let the objects for clustering be points in the p-dimensional Euclidean space. They are denoted by x_k = (x_k^1, ..., x_k^p) ∈ R^p (k = 1, ..., N). A generic point x = (x^1, ..., x^p) denotes a variable in R^p. We assume c clusters; the cluster centers are denoted by v_i (i = 1, ..., c), and we write V = (v_1, ..., v_c) for the collection of all cluster centers.

The dissimilarity between an object and a cluster center is the squared Euclidean distance:

    D(x_k, v_i) = \| x_k - v_i \|^2.    (1)

We sometimes write D_ki = D(x_k, v_i) for simplicity. Moreover, D(x, v_i) means that the variable x is substituted for the object x_k.

U = (u_ki) is the membership matrix: u_ki is the degree of belongingness of x_k to cluster i. Crisp and fuzzy c-means clustering are based on the minimization of objective functions.
Crisp c-means clustering [21] uses the following objective function:

    J_H(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ki} D(x_k, v_i).    (2)

Alternate minimization with respect to one of (U, V), while the other variable is fixed, is repeated until convergence [1]. Minimization with respect to U uses the following constraint:

    M = \{ U = (u_{ki}) : \sum_{i=1}^{c} u_{ki} = 1; \ u_{kj} \ge 0, \ \forall k, j \}.    (3)

We consider three objective functions:

    J_B(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ki})^m \{ \varepsilon + D(x_k, v_i) \}, \quad (m > 1, \ \varepsilon \ge 0),    (4)

    J_E(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \{ u_{ki} D(x_k, v_i) + \lambda^{-1} u_{ki} (\log u_{ki} - 1) \}, \quad (\lambda > 0),    (5)

    J_P(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \{ (u_{ki})^m D(x_k, v_i) + \zeta (1 - u_{ki})^m \}, \quad (\zeta > 0).    (6)

All of the above are different from the original function proposed by Dunn [7, 8] and Bezdek [1, 2]. J_B(U, V) has a nonnegative parameter ε proposed by Ichihashi [28]; when ε = 0, J_B(U, V) is the original objective function. J_E(U, V) has an additional entropy term; the use of entropy in fuzzy c-means clustering has been proposed by a number of researchers, e.g., [19, 20, 24]. J_P(U, V) has been proposed by Krishnapuram and Keller [18] for possibilistic clustering. This function can also be used for fuzzy c-means with constraint (3), where m = 2.

We use the following alternate minimization procedure FCM, where J(U, V) is either J_B(U, V), J_E(U, V), or J_P(U, V). Minimization with respect to U is subject to constraint (3).

FCM Algorithm of Alternate Optimization.
FCM1: Set an initial value of V randomly.
FCM2: Minimize J(U, V) with respect to U, keeping V fixed; let the optimal solution be the new U.
FCM3: Minimize J(U, V) with respect to V, keeping U fixed; let the optimal solution be the new V.
FCM4: If (U, V) is convergent, stop. Otherwise go to FCM2.
End FCM.

We show the solutions of FCM2 and FCM3 for each objective function; the derivations are omitted.

Solution for J_B:

    u_{ki} = \frac{ \left( \frac{1}{\varepsilon + D(x_k, v_i)} \right)^{\frac{1}{m-1}} }{ \sum_{j=1}^{c} \left( \frac{1}{\varepsilon + D(x_k, v_j)} \right)^{\frac{1}{m-1}} },    (7)

    v_i = \frac{ \sum_{k=1}^{N} (u_{ki})^m x_k }{ \sum_{k=1}^{N} (u_{ki})^m }.    (8)

Solution for J_E:

    u_{ki} = \frac{ \exp(-\lambda D(x_k, v_i)) }{ \sum_{j=1}^{c} \exp(-\lambda D(x_k, v_j)) },    (9)

    v_i = \frac{ \sum_{k=1}^{N} u_{ki} x_k }{ \sum_{k=1}^{N} u_{ki} }.    (10)

Solution for J_P:

    u_{ki} = \frac{ \left( \frac{1}{1 + \zeta D(x_k, v_i)} \right)^{\frac{1}{m-1}} }{ \sum_{j=1}^{c} \left( \frac{1}{1 + \zeta D(x_k, v_j)} \right)^{\frac{1}{m-1}} },    (11)

    v_i = \frac{ \sum_{k=1}^{N} (u_{ki})^m x_k }{ \sum_{k=1}^{N} (u_{ki})^m }.    (12)

B. Basic Functions

We introduce what we call basic functions in this paper:

    g_B(x, y) = \frac{1}{(\varepsilon + D(x, y))^{\frac{1}{m-1}}},    (13)

    g_E(x, y) = \exp(-\lambda D(x, y)),    (14)

    g_P(x, y) = \frac{1}{(1 + \zeta D(x, y))^{\frac{1}{m-1}}}.    (15)

We assume that g(x, y) is either g_B(x, y), g_E(x, y), or g_P(x, y). A unified representation is then obtained for the optimal u_ki:

    u_{ki} = \frac{ g(x_k, v_i) }{ \sum_{j=1}^{c} g(x_k, v_j) }    (16)

for all three objective functions, since g(x, y) represents either g_B(x, y), g_E(x, y), or g_P(x, y).
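To make the alternate optimization concrete, the following is a minimal NumPy sketch of FCM1-FCM4 using the unified representation (16) and the center updates (8), (10), and (12). The names (fcm, basic_g) and all default parameter values are ours for illustration, not from the paper.

    import numpy as np

    def basic_g(D, method="B", m=2.0, eps=1.0, lam=1.0, zeta=1.0):
        """Basic functions (13)-(15) applied to an array of squared distances D."""
        if method == "B":                                  # g_B, eq. (13)
            return (eps + D) ** (-1.0 / (m - 1.0))
        if method == "E":                                  # g_E, eq. (14)
            return np.exp(-lam * D)
        return (1.0 + zeta * D) ** (-1.0 / (m - 1.0))      # g_P, eq. (15)

    def fcm(X, c, method="B", m=2.0, eps=1.0, lam=1.0, zeta=1.0,
            n_iter=100, tol=1e-6, seed=None):
        """Alternate optimization FCM1-FCM4 for J_B, J_E, or J_P."""
        rng = np.random.default_rng(seed)
        V = X[rng.choice(len(X), size=c, replace=False)]   # FCM1: random initial centers
        for _ in range(n_iter):
            D = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)  # D_ki, eq. (1)
            G = basic_g(D, method, m, eps, lam, zeta)
            U = G / G.sum(axis=1, keepdims=True)           # FCM2: eq. (16)
            W = U if method == "E" else U ** m             # J_E weights centers by u_ki itself
            V_new = (W.T @ X) / W.sum(axis=0)[:, None]     # FCM3: eqs. (8), (10), (12)
            converged = np.linalg.norm(V_new - V) < tol
            V = V_new
            if converged:                                  # FCM4: convergence test
                break
        return U, V

Note that for J_E the memberships (9) are a softmax of -λD_ki, and the center update (10) uses the memberships themselves rather than their m-th power.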
C. Possibilistic Clustering

Possibilistic clustering [18] uses J_P(U, V) but with a different constraint:

    \tilde{M} = \{ U = (u_{ki}) : u_{kj} > 0, \ \forall k, j \}.

Note that J_P(U, V) and \tilde{M} in this paper are simpler than the original formulation [18], but the essential discussion is the same. We cannot use J_B(U, V), which leads to a trivial solution in possibilistic clustering, but J_E(U, V) can be used [4]. The solution of possibilistic clustering for J_E(U, V) is

    u_{ki} = g_E(x_k, v_i)    (17)

using basic function g_E with v_i given by (10), while the solution for J_P(U, V) is

    u_{ki} = g_P(x_k, v_i)    (18)

using basic function g_P with v_i given by (12). Note that m = 2 is not assumed for possibilistic clustering.

D. Fuzzy Classifiers

There have been many discussions of fuzzy classifiers derived from fuzzy clustering; here we show a standard classifier that is naturally derived from the optimal solutions. Note that u_ki is given only on the objects x_k, while what we need are fuzzy classification rules defined on the whole space.

To understand classification rules clearly, let us consider crisp c-means, where we use the nearest prototype allocation rule: when the set of cluster prototypes is determined, we allocate an object to its nearest prototype, i.e.,

    u_{ki} = \begin{cases} 1 & (i = \arg\min_{1 \le j \le c} D(x_k, v_j)), \\ 0 & (\text{otherwise}). \end{cases}

Note that the objective function here is J_H. This allocation rule is applied to all points in the space, and the result is the set of Voronoi regions [17] with the cluster prototypes as centers. Specifically, we define

    S_i(V) = \{ x \in R^p : \| x - v_i \| < \| x - v_j \|, \ \forall j \ne i \}

as the Voronoi region for a given set of cluster prototypes V. We then have

    \bigcup_{i=1}^{c} \bar{S}_i(V) = R^p, \quad S_i(V) \cap S_j(V) = \emptyset \ (i \ne j),

where \bar{S}_i(V) is the closure of S_i(V). The nearest allocation rule then reads:

    if x ∈ S_i(V) then x → cluster i.

When we consider fuzzy rules, a function U_i(x; V) that interpolates u_ki is used. We define the following function using the basic function:

    U_i(x; V) = \frac{ g(x, v_i) }{ \sum_{j=1}^{c} g(x, v_j) }, \quad x \in R^p,    (19)

where g(x, y) is either g_B(x, y), g_E(x, y), or g_P(x, y). Fuzzy rules are simpler in possibilistic clustering:

    U_i(x; v_i) = g(x, v_i), \quad x \in R^p,    (20)

where g(x, y) is either g_E(x, y) or g_P(x, y). The rule is thus the same as the basic functions in possibilistic clustering.

We show a number of theoretical properties of the fuzzy rules defined by the above functions. The proofs are given in [25, 28] and omitted here.

Proposition 1: Let U_i(x; V) be defined with function g_B; in other words, J_B is used. Suppose ε → 0. Then the maximizer of U_i(x; V) tends to x = v_i:

    \arg\max_{x \in R^p} U_i(x; V) \to v_i, \quad \text{as } \varepsilon \to 0.

Moreover, for all ε ≥ 0, we have

    \lim_{\|x\| \to \infty} U_i(x; V) = \frac{1}{c}.

Proposition 2: Let U_i(x; V) be defined with function g_P; in other words, J_P is used with m = 2. Suppose ζ → +∞. Then the maximizer of U_i(x; V) tends to x = v_i:

    \arg\max_{x \in R^p} U_i(x; V) \to v_i, \quad \text{as } \zeta \to +\infty.

Moreover, for all ζ ≥ 0, we have

    \lim_{\|x\| \to \infty} U_i(x; V) = \frac{1}{c}.

Hence the fuzzy rules for J_B and J_P behave similarly as the point x goes far away, while the maximum point approaches the cluster center as the respective parameters tend to their limits. In contrast, the fuzzy rule U_i(x; V) for J_E has a quite different property. To describe it, we should discuss Voronoi regions again.

In many cases, fuzzy clusters are made crisp by the maximum membership rule:

    if i = \arg\max_{1 \le j \le c} U_j(x; V) then x → cluster i.

Accordingly we can define the set of points that belong to cluster i:

    T_i(V) = \{ x \in R^p : i = \arg\max_{1 \le j \le c} U_j(x; V) \}.

We then have the next proposition.

Proposition 3: For all choices g = g_B, g = g_E, and g = g_P,

    T_i(V) = \bar{S}_i(V).

Thus T_i(V) is the closure of the Voronoi region with center v_i, and T_i(V) is the same for all three objective functions J_B, J_E, and J_P.

Let us now consider U_i(x; V) for J_E.

Proposition 4: Let U_i(x; V) be defined with function g_E; in other words, J_E is used. Assume the v_i are in general position in the sense that no three of them lie on a line. If a Voronoi region T_i(V) is bounded, then

    \lim_{\|x\| \to \infty} U_i(x; V) = 0.

If a Voronoi region T_i(V) is unbounded and x moves inside T_i(V), then

    \lim_{\|x\| \to \infty} U_i(x; V) = 1.

In both cases, 0 < U_i(x; V) < 1 for all x ∈ R^p. The proof is given in [25] and omitted here.

Possibilistic clustering. As the fuzzy rules in possibilistic clustering are bell-shaped functions, we have the same property:

    \arg\max_{x \in R^p} U_i(x; v_i) = v_i, \quad \lim_{\|x\| \to \infty} U_i(x; v_i) = 0

for both g_E and g_P. If possibilistic clusters should be made crisp, we define

    T'_i(V) = \{ x \in R^p : i = \arg\max_{1 \le j \le c} U_j(x; v_j) \}.

We have the next proposition.

Proposition 5: For both g = g_E and g = g_P,

    T'_i(V) = \bar{S}_i(V).

The Voronoi regions are thus derived again.
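Once V is fixed, the classification function (19) and the maximum membership rule defining T_i(V) can be evaluated at any point of R^p. Below is a small sketch reusing basic_g from the earlier listing; the name classify is ours.

    import numpy as np

    def classify(x, V, method="B", m=2.0, eps=1.0, lam=1.0, zeta=1.0):
        """Fuzzy classifier U_i(x; V) of eq. (19) plus the maximum membership rule."""
        D = ((x[None, :] - V) ** 2).sum(-1)            # D(x, v_i) for all centers
        g = basic_g(D, method, m, eps, lam, zeta)      # basic function values
        U = g / g.sum()                                # eq. (19)
        return U, int(np.argmax(U))                    # memberships and crisp cluster

For the possibilistic rule (20), one would simply return g without the normalization.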
3. Size and Covariance of a Cluster

We frequently need to recognize an elongated cluster, but the original fuzzy c-means cannot do this, as the Voronoi regions cannot separate such an elongated shape. To solve this problem, cluster covariances in fuzzy c-means have been considered by Gustafson and Kessel [11]. There is, however, another problem: separating a dense cluster from a sparse one, for which "density" or "cluster size" has to be considered.

To solve both problems, a generalized objective function with a Kullback-Leibler information term has been proposed by Ichihashi and his colleagues [15, 28]. That is, the following function is used:

    J_{KL}(U, V, A, S) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ki} D(x_k, v_i; S_i) + \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ki} \left\{ \nu \log \frac{u_{ki}}{\alpha_i} + \frac{1}{2} \log |S_i| \right\},    (23)

where the variable A = (α_1, ..., α_c) controls cluster sizes with the constraint

    \mathcal{A} = \{ A : \sum_{i=1}^{c} \alpha_i = 1, \ \alpha_j \ge 0, \ j = 1, ..., c \}.    (24)

Another variable is S = (S_1, ..., S_c); each S_i (i = 1, ..., c) is a p × p positive-definite matrix with determinant |S_i|. In addition,

    D(x, v_i; S_i) = (x - v_i)^T S_i^{-1} (x - v_i)    (25)

is the squared Mahalanobis distance for cluster i.

Since this objective function has four variables, alternate optimization means minimization with respect to one variable while the other three are fixed. After giving initial values for V, A, and S, we repeat

    U = \arg\min_U J_{KL}(U, V, A, S),
    V = \arg\min_V J_{KL}(U, V, A, S),
    A = \arg\min_A J_{KL}(U, V, A, S),
    S = \arg\min_S J_{KL}(U, V, A, S),

until convergence. The solutions are as follows [28].

Solutions for J_KL:

    u_{ki} = \frac{ \frac{\alpha_i}{|S_i|^{1/2}} \exp\left( -\frac{D(x_k, v_i; S_i)}{\nu} \right) }{ \sum_{j=1}^{c} \frac{\alpha_j}{|S_j|^{1/2}} \exp\left( -\frac{D(x_k, v_j; S_j)}{\nu} \right) },    (26)

    v_i = \frac{ \sum_{k=1}^{N} u_{ki} x_k }{ \sum_{k=1}^{N} u_{ki} },    (27)

    \alpha_i = \frac{1}{N} \sum_{k=1}^{N} u_{ki},    (28)

    S_i = \frac{ \sum_{k=1}^{N} u_{ki} (x_k - v_i)(x_k - v_i)^T }{ \sum_{k=1}^{N} u_{ki} }.    (29)

Note that the above solutions are similar to those obtained by the EM algorithm [5] for Gaussian mixture distributions [22, 29]. This model for fuzzy c-means clustering thus has a close relationship with the statistical model of mixture distributions.
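A minimal sketch of one round of the updates (26)-(29) is given below, assuming NumPy. The names kl_step and nu are ours, and a small regularizer is added to keep each S_i invertible, which the paper does not discuss.

    import numpy as np

    def kl_step(X, U, nu=1.0, reg=1e-6):
        """One round of the J_KL updates (26)-(29); X is N x p, U is N x c."""
        N, p = X.shape
        W = U / U.sum(axis=0)                          # column-normalized weights
        V = W.T @ X                                    # eq. (27)
        alpha = U.mean(axis=0)                         # eq. (28)
        S, D = [], np.empty_like(U)
        for i in range(U.shape[1]):
            Xc = X - V[i]
            S_i = (W[:, i, None] * Xc).T @ Xc + reg * np.eye(p)   # eq. (29)
            S.append(S_i)
            D[:, i] = np.einsum("np,pq,nq->n", Xc, np.linalg.inv(S_i), Xc)  # eq. (25)
        logdet = np.array([np.linalg.slogdet(S_i)[1] for S_i in S])
        logits = np.log(alpha) - 0.5 * logdet - D / nu            # eq. (26), log form
        G = np.exp(logits - logits.max(axis=1, keepdims=True))    # numerical stability
        return G / G.sum(axis=1, keepdims=True), V, alpha, S

The structure mirrors one EM iteration for a Gaussian mixture, which is exactly the relationship noted above.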
4. Kernel Functions in Fuzzy Clustering

Many studies on kernel functions have been done [31], and algorithms of kernel-based fuzzy c-means (e.g., [10, 26, 27]) have also been developed. We review the clustering algorithms, discuss kernelized cluster validity measures, and moreover study a class of new kernel functions.

A. Kernel-based algorithms

Linear cluster boundaries between Voronoi regions are obtained by fuzzy c-means clustering. When we use the KL-information method, we have curved boundaries described by quadratic functions. In contrast, more general nonlinear boundaries can be obtained using kernel functions, as discussed in support vector machines [34].

A high-dimensional feature space H is assumed, while the original space R^p is called the data space. H is an inner product space; we denote the inner product by 〈·, ·〉, and the norm of g ∈ H is given by ||g||_H^2 = 〈g, g〉. A transformation Φ : R^p → H is used whereby x_k is mapped to Φ(x_k). An explicit representation of Φ(x) is unknown in general, but the inner product 〈Φ(x), Φ(y)〉 is assumed to be represented by a kernel function:

    K(x, y) = \langle \Phi(x), \Phi(y) \rangle.    (30)

A well-known kernel function is the Gaussian kernel:

    K(x, y) = \exp\{ -C \| x - y \|^2 \}, \quad (C > 0).    (31)

Note that K(x, y) = g_E(x, y) holds when C = λ.

The objective functions J_B, J_E, and J_P are used, but the dissimilarity is changed to

    D_{ki} = \| \Phi(x_k) - v_i \|_H^2,    (32)

where v_i ∈ H. (Note: there is another formulation using D_ki = ||Φ(x_k) - Φ(v_i)||_H^2 instead of (32), which is omitted here; see, e.g., [35].)

When we derive a kernel-based fuzzy c-means algorithm, we should consider two problems: one is the basic functions and the other is the updating scheme. The basic functions are changed as follows:

    g_B(x, y) = \frac{1}{(\varepsilon + K(x,x) + K(y,y) - 2K(x,y))^{\frac{1}{m-1}}},    (33)

    g_E(x, y) = \exp(-\lambda (K(x,x) + K(y,y) - 2K(x,y))),    (34)

    g_P(x, y) = \frac{1}{(1 + \zeta (K(x,x) + K(y,y) - 2K(x,y)))^{\frac{1}{m-1}}},    (35)

whereby the solution u_ki is given by (16), with g = g_B, g = g_E, or g = g_P changed as above. The cluster prototype is given by

    v_i = \frac{ \sum_{k=1}^{N} (u_{ki})^m \Phi(x_k) }{ \sum_{k=1}^{N} (u_{ki})^m },

but the function Φ(x_k) is generally unknown. Hence we cannot use the FCM algorithm directly. Instead, we update the dissimilarity measure D_ki:

    D_{ki} = K_{kk} - \frac{2}{\sum_{j=1}^{N} (u_{ji})^m} \sum_{j=1}^{N} (u_{ji})^m K_{jk} + \frac{1}{\left( \sum_{j=1}^{N} (u_{ji})^m \right)^2} \sum_{j=1}^{N} \sum_{l=1}^{N} (u_{ji})^m (u_{li})^m K_{jl},    (36)

where K_jk = K(x_j, x_k). Note that m = 1 in (36) when J_E is considered. We thus repeat (16) and (36) until convergence.
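The updating scheme can be written compactly in matrix form. The following sketch, under our own naming (kernel_fcm) and reusing basic_g from Section 2, alternates (36) and (16) on a precomputed N × N Gram matrix; memberships are initialized randomly since prototypes are not available in H.

    import numpy as np

    def kernel_fcm(K, c, method="E", m=2.0, eps=1.0, lam=1.0, zeta=1.0,
                   n_iter=100, seed=None):
        """Kernel-based fuzzy c-means: alternate eqs. (36) and (16) on Gram matrix K."""
        rng = np.random.default_rng(seed)
        U = rng.dirichlet(np.ones(c), size=K.shape[0])  # random initial memberships
        for _ in range(n_iter):
            W = U if method == "E" else U ** m          # (u_ji)^m, with m = 1 for J_E
            s = W.sum(axis=0)                           # normalizing sums in (36)
            D = (np.diag(K)[:, None]                    # K_kk
                 - 2.0 * (K @ W) / s                    # middle term of eq. (36)
                 + np.einsum("ji,jl,li->i", W, K, W) / s ** 2)  # last term of eq. (36)
            G = basic_g(D, method, m, eps, lam, zeta)   # g applied to kernel-induced D_ki
            U = G / G.sum(axis=1, keepdims=True)        # eq. (16)
        return U, D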
B. Kernelized Cluster Validity Measures

Various cluster validity measures have been proposed [1, 6] in order to determine the appropriate number of clusters. They are divided into two classes. One class uses the membership values alone; a typical example is the entropy

    E(U, c) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ki} \log u_{ki},

whereby the number c that maximizes E(U, c) is selected. The other class takes geometrical characteristics into account. A typical method uses the fuzzy covariance matrix for cluster i:

    F_i = \frac{ \sum_{k=1}^{N} (u_{ki})^m (x_k - v_i)(x_k - v_i)^T }{ \sum_{k=1}^{N} (u_{ki})^m }.    (37)

Gath and Geva [9] use the sum of the square roots of the determinants of F_i:

    W_{det} = \sum_{i=1}^{c} \sqrt{ \det F_i }.    (38)

We also consider the sum of the traces of F_i:

    W_{tr} = \sum_{i=1}^{c} \mathrm{tr}\, F_i.    (39)

Hashimoto et al. [12] showed that the trace works as well as the determinant, by randomly generating many simulation examples and testing different validity measures.

When we use kernel-based clustering, we should also have kernel-based validity measures. Let us consider the kernelized versions of (38) and (39). The kernel-based fuzzy covariance matrix is

    KF_i = \frac{ \sum_{k=1}^{N} (u_{ki})^m (\Phi(x_k) - v_i)(\Phi(x_k) - v_i)^T }{ \sum_{k=1}^{N} (u_{ki})^m },    (40)

where v_i is not explicitly given.

Note that the determinant of the kernelized fuzzy covariance is inappropriate, since the next relation holds:

    \det KF_i \to 0, \quad \text{as } N \to \infty.

The proof of this relation is simple: the monotone decreasing sequence λ_1, λ_2, ... of the eigenvalues of KF_i converges to zero as N → ∞ (see, e.g., [31]), hence

    \log \det KF_i = \sum_{i=1}^{N} \log \lambda_i \to -\infty, \quad \text{as } N \to \infty.

In contrast, the trace of KF_i is useful. After some calculation, we have

    \mathrm{tr}\, KF_i = \frac{1}{ \sum_{k=1}^{N} (u_{ki})^m } \sum_{k=1}^{N} (u_{ki})^m \| \Phi(x_k) - v_i \|^2 = \frac{1}{ \sum_{k=1}^{N} (u_{ki})^m } \sum_{k=1}^{N} (u_{ki})^m D_{ki},    (41)

where D_ki is given by (36). We hence define

    KW_{tr} = \sum_{i=1}^{c} \mathrm{tr}\, KF_i.    (42)
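Given memberships U and the kernel-induced dissimilarities D_ki from (36), both of which are returned by the kernel_fcm sketch above, KW_tr can be computed in a few lines; the name kw_tr is ours.

    import numpy as np

    def kw_tr(U, D, m=2.0):
        """Kernelized trace measure KW_tr of eq. (42), via eq. (41)."""
        W = U ** m                                     # use m = 1 for the entropy method
        return float(((W * D).sum(axis=0) / W.sum(axis=0)).sum())

Running kernel_fcm for each candidate number of clusters and comparing kw_tr values would then be the kernelized counterpart of using W_tr.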
Numerical experiments

We now have a question: although kernel-based clustering works well for some typical clusters with nonlinear boundaries (such as those in Fig. 1), does it also work well for ordinary ball-shaped clusters? To answer this question, we compared the above measures using randomly generated data with artificial clusters and evaluated the selected numbers of clusters.

The conditions for the random data are shown in Table 1. The basic condition is No.1. For No.2 the diameter of each cluster was changed; for No.3 the number of data points in each cluster was randomly changed; for No.4 both the diameter and the number of members were changed. Note that the randomly generated clusters are ball-shaped, and we tested whether the kernel-based measure can judge the correct number of clusters as well as the non-kernelized measures.

Table 1. Conditions for random generation of clusters.

    Conditions                        No.1      No.2         No.3      No.4
    Total number of clusters          4         4            4         4
    Total number of data points       400       400          400       400
    Dimensions of data set            2, 3      2, 3         2, 3      2, 3
    Range of cluster centers          0.0~1.0   0.0~1.0      0.0~1.0   0.0~1.0
    Number of data in each cluster    100       100          50~150    50~150
    Diameter of each cluster          0.1       0.05~0.193   0.1       0.05~0.193

The process of evaluating the numbers of clusters is as follows:
(1) Generate data sets with conditions No.1-4.
(2) Perform clustering 100 times with random initial values, and use the resulting clusters having the minimum value of the objective function for the evaluation.
(3) Evaluate the above clusters by each validity measure.
(4) Label the result "correct" if the selected number of clusters is 4; otherwise label it "wrong."
(5) Repeat steps (1)-(4) 1000 times.
(6) Calculate the percentage of the label "correct" for each validity measure.

The results are summarized in Table 2. We observe that the kernelized measure KW_tr is as effective as the non-kernelized measures. Moreover, it has been shown that KW_tr can judge the correct number of clusters for sets of points like the one in Fig. 1 (see, e.g., [28]).

Table 2. Ratio of correct numbers of clusters for each condition using J_B (m = 2), with dimensions p = 2, 3.

                  W_tr     W_det    KW_tr
    No.1  p=2    0.958    0.943    0.952
          p=3    0.994    0.995    0.994
    No.2  p=2    0.770    0.931    0.779
          p=3    0.982    0.992    0.980
    No.3  p=2    0.953    0.931    0.949
          p=3    0.993    0.993    0.990
    No.4  p=2    0.710    0.897    0.728
          p=3    0.981    0.982    0.972

C. Positive definite kernels derived from fuzzy c-means

As the last topic in this paper, we consider kernel functions again. We note that the Gaussian kernel is the same as the basic function g_E, and that the other basic functions g_B and g_P are also bell-shaped, with "longer tails."

Here is a question: are g_B and g_P also positive-definite kernel functions? We also have a second question: are they as useful in kernel-based clustering as the Gaussian kernel?

The answer to the first question is given in the next proposition.

Proposition 6: The functions

    g_B(x, y) = \frac{1}{(\varepsilon + D(x, y))^{\frac{1}{m-1}}}, \quad \varepsilon > 0,

and

    g_P(x, y) = \frac{1}{(1 + \zeta D(x, y))^{\frac{1}{m-1}}}

are positive-definite kernels.

The proof is based on a theorem by Schönberg [32], which states that a class of positive-definite kernels can be derived from completely monotone functions; we proved that g_B and g_P are derived from completely monotone functions [14]. The details are given in [14] and omitted here. Note that g_B is not positive-definite if ε = 0; that is, the original objective function does not give a kernel function. The regularization parameter ε > 0 is thus necessary.

Accordingly, we can use these two functions in the kernel-based fuzzy c-means algorithms instead of the Gaussian kernel.
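Since g_B (with ε > 0) and g_P are positive definite by Proposition 6, they can be used to build a Gram matrix exactly as the Gaussian kernel is. A small sketch follows, with our own names (gram, the toy data X); it feeds the resulting matrix to the kernel_fcm sketch above.

    import numpy as np

    def gram(X, g):
        """Gram matrix K_jk = g(x_j, x_k) for a basic function used as a kernel."""
        D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
        return g(D)

    X = np.random.default_rng(0).normal(size=(100, 2))      # toy data for illustration
    K_B = gram(X, lambda D: (1.0 + D) ** -1.0)              # g_B with eps = 1, m = 2
    K_P = gram(X, lambda D: (1.0 + D) ** -0.5)              # g_P with zeta = 1, m = 3
    U, _ = kernel_fcm(K_B, c=2)                             # cluster with the g_B kernel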
Illustrative examples

Let us consider the two sets of points shown in Figs. 1 and 2. The first figure is typical in discussions of the effect of kernel functions: crisp and fuzzy c-means cannot divide the set of points into the outer circle and the inner ball, since they produce linear cluster boundaries. In contrast, the Gaussian kernel is known to divide the two successfully. As expected, g_B and g_P also perfectly succeed in separating the outer circle and the inner ball [14].

Figure 1. First data set: a ball inside a circle.

The second figure shows the "two crescents," which are similar to examples used in semi-supervised learning [3]. It is more difficult to divide these two sets of points.

Figure 2. Second data set: two crescents.

We summarize the results of the classifications in Table 3, where misclassifications are fewer for g_B and g_P than for the Gaussian kernel. We thus have the second answer: g_B and g_P are useful in these examples with nonlinear cluster boundaries.

Table 3. Summary of misclassifications by fuzzy c-means with the three kernel functions applied to the two crescents data. Calculations were repeated 50 times for each kernel with different random initial values; numbers in parentheses are from the entropy fuzzy c-means. The parameters are m = 2, ε = 1.0, ζ = 1.0, and λ = 1.0.

    Percentage of misclassifications   g_B        g_E (Gaussian)   g_P
    0 ~ 15                             14 (7)     1 (0)            11 (10)
    16 ~ 30                            10 (12)    3 (5)            13 (13)
    31 ~ 45                            8 (13)     8 (8)            12 (15)
    46 ~                               18 (18)    38 (37)          14 (12)

5. Conclusions

An overview of fuzzy c-means clustering with three different objective functions has been given, with a focus on fuzzy classifiers, a generalization including variables of cluster size and covariance, and kernel functions. The two discussions of kernel functions concern kernelized validity measures and new kernels derived from the basic functions of fuzzy c-means.

The kernel functions g_B and g_P are useful for the examples given here. We expect that they are useful in support vector machines as well, but many more experiments using real numerical data are necessary.

In spite of their importance, the topics herein are relatively unknown to the part of the fuzzy community interested in clustering. They provide, however, many future research opportunities both in theory and in applications. For example, applications to semi-supervised clustering [3, 35] and a variety of new fuzzy clustering algorithms [33, 36] will be promising.

References

[1] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[2] J. C. Bezdek, J. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, Kluwer, Boston, 1999.
[3] O. Chapelle, B. Schölkopf, and A. Zien, eds., Semi-Supervised Learning, MIT Press, Cambridge, Massachusetts, 2006.
[4] R. N. Davé and R. Krishnapuram, "Robust Clustering Methods: a Unified View," IEEE Trans. on Fuzzy Systems, vol. 5, pp. 270-293, 1997.
[5] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," J. R. Stat. Soc., Ser. B, vol. 39, pp. 1-38, 1977.
[6] D. Dumitrescu, B. Lazzerini, and L. C. Jain, Fuzzy Sets and Their Application to Clustering and Training, CRC Press, Boca Raton, Florida, 2000.
[7] J. C. Dunn, "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters," J. of Cybernetics, vol. 3, pp. 32-57, 1974.
[8] J. C. Dunn, "Well-Separated Clusters and Optimal Fuzzy Partitions," J. of Cybernetics, vol. 4, pp. 95-104, 1974.
[9] I. Gath and A. B. Geva, "Unsupervised Optimal Fuzzy Clustering," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 773-781, 1989.
[10] M. Girolami, "Mercer Kernel Based Clustering in Feature Space," IEEE Trans. on Neural Networks, vol. 13, no. 3, pp. 780-784, 2002.
[11] E. E. Gustafson and W. C. Kessel, "Fuzzy Clustering with a Fuzzy Covariance Matrix," IEEE CDC, San Diego, California, pp. 761-766, 1979.
[12] W. Hashimoto, T. Nakamura, and S. Miyamoto, "Comparison and Evaluation of Different Cluster Validity Measures Including Their Kernelization," Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 13, no. 3, pp. 204-209, 2009.
[13] F. Höppner, F. Klawonn, R. Kruse, and T. Runkler, Fuzzy Cluster Analysis, Wiley, Chichester, 1999.
[14] J. S. Hwang and S. Miyamoto, "Kernel Functions Derived from Fuzzy Clustering and Their Application to Kernel Fuzzy c-Means," Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 15, pp. 90-94, 2011.
[15] H. Ichihashi, K. Honda, and N. Tani, "Gaussian Mixture PDF Approximation and Fuzzy c-Means Clustering with Entropy Regularization," Proc. of the Fourth Asian Fuzzy Systems Symposium, vol. 1, pp. 217-221, 2000.
[16] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York, 1990.
[17] T. Kohonen, Self-Organizing Maps, 2nd Ed., Springer, Berlin, 1997.
[18] R. Krishnapuram and J. M. Keller, "A Possibilistic Approach to Clustering," IEEE Trans. on Fuzzy Systems, vol. 1, pp. 98-110, 1993.
[19] R. P. Li and M. Mukaidono, "A Maximum Entropy Approach to Fuzzy Clustering," Proc. of the 4th IEEE Intern. Conf. on Fuzzy Systems (FUZZ-IEEE/IFES'95), Yokohama, Japan, pp. 2227-2232, March 20-24, 1995.
[20] R. P. Li and M. Mukaidono, "Gaussian Clustering Method Based on Maximum-Fuzzy-Entropy Interpretation," Fuzzy Sets and Systems, vol. 102, pp. 253-258, 1999.
[21] J. B. MacQueen, "Some Methods of Classification and Analysis of Multivariate Observations," Proc. of the 5th Berkeley Symposium on Math. Stat. and Prob., pp. 281-297, 1967.
[22] G. McLachlan and D. Peel, Finite Mixture Models, Wiley, New York, 2000.
[23] S. Miyamoto, Fuzzy Sets in Information Retrieval and Cluster Analysis, Kluwer, Dordrecht, 1990.
[24] S. Miyamoto and M. Mukaidono, "Fuzzy c-Means as a Regularization and Maximum Entropy Approach," Proc. of the 7th International Fuzzy Systems Association World Congress (IFSA'97), Prague, Czech Republic, vol. II, pp. 86-92, June 25-30, 1997.
[25] S. Miyamoto, Introduction to Cluster Analysis, Morikita-Shuppan, Tokyo, 1999 (in Japanese).
[26] S. Miyamoto and Y. Nakayama, "Algorithms of Hard c-Means Clustering Using Kernel Functions in Support Vector Machines," Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 7, no. 1, pp. 19-24, 2003.
[27] S. Miyamoto and D. Suizu, "Fuzzy c-Means Clustering Using Kernel Functions in Support Vector Machines," Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 7, no. 1, pp. 25-30, 2003.
[28] S. Miyamoto, H. Ichihashi, and K. Honda, Algorithms for Fuzzy Clustering, Springer, Berlin, 2008.
[29] R. A. Redner and H. F. Walker, "Mixture Densities, Maximum Likelihood and the EM Algorithm," SIAM Review, vol. 26, no. 2, pp. 195-239, 1984.
[30] E. H. Ruspini, "A New Approach to Clustering," Information and Control, vol. 15, pp. 22-32, 1969.
[31] B. Schölkopf and A. Smola, Learning with Kernels, MIT Press, Cambridge, Massachusetts, 2002.
[32] I. J. Schönberg, "Metric Spaces and Completely Monotone Functions," Annals of Mathematics, vol. 39, no. 4, pp. 811-841, 1938.
[33] C.-C. Tsai, C.-C. Chen, C.-K. Chan, and Y.-Y. Li, "Behavior-Based Navigation Using Heuristic Fuzzy Kohonen Clustering Network for Mobile Service Robots," International Journal of Fuzzy Systems, vol. 12, no. 1, pp. 25-32, 2010.
[34] V. N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[35] N. Wang, X. Li, and X. Luo, "Semi-Supervised Kernel-Based Fuzzy c-Means with Pairwise Constraints," WCCI 2008 Proceedings, Hong Kong, China, pp. 1099-1103, June 1-6, 2008.
[36] F. Yu, J. Tang, and R. Cai, "Partially Horizontal Collaborative Fuzzy C-Means," International Journal of Fuzzy Systems, vol. 9, no. 4, pp. 198-204, 2007.
[37] L. A. Zadeh, "Similarity Relations and Fuzzy Orderings," Information Sciences, vol. 3, pp. 177-200, 1971.

Sadaaki Miyamoto was born in Osaka, Japan, in 1950. He received the B.S., M.S., and Dr. Eng. degrees in Applied Mathematics and Physics Engineering from Kyoto University, Japan, in 1973, 1975, and 1978, respectively. He is now a Professor at the Department of Risk Engineering, the University of Tsukuba, Japan. His current research interests include methodology for uncertainty modeling, data clustering algorithms, multisets, and methods for text mining. He is a member of the Society of Instrumentation and Control Engineers of Japan, the Information Processing Society of Japan, the Japan Society of Fuzzy Theory and Systems, and IEEE. He is a fellow of the International Fuzzy Systems Association.