Appl Intell
DOI 10.1007/s10489-011-0327-7

Three new fuzzy neural networks learning algorithms based on clustering, training error and genetic algorithm

Hamed Malek · Mohammad Mehdi Ebadzadeh · Mohammad Rahmati

© Springer Science+Business Media, LLC 2011

Abstract  Three new learning algorithms for Takagi-Sugeno-Kang fuzzy systems based on training error and genetic algorithm are proposed. The first two algorithms consist of two phases. In the first phase, the initial structure of the neuro-fuzzy network is created by estimating the optimum points of the training data in input-output space using the KNN method (for the first algorithm) or the Mean-Shift method (for the second algorithm), and new neurons are then added by an error-based algorithm. In the second phase, redundant neurons are identified and removed using a genetic algorithm. The third algorithm builds the network in a single phase using a modified version of the error-based algorithm used in the first two methods. The first algorithm is shown to be insensitive to the parameter K of the KNN algorithm, and in two simulated examples it outperforms other neuro-fuzzy approaches in both accuracy and network compactness.

Keywords  Fuzzy neural network · Learning algorithm · Genetic algorithm · Clustering

1 Introduction

It has been shown that a fuzzy system can approximate any continuous real function defined on a compact domain [14] by covering its function graph in input-output space with a set of if-then fuzzy rules. Theoretically, these fuzzy rules can always be discovered, but in practice we may have no idea how to initialize them. Thus, it is crucial to have an adaptive fuzzy system which can produce the required rules automatically.

Adaptivity in fuzzy systems can be achieved by integrating fuzzy systems and neural networks. Both approaches are widely used for function approximation, and each has its own drawbacks. Neural networks are effective on problems with enough training data, but their structure and internal operation are difficult to describe. Fuzzy systems, on the other hand, are not capable of learning from data: their inputs and outputs must be described linguistically, so when the rules are incomplete or wrong, tuning the system is not a straightforward task. Since the drawbacks of these two approaches are complementary, it is advantageous to combine fuzzy systems and neural networks into one integrated system and benefit from both at once [3].

Different approaches have been proposed for combining neural networks and fuzzy systems [1, 2, 16, 19]. In general, two important classes of these networks can be found in the literature: cooperative and hybrid fuzzy neural networks [20]. In the cooperative model, a neural network or a neural learning algorithm is employed as a preprocessing phase to adjust the fuzzy parameters optimally. This learning process can be applied by learning fuzzy sets, learning fuzzy rules, adapting fuzzy sets or scaling fuzzy rules [23]. In hybrid fuzzy neural networks (FNNs), the fuzzy system is integrated into the neural network. The structure of this model is similar to that of a neural network, so some consider it a special kind of neural network. The learning phase in this model is usually carried out using one or a combination of different algorithms. The neuron weights in this model correspond to the rules and their inputs and outputs in the fuzzy system.
One of the widely used hybrid networks is ANFIS (adaptive-network-based fuzzy inference system) [10], in which the system is constructed from both human knowledge and stipulated input-output data pairs. ANFIS implements the Takagi-Sugeno-Kang fuzzy system [25] and has been widely used in modeling and controlling nonlinear systems [24]. A mixture of gradient descent and the least-squares method is used in its learning procedure, and it has yielded remarkable performance in comparison with various approaches. However, ANFIS suffers from the curse of dimensionality as the number of input dimensions grows. Thus, in the learning algorithm, a method for reducing the number of fuzzy rules should be employed before or during rule generation. Although it has been shown that the method used in ANFIS can achieve the same performance when a smaller number of fuzzy rules is selected for grid-type partitioning [21], the curse of dimensionality remains a significant obstacle as the number of fuzzy rules increases.

Various methods have been proposed in the literature for optimizing the number of rules in hybrid neuro-fuzzy networks. In some approaches, fuzzy rules are determined by partitioning the input space with unsupervised algorithms [6, 15]. Different clustering algorithms such as FCM have been proposed for defining the structure of fuzzy rules [13, 22]. The output is then approximated by a combination of linear functions on each partition. Defining fuzzy rules on the input data alone may not perform well, since it ignores the output values, which may have large variance within a cluster compared with their corresponding inputs; consequently, the system does not provide an accurate approximation of the data. Some authors have tried to overcome this problem by clustering in the input-output space [9, 12, 26]. They mostly use well-known clustering algorithms such as HCM or FCM. However, the number of clusters must be provided by an expert, and there is no guarantee of finding an optimal solution, since the resulting network is highly dependent on the selection of the initial cluster centers [18]. Furthermore, clustering algorithms tend to define clusters based on the closeness of data points rather than the similarity of their behavior, which can lead to the generation of redundant fuzzy rules [4].

The above-mentioned methods require an expert to set the number of partitions, or equivalently the number of fuzzy rules, before the main algorithm starts. However, we usually do not know the optimal number of fuzzy rules in advance. Therefore, dynamic adaptive methods have been proposed which start with a small number of neurons and grow the number of fuzzy rules during the learning process until the required precision is achieved [4, 7, 17, 18, 28]. In [7, 28] an online self-organizing dynamic fuzzy neural network based on extended radial basis function neural networks has been proposed, in which the structure and parameters can be adapted online without partitioning the input space a priori.
Based on this dynamic fuzzy neural network (DFNN) and its extended version, the generalized dynamic fuzzy neural network (GDFNN) [29], a self-organizing fuzzy neural network (SOFNN) is proposed in [18]. It performs online learning using a modified recursive least squares (RLS) method for parameter identification, together with a structure learning algorithm that determines the center and width vectors of the elliptical basis function (EBF) neurons by a criterion based on the system error and firing strength, increasing the ability of the network to cover the input data. An extended version of SOFNN, the self-organizing fuzzy neural network based on genetic algorithm (SOFNNGA), is proposed in [17]: the initial network structure is first built using a method based on a geometric growing criterion and the ε-completeness of fuzzy rules, and then a combination of a genetic algorithm, backpropagation and recursive least squares (RLS) is employed to adjust the system parameters.

In this paper we propose three new algorithms for finding the optimum number of rules (or neurons) in function approximation. In the first algorithm, during the first phase, the initial structure is identified using the K-nearest neighbor (KNN) algorithm, which generates the initial network structure by estimating the optimum points of the approximated function. Then, to ensure that the generated network can approximate the training data with a root mean square error (RMSE) below the threshold requested by the user, an error-based algorithm is applied which adds new neurons at the training points with the worst error. In the second phase, a genetic algorithm removes redundant neurons while keeping the RMSE near the desired threshold. The second algorithm proceeds in the same way, but the KNN part is replaced with the Mean-Shift algorithm [8]. A simpler method based only on the error algorithm is proposed as the third algorithm, which can reduce the number of fuzzy sets in the fuzzy rules. Since this method merely performs the error algorithm, with no genetic algorithm or initial rule generation, the network is constructed at lower computational cost. The performance of these three algorithms is evaluated in the simulation section, and their advantages over other algorithms in terms of network compactness and accuracy are presented.

The remainder of this paper is organized as follows. In Sect. 2, an overview of function approximation using fuzzy systems is presented. The proposed network structure and learning algorithms are described in Sect. 3. The simulation results are presented in Sect. 4. A discussion of the role of K and a comparison between the three proposed algorithms are given in Sect. 5. Finally, conclusions are presented in Sect. 6.

2 Fuzzy function approximation

The nonlinear function approximation problem can be modeled by a fuzzy rule base with a set of IF-THEN rules defined as follows:

$$R^i:\ \text{IF}\ x_1\ \text{is}\ A_1^i\ \text{and}\ \ldots\ \text{and}\ x_n\ \text{is}\ A_n^i\ \text{THEN}\ y\ \text{is}\ B^i, \qquad (1)$$

where $A_j^i$ and $B^i$ are fuzzy sets and $x = (x_1, \ldots, x_n)^T$ and $y$ are the input and output variables of the fuzzy system, respectively. One can find a mapping from a fuzzy set A to a fuzzy set B using the product inference engine, where we have:
$$\mu_B(y) = \max_{i=1}^{m} \sup_{x}\Big(\mu_A(x)\prod_{j=1}^{n}\mu_{A_j^i}(x_j)\,\mu_{B^i}(y)\Big). \qquad (2)$$

Fuzzification of a real-valued point $x^*$ can be done using the singleton fuzzifier, which maps $x^*$ into a fuzzy singleton A with membership value 1 at $x^*$ and 0 at all other points:

$$\mu_A(x) = \begin{cases} 1 & x = x^*, \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$

A defuzzifier is defined as a mapping from a fuzzy set B to a crisp point $y^*$. Different defuzzifiers are defined in the literature; one of them is the center average defuzzifier. Let $\bar{y}^i$ be the center of the i-th fuzzy set and $w_i$ be its height. The center average defuzzifier determines $y^*$ as

$$y^* = \frac{\sum_{i=1}^{m} \bar{y}^i w_i}{\sum_{i=1}^{m} w_i}. \qquad (4)$$

For fuzzy sets $B^i$ with centers $\bar{y}^i$, the fuzzy systems with fuzzy rule base (1), product inference engine (2), singleton fuzzifier (3) and center average defuzzifier (4) are of the following form:

$$f(x) = \frac{\sum_{i=1}^{m} \bar{y}^i \prod_{j=1}^{n} \mu_{A_j^i}(x_j)}{\sum_{i=1}^{m} \prod_{j=1}^{n} \mu_{A_j^i}(x_j)}, \qquad (5)$$

where x is the input and f(x) is the output of the fuzzy system [27]. The corresponding network of the fuzzy model can be built as shown in Fig. 1.

Fig. 1  Fuzzy neural network structure

The network has four layers. The output of each node in the first layer is equal to $\mu_{A_j^i}(x_j)$, the membership value of fuzzy set $A_j^i$. Nodes in the second layer calculate the product of the membership values of the inputs over all dimensions for each rule:

$$g_i = \prod_{j=1}^{n} \mu_{A_j^i}(x_j). \qquad (6)$$

The third layer is called the normalization layer, where the output of node i is calculated as

$$h_i = \frac{g_i}{\sum_{i=1}^{m} g_i}. \qquad (7)$$

Finally, the last layer (or output layer) calculates the summation of its input values from the previous layer:

$$y = \sum_{i=1}^{M} v_i h_i, \qquad (8)$$

where the $v_i$ are called the consequent parameters, which should be learned using the least-squares or gradient descent method. For the Takagi-Sugeno-Kang fuzzy model, one layer is added before the output layer, which replaces the consequent parameters with a linear combination of the inputs. Thus, the output of the network is calculated as

$$y = \sum_{i=1}^{M} (c_0^i + c_1^i x_1 + \cdots + c_n^i x_n)\, h_i. \qquad (9)$$

In recent neuro-fuzzy networks, the parameters are determined through a hybrid learning algorithm in which each epoch consists of a forward and a backward pass. In the forward pass, all training data are presented to the network and the output weights are identified by the least-squares algorithm. Recursive least squares can also be employed to determine the output-layer weights [14].
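To make the forward pass in (6)-(9) concrete, the following sketch evaluates the output of a TSK network with Gaussian membership functions. It is a minimal illustration under our own assumptions (array shapes, function and variable names), not the authors' implementation.

```python
import numpy as np

def tsk_output(x, centers, widths, coeffs):
    """Forward pass of a TSK fuzzy network, following Eqs. (6)-(9).

    x       : (n,) input vector
    centers : (M, n) Gaussian membership centers, one row per rule
    widths  : (M, n) Gaussian widths (sigma) per rule and dimension
    coeffs  : (M, n + 1) consequent coefficients [c0, c1, ..., cn] per rule
    """
    # Firing strength of each rule: product of Gaussian memberships, Eq. (6)
    memberships = np.exp(-((x - centers) ** 2) / (2.0 * widths ** 2))
    g = memberships.prod(axis=1)                      # (M,)
    # Normalization layer, Eq. (7)
    h = g / g.sum()
    # TSK consequent: linear function of the input per rule, Eqs. (8)-(9)
    rule_outputs = coeffs[:, 0] + coeffs[:, 1:] @ x   # (M,)
    return float(np.dot(h, rule_outputs))

# Tiny usage example with two rules in a 2-D input space
if __name__ == "__main__":
    centers = np.array([[0.0, 0.0], [1.0, 1.0]])
    widths = np.full((2, 2), 0.5)
    coeffs = np.array([[0.1, 1.0, -1.0], [0.0, 0.5, 0.5]])
    print(tsk_output(np.array([0.3, 0.7]), centers, widths, coeffs))
```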
3 Proposed algorithms

In this paper, three algorithms are proposed for Takagi-Sugeno-Kang modeling using a fuzzy neural network. The first algorithm builds the initial structure of the network using KNN and keeps adding rules to cover the data points with the highest error until a satisfactory RMSE is achieved. Next, the number of rules is optimized using a genetic algorithm, which tries to reduce the number of neurons while keeping the network error below the defined threshold (Fig. 2). The second algorithm does the same but uses the Mean-Shift algorithm to find a good set of initial rules. The third algorithm removes the first and last parts of these two algorithms and only performs the second part, adding neurons until the desired RMSE is achieved (Fig. 3). The network architecture used in this paper is the standard ANFIS architecture shown in Fig. 1. The details of the three algorithms are presented as follows.

Fig. 2  Flowchart for the first algorithm
Fig. 3  Flowchart for the third algorithm

3.1 KNN method

In a learning algorithm, the number of rules and the values of the center and width vectors of each rule should be identified. The first learning algorithm is designed in two phases: rule generation and rule reduction. In the rule generation phase, the K-nearest-neighbor algorithm followed by an error-based algorithm is used. In the rule reduction phase, a genetic algorithm is applied to the network. After adding or removing each rule, the least-squares method is applied to the network to identify the best consequent parameters at each step.

3.1.1 Rule generation phase

In the first phase, initial fuzzy rules are generated using the K-nearest neighbor (KNN) algorithm. It is intuitively clear that, given enough samples of a nonlinear function, one can estimate the local optima of the function by finding local optima among uniformly distributed samples. The KNN algorithm tries to locate these local optima by comparing each point with its K nearest neighbors. For a training data point $x = (x_1, x_2, \ldots, x_n)^T$, define $A_x$ as the set of the K training input points with the smallest Euclidean distance to x. We say x is an optimum point if its corresponding output value is higher or lower than those of all members of $A_x$. Thus, for the given training point set P, we set the center vectors of the initial fuzzy rules as follows:

$$M = \{x = (x_1, \ldots, x_n) \in P \mid y(x) < \min_{x' \in A_x} y(x')\ \text{or}\ y(x) > \max_{x' \in A_x} y(x')\}. \qquad (10)$$

In this algorithm, a Gaussian function with a mean and width is selected as the membership function of the fuzzy sets. The width vectors are set to a predefined value $\sigma_0 = (\sigma_{01}, \ldots, \sigma_{0n})$ which should be provided by the user before the learning process. A good heuristic can also be employed which uses the standard deviation of the K nearest neighbors as the width values.

After generation of the initial fuzzy rules with the KNN method, the consequent parameters should be learned. These consequent parameters can be identified using the least-squares algorithm. In Takagi-Sugeno-Kang modeling we have $(n + 1) \cdot M$ parameters in the consequent part of the fuzzy rules. Defining the consequent parameters as the vector $C = (c_0^1, \ldots, c_n^1, \ldots, c_0^i, \ldots, c_n^i, \ldots, c_0^M, \ldots, c_n^M)^T$, we can estimate C using the least-squares algorithm; that is,

$$C = (H^T H)^{-1} H^T Y. \qquad (11)$$

New rules based on worst RMSE  The newly built network now estimates the unknown function with an RMSE which can be higher or lower than our expectation, depending on the training inputs and the complexity of the function. To achieve higher precision, the second stage of this phase can be performed. In this stage, we add new neurons to the existing network based on the training data error. The algorithm enters a loop in which, in each round, the point $x = (x_1, \ldots, x_n)$ of the training set with the highest error (produced by the current network) is selected and a new neuron (rule) is added to the network with center vector x and predefined width $\sigma_0 = (\sigma_{01}, \ldots, \sigma_{0n})$. This process of adding new neurons continues until the desired error is obtained.
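The rule generation phase can be sketched as follows; this is an illustrative reading of Eq. (10) and the error-driven growth loop, assuming training inputs X with targets y. The helpers `fit` and `predict` are placeholders for the TSK consequent estimation of Eq. (11) and the network evaluation, and all names are our own.

```python
import numpy as np

def knn_initial_centers(X, y, k):
    """Pick training points that are local extrema among their K nearest
    neighbours (Eq. (10)) and use them as initial rule centers."""
    centers = []
    for i, x in enumerate(X):
        dists = np.linalg.norm(X - x, axis=1)
        dists[i] = np.inf                         # exclude the point itself
        neighbours = np.argsort(dists)[:k]
        if y[i] > y[neighbours].max() or y[i] < y[neighbours].min():
            centers.append(x)
    return np.array(centers)

def add_rules_on_worst_error(X, y, centers, fit, predict, tau, max_rules=100):
    """Error-driven growth: repeatedly add a rule centered at the worst-error
    training point until the training RMSE drops below the threshold tau."""
    while len(centers) < max_rules:
        model = fit(X, y, centers)                # e.g. least-squares consequents, Eq. (11)
        errors = np.abs(predict(model, X) - y)
        rmse = np.sqrt(np.mean(errors ** 2))
        if rmse <= tau:
            break
        centers = np.vstack([centers, X[np.argmax(errors)]])
    return centers
```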
3.1.2 Rule reduction phase

After the required RMSE is obtained, the algorithm starts the optimization process, in which the number of neurons is reduced by applying a genetic algorithm. In the second stage of the first phase the neuron centers may not have been placed optimally, so in this phase the algorithm tries to find and remove redundant neurons.

A variety of methods can be employed to find the optimal subset of neurons for given data. A genetic algorithm is a promising optimization tool which, in contrast to methods such as gradient descent, can escape from local minima. In genetic algorithms, a population of solutions is initially produced in a specific representation, and genetic operators such as crossover and mutation are applied to produce the next generations. In each generation, a selection process is applied to identify the parents of the population. In the selection step, a fitness scaling should be chosen which determines how individuals are selected from the existing population; fitness-proportional and rank-based scaling are two examples of fitness scaling methods.

For this problem, a binary representation is chosen in which each bit indicates the existence of its corresponding rule: a bit 1 means that the corresponding neuron is used and 0 means it is not. Rank-based scaling is selected to remove the effect of the spread of raw scores. The fitness function should be chosen carefully: on the one hand, we need to keep the error as low as possible, and on the other hand, we wish to reduce the number of neurons. We define our fitness function as follows:

$$\text{fitness} = \begin{cases} \text{RMSE} \times M & \text{if RMSE} > 2\tau, \\ L & \text{otherwise,} \end{cases} \qquad (12)$$

where L is the number of ones in the bit string, τ is the error threshold and M is a large number. By defining the fitness value as above, one can ensure that the RMSE of the resulting network will not exceed twice the threshold value and that the search converges to networks with the lowest number of rules.

3.2 Mean-shift method

The second algorithm is similar to the first one, with one exception: the KNN algorithm is replaced with a nonparametric mode-seeking algorithm named Mean-Shift [8], which can find the local optima (modes) of the estimated density function of the training data. In short, for n input points, the well-known density estimator [8] can be written as

$$\hat{f}_{h,K}(x) = \frac{c_{k,d}}{n h^d} \sum_{i=1}^{n} k\left(\left\|\frac{x - x_i}{h}\right\|^2\right), \qquad (13)$$

where k(·) is the kernel profile and $c_{k,d}$ is the corresponding normalization constant, which is assumed strictly positive. The parameter h is the kernel bandwidth and must be provided by the user. The modes of this density function can be found by locating the zeros of the gradient, $\nabla f(x) = 0$. By defining the function $g(x) = -k'(x)$ and using g as the profile, the density gradient estimator is obtained as

$$\hat{\nabla} f_{h,K}(x) = \frac{2 c_{k,d}}{n h^{d+2}} \left[\sum_{i=1}^{n} g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)\right] \left[\frac{\sum_{i=1}^{n} x_i\, g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x\right]. \qquad (14)$$

By defining the kernel G as

$$G(x) = c_{g,d}\, g(\|x\|^2), \qquad (15)$$

the density gradient estimator can be rewritten as

$$\hat{\nabla} f_{h,K}(x) = \hat{f}_{h,G}(x)\, \frac{2 c_{k,d}}{h^2 c_{g,d}}\, m_{h,G}(x), \qquad (16)$$

where

$$\hat{f}_{h,G}(x) = \frac{c_{g,d}}{n h^d} \sum_{i=1}^{n} g\left(\left\|\frac{x - x_i}{h}\right\|^2\right), \qquad (17)$$

and

$$m_{h,G}(x) = \frac{\sum_{i=1}^{n} x_i\, g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x. \qquad (18)$$

The second term, $m_{h,G}(x)$, is called the mean shift. It has been shown [5] that one can find the nearest stationary point of an arbitrary point $x^0$ by updating its position with the following procedure:

$$x^{t+1} = x^t + m_{h,G}(x^t). \qquad (19)$$

Starting from the training points, and depending on the value of the bandwidth parameter h, we can find the modes of the density estimator, which are used as the initial guess for the mean points of the fuzzy sets in the fuzzy network. The width vectors of these fuzzy sets should be predefined by the user. The remaining parts of the algorithm are exactly the same as in the first algorithm: new rules are added based on their error, and a genetic algorithm removes redundant fuzzy rules.
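As an illustration of the mode-seeking step in (18)-(19), the following sketch iterates the mean-shift update from every training point. The choice of a Gaussian profile, the convergence tolerance and the mode-merging radius are our assumptions, not details stated in the paper.

```python
import numpy as np

def mean_shift_modes(X, h, tol=1e-5, max_iter=200):
    """Find density modes by iterating Eq. (19) from every training point.

    With a Gaussian profile, the weights g(||(x - x_i)/h||^2) reduce to
    exp(-||x - x_i||^2 / (2 h^2)); normalization constants cancel in Eq. (18).
    """
    modes = []
    for x in X:
        x = x.astype(float).copy()
        for _ in range(max_iter):
            w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2.0 * h ** 2))
            shift = (w[:, None] * X).sum(axis=0) / w.sum() - x   # Eq. (18)
            x = x + shift                                        # Eq. (19)
            if np.linalg.norm(shift) < tol:
                break
        modes.append(x)
    # Merge converged points that ended up at (numerically) the same mode
    unique = []
    for m in modes:
        if not any(np.linalg.norm(m - u) < h / 2 for u in unique):
            unique.append(m)
    return np.array(unique)
```

The returned mode locations would then play the same role as the KNN-detected optimum points: initial rule centers whose consequents are fitted by least squares.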
3.3 Space partitioning method

Although initializing the network with estimated optimum points using the KNN or Mean-Shift method seems to be useful, we can remove this step and start the rule generation phase directly with the error-based algorithm, so that the initial center vectors of the fuzzy rules are selected from the points with the highest error in the current network. Removing the KNN or Mean-Shift step may only increase the number of neurons generated initially; after the genetic algorithm phase the same result would be obtained.

The third algorithm is a new method which adds new neurons to the network by finding the data points of the training set on which the existing network performs worst. We extend the idea of generating fuzzy sets from input data errors (presented in our first two algorithms): after detecting the data point with the worst error, we divide the input space along one dimension of that point. In other words, in each iteration of the algorithm, the input data point with the highest error is found by comparing the network output with the desired output for each training sample. The rule with the highest coverage of this point is identified, and one component of the chosen center vector is selected at random; its value is inserted as the center of a new fuzzy set in that dimension. The nearest center neighbors on the left and right of each triangular fuzzy center are taken as the left and right sides of the fuzzy sets. The fuzzy sets in the other dimensions remain intact.

So, in each iteration of the algorithm, for $x = (x_1, x_2, \ldots, x_n)$, the data point with the highest error, the rule with the best coverage is found as

$$i = \arg\max_{i} \prod_{j=1}^{D_i} t(x;\, a_j^i, b_j^i, c_j^i), \qquad (20)$$

where t(x; a, b, c) is a triangular-shaped membership function and $D_i$ is the number of fuzzy sets in the i-th dimension; $a_j^i$, $b_j^i$ and $c_j^i$ are three scalars defining the left, right and center positions of the j-th triangular fuzzy set in the i-th dimension, respectively. For fuzzy rule i, $C^i = [c_1^i, c_2^i, \ldots, c_n^i]^T$ represents the center vector, and $A^i = [a_1^i, a_2^i, \ldots, a_n^i]^T$ and $B^i = [b_1^i, b_2^i, \ldots, b_n^i]^T$ represent the left and right vectors of the triangular fuzzy sets corresponding to rule i. By replacing one randomly selected dimension of rule i, say r, with the corresponding value of x, a new fuzzy rule with the following center vector is generated:

$$C^{M+1} = [c_1^{M+1}, \ldots, c_r^{M+1}, \ldots, c_n^{M+1}], \qquad (21)$$

where

$$c_j^{M+1} = \begin{cases} c_j^i & j \neq r, \\ x_r & j = r. \end{cases} \qquad (22)$$

The corresponding values in $A^{M+1}$ and $B^{M+1}$ should be updated too. This can be done by finding the position of $x_r$ among the center points in dimension r, setting $a_r^{M+1}$ to the largest center value to the left of $x_r$ and $b_r^{M+1}$ to the smallest center value to the right of $x_r$. Similarly, the other $A^j$ and $B^j$ need to be updated as new sets are inserted. This can be seen in Figs. 4 and 5.

Fig. 4  Two fuzzy sets for each dimension are produced in the last iteration of the algorithm. A training point $x^* = (x_1^*, x_2^*)$ with the highest error is represented in each dimension
Fig. 5  The second dimension is selected randomly and a new triangular-shaped fuzzy set B3 is inserted at $x_2^*$; the left and right sides of the fuzzy sets are updated
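One insertion step of this method could be sketched as below. The rule representation (a dict of left/right/center arrays per rule), the random dimension choice of the basic variant, and the handling of boundary cases with no left or right neighbor are our own assumptions for illustration.

```python
import numpy as np
import random

def triangular(x, a, b, c):
    """Triangular membership: a = left foot, b = right foot, c = peak."""
    if x < a or x > b:
        return 0.0
    if x == c:
        return 1.0
    return (x - a) / (c - a) if x < c else (b - x) / (b - c)

def insert_fuzzy_set(x_worst, rules):
    """One iteration of the space-partitioning method (Eqs. (20)-(22)).

    rules : list of dicts with keys 'a', 'b', 'c', each an (n,) array of
            left/right/center positions of the rule's triangular sets.
    Returns the rule list with one new rule appended.
    """
    n = len(x_worst)
    # Eq. (20): pick the rule that covers the worst-error point best
    def coverage(rule):
        return np.prod([triangular(x_worst[j], rule['a'][j], rule['b'][j], rule['c'][j])
                        for j in range(n)])
    best = max(rules, key=coverage)

    # Eqs. (21)-(22): copy the best rule and replace one random dimension r
    r = random.randrange(n)
    new_rule = {k: best[k].copy() for k in ('a', 'b', 'c')}
    new_rule['c'][r] = x_worst[r]
    # Left/right feet come from the neighbouring centers in dimension r
    centers_r = sorted(rule['c'][r] for rule in rules)
    left = [c for c in centers_r if c < x_worst[r]]
    right = [c for c in centers_r if c > x_worst[r]]
    new_rule['a'][r] = max(left) if left else x_worst[r]
    new_rule['b'][r] = min(right) if right else x_worst[r]
    rules.append(new_rule)
    return rules
```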
This algorithm can be improved by employing a heuristic method for choosing the dimension in which the space division should take place. In order to cover the input space better, we need to keep the fuzzy sets far away from each other. So, for the input data point x, the dimension whose component has the greatest distance to the existing fuzzy sets is selected to be replaced in rule i. By using this alternative method we can cover the input space with a small number of fuzzy sets, since the generated rules share many fuzzy sets.

There is no rule reduction phase in this method, so the computational cost of the algorithm is much lower than that of the previous two methods. Also, since we update the values of the fuzzy sets, there is no need for normalization either. The comparison of this method with the KNN and Mean-Shift methods is presented in the simulation section.

4 Simulation

In this section the performance of the proposed networks is evaluated using two examples: a nonlinear two-input function and a nonlinear dynamic system identification problem. We also compare our results with other networks.

4.1 Nonlinear sinc function

The function is defined as

$$f(x, y) = \frac{\sin(x)\sin(y)}{xy}, \quad x \in [-10, 10],\ y \in [-10, 10]. \qquad (23)$$

A set of 121 data points is sampled uniformly from the function as the training set, and a set of 121 data points is sampled as the testing set. Figure 6 shows the surface obtained from the training points. We set K = 20 and the desired error τ to 0.01. The genetic algorithm is run with a population size of 20 for 10 generations. Table 1 shows our results and the comparison with other algorithms; the resulting output is shown in Fig. 7. By employing the alternative method for rule generation, the number of fuzzy sets can be reduced from 28 to 25, although the number of rules increases to 24, with a training RMSE of about 0.0161.

Fig. 6  Sinc surface from training points
Fig. 7  Sinc surface from simulation

Table 1  Comparison results for the sinc function

  Algorithm                  No. neurons  No. parameters  Training RMSE  Testing RMSE
  ANFIS                      16           72              –              –
  SOFNN                      14           68              0.0217         0.0860
  SOFNNGA                    14           76              0.0173         0.0567
  Mean shift method          18           90              0.0163         0.0571
  KNN method                 14           70              0.0168         0.0231
  Space partitioning method  24           97              0.0161         0.2603

4.2 Nonlinear dynamic system identification

The system is defined as

$$y(t + 1) = \frac{y(t)\, y(t - 1)\, [y(t) + 2.5]}{1 + y^2(t) + y^2(t - 1)} + u(t), \quad t \in [1, 200],\ y(1) = 0,\ y(0) = 0,\ u(t) = \sin\!\left(\frac{2\pi t}{25}\right), \qquad (24)$$

which can be described as a three-input, one-output function:

$$\hat{y}(t + 1) = f(y(t), y(t - 1), u(t)). \qquad (25)$$

A set of 200 data points is selected as the training set and 200 as the testing set. Figure 8 shows the surface obtained from the training points. Table 2 shows our results and the comparison with other algorithms; the resulting output is shown in Fig. 9. With the second method, the number of fuzzy sets is reduced from 12 to 11, although the number of rules increases to 9, with a training RMSE of about 0.0065.

Fig. 8  NDL graph from training points
Fig. 9  NDL graph surface from simulation

Table 2  Comparison results for nonlinear dynamic system identification

  Algorithm                  No. neurons  No. parameters  Training RMSE  Testing RMSE
  OLS                        65           326             0.0288         –
  RBFAFS                     35           280             0.1384         –
  DFNN                       6            48              0.0283         –
  GDFNN                      6            48              0.0241         –
  Farag's model              75           48              0.193          0.201
  SOFNN                      5            46              0.0157         0.0151
  SOFNNGA                    4            34              0.0159         0.0146
  Khayat's model [11]        4            34              0.0147         0.0141
  Mean shift method          5            35              0.0137         0.0127
  KNN method                 4            28              0.0150         0.0131
  Space partitioning method  9            38              0.0065         0.0055
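For reference, the two benchmark data sets of (23)-(25) can be generated as sketched below. The exact sampling details (an 11 × 11 uniform grid for the sinc function, and the handling of the initial conditions of the dynamic system) are our assumptions, since the paper does not spell them out.

```python
import numpy as np

def sinc2d_data(points_per_axis=11):
    """Uniform grid sample of the two-input sinc function, Eq. (23).
    An 11 x 11 grid gives 121 points, as used in Sect. 4.1 (assumed)."""
    xs = np.linspace(-10, 10, points_per_axis)
    grid = np.array([(x, y) for x in xs for y in xs])
    def sinc(v):                                  # unnormalized sinc, sinc(0) = 1
        return np.where(v == 0, 1.0, np.sin(v) / np.where(v == 0, 1.0, v))
    return grid, sinc(grid[:, 0]) * sinc(grid[:, 1])

def dynamic_system_data(T=200):
    """Simulate the nonlinear dynamic system of Eq. (24) and arrange it as
    the three-input, one-output mapping of Eq. (25)."""
    y = np.zeros(T + 2)                           # y(0) = y(1) = 0
    u = np.sin(2 * np.pi * np.arange(1, T + 1) / 25)
    for t in range(1, T + 1):
        y[t + 1] = (y[t] * y[t - 1] * (y[t] + 2.5)) / (1 + y[t] ** 2 + y[t - 1] ** 2) + u[t - 1]
    X = np.column_stack([y[1:T + 1], y[0:T], u])  # inputs: y(t), y(t-1), u(t)
    return X, y[2:T + 2]                          # target: y(t+1)
```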
5 Discussion

In this section, we discuss the role of K in the KNN method and compare the three introduced learning algorithms.

5.1 The role of K in the KNN method

The main parameter of the KNN method that must be set by the user is K, which specifies how many neighboring points are compared with each training point when constructing the initial fuzzy network. It is easy to see that reducing K causes more local optimum points to be found in the training data, so the initial structure of the fuzzy network gets larger. In this section we examine the effect of changing the parameter K before and after the rule reduction phase. To this end, we fix the desired error to a constant value and evaluate the results on a specific training set for different values of K.

Figure 10 shows the number of rules for different values of K for the nonlinear sinc function, before and after the rule reduction phase. As the value of K increases, the number of rules obtained in the first phase decreases. In contrast, after applying the genetic algorithm, the number of rules is almost invariant under changes of K. This implies that, regardless of the value of K, the proposed learning algorithm generates nearly the same number of rules.

Fig. 10  The number of rules for different values of K, before and after the rule reduction phase

5.2 Explanation and comparison between algorithms

As can be seen in the simulation section, compared with the latest hybrid learning algorithms proposed in the literature, for a fixed number of neurons our first two algorithms can estimate the benchmark functions with a lower RMSE. At the same time, since we choose a fixed variance value for our Gaussian fuzzy sets, the number of parameters is reduced and no parameter tuning algorithm needs to be applied. So, in applications where one needs a small network with a low RMSE, the first two methods can be employed, and they outperform other neuro-fuzzy approaches in both accuracy and network compactness. The third algorithm is most suitable when a fast yet reasonably accurate learning algorithm is needed and the compactness of the network is not a priority. The three introduced algorithms are compared below based on their learning accuracy, generalization and computational cost.

Learning accuracy  As can be seen from the results, the KNN approach outperforms the other algorithms in learning the training data. The Mean-Shift algorithm comes next; it performs much better than the Space Partitioning algorithm but cannot achieve a lower RMSE than the KNN method with the same number of fuzzy rules. The low performance of the Space Partitioning method can be explained by the fact that it does not employ the clustering and genetic algorithm phases. The comparison between the KNN and Mean-Shift methods is not straightforward: both try to find the mode points of the function, but since density estimation in the Mean-Shift method is done through a restricted kernel shape, it has lower flexibility in defining the initial fuzzy centers than the KNN method.

Generalization  The proposed algorithms perform differently on test data in the two simulated examples. The KNN and Mean-Shift algorithms generalize well on both the sinc function and the nonlinear dynamic system, but the Space Partitioning method suffers from overfitting on the sinc function. The overfitting occurs because of the precise adjustment of the left and right sides of the triangular fuzzy sets in this method. As the number of fuzzy sets increases, the generalization ability of this method decreases, and in comparison with the other approaches a higher RMSE is obtained for the same number of fuzzy rules.
Computational cost  One important factor in comparing different function approximation approaches is their time and space complexity, which can become a serious problem when the algorithm is used in real applications. In approximating some complex functions, one may decide to trade some accuracy for a simpler learning algorithm. Among the three proposed algorithms, the Space Partitioning algorithm has the lowest computational cost: it does not perform any initialization phase for finding the initial fuzzy rules, and there is no rule reduction phase at the end. It is therefore a good candidate when having the lowest number of rules is not the user's first priority.

6 Conclusions

Three new hybrid learning algorithms for Takagi-Sugeno-Kang fuzzy systems, based on K-nearest neighbor, the Mean-Shift procedure and space partitioning, are proposed. The algorithms are simple, have low computational costs, and are effective in approximating nonlinear functions with sufficient accuracy and a small number of rules. It is shown that, since we choose a fixed variance value for our Gaussian fuzzy sets, the number of parameters is reduced and no parameter tuning algorithm needs to be applied during the algorithm. This makes the algorithms faster than the latest hybrid learning algorithms proposed in the literature.

References

1. Buckley JJ, Hayashi Y (1994) Fuzzy neural networks: a survey. Fuzzy Sets Syst 1–13
2. Buckley JJ, Hayashi Y (1995) Neural networks for fuzzy systems. Fuzzy Sets Syst 265–276
3. Bunke H, Kandel A (2000) Neuro-fuzzy pattern recognition. World Scientific, Singapore
4. Chuang C, Su S, Chen S (2001) Robust TSK fuzzy modeling for function approximation with outliers. IEEE Trans Fuzzy Syst 9
5. Comaniciu D, Meer P (1997) A robust analysis of feature spaces: color image segmentation. In: Proc 1997 IEEE conf computer vision and pattern recognition, pp 750–755
6. Dickerson JA, Kosko B (1996) Fuzzy function approximation with ellipsoidal rules. IEEE Trans Syst Man Cybern 26(4):542–560
7. Er MJ, Wu S (2002) A fast learning algorithm for parsimonious fuzzy neural systems. Fuzzy Sets Syst 337–351
8. Fukunaga K, Hostetler LD (2002) The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans Pattern Anal Mach Intell 24:603–619
9. Gonzalez J, Rojas I, Pomares H, Ortega J, Prieto A (2002) New clustering technique for function approximation. IEEE Trans Neural Netw 13:132–152
10. Jang JSR (1993) ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern 665–684
11. Khayat et al (2009) A novel hybrid algorithm for creating self-organizing fuzzy neural networks. Neurocomputing
12. Klawonn F, Keller A (1998) Grid clustering for generating fuzzy rules. In: European congress on intelligent techniques and soft computing, Aachen, Germany, pp 1365–1369
13. Klawonn F, Kruse R (1997) Constructing a fuzzy controller from data. Fuzzy Sets Syst 85:177–193
14. Kosko B (1994) Fuzzy systems as universal approximators. IEEE Trans Comput 43:1329–1333
15. Kroll A (1996) Identification of functional fuzzy models using multi-dimensional reference fuzzy sets. Fuzzy Sets Syst 80:149–158
16. Kulkarni AD, Cavanaugh CD (2000) Fuzzy neural network models for classification. Appl Intell 12:207–215. doi:10.1023/A:1008367007808
17. Leng G, McGinnity T (2006) Design for self organizing fuzzy neural network based on genetic algorithm. IEEE Trans Fuzzy Syst
18. Leng G, Prasad G, McGinnity TM (2006) An on-line algorithm for creating self-organizing fuzzy neural networks. Neural Netw 19:974
19. Lin S-F, Chang J-W, Hsu Y-C (2010) A self-organization mining based hybrid evolution learning for TSK-type fuzzy model design. Appl Intell, December
20. Nauck D, Kruse R (1997) Function approximation by NEFPROX. In: Proc second European workshop on fuzzy decision analysis and neural networks for management, planning, and optimization, Dortmund, pp 160–169
21. Nauck D, Kruse R (1999) Neuro-fuzzy systems for function approximation. Fuzzy Sets Syst 101:261–271
22. Park B-J, Pedrycz W, Oh S-K (2008) Polynomial-based radial basis function neural networks (P-RBF NNs) and their application to pattern classification. Appl Intell 32(1):27–46
23. Pedrycz W (1996) Fuzzy modeling: paradigms and practice. Springer, Berlin, p 205
24. Schilling RJ, Carroll JJ, Al-Ajlouni AF (2001) Approximation of nonlinear systems with radial basis function neural networks. IEEE Trans Neural Netw 1–15
25. Takagi T, Sugeno M (1985) Fuzzy identification of systems and its applications to modeling and control. IEEE Trans Syst Man Cybern 15:116–132
26. Wang JS, Lee CSG (2001) Efficient neuro-fuzzy control systems for autonomous underwater vehicle control. In: IEEE international conference on robotics and automation, pp 2986–2991
27. Wang L-X (1997) A course in fuzzy systems and control. Prentice-Hall, New York
28. Wu S, Er MJ (2000) Dynamic fuzzy neural networks - a novel approach to function approximation. IEEE Trans Syst Man Cybern 358–364
29. Wu S, Er MJ, Gao Y (2001) A fast approach for automatic generation of fuzzy rules by generalized dynamic fuzzy neural networks. IEEE Trans Fuzzy Syst 578–594

Hamed Malek received the B.Sc. in Mathematics from Sharif University of Technology, Iran, in 2005 and the M.Sc. in Computer Science from Sharif University of Technology, Iran, in 2007. Currently, he is a Ph.D. candidate in the Computer Engineering Department of Amirkabir University of Technology (Tehran Polytechnic). His research interests include artificial immune systems, fuzzy systems, mathematical biology, evolutionary algorithms and multi-agent systems.

Mohammad Mehdi Ebadzadeh received the B.Sc. in Electrical Engineering from Sharif University of Technology, Iran, in 1991, the M.Sc. in Machine Intelligence and Robotics from Amirkabir University of Technology, Iran, in 1995, and the Ph.D. in Machine Intelligence and Robotics from Télécom ParisTech in 2004. Currently, he is an associate professor in the Computer Engineering Department of Amirkabir University of Technology (Tehran Polytechnic). His research interests include evolutionary algorithms, fuzzy systems, neural networks, artificial immune systems, robotics and artificial muscles.

Mohammad Rahmati received the M.Sc. in Electrical Engineering from the University of New Orleans, USA, in 1997 and the Ph.D. degree in Electrical and Computer Engineering from the University of Kentucky, Lexington, KY, USA, in 2003. Currently, he is an associate professor in the Computer Engineering Department at Amirkabir University of Technology (Tehran Polytechnic). His research interests include pattern recognition, image processing, bioinformatics, video processing, and data mining.