
Appl Intell
DOI 10.1007/s10489-011-0327-7
Three new fuzzy neural networks learning algorithms based
on clustering, training error and genetic algorithm
Hamed Malek · Mohammad Mehdi Ebadzadeh ·
Mohammad Rahmati
© Springer Science+Business Media, LLC 2011
Abstract Three new learning algorithms for Takagi-Sugeno-Kang fuzzy systems based on training error and genetic algorithm are proposed. The first two algorithms consist of two phases. In the first phase, the initial structure of the neuro-fuzzy network is created by estimating the optimum points of the training data in input-output space using the KNN method (for the first algorithm) or the Mean-Shift method (for the second algorithm), and new neurons are then added by an error-based algorithm. In the second phase, redundant neurons are recognized and removed using a genetic algorithm. The third algorithm builds the network in a single phase using a modified version of the error-based algorithm used in the first two methods. The KNN method is shown to be largely invariant to the parameter K of the KNN algorithm, and in two simulated examples it outperforms other neuro-fuzzy approaches in both performance and network compactness.
Keywords Fuzzy neural network · Learning algorithm ·
Genetic algorithm · Clustering
1 Introduction
It has been shown that a fuzzy system can approximate any
continuous real function defined on a compact domain [14]
by covering its function graph in input-output space using a
set of if-then fuzzy rules. Theoretically, these fuzzy rules can always be discovered, but in practice we may have no idea of how to initialize them. Thus, it is crucial to have an adaptive fuzzy system which can produce the required rules automatically.
H. Malek (✉) · M.M. Ebadzadeh · M. Rahmati
Tehran, Iran
e-mail: [email protected]
Adaptivity in fuzzy systems can be achieved by the integration of fuzzy systems and neural networks. Both approaches are widely used for function approximation, and each has its own drawbacks. Neural networks are effective on problems with enough training data, but their network structure and operation are difficult to describe. Fuzzy systems, on the other hand, are not capable of learning from the input data, and the system's inputs and outputs should be presented linguistically, so for incomplete or wrong rules, system tuning is not a straightforward task. Therefore, since the drawbacks of these two approaches seem to be complementary, it is advantageous to combine fuzzy systems and neural networks into one integrated system and benefit from both at once [3].
Different approaches have been proposed for combining neural networks and fuzzy systems [1, 2, 16, 19]. In general, one can find two important classes of these networks in the literature: cooperative and hybrid fuzzy neural networks [20].
In the cooperative model, a neural network or a neural learning algorithm is employed as a preprocessing phase to adjust the fuzzy parameters optimally. This learning process can be applied by learning fuzzy sets, learning fuzzy rules, adapting fuzzy sets or scaling fuzzy rules [23].
In hybrid fuzzy neural networks (FNNs), the fuzzy system is integrated into the neural network. The structure of this model is similar to that of neural networks, so some consider it a special kind of neural network. The learning phase in this model is usually carried out using one or a combination of different algorithms. The neuron weights in this model correspond to the rules and their inputs and outputs in the fuzzy system.
One of the widely used hybrid networks is ANFIS (adaptive-network-based fuzzy inference system) [10], where the system is constructed based on both human knowledge and stipulated input-output data pairs. ANFIS implements the Takagi-Sugeno-Kang fuzzy system [25] and has been widely used in modeling and controlling nonlinear systems [24]. A mixture of gradient descent and the least-squares method is used in its learning procedure, and it has yielded remarkable performance in comparison with various approaches. However, ANFIS suffers from the curse of dimensionality as the number of input dimensions gets larger. Thus, in the learning algorithm, before or during the generation of rules, a method for reducing the number of fuzzy rules should be employed. Although it has been shown that with the method used in ANFIS the same performance can be achieved when a smaller number of fuzzy rules is selected for grid-type partitioning [21], the curse of dimensionality is still a significant obstacle when the number of fuzzy rules increases.
Various methods have been proposed in the literature for optimizing the number of rules in hybrid neuro-fuzzy networks. In some approaches, fuzzy rules are determined by partitioning the input space with unsupervised algorithms [6, 15]. Different clustering algorithms such as FCM have been proposed for defining the structure of the fuzzy rules [13, 22]. The output is then approximated by a combination of linear functions on each partition. Defining fuzzy rules on the input data alone may not perform well, since it does not take the output values into account; these may have a large variance within one cluster in comparison with their corresponding inputs, and consequently the system does not provide an accurate approximation on the input data. Some authors have tried to overcome this problem by performing the clustering in input-output space [9, 12, 26]. They mostly use well-known clustering algorithms such as HCM or FCM. However, the number of clusters must be provided by an expert, and there is no guarantee of finding an optimal solution, since the resulting network is highly dependent on the selection of the initial cluster centers [18]. Furthermore, clustering algorithms tend to define clusters based on the closeness of data points instead of the similarity of their behavior, which can consequently lead to the generation of redundant fuzzy rules [4].
The above-mentioned methods require an expert to set the number of partitions, or equivalently the number of fuzzy rules, before the main algorithm begins. However, we usually do not know the optimal number of fuzzy rules. Therefore, dynamic adaptive methods have been proposed which mainly start with a small number of neurons and, during the learning process, grow the number of fuzzy rules until the required precision is achieved [4, 7, 17, 18, 28]. In [7, 28] an online self-organizing dynamic fuzzy neural network based on extended radial basis function neural networks has been proposed, in which the structure and parameters can be adapted online without partitioning the input space a priori. Based on this dynamic fuzzy neural network (DFNN) and its extended version, the generalized dynamic fuzzy neural network (GDFNN) [29], a self-organizing fuzzy neural network (SOFNN) is proposed in [18], which designs an online learning scheme using a modified recursive least squares (RLS) method for parameter identification and a structure learning algorithm that determines the center and width vectors of the elliptical basis function (EBF) neurons by a criterion based on the system error and firing strength, to increase the ability of the network to cover the input data. An extended version of SOFNN named self-organizing fuzzy neural network based on genetic algorithm (SOFNNGA) is also proposed in [17], where the initial network structure is first built using a method based on a geometric growing criterion and the ε-completeness of fuzzy rules, and then a combination of a genetic algorithm, backpropagation and recursive least squares (RLS) is employed to adjust the system parameters.
In this paper we propose three new algorithms for finding the optimum number of rules (or neurons) in function
approximation.
In the first algorithm, during the first phase, the initial structure identification is performed using the K-nearest neighbor (KNN) algorithm, which generates the initial network structure by estimating the optimum points of the simulated function. Then, to ensure that the generated network can approximate the training data with a root mean square error (RMSE) less than the threshold requested by the user, an error-based algorithm is applied which adds new neurons based on the worst error obtained on the training data. In the second phase, a genetic algorithm is utilized which removes redundant neurons while keeping the RMSE near the desired threshold.
The second algorithm performs the same as the first one, but the KNN part is replaced with the Mean-Shift algorithm [8].
A simpler method based on the error algorithm is proposed as the third algorithm, which can reduce the number of fuzzy sets in the fuzzy rules. Since this method merely performs the error algorithm and includes no genetic algorithm or initial rule generation, the network is constructed with a lower computational cost.
The performance of these three algorithms is evaluated in the simulation section, and their superiority over other algorithms in terms of network compactness and accuracy is presented.
The remainder of this paper is organized as follows. In Sect. 2, an overview of function approximation using fuzzy systems is presented. The proposed network structure and learning algorithms are described in Sect. 3. Simulation results are presented in Sect. 4. A discussion on the role of K and a comparison between the three proposed algorithms are given in Sect. 5. Finally, conclusions are presented in Sect. 6.
2 Fuzzy function approximation
The nonlinear function approximation problem can be modeled by employing a fuzzy rule base with a set of IF-THEN rules defined as follows:

$$R^i: \text{IF } x_1 \text{ is } A_1^i \text{ and } \ldots \text{ and } x_n \text{ is } A_n^i \text{ THEN } y \text{ is } B^i \qquad (1)$$
where $A_j^i$ and $B^i$ are fuzzy sets, and $x = (x_1, \ldots, x_n)^T$ and $y$ are the input and output variables of the fuzzy system, respectively.
One can find a mapping from a fuzzy set $A$ to a fuzzy set $B$ using the product inference engine, where we have:

$$\mu_{B}(y) = \max_{i=1,\ldots,m} \sup_{x} \Big( \mu_{A}(x) \prod_{j=1}^{n} \mu_{A_j^i}(x_j)\, \mu_{B^i}(y) \Big). \qquad (2)$$
Fuzzification of a real-valued point $x^*$ can be done using the singleton fuzzifier, which maps $x^*$ into a fuzzy singleton $A$ with membership value 1 at $x^*$ and 0 at all other points:

$$\mu_A(x) = \begin{cases} 1 & x = x^*, \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$
The defuzzifier is defined as a mapping from a fuzzy set $B$ to a crisp point $y^*$. Different defuzzifiers are defined in the literature; one of them is the center average defuzzifier. Let $\bar{y}^i$ be the center of the $i$-th fuzzy set and $w_i$ its height. The center average defuzzifier determines $y^*$ as

$$y^* = \frac{\sum_{i=1}^{m} \bar{y}^i w_i}{\sum_{i=1}^{m} w_i}. \qquad (4)$$
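As an illustration of how (2)-(4) combine in practice, the following minimal sketch (not part of the original paper) evaluates a fuzzy system with a singleton-fuzzified input, product inference over Gaussian membership functions, and center-average defuzzification; the helper names and the toy rule parameters are assumptions made for the example.

import numpy as np

def gaussian_mf(x, centers, widths):
    # Gaussian membership values of the input components in each rule's fuzzy sets
    return np.exp(-0.5 * ((x - centers) / widths) ** 2)

def fuzzy_output(x, centers, widths, y_bar):
    # product inference: firing strength of each rule for the singleton input x (cf. (2))
    w = np.prod(gaussian_mf(x, centers, widths), axis=1)
    # center-average defuzzification (cf. (4))
    return np.sum(y_bar * w) / np.sum(w)

# toy example: two rules over a 2-D input space
centers = np.array([[0.0, 0.0], [1.0, 1.0]])   # one row of set centers per rule
widths = np.full((2, 2), 0.5)                  # one row of set widths per rule
y_bar = np.array([0.0, 1.0])                   # centers of the consequent fuzzy sets
print(fuzzy_output(np.array([0.2, 0.1]), centers, widths, y_bar))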
Fig. 1 Fuzzy neural network structure

For a fuzzy set $B^i$ with center $\bar{y}^i$, the fuzzy system with fuzzy rule base (1), product inference engine (2), singleton fuzzifier (3) and center average defuzzifier (4) is of the following form:

$$f(x) = \frac{\sum_{i=1}^{m} \bar{y}^i \prod_{j=1}^{n} \mu_{A_j^i}(x_j)}{\sum_{i=1}^{m} \prod_{j=1}^{n} \mu_{A_j^i}(x_j)} \qquad (5)$$

where $x$ is the input and $f(x)$ is the output of the fuzzy system [27].

The corresponding network of this fuzzy model can be built as shown in Fig. 1. The network has four layers. The output of each node in the first layer is equal to $\mu_{A_j^i}(x_j)$, the membership value of the fuzzy set $A_j^i$.

Nodes in the second layer calculate the product of the membership values of the inputs over all dimensions for each rule:

$$g_i = \prod_{j=1}^{n} \mu_{A_j^i}(x_j). \qquad (6)$$

The third layer is called the normalization layer, where the output of node $i$ is calculated as follows:

$$h_i = \frac{g_i}{\sum_{i=1}^{m} g_i}. \qquad (7)$$

Finally, the last layer (or output layer) calculates the summation of its input values from the previous layer:

$$y = \sum_{i=1}^{M} v_i h_i, \qquad (8)$$

where the $v_i$ are called the consequent parameters, which should be learned using the least-squares or gradient descent method.

For the Takagi-Sugeno-Kang fuzzy model, one layer is added before the output layer, which replaces the consequent parameters with a linear combination of the inputs. Thus, the output of the network is calculated as follows:

$$y = \sum_{i=1}^{M} (c_0^i + c_1^i x_1 + \cdots + c_n^i x_n) h_i. \qquad (9)$$

In recent neuro-fuzzy networks, parameters are determined through a hybrid learning algorithm in which each epoch consists of a forward and a backward pass. In the forward pass, all training data are presented to the network and the output weights are identified by the least-squares algorithm. Recursive least squares can also be employed to determine the output layer weights [14].

3 Proposed algorithms

In this paper, three algorithms are proposed for Takagi-Sugeno-Kang modeling using a fuzzy neural network. The first algorithm builds the initial structure of the network using KNN and keeps adding rules to cover the data points with the highest error until a satisfying RMSE is achieved. Next, the number of rules is optimized using a genetic algorithm, which tries to reduce the number of neurons while keeping the network error as low as the defined threshold (Fig. 2). The second algorithm does the same but uses the Mean-Shift algorithm for finding a good set of initial rules. The third algorithm removes the first and last parts of these two algorithms and only performs the second part, which adds neurons until the desired RMSE is achieved (Fig. 3).

Fig. 2 Flowchart for the first algorithm

Fig. 3 Flowchart for the third algorithm

The network architecture used in this paper is the same as the standard network architecture of ANFIS, shown in Fig. 1. The details of the three algorithms are presented as follows.
3.1 KNN method
In a learning algorithm, the number of rules and the values of the center and width vectors for each rule should be identified. The first learning algorithm is designed in two phases: rule generation and rule reduction. In the rule generation phase, the K-nearest-neighbor algorithm followed by an error-based algorithm is used. In the rule reduction phase, a genetic algorithm is applied to the network. After adding or removing each rule, the least-squares method is applied to the network to identify the best consequent parameters at each step.
3.1.1 Rule generation phase
In the first phase, initial fuzzy rules are generated using the K-nearest neighbor (KNN) algorithm. It is intuitively obvious that, given enough samples of a nonlinear function, one can estimate the local optima of the function by finding local optima among uniformly distributed samples. The KNN algorithm tries to locate these local optima by examining each point together with its K nearest neighbors.
For a training data point $x = (x_1, x_2, \ldots, x_n)^T$, define $A_x$ as the set of the $K$ training input points with the smallest Euclidean distance to $x$. We say $x$ is an optimum point if its corresponding output value is the highest or lowest among the members of $A_x$.
Thus, for the given training point set $P$, we set the center vectors of the initial fuzzy rules as follows:

$$M = \Big\{ x = (x_1, \ldots, x_n) \in P \,\Big|\, y(x) < \min_{x' \in A_x} y(x') \ \text{or}\ y(x) > \max_{x' \in A_x} y(x') \Big\} \qquad (10)$$
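A minimal sketch of the rule-generation criterion (10) is given below, under the assumption that the K nearest neighbours are taken among the remaining training inputs in Euclidean distance; the function name and the toy data are illustrative only.

import numpy as np

def knn_optimum_points(X, y, K):
    # indices of training points whose output is the largest or smallest among
    # their K nearest neighbours; these serve as initial rule centers (cf. (10))
    optima = []
    for i, x in enumerate(X):
        d = np.linalg.norm(X - x, axis=1)
        d[i] = np.inf                          # exclude the point itself
        neighbours = np.argsort(d)[:K]
        if y[i] > y[neighbours].max() or y[i] < y[neighbours].min():
            optima.append(i)
    return np.array(optima)

# toy usage on a sampled 1-D sine wave (boundary points may also be flagged)
X = np.linspace(0, 2 * np.pi, 50).reshape(-1, 1)
y = np.sin(X).ravel()
print(knn_optimum_points(X, y, K=5))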
In this algorithm, a Gaussian function with a mean and a width is selected as the membership function of the fuzzy sets. The width vectors are set to a predefined value $\sigma_0 = (\sigma_{01}, \ldots, \sigma_{0n})$, which should be provided by the user before the learning process. A good heuristic can also be employed which uses the standard deviation of the K nearest neighbors as the width values.
After the generation of the initial fuzzy rules with the KNN method, the consequent parameters should be learned. These consequent parameters can be identified using the least-squares algorithm. In Takagi-Sugeno-Kang modeling we have $(n + 1) \cdot M$ parameters in the consequent part of the fuzzy rules. Defining the consequent parameters as the vector $C = (c_0^1, \ldots, c_n^1, \ldots, c_0^i, \ldots, c_n^i, \ldots, c_0^M, \ldots, c_n^M)^T$, we can estimate $C$ using the least-squares algorithm; that is:

$$C = (H^T H)^{-1} H^T Y. \qquad (11)$$
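The following sketch shows one way to assemble $H$ and solve (11) for a first-order TSK network with Gaussian membership functions, where each row of $H$ stacks the normalized firing strengths multiplied by $[1, x_1, \ldots, x_n]$ (the regressors of (9)); the function names are ours, and np.linalg.lstsq is used instead of the explicit inverse for numerical safety.

import numpy as np

def normalized_firing_strengths(X, centers, widths):
    # g_i and h_i of (6)-(7) for every training sample, with Gaussian membership functions
    g = np.array([np.prod(np.exp(-0.5 * ((x - centers) / widths) ** 2), axis=1) for x in X])
    return g / g.sum(axis=1, keepdims=True)

def least_squares_consequents(X, Y, centers, widths):
    # estimate C = (H^T H)^{-1} H^T Y of (11); C has (n + 1) * M entries
    h = normalized_firing_strengths(X, centers, widths)
    H = np.array([np.outer(h[k], np.concatenate(([1.0], X[k]))).ravel() for k in range(len(X))])
    C, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return C

# toy usage: two rules, 2-D input
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 2))
Y = X[:, 0] ** 2 + X[:, 1]
centers = np.array([[-0.5, 0.0], [0.5, 0.0]])
widths = np.full((2, 2), 0.7)
print(least_squares_consequents(X, Y, centers, widths).shape)   # ((n + 1) * M,) = (6,)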
New rules based on worst RMSE The newly built network now estimates the unknown function with an RMSE that can be higher or lower than our expectation, depending on the training inputs and the complexity of the function. To achieve more precision, the second stage of this phase can be performed, in which we add new neurons to the existing network based on the training data error. The algorithm enters a loop where, in each round, the point $x = (x_1, \ldots, x_n)$ from the training set with the highest error (produced by the built network) is selected, and a new neuron (rule) is added to the network with center vector $x$ and predefined width $\sigma_0 = (\sigma_{01}, \ldots, \sigma_{0n})$. This process of adding new neurons continues until the desired error is obtained.
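A compact sketch of this loop is given below; predict stands for re-fitting the consequent parameters (e.g. with the least-squares step above) and evaluating the current network, and the cap on the number of rules is our own safeguard, not part of the paper.

import numpy as np

def add_rules_by_worst_error(X, Y, centers, widths, sigma0, tau, predict, max_rules=100):
    # grow the rule base until the training RMSE drops below the threshold tau
    while len(centers) < max_rules:
        errors = Y - predict(X, centers, widths)
        rmse = np.sqrt(np.mean(errors ** 2))
        if rmse <= tau:
            break
        worst = np.argmax(np.abs(errors))          # training point with the highest error
        centers = np.vstack([centers, X[worst]])   # new neuron centered at that point
        widths = np.vstack([widths, sigma0])       # predefined width vector sigma_0
    return centers, widths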
3.1.2 Rule reduction phase
After obtaining the required RMSE, the algorithm starts the optimization process, where the number of neurons is reduced by applying a genetic algorithm. In the second step of the first phase, the neuron centers may not have been found optimally, so in this phase the algorithm tries to find and remove redundant neurons.
A variety of methods can be employed to find the optimal subset of neurons for given data. The genetic algorithm is a promising optimization tool which, in contrast to other optimization methods such as gradient descent, can escape local minima. In genetic algorithms, a population of solutions is initially produced in a specific representation, and genetic operators like crossover and mutation are applied to produce the next generations. In each generation, a selection process is applied to identify the parents of the population. In the selection step, a fitness scaling should be chosen which determines how to select individuals from the existing population. Fitness-proportional and rank-based scaling are two examples of fitness scaling methods.
For this problem, a binary representation is chosen where each bit indicates the existence of its corresponding rule: a bit 1 means that the corresponding neuron is used and 0 means it is not. Rank-based scaling is also selected to remove the effect of the spread of raw scores.
The fitness function should be chosen carefully. On one hand, we need to keep the error as low as possible; on the other hand, we wish to reduce the number of neurons. We define our fitness function as follows:

$$\text{fitness} = \begin{cases} \text{RMSE} \cdot M & \text{if RMSE} > 2\tau, \\ L & \text{otherwise,} \end{cases} \qquad (12)$$

where $L$ is the number of ones in the bit string, $\tau$ is the error threshold and $M$ is a large number. By defining the fitness value as above, one can ensure that the RMSE of the resulting network will not exceed twice the threshold value and that the search will converge to networks with the lowest number of rules.
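A bare-bones sketch of the rule-reduction search is shown below; the fitness follows (12), while the selection, crossover and mutation details are simplified placeholders (the paper itself uses rank-based scaling), and rmse_of_subset is an assumed callback that rebuilds the network from the selected rules and returns its training RMSE.

import numpy as np

BIG_M = 1e6   # the large constant M of (12)

def fitness(bits, rmse_of_subset, tau):
    # fitness of a binary rule-selection string (cf. (12)); smaller is better here
    rmse = rmse_of_subset(bits)
    L = int(np.sum(bits))                 # number of ones, i.e. number of kept rules
    return rmse * BIG_M if rmse > 2 * tau else L

def reduce_rules(n_rules, rmse_of_subset, tau, pop_size=20, generations=10, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_rules))
    for _ in range(generations):
        scores = np.array([fitness(ind, rmse_of_subset, tau) for ind in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]    # keep the better half
        children = []
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_rules)
            child = np.concatenate([a[:cut], b[cut:]])        # one-point crossover
            flip = rng.random(n_rules) < 0.05                 # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        pop = np.array(children)
    scores = np.array([fitness(ind, rmse_of_subset, tau) for ind in pop])
    return pop[np.argmin(scores)]                             # best rule subset found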
3.2 Mean-shift method

The second algorithm is similar to the first one, with one exception: the KNN algorithm is replaced with a nonparametric mode-seeking algorithm named Mean-Shift [8], which can find the local optima (modes) of the estimated density function of the training data.
In short, for $n$ input points, the well-known density estimator [8] can be written as

$$\hat{f}_{h,K}(x) = \frac{c_{k,d}}{n h^d} \sum_{i=1}^{n} k\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right), \qquad (13)$$

where $k(\cdot)$ is the kernel profile and $c_{k,d}$ is the corresponding normalization constant, which is assumed strictly positive. The parameter $h$ is the kernel bandwidth and must be provided by the user. The modes of this density function can be found by locating the zeros of the gradient: $\nabla \hat{f}(x) = 0$.

By defining the function $g(x) = -k'(x)$ and using $g(x)$ as a profile, the density gradient estimator is obtained as follows:

$$\hat{\nabla} f_{h,K}(x) = \frac{2 c_{k,d}}{n h^{d+2}} \left[ \sum_{i=1}^{n} g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right) \right] \left[ \frac{\sum_{i=1}^{n} x_i\, g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n} g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)} - x \right]. \qquad (14)$$

By defining the kernel $G$ as

$$G(x) = c_{g,d}\, g(\|x\|^2), \qquad (15)$$

the density gradient estimator can be rewritten as

$$\hat{\nabla} f_{h,K}(x) = \hat{f}_{h,G}(x)\, \frac{2 c_{k,d}}{h^2 c_{g,d}}\, m_{h,G}(x), \qquad (16)$$

where

$$\hat{f}_{h,G}(x) = \frac{c_{g,d}}{n h^d} \sum_{i=1}^{n} g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right) \qquad (17)$$

and

$$m_{h,G}(x) = \frac{\sum_{i=1}^{n} x_i\, g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n} g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)} - x. \qquad (18)$$

The second term, $m_{h,G}(x)$, is called the mean shift. It has been shown [5] that one can find the nearest stationary point of an arbitrary point $x^0$ by updating its position using the following procedure:

$$x^{t+1} = x^t + m_{h,G}(x^t). \qquad (19)$$
Starting from the training points and based on the value of the bandwidth parameter $h$, we can find the modes of the density estimator, which are used as the initial guess for the mean points of the fuzzy sets in the fuzzy network. The width vectors of these fuzzy sets should be predefined by the user.
The remaining parts of the algorithm are exactly the same as in the first algorithm, where new rules are added based on their error and a genetic algorithm removes redundant fuzzy rules.
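The following is a small sketch of the mode-seeking iteration (19) with a Gaussian kernel profile; starting every training point at itself and merging nearby fixed points is our reading of the procedure, and the merge radius h/2 is an assumption.

import numpy as np

def mean_shift_modes(X, h, tol=1e-4, max_iter=200):
    # move every point uphill with the mean-shift vector of (18)-(19) and collect the modes
    modes = []
    for x in X:
        x = x.astype(float).copy()
        for _ in range(max_iter):
            w = np.exp(-0.5 * np.sum(((x - X) / h) ** 2, axis=1))   # g(||(x - x_i)/h||^2)
            shift = (w[:, None] * X).sum(axis=0) / w.sum() - x      # m_{h,G}(x)
            x = x + shift
            if np.linalg.norm(shift) < tol:
                break
        if not any(np.linalg.norm(x - m) < h / 2 for m in modes):   # deduplicate nearby modes
            modes.append(x)
    return np.array(modes)

# toy usage: two well-separated clusters yield roughly two modes
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
print(mean_shift_modes(X, h=1.0))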
3.3 Space partitioning method
Although initializing the network with the optimal points estimated by the KNN or Mean-Shift method seems useful, we can remove this step and start the rule generation phase directly with the error-based algorithm, so that the initial center vectors of the fuzzy rules are selected from the points with the highest error in the current network. Removing the KNN or Mean-Shift algorithm may only increase the number of initial neurons generated in this phase, but after the genetic algorithm phase the same result would be obtained.
The third algorithm is a new method which adds new neurons to the network by finding the data points in the training set with the worst performance under the existing network.
We can extend the idea of generating fuzzy sets from input data errors (presented in our first two algorithms) and, after detecting the data point with the worst error, divide the input space along one dimension of the input data. In other words, in each iteration of the algorithm, the input data point with the highest error is found by comparing the network output with the desired output for each training sample. The rule with the highest covering of this data point is found; one component of its center vector is selected randomly, and the corresponding component of the data point is added as the center of a new fuzzy set in that dimension. The left and right neighboring centers of each triangular fuzzy center are taken as the left and right sides of the fuzzy set. Fuzzy sets in the other dimensions remain intact.
So, in the $i$-th iteration of the algorithm, for $x = (x_1, x_2, \ldots, x_n)$, the data point with the highest error, the rule with the best covering is found as follows:

$$i = \arg\max_i \prod_{j=1}^{D_i} t(x;\, a_j^i, b_j^i, c_j^i) \qquad (20)$$

where $t(x; a, b, c)$ is a triangular-shaped membership function and $D_i$ is the number of fuzzy sets in the $i$-th dimension. $a_j^i$, $b_j^i$ and $c_j^i$ are three scalars defining the left, right and center positions of the $j$-th triangular fuzzy set in the $i$-th dimension, respectively.
For fuzzy rule $i$, $C^i = [c_1^i, c_2^i, \ldots, c_n^i]^T$ represents the center vector, and $A^i = [a_1^i, a_2^i, \ldots, a_n^i]^T$ and $B^i = [b_1^i, b_2^i, \ldots, b_n^i]^T$ represent the left and right vectors of the triangular fuzzy sets corresponding to rule $i$. So, by replacing one randomly selected dimension of rule $i$, say $r$, with the corresponding value in $x$, a new fuzzy rule with the following center vector is generated:

$$C^{M+1} = [c_1^{M+1}, \ldots, c_r^{M+1}, \ldots, c_n^{M+1}] \qquad (21)$$

Fig. 4 Two fuzzy sets for each dimension are produced in the last iteration of the algorithm. A training point $x^* = (x_1^*, x_2^*)$ with the highest error is represented in each dimension

Fig. 5 The second dimension is selected randomly, a new triangular-shaped fuzzy set $B_3$ is inserted at $x_2^*$, and the left and right sides of the fuzzy sets are updated
where

$$c_j^{M+1} = \begin{cases} c_j^i & j \neq r, \\ x_r & j = r. \end{cases} \qquad (22)$$
The corresponding values in $A^{M+1}$ and $B^{M+1}$ should be updated too. One can do this by finding the position of $x_r$ among the center points in dimension $r$, setting $a_r^{M+1}$ to the largest center value to the left of $x_r$, and setting $b_r^{M+1}$ to the smallest center value to the right of $x_r$. Similarly, we need to update the other $A^j$ and $B^j$ vectors as new sets are inserted. This can be seen in Figs. 4 and 5.
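The dimension-wise insertion of (21)-(22) can be sketched as below for a single input dimension; finding the best-covering rule of (20) and updating the neighbouring sets' feet are omitted, and the domain bounds lo/hi are assumptions used when the new center falls outside all existing ones.

def tri_mf(x, a, c, b):
    # triangular membership with left foot a, center c and right foot b
    if x <= a or x >= b:
        return 0.0
    return (x - a) / (c - a) if x <= c else (b - x) / (b - c)

def insert_fuzzy_set(dim_centers, lo, hi, x_r):
    # insert a new triangular set centered at the worst-error component x_r;
    # its feet are the nearest existing centers to the left and right (cf. (22))
    left = [c for c in dim_centers if c < x_r]
    right = [c for c in dim_centers if c > x_r]
    a = max(left) if left else lo
    b = min(right) if right else hi
    dim_centers.append(x_r)
    dim_centers.sort()
    return a, x_r, b

# toy usage: a dimension with two existing centers on [0, 10]; worst-error component at 6.3
centers_dim2 = [2.0, 8.0]
print(insert_fuzzy_set(centers_dim2, lo=0.0, hi=10.0, x_r=6.3))   # (2.0, 6.3, 8.0)
print(tri_mf(5.0, 2.0, 6.3, 8.0))                                 # membership of 5.0 in the new set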
This algorithm can be improved by employing a heuristic method for choosing the dimension in which the space division should take place. In order to have a better covering of the input space, we need to keep the fuzzy sets far away from each other. So, for the input data point $x$, the dimension in which the corresponding component has the largest distance to the other fuzzy sets is selected to be replaced in rule $i$.
By utilizing this alternative method we can cover the input space with a small number of fuzzy sets, since the generated rules have many fuzzy sets in common.
There is no rule reduction phase in this method, so the computational cost of the algorithm is much lower than that of the previous two methods. Also, since we update the values of the fuzzy sets, there is no need for normalization either. The comparison of this method with the KNN and Mean-Shift methods is presented in the simulation section.
Fig. 6 Sinc surface from training points
4 Simulation
In this section the performance of the proposed networks is evaluated using two examples: a nonlinear two-input function and a nonlinear dynamic system identification problem. We also compare our results with those of other networks.
4.1 Nonlinear sinc function
The function is defined as follows:

$$f(x, y) = \frac{\sin(x)\sin(y)}{xy}, \quad x \in [-10, 10],\ y \in [-10, 10]. \qquad (23)$$
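For reference, the training sample described next can be produced as in the following sketch; an 11 × 11 uniform grid on [-10, 10]^2 is our reading of the 121 uniformly sampled points, and np.sinc is used to handle the removable singularity at zero.

import numpy as np

def sinc2d(x, y):
    # f(x, y) = sin(x) sin(y) / (x y) of (23); note np.sinc(t) = sin(pi t) / (pi t)
    return np.sinc(x / np.pi) * np.sinc(y / np.pi)

g = np.linspace(-10, 10, 11)                     # 11 points per axis -> 121 grid points
X1, X2 = np.meshgrid(g, g)
X_train = np.column_stack([X1.ravel(), X2.ravel()])
y_train = sinc2d(X_train[:, 0], X_train[:, 1])
print(X_train.shape, y_train.shape)              # (121, 2) (121,)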
A set of 121 data points is sampled uniformly from the function as the training set, and a set of 121 data points is sampled as the testing set. Figure 6 shows the surface obtained from the training points. We set K = 20 and the desired error τ to 0.01. The genetic algorithm is run with a population size of 20 for 10 generations. Table 1 shows our results and a comparison with other algorithms. The resulting output is shown in Fig. 7.

Table 1 Comparison results for the sinc function

Algorithm | No. neurons | No. parameters | Training RMSE | Testing RMSE
ANFIS | 16 | 72 | – | –
SOFNN | 14 | 68 | 0.0217 | 0.0860
SOFNNGA | 14 | 76 | 0.0173 | 0.0567
Mean shift method | 18 | 90 | 0.0163 | 0.0571
KNN method | 14 | 70 | 0.0168 | 0.0231
Space partitioning method | 24 | 97 | 0.0161 | 0.2603

By employing the alternative method for rule generation, the number of fuzzy sets can be reduced from 28 to 25, although the number of rules increases to 24 with an RMSE of around 0.0161 on the training data.
Fig. 7 Sinc surface from simulation
4.2 Nonlinear dynamic system identification
The system is defined as follows:
$$y(t+1) = \frac{y(t)\,y(t-1)\,[y(t)+2.5]}{1 + y^2(t) + y^2(t-1)} + u(t), \quad t \in [1, 200],$$
$$y(0) = 0, \quad y(1) = 0, \quad u(t) = \sin\!\left(\frac{2\pi t}{25}\right). \qquad (24)$$
Table 2 Comparison results for the nonlinear dynamic system identification

Algorithm | No. neurons | No. parameters | Training RMSE | Testing RMSE
OLS | 65 | 326 | 0.0288 | –
RBFAFS | 35 | 280 | 0.1384 | –
DFNN | 6 | 48 | 0.0283 | –
GDFNN | 6 | 48 | 0.0241 | –
Farag's model | 75 | 48 | 0.193 | 0.201
SOFNN | 5 | 46 | 0.0157 | 0.0151
SOFNNGA | 4 | 34 | 0.0159 | 0.0146
Khayat's model [11] | 4 | 34 | 0.0147 | 0.0141
Mean shift method | 5 | 35 | 0.0137 | 0.0127
KNN method | 4 | 28 | 0.0150 | 0.0131
Space partitioning method | 9 | 38 | 0.0065 | 0.0055

Fig. 8 NDL graph from training points

Fig. 9 NDL graph surface from simulation
This system can be described in the form of a three-input, one-output function:

$$\hat{y}(t+1) = f\big(y(t), y(t-1), u(t)\big). \qquad (25)$$
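A short sketch of simulating (24) and arranging it as the three-input, one-output pairs of (25) is given below; how the separate 200-point test set is drawn is not specified in the paper, so only the training series is generated.

import numpy as np

def simulate_plant(T=200):
    # iterate (24) from y(0) = y(1) = 0 and return pairs ((y(t), y(t-1), u(t)), y(t+1))
    y = np.zeros(T + 2)
    u = np.sin(2 * np.pi * np.arange(T + 1) / 25)
    for t in range(1, T + 1):
        y[t + 1] = y[t] * y[t - 1] * (y[t] + 2.5) / (1 + y[t] ** 2 + y[t - 1] ** 2) + u[t]
    X = np.column_stack([y[1:T + 1], y[0:T], u[1:T + 1]])   # (y(t), y(t-1), u(t)) for t = 1..T
    Y = y[2:T + 2]                                          # y(t+1)
    return X, Y

X_train, Y_train = simulate_plant()
print(X_train.shape, Y_train.shape)   # (200, 3) (200,)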
A set of 200 data points is selected as the training set and 200 data points as the testing set. Figure 8 shows the surface obtained from the training points. Table 2 shows our results and a comparison with other algorithms. The resulting output is shown in Fig. 9.
For this second example, employing the alternative rule generation method reduces the number of fuzzy sets from 12 to 11, although the number of rules increases to 9 with an RMSE of around 0.0065 on the training data.
5 Discussion
In this section, we discuss the role of K in the KNN method and compare the three introduced learning algorithms.
5.1 The role of K in KNN method
The main parameter in the KNN method that should be set by the user is K, which specifies how many neighboring data points should be compared with each training point in the process of constructing the initial fuzzy network. It can easily be seen that by reducing K, more local optimum points in the training data are found and thus the initial structure of the fuzzy network gets bigger. In this section we examine the effect of changing the parameter K before and after the rule reduction phase. To this end, we fix the desired error to a constant value and evaluate the results on a specific training data set for different values of K.
In Fig. 10, the number of rules for different values of K for the nonlinear sinc function, before and after the rule reduction phase, is shown.
As the value of K increases, in the first phase, the number of rules decreases. In contrast, after applying the genetic algorithm, the number of rules is almost invariant under changes of K. This implies that regardless of the value
of K, the proposed learning algorithm generates nearly the same number of rules.

Fig. 10 The number of rules for four values of K
5.2 Explanation and comparison between algorithms
As can be seen in the simulation section, compared to the latest hybrid learning algorithms proposed in the literature, for a fixed number of neurons our first two algorithms can estimate the benchmark functions with a lower RMSE. At the same time, since we choose a fixed variance value for our Gaussian fuzzy sets, the number of parameters is reduced and no parameter tuning algorithm needs to be applied. So, in applications where one needs a small network with a low RMSE, the first two methods can be employed, and they outperform other neuro-fuzzy approaches in both performance and network compactness. The third algorithm is mostly suitable when we need a fast yet accurate learning algorithm and the compactness of the network is not our priority.
The three introduced algorithms are compared below based on their learning accuracy, generalization ability and computational cost.
Learning accuracy As can be seen from the results, the KNN approach outperforms the other algorithms in learning the training data. The Mean-Shift algorithm comes next; it performs much better than the Space Partitioning algorithm, but cannot produce a lower RMSE than the KNN method with the same number of fuzzy rules. The lower performance of the Space Partitioning method can be explained by the fact that it does not employ the clustering and genetic algorithm phases.
The comparison between the KNN and Mean-Shift methods is not easy. Both try to find the mode points of the function. However, since density estimation in the Mean-Shift method is done through a restricted kernel shape, it has lower flexibility in defining the initial fuzzy centers in comparison with the KNN method.
Generalization The proposed algorithms perform differently on the test data of the two simulated functions. The KNN and Mean-Shift algorithms generalize well on both the sinc function and the nonlinear dynamic system, but the Space Partitioning method suffers from overfitting on the sinc function. The overfitting occurs because of the precise adjustment of the left and right sides of the triangular-shaped fuzzy sets in this method. As the number of fuzzy sets increases, the generalization ability of this method decreases and, in comparison with the other approaches, a higher RMSE is obtained with the same number of fuzzy rules.
Computational cost One important factor in comparing different function approximation approaches is their time and space complexity, which can become a serious problem when the algorithm is used in real applications. In the approximation of some complex functions, one may decide to compromise on accuracy by employing a simpler algorithm for learning the input data. Among the three proposed algorithms, the Space Partitioning algorithm has the lowest computational cost. It does not perform any initialization phase for finding the initial fuzzy rules, and there is no rule reduction phase at the end. So, it is a good candidate when having the lowest number of rules is not the user's first priority.
6 Conclusions
Three new hybrid learning algorithms for Takagi-Sugeno-Kang fuzzy systems, based on the K-nearest neighbor algorithm, the Mean-Shift procedure and space partitioning, are proposed. The algorithms are simple, have low computational costs, and are effective in approximating nonlinear functions with sufficient accuracy and a small number of rules. It is shown that, since we choose a fixed variance value for our Gaussian fuzzy sets, the number of parameters is reduced and no parameter tuning algorithm needs to be applied during the algorithm. This makes the algorithms faster compared to the latest hybrid learning algorithms proposed in the literature.
References
1. Buckley JJ, Hayashi Y (1994) Fuzzy neural networks: A survey.
Fuzzy Sets Syst 1–13
2. Buckley JJ, Hayashi Y (1995) Neural networks for fuzzy systems.
Fuzzy Sets Syst 265–276
3. Bunke H, Kandel A (2000) Neuro-fuzzy pattern recognition.
World Scientific, Singapore
4. Chuang C, Su S, Chen S (2001) Robust TSK fuzzy modeling for
function approximation with outliers. IEEE Trans Fuzzy Syst 9
5. Comaniciu D, Meer P (1997) A robust analysis of feature spaces:
color image segmentation. In: Proc 1997 IEEE conf computer vision and pattern recognition, pp 750–755
6. Dickerson JA, Kosko B (1996) Fuzzy function approximation with
ellipsoidal rules. IEEE Trans Syst Man Cybern 26(4):542–560
H. Malek et al.
7. Er MJ, Wu S (2002) A fast learning algorithm for parsimonious
fuzzy neural systems. Fuzzy Sets Syst 337–351
8. Fukunaga K, Hostetler LD (2002) The estimation of the gradient of a density function, with applications in pattern recognition.
IEEE Trans Pattern Anal Mach Intell 24:603–619
9. Gonzalez J, Rojas I, Pomares H, Ortega J, Prieto A (2002) New
clustering technique for function approximation. IEEE Trans Neural Netw 13:132–152
10. Jang JSR (1993) ANFIS: adaptive-network-based fuzzy inference
system. IEEE Trans Syst, Man Cybern 665–684
11. Khayat et al (2009) A novel hybrid algorithm for creating self-organizing fuzzy neural networks. Neurocomputing
12. Klawonn F, Keller A (1998) Grid clustering for generating fuzzy
rules. In: European congress on intelligent techniques and soft
computing, Aachen, Germany, pp 1365–1369
13. Klawonn F, Kruse R (1997) Constructing a fuzzy controller from
data. Fuzzy Sets Syst 85:177–193
14. Kosko B (1994) Fuzzy systems as universal approximators. IEEE
Trans Comput 43:1329–1333
15. Kroll A (1996) Identification of functional fuzzy models using
multi-dimensional reference fuzzy sets. Fuzzy Sets Syst 80:149–
158
16. Kulkarni AD, Cavanaugh CD (2000) Fuzzy neural network models for classification. Appl Intell 12:207–215.
doi:10.1023/A:1008367007808
17. Leng G, McGinnity T (2006) Design for self organizing fuzzy
neural network based on genetic algorithm. IEEE Trans Fuzzy
Syst
18. Leng G, Prasad G, McGinnity TM (2006) An on-line algorithm
for creating self-organizing fuzzy neural networks. Neural Netw
19:974
19. Lin S-F, Chang J-W, Hsu Y-C (2010) A self-organization mining
based hybrid evolution learning for TSK-type fuzzy model design.
Appl Intell, December
20. Nauck D, Kruse R (1997) Function approximation by NEFPROX.
In: Proc second European workshop on fuzzy decision analysis
and neural networks for management, planning, and optimization,
Dortmund, pp 160–169
21. Nauck D, Kruse R (1999) Neuro-fuzzy systems for function approximation. Fuzzy Sets Syst 101:261–271
22. Park B-J, Pedrycz W, Oh S-K (2008) Polynomial-based radial basis function neural networks (P-RBF NNs) and their application to
pattern classification. Appl Intell 32(1):27–46
23. Pedrycz W (1996) Fuzzy modeling: paradigms and practice.
Springer, Berlin, p 205
24. Schilling RJ, Carroll JJ, Al-Ajlouni AF (2001) Approximation
of nonlinear systems with radial basis function neural networks.
IEEE Trans Neural Netw 1–15
25. Takagi T, Sugeno M (1985) Fuzzy identification of systems and
its applications to modeling and control. IEEE Trans Syst Man
Cybern 15:116–132
26. Wang JS, Lee CSG (2001) Efficient neuro-fuzzy control systems
for autonomous underwater vehicle control. In: IEEE international
conference on robotics and automation, pp 2986–2991
27. Wang L-X (1997) A Course in fuzzy systems and control.
Prentice-Hall, New York
28. Wu S, Er MJ (2000) Dynamic fuzzy neural networks-a novel approach to function approximation. IEEE Trans Syst, Man Cybern
358–364
29. Wu S, Er MJ, Gao Y (2001) A fast approach for automatic generation of fuzzy rules by generalized dynamic fuzzy neural networks.
IEEE Trans Fuzzy Syst 578–594
Hamed Malek received the B.Sc. in Mathematics from Sharif University of Technology, Iran, in 2005 and the M.Sc. in Computer Science from Sharif University of Technology, Iran, in 2007. Currently, he is a Ph.D. candidate in the Computer Engineering Department of Amirkabir University of Technology (Tehran Polytechnic). His research interests include Artificial Immune Systems, Fuzzy Systems, Mathematical Biology, Evolutionary Algorithms and Multi-Agent Systems.
Mohammad Mehdi Ebadzadeh received the B.Sc. in Electrical Engineering from Sharif University of Technology, Iran, in 1991, the M.Sc. in Machine Intelligence and Robotics from Amirkabir University of Technology, Iran, in 1995, and the Ph.D. in Machine Intelligence and Robotics from Télécom ParisTech in 2004. Currently, he is an associate professor in the Computer Engineering Department of Amirkabir University of Technology (Tehran Polytechnic). His research interests include Evolutionary Algorithms, Fuzzy Systems, Neural Networks, Artificial Immune Systems, Robotics and Artificial Muscles.
Mohammad Rahmati received the M.Sc. in Electrical Engineering from the University of New Orleans, USA, in 1997 and the Ph.D. degree in Electrical and Computer Engineering from the University of Kentucky, Lexington, KY, USA, in 2003. Currently, he is an associate professor in the Computer Engineering Department at Amirkabir University of Technology (Tehran Polytechnic). His research interests include pattern recognition, image processing, bioinformatics, video processing, and data mining.