Cooperative Co-evolution for Large Scale Optimization Through More Frequent Random Grouping
Mohammad Nabi Omidvar, Member, IEEE, Xiaodong Li, Senior Member, IEEE, Zhenyu Yang, and Xin Yao, Fellow, IEEE
Abstract— In this paper we propose three techniques to improve the performance of one of the major algorithms for large scale continuous global function optimization. Multilevel Cooperative Co-evolution (MLCC) is based on a Cooperative Co-evolutionary framework and employs a technique called random grouping in order to group interacting variables in one subcomponent. It also uses another technique called adaptive weighting for the co-adaptation of subcomponents. We prove that the probability of grouping interacting variables in one subcomponent using random grouping drops significantly as the number of interacting variables increases. This calls for more frequent random grouping of variables. We show how to increase the frequency of random grouping without increasing the number of fitness evaluations. We also show that adaptive weighting is ineffective and in most cases fails to improve the quality of the found solutions, and hence wastes a considerable amount of CPU time on extra evaluations of the objective function. Finally, we propose a new technique for the self-adaptation of the subcomponent sizes in CC. We demonstrate how a substantial improvement can be gained by applying these three techniques.
I. INTRODUCTION
Many Evolutionary Algorithms (EAs) have been used for optimization problems, but the performance of these algorithms deteriorates as the dimensionality of the problem increases, a phenomenon commonly referred to as the curse of dimensionality. The problem of finding the global optimum becomes even harder when some or all of the decision variables interact with each other. This class of problems is called non-separable problems. Variable interaction in large scale problems drastically increases the total number of function evaluations needed to find a reasonable solution. Large-scale non-separable problems are considered a very difficult class of optimization problems, and they usually require a large number of fitness evaluations.
The expensive nature of these problems is magnified when dealing with real-world problems, especially in engineering. In many real-world problems the cost of evaluating the objective function involves interaction with other modules such as simulation software; the RoboCup simulation software is a perfect example [1].
M. N. Omidvar and X. Li are with the Evolutionary Computing and Machine Learning Group (ECML), School of Computer Science and IT, RMIT University, VIC 3001, Melbourne, Australia (emails: [email protected], [email protected]). X. Yao is with the Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK (e-mail: [email protected]). Z. Yang is with the Nature Inspired Computation and Application Laboratory, Department of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China (email: [email protected]).
In Evolutionary Robotics, Multidisciplinary Design Optimization (MDO), shape optimization, and other areas, the cost of a single evaluation of the objective function can be a few minutes if not a few hours. The expensive nature of the objective function, especially in real-world problems, together with the difficulty of large scale non-separable problems, demands new ways of cutting down the total number of fitness evaluations.
Many techniques have been proposed for solving large scale problems. Cooperative Co-evolution (CC), proposed by Potter et al. [2], uses a divide-and-conquer approach to divide the decision variables into subpopulations of smaller size, and each of these subpopulations is optimized with a separate EA in a round-robin fashion. CC seems to be a promising framework for large scale problems, but its performance degrades when applied to non-separable problems [2]. In an ideal setting, all the interacting variables should be grouped in one subcomponent in order to enhance the performance of the optimization, but in real-world problems there is almost no prior information about how the variables interact. This gives rise to the need for more sophisticated techniques capable of capturing the interacting parameters through the course of evolution and grouping them together in one single subpopulation.
MLCC [3] is a recent technique for large scale non-separable function optimization which extends an earlier algorithm, DECC-G [4], by self-adapting the subcomponent sizes. DECC-G relies on the random grouping of decision variables into subcomponents in order to increase the probability of grouping interacting variables in non-separable problems. It also evolves a weight vector for the co-adaptation of subcomponents to further improve the solutions. In this paper we investigate the internal mechanisms of these algorithms and identify two major bottlenecks that degrade their performance. We also propose several techniques for further improving the performance of MLCC. First, we show that more frequent random grouping can result in finding a solution with fewer fitness evaluations without sacrificing solution quality. Second, we show that adaptive weighting does not play a major role in the whole process; in most cases it fails to improve the solution and wastes a considerable number of fitness evaluations. Overall, we demonstrate the following in this paper:
• We extend the theoretical background of random grouping to include more than two interacting variables.
• We show that more frequent random grouping is beneficial, especially when there are more than two interacting variables.
• We identify two major bottlenecks in MLCC and show how to increase the frequency of random grouping without increasing the total number of fitness evaluations.
• We demonstrate that adaptive weighting is ineffective, wasting valuable fitness evaluations, and show how the overall performance can be significantly improved by simply disabling this feature.
• We propose a simpler, more intuitive, and yet more efficient alternative to the self-adaptation of subcomponent sizes used in MLCC. This new algorithm is called DECC-ML.
• We show that DECC-ML converges faster than previous techniques. This faster convergence results in saving a significant number of fitness evaluations, which is a major improvement, especially in expensive optimization problems.
The organization of the rest of this paper is as follows. Section II describes the preliminaries and background information, Section III describes the new techniques for cutting down the number of fitness evaluations, Section IV demonstrates and analyzes the experimental results, and finally Section V summarizes this paper.
II. PRELIMINARIES
A. Cooperative Co-evolution
Divide-and-conquer is an effective technique in solving
complex problems. Cooperative Co-evolution [2] uses a
similar technique to decompose a complex problem into
several simpler sub-problems.
Potter and De Jong made the first attempt to incorporate CC into Genetic Algorithms for function optimization [2]. The success of CC attracted many researchers, who incorporated CC into other evolutionary techniques such as Evolutionary Programming [5], Evolution Strategies [6], Particle Swarm Optimization [7], and Differential Evolution [4], [8].
The original CC decomposes an n-dimensional decision vector into n subcomponents and optimizes each of the subcomponents using a GA in a round-robin fashion. This algorithm was called CCGA.
The first attempt at applying CC to large scale optimization was made by Liu et al. using Fast Evolutionary Programming with Cooperative Co-evolution (FEPCC) [5], where they tackled problems with up to 1000 dimensions. However, it converged prematurely on one of the non-separable functions, confirming that Potter and De Jong's decomposition strategy is ineffective in dealing with variable interaction.
van den Bergh and Engelbrecht [7] were the first to apply CC to PSO. They developed two dialects of Cooperative PSO (CPSO) based on Potter's original framework, but unlike the original CCGA [2] they divided an n-dimensional problem into m s-dimensional subcomponents, as depicted in Figure 1.

Fig. 1. The n-dimensional objective vector is divided into m s-dimensional subcomponents. P_{m,b} denotes the best individual in the mth subcomponent.

CPSO follows these steps [2], [5]:
1) Divide the n-dimensional objective vector into m s-dimensional subcomponents.
2) Optimize each of the m subcomponents in a round-robin fashion with a certain EA. Note that the number of evaluations is predetermined.
3) Stop the evolutionary process once the halting criterion is satisfied or the maximum number of evaluations is exceeded.
Every individual in a subcomponent is evaluated by concatenating it with the best-fit individuals from the rest of the subcomponents to form what is called a context vector [7]. This context vector contains all the parameters required by the objective function and is fed into it for fitness evaluation. This is where the cooperation actually happens.
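To make the cooperation step concrete, here is a minimal sketch of context-vector evaluation (our own illustration in Python; the function and variable names are not from [7]):

```python
import numpy as np

def evaluate_with_context(f, individual, group, context):
    """Evaluate an s-dimensional `individual` by inserting it into a copy of
    the n-dimensional `context` vector at the indices listed in `group`."""
    candidate = context.copy()        # best-known values for all n variables
    candidate[group] = individual     # overwrite this subcomponent's slice
    return f(candidate)               # the objective sees a complete solution

# Toy usage: sphere function, n = 6 variables, one group covering 3 of them
f = lambda x: float(np.sum(x ** 2))
context = np.ones(6)                  # concatenation of best individuals
print(evaluate_with_context(f, np.zeros(3), [0, 2, 4], context))  # 3.0
```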
One major drawback of CPSO is that its performance degrades rapidly as the dimensionality of the problem increases, mainly due to the exponential growth of the size of the search space. This problem is magnified when applied to non-separable problems due to parameter interactions.
CC has also been applied to Differential Evolution in [4], [8]. Shi et al. [8] used a splitting-in-half strategy, dividing the variables into two equally sized subcomponents, each of which is optimized using a separate DE. Their decomposition strategy does not scale up efficiently as the number of dimensions increases. Yang et al. [4] made the first attempt to develop a more systematic way of dealing with variable interactions by randomly grouping the decision variables into different subcomponents. They have shown that random grouping of variables increases the probability of grouping interacting variables in one subcomponent. This is further explained in Section II-C.
B. Differential Evolution
Differential Evolution (DE) [10] is a relatively recent EA for global optimization. Despite its simplicity, it has been shown to be very effective on a set of benchmark functions [9].
DE [10] is designed to be less greedy than other EA techniques [9] such as PSO [11], which makes it a good choice for large scale optimization. In large scale optimization, due to the vastness of the search space, more exploration is needed in order to find the global optimum, so greedier techniques run the risk of converging to a suboptimal solution.
DE's control parameters are problem dependent and hard to determine [12], [13]. Self-adaptive Differential Evolution with Neighborhood Search (SaNSDE) [14] self-adapts the crossover rate CR, the scaling factor F, and the mutation strategy. It has been shown that SaNSDE performs significantly better than other similar DE algorithms.
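For reference, a minimal sketch of the classic DE/rand/1/bin generation is given below (this is plain DE, not SaNSDE, which additionally self-adapts CR, F, and the mutation strategy; parameter values and names are illustrative):

```python
import numpy as np

def de_generation(f, pop, fit, F=0.5, CR=0.9, rng=np.random.default_rng()):
    """One DE/rand/1/bin generation. pop: (NP, D) array, fit: (NP,) fitnesses."""
    NP, D = pop.shape
    for i in range(NP):
        idx = rng.choice([j for j in range(NP) if j != i], 3, replace=False)
        a, b, c = pop[idx]
        mutant = a + F * (b - c)                 # differential mutation
        mask = rng.random(D) < CR                # binomial crossover
        mask[rng.integers(D)] = True             # at least one gene from mutant
        trial = np.where(mask, mutant, pop[i])
        f_trial = f(trial)
        if f_trial <= fit[i]:                    # greedy one-to-one selection
            pop[i], fit[i] = trial, f_trial
    return pop, fit
```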
C. Random Grouping and Adaptive Weighting
In the framework proposed by Yang et al. [4], following a CC approach, the problem is decomposed into m s-dimensional subcomponents, and each subcomponent is optimized using a certain EA. Their method differs from ordinary CC approaches in two major ways: first, they use random grouping of the decision variables in order to increase the probability of grouping two interacting variables in one subcomponent; second, they co-adapt the subcomponents using a technique known as adaptive weighting.
The motivation for random grouping is that in most real-world non-separable problems only a proportion of the variables interact with each other, so if an optimization algorithm manages to group highly dependent variables in one subcomponent, there will be a better chance to further improve the performance of the algorithm. Since there is no prior information about how the variables interact, Yang et al. [4] showed that by randomly grouping the decision variables one can increase the probability of grouping two interacting variables in one subcomponent for at least some predetermined number of cycles. They demonstrated that random grouping gives a probability of 96.62% of grouping two interacting variables together for at least two cycles in a CC setting with 10 subcomponents and a total of 1000 decision variables. However, it remains unclear what the probability is if more than two variables interact with each other, which is more likely the case in most optimization problems. In Section III-A we show how the probability of grouping interacting variables drops significantly when there are more than two interacting variables, and we also show how, given the same number of evaluations, to increase the probability of grouping more than two variables together.
In adaptive weighting, a numeric weight is applied to each subcomponent. These weight values form a vector called the weight vector, which is optimized using a separate optimizer for some predetermined number of fitness evaluations (FEs). The motivation behind adaptive weighting is to co-adapt interdependent subcomponents. It is obvious that optimizing the weight vector is far simpler than the original problem, because the weight vector, with only m variables, has a much lower dimensionality than the original problem with m × s variables.
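As one plausible reading of this scheme (the exact details are in [4]; the helper below is our own illustration), each subcomponent's variables are scaled by its weight, so the weight optimizer searches only an m-dimensional space:

```python
import numpy as np

def weighted_fitness(f, weights, member, groups):
    """Fitness of an m-dimensional weight vector: every variable belonging
    to group i of `member` is scaled by weights[i], and the resulting full
    n-dimensional solution is evaluated as a whole."""
    x = np.asarray(member, dtype=float).copy()
    for i, group in enumerate(groups):
        x[group] *= weights[i]        # one weight per subcomponent
    return f(x)
```

A separate EA then evolves the weight vector with this function as its objective, once each for the best, worst, and a random member of the population.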
Yang et al. [4] outline the steps of weight vector co-adaptation as follows:
1) Set i = 1 to start a new cycle.
2) Randomly split the n-dimensional objective vector into m s-dimensional vectors. This essentially means that any variable has an equal chance of being assigned to any of the subcomponents.
3) Optimize the ith subcomponent with a certain EA for a predefined number of FEs.
4) If i < m then i++, and go to Step 3.
5) Construct a weight vector and evolve it using a separate EA for the best, worst, and a random member of the current population.
6) Stop if the maximum number of FEs is reached, or go to Step 1 for the next cycle.
This scheme, a cooperative co-evolutionary EA with weight co-adaptation, was named EACC-G in [4]. Since SaNSDE [14] is used as the subcomponent optimizer in Step 3, the resulting algorithm is called DECC-G.
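As a summary of Steps 1-4, one cycle of this framework can be sketched as follows (a simplified illustration in Python, not the authors' code; `optimize` stands in for the subcomponent EA, e.g. SaNSDE, and the weighting of Step 5 is omitted):

```python
import numpy as np

def one_cycle(f, pop, n, m, optimize, rng=np.random.default_rng()):
    """One EACC-G/DECC-G cycle: random decomposition of the n variables
    into m groups of size s = n // m, then round-robin optimization."""
    s = n // m
    perm = rng.permutation(n)                     # random grouping (Step 2)
    groups = [perm[i * s:(i + 1) * s] for i in range(m)]
    for group in groups:                          # Steps 3-4
        # `optimize` evolves `pop` restricted to the variables in `group`,
        # evaluating candidates through the context vector of Section II-A
        pop = optimize(f, pop, group)
    return pop, groups
```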
D. Multilevel Cooperative Co-evolution
One major problem with DECC-G is determining the size of the subcomponents. This parameter is problem dependent and hard to determine. Multilevel Cooperative Co-evolution (MLCC) [3] is an extension of DECC-G that dynamically self-adapts the subcomponent sizes. MLCC chooses the subcomponent sizes in a more flexible way, selecting a group size from a set S = {s1, ..., st} of predefined group sizes. The selection probability is calculated based on the performance history of each of the decomposers through the course of evolution.
MLCC uses a sophisticated formula for calculating the selection probability of the decomposers at the beginning of each cycle. The formula contains some arbitrary constants which are based on empirical studies. In this paper we propose a new technique for the self-adaptation of the subcomponent sizes which simply chooses a group size at random from a predetermined set. The resulting algorithm, DECC-ML, shows substantial improvement compared to MLCC.
III. PROPOSED TECHNIQUES
In this paper we propose two techniques for improving the performance of DECC-G, and an alternative to MLCC for self-adapting the subcomponent sizes which is easier to implement, more intuitive, and yet more efficient. In our algorithm we perform the random decomposition of the objective vector at every iteration, which results in more frequent random grouping.
A. More Frequent Random Grouping
Yang et al. proved that randomly regrouping the decision variables at the beginning of each cycle increases the chance of placing two interacting variables in the same subcomponent. In [4] they used 50 cycles and showed that the probability of grouping two parameters together for at least two cycles is 96.62% in a CC setting with 10 subcomponents and a total of 1000 variables.
Here we generalize their theorem to any number of interacting variables. Equation (1) calculates the probability of grouping v interacting variables in the same subcomponent for at least k cycles.
Theorem 1. Given N cycles, the probability of assigning v interacting variables x_1, x_2, ..., x_v into one subcomponent for at least k cycles is:

P(X \ge k) = \sum_{r=k}^{N} \binom{N}{r} \left( \frac{1}{m^{v-1}} \right)^{r} \left( 1 - \frac{1}{m^{v-1}} \right)^{N-r}    (1)
where N is the number of cycles, v is the total number of interacting variables, m is the number of subcomponents, and the random variable X is the number of times that the v interacting variables are grouped in one subcomponent. Since we are interested in the probability of grouping the v interacting variables for at least k cycles, X takes values greater than or equal to k, where k ≤ N.
Proof 1. A variable can be assigned to a subcomponent in m different ways, and since there are v interacting variables, the probability of assigning all of the interacting variables into one subcomponent would be:

p_{sub} = \underbrace{\frac{1}{m} \times \cdots \times \frac{1}{m}}_{v \text{ times}} = \frac{1}{m^{v}}
Since there are m different subcomponents, the probability of assigning all v variables to any of the subcomponents would be:

p = m \times p_{sub} = \frac{m}{m^{v}} = \frac{1}{m^{v-1}}

There are a total of N independent random decompositions of the variables into m subcomponents, so using a binomial distribution, the probability of assigning the v interacting variables into one subcomponent for exactly r times would be:

P(X = r) = \binom{N}{r} p^{r} (1-p)^{N-r} = \binom{N}{r} \left( \frac{1}{m^{v-1}} \right)^{r} \left( 1 - \frac{1}{m^{v-1}} \right)^{N-r}
Thus,

P(X \ge k) = \sum_{r=k}^{N} \binom{N}{r} \left( \frac{1}{m^{v-1}} \right)^{r} \left( 1 - \frac{1}{m^{v-1}} \right)^{N-r}
Fig. 2. Increasing v, the number of interacting variables, significantly decreases the probability of grouping them in one subcomponent, given n = 1000 and m = 10.
Given n = 1000, m = 10, N = 50, and v = 4, we have:

P(X \ge 1) = 1 - P(X = 0) = 1 - \left( 1 - \frac{1}{10^{3}} \right)^{50} = 0.0488
which means that over 50 cycles, the probability of assigning 4 interacting variables into one subcomponent for at least 1 cycle is only 0.0488. As we can see, this probability is very small, and it will be even smaller if there are more interacting variables. Figure 2 shows how the probability of grouping interacting variables for at least one cycle drops significantly as the number of interacting variables increases. The solid line shows how the probability changes with v for 50 cycles, and the dashed line shows how the probability changes when there are 1e4 cycles. We can see from the graph that the probability of grouping 5 variables at least once using 50 cycles is close to zero, whereas the probability of grouping the same number of interacting variables at least once increases to approximately 60% using 1e4 cycles. Note that the number of fitness evaluations remains the same when applying the techniques described later in this section.
Figure 3 shows the effect of increasing N, the frequency of random grouping, for different numbers of interacting variables. It is evident that a higher rate of random grouping increases the probability of grouping the interacting variables, regardless of how many of them there are. So more frequent random grouping not only results in a higher probability of grouping interacting variables, but also increases the efficiency of the algorithms in dealing with more than two interacting variables.

Fig. 3. Increasing N, the number of cycles, increases the probability of grouping v interacting variables in one subcomponent.

In their experiments, Yang et al. used only 50 cycles with 5e6 FEs, while there is potential for approximately 3800 cycles. This setting is counterintuitive because, according to the above discussion, a higher number of random groupings results in a higher probability of grouping any two interacting variables in one subcomponent, and also in better efficiency in dealing with more than two interacting variables.

Our studies on DECC-G and MLCC revealed that the frequency of random grouping can be significantly increased without increasing the total number of fitness evaluations. In order to maximize the frequency of random grouping, the subcomponent optimizers should run for only one iteration, which is equivalent to Npopsize fitness evaluations, where Npopsize is the population size of the EA.
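Combining these observations, the loop below sketches maximally frequent random grouping under a fixed FE budget (our own illustration; `optimize_one_iter` stands for a single iteration of the subcomponent optimizer, costing Npopsize evaluations):

```python
import numpy as np

def decc_frequent_grouping(f, pop, n, m, optimize_one_iter, max_evals, popsize=50):
    """Maximally frequent random grouping: a fresh random decomposition is
    drawn every cycle, and each subcomponent EA runs for a single iteration
    (popsize fitness evaluations), so the total FE budget is unchanged."""
    rng = np.random.default_rng()
    s, evals = n // m, 0
    while evals < max_evals:
        perm = rng.permutation(n)                   # regroup at every iteration
        for i in range(m):
            group = perm[i * s:(i + 1) * s]
            pop = optimize_one_iter(f, pop, group)  # one iteration of e.g. SaNSDE
            evals += popsize                        # Npopsize FEs per iteration
    return pop
```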
B. Removing Adaptive Weighting
Further investigation of DECC-G has shown that adaptive weighting is not effective, and that the computational resources it consumes can be spent on more frequent random grouping instead.

Although in theory it seems to be a good way of reducing the dimensionality of the original problem, in practice its role in the whole optimization process is insignificant. In order to demonstrate this, we counted the number of times that the adaptive weighting subcomponent manages to find a better solution, and based on our experiments we realized that in most cases it fails to do so. This essentially means that a significant number of fitness evaluations are wasted by adaptive weighting. In order to demonstrate this difference, we modified DECC-G by disabling the adaptive weighting subcomponent and conducted an experiment with the same settings as DECC-G, with a total of 50 cycles. We call this new variant DECC-NW. Tables IV, V, and VI summarize the results of running these two algorithms on the same benchmark functions.
C. Self-adaptation of subcomponent sizes
As mentioned earlier, we also propose a simpler technique for self-adapting the subcomponent sizes. In our method we preserve the same decomposer set S as used in MLCC, but instead of using the sophisticated formula for calculating the selection probability of a decomposer, we simply use a uniform random number generator to select a decomposer from the set S. This only happens when there is no improvement in the fitness between the previous and the current cycle.
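A minimal sketch of this selection rule (our own illustration of the idea; names are ours and the exact bookkeeping may differ in the implementation):

```python
import random

S = [5, 10, 25, 50, 100]            # decomposer set, as used in MLCC [3]

def next_group_size(current, prev_best, curr_best):
    """Uniform self-adaptation of the subcomponent size: keep the current
    size while the best fitness keeps improving, otherwise draw a new size
    uniformly at random from S."""
    if curr_best < prev_best:        # improvement over the previous cycle
        return current
    return random.choice(S)
```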
IV. EMPIRICAL RESULTS
A. Experiment Setup
We evaluated our proposed techniques on the CEC'08 test suite, which was proposed for the CEC'2008 Special Session on Large Scale Global Optimization [15]. This test suite consists of seven functions, which are summarized in Table I.

We ran each algorithm for 25 independent runs on 100, 500, and 1000 dimensions, and the mean and standard deviation of the best fitness values over the 25 runs were recorded. The population size was set to 50, and the maximum number of fitness evaluations (FEs) was calculated by the formula FEs = 5000 × D, where D is the number of dimensions. For the subpopulation sizes we used the same set as in [3], S = {5, 10, 25, 50, 100}. The experimental setup is summarized in Table II. For better clarity, all the algorithms mentioned in this paper are summarized in Table III.

TABLE I
SUMMARY OF THE 7 CEC'08 TEST FUNCTIONS

Func | Description                      | Modality
f1   | Shifted Sphere Function          | Unimodal
f2   | Shifted Schwefel's Problem 2.21  | Unimodal
f3   | Shifted Rosenbrock's Function    | Multimodal
f4   | Shifted Rastrigin's Function     | Multimodal
f5   | Shifted Griewank's Function      | Multimodal
f6   | Shifted Ackley's Function        | Multimodal
f7   | FastFractal "DoubleDip" Function | Multimodal

TABLE II
EXPERIMENTAL SETUP PARAMETERS

Parameter           | Value
Dimensions          | D = {100, 500, 1000}
FEs                 | 5000 × D, i.e. {5e+5, 2.5e+6, 5e+6}
Population size     | Npopsize = 50
Subpopulation sizes | S = {5, 10, 25, 50, 100}
Number of runs      | 25

TABLE III
SUMMARY OF ALGORITHMS DESCRIBED IN THIS PAPER. THOSE WITHOUT CITATION ARE PROPOSED IN THIS PAPER.

Algorithm  | Description
DECC-G [4] | Cooperative co-evolutionary DE with random grouping and adaptive weighting.
DECC-NW    | Similar to DECC-G, with the adaptive weighting subcomponent disabled.
DECC       | Cooperative co-evolutionary DE without adaptive weighting, running the subcomponent optimizers for only one iteration.
DECC-ML    | Similar to DECC, but uses a uniform selection for self-adapting the subcomponent sizes.
MLCC [3]   | Similar to DECC-G, but self-adapts the subcomponent sizes using the historical performance of different decomposers.
B. Analysis of Results
The results of 25 independent runs of DECC-ML on 1000 dimensions are listed in Tables VII and VIII. These tables show the progress of the algorithm and are based on the template provided for the CEC'08 competition on Large Scale Global Optimization [15]. The summary Tables IV, V, and VI compare all five algorithms for 100, 500, and 1000 dimensions respectively.
According to Tables VII and VIII, DECC-ML shows faster convergence than MLCC on 6 out of 7 functions. A deeper analysis of the 25 runs revealed that the existence of two outliers in the results of DECC-ML increased the mean significantly on f5. A Wilcoxon rank-sum test at the 99% confidence level has shown that DECC-ML is also significantly better than MLCC on f5. The p-values of the Wilcoxon rank-sum test are recorded in Table VI.
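Such a test can be reproduced with standard tools; for instance, a sketch using SciPy (the arrays here are hypothetical stand-ins for the 25 recorded best fitness values per algorithm):

```python
import numpy as np
from scipy.stats import ranksums

# Hypothetical stand-ins for the 25 per-run best fitness values on f5;
# in practice these come from the experiment logs of each algorithm.
rng = np.random.default_rng(0)
decc_ml_f5 = rng.lognormal(mean=-7.0, sigma=1.0, size=25)
mlcc_f5 = rng.lognormal(mean=-6.0, sigma=1.0, size=25)

stat, p_value = ranksums(decc_ml_f5, mlcc_f5)
print(p_value < 0.01)   # significant at the 99% confidence level?
```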
The global optimum for f7 is unknown, but from Tables VII and VIII it is clear that the algorithm makes steady progress in improving the fitness value. This shows the ability of DECC-ML to converge even on a difficult function such as f7, which is highly multimodal and has a rugged surface. It is noteworthy that DECC-ML found the absolute global optimum in all 25 runs for f4 and in at least 19 out of 25 runs for f1.
By comparing DECC and DECC-G we can see that DECC found a better solution using the same number of fitness evaluations on 6 out of 7 benchmark functions, which shows that more frequent random grouping of variables yields better performance. This trend is almost the same for 100, 500, and 1000 dimensions.

The summary Tables IV, V, and VI also confirm our speculation about the high failure rate of the adaptive weighting subcomponent of DECC-G.
From the results in Tables IV, V, and VI, we can see that a significant number of fitness evaluations can be saved by disabling adaptive weighting. Since more frequent random grouping increases the performance of the algorithm, we can conclude that using the saved fitness evaluations to increase the frequency of random grouping will improve the performance even further. We incorporated both of these changes into DECC-G and created another version called DECC. Experimental results show that DECC outperforms DECC-G on 6 out of 7 benchmark functions.
DECC-NW and DECC-G are identical except that in DECC-NW the adaptive weighting subcomponent is disabled. We tested both algorithms with 50 cycles and an equal number of fitness evaluations. Looking at Tables IV, V, and VI, we can observe that DECC-NW converged faster than DECC-G on 6 out of the 7 functions.

Tables IV, V, and VI show that DECC-ML outperforms MLCC on 6 out of 7 test functions, which is quite substantial. The same trend continues over all dimensions, which shows the better scalability of DECC-ML.
Another observation is DECC-ML's faster convergence compared to MLCC on most of the test functions. This behavior is especially clear in Figures 4(a), 4(e), and 4(f). For functions f1, f5, and f6, a solution of almost the same quality is found with only half of the FEs. This is a very valuable property, especially in real-world problems where evaluation of the fitness function is very costly. DECC-ML also shows quicker convergence on the rest of the test functions, but the trend is less significant compared to f1, f5, and f6.
TABLE IV
RESULTS OF DIFFERENT ALGORITHMS OVER 100 DIMENSIONS (AVERAGE OVER 25 RUNS).

   | DECC        | DECC-G      | DECC-ML     | DECC-NW     | MLCC
f1 | 2.7263e-29  | 1.1903e-08  | 5.7254e-28  | 8.9824e-28  | 6.8212e-14
f2 | 5.4471e+01  | 6.0946e+01  | 2.7974e-04  | 5.9949e+01  | 2.5262e+01
f3 | 1.4244e+02  | 4.6272e+02  | 1.8871e+02  | 1.2959e+02  | 1.4984e+02
f4 | 5.3370e+01  | 1.1783e+02  | 0           | 4.5768e+00  | 4.3883e-13
f5 | 2.7589e-03  | 3.7328e-03  | 3.6415e-03  | 7.3233e-03  | 3.4106e-14
f6 | 2.3646e-01  | 1.0365e+00  | 3.3822e-14  | 1.2224e+00  | 1.1141e-13
f7 | -9.9413e+02 | -1.2666e+03 | -1.5476e+03 | -1.3967e+03 | -1.5439e+03
TABLE V
RESULTS OF DIFFERENT ALGORITHMS OVER 500 DIMENSIONS (AVERAGE OVER 25 RUNS).

   | DECC        | DECC-G      | DECC-ML     | DECC-NW     | MLCC
f1 | 8.0779e-30  | 1.0326e-25  | 1.6688e-27  | 2.4479e-27  | 4.2974e-13
f2 | 4.0904e+01  | 7.6080e+01  | 1.3396e+00  | 7.2776e+01  | 6.6663e+01
f3 | 6.6822e+02  | 1.4295e+03  | 5.9341e+02  | 1.2499e+03  | 9.2466e+02
f4 | 1.3114e+02  | 5.6116e+00  | 0           | 5.2932e+00  | 1.7933e-11
f5 | 2.9584e-04  | 4.1332e-03  | 1.4788e-03  | 2.7584e-02  | 2.1259e-13
f6 | 6.6507e-14  | 1.8778e+00  | 1.2818e-13  | 1.0634e+00  | 5.3433e-13
f7 | -5.5707e+03 | -6.0972e+03 | -7.4582e+03 | -6.9403e+03 | -7.4350e+03
TABLE VI
RESULTS OF DIFFERENT ALGORITHMS OVER 1000 DIMENSIONS (AVERAGE OVER 25 RUNS). THE LAST COLUMN GIVES THE P-VALUES OF THE MWW (WILCOXON) RANK-SUM TEST BETWEEN DECC-ML AND MLCC.

   | DECC        | DECC-G      | DECC-ML     | DECC-NW      | MLCC        | rank-sum p
f1 | 1.2117e-29  | 3.8745e-27  | 5.1750e-28  | 3.8145e-27   | 8.4583e-13  | 2.851e-06
f2 | 4.2729e+01  | 7.4234e+01  | 3.4272e+00  | 7.2561e+01   | 1.0871e+02  | 6.535e-06
f3 | 1.2673e+03  | 2.6306e+03  | 1.0990e+03  | 2.5411e+03   | 1.7986e+03  | 6.535e-06
f4 | 2.4498e+02  | 1.0666e+01  | 0           | 7.7209e+00   | 1.3744e-10  | 6.482e-06
f5 | 2.9584e-04  | 7.8435e-03  | 9.8489e-04  | 1.4976e-02   | 4.1837e-13  | 1.161e-03
f6 | 1.3117e-13  | 2.3475e+00  | 2.5295e-13  | 1.7613e+00   | 1.0607e-12  | 6.506e-06
f7 | -1.4339e+04 | -1.3015e+04 | -1.4757e+04 | -1.41679e+04 | -1.4703e+04 | 6.530e-06
V. CONCLUSION
In this paper we proposed three techniques for the further refinement of MLCC, aiming to substantially reduce its computational cost. We have shown that more frequent random grouping results in faster convergence without sacrificing solution quality, due to the increased probability of grouping interacting variables in one subpopulation. More frequent random grouping also increases the efficiency of the algorithms in dealing with problems with more interacting variables. We have also shown that adaptive weighting is not as efficient as initially thought in reducing the dimensionality of the problem, and in most cases fails to improve the fitness value. This wastes a considerable number of fitness evaluations, which might be better used for more effective random grouping and co-evolution of subcomponents.
We have also proposed an alternative technique for self-adapting the subcomponent sizes in the CC framework and have shown that this simple technique is very effective.
Finally, we proposed an algorithm called DECC-ML, which uses the new uniform selection of subcomponent sizes along with more frequent random grouping. Experimental results show that this new algorithm outperforms MLCC in most cases by a considerable margin. The fast convergence of DECC-ML saves a significant number of fitness evaluations on most of the functions, and hence a considerable amount of CPU time, especially in real-world problems with costly objective functions. The findings of this research are not limited to the CEC'2008 benchmark functions: we also observed similar behavior on the CEC'2010 benchmark functions [16], but lack of space does not allow us to include all of those experimental results.
Fig. 4. Convergence plots for f1-f6 with 1000 dimensions.
TABLE VII
EXPERIMENTAL RESULTS OF 25 INDEPENDENT RUNS OF DECC-ML AT DIFFERENT STAGES OF EVOLUTION (FEs/100, FEs/10, FEs), 1000 DIMENSIONS.

FEs     |      | f1         | f2         | f3         | f4
5.0e+04 | 1st  | 1.7161e+05 | 1.1887e+02 | 6.6224e+09 | 4.0066e+03
        | 7th  | 4.1466e+05 | 1.3322e+02 | 3.4286e+10 | 5.4659e+03
        | 13th | 4.7111e+05 | 1.3836e+02 | 4.0710e+10 | 5.6821e+03
        | 19th | 4.8698e+05 | 1.6499e+02 | 9.6494e+10 | 7.1419e+03
        | 25th | 5.1115e+05 | 1.8057e+02 | 1.0925e+11 | 8.3134e+03
        | Mean | 4.3797e+05 | 1.4607e+02 | 5.9240e+10 | 6.1375e+03
        | Std  | 7.7341e+04 | 1.8315e+01 | 3.5704e+10 | 1.3938e+03
5.0e+05 | 1st  | 9.1964e-01 | 3.0613e+01 | 1.0463e+04 | 3.0303e+02
        | 7th  | 2.2217e+01 | 3.5070e+01 | 3.2238e+04 | 3.5180e+02
        | 13th | 3.0578e+01 | 5.5145e+01 | 4.4875e+04 | 8.6533e+02
        | 19th | 3.2747e+01 | 9.6366e+01 | 1.2383e+05 | 8.9242e+02
        | 25th | 1.3197e+02 | 1.2751e+02 | 1.4550e+05 | 9.1424e+02
        | Mean | 4.1338e+01 | 6.4509e+01 | 6.9043e+04 | 6.8650e+02
        | Std  | 4.0586e+01 | 3.2558e+01 | 4.7066e+04 | 2.6892e+02
5.0e+06 | 1st  | 0.0000e+00 | 6.9941e-02 | 9.2453e+02 | 0.0000e+00
        | 7th  | 0.0000e+00 | 9.0952e-01 | 9.8941e+02 | 0.0000e+00
        | 13th | 0.0000e+00 | 2.0454e+00 | 1.0675e+03 | 0.0000e+00
        | 19th | 0.0000e+00 | 2.8289e+00 | 1.1714e+03 | 0.0000e+00
        | 25th | 1.2937e-26 | 1.5571e+01 | 1.4235e+03 | 0.0000e+00
        | Mean | 5.1750e-28 | 3.4272e+00 | 1.0990e+03 | 0.0000e+00
        | Std  | 2.5875e-27 | 4.4636e+00 | 1.3217e+02 | 0.0000e+00

TABLE VIII
EXPERIMENTAL RESULTS OF 25 INDEPENDENT RUNS OF DECC-ML AT DIFFERENT STAGES OF EVOLUTION (FEs/100, FEs/10, FEs), 1000 DIMENSIONS.

FEs     |      | f5         | f6         | f7
5.0e+04 | 1st  | 9.9964e+02 | 1.2131e+01 | -1.1674e+04
        | 7th  | 2.9989e+03 | 1.4462e+01 | -1.1328e+04
        | 13th | 3.1484e+03 | 1.6667e+01 | -1.0720e+04
        | 19th | 4.1706e+03 | 1.6810e+01 | -9.7556e+03
        | 25th | 4.5463e+03 | 1.7210e+01 | -9.6350e+03
        | Mean | 3.3631e+03 | 1.5605e+01 | -1.0589e+04
        | Std  | 9.4624e+02 | 1.7201e+00 | 7.7443e+02
5.0e+05 | 1st  | 2.1304e-01 | 1.5612e-01 | -1.3881e+04
        | 7th  | 1.1769e+00 | 3.9539e-01 | -1.3812e+04
        | 13th | 1.6448e+00 | 4.1639e-01 | -1.3445e+04
        | 19th | 2.1568e+00 | 7.7794e-01 | -1.2909e+04
        | 25th | 2.2286e+00 | 9.7411e-01 | -1.2785e+04
        | Mean | 1.5547e+00 | 5.2513e-01 | -1.3362e+04
        | Std  | 6.0200e-01 | 2.8627e-01 | 4.1907e+02
5.0e+06 | 1st  | 1.3323e-15 | 2.0606e-13 | -1.4780e+04
        | 7th  | 1.4433e-15 | 2.4158e-13 | -1.4771e+04
        | 13th | 1.4433e-15 | 2.5224e-13 | -1.4758e+04
        | 19th | 1.5543e-15 | 2.7711e-13 | -1.4743e+04
        | 25th | 1.7226e-02 | 2.8777e-13 | -1.4731e+04
        | Mean | 9.8489e-04 | 2.5295e-13 | -1.4757e+04
        | Std  | 3.6923e-03 | 2.3677e-14 | 1.4880e+01

Fig. 5. Convergence plots for f7 with 1000 dimensions.
ACKNOWLEDGMENT
This work was partially supported by an EPSRC grant
(No. EP/G002339/1) on “Cooperatively Coevolving Particle
Swarms for Large Scale Optimisation”.
REFERENCES
[1] S. Luke, "Genetic programming produced competitive soccer softbot teams for RoboCup97," in Proc. of the Third Annual Genetic Programming Conference. Morgan Kaufmann, 1998, pp. 214–222.
[2] M. A. Potter and K. A. De Jong, "A cooperative coevolutionary approach to function optimization," in Proc. of the Third Conference on Parallel Problem Solving from Nature, vol. 2, 1994, pp. 249–257.
[3] Z. Yang, K. Tang, and X. Yao, "Multilevel cooperative coevolution for large scale optimization," in Proc. of IEEE World Congress on Computational Intelligence, June 2008, pp. 1663–1670.
[4] ——, "Large scale evolutionary optimization using cooperative coevolution," Information Sciences, vol. 178, pp. 2986–2999, August 2008.
[5] Y. Liu, X. Yao, Q. Zhao, and T. Higuchi, "Scaling up fast evolutionary programming with cooperative coevolution," in Proc. of the 2001 Congress on Evolutionary Computation, 2001, pp. 1101–1108.
[6] D. Sofge, K. De Jong, and A. Schultz, "A blended population approach to cooperative coevolution for decomposition of complex problems," in Proc. of IEEE World Congress on Computational Intelligence, 2002, pp. 413–418.
[7] F. van den Bergh and A. P. Engelbrecht, "A cooperative approach to particle swarm optimization," IEEE Transactions on Evolutionary Computation, vol. 8, no. 3, pp. 225–239, 2004.
[8] Y. Shi, H. Teng, and Z. Li, "Cooperative co-evolutionary differential evolution for function optimization," in Proc. of the First International Conference on Natural Computation, 2005, pp. 1080–1088.
[9] J. Vesterstrom and R. Thomsen, "A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems," in Proc. of the 2004 Congress on Evolutionary Computation, vol. 2, 2004, pp. 1980–1987.
[10] R. Storn and K. Price, "Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces," Journal of Global Optimization, vol. 11, no. 4, pp. 341–359, 1997.
[11] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proc. of IEEE International Conference on Neural Networks, vol. 4, 1995, pp. 1942–1948.
[12] R. Gamperle, S. Muller, and P. Koumoutsakos, "A parameter study for differential evolution," in Proc. WSEAS International Conference on Advances in Intelligent Systems, Fuzzy Systems, Evolutionary Computation, 2002, pp. 293–298.
[13] A. Qin and P. Suganthan, "Self-adaptive differential evolution algorithm for numerical optimization," in Proc. of the 2005 IEEE Congress on Evolutionary Computation, vol. 2, 2005, pp. 1785–1791.
[14] Z. Yang, K. Tang, and X. Yao, "Self-adaptive differential evolution with neighborhood search," in Proc. of IEEE World Congress on Computational Intelligence, June 2008, pp. 1110–1116.
[15] K. Tang, X. Yao, P. N. Suganthan, C. MacNish, Y. P. Chen, C. M. Chen, and Z. Yang, "Benchmark functions for the CEC'2008 special session and competition on large scale global optimization," Nature Inspired Computation and Applications Laboratory, USTC, China, Tech. Rep., 2007, http://nical.ustc.edu.cn/cec08ss.php.
[16] K. Tang, X. Li, P. N. Suganthan, Z. Yang, and T. Weise, "Benchmark functions for the CEC'2010 special session and competition on large-scale global optimization," Nature Inspired Computation and Applications Laboratory, USTC, China, Tech. Rep., 2009, http://nical.ustc.edu.cn/cec10ss.php.