
A Hybrid Evolutionary Algorithm for Polynomial Optimisation with
Constraints
Angel Kuri-Morales
Instituto Tecnológico Autónomo de México
Río Hondo No. 1
[email protected]

Karina Zapién-Arreola
Instituto Tecnológico Autónomo de México
Río Hondo No. 1
[email protected]
Abstract
The search for the global optimum of a function, particularly a polynomial with constraints, is an old problem. This paper proposes a hybrid algorithm (QGA) that merges two different and well studied methods for solving it: Genetic Algorithms (VGA) and Sequential Quadratic Programming (SQP). In so doing we expect to benefit from each method's advantages while ameliorating its shortcomings. We tested the hybrid algorithm on a large number of randomly generated polynomials. From these extensive simulations we were able to extract statistically significant quantitative results showing that QGA is better than either of its components.
Key words: Optimisation, Genetic Algorithms,
Sequential Quadratic Programming, Hybrid Methods.
1. Introduction
Some of the methods developed over time to solve optimisation problems are based on the observation of the surrounding reality and on the analysis of the properties of the problem to be solved. Likewise, some algorithms developed in artificial intelligence are abstractions of models seen in nature: for example, neural networks, simulated annealing and genetic algorithms. The theory of GAs, in particular, ensures that they will arrive at an optimal solution given enough time.
On the other hand, several mathematical methods have been devised for continuous optimisation with constraints. These use first and second order information to find a feasible region, or a direction towards one, along which they can move to obtain better results. In this paper we discuss a hybrid method in which a rough search (performed by a genetic algorithm) is fine-tuned (with a second order gradient method). Putting these two techniques together allows us to improve on both, as will be discussed in what follows.
The kind of problem that is analysed in this work is:
minimize f(x)
subject to c_i(x) ≥ 0, i ∈ I
where
x ∈ ℝⁿ, n ≤ 4,
f(x): ℝⁿ → ℝ is a polynomial of degree l ≤ 5, and
c_i(x): ℝⁿ → ℝ is a set of m linear constraints that together have the form Ax − b ≥ 0, with A ∈ ℝ^(m×n) and b ∈ ℝ^m. The gradients of the selected constraints are linearly independent at any active point. The constraints were chosen as above so that they can be tackled with a traditional optimisation method. The evolutionary algorithm imposes no such restriction on the form of the constraints.
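For concreteness, the problem family above can be sketched in code. The term representation below (a map from exponent tuples to coefficients) is our own illustrative choice, not the authors' encoding; the hypercube example mirrors the constraints used in Section 5.

```python
# Sketch of the problem family: a polynomial f: R^n -> R with n <= 4 variables
# and degree <= 5, subject to linear constraints A x - b >= 0.

def eval_poly(terms, x):
    """terms maps exponent tuples to coefficients, e.g. {(2, 1): 3.0} = 3*x0^2*x1."""
    total = 0.0
    for exps, coef in terms.items():
        prod = coef
        for xi, e in zip(x, exps):
            prod *= xi ** e
        total += prod
    return total

def is_feasible(A, b, x):
    """Check the linear constraints A x - b >= 0 row by row."""
    for row, bi in zip(A, b):
        if sum(a * xi for a, xi in zip(row, x)) - bi < 0:
            return False
    return True

# Example: f(x0, x1) = x0^2 + 2*x0*x1 - 3 inside the box 0 <= xi <= 1,
# written as x >= 0 and 1 - x >= 0.
terms = {(2, 0): 1.0, (1, 1): 2.0, (0, 0): -3.0}
A = [[1, 0], [0, 1], [-1, 0], [0, -1]]
b = [0, 0, -1, -1]
```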
2. Genetic Algorithms
Genetic Algorithms are inspired by the theory of evolution developed by Charles Darwin, which states that adaptation to the environment is the key factor deciding which individuals will survive into the next generation.
Therefore, just as evolution does, genetic algorithms start by generating a randomly coded population. In each generation every individual is evaluated, the evaluation reflecting the fitness of said individual to the environment (which depends, of course, on the problem to be solved). The algorithm then iterates, changing the code of the individuals in the population with operators such as crossover and mutation.
Roughly speaking, the three operators (selection, crossover and mutation) correspond to learning about the problem, exploiting the knowledge already gained to approach the target efficiently, and exploring new points in the space of solutions.
One of the most powerful genetic algorithms is Vasconcelos' genetic algorithm (VGA), a result proved in [11]; for details refer to [2], [6] and [10]. In the VGA selection is performed based on the fitness of each of the individuals, including those present in the previous population. From an original population of size n we generate n new individuals through crossover and mutation (as described in what follows). We then select the best n individuals (out of the 2n) and discard the rest. Crossover is annular with probability pc. The couples for crossover are chosen a priori based on their fitness: the best individual is crossed, with probability pc, with the worst; the second best with the second worst, and so on. Mutation is performed, simply, by choosing a bit from the population with probability pm and flipping its state (0 ↔ 1). We iterate for a fixed number of generations and the solution corresponds to the individual with the best fitness (in this case, the one with the lowest function value).
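One generation of the VGA just described can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the `fitness` function is assumed to return lower values for better individuals.

```python
import random

def annular_crossover(p1, p2, pc, rng):
    """Swap a ring segment between two parents with probability pc."""
    if rng.random() >= pc:
        return p1[:], p2[:]
    L = len(p1)
    start = rng.randrange(L)
    length = rng.randrange(1, L)
    c1, c2 = p1[:], p2[:]
    for k in range(length):
        i = (start + k) % L          # indices wrap around the ring
        c1[i], c2[i] = p2[i], p1[i]
    return c1, c2

def vga_generation(pop, fitness, pc, pm, rng):
    # 1. Deterministic pairing: best with worst, 2nd best with 2nd worst, ...
    ranked = sorted(pop, key=fitness)
    offspring = []
    for i in range(len(ranked) // 2):
        c1, c2 = annular_crossover(ranked[i], ranked[-1 - i], pc, rng)
        offspring += [c1, c2]
    # 2. Bitwise mutation with probability pm per bit.
    for child in offspring:
        for i in range(len(child)):
            if rng.random() < pm:
                child[i] ^= 1
    # 3. Elitist survival: keep the best n of the 2n parents + offspring.
    return sorted(ranked + offspring, key=fitness)[:len(pop)]
```

Because the n parents compete with their n offspring for survival, the best fitness in the population can never worsen from one generation to the next.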
When applying the VGA to a constrained optimisation task, those individuals that do not satisfy all of the constraints get a very large constant added to their fitness. This penalty scheme has been used successfully before to handle constrained optimisation [26].
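The penalty scheme just described can be sketched as follows; the penalty constant 1e9 is an illustrative value, not one taken from the paper.

```python
# Sketch of the penalty scheme: infeasible individuals get a large constant
# added to their (minimised) fitness. The constant is an illustrative choice.

def penalised_fitness(f_value, constraint_values, penalty=1e9):
    """constraint_values are the c_i(x); all must be >= 0 for feasibility."""
    feasible = all(c >= 0 for c in constraint_values)
    return f_value if feasible else f_value + penalty
```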
GAs are very flexible, in the following sense: if we find a way to represent the solution in a (typically) binary-encoded individual, almost any problem may be tackled. GAs do not require the function to optimise to be differentiable. These properties make it easy to program a GA. And since it searches the whole feasible space (via mutation), a GA can (and will, given sufficient time) converge to a global minimum.
Unfortunately, a relatively long time is sometimes required to find the solution, because the GA searches the whole space of solutions and we have no information about how close it is to the global optimum once the algorithm has iterated k times.
In order to accelerate convergence, we selectively activate a mathematically more efficient optimisation method: SQP (sequential quadratic programming). The GA then receives the benefit of SQP and targets the desired global optimum more efficiently.
3. Sequential quadratic programming
Some closed-form methods for optimisation are based on optimality theorems, which mainly describe the conditions that the first and second order derivatives must satisfy in order for the algorithm to work properly (refer to [5], [8], [12], [13], [15] and [16]). Here we discuss SQP.
This method starts from a feasible point, which can be obtained, for instance, from the first phase of the Simplex method. The purpose of SQP is to convert the general problem into an easier sub-problem, namely a Quadratic Program, which can be solved to approximate the general solution and used as the basis of an iterative procedure. This quadratic sub-problem is obtained by approximating the function with its second order Taylor series. It is defined as follows:
minimize (1/2) pᵀ G p + dᵀ p
subject to A p + c_i(x_k) ≥ 0, i ∈ I
where
G = ∇²f(x) if ∇²f(x) is positive definite, and
G = ∇²f(x) + (0.0001 − e_min) I otherwise.
This is an approximation to the Hessian of f(x) evaluated at x. Likewise,
d = ∇f(x) if ∇²f(x) is positive definite, and
d = (0.0001 − e_min) ∇f(x) otherwise,
which yields an approximation to the gradient of f(x) evaluated at x, with
e_min = min{ eigenvalues of ∇²f(x) }.
A = ∇c_i(x)ᵀ is the gradient of the i-th constraint evaluated at x, which in this case is the same as the matrix that defines the linear constraints.
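The construction of G and d above can be sketched numerically. The shift (0.0001 − e_min) follows the definition of e_min given in the text, and numpy's `eigvalsh` stands in for whatever eigenvalue routine the authors used.

```python
import numpy as np

def regularised_model(H, g):
    """Return (G, d) for the QP model from Hessian H and gradient g."""
    e_min = np.linalg.eigvalsh(H).min()
    if e_min > 0:                       # Hessian already positive definite
        return H, g
    shift = 1e-4 - e_min                # raises the smallest eigenvalue to 1e-4
    G = H + shift * np.eye(H.shape[0])  # shifted Hessian approximation
    d = shift * g                       # scaled gradient, as defined above
    return G, d
```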
In several methods, a strategy is to convert a constrained problem into an unconstrained one using a merit function, which seeks to reach the minimum while satisfying the constraints. This process, in the limit, converges to the optimum within the feasible set.
Definition 1. The Lagrange function is defined as
L(x, λ) = f(x) − λᵀ c(x),
where λ is the vector of the so-called Lagrange multipliers for the constraints.
It is very common to use this function as a merit function. In order to understand the following theorems, we introduce the next definitions:
Definition 2. The active set A(x) at any feasible point x is the set of indices of the active inequality constraints; that is,
A(x) = { i ∈ I | c_i(x) = 0 }.
Definition 3. Given the point x* and the active set A(x*), we say that the linear independence constraint qualification (LICQ) holds if the set of active constraint gradients { ∇c_i(x*), i ∈ A(x*) } is linearly independent.
Theorem 1 (First-Order Necessary Conditions). Suppose that x* is a local solution and that the LICQ holds at x*. Then there is a Lagrange multiplier vector λ*, with components λ*_i, i ∈ I, such that the following conditions are satisfied at (x*, λ*):
∇ₓ L(x*, λ*) = 0,
c_i(x*) ≥ 0, for all i ∈ I,
λ*_i ≥ 0, for all i ∈ I,
λ*_i c_i(x*) = 0, for all i ∈ I.
These conditions are known as the Karush-Kuhn-Tucker conditions, or KKT conditions for short.
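For the linearly constrained problem considered here (min f(x) s.t. Ax − b ≥ 0), the KKT conditions can be checked numerically. This sketch, with illustrative tolerances, is ours and not part of the paper.

```python
import numpy as np

def kkt_satisfied(grad_f, A, b, x, lam, tol=1e-8):
    """Check the four KKT conditions at (x, lam) for min f s.t. A x - b >= 0."""
    grad_f, A, b, x, lam = map(np.asarray, (grad_f, A, b, x, lam))
    c = A @ x - b                                    # constraint values c_i(x)
    stationarity = np.allclose(grad_f - A.T @ lam, 0.0, atol=tol)
    feasibility = np.all(c >= -tol)                  # c_i(x) >= 0
    dual_feas = np.all(lam >= -tol)                  # lambda_i >= 0
    complementarity = np.allclose(lam * c, 0.0, atol=tol)
    return stationarity and feasibility and dual_feas and complementarity
```

For example, for min x² subject to x − 1 ≥ 0, the point x* = 1 with λ* = 2 satisfies all four conditions.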
The theorem for the Second-Order Necessary Conditions states that the Hessian of the Lagrange function must be positive semi-definite on a subspace defined by the function and the constraints.
Thus, the algorithm stops when it satisfies both conditions or when it becomes numerically unstable. If instability is not reached, it will rapidly arrive at a (at least local) minimum, since it has information about where to move in the next step in order to decrease the function. It can get as close to the optimum as we decide, according to the first and second order conditions for optimality. Unfortunately, we cannot know whether the point so reached is a global minimum, and the solution given by this algorithm may be only a local one. The objective function must be twice differentiable and the constraint functions must be once differentiable.
4. Hybrid algorithm
The two algorithms described in the previous sections perform their search quite differently. The algorithm proposed herein intends to take advantage of each algorithm's strengths while discarding its weaknesses, so as to obtain a more robust and efficient method.
The strategy is to use the VGA as a base and add another sort of mutation, applied with probability p_sqp when an individual is feasible. This mutation consists of applying SQP to the individual and changing the original individual and its fitness accordingly. If the individual is not feasible, it remains untouched. Because it uses SQP, we denote the resulting method the Quadratic Genetic Algorithm (QGA).
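The SQP mutation can be sketched as follows. SciPy's SLSQP solver is used here as a stand-in for the paper's SQP routine, and the demo problem at the end is hypothetical.

```python
import random
import numpy as np
from scipy.optimize import minimize

def sqp_mutation(x, f, A, b, p_sqp, rng):
    """With probability p_sqp, refine a feasible individual x by a local SQP solve."""
    x = np.asarray(x, float)
    A = np.asarray(A, float)
    b = np.asarray(b, float)
    if not np.all(A @ x - b >= 0) or rng.random() >= p_sqp:
        return x                     # infeasible individuals are left untouched
    cons = {"type": "ineq", "fun": lambda z: A @ z - b}
    res = minimize(f, x, method="SLSQP", constraints=[cons])
    return res.x if res.success else x

# Demo (hypothetical problem): minimise (x - 2)^2 subject to x <= 1,
# written as -x - (-1) >= 0; the local solve moves x from 0 to the boundary.
rng = random.Random(0)
x_new = sqp_mutation([0.0], lambda z: (z[0] - 2.0) ** 2, [[-1.0]], [-1.0], 1.0, rng)
```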
5. Results
To test the three algorithms we generated random polynomials of no more than 4 variables and of degree 5 at most. We also generated random constraints that define a hypercube in the solution space.
A statistical methodology was used to analyse each algorithm. We basically estimate the sample mean and the standard deviation. We find confidence intervals and, in order to make a better estimation of the parameters, we use the central limit theorem and Chebyshev's theorem to obtain a more general confidence interval.
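The 93.75% level quoted in part IV of Table 1 corresponds to k = 4 in Chebyshev's inequality, which guarantees coverage of at least 1 − 1/k² within k standard deviations regardless of the distribution. A sketch:

```python
def chebyshev_interval(mean, std, k=4.0):
    """Return (coverage, interval): a distribution-free (1 - 1/k^2) interval."""
    coverage = 1.0 - 1.0 / k ** 2       # k = 4 gives 0.9375, i.e. 93.75%
    return coverage, (mean - k * std, mean + k * std)

# Example with the SQP statistics from Table 1 (mean -0.7130, s.d. 1.2998):
cov, (lo, hi) = chebyshev_interval(-0.7130, 1.2998)
```

With the SQP mean and standard deviation from part I of Table 1, this approximately reproduces the interval (−5.9128, 4.4868) reported in part IV, up to rounding of the inputs.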
The results pertaining to the minimum values found are shown in Table 1.
Table 1. Parameters from the simulations (statistics of the minimiser found)

Test with the original data (total number of runs: 5450)

I. Statistics
                  SQP         VGA         QGA
Mean           -0.7130     -1.0823     -1.0875
S. deviation    1.2998      1.8330      1.8380
Minimum       -17.5294    -19.9167    -19.9168
Maximum         1.8496      0.9912      0.9912

II. Fail and success proportion
             SQP                        VGA                        QGA
Proportion   than AG    than AG_SQP     than SQP   than AG_SQP     than SQP   than AG
Better       60.81%     21.04%          38.33%      9.02%          57.05%     89.28%
Same          0.86%     21.91%           0.86%      1.70%          21.91%      1.70%
Worse        38.33%     57.05%          60.81%     89.28%          21.04%      9.02%

Test with sample means of the original data
(number of sample means: 109; size of each sample: 50)

III. χ² test for normality
                  SQP         VGA         QGA
χ² statistic    6.8716      9.6239      7.7890

IV. 93.75% confidence interval with Chebyshev's theorem for the original data
                  SQP         VGA         QGA
Better case     -5.9128     -8.6533     -8.6291
Worst case       4.4868      6.4888      6.4542

Proportion (better case / worst case):
  SQP   than AG:      0.6833 / 1.4462    than AG_SQP:  0.6852 / 1.4385
  VGA   than SQP:     1.4635 / 0.6915    than AG_SQP:  1.0028 / 0.9947
  QGA   than SQP:     1.4594 / 0.6952    than AG:      0.9972 / 1.0054

SQP - Sequential Quadratic Programming algorithm
VGA - Vasconcelos' Genetic Algorithm (referred to as AG)
QGA - Genetic Algorithm with SQP (referred to as AG_SQP)
* μ = mean
In Table 1, part I (Statistics) we compare the mean values for SQP, VGA and QGA (the hybrid algorithm). The columns entitled "AG" refer to VGA; the columns entitled "AG_SQP" refer to QGA. It is clear that VGA is much better than SQP. But QGA improves on VGA alone, as expected. In part II of the table we show the number of times (as a percentage) in which we found better results for SQP vs. VGA and QGA; for VGA vs. SQP and QGA; and for QGA vs. SQP and VGA. In the lower part of the table we display similar results, but applying the method developed in [11]. In this instance normality was not assumed. Rather, we used a chi square test with 95% confidence to guarantee that the sample means were distributed normally. At this point we were able to determine μ_X̄ and σ_X̄. From these it is easy to determine the original population's parameters, since μ = μ_X̄ and σ = √n · σ_X̄.
The table shows that the independent results of either algorithm (VGA and SQP) were improved upon by the hybrid combination of both. However, in some cases SQP found better results than VGA and QGA. This is due to the fact that, because of its mathematically closed design, SQP is more precise when started in the vicinity of a possible solution. We stress that the computational effort implied by SQP is an order of magnitude smaller than that of VGA. Therefore, the following strategy suggests itself: run both QGA and SQP and select the best result. With this simple scheme we achieve the best results and the additional cost is negligible.
References and Bibliography
[1] Aarts, E. and Korst, J.: Simulated Annealing and
Boltzmann Machines, John Wiley & Sons, 1990.
[2] Bäck, T.: Evolutionary Algorithms in Theory and
Practice, Oxford, 1996.
[3] Bartle, R.: The Elements of Real Analysis, John
Wiley, second edition, 1976.
[4] Devore, J. L.: Probabilidad y Estadística para
ingeniería y ciencias, trad: Jorge H. Romo,
International Thomson Editores, cuarta edición, 1998.
[5] Fletcher, R.: Practical Methods of Optimisation,
John Wiley & Sons, second edition, 1990.
[6] Fogel, D.: Evolutionary Computation : Toward a
New Philosophy of Machine Intelligence, IEEE,
second edition, 1999.
[7] Freund, J., Simon, G.: Statistics, A first course,
Prentice Hall, Fifth edition, 1991.
[8] Gottfried, B., Weisman, J.: Introduction to
optimisation theory, Prentice-Hall, Inc. 282-284,
1973.
[9] Grossman, S., Álgebra lineal, McGraw Hill,
quinta edición, 1996.
[10] Kuri-Morales, A., A Comprehensive Approach
To Genetic Algorithms in Optimisation and Learning.
Theory and Applications, Instituto Politécnico
Nacional, 1999.
[11] Kuri-Morales, A., A Methodology for the
Statistical Characterization of Genetic Algorithms,
MICAI 2002, Advances in Artificial Intelligence,
LNAI 2313. Springer, 2002.
[12] Luenberger, D.: Programación Lineal y no
lineal, Addison Wesley Iberoamericana, 1989.
[13] The MathWorks Inc.: Optimisation Toolbox User's Guide, The MathWorks Inc., fourth printing, 2000.
[14] Megeath, J.: How to use statistics, Canfield
Press, San Francisco, 1975.
[15] Nemhauser, G., Rinnooy Kan, A., Todd, M.: Handbooks in Operations Research and Management Science, Volume 1: Optimisation, Elsevier Science Publishers B.V., North-Holland, 1991.
[16] Nocedal, J., Wright, S.: Numerical Optimisation, Springer, 1999.
[17] Oetiker, T., Partl, H., Hyna, I. and Schlegl, E.:
The Not So Short Introduction to LATEX, 2000.
[18] Rice, J.:Mathematical Statistics and Data
Analysis, Duxbury Press, Second Edition, 1995.
[19] Ross, S.: Introduction to Probability Models, Harcourt Academic Press, seventh edition, 2000.
[20] Ross, S.: Stochastic Processes, John Wiley & Sons, Inc., second edition, 1996.
[21] Salvat Editores, S.A.: Enciclopedia Salvat
Diccionario, Tomo 5, Salvat Editores, S.A.,
Barcelona, 1971.
[22] Walpole, R., Myers, R.: Probabilidad y
estadística, trad.: Gerardo Maldonado, McGraw-Hill,
Cuarta Edición, 1992.
[23] http://www.xrefer.com/...entry.jsp?xred=509861&secid
[24] http://www.xrefer.com/entry/171151
[25] http://www.xrefer.com/entry.jsp/xred=170766
[26] Kuri-Morales, A. and Gutiérrez-García, J.,
Penalty Function Methods for Constrained
Optimisation with Genetic Algorithms: A Statistical
Analysis, Lecture Notes in Artificial Intelligence,
LNAI 2313, Springer-Verlag, 2002, pp. 108-117.