
PENALTY METHODS IN GENETIC ALGORITHM FOR
SOLVING NUMERICAL CONSTRAINED
OPTIMIZATION PROBLEMS
A THESIS SUBMITTED TO
THE GRADUATE SCHOOL OF APPLIED SCIENCES
OF
NEAR EAST UNIVERSITY
by
MAHMOUD K.M. ABURUB
In Partial Fulfillment of the Requirements for
the Degree of Master of Science
in
Computer Engineering
NICOSIA 2012
I hereby declare that all information in this document has been obtained and presented in
accordance with academic rules and ethical conduct. I also declare that, as required by these
rules and conduct, I have fully cited and referenced all material and results that are not
original to this work.
Name, Last name: MAHMOUD ABURUB
Signature:
Date:
ABSTRACT
Optimization is a computer-based or mathematical process used to find the best
solution in a complicated hyperspace. It is an important theme that can be used to
improve a given result or to verify it; however, its success depends on the formulation of the
problem at hand. Optimization is fairly simple for some sorts of problems, but it becomes more
complicated in a constrained hyperspace, where equality and inequality constraints exist.
Evolutionary algorithms are among the most powerful optimization methods used for
many types of problems. Genetic algorithms, one of the strategies in use, are also powerful
optimization tools, as they are not hindered by the complexity of the hyperspace. Instead,
they operate only on the traits that need to be optimized, mimicking natural selection
and environmental adaptation, much like the genetic development process of any species.
Combining genetic algorithms with optimization in a constrained hyperspace is achieved by
applying penalty functions. Two types of constraints arise, equality and inequality
constraints; an equality constraint can be converted to inequality form by subtracting a
constant, often a rational number, from the constraint value. Satisfying the constraints is the
basic condition for a solution to be recognized as valid. Nevertheless, not all formulated
problems will be solved by an optimization method, as the attempt could suffer from a
misunderstanding of the problem or from constraint violations.
This study focuses on applying genetic algorithms to constrained problems by applying
penalties. Three types of algorithms are used: dynamic penalty, static penalty, and stochastic
ranking for constrained optimization. These methods were tested on twelve known and
published benchmark problems. We found that none of them was completely successful in
solving the whole suite of tested problems, which lends additional support to the No Free Lunch
Theorem (Wolpert & Macready, 1996). In summary, two distinct algorithms need not perform
identically within the same search space.
Finally, stochastic ranking was the best solver for the tested suite. The other
methods sometimes found a solution, but for some problems no solution could be found. In fact,
stochastic ranking almost always produced a solution that could be improved toward the best; this,
in turn, provides additional evidence for the No Free Lunch Theorem and Lamarckian theory.
Keywords: genetic algorithms, adaptive penalty, static penalty, stochastic ranking
optimization in a constrained hyperspace.
ÖZET
Optimization is a computer- or mathematics-based process used to find the best solution
to complicated hyperspace problems. It is an important theme that can be used to extend a
given result or to verify it; moreover, it is directly proportional to the formulation of the
problem in question. For some problems optimization is quite easy, but it is more complicated
in a constrained hyperspace with equality and inequality constraints.
Evolutionary algorithms are among the most effective optimization methods used for many
problems. Genetic algorithms, among the other strategies in use, are powerful optimization
tools that are unaffected by the complexity of the hyperspace. Furthermore, genetic
algorithms operate only on the traits that need to be optimized, through natural selection
and environmental adaptation, imitating the genetic development process of any species.
Combining genetic algorithms with optimization under hyperspace constraints is done only
by applying penalty functions. Where equality and inequality constraints are both present,
equality constraints can be converted into inequality constraints by subtracting a constant,
usually a rational number, from the constraint value. Satisfying the constraints is the most
basic condition for a result to be valid. Nevertheless, not every formulated problem can be
solved with an optimization method, because of possible misunderstanding of the problem or
violation of the constraints.
This study focuses on the use of genetic algorithm methods by applying penalties to
constrained problems. There are three kinds of penalty methods for constrained optimization:
dynamic penalty, static penalty, and stochastic ranking. These methods were tested on twelve
known and published benchmark problems. None of the three methods succeeded in solving all
of the problems, which supports the No Free Lunch Theorem. In summary, two different
algorithms need not behave identically.
We found that stochastic ranking was the method that reached the optimum solution on the
tested suite. Solutions can also be obtained with some of the other methods, but for some
problems no solution may be found; with stochastic ranking, however, it is more often possible
to reach a solution that can be improved into the best result. This also provides additional
support for the No Free Lunch Theorem and Lamarckian theory.
Anahtar Kelimeler (Keywords): genetic algorithms, adaptive penalty, static penalty,
stochastic ranking optimization in a constrained hyperspace.
ACKNOWLEDGMENTS
I want to thank my supervisor, Prof. Dr. Adil Amirjanov, for his help during my work,
for the sparkling ideas he has had, and for the encouragement he gave me even when I lost my
way several times; also for his patience and the hope he inspired in me to keep going and
moving forward toward that small, thin light at the end of it all. I also send my regards to
all the faculty staff and the jury members. To my mother and aunt I send my deepest thanks and
affection, as they were always there to support me. Thanks also to my special and unique
brother, Rabah, for his harmony and warm heart and for his continued support. He provided me
with valuable advice and kept my degree on track, even when there appeared to be no hope.
Dedicated to the sands of Palestine, my mother, father, aunt, Rabah, my wife, my daughter
(Shams), and my brother…
CONTENTS
ABSTRACT
ÖZET
ACKNOWLEDGMENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
  1.1. What is Optimization?
  1.2. Thesis Overview
CHAPTER 2 GENETIC ALGORITHMS
  2.1. Overview
  2.2. Binary Representation
  2.3. Selection
    2.3.1. Roulette Wheel Selection
    2.3.2. Linear Ranking Selection
    2.3.3. Tournament Selection
  2.4. Crossover
  2.5. Mutation
  2.6. Population Replacement
  2.7. Search Termination
  2.8. Solution Evaluation
  2.9. Summary
CHAPTER 3 CONSTRAINTS HANDLING METHODS
  3.1. Penalty Method
  3.2. Adaptive Penalty for Constraints Optimization
  3.3. Static Penalty for Constrained Optimization
  3.4. Stochastic Ranking for Constrained Optimization
    3.4.1. Stochastic Ranking Using a Bubble-Sort-Like Procedure
  3.5. Summary
CHAPTER 4 SIMULATION
  4.1. System Environment
  4.2. Tested Problems
  4.3. Criteria for Assessment
  4.4. No Free Lunch Theorem
  4.5. Summary
CHAPTER 5 EXPERIMENTAL RESULTS AND ANALYSIS
  5.1. Overview
  5.2. Results Discussion
  5.3. Result Comparison
  5.4. Convergence Map
  5.5. Summary
CHAPTER 6 CONCLUDING REMARKS
  6.1. Conclusions
  6.2. Future Work
BIBLIOGRAPHY
ÖZGEÇMİŞ
APPENDIX
LIST OF FIGURES
Figure 2.1.1 Genetic Algorithms Flowchart
Figure 2.3.1.1 Roulette Wheel Selection Algorithm
Figure 2.3.2.1 Linear Ranking Selection Pseudo Code
Figure 2.3.3.1 Basic Tournament Selection Pseudo Code
Figure 2.4.1 Crossover (Recombination) Algorithm
Figure 3.2.1 Adaptive Penalty Algorithm Pseudo Code
Figure 3.2.2 Static Penalty Algorithm Pseudo Code
Figure 3.4.1.1 Stochastic Ranking Using a Bubble-Sort-Like Procedure
Figure 4.1.1 System Execution Diagram
Figure 4.3.1 Upper Constraint
Figure 4.3.2 The Function
Figure 4.3.3 Lower Constraint
Figure 5.4.1 Adaptive Penalty Convergence Map
Figure 5.4.2 Static Penalty Convergence Map
Figure 5.4.3 Stochastic Ranking Convergence Map
LIST OF TABLES
Table 3.1.1 Static vs. Dynamic Penalty
Table 4.1.1 PC Configurations
Table 4.1.2 GA and System Parameters
Table 5.1.1 Number of Variables and Estimated Ratio of Feasible Region
Table 5.2.1 Adaptive Penalty Testing Results
Table 5.2.2 Static Penalty Testing Results
Table 5.2.3 Stochastic Ranking Testing Results
Table 5.3.1 Algorithms' Best Result Comparison
Table 5.4.1 Error Achieved When FES Equals 5000, 50000, and 500000
CHAPTER 1 INTRODUCTION
1.1. What is Optimization?
Our life is filled with problems; these problems are the driving force behind our
inventions and environmental enhancement strategies. In computer science,
optimization is a computer-based process used to find solutions to complex
problems. For example, if we want to find the maximum peak of a function, we
need to formalize the criteria by which a solution is recognized as an optimum,
corresponding to our aim of finding either a global optimum or a local optimum. We
may use constraints to push the algorithm toward a feasible peak, and if we want to
make things more difficult we can use mixed constraint types, such as equality and
inequality constraints together. Finally, optimization can be defined as follows:
"optimization is to find an algorithm which solves a given class of problem"
(Sivanandam & Deepa, 2008).
In mathematics we can use derivatives or differentiation to find an optimum, but
not all functions are continuous and differentiable. In general, nonlinear
programming is to find $\vec{x}$ so as to optimize $f(\vec{x})$,
$\vec{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$, where $\vec{x} \in F \subseteq S$.
The objective function $f(\vec{x})$ is defined on the search space $S$, and the set
$F \subseteq S$ defines the feasible region; usually $S$ is an $n$-dimensional subset of the
global space $\mathbb{R}^n$. Every vector $\vec{x}$ has domain boundaries
$l(i) \le x_i \le u(i)$, $1 \le i \le n$, and the feasible region is defined by a set of
constraints: inequality constraints $g_i(x) \le 0$ and equality constraints $h_j(x) = 0$.
An inequality constraint that happens to equal zero at a point is called active there; the
equality constraints are always active, equal to zero over the entire search space.
Some research has focused on local optima, where a point $\vec{x}_0$ is a local optimum if
there exists $\varepsilon > 0$ such that for all $\vec{x}$ in the $\varepsilon$-neighborhood
of $\vec{x}_0$ in $F$, $f(\vec{x}_0) \le f(\vec{x})$.
Finally, evolutionary algorithms contrast with mathematical derivatives as a
global optimization method for complex objective functions, stepping in when mathematics
fails to give a sensible solution because of the complexity of the hyperspace or
discontinuities in the function (Michalewicz & Schoenauer, 1996).
Evolutionary computing is often used to solve such complicated problems,
where the boundaries of the feasible region are strict, and genetic algorithms are
a proven optimization method whose chromosomal representation can be continuous or
discrete. Genetic algorithms can be used for complex optimization problems, since they
are not distracted by the shape or the complexity of the objective function. By adding the
constraint values to the objective function of an infeasible chromosome, they can force those
individuals toward feasibility, or impose a cost on becoming feasible. The feasible
chromosomes, on the other hand, have nothing added to or subtracted from their objective
function value. This criterion improves the feasible solutions and penalizes infeasible ones
no matter what the shape of the function is. Discontinuity is the second problem genetic
algorithms can avoid, since the constraint values steer the search around it.
By applying a penalty, irrespective of its exact criterion, unreliable chromosomes lose
their undesired traits, and they may sometimes suffer a killing penalty. In our study we
have used the same standard of penalty throughout, where individuals are repaired rather than
killed.
We used the standard forms of static and dynamic penalty, where a specified
value is added to or subtracted from the fitness of infeasible individuals. In contrast,
stochastic ranking does not alter the objective function at all; it only rearranges
individuals inside the population. In effect that is equivalent to a penalty, because the
selection method assigns individuals their probability of participating in
recombination. The results of this study showed that static penalty was the best optimization
method, because it always had an opportunity to develop toward the global optimum.
Finally, we showed that algorithms competing in the same environment but with
different strategies produced different solutions. The No Free Lunch Theorem is
interesting here, as it predicts exactly such differences in results and algorithm performance.
Twelve benchmarks were tested with three different algorithms: adaptive penalty,
static penalty, and stochastic ranking. The three methods were able to solve the
majority of the problems, with three outcome categories: solved, according to the best value
and the number of constraints satisfied; unsolved, where some constraints are not satisfied;
and permanently unsolved. Static penalty obtained the maximum number of problems solved,
the best feasibility rate, and the best standard deviation; it was closest to an identical
distribution shape. Stochastic ranking was second in rank by the same
evaluation, but solved fewer problems. Finally, adaptive penalty proved worst by
the same evaluation, although it solved the same number of problems as stochastic
ranking, namely 10 cases out of twelve. These problems were chosen because they
are complex in nature, in terms of the number of variables and constraints. All of these
benchmarks are designed to have a global optimum with varied complexity and
dimensionality, which makes for a worst-case hyperspace environment.
1.2. Thesis Overview
This thesis is organized incrementally, starting with simple material and moving
to more complicated discussions as the issues demand.
CHAPTER 2 discusses the genetic algorithms framework, its structure, and its basic
operations.
CHAPTER 3 elaborates on constraints and the different criteria that have been used to
handle them. We discuss the penalty method as the core of our system, covering three
types: adaptive penalty, static penalty, and stochastic ranking.
CHAPTER 4 describes the simulation of the tested problems and discusses how we assess
and analyze the results; it gives pseudo code for the systems and the convergence map.
Finally, there is a brief description of the No Free Lunch Theorem.
CHAPTER 5 discusses the results of testing on the twelve selected benchmark problems
and illustrates the convergence graphs.
CHAPTER 6 draws conclusions from the achieved results and outlines future work.
CHAPTER 2 GENETIC ALGORITHMS
2.1. Overview
Genetic Algorithms (GA) are the primary branch of evolutionary computing. They are
the best known and most widely used global search technique, with the ability to explore
and exploit a given search space using available performance measures (Sivanandam &
Deepa, 2008). They are also the most popular stochastic optimization methodology in use
today. The basic idea of GA is Charles Darwin's theory of "survival of the fittest," whereby
species must adapt to their environment to survive. Individuals with the fittest natural traits
have a greater ability and chance to survive. They also have higher priority for
breeding and for passing their phenotype and genotype on to future generations. GA's
basic building block is the chromosome, which contains a set of genes, where a single
gene represents a factor in the phenotype. Factors have upper and lower bounds that represent
the minimal and maximal fitness contributions in the phenotype for candidates. Genes can
provide solutions or near-solutions to the global problem. Meanwhile, gene length
determines the range of values a gene can represent; for example, if the gene length is
$n$, it can represent $2^n$ distinct binary strings (Sivanandam & Deepa, 2008), encoding
values from $0$ to $2^n - 1$ (Reeves & Rowe, 2002). On the other hand, every gene has
one or more alleles, and each allele is stored in a single locus; the set of all alleles
represents a single individual. Holland (1975) introduced the genetic algorithm for
solving nonlinear problems. GA is problem dependent, as there are many restrictions on
individual representation (i.e., binary representation, because our aim is to ensure that the
GA accurately and precisely represents all possible alleles for every point of the search
space). The allele values constitute the genotype, which maps directly onto the
phenotype, where we evaluate solutions according to their fitness. In general,
if a decision variable can take values between $a$ and $b$, with $b > a$, and it is mapped to
a binary string of length $L$, then the attainable precision $\Delta x$ is calculated by the
next equation (Reeves & Rowe, 2002):

$$\Delta x = \frac{b - a}{2^L - 1}$$
Another method has been proposed for the binary representation of individuals:
Gray coding, in which the Hamming cliff is reduced to one, unlike standard binary
representation. The probability that at least one copy of each allele is present at every
locus, for a population of $N$ strings of length $l$, can be written as follows
(Sharda & Voß, 2002):

$$p = \left(1 - \left(\tfrac{1}{2}\right)^{N-1}\right)^{l}$$
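As an illustration of Gray coding, here is a minimal runnable sketch of converting between standard binary and Gray code; the function names are ours, not from the cited references.

```python
def binary_to_gray(b: int) -> int:
    """Convert a standard binary integer to its Gray-code equivalent."""
    return b ^ (b >> 1)

def gray_to_binary(g: int) -> int:
    """Invert the Gray coding by cascading XORs from the top bit down."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

# Adjacent integers differ in exactly one Gray bit (no Hamming cliff):
assert bin(binary_to_gray(7)) == '0b100' and bin(binary_to_gray(8)) == '0b1100'
```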
GA's basic operations are selection, crossover (recombination), mutation, population
replacement, and fitness evaluation. Figure 2.1.1 shows the GA flowchart with those
operations (Haupt & Haupt, 2004). Before we continue, we must
describe fitness, which is the most important factor directing the search. It is
the criterion used to evaluate a solution, and it is problem dependent, defined with respect
to the problem statement. By decoding the gene values from genotype into
phenotype, we can construct the objective function. According to the objective function, an
individual will be more or less likely to be selected for breeding. There are
many different methods of evaluating the objective function and selecting individuals. They
can be classified as ordinal based, such as linear ranking, or proportion based, such as
roulette wheel selection.
2.2. Binary Representation
The GA first introduced by Holland (1975) used binary encoding,
because it mimics the gene representation of natural chromosomes and simplifies
the application of the basic GA operations. In our first test we used plain binary-to-decimal
conversion. For example, to represent a decimal variable equal to 15 for a given problem, we
start from 0 and assign discrete values over the range in increments of one; to cover the
numbers from 0 (0000 in binary) to 15 (1111 in binary) we need 4 bits. With this method the
results were terrible. Basically, three problems were highlighted.
A. Number of bits needed: every variable has its own domain $D_i$, with
lower and upper bounds. For example, in problem G5 (see page 34), take a
sample variable $0 \le x_1 \le 1200$: here we have the problem of
mapping the variable domain onto the binary level, i.e., assigning a suitable
binary width to every variable.
To represent $x_1$ by the trivial method we need an 11-bit binary string. On the other
hand, how could $-0.55 \le x_3 \le 0.55$ be represented by the trivial method? This
mapping issue raises a problem known as the Big Jump. We want to encode all variables as
binary strings that keep them within their boundaries. Sometimes there will be off bits
internally, so how can we manage the domain concept? For example, representing decimal 200
in binary as (11001000) leaves many empty alleles, which in total can shift the search space
toward infeasible solutions: over several successive recombination operations, in either
maximization or minimization problems, all the bits drift toward 0's or 1's after a certain
number of iterations. And if we want to represent 200 in a smaller number of bits, relying
only on full (on) bits, we get a value less than 200, which is a loss of valuable data points
in the search space. Either way is inefficient for an accurate solution. Within this bad
state of binary representation and domain satisfaction we could still construct a temporary
remedy, for instance uniform crossover, where the crossover probability is taken
independently for every bit instead of for the whole chromosome, which in a way serves as an
alternative to mutation. Moreover, we propose a methodology for constructing and retrieving
the values of variables with less complexity and more accurate results (see the decoding
sketch after this list).

Figure 2.1.1 Genetic Algorithms Flowchart (steps: define cost function, cost, and variables; select GA parameters; generate initial population; decode chromosomes; find the cost of each chromosome; select mates; mating (crossover); mutation; repeat until the termination condition is fulfilled)

Suppose we are going to maximize or minimize a function $f : \mathbb{R}^n \to \mathbb{R}$,
where each variable $x_i$ can take values from a domain $D_i = [a_i, b_i]$,
with $a_i < b_i$. If we want to optimize $f(x)$ with a given precision, each
domain should be divided into $(b_i - a_i) \times 10^n$ steps, where $n$ is the number of
decimal places desired. Let us denote by $m_i$ the smallest integer such
that $(b_i - a_i) \times 10^n \le 2^{m_i} - 1$. For example, let $0 \le D_v \le 3$, i.e.
$D_v = [0, 3]$, and suppose we need precision of degree 2; then
$(3 - 0) \times 10^2 = 300 \le 2^{m_i} - 1$, so to represent the 300 steps we need 9 bits,
which satisfies the inequality. Finally, values within the variable boundaries are recovered
according to Equation (1) (Goldberg, 1989):

$$v = a_i + \mathrm{decimal}(\text{binary string}) \cdot \frac{b_i - a_i}{2^{m_i} - 1} \qquad (1)$$
B. Imagine a more complicated scenario where one variable has a huge domain and
another has a tiny one. For the same problem G5 (see page 32), where
$-0.55 \le x_3 \le 0.55$, the question is how to represent a variable domain that
spans a negative range. Let us consider the options. If we used a probability of
being positive or negative for the corresponding set of bits in the chromosome,
that would be pure guesswork. Alternatively, we could design a control matrix of
signs for the variables, initialized at the beginning by constructing the
variables over their domain ranges and checking their signs, and then assume the
sign stays fixed independently of the search operations. Both options are terrible
in implementation and in mathematical justification. Another issue to consider is
whether all variables need to be shifted into the same number of bits. The answer
is yes, because assigning proportional numbers of bits makes the process prone to
more complicated mistakes. A further question is how to retrieve the objective
function value from the chromosome: here we need more than one standard method for
retrieving variable values, and of course a more elaborate mapping of bits to
rational or real values. Finally, the variables are discrete and mostly remain the
same across entire runs and search processes.
C. Reconstructing the binary string: after retrieving the variable values and
calculating the objective function, we need to apply the GA operations and the
penalty. The question is how to push a penalized value back onto a specific
variable: which methodology should be used to construct the binary string from the
corresponding variable value? We designed more than one solution, but all proved
unsuitable. Mostly the left-most binary string values come out as almost all
zeroes; in contrast, the numeric optimization method discussed before is
conditioned to deal only with positive binary strings. We then tried taking the
inverse of the given penalty function, testing it on a simple method, maximizing
over $x \in [0, 31]$; the produced objective function loses too many points of the
original function. The adopted solution is to alter the value of the penalized
individual according to the same Equation (1).
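As promised above, here is a minimal runnable sketch of decoding a chromosome segment into a real variable value according to Equation (1); the helper name is ours, and the bit ordering (most significant bit first) is an assumption.

```python
def decode_variable(bits: list[int], a: float, b: float) -> float:
    """Decode a binary gene (MSB first) into a real value in [a, b],
    per Equation (1): v = a + decimal(bits) * (b - a) / (2^m - 1)."""
    m = len(bits)
    decimal = int("".join(map(str, bits)), 2)
    return a + decimal * (b - a) / (2 ** m - 1)

# Example: a 9-bit gene for the domain [0, 3] with 2-decimal precision.
print(decode_variable([1] * 9, 0.0, 3.0))   # upper bound: 3.0
print(decode_variable([0] * 9, 0.0, 3.0))   # lower bound: 0.0
print(round(decode_variable([1, 0, 0, 1, 0, 1, 1, 0, 0], 0.0, 3.0), 4))
```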
2.3. Selection
Selection is the process of choosing two parents from the population for
crossover. The questions that need to be answered are: how many individuals will be
selected, and will they produce better traits, i.e., a fitter solution after breeding?
The basic selection method is fully random, where individual objective
function values are converted into selection probabilities; this process is the spirit of
roulette wheel selection. It is quite simple and easy, but other issues then
arise, for example how many copies of the two elements selected from the
population's mating pool will carry over into the next generation. Briefly, these are
the basic difficulties of selection, and complementary remedies exist to
avoid them, such as fitness scaling, selection-pressure balancing, and
ranking elements according to the nature of the problem, as in linear ranking selection.
Many other methods have been invented to solve this problem, since selection pressure and
the other attributes of the selection algorithm play the basic role in the convergence of
the algorithm. There are two major types of selection: proportional selection, where an
element's fitness is a ratio taken over the whole population, as in roulette
wheel selection; and ordinal-based selection, where fitness depends on the
rank (position) of the element in the population, with the first position reserved for the
worst individual. In this study we used binary tournament selection, because of its
coherence and its ability to give the worst individuals a chance, albeit a very low one,
to participate in the selection process. On the other hand, we can consider stochastic
ranking for constrained optimization as a selection method; however, it cannot be
recognized as a complete one, because by itself it stops short of selecting individuals from
the mating pool, so a complementary method is needed to make the selection decision.
2.3.1. Roulette Wheel Selection
Roulette wheel selection (RWS) is the selection method most commonly associated with GA.
It starts by traversing the elements of the mating pool linearly: the individuals' objective
function values are summed and the average fitness is calculated. Using the sum
of fitness, each individual's fitness is normalized so that the total probability is one, and
individuals occupy slots of the roulette wheel proportional to their fitness. Meanwhile,
the expected number of times an individual is selected is proportional to its fitness relative
to the average. Compared with other methods, this method has a disadvantage: it depends
strongly on the individual objective function, which allows the best individuals to
dominate the mating pool. It also discourages the algorithm from exploring
the infeasible search space early on, even though those individuals
may hold a solution. Finally, fitness scaling and other techniques are used to
lessen the impact of the fittest individuals on the search process. Figure 2.3.1.1 shows
the RWS algorithm pseudo code (Goldberg, 1989).
Algorithm: Roulette Wheel Selection
Input: the population P
Output: the population after selection P*
for each selection do
    x ← random ]0,1[
    i ← 1
    while i ≤ n and Σ_{k=1}^{i} f(b_k,t) / Σ_{k=1}^{n} f(b_k,t) < x do
        i ← i + 1
    od
    select b_{i,t} into P*
od
return P*

Figure 2.3.1.1 Roulette Wheel Selection Algorithm
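A minimal runnable sketch of the same idea, assuming fitness values are non-negative and higher is better; the function name is ours:

```python
import random

def roulette_wheel_select(population: list, fitness: list[float], n: int) -> list:
    """Draw n individuals with probability proportional to fitness."""
    total = sum(fitness)
    selected = []
    for _ in range(n):
        x = random.uniform(0.0, total)   # a point on the wheel
        cumulative = 0.0
        for individual, f in zip(population, fitness):
            cumulative += f
            if x <= cumulative:          # first slot whose cumulative share covers x
                selected.append(individual)
                break
    return selected

pool = ["A", "B", "C"]
print(roulette_wheel_select(pool, [1.0, 2.0, 7.0], n=5))  # "C" dominates on average
```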
2.3.2. Linear Ranking Selection
In contrast with proportional selection, linear ranking selection is based on
position: individuals are sorted with respect to the problem at hand, with the
first position reserved for the worst element. Each position in the population has
a constant probability of being selected, given by Equation (2) (Blickle &
Thiele, 1997), which defines a linear function over the ranks. The probability of the worst
individual being selected is $\eta^-/N$ and that of the best is $\eta^+/N$ (Blickle &
Thiele, 1997). The value of $\eta^-$ must lie in $[0, 1]$; on the other hand, the value of
$\eta^+$ is calculated by $\eta^+ = 2 - \eta^-$ (Blickle & Thiele, 1997), where $\eta^-$
determines the probability that the worst individual participates in the selection
process, $N$ is the population size, and $i$ is the index of the element. Figure 2.3.2.1
illustrates the pseudo code for the linear ranking selection algorithm (Blickle & Thiele,
1997).

$$p_i = \frac{1}{N}\left(\eta^- + (\eta^+ - \eta^-)\,\frac{i-1}{N-1}\right) \qquad (2)$$
2.3.3. Tournament Selection
Unlike linear ranking selection, tournament selection has a tunable selection
pressure. The population is split into subsets, $N = \{T_{lower}, T_{upper}\}$ (Sharda & Voß,
2002), where $t$ is the tour length. The elements of a tour are compared
$t$ times until $N$ individuals have been selected for the parent pool. However, the
most noted disadvantage of tournament selection is that whenever a best individual is
compared it is selected absolutely, if we use hard tournament selection pressure.
Meanwhile, "the chance of the median string being chosen is the probability that the
remaining strings in its set are all worse," i.e. $(1/2)^{t-1}$ (Sharda & Voß, 2002), where
$t$ is the tour length, and the selection pressure is $\phi = 2^{t-1}$ (Sharda & Voß, 2002).
Figure 2.3.3.1 shows the basic tournament selection pseudo code (Blickle & Thiele, 1997).
Algorithm: Linear Ranking Selection
Input: the population P(τ) and the reproduction rate of the worst individual η− ∈ [0,1[
Output: the population after selection P′
linear_ranking(η−, J_1, …, J_N)
    J ← population sorted according to fitness, worst individual in the first position
    s_0 ← 0
    for i ← 1 to N do
        s_i ← s_{i−1} + p_i, where p_i is calculated by Equation (2)
    od
    select individuals by drawing x ∈ [0,1[ and taking the first J_i with s_i ≥ x

Figure 2.3.2.1 Linear ranking selection pseudo code
Algorithm: Tournament Selection
Input: the population P(τ) and the tournament size t ∈ {1, 2, …, N}
Output: the population after selection
tournament(t, J_1, …, J_N)
    for i ← 1 to N do
        J′_i ← best individual out of t randomly picked individuals from {J_1, …, J_N}
    od
    return {J′_1, …, J′_N}

Figure 2.3.3.1 Basic tournament selection pseudo code
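A runnable sketch of tournament selection with tour length t (t = 2 gives the binary tournament used in this study), assuming higher fitness is better; the function name is ours:

```python
import random

def tournament_select(population: list, fitness: list[float], t: int = 2) -> list:
    """Fill a new parent pool by repeatedly picking t random individuals
    and keeping the fittest of each tour."""
    n = len(population)
    parents = []
    for _ in range(n):
        contenders = random.sample(range(n), t)           # t distinct indices
        winner = max(contenders, key=lambda i: fitness[i])
        parents.append(population[winner])
    return parents

pool = ["A", "B", "C", "D"]
print(tournament_select(pool, [0.1, 0.9, 0.5, 0.3], t=2))
```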
2.4. Crossover
Crossover is the reproduction operator that uses exploitation to shift the search
process toward better regions of the search space. It can hopefully produce new individuals
that are better than their parents by passing parental traits into the new offspring. It can
only recombine ancestors' traits, without any production of new ones. For every individual
the GA assigns a probability of crossover, and the chosen elements are sent to the mating
pool. The typical probability of crossover $P_c$ is constant for the entire process,
in the range 0.5-1.0 (Goldberg, 1989); a uniform random generator keeps producing random
values, and each time a new individual is selected the value is compared with $P_c$, in order
to decide whether the element enters the mating pool. Many types of crossover exist, such as
single-point crossover, multi-point crossover, uniform crossover, and three-parent crossover.
In this study we use single-point crossover, where the contents of two parents are exchanged
at a randomly chosen point. The primary disadvantage of single-point crossover is that the
heads of the parents are kept intact: they are separated, yet they may contain the solution
to the problem. In contrast, multi-point crossover uses more than one uniformly random
crossover point, splitting the parents and passing their values into the new offspring,
which gives it an advantage over single-point crossover. Uniform crossover uses a per-allele
probability, which produces a higher chance for locus values to be swapped; for example, in
a binary mask representation, if the locus value is 1 the first parent's content is sent to
the second, and vice versa if a 0 is found. Figure 2.4.1 shows the single-point crossover
pseudo code (Goldberg, 1989).
Algorithm: Single-Point Crossover
Input: two individuals randomly picked from the mating pool
Output: two newly explored offspring
position ← random {1, …, N}
for i ← 1 to position do
    child1[i] ← parent1[i]
    child2[i] ← parent2[i]
end
for i ← position + 1 to N do
    child1[i] ← parent2[i]
    child2[i] ← parent1[i]
end

Figure 2.4.1 Crossover (Recombination) algorithm
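A runnable sketch of the same single-point crossover on bit lists; the names are ours:

```python
import random

def single_point_crossover(p1: list[int], p2: list[int]) -> tuple[list[int], list[int]]:
    """Swap the tails of two equal-length parents at a random cut point."""
    position = random.randint(1, len(p1) - 1)        # cut strictly inside the string
    child1 = p1[:position] + p2[position:]
    child2 = p2[:position] + p1[position:]
    return child1, child2

a, b = [0] * 8, [1] * 8
print(single_point_crossover(a, b))  # e.g. ([0,0,0,1,1,1,1,1], [1,1,1,0,0,0,0,0])
```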
2.5. Mutation
Mutation is the background operator that prevents the algorithm from being trapped in a
local minimum, because it explores the entire search space. For example, suppose we want to
maximize the function $f(x) = x$ on the constrained interval [0, 7]; the initial population
will not contain the best value, but by mutating loci over the iterations we may shift some
chromosomes toward values close to binary (111). The probability of mutation is applied per
allele, in contrast with crossover; mutation rarely happens because of its negligible
probability. There are many types of mutation, depending directly on the representation. For
instance, if we use real or integer data, the mutation criterion differs from the binary
case; if the data are discrete and individuals are represented in binary, then mutation is
bitwise, flipping 0 to 1 and vice versa. Finally, the probability of mutation is typically
$P_m = 1/n$ (Sharda & Voß, 2002), where $n$ is the chromosome length. Sometimes $P_m$ may be
fixed, with a typical range of 0.05-0.5 (Goldberg, 1989);
in our system we use the same values.
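A runnable sketch of per-allele bitwise mutation; the default rate follows the $P_m = 1/n$ rule above, and the function name is ours:

```python
import random

def bitwise_mutate(chromosome: list[int], p_m: float | None = None) -> list[int]:
    """Flip each bit independently with probability p_m (default 1/n)."""
    if p_m is None:
        p_m = 1.0 / len(chromosome)
    return [bit ^ 1 if random.random() < p_m else bit for bit in chromosome]

print(bitwise_mutate([0, 0, 0, 0, 0, 0, 0, 0], p_m=0.5))
```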
2.6. Population Replacement
There are many options for population replacement, but to summarize, we describe
two types:
1. After the basic GA operations, select only the best individuals by one of the preceding
methods, where all parents and offspring share the same probability of being selected.
2. Select only from the newly created offspring and discard all the parents; in other
words, a replacement method in which the offspring supersede their parents.
2.7. Search Termination
Many criteria have been devised for search termination. Because of the stochastic
nature of GA it could run forever, but it needs to be stopped at some point so the solution
can be evaluated. Stopping conditions can be classified into three types: time dependent,
iteration dependent, and fitness dependent. A combined check is sketched after this list.
1. Maximum generations: when the maximum allowed number of iterations is reached, the
algorithm must stop. Sometimes we need to estimate the required number of iterations
from the complexity of the problem; for instance, our maximum number of function
evaluations (FES) is 500000. The number of iterations is the most important and most
widely used criterion, and it is our primary stopping condition.
2. Elapsed time: the time from start to end can sometimes be used as a secondary
stopping condition. Problems vary in complexity, and sometimes we can estimate an
interval after which the run should stop; if the maximum generation number is reached
first, the run must stop regardless.
3. Minimal diversity: measuring the internal differences among traits and fitness values
is a crucial operation, because traits are preserved and the solution retains its value
even after recombination; when diversity collapses, the algorithm should be stopped.
Sometimes this criterion takes priority over the number of iterations.
4. Best individual: stop if the minimum fitness in the population drops below the
convergence value. This brings the search to a faster conclusion that guarantees at
least one good solution (Sivanandam & Deepa, 2008).
5. Worst individual: stop when the fitness of the worst individual falls below the
convergence criterion; in this case the convergence value itself may not be
attained (Sivanandam & Deepa, 2008).
6. Sum of fitness: the search is considered satisfactorily converged when the sum of
fitness over the entire population is less than or equal to the convergence value on
the population record, which guarantees that practically all elements are in the same
range (Sivanandam & Deepa, 2008).
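As referenced above, a minimal sketch combining the iteration, time, and diversity checks; the thresholds are illustrative assumptions, not values from this study:

```python
import time

def should_stop(generation: int, start: float, fitness: list[float],
                max_gen: int = 500_000, max_seconds: float = 3600.0,
                min_diversity: float = 1e-9) -> bool:
    """Stop on maximum generations, elapsed time, or collapsed fitness diversity."""
    if generation >= max_gen:
        return True
    if time.time() - start >= max_seconds:
        return True
    if max(fitness) - min(fitness) < min_diversity:   # population has converged
        return True
    return False
```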
2.8. Solution Evaluation
In every iteration, GA enhances some traits and deletes others, so we need
to clarify the meaning of "best". The notion of best is fuzzy in most cases, but in the
final generation we need to deliver a solution. That solution may or may not be the
desired one; we can then revise our estimate of the number of iterations, or
take the best found so far as a starting point for improvement. Finally, the feasible region
may be a constrained one, as in our tested cases. We will see in Chapter 3 how we formulate
the best not merely as the minimum: it must also satisfy all the constraints to be recognized
as a solution.
2.9. Summary
Our study focuses only on the standard GA operations: we chose single-point
crossover, bitwise mutation, binary tournament selection, and population replacement by the
new offspring. Evaluation, however, is done harshly, because we evaluate solutions
not only by their fitness; we also take the number of constraints satisfied as the critical
evaluation criterion.
CHAPTER 3 CONSTRAINTS HANDLING METHODS
3.1. Penalty Method
Evolutionary Algorithms (EA) introduced the penalty method to cope with
the dilemma of constraint handling, since we have a set of constraints that
directs and drives the search process. EA can change a constrained problem A into an
unconstrained one A* by introducing a penalty. The change is achieved in
particular by adding values to, or subtracting them from, the objective function, based on
the number of constraint violations (Coello, 2000). Evolutionary computing uses two kinds
of search directive, exterior and interior (Coello, 2000). The exterior search process
starts from an infeasible region and works toward having most individuals inside a feasible
one, while the interior search process starts with small random values within the
boundaries of a feasible region and uses the constraints to stay inside it. Moreover,
exterior search has a critical advantage over interior search: the randomly generated
initial solution is under no obligation to be optimal. In this study we used the exterior
approach for its simplicity, since we relied on the algorithm itself to deliver the solution
in such a complicated search space. The general formula for an exterior penalty is shown in
Equation (3) (Davis, 1987):

$$\phi(x) = f(x) + \left(\sum_{i=1}^{n} r_i \cdot G_i + \sum_{j=1}^{p} c_j \cdot L_j\right) \qquad (3)$$

where $\phi(x)$ is the new objective function value, $f(x)$ is the objective function before
applying the penalty, calculated according to the problem statement, $G_i$ relates to the
inequality constraints, $L_j$ to the equality constraints, and $r_i$ and $c_j$ are the
respective penalty coefficients. Every equality constraint is converted into inequality form
by introducing a tolerance $\varepsilon = 0.0001$, requiring $|h_i(x)| - \varepsilon \le 0$
(Liang, et al., 2006). The value $G_i = (\max[0, g_i(x)])^{\beta}$ (Coello, 2002), so
individuals that satisfy the constraints contribute nothing to the summation, are not
penalized, and retain their value. On the other hand, $L_j = |h_j(x)|^{\beta}$
(Coello, 2000): the absolute value is calculated, and by subtracting the tolerance we can
classify constraint satisfaction. Finally, the value of $\beta$ is normally 1 or 2
(Coello, 2000).
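A minimal runnable sketch of the exterior penalty of Equation (3) on a toy problem; the coefficients and the tolerance handling follow the text, but the concrete numbers and the toy constraints are illustrative assumptions:

```python
def exterior_penalty(x, f, ineq, eq, r=1.0, c=1.0, beta=2, eps=1e-4):
    """phi(x) = f(x) + sum r*max(0, g_i(x))^beta + sum c*(|h_j(x)| beyond eps)^beta."""
    g_term = sum(r * max(0.0, g(x)) ** beta for g in ineq)
    h_term = sum(c * max(0.0, abs(h(x)) - eps) ** beta for h in eq)
    return f(x) + g_term + h_term

# Toy problem: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0 (i.e. x >= 1).
f = lambda x: x * x
g = lambda x: 1.0 - x
print(exterior_penalty(0.5, f, [g], []))  # infeasible: 0.25 + 0.25 = 0.5
print(exterior_penalty(1.5, f, [g], []))  # feasible: 2.25, no penalty added
```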
After introducing the main formula of the penalty function, some problems become
apparent. For example, what is the best value for the penalty factors? To answer this
question we need to consider the consequences of the chosen values. For instance, if we
choose a penalty factor that is too high, the GA search process is pushed immediately
into the feasible region, which yields a fast solution with less consistency,
because the algorithm will not be able to exploit the more infeasible individuals,
which could hold an optimal solution. In contrast, if we choose penalty factors that are
too small, the algorithm explores more infeasible solutions and will mostly avoid being
trapped in a local minimum, but the convergence time becomes long. So neither a high nor a
negligible factor is optimal, and this issue has been addressed many times in previous
research and conferences.
An interesting solution, stochastic ranking with a bubble-sort-like procedure, has
been proposed; it will be discussed in detail later. The main idea is how to balance
the objective and penalty functions: better penalty coefficients let more individuals be
improved toward the given problem and allow them to enter the mating pool. Many
methods study how to treat individuals in proportion to their state (i.e., whether they
lie inside or outside the feasible region). Let us imagine the scenarios for an
individual. First, it may sit in the feasible region: how are we going to treat it, and
how much pressure should we apply to it? Second, the individual may reside
outside the feasible region: what is the best penalty factor to repair it and make it
feasible?
Some guidelines and heuristics for designing penalty functions were given in
(Richardson, Palmer, Liepins, & Hilliard, 1989), as summarized by Coello, with
recommendations such as:
1. Penalty functions based on distance from feasibility achieve better performance than
those based merely on the number of violated constraints.
2. If the number of constraints is limited, the number of feasible regions is
limited too, which implies the algorithm will frequently fail to find a solution;
this was the case in our study for cases 1 and 10.
3. The penalized fitness of a point should be close to that of its nearby feasible region.
Many studies have been carried out previously; all of them may be categorized
into one of two basic methods, static penalty and dynamic penalty.
• Static penalty: as the name suggests, the penalty function attributes and
penalty coefficients are kept constant for all iterations, without any
feedback from the population. They may come from previously collected statistical
data, or they may be a raw guess. In our view, the key drawback of this method
appears in the final stages of the search process; we cannot use a penalty
coefficient that prevents the algorithm from reaching the global optimum, much as
the mutation probability can in a simple GA. Fixing the penalty coefficient
without any prior information about the problem or the feasible search space can
still yield a good solution, but it will mostly get trapped in a local minimum,
depending on the problem's complexity. Homaifar, Lai, & Qi (1994) proposed an
approach in which the user defines several levels of violation, and a penalty
coefficient is chosen for each level in such a way that the coefficient increases
at higher levels of violation. Equation (4) shows the individual evaluation
(Michalewicz, 1996), where $R_{k,i}$ are the penalty coefficients, $m$ is the total
number of constraints, and $(\max[0, g_i(x)])^2$ is the quadratic penalty term:

$$\mathrm{fitness}(x) = f(x) + \sum_{i=1}^{m} R_{k,i} \cdot \left(\max[0, g_i(x)]\right)^2 \qquad (4)$$

Here $\mathrm{fitness}(x)$ is the objective function after applying the penalty,
and $k = 1, \ldots, N$, where $N$ is the number of violation levels pre-defined by
the user. The main drawback of this method, as with mutation in GA, is that the
number of violation levels makes it more complex for the algorithm to find the
optimum. Alternatively, the penalty can be calculated according to Equation (5)
(Morales & Quezada, 1998), where $s$ is the number of constraints satisfied and
$m$ is the total number of constraints:

$$\mathrm{fitness}(x) = \begin{cases} f(x) & \text{if the solution is feasible} \\ K - \sum_{i=1}^{s} \dfrac{K}{m} & \text{otherwise} \end{cases} \qquad (5)$$

Finally, $K$ is a large number, set to $1 \times 10^9$ (Morales &
Quezada, 1998). The obvious drawback of this method is that it uses no information
about the degree of constraint violation that could hopefully direct the search
algorithm to a better region.
• Dynamic penalty: in contradistinction to the previous method, information
about the current iteration is used in the evaluation process. Joines
and Houck introduced a formula for evaluating individuals dynamically
according to the generation number $t$ (Joines & Houck, 1994):

$$\mathrm{fitness}(x) = f(x) + (C \cdot t)^{\alpha} \cdot \mathrm{SVC}(\beta, x) \qquad (6)$$

where $C$, $\alpha$ and $\beta$ are predefined constants, and the SVC function is
calculated from the constraint violations according to (Joines & Houck, 1994):

$$\mathrm{SVC}(\beta, x) = \sum_{i=1}^{n} D_i^{\beta}(x) + \sum_{j=1}^{p} D_j^{\beta}(x) \qquad (7)$$

The values of $D_i(x)$ and $D_j(x)$ are calculated according to the following
equations (Joines & Houck, 1994):

$$D_i(x) = \begin{cases} 0, & g_i(x) \le 0 \\ |g_i(x)|, & \text{otherwise} \end{cases} \quad 1 \le i \le n \qquad (8)$$

$$D_j(x) = \begin{cases} 0, & -\varepsilon \le h_j(x) \le \varepsilon \\ |h_j(x)|, & \text{otherwise} \end{cases} \quad 1 \le j \le p \qquad (9)$$

With these equations the penalty pressure on a given violation grows steadily as
the iterations advance (a runnable sketch is given after this list). Because a
normal exterior penalty starts from infeasible solutions, we need to ensure that
we start with a minimal penalty factor, to force the EA to exploit more infeasible
solutions. Dynamic penalty is a broad umbrella covering many other optimization
techniques, such as dynamic simulated annealing and adaptive penalty.
From the previous discussion we can summarize the differences between static and
dynamic penalty:
Table 3.1.1 Static vs. dynamic penalty

Static Penalty                                      Dynamic Penalty
Penalty function is constant                        Penalty function and coefficients change
                                                    dynamically with the current iteration
Needs a priori information about the                Needs user-defined values, such as alpha
probability of constraint violations                and beta, to be assigned accurately
Less accuracy                                       More accuracy
Hard to define the penalty factor                   Hard to define the penalty function
• Adaptive penalty: this is a flavor of dynamic penalty in which information
about the current population is used to apply a more accurate penalty value
to individuals. Meanwhile, it dramatically shifts the search in the direction of
the feasible region by taking feedback from the population. Adaptive penalty also
lets us avoid the hill-climbing problem in GA, where some moves take the algorithm
backward to a previous stage of the climb; adaptive penalty never goes back to a
previous region. Equation (10) shows the new fitness after applying the penalty
(Hadj-Alouane & Bean, 1997), where $\lambda(t)$ is the feedback from the
population, used to update the penalty function with respect to the current
generation $t$:

$$\mathrm{fitness}(x) = f(x) + \lambda(t)\left(\sum_{i=1}^{n} g_i^2(x) + \sum_{j=1}^{p} |h_j(x)|\right) \qquad (10)$$

The value of $\lambda(t)$ is updated adaptively at every iteration by
Equation (11) (Hadj-Alouane & Bean, 1997):

$$\lambda(t+1) = \begin{cases} \lambda(t)/\beta_1, & \text{case 1} \\ \beta_2 \cdot \lambda(t), & \text{case 2} \\ \lambda(t), & \text{otherwise} \end{cases} \qquad (11)$$

The values of $\beta_1$ and $\beta_2$ are both greater than 1, and they
must not be equal, to avoid cycling (Coello, 2000). Case 1 holds if the best
individual of the previous population is in the feasible region; case 2 holds
if the best individual is not in the feasible region (Coello, 2000). Finally,
$\lambda(t)$ is left unchanged when the best individual is neither consistently
feasible nor consistently infeasible. In case 1 the penalty factor becomes small
enough to keep the search process within the boundaries of the feasible region;
in contrast, it becomes high in case 2, since the search process needs to be
pushed out of the infeasible region.
There are two basic disadvantages of adaptive penalty. First,
how do we define the initial value of $\lambda$? Second, how do we define the
values of $\beta_1$ and $\beta_2$? A misunderstanding of the attributes of the
problem will surely lead to inapplicable values for them, which implies an
unfair penalty for individuals.
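As referenced in the dynamic penalty item above, here is a minimal runnable sketch of Equations (6)-(9); the constants C, alpha, and beta are illustrative assumptions:

```python
def dynamic_penalty_fitness(x, f, ineq, eq, t, C=0.5, alpha=2.0, beta=2.0, eps=1e-4):
    """Joines & Houck style evaluation: fitness = f(x) + (C*t)^alpha * SVC(beta, x)."""
    svc = sum(max(0.0, g(x)) ** beta for g in ineq)                        # Eq. (8)
    svc += sum((abs(h(x)) if abs(h(x)) > eps else 0.0) ** beta for h in eq)  # Eq. (9)
    return f(x) + (C * t) ** alpha * svc

f = lambda x: x * x
g = lambda x: 1.0 - x            # feasible iff x >= 1
for t in (1, 10, 100):           # the same violation costs more as t grows
    print(t, dynamic_penalty_fitness(0.5, f, [g], [], t))
```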
3.2. Adaptive Penalty for Constraints Optimization
Changing a constrained problem into an unconstrained one by applying a penalty is the
basic technique we used to control the search process, in order to keep it within the
feasible region. Adaptive penalty is a flavor of dynamic penalty in which information
about the current population and its individuals is retrieved in order to apply an accurate,
population-dependent penalty factor. All problems have domain constraints, and the search
process must stay within their boundaries or be pushed coherently toward the feasible
region. Let us state what we applied, following the general adaptive penalty update of
Equation (11), where three different states can occur. First, if the population contains a
feasible solution, the value of $\lambda(t)$ stays close to zero: $\beta_1$ controls the new
penalty function, and the entire population is penalized by the same value, which is close
to zero, within [0, 1[. Second, if only infeasible solutions are found and no feasible one
is at hand, the penalty follows $\beta_2$, whose value is a real number greater than 1; we
want to ensure that the search process is pushed more strongly toward the feasible region.
Finally, when feasible and infeasible individuals exist within the same population, we keep
the value of $\lambda(t)$ carried over from the previous iteration. The rationale is that
having both feasible and infeasible individuals implies we are already shifting individuals
roughly toward the desired region, so we can freeze the value of $\lambda(t)$. Is there a
sensible reason for the third state? In short: if we happen upon a penalty value that
balances the search, we should keep using it. The result tables in Chapter 5 will show that
adaptive penalty is not the best, and that stochastic ranking precedes the other algorithms,
since it gives a higher priority to the penalty function rather than the objective function.
Now let us raise a critical point: suppose we work in the phenotype. Then the decision
whether individuals are feasible or not is made in phenotype space, while at the same time
individuals are encoded in genotype space. We know that every left-most bit is equivalent to
all the bits to its right plus one, so the Hamming cliff becomes our problem with adaptive
penalty. To summarize, if we apply any measurement to the genotype we suffer from the
Hamming cliff; in contrast, if we apply it to the phenotype, how many iterations will we
need to turn an undesired phenotype into an applicable one? Figure 3.2.1 illustrates the
pseudo code for adaptive penalty.
3.3. Static Penalty for Constrained Optimization
In contrast with adaptive penalty, static penalty starts with a sequence of
predefined penalty factors, either random or derived from searching over raw data to guess
the accurate factors. Those factors can shift the search process into the infeasible region
or keep it in the optimal one. For example, suppose we have a penalty factor of 0.5 while we
are in the feasible region; applying this factor according to the static penalty equation
drags the solution toward a less feasible region. Adaptive penalty, where the penalty
coefficient varies and depends on the current population, can drag the algorithm into a
local minimum. With optimization in a constrained hyperspace, on the other hand, we may only
penalize the infeasible individuals and must leave the feasible individuals as they are.
Algorithm: Adaptive Penalty
Input: population P before applying penalty; initial values of λ(t), β1 and β2
Output: population after applying the adaptive penalty
While population has more elements do
    x_i = split chromosome(next element)
    attributes = retrieve attributes(n variables)
    f(x) = function(attributes)
    calculate the constraint values
    $$\text{fitness}(x) = f(x) + \lambda(t)\left[\sum_{i=1}^{n} g_i^2(x) + \sum_{j=1}^{p} \left|h_j(x)\right|\right] \qquad (10)$$
    p(x) = inverse equation for f(x)
    convert p(x) to binary format
od
Figure 3.2.1 Adaptive Penalty Algorithm Pseudo Code
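To illustrate Equation (10) concretely, the following self-contained Java sketch computes the penalized fitness for problem G11, which has a single equality constraint h1; the class and method names are illustrative placeholders, not the thesis code itself.

// A minimal sketch of the penalized fitness in Equation (10), shown for G11:
// f(x) = x1^2 + (x2 - 1)^2 with the equality constraint h1(x) = x2 - x1^2 = 0.
public class AdaptiveFitnessSketch {
    static double objective(double[] x) {
        return x[0] * x[0] + (x[1] - 1.0) * (x[1] - 1.0);
    }
    static double equalityH1(double[] x) {
        return x[1] - x[0] * x[0];
    }
    // fitness(x) = f(x) + lambda(t) * (sum_i g_i(x)^2 + sum_j |h_j(x)|);
    // G11 has no inequality constraints, so only |h1| contributes here.
    static double fitness(double[] x, double lambda) {
        double penalty = Math.abs(equalityH1(x));
        return objective(x) + lambda * penalty;
    }
    public static void main(String[] args) {
        System.out.println(fitness(new double[] {0.7, 0.5}, 10.0)); // 0.74 + 10*0.01
    }
}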
Hopefully, after a combination of the two brands, we may get a better solution with respect to what we had. Still, the same question remains unanswered: what is the best value for the penalty factors? Logically, we cannot guess the optimal, accurate value unless we make several runs. In the results section we will show that the static penalty behaves better than the adaptive penalty.
Algorithm: Static Penalty
Input: the population P, penalty factors r_i and c_j
Output: penalized population P'
Static Penalty(P, r_i, c_j)
While population has more elements do
    x_i = split current chromosome
    attributes = retrieve attributes(list)
    f(x) = function(attributes)
    calculate the constraint values
    $$\text{fitness}(x) = \begin{cases} f(x), & \text{if the solution is feasible} \\ f(x) + \sum_{k=1}^{m} r_k \,\max\big(0,\, g_k(x)\big)^2, & \text{otherwise} \end{cases} \qquad (5)$$
    p(x) = inverse equation
    convert p(x) to binary format
    add the new individual into P'
od
Return P'
Figure 3.2.2 Static Penalty Algorithm Pseudo Code
Even with the shortfalls we have described, the static penalty achieved a better optimum and mostly was not trapped in local optima. It also satisfied a greater number of constraints for the majority of tested problems, unlike both the adaptive and stochastic methods. Figure 3.2.2 illustrates the static penalty pseudocode. Finally, in our study the penalty value was computed according to Equation (5) for both equality and inequality constraints.
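A minimal Java sketch of Equation (5) follows; it assumes the constraint values are pre-converted so that a value ≤ 0 means satisfied (equalities via |h| − ε ≤ 0, as used later in this study), and all names are illustrative rather than the thesis implementation.

// Static penalty sketch: feasible solutions keep their objective value,
// infeasible ones are penalized with fixed, predefined factors r_k.
public class StaticPenaltySketch {
    static double fitness(double f, double[] constraintValues, double[] r) {
        double penalty = 0.0;
        boolean feasible = true;
        for (int k = 0; k < constraintValues.length; k++) {
            double v = Math.max(0.0, constraintValues[k]); // positive part = violation
            if (v > 0.0) feasible = false;
            penalty += r[k] * v * v;
        }
        return feasible ? f : f + penalty;
    }
    public static void main(String[] args) {
        // one satisfied and one violated constraint, penalty factors r_k = 0.5
        double[] c = { -0.2, 0.3 };
        double[] r = { 0.5, 0.5 };
        System.out.println(fitness(1.0, c, r)); // 1.0 + 0.5 * 0.3^2 = 1.045
    }
}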
3.4. Stochastic Ranking for Constrained Optimization
In the previous sections we discussed the dilemma of choosing the best penalty coefficient. We need to drive the process towards the feasible region without neglecting the role of infeasible individuals, because they may lead to a solution. We also need to control the convergence speed: once the algorithm has moved into a specified region it may be unable to move back towards an infeasible region, since the penalized objective values keep it within the boundaries of the current region. Stochastic ranking tries to balance between the objective and penalty functions. In this study we followed this style for ranking, and selected individuals by binary tournament selection, since it has the particular property of selecting individuals in proportion to average feasibility. Stochastic ranking only produces the ranking of individuals; any other selection method could be used afterwards, such as roulette wheel selection, but this raises an awkward question: how many times will a feasible individual be selected, and on what basis are infeasible individuals chosen? The consequence is another problem, domination of the mating pool by the best individuals. Ranking is the best known remedy in use; for example, GAs use linear ranking for selection, and the selection pressure it introduces affects the overall search process. However, ranking alone cannot be used for constraint handling unless a prior method balances the value of the penalty factor.
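Since binary tournament selection recurs throughout this study, a hedged Java sketch of it is given here; the fitness array and the minimization convention (lower is better) are illustrative assumptions, not the thesis code.

// Binary tournament selection: pick two individuals at random and
// return the index of the fitter one (here, the lower fitness value).
import java.util.Random;

public class TournamentSketch {
    static int select(double[] fitness, Random rnd) {
        int a = rnd.nextInt(fitness.length);
        int b = rnd.nextInt(fitness.length);
        return (fitness[a] <= fitness[b]) ? a : b;
    }
    public static void main(String[] args) {
        double[] fit = {3.2, 0.9, 1.7, 2.4};
        Random rnd = new Random(7);
        for (int i = 0; i < 4; i++) System.out.print(select(fit, rnd) + " ");
    }
}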
3.4.1. Stochastic Ranking Using a Bubble-Sort-like Procedure
This method was invented and developed by Runarsson and Yao (2000) for penalty-based constrained optimization, where an optimal penalty coefficient $r_g$ is hard to recognize in either of the previous cases. There was disagreement about how to balance the objective function against the penalty function so as to reveal a better penalty value. Elements need to be arranged in ascending or descending order, depending on the problem itself. By constructing a relationship between adjacent individuals, namely their ability to be feasible and to win the comparison, a more sophisticated idea was developed. If we let $\tilde{r}_i$ be the critical penalty coefficient, there are three regimes: below it, individuals are ranked according to the objective function, which dominates; above it, they are ranked by the penalty function, which dominates; at $\tilde{r}_i$ itself, neither one dominates. Because of all these issues, stochastic ranking with a bubble-sort flavor tries to balance the two sides of this inequality. The probability $P_w$ of an adjacent individual winning a comparison is

$$P_w = P_{fw} P_f + P_{\phi w} (1 - P_f),$$

where $P_{fw}$ is the probability of winning according to the objective function and $P_{\phi w}$ is the probability of winning according to the penalty function. If both adjacent individuals are feasible, then $P_w = P_{fw}$. In our test problems we initialized the probability $P_f$ manually, because we wanted the objective and the penalty function to play comparable roles in producing the next population. It was set to 0.475, just below $P_f = 0.5$, the neutral value at which neither the objective nor the penalty manipulates the comparison. To relate winning and losing over a given number of comparisons we can write

$$K' = \frac{n + k}{2},$$

where $n$ is the total number of comparisons, $K'$ is the total number of wins, and $k$ is the surplus of wins over losses (Runarsson & Yao, 2000).
The bubble-sort-like procedure follows the same strategy as the classical bubble sort, whose execution-time complexity is $O(n^2)$. Figure 3.4.1.1 shows its pseudocode (Runarsson & Yao, 2000). Later on we will construct another way of comparing the performance of one algorithm against another according to a different strategy, in which the domain of the objective function and the sampled points play the critical role in the performance measurement. A further issue with this algorithm is its inability to cut the needed population elements and shift them into the new population: it performs no selection of its own and needs another selection method to complete it. This drawback can be resolved by any complementary selection method; in this study we used binary tournament selection, to give a fairer selection pressure and to allow infeasible candidates to participate in the next generation.
Algorithm: Stochastic Ranking using a bubble-sort-like procedure
Input: population P, probability P_f
Output: population P'
Stochastic Ranking(P, P_f)
    I_j = j for all j ∈ {1, …, N}
    for i = 1 to N do
        for j = 1 to N − 1 do
            sample u ~ U(0, 1)
            if (φ(I_j) = φ(I_{j+1}) = 0) or (u < P_f) then
                if f(I_j) > f(I_{j+1}) then
                    swap(I_j, I_{j+1})
                fi
            else
                if φ(I_j) > φ(I_{j+1}) then
                    swap(I_j, I_{j+1})
                fi
            fi
        od
    od
Figure 3.4.1.1 Stochastic Ranking Using a Bubble-Sort-like Procedure
Runarsson and Yao constructed this algorithm for maximization, but in fact they solved minimization problems by taking the complement. They suggested that u be drawn from (0, 1) via a uniformly distributed random generator. In this study we did not have to worry about uniformity, since the Java standard library provides a uniformly distributed random generator. Runarsson and Yao also used a fixed value of P_f, suggested as 0.475, which they found suitable for the given problem suite; we used the same problems to test our algorithms.
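As a concrete illustration of Figure 3.4.1.1, the following self-contained Java sketch performs the ranking sweeps for a minimization problem; the array names f (objective values) and phi (total constraint violation) and the early-exit test are our own illustrative choices, not the thesis code.

// One run of the stochastic ranking sweeps over an index array `order`.
import java.util.Random;

public class StochasticRankingSketch {
    static void rank(int[] order, double[] f, double[] phi,
                     double pf, Random rnd) {
        int n = order.length;
        for (int i = 0; i < n; i++) {
            boolean swapped = false;
            for (int j = 0; j < n - 1; j++) {
                int a = order[j], b = order[j + 1];
                double u = rnd.nextDouble();
                // compare by objective if both feasible, or with probability pf
                boolean byObjective = (phi[a] == 0.0 && phi[b] == 0.0) || u < pf;
                boolean swap = byObjective ? f[a] > f[b] : phi[a] > phi[b];
                if (swap) {
                    order[j] = b; order[j + 1] = a;
                    swapped = true;
                }
            }
            if (!swapped) break; // early exit, as in bubble sort
        }
    }
    public static void main(String[] args) {
        int[] order = {0, 1, 2};
        double[] f = {3.0, 1.0, 2.0};
        double[] phi = {0.0, 0.5, 0.0};
        rank(order, f, phi, 0.475, new Random(1));
        for (int idx : order) System.out.print(idx + " ");
    }
}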
3.5. Summary
Converting a constrained problem into an unconstrained one is the core of the penalty method. We have discussed dynamic and static penalties, noting that the adaptive penalty is a flavor of the dynamic one. The main difference between those penalty methods and stochastic ranking is that the former add a value to (or subtract it from) the objective function, whereas stochastic ranking achieves a comparable effect purely by ranking individuals. These methods are competitors, and we will see that they behave in different manners. In the next chapter we show the differences between the three algorithms; we will find that each method has its own drawbacks and strengths, and that all of them produce solutions, but with different records and different numbers of satisfied constraints. In the No Free Lunch Theorem section we will see that every algorithm behaves differently from every other within the same search space.
CHAPTER 4 SIMULATION
4.1. System Environment
The three algorithms were tested on the same machine with the same parameters; nevertheless, their results varied. The algorithms need identical parameters in order to make a fair comparison. Table 4.1.1 lists the machine properties. The system was developed in a pyramidal shape, with the GA as its base; Figure 4.1.1 shows the system diagram.
Table 4.1.1 PC configuration

System:      Microsoft Windows
CPU:         Intel(R) Atom(TM) CPU N270 @ 1.6 GHz
RAM:         1 GB
Language:    Java
Algorithms:  Genetic Algorithm with Adaptive Penalty, Static Penalty and
             Stochastic Ranking methods for Constrained Optimization
The code was developed in Java, where complex data structures are used; hence, constant references were used instead of user-defined pointers, as references are the only kind that exists in Java.
Table 4.1.2 GA and system parameters

GA parameters
  Representation:  Binary
  Selection:       Binary tournament selection
  Crossover:       Single point, Pc in (0.0-1.0)
  Mutation:        Bitwise mutation, Pm in (0.05-0.5)
  Replacement:     Offspring replace their parents

System parameters
  Number of iterations:   5000
  Number of individuals:  100
  Bits per individual:    Proportional to the problem
  ε value:                0.0001
  Independent runs:       30
Figure 4.1.1 illustrates the overall system execution flow, where every method has its own evaluation function while the rest of the pipeline remains the same.
The system has a fixed population of 100 individuals, and the maximum number of function evaluations (FES) was 500,000. Thirty independent runs were made for each problem. Three checkpoints were placed at 5,000, 50,000 and 500,000 evaluations; they were constructed to show the dynamics of the system (Liang, et al., 2006), and statistical functions such as mean, median and standard deviation are applied to the data collected there. Finally, Table 4.1.2 shows the GA parameters and the overall system overview.
Figure 4.1.1 System execution diagram: define cost function and variables → select GA parameters → generate initial population → decode chromosomes → find the cost of each chromosome → select mates → mating (crossover) → mutation → repeat until the termination condition is fulfilled → done.
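A schematic Java outline of this loop is sketched below; every method name in it (initialPopulation, evaluate, tournamentSelect, crossover, mutate, terminated) is a hypothetical placeholder standing in for the corresponding step of Figure 4.1.1, not the thesis code.

// Skeleton of the GA execution loop with the thesis's fixed parameters
// (100 binary-coded individuals, up to 5000 iterations).
public class GaLoopSketch {
    public static void main(String[] args) {
        boolean[][] population = initialPopulation(100, 32);
        for (int iter = 0; iter < 5000 && !terminated(); iter++) {
            double[] cost = evaluate(population);             // decode + cost
            boolean[][] mates = tournamentSelect(population, cost);
            population = mutate(crossover(mates));            // single point + bitwise
        }
    }
    // --- placeholders so the sketch compiles ---
    static boolean[][] initialPopulation(int n, int bits) { return new boolean[n][bits]; }
    static boolean terminated() { return false; }
    static double[] evaluate(boolean[][] p) { return new double[p.length]; }
    static boolean[][] tournamentSelect(boolean[][] p, double[] c) { return p; }
    static boolean[][] crossover(boolean[][] p) { return p; }
    static boolean[][] mutate(boolean[][] p) { return p; }
}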
4.2. Tested Problems
1. (Floudas & Pardalos, 1987) Minimize
$$G_1(x) = 5\sum_{i=1}^{4} x_i - 5\sum_{i=1}^{4} x_i^2 - \sum_{i=5}^{13} x_i$$
Subject to:
1. $g_1(x) = 2x_1 + 2x_2 + x_{10} + x_{11} - 10 \le 0$
2. $g_2(x) = 2x_1 + 2x_3 + x_{10} + x_{12} - 10 \le 0$
3. $g_3(x) = 2x_2 + 2x_3 + x_{11} + x_{12} - 10 \le 0$
4. $g_4(x) = -8x_1 + x_{10} \le 0$
5. $g_5(x) = -8x_2 + x_{11} \le 0$
6. $g_6(x) = -8x_3 + x_{12} \le 0$
7. $g_7(x) = -2x_4 - x_5 + x_{10} \le 0$
8. $g_8(x) = -2x_6 - x_7 + x_{11} \le 0$
9. $g_9(x) = -2x_8 - x_9 + x_{12} \le 0$
where:
a. $0 \le x_i \le 1,\ i = 1, 2, \ldots, 9, 13$
b. $0 \le x_i \le 100,\ i = 10, 11, 12$
Six constraints are active ($g_1, g_2, g_3, g_7, g_8$ and $g_9$).
2. (Koziel & Michalewicz, 1999) Minimize
$$G_2(x) = -\left|\frac{\sum_{i=1}^{n} \cos^4(x_i) - 2\prod_{i=1}^{n} \cos^2(x_i)}{\sqrt{\sum_{i=1}^{n} i\,x_i^2}}\right|$$
Subject to:
1. $g_1(x) = 0.75 - \prod_{i=1}^{n} x_i \le 0$
2. $g_2(x) = \sum_{i=1}^{n} x_i - 7.5\,n \le 0$
where:
a. $n = 20$ and $0 \le x_i \le 10,\ i = 1, 2, \ldots, n$
Constraint $g_1$ is close to being active.
3. (Michalewicz, Nazhiyath, & Michalewicz, 1996) Minimize
$$G_3(x) = -\left(\sqrt{n}\right)^{n} \prod_{i=1}^{n} x_i$$
Subject to:
1. $h_1(x) = \sum_{i=1}^{n} x_i^2 - 1 = 0$
where:
a. $n = 10$ and $0 \le x_i \le 1,\ i = 1, 2, \ldots, n$
4. (Himmelblau, 1972) Minimize
$$G_4(x) = 5.3578547\,x_3^2 + 0.8356891\,x_1 x_5 + 37.293239\,x_1 - 40792.141$$
Subject to:
1. $g_1(x) = 85.334407 + 0.0056858\,x_2 x_5 + 0.0006262\,x_1 x_4 - 0.0022053\,x_3 x_5 - 92 \le 0$
2. $g_2(x) = -85.334407 - 0.0056858\,x_2 x_5 - 0.0006262\,x_1 x_4 + 0.0022053\,x_3 x_5 \le 0$
3. $g_3(x) = 80.51249 + 0.0071317\,x_2 x_5 + 0.0029955\,x_1 x_2 + 0.0021813\,x_3^2 - 110 \le 0$
4. $g_4(x) = -80.51249 - 0.0071317\,x_2 x_5 - 0.0029955\,x_1 x_2 - 0.0021813\,x_3^2 + 90 \le 0$
5. $g_5(x) = 9.300961 + 0.0047026\,x_3 x_5 + 0.0012547\,x_1 x_3 + 0.0019085\,x_3 x_4 - 25 \le 0$
6. $g_6(x) = -9.300961 - 0.0047026\,x_3 x_5 - 0.0012547\,x_1 x_3 - 0.0019085\,x_3 x_4 + 20 \le 0$
where:
a. $78 \le x_1 \le 102$
b. $33 \le x_2 \le 45$
c. $27 \le x_i \le 45,\ i = 3, 4, 5$
Two constraints are active ($g_1$ and $g_6$).
5. (Hock & Schittkowski, 1981) Minimize
$$G_5(x) = 3x_1 + 0.000001\,x_1^3 + 2x_2 + \frac{0.000002}{3}\,x_2^3$$
Subject to:
1. $g_1(x) = -x_4 + x_3 - 0.55 \le 0$
2. $g_2(x) = -x_3 + x_4 - 0.55 \le 0$
3. $h_1(x) = 1000\sin(-x_3 - 0.25) + 1000\sin(-x_4 - 0.25) + 894.8 - x_1 = 0$
4. $h_2(x) = 1000\sin(x_3 - 0.25) + 1000\sin(x_3 - x_4 - 0.25) + 894.8 - x_2 = 0$
5. $h_3(x) = 1000\sin(x_4 - 0.25) + 1000\sin(x_4 - x_3 - 0.25) + 1294.8 = 0$
where:
a. $0 \le x_1 \le 1200$
b. $0 \le x_2 \le 1200$
c. $-0.55 \le x_3 \le 0.55$
d. $-0.55 \le x_4 \le 0.55$
6. (Floudas & Pardalos, 1987) Minimize
$$G_6(x) = (x_1 - 10)^3 + (x_2 - 20)^3$$
Subject to:
1. $g_1(x) = -(x_1 - 5)^2 - (x_2 - 5)^2 + 100 \le 0$
2. $g_2(x) = (x_1 - 6)^2 + (x_2 - 5)^2 - 82.81 \le 0$
where:
a. $13 \le x_1 \le 100$
b. $0 \le x_2 \le 100$
Both constraints are active.
7. (Hock & Schittkowski, 1981) Minimize
$$G_7(x) = x_1^2 + x_2^2 + x_1 x_2 - 14x_1 - 16x_2 + (x_3 - 10)^2 + 4(x_4 - 5)^2 + (x_5 - 3)^2 + 2(x_6 - 1)^2 + 5x_7^2 + 7(x_8 - 11)^2 + 2(x_9 - 10)^2 + (x_{10} - 7)^2 + 45$$
Subject to:
1. $g_1(x) = -105 + 4x_1 + 5x_2 - 3x_7 + 9x_8 \le 0$
2. $g_2(x) = 10x_1 - 8x_2 - 17x_7 + 2x_8 \le 0$
3. $g_3(x) = -8x_1 + 2x_2 + 5x_9 - 2x_{10} - 12 \le 0$
4. $g_4(x) = 3(x_1 - 2)^2 + 4(x_2 - 3)^2 + 2x_3^2 - 7x_4 - 120 \le 0$
5. $g_5(x) = 5x_1^2 + 8x_2 + (x_3 - 6)^2 - 2x_4 - 40 \le 0$
6. $g_6(x) = x_1^2 + 2(x_2 - 2)^2 - 2x_1 x_2 + 14x_5 - 6x_6 \le 0$
7. $g_7(x) = 0.5(x_1 - 8)^2 + 2(x_2 - 4)^2 + 3x_5^2 - x_6 - 30 \le 0$
8. $g_8(x) = -3x_1 + 6x_2 + 12(x_9 - 8)^2 - 7x_{10} \le 0$
where:
a. $-10 \le x_i \le 10,\ i = 1, 2, \ldots, 10$
Six constraints are active ($g_1, g_2, g_3, g_4, g_5$ and $g_6$).
(Koziel & Michalcwicz, 1999) Minimize G8 
8.
sin 3 (2x1 ) sin( 2x 2 )
x13 ( x1  x 2 )
Subject to:
1. g1  x12  x2  1  0
2. g 2  1  x1  ( x2  4) 2  0
where:
a.
b.
9.
0  x1  10
0  x2  10
9. (Hock & Schittkowski, 1981) Minimize
$$G_9(x) = (x_1 - 10)^2 + 5(x_2 - 12)^2 + x_3^4 + 3(x_4 - 11)^2 + 10x_5^6 + 7x_6^2 + x_7^4 - 4x_6 x_7 - 10x_6 - 8x_7$$
Subject to:
1. $g_1(x) = -127 + 2x_1^2 + 3x_2^4 + x_3 + 4x_4^2 + 5x_5 \le 0$
2. $g_2(x) = -282 + 7x_1 + 3x_2 + 10x_3^2 + x_4 - x_5 \le 0$
3. $g_3(x) = -196 + 23x_1 + x_2^2 + 6x_6^2 - 8x_7 \le 0$
4. $g_4(x) = 4x_1^2 + x_2^2 - 3x_1 x_2 + 2x_3^2 + 5x_6 - 11x_7 \le 0$
where:
a. $-10 \le x_i \le 10,\ i = 1, 2, \ldots, 7$
10. (Hock & Schittkowski, 1981) Minimize
$$G_{10}(x) = x_1 + x_2 + x_3$$
Subject to:
1. $g_1(x) = -1 + 0.0025(x_4 + x_6) \le 0$
2. $g_2(x) = -1 + 0.0025(x_5 + x_7 - x_4) \le 0$
3. $g_3(x) = -1 + 0.01(x_8 - x_5) \le 0$
4. $g_4(x) = -x_1 x_6 + 833.33252\,x_4 + 100x_1 - 83333.333 \le 0$
5. $g_5(x) = -x_2 x_7 + 1250\,x_5 + x_2 x_4 - 1250\,x_4 \le 0$
6. $g_6(x) = -x_3 x_8 + 1250000 + x_3 x_5 - 2500\,x_5 \le 0$
where:
a. $100 \le x_1 \le 10000$
b. $1000 \le x_i \le 10000,\ i = 2, 3$
c. $10 \le x_i \le 1000,\ i = 4, 5, \ldots, 8$
Three constraints are active ($g_1, g_2$ and $g_3$).
11. (Koziel & Michalewicz, 1999) Minimize
$$G_{11}(x) = x_1^2 + (x_2 - 1)^2$$
Subject to:
1. $h_1(x) = x_2 - x_1^2 = 0$
where:
a. $-1 \le x_1 \le 1$
b. $-1 \le x_2 \le 1$
12. (Hock & Schittkowski, 1981) Minimize
$$G_{12}(x) = e^{x_1 x_2 x_3 x_4 x_5}$$
Subject to:
1. $h_1(x) = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 - 10 = 0$
2. $h_2(x) = x_2 x_3 - 5x_4 x_5 = 0$
3. $h_3(x) = x_1^3 + x_2^3 + 1 = 0$
where:
a. $-2.3 \le x_i \le 2.3,\ i = 1, 2$
b. $-3.2 \le x_i \le 3.2,\ i = 3, 4, 5$
4.3. Criteria for Assessment
 Constraint violation: Sometimes the feasible region is rather small, and the constraints are drawn in a complex hyperspace. Even so, we set a strict primary condition: a solution is accepted only if it satisfies all the constraints. This is a hard condition compared with other studies, which accept solutions with some constraints unsatisfied; our condition reveals a more consistent result. For example, some studies grade constraint satisfaction with fixed tolerance levels, giving three conditions:
1. constraint values greater than 1;
2. constraint values greater than 0.01;
3. constraint values greater than 0.0001.
Overall, such grading gives the algorithms more chances to do well, but on the other hand it admits poorer solutions.
Some problems have constraints of spherical shape, where satisfying one constraint can imply satisfying the second, depending on how far the element is from the origin of the sphere. Problem six is a clear example. Figures 4.3.1 to 4.3.3 show 3D graphs of the function and its constraints: Figure 4.3.1 shows the upper constraint, Figure 4.3.2 the function, and Figure 4.3.3 the lower constraint, arranged in the same graphical order. A further difficulty with constraints is the hyperspace itself: some problems have 20 or more dimensions, and the question is how to know the limit of a given constraint in order to evaluate it as satisfied. Another practical problem is that current software cannot draw more than 3D, let alone show the constraints combined with the function.
Figure 4.3.1 Upper constraint
Figure 4.3.2 The function
Figure 4.3.3 Lower constraint
From these graphs we can imagine how complicated the problem becomes if we add a new variable or move beyond three dimensions.
 Feasible rate:
$$\text{feasible rate} = \frac{\#\,\text{feasible runs}}{\text{total runs}}$$
(Liang, et al., 2006). This is the classifier we rely on when preferring one of two distinct algorithms; for example, 23 feasible runs out of 30 give a feasible rate of 76.6667%. In our study we found that not all problems were solved with a consistently high feasible rate, and it varied between algorithms.
 Quality of result: its definition varies according to the result that researchers want to achieve. In our study we focus only on the minimum value achieved by the algorithm and the number of satisfied constraints.
4.4. No Free Lunch Theorem
In the last few decades many black-box optimization algorithms have been introduced, such as evolutionary computing, neural networks, etc. These algorithms can exploit the search space with little knowledge of the optimization problem on which they are run.
Evolutionary algorithms are one of those black boxes, mimicking the process of natural selection. It is important to analyze the relationship between the algorithm and the optimization problem. The optimization community has traditionally adopted an oracle-based analysis, where the assessment of an algorithm is stated in terms of the number of function evaluations needed to achieve the solution. This method has disadvantages: not all algorithms are non-revisiting, so some points are visited over and over, because the algorithms do not memorize where they have been. Such wasteful algorithms can be improved simply by making them remember what and where they have already searched.
The amount of revisiting complicates the joint analysis of algorithm and optimization problem, making mathematical analysis or filtering difficult. The No Free Lunch (NFL) theorem restricts attention to combinatorial optimization in which the search space $X$ is large and finite, as is the space $Y$ of cost values; the space of all possible problems is then the set of objective functions $f : X \rightarrow Y$, of size $|Y|^{|X|}$. Let A and B be any two distinct algorithms. Then, on average, A performs no better than B, even if B sometimes outperforms A. NFL was developed to analyze algorithms by more than just the number of iterations needed to reach a solution.
Suppose a function is applied to a given problem and the performance is investigated from a machine-learning rather than an optimization standpoint. Optimization performance is algorithm dependent; NFL, however, can evaluate the performance of an algorithm A over a whole class of problems. For instance, suppose we built a simulated annealing routine to minimize or maximize f(x); under oracle-based performance measurement we would evaluate it by execution time, counting every piece of code. The value of NFL is to construct a classifier that maps inputs to outputs and then generalizes a given algorithm's performance over a class of problems. To summarize, NFL predicts whether algorithm A will be more or less accurate than B, perhaps with better performance, within the same class of problems; it does not promise this for all classes.
Wolpert and Macready (1996) circumvented the problem by comparing algorithms according to the number of distinct function evaluations they perform, counting only distinct calls to the oracle. The oracle may embody many criteria; for a minimization problem, for example, the criterion might be the lowest value of the cost function. They introduced a time-ordered sample of size m, denoted
$$d_m = \left\{\big(d_m^x(1),\, d_m^y(1)\big), \ldots, \big(d_m^x(m),\, d_m^y(m)\big)\right\},$$
where each pair records the point generated by the algorithm at that time step together with its cost.
An optimization algorithm is thus represented as a mapping from the previously visited set of points to a single new point. The connection between algorithms and their cost functions is the core of the NFL theorem: it emphasizes that if an algorithm performs well on one class of problems, it must perform correspondingly worse on the remaining classes (Wolpert and Macready, 1996).
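Formally, for any two algorithms $a_1$ and $a_2$, the theorem states that, summed over all possible cost functions $f$, the probability of obtaining any particular sample of cost values $d_m^y$ after $m$ distinct evaluations is identical:

$$\sum_f P\big(d_m^y \mid f, m, a_1\big) = \sum_f P\big(d_m^y \mid f, m, a_2\big).$$

In other words, averaged over all problems, no algorithm outperforms any other (Wolpert & Macready, 1996).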
4.5. Summary
In this chapter we have constructed the base of our system and its parameters, so as to be more confident in our results. We have stated the assessment criteria we follow in order to obtain a more reliable solution. Finally, we have framed our work with the No Free Lunch theorem, which will support us when comparing the set of algorithms.
CHAPTER 5 EXPERIMENTAL RESULTS AND ANALYSIS
5.1. Overview
Twelve problems were tested in the same environment, in three flavors: adaptive penalty, static penalty, and stochastic ranking. All runs started from a stochastic population and continued the search according to the corresponding method.
Table 5.1.1 Number of variables and estimated ratio of feasible region

Problem   n    Optimum        ρ = |F| / |S|
G1        13   -15.0000       0.0111%
G2        20   -0.8036        99.9971%
G3        10   -1.0005        0.0000%
G4        5    -30665.5386    52.1230%
G5        4    5126.4967      0.0000%
G6        2    -6961.8138     0.0066%
G7        10   24.3062        0.0003%
G8        2    -0.0958        0.8560%
G9        7    680.6300       0.5121%
G10       8    7049.2480      0.0010%
G11       2    0.7499         0.0000%
G12       5    0.0539         0.0000%
For the two penalty methods the fitness is computed from the objective and penalty functions together; for stochastic ranking no value is added to or subtracted from the solution's fitness, since the objective and penalty functions are used only for sorting. In all cases the objective function is calculated according to the benchmark, and the penalty function is derived from the values of the constraints attached to the problem. Table 5.1.1 shows the problem suite and the feasible-region ratios, where ρ is the estimated ratio between the feasible region F and the search space S, and n is the number of dimensions (Liang, et al., 2006).
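The ratio ρ was determined experimentally in the benchmark report by sampling the search space uniformly with 1,000,000 random points (Liang, et al., 2006). A minimal Java sketch of that estimation for problem G6 might look as follows; the seed and structure are our own illustrative choices.

// Monte Carlo estimate of rho = |F|/|S| for G6 over its box constraints.
import java.util.Random;

public class RhoEstimateSketch {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        int samples = 1_000_000;
        int feasible = 0;
        for (int s = 0; s < samples; s++) {
            double x1 = 13.0 + 87.0 * rnd.nextDouble();   // 13 <= x1 <= 100
            double x2 = 100.0 * rnd.nextDouble();         //  0 <= x2 <= 100
            double g1 = -Math.pow(x1 - 5, 2) - Math.pow(x2 - 5, 2) + 100.0;
            double g2 = Math.pow(x1 - 6, 2) + Math.pow(x2 - 5, 2) - 82.81;
            if (g1 <= 0 && g2 <= 0) feasible++;           // both constraints satisfied
        }
        System.out.printf("estimated rho = %.4f%%%n", 100.0 * feasible / samples);
    }
}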
5.2. Results Discussion
Let us start with an anatomical discussion of the three methods. Scanning them, we recognize three different types from a theoretical point of view: the static penalty is, as its name says, static; the adaptive penalty is a flavor of the dynamic penalty; and stochastic ranking acts similarly to a penalty, but by ranking individuals. These three methods were placed in the same environment with the same parameters: the population was fixed at 100 individuals for all tests and the number of iterations at 5000. In total there were twelve problems, published in many optimization studies. According to our results they fall into three classes: solved, inapplicable, and requiring different parameters to reach the desired solution. Problem G7 in particular suffered from a shortage of suitable parameter values. The remaining cases somehow achieved a solution, sometimes varied, sometimes close in value across the three methods. The problems contain equality and inequality constraints. Equality constraints were transformed into inequality format using the tolerance factor ε = 0.0001 (Liang, et al., 2006); that is, they were changed to |h_i(x)| − ε ≤ 0 instead of their standard form h_i(x) = 0. Equality constraints entered the penalty as absolute values, while inequality constraints were raised to the power two with maximum zero, i.e. max(0, g_i(x))². For every case we made 30 runs and took three checkpoints to observe the dynamics of the runs; sample tables are shown in the next section. The problems were solved by using the GA as the base of the system and appending the three methods to it. At every checkpoint (FES) we recorded the optimum solution, the worst, the standard deviation, the mean, the median, and the feasible rate. Some problems show a large standard deviation despite having a solution; this may be caused by the GA encoding strategy.
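The violation measure just described can be sketched in Java as follows; the method and constant names are illustrative, with equalities relaxed by ε and inequalities entering as their squared positive part.

// Total constraint violation: |h_j| relaxed by EPS, max(0, g_i)^2 for inequalities.
public class ViolationSketch {
    static final double EPS = 0.0001;
    static double violation(double[] g, double[] h) {
        double total = 0.0;
        for (double gi : g) {
            double v = Math.max(0.0, gi);                 // g_i(x) <= 0 means satisfied
            total += v * v;
        }
        for (double hj : h) {
            double v = Math.max(0.0, Math.abs(hj) - EPS); // |h_j(x)| - eps <= 0
            total += v;
        }
        return total;
    }
    public static void main(String[] args) {
        System.out.println(violation(new double[] {-0.1, 0.2},
                                     new double[] {0.00005})); // 0.04
    }
}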
Table 5.2.1 shows the adaptive penalty testing results. Problems G2, G7, G8, G10 and G12 remained infeasible. On the other hand, problem G11 achieved good results.
Compared to the static algorithm, problem G1 reached at best -7.404016358, whereas static reached -9.275285357. Problem G4 had all constraints satisfied but did not reach the best known optimum (it reached -30281.26967), and it had a larger standard deviation than the static algorithm, with two active constraints (i.e., constraints that can be exactly zero). Problem G11 was excellent: it reached 0.7514802, the same value as the static algorithm, and with good dynamics; the mean was 0.755743176, the standard deviation (STD) 0.003979412, and the feasible rate 100%. Problems G1, G3, G4, G5, G6 and G9 all produced solutions, but above the best known optima for those problems.
In conclusion, the adaptive penalty left several problems unsolved but achieved a high feasible rate for the problems it did solve; only problem G9 had a low feasible rate, equal to 76.6667%, possibly due to factors influencing the search process. Table 5.2.2 will show that some of these problems were solved by another method, which supports the No Free Lunch Theorem.
Table 5.2.1 Adaptive Penalty testing result

Problem  best           median         mean           STD           worst          feasible rate
G1       -7.404016358   -6.297534029   -6.363586238   0.399727213   -5.633156321   100.0000%
G2       Infeasible     Infeasible     Infeasible     Infeasible    Infeasible     Infeasible
G3       -0.931253421   -0.842859046   -0.841116058   0.036221653   -0.779364494   100.0000%
G4       -30281.26967   -30005.56388   -30021.59748   144.5448427   -29661.57945   100.0000%
G5       5556.480063    6280.456975    6233.58929     341.3571198   6762.328196    100.0000%
G6       -6182.583956   -4691.12557    -4452.394377   1634.529669   -1671.319568   100.0000%
G7       Infeasible     Infeasible     Infeasible     Infeasible    Infeasible     Infeasible
G8       Infeasible     Infeasible     Infeasible     Infeasible    Infeasible     Infeasible
G9       1080.145469    112767.8806    1043658.872    2293734.51    8526739.99     76.6667%
G10      Infeasible     Infeasible     Infeasible     Infeasible    Infeasible     Infeasible
G11      0.7514802      0.755755479    0.755743176    0.003979412   0.763475586    100.0000%
G12      Infeasible     Infeasible     Infeasible     Infeasible    Infeasible     Infeasible

Best solutions x* found:
G1:  (0.921259843, 0.11023622, 0.787401575, 0.921259843, 1, 1, 0.866141732, 0.921259843, 0.937007874, 2.67960691, 0, 0, 0)
G3:  (0.322834646, 0.385826772, 0.149606299, 0.251968504, 0.527559055, 0.31496063, 0.31496063, 0.125984252, 0.267716535, 0.283464567)
G4:  (79.5483871, 33.8, 31.06451613, 43.83870968, 35.70967742)
G5:  (880.7531796, 884.6899772, 0.004330709, -0.480708661)
G6:  (14.39663065, 1.562595373)
G9:  (1.587689301, 1.450903762, 0.24914509, 1.783097215, 0.356619443, -4.626282364, 0.190522716)
G11: (-0.717647059, 0.51372549)
Table 5.2.2 shows the static penalty testing results. From the table we can see that problems G2, G8 and G12 remained infeasible. Many studies have argued that problem G7 is practically infeasible, but without showing evidence. G11 again had a good solution, as with the adaptive penalty.
Compared to Table 5.2.1, problem G1 achieved a better value than the adaptive penalty, -9.275285357, although still not the best known result. Problem G4 reached a slightly worse value than the adaptive solution, -30214.60354. Problem G11 reached the same value as adaptive, 0.7514802, but with a more accurate mean (0.7514802) and median (0.7514802); the standard deviation was enhanced too, at 4.51681E-16. Problems G7 and G10 were solved by the static penalty, but with poor results and low feasible rates.
In conclusion, problem G7 is effectively still unsolved because its feasible region is very small (see Table 5.1.1). The static penalty outperforms the adaptive penalty and stochastic ranking, since it solved the maximum number of problems compared to Tables 5.2.1 and 5.2.3, which describe the adaptive penalty and stochastic ranking methods respectively. It also achieved better solutions with better dynamics than both algorithms.
Table 5.2.3 describes the stochastic ranking results. From it we can see that problems G2, G7, G8, G10 and G12 remained infeasible, with not all constraints satisfied, much as with the previous methods; problem G11 again had a good solution.
Table 5.2.2 Static Penalty testing result

Problem  best           median         mean           STD           worst          feasible rate
G1       -9.275285357   -6.593175853   -6.598004029   0.995000984   -4.11792712    100.0000%
G2       Infeasible     Infeasible     Infeasible     Infeasible    Infeasible     Infeasible
G3       -0.933743404   -0.908843579   -0.910669567   0.008940373   -0.893903685   100.0000%
G4       -30214.60354   -30214.32218   -30212.26356   11.32879254   -30152.28217   100.0000%
G5       5189.629255    5576.881075    5560.748445    207.4249675   5965.711937    100.0000%
G6       -6335.307512   -5394.596871   -5690.293922   443.5371583   -5359.73563    100.0000%
G7       3299.438956    3299.438956    3299.438956    0             3299.438956    3.3333%
G8       Infeasible     Infeasible     Infeasible     Infeasible    Infeasible     Infeasible
G9       733.3841424    835.4437699    841.6365757    60.64115782   974.1190833    100.0000%
G10      12673.14241    20123.15371    19184.00428    4519.533281   28414.08512    63.3333%
G11      0.7514802      0.7514802      0.7514802      4.51681E-16   0.7514802      100.0000%
G12      Infeasible     Infeasible     Infeasible     Infeasible    Infeasible     Infeasible

Best solutions x* found:
G1:  (0.480314961, 0.535433071, 1, 0.976377953, 0.88976378, 0.732283465, 0.511811024, 0.866141732, 0.937007874, 1.94103644, 1.562595373, 1.57480315, 0.25984252)
G3:  (0.118110236, 0.212598425, 0.330708661, 0.385826772, 0.212598425, 0.149606299, 0.456692913, 0.346456693, 0.42519685, 0.31496063)
G4:  (…, 29.90322581, 40.93548387, 36.87096774)
G5:  (725.9180139, 990.6264544, 0.090944882, -0.411417323)
G6:  (14.35945797, 1.416102057)
G7:  (2.467024915, 1.890571568, 4.880312653, 8.983878847, -0.835368832, 2.144601856, 7.997068881, -9.198827553, 6.980947728, 5.017098192)
G9:  (0.620420127, 2.046897899, -0.434782609, 4.352711285, -1.157791891, 0.053737176, 0.786516854)
G10: (2070.24705, 1360.274658, 9242.6207, 202.8396823, 243.3321635, 159.6431705, 317.020775, 342.7770445)
G11: (-0.717647059, 0.51372549)
Compared to Tables 5.2.1 and 5.2.2, stochastic ranking solved G11 with weaker dynamics but the same feasible rate. Overall, stochastic ranking behaved poorly compared to the preceding methods, though it might be enhanced by different parameters or a different representation. Problem G1 had the worst value of the three methods, and problems G4, G5 and G9 followed the same pattern as G1. Problem G5 had a poor feasible rate of 6.6667%.
In conclusion, we noticed two basic problems with the stochastic ranking method: first, its poor standard deviations; second, its low success rate compared to the other methods. Nevertheless, it offers a conceptual improvement over the two penalty algorithms, because it eliminates the penalty factors and uses only a complementary ranking criterion.
Table 5.2.3 Stochastic Ranking testing result

Problem  best           median         mean           STD           worst          feasible rate
G1       -2.191417933   -1.069096014   -1.157421311   0.429445372   -0.503937008   100.0000%
G2       Infeasible     Infeasible     Infeasible     Infeasible    Infeasible     Infeasible
G3       -0.931253421   -0.796794371   -0.800529345   0.049871505   -0.722094899   100.0000%
G4       -30178.98389   -29639.23488   -29608.7686    318.0689861   -28714.93996   86.6667%
G5       6475.340174    6790.845141    452.7230094    446.1914042   7106.350109    6.6667%
G6       -6182.275629   -6181.034864   -5053.570525   1892.998669   -1400.92641    23.3333%
G7       Infeasible     Infeasible     Infeasible     Infeasible    Infeasible     Infeasible
G8       Infeasible     Infeasible     Infeasible     Infeasible    Infeasible     Infeasible
G9       1774.973383    109144.2174    1291656.182    2335055.026   7782401.812    43.3333%
G10      Infeasible     Infeasible     Infeasible     Infeasible    Infeasible     Infeasible
G11      0.751910804    0.783098808    0.792522876    0.033428282   0.885890042    100.0000%
G12      Infeasible     Infeasible     Infeasible     Infeasible    Infeasible     Infeasible

Best solutions x* found:
G1:  (0, 0.125984252, 0.125984252, 0.015748031, 0.519685039, 0, 0.503937008, 0.346456693, 0.503937008, 0, 0.219739974, 0.097662211, 0)
G3:  (0.330708661, 0.236220472, 0.464566929, 0.385826772, 0.283464567, 0.220472441, 0.165354331, 0.31496063, 0.440944882, 0.102362205)
G4:  (81.09677419, 33, 31.64516129, 43.25806452, 32.80645161)
G5:  (972.9841078, 992.1187753, 0.030314961, -0.532677165)
G6:  (14.40194104, 1.562595373)
G9:  (0.004885198, 0.043966781, -1.871030777, 0.200293112, 0.083048363, 0.004885198, 5.026868588)
G11: (-0.733333333, 0.537254902)
5.3. Result Comparison
Table 5.3.1 shows the algorithms and their best values. We can recognize from the table that problems G2, G7, G8, G10 and G12 were not solved by the adaptive penalty or stochastic ranking.
Table 5.3.1 Algorithms Best Result Comparison

                                        Best values
Function  Optimum value  Adaptive Penalty  Static Penalty  Stochastic Ranking
G1        -15.0000       -7.404016358      -9.275285357    -2.191417933
G2        -0.8036        Infeasible        Infeasible      Infeasible
G3        -1.0005        -0.931253421      -0.933743404    -0.931253421
G4        -30665.5386    -30281.26967      -30214.60354    -30178.98389
G5        5126.4967      5556.480063       5189.629255     6475.340174
G6        -6961.8138     -6182.583956      -6335.307512    -6182.275629
G7        24.3062        Infeasible        3299.438956     Infeasible
G8        -0.0958        Infeasible        Infeasible      Infeasible
G9        680.6300       1080.145469       733.3841424     1774.973383
G10       7049.2480      Infeasible        12673.14241     Infeasible
G11       0.7499         0.7514802         0.7514802       0.751910804
G12       0.0539         Infeasible        Infeasible      Infeasible
Table 5.3.1 shows that the static penalty solved the maximum number of problems with high consistency, while the adaptive penalty and stochastic ranking each left more problems unsolved. From the table we can also see that no single method solved every problem, and no problem was solved equally well by every algorithm; the methods compete, and each has its own best solution depending on the problem itself.
In conclusion, this comparison reinforces the No Free Lunch theorem, which says that no algorithm excels at all problems.
5.4. Convergence Map
We constructed three checkpoints at 5,000, 50,000 and 500,000 FES, the last being the maximum number of function evaluations. Logically, all runs follow the same pattern of convergence, for two reasons. First, we start with a stochastic population, apply the corresponding method's operations, and then either obtain an improvement of the given solution or at least retain the best solution known so far. Second, according to the No Free Lunch theorem, some algorithms have the ability to solve a given class of problems, and such an algorithm behaves consistently across that class.
Table 5.4.1 Error achieved when FES equal 5x10^3, 5x10^4 and 5x10^5 for problem G11

FES       Statistic  Adaptive       Static         Stochastic
5 x 10^3  best       0.7514802      0.7514802      0.751910804
          median     0.755755479    0.758431373    0.783098808
          c          0              0              0
          v          1              1              1
          mean       0.755743176    0.758423171    0.793156478
          STD        0.003979412    0.006491864    0.033036578
          worst      0.763475586    0.781930027    0.885890042
5 x 10^4  best       0.7514802      0.7514802      0.751910804
          median     0.755755479    0.751910804    0.783098808
          c          0              0              0
          v          1              1              1
          mean       0.755743176    0.752134307    0.792522876
          STD        0.003979412    0.001196255    0.033428282
          worst      0.763475586    0.756278354    0.885890042
5 x 10^5  best       0.7514802      0.7514802      0.751910804
          median     0.755755479    0.7514802      0.783098808
          c          0              0              0
          v          1              1              1
          mean       0.755743176    0.7514802      0.792522876
          STD        0.003979412    4.51681E-16    0.033428282
          worst      0.763475586    0.7514802      0.885890042
Table 5.4.1 describes the error with respect to FES for problem G11, where c is the number of violated constraints and v is the mean value of violation. We can recognize only small differences in the best results between the methods; where they varied distinctly was in the standard deviation.
From our problem set we chose problem G11, where only one constraint needs to be satisfied for a solution to be recognized as feasible. The three algorithms behaved approximately the same, and all of them gave a feasible solution. Table 5.4.1 shows the error values achieved when FES equals 5,000, 50,000 and 500,000 (Liang, et al., 2006). These checkpoints were designed to investigate the dynamics of the algorithms, to look inside them, and to see how they converged.
From the table we can see that the static penalty obtained the best value with respect to the number of constraints: it reached 0.7514802 at the first checkpoint and retained that value up to the maximum FES. It also had the best values for the mean, the median and the worst record, and an enhanced standard deviation equal to 4.51681E-16. In contrast, stochastic ranking had the largest best value and the worst standard deviation. Finally, the adaptive penalty lay in between: it started with a standard deviation of 0.003979412 at the first checkpoint and retained it up to 500,000 FES.
The standard deviation provides information about the convergence of the algorithms and their ability to solve the problem coherently.
Figure 5.4.1 Adaptive Penalty Convergence Map
A small standard deviation indicates that the algorithm performed consistently and that its dynamics developed so as to remain in the feasible region. Figures 5.4.1, 5.4.2 and 5.4.3 illustrate the convergence maps, where the best value is plotted to clarify the development of each algorithm and the objective value reached with respect to iterations.
Figure 5.4.2 Static Penalty Convergence Map
Figure 5.4.1 illustrates the convergence of the adaptive penalty: it converged to its best value in the 18th generation, although its convergence did not follow the usual logarithmic-like shape, and its best value equalled that of the static penalty. Figure 5.4.2 describes the static penalty convergence graph: it reached its best in the 24th iteration with a more logarithmic-shaped curve, and it attained the minimum best value. Finally, Figure 5.4.3 describes the stochastic ranking convergence graph: its curve is closest to the ideal shape, and it converged as early as the 4th iteration.
In conclusion, stochastic ranking was the best with respect to the shape of the curve and the number of iterations, but the worst with respect to the best value reached.
Figure 5.4.3 Stochastic Ranking Convergence Map
5.5. Summary
From the previous sections we can generalize one important point: whichever penalty method we use, none is complete; not the static, not the dynamic, and not stochastic ranking. Each method has its own strengths and weaknesses. Scanning the results, the static penalty solved 9 problems in total, while the adaptive and stochastic methods solved 7 each, though with different patterns. Ranking individuals had a remarkable impact on the search process, with improvements for some problems such as G5; it was interesting to find this effect equivalent to a penalty. Another important point is that the algorithms were fully driven by the GA, with individuals encoded as binary strings. The variation in results is not a shortcoming of the GA itself, but may be due to two factors. First, these problems are complicated enough not to be trivially solvable, since the feasible region is small inside a huge search space, and the hyperspace is complicated by very different variable ranges; problem G5, for example, has variables ranging from 0 to 1200 alongside variables ranging from -0.55 to 0.55. Second, every algorithm has its own criteria, which leads to different results. The No Free Lunch Theorem supports these findings.
CHAPTER 6 CONCLUDING REMARKS
6.1. Conclusions
In the real world objects have three dimensions, but in mathematics there can be any number of dimensions. For example, if we want to track the motion of the moon with respect to the motion of the earth, using the sun as a central point, there are nine dimensions in total. In such spaces it becomes extremely difficult to find a local or global optimum for minimization or maximization.
In classical mathematics, concavity and derivatives are the main tools, but they make for an extensively time-consuming methodology. The GA, on the other hand, is designed for unconstrained (maximization-style) problems; neither the GA alone nor calculus alone provides a complete solver for constrained problems.
The penalty method is a third-party problem solver, attached to the GA much as in any evolutionary-strategy technique. The aim of this study was to compare three penalty methods by combining the GA with a penalty and optimizing a set of problems whose constraints are sensitive and whose dimensions are large. The GA, as the core of the system, obtained quite good results and solved the majority of problems.
Applying a GA with binary representation together with stochastic ranking is, to our knowledge, a new combination; it opens a new perspective on binary-coded GAs as constrained optimizers. Applying both static and dynamic penalties to the same set of problems also deepens the understanding of the algorithms involved. Feedback from the current population might be expected to beat a fixed penalty ratio, as thought by the majority of researchers; in our case, however, the static penalty gave more reliable results than the adaptive one. Compared with the adaptive and static penalty algorithms, the simple technique of ranking individuals with a bubble-sort-like procedure gives a remarkable result, without having to guess the penalty factor and without the more complicated machinery of static and dynamic penalties.
6.2. Future Work
Future work will focus on two basic fields:
 Applying optimization to new sets of problems using the same penalty methods discussed in this study, with the same technique.
 Applying new optimization methodologies to the same set of problems in a GA flavor, such as ant colony optimization and other techniques.
BIBLIOGRAPHY
Bean, J., & Hadj-Alouane, A. (1992). A dual genetic algorithm for bounded integer programs. Technical Report TR 92-53, Department of Industrial and Operations Engineering, The University of Michigan.
Blickle, T., & Thiele, L. (1997). A comparison of selection schemes used in genetic algorithms. Evolutionary Computation, 4(4), 361-394.
Coello, C. A. (2000). Theoretical and numerical constraint-handling techniques used with evolutionary algorithms. Computer Methods in Applied Mechanics and Engineering, 191(11-12), 1245-1287.
Davis, L. (1987). Genetic Algorithms and Simulated Annealing. London: Pitman Publishing.
Floudas, C. A., & Pardalos, P. M. (1990). A Collection of Test Problems for Constrained Global Optimization Algorithms. New York: Springer-Verlag.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley.
Haupt, R. L., & Haupt, S. E. (2004). Practical Genetic Algorithms. Hoboken, NJ: John Wiley & Sons.
Himmelblau, D. M. (1972). Applied Nonlinear Programming. New York: McGraw-Hill.
Hock, W., & Schittkowski, K. (1980). Test examples for nonlinear programming codes. Journal of Optimization Theory and Applications, 30(1), 127-129.
Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press.
Homaifar, A., Lai, S., & Qi, X. (1994). Constrained optimization via genetic algorithms. Simulation, 64(4), 242-254.
Joines, J., & Houck, C. (1994). On the use of non-stationary penalty functions to solve non-linear constrained optimization problems with GAs. In Proceedings of the First IEEE International Conference on Evolutionary Computation (pp. 579-584). Orlando, FL: IEEE Press.
Koziel, S., & Michalewicz, Z. (1999). Evolutionary algorithms, homomorphous mappings, and constrained parameter optimization. Evolutionary Computation, 7(1), 19-44.
Liang, J. J., Runarsson, T. P., Mezura-Montes, E., Clerc, M., Suganthan, P. N., Coello, C. A., et al. (2006). Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization. Technical Report, Nanyang Technological University, Singapore.
Michalewicz, Z., Nazhiyath, G., & Michalewicz, M. (1996). A note on usefulness of geometrical crossover for numerical optimization problems. In L. J. Fogel, P. J. Angeline, & T. Bäck (Eds.), Proceedings of the Fifth Annual Conference on Evolutionary Programming (pp. 305-312). Cambridge, MA: MIT Press.
Michalewicz, Z. (1996). Genetic Algorithms + Data Structures = Evolution Programs (3rd ed.). Berlin: Springer.
Michalewicz, Z., & Schoenauer, M. (1996). Evolutionary algorithms for constrained parameter optimization problems. Evolutionary Computation, 4(1), 1-32.
Morales, A. K., & Quezada, C. V. (1998). A universal eclectic genetic algorithm for constrained optimization. In Proceedings of the 6th European Congress on Intelligent Techniques and Soft Computing (pp. 518-522). Aachen, Germany: Verlag Mainz.
Reeves, C. R., & Rowe, J. E. (2002). Genetic Algorithms: Principles and Perspectives. Kluwer Academic Publishers.
Richardson, J. T., Palmer, M. R., Liepins, G. E., & Hilliard, M. (1989). Some guidelines for genetic algorithms with penalty functions. In J. D. Schaffer (Ed.), Proceedings of the Third International Conference on Genetic Algorithms (pp. 191-197). George Mason University: Morgan Kaufmann.
Runarsson, T. P., & Yao, X. (2000). Stochastic ranking for constrained evolutionary optimization. IEEE Transactions on Evolutionary Computation, 4(3), 284-294.
Sivanandam, S., & Deepa, S. (2008). Introduction to Genetic Algorithms. Berlin: Springer-Verlag.
Wolpert, D. H., & Macready, W. G. (1996). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67-82.
CURRICULUM VITAE

Name, Surname: MAHMOUD ABURUB
Date of Birth: 15 October 1978

Education:
Degree   Department/Program                University                  Year
B.Sc.    Computer Information Technology   Arab American University    2007

Awards:

Undergraduate and graduate courses taught in the last two years (including any summer-term courses):

Academic Year  Term         Course                            Weekly Hours           Students
                                                              (Theory / Practice)
2010-2011      Spring       FUZZY LOGIC                       Yes / Yes              5
                            SOFT COMPUTING                    Yes / Yes              5
                            DATA AND COMPUTER COMMUNICATION   Yes / Yes              6
                            EXPERT SYSTEM                     Yes / Yes              6
2011-2012      Fall/Spring  GENETIC ALGORITHMS                Yes / Yes              6
                            PATTERN RECOGNITION               Yes / Yes              7
                            ADVANCED SOFTWARE ENGINEERING     Yes / Yes              7
APPENDIX
Penalty Methods in Genetic Algorithm for Solving Numerical Constrained Optimization Problems: CD