Comparison of GenetiC Algorithms with Conjugate Gradient Methods

NASACONTRA
YASA CR-2093
<I,/
-
CTOR
REPORT
m
gc
0
N
LOAN
COPY: RETURN
AFWL
(DOUL)
KIRTLAND
AFB, N.
M
U
TO
MI
L COMPARISON
OF GENETIC ALGORITHMS
WITH CONJUGATE
GRADIENT
METHODS
: by Jack Bosworth,
Normm
Foe, und Bernard
P. Zeigler
‘I
d
f
Prepared
j
5
g
THE
;:
for
by
UNIVERSITY
Ann Arbor,
Langley
Mich.
Research
OF MICHIGAN
48 104,
Center
I‘i‘5 NATIONAL AERONAUTICSAND SPACE ADMINISTRATION
l
WASHINGTON, D. C. . AUGUST 1972
TECH LIBRARY KAFB, NM
1.Report
No.
2. Government
4. Title
and
Accession
No.
Subtitle
-.
Jack Bosworth.
Name
and
No.
Name
and
Date
6. Performing
Organization
Code
6.
Organization
Report
Performing
10.
Work
11.
&ntract
Unit
No.
or Grant
No.
NGR-23-005-04 7
Department
13.
Address
Type
of
Report
Contractor
National Aeronautics
and Space Administration
Washington, D.C. 20546
14.
___.
.--.
Sponsoring
and
Period
Report
Agency
Code
.~
Notes
16. Abstract
Genetic algorithms
for mathematical
function optimization
are modeled on search
strategies
employed in natural adaptation.
Comparisons of genetic algorithms with
conjugate gradient methods , which have been made on sn IBM 1800 digital
computer, show
that genetic algorithms
display superior performance over gradient methods for functions
and functions obscured by
which are poorly behaved mathematically,
multimodal functions,
additive
random noise.
Furthermore,
genetic methods offer performance comparable to
gradient methods for many of the standard functions.
7. Key
Words
(Suggested
by Authoris))
-~
16.
Function optimization
Mathematical Programing
19. Security
aassif.
20.
I
sale by the
Distribution
Statement
Unclassified
(of this report)
Unclassified
*For
No.
003120-1-T
Address
The University
of Michigan
Logic of Computers Group
Computer and Communication Sciences
Ann Arbor, Michigan
48104
15. Supplementary
Report
Norman Foo, and Bernard P. Zeigler
Organization
Agency
5.
Catalog
I
7. Author(s)
12. Sponsoring
Recipient’s
August 1972
COMPARISONOF GENETIC ALGORITHMSWITH CONJUGATEGRADIENT
METHODS
9. Performing
3.
-I
NASA CR-2093
National
Security
Classif.
(of this
page)
- Unlimited
lZl.NoGPap.
Unclassified
Technical
Information
Service,
Springfield,
Virginia
22151
Covered
I.
Introduction
A function
optimization
a real valued function
points
problem may be defined
defined on a finite
dimensional
of the space at which the function
A direct
or maximum) values.
optimization
sear&
a number of points
attains
in the space until
a point
the
optimum (minimum
its
step-by-step
Given
space, find
for solving
aZgoritkm
problem is an iterative
as follows:
such an
procedure which samples
is found which is apparently
optimum.
Function optimization
arise
problems requiring
search algorithms
from the general area of the design of optimal
(Athans and Falb (1966)).
the control
control
some pre-determined
solution
plant
necessitate
(Kalman, Falb, Arbib
(controlled
optimal
control
or chemical processing
optimization
control
to optimize
controller
actions
attempts
In this
point
being based upon its
and environmental
to
Often the design of such
direct
search algorithms
to formulate
for their
beforehand a realistic
case, one may design a control
system
of view (Bellman (1959), Mishkin and Braun
An adaptive
the performance of the plant
continually
according
problems which cannot be solved
(1961), Feld'baum (1966), Sworder (1966)).
attempts
for
however, not enough is known about the
system) behavior
from the adaptive
plants,
to
(1969), Lavi and Vogl (1965)).
applications,
problem.
systems
of view, when applied
of performance.
and therefore
In many control
point
control
systems which perform optimally
criteria
systems leads to function
analytically
The optimal
of aerospace vehicles
example, involves
inputs
direct
"on line",
to improve the plant's
record of past plant
disturbances.
An adaptive
control
system
i.e.,
performance,
the
its
responses to control
controller
must possess
as essential
subcomponents, direct
search toward optimum points
search algorithms
of the criterion
which can direct
function
the
(Wilde (1964),
Hall and Ratz (1967)).
Thus the successful
rests
critically
solving
on the existence
function
algorithm
optimization
secondly,
and adaptive
of useful
problems.
in any application
to converge (i.e.,
locate
design of optimal
direct
to converge rapidZy
locate
it
variations
roundoff
is important
in the criterion
error
or plant
far removed from the actual
Genetic algorithms
upon search strategies
for practical
into
time)
and
application).
not be misled by random
(arising,
disturbances)
place
can be guaranteed to eventually
such an algorithm
function
for
search
in the first
the optimum in a finite
(many algorithms
that
search algorithms
ability
the optimum but do so much too slowly
Thirdly,
systems
The value of a direct
depends on its
to actually
control
for example, by digital
settling
on apparent optima
ones.
are direct
search algorithms
employed in natural
which are modelled
adaptation.
Attempts were made by Fogel, Owens and Walsh (1966) and Bremermann
(1966) to implement some of the search strategies
adaptation.
The techniques
employed by these workers only superficially
resembled those known to exist
did not yield
information
or cost and complexity
algorithms
in nature
concerning
the comparative
of the genetic
at the genotypic
algorithms.
level
(1967), Bagley (1967), and Cavicchio
experimental
(Mayr, 1965) and the studies
employing the mechanisms of crossover,
and reproduction
results
employed in natural
indicating
convergence properties
More sophisticated
inversion,
mutation
have been developed by Rosenberg
(1970).
the superiority
These workers obtained
of the-genetic
algorithms
to competitive
methods in the areas of pattern
adaptation
which they explored.
systematic
theoretical
existence
of an ideal
to any other plan,
Holland
analysis
it
and biochemical
has undertaken
a
His work concerns the
plan which is "good" in comparison
sustains
only a finite
time when compared to any other plan.
of the requirements
(1969a,b,c)
of these methods.
reproductive
i.e.,
recognition
loss over infinite
This criterion
that a search algorithm
is a formalization
be "efficient"
and "robust"
over a broad range of test problems.
Hollstien
optimization.
(1971) developed a class of genetic
He has shown that these algorithms
convergence on functions
as classical
single
hill
which are multipeaked
climbing
algorithms
for function
are capable of achieving
and discontinuous
where
methods operate well only on sufficiently
smooth
peaked functions.
In this
algorithms
paper we are concerned with the convergence rates
in comparison with other methods.
the convergence rates
gradient
(variable
is a severe test
of genetic methods relative
metric)
Polak (1971)) on test
problems typical
for the genetic
extraction
from the analytic
structure
other hand the conjugate
to those of the conjugate
in the latter
techniques
function)
Thus from this
inferior
Some positive
for performance however arise
Hollstien
claims superior
point
than fixed
of
of view
performance from the genetic
(1966) and Schumer (1968) which indicate
methods can be more efficient
and on the
methods have been honed to the point
one may expect relatively
Rastrigin
This
for guidance (which is available
of the usual test
gradient
area.
methods since on the one hand they do not
for these functions.
indications
we investigate
methods (Luenberger (1964), Pearson (1969),
employ derivative
extreme efficiency
As a beginning,
of genetic
methods.
from studies
by
that random step size
step size gradient
methods.
Since
performance for his methods over those of
3
Rastrigin
this
opens the possibility
that genetic
ably with the conjugate
gradient
powerful
step size gradient
than the fixed
methods can compete favor-
methods (which are themselves more
methods).
4
---
~--
-.
II.
Description
of Program
As work progressed on our optimization
We shall
a number of modifications.
by describing
description
program it naturally
attempt to portray
this
four stages of development (I,II,III,IV).
the theoretical
these modifications
will
and experimental
underwent
evolution
After
the
developments which motivated
be discussed.
We consider maximization
of real valued n-ary
functions
of the form
f:Rn + R.
A chromosome (or string)
n-dimensional
pattern
is a list
vector with an associated
is a permutation
of coordinate
inversion
of the sequence l,...,n
values of an
pattern.
An inversion
say il,...,i
. If a
n
means that there
string
is a ,...,an
with inversion pattern i ,...,i n this
1
1
is a point in n-space which corresponds to the string such that its
For example, let n=4, and the string
coordinate
is a..
3
-.4 with inversion
1.3, -.4,
va2v.e associated
(currently
being optimized)
the value associated
fC.1,
1,4,2,3
then the corresponding
be .l,
point
is (.I,
.02).
The function
function
pattern
i. th
I
.02, 1.3,
with a string
is just
at the corresponding
with the above string
is f(.l,
the value of the
point.
1.3, -.4,
Thus
.02) (not
.02, 1.3, -.4)).
Version I
The basic flow diagram for Version
I is as follows:
5
--- --..-I-~-.---_I__
__-.
-
_
cross-over
no
Forty strings
each.
I.e.,
Only one inversion
any two strings
inversion
giving
pattern.
in four subpopuZations
was associated
in the same subpopulation
called
of ten strings
with each subpopulation.
had the same associated
the utility
vector
was maintained
value of each string.
consisted
the best string
then replacing
pattern
A vector
the function
Selection
(i.e.,
were maintained
of ordering
each subpopulation
is the one with the highest
the lowest four strings
by function
function
value)
by the best four strings
value
and
(in each
subpopulation).
Cross-over
two pairs
pivot
consisted
(7,8)
points.
(9,lO)
Then all
of picking
of strings
at random two coordinates
in each subpopulation.
coordinate
a1 ,...,a
with pivot
points
bla2a3a4b5'
Inversion
best strings,
subpopulations,
5 and bl,...,
2 and 4 say.
consisted
1
the pivot
For example suppose we have a pair
b5 with inversion
The resulting
of ordering
These are called
values between and including
are exchanged between pair members.
of strings
for each of the
pattern
strings
1,2,3,4,5
are alb2b3b4a5 and
the four subpopulations
copying the best two subpopulations
and changing the inversion
patterns
and
into
by their
the worst two
of the copies as
‘1 In each subpopulation the string with the highest function value is
found (the best string of the subpopulation)
and the subpopulation
'
with the highest "best string" is best, etc.
points
folloWs.
To change the inversion
for each copy and all
I.e.,
if
strings
4 say, then the new string
Mutation
initial
were inverted
a3 is a string
al,...,
mutation
were chosen
with pivot
points,
points
2 and
is ala4a3a2a5.
A probabitifg
was more complex.
specified
points
about these pivot
of a subpopulation
vector
The vector
parameter specifications.
coordinate
two pivot
patterns,
the probability
was included
in the
had four coordinates.
Each
of using a corresponding
method of
on any given string.
The methods of mutation were:
1)
Fletcher-Reeves
A version
(FR) Mutation.
(1960) method which could be applied
a controlled
a point
(without
When q=l this
mutation,
i.e.,
by the string
reinitialization).2
an approximate gradient
which were initialized.
2)
from the point
m integers
to be mutated,
(the actual
coordinates
Gaussian
An integer,
with limits
the number
to be mutated) lvere chosen randomly
limits
amounts) were
symmetric about 0, say rl,...,rm.
coordinate
of the point.
ApproximatZon.
m and il,...,im
If & is the initialized
rl,l'rl,2'""rm,l~rm,2~
"standard deviation"of
coordinate
by the gradient
search
were
2m numbers were chosen randomly between -1 and 1, say
chosen as in 2).
i.th
1
one dimensional
m numbers (the mutation
rj was added to the i.th
7
Quadratic
specified
specified
was chosen randomly between 1 and n, say m.
chosen randomly between initialized
3)
was taken at the point
of coordinates.
between 1 and n, say il,...,im.
Finally
reduces to gradient
3
UnifoMn random mutation
of coordinates
number of times q to
to be mutated and a "Golden Section"
was made along the line
of the Fletcher-Reeves
this
of the point
mutation,
then r.
JXrj,2
for each j=l,...,j=m.
number determining
the
l L is added to the
2 Since everytime the routine is called its
remembered gradient is set to
0 this is equivalently
a reset mode of operation with reset interval
q.
jOur Fletcher-Reeves method uses 2n samples for its gradient estimation and
30 samples for its one dimensional search per iteration
(n is the dimension
of the space).
7
4)
zero mutation.
The string
For each of the forty
strings
was chosen according
corresponding
with the same inversion
the strings
pattern
value vector)
vector
was
values between two bounds, say -2 and 2.
The utility
set to 1,2,3,4...n.
with the associated
other parameters were considered
and initialized
was converted to a string
of reading in parameters and initializing
were all
was initialized
point
to the point
to the point.
consisted
patterns
and applied
as before and the utility
to random coordinate
The four inversion
vector
The resulting
the function
The initiaZization
unaltered.
one of these four methods of mutation
to the probability
to the string.
updated by applying
is left
to be subject
function
vector
values.
to experimental
All
manipulation
accordingly.
Version II
Version I was modified
Selection
four strings
replaced
1.
String
8 is replaced by string
String
II in the following
The strings
String
String
randomly from those remaining.
Thus the best two strings
until
all
strings
Cross-over
caused cross-over
strings,
rather
are
i where i is (uniform)
9 is replaced by string
by the selection.process.
with
7 is replaced by
which case 9 is replaced by 3.
duplicated
ways:
in each subpopulation
as follows.
and 7,8,9 and 10 are replaced.
chosen from 2,3,...,10.
randomly
2 unless i=2 in
10 is replaced by a string
chosen
are always
(None of the replacements
were made
were chosen.)
was done in the same way.
Note that the selection
to occur between the best strings
than among the best strings
The four mutation
8
the worst four strings
from the same subpopulation
rated 1 ,...,lO
string
to create Version
now
and randomly chosen
themselves.
methods of I were used except that
2) was altered
(function
as follows:
rl,...,rm
2') Cubic Gaussian Approximation.
between -1 and 1 then added ri*L
A fifth
5)
coordinate.
method was added:
Uniform Raxdom with
old 2) but the limits
for different
Let these limits
Before this
mutation
4.1
was mutated as before,
each subpopulation
the
=
be
was done the maximum and
values were found for each coordinate,
for the 1.th coordinate.
Each string
This method was like
rm were chosen were different
of the point.
nn,L .
minimum coordinate
VariabZe Limits.
between which rl,...,
coordinates
-l.l,.L1'-L2,.L2,...,-&
respectively
to the i,th
J
were chosen randomly
say ai and.ai,
a;-a. 1**
but when the best string
was mutated (according
to the probability
the mutant replaced the worst member in the subpopulation
in
vector),
(the best string
was also saved unmutated).
The major addition
"adaptation"
fixed
routine
_-
to the program structure
which controlled
at initialization.
These parameters included
toward selecting
was based on a history
how often
resulted
highest
in applying
mutation
which contained
function
mutations
value present
routine
The variance
producing increases
routine
2')
deviatiofl'
(determining
the
method).
information
The adaptation
concerning
and 3) for each subpopulation,
in each subpopulation
used was similar
determining
whether large mutation
of this
vector
the%tandard
and when each mutation was used, the average mutation
The adaptation
(1968).
a particular
vector
and the
to that of Schumer and Steiglitz
parameter L was modified
value.
which
before mutation.
according
amounts or small ones proved more fruitful
in the function
,
some of the parameters previously
L used in methods 3) and Z'), the probability
disposition
was a second level
to
in
A more complete description
is given in Appendix A.
9
Version III
The flow diagram for Version
III
is as follows:
I-initialization
The major change introduced
tion
into
amically
four distinct
each sharing
was that
to at most 40.
a commoninversion
there was no partitioning
The population
subpopulations.
but was limited
I
pattern
associated
allowing
pattern
crossing
were not maintained
with the inversion
the alleles
patterns.
(for the difficulties
Instead
pattern
cross-over
inversion
pattern
pattern.
More detail
The mutation
in the following
of the better
will
routine
ways.
between strings
with
that of
having the same inversion
in this
strategy
see
was allowed between arbitrary
of the better
string
string
is in fact
strings
of the pair determining
In essence, the heuristic
to be crossed-over.
some convention
One possibility,
over only between strings
was rejected
(Bagley (1557)).
10
inversion
size was determined dyn-
Because separate subpopulations
had to be adopted in order to achieve crossover
different
of the popula-
is that the
the better
inversion
be given in a moment.
differed
from the previous
mutation
routines
A parameter ml (determined by initialization)
was defined
as the number of strings
began with m strings
(m assumed notless
which had the highest
were chosen.
to be mutated.
associated
These ml strings
than ml),
function
Suppose the program
then the ml strings
values among the initial
were copied.
Each of the ml copies was
mutated using a method chosen randomly with the probability
mining the frequency of selection
of any given mutation
methods were the same as l),
3) and 2') of Version
Z),
was not implemented in the Version
the utility
mutation
vector was updated and the history
The adaptation
routine
III
of
routine
Version
II
of the history
vector).
was introduced
to evaluate
was essentially
(allowing
deter-
The mutation
Method 5)
(As
before,
was maintained.)
the same as the adaptation
The major difference
in the structure
was that a weighting
method effectiveness
so that
difference
value than a method not weighted so heavily
of the probabilities
method.
routine.
vector
vector
II.
for the differences
method had to produce a higher percentage
m strings
a heavily
scheme
weighted
in the best function
in order to have the ratio
of these two methods remain the same. These weights
were initialized.
The cross-over
initialized
routine
parameter indicating
would operate on.
Z-ml strings
of the string
points
other string.
Let m2 be the
the number of strings
than or equal to m2’
were chosen.
present and pairing
the pivot
as follows.
Z-ml (the number of strings
was assumed greater
values)
was altered
leaving
with the higher
Then the alleles
function
routine)
among the
by copying the strings
(coordinate
value between and including
were exchanged with the corresponding
Equivalently
the mutation
The best m2 strings
Cross-over initiated
the copies randomly.
which the routine
the normal cross-over
alleles
operation
of the
is performed
11
except that the inversion
of the better
string
pattern
of the worse string
before the exchange is begun.
one of the daughters receives.the
other daughter inheriting
worse string's
the better
string's
is replaced by that
After
the exchange
inversion
pattern).
pattern
(the
For example, if
ala2a3a4a5 with pattern 12345 and blb2b3b4b5 with pattern 54321 are to
be crossed over, first
create b5b4b3b2bl with pattern 12345 and do the
cross-over
as usual.
With pivot
alb4b3b2a5 and b a a a b .
52341
the other gets 54321.
points
One of these is given pattern
The number of successive
cross-overs
but was determined by an initialized
constraint
40.
2 and 4 for example, we obtain
was not held at one (as before),
maximum bound i subject
that the process was to be stopped if
(Note that
the population
12345 while
to the
the population
doubles at each successive
size reached
cross-over
and 2' = 32 so i < 5.)
The inversion
entering
greater
population
than ml/Z)
the inversion
points
were chosen.
(the least
Each such string
was copied and
(Production
was halted
when ml strings
were produced.)
routine
the same as Version
some of the original
III
ml strings
except that
in the
were mutated as well.
parameter mi < ml determined that mi randomly chosen
from the original
ml strings
not including
the best were to be
mutated in the same manner as the ml copies already produced.
12
integer
of the copy was determined by randomly chosen pivot
IV was exactly
Thus an initialized
strings
strings
Assuming the
IV
Version
mutation
always produced ml strings.
size exceeded ml the best 'ml/Z'
pattern
as before.
Version
routine
III.
Test Functions
The following
1.
2.
3.
4.
Spherical
functions
we used as test
functions
to be optimized:
Contours
fl(X)
= “co x2.
is1 '
f2(x)
=
Index
40
C ix:
i=l
Index squared
f3(x)
= %Oi'xf
i=l
f4(x)
= 100(x2-x1)+(1-x1)
wood
22
22
+ 90(x4-x3)
+(1-x3)
2
2
+ lO.l((x,-1)2+(x4-1)2)
+19.8(x2-1)(x4-1)
5.
VaZleys
fg(x)
6.
=
Z i2(x5+i-xi)2+ixij
i=l
Repeated Peaks
f6 (‘)
= (4i~lxi
for
(‘-‘i))
x.1 > 0
~+~ xs] -x5)(“5-[
i = 1,2,3,4
‘51) ’ xs1
( ~x5~+1)
and x 5z1
= 0 otherwise4
Functions
We invented
4[x]
1 through 4 are standard in the direct
5 and 6 to test
is the integer
our hypotheses
part of x,'
e.g.,
[1.5]
concerning
search literature.
algorithm
behavior.
= J.
13
NOTE:
Functions
is replaced
14
l-5
by -f(x)
are to be minimized so that
and the standard maximization
in the program f(x)
formal
is satisfied.
IV. Comparison of Genetic and Classical
As stated
before,
performance of genetic
one of our primary objectives
and classical
could utilize
the local
well to compete favorably
extraction
structure
was to compare the
methods in the realm of numerical
We hoped to ascertain
optimization.
Methods
in this
of analytic
with classical
way whether genetic
functions
methods which employ gradient
routines.
to date with the Fletcher-Reeves
are strictly
method.
speaking only relevant
the means of comparison.
our conclusions
to the particular
Version II called
methods compared and
were selected
consisted
of running Version
FRl had the Version
IV against
with
that
a control
is the only mutation
II structure
routine
now had the form: apply the Fletcher-Reeves
interval
q = n (where n is the number of variables
method.
except that the mutation
method with reset
in the function)
to
in each subpopulation.
IV and FRl were applied
5 with the same initial
The resuZts
function
time the mutation
obtained
in mind we have reason to believe
FRl in which Fletcher-Reeves
More specifically,
Version
constructed
may have general validity.
The experiment
the best string
routines
Of course the results
However since the latter
role of class representatives
each test
methods
sufficiently
Our approach was to compare the best of our genetic
their
function
to each of the test
2 through
set of points.
are shown in Tables 1 and 2.
the number of function
routine
functions
is executed.
In Table 1, we record for
evaluations
taken by FRl each
In comparison the number of
15
TABLE 1
Test Function
Number of FRl
Function Evaluations
per generation
actual
divided
Number of Version IV
Function evaluations needed
to achieve same change in
function value
Corresponding change
(order of magnitude)
by 4
2.
Index
17,616
4,404
55,800
11,475
90
3.
Index Square
17,616
4,404
19,800
4,400
5,500
3,200
3,500
3,500
4.
Wood
5.
Val 1eys
656
164
45
45
45
45
45
45
270
90
90
90
3780
3240
0
0
0
40,770
18,900
2,100
525
1,566
5,220
1,392
3,132
15
1.46
.12
.8
.3
.189
.292
,385
.238
.34
.28
.525
.315
.583
,479
.00001
.006
.o
1.669
.643
1.2
.71
.88
1.3
-
TABLE 2a
Test Function
Function
value attained
Number of function
evaluations
actua
1.
Spherical
2.
Index
3.
Contours
1o-3g
'Rl
divided
by 4
T Version
52,800
110‘
52,848
13,424
67,725
Index Squared
-1.0 x lo-l5
-2.0 x 10-10
.05,696
26,421
40,000
4.
Wood
-1.46 x 1O-6
11,152
2,790
68,000
5.
Valleys
-3.4 x 10-9
8,400
2,100
11,310
6.
Repeated Peaks
11.999
m
IV
03
5,070
*The figure given in the number of function evaluations required by our
optimum gradient method (Fletcher-Reeves with q = 1) which converged
in one iteration.
17
TABLE 2b
Function
value attained
Test Function
Number of Function
evaluations required by
Version IV after FRl hung up
2.
Index
1.6 x lo-l9
86,175
3.
Index Squared
1 x 1o-22
94,140
4.
Wood
1 x lo-l4
5.
Valleys
2.4 x 10-13
200,000
93,544
18
-..-
_._,.-_.----.,_-_-_--,_,.--
---
--,m.
,-..,.,.-1.1
l,~..~..~--,,,l..,.-,~,-,
I._..
,,-~-1,.,,.,,
,.,11.--1-m..111mm
mm...,,.,
1.1..-1-11m1
1.1mm.m1.1.11’
function
evaluations
function
value is indicated
The function
taken by Version
IV to achieve the same change in
(along with the change in value achieved).
value attributed
to a population
In Table 2awe record the total
number of function
by the methods to reach the indicated
In these tables
evaluations
number divided
on the number of function
by 4.
evaluations
taken
evaluations
number of FRl function
The latter
is a lower bound
were the classical
our method in its nonHUform)
in the initial
best string.
level.
we have given both the actual
and this
method (i.e.,
is that of its
Fletcher-Reeves
to be applied
to the best point
population.
It may have become apparent to the reader that we face the difficulty
here of comparing the parallel
conjugate
gradient
of initial
points.
methods.
Our genetic
The Fletcher-Reeves
have observed that
variable
operating
algorithms
is locally
iterations
taken since the last
quadratic
for meaningful
themselves more to this
comparison.
form of analysis
little
context.
to be used: the maximum rate of convergence?
a method fails
rapidly
from others?
We
(for
example
and the number of
re-initialization.
type of method in the present
What if
with a number
may be quite
search region
or near a sharp ridge)
some kind of aggregate behavior
space is required
must start
method begins at one point.
depending on the nature of the current
Clearly
methods with the sequential
the rate of convergence of Fletcher-Reeves
whether it
either
genetic
of a method over the search
While parallel
methods lend
is known analytically
for
Then too which aggregate is
the average?
to converge from some starting
points
the minimum?
but converges
19
As already
indicated,
method in a Version
over: this
II genetic
is equivalent
reach a given function
was to embed the Fletcher-Reeves
program.
to applying
of the 4 subpopulations,
by Fletcher-Reeves
our decision
If we ignore the effects
Fletcher-Reeves
the number of function
value level
applied
level
by four"
first
to the point
iability
we would need only l/4
efficiency.
evaluations
which reaches this
points
level
an "optimistic"
This optimism will
best is unimportant)
be well
and inappropriate
fact high in which case the "pessimistic"
Our results
indicate
that
if
first.
reach
estimate
of
founded if the varpoint
the variability
is
is in
upper bound is justified.
except for the behavior
Wood and Repeated Peaks function
contours,
to
Thus the "divided
of convergence is low (so that knowing which starting
ultimately
in each
required
would actually
of the total.
columns of Tables 1 and 2a represent
Fletcher-Reeves
to the best point
being then four times the number required
knowing before hand which of the four initial
this
of cross-
on the spherical
there is not a vast difference
in convergence rates.
The behavior
on the spherical
IV's lack of gradient
(or just
extraction
optimum gradient)
direction
directly
functions
On this
facilities.
can follow
a one-dimension
points
out Version
function,
Fletcher-Reeves
search in the gradient
to the optimum.
The Wood function
good ridge
contours
follower.
results
(Its
indicated
initial
that Version
progress
IV is not a very
is comparable to FRl but it
seems to get hung up in mid course though its
mutation
facilitates
enable
it to make a recovery).
Repeated Peaks is a multiple
the abilities
in the fact
on any local
climbing
that FRl hangs up on the local
5Actually our observations
the FRl context.
20
hill
peak function
indicate
that
and thus should be beyond
method.
This is substantiated
peak on which it
crossover
has little
is initiated.
effect
in
Of course for the comparison here to be truly
level
meaningful
should be superposed above the Fletcher-Reeves
The conclusion
that convergence rates
local
(1968) and Hollstien
show that on functions
at the same value.
for his genetic
algorithms
gradient
methods with fixed
Rastrigin
over the random directional
are of the fixed
The efficiency
exceed that of the fixed
mation to this
algorithms
methods.
step
method with
convergence
methods.
outperform
methods referred
of the latter
conjugate
is a good representative6)
step gradient.7
the
to by
gradient
gradient
is known
Thus the question
remains
methods compare with the conjugate
methods and hence how Hollstien's
the gradient
two references
step size type and not of the conjugate
open as to how the random direction
gradient
genetic
(1963),
step size.
methods (of which Fletcher-Reeves
a
The first
claims superior
here to note that the gradient
class which we considered.
to
of Rastrigin
than a gradient
Hollstien
then that Hollstien's
It is crucial
(1971).
more efficient
step size fixed
It follows
search.
of type 1 through 3 a random directional
method can be significantly
search
are comparable on Functions
2 through 5 should be discussed in view of some results
Schumer and Steiglitz
a global
Our present results
genetic
methods compare with
thus add essentially
new infor-
comparison.
61n a comparison of 7 conjugate gradient methods including the well-known
ones, Pearson's (1969) results show that in terms of the number of
one-dimensional searches, Fletcher-Reeves is superior to all others
(except Newton Raphson) when operated in the reset mode (as it is here)
on the Rosenbrock and Wood functions.
Thus we chose Fletcher-Reeves since
it is both more simple and efficient
on the "well-behaved" functions we
considered.
considered by Pearson the situation
(On the "penalty functions"
is drastically
reversed with Pearson's method #3 coming out well on top.)
7This can be seen in any of the texts referred to in the literature
survey
and is essentially
due to the use of one-dimensional rather than the
much more costly n-dimensional searches.
21
Actually,
n2
C xi
i=l
Schumer compares his method with Newton Raphson on
(our function
1) and ? x4 and finds
i=l
n < 78 and superior
on the latter
of the number of function
increases
linearly
inferior
evaluations
and it
because second partial
As we have indicated
the function
increase
appears that Schumer's method
quadratically
derivatives
evaluations
only linearly
on the former for
The comparison is in terms
for n > 2.
while Newton Raphson increases
regard (essentially
Fletcher-Reeves
it
in this
must be estimated).
per iteration
required
in dimension and on the
by
: x2. and
i=l l
n4
C xi functions it should far surpass Newton Raphson in this measure.
i=l
In fact, Table 2a shows that our Fletcher-Reeves requires only 110
samples compared to the 330 required
by Schumer's method and the 1500 required
by Newton Raphsonl (Data taken from Schumer's Figure 4).
classical
Fletcher-Reeves
on Spherical
Contours for any finite
It is interesting
as a ridge
should be uniformly
follower
also that
as indicated
better
Thus the
than Schumer's method
dimension.
Schumer's method proved not very effective
by its
inferior
performance on Rosenbrock's
function.
It should be noted that Version
value levels
behavior
than was FRl.
starting
This is shown in Table 2b which gives Version
from the levels
are those for which FRl's progress
our Fletcher-Reeves
IV'was able to reach much lower function
indicated
terminated.
in Table 2a.
The latter
IV's
levels
(This may be an artifact
of
realization.)
when it is considered that
lActually
the difference
is-53 en more striking
Fletcher-Reeves reached 10
from point (2,2,. ..,2),
while Schumer's data
are for the level 10-8 starting
from (l,l,...,l).
Note that Version IV's
performance fell in between the Schumer and Newton Raphson methods.
22
V. Evolution
from Version
5.1 Mutation
and Second Level Adaptation
Initially,
I to Version
only mutation
random selection
off
false
of coordinates
ridges
adding a second level
(loci)
and values
(Version
As a run progresses
For this
II)
smaller
in function
value over the period
halved assuming that
of the
vectors
of a random mutation
of a better
mutation.
and added a program to
the standard deviation
mutation.
Table 3a
must be changed less in order
parameters in a Bayesian approximation.
for a larger
on the
Our analysis
routine.
reason, the standard deviation
end we implemented some history
and like-wise
that
is as follows.
the best alleles
mutation had worked best,
toward
the system move
improved performance.
must decrease in order to improve the probability
adapt the mutation
Methods
to modify the biasing
of the adaptation
reasons for the improvement obtained
To this
(alleles).
Later we discovered
1965).
considerably
the effectiveness
a uniform
in order to bias the distribution
(Wilde,
routine
basis of past experience
to improve.
It involved
This improved convergence by helping
resolution
indicates
Routines
method 1) was used.
3), 2') and 5) were introduced
small changes.
IV
Thus, if
a
was decreased
If there has been no improvement
of history
the standard deviation
it had been too large.
was
When the parameters became
too small for the accuracy of the machine, they were reset
to maximal
values.
It was apparent that the kind of mutation
one point
For this
in a run was sometimes different
reason more history
ent mutation
which worked best at
for a different
was kept and the probabilities
methods were changed.
part of the run.
of the differ-
This seems to work but does not usually
give marked improvement in the performance of the system.
23
Non-uniform
adaptation
distributions
was applied
worked better
than uniform
ones when no
since there was a higher probability
of small
uniform works best under certain
conditions
change.
With the adaptation,
since the probability
tation
of making the right
can progress
more likely
faster.
Under different
to put the adaptation
where change is quite
for a long period
This can happen in our present
Resetting
below a preset
limit.
below the preset
in the function
state
the simpler
better
but is still
mutation
of the adaptive
flexibility
(i.e.,
(adaptation)
routine.
We were interested
limit
in "quasi-stable"
routine
shown.
is insufficient
routine
5) was indeed
of 5) was redundant in view
variability)
to which gradjent
algorithm
that
(5) against
However, used with the additional
controllable
in the extent
the genetic
mutation
Table 3b indicates
the extra variability
structure.
introduced
information
could
Employing Fletcher-
does not give much of an answer to this
since on the test functions
vergence to such an extent
24
result
variable
(2).
routine
question
will
changes
change in the parameter setting.
adaptation
Reeves as a mutation
random or
in which the parameter is not
fed back to the adaptive
the more complicated
into
no
value.
too small to cause significant
than 2) on the two functions
be encorporated
of time at a suboptimal
values of mutated points
quadratic
by the latter
state
of the parameter occurs only when it has passed
since the information
We tested
is
in which the adjusted
system since we include
Thus a situation
level
to cause a directed
the uniform
parameter in a "quasi-stable"
state we mean a situation
parameter is maintained
reset.
conditions
and adap-
smooth but too slow to be useful.
By a "quasi-stable"
regular
size change is higher
employed it
that the essential
tends to speed up congenetic
elements
(cross-over
TABLE3a
Number of function
(All
evaluations
by Version
I V.S. Version
parameters are set to the same values except that Verstion
a second level
Function
adaptation
II.
II employs
routine).
Value
Attained
Spherical
contours
-2.045E+l*
-5.28
-3.78
-2.97
-2.48
-1.80
-1.36
-1.26
-1.00
- .753
- .472
- .268
- .218
- .217
*aEb is Fortran
for a x lob
1.
required
Version I
90
900
1350
2250
2700
4400
5500
5840
10,750
13,400
14,400
17,820
28,600
>38,300
Version
II
90
1050
1175
1350
i440
1440
1525
1620
1700
1890
2150
2700
2790
2790
25
TABLE 3b
The number of generations
mutation
(2) V.S. variable
Function
26
limit
Value
Attained
2.
Index
-700
-400
-300
-200
-100
- 80
- 60
- 40
- 20
4.
Wood
- 15.0
- 10.0
- 9.0
- 4.0
- 2.0
- 1.0
.5
.l
.Ol
required
by Version
mutation
II using quadratic
(5).
(21
(5)
10
15
36
75
190
260
310
550
>4200
20
50
70
110
170
190
220
260
470
7
9
10
12
15
46
60
>700
>700
10
20
40
60
70
80
170
370
TABLE 3c
The number of generations
mutation
quadratic
3.
required
(1) V.S. a mixed strategy
Value
Attained
Index
squared
-100
- 10
:
-
6"
5
II using a pure gradient
using 1) with probability
random (3) with probability
Function
by Version
l/4.*
1
9
34
43
55
64
l----
*Note that 1) also uses more function
than does 3/4(1)+1/4(3).
3/4 and
evaluations
3/4(1)+1/4(3)
10
34
35
41
45
per generation
27
TABLE 3d
The number of generations
28
required
by version
I with "best saved"
strategy
versus "best not saved."
Function
Value Attained
Best Saved
Best not Saved
3. Index Squared
-.198E5
10
10
-.llE5
20
30
-.7E4
40
50
-.5E4
50
90
-.4E4
70
120
-.3E4
90
160
-:2E4
110
310
-.lE4
190
'4700
and inversion)
mutation
better
do not play much of a role.
Fletcher-Reeves
(i.e.,
in conjunction
subpopulation
for the worse.
from the best string,
than alone (see Table 3~).
often
changed the best string
When we saved the best string
the worst string
with the mutation
the performance was increased
the Version
I kind of cross-over
several
can cause good alleles
and still
make the string
best with best cross-over
(coordinate
values)
does not use all
its
crossing
over the best strings
(see the description
performance considerably
of Version
test
were used.
attained
of crossover
(since none can be introduced
and inversion
of those available.
5, where the alleles
best of those initially
We also tested
This improved
and inversion
where no
Here one expects that the ultimate
is governed by the alleles
is whether crossover
alleles
II).
crossed over
as shown in Table 4.
We examined the effectiveness
routines
concept of
with randomly selected
so that only part of the time are the best strings
with each other
bad,
For this
potential.
we tried
population
(Table 3d).
because
bad by making some alleles
crossover
value level
fold.
to appear in
reason and other reasons based on our theoretical
mutation
in each
in each subpopulation.
intuitive
strings
in
results
best-with-best
only occurred between the best strings
Since mutation
a string
it worked
and Crossover
We called
cross-over
the mutation
by replacing
5.2 Inversion
with q = 1) we found that
with random mutation
It was apparent that
a given population
However when we used gradient
That this
in the final
present
by the mutation)
in the initial
so the real
can operate to select
is possible
population
function
is indicated
the best
in Table
are no worse than the second
available.
the effectiveness
of crossover
in bringing
together
29
TABLE 4
The number of generations
crossing
over best with best
Function
1.
Spherical
contours
3. .
Index
squared
4.
Wood
function
*Version
30
Function
value attained
-500
-400
-300
-200
-100
-10,000
- 8,000
- 6,000
- 4,000
- 2,000
-15
-10
- 9
-4
-2
I using mutation
required
to reachindicated
level
(BB) V.S. best with random (BR).*
BB
BR
10
19
48
190
>400
6
15
36
75
190
16
30
50
108
>2700
16
22
34
75
200
2
9
21
32
>600
7
9
10
12
15
2) (uniform
random)
by
TABLE 5
The effectiveness
of Verison
IV with no mutation
and 1 crossover
per generation.
Function
2.
Index
The two smallest* values of
alleles in 1st four co-ordinates available in initial
population
After generation 12 only
one string remained:
1
2
1
2
3
4
. 3835
.0488
-.6048
.0488
.1774
.1181
I
-.6048
*Clearly,
.0976
for the index function,
the smallest
are the best.
31
"good" alleles
crossover
Version
in another way.
routine
was unable to achieve the ultimate
IV using 2 crossovers
The effectiveness
5.3
Table 6 shows that Version
The motivation
tested
that
the version
III
the nwnber of subpopulations
with the result
only two.inversion
Thus on a function
are linked,
would not be tested
of the subpopulations
without
were reduced to say five
subpopulations
Thus it
inversion
pattern.
effectiveness
were maintained
patterns.
rapidly
appeared that
sized
primarily
Comparing of subpopulations
of inversion
some of whose variables
enough to
or six strings
to achieve the smaller
But subpopulations
patterns
On the other hand, if
of cross-over.
over would have been less effect.
evalua-
However, were we to reduce
to two or three,
patterns
was being carried
that more function
would be compared in general.
improve the effectiveness
(Table 7).
system was as follows:
a lot of "excess baggageIf
were being used than was necessary.
inversion
performance of
II and III
along in the four subpopulations
tions
was similarly
for constructing
It seemed probable
a
per generation.
of inversion
Comparison of Versions
IV without
the size
each, cross-
one must do
(total)
population.
to preserve
a single
was thus a test
How can this
of the
be achieved without
subpopulations?
Doing away with subpopulations
strings
will
terns
pattern
means that
Then there are n!/2
any inversion
for functions
%.e., redundant information
32
What
having
may be crossed over.Suppose the function
since any permutation
but turning
the question:
be crossed over and how? Suppose that only strings
the same inversion
n variables.
also forces
essentially
of the variables
pattern
different
inversion
is an inversion
end for end preserves
has
pat-
pattern,
clumpings.
with more than three or four variables
This
cross-over
TABLE 6
Number of function
crossover
(all
evaluations
required
by Version
IV with and without
other parameters fixed).
Function
Value Attained
Without Crossover
With Crossover*
2. Index
-6.03E2
75
225
-3.4E2
750
450
-1.65B2
1500
1125
-6.87El
2250
2475
-2.05El
3000
4500
-8.07EO
4500
6750
-3.19Eo
6000
8100
-.9.99El
7500
10,080
-5.82E-1
9000
10,800
-3.59E-1
10,500
11,210
-2.68~-1
12,000
11,700
-l.BOE-1
15,000
12,150
-7.7E-2
22,500
13,300
-4.9E-2
30,000
13,300
*using 2 consecutive
crossovers
33
TABLE 7
Number of
inversion
5.
34
(all
generations
required
by Version
I with and without
other parameters fixed).
Function
Value Attained
No Inversion
Valleys
6.7
4.5
4.0
3.0
2.2
2.0
1.5
1.3
1.0
.9
.7
.6
10
30
40
60
90
150
170
260
380
440
520
570
Inversion
10
30
30
50
60
70
100
140
150
340
360
360
would take place very seldom in a set of strings
inversion
Therefore
patterns.
As already
employed.
different
inversion
applying
the pivot
a more general kind of cross-over
indicated,
patterns
points
we tried
by picking
inversion
pattern.
variables.
Although this
its
from the first,
to be about half
uncertain
cross-over
alleles".
pattern
value
alleles
with only a
It
asswnes
at finding
clumps the right
is only slightly
consequences are more difficult
obtained
function
vaZue usuaZZy has the better
type of cross-over
as effective
The results
function
that the inversion
That is,
as before but
are in the string.
the "corresponding
with the better
with
with the corresponding
allows unrestricted
computing cost to find
that the string
points
with the better
involved
must be
over two strings
no matter where those alleles
This kind of cross-over
slight
crossing
two pivot
to the string
and simply exchanging the alleles
of the worse string
which have individual
III
It seems
to predict.
the best inversion
from the version
different
pattern.
and IV systems are often
because they have more than a dozen parameters.
The purpose
of having open so many parameters was that we wished to be able to test
hypotheses which we had formulated
the version
II system.
dependent.
That is,
one parameters,
has a strong
(that
effect
found the optimal
as a result
of our experience
These parameters have proved to be quite
we find
that
for any reasonable
one being arbitrary),
on the efficiency
of the system.
amount.
parameters which yielded
but
parameter
However, having
fixed,
changes the optimal
An important
be to chart the interrelations
However we were able to show that
inter-
of all
the single
value for that parameter with the others
value for the.one parameter by a.large
will
setting
varying
changing some of the other parameters frequently
for the future
with
involved.
there were settings
performance much superior
project
of Version
to Version
II
III
(Table 8)
35
TABLE 8
The number of function
value by Version
Function
3.
36
Index
Squared
II V.S.
Value
Attained
8000
4700
3200
2200
950
700
500
200
80
40
10
7
evaluations
Version
required
to reach indicated
III.
Version
630
1,260
2,520
2,835
4,400
5,350
7,550
13,200
18,300
21,400
34,300
35,900
II
Version
180
440
630
1,080
1,710
1,800
2,250
3,150
3,780
4,590
6,930
9,000
III
function
thus justifying
VI.
the change in system structure.
Conclusions
If the reader finds
the results
himself
of our work to date,
in the same position.
are sufficiently
We have constructed
interesting,
of theoretical
the dark experimentation.
Moreover,
algorithms,
effectiveness
a significant
Clearly
which
but which at the same
guidance we are reduced to stab in
since a single
optimization
in obtaining
however, we have obtained
the comparative behavior
of genetic
run
data is
of various
the crossover
subcomponents of the genetic
and inversion
suggestive
and conjugate
and we have also come to some conclusions
We have for example demonstrated optimization
individually
to be
as our enthusiasm demands.
Within these cmstraints
concerning
ourselves
study and classification?
takes hours to complete our rate of progress
not as fast
statement of
a class of algorithms
amenable to analytical
the benefit
a clear
let us assure you that we feel
complex to be highly
time are not readily
Thus, without
unable to formulate
results
gradient
concerning
the
algorithms.
problems where encorporating
operators
actually
does achieve
improvement in the rate of convergence (Tables 5,6,7).
much remains to be done in confirming
or disconfirming
these con-
clusions.
9 We look forward to a forthcoming book by J.H. Holland on adaptive systems
for possible help in this direction.
Also some preliminary
analysis will
appear in our Report (Foo and Bosworth, 1972).
37
APPENDIX
The following
is a more mathematical
Let
the function
description
of the adaptation
used.
highest
fi
denote
function
value)
adaptation.
Let
the mutation
which
of the version
f'
just
denote
before
just
integer,
system.
Let i.
fi-fi
1
di=
f
i-l
i.
in best
function
for
for
version
between
To avoid
mutation
this
we will
are used and an exact
from the program
used in the ith
to f'.
for
generations
generation
10 The theoretical
consider
Let
1 and ten.
string
just
before
In the case
i has values
in the case of the version
i-l
and i.
between
II
This
requires
the version
II
ai denote
double
and its
subscripting.
The same methods
I system
may be gained
of all
adaptation.
of the ai over
string
system.
the average
the last
difference
Let wi = di(i+l)/2.
own best
of the version
following
average
the last
adaptation.
each has its
understanding
listings.
following
the
II
each generation.
only
one with
and ten for version
I.
Let
f1-f'
and dl = f'
. di is the percentage
In the case of subpopulations,
own average
between
(the
of the best
the last
0'
string
mutation
value
before
say i
i=2,...,io
value
the ith
i has values
one and an initialized
denote
of the best
the function
occurred
I system
value
an infinite
mutations
Let
a' ccrres>ond
number of trials
loWhen a string
is mutated using one of the methods which use the adaptation
parameter k?, a random number of coordinates
are chosen to be mutated,
sayth
mutation
is used an r. is chosen for the j
If the "cubic"
i (1 2 i < n).
3 The absolute
values,
coordinate
if it is to be mutated where r. E [-l,l].
thJ average being say bk where
of these i numbers are averaged,
Ir.1,
to be mutated that generation.
If
thJ string
mutated was the kth string
are chosen for the jth
mutation
is used an r . and an r
the "quadratic"
2*i numbers
E
coordinate
if it is to be mutated whe% r
string
mutated
the average be&
b
Let ai denote the average o!2 all the bk in the ith
generation.
38
is
.5 since
between
the ri
and r.. have absolute
values uniformly
distributed
Jl
Let a* and a* denote respectively
the maximum and minimum
0 and 1.
of the a i's
including
a' excluding
ai
.
0
If
1'
is greater
L is replaced
$4,
new t was less
initialized
than 4-e.
by i-1.
The $ limit
above is a Bayesian
This
from the probability
method.
applied
before
the last
each generation,
p.m.
p’
based on the assumption
vector
on generations
vector).
If
to the ai
over
is less
by C'.
an infinite
the
& was
that
The
the amount
to the effect
one is negligible.
the probability
consideration
vector.
(found
p is 0 the method was not used so go to
(from
(like
If
obviously.
the next
method under
than
to the value
pertaining
following
is made when adapting
of the mutation
adaptation
l'
k? is replaced
Let ki be the number of strings
correspond
If
.4?was reset
in the history
of mutation
same assumption
by $1.
in the change of 1 is arbitrary
stored
Let p be the probability
the next
constant,
approximation
information
of a generation
$L I &' I $l,
than an initialized
to.
of usable
If
1 is replaced
adapting
to which
e).
a' and f').
the method was
Let k' be the number just
If
m strings
number of trials
the ki
were mutated
should
average
Thus let
@:wi+Cki)+
I
(I
iO
c wi
i=l
-ial]
_ p.m
. p,1o
I
+ p
)
39
p could be changed by no more than one tenth in the same manner as .& could
be changed by no more than one half.
The probabilities
size.
in the vector
The "probabilities"
You can easily
in the probability
see that this
vector
has no effect
40
vector
set to p'.
as to numerical
were not normalized.
on the above computations
derived
from the probability
since
vector.
on the vector were programmed so that no value in the probability
could be greater
or version
p was normally
had upper and lower limits
the p used there is a true probability
The limits
Therefore
II
than 20.0 or less than 0.5 or 0.1 in the version
systems respectively.
I
VII.
References
Athans, M. and Falb, P.L. (1966) "Optimal Control:
Theory and Its Applications"
McGraw-Hill.
An Introduction
to the
Bagley, J.D. (1967) "The Behavior of Adaptive Systems Which Employ Genetic
Department of Computer
and Correlation
Algorithms"
Doctoral Thesis.
and Communication Sciences, The University
of Michigan.
Bellman, R. (1959) Adaptive
University
Press.
Control
Processes: A Guided Tour, Princeton
Bremermann, H.J.; Rogson, M.; Salaff, S. (1966) "Global Properties
in Natural Automata and Useful SimuZations.
Evolution Processes"
Spartan Books.
of
of Constrained
Brioschi,
F. and Locatelli,
A.F. (1967) "Extremization
Multivariable
Function: Structural
Programming" IEEE Trans. on Sys.
Sci. and cy. ssc-3, 2.
Cavicchio, D.J. (1970) "Adaptive Search Using Simulated Evolution"
Department of Computer and Communication Sciences,
Doctoral Thesis.
The University
of Michigan.
Cohen, A.I. (1971) "Rate of Convergence of Several Conjugate Gradient
Fifth Annual Princeton Conference on Information Science
Algorithms"
and Systems.
Cragg, E.E. and Levy, A.V. (1969) "Study on a Supermemory Gradient Method
Journal of Optimization:
Theory and
for the Minimization
of Functions"
Application,
4, 3.
Davidon, W.C. (1966) Variable Metric Method for Minimization.
Argonne Nat. Lab. ANL-5990 (Rev. 2).
Feld'baum,
A.A. (1966)
Optimal Control
Systems.
Fletcher, R. and Reeves, C.M. (1964) "Function
The Computer J., 7, pp. 149-154.
Gradients"
Academic Press.
Minimization
by Conjugate
Fletcher, R. and Powell, M.J.D. (1963) "A Rapidly Convenient Descent
Method for Minimization"
The Computer J., 6, pp. 163-168.
Flood, M.M. and Leon, A. (1963) "A Direct Search Code for the Estimation
of Parameters in Stochastic Learning Models" Mental Health Research
Institute,
The University
of Michigan, Preprint 109.
Fogel, L.J.; Owens, A.J.; Walsh, M.J. (1966) ArtificiaZ
SimuZated Evohtion.
John Wiley and Sons, Inc.
Foo, N. and Bosworth, J.L. (1972)
Aspects of Genetic Operators."
"Algebraic,
Geometric,
NASA CR-2099, 1972.
Intelligence
Through
and Stochastic
41
Hall, C.D. and Ratz, H.C. (1967) "The Automatic Design of Fractional
Factorial
Experiments for Adaptive Process Optimization"
Information
and ControZ, 11, pp. 505-527.
Hartmanis, J. and Stearns,
Information Sciences.
R.E. (1969)
"Computational
Complexity"
Hill,
J.D. (1969) "A Search Technique for Multimodal
IEEE Trans. on Sys. 5%. and Cy., SSC-3, January.
Holland, J.H. (1969) "A New Kind of Turnpike
the American Math. Sot., 75, 6.
Surfaces"
Theorem"
of
BuZZetin
Hollstien,
R.B. (1971) "Artificial
Genetic Adaptation in Computer Control
Department of Computer Information and
Systems" Doctoral Thesis.
Control Engineering, The University
of Michigan.
Kalman, R.E.; Falb, P.L.; Arbib,
Theory. McGraw-Hill Book Co.
Lasdon, L.S.
(1971)
Optimization
M.A. (1969)
Topics in Mathematical
Theory for Large Systems.
System
MacMillan.
Bibliography
on Optimization"
In Recent
Leon, A. (1965a) "A Classified
Advances in Optimization
Techniques.
Lavi, A. and Vogl, T.P., eds.
John Wiley and Sons.
-,
(1965bl "A Comparison Among Eight Known Optimization Procedures"
Lavi, A. and Vogl, T-P.,
In Recent Advances in Optimization
Techniques.
eds. John Wiley and Sons.
Luenberger, D.G. (1969)
John Wiley and Sons.
Mayr, E. (1965)
Cambridge.
Optimization
Animal Species oxd EvbZution.
Miele, A. and Cantrell,
for The Minimization
AppZication,
3, 6.
J.W. (1969)
of Functions"
Mishkin, E. and Braun, L. (1961)
Book Co.
Pearson, J.D. (1968)
Trans. on Sys. Sci.
-,
(1969)
"Variable
Harvard University
Metric
Press,
"Study on a Memory Gradient Method
JournaZ of Optimization
Theory oxd
Adaptive
ControZ Systems.
"Decomposition in Multivariable
mtd Cy., 1. SSC-4.
Polak, E. (1971) Computational
Academic Press.
42
by Vector Space Methods.
Methods"
Systems"
The Computer JournaZ,
Methods in Optimization
McGraw-Hill
IEEE
72, 2.
- A Unified
Approach.
Rastrigin,
L.A. (1963) "The Convergence of the Random Search Method in
the Extremal Control of a Many Parameter System" Atiomtion
and Remote
Co&&.,
24, pp. 1337-1342.
Rosenberg, R. (1967) tlSimulation of Genetic Populations with Biochemical
Properties"
Doctoral Thesis, Department of Computer and Communication
Sciences, The University
of Michigan.
Rosenbrock, H.H. (1960) "Automatic Method for Finding the Greatest or
Least Value of a Function"
The Computer Journal, 3, pp. 175-184.
Schumer, M.A. and Steiglitz,
K. (1968) "Adaptive
IEEE Trans. on Aut. Control, AC-13, 3.
Step Size Random Search"
Shekel, J. (1971) "Test Functions for Multimodal
Fifth Annual Princeton Conference on Information
Search Techniques"
Science and Systems.
Spang, H.A. (1962) "A Review of Minimization
Functions"
SIAM Review, 4, pp. 343-365.
Sworder, D. (1966)
Optimal Adaptive
Wilde,
Optimum Seeking Methods.
D.J.
(1964)
Control
Techniques for Non-Linear
Systems.
Academic Press.
Prentice-Hall.
Wood, C.F. (1964) "Review of Design Optimization Techniques"
Westinghouse Research Laboratories,
Science Paper 64-SC4-361-Pl.
Zeigler, B.P. (1969a) "On the Feedback Complexity of Automata" Doctoral
Thesis, Department of Computer and Communication Sciences, The
University
of Michigan.
-9
U969bl "On the Feedback Complexity of Automata" Proceedings of
The Third Annuul Princeton Conference on Information Science and
Systems.
NASA-Lan@ey,
1972 -
19
CR-2093
43