Evolutionary Algorithms and
Dynamic Optimization Problems
Dissertation approved by the Faculty of Computer Science, Electrical Engineering,
and Information Technology of the Universität Stuttgart for the degree of
Doktor der Naturwissenschaften (Dr. rer. nat.)

Submitted by Karsten Weicker from Ludwigsburg

Main referee: Prof. Dr. V. Claus
Co-referees: Prof. Dr. H. Schmeck, Prof. Dr. K. De Jong

Date of the oral examination: 24.02.2003

Institut für Formale Methoden der Informatik, Universität Stuttgart, 2003
Summary
This thesis examines evolutionary algorithms, a universal optimization method,
applied to dynamic problems, i.e. problems that change during the optimization.
The thesis is motivated by a lack of foundations for the field and by the incomparability
of most publications, which are of an empirical nature.
To establish a basis for the comparison of evolutionary algorithms applied to different time-dependent problems, a mathematical framework for the description of the
majority of dynamic problems is introduced in the first part of the thesis. Within
the framework, the dynamics of a problem are defined exactly for the changes
between the discrete time steps. At one time step, the overall fitness function is
defined as the maximum of several static component functions at each point of
the search space. Each component function may be modified between the time
steps by stretching it with respect to the optimum, by rescaling the fitness, and by
coordinate transformations to relocate the optimum in the search space. The properties of the modifications can be described mathematically within the framework.
This leads to a classification of the considered dynamic problems. As a result, examinations on distinct problems can be integrated into an overall picture using their
similarities concerning the dynamics. On the one hand, this is used to create a
mapping between problem classes and special techniques used in dynamic environments. This mapping is based on an analysis of the literature in the field. It is
a first step toward the identification of design patterns in evolutionary computing
to support the development of evolutionary algorithms for new dynamic problems.
On the other hand, the problem classes of the framework are used as the basis for an
examination of performance measures within dynamic environments.
The second part of the thesis analyzes one specific technique, namely local variation, applied to one problem class, namely tracking problems or drifting landscapes, in detail. For this purpose, the optimization process of a (1, λ)-strategy
applied to a simple two-dimensional problem is modeled using a Markov chain.
This enables the exact computation of the probability of being within a certain distance of the optimum at any time step of the optimization process. By variation
of the strength of the dynamics, the step width parameter of the mutation, certain paradigms concerning the mutation operator, and the population size, findings
concerning the optimal calibration of the optimization methods are derived. This
leads to ten qualitative design rules for the application of local variation to tracking problems. In particular, statements are made concerning the choice of the step
width parameter, directed mutations, and mutations penalizing small steps. Good
settings of the offspring population size are deduced by correlating fitness evaluations and the strength of the dynamics. Moreover, external memorizing techniques
and self-adaptation mechanisms are considered. In order to probe the limits of the
technique, an extreme case of dynamic problems is analyzed that is hard for current
self-adaptation techniques. This problem may serve as a benchmark for the development
of self-adaptation mechanisms tailored to dynamic problems. The design rules
are validated in a rudimentary manner on a small set of test functions using evolution
strategies as the optimization algorithm. Two new techniques to cope with the new
benchmark are proposed.
Contents

1 Introduction
  1.1 Motivation
  1.2 Organization of this Thesis
  1.3 Acknowledgements

2 Background and Related Work
  2.1 Optimization problems
    2.1.1 Static problems
    2.1.2 Dynamic problems
  2.2 Evolutionary algorithms
    2.2.1 Basic algorithm
    2.2.2 Paradigms
  2.3 Dynamic Optimization

3 Contribution and Methodology
  3.1 Limitations of Previous Work
  3.2 Focus and Contribution of this Thesis
  3.3 Methodology

4 A Classification of Dynamic Problems
  4.1 Motivation
  4.2 Existing Classifications
  4.3 Dynamic Problem Framework
  4.4 Problem Properties
    4.4.1 Coordinate Transformations
    4.4.2 Fitness Rescalings
    4.4.3 Stretching Factors
    4.4.4 Frequency of Changes
    4.4.5 Resulting Classification
  4.5 Discussion

5 Measuring Performance in Dynamic Environments
  5.1 Goals of Dynamic Optimization
    5.1.1 Optimization accuracy
    5.1.2 Stability
    5.1.3 Reactivity
    5.1.4 Technical aspects of adaptation
  5.2 Performance Measures
    5.2.1 Measures for optimization accuracy
    5.2.2 Measures for stability
    5.2.3 Measures for reactivity
    5.2.4 Comparing algorithms
  5.3 Examination of Performance Measures
    5.3.1 Considered problems
    5.3.2 Experimental Setup
    5.3.3 Statistical examination of the measures
    5.3.4 Discussion of the Results
  5.4 Summary

6 Techniques for Dynamic Environments
  6.1 Restarting
  6.2 Local variation
  6.3 Memorizing previous solutions
    6.3.1 Explicit memory
    6.3.2 Implicit memory
  6.4 Preserving diversity
    6.4.1 Diversity increasing techniques
    6.4.2 Niching techniques
    6.4.3 Restricted mating
  6.5 Adaptive and self-adaptive techniques
  6.6 Algorithms with overlapping generations
  6.7 Non-local encoding
  6.8 Learning of the underlying dynamics
  6.9 Resulting Problem-Techniques Mapping
  6.10 Discussion

7 Analysis of Local Operators for Tracking
  7.1 Theoretical framework
    7.1.1 Exact Markov chain model
    7.1.2 Worst-case Markov chain model
  7.2 Feasible tracking
  7.3 Optimal parameter settings
  7.4 Non-zero-mean mutation
  7.5 Proposition of bigger steps
  7.6 Dependence on the population size
  7.7 Memorizing techniques
  7.8 Issues of adaptation and self-adaptation
    7.8.1 Evaluation of the presented operators
    7.8.2 Limits of self-adaptation: uncentered tracking
    7.8.3 Alternative adaptation mechanisms
  7.9 Conclusion

8 Four Case Studies Concerning the Design Rules
  8.1 Adapting and self-adapting local operators
    8.1.1 Experimental setup
    8.1.2 Limitations of local operators
    8.1.3 Adaptation
    8.1.4 Self-adaptation
    8.1.5 Discussion
  8.2 Proposition of bigger steps
  8.3 Self-adaptation for the moving corridor
  8.4 Building a model of the dynamics

9 Conclusions and Future Work

References
CHAPTER 1
Introduction
1.1 Motivation
In nature, biological evolution is an adaptation process in an ever-changing environment. Changes arise from external causes like climate changes or geophysical
catastrophes as well as from evolution itself, since all species, animals and plants alike,
are coevolving and thus shaping their common environment. The evolution of one
species comes along with a change in the environment of all other species in its
habitat. Although natural evolution always aims at optimality, it will never
converge to an equilibrium and reach optimality. The reasons are the changing environment and a certain inertia of evolution. As a consequence, natural evolution is
also not reversible.
Since the 1950s natural evolution has been modeled by engineers and scientists in
order to solve their problems (Reed, Toombs, & Barricelli, 1967; Fogel, Owens, &
Walsh, 1965; Bremermann, 1962; Friedman, 1956; Friedberg, Dunham, & North,
1959; Friedberg, 1958; Fraser, 1957; Box, 1957). The emerging evolutionary algorithms (EAs) have been developed in four different streams: genetic algorithms
(GA, Holland, 1975, 1992; Goldberg, 1989), evolutionary programming (EP, Fogel, Owens, & Walsh, 1966; Fogel, 1995), evolution strategies (ES, Rechenberg,
1973, 1994; Schwefel, 1977, 1995), and genetic programming (GP, Koza, 1992a).
In the case of GAs the initial idea was to control processes by adaptive reproductive
plans (cf. Section 3.5 in Holland, 1992; De Jong, 1993) which comes very close to
the idea of natural evolution. However, most research restricted its focus to optimization in static environments. As a consequence, the standard algorithms are not
designed for non-stationary problems and are applicable only with some difficulty.
Lately, an increase in the popularity of dynamic optimization can be observed, reflected in the number of research papers presented at conferences and workshops.
Even real-world applications of evolutionary algorithms to dynamic problems (e.g.
Vavak, Jukes, & Fogarty, 1997; Rana-Stevens, Lubin, & Montana, 2000) are emerging, paving the way for a huge number of potential applications.
Numerous special techniques have been proposed to tackle dynamic problems more
effectively. However, there are many diverse non-stationary test problems with such different characteristics that comparing results or drawing general conclusions is almost impossible. Also, we notice a remarkable absence of theoretical
foundations for the classification of problems, for the comparison of different algorithms, and for the development and explanation of techniques. Nevertheless,
the author of this thesis believes that such a foundation is necessary to drive the
emerging field of evolutionary dynamic optimization to fruition.
1.2 Organization of this Thesis
Chapter 2 provides a short introduction to stationary and dynamic optimization
problems, evolutionary algorithms in general, and an overview of the issues and
the major results in dynamic optimization.
Chapter 3 shows how this thesis fits into this picture. It discusses the most urgent
open problems and the contribution of the thesis.
A more general and formal discussion of dynamic function optimization problems
follows in Chapter 4. Here, properties of these problems are defined within a mathematical framework which enables a profound problem classification.
In Chapter 5, different aspects concerning the goals of optimization in dynamic
environments are discussed. Various performance measures are reviewed and new
measures are proposed. They are examined on four problem classes of the introduced framework.
Chapter 6 reviews the major publications in dynamic optimization and categorizes
the empirical investigations concerning the problems as well as the techniques used.
The remainder of the thesis focuses on a special class of problems, namely drifting
problems. The performance of local operators on these problems is investigated in
detail in a formal, theoretical study in Chapter 7.
The findings of Chapter 7, a set of design rules, are applied to a few scenarios in
Chapter 8.
Eventually, Chapter 9 concludes the work with a summary and discussion.
1.3 Acknowledgements
First of all, I would like to thank my advisor, Prof. Dr. Volker Claus, who gave
me the opportunity to work in his group and to develop this thesis. It was always a
privilege to experience the freedom of following my own ideas in research as well
as in teaching. He also made the integration of both research and family possible.
Very special thanks go to the co-referees Prof. Dr. Hartmut Schmeck and Prof.
Kenneth A. De Jong Ph.D. for reviewing my thesis and for many helpful comments. I am very grateful for the discussions with Prof. De Jong, Dr. Christopher
Ronnewinkel, and Dr. Jürgen Branke. They all contributed valuable impulses to
my work. This is even more true for the perpetual discussions with my wife Dr.
Nicole Weicker. Also I appreciate the support of all members of our group “Formal
Concepts” at the University of Stuttgart.
Moreover, I want to thank Prof. Eliot Moss Ph.D. at the University of Massachusetts and Prof. Dr. Andreas Zell at the University of Tübingen. Both had
an impact on my way of doing research and it was a pleasure to work with them.
I am grateful to the staff of Schloss Dagstuhl, who enabled me to spend one week
at Dagstuhl with undivided attention for my research. Large parts of
Chapter 7 are due to this research stay.
Finally, I want to thank my family—Nicole for her support and advice and the
children for bearing their busy Dad. I also owe thanks to my parents who have
supported me such a long time and gave me the opportunity to become what I am.
And last, I am grateful to my brother Norbert whose “technical support” was often
a big help when I was focusing on my research.
CHAPTER 2
Background and Related Work
This chapter summarizes the relevant definitions and results that are necessary to
understand and to rate the topic of this thesis. Section 2.1 focuses on optimization
problems, Section 2.2 is concerned with evolutionary algorithms, and Section 2.3
gives the background information on optimization in non-stationary environments.
2.1 Optimization problems
This section gives a formal definition of optimization problems and then advances
to the special class of dynamic or non-stationary problems.
2.1.1
Static problems
Optimization problems occur in all areas of industry, research, and management.
To illustrate the term “optimization” within these areas, a few typical short scenarios are presented. A first example is an improved utilization of existing resources:
in job shop scheduling the throughput of a factory or a set of machines is increased,
or the task of flight crew assignment reduces the number of idle flights of airplane
crew members. A second example is the design of technical objects, e.g. the economical construction of spatial structures like bridges or cranes, the improvement
of nozzles, or the optimization of hardware circuits. Another optimization task is
the mere calibration of parameters for existing processes—a technical example is
the optimization of electronic control units for combustion engines. A last example is the search for biochemical lead structures in the context of pharmaceutical
drug design, which differs from the previous scenarios in that any feasible
structure must be found, in contrast to improving existing solutions. Each of
those problems has different characteristics which must be taken into consideration
by a suitable optimization algorithm.
The following definition formalizes an optimization problem by introducing the
possible candidate solutions as the search space Ω and a quality value for each
candidate solution enabling an assessment of the candidate solution’s quality and a
comparison of different candidate solutions.
Definition 2.1 (Optimization problem) An optimization problem is defined by the
closed search space Ω, a quality function f : Ω → R, and a comparison relation
“≻” ∈ {<, >}. The task is to determine the set of global optima X ⊆ Ω defined as

    X = { x ∈ Ω | ∀ x′ ∈ Ω : f(x) ≽ f(x′) },

where “≽” denotes “≻ or equal”.
♦
Note that often the identification of one solution x ∈ X suffices for a successful
optimization. Moreover, in real-world applications, any improvement of the quality
over the best candidate solution known so far is usually already a success. As a
consequence, the detection of the exact global optimum should usually be replaced
by an approximation, i.e. the task is to find a candidate solution x∗ with f(x∗) as
close as possible to f(x) for x ∈ X.
A very simple example to illustrate the definition is the sphere function.
Example 2.1 (Sphere function) The search space is defined as

    Ω = [−5.12, 5.12] × · · · × [−5.12, 5.12] ⊂ R^k.

The quality function is

    f(x₁, . . . , x_k) = ∑_{i=1}^{k} x_i².

It is a minimization problem, i.e. “≻” ≡ “<”. The set of global optima follows
immediately as X = {(0, . . . , 0)}. This problem is unimodal, which means that there exists
only one local and global optimum. The problem is also separable since the quality
function can be expressed as a sum of terms where each term depends on only one
object variable x_i.
♦
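As an executable illustration of Example 2.1, the following minimal Python sketch evaluates the sphere function; it is an addition to the text, and all names and the sample dimension are freely chosen:

    import random

    def sphere(x):
        # Quality function of Example 2.1: sum of the squared object variables.
        return sum(xi * xi for xi in x)

    # Minimization over Omega = [-5.12, 5.12]^k (illustrative sketch, k chosen freely).
    k = 3
    x = [random.uniform(-5.12, 5.12) for _ in range(k)]
    # Every candidate is at least as bad as the unique optimum (0, ..., 0).
    print(sphere(x) >= sphere([0.0] * k))   # always True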
Note that this general definition of optimization problems includes problems of
varying difficulty. Toy problems like Example 2.1 are usually used as benchmark
problems to compare various general optimization methods or to analyze EAs theoretically. In the context of evolutionary algorithms, rather uninteresting problems
are those where deterministic, polynomial time algorithms exist, e.g. minimum
spanning trees, shortest paths, or maximum flow in graphs. More relevant for
real-world applications are the difficult problems, i.e. those problems which are
NP-hard. The complexity class NP contains all problems which may be solved by
a non-deterministic Turing machine in polynomial time. A problem X is called
NP-hard iff any problem Y in the class NP may be reduced, i.e. rephrased, to an instance of X using a polynomial time algorithm and the problem Y may be solved
by solving X. If, furthermore, X is a member of NP, the problem is called NP-complete. A popular NP-complete problem is the traveling salesperson problem
(Papadimitriou, 1977).
2.1.2 Dynamic problems
Static problems have been the focus of evolutionary algorithm research for almost
20 years, and their solution proved to be of considerable difficulty. However, the
introduction of time dependency by Goldberg and Smith (1987) adds a new and
very distinct degree of difficulty. Before these issues are discussed in more detail,
the non-stationary version of an optimization problem is defined formally.
Definition 2.2 (Dynamic optimization problem) A dynamic optimization problem is defined by the search space Ω, a set of quality functions f^(t) : Ω → R
(t ∈ N₀), and a comparison relation “≻” ∈ {<, >}.
The goal is to determine the set of all global optima X^(t) ⊆ Ω (t ∈ N₀) defined as

    X^(t) = { x ∈ Ω | ∀ x′ ∈ Ω : f^(t)(x) ≽ f^(t)(x′) }.

♦
In dynamic optimization, a complete solution of the problem at each time step is
usually unrealistic or infeasible. As a consequence, the search for exact global
optima must be replaced again by the search for acceptable approximations.
Note that, for the sake of simplicity, all examinations in this thesis consider only
problems with one global optimum.
Example 2.2 (Dynamic sphere function) Again, the search space is defined as

    Ω = [−5.12, 5.12] × · · · × [−5.12, 5.12] ⊂ R^k.

The sequence of quality functions is given by

    f^(t)(x₁, . . . , x_k) = ∑_{i=1}^{k} (x_i − y_i^(t))²

with y_i^(t) = 5 sin(ρt) for t ∈ N₀, a scaling factor ρ ∈ R, and 1 ≤ i ≤ k. It
is a minimization problem, i.e. “≻” ≡ “<”. The set of global optima follows
as X^(t) = {(5 sin(ρt), . . . , 5 sin(ρt))}. Depending on the factor ρ, the changes to the
problem from t to t + 1 are slight or significant. This leads to a high variance in the
degree of difficulty of the problem.
♦
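A corresponding Python sketch of Example 2.2 (again an added illustration with freely chosen names) shows how the quality of one fixed candidate solution changes with the time step t:

    import math

    def dynamic_sphere(x, t, rho=0.1):
        # Quality function of Example 2.2: the optimum oscillates along the diagonal.
        y = 5.0 * math.sin(rho * t)           # optimum coordinate at time step t
        return sum((xi - y) ** 2 for xi in x)

    x = [1.0, 1.0]
    print(dynamic_sphere(x, t=0))    # evaluated against the optimum (0, 0)
    print(dynamic_sphere(x, t=10))   # evaluated against a relocated optimum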
The time dimension has a completely different impact on the characteristics of a
problem than any other dimension of the search space. As long as changes in the
landscape occur only occasionally, the problem can be optimized as if it were static
in the periods between those changes. However, as soon as the changes occur more
often or even continuously, there is only a very restricted time span available to
deliver a new approximation for the problem and, as a consequence, only a limited number of quality evaluations must suffice. This is a completely different task
compared to static optimization problems. In fact, incorporating dynamics can turn
a very simple unimodal problem like the sphere in Example 2.1 into such a complex task that standard algorithms cannot cope with the problem. This underlines
the need to develop a foundation for the application of evolutionary algorithms to
dynamic environments.
In the literature, the dynamics inherent in the problem are often referred to as exogenous dynamics, in contrast to endogenous dynamics stemming from the dynamics of the evolutionary optimizer itself. Examples of the latter are changes due to
coevolutionary methods or effects in small finite populations.
Two real-world examples for dynamic problems are the classical tasks of time series prediction (in the context of evolutionary computation tackled by Fogel et al.,
1966; Angeline, Fogel, & Fogel, 1996; Angeline & Fogel, 1997; Angeline, 1998;
Neubauer, 1997) and control problems like the stabilization of a pole-cart system
(e.g. solved with means of evolutionary computation by Odetayo & McGregor,
1989). Another application of increasing importance is dynamic scheduling, e.g.
in a job shop scheduling problem, where a number of jobs has to be assigned to a
setup of machines, guaranteeing a high throughput and reacting flexibly to newly
arriving jobs. Examples tackled by evolutionary computation can be found in the
publications of Biegel and Davern (1990); Bierwirth, Kopfer, Mattfeld, and Rixen
(1995); Bierwirth and Mattfeld (1999); Branke and Mattfeld (2000); Fang, Ross,
and Corne (1993); Hart and Ross (1998); Lin, Goodman, and Punch III (1997);
Mattfeld and Bierwirth (1999); Cartwright and Tuson (1994) and Rixen, Bierwirth,
and Kopfer (1995). Other real world problems that have been tackled successfully
by evolutionary computation are combustion balancing in multiple burner boiler
(Vavak et al., 1997), load balancing (Munetomo, Takai, & Sato, 1996), digital signal processing (Neubauer, 1996), reduction of air traffic congestion (Oussedik, Delahaye, & Schoenauer, 1999), speech recognition (Spalanzani & Kabré, 1998), and
parameter identification (Fogarty, Vavak, & Cheng, 1995). However, those real
world problems are often difficult to examine and to analyze. Very often they also require rather special optimization techniques and are, therefore, inadequate to
get insights into the usage of a general problem solver to tackle dynamic problems.
As a consequence, artificial non-stationary problems are usually the basis for the
examination of general dynamic optimization techniques. The simplest way to
create an artificial dynamic problem is to take a stationary problem function and
to move it like in Example 2.2. Frequently, very simple unimodal problems are
used because the focus of the difficulty is on the introduced dynamics. In the
literature there are examples for a linear movement (see Vavak, Fogarty, & Jukes,
1996a, 1996b; Salomon & Eggenberger, 1997; Ryan & Collins, 1998), movement
according to a sine function (see Cobb, 1990; Dasgupta, 1995), or along a cyclic
trace in the search space (see Angeline, 1997; Bäck, 1998). Often, there are also
certain stationary periods involved.
Instead of a continuous movement, random relocation of the stationary problem
is considered by Angeline (1997) and Bäck (1997, 1998). Collard, Escazut, and
Gaspar (1996, 1997) relocated a multi-dimensional Gaussian function according to
an enumerating function.
Whereas shifting and relocating introduce a special kind of dynamics that is applied similarly to each point of the stationary problem, more complex dynamics
can be introduced by the moving hills technique. Here, several unimodal problems (hills) are placed in the search space, which can individually change their
height (e.g. Cedeño & Vemuri, 1997; Liles & De Jong, 1999; Trojanowski &
Michalewicz, 1999b), the position of the maximum peak or all peaks (e.g. Grefenstette, 1992; Cobb & Grefenstette, 1993; Vavak et al., 1997; Vavak, Jukes, &
Fogarty, 1998; Sarma & De Jong, 1999; Smith & Vavak, 1999), or the shape of
the peaks. Morrison and De Jong (1999), Grefenstette (1999), and Branke (1999c)
proposed different moving hills test problem generators comprising most described
changes. These or similar problem generators have been used by Kirley and Green
(2000), Morrison and De Jong (2000), Saleem and Reynolds (2000), and Ursem
(2000).
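The following Python sketch illustrates the moving hills idea; it is a deliberately simple stand-in with freely chosen names, peak shapes, and update rules, not a reimplementation of the generators by Morrison and De Jong (1999), Grefenstette (1999), or Branke (1999c):

    import random

    class MovingHills:
        # Toy landscape: m cone-shaped peaks whose positions and heights drift
        # between time steps (illustrative sketch, not a published generator).
        def __init__(self, m=5, dim=2, lo=-5.0, hi=5.0):
            self.lo, self.hi = lo, hi
            self.peaks = [([random.uniform(lo, hi) for _ in range(dim)],
                           random.uniform(30.0, 70.0)) for _ in range(m)]

        def step(self, pos_sigma=0.1, height_sigma=1.0):
            # One problem change: relocate every peak slightly, perturb its height.
            self.peaks = [([min(self.hi, max(self.lo, c + random.gauss(0.0, pos_sigma)))
                            for c in pos],
                           max(1.0, h + random.gauss(0.0, height_sigma)))
                          for pos, h in self.peaks]

        def fitness(self, x):
            # Overall fitness: maximum over all peak functions (maximization).
            def cone(pos, h):
                d = sum((a - b) ** 2 for a, b in zip(x, pos)) ** 0.5
                return h - d
            return max(cone(pos, h) for pos, h in self.peaks)

    landscape = MovingHills()
    x = [0.0, 0.0]
    print(landscape.fitness(x))
    landscape.step()                 # the problem changes between time steps
    print(landscape.fitness(x))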
Besides the described problems in a real-valued search space, there is also a wide
set of artificial dynamic problems using binary search spaces. One popular approach is the dynamic match function (or pattern tracking), where a binary string
is given and the fitness of each individual is measured as the number of common
bits with this string. The simplest version of this kind of problem is the time-varying counting-ones function, in which the number of ones and the number of zeros
are maximized alternately (cf. Bäck, 1997, 1999). In a more general version of the problem,
every g generations d bits in a target string are changed (cf. Vavak & Fogarty, 1996;
Collard et al., 1997; Escazut & Collard, 1997; Gaspar & Collard, 1997, 1999a,
1999b; Stanhope & Daida, 1998, 1999). In a very early version of the problem,
Pettit and Swigger (1983) changed each bit according to an individual stochastic
transition table.
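A minimal Python sketch of the dynamic match function (an added illustration; the parameters g and d follow the description above, everything else is freely chosen):

    import random

    def match_fitness(individual, target):
        # Fitness: number of bits the individual has in common with the target.
        return sum(int(a == b) for a, b in zip(individual, target))

    def change_target(target, d):
        # One problem change: flip d randomly chosen bits of the target string.
        for i in random.sample(range(len(target)), d):
            target[i] ^= 1

    target = [random.randint(0, 1) for _ in range(20)]
    ind = [random.randint(0, 1) for _ in range(20)]
    print(match_fitness(ind, target))
    change_target(target, d=3)       # would be applied every g generations
    print(match_fitness(ind, target))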
Similarly to relocation according to an enumerating function in real-valued search
spaces, Collard et al. (1996, 1997) applied the same scheme to a two-bit needle-in-a-haystack function.
Probably the most popular binary dynamic test function is the time-varying knapsack problem where the size of the knapsack or the weight of the items changes
over time. As an extreme, the dynamic problem can alternate between completely
different knapsack instances. For various dynamic knapsack problems refer to the
work of Dasgupta and McGregor (1992), Goldberg and Smith (1987), Hadad and
Eick (1997), Lewis, Hart, and Ritchie (1998), Mori, Kita, and Nishikawa (1996,
1998), Ng and Wong (1995), Ryan (1996, 1997), Smith and Goldberg (1992), and
Smith and Vavak (1999).
2.2 Evolutionary algorithms
This section outlines a generic evolutionary algorithm and, very briefly,
presents the four major paradigms, namely genetic algorithms, evolution strategies,
evolutionary programming, and genetic programming.
2.2.1 Basic algorithm
Figure 2.1 shows the general evolutionary cycle which is the generic basis for all
evolutionary algorithms (EA).
The fundamental idea of evolutionary computing is the mimicry of natural evolution: an initial multi-set of candidate solutions undergoes a process of simulated
evolution. That means that candidate solutions are able to reproduce themselves
[Figure 2.1: Schematic description of the generation loop in evolutionary algorithms. After initialization and evaluation, the cycle of parental selection, recombination, mutation, evaluation, and environmental selection repeats until the termination criterion is fulfilled and the result is output.]
and are subject to an additional selection pressure. Following the biological terminology, a candidate solution is referred to as individual and a multi-set (or a tuple)
of individuals is called a population. Usually, populations in EAs have a fixed size,
in contrast to the varying population sizes in nature, where changes in the population
size are one means of direct response to changing environmental conditions.
In the evolutionary cycle in Figure 2.1, parental and environmental selection, recombination, and mutation are clearly biologically inspired components, whereas the
initialization, the direct evaluation of individuals, and the termination criterion are
additional elements necessary for the use of evolution as an optimization method.
The components are discussed in more detail in the next paragraphs. One pass
through the evolutionary cycle is called a generation.
In the initialization, a first population of individuals is created. Usually, those individuals are chosen at random; under certain circumstances, concrete individuals, e.g. known good candidate solutions, are also included in the initial population.
Since individuals in simulated evolution do not live in a real environment struggling
for survival, the quality function of the optimization problem replaces the interaction of individuals with the environment: it establishes a means of comparison of
individuals to guide the evolutionary search process. In the context of evolutionary
optimization we refer to the quality of an individual as fitness.
In order to create new individuals it is necessary to select parents from the current
population. This selection and the assignment of a number of offspring to each
parent is one of two possible positions in the evolutionary cycle where selective
pressure may take place. The parental selection exerts no selective pressure if the
parents are selected uniformly at random.
Between the selected parents a recombination takes place, and at least one offspring
is created by combining the genetic material of the parents, where the term
“combination” should be understood in a wider sense. Offspring inherit certain
traits of their parents; however, completely new traits may often be created as well.
Note that there are evolutionary algorithms that do not use recombination.
Also in analogy to natural evolution, an error rate in the process of reproduction
is modeled by the mutation operators that are applied when generating offspring individuals. Usually, only rather small changes should be applied to individuals since evolutionary progress relies on the inheritance of parental traits.
Those recombined and mutated individuals are evaluated using the quality function to determine their fitness. The fitness of the individuals is the basis for the
environmental selection, where for each individual a decision is made whether it will
survive and be a potential parent in the next iteration. There are two extreme cases
of environmental selection, namely the steady state EA, where just one offspring is
created and replaces an individual in the parental population, and the generational
EA, where the whole parental population is replaced by new individuals. In between those extremes there exist many environmental selection strategies, e.g. by
selecting the best individuals from the union of parents and offspring or by replacing more than one individual in the parental population. The latter case is usually
determined by a degree of overlap between the generations and a replacement strategy.
Contrary to natural evolution, at the end of each cycle the termination criterion tests
whether the goal of the optimization has already been met. Usually, an additional
maximal number of generations is given such that the EA always halts.
The processing scheme of the general evolutionary algorithm is shown in Algorithm 2.1 in pseudocode. Parameters are the population size as well as the number
of offspring that have to be created each generation. Moreover a genotypic search
space G must be determined together with a decoding function dec : G → Ω
that determines to which phenotypic candidate solution a genotype is mapped. The
interaction between genotype and phenotype is shown in Figure 2.2. Ideally such
a mapping from genotype to phenotype is bijective. But due to the optimization
problem or restrictions caused by the chosen evolutionary algorithm, there is often
redundancy in the encoding, i.e. several genotypes are mapped to one phenotype,
Algorithm 2.1 General evolutionary algorithm EA
 1: INPUTS: quality function f : Ω → R
 2: PARAMETERS: population size µ, number of offspring λ, genotype G, decoding function dec
 3: t ← 0
 4: P(t) ← create a population of size µ
 5: evaluate individuals in P(t) using dec and f
 6: while termination criteria not fulfilled do
 7:    E ← select parents for λ offspring from P(t)
 8:    P′ ← create offspring by recombination of individuals in E
 9:    P″ ← mutate individuals in P′
10:    evaluate individuals in P″ using dec and f
11:    t ← t + 1
12:    P(t) ← select µ individuals from P″ (and P(t − 1))
13: end while
14: OUTPUT: best individual in P(t)
or the mapping is not surjective, i.e. there are phenotypes which are not encoded
by any genotype. Whenever the decoding function is not important, it is omitted
and the induced quality function is used instead.
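To connect Algorithm 2.1 with executable code, the following Python sketch renders the generation loop. It is a hedged illustration only: the passed-in operator implementations, the truncation selection from the union of parents and offspring, the assumption of minimization, and all parameter values are choices made for this example, not prescriptions of the thesis.

    import random

    def ea(f, dec, init, recombine, mutate, mu=20, lam=40, generations=100):
        # Sketch of Algorithm 2.1; minimization is assumed for simplicity.
        pop = [init() for _ in range(mu)]
        fit = [f(dec(g)) for g in pop]                 # evaluate via dec and f
        for _ in range(generations):                   # termination criterion
            parents = random.choices(pop, k=lam)       # parental selection
            offspring = [mutate(recombine(a, random.choice(pop)))
                         for a in parents]             # recombination + mutation
            off_fit = [f(dec(g)) for g in offspring]
            # environmental selection from the union of parents and offspring
            ranked = sorted(zip(fit + off_fit, pop + offspring), key=lambda p: p[0])
            fit = [q for q, _ in ranked[:mu]]
            pop = [g for _, g in ranked[:mu]]
        return pop[fit.index(min(fit))]                # best individual

    # Usage on the sphere problem with identity decoding, intermediate
    # recombination, and Gaussian mutation (all illustrative choices).
    best = ea(f=lambda x: sum(v * v for v in x),
              dec=lambda g: g,
              init=lambda: [random.uniform(-5.12, 5.12) for _ in range(3)],
              recombine=lambda a, b: [(u + v) / 2 for u, v in zip(a, b)],
              mutate=lambda g: [v + random.gauss(0.0, 0.1) for v in g])
    print(best)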
[Figure 2.2: Genotypic coding of the search space. The decoding function dec maps a genotype from G ⊆ G₁ × · · · × G_l to a phenotype in the phenotypic search space Ω; composed with the quality function f : Ω → R, it induces the quality function F = f ◦ dec on the genotypic space.]
For each application of the evolutionary algorithm displayed in Algorithm 2.1, the
operators must be chosen in accordance with the respective problem. The operators
recombination, mutation, and parental and environmental selection must conform
to the following definition.
Definition 2.3 (Operators) Given a genotypic search space G, a mutation operator
is defined by the function

    M_ξ : G → G

where ξ ∈ Ξ represents a random number (or several random numbers).
Analogously, the recombination operator for r ≥ 2 parents and s ≥ 1 offspring is
defined by the function (r, s ∈ N)

    R_ξ : G^r → G^s.

A selection operator on a population P = ⟨A₁, . . . , A_r⟩ with A_i ∈ G selecting
s ∈ N individuals is defined by the function

    S_{ξ,dec,f} : G^r → G^s

where for the resulting individuals S_{ξ,dec,f}(P) = ⟨B₁, . . . , B_s⟩ it holds that
B_i ∈ set(P) for 1 ≤ i ≤ s, with set(P) = {A_i ∈ G | P = ⟨A₁, . . . , A_r⟩, 1 ≤ i ≤ r}.
The function S can depend on a random number ξ ∈ Ξ, the genotype–phenotype
mapping dec, and the quality function f.
♦
2.2.2 Paradigms
In this subsection, we review those standard algorithms (and paradigms) that are
relevant in the remainder of this work, namely genetic algorithms, evolution strategies, and evolutionary programming. The fourth paradigm, genetic programming,
is described only briefly at the end of this section.
Genetic algorithms (GAs) have their roots in the early work of Holland (1969,
1973), resulting in his book on adaptive systems (Holland, 1975) where GAs are
called reproductive plans. The use of genetic algorithms as optimization tools was
established by his students (e.g. De Jong, 1975). Today’s popularity of GAs is
primarily due to the book by Goldberg (1989).
The canonical GA is outlined in Algorithm 2.2. The historical form is characterized by a binary genotypic search space G = B^l and a coding function mapping
this space to the phenotypic search space, e.g. Ω = Rn . For the encoding of
each phenotypic dimension usually a standard binary encoding or a Gray code is
used (Caruana & Schaffer, 1988). The operators to modify the individuals are a recombinative crossover operator which exchanges certain bits of two parents and a
bit-flipping mutation where each bit in an individual is flipped with a certain probability. In GAs there is a high emphasis on the recombination which is applied with
a rather high probability. The mutation has the role of a background operator with
a very low bit-flipping probability. The selective pressure is created by parental
selection only, using fitness proportional selection, i.e. selection is probabilistic
Algorithm 2.2 Genetic algorithm GA
 1: INPUTS: quality function f
 2: PARAMETERS: population size µ, decoding function dec : B^l → Ω, mutation rate p_m, crossover probability p_x
 3: t ← 0
 4: P(t) ← create initial population of size µ
 5: evaluate individuals in P(t) (using dec and f)
 6: while termination criteria not fulfilled do
 7:    P′ ← select µ parents from P(t) with a probability proportional to the individuals' quality
 8:    P″ ← recombine the parents in P′ using the crossover operator (each application results in two new individuals; crossover takes place with probability p_x, otherwise the individuals are copied)
 9:    P‴ ← apply the mutation to each individual in P″ (each bit is flipped with probability p_m)
10:    evaluate population P‴ (using dec and f)
11:    t ← t + 1
12:    P(t) ← P‴
13: end while
14: OUTPUT: best individual in P(t)
where better individuals get a higher probability assigned. As a result, better individuals are expected to have more offspring than worse individuals have. Since the
number of created individuals matches the population size, there is no environmental selective pressure.
Three prominent crossover operators are the 1-point crossover, where one crossover
point in the individuals is chosen and the parts left of this point are exchanged;
the 2-point crossover, where two crossover points are chosen and the section between those points is exchanged; and the uniform crossover, where each bit is chosen individually from one of both parents. Python sketches of these operators are given below.
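The following illustrative Python sketches of the three crossover operators and the bit-flipping mutation are additions to the text; the function names and the example parameters are freely chosen:

    import random

    def one_point_crossover(a, b):
        # Choose one crossover point and exchange the left parts.
        p = random.randint(1, len(a) - 1)
        return b[:p] + a[p:], a[:p] + b[p:]

    def two_point_crossover(a, b):
        # Choose two crossover points and exchange the section between them.
        p, q = sorted(random.sample(range(1, len(a)), 2))
        return a[:p] + b[p:q] + a[q:], b[:p] + a[p:q] + b[q:]

    def uniform_crossover(a, b):
        # Choose each bit individually from one of both parents.
        pairs = [(x, y) if random.random() < 0.5 else (y, x) for x, y in zip(a, b)]
        return [x for x, _ in pairs], [y for _, y in pairs]

    def bit_flip_mutation(a, pm=0.01):
        # Flip each bit independently with the (low) mutation rate pm.
        return [x ^ 1 if random.random() < pm else x for x in a]

    a, b = [0] * 8, [1] * 8
    print(one_point_crossover(a, b))
    print(two_point_crossover(a, b))
    print(uniform_crossover(a, b))
    print(bit_flip_mutation(b, pm=0.2))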
In the course of time, many modifications concerning the selection mechanism have
been proposed. On the one hand, there are certain scaling techniques which are able
to control the selective pressure more effectively. On the other hand there are also
very distinct selection operators like the tournament selection where an individual
is selected as the best of k uniformly chosen individuals.
In the 1980s, the first GAs appeared that use a genotypic search space different from the binary space, namely real-valued search spaces and permutations
for combinatorial optimization. For these algorithms the development of
special operators was necessary. However, all those modifications are out of the
scope of this work.
Evolution strategies (ES) have been developed by Rechenberg (1964) together
with Schwefel and Bienert for the design of a nozzle as well as a crooked pipe.
Their first experiments have been executed manually.
Within evolution strategies, the genotypic search space is always real-valued, G = R^l,
and usually equivalent to the phenotypic search space. The main operator in ES is
the mutation which generates modifications local to the genotypic search space by
adding a Gaussian random number to each dimension of G. The standard deviation
of the probability density function used in the mutation is an essential parameter
for success or failure of an optimization. The next paragraph is devoted to the
control of this parameter. If the standard deviation is identical for all search space
dimensions, the mutation is called isotropic. Originally there was no recombination
which was later introduced as a secondary operator. Selective pressure is generated only by the environmental selection – the parents are chosen with uniform probability. In a population of size µ, the environmental selective pressure is generated by
increasing the population size with newly created individuals and reducing it again
by selecting only the µ best individuals. If the individuals are chosen from λ > µ
offspring only, it is a comma–strategy; if the original population is expanded by λ
new individuals, it is a plus–strategy.
The first attempt to control the mutation’s standard deviation σ has been the 1/5–
success rule by Rechenberg (1973) where σ was diminished if the success rate is
less than 1/5 and it is increased if the success rate, i.e. the percentage of offspring
better than their parent, is greater than 1/5.
    σ ← α·σ      if p_s > 1/5
    σ ← σ/α      if p_s < 1/5
    σ ← σ        if p_s = 1/5
Usually, α ≈ 1.224 is chosen. Whereas this rule works sufficiently well in unimodal
search spaces, it gets easily trapped in multimodal search spaces. A much more
robust adaptation mechanism was introduced by Schwefel (1975) with the concept
of self-adaptation. Here, each individual is extended by one (or more) additional
parameter(s) which contain the standard deviation(s) used for the creation of this
individual. In case of the isotropic mutation, one strategy parameter is needed.
This strategy parameter is first mutated and then used to create the next individual.
The mutation is carried out for an individual A = ⟨A₁, . . . , A_n, σ⟩ according to the
following rules:

    σ′   ← σ · exp((1/√l) · N(0, 1))                                  (2.1)
    A′_i ← A_i + N(0, σ′)
The rule for the strategy parameter is called the lognormal update rule. The primary idea is that high-quality individuals are due to good parameter values, which
leads to the dominance of well-adapted standard deviations for each individual at
its position in the search space. The resulting self-adaptive evolution strategy is
displayed in Algorithm 2.3.
Algorithm 2.3 Self-adaptive evolution strategy ES
 1: INPUTS: quality function f
 2: PARAMETERS: population size µ, number of offspring λ, recombination rate p_r
 3: t ← 0
 4: P(t) ← create initial population of size µ
 5: evaluate P(t) using f
 6: while termination criteria not fulfilled do
 7:    P′ ← ⟨⟩
 8:    for i ∈ {1, . . . , λ} do
 9:       A ← select parent uniformly from P(t)
10:       if U([0, 1]) < p_r then
11:          B ← select mate uniformly from P(t)
12:          A ← recombine A and B
13:       end if
14:       σ′ ← apply mutation on strategy parameter of A
15:       A′ = ⟨A′₁, . . . , A′_n⟩ ← apply mutation on ⟨A₁, . . . , A_n⟩ using σ′
16:       evaluate A′ using f
17:       P′ ← P′ ◦ ⟨A′⟩
18:    end for
19:    t ← t + 1
20:    P(t) ← select the µ best individuals from P′ (or from P′ ◦ P(t − 1))
21: end while
22: OUTPUT: best individual in P(t)
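The lognormal update rule of Eq. (2.1), as used in lines 14 and 15 of Algorithm 2.3, can be sketched in Python as follows. This is an added illustration; the encoding of the individual as a flat list with σ as the last entry is an assumption of the sketch:

    import math
    import random

    def self_adaptive_mutation(ind, l):
        # ind = [A_1, ..., A_l, sigma]: object variables plus one strategy parameter.
        *obj, sigma = ind
        # Lognormal update rule, Eq. (2.1): mutate sigma first ...
        sigma_new = sigma * math.exp(random.gauss(0.0, 1.0) / math.sqrt(l))
        # ... then use the mutated sigma to create the offspring's object variables.
        return [a + random.gauss(0.0, sigma_new) for a in obj] + [sigma_new]

    ind = [1.0, -2.0, 0.5, 0.3]   # three object variables, sigma = 0.3
    print(self_adaptive_mutation(ind, l=3))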
In the above adaptation scheme, the same standard deviation is applied to all dimensions of the search space. This can be changed by individual strategy parameters for each search space dimension, i.e. the individual has the form
A = ⟨A₁, . . . , A_n, σ₁, . . . , σ_n⟩. The strategy parameters are adapted using the following rule:

    σ′_i ← σ_i · exp((1/√(2l)) · u + (1/√(2√l)) · N(0, 1))
    A′_i ← A_i + N(0, σ′_i)

where u ∼ N(0, 1) is chosen once for the adaptation of all strategy variables σ_i
(1 ≤ i ≤ n). We refer to this mutation as (simple) non-isotropic mutation. The
factors 1/√(2l) and 1/√(2√l) are recommended by Schwefel (1977) as appropriate
heuristic settings.
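A Python sketch of this non-isotropic mutation (an added illustration; it assumes the factors 1/√(2l) and 1/√(2√l) as reconstructed above):

    import math
    import random

    def non_isotropic_mutation(obj, sigmas):
        # One strategy parameter per dimension; u is drawn once per individual.
        l = len(obj)
        u = random.gauss(0.0, 1.0)
        tau_global = 1.0 / math.sqrt(2.0 * l)
        tau_local = 1.0 / math.sqrt(2.0 * math.sqrt(l))
        new_sigmas = [s * math.exp(tau_global * u + tau_local * random.gauss(0.0, 1.0))
                      for s in sigmas]
        new_obj = [a + random.gauss(0.0, s) for a, s in zip(obj, new_sigmas)]
        return new_obj, new_sigmas

    print(non_isotropic_mutation([1.0, -2.0], [0.3, 0.5]))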
If, in addition, the orientation in the search space is to be adapted, n(n − 1)/2 strategy
variables are necessary to encode the direction in an n-dimensional search space.
One of those techniques is the covariance matrix adaptation (CMA) by Hansen and
Ostermeier (1996, 2001). It is a derandomized approach: first a random change is
added to the object variables; then, according to the success of this change, the
underlying covariance matrix that encloses all strategy variables is updated. So the
mutation does not rely on good random changes of the strategy variables.
Evolutionary programming (EP) was developed by Fogel et al. (1965, 1966)
who evolved finite automata to predict time series. In the late 1980s and early
1990s, Fogel (1992a, 1992b) extended the evolutionary programming paradigm to
real-valued search spaces as described in the next paragraph.
There is no selective pressure in the parental selection—each individual in the population creates exactly one offspring using mutation. No recombination is used.
The mutation is quite similar to the mutation in the evolution strategies. Even a
self-adaptation mechanism was developed independently. The mutation and the
usage of the strategy parameters differ, however: for an individual
A = ⟨A₁, . . . , A_n, σ₁, . . . , σ_n⟩ the mutation is described by the following equations:

    σ′_i ← max{σ_i + N(0, √(s·σ_i)), ε}                              (2.2)
    A′_i ← A_i + N(0, σ′_i)

where the parameter s controls the strength of adaptation and ε > 0 is a minimal
required standard deviation. For the environmental selection, each new individual and
each parent are compared to k randomly chosen individuals in the population. The
number of wins is assigned to each individual as score. The µ best scoring individuals survive for the next generation. An outline of evolutionary programming is
given in Algorithm 2.4.
Algorithm 2.4 Evolutionary programming EP
 1: INPUTS: quality function f
 2: PARAMETERS: tournament size k, population size µ
 3: t ← 0
 4: P(t) ← create population of size µ
 5: evaluate P(t) using f
 6: while termination criteria not fulfilled do
 7:    P′ ← mutate each individual in P(t) according to the mutation given above
 8:    evaluate P′ using f
 9:    P″ ← P′ ◦ P(t)
10:    for each individual A in P″ do
11:       Q ← choose k individuals randomly from P″
12:       determine score of A as the number of individuals B ∈ Q where f(A) ≽ f(B)
13:    end for
14:    t ← t + 1
15:    P(t) ← µ individuals in P″ with best score
16: end while
17: OUTPUT: best individual in P(t)
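The tournament scoring of the environmental selection (lines 10 to 13 of Algorithm 2.4) can be sketched in Python as follows; the comparison function and all parameter values are illustrative assumptions of this added example:

    import random

    def ep_environmental_selection(pop, fitnesses, mu, k, better):
        # Score each individual by its wins against k randomly chosen members
        # of the joint population; the mu best-scoring individuals survive.
        scores = []
        for f_a in fitnesses:
            opponents = random.sample(fitnesses, k)
            scores.append(sum(1 for f_b in opponents if better(f_a, f_b)))
        order = sorted(range(len(pop)), key=lambda i: scores[i], reverse=True)
        return [pop[i] for i in order[:mu]]

    pop = [[random.uniform(-1.0, 1.0)] for _ in range(10)]
    fits = [x[0] ** 2 for x in pop]            # minimization of x^2
    survivors = ep_environmental_selection(pop, fits, mu=5, k=3,
                                           better=lambda fa, fb: fa <= fb)
    print(len(survivors))                      # 5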
Genetic programming (GP) was introduced by Koza (1989, 1992a, 1992b). Initially the search space consisted of parse trees, e.g. containing Lisp S-expressions.
Later, genetic programming on graphs and other structures of varying size was also
developed. The basic algorithm is quite similar to that of genetic algorithms, although the population size is usually very large compared to GAs. In the
initial tree representation, the mutation has been implemented as the replacement
of subtrees by random trees, the recombination as exchange of subtrees among
individuals, and the parental selection as tournament selection.
2.3 Dynamic Optimization
Optimization or adaptation in non-stationary environments is a fairly new research
area from the perspective of a computer scientist or engineer. However, from a
biologist’s viewpoint, adaptation in a changing environment is a common theme:
coevolutionary interactions, where various species respond to one another and
modify their environments reciprocally, have been examined since the 1960s. There
is a rich literature on models and examinations (e.g. Futuyma & Slatkin, 1983).
However, as Slatkin (1983) points out, those findings are always restricted by the
simplifications of the models. Also, these results are not transferable to the optimization of non-stationary problems in computer science or engineering since the
reciprocal redefinition of the environment does not match the character of the dynamic problems considered there. In dynamic optimization the dynamics are usually exogenous and not affected by the optimization process itself. In the following
paragraphs, a quick overview of the state of the art is given.
Related biological results: In recent years there have been some theoretical publications (e.g. Wilke, 1999) on exogenous dynamics in biological systems, using the
Eigen quasispecies model (Eigen, 1971). Primary results of this approach are the
examination of environmentally guided drifts where an adaptive walker can be
dragged to a global minimum in a maximization task (Wilke, 1998, 1999). Oscillation frequencies are also examined (Wilke, Ronnewinkel, & Martinetz, 1999, 2001;
Wilke & Ronnewinkel, 2001). These results are related to dynamic optimization,
although the relevant evolutionary algorithms are restricted to rather simple variants of standard genetic algorithms.
Approaches to dynamic problems: When tackling problems with exogenous
dynamics, there are basically two different evolutionary approaches. First, an evolutionary algorithm may be used to evolve a strategy, program, or automaton for
tackling the problem. In this case, usually each created individual must be tested
for a certain amount of time in the dynamic environment to determine its fitness
value. Nevertheless, although the individual solves a dynamic task, the evolutionary algorithm faces a more or less static problem since the dynamic task is identical
for each evaluation. This approach, displayed in Figure 2.3, is also referred to as
offline optimization. It is used in many evolutionary programming (EP) and genetic
programming (GP) applications, e.g. for time series prediction (Fogel et al., 1966;
Angeline et al., 1996; Angeline & Fogel, 1997; Angeline, 1998) or the stabilization
of the pole-cart system (using a GA, Odetayo & McGregor, 1989; Thierens & Vercauteren, 1991). The task is the evolution of a strategy that is able to deal with the
respective dynamic problem. In general, this is a difficult task where the solution
quality depends on the complexity and regularity of the dynamics and the internal
representation of the strategy.
Whereas offline optimization is completely distinct from the biological examinations
in the previous paragraph, the second approach is the exact analogue to the exogenous biological considerations. Here, the dynamic problem is changing independently while the evolutionary algorithm evolves solutions. This approach,
[Figure 2.3: Static evolution of a program to solve the dynamic problem; each individual produced by the evolutionary algorithm tackles the dynamic problem.]
displayed in Figure 2.4, can be used to tackle more unpredictable problems than
the first approach. The second approach, also referred to as online optimization, is
the subject of this thesis. The following paragraphs summarize the state of the art
concerning online optimization.
[Figure 2.4: Dynamic evolution of solutions to the problem; each individual of the evolutionary algorithm is a solution to the dynamic problem.]
Theoretical results: There are only a few theoretical results available, and most of
them are closely related to the biological examinations. For example, Rowe
(1999, 2001) also uses the Eigen quasispecies model. He computes the cyclic attractors in periodic (or oscillating) fitness functions. As a result he finds that the
theory conforms to experimental findings for high and modest mutation rates—
only modest mutation enables a successful adaptation. For low mutation rates,
however, the theory cannot be confirmed since no adaptation could be observed in
experiments.
Another theoretical result concerns the fixed point concentrations of a needle-in-a-haystack problem examined by Ronnewinkel, Wilke, and Martinetz (2000). This
work is an extension of the biological investigations of Wilke (1999). It enables
the derivation of an optimal mutation rate for this simple problem. The work is
extended for problems with bigger basins of attraction by Droste (2002).
Stanhope and Daida (1999) analyzed the dynamics of a genetic hillclimber. They
observe in their theoretical results as well as in empirical validations that small
perturbations have a huge effect on the optimizer's performance. However, for a
fixed dynamic problem, an optimally chosen mutation rate affects the performance
only a little. The authors conclude that self-adaptation techniques might be of little
use in dynamic problems—however their results are restricted to population size 1.
These findings are related to the results of Angeline (1997), who likewise concluded
from experiments that self-adaptation is not suited for non-stationary problems. As
a response, Bäck (1998) showed that self-adaptation is useful for a small degree of
dynamics.
Evolution strategies with isotropic mutations have been the focus of Arnold and Beyer (2002), who analyze random drifting problems.
Not theoretical in a strict sense, but still a fundamental conception, is the comparison of evolutionary algorithms to filters in signal processing by Hirst (1997). He showed that, using a generational evolutionary algorithm, they act similarly to low-pass analogue filters in the continuous domain and to band-pass non-recursive digital filters in the discrete domain.
Comparison to learning techniques: In the context of dynamic optimization, evolutionary algorithms have seldom been compared systematically to other learning techniques. The early work by Pettit and Swigger (1983) showed in a simple set-up that cognitive approaches lead to better results than a genetic approach. The work by Littman and Ackley (1991) did not compare evolutionary computation to a learning algorithm but rather showed that adding a learning component can improve the response time of the evolutionary algorithm considerably in a dynamic, artificial life–related scenario.
Modeling the dynamic environment: When tackling dynamic problems, an obvious idea is to use well-examined models of the environment, as they are used in many dynamic systems solutions. Since there are manifold applications within that category, only a partial overview is given here.
For control tasks there exists broad experience with fuzzy controllers. Also, there has been a symbiotic development of fuzzy and evolutionary computing over the past years (for an overview see Bonissone, 1997; Karr, 1997; Bersini, 1998). Evolutionary algorithms for modifying fuzzy controllers within a dynamic environment are proposed by Karr (1991, 1999).
Another approach to model the environment is the usage of stochastic learning automata (see Narendra & Thathachar, 1989). In the work of Munetomo et al. (1996) such a model is used: the genetic algorithm generates new actions in the automaton, actions are selected based on their fitness and applied to the environment, and the fitness is recomputed using a linear learning scheme based on the feedback of the environment. A related approach was presented by Papadimitriou and Pomportsis (2000), where a real-valued mutation, as in evolution strategies, was used.
Only partially related is the work where evolutionary algorithms are used to support methods in control theory. For example, Fadali, Zhang, and Louis (1999) applied genetic algorithms to robust stability analysis in dynamic discrete-time systems.
However, all those approaches that use a model of the environment are restricted in their application by the limits of the model. In particular, a deterministic description as a linear or non-linear system is often not possible.
Technique oriented research: In the case of online optimization, standard algorithms reveal many deficiencies restricting their applicability. Generally speaking, the algorithms lose their adaptability: they are no longer able to track an optimum or to react to changes like the advent of a new global optimum. One main reason for this effect is the loss of diversity in the population. Convergence is an intrinsic characteristic of evolutionary algorithms under sufficiently high selective pressure. As a consequence, most of the research in the area of dynamic optimization focuses on the development of new mechanisms and techniques to avoid convergence, to keep a certain level of adaptability, and to create more powerful tailored evolutionary algorithms. The works of Goldberg and Smith (1987) and Cobb (1990) are two early examples of this kind of research. An overview of the techniques is contained in Chapter 6. For a more complete list refer to the doctoral thesis of Branke (2002).
Application oriented research: Another research area with increasing importance is the application of the techniques to real-world problems. Successful applications include flowshop scheduling (Cartwright & Tuson, 1994), rescheduling (Bierwirth & Mattfeld, 1999), air crew scheduling (Rana-Stevens et al., 2000), speech recognition (Spalanzani & Kabré, 1998), and the scheduling of aircraft landings (Beasley, Krishnamoorthy, Sharaiha, & Abramson, 1995). However, as Mattfeld and Bierwirth (1999) argue in their workshop contribution, many of those real-world applications differ from the usually considered benchmark functions since they replace the exogenous dynamics, an external goal modification, by internal model changes where the representation of candidate solutions changes, e.g. due to newly arriving jobs in case of a scheduling problem. As a consequence, technique-oriented research is only partially relevant to real-world applications.
Systematic examinations: As we have seen above, the theoretical results are very sparse and usually consider only very simple algorithms and problems. Therefore, the question arises how many systematic examinations of parameters may be found in the literature.
Vavak and Fogarty (1996) and Smith and Vavak (1999) have examined the choice
between generational and steady state algorithms and, in the latter article, the influence of the replacement strategy in the steady state algorithm.
A systematic experimental investigation of good parameters for the dynamic matching function (a dynamic version of the onemax problem) was done by Stanhope and Daida (1998). Also, the doctoral thesis of Branke (2002) contains an extensive empirical study of parameter choices for two different problems.
Only partially relevant for general dynamic optimization are the examinations of
Hartley (1999), who considered the fitness computation in dynamic classification
tasks, and of Coker and Winter (1997), who investigated the optimal number of
parents in an artificial life simulation.
In the face of the necessary empirical analyses concerning the usefulness of algorithms and good parameter setups, the need for dynamic test function generators was recognized. Recently, three different generators have been proposed (Grefenstette, 1999; Branke, 1999c; Morrison & De Jong, 1999).
Classification: In consideration of the variety of dynamic problems and algorithms, the definition of problem clusters and technique clusters is necessary. Besides very coarse approaches (e.g. De Jong, 2000), the first more profound classifications were proposed by Trojanowski and Michalewicz (1999a) and Branke (2002).
CHAPTER 3
Contribution and Methodology
This chapter summarizes the open problems in the field of dynamic optimization
(Section 3.1) and clarifies which topics this thesis focuses on (Section 3.2). Furthermore, the methodology used to examine these topics is described in Section 3.3.
3.1 Limitations of Previous Work
The discussion in Section 2.3 gives an overview of the current state of dynamic optimization. The shortcomings and open problems in the field are obvious. The following list gives an incomplete overview of those topics.
1. Besides very few theoretical examinations of simple algorithms and problems, there is no theoretical or fundamental framework for either theoretical or empirical work. Such a framework is necessary to enable systematic research and the integration of results. Most existing research is technique oriented, leading to isolated empirical findings that are often not comparable due to very distinct non-stationary problems.
2. The restriction of theoretical investigations to simple problems, which is true for almost any analysis of evolutionary computing, is especially problematic in dynamic optimization since the characteristics of the dynamics may have an even stronger impact on the difficulty of the problem. As a consequence, theory for more complex problems is needed.
3. Also, most theoretical approaches are limited to the analysis of simplified genetic algorithms. The special techniques developed to tackle dynamic problems are again not considered in the current theory.
4. There is no fine-grained classification of dynamic problems available. Most
existing classifications are very coarse and not based on exact mathematical
criteria.
5. Besides a small number of responsive arguments, there is no fundamental investigation available concerning the choice of performance measures. Both the discussion of what is actually expected from an optimization in a dynamic environment and the discussion of how the attainment of these goals is to be measured are omitted in most publications.
6. The missing fundamental framework and the missing classification of problems prevent a mapping of existing techniques to problem classes. Such a mapping could serve as a decisive factor when designing an evolutionary algorithm for a specific dynamic problem.
7. Even for a restricted dynamic problem class there are no design rules available.
Apparently this list can easily be continued, especially if more specific topics concerning the techniques and problems are considered. However, action seems most imperative for the rather general open problems listed above.
3.2 Focus and Contribution of this Thesis
This thesis addresses a few of the topics listed above. Chapter 4 is concerned with
a fundamental classification of dynamic problems (Topic 4). This classification is
also intended to serve as a first step toward a formal foundation of dynamic evolutionary optimization (Topic 1). It is the first approach to base such a classification
on mathematically defined, exact properties. How it can serve for an integration
of empirical investigations is demonstrated in Chapters 5 and 6. In Chapter 5 the
goals of an optimization in a dynamic environment are clarified and respective performance measures are examined for four problem classes within the framework (Topic 5). Chapter 6 reviews the primary literature on techniques used in
dynamic optimization and derives a first tentative mapping between dynamic optimization techniques and problem characteristics (Topic 6). Thus, the ensemble
of Chapters 4–6 is the first attempt to interrelate problem properties, optimization
techniques, and performance measures on a broad formal basis for dynamic optimization.
Whereas the first half of the thesis is concerned with the complete range of dynamic problems, the second half focuses on drifting problems. In Chapter 7 a rigorous analysis of local variation operators in drifting landscapes is carried out. It results in several design rules to guide practitioners when applying a local variation evolutionary algorithm to a drifting problem (Topic 7). These design rules are used in four small case studies in Chapter 8. Besides the practical guidelines, Chapter 7 provides a detailed insight into the processing of local mutation operators in drifting landscapes. The understanding and control of those problems are thereby improved lastingly.
Altogether this thesis extends both the formal basis for integrative research in the area of non-stationary optimization and the understanding and application of local variation to dynamic problems (especially drifting landscapes).
3.3 Methodology
In underpinning and deriving the results of this thesis, a very strict methodology has been used for both empirical and theoretical examinations.
In empirical studies, all conclusions for a setup consisting of algorithm, parameters, and problem are based on at least 50 experiments using independent initial random seeds for the random number generator. Usually averaged results are presented in the thesis. As soon as a comparison of different techniques or parameter settings is aimed at, a statistical test concerning the confidence in the superiority of either choice is used. Throughout the whole thesis, Student's t-test is applied for this purpose. For each generation of the experiments, the hypothesis test is applied to the data of that generation. As a consequence, the t-test creates a curve of statistical confidence over the generations. In Chapter 5, Spearman's rank order correlation and the mean square error are used additionally to evaluate time series of performance values. Details are described in Section 5.3.3.
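To make the per-generation testing scheme concrete, the following minimal C++ sketch (illustrative only; names and structure are not taken from the sea library) computes the classical two-sample t statistic for the runs of two setups at each generation, yielding the confidence curve described above.

#include <cmath>
#include <cstddef>
#include <vector>

// Two-sample Student's t statistic with pooled variance, computed on the
// best-fitness values of two setups (e.g. 50 runs each) for one generation.
double t_statistic(const std::vector<double>& a, const std::vector<double>& b) {
    auto mean = [](const std::vector<double>& x) {
        double s = 0.0;
        for (double v : x) s += v;
        return s / x.size();
    };
    const double ma = mean(a), mb = mean(b);
    double va = 0.0, vb = 0.0;
    for (double v : a) va += (v - ma) * (v - ma);
    for (double v : b) vb += (v - mb) * (v - mb);
    // pooled variance; a.size() + b.size() - 2 degrees of freedom
    const double sp2 = (va + vb) / (a.size() + b.size() - 2);
    return (ma - mb) / std::sqrt(sp2 * (1.0 / a.size() + 1.0 / b.size()));
}

// Applying the test separately to every generation's data produces a curve
// of statistical confidence over the run.
std::vector<double> t_curve(const std::vector<std::vector<double>>& setupA,
                            const std::vector<std::vector<double>>& setupB) {
    std::vector<double> curve;
    for (std::size_t g = 0; g < setupA.size(); ++g)
        curve.push_back(t_statistic(setupA[g], setupB[g]));
    return curve;
}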
All algorithms and problems used in the thesis are implemented in C++. The resulting library is called sea (Stuttgart Evolutionary Algorithms) and provides an easily usable and expandable command-line program. Uniform random numbers are created using the random number generator of Park and Miller (1988) with Bays-Durham shuffle and added safeguards. Random numbers from a Gaussian probability density function are generated with the Box-Muller transformation (Box & Muller, 1958). The random bits needed by genetic algorithms are produced by a separate, very fast random number generator (Knuth, 1981, p. 28). All random number generators and statistical methods follow the implementations of Press, Teukolsky, Vetterling, and Flannery (1992).
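For illustration, the two continuous generators named above can be sketched as follows (textbook algorithms only, not the sea implementation; the Bays-Durham shuffle and the added safeguards are omitted):

#include <cmath>

// Minimal-standard generator of Park and Miller (1988):
// seed <- 16807 * seed mod (2^31 - 1), using Schrage's factorization to
// avoid overflow; the shuffle and safeguards of the thesis code are omitted.
double uniform01(long& seed) {
    const long a = 16807, m = 2147483647, q = 127773, r = 2836;  // q = m/a, r = m%a
    const long hi = seed / q, lo = seed % q;
    seed = a * lo - r * hi;
    if (seed <= 0) seed += m;
    return static_cast<double>(seed) / m;  // uniform in (0, 1)
}

// Box-Muller transformation (Box & Muller, 1958): two independent uniform
// deviates are mapped to one standard Gaussian deviate.
double gaussian(long& seed) {
    const double pi = 3.14159265358979323846;
    const double u1 = uniform01(seed), u2 = uniform01(seed);
    return std::sqrt(-2.0 * std::log(u1)) * std::cos(2.0 * pi * u2);
}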
The theoretical considerations in Chapter 7 use Markov chains to model the search dynamics both exactly and in a worst-case scenario. The exact model is used for
computations of the probabilities to be at certain points in the search space. These
computations are carried out using the GNU multiple precision arithmetic library
(Granlund, 1996) to rule out any numerical effects in the computations.
CHAPTER 4
A Classification of Dynamic Problems
This chapter introduces a new mathematical framework for classifying non-stationary
problems. After a short motivation in Section 4.1, the existing classifications are
reviewed in Section 4.2. Then in Section 4.3 a formal framework for defining or
describing dynamic problems is given. Section 4.4 extracts certain problem properties from the formal framework and uses them to define a very general classification scheme for dynamic problems. A short discussion in Section 4.5 concludes
the chapter.
4.1 Motivation
Research in dynamic optimization is characterized by many different non-stationary
problems and applications. As a consequence, a huge fraction of today’s research
is driven by certain applications or specific exemplary problems. But still there is
no common basis for classifying or comparing different problems. However, such a classification is essential if we want to build a general foundation for the design of non-stationary optimizers on the available fragmentary results. In particular, the classification is useful within two stages of the design process.
First, knowledge of the characteristics of a completely new problem might help when designing a respective algorithm. If we are able to identify the problem class the new problem belongs to, knowledge concerning the techniques to tackle these problems may be transferred to the new problem. Since introducing dynamics to problems leads to a wide range of manifold and complex behavior, it is desirable to have at least certain clues as to which technique is suited to tackle a given problem. The necessity of this mapping between problem characteristics and algorithmic techniques concerning "good" performance is also a direct consequence of the "No Free Lunch" theorems by Wolpert and Macready (1995, 1997).
Second, the assessment of the performance of an algorithm depends highly on the properties of the problem. The desired adaptive process of an evolutionary algorithm in a dynamic environment can be specified more precisely by clarifying the requirements. This desired behavior may be influenced by the problem characteristics. An even bigger influence of the problem properties concerns the selection of the performance measure to assess the stated goal. Here, knowledge on the assessment of certain problem classes may help to measure the performance of a newly designed algorithm for a new instance of this class and to evaluate and improve the algorithm.
Figure 4.1 Schematic overview of the different aspects involved in this chapter. [Diagram: characteristic properties are identified from the dynamic problem; they influence the design of the evolutionary algorithm, the goal/desired behavior, and the selection of the performance measure that assesses the algorithm applied to the problem.]
Figure 4.1 shows the different stages where knowledge on problem characteristics
within a classification may be helpful when tackling a new non-stationary problem: design of evolutionary algorithms, desired behavior, and the respective performance measures. The selection of performance measures is discussed in Chapter 5
together with a minor discussion of the desired behavior. Simple design issues are
the topic of Chapter 6.
Since many applications and benchmark problems are rather incomparable, such a
classification is difficult if not infeasible on a general level. Therefore, this chapter
restricts the task to a mathematical framework of dynamic function optimization
problems that covers most of the possible problem characteristics.
4.2 Existing Classifications
We can distinguish two different approaches to classify dynamic problems: direct
description of classes and classification by parametrization.
With respect to the former approach, the direct description of classes, there are only
very few problem classifications in the literature and most of these classifications
are rather coarse-grained. In spite of different terminology, the following classes
are broadly used.
• In alternating (or cyclic) problems, only a few different landscapes occur
and alternate with each other—usually with certain static periods between
changes (e.g. Collard et al., 1997; Liles & De Jong, 1999; De Jong, 2000).
• In problems with changing morphology, the fitness landscape changes according to certain topological rules. In most cases those landscapes are characterized by rather severe changes (e.g. Collard et al., 1997; Liles & De Jong,
1999; De Jong, 2000).
• In drifting landscapes, there is at least one static topology that drifts through
the landscape; often several, superimposed topologies are used (e.g. Liles &
De Jong, 1999; De Jong, 2000).
• In abrupt and discontinuous problems, only very few and unpredictable changes
occur (e.g. De Jong, 2000).
Another class, dynamical encoding problems, is proposed by Collard et al. (1997) as "problems where the interpretation of alleles changes", which is a rather generally applicable description although this term is used only in the context of one specific function. Another classification is defined by Trojanowski and Michalewicz (1999a); they inspect problems with respect to their randomness and predictability. For problems with non-random and predictable changes, a cyclic behavior is considered in addition. Furthermore, they are interested in dynamic changes of problem constraints. However, as already mentioned above, all those classifications are coarse. The different classes are not properly delimited and there is a vast number of classes in between. In order to make meaningful statements for certain problem properties, a more detailed classification is necessary.
Whereas exact properties are missing in the existing classifications so far, they can be found in the second approach, classification by parametrization, as parameters in problem generators and in the description of problems. Therefore, we can consider the following parameters as properties of rather fine-grained classifications.
• The frequency of change indicates how often the changes occur or how many
generations remain static between the changes (Branke, 1999b). The same
property is also called transition period (Gaspar & Collard, 1999a), period
duration (Collard et al., 1997), or punctuation (Grefenstette, 1999).
• The severity of change denotes how far e.g. the optimum is moving (Branke,
1999b). This property is also called transition range (Gaspar & Collard,
1999a) or maximal drift (Grefenstette, 1999). An alternative version of the
severity is the average shape alteration which considers all points in the
search space (Collard et al., 1997).
• The rule of meta dynamics denotes an underlying regularity concerning the changes (e.g. Morrison & De Jong, 1999, use the logistic function to specify the meta rule). In a more vague way this property is also called the predictability of change (Branke, 1999b).
• The cycle length considers a periodic behavior and denotes the length of one
period (Branke, 1999b).
These properties can be used for a very exact description of various problem classes,
e.g. in the case of one moving hill. But there are many scenarios and problems for
which this classification is not general enough.
The classification presented in the following sections uses a combination of both
approaches. Within a mathematical framework defined in Section 4.3, a set of basic
characteristics similar to the fine-grained parameters is defined in an exact manner
(cf. Section 4.4). Those basic properties are used to build up problem classes
similar to the coarse-grained classes. The advantage of this method is that for any
problem covered by the framework an exact mapping to a problem class is possible.
4.3 Dynamic Problem Framework
This section introduces a new approach to classification: a general mathematical framework to describe and characterize dynamic fitness functions. In order to establish a well-defined basis for comparison and classification, the non-stationary fitness functions are defined in a formal way.
Similar to the moving hills problems, each dynamic function consists of several
static functions where for each static function a rule of dynamics is given describing
how the contribution of this component to the complete problem changes over time.
Note that the term "time" is always associated with an equidistant discretization throughout the whole thesis, which is in particular due to the discrete nature of evolutionary algorithms. The dynamics rule is defined by a sequence of coordinate transformations, stretching factors for the coordinates, and fitness rescalings.
Definition 4.1 (Dynamic fitness function for maximization) Let $\Omega$ be the search space with distance metric $d : \Omega \times \Omega \to \mathbb{R}$. A dynamic fitness function $F \equiv \big(F^{(t)}\big)_{t\in\mathbb{N}}$ with $F^{(t)} : \Omega \to \mathbb{R}$ for $t \in \mathbb{N}$ is defined by $n \in \mathbb{N}$ components consisting of a static fitness function $f_i : \Omega \to \mathbb{R}$ ($1 \le i \le n$) with optimum at $0 \in \Omega$, $f_i(0) = 1$, and a dynamics rule with

• coordinate transformations $\big(c_i^{(t)}\big)_{t\in\mathbb{N}}$ with $c_i^{(t)} : \Omega \to \Omega$ preserving any distance in the search space, i.e.
$$d\big(c_i^{(t)}(\omega_1),\, c_i^{(t)}(\omega_2)\big) = d(\omega_1, \omega_2) \quad \text{for all } \omega_1, \omega_2 \in \Omega,$$
• stretching factors $\big(s_i^{(t)}\big)_{t\in\mathbb{N}}$ with $s_i^{(t)} \in \mathbb{R}^+$, and
• fitness rescalings $\big(r_i^{(t)}\big)_{t\in\mathbb{N}}$ with $r_i^{(t)} \in \mathbb{R}^+$.

The accumulated dynamics are defined as

• $C_i^{(t_1,t_2)} = c_i^{(t_2)} \circ \ldots \circ c_i^{(t_1+1)}$ for $t_2 > t_1 \ge 0$, $C_i^{(0,0)} = \mathrm{id}$, $C_i^{(t)} = C_i^{(0,t)}$ for $t \ge 0$,
• $S_i^{(t_1,t_2)} = \prod_{t=t_1+1}^{t_2} s_i^{(t)}$ for $t_2 > t_1 \ge 0$, $S_i^{(0,0)} = 1$, $S_i^{(t)} = S_i^{(0,t)}$, and
• $R_i^{(t_1,t_2)} = \prod_{t=t_1+1}^{t_2} r_i^{(t)}$ for $t_2 > t_1 \ge 0$, $R_i^{(0,0)} = 1$, $R_i^{(t)} = R_i^{(0,t)}$.

For $\omega \in \Omega$, the resulting dynamic fitness function is defined as
$$F^{(t)}(\omega) = \max\Big\{ R_1^{(t)} f_1\big(C_1^{(t)}(S_1^{(t)}\, \overrightarrow{0\omega})\big),\ \ldots,\ R_n^{(t)} f_n\big(C_n^{(t)}(S_n^{(t)}\, \overrightarrow{0\omega})\big) \Big\}.$$

Moreover, it is required that the following condition holds:
$$\exists_{1\le j\le n}\ \forall_{1\le k\le n,\ j\ne k}\quad R_j^{(t)} > R_k^{(t)}. \qquad (4.1)$$
In generation $t = 0$, (4.1) holds for $j = 1$. ♦
Note that the coordinate transformations are linear translations, rotations, and combinations of translation and rotation. All those transformations preserve any distance in the search space.
Also note that condition (4.1) ensures that at each time step there is only one global optimum, i.e. only one of the component fitness functions attains the optimal fitness value. This condition is of a technical nature to simplify the definition of the properties in the succeeding section.
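To illustrate how Definition 4.1 can be operationalized, the following sketch evaluates a dynamic fitness function over $\Omega = \mathbb{R}^2$. It is a simplified, hypothetical illustration (the names are not part of the sea library): the per-step rules $c^{(t)}$, $s^{(t)}$, and $r^{(t)}$ are accumulated by composition and multiplication, following the indexing conventions used in the examples below.

#include <algorithm>
#include <cmath>
#include <functional>
#include <vector>

// One component F_i = (f_i, c_i, s_i, r_i) of a dynamic fitness function
// (Definition 4.1), restricted to Omega = R^2 for simplicity.
struct Component {
    std::function<double(double, double)> f;       // static function with f(0,0) = 1
    std::function<void(double&, double&, int)> c;  // coordinate transformation c^(t)
    std::function<double(int)> s;                  // stretching factor s^(t)
    std::function<double(int)> r;                  // fitness rescaling r^(t)
};

// F^(t)(omega) = max_i R_i^(t) * f_i( C_i^(t)( S_i^(t) * omega ) ).
double dynamic_fitness(const std::vector<Component>& comps,
                       int t, double x, double y) {
    double best = -1e300;
    for (const Component& comp : comps) {
        double S = 1.0, R = 1.0;
        for (int tau = 0; tau <= t; ++tau) { S *= comp.s(tau); R *= comp.r(tau); }
        double cx = S * x, cy = S * y;             // stretching relative to the optimum 0
        for (int tau = 0; tau <= t; ++tau)         // accumulated transformation C^(t)
            comp.c(cx, cy, tau);
        best = std::max(best, R * comp.f(cx, cy));
    }
    return best;
}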
Before a few examples are presented illustrating how problems from recent publications may be studied within the given framework, the difference between the framework and a problem generator is elucidated in this paragraph. On the one hand, a problem generator produces many different test problems according to certain given parameters. However, the variance concerning various problem properties may still be very high. As a consequence, the problem classes defined by the parameters are often very broad with rather fuzzy and imprecise boundaries. On the other hand, the given framework defines exactly how the search space changes at any point for all time steps. There is no random variation possible within the framework. As a consequence, the properties can be defined formally in the next section.
An illustration of constructing one component of the dynamic fitness function is contained in Figure 4.2.
Example 4.1 (Moving Peaks Problem) One example is the moving peaks problem (e.g. Branke, 1999c; Morrison & De Jong, 1999), which can be realized by one component function for each peak. Then, the movement of the peaks is defined by the coordinate transformations, the peak heights may be changed by fitness rescaling, and the width or the slope of a cone may be changed by the stretching factor. The coordinate transformations, the rescaling, and the stretching factor usually apply only very small changes to each hill. In case of the problem generator proposed by Morrison and De Jong (1999) it is possible to choose the randomness of the different transformations using a logistic function, where small values produce exactly one deterministic value but higher values let the function bifurcate, resulting in various values with equal probability. That means, by using small values, a linear constant change can be achieved, whereas higher values lead to a more random behavior of the hills. However, each random instance must be described separately within the framework.
Figure 4.2 Transformation of one cone at one time step in an exemplary moving hills problem. [Panels: stretching ($S\vec{v}$), coordinate transformation ($C$), fitness rescaling ($R\vec{v}$), and the integration of several component functions.]
An exemplary landscape with two hills is defined for $\Omega = \mathbb{R}^2$ using the stationary functions
$$f_1(x, y) = f_2(x, y) = \begin{cases} 1 - \sqrt{x^2 + y^2}, & \text{if } \sqrt{x^2 + y^2} < 1 \\ 0, & \text{otherwise,} \end{cases}$$
coordinate transformations that provide a linear movement into one direction
$$c_1^{(0)}: \binom{x}{y} \mapsto \binom{x - 5}{y - 1} \qquad c_2^{(0)}: \binom{x}{y} \mapsto \binom{x + 1}{y + 1}$$
$$c_1^{(t)}: \binom{x}{y} \mapsto \binom{x + 0.2}{y + 0.1} \ \text{ for } 1 \le t \qquad c_2^{(t)}: \binom{x}{y} \mapsto \binom{x - 0.1}{y - 0.1} \ \text{ for } 1 \le t,$$
no stretching factors, i.e. the slope of the hills does not change,
$$s_1^{(t)} = s_2^{(t)} = 1 \quad \text{for } 0 \le t,$$
and opposite fitness rescalings for both hills
$$r_1^{(0)} = 1, \quad r_1^{(t)} = 0.9 \ \text{ for } 1 \le t, \qquad r_2^{(0)} = 0.1, \quad r_2^{(t)} = 1.2 \ \text{ for } 1 \le t.$$

Figure 4.3 The left column shows the movement of one static hill in the fitness landscape. The right column shows two hills moving and changing their heights additionally.
Figure 4.3 shows schematically how such a landscape is changing. Due to the
fitness rescalings the global optimum is jumping from one hill to the other hill. ♦
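Under the same assumptions as the sketch after Definition 4.1 (hypothetical names, $\Omega = \mathbb{R}^2$), the two-hill landscape of this example could be instantiated as follows:

// Two-hill instance of Example 4.1, reusing the Component sketch from above.
Component hill1{
    [](double x, double y) {                       // cone of height 1 around the origin
        const double d = std::sqrt(x * x + y * y);
        return d < 1.0 ? 1.0 - d : 0.0;
    },
    [](double& x, double& y, int t) {              // c_1: initial offset, then linear drift
        if (t == 0) { x -= 5; y -= 1; } else { x += 0.2; y += 0.1; }
    },
    [](int) { return 1.0; },                       // s_1 = 1: slope unchanged
    [](int t) { return t == 0 ? 1.0 : 0.9; }       // r_1: shrinking hill
};

Component hill2{
    hill1.f,                                       // same static cone
    [](double& x, double& y, int t) {
        if (t == 0) { x += 1; y += 1; } else { x -= 0.1; y -= 0.1; }
    },
    [](int) { return 1.0; },                       // s_2 = 1
    [](int t) { return t == 0 ? 0.1 : 1.2; }       // r_2: swelling hill
};

// F^(9)(0,0): around time step 9 the global optimum jumps to the second hill.
const double f_at_origin = dynamic_fitness({hill1, hill2}, 9, 0.0, 0.0);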
Example 4.2 (Hills with changing height) This fitness function, introduced by Trojanowski and Michalewicz (1999b), divides the search space into different segments which each hold a peak, where the height of the peaks changes according to a schedule.

Figure 4.4 Hills with changing height: the maximum height appears at the marked hills in the order indicated. [Diagram: a 4 × 4 grid of cells; the eight cycling hills are marked 1–8 in the order in which they carry the maximum.]
In particular, a two-dimensional search space [0, 1) × [0, 1) may be divided into 16
cells of equal size. Each cell 0 ≤ k ≤ 15 is defined as Il × Im with k = 4l + m,
0 ≤ l, m ≤ 3 and Ij = [lbj , ubj ) where
$$lb_0 = 0,\ ub_0 = 0.25; \quad lb_1 = 0.25,\ ub_1 = 0.5; \quad lb_2 = 0.5,\ ub_2 = 0.75; \quad lb_3 = 0.75,\ ub_3 = 1.$$
Then, for each cell $k$ the fitness function is defined as
$$f_k^{(t)}(x_1, x_2) = \begin{cases} \mathit{Maxheight}_k^{(t)}\, (ub_l - x_1)(x_1 - lb_l)(ub_m - x_2)(x_2 - lb_m), & \text{if } lb_l \le x_1 < ub_l \text{ and } lb_m \le x_2 < ub_m \\ 0, & \text{otherwise.} \end{cases}$$
The hills in the component functions with index $k \in \{0, 3, 6, 9, 10, 12, 13, 15\}$ have a constant height $\mathit{Maxheight}_k^{(t)} = 0.5$. The other hills form a cycle
$$cyc(0) = 1,\ cyc(1) = 2,\ cyc(2) = 7,\ cyc(3) = 11,\ cyc(4) = 14,\ cyc(5) = 13,\ cyc(6) = 8,\ cyc(7) = 4.$$
All those hills get the maximal height $\mathit{Maxheight}_k^{(t)} = 1.0$ assigned. Using the fitness rescalings, one of those hills becomes the maximum hill and this role cycles through all eight hills. There are no coordinate or stretching dynamics necessary to model this problem:
$$c_i^{(t)} = \mathrm{id} \text{ (identity)}, \qquad s_i^{(t)} = 1.$$
The following values define the rescaling of the 8 cycling hills:
$$\alpha_0 = 1, \qquad \alpha_i = 0.5 \ \text{ for } 1 \le i \le 7.$$
In the first generation, $\alpha_0$ will be associated with hill $cyc(0)$, $\alpha_1$ with $cyc(1)$, and so on. As in the work of Trojanowski and Michalewicz (1999b), the maximum hill jumps every 5 generations. This is modeled by the fitness rescalings as follows for $0 \le i \le 7$:
$$r_{cyc(i)}^{(t)} = \begin{cases} \alpha_i, & \text{if } t = 0 \\[4pt] \dfrac{\alpha_{(i + \frac{t}{5}) \bmod 8}}{\alpha_{(7 + i + \frac{t}{5}) \bmod 8}}, & \text{if } \exists_{a\in\mathbb{N}}\ t = 5a \\[4pt] 1, & \text{otherwise.} \end{cases}$$
All remaining hills $k \in \{0, 3, 6, 9, 10, 12, 13, 15\}$ are associated with the fitness rescaling factor $r_k^{(t)} = 1$. ♦
Example 4.3 (Rotating fitness functions) Another possible dynamic fitness function is the rotation of a static fitness function around a center as introduced by
Weicker and Weicker (1999, 2000). This is easily reproducible in this framework
by defining only one component function with the static fitness function f1 and a
rotating coordinate transformation
$$c_1^{(t)}: \binom{x}{y} \mapsto \begin{pmatrix} \cos\!\big(\tfrac{2\pi}{\tau}\big)\, x - \sin\!\big(\tfrac{2\pi}{\tau}\big)\, y \\[4pt] \sin\!\big(\tfrac{2\pi}{\tau}\big)\, x + \cos\!\big(\tfrac{2\pi}{\tau}\big)\, y \end{pmatrix} \quad \text{for } 0 \le t,$$
where $\tau$ is the number of time steps necessary for one full rotation. The rotation center is $\vec{0} \in \mathbb{R}^2 = \Omega$.
Stretching factors and fitness rescalings are not necessary and therefore
$$r_1^{(t)} = 1, \qquad s_1^{(t)} = 1.$$
Figure 4.5 shows how the fitness function changes. ♦
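The rotating transformation is straightforward to implement; a minimal sketch (hypothetical names) that rotates one point by $2\pi/\tau$ per time step around the origin:

#include <cmath>

// One step of the rotating coordinate transformation c_1^(t) of Example 4.3.
void rotate_step(double& x, double& y, double tau) {
    const double a = 2.0 * 3.14159265358979323846 / tau;
    const double nx = std::cos(a) * x - std::sin(a) * y;
    const double ny = std::sin(a) * x + std::cos(a) * y;
    x = nx;
    y = ny;
}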
Figure 4.5 Schematic description of the rotating fitness function.
4.4 Problem Properties
The mathematical formulation of dynamic fitness functions in Definition 4.1 enables the definition of several problem properties influencing the hardness of a dynamic problem. The following definitions formalize a few basic problem properties inherent in the dynamics of the problem. All properties are defined with respect to a set of time steps $T \subseteq \mathbb{N}$ ($T \ne \emptyset$). For each definition the properties are illustrated using the examples above. For simplicity, we assume from now on that $\Omega = \mathbb{R}^n$.
4.4.1 Coordinate Transformations
The first four definitions concern the coordinate transformations. Since Example 4.2 makes no use of coordinate transformations, those properties are fulfilled trivially for this example.
Definition 4.2 (Predictability of coordinate transformations) Let $F$ be a dynamic fitness function consisting of $n$ components $F_i = \big(f_i, (c_i^{(t)})_{t\in\mathbb{N}}, (s_i^{(t)})_{t\in\mathbb{N}}, (r_i^{(t)})_{t\in\mathbb{N}}\big)$ ($1 \le i \le n$). Then, the coordinate transformations of $F_j$ ($1 \le j \le n$) are predictable with respect to time steps $T \subseteq \mathbb{N}$ ($T \ne \emptyset$) and error $\varepsilon \in \mathbb{R}_0^+$ iff
$$predictC_{T,\varepsilon}(F_j) \equiv \forall_{t\in T}\ \max_{\omega\in\Omega}\ d\big(c_j^{(t)}(\omega),\, c_j^{(t+1)}(\omega)\big) \le \varepsilon. \qquad (4.2)$$ ♦
Lemma 4.1 Suppose $\Omega = \mathbb{R}^n$, a dynamic problem consisting of just one component function $F_1$ with predictable dynamics for $T = \{t\}$ and $\varepsilon = 0$, and the rule of dynamics being a rotation or a linear translation (no combination of both). Then it follows from Equation 4.2 that the movement of any point can be predicted at time $t + 1$ with probability 1 using the preceding movement of $n$ uniformly chosen random points $\omega_i$ ($1 \le i \le n$) at time $t$. ♦
Proof: It follows immediately from Equation 4.2 that the coordinate transformation is the same at times $t$ and $t + 1$ (since $\varepsilon = 0$). Then, the two cases of coordinate transformations must be distinguished. First, the case of the linear translation can be recognized by equal translations for all points $\omega_1, \ldots, \omega_n \in \Omega$:
$$c_1^{(t)}(\omega_1) - \omega_1 = \ldots = c_1^{(t)}(\omega_n) - \omega_n. \qquad (4.3)$$
Then the prediction for $\omega$ is
$$c_1^{(t+1)}(\omega) = \omega + c_1^{(t)}(\omega_1) - \omega_1.$$
Second, if Equation 4.3 does not hold, the coordinate transformation must be a rotation. From one point $\omega_k$ in the search space and its translation $c_1^{(t)}(\omega_k) - \omega_k$ it can be deduced that the center of the rotation is in the hyperplane $plane_k$ determined by the following conditions:
1. $c_1^{(t)}(\omega_k) - \omega_k$ is a normal vector on $plane_k$ and
2. $\omega_k + \frac{1}{2}\big(c_1^{(t)}(\omega_k) - \omega_k\big) \in plane_k$.
Note that for any pair $1 \le k, k' \le n$ the probability is 0 that there exists an $s \in \mathbb{R}$ with $\overrightarrow{center\, \omega_k} = s\, \overrightarrow{center\, \omega_{k'}}$.
Therefore, the center of the rotation can be computed with probability 1 as
$$center = \bigcap_{1 \le k \le n} plane_k.$$
The degree of one rotation results as
$$\alpha = \frac{180}{\pi}\, \frac{\big|c_1^{(t)}(\omega_1) - \omega_1\big|}{|\omega_1 - center|}.$$
q.e.d.
Note that if the rule of dynamics is a combination of translation and rotation, there is no simple derivation mechanism to detect and compute the rule.
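For the two-dimensional case, the construction used in the proof can be made concrete: two observed point movements suffice to distinguish a translation from a rotation and, in the latter case, to intersect the two perpendicular bisectors. The following sketch (hypothetical names, not part of the thesis code) illustrates this.

#include <cmath>

struct Vec2 { double x, y; };

// Translation test of Equation 4.3: all observed displacements are equal.
bool is_translation(Vec2 p1, Vec2 q1, Vec2 p2, Vec2 q2, double eps = 1e-12) {
    return std::fabs((q1.x - p1.x) - (q2.x - p2.x)) < eps &&
           std::fabs((q1.y - p1.y) - (q2.y - p2.y)) < eps;
}

// Rotation center as intersection of the perpendicular bisectors of the
// segments (p_k, q_k), where q_k = c(p_k); cf. the two plane conditions above.
Vec2 rotation_center(Vec2 p1, Vec2 q1, Vec2 p2, Vec2 q2) {
    const Vec2 m1{(p1.x + q1.x) / 2, (p1.y + q1.y) / 2};  // midpoints
    const Vec2 m2{(p2.x + q2.x) / 2, (p2.y + q2.y) / 2};
    const Vec2 n1{q1.x - p1.x, q1.y - p1.y};              // bisector normals
    const Vec2 n2{q2.x - p2.x, q2.y - p2.y};
    // solve n1 . c = n1 . m1 and n2 . c = n2 . m2 by Cramer's rule
    const double b1 = n1.x * m1.x + n1.y * m1.y;
    const double b2 = n2.x * m2.x + n2.y * m2.y;
    const double det = n1.x * n2.y - n1.y * n2.x;         // nonzero with probability 1
    return Vec2{(b1 * n2.y - b2 * n1.y) / det, (n1.x * b2 - n2.x * b1) / det};
}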
In the case of Example 4.1 with linear movement, each component is predictable
and, as a consequence, this holds for the complete function too. Example 4.3 is
predictable too.
Definition 4.3 (Severity of coordinate transformations) Let $F$ be a dynamic fitness function consisting of $n$ components $F_i = \big(f_i, (c_i^{(t)})_{t\in\mathbb{N}}, (s_i^{(t)})_{t\in\mathbb{N}}, (r_i^{(t)})_{t\in\mathbb{N}}\big)$ ($1 \le i \le n$). Then, the coordinate transformations of $F_j$ ($1 \le j \le n$) have the severity at $t \in \mathbb{N}$ and $\omega \in \Omega$
$$severity_{\omega,t}(F_j) = d\big(c_j^{(t)}(\omega), \omega\big).$$
The maximal severity with respect to time steps $T \subseteq \mathbb{N}$ ($T \ne \emptyset$) is defined as
$$severityC_T(F_j) = \max_{t\in T,\ \omega\in\Omega} severity_{\omega,t}(F_j). \qquad (4.4)$$ ♦
Example 4.1 has a constant step size and, therefore, constant severity. Example 4.3
has no constant severity since the changes close to the optimum are considerably
smaller than at the border of the search space.
Definition 4.4 (Repetitive coordinate dynamics) Let $F$ be a dynamic fitness function consisting of $n$ components $F_i = \big(f_i, (c_i^{(t)})_{t\in\mathbb{N}}, (s_i^{(t)})_{t\in\mathbb{N}}, (r_i^{(t)})_{t\in\mathbb{N}}\big)$ ($1 \le i \le n$). Then, the coordinate transformations of $F_j$ ($1 \le j \le n$) are repetitive with respect to time steps $T \subseteq \mathbb{N}$ ($|T| \ge 2$) and error $\varepsilon \in \mathbb{R}_0^+$ iff
$$repetitiveC_{T,\varepsilon}(F_j) \equiv \forall_{t_1,t_2\in T,\ t_1<t_2}\ \forall_{\omega\in\Omega}\quad d\big(C_j^{(t_1,t_2)}(\omega), \omega\big) \le \varepsilon.$$ ♦
The set of time steps $T$ contains the moments at which the landscape has an almost identical shape.
Example 4.1 is not repetitive for any $T \subseteq \mathbb{N}$. This follows directly from the linear translation and a non-zero severity in each time step. However, Example 4.3 is repetitive, e.g. for
$$T = \{1,\ 1 + \tau,\ 1 + 2\tau,\ 1 + 3\tau,\ \ldots\}.$$
Definition 4.5 (Coordinate homogeneity) Let $F$ be a dynamic fitness function consisting of $n$ components $F_i = \big(f_i, (c_i^{(t)})_{t\in\mathbb{N}}, (s_i^{(t)})_{t\in\mathbb{N}}, (r_i^{(t)})_{t\in\mathbb{N}}\big)$ ($1 \le i \le n$). Then, the coordinate transformations of $F_j$ and $F_k$ ($1 \le j, k \le n$) are homogeneous with respect to time steps $T \subseteq \mathbb{N}$ ($T \ne \emptyset$) iff
$$homoC_T(F_j, F_k) \equiv \forall_{t\in T}\quad c_j^{(t)} = c_k^{(t)}.$$ ♦
Example 4.3 is homogeneous for trivial reasons. Example 4.1 is not homogeneous since different coordinate transformations are applied to each component function.
Given the various properties of coordinate transformations, dynamic problems can be classified using the properties concerning the movement of the optimum of one component function in the search space. Therefore, the homogeneity is not considered for this classification. Note that these classes can be adjusted for different purposes by choosing the time steps $T \ne \emptyset$ and the error rates $\varepsilon, \varepsilon'$ appropriately.
Static coordinates: Position and orientation of the component function(s) are unaffected over the course of time:
$$\forall_{1\le j\le n}\quad severityC_T(F_j) = 0 \qquad (4.5)$$
for $T = \mathbb{N}$. The other properties are fulfilled trivially if the coordinates are stationary.
Drifting landscapes: The component functions may change their position or orientation slightly between the time steps following a modification rule. We can characterize this class using a threshold value $\theta \in \mathbb{R}^+$ for the maximum small severity value and the error rate $\varepsilon \in \mathbb{R}$ for the predictability:
$$\forall_{1\le j\le n}\quad 0 < severityC_T(F_j) \le \theta, \qquad (4.6)$$
$$\forall_{1\le j\le n}\quad predictC_{T,\varepsilon}(F_j) \qquad (4.7)$$
For a mere drifting landscape $T = \mathbb{N}$ is required and $\varepsilon$ is supposed to be sufficiently small. In a bounded search space, the definition is usually weakened by omitting a set of isolated time steps $T' \subset \mathbb{N}$ for the predict predicate where the direction of the movement may change ($T = \mathbb{N} \setminus T'$ in Equation 4.7).
Rotating landscapes: A special case of drifting landscapes where the modification rule is a rotation around a center in the search space. This can be characterized by Equations 4.6, 4.7 and the following predicate
$$\forall_{1\le j\le n}\ \exists_{T'\subset T,\ T'\ne\emptyset}\quad repetitiveC_{T',\varepsilon'}(F_j) \qquad (4.8)$$
with a sufficiently small error rate $\varepsilon' \in \mathbb{R}$ and $T'$ being a subset of $T$ in Equations 4.6 and 4.7. In order to rule out trivial repetition it should hold that $\varepsilon' < \theta - \varepsilon$. Then, from predictability regarding $T = \{t_1, \ldots, t_k\}$ and repetition for $T' \subset T$, a mere rotation follows as the only possible coordinate transformation for the period $T$.
Randomly drifting landscapes: Unpredictable drifting component functions with small severity can be characterized by the following equations with regard to $\theta \in \mathbb{R}^+$ and a sufficiently small $\varepsilon \in \mathbb{R}$:
$$\forall_{1\le j\le n}\quad 0 < severityC_T(F_j) \le \theta, \qquad (4.9)$$
$$\forall_{1\le j\le n}\quad \neg predictC_{T,\varepsilon}(F_j) \qquad (4.10)$$
A subclass of problems returns frequently to previously found solutions, where in addition
$$\forall_{1\le j\le n}\ \exists_{T'\subset T,\ |T'|\ge 2}\quad repetitiveC_{T',\varepsilon'}(F_j) \qquad (4.11)$$
with a sufficiently small error rate $\varepsilon' \in \mathbb{R}$ holds ($\varepsilon' < \theta - \varepsilon$).
Fast drifting landscapes: Landscapes that drift into a certain direction with rather big steps can be described using the following equations:
$$\forall_{1\le j\le n}\quad severityC_T(F_j) > \theta, \qquad (4.12)$$
$$\forall_{1\le j\le n}\quad predictC_{T,\varepsilon}(F_j) \qquad (4.13)$$
$$\forall_{1\le j\le n}\ \forall_{T'\subset T,\ |T'|\ge 2}\quad \neg repetitiveC_{T',\varepsilon'}(F_j) \qquad (4.14)$$
with $\theta \in \mathbb{R}^+$ and sufficiently small $\varepsilon, \varepsilon' \in \mathbb{R}$ ($\varepsilon' < \theta - \varepsilon$).
Superimposing landscapes: A very fast rotation around a center in the search space equals the superposition of many landscapes. The global optima may describe a circular track in the search space. A pathological case is the positioning of the global optima in the center. This class is described by
$$\forall_{1\le j\le n}\quad severityC_T(F_j) > \theta, \qquad (4.15)$$
$$\forall_{1\le j\le n}\quad predictC_{T,\varepsilon}(F_j) \qquad (4.16)$$
$$\forall_{1\le j\le n}\ \exists_{T'\subset T,\ |T'|\ge 2}\quad repetitiveC_{T',\varepsilon'}(F_j) \qquad (4.17)$$
with $\theta \in \mathbb{R}^+$ and $\varepsilon, \varepsilon' \in \mathbb{R}$ ($\varepsilon' < \theta - \varepsilon$).
Chaotic coordinate changes: Unpredictable big changes may lead in most cases to rather chaotic behavior. This class is described by
$$\forall_{1\le j\le n}\quad severityC_T(F_j) > \theta, \qquad (4.18)$$
$$\forall_{1\le j\le n}\quad \neg predictC_{T,\varepsilon}(F_j) \qquad (4.19)$$
with $\theta \in \mathbb{R}^+$ and $\varepsilon \in \mathbb{R}$. A subclass on which an evolutionary algorithm is probably applicable is characterized by the additional repetition property for a set of isolated time steps $T'$ ($|T'| \ge 2$):
$$\forall_{1\le j\le n}\quad repetitiveC_{T',\varepsilon'}(F_j) \qquad (4.20)$$
Note that this characterization assumes that the properties hold for all component functions. There is an even bigger number of mixed classes where single component functions would fall into different classes. However, those mixed classes are more complicated to analyze: single instances may have varying difficulty depending on properties concerning the fitness rescalings and stretching factors.
Figure 4.6 gives an overview of the resulting classes.

Figure 4.6 Overview of the different problem classes generated by coordinate transformations. [Diagram: severity (none, small, big) against predictability and repetitiveness; no severity yields static, small severity yields drifting or rotation (predictable) and random drifting or returning (not predictable), big severity yields fast drifting or superimposition (predictable) and chaotic changes (not predictable).]
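The decision structure of Figure 4.6 can be summarized in a small classifier; a sketch under the assumption that the three predicates have already been evaluated for all components (hypothetical names):

// Coordinate-transformation classes of Section 4.4.1, following Figure 4.6.
enum class CoordClass { Static, Drifting, Rotating, RandomDrifting, Returning,
                        FastDrifting, Superimposing, Chaotic };

// severity: severityC_T; predictable: predictC_{T,eps}; repetitive:
// repetitiveC_{T',eps'} for a suitable T'; theta is the severity threshold.
CoordClass classify(double severity, bool predictable, bool repetitive,
                    double theta) {
    if (severity == 0.0) return CoordClass::Static;
    if (severity <= theta) {                            // small severity
        if (predictable)
            return repetitive ? CoordClass::Rotating : CoordClass::Drifting;
        return repetitive ? CoordClass::Returning : CoordClass::RandomDrifting;
    }
    if (!predictable) return CoordClass::Chaotic;       // big severity
    return repetitive ? CoordClass::Superimposing : CoordClass::FastDrifting;
}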
4.4.2 Fitness Rescalings
The following three definitions make statements on the properties of fitness rescalings. Since Example 4.3 uses no fitness rescalings, it is not considered in the discussion of the properties.
Definition 4.6 (Predictable fitness rescalings) Let $F$ be a dynamic fitness function consisting of $n$ components $F_i = \big(f_i, (c_i^{(t)})_{t\in\mathbb{N}}, (s_i^{(t)})_{t\in\mathbb{N}}, (r_i^{(t)})_{t\in\mathbb{N}}\big)$ ($1 \le i \le n$). Then, the fitness rescalings of $F_j$ ($1 \le j \le n$) have constant dynamics with respect to time steps $T \subseteq \mathbb{N}$ ($T \ne \emptyset$) iff
$$predictR_T(F_j) \equiv \forall_{t\in T}\ \big( r_j^{(t)} < 1 \Leftrightarrow r_j^{(t+1)} < 1 \big).$$
The time steps $T$ are those moments where the optimum of $F_j$ has almost identical fitness. ♦
Example 4.1 is predictable since one hill is swelling and the other is shrinking at
all time steps. Example 4.2 is not predictable with respect to the definition above.
Definition 4.7 (Severity of fitness rescalings) Let $F$ be a dynamic fitness function consisting of $n$ components $F_i = \big(f_i, (c_i^{(t)})_{t\in\mathbb{N}}, (s_i^{(t)})_{t\in\mathbb{N}}, (r_i^{(t)})_{t\in\mathbb{N}}\big)$ ($1 \le i \le n$). Then, the fitness rescalings of $F_j$ ($1 \le j \le n$) have the following severity with respect to time steps $T \subseteq \mathbb{N}$ ($T \ne \emptyset$):
$$severityR_T(F_j) = \max_{t\in T}\ \big( r_j^{(t)} - 1 \big)\, R_j^{(t-1)}.$$ ♦
The severity is the absolute (additive) change concerning the best fitness of a component function. In Example 4.1 the severity of hill 1 is $-0.1$ at time step 1, $-0.09$ at time step 2, $-0.081$ at time step 3, and so on. The severity of hill 2 is $0.02$ at time step 1, $0.024$ at time step 2, $0.0288$ at time step 3, and so on. In Example 4.2 the severity of a component function $F_{cyc(i)}$ is $\alpha_{(i+\frac{t}{5}) \bmod 8} - \alpha_{(7+i+\frac{t}{5}) \bmod 8}$ for time steps $t = 5, 10, 15, \ldots$.
Definition 4.8 (Repetitive fitness rescalings) Let $F$ be a dynamic fitness function consisting of $n$ components $F_i = \big(f_i, (c_i^{(t)})_{t\in\mathbb{N}}, (s_i^{(t)})_{t\in\mathbb{N}}, (r_i^{(t)})_{t\in\mathbb{N}}\big)$ ($1 \le i \le n$). Then, the fitness rescalings of $F_j$ ($1 \le j \le n$) are repetitive with respect to time steps $T \subseteq \mathbb{N}$ ($|T| \ge 2$) and error $\varepsilon \in \mathbb{R}_0^+$ iff
$$repetitiveR_{T,\varepsilon}(F_j) \equiv \forall_{t_1,t_2\in T,\ t_1<t_2}\ \exists_{\varepsilon'\in[1-\varepsilon,\,1+\varepsilon]}\quad R_j^{(t_1,t_2)} = \varepsilon'.$$ ♦
Example 4.2 is repetitive within each static phase of five generations, e.g. for $T = \{0, 1, 2, 3, 4\}$ or $T = \{5, 6, 7, 8, 9\}$. Due to the cyclic movement, $T = \{0, 20, 40, \ldots\}$ is repetitive as well. This is not the case with Example 4.1, where there is no repetition at all. Again, Example 4.3 is repetitive for trivial reasons.
Definition 4.9 (Homogeneous fitness rescalings) Let $F$ be a dynamic fitness function consisting of $n$ components $F_i = \big(f_i, (c_i^{(t)})_{t\in\mathbb{N}}, (s_i^{(t)})_{t\in\mathbb{N}}, (r_i^{(t)})_{t\in\mathbb{N}}\big)$ ($1 \le i \le n$). Then, the fitness rescalings of $F_j$ and $F_k$ ($1 \le j, k \le n$) are homogeneous with respect to time steps $T \subseteq \mathbb{N}$ ($T \ne \emptyset$) iff
$$homoR_T(F_j, F_k) \equiv \forall_{t\in T}\quad r_j^{(t)} = r_k^{(t)}.$$
$F$ is homogeneous with respect to $T$ iff $homoR_T(F_j, F_k)$ holds for all $1 \le j, k \le n$. ♦
Neither Example 4.1 nor Example 4.2 is homogeneous concerning the fitness rescalings.
Another aspect of changing fitness rescalings is the fact that two components may exchange their roles concerning the maximal height of the components. This may lead to a drastic change of the overall fitness landscape since the global optimum may be relocated to a different component function. This property is defined formally in the next definition.
This property is based on the fact that an alternation between two composite functions $F_j$ and $F_k$ takes place at time $t$ if and only if either $R_k^{(t)} < R_j^{(t)}$ and $R_k^{(t+1)} > R_j^{(t+1)}$, or $R_j^{(t)} < R_k^{(t)}$ and $R_j^{(t+1)} > R_k^{(t+1)}$. Moreover, Definition 4.1 states that there is at any time exactly one component function with maximal fitness, starting with component function $F_1$ at time $t = 0$.
Definition 4.10 (Alternating) Let $F$ be a dynamic fitness function consisting of $n$ components $F_i = \big(f_i, (c_i^{(t)})_{t\in\mathbb{N}}, (s_i^{(t)})_{t\in\mathbb{N}}, (r_i^{(t)})_{t\in\mathbb{N}}\big)$ ($1 \le i \le n$). Then, the best fitness values of $F_j$ and $F_k$ ($1 \le j, k \le n$) are alternating with respect to time steps $T \subseteq \mathbb{N}$ ($T \ne \emptyset$) iff
$$alter_T(F_j, F_k) \equiv \forall_{t\in T} \left( 1 < \frac{R_k^{(t)}}{R_j^{(t)}} < \frac{r_j^{(t+1)}}{r_k^{(t+1)}} \ \vee\ 1 < \frac{R_j^{(t)}}{R_k^{(t)}} < \frac{r_k^{(t+1)}}{r_j^{(t+1)}} \right).$$
Thus the sequence of time steps with changes concerning the global optimum and the involved composite functions is defined by the series $\varphi_i \in \mathbb{N} \times \{1, \ldots, n\}$, where the first element contains the time when a change occurs and the second element the index of the new optimum component function:
$$\varphi_0 = (0, 1)$$
$$\varphi_i = \begin{cases} (t, j), & \text{if } alter_{\{t\}}(F_{j'}, F_j) \text{ holds and } \forall_{t' < \tau < t}\ \forall_{1\le k\le n}\ \neg alter_{\{\tau\}}(F_{j'}, F_k), \text{ where } \varphi_{i-1} = (t', j') \\ \text{undefined}, & \text{otherwise} \end{cases}$$
The maximal and minimal time sequence of one dominating composite function is given by
$$\min_{i\in\mathbb{N}} (t_{i+1} - t_i) \qquad\text{and}\qquad \max_{i\in\mathbb{N}} (t_{i+1} - t_i),$$
where $\varphi_i = (t_i, k_i)$. Also the set of involved alternating composite functions may be defined as $\big\{\, k_i \mid i \in \mathbb{N} \wedge \varphi_i = (t_i, k_i) \,\big\}$. ♦
Note that the alternation is strongly related to the severity and the homogeneity as
the following lemma states.
Lemma 4.2 For any set of time steps $T \subset \mathbb{N}$ and $T^- = \{t - 1 \mid t \in T \wedge t \ne 0\}$ it holds that
$$\big(\forall_{1\le j\le n}\ severityR_T(F_j) = 0\big) \ \vee\ \big(\forall_{1\le j,k\le n\ (k\ne j)}\ homoR_T(F_j, F_k)\big) \ \Rightarrow\ \neg\exists_{1\le j,k\le n\ (k\ne j)}\ alter_{T^-}(F_j, F_k)$$ ♦
Proof: If the first condition concerning the severity holds, it follows immediately that $r_j^{(t)} = 1$ for all $1 \le j \le n$ and for all $t \in T$. As a consequence the condition for the alternation can never be true for all $1 \le j, k \le n$, since it reduces to the contradiction
$$\forall_{t\in T^-}\quad 1 < \frac{R_k^{(t)}}{R_j^{(t)}} < 1.$$
Also, the second condition concerning the homogeneous fitness rescalings leads to the same contradiction.
q.e.d.
Figure 4.3 illustrates how for the moving hills problem (Example 4.1) the global optimum jumps from one hill to the other hill; according to the definition, this happens exactly at time step $t = 9$. That means that both component functions alternate. In Example 4.2 the component functions participating in the cycle of cells are alternating at time steps $5, 10, \ldots$.
The following classes may be identified concerning the fitness rescalings.
Static: The fitness values of the component functions are not changed by rescaling:
$$\forall_{1\le j\le n}\quad severityR_T(F_j) = 0.$$
The other properties are fulfilled trivially if the fitness is static.
Unaffecting: As Lemma 4.2 states, homogeneous fitness rescalings are not alternating. As a consequence the class is described by
$$\forall_{1\le j\le n}\quad 0 < severityR_T(F_j),$$
$$\forall_{1\le j,k\le n}\quad (j \ne k) \Rightarrow homoR_T(F_j, F_k).$$
Swelling and shrinking: This class is characterized by predictable small fitness rescalings:
$$\forall_{1\le j\le n}\quad 0 < severityR_T(F_j) \le \theta,$$
$$\forall_{1\le j\le n}\quad predictR_T(F_j),$$
$$\forall_{1\le j,k\le n}\quad (j \ne k) \Rightarrow \neg homoR_T(F_j, F_k),$$
where $\theta \in \mathbb{R}^+$ denotes the maximal acceptable small severity value.
A subclass of problems are those problems that alternate:
$$\exists_{1\le j,k\le n}\ \exists_{T'\subset T,\ |T'|\ge 2}\quad alter_{T'}(F_j, F_k).$$
Random: This class denotes those fitness rescalings which change slightly in a rather random and unpredictable manner:
$$\forall_{1\le j\le n}\quad 0 < severityR_T(F_j) \le \theta,$$
$$\forall_{1\le j\le n}\quad \neg predictR_T(F_j),$$
$$\forall_{1\le j,k\le n}\quad (j \ne k) \Rightarrow \neg homoR_T(F_j, F_k),$$
where $\theta \in \mathbb{R}^+$ denotes the maximal acceptable small severity value.
A subclass of problems are those problems that alternate:
$$\exists_{1\le j,k\le n}\ \exists_{T'\subset T,\ |T'|\ge 2}\quad alter_{T'}(F_j, F_k).$$
Chaotic: This class is characterized by the following predicates:
$$\forall_{1\le j\le n}\quad severityR_T(F_j) > \theta,$$
$$\forall_{1\le j,k\le n}\quad (j \ne k) \Rightarrow \neg homoR_T(F_j, F_k),$$
where $\theta \in \mathbb{R}^+$ denotes the maximal acceptable small severity value. In case of the fitness rescalings, big changes immediately imply chaotic behavior since the domain of rescaling values is restricted to $[0, 1]$.
Figure 4.7 gives a graphical overview of the different classes. Note that the repetitive property is not considered in this classification. The reason is that the mere fact that certain fitness values return exactly has no decisive influence on the difficulty of the problem. Here the alternation, the predictability, and the severity are the most important properties.
Figure 4.7 Overview of the different problem classes generated by fitness rescalings. [Diagram: severity (none, small, big) against homogeneity and predictability; no severity yields static, homogeneous rescalings are not affecting alternation, small non-homogeneous severity yields swelling and shrinking (predictable) or random (not predictable), each with an alternating subclass, and big severity yields chaotic.]
4.4.3 Stretching Factors
The last dynamic modification applied to the components of fitness functions is the stretching factor. It differs from coordinate transformations and fitness rescalings in the dynamics it creates. Obviously these modifications have no effect on the global optimum in the search space but rather on the morphology of the landscape. This impact is formalized in the following definition of the visibility of component functions.
Definition 4.11 (Visibility of component functions) Let $F$ be a dynamic fitness function consisting of $n$ components $F_i = \big(f_i, (c_i^{(t)})_{t\in\mathbb{N}}, (s_i^{(t)})_{t\in\mathbb{N}}, (r_i^{(t)})_{t\in\mathbb{N}}\big)$ ($1 \le i \le n$). Then, the best fitness value of $F_j$ is visible at time steps
$$\Big\{\, t \in \mathbb{N} \ \Big|\ \forall_{k\in\{1,\ldots,n\}\setminus\{j\}}\quad R_j^{(t)} > R_k^{(t)}\, f_k\Big(C_k^{(t)}\big(S_k^{(t)}\, (C_j^{(t)})^{-1}(0)\big)\Big) \,\Big\}$$ ♦
In the examples given only the moving hills problem in Example 4.1 may exhibit
invisibility of certain component functions. However, the values in the example
are chosen such that the optimal value of both component functions is visible at all
time steps.
4.4.4 Frequency of Changes
Besides the actual modifications a very important factor is also the frequency with
which the changes within the dynamic landscape occur. In many benchmark problems for dynamic optimization there is no change for several generations and the
landscape stays fixed. A lot of optimization techniques rely on these static periods.
Therefore, the following definition classifies non-stationary problems according to
their frequency of change.
Definition 4.12 (Frequency of change) Let $F$ be a dynamic fitness function consisting of $n$ components $F_i = \big(f_i, (c_i^{(t)})_{t\in\mathbb{N}}, (s_i^{(t)})_{t\in\mathbb{N}}, (r_i^{(t)})_{t\in\mathbb{N}}\big)$ ($1 \le i \le n$). Then, for a time interval $T' = \{t_1, \ldots, t_l\}$ the frequency of change is defined as
$$frequency_{T'}(F) = \frac{\#\{t \in T' \mid F \text{ is not constant at } t\}}{\# T'}$$
where $F$ is called constant at $t$ iff
$$\forall_{1\le j\le n}\ severityC_{\{t\}}(F_j) = 0, \qquad \forall_{1\le j\le n}\ severityR_{\{t\}}(F_j) = 0, \qquad\text{and}\qquad \forall_{1\le j\le n}\ s_j^{(t)} = 1.$$ ♦
In Examples 4.1 and 4.3 the frequency of change is 1, which means that in each generation a change occurs. In Example 4.2 the frequency of change is $\frac{1}{5}$ since the landscape is modified once in five generations.
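Given, for every time step of an interval, whether any component changed (nonzero coordinate or rescaling severity, or a stretching factor different from 1), the frequency of change is a simple ratio; a minimal sketch with hypothetical names:

#include <vector>

// frequency_T'(F) = #{t in T' | F is not constant at t} / #T'  (Definition 4.12).
double frequency_of_change(const std::vector<bool>& changed_at) {
    int changes = 0;
    for (bool c : changed_at) changes += c ? 1 : 0;
    return static_cast<double>(changes) / changed_at.size();
}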
The frequency of change is probably a questionable concept as it is discussed here,
since it is strongly related to the concept of a generation in the evolutionary algorithm. In general, the time steps of the problem and the number of fitness evaluations determining the time of one generation of the algorithm are independent of
each other. This may be problematic if the problem changes within one generation.
Therefore, we assume that the evolutionary algorithm is tailored to the problem in
the sense that the changes only occur in between the generations of the algorithm.
4.4.5 Resulting Classification
In order to get a classification of dynamic problems in function optimization under
consideration of all problem properties, the different classes for coordinate transformations and fitness rescalings are combined in Figure 4.8.
Figure 4.8 Graphical overview of the resulting classification. The areas with thick lines indicate where the general classes might be positioned. The four numbered classes are used as examples in the next chapter. [Matrix: coordinate-transformation classes (static, drifting, rotating, random drift, fast drift, superimposing) against fitness-rescaling classes (static, unaffecting, swelling/shrinking, random, chaotic, partly split into alternating and not alternating); Classes 1–4 as well as the regions for drifting landscapes, cyclic problems, chaotic problems, and morphological changes (esp. with restricted visibility) are marked.]
The factors of the visibility of the component functions and the frequency of change
are not considered in Figure 4.8 since they do not affect the global optimum of the
dynamic problem. These factors could be added in an additional dimension.
This new classification is rather complex. Therefore, it is important to embed previous classifications from the literature. First, De Jong (2000) distinguishes, in a rather coarse-grained classification, drifting landscapes, landscapes undergoing significant morphological changes, cyclic patterns, and abrupt and discontinuous changes. The first two classes can easily be identified within the new framework. However, note that there is an overlap between both classes in Figure 4.8. This is primarily due to the contrast between the existing rather inexact definitions of problem classes and the classification presented here, which is based on exact properties. According to De Jong (2000), the third class is characterized by a small number of landscapes which may occur. This kind of dynamics may be introduced by different techniques in the presented framework but cannot be assigned to one combination of properties. The fourth class, containing problems with abrupt changes, is primarily motivated by static landscapes where catastrophes happen and lead to a new landscape. Whereas this kind of behavior can easily be modeled within the framework by many properties, it is slightly out of the focus of this thesis, which is more concerned with iteratively occurring changes. One of the main properties for these problems is the frequency of change; the characteristics of the coordinate transformations and rescalings appear to be of secondary importance. For this reason, this class was not included in Figure 4.8.
However, as we see in Figure 4.8, the classes from the literature are very coarse-grained, and especially the cyclic problems can be generated by many different constellations of the meta rules. For the discussion of dynamic problems in the remainder of this thesis, the following four classes are introduced.
Class 1: pure tracking task with constant severity concerning the coordinates, i.e. the static fitness landscape is moving as a whole (homogeneous) or the parts are moving differently (inhomogeneous), the coordinate translation is predictable, and there is no fitness rescaling. As a consequence the problem is not alternating.
Class 2: pure tracking task with constant severity in a changing environment, i.e. fitness rescalings and coordinate translations are inhomogeneous and predictable, but the problem is not alternating.
Class 3: oscillation between several static optima, i.e. no coordinate translation, but inhomogeneous, predictable fitness rescalings take place. The problem is alternating.
Class 4: oscillating tracking task, i.e. fitness rescalings and coordinate translations are inhomogeneous and predictable, and the problem is alternating.
In general we require the properties of the classes to hold for all time steps, but there can be a small fraction of singular time steps which do not obey the strict properties. An example is the change of the direction of a linear translation within a restricted search space.
4.5 Discussion
This chapter introduced a new formal framework for analyzing, discussing, and comparing dynamic problems. Using the framework, single problem instances can be described in a very detailed manner, which is not possible in any other available classification scheme. This is a necessary foundation for the next chapters of the thesis, where questions like the assignment of techniques to problem classes or the measurement of performance are discussed in the light of the properties of non-stationary problems.
CHAPTER 5
Measuring Performance in Dynamic Environments
This chapter is devoted to the question of what the term "good performance" means in dynamic optimization. In the case of classical optimization tasks in static environments the performance can easily be measured using the best or average fitness values; these indicate how well the optimum was approached. But in the case of dynamic problems the task of finding one optimum shifts to an adaptive process that approximates the best solution at each time step. Due to the various kinds of dynamics and the underlying component functions there are many possible aspects of adaptation one might be interested in. The importance of each of these aspects is directly related to the goals one wants to achieve by dynamic optimization. Therefore, the suitability of a performance measure in a dynamic environment depends primarily on the goals and the problem characteristics. Section 5.1 deals with different aspects concerning the goals of dynamic optimization. Possible performance measures are discussed in Section 5.2 and an empirical investigation is presented in Section 5.3, before a short summary in Section 5.4. The chapter is based on the results of a pre-published conference contribution (Weicker, 2002).
5.1 Goals of Dynamic Optimization
In the previous chapter, a general framework and essential properties of dynamic fitness functions have been defined formally. This chapter aims at a relation between problem properties and performance measures to assess algorithms on these problems. In order to do this, it is interesting to look more closely at the goals that are pursued by dynamic optimization.
However, in the existing literature there is almost no discussion of the goals of optimization in a dynamic environment. This is somewhat surprising since the goal is no longer only to find a static optimum. It is rather an adaptation and estimation process. De Jong (1993) has already pointed out that, even in the case of static problems, genetic algorithms are more than a mere optimization tool, and points to the original intentions of Holland (1975). This is even more the case in a dynamic environment. As a consequence, this section is concerned with various aspects that are involved when talking about adaptation.
Adaptation is defined as the adjustment of organs or organisms to certain stimuli or environmental conditions. When transferring this definition to dynamic function optimization, such an adjustment can be measured best by the quality of the approximation delivered by the algorithm over a set of generations. But this does not include all aspects of adaptation. We might not only be interested in an average approximation quality but also in the stability of the quality and, if stability cannot be guaranteed, the time necessary to reach a certain approximation level again.
In the following subsections, these aspects are defined more formally. These definitions presume global knowledge of all aspects of the dynamic problem. Furthermore, they use only the observable behavior of the optimization algorithm, i.e. the best approximation (fitness value) at each time step. Since this global knowledge is not available in most applications for an assessment of the quality of an evolutionary algorithm, the next section is concerned with performance measures to estimate these characteristics of evolutionary algorithms on dynamic problems.
In order to enable a fair comparison of different algorithms with respect to the question of how they reach the goal of dynamic optimization, it is useful to have a single value that states how well an algorithm is doing. The standard technique for this problem was already considered by the performance measures of De Jong (1975): averaging over all generations (or a subset of generations).
Definition 5.1 (Averaging over time) Let T be a set of time steps and M^(t) ∈ R be the performance value for the application of an algorithm to a function with respect to generation t ∈ T. Then, the average value or performance is defined as

    Avg(M, T) = 1/|T| · Σ_{t∈T} M^(t).    (5.1)  ♦
This technique will be used in the following discussion of goals as well as in the
performance measures.
5.1.1 Optimization accuracy
The primary overall goal of any (dynamic) function optimization is to achieve a high quality approximation for all considered time steps. This aim is captured in the following formal definition of the optimization accuracy.
Definition 5.2 (Optimization accuracy) Let F be a fitness function, Max_F^(t) ∈ R the best fitness value in the search space, and Min_F^(t) ∈ R the worst fitness value in the search space. Moreover, let EA be an optimization algorithm. Then, the algorithm’s accuracy at time t ∈ Time_EA is defined as

    accuracy^(t)_{F,EA} = (F(best^(t)_EA) − Min_F^(t)) / (Max_F^(t) − Min_F^(t))    (5.2)

where best^(t)_EA is the best candidate solution in the population at time t. For a given set of relevant time steps T ⊆ Time_EA, the average optimization accuracy is defined as

    Acc_{F,T}(EA) = Avg(accuracy^(t)_{F,EA}, T).    (5.3)  ♦
Note that the accuracy is only well defined if F is non-trivial at each time step, which means that the function is non-constant (Min_F^(t) < Max_F^(t) for t ∈ T). The optimization accuracy ranges between 0 and 1, where accuracy 1 is the best possible value. It is also noteworthy that the optimization accuracy is independent of fitness rescalings since it equals the percentage of the best fitness produced by the algorithm with regard to the optimal fitness.
Trojanowski and Michalewicz (1999a) point out that this formula was already introduced by Feng et al. (1997) as a performance measure. However, there it was applied only to stationary fitness functions.
Example 5.1 Consider the following sequence of optimal fitness values, worst fitness values, and best approximations of an evolutionary algorithm.

                                   generation
                            1      2      3      4
    optimal fitness value   3.4    3.0    2.5    4.0
    worst fitness value     0.0    0.1    0.0    1.0
    best approximation      3.3    2.3    2.4    2.0

Then, the optimization accuracy results in

    Acc_{F,{1,2,3,4}}(EA) = 1/4 · ((3.3−0.0)/(3.4−0.0) + (2.3−0.1)/(3.0−0.1) + (2.4−0.0)/(2.5−0.0) + (2.0−1.0)/(4.0−1.0))
                          = 1/4 · (0.9706 + 0.7586 + 0.96 + 0.3333)
                          = 0.7556.
♦
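To make the computation concrete, the following minimal Python sketch (illustrative code; the function names are not from the thesis) evaluates equations (5.2) and (5.3) on the data of Example 5.1:

    def accuracy(best_approx, worst_fit, best_fit):
        # Accuracy at one time step, equation (5.2); requires a
        # non-trivial fitness function (worst_fit < best_fit).
        return (best_approx - worst_fit) / (best_fit - worst_fit)

    def average_accuracy(best_approxs, worst_fits, best_fits):
        # Average optimization accuracy over all time steps, equation (5.3).
        values = [accuracy(a, w, b)
                  for a, w, b in zip(best_approxs, worst_fits, best_fits)]
        return sum(values) / len(values)

    # Data of Example 5.1 (generations 1 to 4):
    print(average_accuracy([3.3, 2.3, 2.4, 2.0],
                           [0.0, 0.1, 0.0, 1.0],
                           [3.4, 3.0, 2.5, 4.0]))  # ~0.7556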
5.1.2 Stability
Stability is an important issue in optimization. Usually a candidate solution is referred to as stable if slight variations of the candidate solution have a very similar fitness. In the context of dynamic optimization, I want to use the term “stability” with a slightly different meaning: an adaptive algorithm is called stable if changes in the environment do not affect the optimization accuracy severely. That means that a stable, adaptive algorithm is required to be prepared for changes in the environment. Whether the optimum moves slightly or severely, or rather drastic changes occur, an algorithm should be able to limit the respective drop in fitness.
The following definition formalizes this aspect using the optimization accuracy.
Definition 5.3 (Stability) Let F be a fitness function, EA an optimization algorithm, and T ⊆ Time_EA the relevant time steps. Then, the stability for time steps T is defined as

    Stab_{F,T}(EA) = Avg(max{0, accuracy^(t−1)_{F,EA} − accuracy^(t)_{F,EA}}, T).    (5.4)  ♦
The stability ranges between 0 and 1. A value close to 0 implies a high stability.
Example 5.2 Consider the following sequence of optimal fitness values, worst fitness values, best approximations of an evolutionary algorithm, and the resulting accuracy per generation.

                                          generation
                            1       2       3       4       5       6
    optimal fitness value   4.4     4.4     4.4     3.0     3.0     3.0
    worst fitness value     0.0     0.0     0.0     0.0     0.0     0.0
    best approximation      4.0     4.1     4.1     1.0     0.9     1.1
    accuracy                0.9090  0.9318  0.9318  0.3333  0.3     0.3667

Then the accuracy difference between the succeeding generations is given in the following table.

    between generations   1→2      2→3   3→4     4→5     5→6
    accuracy difference   -0.0228  0.0   0.5985  0.0333  -0.0667

The following stability results:

    Stab_{F,T}(EA) = 1/5 · (0 + 0 + 0.5985 + 0.0333 + 0) = 0.1264
In this example the stability is rather high during the stable phases and the major
contribution to the stability value stems from the drastic change in generation 4.
However, note that the accuracy level after the change is rather low and accuracy
gains are not considered within the stability.
♦
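The stability computation of Definition 5.3 can be sketched in the same manner (illustrative code, not from the thesis); the accuracy values reproduce Example 5.2:

    def stability(acc):
        # Equation (5.4): average accuracy drop between succeeding
        # generations; accuracy gains are clipped to 0 by the max term.
        drops = [max(0.0, acc[t - 1] - acc[t]) for t in range(1, len(acc))]
        return sum(drops) / len(drops)

    # Accuracy values of Example 5.2 (generations 1 to 6):
    print(stability([0.9090, 0.9318, 0.9318, 0.3333, 0.3, 0.3667]))  # ~0.1264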
Stability is of interest especially in two scenarios. In the case of drifting landscapes, observing the stability over a period of time is a very good means to gain insight into the ability to track the moving optimum. However, as the example above shows, the stability must not serve as the sole criterion since it makes no statement on the accuracy level. The other scenario is chaotic landscapes with few changes. There it is of interest to examine the stability at the generations where changes occur.
5.1.3 Reactivity
So far the optimization accuracy and the stability have been defined, covering the aspects of quality and persistence. But there is still another aspect completely disregarded: the ability of an adaptive algorithm to react quickly to changes. Now, one could argue that a combined consideration of accuracy and stability covers this aspect, since unstable phases with a high overall accuracy imply good reactivity. Nevertheless, this aspect is formalized more exactly in the following definition.
Definition 5.4 (Reactivity) Let F be a fitness function, EA an optimization algorithm, and T ⊆ Time_EA a set of optimization time steps. Then, algorithm EA’s average ε-reactivity for time steps T is defined as

    React_{F,EA,ε} = Avg(min recov^(t)_{F,EA,ε}, T)    (5.5)

where the time steps until ε-recovery for t ∈ T are defined as

    recov^(t)_{F,EA,ε} = { t′ − t | t′ ∈ Time_EA and t′ > t and accuracy^(t′)_{F,EA} ≥ (1−ε) · accuracy^(t)_{F,EA} }    (5.6)

and min ∅ = ∞.  ♦
The reactivity is a value in R+ ∪ {∞}. A smaller value implies a higher reactivity.
This aspect of adaptation is especially of interest if the problem has short phases of high severity alternating with extensive phases of no severity with regard to the coordinate transformations, or if the problem is alternating concerning the fitness rescalings (with rather low severity for the coordinates).
Example 5.3 Consider the following sequence of optimal and worst possible fitness values, the best approximations of an evolutionary algorithm, and the resulting accuracy per generation.

                                          generation
                            1       2       3       4       5       6
    optimal fitness value   4.4     4.8     4.8     4.8     4.8     4.8
    worst fitness value     0.0     0.0     0.0     0.0     0.0     0.0
    best approximation      4.0     3.1     3.5     3.7     4.0     4.1
    accuracy                0.9090  0.6458  0.7292  0.7708  0.8333  0.8542

A 0.1-recovery is already met in generation 5, i.e. the minimum recovery time is 4. A 0.07-recovery is met in generation 6 with a minimum recovery time of 5.
♦
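The ε-recovery times of Definition 5.4 can be computed as in the following sketch (illustrative code, not from the thesis); applied to the accuracy values of Example 5.3 with t = 1, it reproduces the recovery times 4 and 5:

    import math

    def recovery_time(acc, t, eps):
        # Equation (5.6): steps until the accuracy reaches (1 - eps) times
        # the accuracy of time step t again; min of the empty set is infinity.
        # Time steps are 1-based as in the thesis, so acc[t - 1] belongs to t.
        target = (1.0 - eps) * acc[t - 1]
        times = [t2 - t for t2 in range(t + 1, len(acc) + 1)
                 if acc[t2 - 1] >= target]
        return min(times) if times else math.inf

    acc = [0.9090, 0.6458, 0.7292, 0.7708, 0.8333, 0.8542]  # Example 5.3
    print(recovery_time(acc, t=1, eps=0.1))   # 4 (recovered in generation 5)
    print(recovery_time(acc, t=1, eps=0.07))  # 5 (recovered in generation 6)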
5.1.4 Technical aspects of adaptation
Certain other aspects of adaptation are of a more technical nature. For example, it is possible to argue that an algorithm must detect changes in the problem to adapt well. This ability will certainly improve the performance of an algorithm (if it is translated into concrete and sensible actions), but it is not necessary for an evolutionary algorithm to guarantee good adaptive behavior. Therefore, in this context it is preferred to define adaptation merely using the optimization accuracy.
Another example is the gaining of meta-knowledge in the case of predictable problems. That means that the underlying rules of the dynamics should be predicted (e.g. the coordinate transformations (c_i^(t))_{t∈N} and fitness rescalings (r_i^(t))_{t∈N}). Again, used in an intelligent manner, this can boost the performance of an algorithm but is actually only a technique to improve the accuracy or stability.
These technical aspects are not characteristic properties of adaptive algorithms.
Since there is no proof that detecting changes in a poor algorithm leads to better
adaptation than a very sophisticated problem-specific algorithm without this ability,
the technical aspects are omitted in the discussion of the goals.
5.2 Performance Measures
While the goals defined in the preceding section capture different aspects of approximation quality on dynamic problems, the global knowledge used in the definitions is usually not available. As a consequence, this section is concerned with measures to estimate those globally defined values.
Before the special requirements of measurements in dynamic environments are discussed, the two standard performance measures in static environments are quickly
reviewed. There, the set of best solutions is constant during the optimization.
Therefore, the best approximation at the end of the search process is of primary
interest. However, since the time to find an optimal value is an important issue,
this factor may be considered in certain performance measures like the online or
offline performance by De Jong (1975). The online performance considers all function evaluations in the search and is therefore well suited for expensive functions.
Offline performance considers for each generation only the fitness of the best individual found so far. Naturally, these performance measures are the basis for many
dynamic performance measures and, as a consequence, are revisited and formally
defined in the next section with a special emphasis on dynamic problems.
In the case of dynamic problems, measuring performance is even more difficult
than in static environments. As was already pointed out in the last section, the process of adaptation may involve various aspects that are probably difficult to cover with the usual performance measures. But even if only the optimization accuracy is considered, which equals a scaled fitness in static environments, certain problems may occur in the case of dynamic fitness functions. The coordinate transformations are not responsible for the difficulty in measuring—they are the major challenge for adaptive algorithms. But the fitness rescalings complicate the measurement of performance for several reasons.
• If the fitness rescalings are not known, the best fitness value at a generation
is not known. Because the best fitness can change, there is no common basis
to talk about accuracy. A good fitness value at one time can be a bad fitness
value at another time—but this is not transparent for the user.
• To continue the argument of the previous item, a considerable drop in best
fitness does not necessarily imply worse accuracy—it could even mean the
contrary.
• Also, in the case of alternating problems, a point with a new best fitness can emerge while the hitherto optimum may have unaffected fitness. Then, a stable fitness gives the wrong impression that the optimum is still observed.
These arguments will be revisited in the following subsections.
Before the concrete measures are discussed, two possible classifications of performance measures are presented. A first approach uses the information employed by
the measure. Here two classes can be distinguished, namely
• fitness based measures and
• genotypic or phenotypic measures.
A second distinction can be made by the knowledge which is available and used in
the measure:
• knowledge on the position of the optimum is available (which is usually only
the case in academic benchmarks),
• knowledge on the best fitness value is available, and
• no global knowledge is available.
The following section argues how the aspects of adaptation could be assessed with
the different degrees of knowledge. These performance measures are evaluated for
various dynamic problems in Section 5.3.
5.2.1 Measures for optimization accuracy
For measuring the optimization accuracy with global knowledge, the definition of accuracy^(t)_{F,EA} may be used. This makes the most evident statement possible since it is unaffected by any fitness rescaling. Indeed, Mori et al. (1996) and Mori, Imanishi, Kita, and Nishikawa (1997) used this performance measure averaged over a number of generations T:

    PerfAcc_{F,T}(EA) = 1/|T| · Σ_{t∈T} α_t · accuracy^(t)_{F,EA}.
For an exact average value the weights are set to α_t = 1 for all t ∈ T. A very similar performance measure was used by Trojanowski and Michalewicz (1999b), where the normalization using the worst fitness value was omitted—which is reasonable since the worst fitness value in the regarded problem is 0. They also considered only those time steps before a change in the environment occurs.
However, very often simple averaging might lead to misleading values. This can be avoided by putting more emphasis on the detection of the optimum in each time step. This was accomplished by Mori et al. (1998) using the weights α_t: they used α_t = 1 if accuracy^(t)_{F,EA} = 1 and α_t = 0.5 otherwise. Since this rewards only those generations where the optimum was found exactly, a more gradual approach is the usage of the squared error to the best fitness value, as done by Hadad and Eick (1997).
However, this measure requires the best and worst possible fitness values. In most applications and problems, this information is not available. As a consequence, other measures are examined with respect to how well they approximate this exact measure. First, fitness based measures are considered.
The majority of publications uses the best fitness value in a generation t to assess
the quality of the algorithm (e.g. Angeline, 1997; Bäck, 1999; Cobb, 1990; Cobb
& Grefenstette, 1993; Collard et al., 1996, 1997; Dasgupta & McGregor, 1992;
Dasgupta, 1995; Gaspar & Collard, 1997, 1999a, 1999b; Goldberg & Smith, 1987;
Grefenstette, 1992, 1999; Hadad & Eick, 1997; Lewis et al., 1998; Liles & De
Jong, 1999; Mori et al., 1996, 1997; Vavak et al., 1996a, 1998).
    currentBest^(t)_{F,EA} = max{F(ω) | ω ∈ P^(t)_EA}
Another common possibility is the average fitness value of the population at generation t (e.g. Cobb & Grefenstette, 1993; Dasgupta & McGregor, 1992; Dasgupta, 1995; Goldberg & Smith, 1987; Lewis et al., 1998; Mori et al., 1996, 1997).

    currentAverage^(t)_{F,EA} = 1/|P^(t)_EA| · Σ_{ω∈P^(t)_EA} F(ω)
Averaged over generations T, those two measures lead to the classical performance measures of De Jong (1975). The online performance is defined as the average over all function evaluations since the start of the algorithm. Provided that the population size is constant and the algorithm is generational, the online performance may be defined as follows:

    PerfOnline_{F,T}(EA) = Avg(currentAverage^(t)_{F,EA}, T)    (5.7)
Online performance reflects the focusing of the search on optimal regions (cf. Grefenstette, 1992, 1999; Vavak et al., 1996a, 1996b, 1997, 1998). With the online performance, actually each newly created individual is supposed to contribute a high fitness value. However, Cobb (1990) argued that this conception might not be suited for many dynamic problems because focusing too much on good fitness values might have negative effects on the adaptability.
The offline performance is usually defined using the best fitness value found up to each generation.

    PerfOffline_{F,T}(EA) = Avg(max_{1≤t′≤t} currentBest^(t′)_{F,EA}, T)    (5.8)
This measure allows a much higher degree of exploration since the performance can revert to the best individual of previous generations. However, as Grefenstette (1999) and Branke (1999b) point out, this measure is not suited for dynamic problems, since it cannot be assumed that a very good solution from several generations ago is still valid. Therefore, the individuals of previous generations should not be considered when assessing the current generation. As a consequence, the offline performance can be restricted to the individuals of each generation only.
    PerfOffline*_{F,T}(EA) = Avg(currentBest^(t)_{F,EA}, T)    (5.9)
This performance measure shows how well the optimum is approximated over the generations. Branke (1999a) uses a different approach to deal with this problem: for each generation, the best fitness value is used from those preceding generations in which no change in the environment occurred. Apparently, this requires global knowledge of any possible change in the environment.
Offline performance or variants have been used by Branke (1999c), Cobb (1990), Collard et al. (1997), Gaspar and Collard (1997), Grefenstette (1992), Vavak and Fogarty (1996), Vavak et al. (1997), and Vavak et al. (1998).
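The three fitness based performance measures (5.7) to (5.9) can be sketched as follows (illustrative code, not from the thesis; pops[t] is assumed to hold the fitness values of the population of generation t):

    def perf_online(pops):
        # Equation (5.7): average population fitness, averaged over time.
        return sum(sum(p) / len(p) for p in pops) / len(pops)

    def perf_offline(pops):
        # Equation (5.8): best fitness found so far, averaged over time.
        best_so_far, total = float("-inf"), 0.0
        for p in pops:
            best_so_far = max(best_so_far, max(p))
            total += best_so_far
        return total / len(pops)

    def perf_offline_star(pops):
        # Equation (5.9): best fitness of each single generation, averaged.
        return sum(max(p) for p in pops) / len(pops)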
Another approach to measure the accuracy without knowing the actual best possible
fitness is based on the assumption that the best fitness value will not change much
within a small number of generations. As a consequence we introduce a local
window of interest W ∈ N and measure the accuracy using the best fitness within
the window as virtual target value.
    windowAcc^(t)_{F,EA,W} = max{(F(ω) − windowWorst) / (windowBest − windowWorst) | ω ∈ P^(t)_EA}    with
    windowBest = max{F(ω) | ω ∈ P^(t′)_EA, t − W ≤ t′ ≤ t}
    windowWorst = min{F(ω) | ω ∈ P^(t′)_EA, t − W ≤ t′ ≤ t}
Apparently, this measure has the same problems as the above fitness based approximations: suboptimal convergence leads to values which cannot be recognized as suboptimal. However, provided that an algorithm does not completely fail on a problem with fitness rescalings, this measure produces at least reasonable approximations of the accuracy. This is especially of interest when the values are averaged over a set of generations. Here, the previous measures can easily produce values that are difficult to interpret. The averaged measure is defined as follows.
    PerfWindow*_{F,T,W}(EA) = Avg(windowAcc^(t)_{F,EA,W}, T)    (5.10)
This window based measure has not been used in the experiments reported in the
literature. A similar technique with windows was used by Grefenstette (1986) for
scaling fitness functions in order to improve fitness proportional selection.
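A minimal sketch of the window based measure (illustrative code, not from the thesis; generations are indexed from 0 here):

    def window_accuracy(pops, t, W):
        # Accuracy of generation t measured against the best and worst
        # fitness observed within the window of the last W generations.
        window = [f for p in pops[max(0, t - W):t + 1] for f in p]
        w_best, w_worst = max(window), min(window)
        if w_best == w_worst:
            return 1.0  # degenerate window without fitness differences
        return (max(pops[t]) - w_worst) / (w_best - w_worst)

    def perf_window(pops, W):
        # Equation (5.10): window accuracy averaged over all generations.
        return sum(window_accuracy(pops, t, W)
                   for t in range(len(pops))) / len(pops)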
As an alternative to the fitness based measures, genotype or phenotype based measures may be used to approximate the optimization accuracy. Although they are independent of fitness rescalings, they require full global knowledge of the position of the current optimum. There are two variants of those measures in the literature. First, Weicker and Weicker (1999) used the minimal distance of the individuals in the population to the current optimum ω* ∈ Ω. In the following version of the measure, the values are scaled such that a value close to 1 implies a close-to-optimum approximation and a value close to 0 means a rather huge distance.
    bestDist^(t)_{F,EA} = max{(maxdist − d(ω*, ω)) / maxdist | ω ∈ P^(t)_EA}    (5.11)
Second, Salomon and Eggenberger (1997) used the distance of the mass center of the population to the current optimum’s position. Again, the measure is scaled in the same way.

    centerDist^(t)_{F,EA} = (maxdist − d(ω*, ω_center)) / maxdist    with
    ω_center = 1/µ · Σ_{1≤i≤µ} A^(i)    where P^(t)_EA = ⟨A^(i)⟩_{1≤i≤µ}
While the first approach seems to be a straightforward way to assess the approximation quality, the second performance measure is more difficult to interpret. Similarly to the online performance, each considered individual is important. It requires that the population as a whole describes very closely the region of the optimum. As will be discussed shortly, this may be difficult for certain dynamic problems.
Similarly to the fitness based performance measures, those measures may be extended to sets of time steps by averaging. The corresponding performance measures are called PerfBestDist and PerfCenterDist.
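Both distance based measures can be sketched for a real-valued search space as follows (illustrative code assuming Euclidean distance; all names are chosen freely):

    import math

    def best_dist(pop, optimum, maxdist):
        # Equation (5.11): scaled distance of the closest individual.
        return max((maxdist - math.dist(optimum, ind)) / maxdist
                   for ind in pop)

    def center_dist(pop, optimum, maxdist):
        # Scaled distance of the population's mass center to the optimum.
        dim = len(pop[0])
        center = [sum(ind[i] for ind in pop) / len(pop) for i in range(dim)]
        return (maxdist - math.dist(optimum, center)) / maxdist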
5.2.2 Measures for stability
Since the definition of stability uses the approximation accuracy, the performance measures for stability also build on the performance measures for the accuracy.
If global knowledge is available, the stability can be computed directly. If this is not possible, the accuracy in the definition of the stability

    Stab_{F,T}(EA) = 1/|T| · Σ_{t∈T} max{0, accuracy^(t−1)_{F,EA} − accuracy^(t)_{F,EA}}    (5.12)

can be replaced by an arbitrary measure discussed in the previous section. However, the genotype/phenotype based measures do not seem to be adequate for this purpose since they also require global knowledge, and it does not make sense to use a distance to the optimum if the actual fitness may be used as accuracy measure.
If no global knowledge is available and the problem does not exhibit fitness rescalings, the best fitness values should be a good approximation for the accuracy. Also
the stability using best fitness should still deliver reasonable performance values if
there are moderate fitness rescalings.
The empirical investigation in Section 5.3 considers the performance measures for
the stability based on best fitness, average fitness, and the fitness within the window.
5.2.3 Measures for reactivity
Analogously to the performance measures for stability, the measures for reactivity may be defined in a first approach using the definition of reactivity and the performance measures for the optimization accuracy. Depending on the available knowledge, most of the defined measures seem to be reasonable in the context of reactivity. Only the measure using the distance of the population center to the optimum appears to be rather problematic. Good reactivity seems to be correlated with high diversity and exploration in many situations—therefore the population center is probably an inadequate basis for measuring it.
Again the performance measures based on best fitness, average fitness, and fitness
within the window are used in the empirical investigation in Section 5.3.
5.2.4 Comparing algorithms
Performance measures are necessary to compare different algorithms and serve as a decision criterion for the design of new algorithms. Admittedly, the usage of a single averaged performance value for an algorithm over a period of several generations might be difficult—especially in the case of dynamic fitness functions, where changing conditions require separate evaluations. Therefore, it is not a big surprise that the majority of publications relies on a visual comparison of the performance values of two (or more) algorithms.
The usage of statistical tests is still rather uncommon. To get empirical confidence that one method is better than another, it is necessary to execute a number of independent experiments. Then, the average performance of each experiment or the performance in a certain generation may be used as independent samples of a random variable. For example, Student’s t-test is one method to compare two algorithms. Other methods like the Scheffé test can be used to compare more than two algorithms. In general, statistical tests are used rather seldom in investigations concerning evolutionary algorithms. In the context of dynamic problems, they have been used by Angeline et al. (1996) to assess the hit rate in time series prediction and by Vavak et al. (1997) to compare operators; other approaches may be found in (Pettit & Swigger, 1983; Grefenstette & Ramsey, 1992; Stanhope & Daida, 1998), to name just a few. Besides the independence of the samples, there are usually certain assumptions in statistical tests, e.g. the normal distribution of the sample values. Since those assumptions are usually not completely fulfilled, the results must be viewed critically. However, the t-test appears to be rather conservative and, therefore, tends to be more strict if the assumptions are not met.
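As an illustration (hypothetical code using SciPy; the performance values are placeholders, not data from this chapter), such a comparison of two algorithms could look as follows:

    from scipy.stats import ttest_ind

    # Averaged performance of independent runs of two algorithms
    # on the same problem instance (placeholder numbers).
    perf_a = [0.82, 0.85, 0.79, 0.88, 0.84]
    perf_b = [0.75, 0.78, 0.80, 0.74, 0.77]

    # Two-sample t-test; a small p-value suggests that the difference
    # in mean performance is statistically significant.
    t_stat, p_value = ttest_ind(perf_a, perf_b)
    print(t_stat, p_value)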
5.3 Examination of Performance Measures
This section considers the four dynamic problem classes that have been introduced
in Section 4.4.5. The various performance measures are compared concerning their
utility on the different problem classes.
5.3.1 Considered problems
The four different problem classes are exemplified and instantiated using the following basic non-stationary component function

    f^(t)(A) = max_{1≤j≤hills} { 0,  if d(A, opt_j) > 150;  maxfit_j · (150 − d(A, opt_j)) / 150,  otherwise }

with A ∈ Ω = [−500, 500] × [−500, 500], Euclidean distance d, and hills randomly chosen local optima opt_j ∈ Ω—each local optimum corresponds to one component. The coordinate transformation for each component j is a linear translation of length coordsev into a direction dir_j, which is randomly determined at the beginning and every time a point outside of Ω would be created. The fitness rescaling adds a value fitchange_j to maxfit_j. Again, fitchange_j ∈ [−fitsev, fitsev] is chosen randomly at the beginning and whenever maxfit_j would leave the range [0, 1]. In non-alternating problems the maximum hill with j = 1 must have maximal fitness in the range [0.5, 1] and all other hills have maximum fitness in the range [0, maxfit_1]. For the four problem classes the following concrete values are used.
Class 1: coordinate translation, no fitness rescaling, no alternation
Problem instance: hills = 5, coordsev = 7.5, fitsev = 0
Various hills are moving while their height remains constant and the best hill remains best.

Class 2: coordinate translation, fitness rescaling, no alternation
Problem instance: hills = 5, coordsev = 7.5, fitsev = 0.01
Various hills are moving while their height is changing, but the best hill remains best.

Class 3: no coordinate translation, fitness rescaling, alternation
Problem instance: hills ∈ {2, 5}, coordsev = 0, fitsev = 0.01
The hills are not moving but changing their height, leading to alternating best hills.
Table 5.1: Average accuracy and standard deviation for the genetic algorithm with and without hypermutation.

                         w/out hypermut.       w/ hypermut.
                         avg      sdv          avg      sdv
    class 1              0.45     0.023        0.87     0.0049
    class 2              0.45     0.018        0.87     0.0035
    class 3              0.82     0.035        0.96     0.0054
    class 3 (2 hills)    0.97     0.0029       0.99     0.00086
    class 4              0.46     0.025        0.87     0.0031
    class 4 (2 hills)    0.41     0.023        0.86     0.0019
Class 4: coordinate translation, fitness rescaling, alternation
Problem instance: hills ∈ {2, 5}, coordsev = 7.5, fitsev = 0.01
The hills are moving while changing their height, and different hills take the role of the best hill at different generations.
The problem instances with 2 hills are chosen in such a way that there is at least one alternation while both hills are changing their height in the same direction. This additional characteristic is supposed to be problematic when measuring the performance. Note that the fitness severity is chosen moderately in all classes. A simplified sketch of this problem generator is given below.
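The following minimal Python sketch illustrates the problem generator described in this subsection (illustrative code only: class and attribute names are chosen freely, and the handling of boundary events is a simplification of the rules above):

    import math, random

    class MovingHills:
        def __init__(self, hills=5, coordsev=7.5, fitsev=0.01):
            self.coordsev, self.fitsev = coordsev, fitsev
            self.opt = [[random.uniform(-500, 500), random.uniform(-500, 500)]
                        for _ in range(hills)]
            # Hill 0 plays the role of the maximum hill with height in [0.5, 1].
            self.maxfit = [random.uniform(0.5, 1.0)]
            self.maxfit += [random.uniform(0.0, self.maxfit[0])
                            for _ in range(hills - 1)]
            self.dir = [random.uniform(0, 2 * math.pi) for _ in range(hills)]
            self.change = [random.uniform(-fitsev, fitsev) for _ in range(hills)]

        def component(self, a, j):
            # Cone-shaped component function of hill j.
            d = math.dist(a, self.opt[j])
            return 0.0 if d > 150 else self.maxfit[j] * (150 - d) / 150

        def fitness(self, a):
            # Overall fitness: maximum over all component functions.
            return max(self.component(a, j) for j in range(len(self.opt)))

        def step(self):
            # One time step: linear translation and fitness rescaling.
            for j in range(len(self.opt)):
                x = self.opt[j][0] + self.coordsev * math.cos(self.dir[j])
                y = self.opt[j][1] + self.coordsev * math.sin(self.dir[j])
                if -500 <= x <= 500 and -500 <= y <= 500:
                    self.opt[j] = [x, y]
                else:  # new random direction instead of leaving Omega
                    self.dir[j] = random.uniform(0, 2 * math.pi)
                f = self.maxfit[j] + self.change[j]
                if 0.0 <= f <= 1.0:
                    self.maxfit[j] = f
                else:  # re-randomize the rescaling when leaving [0, 1]
                    self.change[j] = random.uniform(-self.fitsev, self.fitsev)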
5.3.2 Experimental Setup
To optimize the dynamic problems, two genetic algorithms are used. Both algorithms are based on a standard genetic algorithm where each search space dimension is encoded using 16 bits, the crossover rate is 0.6, the bit-flipping mutation is executed with probability 1/32, a tournament selection with tournament size 2 is used, and the algorithm runs for 200 generations. In addition to this standard algorithm, a version using hypermutation with a fixed rate of 0.2 is used (see Grefenstette, 1999). Table 5.1 shows the accuracy averaged over 10 problem instances and 50 independent experiments for each instance, as well as the respective standard deviation. The GA with hypermutation performs better—however, the performance of both algorithms should be expressed equally well by a performance measure.
5.3.3 Statistical examination of the measures
The goal of this investigation is to find some empirical evidence for the question of how well the various measures approximate the exact adaptation characteristics.
Table 5.2: Ranking based on pairwise hypothesis tests concerning the MSE of the curves. Columns within each group: class 1, class 2, class 3, class 3 (2 hills), class 4, class 4 (2 hills).

                                 standard GA                GA with hypermutation
                          1   2   3  3(2h)  4  4(2h)     1   2   3  3(2h)  4  4(2h)
    Accuracy:
      best fitness        1   1   1   4     1   1        1   2   1   4     1   1
      average fitness     2   2   2   3     2   2        4   3   2   3     3   3
      window based        3   3   5   5     3   5        2   1   5   5     2   2
      shortest distance   4   4   3   1     4   3        3   3   4   1     4   4
      distance of center  5   4   3   2     5   4        5   5   2   2     5   5
    Stability:
      best fitness        1   1   1   1     1   1        1   1   1   1     1   1
      average fitness     2   2   2   2     2   2        2   2   2   2     2   2
      window based        3   3   3   3     3   3        3   3   3   3     3   3
    0.05-Reactivity:
      best fitness        1   1   1   1     1   1        1   1   1   1     2   1
      average fitness     3   3   3   3     3   3        3   3   3   3     3   3
      window based        2   2   2   2     2   2        2   1   1   1     1   1
A first approach is based on the assumption that the curves of the performance
measures should match the curves of the respective exact values to guarantee a
meaningful statement of the performance measure. So the essential question is
whether the values of the performance measure are a solid basis for comparing
one generation of the algorithm with another generation. The second approach
considers the averaged performance values only and tests how well they correlate
to the averaged exact values.
In the first approach, the measurements are normalized to (g(t) − E_g)/√V_g, where E_g is the expected value and V_g the variance of the whole curve. This makes the values of different performance measures comparable since the values are independent of the range of the values. To assess the similarity of the curves of the exact values h′ and the normalized performance measure g′, the mean square error

    MSE_{g′,h′} = Σ_{t=1}^{maxgen} (g′(t) − h′(t))²

is computed.
Table 5.3: Percentage of problem instances with a high correlation to the exact averaged value. Columns within each group: class 1, class 2, class 3, class 3 (2 hills), class 4, class 4 (2 hills).

                                   standard GA                       GA with hypermutation
                          1    2    3   3(2h)  4   4(2h)      1    2    3   3(2h)  4   4(2h)
    Accuracy:
      best fitness        1.0  1.0  1.0  1.0   1.0  1.0       1.0  1.0  1.0  1.0   1.0  1.0
      average fitness     1.0  1.0  1.0  1.0   1.0  1.0       0.8  1.0  1.0  1.0   1.0  1.0
      window based        0.9  0.9  0.6  0.0   0.8  0.4       0.9  0.9  0.5  0.0   1.0  1.0
      shortest distance   0.7  0.7  0.9  1.0   0.7  0.8       0.9  0.7  0.9  1.0   0.8  0.6
      distance of center  0.7  0.7  0.9  1.0   0.7  0.9       0.7  0.4  0.9  0.5   0.6  0.3
    Stability:
      best fitness        1.0  1.0  1.0  1.0   1.0  1.0       1.0  1.0  1.0  1.0   1.0  1.0
      average fitness     1.0  1.0  0.4  0.0   1.0  1.0       0.0  0.2  0.0  0.0   0.0  0.0
      window based        0.4  0.4  0.2  0.2   1.0  0.5       0.1  0.5  0.4  0.2   0.5  0.3
    0.05-Reactivity:
      best fitness        1.0  1.0  1.0  1.0   1.0  1.0       1.0  0.6  1.0  1.0   1.0  0.8
      average fitness     1.0  1.0  0.4  0.2   1.0  1.0       0.3  0.3  0.0  0.0   0.1  0.1
      window based        0.9  0.8  0.8  1.0   0.6  0.4       1.0  1.0  1.0  1.0   1.0  1.0
In order to get a statistical confidence of one measure over another, a hypothesis test is carried out using the 500 independent mean square errors of each performance measure. Those pairwise hypothesis tests are used to establish a ranking concerning the suitability of the performance measures. Student’s t-test is used as hypothesis test with an error probability of 0.05. Table 5.2 shows the results of this analysis.
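The normalization and the error computation of the first approach can be sketched as follows (hypothetical code; the two curves are placeholder data):

    def normalize(curve):
        # Normalize a curve to zero mean and unit variance so that measures
        # with different value ranges become comparable.
        n = len(curve)
        mean = sum(curve) / n
        var = sum((v - mean) ** 2 for v in curve) / n
        return [(v - mean) / var ** 0.5 for v in curve]

    def squared_error(g, h):
        # Squared error summed over all generations, as in the MSE formula.
        return sum((a - b) ** 2 for a, b in zip(g, h))

    g = normalize([0.40, 0.62, 0.55, 0.90])  # measured curve (placeholder)
    h = normalize([0.50, 0.70, 0.60, 1.00])  # exact curve (placeholder)
    print(squared_error(g, h))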
In the second approach, the averaged measures at the end of an optimization run are used to determine how well the algorithm performed on the problem. Therefore, a statistical test is used to compute the correlation of the approximated measures to the exact measures. The input data for the correlation analysis are the averaged performance values of the 50 different runs of an algorithm on a problem instance. (Since the reactivity measures depend highly on the successive generations, the values up to generation 150 are used for those measures instead of the final performance values in generation 200.) As statistical method, Spearman’s rank correlation is used. A series of data is considered to be highly correlated if the Spearman’s rank correlation is positive and the two-sided significance level of its deviation from zero is less than 0.001. The correlation is computed for each of the ten instances of a problem class. Table 5.3 shows the percentage of instances where a high correlation between exact value and performance measure could be identified.
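The correlation test of the second approach can be sketched as follows (hypothetical code using SciPy; the value arrays are placeholders):

    from scipy.stats import spearmanr

    # Averaged values of a performance measure and of the exact accuracy
    # for the 50 runs on one problem instance (placeholder numbers).
    measured = [0.81, 0.78, 0.84, 0.79, 0.88, 0.83]
    exact = [0.80, 0.75, 0.86, 0.77, 0.90, 0.82]

    rho, p_value = spearmanr(measured, exact)
    # "Highly correlated" as defined above: positive rank correlation with
    # a two-sided significance level below 0.001.
    highly_correlated = rho > 0 and p_value < 0.001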
5.3.4 Discussion of the Results
The results appear to be difficult to interpret and should be interpreted only very carefully. The reason for this is the fact that there is actually no well-founded methodology for such a comparison. The statistical examinations used in this chapter are an attempt to reflect two aspects of good performance measures. However, it seems that a comparison of the averaged measures might show some misleading tendencies since certain deviations and problems are averaged out.
According to the averaged values, best fitness is a quite good indicator for all classes and for both high and low quality algorithms. However, the examination of the MSE shows that all fitness based measures have severe problems with class 3 (2 hills), where only fitness rescalings occur, and can be misleading there. Especially the window based measure has severe problems with all class 3 instances and also with low quality approximations of class 4 problems. The MSE of the GA with hypermutation on class 2 indicates that the window based measure can be a better indicator than best fitness, although this is not confirmed by the averaged performance values.
Also, the stability is measured best using best fitness values. The average fitness shows very poor results with the averaged performance values. The window based measure is insufficient regarding the MSE.
Concerning the reactivity, the window based measure proves to be equivalent or superior to best fitness in the case of the high quality experiments (GA with hypermutation). Here, the measure based on best fitness shows problems in the MSE curves for class 4 and in the averaged values on class 2 and class 4. The good performance of the window based measure can presumably be explained by reconsidering the definitions of the reactivity and the window based measure: both have a very strong relation to filters of data series, which could be a hint why they interact so well with each other.
However, the results for the reactivity also reveal a serious concern: apparently the quality and usefulness of a performance measure depends on the quality that is produced by an algorithm. This complicates the picture, since we want a performance measure at hand to determine whether the algorithm shows good or bad performance in the first place.
5.4 Summary
This chapter presents the first systematic approach to examining the usefulness of performance measures in time-dependent non-stationary environments. The goals of an adaptation process are discussed in detail, and accuracy, stability, and reactivity are proposed as key characteristics. Existing performance measures from the literature are reviewed and a new window based performance measure is proposed. On a wide set of dynamic problems, the measures are examined for an algorithm with high accuracy and an algorithm generating low accuracy. Altogether, the best fitness value proves to be the best performance measure for problems with moderate fitness severity—deficiencies exist for problems without coordinate transitions and as a basis for recovery measures. In the latter case, the window based measure exhibits a superior performance.
The mapping between problem classes and performance measures appears to be difficult. However, certain problematic issues could be derived, for example, for instances of class 3. Moreover, the suitability of a performance measure seems to depend in addition on the quality of the approximation algorithm, which raises many new questions concerning the measurement of algorithms in dynamic environments.

Furthermore, two rather basic concerns can be derived from this examination. First, a good methodology to examine performance measures is necessary. And second, the use of rather general problem generators should be reconsidered. The results for the instances of class 3 and the more selectively determined instances of class 3 with two hills show that there is a huge variance in the behavior of the different instances. Basically, this shows that problem generators are only useful if every aspect of the dynamics can be calibrated somehow using the parameters of the generator. This probably justifies the very strict definition of the problem framework in the previous chapter and underlines the need for such a classification.
CHAPTER 6
Techniques for Dynamic Environments
This chapter provides an overview of the techniques used to tailor evolutionary algorithms for dynamic environments—some of the techniques are inherent in certain standard algorithms. For this purpose, the major publications on the topic are reviewed. The techniques are classified using the following distinction:
• restarting techniques where the optimization is started again from scratch,
• local variation to adapt to changes in the environment,
• memorizing techniques where previous solutions (or parts of previous solutions) are reintroduced into the population,
• diversity preserving techniques to avoid the loss of adaptability,
• adaptive and self-adaptive techniques,
• algorithms with overlapping generations,
• learning of the dynamics rules, and
• non-local encoding.
There are many algorithms that use various mechanisms and, therefore, can be
assigned to more than one class.
The goal of this overview is to achieve a first mapping of the different techniques
to the problem classes and problem properties identified in Chapter 4. As a consequence, this chapter uses primarily references where the properties of the problem
framework can be identified easily in the considered application. This is for example not the case in rescheduling problems or dynamic job shop scheduling (e.g.
Biegel & Davern, 1990; Bierwirth et al., 1995; Bierwirth & Mattfeld, 1999; Hart &
Ross, 1998). In those problems the properties of the dynamics are not as obvious
and can even depend on the used representation and the operators.
6.1 Restarting
An intuitive approach to deal with premature convergence is to restart the optimizer as soon as a change in the environment or convergence in the population is observed. However, it is advisable to initialize the population at the restart with a few individuals of the previous optimization run to guarantee continuity (e.g. in anytime learning, Grefenstette & Ramsey, 1992).
Although there is a broad range of applications using restarting, most of the problems are hardly categorizable within the proposed framework (e.g. Ramsey & Grefenstette, 1993; Pipe, Fogarty, & Winfield, 1994). Only the work by Vavak et al. (1998) examines restarting on a moving hills problem which has no fitness rescalings, varying coordinate severity, a frequency of change between 24⁻¹ and 6⁻¹, and is highly inhomogeneous since only the maximum hill is changing its position. The examination shows that restarting is a feasible option if the frequency is less than 15⁻¹, where this value decreases with increasing coordinate severity.

Since restarting leads to a completely new optimization, there should be no influence of fitness rescalings or scaling factors. Therefore, the results of Vavak et al. (1998) should be valid for almost all problems.
6.2 Local variation
If only slight modifications occur, it is sensible to use local operators to create offspring in the region of the current individuals. This is accomplished by standard mutation operators in evolution strategies with the lognormal update rule (see Bäck, 1997, 1998; Salomon & Eggenberger, 1997) and in evolutionary programming with the additive Gaussian update rule (see Angeline, 1997; Saleem & Reynolds, 2000). However, in genetic algorithms there is no standard operator searching within a phenotypic neighborhood of the real-valued search space. Therefore, a special variable local search operator has been introduced by Vavak et al. (1996a, 1996b) and Vavak et al. (1997, 1998), where a local search phase is started in which at first only bits of low order are changed and, if success fails to materialize, the search range is extended by higher order bits. The multinational GA by Ursem (2000) promotes intense local variation by linking the mutation rate to the distance of the individual to a special representative of the subpopulation (called “nation”). The closer an individual is to its representative, the smaller the used mutation rate. This leads to an intense search around the representative.
Most problems where local variation has been used are characterized by no fitness rescaling and rather small coordinate severity. There are drifting landscapes (Angeline, 1997; Bäck, 1998; Salomon & Eggenberger, 1997; Saleem & Reynolds, 2000), repetitive tracking (or rotating) problems (Angeline, 1997; Bäck, 1998), and random drifting (Angeline, 1997; Bäck, 1998). However, the work of Saleem and Reynolds (2000) shows in the context of a moving hills problem that increasing severity or decreasing frequency of change handicaps algorithms using local variation. Very big (unpredictable) severity values lead to bad performance. As a consequence, successful reports on problems with rather big severity come along with a very low frequency of change (Bäck, 1997). This is confirmed by the study of Vavak et al. (1996a, 1996b), which shows that variable local search is only better than a diversity increasing technique (e.g. hypermutation) if there is a rather small coordinate severity in the problem.
The multinational GA by Ursem (2000) is the only known application of local
variation to an alternating two hills problem with fitness rescaling, small severity
values, and the meta dynamics following sinusoidal curves, straight lines as well as
circles.
6.3 Memorizing previous solutions
If the time-dependent changes in the environment create landscapes that are very
similar to previous landscapes, it might be sensible to memorize previous solutions
to guide the evolutionary search to those regions in the search space. There are two
different concepts for memorization in the literature.
6.3.1 Explicit memory
In the case of explicit memory there is a storage for previous candidate solutions
that can be reevaluated or reinserted into the population if a change in the environment occurs. Mori et al. (1997) and Branke (1999c) use an external population
where the individuals are gathered. Kirley and Green (2000) add an external memory to a fine-grained parallel population model. Trojanowski and Michalewicz
(1999b) extend each individual by a FIFO queue where the ancestors are stored.
When a change in the environment occurs, all candidate solutions are reevaluated
and the best may switch positions with the current individual if it is better. And
last, the cultural algorithm of Saleem and Reynolds (2000) uses a memory for its
situational knowledge.
Those approaches that rely primarily on explicit memory are applied to alternating problems. The test function of Trojanowski and Michalewicz (1999b) alternates between 8 different component functions (swelling and shrinking), and Mori et al. (1997) use a recurrently varying knapsack problem where three different weight limits alternate. In both examples the frequency of change is rather high. Branke (1999c) applied the memorizing technique to a moving hills problem where there are fitness rescalings and coordinate transformations with small severity (random drift). Also, the cellular genetic algorithm by Kirley and Green (2000) is improved drastically by an external memory if applied to a problem alternating between two states. However, the performance of this algorithm on drifting or randomly drifting problems is not satisfactory. Saleem and Reynolds (2000) applied their cultural algorithm to moving hills problems drifting with small or medium severity or a rather chaotic behavior. However, the memory mechanism is only one aspect of this mixture of techniques.
6.3.2 Implicit memory
Contrary to explicit memory, implicit memory does not memorize complete candidate solutions that are reinserted into the population but rather provides a mechanism where parts of previous solutions still participate in the evolutionary process without undergoing the selective pressure. The major techniques are diploid or polyploid representations, where either certain recessive information persists in the individual depending on a dominance mechanism, or an encoded switch is used to fade out and activate (parts of) candidate solutions. Diploid algorithms with dominance mechanisms have been used by Goldberg and Smith (1987), Smith and Goldberg (1992), Hadad and Eick (1997), Lewis et al. (1998), and Ng and Wong (1995). Ryan introduced several modifications like additive diploidity (Ryan, 1996), forced mutation (Ryan, 1997), and perturb operators (Ryan & Collins, 1998). Also, Ryan and Collins (1998) introduced a triploid scheme called shades. As a generalization of diploidity, polygenic inheritance was introduced (Ryan, 1996). Hadad and Eick (1997) developed a tetraploid scheme, and Dasgupta and McGregor (1992) and Dasgupta (1995) presented a diploid algorithm with switch bits, called structured GA.
All these algorithms have been applied primarily to recurrently varying knapsack problems, i.e. alternating, repetitive problems with either high frequency changes (e.g. 15⁻¹; Goldberg & Smith, 1987; Smith & Goldberg, 1992; Dasgupta & McGregor, 1992; Ng & Wong, 1995) or low frequency changes (e.g. 1500⁻¹; Lewis et al., 1998). Most problems oscillate between two different weight limits. Exceptions are three weight limits in the work of Dasgupta and McGregor (1992) and random new weight limits in one application of Lewis et al. (1998). The latter work also showed implicitly, by the definition of a general knapsack problem, that this problem is equivalent to pattern tracking. Therefore, the pattern tracking problem considered by Ryan (1996) adds nothing new to the problem classes above. There is only one application to a drifting problem that is repetitive in a more restricted way (Dasgupta, 1995).
6.4 Preserving diversity
As was pointed out in Section 2.3, a huge fraction of the technique-oriented research focuses on the maintenance of diversity to preserve the ability of an algorithm to adapt to a dynamic environment. As a consequence, there is a wide variety of techniques in the literature to support diversity in the population. Here, diversity increasing techniques, niching techniques to prevent convergence, and restricted mating are distinguished.
6.4.1 Diversity increasing techniques
One very basic mechanism to increase diversity is to introduce random individuals into the population each generation. This method is referred to as random immigrants, partial hypermutation, or hypermutation with a fixed rate (e.g. Grefenstette, 1992, 1999; Cobb & Grefenstette, 1993). The triggered hypermutation only introduces new individuals when the performance, usually measured as best-of-generation fitness over 5 generations, worsens. In addition, the individuals are not completely random but obtained by a mutation with a severely increased mutation rate (e.g. Cobb, 1990; Cobb & Grefenstette, 1993; Vavak et al., 1996a, 1996b; Morrison & De Jong, 2000); a small sketch is given below. Grefenstette (1999) also examined a self-adaptive hypermutation rate, and Smith and Vavak (1999) introduced hypermutation to algorithms with overlapping populations. Instead of using a hypermutation operator, Collard et al. (1996, 1997) introduced a dual GA, where an additional bit in the individual determines whether all bits are inverted when decoding the individual. In addition, a mirroring operator inverts complete individuals. Based on this algorithm, a set of different techniques was proposed. The dreamy GA restricts recombination between normal and inverted individuals during certain phases of the evolutionary process (Escazut & Collard, 1997). The folding GA separates the individuals into segments by introducing introns as separators (or meta-genes). Those separators determine whether the subsequent segment is inverted during decoding (Gaspar & Collard, 1997). Eventually, the dual sharing GA extends the dual GA by a mechanism to adapt the mirroring rate on the basis of the ratio of inverted individuals (Gaspar & Collard, 1999b).
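A minimal sketch of triggered hypermutation for a bit-string GA (illustrative code; the rates echo the setup of Section 5.3.2 but are placeholders here, and the trigger condition is a simplified reading of the description above):

    import random

    def mutate(bits, rate):
        # Standard bit-flip mutation with a per-bit probability.
        return [1 - b if random.random() < rate else b for b in bits]

    def current_mutation_rate(best_history, base_rate=1 / 32,
                              hyper_rate=0.2, window=5):
        # Triggered hypermutation: use the severely increased rate when the
        # best-of-generation fitness has worsened over the last `window`
        # generations, otherwise fall back to the base rate.
        recent = best_history[-window:]
        worsened = len(recent) == window and recent[-1] < recent[0]
        return hyper_rate if worsened else base_rate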
Most applications of diversity increasing techniques use problems without fitness rescaling but with non-repetitive coordinate transitions. Usually the frequency of change is rather big—between 20⁻¹ (Cobb & Grefenstette, 1993; Grefenstette, 1992; Smith & Vavak, 1999; Vavak et al., 1996a, 1996b; Grefenstette, 1999) and 1 (Collard et al., 1996, 1997; Gaspar & Collard, 1997; Grefenstette, 1999; Morrison & De Jong, 2000). However, the severity can be rather small or rather big, as in the study of Grefenstette (1999), or it is even varying within one problem (Cobb & Grefenstette, 1993; Collard et al., 1996, 1997; Gaspar & Collard, 1997). The pattern tracking problem of Escazut and Collard (1997), Collard et al. (1997), and Gaspar and Collard (1997, 1999b) has quite similar dynamics. There is just one application where the problem is in addition returning to previous optimum positions in the search space (Cobb, 1990).

In three publications, diversity increasing techniques are used to tackle alternating problems without coordinate transitions (Cobb & Grefenstette, 1993; Smith & Vavak, 1999; Morrison & De Jong, 2000). In the first two cases the frequency of change is 20⁻¹ resp. 200⁻¹, where each change equals an alternation between component functions. The latter publication has a frequency of change of 1, leading to rather irregular alternations.
6.4.2 Niching techniques
In the literature, there is a wide variety of sharing methods as niching techniques available. However, surprisingly, most techniques are not yet used in the context of dynamic optimization (the author is aware of only one publication using a traditional sharing method, by Ursem, 2000). Cedeño and Vemuri (1997) introduced the multi-niche crowding GA, where a combination of mating and environmental replacement guarantees niching. Also, Liles and De Jong (1999) combined a restricted mating mechanism with sharing. In the thermodynamical GA by Mori et al. (1996, 1998), a diversity measure is used in the selection to reach a good distribution over the search space. As already mentioned, this technique was also combined with a memory (Mori et al., 1997).

All applications are alternating problems. Both Cedeño and Vemuri (1997) and Liles and De Jong (1999) use a swelling and shrinking problem without coordinate translations. Ursem (2000) adds coordinate translations with small severity in his study. The examinations of Mori et al. (1996, 1997, 1998) use dynamic knapsack problems which are also alternating.
6.4.3 Restricted mating
This set of techniques preserves diversity by dividing the population into several “subpopulations” and restricting the recombination operator to individuals of the same subpopulation. Two approaches using common distributed algorithms are the cellular GA with a toroidal grid as topology (Kirley & Green, 2000) and the spatially distributed GA on a 15×15 grid (Sarma & De Jong, 1999). The tagged GA by Liles and De Jong (1999) assigns each individual a tag and restricts mating to individuals with equal tags. In the multinational GA by Ursem (2000), each individual is assigned to a subpopulation called nation. A valley detection can reassign individuals to other or new nations.

Two of the applications are alternating swelling and shrinking problems: Liles and De Jong (1999) use constant coordinates, and Ursem (2000) uses coordinate translations with small severity values. The other problems have no fitness rescalings but varying coordinate severity and a frequency of change of 20⁻¹ (Sarma & De Jong, 1999; Kirley & Green, 2000).
6.5 Adaptive and self-adaptive techniques
Adaptation and self-adaptation are very popular and successful in static optimization. Therefore, it is sensible to apply those techniques to a non-stationary environment, where also the characteristics of the search space keep changing. Adaptive and self-adaptive EP has been the focus of a publication of Angeline (1997). Self-adaptive EP has also been investigated by Saleem and Reynolds (2000). Bäck (1997, 1998) and Salomon and Eggenberger (1997) examine self-adaptive ES with the lognormal update rule. Genetic algorithms with a self-adaptive mutation rate have been the target of work by Bäck (1997, 1999) and Grefenstette (1999). The latter publication also considered a self-adaptive hypermutation rate.

An adaptive technique was used in the thermodynamical GA by Mori et al. (1998) to update a parameter, called the temperature.
Since most adaptation techniques are strongly related to local variation, the problems have almost similar properties. Most problems are again characterized by missing fitness rescaling and rather small coordinate severity. There are drifting landscapes (Angeline, 1997; Bäck, 1998; Salomon & Eggenberger, 1997; Saleem & Reynolds, 2000), rotating problems (Angeline, 1997; Bäck, 1998), and randomly drifting problems (Angeline, 1997; Bäck, 1998).

Applications on problems with rather big severity values usually have a very low frequency of change (Bäck, 1997, 1999). Also the non-repetitive dynamic knapsack problem used in the work of Mori et al. (1998) has a frequency of change of 0.01.
6.6 Algorithms with overlapping generations
Algorithms with overlapping populations have been a research topic especially in
the context of genetic algorithms (steady state GA). Vavak and Fogarty (1996) were
the first to use a steady state GA on a dynamic problem. Smith and Vavak (1999)
examined various replacement strategies. They show that the steady state model is
often better suited to dynamic problems than the generational model. However, the
replacement strategy must be chosen appropriately to guarantee good performance.
The empirical investigation of several strategies showed that deletion of the oldest
or a random individual leads to poor performance. A good strategy seems to be a
modification of the deletion of the oldest: each the parents is selected using a binary
tournament beween a random individual and the oldest individual in the population.
This scheme guarantees reevaluation of individuals and an implicit mechanism for
elitism. Cedeño and Vemuri (1997) use a worst among most similar replacement
method. A rather sophisticated adaptive replacement strategy was introduced by
Dozier (2000) for a path planning problem. A comparable approach for a specific
application was also presented by Stroud (2001).
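To make the tournament-based scheme concrete, the following is a minimal Python sketch of the parent selection just described; the dictionary-based individuals, the fitness callback, and the tie handling are assumptions for illustration, not details taken from Smith and Vavak (1999).

    import random

    def pick_parent(population, fitness):
        # Binary tournament between the oldest individual and a uniformly
        # drawn one; the oldest is thereby reevaluated regularly, which
        # yields an implicit elitism in a changing environment.
        oldest = max(population, key=lambda ind: ind["age"])
        rival = random.choice(population)
        return oldest if fitness(oldest) >= fitness(rival) else rival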
The problems tackled in these publications are very distinct. Since the method of
overlapping generations does not seem to depend on the kind of the dynamics and
there are only a few general publications, the classification of the problems is omitted
here.
6.7 Non-local encoding
Standard genetic algorithms are also often applied to dynamic problems, usually to demonstrate the superiority of a new technique. However, there are few studies where genetic algorithms are the main topic of an examination, e.g. the publications of Salomon and Eggenberger (1997), Stanhope and Daida (1998), and Vavak and Fogarty (1996).
Based on the results pre-published by Weicker and Weicker (2000), it can be deduced that the good results of GAs on certain dynamic problems are due to the encoding of the search space rather than to some parallel schema/hyperplane search. The general details of the examination are summarized in Section 8.1. The particular genetic algorithm uses standard binary encoding of each search space dimension with 16 bits, population size 100, crossover rate 0.6, mutation rate 1/(number of bits), and fitness proportional selection with linear scaling.
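As an illustration, this configuration can be sketched in a few lines of Python; interpreting the mutation rate as the reciprocal of the total genome length and the decoding bounds -5.12..5.12 are assumptions made here, not specifications taken from the experiments.

    import random

    BITS_PER_DIM = 16                       # standard binary encoding
    POP_SIZE = 100
    P_CROSSOVER = 0.6
    P_MUTATION = 1.0 / (2 * BITS_PER_DIM)   # 1/(number of bits), 2 dimensions

    def decode(bits, lower, upper):
        # standard (non-Gray) binary decoding of one search space dimension
        value = int("".join(map(str, bits)), 2)
        return lower + value * (upper - lower) / (2 ** len(bits) - 1)

    genome = [random.randint(0, 1) for _ in range(2 * BITS_PER_DIM)]
    x = decode(genome[:BITS_PER_DIM], -5.12, 5.12)   # bounds assumed
    y = decode(genome[BITS_PER_DIM:], -5.12, 5.12)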
The results on a rotating Rastrigin function are shown in Figure 6.1 and demonstrate a continual convergence of the GA towards the optimum. Only in the case of very fast rotations is the performance slightly worse. This result is particularly interesting since the evolution strategy based on local variation is not able to converge to the optimum (see Section 8.1). Apparently the good performance is not due to recombination of "building blocks" since the GA without recombination always outperforms the standard GA. Therefore the advantage seems to lie in the different neighborhood structure induced by a binary encoding, which supports a very diffuse neighborhood for each point in the search space. However, small deteriorations after each quarter cycle of the rotation show that the encoding also has its disadvantages. Presumably due to huge Hamming cliffs, the algorithm has severe problems following the currently best known fitness region.
In our experiments only one example could be found where recombination has a statistically significant positive effect on the behavior of a genetic algorithm. The experiments use a rotating cone segment, and the cycle time of the rotation is very fast, i.e. after 5 generations the cycle is complete. The details of the experiments are again described in Section 8.1. Figure 6.2 shows the results. Apparently the recombination helps to generalize from five different alternating landscapes with a common optimum. This does not result in a significant advantage concerning the fitness values but in a significantly shorter distance to the optimum.
[Figure 6.1: four pairs of panels plot the distance to the optimum and the fitness over 200 generations for cycle times 5, 50, 100, and 200, comparing the standard GA with the mutation-only GA ("GA" vs. "GA, mut.").]
Figure 6.1 Rotating Rastrigin function optimized by genetic algorithms. The experimental setup is explained in detail in Section 8.1.1.
[Figure 6.2: panels plot the average and best distance to the optimum and the best fitness over 200 generations for cycle time 5, with markers for the significance of recombination vs. no recombination ("GA" vs. "GA, mut.").]
Figure 6.2 Rotating cone segment with fast severity (5 generations for one complete cycle) optimized by genetic algorithms. The Student's t-test shows that the recombination improves the performance of the GA on the rotating cone segment with cycle time 5 (as far as the distance to the optimum is concerned). The experimental setup is explained in detail in Section 8.1.1.
However, the properties of this dynamic problem are probably closer to those of a noisy function than to a usual dynamic problem.
Therefore, a conclusive indication for the usefulness of schema processing in dynamic environments is still missing.
6.8 Learning of the underlying dynamics
Algorithms that predict the next changes in a non-stationary environment are promising. In a very broad sense we could even consider self-adaptive mechanisms (e.g. Angeline, 1997; Bäck, 1998) as one possible technique to derive knowledge concerning the dynamics. However, we will argue in the next chapter (Section 7.8.2) that the standard self-adaptation techniques are not suited to adapt a step size parameter in all possible dynamic situations. Also the cultural algorithms as they are used by Saleem and Reynolds (2000) are not suited to derive knowledge on the dynamics; similar to self-adaptation, they are only concerned with useful parameter values.
I am only aware of one consistent approach, namely the work by Munetomo et al. (1996), where the derivation of knowledge concerning the dynamics is realized. The environment is modeled as a stochastic learning automaton. As a consequence this approach is also only applicable to a rather narrow class of problems. In this thesis a technique to derive the meta-rule of the dynamics of drifting problems is proposed in Section 8.4.
6.9 Resulting Problem-Techniques Mapping
Based on the above analysis of existing applications of dynamic optimization techniques to non-stationary problems, a first mapping between problem characteristics and techniques can be derived. Figure 6.3 shows the resulting map. Due to the rather cursory character of the literature analysis and the difficulties of categorizing problems described with varying degrees of detail, the result is only a first impression of how such a mapping could look.
Certain compromises have been made when drawing the table in Figure 6.3. For example, the entry of implicit memory in the case of static fitness and rotating coordinate transformations is derived from a problem with quite similar properties stemming from a sinusoidal translation in one search space dimension.
The dynamic problem classes for which certain techniques are probably well suited (or even designed) can be seen in the figure. However, the resulting Figure 6.3 should not be considered as an overall truth since it only reflects where researchers have applied a certain technique successfully. Therefore, a missing entry does not imply that a technique should not be used in a specific context. As a consequence, a detailed interpretation of the resulting problem-techniques mapping is omitted here.
[Figure 6.3: a table mapping coordinate transformation classes (static, drifting, rotating, random drift, fast drift, superposed, chaotic) and fitness rescaling classes (static, unaffecting, swell/shrink, random, returning, chaotic) to techniques such as local variation, restricted mating, diversity increasing, niching, implicit and explicit memory, and non-local encoding, annotated with typical frequencies of change.]
Figure 6.3 A first mapping between problem characteristics and techniques based on a cursory analysis of the literature.
6.10 Discussion
This chapter demonstrates how the framework of Chapter 4 can be used to integrate the research of manifold publications into a common knowledge basis. Also, the necessity of a systematic study to create a complete overview of techniques and problem classes becomes obvious. As soon as the different gaps of Figure 6.3 are filled in, the map might serve as a decision criterion for the application or development of algorithms for certain problems.
In addition, future work should consider in depth the different reasons why certain techniques are applied in a specific domain. There are three different tasks an evolutionary algorithm might face in a dynamic environment, namely
• the tracking of "good" regions in the search space,
• the ability to manage an inhomogeneous dynamic search space, and
• the task to find an optimum in a changing environment.
We can expect that different techniques are sensible for the three tasks in different problem classes. Probably this is a good explanation for the success of many approaches that mix several of the previously described techniques (Ursem, 2000; Saleem & Reynolds, 2000).
CHAPTER 7
Analysis of Local Operators for Tracking
The literature analysis of the previous chapter has shown that local operators are primarily used for problems with a drifting character. However, there is no advice in the literature on how to choose and tailor a local operator to a specific problem. This chapter analyzes an exemplary local operator concerning the parameter calibration and the limits of its performance using a theoretical model. The findings of the analysis result in a set of design rules that should simplify the development of new evolutionary algorithms for tracking tasks.
The local operator analyzed in this chapter takes up essential properties of probably the most prominent and successful local operator, the evolution strategy mutation (see Section 2.2.2). This operator is characterized by the following principles:
1. zero-mean: on average the mutation is neutral, i.e. an object variable may be increased with the same probability as it may be decreased,
2. small changes occur with a higher probability than big changes,
3. each point in the search space is reachable by a mutation in one step with a probability greater than zero,
4. mostly a Gaussian distribution is used, but there have also been works with Cauchy (Yao & Liu, 1996) and Laplace (Montana & Davis, 1989) distributions, and
5. self-adaptive techniques are used to adjust the operator to the fitness landscape.
For principles (1), (2), and (4) see also the book by Schwefel (1977). Principle (3) is of essential importance when global convergence in static fitness landscapes is to be shown (see Rudolph, 1997).
The local operator in this chapter is defined on the discrete search space Z × Z in order to simplify the examinations. The operator is also zero-mean, prefers small changes over big changes, and uses the binomial distribution, which is closely related to the Gaussian distribution. The reachability of any point in the search space by one application of the mutation operator is omitted here since the focus is on the local tracking of an optimum. Issues like escaping from a local optimum are not considered because the area of interest in the search space is very narrow and the probability of hitting a moving target by random probes can be ignored from a practical viewpoint. Also, self-adaptation is difficult to model in our theoretical framework. As a consequence the major part of the analysis is done with a fixed but arbitrary value for the step size parameter.
Note that the focus is on an analysis of a local operator in a low-dimensional search space that is as exact as possible. This is motivated by the dimensionality of most existing moving peaks problems. Also, rather complex problem dynamics where drifting is only one aspect are presumably only manageable in few dimensions. For simple high-dimensional problems, results may be obtained using the sphere model from ES theory (Beyer, 2001). Arnold and Beyer (2002) presented a first result for a randomly drifting peak.
This chapter is organized as follows. Section 7.1 presents a few basic definitions and derives two different Markov chain models for the optimization of a unimodal problem with the local mutation operator. In Section 7.2 a worst-case analysis is used to derive minimal requirements for a successful tracking behavior. Optimal parameter settings concerning the tracking rate are investigated in Section 7.3. In the next two sections, two of the underlying principles of ES mutation are questioned within the context of tracking problems: the zero-mean mutation in Section 7.4 and the preference of small steps over big steps in Section 7.5. The influence of the population size on the performance in a non-stationary problem is examined in Section 7.6. The combination of a local operator with a memorizing technique is analyzed in Section 7.7. Section 7.8 discusses several issues of
self-adaptation and adaptation related to the considered problem. Finally, Section 7.9 summarizes and concludes.
7.1 Theoretical framework
For the major part of this analysis, a unimodal problem is assumed where the fitness corresponds to the distance to the optimum. As the search space, the discrete two-dimensional space Z × Z is chosen. The distance within the model is defined as the number of vertical and horizontal crossings of raster boundaries of the search space.
Definition 7.1 (Distance metric) For two points $B = (B_1, B_2)$ and $C = (C_1, C_2)$ in the search space Z × Z the distance is defined as
$$\mathrm{dist}(B, C) = d(B_1, C_1) + d(B_2, C_2) \qquad (7.1)$$
with $d(x, y) = |x - y|$. ♦
This distance metric is used to define the non-stationary optimization problem. The dynamics are introduced by moving the optimum horizontally as stated in the following definition.
Definition 7.2 (Tracking problem) The tracking problem is defined by the starting position of the optimum $A^*(0)$, the severity of the dynamics $s$, and the tracking tolerance $\delta \in \mathbb{N}$. The position of the optimum at generation $t \in \mathbb{N}_0$ is defined as
$$A^*(t + 1) = A^*(t) + S \qquad (7.2)$$
with $S = (s, 0)^T$. The task for an optimization algorithm ALG is to produce at each time step $t$ a tolerable approximation $A(t)$ for the position of the optimum:
$$\mathrm{dist}(A(t), A^*(t)) \le \delta \qquad (7.3)$$
♦
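The definition translates directly into code; the following tiny Python sketch merely echoes the values s = 6 and δ = 2 of Figure 7.1.

    def optimum_position(t, start=(0, 0), s=6):
        # A*(t) = A*(0) + t * S with S = (s, 0)^T  (Equation 7.2)
        return (start[0] + t * s, start[1])

    def tolerable(A, A_star, delta=2):
        # tracking condition dist(A(t), A*(t)) <= delta  (Equation 7.3)
        return abs(A[0] - A_star[0]) + abs(A[1] - A_star[1]) <= delta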
Figure 7.1 illustrates the definition of the tracking problem. For severity 6 the
optimum and the respective tolerable points with a tracking tolerance 2 are shown.
To simplify the following computations and notations the number of points within
a distance to a given point is introduced.
[Figure 7.1: diagram of the optimum moved horizontally by severity 6, with the distance of each surrounding square to the previous optimum marked.]
Figure 7.1 The gray point marks the position of the optimum in the last generation, which has now been moved horizontally by six steps. The region on the right denotes those points which are within a distance 2 of the new optimum's position. The numbers in the squares denote the distance to the previous optimum.
Definition 7.3 (Number of points) For an arbitrary point $B$ in the search space Z × Z, the number of points at distance $d$ and within distance $d$, respectively, are denoted by
$$N_B(d) = \#\{C \in \mathbb{Z} \times \mathbb{Z} \mid \mathrm{dist}(B, C) = d\}$$
$$\hat{N}_B(d) = \#\{C \in \mathbb{Z} \times \mathbb{Z} \mid \mathrm{dist}(B, C) \le d\}$$
♦
Note that, according to the following lemma, the numbers are independent of the point $B$. As a consequence the subscript $B$ is usually dropped ($N(d)$ and $\hat{N}(d)$).
Lemma 7.1 For any $B \in \mathbb{Z} \times \mathbb{Z}$:
$$N_B(d) = \begin{cases} 4d, & \text{if } d > 0 \\ 1, & \text{otherwise } (d = 0) \end{cases} \qquad (7.4)$$
$$\hat{N}_B(d) = 2d(d + 1) + 1 \qquad (7.5)$$
♦
Proof: For all points $C$ with $\mathrm{dist}(B, C) = d$, it follows from Definition 7.1 that
$$|B_1 - C_1| + |B_2 - C_2| = d.$$
Then there exists $0 \le a \le d$ with $|B_1 - C_1| = a$ and $|B_2 - C_2| = d - a$. All possible first coordinates of $C$ can be described by $C_1 \in \{B_1 - d, \ldots, B_1 + d\}$. Then, the following values $C_2$ result immediately:
• $C_2 = B_2$ if $C_1 \in \{B_1 - d, B_1 + d\}$ and
• $C_2 \in \{B_2 - (d - a), B_2 + (d - a)\}$ for $C_1 \in \{B_1 - a, \ldots, B_1 + a\}$ and $0 \le a \le d - 1$.
The proof for $N_B(d)$ is finished by counting all points. Also $\hat{N}_B(d)$ follows immediately:
$$\hat{N}_B(d) = \sum_{i=0}^{d} N_B(i) = N_B(0) + \sum_{i=1}^{d} N_B(i) = 1 + 4\sum_{i=1}^{d} i = 1 + 4\,\frac{d(d + 1)}{2}$$
q.e.d.
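The two counting formulas are easy to confirm by brute-force enumeration; the following sketch is a pure numerical sanity check of Lemma 7.1.

    def points_at(d):
        # enumerate {C | dist(0, C) = d} inside a sufficiently large box
        rng = range(-d - 1, d + 2)
        return [(x, y) for x in rng for y in rng if abs(x) + abs(y) == d]

    for d in range(6):
        n = len(points_at(d))
        n_hat = sum(len(points_at(i)) for i in range(d + 1))
        assert n == (4 * d if d > 0 else 1)     # Equation (7.4)
        assert n_hat == 2 * d * (d + 1) + 1     # Equation (7.5)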
In the framework of this examination, a simple local search algorithm using a (1, λ)-selection strategy is investigated, i.e. λ new offspring are created and the best offspring replaces the current individual. The mutation operator is defined on the discrete search space by a binomial pdf mimicking the ES mutation with its Gaussian pdf. How the binomial pdf is used in the mutation is illustrated in Figure 7.2.
Definition 7.4 (Local mutation) A local mutation, applied to individual $A$, results in $B = A + X$ where $X \in \mathbb{Z} \times \mathbb{Z}$ is a random variable. The respective probability density function on the discrete search space is defined as
$$\Pr_p[X = x] = \begin{cases} \dfrac{p(\mathrm{dist}(\vec{0}, x))\, N(\mathrm{dist}(\vec{0}, x))^{-1}}{\sum_{C \in R} p(\mathrm{dist}(\vec{0}, C))\, N(\mathrm{dist}(\vec{0}, C))^{-1}}, & \text{if } x \in R \\[1mm] 0, & \text{otherwise} \end{cases} \qquad (7.6)$$
where $R = \{C \mid \mathrm{dist}(\vec{0}, C) \le maxstep\}$ with maximal step size $maxstep$ and a basic one-dimensional pdf $p$.
The local mutation operator is defined by $p = p_{local}$ using the following binomial distribution for $0 \le d \le maxstep$:
$$p_{local}(d) = \frac{1}{2^{2\,maxstep}} \binom{2\,maxstep + 1}{maxstep - d} \qquad (7.7)$$
♦
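The definition can be implemented in a few lines of Python. Since $p_{local}$ is already normalized over the distances 0, ..., maxstep, the denominator of Equation 7.6 equals 1, and the mass of each distance class is simply spread uniformly over the $N(\delta)$ points of its ring; this sketch makes that construction explicit.

    import math

    def p_local(delta, maxstep):
        # Equation (7.7): binomial distribution over the step distances
        return math.comb(2 * maxstep + 1, maxstep - delta) / 2 ** (2 * maxstep)

    def mutation_pdf(maxstep):
        # Equation (7.6): spread p_local(delta) uniformly over the ring of
        # N(delta) points at distance delta (Lemma 7.1: N(0) = 1, N(d) = 4d)
        pdf = {}
        for delta in range(maxstep + 1):
            n = 4 * delta if delta > 0 else 1
            for a in range(delta + 1):
                b = delta - a
                for pt in {(a, b), (a, -b), (-a, b), (-a, -b)}:
                    pdf[pt] = p_local(delta, maxstep) / n
        return pdf

    print(sum(mutation_pdf(10).values()))   # 1.0 up to rounding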
Figure 7.2 This figure illustrates the usage of the binomial pdf to define the mutation operator. The pdf is used to assign a probability to each point for being the result of a mutation. It resembles the principles of evolution strategies.
Figure 7.3 shows an example of the resulting probability density function.
[Figure 7.3: 3-d plot of the mutation pdf over the range -10..10 in both dimensions.]
Figure 7.3 Resulting probability density function for the local standard mutation.
7.1.1 Exact Markov chain model
In this section, the dynamics of the described local search algorithm applied to the tracking problem are modeled exactly using a Markov chain. This model is used to derive exact results concerning the tracking behavior.
In this model, as well as in the worst-case model described in the next section, the model is simplified considerably by keeping the optimum at a stationary point $A^* = \vec{0} = (0, 0)^T$ in the search space. The original problem's movement of the optimum is transformed into a negative movement of the current best approximation. Both descriptions of the tracking problem are equivalent to each other. They are shown
in Figure 7.4. A similar modeling approach was chosen independently by Droste (2002).
[Figure 7.4: two diagrams, "moving optimum" ($A + X$, with the optimum $A^*$ moving by $S$) and "model with static optimum" ($A - S + X$, with the optimum fixed at $A^*$).]
Figure 7.4 The left diagram shows the situation according to the problem definition: the optimum $A^*$ moves according to $S$, and the best approximation $A$ moves according to $X$. However, the Markov chain model is easier to define if the optimum stays at the same position and the negative severity $S$ is applied to the approximation $A$, as shown in the right diagram.
The states of the Markov chain are defined as the relative position of the current best solution candidate to the optimum. However, only a window around the optimum is considered, i.e. those points in the search space that have a distance greater than $radius$ to the optimum are unified in an absorbing state $absorb$:
$$States = \{A \in \mathbb{Z}^2 \mid \mathrm{dist}(A, \vec{0}) \le radius\} \cup \{absorb\}.$$
The limitation of the number of states is necessary to keep the simulation of the EA dynamics feasible. The probability of being in the absorbing state is an indicator of whether the algorithm stays close to the optimum $A^*$ and whether other derivations from those simulations are accurate.
In the subsequent paragraphs, for an arbitrary point $A$ in the search space, the transition probability is derived that the best individual of λ randomly created offspring is positioned at $A + X - S$. Again $X$ equals the effect of the mutation and $S = (s, 0)^T$ is the change of the optimum from one generation to the next.
As a first step, the probability is computed that an offspring has the exact distance δ to the new optimum. This probability results as
$$\Pr_p[\mathrm{dist}(A + X - S, \vec{0}) = \delta] = \sum_{x \in \mathbb{Z} \times \mathbb{Z},\ \mathrm{dist}(A + x - S, \vec{0}) = \delta} \Pr_p[X = x].$$
Note that the $x$ with $\Pr[X = x] > 0$ are described by the set $R$ in Definition 7.4. The probability to be further away than distance δ follows as
$$\Pr_p[\mathrm{dist}(A + X - S, \vec{0}) > \delta] = \sum_{i = \delta + 1}^{\mathrm{dist}(A - S, \vec{0}) + maxstep} \Pr_p[\mathrm{dist}(A + X - S, \vec{0}) = i] = 1 - \sum_{i = 0}^{\delta} \Pr_p[\mathrm{dist}(A + X - S, \vec{0}) = i].$$
This probability may be used to compute the probability that the best of λ offspring of parent $A$ is placed at $A + X - S$.
Lemma 7.2 When creating λ offspring of parent $A$ under given severity $S$, the probability that the offspring with minimal distance to $A^* = \vec{0}$ is positioned at $A + X - S = A + x - S = y$ with $\mathrm{dist}(y, \vec{0}) = \delta$ is given by
$$\Pr_p[\mathrm{best}_{1 \le j \le \lambda}(A + X_j - S) = y \mid X_1, \ldots, X_\lambda] = \sum_{i=1}^{\lambda} \binom{\lambda}{i} \bigl(\Pr_p[\mathrm{dist}(A + X - S, \vec{0}) > \delta]\bigr)^{\lambda - i} \sum_{k=1}^{i} \frac{k}{i} \binom{i}{k} \bigl(\Pr_p[X = x]\bigr)^k \bigl(\Pr_p[\mathrm{dist}(A + X - S, \vec{0}) = \delta] - \Pr_p[X = x]\bigr)^{i - k}$$
where selection is uniform among several offspring with minimal distance and $X_1, \ldots, X_\lambda$ are independent and identically distributed as $X$. ♦
Proof: The probability that $\lambda - i$ individuals ($0 \le i < \lambda$) are further away than distance δ equals
$$\binom{\lambda}{i} \bigl(\Pr_p[\mathrm{dist}(A + X - S, \vec{0}) > \delta]\bigr)^{\lambda - i}.$$
Then the remaining $i$ offspring individuals have distance δ to the optimum. Now, the probability that $k$ of those offspring individuals ($1 \le k \le i$) are placed at the target spot in the search space equals
$$\binom{i}{k} \bigl(\Pr_p[X = x]\bigr)^k \bigl(\Pr_p[\mathrm{dist}(A + X - S, \vec{0}) = \delta] - \Pr_p[X = x]\bigr)^{i - k}.$$
The probability to choose one of those $k$ individuals uniformly equals $\frac{k}{i}$. And the lemma follows immediately. q.e.d.
Then the exact Markov chain model of the dynamic optimization is given in the following definition.
Definition 7.5 (Exact local Markov chain model) The exact local Markov chain model for severity $S = (s, 0)^T$, a mutation defined by pdf $p$, and λ offspring is given by the tuple $(States, T)$ where
$$States = \{A = (x, y) \in \mathbb{Z}^2 \mid |x| + |y| \le radius\} \cup \{absorb\}$$
and the transition matrix $T = States \times States$ is given by the following equations with $A, B \in States$:
$$T[A \to B] = \Pr_p[\mathrm{best}_{1 \le i \le \lambda}(A + X_i - S) = B \mid X_1, \ldots, X_\lambda]$$
$$T[A \to absorb] = 1 - \sum_{B \in States \setminus \{absorb\}} \Pr_p[\mathrm{best}_{1 \le i \le \lambda}(A + X_i - S) = B \mid X_1, \ldots, X_\lambda]$$
$$T[absorb \to absorb] = 1$$
$$T[absorb \to A] = 0.$$
♦
Example 7.1 The exact Markov chain model is illustrated with a small example. For simplicity, the probability density function $p$ in Figure 7.5 is used, which does not correspond to the mutations defined above. The maximal step size $maxstep = 2$, population size λ = 3, parent individual $A = (-1, -1)^T$, and severity $S = (1, 0)^T$ are used. Figure 7.6 shows the best individual as a black frame, the current optimum as a gray square, and the movement of the optimum from one generation to the next.
[Figure 7.5: the exemplary mutation distribution as a diamond of probabilities around the current position:]

                      0.05
                0.05  0.10  0.05
          0.05  0.10  0.20  0.10  0.05
                0.05  0.10  0.05
                      0.05

Figure 7.5 Exemplary mutation distribution. The square in the center is the current position.
The probability to produce an offspring at distance δ equals
$$\Pr_p[\mathrm{dist}(A + X - S, \vec{0}) = \delta] = \begin{cases} 0, & \text{if } \delta = 0 \\ 0.1, & \text{if } \delta = 1 \\ 0.2, & \text{if } \delta = 2 \\ 0.35, & \text{if } \delta = 3 \\ 0.2, & \text{if } \delta = 4 \\ 0.15, & \text{if } \delta = 5 \\ 0, & \text{if } \delta > 5 \end{cases}$$
Figure 7.6 Scenario: The square in the center is the current position of the best individual. The gray square is the optimum, which moves one position to the right.
[Figure 7.7: six panels, one for each δ = 0, ..., 5, showing which points of the shifted mutation distribution contribute to the respective probabilities 0, 0.1, 0.2, 0.35, 0.2, and 0.15.]
Figure 7.7 The computation of the probability to produce an offspring at distance δ.
as is also shown in Figure 7.7, and the probability to produce an offspring that is further away than δ equals
$$\Pr_p[\mathrm{dist}(A + X - S, \vec{0}) > \delta] = \begin{cases} 1, & \text{if } \delta = 0 \\ 0.9, & \text{if } \delta = 1 \\ 0.7, & \text{if } \delta = 2 \\ 0.35, & \text{if } \delta = 3 \\ 0.15, & \text{if } \delta = 4 \\ 0, & \text{if } \delta \ge 5 \end{cases}$$
The transition probability is calculated exemplarily for the case that the offspring is placed at the same position as the parent, while the optimum moves one step
further away. Using $d = \mathrm{dist}(A + X - S, \vec{0})$ the probability equals
$$\Pr_p[\mathrm{best}((-1, -1)^T + X - (1, 0)^T) = (-2, -1)^T \mid X_1, X_2, X_3]$$
$$= \binom{3}{1} \Pr_p[d > 3]^2 \cdot \tfrac{1}{1}\binom{1}{1} \Pr_p[X = \vec{0}]^1 (\Pr_p[d = 3] - \Pr_p[X = \vec{0}])^0$$
$$+ \binom{3}{2} \Pr_p[d > 3]^1 \left( \tfrac{1}{2}\binom{2}{1} \Pr_p[X = \vec{0}]^1 (\Pr_p[d = 3] - \Pr_p[X = \vec{0}])^1 + \tfrac{2}{2}\binom{2}{2} \Pr_p[X = \vec{0}]^2 (\Pr_p[d = 3] - \Pr_p[X = \vec{0}])^0 \right)$$
$$+ \binom{3}{3} \Pr_p[d > 3]^0 \left( \tfrac{1}{3}\binom{3}{1} \Pr_p[X = \vec{0}]^1 (\Pr_p[d = 3] - \Pr_p[X = \vec{0}])^2 + \tfrac{2}{3}\binom{3}{2} \Pr_p[X = \vec{0}]^2 (\Pr_p[d = 3] - \Pr_p[X = \vec{0}])^1 + \tfrac{3}{3}\binom{3}{3} \Pr_p[X = \vec{0}]^3 (\Pr_p[d = 3] - \Pr_p[X = \vec{0}])^0 \right)$$
$$= 3 \cdot 0.35^2 \cdot (0.2^1 \cdot 0.15^0 \cdot 1) + 3 \cdot 0.35^1 \cdot (0.2^1 \cdot 0.15^1 \cdot 1 + 0.2^2 \cdot 0.15^0 \cdot 1) + 1 \cdot 0.35^0 \cdot (0.2^1 \cdot 0.15^2 \cdot 1 + 0.2^2 \cdot 0.15^1 \cdot 2 + 0.2^3 \cdot 0.15^0 \cdot 1)$$
$$= 0.1715$$
♦
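The value can be cross-checked independently of Lemma 7.2 by brute force: enumerate all triples of mutation offsets, weight them by their probabilities, and account for the uniform tie-breaking among offspring of minimal distance. The sketch below does exactly this for the scenario of the example.

    from itertools import product

    # mutation pdf of Figure 7.5: offsets within Manhattan distance 2
    pdf = {(dx, dy): {0: 0.2, 1: 0.1, 2: 0.05}[abs(dx) + abs(dy)]
           for dx in range(-2, 3) for dy in range(-2, 3)
           if abs(dx) + abs(dy) <= 2}

    A, S, target = (-1, -1), (1, 0), (-2, -1)
    prob = 0.0
    for triple in product(pdf, repeat=3):        # lambda = 3 offspring
        p = 1.0
        positions = []
        for (dx, dy) in triple:
            p *= pdf[(dx, dy)]
            positions.append((A[0] + dx - S[0], A[1] + dy - S[1]))
        dists = [abs(x) + abs(y) for x, y in positions]
        best = min(dists)
        winners = [pos for pos, d in zip(positions, dists) if d == best]
        prob += p * winners.count(target) / len(winners)
    print(round(prob, 4))                        # 0.1715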
All resulting transition probabilities are shown in Figure 7.8.
[Figure 7.8: the transition probabilities arranged as a diamond around the parent's shifted position, row by row: 0.043 | 0.043 0.193 0.136 | 0.001 0.02 0.172 0.193 0.136 | 0.001 0.02 0.043 | 0.001.]
Figure 7.8 The resulting probabilities for an offspring population size λ = 3.
This exact Markov chain model is used in the remainder of this chapter for the derivation of exact results and for simulations of how the state distribution changes over several generations.
7.1.2 Worst-case Markov chain model
In this section a simplified Markov chain model is presented with the goal of a one-dimensional state space. The basis for this model is the assumption that the current individual is always situated on the horizontal line drawn through the optimum (see Figure 7.9), i.e. that there is no vertical deviation relative to the optimum. Independently of the actual position of a new individual, only the distance is of interest, and in the further steps of the approximation the point on the horizontal line is assumed. This situation is shown in Figure 7.10, where all points with the same distance are mapped to the same transition from the current distance to the new distance. At the bottom of the figure the resulting Markov chain is shown: the states $States = \mathbb{N}_0$ are the distances to the optimum and the transition probabilities are summarized probabilities as sketched in Figure 7.10.
[Figure 7.9: diagram of the optimum and the current individual on the same horizontal line.]
Figure 7.9 Simplified situation for the worst-case model: it is assumed that the next individual is always chosen on the horizontal line through the optimum. This implies that the probabilities to reach the light gray points (with distance 2 to the optimum) are unified in the light gray point on the horizontal line.
The relevance of this modeling is pointed out by the following lemma, which is the basis for the consideration of the model as a worst-case scenario.
Lemma 7.3 If the considered mutation operator is zero-mean, the situation described above is a worst-case scenario with regard to the probability to get at least as close as distance $d$ to the optimum ($d = 0, 1, 2, \ldots$). ♦
Proof: The proof is omitted here. Instead, Figure 7.11 illustrates a plausibility argument that the probability to hit the optimum or get close to the optimum increases in any other possible scenario. It follows immediately from the figure that the probability to get at least as close as distance $d$ increases in any other scenario for arbitrary $d$. q.e.d.
[Figure 7.10: diagram mapping points of the pdf to transitions of the one-dimensional Markov chain with states 4, 3, 2, 1, 0.]
Figure 7.10 This figure shows exemplarily which points of the probability density function are summarized and assigned to which transition of the Markov chain. The optimum moves one step to the right. The last generation's state of the Markov chain was 2.
For the subsequent discussion of this model we can distinguish four different scenarios of how the distance $d_0$ of the current approximation to the optimum, the severity $s$, the maximal step size $maxstep$, and the new distance $d$ to the optimum relate to each other:
(i) $(s + d_0 > d) \wedge (maxstep \ge s + d_0 - d)$
(ii) $(s + d_0 \le d) \wedge (maxstep > d - s - d_0)$
(iii) $(s + d_0 \le d) \wedge (maxstep \le d - s - d_0)$
(iv) $(s + d_0 > d) \wedge (maxstep < s + d_0 - d)$
The four situations are described visually in Figure 7.12. The range of $X = x$ leading to the new distance $d$ to the optimum is restricted for the different cases as follows:
(i) $s + d_0 - d \le \mathrm{dist}(x, \vec{0}) \le \min\{s + d_0 + d, maxstep\}$
(ii) $d - s - d_0 + 1 \le \mathrm{dist}(x, \vec{0}) \le \min\{s + d_0 + d, maxstep\}$
(iii) $0 \le \mathrm{dist}(x, \vec{0}) \le \min\{d - s - d_0, maxstep\}$
(iv) none
[Figure 7.11: diagrams comparing the worst-case scenario with other possible new positions of an optimum at distance 3 moved by severity 1; offspring are labeled ij.]
Figure 7.11 The upper left part of the figure shows the possible positions of an optimum at distance 3 moved by severity 1. The other parts of the figure show different scenarios for the optimum's new position. The possible offspring with distance 1, 2, or 3 to the new optimum are shown. The number ij indicates an offspring with distance i to the new optimum and distance j to the parental individual. Apparently, for all distances the numbers in the worst-case scenario in the upper right part are a subset of the numbers in any other scenario.
[Figure 7.12: four diagrams for the cases (i)–(iv), showing the parent A, the moved optimum A* + S, and the distance d.]
Figure 7.12 The four different cases used in the following computations: the dashed lines mark the possible values of the maximal step parameter of the mutation, the solid line refers to those points with distance d to the new optimum.
The following lemma makes a statement on the number of points within distance $d$ of the optimum that can be reached from a point with distance $s + d_0$ from the optimum by a mutation of exact step size δ.
Lemma 7.4 Given point $A$ with distance $d_0$ from optimum $\vec{0}$. After the optimum moving by $s$ and mutating $A$ with $X$ at exact step size $\mathrm{dist}(X, \vec{0}) = \delta$ (according to the valid values given above), the number of distinct newly created points within distance $d$ from the optimum is
$$N_{\mathrm{dist}(A + X_\delta - S, \vec{0}) \le d} = \begin{cases} 2d + 1 - 2\left\lfloor \frac{s + d_0 + d + 1 - \delta}{2} \right\rfloor, & \text{if (i) or (ii)} \\ N(\delta), & \text{if (iii)} \\ 0, & \text{otherwise} \end{cases}$$
♦
The number given in the lemma specifies the number of points on the dashed lines that are within or on the borderline of the solid rectangle in Figure 7.12.
Proof: In case (iii), all points with distance δ from $A$ are within distance $d$ of the new optimum, resulting in $N(\delta)$. In case (iv), the range of the mutation and the points within distance $d$ of the optimum do not overlap. As a consequence there are no points. In case (i), the valid range of $X = x$ is given as
$$s + d_0 - d \le \mathrm{dist}(x, \vec{0}) \le \min\{s + d_0 + d, maxstep\}.$$
For $\delta = s + d_0 - d$ it follows
$$N_{\mathrm{dist}(A + X_\delta - S, \vec{0}) \le d} = 2d + 1 - 2\left\lfloor \frac{s + d_0 + d + 1 - (s + d_0 - d)}{2} \right\rfloor = 2d + 1 - 2d = 1.$$
This corresponds to the leftmost dashed line in the figure for case (i), where there is an overlap of just one point. As one can easily verify, increasing δ by 2 leads to an increase of 2 for $N$. The upper bound follows similarly. The same considerations hold for case (ii), where the lower bound $\delta = d - s - d_0 + 1$ leads immediately to $2(d - s - d_0) + 1$, corresponding to the innermost dashed line in the figure of scenario (ii). q.e.d.
Lemma 7.5 Given point $A$ with distance $d_0$ from optimum $\vec{0}$. After moving the optimum by $s$ and mutating $A$ with maximal step size $maxstep$, the probability to hit a point within distance $d$ from the optimum, the hitting probability, results as
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) \le d] = \begin{cases} \sum\limits_{s + d_0 - d \le \delta \le \min\{s + d_0 + d,\, maxstep\}} \left(2d + 1 - 2\left\lfloor \frac{s + d_0 + d + 1 - \delta}{2} \right\rfloor\right) \frac{\Pr[\mathrm{dist}(X, \vec{0}) = \delta]}{N(\delta)}, & \text{if (i)} \\[2mm] \sum\limits_{0 \le \delta \le d - s - d_0} \Pr[\mathrm{dist}(X, \vec{0}) = \delta] + \sum\limits_{d - s - d_0 + 1 \le \delta \le \min\{s + d_0 + d,\, maxstep\}} \left(2d + 1 - 2\left\lfloor \frac{s + d_0 + d + 1 - \delta}{2} \right\rfloor\right) \frac{\Pr[\mathrm{dist}(X, \vec{0}) = \delta]}{N(\delta)}, & \text{if (ii)} \\[2mm] \sum\limits_{0 \le \delta \le \min\{d - s - d_0,\, maxstep\}} \Pr[\mathrm{dist}(X, \vec{0}) = \delta], & \text{if (iii)} \\[2mm] 0, & \text{otherwise} \end{cases}$$
♦
Proof: The formula results immediately if the possible values for $\mathrm{dist}(X, \vec{0})$ and the numbers of Lemma 7.4 are substituted into
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) \le d] = \sum_{s + d_0 - d \le \delta \le s + d_0 + d} \frac{N_{\mathrm{dist}(A + X_\delta - S, \vec{0}) \le d}}{N(\delta)} \Pr[\mathrm{dist}(X, \vec{0}) = \delta].$$
q.e.d.
Figure 7.13 Scenario adapted to the worst-case model: The square in the center is the current position of the best individual and the gray square, situated on a horizontal line with the individual, is the optimum, which moves one position to the right.
Example 7.2 This example uses again the exemplary mutation distribution of Figure 7.5. The modified scenario for the worst-case model is shown in Figure 7.13. Then $s = 1$, $d_0 = 2$, and $maxstep = 2$. In this example we write $\Pr[\delta]$ as a shortcut for $\Pr[\mathrm{dist}(X, \vec{0}) = \delta]$.
For $d = 0$, case (iv) holds. As a consequence
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) \le 0] = 0.$$
For $d = 1$, case (i) holds and
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) \le 1] = \sum_{1 + 2 - 1 \le \delta \le \min\{1 + 2 + 1,\, 2\}} \left(2 + 1 - 2\left\lfloor \tfrac{1 + 2 + 1 + 1 - \delta}{2} \right\rfloor\right) \frac{\Pr[\delta]}{N(\delta)} = 1 \cdot \frac{0.4}{8} = 0.05.$$
For $d = 2$, case (i) holds too and
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) \le 2] = \sum_{1 \le \delta \le 2} \left(4 + 1 - 2\left\lfloor \tfrac{1 + 2 + 2 + 1 - \delta}{2} \right\rfloor\right) \frac{\Pr[\delta]}{N(\delta)} = 1 \cdot \frac{0.4}{4} + 1 \cdot \frac{0.4}{8} = 0.15.$$
For $d = 3$, case (ii) holds and
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) \le 3] = \sum_{0 \le \delta \le 0} \Pr[\delta] + \sum_{1 \le \delta \le 2} \left(6 + 1 - 2\left\lfloor \tfrac{1 + 2 + 3 + 1 - \delta}{2} \right\rfloor\right) \frac{\Pr[\delta]}{N(\delta)} = 0.2 + 1 \cdot \frac{0.4}{4} + 3 \cdot \frac{0.4}{8} = 0.45.$$
For $d = 4$, case (ii) holds too and
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) \le 4] = \sum_{0 \le \delta \le 1} \Pr[\delta] + \sum_{2 \le \delta \le 2} \left(8 + 1 - 2\left\lfloor \tfrac{1 + 2 + 4 + 1 - \delta}{2} \right\rfloor\right) \frac{\Pr[\delta]}{N(\delta)} = 0.2 + 0.4 + 3 \cdot \frac{0.4}{8} = 0.75.$$
For $d = 5$ (and also $d > 5$) case (iii) holds and
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) \le 5] = \sum_{0 \le \delta \le \min\{5 - 1 - 2,\, 2\}} \Pr[\delta] = \Pr[0] + \Pr[1] + \Pr[2] = 1.0.$$
Figure 7.14 illustrates the computations in the example. ♦
[Figure 7.14: the mutation distribution of Figure 7.5 next to the resulting cumulative probabilities: d = 0: 0.0, d = 1: 0.05, d = 2: 0.15, d = 3: 0.45, d = 4: 0.75, d = 5: 1.0.]
Figure 7.14 The resulting probabilities for $\Pr[\mathrm{dist}(A + X - S, \vec{0}) \le d]$.
Lemma 7.6 Given point $A$ with distance $d_0$ from optimum $\vec{0}$. After the optimum moving by $s$ and mutating $A$ with maximal step size $maxstep$, the probability to be at exact distance $d$ to the optimum $\vec{0}$ results as
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) = d \mid \mathrm{dist}(A, \vec{0}) = d_0] =$$
$$\begin{cases}
\Pr[\mathrm{dist}(A + X - S, \vec{0}) \le 0], & \text{iff } d = 0 \\[1mm]
0, & \text{iff } (maxstep < s + d_0 - d) \wedge (s + d_0 > d > 0) \\[1mm]
\frac{\Pr[maxstep]}{N(maxstep)}, & \text{iff } (maxstep = s + d_0 - d) \wedge (s + d_0 > d > 0) \\[1mm]
\frac{\Pr[s + d_0 - d]}{N(s + d_0 - d)} + \sum\limits_{s + d_0 - d + 1 \le \delta \le maxstep} 2\chi(s + d_0 + d - \delta) \frac{\Pr[\delta]}{N(\delta)}, & \text{iff } (s + d_0 - d < maxstep < s + d_0 + d) \wedge (s + d_0 > d > 0) \\[1mm]
\frac{\Pr[s + d_0 - d]}{N(s + d_0 - d)} + \sum\limits_{s + d_0 - d + 1 \le \delta \le s + d_0 + d - 1} 2\chi(s + d_0 + d - \delta) \frac{\Pr[\delta]}{N(\delta)} + (2d + 1) \frac{\Pr[s + d_0 + d]}{N(s + d_0 + d)}, & \text{iff } (maxstep \ge s + d_0 + d) \wedge (s + d_0 > d > 0) \\[1mm]
\Pr[0] + \sum\limits_{1 \le \delta \le maxstep} 2\chi(2d - \delta) \frac{\Pr[\delta]}{N(\delta)}, & \text{iff } (s + d_0 = d) \wedge (maxstep < 2d) \\[1mm]
\Pr[0] + \sum\limits_{1 \le \delta \le 2d - 1} 2\chi(2d - \delta) \frac{\Pr[\delta]}{N(\delta)} + (2d + 1) \frac{\Pr[2d]}{N(2d)}, & \text{iff } (s + d_0 = d) \wedge (maxstep \ge 2d) \\[1mm]
\left(N(d - s - d_0) - 2d + 1 + 2s + 2d_0\right) \frac{\Pr[d - s - d_0]}{N(d - s - d_0)} + \sum\limits_{d - s - d_0 + 1 \le \delta \le maxstep} 2\chi(s + d_0 + d - \delta) \frac{\Pr[\delta]}{N(\delta)}, & \text{iff } (d - s - d_0 < maxstep < s + d_0 + d) \wedge (s + d_0 < d) \\[1mm]
\left(N(d - s - d_0) - 2d + 1 + 2s + 2d_0\right) \frac{\Pr[d - s - d_0]}{N(d - s - d_0)} + \sum\limits_{d - s - d_0 + 1 \le \delta \le s + d_0 + d - 1} 2\chi(s + d_0 + d - \delta) \frac{\Pr[\delta]}{N(\delta)} + (2d + 1) \frac{\Pr[s + d_0 + d]}{N(s + d_0 + d)}, & \text{iff } (maxstep \ge s + d_0 + d) \wedge (s + d_0 < d) \\[1mm]
(2\,maxstep + 1) \frac{\Pr[maxstep]}{N(maxstep)}, & \text{iff } (maxstep = d - s - d_0) \wedge (s + d_0 < d) \\[1mm]
0, & \text{iff } (d - s - d_0 > maxstep) \wedge (s + d_0 < d)
\end{cases}$$
where $\chi(x) = 1$ iff $\lfloor \frac{x + 1}{2} \rfloor = \lfloor \frac{x}{2} \rfloor$ (i.e. $x$ is even) and $\chi(x) = 0$ otherwise, and the shortcut $\Pr[\delta]$ is again used for $\Pr[\mathrm{dist}(X, \vec{0}) = \delta]$. ♦
Proof: The first case is true for trivial reasons. The remaining cases of the lemma follow immediately from
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) = d] = \Pr[\mathrm{dist}(A + X - S, \vec{0}) \le d] - \Pr[\mathrm{dist}(A + X - S, \vec{0}) \le d - 1]$$
for $d \ge 1$. Here, we have to clarify which situations might occur for $d$ and $d - 1$. When situation (iv) holds for $d - 1$, either (iv) is also true for $d$ (second line) or (i) is true for $d$ (third line). Lines four and five deal with the transitions from (i) to (i). Lines six and seven distinguish two cases for the transition from situation (i) to (ii). Lines eight and nine show the formulas for the transition from (ii) to (ii). Situation (ii) at $d - 1$ leading to situation (iii) at $d$ is handled by line ten. The last line takes care of situation (iii) for both $d - 1$ and $d$. Substitutions and simple transformations lead to the formula given in the lemma. q.e.d.
Example 7.3 This example uses again the exemplary mutation distribution of Figure 7.5 and the scenario of Figure 7.13 with $s = 1$, $d_0 = 2$, and $maxstep = 2$.
For $d = 0$, the first line of Lemma 7.6 holds:
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) = 0] = \Pr[\mathrm{dist}(A + X - S, \vec{0}) \le 0] = 0.$$
For $d = 1$, the case $(maxstep = s + d_0 - d) \wedge (s + d_0 > d)$ holds and
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) = 1] = \frac{\Pr[2]}{N(2)} = 0.05.$$
For $d = 2$, case $(s + d_0 - d < maxstep < s + d_0 + d) \wedge (s + d_0 > d)$ holds and
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) = 2] = \frac{\Pr[1]}{N(1)} + \sum_{2 \le \delta \le 2} 2\chi(1 + 2 + 2 - \delta) \frac{\Pr[\delta]}{N(\delta)} = 0.1 + 2 \cdot \chi(3) \cdot \frac{\Pr[2]}{N(2)} = 0.1.$$
For $d = 3$, case $(s + d_0 = d) \wedge (maxstep < 2d)$ holds and
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) = 3] = \Pr[0] + \sum_{1 \le \delta \le 2} 2\chi(6 - \delta) \frac{\Pr[\delta]}{N(\delta)} = 0.2 + 2 \cdot \chi(5) \cdot 0.1 + 2 \cdot \chi(4) \cdot 0.05 = 0.3.$$
For $d = 4$, case $(d - s - d_0 < maxstep < s + d_0 + d) \wedge (s + d_0 < d)$ holds and
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) = 4] = (N(1) - 8 + 1 + 2 + 4) \frac{\Pr[1]}{N(1)} + \sum_{2 \le \delta \le 2} 2\chi(1 + 2 + 4 - \delta) \frac{\Pr[\delta]}{N(\delta)} = 3 \cdot 0.1 + 2 \cdot \chi(5) \cdot 0.05 = 0.3.$$
For $d = 5$, case $(maxstep = d - s - d_0) \wedge (s + d_0 < d)$ holds and
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) = 5] = (2 \cdot 2 + 1) \frac{\Pr[2]}{N(2)} = 5 \cdot 0.05 = 0.25.$$
For $d > 5$, case $(d - s - d_0 > maxstep) \wedge (s + d_0 < d)$ holds, resulting in
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) = d] = 0.$$
Figure 7.15 illustrates the computations in the example. ♦
[Figure 7.15: the mutation distribution of Figure 7.5 next to the resulting exact probabilities: d = 0: 0.0, d = 1: 0.05, d = 2: 0.1, d = 3: 0.3, d = 4: 0.3, d = 5: 0.25, d = 6: 0.0.]
Figure 7.15 The resulting probabilities for $\Pr[\mathrm{dist}(A + X - S, \vec{0}) = d]$.
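Both distance distributions of the two examples can be verified by a direct enumeration that sidesteps the case analysis of Lemmas 7.5 and 7.6: in the worst-case model the parent simply sits at $(-(s + d_0), 0)$ relative to the optimum.

    pdf = {(dx, dy): {0: 0.2, 1: 0.1, 2: 0.05}[abs(dx) + abs(dy)]
           for dx in range(-2, 3) for dy in range(-2, 3)
           if abs(dx) + abs(dy) <= 2}

    px = -3          # -(s + d0) for s = 1 and d0 = 2
    exact = {}
    for (dx, dy), p in pdf.items():
        d = abs(px + dx) + abs(dy)
        exact[d] = exact.get(d, 0.0) + p
    print(sorted(exact.items()))
    # [(1, 0.05), (2, 0.1), (3, 0.3), (4, 0.3), (5, 0.25)]  (Figure 7.15)
    cum = 0.0
    for d in range(6):
        cum += exact.get(d, 0.0)
        print(d, round(cum, 2))  # 0.0 0.05 0.15 0.45 0.75 1.0 (Figure 7.14)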
Corollary 7.1 In the worst-case model, the probability to get from distance $d_0$ to distance $d$ within one generation is
$$\Pr[\min_{1 \le i \le \lambda} \mathrm{dist}(A + X_i - S, \vec{0}) = d \mid X_1, \ldots, X_\lambda, \mathrm{dist}(A, \vec{0}) = d_0] = \sum_{i=1}^{\lambda} \binom{\lambda}{i} \bigl(\Pr[\mathrm{dist}(A + X - S, \vec{0}) > d]\bigr)^{\lambda - i} \bigl(\Pr[\mathrm{dist}(A + X - S, \vec{0}) = d]\bigr)^i$$
where
$$\Pr[\mathrm{dist}(A + X - S, \vec{0}) > d] = 1 - \Pr[\mathrm{dist}(A + X - S, \vec{0}) \le d].$$
♦
Definition 7.6 (Worst-case Markov chain model) The worst-case Markov chain model for severity $S = (s, 0)^T$, a mutation defined by pdf $p$, and λ offspring is given by the tuple $(States, T)$ where
$$States = \mathbb{N}_0$$
and the transition matrix $T = States \times States$ is given by
$$T[d_0 \to d] = \Pr[\min_{1 \le i \le \lambda} \mathrm{dist}(A + X_i - S, \vec{0}) = d \mid X_1, \ldots, X_\lambda, \mathrm{dist}(A, \vec{0}) = d_0].$$
♦
Example 7.4 Given the numbers computed in Examples 7.2 and 7.3, the probabilities for a (1, λ)-selection model result as follows for λ = 3, $s = 1$, and $d_0 = 2$. For each target distance $d$,
$$\Pr[\min_{1 \le i \le \lambda} \mathrm{dist}(A + X_i - S, \vec{0}) = d \mid X_1, \ldots, X_\lambda, \mathrm{dist}(A, \vec{0}) = 2] = \sum_{i=1}^{3} \binom{3}{i} \bigl(\Pr[\mathrm{dist}(A + X - S, \vec{0}) > d]\bigr)^{3 - i} \bigl(\Pr[\mathrm{dist}(A + X - S, \vec{0}) = d]\bigr)^i,$$
which evaluates to
$$d = 0: \quad 0$$
$$d = 1: \quad 3 \cdot 0.95^2 \cdot 0.05 + 3 \cdot 0.95 \cdot 0.05^2 + 1 \cdot 0.05^3 = 0.142625$$
$$d = 2: \quad 3 \cdot 0.85^2 \cdot 0.1 + 3 \cdot 0.85 \cdot 0.1^2 + 1 \cdot 0.1^3 = 0.24325$$
$$d = 3: \quad 3 \cdot 0.55^2 \cdot 0.3 + 3 \cdot 0.55 \cdot 0.3^2 + 1 \cdot 0.3^3 = 0.44775$$
$$d = 4: \quad 3 \cdot 0.25^2 \cdot 0.3 + 3 \cdot 0.25 \cdot 0.3^2 + 1 \cdot 0.3^3 = 0.15075$$
$$d = 5: \quad 3 \cdot 0^2 \cdot 0.25 + 3 \cdot 0 \cdot 0.25^2 + 1 \cdot 0.25^3 = 0.015625$$
[Figure 7.16: the worst-case Markov chain transitions from state 2: to state 0 with 0.0, to 1 with 0.142625, to 2 with 0.24325, to 3 with 0.44775, to 4 with 0.15075, and to 5 with 0.015625.]
Figure 7.16 The resulting transitions for state 2 in the worst-case Markov model.
Figure 7.16 shows the transitions for state 2 in the resulting Markov model. Comparing these values to the values of the exact model in Figure 7.8, the worst-case character of the model presented in this section can be seen again. ♦
The worst-case model is used in the next section for the derivation of a necessary criterion concerning feasible tracking.
7.2 Feasible tracking
This section is concerned with the definition of a criterion on how the parameters
of a local search algorithm have to be chosen to guarantee feasible tracking. In this
section we are only marginally interested in the accuracy of tracking or optimal
parameter settings.
A first attempt to define such a criterion could be the statement that "the surviving individuals in a (1, λ)-strategy should stay rather close to the moving optimum". This is, however, only partially true. The state distribution of the exact Markov chain model around the optimum can best be compared to an electron cloud around the atomic nucleus. This cloud cannot be bounded since in an evolutionary algorithm there will always be a probability greater than zero for k steps leading away from the optimum. Instead, the evolutionary algorithm should guarantee that there is always a higher probability to get closer towards the optimum than to continue moving away from the optimum.
This is examined using the worst-case Markov chain model. It can easily be observed that the relative transition probabilities are constant for all states $d > maxstep$. It is enough to analyze the transition probabilities of those states where it must be guaranteed that outliers are very likely shifted back into the center of the cloud. The behavior of the other states is much more diverse.
In order to find a useful criterion for this behavior, we consider the expected distance change, i.e. the change concerning the distance to the optimum that can be expected in one generation,
$$E[\min_{1 \le i \le \lambda} \mathrm{dist}(A + X_i - S, \vec{0}) - d_0 \mid X_1, \ldots, X_\lambda, \mathrm{dist}(A, \vec{0}) = d_0] = \sum_{-maxstep \le \delta \le maxstep} \delta \; T[\mathrm{dist}(A, \vec{0}) \to \mathrm{dist}(A, \vec{0}) + \delta].$$
This value is required to be less than zero, which means that on average the distance to the optimum decreases.
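The criterion can be evaluated numerically with a compact sketch of the worst-case model: place the parent at $(-(s + d_0), 0)$, accumulate the distance distribution of a single offspring, and derive the best-of-λ distribution from its tail probabilities. Evaluating at $d_0 = maxstep + 1$, a state in the constant region mentioned above, is a choice made here for illustration; the loop at the bottom should reproduce the qualitative picture of Figure 7.17.

    import math

    def p_local(delta, m):
        # Equation (7.7)
        return math.comb(2 * m + 1, m - delta) / 2 ** (2 * m)

    def ring(delta):
        if delta == 0:
            return [(0, 0)]
        pts = set()
        for a in range(delta + 1):
            b = delta - a
            pts.update({(a, b), (a, -b), (-a, b), (-a, -b)})
        return sorted(pts)

    def expected_change(s, m, lam, d0):
        px = -(s + d0)   # worst case: parent on the line through the optimum
        dist_pdf = {}
        for delta in range(m + 1):
            pts = ring(delta)
            for (x, y) in pts:
                d = abs(px + x) + abs(y)
                dist_pdf[d] = dist_pdf.get(d, 0.0) + p_local(delta, m) / len(pts)
        def tail(d):     # Pr[offspring distance >= d]
            return sum(p for dd, p in dist_pdf.items() if dd >= d)
        return sum((d - d0) * (tail(d) ** lam - tail(d + 1) ** lam)
                   for d in dist_pdf)

    for m in range(1, 11):
        print(m, round(expected_change(1, m, 5, m + 1), 3))
    # the sign flips to negative around maximal step size 7 (cf. Figure 7.17)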
Figure 7.17 shows the criterion for severity values s = 1, 2, and 3, maximal step sizes between 1 and 40, and λ = 5 offspring per generation. For the (1, λ)-strategy, successful tracking is unlikely with maximal step sizes smaller than 7 in the case of severity 1. This lower bound for the maximal step size increases to 30 for severity 2. And for severity 3, the smallest value of the maximal step size that enables feasible tracking is 67 (not shown in the figure). The latter step size value implies a very low accuracy with huge deviations during tracking.
However, a negative expected state change can be reached for any severity, implying successful tracking. Figure 7.18 provides an argument for this statement. There, the solid line marks the distance to the optimum of the current parental individual. The optimum is to the right, and the dashed line marks the shift of the problem caused by the dynamics.
[Figure 7.17: plot of the expected distance change over the maximal step size (0–40) for severities 1, 2, and 3.]
Figure 7.17 Using offspring population size λ = 5, this figure shows the expected distance change. For severity 1 the criterion is met from maximal step size 7 on; for severity 2 with maximal step size 30; for severity 3 with maximal step size 68 (not shown here).
[Figure 7.18: two pdfs with the severity shift marked relative to the parental individual.]
Figure 7.18 This figure shows two pdfs for changing the distance to the optimum, where the optimum is situated to the right of the pdfs. It illustrates how increasing the maximal step size decreases the impact of the severity.
The area of the displayed pdf to the right of the dashed line is the probability to move closer to the optimum. By increasing the maximal step size, the impact of the severity gets smaller, that means, the dashed line in the figure moves closer to the solid line, relative to the range of possible step sizes. As a consequence the proportion of the area to the right of the dashed line to the area of the complete pdf increases. For any ε > 0, maxstep can be chosen sufficiently big that
$$\frac{maxstep - s}{2\,maxstep + 1} > \frac{1}{2} - \varepsilon.$$
Possible values are
$$maxstep \ge \frac{s + 1}{2\varepsilon}.$$
Since in addition the pdf is always skewed towards the optimum because of the selective pressure, the criterion defined above can always be met by an accordingly high maximal mutation step size.
These considerations show that it is always possible to guarantee successful tracking by increasing the maximal step size. A bigger maximal step size leads to an expanded cloud around the optimum. As a consequence, bigger severity always comes along with decreased accuracy for the considered local mutation (see Definition 5.2 with the fitness defined by the distance to the optimum).
[Figure 7.19: plot of the expected distance change over the maximal step size (0–40) for severities 1, 2, and 3.]
Figure 7.19 For an offspring population size λ = 20, the expected distance change is shown for severity values 1, 2, and 3.
However, when increasing the offspring population size to λ = 20, the worst-case Markov chain model exhibits an interesting behavior. As can be seen in Figure 7.19, the step sizes required to achieve stable tracking are reduced. This can also be explained using Figure 7.18. Since an increase in the population size creates a higher selective pressure in a (1, λ)-strategy, the pdf for one generational step is more skewed towards the optimum. As a consequence, a smaller step size is
required to achieve an expected change towards the moving optimum. Figure 7.20 shows how the boundary for feasible tracking approaches the minimal required maximal step size, namely the severity of the problem, with increasing offspring population size.
[Figure 7.20: plot of the balanced maximal step size over the population size.]
Figure 7.20 For severity 2 and various offspring population sizes, the maximal step size values are shown for which stable tracking can be derived from the worst-case model.
Design rule 1 By increasing the maximal step size parameter and/or the offspring population size, tracking becomes feasible for any severity value. Increasing the maximal step size decreases the accuracy. Increasing the population size can decrease the minimal required value for the maximal step size parameter. ♦
In the following sections, we are primarily concerned with problems associated with a very restricted time resource. In this case increasing the population size is not an option, since there is not enough time available, and bigger maximal step size values are probably not an option either, since the tracking accuracy decreases with this method. Various possible solutions to this problem are examined.
7.3 Optimal parameter settings
On the basis of the results of the previous section, we are concerned with the choice of optimal parameter settings for the local mutation operator and small severity values in this section. As an exemplary value, population size 5 is chosen in this examination.
[Figure 7.21: log-scale plot of the required maximal step size parameter (1–1000) over the expected step size (2–12).]
Figure 7.21 The values for the parameter "maximal step size" that are necessary to reach a particular expected step size (see Equation 7.8).
[Figure 7.22: plot of the expected distance to the optimum over 20 generations for maximal step sizes 3, 5, 7, 9, and 15.]
Figure 7.22 Local mutation: Expected distance to the optimum for severity 1, population size 5, and maximal step sizes as indicated.
But how is an "optimal" parameter setting defined? This term refers to parameter values that are chosen in such a way that tracking is feasible and the highest possible accuracy is guaranteed.
For this analysis the exact Markov chain model is used with a radius of at most 30 (see Definition 7.5), which means that any probability to be at a distance of more than 30 to the optimum is summarized in the absorbing state. From this computation of the exact probability distributions in the search space for the first 20 generations, with the first individual starting at the optimum, we derive a 3-dimensional graph on the course of the distribution as well as a comparison of different mutation step sizes concerning the expected distance to the optimum.
[Figure 7.23: 3-d plot of the distance distribution over 20 generations.]
Figure 7.23 Local mutation: Change in the distribution of the distance to the optimum for severity 1 and population size 5 using the optimal maximal step size 9.
[Figure 7.24: 3-d plot of the distance distribution over 20 generations.]
Figure 7.24 Local mutation: Change in the distribution of the distance to the optimum for severity 1 and population size 5 using maximal step size 3.
[Figure 7.25: 3-d plot of the distance distribution over 20 generations.]
Figure 7.25 Local mutation: Change in the distribution of the distance to the optimum for severity 1 and population size 5 using maximal step size 15.
Due to limited time and memory resources, a complete analysis is only possible for the severity value 1. This is in particular due to the exponential increase in the parameter value for the maximal step size needed to achieve a certain average step size (or expected step size)
$$E_p[X] = \sum_{x \in \mathbb{Z}^2,\ \mathrm{dist}(\vec{0}, x) \le maxstep} \Pr_p[X = x] \,\mathrm{dist}(\vec{0}, x). \qquad (7.8)$$
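Since the ring at distance δ carries exactly the mass $p_{local}(\delta)$, Equation 7.8 collapses to a one-dimensional sum, as the following small sketch shows:

    import math

    def p_local(delta, m):
        return math.comb(2 * m + 1, m - delta) / 2 ** (2 * m)

    def expected_step_size(m):
        # Equation (7.8) as a sum over the step distances 0..m
        return sum(delta * p_local(delta, m) for delta in range(m + 1))

    for m in (1, 10, 100, 1000):
        print(m, round(expected_step_size(m), 3))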
These numbers are shown for the local mutation operator in Figure 7.21.
The expected distance to the optimum is shown for severity 1 and for selected step sizes in Figure 7.22. After 20 generations the maximal step size 9 shows the best performance with an expected distance of 1.831 in generation 20. Probably, maximal step size 9 is not completely stable toward infinity, but it represents the best possible value for tracking over 20 generations. The respective distribution of the values is shown in Figure 7.23 and reflects a very stable tracking behavior. This result underpins the experimental findings using evolution strategies with slow dynamics (cf. Bäck, 1998, 1999).
The drawback of a suboptimally chosen maximal step size can be illustrated again by the course of the changing probability distribution. On the one hand, the effect of a step size chosen too small is shown in Figure 7.24 (with maximal step size 3 for severity 1), where the distribution flattens out with advancing generations. This results in an increasing expected distance to the optimum. On the other hand, a too big maximal step size results in a broader distribution. This can be seen in Figure 7.25 (with maximal step size 15 for severity 1) compared to the optimal maximal step size 9 shown in Figure 7.23. This also leads to a slightly increased expected distance to the optimum. However, the optimal step size value 9 is very close to the step size 7 of the worst-case analysis in Section 7.2, where a break-even concerning the distance to the optimum occurs (see Figure 7.17).
All in all, the results of this section can be summarized in the following recommendation, which will also be underlined by the examinations in the following sections.
Design rule 2 Optimal step size parameters can be roughly estimated by the break-even expected distance of the worst-case analysis in the previous section. The consequences of too big parameter values are less severe, with regard to the accuracy for a fixed number of generations, than those of too small values. ♦
7.4 Non-zero-mean mutation
In any evolutionary algorithm the operators have a certain underlying conception of how the next individuals should be created from previous individuals. These conceptions can be made explicit by operator-defined fitness landscapes (Jones, 1995) in the case of discrete search spaces and by probability density functions (pdf) in the case of continuous search spaces, as with evolution strategies. In order to get good results there should be a high correlation between the characteristics of the problem and this conception of the operator. One example is the Gaussian mutation applied to real-valued, smooth, stationary problems. A smooth, partially monotonous search space (e.g. the sphere model) guarantees that even a small step in the right direction is a good step. This fits perfectly the internal model of the zero-mean Gaussian pdf. The additional self-adaptation mechanisms of evolution strategies enable a quick adaptation to quite different problem spaces.
[Figure 7.26: three one-dimensional landscapes at time t and t + 1, with the current candidate solution marked and the arising region of discontinuity shown below.]
Figure 7.26 This figure shows three different landscapes at time t (upper row) and t + 1 (middle row). By the change of the landscape there arises a region of discontinuity, shown in the lower row, where the fitness should be equal or better compared to a point situated to the left but where the fitness decreases caused by the dynamics.
However, reconsidering the worst-case analysis in Section 7.2, which implied a decreasing accuracy with increasing severity, the correlation between mutation and problem characteristics is disturbed by the introduction of considerable dynamic, time-varying aspects into the problem. Figure 7.26 illustrates this effect schematically. The upper row shows three different one-dimensional fitness landscapes where the circled cross marks the current position of a candidate solution. The middle row shows how the fitness landscape is shifted from one generation to the next. And the lower row shows the discontinuity arising from this shift as the difference between the fitness values we could expect if the problem was stationary and the fitness values we encounter in the next generation. The smoothness we could actually expect gets disarranged from generation t to generation t + 1. At time t, any step to the right improves the fitness; the mutation cannot do wrong if a small step is chosen instead of a bigger step. But this is not true anymore at time t + 1 since the small step reaches a worse fitness than at time t. Here a bigger step is desirable. As a consequence the hitting probability decreases considerably with increasing severity and the algorithm is not able to track the optimum anymore.
Thus, dynamics introduce a new difficulty into the optimization which is probably not met by a local mutation operator fulfilling the five principles mentioned at the beginning of this chapter (page 101).
While principles (3) and (5) have been completely disregarded in this examination and principle (4) is mimicked to some extent, the principle of zero-mean mutation (1) and the principle concerning smaller changes (2) are the main characteristics of a local mutation. As a consequence, this section and the next section examine to what extent it is useful to break with those principles. This section is dedicated to non-zero-mean mutations.
There have been several proposals of local mutations involving the preference of a certain direction. In the real-valued domain, there are examinations by Ghozeil and Fogel (1996) and Hildebrand, Reusch, and Fathi (1999) concerning stationary fitness functions. In this section, we introduce a directed mutation by skewing the probability function according to the following definition.
Definition 7.7 (Directed mutation) The directed version of a mutation operator where positive values of search space dimension $x_1$ are favored is defined using pdf $p$ as
$$\Pr_{p_{dir}}[X = x] = \begin{cases} \left(\frac{3}{2} - \frac{2\,d(0, x_2)}{N(\mathrm{dist}(\vec{0}, x))}\right) \Pr_p[X = x], & \text{if } x_1 \ge 0 \wedge \mathrm{dist}(\vec{0}, x) > 0 \\[1mm] \left(\frac{1}{2} + \frac{2\,d(0, x_2)}{N(\mathrm{dist}(\vec{0}, x))}\right) \Pr_p[X = x], & \text{if } x_1 < 0 \wedge \mathrm{dist}(\vec{0}, x) > 0 \\[1mm] \Pr_p[X = x], & \text{if } \mathrm{dist}(\vec{0}, x) = 0 \end{cases} \qquad (7.9)$$
♦
Figure 7.27 sketches how the probabilities are modified. This modification of the mutation operator interferes with the usual property of zero-mean mutations. However, this definition does not change the general scheme of preferring small changes over big changes, as is shown in the following lemma.
Lemma 7.7 The probability to make a step of size d (for 0 ≤ d ≤ maxstep) is
identical for the undirected and the directed version of a mutation using an arbitrary pdf p.
♦
Figure 7.27 Transformation of an undirected mutation into a directed mutation by
modification of the probabilities.
Proof: The case d = 0 holds trivially because of the last case of Definition 7.7. For d > 0 the following transformations hold (using N(d) = 4d and the fact that the undirected pdf assigns the same probability to each of the 4d points at distance d).

$$\sum_{x \in \mathbb{Z}^2,\ \mathrm{dist}(\vec{0},x)=d} Pr_{p^{dir}}[X = x] = \left( \sum_{0 \le i < d} 2 \left( \frac{3}{2} - \frac{2(d-i)}{4d} \right) + \sum_{0 < i < d} 2 \left( \frac{1}{2} + \frac{2(d-i)}{4d} \right) + \frac{1}{2} + \frac{3}{2} \right) Pr_p[X = x]$$
$$= 4d\, Pr_p[X = x] = \sum_{x \in \mathbb{Z}^2,\ \mathrm{dist}(\vec{0},x)=d} Pr_p[X = x].$$

q.e.d.
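As a numerical cross-check of Definition 7.7 and Lemma 7.7, the following sketch evaluates the directed weighting on Z² with Manhattan distance and verifies that the step-size marginals are unchanged. The particular step-size pdf p_step is an arbitrary stand-in, not one of the operators defined in this chapter.

```python
import itertools
from fractions import Fraction

maxstep = 5
# Arbitrary stand-in pdf over step sizes d = 0..maxstep.
total = (maxstep + 1) * (maxstep + 2) // 2
p_step = {d: Fraction(maxstep + 1 - d, total) for d in range(maxstep + 1)}

def dist(x):                      # Manhattan distance on Z^2
    return abs(x[0]) + abs(x[1])

def pr_p(x):                      # undirected: uniform over the N(d) = 4d ring points
    d = dist(x)
    return p_step[0] if d == 0 else p_step[d] / (4 * d)

def pr_dir(x):                    # directed weighting of Definition 7.7
    d = dist(x)
    if d == 0:
        return pr_p(x)
    f = Fraction(2 * abs(x[1]), 4 * d)
    return ((Fraction(3, 2) - f) if x[0] >= 0 else (Fraction(1, 2) + f)) * pr_p(x)

points = [x for x in itertools.product(range(-maxstep, maxstep + 1), repeat=2)
          if dist(x) <= maxstep]
for d in range(maxstep + 1):      # Lemma 7.7: identical step-size marginals
    ring = [x for x in points if dist(x) == d]
    assert sum(pr_dir(x) for x in ring) == sum(pr_p(x) for x in ring) == p_step[d]
```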
The application of Definition 7.7 to the previously defined local mutation (using p_local) results in the probability density function shown in Figure 7.28.

The following investigation of the directed local mutation relies completely on the exact Markov chain model since the worst-case model is not applicable to non-zero-mean mutations. Furthermore, we assume that the direction of the mutation
Figure 7.28 Resulting probability density function for the directed local mutation.
Figure 7.29 Directed local mutation: Expected distance to the optimum for severity 1, population size 5, and maximal step sizes as indicated.
Figure 7.30 Directed local mutation: Change in the distribution of the distance to
the optimum for severity 1 and population size 5 using the best found
step size.
is set optimally in the first generation and does not change during the complete simulation/computation.
Figure 7.31 Directed local mutation: Expected distance to the optimum for severity 2, population size 5, and maximal step sizes as indicated.
Figure 7.32 Directed local mutation: Change in the distribution of the distance to
the optimum for severity 2 and population size 5 using the best found
step size 21.
Figure 7.29 shows the expected distance for severity 1, population size 5, and various selected values for the maximal step size. The discovered best maximal step size is 5, leading to an expected distance of 1.079 at the end of generation 20. The respective course of the distribution is shown in Figure 7.30.
Compared to the results of the undirected mutation in Figure 7.22, the negative effects of too small maximal step sizes can be reduced considerably. The effects of too large maximal step sizes, however, appear to be unaffected.
For severity value 2, Figure 7.31 shows the expected distance to the optimum for selected maximal step sizes. The optimal parameter setting could be identified as step size 21 with an expected distance of 2.3825.
Figure 7.33 This figure illustrates the usage of the binomial pdf to define the mutation operator promoting bigger steps. The pdf is used to assign to each point a probability that this point is created as an offspring. It advocates a certain step size with a smaller standard deviation.
The respective course of the distribution is shown in Figure 7.32 with a considerably broader deviation than for severity 1. These results indicate, however, that the technique of directed mutations is not a significant remedy for the identified problems when applying local mutations to bigger severity values.
All in all the findings of this section can be summarized in the following recommendation.
Design rule 3 For small population sizes and predictable dynamics with small
severity, a well-orientated directed local mutation is able to reduce the divergent
behavior of local mutations with small maximal step sizes. It does not solve the
problems with higher severity values.
♦
7.5 Proposition of bigger steps
The motivation at the beginning of the last section has been the need to overcome the discrepancy between local mutation and problem characteristics. There, breaking with the underlying principles has been argued to be a proper means for identifying properties of better suited mutation operators. In the last section non-zero-mean mutations have been examined. In this section the promotion of bigger steps is analyzed, as well as the combination of bigger steps with non-zero-mean mutation.

Therefore, it must somehow be achieved that smaller steps are less likely to occur than bigger steps. In this section, this is implemented by a different assignment of the underlying binomial pdf to the pdf of the mutation operator, as is shown in Figure 7.33.
Figure 7.34 Resulting probability density functions for the ring-like mutation
proposing bigger step sizes.
A similar mutation with a more crisp pdf has been introduced by Weicker and Weicker (1999). The following definition introduces the mutation formally.
Definition 7.8 The ring-like mutation is defined using the following pdf

$$p_{ring}(d) = \begin{cases} 0, & \text{if } d = 0 \\[1ex] \binom{maxstep - 1}{d - 1} \frac{1}{2^{maxstep - 1}}, & \text{if } d > 0 \end{cases} \quad (7.10)$$

with 0 ≤ d ≤ maxstep in Definition 7.4.
♦
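A minimal sketch of this pdf, using only the binomial coefficient from the standard library, shows that the weights of Equation (7.10) indeed sum to one and concentrate around medium step sizes:

```python
from math import comb

def p_ring(d, maxstep):
    # Ring-like pdf (7.10): no mass on d = 0, binomially distributed mass
    # over d = 1..maxstep, concentrated around step size (maxstep + 1) / 2.
    if d == 0:
        return 0.0
    return comb(maxstep - 1, d - 1) / 2 ** (maxstep - 1)

maxstep = 9
probs = [p_ring(d, maxstep) for d in range(maxstep + 1)]
assert abs(sum(probs) - 1.0) < 1e-12                     # the weights form a pdf
assert max(probs) == p_ring((maxstep + 1) // 2, maxstep) # peak at medium steps
```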
Since this definition only specifies the assignment of probability values to step width d, the zero-mean principle is untouched by the definition. However, by combining Definitions 7.7 and 7.8 a mutation breaking with both principles can be defined. Both mutation probability density functions are shown in Figure 7.34.
Since the mere ring-like mutation is zero-mean, the worst-case Markov chain model is applicable. The result of the computation concerning the feasibility of tracking is shown in Figure 7.35. Due to the different shape of the ring-like mutation's pdf, the maximal step size needed to guarantee feasible tracking grows almost linearly with increasing severity. This can also be seen in the dependence of the expected step size on the maximal step size parameter shown in Figure 7.36. As a consequence the mutation step size can be derived better from a given severity value, and the optimal step size parameters are closer together for varying severity than in the case of the local mutation.

Another conclusion that can be drawn from the worst-case Markov chain model is the fact that, analogously to the considerations in Section 7.2, any too big maximal step size parameter will lead to successful tracking but with a smaller accuracy (see Figure 7.37).
Figure 7.35 Ring-like mutation: This figure shows the expected value of changing states for number of offspring λ = 5. For severity 1 the criterion is met from maximal step size 4 on; for severity 2 with maximal step size 9 and for severity 3 with step size 15.
Figure 7.36 The values for the maximum step size parameter that are necessary to
reach a particular expected step size.
This is perhaps astonishing since the shape of the ring-like mutation could imply a different behavior.
The exact Markov chain model is used for computations concerning severity values 1, 2, and 3. Again the number of offspring individuals is λ = 5.
Figure 7.37 Schematic illustration of how increasing the maximal step size parameter of the ring-like mutation helps to increase the fraction of improving mutations (leading to a smaller distance to the optimum).
The expected distance values are shown for severity 1 and selected maximal step sizes in Figure 7.38. The best maximal step size for the undirected ring-like mutation is 3, causing an expected distance of 1.724 after 20 generations. The best maximal step size for the directed version is 2 with an expected distance of 0.913. The respective courses of the distribution are shown in Figure 7.39. In accordance with the results using the local mutation, the directed version is able to soften the negative effects of too small mutations.
The results for severity 2 shown in Figure 7.40 and Figure 7.41 are along the same
lines as well as the results for severity 3 in Figure 7.42 and Figure 7.43.
Table 7.1 gives an overview of the optimal parameter values and the respective expected distances to the optimum in generation 20 for all considered severity values and mutation types.
In case of the ring-like mutations, the worst-case scenario again yields good approximations of these values. In the scenario using population size 5, the expected distance to the optimum (being an indicator for the accuracy) is approximately linear in the severity value (see also Figure 7.44 for severity 4). This aspect makes the directed ring-like mutation very appealing. However, two restrictions must be considered: first, the maximal step size has to be chosen accordingly and, second, the orientation of the directed mutation is crucial. The first restriction is discussed in the next paragraph and the second restriction is considered in Section 7.8.
When comparing the expected distances in Figure 7.22 and Figure 7.38, it appears that the ring-like mutation is much more sensitive to inappropriately big maximal step size parameters. This leads to a more severe decline in the tracking accuracy.
Figure 7.38 Ring-like mutation: Expected distance to the optimum for severity 1,
population size 5, and maximal step sizes as indicated.
Figure 7.39 Ring-like mutation: Change in the distribution of the distance to the
optimum for severity 1 and population size 5 using the optimal step
size.
Figure 7.40 Ring-like Mutation: Expected distance to the optimum for severity 2
and population size 5.
Figure 7.41 Ring-like Mutation: Change in the distribution of the distance to the
optimum for severity 2 and population size 5 using the optimal step
size.
Figure 7.42 Ring-like mutation: Expected distance to the optimum for severity 3
and population size 5.
Figure 7.43 Ring-like Mutation: Change in the distribution of the distance to the
optimum for severity 3 and population size 5 using the optimal maximal step size value.
                         best maximal step size   final expected distance
severity 1:
  undirected local                 9                       1.831
  directed local                   5                       1.079
  undirected ring-like             3                       1.724
  directed ring-like               2                       0.913
severity 2:
  undirected Gaussian           ≈ 30                         –
  directed Gaussian                –                         –
  undirected Ring                  7                       3.459
  directed Ring                    5                       2.104
severity 3:
  undirected Gaussian           ≈ 68                         –
  directed Gaussian                –                         –
  undirected Ring                 12                       5.225
  directed Ring                    9                       3.203
Table 7.1: Overview of maximal step size for the four mutation operators using
population size 5.
Figure 7.44 Ring-like mutation: Expected distance to the optimum for the directed mutation, severity 4 and population size 5.
Partially this can be explained by the different effect of the maximal step size parameter on the expected step size shown in Figure 7.36.
A more compelling explanation can be found by looking at the hitting probability (see Lemma 7.5) for a parental individual placed at the current optimum, i.e. we compute the probability to get as close as distance 5 to the optimum within one generation, where the optimum has moved the severity distance s. The results are shown for the local mutations as well as the ring-like mutations in Figure 7.45.
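Such one-generation hitting probabilities are easy to approximate by simulation. The following sketch estimates them by Monte Carlo on Z² with Manhattan distance; the uniform step-size pdf used here is only an illustrative stand-in for the operators defined above, so the numbers are not those of Figure 7.45.

```python
import numpy as np

rng = np.random.default_rng(1)

def ring_point(d, t):
    # The t-th of the 4d points at Manhattan distance d from the origin.
    q, r = divmod(t, d)
    if q == 0: return np.array([d - r, r])
    if q == 1: return np.array([-r, d - r])
    if q == 2: return np.array([r - d, -r])
    return np.array([r, r - d])

def sample_step(maxstep):
    # Illustrative stand-in pdf: uniform step size, uniform ring point.
    d = int(rng.integers(0, maxstep + 1))
    return np.array([0, 0]) if d == 0 else ring_point(d, int(rng.integers(0, 4 * d)))

def hit_probability(maxstep, severity, lam=5, target=5, trials=20000):
    # Parent sits on the old optimum; the optimum moves `severity` along x1.
    # Estimate the probability that the best of lam offspring ends up within
    # Manhattan distance `target` of the new optimum.
    hits = 0
    for _ in range(trials):
        best = min(abs(p[0] - severity) + abs(p[1])
                   for p in (sample_step(maxstep) for _ in range(lam)))
        hits += best <= target
    return hits / trials
```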
Figure 7.45 The diagram shows the probabilities to hit a point within distance 5
of the optimum. On the x-axis the maximum step size is shown and
different line types indicate different severity of dynamics.
There are three primary facts that we can conclude from this figure.
1. As in the examinations over 20 generations, a substantially higher accuracy of the ring-like mutation over the local mutation, as well as of the directed versions over the undirected versions, is obvious.

2. However, the higher hitting probability of the ring-like mutation holds only for a very narrow window of step size parameters. As a consequence, a poorly calibrated local mutation can easily outperform a poorly calibrated ring-like mutation.

3. In addition, the narrow windows of the ring-like mutation are shifted and overlap only partially. This may cause problems if the severity value is not known in advance or varies during optimization.
To sum up, the proposition of bigger steps has certain considerable advantages. However, the risk of the disadvantages must be analyzed and weighed carefully. The important recommendations of this section are summarized in the following design rules.
Design rule 4 The proposition of bigger steps enables better accuracy rates, but
requires proper calibration of the step size parameter. For problems with varying
severity values, the latter point should be guaranteed. Otherwise the accuracy
rates may drop below the accuracy of the local mutation.
♦
Design rule 5 Even with small population sizes, combining non-zero-mean mutation with the proposition of bigger steps may lead to very precise tracking accuracy.
♦
7.6 Dependence on the population size
As already discussed in Section 7.2 (Design rule 1), an increase in the population size can be used to diminish the expected distance to the optimum. There it was also argued that a change in the population size comes along with a change in the required optimal value of the step size parameter. This section is devoted to a more exhaustive analysis of the influence of the population size.
The dependence between the population size and the optimal value of the step size parameter is examined more closely using the exact Markov chain model. Figure 7.46 shows the expected distance to the optimum in generation 20 for severity 1, varying maximal step size (on the x-axis), and different population sizes as indicated. Figures 7.47 and 7.48 show the analogous computations for severity values 2 and 3. In all three figures the dependence between maximal step size and population size can be seen. Also the effect on the expected distance becomes obvious. With rather small population sizes (between 2 and 5), suboptimally chosen maximal step size values affect the distance to the optimum more significantly than with bigger population sizes (e.g. 8 and above).
However, the probably obvious conclusion to increase the population size for improving the tracking behavior contradicts the limited time resources assumed in this chapter. Therefore, in the remainder of this section the severity depends on the number of evaluations (or the number of offspring per generation). Up to now, the severity was rather conceived as an external characteristic of the problem. In this analysis, we correlate the population size with the severity: each evaluation contributes to the severity as it is conceived by the evolutionary algorithm in one generation.
Figure 7.46 Severity 1, ring-like mutation: Expected distance to the optimum in generation 20 for several maximal step sizes shown on the x-axis and population sizes as indicated.
Figure 7.47 Severity 2, ring-like mutation: Expected distance to the optimum in generation 20 for several maximal step sizes shown on the x-axis and population sizes as indicated.
Figure 7.48 Severity 3, ring-like mutation: Expected distance to the optimum in generation 20 for several maximal step sizes shown on the x-axis and population sizes as indicated.
This correlation is modeled using two distinct time factors,
• the time α to create and evaluate an individual and
• the time β as the processing time of one generation that is independent of the
number of individuals in the population.
Then, the total computation time for one generation results in

T = β + αn

where n is the population size. Furthermore, we assume that the computation time T equals the severity of the problem. For example, the time factors α = 1/3 and β = 0 imply that a population size 3 leads to severity 1. For α = β = 1/3 a population size 5 leads to severity 2.
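Inverting this time model gives the population size that realizes a given severity, which reproduces the population-size rows of Table 7.2 below. A small sketch (the helper name is ours, not the thesis notation):

```python
from fractions import Fraction

def population_for_severity(severity, alpha, beta):
    # Invert T = beta + alpha * n under the assumption severity = T.
    return round((severity - beta) / alpha)

for alpha, beta in [(Fraction(1, 2), Fraction(0)),
                    (Fraction(1, 3), Fraction(0)),
                    (Fraction(1, 5), Fraction(1, 5))]:
    print([population_for_severity(s, alpha, beta) for s in (1, 2, 3, 4)])
# -> [2, 4, 6, 8], [3, 6, 9, 12], and [4, 9, 14, 19], matching Table 7.2
```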
In the following computations the directed ring-like mutation is used since it exhibits the best tracking performance and the computations using the exact Markov
model are still feasible for severity value 4. The population sizes and the respective
optimal values for the step size parameter are shown in Table 7.2 for the used (α,
β)-configurations and the considered severity values.
First, the dependence of the severity on the population size without additional time costs (β = 0) is examined.
Figure 7.49 Results for β = 0 and directed ring-like mutation. In each graph four
different population sizes are used where the smallest population size
corresponds to severity 1, the next to severity 2, severity 3, and the
biggest to severity 4.
Figure 7.50 Results for α = 1/5, β > 0, and directed ring-like mutation. In each graph four different population sizes are used where the smallest population size corresponds to severity 1, the next to severity 2, the next to severity 3, and the biggest to severity 4.
                      severity 1   severity 2   severity 3   severity 4
α = 1/2, β = 0:
  population size          2            4            6            8
  maximal step size        4            6            8           10
α = 1/3, β = 0:
  population size          3            6            9           12
  maximal step size        3            5            7            9
α = 1/4, β = 0:
  population size          4            8           12           16
  maximal step size        2            4            6            8
α = 1/5, β = 0:
  population size          5           10           15           20
  maximal step size        2            4            6            8
α = 1/5, β = 1/5:
  population size          4            9           14           19
  maximal step size        2            4            6            8
α = 1/5, β = 2/5:
  population size          3            8           13           18
  maximal step size        3            4            6            8
α = 1/5, β = 3/5:
  population size          2            7           12           17
  maximal step size        4            4            6            8
Table 7.2: Population sizes and the respective optimal values for the maximal step
size parameter used to achieve a certain severity for the given (α, β) combinations
(directed ring-like mutation).
Figure 7.49 shows the computations of the expected distance in the exact Markov model for α ∈ {1/2, 1/3, 1/4, 1/5}. The value α = 1/2 reflects a very strong influence of the number of offspring on the severity. Surprisingly, increasing the offspring population size up to 8 still improves the expected distance although the severity increases. With α = 1/3 the picture is still the same. But with decreasing influence of the population size, optimal values for the population size are found: for α = 1/4 population size 12 is optimal and for α = 1/5 population size 10 is optimal (restricted by the coarse granularity of the values for the population size). This result can be explained as follows. For small numbers of offspring, a slight increase in the offspring population size leads to rather big positive effects on the
performance (see Figures 7.46, 7.47, and 7.48). This is due to the improvements of
a bigger population size in general as well as the decreasing optimal value for the
step size parameter. Apparently this positive effect outweighs the negative effects
of an increasing severity. With decreasing α, the influence due to the decreasing
optimal value of the step size parameter becomes smaller. As a consequence the
negative effects of an increasing severity value are bigger than the mere effects of
increasing the population size.
When we introduce an independent time cost β > 0, the results for α = 1/5 are shown in Figure 7.50. There the optimal population size increases from 10 with β = 0 to 14 with β = 1/5. For bigger values of β the optimal population size increases further, although the optimal values are not shown in the figure. As with the dependent cost α, we see that an increasing independent cost β affects the optimal population size considerably too. In particular, the figure shows how sensitively the optimal population size reacts to small irritations, e.g. due to administrative tasks.
The findings of this section are summarized in the following design rule.
Design rule 6 If the severity depends on the number of evaluations there is an
optimal offspring population size. This optimal number of offspring per generation
increases if the value α or the value β increase. This implies especially for very
restricted time resources rather big offspring population sizes. For the examined
mutation operator and β = 0, the population size 10–15 could serve as a rule of
thumb. A detailed analysis is necessary for concrete recommendations.
♦
7.7 Memorizing techniques
Memorizing techniques are rather seldom used in tracking tasks. However, there
are scenarios like the examination of Branke (1999c) where at least situations with
a tracking character might occur in an oscillating framework. In those situations
adding an external memory to the optimizer might be a useful idea. The external
memory stores a fixed number of previous solutions. Also the algorithm of Kirley
and Green (2000) uses an external memory and is applied to a drifting landscape.
An internal memory (using a polyploid representation) was applied to a tracking
task by Dasgupta (1995).
This section examines the use of an external memory for a tracking task. The framework is similar to the previous sections; however, it is assumed that the path of the optimum returns from time to time to former positions of the optimum. The investigation is concerned with what we gain by this technique in the successful case and what we lose if the memory fails to save the right solutions.
This is modeled by a slight modification of the exact Markov-model from Section 7.1.1. In the model the organization of the memory and the actual path of
the optimum are not considered in detail. Instead, it is simply assumed that in
each generation one individual is selected from the memory and is inserted into
the population. With a certain probability psuccess the individual corresponds to the
current position of the optimum. Furthermore, it is assumed that the memory stores
rather distinct solutions. Therefore, the model simplifies the new dynamics of the
memorizing technique by the assumption that either of the following cases holds:
1. The optimum is hit with the probability psuccess .
2. The optimum is not hit by the new individual and the introduced individual is
so far away from the optimum that the probabilities are not affected to reach
any other point of the exact model distinct from the optimum.
Since the random variables for hitting the optimum by the individual from the memory and by the individuals created by the local operator are independent of each other, the probability to hit the optimum is expressed by

$$\begin{aligned} Pr_p[\mathrm{best}(A + X_i - S, Z) = \vec{0} \mid X_1, \ldots, X_\lambda, Z] &= Pr_p[\mathrm{best}(A + X_i - S) = \vec{0} \mid X_1, \ldots, X_\lambda] + p_{success} \\ &\quad - Pr_p[\mathrm{best}(A + X_i - S) = \vec{0} \mid X_1, \ldots, X_\lambda] \; p_{success} \end{aligned}$$
where Z is a random variable associated with the individual selected from the memory.
The probability that another point distinct from the optimum is created as the best result of the local operator's application must be modified too. It is only effective if the individual from the memory does not correspond to the optimum. This probability is expressed by the following formula for y ≠ ~0 since both random variables are again independent.
$$Pr_p[\mathrm{best}(A + X_i - S, Z) = y \mid X_1, \ldots, X_\lambda, Z] = Pr_p[\mathrm{best}(A + X_i - S) = y \mid X_1, \ldots, X_\lambda] \, (1 - p_{success})$$
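These two modifications are straightforward to express in code. The following sketch only restates the probability bookkeeping above; the function names are introduced here for illustration and do not appear in the thesis.

```python
def hit_with_memory(p_local_hit, p_success):
    # The optimum is hit either by the best offspring of the local operator
    # or by the individual taken from the memory; both events are
    # independent, so the union rule applies.
    return p_local_hit + p_success - p_local_hit * p_success

def other_state_with_memory(p_local_state, p_success):
    # A point distinct from the optimum can only remain the best result
    # if the memory individual misses the optimum.
    return p_local_state * (1.0 - p_success)

assert hit_with_memory(0.3, 0.1) == 0.3 + 0.1 - 0.03
```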
Then a Markov chain model for the optimizer including the memory is given in the
following definition.
Definition 7.9 (Markov chain model for algorithm with memory) The Markov
chain model for a local optimizer using an external memory is defined by the model
Figure 7.51 Comparison of the tracking accuracy of evolutionary algorithms with and without external memory. The unsuccessful use of the memory assumes a success probability p_success = 0.
of Definition 7.5 with the following modifications.

$$T[A \to B] = Pr_p[\mathrm{best}(A + X_i - S, Z) = \vec{0} \mid X_1, \ldots, X_\lambda, Z]$$
$$T[A \to \mathrm{absorb}] = 1 - \sum_{v \in \mathrm{States} \setminus \{\mathrm{absorb}\}} Pr_p[\mathrm{best}(A + X_i - S, Z) = v \mid X_1, \ldots, X_\lambda, Z]$$

♦
This model is now used to examine at which expected success rate for choosing an optimal individual from the memory it is useful to invest one fitness evaluation in the individual from the memory. For an offspring population size λ, the EA with an external memory creates λ − 1 individuals with the mutation operator and chooses one individual from the memory. This algorithm is compared to an EA that creates λ individuals with the mutation operator.

The results of the comparison are shown in Figure 7.51 for population size 10 with severity values 2 and 3, population size 15 with severity 3, and population size 20 with severity 4.
The graph for offspring population size λ = 10 and severity 2 shows that with a rather small population size the success rate of the memorizing technique must reach a value of approximately p_success = 0.1 so that the tracking accuracy is improved. However, that means that every 10 generations a close-to-optimal solution is selected from the memory. This is a very unrealistic assumption. This threshold can be lowered by choosing bigger population sizes, as the other graphs imply (even for bigger severity values). But still, for all examined setups, a success rate of p_success = 0.01 is very close to the unsuccessful case with p_success = 0. And for the relevant range of "optimal" offspring population sizes the investment in an additional offspring is always an advantage over the use of an external memory in a realistic scenario.
However, an external memory is still useful if the problem combines a tracking character with alternating and repetitive coordinate dynamics. Then the success rate p_success is probably big enough that the investment into the memory evaluation pays off.
Design rule 7 For a mere tracking task the usage of an external memory should be avoided if the individuals from the memory have a success rate of p_success ≤ 0.01. However, introducing one individual from the memory into the population affects the tracking accuracy only moderately, such that an external memory is useful for tracking problems with non-predictable, repetitive phases or alternations with low severity.
♦
The examination of this section is based on a preliminary investigation of the hitting probability within a similar framework (Weicker, 2000). The findings go along the same lines as the empirical results of Kirley and Green (2000).
7.8 Issues of adaptation and self-adaptation
In the previous sections various techniques and approaches are discussed that aim
at improving the tracking behavior of a local search algorithm. However, all examinations and comparisons rely on the assumption that all algorithms are calibrated
optimally, which means that they perform at their best. Whether this assumption
can be met by the different algorithms in a dynamic environment is investigated in
this section.
Usually the properties of the dynamics in a non-stationary problem are not known beforehand. As a consequence, it appears to be useful to combine such an operator with a mechanism to adapt the parameter settings such that the algorithm is able to tune itself toward an optimal tracking behavior or to react to changes in the dynamics of the environment. Examples for adaptation and self-adaptation mechanisms have already been presented in Section 2.2.2. However, the decision for or against an adaptation mechanism must consider the following factors, which are examined in the remainder of this section.
1. Severity: Is the characteristic of the tackled problem rather stable (constant
severity)? Can we assume in advance that the severity is rather big or small?
Is the severity rather unstable, i.e. small and big severity values alternate?
2. Accuracy: Do we require the best possible accuracy or is the successful tracking at any accuracy sufficient?
3. Number of parameters: How many parameters are used in the adaptation
mechanism and need to be calibrated?
7.8.1 Evaluation of the presented operators
This subsection focuses on the self-adaptive potential of the local and the ring-like mutation as well as the tension between undirected and directed mutations.

In Section 7.5 the characteristics of the two different mutation operators have been compared. As shown in Figure 7.45, the higher accuracy of the ring-like mutation comes with a smaller window of maximum step size values that are close to the optimal value. This is not the case with the local mutation, where the hitting probability flattens smoothly with increasing maximal step size. And what is even more important: the optimal parameter range of the ring-like mutation shifts with increasing dynamics, with the consequence that there is no overlap between the optimal ranges, e.g. for severity values 3 and 10. Therefore, a self-adaptation
mechanism has to react very fast in case of the ring-like mutation when the severity of the problem changes drastically. Due to the different structure of the pdf, this appears to be only a subordinate problem for the local mutation. It is only of interest if a small maximal step size is used (e.g. for severity value 3) and the severity value increases (e.g. to 10) such that a big maximal step size (above 50) is required. However, a self-adaptation mechanism like in evolution strategies uses a multiplicative factor with the probability density function shown in Figure 7.52 (see also Equation 2.1). As a consequence, considerably increased values of the
Figure 7.52 Probability density function of the multiplicative change of the strategy variable in evolution strategy mutation.
self-adaptation parameter occur with a rather high probability. Since the window of good parameter values for the local operator is bounded only on one side and there is no severe problem with too big maximal step sizes, it is enough to create an individual with a big enough step size. This is less critical than hitting the correct range of step size values in case of the ring-like mutation.
Design rule 8 For problems with drastically changing dynamics, the use of the local mutation together with a self-adaptation mechanism promises a better tracking
behavior.
♦
If a directed mutation has to be supported by a self-adaptation mechanism, at least n strategy variables are necessary to represent a vector for the direction in an n-dimensional search space. However, this does not consider the step size of the mutation. Therefore, the directed mutation, e.g. by Hildebrand et al. (1999), needs n + 1 strategy variables. However, to adjust n + 1 parameters we need at least n + 1
useful, distinct evaluations of the search space. In a higher dimensional search space this requires either many evaluations per generation or the adaptation speed is very slow. This is underpinned by experimental investigations for undirected mutation schemes with n and n(n+1)/2 strategy variables. There it was shown that the adaptation mechanism is often too inert and fails to track an optimum (Weicker & Weicker, 1999).
Design rule 9 If the direction of the dynamics is expected to change considerably,
it is advisable to use an undirected mutation operator with few strategy variables
to ensure quick adaptation.
♦
7.8.2 Limits of self-adaptation: uncentered tracking
This subsection questions whether the self-adaptation technique is applicable to any kind of tracking problem. Self-adaptation was invented in the context of evolution strategies to enable the mutation operator to adapt to different problem landscapes. Analogously to the argumentation concerning the local (Gaussian) mutation in Section 7.4 (see Figure 7.26), self-adaptation is tailored to stationary landscapes. But in non-stationary problems the dynamics add a completely new dimension to the problem. And the question arises whether self-adaptation is still able to handle those problems.
To analyze this problem more closely, we drop the assumption that our algorithm is close to the optimum and only tries to track its position. Rather it is assumed that the optimum still needs to be detected in the changing landscape: tracking is combined with optimization. In order to explore the limits, a pathological setup is constructed where the tracking direction is arranged orthogonally to the optimization direction. The resulting "moving corridor problem" is sketched in Figure 7.53. Inside the corridor the fitness is increasing in one direction. Outside the corridor the fitness value is assumed to be constantly bad; if an algorithm fails to track the corridor in any generation, the tracking target is lost.
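A minimal sketch of such a landscape is given below. The axis assignment, the drift along x2, and all constants are illustrative assumptions used only to make the construction concrete; the thesis defines the problem geometrically, not by this code.

```python
def corridor_fitness(x1, x2, t, width=5.0, severity=3.0, bad=1e9):
    # Moving corridor problem (minimization sketch): the corridor drifts
    # along x2 with the given severity per generation, while inside the
    # corridor the fitness improves with growing x1 (orthogonal to the
    # tracking direction). Outside, the fitness is constantly bad.
    lower = t * severity                  # corridor position at time t
    if lower <= x2 <= lower + width:
        return -x1                        # optimization direction: larger x1
    return bad
```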
For an exemplary setup shown in Figure 7.54, we investigate the hitting probability (to hit the corridor) and the expected fitness improvement if an improvement takes place. For a severity value 3, a corridor of width 5, and the assumption that in the current generation the individual is located in the middle of the corridor, the computations are carried out for values of the maximal step size parameter between 5 and 550. From the examination of the worst-case model (Figure 7.17), the optimal setting of the maximal step size parameter is known to be approximately 68 for a mere tracking task and population size λ = 5.
Figure 7.53 Arrangement of the moving corridor problem where the tracking direction and the optimization direction are arranged orthogonally.
Figure 7.55 shows the computed values of the hitting probability and the expected fitness improvement in the case that the corridor is hit. Again there is an optimal value concerning the hitting probability. For higher values of the maximal step size the hitting probability decreases. On the other hand, the expected improvement is constantly increasing with increasing maximal step size. This has the following consequences. Suppose the self-adaptation mechanism randomly chooses a rather high value for the step size parameter (which happens rather often because of the multiplicative self-adaptation rule) and by chance the modified object value of the individual is inside the corridor and better than the current individual's fitness. Then the individual with the bigger step size parameter can be expected to have a better fitness than an offspring with a lower step size parameter (since the expected fitness improvement increases constantly with the step size). As a consequence the individual with the bigger value of the step size parameter is accepted for the next generation. Iterating this scheme, the maximal step size parameter will increase until the hitting probability has decreased so much that no offspring within the corridor is found anymore. During this process the self-adaptation will also create individuals with smaller step sizes, but because of their smaller expected improvement the selection will usually prefer the individual with the bigger step size parameter.
This is an example where the self-adaptation mechanism is not able to master both
tracking and optimization. Because of the focus of the selection on the fitness as
the optimization criterion, the trackability is lost in the long run.
Design rule 10 Sole dependence of a self-adaptation of parameters on the fitness
value is not advisable. The tracking rate should be reflected as a control quantity
in any component of the algorithm.
♦
Figure 7.54 Supposing the current individual is centered in the corridor, the figure shows the corridor of the next generation with severity 3 and a corridor width of 5. The shaded area marks the corridor.
Moreover, the moving corridor problem should be used as a benchmark for all adaptation techniques in dynamic environments to ensure a stable behavior even in rather extreme situations.
7.8.3 Alternative adaptation mechanisms
As the previous subsection has shown, the standard self-adaptation mechanisms might be deceived by the fitness values in the moving corridor problem. This subsection raises the question how different adaptation mechanisms might be conditioned.

An adaptation technique like the 1/5-success rule (see Section 2.2.2; a minimal sketch follows the list below) uses statistics to adapt a strategy parameter. The main differences to the self-adaptation mechanism are
Figure 7.55 For the example in Figure 7.54, the lower curve shows the hitting
probability and the upper curve shows the expected improvement if
an improvement takes place.
• that the change to the strategy parameter is not random but derived using a
rule and
• that the strategy parameter is applied to all individuals in the population
equally.
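For concreteness, the classic 1/5-success rule can be sketched as follows; the adjustment factor 0.85 is a conventional choice from the evolution strategy literature, not a value prescribed by this thesis.

```python
def one_fifth_rule(sigma, success_ratio, factor=0.85):
    # Rechenberg's 1/5-success rule: enlarge the global step size if more
    # than one fifth of the recent mutations were successful, shrink it
    # if fewer were; the same sigma is applied to the whole population.
    if success_ratio > 1 / 5:
        return sigma / factor
    if success_ratio < 1 / 5:
        return sigma * factor
    return sigma
```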
The second issue is probably an advantage since the dynamics are identical for all points in the search space as long as homogeneous problems with linear coordinate transformations are considered. Also, Angeline (1997) reported that there are indications in his experiments that adaptation might be superior to self-adaptation. However, how a sensible rule can be defined to guarantee successful adaptation concerning the moving corridor problem is not yet known and remains a topic of future work.
An even more sophisticated method to adapt an evolutionary algorithm to a dynamic problem would be the derivation of the meta-rule of the dynamics from the evaluated individuals. If the meta-rule can be approximated sufficiently, the dynamics can be eliminated and a standard algorithm may be used. In addition, this approach has the advantage that arbitrarily high accuracy can be reached independent of the population size. But this method is still as unexplored as the adaptation methods. In Section 8.4, a first technique is presented to derive the meta-rule in 2-dimensional dynamic problems.
7.9 Conclusion
This chapter has thoroughly investigated the local mutation in a dynamic environment. The focus has been on the question whether the principles of ES mutation, i.e. a zero-mean and preferably small change, are a good choice for arbitrary tracking problems. Furthermore, the adjustment of the parameters and necessary additional techniques were examined.
The examination results in a set of design rules that are summarized in the following
list.
1. By increasing the maximal step size parameter and/or the offspring population size, tracking becomes feasible for any severity value. Increasing the maximal step size decreases the accuracy. Increasing the population size can decrease the minimal required value for the maximal step size parameter.
2. Optimal step size parameters can be roughly estimated by the break even
expected distance of the worst-case analysis. The consequences of too big
parameter values are less severe than too small values with regard to the
accuracy for a fixed number of generations.
3. For small population sizes and predictable dynamics with small severity, a
well-orientated directed local mutation is able to reduce the divergent behavior of local mutations with small maximal step sizes. It does not solve the
problems with higher severity values.
4. The proposition of bigger steps enables better accuracy rates, but requires
proper calibration of the step size parameter. For problems with varying
severity values, the latter point should be guaranteed. Otherwise the accuracy
rates may drop below the accuracy of the local mutation.
5. Even with small population sizes, combining non-zero-mean mutation with the proposition of bigger steps may lead to very precise tracking accuracy.
6. If the severity depends on the number of evaluations there is an optimal offspring population size. This optimal number of offspring per generation increases if the value α or the value β increase. This implies especially for very
restricted time resources rather big offspring population sizes. For the examined mutation operator and β = 0, the population size 10–15 could serve as
a rule of thumb. A detailed analysis is necessary for concrete recommendations.
162
7.9. C ONCLUSION
7. For a mere tracking task the usage of an external memory should be avoided if the individuals from the memory have a success rate of p_success ≤ 0.01. However, introducing one individual from the memory into the population affects the tracking accuracy only moderately, such that an external memory is useful for tracking problems with non-predictable, repetitive phases or alternations with low severity.
8. For problems with drastically changing dynamics, the use of the local mutation together with a self-adaptation mechanism promises a better tracking
behavior.
9. If the direction of the dynamics is expected to change considerably, it is advisable to use an undirected mutation operator with few strategy variables to
ensure quick adaptation.
10. Sole dependence of a self-adaptation of parameters on the fitness value is not
advisable. The tracking rate should be reflected as a control quantity in any
component of the algorithm.
Besides these rules that might help to construct algorithms that are better suited to
tracking problems, the moving corridor problem is introduced as a new benchmark
problem for self-adaptation and adaptation techniques. This problem was used to
identify a new type of hard problems in the domain of dynamic environments.
CHAPTER 8
Four Case Studies Concerning the Design Rules
This chapter links the findings of Chapter 7 with the standard evolution strategy. In four small case studies the validity of (a subset of) the design rules in a continuous search space is shown. Since evolution strategies are usually used with self-adaptation or adaptation techniques, the used step width changes too. As a consequence, this chapter is not concerned with concrete optimal parameter values. It is rather intended to underline the general applicability of the design rules to optimization with evolution strategies.
Section 8.1 is concerned with the various adaptation and self-adaptation mechanisms. Concerning the proposition of bigger steps, a new mutation operator is derived from the previous chapter in Section 8.2. Section 8.3 examines the moving corridor problem as a pathological combination of tracking with optimization in the context of evolution strategies. Finally, Section 8.4 explores the possibilities to add a new adaptation scheme by building a global model of the dynamics during optimization.
8.1 Adapting and self-adapting local operators

8.1.1 Experimental setup
All the experiments in this section use the rotation shown in Example 4.3 where only one static component function is rotated around a point in the search space. The optimum is always positioned at the center of the rotation. And the task is to detect the optimum in spite of the changing environment. This kind of problem was chosen for this examination in particular since it poses certain difficulties for the optimizer:

• the circular movement is more demanding than a linear movement concerning more advanced adaptation schemes,

• points close to the optimum have a smaller severity than points with a bigger distance to the optimum, and

• it resembles the characteristics of the moving corridor problem since again the algorithm has to keep up tracking of certain points where the optimization direction is orthogonal to the tracking direction.
The following fitness functions are used as static component functions. Their general structure is visualized in Figure 8.1 for the two-dimensional case. All problems must be minimized.
• Weighted sphere: for x_i ∈ [−30, 30] (1 ≤ i ≤ n) and n = 30

$$f(\vec{x}) = \sum_{i=1}^{n} i x_i^2$$

This is a unimodal problem. This problem is used to determine how well the algorithm optimizes in a problem with small fitness changes.
• Rastrigin: for x_i ∈ [−1, 1] (1 ≤ i ≤ n) and n = 30

$$f(\vec{x}) = 10n + \sum_{i=1}^{n} \left( x_i^2 - 10 \cos(2\pi x_i) \right)$$

Note the restricted definition range of the Rastrigin function. However, the problem is still multimodal with several local optima situated around the global optimum. This problem faces the optimizer with a static difficulty – overcoming a local optimum – embedded into a dynamic environment.
• Cone segment: for x_i ∈ [−10, 10] (1 ≤ i ≤ n) and n = 30

$$f(\vec{x}) = \begin{cases} \sqrt{\sum_{i=1}^{n} x_i^2}, & \text{if } \arccos\left( \frac{x_1}{\sqrt{x_1^2 + \ldots + x_n^2}} \right) \leq \alpha \\[1ex] \sqrt{\sum_{i=1}^{n} 10^2}, & \text{otherwise} \end{cases}$$

with α = π/2 in this investigation. This problem is also unimodal. However, only in one direction from the optimum is fitness information available that leads to the optimum. All other directions are part of a plateau with constant "bad" fitness. This problem helps to investigate the ability to follow a small tracking region and requires further optimization within the changing landscape.
• Windmill: for x_i ∈ [−1, 1] (1 ≤ i ≤ n), n = 2, and the following fitness function described in polar coordinates (r, ϕ)

$$f(r, \varphi) = \begin{cases} r, & \text{if } P_0(r, \varphi) \vee P_{\pi/2}(r, \varphi) \vee P_{\pi}(r, \varphi) \vee P_{3\pi/2}(r, \varphi) \\ 2.0, & \text{otherwise} \end{cases}$$

with the predicate

$$P_{rot}(r, \varphi) \equiv (\varphi + rot)\,\frac{4\sqrt{2}}{\pi} - \frac{\sqrt{2}}{9} \leq r \leq (\varphi + rot)\,\frac{4\sqrt{2}}{\pi}$$

This function results in four segments of width 5° = π/36 that look like a windmill. It is a two-dimensional version of the cone segment where a more accurate tracking is necessary. However, the existence of four segments softens the punishment if one segment is lost, since the next approaching segment offers a next chance to gain information leading to the optimum.
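The following sketch implements this function in Cartesian coordinates. Wrapping (ϕ + rot) back into [0, 2π) is an assumption made here so that all four segments fall into the domain [−1, 1]²; everything else follows the definition above.

```python
import math

SQRT2 = math.sqrt(2)

def windmill_fitness(x1, x2):
    r = math.hypot(x1, x2)
    phi = math.atan2(x2, x1) % (2 * math.pi)   # polar angle in [0, 2*pi)
    for rot in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2):
        a = (phi + rot) % (2 * math.pi)        # assumed wrap-around
        upper = a * 4 * SQRT2 / math.pi
        if upper - SQRT2 / 9 <= r <= upper:    # predicate P_rot(r, phi)
            return r      # inside one of the four 5-degree segments
    return 2.0            # plateau of constant bad fitness
```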
The rotation must be supplied with a predetermined rotation time τ , which equals
the number of generations until previous landscapes are repeated again. The rotation is managed by the multiplication of single rotations around two coordinate
axes. The rotation is given exactly in the following definition.
Definition 8.1 (Rotation matrix) The rotation matrix for an n-dimensional search space is defined by a permutation Perm of the dimensions {1, . . . , n} and the rotation time τ. In addition, n is required to be even. Then the rotation matrix is defined as the multiplication

$$R = R_{p_1, p_2} \cdot R_{p_3, p_4} \cdots R_{p_{n-1}, p_n}$$
[Panels: Weighted sphere, Rastrigin, Cone segment, Spiral]
Figure 8.1 Inverse diagrams of the used static fitness functions. In all functions
the optimum is positioned at (0, 0).
with Perm = (p_1, . . . , p_n) and the pairwise rotations R_{i,j} defined by

$$R_{i,j}(k, l) = \begin{cases} \cos(\frac{2\pi}{\tau}), & \text{if } (k = i \wedge l = i) \vee (k = j \wedge l = j) \\ \sin(\frac{2\pi}{\tau}), & \text{if } k = i \wedge l = j \\ -\sin(\frac{2\pi}{\tau}), & \text{if } k = j \wedge l = i \\ 1, & \text{if } k = l \wedge k \neq i \wedge k \neq j \\ 0, & \text{otherwise} \end{cases}$$

for 1 ≤ i, j, k, l ≤ n. The complete rotation matrix at generation g > 0 may be computed as R^g.
♦
Example 8.1 The matrix for a rotation in a 4-dimensional search space R^4 determined by the permutation (1, 3, 4, 2) is constructed using the following two basic rotations.
cycle time                      5        25       50       100      200
1/5-success rule                0.225    0.272    0.217    0.0315   0.0
self-adaptive isotropic         0.0928   0.027    0.0117   0.0      0.0
self-adaptive non-isotropic     0.179    0.0758   0.082    0.0696   0.0736
Table 8.1: Fraction of lost generations where there is no valid individual in the
population (averaged over all respective runs and generations).
$$R_{1,3} = \begin{pmatrix} \cos(\frac{2\pi}{\tau}) & 0 & \sin(\frac{2\pi}{\tau}) & 0 \\ 0 & 1 & 0 & 0 \\ -\sin(\frac{2\pi}{\tau}) & 0 & \cos(\frac{2\pi}{\tau}) & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad R_{4,2} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos(\frac{2\pi}{\tau}) & 0 & -\sin(\frac{2\pi}{\tau}) \\ 0 & 0 & 1 & 0 \\ 0 & \sin(\frac{2\pi}{\tau}) & 0 & \cos(\frac{2\pi}{\tau}) \end{pmatrix}$$

resulting in the overall rotation matrix

$$R = R_{1,3} \cdot R_{4,2} = \begin{pmatrix} \cos(\frac{2\pi}{\tau}) & 0 & \sin(\frac{2\pi}{\tau}) & 0 \\ 0 & \cos(\frac{2\pi}{\tau}) & 0 & -\sin(\frac{2\pi}{\tau}) \\ -\sin(\frac{2\pi}{\tau}) & 0 & \cos(\frac{2\pi}{\tau}) & 0 \\ 0 & \sin(\frac{2\pi}{\tau}) & 0 & \cos(\frac{2\pi}{\tau}) \end{pmatrix}$$

♦
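A small sketch, assuming numpy, builds these matrices directly from Definition 8.1 and checks the two properties used in the text: the result is orthogonal, and after τ generations the landscape repeats.

```python
import numpy as np

def pairwise_rotation(n, i, j, tau):
    # R_{i,j} of Definition 8.1 (i, j are 1-based dimension indices).
    R = np.eye(n)
    c, s = np.cos(2 * np.pi / tau), np.sin(2 * np.pi / tau)
    R[i - 1, i - 1] = R[j - 1, j - 1] = c
    R[i - 1, j - 1] = s
    R[j - 1, i - 1] = -s
    return R

def rotation_matrix(perm, tau):
    # Product of the pairwise rotations determined by Perm.
    n = len(perm)
    R = np.eye(n)
    for k in range(0, n, 2):
        R = R @ pairwise_rotation(n, perm[k], perm[k + 1], tau)
    return R

R = rotation_matrix((1, 3, 4, 2), tau=72)
assert np.allclose(R @ R.T, np.eye(4))                        # orthogonal
assert np.allclose(np.linalg.matrix_power(R, 72), np.eye(4))  # R^tau = I
```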
In the experiments presented here the values for τ are 5, 25, 50, 100, and 200 for the weighted sphere, the cone segment, and the Rastrigin function. The permutations Perm are created randomly. For the windmill function τ equals 72 and 144. Also note that in case of the windmill function the static landscape (no rotation) was examined as well.
The evolution strategy as an optimization algorithm is used as a (15, 100)-strategy without recombination. There was no further tuning of these parameters. The offspring population size was investigated more closely for the two-dimensional windmill function, where (1, λ)-strategies with λ ∈ {10, 15, 20, 25, 30, 35, 50, 75, 100} were used. In order to adapt the step size of the mutation the following mechanisms are used:
Figure 8.2 Rotating weighted sphere optimized by evolution strategies
• the 1/5-success rule by Rechenberg (1973) as a global adaptation mechanism,
• self-adaptation of the isotropic mutation with one strategy parameter which
is applied to all search space dimensions (see Schwefel, 1981),
• step-size self-adaptation for the non-isotropic mutation with n strategy parameters (see Schwefel, 1981), and
• the covariance matrix adaptation (cma) by Hansen and Ostermeier (1996,
2001) for the two-dimensional windmill function only.
Figure 8.3 Rotating cone segment with fast severity optimized by evolution strategies
All experiments are averaged over 100 – or in the case of the windmill function 200 – independent runs. As a basis for comparison the best fitness value and the distance of the best individual to the optimum in each generation are used. In case of the rotating cone segment also the percentage of generations is considered in which the segment with "valid" or "good" fitness was completely lost.

The results are displayed in Figures 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, and Table 8.1. They are described and discussed in the following sections in various different contexts.
Figure 8.4 Rotating cone segment with low severity optimized by evolution strategies
8.1.2 Limitations of local operators
In this subsection, we are concerned with the question whether the evolution strategy is actually able to solve the dynamic optimization problems at all. A comparison of the different techniques follows in the next two subsections.
Figure 8.5 Rotating Rastrigin function optimized by evolution strategies
Figure 8.6 Windmill function: average convergence for small populations and various degrees of rotation.
[Figure: average distance to optimum over generations 0–200 for cycle time 72 with population sizes 50 and 100; curves for isotropic, non-isotropic, and cma.]
Figure 8.7 Windmill function: average convergence for big populations and cycle
time τ = 72.
Operators that are local concerning the phenotypic search space are very effective
on rather smooth static landscapes. As the experiments on the rotating weighted
sphere (Figure 8.2) show, this is also the case on a smooth unimodal function rotating around the optimum.
The evolution strategy is also able to optimize the rotating cone segment, as the results in Figures 8.3 and 8.4 show. Although this function is the only one where successful tracking of a certain region is strictly required (as in the moving corridor problem of the previous chapter), this task is solved adequately.
However, similar to static optimization, the picture changes as soon as multimodal problems are involved. The rotating Rastrigin function cannot be solved by any variant of the evolution strategy (Figure 8.5). Apparently the algorithm is not able
to overcome the local optima by mere local search. The reasons for this behavior
have not been investigated further. But since the evolution strategy is able to follow
the cone segment in the previous problem, we assume that the interplay of the
distance between local and global optima, the severity of the dynamics, and the
techniques for step-size adaptation prevent the algorithm from detecting the basin
of the global optimum.
[Figure: trajectories of the best individuals in [−1, 1]² for six exemplary runs: cma (run 91, 60 generations; run 99, 60 generations), non-isotr. (run 93, 110 generations; run 95, 130 generations), and isotr. (run 99, 100 generations; run 28, 200 generations).]
Figure 8.8 Windmill function: path of the best individuals for exemplary runs with cycle time τ = 72 and population size 15. The points where the visible segment is lost are marked with a +. Note that the visible segment rotates clockwise.
[Figure: histograms of frequency over distance to optimum for rotations 0.0, 0.5, and 1.0 with population sizes 15 and 30; bars for isotropic, non-isotropic, and cma.]
Figure 8.9 Windmill function: distribution of the final fitness values of the 200
experiments for each parameter setting for a qualitative comparison of
the reliability of the different algorithms.
8.1.3 Adaptation
As an adaptive technique the 1/5-success rule was used for the adaptation of the
step size in evolution strategy mutation. This technique is known to work well for
unimodal static problems but gets trapped easily in the case of multimodal problems.
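For illustration, the following is a minimal sketch of (1+1)-ES step-size control by the 1/5-success rule. The window length k and the adaptation factor 0.85 are common textbook choices and not necessarily the settings of these experiments; a static minimization loop is shown for brevity.

import random

def one_fifth_rule_es(f, x, sigma=1.0, generations=200, k=10, c=0.85):
    """Minimal (1+1)-ES with 1/5-success-rule step-size control (sketch).

    f is minimized; k is the observation window, c the adaptation factor.
    These constants are common textbook choices, not the thesis settings.
    """
    fx = f(x)
    successes = 0
    for g in range(1, generations + 1):
        y = [xi + sigma * random.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        if fy < fx:                  # offspring replaces parent on success
            x, fx, successes = y, fy, successes + 1
        if g % k == 0:               # adapt step size every k generations
            if successes > k / 5:
                sigma /= c           # too many successes: enlarge steps
            elif successes < k / 5:
                sigma *= c           # too few successes: shrink steps
            successes = 0
    return x, sigma

# usage: minimize a simple sphere function
best, step = one_fifth_rule_es(lambda v: sum(t * t for t in v), [5.0, -3.0])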
In the first experiments using the rotating weighted sphere (Figure 8.2), the 1/5-success rule performs best as long as the rotation is very fast. Its performance worsens slowly, and at rotation time τ = 200 certain problems become visible.
This result is somewhat surprising since one would expect a better performance
with slower rotation. But apparently, in the case of slow rotation, certain points in the landscape are perceived as if they were local optima during certain periods of the optimization, since all points in this region have degrading fitness. Then, the
1/5-success rule reduces the step size considerably and impedes the optimization
process. In case of fast rotation, the 1/5-success rule is not able to reduce the step
size comparably. As a consequence, the optimization process is not affected and
the 1/5-success rule outperforms all other algorithms. The fast rotation of an unimodal search space around the optimum helps to soften the effects of premature
convergence occurring with slow speeds of rotation.
However, these results are not transferable to arbitrary dynamic problems. In the
case of the slowly rotating cone segment (Figure 8.3 and Table 8.1) the 1/5-success
rule shows an insufficient tracking behavior which leads to mediocre performance.
One reason for this behavior is the very high fraction of lost runs which underlines
that the 1/5-success rule is not able to adapt the operator appropriately and to follow the moving cone segment. These effects vanish with increasing cycle time, i.e.
decreasing severity. With cycle time 200, the algorithm is even able to follow the
cone segment in all runs and the 1/5-success rule outperforms the self-adapting
techniques. As a consequence, it seems that the 1/5-success rule is a good choice
if the dynamics are very slow and the tracking area is distinct from the remaining
search space.
The investigation of the 1/5-success rule shows that results of algorithms in dynamic environments must be examined and interpreted very carefully since various
reasons and interactions can lead to good or poor performance.
8.1.4 Self-adaptation
Contrary to the 1/5-success rule as an adaptation mechanism, the results indicate
that self-adaptation is able to yield good results on a wider range of problems. In
particular, this is true for the step-size adaptation of the isotropic mutation. Here a
very stable behavior can be observed on the rotating weighted sphere (Figure 8.2)
where the performance even seems to be independent of the cycle time of the problem. In the case of the rotating cone segment the step-size adaptation of the isotropic mutation also shows good performance (Figures 8.3 and 8.4); however, with increasing severity (or decreasing cycle time) the performance worsens slightly. As Table 8.1 indicates, this is also due to an increase in lost generations, i.e. generations in which no individual could be placed in the cone segment. Still, the fraction of lost generations is very small compared to the other techniques used.
The self-adaptation using n strategy parameters for the non-isotropic mutation
proves to be less adequate than the adaptation of the isotropic mutation. In particular this is due to the rotating movement which requires the self-adaptation mechanism to adapt all separate strategy parameters simultaneously in a correlated manner. As the fraction of lost runs for cycle time 100 shows in Figure 8.4, there is a
peak of lost runs after each quarter rotation—these are the points where the sign of
half of the strategy parameters has to change. Apparently this is rather problematic
and is at least one reason for the inappropriate performance.
The self-adaptation for the isotropic mutation uses only one strategy parameter
and, therefore, shows a much more stable behavior. This might be a hint that self-adaptation mechanisms involving fewer strategy variables are able to react more promptly, as indicated by Design rule 9.
This hypothesis is examined more closely with the rotating windmill function with
dimension 2. On this function self-adaptation for both types of mutation as well as
the covariance matrix adaptation cma are investigated.
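For reference, a minimal sketch of the two self-adaptive mutation variants compared here follows (cma is omitted). The learning rates are common choices from the literature and assumptions here, not necessarily the settings used in these experiments.

import math
import random

def mutate_isotropic(x, sigma):
    """Isotropic self-adaptive mutation: one strategy parameter (sketch)."""
    tau = 1.0 / math.sqrt(len(x))             # common choice, an assumption
    sigma_new = sigma * math.exp(tau * random.gauss(0.0, 1.0))
    x_new = [xi + sigma_new * random.gauss(0.0, 1.0) for xi in x]
    return x_new, sigma_new

def mutate_non_isotropic(x, sigmas):
    """Non-isotropic mutation: n strategy parameters (sketch)."""
    n = len(x)
    tau_global = 1.0 / math.sqrt(2.0 * n)     # common choices, assumptions
    tau_local = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    common = tau_global * random.gauss(0.0, 1.0)
    # each axis gets its own log-normally updated step size
    sigmas_new = [s * math.exp(common + tau_local * random.gauss(0.0, 1.0))
                  for s in sigmas]
    x_new = [xi + s * random.gauss(0.0, 1.0) for xi, s in zip(x, sigmas_new)]
    return x_new, sigmas_new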
Figure 8.6 shows the results for small population sizes by using the distance to the
optimum as convergence criterion. The covariance matrix adaptation outperforms
both other self-adaptation techniques in the static case. However, with increasing
severity, cma performs considerably worse. The self-adaptation of the non-isotropic mutation with n strategy parameters also worsens. The self-adaptation of the isotropic mutation is the only technique that shows, with an offspring population size of 30 individuals, a stable performance in the static problem and the two dynamic problems. However, the difference between 15 and 30 offspring already indicates that increasing the offspring population can improve the performance of cma considerably. Figure 8.7 shows that an offspring population size of 100 individuals suffices for cma to beat the self-adaptive isotropic mutation again. However, such an offspring population size seems to be inadequate for a problem with a two-dimensional search space.
By examining the windmill experiments more closely, a few explanations for the insufficient behavior of the more complex self-adaptive techniques are found. Obviously the
simple adaptation technique is able to adapt quickly within the dynamically changing environments where the more sophisticated mechanisms are too inert to adapt.
The information is not steady enough to enable successful adaptation. In the case
of covariance matrix adaptation, two exemplary runs in Figure 8.8 show a frequently occurring behavior for cycle time τ = 72 with a small population size: the mutation is able to track the visible segment for some generations but then gets lost and recovers only very seldom, even though visible information passes by. Figure 8.9
reflects this behavior and shows that a high percentage of the experiments got lost
far from the optimum.
In the case of the self-adaptive non-isotropic mutation with n strategy parameters,
the exemplary runs in Figure 8.8 show that, although the visible segment is tracked
for periods, the behavior is rather erratic. In one example, it is not attracted by the
optimum and only moves in a big circle around it. In the other example, it gets close
to the optimum but moves away again. Moreover, it seems as if the self-adaptation
only adapts in one dimension at a time.
Figure 8.8 also shows two typical runs for the self-adaptive isotropic mutation in
which the visible segment is tracked very well and the search is drawn towards the
optimum. However, in the right example the algorithm is not able to stay close
to the optimum and tends to move again slightly away from it. Still, a very good
accuracy of the optimization is also reflected in the final fitness distributions in
Figure 8.9.
This comparison illustrates that sometimes the simple self-adapting mutations are
more successful than complex, smart adaptation mechanisms. The reason for this behavior is the underlying supposition of those mechanisms that the fitness landscape remains unchanged until the adaptation has taken place. In the case of dynamic landscapes this supposition is seldom true.
8.1.5 Discussion
The experiments confirm the considerations of the previous chapter concerning self-adaptation. A simpler self-adaptation mechanism seems to be advisable (Design rule 9). However, adaptation in the form of the 1/5-success rule is not a sufficiently general adaptation mechanism.
The investigation of the offspring population size also confirms Design rule 1 concerning the influence of the population size.
The consequences concerning optimization and tracking in Design rule 10 could
not be discovered in the examination of the rotating cone segment. Here, at least the
step-size adaptation with one strategy parameter shows a good adaptive behavior.
The results of this section are based on the experiments of two previously published
articles (Weicker & Weicker, 1999, 2000).
8.2 Proposition of bigger steps
This section examines whether Design rule 4 is applicable, i.e. whether the proposition of bigger steps leads to an improved tracking accuracy.
For this investigation, a mere tracking problem was chosen: a decentralized sphere
rotating around the center. The static component function is defined as follows.
• Decentralized Sphere: for $x_i \in [-30, 30]$ ($1 \le i \le n$) and $n = 2$
$$f(\vec{x}) = \sum_{i=1}^{n} \left( x_i - \sqrt{\frac{30^2}{n}} \right)^2$$
Again the rotation from Definition 8.1 is applied to turn the static problem into a dynamic one. Cycle times τ ∈ {50, 100} are considered. Note that the distance of the optimum to the center of the rotation is always 30, which means that the optimum is always inside or at the border of the search space.
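A minimal sketch of this test function follows, assuming that the rotation of Definition 8.1 (not restated here) turns the search space around the origin by the angle 2πt/τ at time step t.

import math

def decentralized_sphere(x, t, tau, n=2):
    """Decentralized sphere rotating around the center (sketch).

    Assumes the rotation of Definition 8.1 turns the search space by
    2*pi*t/tau around the origin; the optimum stays at distance 30.
    """
    alpha = 2.0 * math.pi * t / tau           # rotation angle at time t
    # rotate the point back into the static coordinate system
    u = math.cos(-alpha) * x[0] - math.sin(-alpha) * x[1]
    v = math.sin(-alpha) * x[0] + math.cos(-alpha) * x[1]
    c = math.sqrt(30.0 ** 2 / n)              # optimum coordinate per axis
    return (u - c) ** 2 + (v - c) ** 2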
For this examination, a (1, 15)-evolution strategy is used. The standard isotropic
Gaussian mutation with step-size self-adaptation is compared to the following ring-like mutation, which is derived from the analytical investigation in Section 7.5.
Where in the theoretical investigation the ring-like mutation was parameterized by
the maximal step size maxstep, this is not possible in a real-valued search space
if the ring-like mutation is based on the Gaussian pdf. However, for convenience
we still call the strategy parameter of the real-valued ring-like mutation maxstep.
When mutating an individual $A = \langle A_1, \ldots, A_n, maxstep \rangle$ we proceed as follows.

1. The maximal step size is updated like the standard deviation in the isotropic step-size adaptation of evolution strategies:
$$maxstep' \leftarrow maxstep \cdot e^{\frac{1}{\sqrt{l}} N(0,1)}$$

2. A unit vector $v = \langle v_1, \ldots, v_n \rangle$ is chosen randomly as the direction of the mutation.

3. A random number $x \sim N(0,1)$ is chosen and mapped to the step size
$$stepsize \leftarrow \frac{4 + x}{8}\, maxstep'$$

4. Each component $A_i$ ($1 \le i \le n$) is updated using the rule
$$A'_i \leftarrow A_i + stepsize \cdot v_i$$

The new individual is $A' = \langle A'_1, \ldots, A'_n, maxstep' \rangle$.
Figure 8.10 shows how the random variable x is mapped to the distance range [0, maxstep]. However, maxstep is not the maximal occurring step size: with a probability of approximately 0.0000317 a bigger step may occur (see p. 20, Bronstein & Semendjajew, 1991). Similarly, steps with a negative step size, i.e. steps in the opposite direction, may occur with probability 0.0000317 as well. Apart from these deviations, this mutation imitates the discrete ring-like mutation as closely as possible.
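The following sketch transcribes the four steps above into code; the dimension parameter (corresponding to l in the update formula) and the handling of the direction vector are simplifying assumptions.

import math
import random

def ring_like_mutation(a, maxstep, dim=2):
    """One application of the ring-like mutation described above (sketch).

    a: object variables; maxstep: strategy parameter; dim corresponds
    to l in the step-size update formula (an assumption).
    """
    # 1. log-normal update of the maximal step size
    maxstep_new = maxstep * math.exp(random.gauss(0.0, 1.0) / math.sqrt(dim))
    # 2. random unit direction vector
    direction = [random.gauss(0.0, 1.0) for _ in range(len(a))]
    norm = math.sqrt(sum(d * d for d in direction))
    direction = [d / norm for d in direction]
    # 3. step size concentrated around maxstep'/2, within [0, maxstep']
    #    except for small Gaussian tail probabilities
    x = random.gauss(0.0, 1.0)
    stepsize = (4.0 + x) / 8.0 * maxstep_new
    # 4. move all components along the chosen direction
    a_new = [ai + stepsize * di for ai, di in zip(a, direction)]
    return a_new, maxstep_new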
[Figure: Gaussian pdf over the random variable x for the step size; the range [−4, 4] is mapped to distances [0, maxstep], with small tail probabilities of stepping in the opposite direction (x < −4) or further than maxstep (x > 4).]
Figure 8.10 The probability of the step size in the ring-like mutation is explained
in this figure.
For each set-up of problem and algorithm, 100 experiments have been executed for 200 generations using different initial random seeds. Student's t-test is applied to the best fitness values of each generation. A difference between the mean best fitness values of the two algorithms is considered to be significant if the error rate is less than 0.05, i.e. for t-values above 1.96 or below −1.96.
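As a sketch of this evaluation procedure, the per-generation t-values can be computed with SciPy's two-sample t-test; the data layout (one list of per-generation best fitness values per run) is an assumption for illustration.

from scipy import stats

def per_generation_t_values(fitness_a, fitness_b):
    """Two-sample t-test per generation (sketch).

    fitness_a, fitness_b: lists of runs, each run a list of per-generation
    best fitness values (here 100 runs x 200 generations is assumed).
    Returns one t-value per generation; |t| > 1.96 is significant at 0.05.
    """
    generations = len(fitness_a[0])
    t_values = []
    for g in range(generations):
        sample_a = [run[g] for run in fitness_a]
        sample_b = [run[g] for run in fitness_b]
        t, _ = stats.ttest_ind(sample_a, sample_b)
        t_values.append(t)
    return t_values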
[Figure: average best fitness (left) and per-generation t-values with significance bounds for Ring and Gauss (right) of the Gaussian and ring-like mutation over generations 0–200.]
Figure 8.11 This figure shows the results of the Gaussian and the ring-like mutation on the decentralized, rotating sphere function with cycle time
50. The best fitness value averaged over 100 experiments is shown on
the left. The t-value of the statistical hypothesis test is shown on the
right.
Figure 8.12 This figure shows the results of the Gaussian and the ring-like mutation on the decentralized, rotating sphere function with cycle time
100. The best fitness value averaged over 100 experiments is shown
on the left. The t-value of the statistical hypothesis test is shown on
the right.
Figure 8.11 shows the results for cycle time 50. If we exclude the initial phase of finding the optimum, the ring-like mutation achieves a better average best fitness in almost all generations, and in many generations the difference is even significant. For the smaller severity (cycle time 100), the results in Figure 8.12 show that the advantage of the ring-like mutation over the Gaussian mutation decreases. Altogether we can conclude that there is an obvious trend in favor of the ring-like mutation with increasing severity, which validates the statement of Design rule 4.
8.3 Self-adaptation for the moving corridor
Section 7.8.2 of the last chapter has argued that in a set-up where the tracking
direction and the optimization direction are arranged orthogonally a self-adaptation
mechanism of a local operator will increase its step size until the step size is so big
that no accurate tracking is possible anymore. This statement is reconsidered for
evolution strategies.
The problem is defined very similarly to the discrete version of the problem, with a few particularities. Again the dimension of the problem is 2. Since we consider both isotropic and non-isotropic mutation with self-adaptation, the direction of the dynamics is arranged at a 30° angle to one axis of the search space to rule out effects due to an alignment. The width of the corridor is 1.0 and the severity is measured as the fraction of the area of the corridor that is no longer covered by the corridor of the next generation. In this investigation a severity of 0.6 is used.
Both the set-up of the corridor and the definition of the severity are displayed in
Figure 8.13.
Because of the very narrow corridor that must be tracked, a (1, 40)-evolution strategy is used. As already indicated above, mutation operators with one and with n strategy parameters are used. For each set-up of algorithm and problem, 100 independent experiments are executed using different initial random seeds. As performance measures, the average best fitness over all experiments, the percentage of invalid individuals created, and the percentage of experiments which got completely lost, i.e. which could no longer track the tracking region, are considered.
The results are shown exemplarily in Figure 8.14 for severity 0.6. Other severity values reveal a similar behavior and are not shown here. The step-size adaptation of the isotropic mutation shows an astonishingly good behavior. However, as the theoretical analysis in the previous chapter predicted, this is accompanied by a huge increase in completely lost runs and invalid individuals. At the end of 200 generations more than 50 percent of all experiments have lost the moving corridor. This is due to the inevitable increase of the strategy parameter values, shown for the isotropic mutation in the left part of Figure 8.15.
[Figure: left, the corridor with the tracking region, the optimum, and the direction of the dynamics at a 30° angle; right, the degree of dynamics illustrated by corridor overlaps of 0.0, 0.2, 0.6, and 1.0.]
Figure 8.13 The orthogonal arrangement of tracking and optimization of the moving corridor problem is shown at the left as well as the orientation of
the tracking direction. On the right the definition of the severity is
sketched.
In order to avoid such behavior, Design rule 10 proposes to break up the sole dependence of the self-adaptation on the fitness value. Since the suggested consideration of the tracking rate is difficult to realize, a different technique was chosen here to take care of the discrepancy between the greedy behavior concerning the optimization and the needs of successful tracking. The primary idea is to select object values and strategy values with different mechanisms. The object values are selected with the usual selection using the best fitness (comma selection). The strategy values, however, are determined as follows. For all current offspring individuals that have a better fitness than the parent's fitness, the distance
$$\sqrt{\sum_{i=1}^{n} \left( s_i^{child} - s_i^{parent} \right)^2}$$
between their strategy variables $\vec{s}^{\,child}$ and the parent's strategy variables $\vec{s}^{\,parent}$ is computed. The strategy variables with a minimal deviation from the parent's strategy values are selected. If there are no offspring with better fitness, the strategy variables are selected with the best-fitness selection as well. The new individual is composed of the selected values. This selection mechanism is referred to as distinct selection.
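A minimal sketch of the distinct selection follows. The representation of individuals as (object vector, strategy vector, fitness) triples and the assumption that fitness is maximized are illustrative choices, not taken from the text.

import math

def distinct_selection(parent, offspring):
    """Distinct selection of object and strategy values (sketch).

    Individuals are (x, s, fitness) triples with object vector x and
    strategy vector s; fitness is assumed to be maximized here.
    """
    # object values: ordinary comma selection by best fitness
    best = max(offspring, key=lambda ind: ind[2])
    # strategy values: among offspring better than the parent, choose the
    # one whose strategy vector deviates least from the parent's
    better = [ind for ind in offspring if ind[2] > parent[2]]
    if better:
        closest = min(better, key=lambda ind: math.dist(ind[1], parent[1]))
        strategy = closest[1]
    else:
        strategy = best[1]            # fall back to best-fitness selection
    # the new parent combines the selected object and strategy values
    return best[0], strategy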
The distinct selection is applied to the isotropic mutation. The results concerning the moving corridor problem are displayed in Figures 8.14 and 8.15. The distinct selection dampens the explosion of the strategy variables. As a consequence, the number of invalid individuals created can be kept at an almost constant level, which leads to a significantly smaller increase of the completely lost runs.
[Figure: average best fitness of successful runs, fraction of completely lost runs, and fraction of invalid individuals over generations 0–200; curves for isotropic, non-isotropic, and isotropic mutation with distinct selection.]
Figure 8.14 Results for the moving corridor problem with strength of dynamics
0.6: the upper row shows the best fitness values of all successful
experiments on average (with two differently scaled ordinates), the
lower row shows the fraction of completely lost experiments (left)
and the fraction of invalid generated individuals (right).
To test how the distinct selection behaves on a mere tracking task, the moving circle problem is considered, where inside the circle the distance to the center serves as fitness value and outside of the circle a constant “bad” fitness value is assumed.
[Figure: average best strategy variable values over generations 0–200 for all runs; left, standard isotropic mutation; right, isotropic mutation with distinct selection.]
Figure 8.15 Analysis of the adaptation of strategy variables for the moving corridor problem: the left picture shows the values for the standard
isotropic mutation with one strategy parameter and the right picture
shows the values for isotropic mutation with distinct selection.
Figure 8.16 This figure shows the moving circle problem, where the dynamics and the severity of the dynamics are defined similarly to the moving corridor problem.
The dynamics are introduced analogously to the dynamics in the moving corridor problem. They are sketched in Figure 8.16. Figure 8.17 shows the results of the experiments for severities 0.2, 0.6, and 1.0.
[Figure: for dynamics 0.2, 0.6, and 1.0, the average best fitness of successful runs and the percentage of invalid individuals of all runs over generations 0–200; curves for standard isotropic mutation and isotropic mutation with distinct selection.]
Figure 8.17 Impact of the distinct selection on the behavior in a mere tracking
task.
In the case of this mere tracking task, the performance of the algorithm is not affected by the new selection. These experiments indicate that this selection technique is a useful means for dynamic problems. The basic idea of considering the most similar individuals is used in the
next section to define a more global adaptation mechanism.
The results in this section are based on the experiments of an article previously
published at a conference (Weicker, 2001).
8.4 Building a model of the dynamics
As Design rule 10 and the case study in the previous section have shown, sole
dependence on self-adaptive mechanisms can be harmful. In the previous section,
distinct selection of strategy and object variables has been proposed as one means
to reduce the negative effects. The approach in this section goes one step further
since it tries to implement a small-scale version of the vision noted in Section 7.8.3:
the derivation of the underlying rules of dynamics during the optimization process.
The idea of learning something about the problem during optimization is not new.
This is the basis for all self-adaptation techniques as well as the approach to build
statistical models of the fitness distribution of the problem (Baluja, 1994; Pelikan,
Goldberg, & Cantú-Paz, 1999; Mühlenbein & Mahnig, 2001; Sebag, Schoenauer,
& Ravisé, 1997a, 1997b). However, as it was already noted in Section 6.8, in
the field of dynamic environments, the approach of Munetomo et al. (1996) using
stochastic learning automata is unique. However, it is not applicable to drifting problems. As a consequence, this section proposes a new method to demonstrate the feasibility of such a technique.
In Section 4.3 the possible coordinate transformations in the framework have been
identified as rotations and linear translations as well as combinations of both. It was
argued in Section 4.4.1 that especially the latter case makes a simple derivation
of the dynamics from the movement of a few points very hard. This problem is
complicated by the fact that the evolutionary algorithm cannot derive the exact
movement of certain points in the search space.
For the two-dimensional case, any dynamics of a predictable problem can be described by the following two formulas:
$$x \leftarrow a + u + \cos(\alpha)(x - a) - \sin(\alpha)(y - b)$$
$$y \leftarrow b + v + \sin(\alpha)(x - a) + \cos(\alpha)(y - b)$$
They describe a linear translation if a = b = α = 0, where the translation equals the vector (u, v).
[Figure: average best fitness (left) and t-values with significance bounds for Predictive and Gauss (right) of the Gaussian and predictive mutation over generations 0–200.]
Figure 8.18 This figure shows the results of the Gaussian and the predictive mutation using a global model applied to the decentralized, rotating sphere
function with cycle time 50. The best fitness value averaged over 100
experiments is shown on the left. The t-value of the statistical hypothesis test is shown on the right.
With u = v = 0, a rotation around the center (a, b) with rotation angle α is described. If the parameter values a, b, u, v, α can be learnt correctly during the optimization process, the two formulas can be used to eliminate the dynamics of the problem, with the consequence that, in spite of a possibly big coordinate severity, arbitrarily accurate tracking is possible.
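The following sketch is a direct transcription of the two formulas above; applying it with the learnt parameters advances a point one time step under the assumed dynamics. A pure translation has a = b = alpha = 0, a pure rotation u = v = 0.

import math

def apply_dynamics_model(point, model):
    """Advance a 2D point one time step under the learnt model (sketch).

    model = (a, b, u, v, alpha): rotation center (a, b), translation
    (u, v), and rotation angle alpha, as in the two formulas above.
    """
    x, y = point
    a, b, u, v, alpha = model
    x_new = a + u + math.cos(alpha) * (x - a) - math.sin(alpha) * (y - b)
    y_new = b + v + math.sin(alpha) * (x - a) + math.cos(alpha) * (y - b)
    return (x_new, y_new)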
The basic algorithm is a (1, 15)-evolution strategy. Each offspring individual is
created by mutation only—no recombination operator is used.
The mechanism to adapt the dynamics model works as follows. In each generation
the fitness of the created offspring is compared to the fitness of the parent individual. As in the distinct selection, a comparison concerning the most similar individuals is used. However, here not the strategy variables but the most similar fitness values are used to approximate the movement of the position associated with
the parent in the search space. The position of the parent and the vector from the
parent to the most similar offspring are stored together in a FIFO list. In the current
implementation the list holds at most 50 positions and vectors. After generation
30, it is assumed that enough data is collected and the learning process starts.
Currently a rather expensive learning procedure has been chosen. At the end of each generation the center of a rotation is approximated in a first step, which is described in the next paragraph. If the approximated center lies within the search space boundaries, it replaces the values (a, b) of the current model; otherwise a = b = 0 is used. Then, in a second step, a self-adaptive (1 + 1)-evolution strategy is executed for 2000 steps to optimize all model parameters a, b, u, v, α.
[Figure: average best fitness (left) and t-values (right) of the Gaussian and predictive mutation over generations 0–200.]
Figure 8.19 This figure shows the results of the Gaussian and the predictive mutation using a global model applied to the decentralized, rotating sphere
function with cycle time 100. The best fitness value averaged over
100 experiments is shown on the left. The t-value of the statistical
hypothesis test is shown on the right.
Figure 8.20 This figure shows the results of the Gaussian and the predictive mutation using a global model applied to the linear translation of a sphere
function. The best fitness value averaged over 100 experiments is
shown on the left. The t-value of the statistical hypothesis test is
shown on the right.
The estimation of the center uses three points and vectors from the FIFO queue, namely the last entry, the oldest entry, and the entry in the middle. For each vector a normal vector is determined. Then the normal vectors at the respective points are intersected pairwise. This results in three different intersection points. The estimated center is computed as the arithmetic mean of these three points.
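A sketch of this center estimation follows. The handling of (nearly) parallel normal vectors, which is not detailed in the text, is an assumption: the corresponding intersection is simply skipped.

def estimate_center(entries):
    """Estimate a rotation center from three FIFO entries (sketch).

    entries: three (point, movement_vector) pairs, e.g. the newest,
    middle, and oldest entry of the FIFO list.
    """
    def normal_line(point, vec):
        # line through `point` perpendicular to the observed movement
        return point, (-vec[1], vec[0])

    def intersect(line1, line2):
        (px, py), (dx, dy) = line1
        (qx, qy), (ex, ey) = line2
        det = dx * ey - dy * ex
        if abs(det) < 1e-12:          # (nearly) parallel normals: skip
            return None
        t = ((qx - px) * ey - (qy - py) * ex) / det
        return (px + t * dx, py + t * dy)

    lines = [normal_line(p, v) for p, v in entries]
    points = []
    for i, j in ((0, 1), (1, 2), (2, 0)):
        pt = intersect(lines[i], lines[j])
        if pt is not None:
            points.append(pt)
    if not points:
        return None                   # no usable estimate this generation
    # arithmetic mean of the pairwise intersection points
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))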
After generation 30 the mutation operator uses the model and translates the parental
individual accordingly. The usual self-adaptive mutation is applied to the translated
point in the search space.
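A minimal sketch of this predictive mutation follows, combining the model-based translation with a standard self-adaptive isotropic Gaussian mutation; the learning rate is a common choice and not necessarily the one used here.

import math
import random

def predictive_mutation(x, sigma, model, generation, learn_start=30):
    """Predictive mutation using the global dynamics model (sketch)."""
    a, b, u, v, alpha = model
    if generation > learn_start:
        # translate the parent according to the learnt dynamics model
        px, py = x
        x = [a + u + math.cos(alpha) * (px - a) - math.sin(alpha) * (py - b),
             b + v + math.sin(alpha) * (px - a) + math.cos(alpha) * (py - b)]
    # usual self-adaptive isotropic mutation of the translated point
    tau = 1.0 / math.sqrt(len(x))             # common choice, an assumption
    sigma_new = sigma * math.exp(tau * random.gauss(0.0, 1.0))
    x_new = [xi + sigma_new * random.gauss(0.0, 1.0) for xi in x]
    return x_new, sigma_new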
This algorithm is applied to the rotating decentralized sphere introduced in Section 8.2 and compared to the standard evolution strategy with similar parameter
settings. Again for each pair of problem and algorithm 100 independent experiments have been executed for 200 generations. Student’s t-test is applied to the
best fitness values of each generation. A difference between the mean best fitness
values of the two algorithms is considered to be significant if the error rate is less
than 0.05, i.e. t-values above 1.96 or below -1.96.
The results are shown in Figure 8.18 for a cycle time of 50 and in Figure 8.19 for
a cycle time of 100. As can be seen clearly, there is a strong tendency for the algorithm with the dynamics model to produce better results. For the majority of generations the difference is even statistically significant in favor of this algorithm. At no time
there is statistical significance for the Gaussian mutation. As a consequence, we
can conclude that the algorithm is able to derive a useful model of the rotating
dynamics and to improve its performance by using this model.
In order to investigate whether linear translations can be derived similarly well by the algorithm, the following problem of a linearly moving sphere is investigated. The problem is defined as
$$f^{(t)}(\vec{x}) = \sum_{i=1}^{n} \left( x_i - z_i^{(t)} \right)^2$$
where $x_i \in [-30, 30]$ ($1 \le i \le n$), $n = 2$, and $\vec{z}^{(t)}$ follows the schedule
$$z_1^{(0)} = -30, \qquad z_2^{(0)} = -15,$$
$$z_1^{(t+1)} = z_1^{(t)} + 0.3, \qquad z_2^{(t+1)} = z_2^{(t)} + 0.15$$
for generations 1 ≤ t ≤ 200. The experimental setup and the assessment are identical to those for the rotating sphere.
The results are shown in Figure 8.20. Apparently the linear translation can be exploited even better by the model of the dynamics.
No attempt has been made to improve or tune the algorithm that uses the global dynamics model. As a consequence, the computation time of this algorithm is significantly larger than the time needed by the standard evolution strategy. In the current version the global model is only sensible for problems where each fitness evaluation is extremely expensive. Nevertheless, the comparison is still valid since this case study only aims at demonstrating the feasibility of deriving a global model of the dynamics, independently of the self-adaptive control of the step size. Future work has to investigate how the computational cost of the algorithm can be reduced and whether a simplified model of the dynamics is possible that scales with increasing dimensionality.
CHAPTER 9
Conclusions and Future Work
This chapter summarizes the results of this thesis, re-evaluates their utility, and
gives an outlook at future improvements and topics which I believe to be the focus
of future research in the field.
Chapter 1 gives a short motivation for the topic of this thesis and Chapter 2 presents
a short summary of the knowledge presupposed in the thesis. It also provides a survey of the major advances in evolutionary dynamic optimization. This survey is the
basis for the discussion of open problems in Chapter 3 and an overview of the contributions of this thesis and how they integrate into the existing research. The last
section of this chapter is also devoted to a short discussion on the methodological
approach of the thesis.
Chapter 4 proposes a mathematical framework to classify and compare dynamic
problems. This is the first detailed classification for non-stationary environments
and should serve as a proper basis for the integration of results of many different
researchers. The main drawback of the framework is the exclusion of problems
defined on a binary search space. Also, the usage of the framework in the thesis, especially in Chapter 6, has shown that there is still a high variance in how certain problems can be fitted into the framework. Future work should focus on this aspect, probably redefine certain aspects of the framework, and develop strict guidelines for how the framework should be applied.
In Chapter 5 properties of “good” or “successful” evolutionary processing in dynamic environments are discussed. This leads to the definition of stability and
reactivity as alternative or additional goals to the mere accuracy of an approximation. The focus of the chapter is on the empirical investigation of how the different
goals may be measured in various problem classes. This is the first study devoted solely to performance measures in dynamic optimization. Probably most surprising
is the difference between high- and low-quality algorithms for some measures and problem classes, since performance measures should be used to determine the quality of an algorithm; if the measure depends on the quality, this is not possible in an objective way. For measuring the recovery, a new window-based measure appears to be promising for high-quality approximations. Still, this first study presented here
is only based on four different problem classes. A large-scale investigation should
be carried out to get more insight into the utility of performance measures. Also
the methodology of how to assess the quality of performance measures should be
re-evaluated. It appears that averaging favors certain measures that are doing fairly
well on most problem instances. Future work should consider in more depth to what extent a measure can guarantee exactness.
Chapter 6 deals with the various different techniques in dynamic environments. It
analyzes a major part of the existing experimental research and classifies the used
techniques. Furthermore, the tackled problems are classified using the framework
of Chapter 4. As a consequence, both classifications lead to a mapping between
techniques and problems. This demonstrates well the potential utility of the proposed framework. However, the current mapping reflects only the attempts of certain techniques on certain problems. Most research articles are not concerned with
a proper comparison of techniques and there are many combinations of problems
and techniques that nobody has investigated yet. In addition, many compromises had to be accepted to categorize the tackled problems within the proposed framework. Future work should concentrate on a systematic investigation with the primary focus on the comparison of the different techniques.
The first part of the thesis (including Chapter 6) endeavors to lay a broad foundation for the whole field of dynamic optimization. Due to time restrictions it was not possible to apply the framework intensively to fill in the missing results and build up the comprehensive understanding of dynamic evolutionary optimization the framework aims at. The systematic construction of integrated results on this foundation is only sketched. Numerous extensive investigations will be necessary to fill the scientific gaps exposed by this foundation. Eventually, these future results will confirm or jeopardize the presented framework.
Chapter 7 concentrates on one special class of problems, namely unimodal drifting problems without fitness rescaling. It models instances of this problem class
on a discrete, two-dimensional domain. With two Markov chain models of the EA dynamics on this problem, several results are derived. Both the impact of the parameters and the relevance of the principles underlying the ES mutation are examined. From these investigations ten design rules are derived for tackling dynamic problems with local variation. This is the most profound and rigorous part of the thesis since all possible aspects of the subfield are considered. One could criticize the very abstract model, which is not directly related to any existing evolutionary algorithm. However, the principles of the evolution strategy mutation are reflected very well. Therefore, the results should be easily transferable to evolution strategies.
well. Therefore, the results should be transferable easily to evolution strategies.
This statement is supported by the case studies presented in Chapter 8.
The thesis closes with four different case studies in Chapter 8 where some of the
design rules of Chapter 7 are applied. The validity of the design rules is confirmed
by the empirical investigations. On the basis of Chapter 7 two new techniques have
been developed for tracking a drifting problem: distinct selection and the usage of a global model of the dynamics.
References
Angeline, P. J. (1997). Tracking extrema in dynamic environments. In P. J. Angeline, R. G. Reynolds, J. R. McDonnell, & R. Eberhart (Eds.), Evolutionary
Programming VI (pp. 335–345). Berlin: Springer.
Angeline, P. J. (1998). Evolving predictors for chaotic time series. In S. Rogers,
D. Fogel, J. Bezdek, & B. Bosacchi (Eds.), Proc. of SPIE (Volume 3390): Application and Science of Computational Intelligence (pp. 170–180). Bellingham, WA.
Angeline, P. J., & Fogel, D. B. (1997). An evolutionary program for the identification of dynamical systems. In S. Rogers (Ed.), Proc. of SPIE (Volume
3077): Application and Science of Artificial Neural Networks III (pp. 409–
417). Bellingham, WA.
Angeline, P. J., Fogel, D. B., & Fogel, L. J. (1996). A comparison of self-adaptation
methods for finite state machines in dynamic environments. In L. J. Fogel,
P. J. Angeline, & T. Bäck (Eds.), Evolutionary Programming V: Proc. of
the Fifth Annual Conf. on Evolutionary Programming (pp. 441–449). Cambridge, MA: MIT Press.
Arnold, D. V., & Beyer, H.-G. (2002). Random dynamics optimum tracking with
evolution strategies. In J. J. Merelo Guervós, P. Adamidis, H.-G. Beyer, J.L. Fernández-Villacañas, & H.-P. Schwefel (Eds.), Parallel problem solving
from nature – PPSN VII (pp. 3–12). Berlin: Springer.
Bäck, T. (1997). Self-adaptation. In T. Bäck, D. B. Fogel, & Z. Michalewicz (Eds.),
Handbook of Evolutionary Computation (pp. C7.1:1–15). Bristol, New York:
Institute of Physics Publishing and Oxford University Press.
Bäck, T. (1998). On the behavior of evolutionary algorithms in dynamic environments. In IEEE Int. Conf. on Evolutionary Computation (pp. 446–451).
Piscataway, NJ: IEEE Press.
Bäck, T. (1999). Self-adaptive genetic algorithms for dynamic environments with
slow dynamics. pp. 142–145, GECCO Workshops, A. Wu (ed.).
Baluja, S. (1994). Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning
(Tech. Rep. No. CMU-CS-94-163). Pittsburgh, PA: Carnegie Mellon University.
Beasley, J. E., Krishnamoorthy, M., Sharaiha, Y. M., & Abramson, D. (1995). Dynamically scheduling aircraft landings - the displacement problem. (Imperial
College, London, England)
Bersini, H. (1998). Fuzzy-evolutionary systems. In E. H. Ruspini, P. P. Bonissone,
& W. Pedrycz (Eds.), Handbook of Fuzzy Computation (pp. D3.1:1–D3.6:2).
Bristol: Institute of Physics Publishing.
Beyer, H.-G. (2001). The theory of evolution strategies. Berlin: Springer.
Biegel, J. E., & Davern, J. J. (1990). Genetic algorithms and job shop scheduling.
Computers and Industrial Engineering, 19(1–4), 81–91. (Proc. of the 12th
Annual Conf. on Computers and Industrial Engineering)
Bierwirth, C., Kopfer, H., Mattfeld, D. C., & Rixen, I. (1995). Genetic algorithm
based scheduling in a dynamic manufacturing environment. In Proc. of 1995
IEEE Conf. on Evolutionary Computation. Piscataway, NJ: IEEE Press.
Bierwirth, C., & Mattfeld, D. C. (1999). Production scheduling and rescheduling
with genetic algorithms. Evolutionary Computation, 7(1), 1–17.
Bonissone, P. P. (1997). Soft computing: the convergence of emerging reasoning
technologies. Soft Computing, 1(1), 6–18.
Box, G. E. P. (1957). Evolutionary operation: A method for increasing industrial
productivity. Applied Statistics, 6(2), 81–101.
Box, G. E. P., & Muller, M. E. (1958). A note on the generation of random normal deviates. Annals of Mathematical Statistics, 29, 610–611.
Branke, J. (1999a). Evolutionary algorithms for dynamic optimization problems:
A survey (Tech. Rep. No. 387). Karlsruhe, Germany: Institute AIFB, University of Karlsruhe.
Branke, J. (1999b). Evolutionary approaches to dynamic optimization problems:
A survey. pp. 134–137, GECCO Workshops, A. Wu (ed.).
Branke, J. (1999c). Memory enhanced evolutionary algorithms for changing optimization problems. In 1999 Congress on Evolutionary Computation (pp.
1875–1882). Piscataway, NJ: IEEE Service Center.
Branke, J. (2002). Evolutionary optimization in dynamic environments. Boston:
Kluwer.
Branke, J., & Mattfeld, D. (2000). Anticipation in dynamic optimization: The
scheduling case. In M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton,
J. J. Merelo, & H.-P. Schwefel (Eds.), Parallel problem solving from nature
– PPSN VI (pp. 253–262). Berlin: Springer.
Bremermann, H. J. (1962). Optimization through evolution and recombination. In
M. C. Yovitis & G. T. Jacobi (Eds.), Self-organizing systems (pp. 93–106).
Washington, D.C.: Spartan.
Bronstein, I. N., & Semendjajew, K. A. (1991). Taschenbuch der Mathematik.
Stuttgart: Teubner.
Cartwright, H. M., & Tuson, A. L. (1994). Genetic algorithms and flowshop
scheduling: towards the development of a real-time process control system.
In T. C. Fogarty (Ed.), Proc. of the AISB Workshop on Evolutionary Computing (pp. 277–290). Berlin: Springer.
Caruana, R. A., & Schaffer, J. D. (1988). Representation and hidden bias: Gray
versus binary coding in genetic algorithms. In J. Leard (Ed.), Proc. of the
5th Int. Conf. on Machine Learning (pp. 153–161). San Mateo, CA: Morgan
Kaufmann.
Cedeño, W., & Vemuri, V. R. (1997). On the use of niching for dynamic landscapes.
In Int. Conf. on Evolutionary Computation (pp. 361–366). Piscataway, NJ:
IEEE Press.
Cobb, H. G. (1990). An investigation into the use of hypermutation as an adaptive
operator in genetic algorithms having continuous, time-dependent nonstationary environments (Tech. Rep. No. 6760 (NLR Memorandum)). Washington, D.C.: Navy Center for Applied Research in Artificial Intelligence.
Cobb, H. G., & Grefenstette, J. J. (1993). Genetic algorithms for tracking changing
environments. In S. Forrest (Ed.), Proc. of the Fifth Int. Conf. on Genetic
Algorithms (pp. 523–530). San Mateo, CA: Morgan Kaufmann.
Coker, P., & Winter, C. (1997). N-sex reproduction in dynamic environments. In
P. Husbands & I. Harvey (Eds.), Fourth european conference on artificial life
(p. ?). Cambridge, MA: MIT Press.
Collard, P., Escazut, C., & Gaspar, A. (1996). An evolutionary approach for time
dependant optimization. In Int. Conf. on Tools for Artificial Intelligence 96
(pp. 2–9). IEEE Computer Society Press.
Collard, P., Escazut, C., & Gaspar, A. (1997). An evolutionary approach for
time dependant optimization. International Journal on Artificial Intelligence
Tools, 6(4), 665–695.
Dasgupta, D. (1995). Incorporating redundancy and gene activation mechanisms in
genetic search for adapting to non-stationary environments. In L. Chambers
(Ed.), Practical handbook of genetic algorithms, vol.2 – new frontiers (pp.
303–316). Boca Raton: CRC Press.
Dasgupta, D., & McGregor, D. R. (1992). Nonstationary function optimization
using the structured genetic algorithm. In R. Männer & B. Manderick (Eds.),
Parallel Problem Solving from Nature 2 (Proc. 2nd Int. Conf. on Parallel
Problem Solving from Nature, Brussels 1992) (pp. 145–154). Amsterdam:
Elsevier.
De Jong, K. A. (1975). An analysis of the behavior of a class of genetic adaptive
systems. Unpublished doctoral dissertation, University of Michigan, Ann
Arbor, MI.
De Jong, K. A. (1993). Genetic algorithms are not function optimizers. In L. D.
Whitley (Ed.), Foundations of genetic algorithms 2 (pp. 5–17). San Mateo,
CA: Morgan Kaufmann.
De Jong, K. A. (2000). Evolving in a changing world. In Z. Ras & A. Skowron
(Eds.), Foundation of intelligent systems (pp. 513–519). Berlin: Springer.
Dozier, G. (2000). Steady-state evolutionary path planning, adaptive replacement, and hyper-diversity. In M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton, J. J. Merelo, & H.-P. Schwefel (Eds.), Parallel problem solving from nature – PPSN VI (pp. 561–570). Berlin: Springer.
nature – PPSN VI (pp. 561–570). Berlin: Springer.
Droste, S. (2002). Analysis of the (1 + 1) EA for a dynamically changing OneMax-variant. In Congress on Evolutionary Computation (CEC 2002) (pp. 55–60). Piscataway, NJ: IEEE Press.
Eigen, M. (1971). Selforganization of matter and the evolution of biological macromolecules. Die Naturwissenschaften, 58(10), 465–523.
Escazut, C., & Collard, P. (1997). Genetic algorithms at the edge of a dream. In J.K. Hao, E. Lutton, E. Ronald, M. Schoenauer, & D. Snyers (Eds.), Artificial
Evolution – Third European Conference (pp. 69–80). Berlin: Springer.
Fadali, M. S., Zhang, Y., & Louis, S. J. (1999). Robust stability analysis of discretetime systems using genetic algorithms. IEEE Trans. on Systems, Man, and
Cybernetics—Part A, 29(5), 503–508.
Fang, H.-L., Ross, P., & Corne, D. (1993). A promising genetic algorithm approach
to job-shop scheduling, re-scheduling, and open-shop scheduling problems.
In S. Forrest (Ed.), Proc. of the Fifth Int. Conf. on Genetic Algorithms (pp.
375–382). San Mateo, CA: Morgan Kaufmann.
Feng, W., Brune, T., Chan, L., Chowdhury, M., Kuek, C. K., & Li, Y. (1997).
Benchmarks for testing evolutionary algorithms (Tech. Rep. No. CSC97006). Glasgow, UK: Center for System and Control, University of Glasgow.
Fogarty, T. C., Vavak, F., & Cheng, P. (1995). Use of the genetic algorithm for load
balancing of sugar beet presses. In L. J. Eshelman (Ed.), Proc. of the Sixth
Int. Conf. on Genetic Algorithms (pp. 617–624). San Francisco, CA: Morgan
Kaufmann.
Fogel, D. B. (1992a). An analysis of evolutionary programming. In D. B. Fogel & W. Atmar (Eds.), Proc. of the first annual conference on evolutionary
programming (pp. 43–51). La Jolla, CA.
Fogel, D. B. (1992b). Evolving artificial intelligence. Unpublished doctoral dissertation, University of California, San Diego, CA.
Fogel, D. B. (1995). Evolutionary computation: Toward a new philosophy of
machine intelligence. New York, NY: IEEE Press.
Fogel, L. J., Owens, A. J., & Walsh, M. J. (1965). Artificial intelligence through a
simulation of evolution. In M. Maxfield, A. Callahan, & L. J. Fogel (Eds.),
Biophysics and Cybernetic Systems: Proc. of the 2nd Cybernetic Sciences
Symposium (pp. 131–155). Washington, D.C.: Spartan Books.
Fogel, L. J., Owens, A. J., & Walsh, M. J. (1966). Artificial intelligence through
simulated evolution. New York, NY: Wiley & Sons.
Fraser, A. S. (1957). Simulation of genetic systems by automatic digital computers I. Introduction. Australian Journal of Biological Sciences, 10, 484–491.
Friedberg, R. M. (1958). A learning machine: Part I. IBM Journal of Research
and Development, 2(1), 2–13.
Friedberg, R. M., Dunham, B., & North, J. H. (1959). A learning machine: Part II.
IBM Journal of Research and Development, 3(3), 282–287.
Friedman, G. J. (1956). Selective feedback computers for engineering synthesis
and nervous system analogy. Unpublished master’s thesis, University of California, Los Angeles, CA.
Futuyma, D. J., & Slatkin, M. (Eds.). (1983). Coevolution. Sunderland, MA:
Sinauer Associates.
Gaspar, A., & Collard, P. (1997). Time dependent optimization with a folding
genetic algorithm. In Int. Conf. on Tools for Artificial Intelligence 97 (pp.
207–214). IEEE Computer Society Press.
Gaspar, A., & Collard, P. (1999a). From GAs to artificial immune systems: Improving adaptation in time dependent optimization. In 1999 Congress on
Evolutionary Computation (pp. 1859–1866). Piscataway, NJ: IEEE Service
Center.
Gaspar, A., & Collard, P. (1999b). There is ALife beyond convergence: using a
dual sharing to adapt in time dependent optimization. In 1999 Congress on
Evolutionary Computation (pp. 1867–1874). Piscataway, NJ: IEEE Service
Center.
Ghozeil, A., & Fogel, D. B. (1996). A preliminary investigation into directed
mutations in evolutionary algorithms. In H. Voigt, W. Ebeling, & I. Rechenberg (Eds.), Parallel Problem Solving from Nature – PPSN IV (Berlin, 1996)
(Lecture Notes in Computer Science 1141) (pp. 329–335). Berlin: Springer.
Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine
learning. Reading, MA: Addison-Wesley.
Goldberg, D. E., & Smith, R. E. (1987). Nonstationary function optimization using
genetic algorithms with dominance and diploidy. In J. J. Grefenstette (Ed.),
Proc. of the Second Int. Conf. on Genetic Algorithms (pp. 59–68). Hillsdale,
NJ: Lawrence Erlbaum Associates.
Granlund, T. (1996). GNU MP: The GNU multiple precision arithmetic library.
Boston, MA.
Grefenstette, J. (1986). Optimization of control parameters for genetic algorithms.
IEEE Trans. on Systems, Man, and Cybernetics, SMC-16(1), 122–128.
Grefenstette, J. J. (1992). Genetic algorithms for changing environments. In
R. Männer & B. Manderick (Eds.), Parallel Problem Solving from Nature
2 (Proc. 2nd Int. Conf. on Parallel Problem Solving from Nature, Brussels
1992) (pp. 137–144). Amsterdam: Elsevier.
Grefenstette, J. J. (1999). Evolvability in dynamic fitness landscapes: A genetic
algorithm approach. In 1999 Congress on Evolutionary Computation (pp.
2031–2038). Piscataway, NJ: IEEE Press.
Grefenstette, J. J., & Ramsey, C. L. (1992). An approach to anytime learning. In
Proc. of the ninth int. machine learning workshop (pp. 189–195). San Mateo,
CA: Morgan Kaufmann.
Hadad, B. S., & Eick, C. F. (1997). Supporting polyploidy in genetic algorithms
using dominance vectors. In P. J. Angeline, R. G. Reynolds, J. R. McDonnell,
& R. Eberhart (Eds.), Evolutionary Programming VI (pp. 223–234). Berlin:
Springer. (Lecture Notes in Computer Science 1213)
Hansen, N., & Ostermeier, A. (1996). Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation. In Proc.
of the 1996 IEEE Int. Conf. on Evolutionary Computation (pp. 312–317).
Piscataway, NJ: IEEE Service Center.
Hansen, N., & Ostermeier, A. (2001). Completely derandomized self-adaptation
in evolution strategies. Evolutionary Computation, 9(2), 159–195.
Hart, E., & Ross, P. (1998). A heuristic combination method for solving jobshop scheduling problems. In A. E. Eiben, T. Bäck, M. Schoenauer, & H.-P.
Schwefel (Eds.), Parallel Problem Solving from Nature – PPSN V (pp. 845–
854). Berlin: Springer. (Lecture Notes in Computer Science 1498)
Hartley, A. R. (1999). Accuracy-based fitness allows similar performance to
humans in static and dynamic classification environments. In W. Banzhaf,
J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, & R. E. Smith
(Eds.), Proc. of the Genetic and Evolutionary Computation Conf. GECCO99 (pp. 266–273). San Francisco, CA: Morgan Kaufmann.
Hildebrand, L., Reusch, B., & Fathi, M. (1999). Directed mutation – A new
selfadaptation for evolutionary strategies. In 1999 Congress on Evolutionary
Computation (pp. 1550–1557). Piscataway, NJ: IEEE Service Center.
Hirst, T. (1997). Evolutionary signal processing: A preliminary study. In P. Husbands & I. Harvey (Eds.), Fourth european conference on artificial life (p. ?).
Cambridge, MA: MIT Press.
Holland, J. H. (1969). A new kind of turnpike theorem. Bulletin of the American
Mathematical Society, 75(6), 1311–1317.
Holland, J. H. (1973). Genetic algorithms and the optimal allocation of trials.
SIAM Journal on Computing, 2(2), 88–105.
Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor, MI: University of Michigan Press.
Holland, J. H. (1992). Adaptation in natural and artificial systems. Cambridge, MA: MIT Press.
Jones, T. (1995). Evolutionary algorithms, fitness landscapes and search. Unpublished doctoral dissertation, The University of New Mexico, Albuquerque,
NM.
Karr, C. L. (1991). Design of an adaptive fuzzy logic controller using a genetic algorithm. In R. K. Belew & L. B. Booker (Eds.), Proc. of the Fourth Int. Conf.
on Genetic Algorithms (pp. 450–457). San Mateo, CA: Morgan Kaufmann.
Karr, C. L. (1997). Fuzzy-evolutionary systems. In T. Bäck, D. B. Fogel, &
Z. Michalewicz (Eds.), Handbook of Evolutionary Computation (pp. D2.1:1–
2:9). Bristol, New York: Institute of Physics Publishing and Oxford University Press.
Karr, C. L. (1999). An architecture for adaptive process control systems. pp. 146–
148 in Evolutionary Algorithms for Dynamic Optimization Problems (eds.
Jürgen Branke, Thomas Bäck), part of GECCO Workshops, A. Wu (ed.).
Kirley, M., & Green, D. G. (2000). An empirical investigation of optimisation in
dynamic environments using the cellular genetic algorithm. In D. Whitley,
D. Goldberg, E. Cantu-Paz, L. Spector, I. Parmee, & H.-G. Beyer (Eds.),
Proc. of the genetic and evolutionary computation conf. (gecco-2000) (pp.
11–18). San Francisco, CA: Morgan Kaufmann.
Knuth, D. E. (1981). Seminumerical algorithms. Reading, MA: Addison-Wesley.
(2nd ed., vol. 2 of The Art of Computer Programming)
Koza, J. R. (1989). Hierarchical genetic algorithms operating on populations of computer programs. In N. S. Sridharan (Ed.), Proc. of the 11th Int. Joint Conf. on Artificial Intelligence (pp. 768–774). San Francisco, CA: Morgan Kaufmann.
Koza, J. R. (1992a). The genetic programming paradigm: Genetically breeding
populations of computer programs to solve problems. In B. Souček (Ed.),
Dynamic, genetic, and chaotic programming (pp. 203–321). New York: John
Wiley.
Koza, J. R. (1992b). Genetic programming: On the programming of computers by
means of natural selection. Cambridge, MA: MIT Press.
Lewis, J., Hart, E., & Ritchie, G. (1998). A comparison of dominance mechanisms
and simple mutation on non-stationary problems. In A. E. Eiben, T. Bäck,
M. Schoenauer, & H.-P. Schwefel (Eds.), Parallel Problem Solving from Nature – PPSN V (pp. 139–148). Berlin: Springer.
Liles, W., & De Jong, K. A. (1999). The usefulness of tag bits in changing environments. In 1999 Congress on Evolutionary Computation (pp. 2054–2060).
Piscataway, NJ: IEEE Press.
Lin, S.-C., Goodman, E. D., & Punch III, W. F. (1997). A genetic algorithm
approach to dynamic job shop scheduling problems. In Proc. of the Seventh
Int. Conf. on Genetic Algorithms (pp. 481–488). Morgan Kaufmann.
Littman, M. L., & Ackley, D. H. (1991). Adaptation in constant utility nonstationary environments. In R. K. Belew & L. B. Booker (Eds.), Proc. of
the Fourth Int. Conf. on Genetic Algorithms (pp. 136–142). San Mateo, CA:
Morgan Kaufmann.
Mattfeld, D. C., & Bierwirth, C. (1999). Adaptation and dynamic optimization
problems: A view from general system theory. pp. 138–141, GECCO Workshops, A. Wu (ed.).
Montana, D. J., & Davis, L. (1989). Training feedforward neural networks using genetic algorithms. In N. S. Sridharan (Ed.), Proc. of the 11th Int.
Joint Conf. on Artificial Intelligence (IJCAI-89) (pp. 762–767). San Mateo,
CA: Morgan Kaufmann.
Mori, N., Imanishi, S., Kita, H., & Nishikawa, Y. (1997). Adaptation to changing
environments by means of the memory based thermodynamical genetic algorithm. In T. Bäck (Ed.), Proc. of the Seventh Int. Conf. on Genetic Algorithms
(pp. 299–306). San Francisco, CA: Morgan Kaufmann.
Mori, N., Kita, H., & Nishikawa, Y. (1996). Adaptation to a changing environment
by means of the thermodynamical genetic algorithm. In H.-M. Voigt, W. Ebeling,
I. Rechenberg, & H.-P. Schwefel (Eds.), Parallel Problem Solving from Nature – PPSN IV
(pp. 513–522). Berlin: Springer.
Mori, N., Kita, H., & Nishikawa, Y. (1998). Adaptation to a changing environment by means of the feedback thermodynamical genetic algorithm. In A. E.
Eiben, T. Bäck, M. Schoenauer, & H.-P. Schwefel (Eds.), Parallel Problem
Solving from Nature – PPSN V (pp. 149–158). Berlin: Springer. (Lecture
Notes in Computer Science 1498)
Morrison, R. W., & De Jong, K. A. (1999). A test problem generator for non-stationary environments. In 1999 Congress on Evolutionary Computation
(pp. 2047–2053). Piscataway, NJ: IEEE Service Center.
Morrison, R. W., & De Jong, K. A. (2000). Triggered hypermutation revisited. In
Proc. of the 2000 Congress on Evolutionary Computation (pp. 1025–1032).
Piscataway, NJ: IEEE Service Center.
Mühlenbein, H., & Mahnig, T. (2001). Evolutionary algorithms: From recombination to search distributions. In L. Kallel, B. Naudts, & A. Rogers (Eds.), Theoretical aspects of evolutionary computing (pp. 135–173). Berlin: Springer.
Munetomo, M., Takai, Y., & Sato, Y. (1996). Genetic-based dynamic load balancing: Implementation and evaluation. In H.-M. Voigt, W. Ebeling, I. Rechenberg, & H.-P. Schwefel (Eds.), Parallel problem solving from nature – PPSN
IV (pp. 920–929). Berlin: Springer.
Narendra, K. S., & Thathachar, M. A. L. (1989). Learning automata: An introduction. Englewood Cliffs, NJ: Prentice Hall.
Neubauer, A. (1996). A comparative study of evolutionary algorithms for online parameter tracking. In H.-M. Voigt, W. Ebeling, I. Rechenberg, & H.-P.
Schwefel (Eds.), Parallel problem solving from nature – PPSN IV (pp. 624–
633). Berlin: Springer.
Neubauer, A. (1997). Prediction of nonlinear and nonstationary time-series using
self-adaptive evolution strategies with individual memory. In T. Bäck (Ed.),
Proc. of the Seventh Int. Conf. on Genetic Algorithms (pp. 727–734). San
Francisco, CA: Morgan Kaufmann.
Ng, K., & Wong, K. C. (1995). A new diploid scheme and dominance change
mechanism for non-stationary function optimization. In L. Eshelman (Ed.),
Proc. of the Sixth Int. Conf. on Genetic Algorithms (pp. 159–166). San Francisco, CA: Morgan Kaufmann.
Odetayo, M. O., & McGregor, D. R. (1989). Genetic algorithm for inducing
control rules for a dynamic system. In J. D. Schaffer (Ed.), Proc. of the Third
Int. Conf. on Genetic Algorithms (pp. 177–182). San Mateo, CA: Morgan
Kaufmann.
Oussedik, S., Delahaye, D., & Schoenauer, M. (1999). Dynamic air traffic planning
by genetic algorithms. In 1999 Congress on Evolutionary Computation (pp.
1110–1117). Piscataway, NJ: IEEE Service Center.
Papadimitriou, C. H. (1977). The Euclidean travelling salesman problem is NP-complete. Theoretical Computer Science, 4, 237–244.
Papadimitriou, G. I., & Pomportsis, A. S. (2000). On the use of stochastic estimator
learning automata for dynamic channel allocation in broadcast networks. In
Proc. of the 2000 Congress on Evolutionary Computation (pp. 112–116).
Piscataway, NJ: IEEE Service Center.
Park, S. K., & Miller, K. W. (1988). Random number generators: Good ones are
hard to find. Communications of the ACM, 31(10), 1192–1201.
Pelikan, M., Goldberg, D. E., & Cantú-Paz, E. (1999). BOA: The Bayesian optimization algorithm. In W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon,
V. Honavar, M. Jakiela, & R. E. Smith (Eds.), Proc. of the Genetic and Evolutionary Computation Conf. GECCO-99 (pp. 525–532). San Francisco, CA:
Morgan Kaufmann.
Pettit, E., & Swigger, K. M. (1983). An analysis of genetic-based pattern tracking
and cognitive-based component tracking models of adaptation. In Proc. of
the National Conf. on Artificial Intelligence (AAAI-83) (pp. 327–332). ?
Pipe, A. G., Fogarty, T. C., & Winfield, A. (1994). Hybrid adaptive heuristic
critic architectures for learning in mazes with continuous search spaces. In
Y. Davidor, H.-P. Schwefel, & R. Männer (Eds.), Parallel Problem Solving
from Nature – PPSN III (pp. 482–491). Berlin: Springer. (Lecture Notes in
Computer Science 866)
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: the art of scientific computing. Cambridge: Press Syndicate
of the University of Cambridge. (2nd ed.)
Ramsey, C. L., & Grefenstette, J. J. (1993). Case-based initialization of genetic
algorithms. In S. Forrest (Ed.), Proc. of the Fifth Int. Conf. on Genetic Algorithms (pp. 84–91). San Mateo, CA: Morgan Kaufmann.
Rana-Stevens, S., Lubin, B., & Montana, D. (2000). The air crew scheduling system: The design of a real-world, dynamic genetic scheduler. In D. Whitley
(Ed.), Late breaking papers at the 2000 genetic and evolutionary computation conference (pp. 317–324). ?
Rechenberg, I. (1964). Kybernetische Lösungsansteuerung einer experimentellen
Forschungsaufgabe [Cybernetic solution path of an experimental problem].
Presented at the Annual Conference of the WGLR at Berlin in September
1964.
Rechenberg, I. (1973). Evolutionsstrategie: Optimierung technischer Systeme nach
Prinzipien der biologischen Evolution. Stuttgart: Frommann-Holzboog.
Rechenberg, I. (1994). Evolutionsstrategie ’94. Stuttgart: Frommann-Holzboog.
Reed, J., Toombs, R., & Barricelli, N. A. (1967). Simulation of biological evolution
and machine learning: I. Selection of self-reproducing numeric patterns by
data processing machines, effects of hereditary control, mutation type and
crossing. Journal of Theoretical Biology, 17, 319–342.
Rixen, I., Bierwirth, C., & Kopfer, H. (1995). A case study of operational just-in-time scheduling using genetic algorithms. In J. Biethahn & V. Nissen (Eds.),
Evolutionary algorithms in management applications (pp. 113–123). Berlin:
Springer.
Ronnewinkel, C., Wilke, C. O., & Martinetz, T. (2000). Genetic algorithms in time-dependent environments. In L. Kallel, B. Naudts, & A. Rogers (Eds.), Theoretical aspects of evolutionary computing (pp. 263–288). Berlin: Springer.
Rowe, J. E. (1999). Finding attractors for periodic fitness functions. In W. Banzhaf,
J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, & R. E. Smith
(Eds.), Proc. of the Genetic and Evolutionary Computation Conf. GECCO-99 (pp. 557–563). San Francisco, CA: Morgan Kaufmann.
Rowe, J. E. (2001). Cyclic attractors and quasispecies adaptability. In L. Kallel,
B. Naudts, & A. Rogers (Eds.), Theoretical aspects of evolutionary computing (pp. 251–259). Berlin: Springer.
Rudolph, G. (1997). Convergence properties of evolutionary algorithms. Hamburg:
Kovač.
Ryan, C. (1996). The degree of oneness. In 1st Online Workshop on Soft Computing
(pp. 100–105). Nagoya, Japan.
Ryan, C. (1997). Diploidy without dominance. In J. T. Alander (Ed.), Proc. of
the Third Nordic Workshop on Genetic Algorithms and their Applications
(3NWGA) (pp. 63–70). Vaasa, Finland.
Ryan, C., & Collins, J. J. (1998). Polygenic inheritance – a haploid scheme that
can outperform diploidy. In A. E. Eiben, T. Bäck, M. Schoenauer, & H.-P. Schwefel (Eds.), Parallel Problem Solving from Nature – PPSN V (pp.
178–187). Berlin: Springer. (Lecture Notes in Computer Science 1498)
Saleem, S., & Reynolds, R. (2000). Cultural algorithms in dynamic environments. In Proc. of the 2000 Congress on Evolutionary Computation (pp.
1513–1520). Piscataway, NJ: IEEE Service Center.
Salomon, R., & Eggenberger, P. (1997). Adaptation on the evolutionary time
scale: A working hypothesis and basic experiments. In J.-K. Hao, E. Lutton,
E. Ronald, M. Schoenauer, & D. Snyders (Eds.), Artificial Evolution: Third
European Conf., AE’97 (pp. 251–262). Berlin: Springer.
Sarma, J., & De Jong, K. (1999). The behavior of spatially distributed evolutionary
algorithms in non-stationary environments. In W. Banzhaf, J. Daida, A. E.
Eiben, M. H. Garzon, V. Honavar, M. Jakiela, & R. E. Smith (Eds.), Proc. of
the Genetic and Evolutionary Computation Conf. GECCO-99 (pp. 572–578).
San Francisco, CA: Morgan Kaufmann.
Schwefel, H.-P. (1975). Evolutionsstrategie und numerische Optimierung. Unpublished doctoral dissertation, Technische Universität Berlin, Berlin.
Schwefel, H.-P. (1977). Numerische Optimierung von Computer-Modellen mittels
der Evolutionsstrategie [Numeric optimization of computer models using the
evolution strategy]. Basel, Stuttgart: Birkhäuser.
Schwefel, H.-P. (1981). Numerical optimization of computer models. Chichester:
John Wiley & Sons, Ltd.
Schwefel, H.-P. (1995). Evolution and optimum seeking. New York, NY: Wiley &
Sons.
Sebag, M., Schoenauer, M., & Ravisé, C. (1997a). Inductive learning of mutation step-size in evolutionary parameter optimization. In P. J. Angeline,
R. G. Reynolds, J. R. McDonnell, & R. Eberhart (Eds.), Evolutionary Programming VI (pp. 247–261). Berlin: Springer. (Lecture Notes in Computer
Science 1213)
Sebag, M., Schoenauer, M., & Ravisé, C. (1997b). Toward civilized evolution:
Developing inhibitions. In T. Bäck (Ed.), Proc. of the Seventh Int. Conf. on
Genetic Algorithms (pp. 291–298). San Francisco, CA: Morgan Kaufmann.
Slatkin, M. (1983). Models of coevolution: Their use and abuse. In M. H. Nitecki
(Ed.), Proc. of the Fifth Annual Spring Systematics Symposium: Coevolution
(pp. 339–370). Chicago, IL: The University of Chicago Press.
Smith, J. E., & Vavak, F. (1999). Replacement strategies in steady state genetic
algorithms: dynamic environments. Journal of Computing and Information
Technology, 7(1), 49–60.
Smith, R. E., & Goldberg, D. E. (1992). Diploidy and dominance in artificial
genetic search. Complex Systems, 6, 251–285.
Spalanzani, A., & Kabré, H. (1998). Evolution, learning and speech recognition in
changing acoustic environments. In A. E. Eiben, T. Bäck, M. Schoenauer, &
H.-P. Schwefel (Eds.), Parallel Problem Solving from Nature – PPSN V (pp.
663–670). Berlin: Springer. (Lecture Notes in Computer Science 1498)
Stanhope, S. A., & Daida, J. M. (1998). Optimal mutation and crossover rates
for a genetic algorithm operating in a dynamic environment. In V. W. Porto,
N. Saravanan, D. Waagen, & A. E. Eiben (Eds.), Evolutionary Programming
VII (pp. 693–702). Berlin: Springer. (Lecture Notes in Computer Science
1447)
Stanhope, S. A., & Daida, J. M. (1999). (1 + 1) genetic algorithm fitness dynamics
in a changing environment. In 1999 Congress on Evolutionary Computation
(pp. 1851–1858). Piscataway, NJ: IEEE Service Center.
Stroud, P. D. (2001). Kalman-extended genetic algorithm for search in nonstationary environments with noisy fitness evaluations. IEEE Transactions on
Evolutionary Computation, 5(1), 66–77.
Thierens, D., & Vercauteren, L. (1991). A topology exploiting genetic algorithm
to control dynamic systems. In H.-P. Schwefel & R. Männer (Eds.), Parallel
problem solving from nature: 1st Workshop, PPSN I (pp. 104–108). Berlin:
Springer.
Trojanowski, K., & Michalewicz, Z. (1999a). Evolutionary algorithms for non-stationary environments. In Proc. of the 8th Workshop: Intelligent Information
Systems (pp. 229–240). ?: ICS PAS Press.
Trojanowski, K., & Michalewicz, Z. (1999b). Searching for optima in non-stationary environments. In 1999 Congress on Evolutionary Computation
(pp. 1843–1850). Piscataway, NJ: IEEE Service Center.
Ursem, R. K. (2000). Multinational GAs: Multimodal optimization techniques in
dynamic environments. In D. Whitley, D. Goldberg, E. Cantu-Paz, L. Spector, I. Parmee, & H.-G. Beyer (Eds.), Proc. of the Genetic and Evolutionary
Computation Conf. (GECCO-00) (pp. 19–26). San Francisco, CA: Morgan
Kaufmann.
Vavak, F., & Fogarty, T. C. (1996). A comparative study of steady state and
generational genetic algorithms for use in nonstationary environments. In
Proc. of the Society for the Study of Artificial Intelligence and Simulation of
Behaviour Workshop on Evolutionary Computing 96 (pp. 301–307). ?
Vavak, F., Fogarty, T. C., & Jukes, K. (1996a). A genetic algorithm with variable
range of local search for adaptive control of the dynamic systems. In Proc.
of the 2nd Int. Mendelian Conf. on Genetic Algorithms (pp. 181–186). Brno:
PC-DIR Publishing.
Vavak, F., Fogarty, T. C., & Jukes, K. (1996b). A genetic algorithm with variable
range of local search for tracking changing environments. In H.-M. Voigt,
W. Ebeling, I. Rechenberg, & H.-P. Schwefel (Eds.), Parallel problem solving from nature – PPSN IV (pp. 376–385). Berlin: Springer.
Vavak, F., Jukes, K., & Fogarty, T. C. (1997). Adaptive combustion balancing
in multiple burner boiler using a genetic algorithm with variable range of
local search. In T. Bäck (Ed.), Proc. of the Seventh Int. Conf. on Genetic
Algorithms (pp. 719–726). San Francisco, CA: Morgan Kaufmann.
Vavak, F., Jukes, K. A., & Fogarty, T. C. (1998). Performance of a genetic algorithm with variable local search range relative to frequency of the environmental changes. In J. R. Koza, W. Banzhaf, K. Chellapilla, K. Deb,
M. Dorigo, D. B. Fogel, M. H. Garzon, D. E. Goldberg, H. Iba, & R. Riolo
(Eds.), Proc. of the Third Int. Conf. on Genetic Programming (pp. 602–608).
San Mateo, CA: Morgan Kaufmann.
Weicker, K. (2000). An analysis of dynamic severity and population size. In
M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton, J. J. Merelo, &
H.-P. Schwefel (Eds.), Parallel problem solving from nature – PPSN VI (pp.
159–168). Berlin: Springer.
Weicker, K. (2001). Problem difficulty in real-valued dynamic problems. In
B. Reusch (Ed.), Computational Intelligence: Theory and Applications (pp.
313–325). Berlin: Springer.
Weicker, K. (2002). Performance measures for dynamic environments. In J. J.
Merelo Guervós, P. Adamidis, H.-G. Beyer, J.-L. Fernández-Villacañas, &
H.-P. Schwefel (Eds.), Parallel problem solving from nature – PPSN VII (pp.
64–73). Berlin: Springer.
Weicker, K., & Weicker, N. (1999). On evolution strategy optimization in dynamic
environments. In 1999 Congress on Evolutionary Computation (pp. 2039–
2046). Piscataway, NJ: IEEE Service Center.
Weicker, K., & Weicker, N. (2000). Dynamic rotation and partial visibility. In
Proc. of the 2000 Congress on Evolutionary Computation (pp. 1125–1131).
Piscataway, NJ: IEEE Service Center.
Wilke, C. O. (1998). Evolution in time-dependent fitness landscapes (Tech. Rep.
No. 98-09). Bochum, Germany: Ruhr-Universität Bochum, Institut für
Neuroinformatik.
Wilke, C. O. (1999). Evolutionary dynamics in time-dependent environments.
Aachen, Germany: Shaker Verlag.
Wilke, C. O., & Ronnewinkel, C. (2001). Dynamic fitness landscapes: Expansions
for small mutation rates. Physica A, 290(3-4), 475–490.
Wilke, C. O., Ronnewinkel, C., & Martinetz, T. (1999). Molecular evolution in
time-dependent environments. In D. Floreano, J.-D. Nicoud, & F. Mondada
(Eds.), Advances in Artificial Life: Proc. of the 5th European Conf. on Artificial
Life (p. ??). Berlin: Springer.
Wilke, C. O., Ronnewinkel, C., & Martinetz, T. (2001). Dynamic fitness landscapes
in molecular evolution. (accepted by Physics Reports)
Wolpert, D. H., & Macready, W. G. (1995). No free lunch theorems for search
(Tech. Rep. No. SFI-TR-95-02-010). Santa Fe, NM: Santa Fe Institute.
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82.
Yao, X., & Liu, Y. (1996). Fast evolutionary programming. In L. J. Fogel, P. J. Angeline, & T. Bäck (Eds.), Proc. 5th Ann. Conf. on Evolutionary Programming
(p. ?). Cambridge, MA: MIT Press.