Feedback mechanisms in Switching Max

Feedback
mechanisms
in
Switching Max-Plus Linear systems
And its application to robotic legged locomotion
Master of Science Thesis
Django D. van Amstel
Delft Center for Systems and Control
Feedback mechanisms in Switching
Max-Plus Linear systems
And its application to robotic legged locomotion
Master of Science Thesis
For the degree of Master of Science in Systems and Control at Delft
University of Technology
Django D. van Amstel
April 28, 2014
Faculty of Mechanical, Maritime and Materials Engineering (3mE) · Delft University of
Technology
c Delft Center for Systems and Control (DCSC)
Copyright All rights reserved.
Delft University of Technology
Department of
Delft Center for Systems and Control (DCSC)
The undersigned hereby certify that they have read and recommend to the Faculty of
Mechanical, Maritime and Materials Engineering (3mE) for acceptance a thesis
entitled
Feedback mechanisms in Switching Max-Plus Linear systems
by
Django D. van Amstel
in partial fulfillment of the requirements for the degree of
Master of Science Systems and Control
Dated: April 28, 2014
Supervisor(s):
Dr. G.A.D. Lopes, Daily supervisor
Dr. Ir. T.J.J. van den Boom, Second Supervisor and chair Examination Committee
Prof. Dr. B.F. Heidergott, Third Supervisor
Reader(s):
Prof. Dr. Ir. B. de Schutter, External Committee member
Abstract
The class of Discrete Event Systems (DES) are systems of which the dynamics consist of
the occurrences of discrete events in the time domain [1]. Typical examples of DES from
everyday life are transport systems [2],[3], production facilities [4] or communication
systems.
Recently legged locomotion has been adressed from the point of view of DES [5],[6].
Here, the continuous motion of each leg is represented by the discrete touch-down and
lift-off event of the feet. This abstraction allows for a novel approach in the control of
legged robotic platforms.
A certain sub-class of DES, the so called Max-Plus Linear (MPL)-DES can be described
in a linear fashion using max-plus algebra. An extension to MPL models is the class of
Switching Max-Plus Linear (SMPL) models. This extension is used to describe DES
that can switch between different modes of operation. In each mode the system is
described by a unique MPL state space model. Because of the algebraic structure of
the max-plus algebra, stable transient dynamics are guaranteed during a switch of mode
of operation.
However, very little research has been done on how to decide which mode of operation
to switch to, possibly given a certain set of information. This thesis focuses on the
development of such methods. In particular, it has been researched how to select
the appropriate mode of operation as a function of the measured disturbances and
the environment the MPL system operates within. The problem of selecting the most
appropriate mode of operation is recast as an optimization problem in a feedback control
setting.
For the development of such methods an analysis of the structure of MPL state space
models is performed. Moreover, it is analysed how one can model disturbances in
a MPL framework. Finally, the ordinal optimization method has been extended to
incorporate learning.
Two feedback methods have been developed. The first is the so called reactive feedback
loop. It uses the structural information of the MPL system to mitigate disturbances
Master of Science Thesis
Django D. van Amstel
ii
as fast as possible. The second feedback method is named the deliberate loop. In the
deliberate loop a certain performance function is learned as a function of the disturbances, mode of operation and the environment. Then in the optimization procedure
to goal is to find the mode of operation that minimizes the approximated performance
function.
The deliberate feedback method has been implemented on the Zebro, a hexapod robotic
walking platform using SMPL models to schedule and control the walking pattern. It
has been confirmed that the algorithm optimizes the mode of operation if the performance function is known. However, more experimental data is needed to claim anything
about the learning capabilities of the algorithm in practice.
Django D. van Amstel
Master of Science Thesis
Table of Contents
Preface
I
xiii
Introduction and Literature
1
1 Introduction
1-1 Background . . . . . . . . . . . . . . . . . . .
1-2 Problem statement . . . . . . . . . . . . . . . .
1-2-1 The current control loop . . . . . . . . .
1-2-2 The extended control schematic . . . . .
1-2-3 Mathematical formulation of the problem
1-3 Scope and Outline . . . . . . . . . . . . . . . .
1-3-1 Outside of the scope of this thesis . . . .
1-4 Chapter notes . . . . . . . . . . . . . . . . . .
2 Max-plus algebra
2-1 Basics . . . . . . . . . . . . . . . . . . . . .
2-1-1 Definitions . . . . . . . . . . . . . . .
2-1-2 Algebraic properties . . . . . . . . . .
2-1-3 Vectors and matrices . . . . . . . . . .
2-1-4 Matrix definitions . . . . . . . . . . .
2-2 Graphs . . . . . . . . . . . . . . . . . . . . .
2-2-1 Basic definitions . . . . . . . . . . . .
2-2-2 Representing a matrix graphically by its
2-3 Solutions to recursive linear equations . . . . .
2-4 Eigenvalues and eigenvectors . . . . . . . . .
Master of Science Thesis
. . . . . .
. . . . . .
. . . . . .
. . . . . .
statement
. . . . . .
. . . . . .
. . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
communication graph
. . . . . . . . . . . .
. . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3
3
4
4
6
8
10
10
12
.
.
.
.
.
.
.
.
.
.
13
13
13
14
15
16
16
16
17
18
19
Django D. van Amstel
iv
Table of Contents
2-4-1
Cyclicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
20
2-5 Switching max-plus linear models and Timed Event Graphs . . . . . . . .
21
2-5-1
Max-plus linear state space models . . . . . . . . . . . . . . . . .
21
2-5-2
Switching max-plus linear systems . . . . . . . . . . . . . . . . . .
22
2-5-3
Timed event graphs and linear systems . . . . . . . . . . . . . . .
22
2-5-4
Going from a Petri Net representation to a state space model . . .
24
2-6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
25
3 Ordinal optimization
27
3-1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-1-1 Summary of method . . . . . . . . . . . . . . . . . . . . . . . . .
3-1-2
II
28
28
Goal softening . . . . . . . . . . . . . . . . . . . . . . . . . . . .
29
3-2 Alignment probability . . . . . . . . . . . . . . . . . . . . . . . . . . . .
30
3-3 The ordered performance curve . . . . . . . . . . . . . . . . . . . . . . .
30
3-3-1
The efficiency of ordering . . . . . . . . . . . . . . . . . . . . . .
31
3-3-2
Determining the required set size . . . . . . . . . . . . . . . . . .
31
3-4 The selection rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-4-1 The Blind pick selection rule . . . . . . . . . . . . . . . . . . . . .
31
31
3-4-2 The Horse Race selection rule . . . . . . . . . . . . . . . . . . . .
3-5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-6 Chapter notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
32
32
32
Theoretical foundation
35
4 Structural analysis of max plus linear models
37
4-1 Petri Nets and MPL state space equations . . . . . . . . . . . . . . . . .
40
4-2 The single event cycle iteration method . . . . . . . . . . . . . . . . . . .
41
4-2-1
Bounding the iteration method . . . . . . . . . . . . . . . . . . .
43
4-2-2
Summary of single event cycle state calculation method . . . . . .
44
4-2-3
Comparison of methods . . . . . . . . . . . . . . . . . . . . . . .
45
4-3 The single event cycle iteration method for higher order max-plus linear systems 45
4-4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4-5 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Django D. van Amstel
46
46
Master of Science Thesis
Table of Contents
v
5 The max-plus linear disturbance model
49
5-1 Modeling disturbances in a max-plus linear model framework . . . . . . . .
5-1-1
The four ways to define the disturbance function . . . . . . . . . .
50
5-1-2 Definition of the disturbance function . . . . . . . . . . . . . . . .
5-2 Deriving the disturbance MPL model . . . . . . . . . . . . . . . . . . . .
51
52
5-2-1
Going from a standard MPL to the MPL disturbace model . . . . .
53
5-3 Conditions of existence for the MPL disturbance model . . . . . . . . . . .
5-4 Application of the disturbance model . . . . . . . . . . . . . . . . . . . .
54
55
5-5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-6 Chapter notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
56
56
6 The adaptive Blind pick selection rule
III
49
57
6-1 The Blind pick versus Horse race selection rule and learning . . . . . . . .
57
6-2 The Adaptive Blind pick selection rule . . . . . . . . . . . . . . . . . . . .
58
6-2-1
6-2-2
Random selection . . . . . . . . . . . . . . . . . . . . . . . . . .
Introducing the Selection Probability Matrix . . . . . . . . . . . .
58
59
6-2-3
Inverse Transform Sampling . . . . . . . . . . . . . . . . . . . . .
59
6-2-4
Summary of the Adaptive Blind Pick selection rule . . . . . . . . .
60
6-3 The alignment probability for the adaptive Blind Pick selection rule . . . .
61
6-4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
63
Feedback mechanisms
65
7 Switching Max Plus Linear Feedback Methods
67
7-1 The reactive feedback loop . . . . . . . . . . . . . . . . . . . . . . . . .
7-1-1
7-1-2
67
Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The Reactive Gait Scheduler algorithm . . . . . . . . . . . . . . .
68
71
7-2 The deliberate feedback loop . . . . . . . . . . . . . . . . . . . . . . . .
72
7-2-1
7-2-2
7-2-3
Switch Decision Maker . . . . . . . . . . . . . . . . . . . . . . . .
Performance Function Learner . . . . . . . . . . . . . . . . . . . .
Mode of Operation Optimizer and Synthesizer . . . . . . . . . . .
72
73
74
7-2-4
7-2-5
Gait Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Summary of the deliberate feedback algorithm . . . . . . . . . . .
75
76
7-3 Combining the feedback loops . . . . . . . . . . . . . . . . . . . . . . . .
76
7-3-1 Data Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-5 Chapter notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
77
79
79
Master of Science Thesis
Django D. van Amstel
vi
Table of Contents
8 Case Study: The Zebro hexapod walking robot
8-1 Implementation of the deliberate feedback loop on the Zebro . . . . . . .
8-1-1 Implementation of the Mode of Operation Synthesizer . . . . . . .
8-1-2 MPL Gait Scheduler . . . . . . . . . . . . . . . . . . . . . . . . .
8-1-3 Implementation of the Switch Decision Maker . . . . . . . . . . .
8-1-4 Implementation of the Performance Function Learner and Mode of
Operation Optimizer . . . . . . . . . . . . . . . . . . . . . . . . .
8-2 Description of experiments . . . . . . . . . . . . . . . . . . . . . . . . . .
8-2-1 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . .
8-2-2 Reference speed experiment . . . . . . . . . . . . . . . . . . . . .
8-2-3 Learning experiment . . . . . . . . . . . . . . . . . . . . . . . . .
8-3 Analysis of results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-3-1 Reference speed experiment results . . . . . . . . . . . . . . . . .
8-3-2 Learning experiment results . . . . . . . . . . . . . . . . . . . . .
8-4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
81
81
81
84
84
9 Conclusions
9-1 Modeling disturbances in a max-plus framework . . . . . . .
9-2 The reactive feedback loop . . . . . . . . . . . . . . . . . .
9-2-1 The single event iteration state calculation method . .
9-2-2 Reactive calculation of the system matrices . . . . . .
9-3 The deliberate feedback loop . . . . . . . . . . . . . . . . .
9-3-1 The Adaptive Blind Pick selection rule . . . . . . . .
9-3-2 Optimizing the mode of operation . . . . . . . . . . .
9-3-3 Implementation of the deliberate feedback loop on the
9-4 Discussion and recommendations . . . . . . . . . . . . . . .
9-4-1 Recommendations . . . . . . . . . . . . . . . . . . .
95
95
96
96
97
97
97
98
98
99
99
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
Zebro robot
. . . . . . .
. . . . . . .
A Numerical examples per chapter
A-1 Chapter 2: Max plus algbra . . . . . . . . . . . . . . . . . . . . .
A-2 Chapter 3: Ordinal Optimization . . . . . . . . . . . . . . . . . .
A-3 Chapter 4: Structural analysis of max-plus linear models . . . . . .
A-4 Chapter 5: The max-plus linear disturbance model . . . . . . . . .
A-5 Chapter 6: The adaptive Blind Pick selection rule . . . . . . . . .
A-6 Chapter 7: Switching Max-Plus Linear Feedback Methods . . . . .
A-7 Chapter 8: Case Study: The Zebro hexapod walking robot . . . . .
A-8 MATLAB code . . . . . . . . . . . . . . . . . . . . . . . . . . . .
A-8-1 MATLAB implementation reactive feedback control method
Django D. van Amstel
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
85
88
88
89
90
90
90
92
92
101
101
106
108
110
113
115
121
123
127
Master of Science Thesis
Table of Contents
B Alignment probabilities as a function of the
B-1 Tables . . . . . . . . . . . . . . . . . . .
B-2 Graphs . . . . . . . . . . . . . . . . . . .
B-3 Used MATLAB code . . . . . . . . . . . .
B-3-1 Main code . . . . . . . . . . . . .
B-3-2 Functions used in main code . . . .
C Code of implementation
C-1 Implementation of the event level feedback
C-2 Implementation code . . . . . . . . . . . .
C-2-1 Main code . . . . . . . . . . . . .
C-2-2 Switch Decision Maker . . . . . . .
C-2-3 Performance Function Learner . . .
C-2-4 Mode of Operation Optimizer . . .
C-2-5 Mode of Operation Synthesizer . .
C-2-6 Performance function . . . . . . .
C-2-7 Annealing temperature function . .
C-2-8 ξ(t) to ξ(k) discretization function
C-2-9 Initialization code for Ξ, Θ and Λ .
C-2-10 Definition of Θ . . . . . . . . . . .
vii
annealing
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
temperature
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
D Experimental setup and results
D-1 Experimental results . . . . . . . . . . . . . . . . . . . .
D-1-1 Reference speed experiment results . . . . . . . .
D-1-2 Learning experiment results . . . . . . . . . . . .
D-2 Technical difficulties during the experiments . . . . . . .
D-3 Original designed experiment . . . . . . . . . . . . . . .
D-3-1 The CyberZoo . . . . . . . . . . . . . . . . . . .
D-3-2 Method of experiment . . . . . . . . . . . . . . .
D-3-3 Issues with the CyberZoo experimental setup . . .
D-4 Recommendations . . . . . . . . . . . . . . . . . . . . .
D-5 Used MATLAB code . . . . . . . . . . . . . . . . . . . .
D-5-1 Reference speed experiment . . . . . . . . . . . .
D-5-2 MATLAB code use for experimental result analysis
D-5-3 Learning experiment . . . . . . . . . . . . . . . .
Bibliography
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
131
131
134
138
138
142
.
.
.
.
.
.
.
.
.
.
.
.
147
147
148
148
160
160
161
162
163
163
164
164
165
.
.
.
.
.
.
.
.
.
.
.
.
.
167
167
167
172
174
184
184
186
187
187
187
187
188
189
191
Glossary
195
List of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
List of Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Master of Science Thesis
Django D. van Amstel
viii
Django D. van Amstel
Table of Contents
Master of Science Thesis
List of Figures
1-1
1-2
1-3
1-4
The current state of the art in the control of robotic legged locomotion.
The Zebro hexapod robot. Image courtesy of the Delft Robotics Institute
The extended control scheme. . . . . . . . . . . . . . . . . . . . . . . .
Structure of thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.
.
.
.
5
5
9
11
3-1 Graphical representation of defined sets . . . . . . . . . . . . . . . . . . .
3-2 Different types of Ordered Performance Curve (OPC). . . . . . . . . . . .
29
30
4-1 A generic Petri Net, used to analyse the relation between the structures of
the graphical an algebraic representation of a MPL DES . . . . . . . . . .
4-2 The movement of the tokens in time for one event iteration . . . . . . . .
38
39
5-1 The Petri Net representation of equation 5-5 . . . . . . . . . . . . . . . .
52
6-1 The alignment probability P (|G ∩ S| > k) as a function of the desired
alignment level k for selected values of the annealing temperature T . . . .
62
7-1 The desired control scheme. . . . . . . . . . . . . . . . . . . . . . . . . .
7-2 The delayed walking schedule of a six legged system. . . . . . . . . . . . .
7-3 The updated schedule, compressing the schedule to mitigate the effects of
the delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-4 Block diagram of the reactive feedback loop. . . . . . . . . . . . . . . . .
7-5 The block diagram of the deliberate feedback loop . . . . . . . . . . . . .
7-6 The internal structure of the supervisory controller . . . . . . . . . . . . .
68
69
8-1 The Zebro hexapod robot. . . . . . . . . . . . . . . . . . . . . . . . . . .
8-2 The leg numbering as used in the modeling . . . . . . . . . . . . . . . . .
8-3 The alignment probability as a function of the annealing temperature. . . .
82
82
88
Master of Science Thesis
69
71
73
78
Django D. van Amstel
x
List of Figures
8-4 The experimental environment . . . . . . . . . . . . . . . . . . . . . . . .
8-5 The approximated performance values after 1 hour of learning . . . . . . .
A-1
A-2
A-3
A-4
A-5
A-6
A-7
Communication graph of matrix A . . . . . . . . . . . . . . .
The Petri Net modeling bipedal static stable waking . . . . . .
The schedule of bipedal static stable walking. . . . . . . . . . .
The Ordered Performance Curve for the performance function of
The communication graph of A0 of the bipedal walking model .
The Petri Net of the obtained disturbance model . . . . . . . .
The cumulative distribution function FΘ (θ) . . . . . . . . . . .
. . . . .
. . . . .
. . . . .
Example
. . . . .
. . . . .
. . . . .
89
93
. 102
. 105
. 107
3-1107
. 109
. 111
. 114
A-8 The CDF FX (x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
A-9 Comparison of histograms. . . . . . . . . . . . . . . . . . . . . . . . . . . 116
B-1
B-2
B-3
B-4
C-1
The
The
The
The
selected annealing temperatures. . .
alignment probabilities for selected T
alignment probabilities for selected T
alignment probabilities for selected T
. . . . . . . .
and g = 10. .
and g = 50. .
and g = 100.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
134
135
136
137
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
D-1 The performance function as a function of Θ, for each environment . . . .
D-2 Results for the first Reference speed experiment with T = 0.01. . . . . . .
D-3 Results for the second Reference speed experiment with T = 0.01. . . . . .
D-4 Results for the third Reference speed experiment with T = 0.01. . . . . . .
D-5 Results for the first Reference speed experiment with T = 5. . . . . . . . .
D-6 Results for the second Reference speed experiment with T = 5. . . . . . .
D-7 Results for the third Reference speed experiment with T = 5. . . . . . . .
D-8 The approximated performance values after t = 130s of learning . . . . . .
D-9 The approximated performance values after t = 820s of learning . . . . . .
D-10 The approximated performance values after t = 1564s of learning . . . . .
D-11 The approximated performance values after t = 2186s of learning . . . . .
D-12 The approximated performance values after t = 2186s of learning and the
outlier removed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
D-13 The approximated performance values after t = 2686s of learning . . . . .
D-14 The approximated performance values after t = 3464s of learning . . . . .
D-15 The approximated performance values after t = 4214s of learning . . . . .
D-16 Picture of an IR marker as installed on the Zebro. . . . . . . . . . . . . .
D-17 Schematic representation of the Cyber Zoo communication network. . . . .
Django D. van Amstel
168
169
169
170
170
171
171
175
176
177
178
179
180
181
182
185
185
Master of Science Thesis
List of Tables
8-1 Definition of all modes of operation θi ∈ Θ . . . . . . . . . . . . . . . . .
8-2 GS for Tlow and Thigh in the reference speed experiment . . . . . . . . . .
86
91
A-1 Holding times in the Petri Net modeling Bipedal walking . . . . . . . . . . 116
B-1
B-2
B-3
B-4
B-5
B-6
Alignment
Alignment
Alignment
Alignment
Alignment
Alignment
probabilities
probabilities
probabilities
probabilities
probabilities
probabilities
for
for
for
for
for
for
N
N
N
N
N
N
=
=
=
=
=
=
1000,
1000,
1000,
1000,
1000,
1000,
g
g
g
g
g
g
=
=
=
=
=
=
10,
10,
10,
10,
10,
10,
s
s
s
s
s
s
=
=
=
=
=
=
5 .
10
20
30
40
50
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
132
132
132
133
133
133
D-1 Indexes of four optimal modes of operation for each environment in the
Reference Speed experiment . . . . . . . . . . . . . . . . . . . . . . . . . 168
D-2 The learned performance values matrix Λ after t = 4215s . . . . . . . . . 172
Master of Science Thesis
Django D. van Amstel
xii
Django D. van Amstel
List of Tables
Master of Science Thesis
Preface
This report is the final written delivery in the graduation project for obtaining the
degree of Master of Science in Systems and Control, carried out by Django van Amstel
at the department of Delft Center for Systems and Control (DCSC) at the Delft University of Technology, the Netherlands.
The idea for this thesis project was born in a discussion between two of the project
supervisors Dr. Gabriel Lopes, assistant professor at DCSC and Prof. Dr. Bernd Heidergott, associate professor at the department of Econometrics and Operations research
at the Free University of Amsterdam. They were wondering if the unique switching
property of SMPL models could be utilized to arrive at novel control methods. The
project team was completed by Dr. Ir. Ton van den Boom, associate professor at DCSC.
The project mainly revolved around the Zebro, a walking robot that uses SMPL models
to schedule and control the walking gait. It is interesting to note that my high school
graduation project also involved walking robots. Because of my limited knowledge of
dynamics, control algorithms and programming the project was not a success. Hence,
it almost felt natural to take up the Zebro project and show what difference five years
of engineering school can make.
First of all I must express my great gratitude towards my three supervisors. Not
only am I grateful for the tremendous amount of ideas and suggestions they had during
the execution of the project. Often during meetings the discussion would not only be
on the topics of max-plus algebra and walking robots, but on much more general and
philosophical topics in science and engineering. I most enjoyed these off-topic discussions and I feel they have broadened my view on the academic world and what it means
to be an engineer.
I also have to thank Fankai Zhang, technical staff member at the robotics laboratory of DCSC, for all his help with the Zebro robot. Without his help, it would have
been impossible for me to implement anything on the Zebro.
Master of Science Thesis
Django D. van Amstel
xiv
Preface
Finally, special thanks go out to my sister Naomi and my good friends Bas, Rafn,
Rishabh, Roicy and Stefan. More than anyone they have provided me with the support and means to keep going on with this thesis project under the sometimes difficult
circumstances.
Delft, University of Technology
April 28, 2014
Django D. van Amstel
Django D. van Amstel
Master of Science Thesis
xv
Title page image courtesy of the Robotics Institute of the Delft University of Technology.
Master of Science Thesis
Django D. van Amstel
xvi
Django D. van Amstel
Preface
Master of Science Thesis
“The greats weren’t great because at birth they could paint. The greats were
great because they’d paint a lot.”
— Ben Haggerty
Part I
Introduction and Literature
Master of Science Thesis
Django D. van Amstel
Chapter 1
Introduction
1-1
Background
The class of Discrete Event Systems (DES) consists of systems that have a discrete
state and of which the state evolution or the dynamics are determined by the happening of certain discrete events in the time domain. As a result, the state of a DES
changes at discrete points in time [1]. Typical examples of DES from everyday life are
transportation systems [2],[3], production facilities [4] or communication systems.
In [5],[6] legged locomotion has been abstracted to the the discrete touch-down and liftoff event of the feet. By this abstraction, legged locomotion can be viewed and modelled
as a DES, allowing for a novel approach in the control of legged robotic platforms.
In neuroscience it was discovered that the manner of walking of certain animals are
created by a Central Pattern Generator (CPG) in the spinal cord [7], [8]. A unique
manner of walking is named a gait [9]. Using a DES representation in the control
of legged locomotion creates a framework that is very similar to how these biological
CPG’s work.
Modeling DES systems in conventional algebra leads to nonlinear systems of differential
equations [10]. In general these systems of equations can only be solved numerically.
A disadvantage of solving these systems numerically is that analysis is very difficult.
However, a certain sub-class of DES, the so called Max-Plus Linear (MPL)-DES, can
be described in a linear fashion using max-plus algebra [11], [1]. This sub-class can be
characterized as the class of DES in which only synchronization and no concurrency or
choice occurs [12]. In the application of MPL models to legged locomotion, the gaits
are described by linear models in which analysis is very simple.
An extension of MPL models is the class of Switching Max-Plus Linear (SMPL) models.
This extension is used to describe DES that can switch between different modes of
operation. In each mode the system is described by a unique MPL state space model
Master of Science Thesis
Django D. van Amstel
4
Introduction
with different system matrices for each mode [12]. Because of the algebraic structure
of the max-plus algebra, stable transient dynamics are guaranteed during a switch of
mode of operation [12].
In the application of SMPL models to robotic legged locomotion, switching the mode
of operation translates to switching the gait [5]. By utilizing the properties of SMPL
systems, gait switching can be performed with minimal computational complexity and
guaranteed stability.
From biology it is known that animals typically use multiple gaits in different situations.
For example, in [13] it is investigated how horses change their gait to minimize the
energy consumption while varying desired locomotion speed. In [13] more than 100
different horse gaits have been identified.
This fact from biology is the inspiration for the main question adressed in this thesis;
can this gait switching behavior of animals, aiming to optimize some performance index,
be implemented on walking robots as well?
1-2
Problem statement
The question raised at the end of the previous section can be formulated in a more
general sense. This general question is the research question of this thesis:
"How to find the optimal mode of operation from a large and unstructured
search space for a Switching Max Plus Linear System"
In the remainder of this section, this ojective will be translated to a mathematical
description. First, the control loop as currently used in robotic leg control will be
presented. Subsequently, the necessary extensions to the current control block diagram
will be presented. In this extended framework the mathematical formulation of the
research objective will be defined.
1-2-1
The current control loop
In Figure 1-1 the currently used control block diagram for the Zebro hexapod walking
robot [5] is depicted.
The Zebro robot is a six legged robot developed by Delft Center for Systems and
Control (DCSC) of the Delft University of Technology, see Figure 1-2 for an impression.
The Zebro uses a SMPL model to synchronise the movement of the six legs.
The systems matrix A, representing a unique walking gait, is an input for the MPL
state space model, acting as the gait scheduler. The resulting state x contains the
timings at which the legs should lift- off and touch down on the surface respectively.
This schedule is transformed into a continuous time reference signal r for each leg.
Finally, this reference signal is used in a classical PD feedback control loop with control input u to control the actual position for all legs. This actual position can be
transformed back in a realized state xb of the MPL system.
Django D. van Amstel
Master of Science Thesis
1-2 Problem statement
5
Figure 1-1: The current state of the art in the control of robotic legged locomotion. Image
adapted from [5]
Figure 1-2: The Zebro hexapod robot. Image courtesy of the Delft Robotics Institute
Master of Science Thesis
Django D. van Amstel
6
Introduction
In the control schematic xb is also fed back into the max-plus scheduler. This loop will
be explained in future chapters, as it is currently not of importance.
Essentially, the current scheme is a feed forward controller as the A matrix is taken
as an external input and no feedback on it is present. Currently, a human operator
provides the system matrix A as input. The selection of the A matrix is done by trial
and error.
A more detailed explanation of the presented control scheme can be found in [5],[14],[15].
Note that although this control scheme was designed for robotic walking, it applies to
a general MPL DES as well.
1-2-2
The extended control schematic
The current control scheme introduced in Figure 1-1 implicitly assumes no disturbances
are present as there is no external signal that can disturb the state. However, real
systems operate within an environment. Environments introduce disturbances that
make the actual state deviate from the calculated or scheduled state of the MPL model.
This interaction with the environment is modelled by expanding the current scheme
as represented graphically in Figure 1-3. In here, the original schematic is enclosed
in the grey box. Two entities have been added; the environment and the supervisory
controller.
Definition of variables
The different signals present in the newly obtained schematic will now be defined.
Let Θ be the discrete finite set of size N of all possible parameterizations of the system
matrix A, defined on the so-called max-plus set Rmax . This set will be defined in chapter
2. Θ is called the set of all modes of operation and θ ∈ Θ is a certain unique mode of
operation, such that the whole set Θ can be denoted as
Θ = {θ1 , θ2 , · · · , θN } .
n×n
Now the system matrix A can be viewed as a specific projection of θ on Rmax
by the
mapping A:
A : θ → A(θ),
n×n
θ ∈ Θ and A ∈ Rmax
.
Let the complete environment be represented by a vector of parameters of very high
dimension Z. All possible parameterizations of the environment form the finite discrete
set N of size M , such that Z ∈ N . For example, a part of the environment vector Z
Django D. van Amstel
Master of Science Thesis
1-2 Problem statement
7
could be






Z=




..
.
temperature/K
Angle of surface/rad
Surface type
Raining
..
.
and two possible parameterizations could be






Zi = 




..
.
273.15
0
"asphalt"
"yes"
..
.






,










Zj = 















..
.
280
0.5 · π
"sand"
"no"
..
.






,




such that
N = {Z1 , · · · , Zi , Zj , · · · , ZM } .
In other words, the set N is the set of all unique parameterizations of the high dimensional vector Z. Each vector represents a unique state of the environment, such
that N is the finite and discrete set that contains all M unique state vectors of the
environment.
Then, let ξ be a lower dimensional or partial representation of the complete environment
state vector Z. Using the example above, ξ could be something like
ξ=
"
Angle of surface/rad
Surface type
#
.
The set of all possible unique vectors ξ is named Ξ, so similarly as in the example above
it is obtained as
#
"
#
"
0.5 · π
0
,
, ξj =
ξi =
"sand"
"asphalt"
such that
Ξ = {ξ1 , · · · , ξi , ξj , · · · , ξM } .
Note that Ξ must be of size M as well. Ξ is named the observerable environment
set, while Φ is the set of all remaining unobservable dimensions of the complete environment. Effectively the full environment state vector is divided into two sets of lower
dimensional vectors Ξ and Φ, the first being the observable environment and the second
the unobservable part:
N = Ξ × Φ.
Ξ is assumed explicitly to only be a partial representation of the complete state of the
environment for a very practical reason. All systems (both biological and artificial)
Master of Science Thesis
Django D. van Amstel
8
Introduction
typically only have a small set of sensors compared to the dimension of the environment state. For simpler tasks, the sensor set can be designed such that the important
environment states for the task at hand are observable.
However, as robots become more autonomous, it is inevitable that they will encounter
unknown and unexpected environments. It is safe to assume that in these environments,
unobservable states by the sensors might have significant influence on the system. Incorporating this assumption of unobservable influences in the design of the robot lead
to more robust controlling algorithms.
Let the scheduled or reference state of size n of the MPL system be denoted by x(k). k is
the event counter, interpretable as the time in event domain. An extensive explanation
of MPL systems is given in chapter 2.
Subsequently, the disturbances acting on the system are given by the disturbance vector
d(k) ∈ Rn . Then, the scheduled state x, the disturbances d(k) and the realized or actual
state xb(k) ∈ Rnmax are related via a function D as:
xb = D(d, x).
This mapping D is explained in more detail in chapter 5, for now it suffices to note
that it is defined by
xbi (k) = xi (k) + di (k),
∀i ∈ {1, 2, · · · , n} .
Finally, define the performance function F as
F(x, xb, Z, θ),
such that a true performance value J is given by
J = F(x, xb, Z, θ).
(1-1)
Jb = F(x, xb, ξ, θ).
(1-2)
Note that it is impossible to obtain Z as only the lower dimensional vector ξ is observable. Hence, the approximated performance value is introduced as
The unobservable part Φ is taken into account in the model as a stochastic insecurity
with some yet to be determined distribution function.
In Figure 1-3 not J is presented as an internal variable for the supervisory controller
b will be defined as a matrix having performance values Jb as it’s
but Λ. In chapter 7 Λ
elements.
1-2-3
Mathematical formulation of the problem statement
With all the variables defined, it is possible to define the problem statement in a
mathematical way. The problem statement is redefined as:
Django D. van Amstel
Master of Science Thesis
1-2 Problem statement
9
Figure 1-3: The extended control scheme. Image edited from [5]
Find the optimal mode of operation θ∗ , defined by the optimization problem
θ∗ = arg min F(x, xb, Z, θ) = Γ(x, xb, ξ),
θ∈Θ
θ∗ ∈ Θ,
(1-3)
such that the output of the supervisory controller A is equal to Aopt , defined
as
n×n
Aopt = A(θ∗ ), Aopt ∈ Rmax
.
One can interpret this formulation also as the search for the aggregated feedback function Γ that links the selected system matrix A to the various measurements available
from our physical system and environment.
Subproblems of the problem statement
The search for Aopt with the above definition gives rise to the following subproblems:
1. The disturbances that result in the actual state xb(k) are propagated through the
system by the A matrix. Hence, the actual state influences the scheduled state
x(k). Analysis of the structure of the A matrix will give insight in how this
propagation works.
2. The function D must be defined formally such that the relation between d(k),
xb(k) and x(k) is defined. With the formal definition, it will be possible to derive
a MPL model that incorporates disturbances.
3. In the problem statement, the true performance value J plays a role. However,
only the approximations Jb are assumed to be available. A method to approximate
J from Jb is necessary.
4. A suitable optimization technique to solve the optimization problem in 1-3 should
be selected.
Master of Science Thesis
Django D. van Amstel
10
1-3
Introduction
Scope and Outline
The structure of this thesis is given in Figure 1-4. The thesis is divided into three
distinct parts.
The first part covers the introduction, the problem statement and the necessary topics
from literature. In chapter 2 the max-plus algebra and switching max-plus linear
models will be introduced. In this chapter, major attention is given to the graphical
representation of max-plus linear models as well. Chapter 3 is devoted to the explanation of Ordinal Optimization (OO), a recently developed optimization technique based
on simulation and statistics. This technique is the partial answer to subproblem 4.
Part two holds three chapters, each roughly covering a particular contribution of this
thesis. In chapter 4 the structure of the A matrix is analysed, giving answer to
subproblem 1. This analysis will form the basis for one of the developed feedback
mechanisms in part III. Subproblem 2 is adressed in chapter 5. In here, the standard
SMPL model is extended to incorporate disturbances, relating the calculated state x,
the actual state xb and the disturbances in a systematic way. The last chapter of part
II, chapter 6, introduces a novel selection rule for the OO technique. This chapter
completes the answer to subproblem 4.
The third and last part combines the theory as presented in part II, leading to the
development of two feedback mechanisms. Both are presented in Chapter 7. In the
development of the feedback control methods, subproblem 3 is addressed. At the
end of chapter 7, both methods are fused together, giving the theoretical answer to the
problem statement. One of the developed feedback methods is tested in practice on
the Zebro hexapod robot.
The implementation and description of the experiments and the analysis of the results
are found in Chapter 8. The final conclusions of this thesis are drawn in Chapter
9. In this last chapter, recommendations are made for if the current research would be
continued.
Most of the material presented in part I and II of this report benefits from numerical
examples. Towards this end, appendix A contains simple numerical examples per
chapter to illustrate the theory presented.
1-3-1
Outside of the scope of this thesis
The scope of this thesis is limited to feedback methods by adapting or switching the system matrix A. Feedback control strategies using the more traditional additive control
input u(k) of the form
x(k) = A ⊗ x(k − 1) ⊕ u(k)
are outside of the scope of this thesis. The interested reader is referred to [16], [17] for
examples of different (feedback) control strategies for MPL systems and to [18], [19],
[20] for the development of Model Predictive Control for MPL and random or stochastic
SMPL systems.
Django D. van Amstel
Master of Science Thesis
1-3 Scope and Outline
11
Figure 1-4: Structure of thesis
Master of Science Thesis
Django D. van Amstel
12
1-4
Introduction
Chapter notes
One of the common goals in robotics is that one day robotic platforms will replace humans in dangerous and hazardous situations like search and rescue operations. In order
for this vision to become reality, developing robots that can autonomously manoeuvre
in highly unstructered terrain is necessary. Making informed gait switching decisions,
as is the goal of this thesis contributes to making walking robots more autonomous.
An emerging field towards the goal of more autonomous and robust robots is machine
learning. A necessary ability for robots to become more autonomous is to learn how
to improve its own performance [21], [22], [23], [24]. Reinforcement Learning (RL) is
a widely researched topic in gait synthesis and adaption for legged robotic platforms.
RL is a machine learning technique in where the robot will try different actions from
a predefined set, and receives a certain reward or punishment afterwards. From this
reward-punishment feedback, the robot learns what behavior is optimal. See for example [23] and the citations therein for an overview of recent work in the field of gait
adaption and reinforcement learning in the application of robotic legged locomotion.
Typically, these learning processes present the issues of sparse training data and dynamical complexity [25].
It is interesting to see that although in this thesis gait synthesis and adaption has been
viewed from a completely different angle, the methods developed are actually closely
related to reinforcement learning.
Django D. van Amstel
Master of Science Thesis
Chapter 2
Max-plus algebra
This chapter is devoted to the introduction of the max-plus algebra and Max-Plus
Linear (MPL) state space models. In the first section, the basics of max-plus algebra will
be presented such as the algebraic properties and operations on scalars and matrices.
In section 2 the communication graphs, being the graphical equivalent of max-plus
matrices, will be introduced. In here, some necessary graph theory will be presented
as well.
It is followed in section 3 by how one can solve linear equations in max-plus algebra. In
section 4 the eigenstructure of max plus matrices and the interpretation of eigenvalues,
eigenvectors and the notion of cyclicity are briefly discussed. In the final section, MPL
and Switching Max-Plus Linear (SMPL) linear models will be introduced. In this
section the graphical representation of MPL systems known as Timed Event Graph
(TEG), a sub-class of Petri Nets, will be treated as well.
The material presented in this chapter is for the majority a summary of the first chapters
in [11]. The interested reader is referred to [11] and [1] for more background information
and a more detailed treatment of the described.
2-1
Basics
The introduction of the max-plus algbra will be started with the very core basic definitions of variables and operations.
2-1-1
Definitions
Define the following;
• ε = −∞;
Master of Science Thesis
Django D. van Amstel
14
Max-plus algebra
• e = 0;
• Rmax = R ∪ ε, in where R is the set of real numbers;
For elements a, b ∈ Rmax , the following operators are defined;
a ⊕ b = max(a, b)
(2-1)
a⊗b=a+b
(2-2)
Taking a scalar base x to the power n in max-plus sense is defined as:
x⊗n = x
⊗ x ⊗{z· · · ⊗ x} = n × x
|
(2-3)
n
With × the multiplication operator from conventional algebra.
The above leads to the definition of the max-plus algebra;
Rmax = (Rmax , ⊕, ⊗, ε, e)
Rmax is a so-called idempotent semiring. For more information on idempotency and
semirings, see sources cited within [11] such as [26] or [27].
2-1-2
Algebraic properties
The algebraic properties of the idempotent semiring Rmax are given below.
• Associativity:
∀x, y, z ∈ Rmax :
x ⊕ (y ⊕ z) = (x ⊕ y) ⊕ z
∀x, y, z ∈ Rmax :
x ⊗ (y ⊗ z) = (x ⊗ y) ⊗ z
and
• Commutativity:
∀x, y ∈ Rmax :
x⊕y =y⊕x
and
∀x, y, z, ∈ Rmax :
x⊗y =y⊗x
• Distributivity of ⊗ over ⊕:
∀x, y, z, ∈ Rmax :
x ⊗ (y ⊕ z) = (x ⊗ y) ⊕ (x ⊗ z)
• Existence of a zero element, ε:
∀x ∈ Rmax :
Django D. van Amstel
x⊕ε=ε⊕x=x
Master of Science Thesis
2-1 Basics
15
• Existence of a unit element, e:
∀x ∈ Rmax :
x⊗e=e⊗x=x
• The zero element is absorbing for ⊗:
∀x ∈ Rmax :
x⊗ε=ε⊗x=ε
• Idempotency of ⊕:
∀x ∈ Rmax :
x⊕x=x
To illustrate the theory above, a few calculations are given in Example 2-1, appendix
A.
2-1-3
Vectors and matrices
n×m
Define a matrix A ∈ Rmax
as a matrix in max-plus algebra with n the number of rows
and m the number of columns. Then, matrix A can be written as



A=


a11 a12 · · · a1m
a21 a22 · · · a2m
..
..
..
...
.
.
.
an1 an2 · · · anm



.


n×m
For A, B ∈ Rmax
, the matrix sum is defined as
[A ⊕ B]ij = aij ⊕ bij = max(aij , bij )
with i ∈ {1, 2, . . . , n}, j ∈ {1, 2, . . . , m} and where [A ⊕ B]ij denotes the element of the
resulting matrix in row i and column j.
The scalar multiplication of a matrix, is defined by
[α ⊗ A]ij = α ⊗ aij
n×m
with α ∈ Rmax and A ∈ Rmax
and i ∈ {1, 2, . . . , n}, j ∈ {1, 2, . . . , m} .
n×l
l×m
The matrix multiplication for two matrixes A ∈ Rmax
and B ∈ Rmax
, denoted by A⊗B,
is defined by
[A ⊗ B]ik =
l
M
aij ⊗ bjk =
j=1
max (aij + bjk )
j∈{1,2,...,l}
with i ∈ {1, 2, . . . , n} and k ∈ {1, 2, . . . , m}.
See Example 2-2 in Appendix A for some examples on matrix operations in max-plus
sense.
Note the similarities of the definition above and the operations with matrices in conventional algebra. With this analogy in mind, the following definitions of matrices for
max-plus algebra are almost natural.
Master of Science Thesis
Django D. van Amstel
16
2-1-4
Max-plus algebra
Matrix definitions
n×m
Let E ∈ Rmax
be defined by:
[E]ij =
(
e for i = j
ε otherwise,
with i ∈ {1, 2, . . . , n} and j ∈ {1, 2, . . . , m}.
n×m
Moreover, define E ∈ Rmax
to be a matrix of size n × m with all its elements equal to
ε.
When E and E are square (n = m), they are the max-plus equivalent of the identity
and zero matrix in conventional algebra, respectively. By direct computation it follows
n×m
that for A ∈ Rmax
:
A ⊕ E(n, m) = A
A ⊗ E(n, n) = A.
For k ≥ 1 it holds that:
A ⊗ E(m, k) = E(n, k)
E(k, n) ⊗ A = E(k, m).
2-2
Graphs
Any square matrix can be represented graphically by a so-called weighted graph.
Products and powers of max-plus matrices show nice graph-theoretical interpretations.
These results play a crucial role in the analysis of max-plus linear systems.
2-2-1
Basic definitions
A directed graph G is a pair (N , D), where N is a finite set of elements called nodes or
vertices and D ⊂ N × N is a set of ordered pairs of nodes, called arcs or edges. An
ordered pair means the arc (i, j) is distinguished from the arc (j, i).
When the arc (i, j) ∈ D exists, it is said G contains an arc from i to j. The arc (i, j) is
an outgoing arc at node i, and logically an incoming arc at node j. When (i, j) ∈ D, it
doesn’t necessarily mean (j, i) ∈ D exists, therefore the name directed graph or digraph.
When a weight w(i, j) ∈ R is associated with an arc (i, j) ∈ D, the directed graph
is named weighted directed graph. These weighted directed graphs form the basis in
representing and analysing MPL systems using Petri Nets. The specific class of Petri
Nets will be discussed in section 2-6.
Django D. van Amstel
Master of Science Thesis
2-2 Graphs
2-2-2
17
Representing a matrix graphically by its communication graph
n×n
Any matrix square matrix A ∈ Rmax
can be represented graphically by its communication graph, denoted by G(A). Define the set n = {1, 2, . . . , n}. The set of nodes of the
graph is given by N (A) = n. A pair (i, j) ∈ n × n is an arc of the graph if aji 6= ε. In
symbols:
(i, j) ∈ D(A) ⇔ aji 6= ε.
The weight of an arc (i, j) in G(A) is given by aji . It might feel unlogic to relate arc
(i, j) with element aji of the matrix A, as the indices i and j are switched. Indeed, this
is a procedure prone to error and to cause confusion.
However, when matrix A is a system state space matrix, defining the relation between
the arc (i, j) and matrix element aji is necessary for it to be mathematically correct.
A path from node i to j is a sequence of arcs p, defined as:
p = ((ik , jk ) ∈ D(A) :
k ∈ m)
such that i = i1 , jk = ik+1 for k < m, and jm = j.
In words, the path p consists of a sequence of connected arcs starting at node i, ending
at node j. The number of arcs in the path is given by m, defining the path length as
|p|l = m. The preceding node z − 1 of node z in a path is named the upstream node of
node z and similarly the following node z + 1 is named the downstream node of node z.
When the start and end node of the path coincide, or i = j, the path is called a circuit.
An elementary circuit arises when each of its nodes only has one incoming and one
outgoing arc. When an arc starts and ends at the same node (formally an elementary
circuit of length 1), it is called a self loop.
The weight of a path p, is defined as:
|p|w =
m
O
aik+1 ,ik
k=1
The average path weight of a path p is defined as |p|w /|p|l . The same notions hold for
circuits, as they are essentially paths. However, usually the term circuit mean is used
instead of average path weight when referring to the path length of a circuit.
Paths can be combined to create new, longer paths. Let p1 = ((i1 , i2 ), (i2 , i3 )) and
p2 = ((i3 , i4 ), (i4 , i5 )) be two paths in G(A). Then a new path p1 ◦ p2 can be formed by
introducing a new operator:
p1 ◦ p2 = ((i1 , i2 ), (i2 , i3 ), (i3 , i4 ), (i4 , i5 )).
The operation of combining paths, denoted by ◦ is called concatenation of paths.
Next follow some definitions which will play an important role in the relation of communication graphs with the max-plus eigenvalue of matrices, discussed in section 2-4.
Master of Science Thesis
Django D. van Amstel
18
Max-plus algebra
• A circuit p in G(A) is called a critical circuit if its average weight is the maximal average weight of all circuits. The critical graph of A, denoted by G c (A) =
(N c (A), Dc (A)) is the graph consisting of those nodes and arcs that belong to the
critical circuit.
• When there exists a path from node i to node j, it is said node j is reachable from
node i. If for any two nodes i, j ∈ N node j is reachable from i, the graph G is
said to be strongly connected. When the communication graph G(A) is strongly
n×n
connected, the associated matrix A ∈ Rmax
is called irreducible.
• Note that it is necessary to have at leat one circuit in the graph of A for it to be
irreducible. If the communication graph of A doesn’t contain a circuit, node i will
not be reachable from itself, violating the definition that any node is reachable
from any node.
Finally, the notion of cyclicity is introduced. The cyclicity of G(A), denoted by σG(A)
is defined as:
If G is strongly connected , the cyclicity σG(A) is equal to the greatest common divisor of the lengths of all elementary circuits in G(A). In the case G(A) consists
of one node without a self-loop, the cyclicity is defined as 1.
G is not strongly connected , the cyclicity equals the least common multiple of the
cyclicities of all maximal strongly connected subgraphs of G(A).
The just introduced theory is again illustrated by means of an example. See Example
2-3, Appendix A.
2-3
Solutions to recursive linear equations
With the basic definitions and operations for scalars and matrices introduced, the
solutions to simple recursive linear equations in max-plus algebra can be formulated.
n×n
For the square matrix A ∈ Rmax
, define the following:
A+ =
∞
M
A⊗k
k=1
A∗ = E ⊕ A+ =
M
A⊗k
(2-4)
k≥1
n×n
. If the average
This ·∗ operator is named the Kleene star operator. Let A ∈ Rmax
circuit weight of any circuit in G(A) is equal or less than e, it holds that
A+ =
∞
M
n×n
A⊗k = A ⊕ A⊗2 ⊕ · · · ⊕ A⊗n ∈ Rmax
.
k=1
Django D. van Amstel
Master of Science Thesis
2-4 Eigenvalues and eigenvectors
19
For the proof, see [11], page 31.
Hence, if all circuits have non-positive circuit weight, we have
A∗ =
n−1
M
A⊗k .
(2-5)
k=0
n×n
Then, given A ∈ Rmax
, b ∈ Rnmax and x ∈ Rnmax , the solution to the recurrence relation
x=A⊗x⊕b
is given by
x = A∗ ⊗ b.
(2-6)
The proof is given in [11], page 43. Moreover, this solution is unique under some
conditions. One of these conditions is that A is nilpotent. A matrix A ∈ R×n
max is called
nilpotent if for some integer z ∈ Z+ it holds that
A⊗z = E.
The result of equation 2-6 is very important in the solution of MPL state space models,
discussed in section 2-5.
2-4
Eigenvalues and eigenvectors
The eigenstructure of matrices and systems in max-plus algebra contains important
information of the system, just as in conventional algebra. Eigenvalues and eigenvectors
are defined in max-plus sense as follows.
n×n
Let A ∈ Rmax
be a square matrix, µ ∈ Rmax be a scalar, and v ∈ Rnmax a vector that
contains at least one finite element. If they obey the relation
A ⊗ v = µ ⊗ v,
then µ is called an eigenvalue of A and v an eigenvector of A associated with eigenvalue
µ. The eigenspace of matrix A associated with eigenvalue µ is denoted by V (A, µ) and
is build up from all unique eigenvectors associated with eigenvalue µ. Notice that
eigenvectors are not necessarily unique. If v is an eigenvector, α ⊗ v will be as well:
A ⊗ (α ⊗ v) =
=
=
=
A⊗α⊗v
α ⊗ (A ⊗ v)
α ⊗ (µ ⊗ v)
µ ⊗ (α ⊗ v).
An important notion is that any finite eigenvalue µ of a square matrix A is the average
weight of some circuit γ in G(A), or more formally:
µ=
Master of Science Thesis
|γ|w
.
|γ|l
Django D. van Amstel
20
Max-plus algebra
A proof can be found in [11] p. 37.
Irreducibility of A implies that it’s eigenvalue is finite. Moreover, it can be shown (see
[11], p.40) that the associated eigenvector v will have all elements different from ε. In
n×n
fact, it is proven in [11] that for any irreducible matrix A ∈ Rmax
the eigenvalue λ is a
unique and finite number, equal to the maximal average circuit weight in G(A).
2-4-1
Cyclicity
The eigenvalue and eigenvector of a MPL system play an important role in the steady
state behavior of the system. This steady state behavior is described by the notion of
cyclicity.
Although named alike, the cyclicity σG of a graph (see section 2.2) is not equal to the
algebraic cyclicity σ as defined in this section.
Let
x(k + 1) = A ⊗ x(k)
n×n
which generates the sequence {x(k) : k ∈ N} for k ≥ 0, where A ∈ Rmax
and
x(0) = x0 ∈ Rnmax is the initial condition.
One can find x(k) explicitly when the initial condition is known. The explicit expression
is found by successive application of the recurrance relation:
x(1) = A ⊗ x(0)
x(2) = A ⊗ x(1) = A ⊗ A ⊗ x(0) = A⊗2 ⊗ x(0)
x(3) = A ⊗ x(2) = A ⊗ A ⊗ A ⊗ x(0) = A⊗3 ⊗ x(0)
..
.
x(k) = A⊗k ⊗ x(0)
Let λ be the eigenvalue of A and let σ denote the cyclicity of A. Then the following
holds (see [11], page 50):
A⊗(k+σ) = λ⊗σ ⊗ A⊗k .
This is a very important equation. It says that the sequence of powers of Aλ execute a
periodic behavior with length σ, when k is larger than some integer, say N . The value
of σ is equal to the cyclicity of the communication graph of G(A). When the above
equation holds, it is said the system is in it’s eigen space.
The smallest integer value of k for when this cyclic behavior starts to show for any
arbitrary initial condition x0 is called the transient time of A, denoted by t(A) ∈ N.
Note that the transient time can be very large or even infinite. If t(A) however has
a finite value, it is certain that the system will converge to the eigenspace in at most
t(A) steps.
The periodic behavior is characterized not only by the cyclicity, but also by the eigenvalue λ. In [11] p.57 it is proven that the limit
limk→∞ xi (k)/k
Django D. van Amstel
Master of Science Thesis
2-5 Switching max-plus linear models and Timed Event Graphs
21
exists for every intitial condition. Differently put, it is independent of the inital condition x0 . The value of this limit is proven to be λ for every entry xi in x with j ∈ n,
given that A is irreducible.
In Appendix A, Example 2-4, a numerical example is given to illustrate the notions of
cyclicity and the transient time in MPL models.
2-5
Switching max-plus linear models and Timed Event Graphs
This section is devoted to the introduction of MPL and SMPL systems and models.
The theory put forward in the previous sections of this chapter provides tools and
insight into the theory discussed in this section.
MPL systems are the class of Discrete Event Systems (DES) in which only synchronization and no concurrency or choice occurs [12]. However, with the arrival of SMPL
systems, a certain form of choice is allowed as the mode of operation can be chosen. A
common graphical tool to represent DES in a graphical way is the Petri Net formalism
[28]. The MPL systems are described by the Petri Net subclass of TEG. Although
TEG are limited in their application to describe every DES, many real life systems can
be described fairly well by TEG and MPL models [29].
In the first subsection, the MPL state space models and SMPL models will be presented.
Next, the graphical representation by using TEG is treated in the second subsection.
Moreover, it will be shown how one can go construct the MPL state space model from
the TEG representation in a systematic way.
2-5-1
Max-plus linear state space models
Define a state vector x(k) ∈ Rnmax , in where n is the number of events. k is called the
event counter and it can be interpreted as an discrete time variable in "event domain".
the entry xi (k) of the state vector corresponds to the k-th happening of event i.
n×n
Now introduce the system matrix A ∈ Rmax
. Then, the simplest homogeneous MPL
state space equation is given by






x(k) = 
A ⊗ x(k − 1)

A1,1 A1,2
x1 (k)


...

x2 (k) 
 A2,1

=

.. 
...
 ..
. 
 .
xn (k)
An,1 An,2
···
...
A1,n
A2,n
..
...
.
· · · An,n


 
 
 
⊗
 

x1 (k − 1)
x2 (k − 1)
..
.
xn (k − 1)



,


with initial condition x(0) = x0 ∈ Rnmax . Interpreting what this means is easiest done
by writing out the full equation for a single event. For example, if n = 2, for i = 1, the
following equation is found by writing out the matrix multiplication above:
x1 (k) = A1,1 ⊗ x1 (k − 1) ⊕ A1,2 ⊗ x2 (k − 1)
x1 (k) = max (A1,1 + x1 (k − 1), A1,2 + x2 (k − 1))
Master of Science Thesis
Django D. van Amstel
22
Max-plus algebra
What this equation says is that event x1 will occur for the k-th time, A1,1 time units
after it’s previous occurance, or A1,2 time units after the k − 1-th happening of event
x2 , whichever value is larger.
From the previous section on the eigenstructure of max-plus matrices, it follows that a
MPL system has the following characteristic that in steady state, the state increases in
a periodic fashion, defined by the eigenvalue λ, the initial condition x0 and the algebraic
cyclicity σ.
Because all state trajectories in a MPL system are strictly nondecreasing, using the
conventional definition for stability results in every max-plus system being inherently
unstable. A MPL system is called stable if tokens cannot accumulate indefinitly inside
the graph [29]. A sufficient condition is that the communication graph is strongly
connected.
2-5-2
Switching max-plus linear systems
Now let Θ denote the discrete set of size N of all modes of operation θ such that
θ(k) ∈ Θ = {θ1 , θ2 , · · · , θN }
and modify the state update equation [12] as
x(k) = A(θ(k)) ⊗ x(k − 1),
in which A(θ(k)) corresponds to the system matrix of the mode of operation θ(k). At
every event step k the system can switch between modes of operation, such that a time
series in event domain of modes of operation is created.
The moments of switching are determined by some switching mechanism [12]. After
switching, during a certain transient time t(A(θ(k))) the system will behave a-periodic
before returning to the periodic regime as defined by the eigenvalue λ(A(θ)) and the
cyclicity σ(A(θ)) as defined in section 2-4-1. Because of this transient time, High
frequent switching will result that the system is constantly in the transient-space, rather
than the eigenspace in where the behavior is periodic. After a switch, the initial
condition x0 is defined as the previous state x(k − 1) before the switch at k.
2-5-3
Timed event graphs and linear systems
In section 2-3, the relation between matrices in the max-plus algebra and their communication graph has been investigated. In this section, the so called Petri Nets [28]
will be presented.
A subclass of these Petri nets, called TEG, can be modelled by MPL recurrence relations
[11]. The graphical representation of a DES can aid in the interpretation of a DES and
gives in general an easier interpretation into the system dynamics than just the state
space representation alone. Hence, being able to read a TEG makes it easier to interpret
MPL systems.
Django D. van Amstel
Master of Science Thesis
2-5 Switching max-plus linear models and Timed Event Graphs
23
This subsection is based on [11] and [29].
The class of event graphs is defined as follows. Divide the set of nodes N into two
disjoint subsets P and Q, with their elements being called places and transitions respectively. Place i is denoted by pi and similarly, transition j is denoted by qj . The set
of arcs is defined by
D ⊂ (Q × P) ∪ (P × Q).
This means that arcs exist from transitions to arcs and vice verca, but there are no
arcs from places to places or from transitions to transitions. When all places in the
network have a single transition upstream and a single one downstream and the place
weights represent holding times (usually indicated by τi ), it is called a TEG.
A single upstream transition means there is a single source of of token supply for each
place and hence there is no competition in consumption and supply or supply of tokens
[29]. A single downstream transition is defined likewise.
The transitions by convention have names of the variables of the system such as the
input u, output y and states xi . The places represent the holding times of the different
arcs.
When a condition is fulfilled, a token (represented by a dot) is allocated to the place.
The event, associated with a transition, can take place when all upstream places have
at least one token and the token has spend the specific holding time at the place. If
this is the case, the transition is named enabled.
When the tokens have spend the respective holding times at all upstream places, te
transition directly fires. This means it takes one token from each upstream place and
adds one token to each downstream place.
Because in TEG’s each place only has one upstream and one downstream transition,
The number of tokens in place i is given by mi . The total division of tokens over the
network is called the marking, and is represtended by M = (m1 , m2 , . . . , m|P| )⊤ , in
where |P| denotes the total number of places.
With the initial marking M0 one can represent different synchronization constraints
All the holding times of the places are the elements of vector T = (τ1 , τ2 , . . . , τ|P| )⊤ .
In summary, a timed event graph G is formally defined by 5 characteristics P, Q, D, M0 , T ,
where
• P is the set of all places
• Q is the set of all transitions
• D is the set of arcs
• M0 is the initial marking
• T is the vector of holding times, ordered equally as P such that Pi has holding
time Ti .
Master of Science Thesis
Django D. van Amstel
24
Max-plus algebra
2-5-4
Going from a Petri Net representation to a state space model
Now the relation between the MPL state space representation and its Petri Net will be
shown. Let τji represent the holding time of the place having upstream transition qi
and downstream transition qj . The size of the event set Q is n. Also, let xi (k) represent
the kth firing of transition qi . The firing times of all transitions for the kth time form
the elements of vector
x(k) = (x1 (k), x2 (k), . . . , xn (k))⊤ .
Vector x(k) is the state of the system.
n×n
With every event graph a matrix A0 , . . . , AM ∈ Rmax
can be associated with it. Define
each element by
[Am ]ji =
(
τji if the number of tokens in place pqi ,qj equals m,
ε otherwise
(2-7)
for m = 0, 1, . . . , M . In here, M defines the maximum number of tokens with respect
to all places. Then the following recurrence relation holds
x(k) = A0 ⊗ x(k) ⊕ A1 ⊗ x(k − 1) ⊕ · · · ⊕ AM ⊗ x(k − M )
for
k ≥ 0.
This is proven easily by construction. The above higher order recurrence relation can
be transformed into a first order one, for which we have many tools available. Start off
by writing the above as
x(k) =
M
M
Am ⊗ x(k − m),
k ≥ 0.
m=0
L
Now define b(k) = M
m=1 Am ⊗ x(k − m). Note that the summation counter starts at
m = 1, reducing the equation further to
x(k) = A0 ⊗ x(k) ⊕ b(k).
Recall the Kleene star operator introduced in section 2-3. Then the solution reads as
x(k) = A∗0 ⊗ b(k) = A∗0 ⊗ A1 ⊗ x(k − 1) ⊕ · · · A∗0 ⊗ AM ⊗ x(k − M )
for k ≥ 0. (2-8)
Whereas x(k) first showed up on both sides of the = sign, the right hand side of the
equation now only contains "past" instances x(k − 1) to x(k − M ). Define an extended
state vector, practically stacking state vectors of past instances below the current state
vector as:
⊤
xe(k) = x⊤ (k), x⊤ (k − 1), . . . , x⊤ (k − M + 1) .
Then the recurrence relation of order M can be reduced to a first order one as
Django D. van Amstel
xe(k) = Ae ⊗ xe(k − 1),
k≥0
(2-9)
Master of Science Thesis
2-6 Conclusion
25
with





e

A=




A∗0 ⊗ A1 A∗0 ⊗ A2 · · · · · · A∗0 ⊗ AM
E
E
··· ···
E
E
E
E ···
E
..
..
...
... ...
.
.
..
..
.
.
E
···
E
E
E






.




(2-10)
This equation is often referred to as the standard autonomous equation, which described
the firing evolution of each transition of a timed event graph in a linear manner. Note
that the state vector and thus the state space increases linearly with M .
In Example 2-5, Appendix A, the synchronization and modeling of two legs is described
in great detail to illustrate the findings in this section. First, the two legs and their
synchronization constraints are represented by a TEG Petri Net. From this graphical
representation a MPL model is obtained.
2-6
Conclusion
This chapter started off with the definition of the max-plus algebra
Rmax = (Rmax , ⊕, ⊗, ε, e).
The algebraic properties have been introduced and various elements have been defined
in analogy to conventional algebra. It was shown how one can solve the recursive linear
equation x = A ⊗ x ⊕ b using the Kleene star operator. Although in general this leads
to an infinite sum, under some conditions the sum is finite and the solution will be
finite and unique.
Graph theoretical concepts such as path lengths, average path weights, circuits and
their relation to the eigenstructure of max-plus matrices have been discussed. The
notion of irreducible matrices was introduced.
It was shown that the maximal average circuit weight of a communication graph corresponds to the eigenvalue λ of a max-plus matrix. This eigenvalue determines the
asymptotic growth rate of the state.
Subsequently the SMPL models have been introduced. It was shown how one can
obtain an MPL state space model from the graphical representation, being Timed
Event Graphs. These graphs are a sub-class of Petri Nets and are directed, biparted
graphs. The graphs consist of places with associated holding times and events. The
dynamics are represented by the movement of tokens through the network of places
and events.
Master of Science Thesis
Django D. van Amstel
26
Django D. van Amstel
Max-plus algebra
Master of Science Thesis
Chapter 3
Ordinal optimization
In most traditional optimization search techniques the goal is to find the single best solution or the global optimum. However, in practice this truly best solution is not found
due to constraints in terms of calculation time or complexity. Stopping criteria are
included to ensure that the algorithm arrives at a solution[30]. By including stopping
criteria, the found solution will not be the true optimum, but an approximate solution
close to the presumed best solution. Typically these stopping criteria do not asses how
close the found solution is to the true optimal solution. It is therefore impossible to
claim anything about the quality of the found solution.
Let Θ denote the set of possible solutions and denote the performance of each solution
θ ∈ Θ by J(θ). Then the optimization problem can be formulated mathematically as
finding the optimal solution θ∗ by solving
θ∗ = arg min J(θ).
θ∈Θ
Finding the solution becomes more and more computational complex when Θ is large
and/or unstructured or the performance J(θ) is difficult or costly to evaluate [31]. As
technology advances, more and more optimization problems show these properties and
the traditional search techniques in optimization problems are not suitable any more.
Especially Discrete Event Systems (DES) tend to be difficult to handle by traditional
optimization techniques
In order to handle optimization problems with these properties, a relative new technique
is developed, named Ordinal Optimization (OO). In this chapter, this technique will
be discussed. The discussion will be limited to a version of the classical OO that is
suitable for relative small to medium sized problems. The remainder of this chapter is
based on [31],[32] and [33], unless cited otherwise.
OO is mainly based on two concepts: goal softening and order versus value. By the
concept of goal softening, OO shifts the focus to finding good designs, instead of the
Master of Science Thesis
Django D. van Amstel
28
Ordinal optimization
search for the single best solution. This idea will be the starting point of the introduction of OO. The concept of order versus value uses the fact that it is much easier and
efficient to determine if a is larger than b than to accurately determine the value of a.
In the first section, the necessary sets and variables will be defined. In the subsequent
sections the mechanics of the OO method will be explained by the hand of the concepts
of goal softening, alignment probability, the ordered performance curve and the selection
rule in the final section.
3-1
Definitions
Define the following:
• The total solution or search space Θ of size N or |Θ| = N ∈ N+ . Each unique
solution is named θi , with index i = 1, 2, · · · , N . Hence, Θ is a finite discrete set
of some dimension.
• The performance or cost function J(θ) ∈ R. Each observation of the cost function
is assumed to be subjected by i.i.d. noise with variance σ 2 .
• The selected subset S ⊂ Θ of size |S| = s ∈ N+ .
• The good enough subset G ⊂ Θ of size |G| = g ∈ N+ .
• the level of alignment, defined as k = |G ∩ S| ∈ N0 . Note that k ≤ min(g, s).
• The alignment probability, P (|G ∩ S| ≥ k; σ, g, s, N ).
The constraint that all set sizes except k are in the set of positive integers N+ is there
to avoid trivial cases in where a set would have size 0. However, the level of alignment
k is allowed to be 0.
The above sets are depicted graphically in Figure 3-1.
3-1-1
Summary of method
The method of OO can then be summerized by the following steps:
1. Select g and a required probability of alignment P and s.
2. Form the selected subset S from Θ according to a certain selection rule.
3. Evaluate J(θ), ∀θ ∈ S. The actual obtained evaluations are subjected to noise
e
and hence an approximation J(θ)
will be obtained.
e
4. Order S according to J(θ):
Django D. van Amstel
e ) ≤ J(θ
e ) ≤ · · · ≤ J(θ
e )
S = {θ1 , θ2 , · · · , θs } , with J(θ
1
2
s
Master of Science Thesis
3-1 Definitions
29
Figure 3-1: Graphical representation of defined sets
5. The first k solutions {θ1 , θ2 , · · · , θk } in S are now equal to the intersection {G∩S}
with probability P .
In the subsequent sections, the concepts that are used in OO will be explained in more
detail. First, the concept of goal softening is explained. Subsequently the alignment
probability and Ordered Performance Curve (OPC) will be discussed. Finally, the
selection rule will be discussed.
3-1-2
Goal softening
The goal of any arbitrary optimization technique is to find at least one solution θ ∈ G.
In most traditional optimization search techniques the constraint of |G| = 1 is imposed
implicitly; the search is for the single solution that mimizes the cost function J.
The concept of goal softening in OO is that this constraint is not imposed. By relaxing
the very tight constraint of |G| = 1 to |G| = g, The probability of finding G increases.
Depending on the method, this should decrease the computational complexity and/or
the necessary time to find a solution θ that belongs to G.
Master of Science Thesis
Django D. van Amstel
30
Ordinal optimization
Figure 3-2: Different types of OPC. Image courtesy of [34]
3-2
Alignment probability
The alignment probability was already defined at the beginning of this chapter as
P |G ∩ S| ≥ k; σ 2 , g, s, N .
The alignment probability can be interpreted as the probability that k solutions in the
subset S are also in the good enough subset G. For most selection rules, discussed in the
fourth section of this chapter, it is impossible to calculate this probability analytically.
Numerical values are obtained by simulations. In general, an alignment probability of
P ≥ 0.95 is selected, depending on the problem at hand.
This exposes a fundamental difference with and a great advantage of OO compared
to traditional optimization methods. The alignment probability is a measure of quality that is missing when traditional optimization methods are used in practice with
stopping criteria.
3-3
The ordered performance curve
When forming the ordered subset from S, the solutions are ranked according to the
performance function. The OPC is defined as the ordered set Θ according to the
performance function J(Θ):
OP C = {θ1 , θ2 , · · · , θN } , with J(θ1 ) ≤ J(θ2 ) ≤ · · · J(θN ).
Also note that for the OPC, no noise is taken into account. Typically, five different
types of OPC are distinguished. They are presented in Figure 3-2.
The type of OPC plays a major role in selecting g. Usually, g is selected as a fraction
or the top n-percentile of the total search space:
g = α · N,
α ∈ [0, 1].
If the OPC is steep and there are many bad designs, α should be chosen small since
the actual performance of the solutions degrades fast. A smaller g will lead to a larger
Django D. van Amstel
Master of Science Thesis
3-4 The selection rule
31
necessary s and hence will increase the computational burden. On the other hand, if
the OPC is very flat, an α closer to 1 is possible since many solutions perform relatively
well.
Although the general shape of the OPC is usually known, there is no general systematic
way of determining the value of g analytically from the OPC.
3-3-1
The efficiency of ordering
It has been shown in [33] that order converges faster than value. In other words, it is
easier to determine the ordering a ≤ b ≤ c than the actual values of a, b and c.
Moreover, it has been noted previously that the evaluation of the performances of the
solutions θ in S is subjected to noise. In [32] it is shown by experiments that noise
has less effect on determining an order, compared to when actual values are to be
determined.
The fact that ordering converges faster than value and is moreover less sensitive to
noise is the second reason why OO is very efficient in finding a solution θ ∈ G.
3-3-2
Determining the required set size
As with selecting g, there is no general formula or method available to determine
the required value for s besides numerical simulation. However, some approximation
functions and tables in where s is calculated as a function of P , g and N can be found
in [32] and [35].
3-4
The selection rule
It still remains unknown how the selected subset S is formed. Various methods have
been developed, of which a good overview and comparison is presented in [36]. From this
publication, two methods will be discussed; the Blind Pick (BP) and Horse Race (HR)
selection rule.
3-4-1
The Blind pick selection rule
In the BP selection rule, the selected subset S is selected from Θ completely at random.
The advantage is that absolutely no knowledge of the system is required.
Since the method fully relies on a stochastic selection, an analytical expression is available for the alignment probability [35] :
2
P |G ∩ S| ≥ k; σ , g, s, N =
Master of Science Thesis
min(g,s)
Σi=k
g
i
!
N −g
s−i
×
N
s
!
!
.
(3-1)
Django D. van Amstel
32
3-4-2
Ordinal optimization
The Horse Race selection rule
The HR selection rule does not select the candidates for S at random. Each solution θ
is evaluated for a fixed number of observations. Then, the top s solutions are selected
for further evaluation until the computing budget is depleted. Some extensions such as
the HR with global comparison and HR with no elimination are presented in [36]. In
those methods, the initial evaluation of all solutions are replaced by techniques using
information from previous iterations, aimed at speeding up the selection process.
Comparison of selection rules
With the exception of some specific problems, HR with no elimination is found to be
the most efficient selection rule[36]. For problems where the computing budget is very
small or where the good enough subset is very small, normal HR is recommended.
3-5
Conclusion
The optimization technique of Ordinal Optimization (OO) has been introduced in this
chapter. It is designed to efficiently handle large, complex and unstructured search
spaces.
In order to do so, it relies on two concepts. The first concept is goal softening. Instead
of searching for a singleton best solution, a set of good enough solutions is defined.
The size of this set is depended on the type of problem, characterized by the Ordered
Performance Curve. The OPC is the ordered set of Θ according to the performance
function J(θ).
From the total solution space Θ a selected subset S ⊂ Θ is selected using a selection rule.
The most efficient selection rule in general cases has been found to be Horse Race. The
solutions θ in the selected subset S are ordered according to their performance function,
similar to how the OPC is defined.
Since order converges faster than value, only a limited amount of observations are
needed. Another advantage is that ordering is less sensitive to noise. Ordering the
solutions instead of trying to approximate their exact performance value is the second
concept of OO. See [33] and the references therein for examples how OO compares to
other optimization methods in terms of rate of convergence.
In Appendix A, Example 3-1 an example of using the OO method with a BP selection
rule is presented.
3-6
Chapter notes
It is noteworthy to mention that, for example in [35], it is stated that OO is not there to
replace existing optimization methods. OO is most effective in aiding more traditional
Django D. van Amstel
Master of Science Thesis
3-6 Chapter notes
33
techniques in solving problems which show the before described characteristics. In
the literature OO is often associated with the optimization in DES, since the discrete
dynamics and the event driven dynamics typically introduce large unstructered search
spaces [35], [37], [32].
Master of Science Thesis
Django D. van Amstel
34
Django D. van Amstel
Ordinal optimization
Master of Science Thesis
Part II
Theoretical foundation
Master of Science Thesis
Django D. van Amstel
Chapter 4
Structural analysis of max plus linear
models
This chapter is dedicated to the analysis of the structural relation between the graphical
Petri Net and algebraic state space representations of Max-Plus Linear (MPL) Discrete
Event Systems (DES). Analysing this relation and the underlying structure is useful for
the development of a method to react on disturbances. When the A matrix is formed,
this useful information is lost. The analysis in this chapter is limited to first order MPL
systems.
In order to preserve the structural information, an alternative state calculation method
will be proposed. It will be shown that the proposed method results in the same
state calculation as with a MPL model while preserving the structural information.
Moreover, it will be shown that the method applies to higher order MPL systems as
well.
In the following paragraph the issue of lost structural information will be sketched.
Subsequently, it will be analysed how the A matrix is formed using the Kleene star
operator and what the relation is to the structure of the Petri Net. In the second
section the new proposed method will be introduced and formally defined. It will be
proven that this new method arrives at the same results as using the state space model.
A comparison is made between the two methods. In the final section the proposed
method is generalized to higher order MPL systems.
The analysis between the Petri
Net and the resulting A matrix in the state space representation will be performed on
the Petri Net as presented in Figure 4-1. The holding time in seconds of each place is
written inside the circles representing the place. The labeling of the places and events
are given in the lower left corner. The black dots represent the tokens and are in the
initial marking configuration.
Capturing the dynamics of the Petri Net algebraically
Master of Science Thesis
Django D. van Amstel
38
Structural analysis of max plus linear models
Figure 4-1: A generic Petri Net, used to analyse the relation between the structures of the
graphical an algebraic representation of a MPL DES
Django D. van Amstel
Master of Science Thesis
39
Figure 4-2: The movement of the tokens in time for one event iteration
In Figure 4-2 the evolution of the tokens through the network is depicted. After 2
seconds, both places p1 and p3 become enabled and hence event q1 can fire, distributing
tokens to places p1 and p2 . Event q3 will fire after 3 seconds, since its only upstream
place is p4 with holding time 3. After q3 has fired, a token is placed in places p3 and
p5 .
Note that after 3 seconds, p2 is enabled as well, since q1 fired after 2 seconds and the
holding time of p2 is only 1 second. Since the holding time of p5 is 0 seconds, the token
in p5 is directly enabled and event q2 will fire at T = 3s as well. Because event q2 has
fired, a token is placed again in place p4 .
Although place p1 will be enabled by T = 4s, event q1 will not fire until place p3 is
enabled as well at T = 5s. At this time, the second iteration will start. The first
iteration firing times can be summerized by




2s
q1 (1)




 q2 (1)  =  3s  .
3s
q3 (1)
Now the attention is turned to the algebraic representation. Using the method described
in chapter 2 and noticing that M = 1, the AM matrices are obtained from the Petri
Net as:


ε ε ε


(4-1)
A0 =  1 ε 0  ;
ε ε ε


2 ε 2

A1 = 
 ε 0 ε .
ε 3 0
(4-2)
The zeros on the main diagonal of A1 represent the constraint that an event cannot
occur for the k-th time, if it hasn’t happened for the (k − 1)-th time.
Master of Science Thesis
Django D. van Amstel
40
Structural analysis of max plus linear models
Using equation 2-8 the system matrix is obtained as


2 ε 2

A= 3 3 3 
,
ε 3 0
(4-3)
such that the k-th firing of the events q are given by the state space model






x1 (k − 1)
2 ε 2
x1 (k)

 



 x2 (k)  =  3 3 3  ⊗  x2 (k − 1)  .
x3 (k − 1)
ε 3 0
x3 (k)
(4-4)
Indeed, using the state space model with initial state x(0) = x0 = [ e 0 0 ]⊤ results
in


2

x(1) = A ⊗ x(0) = 
 3 
3


5

x(2) = A ⊗ x(1) =  6 

6
,
which corresponds to the results found by following the movement of the tokens through
the Petri Net.
However, some of the dynamical properties that were observed by following the movement of the tokens through the network have been lost. It is impossible to see from
the A matrix how it is possible that q2 can fire together with q3 or how the exact firing
time of q1 is determined by the other two events. On the other hand, following the
tokens in the Petri Net graphically is a slow procedure, especially if the number of
states increases.
In the following analysis the goal is to expose the movements of the tokens in an
algebraic way. The answer lies in how the Kleene star operator is used in the synthesis
of A from A0 and A1 .
4-1
Petri Nets and MPL state space equations
n×n
be the system matrix for a MPL system with n states and state vector
Let A ∈ Rmax
n
x(k) ∈ Rmax . Recall that the system matrix A is calculated as
A = A∗0 ⊗ A1 ,
in where A0 and A1 are obtained via equation 2-7. Inserting this definition of the A
matrix in the state space equation results in
x(k) = A∗0 ⊗ A1 ⊗ x(k − 1).
Django D. van Amstel
(4-5)
Master of Science Thesis
4-2 The single event cycle iteration method
41
Assuming A0 is nilpotent for power N ≤ n and using the definition of the Kleene star
operator gives
!
N
M
(4-6)
x(k) =
Az0 ⊗ A1 ⊗ x(k − 1).
z=0
Using the distributivity property of the max-plus algebra yields
x(k) =
N
M
(Az0 ⊗ A1 ⊗ x(k − 1)) .
(4-7)
z=0
The right hand side of equation 4-7 is a summation of vectors in max-plus sense. For
the Petri Net of Figure 4-1 and the corresponding MPL system N = 1. This number is
found by calculating succesive powers of A0 until the result is the max-plus zero matrix
E(n).
Writing out the result of 4-7 for k = 1 yields
L
⊗z
1
x(1) =
z=0 A0 ⊗ A1 ⊗ x(0)
⊗0
= A
⊗ A1 ⊗ x(0)
⊕ A⊗1
0 ⊗ A1 ⊗ x(0)
0 

2
ε
 

0
3
= 
⊕
 

.
3
ε
| {z }
z=0
(4-8)
| {z }
z=1
Inspection of the Petri Net tells us how to interpret these vectors. The vector corresponding to z = 0 tells us which events can fire and when if the tokens take one step
in the Net. The second vector, corresponding to z = 1, gives the same information for
when the tokens have taken two steps.
In general, the vector corresponding to z gives information on how the tokens can
have moved through the Petri Net after (z + 1) steps from the initial marking. By
having a series of vectors corresponding to z = 0, 1, 2, · · · , the movement information
of the tokens has been represented in a algebraic manner. With these vectors, no
synchronizations constraints are taken into account; it just provides information on
how the tokens can move if all events were able to fire without the constraint that all
upstream places have to be enabled. The synchronization constraints are taken into
account when the max-plus sum is taken over the vectors to form the final state x(k).
In the above analysis it has been shown by expanding the MPL state space equation
and rearranging the terms in a particular fashion, a link can be found between how
tokens move through the Petri Net and how the state is constructed. In the next section
this procedure is formalized.
4-2
The single event cycle iteration method
n×n
For a Petri Net with n events that can be described by two matrices A0 ∈ Rmax
and
n×n
A1 ∈ Rmax , a MPL state space equation with n states of the form
x(k) = A ⊗ x(k − 1)
Master of Science Thesis
Django D. van Amstel
42
Structural analysis of max plus linear models
n×n
can be obtained, in where A ∈ Rmax
. As was noted before, this A matrix can be seen
as an aggregated matrix from A0 and A1 . Hence to avoid confusion, the recursive state
space equation will now be referred to as the aggregated event cycle.
From the analysis in the previous section, it was found that by rewriting the state space
equation in a particular fashion, more information can be preserved. This procedure is
formalized in this section and is named the single event cycle method.
Definition 4.1. Let l ∈ Z∗ be the single event cycle counter. Then define for the
aggregated event cycle k the single cycle state x(k, l) ∈ Rnmax . Moreover, define the
initial conditions for the single cycle x(k, 0) as x(k, 0) = x(k − 1).
Each single cycle state iteration x(k, l) corresponds to timings of the events after l steps
of the tokens through the Petri Net in a given aggregated event iteration k.
How the single cycle states x(k, l) are calculated is given in the following theorem.
n×n
n×n
Proposition 4.2. Let A0 ∈ Rmax
be a nilpotent matrix which together with A1 ∈ Rmax
describes the Petri Net G with initial marking M0 . Let x(k) = A ⊗ x(k − 1) be the
aggregated event cycle equation, with A = A∗0 ⊗ A1 .
Then, the single cycle states x(k, l) are calculated by the single event cycle equation
x(k, l + 1) =
(
A1 ⊗ x(k, 0)
for l = 0
,
A0 ⊗ x(k, l − 1) for l = 1, 2, · · · , lmax − 1
(4-9)
with initial condition x(k, 0) = x(k − 1).
The aggregated event cycle state x(k) is then given by
x(k) =
lM
max
x(k, l)
(4-10)
l=1
Proof. The proof is done by rewriting the above equations 4-9 and 4-10 such that the
original state space equation is obtained.
Iteration of equation 4-9 and using Definition 4.1 yields
x(k, 1)
x(k, 2)
x(k, 3)
x(k, 4)
= A⊗0
0 ⊗ A1 ⊗ x(k, 0)
=
A0 ⊗ x(k, 1)
=
A0 ⊗ x(k, 2)
=
A0 ⊗ x(k, 3)
..
.
=
=
=
=
A1 ⊗ x(k − 1)
A0 ⊗ A1 ⊗ x(k − 1)
A⊗2
0 ⊗ A1 ⊗ x(k − 1)
⊗3
A0 ⊗ A1 ⊗ x(k − 1)
Continuing this iteration process results in the general equation
x(k, l) = A0 ⊗ x(k, l) = A⊗l−1
⊗ A1 ⊗ x(k − 1).
0
Django D. van Amstel
(4-11)
Master of Science Thesis
4-2 The single event cycle iteration method
43
Substituting this result in 4-10 yields
⊗lmax −1
x(k) = A1 ⊗x(k−1)⊕A0 ⊗A1 ⊗x(k−1)⊕A⊗2
⊗A1 ⊗x(k−1)
0 ⊗A1 ⊗x(k−1)⊕· · ·⊕A0
Noting that A⊗0
0 = E and rearranging gives
⊗2
⊗lmax −1
x(k) = A⊗0
⊗ A1 ⊗ x(k − 1)
0 ⊕ A0 A0 ⊕ · · · ⊕ A0
for lmax → ∞ this yields
⊗2
⊗∞
⊗ A1 ⊗ x(k − 1)
x(k) = A⊗0
0 ⊕ A0 ⊕ A0 ⊕ · · · ⊕ A0
L∞
∞
x(k) =
l=0 A0 ⊗ A1 ⊗ x(k − 1)
x(k) = (A∗0 ⊗ A1 ) ⊗ x(k − 1).
And the original aggregated event cycle equation has been obtained.
Note that the single event iteration method is very similar to a state space model
expression; the next state with single event counter l is a function of the previous
state at (l − 1) and a system matrix AM . With this insight, one can view the single
event iteration method as a state space model modelling a single state, "within" the
aggregated state space model.
4-2-1
Bounding the iteration method
In the proof of the single event cycle equations it was used that lmax → ∞. However,
just as for the Kleene star operator the result is a finite sum for certain A0 matrices,
lmax is bounded to a finite number from the positive integer set in a similar way. In
proving lmax is bounded, use will be made of graph theory.
Lemma 4.3. Let a Petri Net G with |Q| = n the number of event nodes and initial
n×n
n×n
marking M0 with M = 1 be represented by the matrices A0 ∈ Rmax
and A1 ∈ Rmax
such that a MPL model is obtained as x(k) = A ⊗ x(k − 1).
Then lmax in the equation
x(k) =
lM
max
x(k, l)
l=1
is bounded to a finite positive integer lmax ∈ N+ , if there are no circuits in the communication graph of A0 .
Proof. Recall equation 4-11:
x(k, l) = A0 ⊗ x(k, l) = A⊗l−1
⊗ A1 ⊗ x(k − 1).
0
Master of Science Thesis
Django D. van Amstel
44
Structural analysis of max plus linear models
Now recall as well that the analysis in the beginning of this chapter showed that x(k, l)
corresponds to the tokens in the initial marking M′ have moved past l events without
crossing the places from the initial marking.
Combining the two facts above, it follows that the result of A⊗l−1
corresponds to the
0
tokens have moved past l events in the Petri Net from their initial marking M0 .
This power series can only continue to grow to infinity in two cases.
1. The number of connections to unvisited states is infinity. Since the Petri Net only
has n event nodes, this is not possible.
2. A circuit in the communication graph of A0 exists, such that the tokens from the
initial marking M0 can keep moving without crossing a place associated with this
initial marking. Since the theorem demands that case (b) is false, for l > lmax the
n×n
matrix A⊗l−1
will be the max-plus zero matrix E ∈ Rmax
.
0
It then follows from equation 4-11 that the single events cycle state x(k, l) must be
equal to a max-plus zero vector for l > lmax .
From the proof above, the actual bound is easily found.
Lemma 4.4. The bound of lmax is equal to the maximal path length in the communication graph of A0 .
Proof. In the proof of the previous theorem, it was noted that the matrix A⊗l−1
cor0
responds to that the tokens in the initial marking have taken l steps past event nodes
in the Petri Net. When the power series of A0⊗l−1 becomes the max-plus zero matrix,
it corresponded to that the tokens can not move any further without passing a place
associated with the initial marking M0 .
The matrix A0 corresponds to all places not associated with the initial marking, since
it requires that the number of tokens m in the places are 0. This requirement follows
directly from the definition of the A0 matrix in equation 2-7. Hence, the length of the
longest path in the communication graph of A0 is equal to lmax . If the path would be
longer than lmax , a succesive power of A0⊗l−1 would exist, contradicting the fact that
this power series was bound by lmax .
If the longest path in the communcation graph of A0 was smaller than lmax , an extra
event and place should exist, contradicting the assumption that the Petri Net has n
states.
4-2-2
Summary of single event cycle state calculation method
Now the single event cycle state calculation method can be summarized as:
1. Let x(k, 0) = x(k − 1).
Django D. van Amstel
Master of Science Thesis
4-3 The single event cycle iteration method for higher order max-plus linear systems
45
2. Iterate equation 4-9 for l = 1, 2, · · · , lmax . lmax is equal to the maximal path length
in G(A0 ). These vectors form the single events cycle states,
x(k, 1), x(k, 2), · · · , x(k, lmax )
3. Calculate the aggregated events cycle state as
x(k) =
lM
max
x(k, l).
l=1
In Appendix A, example 4-1, a detailed example and comparison of a state calculation
with the single events cycle procedure is given.
4-2-3
Comparison of methods
Calculating the aggregated events state by the standard state space equation
x(k) = A ⊗ x(k − 1)
results in a very quick and easy calculation of the next state. Only one matrix multiplication is involved. It has already been mentioned that during the abstraction to the
system matrix A some structural information on dynamics of the Petri Net have been
lost; there is no method to reconstruct a unique set of AM matrices from the A matrix.
The single events cycle method provides a solution here. It exposes some of the inner
mechanics on how the next state x(k) is constructed from the previous one x(k − 1).
However, this comes at the cost of increased computational burden, as it involves
multiple matrix multiplications and summations for each new calculation of the state
x(k). In the original state space model, these calculations are only done once in forming
the A matrix.
4-3
The single event cycle iteration method for higher order maxplus linear systems
In the analysis so far it was assumed that the system was a first order MPL system.
However, for many real life systems this assumption doesn’t hold. Hence, the method
has been extended to M order systems.
For an Petri Net G and initial marking M0 such that M 6= 1 the following equation
holds [?]
x(k) = A∗0 ⊗ A1 ⊗ x(k − 1) ⊕ · · · ⊕ A∗0 ⊗ AM ⊗ x(k − M ).
(4-12)
This can be rewritten into
x(k) =
lM
max
z=0
A⊗z
0
Master of Science Thesis
⊗ A1 ⊗ x(k − 1) ⊕ · · · ⊕
lM
max
z=0
A⊗z
0 ⊗ AM ⊗ x(k − M ) .
(4-13)
Django D. van Amstel
46
Structural analysis of max plus linear models
Switching the summations fially yields
x(k) =
lM
max
z=0
M M
A⊗z
0
⊗ Am ⊗ x(k − m)
m=1
!
.
(4-14)
The single event cycle state is now defined as
x(k, l) =
4-4
( L
M
⊗l−1
⊗ Am ⊗ x(k − m)
m=1 A0
x(k, l) = x(k − 1)
if l ≥ 1,
if l = 0
Conclusions
The main contribution of this chapter has been the introduction of a new iterative
method to calculate the events state x(k) of a MPL system.
This chapter started with an analysis of how the Petri Net representation is related to
the algebraic representation. Specifically, the role of the A0 matrix was investigated. It
turns out that each power of the matrix A0 corresponds to a single step of the tokens
in the Petri Net.
This analysis led to the introduction of the single event cycle method, in where a new
iteration procedure was defined. This procedure calculates the state and preserves information that is lost when using the aggregated state space equation. The preservation
of information comes at the cost of increased computational effort.
In the last section of this chapter the method was extended from first to M order MPL
systems. Because higher order systems are not within the scope of this thesis, the
extended method has only been presented briefly.
4-5
Chapter Notes
The fact that max-plus algebra and MPL models operate within the event-domain and
not the time domain has been a point of attention since the discovery of max-plus
algebra. The main reason that this is a point of discussion is that a state in event
domain spans a whole interval in time domain. The other way around is also true; a
given point on the time-axis can correspond to different points on the event-axis.
For example, let the first two states of a MPL be
x(1) =
"
0.3
10
#
sx(2) =
"
5
14.8
#
s.
Now, transforming
x(1)
h
i to the time domain would result in something defined on the
interval t = 0.3 10 . The other way around is also true. If we look at the state at
t = 6s, both x1 (1) and x2 (1) have occured but x1 (2) has not. Hence, at t = 6s we find
ourselves at both k = 1 and k = 2.
Django D. van Amstel
Master of Science Thesis
4-5 Chapter Notes
47
This difference makes it dfficult to implement feedback control in the event domain
that uses time domain measurements. As will be shown in chapter 7, the proposed
single event cycle method solves this problem to a certain degree.
Master of Science Thesis
Django D. van Amstel
48
Django D. van Amstel
Structural analysis of max plus linear models
Master of Science Thesis
Chapter 5
The max-plus linear disturbance model
The incorporation of disturbances into Max-Plus Linear (MPL) models will be discussed
in this chapter. Four different ways of how disturbances can be modeled in the MPL
state space framework will be presented. From a comparative analysis it will appear
only one method models disturbances in a useful manner for control.
From this analysis a new MPL model is derived that incorporates disturbances. This
model is named the MPL disturbance model. The conditions on existence of the disturbance model will be derived and compared to those of standard MPL models.
The presentation and analysis of the four ways of modeling disturbances will be discussed in section 5-1. The derivation of the disturbance model will be presented in
section 5-2. Moreover, a method will be introduced to extend any MPL model to this
disturbance model in a systematic way. In the final third section, the conditions on
existence will be derived and compared to those of the traditional model.
5-1
Modeling disturbances in a max-plus linear model framework
n×n
Let x(k) ∈ Rnmax be the state and A ∈ Rmax
the system matrix such that the MPL
state space model is given by
x(k) = A ⊗ x(k − 1),
with initial condition x(0) = x0 .
First, the concepts of the scheduled state, the realized state and the disturbances are
introduced by the following definitions.
Definition 5.1. Let x(k) ∈ Rnmax denote the scheduled system state of size n. The
realized or measured state is then defined as
Master of Science Thesis
xb(k) ∈ Rnmax .
Django D. van Amstel
50
The max-plus linear disturbance model
Definition 5.2. Let x(k) ∈ Rnmax be the scheduled state of a MPL system and xb(k) ∈
Rnmax the realized state of the system. Then, the measured disturbance vector is defined
as d(k) ∈ Rnmax .
x(k),xb(k) and d(k) are related by a function D as
xb(k) = D (d(k), x(k)) .
What the function D should be is the next topic of discussion.
5-1-1
The four ways to define the disturbance function
Inspired by the state space model in conventional algebra, we can model the disturbances as additive to the state:
xb(k) = x(k) ⊕ d(k),
(5-1)
xb(k) = D(k) ⊗ x(k),
(5-2)
Another option is to model multiplicative disturbances:
in where D(k) is a diagonal matrix having the vector d(k) on the main diagonal, denoted
in the subsequent by diag(d(k)).
From a theoretical point of view, modeling the disturbances within the system matrix
represents reality best. Indeed, the holding times in the A matrix represent the processing times in a system in where abnormalities occur. These deviations cause the
disturbances of event happenings.
Modeling additive disturbances on the holding times yields a state space equation of
the form
x(k) = (A ⊕ Adis (d(k))) ⊗ x(k − 1),
(5-3)
in where Adis (d(k)) is a matrix of equal size as A such that each holding time τij is
associated with the correct disturbance di (k).
To model disturbances as acting multiplicative on the holding time one has to resort
to the heaps of pieces [11] modeling tool. This is a rather extensive analysis, which will
not be presented here.
Django D. van Amstel
Master of Science Thesis
5-1 Modeling disturbances in a max-plus linear model framework
5-1-2
51
Definition of the disturbance function
The selection from the above four methods is based on two criteria. The first one is of
mathematical usefulness. When one adds the delay to the state in max-plus sense as
xb(k) = x(k) ⊕ d(k)
two cases are possible. In the case that d(k) < x(k)
xb(k) = x(k).
Hence, no early event happening can be taken into account. The other case is when
d(k) > x(k), which yields
xb(k) = d(k).
This result is rather useless, as the actual difference between the actual state xb and the
scheduled state x(k) is not obtained.
The second criterium is one of practical nature. Although modeling the disturbances
as acting on the holding times is more correct from a theoretical point of view, it is in
general impossible to measure these quantities in practice. For example, have a train
departing from station A at time tA and arriving at station B at time tB , both events
happening for the k-th time. The departure and arrival from state entries x1 (k) and
x2 (k) respectively. In event domain, one can only measure these quantities; everything
in between is in time domain and doesn’t exist in event domain. In event domain both
events occur at exactly the same point on the time axis.
From the two criteria it can be concluded disturbances should be modeled as multiplicative on the state. Hence a definition for D is found.
Definition 5.3. Let x(k) ∈ Rnmax be the scheduled state, xb(k) ∈ Rnmax the actual state
and d(k) ∈ Rnmax the disturbance on the state which are related by the function D.
This function D is defined as
D (x, y) = diag(x) ⊗ y.
(5-4)
Writing out equation 5-4 for x(k) ∈ Rnmax and d(k) ∈ Rnmax gives






xb1 (k)
xb2 (k)
..
.
xbn (k)


d1 (k)



..
=
.



ε


ε ···
ε


.. 
... ...
⊗
.  

· · · ε dn (k)
x1 (k)
x2 (k)
..
.
xn (k)



.


The above for state i yields
Master of Science Thesis
xbi (k) = di (k) ⊗ xi (k) = di (k) + xi (k).
(5-5)
Django D. van Amstel
52
The max-plus linear disturbance model
Figure 5-1: The Petri Net representation of equation 5-5
With + denoting addition in the conventional sense. Both negative and positive disturbances d(k) can now be incorporated. Moreover, it gives a natural interpretation of
the disturbance being the difference between the actual and scheduled state.
Now that D is obtained, the proposed disturbance MPL model can be derived.
5-2
Deriving the disturbance MPL model
Equation 5-5 can be seen as a very simple two state MPL system. Let the state be
defined by
"
#
xi
xe(k) = b ,
xi
and let the place be named di , with holding time di (k) ∈ R. Note that actually a
Switching Max-Plus Linear (SMPL) system is obtained, as di (k) can be different for
every k.
The corresponding Petri Net of equation 5-5 is given in Figure 5-1. Formally this Petri
Net is defined by the quintuple Gd = {Qd , Pd , Dd , Td , M0,d } with
Qd
Pd
Dd
Td
M0,d
=
=
=
=
=
{xi , xbi }
{di } = D
{(xi , di ), (di , xbi )}
{di }
∅
(5-6)
This Petri Net is the key in obtaining the disturbance MPL model from any MPL.
One only has to substitute the above two-state Petri Net for every state in the original
system.
Take a generic Petri Net G with |Q| = n. Following [11], define
n = {1, 2, · · · , n}
Let G be a first order Petri Net represented by the quintuple
G = {Q, P, D, T , M0 }
Django D. van Amstel
Master of Science Thesis
5-2 Deriving the disturbance MPL model
53
as defined in section 2-5. Its algebraic representation in a max-plus MPL model is given
by the state x(k) ∈ Rnmax , with n = |Q| and matrices A0 and A1 .
Then substituting the Petri Net Gd as defined in 5-7 for each state in a Petri Net G
e defined by:
results in the extended Petri Net G,
e = Q∪Q
b = {x } ∪ {x
bi } ,
Q
i
Pe = P ∩ D,
with D = {di } ∀i
∀i
b × P ∪ (P × Q)
e ⊂ (Q × D) ∪ D × Q
b ∪ Q
D
Te = T ∪ {di (k)} ,
(5-7)
∀i
g = M
M
0
0
Note that all sets of which a union is taken in the above, are disjoint.
e Because of the specific structuring of
The interesting part is the new set of arcs D.
e and P,
e the arc set D
e is clearly structured as well. This
the event and place sets Q
structuring on its turn results in very structured matrices when transforming the Petri
Net representation into an SMPL model.
From the structure it can be derived that the downstream places of all original or
scheduled events only belong to D. The places in D can be the only upstream places
for Q, being the set of all actual events. To be precise, place di ∈ D will have upstream
event xi ∈ Q and the downstream event will be xbi ∈ Q
A similar conclusion can be drawn for the set of places P in the original Petri Net.
b and downstream events
These places can only have uptream events from the subset Q
from the original event set Q.
For the original set of places the exact events are known as well. If place pi ∈ P had
event xj ∈ Q upstream and event xk ∈ Q downstream, in the disturbance model it will
b upstream and still x ∈ Q downstream.
have event xbj ∈ Q
k
This very structured appearance is also seen in the MPL state space model representation, introduced in the next session.
5-2-1
Going from a standard MPL to the MPL disturbace model
The procedure of obtaining a disturbance MPL model from a MPL model is described
in the following theorem.
Proposition 5.4. Let G = {Q, P, D, T , M0 } be the Petri Net describing a MPL system. The MPL state space equation is given by
x(k) = A0 ⊗ x(k) ⊕ A1 ⊗ x(k − 1) ⊕ · · · ⊕ AM ⊗ x(k − M ).
Master of Science Thesis
Django D. van Amstel
54
The max-plus linear disturbance model
Then, the disturbance MPL model is defined by
with
xe(k) = A0,dis (k) ⊗ xe(k) ⊕ A1,dis ⊗ xe(k − 1) ⊕ · · · ⊕ AM,dis ⊗ xe(k − M )
A0,dis (k) =
"
E
A0
D(k) E
AM,dis =
"
E AM
E E
and
#
#
,
(5-8)
,
(5-9)
for M ≥ 1.
Proof. Since the Petri Net obtained by introducing the disturbances is still a Petri Net,
it can be described algebraically by the same equation. Only the size of the state and
hence the structure of the resulting matrices AM have changed. The given structure of
e in equation 5-7.
the AM,dis matrices follow directly from the definition of D
The procedure as described in 5.4 is illustrated by means of an example in Appendix
A, example 5-1.
5-3
Conditions of existence for the MPL disturbance model
In order to find the algebraic equivalence of a Petri Net as a MPL state space model,
the condition is that A∗0 is finite. If not, the system matrix A cannot be formed. A
similar condition holds for the disturbance model.
These conditions are described in the following theorems. They are given for first order
MPL models.
Lemma 5.5. Let the original system be described by a MPL state space model with
system matrix A = A∗0 ⊗ A1 .
Given that the original system matrix A exists, a necessary condition for Adis to exist
is that the power series of matrices (A0 ⊗ D) and (D ⊗ A0 )converge.
Proof. For Adis to exist, it is necessary that A∗0,dis has a (unique stationary) solution.
Using the result in (5-8), by direct computation we find the powers of A0,dis as
A⊗n
0,dis =
"



















n
(A0 ⊗ D)⊗ 2
E
(D ⊗ A0 )
#
,
⊗( n−1
2 )
E
⊗( n−1
2 )
E
n
(D ⊗ A0 )⊗ 2
(A0 ⊗ D)
⊗D
E
if n is even

(5-10)
⊗ A0 
, if n is odd
If (A0 ⊗ D)⊗n and (D ⊗ A0 )⊗n converge to a stationary solution for n → ∞, then A⊗n
0,dis
has a stationary solution for n → ∞.
Django D. van Amstel
Master of Science Thesis
5-4 Application of the disturbance model
55
In general nothing much can be said about the asymptotic behaviour of the power series
of the matrices mentioned above. However, if (A0 ⊗D) and (D⊗A0 ) are nilpotent, their
respective power series converge to E and a stationary solution is hence guaranteed.
This is shown in the following lemma.
Lemma 5.6. Given that the original system matrix A = A∗0 ⊗ A1 exists, a sufficient
condition for Adis to exist is that A0 is nilpotent.
Proof. Since D is a diagonal matrix, in general it is never nilpotent. However if A0 is
nilpotent, the max-plus product of D and A0 will be nilpotent since
E ⊗ D = D ⊗ E = E.
This result in combination with Lemma 5.5 above concludes the proof.
Although the conditions for existence of the disturbance model might seem mathematically very strict, in practice most max-plus linear models have a nilpotent A0 matrix.
By the last theorem, for all those systems a disturbance model can be derived.
5-4
Application of the disturbance model
In Example 5-2 of Appendix A, the disturbance model is compared to the traditional
MPL model. From this example the main advantage of the disturbance model will become clear. The model has the ability to directly expose how a certain disturbance will
propagate through the system by updating the scheduled state x(k). By incorporating
the measured disturbances, the whole scheduled state is updated by the disturbance
model according to the dynamics defined by the A matrix.
Not only from a practical point of view is it very useful to know how disturbances
propagate through the system. This information is also used in one of the feedback
methods to be discussed in chapter 7.
It is not impossible to obtain this information from the original model. By iterating
the recurrence relation
x(k) = A0 ⊗ x(k) ⊕ A1 ⊗ x(k − 1)
until x(k) on the left of the equality sign is equal to the one on the right, the same
updated state can be obtained. However, this procedure is rather ad-hoc and not very
systematic.
Since the disturbance model explicitly models disturbances and shows how disturbances
influence the system, it is very well suited to be used in a robust control type of
framework. However, this topic is outside of the scope of this thesis.
Master of Science Thesis
Django D. van Amstel
56
5-5
The max-plus linear disturbance model
Conclusions
In standard MPL systems there is no systematic way to model deviations from or
disturbances on the calculated state. Therefore these models are only suitable for
feedforward control.
First the distinction was made between the calculated or scheduled state x(k) and the
actual state xb(k). Their relation was defined through the disturbance vector d(k) and
the function xb(k) = D (d(k), x(k)).
A very simple expression was found for D as
D(x, y) = diag(x) ⊗ y.
Subsequently this modeling method was represented as a Petri Net. This Petri Net
was then incorporated in a generic Petri Net, resulting in a method to go from a
standard MPL system to a disturbance MPL model that can handle deviations from
the scheduled state and the actual state. Incorporating the disturbances in the model
does come at the cost of the state vector to double in size.
Finally, the conditions on existence of the disturbance MPL model were derived. It has
been found that the same conditions that exist for standard MPL systems are sufficient
for the disturbance MPL model as well.
5-6
Chapter notes
It is interesting to see that in the disturbance model, the disturbances are actually
incorporated in the system matrix. Recall that it was argued in section 5-1 that modeling the disturbances to act on the entries in the system matrix was the most natural
way. However, modeling the disturbances as disturbances on the state vector is more
logical in a practical way. In the final disturbance model, these approaches are fluently
unified.
When modeling measured disturbances in the event domain, the very well known issue
of the time domain versus event domain that was also mentioned in the Chapter Notes
of chapter 4 arise again. The disturbances are measured in time domain and hence
need to be transformed into the event domain. The main issue is that if an event is
measured to not have happened, it still remains unknown when it will happen. The
other way around is also true; if an event is early, when should one start to keep track
of the event happening? More on this topic is found in the last section of appendix
D, in where the implementation of the algorithms on the Zebro robot is explained in
detail.
Django D. van Amstel
Master of Science Thesis
Chapter 6
The adaptive Blind pick selection rule
In chapter 3 two selection rules for Ordinal Optimization (OO) have been discussed.
They were the Blind Pick (BP) and Horse Race (HR) selection rule. Whilst the first
doesn’t require any knowledge about the problem at hand and is hence useful when little
knowledge is available, the second method is the most efficient for most OO problems.
In this chapter a new selection rule is proposed that combines the two selection rules,
aiming at utilizing the good characteristics of both methods. Combining both methods is done by incorporating concepts from Reinforcement Learning (RL) into the BP
selection rule.
In the first section the BP and HR selection rules are compared from a learning point
of view. The idea of why combining these two methods is useful will be sketched. In
section 6-2 the new selection rule will be presented. The Inverse transform sampling
technique will be discussed and the Selection Probability Matrix (SPM) will be introduced. In the third section the alignment probability as a function of the new selection
rule is analysed by simulation.
6-1
The Blind pick versus Horse race selection rule and learning
Recall that in OO, a set S is formed, being a selection of s possible solutions θ from
the total search space Θ. Then the corresponding values of the cost or performance
function J(θ) are observed and ordered as
b ) ≤ J(θ
b ) ≤ · · · ≤ J(θ
b ).
J(θ
1
2
s
The hat is used to denote that the observations of the performance function are subjected to uncertainty and therefor only approximations of the true performance J.
Master of Science Thesis
Django D. van Amstel
58
The adaptive Blind pick selection rule
The selection of S is done according to a so-called selection rule. In the BP rule, S is
formed by random selection. However, the HR selects S on basis of the last observations
b
of J(θ)
for all solutions θ ∈ Θ.
b
The HR selection rule implicitely assumes that the already available observations J(θ)
approximate the actual function J(θ) fairly well. The BP selection rule is based on the
contrary belief. By selecting S at random, it assumes that absolutely no information
on the performance function is known a priori.
In RL a certain policy is used to select the action to take. A greedy policy will select
an action that has the highest probability of succes. On the other hand, a non-greedy
policy might select a sub-optimal solution to explore parts of the search space where
potentially better solutions may be found [38].
The dilemma of what kind of policy should be used is known as the exploration versus
exploitation dilemma in optimization. In the light of selection rules for OO, the BP
selection rule corresponds to exploration. In the selection of S randomly, the probability
of selecting a good solution is equal to the selection of a less good solution. The HR
selection rule is a greedy selection rule. It first assesses the performance function and
hence only solutions that proved to be good solutions in the past will be considered.
This analogy between the notion of greedy and non-greedy policies and how the BP
and HR selection rule work, was the inspiration for the development of the adaptive
Blind Pick selection rule, abbreviated to the Adaptive Blind Pick (ABP) selection rule.
6-2
The Adaptive Blind pick selection rule
The Adaptive Blind Pick (ABP) is essentially a BP selection rule. However, it can use
a priori knowledge in the stochastic selection of S. To what extend this knowledge is
used, is controlled by the annealing temperature T .
For high temperatures T the ABP selection rule acts as a normal BP rule. For low
values of T , the selection rule prefers known good solutions over others ones as in the
HR selection rule.
By combining the BP with concepts developed in Reinforcement Learning, the ABP
selection rule is obtained. In the following section the mechanics of ABP are explained
in detail.
6-2-1
Random selection
Let Θ be the discrete set of all solutions θ of size |Θ| = N . Θ is named the search
space. Moreover, let S ⊂ Θ denote the selected subset of size s. Then, the probability
of a solution θ ∈ Θ to be selected for S is defined as P (θ ∈ S|N, s).
Recalling that a random selection corresponds to a constant probability density function, it follows that
P (θ ∈ S|N, s) = C ∈ [0, 1] ,
Django D. van Amstel
∀θ ∈ Θ.
(6-1)
Master of Science Thesis
6-2 The Adaptive Blind pick selection rule
59
The exact value of C is a function of N and s, but is not of importance for the derivation
of the ABP.
6-2-2
Introducing the Selection Probability Matrix
Besides the already introduced solution set Θ, define the set of states as Ξ. Again, this
is a discrete set with size |Ξ| = M .
Then, the selection probability matrix is defined as follows.
Definition 6.1. Let Θ of size |Θ| = N represent the discrete set of solutions and Ξ
of size |Ξ| = M the discrete set of states. Moreover, let T ∈ R+ be the annealing
temperature. Define the approximated performance value for a pair (θ, ξ)as Jbθ,ξ =
F(x, xb, θ, ξ) ∈ R.
Then, the Selection Probability Matrix (SPM) P ∈ RN ×M is defined by its elements.
For a particular solution θn with n = 1, 2, · · · , N and given the state ξm with m =
1, 2, · · · , M and annealing temperature T the SPM entry is defined by the BoltzmannGibbs probability density function [38] as
e−Jbθn ,ξ /T
[P ]nm = P (θn |ξm , T, x, xb) = P
.
N
−Jbθi ,ξ /T
e
i=1
(6-2)
For a given state ξm the probability density function is then given by the column with
index m of the SPM:
pΘ (θ|ξm ) = [P ]·m .
(6-3)
The ABP selection rule is then obtained by replacing the probability density function in
equation 6-1 by the density function defined by equation6-2 in the Blind Pick selection
rule.
Moreover, let the cumulative distribution function of the ABP be given by
PΘ (θ) =
X
pΘ (θn )
(6-4)
θn ≤θ
such that
P (θn ∈ S|N, s, ξm , T, x, xb) = [P ]nm .
An example of how the annealing temperature in 6-2 reshapes the cumulative distribution function PΘ (θ) is given in Example 6-1 in Appendix A.
6-2-3
Inverse Transform Sampling
In using the ABP selection rule, a custom defined probability density function is defined. Using a custom probability density function in practice in a numerical computing environment such as MATLAB is not straightforward. The technique of Inverse
Transform Sampling [39] provides a solution in such a case. The method transforms a
uniformly distributed set of numbers on [0, 1] to a set of numbers drawn on any desired
distribution. The procedure is described by the following proposition.
Master of Science Thesis
Django D. van Amstel
60
The adaptive Blind pick selection rule
Proposition 6.2. (Proposition 1.1 (The Inverse Transform Method), [39])
Let F (x), x ∈ R, denote any cumulative distribution function (CDF). Let F −1 (y), with
y ∈ U(0, 1) denote the inverse function defined as
F −1 (y) = min {x : F (x) > y} ,
y ∈ U(0, 1).
Define X = F −1 (U ), where U has the continuous uniform distribution over the interval
(0,1). Then X is distributed as F , that is,
P (X ≤ x) = F (x),
x ∈ R.
The proof of the above proposition can be found in [39], p. 1.
The practical implementation of the above theory for a discrete CDF is given by the
following steps.
1. Let the discrete cumulative distribution function be given by the vector FX (xn ),
where n = 1, 2, · · · , N , with N the length of the vector. Draw a random number
U ∈ U(0, 1).
2. Calculate ∆, defined as
∆(n) = FX (xn ) − U,
n = 1, 2, · · · N.
3. Find the index i of the zero crossing in the vector ∆. The zero crossing is defined
as the entry with index i, for which the next entry i + 1 has value 0 or of the
opposite sign. Then, n = i + 1 for xn . The found value xn is now drawn from X
with CDF FX (x).
To show how the above described implementation of the Inverse Transform Sampling
method works, it is applied to a cumulative distribution function FX (x) in Example
6-2, Appendix A.
6-2-4
Summary of the Adaptive Blind Pick selection rule
The procedure of OO using the ABP can be summarized by
1. Define g,s,T and the desired alignment probability P (|G ∩ S| > k). The Selection
Probability Matrix as a function of the approximated performance values Jbθ,ξ is
given.
2. Calculate the SPM using equation 6-2 and from here the custom cumulative distribution function PΘ (θ).
3. Create the vector U ∈ U(0, 1)s .
Django D. van Amstel
Master of Science Thesis
6-3 The alignment probability for the adaptive Blind Pick selection rule
61
4. Using the inverse transform sampling method, transform U into UΘ . The vector
UΘ will contain s entries selected from the set {1, 2, · · · , N } according to the
cumulative distrubution function FΘ . These values correspond to the indices i of
θi ∈ Θ.
5. Define the selected subset S as
S = {θi } ,
6-3
∀i ∈ Uθ .
The alignment probability for the adaptive Blind Pick selection
rule
Recall that in OO, the alignment probability P (|G ∩ S| > k) was a measure for how
certain the observed best solutions θ are actually in the good enough subset G. In the
following lemma, the main advantage of the ABP to OO is given.
Lemma 6.3. Let Θ, |Θ| = N be the total solutions space and let θi ∈ Θ denote a
unique solution with index i. Let T ∈ R+ denote the annealing temperature and Jbθ,ξ
the approximated performance values. Let the SPM be defined as in definition 6.1 and
let the selection rule be ABP.
Then, the alignment probability P (|G ∩ S| > k) in the OO method will approach 1 as
T approaches 0 for arbitrary Jbθ,ξ , n, g and s.
Proof. The proof follows from the Boltzmann-Gibbs probability density function. This
function was defined in equation 6-2 as
e−Jbθ,ξ /T
P
N
This function is of the form
y
P ,
y
e−Jbθ,ξ /T
with y = e−z/x .
Note that the function e−z/x is monotonically decreasing for increasing z. Hence,
ez−δ/x > ez/x , ∀δ ∈ (0, ∞]
ez−δ/x
> 1, ∀δ ∈ (0, ∞].
ez/x
(6-5)
Now note that
d(z/x)
−z
= 2,
(6-6)
dx
2x
which is a monotonically increasing function that is always smaller than 0. Combining
6-5 and 6-6 yields
!
ez−δ/x
= ∞.
lim
x→0−
ez/x
Master of Science Thesis
Django D. van Amstel
62
The adaptive Blind pick selection rule
Figure 6-1: The alignment probability P (|G ∩ S| > k) as a function of the desired alignment
level k for selected values of the annealing temperature T .
In other words, Take two arbitrary points Jb1 and Jb2 on the curve y = e−Jbθ,ξ /T with
y(Jb1 ) > y(Jb2 ). Since the function e−z/x is monotically decreasing, it holds that Jb1 <2 .
From the above, it follows that the ratio y(Jb1 )/y(Jb2 ) tends to infinity for T approaching
zero from the right. By normalizing this ratio approaches 1, which concludes the
proof.
An example to illustrate how the alignment probability P (|G ∩ S| > k) is a function
of the annealing temperatures T is given in Figure 6-1. In the figure on the x-axis one
finds the desired level of alignment k, on the y-axis the resulting alignment probability.
Five curves for different selected annealing temperatures are plotted. These curves are
obtained by simulation.
While for T = 10 the alignment probability is equal to that of BP, the alignment
probability rises when the temperature goes down. For the lowest temperature T =
0.01, the alignment probability is 1 for almost all desired alignment levels (note the
small drop in the alignment level at k = 10 for T = 0.01). Hence for this temperature,
the optimization procedure will always arrive at a selected subset such that |G∩S| > k.
b which are the approximated performance values.
Note that the above is derived for J,
It depends on how well these approximated performance values approximate the true
performance values J, to what extend the alignment probability converges to 1 for the
true problem.
These curves are obtained by simulation. The used MATLAB code is given in Appendix
B. Moreover, in Appendix B the alignment probability for common combinations of
g,s,k and N are tabulated for reference.
Django D. van Amstel
Master of Science Thesis
6-4 Conclusions
6-4
63
Conclusions
In this chapter the BP selection rule was extended to the ABP. In the ABP, the
selected subset S is still done by a stochastic selection process as in BP.
However, the uniform probability density function in BP is replaced by the BoltzmannGibbs probability density function. This function reshapes the probability density
function according to the performances of the solutions. This reshaping is determined
by the annealing temperature T . For high temperatures the ABP acts just as the BP.
However as T → 0, the ABP is more similar to the HR selection rule.
In the final part it was shown that with the ABP, the alignment probability P (|G∩S| >
k) approaches 1 for T → 0. More important, it was proven that this probability can be
brought arbitrary close to 1 for any size of G, S and Θ by letting T approach 0. This
alignment level is defined for the approximated performance values and not the true
performance values.
Master of Science Thesis
Django D. van Amstel
64
Django D. van Amstel
The adaptive Blind pick selection rule
Master of Science Thesis
Part III
Feedback mechanisms
Master of Science Thesis
Django D. van Amstel
Chapter 7
Switching Max Plus Linear Feedback
Methods
In this chapter the theory put forward in part II of this thesis will be used to arrive at
the design of the supervisory controller. The extended control block diagram from the
introduction is depicted again in Figure 7-1. The aim of this chapter is to explain the
inner structure of the supervisory controller.
The supervisory controller essentially consists of two feedback loops. The first is the
reactive loop, which works on a lower abstraction level. The reactive loop takes the
measured disturbances and directly switches the A matrix to mitigate the disturbances
as soon as possible. The reactive feedback control loop builds upon the theory of the
individual event cycle method, described in chapter 4. The reactive feedback loop will
be described in detail in section 7-1.
The environment is taken into account in the deliberate loop. This method involves the
Ordinal Optimization (OO) method and incorporates learning by using the Adaptive
Blind Pick (ABP) selection rule, as presented in chapter 6. Via the deliberate loop, the
system learns which modes of operation perform better or worse in a given environment.
The deliberate feedback loop will be discussed in section 7-2.
In section 7-3 both methods will be combined to arrive at the final internal structure
of the supervisory controller.
7-1
The reactive feedback loop
Before the reactive feedback loop is explained, an example is given to clarify its motivation. In Figure 7-2 the leg lift-off and touch-down schedule of a six legged walking
system is schedule. On the x-axis is the time in seconds, on the y-axis the leg index.
The white parts represent aerial phase of the leg and dark gray support or ground
Master of Science Thesis
Django D. van Amstel
68
Switching Max Plus Linear Feedback Methods
Figure 7-1: The desired control scheme. Image edited from [5]
phase. The aerial phases are scheduled to be τf seconds. When a leg touch-down event
occured, the next leg will lift up τp seconds after. This so called double stance time
results in a more stable gait. However, it is not necessary to have. A minimal value is
min(τp ) = 0.
In the schedule at approximately 37s, leg 2 lifts off. The following touchdown event
is delayed, causing all future events to be rescheduled further into the future as well.
The hatched areas in Figure 7-2 represent the aerial phases of the original undelayed
schedule.
The reactive feedback loop will use the fact that the minimal allowed value of τp is
0 to compress the schedule as much as admissible, such that the original schedule is
obtained again. This "re-updated" schedule is given graphically in 7-3. Note that the
white areas in the re-updated schedule start to overlap more and more again with the
hatched original schedule.
How this re-updated schedule is obtained algebraically is explained in the next section.
7-1-1
Definitions
Definition 7.1. Let G = {Q, P, D, T , M0 }, be a Petri Net of an n state system with
n×n
associated max-plus matrices AM ∈ Rmax
and Max-Plus Linear (MPL) system matrix
n×n
A ∈ Rmax . Then, the minimal Petri Net Gmin is defined by
Gmin = {Q, P, D, Tmin , M0 }
with Tmin the set of minimal holdingtimes corresponding to the maximal performance
of the system. Let h denote the number of elements in T , then the constraint
[Tmin ]i ≤ [T ]i ,
∀i ∈ {1, 2, · · · , h}.
(7-1)
is imposed. This constraint guarantees normal operation when no disturbances are
present.
n×n
The MPL matrices obtained using equation 2-7 are denoted as AM,min ∈ Rmax
.
Django D. van Amstel
Master of Science Thesis
7-1 The reactive feedback loop
69
Figure 7-2: The delayed walking schedule of a six legged system. The blue areas represent the
aerial phase. τf is the scheduled flight time and τp the double stance time.
Figure 7-3: The updated schedule, compressing the schedule to mitigate the effects of the delay
Master of Science Thesis
Django D. van Amstel
70
Switching Max Plus Linear Feedback Methods
Note that by the definition, structurally Gmin is exactly the same as G. Only the holding
times are different.
n×n
Besides the minimal matrices, a negative disturbance matrix D− (k) ∈ Rmax
is defined
as the following.
Definition 7.2. Let di (k) denote the disturbance acting on state element xi (k) in the
n×n
k-th aggregated event cycle. Then, the negative disturbance matrix D− (k) ∈ Rmax
is
defined as




D (k) = 


−
−d1 (k)
ε
···
...
ε
ε
−d2 (k)
...
...
ε
ε
ε −dn (k)
ε
..
.
ε




.


With the above definitions, the Reactive Gait Scheduler (RGS) feedback law can be
defined.
n×n
n×n
Definition 7.3. Let AM ∈ Rmax
and AM,min ∈ Rmax
denote the matrices calculated
from the Petri Net G and Gmin respectively by equation 2-7. Moreover, let D− (k, l) ∈
n×n
Rmax
denote the negative disturbance matrix.
n×n
Then, the updated matrices AbM (k, l) ∈ Rmax
are defined in the individual event cycle
method by the RGS feedback law as
Note that it follows that
AbM (k, l) = AM ⊗ D− (k) ⊕ AM,min .
AbM (k, l) = AM
(7-2)
n×n
if D− (k) ≥ E ∈ Rmax
,
n×n
with E ∈ Rmax
the max-plus identity matrix. This fact is easily seen by noting that
because of constraint 7-1,
[AM ]ij ≥ [AM,min ]ij
∀i, j ∈ {1, 2, · · · , n}.
The addition in max-plus sense of AM,min in equation 7-2 can also be seen as a limit
on the control input; it provides a lower bound for AbM (k, l) when the schedule is
delayed. However, an upper bound does not exist. An upper bound would constraint
the control input if events are early and negative disturbances are measured. Because
an upper bound is absent, the RGS feedback law will always reschedule early events to
their original scheduled timing directly. Hence, the interesting dynamics of the RGS
feedback are only with delayed events. Therefore the following discussion will focus
on delayed states only. However, the presented algorithm applies to early events or
negative disturbances as well.
Django D. van Amstel
Master of Science Thesis
7-1 The reactive feedback loop
71
Figure 7-4: Block diagram of the reactive feedback loop.
7-1-2
The Reactive Gait Scheduler algorithm
Let x(k, l) ∈ Rnmax denote the scheduled or reference state of size n at aggregated event
counter k and single event counter l. Let lmax be as defined in Theorem 4.4.
The updated or actual state is denoted by xb(k, l) ∈ Rnmax . Moreover, let D− (k, l) ∈
n×n
Rmax
be the negative disturbance matrix as defined in definition 7.2 from measured
n×n
disturbances dn (k). Finally, let AbM ∈ Rmax
be defined as in definition 7.3.
In Figure 7-4 the reactive feedback loop is presented. The measured delays are used
to update the given A0 and A1 matrices, such that the realized schedule is adapted to
match the original, undelayed schedule.
The iterative algorithm of the reactive feedback loop for a first order MPL can be
described by the following steps. In here it is assumed that the scheduled single event
cycle states x(k, l), x(k, l + 1), · · · , x(k, lmax , x(k + 1, l), · · · are known.
1. A disturbance di (k, lcurrent ) with i ∈ 1, 2, · · · , n is measured. Determine lcurrent as
lcurrent = arg
max
l∈1,2,··· ,lmax
xi (k, l)
2. Calculate D− (k, lcurrent ) from d(k, lcurrent ).
3. Calculate AbM using definition 7.3 for M = 0, 1.
4. Calculate xb(k, lcurrent + 1):
xb(k, lcurrent + 1) =
(
A1 ⊗ x(k − 1)
if lcurrent = 0
A0 ⊗ x(k, lcurrent ) if lcurrent = 1, 2, · · · , lmax
5. Calculate the difference δ between the actual and scheduled state:
δ = lef t(xb(k, lcurrent + 1) − x(k, lcurrent + 1),
6. Let d(k, lcurrent + 1) = δ.
Master of Science Thesis
Django D. van Amstel
72
Switching Max Plus Linear Feedback Methods
7. Update
lcurrent = lcurrent + 1
until lcurrent = lmax − 1. If lcurrent = lmax − 1, let d(k + 1) = err, k = k + 1 and
lcurrent = 0.
8. If δ 6= 0n,1 , return to step 2. Otherwise stop.
Then, the aggregated event cycle scheduled state is updated as
x′ (k) =
lM
max
l=1
(xb(k, l)) .
The accent is used to distinguish the true original schedule and the updated schedule by
the reactive feedback control algorithm. In practice, one might never arrive at δ = 0, or
not in acceptable time. Hence, one can incorporate a computation horizon kmax , such
that in step 8 the algorithm also stops if k = kmax .
The reactive feedback control loop has been tested in simulation. In Appendix A the
results are presented.
7-2
The deliberate feedback loop
In contrast to the reactive feedback loop, the deliberate feedback loop takes the measurements from the environment into account besides the disturbances. The goal is to
learn and select a better mode of operations for a given environment.
The structure of the deliberate feedback loop is given in Figure 7-5. In the following
subsections each part of the block diagram will be explained. In which section each
block is discussed is depicted in Figure 7-5 as well.
7-2-1
Switch Decision Maker
The output of the Switch Decision Maker (SDM) is a boolean variable S(k) ∈ [0, 1].
When it is true, the Mode of Operation Optimizer (MOO) will select a new mode of
operation for the k-th iteration. If S(k) is false, no new mode of operation will be
selected for the k-th iteration in the aggregated event iteration.
S(k) can be a function of the environment measurement ξ(k) and the direct imposed
delays d(k) or possible other variables. This function can take on many forms. Two
simple examples will be given.
The arguably most intuitive function is that the system should switch when some norm
of the delay vector d(k) is larger than some bound δ, for example
S(k) =
Django D. van Amstel
(
1 if |d(k)|2 > δ
,
0 else.
(7-3)
Master of Science Thesis
7-2 The deliberate feedback loop
73
Figure 7-5: The block diagram of the deliberate feedback loop
where | · |2 denotes the 2-norm.
In some cases it is beneficial that the system switches often. Let t(A) denote the
transient time, then the switching function could look something like
S(k) =
(
1 if S(k − i) 6= 1, for i = 1, 2, · · · , (t(A) + S)
0 else.
(7-4)
Equation 7-4 is defined such that a switch is performed if and only if the system has
been in the same mode of operation for t(A) + S event iterations. By letting S ∈ N+ ,
this switching function ensures the system will operate at least for S event iterations
in the eigenspace, given that t(A) is a finite number as well.
Many other switching function are possible, but combining the above two proved to
be sufficient for the experiments conducted in this thesis. The second function is very
suitable in learning situations as it guarantees switching with a certain frequency. The
danger however of using such a "switch-greedy" function is that the frequent switching
might be detrimental to the performance of the system. A more in-depth research on
switching functions is outside of the scope of this thesis.
7-2-2
Performance Function Learner
The goal of the Performance Function Learner (PFL) is to constantly update the
b in the hope of approximating the true
Performance Approximation Matrix (PAM) Λ
performance values J. Let the total set of modes of operation be defined by Θ of
size N and the set of environments be denoted by Ξ with size M . Then let the PAM
b
Λ(k)
∈ RN ×M be defined as
h i
b
Λ
nm
b
b, ξm , θn ),
= J(ξ
m , θn ) = F(x, x
Master of Science Thesis
∀n ∈ {1, 2, · · · , N },
∀m ∈ {1, 2, · · · , M },
Django D. van Amstel
74
Switching Max Plus Linear Feedback Methods
h i
b
b correis the approximated performance value J,
such that the matrix element Λ
nm
sponding to mode of operation θn ∈ Θ and environment ξm ∈ Ξ.
At every event step k the PFL receives a current mode of operation θ(k) ∈ Θ from the
MOO and the current environment ξ(k) ∈ Ξ from the measurements. This information
b by the following definition.
is used to update the approximation matrix Λ
b
Definition 7.4. Let Λ(k)
denote the approximation matrix at event step k with initial
b such that Λ(0)
b
b . Moreover let H ∈ N+ be defined as the history
condition Λ
= Λ
0
0
b
horizon. Then for every event iteration k a single element of Λ(k)
is updated as
h
i
b
Λ(k)
n,m
=
b
J(ξ
m , θn ) +
PH
h=1
h
H +1
i
b
Λ(k
− h)
nm
,
(7-5)
with ξm = ξ(k − 1) and θn = θ(k − 1).
The history horizon determines how conservative the update is. For large H, more
emphasis is put on historical values. There is a trade-off between putting more emphasis
on historical values which makes the learning more robust to outliers and putting
less emphasis on the historical values to increase the allowed variation and hence the
learning rate.
b at one point
A downside of the current method is that it only updates the matrix Λ
(ξ, θ) for every k. For larger search spaces this will lead to very slow learning.
Several methods like gaussian smoothing, local linear regression [40], basis functions
[41] or B-Splines [42] have been analysed and compared to increase the convergence rate
by increasing the number of points that are updated at every iteration k. However, all
methods suffer from the fact (implicitly) assume a certain knowledge about the function
F, which is not the case in general. The analysis of how to increase the learning rate
is deferred to a later study.
7-2-3
Mode of Operation Optimizer and Synthesizer
The core of the deliberate loop is the Mode of Operation Optimizer (MOO). The
output is a certain mode of operation from the total space of possible solutions, θ ∈ Θ.
The main function of the MOO can be described as
θ(k) =
(
arg minθ∈Θ F(x, xb, ξm , θ) if S(k) = 1
,
θ(k − 1)
if S(k) = 0.
(7-6)
with ξm = ξ(k) ∈ Ξ.
The first line of equation 7-6 involves an optimization problem. This is done by using
the OO method as introduced in chapter 3, using the ABP selection rule from chapter
7 and the PAM such that the optimization problem is solved as
n∈N
Django D. van Amstel
h
i
b
n = arg min Λ(k)
N,m
Master of Science Thesis
7-2 The deliberate feedback loop
75
with index m defined as the index in ξm = ξ(k). Subsequently
θ(k) = θn .
In other words, the column m of the PAM corresponding to the current environment
b
ξm is used to find the element with the minimal approximated performance value J.
The row index n of this element corresponds to the index of the mode of operation
θn ∈ Θ.
Note that if future environments ξ(k + 1), ξ(k + 2), · · · woul be known, it would be
possible to schedule the modes of operation for future event steps k as well. More on
this topic is found in the chapter notes.
Recall from chapter 7 that in the ABP selection rule, the annealing temperature T
determines the alignment probability in the OO algorithm.
The simplest way is to define T is by letting it be a function of the time unit, for
example
−k
(7-7)
T (k) = T0 · e C + T∞ .
In here, T0 ∈ R is the initial temperature, T∞ ≤ T0 is the desired final temperature for
k → ∞ and C a constant that determines how steep the temperature drop is. By the
above definition, in theory the algorithm will initially do a lot of exploration as it will
select sub-optimal solutions more frequent for higher T . As the annealing temperature
approaches T0 , less exploration is done.
However, this assumes that the PFL indeed converges to the true PAM in a certain finite
time. A more intelligent way to determine the value of T would be to also incorporate
b of the true performance
information on the convergence of the approximation of Λ
values J. However, due to time constraints and the complexity of this subject, it has
not been investigated any further in this thesis.
The Mode of Operation Synthesizer (MOS) transforms the found mode of operation representation θ(k) into the system matrices AM (k)
and possibly AM,min(k) by the mapping A as defined in the introduction. What this
mapping exactly is, is dependent on the problem. In the next chapter, an example will
be given.
The Mode of Operation Synthesizer
7-2-4
Gait Scheduler
the Gait Scheduler (GS) is simply a Switching Max-Plus Linear (SMPL) state space
model extended to the disturbance model. Let x(k) ∈ Rnmax denote the scheduled state
and xe(k) ∈ Rnmax the actual or measured state. Then the GS is defined by the SMPL
model as
xe(k) =
"
x(k)
xb(k)
Master of Science Thesis
#
= Adis (θ(k)) ⊗ xe(k − 1) = A∗0,dis ⊗ A1,dis ⊗ xe(k − 1)
(7-8)
Django D. van Amstel
76
Switching Max Plus Linear Feedback Methods
with
A0,dis =
"
E
A0 (θ(k))
D(k)
E
#
∈
R2n×2n
max ;
A1, =
"
E A1 (θ(k))
E
E
#
∈ R2n×2n
max .
(7-9)
n×n
In here, D(k) ∈ Rmax
is the disturbance matrix defined by



D(k) = 



d1 (k)
ε
..
.
ε
ε
···
...
ε
..
.
d2 (k)
...
...
ε
···
ε dn (k)




.


(7-10)
The scalar values di (k) ∈ R+ are obtained from the feedback signal, while θ(k) is
received from the MOS. Using the above equations, the GS can schedule an arbitrary
event steps ahead.
7-2-5
Summary of the deliberate feedback algorithm
The deliberate feedback algorithm can be summarized by the following steps.
1. An environment measurement xi(k) and the realized state xb(k) are received.
2. The Switch Decision Maker (SDM) decided if a switch should be performed.
3. Regardless of the decision of the SDM, the Performance Function Learner (PFL)
b
updates the Performance Approximation Matrix (PAM) Λ(k)
using the obtained
measurements.
4. If the SDM decided a switch of mode of operation is necessary, the Mode of
Operation Optimizer (MOO) will perform an optimization using the Ordinal Opb
timization algorithm, using Λ(k)
as a representation of the performance function
to find θ(k).
5. The abstract representation of the mode of operation θ(k) is transformed into
useful system matrices by the Mode of Operation Synthesizer (MOS).
7-3
Combining the feedback loops
Recall that in definition 7.3 of the reactive feedback loop the matrices AM are calculated
as
AbM (k, l) = AM ⊗ D− (k) ⊕ AM,min .
Implicitly here it is still assumed that AM and AM,min are given. Note that also in
Figure 7-4 the AM matrices are an external given input.
Django D. van Amstel
Master of Science Thesis
7-3 Combining the feedback loops
77
With the deliberate feedback loop introduced, the definition of the reactive feedback
law can be extended to
AbM (k, l) = AM (θ(k)) ⊗ D− (d(k)) ⊕ AM,min (θ(k)).
(7-11)
Equation 7-11 is then the very brief and summerized answer to the problem statement
of this thesis. The equation makes the system matrix A a function of ξ(k) and d(k).
By noting that θ(k) is a function of the environment through the deliberate loop, the
resulting system is autonomous as there is no external input that provides the system
matrix A anymore.
By combining both methods, the final structure of the supervisory controller is defined.
It is given graphically in Figure 7-6. It must be noted that this controller scheme is
derived only for first order SMPL systems.
Essentially, the GS in the deliberate loop is replaced by the MPL Reactive Gait Scheduler (RGS) block. This block is the reactive feedback loop, in where the necessary
AM and AM,M matrices are provided by the deliberate loop. When the feedback loops
are combined, the GS does not use the extended disturbance model as this model has
not been defined in the framework of the single event cycle method that the reactive
feedback algorithms uses.
Note that in the final structure, the Data Processor block is added. The Data processor takes care of transforming the time domain measurements into event domain
representations.
7-3-1
Data Processor
In the Data Processor, the time domain measurements are transformed to the event
domain. Let ξtime (t) and xbtime (t) denote the time domain measurements of the environment ξ and the actual state xb.
Definition 7.5. Let txi (k) ∈ R+ denote the time at which it is measured that event
xi (k) has occured and t ∈ R denote the actual time. Then the conversion is given by
[xb]i (k) =
(
[xbtime ]i (t), if t ≥ txi (k)
∀i ∈ 1, 2, · · · n.
ε
if t < txi (k)
The disturbance di (k) on the event corresponding to state xi (k) is then calculated as
di (k) = xbi (k) − xi (k).
Representing the environment ξtime (t) in event domain by ξ(k) is not straightforward.
The main issue is how the time domain and event domain project on each other, as
explained in the chapter notes of chapter 4. A possible method that is used in this
thesis uses the average of the time domain measurements for the transformation to the
event domain. The method is described in the following definition.
Master of Science Thesis
Django D. van Amstel
78
Switching Max Plus Linear Feedback Methods
Figure 7-6: The internal structure of the supervisory controller
Definition 7.6. Let f be the sampling frequency, and let nt denote a sample corresponding to time t. Then the time domain measurements of the environment ξtime (t)
result in the discrete time series of samples
ξmeasurements (t) = {ξtime (1), ξtime (2), · · · , ξtime (nt )} .
Now define
tmin (k) = min (xbi (k)) .
i∈1,2,···n
Then define the vector ξk as



ξk = 


ξtime (ntmin (k) )
ξtime (ntmin (k) + 1)
..
.
ξtime (ntmin (k+1) − 1)



.


The vector ξk is essentially a specific part of the time series ξmeasurements (t), containing
all measurements between the measurement corresponding to the first event happening
in the state x(k) and the last sample before the first event of x(k + 1) has happened.
Finally, ξ(k) is then defined as the average of ξk :
i
1X
ξ(k) =
ξk (i),
n
Django D. van Amstel
Master of Science Thesis
7-4 Conclusions
79
with i the index of element i in the vector ξk .
Taking the average is essentialy a zeroth order polynomial approximation. One could
use higher order polynomial regression analysis on the time series ξmeasurement (n) to
retrieve more information such as derivatives of the measured signal. Such a higher
order regression would especially be very useful when one aims to predict future environments.
How the data processor functions have been implemented in practice can be found in
appendix C.
7-4
Conclusions
This chapter combined the theory of part II of this thesis to arrive at two feedback
methods that recalculate the system matrices in a feedback control setting. The first,
lower level method is named the reactive loop as it reacts directly to measured disturbances. It does not alter the structure of the system matrices, but only adapts the
holding times to mitigate the effects of the disturbances.
The second method, operating on a higher abstraction level, received the name of the
deliberate loop. In this algorithm, a matrix of performance values is constantly updated
by new approximations. It is assumed that over time, this learned performance values
converge to the true values. This performance matrix is used in an Ordinal Optimization (OO) optimization method using the Adaptive Blind Pick (ABP) selection rule to
explore which modes of operation work well in a particular environment. Over time,
the exploration becomes less and the optimization will exhibit more exploitation. The
deliberate loop only uses the measured disturbances to determine if a structural switch
of the system matrix is necessary.
Finally both methods have been combined and result in the equation
AbM (k, l) = AM (θ(k)) ⊗ D− (d(k)) ⊕ AM,min (θ(k)).
This equation can be viewed as the summarized answer to the problem statement of
this thesis.
7-5
Chapter notes
It was noted in section 7-2-3 that the optimization is performed only on a specific
b This column corresponds to a certain environment. If future environments
column of Λ.
ξ(k+1), ξ(k+2), · · · are known, it would be possible to schedule the modes of operation
ahead and hence a Model Predictive Control [43] type of control framework could be
implemented.
Master of Science Thesis
Django D. van Amstel
80
Switching Max Plus Linear Feedback Methods
Even a step further is that a sequence of modes of operation could be optimized for a
sequence of environments. Such a setting would be very alike Reinforcement Learning
[38], in where the goal is to find the optimal sequence of actions given a certain state of
the system. However, the current Performance Function Learner is by far not advanced
enough to work in such a learning environment.
Django D. van Amstel
Master of Science Thesis
Chapter 8
Case Study: The Zebro hexapod
walking robot
The deliberate feedback loop has been implemented on the Zebro hexapod robot. Two
different experiments have been conducted to verify the feedback loop in practice. The
discription on how the feedback loop is implemented on the Zebro will be presented
first. Subsequently, the experiments will be described and the results will be analyzed.
8-1
Implementation of the deliberate feedback loop on the Zebro
The Zebro is a hexapod robot with a relative simple geometry. The main body has the
shape of a rectangular box on which 6 leg joints or hips are attached. In each hip a
motor is situated, driving a single leg. Each leg has the shape of approximately a halve
circle, see the left picture in Figure 8-1. The middle picture in Figure 8-1 shows the
wireless router and the battery pack on the back and all the electronics in the front
of the main body. In Figure 8-1 on the right, the Zebro is depicted while walking. In
Figure 8-2 the used leg numbering is depicted.
Each leg positions is controlled by a seperate P D controller. The used reference signal
comes from a Switching Max-Plus Linear (SMPL) model. These models are derived in
[15]. The method how to obtain these models will be summarized in the description of
the Mode of Operation Synthesizer (MOS) and Gait Scheduler (GS).
In the following sections the implementation of each block will be presented. The
MATLAB code of the implementation can be found in appendix C.
8-1-1
Implementation of the Mode of Operation Synthesizer
The content of this section is a summary of the methods developed in [15]. Let ti (k)
and li (k) denote the touch-down and lift-off events of leg i ∈, 2, · · · , 6 for the k-th time.
Master of Science Thesis
Django D. van Amstel
82
Case Study: The Zebro hexapod walking robot
Figure 8-1: The Zebro hexapod robot. Left: close up of a Zebro leg. Middle: Zebro in rest.
Right: Zebro walking with a tripod gait.
Figure 8-2: The leg numbering as used in the modeling
Django D. van Amstel
Master of Science Thesis
8-1 Implementation of the deliberate feedback loop on the Zebro
83
The state is then defined as


t1 (k)

.. 

. 


 t (k) 
 6

 ∈ R12
x(k) = 
max
 l1 (k) 



.. 

. 
(8-1)
l6 (k)
Let τf denote the time duration of the aerial phase of the leg, τg the support phase time
duration and τ∆ the so called double stance time. The double stance time is the time
that the next lift-off will wait after the previous touch-down has occured. Moreover,
let L denote a certain synchronization or gait of the legs.
The mode of operation θ is then defined as the vector


L

θ=
 τf  .
τ∆
(8-2)
Then the AM forM = 0, 1 matrices are given as
A0 =
"
E τf ⊗ E
P
E
#
;
A1 =
"
E
E
τg ⊗ E ⊕ Q E
#
,
(8-3)
such that the Max-Plus Linear (MPL) state space model is obtained as
x(k) = A0 ⊗ x(k) ⊕ A1 ⊗ x(k − 1) = (A∗0 ⊗ A1 ) ⊗ x(k − 1).
(8-4)
In 8-3 the P and Q matrices encode the synchronization constraints of the legs. They
are used to define a gait by the following definition (definition 7 in [15]).
Definition 8.1. Let n be the number of legs in the robot and define m as a number
of leg groups. Let l1 , · · · , lm be ordered sets of integers such that
m
[
lp = {1, · · · , n} , ∀i 6= j, li ∩ lj = ∅and ∀i, li 6= ∅
(8-5)
p=1
i.e., the sets lp form a partition of {1, · · · , n}. A gait L is hence defined as an ordering
relation of groups of legs:
L = l1 ≺ l2 ≺ · · · ≺ lm .
(8-6)
The gait space is the set of all gaits that satisfy the previous definitions.
With the above definition, the P and Q matrices are defined as
[P ]pq =
Master of Science Thesis
(
τ∆ ∀j ∈ {1, · · · , m − 1} ; ∀p ∈ lj+1 ; ∀q ∈ lj
ε otherwise
(8-7)
Django D. van Amstel
84
Case Study: The Zebro hexapod walking robot
and
[Q]pq =
(
τ∆ ∀p ∈ l1 ; ∀q ∈ lm
ε otherwise.
(8-8)
An example of how different gaits are tranformed into a MPL model is given in appendix
A. Hence, every gait leads to a unique system matrix A.
8-1-2
MPL Gait Scheduler
Due to technical issues regarding the computational burden, the disturbance model has
not been implemented. Instead, the state is updated using a feedback method that was
implemented on the RQuad robot. This robot is very similar to the Zebro, except that
it only has four legs.
Instead, the conventional SMPL model is to schedule the states ahead as
x(k + 1) = A(θ(k)) ⊗ x(k).
The measured state and hence the disturbances are incorporated by using
x(k) = A0 ⊗ xb(k) ⊕ A1 ⊗ x(k − 1).
However, this equation has to be iterated to arrive at a fully updated state x(k), as
shown in chapter 5. This is not done in the actual implementation for the sake of
computational speed. Hence, only the directly influenced states will be updated. The
implementation is explained in more detail in appendix C.
8-1-3
Implementation of the Switch Decision Maker
In the implementation, the switching function is defined as a combination of the introduced switching functions in the previous chapter:
S(k) =
(
1 if S(k − i) 6= 1, for i = 1, 2, · · · , (t(A) + S) ∧ |d(k)|2 > δ
0 else.
(8-9)
In [15] it was derived that t(A) = 2 for the Zebro. By trial and error the value of
δ = 0.07 has been selected.
For the Reference speed experiment S = 0 was selected because no learning is performed in this experiment and high frequent switching is favourable. For the learning
experiment, S = 2 to ensure that the system will reach and stay in steady state after
each switch for at least 2 event iterations.
Django D. van Amstel
Master of Science Thesis
8-1 Implementation of the deliberate feedback loop on the Zebro
8-1-4
85
Implementation of the Performance Function Learner and Mode of Operation Optimizer
For the Mode of Operation Optimizer (MOO) and Performance Function Learner (PFL) the total set of modes of operation Θ is defined as a set of 15 unique
modes of operation and hence
Definition of Θ.
Θ = {θ1 , θ2 , · · · , θ15 } .
(8-10)
Let L denote a certain gait, τf the time of the aerial phase of each leg, τg the support
or ground phase of a leg and τ∆ the double stance time.
The unique mode of operation θ is then defined as the vector


L

θ =  τf 
.
τ∆
(8-11)
Note that τg is not part of the mode of operation. As it has been shown in [15], τg is a
function of the variables in θ for feasible modes of operation as
τg = (τf ⊗ τ∆ )⊗m − τf ,
in where m is the number of leg groups. This number will be explained in more detail in
the forthcoming sections. Using the definition of equation 8-11, the modes are defined
as given in Table D-1. The leg synchronizations Li in here are
L1 = {1}, {4}, {5}, {2}, {3}, {6};
L2 = {4, 5}, {1, 6}, {2, 3};
L3 = {1, 4, 5}, {2, 3, 6}.
One might argue that the size of Θ is rather small. It is not representative for a real
sized problem and the Ordinal Optimization method was selected partially because it
could handle very large search spaces. Large search spaces are typically present in a
SMPL framework.
The size of the search space was chosen rather small because available experimental
time with the Zebro was limited and the larger the search space, the longer learning
will take. More on this issue is discussed in appendix D.
However, by defining the good enough subset G small as well, the sizes of the total
search space and good enough subset are kept at a ratio representative for real sized
problems.
Master of Science Thesis
Django D. van Amstel
86
Case Study: The Zebro hexapod walking robot
Table 8-1: Definition of all modes of operation θi ∈ Θ
Mode
θ1
θ2
θ3
θ4
θ5
θ6
θ7
θ8
θ9
θ10
θ11
θ12
θ13
θ14
θ15
Leg synchronization
L1
L1
L1
L1
L1
L2
L2
L2
L2
L2
L3
L3
L3
L3
L3
τf /s
0.300
0.525
0.750
0.975
1.20
0.300
0.525
0.750
0.975
1.20
0.300
0.525
0.750
0.975
1.20
τ∆ /s
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
Definition of Ξ. In the two experiments the environment is defined differently. In the
Reference speed experiment it is defined as a user imposed reference speed Vref . The
theoretical forward speed of a mode of operation can be easily computed by dividing
the horizontal displacement of a foot during the support phase by the time it is in
support phase, τg . From simple geometry the theoretical speed of a mode of operation
is obtained as
q
2 · r2 − 2 · r2 · cos(α∆ )
,
(8-12)
Vth (θ) =
τg (θ)
in where r denotes the distance between the leg tip or foot and the pivoting point of
the leg and α∆ is the angular displacement of a leg in support phase.
It has been measured that approximately r = 0.15m and it has been designed that
α∆ = 0.6rad. Then, the maximal and minimal theoretical speed from all modes of
operation in Θ are obtained as
h
Vmin Vmax
i
=
h
0.0152m/s 0.201m/s
i
and hence the environment for the reference speed experiment is defined as
ΞReference Speed =
h
i
0.0152 0.0417 0.0683 0.0948 0.121 0.148 0.175 0.201
(8-13)
In the learning experiment the environment is a limiting factor on the current in the
actuators. By limiting the actuators, it is simulated that the Zebro has a higher load
to overcome as for example in climbing or pulling a weight.
Django D. van Amstel
Master of Science Thesis
8-1 Implementation of the deliberate feedback loop on the Zebro
87
This limiting factor is defined as a number between 0 and 1, in where 0 corresponds to
no electrical current in the actuators and 1 to the maximal allowable electrical current
in the actuators. By trial and error it was found that at 30% the actuators are just
powerful enough to support the weight, while at 70% there is no noticable difference
any more between gaits. Hence, the environment is defined as
ΞLearning =
h
i
0.3 0.4 0.5 0.6 0.7 .
(8-14)
One can view this environment also as a varying saturation limit on the control input
for the lower level P D control loop.
In the reference speed experiment, the performance function is defined as the absolute difference between the user imposed reference speed and the theoretical speed of the mode of operation;
Definition of the performance function F.
FReference Speed (ξ, θ) = |Vref − Vth (θ)| .
(8-15)
Recall that by definition Vref = ξReference Speed . Note that there are no uncertain parts
in this definition of the environment, and hence we have Ξ = N and hence the true
performance function and values are obtained.
For the learning experiment, the performance function is defined as the 2-norm of the
disturbance vector:
Flearning (x, xb) = |d(k)|2 .
(8-16)
By defining the performance function solely as the 2-norm of the disturbance vector,
there is no a priori knowledge available. The function will be learned over time as the
disturbances are assumed to be an implicit function of ξ and θ. If this assumption is
valid will be tested in the experiment.
Recall that the performance function is updated ac-
Definition of other parameters.
cording to
h
i
b
Λ(k)
nm
=
b
J(ξ
m , θn ) +
PH
h=1
h
H +1
i
b
Λ(k
− h)
nm
,
(8-17)
in where H ∈ N+ was some history horizon. For the current implementation H = 1 is
chosen to allow for fast learning. However, the approximation will also become more
sensitive to outliers.
For the Ordinal Optimization (OO) method, the following parameters have been selected:
g = 1; s = 8; k = 1.
By doing so, the good enough subset is approximately the top 7% of solutions.
Master of Science Thesis
Django D. van Amstel
88
Case Study: The Zebro hexapod walking robot
Figure 8-3: The alignment probability as a function of the temperature for N = 15, g = 1,
s = 8 and k = 1.
In the OO procedure, the performance function is shifted such that all values Λξ,θ > 0.
This is done by applying the shift
Lambda(ξ, θ) = Lambda(ξ, θ) + min (Lambda(ξ, θ)) .
ξ∈Ξ,θ∈Θ
Because the OO algorithm only takes the ordering of the values into account, this shift
does not affect the algorithm. The purpose of the shift is to ensure that the Selection
Probability Matrix (SPM) is calculated correctly.
The resulting alignment probability as a function of the annealing temperature T is
given in Figure 8-3. Note that again this alignment probability is defined according to
the approximated performance values and not the true values.
8-2
Description of experiments
Two experiments have been conducted. The first one is the so called reference speed
experiment. The Zebro is given a certain velocity and selects the gait that comes closest
to that speed. In here, no learning is necessary. In the second experiment the Zebro
will have to learn the performance function, as the goal is to minimize the introduced
disturbances.
8-2-1
Experimental setup
The experimental setup is a clear track of approximately 6 meters in length, see Figure
8-4. The environment has a very homogeneous surface and therefore the dynamical
Django D. van Amstel
Master of Science Thesis
8-2 Description of experiments
89
Figure 8-4: The experimental environment
properties of the robotic body are isolated, as almost no influence comes from the
terrain.
During the experiments, the battery has been kept charged above 23.5V at all times to
avoid influence from a depleted battery. The fully charged voltage is approx. 25.0V ,
while at 22.0V the Zebro shuts down. Hence, the battery was always charged above
50% of a full charge.
Moreover to avoid influence from dynamics caused by turning, the Zebro was turned
around by hand at the ends of the track. For a more detailed description of the
experimental setup, see appendix D.
8-2-2
Reference speed experiment
The experiment is executed by letting the reference speed Vref increase linearly over
a timespan of 180 seconds. The experiment is repeated three times at two different
annealing temperature T :
Tlow = 0.01; Thigh = 5;
From 8-3 it is then approximated that the alignment probability for each temperature
is
P (|G ∩ S| > k)low = 1; P (|G ∩ S| > k)high = 0.42;
See Appendix D for the used MATLAB code in this experiment.
The hypothesis is that for the repititions with Thigh , the algorithm will
select sub-optimal solutions in approx. 42% of the total number of switches and hence
more exploration of the search space is done. For Tlow , the optimal mode of operation
will be selected at all switches.
Hypothesis.
Master of Science Thesis
Django D. van Amstel
90
Case Study: The Zebro hexapod walking robot
8-2-3
Learning experiment
In the learning experiment, the annealing temperature is kept at a constant high temperature of T = 50. During the learning the environment will be changed linearly from
the lowest to the highest value. The alignment probability in this learning phase is
P (|G ∩ S| > k)high = 0.42 according to Figure 8-3.
b is the zero matrix:
The initial Performance Approximation Matrix (PAM) Λ
n×m
b
b =0
Λ(0)
=Λ
.
0
nm ∈ R
Since the performance function was defined as the 2-norm of the disturbance vector
d(k), all updated performance values will be larger than 0. This follows directly from
the definition of a norm. Hence, unvisited parts of the search space will have a minimal
performance value.
Moreover, although the true performance function is unknown, it is known what this
function will represent. Since the surface is very smooth, disturbances can only be
introduced by the dynamical properties of the robot body. Hence, the true performance
function represents how the robot itself is influencing the walking schedule.
Hypothesis.
The hypothesis of the learning experiment consists three parts:
1. During exploration, all parts of the search space will be explored.
2. During exploration the learned performance function will converge to a certain
solution over time.
3. The learned performance function converges to the true performance function.
8-3
Analysis of results
An extensive presentation of the experimental results can be found in Appendix D. In
this section, only the significant results for analysis will be presented.
8-3-1
Reference speed experiment results
In the reference speed experiment it is possible to calculate the good enough subset G
a priori per reference speed as the performance function was deterministic and known.
At every gait switch, it is analysed if the algorithm selected a solution from the good
enough subset. Define the good selection measure GS as
GS =
Django D. van Amstel
number of times a good switch is performed
.
total number of switches
Master of Science Thesis
8-3 Analysis of results
91
Table 8-2: GS for Tlow and Thigh in the reference speed experiment
Repitition
Tlow
Thigh
#1
1.000
0.3929
#2
0.9412
0.3000
#3
1.000
0.3793
mean
0.9804
0.3574
Variance
0.0008
0.0017
In here a good switch is defined as a switch in where the algorithm selected a mode of
operation that is in the good enough subset. Now note that by the following definition
E [GS] = P (|G ∩ S| > k) ,
in where E denotes the expected value.
The resulting GS for the two annealing temperatures T are given in in Table 8-2.
Clearly, for Tlow GS is equal to the expected alignment probability of 1. However, the
GS of Thigh is slighty lower; it was expected to be 0.42, but it is 0.36 on average.
There are a few reasons for this discrepancy. First of all, the number of experiment
repititions is limited and secondly each repitition only contained about 30 switches on
average. It is very well possible that if more switches would be performed, GS converges
better to the expected value.
Although this seems the most plausible reason, the fact that all three observations of
GS for Thigh are lower than the expected value and the variance is small indicate this
isn’t the only reason. Moreover, the fact that the values of Tlow are much closer to their
expected value shows it is unlikely that the number of repititions is the main reason
for the observed difference between the expected and the obtained values.
Another reason might be that the initial switch is a given mode of operation, regardless
of the environment. Hence, it is very likely this switch is sub-optimal. On 30 switches,
1 bad switch corresponds to an error of about 1/30 = 3%. However, this effect should
then be seen in the Tlow repititions as well. Also, the random numbers that form the
set U in the inverse transform sampling step are not completely random and hence
introduce a small bias. However, this effect should not be of influence for such small
numbers of gait switches.
It can be concluded that the hypothesis is confirmed. For
higher values of the annealing temperature T more exploration is present. For the low
temperature the algorithm selected a mode from the good enough subset at practically
each switch of mode of operation.
Conclusion of experiment
However, the found GS values do not completely correspond to their expected values
for Thigh . The results significantly lower than the expected values. The number of experiments is relatively small and the durations have been short, but this not completely
explains the relative large deviation.
Master of Science Thesis
Django D. van Amstel
92
8-3-2
Case Study: The Zebro hexapod walking robot
Learning experiment results
After approx. 1 hour of learning time, the performance function as shown in Figure
8-5 has been obtained. In appendix D the chronological construction of this learned
performance function is given at certain time intervals.
From Figure 8-5 the first part of the hypothesis is clearly verified; almost all values of
the performance function are larger than 0, indicating that the whole search space has
been explored and updated. This was expected since the initial value of the performance
function was 0 for all combinations of ξ and θ.
If the initial values would have been selected too large, unvisited points will not be
explored because the performance value is relatively high, decreasing the probability
of selection in the OO algorithm. Hence, exploration is only guaranteed if the initial
performance values are relatively small.
However, it is not possible to verify or reject the second and third part of the hypothesis
on basis of the obtained data. Although evidently some performance function is learned,
no claim can be done about convergence and how well the approximation is of the true
performance function. For these claims, the experiment should have been longer and
repeated several times. However, due to technical problems this was not possible. More
discussion on this topic can be found in appendix D.
The first part of the hypothesis is confirmed, but only under certain conditions. The exploration of the search space is a function of the initial
performance values. The initial values have to be close to the minimal values of the
resulting approximated performance value to ensure exploration. This doesn’t necessarily has to be a disadvantage as it also allows to guide the exploration by shaping the
initial performance function in a suitable way.
Conclusion of experiment
Although the exploration is shown, the second part of the hypothesis cannot be confirmed or rejected on basis of the current experimental results. The performance function does seem converge to a solution, but more experiments are needed to confirm this
solution is converging to the actual performance function.
8-4
Conclusion
This chapter described how the deliberate feedback control loop was implemented on
the Zebro hexapod. Two experiments were conducted to verify if the algorithm works
in practise.
In the first so-called reference speed experiment, the peformance function was fully
known and no learning was necessary. The hypothesis was that the number of times
a sub-optimal solution was selected would increase with increasing annealing temperature. Although the experimental histogram of sub-optimal selections did not exactly
match with the expected theoretical probabilities, the hypothesis was confirmed.
Django D. van Amstel
Master of Science Thesis
8-4 Conclusion
93
b after 1 hour of learning. Top left: 3D
Figure 8-5: The approximated performance values Λ
surface plot with outlier. Other images: performance curve per environment ξ, outlier deleted
Master of Science Thesis
Django D. van Amstel
94
Case Study: The Zebro hexapod walking robot
The second experiment was named the learning experiment and tested the hypothesis
that the algorithm explores all parts of the search space for high annealing temperatures
and the subsequent learned performance function converges to the true performance
function.
The first part of the hypothesis (full search space exploration) is confirmed under the
condition that the unexplored areas have a relative low initial performance value. If
they have not, it is not guaranteed that the whole search space will be explored.
The second part (convergence) and third (convergence to the true function) of the
hypothesis can not be confirmed or rejected on basis of the obtained data. In order
to do any claims on the confirmation or rejection of the second and third part of the
hypothesis, more experimental data is necessary. It was impossible to obtain more
experimental data in the performed experiments because of technical problems.
Django D. van Amstel
Master of Science Thesis
Chapter 9
Conclusions
The main goal of this thesis has been to develop methods for a Switching Max-Plus
Linear (SMPL) system to optimize the mode of operation. This goal led to the development of two novel feedback methods in where the system matrix elements are adapted
as a function of the measured disturbances and environment.
Three contributions have been presented in this thesis:
1. A systematic way of modeling disturbances in a Max-Plus Linear (MPL) framework;
2. The single event iteration state calculation method;
3. A new selection rule for the Ordinal Optimization (OO) method.
These contributions led to the development of a reactive and deliberate feedback loop.
9-1
Modeling disturbances in a max-plus framework
In order to systematically model actual measured disturbances on a MPL system, a
new disturbance model has been derived. In this model, for each scheduled state xi (k)
a new realized state xbi (k) is introduced. For a state vector x(k) ∈ Rnmax , the scheduled
and realized state vector are related as






xb1 (k)
xb2 (k)
..
.
xbn (k)
Master of Science Thesis






=





d1 (k)
ε
..
.
ε
ε
···
...
ε
..
.
d2 (k)
...
...
ε
···
ε dn (k)


 
 
 
⊗
 

x1 (k)
x2 (k)
..
.
xn (k)



,


Django D. van Amstel
96
Conclusions
with di (k) ∈ R denoting a disturbance on the state with index i. By doing so, both
delayed and early events can be modeled in a useful representation.
It was derived that the disturbance model can be constructed for any MPL system that
n×n
can be described by a Petri Net. Let AM ∈ Rmax
denote the matrices obtained from
the Petri Net. Then the disturbance model matrices AM,dis ∈ R2n×2n
are obtained by
max
A0,dis (k) =
AM,dis =
"
"
E
A0
D(k) E
E AM
E E
#
xe(k + 1) =
;
for M ≥ 1.
Let the disturbance model state be defined as
"
#
x(k + 1)
xb(k + 1)
#
.
The disturbance SMPL state space model is then given by
xe(k + 1) = Adis (k) ⊗ xe(k)
with the system matrix for a first order system defined by
Adis (k) = A∗0,dis (k) ⊗ A1,dis .
The conditions for existence of the disturbance model and thus the limitations to its
usability, are equal to those of the original model.
9-2
The reactive feedback loop
The reactive feedback loop mitigates the effects of the measured disturbances by adapting the holding times of the system matrices. For the reactive feedback loop, a new
method is proposed to calculate the state x(k). This new state calculation method is
named the single event cycle iteration method.
9-2-1
The single event iteration state calculation method
By rewriting the state space model, the MPL state x(k) ∈ Rnmax was written as a sum
of vectors in max plus sense:
x(k) =
lM
max
x(k, l),
l=1
in where the matrices x(k, l) are obtained from the recurrence relation
x(k, l + 1) =
Django D. van Amstel
(
A1 ⊗ x(k, 0) for l = 0
,
A0 ⊗ x(k, l) for l = 1, 2, · · · , lmax
Master of Science Thesis
9-3 The deliberate feedback loop
97
with x(k, 0) = x(k − 1). The interpretation of vector x(k, l) is the firing times of a state
after the tokens have taken l steps in the Petri Net from the initial condition without
synchronization. It has been proven that lmax is a finite value equal to the maximal
path length in the communication graph of A0 .
9-2-2
Reactive calculation of the system matrices
The updated matrices are given by the following feedback law:
AbM (k, l) = AM ⊗ D− (k, l) ⊕ AM,min ,
in where AM are the original matrices, D− (k, l) is a function of the disturbances and
AM,min are matrices corresponding to the maximal performance of the system. The
rescheduled state is then calculated as
xb(k, l + 1) =
(
c (k, l) ⊗ x(k, 0) for l = 0
A
1
c
A0 (k, l) ⊗ x(k, l) for l = 1, 2, · · · , lmax
until the delay is mitigated or if the stopping criteria are met. The reactive feedback
loop was succesfully tested in simulation.
9-3
The deliberate feedback loop
The deliberate feedback control loop takes the partial representation of the environment
Ξ into account and aims at learning which modes of operation perform well in which
environments. The deliberate feedback loop consists of three main components; the
Switch Decision Maker (SDM), the Performance Function Learner (PFL) and the Mode
of Operation Optimizer (MOO).
The SDM is a function with a binary output that decides if a switch in mode of
operation should be done. If so, the Mode of Operation Optimizer (MOO) determines
a new mode of operation by using the Adaptive Blind Pick selection rule in a Ordinal
Optimization procedure. The performance function used in the optimization procedure
is constantly approximated by the PFL.
9-3-1
The Adaptive Blind Pick selection rule
The Adaptive Blind Pick (ABP) selection rule is proposed to combine the good qualities
of the Horse Race and Blind Pick selection rules. It is essentially an Blind Pick (BP)
selection rule that adapts the uniform cumulative density function as a function of the
knowledge gained on the performance function.
Let θn ∈ Θ denote a candidate solution with index n in the space of all solutions Θ of
size N . Moreover, define Λ(θn ) ∈ R∗ as the corresponding performance value and let
T ∈ R+ be the annealing temperature.
Master of Science Thesis
Django D. van Amstel
98
Conclusions
Then the adapted probability density function pΘ is given by its elements [pΘ ]n using
the Boltzmann-Gibbs probability density function as
[p − Θ]n = P (θn |T ) = P
e−Λ(θn )/T
,
−Λ(θn )/T
θ∈Θ e
∀ ∈ {1, 2, · · · , N } .
Lower temperatures of T correspond to a high trust in the approximation of the performance function. It is proven that using the ABP, the alignment probability goes to
1 for the annealing temperature approaching 0.
It still remains an unsolved question how to determine the correct value of T . The issue
is how to quantify the confidence in how well the approximation is in the performance
function.
9-3-2
Optimizing the mode of operation
With the Adaptive Blind Pick selection rule and the Ordinal Optimization Technique,
a new mode of operation is found by solving
θ(k) = arg min F(x, xb, Z, θ).
θ∈Θ
The state space equation then becomes
x(k + 1) = A (θ(k)) ⊗ x(k).
9-3-3
Implementation of the deliberate feedback loop on the Zebro robot
The deliberate feedback loop was implemented on the Zebro, a hexapod walking robot
that uses SMPL state space models to control the leg movement. Two experiments
were conducted to test the implementation. The first experiment focussed on the
optimization of the mode of operation, given that the performance function is known.
The results confirmed the hypothesis that for high annealing temperatures, more exploration is done by the algorithm while for low temperatures the optimal mode of
operation (according to the performance function) is selected with a probability close
to 1.
The second experiment focussed on the learning part of the algorithm. The performance
function was defined in this experiment as the 2-norm of the disturbance vector, such
that the system would learn how the dynamics of the robotic body influence the walking.
The hypothesis consisted of three parts. The first part of the hypothesis was that the
algorithm would explore the full search space. This part has been confirmed under the
condition that the unexplored areas have a relative low performance value. Because
of the ordering in the ordinal optimization, modes of operation with low performance
index have a preference. Hence, if a mode of operation has a high initial performance
value, the probability that it will be selected decreases.
Django D. van Amstel
Master of Science Thesis
9-4 Discussion and recommendations
99
The second and third part of the hypothesis could not be confirmed due to lack of
enough experimental data. These sub-hypothesisses claimed that the learned performance function would converge to the true performance function. However, in the
limited experimental data the function did not seem to have converged. Since no convergence was reached, nothing can be said about to what solutions the convergence
went.
9-4
Discussion and recommendations
Although the obtained results from the experiments and simulations are positive they
are far from conclusive. The focus has been on the synthesis and implementation of
the feedback methods and the necessary theoretical background for these methods.
This led to the construction of a general feedback framework. However, the more
specific aspects have not been analysed in full detail as a consequence. Moreover, due
to lack of experimental data not all subparts of the method could be tested in practice.
Mainly the learning part of the algorithm in where the true performance function is
approximated should be analysed in more detail and improved upon.
In conclusion, the problem statement has been adressed by the proposed deliberate and
feedback loop in this thesis. The deliberate feedback loop is designed such that a certain
performance function is optimized by switching modes of operation. In addition, the
reactive feedback loop is designed such that incidental disturbances are mitigated from
the system. However, it is still to early to claim that the proposed feedback methods
also work in practice.
To be more specific, the experimental results are indefinite in how well the learning
capabilities of the algorithm are. The following recommendations are suggested to
further improve and analyse the proposed feedback methods.
9-4-1
Recommendations
1. Analyse the Switch Decision Maker function. The decision if a switch
should be performed is of much influence on the performance. In this thesis only
very simple functions for the SDM were considered, as the focus of this thesis was
to find a method to find the optimal mode of operation given it is already decided
a switch should be performed.
2. Finding a representative annealing temperature. The height of the annealing temperature T in the optimization procedure should reflect the confidence that
can be put on the approximation of the performance function by the PFL. In order to calculate a representative annealing temperature at any given instance, this
confidence in the approximation should be appropiatly quantified.
3. Increasing the learning efficiency. In the PFL only a single point of the
performance function is updated at each event step k. For larger search spaces
Master of Science Thesis
Django D. van Amstel
100
Conclusions
this will lead to slow learning. By implementing intelligent data algorithms, it
might be possible to increase the number of points of the performance function
that are updated in a single event step.
4. Implement the deliberate feedback control loop in the individual event
cycle method. In the current implementation the mode of operation can be
switched at every event iteration k. However, in the reactive feedback loop, the
individual event cycle counter l was introduced. By letting the mode of operation
change at every iteration of l as well, a higher switching frequency in the time domain is achieved, allowing for more adequate response to disturbances. However,
it should be analysed if this does not destablize the system.
5. Robust control in a MPL framework. The disturbance MPL model resembles
the uncertainty state space models that are used in robust control in conventional
algebra. Since the disturbance MPL model exposes how disturbances propagate
through the system, it can be determined a priori how sensitive a certain mode of
operation is to certain disturbances. This knowledge could then be incorporated
in the initial guess of the performance function, possibly increasing the learning
efficiency.
Django D. van Amstel
Master of Science Thesis
Appendix A
Numerical examples per chapter
In this appendix many (numerical) examples are given to clarify the presented theory
in the various chapters of this thesis. The examples are structured per chapter. In the
last section of this appendix the MATLAB code that was used to obtain the results of
the examples is given.
A-1
Chapter 2: Max plus algbra
Example 2-1: Elementary max-plus operations
5 ⊕ 3 = max(5, 3) = 5
5⊗3 = 5+3=8
2⊗6 = |2 ⊗ 2 ⊗{z· · · ⊗ 2} = 2 + 2 + · · · + 2 = 6 × 2 = 12
2⊗−6
2⊗1/6
5⊕ε
5⊗e
4⊗ε
3⊕3
=
=
=
=
=
=
6
−6 × 2 = −12
1/6 × 2 = 1/3
max(5, −∞) = 5
5+0=5
4 + −∞ = −∞ = ε
max(3, 3) = 3
2 ⊗ (3 ⊕ 5) = 2 + max(3, 5) = 2 + 5 = 7
(2 ⊗ 3) ⊕ (2 ⊗ 5) = max ((2 + 3), (2 + 5)) = max(5, 7) = 7
Example 2-2: Max-plus matrix and vector operations
1 2
3 4
!
⊕
e −4
9 ε
Master of Science Thesis
!
=
1 ⊕ e 2 ⊕ −4
3⊕9 4⊕ε
!
=
max(1, 0) max(2, −4)
max(3, 9) max(4, −∞)
!
=
1 2
9 4
Django D. van Amstel
!
.
102
Numerical examples per chapter
Figure A-1: Communication graph of matrix A
6⊗
1 2
3 4
!
⊗
1 2
3 4
!
=
e −4
9 ε
!
6⊗1 6⊗2
6⊗3 6⊗4
=
!
=
6+1 6+2
6+3 6+4
!
7 8
9 10
=
(1 ⊗ e ⊕ 2 ⊗ 9) (1 ⊗ −4 ⊕ 2 ⊗ ε)
(3 ⊗ 9 ⊕ 4 ⊗ 9) (3 ⊗ −4 ⊕ 4 ⊗ ε)
!

.
11 −3
13 −1
!
.
Define the matrix
Example 2-3: Max-plus matrices and their communication graph
3×3
A ∈ Rmax
as
=
!

1 ε 1/2
ε 
A=
.
 3 ε
5 7 ε
The corresponding communication graph G(A) is depicted in Figure A-1.
Then, the following information can be substracted:
N (A) = {1, 2, 3}
.
D(A) = {(1, 1), (1, 2), (1, 3), (2, 3), (3, 1)}
Note that for each arc going from node i to node j, denoted by (i, j) in D(A), the arc
weight is given by the matrix element [A]ji . By concatenation an infinite amount of
paths could be defined. However, one can identify three elementary circuits and one
concatenated circuit:
C1
C2
C3
C4
Django D. van Amstel
=
=
=
=
((1, 2), (2, 3), (3, 1))
((1, 3), (3, 1))
(1, 1)
C1 ◦ C2 = ((1, 2), (2, 3), (3, 1), (1, 3), (3, 1))
Master of Science Thesis
A-1 Chapter 2: Max plus algbra
103
The length, weight and average weight of circuit C4 are given by:
|C4 |l
= 5 (= m)
N
|C4 |w
=
(a21 , a32 , a13 , a31 , a13 ) = 3 + 7 + 1/2 + 5 + 1/2 = 16
|C4 |w /|C4 |l = 16/5 = 3.2.
The other average circuit weights are computed similarly as
|C1 |w /|C1 |l = 5
||C2 |w /|C2 |l = 2.75
|C3 |w /|C3 |l = 1.
Since C1 has the highest average circuit weight, the critical circuit is C1 and the critical
graph is defined by nodes
N c (A) = (1, 2, 3) and arcs Dc (A) = ((1, 2), (2, 3), (3, 1))
From Figure A-1 one easily sees the graph is strongly connected, as one could start in
every node and reach every node by the arcs. Hence, A is an irreducible matrix. Moreover, given the lengths 3, 2 and 1 for the elementary circuits C1 , C2 and C3 respectively
and their greatest common divisor 1, the cyclicity is 1: σG = 1.
Example 2-4: Cyclicity
This example is a copy of example 3.1.3 in [11]. Let
A=
"
−1 11
1 ε
#
.
The succesive powers of A are calculated as
⊗2
A
⊗3
A
⊗4
A
=
"
12 10
0 12
#
;
=
"
11 23
13 11
#
;
=
"
24 22
22 24
#
;
=
"
23 35
25 23
#
.
and
⊗5
A
Now notice A⊗5 = A⊗3 + 12 and also A⊗4 = A⊗2 + 12, with + addition in traditional
sense. It appears that in general it holds that
A⊗(k+2) = 12 ⊗ A⊗k
A⊗(k+2) = 6⊗2 ⊗ A⊗k ,
k ≥ 2.
From here, the following properties can be derived.
Master of Science Thesis
Django D. van Amstel
104
Numerical examples per chapter
• The algebraic cyclicity: σ(A) = 2.
• The eigenvalue: λ(A) = 6.
• The transient time: t(A) = 2.
Example 2-5: The Timed Event Graph (TEG) and a Max-Plus Linear model representation
for bipedal walking In walking, the leg movement can be viewed as a Max-Plus Linear
(MPL)Discrete Event Systems (DES) with the touch-down and lift-off of each foot as
the discrete events. Moreover, in bipedal static stable walking one leg can only lift up
after the other leg has touched down.
The Petri Net of the above described bipedal walking situation is given in Figure A-2.
In here, the black bars represent event or state nodes, their names written in the upper
right corner next to them. The circles represent the place nodes. The holding times of
each place is given in the lower right corner of the place and the names are depicted
above the place on the left. The tokens are represented by the black dots in the places
and positioned according to the initial marking.
Hence, the Petri Net Gbiped = {P, Q, D, M0 , T } is defined by:
• P = {F1 , F2 , G1 , G2 , D1 , D2 },
• Q = {t1 , t2 , l1 , l2 }
• D = {(t1 , G1 ), (t1 , D1 ), (t2 , G2 ), (t2 , D2 ), (l1 , F1 ), (l2 , F2 ), (F1 , t1 ), ...
...(F2 , t2 ), (G1 , l1 ), (D2 , l1 ), (D1 , l2 ), (G2 , l2 )}
• M0 = {G1 , G2 , D2 }
• T = {τf , τf , τg , τg , τδ , τδ }
The state is defined as




x(k) = 
t1 (k)
t2 (k)
l1 (k)
l2 (k)



.

Then the matrices A0 and A1 can be defined using equation (2-7) as:




A0 = 
and



A1 = 

Django D. van Amstel
ε
ε
ε
τδ

ε τf ε
ε ε τf 


ε ε ε 
ε ε ε
e ε
ε e
τg τδ
ε τg
ε
ε
e
ε
ε
ε
ε
e



.

Master of Science Thesis
A-1 Chapter 2: Max plus algbra
105
Figure A-2: The Petri Net modeling bipedal static stable waking
Using the Kleene star operator (2-5) and noticing that M = 1, the definition of the
state space model is obtained by following equation (2-9) as
x(k) = A∗0 ⊗ A1 ⊗ x(k − 1).
Expanding yields















t1 (k)
t2 (k)
l1 (k)
l2 (k)
t1 (k)
t2 (k)
l1 (k)
l2 (k)
t1 (k)
t2 (k)
l1 (k)
l2 (k)



Ln−1 


 =
z=0 










 = 






 = 


ε
ε
ε
τδ
ε
τf ⊗ τδ
ε
τδ
⊗z
ε τf ε
ε ε τf 


ε ε ε 
ε ε ε



⊗


ε ε
ε ε
τg τδ
ε τg

ε
τf
ε
⊗2

ε τ f ⊗ τδ τf 
 
⊗
ε
ε
ε  
ε τ f ⊗ τδ ε
ε
ε
ε
ε
ε
ε
ε
ε
ε ε
ε ε
τg τδ
ε τg
ε
ε
ε
ε


 
 
⊗
 
ε
ε
ε
ε

t1 (k − 1)
t2 (k − 1)
l1 (k − 1)
l2 (k − 1)

 
 
⊗
 





t1 (k − 1)
t2 (k − 1)
l1 (k − 1)
l2 (k − 1)







τf ⊗ τg ⊕ e
τδ ⊗ τ f
τf
ε
⊗2
⊗2

⊗2
τδ ⊗ τf ⊗ τg
(τδ ⊗ τf ) ⊕ τf ⊗ τg ⊕ e τδ ⊗ τf
τf 
 
⊗
τg
τδ
e
ε  
⊗2
τδ ⊕ τδ ⊗ τf ⊗ τg
τf ⊗ τδ ⊕ τg
τδ ⊗ τf e
Master of Science Thesis
t1 (k − 1)
t2 (k − 1)
l1 (k − 1)
l2 (k − 1)
Django D. van Amstel



.

106
Numerical examples per chapter
And a MPL state space model has been obtained. Substituting
τf = 0.5s,
τg = 0.3 and τδ = 0.2s
and letting the initial condition be




x(0) = x0 = 
0
0
0
0



,

the first few states are calculated as


x(1) = 


0.8
1.5
0.3
1.0



,



x(2) = 


2.2
2.9
1.7
2.4



,



x(3) = 


3.6
4.3
3.1
3.8



.

From the states, the schedule of the legs is given in a so-called gait graph in Figure
A-3. In here, the white areas represent the aerial phase of the leg and dark grey the
ground or support phase.
A-2
Chapter 3: Ordinal Optimization
Example 3-1: Ordinal Optimization with the Blind Pick (BP) selection rule Define
the solution space Θ of size N = 100, such that each solution θi is indexed with
i = 1, 2, · · · , N . N is chosen rather small such that the working precision of MATLAB
(the largest number MATLAB can represent) is sufficient. For larger search spaces, the
number of combinations explode and the limit to the working precision is reached quite
soon.
The performance function J is defined as
J(θi ) = i1/4 .
By defining the performance in such a way, the solutions are already ordered by their
performance. Hence, the OPC is directly obtained, as given in Figure A-4.
Now, select for example g = 5, s = 30 and k = 1. Then, using equation 3-1, the
alignment probability is calculated as P (|G ∩ S| > k) = 0.84.
Then a random combination of s = 30 indexes are selected from N = 100 and hence
the selected subset is defined as
S = {θi } ,
Django D. van Amstel
i = 85, 58, 33, 52, 50, 30, 86, 13, 38, 84, ...
...59, 53, 42, 29, 28, 34, 65, 66, 97, 14, ...
...75, 100, 2, 98, 61, 57, 83, 11, 71, 51.
Master of Science Thesis
A-2 Chapter 3: Ordinal Optimization
107
Figure A-3: The schedule of bipedal static stable walking. White areas represents the aerial
phase, dark grey the support phase of the leg
Figure A-4: The Ordered Performance Curve for the performance function of Example 3-1
Master of Science Thesis
Django D. van Amstel
108
Numerical examples per chapter
Then, the elements in set S are ordered according to their corresponding performance
value J(θi ). For notational clarity, only the first and last three entries will be presented:
1.19 ≤
1.82 ≤
1.90 ≤ · · · ≤ 3.14 ≤
3.15 ≤
3.16
.
J(θ2 ) ≤ J(θ11 ) ≤ J(θ13 ) ≤ · · · ≤ J(θ97 ) ≤ J(θ98 ) ≤ J(θ100 )
Since k = 1, the solution of the algorithm is θ2 . Note that the good enough subset was
θi for i = 1, 2, 3, 4, 5 and indeed, the solution θ2 is in the good enough subset.
The above procedure is now repeated for 100.000 times. Then, the number of times
that the size |G ∩ S| was larger than k is counted. By dividing this counted number by
100.000, the alignment probability is approximated by simulation, resulting in Psim (|G∩
S| > k).
This experiment is repeated five times. The results are summarized as
Psim (|G ∩ S| > k) =
h
i
0.8432 0.8405 0.8392 0.8398 0.8429 .
These results correspond to the theoretical determined alignment probability P (|G ∩
S| > k) = 0.84.
A-3
Chapter 4: Structural analysis of max-plus linear models
For this example, the
Petri Net of Example 2-5 as depicted in Figure A-2 will be used. From this Petri Net,
the matrices A0 and A1 were obtained as
Example 4-1: The single event cycle state calculation method



A0 = 

ε
ε
ε
τδ

ε τf ε
ε ε τf 


ε ε ε 
ε ε ε
and




A1 = 
e ε
ε e
τg τδ
ε τg
ε
ε
e
ε
ε
ε
ε
e



.

Recall moreover that the first two state vectors were calculated using the obtained MPL
model as




2.2
0.8
 2.9 
 1.5 



.

 , x(2) = 
x(1) = 
 1.7 
 0.3 
2.4
1.0
Django D. van Amstel
Master of Science Thesis
A-3 Chapter 4: Structural analysis of max-plus linear models
109
Figure A-5: The communication graph of A0 of the bipedal walking model
In this example, it will be shown how one can obtain x(2) from x(1), A0 and A1 using the
individual event cycle method. First, lmax will be determined from the communication
graph of A0 . This graph is depicted in Figure A-5
The only path is p = (l1 , t1 , l2 , t2 ), and hence the longest path length is |p| = 4 and
therefore lmax = 4. This can be checked by noting that
⊗(lmax −1)
A0




= A⊗3
0 = 
ε
ε
ε
ε
ε ε ε
ε 1.2 ε
ε ε ε
ε ε ε



,

while A⊗4
0 = E.
Then by noting that x(2, 0) = x(1) =
method gives
h
0.8 1.5 0.3 1



x(2, 1) = A1 ⊗ x(2, 0) = 




x(2, 2) = A0 ⊗ x(2, 1) = 




x(2, 3) = A0 ⊗ x(2, 2) = 




x(2, 4) = A0 ⊗ x(2, 3) = 

Master of Science Thesis
i⊤
0.8
1.5
1.7
1.8
0.5
0.5
ε
0.2
0.8
1.5
ε
1.0
2.2
2.9
ε
2.4
the individual event cycle




















Django D. van Amstel
110
Numerical examples per chapter
The aggregated state is obtained as
x(2) =
lM
max
l=1



x(2, l) = 

0.8
1.5
1.7
1.8


 
 
⊕
 
0.5
0.5
ε
0.2


 
 
⊕
 
0.8
1.5
ε
1.0


 
 
⊕
 
2.2
2.9
ε
2.4






=


2.2
2.9
1.7
2.4



.

The aggregated state x(2) is indeed equal to the one calculated in Example 2-5. Note
that it is interesting to see that only x3 (2) comes from x(2, 1) and all other states
originate from x(2, 4). Apparantly, the connection between two succesive states x(k) is
via x3 (k).
A-4
Chapter 5: The max-plus linear disturbance model
In this example, the disturbance model
for the Petri Net of the bipedal walking model will be derived. See Figure A-2 for the
Petri Net.
Example 5-1: Obtaining the disturbance model
The disturbance model will be obtained in both the algebraic way by using Proposition
5.4 and graphically from the Petri Net using the procedure described by equation 2-7.
It will be shown that both methods arrive at the same result.
Using Proposition 5.4 one can directly derive that







A0,dis (k) = 






ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
τδ
d1 (k)
ε
ε
ε
ε
ε
d2 (k)
ε
ε
ε
ε
ε
d3 (k)
ε
ε
ε
ε
ε
d4 (k) ε

ε τf ε
ε ε τf 

ε ε ε 


ε ε ε 
,
ε ε ε 

ε ε ε 


ε ε ε 
ε ε ε
and

A1,dis






=






e
ε
ε
ε
ε
ε
ε
ε
ε
e
ε
ε
ε
ε
ε
ε
ε
ε
e
ε
ε
ε
ε
ε
ε ε ε
ε ε ε
ε τ g τδ
e ε τg
ε e ε
ε ε e
ε ε ε
ε ε ε
ε
ε
ε
ε
ε
ε
e
ε
ε
ε
ε
ε
ε
ε
ε
e







,






The extended Petri Net is given in Figure A-6. Now, using the method of equation
2-7 leads to the exact same matrices as obtained above by using Proposition 5.4. The
reader is invited to verify this result.
Django D. van Amstel
Master of Science Thesis
A-4 Chapter 5: The max-plus linear disturbance model
111
Figure A-6: The Petri Net of the disturbance model, obtained by replacing all original states
xi (k) by the actual and scheduled state, xbi (k) and xi (k). They are connected via a place with
holdingtime di (k)
In this example
the Petri Net of Figure 4-2 at the beginning of chapter 4 will be used. The A0 and A1
matrices were obtained as
Example 5-2: Propagating disturbances using the disturbance model
and
such that






ε ε ε

A0 =  1 ε 0 
;
ε ε ε
2 ε 2

A1 =  ε 0 ε 

ε 3 0
2 ε 2

A= 3 3 3 

ε 3 0
With initial condition x(0) = x0 = [ e 0 0 ]⊤ the first states are calculated as


x(1) = 

x(2) = 

Master of Science Thesis
2
3
3
5
6
6


,


.
Django D. van Amstel
112
Numerical examples per chapter
Now assume it is measured that xb1 (1) = 2.2. How this delay is propagated through the
system can be calculated by iterating the recurrence relation
Substituting xb(1) =
h
x(k) = A0 ⊗ xb(k) ⊕ A1 ⊗ x(k − 1).
2.2 3 3
i⊤
yields




2

x(1) = 
 3.2 
3
and
5

x(2) = 
 6.2  .
6.2
Note that x1 (2) is now again calculated without the delay and has to be adapted. This
procedure has to be iterated until the state x(2) converges. In this particular case, it
has already arrived at the solution and no succesive calculations are necessary.
Now the above is calculated using the disturbance model. Extending the original matrices yields

and




A0,dis (k) = 




A1,dis
By substituting d(1) =
as
h
ε
ε
ε
ε
ε
ε
ε
ε
ε
d1 (k)
ε
ε
ε
d2 (k)
ε
ε
ε
d3 (k)





=




0.2 0 0





Adis (1) = 




Django D. van Amstel
i⊤
e
ε
ε
ε
ε
ε
ε
e
ε
ε
ε
ε
ε
ε
e
ε
ε
ε
ε
ε
ε
2
ε
ε
ε
ε
ε
ε
e
3
ε
1
ε
ε
ε
ε
ε
ε
ε
2
ε
e
ε
ε
ε
ε
ε
ε
ε
0
ε
ε
ε
ε






;








.




the Adis matrix is obtained using equation 2-7
e
1.2
ε
0.2
1.2
ε
ε
e
ε
ε
e
ε

ε 2 ε 2
e 3.2 3 3.2 


e ε 3 ε 
.
ε 2.2 ε 2.2 

e 3.2 3 3.2 

e ε 3 e
Master of Science Thesis
A-5 Chapter 6: The adaptive Blind Pick selection rule
113
Now the extended state of the disturbance model is directly obtained as





xe(1) = 




and





xe(2) = 





2
3.2
3
2.2
3.2
3
5
6.2
6.2
5
6.2
6.2














.




Recall that the disturbance model state xe(k) had the structure
xe(k) =
"
x(k)
xb(k)
#
.
Hence, the first three states of xe(k) represent the scheduled state x(k), whilst the last
three states represent xb(k), the actual or measured state. See that because of the delay
in x1 (k) the scheduling of this state is not updated. However, the scheduled x2 (k) is
updated because it depends on x1 (k).
Although it comes at the computational cost of recalculating the A matrix everytime
a disturbance is measured, the disturbance model does handle these disturbances in a
systematic way that moreover makes a clear distinction between scheduled events and
the actual happening of events.
A-5
Chapter 6: The adaptive Blind Pick selection rule
Example 6-1: Reshaping the cumulative distribution function with the annealling temperature Define the solution space Θ of size N = 100, such that each solution θi is
indexed with i = 1, 2, · · · , N . The performance function J is defined as
J(θi ) = i1/4 .
See Figure A-4 in Example 3-1 for a plot of J(θ). Then define the annealing temperatures as
h
i
h
i
Tlow Tmed Thigh = 10 1 0.1 .
For these annealing temperatures, the CDF is calculated using equation 6-2. The
results are depicted in Figure A-7. For T = 10 the CDF is linear; this corresponds
Master of Science Thesis
Django D. van Amstel
114
Numerical examples per chapter
Figure A-7: The cumulative distribution function FΘ (θ), calculated from J(θi ) for selected
annealing temperatures T .
to a uniform distribution or random selection. For T = 1 however, the slope of the
CDF is larger than for T = 10, but the derivative decreases for increasing index i.
This translates to that the solutions with lower indexes i have a higher probability of
selection. This effect is very extreme for T = 0.1. Approximately only the first six
indexes have a probability larger than 0 to be selected. Inspecting Figure A-4 shows
that these solutions are indeed the better solutions.
Let the
i custom CDF be
1 2 · · · 10 as
Example 6-2: The discrete inverse transform sampling
h method
defined by FX (xn ) of the random discrete set X =
x2
FX (xn ) = P10n
i=1
i2
.
It is depicted by the solid black line in Figure A-8. Note that the discrete function
FX (xn ) has been transformed to a continuous stairs function and that since the CDF
is quadratic, when drawing samples xn from X the higher values should appear more
often.
Then 5 samples U are drawn at random from (0, 1), for example
U=
h
i
0.4314 0.9106 0.1818 0.2638 0.1455 .
These points are represented by circles on the y-axis in Figure A-8. Then the inverse
transform sampling method can be explained graphically by extending these points to
the right, until they hit the curve of FX (xn ). This is represented by the dashed lines.
Django D. van Amstel
Master of Science Thesis
A-6 Chapter 7: Switching Max-Plus Linear Feedback Methods
115
Figure A-8: The CDF FX (x)
Then the xn value for which the dashed line hits the curve of FX (xn ) corresponds to a
sample from X with CDF FX (xn ).
The numerical procedure is shown for U (1) = 0.4314. For this value ∆ is calculated as
∆ = FX (xn )−U (1) =
h
i
−0.43 −0.42 −0.40 −0.35 −0.2886 −0.20 −0.07 0.10 0.31 0.57 .
The zero crossing is at ∆(7). Hence, the resulting sample is x7+1 = x8 . This result
indeed corresponds to the figure.
Finally, 100.000 samples are selected at random from (0, 1) and transformed using
the inverse transform sampling method with FX (xn ). Both resulting histograms are
depicted in Figure A-9. While the left histogram of u clearly shows a uniform selection,
the right histogram clearly is sampled according to FX (xn ).
A-6
Chapter 7: Switching Max-Plus Linear Feedback Methods
In the following examples the reactive feedback control loop will be explained by showing simulations. Three cases will be presented;
1. The disturbance can be mitigated directly; it does not influence other states.
2. The disturbance is mitigated within the same aggregated event iteration k.
Master of Science Thesis
Django D. van Amstel
116
Numerical examples per chapter
Figure A-9: Comparison of histograms. Left: histogram of u, uniformly drawn from (0,1). Right:
Histogram of x, obtained using the Inverse transform sampling method with FX (xn ) from u.
Table A-1: Holding times in the Petri Net modeling Bipedal walking
Ti
τf
τg
τ∆
value / s
0.5
0.4
0.5
0.3
0.4
0
τf,min
τg,min
τ∆,min
3. The disturbance is mitigated in two aggregated event iterations, k and k + 1.
For the three examples the Petri Net modeling bipedal walking will be used. The Petri
Net is presented in Figure A-2 of example 2-5.
The holding times are summarized in Table A-1 Note that the flight time and the
double stance time have a minimal value different from the nominal value.
Define the state as




x(k) = 
t1 (k)
t2 (k)
l1 (k)
l2 (k)



.

Then the system matrices are obtained as



A0 = 

Django D. van Amstel
ε
ε
ε
0.2

ε 0.5 ε
ε ε 0.5 

;
ε ε
ε 
ε ε
ε
Master of Science Thesis
A-6 Chapter 7: Switching Max-Plus Linear Feedback Methods



A1 = 

0
ε ε
ε
0 ε
0.4 0.2 0
ε 0.4 ε




A0,min = 



A1,min = 

ε
ε
ε
0
117
ε
ε
ε
0



;


ε 0.3 ε
ε ε 0.3 

;
ε ε
ε 
ε ε
ε
0
ε
ε
0
0.4 0
ε 0.4
ε
ε
0
ε
ε
ε
ε
0



.

Note that from A0 it follows that lmax = 4. Let the initial condition be

0
0
0
0

x(0) = x0 = 





.

The first three states are then given by




x(1) = 
0.9
1.6
0.4
1.1



;





x(2) = 
2.3
3.0
1.8
2.5



;


3.7
4.4
3.2
3.9



x(3) = 



.

Moreover, using the single event iteration method the single event states x(1, l) are
found as
h
i⊤
0 0 0 0
x(1, 0) =
x(0)
=
x(1, 1) = A1 ⊗ x(1, 0) =
x(1, 2) = A0 ⊗ x(1, 1) =
x(1, 3) = A0 ⊗ x(1, 2) =
x(1, 4) = A0 ⊗ x(1, 3) =
h
0 0 0.4 0.4
h
ε 0.7 ε 1.1
h
h
i⊤
0.9 0.9 ε 0.2
ε 1.6 ε ε
This state evolution will be used in Example 7-1, 7-2 and 7-3.
i⊤
i⊤
i⊤
Example 7-1: Direct mitigation of the disturbance Let it be measured that d3 (1) = 0.1.
Using the disturbance model, the resulting updated state would be



xb(1) = 

Master of Science Thesis
1
1.7
0.5
1.2



.

Django D. van Amstel
118
Numerical examples per chapter
By step 1 in the summary of section 7-1-2, it is obained that lcurrent = 1 and the negative
disturbance matrix is found in step 2 as




D− (1, 1) = 
0
ε
ε
ε
ε
ε
ε
0
ε
ε
ε −0.1 ε
ε
ε
0



.

The resulting updated matrices are found using equation 7.3:



Ab0 (1, 1) = 



Ab1 (1, 1) = 

0
ε ε
ε
0 ε
0.4 0.2 0
ε 0.4 ε
ε
ε
ε
0


ε
ε
ε
0.2

ε 0.5 ε
ε ε 0.5 

;
ε ε
ε 
ε ε
ε


.

Then, the next single event cycle state is updated as
xb(1, 2) = Ab0 (1, 1) ⊗ x(1, 1) =
h
0.9 0.9 ε 0.2
i⊤
.
The error δ is already zero;



δ = (xb(1, 2) − x(1, 2)) = 

0.9
0.9
ε
0.2


 
 
−
 
0.9
0.9
ε
0.2






=


0
0
0
0



.

Hence, the next single event cycle states do not have to be recomputed and the rescheduled state is


0.9
 1.6 

x′ (1) = 

.
 0.5 
1.1
Note that since the disturbance could be accounted for within one step of the single
event cycle, all other (future) states remain uninfluenced, while the disturbance model
did reschedule all future states 0.1s later as the disturbance model does not take the
minimal matrices into account.
In this example,
it will be shown what happens if a disturbance cannot be accounted for in one step.
The same state is disturbed as in the previous example but a larger delay is measured:
d3 (1) = 0.4.
Example 7-2: Mitigation of the disturbance within the event iteration
Django D. van Amstel
Master of Science Thesis
A-6 Chapter 7: Switching Max-Plus Linear Feedback Methods
119
Again, the negative disturbance matrix is calculated as



D− (1, 1) = 

resulting in


Ab0 (1, 1) = 


0
ε
ε
ε

ε
ε
ε
0
ε
ε
ε −0.4 ε
ε
ε
0
ε
ε
ε
0.2


,


ε 0.3 ε
ε ε 0.5 

,
ε ε
ε 
ε ε
ε
such that
xb(1, 2) = Ab0 (1, 1) ⊗ x(1, 1) =
h
1.1 0.9 ε 0.2
i⊤
.
This gives the error



δ = (xb(1, 2) − x(1, 2)) = 

1.1
0.9
ε
0.2


 
 
−
 
0.9
0.9
ε
0.2






=


0.2
0
0
0



.

Since the disturbance is not mitigated, the above procedure is repeated for the next
state in the single event cycle. Let the next negative disturbance matrix be given from
the error δ as


−0.2 ε ε ε
 ε
0 ε ε 


,
D− (1, 2) = 
 ε
ε 0 ε 
ε
ε ε 0
resulting in


Ab0 (1, 2) = 


ε
ε
ε
0

ε 0.5 ε
ε ε 0.5 

.
ε ε
ε 
ε ε
ε
The next single event state is now updated as
xb(1, 3) = Ab0 (1, 2) ⊗ x(1, 2) =
h
ε 0.7 ε 1.1
i⊤
.
This results in the error



δ = (xb(1, 3) − x(1, 3)) = 

Master of Science Thesis
ε
0.7
ε
1.1


 
 
−
 
ε
0.7
ε
1.1






=


0
0
0
0



.

Django D. van Amstel
120
Numerical examples per chapter
And again, the disturbance is mitigated, resulting in the state


xb′ (1) = 



1.1
1.6
0.8
1.1


.

The reader is invited to check that x(2) remains unchanged under the updated state
x(1).
Example 7-3: Mitigation of the disturbance over two event iteration
Now assume
d4 (1) = 0.4, such that lcurrent = 3.
Calculting again D− (1, 3) and Ab0 (1, 3) yields
xb(1, 4) = Ab0 (1, 3) ⊗ x(1, 3) =
The remaining error is


δ=


0
0.2
0
0
h
ε 1.8 ε ε
i⊤
.





However, lmax has been reached and hence the aggregated event counter is increased by
1.
Let
x(2, 0) = xb(1) =
h
0.9 1.8 0.4 1.5
i⊤
Then, using the previous error, the negative disturbance matrix is obtained as



D− (2, 0) = 

0
ε
ε ε
ε −0.2 ε ε
ε
ε
0 ε
ε
ε
ε 0



.

From D− (2, 0) the AbM matrices are calculated and the next single event state is obtained as
x(2, 1) = Ab1 ⊗ x(2, 0) =
and the error



δ = (xb(2, 1) − x(2, 1)) = 

h
0.9 1.8 1.8 2.2
0.9
1.8
1.8
2.2


 
 
−
 
0.9
1.6
1.8
2






=


i⊤
,
0
0.2
0
0.2



.

Iterating one cycle more yields an error vector of zero. The reader is invited to do these
calculations.
Django D. van Amstel
Master of Science Thesis
A-7 Chapter 8: Case Study: The Zebro hexapod walking robot
A-7
121
Chapter 8: Case Study: The Zebro hexapod walking robot
In this example it will be shown how
to go from an aggregated representation θ of the mode of operation to an MPL state
space model. The reader is referred to [15] for a detailed treatment of the theory.
Example 8-1: Obtaining a MPL model from θ
let


L2

θ6 =  0.3 

0
such that tauf = θ(2) = 0.75s, τ∆ = θ(3) = 0s and
L = {{4, 5}, {1, 6}, {2, 3}} .
There are three leg groups, thus m = 3 and
τg = (τf ⊗ τ∆ )⊗m − τf = (0.3 ⊗ 0)⊗m − 0.3 = 0.6s.
Using equations 8-7 and 8-8 the P and Q matrices are obtained as





P =




ε
0
0
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
0
ε
ε
ε
ε
0
0
ε
ε
ε
ε
0
ε
0
0
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
0
0
ε
ε
ε
ε
0
0
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε










and





Q=




Master of Science Thesis





.




Django D. van Amstel
122
Numerical examples per chapter
Subsequently the AM matrices are found using equation 8-3 as












A0 = 











ε
ε
ε
ε
ε
ε
ε
0
0
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
0
ε
ε
ε
ε
0
ε
ε
ε
ε
ε
ε
0
ε
ε
ε
ε
0

ε 0.3 ε
ε
ε
ε
ε
ε ε 0.3 ε
ε
ε
ε 


ε ε
ε 0.3 ε
ε
ε 

ε ε
ε
ε 0.3 ε
ε 

ε ε
ε
ε
ε 0.3 ε 


ε ε
ε
ε
ε
ε 0.3 

ε ε
ε
ε
ε
ε
ε 

0 ε
ε
ε
ε
ε
ε 


0 ε
ε
ε
ε
ε
ε 

ε ε
ε
ε
ε
ε
ε 

ε ε
ε
ε
ε
ε
ε 

ε ε
ε
ε
ε
ε
ε
and












A1 = 











ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
0.6 ε
ε
ε
ε
ε
ε 0.6 ε
ε
ε
ε
ε
ε 0.6 ε
ε
ε
ε
0
0 0.6 ε
ε
ε
0
0
ε 0.6 ε
ε
ε
ε
ε
ε 0.6
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε

ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε ε
ε ε 


ε ε 

ε ε 

ε ε 


ε 0.3 
,
ε ε 

ε ε 


ε ε 

ε ε 

ε ε 

ε ε
such that the MPL model x(k + 1) = A ⊗ x(k) is obtained as




















t1 (k + 1)
t2 (k + 1)
t3 (k + 1)
t4 (k + 1)
t5 (k + 1)
t6 (k + 1)
l1 (k + 1)
l2 (k + 1)
l3 (k + 1)
l4 (k + 1)
l5 (k + 1)
l6 (k + 1)




















=


















Django D. van Amstel
0.9
1.2
1.2
ε
ε
ε
0.6
0.9
0.9
ε
ε
ε
0.6
0.9
0.9
0.3
0.3
0.6
0.3
0.6
0.6
0
0
0.3
0.6
0.9
0.9
0.3
0.3
0.6
0.3
0.6
0.6
0
0
0.3
1.2
1.5
1.5
0.9
ε
1.2
0.9
1.2
1.2
0.6
ε
0.9
1.2
1.5
1.5
ε
0.9
1.2
0.9
1.2
1.2
ε
0.6
0.9
ε
1.2
1.2
ε
ε
0.9
ε
0.9
0.9
ε
ε
0.6
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε
ε


 
 
 
 
 
 
 
 
 
⊗
 
 
 
 
 
 
 
 
 
t1 (k)
t2 (k)
t3 (k)
t4 (k)
t5 (k)
t6 (k)
l1 (k)
l2 (k)
l3 (k)
l4 (k)
l5 (k)
l6 (k)










.









Master of Science Thesis
A-8 MATLAB code
A-8
1
2
3
4
5
6
7
8
9
123
MATLAB code
%% EXAMPLE 2-4
e = −inf ;
A = [−1 1 1 ; 1 e ] ;
A2 = MPtimes ( A , A ) ;
A3 = MPtimes ( A , A2 ) ;
A4 = MPtimes ( A , A3 ) ;
A5 = MPtimes ( A , A4 ) ;
%% EXAMPLE 2-5
e = −inf ;
10
11
12
13
tf = 0 . 5 ;
tg = 0 . 3 ;
td = 0 . 2 ;
14
15
16
17
18
19
20
21
A0 = [ e e tf e ; e e e tf ; e e e e ; td e e e ] ;
A1 = [ 0 e e e ; e 0 e e ; tg td 0 e ; e tg e 0 ] ;
A = calculateA ( A0 , A1 ) ;
events_state ( 1 , : ) = zeros ( 1 , 4 ) ;
for i = 2 : 3
events_state ( i , : ) = MPtimes ( A , events_state ( i − 1 , : ) ’ ) ’ ;
end
22
23
sz = size ( events_state ) ;
24
25
26
27
28
29
30
31
32
for i = 2 : sz ( 1 )
for p = 1 : 2
rectangle ( ’Position ’ , [ events_state ( i , p+2) , p −0.5 , events_state ( i , p )
−events_state ( i , p+2) , 1 ] , ’FaceColor ’ , ’b’ )
end
end
title ( ’Schedule of bipedal walking from MPL model ’ )
ylabel ( ’leg index ’ )
xlabel ( ’time / s’ )
33
34
%% EXAMPLE 3-1
35
36
37
38
39
40
41
42
43
44
45
N = 100;
g= 5 ;
s= 3 0 ;
k = 1;
idx = 1 : N ;
J = idx . ^ ( 1 / 4 ) ;
stairs ( idx , J ) ;
title ( ’Ordered Performance Curve ’ )
xlabel ( ’N’ )
ylabel ( ’J(\ theta_i )’ )
46
47
48
49
% for k = 1: min(g,s)
%
P(k) = calcPOO (g,s,k,N);
% end
Master of Science Thesis
Django D. van Amstel
124
Numerical examples per chapter
50
51
52
53
Z = 10000;
counter = 0 ;
for z = 1 : Z
54
55
56
57
58
59
60
61
62
63
64
65
% Because indexes are already ordered ,
S = randperm ( N , s ) ;
JS = J ( S ) ;
[ JS , idx ] = sort ( JS ) ;
S = S ( idx ) ;
GS = S ( 1 : k ) ;
if GS <= g
counter = counter +1;
end
end
Psim = counter /Z ;
66
67
68
%% Ex 4-1
e = −inf ;
69
70
71
72
73
74
75
76
77
tf = 0 . 5 ;
tg = 0 . 3 ;
td = 0 . 2 ;
lmax = 3 ;
A0 = [ e e tf e ; e e e tf ; e e e e ; td e e e ] ;
A1 = [ 0 e e e ; e 0 e e ; tg td 0 e ; e tg e 0 ] ;
A = calculateA ( A0 , A1 ) ;
events_state ( 1 , : ) = zeros ( 1 , 4 ) ;
78
79
80
81
for i = 2 : 3
events_state ( i , : ) = MPtimes ( A , events_state ( i − 1 , : ) ’ ) ’ ;
end
82
83
84
individual_state = e∗ ones ( lmax , 4 ) ;
individual_state ( 1 , : ) = events_state ( 2 , : ) ;
85
86
87
88
89
90
91
92
93
for l = 0 : lmax
if l == 0
individual_state ( l + 1 , : ) = MPtimes ( A1 , events_state ( 2 , : ) ’ ) ’ ;
else
individual_state ( l + 1 , : ) = MPtimes ( A0 , events_state ( l , : ) ’ ) ’ ;
end
end
events_state_check = max ( individual_state )
94
95
96
97
98
99
100
101
%% EXAMPLE 5-2
e = −inf ;
A0 = [ e e e ; 1 e 0 ; e e e ] ;
A1 = [ 2 e 2 ; e 0 e ; e 3 0 ] ;
A = calculateA ( A0 , A1 ) ;
x = [0 0 0; 2 3 3; 5 6 6] ’;
xhat = [ 0 0 0 ; 2 . 2 3 3 ] ’ ;
102
Django D. van Amstel
Master of Science Thesis
A-8 MATLAB code
103
104
105
125
xhat ( : , 2 ) = MPplus ( MPtimes ( A0 , xhat ( : , 2 ) ) , MPtimes ( A1 , x ( : , 1 ) ) ) ;
xhat ( 1 , 2 ) = 2 . 2 ;
xhat ( : , 3 ) = MPtimes ( A , xhat ( : , 2 ) )
106
107
108
109
d1 = 0 . 2 ;
d2 = 0 . 0 ;
d3 = 0 . 0 ;
110
111
112
113
114
A0dis = [ epsilonmatrix ( 3 ) A0 ; [ d1 e e ; e d2 e ; e e d3 ] epsilonmatrix ( 3 ) ] ;
A1dis = [ unitmatrix ( 3 ) A1 ; epsilonmatrix ( 3 ) unitmatrix ( 3 ) ] ;
A0stardis = MPstar ( A0dis ) ;
Adis = calculateA ( A0dis , A1dis ) ;
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
widetildeX ( : , 2 ) = MPtimes ( Adis , [ x ( : , 1 ) ; xhat ( : , 1 ) ] ) ;
d1 = 0 ;
d2 = 0 ;
d3 = 0 ;
A0dis = [ epsilonmatrix ( 3 ) A0 ; [ d1 e e ; e d2 e ; e e d3 ] epsilonmatrix ( 3 ) ] ;
A1dis = [ unitmatrix ( 3 ) A1 ; epsilonmatrix ( 3 ) unitmatrix ( 3 ) ] ;
A0stardis = MPstar ( A0dis ) ;
Adis = calculateA ( A0dis , A1dis ) ;
widetildeX ( : , 3 ) = MPtimes ( Adis , widetildeX ( : , 2 ) )
%% EXAMPLE 6-1
N = 100;
Nidx = 1 : N ;
J = Nidx . ^ ( 1 / 4 ) ;
temp = [ 1 0 1 . 1 ] ;
130
131
132
133
if min ( J ) <= 0
J = J + abs ( min ( J ) +1) ;
end
134
135
136
137
for idx = 1 : length ( temp )
% Calculate non - normalized probability .
P = exp (( −( J ) ) . / temp ( idx ) ) ;
138
139
% Calculate normalized probability (hence reach of Pn = [0 ,1])
for j = 1 : length ( P )
Pn ( j ) = P ( j ) . / sum ( P ) ;
% Create CDF of the probabilities
F ( idx , j ) = sum ( Pn ( 1 : j ) ) ;
end
140
141
142
143
144
145
146
end
147
148
149
150
151
152
plot ( Nidx , F ( 1 , : ) , ’k-’ , Nidx , F ( 2 , : ) , ’k--’ , Nidx , F ( 3 , : ) , ’k*-.’ )
title ( ’The cumulative distribution function for selected annealing
temperatures ’ )
xlabel ( ’index i of \ theta_i ’ )
ylabel ( ’F_\Theta (\ theta )’ )
legend ( ’T = 10 ’ , ’T = 1’ , ’T = 0.1 ’ )
153
154
Master of Science Thesis
Django D. van Amstel
126
155
156
157
158
159
160
161
162
Numerical examples per chapter
%% EXAMPLE 6-2
x = 1:10;
P = x.^(2) ;
for j = 1 : length ( P )
Pn ( j ) = P ( j ) . / sum ( P ) ;
% Create CDF of the probabilities
F ( j ) = sum ( Pn ( 1 : j ) ) ;
end
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
u = rand ( 1 , 5 ) ;
for i = 1 : length ( u )
U = u(i) ;
y = ( F−U ) ;
t1=y ( 1 : length ( x ) −1) ;
t2=y ( 2 : length ( x ) ) ;
tt=t1 . ∗ t2 ;
indx=find ( tt <0) ;
if isempty ( indx )
X(i) = 1;
else
X ( i ) = indx +1;
end
end
180
181
182
183
184
185
186
187
188
189
190
hold on
stairs ( F , ’k-’ )
xcoor = [ zeros ( 1 , length ( u ) ) ; X ] ;
ycoor = [ u ; u ] ;
plot ( xcoor , ycoor , ’k--’ )
plot ( zeros ( 1 , length ( u ) ) , u , ’ko ’ )
hold off
title ( ’F_X(x)’ )
xlabel ( ’x’ )
ylabel ( ’P(X\leq x)’ )
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
u = rand ( 1 , 1 0 0 0 0 0 ) ;
figure ( 1 )
hist ( u )
title ( ’histogram of u’ )
xlabel ( ’value of u \in (0 ,1) ’ )
ylabel ( ’N’ )
for i = 1 : length ( u )
U = u(i) ;
y = ( F−U ) ;
t1=y ( 1 : length ( x ) −1) ;
t2=y ( 2 : length ( x ) ) ;
tt=t1 . ∗ t2 ;
indx=find ( tt <0) ;
if isempty ( indx )
X(i) = 1;
else
Django D. van Amstel
Master of Science Thesis
A-8 MATLAB code
208
209
210
211
212
213
214
215
X ( i ) = indx +1;
end
end
figure ( 2 )
hist ( X )
title ( ’histogram of x’ )
xlabel ( ’value of x \in X’ )
ylabel ( ’N’ )
A-8-1
1
2
127
MATLAB implementation reactive feedback control method
function [ current_state_new , go_to_next_iteration , delayscheck ] =
reactiveiterationstate ( A0 , A0min , A1 , A1min , . . .
current_l , current_lmax , current_state_hat , current_state_hat_new ,
previous_state , delays )
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
% Create D^- matrix for first iteration step
idx = find ( delays ) ;
Dmin = unitmatrix ( length ( delays ) ) ;
for IDX = idx
Dmin ( IDX , IDX ) = −delays ( IDX ) ;
end
% Calculate updated A0 A1 matrices for first iteration step
A0hat = MPplus ( MPtimes ( A0 , Dmin ) , A0min ) ;
A1hat = MPplus ( MPtimes ( A1 , Dmin ) , A1min ) ;
% initialize delayscheck matrix
delayscheck = zeros ( current_lmax , length ( delays ) ) ;
% start iteration
for L = current_l : current_lmax
% Calculate one step ahead in the innercyclic iteration
if L == 0
current_state_hat_new ( L + 1 , : ) = MPtimes ( A1hat , previous_state ’ ) ’ ;
else
current_state_hat_new ( L + 1 , : ) = MPtimes ( A0hat ,
current_state_hat_new ( L , : ) ’ ) ’ ;
end
% Compare delayed schedule with original schedule : check if delays
are
% mitigated .
if L == current_lmax
delayscheck ( L , : ) = max ( current_state_hat_new ) − max (
current_state_hat ) ;
delayscheck ( L , isnan ( delayscheck ( L , : ) ) ) =0;
% lmax reached , stop iteration
go_to_next_iteration = 1 ;
break
else
delayscheck ( L + 1 , : ) = current_state_hat_new ( L + 1 , : )−
current_state_hat ( L + 1 , : ) ;
Master of Science Thesis
Django D. van Amstel
128
Numerical examples per chapter
delayscheck ( L+1, isnan ( delayscheck ( L + 1 , : ) ) ) =0;
idx = find ( delayscheck ( L + 1 , : ) ) ;
% Delays are mitigated , stop iteration
if isempty ( idx )
go_to_next_iteration = 0 ;
break
end
37
38
39
40
41
42
43
end
44
45
% Calculate new D^- matrix for next iteration step
Dmin = unitmatrix ( length ( delays ) ) ;
for IDX = idx
Dmin ( IDX , IDX ) = −delayscheck ( L+1, IDX ) ;
end
46
47
48
49
50
51
% Update A0 A1 matrices for next iteration step
A0hat = MPplus ( MPtimes ( A0 , Dmin ) , A0min ) ;
A1hat = MPplus ( MPtimes ( A1 , Dmin ) , A1min ) ;
52
53
54
55
56
end
57
58
59
1
% Create single state vector for the event domain from inner event matrix
current_state_new = max ( current_state_hat_new ) ;
function lmax = determinelmax ( A0 ) ;
2
3
4
5
6
7
8
9
10
11
12
13
14
15
lmax = 0 ;
zero = 1 ;
sz = size ( A0 ) ;
while zero
check = MPpower ( A0 , lmax ) ;
count = 0 ;
for i = 1 : sz ( 1 )
for j = 1 : sz ( 2 )
if check ( i , j ) == −inf ;
count = count +1;
end
end
end
16
if count >= sz ( 1 ) ∗sz ( 2 )
zero = 0 ;
lmax = lmax ;
else
lmax = lmax +1;
end
clear count
17
18
19
20
21
22
23
24
end
25
26
end
Django D. van Amstel
Master of Science Thesis
A-8 MATLAB code
1
2
function [ x_iter ] = iterationstate ( A0 , A1 , x_previous , lmax )
3
4
5
x_iter ( : , 1 ) = MPtimes ( A1 , x_previous ’ ) ;
for l = 2 : lmax
x_iter ( : , l ) = MPtimes ( A0 , x_iter ( : , l−1) ) ;
end
x_iter = x_iter ’ ;
6
7
Master of Science Thesis
129
Django D. van Amstel
130
Django D. van Amstel
Numerical examples per chapter
Master of Science Thesis
Appendix B
Alignment probabilities as a function
of the annealing temperature
In this appendix the alignment probabilties for a selected set of parameters are presented. They have been obtained by simulation. These probabilities are both given
graphically for quick reference as in table form for more precise interpolation. The
MATLAB code used in creating these plots are given in the final section of this appendix.
B-1
Tables
The following tables contain the alignment probability P (|G ∩ S| > k) as a function
of the desired level of alignment k and the annealing temperature T . In the table
captions are the sizes of the used subsets. Each column corresponds to the desired level
of alignment
k = 1, 2, · · · , min(g, s),
and each row to the annealing temperature
T =
Master of Science Thesis
h
i
0.1 1 2 3 4 5 6 7 8 9 10 100 .
Django D. van Amstel
132
Alignment probabilities as a function of the annealing temperature
Table B-1: Alignment probabilities for N = 1000, g = 10, s = 5
0.1
1
2
3
4
5
6
7
8
9
10
100
1
1.00
0.48
0.19
0.12
0.10
0.09
0.08
0.07
0.07
0.07
0.07
0.05
2
1.00
0.12
0.02
0.01
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
3
1.00
0.01
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
4
1.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
5
1.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
Table B-2: Alignment probabilities for N = 1000, g = 10, s = 10
0.1
1
2
3
4
5
6
7
8
9
10
100
1
1.00
0.73
0.34
0.23
0.19
0.17
0.15
0.14
0.14
0.13
0.13
0.10
2
1.00
0.35
0.06
0.03
0.02
0.01
0.01
0.01
0.01
0.01
0.01
0.00
3
1.00
0.11
0.01
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
4
1.00
0.03
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
5
1.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
6
1.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
7
1.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
8
1.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
9
1.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
10
0.99
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
Table B-3: Alignment probabilities for N = 1000, g = 10, s = 20
0.1
1
2
3
4
5
6
7
8
9
10
100
1
1.00
0.93
0.56
0.41
0.34
0.30
0.28
0.26
0.25
0.25
0.24
0.19
Django D. van Amstel
2
1.00
0.72
0.19
0.09
0.06
0.05
0.04
0.04
0.03
0.03
0.03
0.02
3
1.00
0.45
0.04
0.01
0.01
0.01
0.00
0.00
0.00
0.00
0.00
0.00
4
1.00
0.22
0.01
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
5
1.00
0.09
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
6
1.00
0.03
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
7
1.00
0.01
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
8
1.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
9
1.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
10
1.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
Master of Science Thesis
B-1 Tables
133
Table B-4: Alignment probabilities for N = 1000, g = 10, s = 30
0.1
1
2
3
4
5
6
7
8
9
10
100
1
1.00
0.98
0.71
0.55
0.47
0.42
0.39
0.37
0.35
0.34
0.34
0.27
2
1.00
0.90
0.34
0.18
0.13
0.10
0.09
0.08
0.07
0.06
0.06
0.04
3
1.00
0.72
0.12
0.04
0.02
0.02
0.01
0.01
0.01
0.01
0.01
0.00
4
1.00
0.51
0.03
0.01
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
5
1.00
0.30
0.01
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
6
1.00
0.15
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
7
1.00
0.07
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
8
1.00
0.02
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
9
1.00
0.01
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
10
1.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
Table B-5: Alignment probabilities for N = 1000, g = 10, s = 40
0.1
1
2
3
4
5
6
7
8
9
10
100
1
1.00
0.99
0.81
0.65
0.57
0.52
0.48
0.46
0.44
0.43
0.42
0.34
2
1.00
0.96
0.48
0.28
0.20
0.16
0.14
0.12
0.11
0.11
0.10
0.06
3
1.00
0.88
0.22
0.09
0.05
0.04
0.03
0.02
0.02
0.02
0.02
0.01
4
1.00
0.74
0.08
0.02
0.01
0.01
0.00
0.00
0.00
0.00
0.00
0.00
5
1.00
0.55
0.02
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
6
1.00
0.36
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
7
1.00
0.21
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
8
1.00
0.11
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
9
1.00
0.05
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
10
1.00
0.02
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
Table B-6: Alignment probabilities for N = 1000, g = 10, s = 50
0.1
1
2
3
4
5
6
7
8
9
10
100
1
1.00
1.00
0.87
0.73
0.65
0.60
0.56
0.54
0.52
0.51
0.49
0.40
Master of Science Thesis
2
1.00
0.99
0.61
0.38
0.28
0.23
0.20
0.18
0.16
0.15
0.15
0.09
3
1.00
0.95
0.33
0.14
0.09
0.06
0.05
0.04
0.04
0.03
0.03
0.01
4
1.00
0.87
0.14
0.04
0.02
0.01
0.01
0.01
0.01
0.01
0.00
0.00
5
1.00
0.74
0.05
0.01
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
6
1.00
0.58
0.01
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
7
1.00
0.41
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
8
1.00
0.26
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
9
1.00
0.15
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
10
1.00
0.08
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
Django D. van Amstel
134
Alignment probabilities as a function of the annealing temperature
Figure B-1: The selected annealing temperatures.
B-2
Graphs
For g = 50 and g = 100 The tables become too large to fit the page. Hence, for these
the alignment probability P (|G ∩ S| > k) as a function of the desired level of alignment
k and the annealing temperature T will be depicted graphically in Figures B-2, B-3 and
B-4. One can find the numerical values on the accompanying CD-ROM. The legend
for these figures can be found in Figure B-1.
Django D. van Amstel
Master of Science Thesis
B-2 Graphs
135
Figure B-2: The alignment probabilities for selected T and g = 10.
Master of Science Thesis
Django D. van Amstel
136
Alignment probabilities as a function of the annealing temperature
Figure B-3: The alignment probabilities for selected T and g = 50.
Django D. van Amstel
Master of Science Thesis
B-2 Graphs
137
Figure B-4: The alignment probabilities for selected T and g = 100.
Master of Science Thesis
Django D. van Amstel
138
B-3
Alignment probabilities as a function of the annealing temperature
Used MATLAB code
B-3-1
1
2
3
4
5
6
7
8
9
Main code
%% Alignment probability as a function of T, figure in 6-3
N = 1000;
g = 10;
s = 10;
tau= [ 0 . 0 0 1 0 . 0 1 0 . 1 1 2 3 4 5 6 7 8 9 10 1 0 0 ] ;
k = 1:10;
for i = 1 : length ( temp )
[ Prob1010 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
10
11
12
13
14
15
16
17
18
k = 1:10;
temp = tau ( [ 3 4 5 9 ] ) ;
plot ( k , Prob1010 ( 3 , : ) , ’ko --’ , k , Prob1010 ( 4 , : ) , ’k^--’ , k , Prob1010 ( 5 , : ) , ’k+--’
, k , Prob1010 ( 9 , : ) , ’k*--’ , k , BP1010 , ’k-’ )
legend ( ’T = 0.1 ’ , ’T = 1’ , ’T = 2’ , ’T = 6’ , ’T = 10 ’ )
title ( ’Alignment probability for selected annealing temperatures with g =
10, s = 10, N =1000 ’ )
xlabel ( ’k’ )
ylabel ( ’P(G\cap S > k)’ )
axis ( [ 1 10 0 1 . 1 ] )
19
20
clear all , clc
21
22
23
24
25
26
27
%% APPENDIX B TABLES
N = 1000;
g = [ 1 0 50 1 0 0 ] ;
s = [ 5 10 20 30 40 5 0 ] ;
sigma = [ 0 ] ;
temp = [ 0 . 1 1 2 3 4 5 6 7 8 9 10 1 0 0 ] ;
28
29
% DISREGARD Prob20050 !!!
30
31
32
33
34
35
36
g = 10;
s = 5;
Prob305 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob105 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
37
38
39
40
41
42
43
g = 10;
s = 10;
Prob3010 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob1010 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
44
45
46
g = 10;
s = 20;
Django D. van Amstel
Master of Science Thesis
B-3 Used MATLAB code
47
48
49
50
139
Prob3020 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob1020 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
51
52
53
54
55
56
57
g = 10;
s = 30;
Prob1030 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob1030 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
58
59
60
61
62
63
64
g = 10;
s = 40;
Prob3040 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob1040 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
65
66
67
68
69
70
71
g = 10;
s = 50;
Prob3050 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob1050 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
72
73
74
75
76
77
78
g = 50;
s = 5;
Prob505 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob505 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
79
80
81
82
83
84
85
g = 50;
s = 10;
Prob3010 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob5010 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
86
87
88
89
90
91
92
g = 50;
s = 20;
Prob3020 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob5020 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
93
94
95
96
97
98
99
g = 50;
s = 30;
Prob3030 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob5030 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
Master of Science Thesis
Django D. van Amstel
140
Alignment probabilities as a function of the annealing temperature
100
101
102
103
104
105
106
g = 50;
s = 40;
Prob3040 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob5040 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
107
108
109
110
111
112
113
g = 50;
s = 50;
Prob3050 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob5050 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
114
115
116
117
118
119
120
g = 100;
s = 5;
Prob505 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob1005 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
121
122
123
124
125
126
127
g = 100;
s = 10;
Prob3010 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob10010 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
128
129
130
131
132
133
134
g = 100;
s = 20;
Prob3020 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob10020 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
135
136
137
138
139
140
141
g = 100;
s = 30;
Prob3030 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob10030 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
142
143
144
145
146
147
148
g = 100;
s = 40;
Prob3040 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
for i = 1 : length ( temp )
[ Prob10040 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
149
150
151
152
g = 100;
s = 50;
Prob3050 = zeros ( [ length ( temp ) , min ( g , s ) ] ) ;
Django D. van Amstel
Master of Science Thesis
B-3 Used MATLAB code
153
154
155
141
for i = 1 : length ( temp )
[ Prob10050 ( i , : ) ] = taufunction ( N , g , s , temp ( i ) , 0 ) ;
end
156
157
158
159
160
%% CONVERTING TO LATEX TABLES
temp = [ 0 . 1 1 2 3 4 5 6 7 8 9 10 1 0 0 ] ;
load ( ’g10.mat ’ )
g = 10;
161
162
163
s = 5;
matrix2latex ( Prob105 , ’Prob105 .tex ’ , ’rowLabels ’ , temp , ’columnLabels ’ ,
1 : min ( g , s ) , ’alignment ’ , ’c’ , ’format ’ , ’% -6.2f’ , ’size ’ , ’small ’ ) ;
164
165
166
s = 10;
matrix2latex ( Prob1010 , ’Prob1010 .tex ’ , ’rowLabels ’ , temp , ’columnLabels ’ ,
1 : min ( g , s ) , ’alignment ’ , ’c’ , ’format ’ , ’% -6.2f’ , ’size ’ , ’small ’ ) ;
167
168
169
s = 20;
matrix2latex ( Prob1020 , ’Prob1020 .tex ’ , ’rowLabels ’ , temp , ’columnLabels ’ ,
1 : min ( g , s ) , ’alignment ’ , ’c’ , ’format ’ , ’% -6.2f’ , ’size ’ , ’small ’ ) ;
170
171
172
s = 30;
matrix2latex ( Prob1030 , ’Prob1030 .tex ’ , ’rowLabels ’ , temp , ’columnLabels ’ ,
1 : min ( g , s ) , ’alignment ’ , ’c’ , ’format ’ , ’% -6.2f’ , ’size ’ , ’small ’ ) ;
173
174
175
s = 40;
matrix2latex ( Prob1040 , ’Prob1040 .tex ’ , ’rowLabels ’ , temp , ’columnLabels ’ ,
1 : min ( g , s ) , ’alignment ’ , ’c’ , ’format ’ , ’% -6.2f’ , ’size ’ , ’small ’ ) ;
176
177
178
s = 50;
matrix2latex ( Prob1050 , ’Prob1050 .tex ’ , ’rowLabels ’ , temp , ’columnLabels ’ ,
1 : min ( g , s ) , ’alignment ’ , ’c’ , ’format ’ , ’% -6.2f’ , ’size ’ , ’small ’ ) ;
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
%
%
%
%
%
load (’g50.mat ’)
g = 50;
s = 5;
matrix2latex (Prob505 , ’Prob505 .tex ’, ’rowLabels ’, temp , ’columnLabels ’,
1: min(g,s), ’alignment ’, ’c’, ’format ’, ’% -6.2f’, ’size ’, ’small ’);
%
% s = 10;
% matrix2latex (Prob5010 , ’Prob5010 .tex ’, ’rowLabels ’, temp , ’columnLabels
’, 1: min(g,s), ’alignment ’, ’c’, ’format ’, ’% -6.2f’, ’size ’, ’small ’);
%
% s = 20;
% matrix2latex (Prob5020 , ’Prob5020 .tex ’, ’rowLabels ’, temp , ’columnLabels
’, 1: min(g,s), ’alignment ’, ’c’, ’format ’, ’% -6.2f’, ’size ’, ’small ’);
%
% s = 30;
% matrix2latex (Prob5030 , ’Prob5030 .tex ’, ’rowLabels ’, temp , ’columnLabels
’, 1: min(g,s), ’alignment ’, ’c’, ’format ’, ’% -6.2f’, ’size ’, ’small ’);
%
Master of Science Thesis
Django D. van Amstel
142
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
Alignment probabilities as a function of the annealing temperature
% s = 40;
% matrix2latex (Prob5040 , ’Prob5040 .tex ’, ’rowLabels ’, temp , ’columnLabels
’, 1: min(g,s), ’alignment ’, ’c’, ’format ’, ’% -6.2f’, ’size ’, ’small ’);
%
% s = 50;
% matrix2latex (Prob5050 , ’Prob5050 .tex ’, ’rowLabels ’, temp , ’columnLabels
’, 1: min(g,s), ’alignment ’, ’c’, ’format ’, ’% -6.2f’, ’size ’, ’small ’);
%
% load (’g100 .mat ’)
% g = 100;
%
% s = 5;
% matrix2latex (Prob1005 , ’Prob1005 .tex ’, ’rowLabels ’, temp , ’columnLabels
’, 1: min(g,s), ’alignment ’, ’c’, ’format ’, ’% -6.2f’, ’size ’, ’small ’);
%
% s = 10;
% matrix2latex (Prob10010 , ’Prob10010 .tex ’, ’rowLabels ’, temp , ’
columnLabels ’, 1: min(g,s), ’alignment ’, ’c’, ’format ’, ’% -6.2f’, ’size
’, ’small ’);
%
% s = 20;
% matrix2latex (Prob10020 , ’Prob10020 .tex ’, ’rowLabels ’, temp , ’
columnLabels ’, 1: min(g,s), ’alignment ’, ’c’, ’format ’, ’% -6.2f’, ’size
’, ’small ’);
%
% s = 30;
% matrix2latex (Prob10030 , ’Prob10030 .tex ’, ’rowLabels ’, temp , ’
columnLabels ’, 1: min(g,s), ’alignment ’, ’c’, ’format ’, ’% -6.2f’, ’size
’, ’small ’);
%
% s = 40;
% matrix2latex (Prob10040 , ’Prob10040 .tex ’, ’rowLabels ’, temp , ’
columnLabels ’, 1: min(g,s), ’alignment ’, ’c’, ’format ’, ’% -6.2f’, ’size
’, ’small ’);
%
% s = 50;
% matrix2latex (Prob10050 , ’Prob10050 .tex ’, ’rowLabels ’, temp , ’
columnLabels ’, 1: min(g,s), ’alignment ’, ’c’, ’format ’, ’% -6.2f’, ’size
’, ’small ’);
B-3-2
Functions used in main code
Blind Pick
1
2
% This matlab script calculates the alignment probability for given
subset
% sizes and using the Blind Pick selection rule in Ordinal Optimization
3
4
5
6
7
%
N
g
s
Define set sizes
= 1000;
= 10;
= 10;
Django D. van Amstel
Master of Science Thesis
B-3 Used MATLAB code
143
8
9
10
11
12
13
14
15
16
17
for k = 1 : min ( g , s ) ;
% for different alignment levels
for i = k : min ( g , s ) ;
Psum ( i ) = nchoosek ( g , i ) ∗ nchoosek ( ( N−g ) , ( s−i ) ) / nchoosek ( N , s ) ;
end
% Calculate alignment probability BP for desired level k
BP ( k ) = sum ( Psum ) ;
clear Psum
end
Inverse Transform Sampling Method
1
2
3
function Y = disinvtranssample ( F , x , U )
% This function is the implementation of the inverse transform sampling
% method for a discrete set.
4
5
6
% Calculate Delta
Dlt = ( F−U ) ;
7
8
9
10
11
% Create vectors to find zero crossing
t1=Dlt ( 1 : length ( x ) −1) ;
t2=Dlt ( 2 : length ( x ) ) ;
tt=t1 . ∗ t2 ;
12
13
14
% Find zero crossing
indx=find ( tt <0) ;
15
16
17
18
19
20
21
% Output index
if isempty ( indx )
Y = 1;
else
Y = indx +1;
end
Simulation code
1
2
3
4
5
6
7
function [ ProbTAU , PBP ] = taufunction ( N , g , s , temp , sigma )
% This function simulates the Ordinal Optimization method for given set
% sizes and different temperatures . The simulations are used to
approximate
% the alignment probability . The performance function and number of
% simulations are entered within this fumction .
tic
Z = 100000;
8
9
10
11
%% Create performance function
theta = 1 : N ;
J = ( theta ) . ^ ( 1 / 4 ) ; % performance value
12
13
14
% plot OPC
Master of Science Thesis
Django D. van Amstel
144
15
16
17
18
%
%
%
%
Alignment probabilities as a function of the annealing temperature
figure (1) , plot (theta ,J)
title (’ Ordered performance curve ’)
xlabel (’ index i of \theta_i ’)
ylabel (’ Performance J(\ theta ) ’)
19
20
21
%% LEARNING METHOD
% Calculate non - normalized selection probability .
22
23
24
25
26
27
for i = 1 : length ( J )
P ( i ) = exp (−(J ( i ) ) / temp ) ;
end
% The current mode of operation should have a probability of 0 to be
% chosen , as SDM already decided it SHOULD switch .
28
29
30
31
32
33
34
35
% Calculate normalized probability ( hence reach of Pn = [0 ,1])
for j = 1 : length ( P )
Pn ( j ) = P ( j ) . / sum ( P ) ;
% Create CDF of the probabilities
F ( j ) = sum ( Pn ( 1 : j ) ) ;
end
36
37
38
% transform CDF into piecewise continuous for the inverse transform
sampling
% method
39
40
41
42
43
44
45
46
47
48
49
50
%
%
%
%
%
%
%
%
%
%
%
figure (2) ,plot (theta ,Pn)
axis ([ min(theta ),max(theta ) ,0 ,1])
title (’ Selection Probability Density Function ’)
xlabel (’ index i of \theta_i ’)
ylabel (’P(\ theta ) ’)
figure (3) ,plot (theta ,F)
axis ([ min(theta ),max(theta ) ,0 ,1])
title (’ Cumulative Density Function ’)
xlabel (’ index i of \theta_i ’)
ylabel (’F(\ theta ) ’)
51
52
53
54
55
56
57
58
59
60
61
62
theta_alignment = zeros ( Z , s ) ;
for z = 1 : Z
%% Run Ordinal Optimization using inverse transform sampling
% Create uniform random vector of length s, being the size of the
selected
% subset .
U = rand ( s , 1 ) ;
% Find corresponding x values in F(x) = U
theta_selected = zeros ( 1 , length ( U ) ) ;
J_theta = zeros ( 1 , length ( U ) ) ;
for u = 1 : length ( U )
theta_selected ( u ) = disinvtranssample ( F , theta , U ( u ) ) ;
63
64
65
% Find accompanying performance index
[ ~ , theta_I ] = min ( abs ( theta−theta_selected ( u ) ) ) ;
Django D. van Amstel
Master of Science Thesis
B-3 Used MATLAB code
J_theta ( u ) = J ( theta_I ) ;
66
67
end
68
69
70
% Sort according to performance index
[ ~ , index ] = sort ( J_theta ) ;
theta_selected = theta_selected ( index ) ;
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
145
% output the alignment set
theta_alignment ( z , : ) = theta_selected ;
end
% Check alignment level
alignlevel = zeros ( 1 , Z ) ;
ProbTAU = zeros ( 1 , min ( g , s ) ) ;
for k = 1 : min ( g , s )
alignment = theta_alignment ( : , 1 : k ) − g ;
alignment = alignment <= 0 ;
for i = 1 : Z
if sum ( alignment ( i , : ) ) == length ( alignment ( i , : ) )
alignlevel ( i ) = 1 ;
else
alignlevel ( i ) = 0 ;
end
end
ProbTAU ( k ) = sum ( alignlevel ) / length ( alignlevel ) ;
%% Blind Pick
PBP = 0 ;
end
toc
Master of Science Thesis
Django D. van Amstel
146
Django D. van Amstel
Alignment probabilities as a function of the annealing temperature
Master of Science Thesis
Appendix C
Code of implementation
In this appendix all generated MATLAB code in the implementation of the methods
on the Zebro hexapod robot are given. The code is an extension to given code written
by F. Zhang, Msc. and dr. G.A.D. Lopes.
C-1
Implementation of the event level feedback
In Figure C-1 the current state of the art in using Switching Max-Plus Linear (SMPL)
state space models to control legged locomotion is depicted again. The feedback loop
going into the max plus gait scheduler was actually not implemented on the Zebro, but
only on the RQuad. The RQuad is very similar to the Zebro, the main difference being
the RQuad only has four legs.
In order to obtain the actual states and the delays, the RQuad implementation has
been adapted to work on the Zebro. The code can be found in section C-2-1, lines 391
to 433. This code was implemented on the RQuad by dr. G.A.D. Lopes.
The method works as follows. It sorts the events state in chronological order in the
so-called EventList and checks if and event should be happening. If the event should
be happening, the error between the reference angular position and actual position is
checked. If it is smaller than some bound δ, the event is deleted from the EventList.
Figure C-1
Master of Science Thesis
Django D. van Amstel
148
Code of implementation
If the error (disregarding it’s sign) is larger than the bound, the event timing and thus
the state value is updated with an increment of two times the samplingperiod. The
delay is then propagated one step through the system via the equation
x(k) = A0 ⊗ x(k) ⊕ A1 ⊗ x(k − 1).
Note that hence the schedule is not fully updated. The delays di (k) are then obtained
in the data processor by
di (k) = xbi (k) − xi (k),
∀i.
This implementation is limited in two ways. First, it assumes only delays are possible
as it only starts checking the actual occurrence of an event when its scheduled time
is very close. Hence, if it is very early, the method does not see it. Moreover, it only
adds time to the state when it is updated. The second limitation is that the delay is
propagated only one step in the Petri Net by using the above equation. Hence, only
direct influences of the delay are taken into account. For this event level feedback
this is not an issue, as any other following delays will just be seen as new introduced
disturbances and not as a consequence of an earlier delay.
The proposed disturbance model as introduced in chapter 5 solves this second limitation, as it updated the full system matrix when a disturbances is measured. As it
reschedules the full state, the disturbance modelt distinguishes between actual disturbances and deviations from the schedule as a result of a previous disturbance. This
distinction is important for the higher level feedback, as only actual disturbances should
be taken into account.
A method how to overcome the first limitation has not yet been found.
C-2
Implementation code
In this section the extended code for the implementation of the deliberate feedback
loop on the Zebro is given. The implementation of the reactive feedback loop has
not yet been finished and is hence not represented here. First, the main code will be
represented. Afterwards, the code of the different functions called upon in the main
code is presented.
C-2-1
1
Main code
clear all , clc , close all
2
3
4
5
6
7
%% experiment settings
finaltime = 6 0 0 ; % time in seconds for the experiment to run
sample_rate = 1 / 3 0 0 ;
disabled_legs = [ ] ;
limiter = 0 . 7 ; % fraction of maximal power available for each leg.
Django D. van Amstel
Master of Science Thesis
C-2 Implementation code
149
8
9
%%
INITIALIZE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
10
11
12
13
14
15
16
17
18
%% settings
feedback = 1 ; % turn off/on feedback
anglemargin = . 2 ; % maximal error between reference and actual position
of leg at check
state_buffer = 2 ; % how many past states are kept in the events_state
switch_buffer = 4 ; % number of event iterations must be in between
switching
horizon = 2 ; % number of event iterations that calculated ahead from the
one the time domain is in
19
20
offsettuning = ( ( 3 / 2 ) +.07) ∗pi ; % Calibrate the legs without the FSM
overhead
21
22
23
%% Creating THETA ,XI , LAMBDA
parameters % Load m.file , creates THETA ,XI and initializes LAMBDA
24
25
26
27
28
29
% Initialize current gait and environment
xiindex = 1 ;
thetaindex = 2 ;
switch_necessary = 0 ; % decision variable if a gait switch should be
performed
switch_counter = 0 ; % counter that counts for the switch_buffer
30
31
32
current_theta = THETA ( : , thetaindex ) ;
current_xi = XI ( xiindex ) ;
33
34
35
36
37
38
39
40
41
42
43
44
45
46
buffer_xi = [ ] ;
buff = 0 ;
%% EventList initializing
% ROWS : 1; scheduled time , 2; index in state , 3; event iteration index , 4;
leg
% number , 5; defined angle of event occurence
EventList = [ ] ;
touchdowns = 1 : 6 ;
liftoffs = 7 : 1 2 ;
EventList_temp = zeros ( 1 2 , 5 ) ;
EventList_temp ( : , 2 ) = ( 1 : 1 2 ) ’ ;
EventList_temp ( touchdowns , 4 ) = ( 1 : 6 ) ’ ;
EventList_temp ( liftoffs , 4 ) = ( 1 : 6 ) ’ ;
EventList_temp ( : , 5 ) = current_theta ( 4 ) ;
47
48
49
50
51
%%
Kp
Ki
Kd
PID leg - controllers parameters
= [1.5 1 1.5 1.5 1.5 1 . 5 ] ;
= [.25 .25 .25 .25 .25 . 2 5 ] ;
= [0.025 0.025 0.025 0.025 0.025 0 . 0 2 5 ] ;
52
53
%% Initialize logs
Master of Science Thesis
Django D. van Amstel
150
54
55
56
57
58
59
60
61
62
Code of implementation
TC . Time = zeros ( 1 , floor ( finaltime / sample_rate ) ) ;
% Initialize time
log
TC . Encoder =zeros ( 6 , floor ( finaltime / sample_rate ) ) ;
% Initialize encoder log
TC . Velocity = zeros ( 1 , floor ( finaltime / sample_rate ) ) ;
% Initialize velocity log
TC . Input = zeros ( 1 , floor ( finaltime / sample_rate ) ) ;
% Initialize input log
TC . Reference = zeros ( 6 , floor ( finaltime / sample_rate ) ) ;
% Initialize reference log
TC . Theta = zeros ( 1 , floor ( finaltime / sample_rate ) ) ;
% Initialize gait
log
TC . Xi = zeros ( 1 , floor ( finaltime / sample_rate ) ) ;
% Initialize
environment log
TC . S = zeros ( 1 , floor ( finaltime / sample_rate ) ) ;
% Initialize
switch decision log
TC . temp = zeros ( 1 , floor ( finaltime / sample_rate ) ) ;
63
64
65
66
67
68
69
70
71
72
73
74
%% SIMULATE ENVIRONMENT
XImeasurement = . 7 ∗ ones ( 1 , ceil ( finaltime / sample_rate ) ) ;
c = 2∗ pi / length ( XImeasurement ) ∗ 2 ;
avg = ( max ( XI )+min ( XI ) ) / 2 ;
A = max ( XI )−avg ;
for i = 1 : length ( XImeasurement )
% sine wave
XImeasurement ( i ) = A∗ sin ( c∗i )+avg ;
% linear
% XImeasurement (i) = max(XI) -(max(XI)-min(XI))/ length ( XImeasurement )*
i;
end
75
76
77
78
79
80
81
82
83
84
85
% Square wave
% period = 25; % period in seconds
% signal = [ones (1, period /(2* sample_rate )), zeros (1, period /(2* sample_rate
))];
% while length ( signal ) < finaltime / sample_rate
%
signal = [signal , signal ];
% end
% XImeasurement = max(XI)* signal (1: ceil ( finaltime / sample_rate ))+min(XI);
% plot ( XImeasurement )
%% init
86
87
88
89
90
91
% Define poses
sit_pose = −2.2479∗ ones ( 1 , 6 ) ;
stand_pose = zeros ( 1 , 6 ) ;
limit_stand_pose = 0 . 2 ∗ ones ( 1 , 6 ) ;
directions = −1∗[−1 1 −1 1 −1 1 ] ;
92
93
94
95
96
% initialize low level control variables
encoders = zeros ( 1 , 6 ) ;
input = zeros ( 1 , 6 ) ;
u_input = input ;
Django D. van Amstel
Master of Science Thesis
C-2 Implementation code
97
98
99
100
101
151
saturated_input = input ;
sat_input = input ;
integrator = zeros ( 1 , 6 ) ;
previous_input = input ;
current_time = −0.5;
102
103
104
105
% initialize internal clock and counter
FSM_CLOCK = tic ;
FSM_CLOCK_COUNTER = 0 ;
106
107
108
% ------ Initialize Gait
events_state_size = 1 0 ;
109
110
111
112
113
% Initialize initial gait values such that the robot starts walking on t
=0.
[ A , G , H , P , Q , current_L , Td , Tf , Tg , lambda ] = MOOsyn ( current_theta ) ;
gaiteigenvector = MPComputeEigenVector ( current_L , Td , Tf , 6 ) ;
gaiteigenvector = gaiteigenvector−max ( gaiteigenvector ) ;
114
115
116
117
118
119
120
121
% Initialize max plus event states and matrices
events_state = zeros ( events_state_size , length ( gaiteigenvector ) ) ;
events_scheduled = zeros ( events_state_size , length ( gaiteigenvector ) ) ;
delays = zeros ( events_state_size , length ( gaiteigenvector ) ) ;
A_state = −inf ∗ ones ( 1 2 , 1 2 , events_state_size ) ;
G_state = −inf ∗ ones ( 1 2 , 1 2 , events_state_size ) ;
H_state = −inf ∗ ones ( 1 2 , 1 2 , events_state_size ) ;
122
123
124
events_state_counter = 1 ;
step_counter = 0 ;
125
126
127
128
129
%% Calculate initial gait matrices and information
A_state ( : , : , events_state_counter ) = A ;
G_state ( : , : , events_state_counter ) = G ;
H_state ( : , : , events_state_counter ) = H ;
130
131
132
% initialize event state to be a valid gait vector
events_state ( events_state_counter , : ) = [ gaiteigenvector ] ;
133
134
135
136
137
138
139
% generate initial events
events_state_counter = events_state_counter + 1 ;
A_state ( : , : , events_state_counter ) = A ;
G_state ( : , : , events_state_counter ) = G ;
H_state ( : , : , events_state_counter ) = H ;
events_state ( events_state_counter , : ) = MPTimes ( A_state ( : , : ,
events_state_counter ) , events_state ( events_state_counter − 1 , 1 : 1 2 ) ’ ) ’ ;
140
141
142
143
144
145
146
147
% initialize parameters
turning = 0 ;
turning_direction = 0 ;
auto_turning = 0 ;
manual_turning = 0 ;
turning_control = 0 ;
Master of Science Thesis
Django D. van Amstel
152
148
149
150
151
152
153
154
155
156
157
158
Code of implementation
starting = 1 ;
event_counter = 1 ;
time_counter = 0 ;
new_event = false ;
upcoming_events = [ ] ;
event_missed = 0 ;
first_sample = 1 ;
current_pose = encoders ;
previous_pose = current_pose ;
previous_time = current_time ;
tic ;
159
160
161
162
163
164
165
166
167
168
169
170
% Load all Zebro lower level control and functions
zebro = ZebroSpine ;
zebro . fopen ( ) ;
zebro . calibrate ( ) ;
zebro . setSampleTime ( sample_rate ) ;
zebro . disableLeg ( disabled_legs ) ;
zebro . setSaturation ( limiter ) ;
% zebro . setSaturation ( current_xi );
% read sensors
zebro . fwrite ( zeros ( 1 , 6 ) ) ;
current_pose = zebro . getEncoders ( ) ;
171
172
173
174
175
176
177
178
179
% Calculate initial eventlist
EventList_temp ( touchdowns , 5 ) = thT ;
EventList_temp ( liftoffs , 5 ) = thL ;
EventList_temp ( : , 3 ) = events_state_counter ;
EventList_temp ( : , 1 ) = events_state ( events_state_counter , 1 : 1 2 ) ;
EventList_temp ( : , 5 ) = current_theta ( 4 ) ;
EventList = [ EventList ; EventList_temp ] ;
EventList = EventList ( sortrowsc ( EventList , 1 : 5 ) , : ) ;
180
181
182
183
184
185
186
187
index = sum ( events_state ( 1 : events_state_counter , : ) <= 0 ) ;
INDEX ( 1 , : ) = events_state ( 1 , : ) <= 0 ;
INDEX ( 2 , : ) = zeros ( 1 , 1 2 ) ; % events_state (2 ,:) < toc;
TT = zeros ( 1 , 6 ) ; % touchdown events
LL = zeros ( 1 , 6 ) ; % lift off events
Delta = zeros ( 1 , 6 ) ;
Check = zeros ( 1 , 6 ) ;
188
189
190
191
192
193
194
195
196
197
198
199
200
for p =1:6
d1 = events_state ( index ( p ) , p ) ;
d2 = events_state ( index ( p+6) , p+6) ;
if d1 > d2
TT ( p ) = d1 ;
LL ( p ) = events_state ( index ( p+6)+1, p+6) ;
Delta ( p ) =0;
Check ( p )=LL ( p ) ;
else
TT ( p ) = events_state ( index ( p ) +1, p ) ;
LL ( p ) = d2 ;
Django D. van Amstel
Master of Science Thesis
C-2 Implementation code
Delta ( p ) =2∗pi ;
Check ( p )=TT ( p ) ;
201
202
end
203
204
153
end
205
206
207
208
209
210
211
212
213
% idx = min(index );
% if idx > 1
%
step_counter = step_counter + 1;
%
events_state (1:( events_state_counter -idx +1) ,:) = events_state (idx:
events_state_counter ,:);
%
events_state_counter = events_state_counter - idx;
% end
% new_event = false ;
214
215
216
217
% ------ Compute initial desired pose forward
desired_pose = zeros ( 1 , 6 ) ;
max_transition_speed = 1 0 ;
218
219
220
221
222
223
224
225
% ------ Compute Starting Trajectory
max_arc = max ( abs ( mod ( desired_pose − current_pose+pi , 2 ∗ pi )−pi ) ) ;
time_length = max_arc / max_transition_speed ;
reference_trajectory = get_reference ( current_pose , . . .
desired_pose , time_length , sample_rate , 0 ) ;
reference_counter = 0 ;
reference_samples = time_length / sample_rate ;
226
227
228
229
230
231
232
233
234
235
236
% Stopping Zebro
%% EXIT PHASE
stoppingtime = 5 ;
counter = 0 ;
stopreference = zeros ( stoppingtime / sample_rate , 6 ) ;
for i = 1 : length ( stopreference )
stopreference ( i , : ) = ( stand_pose+offsettuning ) −(( stand_pose+
offsettuning ) −(sit_pose+offsettuning ) ) / length ( stopreference ) ∗i ;
end
%% loop
while true
237
238
239
240
running_time = toc ( FSM_CLOCK ( 1 ) ) ;
if running_time >= FSM_CLOCK_COUNTER ( 1 ) ∗ sample_rate % synchronisation
FSM_CLOCK_COUNTER ( 1 ) = FSM_CLOCK_COUNTER ( 1 ) + 1 ;
241
242
243
244
if running_time > finaltime
break ;
end
245
246
if new_event % If an event from the EventList has been deleted (
thus has occured )
247
248
249
% ------ Check Event State Update Necessary
event_state_update_necessary = toc > min ( events_state (
events_state_counter − 1 , : ) ) ;
Master of Science Thesis
Django D. van Amstel
154
250
Code of implementation
if event_state_update_necessary
251
% ------ Increment Event State
% fill in the missing elements of the timing vector until
min(T,L)>t
while toc > min ( events_state ( events_state_counter−max (
horizon −1 ,0) , : ) )
252
253
254
255
events_state_counter = events_state_counter + 1 ;
switch_counter = switch_counter + 1 ;
INDEX ( events_state_counter , : ) = zeros ( 1 , 1 2 ) ;
256
257
258
259
% Increase state size if necessary
if events_state_counter > events_state_size
events_state = [ events_state ; zeros (
events_state_size , length ( gaiteigenvector ) ) ] ;
events_scheduled = [ events_scheduled ; zeros (
events_state_size , length ( gaiteigenvector ) ) ] ;
delays = [ delays ; zeros ( events_state_size , length (
gaiteigenvector ) ) ] ;
events_state_size = size ( events_state , 1 ) ;
end
260
261
262
263
264
265
266
267
TC . S ( FSM_CLOCK_COUNTER ( 1 ) ) = switch_necessary ;
% switch decision log
268
269
% ----- Change gait if necessary
if switch_necessary
current_LAMBDA = LAMBDA ( : , xiindex ) ;
[ thetaindex , TC . temp ( FSM_CLOCK_COUNTER ( 1 ) ) ] = MOS (
current_LAMBDA , THETA ( 4 , : ) , current_theta ( 4 ) ,
current_time ) ;
current_theta = THETA ( : , thetaindex ) ;
[ A , G , H , ~ , ~ , current_L , Td , Tf , Tg , lambda ] = MOOsyn (
current_theta ) ;
switch_necessary = 0 ;
end
270
271
272
273
274
275
276
277
278
279
280
A_state ( : , : , events_state_counter ) = A ;
G_state ( : , : , events_state_counter ) = G ;
H_state ( : , : , events_state_counter ) = H ;
281
282
283
284
285
286
% Calculate next step in state
events_state ( events_state_counter , : ) = MPTimes (
A_state ( : , : , events_state_counter ) , events_state (
events_state_counter − 1 , : ) ’ ) ’ ;
events_scheduled ( events_state_counter , : ) =
events_state ( events_state_counter , : ) ;
287
288
289
290
Django D. van Amstel
Master of Science Thesis
C-2 Implementation code
% Update EventList
EventList_temp ( touchdowns , 5 ) = thT ;
EventList_temp ( liftoffs , 5 ) = thL ;
EventList_temp ( : , 3 ) = events_state_counter ;
EventList_temp ( : , 1 ) = events_state (
events_state_counter , 1 : 1 2 ) ;
EventList_temp ( : , 5 ) = current_theta ( 4 ) ;
EventList = [ EventList ; EventList_temp ] ;
EventList = EventList ( sortrowsc ( EventList , 1 : 5 ) , : ) ;
291
292
293
294
295
296
297
298
end
299
300
155
end
301
302
303
% update event index
index = sum ( INDEX ) ;
304
305
306
307
308
TT = zeros ( 1 , 6 ) ; % touchdown events
LL = zeros ( 1 , 6 ) ; % lift off events
Delta = zeros ( 1 , 6 ) ;
Check = zeros ( 1 , 6 ) ;
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
for p =1:6
d1 = events_state ( index ( p ) , p ) ;
d2 = events_state ( index ( p+6) , p+6) ;
if d1 > d2
TT ( p ) = d1 ;
LL ( p ) = events_state ( index ( p+6)+1, p+6) ;
Delta ( p ) =0;
Check ( p )=LL ( p ) ;
else
TT ( p ) = events_state ( index ( p ) +1, p ) ;
LL ( p ) = d2 ;
Delta ( p ) =2∗pi ;
Check ( p )=TT ( p ) ;
end
end
325
326
327
328
329
330
331
332
333
334
335
idx = min ( index ) ;
if idx > state_buffer
events_state ( 1 : ( events_state_counter−idx+state_buffer +1)
, : ) = events_state ( idx−state_buffer :
events_state_counter , : ) ;
A_state ( : , : , 1 : ( events_state_counter−idx+state_buffer +1) )
= A_state ( : , : , idx−state_buffer : events_state_counter ) ;
G_state ( : , : , 1 : ( events_state_counter−idx+state_buffer +1) )
= G_state ( : , : , idx−state_buffer : events_state_counter ) ;
H_state ( : , : , 1 : ( events_state_counter−idx+state_buffer +1) )
= H_state ( : , : , idx−state_buffer : events_state_counter ) ;
events_state ( ( events_state_counter−idx+state_buffer +2) :
end , : ) = 0 ;
events_state_counter = events_state_counter − idx+
state_buffer +1;
EventList ( : , 3 ) = EventList ( : , 3 )−idx+state_buffer +1;
INDEX = INDEX ( idx−state_buffer : end , : ) ;
Master of Science Thesis
Django D. van Amstel
156
Code of implementation
end
new_event = false ;
336
337
338
end
339
340
341
% ------ Increment time
time_counter = time_counter + sample_rate ;
342
343
344
345
346
if starting % transition to zero
reference_counter = reference_counter + 1 ;
reference = reference_trajectory ( reference_counter , : ) ;
current_pose = zebro . getEncoders ( ) ;
347
348
349
350
351
352
353
354
355
356
% compute velocity
velocity= zebro . getVelocity ( ) ;
% compute control law
err = sin ( current_pose − reference ) ;
windup = ( abs ( u_input−saturated_input ) <= 0 ) ;
if windup
integrator = integrator + err ∗ sample_rate ;
end
u_input = − Kp . ∗ err − Ki . ∗ windup . ∗ integrator − Kd . ∗ velocity ;
357
358
359
saturated_input = zebro . saturate ( u_input ) ;
zebro . fwrite ( saturated_input ) ;
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
if reference_counter >= reference_samples
starting = 0 ;
tic ;
end
else % gait starting
% ------ Compute Reference Trajectory
current_time = toc ;
% read
current_pose = zebro . getEncoders ( ) ;
% generate reference
reference=mod ( pi + ( LL−current_time ) . / ( LL−TT ) . ∗ ( thT+Delta + . . .
turning∗−directions ) + . . .
( thL−turning∗−directions . . .
) . ∗ ( current_time−TT ) . / ( LL−TT ) , 2 ∗ pi )−pi+offsettuning ;
375
376
377
378
379
380
381
382
383
384
385
386
% ------ Compute Tracking Control forward
% compute velocity
velocity= zebro . getVelocity ( ) ;
% compute control law
err = sin ( current_pose − reference ) ;
windup = ( abs ( u_input−saturated_input ) <= 0 ) ;
if windup
integrator = integrator + err ∗ sample_rate ;
end
u_input = − Kp . ∗ err − Ki . ∗ windup . ∗ integrator − Kd . ∗ velocity ;
387
388
saturated_input = zebro . saturate ( u_input ) ;
Django D. van Amstel
Master of Science Thesis
C-2 Implementation code
389
390
391
157
% write
zebro . fwrite ( saturated_input ) ;
%% Event level feedback
392
393
394
395
396
397
398
399
400
while previous_time < EventList ( 1 , 1 ) && EventList ( 1 , 1 ) <
current_time
% update that an event has occured
new_event = true ;
if EventList ( 1 , 3 ) ~= EventList ( 2 , 3 )
buff = 1 ;
end
if feedback
401
% If the difference between the reference and the
real
% state is smaller than some margin , the event is on
% time and it is dropped from the EventList
if abs ( err ( EventList ( 1 , 4 ) ) ) < anglemargin
INDEX ( EventList ( 1 , 3 ) , EventList ( 1 , 2 ) ) = 1 ;
EventList = EventList ( 2 : end , : ) ;
402
403
404
405
406
407
408
else % If the error is larger than the anglemarin ,
the state is updated
% update disturbance to event
timenow = toc ;
events_state ( EventList ( 1 , 3 ) , EventList ( 1 , 2 ) )=
timenow+2∗sample_rate ;
events_state ( EventList ( 1 , 3 ) , : ) = MPplus ( MPPlus (
MPTimes ( G_state ( : , : , EventList ( 1 , 3 ) ) ,
events_state ( EventList ( 1 , 3 ) , : ) ’ ) , MPTimes (
H_state ( : , : , EventList ( 1 , 3 ) ) , events_state (
EventList ( 1 , 3 ) −1 ,:) ’ ) ) , events_state ( EventList
(1 ,3) ,:) ’) ;
events_scheduled ( EventList ( 1 , 3 ) , : ) = MPPlus (
MPTimes ( G_state ( : , : , EventList ( 1 , 3 ) ) ,
events_state ( EventList ( 1 , 3 ) , : ) ’ ) , MPTimes (
H_state ( : , : , EventList ( 1 , 3 ) ) , events_state (
EventList ( 1 , 3 ) −1 ,:) ’ ) ) ;
delays ( EventList ( 1 , 3 ) , : ) = events_state ( EventList
( 1 , 3 ) , : )−events_scheduled ( EventList ( 1 , 3 ) , : ) ;
409
410
411
412
413
414
415
416
% Propagate delays for next events
for i = 1 : events_state_counter − EventList ( 1 , 3 )
events_state ( EventList ( 1 , 3 )+i , : ) = MPTimes (
A_state ( : , : , EventList ( 1 , 3 )+i ) , events_state
( EventList ( 1 , 3 )+i − 1 , : ) ’ ) ;
end
417
418
419
420
421
% update event list
EventList ( : , 1 ) = events_state ( sub2ind ( size (
events_state ) , EventList ( : , 3 ) , EventList ( : , 2 ) ) ) ;
422
423
Master of Science Thesis
Django D. van Amstel
158
Code of implementation
EventList = EventList ( sortrowsc ( EventList , 1 : 5 )
,:) ;
424
end
425
426
% update event index if feedback is switch off
else
INDEX ( EventList ( 1 , 3 ) , EventList ( 1 , 2 ) ) = 1 ;
EventList = EventList ( 2 : end , : ) ;
427
428
429
430
431
end
432
433
end
434
435
436
previous_time = current_time ;
437
438
439
% ------ environment measurement
previous_xi = current_xi ;
440
441
442
443
444
current_xi = XImeasurement ( FSM_CLOCK_COUNTER ( 1 ) ) ;
environment_change = abs ( current_xi−previous_xi ) > 0 ;
% % FOR LIMITER EXPERIMENT
zebro . setSaturation ( current_xi ) ;
445
446
447
448
449
450
451
452
453
current_xi = discretize ( current_xi , XI ) ;
discretize measurement according to XI space
[ dummy , xiindex ] = min ( abs ( XI−current_xi ) ) ;
index of environment
buffer_xi = [ buffer_xi current_xi ] ;
xi ( EventList ( 1 , 3 ) ) = mean ( buffer_xi ) ;
if buff
buff = 0 ;
buffer_xi = [ ] ;
end
%
% find
454
455
456
457
458
459
460
461
462
463
464
% ----- Performance Learner
if new_event
LAMBDA = Learner ( current_theta , xi ( EventList ( 1 , 3 ) ) , delays (
EventList ( 1 , 3 ) , : ) , XI , THETA , LAMBDA , current_L ) ;
end
% ----- Check if Gait switch in necessary
if switch_counter > switch_buffer
% switch_necessary = SDM( environment_change , delays (
EventList (1 ,3) -switch_buffer : EventList (1 ,3) ,:));
switch_necessary = 1 ;
switch_counter = 0 ;
end
465
466
%% log data
467
468
469
TC . Time ( FSM_CLOCK_COUNTER ( 1 ) ) = current_time ;
% time log
TC . Encoder ( : , FSM_CLOCK_COUNTER ( 1 ) ) = zebro . getEncoders ( ) ’ ;
% encoder log
Django D. van Amstel
Master of Science Thesis
C-2 Implementation code
159
%TC. Velocity = [TC. Velocity ; zebro . getVelocity () ];
% velocity log
%TC.Input = [TC.Input ; u_input ];
% input log
TC . Reference ( : , FSM_CLOCK_COUNTER ( 1 ) ) = reference ’ ;
% reference log
TC . Theta ( FSM_CLOCK_COUNTER ( 1 ) ) = current_theta ( 4 ) ;
% gait log
TC . Xi ( FSM_CLOCK_COUNTER ( 1 ) ) = current_xi ;
% environment log
470
471
472
473
474
475
end
476
end
477
478
end
479
480
481
482
%% exit
STOP_CLOCK = tic ;
STOP_CLOCK_COUNTER = 1 ;
483
484
485
486
487
488
489
490
491
492
493
494
495
while true
running_time = toc ( STOP_CLOCK ( 1 ) ) ;
if running_time >= STOP_CLOCK_COUNTER ( 1 ) ∗ sample_rate
STOP_CLOCK_COUNTER ( 1 ) = STOP_CLOCK_COUNTER ( 1 ) + 1 ;
if running_time >= stoppingtime
break ;
end
% ------ Compute Reference Trajectory
current_time = toc ;
% read
current_pose = zebro . getEncoders ( ) ;
% generate reference
496
497
reference = stopreference ( STOP_CLOCK_COUNTER ( 1 ) , : ) ;
498
499
500
501
502
503
504
505
506
507
508
% ------ Compute Tracking Control forward
% compute velocity
velocity= zebro . getVelocity ( ) ;
% compute control law
err = sin ( current_pose − reference ) ;
windup = ( abs ( u_input−saturated_input ) <= 0 ) ;
if windup
integrator = integrator + err ∗ sample_rate ;
end
u_input = − Kp . ∗ err − Ki . ∗ windup . ∗ integrator − Kd . ∗ velocity ;
509
510
511
512
513
514
515
516
517
saturated_input = zebro . saturate ( u_input ) ;
% write
zebro . fwrite ( saturated_input ) ;
stoppingtime = stoppingtime−sample_rate ;
end
end
zebro . stop ( ) ;
% plottings
Master of Science Thesis
Django D. van Amstel
160
C-2-2
1
2
Code of implementation
Switch Decision Maker
function S = SDM ( environment_change , delays )
% This function decides if a switch of MOO should be performed .
3
4
5
6
7
8
if environment_change | max ( max ( delays ) ) > 0 . 0 7
S = 1;
else
S = 0;
end
9
10
end
C-2-3
1
2
3
4
5
6
7
8
9
10
11
12
Performance Function Learner
function LAMBDA = Learner ( theta , xi , d , XI , THETA , LAMBDA , L )
% This m.file contains the learner block of the algorithm . While the
% performanc index can be calculated in the absence of disturbances , it
is
% unknown how the environment induces delays to the system and hence the
% performance index . Inputs are the measured environment and selected
mode
% of operation in the last event iteration , and the measured delay . The
% Output is the performance index function for the current environment .
% Hence from the whole surface LAMBDA , only the line corresponding to the
% current environment is taken as an output .
%
% In learning , each new measurement and datapoint is used to shape the
tota
% surface LAMBDA (THETA ,XI).
13
14
% ASSUMPTION MADE : CURRENT DELAY IS INDEED FUNCTION OF ENVIRONMENT
15
16
17
18
19
20
21
% Find indexes / location of current environment xi in total space XI
[ dummy , xiindex ] = min ( abs ( XI−xi ) ) ;
for i = 1 : length ( THETA )
diff ( i ) = norm ( THETA ( 1 : 3 , i ) )−norm ( theta ( 1 : 3 , : ) ) ;
end
[ dummy , thetaindex ] = min ( abs ( diff ) ) ;
22
23
24
% Calculate current (theta ,xi) performance index
performanceindex = perfindex ( d , theta , L , xi ) ;
25
26
27
% Update whole LAMBDA function surface (now: just the single point )
LAMBDA ( thetaindex , xiindex ) = ( performanceindex+LAMBDA ( thetaindex , xiindex )
) /2;
28
29
30
31
32
33
%
%
%
%
%
Gaussian filter applied to update neighboring environments
sigma =1;
sz = 3;
edge = floor (sz /2);
y = gaussian1D (sigma ,sz);
Django D. van Amstel
Master of Science Thesis
C-2 Implementation code
34
35
36
37
161
% for i = 1: length (THETA )
%
LAMBDAconv (i ,:) = conv (y, LAMBDA (i ,:));
%
LAMBDA (i ,:) = LAMBDAconv (i,edge +1: end -edge );
% end
38
39
40
end
C-2-4
1
2
3
4
Mode of Operation Optimizer
function [ theta_new , temp ] = MOS ( LAMBDA , THETA , theta_current , current_time )
% This function takes the performance index and the set of modes of
% operations and returns a (set of) best solution (s). It uses ordinal
% optimization and inverse transform sampling .
5
6
7
8
9
10
% annealing temperature , decides which selection rule : Blind Pick (temp
>> 1 or Horse
mintemp = . 0 1 ;
maxtemp = 5 0 ;
temp = tempfunction ( mintemp , maxtemp , current_time ) ;
11
12
13
14
15
16
17
18
19
20
21
22
23
%% Ordinal optimization settings
s = 8;
k = 1;
%% Creating the custom selection probability function
% USING THE EXPONENTIAL : REQUIRES LAMBDA TO BE STRICTLY POSITIVE . If
there
% are negative values , shift the whole surface up(only the relative
% distances matter ; they dont change )
if min ( LAMBDA ) <= 0
LAMBDA = LAMBDA + abs ( min ( LAMBDA ) +1) ;
end
% Calculate non - normalized probability .
P = exp (( −( LAMBDA ) ) . / temp ) ;
24
25
26
27
28
29
30
31
32
% Calculate normalized probability ( hence reach of Pn = [0 ,1])
for j = 1 : length ( P )
Pn ( j ) = P ( j ) . / sum ( P ) ;
% Create CDF of the probabilities
F ( j ) = sum ( Pn ( 1 : j ) ) ;
end
% plot (THETA (4 ,:) ,F, THETA (4 ,:) ,Pn)
33
34
35
36
37
38
39
%% Run Ordinal Optimization using inverse transform sampling
% Create uniform random vector of length s, being the size of the
selected
% subset .
U = rand ( s , 1 ) ;
% Find corresponding x values in F(x) = U
Master of Science Thesis
Django D. van Amstel
162
40
41
Code of implementation
for i = 1 : length ( U )
thetaV ( i ) = invtranssample ( F , THETA , U ( i ) ) ;
42
% Find accompanying performance index
[ dummy , theta_I ] = min ( abs ( THETA−thetaV ( i ) ) ) ;
lambda ( i ) = LAMBDA ( theta_I ) ;
43
44
45
46
end
47
48
49
50
% Sort according to performance index
[ lambda_sorted , index ] = sort ( lambda ) ;
theta_sorted = thetaV ( index ) ;
51
52
53
% output the alignment set
theta_new = theta_sorted ( 1 : k ) ;
54
55
end
C-2-5
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
Mode of Operation Synthesizer
function [ A , G , H , P , Q , L , Td , Tf , Tg , lambda ] = MOOsyn ( theta )
% This function synthesizes the A matrix from the selected mode of
% operation and given disturbances .
gaits
L = theta ( 1 ) ;
sz = size ( L ) ;
switch L
case 1
L = L1 ;
case 2
L = L2 ;
case 3
L = L3 ;
case 4
L = L4 ;
case 5
L = L5 ;
case 6
L = L6 ;
case 7
L = L7 ;
case 8
L = L8 ;
case 9
L = L9 ;
case 10
L = L10 ;
case 11
L = L11 ;
end
31
32
33
Tf = theta ( 2 ) ;
Td = theta ( 3 ) ;
Django D. van Amstel
Master of Science Thesis
C-2 Implementation code
34
35
36
37
38
163
Tg = length ( L ) ∗ ( Tf+Td )−Tf ;
tau_D = MPTimes ( Td , Tf ) ;
tau_gamma = MPTimes ( Tg , Tf ) ;
lambda = MPplus ( MPpower ( tau_D , sz ( 1 ) ) , tau_gamma ) ;
[ A , G , H , P , Q ] = MPGenerateAllMatrices ( L , 6 , Tf , Tg , Td ) ;
39
40
end
C-2-6
1
2
3
Performance function
function performanceindex = perfindex ( d , theta , L , Vref )
% This function defines and calculates the performance index as a
function
% of the delay and the mode of operation .
4
5
6
7
8
% gait parameters
Tf = theta ( 2 ) ;
Td = theta ( 3 ) ;
Tg = length ( L ) ∗ ( Tf+Td )−Tf ;
9
10
11
% Parameters of the thresholdfunction
kd = [ 8 10 . 2 ] ;
12
13
14
15
% Robot parameters
r = 0.8;
V = sqrt ( 2 ∗ r^2−2∗r^2∗ cos ( . 6 ) ) /Tg ;
16
17
18
19
20
21
%%%% FOR LIMITER EXPERIMENT
% performanceindex = 6* mean (abs(d))-V;
22
23
24
25
26
27
28
29
30
sz = size ( d ) ;
perf = zeros ( 1 , sz ( 2 ) ) ;
for j = 1 : sz ( 2 )
for i = 1 : sz ( 1 )
perf ( j ) = d ( i , j ) ∗ ( i/sz ( 1 ) )+perf ( j ) ;
end
end
performanceindex = norm ( perf ) ;
31
32
33
34
%%% FOR Vref EXPERIMENT
% performanceindex = abs(Vref -V);
35
36
end
C-2-7
Annealing temperature function
Master of Science Thesis
Django D. van Amstel
164
1
Code of implementation
function temp = tempfunction ( mintemp , maxtemp , current_time )
2
3
4
5
6
curvature = 6 0 ;
temp = maxtemp ∗ exp(−current_time / curvature )+mintemp ;
plot ( temp )
end
C-2-8
1
2
3
4
function discrete_xi = discretize ( current_xi , XI )
[ dum , idx ] = min ( abs ( XI−current_xi ) ) ;
discrete_xi = XI ( idx ) ;
end
C-2-9
1
2
ξ(t) to ξ(k) discretization function
Initialization code for Ξ, Θ and Λ
% This m-file loads all paramaters that define the gaits , and creates the
% THETA and XI spaces .
3
4
5
6
7
8
9
10
%% gait parameters
thT = −.3; % touchdown angle of leg
thL = . 3 ; %lift off angle of leg
Tf = linspace ( 0 . 3 , 1 . 2 , 5 ) ; % flight time of step
Td = 0 ; % double stance time of leg groups
% The ground time is a function of Tf and Td , therefore not explicitly
% defined !
11
12
13
14
% ---------- minimal parameter values for reactive feedbackloop
Tfmin = 0 . 2 5 ;
Tdmin = 0 ;
15
16
thetamin = [ Tfmin ; Tdmin ] ;
17
18
19
% Load gait synchronizations
gaits
20
21
%% Create THETA space
22
23
24
25
26
27
% !!! Generalize this code !!!
THETA = [ 1 ∗ ones ( 1 , 5 ) 2∗ ones ( 1 , 5 ) 3∗ ones ( 1 , 5 ) ;
Tf Tf Tf ;
Td ( 1 ) ∗ ones ( 1 , 1 5 ) ] ;
THETA ( 4 , : ) = 1 : length ( THETA ) ;
28
29
30
31
%% Create XI space
% % LIMITER EXPERIMENT
XI = linspace ( 0 . 3 , 0 . 7 , 5 )
32
33
34
% VREF EXPERIMENT
% XI = linspace (0.06 ,.95 ,20)
35
36
%% Create initial performance index LAMBDA space
Django D. van Amstel
Master of Science Thesis
C-2 Implementation code
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
165
% %for each unique set of parameters in THETA
% for iteration = 1: max(THETA (4 ,:))
%
% load set of parameters
%
current_theta = THETA (:, iteration );
%
% calculate all gait information
%
[A,G,H,P,Q,current_L ,Td ,Tf ,Tg , eigenvalue ] = MOOsyn ( current_theta );
%
% for each discretized value in XI
%
for idx = 1: length (XI)
%
% calculate value for LAMBDA
%
LAMBDA (iteration ,idx) = perfindex (0, current_theta ,current_L ,XI(
idx));
%
% random values on (0 ,1) interval
% %
LAMBDA (iteration ,idx) = rand (1);
%
end
%
% end
52
53
54
55
load ( ’Limiter_3714s .mat ’ )
C-2-10
1
2
3
4
5
6
7
8
9
10
11
12
13
14
Definition of Θ
L1 = { { 1 } , { 4 } , { 5 } , { 2 } , { 3 } , { 6 } } ;
%L2 = {{1} ,{4} ,{5} ,{2} ,{6} ,{3}};
%L3 = {{1} ,{4} ,{2} ,{5} ,{3} ,{6}};
%L4 = {{1} ,{4} ,{2} ,{5} ,{6} ,{3}};
%L5 = {{4} ,{1} ,{5} ,{2} ,{3} ,{6}};
%L6 = {{4} ,{1} ,{5} ,{2} ,{6} ,{3}};
%L7 = {{4} ,{1} ,{2} ,{5} ,{3} ,{6}};
%L8 = {{4} ,{1} ,{2} ,{5} ,{6} ,{3}};
%L9 = {{1 ,4} ,{5 ,2} ,{3 ,6}};
L2 = { { 4 , 5 } , { 1 , 6 } , { 2 , 3 } } ;
L3 = { { 1 , 4 , 5 } , { 2 , 3 , 6 } } ;
ngaits = 3 ;
Master of Science Thesis
Django D. van Amstel
166
Django D. van Amstel
Code of implementation
Master of Science Thesis
Appendix D
Experimental setup and results
In this appendix the results of the experiments that have been conducted with the Zebro
robot are described in more detail. Moreover, in the second section of this appendix a
description of the orignally designed experiment is given. Moreover, the reasons why
this experiment has not been conducted will be presented and recommendations will
be done to overcome these problems.
D-1
D-1-1
Experimental results
Reference speed experiment results
In Figure D-1 The performance function
J(ξ, θ) = |Vref − Vth (θ)|
is depicted graphically. From this graph, it is easy to deduce the best modes of operation
for each environment. They are given in a table form in table D-1. Here θopt denotes
the optimal mode of operation, θopt+1 the second best and θopt+2 the third best, etc.
Note the diagonal structure of the table; see for example how modes of operation θ6 or
θ12 appear in the table. This is rather logical; if a certain mode of operation corresponds
best to a reference speed, for a slighty different reference speed it will still be close.
In Figures D-2 to D-7 the mode of operation and the environment are given as a function
of the time of the experimental runs. In the right figures, the vertical lines denote that
the Switch Decision Maker (SDM) decided a switch should be performed.
From these figures, one can easily see that for T = 5 there is much more exploration.
The amount of times the SDM decided a switch should be performed is roughly equal
for both annealing temperatures. However, for T = 5 much more different modes of
operation are selected compared to for T = 0.01.
Master of Science Thesis
Django D. van Amstel
168
Experimental setup and results
Table D-1: Indexes of four optimal modes of operation for each environment in the Reference
Speed experiment
environment
ξ1 = 0.0152
ξ2 = 0.0417
ξ3 = 0.0683
ξ4 = 0.0948
ξ5 = 0.121
ξ6 = 0.148
ξ6 = 0.174
ξ7 = 0.201
θopt
4
1
7
13
12
12
11
11
θopt+1
5
9
15
6
6
11
12
12
θopt+2
3
8
14
14
13
6
6
6
θopt+3
2
10
8
12
14
13
13
13
Figure D-1: The performance function as a function of Θ, for each environment
Django D. van Amstel
Master of Science Thesis
D-1 Experimental results
169
Figure D-2: Results for the first Reference speed experiment with T = 0.01. Left: The selected
mode of operation as a function of time. Right: The environment and switching instances as a
function of time
Figure D-3: Results for the second Reference speed experiment with T = 0.01. Left: The
selected mode of operation as a function of time. Right: The environment and switching instances
as a function of time
Master of Science Thesis
Django D. van Amstel
170
Experimental setup and results
Figure D-4: Results for the third Reference speed experiment with T = 0.01. Left: The selected
mode of operation as a function of time. Right: The environment and switching instances as a
function of time
Figure D-5: Results for the first Reference speed experiment with T = 5. Left: The selected
mode of operation as a function of time. Right: The environment and switching instances as a
function of time
Django D. van Amstel
Master of Science Thesis
D-1 Experimental results
171
Figure D-6: Results for the second Reference speed experiment with T = 5. Left: The selected
mode of operation as a function of time. Right: The environment and switching instances as a
function of time
Figure D-7: Results for the third Reference speed experiment with T = 5. Left: The selected
mode of operation as a function of time. Right: The environment and switching instances as a
function of time
Master of Science Thesis
Django D. van Amstel
172
Experimental setup and results
Table D-2: The learned performance values matrix Λ after t = 4215s
Θ
θ1
θ2
θ3
θ4
θ5
θ6
θ7
θ8
θ9
θ10
θ11
θ12
θ13
θ14
heightθ15
ξ = 0.3
0.1125
0.0339
0.0150
0.2787
0.0179
0.8593
0.7102
0.6227
0.3451
0.3027
0.4105
0.1699
0.0266
0.0712
0.1359
ξ = 0.4
0.8200
0.0206
0.0106
0.0160
0.2903
0.4990
0.3076
0.1253
0.0938
0.0515
4.5068
0.0463
0.0518
0.1178
0.0431
ξ = 0.5
0.0537
0.0268
0
0.0171
0.0171
0.1013
0.0431
0.1182
0.4078
0.6134
0.0803
0.0218
0
0.0290
0.0356
ξ = 0.6
0.0341
0.0405
0.0200
0.5160
0.0250
0.0659
0.0345
0.1806
0.1179
0.0550
0.0669
0.0493
0.0382
0.0576
0.0891
ξ = 0.7
0.3166
0.0326
0.0070
0.2529
0.0517
0.4352
0.0881
0.2767
0.5486
0.0880
0.0488
0.0571
0.2531
0.2618
0.0266
Note that in th current implementation, it is possible that although the SDM decides
that a switch should be performed, the Mode of Operation Optimizer (MOO) comes
up with a mode of operation equal to the previous one. It was decided for the current
implementation because the Ş is not very advanced in the switching decisions to ensure
highly frequent switching for the sake of testing.
Implementing that the MOO can not select the current mode of operation as the new
mode of operation in a switch is done by adapting the Selection Probability Matrix
(SPM). The elements in the SPM corresponding to the current mode of operation
should be altered temporarily to 0, such that the probability of selection will be 0.
D-1-2
Learning experiment results
The learned performance function after selected learning time are depicted in Figures
D-8 to D-15. Since outliers have a negative influence on the learning, the outlier of
D-11 is deleted by hand. The final learned performance function is also given in table
form in Table D-2.
In Table D-2 almost no values of exactly 0 are seen; indicating that all values have been updated at least once. This confirms part one of the
hypothesis. However, due to the ordinal comparison the algorithm, even for high values
of the annealing temperatures, the better performing solutions have a preference over
the less good solutions by the algorithm. Hence, even in the annealing temperature is
kept at a low value, the better solutions will be explored more than the worse ones.
exploration and outliers.
Because the algorithm has an intrinsic preference for low values of the learned performance function, it is very sensitive to outliers. The outliers are usually a consequence
Django D. van Amstel
Master of Science Thesis
D-1 Experimental results
173
of an incidental occurance. However, due to the ordinal comparison in the Ordinal
Optimization (OO) algorithm, extremely high values will never be selected. In fact,
the solution with the greatest performance value has no probability of selection if s ≥ 2,
since then the other solutions will always be better. If a possible very good solution
receives a very high performance value early on in the learning, it will not be chosen
very often any more.
By taking into account past values in updating the performance function it was tried
to overcome the problem of outliers. In the experiments the least amount of history
is taken into account to allow faster learning. If more historical data is taken into
account, values will change slower, leading to more conservative updating and hence
slower learning. Clearly a more sophisticated method is needed to recognize and handle
outliers.
Another issue caused by this preference effect is that the initial shape of the performance
function is very important. The initial values should be relatively low, such that the
unvisited parts of the search space are preferred to be explored. Hence, a poorly chosen
initial values for the performance function can lead to very bad learning.
This disadvantage can be an advantage as well; it allows for very easy incorporation
of a priori knowledge about the modes of operation. If, for example by experience it
is known that certain modes of operation perform well in a certain environment, the
initial values of the Performance Approximation Matrix (PAM) matrix can be designed
accordingly to a relative low value.
In the case that no prior knowledge is available, a possible solution might be to assign
the performance value 0 to all modes of operation initially and to define the performance
function as a strictly positive function. By doing so, exploration is guaranteed.
The second and third part of the hypothesis,
being convergence of the approximated performance function to the true performace
function, cannot be confirmed or rejected with the obtained data.
Convergence and level of approximation.
First, it does not seem that the PAM has converged as between every Figure D-8 to
D-15 the shape of the plot has changed significantly. This does not mean that it will not
converge at all; the experiment duration might have been too short. Hence, the second
part of the hypothesis in the learning experiment cannot be confirmed nor rejected.
Because no convergence is seen, it is impossible to claim anything about what it might
have converged to, leaving part three of the hypothesis unanswered as well.
However, there are some observations that do endorse that the second and third part
of the hypothesis can be confirmed if more data is obtained. Recall that by defining
the performance function as the 2-norm of the disturbance vector and the homogeneous
surface of the experimental setup, the influence of the dynamical properties of the Zebro
on the gait was the unknown factor.
Note that he most outliers were observed for the environment ξ = 0.7, corresponding to
70% of maximal leg power. The fact that such outliers are observed for this environment
might be because of the tuning of the P D controllers. The gains of the P D controller
Master of Science Thesis
Django D. van Amstel
174
Experimental setup and results
have been tuned for a lower limiting factor. When more power is available, the P D
controller’s control input is less saturated and hence more agressive control is possible.
Because the gains of the P D controllers are not adapted accordingly, more overshoot
and oscillation are introduced. This larger overshoot results in a disturbance in the
max-plus state, since this error between the actual leg position and the reference leg
position is used to calculate the disturbances in event domain.
This problem could be solved by adding a fuzzy gain scheduler that adapts the controller
gains according to the imposed limiting factors.
Moreover, the peaks of approx. Λ = 0.6 in the graphs of ξ = 0.3 and ξ = 0.4 find
their origin in the dynamics of the Zebro. Note that they occur at θ6 for the first two
environments, having the parameterization


L2

θ6 = 
 0.300  ,
0.1
and recall that L2 is the quadpod gait. The last leg group in this synchronization is
{1, 6}. Hence, the left front and right back leg lift up together. During this step, the
support polygon of the supporting feet is very small. The support polygon is defined
as the area spanned by the feet tips. For static stability, the projection of the centre of
mass to the horizontal plane should lie within the support polygon.
Because of the a-symmetric organization of the electrical components in the main body
of the Zebro, the centre of mass falls outside of the support polygon in gait L2 . It
appeared that in this gait, with the dynamical properties introduced by the step frequency by selecting τf = 0.3 in combination with the saturation limits of 30% and 40%
resulted in the forward and backward rocking of the Zebro body in a frequency such
that leg 1 was always early to touch-down. Moreover, as most weight is placed on this
leg due to the rocking, the subsequent lift-off is delayed as well as it the extra weight
causes slower movement of the leg. This introduced large delays, corresponding to the
peaks in the performance values for θ6 at ξ = 0.3 and ξ = 0.4.
D-2
Technical difficulties during the experiments
At the experimental phase, only 1 functioning battery pack was available for the Zebro.
With a fully charged battery it is possible to operate for approximately 20 minutes.
However, recharging it takes at least 40 minutes.
Another issue has been a bug in the leg calibration. At apparantly random moments
the calibration would shift by an angle π. When this happens, the experiment must
be stopped by hand and restarted. Because of this, much experimental data had to
be thrown away. Moreover, it prevented the execution of experiments with a long
duration. It was noted by G.A.D. Lopes, daily supervisor in this thesis project, that
this bug might be introduced in the conversion of the binary representation in the
motor encoder to angles.
Django D. van Amstel
Master of Science Thesis
D-2 Technical difficulties during the experiments
175
Figure D-8: The approximated performance values after t = 130s of learning
Master of Science Thesis
Django D. van Amstel
176
Experimental setup and results
Figure D-9: The approximated performance values after t = 820s of learning
Django D. van Amstel
Master of Science Thesis
D-2 Technical difficulties during the experiments
177
Figure D-10: The approximated performance values after t = 1564s of learning
Master of Science Thesis
Django D. van Amstel
178
Experimental setup and results
Figure D-11: The approximated performance values after t = 2186s of learning
Django D. van Amstel
Master of Science Thesis
D-2 Technical difficulties during the experiments
179
Figure D-12: The approximated performance values after t = 2186s of learning and the outlier
removed
Master of Science Thesis
Django D. van Amstel
180
Experimental setup and results
Figure D-13: The approximated performance values after t = 2686s of learning
Django D. van Amstel
Master of Science Thesis
D-2 Technical difficulties during the experiments
181
Figure D-14: The approximated performance values after t = 3464s of learning
Master of Science Thesis
Django D. van Amstel
182
Experimental setup and results
Figure D-15: The approximated performance values after t = 4214s of learning
Django D. van Amstel
Master of Science Thesis
D-2 Technical difficulties during the experiments
183
The above two issues resulted in a very inefficient workflow for executing experiments.
It is estimated that obtaining one hour of useful data took up to 4 hours.
During the execution of the second learning experiment, the powerboard of the Zebro
broke down. Replacing this hardware kept the Zebro unavailable for approximately
two weeks. After the repair it appeared the actuators were still not getting power
properly. As a quick fix, the motors were connected directly to the battery. However,
this adaption led to different dynamical properties of the system.
Because the dynamical properties of the Zebro changed, it was impossible to extend
the learning experiment in both time duration and number of repititions.
Master of Science Thesis
Django D. van Amstel
184
D-3
Experimental setup and results
Original designed experiment
Originally, a completely different experiment was designed to test the deliberate loop
on the Zebro. In this section this original experiment will be described.
The experiment was designed such that over a longer period of time the Zebro would
learn the most efficient gaits. In this section this original experiment will be described.
For long learning epochs the Zebro should be able to navigate autonomously such that
it can run for hours without human supervision.
Autonomous navigation is possible in the new CyberZoo laboratory of the Delft University of Technology. In this lab multiple infrared (IR) cameras are installed to track
the movement of robots. The setup and possibilities in this laboratory will be discussed
first. Then, the original designed experiment will be described. Finally, a list of technical problems that should be solved before the original experiment can be conducted
is presented.
D-3-1
The CyberZoo
The CyberZoo is a new laboraty at the faculty of Aerospace of the Delft University of
Technology. It is a cage of 10 by 10 by 7 meters in width, depth and height respectively.
It’s main feature is the OptiTrack system from the manufacturer Motive. The system
consists of 12 infrared cameras with software that can track infrared markers. By doing
so, the spatial coordinates of the markers are obtained with high precision. The reader
is referred to the website of Motive for detailed technical specifications on the system.
The Zebro has been equipped with 5 active infrared markers. See Figure D-16 for a
close up. The marker is a IR LED and a resistor to limit the current.
Using these markers, the cartesian coordinates {x, y, z} and the euler angles {θ, α, γ}
of the Zebro are measured. This information is send to the Zebro, such that it knows
where it is in the Cyber Zoo and which direction it is facing. A schematic representation
of the communication network is given in Figure D-17.
A Client computer is connected by wire to the OptiTrack system and forwards the
data via a seperate wireless network to the Zebro. It is possible to directly connect
the Zebro to the OptiTrack system by WLAN. However, this results in a very unstable
connection, especially when more robots are using the system simultaneously. By using
the client computer a much more stable and robust datastream is achieved.
With its coordinates available to the Zebro, it can autonomously walk a certain path
defined by waypoints in the CyberZoo space. This navigation algorithm has been
implemented succesfully on the Zebro by Fankai Zhang, technical lab employee at
Delft Center for Systems and Control (DCSC), assisted by the author of this thesis.
The written code can be found on the CD-ROM accompanying this thesis.
The least robust part of the algorithm is the tracking of the IR markers by the camera’s. There are many external sources that influence the tracking performance. A few
pointers for troubleshooting poor tracking performance are given in the list below.
Django D. van Amstel
Master of Science Thesis
D-3 Original designed experiment
185
Figure D-16: Picture of an IR marker as installed on the Zebro.
Figure D-17: Schematic representation of the Cyber Zoo communication network.
Master of Science Thesis
Django D. van Amstel
186
Experimental setup and results
• As the Zebro is equipped with active markers, the IR light emitting LEDS (used
for passive markers) on the cameras can be switched completely off to avoid interference.
• The markers on the Zebro have relative high IR radiance power. Hence the exposure of the cameras can be set to a relative low value, while the threshold value
for marker recognition can be quite high.
• It is advised to recalibrate the mask during the day as the external IR interference
depends on the intensity and angle of the incoming sunlight.
• Set the minimal visible marker count as low as possible (3 is the absolute minimum). By doing so more markers can be invisible without loss of tracking of the
total body.
• When the system is having difficulties to calculate the position of markers (recognizable by flickering of the markers in the graphical interface of the OptiTrack
software), increase the reflectivity.
D-3-2
Method of experiment
In the CyberZoo space, two waypoints A and B have been defined. The Zebro
walks in a loop between these two waypoints during the whole experiment.
Define the environment as the weight in kg that is placed on the Zebro or on a
sled pulled by the Zebro. By changing the weight, the dynamical properties of
the Zebro-sled system are altered.
The SDM is defined such that a switch will be performed every time a waypoint is
reached. Hence, each mode of operation is observed over the fixed distance A − B.
Let tA,B denote the time between departing from waypoint A and the subsequent
arrival at waypoint B or vice versa. Then the performance function is defined as
J(ξ, θ) = tA,B (θ(k)) + α · |d(ξ(k))|2 ,
in where α is a scaling factor to tune the relative importance between the two
terms and | · |2 denotes the two-norm.
By defining the performance in such a way, the average speed over the distance
A − B is optimized, under the constraint that the disturbances should remain
small. How small is determined by α.
The definition of the mode of operation is


θ=


L
τf
τg
α∆



,

with α∆ the angle between the lift off and touch down positions of the leg. The
set Θ can be made very large, for example 1000 unique combinations θ.
Django D. van Amstel
Master of Science Thesis
D-4 Recommendations
187
The goal of the experiment is that the Zebro learns which modes of operation result
in a large forward velocity. It has been observed that theoretical slower modes
of operation can result in a higher forward velocity because of the dynamical
properties of the Zebro body. If the dynamical properties of the Zebro change by
the addition of weight, the optimal mode of operation should change as well.
D-3-3
Issues with the CyberZoo experimental setup
There were some technical issues that prevented the execution of the experiment
describe in the previous section. Hence, the reference speed and learning experiment were performed to obtain as much practical data as possible.
The main problem with using the motion capture system has been the clock speed
of the Zebro hardware. Both the motion tracking and the learning algorithm take
up a considerable amount computational budget. Combining both algorithms
resulted in such a computational burden that the frequency of the supervisory
controller had to be lower than the minimal bandwidth of the lower level leg
P D controllers. As a result, the legs were uncontrolled frequenty as no reference
position was available.
D-4
Recommendations
The following recommendations are done to overcome the above described issues
with experimenting with the Zebro
– Replace the MATLAB control environment of the Zebro for a environment
with much less overhead. By doing so, the learning and motion tracking
algorithms can be combined.
– Analyse the encoders and drivers of the leg posititioning to find the bug
causing the shift in calibration.
– Acquire at least 2 extra battery packs and 1 charger. By doing so it will be
possible to charge and run the Zebro in parallel.
D-5
D-5-1
Used MATLAB code
Reference speed experiment
1
2
clear all , clc , close all
3
4
5
% load THETA ,XI , LAMBDA
parameters
6
7
8
%% plot LAMBDA curves and OPC
figure ( 1 )
Master of Science Thesis
Django D. van Amstel
188
9
10
11
12
13
14
Experimental setup and results
for i = 1 : length ( XI )
subplot ( 2 , 4 , i ) , plot ( THETA ( 4 , : ) , LAMBDA ( : , i ) )
ylabel ( ’\ Lambda (\ theta_i )’ )
xlabel ( ’index i of \ theta_i ’ )
title ( [ ’Performance function for \Xi = ’ , num2str ( XI ( i ) ) ] )
end
15
16
17
18
19
G = zeros ( size ( LAMBDA ) ) ;
for i = 1 : length ( XI )
[ ~ , G ( : , i ) ] = sort ( LAMBDA ( : , i ) ) ;
end
20
21
22
23
24
25
26
27
28
29
% Calculate Vmax , Vmin
for i = 1 : 1 5
theta = THETA ( : , i ) ;
[ A , G , H , P , Q , L , Td , Tf , Tg ( i ) , lambda ] = MOOsyn ( theta ) ;
r = 0.17;
V ( i ) = sqrt ( 2 ∗ r^2−2∗r^2∗ cos ( . 6 ) ) /Tg ( i ) ;
end
30
31
32
33
34
35
% define parameters
N = length ( XI ) ;
g = 1;
s = 8;
temp = [ 0 . 0 1 0 . 0 5 . 1 . 5 1 5 10 5 0 ] ;
36
37
38
39
40
41
42
% Calculate OPC ’s for all entries in XI
Prob = zeros ( 1 , length ( temp ) ) ;
idx = 0 ;
for T = temp
tic
idx = idx +1;
43
%% Simulate optimization procedure
Z = 1 0 0 0 0 0 ; % number of simulations
alignment = zeros ( Z , 1 ) ;
for z = 1 : Z
theta_selected = OrdinalOptimization ( LAMBDA ( : , idx ) , THETA ( 4 , : )
,T , s) ;
alignment ( z , : ) = theta_selected ( 1 : g ) ;
end
44
45
46
47
48
49
50
51
check = ( alignment == g ) ;
Prob ( idx ) = sum ( check ) /Z ;
toc
52
53
54
55
end
D-5-2
MATLAB code use for experimental result analysis
Django D. van Amstel
Master of Science Thesis
D-5 Used MATLAB code
1
189
Prob = zeros ( 3 ) ;
2
3
4
5
6
7
8
load ( ’Vref_tau001R1 .mat ’ )
Prob ( 1 , 1 ) = Vrefanalysis ( TC , LAMBDA , THETA , XI ) ;
load ( ’Vref_tau001R2 .mat ’ )
Prob ( 2 , 1 ) = Vrefanalysis ( TC , LAMBDA , THETA , XI ) ;
load ( ’Vref_tau001R3 .mat ’ )
Prob ( 3 , 1 ) = Vrefanalysis ( TC , LAMBDA , THETA , XI ) ;
9
10
11
12
13
14
15
load ( ’Vref_tau1R1 .mat ’ )
Prob ( 1 , 2 ) = Vrefanalysis ( TC , LAMBDA , THETA , XI ) ;
load ( ’Vref_tau1R2 .mat ’ )
Prob ( 2 , 2 ) = Vrefanalysis ( TC , LAMBDA , THETA , XI ) ;
load ( ’Vref_tau1R3 .mat ’ )
Prob ( 3 , 2 ) = Vrefanalysis ( TC , LAMBDA , THETA , XI ) ;
16
17
18
19
20
21
22
load ( ’Vref_tau100R1 .mat ’ )
Prob ( 1 , 3 ) = Vrefanalysis ( TC , LAMBDA , THETA , XI ) ;
load ( ’Vref_tau100R2 .mat ’ )
Prob ( 2 , 3 ) = Vrefanalysis ( TC , LAMBDA , THETA , XI ) ;
load ( ’Vref_tau100R3 .mat ’ )
Prob ( 3 , 3 ) = Vrefanalysis ( TC , LAMBDA , THETA , XI ) ;
23
24
Prob
D-5-3
Learning experiment
1
2
3
parameters
load ( ’Limiter_4214s .mat ’ )
4
5
6
7
8
9
10
11
% plot 3D image
subplot ( 3 , 2 , 1 ) , surf ( XI , THETA ( 4 , : ) , LAMBDA )
title ( ’3D surface plot of learned performance function ’ )
xlabel ( ’\Theta ’ )
ylabel ( ’\Xi ’ )
zlabel ( ’\ Lambda ’ )
axis ( [ 0 . 3 0 . 7 0 15 0 1 ] )
12
13
14
15
16
% Delete outlier
[ value , row ] = max ( LAMBDA ) ;
[ ~ , column ] = max ( value )
row = row ( column )
17
18
LAMBDA ( row , column ) = ( LAMBDA ( row +1, column )+LAMBDA ( row −1, column ) ) /2
19
20
21
22
for i = 2 : 6
subplot ( 3 , 2 , i )
plot ( THETA ( 4 , : ) , LAMBDA ( : , i−1) )
Master of Science Thesis
Django D. van Amstel
190
23
24
25
26
Experimental setup and results
title ( [ ’The learned performance function for \Xi =’ , num2str ( XI ( i−1) )
])
xlabel ( ’\Theta ’ )
ylabel ( ’\ Lambda ’ )
end
Django D. van Amstel
Master of Science Thesis
Bibliography
[1] F. Baccelli, Synchronization and Linearity. JohnWiley & Sons, 1992.
[2] B. Heidergott and R. De Vries, “Towards a (max,+) control theory for public transportation networks,” Discrete Event Dynamic Systems: Theory and
Applications, vol. 11, pp. 371–398, 2001.
[3] W. Leune, “Model-based operational control of railway networks,” Master’s
thesis, Delft University of Technology, 2009.
[4] T. Van den Boom and B. De Schutter, “Mpc for perturbed max-plus-linear
systems,” Proceedings of the European Control Conference, pp. 3783–3788,
2001.
[5] G. A. Lopes, R. Babuska, B. De Schutter, and Van, “Switching max-plus
models for legged locomotion,” in Proceedings of the 2009 IEEE International
Conference on Robotics and Biomimetics December 19-23, Guilin, China,
2009.
[6] G. D. Lopes, B. De Schutter, and T. Van den Boom, “On the synchronization of cyclic discrete-event systems,” 51st IEEE Conference on Decision and
Control, Maui, Hawaii, USA, 2012. Maui.
[7] J. Duysens and H. W. Van de Crommert, “Neural control of locomotion;
part1: The central pattern generator from cats to humans,” Gait & posture,
vol. 7.2, pp. 131–141, 1998.
[8] P. Holmes, R. J. Full, D. Koditschek, and J. Guckenheimer, “The dynamics of legged locomotion: Models, analyses and challenges,” SIAM review,
vol. Vol.48, No. 2, pp. pp. 207–304, 2006.
[9] A. Mahajan and F. Figueroa, “Four-legged intelligent mobile autonomous
robot,” Robotics & Computer Integrated Manufacturing, vol. 13, pp. 51–61,
1997.
[10] B. De Schutter and van den, “Max-plus algebra and max-plus linear discrete event systems: An introduction,” Proceedings of the 9th International
Master of Science Thesis
Django D. van Amstel
192
Bibliography
Workshop on Discrete Event Systems (WODES’08), vol. Götenborg, Sweden,
pp. pp. 36–42, May 2008.
[11] B. Heidergott, G. J. Olsder, and J. Van der Woude, Max Plus at Work.
Princeton University Press, 2006.
[12] T. J. Van den Boom and B. De Schutter, “Modelling and control of discrete
event systems using switching max-plus-linear systems,” Control Engineering
Practice, vol. Vol. 14, no. 10, pp. pp. 1199–1211, 2006.
[13] D. Hoyt and C. R. Taylor, “Gait and the energetics of locomotion in horses,”
Nature, vol. 292, pp. 239–240, 1981.
[14] G. A. Lopes, T. Van den Boom, B. De Schutter, and R. Babuska, “Modeling
and control of legged locomotion via switching max-plus systems,” (Berlin,
Germany), pp. pp. 392–397, Aug.-Sept. 2010.
[15] G. A. Lopes, B. Kersbergen, B. De Schutter, T. Van den Boom, and Ba,
“Synchronization of a class of cyclic discrete-event systems describing legged
loco,” tech. rep., Delft University of Technology, 2012.
[16] M. Alsaba, S. Lahaye, and J.-L. Boimond, “On just in time control of switching max-plus linear systems,” ICINCO-SPSMC, pp. pp. 79–84, 2006.
[17] E. Menguy, J.-L. Boimond, L. Hardouin, and J. Ferrier, “A first step towards
adaptive control for linear systems in max algebra,” Discrete Event Dynamic
Systems, vol. 10.4, pp. pp. 347–367, 2000.
[18] T. Van den Boom and B. De Schutter, “Stabilizing model predictive controllers for randomly switching max-plus-linear systems,” Proceedings of the
European Control Conference 2007 (ECCŠ07), pp. pp. 495–4959, 2007.
[19] T. Van den Boom and B. De Schutter, “Model predictive control for switching
max-plus-linear systems with random and deterministic switching,” Proceedings of the 17th World Congress The International Federation of Automatic
Control, Seoul, Korea, pp. pp. 7660–7665, 2008.
[20] B. De Schutter and T. Van den Boom, “Model predictive control for maxplus-linear discrete event systems,” Automatica, vol. 37.7, pp. pp. 1049–1056,
2001.
[21] E. Karalarli, A. M. Erkmen, and I. Erkmen, “Intelligent gait synthesizer for
hexapod walking rescue robots,” Proceedings ICRA’04. IEEE International
Conference on Robotics and Automation, vol. Vol. 3, pp. 2177–2182, 2004.
[22] J. M. Porta and E. Celaya, “Efficient gait generation using reinforcement learning,” International Conference on Climbing and Walking Robots
(CLAWAR), 2001.
[23] M. S. Erden and K. Leblebicioglu, “Free gait generation with reinforcement
learning for a six-legged robot,” Robotics and Autonomous Systems, vol. 56,
pp. 199–212, 2008.
[24] T. Mori, Y. Nakamura, M.-a. Sato, and S. Ishii, “Reinforcement learning for
a cpg-driven biped robot,” AAAI, pp. 623–630, 2004.
Django D. van Amstel
Master of Science Thesis
193
[25] N. Kohl and P. Stone, “Machine learning for fast quadrupedal locomotion,”
AAAI, vol. Vol. 4, pp. pp. 611–616, 2004.
[26] J. S. Golan, Semirings and Their Applications. Kluwer Academic Publisher,
Dordrecht, 1999.
[27] J. Gunawardena, Idempotency. Newton Institute, Cambridge University
Press, Cambridge, U.K., 1998.
[28] T. Murata, “Petri nets: properties, analysis and applications,” Proceedings
of the IEEE, vol. 77, pp. 541–580, 1989.
[29] G. Cohen, S. Gaubert, and J.-P. Quadrat, “Max-plus algebra and system
theory: where we are and where to go now,” Annual Reviews in Control,
vol. 23, pp. 207–219, 1999.
[30] T. Van den Boom and B. de Schutter, Optimization in Systems and Control.
Delft Centre for Systems and Control, Delft University of Technology, Delft.,
2011.
[31] Heidergott, “Recent trends in optimization: Ordinal optimization,” April
2012.
[32] Y. C. Ho, R. Sreenivas, and P. Vakili, “Ordinal optimization of deds,” Discrete
Event Dynamic Systems, vol. vol. 2, no. 1, pp. 61–88, 1992.
[33] Y.-C. Ho, “The ordinal optimization teaching module.” Online, 2009.
[34] S. Rocco, C. M., and J. E. Ramirez-Marquez, “Identification of top contributors to system vulnerability via an ordinal optimization based method,”
Reliability Engineering & System Safety, 2013.
[35] T. Edward Lau and Y. Ho, “Universal alignment probabilities and subset
selection for ordinal optimization,” Journal of optimization theory and applications, vol. Vol. 93, No. 3, pp. pp. 455–489, 1997.
[36] Q. Jia, Y. Ho, and Q. Zhao, “Comparison of selection rules for ordinal optimization,” Mathematical and Computer Modelling, vol. 43, pp. 1150–1171,
2006.
[37] D. Li, L. H. Lee, and Ho, “Constrain ordinal optimization,” Information
Sciences, vol. 148, pp. 201–220, 2002.
[38] R. Babuska, Knowledge-Based Control Systems. Delft Centre for Systems
and Control, Delft University of Technology, Delft., 2010.
[39] K. Sigman, “Lecture notes of course ieor 4404 simulation.” Online, 2010.
[40] J. Fan, “Design-adaptive nonparametric regression,” Journal of the American
statistical Association, vol. 87.420, pp. 998–1004, 1992.
[41] J. Park and I. W. Sandberg, “Approximation and radial-basis-function networks,” Neural computation, vol. 5.2, pp. 305–316, 1993.
[42] C. De Boor, A practical guide to splines. New York: Springer-Verlag, 1978.
[43] B. De Schutter and T. Van den Boom, “Mpc for discrete-event systems with
soft and hard synchronisation constraints,” International Journal of Control,
vol. 76, no. 1, pp. 82–94, 2003.
Master of Science Thesis
Django D. van Amstel
194
Django D. van Amstel
Bibliography
Master of Science Thesis
Glossary
List of Acronyms
DCSC
Delft Center for Systems and Control
DES
Discrete Event Systems
TEG
MPL
Timed Event Graph
Max-Plus Linear
SMPL
RL
Switching Max-Plus Linear
Reinforcement Learning
OO
OPC
Ordinal Optimization
Ordered Performance Curve
CPG
BP
Central Pattern Generator
Blind Pick
HR
ABP
Horse Race
Adaptive Blind Pick
SPM
PAM
Selection Probability Matrix
Performance Approximation Matrix
SDM
MOO
Switch Decision Maker
Mode of Operation Optimizer
PFL
MOS
Performance Function Learner
Mode of Operation Synthesizer
RGS
Reactive Gait Scheduler
GS
Gait Scheduler
Master of Science Thesis
Django D. van Amstel
196
Django D. van Amstel
Glossary
Master of Science Thesis
197
Master of Science Thesis
Django D. van Amstel
198
Django D. van Amstel
Glossary
Master of Science Thesis
199
Master of Science Thesis
Django D. van Amstel
200
Django D. van Amstel
Glossary
Master of Science Thesis
201
Master of Science Thesis
Django D. van Amstel

Download Report

Feedback mechanisms in Switching Max

Paperzz.com

Your Paperzz