CONCURRENT LEARNING IN THE PRESENCE OF UNCERTAIN INPUT ALLOCATION

By BENJAMIN DAVID REISH
Bachelor of Science in Mechanical Engineering
Oklahoma Christian University
Edmond, OK
2004

Submitted to the Faculty of the Graduate College of Oklahoma State University in partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE, July, 2015

Thesis Approved:
Dr. Girish Chowdhary, Thesis Advisor
Dr. Prabhakar Pagilla
Dr. Lisa Mantini

Acknowledgments

Without my wife, Melissa, this degree would have been impossible. She is my supporter and encourager. She has endeavored tirelessly to provide for the family and raise our children while I completed this degree. She is my cheering section, and without her I would still be hating my job. She encouraged me to look for what makes me happy and not to be limited by what I think are 'my' responsibilities.

I want to thank my parents for their encouragement and financial support throughout this degree. They helped shoulder some of the financial strain that comes with having a family, a house, and a fifty-mile one-way commute, and allowed me to concentrate on my classwork and research to finish this degree.

I want to thank Dr. Byron Newberry for planting a seed years ago about obtaining an advanced degree. I do not know that he realizes the impact he had when he talked with me about his graduate school experience, initially and then repeatedly over several years in the small conversations we had together.

Part of this material is based upon work supported by the National Aeronautics and Space Administration under Research Initiation Grant No. NNX13AB21A issued through the Research Infrastructure Development Program.

Acknowledgments reflect the views of the author and are not endorsed by committee members or Oklahoma State University.

Name: BENJAMIN DAVID REISH
Date of Degree: JULY, 2015
Title of Study: CONCURRENT LEARNING IN THE PRESENCE OF UNCERTAIN INPUT ALLOCATION
Major Field: MECHANICAL AND AEROSPACE ENGINEERING

Abstract: Most Model Reference Adaptive Control (MRAC) methods assume that the input allocation matrix (B in the state-space representation ẋ = Ax + Bu) is known. These methods cannot be used in situations that require adaptation in the presence of uncertain input allocation, such as when controls reverse on a flexible aircraft due to wing twist, or when actuator mappings are unknown. To handle such situations, a Concurrent Learning Model Reference Adaptive Control method is developed for linear uncertain dynamical systems in which the input allocation matrix is uncertain. The approach relies on estimating the input allocation matrix from online recorded and instantaneous data concurrently, while the system is being actively controlled using the online-updated estimate. It is shown that the tracking error and weight error convergence depend on the accuracy of the estimates of the unknown parameters. This establishes the necessity of purging the concurrent learning history stack, and three algorithms for purging the history stack for eventual re-population are presented. System stability is shown by demonstrating that the solutions to the system are locally bounded and stay within that local set; local ultimate boundedness is plausible. Then a relaxation of the uncertain input allocation matrix assumption is discussed and shown to be locally bounded in the closed-loop control case and ultimately bounded within the set described.
Simulations validate the theoretical results for both the uncertain input allocation case and the relaxed uncertain input allocation case.

Nomenclature

Symbol — Description — Notes

A — state matrix of the plant — assumed known
A_rm — state matrix of the reference model
B — allocation matrix of the actual plant — uncertain in this work
B̂ — controller's estimate of the B matrix
B̂† — pseudo-inverse of B̂
B_rm — reference model's input matrix
B̃ — difference between B̂ and B
D — known part of B when dissolved — Chpt. 5
δ_K — vector stored for concurrent learning
δ̂_K — vector with error from B̃ — uncertain B
δ_Kr — vector stored for concurrent learning
δ̂_Kr — vector with error from B̃ — uncertain B
e — error between x and x_rm
ė — trajectory of the system error through time
ε₀ — threshold for deciding to shock the history stacks
ε_B — matrix regressor for the B̂
ε_ts — norm of B̃ at time t_s
ε̂_K — weight error for K using B̂
ε_{K i} — ith weight error for K used in concurrent learning history stack — concurrent learning
ε̂_Kr — weight error for K_r using B̂
ε_{Kr j} — jth weight error for K_r used in concurrent learning history stack
ε̂_W — combined ε̂_K and ε̂_Kr
Γ_B — update law learning rate for B̂
Γ_r — update law learning rate for K_r
Γ_x — update law learning rate for K
Γ_W — update law learning rate for W
K — state feedback gain
K* — ideal matching gain needed for MRAC
K̃ — difference between K and K*
K_r — feedforward gain
K*_r — ideal matching gain needed for MRAC
K̃_r — difference between K_r and K*_r
Λ — unknown diagonal matrix of the dissolved B
Λ̂ — estimator for Λ — of dissolved B
λ_Λ — outer bound of the projection operator for Λ̂
Λ̃ — Λ̂ − Λ
P — positive definite matrix from Lyapunov eqn.
p_max — max allowed points in history stack
Φ₀ — set for Projection operator
Φ₁ — set for Projection operator
Φ_T — set for Projection operator
Q — positive definite matrix from Lyapunov eqn.
r — reference command
R_B̂ — regressor history stack for B̂
R_Λ — regressor history stack for Λ
R_W — regressor history stack for W
σ — combined x and r vectors
θ — vector used with Projection operator
θ_Tstart — inner bound of Projection operator
θ_Tend — outer bound of Projection operator
u — input to the plant
W — combined K and K_r matrices
W* — combined K* and K*_r matrices
W̃ — combined K̃ and K̃_r matrices
x — state of the plant — assumed measurable
ẋ — derivative of the state, x — assumed measurable
x_des — desired state
x_rm — reference model state — used in MRAC
ẋ_rm — derivative of the reference model state
X_B̂ — input history stack for B̂
X_Λ — input history stack for Λ̂
X_W — state history stack for W

Table of Contents

Chapter — Page

1 Literature Survey . . . 1
  1.1 State-Space Modeling . . . 1
  1.2 Model Reference Adaptive Control . . . 2
  1.3 Persistency of Excitation . . . 3
  1.4 Nussbaum Gains . . . 5
  1.5 Eigenvectors and Eigenvalues . . . 5
2 Model Reference Adaptive Control . . . 7
  2.1 Introduction . . . 7
  2.2 Classic Equations . . . 7
  2.3 Concurrent Learning-MRAC . . . 9
  2.4 CL-MRAC with Uncertain Allocation Matrix . . . 11
  2.5 Combining Variables . . . 13
  2.6 The Projection Operator . . . 14
  2.7 Derivatives with the Projection Operator . . . 17
  2.8 Summary . . . 17
3 Error in the History Stack . . . 18
  3.1 Introduction . . . 18
  3.2 Shocking . . . 18
    3.2.1 Heuristic Algorithm . . . 19
    3.2.2 Hypothesis Testing Algorithm . . . 20
    3.2.3 Variance of Average Allocation Matrix Estimate Error . . . 22
  3.3 Summary . . . 22
4 Stability . . . 23
  4.1 Introduction . . . 23
  4.2 Boundedness without a Full Rank History Stack . . . 25
  4.3 Boundedness with a Full Rank History Stack . . . 27
  4.4 Bounded Operation . . . 31
  4.5 Ultimate Boundedness . . . 32
  4.6 Rate of Convergence . . . 36
  4.7 Summary . . . 38
5 Dissolved B Matrix . . . 39
  5.1 Introduction . . . 39
  5.2 Stability . . . 41
    5.2.1 Bounded Operation . . . 41
    5.2.2 Locally Ultimately Bounded . . . 44
    5.2.3 Rate of Convergence . . . 45
  5.3 Summary . . . 46
6 Simulations . . . 47
  6.1 Introduction . . . 47
  6.2 Uncertain Allocation Matrix . . . 47
  6.3 Dissolved Input Allocation Matrix . . . 52
  6.4 Summary . . . 58
7 Conclusion . . . 59
  7.1 Summary . . . 59
  7.2 Future Work . . . 60
References . . . 61
A Derivations . . . 65
  A.1 Expanding the Input Matrix into Known and Unknown Components . . . 66
  A.2 Define Concurrent Learning Error Terms . . . 68
  A.3 Derivative of the Error . . . 70
  A.4 The Lyapunov Candidate and Derivative . . . 71
    A.4.1 Properties of the Trace Operator . . . 71
    A.4.2 The Lyapunov Candidate Equation . . . 71
    A.4.3 The Lyapunov Candidate Derivative . . . 73
  A.5 Expanding the Hatted Epsilons . . . 82
  A.6 Expanding the Input . . . 83
    A.6.1 Expanding the Input with W . . . 87
B Concurrent Learning Update . . . 90
  B.1 The MRAC Update Law . . . 90
    B.1.1 CL-MRAC Update Law . . . 90
  B.2 MRAC Example . . . 91
    B.2.1 Model Reference Adaptive Control Only . . . 91
    B.2.2 Model Reference Adaptive Control with Concurrent Learning . . . 92
    B.2.3 Concurrent Learning in the Weight Space . . . 96
C Acronyms . . . 100
D Additional Plots . . . 101

List of Tables

Algorithm 1. Heuristic, Time Based Method . . . 19
Algorithm 2. Hypothesis Test on Expectation of ($\hat{\dot{x}} - \dot{x}$) . . . 20
Algorithm 3. Variance of Average Allocation Matrix Estimate Error . . . 21

List of Figures

2.1 Projection Operator in the Weight Space . . . 15
4.1 Notional Representation of System State, B̃, and W̃ with Time . . . 24
6.1 Time History of Reference Model Tracking Using Algorithms 1, 2, and 3 . . . 48
6.2 Time History of Reference Model Tracking Using Algorithms 1, 2, and 3, zoomed to first 10 seconds . . . 49
6.3 System Tracking Errors Shown Using Algorithms 1, 2, and 3 . . . 50
6.4 Minimum Eigenvalues of the B̂, K, and K_r History Stacks . . . 51
6.5 B̂ Convergence Using Algorithms 1, 2, and 3, with Ideal Values . . . 52
6.6 W Convergence Using Algorithms 1, 2, and 3, with Ideal Values . . . 53
6.7 Dissolved Input Allocation Matrix, not using concurrent learning . . . 54
6.8 Dissolved Input Allocation Matrix, States, using concurrent learning . . . 56
6.9 Dissolved Input Allocation Matrix, Errors, using concurrent learning . . . 57
6.10 Dissolved Input Allocation Matrix, Adaptive Weights, with concurrent learning . . . 57
6.11 Dissolved Input Allocation Matrix, Λ̂ Convergence, using concurrent learning . . . 58
B.1 Step Response of Two Systems . . . 92
B.2 MRAC only State Tracking . . . 93
B.3 Time history of MRAC only Adaptive Gains . . . 94
B.4 Time history of MRAC only State Tracking Error . . . 94
B.5 CL-MRAC State Tracking . . . 95
B.6 CL-MRAC Tracking Errors . . . 95
B.7 CL-MRAC Adaptive Weights . . . 96
B.8 CL-MRAC Weight Space . . . 97
B.9 CL-MRAC Weight Stochastic Gradient . . . 98
D.1 System Tracking Errors Shown Using Algorithms 1, 2, and 3 . . . 102
D.2 Dissolved Input Allocation Matrix Weight Space . . . 103
D.3 Dissolved Input Allocation Matrix Λ̂ . . . 104

CHAPTER 1
Literature Survey

Uncertain input allocation is a rare topic in the controls world. In many cases, input allocation is the one thing about a system that is designed and therefore known with little uncertainty. Even so, there are times when input allocation is uncertain. In this chapter, Model Reference Adaptive Control (MRAC) is the focus among the available adaptive control tools. Then there is a discussion of the persistency of excitation (PE) required by classical MRAC architectures to drive the adaptive gains to their ideal values. Finally, there is a short discussion of a more general formulation and of eigenvectors and eigenvalues.
1.1 State-Space Modeling

Sets of equations have been used to model interconnected systems. If those equations take the form of multiple ordinary differential equations (ODEs) which use a minimum number of variables to describe the internal status of the system, the set is said to be in state-space form. In that form, the ODEs may be separated into parts which are linearly combined: one a function of the states of the system, and one a function of the input to the system (if any). If the system is linear, then these equations may be written as matrices, one multiplying the state vector and one multiplying the input vector. The state vector need not be made up of measurable quantities. The input vector is generally measurable because of its physical nature: inputs are designed to affect the system, so the extent to which that happens is generally known. In general, the state-space matrix form of a linear, time-invariant, continuous system is denoted as

\dot{x}(t) = Ax(t) + Bu(t)    (1.1)
y(t) = Cx(t) + Du(t)    (1.2)

where ẋ is the state derivative vector, x is the state vector, A is a matrix of coefficients, B is the input allocation matrix, u is the input to the system, y is the measurable output of the system, C is the output matrix, and D is the direct transmission matrix. The situation is more difficult when the output is defined such that C is not an identity matrix, but that may be handled. A special situation where y = x is used here and is called full state feedback. In full state feedback, C is an identity matrix and D is a zero matrix. [3]

In this thesis the following terms are defined as a naming convention to differentiate the input to the system, u, from the matrix that shows how it is distributed to the system.

Definition 1 The allocation matrix, B, is the matrix, when the system is described in state-space matrix form, which is multiplied by the system's input vector.

Definition 2 The state matrix, A, is the matrix, when the system is described in state-space matrix form, which is multiplied by the state vector of the system.
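To make the state-space notation concrete, the following is a minimal simulation sketch of (1.1)-(1.2) in Python; the two-state matrices, input signal, and Euler step size are illustrative assumptions, not values used elsewhere in this thesis.

```python
import numpy as np

# Hypothetical two-state LTI plant: x_dot = A x + B u, y = C x + D u.
A = np.array([[0.0, 1.0],
              [-2.0, -0.5]])   # state matrix (assumed known)
B = np.array([[0.0],
              [1.0]])          # input allocation matrix
C = np.eye(2)                  # full state feedback: y = x
D = np.zeros((2, 1))

def simulate(x0, u_of_t, dt=1e-3, t_end=5.0):
    """Forward-Euler integration of x_dot = A x + B u."""
    steps = int(t_end / dt)
    x = x0.astype(float)
    history = np.zeros((steps, len(x0)))
    for k in range(steps):
        u = u_of_t(k * dt)
        x = x + dt * (A @ x + B @ u)   # Euler step of (1.1)
        history[k] = x
    return history

traj = simulate(np.array([1.0, 0.0]), lambda t: np.array([np.sin(t)]))
print(traj[-1])   # state after 5 s
```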
1.2 Model Reference Adaptive Control

Adaptive control of uncertain systems has been studied and applied in many areas [1, 7, 11, 15, 21, 30]. A widely used approach is MRAC, which drives the instantaneous tracking error to zero. In most MRAC approaches, the input allocation matrix of the plant (e.g., the B matrix in the standard state-space representation ẋ = Ax + Bu) is assumed known, or at least the sign of the diagonal elements of the matrix is assumed known [1, 7, 11, 15, 17, 21, 26, 30]. These methods cannot be used in situations that require adaptation in the presence of uncertain input allocation, such as when controls reverse on a flexible-wing aircraft due to wing twist, or when an autopilot must control an unmanned system whose actuator mappings are unknown.

Authors have studied the problem of uncertain allocation matrices. Lavretsky et al. used a diagonal scaling matrix and adaptive laws to approximate symmetric control effectiveness loss [18]. Somanath showed uncertainties in the allocation matrix could be handled if the allocation matrix was some multiplicative combination of an uncertain matrix and the reference model input allocation matrix [29]; his work on hypersonic vehicles required the reference model input allocation matrix to be in the subspace of the plant allocation matrix. Tao et al. show how to control a multi-input system where some of the actuators become inoperative, stuck at fixed or varying positions, at some unknown time [31]; this does not address the uncertain allocation matrix problem, it just reduces the allocation matrix by the number of stuck actuators. All these works require an assumption that the input allocation matrix be defined as a diagonal scaling matrix multiplied by a known matrix, which is usually the reference model allocation matrix.

The works just discussed all use MRAC formulations with one model, but there are other formulations available. Multiple model adaptive control techniques [14, 22] can handle uncertain input allocation by choosing between candidate models for the B matrix; however, these candidate models need to be available beforehand. The retrospective cost adaptive control method [28, 33] could also potentially handle uncertain input allocation situations, but stability of the system may not be guaranteed while data is being collected for learning. It appears that little work has been done on the general case of controlling a stable system when the input allocation matrix is uncertain.

1.3 Persistency of Excitation

Boyd and Sastry showed that model reference adaptive control needs persistently exciting input in order to drive the adaptive gains, or weights, to their ideal values [2]. With CL-MRAC, by contrast, the adaptive gains are shown to converge if the input signals are exciting over a finite time. Exciting signals can be defined using Tao's definition in [30]: over some interval [t, t + T], where t > t₀ and T > 0, the input signal is exciting if

\int_t^{t+T} u(\tau) u^T(\tau)\, d\tau \geq \lambda I    (1.3)

for some λ > 0, where I is an identity matrix of appropriate dimension. The signal is persistently exciting if (1.3) holds for all t ≥ t₀.

When the adaptive weights attain their ideal values, the tracking error reduces [21]. Without persistently exciting (PE) input, however, the adaptive weights are not guaranteed to converge to their ideal values under traditional gradient-based MRAC update laws, because the tracking error goes to zero. Concurrent learning MRAC (CL-MRAC) is a method that guarantees the adaptive weights converge to the ideal values with only finite excitation [5, 8, 10]. CL-MRAC achieves this by using specifically selected online data concurrently with instantaneous data for adaptation.

The argument can be made that any non-zero signal will fulfill this condition. This is true; what is lacking is the persistent component. With a non-zero but non-persistent signal, the MRAC formulation will drive the tracking error, e(t), to zero, but that is no guarantee that the adaptive weights will converge. In fact, zero tracking error stops adaptation and guarantees the weights will not converge to their ideal values. As an example, a step input would be exciting per the definition in (1.3), but an MRAC system would not converge to its ideal weights; the tracking error e(t) would simply go to zero. An example of this can be seen in Appendix B.2.1 on page 91 with a sinusoidal input to a two-state MRAC system.

Persistently exciting input is a different thing. Just because the input meets the definition in (1.3) does not mean the input is persistently exciting: the signal has to satisfy (1.3) for all time. The term 'persistent' applies to maintaining a non-zero tracking error, e(t). Boyd and Sastry discuss this in terms of the number of spectral lines the input has in [2].
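The excitation definition (1.3) is straightforward to check numerically. The sketch below approximates the excitation integral for a sampled input and reports its minimum eigenvalue; the signals and window length are illustrative assumptions. Note that a positive result certifies excitation over that window only, not persistent excitation.

```python
import numpy as np

def excitation_level(u_samples, dt):
    """Approximate int_t^{t+T} u(tau) u(tau)^T dtau for sampled inputs
    and return its smallest eigenvalue (the lambda in (1.3))."""
    gram = sum(np.outer(u, u) for u in u_samples) * dt
    return np.linalg.eigvalsh(gram).min()

dt = 0.01
t = np.arange(0.0, 2.0, dt)              # a finite window [t, t+T]
u_step = np.c_[np.ones_like(t)]          # step input, m = 1
u_zero = np.c_[np.zeros_like(t)]
print(excitation_level(u_step, dt))      # > 0: exciting over the window
print(excitation_level(u_zero, dt))      # 0: not exciting
```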
The idea is that with enough frequencies in the input, the system's tracking error, e(t), will not converge to zero unless the adaptive weights converge to their ideal values. MRAC is a minimization scheme in this context: with the amount of excitation Boyd and Sastry require, the least tracking error occurs when the adaptive weights equal the ideal weights.

However, existing results in CL-MRAC do not extend to the case when the input allocation matrix is uncertain. Simulation results of CL-MRAC working with an uncertain allocation matrix were presented in [26]; however, that work did not provide rigorous justification of stability when the input allocation is uncertain. In particular, unlike previously studied CL-MRAC approaches which assumed the allocation matrix was known [5, 8], the case of an uncertain input allocation matrix requires that the concurrent learning histories be purged once the estimate of the input allocation matrix converges close to its actual value, as shown in [27], though the stability argument in that paper did not put enough restrictions on the system. A more complete and correct treatment is given here.

1.4 Nussbaum Gains

Nussbaum used a more general formulation in which the input allocation function is unknown [23]. He used an equation set like

\dot{x}(t) = x(t) + \lambda f(x(t), y(t))
\dot{y}(t) = g(x(t), y(t))

where λ is a non-zero real number and f and g are differentiable. He shows there exist specific families of functions for which the proposed system is stable, but in order for the proof to work, Nussbaum needs random input commands to eventually find a value, c, that limits the values of g to be either positive or negative (he shows both cases) for values of x greater than c. Once c has been found, the sign of g is known for values greater than c. The hinge is the random input, though: aerospace applications generally do not react well to random input.

1.5 Eigenvectors and Eigenvalues

Important components of any matrix are its eigenvalues and eigenvectors. Eigenvalues are the roots of a matrix's characteristic polynomial, obtained from |A − λI| = 0, where |·| is the determinant operator, A is the matrix under investigation, I is the identity matrix, and λ is the working variable. So, for every n × n matrix there exists an nth-order characteristic equation, which by the fundamental theorem of algebra has n roots. Once the roots of the characteristic equation are known, the eigenvectors can be found by inspecting A xᵢ = λᵢ xᵢ for xᵢ, i ∈ {1, 2, ..., n}, assuming the eigenvalues are real and distinct. If not, there are methods that will be left to the reader in [3], among other texts. The xᵢ's found following this method are the eigenvectors associated with the ith eigenvalue. Together, the eigenvectors and eigenvalues give the directions (vectors) and the associated growth factors (values) by which a vector is scaled when multiplied by A. [3, Chap. 7]
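As a numeric illustration of the procedure just described, the sketch below computes the eigenvalues and eigenvectors of an arbitrary example matrix and verifies A xᵢ = λᵢ xᵢ for each pair.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])       # arbitrary example matrix

# Roots of the characteristic polynomial |A - lambda I| = 0
# and the associated eigenvectors A x_i = lambda_i x_i.
eigvals, eigvecs = np.linalg.eig(A)
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)   # verify each eigenpair
print(eigvals)   # both eigenvalues negative, so this A is Hurwitz
```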
CHAPTER 2
Model Reference Adaptive Control

2.1 Introduction

This work presents a hybrid MRAC method that directly uses an online-updated estimate of the uncertain allocation matrix in the adaptation laws. An outline of Model Reference Adaptive Control is presented first. Then Concurrent Learning Model Reference Adaptive Control is shown, and a discussion of how the CL-MRAC process was changed to be able to identify an input allocation matrix is reported. Combined variables and projection operators are the final segments of the chapter.

2.2 Classic Equations

Let D ⊂ Rⁿ be compact and let x(t) ∈ D be the state vector of the system. Consider a linear, time-invariant dynamical system of the form

\dot{x}(t) = Ax(t) + Bu(t)    (2.1)

where A is the known state matrix and B is a known allocation matrix. Equation (2.1) is called the plant. The linear reference model is of the form

\dot{x}_{rm}(t) = A_{rm} x_{rm}(t) + B_{rm} r(t)    (2.2)

where A_rm is chosen to be Hurwitz, B_rm is chosen to be an identity matrix, and the reference model input is

r(t) = -B_{rm}^{\dagger} A_{rm} x_{des}(t)    (2.3)

where † denotes the pseudoinverse operation of [24] (Tikhonov regularization [32] may be used to guarantee the inverse exists). The desired state, x_des, is a bounded signal. The control law applied to the plant is defined as the following, per [1, 30]:

u(t) = K^T(t) x(t) + K_r^T(t) r(t)    (2.4)

where the first term on the right-hand side is the state feedback and the second is the reference-input feedforward term. Defining the error as

e(t) = x(t) - x_{rm}(t),    (2.5)

it can be shown (see [1, 10, 30]) that the time derivative of the tracking error is

\dot{e}(t) = A_{rm} e(t) + B\tilde{K}^T(t) x(t) + B\tilde{K}_r^T(t) r(t)    (2.6)

where

\tilde{K}(t) = K(t) - K^*    (2.7)
\tilde{K}_r(t) = K_r(t) - K_r^*    (2.8)

are the vanishing weights. This progression is also shown in detail in Appendix A.3 on page 70. The starred parameters come from the assumption of matched uncertainty, which guarantees the existence of ideal constant gains, K* and K*_r.

Assumption 1 (Matching Conditions) There exist two constant, non-zero matrices, K* and K*_r, such that A + BK*ᵀ = A_rm and BK_r*ᵀ = B_rm.

Using (2.1), (2.2), (2.4), and the matching conditions, the state derivative of the plant can be represented as

\dot{x}(t) = A_{rm} x(t) + B_{rm} r(t) + B\tilde{K}^T(t) x(t) + B\tilde{K}_r^T(t) r(t)    (2.9)

which is derived in detail in Appendix A.1 on page 66. The adaptation laws for K and K_r are

\dot{K}(t) = -\Gamma_x x(t) e^T(t) P B    (2.10)
\dot{K}_r(t) = -\Gamma_r r(t) e^T(t) P B    (2.11)

where P comes from the Lyapunov equation. Let P, Q ∈ R^{n×n} be positive definite matrices such that the Lyapunov equation holds:

A_{rm}^T P + P A_{rm} = -Q.    (2.12)

Then (2.10) and (2.11) are also the derivatives of K̃ and K̃_r, because from (2.7) and (2.8), K̃ and K̃_r are linear combinations of K and K*, and of K_r and K*_r. Since K* and K*_r are both constant matrices, their derivatives are 0, leaving just the derivatives of K and K_r as shown in (2.10) and (2.11).
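The update laws (2.10)-(2.11) translate nearly line for line into code. The following is a minimal MRAC loop for a hypothetical two-state plant with B known, as assumed in this section; all matrices, learning rates, and the reference command are illustrative assumptions, not values from the simulations of Chapter 6.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical plant (2.1) and reference model (2.2); B is known here.
A = np.array([[0.0, 1.0], [4.0, -1.0]])       # unstable open loop
B = np.array([[1.0, 0.0], [0.5, 1.0]])        # square allocation matrix
A_rm = np.array([[0.0, 1.0], [-4.0, -2.0]])   # Hurwitz by choice
B_rm = np.eye(2)

# P from the Lyapunov equation (2.12): A_rm^T P + P A_rm = -Q
Q = np.eye(2)
P = solve_continuous_lyapunov(A_rm.T, -Q)

Gx, Gr = 10.0, 10.0          # learning rates Gamma_x, Gamma_r
K = np.zeros((2, 2))         # adaptive gains for u = K^T x + K_r^T r
Kr = np.zeros((2, 2))

dt = 1e-3
x = np.zeros(2)
x_rm = np.zeros(2)
for k in range(int(20.0 / dt)):
    t = k * dt
    r = np.array([np.sin(t), np.cos(0.5 * t)])   # exciting reference command
    u = K.T @ x + Kr.T @ r                       # control law (2.4)
    e = x - x_rm                                 # tracking error (2.5)
    # adaptation laws (2.10)-(2.11)
    K += dt * (-Gx * np.outer(x, e) @ P @ B)
    Kr += dt * (-Gr * np.outer(r, e) @ P @ B)
    # propagate plant (2.1) and reference model (2.2)
    x = x + dt * (A @ x + B @ u)
    x_rm = x_rm + dt * (A_rm @ x_rm + B_rm @ r)
print(np.linalg.norm(x - x_rm))   # tracking error after 20 s (small once adapted)
```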
2.3 Concurrent Learning Model Reference Adaptive Control

CL-MRAC differs from MRAC in that concurrent learning operates on an estimate of the weight error concurrently with the tracking error. The regressor terms for the jth recorded data point for adaptive gains K and K_r are:

\epsilon_{K_j} = \tilde{K}^T x_j = K^T x_j - \delta_{K_j}    (2.13)
\delta_{K_j} = K^{*T} x_j = K^T x_j - B^{\dagger}(\dot{x}_j - A_{rm} x_j - B_{rm} r_j - B\epsilon_{K_r j})    (2.14)
\epsilon_{K_r j} = \tilde{K}_r^T r_j = K_r^T r_j - \delta_{K_r j}    (2.15)
\delta_{K_r j} = K_r^{*T} r_j = B^{\dagger} B_{rm} r_j    (2.16)

Along with δ_{K j} and δ_{Kr j}, concurrent learning also stores x_j and r_j at that time. Note that the evaluation of these regressors requires an estimate of ẋ_j for a recorded data point. The estimate can be computed using fixed-point smoothing. This method has been validated through several flight tests to yield acceptable results (see [6, 8-10]); furthermore, [20] shows that the CL-MRAC framework is robust to noise in estimating ẋ_j. Also, ẋ_j is a stored value, which means there is time to use Kalman filtering to improve the estimate of ẋ_j before using it, if so desired. In this work, zero-mean, truncated, white Gaussian noise was added to ẋ to simulate measurement error.

The regressor in (2.13) comes from rearranging (2.9), and the regressor in (2.15) comes from the matching condition (Assumption 1) that defined K*_r together with the definition of K̃_r, (2.8). Multiplying by x_j and r_j respectively compresses the data into an n × 1 vector for storage in the history stack. The regressor terms in (2.13) and (2.15) are summed and used in the update laws for K and K_r in CL-MRAC (see [4, 26]). The CL-MRAC update laws are

\dot{K}(t) = -\Gamma_x \left( x(t) e^T(t) P B + \sum_{j=1}^{p_{max}} x_j \epsilon_{K_j}^T \right)    (2.17)
\dot{K}_r(t) = -\Gamma_r \left( r(t) e^T(t) P B + \sum_{j=1}^{p_{max}} r_j \epsilon_{K_r j}^T \right)    (2.18)

where p_max is the maximum number of data points to be stored. The error terms (the summations in (2.17) and (2.18)) are part of CL-MRAC and would not be present in MRAC update laws.

As a counterpoint to requiring a highly oscillatory input per [2], concurrent learning may be used with step inputs or single sinusoids. As long as the input is exciting per (1.3) over some (short) finite time during which concurrent learning may record data, that data can be used along with the normal MRAC adaptation law, as in (2.17) and (2.18), to drive the adaptive weights to their ideal values per the following theorem.

Theorem 1 Consider the system in (2.1), the control law of (2.4), and let p ≥ n be the number of recorded data points. Let X_k = [x₁, x₂, ..., x_p] be the history stack matrix containing recorded states, and R_k = [r₁, r₂, ..., r_p] be the history stack matrix containing recorded reference signals. Assume that over a finite interval [0, T] the exogenous reference input r(t) is exciting, that the history stack matrices are empty at t = 0, and that they are consequently updated using Algorithm 1 of [10], the Concurrent Learning History Stack Minimum Eigenvalue Maximization (CL-HSMEM) routine. Then the concurrent learning weight update laws of (2.17) and (2.18) guarantee that the zero solution [e(t), K̃(t), K̃_r(t)] ≡ 0 is globally exponentially stable.

Proof. See [10] for the proof.

An example of concurrent learning on a two-state system with step inputs is shown in Appendix B.2.2 on page 92.

2.4 CL-MRAC with Uncertain Allocation Matrix

In order to describe the method for uncertain input allocation CL-MRAC, the framework of CL-MRAC in section 2.3 must be expanded to estimate the input allocation uncertainty. The classical implementation of MRAC from section 2.2 would not operate consistently with an uncertain allocation matrix, because the adaptive laws, grounded in the Lyapunov function for the system, assume the sign (and magnitude) of the input allocation matrix. The wrong sign would drive the adaptation in the opposite direction, causing divergence of parameters and instability. CL-MRAC will bound parameter growth ([10, 26]), but the uncertain B matrix is still not tackled.

Since the input allocation matrix, B, is uncertain, the controller uses an internal estimate of B, denoted B̂(t). Then B̂(t) is related to B by the following:

\tilde{B}(t) \equiv \hat{B}(t) - B    (2.19)

where B̃(t) is not directly measurable. The regressor for B̂(t)u(t) is defined as ẋ(t) − Ax(t), with the following assumption.

Assumption 2 The state matrix, A, is assumed to be known.

The knowledge of A is a restrictive assumption; however, there is a lack of results on the topic of adaptive control with uncertain input allocation even with this assumption. Future work will try to relax Assumption 2 along the lines of the empirical evidence presented in [26].
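The data selection underlying the history stacks can be sketched as follows. This greedy routine, which admits a candidate point only if it raises the stack's minimum singular value, is an illustrative stand-in for the CL-HSMEM routine of [10] referenced in Theorem 1, not that exact algorithm.

```python
import numpy as np

class HistoryStack:
    """Fixed-size store of regressor columns for concurrent learning.
    A candidate replaces a stored column only if it raises the minimum
    singular value of the stack (a greedy CL-HSMEM-like rule)."""
    def __init__(self, dim, p_max):
        self.X = np.zeros((dim, 0))
        self.p_max = p_max

    @staticmethod
    def _min_sv(X):
        return np.linalg.svd(X, compute_uv=False).min() if X.size else 0.0

    def consider(self, col):
        col = col.reshape(-1, 1)
        if self.X.shape[1] < self.p_max:            # still filling the stack
            cand = np.hstack([self.X, col])
            if self.X.shape[1] == 0 or self._min_sv(cand) > 1e-9:
                self.X = cand
            return
        best, best_j = self._min_sv(self.X), None
        for j in range(self.p_max):                 # try swapping each column
            trial = self.X.copy()
            trial[:, j:j+1] = col
            sv = self._min_sv(trial)
            if sv > best:
                best, best_j = sv, j
        if best_j is not None:
            self.X[:, best_j:best_j+1] = col

stack = HistoryStack(dim=2, p_max=4)
for t in np.arange(0.0, 3.0, 0.05):
    stack.consider(np.array([np.sin(t), np.cos(2 * t)]))
print(HistoryStack._min_sv(stack.X))   # > 0 once the stack is full rank
```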
To utilize B̂(t), equations (2.6), (2.9), and (2.15)-(2.16) are rewritten by solving (2.19) for B and inserting, as follows:

\dot{x} = A_{rm} x + B_{rm} r + \hat{B}\tilde{K}^T x - \tilde{B}\tilde{K}^T x + \hat{B}\tilde{K}_r^T r - \tilde{B}\tilde{K}_r^T r    (2.20)
\dot{e} = A_{rm} e + \hat{B}\tilde{K}^T x + \hat{B}\tilde{K}_r^T r - \tilde{B}(\tilde{K}^T x + \tilde{K}_r^T r)    (2.21)
\hat{\delta}_K = K^{*T} x + \hat{B}^{\dagger}\tilde{B}(\tilde{K}_r^T r + \tilde{K}^T x) = K^T x - \hat{B}^{\dagger}(\dot{x} - A_{rm} x - B_{rm} r - \hat{B}\tilde{K}_r^T r)    (2.22)
\hat{\delta}_{K_r} = K_r^{*T} r - \hat{B}^{\dagger}\tilde{B} K_r^{*T} r = \hat{B}^{\dagger} B_{rm} r    (2.23)
\hat{\epsilon}_K = \tilde{K}^T x_j = K^T x_j - \hat{\delta}_{K_j}    (2.24)
\hat{\epsilon}_{K_r} = \tilde{K}_r^T r_j = K_r^T r_j - \hat{\delta}_{K_r j}    (2.25)

The full derivations can be found in Appendix A.1. Again, (2.22) is derived from (2.20). Note that δ̂_K has elements of B̃ stored in it under this derivation. This situation will remain, but with concurrent learning, B̃ can be forced to become small. In the same way, (2.23) also stores more than estimates of K*_r r; it includes elements of B̃ as well. The error regressor for B̂ is

\epsilon_B = \hat{B}(t) u(t) - \dot{x}(t) + A x(t)    (2.26)

(or B̃u in expanded form), which is then used in concurrent learning for B̂. The update law for B̂(t) is chosen as follows:

\dot{\hat{B}}(t) = -\Gamma_B \left[ \left( \hat{B}(t)u(t) - \dot{x}(t) + Ax(t) \right) u^T(t) + \sum_{i=1}^{p_{max}} \epsilon_{B_i} u_i^T \right] = -\Gamma_B \left( \tilde{B}(t) u(t) u^T(t) + \sum_{i=1}^{p_{max}} \tilde{B}(t) u_i u_i^T \right)    (2.27)

where B̃(t) is defined in (2.19). And (2.17) and (2.18) are rewritten using the estimator, B̂, instead of B as

\dot{K}(t) = -\Gamma_x \left( x(t) e^T(t) P \hat{B}(t) + \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K_i}^T \right)    (2.28)
\dot{K}_r(t) = -\Gamma_r \left( r(t) e^T(t) P \hat{B}(t) + \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r j}^T \right)    (2.29)

because these use terms the controller knows. As in [4, 8, 26], the history stacks of the concurrent learning mechanism are populated while the input signal is exciting per the definition given in (1.3), up to a given number of points, p_max. The CL-HSMEM routine of [10] is used to select and replace future data points in the history stack.

Remark 2.1 Solving for δ̂_K and δ̂_{Kr} is the only place the controller inverts the B̂ matrix. Singularities may be avoided here by placing a dead-zone around zero for B̂ and turning off the CL-HSMEM routine until the norm of B̂ moves away from zero.

Assumption 3 The input allocation matrix, B, is uncertain except for its dimension.

The uncertainty may include the signs of the elements of B, but knowledge of the dimension of B is necessary because it affects the dimensions of K* and K*_r, and therefore of K and K_r. A completely different result would be found if the input allocation matrix were a vector instead of a square matrix (or non-square). Also, knowing whether the system is or is not multi-input is rather straightforward.

2.5 Combining Variables

For ease of exposition, several matrices will be concatenated together. W, W*, W̃, and Ẇ, along with σ and ε̂_W, are now defined.

Definition 3 Let W be a 2n × n matrix of the form:

W \equiv \begin{bmatrix} K \\ K_r \end{bmatrix}.    (2.30)

Using (2.30) will reduce the number of terms in the results in later chapters. Definition 3 subsequently requires the definition of the following:

Definition 4 (W*, W̃, and Ẇ)

W^* \equiv \begin{bmatrix} K^* \\ K_r^* \end{bmatrix}    (2.31)
\tilde{W} \equiv \begin{bmatrix} \tilde{K} \\ \tilde{K}_r \end{bmatrix}    (2.32)
\dot{W} \equiv \begin{bmatrix} \dot{K} \\ \dot{K}_r \end{bmatrix}    (2.33)

where (2.33) refers to (2.17) and (2.18). The state and reference input shall be combined as well.

Definition 5 Let σ be a 2n × 1 matrix of the form:

\sigma \equiv \begin{bmatrix} x \\ r \end{bmatrix}.    (2.34)

This is necessary for use with W.

Definition 6 Let ε̂_{W i} be a 2n × 1 matrix of the form:

\hat{\epsilon}_{W_i} \equiv \begin{bmatrix} \hat{\epsilon}_K \\ \hat{\epsilon}_{K_r} \end{bmatrix}    (2.35)

where ε̂_K refers to (2.24) and ε̂_{Kr} refers to (2.25).
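The estimator (2.26)-(2.27) can be exercised in isolation. The sketch below adapts B̂ for a hypothetical plant with known A, storing input/derivative pairs so the recorded regressors can be re-evaluated with the current estimate; the matrices, gain, and recording rule are illustrative assumptions, and ẋ is treated as measured.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -1.0]])      # known state matrix (Assumption 2)
B_true = np.array([[1.0, 0.0], [0.3, -1.0]])  # unknown to the estimator
B_hat = np.zeros((2, 2))                      # controller's estimate of B
Gamma_B = 2.0
stored = []                                   # (u_i, xdot_i - A x_i) pairs

dt, x = 1e-3, np.zeros(2)
for k in range(int(10.0 / dt)):
    t = k * dt
    u = np.array([np.sin(t), np.cos(3.0 * t)])
    xdot = A @ x + B_true @ u                 # plant; xdot assumed measurable
    if k % 1000 == 0 and len(stored) < 10:    # crude recording rule for X_Bhat
        stored.append((u.copy(), xdot - A @ x))
    # update law (2.27): instantaneous regressor (2.26) plus recorded regressors,
    # each epsilon_B,i re-evaluated with the current estimate B_hat
    dB = np.outer(B_hat @ u - xdot + A @ x, u)
    for ui, yi in stored:
        dB += np.outer(B_hat @ ui - yi, ui)
    B_hat -= dt * Gamma_B * dB
    x = x + dt * xdot
print(np.round(B_hat, 2))   # approaches B_true as the recorded data becomes rich
```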
2.6 The Projection Operator

One of the many methods to increase the robustness of a solution is to add a projection operator of the form found in [19, 25] to the update law. A projection operator works by creating a ball inside which the weights are allowed to move freely; if the weight vector approaches the boundary of the specified ball, the projection operator begins to remove the component of the weight update that is perpendicular to the boundary. The amount removed increases as the boundary is approached, and by the time the weights reach the boundary, all of the perpendicular component of the update has been removed, so that the weight vector is projected parallel to the boundary. This is shown in Figure 2.1. [19]

Figure 2.1: Projection Operator in the Weight Space, adapted from [19]. Any θ such that f(θ) ≤ 1 is an allowable value, with Φ_T delineated by f(θ) = 1 and Φ₀ by f(θ) = 0. If the weights arrive at the outer boundary moving along y, the projection operator redirects y to become parallel to the boundary, Proj(θ, y, f), by removing the ∇f component from y.

The function f : Rⁿ → R is used to smoothly transition from not projecting at all to fully projecting the vector, y, parallel to the boundary where f(θ) = 1. For this thesis, the function is of the form

f(\theta, \theta_{Tstart}, \theta_{Tend}) = \frac{\|\theta\|_F^2 - \theta_{Tstart}^2}{2\theta_{Tend}\theta_{Tstart} + \theta_{Tend}^2}    (2.36)

where θ_Tstart is the dashed line shown in Figure 2.1 at which the projection operator begins to affect the result, and θ_Tend is the additional width from θ_Tstart to the maximum value that should be allowed by the projection operator. Thus, when ‖θ‖ = θ_Tstart + θ_Tend, the entire component of the update perpendicular to the boundary, delineated by f(θ) = 1, has been removed and the transition is complete. ‖·‖_F is the Frobenius norm for matrices and the 2-norm for vectors, as defined in [30]. If θ_Tstart and θ_Tend have been pre-specified, then (2.36) may be denoted f(θ). For completeness, the gradient of f(θ), where θ is a vector, has components

\frac{\partial f}{\partial \theta_i} = \frac{\partial}{\partial \theta_i}\left[\frac{\sum_j \theta_j^2 - \theta_{Tstart}^2}{2\theta_{Tend}\theta_{Tstart} + \theta_{Tend}^2}\right] = \frac{2\theta_i}{2\theta_{Tend}\theta_{Tstart} + \theta_{Tend}^2}.    (2.37)

Thus, a transition set of values of θ where the projection operator is active is created:

\Phi_1 = \{\theta \mid f(\theta) \leq 1\}    (2.38)
\Phi_0 = \{\theta \mid f(\theta) \leq 0\}    (2.39)
\Phi_T = \Phi_1 \setminus \Phi_0    (2.40)

All values of θ in set Φ₁ are allowable, as are those in set Φ₀; the point here is to define one region where the projection operator does nothing and another region, Φ_T, where the projection operator does its job.

Definition 7 (Projection Operator) The projection operator is then defined as

Proj(\theta, y, f) = \begin{cases} y - \dfrac{\nabla f(\theta)(\nabla f(\theta))^T y}{\|\nabla f(\theta)\|^2} f(\theta) & f(\theta) > 0 \wedge y^T \nabla f(\theta) > 0 \\ y & \text{otherwise} \end{cases}    (2.41)

where θ is the vector the operator is limiting to within the set defined by the function f per (2.36), and y is the vector which defines the growth of θ.

The preceding definition is for the case of two vectors, θ and y. To generalize the operator to the matrix case, the projection operator is defined as

Proj(\Theta, Y, F) = [Proj(\theta_1, y_1, f(\theta_1)), \ldots, Proj(\theta_m, y_m, f(\theta_m))]    (2.42)

where Θ, Y ∈ R^{n×m}, given Θ = [θ₁, ..., θ_m], Y = [y₁, ..., y_m], and F = [f(θ₁), ..., f(θ_m)], with j = 1 to m. If more control of the extents of the boundary is needed, the matrix case could be redefined with an indexed list of bounds for each column of the matrix, so that each column's norm is kept within the indexed values, like the format in [19]. The Gamma Projection operator is also developed in [19] to handle the case when the learning-rate gain matrix for MRAC is not of the form Γ = λI.
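Definition 7, together with (2.36) and (2.37), translates directly into code for the vector case; the bounds in the example below are illustrative.

```python
import numpy as np

def f(theta, t_start, t_end):
    """Transition function (2.36); f <= 0 inside the inner ball,
    f = 1 on the outer boundary ||theta|| = t_start + t_end."""
    return (theta @ theta - t_start**2) / (2.0 * t_end * t_start + t_end**2)

def grad_f(theta, t_start, t_end):
    """Gradient (2.37): 2 theta / (2 t_end t_start + t_end^2)."""
    return 2.0 * theta / (2.0 * t_end * t_start + t_end**2)

def proj(theta, y, t_start, t_end):
    """Projection operator (2.41): scale away the component of y along
    grad f as the boundary is approached."""
    fv = f(theta, t_start, t_end)
    g = grad_f(theta, t_start, t_end)
    if fv > 0.0 and y @ g > 0.0:
        return y - (np.outer(g, g) @ y / (g @ g)) * fv
    return y

theta = np.array([2.9, 0.0])   # just inside the outer bound of 3.0
y = np.array([1.0, 1.0])       # raw update pushing outward
print(proj(theta, y, t_start=2.0, t_end=1.0))
# outward component mostly removed; tangential component kept
```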
2.7 Derivatives with the Projection Operator

Using the operator explained in section 2.6, the derivatives for both W and B̂ are converted to the following:

\dot{W} = Proj\left(W,\ -\Gamma_W \left( \sigma e^T P \hat{B} + \sum_{i=1}^{p_{max}} \sigma_i \hat{\epsilon}_{W_i}^T \right),\ f(W, \omega_b, \omega_{max})\right)    (2.43)
\dot{\hat{B}} = Proj\left(\hat{B},\ -\Gamma_B \left( \tilde{B} u u^T + \sum_{k=1}^{p_{max}} \tilde{B} u_k u_k^T \right),\ f(\hat{B}, \beta_b, \beta_{max})\right)    (2.44)

where ω_b is the norm value that begins the transition of the projection operator for the weights, ω_max is the maximum norm value allowed for the weights, and β_b and β_max are, respectively, the beginning and maximum norm values allowed for B̂.

2.8 Summary

Model Reference Adaptive Control has been reviewed, along with the changes necessary to add concurrent learning to the update laws and the changes necessary to estimate an uncertain input allocation matrix. Then projection operators were discussed as a method to add robustness to a system. Finally, the projection operator was used to define the derivatives for the concurrent learning model reference adaptive controller.

CHAPTER 3
Error in the History Stack

3.1 Introduction

The mechanism of concurrent learning will store errors, defined in section 2.4, in the history stack if the data is linearly independent. This chapter discusses the need for shocking the adaptive weight history stack to remove those accumulated errors.

3.2 Shocking

The concurrent learning history stacks are empty at t = t₀ and are filled with data per the CL-HSMEM routine explored in [10]. The following lemma shows that B̃ goes to zero.

Lemma 1 Consider the plant of (2.1), the reference model of (2.2), the control law of (2.4), the weight update law of (2.44), and Assumption 2. Then B̃(t) → 0 exponentially as t → ∞.

Proof. From Theorem 1, let (2.26) be the reference signals, R_k, and let u be used in place of the state vector for X_k; the result is then straightforward.

Hence, by Lemma 1, there exists a time, t_s > 0, such that ‖B̃(t_s)‖ ≤ ε₀, where ε₀ is a small positive constant. Before t_s, the data stored in the history stack consists of estimates of W* formed by using (2.22) and (2.23) together. However, when these estimates were recorded, the B̃ term was large, causing the stored values to be very different from the expected K* and K*_r. As the theoretical results show later, this incorrect data in the W history stack helps ensure that the system response stays bounded, but W will not converge to its ideal values as long as the incorrect data remains in the stack. Therefore, the stack for W must be shocked, or purged, to remove this incorrect data and allow collection of new data while the estimate, B̂, is closer to the actual allocation matrix. Hence, the condition for shocking the stack becomes ‖B̃(t)‖ ≤ ε₀. Since B̃(t) is assumed not to be directly measurable, three methods to estimate it are presented: a heuristic (Algorithm 1), hypothesis testing (Algorithm 2), and investigating the variance of the expectation of the regressor (Algorithm 3).

Algorithm 1: Heuristic, Time Based Method
Require: x(t), x_rm(t), B̂(t), u(t), dt
  ẋ(t) ⇐ plant (2.1)
  ẋ_rm(t) ⇐ model (2.2)
  \dot{\hat{B}}(t) ⇐ control law (2.44)
  if ‖\dot{\hat{B}}(t)‖ < ε₀ then    ▷ Choose ε₀ to be small
    Step counter, cnt
  end if
  if cnt · dt == 1 sec then    ▷ Heuristic
    Purge history stack for W
  end if

3.2.1 Heuristic Algorithm

Algorithm 1 estimates B̃(t) by counting the iterations for which the norm of \dot{\hat{B}}(t) stays below ε₀. When \dot{\hat{B}} has been small for a number of iterations that accounts for a second of time, the algorithm shocks the history stack.
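A sketch of Algorithm 1's counting logic in isolation follows; ε₀ and the step size are illustrative, and the one-second window follows the listing. The caller supplies ‖\dot{\hat{B}}(t)‖ each iteration and purges X_W and R_W when the method returns true.

```python
class HeuristicShock:
    """Algorithm 1: count control-law iterations where ||Bhat_dot|| < eps0
    and signal a purge of the W history stack after one second's worth."""
    def __init__(self, dt, eps0=1e-4):
        self.dt, self.eps0, self.cnt = dt, eps0, 0

    def step(self, bhat_dot_norm):
        if bhat_dot_norm < self.eps0:   # "Step counter, cnt"
            self.cnt += 1
        # "if cnt * dt == 1 sec then purge"
        if abs(self.cnt * self.dt - 1.0) < 0.5 * self.dt:
            return True                 # caller purges X_W and R_W
        return False

shock = HeuristicShock(dt=1e-3)
purged = [k for k in range(2000) if shock.step(5e-5)]
print(purged)   # fires once, after one second of a quiet estimate
```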
While Algorithm 1 is simple to implement, it needs to be told by an outside source that the allocation matrix has changed. The algorithm is not robust to multiple changes of the allocation matrix without outside information.

3.2.2 Hypothesis Testing Algorithm

To address the limitation of Algorithm 1, a hypothesis test is used in Algorithm 2 to detect changes in the B matrix relative to B̂, based on a rolling set of 2-norms of the expectations of

\tilde{B}(t)u(t) \approx \hat{\dot{x}}(t) - \dot{x}(t)    (3.1)

where \hat{\dot{x}}(t) = Ax(t) + \hat{B}(t)u(t), so that, apart from measurement errors, (3.1) is a good approximation of the effect of B̃. A rolling set is a data-smoothing construct holding a fixed number of elements which are replaced, oldest first, by new elements. Using only p_max elements in the rolling set, the set is updated whenever ‖u(t)‖ is greater than zero; otherwise, the set could artificially go to zero. The check to purge the history stacks is executed every iteration because, due to Lemma 1, concurrent learning drives B̃(t) toward zero continuously.

Algorithm 2: Hypothesis Test on Expectation of (\hat{\dot{x}} − ẋ)
Require: x(t), x_rm(t), B̂(t), u(t)
  ẋ(t) ⇐ plant (2.1), ẋ_rm(t) ⇐ model (2.2), and \dot{\hat{B}}(t) ⇐ control law (2.44)
  \hat{\dot{x}}(t) ⇐ Ax(t) + B̂(t)u(t)
  if ‖u(t)‖₂ > 0 then
    xrollmean ⇐ (1/p_max) Σᵢ^{p_max} (\hat{\dot{x}}ᵢ − ẋᵢ)(\hat{\dot{x}}ᵢ − ẋᵢ)ᵀ
    Add (\hat{\dot{x}} − ẋ)(\hat{\dot{x}} − ẋ)ᵀ − xrollmean to rolling list, xroll
    xrollave ⇐ mean(xroll)
    xrollS ⇐ sqrt(var(xroll))
  end if
  Expected-ẋ ⇐ (\hat{\dot{x}} − ẋ)(\hat{\dot{x}} − ẋ)ᵀ − xrollmean
  UCL ⇐ 4.781 · xrollS / √p_max    ▷ 99.99% Upper Control Limit
  Tstat ⇐ ‖Expected-ẋ‖₂ · √p_max / xrollS    ▷ Statistic
  if Tstat < UCL then
    Purge history stack for W
  end if

Algorithm 2 estimates B̃(t) with (\hat{\dot{x}}(t) − ẋ(t))(\hat{\dot{x}}(t) − ẋ(t))ᵀ, since \hat{\dot{x}}(t) − ẋ(t) = (B̂(t) − B)u(t) = B̃(t)u(t). The upper control limit of the hypothesis test changes with the mean and standard deviation of the rolling set. The statistic depends on the current measurement as well as the standard deviation: it is built from the difference between the plant and the controller's estimate of the plant. The outer product of this difference is normed to return a positive number, so the statistic is positive semi-definite. The variance approaches zero faster due to being squared. Unlike Algorithm 1, if the allocation matrix changes again, the difference \hat{\dot{x}}(t) − ẋ(t) will be non-zero and the control limit will increase, as will the statistic, automatically allowing successive changes in the B matrix to be detected.

Algorithm 3: Variance of Average Allocation Matrix Estimate Error
Require: x(t), x_rm(t), B̂(t), u(t)
  ẋ(t) ⇐ plant (2.1), ẋ_rm(t) ⇐ model (2.2), and \dot{\hat{B}}(t) ⇐ control law (2.44)
  \hat{\dot{x}}(t) ⇐ Ax(t) + B̂(t)u(t)
  if ‖u(t)‖₂ > 0 then
    xrollmean ⇐ (1/p_max) Σᵢ^{p_max} (\hat{\dot{x}}ᵢ − ẋᵢ)(\hat{\dot{x}}ᵢ − ẋᵢ)ᵀ
    Add (\hat{\dot{x}} − ẋ)(\hat{\dot{x}} − ẋ)ᵀ − xrollmean to rolling list, xSnroll
    xSnrollstd ⇐ sqrt(var(xSnroll))
  end if
  if xSnrollstd < tol then    ▷ small tolerance
    Purge history stack for W
    Set flag to indicate that stack has been purged
  end if
  if flag set then
    if xSnrollstd > tol2 then
      Reset flag to indicate that the stack can be purged again
    end if
  end if
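Algorithms 2 and 3 share the same rolling set of de-meaned outer products. The sketch below implements that bookkeeping together with Algorithm 2's statistic; reducing the matrix samples to scalars with norms, the window size, and the demonstration signal are implementation assumptions beyond what the listings fix.

```python
import numpy as np
from collections import deque

class RollingShockTest:
    """Sketch of the rolling-set test of Algorithm 2 (Algorithm 3 reuses
    the same rolling set but thresholds its standard deviation)."""
    def __init__(self, p_max=20):
        self.roll = deque(maxlen=p_max)   # oldest elements drop automatically
        self.p_max = p_max

    def step(self, xdot_hat, xdot, u):
        d = xdot_hat - xdot               # equals Btilde u, per (3.1)
        outer = np.outer(d, d)
        if np.linalg.norm(u) > 0.0:       # only update while input is active
            mean = np.mean(self.roll, axis=0) if self.roll else np.zeros_like(outer)
            self.roll.append(outer - mean)
        if len(self.roll) < self.p_max:
            return False
        norms = [np.linalg.norm(m) for m in self.roll]
        s = max(np.std(norms), 1e-12)             # guard the division
        ucl = 4.781 * s / np.sqrt(self.p_max)     # 99.99% upper control limit
        tstat = norms[-1] * np.sqrt(self.p_max) / s
        return tstat < ucl                # True: purge the W history stack

test = RollingShockTest()
rng = np.random.default_rng(0)
decisions = [test.step(1e-6 * rng.standard_normal(2), np.zeros(2), np.ones(2))
             for _ in range(100)]
print(decisions[-1])   # whether a purge is signaled depends on the noise scale
```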
3.2.3 Variance of Average Allocation Matrix Estimate Error

Algorithm 3 is defined in order to shock the stack less often than Algorithm 2. It focuses on the standard deviation of the expectation of ẋ(t) − Ax(t), reasoning that when ‖B̃(t)‖ < ε₀, then sqrt(var((\hat{\dot{x}}(t) − ẋ(t))(\hat{\dot{x}}(t) − ẋ(t))ᵀ)) will be small. Therefore, two tolerances are given. The first, tol, is small, around 10⁻⁸, and is used to verify that the history stack should be shocked; the second tolerance, tol2, is much larger (≈ 10⁻³) and is used to indicate that the difference between \hat{\dot{x}}(t) and ẋ(t) has increased. In this way, the algorithm allows multiple changes to the allocation matrix to be detected. While the selection of tolerances is arbitrary, using the variance of the expectation of the difference between \hat{\dot{x}}(t) and ẋ(t) gives a measure of how consistently the expectation is near zero.

3.3 Summary

In this chapter, retained errors were discussed as part of concurrent learning. Those errors are present due to the uncertainty of the B matrix, and three methods were presented to remove them from the concurrent learning history stack. Algorithm 1 uses a simple time-counting scheme, Algorithm 2 uses a hypothesis test to decide whether to shock the stack, and Algorithm 3 looks at the variance of the average allocation matrix estimate error to decide if it is time to shock the stack.

CHAPTER 4
Stability

4.1 Introduction

Lyapunov stability theory is used to demonstrate the local boundedness of the zero solution. This chapter proceeds through a sequence of events, shown notionally in Figure 4.1. The system state begins to grow in response to input while the estimators for B and W* are wrong. Shortly thereafter, X_B̂ becomes full rank at time t_f, and B̃ begins to shrink exponentially following Lemma 1. W̃ continues to grow at first because of the large values of B̃ stored in the history stack. Because W is growing, ‖x‖ grows through the control law. Then ‖B̃‖ is found to be less than ε₀ and the shocking method shocks X_W. A little time passes while X_W again becomes full rank after t_s, and B̃ continues to decrease. Then W̃, too, shrinks, which pulls ‖x‖ back as well.

First, system boundedness before the B̂ history stack is full rank is presented. Then system boundedness after X_B̂ becomes full rank is shown. After that, a theorem combines the first two results. Then the ultimate boundedness of the system is described, and finally the convergence rate is discussed.

In this thesis, vec(·) is the vectorize operation, in which the columns of a matrix are stacked one upon the next. The function λ_max(·) returns the maximum eigenvalue of its argument, while λ_min(·) returns the minimum eigenvalue of its argument. Also, Algorithm 1 of [10], the concurrent learning history stack minimum eigenvalue maximization (CL-HSMEM) routine, is used throughout this chapter.

Figure 4.1: Notional Representation of System State, B̃, and W̃ with Time. Time is along the bottom, and representations of the sets containing ∫ẋ(τ)dτ, B̃, and W̃ are pictured above that axis. Time t_s is when the history stack shock occurs.

4.2 Boundedness without a Full Rank History Stack

The problem of showing that there is a limited amount of growth during the time required to select a full rank history stack is evaluated first. Define t_f as the time when the history stack for B̂, X_B̂, becomes full rank. This controller is only for use on inherently stable platforms, like some fixed-wing aircraft and unmanned aerial vehicles: the system must be able to sustain itself during the time between t₀, initialization, and t_f. The following theorem shows that the integral of ‖ẋ‖ over a fixed interval of time is finite because of the assumption of a linear plant and the use of projection operators.
Theorem 2 Consider the plant of (2.1), the reference model of (2.2), the control law of (2.4), the weight update laws of (2.44) and (2.43), and let p_max be an integer such that p_max > n, where n is the number of states in the system and p_max is the maximum number of recorded data points. Assume the ideal values, W*, and actual system parameters, B, are within Φ_{0,W} and Φ_{0,B}, respectively, per (2.39), for the projection operators used, and that W(0) and B̂(0) are within those sets as well. Let the norm bounds for the projection operators be chosen such that b_max ≥ 2‖P‖, where P is selected with the Lyapunov equation, and w_max ≥ 2‖W*‖ > 2. From Assumption 3, B is uncertain. Let X_B̂ = [u₁, u₂, ..., u_{p_max}] ∈ R^{n×p_max} be the history stack matrix containing recorded inputs, and let R_B̂ = [ε_{B,1}, ε_{B,2}, ..., ε_{B,p_max}] ∈ R^{n×p_max} be the history stack matrix of recorded regressors for B̂. Assume X_B̂ and R_B̂ are 0 at t₀ and are updated by Algorithm 1 of [10], the concurrent learning history stack minimum eigenvalue maximization (CL-HSMEM) routine. Assume that the exogenous reference input, r(t), is sufficiently rich to allow the CL-HSMEM routine to select n linearly independent points, and let t_f be the time at which X_B̂ becomes full rank. Then the zero solution of [e W̃ B̃] is bounded.

Proof. Selecting a quadratic Lyapunov candidate like

V(\zeta) = \frac{1}{2} e^T P e + \frac{1}{2}\,tr\left(\tilde{W}^T \Gamma_w^{-1} \tilde{W}\right) + \frac{1}{2}\,tr\left(\tilde{B}^T \Gamma_B^{-1} \tilde{B}\right)    (4.1)

allows the compilation of all the energy in the system into one variable. Noting that W̃ and B̃ are bounded by the projection operators leaves eᵀPe. From (2.5), the necessity is to bound x(t), because x_rm(t) is bounded due to the reference input being bounded and the reference model A_rm being Hurwitz. So, evaluate the integral of ‖ẋ‖ to see that it exists and is finite:

\int_{t_0}^{t_f} \|\dot{x}(\tau)\|\, d\tau = \int_{t_0}^{t_f} \|Ax(\tau) + Bu(\tau)\|\, d\tau    (4.2)

Then, since u = Wᵀσ,

\int_{t_0}^{t_f} \|\dot{x}(\tau)\|\, d\tau = \int_{t_0}^{t_f} \left\|Ax(\tau) + BW^T(\tau)\sigma(\tau)\right\|\, d\tau    (4.3)

and σ may be separated into [xᵀ rᵀ]ᵀ as follows:

\int_{t_0}^{t_f} \|\dot{x}(\tau)\|\, d\tau = \int_{t_0}^{t_f} \left\|Ax(\tau) + BW^T(\tau)\begin{bmatrix} x(\tau) \\ r(\tau) \end{bmatrix}\right\|\, d\tau.    (4.4)

The reference input is bounded by c_r > max‖r(t)‖, and the adaptation of W is bounded by the projection operators, as is B̂. The upper bounds are found by

w_{max} = \sup_{\tilde{W} \in \Phi_{1,W}} \|\tilde{W}\|    (4.5)
b_{max} = \sup_{\tilde{B} \in \Phi_{1,B}} \|\tilde{B}\|.    (4.6)

Substituting w_max for W and b_max for B and bounding r(t) obtains

\int_{t_0}^{t_f} \|\dot{x}(\tau)\|\, d\tau \leq \int_{t_0}^{t_f} \left( \|Ax(\tau)\| + b_{max}\left(w_{max}\|x(\tau)\| + w_{max} c_r\right) \right) d\tau.    (4.7)

Then, letting A_p = A + b_max w_max I_n, the remainder is

\int_{t_0}^{t_f} \|\dot{x}(\tau)\|\, d\tau \leq \int_{t_0}^{t_f} \left( \|A_p x(\tau)\| + b_{max} w_{max} c_r \right) d\tau    (4.8)

and applying the triangle inequality and integrating the constant term leaves

\int_{t_0}^{t_f} \|\dot{x}(\tau)\|\, d\tau \leq \int_{t_0}^{t_f} \|A_p x(\tau)\|\, d\tau + c.    (4.9)

Because the plant in this thesis is linear time-invariant, ‖x(τ)‖ can grow no faster than an exponential, so the integral of ‖A_p x(τ)‖ is bounded above by the integral of an exponential function plus c, which is finite over the finite interval from t₀ to t_f. Therefore, the zero solution of [e W̃ B̃] is bounded from t₀ to t_f.

Remark 4.1 The assumption that the reference input is rich enough is not difficult to satisfy. If the reference input is identically zero and the system initializes at the zero state, then there is no requirement for the system to move, and there is no linearly independent data to collect. Even with the system initializing at a non-zero state, only the error term drives change, which will not be a rich signal.
So, some level of excitation within the reference input is required to allow selection of linearly independent data points for the W history stack.

Remark 4.2 The growth of the state, x, depends on A_p in this derivation. A_p may be partially selected by the designer by choosing stable platforms, like some fixed-wing aircraft. If highly unstable platforms are chosen, or nonlinear plants are used, Theorem 2 does not necessarily hold.

4.3 Boundedness with a Full Rank History Stack

Proceeding toward the next event, the history stack for B̂ is now full rank, and the following theorem shows boundedness of the system while B̃ converges toward zero through concurrent learning, with B still not known.

Theorem 3 Consider the plant of (2.1), the reference model of (2.2), the control law of (2.4), the weight update laws of (2.44) and (2.43), and Theorem 1. Assume that a full rank history stack exists for B̂, that the ideal values for W* and B are within Φ_{0,W} and Φ_{0,B}, respectively, per (2.39), for the projection operators used, and that W(0) and B̂(0) are within the set Φ₀, too. Let the norm bounds for the projection operators be chosen such that b_max ≥ 2‖P‖, where P is selected from the Lyapunov equation, and w_max ≥ 2‖W*‖ > 2. From Assumption 3, B is uncertain, but by using Theorem 1 and one of the methods for shocking the history stack for W discussed in section 3.2, the zero solution of [e W̃ B̃] is bounded.

Proof. Focusing on a specific interval of time, [tᵢ, tᵢ₊₁], allows the '(t)' to be dropped for ease of reading. The Lyapunov candidate is chosen to be

V(\zeta) = \frac{1}{2} e^T P e + \frac{1}{2}\,tr\left(\tilde{W}^T \Gamma_w^{-1} \tilde{W}\right) + \frac{1}{2}\,tr\left(\tilde{B}^T \Gamma_B^{-1} \tilde{B}\right)    (4.10)

where ζ = [eᵀ vec(W̃)ᵀ vec(B̃)ᵀ]ᵀ and P, Q ∈ R^{n×n} agree with (2.12). The Lyapunov candidate (4.10) can be bounded above and below by

\min\left(\lambda_{min}(P), \lambda_{min}(\Gamma_w^{-1}), \lambda_{min}(\Gamma_B^{-1})\right)\|\zeta\|^2 \leq 2V(\zeta) \leq \max\left(\lambda_{max}(P), \lambda_{max}(\Gamma_w^{-1}), \lambda_{max}(\Gamma_B^{-1})\right)\|\zeta\|^2    (4.11)

where the λ_max(·) and λ_min(·) operators are described in section 4.1. Let [t₁, t₂, ..., t_{p_max}] ≤ tᵢ < tᵢ₊₁ be the sequence of times at which each data point was recorded in the past, with tᵢ the initial starting time. The derivative of the Lyapunov candidate along the system trajectory of (2.21) for each interval [tᵢ, tᵢ₊₁], with simplification and dropping (ζ) for ease, is

\dot{V} = -e^T \frac{Q}{2} e - tr\left(\tilde{W}^T \begin{bmatrix} \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K_i}^T \\ \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r j}^T \end{bmatrix}\right) - tr\left(\tilde{B} u u^T \tilde{B}^T\right) - tr\left(\sum_{k=1}^{p_{max}} \tilde{B} u_k u_k^T \tilde{B}^T\right) - \frac{1}{2}\left(\sigma^T \tilde{W} \tilde{B}^T P e + e^T P \tilde{B} \tilde{W}^T \sigma\right)    (4.12)

where σ is defined in (2.34); the whole derivation is in Appendix A.4.3.

Define ε̂_{Kr} = Δ_{Kr} + ε_{Kr} and ε̂_K = Δ_K + Δ_{Kr} + ε_K, where ε_{Kr} is from (2.15), ε_K is from (2.13), and Δ_K and Δ_{Kr} are the differences caused by using B̂ instead of B, as shown in (2.22): Δ_K = B̂†B̃(K̃ᵀx + K̃_rᵀr), and (2.23): Δ_{Kr} = −B̂†B̃K_r*ᵀr. Define Ω_σ = Σ_{i=1}^{p_max} σᵢσᵢᵀ, which creates a non-negative matrix. Then

\dot{V} = -e^T \frac{Q}{2} e - tr\left(\tilde{W}^T \Omega_\sigma \tilde{W}\right) - tr\left(\sum_{k=1}^{p_{max}} \tilde{B} u_k u_k^T \tilde{B}^T\right) - tr\left(\tilde{W}^T \begin{bmatrix} \sum_{i=1}^{p_{max}} x_i \Delta_{K_i}^T \\ \sum_{j=1}^{p_{max}} r_j \Delta_{K_r j}^T \end{bmatrix}\right) - \frac{1}{2}\left(\sigma^T \tilde{W} \tilde{B}^T P e + e^T P \tilde{B} \tilde{W}^T \sigma\right) - tr\left(\tilde{B} u u^T \tilde{B}^T\right)    (4.13)

and the expansion of ε̂_K and ε̂_{Kr} is in Appendix A.5. The first three terms in (4.13) are negative definite, and the last term is non-positive, depending on whether the input, u, is non-zero.

Let c_Q = λ_min(Q) and c_P = ‖P‖. Now, A_rm of the reference model is chosen to be Hurwitz, and the reference signal, r, is a scaled version of x_des, which is bounded. Therefore, there exist scalars c_rm, c_r > 0 such that c_rm > max(‖x_rm‖) and c_r > max(‖r‖). Let c_Ωσ = λ_min(Ω_σ) and

c_{err} > \left\| \begin{bmatrix} \sum_{i=1}^{p_{max}} x_i \Delta_{K_i}^T \\ \sum_{j=1}^{p_{max}} r_j \Delta_{K_r j}^T \end{bmatrix} \right\|.    (4.14)

The concurrent learning error terms, ε (in all variants), are functions of xᵢ and r_j, which are time-invariant and bounded over the interval. Let the constants c_R, c_K > 0 be such that c_R > ‖[c_rm c_r]ᵀ‖ and c_K > ‖[K* K*_r]ᵀ‖. Now, formulating an upper bound for V̇ in terms of e, W̃, and B̃ and simplifying yields
Let cΩσ = λmin (Ωσ ) and cerr Ppmax T i=1 xi ∆K i > P . pmax T j=1 rj ∆Kr j (4.14) The concurrent learning error terms, ^, (in all variants) are functions of xi and rj which are time invariant and bounded over the interval. Let the constants cR , cK > 0 where f B e cR > [ crm cr ]T and cK > [ K∗ K∗r ]T . Now, formulating an upper bound for V̇ e, W, and simplifying yields 2 2 cQ f e f e f 2 kek2 − cΩσ W − cu B + cerr W + cP B W kek 2 2 2 2 e f f e + cP crm B W kek − kek + cR W + cK B . V̇ ≤ − (4.15) f and B e once more, it can be seen that the projection Returning to the derivatives of W operator in use would limit both of these to at most wmax and bmax , respectively which are f and B e began within their respective sets Φ1 per defined by (4.5) and (4.6). Since both W (2.38), they are bounded within that set. Thus an upper bound for V̇ is cQ kek2 − cΩσ w2max − cu b2max + cerr wmax + cP bmax wmax kek2 2 2 + cP crm bmax wmax kek − kek + cR (wmax + cK )2 b2max . V̇ ≤ − (4.16) By setting the left-hand-side of (4.16) to 0, and neglecting the negative constant terms in (4.16), the conservative set outside of which V̇ is negative requires three constants. Let c1 , c2 , and c3 be greater than zero and let c1 = cQ + b2max (wmax + cK )2 − cP bmax wmax 2 (4.17) c2 = cP crm bmax wmax + 2cR b2max (wmax + cK )2 (4.18) c3 = cerr wmax . (4.19) 29 Then the inequality (4.16) may be restated as 0 ≤ − c1 kek2 + c2 kek + c3 c3 c2 ≤ . kek kek − c1 c1 (4.20) (4.21) To show that c1 is positive, remember that the selection of bmax and wmax was guided by the theorem. So, expanding c1 obtains cQ + b2max (w2max + 2cK wmax + c2K ) − cP bmax wmax 2 cQ 0< + b2max w2max + 2cK wmax b2max + b2max c2K − cP bmax wmax 2 c1 = (4.22) (4.23) and rearranging delivers cQ + b2max w2max + b2max c2K + 2cK wmax b2max > cP bmax wmax . 2 (4.24) Using the minimum values, per the theorem, (4.24) becomes cQ + (2cP )2 (2cK )2 + (2cP )2 c2K > 2cK (2cK )(2cP )2 + cP (2cP )(2cK ) 2 cQ + 5cK > 4cK + 1 8c2P cK (4.25) (4.26) Note that the left hand side is greater than the right hand side ignoring the cQ term if cK is chosen greater than 1. Now, a set outside of which the derivative of the Lyapunov candidate is negative may be shown: Ω= c2 c3 kek kek − ≤ c1 c1 (4.27) where c1 is from (4.17), c2 is from (4.18), and c3 is from (4.19). Thus the tracking error f and B e are bounded by their projection operators. Therefore, the zero is bounded and W h i fB e is bounded. solution of e W e conRemark 4.3 Though the above proof shows boundedness only, the thought is that B verges towards zero faster than the system tracking error grows because of the exponential convergence of concurrent learning. Thus the new errors in the current update stored in the history stack for W, XW , will be smaller in magnitude than those already stored. 30 4.4 Bounded Operation Here, the previous two theorems are combined together to show boundedness from the initial time, t0 , onward to a time T1 > tf when the history stack for W becomes full rank after being shocked. Theorem 4 Consider the system of (2.1), the reference model of (2.2), the control law of (2.4), the weight update laws of equations (2.27), and (2.33), and let pmax be an integer such that pmax > m > n where m = 2n is the number of rows in W, n is the number of states in the system, and pmax is the maximum number of recorded data points. 
Theorem 4 Consider the system of (2.1), the reference model of (2.2), the control law of (2.4), and the weight update laws of equations (2.27) and (2.33), and let $p_{\max}$ be an integer such that $p_{\max} > m > n$, where $m = 2n$ is the number of rows in W, n is the number of states in the system, and $p_{\max}$ is the maximum number of recorded data points. Assume the ideal values, $W^*$, and actual system parameters, B, are within $\Phi_{0,W}$ and $\Phi_{0,B}$, respectively, per (2.39), for the projection operators used, and that $W(0)$ and $\hat{B}(0)$ are within those sets as well. Let the norm bounds for the projection operator be chosen such that $b_{\max} \geq 2\|P\|$, where P is selected with the Lyapunov equation, and $w_{\max} \geq 2\|W^*\| > 2$. From Assumption 3, B is uncertain, so select a method of shocking from section 3.2. Let $X_{\hat{B}} = [u_1, u_2, \ldots, u_{p_{\max}}] \in \mathbb{R}^{n \times p_{\max}}$ be the history stack matrix containing recorded inputs, and let $R_{\hat{B}} = [\epsilon_{\hat{B},1}, \epsilon_{\hat{B},2}, \ldots, \epsilon_{\hat{B},p_{\max}}] \in \mathbb{R}^{n \times p_{\max}}$ be the history stack matrix of recorded regressors for $\hat{B}$. Assume $X_{\hat{B}}$ and $R_{\hat{B}}$ are 0 at $t_0$ and are updated by the CL-HSMEM routine of [10]. Let $X_W = [\sigma_1, \sigma_2, \ldots, \sigma_{p_{\max}}] \in \mathbb{R}^{m \times p_{\max}}$ be the history stack matrix of combined state and reference input points, and let $R_W = [\hat{\epsilon}_{W,1}, \hat{\epsilon}_{W,2}, \ldots, \hat{\epsilon}_{W,p_{\max}}] \in \mathbb{R}^{m \times p_{\max}}$ be the history stack containing the regressors for W from (2.35); let $X_W$ and $R_W$ be updated by the CL-HSMEM routine from 0 at time $t_0$. Assume that the exogenous reference input, r(t), is sufficiently rich to allow the CL-HSMEM routine to select n linearly independent points, and let $t_f$ be the time at which $X_{\hat{B}}$ becomes full rank. Let $T_1 > t_s$ be the time at which $X_W$ becomes full rank after shocking, and assume that the reference input continues to be sufficiently rich from $t_f$ to $T_1$ so that the CL-HSMEM routine may select at least m data points. Then the zero solution of $[\,e,\ \tilde{W},\ \tilde{B}\,]$ for this system is bounded.

Proof. This situation contains two cases: the time before $t_f$, when $X_{\hat{B}}$ becomes full rank, and the time after $t_f$. The first case, prior to $X_{\hat{B}}$ becoming full rank, fulfills the requirements of Theorem 2 and is therefore bounded. The second case, from time $t_f$ to $T_1$, the time when the history stack for W becomes full rank, meets the requirements of Theorem 3 and is therefore bounded. Then, since a common Lyapunov candidate was used in both cases, the zero solution for this system is bounded.

4.5 Ultimate Boundedness

In the previous section, the boundedness of the concurrent learning adaptive scheme was set forth in the presence of uncertain input allocation. In this section, the same scheme is investigated, but the local ultimate bound is found. In section 3.2, the term $\epsilon_0$ was defined for use with the shocking methods. Let $\epsilon_{t_s} = \|\tilde{B}(t_s)\|$. In this chapter, the proof of Theorem 4 shows that the closed loop system is bounded before $\|\tilde{B}\| < \epsilon_{t_s}$, which occurs at a time later than both $t_f$, when the history stack for $\hat{B}$ becomes full rank, and $t_s$, when the shocking method from section 3.2 first acts. Here, ultimate boundedness will be investigated, but only inside the bounds of the projection operators, assuming the ideal weights and parameter values are within the boundary of the projection operators, too. Thus, the following theorem is not for global ultimate boundedness, but for local ultimate boundedness. Once the history stack has been shocked, the $b_{\max}$ terms in (4.16) can be replaced with $\epsilon_{t_s}$ for upper bounding purposes.
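The CL-HSMEM routine of [10] and the shocking methods of section 3.2 are not reproduced here; the following is a minimal sketch in their spirit, assuming a simple minimum-singular-value admission test and a whole-stack purge. The names (HistoryStack, try_add, shock) and the admission logic are illustrative, not the routine from [10].

```python
import numpy as np

class HistoryStack:
    """Minimal history-stack sketch: store data points (columns of X) and
    their regressors (columns of R); admit a point only if it raises the
    minimum singular value of X, as a stand-in for CL-HSMEM selection."""

    def __init__(self, dim, p_max):
        self.p_max = p_max
        self.X = np.zeros((dim, 0))   # recorded points
        self.R = np.zeros((dim, 0))   # recorded regressors

    def _min_sv(self, X):
        return 0.0 if X.shape[1] == 0 else np.linalg.svd(X, compute_uv=False)[-1]

    def try_add(self, point, regressor):
        """Admit (point, regressor) if the stack is not full, or if replacing
        some column increases the minimum singular value of X."""
        p, r = point.reshape(-1, 1), regressor.reshape(-1, 1)
        if self.X.shape[1] < self.p_max:
            self.X, self.R = np.hstack([self.X, p]), np.hstack([self.R, r])
            return True
        base = self._min_sv(self.X)
        for k in range(self.p_max):
            Xk = self.X.copy(); Xk[:, [k]] = p
            if self._min_sv(Xk) > base:
                self.X[:, [k]], self.R[:, [k]] = p, r
                return True
        return False

    def shock(self):
        """Purge all recorded data for eventual re-population."""
        dim = self.X.shape[0]
        self.X, self.R = np.zeros((dim, 0)), np.zeros((dim, 0))

    def full_rank(self):
        return np.linalg.matrix_rank(self.X) == self.X.shape[0]
```

A shock here simply empties the stack, matching the purpose, though not necessarily the mechanics, of Algorithms 1 through 3.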
Theorem 5 Let $D \subset \mathbb{R}^n \times \Phi_{1,W} \times \Phi_{1,B} \subset \mathbb{R}^{n+m}$ be a domain that contains the origin, where m is the dimension of $[\,\mathrm{vec}(\tilde{W})^T\ \mathrm{vec}(\tilde{B})^T\,]^T$; let $\nu > 0 \in \mathbb{R}$ be a constant; and let $V : D \to \mathbb{R}$ be a continuously differentiable function such that $\alpha_1(\|\zeta\|) \leq V(\zeta) \leq \alpha_2(\|\zeta\|)$ and the derivative of $V(\zeta)$ along the trajectories of the system satisfies $\dot{V}(\zeta) \leq -M(\zeta)\ \ \forall\ \nu > \|\zeta\| \geq \mu > 0,\ \forall\ \zeta \in D$, where $\alpha_1$ and $\alpha_2$ are positive, increasing functions whose limit as their argument goes to $\infty$ is $\infty$, and $M(\zeta)$ is a continuous, positive definite function. Assume the history stack for $\hat{B}$ is full rank. Additionally, let r(t) be such that the input is exciting over a finite interval $(0, t_s + \delta t)$, so that by $T_1 = t_s + \delta t$, 2n linearly independent data points are selected by the CL-HSMEM routine, where 2n is full rank for the W history stack, $X_W$. Then the closed loop system described by (2.1), (2.2), (2.4), (2.44), and (2.43), using a shocking method from section 3.2 and Theorem 4, is ultimately bounded for $t > T_1$, the time when $X_W$ becomes full rank after the first shock occurred.

Proof. Beginning with the $\hat{B}$ history stack full rank implies that the system is bounded from Theorem 4. Since 2n linearly independent points were stored in $X_W$, the CL-HSMEM routine from [10] found sufficiently exciting input, or rich reference input per Boyd and Sastry in [2], from the time $t_s$ when the history stack was shocked to $T_1$. The system continues to be described by the Lyapunov candidate from (4.10), restated here for ease of reference:

$$V(\zeta) = \tfrac12 e^T P e + \tfrac12\mathrm{tr}\{\tilde{W}^T \Gamma_W^{-1} \tilde{W}\} + \tfrac12\mathrm{tr}\{\tilde{B}^T \Gamma_B^{-1} \tilde{B}\}. \tag{4.28}$$

The positive, increasing functions that bound (4.28) above and below follow from (4.11), so $\alpha_1$ and $\alpha_2$ are defined as

$$\alpha_1(\zeta) = \tfrac12\min\{\lambda_{\min}(P),\ \lambda_{\min}(\Gamma_w^{-1}),\ \lambda_{\min}(\Gamma_B^{-1})\}\,\|\zeta\|^2 \tag{4.29}$$
$$\alpha_2(\zeta) = \tfrac12\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_w^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}\,\|\zeta\|^2 \tag{4.30}$$

where both $\alpha_1$ and $\alpha_2 \to \infty$ as $\|\zeta\| \to \infty$. The Lyapunov candidate derivative is (4.13), reproduced here:

$$\dot{V} = -e^T\frac{Q}{2}e - \mathrm{tr}\{\tilde{W}^T \Omega_\sigma \tilde{W}\} - \mathrm{tr}\left\{\tilde{W}^T\begin{bmatrix}\sum_{i=1}^{p_{\max}} x_i\Delta_{K,i}^T \\[2pt] \sum_{j=1}^{p_{\max}} r_j\Delta_{K_r,j}^T\end{bmatrix}\right\} - \mathrm{tr}\Big\{\tilde{B}\Big(uu^T + \sum_{k=1}^{p_{\max}} u_k u_k^T\Big)\tilde{B}^T\Big\} - \tfrac12\big(\sigma^T\tilde{W}\tilde{B}^T P e + e^T P\tilde{B}\tilde{W}^T\sigma\big) \tag{4.31}$$

which can be more narrowly bounded in this stage of operation. Let $\epsilon_{t_s}$ be the constant $\epsilon_{t_s} = \|\tilde{B}(t_s)\|$. Then $\|\tilde{B}\| < \epsilon_{t_s} < b_{\max}$ from the projection operator, due to Lemma 1. Since Theorem 4 assumed that the ideal weights and parameters were within the bounds of the projection operators, concurrent learning is driving $\tilde{W}$ toward zero with some bias; that bias has been reduced by shocking $X_W$. Thus (4.31) may be bounded above, neglecting some of the negative definite terms, by

$$\dot{V} \leq -\frac{c_Q}{2}\|e\|^2 - c_{\Omega_\sigma}\|\tilde{W}\|^2 + c_P\epsilon_{t_s}\|\tilde{W}\|\|e\|^2 + c_P c_{rm}\epsilon_{t_s}\|\tilde{W}\|\|e\| + c_{err}\|\tilde{W}\|. \tag{4.32}$$

Let $\theta_1$ and $\theta_2$ be constants such that $\theta_1, \theta_2 \in (0,1)$ and $\theta_1 + \theta_2 < 1$. Part of $\frac{c_Q}{2}\|e\|^2$ and part of $c_{\Omega_\sigma}\|\tilde{W}\|^2$ are used to dominate $c_P\epsilon_{t_s}\|\tilde{W}\|\|e\|^2$ over a certain subset of $e$ and $\tilde{W}$; another fraction of $c_{\Omega_\sigma}\|\tilde{W}\|^2$ is used to dominate the $c_{err}\|\tilde{W}\|$ term. Using $\theta_1$ and $\theta_2$, (4.32) is rewritten as

$$\dot{V} \leq -(1-\theta_1)\frac{c_Q}{2}\|e\|^2 - (1-\theta_1-\theta_2)c_{\Omega_\sigma}\|\tilde{W}\|^2 + c_P c_{rm}\epsilon_{t_s}\|\tilde{W}\|\|e\| - \left[\theta_1\left(\frac{c_Q}{2}\|e\|^2 + c_{\Omega_\sigma}\|\tilde{W}\|^2\right) - c_P\epsilon_{t_s}\|\tilde{W}\|\|e\|^2\right] - \left[\theta_2 c_{\Omega_\sigma}\|\tilde{W}\|^2 - c_{err}\|\tilde{W}\|\right]. \tag{4.33}$$
To find the values where $c_P\epsilon_{t_s}\|\tilde{W}\|\|e\|^2$ is dominated, look at just the terms involved:

$$0 > -\theta_1\left(\frac{c_Q}{2}\|e\|^2 + c_{\Omega_\sigma}\|\tilde{W}\|^2\right) + c_P\epsilon_{t_s}\|\tilde{W}\|\|e\|^2 \tag{4.34}$$
$$\theta_1\left(\frac{c_Q}{2}\|e\|^2 + c_{\Omega_\sigma}\|\tilde{W}\|^2\right) > c_P\epsilon_{t_s}\|\tilde{W}\|\|e\|^2. \tag{4.35}$$

In the worst case, the normed variables become equal, like one variable carrying all the combined exponents, and the inequality reduces to

$$\theta_1\frac{c_Q}{2}\|z\|^2 + \theta_1 c_{\Omega_\sigma}\|z\|^2 > c_P\epsilon_{t_s}\|z\|^3 \tag{4.36}$$
$$\theta_1\left(\frac{c_Q}{2} + c_{\Omega_\sigma}\right)\|z\|^2 > c_P\epsilon_{t_s}\|z\|^3 \tag{4.37}$$
$$\frac{\theta_1}{c_P\epsilon_{t_s}}\left(\frac{c_Q}{2} + c_{\Omega_\sigma}\right) > \|z\| \tag{4.38}$$

where z is the pseudo-variable that combines the exponents. Then, for $\|e\|$ and $\|\tilde{W}\|$ less than $\frac{\theta_1}{c_P\epsilon_{t_s}}\left(\frac{c_Q}{2} + c_{\Omega_\sigma}\right) = \nu$, the $c_P\epsilon_{t_s}\|\tilde{W}\|\|e\|^2$ term is dominated and may be neglected from the inequality. Following the same progression for the $c_{err}$ term obtains

$$0 > -\theta_2 c_{\Omega_\sigma}\|\tilde{W}\|^2 + c_{err}\|\tilde{W}\| \tag{4.39}$$
$$\theta_2 c_{\Omega_\sigma}\|\tilde{W}\|^2 > c_{err}\|\tilde{W}\| \tag{4.40}$$
$$\|\tilde{W}\| > \frac{c_{err}}{\theta_2 c_{\Omega_\sigma}} = \mu. \tag{4.41}$$

It may be noted from (4.14) that the $c_{err}$ term itself is proportional to $\epsilon_{t_s}$. Since $t > t_s$, the errors collected are relatively small because $\|\tilde{B}\|$ is small, and since $X_W$ is assumed full rank, the value of $c_{\Omega_\sigma}$ is increased at every opportunity, so this bound will be near zero. Then the positive definite function, M, is

$$M(\zeta) = (1-\theta_1)\frac{c_Q}{2}\left[\|e\| - \frac{c_P c_{rm}\epsilon_{t_s}}{(1-\theta_1)c_Q}\|\tilde{W}\|\right]^2 + \left[(1-\theta_1-\theta_2)c_{\Omega_\sigma} - \frac{(c_P c_{rm}\epsilon_{t_s})^2}{2(1-\theta_1)c_Q}\right]\|\tilde{W}\|^2 \tag{4.42}$$

provided $(1-\theta_1-\theta_2)c_{\Omega_\sigma} - \frac{(c_P c_{rm}\epsilon_{t_s})^2}{2(1-\theta_1)c_Q} \geq 0$. This is not a difficult condition to meet. Note that $c_{\Omega_\sigma}$ enters the positive term, since $\theta_1 + \theta_2 < 1$ by definition, and $c_{\Omega_\sigma}$ comes from the W concurrent learning history stack, which the CL-HSMEM routine is increasing at every opportunity, so it is large once $X_W$ is full rank. For the negative term, note that $\epsilon_{t_s}$ appears squared in the numerator, and the other elements are chosen directly or indirectly by the designer. So the bounding function is

$$\dot{V}(\zeta) \leq -M(\zeta) \qquad \forall\ \|\zeta\|\ \text{with}\ \ \frac{\theta_1}{c_P\epsilon_{t_s}}\left(\frac{c_Q}{2} + c_{\Omega_\sigma}\right) > \|\zeta\| > \frac{c_{err}}{\theta_2 c_{\Omega_\sigma}} \tag{4.43}$$

which implies that there are constants $\psi_1, \psi_2$ such that $\psi_2 = \sup_{\mu < \|\zeta\| < \nu} V(\zeta)$ and $\psi_1 = \inf_{\mu < \|\zeta\| < \nu} V(\zeta)$. Then there is a positively invariant set, $\Psi = \{\zeta : \psi_1 < V(\zeta) < \psi_2\}$, and in that set the Lyapunov candidate derivative is negative, so the Lyapunov candidate will decrease. Thus, following Khalil's ultimate bound discussion in [16] and solving $\alpha_1^{-1}(\alpha_2(\mu))$ for the stated $\alpha_1$ and $\alpha_2$, the ultimate bound is

$$\mathrm{ultbound} = \sqrt{\frac{\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_w^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}}{\min\{\lambda_{\min}(P),\ \lambda_{\min}(\Gamma_w^{-1}),\ \lambda_{\min}(\Gamma_B^{-1})\}}} \cdot \frac{c_{err}}{\theta_2\, c_{\Omega_\sigma}}. \tag{4.44}$$

Remark 4.4 If the shocking method selected allows for multiple shocks, then each time the history stack for W is shocked, the value of $\epsilon_{t_s}$ could be updated, and since more time has passed, the value of $\|\tilde{B}\|$ will have decreased. Since $c_{err}$ depends on $\|\tilde{B}\|$, the ultimate bound may become arbitrarily small.

Remark 4.5 If $(1-\theta_1-\theta_2)c_{\Omega_\sigma} - \frac{(c_P c_{rm}\epsilon_{t_s})^2}{2(1-\theta_1)c_Q} < 0$, the system is still bounded by the previous theorem.
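As a quick numeric illustration of (4.44), the bound can be evaluated directly; a sketch, with matrices and constants that are made up for the example rather than taken from the thesis's simulations:

```python
import numpy as np

def ultimate_bound(P, Gw, GB, c_err, theta2, c_omega):
    """Evaluate the local ultimate bound (4.44):
    sqrt(max-eig / min-eig ratio) * c_err / (theta2 * c_omega)."""
    mats = (P, np.linalg.inv(Gw), np.linalg.inv(GB))
    top = max(np.linalg.eigvalsh(M).max() for M in mats)
    bot = min(np.linalg.eigvalsh(M).min() for M in mats)
    return np.sqrt(top / bot) * c_err / (theta2 * c_omega)

# Illustrative values only: P as if from a Lyapunov solve, scalar learning rates.
P = np.array([[2.0, 0.5], [0.5, 1.5]])
Gw = 10.0 * np.eye(2)          # Gamma_W
GB = 5.0 * np.eye(2)           # Gamma_B
print(ultimate_bound(P, Gw, GB, c_err=0.1, theta2=0.4, c_omega=3.0))
```

Per Remark 4.4, re-shocking reduces $c_{err}$ through $\epsilon_{t_s}$, which shrinks this bound.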
4.6 Rate of Convergence

Inside the set $\Psi$, defined in the previous section, the Lyapunov candidate is positive while its derivative is negative. This fulfills the requirements for local asymptotic stability. Yet, due to noise in the measurements, the zero solution may not be attainable, though exponential convergence into a set may be. Toward that end, the definition of exponential pth ultimate boundedness from [12] is given.

Definition 8 Let x(t) be a solution to the nonlinear system $\dot{x}(t) = f(x(t))$ with $x(0) = x_0$. Then x(t) is said to be exponentially pth ultimately bounded if $\|x(t)\|_p \leq \alpha\|x(0)\|_p e^{-ct} + k$ for some positive constants $\alpha$, c, and k.

So, if the system is within the set $\Psi$ and the reference input continues to be rich, the convergence into the interior of $\Psi$ is exponential in the sense of Definition 8. The following theorem shows this.

Theorem 6 If, in addition to the system meeting the conditions of Theorem 5, r(t) is such that the input is exciting over a finite interval $(0, t_s + \delta t)$, so that by $T_1 = t_s + \delta t$, m linearly independent data points are selected by the CL-HSMEM routine, then the solution $(e, \tilde{W}, \tilde{B})$ of the closed loop system of (2.21) and (2.43) is exponentially ultimately bounded for $t \geq T_1$.

Proof. By meeting the criteria for Theorem 5, $X_W$, the history stack for W, is full rank. Restating (4.11), where $\zeta = [\,e^T\ \mathrm{vec}(\tilde{W})^T\ \mathrm{vec}(\tilde{B})^T\,]^T$, returns

$$\tfrac12\min\{\lambda_{\min}(P),\ \lambda_{\min}(\Gamma_W^{-1}),\ \lambda_{\min}(\Gamma_B^{-1})\}\|\zeta\|^2 \leq V(\zeta) \leq \tfrac12\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}\|\zeta\|^2. \tag{4.45}$$

Dividing (4.11) by $\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}$ gives a useful inequality for $\|\zeta\|^2$:

$$\frac{\min\{\lambda_{\min}(P),\ \lambda_{\min}(\Gamma_W^{-1}),\ \lambda_{\min}(\Gamma_B^{-1})\}}{\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}}\,\|\zeta\|^2 \;\leq\; \frac{2V(\zeta)}{\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}} \;\leq\; \|\zeta\|^2. \tag{4.46}$$

Let $c_4 = (1-\theta_1-\theta_2)c_{\Omega_\sigma} - \frac{(c_P c_{rm}\epsilon_{t_s})^2}{2(1-\theta_1)c_Q}$. Rearranging (4.43) yields

$$\dot{V} \leq -\tfrac12\min\{(1-\theta_1)c_Q,\ c_P c_{rm}\epsilon_{t_s},\ 2c_4\}\,\|\zeta\|^2 + \frac{c_P c_{rm}\epsilon_{t_s}}{(1-\theta_1)c_Q}\,\|\tilde{W}\|\|e\| \tag{4.47}$$

and applying (4.46) to (4.47) obtains

$$\dot{V}(\zeta) \leq -\frac{\min\{(1-\theta_1)c_Q,\ c_P c_{rm}\epsilon_{t_s},\ 2c_4\}}{\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}}\,V(\zeta) + \frac{c_P c_{rm}\epsilon_{t_s}}{(1-\theta_1)c_Q}\,\|\tilde{W}\|\|e\|. \tag{4.48}$$

Thus, by meeting Theorem 5, the system is ultimately bounded. Now let

$$c = \frac{\min\{(1-\theta_1)c_Q,\ c_P c_{rm}\epsilon_{t_s},\ 2c_4\}}{\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}} \tag{4.49}$$

and note that, due to the boundedness, the quantity

$$k = \sup_{t > T_1} \int_{T_1}^{t} e^{-c(t-\tau)}\, \frac{c_P c_{rm}\epsilon_{t_s}}{(1-\theta_1)c_Q}\, \|\tilde{W}(\tau)\|\|e(\tau)\|\, d\tau \tag{4.50}$$

exists. Let $\bar{k} = \frac{c_P c_{rm}\epsilon_{t_s}}{(1-\theta_1)c_Q}\|\tilde{W}(t)\|\|e(t)\|$; then, from (4.48), the derivative is bounded by

$$\dot{V}(\zeta) \leq -c\,V(\zeta) + \bar{k}. \tag{4.51}$$

Since $V(\zeta(T_1)) \leq \tfrac12\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}\|\zeta(T_1)\|^2 = \alpha_2(\zeta(T_1))$ and $\alpha_1(\zeta) = \tfrac12\min\{\lambda_{\min}(P),\ \lambda_{\min}(\Gamma_W^{-1}),\ \lambda_{\min}(\Gamma_B^{-1})\}\|\zeta\|^2 \leq V(\zeta)$, then

$$\|\zeta\|^2 \leq \alpha_2(\zeta(T_1))e^{-ct} + k \tag{4.52}$$
$$\|\zeta\|^2 \leq \tfrac12\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}\|\zeta(T_1)\|^2 e^{-ct} + k. \tag{4.53}$$

Therefore, for $t \geq T_1$, the solution $(e, \tilde{W}, \tilde{B})$ of the closed loop system of (2.6), (2.43), and (2.44) is exponentially ultimately bounded in the sense of Definition 8.

Remark 4.6 If the shocking method selected allows multiple shocks, then with each shock the value of $\epsilon_{t_s}$ could be updated to its new (smaller) value, and the $c_{err}$ term would be proportionally smaller as well, because of its relationship to $\epsilon_{t_s}$ and because $\|\tilde{B}\|$ converges exponentially. After each shock, new limits would be computed, and the solution would be exponentially ultimately bounded into another subset of the previous set.

Remark 4.7 If the measurement of $\dot{x}$ is without noise, the zero solution of $(e, \tilde{W}, \tilde{B})$ could become a possibility. At that point, an exponential solution converging to an arbitrarily small value could be calculated and applied. Only then would the original exponential convergence rate of [10] be recovered.

4.7 Summary

In this chapter, the stability of concurrent learning used in the presence of uncertain input allocation was studied. Five theorems were presented to step with the system through its operating time, from initialization to after both history stacks are full rank.
The solution to the closed loop system is found to be bounded and, under some additional conditions, ultimately bounded and even exponentially ultimately bounded in the sense of Definition 8.

CHAPTER 5

Dissolved Control Allocation Matrix

5.1 Introduction

Having explored the uncertain control allocation matrix case, Assumption 3 is now relaxed. Instead of the B matrix being uncertain, other than in dimension, let it be dissolved into two matrices multiplied together.

Assumption 4 Assume $\Lambda$ is a diagonal matrix whose magnitude is uncertain and D is known, such that

$$B = D\Lambda. \tag{5.1}$$

Since D is known, it gives insight into the dimension of B, but relaxes Assumption 3. The matching condition does not change, but is shown for clarity.

Assumption 5 (Matching Condition) Let $W^* \in \mathbb{R}^{2n\times n}$ be a constant matrix such that

$$W^* = \begin{bmatrix} K^* \\ K_r^* \end{bmatrix} \tag{5.2}$$

where $K^*$ and $K_r^*$ fulfill

$$A_{rm} = A + D\Lambda K^{*T} \tag{5.3}$$
$$B_{rm} = D\Lambda K_r^{*T}. \tag{5.4}$$

Under Assumption 5, $W^*$ includes the uncertain $\Lambda^{-1}$. Somanath used a similar formulation in [29], but he used the reference model input allocation matrix in place of the D used here. Once again, since $\Lambda$ is uncertain, estimating it becomes necessary. The estimate is $\hat\Lambda$, and the vanishing term is

$$\tilde\Lambda = \hat\Lambda - \Lambda. \tag{5.5}$$

Using (2.1) for the plant, (2.2) for the reference model, and Assumption 5, the matching conditions mentioned earlier in this chapter, the state derivative is

$$\dot{x} = A_{rm} x + B_{rm} r + D\Lambda\tilde{W}^T\sigma \tag{5.6}$$

or, in full matrix form with the substitution $\Lambda = \hat\Lambda - \tilde\Lambda$,

$$\dot{x} = [\,A_{rm}\ \ B_{rm}\,]\,\sigma + D\hat\Lambda\tilde{W}^T\sigma - D\tilde\Lambda\tilde{W}^T\sigma \tag{5.7}$$

where $\tilde{W}$ is from (2.32) and $\sigma$ is from (2.34). W is the working variable, and the input is defined as $u = W^T\sigma$. Subtracting (2.2) from (5.6), the state tracking error derivative is obtained:

$$\dot{e} = A_{rm} e + D\hat\Lambda\tilde{W}^T\sigma - D\tilde\Lambda\tilde{W}^T\sigma. \tag{5.8}$$

Then the update law of the adaptive weights is selected to follow the normal MRAC update, except that $D\hat\Lambda$ is used in place of B:

$$\dot{W} = \mathrm{Proj}\left(W,\ -\Gamma_W\,\sigma e^T P D\hat\Lambda - \Gamma_W \sum_{i}^{p_{\max}} \sigma_i\,\hat\epsilon_{W,i}^T\right). \tag{5.9}$$

The update law for $\hat\Lambda$ uses the regressor in (5.10), shown below:

$$\epsilon_\Lambda = \hat\Lambda u - D^\dagger(\dot{x} - Ax) = (\hat\Lambda - \Lambda)u = \tilde\Lambda u \tag{5.10}$$

$$\dot{\hat\Lambda} = \mathrm{Proj}\left(\hat\Lambda,\ -\Gamma_\Lambda\,\epsilon_\Lambda u^T - \Gamma_\Lambda \sum_{j}^{p_{\max}} \epsilon_{\Lambda,j}\, u_j^T\right) \tag{5.11}$$

where $\Gamma_\Lambda$ is the learning rate. Since the structure of $\Lambda$ is known per Assumption 4 to be diagonal, only the main diagonal elements of the above derivative are used. As per section 2.6, the $\mathrm{Proj}(\cdot,\cdot)$ operator is equivalent to $\mathrm{Proj}(\cdot,\cdot, f(\cdot, \theta_d, \theta_b))$, where $\theta_d$ is the selected normed length across which the projection operator is to work (the width of $\Phi_T$), and $\theta_b$ is the normed length at which the projection operator begins to work (the boundary of $\Phi_0$ from section 2.6).
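A minimal sketch of the diagonal $\hat\Lambda$ update (5.10)-(5.11), assuming measurable $\dot{x}$, forward-Euler integration, and a crude radial scaling in place of the smooth projection operator of [19]; the function name, step size, and bound are illustrative, not the thesis's implementation:

```python
import numpy as np

def lambda_hat_step(Lam_hat, u, xdot, x, A, D, stack_u, stack_eps,
                    Gamma=1.0, lam_max=5.0, dt=0.001):
    """One Euler step of the diagonal Lambda-hat law (5.10)-(5.11).
    stack_u / stack_eps hold recorded inputs u_j and regressors eps_Lambda,j."""
    # Instantaneous regressor (5.10): eps = Lam_hat u - D^+(xdot - A x) = Lam_tilde u
    eps = Lam_hat @ u - np.linalg.pinv(D) @ (xdot - A @ x)
    # Raw derivative (5.11): instantaneous term plus the concurrent-learning sum
    dLam = -Gamma * np.outer(eps, u)
    for uj, epsj in zip(stack_u, stack_eps):
        dLam -= Gamma * np.outer(epsj, uj)
    # Per Assumption 4, Lambda is diagonal: keep only the main diagonal
    dLam = np.diag(np.diag(dLam))
    Lam_new = Lam_hat + dt * dLam
    # Stand-in for the projection operator: scale back at the outer bound
    nrm = np.linalg.norm(Lam_new)
    if nrm > lam_max:
        Lam_new *= lam_max / nrm
    return Lam_new
```

The radial scaling only mimics the outer saturation of the true projection operator; the smooth blending across $\Phi_T$ described above is omitted for brevity.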
5.2 Stability

The stability of the system will be analyzed by breaking the duration into separate intervals. The first theorem covers from the initialization time until the controller has two full rank history stacks. The second theorem establishes local ultimate boundedness, and the third theorem in this chapter establishes exponential ultimate boundedness in the sense of Definition 8.

5.2.1 Bounded Operation

Theorem 7 Consider the plant of (2.1), the reference model of (2.2), the control law of (2.4), Assumptions 2, 4, and 5, and the adaptive laws (5.9) and (5.11). Assume the ideal values, $W^*$, and actual system parameters, $\Lambda$, are within $\Phi_{0,W}$ and $\Phi_{0,\Lambda}$, respectively, per (2.39), for the projection operators used, and that $W(0)$ and $\hat\Lambda(0)$ are within those sets as well. Let the norm bounds for the projection operator be chosen such that $\lambda_\Lambda \geq 2\|P\|$, where P is selected with the Lyapunov equation, and $w_{\max} \geq 2\|W^*\| > 2\|D\|$. Let $X_\Lambda = [u_1, u_2, \ldots, u_{p_{\max}}] \in \mathbb{R}^{n\times p_{\max}}$ be the history stack matrix containing recorded inputs, and let $R_\Lambda = [\epsilon_{\Lambda,1}, \epsilon_{\Lambda,2}, \ldots, \epsilon_{\Lambda,p_{\max}}] \in \mathbb{R}^{n\times p_{\max}}$ be the history stack matrix of recorded regressors for $\hat\Lambda$. Assume $X_\Lambda$ and $R_\Lambda$ are 0 at $t_0$ and are updated by the CL-HSMEM routine during periods of rich reference input over an interval $[t_0, t_0 + \delta t]$, allowing for the selection of n points making $X_\Lambda$ full rank, and the selection of 2n points over $[t_s, t_s + \delta t]$ making $X_W$ full rank after time $t_s$, when the first history stack shock occurs due to a method selected from section 3.2. Then the zero solution of $[\,e,\ \tilde{W},\ \tilde\Lambda\,]$ for the closed loop system is bounded.

Proof. Initially, the system is within an allowable set and, due to the projection operators, will stay there. Neither history stack is full rank, so the CL-HSMEM routine will select data points as they come along until $X_\Lambda$ becomes full rank at a time $t_f$. Thus, the proof that the system is bounded from $t_0$ to $t_f$ follows the same logic as that of Theorem 2. Then, from time $t_f$ to $t_s$, $X_\Lambda$ is full rank. Concurrent learning is driving $\tilde\Lambda \to 0$ per Theorem 1, similar to Lemma 1, so $\|\tilde\Lambda\|$ is approaching and will attain values less than $\epsilon_{t_s}$. That happens at time $t_s$, and the shocking method will remove the old data from the history stack $X_W$. To show boundedness, select a quadratic Lyapunov candidate:

$$V(\zeta) = \tfrac12 e^T P e + \tfrac12\mathrm{tr}\{\tilde{W}^T \Gamma_W^{-1} \tilde{W}\} + \tfrac12\mathrm{tr}\{\tilde\Lambda^T \Gamma_\Lambda^{-1} \tilde\Lambda\}. \tag{5.12}$$

The Lyapunov candidate derivative follows the expansion of Appendix A.4.3 with $D\hat\Lambda$ in place of $\hat{B}$; substituting the update laws (5.9) and (5.11), combining like terms, and canceling where possible ((5.13)-(5.15)) gives

$$\dot{V}(\zeta) = -e^T\frac{Q}{2}e - \tfrac12\big(\sigma^T \tilde{W} \tilde\Lambda^T D^T P e + e^T P D \tilde\Lambda \tilde{W}^T \sigma\big) - \mathrm{tr}\Big\{\tilde{W}^T \sum_{i}^{p_{\max}}\sigma_i\sigma_i^T\, \tilde{W}\Big\} - \mathrm{tr}\Big\{\tilde{W}^T \sum_{i}^{p_{\max}}\sigma_i\Delta_{W,i}^T\Big\} - \mathrm{tr}\Big\{\tilde\Lambda u u^T \tilde\Lambda^T + \tilde\Lambda \sum_{j}^{p_{\max}} u_j \epsilon_{\Lambda,j}^T\Big\} \tag{5.16}$$

where $\hat\epsilon_W = \epsilon_W + \Delta_W$, from (2.35). Remembering that $\sigma = [\,x\ r\,]^T$, it can be restated as $\sigma = [\,(e + x_{rm})\ r\,]^T$. Let $c_{rm}$ be a constant greater than the upper bound of $\|x_{rm}\|$, which is bounded because it is driven by r, a bounded input, and the reference model was chosen to be Hurwitz. Let $c_{\Omega_\sigma} = \lambda_{\min}\big(\sum_{i}^{p_{\max}}\sigma_i\sigma_i^T\big)$, let $c_{err} > \|\sum_{j}^{p_{\max}}\sigma_j\Delta_{W,j}^T\|$, and let $c_r$ be a constant greater than the maximum norm of r. Then $\sigma$ can be bounded as

$$\|\sigma\| \leq \left\|\begin{bmatrix} e \\ 0\end{bmatrix}\right\| + \left\|\begin{bmatrix} c_{rm} \\ c_r\end{bmatrix}\right\| \tag{5.17}$$

and (5.16) can be bounded using the triangle inequality:

$$\dot{V}(\zeta) \leq -c_q\|e\|^2 + \|\tilde\Lambda\|\|D\|\|P\|\|\tilde{W}\|\|e\|\|\sigma\| - c_{\Omega_\sigma}\|\tilde{W}\|^2 - \|u\|^2\|\tilde\Lambda\|^2 - \sum_{j}^{p_{\max}}\|\tilde\Lambda u_j\|^2 + c_{err}\|\tilde{W}\|. \tag{5.18}$$

Define $c_q = \lambda_{\min}(Q)$; let $c_u$ be a positive constant with $c_u \leq \lambda_{\min}\big(\sum_{j}^{p_{\max}} u_j u_j^T\big)$; let $c_d = \|D\|$ and $c_P = \|P\|$; and let $c_R$ be a constant greater than the norm of $[\,c_{rm}\ c_r\,]^T$. Then (5.18), with the input term expanded using $u = W^T\sigma$ and the bound (5.17), becomes

$$\dot{V}(\zeta) \leq -c_q\|e\|^2 + c_d c_P\|\tilde\Lambda\|\|\tilde{W}\|\|e\|^2 + c_R c_d c_P\|\tilde\Lambda\|\|\tilde{W}\|\|e\| - (\|e\| - c_R)^2(\|\tilde{W}\| - c_K)^2\|\tilde\Lambda\|^2 - c_{\Omega_\sigma}\|\tilde{W}\|^2 - c_u\|\tilde\Lambda\|^2 + 8 c_R c_K\|\tilde\Lambda\|^2 + c_{err}\|\tilde{W}\|. \tag{5.19}$$
Since projection operators are being used with $\dot{\hat\Lambda}$ and $\dot{W}$, their upper bounds are known. Substituting these bounds into (5.19) obtains

$$\dot{V}(\zeta) \leq -c_q\|e\|^2 + c_d c_P \lambda_\Lambda w_{\max}\|e\|^2 + c_R c_d c_P \lambda_\Lambda w_{\max}\|e\| - (w_{\max} - c_K)^2\lambda_\Lambda^2\|e\|^2 + 2 c_R (w_{\max} - c_K)^2\lambda_\Lambda^2\|e\| - c_R^2(w_{\max} - c_K)^2\lambda_\Lambda^2 - c_{\Omega_\sigma}w_{\max}^2 - c_u\lambda_\Lambda^2 + 8 c_R c_K \lambda_\Lambda^2 + c_{err}w_{\max}. \tag{5.20}$$

Neglecting the negative definite constant terms in (5.20) leaves

$$\dot{V}(\zeta) \leq -c_q\|e\|^2 + c_d c_P \lambda_\Lambda w_{\max}\|e\|^2 - (w_{\max} - c_K)^2\lambda_\Lambda^2\|e\|^2 + c_R c_d c_P \lambda_\Lambda w_{\max}\|e\| + 2 c_R (w_{\max} - c_K)^2\lambda_\Lambda^2\|e\| + 8 c_R c_K \lambda_\Lambda^2 + c_{err}w_{\max} \tag{5.21}$$

which is quadratic in $\|e\|$. To find the set, $\Omega_y$, where $\dot{V}$ is negative, set the left-hand side of (5.21) to 0 and solve for $\|e\|$:

$$c_4 = c_q + (w_{\max} - c_K)^2\lambda_\Lambda^2 - c_d c_P \lambda_\Lambda w_{\max}$$
$$c_5 = c_R c_d c_P \lambda_\Lambda w_{\max} + 2 c_R (w_{\max} - c_K)^2\lambda_\Lambda^2$$
$$c_6 = 8 c_R c_K \lambda_\Lambda^2 + c_{err} w_{\max}$$
$$\dot{V}(\zeta) \leq -c_4\|e\|^2 + c_5\|e\| + c_6 \tag{5.22}$$
$$\Omega_y = \left\{ e \;:\; \left(\|e\| - \frac{c_5}{2c_4}\right)^2 \leq \frac{c_5^2}{4c_4^2} + \frac{c_6}{c_4} \right\} \tag{5.23}$$

and it may be noted that $c_4$ is positive from the assumed limits in the theorem statement for $w_{\max}$ and $\lambda_\Lambda$, following the process used for $c_1$ in section 4.3. Let

$$\phi = \max_{e \in \Omega_y} V(e, w_{\max}, \lambda_\Lambda) \tag{5.24}$$

be the criterion for another set, $\Omega_\phi$, such that

$$\Omega_\phi = \{ e : V(e, w_{\max}, \lambda_\Lambda) \leq \phi \}; \tag{5.25}$$

then $\Omega_\phi$ is a positively invariant set with respect to the Lyapunov candidate, and $\Omega_y \subset \Omega_\phi$. $\dot{V}(\zeta)$ is negative outside $\Omega_\phi$ while $V(\zeta)$ is positive, so solutions starting outside $\Omega_\phi$ will enter it, and those beginning within $\Omega_\phi$ will not leave it, since $\dot{V}(\zeta)$ is negative at the boundary. Thus the zero solution of the system is bounded.

Remark 5.1 Note that knowledge of the signs of the elements of $\Lambda$ is not necessary for the boundedness condition. This sort of formulation has been used by others to model loss of actuators (see [11, 17, 31]), so negativeness does not arise in that application.

Remark 5.2 Another difference from the existing works is that, with the use of concurrent learning, D does not have to be $B_{rm}$. This allows a broader selection of reference models.

5.2.2 Locally Ultimately Bounded

After $t_s + \delta t$, $X_W$ is full rank and the system is bounded. If the system is examined in this state, a narrower bound may be found.

Theorem 8 Let $D \subset \mathbb{R}^n \times \Phi_{1,W} \times \Phi_{1,\Lambda} \subset \mathbb{R}^{n+m}$ be a domain that contains the origin, where m is the dimension of $[\,\mathrm{vec}(\tilde{W})^T\ \mathrm{vec}(\tilde\Lambda)^T\,]^T$ and n is the number of states; let constants $\mu, \nu \in \mathbb{R}$ be greater than 0; and let $V : D \to \mathbb{R}$ be a continuously differentiable function such that $\alpha_1(\|\zeta\|) \leq V(\zeta) \leq \alpha_2(\|\zeta\|)$ and the derivative of $V(\zeta)$ along the trajectories of the system satisfies $\dot{V}(\zeta) \leq -M(\zeta)\ \forall\ \nu > \|\zeta\| \geq \mu > 0,\ \forall\ \zeta \in D$, where $\alpha_1$ and $\alpha_2$ are positive, increasing functions whose limit as their argument goes to $\infty$ is $\infty$, and $M(\zeta)$ is a continuous, positive definite function. Assume the history stack for $\hat\Lambda$ is full rank. Additionally, let r(t) be such that the input is exciting over a finite interval $(0, t_s + \delta t)$, so that by $T_2 = t_s + \delta t$, 2n linearly independent data points are selected by the CL-HSMEM routine, where 2n is full rank for the W history stack, $X_W$. Then the closed loop system described by (2.1), (2.2), (2.4), (5.9), and (5.11), using a shocking method from section 3.2 and Theorem 7, is ultimately bounded for $t > T_2$, the time when $X_W$ becomes full rank after the first shock occurred.

Proof. The system fulfilling the criteria of Theorem 7 meets the requirement for $X_W$ to be full rank after $T_2$. Thus, following Theorem 5 with the necessary changes based on Assumption 4, the result is straightforward.

5.2.3 Rate of Convergence

Here, as in Chapter 4, the convergence rate will be analyzed to see whether exponential convergence into a set is feasible.
Theorem 9 If, in addition to the system meeting the conditions of Theorem 8, r(t) is such that the input is exciting over a finite interval $(0, t_s + \delta t)$, so that by $T_2 = t_s + \delta t$, 2n linearly independent data points are selected by the CL-HSMEM routine, then the solution $(e, \tilde{W}, \tilde\Lambda)$ of the closed loop system of (5.8) and (5.9) is exponentially ultimately bounded for $t \geq T_2$.

Proof. From Theorem 8, $X_W$ is full rank, $T_2$ is known, and the system is ultimately bounded. Then, making the necessary changes to align with Assumption 4,

$$\|\zeta\|^2 \leq \tfrac12\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_\Lambda^{-1})\}\,\|\zeta(T_2)\|^2 e^{-ct} + k \tag{5.26}$$

follows, where the constants are defined as in section 4.6. Therefore, for $t \geq T_2$, the solution $[\,e\ \tilde{W}\ \tilde\Lambda\,]$ of the closed loop system of (5.8), (5.9), and (5.11) is exponentially ultimately bounded in the sense of Definition 8.

5.3 Summary

The dissolved B matrix variation on the uncertain input allocation matrix has been presented. The variation is like those used by other authors to model actuator loss, but here the loss does not have to be scaled between [0, 1], nor does the scaling need to be positive. The known segment of the input allocation matrix, D, is also not required to equal the input allocation matrix of the reference model. The solution to the closed loop system is found to be bounded; with certain conditions met, ultimately bounded; and under even more specific circumstances, exponentially ultimately bounded.

CHAPTER 6

Simulations

6.1 Introduction

In this chapter, simulations using the shocking methods and the concurrent learning controller described previously are presented. Both the uncertain input allocation matrix case and the dissolved B case are simulated.

6.2 Uncertain Allocation Matrix

In this section, an example is presented where the controller is placed in a system with a known state matrix but an unknown input allocation matrix (a general case). The system state and input allocation matrices are defined as follows:

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 2 & -3 \end{bmatrix} \qquad B = \begin{bmatrix} -1 & 0.1 & 0.1 \\ 0.1 & 0.9 & 0 \\ 0.5 & 0 & -0.5 \end{bmatrix}$$

The reference model state matrix is chosen to be Hurwitz with an identity input allocation matrix, and the estimate of B is $\hat{B}(0) = I_3$. Note that $\hat{B}(0)$ is significantly different from B, not only in magnitude but also in the signs of the parameters on the main diagonal. The reference input for state $X_1$ is set to 2 for the first 5 seconds. Then the reference for state $X_2$ is set to 2 from 15 to 25 seconds, and the reference for state $X_3$ is set to 2 from 35 to 45 seconds. Then state $X_1$ is set to 1, state $X_2$ to $-1$, and state $X_3$ to 0.5 from 60 to 80 seconds. To show that the system will track reference inputs where state $X_2$ is the derivative of state $X_1$ and state $X_3$ is the derivative of state $X_2$, from 85 to 95 seconds the reference input $R_1$ is a sine wave, with its derivative for $R_2$ and its second derivative for $R_3$. In Figure 6.1, the system states are shown tracking the reference model states.

Figure 6.1: Time History of Reference Model Tracking Using Algorithms 1, 2, and 3. The reference r(t) is shown on its own. Note the tracking is quite good after each algorithm shocks the stacks. A zoomed-in view is shown in Figure 6.2.
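As a companion to this setup, the following is a minimal sketch of the plant and reference-model simulation scaffolding under the matrices above, using forward-Euler integration. The reference schedule is abbreviated, the $A_{rm}$ choice is an illustrative Hurwitz matrix rather than the thesis's, and the adaptive laws are omitted, so this reproduces only the loop structure, not the controller:

```python
import numpy as np

# Plant and initial allocation estimate (entries as given above)
A = np.array([[0., 1., 0.], [0., 0., 1.], [-1., 2., -3.]])
B = np.array([[-1., 0.1, 0.1], [0.1, 0.9, 0.], [0.5, 0., -0.5]])
B_hat = np.eye(3)

# Hurwitz reference model with identity input allocation (illustrative choice)
A_rm = np.array([[-2., 1., 0.], [0., -2., 1.], [0., 0., -2.]])
B_rm = np.eye(3)

def reference(t):
    """Abbreviated version of the step schedule described in the text."""
    if t < 5.0:
        return np.array([2.0, 0.0, 0.0])
    if 15.0 <= t < 25.0:
        return np.array([0.0, 2.0, 0.0])
    return np.zeros(3)

dt, T = 0.001, 30.0
x, x_rm = np.zeros(3), np.zeros(3)
for k in range(int(T / dt)):
    r = reference(k * dt)
    u = np.zeros(3)         # placeholder: CL-MRAC law u = K^T x + K_r^T r goes here
    x += dt * (A @ x + B @ u)
    x_rm += dt * (A_rm @ x_rm + B_rm @ r)
```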
The tracking is quite good after an initial disparity, which is shown in greater detail in Figure 6.2. This is to be expected, since the controller holds an estimate of B at initialization that is vastly different from what B actually is.

Figure 6.2: Time History of Reference Model Tracking Using Algorithms 1, 2, and 3, plotted to show the first 10 seconds. The reference r(t) is shown on its own. The initial variability of the states is shown while $\hat{B}$ converges and each of the methods selects at least a point to shock the history stack. The shocks occurred at the blue '+' for Alg. 1, at the red 'X's for Alg. 2, and at the black triangle for Alg. 3.

Figure 6.3: System Tracking Errors Shown Using Algorithms 1 (solid), 2 (dashed), and 3 (dash-dot). Note that the error reduces quickly once the stacks are shocked. Only the first 20 seconds is shown; the full time scale is shown in Appendix D.

Note that even with the initial disparity, the system states remain bounded. The disparity starts to reduce immediately as each of the algorithms shocks the history stacks. It should be noted that baseline MRAC laws diverged for the presented case of uncertain B matrix, and hence their response is not shown. The reference input is also shown in Figures 6.1 and 6.2. In Figure 6.3, the error between the system states and the reference model states is shown for the three algorithms. The error reduces as the B matrix is identified and the W matrix converges. The full time scale of the system tracking error is shown in Appendix D, Figure D.1, but the initial disparity seems the most important to show.

The convergence of $\hat{B}$ and W depends on the minimum eigenvalue of the respective history stack matrix. A time history of the minimum eigenvalues of the three history stacks is displayed in Figure 6.4, with the different algorithms noted.

Figure 6.4: Minimum Eigenvalues of the $\hat{B}$, K, and $K_r$ History Stacks Using Algorithms 1, 2, and 3. The markers denote when one of the algorithms purged the history stacks.

The markers in Figures 6.4 and 6.5 indicate when the stacks were purged under the different algorithms: a '+' for Algorithm 1, 'X's for Algorithm 2, and a triangle for Algorithm 3. Algorithm 2 purges the stacks many times in the first second and then does so once again at about 78 seconds, which is about the same time that all three state errors converge back to zero in Figure D.1. Notice that the minimum eigenvalues for Algorithm 2, K and $K_r$, drop to zero but begin increasing again as the input becomes exciting (85 sec). Algorithms 1 and 3 purge the stacks only once.
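The minimum-eigenvalue monitoring behind Figure 6.4 amounts to the following small computation; a sketch, where the stack columns are the recorded points:

```python
import numpy as np

def stack_min_eig(X):
    """Minimum eigenvalue of the Gram matrix X X^T of a history stack whose
    columns are recorded data points; zero until the stack spans the space."""
    return float(np.linalg.eigvalsh(X @ X.T).min())

# Example: three recorded 3-vectors that do not yet span R^3
X = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [0., 0., 0.]])
print(stack_min_eig(X))   # 0.0 -> stack not yet full rank
```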
Figure 6.5 indicates the convergence of the $\hat{B}$ matrix to the B matrix under the three algorithms. Note that only the first 30 seconds of the simulation are shown in Figure 6.5; the balance is quite similar to the last 5 seconds shown in the figure.

Figure 6.5: $\hat{B}$ Convergence Using Algorithms 1 (solid), 2 (dashed), and 3 (dash-dot), Ideal Values (dotted). Note that the dashed lines of Alg. 2 arrive faster than the others at the ideal values. Markers are the same as in Figure 6.4.

The time histories of the element values of W converge to their ideal values (the dotted lines) in Figure 6.6, which is split into K and $K_r$ for clarity. The elements have arrived within the first 45 seconds, which concludes the state-by-state excitation of the input.

Figure 6.6: W Convergence Split into K and $K_r$ for Clarity Using Algorithms 1 (solid), 2 (dashed), and 3 (dash-dot), Ideal Values (dotted). Note that the steps in convergence align with steps in the input, while the bounding of growth aligns with shocking the stacks. The dashed line of Alg. 2 peaks at 20.25 among the $K_x$ elements and dips to -14.26 among the $K_r$ elements before converging.

Remark 6.1 It is interesting to note that the bounding of the growth of the W values aligns with each algorithm's time for shocking the history stack, while the steps in W toward the ideal values align with steps in the input. This may be seen in Figure 6.6. All three algorithms have shocked the stacks by about 5 seconds. In the upper plot of the $K_x$ element values, the solid lines for Algorithm 1 grow until about 5 seconds; then the lines converge to their ideal values, with steps aligning with steps in the reference input, which can be seen most clearly in the lower plot of the $K_r$ element values, around 15 and 35 seconds.

Remark 6.2 This simulation was accomplished with a plant that is unstable and has complex eigenvalues.

6.3 Dissolved Input Allocation Matrix

This section describes the relaxed uncertain input allocation matrix assumption, Assumption 4. Usually, this scheme is used to model actuator control loss. The system state and input allocation matrices are the same as in the previous section. The reference model state matrix is chosen to be Hurwitz with an identity input allocation matrix. Per Assumption 4, $B = D\Lambda$ and $\Lambda$ is uncertain. As an example, Figure 6.7 shows the state response when $\Lambda$ is positive definite and concurrent learning is not used; MRAC is able to reduce the state tracking error, similar to [26].

Figure 6.7: Dissolved Input Allocation Matrix, not using concurrent learning. Note that $\Lambda$ must be positive definite in this example; notice the high oscillations in the system states, after which the states track the reference model reasonably well.

While the control method without concurrent learning works, it involves a large amount of oscillation during the adaptation process. Going from the MRAC example of Figure 6.7 back to the relaxed uncertain input allocation matrix case, the D and $\Lambda$ matrices are defined as

$$D = \begin{bmatrix} -0.5 & -0.0333 & 0.05 \\ 0.05 & -0.3 & 0 \\ 0.25 & 0 & -0.25 \end{bmatrix} \tag{6.1}$$

$$\Lambda = \begin{bmatrix} 2 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 2 \end{bmatrix} \tag{6.2}$$

which multiply to equal B, while the controller's estimate of $\Lambda$ is initially $\hat\Lambda(0) = I_3$, where $I_3$ is the identity matrix of rank 3. Note that $\hat\Lambda(0)$ is significantly different from $\Lambda$, not only in magnitude but also in the sign of one parameter on the main diagonal. That negative sign in $\Lambda$ causes at least one of the eigenvectors of D to point in the opposite direction compared to the eigenvectors of B.
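A quick numeric check of the dissolution (6.1)-(6.2) against the B matrix of section 6.2 (entries as given above); a sketch, illustrating both the product and the eigenvector sign effect mentioned in the text:

```python
import numpy as np

D = np.array([[-0.5, -0.0333, 0.05],
              [ 0.05, -0.3,   0.0 ],
              [ 0.25,  0.0,  -0.25]])
Lam = np.diag([2.0, -3.0, 2.0])
B = np.array([[-1.0, 0.1,  0.1],
              [ 0.1, 0.9,  0.0],
              [ 0.5, 0.0, -0.5]])

print(np.allclose(D @ Lam, B, atol=1e-3))   # True: D * Lambda recovers B (to rounding)
# The negative entry of Lambda flips the sign of the corresponding column of D,
# which is why eigenvector directions of D and B can disagree.
```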
Remark 6.3 At this point, the normal adaptive law of MRAC, (2.28) for instance, would fail and cause the weights to diverge, because it uses D, which is known, while the plant uses B, and their eigenvectors are different. In systems where D is in the subspace of B, this does not matter, because their eigenvectors are the same, but that is not the case here.

For this case's simulation, the input is a series of steps of different magnitudes, shown in the bottom plot of Figure 6.8, for the first 55 seconds. Then the reference takes different magnitudes on all three states simultaneously for 20 seconds. Finally, the reference executes a sine waveform for state $X_1$, its derivative for state $X_2$, and its second derivative for state $X_3$ from 85 seconds to 95 seconds; the reference is then zero on all states through the end of the simulation. Figure 6.8 shows that adding concurrent learning to this adaptation process removes the oscillation mentioned for Figure 6.7, and it shows the reference input. Remember that at the beginning, the scaling matrix and the weights matrix are both wrong, so tracking errors are to be expected. Those errors lessen with time, as can be seen in Figure 6.9. Again, as the tracking error approaches zero, non-concurrent-learning MRAC adaptation would slow and effectively stop; with concurrent learning, the parameters and weights continue to be driven to their ideal values, which is shown in Figure 6.10. There, the dashed lines are the ideal values for the weights, and the history stack was shocked at about 9 seconds. This is marked on Figure 6.11 with a '+', along with the display of the convergence of $\hat\Lambda$ to its ideal values. The full-length time history of $\hat\Lambda$ is shown in the appendix on page 104.

Figure 6.8: Dissolved Input Allocation Matrix, using concurrent learning. Note the reduction in oscillation compared to Figure 6.7; this simulation has a negative sign in $\Lambda$ and takes longer to settle.

Figure 6.9: Dissolved Input Allocation Matrix Errors, using concurrent learning. The errors decrease with time.

Figure 6.10: Dissolved Input Allocation Matrix Adaptive Weights, with concurrent learning. Note the same steps toward the ideal (dotted) values aligning with steps in the reference input.

Figure 6.11: Dissolved Input Allocation Matrix, $\hat\Lambda$ Convergence, using concurrent learning. The time scale was reduced to show the period prior to the history stack shock, which happened at the '+'. The full time history is pictured in the appendix.

6.4 Summary

Two cases have been shown in this chapter: the uncertain input allocation matrix case and the dissolved input allocation matrix case. In both cases, the reference model allocation matrices were quite different from the plant's allocation matrix, and in both cases the controllers converged onto the ideal weights and reduced the tracking error.

CHAPTER 7

Conclusion

7.1 Summary

This thesis addresses the problem of uncertain input allocation in linear systems control.
Using concurrent learning, the result can be achieved with a known state matrix. The approach relies on simultaneous estimation of the uncertain input allocation matrix while the system is actively controlled using that estimate of the uncertain allocation matrix. Lyapunov theory was used to show that the approach results in the system states being bounded. Purging, or shocking, of the concurrent learning history stack is required to remove data retained while the B matrix estimates were far from their true values. Three algorithms for shocking the history stack were presented. Algorithms 2 and 3 have the advantage of being able to handle multiple changes in the input allocation matrix if those changes happen relatively slowly; Algorithm 1 would not be able to do that in its present form. The local ultimate boundedness and local exponential ultimate boundedness of this result are shown. Then a relaxation of the uncertain input allocation matrix assumption is discussed and shown to be locally ultimately bounded in the closed-loop control case, and locally exponentially ultimately bounded as well. Simulation results for both systems are included to show the controller's performance characteristics. These results establish the feasibility of using a learning-based adaptive controller to handle uncertainties in input allocation matrices.

7.2 Future Work

Recommendations for future work include:

- Relax the assumption that A is known. This can be accomplished if the sign of the input allocation matrix is known and does not change, along the lines of [26].
- Work with NASA's flexible Generic Transport Model, f-GTM, since the wings may cause situations where the input allocation matrix is uncertain.
- Delve into multiple-objective optimization, MO-Op, where objectives may fight each other's fulfillment. In the case of the f-GTM, the objectives could be something like fuel efficiency, ride comfort, and stability.
- Use risk-averse methods to make inferences on the stability of systems.
- Investigate the uncertain B matrix from singular perturbation theory, where $\tilde{B}$ is the fast variable and the other system variables are considered slow.

References

[1] Karl Johan Åström and Björn Wittenmark. Adaptive Control. Addison-Wesley, Reading, MA, 2nd edition, 1995.

[2] Stephen Boyd and Shankar Sastry. Necessary and sufficient conditions for parameter convergence in adaptive control. Automatica, 22(6):629–639, 1986.

[3] William Brogan. Modern Control Theory. Prentice Hall, Englewood Cliffs, NJ, 1991.

[4] Girish Chowdhary. Concurrent Learning for Convergence in Adaptive Control Without Persistency of Excitation. PhD thesis, Georgia Institute of Technology, Atlanta, GA, 2010.

[5] Girish Chowdhary and Eric N. Johnson. Concurrent learning for convergence in adaptive control without persistency of excitation. In 49th IEEE Conference on Decision and Control, pages 3674–3679, 2010.

[6] Girish Chowdhary and Eric N. Johnson. Theory and flight test validation of a concurrent learning adaptive controller. Journal of Guidance, Control, and Dynamics, 34(2):592–607, March 2011.

[7] Girish Chowdhary, Maximilian Mühlegg, Jonathan P. How, and Florian Holzapfel. Concurrent learning adaptive model predictive control. In Qiping Chu, Bob Mulder, Daniel Choukroun, Erik-Jan van Kampen, Coen de Visser, and Gertjan Looye, editors, Advances in Aerospace Guidance, Navigation and Control, pages 29–47. Springer Berlin Heidelberg, 2013.

[8] Girish Chowdhary, Maximilian Mühlegg, and Eric N. Johnson.
Exponential parameter and tracking error convergence guarantees for adaptive controllers without persistency of excitation. International Journal of Control, 87(8):1583–1603, 2014.

[9] Girish Chowdhary, Tongbin Wu, Mark Cutler, and Jonathan P. How. Rapid transfer of controllers between UAVs using learning based adaptive control. In IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2013.

[10] Girish Chowdhary, Tansel Yucelen, Maximilian Mühlegg, and Eric N. Johnson. Concurrent learning adaptive control of linear systems with exponentially convergent bounds. International Journal of Adaptive Control and Signal Processing, 27(4):280–301, 2013.

[11] Travis E. Gibson, Anuradha M. Annaswamy, and Eugene Lavretsky. Adaptive systems with closed-loop reference models: Stability, robustness and transient performance. arXiv preprint arXiv:1201.4897, 2012.

[12] Wassim M. Haddad and VijaySekhar Chellaboina. Nonlinear Dynamical Systems and Control: A Lyapunov-Based Approach. Princeton University Press, Princeton, 2008.

[13] Martin Hagan. Neural Network Design. PWS Publishing, Boston, MA, 1st edition, 1996.

[14] Zhuo Han and Kumpati Narendra. Multiple adaptive models for control. In Conference on Decision and Control, pages 60–65, Atlanta, December 2010. IEEE.

[15] Petros A. Ioannou and Petar V. Kokotovic. Adaptive Systems with Reduced Models. Springer Verlag, Secaucus, NJ, 1983.

[16] Hassan Khalil. Nonlinear Systems. Prentice Hall, Upper Saddle River, NJ, 3rd edition, 2002.

[17] Nakwan Kim. Improved Methods in Neural Network Based Adaptive Output Feedback Control, with Applications to Flight Control. PhD thesis, Georgia Institute of Technology, Atlanta, GA, 2003.

[18] Eugene Lavretsky. Combined/composite model reference adaptive control. IEEE Transactions on Automatic Control, 54(11):2692–2707, November 2009.

[19] Eugene Lavretsky, Travis E. Gibson, and Anuradha M. Annaswamy. Projection operator in adaptive systems, 2011.

[20] Maximilian Mühlegg, Girish Chowdhary, and Eric N. Johnson. Concurrent learning adaptive control of linear systems with noisy measurements. In AIAA Guidance, Navigation, and Control Conference. American Institute of Aeronautics and Astronautics, 2012.

[21] Kumpati S. Narendra and Anuradha M. Annaswamy. Stable Adaptive Systems. Prentice-Hall, Englewood Cliffs, 1989.

[22] Kumpati S. Narendra and Jeyendran Balakrishnan. Adaptive control using multiple models. IEEE Transactions on Automatic Control, 42(2):171–187, February 1997.

[23] Roger Nussbaum. Some remarks on a conjecture in parameter adaptive control. Systems & Control Letters, 3(5):243–246, November 1983.

[24] R. Penrose. A generalized inverse for matrices. Mathematical Proceedings of the Cambridge Philosophical Society, 51:406–413, 1955.

[25] Jean-Baptiste Pomet and Laurent Praly. Adaptive nonlinear regulation: Estimation from the Lyapunov equation. IEEE Transactions on Automatic Control, 37(6):729–740, June 1992.

[26] Ben Reish, Girish Chowdhary, Kemal Ure, and Jonathan P. How. Concurrent learning adaptive control in the presence of uncertain control allocation matrix. In Proc. of the Conference on Guidance, Navigation, and Control, pages 1–19. AIAA, 2013.

[27] Benjamin Reish and Girish Chowdhary. Concurrent learning adaptive control for systems with unknown sign of control effectiveness. In Proceedings of the Conference on Decision and Control, pages 4131–4136. IEEE, 2014.

[28] Mario A. Santillo and Dennis S. Bernstein.
Adaptive control based on retrospective cost optimization. Journal of Guidance, Control, and Dynamics, 33(2):289–304, March-April 2010.

[29] Amith Somanath. Adaptive Control of Hypersonic Vehicles in Presence of Actuation Uncertainties. SM thesis, Massachusetts Institute of Technology, Cambridge, MA, June 2010.

[30] Gang Tao. Adaptive Control Design and Analysis. Wiley, New York, 2003.

[31] Gang Tao, Suresh M. Joshi, and Xiaoli Ma. Adaptive state feedback and tracking control of systems with actuator failures. IEEE Transactions on Automatic Control, 46(1):78–95, January 2001.

[32] Andrey Nikolayevich Tikhonov. On the stability of inverse problems. Dokl. Akad. Nauk SSSR, 39(5):195–198, 1943.

[33] Ming-Jui Yu, Yousaf Rahman, Ella M. Atkins, Ilya V. Kolmanovsky, and Dennis S. Bernstein. Minimal modeling adaptive control of the NASA generic transport model with unknown control-surface faults. In Proc. of the AIAA Guidance, Navigation, and Control (GNC) Conference, pages 1–21, Boston, MA, August 2013.

APPENDIX A

Derivations

This appendix contains the full derivations of the jumps shown in the preceding chapters. Section A.1 shows the derivation of the normal concurrent learning MRAC equation using the known and unknown allocation matrix components. Section A.2 shows the derivation of the concurrent learning regressor terms. Section A.3 shows the derivation of the system error in terms of $\tilde{B}$ and $\hat{B}$. Section A.4 derives the Lyapunov candidate and its derivative and works through bounding them. Section A.5 expands the regressors for K and $K_r$ into terms capable of being bounded. Section A.6 shows how the input can be expressed in terms of e, $\tilde{K}$, and $\tilde{K}_r$.

A.1 Expanding the Input Matrix into Known and Unknown Components

Assume $A_{rm} = A + B K^{*T}$ and $B_{rm} = B K_r^{*T}$. Assume $B = \hat{B} - \tilde{B}$, because $\tilde{B} \equiv \hat{B} - B$. Then $\hat{B}$ is the component known to the controller, while $\tilde{B}$ is the component unknown to the controller.
$$\dot{x} = A x + B u = A x + (\hat{B} - \tilde{B}) u \tag{A.1, A.2}$$

with the input

$$u = u_{pd} + u_{rm} = K^T x + K_r^T r \tag{A.3}$$

so that

$$\dot{x} = A x + \hat{B}(K^T x + K_r^T r) - \tilde{B}(K^T x + K_r^T r). \tag{A.4}$$

Adding and subtracting $(\hat{B} - \tilde{B}) K^{*T} x$ and $(\hat{B} - \tilde{B}) K_r^{*T} r$, and regrouping with $\tilde{K} = K - K^*$ and $\tilde{K}_r = K_r - K_r^*$, the intermediate steps (A.5)-(A.15) are routine rearrangement and cancellation, arriving at

$$\dot{x} = (A x + B K^{*T} x) + B K_r^{*T} r + \hat{B}\tilde{K}^T x + \hat{B}\tilde{K}_r^T r - \tilde{B}\tilde{K}^T x - \tilde{B}\tilde{K}_r^T r \tag{A.16}$$
$$\phantom{\dot{x}} = A_{rm} x + B_{rm} r + (\hat{B} - \tilde{B})\tilde{K}^T x + (\hat{B} - \tilde{B})\tilde{K}_r^T r \tag{A.19}$$
$$\dot{x} = A_{rm} x + B_{rm} r + B\tilde{K}^T x + B\tilde{K}_r^T r. \tag{A.20}$$

It can be seen that the derivation of $\dot{x}$ is the same from either perspective, using B or using $\hat{B} - \tilde{B}$. Equation (A.19) is useful for further discussions.

A.2 Define Concurrent Learning Error Terms

Now the definitions of the terms have to be addressed. Using (A.19) as a starting point for the derivation is convenient, since it has both $\hat{B}$ and $\tilde{B}$ displayed in it. Rearranging (A.19) and isolating the $\tilde{K}$ term ((A.21)-(A.27)):

$$\hat{B}\tilde{K}^T x = \dot{x} - A_{rm} x - B_{rm} r - \hat{B}\tilde{K}_r^T r + \tilde{B}(\tilde{K}_r^T r + \tilde{K}^T x) \tag{A.27}$$
$$\epsilon_K = \tilde{K}^T x = \hat{B}^\dagger\big(\dot{x} - A_{rm} x - B_{rm} r - \hat{B}\tilde{K}_r^T r + \tilde{B}(\tilde{K}_r^T r + \tilde{K}^T x)\big). \tag{A.28}$$

From $\epsilon_K$ to what concurrent learning stores:

$$(K^T - K^{*T}) x = \hat{B}^\dagger\big(\dot{x} - A_{rm} x - B_{rm} r - \hat{B}\tilde{K}_r^T r + \tilde{B}(\tilde{K}_r^T r + \tilde{K}^T x)\big) \tag{A.29}$$
$$\delta_K = K^{*T} x = K^T x - \hat{B}^\dagger\big(\dot{x} - A_{rm} x - B_{rm} r - \hat{B}\tilde{K}_r^T r + \tilde{B}(\tilde{K}_r^T r + \tilde{K}^T x)\big) \tag{A.30}$$
$$\delta_K = K^{*T} x + \hat{B}^\dagger\tilde{B}(\tilde{K}_r^T r + \tilde{K}^T x) = K^T x - \hat{B}^\dagger\big(\dot{x} - A_{rm} x - B_{rm} r - \hat{B}\tilde{K}_r^T r\big). \tag{A.31}$$

So (A.30) shows why the stacks have to be shocked, and (A.31) shows what the controller actually uses to calculate $\delta_K$ on the right-hand side of the history stack entry, so that the $\tilde{B}$ errors end up on the left-hand side. Now for the definition of the $K_r$ term.
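The identity (A.19)/(A.20) can be checked numerically; a small sketch with random matrices (dimensions arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A, B = rng.normal(size=(n, n)), rng.normal(size=(n, n))
Ks, Krs = rng.normal(size=(n, n)), rng.normal(size=(n, n))   # K*, K_r*
K, Kr = rng.normal(size=(n, n)), rng.normal(size=(n, n))
x, r = rng.normal(size=n), rng.normal(size=n)

A_rm, B_rm = A + B @ Ks.T, B @ Krs.T          # matching conditions
Kt, Krt = K - Ks, Kr - Krs                     # K tilde, K_r tilde

lhs = A @ x + B @ (K.T @ x + Kr.T @ r)         # x_dot = Ax + Bu, per (A.1)-(A.4)
rhs = A_rm @ x + B_rm @ r + B @ (Kt.T @ x) + B @ (Krt.T @ r)   # (A.20)
print(np.allclose(lhs, rhs))                   # True
```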
$$\tilde{K}_r^T = K_r^T - K_r^{*T} \tag{A.32}$$
$$\epsilon_{K_r} = \tilde{K}_r^T r = K_r^T r - K_r^{*T} r = K_r^T r - B^\dagger B_{rm} r = K_r^T r - (\hat{B} - \tilde{B})^\dagger B_{rm} r. \tag{A.33-A.35}$$

Working back an alternate way of showing the same thing:

$$B_{rm} = B K_r^{*T} = (\hat{B} - \tilde{B}) K_r^{*T} = \hat{B} K_r^{*T} - \tilde{B} K_r^{*T} \tag{A.37}$$
$$\hat{B} K_r^{*T} = B_{rm} + \tilde{B} K_r^{*T} \tag{A.38}$$
$$K_r^{*T} = \hat{B}^\dagger\big(B_{rm} + \tilde{B} K_r^{*T}\big). \tag{A.39}$$

Having found an equation for $K_r^{*T}$, plugging it into (A.33):

$$\epsilon_{K_r} = \tilde{K}_r^T r = K_r^T r - \hat{B}^\dagger\big(B_{rm} + \tilde{B} K_r^{*T}\big) r \tag{A.42}$$
$$\hat{\epsilon}_{K_r} = \tilde{K}_r^T r + \hat{B}^\dagger\tilde{B} K_r^{*T} r = K_r^T r - \hat{B}^\dagger B_{rm} r \tag{A.43}$$
$$\hat{\delta}_{K_r} = \hat{B}^\dagger B_{rm} r. \tag{A.46}$$

Again, an error term shows up that makes the reason for shocking easier to understand; once again, (A.43) shows why shocking is necessary.

A.3 Derivative of the Error

Look at what happens to $\dot{e}$ when $B = \hat{B} - \tilde{B}$ is used.

$$e = x - x_{rm} \tag{A.47}$$
$$\dot{e} = \dot{x} - \dot{x}_{rm} = A x + B u - A_{rm} x_{rm} - B_{rm} r \tag{A.48, A.49}$$
$$\phantom{\dot{e}} = A x + \hat{B}(K^T x + K_r^T r) - \tilde{B}(K^T x + K_r^T r) - A_{rm} x_{rm} - B_{rm} r. \tag{A.52}$$

Adding and subtracting $(\hat{B} - \tilde{B}) K^{*T} x$ and $(\hat{B} - \tilde{B}) K_r^{*T} r$ and regrouping, exactly as in section A.1 (steps (A.53)-(A.67)), delivers

$$\dot{e} = A_{rm} e + \hat{B}\tilde{K}^T x + \hat{B}\tilde{K}_r^T r - \tilde{B}(\tilde{K}^T x + \tilde{K}_r^T r). \tag{A.68}$$

Here again, the error term from using $\hat{B}$ is obvious in (A.68).

A.4 The Lyapunov Candidate and Derivative

Make $\zeta = [\,e^T\ \ \mathrm{vec}(\tilde{K})^T\ \ \mathrm{vec}(\tilde{K}_r)^T\ \ \mathrm{vec}(\tilde{B})^T\,]^T$ for ease of exposition. The trace will be used in this Lyapunov candidate, so a subsection is reserved for pertinent properties of the trace operator. The section then continues with the expansion of the Lyapunov candidate derivative.
A.4.1 Properties of the Trace Operator

The trace, $\mathrm{tr}\{\cdot\}$, is defined as the sum of the main diagonal elements of a square matrix. Another way to think about it is $\mathrm{tr}\{A\} = \sum_i A_{ii}$, or even (assuming the $\mathrm{vec}(\cdot)$ operator is defined) $\mathrm{tr}\{A\} = \mathrm{vec}(A)^T\mathrm{vec}(I)$, where I is the identity matrix of the same dimension as A. A matrix and its transpose have equal traces: $\mathrm{tr}\{A\} = \mathrm{tr}\{A^T\}$. The trace of a product of several matrices is equal to the trace of those matrices in any cyclic permutation: $\mathrm{tr}\{ABCD\} = \mathrm{tr}\{BCDA\} = \mathrm{tr}\{CDAB\} = \mathrm{tr}\{DABC\}$. This is different from an arbitrary reordering of the factors; in general, $\mathrm{tr}\{ABC\} \neq \mathrm{tr}\{ACB\}$. If the matrices A, B, and C are symmetric, then the order does not matter and their traces are equal. [3, Chapter 4]
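A quick numeric illustration of the cyclic property (a sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.normal(size=(3, 3)) for _ in range(3))

print(np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A)))  # True: cyclic shift
print(np.isclose(np.trace(A @ B @ C), np.trace(A @ C @ B)))  # generally False: swap
```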
A.4.2 The Lyapunov Candidate Equation

Define the candidate as a non-negative, scalar-valued function that measures the energy in the system. Usually, the working variables of the system make an appearance, and since the value should be non-negative, the working variables are squared:

$$V(e, \tilde{K}, \tilde{K}_r, \tilde{B}) = \tfrac12 e^T P e + \tfrac12\mathrm{tr}\{\tilde{K}^T \Gamma_x^{-1} \tilde{K}\} + \tfrac12\mathrm{tr}\{\tilde{K}_r^T \Gamma_r^{-1} \tilde{K}_r\} + \tfrac12\mathrm{tr}\{\tilde{B}^T \Gamma_B^{-1} \tilde{B}\} \tag{A.69}$$

$$\dot{V} = \tfrac12\big(\dot{e}^T P e + e^T P \dot{e}\big) + \tfrac12\mathrm{tr}\{\dot{\tilde{K}}^T \Gamma_x^{-1} \tilde{K} + \tilde{K}^T \Gamma_x^{-1} \dot{\tilde{K}}\} + \tfrac12\mathrm{tr}\{\dot{\tilde{K}}_r^T \Gamma_r^{-1} \tilde{K}_r + \tilde{K}_r^T \Gamma_r^{-1} \dot{\tilde{K}}_r\} + \tfrac12\mathrm{tr}\{\dot{\tilde{B}}^T \Gamma_B^{-1} \tilde{B} + \tilde{B}^T \Gamma_B^{-1} \dot{\tilde{B}}\}. \tag{A.70}$$

The derivatives of the important variables are summarized here, with the concurrent learning terms included:

$$\dot{e} = A_{rm} e + \hat{B}\tilde{K}^T x + \hat{B}\tilde{K}_r^T r - \tilde{B}(\tilde{K}^T x + \tilde{K}_r^T r) \tag{A.71}$$
$$\dot{\tilde{K}} = -\Gamma_x\Big(x e^T P \hat{B} + \sum_{j=1}^{p_{\max}} x_j\,\hat{\epsilon}_{K,j}^T\Big) \tag{A.74}$$
$$\dot{\tilde{K}}_r = -\Gamma_r\Big(r e^T P \hat{B} + \sum_{j=1}^{p_{\max}} r_j\,\hat{\epsilon}_{K_r,j}^T\Big) \tag{A.75}$$
$$\dot{\tilde{B}} = -\Gamma_B\Big(u(\hat{B} u - \dot{x} + A x)^T + \sum_{k=1}^{p_{\max}} u_k(\hat{B} u_k - \dot{x}_k + A x_k)^T\Big). \tag{A.76}$$

A.4.3 The Lyapunov Candidate Derivative

Substituting the error dynamics (A.71) into (A.70) and expanding ((A.77)-(A.80)) produces the quadratic term $\tfrac12 e^T(A_{rm}^T P + P A_{rm}) e$ plus cross terms in $\tilde{K}$, $\tilde{K}_r$, and $\tilde{B}$. The Lyapunov equation is necessary for MRAC-type control system stability to be analyzed: for a chosen positive definite, symmetric $Q \in \mathbb{R}^{n\times n}$, and assuming $A = A_{rm}$, select the positive definite, symmetric P such that $A_{rm}^T P + P A_{rm} = -Q$. Applying the Lyapunov equation replaces the quadratic term with $-\tfrac12 e^T Q e$ ((A.81)-(A.82)). Substituting the update law (A.74) for $\dot{\tilde{K}}$, the cross terms $x^T\tilde{K}\hat{B}^T P e$ and $e^T P\hat{B}\tilde{K}^T x$ generated by the error dynamics cancel against the instantaneous part of the update law, and the cyclic property of the trace collects the remainder ((A.83)-(A.91)), leaving

$$\dot{V} = \tfrac12\big(-e^T Q e + r^T\tilde{K}_r\hat{B}^T P e + e^T P\hat{B}\tilde{K}_r^T r - r^T\tilde{K}_r\tilde{B}^T P e - e^T P\tilde{B}\tilde{K}_r^T r\big) - \mathrm{tr}\Big\{e^T P\tilde{B}\tilde{K}^T x + \tilde{K}^T\sum_{i=1}^{p_{\max}} x_i\,\hat{\epsilon}_{K,i}^T\Big\} + \tfrac12\mathrm{tr}\{\dot{\tilde{K}}_r^T\Gamma_r^{-1}\tilde{K}_r + \tilde{K}_r^T\Gamma_r^{-1}\dot{\tilde{K}}_r\} + \tfrac12\mathrm{tr}\{\dot{\tilde{B}}^T\Gamma_B^{-1}\tilde{B} + \tilde{B}^T\Gamma_B^{-1}\dot{\tilde{B}}\}. \tag{A.92}$$

Now that $\tilde{K}$ is taken care of, move on to $\tilde{K}_r$.
Pulling the factor of one half into the \tilde{K} trace term,

\dot{V}(\zeta) = \tfrac{1}{2}\big[-e^T Q e + r^T \tilde{K}_r \hat{B}^T P e + e^T P \hat{B}\tilde{K}_r^T r - r^T \tilde{K}_r \tilde{B}^T P e - e^T P \tilde{B}\tilde{K}_r^T r\big] - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K,i}^T\Big\} + \tfrac{1}{2}\mathrm{tr}\{\dot{\tilde{K}}_r^T \Gamma_r^{-1} \tilde{K}_r + \tilde{K}_r^T \Gamma_r^{-1} \dot{\tilde{K}}_r\} + \tfrac{1}{2}\mathrm{tr}\{\dot{\tilde{B}}^T \Gamma_B^{-1} \tilde{B} + \tilde{B}^T \Gamma_B^{-1} \dot{\tilde{B}}\}   (A.93)

Substituting the update law (A.75) into the \tilde{K}_r trace terms and cancelling \Gamma_r^{-1}\Gamma_r,

\tfrac{1}{2}\mathrm{tr}\Big\{-\Big(r e^T P \hat{B} + \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big)^T \tilde{K}_r - \tilde{K}_r^T\Big(r e^T P \hat{B} + \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big)\Big\} = \tfrac{1}{2}\mathrm{tr}\Big\{-\hat{B}^T P e r^T \tilde{K}_r - \tilde{K}_r^T r e^T P \hat{B} - \Big(\sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big)^T \tilde{K}_r - \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big\}   (A.94)–(A.96)

Exactly as with \tilde{K}, \mathrm{tr}\{\hat{B}^T P e r^T \tilde{K}_r\} = r^T \tilde{K}_r \hat{B}^T P e = e^T P \hat{B}\tilde{K}_r^T r and \mathrm{tr}\{\tilde{K}_r^T r e^T P \hat{B}\} = e^T P \hat{B}\tilde{K}_r^T r, so the \hat{B}-dependent terms cancel,   (A.97)–(A.99)

leaving

\dot{V}(\zeta) = \tfrac{1}{2}\big[-e^T Q e - r^T \tilde{K}_r \tilde{B}^T P e - e^T P \tilde{B}\tilde{K}_r^T r\big] - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K,i}^T\Big\} + \tfrac{1}{2}\mathrm{tr}\Big\{-2\tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big\} + \tfrac{1}{2}\mathrm{tr}\{\dot{\tilde{B}}^T \Gamma_B^{-1} \tilde{B} + \tilde{B}^T \Gamma_B^{-1} \dot{\tilde{B}}\}   (A.100)

Since -r^T \tilde{K}_r \tilde{B}^T P e = -(e^T P \tilde{B}\tilde{K}_r^T r)^T = -e^T P \tilde{B}\tilde{K}_r^T r, the \tilde{K}_r pieces combine the same way the \tilde{K} pieces did,   (A.101)–(A.103)

and the derivative reduces to

\dot{V}(\zeta) = -e^T \frac{Q}{2} e - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K,i}^T\Big\} - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big\} + \tfrac{1}{2}\mathrm{tr}\{\dot{\tilde{B}}^T \Gamma_B^{-1} \tilde{B} + \tilde{B}^T \Gamma_B^{-1} \dot{\tilde{B}}\}   (A.104)

Now expand \dot{\tilde{B}} using its update law.
Substituting (A.76) into the \tilde{B} trace terms of (A.104),

\dot{V}(\zeta) = -e^T \frac{Q}{2} e - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K,i}^T\Big\} - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big\} + \tfrac{1}{2}\mathrm{tr}\Big\{\Big[-\Gamma_B\Big((\hat{B}u - \dot{x} + Ax)u^T + \sum_{k=1}^{p_{max}} (\hat{B}u_k - \dot{x}_k + Ax_k)u_k^T\Big)\Big]^T \Gamma_B^{-1} \tilde{B} + \tilde{B}^T \Gamma_B^{-1}\Big[-\Gamma_B\Big((\hat{B}u - \dot{x} + Ax)u^T + \sum_{k=1}^{p_{max}} (\hat{B}u_k - \dot{x}_k + Ax_k)u_k^T\Big)\Big]\Big\}   (A.105)

Since \dot{x} = Ax + Bu, the residual is \hat{B}u - \dot{x} + Ax = (\hat{B} - B)u = \tilde{B}u, and likewise for each recorded point, so the \tilde{B} trace terms become

\tfrac{1}{2}\mathrm{tr}\Big\{-\Big(\tilde{B}u u^T + \sum_{k=1}^{p_{max}} \tilde{B}u_k u_k^T\Big)^T \tilde{B} - \tilde{B}^T\Big(\tilde{B}u u^T + \sum_{k=1}^{p_{max}} \tilde{B}u_k u_k^T\Big)\Big\}   (A.106)–(A.108)

By the transpose and cyclic properties of the trace, the two halves are equal, giving

\tfrac{1}{2}\mathrm{tr}\Big\{-2\tilde{B}u u^T \tilde{B}^T - 2\sum_{k=1}^{p_{max}} \tilde{B}u_k u_k^T \tilde{B}^T\Big\}   (A.109)–(A.111)

and therefore

\dot{V}(\zeta) = -e^T \frac{Q}{2} e - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K,i}^T\Big\} - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big\} - \mathrm{tr}\Big\{\tilde{B}u u^T \tilde{B}^T + \sum_{k=1}^{p_{max}} \tilde{B}u_k u_k^T \tilde{B}^T\Big\}   (A.112)

A.5 Expanding the Hatted Epsilons

As noted, \hat{\epsilon}_K = \Delta_K + \Delta_{K_r} + \epsilon_K and \hat{\epsilon}_{K_r} = \Delta_{K_r} + \epsilon_{K_r}. In (A.112), the \hat{\epsilon}_K and \hat{\epsilon}_{K_r} have to be expanded in order to attempt a bound on the Lyapunov candidate derivative. Expanding \hat{\epsilon}_K, with \epsilon_{K,i} = \tilde{K}^T x_i and \Omega_x = \sum_{i=1}^{p_{max}} x_i x_i^T:

\mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K,i}^T\Big\}   (A.113)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i (\Delta_K + \Delta_{K_r} + \epsilon_{K,i})^T\Big\}   (A.114)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i (\Delta_K + \Delta_{K_r})^T + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i \epsilon_{K,i}^T\Big\}   (A.115)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i (\Delta_K + \Delta_{K_r})^T + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i (\tilde{K}^T x_i)^T\Big\}   (A.116)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i (\Delta_K + \Delta_{K_r})^T + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i x_i^T \tilde{K}\Big\}   (A.117)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i (\Delta_K + \Delta_{K_r})^T\Big\} + \mathrm{tr}\{\tilde{K}^T \Omega_x \tilde{K}\}   (A.118)–(A.119)

Expanding \hat{\epsilon}_{K_r}, with \epsilon_{K_r,j} = \tilde{K}_r^T r_j and \Omega_r = \sum_{j=1}^{p_{max}} r_j r_j^T:

\mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big\}   (A.120)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j (\Delta_{K_r} + \epsilon_{K_r,j})^T\Big\}   (A.121)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \Delta_{K_r}^T + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j (\tilde{K}_r^T r_j)^T\Big\}   (A.122)–(A.123)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \Delta_{K_r}^T + \tilde{K}_r^T \Omega_r \tilde{K}_r\Big\}   (A.124)–(A.125)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \Delta_{K_r}^T\Big\} + \mathrm{tr}\{\tilde{K}_r^T \Omega_r \tilde{K}_r\}   (A.126)–(A.127)

A.6 Expanding the Input

Here the input, u, is expanded into terms that may be bounded. The idea is that, since u enters (A.112) squared, the uu^T term is non-negative; this section shows that it could be expanded into the four working variables, but it does not need to be.
u = K^T x + K_r^T r   (A.128)

= (\tilde{K} + K^*)^T x + (\tilde{K}_r + K_r^*)^T r   (A.129)

= (\tilde{K} + K^*)^T (e + x_{rm}) + (\tilde{K}_r + K_r^*)^T r   (A.130)

= (\tilde{K}^T + K^{*T})(e + x_{rm}) + (\tilde{K}_r^T + K_r^{*T}) r   (A.131)

= \tilde{K}^T e + \tilde{K}^T x_{rm} + K^{*T} e + K^{*T} x_{rm} + \tilde{K}_r^T r + K_r^{*T} r   (A.132)

Now expand uu^T into elements that are boundable:

uu^T = (K^T x + K_r^T r)(K^T x + K_r^T r)^T   (A.133)

= (K^T x + K_r^T r)(x^T K + r^T K_r)   (A.134)

= (\tilde{K}^T e + \tilde{K}^T x_{rm} + K^{*T} e + K^{*T} x_{rm} + \tilde{K}_r^T r + K_r^{*T} r)(e^T \tilde{K} + x_{rm}^T \tilde{K} + e^T K^* + x_{rm}^T K^* + r^T \tilde{K}_r + r^T K_r^*)   (A.135)

Multiplying out and collecting the six "square" terms first,

uu^T = \tilde{K}^T e e^T \tilde{K} + \tilde{K}^T x_{rm} x_{rm}^T \tilde{K} + K^{*T} e e^T K^* + K^{*T} x_{rm} x_{rm}^T K^* + \tilde{K}_r^T r r^T \tilde{K}_r + K_r^{*T} r r^T K_r^*
+ \tilde{K}^T e x_{rm}^T \tilde{K} + \tilde{K}^T e e^T K^* + \tilde{K}^T e x_{rm}^T K^* + \tilde{K}^T e r^T \tilde{K}_r + \tilde{K}^T e r^T K_r^*
+ \tilde{K}^T x_{rm} e^T \tilde{K} + \tilde{K}^T x_{rm} e^T K^* + \tilde{K}^T x_{rm} x_{rm}^T K^* + \tilde{K}^T x_{rm} r^T \tilde{K}_r + \tilde{K}^T x_{rm} r^T K_r^*
+ K^{*T} e e^T \tilde{K} + K^{*T} e x_{rm}^T \tilde{K} + K^{*T} e x_{rm}^T K^* + K^{*T} e r^T \tilde{K}_r + K^{*T} e r^T K_r^*
+ K^{*T} x_{rm} e^T \tilde{K} + K^{*T} x_{rm} x_{rm}^T \tilde{K} + K^{*T} x_{rm} e^T K^* + K^{*T} x_{rm} r^T \tilde{K}_r + K^{*T} x_{rm} r^T K_r^*
+ \tilde{K}_r^T r e^T \tilde{K} + \tilde{K}_r^T r x_{rm}^T \tilde{K} + \tilde{K}_r^T r e^T K^* + \tilde{K}_r^T r x_{rm}^T K^* + \tilde{K}_r^T r r^T K_r^*
+ K_r^{*T} r e^T \tilde{K} + K_r^{*T} r x_{rm}^T \tilde{K} + K_r^{*T} r e^T K^* + K_r^{*T} r x_{rm}^T K^* + K_r^{*T} r r^T \tilde{K}_r   (A.136)–(A.137)

That is, 6 positive definite terms and 30 not-so-positive-definite terms. Using the same constants as in Chapter 3 (\kappa_x \ge \|K^*\|, \kappa_r \ge \|K_r^*\|, c_{rm} \ge \|x_{rm}\|, and c_r \ge \|r\|), bounding each term by the product of norms gives the upper bound of uu^T:

uu^T \le \|\tilde{K}\|^2\|e\|^2 + c_{rm}^2\|\tilde{K}\|^2 + \kappa_x^2\|e\|^2 + \kappa_x^2 c_{rm}^2 + c_r^2\|\tilde{K}_r\|^2 + c_r^2\kappa_r^2
+ c_{rm}\|e\|\|\tilde{K}\|^2 + \kappa_x\|\tilde{K}\|\|e\|^2 + c_{rm}\kappa_x\|\tilde{K}\|\|e\| + c_r\|\tilde{K}\|\|e\|\|\tilde{K}_r\| + \kappa_r c_r\|\tilde{K}\|\|e\|
+ c_{rm}\|e\|\|\tilde{K}\|^2 + c_{rm}\kappa_x\|\tilde{K}\|\|e\| + c_{rm}^2\kappa_x\|\tilde{K}\| + c_r c_{rm}\|\tilde{K}\|\|\tilde{K}_r\| + \kappa_r c_{rm} c_r\|\tilde{K}\|
+ \kappa_x\|\tilde{K}\|\|e\|^2 + c_{rm}\kappa_x\|\tilde{K}\|\|e\| + \kappa_x^2 c_{rm}\|e\| + \kappa_x c_r\|e\|\|\tilde{K}_r\| + \kappa_x\kappa_r c_r\|e\|
+ c_{rm}\kappa_x\|\tilde{K}\|\|e\| + c_{rm}^2\kappa_x\|\tilde{K}\| + \kappa_x^2 c_{rm}\|e\| + \kappa_x c_{rm} c_r\|\tilde{K}_r\| + \kappa_x\kappa_r c_r c_{rm}
+ c_r\|\tilde{K}\|\|e\|\|\tilde{K}_r\| + c_r c_{rm}\|\tilde{K}\|\|\tilde{K}_r\| + \kappa_x c_r\|e\|\|\tilde{K}_r\| + \kappa_x c_{rm} c_r\|\tilde{K}_r\| + \kappa_r c_r^2\|\tilde{K}_r\|
+ \kappa_r c_r\|\tilde{K}\|\|e\| + \kappa_r c_r c_{rm}\|\tilde{K}\| + \kappa_x\kappa_r c_r\|e\| + \kappa_x\kappa_r c_r c_{rm} + \kappa_r c_r^2\|\tilde{K}_r\|   (A.138)

and combining like terms,

uu^T \le \|\tilde{K}\|^2\|e\|^2 + 2\kappa_x\|\tilde{K}\|\|e\|^2 + 2c_{rm}\|e\|\|\tilde{K}\|^2 + 2c_r\|\tilde{K}\|\|e\|\|\tilde{K}_r\| + \kappa_x^2\|e\|^2 + c_{rm}^2\|\tilde{K}\|^2 + c_r^2\|\tilde{K}_r\|^2 + 4c_{rm}\kappa_x\|\tilde{K}\|\|e\| + 2\kappa_r c_r\|\tilde{K}\|\|e\| + 2\kappa_x c_r\|e\|\|\tilde{K}_r\| + 2c_r c_{rm}\|\tilde{K}\|\|\tilde{K}_r\| + 2\kappa_r c_{rm} c_r\|\tilde{K}\| + 2c_{rm}^2\kappa_x\|\tilde{K}\| + 2\kappa_x^2 c_{rm}\|e\| + 2\kappa_x\kappa_r c_r\|e\| + 2\kappa_x c_{rm} c_r\|\tilde{K}_r\| + 2\kappa_r c_r^2\|\tilde{K}_r\| + c_r^2\kappa_r^2 + c_{rm}^2\kappa_x^2 + 2c_r c_{rm}\kappa_x\kappa_r.   (A.139)

Then rearranging to place the positive definite terms first:

uu^T \le \|\tilde{K}\|^2\|e\|^2 + \kappa_x^2\|e\|^2 + c_{rm}^2\|\tilde{K}\|^2 + c_r^2\kappa_r^2 + c_{rm}^2\kappa_x^2 + 2c_r c_{rm}\kappa_x\kappa_r + 2\kappa_x\|\tilde{K}\|\|e\|^2 + 2c_{rm}\|e\|\|\tilde{K}\|^2 + 2c_r\|\tilde{K}\|\|e\|\|\tilde{K}_r\| + c_r^2\|\tilde{K}_r\|^2 + 4c_{rm}\kappa_x\|\tilde{K}\|\|e\| + 2\kappa_r c_r\|\tilde{K}\|\|e\| + 2c_r c_{rm}\|\tilde{K}\|\|\tilde{K}_r\| + 2\kappa_r c_{rm} c_r\|\tilde{K}\| + 2c_{rm}^2\kappa_x\|\tilde{K}\| + 2\kappa_x^2 c_{rm}\|e\| + 2\kappa_x\kappa_r c_r\|e\| + 2\kappa_x c_{rm} c_r\|\tilde{K}_r\| + 2\kappa_r c_r^2\|\tilde{K}_r\|.   (A.140)

Since uu^T enters (4.13) with a negative sign, the bound can be found by comparing coefficients.
So then,

-uu^T \le -\Big[\|\tilde{K}\|^2 - 2\kappa_x\|\tilde{K}\|\Big]\Big[\|e\|^2 - 2c_{rm}\|e\| + \frac{\kappa_x^2 c_r + \kappa_x\kappa_r c_{rm}}{c_{rm}} + \frac{3\kappa_r c_{rm} c_r}{2\kappa_x} + c_{rm}^2\Big] + \frac{\kappa_x\kappa_r c_r}{c_{rm}}\|e\|^2 + \frac{3\kappa_r c_r c_{rm}}{2\kappa_x}\|\tilde{K}\|^2 + 5c_{rm}^2 + 7c_r^2\kappa_r^2 + 12c_{rm} c_r \kappa_x\kappa_r - c_r\|\tilde{K}_r\|^2 + \frac{8c_{rm}\kappa_x + c_r c_{rm}\kappa_x + c_r c_{rm}\kappa_r}{c_{rm}}\|e\|\|\tilde{K}\| + 2c_r\|e\|\|\tilde{K}\|\|\tilde{K}_r\| + 2c_r c_{rm}\|\tilde{K}\|\|\tilde{K}_r\| + (\kappa_x c_r c_{rm} + 2\kappa_r c_r^2)\|\tilde{K}_r\|.   (A.141)

Continuing to complete the square where possible, steps (A.142) through (A.147) absorb the \|e\|\|\tilde{K}\| cross term into a square in \|e\| and then collect the \|\tilde{K}_r\| terms into a square, one completion at a time, giving

-uu^T \le -\Big[\|\tilde{K}\|^2 - 2\kappa_x\|\tilde{K}\|\Big]\Big[\|e\|^2 - 2c_{rm}\|e\| + \frac{\kappa_x^2 c_r + \kappa_x\kappa_r c_{rm}}{c_{rm}} + \frac{3\kappa_r c_{rm} c_r}{2\kappa_x} + c_{rm}^2\Big]
+ \frac{\kappa_x\kappa_r c_r}{c_{rm}}\Big[\|e\| + \frac{8c_{rm}\kappa_x + c_r c_{rm}\kappa_x + c_r c_{rm}\kappa_r}{2\kappa_x\kappa_r c_r}\|\tilde{K}\|\Big]^2
- \Big[\frac{(8c_{rm}\kappa_x + c_r c_{rm}\kappa_x + c_r c_{rm}\kappa_r)^2}{4c_{rm}\kappa_x\kappa_r c_r} - \frac{3\kappa_r c_r c_{rm}}{2\kappa_x}\Big]\|\tilde{K}\|^2
- c_r\Big[\|\tilde{K}_r\| - \frac{\kappa_x c_{rm} + 2\kappa_r c_r}{2}\Big]^2 + \frac{c_r}{4}(\kappa_x c_{rm} + 2\kappa_r c_r)^2
+ 5c_{rm}^2 + 7c_r^2\kappa_r^2 + 12c_{rm} c_r \kappa_x\kappa_r + 2c_r\|e\|\|\tilde{K}\|\|\tilde{K}_r\| + 2c_r c_{rm}\|\tilde{K}\|\|\tilde{K}_r\|.   (A.148)

A.6.1 Expanding the Input with W

Instead of K and K_r, use W:

-uu^T = -W^T\sigma(W^T\sigma)^T = -W^T\sigma\sigma^T W   (A.149)

= -(\tilde{W} + W^*)^T\sigma\sigma^T(\tilde{W} + W^*)   (A.150)

Let c_K \ge \|W^*\|. Then

-uu^T \le -(\tilde{W}^T + c_K)\sigma\sigma^T(\tilde{W} + c_K)   (A.151)

\le -(\tilde{W}^T + c_K)(\tilde{W} + c_K)\sigma^T\sigma   (A.152)

\le -\big[\tilde{W}^T\tilde{W} + \tilde{W}c_K + c_K\tilde{W} + c_K^2\big]\sigma^T\sigma   (A.153)–(A.154)

Apply the norm, and let c_R \ge \|[\,c_{rm} \; c_r\,]\|.
-uu^T \le -\|\tilde{W}\|^2\|\sigma\|^2 - 2c_K\|\tilde{W}\|\|\sigma\|^2 - c_K^2\|\sigma\|^2   (A.155)

Since \sigma is the combined x and r vector,

\|e\| \ge \|\sigma\| - c_R   (A.156)

\|e\| + c_R \ge \|\sigma\|   (A.157)

\|\sigma\|^2 \le (c_R + \|e\|)(c_R + \|e\|)   (A.158)

\le \|e\|^2 + 2c_R\|e\| + c_R^2   (A.159)

Going back to (A.155),

-uu^T \le -\|\tilde{W}\|^2(\|e\|^2 + 2c_R\|e\| + c_R^2) - 2c_K\|\tilde{W}\|(\|e\|^2 + 2c_R\|e\| + c_R^2) - c_K^2(\|e\|^2 + 2c_R\|e\| + c_R^2)   (A.160)

\le -\|\tilde{W}\|^2\|e\|^2 - 2c_K\|\tilde{W}\|\|e\|^2 - c_K^2\|e\|^2 - 2c_R\|e\|\|\tilde{W}\|^2 - c_R^2\|\tilde{W}\|^2 - 4c_R c_K\|\tilde{W}\|\|e\| - 2c_K c_R^2\|\tilde{W}\| - 2c_R c_K^2\|e\| - c_K^2 c_R^2   (A.161)–(A.162)

Now pull out the negative sign and look at the function:

\|\tilde{W}\|^2\|e\|^2 + 2c_K\|\tilde{W}\|\|e\|^2 + c_K^2\|e\|^2 + 2c_R\|e\|\|\tilde{W}\|^2 + c_R^2\|\tilde{W}\|^2 + 4c_R c_K\|\tilde{W}\|\|e\| + 2c_K c_R^2\|\tilde{W}\| + 2c_R c_K^2\|e\| + c_K^2 c_R^2   (A.163)

= (\|e\|^2 + 2c_R\|e\| + c_R^2)(\|\tilde{W}\|^2 + 2c_K\|\tilde{W}\| + c_K^2)   (A.164)

so that

-uu^T \le -(\|e\| + c_R)^2(\|\tilde{W}\| + c_K)^2   (A.165)

APPENDIX B

Concurrent Learning Update

The purpose of this appendix is to assist the reader in understanding the Model Reference Adaptive Control (MRAC) update law and how it is affected by adding concurrent learning (CL-MRAC).

B.1 The MRAC Update Law

In MRAC with no concurrent learning, learning stops as soon as the error between the reference model and the actual system decreases to very small values. The adaptive law is

\dot{\hat{W}} = -\Gamma x e^T P B   (B.1)

where \hat{W} is the adapting variable, \Gamma > 0 is the adaptive learning rate, x is the state vector, e is the tracking error vector, P is the solution to the Lyapunov equation A^T P + P A = -Q [16], and B is the input allocation matrix.

B.1.1 CL-MRAC Update Law

Concurrent Learning (CL) may be used to alleviate this condition, as discussed in [4]. CL acts like extra excitation when the input is not exciting as defined by Tao in [30]. The MRAC-only adaptive update law, (B.1), is only effective while x and e are large: if e goes to zero, the adaptation stops. The CL-MRAC adaptation law is

\dot{\hat{W}} = -\Gamma \Big[ x e^T P B + \sum_{i=1}^{n} x_i \hat{\epsilon}_i^T \Big]   (B.2)

where the new terms are x_i, previously recorded states; \hat{\epsilon}_i, previously recorded adaptation errors; and n, the upper limit on the number of stored or budgeted points. These are stored in a history stack which is governed by the algorithm outlined in [10]. Note that the summation term in (B.2) does not depend on the current state or the current state tracking error. The points in the history stack were recorded when the input was exciting, so the summation adds excitation when the first term of (B.2) is small. Therefore, the adaptation continues even while the state tracking error is small, and the ideal values will be attained.

The addition of concurrent learning is like a stochastic gradient ascent for neural networks [13]. The stored data points are like the test points in a neural network: execute the gradient search at each one and add the results together. The total result will generally be more accurate in pointing to the goal than any of the single points taken alone.
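The update laws above translate directly into a few lines of code. The sketch below (Python with NumPy; the function name, argument layout, and the assumption that the history stack holds pairs (x_i, \hat{\epsilon}_i) are illustrative choices, not the implementation used in this thesis) evaluates (B.2):

    import numpy as np

    def cl_mrac_update(x, e, P, B, Gamma, stack):
        # Instantaneous MRAC term (B.1): -Gamma x e^T P B
        dW = -Gamma @ np.outer(x, e @ P @ B)
        # Concurrent-learning summation in (B.2): one rank-one term
        # per recorded (x_i, eps_i) pair in the history stack.
        for x_i, eps_i in stack:
            dW -= Gamma @ np.outer(x_i, eps_i)
        return dW

With an empty stack the function reduces to (B.1); once the stack holds data recorded under exciting input, the summation keeps the update alive even when e is nearly zero.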
B.2 MRAC Example

To help illustrate this concept, a simple MRAC control problem is presented next. Two linear systems are shown in Figure B.1, where a.) is the step response of the plant, or actual system, and b.) is the step response of the desired model. Note that Figure B.1 a.) overshoots the amplitude of 1, the desired value, and then oscillates for 13 seconds before settling to within 2% of the desired value, which is the definition of settling time. Figure B.1 b.) has no overshoot and settles in 3 seconds.

Figure B.1: Step Response of Two Linear Systems. Note the differences in magnitude and settling time between the plant, a.), and the reference model, b.). The goal is to make a system that naturally acts like a.) instead take on the response of the system shown in b.).

B.2.1 Model Reference Adaptive Control Only

The state tracking is shown in Figure B.2, and the effect of the control law is evident in the first three seconds: the error is initially large, but as adaptation proceeds the error decreases and the reference input becomes dominant. The tracking error is shown in Figure B.4, and the time history of the adaptive gains is plotted in Figure B.3. The lack of adaptation can be seen by comparing when the gains in Figure B.3 stop adapting (flatten out) with when the state tracking errors, especially the error for the second state, become very small in Figure B.4. Since the gains have not attained their ideal values, when the system encounters a new region of the state-space the actual system will not respond as the reference model does. In the past, this problem was overcome by requiring the system to have persistently exciting input [2], but that is not easily attained in an aerospace application. Step inputs are much more tolerable.

Figure B.2: Model Reference Adaptive Control update law. Note the difference in the first couple of seconds while the input is jumping around. Then the input settles down and the tracking error is quite small by 8 seconds. The bottom plot shows the commanded input and the total input including the adaptive gains.

Figure B.3: Model Reference Adaptive Control update law gains Kx and Kr. Note that the actual gains never attain their ideal values, indicated by the green lines.

Figure B.4: Model Reference Adaptive Control time history of tracking error. Note that the error for both states is quite small by about 8 seconds, which is when the gains stopped adapting. Also, the original response of the plant can be seen in the first 3 seconds prior to being damped out by the reference model.
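To reproduce the flavor of Figures B.2 through B.4, the following self-contained sketch simulates MRAC-only adaptation on a second-order system. The plant, reference model, and learning rates are illustrative stand-ins, not the values used to generate the figures; the point is that the gain updates are proportional to the tracking error and stall once it becomes small:

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    A   = np.array([[0.0, 1.0], [-1.0, -0.2]])   # lightly damped plant (illustrative)
    B   = np.array([[0.0], [1.0]])
    Arm = np.array([[0.0, 1.0], [-4.0, -4.0]])   # well-damped reference model
    Brm = np.array([[0.0], [4.0]])
    P   = solve_continuous_lyapunov(Arm.T, -np.eye(2))  # Arm^T P + P Arm = -Q

    dt, gx, gr = 1e-3, 10.0, 10.0
    x  = np.zeros((2, 1)); xrm = np.zeros((2, 1))
    K  = np.zeros((2, 1)); Kr  = np.zeros((1, 1))
    for k in range(int(20.0 / dt)):
        r = np.array([[np.sin(0.5 * k * dt)]])  # smooth command, as in Figure B.2
        u = K.T @ x + Kr.T @ r
        e = x - xrm
        # MRAC-only laws: proportional to e, so adaptation stops as e -> 0
        K  += dt * (-gx * x @ (e.T @ P @ B))
        Kr += dt * (-gr * r @ (e.T @ P @ B))
        x   += dt * (A @ x + B @ u)
        xrm += dt * (Arm @ xrm + Brm @ r)

    # Matching conditions B K*^T = Arm - A and B Kr* = Brm give
    # K* = [-3, -3.8]^T and Kr* = 4 for these matrices.
    print(K.ravel(), Kr.ravel())

The printed gains typically land near, but not on, the ideal values, mirroring the flattened-out gain traces in Figure B.3.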
B.2.2 Model Reference Adaptive Control with Concurrent Learning

This concept is illustrated in Figures B.5 through B.9. A linear system that oscillates is selected for this example, while the model settles to the demanded signal within 3 seconds with no overshoot. The state tracking and the input to the system are shown in Figure B.5, where the green dashed line indicates the model. The tracking is quite good even at the start. The demanded signal is a series of steps, both positive and negative, which is more like what aerospace applications use than the sinusoid of Figures B.2 through B.4. Figure B.6 shows the error between the system state and the model state. Note that the error has jumps that coincide with the steps in the demanded signal shown in Figure B.5. Those are not apparent in Figure B.4 because the input there is a smooth function.

Figure B.5: Concurrent Learning MRAC State Tracking. The system states are solid lines while the model states are dashed. The bottom plot shows the input to the system as solid and the commanded signal as dashed. Note the good state tracking in the top and middle plots. A stepped input was selected over a sinusoidal one.

Figure B.6: Concurrent Learning MRAC Tracking Errors. The system state tracking errors compared to the reference model. Note that the jumps in the error coincide with steps in the command signal from Figure B.5.

Figure B.7: Concurrent Learning MRAC Adaptive Weights. Note the gains, solid, attain their ideal values, dashed, within 6 seconds.

The time history of the adaptive weights for the concurrent learning system is shown in Figure B.7. Note that the ideal values are attained quickly and are continuously reattained whenever the gains are driven off the ideal values by large errors due to steps in the demand signal. Figure B.3 displays nothing like this behavior: the adaptation just stops.

B.2.3 Concurrent Learning in the Weight Space

The weight space of a system is the space where the weights or gains live. It can be R^n or of higher dimension than the state space; in the case of neural networks, the weight space can be several times larger than the state space in dimension. It is separate and apart from the state space. The weight space of the concurrent learning MRAC is displayed in Figure B.8, where the ideal values are at the juncture of the orange lines and the randomly selected starting values are at the bottom center of the figure. The figure shows the way that MRAC and concurrent learning work together to guide the gains to their ideal values based on the data each has. It becomes apparent that the MRAC adaptive law and the concurrent learning law are at odds with each other. This means the actual adaptation is smaller and smoother than either would produce by itself.

Figure B.8: Concurrent Learning MRAC Adaptive Weight Space. The actual weights are difficult to see among the arrows. The blue arrows show the direction of the summation CL vector while the green arrows show the direction of adaptation if only the MRAC adaptive law were used. The orange lines cross at the ideal values for the gains.
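The arrow decomposition plotted in Figure B.8 and in Figure B.9 below can be computed directly from the pieces of (B.2). A sketch (a hypothetical helper with the same assumed stack layout as before; it expects a non-empty stack):

    import numpy as np

    def adaptation_directions(x, e, P, B, Gamma, stack):
        mrac = -Gamma @ np.outer(x, e @ P @ B)         # green arrow: MRAC law alone
        cl_terms = [-Gamma @ np.outer(x_i, eps_i)      # red arrows: one per stored point
                    for x_i, eps_i in stack]
        cl_sum = sum(cl_terms)                         # blue arrow: summation of the red ones
        return mrac, cl_terms, cl_sum, mrac + cl_sum   # black arrow: combined CL-MRAC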
Figure B.9: Concurrent Learning MRAC Weight Stochastic Gradient. In each panel, the red arrows are the directions of adaptation from each element of the history stack. The blue line is the summation of the red arrows. The green arrow is the MRAC adaptation-law direction, and the black arrow is the combined concurrent learning and MRAC adaptation direction. Plot a.) shows this for time instant 7.64 seconds, while plot b.) shows it for time instant 8.64 seconds. Note that in a.) the combined vector is close to, but not directly pointing at, the ideal values indicated by the X. In b.) the combined vector points directly at the ideal values.

Now looking at two instances of the adaptation split into the MRAC adaptive law and concurrent learning law directions, the plots in Figure B.9 expound on the similarity of concurrent learning to stochastic gradient optimization in neural networks. The red vectors in both panels are the directions attained by utilizing the elements of the history stack to obtain a bearing on the ideal values for the system. The blue arrow is the summation of the red ones, and the green arrow is the MRAC adaptation vector as in Figure B.8. The black arrow is the combined concurrent learning MRAC adaptation vector. In Figure B.9 a.), the MRAC adaptation vector points in the opposite direction from the ideal weights, indicated by the X. One second later, Figure B.9 b.) shows the combined concurrent learning MRAC vector passing through the ideal values. Also note in Figure B.9 b.) that the MRAC adaptation vector is difficult to see, being so small. It would have produced minuscule adaptation under MRAC alone. With concurrent learning, the adaptation is driven even while the state tracking error is small.

Without concurrent learning, MRAC controllers are only high-gain controllers and will not attain their ideal values unless persistently excited. The ideal values of the adaptive weights are important because, once they are attained, the system will act like the model for any input at any point in the state-space. If the ideal values have not been attained when a new region of the state-space is encountered, the adaptive law, (B.1), will direct the change in the gains based on the error, which will most likely be large. In an attempt to reduce the error, the MRAC adaptation will change the gains, but this change is not really guided by anything; cf. Figure B.8, where the green vectors point away from the ideal value point.

APPENDIX C

Acronyms

Acronym      Expanded Version
CL           Concurrent Learning
CL-HSMEM     Concurrent Learning History Stack Minimum Eigenvalue Maximization routine
f-GTM        flexible Generic Transport Model
LQR          Linear Quadratic Regulator
MO-Op        Multi-Objective Optimization
MPC          Model Predictive Control
MRAC         Model Reference Adaptive Control
ODE          Ordinary Differential Equation
PE           Persistence of Excitation

APPENDIX D

Additional Plots

This appendix contains plots in addition to those from Chapter 6.

Figure D.1: System Tracking Errors Shown Using Algorithms 1 (solid), 2 (dashed), and 3 (dash-dot). Note that the error reduces quickly once the stacks are shocked.

Figure D.2: Dissolved Input Allocation Matrix Weight Space. The red stars '*' are the initial weights while the orange diamonds show the ideal values.
Figure D.3: Dissolved Input Allocation Matrix Time History of \hat{\Lambda}.

VITA

Benjamin David Reish

Candidate for the Degree of Master of Science

Thesis: CONCURRENT LEARNING IN THE PRESENCE OF UNCERTAIN INPUT ALLOCATION

Major Field: Mechanical and Aerospace Engineering

Biographical:

Personal Data: Lives in Edmond, OK.

Education: Completed the requirements for the Master of Science degree with a major in Mechanical and Aerospace Engineering at Oklahoma State University in May, 2015. Holds a Bachelor of Science in Mechanical Engineering from Oklahoma Christian University, 2004.

Experience: Cognizant Engineer, F101/F118 Engines Section, Tinker AFB, OK: jet engine performance and engine accessories maintenance, 2005 to 2012. Teacher's Assistant, Mechanical Engineering Department, Oklahoma Christian University, Edmond, OK: instrumentation lab, spring 2011. Teacher's Assistant, Mechanical and Aerospace Engineering Department, Oklahoma State University, Stillwater, OK: dynamics, spring 2014. Research Assistant, DASLab, Oklahoma State University, Stillwater, OK: spring 2013 to present.

Professional Memberships: IEEE and DASLab