
Markov jump linear systems
Optimal Control
Pantelis Sopasakis
IMT Institute for Advanced Studies Lucca
February 5, 2016
Abbreviations
1. MJLS: Markov Jump Linear Systems
2. FHOC: Finite Horizon Optimal Control
3. IHOC: Infinite Horizon Optimal Control
4. CARE: Coupled Algebraic Riccati Equations
Outline
1. LQR (deterministic case) – A quick revision
2. FHOC for MJLS
3. IHOC for MJLS (CARE)
I. Dynamic programming
Finite horizon optimal control
We have a (deterministic) LTI system

x(k + 1) = Ax(k) + Bu(k),

with x(0) = x_0. For a given sequence of input values of length N, that is, π_N = (u(0), u(1), . . . , u(N − 1)), we define the cost function

J_N(π_N; x_0) = ∑_{k=0}^{N−1} ℓ(x(k), u(k)) + ℓ_N(x(N)).

Assume

ℓ(x, u) = x′Qx + u′Ru, and ℓ_N(x) = x′P_N x,

for some Q ∈ S^n_+, P_N ∈ S^n_{++}, R ∈ S^m_{++}.
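As a concrete illustration, the following numpy sketch simulates the system and evaluates J_N(π_N; x_0) for a given input sequence (the matrices, horizon and function name are our own example data, not from the slides):

    import numpy as np

    def finite_horizon_cost(A, B, Q, R, P_N, x0, inputs):
        """Evaluate J_N(pi_N; x0): stage costs x'Qx + u'Ru plus terminal x'P_N x."""
        x, cost = x0, 0.0
        for u in inputs:                    # k = 0, ..., N-1
            cost += x @ Q @ x + u @ R @ u   # stage cost l(x(k), u(k))
            x = A @ x + B @ u               # system dynamics
        return cost + x @ P_N @ x           # terminal cost l_N(x(N))

    # Hypothetical example: 2 states, 1 input, horizon N = 3
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    Q, R, P_N = np.eye(2), np.eye(1), np.eye(2)
    x0 = np.array([1.0, 0.0])
    pi_N = [np.zeros(1)] * 3                # the zero input sequence
    print(finite_horizon_cost(A, B, Q, R, P_N, x0, pi_N))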
Finite horizon optimal control
We need to determine a finite sequence π_N to minimise J_N(π_N; x_0):

J_N⋆(x_0) = min_{π_N} J_N(π_N; x_0)

subject to the system dynamics and x(0) = x_0. DP recursion¹:

V_N(x(N)) = x(N)′P_N x(N),
V_k(x(k)) = min_{u(k)} { ℓ(x(k), u(k)) + V_{k+1}(x(k + 1)) },

for k = N − 1, . . . , 0.

¹ See for instance: F. Borrelli, Constrained Optimal Control of Linear and Hybrid Systems, Springer, 2003.
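To see why this recursion yields quadratic value functions (and the formulas two slides ahead), one DP step can be worked out explicitly; this intermediate computation is standard and added here for completeness. Assuming V_{k+1}(x) = x′P_{k+1}x,

V_k(x) = min_u { x′Qx + u′Ru + (Ax + Bu)′P_{k+1}(Ax + Bu) }.

Setting the gradient with respect to u to zero,

2Ru⋆ + 2B′P_{k+1}(Ax + Bu⋆) = 0  ⟹  u⋆ = −(B′P_{k+1}B + R)^{−1}B′P_{k+1}A x,

and substituting u⋆ back shows that V_k(x) = x′P_k x is again quadratic.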
Why DP?
DP facts:
- We may decompose a complex optimisation problem into simpler subproblems
- Here, we solve for one u(k) at a time
- DP uses Bellman’s principle of optimality
- It can be applied in the same way to stochastic optimal control problems
- It is a powerful tool to study the MSS of MJLS and Markovian switching systems (next class)
Finite horizon optimal control
Let π⋆(x_0) be the respective minimiser,

π⋆(x_0) = (u⋆(0), u⋆(1), . . . , u⋆(N − 1)).

Using DP we derive

V_k(x) = x′P_k x,
u⋆(k) = F(P_{k+1})x(k),

where P_k is determined by the backward recursion

P_k = A′P_{k+1}A + Q + A′P_{k+1}B F(P_{k+1}),

and

F(P) = −(B′PB + R)^{−1}B′PA.
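A minimal numpy sketch of this backward Riccati recursion (function and variable names are ours):

    import numpy as np

    def lqr_backward(A, B, Q, R, P_N, N):
        """Backward recursion: returns P_0, ..., P_N and the gains F(P_{k+1})."""
        P = [None] * (N + 1)
        F = [None] * N
        P[N] = P_N
        for k in range(N - 1, -1, -1):
            P1 = P[k + 1]
            # F(P_{k+1}) = -(B'P B + R)^{-1} B'P A
            F[k] = -np.linalg.solve(B.T @ P1 @ B + R, B.T @ P1 @ A)
            P[k] = A.T @ P1 @ A + Q + A.T @ P1 @ B @ F[k]
        return P, F

The optimal inputs are then u⋆(k) = F[k] @ x(k) along the trajectory.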
Infinite horizon optimal control
What happens as N → ∞? Let us define

J_∞(π; x_0) = ∑_{k=0}^{∞} ℓ(x(k), u(k)),

where π is a sequence of inputs {u(k)}_{k∈ℕ}. For the series to converge it is of course required that

‖x(k)‖², ‖u(k)‖² → 0, as k → ∞.
Infinite horizon optimal control
We can show that – under certain conditions² – the IHOC problem is solvable and

J_∞⋆(x) = x′P_∞x,
u⋆(k) = F(P_∞)x(k),

where P_∞ is a fixed point of the DP recursion of the FHOC problem (algebraic Riccati equation), that is,

P_∞ = A′P_∞A + Q − A′P_∞B(B′P_∞B + R)^{−1}B′P_∞A.

² Provided that (A, B) is stabilisable and (Q^{1/2}, A) is detectable. Then the matrix A + BF(P_∞) is stable. Proof: see D.P. Bertsekas, Dynamic Programming and Optimal Control, Vol. 1, 2005, Prop. 4.4.1.
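Numerically, P_∞ can be approximated by iterating the Riccati recursion until it stops changing; the sketch below is a simple fixed-point iteration with our own initialisation and stopping rule, which converges under the conditions of the footnote:

    import numpy as np

    def solve_dare(A, B, Q, R, tol=1e-10, max_iter=10_000):
        """Fixed-point iteration P <- A'PA + Q - A'PB (B'PB + R)^{-1} B'PA."""
        P = Q.copy()
        for _ in range(max_iter):
            K = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
            P_next = A.T @ P @ A + Q - A.T @ P @ B @ K
            if np.max(np.abs(P_next - P)) < tol:
                return P_next               # P_inf; note F(P_inf) = -K
            P = P_next
        raise RuntimeError("Riccati iteration did not converge")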
End of first section
- Revision of FHOC and DP
- We solved the LQR problem
II. FHOC for MJLS
FHOC for MJLS
Consider a MJLS

x(k + 1) = A_{θ(k)}x(k) + B_{θ(k)}u(k) + M_{θ(k)}v(k),

where v(k) is a noise term, with x(0) = x_0, and let z(k) = C_{θ(k)}x(k) + D_{θ(k)}u(k) be the quantity that will be penalised. We define the following cost functional

J(θ_0, x_0, π_N) := ∑_{k=0}^{N−1} E‖z(k)‖² + E[x(N)′V_{θ(N)}x(N)],

where π_N is a policy π_N = (u(0), . . . , u(N − 1)) with u(k) = µ_k(x(k), θ(k)).
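To make the setup concrete, here is a small simulation sketch of such an MJLS (the two-mode data, transition matrix and Gaussian noise are made-up examples):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical two-mode MJLS data
    A = [np.array([[0.9, 0.2], [0.0, 0.8]]), np.array([[1.1, 0.0], [0.1, 0.7]])]
    B = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.0]])]
    M = [0.1 * np.eye(2), 0.2 * np.eye(2)]
    P_trans = np.array([[0.9, 0.1],         # p_ij = Prob(theta(k+1)=j | theta(k)=i)
                        [0.3, 0.7]])

    x, theta = np.array([1.0, 0.0]), 0
    for k in range(50):
        u = np.zeros(1)                     # zero input, for illustration only
        v = rng.standard_normal(2)          # additive noise v(k)
        x = A[theta] @ x + B[theta] @ u + M[theta] @ v
        theta = rng.choice(2, p=P_trans[theta])   # draw the next Markov mode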
FHOC assumptions
Let G_k be the σ-algebra generated by {x(t), θ(t); t = 0, . . . , k}.
Assumptions on v:
1. v(k) are random variables with E[v(k)v(k)′1_{θ(k)=i}] = Ξ_i(k)
2. For every f, g, f(v(k)) and g(θ(k)) are independent w.r.t. G_k
3. E[v(0)x(0)′1_{θ(0)=i}] = 0
Assumptions on z(k):
1. C_i(k)′D_i(k) = 0 – no penalties of the form x(k)′S_{θ(k)}u(k)
2. D_i(k)′D_i(k) > 0
Control laws and policies for MJLS
A measurable function

µ : ℝ^n × ℕ → ℝ^m

is called a control law. A (finite or infinite) sequence of control laws

π = {µ_0, µ_1, . . .},

where µ_k is G_k-measurable, is called a control policy.
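In code, a mode-dependent linear control law – the form the FHOC solution takes below – can be represented as a plain function of the state and the mode; a toy sketch with placeholder gains:

    import numpy as np

    def make_linear_law(F):
        """Control law mu(x, i) = F_i x, one gain matrix per Markov mode."""
        return lambda x, i: F[i] @ x

    # Hypothetical gains for two modes, n = 2 states, m = 1 input
    F = [np.array([[-0.5, -0.1]]), np.array([[-0.2, -0.4]])]
    mu = make_linear_law(F)
    u = mu(np.array([1.0, 0.0]), 0)         # input when x = (1, 0), theta = 0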
FHOC – Dynamic programming recursion
To perform DP we introduce the cost functional

J_κ(θ(κ), x(κ), u_κ) := ∑_{k=κ}^{N−1} E[‖z(k)‖² | G_κ] + E[x(N)′V_{θ(N)}x(N) | G_κ],

for κ ∈ {0, . . . , N − 1}, where u_κ = (u(κ), . . . , u(N − 1)) so that each u(k) is G_k-measurable. The optimal value of J_κ(θ(κ), x(κ), u_κ) is then given by

J_κ⋆(i, x) = x′X_i(κ)x + α(κ),

where X_i is given by a Riccati-like equation.
FHOC – Dynamic programming recursion
We have

J_κ⋆(i, x) = x′X_i(κ)x + α(κ),

where

X_i(N) = V_i,
X_i(k) = A_i′E_i(X(k + 1))A_i + A_i′E_i(X(k + 1))B_i F_i(X(k + 1)) + C_i′C_i,

where E_i(X) = ∑_j p_{ij}X_j (summing over the Markov modes),

R_i(X) := D_i′D_i + B_i′E_i(X)B_i, and
F_i(X) := −R_i(X)^{−1}B_i′E_i(X)A_i.

The respective optimisers are given by

u⋆(k) = F_{θ(k)}(X(k + 1))x(k).
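A numpy sketch of this coupled backward recursion (function and variable names are ours; X is stored as one matrix per mode):

    import numpy as np

    def E_op(P_trans, X, i):
        """Coupled expectation E_i(X) = sum_j p_ij X_j."""
        return sum(P_trans[i, j] * X[j] for j in range(len(X)))

    def mjls_backward(A, B, C, D, P_trans, V, N):
        """Backward coupled Riccati recursion for the MJLS FHOC problem."""
        modes = range(len(A))
        X = [V[i].copy() for i in modes]    # X_i(N) = V_i
        gains = []
        for k in range(N - 1, -1, -1):
            F, X_new = [], []
            for i in modes:
                EX = E_op(P_trans, X, i)
                Ri = D[i].T @ D[i] + B[i].T @ EX @ B[i]
                Fi = -np.linalg.solve(Ri, B[i].T @ EX @ A[i])
                X_new.append(A[i].T @ EX @ A[i]
                             + A[i].T @ EX @ B[i] @ Fi + C[i].T @ C[i])
                F.append(Fi)
            gains.insert(0, F)              # gains[k][i] = F_i(X(k+1))
            X = X_new
        return X, gains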
End of second section
- Formulation of FHOC for MJLS, considering also an additive noise term
- Control policies and control laws
- Solution of FHOC: piecewise linear control laws u⋆(k) = µ_k(x(k), θ(k)) = F_{θ(k)}(X(k + 1))x(k)
III. IHOC for MJLS and MSS
IHOC for MJLS
Consider a MJLS without additive noise

x(k + 1) = A_{θ(k)}x(k) + B_{θ(k)}u(k),

with x(0) = x_0, and let z(k) = C_{θ(k)}x(k) + D_{θ(k)}u(k) be the quantity that will be penalised. We are now looking for sequences π = {u(k)}_{k∈ℕ} in the set

U = { π : u(k) is G_k-measurable for all k ∈ ℕ, and lim_{k→∞} E‖x(k)‖² = 0 }.
IHOC for MJLS
With π ∈ U, the following is a well-defined infinite horizon cost function

J(θ_0, x_0, π) := ∑_{k=0}^{∞} E‖z(k)‖²,

and the IHOC problem amounts to determining

J⋆(θ_0, x_0) := inf_{π∈U} J(θ_0, x_0, π),

and we define π⋆ to be the respective optimiser, with elements u⋆(k) = ψ_k(θ(k), x(k)).
Objectives
1. Under what conditions does the IHOC problem have a solution?
2. How can this solution be determined?
3. Can we derive an MS-stabilising controller by solving the IHOC problem?
Control CARE
Assume that there is X ∈ H^{n+} satisfying the control CARE:

X_i = A_i′E_i(X)A_i − A_i′E_i(X)B_i(D_i′D_i + B_i′E_i(X)B_i)^{−1}B_i′E_i(X)A_i + C_i′C_i,

and let

F_i(X) := −(D_i′D_i + B_i′E_i(X)B_i)^{−1}B_i′E_i(X)A_i.

The IHOC problem solution is given by

u⋆(k) = F_{θ(k)}(X)x(k)

and the value function is

J⋆(θ_0, x_0) = E[x_0′X_{θ_0}x_0].
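Mirroring the deterministic case, one can look for the fixed point X by iterating the coupled Riccati step until convergence; a heuristic numpy sketch with our own initialisation and stopping rule (the iterates converge under the solvability conditions two slides ahead):

    import numpy as np

    def solve_care(A, B, C, D, P_trans, tol=1e-10, max_iter=10_000):
        """Fixed-point iteration for the coupled algebraic Riccati equations."""
        modes, n = range(len(A)), A[0].shape[0]
        X = [np.eye(n) for _ in modes]
        for _ in range(max_iter):
            X_new = []
            for i in modes:
                EX = sum(P_trans[i, j] * X[j] for j in modes)    # E_i(X)
                G = np.linalg.solve(D[i].T @ D[i] + B[i].T @ EX @ B[i],
                                    B[i].T @ EX @ A[i])          # F_i(X) = -G
                X_new.append(A[i].T @ EX @ A[i]
                             - A[i].T @ EX @ B[i] @ G + C[i].T @ C[i])
            if max(np.max(np.abs(Xn - Xo)) for Xn, Xo in zip(X_new, X)) < tol:
                return X_new
            X = X_new
        raise RuntimeError("CARE iteration did not converge")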
Control CARE ⇒ MSS
The control CARE, when solvable, yields an MS-stabilising control law, i.e., the closed-loop system

x(k + 1) = (A_{θ(k)} + B_{θ(k)}F_{θ(k)}(X))x(k)

is mean square stable.
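As an empirical sanity check (not a proof), one can estimate E‖x(k)‖² over many simulated runs of the closed loop and verify that it decays; the helper below assumes mode data A, B, P_trans and gains F in the format of the earlier sketches:

    import numpy as np

    def estimate_ms_decay(A, B, F, P_trans, x0, n_steps=100, n_runs=2000, seed=0):
        """Monte Carlo estimate of E||x(k)||^2 for x+ = (A_i + B_i F_i) x."""
        rng = np.random.default_rng(seed)
        n_modes = len(A)
        avg = np.zeros(n_steps)
        for _ in range(n_runs):
            x, theta = x0.copy(), rng.integers(n_modes)
            for k in range(n_steps):
                avg[k] += (x @ x) / n_runs
                x = (A[theta] + B[theta] @ F[theta]) @ x
                theta = rng.choice(n_modes, p=P_trans[theta])
        return avg                          # decays to 0 if the loop is MS-stable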
Solvability conditions
The following conditions entail the solvability of the control CARE:
1. (A, B) – with A ∈ H^n and B ∈ H^{n,m} – is stabilisable,
2. (C, A) – with C ∈ H^{n,n_z} – is detectable.
Proof: see Costa et al., 2005, Corollary A.16.
End of third section
- We formulated the infinite horizon optimal control problem
- The solution of IHOC produces an MS-stabilising control law
- IHOC is solved by a CARE, which can be formulated as an LMI
- Solvability conditions: (A, B) is stabilisable, (C, A) is detectable
References
1. For an introduction to DP: D.P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, 2nd ed., 2000.
2. Chapter 4 of: O.L.V. Costa, M.D. Fragoso and R.P. Marques, Discrete-Time Markov Jump Linear Systems, Springer, 2005.
3. Chapter 6 of: M.H.A. Davis and R.B. Vinter, Stochastic Modelling and Control, Chapman and Hall, New York, 1985.
4. M.D. Fragoso, Discrete-time jump LQG problem, Int. J. Systems Sci., 20(12), pp. 2539–2545, 1989.