
AA/EE/ME 548: Homework #1 – Solutions
Definition and goals of LQR. Dynamic programming. Hamilton-Jacobi-Bellman
equation. Riccati equation. Lagrange multipliers and Hamiltonian functions.
Assigned: Monday, January 14, 2013; Due: Thursday, January 24, 2013
Instructor: Tamara Bonaci
Department of Electrical Engineering
University of Washington, Seattle
Problem 1 [Linear dynamical systems]
Consider the system:
ẋ = A(t)x + B(t)u    (1)

where A(t), B(t) are matrices with continuous entries. Show that there does not exist a linear control law:

u = K^T(t)x(t)    (2)

with continuous entries of K(t), such that for arbitrary x0 and some finite time T, x(T) = 0.
Hint: Use the fact that if ẋ = Ā(t)x, where Ā has continuous entries, then a state transition matrix exists.
Solution:
Let's prove by contradiction that the described linear control law does not exist, i.e., for the purpose of the proof, let's assume there exists some control law:

u(t) = K^T(t)x(t)    (3)

such that for any x0 (other than x0 = 0) and some finite time T, the final state value is x(T) = 0. With control input (3), the closed-loop dynamics of the system can be written as:

ẋ(t) = A(t)x(t) + B(t)u(t) = A(t)x(t) + B(t)K^T(t)x(t) = [A(t) + B(t)K^T(t)]x(t) =: A_cl(t)x(t)    (4)
The closed-loop matrix A_cl(t) := A(t) + B(t)K^T(t) has continuous entries. Therefore, the state transition matrix Φ(t, t0) of the closed-loop system exists, and it is nonsingular (invertible) on the whole interval [t0, t].
Let's now recall that the solution of a linear time-varying system is equal to:

x(t) = Φ(t, t0)x0 + ∫_{t0}^{t} Φ(t, τ)B(τ)u(τ) dτ    (5)
Using equation (5), it follows that for any x0, the final value of our system can be found as:

x(T) = Φ(T, t0)x0    (6)

since the closed-loop system (4) has no external input (its input matrix is zero). Equation (6), however, yields a contradiction: the state transition matrix Φ(T, t0) is nonsingular, so for any nonzero initial state x0, the final state x(T) = Φ(T, t0)x0 cannot be zero.
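Note: the following short numerical sketch (our addition, with made-up A(t), B(t), K(t); not part of the assignment) illustrates the argument: for any continuous gain, the closed-loop transition matrix Φ(T, t0), obtained by integrating Φ̇ = A_cl(t)Φ with Φ(t0) = I, has nonzero determinant, so x(T) = Φ(T, t0)x0 cannot vanish for x0 ≠ 0.

```python
# Sketch: integrate the matrix ODE dPhi/dt = A_cl(t) Phi, Phi(t0) = I, and
# check that Phi(T, t0) stays nonsingular. All matrices are made-up examples.
import numpy as np
from scipy.integrate import solve_ivp

def A(t):
    return np.array([[0.0, 1.0], [-2.0 - np.sin(t), -1.0]])

def B(t):
    return np.array([[0.0], [1.0 + 0.5 * np.cos(t)]])

def K(t):  # any feedback gain with continuous entries
    return np.array([[np.cos(t)], [np.exp(-t)]])

def phi_dot(t, phi_flat):
    Acl = A(t) + B(t) @ K(t).T          # closed-loop matrix A_cl(t)
    return (Acl @ phi_flat.reshape(2, 2)).ravel()

t0, T = 0.0, 5.0
sol = solve_ivp(phi_dot, (t0, T), np.eye(2).ravel(), rtol=1e-9)
Phi_T = sol.y[:, -1].reshape(2, 2)
print("det Phi(T, t0) =", np.linalg.det(Phi_T))  # nonzero for any continuous K
```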
Problem 2 [Routing network]
A routing network is depicted in Figure 1. Find the optimal path (a path of minimal cost) from node
x0 to node x6 if only movements from left to right are permitted. Then find the optimal path from
any other node to node x6 as a state-variable feedback.
Figure 1: A routing network used in Problem 2.
Solution:
At each node in this problem, we need to decide which of the admissible (left-to-right) links to take in order to reach node x6 through the cheapest sequence of links. We use a dynamic programming approach to find the optimal sequence of decisions from each node in the grid to node x6. The optimal decision grid is depicted in Figure 2(a), where green arrows denote allowed paths, red crosses forbidden paths, and green numbers in brackets the optimal cost-to-go from the given node to node x6.
The optimal path from node x0 to node x6 is depicted in Figure 2(b) in blue arrows. The minimal cost of this decision sequence is 44.
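Note: the backward recursion used above can be sketched in a few lines of Python. The graph below is a small made-up example (the edge costs of Figure 1 are not reproduced here); the recursion itself is the same.

```python
# Backward dynamic programming on a small hypothetical stage graph.
# cost[(i, j)] is the cost of link i -> j; J[n] is the optimal cost-to-go
# from node n to the terminal node "x6".
cost = {
    ("x0", "a"): 5, ("x0", "b"): 12,
    ("a", "x6"): 7, ("b", "x6"): 5, ("a", "b"): 2,
}
succ = {}
for (i, j) in cost:
    succ.setdefault(i, []).append(j)

order = ["b", "a", "x0"]            # reverse topological order (right to left)
J, policy = {"x6": 0}, {}
for n in order:
    # principle of optimality: J(n) = min over links n -> m of c(n, m) + J(m)
    best = min(succ[n], key=lambda m: cost[(n, m)] + J[m])
    J[n] = cost[(n, best)] + J[best]
    policy[n] = best                # state-variable feedback: best move from n

print(J["x0"], policy)              # optimal cost-to-go and feedback map
```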
Problem 3 [Dynamic programming for bilinear system]
Consider the scalar bilinear system:
x_{k+1} = x_k u_k + u_k²    (7)

with cost:

J_0 = x_N² + Σ_{k=0}^{N−1} x_k u_k    (8)

Let N = 2. The control is constrained to take on values u_k = −1 or 1, and the state to take on values x_k = −1, 0, 1 or 2.
(a) Use dynamic programming to find an optimal state feedback control law.
(b) Let x0 = 2. Find the optimal cost, control sequence and state trajectory.
Figure 2: Problem 2: (a) The optimal decision sequence from any node in the grid to node x6. (b) The optimal decision sequence from node x0 to node x6. The minimum cost of this sequence is 44.
Solution:
(a) As usual, to solve the given dynamic programming problem, we invoke the principle of optimality and work backward from the final stage. The minimum cost of the final stage is:

J_2^*(x(2)) = x(2)²    (9)

The obtained results are depicted in Table 1. Let's now consider state x(1). The minimum cost for this state is:

J_1^*(x(1)) = min_{u(1)} {x(1)u(1) + J_2^*(x(2))}    (10)
For the control values u(1) = −1 and u(1) = 1, we get the results depicted in Table 2. Next, we move backward one more stage and consider state x(0), with minimal cost:

J_0^*(x(0)) = min_{u(0)} {x(0)u(0) + J_1^*(x(1))}    (11)

The results are depicted in Table 3.
(b) If the initial state value is x(0) = 2, then we can read off the optimal solution from Tables 1-3 as:

Optimal state trajectory: x(0) = 2 → x(1) = −1 → x(2) = 0
Optimal control sequence: u(0) = −1 → u(1) = 1
Optimal control cost: J_0^*(x(0)) = −3    (12)
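Note: the enumeration in Tables 1-3 can be reproduced with a short backward-recursion script; a minimal sketch (our addition) follows.

```python
# Backward recursion for x_{k+1} = x_k*u_k + u_k^2 with cost
# J0 = x_N^2 + sum_k x_k*u_k, N = 2, u in {-1, 1}, x in {-1, 0, 1, 2}.
STATES, CONTROLS, N = [-1, 0, 1, 2], [-1, 1], 2

J = {N: {x: x**2 for x in STATES}}            # J_2*(x) = x^2 (Table 1)
policy = {}
for k in range(N - 1, -1, -1):                # stages k = 1, 0 (Tables 2, 3)
    J[k], policy[k] = {}, {}
    for x in STATES:
        best_u, best_c = None, float("inf")
        for u in CONTROLS:
            x_next = x * u + u**2
            if x_next not in STATES:          # e.g. x = 2, u = 1 gives 3: N/A
                continue
            c = x * u + J[k + 1][x_next]
            if c < best_c:
                best_u, best_c = u, c
        J[k][x], policy[k][x] = best_c, best_u

print(J[0][2])                        # expected: -3, optimal cost from x(0) = 2
print(policy[0][2], policy[1][-1])    # expected: -1 then 1
```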
Table 1: Problem 3 - final stage x(2)

State value x(2) | Minimum cost J_2^*(x(2)) = x(2)²
−1               | 1
 0               | 0
 1               | 1
 2               | 4

Table 2: Problem 3 - stage x(1)

x(1) | u(1) | Next state x(2) | Cost J_1     | Minimum cost J_1^* | u^*(x(1), 1)
−1   | −1   | 1 + 1 = 2       | 1 + 4 = 5    |                    |
−1   |  1   | −1 + 1 = 0      | −1 + 0 = −1  | −1                 | u^*(−1, 1) = 1
 0   | −1   | 0 + 1 = 1       | 0 + 1 = 1    |                    |
 0   |  1   | 0 + 1 = 1       | 0 + 1 = 1    |  1                 | u^*(0, 1) = {−1, 1}
 1   | −1   | −1 + 1 = 0      | −1 + 0 = −1  | −1                 | u^*(1, 1) = −1
 1   |  1   | 1 + 1 = 2       | 1 + 4 = 5    |                    |
 2   | −1   | −2 + 1 = −1     | −2 + 1 = −1  | −1                 | u^*(2, 1) = −1
 2   |  1   | 2 + 1 = 3       | N/A          |                    |

Table 3: Problem 3 - stage x(0)

x(0) | u(0) | Next state x(1) | Cost J_0     | Minimum cost J_0^* | u^*(x(0), 0)
−1   | −1   | 1 + 1 = 2       | 1 − 1 = 0    |  0                 | u^*(−1, 0) = {−1, 1}
−1   |  1   | −1 + 1 = 0      | −1 + 1 = 0   |                    |
 0   | −1   | 0 + 1 = 1       | 0 − 1 = −1   | −1                 | u^*(0, 0) = {−1, 1}
 0   |  1   | 0 + 1 = 1       | 0 − 1 = −1   |                    |
 1   | −1   | −1 + 1 = 0      | −1 + 1 = 0   |  0                 | u^*(1, 0) = {−1, 1}
 1   |  1   | 1 + 1 = 2       | 1 − 1 = 0    |                    |
 2   | −1   | −2 + 1 = −1     | −2 − 1 = −3  | −3                 | u^*(2, 0) = −1
 2   |  1   | 2 + 1 = 3       | N/A          |                    |

Problem 4 [Hamilton-Jacobi-Bellman equation]
Consider a system of the form:

ẋ = f(x(t)) + g·u(t)    (13)

with performance measure:

V(x(t), u(·), t) = ∫_t^T (u² + h(x)) dt    (14)

Show that the Hamilton-Jacobi-Bellman equation of the given system and the given measure is linear in ∂V^*/∂t and quadratic in ∂V^*/∂x.
Solution:
For the given system and its performance cost, the Hamilton-Jacobi-Bellman equation can be written as:

∂V^*/∂t = − min_{u(t)} { u² + h(x(t)) + (∂V^*/∂x)^T [f(x(t)) + g u(t)] }    (15)

where F(t, x(t), u(t)) denotes the expression inside the braces.
Since the function F(t, x(t), u(t)) is convex in u(t), we can find the minimizing u(t) from:

∂F(t, x(t), u(t))/∂u(t) = 2u + (∂V^*/∂x)^T g = 0    (16)

and it follows that:

u^* = −(1/2)(∂V^*/∂x)^T g    (17)
Combining now equations (15) and (17), we can rewrite the Hamilton-Jacobi-Bellman equation as:

∂V^*/∂t = (1/4)[(∂V^*/∂x)^T g]² − h(x(t)) − (∂V^*/∂x)^T f(x(t))    (18)

Equation (18) is clearly linear in ∂V^*/∂t and quadratic in ∂V^*/∂x.
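Note: for a scalar state, the reduction from (15) to (18) can be checked symbolically; a minimal sympy sketch (our addition, with p standing for ∂V^*/∂x and symbols f, g, h for f(x), g, h(x)) follows.

```python
# Symbolic check that minimizing over u and negating yields equation (18).
import sympy as sp

u, p, f, g, h = sp.symbols("u p f g h", real=True)
F = u**2 + h + p * (f + g * u)            # expression inside the min in (15)
u_star = sp.solve(sp.diff(F, u), u)[0]    # stationarity: 2u + p*g = 0
rhs = -sp.simplify(F.subs(u, u_star))     # dV*/dt = -min_u F
print(u_star)                             # expected: -g*p/2, matching (17)
print(sp.expand(rhs))                     # expected: g**2*p**2/4 - f*p - h
```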
Problem 5 [Unique minimum of a performance function]
Let u ∈ R^r be a vector of size r, and p, x ∈ R^n vectors of size n. Let A, B, C be constant matrices of appropriate dimensions such that the following function of u, x and p can be formed:

Q(u, x, p) = u^T Au + 2x^T Bu + 2u^T Cp    (19)

Show that Q has a unique minimum in u for all x and p if and only if ½(A + A^T) is positive definite.
Solution:
In order to show that the function Q has a unique minimum in u for all x and p if and only if ½(A + A^T) is positive definite, let's start by assuming ½(A + A^T) is positive definite. Then the partial derivative of Q with respect to u is:

∂Q/∂u = (A^T + A)u + 2B^T x + 2Cp    (20)

This derivative is equal to zero if and only if:

u = −(A^T + A)^{−1}[2B^T x + 2Cp]    (21)

Moreover, the obtained point (21) is the unique minimum of the given problem, since the second derivative of Q with respect to u:

∂²Q/∂u² = A^T + A    (22)

is positive definite.
Let's now consider the case when ½(A + A^T) is indefinite, i.e., there exist λ < 0 and some ū ≠ 0 such that ½(A + A^T)ū = λū. Then Q(u, x, p) can be made as negative as desired via u = kū, for k sufficiently large. So, if there exists a unique minimum of Q, then it must be the case that ½(A + A^T) ≥ 0.
Suppose now that A^T + A is singular, and let (A + A^T)ū = 0 with ū ≠ 0. If x^T Bū + ū^T Cp ≠ 0, then Q can be made as negative as desired by setting u = kū, with k of a suitable sign and appropriately large. If, on the other hand, ū ≠ 0 and x^T Bū + ū^T Cp = 0, and Q is minimized by u^* (equation (21)), then u^* + ū achieves the same minimum value of Q, so the minimizing u^* is not unique. Hence ½(A + A^T) must be nonsingular with all eigenvalues strictly positive, i.e., positive definite.
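Note: a quick numerical illustration (our addition, with arbitrary random test matrices) of the sufficiency direction: when ½(A + A^T) is positive definite, the stationary point (21) beats random perturbations.

```python
# Check that u* from equation (21) minimizes Q when (A + A^T)/2 is positive
# definite. All matrices below are arbitrary test data, not from the problem.
import numpy as np

rng = np.random.default_rng(0)
n, r = 3, 2
M = rng.standard_normal((r, r))
A = M @ M.T + np.eye(r)                      # makes (A + A^T)/2 positive definite
B, C = rng.standard_normal((n, r)), rng.standard_normal((r, n))
x, p = rng.standard_normal(n), rng.standard_normal(n)

def Q(u):
    return u @ A @ u + 2 * x @ B @ u + 2 * u @ C @ p

u_star = -np.linalg.solve(A.T + A, 2 * B.T @ x + 2 * C @ p)   # equation (21)
perturbed = [Q(u_star + rng.standard_normal(r)) for _ in range(1000)]
print(Q(u_star) <= min(perturbed))           # expected: True
```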
Problem 6 [Scalar Riccati equation]
Consider the problem of minimizing the performance index:
V =
Z
T
(x2 + u2 )dt + ρx2 (T )
(23)
t
given the scalar system:
ẋ = u
(24)
(a) Use the separation of variables principle to solve the associated scalar Riccati equation for arbitrary ρ.
(b) Sketch P(t) for all t ≤ T for each of the following values of ρ: {2, 1, 0, −1, −2}. Note that letting t approach −∞ is equivalent to letting T approach +∞. Is P(T) monotone increasing with T in all cases?
Solution:
(a) The optimization problem can be written as:

minimize_{u(t)}  ρx²(T) + ∫_t^T (x² + u²) dt
subject to  ẋ(t) = u(t)    (25)
The given system is scalar, linear and time-invariant, and the optimization problem is posed over a finite time interval. Therefore, we need to solve the following differential Riccati equation in order to find the minimum control cost (and the optimal control input):

−Ṗ = A^T P + P A + Q − P BR^{−1}B^T P    (26)

where A = 0, B = 1, Q = 1 and R = 1. Equation (26) thus becomes:

−Ṗ = 1 − P²    (27)

with the boundary condition:

P(T) = ρ    (28)
Using separation of variables, equation (27) can be solved as follows:

−dP/dt = 1 − P²
dP/(1 − P²) = −dt

Integrating both sides from t to T, using ∫ dP/(1 − P²) = atanh(P) = ½ ln[(1 + P)/(1 − P)]:

atanh(ρ) − atanh(P(t)) = −(T − t)    (29)

so that, exponentiating and using the boundary condition P(T) = ρ:

(1 + P(t))/(1 − P(t)) = [(1 + ρ)/(1 − ρ)] exp{2(T − t)}    (30)

Solving (30) for P(t), we find:

P(t) = [1 + ρ − (1 − ρ) exp{−2(T − t)}] / [1 + ρ + (1 − ρ) exp{−2(T − t)}]    (31)

Note: In order to solve differential equation (27), we used the following facts:

∫ du/(a² − u²) = (1/a) atanh(u/a) + C,  u² < a²    (32)

where

atanh(x) = ½ ln[(1 + x)/(1 − x)],  |x| < 1    (33)
(b) Let's start by showing that letting t → −∞ is indeed equivalent to letting T → ∞ (for ρ ≠ −1):

lim_{t→−∞} P(t) = lim_{t→−∞} [1 + ρ − (1 − ρ) exp{−2(T − t)}] / [1 + ρ + (1 − ρ) exp{−2(T − t)}] = (1 + ρ)/(1 + ρ) = 1

lim_{T→∞} P(t) = lim_{T→∞} [1 + ρ − (1 − ρ) exp{−2(T − t)}] / [1 + ρ + (1 − ρ) exp{−2(T − t)}] = (1 + ρ)/(1 + ρ) = 1    (34)

Figure 3 depicts P(t) for t ≤ T for the given values of ρ. We observe that P(t) is not monotonically increasing in t for every value of ρ. In fact, the following cases are possible:
• For ρ = {−1, 1}, the function P(t) is constant (ρ = 1 ⇒ P(t) = 1; ρ = −1 ⇒ P(t) = −1),
• For ρ = 2, P(t) is monotonically increasing in t,
• For ρ = 0, P(t) is monotonically decreasing in t, and
• For ρ = −2, P(t) is neither increasing nor decreasing: it has a finite escape time (a vertical asymptote at T − t = ½ ln 3, where the denominator of (31) vanishes).
The obtained result indicates that the steady-state solution of the given LQR problem does not
exist for every value of ρ.
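Note: the closed-form solution (31) is easy to validate against backward numerical integration of (27)-(28); a small sketch (our addition, for the well-behaved case ρ = 2 with T = 1) follows.

```python
# Compare the closed-form Riccati solution (31) with backward integration of
# -dP/dt = 1 - P^2, P(T) = rho.
import numpy as np
from scipy.integrate import solve_ivp

T, rho = 1.0, 2.0

def P_closed(t):
    e = np.exp(-2.0 * (T - t))
    return (1 + rho - (1 - rho) * e) / (1 + rho + (1 - rho) * e)

# integrate backward in time, from t = T down to t = 0
sol = solve_ivp(lambda t, P: -(1.0 - P**2), (T, 0.0), [rho],
                dense_output=True, rtol=1e-9)
t_grid = np.linspace(0.0, T, 5)
print(np.max(np.abs(sol.sol(t_grid)[0] - P_closed(t_grid))))  # ~0
```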
Problem 7 [LQR with cross-coupling terms]
Derive the optimal LQR solution when there is a cross-coupling term in x and u in the performance
measure, i.e.:
L(x, u, t) = x^T Qx + 2u^T N x + u^T Ru    (35)

In particular, show that:

u^* = −R^{−1}(N + B^T P)x    (36)

where P satisfies the following differential equation:

−Ṗ = (A − BR^{−1}N)^T P + P(A − BR^{−1}N) + (Q − N^T R^{−1}N) − P BR^{−1}B^T P    (37)

Note that in order for P to have all the usual properties (positive definite, stabilizing, etc.), we now, in addition to R > 0, require that Q − N^T R^{−1}N > 0.
Figure 3: Problem 6(b) - P(t) as a function of t for different values of the final cost ρ (panels (a) and (b); legend: ρ = 2, 1, 0, −1, −2).
Solution:
To derive the optimal control input when there is a cross-coupling term present in the performance cost, let's start by defining the optimization problem we want to solve:

minimize_{u(t)}  x^T(T)M(T)x(T) + ∫_{t0}^T (x^T Qx + 2u^T N x + u^T Ru) dt
subject to  ẋ = Ax + Bu    (38)
Let's find the Hamilton-Jacobi-Bellman equation of the given problem:

−∂J^*(t)/∂t = min_{u(t)} { x^T Qx + 2u^T N x + u^T Ru + (∂J^*(t)/∂x)^T (Ax + Bu) }    (39)

where the boundary condition is given as:

J^*(T) = x^T(T)M(T)x(T)    (40)
Let's now assume that the minimum control cost can be represented as:

J^*(t) = x^T(t)P(t)x(t)    (41)

Using equation (41), equation (39) can be rewritten as:

−x^T Ṗ x = min_{u(t)} { x^T Qx + u^T Ru + 2u^T N x + 2x^T P Ax + 2x^T P Bu }    (42)

where F(t, x, u) denotes the expression inside the braces.
Since the function F(t, x, u) is convex in u(t), its unique minimum is achieved when:

∂F(t, x, u)/∂u(t) = 2Ru + 2N x + 2B^T P x = 0    (43)

From equation (43), it follows that the optimal control input is given as:

u^*(t) = −R^{−1}(N + B^T P(t))x(t)    (44)
Next, we need to show that the assumed minimal cost indeed has the form x^T P(t)x(t). Let's consider the Hamilton-Jacobi-Bellman equation (42) again. Plugging the optimal input (44) into it, we can write:

−x^T Ṗ x = x^T Qx + x^T(A^T P + P A)x + x^T(N^T + P B)R^{−1}(N + B^T P)x
           − 2x^T(N^T + P B)R^{−1}N x − 2x^T P BR^{−1}(N + B^T P)x
         = x^T [Q − N^T R^{−1}N + A^T P + P A − N^T R^{−1}B^T P − P BR^{−1}N − P BR^{−1}B^T P] x    (45)

where we used the fact that, as scalars, x^T N^T R^{−1}B^T P x = x^T P BR^{−1}N x. Finally, from equation (45), we can read off P(t) as the solution of the following differential equation:

−Ṗ(t) = (A − BR^{−1}N)^T P + P(A − BR^{−1}N) + (Q − N^T R^{−1}N) − P BR^{−1}B^T P    (46)

which is exactly equation (37).
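Note: the algebra leading from (42) to (46) can be spot-checked numerically; below is a small sketch (our addition, with arbitrary random matrices and P, Q symmetric).

```python
# Verify that substituting u* = -R^{-1}(N + B^T P)x into (42) collapses to the
# quadratic form of the Riccati right-hand side in (46).
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
N = rng.standard_normal((m, n))
S = rng.standard_normal((n, n)); Q = S + S.T        # symmetric Q
Sp = rng.standard_normal((n, n)); P = Sp + Sp.T     # symmetric P
R = 2.0 * np.eye(m)
Ri = np.linalg.inv(R)
x = rng.standard_normal(n)

u = -Ri @ (N + B.T @ P) @ x                          # equation (44)
lhs = (x @ Q @ x + u @ R @ u + 2 * u @ N @ x
       + 2 * x @ P @ A @ x + 2 * x @ P @ B @ u)      # right-hand side of (42)
Acl = A - B @ Ri @ N
M = Acl.T @ P + P @ Acl + (Q - N.T @ Ri @ N) - P @ B @ Ri @ B.T @ P
print(np.isclose(lhs, x @ M @ x))                    # expected: True
```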
Problem 8 [Critical points of quadratic surfaces]
Find the critical points u^*, classify them, and then find the value of L(u^*) in Example 1.1-1 (Lewis, Vrabie, Syrmos, ch. 1, p. 2) if:

(a) Q = [−1 1; 1 −2], S^T = [0 1]

(b) Q = [−1 1; 1 2], S^T = [0 1]
Solution:
Given the quadratic form:

L(u) = ½ u^T Qu + S^T u    (47)

its critical points can be found from:

∂L(u)/∂u = Qu + S = 0    (48)

From equation (48), it follows that the critical point is:

u^* = −Q^{−1}S    (49)
For the given subproblems, we thus obtain:

u^*_a = −Q^{−1}S = [1; 1]    (50)

u^*_b = −Q^{−1}S = [−0.3333; −0.3333]    (51)
A critical point (50) or (51) is a minimum if the second derivative of L(u) with respect to u is positive definite, and a maximum if it is negative definite. Since the second derivative of L(u) is:

∂²L(u)/∂u² = Q    (52)
for the given subproblems, it follows that:

(a) Matrix Q has both eigenvalues negative (λ1 = −2.6180, λ2 = −0.3820), thus it is negative definite. The critical point (50) is therefore a maximum, with value L(u^*_a) = ½ S^T u^*_a = 0.5.

(b) Matrix Q has one negative eigenvalue (λ1 = −1.3028) and one positive (λ2 = 2.3028). It is thus indefinite, and the critical point (51) is neither a minimum nor a maximum (a saddle point), with value L(u^*_b) = ½ S^T u^*_b = −0.1667.
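Note: a short numerical check (our addition) of the critical points, their classification, and the values L(u^*):

```python
# Critical points u* = -Q^{-1}S, eigenvalue classification, and L(u*).
import numpy as np

S = np.array([0.0, 1.0])
for label, Q in [("a", np.array([[-1.0, 1.0], [1.0, -2.0]])),
                 ("b", np.array([[-1.0, 1.0], [1.0, 2.0]]))]:
    u_star = -np.linalg.solve(Q, S)          # equation (49)
    L_star = 0.5 * u_star @ Q @ u_star + S @ u_star
    print(label, u_star, np.linalg.eigvalsh(Q), L_star)
# expected: (a) u* = [1, 1], both eigenvalues negative (maximum), L* = 0.5
#           (b) u* = [-1/3, -1/3], mixed eigenvalues (saddle), L* = -0.1667
```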
Problem 9 [Shortest distance between two points]
Let P1 = (x1, y1) and P2 = (x2, y2) be two given points. Find the third point P3 = (x3, y3) such that d1 + d2 is minimized, where d1 is the squared distance from P3 to P1 and d2 is the squared distance from P3 to P2.
Solution:
Let's define the squared distances between point P3 and points P1, P2 as:

d1 = (x3 − x1)² + (y3 − y1)²
d2 = (x2 − x3)² + (y2 − y3)²    (53)
The problem at hand can then be described as the following minimization problem:

minimize_{x3, y3}  f(x3, y3) := d1 + d2 = (x3 − x1)² + (y3 − y1)² + (x2 − x3)² + (y2 − y3)²    (54)
From equation (54), we immediately observe that the function f(x3, y3) is convex in both x3 and y3. Thus, its minimum can be found by setting the partial derivatives to zero:

∂f(x3, y3)/∂x3 = 2(x3 − x1) − 2(x2 − x3) = 0  ⇒  x3 = (x1 + x2)/2
∂f(x3, y3)/∂y3 = 2(y3 − y1) − 2(y2 − y3) = 0  ⇒  y3 = (y1 + y2)/2    (55)
From the obtained results, we see that point P3 should be the midpoint of the segment between points P1 and P2.
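Note: a one-off symbolic confirmation (our addition) that the midpoint is the unique critical point:

```python
# Solve the stationarity conditions of f(x3, y3) symbolically.
import sympy as sp

x1, y1, x2, y2, x3, y3 = sp.symbols("x1 y1 x2 y2 x3 y3", real=True)
f = (x3 - x1)**2 + (y3 - y1)**2 + (x2 - x3)**2 + (y2 - y3)**2
crit = sp.solve([sp.diff(f, x3), sp.diff(f, y3)], [x3, y3])
print(crit)   # expected: {x3: (x1 + x2)/2, y3: (y1 + y2)/2}
```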