Discrete Time Optimal Control

Given a dynamical system:
$$ x_{k+1} = f_k(x_k, u_k), \qquad x_0 = x_{\mathrm{init}} $$
Find the control sequence $u_k = u(t_k)$, $k = 0, \ldots, N-1$, which minimizes
$$ J = \sum_{k=0}^{N-1} l_k(x_k, u_k) + V_T(x_N) $$
where $N$ is the total number of "stages".
Solution: introduce Lagrange multipliers $\lambda_0, \lambda_1, \ldots, \lambda_N$ for all constraints, giving the augmented cost
$$ \begin{aligned} \bar{J} &= \sum_{k=0}^{N-1} \Big[ l_k(x_k, u_k) + \lambda_{k+1}^T \big( f_k(x_k, u_k) - x_{k+1} \big) \Big] + V_T(x_N) + \lambda_0^T (x_0 - x_{\mathrm{init}}) \\ &= \sum_{k=0}^{N-1} \Big[ H_k(x_k, u_k, \lambda_{k+1}) - \lambda_{k+1}^T x_{k+1} \Big] + V_T(x_N) + \lambda_0^T (x_0 - x_{\mathrm{init}}) \end{aligned} $$
where the Hamiltonian is defined as
$$ H_i(x_i, u_i, \lambda_{i+1}) \equiv l_i(x_i, u_i) + \lambda_{i+1}^T f_i(x_i, u_i) $$
Extremize the augmented cost with respect to $u_k$ $(k = 0, \ldots, N-1)$, $x_k$ $(k = 0, \ldots, N)$, and $\lambda_k$ $(k = 0, \ldots, N)$.
Since there is only a finite number of "stages", we can use the classical finite-dimensional conditions for an extremum:
πœ•π½
= 0;
πœ•π‘₯π‘˜
πœ•π»π‘˜
β‡’ Ξ»π‘˜ =
π‘₯ ,𝑒 ,Ξ»
πœ•π‘₯π‘˜ π‘˜ π‘˜ π‘˜+1
πœ•π½
= 0;
πœ•π‘₯𝑁
πœ•π‘‰π‘‡
β‡’ λ𝑁 =
πœ•π‘₯𝑁
πœ•π½
= 0;
πœ•π‘₯0
β‡’ π‘₯0 = π‘₯𝑖𝑛𝑖𝑑
πœ•π½
= 0;
πœ•π‘’π‘–
β‡’
πœ•π»π‘–
π‘₯ ,𝑒 ,Ξ»
=0
πœ•π‘’π‘– 𝑖 𝑖 𝑖+1
πœ•π½
= 0;
πœ•Ξ»π‘š
β‡’
π‘₯π‘š+1 = 𝑓 π‘₯π‘š , π‘’π‘š
π‘˜ = 1, … , 𝑁 βˆ’ 1
π‘₯𝑁
𝑖 = 0, … , 𝑁 βˆ’ 1
π‘š = 0, … , 𝑁 βˆ’ 1
Discrete Time LQR
(a different derivation)
A linear dynamical system:
$$ x_{k+1} = A_k x_k + B_k u_k, \qquad x_0 = x_{\mathrm{init}} $$
Find the control sequence $u_k$ which minimizes
$$ J = \frac{1}{2} x_N^T P_T x_N + \frac{1}{2} \sum_{k=0}^{N-1} \left( x_k^T Q_k x_k + u_k^T R_k u_k \right) $$
Solution: assuming $u_k = -K_k x_k$, then $x_{k+1} = (A_k - B_k K_k)\, x_k \equiv A_{C_k} x_k$, and the cost becomes
$$ J = \frac{1}{2} x_N^T P_T x_N + \frac{1}{2} \sum_{k=0}^{N-1} x_k^T \left( Q_k + K_k^T R_k K_k \right) x_k $$
Define the "cost-to-go" from $t_p$:
$$ J_p = \frac{1}{2} x_N^T P_T x_N + \frac{1}{2} \sum_{k=p}^{N-1} x_k^T \left( Q_k + K_k^T R_k K_k \right) x_k $$
Then note that the "incremental" cost is
$$ J_{p+1} - J_p = -\frac{1}{2} x_p^T \left( Q_p + K_p^T R_p K_p \right) x_p \qquad (\dagger\dagger) $$
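As a sanity check of $(\dagger\dagger)$, the sketch below uses illustrative matrices and an arbitrary, not necessarily optimal, gain sequence (none of it from these notes): it simulates the closed-loop system and compares $J_{p+1} - J_p$, computed from the definition of $J_p$, with the right-hand side of $(\dagger\dagger)$.

```python
import numpy as np

# Numerical check of the incremental-cost identity for a fixed gain sequence.
rng = np.random.default_rng(0)
n, m, N = 3, 2, 15
A = [0.3 * rng.normal(size=(n, n)) for _ in range(N)]
B = [rng.normal(size=(n, m)) for _ in range(N)]
Q = [np.eye(n) for _ in range(N)]
R = [np.eye(m) for _ in range(N)]
P_T = np.eye(n)
K = [0.1 * rng.normal(size=(m, n)) for _ in range(N)]   # fixed gains u_k = -K_k x_k

# Simulate the closed-loop system x_{k+1} = (A_k - B_k K_k) x_k.
x = [rng.normal(size=n)]
for k in range(N):
    x.append((A[k] - B[k] @ K[k]) @ x[k])

def cost_to_go(p):
    """J_p = 1/2 x_N' P_T x_N + 1/2 sum_{k=p}^{N-1} x_k'(Q_k + K_k' R_k K_k) x_k."""
    J = 0.5 * x[N] @ P_T @ x[N]
    for k in range(p, N):
        J += 0.5 * x[k] @ (Q[k] + K[k].T @ R[k] @ K[k]) @ x[k]
    return J

p = 4
lhs = cost_to_go(p + 1) - cost_to_go(p)
rhs = -0.5 * x[p] @ (Q[p] + K[p].T @ R[p] @ K[p]) @ x[p]
print(lhs, rhs)   # the two numbers agree
```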
Discrete Time LQR
(continued)
The controlled dynamical system will propagate as:
$$ x_m = \Phi(m, k)\, x_k = \left( \prod_{i=k}^{m-1} A_{C_i} \right) x_k $$
where $\Phi(m, k)$ is the transition matrix.
The "cost-to-go" can be rewritten using the transition matrix:
$$ J_p = \frac{1}{2} x_p^T \left[ \Phi^T(N, p)\, P_T\, \Phi(N, p) + \sum_{k=p}^{N-1} \Phi^T(k, p) \left( Q_k + K_k^T R_k K_k \right) \Phi(k, p) \right] x_p \equiv \frac{1}{2} x_p^T P_p x_p $$
where $P_p$ is the "cost-to-go" matrix (by abuse of language).
Then:
$$ J_{p+1} - J_p = \frac{1}{2} \left( x_{p+1}^T P_{p+1} x_{p+1} - x_p^T P_p x_p \right) = \frac{1}{2} x_p^T \left[ (A_{C_p})^T P_{p+1} A_{C_p} - P_p \right] x_p \qquad (**) $$
Now equate $(**)$ and $(\dagger\dagger)$, and then rearrange:
$$ P_p = (A_p - B_p K_p)^T P_{p+1} (A_p - B_p K_p) + K_p^T R_p K_p + Q_p $$
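This recursion lets $P_p$ be computed backward, stage by stage, for any fixed gain sequence, without forming transition matrices explicitly; the cost-to-go definition gives the terminal value $P_N = P_T$. A short sketch, again with illustrative matrices and arbitrary gains (not from these notes), runs the backward sweep and verifies $J_p = \frac{1}{2} x_p^T P_p x_p$ along a simulated trajectory.

```python
import numpy as np

# Fixed-gain backward sweep: P_p = A_Cp' P_{p+1} A_Cp + K_p' R_p K_p + Q_p,
# starting from P_N = P_T, plus a check that J_p = 1/2 x_p' P_p x_p.
rng = np.random.default_rng(1)
n, m, N = 3, 2, 12
A = [0.3 * rng.normal(size=(n, n)) for _ in range(N)]
B = [rng.normal(size=(n, m)) for _ in range(N)]
Q = [np.eye(n) for _ in range(N)]
R = [np.eye(m) for _ in range(N)]
K = [0.1 * rng.normal(size=(m, n)) for _ in range(N)]   # arbitrary fixed gains
P_T = np.eye(n)

# Backward sweep for P_p under the fixed gains K_p.
P = [None] * (N + 1)
P[N] = P_T
for p in range(N - 1, -1, -1):
    A_C = A[p] - B[p] @ K[p]
    P[p] = A_C.T @ P[p + 1] @ A_C + K[p].T @ R[p] @ K[p] + Q[p]

# Forward simulation and comparison with the summed cost-to-go.
x = [rng.normal(size=n)]
for k in range(N):
    x.append((A[k] - B[k] @ K[k]) @ x[k])

p = 3
J_p = 0.5 * x[N] @ P_T @ x[N] + sum(
    0.5 * x[k] @ (Q[k] + K[k].T @ R[k] @ K[k]) @ x[k] for k in range(p, N))
print(J_p, 0.5 * x[p] @ P[p] @ x[p])   # the two values agree
```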
Discrete Time LQR
(continued)
To derive the optimal feedback gain, we minimize the cost-to-go with respect to the gain:
$$ \frac{\partial P_p}{\partial K_p} = 0 \;\Rightarrow\; K_p = \left( R_p + B_p^T P_{p+1} B_p \right)^{-1} B_p^T P_{p+1} A_p $$
Substituting the expression for $K_p$ back into the expression for $P_p$ yields:
$$ \begin{aligned} P_p &= Q_p + A_p^T P_{p+1} A_p - A_p^T P_{p+1} B_p \left( R_p + B_p^T P_{p+1} B_p \right)^{-1} B_p^T P_{p+1} A_p \\ &= Q_p + A_p^T \left[ P_{p+1} - P_{p+1} B_p \left( R_p + B_p^T P_{p+1} B_p \right)^{-1} B_p^T P_{p+1} \right] A_p \end{aligned} $$
This is the discrete-time Riccati equation (DRE).
Because $\frac{1}{2} x_p^T P_p x_p$ is the "cost-to-go", and at stage $N$ the cost-to-go is just the terminal cost $\frac{1}{2} x_N^T P_T x_N$, the terminal value of $P_N$ is given by $P_N = P_T$.
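The following is a compact sketch of the resulting backward sweep, assuming illustrative time-varying matrices (not from these notes): starting from $P_N = P_T$, it computes $K_p$ and $P_p$ via the DRE for $p = N-1, \ldots, 0$, then simulates $u_k = -K_k x_k$ and checks that the accumulated cost equals $\frac{1}{2} x_0^T P_0 x_0$.

```python
import numpy as np

# Finite-horizon discrete-time LQR backward sweep (illustrative matrices).
rng = np.random.default_rng(2)
n, m, N = 3, 2, 30
A = [np.eye(n) + 0.1 * rng.normal(size=(n, n)) for _ in range(N)]
B = [rng.normal(size=(n, m)) for _ in range(N)]
Q = [np.eye(n) for _ in range(N)]
R = [np.eye(m) for _ in range(N)]
P_T = np.eye(n)

P = [None] * (N + 1)
K = [None] * N
P[N] = P_T
for p in range(N - 1, -1, -1):
    # K_p = (R_p + B_p' P_{p+1} B_p)^{-1} B_p' P_{p+1} A_p
    K[p] = np.linalg.solve(R[p] + B[p].T @ P[p + 1] @ B[p],
                           B[p].T @ P[p + 1] @ A[p])
    # DRE: P_p = Q_p + A_p' P_{p+1} A_p - A_p' P_{p+1} B_p K_p
    P[p] = Q[p] + A[p].T @ P[p + 1] @ A[p] - A[p].T @ P[p + 1] @ B[p] @ K[p]

# Apply the feedback u_k = -K_k x_k; the optimal cost from x_0 is 1/2 x_0' P_0 x_0.
x0 = rng.normal(size=n)
x = x0
J = 0.0
for k in range(N):
    u = -K[k] @ x
    J += 0.5 * (x @ Q[k] @ x + u @ R[k] @ u)
    x = A[k] @ x + B[k] @ u
J += 0.5 * x @ P_T @ x
print(J, 0.5 * x0 @ P[0] @ x0)   # the two values agree
```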