Discrete Time Optimal Control

Given a dynamical system:
$$ x_{k+1} = f_k(x_k, u_k), \qquad x_0 = x_{\mathrm{init}} $$
Find the control sequence $u_k = u(t_k)$, $k = 0, \ldots, N-1$, which minimizes
$$ J = \sum_{k=0}^{N-1} l_k(x_k, u_k) + V_T(x_N) $$
where $N$ is the total number of "stages".
Solution: introduce Lagrange multipliers $\lambda_0, \lambda_1, \ldots, \lambda_N$ for all constraints, giving the augmented cost
$$ \begin{aligned} \bar{J} &= \sum_{k=0}^{N-1} \Big[ l_k(x_k, u_k) + \lambda_{k+1}^T \big( f_k(x_k, u_k) - x_{k+1} \big) \Big] + V_T(x_N) + \lambda_0^T (x_0 - x_{\mathrm{init}}) \\ &= \sum_{k=0}^{N-1} \Big[ H_k(x_k, u_k, \lambda_{k+1}) - \lambda_{k+1}^T x_{k+1} \Big] + V_T(x_N) + \lambda_0^T (x_0 - x_{\mathrm{init}}) \end{aligned} $$
where the Hamiltonian is defined as
$$ H_i(x_i, u_i, \lambda_{i+1}) \equiv l_i(x_i, u_i) + \lambda_{i+1}^T f_i(x_i, u_i) $$
Extremize the augmented cost with respect to $u_k$ $(k = 0, \ldots, N-1)$, $x_k$ $(k = 0, \ldots, N)$, and $\lambda_k$ $(k = 0, \ldots, N)$.
Since there is only a finite number of "stages", we can use the classical finite-dimensional conditions for an extremum:
πœ•π½
= 0;
πœ•π‘₯π‘˜
πœ•π»π‘˜
β‡’ Ξ»π‘˜ =
π‘₯ ,𝑒 ,Ξ»
πœ•π‘₯π‘˜ π‘˜ π‘˜ π‘˜+1
πœ•π½
= 0;
πœ•π‘₯𝑁
πœ•π‘‰π‘‡
β‡’ λ𝑁 =
πœ•π‘₯𝑁
πœ•π½
= 0;
πœ•π‘₯0
β‡’ π‘₯0 = π‘₯𝑖𝑛𝑖𝑑
πœ•π½
= 0;
πœ•π‘’π‘–
β‡’
πœ•π»π‘–
π‘₯ ,𝑒 ,Ξ»
=0
πœ•π‘’π‘– 𝑖 𝑖 𝑖+1
πœ•π½
= 0;
πœ•Ξ»π‘š
β‡’
π‘₯π‘š+1 = 𝑓 π‘₯π‘š , π‘’π‘š
π‘˜ = 1, … , 𝑁 βˆ’ 1
π‘₯𝑁
𝑖 = 0, … , 𝑁 βˆ’ 1
π‘š = 0, … , 𝑁 βˆ’ 1
Discrete Time LQR
(a different derivation)
A linear dynamical system:
$$ x_{k+1} = A_k x_k + B_k u_k, \qquad x_0 = x_{\mathrm{init}} $$
Find the control sequence $u_k$ which minimizes
$$ J = \frac{1}{2} x_N^T P_T x_N + \frac{1}{2} \sum_{k=0}^{N-1} \left( x_k^T Q_k x_k + u_k^T R_k u_k \right) $$
Solution: assuming $u_k = -K_k x_k$, then $x_{k+1} = (A_k - B_k K_k)\, x_k \equiv A_{C_k} x_k$, and the cost becomes
$$ J = \frac{1}{2} x_N^T P_T x_N + \frac{1}{2} \sum_{k=0}^{N-1} x_k^T \left( Q_k + K_k^T R_k K_k \right) x_k $$
Define the "cost-to-go" from $t_p$:
$$ J_p = \frac{1}{2} x_N^T P_T x_N + \frac{1}{2} \sum_{k=p}^{N-1} x_k^T \left( Q_k + K_k^T R_k K_k \right) x_k $$
Then note that the "incremental" cost is
$$ J_{p+1} - J_p = -\frac{1}{2} x_p^T \left( Q_p + K_p^T R_p K_p \right) x_p \qquad (\dagger\dagger) $$
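As a sanity check of $(\dagger\dagger)$, the sketch below uses illustrative matrices and an arbitrary, not necessarily optimal, gain sequence (none of it from these notes): it simulates the closed-loop system and compares $J_{p+1} - J_p$, computed from the definition of $J_p$, with the right-hand side of $(\dagger\dagger)$.

```python
import numpy as np

# Numerical check of the incremental-cost identity for a fixed gain sequence.
rng = np.random.default_rng(0)
n, m, N = 3, 2, 15
A = [0.3 * rng.normal(size=(n, n)) for _ in range(N)]
B = [rng.normal(size=(n, m)) for _ in range(N)]
Q = [np.eye(n) for _ in range(N)]
R = [np.eye(m) for _ in range(N)]
P_T = np.eye(n)
K = [0.1 * rng.normal(size=(m, n)) for _ in range(N)]   # fixed gains u_k = -K_k x_k

# Simulate the closed-loop system x_{k+1} = (A_k - B_k K_k) x_k.
x = [rng.normal(size=n)]
for k in range(N):
    x.append((A[k] - B[k] @ K[k]) @ x[k])

def cost_to_go(p):
    """J_p = 1/2 x_N' P_T x_N + 1/2 sum_{k=p}^{N-1} x_k'(Q_k + K_k' R_k K_k) x_k."""
    J = 0.5 * x[N] @ P_T @ x[N]
    for k in range(p, N):
        J += 0.5 * x[k] @ (Q[k] + K[k].T @ R[k] @ K[k]) @ x[k]
    return J

p = 4
lhs = cost_to_go(p + 1) - cost_to_go(p)
rhs = -0.5 * x[p] @ (Q[p] + K[p].T @ R[p] @ K[p]) @ x[p]
print(lhs, rhs)   # the two numbers agree
```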
Discrete Time LQR
(continued)
The controlled dynamical system will propagate as:
$$ x_m = \Phi(m, k)\, x_k = \left( \prod_{i=k}^{m-1} A_{C_i} \right) x_k $$
where $\Phi(m, k)$ is the transition matrix.
The "cost-to-go" can be rewritten using the transition matrix:
$$ J_p = \frac{1}{2} x_p^T \left[ \Phi^T(N, p)\, P_T\, \Phi(N, p) + \sum_{k=p}^{N-1} \Phi^T(k, p) \left( Q_k + K_k^T R_k K_k \right) \Phi(k, p) \right] x_p \equiv \frac{1}{2} x_p^T P_p x_p $$
where $P_p$ is the "cost-to-go" matrix (by abuse of language).
Then:
$$ J_{p+1} - J_p = \frac{1}{2} \left( x_{p+1}^T P_{p+1} x_{p+1} - x_p^T P_p x_p \right) = \frac{1}{2} x_p^T \left[ (A_{C_p})^T P_{p+1} A_{C_p} - P_p \right] x_p \qquad (**) $$
Now equate $(**)$ and $(\dagger\dagger)$, and then rearrange:
$$ P_p = (A_p - B_p K_p)^T P_{p+1} (A_p - B_p K_p) + K_p^T R_p K_p + Q_p $$
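This recursion lets $P_p$ be computed backward, stage by stage, for any fixed gain sequence, without forming transition matrices explicitly; the cost-to-go definition gives the terminal value $P_N = P_T$. A short sketch, again with illustrative matrices and arbitrary gains (not from these notes), runs the backward sweep and verifies $J_p = \frac{1}{2} x_p^T P_p x_p$ along a simulated trajectory.

```python
import numpy as np

# Fixed-gain backward sweep: P_p = A_Cp' P_{p+1} A_Cp + K_p' R_p K_p + Q_p,
# starting from P_N = P_T, plus a check that J_p = 1/2 x_p' P_p x_p.
rng = np.random.default_rng(1)
n, m, N = 3, 2, 12
A = [0.3 * rng.normal(size=(n, n)) for _ in range(N)]
B = [rng.normal(size=(n, m)) for _ in range(N)]
Q = [np.eye(n) for _ in range(N)]
R = [np.eye(m) for _ in range(N)]
K = [0.1 * rng.normal(size=(m, n)) for _ in range(N)]   # arbitrary fixed gains
P_T = np.eye(n)

# Backward sweep for P_p under the fixed gains K_p.
P = [None] * (N + 1)
P[N] = P_T
for p in range(N - 1, -1, -1):
    A_C = A[p] - B[p] @ K[p]
    P[p] = A_C.T @ P[p + 1] @ A_C + K[p].T @ R[p] @ K[p] + Q[p]

# Forward simulation and comparison with the summed cost-to-go.
x = [rng.normal(size=n)]
for k in range(N):
    x.append((A[k] - B[k] @ K[k]) @ x[k])

p = 3
J_p = 0.5 * x[N] @ P_T @ x[N] + sum(
    0.5 * x[k] @ (Q[k] + K[k].T @ R[k] @ K[k]) @ x[k] for k in range(p, N))
print(J_p, 0.5 * x[p] @ P[p] @ x[p])   # the two values agree
```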
Discrete Time LQR
(continued)
To derive the optimal feedback gain, we minimize the cost-to-go with respect to the gain:
$$ \frac{\partial P_p}{\partial K_p} = 0 \;\Rightarrow\; K_p = \left( R_p + B_p^T P_{p+1} B_p \right)^{-1} B_p^T P_{p+1} A_p $$
Substituting the expression for $K_p$ back into the expression for $P_p$ yields:
$$ \begin{aligned} P_p &= Q_p + A_p^T P_{p+1} A_p - A_p^T P_{p+1} B_p \left( R_p + B_p^T P_{p+1} B_p \right)^{-1} B_p^T P_{p+1} A_p \\ &= Q_p + A_p^T \left[ P_{p+1} - P_{p+1} B_p \left( R_p + B_p^T P_{p+1} B_p \right)^{-1} B_p^T P_{p+1} \right] A_p \end{aligned} $$
This is the discrete-time Riccati equation (DRE).
Because $\frac{1}{2} x_p^T P_p x_p$ is the "cost-to-go", and at stage $N$ the cost-to-go is just the terminal cost $\frac{1}{2} x_N^T P_T x_N$, the terminal value of $P_N$ is given by $P_N = P_T$.
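The following is a compact sketch of the resulting backward sweep, assuming illustrative time-varying matrices (not from these notes): starting from $P_N = P_T$, it computes $K_p$ and $P_p$ via the DRE for $p = N-1, \ldots, 0$, then simulates $u_k = -K_k x_k$ and checks that the accumulated cost equals $\frac{1}{2} x_0^T P_0 x_0$.

```python
import numpy as np

# Finite-horizon discrete-time LQR backward sweep (illustrative matrices).
rng = np.random.default_rng(2)
n, m, N = 3, 2, 30
A = [np.eye(n) + 0.1 * rng.normal(size=(n, n)) for _ in range(N)]
B = [rng.normal(size=(n, m)) for _ in range(N)]
Q = [np.eye(n) for _ in range(N)]
R = [np.eye(m) for _ in range(N)]
P_T = np.eye(n)

P = [None] * (N + 1)
K = [None] * N
P[N] = P_T
for p in range(N - 1, -1, -1):
    # K_p = (R_p + B_p' P_{p+1} B_p)^{-1} B_p' P_{p+1} A_p
    K[p] = np.linalg.solve(R[p] + B[p].T @ P[p + 1] @ B[p],
                           B[p].T @ P[p + 1] @ A[p])
    # DRE: P_p = Q_p + A_p' P_{p+1} A_p - A_p' P_{p+1} B_p K_p
    P[p] = Q[p] + A[p].T @ P[p + 1] @ A[p] - A[p].T @ P[p + 1] @ B[p] @ K[p]

# Apply the feedback u_k = -K_k x_k; the optimal cost from x_0 is 1/2 x_0' P_0 x_0.
x0 = rng.normal(size=n)
x = x0
J = 0.0
for k in range(N):
    u = -K[k] @ x
    J += 0.5 * (x @ Q[k] @ x + u @ R[k] @ u)
    x = A[k] @ x + B[k] @ u
J += 0.5 * x @ P_T @ x
print(J, 0.5 * x0 @ P[0] @ x0)   # the two values agree
```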