Discrete Time Optimal Control Given a dynamical system: π₯π+1 = ππ π₯π , π’π π₯0 = π₯ππππ‘ Find the control sequence uk = u(tk), k=0,β¦,N-1, which minimizes : πβ1 π½= ππ π₯π , π’π + ππ π₯π π π‘ππ‘ππ "π π‘ππππ " π=0 Solution: introduce Lagrange multipliers Ξ»0, Ξ»1, β¦, Ξ»N for all constraints πβ1 ππ π₯π , π’π + Ξ»ππ+1 ππ π₯π , π’π β π₯π+1 + ππ π₯π + Ξ»π0 π₯0 β π₯ππππ‘ π½= π=0 πβ1 π»π π₯π , π’π , Ξ»π+1 β Ξ»ππ+1 π₯π+1 + ππ π₯π + Ξ»π0 π₯0 β π₯ππππ‘ = π=0 π»π π₯π , π’π , Ξ»π+1 β‘ ππ π₯π , π’π , Ξ»π+1 + Ξ»ππ+1 ππ π₯π , π’π Extremize the augmented cost w.r.t. π’π (π = 0, β¦ , π β 1); π₯π π = 0, β¦ , π ; Ξ»π (π = 0, β¦ , π); Since there are a discrete number of βstatesβ, we can use the classical finite dimensional extremal criteria: ππ½ = 0; ππ₯π ππ»π β Ξ»π = π₯ ,π’ ,Ξ» ππ₯π π π π+1 ππ½ = 0; ππ₯π πππ β Ξ»π = ππ₯π ππ½ = 0; ππ₯0 β π₯0 = π₯ππππ‘ ππ½ = 0; ππ’π β ππ»π π₯ ,π’ ,Ξ» =0 ππ’π π π π+1 ππ½ = 0; πΞ»π β π₯π+1 = π π₯π , π’π π = 1, β¦ , π β 1 π₯π π = 0, β¦ , π β 1 π = 0, β¦ , π β 1 Discrete TIME LQR (a different derivation) A linear dynamical system: π₯π+1 = π΄π π₯π + π΅π π’π π₯0 = π₯ππππ‘ Find the control sequence uk which minimizes : 1 π 1 π½ = π₯π ππ π₯π + 2 2 πβ1 π₯ππ ππ π₯π + π’ππ π π π’π π=0 Solution: Assuming uk = - Kk xk , then π₯π+1 =(π΄π βπ΅π πΎπ )π₯π β‘ π΄πΆπ π₯π 1 π 1 π½ = π₯π ππ π₯π + 2 2 πβ1 π₯ππ (ππ +πΎππ π π πΎπ )π₯π π=0 βCost-to-goβ from tp 1 π 1 π½π = π₯π ππ π₯π + 2 2 Define: Then note that : πβ1 π₯ππ (ππ +πΎππ π π πΎπ )π₯π π=π 1 π½π+1 β π½π = β π₯ππ (ππ +πΎππ π π πΎπ )π₯π 2 βIncrementalβ Cost (β β ) Discrete TIME LQR (continued) The controlled dynamical system will propagate as: πβ1 π₯π = π·(π, π) π₯π = π=π π΄πΆπ π₯π Transition matrix The βcost-to-goβ can be rewritten using the Transition Matrix: 1 π π½π = π₯π π· π π, π ππ π·(π, π) + 2 1 2 β‘ π₯ππ ππ π₯π πβ1 π·π (π, π)(ππ + πΎππ π π πΎπ )π·(π, π) π₯π π=π Pp = βcost-to-goβ (by abuse of language) Then: 1 π 1 π π π½π+1 β π½π = (π₯π+1 ππ+1 π₯π+1 β π₯π ππ π₯π ) = π₯π (π΄πΆπ )π ππ+1 π΄πΆπ β ππ π₯π (**) 2 2 Now equate (**) and (β β ), and then rearrange: ππ = (π΄π β π΅π πΎπ )π ππ+1 π΄π β π΅π πΎπ + πΎππ π π πΎπ + ππ Discrete TIME LQR (continued) To derive the optimal feedback gain, we want to minimize the cost-togo with respect to the gain: πππ = 0; ππΎπ β πΎπ = (π π + π΅ππ ππ+1 π΅π )β1 π΅ππ ππ+1 π΄π Substituting the expression for Kp back into the expression for Pp yields: ππ = ππ + π΄ππ ππ+1 π΄π β π΄ππ ππ+1 π΅π (π π + π΅ππ ππ+1 π΅π )β1 π΅ππ ππ+1 π΄π = ππ + π΄ππ ππ+1 β ππ+1 π΅π (π π + π΅ππ ππ+1 π΅π )β1 π΅ππ ππ+1 π΄π This is the discrete time Riccati equation (DRE). β Because (xpT Pp xp)/2 is the βcost-to-goβ, and at stage N the cost to go is (xNT PN xN)/2, the terminal value of PN is given by: PN = PT
© Copyright 2026 Paperzz