Output Feedback Optimal Control with Constraints
María M. Seron
September 2004
Centre for Complex Dynamic Systems and Control
Outline
1. Introduction
2. Problem Formulation
3. Optimal Solutions
   Optimal Solution for N = 1
   Optimal Solution for N = 2
   Discussion on Implementation
4. Suboptimal Strategies
   Certainty Equivalent Control
   Partially Stochastic Certainty Equivalent Control
5. Simulations
Introduction
Here we address the problem of constrained optimal control for
systems with uncertainty and incomplete state information.
We adopt a stochastic description of uncertainty, which associates
probability distributions to the uncertain elements, that is,
disturbances and initial conditions.
When incomplete state information exists, a popular
observer-based control strategy in the presence of stochastic
disturbances is to use the certainty equivalence [CE] principle,
introduced before for deterministic systems.
In the stochastic framework, CE consists of estimating the state
and then using these estimates as if they were the true state in the
control law that results if the problem were formulated as a
deterministic problem (that is, without uncertainty).
Introduction
The CE strategy is motivated by the unconstrained problem with a
quadratic objective function, for which CE is indeed the optimal
solution.
Here we analyse the optimality of the CE principle.
We will see that CE is not optimal in general.
We will also analyse the possibility of obtaining truly optimal
solutions for single input linear systems with input constraints and
uncertainty related to output feedback and stochastic disturbances.
We first find the optimal solution for the case of horizon N = 1, and
then we indicate the complications that arise in the case of
horizon N = 2.
Problem Formulation
We consider the following time-invariant, discrete time linear
system with disturbances

    xk+1 = Axk + Buk + wk,                            (1)
    yk = Cxk + vk,

where xk, wk ∈ Rn and uk, yk, vk ∈ R. The control uk is constrained
to take values in the set

    U = {u ∈ R : −∆ ≤ u ≤ ∆},

for a given constant value ∆ > 0.
The disturbances wk and vk are i.i.d. random vectors, with
probability density functions (pdf) pw(·) and pv(·), respectively.
The initial state, x0, is characterised by a pdf px0(·).
We assume that (A , B ) is reachable and that (A , C ) is observable.
Problem Formulation
We further assume that, at time k, the value of the state xk is not
available to the controller.
Instead, the following sets of past inputs and outputs, grouped as
the information vector Ik, represent all the information available to
the controller at the time instant k:

          {y0}                                        if k = 0,
          {y0, y1, u0}                                if k = 1,
    Ik =  {y0, y1, y2, u0, u1}                        if k = 2,
          ...
          {y0, y1, ..., yN−1, u0, u1, ..., uN−2}      if k = N − 1.

Then, Ik ∈ R^{2k+1}, and also Ik+1 = {Ik, yk+1, uk}, where Ik ⊂ Ik+1.
Problem Formulation
For system (1), under the assumptions made, we formulate the
optimisation problem:

    minimise  E{ F(xN) + Σ_{k=0}^{N−1} L(xk, uk) },   (2)

where

    F(xN) = xN′PxN,
    L(xk, uk) = xk′Qxk + Ruk²,

subject to the system equations (1) and the input constraint
uk ∈ U, for k = 0, ..., N − 1.
Note that, under the stochastic assumptions, the expression
F(xN) + Σ_{k=0}^{N−1} L(xk, uk) is a random variable. Hence, it is only
meaningful to formulate the minimisation problem in terms of its
statistics, for example, its expected value as in (2).
Problem Formulation
The result of the above minimisation problem will be a sequence of
functions

    {π0(·), π1(·), ..., πN−1(·)}

that enable the controller to calculate the desired optimal control
action depending on the information available to the controller at
each time instant k, that is,

    uk = πk(Ik).

These functions must also ensure that the constraints are always
satisfied.
We thus make the following definition.
Problem Formulation
Definition (Admissible Policies for Incomplete State Information)
A policy ΠN is a finite sequence of functions πk(·) : R^{2k+1} → R for
k = 0, 1, ..., N − 1, that is,

    ΠN = {π0(·), π1(·), ..., πN−1(·)}.

A policy ΠN is called an admissible control policy if and only if

    πk(Ik) ∈ U for all Ik ∈ R^{2k+1},

for k = 0, ..., N − 1.
Further, the class of all admissible control policies will be denoted
by

    Π̄N = {ΠN : ΠN is admissible}.
Using the above definition, we can then state the optimal control
problem of interest as follows.
Problem Formulation
Definition (Stochastic Finite Horizon Optimal Control Problem)
Given the pdfs px0(·), pw(·) and pv(·) of the initial state x0 and
the disturbances wk and vk, respectively, we seek the optimal
control policy ΠN* belonging to the class of all admissible control
policies Π̄N, which minimises the objective function

    VN(ΠN) = E_{x0, wk, vk; k=0,...,N−1} { F(xN) + Σ_{k=0}^{N−1} L(xk, πk(Ik)) },   (3)

subject to the constraints

    xk+1 = Axk + B πk(Ik) + wk,
    yk = Cxk + vk,
    Ik+1 = {Ik, yk+1, uk},

for k = 0, ..., N − 1.
Problem Formulation
In (3) the terminal state weighting F(·) and the per-stage
weighting L(·,·) are given by

    F(xN) = xN′PxN,                                   (4)
    L(xk, πk(Ik)) = xk′Qxk + R πk(Ik)²,

with P > 0, R > 0 and Q ≥ 0.
The optimal control policy is then

    ΠN* = arg inf_{ΠN ∈ Π̄N} VN(ΠN),

with the following resulting optimal objective function value

    VN* = inf_{ΠN ∈ Π̄N} VN(ΠN).                       (5)
Problem Formulation
It is important to recognise that the optimisation problem of
Definition 3.2 takes into account the fact that new information will
be available to the controller at future time instants.
This is called closed loop optimisation, as opposed to open loop
optimisation where the control values {u0 , u1 , . . . , uN −1 } are
selected all at once, at stage zero.
For deterministic systems, in which there is no uncertainty, the
distinction between open loop and closed loop optimisation is
irrelevant, and the minimisation of the objective function over all
sequences of controls or over all control policies yields the same
result.
Problem Formulation
In what follows, and as done before, the matrix P in (4) will be
taken to be the solution to the algebraic Riccati equation,

    P = A′PA + Q − K′R̄K,                              (6)

where

    K ≜ R̄⁻¹B′PA,    R̄ ≜ R + B′PB.                     (7)
Optimal Solutions
The problem just described belongs to the class of the so-called
sequential decision problems under uncertainty.
A key feature of these problems is that an action taken at a
particular stage affects all future stages.
Thus, the control action has to be computed taking into account
the future consequences of the current decision.
The only general approach known to address sequential decision
problems is dynamic programming.
We next briefly show how dynamic programming is used to solve
the stochastic optimal control problem just defined.
Dynamic Programming
The dynamic programming algorithm for the case of incomplete
state information can be expressed via the following sequential
optimisation (sub-)problems [SOP]:
For k = N − 1,

    SOPN−1 :  JN−1(IN−1) = inf_{uN−1 ∈ U} L̃N−1(IN−1, uN−1),      (8)

subject to:
    xN = AxN−1 + BuN−1 + wN−1,
where
    L̃N−1(IN−1, πN−1(IN−1)) = E{ F(xN) + L(xN−1, πN−1(IN−1)) | IN−1, πN−1(IN−1) }.
Dynamic Programming
For k = 0, ..., N − 2,

    SOPk :  Jk(Ik) = inf_{uk ∈ U} [ L̃k(Ik, uk) + E{ Jk+1(Ik+1) | Ik, uk } ],

subject to:
    xk+1 = Axk + Buk + wk,
    Ik+1 = {Ik, yk+1, uk},
    yk+1 = Cxk+1 + vk+1,
where
    L̃k(Ik, πk(Ik)) = E{ L(xk, πk(Ik)) | Ik, πk(Ik) }
for k = 0, ..., N − 2.
Dynamic Programming
The dynamic programming algorithm proceeds as follows:
It starts at stage N − 1 by solving SOPN−1 for all possible
values of IN−1. The law πN−1*(·) is thus obtained, in the sense
that, given the value of IN−1, the corresponding optimal control
is the value uN−1 = πN−1*(IN−1), the minimiser of SOPN−1.
Then it continues to solve the sub-problems
SOPN−2, ..., SOP0 to obtain the laws πN−2*(·), ..., π0*(·).
After the last optimisation sub-problem is solved, the optimal
control policy ΠN* is obtained and the optimal objective
function (see (5)) is

    VN* = VN(ΠN*) = E{J0(I0)} = E{J0(y0)}.
Optimal Solution for N = 1
In the following proposition, we apply the dynamic programming
algorithm to obtain the optimal solution of the problem in
Definition 3.2 for the case N = 1.
Proposition
For N = 1, the solution to the optimal control problem stated in
Definition 3.2 is of the form Π1* = {π0*(·)}, with

    u0 = π0*(I0) = −sat∆(K E{x0|I0}) for all I0 ∈ R,  (9)

where K is given in (7) and sat∆ : R → R is the saturation function
defined as

              ∆    if z > ∆,
    sat∆(z) = z    if |z| ≤ ∆,
              −∆   if z < −∆.
Optimal Solution for N = 1
Proposition (continued)
Moreover, the last step in the dynamic programming algorithm has
the value

    J0(I0) = E{x0′Px0 | I0} + R̄ Φ∆(K E{x0|I0})
             + tr(K′K cov{x0|I0}) + E{w0′Pw0},        (10)

where P and R̄ are defined in (6) and (7), respectively, and where
Φ∆ : R → R is given by

    Φ∆(z) = [z − sat∆(z)]².                           (11)
Optimal Solution for N = 1
Proof:
For N = 1, we only have to solve SOP0 (see (8)):

    J0(I0) = inf_{u0 ∈ U} E{ F(x1) + L(x0, u0) | I0, u0 }
           = inf_{u0 ∈ U} E{ (Ax0 + Bu0 + w0)′P(Ax0 + Bu0 + w0)
                             + x0′Qx0 + Ru0² | I0, u0 }.          (12)

Using the fact that E{w0|I0, u0} = E{w0} = 0 and that w0 is neither
correlated with the state x0 nor correlated with the control u0, (12)
can be expressed, after distributing and grouping terms, as

    J0(I0) = E{w0′Pw0} + inf_{u0 ∈ U} E{ x0′(A′PA + Q)x0
             + 2u0 B′PAx0 + (B′PB + R)u0² | I0, u0 }.
Optimal Solution for N = 1
Proof (continued):
Using the Riccati equation (6)–(7), the above becomes

    J0(I0) = E{w0′Pw0} + inf_{u0 ∈ U} E{ x0′Px0 + R̄ (x0′K′Kx0
             + 2u0 Kx0 + u0²) | I0, u0 }
           = E{w0′Pw0} + E{x0′Px0|I0} + R̄ inf_{u0 ∈ U} E{ (u0 + Kx0)² | I0, u0 },

where we have used the fact that the conditional pdf of x0 given I0
and u0 is equal to the pdf where only I0 is given.
Finally, using properties of the expected value of quadratic forms,
the optimisation problem to solve becomes

    J0(I0) = E{w0′Pw0} + E{x0′Px0|I0} + tr(K′K cov{x0|I0})
             + R̄ inf_{u0 ∈ U} (u0 + K E{x0|I0})².
Optimal Solution for N = 1
Proof (continued):
We repeat for convenience:

    J0(I0) = E{w0′Pw0} + E{x0′Px0|I0} + tr(K′K cov{x0|I0})
             + R̄ inf_{u0 ∈ U} (u0 + K E{x0|I0})².     (13)

It is clear from (13) that the unconstrained minimum is attained at
u0 = −K E{x0|I0}.
In the constrained case, equation (9) follows from the convexity of
the quadratic function.
The final value (10) is obtained by substituting (9) into (13).
Optimal Solution for N = 1
Hence, for N = 1 the optimal control is

    u0 = π0*(I0) = −sat∆(K E{x0|I0}) for all I0 ∈ R.

Note that the optimal control law π0* depends on the information I0
only through the conditional expectation E{x0|I0}.
Therefore, this conditional expectation is a sufficient statistic in this
case, that is, it provides all the necessary information to implement
the control.
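Once an estimator supplies the conditional mean, the law (9) is a one-line computation. The following is a minimal sketch for the scalar-input case; the gain K, the estimate and the bound ∆ used below are illustrative placeholders, not values from the text:

```python
def sat(z, delta):
    """Saturation function sat_Delta(z) of the proposition."""
    return max(-delta, min(delta, z))

def optimal_u0(K, xhat0, delta=1.0):
    """Optimal N = 1 control (9): u0 = -sat_Delta(K E{x0|I0}),
    where xhat0 stands in for the conditional mean E{x0|I0}."""
    Kx = sum(k * x for k, x in zip(K, xhat0))  # scalar product K E{x0|I0}
    return -sat(Kx, delta)
```

While K E{x0|I0} stays inside the bound the law is linear; outside, the control simply clips at ±∆.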
Optimal Solution for N = 1
We observe that the N = 1 control law

    u0 = π0*(I0) = −sat∆(K E{x0|I0}) for all I0 ∈ R

is also the optimal control law for the cases in which:
the state is measured (complete state information) and the
disturbance wk is still acting on the system;
the state is measured and wk is set equal to a fixed value or to
its mean value (as we saw on Day 4 for the case wk = 0).
Therefore, CE is optimal for horizon N = 1; that is, the optimal
control law is the same law that would result from an associated
deterministic optimal control problem in which some or all
uncertain quantities were set to a fixed value.
Optimal Solution for N = 2
We now consider the case where the optimisation horizon is N = 2.
Proposition
For N = 2, the solution to the optimal control problem stated in
Definition 3.2 is of the form Π2* = {π0*(·), π1*(·)}, with

    u1 = π1*(I1) = −sat∆(K E{x1|I1}) for all I1 ∈ R³,

    u0 = π0*(I0) = arg inf_{u0 ∈ U} [ R̄ (u0 + K E{x0|I0})²
         + R̄ E{ Φ∆(K E{x1|I1}) | I0, u0 } ]  for all I0 ∈ R,     (14)

where Φ∆(z) = [z − sat∆(z)]².
Optimal Solution for N = 2
Hence, for N = 2 the optimal control is

    u0 = π0*(I0) = arg inf_{u0 ∈ U} [ R̄ (u0 + K E{x0|I0})²
         + R̄ E{ [K E{x1|I1} − sat∆(K E{x1|I1})]² | I0, u0 } ],   I0 ∈ R.

To obtain an explicit form for π0*, we would need to express
E{x1|I1} = E{x1 | I0, y1, u0} explicitly as a function of I0, u0 and y1.
The optimal law π0*(·) depends on I0 not only through E{x0|I0}, as
was the case for N = 1.
Indeed, even for Gaussian disturbances, when input constraints
are present, it is possible to show that the optimal control law
π0*(·) depends also on cov{x0|I0}.
Discussion on Implementation
To calculate E{x1|I1}, we need to find the conditional pdf px1|I1(·|I1).
At any time instant k, the conditional pdfs pxk|Ik(·|Ik) satisfy the
Chapman–Kolmogorov equation (time update) and the observation
update equation:
Time update

    p(xk | Ik−1, uk−1) = ∫_{Rn} p(xk | xk−1, uk−1) p(xk−1 | Ik−1, uk−1) dxk−1   (15)

Observation update

    p(xk | Ik) = p(xk | Ik−1, yk, uk−1)
               = p(yk | xk) p(xk | Ik−1, uk−1) / p(yk | Ik−1, uk−1)             (16)
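For low state dimension, the recursion (15)–(16) can be approximated directly on a grid. The sketch below performs one time update and one observation update for a scalar system, assuming Gaussian shapes for pw and pv purely for illustration; all names and parameter values are hypothetical:

```python
import math

def gaussian(z, mean, var):
    """Gaussian pdf value; stands in for p(w) and p(v) in this sketch."""
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bayes_filter_step(prior, grid, u, y, A=0.9, B=0.5, C=1.0, qw=0.1, rv=0.01):
    """One step of the recursion (15)-(16) for a scalar system on a
    fixed grid: time update, then observation update."""
    dx = grid[1] - grid[0]
    # Time update (15): marginalise x_{k-1} under the dynamics x' = A x + B u + w
    predicted = []
    for x in grid:
        s = sum(gaussian(x, A * xp + B * u, qw) * p for xp, p in zip(grid, prior))
        predicted.append(s * dx)
    # Observation update (16): weight by the likelihood p(y_k | x_k) and normalise
    posterior = [gaussian(y, C * x, rv) * p for x, p in zip(grid, predicted)]
    norm = sum(posterior) * dx
    return [p / norm for p in posterior]
```

The conditional mean E{x1|I1} needed by the control law is then a weighted sum over the grid, and the cost of the double loop already hints at why longer horizons become expensive.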
Discussion on Implementation
In general, depending on the pdfs of the initial state and the
disturbances, it may be very difficult or even impossible to obtain
an explicit form for the conditional pdfs that satisfy the recursion
given by (15) and (16).
If the pdfs of the initial state and the disturbances are Gaussian,
however, all the conditional densities that satisfy (15) and (16) are
also Gaussian.
In this particular case, (15) and (16) lead to the well-known Kalman
filter algorithm, a recursive algorithm in terms of the (conditional)
expectation and covariance, which completely define any
Gaussian pdf.
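In that Gaussian case the whole conditional pdf is carried by two numbers per state dimension. A scalar-state sketch of one cycle of the resulting recursion (variable names and values are illustrative, not from the text):

```python
def kalman_step(xhat, pcov, u, y, A, B, C, Qw, Rv):
    """One time-and-observation update of the scalar Kalman filter:
    the Gaussian special case of the recursion (15)-(16)."""
    # Time update: propagate mean and covariance through the dynamics
    x_pred = A * xhat + B * u
    p_pred = A * pcov * A + Qw
    # Observation update: correct with the innovation y - C * x_pred
    S = C * p_pred * C + Rv          # innovation variance
    Kgain = p_pred * C / S           # Kalman gain
    xhat_new = x_pred + Kgain * (y - C * x_pred)
    pcov_new = (1 - Kgain * C) * p_pred
    return xhat_new, pcov_new
```

With near-noiseless measurements (Rv small) the updated mean moves almost entirely to the measurement, as expected from the gain formula.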
Discussion on Implementation
Due to the way the information enters the conditional pdfs, it is, in
general, very difficult to obtain an explicit form for the optimal
control.
On the other hand, even if the recursion given by (15) and (16) can
be found explicitly, the implementation of such optimal control may
also be complicated and computationally demanding.
One approach to approximate the solution is to discretise the set U
of admissible control values and to perform the required
minimisation over the discretised set.
The above approximation also requires the evaluation of several
expected values, for which further discretisations (for x0, x1 and y1,
respectively) may be needed.
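The minimisation over a discretised U can be sketched in a few lines; the grid size and function names below are illustrative:

```python
def argmin_over_U(cost, delta=1.0, n_points=500):
    """Approximate arg inf_{u in U} cost(u) by evaluating the cost
    on a uniform discretisation of U = [-delta, delta]."""
    grid = [-delta + 2.0 * delta * i / (n_points - 1) for i in range(n_points)]
    return min(grid, key=cost)
```

Each evaluation of cost(u) may itself involve expected values, which is exactly where the further discretisations mentioned above enter.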
Discussion on Implementation
As an alternative approach to brute force discretisations, we could
use Markov chain Monte Carlo [MCMC] methods.
These methods approximate continuous pdfs by discrete ones by
drawing samples from the pdfs in question or from other
approximations.
However, save for some very particular cases, the exponential
growth in the number of computations as the optimisation horizon
is increased seems to be unavoidable.
Discussion on Implementation
The above discussion suggests that:
it seems impossible to analytically proceed with the
optimisation for horizons greater than two;
the implementation of the optimal law (even for N = 2)
appears to be quite intricate and computationally
burdensome.
This leads us to consider suboptimal solutions.
In the next section, we analyse two alternative suboptimal
strategies.
Suboptimal Strategy: Certainty Equivalent Control
As mentioned before, certainty equivalent control [CEC] uses the
control law obtained as the solution of an associated deterministic
control problem derived from the original problem by removing all
uncertainty.
Specifically, the associated problem is derived by setting the
disturbance wk to a fixed typical value (for example, w̄ = E{wk })
and by also assuming perfect state information.
The resulting control law is a function of the true state.
Then, the control is implemented using some estimate of the state
x̂ (I k ) in place of the true state.
Suboptimal Strategy: Certainty Equivalent Control
For our problem, we first obtain the optimal policy for the
deterministic problem

    ΠN* = {π0*(·), ..., πN−1*(·)},                    (17)

where πk* : Rn → R for k = 0, 1, ..., N − 1.
Then, the CEC evaluates the deterministic laws at the estimate of
the state, that is,

    uk = πk*(x̂(Ik)).                                  (18)
As we saw on Day 4, the associated deterministic problem for
linear systems with a quadratic objective function is an example of
a case where the control policy can be obtained explicitly for any
finite optimisation horizon.
Suboptimal Strategy: Certainty Equivalent Control
Example (Closed Loop CEC for N = 2)
For N = 2, the deterministic policy Π2* = {π0*(·), π1*(·)} is given
by:

    π1*(x) = −sat∆(Kx) for all x ∈ Rn,

              −sat∆(Gx + h)  if x ∈ Z−,
    π0*(x) =  −sat∆(Kx)      if x ∈ Z,
              −sat∆(Gx − h)  if x ∈ Z+.

K is given by (7) and

    G = (K + KB·KA) / (1 + (KB)²),    h = (KB / (1 + (KB)²)) ∆.
Suboptimal Strategy: Certainty Equivalent Control
Example (Closed Loop CEC for N = 2, continued)
The sets Z−, Z, Z+ form a partition of Rn, and are given by

    Z− = {x : K(A − BK)x < −∆},
    Z  = {x : |K(A − BK)x| ≤ ∆},
    Z+ = {x : K(A − BK)x > ∆}.

Therefore, a closed loop CEC applies the controls

    u0 = π0*(x̂(I0)),
    u1 = π1*(x̂(I1)),

where the estimate x̂(Ik) can be provided, for example, by the
Kalman filter.
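The CEC law of the example can be coded directly from the partition. In the sketch below, A is a list of rows and B, K and the state estimate are length-n lists; the helper names are hypothetical:

```python
def cec_pi0(xhat, A, B, K, delta=1.0):
    """Deterministic CEC law pi0* for N = 2: select the affine law
    according to which of Z-, Z, Z+ contains the state estimate."""
    dot = lambda a, b: sum(p * q for p, q in zip(a, b))
    sat = lambda z: max(-delta, min(delta, z))
    Ax = [dot(row, xhat) for row in A]           # A x
    Kx, KAx, KB = dot(K, xhat), dot(K, Ax), dot(K, B)
    z = KAx - KB * Kx                            # K(A - BK)x decides the region
    Gx = (Kx + KB * KAx) / (1 + KB ** 2)         # G x, with G = (K + KB*KA)/(1+(KB)^2)
    h = KB * delta / (1 + KB ** 2)
    if z < -delta:                               # x in Z-
        return -sat(Gx + h)
    if z > delta:                                # x in Z+
        return -sat(Gx - h)
    return -sat(Kx)                              # x in Z
```

Inside Z the law coincides with the one-step law −sat∆(Kx); the modified gain and offset matter only when the next-step control is predicted to saturate.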
Suboptimal Strategy: Partially Stochastic Certainty
Equivalent Control
This variant of CEC uses the control law obtained as the solution
to an associated problem that assumes perfect state information
but takes stochastic disturbances into account.
To actually implement the controller, the value of the state is
replaced by its estimate x̂k (I k ).
In our case, given a partially stochastic CEC [PS–CEC] admissible
policy

    ΛN = {λ0(·), ..., λN−1(·)},                       (19)

that is, a sequence of admissible control laws λk(·) : Rn → U that
map the (estimates of the) states into admissible control actions,
the PS–CEC solves the following perfect state information problem.
Suboptimal Strategy: Partially Stochastic CEC
Definition (PS–CEC Optimal Control Problem)
Assuming that the state x̂k will be available to the controller at time
instant k to calculate the control, and given the pdf pw(·) of the
disturbances wk, find the admissible control policy
ΛN◦ = {λ0◦(·), ..., λN−1◦(·)} that minimises the objective function

    V̂N(ΛN) = E_{wk; k=0,...,N−1} { F(x̂N) + Σ_{k=0}^{N−1} L(x̂k, λk(x̂k)) },

subject to x̂k+1 = A x̂k + B λk(x̂k) + wk for k = 0, ..., N − 1.

The optimal control policy ΛN◦ will then be used, as in CEC, to
calculate the control action based on the estimate x̂k provided by
the estimator; that is,

    uk = λk◦(x̂k).
Partially Stochastic CEC for N = 1
Using the dynamic programming algorithm, we have

    Ĵ0(x̂0) = inf_{u0 ∈ U} E{ x̂1′P x̂1 + x̂0′Q x̂0 + Ru0² | x̂0, u0 }.

As with the true optimal solution for N = 1, the PS–CEC optimal
control has the form

    û0 = λ0◦(x̂0) = −sat∆(K x̂0),
    Ĵ0(x̂0) = x̂0′P x̂0 + R̄ Φ∆(K x̂0) + E{w0′Pw0}.

We can see that if x̂0 = E{x0|I0} then the PS–CEC for N = 1
coincides with the optimal solution.
Partially Stochastic CEC for N = 2
The first step of the dynamic programming algorithm yields

    û1 = λ1◦(x̂1) = −sat∆(K x̂1),
    Ĵ1(x̂1) = x̂1′P x̂1 + R̄ Φ∆(K x̂1) + E{w1′Pw1}.

For the second step, we have, after some algebra, that

    Ĵ0(x̂0) = inf_{u0 ∈ U} [ E{ L(x̂0, u0) + Ĵ1(x̂1) | x̂0, u0 } ],

subject to:
    x̂1 = A x̂0 + Bu0 + w0,

    û0 = arg inf_{u0 ∈ U} [ R̄ (u0 + K x̂0)² + R̄ E{ Φ∆[K(A x̂0 + Bu0 + w0)] | x̂0, u0 } ].   (20)
Partially Stochastic CEC for N = 2
Comparing û0:

    û0 = arg inf_{u0 ∈ U} [ R̄ (u0 + K x̂0)² + R̄ E{ Φ∆[K(A x̂0 + Bu0 + w0)] | x̂0, u0 } ]

with expression (14) for the optimal control:

    u0 = arg inf_{u0 ∈ U} [ R̄ (u0 + K E{x0|I0})² + R̄ E{ Φ∆(K E{x1|I1}) | I0, u0 } ],

we can appreciate that, given x̂0, even if
E{ Φ∆[K(A x̂0 + Bu0 + w0)] | x̂0, u0 } cannot be found in explicit form
as a function of u0, the numerical implementation of this
suboptimal control action is much less computationally demanding
than its optimal counterpart.
Simulations
We compared the performance of the suboptimal strategies CEC
and PS–CEC by means of simulation examples.
The performance was assessed by computing the achieved value
of the objective function.
To numerically compute values of the objective function for a given
control policy, different realisations of the initial state plus process
and measurement disturbances were obtained and a
corresponding realisation of the quadratic function in the objective
function evaluated.
Then, the expected value was approximated by averaging over the
different realisations.
Simulations
The system considered is:

    A = [  0.9713  0.2189 ]     B = [ 0.0287 ]     C = [ 0.3700  0.0600 ].
        [ −0.2189  0.7524 ],        [ 0.2189 ],
The disturbances wk are assumed to have a uniform distribution
with support on [−0.5, 0.5] × [−1, 1] and likewise for vk with
support on [−0.1, 0.1].
The initial state x0 is assumed to have a Gaussian distribution with
zero mean and covariance diag{300⁻¹, 300⁻¹}.
Simulations
A Kalman filter was implemented to provide the state estimates
needed.
Although this estimator is not the optimal one in this case because
the disturbances are not Gaussian, it yields the best linear
unbiased estimator for the state.
The parameters for the Kalman filter were chosen as the true
mean and covariance of the corresponding variables in the system.
The saturation limit of the control was taken as ∆ = 1.
The optimisation horizon in both suboptimal strategies is N = 2.
Simulations
For CEC, we implement the policy given in Example
5.1 (Closed Loop CEC for N = 2).
For PS–CEC, we discretise the set U so that only 500 values
are considered, and the expected value in (20):

    û0 = arg inf_{u0 ∈ U} [ R̄ (u0 + K x̂0)² + R̄ E{ Φ∆[K(A x̂0 + Bu0 + w0)] | x̂0, u0 } ]

is approximated by taking 300 samples of the pdf pw(·) for
every possible value of u0 in the discretised set.
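For a scalar system, the PS–CEC computation just described can be sketched as follows; the system numbers, the uniform disturbance support and the function name are illustrative, not those of the example above:

```python
import random

def ps_cec_u0(xhat0, A, B, K, Rbar, delta=1.0, n_u=500, n_w=300, seed=0):
    """Sample-based approximation of (20) for a scalar system:
    discretise U into n_u values and estimate the expectation of
    Phi_Delta with n_w draws of w0 from a uniform pdf."""
    rng = random.Random(seed)
    sat = lambda z: max(-delta, min(delta, z))
    phi = lambda z: (z - sat(z)) ** 2          # Phi_Delta of (11)
    w_samples = [rng.uniform(-0.5, 0.5) for _ in range(n_w)]
    grid = [-delta + 2.0 * delta * i / (n_u - 1) for i in range(n_u)]
    def cost(u0):
        # Monte Carlo estimate of E{ Phi_Delta[K(A xhat0 + B u0 + w0)] }
        expect = sum(phi(K * (A * xhat0 + B * u0 + w)) for w in w_samples) / n_w
        return Rbar * (u0 + K * xhat0) ** 2 + Rbar * expect
    return min(grid, key=cost)
```

When the predicted next-step control never saturates, the expectation term vanishes and the rule reduces to the grid point nearest −K x̂0, consistent with the comparison above.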
Simulations
We simulated the closed loop system over two time instants and
repeated the simulation around 5000 times.
For each simulation, a different realisation of the disturbances and
the initial state was used.
A realisation of the quadratic function in the objective function was
calculated for every simulation run for each one of the control
policies applied (CEC and PS–CEC).
The sample average of the objective function values achieved by
each policy was computed.
The difference between them was always found to be
less than 0.1%.
Simulations
The comparison seems to indicate that the trade-off between
better performance and computational complexity favours the
CEC implementation over the PS–CEC.
From a practical perspective, it would be of interest to extend
the optimisation horizon beyond N = 2. Due to computational
issues, this extension may only be possible for CEC.
The ultimate test for the suboptimal strategies would be to
contrast them with the optimal one.
It would be expected that, in this case, an appreciable
difference in the objective function values may be obtained
due to the fact that the optimal strategy takes into account the
process and measurement disturbances in a unified manner.