
Lecture 7
Pseudo-linear regression for the ARMAX model
Consider the ARMAX model
A(q⁻¹)y(t) = B(q⁻¹)u(t) + C(q⁻¹)e(t)
or
y(t) = −a1y(t − 1) − ... − a_na y(t − na)
     + b1u(t − 1) + ... + b_nb u(t − nb)
     + c1e(t − 1) + ... + c_nc e(t − nc) + e(t)
To simplify the notation, we study a first-order system:
y(t) = −ay(t − 1) + bu(t − 1) + ce(t − 1) + e(t)    (1)
We can rewrite (1) as
y(t) = φᵀ(t)θ + v(t)
in which v(t) = ce(t − 1) + e(t) is a moving average of e(t), and
φ(t) = [−y(t − 1), u(t − 1)]ᵀ, θ = [a, b]ᵀ
In matrix form,
y = Φθ + v
The least squares estimate is
θ̂ = [ΦᵀΦ]⁻¹Φᵀy
  = [ΦᵀΦ]⁻¹Φᵀ(Φθ + v)
  = θ + [ΦᵀΦ]⁻¹Φᵀv
Taking expectations,
E(θ̂) = θ + E([ΦᵀΦ]⁻¹Φᵀv)
The second term E([ΦᵀΦ]⁻¹Φᵀv) ≠ 0 is the bias. This is because v(t) is a correlated signal (colored noise).
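To make the bias visible numerically, here is a minimal Python sketch (not part of the lecture; the parameter values a = −0.9, b = 1.0, c = 0.8 and the white-noise input are assumptions chosen for illustration). It simulates (1) and fits only [a, b] by ordinary least squares, ignoring the colored noise:

```python
import numpy as np

# Simulate the first-order ARMAX system (1):
#   y(t) = -a*y(t-1) + b*u(t-1) + c*e(t-1) + e(t)
rng = np.random.default_rng(0)
a, b, c = -0.9, 1.0, 0.8           # illustrative values, not from the lecture
N = 10000
u = rng.standard_normal(N)         # assumed white input for the demonstration
e = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a*y[t-1] + b*u[t-1] + c*e[t-1] + e[t]

# Least squares with phi(t) = [-y(t-1), u(t-1)]^T and theta = [a, b]^T,
# i.e. treating v(t) = c*e(t-1) + e(t) as if it were white
Phi = np.column_stack([-y[:-1], u[:-1]])
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y[1:])
print("true [a, b]:", [a, b])
print("LS   [a, b]:", theta_hat)   # the estimate of a is biased
```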
Figure: Comparison of the autocorrelation coefficients of the white noise e(t) and the colored noise 0.8e(t − 1) + e(t).
However, (1) can be rewritten as
y(t) = φᵀ(t)θ + e(t)
with
φ(t) = [−y(t − 1), u(t − 1), e(t − 1)]ᵀ
θ = [a, b, c]ᵀ
Here, in φ(t), y(t − 1) and u(t − 1) are known at time t, but e(t − 1) is unknown. However, we can replace it with the prediction error ε(t − 1); ε(0) can be initialized as 0.
For this system, the RLS algorithm is given by
φ(n) = [−y(n − 1), u(n − 1), ε(n − 1)]ᵀ
ε(n) = y(n) − φᵀ(n)θ̂(n − 1)
P(n) = (1/λ){P(n − 1) − P(n − 1)φ(n)φᵀ(n)P(n − 1) / [λ + φᵀ(n)P(n − 1)φ(n)]}
K(n) = P(n)φ(n)
θ̂(n) = θ̂(n − 1) + K(n)ε(n)
Summary of the RLS algorithm for ARMAX:
1. Initialization: set na, nb, nc, λ, θ̂(0) and P(0), and set ε(0), ..., ε(1 − nc) = 0. Steps 2–5 are repeated starting from t = 1.
2. At time step t = n, measure the current output y(n).
3. Recall past y's, u's and ε's to form φ(n).
4. Apply the RLS algorithm to obtain ε(n), θ̂(n) and P(n) (see the sketch below).
5. Shift: θ̂(n) → θ̂(n − 1), P(n) → P(n − 1), ε(n) → ε(n − 1).
6. Set t = n + 1 and go to step 2.
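The following is a minimal Python sketch of this recursion for the first-order case na = nb = nc = 1. The function name and the initial values θ̂(0) = 0 and P(0) = 100·I are assumptions, not prescribed by the lecture:

```python
import numpy as np

def rls_armax_1st_order(y, u, lam=1.0, p0=100.0):
    """Pseudo-linear RLS for y(t) = -a*y(t-1) + b*u(t-1) + c*e(t-1) + e(t).
    Returns the trajectory of theta_hat(n) = [a, b, c] and the prediction errors."""
    N = len(y)
    theta = np.zeros(3)                 # theta_hat(0) = 0 (assumed initialization)
    P = p0 * np.eye(3)                  # P(0) = p0*I (assumed initialization)
    eps_prev = 0.0                      # eps(0) = 0
    theta_hist = np.zeros((N, 3))
    eps_hist = np.zeros(N)
    for n in range(1, N):
        phi = np.array([-y[n-1], u[n-1], eps_prev])           # phi(n)
        eps = y[n] - phi @ theta                              # eps(n)
        denom = lam + phi @ P @ phi
        P = (P - np.outer(P @ phi, phi @ P) / denom) / lam    # P(n)
        K = P @ phi                                           # K(n) = P(n)phi(n)
        theta = theta + K * eps                               # theta_hat(n)
        eps_prev = eps                                        # shift for step 5
        theta_hist[n], eps_hist[n] = theta, eps
    return theta_hist, eps_hist
```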
Example 1: Consider a system described by
y(t) + ay(t − 1) = bu(t − 1) + ce(t − 1) + e(t)
where {e(t)} is a white noise sequence with
variance 1. The system parameters are
a = −0.9, b = 1.0, c = 0.3
The input is
u(t) = 2 sin(πt/50 + π/3) + sin(πt/30)
1000 data samples are generated.
Figure: System input u and output y, plotted against t.
Figure: Results of the RLS algorithm (λ = 1): prediction error and parameter estimates θ̂, plotted against t.
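Example 1 can be reproduced along the following lines with the rls_armax_1st_order sketch above (the random seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c = -0.9, 1.0, 0.3
N = 1000
t = np.arange(N)
u = 2*np.sin(np.pi*t/50 + np.pi/3) + np.sin(np.pi*t/30)  # the lecture's input
e = rng.standard_normal(N)                               # white noise, variance 1
y = np.zeros(N)
for n in range(1, N):
    y[n] = -a*y[n-1] + b*u[n-1] + c*e[n-1] + e[n]

theta_hist, eps_hist = rls_armax_1st_order(y, u, lam=1.0)
print("final [a, b, c] estimate:", theta_hist[-1])       # approaches [-0.9, 1.0, 0.3]
```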
Determining the model structure
For a given data set, consider using the linear
regression
y(t) = φᵀ(t)θ + e(t)
with a sequence of model structures of increasing dimension, obtaining the best-fitting model within each structure.
e.g., the first model is (na = 1, nb = 1)
y(t) = −a1y(t − 1) + b1u(t − 1) + e(t)
and the second model is (na = 2, nb = 2)
y(t) = −a1y(t − 1) − a2y(t − 2) + b1u(t − 1) + b2u(t − 2) + e(t)
With more free parameters in the second model,
a better fit will be obtained. In determining the
best model structure, the important thing is to
investigate whether or not the improvement in
the fit is significant.
The model fit is given by the loss function
V(θ̂) = (1/N) Σ_{t=1}^{N} ε²(t)
for each model. Depending on the data set, we may obtain either of the following figures:
Figure: V(θ̂) versus model size. Model 2 is preferable, as Model 1 is not large enough to cover the true system.
Figure: V(θ̂) versus model size. Model 1 is preferable, as Model 2 is more complex but not significantly better than Model 1.
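A minimal sketch of this comparison for ARX structures is given below; the helper name arx_loss is made up for illustration, and batch least squares is used in place of RLS for simplicity:

```python
import numpy as np

def arx_loss(y, u, na, nb):
    """Fit y(t) = -a1*y(t-1) - ... - a_na*y(t-na) + b1*u(t-1) + ... + b_nb*u(t-nb) + e(t)
    by least squares and return the loss V = (1/N) * sum of eps(t)^2."""
    d = max(na, nb)
    rows = [np.concatenate([-y[t-na:t][::-1], u[t-nb:t][::-1]])
            for t in range(d, len(y))]
    Phi, Y = np.asarray(rows), y[d:]
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    eps = Y - Phi @ theta
    return np.mean(eps**2)

# Compare model 1 (na = nb = 1) with model 2 (na = nb = 2); only a small
# drop in V suggests that model 1 already covers the system:
# V1, V2 = arx_loss(y, u, 1, 1), arx_loss(y, u, 2, 2)
```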
If a model is more complex than is necessary,
the model may be over-parameterized.
Example 2. (Over-parameterization) Suppose
that the true signal y(t) is described by the
ARMA process
y(t) + ay(t − 1) = e(t) + ce(t − 1)
Let the model set be given by
y(t) + a1y(t − 1) + a2y(t − 2)
= e(t) + c1e(t − 1) + c2e(t − 2)
we see that all ai, ci such that
1 + a1q⁻¹ + a2q⁻² = (1 + aq⁻¹)(1 + dq⁻¹)
1 + c1q⁻¹ + c2q⁻² = (1 + cq⁻¹)(1 + dq⁻¹)
with an arbitrary number d give the same description of the system. Since ai, ci cannot be uniquely determined, we say that the model is over-parameterized. Consequently, ΦᵀΦ/N is singular.
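The common factor can be checked numerically. In the sketch below (the values of a, c and d are arbitrary illustrations), the expanded second-order polynomials give the same transfer function C(q⁻¹)/A(q⁻¹) for every choice of d:

```python
import numpy as np

a, c = -0.9, 0.3
for d in (0.0, 0.5, -0.4):               # arbitrary common factors
    A = np.polymul([1.0, a], [1.0, d])   # 1 + a1*q^-1 + a2*q^-2 = (1 + a*q^-1)(1 + d*q^-1)
    C = np.polymul([1.0, c], [1.0, d])   # 1 + c1*q^-1 + c2*q^-2 = (1 + c*q^-1)(1 + d*q^-1)
    # Evaluate C/A at a few points q^-1 = e^{-j*omega} on the unit circle:
    w = np.exp(-1j * np.linspace(0.0, np.pi, 5))
    H = np.polyval(C[::-1], w) / np.polyval(A[::-1], w)
    print(f"d = {d:+.1f}: a1, a2 = {A[1]:+.2f}, {A[2]:+.2f}  |C/A| =", np.round(np.abs(H), 6))
```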