State Space Modelling, the Kalman Filter

State Space Models
Let {x_t : t ∈ T} and {y_t : t ∈ T} denote two vector-valued time series that satisfy the system of equations:
$$y_t = A_t x_t + v_t \qquad \text{(the observation equation)}$$
$$x_t = B_t x_{t-1} + u_t \qquad \text{(the state equation)}$$
The time series {y_t : t ∈ T} is said to have a state-space representation.
Note: {u_t : t ∈ T} and {v_t : t ∈ T} denote two vector-valued time series that satisfy:
1. $E(u_t) = E(v_t) = 0$.
2. $E(u_t u_s') = E(v_t v_s') = 0$ if $t \neq s$.
3. $E(u_t u_t') = \Sigma_u$ and $E(v_t v_t') = \Sigma_v$.
4. $E(u_t v_s') = E(v_t u_s') = 0$ for all $t$ and $s$.
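As a concrete illustration, here is a minimal NumPy sketch that simulates such a system; the dimensions, the matrices A and B, and the noise covariances below are arbitrary illustrative choices, not part of the definition above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (time-invariant) system matrices
A = np.array([[1.0, 0.0]])          # observation matrix A_t (1 x 2)
B = np.array([[0.8, 0.1],
              [0.0, 0.5]])          # state transition matrix B_t (2 x 2)
Su = 0.1 * np.eye(2)                # Cov(u_t)
Sv = np.array([[0.25]])             # Cov(v_t)

T = 100
x = np.zeros(2)
xs, ys = [], []
for t in range(T):
    u = rng.multivariate_normal(np.zeros(2), Su)
    x = B @ x + u                   # state equation
    v = rng.multivariate_normal(np.zeros(1), Sv)
    y = A @ x + v                   # observation equation
    xs.append(x); ys.append(y)
```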
Example: One might be tracking an object with several radar stations. The process {x_t : t ∈ T} gives the position of the object at time t. The process {y_t : t ∈ T} denotes the observations at time t made by the several radar stations.
As in the Hidden Markov Model, we will be interested in determining the position of the object, {x_t : t ∈ T}, from the observations, {y_t : t ∈ T}, made by the several radar stations.
Example: Many of the models we have considered to date can be thought of as state-space models.
Autoregressive model of order p:
$$y_t = \beta_1 y_{t-1} + \beta_2 y_{t-2} + \cdots + \beta_p y_{t-p} + u_t$$
Define
$$x_t = \begin{bmatrix} y_{t-p+1} \\ \vdots \\ y_{t-1} \\ y_t \end{bmatrix}$$
Then
$$y_t = \begin{bmatrix} 0 & \cdots & 0 & 1 \end{bmatrix} x_t \qquad \text{(observation equation)}$$
and
$$x_t = B x_{t-1} + u_t \qquad \text{(state equation)}$$
where
$$B = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \beta_p & \beta_{p-1} & \cdots & \beta_2 & \beta_1 \end{bmatrix}, \qquad u_t = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ u_t \end{bmatrix}$$
Hidden Markov Model: Assume that there are m states. Assume also that the observations Y_t are discrete and take on n possible values.
Suppose that the m states are denoted by the
vectors:
1
0 
0 
0 
1
0 
e1   , e 2   , , e m   



 
 
 
0 
0 
1
Suppose that the n possible observations taken
at each state are
1
0 
0 
0 
1
0 
f1   , f 2   , , f n   



 
 
 
0 
0 
1
Let
$$\pi_{ij} = P\left(X_t = e_i \mid X_{t-1} = e_j\right), \qquad \Pi = \left(\pi_{ij}\right)_{m \times m}$$
and
$$\beta_{ij} = P\left(Y_t = f_i \mid X_t = e_j\right), \qquad B = \left(\beta_{ij}\right)_{n \times m}$$
Note that
$$E\left[X_t \mid X_{t-1} = e_j\right] = \begin{bmatrix} \pi_{1j} \\ \pi_{2j} \\ \vdots \\ \pi_{mj} \end{bmatrix} = \Pi e_j$$
Let
$$u_t = X_t - E\left[X_t \mid X_{t-1}\right] = X_t - \Pi X_{t-1}$$
So that
$$X_t = \Pi X_{t-1} + u_t \qquad \text{(the state equation)}$$
with
$$E\left[u_t \mid X_{t-1}, X_{t-2}, \ldots\right] = E\left[u_t \mid X_{t-1}\right] = 0$$
Also
$$X_t X_t' = \left(\Pi X_{t-1} + u_t\right)\left(\Pi X_{t-1} + u_t\right)' = \left(\Pi X_{t-1}\right)\left(\Pi X_{t-1}\right)' + \Pi X_{t-1} u_t' + u_t \left(\Pi X_{t-1}\right)' + u_t u_t'$$
Hence
$$u_t u_t' = X_t X_t' - \left(\Pi X_{t-1}\right)\left(\Pi X_{t-1}\right)' - \Pi X_{t-1} u_t' - u_t \left(\Pi X_{t-1}\right)'$$
and
$$\Sigma_u = E\left[u_t u_t' \mid X_{t-1}\right] = E\left[X_t X_t' \mid X_{t-1}\right] - \left(\Pi X_{t-1}\right)\left(\Pi X_{t-1}\right)' = E\left[\operatorname{diag}(X_t) \mid X_{t-1}\right] - \Pi \operatorname{diag}(X_{t-1}) \Pi'$$
where diag(v) = the diagonal matrix with the components of the vector v along the diagonal. (Here $X_t X_t' = \operatorname{diag}(X_t)$ and $X_{t-1} X_{t-1}' = \operatorname{diag}(X_{t-1})$ because $X_t$ and $X_{t-1}$ are coordinate unit vectors.)
Since
$$X_t = \Pi X_{t-1} + u_t$$
then
$$\operatorname{diag}(X_t) = \operatorname{diag}\left(\Pi X_{t-1}\right) + \operatorname{diag}(u_t)$$
and
$$E\left[\operatorname{diag}(X_t) \mid X_{t-1}\right] = \operatorname{diag}\left(\Pi X_{t-1}\right)$$
Thus
$$\Sigma_u = \operatorname{diag}\left(\Pi X_{t-1}\right) - \Pi \operatorname{diag}(X_{t-1}) \Pi'$$
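A quick numerical check of this identity, simulating many transitions from a fixed state X_{t−1} = e_j; the 3-state transition matrix Π below is an arbitrary illustrative example (column j holds the distribution of X_t given X_{t−1} = e_j):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative transition matrix: column j gives P(X_t = e_i | X_{t-1} = e_j)
Pi = np.array([[0.7, 0.2, 0.1],
               [0.2, 0.5, 0.3],
               [0.1, 0.3, 0.6]])

j = 1                                 # condition on X_{t-1} = e_j
p = Pi[:, j]                          # = Pi @ e_j

# Theoretical Sigma_u = diag(Pi X_{t-1}) - Pi diag(X_{t-1}) Pi'
Sigma_u = np.diag(p) - np.outer(p, p)

# Monte Carlo estimate of Cov(u_t | X_{t-1} = e_j)
n = 200_000
X_t = rng.multinomial(1, p, size=n)   # rows are one-hot draws of X_t
u = X_t - p                           # u_t = X_t - Pi X_{t-1}
print(np.allclose(u.T @ u / n, Sigma_u, atol=5e-3))
```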
We have defined
$$\beta_{ij} = P\left(Y_t = f_i \mid X_t = e_j\right), \qquad B = \left(\beta_{ij}\right)_{n \times m}$$
Hence
$$E\left[Y_t \mid X_t = e_j\right] = \begin{bmatrix} \beta_{1j} \\ \beta_{2j} \\ \vdots \\ \beta_{nj} \end{bmatrix} = B e_j$$
Let
$$v_t = Y_t - E\left[Y_t \mid X_t\right] = Y_t - B X_t$$
Then
$$Y_t = B X_t + v_t \qquad \text{(the observation equation)}$$
with
$$E\left[v_t \mid X_t\right] = 0$$
and
$$\Sigma_v = E\left[v_t v_t' \mid X_t\right] = \operatorname{diag}\left(B X_t\right) - B \operatorname{diag}(X_t) B'$$
Hence with these definitions the state sequence
of a Hidden Markov Model satisfies:
$$X_t = \Pi X_{t-1} + u_t \qquad \text{(the state equation)}$$
with $E\left[u_t \mid X_{t-1}\right] = 0$
and $\Sigma_u = E\left[u_t u_t' \mid X_{t-1}\right] = \operatorname{diag}\left(\Pi X_{t-1}\right) - \Pi \operatorname{diag}(X_{t-1}) \Pi'$.
The observation sequence satisfies:
$$Y_t = B X_t + v_t \qquad \text{(the observation equation)}$$
with $E\left[v_t \mid X_t\right] = 0$
and $\Sigma_v = E\left[v_t v_t' \mid X_t\right] = \operatorname{diag}\left(B X_t\right) - B \operatorname{diag}(X_t) B'$.
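Putting the two pieces together, a minimal sketch that simulates a hidden Markov chain and its observations directly in this state-space form; the matrices Π and B below are arbitrary illustrative choices with columns summing to one:

```python
import numpy as np

rng = np.random.default_rng(2)

# Column j of Pi: P(X_t = e_i | X_{t-1} = e_j); column j of Bmat: P(Y_t = f_i | X_t = e_j)
Pi = np.array([[0.9, 0.2],
               [0.1, 0.8]])           # m = 2 states
Bmat = np.array([[0.6, 0.1],
                 [0.3, 0.3],
                 [0.1, 0.6]])         # n = 3 observation symbols

T = 50
x = np.array([1.0, 0.0])              # start in state e_1
states, obs = [], []
for t in range(T):
    x = rng.multinomial(1, Pi @ x)    # X_t = Pi X_{t-1} + u_t  (one-hot draw)
    y = rng.multinomial(1, Bmat @ x)  # Y_t = B X_t + v_t       (one-hot draw)
    states.append(x); obs.append(y)
```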
Kalman Filtering
We are now interested in determining the state
vector xt in terms of some or all of the
observation vectors y1, y2, y3, … , yT.
We will consider finding the “best” linear
predictor.
A constant term can be included by letting one of the observations (y0, say) be the vector of 1’s.
We will consider estimation of xt in terms of
1. y1, y2, y3, … , yt-1 (the prediction problem)
2. y1, y2, y3, … , yt (the filtering problem)
3. y1, y2, y3, … , yT (t < T, the smoothing problem)
For any vector x define:
$$\hat{x}(s) = \begin{bmatrix} \hat{x}^{(1)}(s) \\ \hat{x}^{(2)}(s) \\ \vdots \\ \hat{x}^{(p)}(s) \end{bmatrix}$$
where $\hat{x}^{(i)}(s)$ is the best linear predictor of $x^{(i)}$, the i-th component of x, based on y0, y1, y2, … , ys.
The best linear predictor of $x^{(i)}$ is the linear function of y0, y1, y2, … , ys that minimizes
$$E\left[\left(x^{(i)} - \hat{x}^{(i)}(s)\right)^2\right]$$
Remark: The best predictor is the unique vector of the form:
$$\hat{x}(s) = C_0 y_0 + C_1 y_1 + \cdots + C_s y_s$$
where C0, C1, C2, … , Cs are selected so that:
$$x - \hat{x}(s) \perp y_i, \qquad i = 0, 1, 2, \ldots, s$$
i.e.
$$E\left[\left(x - \hat{x}(s)\right) y_i'\right] = 0, \qquad i = 0, 1, 2, \ldots, s$$
Remark: If x, y1, y2, … , ys are normally distributed then:
$$\hat{x}(s) = E\left[x \mid y_1, y_2, \ldots, y_s\right]$$
Remark: Let u and v be two random vectors. Then
$$\hat{u}(v) = C v$$
is the optimal linear predictor of u based on v if
$$C = E\left(u v'\right)\left[E\left(v v'\right)\right]^{-1}$$
Kalman Filtering:
Let {x_t : t ∈ T} and {y_t : t ∈ T} denote two vector-valued time series that satisfy the system of equations:
$$y_t = A_t x_t + v_t$$
$$x_t = B x_{t-1} + u_t$$
Let
$$\hat{x}_t(s) = E\left[x_t \mid y_1, y_2, \ldots, y_s\right]$$
and
$$\Sigma_{tu}^{s} = E\left[\left(x_t - \hat{x}_t(s)\right)\left(x_u - \hat{x}_u(s)\right)' \mid y_1, y_2, \ldots, y_s\right]$$
Then
$$\hat{x}_t(t-1) = B\, \hat{x}_{t-1}(t-1)$$
$$\hat{x}_t(t) = \hat{x}_t(t-1) + K_t\left[y_t - A_t \hat{x}_t(t-1)\right]$$
where
$$K_t = \Sigma_{tt}^{t-1} A_t' \left[A_t \Sigma_{tt}^{t-1} A_t' + \Sigma_v\right]^{-1}$$
One also assumes that the initial vector x0 has mean μ and covariance matrix Σ, and that
$$\hat{x}_0(0) = \mu$$
The covariance matrices are updated by
$$\Sigma_{tt}^{t-1} = B\, \Sigma_{t-1,t-1}^{t-1} B' + \Sigma_u$$
$$\Sigma_{tt}^{t} = \Sigma_{tt}^{t-1} - K_t A_t \Sigma_{tt}^{t-1}$$
with
$$\Sigma_{00}^{0} = \Sigma$$
Summary: The Kalman equations
1. $\Sigma_{tt}^{t-1} = B\, \Sigma_{t-1,t-1}^{t-1} B' + \Sigma_u$
2. $K_t = \Sigma_{tt}^{t-1} A_t' \left[A_t \Sigma_{tt}^{t-1} A_t' + \Sigma_v\right]^{-1}$
3. $\Sigma_{tt}^{t} = \Sigma_{tt}^{t-1} - K_t A_t \Sigma_{tt}^{t-1}$
4. $\hat{x}_t(t-1) = B\, \hat{x}_{t-1}(t-1)$
5. $\hat{x}_t(t) = \hat{x}_t(t-1) + K_t\left[y_t - A_t \hat{x}_t(t-1)\right]$
with $\hat{x}_0(0) = \mu$ and $\Sigma_{00}^{0} = \Sigma$.
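Below is a minimal NumPy sketch of this forward recursion, following equations (1)–(5); it assumes a time-invariant observation matrix A purely to keep the sketch short, and the function name and return layout are illustrative choices:

```python
import numpy as np

def kalman_filter(y, A, B, Su, Sv, mu0, Sigma0):
    """Forward Kalman recursion, equations (1)-(5), with time-invariant A.

    y : (T, q) array of observations y_1, ..., y_T.
    Returns the predicted and filtered means/covariances
    x_t(t-1), Sigma_tt(t-1), x_t(t), Sigma_tt(t) for t = 1, ..., T.
    """
    x_filt, P_filt = np.asarray(mu0, float), np.asarray(Sigma0, float)
    x_pred_all, P_pred_all, x_filt_all, P_filt_all = [], [], [], []
    for yt in y:
        x_pred = B @ x_filt                      # (4) predicted state
        P_pred = B @ P_filt @ B.T + Su           # (1) prediction covariance
        K = P_pred @ A.T @ np.linalg.inv(A @ P_pred @ A.T + Sv)   # (2) gain
        x_filt = x_pred + K @ (yt - A @ x_pred)  # (5) filtered state
        P_filt = P_pred - K @ A @ P_pred         # (3) filtered covariance
        x_pred_all.append(x_pred); P_pred_all.append(P_pred)
        x_filt_all.append(x_filt); P_filt_all.append(P_filt)
    return x_pred_all, P_pred_all, x_filt_all, P_filt_all
```

In practice one would solve the linear system (e.g. with np.linalg.solve) rather than form an explicit inverse, for numerical stability.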
Proof:
Now
$$\hat{x}_t(s) = E\left[x_t \mid y_1, y_2, \ldots, y_s\right]$$
hence
$$\hat{x}_t(t-1) = E\left[x_t \mid y_1, y_2, \ldots, y_{t-1}\right] = E\left[B x_{t-1} + u_t \mid y_1, y_2, \ldots, y_{t-1}\right] = B\, E\left[x_{t-1} \mid y_1, y_2, \ldots, y_{t-1}\right] = B\, \hat{x}_{t-1}(t-1)$$
proving (4).
Note
$$\hat{y}_t(t-1) = E\left[y_t \mid y_1, y_2, \ldots, y_{t-1}\right] = E\left[A_t x_t + v_t \mid y_1, y_2, \ldots, y_{t-1}\right] = A_t E\left[x_t \mid y_1, y_2, \ldots, y_{t-1}\right] = A_t \hat{x}_t(t-1)$$
Let
$$e_t = y_t - \hat{y}_t(t-1) = y_t - A_t \hat{x}_t(t-1) = A_t x_t + v_t - A_t \hat{x}_t(t-1) = A_t\left[x_t - \hat{x}_t(t-1)\right] + v_t$$
Let
$$d_t = x_t - \hat{x}_t(t-1)$$
Given y0, y1, y2, … , yt-1, the best linear predictor of d_t using e_t is:
$$E\left[d_t e_t'\right]\left[E\left(e_t e_t'\right)\right]^{-1} e_t = E\left[d_t \mid y_0, y_1, \ldots, y_{t-1}, e_t\right] = E\left[d_t \mid y_0, y_1, \ldots, y_{t-1}, y_t\right] = \hat{x}_t(t) - \hat{x}_t(t-1)$$
Hence
$$\hat{x}_t(t) = \hat{x}_t(t-1) + K_t e_t \qquad (5)$$
where
$$e_t = y_t - A_t \hat{x}_t(t-1) \quad \text{and} \quad K_t = E\left[d_t e_t'\right]\left[E\left(e_t e_t'\right)\right]^{-1}$$
Now
$$E\left[d_t e_t'\right] = E\left[\left(x_t - \hat{x}_t(t-1)\right)\left(A_t\left(x_t - \hat{x}_t(t-1)\right) + v_t\right)'\right] = E\left[\left(x_t - \hat{x}_t(t-1)\right)\left(x_t - \hat{x}_t(t-1)\right)'\right] A_t' = \Sigma_{tt}^{t-1} A_t'$$
Also
$$E\left[e_t e_t'\right] = E\left[\left(A_t\left(x_t - \hat{x}_t(t-1)\right) + v_t\right)\left(A_t\left(x_t - \hat{x}_t(t-1)\right) + v_t\right)'\right]$$
$$= A_t E\left[\left(x_t - \hat{x}_t(t-1)\right)\left(x_t - \hat{x}_t(t-1)\right)'\right] A_t' + A_t E\left[\left(x_t - \hat{x}_t(t-1)\right) v_t'\right] + E\left[v_t\left(x_t - \hat{x}_t(t-1)\right)'\right] A_t' + E\left[v_t v_t'\right]$$
$$= A_t \Sigma_{tt}^{t-1} A_t' + \Sigma_v$$
hence
$$K_t = \Sigma_{tt}^{t-1} A_t' \left[A_t \Sigma_{tt}^{t-1} A_t' + \Sigma_v\right]^{-1} \qquad (2)$$
Thus
$$\hat{x}_t(t-1) = B\, \hat{x}_{t-1}(t-1) \qquad (4)$$
$$\hat{x}_t(t) = \hat{x}_t(t-1) + K_t\left[y_t - A_t \hat{x}_t(t-1)\right] \qquad (5)$$
where
$$K_t = \Sigma_{tt}^{t-1} A_t' \left[A_t \Sigma_{tt}^{t-1} A_t' + \Sigma_v\right]^{-1} \qquad (2)$$

Also
$$\Sigma_{tt}^{t-1} = E\left[\left(x_t - \hat{x}_t(t-1)\right)\left(x_t - \hat{x}_t(t-1)\right)' \mid y_0, \ldots, y_{t-1}\right]$$
$$= E\left[\left(B x_{t-1} + u_t - B \hat{x}_{t-1}(t-1)\right)\left(B x_{t-1} + u_t - B \hat{x}_{t-1}(t-1)\right)' \mid y_0, \ldots, y_{t-1}\right]$$
Hence
$$\Sigma_{tt}^{t-1} = B\, \Sigma_{t-1,t-1}^{t-1} B' + \Sigma_u \qquad (1)$$
The proof that
$$\Sigma_{tt}^{t} = \Sigma_{tt}^{t-1} - K_t A_t \Sigma_{tt}^{t-1} \qquad (3)$$
will be left as an exercise.
Example:
Suppose we have an AR(2) time series
$$x_t = \beta_1 x_{t-1} + \beta_2 x_{t-2} + u_t$$
What is observed is the time series
$$y_t = x_t + v_t$$
{u_t : t ∈ T} and {v_t : t ∈ T} are white noise time series with standard deviations σ_u and σ_v.
This model can be expressed as a state-space model by defining:
$$\mathbf{x}_t = \begin{bmatrix} x_t \\ x_{t-1} \end{bmatrix}, \qquad B = \begin{bmatrix} \beta_1 & \beta_2 \\ 1 & 0 \end{bmatrix}, \qquad \mathbf{u}_t = \begin{bmatrix} u_t \\ 0 \end{bmatrix}$$
then
$$\begin{bmatrix} x_t \\ x_{t-1} \end{bmatrix} = \begin{bmatrix} \beta_1 & \beta_2 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x_{t-1} \\ x_{t-2} \end{bmatrix} + \begin{bmatrix} u_t \\ 0 \end{bmatrix}$$
or $\mathbf{x}_t = B \mathbf{x}_{t-1} + \mathbf{u}_t$.
The equation
$$y_t = x_t + v_t$$
can be written
$$y_t = \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} x_t \\ x_{t-1} \end{bmatrix} + v_t = A \mathbf{x}_t + v_t$$
Note:
$$\Sigma_v = \sigma_v^2, \qquad \Sigma_u = \begin{bmatrix} \sigma_u^2 & 0 \\ 0 & 0 \end{bmatrix}$$
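In code, the matrices of this AR(2) state-space form are as follows (the values of β1, β2, σ_u and σ_v are illustrative choices); they can be passed directly to the kalman_filter sketch given earlier:

```python
import numpy as np

beta1, beta2 = 0.6, 0.3       # illustrative AR(2) coefficients
sigma_u, sigma_v = 1.0, 0.5   # illustrative noise standard deviations

B = np.array([[beta1, beta2],
              [1.0,   0.0]])      # state equation: x_t = B x_{t-1} + u_t
A = np.array([[1.0, 0.0]])        # observation equation: y_t = A x_t + v_t
Su = np.array([[sigma_u**2, 0.0],
               [0.0,        0.0]])
Sv = np.array([[sigma_v**2]])
```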
The Kalman equations
1. $\Sigma_{tt}^{t-1} = B\, \Sigma_{t-1,t-1}^{t-1} B' + \Sigma_u$
2. $K_t = \Sigma_{tt}^{t-1} A_t' \left[A_t \Sigma_{tt}^{t-1} A_t' + \Sigma_v\right]^{-1}$
3. $\Sigma_{tt}^{t} = \Sigma_{tt}^{t-1} - K_t A_t \Sigma_{tt}^{t-1}$
4. $\hat{x}_t(t-1) = B\, \hat{x}_{t-1}(t-1)$
5. $\hat{x}_t(t) = \hat{x}_t(t-1) + K_t\left[y_t - A_t \hat{x}_t(t-1)\right]$
Let
$$\Sigma_{tt}^{t-1} = \begin{bmatrix} s_{11}^{t} & s_{12}^{t} \\ s_{12}^{t} & s_{22}^{t} \end{bmatrix}, \qquad \Sigma_{tt}^{t} = \begin{bmatrix} r_{11}^{t} & r_{12}^{t} \\ r_{12}^{t} & r_{22}^{t} \end{bmatrix}$$
The Kalman equations
1. $\Sigma_{tt}^{t-1} = B\, \Sigma_{t-1,t-1}^{t-1} B' + \Sigma_u$:
$$\begin{bmatrix} s_{11}^{t} & s_{12}^{t} \\ s_{12}^{t} & s_{22}^{t} \end{bmatrix} = \begin{bmatrix} \beta_1 & \beta_2 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} r_{11}^{t-1} & r_{12}^{t-1} \\ r_{12}^{t-1} & r_{22}^{t-1} \end{bmatrix}\begin{bmatrix} \beta_1 & 1 \\ \beta_2 & 0 \end{bmatrix} + \begin{bmatrix} \sigma_u^2 & 0 \\ 0 & 0 \end{bmatrix}$$
so that
$$s_{11}^{t} = \beta_1^2 r_{11}^{t-1} + 2\beta_1\beta_2 r_{12}^{t-1} + \beta_2^2 r_{22}^{t-1} + \sigma_u^2$$
$$s_{12}^{t} = \beta_1 r_{11}^{t-1} + \beta_2 r_{12}^{t-1}$$
$$s_{22}^{t} = r_{11}^{t-1}$$
2. $K_t = \Sigma_{tt}^{t-1} A_t' \left[A_t \Sigma_{tt}^{t-1} A_t' + \Sigma_v\right]^{-1}$:
$$K_t = \begin{bmatrix} s_{11}^{t} & s_{12}^{t} \\ s_{12}^{t} & s_{22}^{t} \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix}\left(\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} s_{11}^{t} & s_{12}^{t} \\ s_{12}^{t} & s_{22}^{t} \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} + \sigma_v^2\right)^{-1}$$
$$= \begin{bmatrix} s_{11}^{t} \\ s_{12}^{t} \end{bmatrix}\left(s_{11}^{t} + \sigma_v^2\right)^{-1} = \begin{bmatrix} \dfrac{s_{11}^{t}}{s_{11}^{t} + \sigma_v^2} \\[2mm] \dfrac{s_{12}^{t}}{s_{11}^{t} + \sigma_v^2} \end{bmatrix}$$
3. $\Sigma_{tt}^{t} = \Sigma_{tt}^{t-1} - K_t A_t \Sigma_{tt}^{t-1}$:
$$\begin{bmatrix} r_{11}^{t} & r_{12}^{t} \\ r_{12}^{t} & r_{22}^{t} \end{bmatrix} = \begin{bmatrix} s_{11}^{t} & s_{12}^{t} \\ s_{12}^{t} & s_{22}^{t} \end{bmatrix} - K_t\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} s_{11}^{t} & s_{12}^{t} \\ s_{12}^{t} & s_{22}^{t} \end{bmatrix}$$
$$= \begin{bmatrix} s_{11}^{t} & s_{12}^{t} \\ s_{12}^{t} & s_{22}^{t} \end{bmatrix} - \frac{1}{s_{11}^{t} + \sigma_v^2}\begin{bmatrix} \left(s_{11}^{t}\right)^2 & s_{11}^{t} s_{12}^{t} \\ s_{11}^{t} s_{12}^{t} & \left(s_{12}^{t}\right)^2 \end{bmatrix}$$
so that
$$r_{11}^{t} = s_{11}^{t} - \frac{\left(s_{11}^{t}\right)^2}{s_{11}^{t} + \sigma_v^2} = \frac{s_{11}^{t}\,\sigma_v^2}{s_{11}^{t} + \sigma_v^2}$$
$$r_{12}^{t} = s_{12}^{t} - \frac{s_{11}^{t} s_{12}^{t}}{s_{11}^{t} + \sigma_v^2} = \frac{s_{12}^{t}\,\sigma_v^2}{s_{11}^{t} + \sigma_v^2}$$
$$r_{22}^{t} = s_{22}^{t} - \frac{\left(s_{12}^{t}\right)^2}{s_{11}^{t} + \sigma_v^2}$$
4. $\hat{x}_t(t-1) = B\, \hat{x}_{t-1}(t-1)$:
$$\begin{bmatrix} \hat{x}_t(t-1) \\ \hat{x}_{t-1}(t-1) \end{bmatrix} = \begin{bmatrix} \beta_1 & \beta_2 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} \hat{x}_{t-1}(t-1) \\ \hat{x}_{t-2}(t-1) \end{bmatrix}$$
so that
$$\hat{x}_t(t-1) = \beta_1 \hat{x}_{t-1}(t-1) + \beta_2 \hat{x}_{t-2}(t-1)$$
5. $\hat{x}_t(t) = \hat{x}_t(t-1) + K_t\left[y_t - A_t \hat{x}_t(t-1)\right]$:
$$\begin{bmatrix} \hat{x}_t(t) \\ \hat{x}_{t-1}(t) \end{bmatrix} = \begin{bmatrix} \hat{x}_t(t-1) \\ \hat{x}_{t-1}(t-1) \end{bmatrix} + K_t\left(y_t - \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} \hat{x}_t(t-1) \\ \hat{x}_{t-1}(t-1) \end{bmatrix}\right)$$
$$= \begin{bmatrix} \hat{x}_t(t-1) \\ \hat{x}_{t-1}(t-1) \end{bmatrix} + \begin{bmatrix} \dfrac{s_{11}^{t}}{s_{11}^{t} + \sigma_v^2} \\[2mm] \dfrac{s_{12}^{t}}{s_{11}^{t} + \sigma_v^2} \end{bmatrix}\left(y_t - \hat{x}_t(t-1)\right)$$
so that
$$\hat{x}_t(t) = \hat{x}_t(t-1) + \frac{s_{11}^{t}}{s_{11}^{t} + \sigma_v^2}\left(y_t - \hat{x}_t(t-1)\right)$$
$$\hat{x}_{t-1}(t) = \hat{x}_{t-1}(t-1) + \frac{s_{12}^{t}}{s_{11}^{t} + \sigma_v^2}\left(y_t - \hat{x}_t(t-1)\right)$$
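These scalar recursions can be coded directly; below is a sketch of one forward step for the AR(2) example (the function name and argument layout are illustrative, written only to mirror equations 1-5 above):

```python
def ar2_kalman_step(y_t, xhat, xhat_lag, r11, r12, r22,
                    beta1, beta2, sigma_u2, sigma_v2):
    """One forward step of the AR(2) Kalman recursions in scalar form.
    xhat, xhat_lag are x_{t-1}(t-1) and x_{t-2}(t-1);
    r11, r12, r22 are the elements of Sigma_{t-1,t-1}(t-1)."""
    # Equation 1: prediction covariance Sigma_tt(t-1)
    s11 = beta1**2 * r11 + 2 * beta1 * beta2 * r12 + beta2**2 * r22 + sigma_u2
    s12 = beta1 * r11 + beta2 * r12
    s22 = r11
    # Equation 4: predicted state
    x_pred = beta1 * xhat + beta2 * xhat_lag
    # Equations 2 and 5: gain and filtered state
    denom = s11 + sigma_v2
    x_filt = x_pred + (s11 / denom) * (y_t - x_pred)
    x_filt_lag = xhat + (s12 / denom) * (y_t - x_pred)
    # Equation 3: filtered covariance Sigma_tt(t)
    r11_new = s11 * sigma_v2 / denom
    r12_new = s12 * sigma_v2 / denom
    r22_new = s22 - s12**2 / denom
    return x_filt, x_filt_lag, r11_new, r12_new, r22_new
```

The returned filtered values become xhat, xhat_lag, r11, r12, r22 for the next step.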
Kalman Filtering (smoothing):
Now consider finding
$$\hat{x}_t(T) = E\left[x_t \mid y_1, y_2, \ldots, y_T\right]$$
These can be found by successive backward recursions for t = T, T − 1, … , 2, 1:
$$\hat{x}_{t-1}(T) = \hat{x}_{t-1}(t-1) + J_{t-1}\left[\hat{x}_t(T) - \hat{x}_t(t-1)\right]$$
where
$$J_{t-1} = \Sigma_{t-1,t-1}^{t-1} B' \left[\Sigma_{tt}^{t-1}\right]^{-1}$$
and
$$\Sigma_{tu}^{s} = E\left[\left(x_t - \hat{x}_t(s)\right)\left(x_u - \hat{x}_u(s)\right)' \mid y_1, y_2, \ldots, y_s\right]$$
The covariance matrices satisfy the recursions
$$\Sigma_{t-1,t-1}^{T} = \Sigma_{t-1,t-1}^{t-1} + J_{t-1}\left[\Sigma_{tt}^{T} - \Sigma_{tt}^{t-1}\right] J_{t-1}'$$
The backward recursions
1. $J_{t-1} = \Sigma_{t-1,t-1}^{t-1} B' \left[\Sigma_{tt}^{t-1}\right]^{-1}$
2. $\hat{x}_{t-1}(T) = \hat{x}_{t-1}(t-1) + J_{t-1}\left[\hat{x}_t(T) - \hat{x}_t(t-1)\right]$
3. $\Sigma_{t-1,t-1}^{T} = \Sigma_{t-1,t-1}^{t-1} + J_{t-1}\left[\Sigma_{tt}^{T} - \Sigma_{tt}^{t-1}\right] J_{t-1}'$
In the example:
$$B = \begin{bmatrix} \beta_1 & \beta_2 \\ 1 & 0 \end{bmatrix}, \qquad \Sigma_{tt}^{t} = \begin{bmatrix} r_{11}^{t} & r_{12}^{t} \\ r_{12}^{t} & r_{22}^{t} \end{bmatrix}, \qquad \Sigma_{tt}^{t-1} = \begin{bmatrix} s_{11}^{t} & s_{12}^{t} \\ s_{12}^{t} & s_{22}^{t} \end{bmatrix}$$
with $\Sigma_{tt}^{t-1}$, $\Sigma_{tt}^{t}$, $\hat{x}_t(t-1)$ and $\hat{x}_t(t)$ calculated in the forward recursion.
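Tying the example together, a sketch that simulates an AR(2) series, runs the forward filter, and then the backward smoother; it reuses the illustrative kalman_filter and kalman_smoother sketches defined above, and the parameter values are again arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
beta1, beta2, sigma_u, sigma_v = 0.6, 0.3, 1.0, 0.5    # illustrative values

B = np.array([[beta1, beta2], [1.0, 0.0]])
A = np.array([[1.0, 0.0]])
Su = np.diag([sigma_u**2, 0.0])
Sv = np.array([[sigma_v**2]])

# Simulate the AR(2) state and the noisy observations
T = 200
x = np.zeros(2)
xs, ys = [], []
for t in range(T):
    x = B @ x + np.array([rng.normal(0, sigma_u), 0.0])
    y = A @ x + rng.normal(0, sigma_v, size=1)
    xs.append(x); ys.append(y)

# Forward filter, then backward smoother
x_pred, P_pred, x_filt, P_filt = kalman_filter(np.array(ys), A, B, Su, Sv,
                                               np.zeros(2), np.eye(2))
x_sm, P_sm = kalman_smoother(x_pred, P_pred, x_filt, P_filt, B)

# Smoothing should not do worse than filtering (in mean squared error)
true_x = np.array(xs)[:, 0]
print(np.mean((true_x - np.array(x_filt)[:, 0])**2),
      np.mean((true_x - np.array(x_sm)[:, 0])**2))
```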