Kalman Filtering

• (Optimal) estimation of the (hidden) state of a linear dynamic process, from which we obtain noisy (partial) measurements
• Example: radar tracking of an airplane. What is the state of the airplane given noisy radar measurements of its position?
Model
• Discrete time steps, continuous state-space
• (Hidden) state: $\mathbf{x}_t$, measurement: $y_t$
• Airplane example:
$$\mathbf{x}_t = \begin{pmatrix} x_t \\ \dot{x}_t \\ \ddot{x}_t \end{pmatrix}, \qquad y_t \approx x_t$$
• Position, speed and acceleration
Dynamics and Observation model
• A linear model describes the relation between the current state and the next state, and between the state and the observation:
$$x_{t+1} = A x_t + w_t, \quad w_t \sim W_t = N(0, Q)$$
$$y_t = C x_t + v_t, \quad v_t \sim V_t = N(0, R)$$
• Airplane example (if the process has time step $\Delta$):
$$A = \begin{pmatrix} 1 & \Delta & \tfrac{1}{2}\Delta^2 \\ 0 & 1 & \Delta \\ 0 & 0 & 1 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix}$$
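As a concrete illustration, here is a minimal NumPy sketch of this constant-acceleration model; the time-step value `dt` and the variable names are our own choices, not from the slides:

```python
import numpy as np

dt = 0.1  # time step (the slides' Delta); illustrative value

# Constant-acceleration dynamics: state = (position, speed, acceleration)
A = np.array([[1.0, dt,  0.5 * dt**2],
              [0.0, 1.0, dt],
              [0.0, 0.0, 1.0]])

# Only the (noisy) position is observed
C = np.array([[1.0, 0.0, 0.0]])
```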
Normal distributions
• Let $X_0$ be a normal distribution of the initial state $x_0$
• Then every $X_t$ is a normal distribution of the hidden state $x_t$. Recursive definition:
$$X_{t+1} = A X_t + W_t$$
• And every $Y_t$ is a normal distribution of the observation $y_t$. Definition:
$$Y_t = C X_t + V_t$$
• Goal of filtering: compute the conditional distribution $(X_t \mid Y_0 = y_0, \ldots, Y_t = y_t)$
Normal distribution
• Because the $X_t$’s and $Y_t$’s are normal distributions, $(X_t \mid Y_0 = y_0, \ldots, Y_t = y_t)$ is also a normal distribution
• A normal distribution is fully specified by its mean and covariance
• We denote:
$$\begin{aligned}
X_{t|s} &= (X_t \mid Y_0 = y_0, \ldots, Y_s = y_s) \\
&= N\big(E[X_t \mid Y_0 = y_0, \ldots, Y_s = y_s],\ \mathrm{Var}[X_t \mid Y_0 = y_0, \ldots, Y_s = y_s]\big) \\
&= N\big(\hat{x}_{t|s},\ P_{t|s}\big)
\end{aligned}$$
• The problem reduces to computing $\hat{x}_{t|t}$ and $P_{t|t}$
Recursive update of state
• Kalman filtering algorithm: repeat…
– Time update: from $X_{t|t}$, compute the a priori distribution $X_{t+1|t}$
– Measurement update: from $X_{t+1|t}$ (and given $y_{t+1}$), compute the a posteriori distribution $X_{t+1|t+1}$
[Graphical model: Markov chain $X_0 \to X_1 \to X_2 \to \cdots$, with each hidden state $X_t$ emitting an observation $Y_t$]
Time update
• From $X_{t|t}$, compute the a priori distribution $X_{t+1|t}$:
$$\begin{aligned}
X_{t+1|t} &= A X_{t|t} + W_t \\
&= N\big(E[A X_{t|t} + W_t],\ \mathrm{Var}[A X_{t|t} + W_t]\big) \\
&= N\big(A\,E[X_{t|t}] + E[W_t],\ A\,\mathrm{Var}[X_{t|t}]\,A^T + \mathrm{Var}[W_t]\big) \\
&= N\big(A \hat{x}_{t|t},\ A P_{t|t} A^T + Q\big)
\end{aligned}$$
• So,
$$\hat{x}_{t+1|t} = A \hat{x}_{t|t}, \qquad P_{t+1|t} = A P_{t|t} A^T + Q$$
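In code, the time update is two lines; a sketch, assuming NumPy arrays and a (hypothetical) function name of our own:

```python
import numpy as np

def time_update(x_hat, P, A, Q):
    """From N(x_hat, P) = X_{t|t}, return the a priori X_{t+1|t}."""
    x_pred = A @ x_hat        # x_hat_{t+1|t} = A x_hat_{t|t}
    P_pred = A @ P @ A.T + Q  # P_{t+1|t} = A P_{t|t} A^T + Q
    return x_pred, P_pred
```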
Measurement update
• From $X_{t+1|t}$ (and given $y_{t+1}$), compute $X_{t+1|t+1}$.
• 1. Compute the a priori distribution of the observation, $Y_{t+1|t}$, from $X_{t+1|t}$:
$$\begin{aligned}
Y_{t+1|t} &= C X_{t+1|t} + V_{t+1} \\
&= N\big(E[C X_{t+1|t} + V_{t+1}],\ \mathrm{Var}[C X_{t+1|t} + V_{t+1}]\big) \\
&= N\big(C\,E[X_{t+1|t}] + E[V_{t+1}],\ C\,\mathrm{Var}[X_{t+1|t}]\,C^T + \mathrm{Var}[V_{t+1}]\big) \\
&= N\big(C \hat{x}_{t+1|t},\ C P_{t+1|t} C^T + R\big)
\end{aligned}$$
Measurement update (cont’d)
• 2. Look at the joint distribution of $X_{t+1|t}$ and $Y_{t+1|t}$:
$$\begin{aligned}
(X_{t+1|t}, Y_{t+1|t}) &= N\left( \begin{pmatrix} E[X_{t+1|t}] \\ E[Y_{t+1|t}] \end{pmatrix},\ \begin{pmatrix} \mathrm{Var}[X_{t+1|t}] & \mathrm{Cov}[X_{t+1|t}, Y_{t+1|t}] \\ \mathrm{Cov}[Y_{t+1|t}, X_{t+1|t}] & \mathrm{Var}[Y_{t+1|t}] \end{pmatrix} \right) \\
&= N\left( \begin{pmatrix} \hat{x}_{t+1|t} \\ C \hat{x}_{t+1|t} \end{pmatrix},\ \begin{pmatrix} P_{t+1|t} & P_{t+1|t} C^T \\ C P_{t+1|t} & C P_{t+1|t} C^T + R \end{pmatrix} \right)
\end{aligned}$$
where
$$\begin{aligned}
\mathrm{Cov}[Y_{t+1|t}, X_{t+1|t}] &= \mathrm{Cov}[C X_{t+1|t} + V_{t+1},\ X_{t+1|t}] \\
&= C\,\mathrm{Cov}[X_{t+1|t}, X_{t+1|t}] + \mathrm{Cov}[V_{t+1}, X_{t+1|t}] \\
&= C\,\mathrm{Var}[X_{t+1|t}] = C P_{t+1|t}
\end{aligned}$$
Measurement update (cont’d)
• Recall from undergrad that if
$$(Z_1, Z_2) \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},\ \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$$
then
$$(Z_1 \mid Z_2 = z_2) = N\big(\mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (z_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}\big)$$
• 3. Compute $X_{t+1|t+1} = (X_{t+1|t} \mid Y_{t+1|t} = y_{t+1})$:
$$\begin{aligned}
X_{t+1|t+1} &= (X_{t+1|t} \mid Y_{t+1|t} = y_{t+1}) \\
&= N\big(\hat{x}_{t+1|t} + P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1} (y_{t+1} - C \hat{x}_{t+1|t}), \\
&\qquad\ P_{t+1|t} - P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1} C P_{t+1|t}\big)
\end{aligned}$$
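The conditioning formula above is generic, so it can be written once as a helper; a sketch (the function name is our own):

```python
import numpy as np

def condition_gaussian(mu1, mu2, S11, S12, S21, S22, z2):
    """(Z1 | Z2 = z2) for jointly Gaussian (Z1, Z2) with the given blocks."""
    gain = S12 @ np.linalg.inv(S22)  # Sigma_12 Sigma_22^{-1}
    mean = mu1 + gain @ (z2 - mu2)   # mu_1 + gain (z_2 - mu_2)
    cov = S11 - gain @ S21           # Sigma_11 - gain Sigma_21
    return mean, cov
```

With $\mu_1 = \hat{x}_{t+1|t}$, $\mu_2 = C\hat{x}_{t+1|t}$, $\Sigma_{11} = P_{t+1|t}$, $\Sigma_{12} = P_{t+1|t} C^T$, $\Sigma_{21} = C P_{t+1|t}$ and $\Sigma_{22} = C P_{t+1|t} C^T + R$, this helper reproduces step 3 exactly.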
Measurement update (cont’d)
• Often written in terms of the Kalman gain matrix:
$$\begin{aligned}
K_{t+1} &= P_{t+1|t} C^T \big(C P_{t+1|t} C^T + R\big)^{-1} \\
\hat{x}_{t+1|t+1} &= \hat{x}_{t+1|t} + K_{t+1} \big(y_{t+1} - C \hat{x}_{t+1|t}\big) \\
P_{t+1|t+1} &= P_{t+1|t} - K_{t+1} C P_{t+1|t}
\end{aligned}$$
Kalman filter summary
• Model:
$$x_{t+1} = A x_t + w_t, \quad w_t \sim W_t = N(0, Q)$$
$$y_t = C x_t + v_t, \quad v_t \sim V_t = N(0, R)$$
• Algorithm: repeat…
– Time update:
$$\hat{x}_{t+1|t} = A \hat{x}_{t|t}, \qquad P_{t+1|t} = A P_{t|t} A^T + Q$$
– Measurement update:
$$\begin{aligned}
K_{t+1} &= P_{t+1|t} C^T \big(C P_{t+1|t} C^T + R\big)^{-1} \\
\hat{x}_{t+1|t+1} &= \hat{x}_{t+1|t} + K_{t+1} \big(y_{t+1} - C \hat{x}_{t+1|t}\big) \\
P_{t+1|t+1} &= P_{t+1|t} - K_{t+1} C P_{t+1|t}
\end{aligned}$$
Initialization
• Choose the distribution of the initial state by picking $\hat{x}_0$ and $P_0$
• Start with a measurement update given measurement $y_0$
• Choice of $Q$ and $R$ (e.g., scaled identity matrices):
– small $Q$: dynamics “trusted” more
– small $R$: measurements “trusted” more
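For instance, one might start from scaled identity matrices and tune the scales; the values below are arbitrary placeholders:

```python
import numpy as np

n = 3              # state dimension (position, speed, acceleration)
q, r = 1e-4, 1e-1  # tuning knobs; illustrative values only
Q = q * np.eye(n)  # small q: the dynamics model is "trusted" more
R = r * np.eye(1)  # small r: the measurements are "trusted" more
```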
Conclusion
• The Kalman filter can be used in real time
• Use $\hat{x}_{t|t}$ as the optimal estimate of the state at time $t$, and $P_{t|t}$ as a measure of its uncertainty.
Extensions
• Dynamic process with known control input
• Non-linear dynamic process
• Kalman smoothing: compute the optimal estimate of state $x_t$ given all data $y_1, \ldots, y_T$, with $T > t$ (not real-time)
• Automatic fitting of the parameters ($Q$ and $R$) using the EM algorithm
Kalman Smoothing
Kalman Filtering vs. Smoothing
• Dynamics and observation model:
$$X_{t+1} = A X_t + W_t, \quad W_t \sim N(0, Q)$$
$$Y_t = C X_t + V_t, \quad V_t \sim N(0, R)$$
• Kalman filter:
– Compute $(X_t \mid Y_0 = y_0, \ldots, Y_t = y_t)$
– Real-time, given the data so far
• Kalman smoother:
– Compute $(X_t \mid Y_0 = y_0, \ldots, Y_T = y_T)$, $t \le T$
– Post-processing, given all the data
Kalman Filtering Recap
• Time update:
– $X_{t+1|t} = A X_{t|t} + W_t$
• Measurement update:
– $Y_{t+1|t} = C X_{t+1|t} + V_{t+1}$
– Compute the joint distribution $(X_{t+1|t}, Y_{t+1|t})$
– Compute the conditional $X_{t+1|t+1} = (X_{t+1|t} \mid Y_{t+1|t} = y_{t+1})$
[Graphical model: Markov chain $X_0 \to X_1 \to X_2 \to \cdots$, with each hidden state $X_t$ emitting an observation $Y_t$]
Kalman filter summary
• Model:
$$X_{t+1} = A X_t + W_t, \quad W_t \sim N(0, Q)$$
$$Y_t = C X_t + V_t, \quad V_t \sim N(0, R)$$
• Algorithm: repeat…
– Time update:
$$\hat{x}_{t+1|t} = A \hat{x}_{t|t}, \qquad P_{t+1|t} = A P_{t|t} A^T + Q$$
– Measurement update:
$$\begin{aligned}
K_{t+1} &= P_{t+1|t} C^T \big(C P_{t+1|t} C^T + R\big)^{-1} \\
\hat{x}_{t+1|t+1} &= \hat{x}_{t+1|t} + K_{t+1} \big(y_{t+1} - C \hat{x}_{t+1|t}\big) \\
P_{t+1|t+1} &= P_{t+1|t} - K_{t+1} C P_{t+1|t}
\end{aligned}$$
Kalman Smoothing
• Input: initial distribution X0 and data y1, …, yT
• Algorithm: forward-backward pass
(Rauch-Tung-Striebel algorithm)
• Forward pass:
– Kalman filter: compute $X_{t+1|t}$ and $X_{t+1|t+1}$ for $0 \le t < T$
• Backward pass:
– Compute $X_{t|T}$ for $0 \le t < T$
– Reverse the “horizontal” arrows in the graph
Backward Pass
• Compute $X_{t|T}$ given $X_{t+1|T} = N(\hat{x}_{t+1|T}, P_{t+1|T})$
• Reverse the arrow $X_{t|t} \to X_{t+1|t}$
• Same as incorporating a measurement in the filter:
– 1. Compute the joint $(X_{t|t}, X_{t+1|t})$
– 2. Compute the conditional $(X_{t|t} \mid X_{t+1|t} = x_{t+1})$
• New: $x_{t+1}$ is not “known”; we only know its distribution: $x_{t+1} \sim X_{t+1|T}$
– 3. “Uncondition” on $x_{t+1}$ to compute $X_{t|T}$ using the laws of total expectation and variance
Backward pass. Step 1
• Compute the joint distribution of $X_{t|t}$ and $X_{t+1|t}$:
$$\begin{aligned}
(X_{t|t}, X_{t+1|t}) &= N\left( \begin{pmatrix} E[X_{t|t}] \\ E[X_{t+1|t}] \end{pmatrix},\ \begin{pmatrix} \mathrm{Var}[X_{t|t}] & \mathrm{Cov}[X_{t|t}, X_{t+1|t}] \\ \mathrm{Cov}[X_{t+1|t}, X_{t|t}] & \mathrm{Var}[X_{t+1|t}] \end{pmatrix} \right) \\
&= N\left( \begin{pmatrix} \hat{x}_{t|t} \\ \hat{x}_{t+1|t} \end{pmatrix},\ \begin{pmatrix} P_{t|t} & P_{t|t} A^T \\ A P_{t|t} & P_{t+1|t} \end{pmatrix} \right)
\end{aligned}$$
where
$$\begin{aligned}
\mathrm{Cov}[X_{t+1|t}, X_{t|t}] &= \mathrm{Cov}[A X_{t|t} + W_t,\ X_{t|t}] \\
&= A\,\mathrm{Cov}[X_{t|t}, X_{t|t}] + \mathrm{Cov}[W_t, X_{t|t}] \\
&= A\,\mathrm{Var}[X_{t|t}] = A P_{t|t}
\end{aligned}$$
Backward pass. Step 2
• Recall that if
$$(Z_1, Z_2) \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},\ \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$$
then
$$(Z_1 \mid Z_2 = z_2) = N\big(\mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (z_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}\big)$$
• Compute $(X_{t|t} \mid X_{t+1|t} = x_{t+1})$:
$$(X_{t|t} \mid X_{t+1|t} = x_{t+1}) = N\big(\hat{x}_{t|t} + P_{t|t} A^T P_{t+1|t}^{-1} (x_{t+1} - \hat{x}_{t+1|t}),\ P_{t|t} - P_{t|t} A^T P_{t+1|t}^{-1} A P_{t|t}\big)$$
Backward pass. Step 3
• The conditional is only valid for a given $x_{t+1}$:
$$\begin{aligned}
(X_{t|t} \mid X_{t+1|t} = x_{t+1}) &= N\big(\hat{x}_{t|t} + P_{t|t} A^T P_{t+1|t}^{-1} (x_{t+1} - \hat{x}_{t+1|t}),\ P_{t|t} - P_{t|t} A^T P_{t+1|t}^{-1} A P_{t|t}\big) \\
&= N\big(\hat{x}_{t|t} + L_t (x_{t+1} - \hat{x}_{t+1|t}),\ P_{t|t} - L_t P_{t+1|t} L_t^T\big)
\end{aligned}$$
– where $L_t = P_{t|t} A^T P_{t+1|t}^{-1}$
• But we don’t know the value of $x_{t+1}$, only its distribution: $x_{t+1} \sim X_{t+1|T}$
• Uncondition on $x_{t+1}$ to compute $X_{t|T}$ using the law of total expectation and the law of total variance
Law of total expectation/variance
• Law of total expectation:
– $E(X) = E_Z\big(E(X \mid Y = Z)\big)$
• Law of total variance:
– $\mathrm{Var}(X) = E_Z\big(\mathrm{Var}(X \mid Y = Z)\big) + \mathrm{Var}_Z\big(E(X \mid Y = Z)\big)$
• Compute $X_{t|T} = N\big(E(X_{t|T}),\ \mathrm{Var}(X_{t|T})\big)$
– where
$$E(X_{t|T}) = E_{X_{t+1|T}}\big(E(X_{t|t} \mid X_{t+1|t} = X_{t+1|T})\big)$$
$$\mathrm{Var}(X_{t|T}) = E_{X_{t+1|T}}\big(\mathrm{Var}(X_{t|t} \mid X_{t+1|t} = X_{t+1|T})\big) + \mathrm{Var}_{X_{t+1|T}}\big(E(X_{t|t} \mid X_{t+1|t} = X_{t+1|T})\big)$$
Unconditioning
• Recall from step 2 that
$$E(X_{t|t} \mid X_{t+1|t} = X_{t+1|T}) = \hat{x}_{t|t} + L_t (X_{t+1|T} - \hat{x}_{t+1|t})$$
$$\mathrm{Var}(X_{t|t} \mid X_{t+1|t} = X_{t+1|T}) = P_{t|t} - L_t P_{t+1|t} L_t^T$$
• So,
$$E(X_{t|T}) = E_{X_{t+1|T}}\big(E(X_{t|t} \mid X_{t+1|t} = X_{t+1|T})\big) = \hat{x}_{t|t} + L_t (\hat{x}_{t+1|T} - \hat{x}_{t+1|t})$$
$$\begin{aligned}
\mathrm{Var}(X_{t|T}) &= E_{X_{t+1|T}}\big(\mathrm{Var}(X_{t|t} \mid X_{t+1|t} = X_{t+1|T})\big) + \mathrm{Var}_{X_{t+1|T}}\big(E(X_{t|t} \mid X_{t+1|t} = X_{t+1|T})\big) \\
&= P_{t|t} - L_t P_{t+1|t} L_t^T + L_t P_{t+1|T} L_t^T \\
&= P_{t|t} + L_t (P_{t+1|T} - P_{t+1|t}) L_t^T
\end{aligned}$$
Backward pass
• Summary:
$$\begin{aligned}
L_t &= P_{t|t} A^T P_{t+1|t}^{-1} \\
\hat{x}_{t|T} &= \hat{x}_{t|t} + L_t (\hat{x}_{t+1|T} - \hat{x}_{t+1|t}) \\
P_{t|T} &= P_{t|t} + L_t (P_{t+1|T} - P_{t+1|t}) L_t^T
\end{aligned}$$
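A single backward step of this summary in NumPy; a sketch, with our own (hypothetical) function and argument names:

```python
import numpy as np

def backward_step(x_filt, P_filt, x_pred, P_pred, x_next_sm, P_next_sm, A):
    """From X_{t|t}, X_{t+1|t} and X_{t+1|T}, return X_{t|T}."""
    L = P_filt @ A.T @ np.linalg.inv(P_pred)        # smoother gain L_t
    x_sm = x_filt + L @ (x_next_sm - x_pred)        # x_hat_{t|T}
    P_sm = P_filt + L @ (P_next_sm - P_pred) @ L.T  # P_{t|T}
    return x_sm, P_sm
```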
Kalman smoother algorithm
• for (t = 0; t < T; ++t) // Kalman filter (forward pass)
$$\begin{aligned}
\hat{x}_{t+1|t} &= A \hat{x}_{t|t} \\
P_{t+1|t} &= A P_{t|t} A^T + Q \\
K_{t+1} &= P_{t+1|t} C^T \big(C P_{t+1|t} C^T + R\big)^{-1} \\
\hat{x}_{t+1|t+1} &= \hat{x}_{t+1|t} + K_{t+1} \big(y_{t+1} - C \hat{x}_{t+1|t}\big) \\
P_{t+1|t+1} &= P_{t+1|t} - K_{t+1} C P_{t+1|t}
\end{aligned}$$
• for (t = T − 1; t ≥ 0; --t) // Backward pass
$$\begin{aligned}
L_t &= P_{t|t} A^T P_{t+1|t}^{-1} \\
\hat{x}_{t|T} &= \hat{x}_{t|t} + L_t (\hat{x}_{t+1|T} - \hat{x}_{t+1|t}) \\
P_{t|T} &= P_{t|t} + L_t (P_{t+1|T} - P_{t+1|t}) L_t^T
\end{aligned}$$
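The two loops combine into one routine; a self-contained sketch (the function name and list-based bookkeeping are our own), which stores the filter's predicted and filtered distributions on the forward pass and reuses them on the backward pass:

```python
import numpy as np

def kalman_smoother(ys, A, C, Q, R, x0, P0):
    """Rauch-Tung-Striebel smoother: Kalman filter forward, then smooth backward."""
    T = len(ys)
    x_filt, P_filt = [x0], [P0]  # x_hat_{t|t}, P_{t|t} for t = 0..T
    x_pred, P_pred = [], []      # x_hat_{t+1|t}, P_{t+1|t} for t = 0..T-1

    # Forward pass: standard Kalman filter on y_1, ..., y_T
    for t in range(T):
        xp = A @ x_filt[-1]
        Pp = A @ P_filt[-1] @ A.T + Q
        x_pred.append(xp)
        P_pred.append(Pp)
        K = Pp @ C.T @ np.linalg.inv(C @ Pp @ C.T + R)
        x_filt.append(xp + K @ (ys[t] - C @ xp))
        P_filt.append(Pp - K @ C @ Pp)

    # Backward pass: X_{T|T} is already smoothed; work back to t = 0
    x_sm, P_sm = [x_filt[-1]], [P_filt[-1]]
    for t in range(T - 1, -1, -1):
        L = P_filt[t] @ A.T @ np.linalg.inv(P_pred[t])
        x_sm.insert(0, x_filt[t] + L @ (x_sm[0] - x_pred[t]))
        P_sm.insert(0, P_filt[t] + L @ (P_sm[0] - P_pred[t]) @ L.T)
    return x_sm, P_sm  # x_hat_{t|T}, P_{t|T} for t = 0..T
```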
Conclusion
• The Kalman smoother can be used as a post-processing step
• Use $\hat{x}_{t|T}$ as the optimal estimate of the state at time $t$, and $P_{t|T}$ as a measure of its uncertainty.
Extensions
• Automatic fitting of the parameters ($Q$ and $R$) using the EM algorithm
– Use the Kalman smoother on “training data” to learn $Q$ and $R$ (and $A$ and $C$)