Recursive Estimation
Raffaello D’Andrea
Spring 2011

Problem Set: Extracting estimates from probability distributions

Last updated: March 30, 2011
Notes:
• Notation: Unless otherwise noted, x, y, and z denote random variables, fx(x) (or the shorthand f(x)) denotes the probability density function of x, and fx|y(x|y) (or f(x|y)) denotes the conditional probability density function of x conditioned on y. The expected value is denoted by E[·], the variance is denoted by Var[·], and Pr(Z) denotes the probability that the event Z occurs. A normally distributed random variable x with mean µ and variance σ² is denoted by x ∼ N(µ, σ²).
• Please report any errors found in this problem set to the teaching assistants
([email protected] or [email protected]).
Problem Set
Problem 1
Find the maximum likelihood (ML) estimate x̂_ML := arg max_x f(z|x) when

zi = x + wi,   i = 1, 2,

where the wi are assumed to be independent with wi ∼ N(0, σi²) and zi, wi, x ∈ R.
Problem 2
Solve Problem 1 for independent wi uniformly distributed on [−1, 1].
Problem 3
Find the ML estimate x̂_ML when

z = Hx + w,

where x ∈ R^n and z, w ∈ R^m, n ≤ m. The matrix H has full rank. The components of the vector w are assumed to be independent with wi ∼ N(0, σi²), i = 1, ..., m.
How does this relate to weighted least squares?
Problem 4
Find the maximum a posteriori (MAP) estimate x̂_MAP := arg max_x f(z|x) f(x) when

z = x + w,   x, z, w ∈ R,

where x is exponentially distributed,

f(x) = 0 for x < 0,   f(x) = e^(−x) for x ≥ 0,

and w ∼ N(0, 1).
Problem 5
Prove that

∂ trace(ABAᵀ)/∂A = 2AB   if B = Bᵀ,

for A, B ∈ R^(2×2), where trace(·) is the sum of the diagonal elements of a matrix.
Problem 6
Prove that

∂ trace(AB)/∂A = Bᵀ,

for A, B ∈ R^(2×2).
Problem 7
Apply the recursive least squares algorithm when

z(k) = x + w(k),   k = 1, 2, ...,   (1)

where

E[x] = 0, Var[x] = 1, and E[w(k)] = 0, Var[w(k)] = 1.   (2)

The random variables {x, w(1), w(2), ...} are assumed to be independent and z(k), w(k), x ∈ R.

a) Solve for K(k) and P(k).

b) Simulate the algorithm for normally distributed continuous random variables w(k) and x with the above properties (2). Choose k up to 10000.

c) Simulate the algorithm for x and w(k) being discrete random variables that take the values 1 and −1 with equal probability. Note that x and w(k) satisfy (2). Choose k up to 10000.
Problem 8
Consider the above setup, (1) and (2), again with x and w(k) being discrete random variables that take the values 1 and −1 with equal probability; {x, w(1), w(2), ...} are assumed to be independent. What is the minimum mean-squared error (MMSE) estimate of x at time k given all past measurements z(1), z(2), ..., z(k)?
The MMSE estimate is given by

x̂_MMSE(k) := arg min_x̂ E[(x̂ − x)² | z(1), z(2), ..., z(k)].
Sample solutions
Problem 1
For a normally distributed random variable wi ∼ N(0, σi²), we have

f(wi) ∝ exp(−wi²/(2σi²)).

By their conditional independence, we can write

f(z1, z2|x) = f(z1|x) f(z2|x) ∝ exp(−(1/2)[(z1 − x)²/σ1² + (z2 − x)²/σ2²]).

Differentiating with respect to x and setting to 0 leads to

∂f(z1, z2|x)/∂x |_(x = x̂_ML) = 0  ⇔  (z1 − x̂_ML)/σ1² + (z2 − x̂_ML)/σ2² = 0
                               ⇔  x̂_ML = (z1 σ2² + z2 σ1²)/(σ1² + σ2²),

which is a weighted sum of the measurements. Note that

for σ1² = 0:  x̂_ML = z1,
for σ2² = 0:  x̂_ML = z2.
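As a quick numerical sanity check, one can maximize the likelihood on a grid and compare with the closed form; the Matlab sketch below uses arbitrarily chosen example values:

sigma1 = 0.5; sigma2 = 2.0;               % noise standard deviations
z1 = 1.3; z2 = -0.4;                      % example measurements
xGrid = linspace(-5, 5, 1e5);             % candidate values of x
L = exp(-0.5*((z1 - xGrid).^2/sigma1^2 + (z2 - xGrid).^2/sigma2^2));
[~, idx] = max(L);                        % grid maximizer of the likelihood
xHatML = (z1*sigma2^2 + z2*sigma1^2)/(sigma1^2 + sigma2^2);
fprintf('grid: %.4f, closed form: %.4f\n', xGrid(idx), xHatML);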
Problem 2
We now have
f(wi) = 1/2 if wi ∈ [−1, 1],   0 otherwise.
Therefore,
f(zi|x) = 1/2 if −1 ≤ zi − x ≤ 1, i.e. zi − 1 ≤ x ≤ zi + 1,   0 otherwise.

We need to solve for f(z1, z2|x) = f(z1|x) f(z2|x). Consider the different cases:

• |z1 − z2| > 2, see Fig. 1:
Figure 1: Non-overlapping probability density functions f(z1|x) and f(z2|x), plotted over x.
Since no value of x can explain both z1 and z2 given our model zi = x + wi, we have f(z1, z2|x) = 0.
• |z1 − z2 | ≤ 2, see Fig. 2:
Figure 2: Overlapping probability density functions f(z1|x) and f(z2|x); their product f(z1, z2|x) equals 1/4 on the overlap.
We have

f(z1, z2|x) = 1/4 for x ∈ [z1 − 1, z1 + 1] ∩ [z2 − 1, z2 + 1],   0 otherwise,

where the notation A ∩ B represents the intersection of the two sets A and B. Therefore our estimate is x̂_ML ∈ [z1 − 1, z1 + 1] ∩ [z2 − 1, z2 + 1]. All values in this range are equally likely.
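A minimal Matlab sketch of this case analysis, with arbitrary example measurements:

z1 = 0.3; z2 = 1.1;                       % example measurements
if abs(z1 - z2) > 2
    disp('Measurements inconsistent with the model: f(z1,z2|x) = 0.');
else
    lo = max(z1, z2) - 1;                 % left end of the intersection
    hi = min(z1, z2) + 1;                 % right end of the intersection
    fprintf('x_ML is any value in [%.2f, %.2f]\n', lo, hi);
end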
Problem 3
We proceed similarly as in class, for z, w ∈ R^m and x ∈ R^n. Write H in terms of its rows,

H = [H1; H2; ...; Hm],   Hi = (hi1 hi2 ... hin),   where hij ∈ R.

Using

f(z|x) ∝ exp(−(1/2) Σ_{i=1}^{m} (zi − Hi x)²/σi²),

differentiating with respect to xj, and setting to zero yields

Σ_{i=1}^{m} (zi − Hi x̂_ML) hij/σi² = 0,   j = 1, 2, ..., n
⇔  (h1j · · · hmj) W (z − H x̂_ML) = 0,   j = 1, ..., n,

where (h1j · · · hmj) collects the entries of the jth column of H and

W = diag(1/σ1², 1/σ2², ..., 1/σm²).

From this we have

Hᵀ W (z − H x̂_ML) = 0

with W being a weight matrix, and

x̂_ML = (Hᵀ W H)⁻¹ Hᵀ W z.

This is a weighted least squares problem,

w(x) = z − Hx,   x̂ = arg min_x w(x)ᵀ W w(x),

where W is a weighting of the measurements. The result in both cases is the same!
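The closed form can be cross-checked against Matlab's built-in weighted least squares routine lscov; the sketch below uses arbitrary example dimensions and noise levels:

m = 20; n = 3;                            % example dimensions, n <= m
H = randn(m, n);                          % full rank with probability 1
sigma = 0.1 + rand(m, 1);                 % per-measurement noise std. dev.
x = randn(n, 1);                          % true parameter vector
z = H*x + sigma.*randn(m, 1);             % measurements z = Hx + w
W = diag(1./sigma.^2);                    % weight matrix
xHat = (H'*W*H) \ (H'*W*z);               % closed form (H' W H)^(-1) H' W z
xHatLscov = lscov(H, z, 1./sigma.^2);     % weighted least squares via lscov
disp([xHat, xHatLscov]);                  % the two columns should agree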
Problem 4
We calculate the conditional probability density function,

f(z|x) = (1/√(2π)) exp(−(1/2)(z − x)²).

Both distributions, f(z|x) and f(x), are illustrated in Fig. 3.
Figure 3: The Gaussian density f(z|x) and the exponential density f(x).
From the above, we have

f(z|x) f(x) = 0 for x < 0,   f(z|x) f(x) = (1/√(2π)) exp(−x) exp(−(1/2)(z − x)²) for x ≥ 0.

We differentiate f(z|x) f(x) with respect to x, and set to 0, to get

∂(f(z|x) f(x))/∂x |_(x = x̂_MAP) = 0  ⇔  −1 + z − x̂_MAP = 0  ⇔  x̂_MAP = z − 1.

For z < 1 the unconstrained maximizer z − 1 is negative, where f(z|x) f(x) = 0, so the maximum is attained at the boundary:

x̂_MAP = z − 1   if z ≥ 1,
x̂_MAP = 0       if z < 1 (cf. Fig. 3).
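A grid-based Matlab check of this result (z is an arbitrary example value):

z = 2.5;                                  % example measurement; try also z < 1
xGrid = linspace(0, 10, 1e6);             % the prior is zero for x < 0
post = exp(-xGrid).*exp(-0.5*(z - xGrid).^2);   % f(z|x) f(x), up to a constant
[~, idx] = max(post);
fprintf('grid: %.4f, closed form: %.4f\n', xGrid(idx), max(z - 1, 0));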
Problem 5
Here, we do the proof for the general case where A, B ∈ R^(n×n), n ∈ N. We write the matrices in terms of their elements, where (A)ij denotes the (i, j)th element of matrix A (element in the ith row and jth column),

(A)ij = aij,   (B)ij = bij,   i, j ∈ {1, 2, ..., n},   with bij = bji.

Then the elements of the products AB and ABAᵀ are

(AB)ij = Σ_{r=1}^{n} air brj,
(ABAᵀ)ij = Σ_{s=1}^{n} (Σ_{r=1}^{n} air brs) ajs.

From this we obtain the trace as

trace(ABAᵀ) = (ABAᵀ)11 + (ABAᵀ)22 + · · · + (ABAᵀ)nn = Σ_{t=1}^{n} Σ_{s=1}^{n} Σ_{r=1}^{n} atr brs ats,

and

∂ trace(ABAᵀ)/∂aij = Σ_{s=1}^{n} bjs ais + Σ_{r=1}^{n} air brj = 2 (AB)ij,

where the first sum equals (AB)ij because bjs = bsj. We proved that

∂ trace(ABAᵀ)/∂A = 2AB.
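This identity can be sanity-checked numerically with finite differences; the following Matlab sketch uses an arbitrary example size:

n = 4;                                    % example dimension
A = randn(n); B = randn(n); B = (B + B')/2;   % enforce B = B'
h = 1e-6; G = zeros(n);                   % step size, numerical gradient
for i = 1:n
    for j = 1:n
        E = zeros(n); E(i,j) = h;         % perturb element a_ij
        G(i,j) = (trace((A+E)*B*(A+E)') - trace(A*B*A'))/h;
    end
end
disp(norm(G - 2*A*B));                    % small, on the order of h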
Problem 6
We proceed similarly to the previous exercise:
trace(AB) = Σ_{s=1}^{n} Σ_{r=1}^{n} asr brs,
∂ trace(AB)/∂aij = bji,

that is,

∂ trace(AB)/∂A = Bᵀ.
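The analogous finite-difference check for this identity (a sketch; here B need not be symmetric):

n = 4; A = randn(n); B = randn(n);        % example matrices
h = 1e-6; G = zeros(n);
for i = 1:n
    for j = 1:n
        E = zeros(n); E(i,j) = h;         % perturb element a_ij
        G(i,j) = (trace((A+E)*B) - trace(A*B))/h;
    end
end
disp(norm(G - B'));                       % close to zero (trace(AB) is linear in A)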
Problem 7
a) We initialize P(0) = 1, x̂(0) = 0, H(k) = 1 and R(k) = 1 ∀ k = 1, 2, ..., and get

K(k) = P(k − 1)/(P(k − 1) + 1),   (3)

P(k) = (1 − K(k))² P(k − 1) + K(k)²
     = P(k − 1)/(1 + P(k − 1))² + P(k − 1)²/(1 + P(k − 1))²
     = P(k − 1)/(1 + P(k − 1)).   (4)

From (4) and P(0) = 1, we have P(1) = 1/2, P(2) = 1/3, ..., and we can show by induction (see below) that

P(k) = 1/(1 + k),   (5)

and therefore K(k) = 1/(1 + k) from (3).
Proof by induction:

• Start with P(0) = 1/(1 + 0) = 1, which satisfies (5).
• Assume P(k) = 1/(1 + k), i.e. (5) holds at time k.
• Now we show (5) for (k + 1):

P(k + 1) = P(k)/(1 + P(k))                from (4)
         = (1/(1 + k))/(1 + 1/(1 + k))    by assumption (5)
         = 1/(k + 2)
         = 1/(1 + (k + 1)).               q.e.d.

b) The Matlab code is available on the class webpage.

• Recursive least squares works, and provides a reasonable estimate after several thousand iterations.
• The recursive least squares estimator is the optimal estimator for Gaussian noise.

c) The Matlab code is available on the class webpage.

• Recursive least squares works, but we can do much better with the proposed nonlinear estimator derived below.
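Since the webpage code is not reproduced here, a minimal Matlab sketch of the simulations in b) and c) could look as follows (variable names are our own choices; set gaussianCase = false for the discrete case of part c)):

N = 10000; gaussianCase = true;           % number of steps; case selector
if gaussianCase
    x = randn;                            % x ~ N(0,1)
    w = randn(N, 1);                      % w(k) ~ N(0,1)
else
    x = 2*(rand > 0.5) - 1;               % x = +/-1 with equal probability
    w = 2*(rand(N, 1) > 0.5) - 1;         % w(k) = +/-1 with equal probability
end
z = x + w;                                % measurements z(k) = x + w(k)
xHat = 0; P = 1;                          % initialization xHat(0) = 0, P(0) = 1
xHatHist = zeros(N, 1);
for k = 1:N
    K = P/(P + 1);                        % gain, eq. (3)
    xHat = xHat + K*(z(k) - xHat);        % measurement update
    P = P/(1 + P);                        % eq. (4); equals 1/(1+k)
    xHatHist(k) = xHat;
end
plot(1:N, xHatHist); hold on; plot([1 N], [x x], '--');
xlabel('k'); ylabel('estimate'); legend('xHat(k)', 'true x');

In both cases the estimate settles near the true x, consistent with the observations in b) and c) above.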
Problem 8
By conditional independence, the MMSE estimate is

x̂_MMSE(k) = arg min_x̂ { Σ_{x ∈ {−1, 1}} (x̂ − x)² f(x|z(1)) f(x|z(2)) · ... · f(x|z(k)) }.   (6)
Let us first consider the MMSE estimate at time k = 1. The only possible combinations are:

 x   w(1)   z(1)
−1   −1     −2
−1    1      0
 1   −1      0
 1    1      2

If z(1) = −2 or z(1) = 2, we know x precisely: x = −1 and x = 1, respectively. If z(1) = 0, it is equally likely that x = 1 or x = −1.
The MMSE estimate is

• for z(1) = 2:  x̂_MMSE(1) = 1, since f(x = −1|z(1) = 2) = 0;
• for z(1) = −2:  x̂_MMSE(1) = −1, since f(x = 1|z(1) = −2) = 0;
• for z(1) = 0:

x̂_MMSE(1) = arg min_x̂ [ f(x = 1|z(1) = 0) (x̂ − 1)² + f(x = −1|z(1) = 0) (x̂ + 1)² ]
          = arg min_x̂ [ (1/2)(x̂ − 1)² + (1/2)(x̂ + 1)² ]
          = arg min_x̂ (x̂² + 1)
          = 0.
Recalling (6), we note that the MMSE estimate is fixed for all k if one measures either −2 or 2:

for z(1) = 2:  x̂_MMSE(k) = 1 ∀ k = 1, 2, ...,
for z(1) = −2:  x̂_MMSE(k) = −1 ∀ k = 1, 2, ....

A measurement z(1) = 0 does not provide any information, i.e. x = 1 and x = −1 stay equally likely. As soon as a subsequent measurement z(k), k ≥ 2, takes the value −2 or 2, x is known precisely, which is also reflected by the MMSE estimate. As long as we keep measuring 0, we do not gain additional information and

x̂_MMSE(k) = 0 if z(ℓ) = 0 ∀ ℓ = 1, 2, ..., k.
Matlab code for this nonlinear MMSE estimator is available on the class webpage.
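As the webpage code is likewise not included here, a minimal sketch of such an estimator, consistent with the case analysis above (variable names are our own choices):

N = 100;                                  % number of measurements (example)
x = 2*(rand > 0.5) - 1;                   % true x: +/-1 with equal probability
w = 2*(rand(N, 1) > 0.5) - 1;             % noise w(k): +/-1 with equal probability
z = x + w;                                % measurements z(k)
xHat = zeros(N, 1);                       % estimate after k measurements
for k = 1:N
    idx = find(abs(z(1:k)) == 2, 1);      % first informative measurement, if any
    if isempty(idx)
        xHat(k) = 0;                      % only zeros seen so far: E[x|z] = 0
    else
        xHat(k) = z(idx)/2;               % z = 2x when |z| = 2, so x = z/2
    end
end
disp([x, xHat(end)]);                     % xHat equals x once a +/-2 occurs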