Recursive Estimation
Raffaello D'Andrea
Spring 2011

Problem Set: Extracting estimates from probability distributions

Last updated: March 30, 2011

Notes:

• Notation: Unless otherwise noted, $x$, $y$, and $z$ denote random variables, $f_x(x)$ (or the shorthand $f(x)$) denotes the probability density function of $x$, and $f_{x|y}(x|y)$ (or $f(x|y)$) denotes the conditional probability density function of $x$ conditioned on $y$. The expected value is denoted by $E[\cdot]$, the variance is denoted by $\mathrm{Var}[\cdot]$, and $\Pr(Z)$ denotes the probability that the event $Z$ occurs. A normally distributed random variable $x$ with mean $\mu$ and variance $\sigma^2$ is denoted by $x \sim \mathcal{N}(\mu, \sigma^2)$.

• Please report any errors found in this problem set to the teaching assistants ([email protected] or [email protected]).

Problem Set

Problem 1

Find the maximum likelihood (ML) estimate $\hat{x}^{ML} := \arg\max_x f(z|x)$ when

$z_i = x + w_i, \quad i = 1, 2,$

where the $w_i$ are assumed to be independent with $w_i \sim \mathcal{N}(0, \sigma_i^2)$ and $z_i, w_i, x \in \mathbb{R}$.

Problem 2

Solve Problem 1 for independent $w_i$ uniformly distributed on $[-1, 1]$.

Problem 3

Find the ML estimate $\hat{x}^{ML}$ when

$z = Hx + w,$

where $x \in \mathbb{R}^n$ and $z, w \in \mathbb{R}^m$, $n \le m$. The matrix $H$ has full rank. The components of the vector $w$ are assumed to be independent with $w_i \sim \mathcal{N}(0, \sigma_i^2)$, $i = 1, \dots, m$. How does this relate to weighted least squares?

Problem 4

Find the maximum a posteriori (MAP) estimate $\hat{x}^{MAP} := \arg\max_x f(z|x) f(x)$ when

$z = x + w, \quad x, z, w \in \mathbb{R},$

where $x$ is exponentially distributed,

$f(x) = \begin{cases} 0 & \text{for } x < 0 \\ e^{-x} & \text{for } x \ge 0 \end{cases}$

and $w \sim \mathcal{N}(0, 1)$.

Problem 5

Prove that

$\frac{\partial}{\partial A} \operatorname{trace}\left(A B A^T\right) = 2AB \quad \text{if } B = B^T,$

for $A, B \in \mathbb{R}^{2 \times 2}$, where $\operatorname{trace}(\cdot)$ is the sum of the diagonal elements of a matrix.

Problem 6

Prove that

$\frac{\partial}{\partial A} \operatorname{trace}(AB) = B^T,$

for $A, B \in \mathbb{R}^{2 \times 2}$.

Problem 7

Apply the recursive least squares algorithm when

$z(k) = x + w(k), \quad k = 1, 2, \dots, \qquad (1)$

where

$E[x] = 0, \quad \mathrm{Var}[x] = 1, \qquad E[w(k)] = 0, \quad \mathrm{Var}[w(k)] = 1. \qquad (2)$

The random variables $\{x, w(1), w(2), \dots\}$ are assumed to be independent and $z(k), w(k), x \in \mathbb{R}$.

a) Solve for $K(k)$ and $P(k)$.

b) Simulate the algorithm for normally distributed continuous random variables $w(k)$ and $x$ with the above properties (2). Choose $k$ up to 10000.

c) Simulate the algorithm for $x$ and $w(k)$ being discrete random variables that take the values 1 and $-1$ with equal probability. Note that $x$ and $w(k)$ satisfy (2). Choose $k$ up to 10000.

Problem 8

Consider the above setup, (1) and (2), again with $x$ and $w(k)$ being discrete random variables that take the values 1 and $-1$ with equal probability; $\{x, w(1), w(2), \dots\}$ are assumed to be independent. What is the minimum mean-squared error (MMSE) estimate of $x$ at time $k$ given all past measurements $z(1), z(2), \dots, z(k)$? The MMSE estimate is given by

$\hat{x}^{MMSE}(k) := \arg\min_{\hat{x}} E\left[(\hat{x} - x)^2 \,\middle|\, z(1), z(2), \dots, z(k)\right].$

Sample solutions

Problem 1

For a normally distributed random variable $w_i \sim \mathcal{N}(0, \sigma_i^2)$, we have

$f(w_i) \propto \exp\left(-\frac{1}{2} \frac{w_i^2}{\sigma_i^2}\right).$

By conditional independence, we can write

$f(z_1, z_2 | x) = f(z_1|x)\, f(z_2|x) \propto \exp\left(-\frac{1}{2}\left(\frac{(z_1 - x)^2}{\sigma_1^2} + \frac{(z_2 - x)^2}{\sigma_2^2}\right)\right).$

Differentiating with respect to $x$, and setting to zero, leads to

$\left.\frac{\partial f(z_1, z_2|x)}{\partial x}\right|_{x = \hat{x}^{ML}} = 0 \quad \Leftrightarrow \quad \frac{z_1 - \hat{x}^{ML}}{\sigma_1^2} + \frac{z_2 - \hat{x}^{ML}}{\sigma_2^2} = 0 \quad \Leftrightarrow \quad \hat{x}^{ML} = \frac{z_1 \sigma_2^2 + z_2 \sigma_1^2}{\sigma_1^2 + \sigma_2^2},$

which is a weighted sum of the measurements. Note that for $\sigma_1^2 = 0$: $\hat{x}^{ML} = z_1$, and for $\sigma_2^2 = 0$: $\hat{x}^{ML} = z_2$.
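The closed form can be sanity-checked numerically. The following is a minimal Python sketch (the class code referenced later in these solutions is Matlab, not reproduced here); the measurement values and variances are made-up example numbers, not part of the problem set. It minimizes the negative log-likelihood directly and compares the result with the weighted-sum formula.

```python
from scipy.optimize import minimize_scalar

# Hypothetical example values, chosen for illustration only.
z1, z2 = 1.0, 3.0          # measurements
s1sq, s2sq = 0.5, 2.0      # noise variances sigma_1^2, sigma_2^2

def neg_log_likelihood(x):
    # -log f(z1, z2 | x), up to an additive constant
    return 0.5 * ((z1 - x) ** 2 / s1sq + (z2 - x) ** 2 / s2sq)

x_ml_numeric = minimize_scalar(neg_log_likelihood).x
x_ml_closed = (z1 * s2sq + z2 * s1sq) / (s1sq + s2sq)
print(x_ml_numeric, x_ml_closed)  # both approximately 1.4
```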
Problem 2

We now have

$f(w_i) = \begin{cases} \frac{1}{2} & \text{if } w_i \in [-1, 1] \\ 0 & \text{otherwise.} \end{cases}$

Therefore,

$f(z_i|x) = \begin{cases} \frac{1}{2} & \text{if } -1 \le z_i - x \le 1, \text{ i.e. } z_i - 1 \le x \le z_i + 1 \\ 0 & \text{otherwise.} \end{cases}$

We need to solve for $f(z_1, z_2|x) = f(z_1|x)\, f(z_2|x)$. Consider the different cases:

• $|z_1 - z_2| > 2$, see Fig. 1.

Figure 1: Non-overlapping probability density functions $f(z_1|x)$ and $f(z_2|x)$.

Since no value of $x$ can explain both $z_1$ and $z_2$, given our model $z_i = x + w_i$, we have $f(z_1, z_2|x) = 0$.

• $|z_1 - z_2| \le 2$, see Fig. 2.

Figure 2: Overlapping probability density functions and their product $f(z_1, z_2|x)$, which takes the value $\frac{1}{4}$ on the overlap.

We have

$f(z_1, z_2|x) = \begin{cases} \frac{1}{4} & \text{for } x \in [z_1 - 1, z_1 + 1] \cap [z_2 - 1, z_2 + 1] \\ 0 & \text{otherwise,} \end{cases}$

where the notation $A \cap B$ represents the intersection of the two sets $A$ and $B$. Therefore our estimate is

$\hat{x}^{ML} \in [z_1 - 1, z_1 + 1] \cap [z_2 - 1, z_2 + 1].$

All values in this range are equally likely.

Problem 3

We proceed similarly as in class, for $z, w \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$:

$H = \begin{bmatrix} H_1 \\ H_2 \\ \vdots \\ H_m \end{bmatrix}, \quad H_i = \begin{bmatrix} h_{i1} & h_{i2} & \dots & h_{in} \end{bmatrix}, \quad \text{where } h_{ij} \in \mathbb{R}.$

Using

$f(z|x) \propto \exp\left(-\frac{1}{2} \sum_{i=1}^{m} \frac{(z_i - H_i x)^2}{\sigma_i^2}\right),$

differentiating with respect to $x_j$, and setting to zero yields

$\sum_{i=1}^{m} \frac{z_i - H_i \hat{x}^{ML}}{\sigma_i^2}\, h_{ij} = 0, \quad j = 1, 2, \dots, n$

$\Leftrightarrow \quad \underbrace{\begin{bmatrix} h_{1j} & \cdots & h_{mj} \end{bmatrix}}_{j\text{th column of } H\text{, transposed}} \underbrace{\begin{bmatrix} \frac{1}{\sigma_1^2} & & 0 \\ & \ddots & \\ 0 & & \frac{1}{\sigma_m^2} \end{bmatrix}}_{W} \left(z - H \hat{x}^{ML}\right) = 0, \quad j = 1, \dots, n.$

From this we have

$H^T W \left(z - H \hat{x}^{ML}\right) = 0$

with $W$ being a weight matrix, and

$\hat{x}^{ML} = \left(H^T W H\right)^{-1} H^T W z.$

This is weighted least squares: with $w(x) = z - Hx$,

$\hat{x} = \arg\min_x w(x)^T W w(x),$

where $W$ is a weighting of the measurements. The result in both cases is the same!

Problem 4

We calculate the conditional probability density function,

$f(z|x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(z - x)^2\right).$

Both distributions, $f(z|x)$ and $f(x)$, are illustrated in Fig. 3.

Figure 3: The Gaussian distribution $f(z|x)$ and the exponential distribution $f(x)$.

From the above, we have

$f(z|x)\, f(x) = \begin{cases} 0 & \text{for } x < 0 \\ \frac{1}{\sqrt{2\pi}} \exp(-x) \exp\left(-\frac{1}{2}(z - x)^2\right) & \text{for } x \ge 0. \end{cases}$

We differentiate $f(z|x)\, f(x)$ with respect to $x$, and set to zero, to get

$\left.\frac{\partial \left(f(z|x)\, f(x)\right)}{\partial x}\right|_{x = \hat{x}^{MAP}} = 0 \quad \Leftrightarrow \quad -1 + z - \hat{x}^{MAP} = 0,$

so that $\hat{x}^{MAP} = z - 1$ if $z \ge 1$. For $z < 1$ the stationary point $z - 1$ lies outside the support of $f(x)$, and the maximum over $x \ge 0$ is attained at the boundary, $\hat{x}^{MAP} = 0$ (cf. Fig. 3).

Problem 5

Here, we do the proof for the general case where $A, B \in \mathbb{R}^{n \times n}$, $n \in \mathbb{N}$. We write the matrices in terms of their elements, where $(A)_{ij}$ denotes the $(i, j)$th element of matrix $A$ (element in $i$th row and $j$th column),

$(A)_{ij} = a_{ij}, \quad (B)_{ij} = b_{ij}, \quad i, j \in \{1, 2, \dots, n\}, \quad \text{with } b_{ij} = b_{ji}.$

Then the elements of the products $AB$ and $ABA^T$ are

$(AB)_{ij} = \sum_{r=1}^{n} a_{ir} b_{rj}, \qquad \left(ABA^T\right)_{ij} = \sum_{s=1}^{n} \left(\sum_{r=1}^{n} a_{ir} b_{rs}\right) a_{js}.$

From this we obtain the trace as

$\operatorname{trace}\left(ABA^T\right) = \left(ABA^T\right)_{11} + \left(ABA^T\right)_{22} + \cdots + \left(ABA^T\right)_{nn} = \sum_{t=1}^{n} \sum_{s=1}^{n} \sum_{r=1}^{n} a_{tr} b_{rs} a_{ts},$

and

$\frac{\partial \operatorname{trace}\left(ABA^T\right)}{\partial a_{ij}} = \sum_{s=1}^{n} \underbrace{b_{js}}_{= b_{sj}} a_{is} + \sum_{r=1}^{n} a_{ir} b_{rj} = 2 (AB)_{ij}.$

We have proved that $\frac{\partial}{\partial A} \operatorname{trace}\left(ABA^T\right) = 2AB$.

Problem 6

We proceed similarly to the previous exercise:

$\operatorname{trace}(AB) = \sum_{s=1}^{n} \sum_{r=1}^{n} a_{sr} b_{rs}, \qquad \frac{\partial \operatorname{trace}(AB)}{\partial a_{ij}} = b_{ji},$

that is,

$\frac{\partial}{\partial A} \operatorname{trace}(AB) = B^T.$

Problem 7

a) We initialize $P(0) = 1$, $\hat{x}(0) = 0$, $H(k) = 1$ and $R(k) = 1$ for all $k = 1, 2, \dots$, and get:

$K(k) = \frac{P(k-1)}{P(k-1) + 1}, \qquad (3)$

$P(k) = \left(1 - K(k)\right)^2 P(k-1) + K(k)^2 = \frac{P(k-1)}{\left(1 + P(k-1)\right)^2} + \frac{P(k-1)^2}{\left(1 + P(k-1)\right)^2} = \frac{P(k-1)}{1 + P(k-1)}. \qquad (4)$

From (4) and $P(0) = 1$, we have $P(1) = \frac{1}{2}$, $P(2) = \frac{1}{3}, \dots$, and we can show by induction (see below) that

$P(k) = \frac{1}{1 + k}, \qquad (5)$

and $K(k) = \frac{1}{1 + k}$ from (3).

Proof by induction:

• Start with $P(0) = \frac{1}{1 + 0} = 1$, which satisfies (5).

• Assume $P(k) = \frac{1}{1 + k}$.

• Now we show it for $k + 1$:

$P(k+1) = \frac{P(k)}{1 + P(k)} \ \text{(from (4))} \ = \frac{\frac{1}{1+k}}{1 + \frac{1}{1+k}} \ \text{(by assumption (5))} \ = \frac{1}{1 + (k+1)}. \quad \text{q.e.d.}$

b) The Matlab code is available on the class webpage; a Python sketch of the simulation is given after this part.

• Recursive least squares works, and provides a reasonable estimate after several thousand iterations.

• The recursive least squares estimator is the optimal estimator for Gaussian noise.
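The following is a minimal Python sketch of the part b) simulation, using the recursion (3)-(4); it is not the class Matlab code, and the random seed is arbitrary. Swapping in the commented draws gives the discrete ±1 case of part c).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10000  # measurements k = 1, ..., 10000, as in the problem statement

# Part b): x and w(k) standard normal, matching E = 0, Var = 1 in (2).
# For part c), draw from {-1, +1} instead, e.g. x = rng.choice([-1.0, 1.0])
# and likewise for each noise sample w.
x = rng.standard_normal()

x_hat, P = 0.0, 1.0  # initialization: x_hat(0) = 0, P(0) = 1

for k in range(1, N + 1):
    z = x + rng.standard_normal()      # measurement z(k) = x + w(k)
    K = P / (P + 1.0)                  # gain, Eq. (3); equals 1/(1 + k)
    x_hat = x_hat + K * (z - x_hat)    # estimate update
    P = P / (1.0 + P)                  # variance update, Eq. (4)

print(f"true x = {x:+.4f}, estimate = {x_hat:+.4f}, P = {P:.2e}")
```

With the gain $K(k) = \frac{1}{1+k}$ and $\hat{x}(0) = 0$, the recursion reduces to $\hat{x}(k) = \frac{1}{k+1}\sum_{\ell=1}^{k} z(\ell)$, the average of the prior mean and the measurements, which explains the slow $\frac{1}{1+k}$ decay of the error variance.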
c) The Matlab code is available on the class webpage.

• Recursive least squares works, but we can do much better with the proposed nonlinear estimator derived below.

Problem 8

By conditional independence, the MMSE estimate is

$\hat{x}^{MMSE}(k) = \arg\min_{\hat{x}} \left\{ \sum_{x \in \{-1, 1\}} (\hat{x} - x)^2\, f(x|z(1))\, f(x|z(2)) \cdots f(x|z(k)) \right\}. \qquad (6)$

Let us first consider the MMSE estimate at time $k = 1$. The only possible combinations are:

 x    w(1)    z(1)
-1     -1      -2
-1      1       0
 1     -1       0
 1      1       2

If $z(1) = -2$ or $z(1) = 2$, we know $x$ precisely: $x = -1$ and $x = 1$, respectively. If $z(1) = 0$, it is equally likely that $x = 1$ or $x = -1$. The MMSE estimate is

• for $z(1) = 2$: $\hat{x}^{MMSE}(1) = 1$, since $f(x = -1\,|\,z(1) = 2) = 0$;

• for $z(1) = -2$: $\hat{x}^{MMSE}(1) = -1$, since $f(x = 1\,|\,z(1) = -2) = 0$;

• for $z(1) = 0$:

$\hat{x}^{MMSE}(1) = \arg\min_{\hat{x}} \left( f(x = 1\,|\,z(1) = 0)\, (\hat{x} - 1)^2 + f(x = -1\,|\,z(1) = 0)\, (\hat{x} + 1)^2 \right) = \arg\min_{\hat{x}} \left( \frac{1}{2}(\hat{x} - 1)^2 + \frac{1}{2}(\hat{x} + 1)^2 \right) = \pm 1,$

where the estimate is restricted to the values $x$ can take, $\hat{x} \in \{-1, 1\}$; both choices give the same expected squared error.

Recalling (6), we note that the MMSE estimate is determined for all $k$ if one measures either $-2$ or $2$:

for $z(1) = 2$: $\hat{x}^{MMSE}(k) = 1$ for all $k = 1, 2, \dots$;
for $z(1) = -2$: $\hat{x}^{MMSE}(k) = -1$ for all $k = 1, 2, \dots$.

A measurement $z(1) = 0$ does not provide any information, i.e. $x = 1$ and $x = -1$ remain equally likely. As soon as a subsequent measurement $z(k)$, $k \ge 2$, has the value $-2$ or $2$, $x$ is known precisely, which is also reflected by the MMSE estimate. As long as we keep measuring 0, we do not gain additional information and

$\hat{x}^{MMSE}(k) = \pm 1 \quad \text{if } z(\ell) = 0 \ \forall\, \ell = 1, 2, \dots, k.$

Matlab code for this nonlinear MMSE estimator is available on the class webpage; a Python sketch follows below.
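The following is a minimal Python sketch of the nonlinear MMSE estimator under the case analysis above; it is not the class Matlab code, and the seed and tie-breaking choice are arbitrary. A measurement of ±2 pins down x exactly, while 0 carries no information.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10000

x = rng.choice([-1.0, 1.0])   # true parameter: +1 or -1 with equal probability
x_hat = 1.0                   # while only zeros are seen, either +1 or -1 is optimal

for k in range(1, N + 1):
    z = x + rng.choice([-1.0, 1.0])   # measurement z(k) = x + w(k)
    if z == 2.0:        # only x = +1 can produce z = 2
        x_hat = 1.0
        break
    if z == -2.0:       # only x = -1 can produce z = -2
        x_hat = -1.0
        break
    # z == 0 is uninformative: keep the current (arbitrary) choice

print(f"true x = {x:+.0f}, MMSE estimate = {x_hat:+.0f}, resolved at k = {k}")
```

In simulation the estimator typically locks onto the true $x$ within a few measurements: given $x$, each $z(k)$ equals $2x$ with probability $\frac{1}{2}$, so the waiting time for an informative measurement is geometric.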