4. Bayesian Inference, Part IV: Least Mean Squares (LMS) Estimation
ECE 302 Fall 2009, TR 3‐4:15pm
Purdue University, School of ECE
Prof. Ilya Pollak

An Estimate vs. an Estimator
• Suppose X and Y are random variables.
• Y is observed, X is not observed.
• An estimator of X based on Y is any function g(Y).
• Since g(Y) is a function of a r.v., it is a r.v. itself.
• An estimate of X based on Y=y is a number: the value of an estimator as determined by the observed value y of Y.
• E.g., if g(Y) is an estimator of X, then g(y) is an estimate of X for any specific observation y.

Pros and Cons of MAP Estimators
• Minimize the error probability.
• Not appropriate when different errors have different costs (as in, e.g., the burglar alarm and spam filtering examples).
• For some posterior distributions, may result in large probabilities of large errors.

An example when the MAP estimate is bad
Suppose fX|Y(x|0) consists of two triangles: a tall, narrow one of height 10 supported on [0, 0.01], and a wide one of height 5 supported on [1, 1.38]. [Figure: plot of fX|Y(x|0) vs. x, with peaks at x = 0.005 and x = 1.19.]
• MAP: the max of the tall left triangle, x* = 0.005.
• Left triangle: 0.05 of the conditional probability mass.
• MAP makes errors of ≥ 0.995 with cond. prob. 0.95.
• Another estimate: conditional mean E[X|Y=0] = 0.05·0.005 + 0.95·1.19 ≈ 1.13.
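A quick numerical illustration (not from the slides; a minimal NumPy sketch, with the density reconstructed from the figure) comparing the MAP and conditional-mean estimates for this posterior:

```python
import numpy as np

# Sketch (illustrative): the two-triangle posterior fX|Y(x|0) from the
# example. A tall triangle of height 10 on [0, 0.01] (mass 0.05) and a
# wide triangle of height 5 on [1, 1.38] (mass 0.95), each symmetric
# about its midpoint, consistent with the peak locations in the slide.
def posterior(x):
    left = np.where((x >= 0) & (x <= 0.01),
                    10 * (1 - np.abs(x - 0.005) / 0.005), 0.0)
    right = np.where((x >= 1) & (x <= 1.38),
                     5 * (1 - np.abs(x - 1.19) / 0.19), 0.0)
    return left + right

x = np.linspace(0.0, 1.5, 1_500_001)
f = posterior(x)

x_map = x[np.argmax(f)]       # MAP estimate: peak of the density
x_lms = np.trapz(x * f, x)    # conditional-mean (LMS) estimate
print(x_map)                  # ≈ 0.005
print(x_lms)                  # ≈ 1.13
```

The MAP estimate sits on the sliver of mass near 0, while the conditional mean lands near the bulk of the probability mass.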
Mean Squared Error (MSE) for an Estimator g(Y) of X
MSE = E[(X − g(Y))²]
• The expectation is taken with respect to the joint distribution of X and Y.
• This is the expectation of the squared difference between the unobserved random variable X and its estimator based on the observed random variable Y.
• Makes large errors more costly than small ones.

Conditional Mean Squared Error for an Estimate g(y) of X, given Y=y
Conditional MSE = E[(X − g(y))² | Y=y]
• The expectation is taken with respect to the conditional distribution of X given Y=y.
• This is the conditional expectation, given Y=y, of the squared difference between the unobserved random variable X and its estimate based on the specific observation y of the random variable Y.

Least Mean Squares (LMS) Estimator
• The LMS estimator for X based on Y is the estimator that minimizes the MSE among all estimators of X based on Y.
• If g(Y) is the LMS estimator, then, for any observation y of Y, the number g(y) minimizes the conditional MSE among all estimates of X.
• The LMS estimator of X based on Y is E[X|Y].
• For any observation y of Y, the LMS estimate of X is the mean of the posterior density, i.e., E[X|Y=y].

LMS estimate with no data
• Suppose X is a r.v. with a known mean.
• We observe nothing.
• We would like an estimate x* of X which minimizes the MSE, E[(X−x*)²].
• E[(X−x*)²] = var(X−x*) + (E[X−x*])² = var(X) + (E[X] − x*)²
• This is a quadratic function of x*, minimized by x* = E[X]. [Figure: the MSE as a function of x* is a parabola with minimum value var(X) at x* = E[X].]
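A minimal Monte Carlo sketch (assumed Python/NumPy; the exponential distribution is an arbitrary choice for illustration) of this fact: the empirical MSE, as a function of x*, is minimized near E[X].

```python
import numpy as np

# Sketch (illustrative): with no data, E[(X - x*)^2] = var(X) + (E[X] - x*)^2,
# a parabola in x* minimized at x* = E[X].
rng = np.random.default_rng(0)
x_samples = rng.exponential(scale=2.0, size=100_000)  # E[X] = 2

candidates = np.linspace(0.0, 4.0, 9)
mse = [np.mean((x_samples - c) ** 2) for c in candidates]

print(candidates[np.argmin(mse)])  # the minimizing x* is near E[X] = 2
print(x_samples.mean())
```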
LMS estimate from one observation
• Suppose X, Y are r.v.'s, and E[X|Y=y] is known.
• We observe Y=y.
• We would like an estimate x* of X which minimizes the posterior MSE, E[(X−x*)²|Y=y].
• E[(X−x*)²|Y=y] = var(X−x*|Y=y) + (E[X−x*|Y=y])² = var(X|Y=y) + (E[X|Y=y] − x*)²
• Quadratic function of x*, minimized by x* = E[X|Y=y]. [Figure: the posterior MSE as a function of x* is a parabola with minimum value var(X|Y=y) at x* = E[X|Y=y].]

LMS estimator
• Suppose X, Y are r.v.'s, and E[X|Y=y] is known for all y.
• Let g(y) = E[X|Y=y] be the LMS estimate of X based on data Y=y.
• Then g(Y) = E[X|Y] is an estimator of X based on Y.
• Let h(Y) be any other estimator of X based on Y.
• We just showed that, for every value of y, we have: E[(X−g(y))²|Y=y] ≤ E[(X−h(y))²|Y=y]
• Therefore, E[(X−g(Y))²|Y] ≤ E[(X−h(Y))²|Y]
• Take expectations of both sides and use the iterated expectation formula: E[(X−g(Y))²] ≤ E[(X−h(Y))²]
• Therefore, the conditional expectation estimator achieves an MSE that is less than or equal to the MSE of any other estimator.
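A Monte Carlo sketch (illustrative; the jointly Gaussian setup and the competitor h(Y) = Y are my choices, not from the slides) of this optimality claim:

```python
import numpy as np

# Sketch: with X ~ N(0,1) and Y = X + W, W ~ N(0,1) independent, the
# conditional mean is E[X|Y] = Y/2. Compare its MSE to that of h(Y) = Y.
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(size=n)
y = x + rng.normal(size=n)

g = y / 2   # conditional-mean estimator E[X|Y]
h = y       # a competing estimator: the raw observation

print(np.mean((x - g) ** 2))  # ≈ 0.5, the conditional variance var(X|Y)
print(np.mean((x - h) ** 2))  # ≈ 1.0, strictly larger
```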
Other names that the LMS estimate goes by in the literature
• Least‐squares estimate (LSE)
• Bayes least‐squares estimate (BLSE)
• Minimum mean square error estimate (MMSE)

Ex. 8.3: estimating the mean of a Gaussian r.v.
• Observe Yi = X + Wi for i = 1, …, n
• X, W1, …, Wn independent Gaussian, with known means and variances
• Wi has mean 0 and variance σi²
• X has mean x0 and variance σ0²
• Previously, we found the MAP estimate of X based on observing Y = (Y1, …, Yn)
• Now let's find the LMS estimate of X based on observing Y = (Y1, …, Yn)

Prior model and measurement model

Prior model:
$$ f_X(x) = \frac{1}{\sigma_0\sqrt{2\pi}}\,\exp\!\left(-\frac{1}{2}\left(\frac{x - x_0}{\sigma_0}\right)^{2}\right) $$

Measurement model:
$$ f_{Y|X}(\mathbf{y} \mid x) = \prod_{i=1}^{n} \frac{1}{\sigma_i\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y_i - x}{\sigma_i}\right)^{2}} = (2\pi)^{-n/2}\left(\prod_{i=1}^{n}\frac{1}{\sigma_i}\right) \exp\!\left(-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{y_i - x}{\sigma_i}\right)^{2}\right) $$

Posterior PDF
$$ f_{X|Y}(x \mid \mathbf{y}) = \frac{f_{Y|X}(\mathbf{y} \mid x)\, f_X(x)}{f_{\mathbf{Y}}(\mathbf{y})} $$
$$ = \frac{1}{f_{\mathbf{Y}}(\mathbf{y})}\,(2\pi)^{-n/2}\left(\prod_{i=1}^{n}\frac{1}{\sigma_i}\right) \exp\!\left(-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{y_i - x}{\sigma_i}\right)^{2}\right) \frac{1}{\sigma_0\sqrt{2\pi}}\,\exp\!\left(-\frac{1}{2}\left(\frac{x - x_0}{\sigma_0}\right)^{2}\right) $$
$$ = C \exp\!\left(-\frac{1}{2}\sum_{i=0}^{n}\left(\frac{y_i - x}{\sigma_i}\right)^{2}\right) $$
(Denoting y0 = x0 and C = the factor that's not a function of x.)

Further massaging the posterior PDF…
$$ f_{X|Y}(x \mid \mathbf{y}) = C \exp\!\left(-\frac{1}{2}\sum_{i=0}^{n}\left(\frac{y_i - x}{\sigma_i}\right)^{2}\right) = C \exp\!\left(-\frac{1}{2}\left[x^{2}\sum_{i=0}^{n}\frac{1}{\sigma_i^{2}} - 2x\sum_{i=0}^{n}\frac{y_i}{\sigma_i^{2}} + \sum_{i=0}^{n}\frac{y_i^{2}}{\sigma_i^{2}}\right]\right) $$
Denoting
$$ v = \frac{1}{\displaystyle\sum_{i=0}^{n}\frac{1}{\sigma_i^{2}}} \qquad\text{and}\qquad m = \frac{\displaystyle\sum_{i=0}^{n}\frac{y_i}{\sigma_i^{2}}}{\displaystyle\sum_{i=0}^{n}\frac{1}{\sigma_i^{2}}}, $$
we have:
$$ f_{X|Y}(x \mid \mathbf{y}) = C \exp\!\left(-\frac{1}{2v}\left[x^{2} - 2xm + m^{2}\right] + \frac{m^{2}}{2v} - \frac{1}{2}\sum_{i=0}^{n}\frac{y_i^{2}}{\sigma_i^{2}}\right) = C_1 \exp\!\left(-\frac{(x-m)^{2}}{2v}\right) $$
This is a Gaussian PDF with mean m and variance v, and therefore we immediately have:
$$ C_1 = \frac{1}{\sqrt{2\pi v}} $$
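As a sanity check of this completing-the-square step (a sketch with made-up data values, not from the slides), one can verify numerically that the product of the prior and the likelihood factors is proportional, as a function of x, to a Gaussian density with the derived m and v:

```python
import numpy as np

# Sketch (illustrative): prior times likelihoods is proportional in x to
# a Gaussian with the precision-weighted mean m and variance v.
def gauss(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

x = np.linspace(0.0, 10.0, 1001)
x0, sigma0 = 4.0, 2.0                    # prior mean and std (made up)
y = np.array([4.8, 5.3, 5.1])            # observations (made up)
sigma = np.array([1.0, 0.5, 2.0])        # noise std devs (made up)

unnorm = gauss(x, x0, sigma0)            # prior f_X(x)
for yi, si in zip(y, sigma):
    unnorm *= gauss(yi, x, si)           # likelihood factors

prec = np.concatenate(([1 / sigma0**2], 1 / sigma**2))  # 1/sigma_i^2, i = 0..n
v = 1 / prec.sum()
m = (prec * np.concatenate(([x0], y))).sum() * v

ratio = unnorm / gauss(x, m, np.sqrt(v))
print(ratio.min(), ratio.max())  # nearly equal: the ratio is constant in x
```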
Posterior PDF and LMS estimate
$$ f_{X|Y}(x \mid \mathbf{y}) = \frac{1}{\sqrt{2\pi v}}\,\exp\!\left(-\frac{(x-m)^{2}}{2v}\right), \qquad\text{where}\quad v = \frac{1}{\displaystyle\sum_{i=0}^{n}\frac{1}{\sigma_i^{2}}} \quad\text{and}\quad m = \frac{\displaystyle\sum_{i=0}^{n}\frac{y_i}{\sigma_i^{2}}}{\displaystyle\sum_{i=0}^{n}\frac{1}{\sigma_i^{2}}} $$
Therefore, the LMS estimate is
$$ E[X \mid \mathbf{Y} = \mathbf{y}] = m = \frac{\displaystyle\sum_{i=0}^{n}\frac{y_i}{\sigma_i^{2}}}{\displaystyle\sum_{i=0}^{n}\frac{1}{\sigma_i^{2}}}, $$
and the LMS estimator is
$$ E[X \mid \mathbf{Y}] = \frac{\displaystyle\sum_{i=0}^{n}\frac{Y_i}{\sigma_i^{2}}}{\displaystyle\sum_{i=0}^{n}\frac{1}{\sigma_i^{2}}} \qquad\text{(where } Y_0 = x_0\text{).} $$
Note that, in this example, LMS = MAP, because:
• the posterior density is Gaussian
• for a Gaussian density, the maximum is at the mean
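A short sketch (assumed NumPy implementation; function name and data values are my own) of this precision-weighted estimate, treating the prior as a zeroth "observation" y0 = x0 with variance σ0²:

```python
import numpy as np

# Sketch (illustrative): LMS estimate and posterior variance for Ex. 8.3.
def gaussian_lms(y, sigma, x0, sigma0):
    """LMS estimate of X from observations y with noise std devs sigma."""
    y_all = np.concatenate(([x0], y))                       # y0 = x0
    prec = 1.0 / np.concatenate(([sigma0], sigma)) ** 2     # 1/sigma_i^2
    v = 1.0 / prec.sum()                                    # posterior variance
    m = (prec * y_all).sum() * v                            # posterior mean
    return m, v

m, v = gaussian_lms(y=np.array([4.8, 5.3, 5.1]),
                    sigma=np.array([1.0, 0.5, 2.0]),
                    x0=4.0, sigma0=2.0)
print(m, v)
```

Low-noise observations (small σi) get large weights, and a vague prior (large σ0) contributes little to the estimate.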
Ex. 8.11
• X = continuous uniform over [4, 10].
• W = continuous uniform over [−1, 1].
• X and W are independent.
• Y = X + W.
• Calculate the LMS estimate of X based on Y=y.
• Calculate the associated conditional MSE.

Ex. 8.11: the joint PDF
fY|X(y|x) is uniform over [x−1, x+1], so
$$ f_{X,Y}(x, y) = f_X(x)\, f_{Y|X}(y \mid x) = \begin{cases} 1/12, & \text{if } 4 \le x \le 10 \text{ and } x-1 \le y \le x+1, \\ 0, & \text{otherwise.} \end{cases} $$

Ex. 8.11: support of the joint PDF of X and Y
The joint PDF is supported (i.e., is nonzero) inside the parallelogram in the (y, x) plane bounded by the lines x = 4, x = 10, x − 1 = y, and x + 1 = y; y ranges over [3, 11]. [Figure: the parallelogram, with x on the vertical axis from 4 to 10 and y on the horizontal axis from 3 to 11.]

Ex. 8.11: the posterior
$$ f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} $$
fX,Y is uniform over the parallelogram. Therefore, for any fixed y=y0, the posterior is uniform over the intersection of the line y=y0 with the parallelogram. Thus fX|Y(x|y) is uniform over:
• [4, y+1] for 3 ≤ y ≤ 5
• [y−1, y+1] for 5 ≤ y ≤ 9
• [y−1, 10] for 9 ≤ y ≤ 11

Ex. 8.11: the LMS estimate
The LMS estimate is the midpoint of the posterior interval. Therefore, E[X|Y=y] =
• (y+5)/2 for 3 ≤ y ≤ 5
• y for 5 ≤ y ≤ 9
• (y+9)/2 for 9 ≤ y ≤ 11

Ex. 8.11: the conditional MSE
Since the posterior is uniform, and the variance of a uniform distribution over an interval of length L is L²/12, the conditional MSE is E[(X−E[X|Y=y])² | Y=y] = var(X|Y=y) =
• (y−3)²/12 for 3 ≤ y ≤ 5
• 1/3 for 5 ≤ y ≤ 9
• (11−y)²/12 for 9 ≤ y ≤ 11
[Figure: the conditional MSE as a function of y. It rises from 0 at y=3 to 1/3 at y=5, stays at 1/3 for 5 ≤ y ≤ 9, and falls back to 0 at y=11.]
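A sketch (illustrative; the function names are my own) of the piecewise LMS estimate and conditional MSE, with a Monte Carlo spot check at one value of y:

```python
import numpy as np

# Sketch (illustrative): piecewise formulas for Ex. 8.11.
def lms_estimate(y):
    """E[X|Y=y] for X ~ U[4,10], W ~ U[-1,1], Y = X + W."""
    if 3 <= y <= 5:
        return (y + 5) / 2       # posterior uniform on [4, y+1]
    if 5 <= y <= 9:
        return y                 # posterior uniform on [y-1, y+1]
    if 9 <= y <= 11:
        return (y + 9) / 2       # posterior uniform on [y-1, 10]
    raise ValueError("y outside the support [3, 11]")

def conditional_mse(y):
    """var(X|Y=y): (length of the posterior interval)^2 / 12."""
    if 3 <= y <= 5:
        return (y - 3) ** 2 / 12
    if 5 <= y <= 9:
        return 1 / 3
    if 9 <= y <= 11:
        return (11 - y) ** 2 / 12
    raise ValueError("y outside the support [3, 11]")

# Spot check at y = 4: keep samples whose Y lands near 4.
rng = np.random.default_rng(0)
x = rng.uniform(4, 10, size=2_000_000)
yv = x + rng.uniform(-1, 1, size=x.size)
near = np.abs(yv - 4) < 0.01
print(x[near].mean(), lms_estimate(4))      # both ≈ 4.5
print(x[near].var(), conditional_mse(4))    # both ≈ 1/12 ≈ 0.083
```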
Estimation error for LMS
We denote the LMS estimator of X based on Y by
$$ \hat{X}_{\mathrm{LMS}} = E[X \mid Y]. $$
The associated estimation error is defined as:
$$ \tilde{X}_{\mathrm{LMS}} = \hat{X}_{\mathrm{LMS}} - X = E[X \mid Y] - X $$

Properties of the estimation error for LMS
The estimation error has zero mean:
$$ E\big[\tilde{X}_{\mathrm{LMS}}\big] = E\big[E[X \mid Y] - X\big] = E\big[E[X \mid Y]\big] - E[X] = 0 $$
The estimation error is uncorrelated with the estimator:
$$ E\big[\hat{X}_{\mathrm{LMS}}\,\tilde{X}_{\mathrm{LMS}}\big] = 0 $$
Therefore,
$$ \operatorname{var}(X) = \operatorname{var}\big(\hat{X}_{\mathrm{LMS}} - (\hat{X}_{\mathrm{LMS}} - X)\big) = \operatorname{var}\big(\hat{X}_{\mathrm{LMS}} - \tilde{X}_{\mathrm{LMS}}\big) = \operatorname{var}\big(\hat{X}_{\mathrm{LMS}}\big) + \operatorname{var}\big(-\tilde{X}_{\mathrm{LMS}}\big) = \operatorname{var}\big(\hat{X}_{\mathrm{LMS}}\big) + \operatorname{var}\big(\tilde{X}_{\mathrm{LMS}}\big) $$
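A Monte Carlo sketch (illustrative; reusing the Gaussian one-observation setup from earlier, where E[X|Y] = Y/2) of these three properties:

```python
import numpy as np

# Sketch: check the estimation-error properties with X ~ N(0,1),
# Y = X + W, W ~ N(0,1) independent, so that E[X|Y] = Y/2.
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(size=n)
y = x + rng.normal(size=n)

x_hat = y / 2            # LMS estimator E[X|Y]
x_tilde = x_hat - x      # estimation error

print(x_tilde.mean())                          # ≈ 0: zero-mean error
print(np.mean(x_hat * x_tilde))                # ≈ 0: uncorrelated with estimator
print(x.var(), x_hat.var() + x_tilde.var())    # ≈ equal: var(X) = var + var
```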