4. Bayesian Inference, Part IV: Least Mean Squares (LMS) estimation

ECE 302, Fall 2009, TR 3-4:15pm
Purdue University, School of ECE
Prof. Ilya Pollak

An Estimate vs an Estimator
•  Suppose X and Y are random variables.
•  Y is observed, X is not observed.
•  An estimator of X based on Y is any function g(Y).
•  Since g(Y) is a function of a r.v., it is a r.v. itself.
•  An estimate of X based on Y=y is a number, the value of an estimator as determined by the observed value y of Y.
•  E.g., if g(Y) is an estimator of X, then g(y) is an estimate of X for any specific observation y.

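The distinction between an estimator (a random variable) and an estimate (a number) can be made concrete with a small simulation. This is a minimal sketch; the joint model and the particular choice of g are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative model: X ~ N(0,1), Y = X + independent N(0,1) noise.
n = 10_000
X = rng.normal(0.0, 1.0, size=n)
Y = X + rng.normal(0.0, 1.0, size=n)

def g(y):
    """An (arbitrary) estimator of X based on Y: here g(Y) = Y/2."""
    return y / 2.0

# g(Y) is a random variable: applying g to random samples of Y gives random values.
estimator_samples = g(Y)

# g(y) for one specific observation y is a single number, i.e., an estimate.
y_obs = 1.3
estimate = g(y_obs)
print("estimate for y = 1.3:", estimate)
print("a few values of the estimator g(Y):", estimator_samples[:3])
```
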
Pros and Cons of MAP Estimators
•  Minimize the error probability.
•  Not appropriate when different errors have different costs (as in, e.g., the burglar alarm and spam filtering examples).
•  For some posterior distributions, may result in large probabilities of large errors.

An example when the MAP estimate is bad

Suppose fX|Y(x|0) is:
[Figure: the conditional density consists of a tall, narrow triangle of height 10 on the interval [0, 0.01] and a shorter triangle of height 5 on the interval [1, 1.38].]

•  MAP: the max of the tall left triangle, x* = 0.005.
•  Left triangle: 0.05 of the conditional probability mass.
•  MAP makes errors of ≥ 0.995 with cond. prob. 0.95.
•  Another estimate: conditional mean E[X|Y=0] = 0.05·0.005 + 0.95·1.19 ≈ 1.13.

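A quick numeric check of this example: discretize the two-triangle density and compute both the MAP estimate and the conditional mean. This is a sketch; the triangle shapes (symmetric triangles on [0, 0.01] and [1, 1.38] with heights 10 and 5) are read off the figure.

```python
import numpy as np

# Discretize f_{X|Y}(x|0): two symmetric triangles,
# height 10 on [0, 0.01] (mass 0.05) and height 5 on [1, 1.38] (mass 0.95).
x = np.linspace(0.0, 1.5, 300_001)

def triangle(x, a, b, h):
    """Symmetric triangular bump of height h on [a, b], zero elsewhere."""
    mid = (a + b) / 2.0
    return np.where((x >= a) & (x <= b), h * (1.0 - np.abs(x - mid) / (mid - a)), 0.0)

f = triangle(x, 0.0, 0.01, 10.0) + triangle(x, 1.0, 1.38, 5.0)

map_estimate = x[np.argmax(f)]           # location of the density's peak
cond_mean = np.sum(x * f) / np.sum(f)    # discrete approximation of E[X | Y=0]

print(map_estimate)   # ~0.005
print(cond_mean)      # ~1.13
```
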
Mean Squared Error (MSE) for an Estimator g(Y) of X

MSE = E[(X − g(Y))²]

•  The expectation is taken with respect to the joint distribution of X and Y.
•  This is the expectation of the squared difference between the unobserved random variable X and its estimator based on the observed random variable Y.
•  Makes large errors more costly than small ones.

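A concrete illustration of this definition (a sketch; the joint model and the estimator g are arbitrary assumptions, not from the slides): the MSE is the average of (X − g(Y))² over samples drawn from the joint distribution of (X, Y).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

# Assumed joint model for illustration: X ~ N(0,1), Y = X + N(0,1).
X = rng.normal(size=n)
Y = X + rng.normal(size=n)

def g(y):
    return 0.8 * y        # an arbitrary estimator of X based on Y

mse = np.mean((X - g(Y)) ** 2)   # Monte Carlo estimate of E[(X - g(Y))^2]
print("MSE of g(Y) = 0.8*Y:", mse)
```
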
Conditional Mean Squared Error for an Estimate g(y) of X, given Y=y

Conditional MSE = E[(X − g(y))² | Y=y]

•  The expectation is taken with respect to the conditional distribution of X given Y=y.
•  This is the conditional expectation, given Y=y, of the squared difference between the unobserved random variable X and its estimate based on the specific observation y of the random variable Y.

Least Mean Squares (LMS) Estimator
•  LMS estimator for X based on Y is the estimator that minimizes the MSE among all estimators of X based on Y.
•  If g(Y) is the LMS estimator, then, for any observation y of Y, the number g(y) minimizes the conditional MSE among all estimates of X.
•  The LMS estimator of X based on Y is E[X|Y].
•  For any observation y of Y, the LMS estimate of X is the mean of the posterior density, i.e., E[X|Y=y].

LMS estimate with no data
•  Suppose X is a r.v. with a known mean.
•  We observe nothing.
•  We would like an estimate x* of X which minimizes the MSE, E[(X−x*)²].
•  E[(X−x*)²] = var(X−x*) + (E[X−x*])² = var(X) + (E[X] − x*)²
•  This is a quadratic function of x*, minimized by x*=E[X].

[Figure: the MSE as a function of x* is a parabola whose minimum value is var(X), attained at x* = E[X].]

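A small numerical illustration of this quadratic (with an arbitrary assumed distribution for X, not from the slides): approximate E[(X − x*)²] by Monte Carlo on a grid of x* values and check that the minimum sits at E[X] with minimum value var(X).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.exponential(scale=2.0, size=200_000)  # assumed example: E[X] = 2, var(X) = 4

xstar = np.linspace(0.0, 4.0, 401)
mse = np.array([np.mean((X - c) ** 2) for c in xstar])

best = xstar[np.argmin(mse)]
print("argmin of MSE:", best, "  E[X] ~", X.mean())    # both ~2
print("min MSE:", mse.min(), "  var(X) ~", X.var())    # both ~4
```
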
LMS estimate from one observation
•  Suppose X, Y are r.v.'s, and E[X|Y=y] is known.
•  We observe Y=y.
•  We would like an estimate x* of X which minimizes the posterior MSE, E[(X−x*)²|Y=y].
•  E[(X−x*)²|Y=y] = var(X−x*|Y=y) + (E[X−x*|Y=y])² = var(X|Y=y) + (E[X|Y=y] − x*)²
•  Quadratic function of x*, minimized by x*=E[X|Y=y].

[Figure: the conditional MSE as a function of x* is a parabola whose minimum value is var(X|Y=y), attained at x* = E[X|Y=y].]

LMS estimator
•  Suppose X, Y are r.v.'s, and E[X|Y=y] is known for all y.
•  Let g(y) = E[X|Y=y] be the LMS estimate of X based on data Y=y.
•  Then g(Y) = E[X|Y] is an estimator of X based on Y.
•  Let h(Y) be any other estimator of X based on Y.
•  We just showed that, for every value of y, we have: E[(X−g(y))²|Y=y] ≤ E[(X−h(y))²|Y=y]
•  Therefore, E[(X−g(Y))²|Y] ≤ E[(X−h(Y))²|Y]
•  Take expectations of both sides and use the iterated expectation formula: E[(X−g(Y))²] ≤ E[(X−h(Y))²]
•  Therefore, the conditional expectation estimator achieves an MSE which is ≤ the MSE of any other estimator.

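A Monte Carlo sanity check of this optimality claim, under an assumed joint model where E[X|Y] has a simple closed form (X ~ N(0,1), Y = X + N(0,1), so E[X|Y] = Y/2; this model is an illustration, not from the slides): compare the empirical MSE of g(Y) = E[X|Y] with a few competing estimators h(Y).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Assumed model: X ~ N(0,1), Y = X + W with W ~ N(0,1) independent of X.
X = rng.normal(size=n)
Y = X + rng.normal(size=n)

estimators = {
    "E[X|Y] = Y/2 (LMS)": Y / 2,
    "h(Y) = Y":           Y,
    "h(Y) = 0":           np.zeros(n),
    "h(Y) = 0.4*Y + 0.1": 0.4 * Y + 0.1,
}

for name, est in estimators.items():
    print(f"{name:22s} MSE = {np.mean((X - est) ** 2):.4f}")
# The LMS estimator attains the smallest MSE (~0.5 here); every other h(Y) does worse.
```
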
Other names that the LMS estimate goes by in the literature
•  Least-squares estimate (LSE)
•  Bayes least-squares estimate (BLSE)
•  Minimum mean square error estimate (MMSE)

Ex. 8.3: estimating the mean of a Gaussian r.v.
•  Observe Yi = X + Wi for i = 1,…,n
•  X, W1, …, Wn independent Gaussian, with known means and variances
•  Wi has mean 0 and variance σ_i²
•  X has mean x_0 and variance σ_0²
•  Previously, we found the MAP estimate of X based on observing Y = (Y1, …, Yn)
•  Now let's find the LMS estimate of X based on observing Y = (Y1, …, Yn)

Prior model and measurement model

Prior model:
f_X(x) = \frac{1}{\sigma_0 \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x - x_0}{\sigma_0} \right)^2 \right)

Measurement model:
f_{Y|X}(y \mid x) = \prod_{i=1}^{n} \frac{1}{\sigma_i \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{y_i - x}{\sigma_i} \right)^2 \right)
                  = (2\pi)^{-n/2} \left( \prod_{i=1}^{n} \frac{1}{\sigma_i} \right) \exp\left( -\sum_{i=1}^{n} \frac{1}{2} \left( \frac{y_i - x}{\sigma_i} \right)^2 \right)

Posterior PDF

f_{X|Y}(x \mid y) = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{f_Y(y)}
                  = \frac{1}{f_Y(y)} (2\pi)^{-n/2} \left( \prod_{i=1}^{n} \frac{1}{\sigma_i} \right) \exp\left( -\sum_{i=1}^{n} \frac{1}{2} \left( \frac{y_i - x}{\sigma_i} \right)^2 \right) \frac{1}{\sigma_0 \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x - x_0}{\sigma_0} \right)^2 \right)
                  = C \exp\left( -\sum_{i=0}^{n} \frac{1}{2} \left( \frac{y_i - x}{\sigma_i} \right)^2 \right)

(Denoting y_0 = x_0 and C = the factor that is not a function of x.)

Further massaging the posterior PDF…

f_{X|Y}(x \mid y) = C \exp\left( -\sum_{i=0}^{n} \frac{1}{2} \left( \frac{y_i - x}{\sigma_i} \right)^2 \right)
                  = C \exp\left( -\frac{1}{2} \left[ x^2 \sum_{i=0}^{n} \frac{1}{\sigma_i^2} - 2x \sum_{i=0}^{n} \frac{y_i}{\sigma_i^2} + \sum_{i=0}^{n} \frac{y_i^2}{\sigma_i^2} \right] \right)

Denoting v = \frac{1}{\sum_{i=0}^{n} 1/\sigma_i^2} and m = \frac{\sum_{i=0}^{n} y_i/\sigma_i^2}{\sum_{i=0}^{n} 1/\sigma_i^2}, we have:

f_{X|Y}(x \mid y) = C \exp\left( -\frac{1}{2v} \left[ x^2 - 2xm + m^2 \right] + \frac{m^2}{2v} - \frac{1}{2} \sum_{i=0}^{n} \frac{y_i^2}{\sigma_i^2} \right)
                  = C_1 \exp\left( -\frac{(x - m)^2}{2v} \right)

This is a Gaussian PDF with mean m and variance v, and therefore we immediately have:
C_1 = \frac{1}{\sqrt{2\pi v}}

Posterior PDF and LMS estimate

f_{X|Y}(x \mid y) = \frac{1}{\sqrt{2\pi v}} \exp\left( -\frac{(x - m)^2}{2v} \right),
where v = \frac{1}{\sum_{i=0}^{n} 1/\sigma_i^2} and m = \frac{\sum_{i=0}^{n} y_i/\sigma_i^2}{\sum_{i=0}^{n} 1/\sigma_i^2}.

Therefore, the LMS estimate is E[X \mid Y = y] = m = \frac{\sum_{i=0}^{n} y_i/\sigma_i^2}{\sum_{i=0}^{n} 1/\sigma_i^2},
and the LMS estimator is E[X \mid Y] = \frac{\sum_{i=0}^{n} Y_i/\sigma_i^2}{\sum_{i=0}^{n} 1/\sigma_i^2} (with Y_0 ≡ y_0 = x_0).

Note that, in this example, LMS = MAP, because:
•  the posterior density is Gaussian
•  for a Gaussian density, the maximum is at the mean

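A numeric check of the closed form (a sketch with made-up values of x_0, σ_0, the σ_i, and the observations y_i): compute m and v from the formulas above and compare them with the mean and variance of a posterior obtained by brute-force normalization of prior × likelihood on a grid.

```python
import numpy as np

# Assumed example values (not from the slides).
x0, sigma0 = 5.0, 2.0                  # prior mean and standard deviation of X
sigmas = np.array([1.0, 0.5, 2.0])     # noise standard deviations sigma_1..sigma_n
y = np.array([4.2, 5.5, 6.1])          # observations y_1..y_n

# Closed-form posterior mean m and variance v, with y_0 = x_0 included in the sums.
w = np.concatenate(([1 / sigma0**2], 1 / sigmas**2))     # weights 1/sigma_i^2, i = 0..n
obs = np.concatenate(([x0], y))                          # y_0 = x_0, then y_1..y_n
m = np.sum(w * obs) / np.sum(w)
v = 1.0 / np.sum(w)

# Brute-force check: evaluate prior * likelihood on a grid and normalize.
x = np.linspace(x0 - 10, x0 + 10, 400_001)
log_post = -0.5 * ((x - x0) / sigma0) ** 2
for yi, si in zip(y, sigmas):
    log_post += -0.5 * ((yi - x) / si) ** 2
post = np.exp(log_post - log_post.max())
post /= post.sum()

m_num = np.sum(x * post)
v_num = np.sum((x - m_num) ** 2 * post)
print(m, m_num)   # should agree
print(v, v_num)   # should agree
```
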
Ex. 8.11
•  X = continuous uniform over [4, 10].
•  W = continuous uniform over [−1, 1].
•  X and W are independent.
•  Y = X + W.
•  Calculate the LMS estimate of X based on Y=y.
•  Calculate the associated conditional MSE.
•  fY|X(y|x) is uniform over [x−1, x+1]

f_{X,Y}(x, y) = f_X(x)\, f_{Y|X}(y \mid x) = \begin{cases} 1/12, & \text{if } 4 \le x \le 10 \text{ and } x-1 \le y \le x+1 \\ 0, & \text{otherwise} \end{cases}

Ex. 8.11: support of the joint PDF of X and Y

The joint PDF f_{X,Y}(x, y) = 1/12 is supported (i.e., is nonzero) inside the parallelogram in the (y, x) plane bounded by the lines x − 1 = y and x + 1 = y, for 4 ≤ x ≤ 10.
[Figure: the support parallelogram has corners at (y, x) = (3, 4), (5, 4), (11, 10), and (9, 10).]

Ex. 8.11: the posterior

f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}

fX,Y is uniform over the parallelogram. Therefore, for any fixed y=y0, the posterior is uniform over the intersection of the line y=y0 with the parallelogram.

fX|Y(x|y) is uniform over:
[4, y+1] for 3 ≤ y ≤ 5
[y−1, y+1] for 5 ≤ y ≤ 9
[y−1, 10] for 9 ≤ y ≤ 11

[Figure: the support parallelogram in the (y, x) plane, as on the previous slide.]

Ex. 8.11: the LMS estimate

Since fX|Y(x|y) is uniform over the intervals above, E[X|Y=y] is the midpoint of the corresponding interval:

E[X|Y=y] = (y+5)/2 for 3 ≤ y ≤ 5
           y       for 5 ≤ y ≤ 9
           (y+9)/2 for 9 ≤ y ≤ 11

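The piecewise estimate above transcribes directly into code. This is a small sketch; the function simply encodes the three cases.

```python
import numpy as np

def lms_estimate(y):
    """E[X|Y=y] for Ex. 8.11; valid for 3 <= y <= 11."""
    y = np.asarray(y, dtype=float)
    return np.where(y <= 5, (y + 5) / 2,
           np.where(y <= 9, y,
                            (y + 9) / 2))

print(lms_estimate([3.5, 7.0, 10.2]))   # [4.25  7.    9.6 ]
```
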
Ex. 8.11: the conditional MSE

The conditional MSE is E[(X − E[X|Y=y])² | Y=y] = var(X|Y=y). Since fX|Y(x|y) is uniform over the intervals above, and the variance of a uniform distribution on an interval of length L is L²/12:

var(X|Y=y) = (y−3)²/12 for 3 ≤ y ≤ 5
             1/3       for 5 ≤ y ≤ 9
             (11−y)²/12 for 9 ≤ y ≤ 11

[Figure: the conditional MSE as a function of y rises from 0 at y=3 to 1/3 at y=5, stays at 1/3 for 5 ≤ y ≤ 9, and falls back to 0 at y=11.]

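A Monte Carlo check of both the estimate and the conditional MSE (a sketch: conditioning on Y=y is approximated by keeping samples with Y in a narrow window around y, which introduces a small discretization error).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000_000
X = rng.uniform(4, 10, size=n)
W = rng.uniform(-1, 1, size=n)
Y = X + W

def check(y0, half_width=0.01):
    sel = np.abs(Y - y0) < half_width   # crude stand-in for conditioning on Y = y0
    x_sel = X[sel]
    return x_sel.mean(), x_sel.var()

for y0 in (4.0, 7.0, 10.0):
    mean_hat, var_hat = check(y0)
    print(y0, mean_hat, var_hat)
# Expected: y0=4.0  -> E[X|Y=y] = 4.5, var = 1/12 ≈ 0.083
#           y0=7.0  -> E[X|Y=y] = 7.0, var = 1/3  ≈ 0.333
#           y0=10.0 -> E[X|Y=y] = 9.5, var = 1/12 ≈ 0.083
```
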
Estimation error for LMS

We denote the LMS estimator of X based on Y by X̂_LMS:
X̂_LMS = E[X | Y]

The associated estimation error is denoted by X̃_LMS and is defined as:
X̃_LMS = X̂_LMS − X = E[X | Y] − X

Properties of the estimation error for LMS

The estimation error has zero mean:
E[X̃_LMS] = E[E[X | Y] − X] = E[E[X | Y]] − E[X] = 0

The estimation error is uncorrelated with X̂_LMS:
E[X̂_LMS X̃_LMS] = 0
(By iterated expectations, E[X̂_LMS X̃_LMS] = E[ X̂_LMS E[X̃_LMS | Y] ], and E[X̃_LMS | Y] = E[X | Y] − E[X | Y] = 0.)

Therefore,
var(X) = var(X̂_LMS − (X̂_LMS − X)) = var(X̂_LMS − X̃_LMS) = var(X̂_LMS) + var(−X̃_LMS) = var(X̂_LMS) + var(X̃_LMS)

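A Monte Carlo check of these three properties under an assumed model where E[X|Y] is available in closed form (again X ~ N(0,1), Y = X + N(0,1), so E[X|Y] = Y/2; this model is an illustration, not from the slides).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

X = rng.normal(size=n)
Y = X + rng.normal(size=n)

X_hat = Y / 2            # LMS estimator E[X|Y] for this particular model
X_tilde = X_hat - X      # estimation error

print("E[error]                 ~", X_tilde.mean())                    # ~0
print("E[estimator * error]     ~", np.mean(X_hat * X_tilde))          # ~0 (uncorrelated)
print("var(X)                   ~", X.var())                           # ~1
print("var(X_hat) + var(X_tilde) ~", X_hat.var() + X_tilde.var())      # ~1, matches var(X)
```
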