15. Minimum Variance Unbiased Estimation
ECE 830, Spring 2014
Bias-Variance Trade-Off
Recall that
$$\mathrm{MSE}(\hat{\theta}) = \mathrm{Bias}^2(\hat{\theta}) + \mathrm{Var}(\hat{\theta}).$$
In general, the minimum MSE estimator has non-zero bias and
non-zero variance.
We can reduce bias only at a potential increase in variance.
Conversely, modifying the estimator to reduce the variance may
lead to an increase in bias.
Example:
Let
$$x_n = A + w_n, \qquad w_n \sim \mathcal{N}(0, \sigma^2),$$
and consider the estimator
$$\tilde{A} = \frac{\alpha}{N} \sum_{n=1}^{N} x_n,$$
where $\alpha$ is an arbitrary constant. If
$$S_N \equiv \frac{1}{N} \sum_{n=1}^{N} x_n,$$
then
$$\tilde{A} = \alpha S_N \sim \mathcal{N}\!\left(\alpha A, \; \frac{\alpha^2 \sigma^2}{N}\right).$$
Example: (cont.)
Let’s find the value of α that minimizes the MSE.
$$\mathrm{Var}(\tilde{A}) = \mathrm{Var}(\alpha S_N) = \alpha^2 \, \mathrm{Var}(S_N) = \frac{\alpha^2 \sigma^2}{N}$$
$$\mathrm{Bias}(\tilde{A}) = E[\tilde{A}] - A = \alpha E[S_N] - A = \alpha A - A = (\alpha - 1) A$$
Thus the MSE is
$$\mathrm{MSE}(\tilde{A}) = \frac{\alpha^2 \sigma^2}{N} + (\alpha - 1)^2 A^2.$$
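As a quick sanity check, the closed-form MSE above can be compared against a Monte Carlo estimate. The snippet below is a minimal sketch; the values of $A$, $\sigma$, $N$, and $\alpha$ are arbitrary illustrative choices, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (hypothetical) parameter choices
A, sigma, N, alpha = 2.0, 1.0, 10, 0.8
trials = 200_000

# Simulate x_n = A + w_n and form A_tilde = (alpha/N) * sum_n x_n
x = A + sigma * rng.standard_normal((trials, N))
A_tilde = alpha * x.mean(axis=1)

mse_mc = np.mean((A_tilde - A) ** 2)
mse_formula = alpha**2 * sigma**2 / N + (alpha - 1) ** 2 * A**2

print(f"Monte Carlo MSE: {mse_mc:.4f}")   # both should be ~ 0.224 for these values
print(f"Formula MSE:     {mse_formula:.4f}")
```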
Aside: alternatively, we could have computed the MSE as follows
$$E[x_i x_j] = \begin{cases} A^2 + \sigma^2, & i = j \\ A^2, & i \neq j \end{cases}$$
$$
\begin{aligned}
\mathrm{MSE}(\tilde{A}) &= E\left[\left(\tilde{A} - A\right)^2\right] \\
&= E[\tilde{A}^2] - 2 E[\tilde{A}] A + A^2 \\
&= \alpha^2 E\left[\frac{1}{N^2} \sum_{i,j=1}^{N} x_i x_j\right] - 2\alpha E\left[\frac{1}{N} \sum_{n=1}^{N} x_n\right] A + A^2 \\
&= \frac{\alpha^2}{N^2} \sum_{i,j=1}^{N} E[x_i x_j] - \frac{2\alpha}{N} \sum_{n=1}^{N} E[x_n] A + A^2 \\
&= \alpha^2 \left(A^2 + \frac{\sigma^2}{N}\right) - 2\alpha A^2 + A^2 \\
&= \underbrace{\frac{\alpha^2 \sigma^2}{N}}_{\mathrm{Var}(\tilde{A})} + \underbrace{(\alpha - 1)^2 A^2}_{\mathrm{Bias}^2(\tilde{A})}
\end{aligned}
$$
So how practical is the MSE as a design criterion?
In the previous example, the MSE is minimized when
$$\frac{d \, \mathrm{MSE}(\tilde{A})}{d\alpha} = \frac{2\alpha \sigma^2}{N} + 2(\alpha - 1) A^2 = 0
\;\;\Longrightarrow\;\; \alpha^* = \frac{A^2}{A^2 + \sigma^2 / N}.$$
The optimal (in an MSE sense) value α∗ depends on the unknown
parameter A! Therefore, the estimator is not realizable. This
phenomenon occurs for many classes of problems.
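To make the dependence on $A$ concrete, here is a small sketch evaluating $\alpha^*$ for a few values of $A$; the specific values of $A$, $\sigma$, and $N$ are arbitrary and chosen only for illustration.

```python
import numpy as np

sigma, N = 1.0, 10                    # illustrative choices
for A in (0.5, 1.0, 2.0, 5.0):
    alpha_star = A**2 / (A**2 + sigma**2 / N)
    print(f"A = {A:>4}: alpha* = {alpha_star:.3f}")
# alpha* changes with A, so the MSE-optimal estimator
# cannot be implemented without knowing A.
```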
We need an alternative to direct MSE minimization.
Note that in the above example, the problematic dependence on
the parameter (A) enters through the Bias component of the MSE.
This occurs in many situations. Thus a reasonable alternative is to
constrain the estimator to be unbiased, and then find the estimator
that produces the minimum variance (and hence provides the
minimum MSE among all unbiased estimators).
Note: Sometimes no unbiased estimator exists and we cannot
proceed at all in this direction.
Definition: Minimum Variance Unbiased Estimator
$\hat{\theta}$ is a minimum variance unbiased estimator (MVUE) for $\theta$ if
1. $E[\hat{\theta}] = \theta$ for all $\theta \in \Theta$, and
2. if $E[\hat{\theta}'] = \theta$ for all $\theta \in \Theta$, then $\mathrm{Var}(\hat{\theta}) \le \mathrm{Var}(\hat{\theta}')$ for all $\theta \in \Theta$.
Existence of the Minimum Variance Unbiased
Estimator (MVUE)
Does an MVUE exist? Suppose there exist three unbiased estimators
$$\hat{\theta}_1, \; \hat{\theta}_2, \; \hat{\theta}_3.$$
Two possibilities exist: either one of them (say $\hat{\theta}_3$) has variance no larger
than that of every other unbiased estimator at every value of $\theta$, in which case
$\hat{\theta}_3$ is the MVUE, or the variance curves cross, so that no single estimator
has the smallest variance for all $\theta$ and no MVUE exists.
Example:
Suppose we observe a single scalar realization x of
X ∼ Unif (0, 1/θ) , θ > 0.
An unbiased estimator of θ does not exist. To see this, note that
$$p(x \mid \theta) = \theta \cdot I_{[0, 1/\theta]}(x).$$
If $\hat{\theta}$ is unbiased, then for all $\theta > 0$,
$$\theta = E[\hat{\theta}] = \int_0^{1/\theta} \hat{\theta}(x) \, \theta \, dx$$
$$\Longrightarrow \int_0^{1/\theta} \hat{\theta}(x) \, dx = 1 \quad \forall \theta > 0$$
$$\Longrightarrow \frac{d}{d\theta} \int_0^{1/\theta} \hat{\theta}(x) \, dx = -\frac{1}{\theta^2}\, \hat{\theta}(1/\theta) = 0 \quad \forall \theta > 0.$$
But if this is true for all $\theta$, then we have $\hat{\theta}(x) = 0$ for all $x > 0$, which is not an
unbiased estimator.
Finding the MVUE
There is no simple, general procedure for finding the MVUE. In the
next several lectures we will discuss several
approaches:
1. Find a sufficient statistic and apply the Rao-Blackwell theorem
2. Determine the so-called Cramer-Rao Lower Bound (CRLB)
and verify that the estimator achieves it.
3. Further restrict the estimator to a class of estimators (e.g.,
linear or polynomial functions of the data)
Recipe for finding an MVUE
(1) Find a complete sufficient statistic t = T (X).
(2) Find any unbiased estimator $\hat{\theta}_0$ and set
$$\hat{\theta}(X) := E[\hat{\theta}_0(X) \mid t = T(X)],$$
or find a function $g$ such that
$$\hat{\theta}(X) = g(T(X))$$
is unbiased.
These notes answer the following questions:
1. What is a sufficient statistic?
2. What is a complete sufficient statistic?
3. What does step (2) do above?
4. Is this estimator unique?
5. How do we know it’s the MVUE?
Definition: Sufficient statistic
Let X be an N-dimensional random vector and let θ denote a
p-dimensional parameter of the distribution of X. The statistic
t := T (X) is a sufficient statistic for θ if and only if the conditional
distribution of X given T (X) is independent of θ.
See lecture 4 for more information on Sufficient Statistics and how
to find them.
Minimal and Complete Sufficient Statistics
Definition: Minimal Sufficient Statistic
A sufficient statistic t is said to be minimal if the dimension of t
cannot be reduced without losing sufficiency.
Definition: Complete sufficient statistic
A sufficient statistic $t := T(X)$ is complete if for all real-valued
functions $\phi$ which satisfy
$$E[\phi(t) \mid \theta] = 0 \quad \forall \theta,$$
we have
$$P[\phi(t) = 0 \mid \theta] = 1 \quad \forall \theta.$$
Under very general conditions, if t is a complete sufficient statistic,
then t is minimal.
Example: Bernoulli trials
Consider N independent Bernoulli trials
$$x_i \overset{\mathrm{iid}}{\sim} \mathrm{Bernoulli}(\theta), \qquad \theta \in [0, 1].$$
Recall that $k = \sum_{n=1}^{N} x_n$ is sufficient for $\theta$. Now suppose $E[\phi(k) \mid \theta] = 0$
for all $\theta$. But
$$E[\phi(k) \mid \theta] = \sum_{k=0}^{N} \phi(k) \binom{N}{k} \theta^k (1 - \theta)^{N-k} = \mathrm{poly}(\theta),$$
where $\mathrm{poly}(\theta)$ is a polynomial in $\theta$ of degree at most $N$. Then
$$\mathrm{poly}(\theta) = 0 \;\; \forall \theta \in [0, 1]$$
$$\Longrightarrow \mathrm{poly}(\theta) \text{ is the zero polynomial}$$
$$\Longrightarrow \phi(k) = 0 \text{ for } k = 0, 1, \ldots, N$$
$$\Longrightarrow k \text{ is a complete sufficient statistic.}$$
Rao-Blackwell Theorem
Let $Y$, $Z$ be random variables and define the function
$$g(z) := E[Y \mid Z = z].$$
Then
$$E[g(Z)] = E[Y]$$
and
$$\mathrm{Var}(g(Z)) \le \mathrm{Var}(Y)$$
with equality iff $Y = g(Z)$ almost surely.
Note that this version of Rao-Blackwell is quite general and has
nothing to do with estimation of parameters. However, we can
apply it to parameter estimation as follows.
Consider $X \sim p(x \mid \theta)$. Let $\hat{\theta}_1$ be an unbiased estimator of $\theta$ and let
$t = T(x)$ be a sufficient statistic for $\theta$. Apply Rao-Blackwell with
$$Y := \hat{\theta}_1(x), \qquad Z := t = T(x).$$
Consider the new estimator
$$\hat{\theta}_2(x) = g(T(x)) = E[\hat{\theta}_1(X) \mid T(X) = t].$$
Then we may conclude:
1. $\hat{\theta}_2$ is unbiased, and
2. $\mathrm{Var}(\hat{\theta}_2) \le \mathrm{Var}(\hat{\theta}_1)$.
In words, if $\hat{\theta}_1$ is any unbiased estimator, then smoothing $\hat{\theta}_1$ with
respect to a sufficient statistic decreases the variance while
preserving unbiasedness.
Therefore, we can restrict our search for the MVUE to functions of
a sufficient statistic.
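As a numerical illustration of this smoothing step, the sketch below (a hypothetical example, not from the notes) starts from the unbiased but wasteful estimator $\hat{\theta}_1 = x_1$ and conditions on the sufficient statistic $t = \sum_n x_n$; for i.i.d. samples, symmetry gives $E[x_1 \mid t] = t/N$, i.e., the sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: estimate the mean theta of i.i.d. N(theta, 1) samples.
theta, N, trials = 3.0, 20, 100_000
x = theta + rng.standard_normal((trials, N))

theta1 = x[:, 0]          # unbiased estimator using only x_1
t = x.sum(axis=1)         # sufficient statistic for theta
theta2 = t / N            # E[x_1 | t] = t/N by symmetry (Rao-Blackwellized)

print("mean of theta1:", theta1.mean())   # both are ~ theta (unbiased)
print("mean of theta2:", theta2.mean())
print("var  of theta1:", theta1.var())    # ~ 1
print("var  of theta2:", theta2.var())    # ~ 1/N, much smaller
```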
The Rao-Blackwell Theorem
Rao-Blackwell Theorem, special case
Let $X$ be a random variable with pdf $p(X \mid \theta)$ and let $t(X)$ be a
sufficient statistic. Let $\hat{\theta}_1(X)$ be an estimator of $\theta$ and define
$$\hat{\theta}_2(t) := E\left[\hat{\theta}_1(X) \mid t(X)\right].$$
Then
$$E[\hat{\theta}_2(t(X))] = E[\hat{\theta}_1(X)]$$
and
$$\mathrm{Var}(\hat{\theta}_2(t(X))) \le \mathrm{Var}(\hat{\theta}_1(X))$$
with equality iff $\hat{\theta}_1(X) \equiv \hat{\theta}_2(t(X))$ with probability one (almost
surely).
Rao-Blackwell Theorem in Action
Suppose we observe 2 independent realizations from a N (µ, σ 2 )
distribution. Denote these observations x1 and x2 , with
X = [x1 , x2 ]T . Consider the simple estimator of µ:
$$\hat{\mu} = x_1.$$
$$E[\hat{\mu}] = \mu, \qquad \mathrm{Var}(\hat{\mu}) = \sigma^2.$$
The MSE is therefore $\mathrm{MSE}(\hat{\mu}) = \sigma^2$, since the estimator is unbiased.
Intuitively, we expect that the sample mean should be a better
estimator since
$$\tilde{\mu} = \frac{1}{2}(x_1 + x_2)$$
averages the two observations together.
Is this the best possible estimator?
Let’s find a sufficient statistic for µ:
$$
\begin{aligned}
p(x_1, x_2) &= \frac{1}{2\pi\sigma^2} e^{-(x_1 - \mu)^2 / 2\sigma^2} e^{-(x_2 - \mu)^2 / 2\sigma^2} \\
&= \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x_1^2 + x_2^2}{2\sigma^2}\right) \exp\left(\frac{\mu (x_1 + x_2)}{\sigma^2} - \frac{\mu^2}{\sigma^2}\right) \\
&= \underbrace{\frac{1}{2\pi\sigma^2} \exp\left(-\frac{x_1^2 + x_2^2}{2\sigma^2}\right)}_{a(x_1, x_2)} \cdot \underbrace{\exp\left(\frac{\mu t - \mu^2}{\sigma^2}\right)}_{b_\mu(t)}, \qquad t := x_1 + x_2,
\end{aligned}
$$
so by the Fisher-Neyman factorization theorem $t = x_1 + x_2$ is a sufficient statistic for $\mu$.
The Rao-Blackwell Theorem states that:
$$\mu^* = E[\hat{\mu} \mid t]$$
is as good as or better than $\hat{\mu}$ in terms of estimator variance. (See
Scharf p. 94.) What is $\mu^*$? First we need to compute the mean of
the conditional density $p(\hat{\mu} \mid t)$, i.e., $p(x_1 \mid t)$:
$$p(x_1 \mid t) = \frac{p(x_1, t)}{p(t)}$$
$$p(x_1, t) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{(x_1 - \mu)^2 + (t - x_1 - \mu)^2}{2\sigma^2}\right)$$
$$p(t) = \frac{1}{\sqrt{4\pi\sigma^2}} \exp\left(-\frac{(t - 2\mu)^2}{4\sigma^2}\right)$$
$$E[t] = 2\mu, \qquad \mathrm{Var}(t) = 2\sigma^2.$$
$$
\begin{aligned}
p(x_1 \mid t) &= \frac{\tfrac{1}{2\pi\sigma^2} \exp\left(-\frac{(x_1 - \mu)^2 + (t - x_1 - \mu)^2}{2\sigma^2}\right)}{\tfrac{1}{\sqrt{4\pi\sigma^2}} \exp\left(-\frac{(t - 2\mu)^2}{4\sigma^2}\right)} \\
&= \frac{1}{\sqrt{\pi\sigma^2}} \exp\left(\frac{-1}{2\sigma^2}\left[(x_1 - \mu)^2 + (t - x_1 - \mu)^2 - (t - 2\mu)^2/2\right]\right) \\
&= \frac{1}{\sqrt{\pi\sigma^2}} \exp\left(\frac{-1}{2\sigma^2}\left[x_1^2 - 2\mu x_1 + \mu^2 + t^2 - 2 x_1 t + x_1^2 - 2\mu t + 2\mu x_1 + \mu^2 - t^2/2 + 4\mu t/2 - 4\mu^2/2\right]\right) \\
&= \frac{1}{\sqrt{\pi\sigma^2}} \exp\left(\frac{-1}{2\sigma^2}\left[2 x_1^2 - 2 x_1 t + t^2/2\right]\right) \\
&= \frac{1}{\sqrt{\pi\sigma^2}} \exp\left(\frac{-(x_1 - t/2)^2}{\sigma^2}\right)
\end{aligned}
$$
$$\Longrightarrow \; x_1 \mid t \sim \mathcal{N}\!\left(\frac{t}{2}, \; \frac{\sigma^2}{2}\right)$$
$$\mu^* = E[\hat{\mu} \mid t] = \frac{t}{2} = \frac{x_1 + x_2}{2}$$
$$\mathrm{Var}(\mu^*) = \frac{\sigma^2}{2} \;\Longrightarrow\; \mathrm{MSE}(\mu^*) = \frac{\sigma^2}{2}.$$
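A short simulation can illustrate the conclusion above. This is a minimal sketch with arbitrary illustrative values of $\mu$ and $\sigma$ (not from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical values for illustration
mu, sigma, trials = 1.5, 2.0, 200_000
x1 = mu + sigma * rng.standard_normal(trials)
x2 = mu + sigma * rng.standard_normal(trials)

mu_hat = x1            # crude unbiased estimator
t = x1 + x2            # sufficient statistic
mu_star = t / 2        # E[mu_hat | t] = t/2, the Rao-Blackwellized estimator

print("MSE of mu_hat: ", np.mean((mu_hat - mu) ** 2))   # ~ sigma^2 = 4
print("MSE of mu_star:", np.mean((mu_star - mu) ** 2))  # ~ sigma^2 / 2 = 2
```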
The Lehmann-Scheffé Theorem
The Rao-Blackwell Theorem tells us how to decrease the variance
of an unbiased estimator. But when do we know that we have
obtained an MVUE?
Answer: When t is a complete sufficient statistic.
Lehmann-Scheffé Theorem
If t is complete, there is at most one unbiased estimator that is a
function of t.
Proof
Suppose
$$E[\hat{\theta}_1] = E[\hat{\theta}_2] = \theta$$
$$\hat{\theta}_1(X) := g_1(T(X)), \qquad \hat{\theta}_2(X) := g_2(T(X)).$$
Define
$$\phi(t) := g_1(t) - g_2(t).$$
Then
$$E[\phi(t)] = E[\hat{\theta}_1] - E[\hat{\theta}_2] = \theta - \theta = 0 \quad \forall \theta.$$
By definition of completeness, we have $\phi(t) = 0$ with probability 1 for all $\theta$.
In other words,
$$\hat{\theta}_1 = \hat{\theta}_2 \text{ with probability 1.}$$
Recipe for finding an MVUE
This result suggests the following method for finding a MVUE:
(1) Find a complete sufficient statistic t = T (X).
(2) Find any unbiased estimator $\hat{\theta}_0$ and set
$$\hat{\theta}(X) := E[\hat{\theta}_0(X) \mid t = T(X)],$$
or find a function $g$ such that
$$\hat{\theta}(X) = g(T(X))$$
is unbiased.
Rao-Blackwell and Complete Suff. Stats.
Theorem
If $\hat{\theta}$ is constructed by the recipe above, then $\hat{\theta}$ is the unique MVUE.
Proof: Note that in either construction, $\hat{\theta}$ is a function of $t$. Let
$\hat{\theta}_1$ be any unbiased estimator. We must show that
$$\mathrm{Var}(\hat{\theta}) \le \mathrm{Var}(\hat{\theta}_1).$$
Define
$$\hat{\theta}_2(X) := E[\hat{\theta}_1(X) \mid t = T(X)].$$
By Rao-Blackwell, it suffices to show
$$\mathrm{Var}(\hat{\theta}) \le \mathrm{Var}(\hat{\theta}_2).$$
Proof (cont.)
But $\hat{\theta}$ and $\hat{\theta}_2$ are both unbiased and functions of a complete
sufficient statistic, so by the Lehmann-Scheffé theorem $\hat{\theta} = \hat{\theta}_2$ with
probability 1, and hence $\mathrm{Var}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}_2) \le \mathrm{Var}(\hat{\theta}_1)$.
To show uniqueness, suppose in the above argument that
$\mathrm{Var}(\hat{\theta}_1) = \mathrm{Var}(\hat{\theta})$. Then the Rao-Blackwell bound holds with
equality, so $\hat{\theta}_1 = \hat{\theta}_2 = \hat{\theta}$ with probability 1.
Example: Uniform distribution.
Suppose X = [x1 · · · xN ]T where
$$x_i \overset{\mathrm{iid}}{\sim} \mathrm{Unif}[0, \theta], \qquad i = 1, \ldots, N.$$
What is an unbiased estimator of θ?
$$\hat{\theta}_1 = \frac{2}{N} \sum_{i=1}^{N} x_i$$
is unbiased. However, it is not the MVUE.
Example: (cont.)
From the Fisher-Neyman factorization theorem,
$$p(X \mid \theta) = \prod_{i=1}^{N} \frac{1}{\theta} I_{[0,\theta]}(x_i)
= \underbrace{\frac{1}{\theta^N} \, I_{[\max_i x_i, \, \infty)}(\theta)}_{b_\theta(t)} \cdot \underbrace{I_{(-\infty, \, \min_i x_i]}(0)}_{a(X)},$$
we see that
$$T = \max_i x_i$$
is a sufficient statistic. It is left as an exercise to show that $T$ is in
fact complete. Since $\hat{\theta}_1$ is not a function of $T$, it is not the MVUE.
However,
$$\hat{\theta}_2(X) = E[\hat{\theta}_1(X) \mid t = T(X)]$$
is the MVUE.
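To see the variance gain numerically, the following sketch compares $\hat{\theta}_1 = \frac{2}{N}\sum_i x_i$ with the Rao-Blackwellized estimator; for this model the conditional expectation works out to $\hat{\theta}_2 = \frac{N+1}{N}\max_i x_i$ (a standard result, stated here without derivation), and the simulation parameters are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative choices of the true parameter and sample size
theta, N, trials = 4.0, 10, 200_000
x = theta * rng.random((trials, N))      # x_i ~ Unif[0, theta]

theta1 = 2.0 * x.mean(axis=1)            # unbiased, but not a function of the max
theta2 = (N + 1) / N * x.max(axis=1)     # E[theta1 | max_i x_i], the MVUE

print("mean theta1:", theta1.mean(), " var theta1:", theta1.var())  # var ~ theta^2/(3N)
print("mean theta2:", theta2.mean(), " var theta2:", theta2.var())  # var ~ theta^2/(N(N+2))
```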