Exercise 7.12 / page 356
Note that Xi are iid from Bernoulli(θ) where 0 ≤ θ ≤ 0.5.
a) Method of moments:
Since there is only one parameter to be estimated, we need only one equation, in which we equate the first sample moment with the first population moment, $m_1 = \mu_1'$, and solve in terms of θ. Here $m_1 = \bar{X}$ and $\mu_1' = E(X)$, where
$$E(X) = \sum_{x=0}^{1} x\,P(X = x) = P(X = 1) = \theta\,.$$
Hence, the method of moments estimator of θ is
$$\hat{\theta}_{MM} = \bar{X}\,.$$
Note that if more than half of the n observations are equal to 1, then X̄ > 0.5 and θ̂_MM takes a value outside the range of the parameter θ. Recall that 0 ≤ θ ≤ 0.5.
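As a quick numerical illustration (a minimal Python/NumPy sketch, not part of the textbook solution; the sample size and replication count are arbitrary choices), we can simulate samples at the boundary value θ = 0.5 and see how often θ̂_MM = X̄ leaves [0, 0.5]:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.5, 20, 10_000  # illustrative values

# Draw `reps` samples of size n from Bernoulli(theta); X-bar is the row mean.
xbar = rng.binomial(1, theta, size=(reps, n)).mean(axis=1)

# Fraction of samples where the MoM estimate leaves the parameter space [0, 0.5].
print(f"fraction with theta_hat_MM > 0.5: {np.mean(xbar > 0.5):.3f}")
```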
MLE of θ:
We first have to determine the likelihood function:
$$L(\theta|\mathbf{x}) = \prod_{i=1}^{n} P(X_i = x_i|\theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{\sum x_i}(1-\theta)^{n-\sum x_i}$$
and for θ ∈ (0, 0.5] the log-likelihood is of the form:
$$\log L(\theta|\mathbf{x}) = \sum x_i \log\theta + \left(n - \sum x_i\right)\log(1-\theta)\,.$$
(Note that θ ∈ [0, 0.5] but the log-likelihood is not defined for θ = 0. So, we may need to study the function L(θ|x) at this specific point separately.)
The first derivative of log L(θ|x) with respect to θ is
$$\frac{d}{d\theta}\log L(\theta|\mathbf{x}) = \frac{\sum x_i}{\theta} - \frac{n-\sum x_i}{1-\theta} = \frac{(1-\theta)\sum x_i - \theta\left(n-\sum x_i\right)}{\theta(1-\theta)} = \frac{n}{\theta(1-\theta)}\,(\bar{x}-\theta)\,.$$
Setting $\frac{d}{d\theta}\log L(\theta|\mathbf{x}) = 0$ we get $\theta = \bar{x}$.
The second derivative of log L(θ|x) with respect to θ is
$$\frac{d^2}{d\theta^2}\log L(\theta|\mathbf{x}) = \frac{d}{d\theta}\left(\frac{\sum x_i}{\theta} - \frac{n-\sum x_i}{1-\theta}\right) = -\frac{\sum x_i}{\theta^2} - \frac{n-\sum x_i}{(1-\theta)^2} < 0\,.$$
Therefore, θ = x̄ is a global maximum point (and we do not need to study L(θ|x) at θ = 0).
Note that θ ∈ [0, 0.5], i.e. we need to find the global maximum of log L(θ|x) within this range of θ values. Therefore, we conclude that:
$$\hat{\theta}_{MLE} = \begin{cases} \bar{X} & \text{if } \bar{X} \le 0.5 \\ 0.5 & \text{if } \bar{X} > 0.5 \end{cases} \qquad\text{or}\qquad \hat{\theta}_{MLE} = \min\{\bar{X},\, 0.5\}\,.$$
(Observe that for θ < x̄, $\frac{d}{d\theta}\log L(\theta|\mathbf{x}) > 0$, i.e. log L(θ|x) and subsequently L(θ|x) are increasing, and for θ > x̄, $\frac{d}{d\theta}\log L(\theta|\mathbf{x}) < 0$, i.e. log L(θ|x) and subsequently L(θ|x) are decreasing.)
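As a numerical sanity check (a sketch assuming NumPy; the grid resolution and the simulated sample are arbitrary), maximizing the log-likelihood over a fine grid in (0, 0.5] should reproduce min{x̄, 0.5}:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25
x = rng.binomial(1, 0.45, size=n)  # a sample whose mean can exceed 0.5
t = x.sum()

# Evaluate the Bernoulli log-likelihood on a fine grid inside (0, 0.5].
grid = np.linspace(1e-6, 0.5, 100_000)
loglik = t * np.log(grid) + (n - t) * np.log(1 - grid)

print("grid maximizer:", grid[np.argmax(loglik)])
print("min(x-bar, 0.5):", min(x.mean(), 0.5))  # should agree up to grid spacing
```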
b)
$$MSE(\hat{\theta}_{MM}) = \left[\operatorname{Bias}(\hat{\theta}_{MM})\right]^2 + \operatorname{Var}(\hat{\theta}_{MM})\,,$$
where Bias(θ̂_MM) = E(θ̂_MM) − θ and θ̂_MM = X̄. Moreover,
$$E(\hat{\theta}_{MM}) = E(\bar{X}) = E(X) = \theta \;\Longrightarrow\; \operatorname{Bias}(\hat{\theta}_{MM}) = 0\,,$$
and
$$\operatorname{Var}(\bar{X}) = \frac{1}{n}\operatorname{Var}(X) = \frac{1}{n}\,\theta(1-\theta)$$
since the $X_i$ are iid from Bernoulli(θ). Thus,
$$MSE(\hat{\theta}_{MM}) = \frac{1}{n}\,\theta(1-\theta)\,.$$
For MSE(θ̂_MLE) it is preferable to work with the definition of the MSE, i.e.:
$$MSE(\hat{\theta}_{MLE}) = E\left(\hat{\theta}_{MLE} - \theta\right)^2 = E\left(\min\{\bar{X},\, 0.5\} - \theta\right)^2 = E\left(\min\left\{\frac{\sum X_i}{n},\, 0.5\right\} - \theta\right)^2\,.$$
Since $X_i \sim \operatorname{Bernoulli}(\theta)$, 0 ≤ θ ≤ 0.5, then $\sum X_i \sim \operatorname{Bin}(n, \theta)$ with 0 ≤ θ ≤ 0.5.
Let $T = \sum X_i$. Then $\min\{T/n,\, 0.5\}$ is a function of the random variable T, where $T \sim \operatorname{Bin}(n, \theta)$. Thus:
$$MSE(\hat{\theta}_{MLE}) = E\left(\min\left\{\frac{T}{n},\, 0.5\right\} - \theta\right)^2 = \sum_{t=0}^{n}\left(\min\left\{\frac{t}{n},\, 0.5\right\} - \theta\right)^2 \binom{n}{t}\theta^t(1-\theta)^{n-t}\,.$$
Note that if n is even then
$$\min\left\{\frac{t}{n},\, 0.5\right\} = \begin{cases} \frac{t}{n} & \text{for } t = 0, \ldots, n/2 \\ 0.5 & \text{for } t = (n/2)+1, \ldots, n \end{cases}$$
and if n is odd then
$$\min\left\{\frac{t}{n},\, 0.5\right\} = \begin{cases} \frac{t}{n} & \text{for } t = 0, \ldots, (n-1)/2 \\ 0.5 & \text{for } t = (n+1)/2, \ldots, n \end{cases}\,.$$
Let
$$[n/2] = \begin{cases} n/2 & \text{if } n \text{ even} \\ (n-1)/2 & \text{if } n \text{ odd} \end{cases}\,,$$
then
$$\min\left\{\frac{t}{n},\, 0.5\right\} = \begin{cases} \frac{t}{n} & \text{for } t = 0, \ldots, [n/2] \\ 0.5 & \text{for } t = [n/2]+1, \ldots, n \end{cases}\,.$$
Thus, the expectation above can be written as
$$MSE(\hat{\theta}_{MLE}) = \sum_{t=0}^{[n/2]}\left(\frac{t}{n} - \theta\right)^2 \binom{n}{t}\theta^t(1-\theta)^{n-t} + \sum_{t=[n/2]+1}^{n}(0.5-\theta)^2 \binom{n}{t}\theta^t(1-\theta)^{n-t}\,.$$
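The binomial sum above is easy to evaluate exactly. The following sketch (assuming NumPy and SciPy; `mse_mle` is a helper name introduced here, and the parameter values are illustrative) computes it and cross-checks against a Monte Carlo estimate:

```python
import numpy as np
from scipy.stats import binom

def mse_mle(theta: float, n: int) -> float:
    """Exact MSE of min(X-bar, 0.5) via the binomial sum derived above."""
    t = np.arange(n + 1)
    est = np.minimum(t / n, 0.5)
    return float(np.sum((est - theta) ** 2 * binom.pmf(t, n, theta)))

rng = np.random.default_rng(2)
theta, n = 0.3, 15
# Monte Carlo: T = sum of the X_i is Bin(n, theta), so sample T directly.
xbar = rng.binomial(n, theta, size=200_000) / n
mc = np.mean((np.minimum(xbar, 0.5) - theta) ** 2)
print(f"exact: {mse_mle(theta, n):.6f}   Monte Carlo: {mc:.6f}")
```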
c) We will base our answer to the question of which of the two estimators is preferred on the MSE criterion. To compare the MSEs of the two estimators, it helps to express MSE(θ̂_MM) in a fashion similar to MSE(θ̂_MLE). Thus,
$$MSE(\hat{\theta}_{MM}) = E\left(\bar{X} - \theta\right)^2 = E\left(\frac{T}{n} - \theta\right)^2 = \sum_{t=0}^{n}\left(\frac{t}{n} - \theta\right)^2 \binom{n}{t}\theta^t(1-\theta)^{n-t} =$$
$$= \sum_{t=0}^{[n/2]}\left(\frac{t}{n} - \theta\right)^2 \binom{n}{t}\theta^t(1-\theta)^{n-t} + \sum_{t=[n/2]+1}^{n}\left(\frac{t}{n} - \theta\right)^2 \binom{n}{t}\theta^t(1-\theta)^{n-t}\,.$$
Since $\frac{t}{n} > 0.5$ for $t = [n/2]+1, \ldots, n$, we have $(0.5-\theta)^2 < \left(\frac{t}{n}-\theta\right)^2$ for all $t = [n/2]+1, \ldots, n$, and
$$\sum_{t=[n/2]+1}^{n}(0.5-\theta)^2\binom{n}{t}\theta^t(1-\theta)^{n-t} < \sum_{t=[n/2]+1}^{n}\left(\frac{t}{n}-\theta\right)^2\binom{n}{t}\theta^t(1-\theta)^{n-t} \quad\text{for all } \theta \in (0, 0.5]\,.$$
Hence, MSE(θ̂_MLE) < MSE(θ̂_MM) for all θ ∈ (0, 0.5]. If θ = 0 then MSE(θ̂_MLE) = MSE(θ̂_MM).
Thus, θ̂_MLE is preferred based on the MSE criterion.
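The comparison can also be verified numerically. A short sketch (assuming NumPy and SciPy; `capped_mse` is a helper introduced here, and n and the θ grid are arbitrary) evaluates both binomial sums and confirms MSE(θ̂_MLE) ≤ MSE(θ̂_MM) on [0, 0.5]:

```python
import numpy as np
from scipy.stats import binom

def capped_mse(theta: float, n: int, cap: float) -> float:
    """MSE of the estimator min(X-bar, cap); cap = 1.0 reduces to plain X-bar."""
    t = np.arange(n + 1)
    est = np.minimum(t / n, cap)
    return float(np.sum((est - theta) ** 2 * binom.pmf(t, n, theta)))

n = 10
for theta in [0.05, 0.20, 0.35, 0.50]:
    mm = capped_mse(theta, n, cap=1.0)   # theta_hat_MM = X-bar
    mle = capped_mse(theta, n, cap=0.5)  # theta_hat_MLE = min(X-bar, 0.5)
    print(f"theta={theta:.2f}  MSE_MM={mm:.5f}  MSE_MLE={mle:.5f}  MLE <= MM: {mle <= mm}")
```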
One could also reason as follows:
Since θ̂_MM = X̄ and θ̂_MLE = min{X̄, 0.5}, we have θ̂_MLE ≤ θ̂_MM, and the two estimators differ only on the event {X̄ > 0.5}, where θ ≤ 0.5 = θ̂_MLE < θ̂_MM. Therefore $\left(\hat{\theta}_{MLE} - \theta\right)^2 \le \left(\hat{\theta}_{MM} - \theta\right)^2$ and $E\left(\hat{\theta}_{MLE} - \theta\right)^2 \le E\left(\hat{\theta}_{MM} - \theta\right)^2$ (see Theorem 2.2.5, page 57), i.e. MSE(θ̂_MLE) ≤ MSE(θ̂_MM). Thus, we would again prefer θ̂_MLE to be on the safe side, but with this argument it would not be clear in which cases equality holds, i.e. where both estimators perform equally well based on the MSE criterion.
Exercise 7.38 / page 356
To answer the question of the exercise we use Corollary 7.3.15.
a) In order to use the corollary we first have to check whether f(x|θ) satisfies the condition of the Cramér–Rao inequality, i.e. whether the interchange of integration and differentiation is allowed.
Notice that $f(x|\theta) = \theta x^{\theta-1}$ is an exponential family distribution (it can be written in the form $f(x|\theta) = h(x)\,c(\theta)\exp\left(\sum_k w_k(\theta)t_k(x)\right)$). Thus, the requirement of the Cramér–Rao inequality holds. So, now we can apply the corollary.
The likelihood function is of the form:
$$L(\theta|\mathbf{x}) = f(\mathbf{x}|\theta) = \prod_{i=1}^{n} f(x_i|\theta) = \prod_{i=1}^{n} \theta x_i^{\theta-1} = \theta^n \left(\prod_{i=1}^{n} x_i\right)^{\theta-1}$$
and
$$\log L(\theta|\mathbf{x}) = n\log\theta + (\theta-1)\sum_{i=1}^{n}\log x_i\,.$$
Then
$$\frac{\partial}{\partial\theta}\log L(\theta|\mathbf{x}) = n\,\frac{1}{\theta} + \sum_{i=1}^{n}\log x_i = -n\left(-\frac{\sum_{i=1}^{n}\log x_i}{n} - \frac{1}{\theta}\right)\,,$$
i.e. $\frac{\partial}{\partial\theta}\log L(\theta|\mathbf{x})$ can be factorized in the form $\frac{\partial}{\partial\theta}\log L(\theta|\mathbf{x}) = a(\theta)\left[W(\mathbf{X}) - \tau(\theta)\right]$. Hence, according to Corollary 7.3.15, $-\frac{\sum_{i=1}^{n}\log X_i}{n}$ is a best unbiased estimator of $\frac{1}{\theta}$.
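A Monte Carlo check of part a) (a sketch assuming NumPy; θ, n, and the replication count are arbitrary). Since $f(x|\theta) = \theta x^{\theta-1}$ on (0, 1) has CDF $x^\theta$, we can sample $X = U^{1/\theta}$ by inverse-CDF sampling and verify that $W = -\frac{1}{n}\sum \log X_i$ is unbiased for $1/\theta$, with variance matching the Cramér–Rao bound $1/(n\theta^2)$:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 2.5, 50, 50_000  # illustrative values

# f(x|theta) = theta * x^(theta - 1) on (0, 1) has CDF F(x) = x^theta,
# so X = U^(1/theta) with U ~ Uniform(0, 1) is an inverse-CDF sample.
x = rng.uniform(size=(reps, n)) ** (1 / theta)

w = -np.log(x).mean(axis=1)  # W = -(1/n) sum log X_i, one value per replication
print(f"E(W) ~ {w.mean():.4f}    target 1/theta = {1/theta:.4f}")
print(f"Var(W) ~ {w.var():.6f}  CR bound 1/(n theta^2) = {1/(n*theta**2):.6f}")
```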
b) $f(x|\theta) = \frac{\log\theta}{\theta-1}\,\theta^x$ is an exponential family distribution and therefore the requirement of the Cramér–Rao inequality holds. Thus, we can apply Corollary 7.3.15.
The likelihood function is of the form:
$$L(\theta|\mathbf{x}) = f(\mathbf{x}|\theta) = \prod_{i=1}^{n} f(x_i|\theta) = \prod_{i=1}^{n} \frac{\log\theta}{\theta-1}\,\theta^{x_i} = \left(\frac{\log\theta}{\theta-1}\right)^n \theta^{\sum x_i}$$
and
$$\log L(\theta|\mathbf{x}) = n\log(\log\theta) - n\log(\theta-1) + \left(\sum_{i=1}^{n} x_i\right)\log\theta\,.$$
Then
$$\frac{\partial}{\partial\theta}\log L(\theta|\mathbf{x}) = n\,\frac{1}{\theta\log\theta} - n\,\frac{1}{\theta-1} + \frac{1}{\theta}\sum_{i=1}^{n} x_i = \frac{n}{\theta}\left[\bar{x} - \left(\frac{\theta}{\theta-1} - \frac{1}{\log\theta}\right)\right]\,,$$
i.e. $\frac{\partial}{\partial\theta}\log L(\theta|\mathbf{x})$ can be factorized in the form $\frac{\partial}{\partial\theta}\log L(\theta|\mathbf{x}) = a(\theta)\left[W(\mathbf{X}) - \tau(\theta)\right]$. Hence, according to Corollary 7.3.15, $\bar{X}$ is a best unbiased estimator of $\frac{\theta}{\theta-1} - \frac{1}{\log\theta}$.
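A similar Monte Carlo check for part b) (a sketch assuming NumPy; θ > 1, n, and the replication count are arbitrary). The CDF of $f(x|\theta) = \frac{\log\theta}{\theta-1}\,\theta^x$ on (0, 1) is $F(x) = \frac{\theta^x - 1}{\theta - 1}$, which inverts to $x = \log\left(1 + u(\theta-1)\right)/\log\theta$:

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 3.0, 50, 50_000  # illustrative values (theta > 1)

# Inverse-CDF sampling from f(x|theta) = (log(theta)/(theta-1)) * theta^x on (0, 1):
# F(x) = (theta^x - 1)/(theta - 1)  =>  x = log(1 + u*(theta - 1)) / log(theta).
u = rng.uniform(size=(reps, n))
x = np.log1p(u * (theta - 1)) / np.log(theta)

target = theta / (theta - 1) - 1 / np.log(theta)
print(f"E(X-bar) ~ {x.mean():.4f}   target theta/(theta-1) - 1/log(theta) = {target:.4f}")
```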
Exercise 7.40 / p. 362
One solution uses the Cramér–Rao inequality together with Corollary 7.3.10 and Lemma 7.3.11:
First of all, we check that X̄ is an unbiased estimator of p.
Since Xi are iid from Bernoulli(p), i = 1, . . . , n, we get E(X̄) = E(X) = p.
We also compute the variance of X̄ in order to compare it with the Cramér–Rao bound: $\operatorname{Var}(\bar{X}) = \frac{1}{n}\operatorname{Var}(X) = \frac{1}{n}\,p(1-p)$.
Since Bernoulli(p) is an exponential family distribution, the interchange of integration and differentiation is allowed, and Var(X̄) < ∞, so the requirements of the Cramér–Rao inequality hold.
Thus,
$$\operatorname{Var}(\bar{X}) \ge \frac{\left(\frac{d}{dp}E(\bar{X})\right)^2}{E\left[\left(\frac{\partial}{\partial p}\log f(\mathbf{X}|p)\right)^2\right]}\,,$$
where $\frac{d}{dp}E(\bar{X}) = \frac{d}{dp}\,p = 1$ and $E\left[\left(\frac{\partial}{\partial p}\log f(\mathbf{X}|p)\right)^2\right] = n\,E\left[\left(\frac{\partial}{\partial p}\log f(X|p)\right)^2\right]$ since the $X_i$ are iid (see Corollary 7.3.10). Furthermore, $E\left[\left(\frac{\partial}{\partial p}\log f(X|p)\right)^2\right] = -E\left[\frac{\partial^2}{\partial p^2}\log f(X|p)\right]$ because Bernoulli(p) is an exponential family distribution and Lemma 7.3.11 holds. To compute the denominator:
$$f(x|p) = p^x(1-p)^{1-x} \;\Rightarrow\; \log f(x|p) = x\log p + (1-x)\log(1-p) \;\Rightarrow$$
$$\frac{\partial}{\partial p}\log f(x|p) = \frac{x}{p} - \frac{1-x}{1-p} \;\Rightarrow\; \frac{\partial^2}{\partial p^2}\log f(x|p) = -\frac{x}{p^2} - \frac{1-x}{(1-p)^2} \;\Rightarrow$$
$$-E\left[\frac{\partial^2}{\partial p^2}\log f(X|p)\right] = -E\left[-\frac{X}{p^2} - \frac{1-X}{(1-p)^2}\right] = \frac{E(X)}{p^2} + \frac{1-E(X)}{(1-p)^2} = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)}\,.$$
Thus, the Cramér–Rao lower bound is
$$\frac{\left(\frac{d}{dp}E(\bar{X})\right)^2}{E\left[\left(\frac{\partial}{\partial p}\log f(\mathbf{X}|p)\right)^2\right]} = \frac{1}{-n\,E\left[\frac{\partial^2}{\partial p^2}\log f(X|p)\right]} = \frac{1}{n\,\frac{1}{p(1-p)}} = \frac{p(1-p)}{n}\,,$$
which is actually equal to Var(X̄). Hence, X̄ is a best unbiased estimator of p. Based on Theorem 7.3.19 we conclude that it is the best unbiased estimator of p.
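A quick simulation (a NumPy sketch with arbitrary p, n, and replication count) confirms that the variance of X̄ matches the bound p(1−p)/n:

```python
import numpy as np

rng = np.random.default_rng(5)
p, n, reps = 0.3, 40, 200_000  # illustrative values

# The sum of n iid Bernoulli(p) is Bin(n, p), so sample X-bar as Bin(n, p)/n.
xbar = rng.binomial(n, p, size=reps) / n
print(f"Var(X-bar) ~ {xbar.var():.6f}   CR bound p(1-p)/n = {p*(1-p)/n:.6f}")
```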
Alternatively, we could use Corollary 7.3.15:
The $X_i$ are iid from Bernoulli(p), which satisfies the condition of the Cramér–Rao Theorem since Bernoulli(p) is an exponential family distribution. The likelihood function is of the form:
$$L(p|\mathbf{x}) = f(\mathbf{x}|p) = \prod_{i=1}^{n} f(x_i|p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{\sum x_i}(1-p)^{n-\sum x_i}$$
and the log-likelihood is
$$\log L(p|\mathbf{x}) = \sum x_i \log p + \left(n - \sum x_i\right)\log(1-p)$$
and the first derivative of the log-likelihood with respect to p is
$$\frac{\partial}{\partial p}\log L(p|\mathbf{x}) = \frac{\sum x_i}{p} - \frac{n-\sum x_i}{1-p} = \frac{\sum x_i - np}{p(1-p)} = \frac{n}{p(1-p)}\,(\bar{x} - p)\,.$$
Since $\frac{\partial}{\partial p}\log L(p|\mathbf{x})$ can be factorized in the form $\frac{\partial}{\partial p}\log L(p|\mathbf{x}) = a(p)\left[W(\mathbf{x}) - \tau(p)\right]$, according to Corollary 7.3.15, X̄ is an unbiased estimator of p and it attains the Cramér–Rao lower bound. Hence, X̄ is a best unbiased estimator of p, and based on Theorem 7.3.19 it is the best unbiased estimator of p.
Exercise 7.41 / p. 363
$X_1, \ldots, X_n$ is a random sample where $E(X_i) = \mu$ and $\operatorname{Var}(X_i) = \sigma^2$.
a)
$$E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i) = \sum_{i=1}^{n} a_i\,\mu = \mu \sum_{i=1}^{n} a_i\,.$$
If $\sum_{i=1}^{n} a_i = 1$, then $E\left(\sum_{i=1}^{n} a_i X_i\right) = \mu$, i.e. it is an unbiased estimator of µ.
b) Firstly, we determine the variance of an estimator of the type $\sum_{i=1}^{n} a_i X_i$ where $\sum_{i=1}^{n} a_i = 1$:
$$\operatorname{Var}\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2 \operatorname{Var}(X_i) = \sum_{i=1}^{n} a_i^2\,\sigma^2 = \sigma^2 \sum_{i=1}^{n} a_i^2\,.$$
To find the estimator with the minimum variance, we can view the above as a function of the $a_i$'s, $i = 1, \ldots, n$, that we want to minimize under the condition $\sum_{i=1}^{n} a_i = 1$. Equivalently, we can minimize the function:
$$f(a_1, a_2, \ldots, a_{n-1}) = \sum_{i=1}^{n-1} a_i^2 + \left(1 - \sum_{i=1}^{n-1} a_i\right)^2\,.$$
(Observe that the latter is a function of the $a_i$'s, $i = 1, \ldots, n-1$, and $a_n$ is expressed as $a_n = 1 - \sum_{i=1}^{n-1} a_i$ based on the restriction that $\sum_{i=1}^{n} a_i = 1$.)
The partial derivatives of $f(a_1, a_2, \ldots, a_{n-1})$ are
$$\frac{\partial}{\partial a_1} f(a_1, a_2, \ldots, a_{n-1}) = 2a_1 - 2\left(1 - \sum_{i=1}^{n-1} a_i\right)$$
$$\frac{\partial}{\partial a_2} f(a_1, a_2, \ldots, a_{n-1}) = 2a_2 - 2\left(1 - \sum_{i=1}^{n-1} a_i\right)$$
$$\vdots$$
$$\frac{\partial}{\partial a_{n-1}} f(a_1, a_2, \ldots, a_{n-1}) = 2a_{n-1} - 2\left(1 - \sum_{i=1}^{n-1} a_i\right)\,,$$
i.e. $\frac{\partial}{\partial a_i} f(a_1, a_2, \ldots, a_{n-1}) = 2a_i - 2\left(1 - \sum_{i=1}^{n-1} a_i\right)$ for $i = 1, \ldots, n-1$.
Setting all partial derivatives equal to 0 we get
$$a_1 = 1 - \sum_{i=1}^{n-1} a_i\,,\quad a_2 = 1 - \sum_{i=1}^{n-1} a_i\,,\quad \ldots\,,\quad a_{n-1} = 1 - \sum_{i=1}^{n-1} a_i\,,$$
i.e. $a_i = 1 - \sum_{i=1}^{n-1} a_i$ for $i = 1, \ldots, n-1$. Since the right-hand side in all equations is the same, it holds that
$$a_1 = a_2 = \ldots = a_{n-1} \overset{\text{set}}{=} k\,.$$
Replacing the $a_i$'s with k in any of the above equations we get:
$$k = 1 - \sum_{i=1}^{n-1} k \;\Rightarrow\; k = 1 - (n-1)k \;\Rightarrow\; nk = 1 \;\Rightarrow\; k = \frac{1}{n}\,.$$
The second partial derivatives are
$$\frac{\partial^2}{\partial a_i^2} f(a_1, a_2, \ldots, a_{n-1}) = 4 > 0\,,\quad i = 1, \ldots, n-1\,,$$
$$\frac{\partial^2}{\partial a_i \partial a_j} f(a_1, a_2, \ldots, a_{n-1}) = 2\,,\quad i, j = 1, \ldots, n-1 \text{ and } i \ne j\,,$$
so the Hessian is $2I_{n-1} + 2J_{n-1}$ (with $J_{n-1}$ the all-ones matrix), which is positive definite.
Thus, $f(a_1, a_2, \ldots, a_{n-1})$ reaches its global minimum at the point $(a_1, a_2, \ldots, a_{n-1}) = \left(\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n}\right)$.
Since $a_n = 1 - \sum_{i=1}^{n-1} a_i$, at this point $a_n$ takes the value $a_n = 1 - \sum_{i=1}^{n-1} \frac{1}{n} = 1 - \frac{n-1}{n} = \frac{1}{n}$.
Hence, in the case of $X_i$ variables with the same mean and variance, among all linear unbiased estimators of µ of the type $\sum_{i=1}^{n} a_i X_i$, the one with all $a_i$'s equal to 1/n, i.e. X̄, has the minimum variance. Its variance is $\operatorname{Var}(\bar{X}) = \frac{1}{n}\operatorname{Var}(X) = \frac{\sigma^2}{n}$.
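As a numerical illustration of this result (a NumPy sketch; the Dirichlet draw is just a convenient way to generate nonnegative weights summing to 1, and σ², n are arbitrary), no such weight vector beats the equal weights 1/n:

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma2 = 8, 2.0  # illustrative values

def est_var(a: np.ndarray) -> float:
    """Var(sum a_i X_i) = sigma^2 * sum a_i^2 for iid X_i with variance sigma^2."""
    return sigma2 * float(np.sum(a ** 2))

equal = np.full(n, 1 / n)
print(f"equal weights: {est_var(equal):.4f}  (= sigma^2/n = {sigma2/n:.4f})")

for _ in range(5):
    a = rng.dirichlet(np.ones(n))  # random nonnegative weights with sum(a) = 1
    print(f"random weights: {est_var(a):.4f}  >= equal: {est_var(a) >= est_var(equal)}")
```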