
STA 250/MTH 342 Intro to Mathematical Statistics
Midterm 1 Solution
Solution 1: Use Bayes' Theorem,

$$\Pr(\text{Rain} \mid \text{Forecast}) = \frac{\Pr(\text{Forecast} \mid \text{Rain}) \Pr(\text{Rain})}{\Pr(\text{Forecast} \mid \text{Rain}) \Pr(\text{Rain}) + \Pr(\text{Forecast} \mid \text{No Rain}) \Pr(\text{No Rain})} = \frac{0.9 \cdot \frac{5}{365}}{0.9 \cdot \frac{5}{365} + 0.1 \cdot \frac{360}{365}} = \frac{1}{9}.$$

Some assumptions we need to make here:
• Our prior knowledge about the probability of rain is Pr(Rain) = 5/365.
• The forecast probabilities Pr(Forecast|Rain) and Pr(Forecast|No Rain) apply to the wedding day of interest.
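
As a quick numerical sanity check, here is a short Python sketch (the variable names are ours, not part of the problem):

```python
# Sanity check of the Bayes' Theorem computation above.
p_rain = 5 / 365                 # prior probability of rain (5 rainy days a year)
p_fc_given_rain = 0.9            # Pr(Forecast | Rain)
p_fc_given_no_rain = 0.1         # Pr(Forecast | No Rain)

numerator = p_fc_given_rain * p_rain
denominator = numerator + p_fc_given_no_rain * (1 - p_rain)
print(numerator / denominator)   # 0.1111... = 1/9
```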
Solution 2:
Quantity
data
parameter
both
neither
5
sample mean
1. Prior distributions of θ
5
(5)
2. Posterior distributions of θ
5
3. Bayes estimator of θ under squared error loss
5
4. Maximum likelihood estimator of θ
5
5. Estimate of θ
5
6. Sampling distribution of an estimator
5
7. Variance of an estimator
5
8. MSE of an estimator
5
Comment: in general, the prior distribution of θ depends on neither the data nor the parameter, though it can be based on informative prior knowledge or expert opinion, which is sometimes data-dependent. From the frequentist/sampling point of view, the sampling distribution, variance, and MSE of an estimator are all expressed in terms of the true value of the parameter.
Solution 3:
(a)

$$L(p) = \prod_{i=1}^{n} \Pr[X_i \mid p] = \prod_{i=1}^{n} p^{X_i}(1-p)^{1-X_i} = p^{\sum_{i=1}^{n} X_i}(1-p)^{n - \sum_{i=1}^{n} X_i} \quad \left(= p^4(1-p)\right).$$

Comment: the data is $(X_1, \ldots, X_n)$, not the sum $\sum_{i=1}^{n} X_i$.
(b)
log L(p) =
∂ log L(p)
=
∂p
n
X
Xi log(p) + (1 − Xi ) log(1
i=1
Pn
Pn
n − i=1 Xi
i=1 Xi
−
p
Pn
i=1
⇒ p̂1 =
Check the second order derivative:
∂ 2 log L(p)
∂p2
=−
Pn
i=1
p2
n
Xi
at p = p̂1 .
1
−
1−p
Xi
− p),
=0
.
Pn
n− i=1 Xi
(1−p)2
< 0. Therefore, L(p) has a (global) maximum
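
To see this maximum concretely, a short numerical sketch (assuming the observed data with $n = 5$ and $\sum X_i = 4$, so $L(p) = p^4(1-p)$):

```python
import numpy as np

# Evaluate L(p) = p^4 (1 - p) on a fine grid and locate its maximum.
p_grid = np.linspace(0.001, 0.999, 9999)
likelihood = p_grid**4 * (1 - p_grid)
print(p_grid[np.argmax(likelihood)])  # ~0.8, matching p-hat_1 = 4/5
```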
(c) Since $X_i \in \{0, 1\}$, $\hat{p}_1 \in \left\{0, \frac{1}{n}, \frac{2}{n}, \ldots, \frac{n-1}{n}, 1\right\}$. For $k \in \{0, 1, \ldots, n-1, n\}$,

$$\Pr\left(\hat{p}_1 = \frac{k}{n}\right) = \Pr\left(\sum_{i=1}^{n} X_i = k\right) = \binom{n}{k} p^k (1-p)^{n-k}.$$
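
In other words, the sampling distribution of $\hat{p}_1$ is a rescaled Binomial$(n, p)$; an illustrative check (the values $n = 5$, $p = 0.8$ are ours):

```python
from scipy.stats import binom

# Sampling distribution of p-hat_1 = sum(X_i)/n for illustrative n = 5, p = 0.8.
n, p = 5, 0.8
for k in range(n + 1):
    print(f"Pr(p_hat_1 = {k}/{n}) = {binom.pmf(k, n, p):.4f}")
```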
(d)

$$\sigma^2 = \operatorname{Var}(\hat{p}_1) = \operatorname{Var}\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) = \frac{1}{n^2}\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{n \cdot p(1-p)}{n^2} = \frac{p(1-p)}{n}.$$
(e) By the invariance property of the MLE,

$$\hat{\sigma}^2 = \frac{\hat{p}_1(1-\hat{p}_1)}{n} = \frac{\sum_{i=1}^{n} X_i \left(n - \sum_{i=1}^{n} X_i\right)}{n^3}.$$
(f) Since

$$B(\hat{p}_1) = E[\hat{p}_1] - p = E\left[\frac{\sum_{i=1}^{n} X_i}{n}\right] - p = \frac{\sum_{i=1}^{n} E[X_i]}{n} - p = \frac{np}{n} - p = 0,$$

$$\operatorname{MSE}(\hat{p}_1) = B(\hat{p}_1)^2 + \operatorname{Var}(\hat{p}_1) = 0 + \sigma^2 = \frac{p(1-p)}{n}.$$
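
A Monte Carlo sketch of parts (d) and (f) (the values $n = 5$ and $p = 0.8$ are illustrative, chosen to match the observed $\hat{p}_1 = 4/5$; the seed is arbitrary):

```python
import numpy as np

# Simulate many samples of size n; check bias ~ 0 and Var ~ p(1-p)/n.
rng = np.random.default_rng(0)
n, p, reps = 5, 0.8, 200_000
p_hat = rng.binomial(n, p, size=reps) / n
print(p_hat.mean() - p)               # bias, close to 0
print(p_hat.var(), p * (1 - p) / n)   # both close to 0.032
```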
Solution 4:
(a)

$$\pi(p \mid X) \propto \pi(p) f(X \mid p) \propto 1 \cdot \prod_{i=1}^{n} p^{X_i}(1-p)^{1-X_i} \propto p^{\sum_{i=1}^{n} X_i}(1-p)^{n - \sum_{i=1}^{n} X_i} \sim \operatorname{Beta}\left(1 + \sum_{i=1}^{n} X_i,\; n + 1 - \sum_{i=1}^{n} X_i\right).$$

Plug in the data and we get $\alpha^* = 5$ and $\beta^* = 2$.
(b) The Bayes estimate of $p$ under the squared error loss is the posterior mean, i.e.

$$E(p \mid X) = \frac{\alpha^*}{\alpha^* + \beta^*} = \frac{5}{7}.$$

(c)

$$\hat{p}_2 = E(p \mid X) = \frac{1 + \sum_{i=1}^{n} X_i}{n + 2}.$$
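
A quick check of (b) and (c) with scipy (Beta(5, 2) comes from part (a); $n = 5$ and $\sum X_i = 4$ are the observed data):

```python
from scipy.stats import beta

# Posterior mean of Beta(alpha* = 5, beta* = 2) equals the Bayes estimate 5/7.
print(beta(5, 2).mean())   # 0.7142857...
print((1 + 4) / (5 + 2))   # same via (1 + sum X_i)/(n + 2)
```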
(d) The Bayes estimator of $\sigma^2$ under the squared error loss is the posterior mean,

$$\begin{aligned}
E(\sigma^2 \mid X) &= E\left[\left.\frac{p(1-p)}{n} \right| X\right] \\
&= \frac{1}{n}\left[E(p \mid X) - E(p^2 \mid X)\right] \\
&= \frac{1}{n}\left[E(p \mid X) - \left(E^2(p \mid X) + \operatorname{Var}(p \mid X)\right)\right] \\
&= \frac{1}{n}\left[\frac{\alpha^*}{\alpha^* + \beta^*} - \left(\frac{\alpha^*}{\alpha^* + \beta^*}\right)^2 - \frac{\alpha^* \beta^*}{(\alpha^* + \beta^*)^2(\alpha^* + \beta^* + 1)}\right] \\
&= \frac{\alpha^* \beta^*}{n(\alpha^* + \beta^*)(\alpha^* + \beta^* + 1)} \\
&= \frac{\left(1 + \sum_{i=1}^{n} X_i\right)\left(n + 1 - \sum_{i=1}^{n} X_i\right)}{n(n+2)(n+3)}.
\end{aligned}$$
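
A numeric cross-check of this posterior mean (a sketch; $n = 5$, $\alpha^* = 5$, $\beta^* = 2$ from the data, and the seed is arbitrary):

```python
from scipy.stats import beta

# Closed form vs. Monte Carlo for E(sigma^2 | X) = a*b / (n (a+b) (a+b+1)).
n, a, b = 5, 5, 2
print(a * b / (n * (a + b) * (a + b + 1)))    # 10/280 ~ 0.0357
p_draws = beta(a, b).rvs(size=200_000, random_state=0)
print((p_draws * (1 - p_draws) / n).mean())   # agrees to ~3 decimals
```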
(e)

$$B(\hat{p}_2) = E\left[\frac{1 + \sum_{i=1}^{n} X_i}{n+2}\right] - p = \frac{1 + np}{n+2} - p = \frac{1 - 2p}{n+2},$$

$$\operatorname{Var}(\hat{p}_2) = \operatorname{Var}\left(\frac{1 + \sum_{i=1}^{n} X_i}{n+2}\right) = \frac{\sum_{i=1}^{n} \operatorname{Var}(X_i)}{(n+2)^2} = \frac{np(1-p)}{(n+2)^2},$$

$$\operatorname{MSE}(\hat{p}_2) = B^2(\hat{p}_2) + \operatorname{Var}(\hat{p}_2) = \frac{(1-2p)^2 + np(1-p)}{(n+2)^2} = \frac{(4-n)p^2 + (n-4)p + 1}{(n+2)^2}.$$
(f) Suppose a better estimator is one with a smaller MSE. If $p = 0$ or $1$, $\operatorname{MSE}(\hat{p}_1) = 0$ and $\operatorname{MSE}(\hat{p}_2) = \frac{1}{(n+2)^2} > 0$; if $p = \frac{1}{2}$,

$$\operatorname{MSE}(\hat{p}_1) = \frac{1}{4n} \quad \text{and} \quad \operatorname{MSE}(\hat{p}_2) = \frac{n}{4(n+2)^2} < \frac{n}{4n^2} = \frac{1}{4n}.$$

In general,

$$\operatorname{MSE}(\hat{p}_2) - \operatorname{MSE}(\hat{p}_1) = \frac{(4-n)p^2 + (n-4)p + 1}{(n+2)^2} - \frac{p(1-p)}{n} = \frac{(8n+4)p^2 - (8n+4)p + n}{n(n+2)^2} = \frac{(8n+4)\left(p - \frac{1}{2}\right)^2 - n - 1}{n(n+2)^2}.$$

When $p \in \left(\frac{1}{2} - \sqrt{\frac{n+1}{8n+4}},\; \frac{1}{2} + \sqrt{\frac{n+1}{8n+4}}\right)$, $\hat{p}_2$ has the smaller MSE; when $p = \frac{1}{2} \pm \sqrt{\frac{n+1}{8n+4}}$, the two estimators are equally good; otherwise, $\hat{p}_1$ is better.

Note: if we plug in $n = 5$, $\frac{1}{2} - \sqrt{\frac{n+1}{8n+4}} \approx 0.131$ and $\frac{1}{2} + \sqrt{\frac{n+1}{8n+4}} \approx 0.869$.
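
The crossover points can also be checked numerically (a sketch with $n = 5$, matching the note above):

```python
import numpy as np

# Compare MSE(p-hat_1) = p(1-p)/n with MSE(p-hat_2) = ((4-n)p^2+(n-4)p+1)/(n+2)^2.
n = 5
p = np.linspace(0, 1, 1001)
mse1 = p * (1 - p) / n
mse2 = ((4 - n) * p**2 + (n - 4) * p + 1) / (n + 2) ** 2
root = np.sqrt((n + 1) / (8 * n + 4))
print(0.5 - root, 0.5 + root)   # ~0.131 and ~0.869, the crossover points
mask = mse2 < mse1              # region where the Bayes estimator p-hat_2 wins
print(p[mask][0], p[mask][-1])  # ~0.131 and ~0.869 on the grid
```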