Assignment #2 Solution Sketch

Exercise 1.20 According to the result of Exercise 1.16, the limit (1.21) implies that
the relative difference between $\sum_{i=1}^n (1/i)$ and $\log n$ goes to zero. But this does
not imply that the difference itself goes to zero (in general, the difference may
not even have any limit at all). In this particular case, the difference converges to
a constant called Euler’s constant that is sometimes used to define the complex-valued gamma function.
Evaluate $\sum_{i=1}^n (1/i) - \log n$ for various large values of $n$ (say, $n \in \{100, 1000, 10000\}$)
to approximate the Euler constant.
Sketch of solution: Below is R code that gives the value for $n = 10^k$, $k = 2, \ldots, 6$:
n<-1e+6
x <- cumsum(1/(1:n))-log(1:n)
x[10^c(2:6)]
[1] 0.5822073 0.5777156 0.5772657 0.5772207 0.5772162
Exercise 1.21 Let $X_1, \ldots, X_n$ be a simple random sample from an exponential distribution
with density $f(x) = \theta \exp(-\theta x)$ and consider the estimator $\delta_n(X) = \sum_{i=1}^n X_i/(n+2)$ of $g(\theta) = 1/\theta$. Show that for some constants $c_1$ and $c_2$ depending
on $\theta$,
$$\text{bias of } \delta_n \sim c_1 (\text{variance of } \delta_n) \sim \frac{c_2}{n}$$
as $n \to \infty$. The bias of $\delta_n$ equals its expectation minus $(1/\theta)$.
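Sketch of solution: This may be shown directly by demonstrating that the
bias of $\delta_n$ is $-2/[\theta(n+2)]$ and its variance is $n/[\theta(n+2)]^2$. The bias is therefore asymptotically equivalent to $-2/(\theta n)$ and the variance to $1/(\theta^2 n)$, so we may take
$c_1 = -2\theta$ and $c_2 = -2/\theta$.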
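The bias and variance quoted in the sketch can be checked with a short worked computation. The block below is a sketch that assumes only the standard exponential moments $E X_1 = 1/\theta$ and $\operatorname{Var} X_1 = 1/\theta^2$, which the source does not state explicitly.

```latex
\begin{align*}
E\,\delta_n &= \frac{n\,E X_1}{n+2} = \frac{n}{\theta(n+2)},
& \text{bias} &= \frac{n}{\theta(n+2)} - \frac{1}{\theta} = \frac{-2}{\theta(n+2)}, \\
\operatorname{Var}\delta_n &= \frac{n \operatorname{Var} X_1}{(n+2)^2} = \frac{n}{\theta^2 (n+2)^2},
& \frac{\text{bias}}{\operatorname{Var}\delta_n} &= \frac{-2\theta(n+2)}{n} \to -2\theta, \\
n \cdot \text{bias} &= \frac{-2n}{\theta(n+2)} \to -\frac{2}{\theta}.
\end{align*}
```

The two limits identify the constant multiplying the variance as $-2\theta$ and the constant multiplying $1/n$ as $-2/\theta$.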
Exercise 1.26 Create counterexamples to the result in Theorem 1.31 if the hypotheses of the theorem are weakened as follows:
(a) Find $a_n$, $b_n$, and convex $f(x)$ with $\lim_{x\to\infty} f(x) = \infty$ such that $a_n = o(b_n)$
but $f(a_n) \neq o[f(b_n)]$.
Sketch of solution: Many solutions are possible for each of these
three parts. For (a), take $a_n = 0$, $b_n = 1$, and $f(x) = x + 1$.
(b) Find $a_n$, $b_n$, and convex $f(x)$ such that $a_n \to \infty$, $b_n \to \infty$, and $a_n = o(b_n)$
but $f(a_n) \neq o[f(b_n)]$.
Sketch of solution: Take $a_n = n$, $b_n = n^2$, and $f(x) = 1$.
(c) Find $a_n$, $b_n$, and $f(x)$ with $\lim_{x\to\infty} f(x) = \infty$ such that $a_n \to \infty$, $b_n \to \infty$,
and $a_n = o(b_n)$ but $f(a_n) \neq o[f(b_n)]$.
Sketch of solution: Take $a_n = n$, $b_n = n^2$, and $f(x) = \log x$.
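For all three counterexamples, the ratio f(a_n)/f(b_n) is constant in n, so it cannot tend to zero. The following R sketch simply tabulates those ratios; the variable names (ratio_a, ratio_b, ratio_c) are illustrative and do not come from the source.

```r
# Ratio f(a_n)/f(b_n) for each counterexample; a little-o relation
# would require these ratios to tend to zero as n grows.
n <- 10^(1:6)

ratio_a <- rep((0 + 1) / (1 + 1), length(n))  # (a): a_n = 0, b_n = 1, f(x) = x + 1
ratio_b <- rep(1 / 1, length(n))              # (b): f(x) = 1 for every argument
ratio_c <- log(n) / log(n^2)                  # (c): a_n = n, b_n = n^2, f(x) = log x

print(cbind(n, ratio_a, ratio_b, ratio_c))
```

The ratios stay at 1/2, 1, and 1/2 respectively, which is consistent with the claims but, as with any numerical check, is not itself a proof.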
Exercise 1.27 Recall that $\log n$ always denotes the natural logarithm of $n$. Assuming
that $\log n$ means $\log_{10} n$ will change some of the answers in this exercise!
(a) The following 5 sequences have the property that each tends to ∞ as n → ∞,
and for any pair of sequences, one is little-o of the other. List them in order of
rate of increase from slowest to fastest. In other words, give an ordering such that
first sequence = o(second sequence), second sequence = o(third sequence), etc.
$$\sqrt{\log n!} \qquad \sum_{i=1}^n \sqrt[3]{i} \qquad 2^{\log n} \qquad (\log n)^{\log\log n} \qquad n$$
Prove the 4 order relationships that result from your list.
Hint: Here and in part (b), using a computer to evaluate some of the sequences
for large values of n can be helpful in suggesting the correct ordering. However,
note that this procedure does not constitute a proof!
Sketch of solution:
• To prove $(\log n)^{\log\log n} = o(\sqrt{\log n!})$:
Since $\log n! > n$ for $n \ge 6$ (easy to prove by induction), it suffices to show
that $(\log n)^{\log\log n} = o(\sqrt{n})$. Furthermore, by Theorem 1.31, since the exponential function is a convex function tending to infinity, it suffices to show
(taking logarithms) that $(\log\log n)^2 = o(0.5 \log n)$. Writing $m = \log n$, this says $(\log m)^2 = o(m)$, which follows because logarithmic growth is slower than polynomial growth.
• To prove $\sqrt{\log n!} = o(2^{\log n})$:
Since $\log n! \le n \log n$ and $2^{\log n} = n^{\log 2}$, it suffices to prove that $\sqrt{n \log n} = o(n^{\log 2})$, which is equivalent to $\sqrt{\log n} = o(n^{\log 2 - 0.5})$, which follows since
logarithmic growth is slower than polynomial growth (note that $\log 2 \approx 0.693 > 0.5$).
• To prove $2^{\log n} = o(n)$:
Simply write $2^{\log n} = n^{\log 2}$. The result follows immediately since $\log 2 < 1$.
• To prove $n = o\left(\sum_{i=1}^n \sqrt[3]{i}\right)$:
$$\frac{n}{\sum_{i=1}^n \sqrt[3]{i}} = \frac{3n}{4n^{4/3}} \times \frac{4n^{4/3}}{3\sum_{i=1}^n \sqrt[3]{i}},$$
where the first fraction equals $3/(4n^{1/3})$, which tends to 0, and the last fraction tends to 1 by Equation (1.20). Technically, we never
proved Equation (1.20), but this is not hard using an argument that
“sandwiches” the sum between two integrals that are easy to solve and
asymptotically equivalent to one another.
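In the spirit of the hint, the ordering can be checked numerically. The R sketch below evaluates the five sequences for n = 10^2, ..., 10^6 in the order used in the proofs above; the helper name seqs and the row labels are illustrative, and lgamma(n + 1) is used for log n! to avoid overflow.

```r
# Evaluate the five sequences at several large n, listed from the
# slowest-growing to the fastest-growing according to the solution sketch.
seqs <- function(n) {
  c(
    log_pow   = log(n)^log(log(n)),   # (log n)^(log log n)
    sqrt_lfac = sqrt(lgamma(n + 1)),  # sqrt(log n!), with log n! = lgamma(n + 1)
    two_log   = 2^log(n),             # 2^(log n) = n^(log 2)
    n         = n,
    sum_cbrt  = sum((1:n)^(1/3))      # sum of the cube roots of 1, ..., n
  )
}

vals <- sapply(10^(2:6), seqs)
colnames(vals) <- paste0("n=1e", 2:6)
print(signif(vals, 4))

# Ratio of each sequence to the next one in the list. Each ratio should
# eventually tend to zero, but convergence can be slow and need not be monotone.
print(signif(vals[-5, ] / vals[-1, ], 3))
```

Within each column the values increase from top to bottom, which agrees with the ordering proved above. This only suggests the ordering; it does not replace the proofs.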
Exercise 1.30 Prove Theorem 1.38.
Hint: Starting with Equation (1.31), take $x = a + t e_i$ and let $t \to 0$.
Sketch of solution: The objective is to show that the $i,j$ entry of $\nabla f(a)$ is
$\partial f_j(x)/\partial x_i |_{x=a}$. If $x$ is defined as in the hint, then Equation (1.29) or, alternatively, Equation (1.31), becomes
$$\lim_{t\to 0} \frac{f(a + t e_i) - f(a)}{t} = \nabla f(a)^\top e_i$$
after multiplying through by sign(t) to get rid of the absolute value symbols. The
right-hand side above is simply the ith row of ∇f(a), written as a column vector, so the result follows from
Definition 1.37 applied to each fj(x) in turn.
Exercise 1.31 Prove that the converse of Theorem 1.38 is not true by finding a
function that is not differentiable at some point but whose partial derivatives at
that point all exist.
Sketch of solution: $f(x, y) = I\{xy = 0\}$ is such a function. Its values
along both axes are a constant 1, which means it has partial derivatives (both equal to zero) with
respect to x and y at the origin. However, the function is not even continuous at
that point, since it equals zero at each point not on one of the axes.
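The two claims can be written out as limits. This is a sketch that relies on the standard fact, not restated in the source, that differentiability at a point implies continuity there.

```latex
\begin{align*}
\frac{\partial f}{\partial x}(0,0) &= \lim_{t\to 0}\frac{f(t,0)-f(0,0)}{t} = \lim_{t\to 0}\frac{1-1}{t} = 0,
\qquad
\frac{\partial f}{\partial y}(0,0) = \lim_{t\to 0}\frac{f(0,t)-f(0,0)}{t} = 0, \\
f(t,t) &= 0 \ \text{ for } t \neq 0, \quad\text{so}\quad \lim_{t\to 0} f(t,t) = 0 \neq 1 = f(0,0).
\end{align*}
```

The second line shows that f is discontinuous at the origin along the diagonal, so it cannot be differentiable there even though both partial derivatives exist.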
Exercise 1.32 Suppose that $X_1, \ldots, X_n$ comprise a sample of independent and identically distributed normal random variables with density
$$f(x_i; \mu, \sigma^2) = \frac{\exp\{-\frac{1}{2\sigma^2}(x_i - \mu)^2\}}{\sqrt{2\pi\sigma^2}}.$$
Let $\ell(\mu, \sigma^2)$ denote the loglikelihood function; i.e., $\ell(\mu, \sigma^2)$ is the logarithm of the
joint density $\prod_i f(X_i; \mu, \sigma^2)$, viewed as a function of the parameters $\mu$ and $\sigma^2$.
The score vector is defined to be the gradient of the loglikelihood. Find the score
vector for this example.
Hint: The score vector is a vector with two components and it is a function of
$X_1, \ldots, X_n$, $\mu$, and $\sigma^2$. Setting the score vector equal to zero and solving for $\mu$
and $\sigma^2$ gives the well-known maximum likelihood estimators of $\mu$ and $\sigma^2$, namely
$\bar{X}$ and $\frac{1}{n}\sum_i (X_i - \bar{X})^2$.
Sketch of solution: This exercise was not assigned; it is included because
it is referred to by the next exercise.
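Because the next exercise differentiates the score, it may help to have it written out. The following is a sketch obtained by differentiating the normal loglikelihood directly; it is not part of the original solution, but it agrees with the estimators named in the hint.

```latex
\begin{align*}
\ell(\mu,\sigma^2) &= -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_i (X_i-\mu)^2, \\
\nabla \ell(\mu,\sigma^2) &=
\begin{pmatrix}
\dfrac{\sum_i X_i - n\mu}{\sigma^2} \\[2ex]
-\dfrac{n}{2\sigma^2} + \dfrac{\sum_i (X_i-\mu)^2}{2\sigma^4}
\end{pmatrix}.
\end{align*}
```

Setting both components to zero gives $\hat\mu = \bar{X}$ and $\hat\sigma^2 = \frac{1}{n}\sum_i (X_i - \bar{X})^2$, matching the hint.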
Exercise 1.34 (a) Find the Hessian matrix of the loglikelihood function defined in
Exercise 1.32.
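Sketch of solution: Straightforward differentiation gives
$$\nabla^2 \ell(\mu, \sigma^2) = \begin{pmatrix} -\dfrac{n}{\sigma^2} & -\dfrac{\sum_i X_i - n\mu}{\sigma^4} \\[2ex] -\dfrac{\sum_i X_i - n\mu}{\sigma^4} & \dfrac{n}{2\sigma^4} - \dfrac{\sum_i (X_i - \mu)^2}{\sigma^6} \end{pmatrix}$$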
(b) Suppose that n = 10 and that we observe this sample:
2.946  0.975  1.333  4.484  1.711
2.627 -0.628  2.476  2.599  2.143
Evaluate the Hessian matrix at the maximum likelihood estimator (µ̂, σ̂ 2 ). (A
formula for the MLE is given in the hint to Exercise 1.32).
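Sketch of solution: Since µ̂ is the sample mean, the off-diagonal
elements of the Hessian are zero. Since $\sum_i (X_i - \hat\mu)^2 = n\hat\sigma^2$, the lower-right element simplifies to $-n/(2\hat\sigma^4)$. This gives the matrix shown in the
R code below.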
x <- c(2.946, 0.975, 1.333, 4.484, 1.711,
2.627,-0.628, 2.476, 2.599, 2.143)
muhat <- mean(x); sigma2hat <- mean((x-muhat)^2) ; n <- 10
diag(c(-n/sigma2hat, -n/2/sigma2hat^2))
        [,1]      [,2]
[1,] -6.0587  0.000000
[2,]  0.0000 -1.835392
(c) As we shall see in Chapter 7, the negative inverse of the Hessian matrix is a
reasonable large-sample estimator of the covariance matrix of the MLE (though
with only n = 10, it is not clear how good this estimator would be in this
example!). Invert your answer from part (b), then put a negative sign in front
and use the answer to give approximate standard errors (the square roots of the
diagonal entries) for µ̂ and σ̂ 2 .
Sketch of solution: The approximate standard errors are given by the R
code below.
sqrt(-1/c(-n/sigma2hat, -n/2/sigma2hat^2))
[1] 0.4062658 0.7381346
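As a cross-check, parts (b) and (c) can also be done without simplifying by hand: evaluate the full Hessian formula from part (a) at the MLE and invert it numerically. The sketch below assumes the Hessian entries given in part (a); the function name loglik_hessian is hypothetical and used only for illustration.

```r
x <- c(2.946, 0.975, 1.333, 4.484, 1.711,
       2.627, -0.628, 2.476, 2.599, 2.143)
n <- length(x)

# Hessian of the normal loglikelihood at (mu, s2), using the entries from part (a).
loglik_hessian <- function(mu, s2) {
  off <- -(sum(x) - n * mu) / s2^2                  # off-diagonal entry
  lower <- n / (2 * s2^2) - sum((x - mu)^2) / s2^3  # lower-right entry
  matrix(c(-n / s2, off, off, lower), nrow = 2)
}

muhat <- mean(x)
sigma2hat <- mean((x - muhat)^2)

H <- loglik_hessian(muhat, sigma2hat)
cov_hat <- solve(-H)        # negative inverse of the Hessian
se <- sqrt(diag(cov_hat))   # approximate standard errors for muhat and sigma2hat
print(se)
```

Up to floating-point rounding, this reproduces the standard errors obtained from the diagonal shortcut, since the off-diagonal entries vanish at the MLE.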
Exercise 1.39 Prove Hölder’s inequality: For random variables $X$ and $Y$ and positive
$p$ and $q$ such that $p + q = 1$,
$$E|XY| \le \left(E|X|^{1/p}\right)^p \left(E|Y|^{1/q}\right)^q. \qquad (1.39)$$
(If $p = q = 1/2$, inequality (1.39) is also called the Cauchy-Schwarz inequality.)
Hint: Use the convexity of $\exp(x)$ to prove that $|abXY| \le p|aX|^{1/p} + q|bY|^{1/q}$
whenever $aX \neq 0$ and $bY \neq 0$ (the same inequality is also true if $aX = 0$ or
$bY = 0$). Take expectations, then find values for the scalars $a$ and $b$ that give the
desired result when the right side of inequality (1.39) is nonzero.
Sketch of solution: Convexity of exp means that
$$\exp\{pA + qB\} \le p\exp\{A\} + q\exp\{B\}.$$
If both $aX$ and $bY$ are nonzero, take $A = (\log|aX|)/p$ and $B = (\log|bY|)/q$ to
obtain the inequality in the hint; otherwise, that inequality is easily seen to be
true. Next, notice that Hölder’s inequality is trivially true if either $E|X|^{1/p}$ or
$E|Y|^{1/q}$ is zero; otherwise, take $a = [E|X|^{1/p}]^{-p}$ and $b = [E|Y|^{1/q}]^{-q}$, which
leads directly to Hölder’s inequality since $p + q = 1$.
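The final step can be spelled out as follows. This sketch takes expectations in the hint's inequality with the positive scalars $a$ and $b$ chosen as in the solution, so that $a^{1/p} = 1/E|X|^{1/p}$ and $b^{1/q} = 1/E|Y|^{1/q}$.

```latex
\begin{align*}
ab\,E|XY| &\le p\,a^{1/p}\,E|X|^{1/p} + q\,b^{1/q}\,E|Y|^{1/q} = p + q = 1, \\
E|XY| &\le \frac{1}{ab} = \left(E|X|^{1/p}\right)^{p}\left(E|Y|^{1/q}\right)^{q}.
\end{align*}
```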