Statistics 2

MTH 202 : Probability and Statistics
Lecture S2 :
11. Estimation of parameters
11.1 : Method of moments
Recall that the k-th moment µk of an RV X is defined by µk = E(X^k).
If X1, X2, . . . , Xn are independent identically distributed (i.i.d.) RVs
from the distribution of X, the k-th sample moment is defined by

    µ̂k := (1/n) Σ_{i=1}^{n} Xi^k

We can view µ̂k as an estimate of µk = E(X^k).
Suppose we wish to estimate θ1, θ2, where

    θ1 = f1(µ1, µ2),    θ2 = f2(µ1, µ2);

then the method of moments estimates are

    θ̂1 = f1(µ̂1, µ̂2),    θ̂2 = f2(µ̂1, µ̂2).
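As a quick illustration, the following is a minimal sketch in Python/NumPy of the sample moments and the resulting plug-in estimates (the function name, the data, and the choice of f1, f2 here are purely illustrative assumptions, not from the notes):

    import numpy as np

    def sample_moment(x, k):
        # k-th sample moment: (1/n) * sum of x_i^k
        return np.mean(np.asarray(x, dtype=float) ** k)

    # hypothetical i.i.d. sample
    rng = np.random.default_rng(0)
    x = rng.exponential(scale=2.0, size=1000)

    m1_hat = sample_moment(x, 1)   # estimates mu_1 = E(X)
    m2_hat = sample_moment(x, 2)   # estimates mu_2 = E(X^2)

    # plug the sample moments into any f1, f2; here, e.g., mean and variance
    theta1_hat = m1_hat                 # f1(mu1, mu2) = mu1
    theta2_hat = m2_hat - m1_hat ** 2   # f2(mu1, mu2) = mu2 - mu1^2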
Example 11.1.1 : (Gamma distribution with parameters α, λ)
We have seen that

    µ1 = α/λ,    µ2 = α(α + 1)/λ²
Also we know that σ² = µ2 − µ1². From these equations the parameters
α, λ are expressed in terms of µ1, µ2 as

    α = µ1²/(µ2 − µ1²),    λ = µ1/(µ2 − µ1²)
Hence the method of moments estimates of the parameters σ², λ, α are

    σ̂² = µ̂2 − µ̂1²,    λ̂ = µ̂1/(µ̂2 − µ̂1²),    α̂ = µ̂1²/(µ̂2 − µ̂1²)
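A brief numerical sketch of these gamma estimates in Python/NumPy (the true parameter values and the simulated sample below are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    alpha_true, lam_true = 3.0, 2.0
    # NumPy's gamma is parametrized by shape and scale = 1/lambda
    x = rng.gamma(shape=alpha_true, scale=1.0 / lam_true, size=5000)

    m1_hat = np.mean(x)          # first sample moment
    m2_hat = np.mean(x ** 2)     # second sample moment
    sigma2_hat = m2_hat - m1_hat ** 2

    lam_hat = m1_hat / sigma2_hat          # lambda-hat = mu1-hat / (mu2-hat - mu1-hat^2)
    alpha_hat = m1_hat ** 2 / sigma2_hat   # alpha-hat  = mu1-hat^2 / (mu2-hat - mu1-hat^2)
    print(alpha_hat, lam_hat)              # should be near 3 and 2 for a large sample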
See pages 261-263, Sec-8.4, Chapter-8 (Rice) for more examples.
11.2 : Method of maximum likelihood (mle)
Suppose X1 , X2 , . . . , Xn have a joint density or frequency function
f (x1 , x2 , . . . , xn | θ), where θ is a parameter. Given observed values
Xi = xi (i = 1, . . . , n), the likelihood of θ as a function of x1 , x2 , . . . , xn
is defined as
lik(θ) = f (x1 , x2 , . . . , xn | θ)
The maximum likelihood estimate (mle) of θ is the value θ0 of θ
that maximizes the likelihood lik(θ). If the Xi are assumed to be i.i.d.,
we have

    lik(θ) = ∏_{i=1}^{n} f(Xi | θ)
It is often convenient to maximize the natural logarithm of the above
product instead (the two maximizations are equivalent), namely the log
likelihood

    l(θ) = Σ_{i=1}^{n} log f(Xi | θ)
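When no closed form is available, l(θ) can be maximized numerically. Below is a minimal sketch in Python using scipy.optimize.minimize_scalar; the exponential model f(x | θ) = θ e^(−θx) and the simulated data are illustrative assumptions, chosen because the mle has the known closed form 1/X̄ to compare against:

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(2)
    theta_true = 1.5
    x = rng.exponential(scale=1 / theta_true, size=2000)  # hypothetical sample

    def neg_log_lik(theta):
        # -l(theta) = -sum_i log f(x_i | theta), with f(x | theta) = theta * exp(-theta * x)
        return -(len(x) * np.log(theta) - theta * np.sum(x))

    res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")
    print(res.x, 1 / np.mean(x))  # numerical mle vs. the closed-form mle 1/x-bar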
Example 11.2.1 : Assume that X1, . . . , Xn are i.i.d. from the normal
distribution N(µ, σ²). Then
    f(x1, x2, . . . , xn | µ, σ) = ∏_{i=1}^{n} (1/(σ√(2π))) exp[ −(1/2)((xi − µ)/σ)² ]
hence
    l(µ, σ) = Σ_{i=1}^{n} log[ (1/(σ√(2π))) exp( −(1/2)((Xi − µ)/σ)² ) ]
            = −n log σ − (n/2) log(2π) − (1/(2σ²)) Σ_{i=1}^{n} (Xi − µ)²
To find the mle, we need to solve
    ∂l/∂µ = (1/σ²) Σ_{i=1}^{n} (Xi − µ) = 0
and
    ∂l/∂σ = −n/σ + (1/σ³) Σ_{i=1}^{n} (Xi − µ)² = 0
Solving these, we get

    µ = (1/n) Σ_{i=1}^{n} Xi,    σ² = (1/n) Σ_{i=1}^{n} (Xi − µ)²
It can be checked from the higher order derivatives that these values
indeed give the maximum. Hence the mles of µ and σ² are given by

    µ̂ = X̄,    σ̂² = (1/n) Σ_{i=1}^{n} (Xi − X̄)²
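A quick numerical check of these formulas in Python/NumPy (the simulated normal sample is an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(loc=5.0, scale=2.0, size=10000)  # hypothetical sample, mu = 5, sigma = 2

    mu_hat = np.mean(x)                       # mle of mu: the sample mean
    sigma2_hat = np.mean((x - mu_hat) ** 2)   # mle of sigma^2: divides by n, not n - 1
    print(mu_hat, np.sqrt(sigma2_hat))        # should be close to 5 and 2

Note that the mle of σ² divides by n, so it differs slightly from the unbiased sample variance that divides by n − 1.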
Also see pages 268-272, Sec-8.5, Chapter-8 (Rice) for more examples.
Example 11.2.2 : (mle of Multinomial cell probabilities)
Recall that the joint PMF of X1, X2, . . . , Xm is given by

    f(x1, x2, . . . , xm | p1, p2, . . . , pm) = n! · (p1^x1 p2^x2 · · · pm^xm) / (x1! x2! · · · xm!)
The log likelihood is given by

    l(p1, . . . , pm) = log n! − Σ_{i=1}^{m} log xi! + Σ_{i=1}^{m} xi log pi
Using the method of Lagrange multipliers we can maximize l(p1, . . . , pm)
(as in page-272, Sec-8.5.1, Chapter-8, Rice) and obtain the estimates

    p̂1 = X1/n,    p̂2 = X2/n,    . . . ,    p̂m = Xm/n

where each Xi, the i-th marginal of the joint PMF given above, is binomial
with parameters (n, pi).
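In code these estimates are simply the observed cell proportions; a minimal Python/NumPy sketch (the counts below are an illustrative assumption):

    import numpy as np

    counts = np.array([18, 45, 27, 10])  # hypothetical cell counts X_1, ..., X_m
    n = counts.sum()
    p_hat = counts / n                   # mle: p_i-hat = X_i / n
    print(p_hat, p_hat.sum())            # proportions summing to 1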
11.3 : Large sample theory for mles
Definition 11.3.1 : Let θ̂n be an estimate of a parameter θ based on
a sample of size n. Then θ̂n is said to be consistent in probability if for
any ε > 0,

    lim_{n→∞} P(|θ̂n − θ| > ε) = 0,

which is written symbolically as θ̂n → θ in probability as n → ∞.
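A simulation sketch of this definition in Python/NumPy (the exponential model, the tolerance ε, and the sample sizes are illustrative assumptions): it estimates P(|θ̂n − θ| > ε) by Monte Carlo for the mle θ̂n = 1/X̄ of an exponential rate, and the estimated probability shrinks as n grows:

    import numpy as np

    rng = np.random.default_rng(4)
    theta, eps, reps = 1.5, 0.1, 2000   # true rate, tolerance, Monte Carlo repetitions

    for n in (10, 100, 1000, 10000):
        samples = rng.exponential(scale=1 / theta, size=(reps, n))
        theta_hat = 1 / samples.mean(axis=1)              # mle of the rate in each repetition
        prob = np.mean(np.abs(theta_hat - theta) > eps)   # estimates P(|theta_hat - theta| > eps)
        print(n, prob)                                    # decreases toward 0 as n grows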
Theorem 11.3.2 : Under appropriate smoothness conditions on f ,
the mle from an i.i.d. sample is consistent.
Proof : See pages 275-276, Sec-8.5.2, Chapter-8, Rice.
TO BE UPDATED
References :
[RS] An Introduction to Probability and Statistics, V.K. Rohatgi and
A.K. Saleh, Second Edition, Wiley Students Edition.
[RI] Mathematical Statistics and Data Analysis, John A. Rice, Cengage
Learning, 2013.