471 Part B

ASYMPTOTIC RISK OF MAXIMUM
LIKELIHOOD ESTIMATES
by
V. M. Joshi
University of North Carolina, Chapel Hill
(on leave from Maharashtra Government,
Bombay)
Institute of Statistics Mimeo Series No. 471
May 1966
This research was supported by the Mathematics Division
of the Air Force Office of Scientific Research Contract
No. AF-AFOSR-760-65.
DEPARTMENT OF STATISTICS
UNIVERSITY OF NORTH CAROLINA
Chapel Hill, N. C.
1. Introduction.

The main result proved in this paper is a hypothesis conjectured by Chernoff (1956): that with increase in sample size, the risk, suitably normalized, of a maximum likelihood estimate (m.l.e.) converges to a limit equal to the variance of the asymptotic distribution of the estimate. The hypothesis is shown to hold generally, subject to mild conditions. As the asymptotic variance is a lower bound for the risk function for all estimates, the result establishes a new optimum property of the m.l.e., viz. that it is asymptotically of minimum risk.

Chernoff's above-mentioned hypothesis has also been briefly referred to in a recent paper of Yu. V. Linnik and N. M. Mitrofanova (1965).

The efficiency of estimates in general is considered in the last section in connection with superefficient estimates, and a revised definition of asymptotic efficiency is suggested.
2. Main Result.

We first reproduce the relevant formulae from the above-mentioned paper of Chernoff (1956). X is a random variable with a frequency function f(x,θ) with one unknown parameter θ. We consider a sequence of estimates T_n of θ, where

(1)    T_n − θ = O_p(1/√n).
A sequence of loss functions is assumed, given by

(2)    L_n(t,θ) = c_{0n}(θ) + c_{2n}(θ)(t − θ)² + o((t − θ)²)    as t → θ,

where c_{2n}(θ) > 0, together with the corresponding normalized loss functions

(3)    L*_n(t,θ) = n [L_n(t,θ) − c_{0n}(θ)] / c_{2n}(θ),

so that we may write

(4)    L*_n(t,θ) = n [(t − θ)² + ε(t,θ)(t − θ)²],

in which ε(t,θ) → 0 as t → θ is assumed to hold uniformly in n.

Then, c being an arbitrary constant > 0,
(5)    lim inf_{n→∞}  E L*_n(T_n,θ) / { n E( min[(T_n − θ)², c²/n] ) }  ≥ 1.

If √n (T_n − θ) has a limiting distribution with second moment σ²(θ), then

(6)    lim_{c→∞} lim_{n→∞}  n E( min[(T_n − θ)², c²/n] ) = σ²(θ),

so that σ²(θ) is a lower bound for the risk function

(7)    R_n(T_n,θ) = E{L*_n(T_n,θ)}.
On the other hand if

(8)    P{|T_n − θ| > c} = o(1/n)

for each c, it is possible to show that

(9)    lim inf_{n→∞}  n E{(T_n − θ)²} / E{L*_n(T_n,θ)}  ≥ 1,

so that the normalized risk is sandwiched between the real variance and the asymptotic variance.

Chernoff remarks that in accordance with the axioms of von Neumann and Morgenstern the loss function requires to be bounded above, and (8) indicates that he assumes an upper bound C for the loss function in (2), and hence an upper bound nC for the normalized loss function in (3), so that

(10)    L*_n(t,θ) < nC.
Chernoff's conjecture then is that if T_n is the maximum likelihood estimate, the standard derivations of the asymptotic normal distribution of T_n can, without unreasonable modifications, be used to show that

(11)    lim_{n→∞} E{L*_n(T_n,θ)} = asymptotic variance of T_n.
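Before turning to the proof, the conjectured limit (11) can be illustrated numerically (this sketch is an editorial addition, not part of the paper; the truncation constant C, the sample sizes and the replication counts are arbitrary choices). For the N(θ,1) density the m.l.e. is the sample mean, I(θ) = 1, and the normalized risk n E min[(T_n − θ)², C] should sit at the asymptotic variance 1:

```python
import random
import statistics

def normalized_risk(n, theta=0.0, C=10.0, reps=40000, seed=0):
    """Monte Carlo estimate of the normalized risk of the m.l.e. of the
    N(theta, 1) mean.  The m.l.e. T_n is the sample mean, distributed
    exactly N(theta, 1/n), so it is drawn directly; the bounded loss
    n * min[(T_n - theta)^2, C] mirrors the truncation in (10)."""
    rng = random.Random(seed)
    sd = n ** -0.5
    losses = [n * min((rng.gauss(theta, sd) - theta) ** 2, C)
              for _ in range(reps)]
    return statistics.fmean(losses)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, normalized_risk(n))
```

For every n the printed value stays close to 1, the asymptotic variance for this family, as (11) predicts.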
We now give a proof of this conjecture. We assume that the frequency function f(x,θ) satisfies the following conditions, sufficient for asymptotic normality, as stated in Cramér's Mathematical Methods of Statistics (1946, p. 500-503). Writing f for f(x,θ):

(i) For almost all x (Lebesgue measure) the derivatives ∂ log f/∂θ, ∂² log f/∂θ² and ∂³ log f/∂θ³ exist for every θ belonging to a non-degenerate interval A;

(ii) for every θ in A, we have

|∂f/∂θ| < F₁(x),    |∂²f/∂θ²| < F₂(x)    and    |∂³ log f/∂θ³| < H(x),

the functions F₁ and F₂ being (Lebesgue) integrable over (−∞, ∞), while

∫_{−∞}^{∞} H(x) f(x,θ) dx < M,

where M is independent of θ;

(iii) for every θ in A, the integral ∫_{−∞}^{∞} (∂ log f/∂θ)² f dx is finite and positive.
These conditions are sufficient for asymptotic normality of T_n. For proving our result, we assume that f satisfies the following further condition:

FURTHER CONDITION (iv): for every θ in A, the integrals

∫_{−∞}^{∞} (∂² log f/∂θ²)² f dx    and    ∫_{−∞}^{∞} (H(x))² f dx

are finite; in other words, the random variables ∂² log f/∂θ² and H(x) have finite variances.
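As a concrete illustration (an editorial addition, not part of the original paper), conditions (i)-(iv) are readily verified for the unit-variance normal density:

```latex
% N(\theta,1):  f(x,\theta) = (2\pi)^{-1/2} e^{-(x-\theta)^2/2}
\frac{\partial \log f}{\partial \theta} = x - \theta, \qquad
\frac{\partial^2 \log f}{\partial \theta^2} = -1, \qquad
\frac{\partial^3 \log f}{\partial \theta^3} = 0,
% so in (ii) one may take H(x) \equiv 1 and M = 2, with integrable
% dominating functions F_1, F_2 available on any bounded interval A;
k^2 = E\Bigl(\frac{\partial \log f}{\partial \theta}\Bigr)^2
    = E (x - \theta)^2 = 1 > 0,
% so (iii) holds, and (iv) is trivial because
% \partial^2 \log f / \partial\theta^2 and H(x) are constants.
```

Here k² coincides with Fisher's information, so the limit 1/k² obtained below equals 1 for this family.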
We now use the same notation as in Cramér (1946), except that we denote the parameter by θ instead of α, and the maximum likelihood estimate by T_n = T_n(x₁, x₂, …, x_n) instead of α*. θ₀ denotes the unknown true value of the parameter θ, and θ₀ is assumed to be an interior point of A.
We write f_i in place of f(x_i,θ), and use the subscript 0 to indicate that θ is to be put equal to θ₀. Then, putting

(12)    B₀ = (1/n) Σ_{i=1}^n (∂ log f_i/∂θ)₀,    B₁ = (1/n) Σ_{i=1}^n (∂² log f_i/∂θ²)₀,    B₂ = (1/n) Σ_{i=1}^n H(x_i),

and

E(∂² log f/∂θ²)₀ = − E[(∂ log f/∂θ)₀]² = − k²,
it is shown, as in Cramér (1946), that

(13)    k√n (T_n − θ₀) = { (1/(k√n)) Σ_{i=1}^n (∂ log f_i/∂θ)₀ } / { −B₁/k² − (α/2) B₂ (T_n − θ₀)/k² },

where |α| < 1. The asymptotic normality of √n (T_n − θ₀) follows from (13).
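The representation (13) and the resulting asymptotic normality can be checked by simulation (an illustrative editorial addition, not in the original; the exponential family, θ₀ = 2 and the settings below are arbitrary choices). For f(x,θ) = θ e^{−θx} the m.l.e. is T_n = 1/x̄_n and k² = 1/θ₀², so k√n (T_n − θ₀) should be approximately N(0,1):

```python
import random
import statistics

def standardized_mle_sample(n, theta0=2.0, reps=5000, seed=3):
    """Draw values of k*sqrt(n)*(T_n - theta0) for the exponential
    density f(x,theta) = theta*exp(-theta*x), for which the m.l.e. is
    T_n = 1/xbar and k^2 = I(theta0) = 1/theta0^2.  The draws should be
    approximately N(0,1) for large n."""
    rng = random.Random(seed)
    k = 1.0 / theta0
    out = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.expovariate(theta0) for _ in range(n))
        out.append(k * n ** 0.5 * (1.0 / xbar - theta0))
    return out

if __name__ == "__main__":
    z = standardized_mle_sample(200)
    print("mean ~", statistics.fmean(z), " variance ~", statistics.pvariance(z))
```

The printed mean and variance come out close to 0 and 1, in agreement with the limiting distribution implied by (13).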
In (13) we now put, for brevity,

(14)    u_n = k√n (T_n − θ₀)    and    v_n = − B₁/k² − 1 − (α/2) B₂ (T_n − θ₀)/k²,

so that the denominator in the r.h.s. of (13) is 1 + v_n. Hence from (13) and (14),

(15)    u_n² + 2u_n² v_n ≤ u_n² (1 + v_n)² = (1/(k²n)) [ Σ_{i=1}^n (∂ log f_i/∂θ)₀ ]².

Let F_n = F_n(x₁, x₂, …, x_n) denote the d.f. of the x_i, i = 1, 2, …, n. We now integrate both sides of (15), with respect to the probability measure
determined by F_n, on the subset D of the sample space R_n given by

(16)    D = {x : |T_n − θ₀| ≤ δ},

where x = (x₁, x₂, …, x_n) denotes a point in R_n and δ is some positive constant, whose value shall be determined later.
We thus have from (15),

(17)    ∫_D u_n² dF_n + 2 ∫_D u_n² v_n dF_n ≤ (1/(k²n)) ∫_D [ Σ_{i=1}^n (∂ log f_i/∂θ)₀ ]² dF_n.

Now E[(∂ log f/∂θ)₀]² = k². Hence, since the variables x_i, i = 1, 2, …, n, are independently distributed,

E[ Σ_{i=1}^n (∂ log f_i/∂θ)₀ ]² = nk²,

and hence

r.h.s. of (17) ≤ 1.
Now by the assumed condition (ii), E{H(x)} exists and is finite. Let

(18)    E{H(x)}₀ = μ(θ₀).

Then, using (14) and (18),

l.h.s. of (17) = ∫_D u_n² dF_n − 2 ∫_D u_n² (B₁/k² + 1) dF_n − ∫_D u_n² α μ(θ₀)(T_n − θ₀)/k² dF_n − ∫_D u_n² α [B₂ − μ(θ₀)](T_n − θ₀)/k² dF_n

(19)                 = g₁ + 2g₂ + g₃ + g₄, say,

where g_i, i = 1, 2, …, 4, is written for brevity for the ith term in the r.h.s. of (19).
We now derive upper bounds for |g₂|, |g₃| and |g₄|, by applying Schwarz's inequality and using the fact that for x in D, by (14) and (16),

|u_n| ≤ k√n δ.

We have by Schwarz's inequality

(20)    g₂² ≤ ∫_D u_n⁴ dF_n · ∫_D (B₁/k² + 1)² dF_n.
Now in (20), using (12),

(21)    B₁/k² + 1 = (1/(nk²)) Σ_{i=1}^n [ (∂² log f_i/∂θ²)₀ + k² ],

where by (12), for each i, i = 1, 2, …, n,

E[ (∂² log f_i/∂θ²)₀ + k² ] = 0,

and by the assumed further condition (iv),

E[ (∂² log f_i/∂θ²)₀ + k² ]² is finite, = σ₁²(θ₀) say.

Hence, since the x_i are distributed independently,

(22)    E(B₁/k² + 1)² = σ₁²(θ₀)/(nk⁴),

so that in (20), by (21) and (22),

(23)    ∫_D (B₁/k² + 1)² dF_n ≤ σ₁²(θ₀)/(nk⁴).
Again, in the first integral in the r.h.s. of (20), since by (16) and (14), u_n² ≤ nk²δ² for x in D,

(24)    ∫_D u_n⁴ dF_n ≤ nk²δ² ∫_D u_n² dF_n = nk²δ² g₁.

Combining (23) and (24) with (20), we have

g₂² ≤ δ² σ₁²(θ₀) g₁ / k²,

so that

(25)    |g₂| ≤ δ σ₁(θ₀) g₁^{1/2} / k.
Next, in g₃ in (19), |α| < 1; by assumed condition (ii), |μ(θ₀)| ≤ M; and by (16), |T_n − θ₀| ≤ δ on D; hence

(26)    |g₃| ≤ (Mδ/k²) ∫_D u_n² dF_n = Mδ g₁ / k².
Lastly, in g₄, since |α| < 1 and |T_n − θ₀| ≤ δ on D,

(27)    |g₄| ≤ (δ/k²) ∫_D u_n² |B₂ − μ(θ₀)| dF_n.

We again apply Schwarz's inequality, and have

(28)    g₄² ≤ (δ²/k⁴) ∫_D u_n⁴ dF_n · ∫_D [B₂ − μ(θ₀)]² dF_n.
In the second integral in the r.h.s. of (28), by (12),

(29)    B₂ − μ(θ₀) = (1/n) Σ_{i=1}^n [H(x_i) − μ(θ₀)],

where for each i, i = 1, 2, …, n, by (18),

E[H(x_i) − μ(θ₀)] = 0,

and by assumed further condition (iv), E[H(x_i) − μ(θ₀)]² is finite, = σ₂²(θ₀) say. Hence, since the x_i are distributed independently, in (29),

E[B₂ − μ(θ₀)]² = σ₂²(θ₀)/n,

so that in (28),

(30)    ∫_D [B₂ − μ(θ₀)]² dF_n ≤ σ₂²(θ₀)/n.

Now in the first integral in the r.h.s. of (28), since by (14) and (16), u_n² ≤ nk²δ² on D,

(31)    ∫_D u_n⁴ dF_n ≤ nk²δ² g₁.

Hence, combining (30) and (31) with (28),

g₄² ≤ δ⁴ σ₂²(θ₀) g₁ / k²,

so that

(32)    |g₄| ≤ δ² σ₂(θ₀) g₁^{1/2} / k.
Now in the r.h.s. of (19), obviously

(33)    g₁ + 2g₂ + g₃ + g₄ ≥ g₁ − 2|g₂| − |g₃| − |g₄|,

and hence, collecting together the results in (25), (26) and (32), we have from (17) and (19), using (33),

(34)    g₁ (1 − Mδ/k²) − 2 g₁^{1/2} [ δ σ₁(θ₀)/k + δ² σ₂(θ₀)/(2k) ] ≤ 1,

and, denoting the coefficient of g₁^{1/2} in (34) by 2K, where K > 0, we write (34) as

(35)    g₁ (1 − Mδ/k²) − 2K g₁^{1/2} ≤ 1.
We now assume that δ is sufficiently small that the coefficient of g₁ in the l.h.s. of (35) is positive; say

(36)    δ ≤ k²/(2M).

Then, solving the quadratic in g₁^{1/2},

g₁ (1 − Mδ/k²) − 2K g₁^{1/2} = 1,

we have from (35),

g₁^{1/2} ≤ { K + [ K² + 1 − Mδ/k² ]^{1/2} } / (1 − Mδ/k²),

so that, substituting the value of K from (34) and using (36), the bound can be reduced to the form

(37)    g₁ ≤ 1 + δ A(θ₀),

where A(θ₀) is independent of δ.
Returning to the loss function in (4), we now write it in the form

(38)    L*_n(t,θ₀) = n (t − θ₀)² [1 + h(t,θ₀)],

where by assumption, given any arbitrary number ε > 0, we can find a δ₀ such that

(39)    |h(t,θ₀)| < ε    for all t such that |t − θ₀| < δ₀.

We now take

δ = min[ δ₀, k²/(2M) ],

so that δ satisfies both (36) and (39).
Now

(40)    E{L*_n(T_n,θ₀)} = ∫_{R_n} L*_n(T_n,θ₀) dF_n = ∫_D L*_n(T_n,θ₀) dF_n + ∫_{R_n−D} L*_n(T_n,θ₀) dF_n,

D being the subset of R_n defined by (16). Now the first integral in the r.h.s. of (40)

= ∫_D n(T_n − θ₀)² dF_n + ∫_D n(T_n − θ₀)² h(T_n,θ₀) dF_n,

which by (39) is

≤ (1 + ε) ∫_D n(T_n − θ₀)² dF_n.

Hence, noting the definitions of g₁ and u_n in (19) and (14),

(41)    ∫_D L*_n(T_n,θ₀) dF_n ≤ (1 + ε) g₁ / k².
Then consider the second integral in the r.h.s. of (40). Here we note that, since the distribution of T_n is asymptotically normal, by a well known property of the Normal frequency function, (8) holds for T_n. Hence, for given δ and ε, by making n sufficiently large, we can make

(42)    P[ |T_n − θ₀| > δ ] = P[R_n − D] ≤ ε/(nC),
where C is the upper bound in (10). Then from (10) and (42),

(43)    ∫_{R_n−D} L*_n(T_n,θ₀) dF_n ≤ ε.
Now, collecting the results in (40), (41) and (43), and using (37), we have

(44)    E{L*_n(T_n,θ₀)} ≤ (1 + A(θ₀)δ)(1 + ε)/k² + ε

for all sufficiently large n. Since ε and δ can be made arbitrarily small, it follows that

(45)    lim sup_{n→∞} E{L*_n(T_n,θ₀)} ≤ 1/k².

But 1/k² is the asymptotic variance of T_n, and hence from (5) and (6),

(46)    lim inf_{n→∞} E{L*_n(T_n,θ₀)} ≥ 1/k².

(45) and (46) together imply

(47)    lim_{n→∞} E{L*_n(T_n,θ₀)} = 1/k²,

which was to be proved.
3. Asymptotic Efficiency.

Here we incidentally point out another optimum property of a m.l.e., which is implied by a relation given in Chernoff (1956), but which has a special significance in relation to superefficient estimates, which is stressed here.

According to Fisher's original concept of asymptotic efficiency (A.E. for short), the A.E. of any estimate T_n is given by

(48)    A.E. = 1 / lim_{n→∞} { n I(θ) E[T_n − θ]² },

where I(θ) is Fisher's measure of information, evaluated at the true value of θ. It was assumed that this A.E. has an upper bound of 1. This assumption has however turned out to be false; in fact the A.E. in (48) has no upper bound, so that given any estimate we can construct another uniformly superior to it, and there exists no most efficient estimate. The Fisherian concept of A.E. is thus void.
A superefficient estimate is one whose A.E. is not less than 1 for any θ and exceeds 1 for at least one θ. An example of such an estimate was first presented by J. L. Hodges, Jr. in 1951. The example relates to a Normal population N(θ,1), and is

(49)    T_n = α x̄_n    if |x̄_n| < n^{−1/4},

(50)    T_n = x̄_n    if |x̄_n| ≥ n^{−1/4},

where α is some constant such that 0 ≤ |α| < 1, and x̄_n is the sample mean,

x̄_n = (1/n) Σ_{i=1}^n x_i.
Further examples of superefficient estimates were later given by Le Cam (1953), who constructed an example where the set of superefficiency, i.e. the set of values of θ for which the A.E. exceeds 1, is non-denumerable and everywhere dense. It was also proved by him that the set of superefficiency must necessarily be of Lebesgue measure zero.
It should be noted here that, though the A.E. in (48) of the superefficient estimate T_n is for all θ not lower than that of the m.l.e. x̄_n, T_n is not 'superior' to x̄_n even if we take the M.S.E. (mean squared error) as the sole criterion. While T_n has a lower M.S.E. at θ = 0, for every n, however large, there exist values of θ (in fact an interval of values) for which x̄_n has a lower M.S.E. The observation made sometimes, that according to a 'behaviourist viewpoint' the superefficient estimate T_n has to be preferred to x̄_n, is thus not correct. As pointed out by Le Cam (1953), the same thing occurs in the case of all other superefficient estimates.
The author considers that this reveals a defect in the definition (48), as the possession of a higher A.E. does not imply the possession of a higher degree of some desirable property. The defect in the definition (48) arises from the fact that it takes into consideration the performance of the estimate T_n, for each value of n, at some single point θ₀. The true value of θ is however not known, though as n increases we can locate θ more and more closely with a given degree of confidence. This suggests that a revised definition of A.E. should take into account the performance of T_n not at a single point θ₀ but in some neighbourhood of it. With these considerations a modified definition of A.E. is proposed below.
Let c_n be a sequence of positive numbers such that

(51)    c_n → 0    and    √n c_n → ∞    as n → ∞.

Then the suggested revised definition is

(52)    A.E. = lim inf_{n→∞}  1 / { sup_{θ₀−c_n < θ < θ₀+c_n}  n I(θ) E_θ(T_n − θ)² }.
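The effect of the revised definition on the Hodges estimate can be seen numerically (an illustrative sketch added by the present edit, not in the original; the choices c_n = n^{−1/3}, α = 1/2 and the grid approximation of the supremum are arbitrary). Since I(θ) = 1 for the N(θ,1) family, the quantity inside (52) is identically 1 for the sample mean, whereas for the Hodges estimate the supremum over (θ₀ − c_n, θ₀ + c_n) at θ₀ = 0 grows without bound, driving the revised A.E. to zero:

```python
import random

def sup_normalized_mse(n, alpha=0.5, reps=20000, seed=2, grid=11):
    """Approximate, on a grid, the sup over theta in (-c_n, c_n) with
    c_n = n**(-1/3) of n * I(theta) * E_theta (T_n - theta)^2 for the
    Hodges estimate on N(theta, 1) data (here I(theta) = 1)."""
    rng = random.Random(seed)
    c_n = n ** (-1.0 / 3.0)
    cut = n ** -0.25
    worst = 0.0
    for j in range(grid):
        theta = -c_n + 2.0 * c_n * j / (grid - 1)
        total = 0.0
        for _ in range(reps):
            xbar = rng.gauss(theta, n ** -0.5)
            t = alpha * xbar if abs(xbar) < cut else xbar
            total += (t - theta) ** 2
        worst = max(worst, n * total / reps)
    return worst

if __name__ == "__main__":
    for n in (100, 10000, 1000000):
        print(n, sup_normalized_mse(n))
```

The printed supremum increases with n (roughly like a power of n at the edge of the shrinking neighbourhood), so the reciprocal in (52) tends to 0.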
The following result is proved in Chernoff (1956, Theorem 1):

(53)    lim_{k→∞} lim inf_{n→∞}  sup_{−k/√n < θ < k/√n}  { n I(θ) E_θ( min[(T_n − θ)², k²/n] ) }  ≥ 1.
From the proof of the theorem, it is seen that the following conditions are assumed:

(a) the frequency function f(x,θ) and the estimate T_n satisfy the 'regularity' conditions sufficient for the Cramér-Rao inequality, such as those given by Wolfowitz (1947);

(b) I(θ) is bounded away from zero in some neighbourhood of θ = 0.

For (52) we now replace condition (b) by the condition

(c) I(θ) is bounded away from zero in some neighbourhood of every point θ₀ in the range of values assumed by θ.

It is then easily seen to be a consequence of (53) that the A.E. defined by (52) has an upper bound of 1. It is further seen that this upper bound is attained for a m.l.e., while for the superefficient estimate T_n the A.E. by the revised definition is nil.
Acknowledgement

The relation given in (53) was pointed out to me by a referee of the Annals in connection with another paper. I have also to thank Mrs. Kay Herring for doing the typing work neatly and speedily.
References

(1) Chernoff, H. (1956). Large sample theory: parametric case. Ann. Math. Statist., Vol. 27, p. 1-23.

(2) Cramér, Harald (1946). Mathematical Methods of Statistics. Princeton University Press, p. 500-503.

(3) Le Cam, L. (1953). On some asymptotic properties of maximum likelihood estimates and related Bayes estimates. University of California Publications in Statistics, Vol. 1, p. 277-330.

(4) von Neumann, J. and Morgenstern, O. (1947). Theory of Games and Economic Behavior, 2d ed. Princeton University Press, Princeton, N. J., p. 641.

(5) Linnik, Yu. V. and Mitrofanova, N. M. (1965). Some asymptotic expansions for the distribution of the maximum likelihood estimate. Sankhyā, Series A, Vol. 27, Part 1, p. 73.

(6) Wolfowitz, J. (1947). The efficiency of sequential estimates and Wald's equation for sequential processes. Ann. Math. Statist., Vol. 18, p. 215-230.