ASYMPTOTIC RISK OF MAXIMUM LIKELIHOOD ESTIMATES

by

V. M. Joshi
University of North Carolina, Chapel Hill
(on leave from Maharashtra Government, Bombay)

Institute of Statistics Mimeo Series No. 471
May 1966

This research was supported by the Mathematics Division of the Air Force Office of Scientific Research, Contract No. AF-AFOSR-760-65.

DEPARTMENT OF STATISTICS
UNIVERSITY OF NORTH CAROLINA
Chapel Hill, N. C.

1. Introduction.

The main result proved in this paper is a hypothesis conjectured by Chernoff (1956), that with increase in sample size the risk, suitably normalized, of a maximum likelihood estimate (m.l.e.) converges to a limit equal to the variance of the asymptotic distribution of the estimate. The hypothesis is shown to hold generally, subject to mild conditions. As the asymptotic variance is a lower bound for the risk function for all estimates, the result establishes a new optimum property of the m.l.e., viz. that it is asymptotically of minimum risk. Chernoff's above-mentioned hypothesis has also been briefly referred to in a recent paper of Yu. V. Linnik and N. M. Mitrofanova (1965). The efficiency of estimates in general is considered in the last section, in connection with superefficient estimates, and a revised definition of asymptotic efficiency is suggested.

2. Main Result.

We first reproduce the relevant formulae from the above-mentioned paper of Chernoff (1956). $X$ is a random variable with a frequency function $f(x, \theta)$ with one unknown parameter $\theta$. We consider a sequence of estimates $T_n$ of $\theta$, where

(1)    $T_n - \theta = O_p(n^{-1/2})$.

A sequence of loss functions

(2)    $L_n(t, \theta)$, with normalizing constants $c_{2n}(\theta) > 0$,

is assumed, together with the corresponding normalized loss functions

(3)    $L_n^*(t, \theta) = L_n(t, \theta) / c_{2n}(\theta)$,

for which we assume

(4)    $L_n^*(t, \theta) = n\,[(t - \theta)^2 + o((t - \theta)^2)]$,

in which the $o(\cdot)$ term is assumed to hold uniformly in $n$ as $t \to \theta$.
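As a concrete instance of the scheme (2)-(4) (our illustration; it does not occur in the paper), ordinary squared error may be kept in mind throughout:

```latex
% Illustrative example (not in the original paper): squared-error loss.
% Take L_n(t,\theta) = (t-\theta)^2 and c_{2n}(\theta) = 1/n.  Then
L_n^*(t,\theta) \;=\; \frac{L_n(t,\theta)}{c_{2n}(\theta)} \;=\; n\,(t-\theta)^2 ,
% which is of the form (4) with the o((t-\theta)^2) term identically zero.
```

Under this choice the normalized risk $E\,L_n^*(T_n,\theta) = n\,E(T_n-\theta)^2$ is just $n$ times the mean squared error.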
Then, $c$ being an arbitrary constant $> 0$,

(5)    $\liminf_{n \to \infty} \dfrac{E\{L_n^*(T_n, \theta)\}}{n\,E(\min[(T_n - \theta)^2,\; c^2/n])} \ge 1$

for each $c$. If $\sqrt{n}\,(T_n - \theta)$ has a limiting distribution with second moment $\sigma^2(\theta)$, then

(6)    $\lim_{c \to \infty} \lim_{n \to \infty} n\,E(\min[(T_n - \theta)^2,\; c^2/n]) = \sigma^2(\theta)$,

so that $\sigma^2(\theta)$ is a lower bound for the risk function

(7)    $R_n(T_n, \theta) = E\{L_n^*(T_n, \theta)\}$.

On the other hand, if

(8)    $P(|T_n - \theta| > c) = o(1/n)$ for each $c$,

it is possible to show that

(9)    $\liminf_{n \to \infty} \dfrac{n\,E((T_n - \theta)^2)}{E\{L_n^*(T_n, \theta)\}} \ge 1$,

so that the normalized risk is sandwiched between the real variance and the asymptotic variance. Chernoff remarks that, in accordance with the axioms of von Neumann and Morgenstern, the loss function requires to be bounded above, and (8) indicates that he assumes an upper bound $C$ for the loss function in (2) and hence an upper bound $nC$ for the normalized loss function in (3), so that

(10)    $L_n^*(t, \theta) \le nC$.

Chernoff's conjecture then is that if $T_n$ is the maximum likelihood estimate, the standard derivations of the asymptotic normal distribution of $T_n$ can, without unreasonable modifications, be used to show that

(11)    $\lim_{n \to \infty} E\{L_n^*(T_n, \theta)\} =$ asymptotic variance of $T_n$.

We now give a proof of this conjecture. We assume that the frequency function $f(x, \theta)$ satisfies the following conditions, sufficient for asymptotic normality, as stated in Cramér's Mathematical Methods of Statistics (1946, pp. 500-503). Writing $f$ for $f(x, \theta)$:

(i) For almost all $x$ (Lebesgue measure), the derivatives $\partial \log f / \partial \theta$, $\partial^2 \log f / \partial \theta^2$ and $\partial^3 \log f / \partial \theta^3$ exist for every $\theta$ belonging to a non-degenerate interval $A$.

(ii) For every $\theta$ in $A$, $|\partial f / \partial \theta| < F_1(x)$, $|\partial^2 f / \partial \theta^2| < F_2(x)$ and $|\partial^3 \log f / \partial \theta^3| < H(x)$, the functions $F_1$ and $F_2$ being (Lebesgue) integrable over $(-\infty, \infty)$, while

$\int_{-\infty}^{\infty} H(x)\, f(x, \theta)\, dx < M$,

where $M$ is independent of $\theta$.

(iii) For every $\theta$ in $A$, the integral $\int_{-\infty}^{\infty} (\partial \log f / \partial \theta)^2\, f\, dx$ is finite and positive.
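For the Normal family $f(x, \theta) = N(\theta, 1)$, which satisfies conditions (i)-(iii) with $I(\theta) = 1$, the m.l.e. is the sample mean and its asymptotic variance is $\sigma^2(\theta) = 1$. The truncated-moment limit (6) can then be checked by Monte Carlo. The sketch below is ours, not the paper's; the sample size, truncation constant and replication count are arbitrary illustrative choices.

```python
import random

# Monte Carlo check of relation (6) for the sample mean of N(theta, 1),
# whose asymptotic variance is sigma^2(theta) = 1.  We estimate
#     n * E( min[ (T_n - theta)^2 , c^2 / n ] )
# for one moderately large n and c; the exact constants are arbitrary.
random.seed(0)
theta, n, c, reps = 0.0, 100, 3.0, 5000
total = 0.0
for _ in range(reps):
    xbar = sum(random.gauss(theta, 1.0) for _ in range(n)) / n  # m.l.e. T_n
    total += min((xbar - theta) ** 2, c * c / n)                # truncated loss
estimate = n * total / reps
print(round(estimate, 2))  # close to sigma^2(theta) = 1
```

Increasing $c$ and $n$ moves the estimate closer to the untruncated value $n\,E(T_n - \theta)^2 = 1$.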
These conditions are sufficient for asymptotic normality of $T_n$. For proving our result, we assume that $f$ satisfies the following further condition.

FURTHER CONDITION: (iv) For every $\theta$ in $A$, the integrals

$\int_{-\infty}^{\infty} \left(\dfrac{\partial^2 \log f}{\partial \theta^2}\right)^2 f\, dx$   and   $\int_{-\infty}^{\infty} (H(x))^2\, f\, dx$

are finite; in other words, the random variables $\partial^2 \log f / \partial \theta^2$ and $H(x)$ have finite variances.

We now use the same notation as in Cramér (1946), except that we denote the parameter by $\theta$ instead of $\alpha$ and the maximum likelihood estimate by $T_n = T_n(x_1, x_2, \ldots, x_n)$ instead of $\alpha^*$. $\theta_0$ denotes the unknown true value of the parameter and is assumed to be an interior point of $A$. We write $f_i$ in place of $f(x_i, \theta)$ and use the subscript $0$ to indicate that $\theta$ is to be put equal to $\theta_0$. Then, putting

(12)    $B_0 = \dfrac{1}{n} \sum_{i=1}^{n} \left(\dfrac{\partial \log f_i}{\partial \theta}\right)_0$,   $B_1 = \dfrac{1}{n} \sum_{i=1}^{n} \left(\dfrac{\partial^2 \log f_i}{\partial \theta^2}\right)_0$,   $B_2 = \dfrac{1}{n} \sum_{i=1}^{n} H(x_i)$,

and writing $k^2 = E\left[\left(\dfrac{\partial \log f}{\partial \theta}\right)_0^2\right]$, which by (iii) is finite and positive, the expansion of the likelihood equation about $\theta_0$ gives

(13)    $\sqrt{n}\,k\,(T_n - \theta_0) = \dfrac{\sqrt{n}\,B_0 / k}{-\,B_1/k^2 \;-\; \dfrac{\alpha}{2k^2}\,(T_n - \theta_0)\,B_2}$,

where

(14)    $|\alpha| < 1$.

The asymptotic normality of $\sqrt{n}\,(T_n - \theta_0)$ follows from (13). In (13) we now put, for brevity,

$u_n = \sqrt{n}\,k\,(T_n - \theta_0)$,   $v_n = -\,\dfrac{B_1}{k^2} - 1 - \dfrac{\alpha}{2k^2}\,(T_n - \theta_0)\,B_2$,

so that the denominator on the r.h.s. of (13) is $1 + v_n$. Hence from (13), and since $u_n^2 v_n^2 \ge 0$,

(15)    $u_n^2 + 2\,u_n^2 v_n \le u_n^2\,(1 + v_n)^2 = \dfrac{n\,B_0^2}{k^2} = \dfrac{1}{n k^2} \left[\sum_{i=1}^{n} \left(\dfrac{\partial \log f_i}{\partial \theta}\right)_0\right]^2$.

Let $F_n = F_n(x_1, x_2, \ldots, x_n)$ denote the d.f. of the $x_i$, $i = 1, 2, \ldots, n$. We now integrate both sides of (15), with respect to the probability measure determined by $F_n$, on the subset $D$ of the sample space $R_n$, given by

(16)    $D = \{x : |T_n - \theta_0| < \delta\}$,

where $x = (x_1, x_2, \ldots, x_n)$ denotes a point in $R_n$ and $\delta$ is some positive constant whose value shall be determined later. We thus have, from (15) and (16),

$\int_D u_n^2\, dF_n + 2 \int_D u_n^2 v_n\, dF_n \;\le\; \dfrac{1}{n k^2} \int_{R_n} \left[\sum_{i=1}^{n} \left(\dfrac{\partial \log f_i}{\partial \theta}\right)_0\right]^2 dF_n$;

below we refer to the two sides of this inequality as the l.h.s. and r.h.s. of (16). Now in the r.h.s. of (16), for each $i$, $i = 1, 2, \ldots, n$,

$E\left(\dfrac{\partial \log f_i}{\partial \theta}\right)_0 = 0$   and   $E\left[\left(\dfrac{\partial \log f_i}{\partial \theta}\right)_0^2\right] = k^2$.
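The moment identities just used (zero-mean score with second moment $k^2$, together with the companion identity $E(\partial^2 \log f / \partial \theta^2)_0 = -k^2$ used in (21) below) can be checked numerically for any concrete family. The sketch below is our illustration, using the exponential density $f(x, \theta) = \theta e^{-\theta x}$; the value of $\theta_0$ and the simulation size are arbitrary choices.

```python
import random

# Numerical check of E(d log f / d theta)_0 = 0 and
# E[(d log f / d theta)_0^2] = k^2 for f(x, theta) = theta * exp(-theta * x).
# Here d log f / d theta = 1/theta - x and k^2 = 1/theta^2; note also that
# d^2 log f / d theta^2 = -1/theta^2 is constant, so its mean is exactly -k^2.
# theta0 and the sample size are arbitrary illustrative choices.
random.seed(2)
theta0 = 2.0
k2 = 1.0 / theta0 ** 2
scores = [1.0 / theta0 - random.expovariate(theta0) for _ in range(200000)]
mean_score = sum(scores) / len(scores)
mean_score_sq = sum(s * s for s in scores) / len(scores)
print(round(mean_score, 2), round(mean_score_sq, 2))  # near 0 and near k2 = 0.25
```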
Hence, since the variables $x_i$, $i = 1, 2, \ldots, n$, are independently distributed,

$E\left\{\left[\sum_{i=1}^{n} \left(\dfrac{\partial \log f_i}{\partial \theta}\right)_0\right]^2\right\} = n k^2$,

and hence

(17)    r.h.s. of (16) $= 1$.

Now by the assumed condition (ii), $E\{H(x)\}$ exists and is finite. Let

(18)    $E\{H(x)\} = \mu(\theta_0)$,   so that $\mu(\theta_0) \le M$.

Then, using (14) and (18), the l.h.s. of (16) may be written

(19)    l.h.s. of (16) $= \int_D u_n^2\, dF_n \;-\; 2 \int_D u_n^2 \left(\dfrac{B_1}{k^2} + 1\right) dF_n \;-\; \int_D u_n^2\, \dfrac{\alpha\,\mu(\theta_0)}{k^2}\,(T_n - \theta_0)\, dF_n \;-\; \int_D u_n^2\, \dfrac{\alpha}{k^2}\,[B_2 - \mu(\theta_0)]\,(T_n - \theta_0)\, dF_n$
        $= g_1 + g_2 + g_3 + g_4$, say,

where $g_i$, $i = 1, 2, 3, 4$, is written for brevity for the $i$-th term in the r.h.s. of (19).

We now derive upper bounds for $|g_2|$, $|g_3|$ and $|g_4|$, by applying the Schwarz inequality and using the fact that for $x \in D$, by (14) and (16), $|u_n| \le \sqrt{n}\,k\,\delta$. We have by the Schwarz inequality

(20)    $g_2^2 \le 4 \int_D u_n^4\, dF_n \int_{R_n} \left(\dfrac{B_1}{k^2} + 1\right)^2 dF_n$.

Now by (12),

(21)    $\dfrac{B_1}{k^2} + 1 = \dfrac{1}{n k^2} \sum_{i=1}^{n} \left[\left(\dfrac{\partial^2 \log f_i}{\partial \theta^2}\right)_0 + k^2\right]$,

where for each $i$, $i = 1, 2, \ldots, n$,

$E\left[\left(\dfrac{\partial^2 \log f_i}{\partial \theta^2}\right)_0 + k^2\right] = 0$,

and by the assumed further condition (iv),

$E\left\{\left[\left(\dfrac{\partial^2 \log f_i}{\partial \theta^2}\right)_0 + k^2\right]^2\right\} = \sigma_1^2(\theta_0)$, say.

Hence, since the $x_i$ are distributed independently,

(22)    $E\left\{\left[\dfrac{1}{n k^2} \sum_{i=1}^{n} \left(\left(\dfrac{\partial^2 \log f_i}{\partial \theta^2}\right)_0 + k^2\right)\right]^2\right\} = \dfrac{\sigma_1^2(\theta_0)}{n k^4}$,

so that

(23)    $\int_{R_n} \left(\dfrac{B_1}{k^2} + 1\right)^2 dF_n = \dfrac{\sigma_1^2(\theta_0)}{n k^4}$.

Again, in the first integral in the r.h.s. of (20), since by (16) and (14) $u_n^2 \le n k^2 \delta^2$ on $D$,

(24)    $\int_D u_n^4\, dF_n \le n k^2 \delta^2 \int_D u_n^2\, dF_n = n k^2 \delta^2\, g_1$.

Combining (23) and (24) with (20), we have

(25)    $g_2^2 \le \dfrac{4\,\delta^2\,\sigma_1^2(\theta_0)}{k^2}\, g_1$,   so that   $|g_2| \le \dfrac{2\,\delta\,\sigma_1(\theta_0)}{k}\, g_1^{1/2}$.

Next, in $g_3$ in (19), $|\alpha| < 1$; by the assumed condition (ii), $\mu(\theta_0) \le M$; and by (16), $|T_n - \theta_0| \le \delta$. Hence

(26)    $|g_3| \le \dfrac{M \delta}{k^2} \int_D u_n^2\, dF_n = \dfrac{M \delta}{k^2}\, g_1$.

Lastly, in $g_4$, since $|\alpha| < 1$ and $|T_n - \theta_0| \le \delta$,

(27)    $|g_4| \le \dfrac{\delta}{k^2} \int_D u_n^2\, |B_2 - \mu(\theta_0)|\, dF_n$;

we again apply the Schwarz inequality, and have

(28)    $g_4^2 \le \dfrac{\delta^2}{k^4} \int_D u_n^4\, dF_n \int_{R_n} [B_2 - \mu(\theta_0)]^2\, dF_n$.
of (28») by (12) 7 I .-I n (29) 2 - IJ.( 90 ) = B I 1 n [H(xi ) - IJ.( '0)] i=l where for each i, i = 1, 2, ••• , n by (14) and by assumed further condition (iv) = CJ~ E[H(Xi ) - 1J.('0)]2 is finite ('0) say. Hence since the xi are distri~uted independently, in (29)" E[B 2 fLee o )]2 cr. = 2 (.) n 0 so that in (28), (30) J [:5 - IJ.(' )]2 il F < 2 o n- ~ ('0) ~-~ n D Now in the first integeal in the r.h.s. of (28), since by (12) and (16) 2 2 2 u < n k 8 n - Now in the r.h.s. of (19), obviously gl + ~+ g3 + g4 ~ gl - 21g2 1 - Ig3 1 - Ig4 ' and hence collecting together the results in (25), (26), and (33) (32), we have from (16), (17) and (19), using (33), ) 1 (34) 8 ) _ g"2[ g (1 _M _ 1 k2 1 28 (1.1(') _~-.......,;o~ + k ,2 fr: 2 (' ) 0 ] < 1 k- and denoting the coefficient of gl in (34) by 2 K, Where as 8 K >0 we write (34) I I I I I I _I I I I I I I I -. I I I. I I I I I I I M8 k g (1 - -;2) - 21( g l' 'lve no" assume that I( i <1 1- is sufficiently small wo that the coefficient of gl in the 1.h.s. of (35) is positive; Jay, 8 ~ (36) k2 2:M Then solving the quadratic in g! Ie I I I I I I -e M8 + 1 +-2 + . 2k I( I( < --------_. 1 _ M8 k 2 so that substituting the value of _~1_ 1 + 8[ { M8 1-- I( from (34), we have 2rr(') 1 0 + k k2 which using (36), can be reduced to the form Where A(e ) is independent of 8 o 9 I Returning to the loss function in (4), we now write it in the form . (38) L*{t, n e0 ) = net-e) 2 [1 + h{t, • )J 0 where by assumption, given any arbitrary DUmber E > 0, we can find a 6, o such that Ih{t,') I ~ (39) € We now take 6 = min[60 ' for all t, such that It 2 ·k 2 M ],so that '0 I oS 60 6 satisfies both (36) and (:39) Now (40) E (L*, " n (Tno ») = JL*{T n n"o ) d Fn R n =JL*{T n n"o )d F n + JL*{T n n"o )d Fn D D R -D n being the subset of R defined by (16) n Now the first integral in the r.h.s. of (40) = In{T -. In{T -. )2 D D d no )2 Fn + no < (l + €) In{T -. 
Hence, noting the definitions of $g_1$ and $u_n$ in (19) and (14),

(41)    $\int_D L_n^*(T_n, \theta_0)\, dF_n \le \dfrac{(1 + \epsilon)\, g_1}{k^2}$.

Then consider the second integral in the r.h.s. of (40). Here we note that, since the distribution of $T_n$ is asymptotically normal, by a well-known property of the Normal frequency function, (8) holds for $T_n$. Hence, for given $\delta$ and $\epsilon$, by making $n$ sufficiently large we can make

(42)    $P[\,|T_n - \theta_0| > \delta\,] = P[R_n - D] \le \dfrac{\epsilon}{n C}$,

where $C$ is the upper bound in (10). Then from (10) and (42),

(43)    $\int_{R_n - D} L_n^*(T_n, \theta_0)\, dF_n \le \epsilon$.

Now, collecting the results in (40), (41) and (43), and using (37), we have

(44)    $E\{L_n^*(T_n, \theta_0)\} \le \dfrac{(1 + \epsilon)\,(1 + \delta\, A(\theta_0))}{k^2} + \epsilon$

for all sufficiently large $n$. Since $\epsilon$ and $\delta$ can be made arbitrarily small, it follows that

(45)    $\limsup_{n \to \infty} E\{L_n^*(T_n, \theta_0)\} \le \dfrac{1}{k^2}$.

But $1/k^2$ is the asymptotic variance of $T_n$, and hence from (5) and (6),

(46)    $\liminf_{n \to \infty} E\{L_n^*(T_n, \theta_0)\} \ge \dfrac{1}{k^2}$.

(45) and (46) together imply

(47)    $\lim_{n \to \infty} E\{L_n^*(T_n, \theta_0)\} = \dfrac{1}{k^2}$,

which was to be proved.

3. Asymptotic Efficiency.

Here we incidentally point out another optimum property of an m.l.e., which is implied by a relation given in Chernoff (1956), but which has a special significance in relation to superefficient estimates, which is stressed here. According to Fisher's original concept of asymptotic efficiency (A.E. for short), the A.E. of any estimate $T_n$ is given by

(48)    A.E. $= \dfrac{1}{\lim_{n \to \infty} n\, I(\theta)\, E[T_n - \theta]^2}$,

where $I(\theta)$ is Fisher's measure of information,

(49)    $I(\theta) = E\left[\left(\dfrac{\partial \log f}{\partial \theta}\right)^2\right]$.

It was assumed that this A.E. has an upper bound of 1. This assumption has, however, turned out to be false, and in fact the A.E. in (48) has no upper bound, so that given any estimate we can construct another uniformly superior to it, and there exists no most efficient estimate. The Fisherian concept of A.E. is thus void. A superefficient estimate is one whose A.E.
is not less than 1 for any $\theta$ and exceeds 1 for at least one $\theta$. An example of such an estimate was first presented by J. L. Hodges, Jr., in 1951. The example relates to a Normal population $N(\theta, 1)$, and is

(50)    $T_n = a\,\bar{x}_n$ if $|\bar{x}_n| < n^{-1/4}$;   $T_n = \bar{x}_n$ if $|\bar{x}_n| \ge n^{-1/4}$,

where $a$ is some constant such that $0 \le |a| < 1$, and $\bar{x}_n = \dfrac{1}{n} \sum_{i=1}^{n} x_i$ is the sample mean.

Further examples of superefficient estimates were later given by Le Cam (1953), who constructed an example where the set of superefficiency, i.e. the set of values of $\theta$ for which the A.E. exceeds 1, is non-denumerable and everywhere dense. It was also proved by him that the set of superefficiency must necessarily be of Lebesgue measure zero.

It should be noted here that, though the A.E. of the superefficient estimate $T_n$ in (50) is for all $\theta$ not less than that of the m.l.e. $\bar{x}_n$, $T_n$ is not 'superior' to $\bar{x}_n$, even if we take the M.S.E. (mean squared error) as the sole criterion. While $T_n$ has a lower M.S.E. at $\theta = 0$, for every $n$, however large, there exist values of $\theta$ (in fact an interval of values) at which $\bar{x}_n$ has a lower M.S.E. The observation sometimes made, that according to a 'behaviourist viewpoint' the superefficient estimate $T_n$ has to be preferred to $\bar{x}_n$, is thus not correct. As pointed out by Le Cam (1953), the same thing occurs in the case of all other superefficient estimates.

The author considers that this reveals a defect in the definition (48), as the possession of a higher A.E. does not imply the possession of a higher degree of some desirable property. The defect in the definition (48) arises from the fact that it takes into consideration the performance of the estimate $T_n$, for each value of $n$, at some single point $\theta = \theta_0$. The true value of $\theta$ is, however, not known, though as $n$ increases we can locate $\theta$ more and more closely with a given degree of confidence. This suggests that a revised definition of A.E. is needed.
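The M.S.E. comparison described above can be checked by a small Monte Carlo experiment. The sketch below is ours, not the paper's; it uses (50) with $a = 0$, and the sample size, the point $\theta = 0.3$ and the replication count are arbitrary illustrative choices.

```python
import random

# Monte Carlo comparison of the Hodges estimate (50) (taking a = 0) with the
# sample mean, for samples from N(theta, 1).  mse() returns n * (Monte Carlo
# M.S.E.) for both estimates at a given theta; all constants are illustrative.
random.seed(1)

def mse(theta, n=100, reps=4000):
    a, cutoff = 0.0, n ** (-0.25)
    se_hodges = se_mean = 0.0
    for _ in range(reps):
        xbar = sum(random.gauss(theta, 1.0) for _ in range(n)) / n
        t = a * xbar if abs(xbar) < cutoff else xbar  # the estimate (50)
        se_hodges += (t - theta) ** 2
        se_mean += (xbar - theta) ** 2
    return n * se_hodges / reps, n * se_mean / reps

h0, m0 = mse(0.0)  # at theta = 0 the Hodges estimate has the smaller M.S.E.
h1, m1 = mse(0.3)  # near theta = n^(-1/4) the sample mean is better
```

With these constants the run reproduces the qualitative picture in the text: $h_0 < m_0$ at $\theta = 0$, while $h_1 > m_1$ at $\theta = 0.3$, so that neither estimate dominates the other.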
The revised definition should take into account the performance of $T_n$, not at a single point $\theta_0$, but in some neighbourhood of it. With these considerations, a modified definition of A.E. is proposed below.

Let $\{c_n\}$ be a sequence of positive numbers such that

(51)    $c_n \to 0$,   $\sqrt{n}\, c_n \to \infty$.

Then the suggested revised definition is

(52)    A.E. $= \liminf_{n \to \infty}\; \dfrac{1}{\sup_{\theta_0 - c_n < \theta < \theta_0 + c_n} n\, I(\theta)\, E_{\theta}(T_n - \theta)^2}$.

The following result is proved in Chernoff (1956, Theorem 1):

(53)    $\lim_{k \to \infty}\; \liminf_{n \to \infty}\; \sup_{\theta_0 - k/\sqrt{n} < \theta < \theta_0 + k/\sqrt{n}} n\, I(\theta)\, E_{\theta}\left(\min\left[(T_n - \theta)^2,\; \dfrac{k^2}{n}\right]\right) \ge 1$.

From the proof of the theorem, it is seen that the following conditions are assumed: (a) that the frequency function $f(x, \theta)$ and the estimate $T_n$ satisfy the 'regularity' conditions sufficient for the Cramér-Rao inequality, such as those given by Wolfowitz; (b) that $I(\theta)$ is bounded away from zero in some neighbourhood of $\theta = \theta_0$. For (52) we now replace condition (b) by the condition (c): $I(\theta)$ is bounded away from zero in some neighbourhood of every point $\theta_0$ in the range of values assumed by $\theta$. It is then easily seen to be a consequence of (53) that the A.E. defined by (52) has an upper bound of 1. It is further seen that this upper bound is attained for an m.l.e., while for the superefficient estimate $T_n$ of (50) the A.E. by the revised definition is nil.

Acknowledgement

The relation given in (53) was pointed out to me by a referee of the Annals in connection with another paper. I have also to thank Mrs. Kay Herring for doing the typing work neatly and speedily.

References

(1) Chernoff, H. (1956). Large sample theory: parametric case. Ann. Math. Statist., Vol. 27, pp. 1-23.

(2) Cramér, Harald (1946). Mathematical Methods of Statistics. Princeton University Press, pp. 500-503.

(3) Le Cam, L. (1953). On some asymptotic properties of maximum likelihood estimates and related Bayes estimates. University of California Publications in Statistics, Vol. 1, pp. 277-330.

(4) von Neumann, J. and Morgenstern, O. (1947). Theory of Games and Economic Behavior.
2nd ed., Princeton University Press, Princeton, N. J., 1947, p. 641.

(5) Linnik, Yu. V. and Mitrofanova, N. M. (1965). Some asymptotic expansions for the distribution of the maximum likelihood estimate. Sankhyā, Series A, Vol. 27, Part 1, p. 73.

(6) Wolfowitz, J. (1947). The efficiency of sequential estimates and Wald's equation for sequential processes. Ann. Math. Statist., Vol. 18, pp. 215-230.