MAXIMUM LIKELIHOOD ESTIMATION OF A UNIMODAL DENSITY FUNCTION

by

EDWARD J. WEGMAN
Department of Statistics
University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series No. 608
January 1969

This research arises in part from the author's doctoral dissertation submitted to the Department of Statistics, The University of Iowa. Supported in part by PHS Research Grant No. GM-14448-01 from the Division of General Medical Sciences and in part by N.S.F. Development Grant GU-2059.

Maximum Likelihood Estimation of a Unimodal Density Function

Edward J. Wegman

1. Introduction. Robertson [4] has described a maximum likelihood estimate of a unimodal density when the mode is known. This estimate is represented as a conditional expectation given a σ-lattice. Discussions of such conditional expectations are given by Brunk [1] and [2].

A σ-lattice L of subsets of Ω is a collection closed under countable unions and countable intersections and containing both ∅ and Ω. Let (Ω, A, μ) be a measure space; here Ω is the real line, A is the collection of Borel sets, and μ is Lebesgue measure. A function f is measurable with respect to a σ-lattice L if the set [f > a] is in L for each real a. Let L₂ be the set of square-integrable functions and let L₂(L) be those members of L₂ which are measurable with respect to L. Brunk [1] shows the following definition of conditional expectation with respect to L is suitable.

Definition. If f ∈ L₂, then g ∈ L₂(L) is equal to E(f|L), the conditional expectation of f given L, if and only if

(1.1)  ∫ f·θ(g) dλ = ∫ g·θ(g) dλ for every real-valued function θ such that θ(g) ∈ L₂ and θ(0) = 0, and

(1.2)  ∫_A (f − g) dλ ≤ 0 for each A ∈ L with 0 < λ(A) < ∞.

The collection of intervals about a fixed point m is a σ-lattice which we denote L(m). A function f is unimodal at M by definition if f is measurable with respect to L(M). It is not difficult to see that this is equivalent to f nondecreasing at x < M and f nonincreasing at x > M. If f is unimodal at every point of an interval I, then we call I the modal interval of f, and we shall write L(I) for the lattice of intervals containing I. Clearly, f has modal interval I if and only if f is measurable with respect to L(I). The center of I will be called the center mode.
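To make the measurability definition concrete, here is a small self-contained check (an illustration of ours, not from the paper): for a function given on a finite grid, measurability with respect to L(M) amounts to every level set [f > a] being a contiguous block of grid points containing M, which is exactly unimodality at M.

```python
def level_sets_are_intervals_about(values, mode_idx, thresholds):
    """True if every level set {i: values[i] > a} is a contiguous block of
    indices containing mode_idx (empty level sets are allowed)."""
    for a in thresholds:
        idx = [i for i, v in enumerate(values) if v > a]
        if not idx:
            continue
        contiguous = idx == list(range(idx[0], idx[-1] + 1))
        if not (contiguous and idx[0] <= mode_idx <= idx[-1]):
            return False
    return True

def is_unimodal_at(values, mode_idx):
    """Nondecreasing up to mode_idx and nonincreasing after it."""
    left = values[: mode_idx + 1]
    right = values[mode_idx:]
    return all(x <= y for x, y in zip(left, left[1:])) and \
           all(x >= y for x, y in zip(right, right[1:]))
```

For a step function on a grid the two conditions agree, mirroring the equivalence stated above.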
Robertson's estimate of the unimodal density is maximum likelihood where the mode is known. The present solution is a maximum likelihood estimate when the mode is unknown. A peculiar characteristic of Robertson's estimate (and also of a related estimate found in Wegman [5]) is a "peaking" near the mode. Figure 1 is based on a Monte Carlo sample of size 90 from a triangular density; it shows Robertson's maximum likelihood estimate with mode known to be ½. To eliminate the peaking, at least partially, we shall require our estimate to have a modal interval of length ε, where ε is some fixed positive number. This also will have the effect of uniformly bounding the estimate for all sample sizes by 1/ε. Figure 2 is the estimate as described in this paper. In this estimate, the modal interval extends from .493 to .518; .493 is an observation. Table 1 is the tabulation of the density drawn in Figure 1, and Table 2 is the tabulation for Figure 2. The reader should be warned that these computations were carried out on a computer and rounded. Thus, in the case of Figure 2, ε is actually .0234375. Both estimates are based on a sample of size 90.

TABLE 1

Interval, I       Value of Estimate on I
(−∞, .011)        .0
[.011, .135)      .360
[.135, .248)      .490
[.248, .264)      .674
[.264, .345)      1.37
[.345, .414)      1.59
[.414, .432)      2.04
[.432, .448)      2.17
[.448, .493)      2.34
[.493, .500)      4.02
[.500, .505]      4.54
(.505, .510]      2.26
(.510, .553]      1.74
(.553, .583]      1.48
(.583, .834]      .973
(.834, .890]      .605
(.890, .915]      .427
(.915, .977]      .357
(.977, ∞)         .0

TABLE 2

Interval, I       Value of Estimate on I
(−∞, .011)        .0
[.011, .135)      .360
[.135, .248)      .490
[.248, .264)      .674
[.264, .345)      1.37
[.345, .414)      1.59
[.414, .432)      2.04
[.432, .448)      2.17
[.448, .493)      2.34
[.493, .518]      2.37
(.518, .553]      2.16
(.553, .583]      1.48
(.583, .834]      .973
(.834, .890]      .605
(.890, .915]      .427
(.915, .977]      .357
(.977, ∞)         .0

FIGURE 1. Triangular density (broken line) together with the maximum likelihood estimate (solid line) when the mode is known to be .5. [Plot not reproduced.]

FIGURE 2. Triangular density (broken line) and the maximum likelihood estimate (solid line) with modal interval of length .0234375. [Plot not reproduced.]
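Table 2 can be sanity-checked directly: the step heights should integrate to approximately one (the text warns that the tabulated values are rounded) and should rise to their maximum on the modal interval [.493, .518] and fall thereafter. A short script of our own, with the values transcribed from Table 2:

```python
# Finite-value rows of Table 2: (left endpoint, right endpoint, step height).
table2 = [
    (0.011, 0.135, 0.360), (0.135, 0.248, 0.490), (0.248, 0.264, 0.674),
    (0.264, 0.345, 1.37),  (0.345, 0.414, 1.59),  (0.414, 0.432, 2.04),
    (0.432, 0.448, 2.17),  (0.448, 0.493, 2.34),  (0.493, 0.518, 2.37),
    (0.518, 0.553, 2.16),  (0.553, 0.583, 1.48),  (0.583, 0.834, 0.973),
    (0.834, 0.890, 0.605), (0.890, 0.915, 0.427), (0.915, 0.977, 0.357),
]

# Total mass of the step density (should be ~1 up to the stated rounding).
total_mass = sum((b - a) * v for a, b, v in table2)

# The step heights should be nondecreasing up to the modal interval and
# nonincreasing after it.
values = [v for _, _, v in table2]
peak = values.index(max(values))
is_unimodal = all(x <= y for x, y in zip(values[:peak], values[1:peak + 1])) and \
              all(x >= y for x, y in zip(values[peak:], values[peak + 1:]))
```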
2. The estimate. Let Y₁ ≤ Y₂ ≤ ⋯ ≤ Yₙ be the ordered sample selected according to a unimodal density f. Let L and R be points such that λ[L, R] = R − L = ε. Let ℓ(n) and r(n) be the subscripts of the largest observation less than or equal to L and the smallest observation greater than or equal to R, respectively. If h is any estimate which has [L, R] for its modal interval, define g by

g(y) = h(Y_{ℓ(n)})   for Y_{ℓ(n)} ≤ y < L,
g(y) = h(Y_{r(n)})   for R < y ≤ Y_{r(n)},
g(y) = h(y)          otherwise,

and let β = [∫ g dλ]⁻¹; since g ≤ h, we have β ≥ 1. It follows that βg has modal interval [L, R] and has a likelihood product no smaller than that of h. Similarly, any estimate which is to maximize the likelihood product must be constant on every open interval joining consecutive observations in the complement of [L, R], [L, R]ᶜ.

The problem then is to find an estimate which is constant on each of the intervals A₁, A₂, …, A_k, where A₁, …, A_{ℓ(n)} are the intervals joining consecutive observations to the left of L, A_{ℓ(n)+1} = [L, R], A_{ℓ(n)+2} = (R, Y_{r(n)}], …, A_k = (Y_{n−1}, Yₙ]. Recall that m = ½(R + L) is the center mode and that L([L, R]) is the lattice of intervals containing [L, R]. Let B denote the σ-field whose atoms are A₁, A₂, …, A_k together with (⋃ⱼ Aⱼ)ᶜ. Finally, let Lₙ([L, R]) be the intersection of B and L([L, R]). An application of a theorem of Robertson [4] shows that the maximum likelihood estimate is given by

f̂_{nm} = E(ĝ_{nm} | Lₙ([L, R])),  where  ĝ_{nm} = (1/n) Σ_{i=1}^{k} [n_i / λ(A_i)] I_{A_i}.

Here n_i is the number of observations in A_i and I_{A_i} is the indicator of A_i. Hence we know the form of the maximum likelihood estimate given the location of the modal interval. We wish to determine the location of the modal interval. The following lemma shows that we need only consider a finite number of locations for the modal interval.

Lemma 2.1. If f̂_{nm} is the maximum likelihood estimate, we may assume that one of the endpoints of the modal interval lies in the set {Y₁, …, Yₙ}.

Proof: Suppose neither L nor R belongs to {Y₁, …, Yₙ}, but there are observations in (−∞, L), in [L, R], and in (R, ∞). We may assume f̂_{nm}(Y_{ℓ(n)}) ≥ f̂_{nm}(Y_{r(n)}); a similar argument, shifting in the opposite direction, holds for f̂_{nm}(Y_{ℓ(n)}) < f̂_{nm}(Y_{r(n)}). Let

η = min{L − Y_{ℓ(n)}, Y_{ℓ(n)+1} − L, Y_{r(n)} − R, R − Y_{r(n)−1}},

and let L′ = L − ½η and R′ = R − ½η. Note that f̂_{nm} is constant on [L, R], equal to f̂_{nm}(L), constant on (Y_{ℓ(n)}, L), and constant on (R, Y_{r(n)}). Define g, with modal interval [L′, R′], by

g(y) = f̂_{nm}(y)          for y < L′ or y > R,
g(y) = f̂_{nm}(Y_{r(n)})   for R′ < y ≤ R,
g(y) = c                   for L′ ≤ y ≤ R′,

where c is chosen so that ∫ g dλ = 1, that is,

c = ε⁻¹ {∫_{[L,R]} f̂_{nm} dλ + ½η f̂_{nm}(Y_{ℓ(n)}) − ½η f̂_{nm}(Y_{r(n)})}
  = f̂_{nm}(L) + ½η ε⁻¹ [f̂_{nm}(Y_{ℓ(n)}) − f̂_{nm}(Y_{r(n)})].

By the choice of η, there are no observations in [L′, L) or in (R′, R]. If f̂_{nm}(Y_{ℓ(n)}) > f̂_{nm}(Y_{r(n)}), then c > f̂_{nm}(L), so that g(y) > f̂_{nm}(y) for all y in [L′, R′]; since every observation of [L, R] lies in [L, R′], g has a larger likelihood product than f̂_{nm}, contrary to the choice of f̂_{nm}. If f̂_{nm}(Y_{ℓ(n)}) = f̂_{nm}(Y_{r(n)}), then c = f̂_{nm}(L) and g is equally likely; shifting instead by min{L − Y_{ℓ(n)}, R − Y_{r(n)−1}} produces an equally likely estimate one of whose endpoints is an observation. Hence we may assume either L or R lies in {Y₁, …, Yₙ}. If one or more of the sets (−∞, L), [L, R], (R, ∞) contain no observations, we may define an alternate function g in a similar manner to the above, equally likely as f̂_{nm} and having proper endpoints. This completes the proof.

If the modal interval is unspecified, there are at most 2n intervals of the form [L, R] where one of L or R belongs to {Y₁, …, Yₙ}. Compute the 2n estimates f̂_{nm_i}, i = 1, 2, …, 2n, corresponding to these 2n intervals, and calculate the 2n likelihood products

∏_{j=1}^{n} f̂_{nm_i}(Yⱼ),  i = 1, 2, …, 2n.

Select the maximum of these 2n numbers and let f̂ₙ be the corresponding estimate; let mₙ be its center mode. (Notice that if 2 or more of the likelihood products are equal, we could have some ambiguity in the definition of f̂ₙ. Let us adopt the convention of choosing f̂ₙ to be the estimate with the smallest center mode.) Clearly the likelihood product of f̂ₙ is as large as that of any other estimate.
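The construction above can be sketched numerically. The following is an illustrative simplification of ours, not the paper's exact algorithm: it builds a step function on intervals joining consecutive observations, enforces the unimodal shape with a weighted pool-adjacent-violators pass on each side of a candidate modal interval [L, R] (clipping the side pieces at the modal value rather than computing the exact conditional expectation), renormalizes, and scans the 2n candidate intervals of Lemma 2.1. All function names and the crude mass assignment for [L, R] are our own.

```python
import math

def pava(values, weights, increasing=True):
    """Weighted isotonic regression by pool-adjacent-violators."""
    if not increasing:
        return [-v for v in pava([-v for v in values], weights, True)]
    blocks = []  # (weighted mean, total weight, count of entries pooled)
    for v, w in zip(values, weights):
        blocks.append((v, w, 1))
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append(((m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2))
    out = []
    for m, _, c in blocks:
        out.extend([m] * c)
    return out

def unimodal_sketch(ys, L, R):
    """Step density with modal interval [L, R]; a rough stand-in for the
    conditional expectation E(g_nm | L_n([L, R])), not the exact projection."""
    ys, n = sorted(ys), len(ys)
    left = [y for y in ys if y <= L]
    mid = [y for y in ys if L < y < R]
    right = [y for y in ys if y >= R]
    l_atoms = list(zip(left, left[1:]))      # intervals (y_{i-1}, y_i]
    r_atoms = list(zip(right, right[1:]))
    l_val = [1.0 / (n * (b - a)) for a, b in l_atoms]
    r_val = [1.0 / (n * (b - a)) for a, b in r_atoms]
    m_val = (len(mid) + 1.0) / (n * (R - L))  # crude mass for [L, R]
    l_fit = pava(l_val, [b - a for a, b in l_atoms], increasing=True)
    r_fit = pava(r_val, [b - a for a, b in r_atoms], increasing=False)
    # crude coupling: the side pieces may not exceed the modal value
    l_fit = [min(v, m_val) for v in l_fit]
    r_fit = [min(v, m_val) for v in r_fit]
    intervals = l_atoms + [(L, R)] + r_atoms
    values = l_fit + [m_val] + r_fit
    mass = sum((b - a) * v for (a, b), v in zip(intervals, values))
    return intervals, [v / mass for v in values]

def log_likelihood(intervals, values, ys):
    """Sum of log-density over observations falling in some atom."""
    total = 0.0
    for y in ys:
        for (a, b), v in zip(intervals, values):
            if a < y <= b:
                total += math.log(v)
                break
    return total

def best_candidate(ys, eps):
    """Scan the 2n candidate modal intervals of Lemma 2.1."""
    best = None
    for y in sorted(ys):
        for L0, R0 in ((y, y + eps), (y - eps, y)):
            iv, vals = unimodal_sketch(ys, L0, R0)
            ll = log_likelihood(iv, vals, ys)
            if best is None or ll > best[0]:
                best = (ll, L0, R0)
    return best
```

By construction the fitted step heights rise to the modal atom and fall afterwards, so the sketch always returns a unimodal density, even though its side values are only an approximation to the exact projection.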
Before closing this section, we note the following lemma, which shows that Lₙ([L, R]) may be replaced by L([L, R]).

Lemma 2.2. The maximum likelihood estimate f̂_{nm}, which is measurable with respect to Lₙ([L, R]), is given by E(ĝ_{nm} | L([L, R])), where ĝ_{nm} is as before.

Proof: Let us abbreviate, for the purpose of this lemma, ĝ_{nm} by ĝ and E(ĝ_{nm} | Lₙ([L, R])) by f̂. We want to show f̂ = E(ĝ | L([L, R])). Clearly, f̂ is measurable with respect to L([L, R]). Property (1.1) is easy to verify, so let us show (1.2) for A ∈ L([L, R]) with 0 < λ(A) < ∞. We may assume A = [a, b] is a closed interval. Suppose, to the contrary, that ∫_{[a,b]} (ĝ − f̂) dλ > 0. Choose Yⱼ with Yⱼ ≤ b < Y_{j+1}. Both ĝ and f̂ are constant on (Yⱼ, Y_{j+1}]. If ĝ(Y_{j+1}) ≥ f̂(Y_{j+1}), then

∫_{[a,Y_{j+1}]} (ĝ − f̂) dλ ≥ ∫_{[a,b]} (ĝ − f̂) dλ > 0,

while if ĝ(Y_{j+1}) < f̂(Y_{j+1}), then

∫_{[a,Yⱼ]} (ĝ − f̂) dλ ≥ ∫_{[a,b]} (ĝ − f̂) dλ > 0.

In either case b may be replaced by an endpoint of an atom of B without destroying the positivity of the integral, and the same argument applies to a. We thereby obtain a set A′ which belongs to Lₙ([L, R]) and satisfies ∫_{A′} (ĝ − f̂) dλ > 0. But f̂ = E(ĝ | Lₙ([L, R])), so by (1.2), ∫_{A′} (ĝ − f̂) dλ ≤ 0. This is a contradiction, so our assumption was false, and (1.2) holds for every A ∈ L([L, R]). This concludes the proof.

3. Conditional expectations of the true density. In this section, we investigate the conditional expectation, E(f | L([L, R])), of the true density f with respect to the σ-lattice of intervals containing the interval [L, R]. Notice that ∫ f² dλ < ∞ is a condition necessary to the definition of the conditional expectation. We shall need to require this condition in the remainder of this paper. We also will require continuity of f.
It is not difficult to see that continuity together with unimodality of the density f implies ∫ f² dλ < ∞. The main theorem of this section is an analogue of Theorem 3.1 in [5].

Theorem 3.1. If f is a continuous unimodal density with a unique mode M, and [L, R] is an interval with center m, let K = (R − L)⁻¹ ∫_{[L,R]} f dλ.

i. If f(L) ≤ K and f(R) ≤ K, or if f(L) = K = f(R), then E(f | L([L, R])) = f on [L, R]ᶜ and E(f | L([L, R])) = (R − L)⁻¹ ∫_{[L,R]} f dλ on [L, R].

ii. If f(L) > K ≥ f(R), there is an interval [a, R] ⊃ [L, R] such that E(f | L([L, R])) = f on [a, R]ᶜ and E(f | L([L, R])) = (R − a)⁻¹ ∫_{[a,R]} f dλ on [a, R]; a is given by

a = sup{y ≤ L′ : (R − y)⁻¹ ∫_{[y,R]} f dλ ≥ f(y)},  where L′ = min{L, M}.

iii. If f(R) > K ≥ f(L), there is an interval [L, b] ⊃ [L, R] such that E(f | L([L, R])) = f on [L, b]ᶜ and E(f | L([L, R])) = (b − L)⁻¹ ∫_{[L,b]} f dλ on [L, b]; b is given by

b = inf{y ≥ R′ : (y − L)⁻¹ ∫_{[L,y]} f dλ ≥ f(y)},  where R′ = max{R, M}.

Notice that f(L) > K together with f(R) > K is impossible, since K is at least the minimum of f(L) and f(R); thus cases i, ii, and iii exhaust all possibilities.

Proof: Case i is easily verified, and case iii follows in a manner similar to case ii, so consider case ii. Since f is continuous, since (R − y)⁻¹ ∫_{[y,R]} f dλ − f(y) is positive for y below the support of f, and since it is negative at L′, the set defining a is not empty and a, indeed, exists, with

f(a) = (R − a)⁻¹ ∫_{[a,R]} f dλ.

Let f* be given by

f*(y) = (R − a)⁻¹ ∫_{[a,R]} f dλ  for y ∈ [a, R],   f*(y) = f(y)  for y ∈ [a, R]ᶜ.

Clearly f* is measurable with respect to L([L, R]). Hence it remains to show that ∫_A (f − f*) dλ ≤ 0 for every A ∈ L([L, R]) with 0 < λ(A) < ∞, and that ∫ (f − f*) θ(f*) dλ = 0 for every Borel function θ such that θ(f*) ∈ L₂ and θ(0) = 0.

Let a′ = sup{y : a < y < R, f(y) ≥ (R − a)⁻¹ ∫_{[a,R]} f dλ}. Since f(a) = (R − a)⁻¹ ∫_{[a,R]} f dλ and f is unimodal,

(3.1)  f − f* ≥ 0 on [a, a′] and f − f* ≤ 0 on [a′, R].

We may assume A is a closed interval [b₁, b₂] with b₁ ≤ L and b₂ ≥ R. Since f* = f on (R, ∞),

(3.2)  ∫_A (f − f*) dλ = ∫_{[b₁,R]} (f − f*) dλ.

(3.3)  If a′ ≤ b₁ ≤ L, then by (3.1), ∫_{[b₁,R]} (f − f*) dλ ≤ 0.

(3.4)  If a ≤ b₁ < a′, then since ∫_{[a,R]} (f − f*) dλ = 0 and f − f* ≥ 0 on [a, b₁],

∫_{[b₁,R]} (f − f*) dλ = −∫_{[a,b₁]} (f − f*) dλ ≤ 0.

(3.5)  If b₁ < a, then since f = f* on [b₁, a),

∫_{[b₁,R]} (f − f*) dλ = ∫_{[a,R]} (f − f*) dλ = 0.

Combining (3.2) with (3.3), (3.4), or (3.5), we have ∫_A (f − f*) dλ ≤ 0, which is (1.2).

Now let θ be any Borel function for which θ(f*) ∈ L₂ and θ(0) = 0. On [a, R], θ(f*) is the constant θ((R − a)⁻¹ ∫_{[a,R]} f dλ), while f = f* on [a, R]ᶜ. Then

∫ (f − f*) θ(f*) dλ = θ((R − a)⁻¹ ∫_{[a,R]} f dλ) · ∫_{[a,R]} (f − f*) dλ = 0,

which is (1.1). Thus f* = E(f | L([L, R])). This completes the proof of case ii.

Notice that if f(L) = K ≥ f(R), cases i and ii appear to overlap. In this case, a = L, so that there is no contradiction.

4. Convergence of f̂_{nmₙ}. We shall show the consistency of the density estimate based on a certain consistency of the center mode. Let

Ω′ = [lim_{n→∞} sup_y |Fₙ(y) − F(y)| = 0],

where F is the distribution corresponding to the density f and Fₙ is the empirical distribution based on a sample of size n from F. It is well known that this set has probability one. In addition, for points in Ω′:

1. The largest observation less than a number in the support of f and the smallest observation greater than that number both converge to that number.

2. Corresponding to every pair of numbers r₁ < r₂ in the support of f, {x: f(x) > 0}, there is eventually a pair of observations Y_{r₁(n)} and Y_{r₂(n)} satisfying r₁ < Y_{r₁(n)} < Y_{r₂(n)} < r₂.

Recall that the maximum likelihood estimate with center mode m is given by f̂_{nm} = E(ĝ_{nm} | Lₙ([L, R])), where ĝ_{nm}, Lₙ([L, R]), L and R are as in Section 2. Suppose y₀ is a point in the support of f and let t = f̂_{nm}(y₀). Let T_t = [f̂_{nm} ≥ t] and P_t = [f̂_{nm} > t]. A result of Robertson [3] gives us

(4.1)  t = inf{[λ(T_t − L₀)]⁻¹ ∫_{T_t−L₀} ĝ_{nm} dλ : L₀ ∈ Lₙ([L, R]), λ(T_t − L₀) > 0},

(4.2)  t = sup{[λ(L₀ − P_t)]⁻¹ ∫_{L₀−P_t} ĝ_{nm} dλ : L₀ ∈ Lₙ([L, R]), λ(L₀ − P_t) > 0}.

Robertson's theorem is stated for finite measure spaces; we restrict our measure space to a finite interval large enough to contain the support of ĝ_{nm}.
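The case-ii characterization can be illustrated numerically; the density, interval, and bisection scheme below are our own illustration, not from the paper. For the triangular density f(y) = 4y on [0, ½] and 4(1 − y) on [½, 1], take [L, R] = [0.6, 0.8]: then f(L) = 1.6 > K = 1.2 ≥ f(R) = 0.8, so E(f | L([L, R])) averages f over a longer interval [a, R], with a solving f(a) = (R − a)⁻¹ ∫_{[a,R]} f dλ.

```python
def f(y):
    """Triangular density on [0, 1] with mode 1/2."""
    if 0.0 <= y <= 0.5:
        return 4.0 * y
    if 0.5 < y <= 1.0:
        return 4.0 * (1.0 - y)
    return 0.0

def F(y):
    """Its cdf, used for exact averages of f."""
    if y <= 0.0:
        return 0.0
    if y <= 0.5:
        return 2.0 * y * y
    if y <= 1.0:
        return 1.0 - 2.0 * (1.0 - y) ** 2
    return 1.0

R = 0.8
def avg(y):
    """(R - y)^{-1} times the integral of f over [y, R]."""
    return (F(R) - F(y)) / (R - y)

K = avg(0.6)  # = 1.2; f(0.6) = 1.6 > K >= f(0.8) = 0.8, so case ii applies

# Bisection for a in (0, 1/2) solving avg(a) = f(a); the sign of
# avg - f changes exactly once on this interval.
lo, hi = 0.01, 0.5
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if avg(mid) > f(mid):  # average still above f: a lies to the right
        lo = mid
    else:
        hi = mid
a = 0.5 * (lo + hi)
```

The conditional expectation is then constant, equal to f(a) ≈ 1.50, on [a, R] ≈ [0.376, 0.8], and equals f elsewhere.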
Theorem 4.1. Let f be a continuous unimodal density with unique mode M, and let ε > 0. Let {mₙ} be a sequence converging to m (not necessarily with m = M), and let f̂_{nmₙ} be the maximum likelihood estimate with center mode mₙ. Let f_m = E(f | L([L, R])), where L = m − ½ε and R = m + ½ε. Then for points of Ω′, f̂_{nmₙ} converges pointwise to f_m, except possibly at L or R or both.

Proof: The proof varies depending upon which representation in Theorem 3.1 holds. If the first characterization holds, we have f_m = f on [L, R]ᶜ and f_m = (R − L)⁻¹ ∫_{[L,R]} f dλ on [L, R]. If y₀ < L or y₀ > R, we use methods quite analogous to those found in Robertson [4] or Wegman [5]. Let us turn our attention to L < y₀ < R. For sufficiently large n, y₀ ∈ [Lₙ, Rₙ], where Lₙ = mₙ − ½ε and Rₙ = mₙ + ½ε. In this case t = f̂_{nmₙ}(y₀) is the maximum value of f̂_{nmₙ}, so that P_t = ∅. Let η be any positive number. For sufficiently large n, [L − η, R + η] ∈ L([Lₙ, Rₙ]), so that by (4.2),

f̂_{nmₙ}(y₀) ≥ (R − L + 2η)⁻¹ ∫_{[L−η,R+η]} ĝ_{nmₙ} dλ.

If n* is the number of observations in [L − η, R + η], it is not difficult to see that

(n* + 2)/n ≥ ∫_{[L−η,R+η]} ĝ_{nmₙ} dλ ≥ (n* − 1)/n.

Let us write Fₙ(x−) for lim_{y↑x} Fₙ(y). Then we may rewrite the above inequality as

Fₙ(R + η) − Fₙ((L − η)−) + 2/n ≥ ∫_{[L−η,R+η]} ĝ_{nmₙ} dλ ≥ Fₙ(R + η) − Fₙ((L − η)−) − 1/n.

Hence we have, on Ω′,

lim_{n→∞} ∫_{[L−η,R+η]} ĝ_{nmₙ} dλ = F(R + η) − F((L − η)−) = ∫_{[L−η,R+η]} f dλ.

From this,

lim inf_n f̂_{nmₙ}(y₀) ≥ [R − L + 2η]⁻¹ ∫_{[L−η,R+η]} f dλ.

But this is true for all η > 0, so that lim inf_n f̂_{nmₙ}(y₀) ≥ (R − L)⁻¹ ∫_{[L,R]} f dλ = f_m(y₀).

On the other hand, T_t = [f̂_{nmₙ} ≥ t] is an interval, say T_t = [Y_{i(n)}, Y_{j(n)}]. Clearly lim sup_n Y_{i(n)} ≤ L and lim inf_n Y_{j(n)} ≥ R. We wish to show lim_n Y_{i(n)} = L, for which it is necessary to show lim inf_n Y_{i(n)} ≥ L. If this is not true, there is a subsequence (which we shall again denote in the same way for simplicity) and a positive constant η such that Y_{i(n)} ≤ L − η; for the moment, f̂_{nmₙ} will mean the subsequence corresponding to this Y_{i(n)}. Since t is the maximum value of f̂_{nmₙ}, the estimate is constant, equal to t, on T_t ⊇ [L − η, R], so that eventually t = f̂_{nmₙ}(L − η), which converges to f_m(L − η) = f(L − η) by the argument for points outside [L, R]. Eventually [L − η, R + η] ∈ L([Lₙ, Rₙ]), so that by Lemma 2.2 and (1.2),

∫_{[L−η,R+η]} ĝ_{nmₙ} dλ ≤ ∫_{[L−η,R+η]} f̂_{nmₙ} dλ.

Since f̂_{nmₙ} ≤ t on [L − η, R + η] and the estimates are uniformly bounded by ε⁻¹, Fatou's lemma gives

lim sup_n ∫_{[L−η,R+η]} f̂_{nmₙ} dλ ≤ ∫_{[L−η,R+η]} f(L − η) dλ.

We have just seen that lim_n ∫_{[L−η,R+η]} ĝ_{nmₙ} dλ = ∫_{[L−η,R+η]} f dλ, so

(4.3)  ∫_{[L−η,R+η]} f dλ ≤ (R − L + 2η) f(L − η).

But in case i, [L, R] must contain M, since otherwise the characterization of case i in Theorem 3.1 could not hold. Since f has a unique mode, f exceeds f(L − η) in some neighborhood of M, and (4.3) cannot hold. Hence lim_n Y_{i(n)} = L; similarly, Y_{j(n)} converges to R. By (4.1), taking L₀ = ∅,

t ≤ (Y_{j(n)} − Y_{i(n)})⁻¹ ∫_{[Y_{i(n)},Y_{j(n)}]} ĝ_{nmₙ} dλ.

Writing ∫_{[Y_{i(n)},Y_{j(n)}]} ĝ_{nmₙ} dλ = Fₙ(Y_{j(n)}) − Fₙ(Y_{i(n)}) up to an error of at most 2/n, it is clear that

lim sup_n f̂_{nmₙ}(y₀) ≤ (R − L)⁻¹ [F(R) − F(L)] = f_m(y₀).

Thus lim_n f̂_{nmₙ}(y₀) = f_m(y₀) for L < y₀ < R. This completes the proof in the case that characterization i of Theorem 3.1 is the characterization of E(f | L([L, R])). The other cases may be proven in a similar manner.

We are easily able to upgrade the convergence in this case by applying methods similar to those of the Glivenko–Cantelli theorem.

Corollary 4.1. If the conditions of Theorem 4.1 hold, then f̂_{nmₙ} converges uniformly except on an interval of arbitrarily small measure containing L or R, or on the union of two such intervals. This convergence holds for all points of Ω′, hence with probability one.
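The set Ω′ and properties 1 and 2 drive these convergence results, and the uniform closeness of Fₙ to F is easy to see in simulation. The sampler, seed, and tolerances below are our own illustration (inverse-cdf sampling from the triangular distribution of Figures 1 and 2):

```python
import math
import random

def F(y):
    """Triangular cdf on [0, 1] with mode 1/2."""
    y = min(max(y, 0.0), 1.0)
    return 2.0 * y * y if y <= 0.5 else 1.0 - 2.0 * (1.0 - y) ** 2

def F_inv(u):
    """Inverse cdf, for sampling."""
    return math.sqrt(u / 2.0) if u <= 0.5 else 1.0 - math.sqrt((1.0 - u) / 2.0)

def sup_distance(n, seed=0):
    """sup_y |F_n(y) - F(y)|, attained at jump points of the empirical cdf."""
    rng = random.Random(seed)
    ys = sorted(F_inv(rng.random()) for _ in range(n))
    return max(max(abs((i + 1) / n - F(y)), abs(i / n - F(y)))
               for i, y in enumerate(ys))
```

For a fixed seed the distance at n = 20000 is far below the distance at n = 200, consistent with the Glivenko–Cantelli theorem invoked above.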
5. A convergence theorem for the center mode. Recall from Section 2 that the maximum likelihood estimate f̂ₙ is given by a conditional expectation with respect to a lattice containing a modal interval [Lₙ, Rₙ], where Lₙ = mₙ − ½ε, Rₙ = mₙ + ½ε, and mₙ is the center mode. It is desired to show that the sequence {mₙ} has a limit m. Roughly speaking, we shall accomplish this by showing that there is an m for which H(m) = ∫ log f_m · f dλ is maximized. If mₙ fails to converge to that m, then we can pick another estimate which eventually has a larger likelihood product. This would be a contradiction to the choice of f̂ₙ.

Lemma 5.1. Let H(m) = ∫ log f_m · f dλ, where f_m = E(f | L([L, R])), L = m − ½ε, and R = m + ½ε. Then H is unimodal.

Proof: The real line may be divided into three regions corresponding to the three characterizations of f_m in Theorem 3.1. Let

M⁰ = {m : f(L) ≤ (R − L)⁻¹ ∫_{[L,R]} f dλ and f(R) ≤ (R − L)⁻¹ ∫_{[L,R]} f dλ}.

For m below M⁰, f_m is characterized as in case iii of Theorem 3.1, and for m above M⁰, as in case ii. Let m⁻ = inf M⁰ and m⁺ = sup M⁰; it is straightforward to show that M − ½ε ≤ m⁻ and m⁺ ≤ M + ½ε, where M is the mode of f. If we can show that H is nondecreasing for m ≤ m⁻, unimodal for m⁻ ≤ m ≤ m⁺, and nonincreasing for m ≥ m⁺, this will be sufficient.

Let us consider m < m* ≤ m⁻, so that both f_m and f_{m*} are characterized as in case iii, and write

H(m*) − H(m) = ∫ log (f_{m*}/f_m) · f dλ.

Let L, L*, b and b* have their obvious meanings, so that f_m is constant on [L, b] and equal to f elsewhere, and similarly for f_{m*}. Since

(y − L)⁻¹ ∫_{[L,y]} f dλ ≤ (y − L*)⁻¹ ∫_{[L*,y]} f dλ,

a consideration of the sets defining b and b* shows that [L*, b*] ⊆ [L, b]. Hence both f_m and f_{m*} are constant on [L*, b*]. From this, we have

∫ log (f_{m*}/f_m) f dλ = ∫_{[L*,b*]ᶜ} log (f_{m*}/f_m) f dλ + log (f_{m*}(L*)/f_m(L*)) · ∫_{[L*,b*]} f dλ.

But ∫_{[L*,b*]} f dλ = ∫_{[L*,b*]} f_{m*} dλ, and since f_{m*} = f on [L*, b*]ᶜ,

∫ log (f_{m*}/f_m) f dλ = ∫ log (f_{m*}/f_m) f_{m*} dλ.

The latter quantity is the Kullback–Leibler information number and is nonnegative.
Hence H(m*) ≥ H(m). A similar proof holds for m⁺ ≤ m < m*. Let us turn our attention to m⁻ ≤ m ≤ m⁺. Let η(t) = t log t for t > 0. We may expand H(m) as follows:

H(m) = ∫_{[L,R]ᶜ} log f · f dλ + log[ε⁻¹ ∫_{[L,R]} f dλ] · ∫_{[L,R]} f dλ.

Recalling L = m − ½ε and R = m + ½ε and differentiating with respect to m, we obtain

H′(m) = η′(ε⁻¹ ∫_{[L,R]} f dλ) · [f(R) − f(L)] + η(f(L)) − η(f(R)).

Rewriting this, for f(R) ≠ f(L),

H′(m) / [f(R) − f(L)] = η′(ε⁻¹ ∫_{[L,R]} f dλ) − [η(f(R)) − η(f(L))] / [f(R) − f(L)].

Noticing that both f(L) and f(R) are at most ε⁻¹ ∫_{[L,R]} f dλ for m⁻ ≤ m ≤ m⁺, and that η′(t) is an increasing function, the right-hand side represents the slope of a chord of η subtracted from the slope of a tangent at a point to the right of the chord, and is therefore nonnegative. Hence, if f(R) − f(L) ≥ 0, then H′(m) ≥ 0; correspondingly, if f(R) − f(L) ≤ 0, then H′(m) ≤ 0. Hence H is unimodal.

Clearly there need not be a unique mode, but H will have a modal interval, namely {m : f(R) = f(L)}. If f is symmetric, then it is clear that the mode of H is the mode of f, that is, M. In any case, since M − ½ε and M + ½ε are lower and upper bounds, respectively, on the mode of H, it is clear that the difference between the modes of H and of f is less than ½ε.

Let us assume, in general, that H has a unique mode m. In order to avoid cumbersome details, let us assume that f is strictly increasing at x for x < M and strictly decreasing at x for x > M; thus H has a unique mode. An obvious modification exists if there are some "plateaus" in f. If m is the mode of H and {mₙ} is the sequence of center modes of {f̂ₙ}, we wish to show that {mₙ} converges to m. The next lemma applies in the case that the support of f is (−∞, ∞).

Lemma 5.2. If the entropy, −∫ log f · f dλ, of f is finite, then with probability one

−∞ < lim inf mₙ ≤ lim sup mₙ < ∞.

Proof: Suppose lim sup mₙ = ∞. Then there is a subsequence (which we will also label {mₙ}) which diverges to ∞. Let γ be chosen in (0, 1) and choose x₀ so that 1 − F(x₀) < γ. As soon as mₙ − ½ε > x₀, the estimate f̂ₙ is nondecreasing on (−∞, x₀]. If f̂ₙ(x₀) were bounded away from zero infinitely often, then the mass of f̂ₙ on [x₀, mₙ − ½ε] would diverge with mₙ, which is impossible for sufficiently large n; hence f̂ₙ converges uniformly to zero on (−∞, x₀].

Let us write Lₙ(f̂ₙ) for the normalized log likelihood,

Lₙ(f̂ₙ) = (1/n) Σ_{i=1}^{n} log(f̂ₙ(Yᵢ)).

We may write Lₙ(f̂ₙ) = (1/n) Σ₁ log(f̂ₙ(Yᵢ)) + (1/n) Σ₂ log(f̂ₙ(Yᵢ)), where Σ₁ is the summation over the observations less than or equal to x₀ and Σ₂ is the summation over the remaining observations. Since f̂ₙ is bounded by ε⁻¹,

lim sup_n (1/n) Σ₂ log(f̂ₙ(Yᵢ)) ≤ log(ε⁻¹).

On the other hand, since f̂ₙ converges uniformly to zero on (−∞, x₀] and the fraction of observations at or below x₀ converges to F(x₀) > 0, log(f̂ₙ(Yᵢ)) diverges uniformly to −∞ for Yᵢ ≤ x₀. Thus

lim sup_n Lₙ(f̂ₙ) = −∞.

Let us consider f_{m′} = E(f | L([L, R])), where L = m′ − ½ε and R = m′ + ½ε for some fixed m′. By the Kolmogorov strong law, with probability one

lim_{n→∞} Lₙ(f_{m′}) = ∫ log f_{m′} · f dλ.

This quantity is finite since the entropy is finite. Hence

lim inf_n [Lₙ(f_{m′}) − Lₙ(f̂ₙ)] = +∞.

This is equivalent to saying that eventually f_{m′} is more likely than f̂ₙ, which is a contradiction to the choice of f̂ₙ. Hence lim sup mₙ < ∞ with probability one, and a similar argument holds for the other inequality.

It may be that the support is bounded below, bounded above, or both. Let α = inf{x : f(x) > 0} and β = sup{x : f(x) > 0}. Using methods similar to those found in Lemma 5.2, we may obtain

Lemma 5.3. If the entropy of the continuous unimodal density f is finite, then with probability one

α + ½ε ≤ lim inf mₙ ≤ lim sup mₙ ≤ β − ½ε.

(Here let us permit α = −∞ and β = +∞.)

Now let m′ be any cluster point of {mₙ}, so that α + ½ε ≤ m′ ≤ β − ½ε. Let f_{m′} = E(f | L([L′, R′])), with L′ = m′ − ½ε and R′ = m′ + ½ε, and let m be the mode of H, with f_m = E(f | L([m − ½ε, m + ½ε])). By Theorem 3.1, f_m and f_{m′} each agree with f on the complement of some interval.
Let c be the smallest endpoint of those two intervals and d the largest, so that f = f_m = f_{m′} on [c, d]ᶜ. Moreover, since both m and m′ are in the support of f, we may pick η > 0 sufficiently small so that c − η and d + η are also in the support. In the next lemma and in Theorem 5.1, by f̂ₙ we shall mean that subsequence of the sequence of maximum likelihood estimates for which the center modes converge to m′, and by fₙ* we shall mean the maximum likelihood estimate with center mode m.

Lemma 5.4. With probability one, for sufficiently large n, fₙ* = f̂ₙ on (c − η, d + η)ᶜ.

Proof: The set of probability one is Ω′, as described in Section 4. We give the argument on (−∞, c − η]; the argument on [d + η, ∞) is similar. Let δ ∈ (0, η). As soon as mₙ − ½ε > c − δ (and since m − ½ε ≥ c > c − δ), the step functions ĝₙ and ĝₙ*, formed as in Section 2 with center modes mₙ and m respectively, agree on (−∞, c − δ): there the atoms of both are simply the intervals joining consecutive observations.

Let t₀ ≤ c − η be a point of the support. Since f has a point of increase in (t₀, c − δ), both fₙ* and f̂ₙ eventually have a jump at an observation in (t₀, c − δ). Let Y_{i*} be the smallest observation greater than t₀ at which fₙ* has a jump and Yᵢ the largest observation less than or equal to t₀ at which fₙ* has a jump, so that fₙ* is constant on [Yᵢ, Y_{i*}). With t = fₙ*(t₀), an application of (4.1) gives

(5.1)  fₙ*(t₀) ≤ (Y_{i*} − Yᵢ)⁻¹ ∫_{[Yᵢ,Y_{i*}]} ĝₙ* dλ,

and in a similar manner, using (4.2) and the fact that the supremum in (4.2) is attained because the lattice contains only finitely many sets,

(5.2)  fₙ*(t₀) ≥ (Y_{i*} − Yᵢ)⁻¹ ∫_{[Yᵢ,Y_{i*}]} ĝₙ* dλ.

Since ĝₙ* = ĝₙ on (−∞, c − δ) for sufficiently large n, (5.1) and (5.2) show that

(5.3)  fₙ*(t₀) = (Y_{i*} − Yᵢ)⁻¹ ∫_{[Yᵢ,Y_{i*}]} ĝₙ dλ.

The same argument, applied to f̂ₙ with the observations at which f̂ₙ has jumps adjacent to t₀, say Yⱼ and Y_{k*}, expresses

(5.4)  f̂ₙ(t₀) = (Y_{k*} − Yⱼ)⁻¹ ∫_{[Yⱼ,Y_{k*}]} ĝₙ dλ.

Each value is thus simultaneously an infimum in (4.1) and a supremum in (4.2) of averages of the common step function ĝₙ over intervals joining observations in (−∞, c − δ); comparing (5.3) and (5.4) by means of (4.1) and (4.2) shows that neither fₙ*(t₀) < f̂ₙ(t₀) nor fₙ*(t₀) > f̂ₙ(t₀) is possible, so that

(5.5)  fₙ*(t₀) = f̂ₙ(t₀).

For any t < t₀ the same argument applies, and by use of (4.1) and (4.2) we obtain the desired conclusion.

We may now state and prove the theorem.
Theorem 5.1. If ½ε < min{M − α, β − M} and the entropy of the continuous unimodal density f is finite, then with probability one lim_{n→∞} mₙ = m, where m is the mode of H.

Proof: If not, a subsequence of {mₙ} (which we again label {mₙ}) converges to some m′ ≠ m, with α + ½ε ≤ m′ ≤ β − ½ε. Let f_{m′} = E(f | L([m′ − ½ε, m′ + ½ε])) and f_m = E(f | L([m − ½ε, m + ½ε])). Let fₙ* be the maximum likelihood estimate when the mode is known to occur at m, and let f̂ₙ be the maximum likelihood estimate when the mode is unknown. (It is understood that f̂ₙ refers to the subsequence corresponding to the subsequence of center modes converging to m′.) We intend to show that fₙ* eventually has a larger likelihood product than f̂ₙ with probability one. This will be a contradiction, so that {mₙ} converges to m.

Recall that f = f_m = f_{m′} on [c, d]ᶜ, and let η > 0 be sufficiently small so that c − η and d + η are in the support of f. For sufficiently large n, fₙ* and f̂ₙ agree on (c − η, d + η)ᶜ with probability one, by Lemma 5.4. Let k = min{f(c − η), f(d + η)}, so that f_m ≥ k and f_{m′} ≥ k on [c − η, d + η]. Because fₙ* converges to f_m almost uniformly with probability one, and f̂ₙ to f_{m′}, eventually both estimates take values in [½k, ε⁻¹] on [c − η, d + η]; hence log(fₙ*) converges to log(f_m) almost uniformly with probability one and log(f̂ₙ) converges to log(f_{m′}) almost uniformly with probability one.

Let δ > 0. Pick a set A of arbitrarily small measure, off which this convergence is uniform, such that P(A) · |log(kε/2)| < δ. If Y₁, …, Yₙ are the observations, write

Lₙ(fₙ*) − Lₙ(f̂ₙ) = (1/n) Σ {log(fₙ*(Yᵢ)) − log(f̂ₙ(Yᵢ))},

where, for sufficiently large n, the sum need only extend over the Yᵢ in [c − η, d + η], since the two estimates eventually agree elsewhere. Let Σ₂ denote the summation over the Yᵢ in [c − η, d + η] ∩ A and Σ₃ the summation over the Yᵢ in [c − η, d + η] − A. Since the fraction of observations in A converges to P(A) with probability one, and each term of Σ₂ is at least log(kε/2), eventually with probability one

(1/n) Σ₂ {log(fₙ*(Yᵢ)) − log(f̂ₙ(Yᵢ))} ≥ −δ.

By the uniform convergence off A, eventually with probability one

(1/n) Σ₃ {log(fₙ*(Yᵢ)) − log(f̂ₙ(Yᵢ))} ≥ (1/n) Σ₃ {log(f_m(Yᵢ)) − log(f_{m′}(Yᵢ))} − δ.

In a similar manner, one may show that eventually

(1/n) Σ₃ {log(f_m(Yᵢ)) − log(f_{m′}(Yᵢ))} ≥ (1/n) Σ_{i=1}^{n} {log(f_m(Yᵢ)) − log(f_{m′}(Yᵢ))} − δ,

since f_m = f_{m′} off [c − η, d + η] and the terms over A are controlled as before. Combining, with probability one and for sufficiently large n,

Lₙ(fₙ*) − Lₙ(f̂ₙ) ≥ (1/n) Σ_{i=1}^{n} {log(f_m(Yᵢ)) − log(f_{m′}(Yᵢ))} − 3δ.

By the Kolmogorov strong law, with probability one

lim_{n→∞} (1/n) Σ_{i=1}^{n} {log(f_m(Yᵢ)) − log(f_{m′}(Yᵢ))} = ∫ log(f_m) f dλ − ∫ log(f_{m′}) f dλ = H(m) − H(m′).

Since m is the unique mode of H, H(m) > H(m′), and since δ was arbitrary, with probability one

lim inf_{n→∞} [Lₙ(fₙ*) − Lₙ(f̂ₙ)] > 0.

This is equivalent to fₙ* eventually having a larger likelihood product than f̂ₙ, which is a contradiction, since f̂ₙ maximizes the likelihood product. Hence with probability one {mₙ} converges to m.

6. Summary. For a unimodal density f and ε > 0, we have given a maximum likelihood estimate which is a strongly consistent estimate of f_m = E(f | L([m − ½ε, m + ½ε])), where m is the mode of H(m) = ∫ log(f_m) f dλ. Of course, f_m = f except on an interval of length ε. If ε = 0, the same method yields a maximum likelihood estimate; however, the lack of a uniform bound causes the consistency arguments described in this paper to fail. Hence the asymptotic behavior of this sort of estimate is unknown.
Wegman [5] describes a consistency argument for a related continuous estimate. The analogue for f̂ₙ is not difficult.

7. Acknowledgements. I wish to thank Professor Tim Robertson for suggesting the topic from which this work arose and for his guidance as my thesis advisor at the University of Iowa.

REFERENCES

[1] Brunk, H.D. (1963). "On an extension of the concept of conditional expectation," Proc. Amer. Math. Soc. 14, 298-304.

[2] Brunk, H.D. (1965). "Conditional expectation given a σ-lattice and applications," Ann. Math. Statist. 36, 1339-1350.

[3] Robertson, Tim (1966). "A representation for conditional expectations given σ-lattices," Ann. Math. Statist. 37, 1279-1283.

[4] Robertson, Tim (1967). "On estimating a density which is measurable with respect to a σ-lattice," Ann. Math. Statist. 38, 482-493.

[5] Wegman, E.J. (1968). "A note on estimating a unimodal density," Institute of Statistics Mimeo Series No. 599, The University of North Carolina at Chapel Hill. (Submitted to Ann. Math. Statist.)