Wegman, E.J. (1969). "Maximum likelihood estimation of a unimodal density function."

MAXIMUM LIKELIHOOD ESTIMATION OF A
UNIMODAL DENSITY FUNCTION
by
EDWARD J. WEGMAN
Department of Statistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 608
January 1969
This research arises in part from the author's doctoral dissertation submitted
to the Department of Statistics, The University of Iowa. Supported in part by
PHS Research Grant No. GM-14448-01 from the Division of General Medical Sciences
and in part by N.S.F. Development Grant GU-2059.
Maximum Likelihood Estimation of a Unimodal Density Function
Edward J. Wegman
1. Introduction. Robertson [4] has described a maximum likelihood estimate of a unimodal density when the mode is known. This estimate is represented as a conditional expectation given a σ-lattice. Discussions of such conditional expectations are given by Brunk [1] and [2].

A σ-lattice, L, of subsets of a measure space (Ω, A, λ) is a collection of subsets of Ω closed under countable unions and countable intersections and containing both ∅ and Ω. A function f is measurable with respect to L if the set, [f > a], is in L for each real a. If Ω is the real line, A is the collection of Borel sets, and λ is Lebesgue measure, let L₂ be the set of square-integrable functions and L₂(L) be those members of L₂ which are measurable with respect to L. Brunk [1] shows the following definition of conditional expectation with respect to L is suitable.
Definition. If f ∈ L₂, then g ∈ L₂(L) is equal to E(f|L), the conditional expectation of f given L, if and only if g is a real valued function such that

(1.1)    ∫ f · θ(g) dλ = ∫ g · θ(g) dλ

for every θ with θ(g) ∈ L₂ and θ(0) = 0, and

(1.2)    ∫_A (f − g) dλ ≤ 0 for each A ∈ L with 0 < λ(A) < ∞.

The collection of intervals about a fixed point, m, is a σ-lattice which we denote as L(m). A function, f, is unimodal at M by definition if f is measurable with respect to L(M). It is not difficult to see that this is equivalent to f nondecreasing at x < M and f nonincreasing at x > M. If f is unimodal at every point of an interval, I, then we call I the modal interval of f, and we shall write L(I) for the lattice of intervals containing I. Clearly, f has modal interval I if and only if f is measurable with respect to L(I). The center of I will be called the center mode.
Robertson's estimate of the unimodal density is maximum likelihood when the mode is known. The present solution is a maximum likelihood estimate when the mode is unknown. A peculiar characteristic of Robertson's estimate (and also of a related estimate found in Wegman [5]) is a "peaking" near the mode. Figure 1 is based on a Monte Carlo sample of size 90 from a triangular density. This is Robertson's maximum likelihood estimate with mode known to be ½. To eliminate the peaking, at least partially, we shall require our estimate to have a modal interval of length ε, where ε is some fixed positive number. This also will have the effect of uniformly bounding the estimate for all sample sizes by 1/ε. Figure 2 is the estimate as described in this paper. The modal interval extends from .493 to .518. In this estimate, .493 is an observation. Table 1 is the tabulation of the density drawn in Figure 1. Table 2 is the tabulation for Figure 2. The reader should be warned that these computations were carried out on a computer and rounded. Thus, in the case of Figure 2, ε is actually .0234375. Both estimates are based on a sample of size 90.
TABLE 1

Interval, I        Value of Estimate on I
(−∞, .011)         .0
[.011, .135)       .360
[.135, .248)       .490
[.248, .264)       .674
[.264, .345)       1.37
[.345, .414)       1.59
[.414, .432)       2.04
[.432, .448)       2.17
[.448, .493)       2.34
[.493, .500)       4.02
[.500, .505]       4.54
(.505, .510]       2.26
(.510, .553]       1.74
(.553, .583]       1.48
(.583, .834]       .973
(.834, .890]       .605
(.890, .915]       .427
(.915, .977]       .357
(.977, ∞)          .0
TABLE 2

Interval, I        Value of Estimate on I
(−∞, .011)         .0
[.011, .135)       .360
[.135, .248)       .490
[.248, .264)       .674
[.264, .345)       1.37
[.345, .414)       1.59
[.414, .432)       2.04
[.432, .448)       2.17
[.448, .493)       2.34
[.493, .518]       2.37
(.518, .553]       2.16
(.553, .583]       1.48
(.583, .834]       .973
(.834, .890]       .605
(.890, .915]       .427
(.915, .977]       .357
(.977, ∞)          .0
FIGURE 1. Triangular density (broken line)
together with the maximum likelihood
estimate (solid line) when the mode is
known to be .5.
FIGURE 2. Triangular density (broken line) and
the maximum likelihood estimate (solid
line) with modal interval of length
.0234375 .
2. The estimate. Let us assume Y₁ ≤ Y₂ ≤ … ≤ Yₙ is the ordered sample selected according to a unimodal density f. Let L and R be points such that λ[L, R] = R − L = ε. Let ℓ(n) and r(n) be the subscripts of the largest observation less than or equal to L and the smallest observation greater than or equal to R, respectively. If h is any estimate which has [L, R] for its modal interval, define g(y) by

    g(y) = α · h(y)  for  Y_{ℓ(n)} ≤ y ≤ Y_{r(n)},
    g(y) = 0         otherwise,

where α = [∫_{[Y_{ℓ(n)}, Y_{r(n)}]} h dλ]⁻¹. It follows that g has modal interval [L, R] and has a likelihood product no smaller than that of h. Similarly, any estimate which is to maximize the likelihood product must be constant on every open interval joining consecutive observations in the complement of [L, R], [L, R]^c. The problem then is to find an estimate which is constant on [L, R]. Recall m = ½(R + L) is the center mode and L([L, R]) is the lattice of intervals containing [L, R]. Let A₁, A₂, …, A_k denote the atoms

    A₁ = (Y₁, Y₂], …, A_{ℓ(n)} = (Y_{ℓ(n)}, L), A_{ℓ(n)+1} = [L, R], A_{ℓ(n)+2} = (R, Y_{r(n)}], …, A_k = (Y_{n−1}, Yₙ],

together with (⋃_{j=1}^k A_j)^c. Finally, let L_n([L, R]) be the intersection of L([L, R]) with the σ-field whose atoms are A₁, A₂, …, A_k, (⋃_{j=1}^k A_j)^c. An application of a theorem of Robertson [4] shows that the maximum likelihood estimate is given by

    f̂_{nm} = E(g_{nm} | L_n([L, R])),

where

    g_{nm} = (1/n) Σ_i n_i · λ(A_i)⁻¹ · I_{A_i}.

Here n_i is the number of observations in A_i and I_{A_i} is the indicator of A_i. Hence we know the form of the maximum likelihood estimate given the location of the modal interval. We wish to determine the location of the modal interval. The following lemma shows that we need only consider a finite number of locations for the modal interval.
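Numerically, the estimate just described amounts to a histogram on the atoms followed by monotone smoothing on each side of the modal atom. Below is a minimal sketch in Python, with weighted pool-adjacent-violators used as a stand-in for the conditional expectation E(g_{nm} | L_n([L, R])); the sample, seed, and interval [L, R] = [.45, .55] are illustrative assumptions:

```python
import numpy as np

def pava(v, w, increasing=True):
    """Weighted pool-adjacent-violators: least-squares monotone fit of v with weights w."""
    if not increasing:
        return pava(v[::-1], w[::-1], True)[::-1]
    out = []  # blocks of [pooled value, total weight, count]
    for vi, wi in zip(v, w):
        out.append([float(vi), float(wi), 1])
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            v2, w2, c2 = out.pop()
            v1, w1, c1 = out.pop()
            out.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    return np.array([val for val, _, cnt in out for _ in range(cnt)])

def unimodal_mle(sample, L, R):
    """Histogram g_nm on the atoms between consecutive observations (with [L, R]
    as the modal atom), then monotone smoothing on each side of [L, R]."""
    y = np.sort(sample)
    n = len(y)
    edges = np.concatenate([y[y <= L], [L, R], y[y >= R]])
    widths = np.diff(edges)
    g = np.histogram(y, bins=edges)[0] / (n * widths)   # g_nm on each atom
    mid = int(np.sum(y <= L))                           # index of the atom [L, R]
    h = g.copy()
    h[:mid + 1] = pava(h[:mid + 1], widths[:mid + 1], increasing=True)
    h[mid:] = pava(h[mid:], widths[mid:], increasing=False)
    h /= np.sum(h * widths)                             # renormalize to a density
    return edges, h

rng = np.random.default_rng(0)
sample = rng.triangular(0.0, 0.5, 1.0, 90)
edges, h = unimodal_mle(sample, 0.45, 0.55)
```

The two smoothing passes share the modal atom, so the second pass may adjust its value slightly; the exact estimate couples the two sides through the lattice, which this sketch only approximates.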
Lemma 2.1. If f̂_{nm} is the maximum likelihood estimate, we may assume that one of the endpoints of the modal interval lies in the set {Y₁, …, Yₙ}.

Proof: Suppose neither L nor R belongs to {Y₁, …, Yₙ}, but there are observations in (−∞, L), in [L, R], and in (R, ∞). We may assume f̂_{nm}(Y_{r(n)}) > f̂_{nm}(Y_{ℓ(n)}). (A similar argument holds for f̂_{nm}(Y_{r(n)}) < f̂_{nm}(Y_{ℓ(n)}).) Let

    η = min{L − Y_{ℓ(n)}, Y_{ℓ(n)+1} − L, Y_{r(n)} − R, R − Y_{r(n)−1}},

and let R′ = R + ½η and L′ = L + ½η. Define g by g = f̂_{nm} on [L′, R′]^c and, on [L′, R′], by the constant

    g(y) = ε⁻¹ · {∫_{[L′,R]} f̂_{nm} dλ + ∫_{[R,R′]} f̂_{nm}(Y_{r(n)}) dλ}.

Clearly g has modal interval [L′, R′]. To show g has a larger likelihood product than f̂_{nm}, we need only show g ≥ f̂_{nm} on [Y_{ℓ(n)+1}, Y_{r(n)−1}], and for this it suffices to compare g with f̂_{nm}(L) on [L′, R′]. Noting that f̂_{nm} = f̂_{nm}(L) on [L, R], we have on [L′, R′]

    g(y) = ε⁻¹ · {∫_{[L,R]} f̂_{nm}(L) dλ + ∫_{[R,R′]} f̂_{nm}(Y_{r(n)}) dλ − ∫_{[L,L′]} f̂_{nm}(L) dλ}.

Combining the first term with λ[L, R] = ε, and observing that both λ[R, R′] and λ[L, L′] equal ½η and that f̂_{nm}(Y_{r(n)}) > f̂_{nm}(L), it follows that g(y) > f̂_{nm}(y) for all y in [L′, R′]. This implies that g has a larger likelihood product than f̂_{nm}, contrary to the choice of f̂_{nm}. Hence either L or R must lie in {Y₁, …, Yₙ}, or one of the sets (−∞, L), [L, R], (R, ∞) contains no observations. If the latter case holds, an estimate equally likely as f̂_{nm} may clearly be defined, in a similar manner to the above, with modal interval having an endpoint in {Y₁, …, Yₙ}. This completes the proof.

If the modal interval is unspecified, there are therefore at most 2n intervals to consider, namely those of length ε with one of L or R belonging to {Y₁, …, Yₙ}. Compute the 2n estimates f̂_{nm_i}, i = 1, 2, …, 2n, corresponding to these 2n intervals, and calculate the likelihood products,

    Π_{j=1}^n f̂_{nm_i}(Y_j),  i = 1, 2, …, 2n.

Select the maximum of these 2n numbers and let f̂_n be the corresponding estimate. Let m_n be its center mode. (Notice that if 2 or more of the likelihood products are equal, we could have some ambiguity in the definition of f̂_n. Let us adopt the convention of choosing f̂_n to be the estimate with the smallest center mode.) Clearly the likelihood product of f̂_n is as large as that of any other estimate.
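The finite search over modal intervals can be sketched directly. For brevity this sketch scores each candidate with the raw atom histogram g_{nm} rather than the full conditional expectation, so it illustrates the enumeration and tie-breaking convention rather than the exact estimate; the sample, seed, and ε are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
y = np.sort(rng.triangular(0.0, 0.5, 1.0, 90))   # ordered sample
n, eps = len(y), 0.05

def log_likelihood(L, R):
    """Log likelihood product of the atom histogram g_nm for modal interval [L, R]."""
    edges = np.unique(np.concatenate([y[y <= L], [L, R], y[y >= R]]))
    dens = np.histogram(y, bins=edges)[0] / (n * np.diff(edges))
    idx = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, len(dens) - 1)
    vals = dens[idx]                              # density at each observation
    return np.log(vals).sum() if np.all(vals > 0) else -np.inf

# at most 2n candidate modal intervals: one endpoint at an observation
cands = [(yi, yi + eps) for yi in y] + [(yi - eps, yi) for yi in y]
scores = [log_likelihood(L, R) for L, R in cands]
# maximize the likelihood product; break ties by the smallest center mode
best = max(range(len(cands)), key=lambda i: (scores[i], -(cands[i][0] + cands[i][1])))
print("modal interval:", cands[best])
```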
Before closing this section, we note the following lemma showing that L_n([L, R]) may be replaced by L([L, R]).

Lemma 2.2. The maximum likelihood estimate f̂_{nm}, which is measurable with respect to L_n([L, R]), is given by E(g_{nm} | L([L, R])), where g_{nm} is as before.

Proof: Let us abbreviate, for the purpose of this lemma, g_{nm} by g and E(g_{nm} | L_n([L, R])) by ĝ. Clearly, ĝ is measurable with respect to L([L, R]). We want to show ĝ = E(g | L([L, R])). Property (1.1) is easy to verify, so let us show

    ∫_A (g − ĝ) dλ ≤ 0 for every A ∈ L([L, R]) with 0 < λ(A) < ∞.

We may assume A is a closed interval, [a, b], and suppose, to the contrary, that ∫_{[a,b]} (g − ĝ) dλ > 0. Suppose Y_j < b ≤ Y_{j+1}. Since g and ĝ are both constant on (Y_j, Y_{j+1}], either g(Y_{j+1}) ≥ ĝ(Y_{j+1}), in which case

    ∫_{[a,Y_{j+1}]} (g − ĝ) dλ = ∫_{[a,b]} (g − ĝ) dλ + ∫_{[b,Y_{j+1}]} (g − ĝ) dλ > 0,

or g(Y_{j+1}) < ĝ(Y_{j+1}), in which case ∫_{[a,Y_j]} (g − ĝ) dλ > 0. In either case, b may be replaced by an observation without destroying the positivity of the integral. In the same manner, using the constancy of g and ĝ on the interval of consecutive observations containing a, a may be replaced by an observation. Thus there are observations Y_i ≤ a and Y_{j*} ≥ b with

    ∫_{[Y_i,Y_{j*}]} (g − ĝ) dλ > 0.

But [Y_i, Y_{j*}] ∈ L_n([L, R]), which means by (1.2)

    ∫_{[Y_i,Y_{j*}]} (g − ĝ) dλ ≤ 0.

This is a contradiction, so our assumption that ∫_{[a,b]} (g − ĝ) dλ > 0 is false. A similar argument applies for the other possible positions of a and b relative to the observations. This concludes the proof.
3. Conditional expectations of the true density. In this section, we investigate the conditional expectation, E(f | L([L, R])), of the true density f with respect to the σ-lattice of intervals containing the interval [L, R]. Notice ∫ f² dλ < ∞ is a condition necessary to the definition of the conditional expectation. We shall need to require this condition in the remainder of this paper. We also will require continuity of f. It is not difficult to see that continuity together with unimodality of the density f implies this condition.
The main theorem of this section is an analogue of Theorem 3.1 in [5].

Theorem 3.1. If f is a continuous unimodal density with a unique mode M, and [L, R] is an interval with center m, let K = (R − L)⁻¹ · ∫_{[L,R]} f dλ.

i. If f(L) ≤ K and f(R) ≤ K, then E(f | L([L, R])) = f on [L, R]^c and E(f | L([L, R])) = (R − L)⁻¹ · ∫_{[L,R]} f dλ on [L, R].

ii. If f(L) > K ≥ f(R), there is an a ≤ L such that E(f | L([L, R])) = f on [a, R]^c and E(f | L([L, R])) = (R − a)⁻¹ · ∫_{[a,R]} f dλ on [a, R].

iii. If f(R) > K ≥ f(L), there is a b ≥ R such that E(f | L([L, R])) = f on [L, b]^c and E(f | L([L, R])) = (b − L)⁻¹ · ∫_{[L,b]} f dλ on [L, b].

Notice that f(L) > K and f(R) > K together are impossible, since a unimodal f attains its minimum over [L, R] at an endpoint, so that K ≥ min{f(L), f(R)}. Thus cases i, ii, and iii exhaust all possibilities.

Proof: Case i is easily verified, and case iii follows in a manner similar to case ii, so let us consider case ii. Let L′ = min{L, M} and let

    a = sup{y ≤ L′: (R − y)⁻¹ · ∫_{[y,R]} f dλ ≥ f(y)}.

For y sufficiently small, f(y) ≤ (R − y)⁻¹ · ∫_{[y,R]} f dλ; hence the set defining a is not empty, and a, indeed, exists. Let f* be given by

    f* = (R − a)⁻¹ · ∫_{[a,R]} f dλ on [a, R], and f* = f on [a, R]^c.

By the continuity of f, (R − a)⁻¹ · ∫_{[a,R]} f dλ = f(a), so that in particular ∫_{[a,R]} (f − f*) dλ = 0; and since f(R) ≤ f(a), f* is measurable with respect to L([L, R]). We wish to show f* = E(f | L([L, R])).

Let a′ be the point of (a, R) at which f crosses the constant level f(a): since f is unimodal, f − f* ≥ 0 on (a, a′) and f − f* ≤ 0 on (a′, R). Now let A ∈ L([L, R]) with 0 < λ(A) < ∞. We may assume A is a closed interval, [b₁, b₂], with b₁ ≤ L and b₂ ≥ R. Since f = f* off [a, R],

(3.1)    ∫_A (f − f*) dλ = ∫_{[max(a,b₁),R]} (f − f*) dλ.

(3.2)    If b₁ ≤ a, then ∫_{[a,R]} (f − f*) dλ = 0.

(3.3)    If a < b₁ ≤ a′, then since f − f* ≥ 0 on (a, b₁),

    ∫_{[b₁,R]} (f − f*) dλ = −∫_{[a,b₁]} (f − f*) dλ ≤ 0.

(3.4)    If a′ < b₁, then since f − f* ≤ 0 on (a′, R), ∫_{[b₁,R]} (f − f*) dλ ≤ 0.

Combining (3.1) with (3.2), (3.3), or (3.4), we have ∫_A (f − f*) dλ ≤ 0 for every A ∈ L([L, R]) with 0 < λ(A) < ∞.

Now let θ be any Borel function for which θ(0) = 0. Since f − f* vanishes off [a, R] and f* is constant on [a, R],

    ∫ (f − f*) · θ(f*) dλ = θ(f(a)) · ∫_{[a,R]} (f − f*) dλ = 0.

Thus f* = E(f | L([L, R])). This completes the proof of case ii.

Notice that if f(L) = K ≥ f(R), cases i and ii appear to overlap. In this case, a = L and ∫_{[a,R]} (f − f*) dλ = 0, so the two characterizations agree, so that there is no contradiction.
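Case ii of Theorem 3.1 can be checked numerically for the triangular density used in the figures; the interval [L, R] = [.7, .9] below is an illustrative assumption:

```python
import numpy as np

def f(x):
    """Triangular density on [0, 1] with mode M = 0.5 (the density of the figures)."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0.5, 4 * x, 4 * (1 - x))

xs = np.linspace(0.0, 1.0, 20001)
dx = xs[1] - xs[0]
L, R = 0.7, 0.9                                  # an interval to the right of the mode
K = float(f(xs[(xs >= L) & (xs <= R)]).mean())
print(bool(f(L) > K >= f(R)))                    # case ii applies

# a = sup{y <= L': (R - y)^(-1) * integral of f over [y, R] >= f(y)}, L' = min{L, M}
F = np.cumsum(f(xs)) * dx                        # approximate antiderivative of f
def avg(y):
    i, j = int(round(y / dx)), int(round(R / dx))
    return (F[j] - F[i]) / (R - y)

Lp = min(L, 0.5)
ys = xs[xs < Lp]
a = float(ys[np.array([avg(v) >= f(v) for v in ys])].max())
print(round(a, 3))                               # ≈ 0.334
```

At the computed a, the pooled value (R − a)⁻¹ · ∫_{[a,R]} f dλ coincides with f(a), which is the continuity property used in the proof.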
4. Convergence of f̂_{nm_n}. We shall show the consistency of the density estimate based on a certain consistency of the center mode.
Let

    Ω′ = [lim_{n→∞} sup_y |F_n(y) − F(y)| = 0],

where F is the distribution corresponding to the density f and F_n is the empirical distribution based on a sample of size n from F. It is well known that this set has probability one. In addition, for points in Ω′:

1. The largest observation less than and the smallest observation greater than a number converge to that number.

2. Corresponding to every pair of numbers, r₁ < r₂, in the support of f, {x: f(x) > 0}, there is eventually a pair of observations Y_{r₁(n)} and Y_{r₂(n)} satisfying r₁ < Y_{r₁(n)} < Y_{r₂(n)} < r₂.

Recall that the maximum likelihood estimate with center mode m is given by f̂_{nm} = E(g_{nm} | L_n([L, R])), where g_{nm}, L_n([L, R]), L and R are as in Section 2. Suppose Y₀ is a point in the support of f and let t = f̂_{nm}(Y₀). Let P_t = [f̂_{nm} > t] and T_t = [f̂_{nm} ≥ t]. A result of Robertson [3] gives us

(4.1)    t ≤ [λ(T_t − L̃)]⁻¹ · ∫_{T_t − L̃} g_{nm} dλ for every L̃ ∈ L_n([L, R]) with λ(T_t − L̃) > 0,

and

(4.2)    t ≥ [λ(L̃ − P_t)]⁻¹ · ∫_{L̃ − P_t} g_{nm} dλ for every L̃ ∈ L_n([L, R]) with λ(L̃ − P_t) > 0.

Robertson's theorem is stated for finite measure spaces. We restrict our space to [Y₁, Yₙ], where Y₁ ≤ … ≤ Yₙ is the ordered sample.

Theorem 4.1. Let f be a continuous unimodal density with unique mode M, let ε > 0, and let m_n be a sequence converging to m, not necessarily equal to M. Let f̂_{nm_n} be the maximum likelihood estimate with center mode m_n, and let f_m = E(f | L([L, R])), with L = m − ½ε and R = m + ½ε. Then for points in Ω′, f̂_{nm_n} converges pointwise to f_m except possibly at L or R or both.
Proof: The proof varies depending upon which representation in Theorem 3.1 holds. If the first characterization holds, we have f_m = (R − L)⁻¹ · ∫_{[L,R]} f dλ on [L, R] and f_m = f on [L, R]^c. If Y₀ < L or Y₀ > R, we use methods quite analogous to those found in Robertson [4] or Wegman [5]. Let us turn our attention to L < Y₀ < R, and let t = f̂_{nm_n}(Y₀). Let η be any positive number. For sufficiently large n, Y₀ ∈ [L_n, R_n] and [L − η, R + η] ∈ L_n([L_n, R_n]), so that for sufficiently large n,

    f̂_{nm_n}(Y₀) ≥ (R − L + 2η)⁻¹ · ∫_{[L−η,R+η]} g_{nm_n} dλ.

If n* is the number of observations in [L − η, R + η], it is not difficult to see

    (n* − 1)/n ≤ ∫_{[L−η,R+η]} g_{nm_n} dλ ≤ (n* + 2)/n.

Let us write F_n(x−) for lim_{y↑x} F_n(y). Then we may rewrite the above inequality as

    F_n(R + η) − F_n(L − η −) − 1/n ≤ ∫_{[L−η,R+η]} g_{nm_n} dλ ≤ F_n(R + η) − F_n(L − η −) + 2/n.

Hence we have

    lim_{n→∞} ∫_{[L−η,R+η]} g_{nm_n} dλ = F(R + η) − F(L − η −) = ∫_{[L−η,R+η]} f dλ.

From this,

    lim inf f̂_{nm_n}(Y₀) ≥ [R − L + 2η]⁻¹ · ∫_{[L−η,R+η]} f dλ.

But this is true for all η.

On the other hand, T_t = [f̂_{nm_n} ≥ t] = [Y_{i(n)}, Y_{j(n)}] is an interval. Clearly lim sup Y_{i(n)} ≤ L and lim inf Y_{j(n)} ≥ R. We wish to show lim_{n→∞} Y_{i(n)} = L. To see this, it is necessary to show lim inf Y_{i(n)} ≥ L. If this is not true, there is a subsequence, which we shall again denote Y_{i(n)} for simplicity, and a positive constant, η, such that Y_{i(n)} ≤ L − η. For the moment, f̂_{nm_n} will mean the subsequence corresponding to Y_{i(n)}. Since f̂_{nm_n} is constant on T_t, t = f̂_{nm_n}(Y₀) = f̂_{nm_n}(L − η) eventually, and since f̂_{nm_n}(L − η) converges to f_m(L − η), we have

    lim sup f̂_{nm_n} = f_m(L − η) eventually on [L − η, R).

Eventually, [L − η, R + η] ∈ L_n([L_n, R_n]), so that by (1.2)

    ∫_{[L−η,R+η]} g_{nm_n} dλ ≤ ∫_{[L−η,R+η]} f̂_{nm_n} dλ.

By Fatou's lemma,

    lim sup ∫_{[L−η,R+η]} f̂_{nm_n} dλ ≤ ∫_{[L−η,R+η]} lim sup f̂_{nm_n} dλ ≤ ∫_{[L−η,R+η]} f_m(L − η) dλ.

We have just seen

    lim_{n→∞} ∫_{[L−η,R+η]} g_{nm_n} dλ = ∫_{[L−η,R+η]} f dλ,

so

(4.3)    ∫_{[L−η,R+η]} f dλ ≤ ∫_{[L−η,R+η]} f_m(L − η) dλ.

But [L, R] must contain M, since otherwise the characterization of case i in Theorem 3.1 could not hold. Since f has unique mode M, f exceeds f_m(L − η) on some neighborhood of M. Hence (4.3) cannot hold, so that lim_{n→∞} Y_{i(n)} = L. Similarly, Y_{j(n)} converges to R. Since f̂_{nm_n} is constant on T_t, writing

    f̂_{nm_n}(Y₀) = (Y_{j(n)} − Y_{i(n)})⁻¹ · ∫_{[Y_{i(n)},Y_{j(n)}]} g_{nm_n} dλ = (Y_{j(n)} − Y_{i(n)})⁻¹ · [F_n(Y_{j(n)}) − F_n(Y_{i(n)}) + O(1/n)],

it is clear that

    lim_{n→∞} f̂_{nm_n}(Y₀) = (R − L)⁻¹ · (F(R) − F(L)) for L < Y₀ < R.

This completes the proof in the case that characterization i of Theorem 3.1 is the characterization of E(f | L([L, R])). The other cases may be proven in a similar manner.
We are easily able to upgrade the convergence in this case by applying methods similar to those of the Glivenko-Cantelli Theorem.

Corollary 4.1. If the conditions of Theorem 4.1 hold, then f̂_{nm_n} converges uniformly except on an interval of arbitrarily small measure containing L or R, or the union of two such intervals. This convergence holds for all points in Ω′, hence with probability one.
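The set Ω′ underlying Theorem 4.1 and Corollary 4.1 is the Glivenko-Cantelli event, and its content is easy to observe numerically. A sketch for the triangular density (sample sizes and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# CDF of the triangular density on [0, 1] with mode 0.5
F = lambda x: np.where(x < 0.5, 2 * x**2, 1 - 2 * (1 - x)**2)

sups = []
for n in (10, 100, 1000, 10000):
    y = np.sort(rng.triangular(0.0, 0.5, 1.0, n))
    # sup_y |F_n(y) - F(y)| is attained just before or at an order statistic
    hi = np.abs(np.arange(1, n + 1) / n - F(y)).max()
    lo = np.abs(np.arange(0, n) / n - F(y)).max()
    sups.append(float(max(hi, lo)))
    print(n, round(sups[-1], 4))
```

The supremum shrinks roughly like n^(−1/2), which is the uniform convergence of F_n to F that defines Ω′.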
5. A convergence theorem for the center mode. Recall from Section 2 that the maximum likelihood estimate f̂_n is given by a conditional expectation with respect to a lattice containing a modal interval, where m_n is the center mode; let L_n = m_n − ½ε and R_n = m_n + ½ε. It is desired to show that the sequence, {m_n}, has a limit m. Roughly speaking, we shall accomplish this by showing that there is an m for which ∫ log f_m · f dλ is maximized. If m_n fails to converge to that m, then we can pick another estimate which has a larger likelihood product eventually. This would be a contradiction to the choice of f̂_n.

Lemma 5.1. Let H(m) = ∫ log f_m · f dλ, where f_m = E(f | L([L, R])) with L = m − ½ε and R = m + ½ε. Then H(m) is unimodal.

Proof: The real line may be divided into three regions, corresponding to the three different characterizations of f_m. Let

    M₁ = {m: f(L) ≥ (R − L)⁻¹ · ∫_{[L,R]} f dλ > f(R)},
    M₂ = {m: f(L) < (R − L)⁻¹ · ∫_{[L,R]} f dλ ≤ f(R)},

and let m⁻ = sup M₂ and m⁺ = inf M₁. Clearly m⁻ ≤ M ≤ m⁺, where M is the mode of f. It is straightforward to show that f_m is characterized by i of Theorem 3.1 for each m such that m⁻ ≤ m ≤ m⁺, and that f_m is characterized as in iii and ii of Theorem 3.1 for m ≤ m⁻ and m ≥ m⁺ respectively. If we can show H(m) is nondecreasing for m ≤ m⁻, nonincreasing for m ≥ m⁺, and unimodal for m⁻ ≤ m ≤ m⁺, this will be sufficient.

Let us consider m < m* ≤ m⁻, so that H(m*) − H(m) = ∫ log(f_{m*}/f_m) · f dλ. Let L, L*, b and b* have their obvious meanings. Since

    (y − L)⁻¹ · ∫_{[L,y]} f dλ ≤ (y − L*)⁻¹ · ∫_{[L*,y]} f dλ,

it follows that if (y − L)⁻¹ · ∫_{[L,y]} f dλ − f(y) ≥ 0, so is (y − L*)⁻¹ · ∫_{[L*,y]} f dλ − f(y) ≥ 0. From a consideration of the sets defining b and b*, we may conclude b* ≤ b. Hence both f_{m*} and f_m are constant on [L*, b*]. From this, we have

    ∫ log(f_{m*}/f_m) · f dλ = ∫_{[L*,b*]^c} log(f_{m*}/f_m) · f dλ + log[f_{m*}(L*)/f_m(L*)] · ∫_{[L*,b*]} f dλ.

But ∫_{[L*,b*]} f dλ = ∫_{[L*,b*]} f_{m*} dλ, and since f_{m*} = f on [L*, b*]^c,

    ∫ log(f_{m*}/f_m) · f dλ = ∫ log(f_{m*}/f_m) · f_{m*} dλ.

The latter quantity is the Kullback-Leibler information number and is nonnegative. Hence H(m*) ≥ H(m). A similar proof holds for m ≥ m⁺.

Let us turn our attention to m⁻ ≤ m ≤ m⁺. Let η(t) = t log t for t > 0. We may expand H(m) as follows:

    H(m) = ∫_{[L,R]^c} log f · f dλ + log[ε⁻¹ · ∫_{[L,R]} f dλ] · ∫_{[L,R]} f dλ.

Recalling L = m − ½ε and R = m + ½ε and differentiating with respect to m, we obtain

    H′(m) = η′(ε⁻¹ · ∫_{[L,R]} f dλ) · [f(R) − f(L)] − [η(f(R)) − η(f(L))].

Rewriting this,

    H′(m) = [f(R) − f(L)] · {η′(ε⁻¹ · ∫_{[L,R]} f dλ) − [η(f(R)) − η(f(L))]/[f(R) − f(L)]}.

Noticing that f(R) and f(L) do not exceed ε⁻¹ · ∫_{[L,R]} f dλ in this region and that η′(t) is an increasing function, we see that the quantity in braces is the slope of a chord subtracted from that of a tangent taken at a larger argument, and hence is nonnegative. It is clear in this case that if f(R) − f(L) ≥ 0, then H′(m) ≥ 0; correspondingly, if f(R) − f(L) ≤ 0, then H′(m) ≤ 0. Hence H(m) is unimodal.

Clearly there need not be a unique m for which f(R) = f(L); H will have a modal interval, namely {m: f(R) = f(L)}. If f is symmetric, then it is clear that the mode of H is the mode of f, that is, the mode of H is M. In any case, since M − ½ε and M + ½ε are lower and upper bounds respectively on {m: f(R) = f(L)}, it is clear that the difference between the modes of H and of f is less than ½ε. Let us assume, in general, that H has a unique mode. In order to avoid cumbersome details, let us assume f is strictly increasing at x for x < M and strictly decreasing at x for x > M; an obvious modification exists if there are some "plateaus" in f. If {m_n} is the sequence of center modes of {f̂_n}, we wish to show {m_n} converges to the mode of H.
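The unimodality of H and the location of its mode can be examined numerically. The sketch below computes f_m by growing the pooled interval until its mean dominates f at both endpoints (a numerical device consistent with the three cases of Theorem 3.1, not the paper's algorithm), then evaluates H on a grid for the symmetric triangular density; the grid sizes and ε = .1 are illustrative assumptions:

```python
import numpy as np

eps = 0.1
xs = np.linspace(0.0, 1.0, 4001)
dx = xs[1] - xs[0]

def f(x):
    """Symmetric triangular density on [0, 1] with mode M = 0.5."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0.5, 4 * x, 4 * (1 - x)) * ((x >= 0) & (x <= 1))

def f_m(m):
    """E(f | L([L,R])): grow the pooled interval [a, b] outward from [L, R]
    until the pooled mean dominates f at both endpoints."""
    a, b = m - eps / 2, m + eps / 2
    while True:
        K = f(xs[(xs >= a) & (xs <= b)]).mean()
        if f(a) > K + 1e-12:
            a -= dx
        elif f(b) > K + 1e-12:
            b += dx
        else:
            break
    vals = f(xs).copy()
    vals[(xs >= a) & (xs <= b)] = K
    return vals

def H(m):
    fx, fm = f(xs), f_m(m)
    mask = fx > 0
    return float(np.sum(np.log(fm[mask]) * fx[mask]) * dx)

ms = np.linspace(0.2, 0.8, 61)
Hs = [H(m) for m in ms]
print(round(float(ms[int(np.argmax(Hs))]), 2))   # the mode of H; near M = 0.5 here
```

For this symmetric density, the grid maximizer of H sits at the mode of f, in agreement with the remark following Lemma 5.1.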
The next lemma applies in the case that the support of f is (−∞, ∞).

Lemma 5.2. If the entropy, −∫ log f · f dλ, of f is finite, then with probability one

    −∞ < lim inf m_n ≤ lim sup m_n < ∞.

Proof: Suppose lim sup m_n = ∞. Then there is a subsequence (which we will also label {m_n}) which diverges to ∞. Let γ be chosen in (0, 1) and choose x₀ so that 1 − F(x₀) > γ. As soon as m_n − x₀ > ½ε, f̂_n is nondecreasing on (−∞, m_n − ½ε), so that f̂_n ≥ f̂_n(x₀) on (x₀, m_n − ½ε) and hence

    1 ≥ f̂_n(x₀) · (m_n − ½ε − x₀).

If f̂_n(x₀) exceeded some positive number infinitely often, this would be impossible for sufficiently large n, since m_n diverges. Hence f̂_n converges uniformly to zero on (−∞, x₀].

Let us write

    L(f̂_n) = n⁻¹ Σ_{i=1}^n log(f̂_n(Y_i)) = n⁻¹ Σ₁ log(f̂_n(Y_i)) + n⁻¹ Σ₂ log(f̂_n(Y_i)),

where Σ₁ is the summation over observations less than or equal to x₀ and Σ₂ is the summation over the remainder. Since f̂_n is bounded by ε⁻¹, the second term is bounded above. On the other hand, since f̂_n converges uniformly to zero for x ≤ x₀, log(f̂_n(x)) diverges uniformly to −∞ for x ≤ x₀, and since the fraction of observations less than or equal to x₀ converges to F(x₀) > 0,

    lim sup n⁻¹ Σ₁ log(f̂_n(Y_i)) = −∞.

Finally, we may write lim sup L(f̂_n) = −∞.

Let us consider f_{m′} = E(f | L([L, R])), where L = m′ − ½ε and R = m′ + ½ε for some fixed m′. Clearly, by the Kolmogorov Strong Law, with probability one

    lim_{n→∞} L(f_{m′}) = ∫ log f_{m′} · f dλ.

This quantity is finite since the entropy is finite. Hence

    lim inf [L(f_{m′}) − L(f̂_n)] = +∞.

This is equivalent to saying that, eventually, f_{m′} is more likely than f̂_n, which is a contradiction to the choice of f̂_n. Hence lim sup m_n < ∞ with probability one. A similar argument holds for the other inequality.
It may be that the support is bounded below, bounded above or both. Let α = inf{x: f(x) > 0} and β = sup{x: f(x) > 0}. Using methods similar to those found in Lemma 5.2, we may obtain

Lemma 5.3. If ε < min{M − α, β − M} and the entropy of the continuous unimodal density f is finite, then with probability one

    α + ½ε ≤ lim inf m_n ≤ lim sup m_n ≤ β − ½ε.

(Here let us permit α = −∞ and β = +∞.)

Now let m′ be any cluster point of {m_n}, so that α + ½ε ≤ m′ ≤ β − ½ε, and let f_{m′} = E(f | L([L′, R′])) with L′ = m′ − ½ε and R′ = m′ + ½ε. By Theorem 3.1, f_m and f_{m′} each agree with f on the complement of some interval; we write f_m = f_{m′} = f on [c, d]^c, where c is the smallest endpoint of those two intervals and d is the largest. Moreover, since both m and m′ lie in [α + ½ε, β − ½ε] (recall the bounds on m′ and m), c and d are in the support of f. We may pick η > 0 sufficiently small so that c − η and d + η are in the support of f.

In the next lemma, when we write f̂_n we shall mean that subsequence of the sequence of maximum likelihood estimates for which the center modes converge to m′. By f*_n we shall mean the maximum likelihood estimate whose center mode is m.

Lemma 5.4. With probability one, for sufficiently large n, f*_n = f̂_n on (c − η, d + η)^c.
Proof: The set of probability one is Ω′ as described in Section 4. Pick t₀ in (c − η, c) and let δ ∈ (0, c − t₀). Write ĝ_n for the function g_{nm_n} of Section 2 and g*_n for the corresponding function for the modal interval [m − ½ε, m + ½ε]. Since f has a point of increase in (t₀, c − δ), f̂_n and f*_n must both eventually have a jump in (t₀, c − δ). Without loss of generality, we may assume f̂_n has a jump at some observation Y_i in (t₀, c − δ). Let Y_{i*} be the smallest observation greater than Y_i for which f̂_n has a jump, and let Y_{j*} be the largest observation smaller than or equal to Y_i for which f̂_n has a jump. Letting t = f̂_n(Y_i) and using (4.1), we have

(5.1)    f̂_n(Y_i) ≤ (Y_{i*} − Y_i)⁻¹ · ∫_{[Y_i,Y_{i*}]} ĝ_n dλ.

In a similar manner, using (4.2), we can show

(5.2)    f̂_n(Y_i) ≥ (Y_i − Y_{j*})⁻¹ · ∫_{[Y_{j*},Y_i]} ĝ_n dλ.

As soon as m_n − ½ε > c − δ and m − ½ε > c − δ, ĝ_n and g*_n agree to the left of c − δ; this is easily seen, since m_n converges to m′. This together with (5.1) and (5.2) shows

(5.3)    (Y_i − Y_{j*})⁻¹ · ∫_{[Y_{j*},Y_i]} g*_n dλ ≤ f̂_n(Y_i) ≤ (Y_{i*} − Y_i)⁻¹ · ∫_{[Y_i,Y_{i*}]} g*_n dλ.

Let us now consider f*_n at t₀. Recall that the definition of f*_n involves the conditional expectation of g*_n with respect to L_n([m − ½ε, m + ½ε]). Since there are only a finite number of sets in this lattice, the supremum in (4.2) is really a maximum. Let t = f*_n(t₀) and P_t = [f*_n > t], and let L̃ = [u, v] be the member of the lattice such that

    f*_n(t₀) = [λ(L̃ − P_t)]⁻¹ · ∫_{L̃−P_t} g*_n dλ.

Rewriting this with L̃ − P_t = [u, Y_i] ∪ [Y_{k*}, v],

    f*_n(t₀) = (Y_i − u + v − Y_{k*})⁻¹ · ∫_{[u,Y_i] ∪ [Y_{k*},v]} g*_n dλ.

But by (4.2),

    f*_n(t₀) ≥ (v − Y_{k*})⁻¹ · ∫_{[Y_{k*},v]} g*_n dλ.

Combining these two displays, we obtain

    f*_n(t₀) ≤ (Y_i − u)⁻¹ · ∫_{[u,Y_i]} g*_n dλ.

But also by (4.2),

    f*_n(t₀) ≥ (Y_i − u)⁻¹ · ∫_{[u,Y_i]} g*_n dλ.

Hence, for sufficiently large n,

(5.4)    f*_n(t₀) = sup_{u < Y_i} {(Y_i − u)⁻¹ · ∫_{[u,Y_i]} g*_n dλ}.

Similarly, for sufficiently large n,

(5.5)    f̂_n(t₀) = sup_{u < Y_i} {(Y_i − u)⁻¹ · ∫_{[u,Y_i]} ĝ_n dλ}.

Since ĝ_n and g*_n eventually agree in this region, f̂_n(t₀) and f*_n(t₀) must be equal eventually. The same argument applies at any point of (c − η, c), and, by symmetry, on (d, d + η) and beyond; by use of (4.1) and (4.2) once more, we obtain the desired conclusion.
We may now state and prove the theorem.

Theorem 5.1. If ε < min{M − α, β − M} and the entropy of the continuous unimodal density f is finite, then with probability one

    lim_{n→∞} m_n = m,

where m is the mode of H.

Proof: If not, a subsequence of {m_n} (which we again label {m_n}) converges to some m′ ≠ m, with α + ½ε ≤ m′ ≤ β − ½ε. Let f_{m′} = E(f | L([m′ − ½ε, m′ + ½ε])) and f_m = E(f | L([m − ½ε, m + ½ε])). Let f*_n be the maximum likelihood estimate when the mode is known to occur at m and let f̂_n be the maximum likelihood estimate when the mode is unknown. We intend to show that, with probability one, f*_n eventually has a larger likelihood product than f̂_n. This will be a contradiction, so that {m_n} converges to m. (It is understood that f̂_n and f*_n refer to the subsequences corresponding to the subsequence {m_n} converging to m′.) Recall that f = f_m = f_{m′} on [c, d]^c, and let η > 0 be sufficiently small so that c − η and d + η are in the support of f.

By Lemma 5.4, for sufficiently large n, f*_n and f̂_n agree on (c − η, d + η)^c with probability one. Let k = min{f(c − η), f(d + η)}. Because f̂_n converges to f_{m′} almost uniformly with probability one and f_{m′} ≥ k on [c − η, d + η], f̂_n > ½k eventually on [c − η, d + η]; similarly for f*_n, which converges to f_m. Thus log(f̂_n) converges to log(f_{m′}) almost uniformly with probability one, and log(f*_n) converges to log(f_m) almost uniformly with probability one. Let δ > 0, and pick the set A of arbitrarily small measure so that P(A) · log(kε/2) > −δ. Now, if Y₁, …, Yₙ are the ordered observations, let

    L(f*_n) − L(f̂_n) = n⁻¹ Σ_{i=1}^n {log(f*_n(Y_i)) − log(f̂_n(Y_i))}.

For sufficiently large n, f*_n and f̂_n agree on (c − η, d + η)^c, so

    L(f*_n) − L(f̂_n) = n⁻¹ Σ₁ {log(f*_n(Y_i)) − log(f̂_n(Y_i))},

where Σ₁ is the summation over Y_i in [c − η, d + η]. On [c − η, d + η], the estimates eventually lie between ½k and ε⁻¹, so each logarithmic difference is at least log(kε/2). If Σ₂ is the summation over Y_i in [c − η, d + η] ∩ A, and n* is the number of such observations, then, since n*/n converges to the probability of the corresponding set with probability one,

    n⁻¹ Σ₂ {log(f*_n(Y_i)) − log(f̂_n(Y_i))} > −2δ eventually.

If Σ₃ is the summation over Y_i in [c − η, d + η] − A, then by the uniform convergence properties of log(f*_n) and log(f̂_n) on [c − η, d + η] − A, we have, eventually with probability one,

    n⁻¹ Σ₃ {log(f*_n(Y_i)) − log(f̂_n(Y_i))} ≥ n⁻¹ Σ₃ {log(f_m(Y_i)) − log(f_{m′}(Y_i))} − δ.

Since f = f_m = f_{m′} on [c − η, d + η]^c ⊂ [c, d]^c, the corresponding terms off [c − η, d + η] vanish, so that with probability one and for sufficiently large n,

    L(f*_n) − L(f̂_n) ≥ n⁻¹ Σ_{i=1}^n {log(f_m(Y_i)) − log(f_{m′}(Y_i))} − 3δ.

By the Kolmogorov Strong Law, with probability one

    lim_{n→∞} (L(f*_n) − L(f̂_n)) ≥ ∫ log(f_m) · f dλ − ∫ log(f_{m′}) · f dλ − 3δ.

Since m is the unique mode of H, ∫ log(f_m) · f dλ > ∫ log(f_{m′}) · f dλ, and since δ was arbitrary, with probability one

    lim (L(f*_n) − L(f̂_n)) > 0.

This is equivalent to f*_n eventually having a larger likelihood product than f̂_n, which is a contradiction. Hence, with probability one, {m_n} converges to m.
6. Summary. For ε > 0, we have given a maximum likelihood estimate of a unimodal density. This estimate is a strongly consistent estimate of f_m = E(f | L([m − ½ε, m + ½ε])), where m is the mode of H(m) = ∫ log(f_m) · f dλ. Of course, f_m = f except on an interval of length ε. If ε = 0, the same method yields a maximum likelihood estimate. The lack of a uniform bound, however, causes the consistency arguments described in this paper to fail. Hence the asymptotic behavior of this sort of estimate is unknown. Wegman [5] describes a consistency argument for a related continuous estimate; the analogue for f̂_n is not difficult.

7. Acknowledgements. I wish to thank Professor Tim Robertson for suggesting the topic from which this work arose and for his guidance as my thesis advisor at the University of Iowa.
REFERENCES

[1] Brunk, H.D. (1963). "On an extension of the concept of conditional expectation," Proc. Amer. Math. Soc. 14, 298-304.

[2] Brunk, H.D. (1965). "Conditional expectation given a σ-lattice and applications," Ann. Math. Statist. 36, 1339-1350.

[3] Robertson, Tim (1966). "A representation for conditional expectations given σ-lattices," Ann. Math. Statist. 37, 1279-1283.

[4] Robertson, Tim (1967). "On estimating a density which is measurable with respect to a σ-lattice," Ann. Math. Statist. 38, 482-493.

[5] Wegman, E.J. (1968). "A note on estimating a unimodal density," Institute of Statistics Mimeo Series No. 599, The Univ. of North Carolina at Chapel Hill. (Submitted to Ann. Math. Statist.)