Carroll, R.J. (1975). On sequential density estimation.

ON SEQUENTIAL DENSITY ESTIMATION

by

Raymond J. Carroll
Department of Statistics
University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series #1025
August, 1975

SUMMARY
We consider the problem of sequential estimation of a density function $f$ at a point $x_0$ which may be known or unknown. Let $\{T_n\}$ be a sequence of estimators of $x_0$. For two classes of density estimators of $f$, namely the kernel estimates and a recursive modification of these, we show that if $N(d)$ is a sequence of integer-valued random variables and $n(d)$ a sequence of constants with $N(d)/n(d) \to 1$ in probability as $d \to 0$, then $\hat f_{N(d)}(T_{N(d)}) - f(x_0)$ is asymptotically normally distributed (when properly normed). We also propose two new classes of stopping rules based on the ideas of fixed-width interval estimation and show that for these rules, $N(d)/n(d) \to 1$ almost surely and $EN(d)/n(d) \to 1$ as $d \to 0$. One of the stopping rules is itself asymptotically normally distributed when properly normed and yields a confidence interval for $f(x_0)$ of fixed width and prescribed coverage probability.
1. Introduction
While there have been many papers on the estimation of a probability
density function
f(x)
(see Wegman (1972)), the literature on sequential
density estimation is relatively small.
Srivastava (1973) notes that one
often takes as many observations as possible in a certain time period, so
that the number of observations is random.
Davies and Wegman (1975) were
interested in developing sequential rules which satisfy a certain error control.
In this paper we provide a treatment of the asymptotic distributions
of two types of density estimators when the number of observations is random;
the estimators considered are the kernel estimators (Rosenblatt (1956), Parzen
(1962)) and a variant of these due to Yamato (1971).
We then develop two new classes of sequential stopping rules for estimating $f(x)$ and obtain their precise asymptotic behavior; one of the stopping rules actually yields a confidence interval for $f(x)$ of fixed width and prescribed coverage probability.
Specifically, we focus our attention on the following problem. We are interested in estimating $f(x_0)$, where $x_0$ may be known or unknown; an example of the latter is the case where $x_0$ is the population median or mode. Typically, there will be a sequence of estimators of $x_0$, say $\{T_n\}$. Suppose $N(d)$ is a sequence of integer-valued random variables (i.e., stopping rules) and $n(d)$ is a sequence of constants for which $N(d)/n(d) \to 1$ in probability as $d \to 0$.
It is known that in many cases, if the density estimators are denoted by $\hat f_n(x)$, then for some sequence $\{\varepsilon_n\}$ decreasing to zero, as $n \to \infty$,

(1.1a) $(n\varepsilon_n)^{1/2}\bigl(\hat f_n(x_0) - E\hat f_n(x_0)\bigr)$

(1.1b) $(n\varepsilon_n)^{1/2}\bigl(\hat f_n(x_0) - f(x_0)\bigr)$

converge in distribution to a normal random variable.
The first two sections of this paper are concerned with finding conditions under which both

(1.2a) $(N(d)\varepsilon_{N(d)})^{1/2}\bigl(\hat f_{N(d)}(T_{N(d)}) - E\hat f_{N(d)}(T_{N(d)})\bigr)$

(1.2b) $(N(d)\varepsilon_{N(d)})^{1/2}\bigl(\hat f_{N(d)}(T_{N(d)}) - f(x_0)\bigr)$

still converge in distribution to a normal random variable.
Srivastava (1973) attempts to show for the kernel estimators that

(1.3) $(N(d)\varepsilon_{N(d)})^{1/2}\bigl(\hat f_{N(d)}(x_0) - f(x_0)\bigr)$

is asymptotically normally distributed, but the proof of his Theorem 4.1 contains a number of errors; for example, the statement after equation (4.9) is not correct (take the kernel to be uniform on the interval $[-\tfrac12,\tfrac12]$).
Wegman
and Davies (1975) have shown the asymptotic normality of (1.3) for the Yamato
(1971) recursive estimators indirectly by use of an almost sure invariance
principle.
Hence, the asymptotic distributions of (1.2a) and (1.2b) are still unsolved problems, while that of (1.3) is unresolved for the kernel estimators.
In Sections 2 and 3 we obtain the necessary results by means of
the theory of weak convergence (Billingsley (1968)) and a random change of time
argument.
Incidentally, the approach yields the asymptotic normality of (1.3)
for both types of estimators in a general and reasonably straightforward manner, although it is much harder to obtain asymptotic normality for (1.2a)
and (1.2b).
In the final two sections of the paper, we propose two new classes of stopping rules $N(d)$, both based on the ideas of fixed-width interval estimation. Davies and Wegman (1975) have proposed one class of stopping rules,
shown that they stop with probability one, and investigated the existence
of moments, but the exact large sample behavior of their rules is unknown.
For both classes we propose, we find sequences of constants $n(d)$ for which

(1.4) $N(d)/n(d) \to 1$ almost surely as $d \to 0$,

(1.5) $EN(d)/n(d) \to 1$ as $d \to 0$.
This is precise information about the stopping rules and gives the user an
idea of the approximate number of observations to be taken.
Interestingly enough, one of the stopping rules yields a confidence interval for $f(x_0)$ of fixed width and prescribed coverage probability, and we have been able to show the asymptotic normality of this stopping rule itself.
2. Kernel Estimators
In this section we investigate the asymptotic normality under random
sample sizes of estimates of the density using the kernel estimates due to
Rosenblatt (1956) and Parzen (1962).
The kernel estimator of the density $f(x)$ is given by

$\hat f_n(x) = (n\varepsilon_n)^{-1}\sum_{i=1}^{n} K\bigl(\varepsilon_n^{-1}(x - X_i)\bigr) ,$
where the kernel $K$ is a bounded density function and $\{\varepsilon_n\}$ is a sequence of constants decreasing to zero. We wish to estimate $f(x_0)$, where $x_0$ is an unknown point; we will thus assume the existence of a sequence of estimators $\{T_n\}$ of $x_0$. For example, if $x_0$ were the population median, $T_n$ would be taken as the sample median.
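Although the paper treats $\hat f_n$ abstractly, the estimator itself is one line of arithmetic. The sketch below is ours, not the paper's: it assumes a Gaussian kernel (a bounded density, as required above) and the bandwidth choice $\varepsilon_n = n^{-1/5}$, and evaluates $\hat f_n$ at a known point $x_0 = 0$ for standard normal data.

```python
import numpy as np

def kernel_estimate(x, sample, eps):
    """f_n(x) = (n * eps)^{-1} * sum_i K(eps^{-1}(x - X_i)), Gaussian K."""
    sample = np.asarray(sample, dtype=float)
    u = (x - sample) / eps
    k = np.exp(-0.5 * u * u) / np.sqrt(2.0 * np.pi)  # bounded density kernel
    return k.sum() / (sample.size * eps)

rng = np.random.default_rng(0)
X = rng.standard_normal(1000)
eps_n = X.size ** -0.2                 # eps_n = n^{-1/5}, decreasing to zero
fhat = kernel_estimate(0.0, X, eps_n)  # true value f(0) = 0.3989...
```

The unknown-$x_0$ case of the text simply replaces the first argument by $T_n$.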
Although the details are fairly complicated, our method is contained in a number of basic steps. Initially, we consider the asymptotic normality of $\hat f_n(x_0)$ under random sample sizes by giving a weak convergence argument applied to a process closely related to $\hat f_{[nt]}(x_0)$ (where $[\cdot]$ is the greatest integer function) and then using a random change of time argument (Billingsley (1968)). Then, $\hat f_n(x_0)$ and $\hat f_n(T_n)$ are shown to be sufficiently close by modifying the maximal deviation argument of Woodroofe (1967).

The two basic processes with which we will work in this section are given below.
DEFINITION 2.1. Let $0 < \alpha < 1$ be fixed, let $\{a_n\}$ converge to zero, and define for $0 \le s,t \le 1$,

$V_n(s) = (n\varepsilon_n)^{-1/2}\sum_{i=1}^{[ns]}\bigl\{K(\varepsilon_{[ns]}^{-1}(x_0 - X_i)) - EK(\varepsilon_{[ns]}^{-1}(x_0 - X))\bigr\}$ if $s \ge [n\alpha]/n$,

$V_n(s) = (n\varepsilon_n)^{-1/2}\sum_{i=1}^{[ns]}\bigl\{K(\varepsilon_{[n\alpha]}^{-1}(x_0 - X_i)) - EK(\varepsilon_{[n\alpha]}^{-1}(x_0 - X))\bigr\}$ if $s \le [n\alpha]/n$;

$V_n^*(s,t) = (n\varepsilon_n)^{-1/2}\sum_{i=1}^{[ns]}\bigl\{K(\varepsilon_{[ns]}^{-1}(x_0 + ta_n - X_i)) - EK(\varepsilon_{[ns]}^{-1}(x_0 + ta_n - X))\bigr\}$ if $s \ge [n\alpha]/n$,

$V_n^*(s,t) = (n\varepsilon_n)^{-1/2}\sum_{i=1}^{[ns]}\bigl\{K(\varepsilon_{[n\alpha]}^{-1}(x_0 + ta_n - X_i)) - EK(\varepsilon_{[n\alpha]}^{-1}(x_0 + ta_n - X))\bigr\}$ if $s \le [n\alpha]/n$.

What we eventually will assume is that $T_n$ converges to $x_0$ faster than $a_n$ converges to zero, so that we can replace the parameter $t$ in $V_n^*$ by $a_n^{-1}(T_n - x_0)$. It is clear that $V_n$ is a random element of $D[0,1]$, while if $K$ is right continuous with limits from the left, $V_n^*$ is a random element of $D_2$ (see Billingsley (1968) and Bickel and Wichura (1971) respectively for definitions). The idea is now reasonably clear; we will first consider the weak convergence of the process $V_n$, then show that $V_n$ and $V_n^*$ are "close" in probability. Then, since the normed random sample size version of the estimate of $f(x_0)$ can be basically obtained from $V_{N(d)}^*\bigl(N(d)/n(d),\ a_{N(d)}^{-1}(T_{N(d)} - x_0)\bigr)$, a random change of time argument and a few computations will yield the final result.
It will be assumed throughout the rest of this paper that for some function $h$, $\varepsilon_{[ns]}/\varepsilon_n \to h(s)$ as $n \to \infty$, and $A(f,K)$ will be defined by

(2.1) $A(f,K) = f(x_0)\int K^2(y)\,dy .$
The proofs of all results will be delayed to the end of the section.
LEMMA 2.1. Suppose that for any $c_0 > 0$ there is a constant $M(c_0)$ such that (2.2) holds. Then, the sequence $\{V_n\}$ is tight and there exists a process $V$ for which

(2.3a) the finite dimensional distributions of $V_n$ converge to those of $V$;

(2.3b) for $0 \le s \le 1$, $V(s)$ is normally distributed with mean zero and variance $sh(s)A(f,K)$ if $s \ge \alpha$ and $sh(\alpha)A(f,K)$ if $s \le \alpha$.

REMARK 2.1. Assumption (2.2) is satisfied in many cases. Suppose that $K$ vanishes off some interval $[a,b]$ and is Lipschitz on this interval. Then, (2.2) will hold if $f(x)$ is bounded. It will also hold if $\varepsilon_n = n^{-\beta}$ for some $0 < \beta < \tfrac12$, $K$ is continuously differentiable, and a corresponding condition holds for any sequence $\{a_n\}$ of constants converging to zero.
LEMMA 2.2. Suppose that

(2.4a) $\varepsilon_n^{-1} = O(a_n^{-\delta})$ for some $0 < \delta < 1$, where $O$ is the standard "big oh";

(2.4b) $n\varepsilon_n$ increases in $n$;

(2.4c) $na_n^{-2+\delta}\exp\{-ca_n^{-\delta/2}\} \to 0$ for all $c > 0$;

(2.4d) $K$ is Lipschitz of order one and satisfies Woodroofe's (1967) conditions.

Then

$\sup\{\,|V_n(s) - V_n^*(s,t)| : 0 \le s,t \le 1\,\} \xrightarrow{P} 0 ,$

where $\xrightarrow{P}$ is convergence in probability.
REMARK 2.2. Lemma 2.2 is the key result, since as will be seen it says that estimation of $x_0$ by $T_n$ does not change the asymptotic distribution.

THEOREM 2.1. Suppose that

(2.5a) $T_n - x_0 = o(a_n)$ almost surely as $n \to \infty$, where "o" is the "little oh" notation;

(2.5b) $\{a_n\}$, $\{\varepsilon_n\}$, $K$ satisfy the conditions of Lemmas 2.1 and 2.2;

(2.5c) for some sequence of constants $n(d) \to \infty$, the integer-valued random variables $N(d)$ satisfy $N(d)/n(d) \to 1$ in probability as $d \to 0$.

Then

(2.6) $(N(d)\varepsilon_{N(d)})^{1/2}\Bigl(\hat f_{N(d)}(T_{N(d)}) - \varepsilon_{N(d)}^{-1}\int K\bigl(\varepsilon_{N(d)}^{-1}(T_{N(d)} - y)\bigr)f(y)\,dy\Bigr)$
converges in distribution to a normal random variable with mean zero and
variance
A(f,K).
REMARK 2.3. The conditions (2.5a) and (2.5b) are satisfied in many cases. For example, if $x_0$ is the $p$th population quantile and $T_n$ is the $p$th sample quantile, Bahadur (1966) has shown that $T_n - x_0 = O\bigl(n^{-1/2}(\log\log n)^{1/2}\bigr)$. If $x_0$ is the population mode, Venter (1967) and Sager (1975) have given rates of convergence of $T_n$ to $x_0$. Note that if one wanted merely to estimate $f(x_0)$ for a known $x_0$, (2.5a) and (2.5b) clearly hold by choosing $T_n = x_0$. Thus Theorem 2.1 is a generalization of the problem considered by Srivastava (1973).
While Theorem 2.1 shows the asymptotic normality of a normed version of $\hat f_{N(d)}(T_{N(d)})$, it is useful to ask when the integral appearing in (2.6) can be replaced by $f(x_0)$. This is the gist of the following Corollary. The results here are comparable to those given by Cacoullos (1966).
COROLLARY 2.1. Suppose that the conditions of Theorem 2.1 hold and that on the support of $K$ the density $f$ is twice boundedly continuously differentiable. Suppose further that

(2.7) $\int yK(y)\,dy = 0 , \qquad \int y^2 K(y)\,dy < \infty ,$

(2.8) $(n\varepsilon_n)^{1/2} a_n \to 0 , \qquad n\varepsilon_n^5 \to 0 .$

Then

(2.9) $(N(d)\varepsilon_{N(d)})^{1/2}\bigl(\hat f_{N(d)}(T_{N(d)}) - f(x_0)\bigr)$

is asymptotically normally distributed with mean zero and variance $A(f,K)$.
PROOF OF LEMMA 2.1. By using the Cramér-Wold device and the method given in Parzen (1962, page 1069), we see that the finite dimensional distributions of $V_n$ converge, so it suffices to show that the sequence $\{V_n\}$ is tight. From, for example, the extension of Theorem 3 of Bickel and Wichura (1971) given in that paper, it suffices to show that there exist $\delta > \tfrac12$ and $M > 0$ such that

$E|V_n(s_1) - V_n(s_2)|^2 \le M s_1\,\varepsilon_n^{-1}\,E\bigl\{K(\varepsilon_{[ns_1]}^{-1}(x_0 - X)) - K(\varepsilon_{[ns_2]}^{-1}(x_0 - X))\bigr\}^2 \le M^*|s_2 - s_1|$

by assumption (2.2). The final case $(s_1 \le [n\alpha]/n \le s_2)$ follows in a similar manner, so that $\delta = 1$ suffices.
PROOF OF LEMMA 2.2. Here we make use of the results of Woodroofe (1967). First define

$Z_n(s) = \bigl([ns]\varepsilon_{[ng(s)]}\bigr)^{-1/2}(n\varepsilon_n)^{1/2}\,V_n(s)/a_{[ng(s)]}(x_0) ,$

$Z_n^*(s,t) = \bigl([ns]\varepsilon_{[ng(s)]}\bigr)^{-1/2}(n\varepsilon_n)^{1/2}\,V_n^*(s,t)/a_{[ng(s)]}(x_0 + ta_n) ,$

where

$g(s) = g_n(s) = s$ if $s \ge [n\alpha]/n$, $\quad = [n\alpha]/n$ if $s < [n\alpha]/n$.

If we show that

$\sup\{\,|Z_n(s) - Z_n^*(s,t)| : 0 \le s,t \le 1\,\} = \sup\{\,|Z_n(i/n) - Z_n^*(i/n,t)| : 0 \le t \le 1,\ 0 \le i \le n\,\} \xrightarrow{P} 0 ,$

this will yield the result because of Lemma 2.1 and since

$\sup\{\,|a_{[ng(s)]}(x_0 + ta_n) - a_{[ng(s)]}(x_0)| : 0 \le s,t \le 1\,\} \to 0 .$

Now, fix $i$ and let, for $p = 0, 1, \ldots, [\varepsilon_i^{-1}]$,

$x_{np} = x_0 + p\varepsilon_i ,$

$a_{np}(x) = a_i(x_{np} + x\varepsilon_i)/a_i(x_{np}) , \quad 0 \le x \le 1 ,$

$Z_{np}(x) = a_{np}(x)\,Z_n^*\bigl(i/n,\ p\varepsilon_i a_n^{-1} + x\varepsilon_i a_n^{-1}\bigr) , \quad 0 \le x \le 1 .$

Next, define

$Z_{np}^{(k)}(x) = Z_{np}(j2^{-k})$ if $x = j2^{-k}$, $j = 0, 1, \ldots, 2^k$,

defined by linear interpolation otherwise. From Lemma 3.2 of Woodroofe (1967), there exist $D$ and $n_1$ such that

$\Pr\Bigl\{\sup_{0 \le x \le 1}|Z_{n0}(x) - Z_{n0}^{(k)}(x)| > \varepsilon\Bigr\} \le D\exp\{-c\varepsilon a_n^{-1}\} ,$

so that

$\sup\{\,|Z_{n0}(x) - Z_{n0}^{(k)}(x)| : 0 \le x \le 1,\ n_1 \le i \le n\,\} \xrightarrow{P} 0 .$

Choose $k$ appropriately; since $Z_{n0}^{(k)}(x)$ is piecewise linear for $0 \le x \le 1$, Lemma 3.1 of Woodroofe (1967) gives

$\Pr\Bigl\{\sup_{0 \le t \le 1}|Z_{n0}(0) - Z_{n0}^{(k)}(ta_n\varepsilon_i^{-1})| > \varepsilon\Bigr\} \le M^* a_n^{-2(1-1/k)}\exp\{-\varepsilon^2 M a_n^{-\delta/2}\} .$

Thus, as $n_1, n \to \infty$,

$\sup\{\,|Z_{n0}(x) - Z_{n0}(0)| : 0 \le x \le a_n\varepsilon_i^{-1},\ n_1 \le i \le n\,\} \xrightarrow{P} 0 .$

Since $Z_{n0}(0) = Z_n(i/n)$ and $Z_{n0}(ta_n\varepsilon_i^{-1}) = Z_n^*(i/n,t)\,a_{n0}(ta_n\varepsilon_i^{-1})$, we have that

$\sup\{\,|V_n(i/n) - V_n^*(i/n,t)| : 0 \le t \le 1,\ n_1 \le i \le n\,\} \xrightarrow{P} 0 .$

Since $K$ is Lipschitz, it is continuous, so that the result now follows.
PROOF OF THEOREM 2.1. Define $m(d) = 2n(d)$ and $W_n(s) = V_n(s) - V_n(\tfrac12)$. Then $\{W_{m(d)}\}$ is tight and $|W_{m(d)}(N(d)/m(d))|$ is bounded by

$|W_{m(d)}(\tfrac12 + \eta)| + \sup_{\frac12 \le s \le \frac12 + \eta}\min\bigl\{\,|W_{m(d)}(s) - W_{m(d)}(t)| ,\ |W_{m(d)}(s) - W_{m(d)}(t + \eta)|\,\bigr\} .$

Since the first term on the right hand side of the above equation converges in probability to zero as $d, \eta \to 0$ (because of (2.2) and Chebychev's inequality) and the second term is bounded by the modulus of continuity, we have (if $\xrightarrow{L}$ denotes convergence in distribution)

$V_{m(d)}(N(d)/m(d)) \xrightarrow{L} V(\tfrac12) .$

Because of Lemma 2.2 and assumption (2.5a), we thus obtain (2.10). But, by choosing $\alpha = \tfrac12$ in Definition 2.1, we obtain (2.11), with

$\bigl(m(d)\varepsilon_{m(d)}\bigr)^{-1/2}\bigl(N(d)\varepsilon_{N(d)}\bigr)^{1/2}\Bigl(\hat f_{N(d)}(T_{N(d)}) - \varepsilon_{N(d)}^{-1}\int K\bigl(\varepsilon_{N(d)}^{-1}(T_{N(d)} - y)\bigr)f(y)\,dy\Bigr) \xrightarrow{L} V(\tfrac12) .$

Now, since $\bigl(N(d)\varepsilon_{N(d)}\bigr)\bigl(m(d)\varepsilon_{m(d)}\bigr)^{-1} \xrightarrow{P} \tfrac12 h(\tfrac12)$, and since $V(\tfrac12)$ has a normal distribution with mean zero and variance $A(f,K)\bigl(\tfrac12 h(\tfrac12)\bigr)$, the proof is complete.

PROOF OF COROLLARY 2.1. Because of Theorem 2.1, it suffices to show that

$\sup_{|t| \le 1}(n\varepsilon_n)^{1/2}\,\varepsilon_n^{-1}\Bigl|\int K\bigl(\varepsilon_n^{-1}(x_0 + ta_n - y)\bigr)f(y)\,dy - \varepsilon_n f(x_0)\Bigr| \to 0 .$

Since $K$ is a density function, this term is bounded above by

$\sup_{|t| \le 1}(n\varepsilon_n)^{1/2}\int K(y)\,\bigl|f(x_0 - \varepsilon_n y - ta_n) - f(x_0)\bigr|\,dy .$

A Taylor's expansion now completes the proof.
3. A Second Class of Estimators
In this section we investigate the asymptotic normality under random sample sizes of the recursive density estimators introduced by Yamato (1971) and defined by

$\hat f_n^*(x) = n^{-1}\sum_{i=1}^{n}\varepsilon_i^{-1}K\bigl(\varepsilon_i^{-1}(x - X_i)\bigr) = \frac{n-1}{n}\,\hat f_{n-1}^*(x) + (n\varepsilon_n)^{-1}K\bigl(\varepsilon_n^{-1}(x - X_n)\bigr) .$
The recursive property of this class of density estimators is clearly useful in sequential investigations and also for fairly large sample sizes, since the addition of a few extra observations means the kernel estimator $f_n(x)$ must be entirely recomputed for a given $x_0$. Wegman and Davies (1975) have recently shown that $\hat f_n^*(x_0)$ satisfies an almost sure invariance principle; this method is closely related to that of Jain, Jogdeo and Stout (1975). We are still left with the problem of estimating $f(x_0)$ where $x_0$ is unknown. The outline of our results is very much like that of Section 2, but the methods are different (especially the analogue to Lemma 2.2, where we use a weak convergence argument) and we are able to relax the Lipschitz condition on $K$. One can obtain an analogue of Lemma 2.1 using the results of Wegman and Davies (1975); however, their assumptions and methods are different from those used here.
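The recursion can be verified numerically. The following is a hypothetical implementation (Gaussian kernel and $\varepsilon_i = i^{-1/5}$, both our choices): it updates $\hat f_n^*(x_0)$ one observation at a time and checks agreement with the batch form $n^{-1}\sum_i \varepsilon_i^{-1}K(\varepsilon_i^{-1}(x_0 - X_i))$.

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u * u) / np.sqrt(2.0 * np.pi)

def yamato_update(f_prev, n, x, x_new, eps_n):
    """One recursive step:
    f*_n(x) = ((n-1)/n) f*_{n-1}(x) + (n eps_n)^{-1} K(eps_n^{-1}(x - X_n))."""
    return (n - 1) / n * f_prev + gauss((x - x_new) / eps_n) / (n * eps_n)

rng = np.random.default_rng(1)
x0, f_star = 0.0, 0.0
data = rng.standard_normal(2000)
for i, xi in enumerate(data, start=1):
    f_star = yamato_update(f_star, i, x0, xi, i ** -0.2)

# batch form of the same estimator, for comparison
eps = np.arange(1, data.size + 1) ** -0.2
f_batch = np.mean(gauss((x0 - data) / eps) / eps)
```

The two agree to rounding error, which is exactly the point: a new observation costs $O(1)$ work instead of the $O(n)$ recomputation the kernel estimator needs.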
We again start with two processes.

DEFINITION 3.1. Define

$V_n(s) = (\varepsilon_n/n)^{1/2}\sum_{i=1}^{[ns]}\bigl\{K(\varepsilon_i^{-1}(x_0 - X_i)) - EK(\varepsilon_i^{-1}(x_0 - X))\bigr\}/\varepsilon_i ,$

$V_n^*(s,t) = (\varepsilon_n/n)^{1/2}\sum_{i=1}^{[ns]}\bigl\{K(\varepsilon_i^{-1}(x_0 + ta_n - X_i)) - EK(\varepsilon_i^{-1}(x_0 + ta_n - X))\bigr\}/\varepsilon_i .$

LEMMA 3.1. Suppose that

$\limsup_n \int K^2(y)f(x_0 - y\varepsilon_n)\,dy < \infty , \qquad \int K^2(y)\,dy < \infty ,$

that $K, f$ satisfy the conditions of Theorem 5 of Yamato (1971), and that there exists a number $a$ $(0 \le a \le 1)$ for which $n^{-1}\sum_{i=1}^{n}\varepsilon_n/\varepsilon_i \to a$. Then $V_n$ is tight and there is a process $V$ in $D[0,1]$ for which the finite dimensional distributions of $V_n$ converge to those of $V$, and $V(s)$ is normally distributed with mean zero and variance $ash(s)A(f,K)$.
The following two results form an analogue to Lemma 2.2. Note that while the Lipschitz condition on $K$ is somewhat reduced in Lemma 3.3, the price being paid is a stronger relationship between the sequences $\{a_n\}$ and $\{\varepsilon_n\}$.

LEMMA 3.2. Suppose one of the following holds:

(3.1a) $K$ is Lipschitz of order one and $a_n^2/\varepsilon_n^3 \to 0$;

(3.1b) $K$ is continuously differentiable, $a_n^4/\varepsilon_n^5 \to 0$, and a further condition holds for any sequence $\eta_n \to 0$.

Then,

$\sup\{\,|V_n(s) - V_n^*(s,t)| : 0 \le s,t \le 1\,\} \xrightarrow{P} 0 .$
LEMMA 3.3. Suppose $K$ vanishes off a closed interval $[a,b]$ and is Lipschitz on $[a,b]$. Suppose further that for some $\beta > 0$ and some $\gamma > 0$, (3.2) holds. Then,

$\sup\{\,|V_n(s) - V_n^*(s,t)| : 0 \le s,t \le 1\,\} \xrightarrow{P} 0 .$
Theorem 3.1 and Corollary 3.1 (which are given below) follow in a manner similar to Theorem 2.1 and Corollary 2.1.

THEOREM 3.1. Assume (2.5a), (2.5c), the conditions of Lemma 3.1, and that the conditions of either Lemma 3.2 or Lemma 3.3 hold. Then the analogue of (2.6) for $\hat f_{N(d)}^*(T_{N(d)})$ converges in distribution to a normal random variable with mean zero and variance $aA(f,K)$.

COROLLARY 3.1. Under the conditions of Theorem 3.1 and Corollary 2.1,

$(N(d)\varepsilon_{N(d)})^{1/2}\bigl(\hat f_{N(d)}^*(T_{N(d)}) - f(x_0)\bigr)$

is asymptotically normally distributed with mean zero and variance $aA(f,K)$.

PROOF OF LEMMA 3.1. As in Lemma 2.1, it is straightforward to verify the convergence of the finite dimensional distributions of $V_n$. Thus, it is again sufficient to show the existence of $M > 0$, $\delta > \tfrac12$ such that for $s_1 = j/n$, $s_2 = k/n$,

(3.4) $E|V_n(s_1) - V_n(s_2)|^2 \le M|s_1 - s_2|^{\delta} .$

Now, the left hand side of (3.4) is bounded by $M(\varepsilon_n/n)\sum_{j < i \le k}\varepsilon_i^{-1} \le M^*|s_1 - s_2|$, which completes the proof with $\delta = 1$.

PROOF OF LEMMA 3.2.
We will use a weak convergence argument, so it is first necessary to verify the convergence of the finite dimensional distributions. Fix $s$ and $t$. Then

$E|V_n(s) - V_n^*(s,t)|^2 = (\varepsilon_n/n)\sum_{i=1}^{[ns]}\varepsilon_i^{-1}\int\bigl\{K(y) - K(y + ta_n/\varepsilon_i)\bigr\}^2 f(x_0 - \varepsilon_i y)\,dy .$

If (3.1a) holds, the first integral expression is bounded by $Ma_n^2/\varepsilon_n^3$. If (3.1b) holds, a Taylor's expansion shows that the last integral expression is bounded by $M(a_n/\varepsilon_n)^2$. Hence, by Chebychev's inequality, the finite dimensional distributions each converge to zero in probability.

To verify tightness, we again use the extension given by Bickel and Wichura (1971) of their Theorem 3; to do so, first define the process $V_n^{**}(s,t) = V_n^*(s,[nt]/n)$. It is clear that $V_n^* - V_n^{**} \xrightarrow{P} 0$, so that we may work with $V_n^{**}$. It is thus sufficient to verify the moments condition given by equation (3) of Bickel and Wichura (1971). We will adapt their notation and let $B, C$ be neighboring blocks. By the Schwartz inequality, it suffices to show that there exist $M > 0$, $\delta > 1$ for which (3.5) holds whenever $i, p, q$ are integers. Letting $Z_{in}$ denote the corresponding increments, we see that the term on the left hand side of (3.5) is

$(\varepsilon_n/n)^2\sum_{j=1}^{i}EZ_{jn}^4 + (\varepsilon_n/n)^2\sum_{j \ne \ell;\ j,\ell \le i}EZ_{jn}^2 Z_{\ell n}^2 .$

If (3.1a) holds, we have thus bounded (3.5) by (for some $M > 0$) the required moment bound, so that tightness holds in this case. If (3.1b) holds, we have

$Z_{in} = \varepsilon_i^{-1}\bigl\{K\bigl(\varepsilon_i^{-1}(x_0 + pa_n/n - X_i)\bigr) - K\bigl(\varepsilon_i^{-1}(x_0 + qa_n/n - X_i)\bigr)\bigr\} + O\bigl((p-q)a_n/(n\varepsilon_i)\bigr) ,$

so that (3.5) is again bounded as required, which completes the proof.
PROOF OF LEMMA 3.3. We first show that each of the finite dimensional distributions converges to zero. Fix $s, t$, and assume (without loss of generality) that $a = -1$, $b = 1$, so that

$E|V_n(s) - V_n^*(s,t)|^2 \le (\varepsilon_n/n)\sum_{i=1}^{n}\varepsilon_i^{-2}\int_{x_0+\varepsilon_i}^{x_0+\varepsilon_i+ta_n}K^2\bigl(\varepsilon_i^{-1}(x_0 - y)\bigr)f(y)\,dy + (\varepsilon_n/n)\sum_{i=1}^{n}\varepsilon_i^{-1}\int_{-1+ta_n/\varepsilon_i}^{1}\bigl[K(z) - K(z - ta_n/\varepsilon_i)\bigr]^2 f(x_0 - \varepsilon_i z)\,dz$

$\le M(\varepsilon_n/n)\sum_{i=1}^{n}\bigl\{a_n^2\varepsilon_i^{-3} + a_n\varepsilon_i^{-2}\bigr\} \to 0 .$

Thus, the finite dimensional distributions converge, so that it remains only to verify tightness. As before, we first verify

(3.6) $V_n^* - V_n^{**} \xrightarrow{P} 0 .$

Let $I_A$ denote the indicator function of the event $A$, and define $A_{in}$ as the union of the intervals $[\varepsilon_i, \varepsilon_i + a_n + n^{-1}]$ and $[-\varepsilon_i, -\varepsilon_i + a_n + n^{-1}]$. Since $K$ is bounded and Lipschitz on its support, there is a constant $M$ for which

(3.7) $|V_n^*(s,t) - V_n^{**}(s,t)| \le M\Bigl\{(a_n/\varepsilon_n)^{1/2} + (\varepsilon_n/n)^{1/2}\sum_{i=1}^{n}I_{A_{in}}\Bigr\} .$

By using Chebychev's inequality for fourth moments, we see that each term on the right hand side of (3.7) converges almost surely to zero. Hence we need only verify the tightness of $V_n - V_n^{**}$. Defining $Z_{in}$ as in the proof of Lemma 3.2, we see that for $p < q$, $p, q$ integers,

$EZ_{in}^2 \le M\varepsilon_i^{-2}\bigl((p-q)/n\bigr)^2(a_n/\varepsilon_i)^2 + \varepsilon_i^{-2}\int_{x_0-\varepsilon_i}^{x_0-\varepsilon_i+(q-p)a_n/n}K^2\bigl(\varepsilon_i^{-1}(x_0 - y)\bigr)f(y - pa_n/n)\,dy .$

Also, one shows by similar computations that a corresponding bound holds for the fourth moments. Thus (3.5) is bounded by

(3.8) $M\bigl\{(i/n)\varepsilon_n^{-2}|(p-q)/n|(a_n/n) + (i/n)^2|(p-q)/n|^2(a_n/\varepsilon_n)^2\bigr\} .$

Under (3.2), (3.8) is bounded by the required moment bound (3.9), which completes the proof.
4. First Class of Stopping Rules

In the previous sections we have assumed (in (2.5c)) the existence of a stopping rule with the property that $N(d)/n(d) \to 1$ in probability as $d \to 0$. While it is easy to write down various stopping rules, it is not clear how to develop rules which are based on reasonable criteria for density estimation.
We are aware of only one class of stopping rules for the density estimation problem; this class was suggested by Davies and Wegman (1975), who base their idea on stopping when $\hat f_n$ and $\hat f_{n-1}$ are close together. However, their stopping rules have not been shown to satisfy (2.5c), and the precise asymptotic behavior of the rules is not known, other than that they terminate with probability one and have certain moment properties. In this and the next section we will discuss two classes of stopping rules which are motivated in a natural manner from the ideas in the theory of fixed-width confidence intervals (Chow and Robbins (1965), Govindarajulu (1975)). We will verify (2.5c) and the related property $EN(d)/n(d) \to 1$ for all the rules we propose, thus making their properties clear. One of the stopping rules, discussed in this section, will actually yield fixed-length confidence intervals for $f(x_0)$; we will also be able to discuss the asymptotic normality of this stopping rule itself.
In this and the next section we will make the following general assumptions. There will exist a sequence of density estimators $\hat f_n(x)$ such that:

(4.1) for a sequence of statistics $\{T_n\}$, $\hat f_n(T_n) \to f(x_0)$ almost surely;

(4.2) if $N(d)/n(d) \xrightarrow{P} 1$, then $(N(d)\varepsilon_{N(d)})^{1/2}\bigl(\hat f_{N(d)}(T_{N(d)}) - f(x_0)\bigr) \xrightarrow{L} N(0,\sigma^2)$, where $\sigma^2 = B(K)f(x_0)$, $B(K)$ is a constant depending on some known function $K$, and $N(0,\sigma^2)$ denotes a normal random variable with mean zero and variance $\sigma^2$.

Some discussion is in order. We are not assuming $\hat f_n(x)$ is a special type of estimator such as discussed in Sections 2 and 3. However, as we have seen, these latter estimators often satisfy (4.2), and it is not in general very difficult to verify (4.1). For example, if $\hat f_n$ converges uniformly to $f$ in some neighborhood of $x_0$, $f$ is continuous, and $T_n$ converges to $x_0$ almost surely, then (4.1) holds. Conditions on uniform convergence are found in Schuster (1969) and Davies (1973).

Another important point to note is that many of the results in the next two sections hold if (4.2) is replaced by convergence as in Theorems 2.1 and 3.1, if $f(x_0)$ is merely replaced by the correct function. Unless specified otherwise (as in Lemma 4.4) this will be the case.
The first stopping rules we discuss arise in a manner similar to those introduced by Chow and Robbins (1965). If $\Phi$ is the normal distribution function and $\Phi^{-1}$ its inverse function, define

(4.3) $b = B(K)^{1/2}\,\Phi^{-1}(1 - \alpha/2) .$
STOPPING RULE 4.1. The stopping rule $N(d)$ stops the first time $n \ge n_0$ that

$n\varepsilon_n \ge (b/d)^2\,\hat f_n(T_n) .$

LEMMA 4.1. Suppose that (4.1) holds and that $\varepsilon_n/\varepsilon_{n-1} \to 1$. Then

$N(d)\varepsilon_{N(d)}\bigl(b^2 f(x_0)/d^2\bigr)^{-1} \to 1 \quad\text{a.s. as } d \to 0 .$
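As an illustration (not part of the paper), stopping rule 4.1 can be simulated directly: sample one observation at a time and stop the first time $n\varepsilon_n \ge (b/d)^2\hat f_n(T_n)$. The choices below are ours: a Gaussian kernel, $\varepsilon_n = n^{-1/5}$, the known-$x_0$ case $T_n \equiv x_0$, and $b = B(K)^{1/2}\Phi^{-1}(1-\alpha/2)$ with $B(K) = \int K^2 = 1/(2\sqrt{\pi})$ and $\alpha = 0.05$.

```python
import numpy as np
from math import sqrt, pi

def f_hat(x, sample, eps):
    """Kernel density estimate with Gaussian kernel."""
    u = (x - np.asarray(sample)) / eps
    return (np.exp(-0.5 * u * u) / sqrt(2.0 * pi)).sum() / (len(sample) * eps)

def stopping_rule_41(draw, x0, d, b, a=0.2, n0=30, n_max=5000):
    """Stop the first n >= n0 with n * eps_n >= (b/d)^2 * f_hat_n(x0), eps_n = n^{-a}."""
    sample = []
    while len(sample) < n_max:
        sample.append(draw())
        n = len(sample)
        eps_n = n ** -a
        if n >= n0 and n * eps_n >= (b / d) ** 2 * f_hat(x0, sample, eps_n):
            return n, f_hat(x0, sample, eps_n)
    return n_max, f_hat(x0, sample, n_max ** -a)

rng = np.random.default_rng(2)
b = sqrt(1.0 / (2.0 * sqrt(pi))) * 1.96   # B(K)^{1/2} * Phi^{-1}(0.975)
N, est = stopping_rule_41(rng.standard_normal, 0.0, d=0.05, b=b)
```

Lemma 4.1 predicts $N(d)\varepsilon_{N(d)} \approx b^2 f(x_0)/d^2 \approx 173$ here, i.e. roughly $173^{1/(1-a)} \approx 630$ observations.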
LEMMA 4.2. Suppose (4.1) holds and $\varepsilon_n/\varepsilon_{n-1} \to 1$. If (4.2) holds, then

$\Pr\bigl\{\,|\hat f_{N(d)}(T_{N(d)}) - f(x_0)| \le d\,\bigr\} \to 1 - \alpha \quad\text{as } d \to 0 .$

The proof of Lemma 4.1 is contained in Lemma 1 of Chow and Robbins (1965). The proof of Lemma 4.2 is immediate from (4.2) and (4.3). Because of Lemma 4.2, we say that stopping rule 4.1 yields a confidence interval of fixed width $2d$ and prescribed coverage probability $1 - \alpha$.

LEMMA 4.3. Suppose that $E\hat f_n(T_n) \to f(x_0)$ and that there is a constant $M^*$ for which

(4.4) $\sum_{n=1}^{\infty}\Pr\bigl\{|\hat f_n(T_n) - E\hat f_n(T_n)| > M^*\bigr\} < \infty .$

If $\varepsilon_n = n^{-a}$ for some $a > 0$ and $v(d) = \bigl(b^2 f(x_0)/d^2\bigr)^{1/(1-a)}$, then

$EN(d)/v(d) \to 1 .$

REMARK 4.3. Schuster (1969) has shown that the kernel estimators $\hat f_n(x)$ satisfy (4.4), while under the assumption $n^{-1}\sum_{i=1}^{n}\varepsilon_n/\varepsilon_i \to a$, one may use the exponential bounds (Loève (1968), page 254) to show that $\hat f_n^*(x)$ satisfies (4.4).
PROOF OF LEMMA 4.3. Because of Lemma 4.1 and by Bickel and Yahav (1968), it suffices to show there is a $d_0 > 0$ for which

$\sum_{m=1}^{\infty}\sup_{0 < d < d_0}\Pr\bigl\{N(d) > m\,d^{-2(1-a)^{-1}}\bigr\} < \infty .$

Letting $m(d) = m\,d^{-2(1-a)^{-1}}$, we see that
(4.5a) $N(d)^{1-a} - v(d)^{1-a} \le (v(d))^{1-a}\bigl(\hat f_{N(d)}(T_{N(d)}) - f(x_0)\bigr)/f(x_0)$

(4.5b) $N(d)^{1-a} - v(d)^{1-a} \ge (v(d))^{1-a}\bigl(\hat f_{N(d)-1}(T_{N(d)-1}) - f(x_0)\bigr)/f(x_0) + 2n_0$

Since $N(d)/v(d) \to 1$ a.s. as $d \to 0$, multiplying all terms by $v(d)^{-(1-a)/2}$ completes the proof.
Thus, taken together, Lemmas 4.1, 4.3 and 4.4 yield a great deal of information, telling us in detail how the stopping rule 4.1 behaves. Lemma 4.2 gives us a very nice property of the stopping rule, namely that it yields confidence intervals for $f(x_0)$ of fixed length and prescribed coverage probability. It should be mentioned here that Starr (1966) has shown that for this same class of stopping rules, when trying to estimate the mean of a normal distribution, the following approximation is almost true for all values of $d$:

This indicates that stopping rule 4.1 may well achieve its asymptotic behavior for moderate values of $d$.
A second stopping rule of this type is more global in scope and would appear more useful in situations where one is interested in estimating the density at the mode. Specifically,

STOPPING RULE 4.2. The stopping rule $N(d)$ stops the first time $n \ge n_0$ that

$n\varepsilon_n \ge (b/d)^2\,\sup_x \hat f_n(x) .$
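The supremum in stopping rule 4.2 has no closed form for kernel estimates, but for a simulation it can be approximated over a fine grid; the sketch below (Gaussian kernel, grid on $[-4,4]$, both our choices) shows the one extra ingredient the rule needs.

```python
import numpy as np
from math import sqrt, pi

def sup_f_hat(sample, eps, grid):
    """Approximate sup_x f_n(x) by maximizing the kernel estimate over a grid."""
    sample = np.asarray(sample, dtype=float)
    u = (grid[:, None] - sample[None, :]) / eps
    vals = (np.exp(-0.5 * u * u) / sqrt(2.0 * pi)).sum(axis=1) / (sample.size * eps)
    return vals.max()

rng = np.random.default_rng(4)
X = rng.standard_normal(500)
grid = np.linspace(-4.0, 4.0, 801)
sup_val = sup_f_hat(X, 500 ** -0.2, grid)  # near f(0) = 0.3989... for N(0,1) data
```

Replacing $\hat f_n(T_n)$ by this supremum in the boundary of rule 4.1 gives rule 4.2; since $\sup_x \hat f_n(x) \ge \hat f_n(T_n)$, the rule necessarily takes at least as many observations.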
LEMMA 4.4. Suppose (4.1) and (4.2) hold and $\varepsilon_n = n^{-a}$ for some $a > 0$. Then $N(d)$, properly normed, is asymptotically normally distributed, where $A(F) = B(K)f(x_0)$.

REMARK 4.1. This Lemma requires (4.2). It may not be true if convergence only on the order of Theorems 2.1 and 3.1 is known.

PROOF OF LEMMA 4.4. The proof follows exactly along the lines of a result due to Ghosh and Mukhopadhyay (1975). Since their result is as yet unpublished, we will sketch the proof. Since $N(d) \to \infty$ a.s., by (4.1) and (4.2),

$N(d)^{(1-a)/2}\bigl(\hat f_{N(d)}(T_{N(d)}) - f(x_0)\bigr)/A(F) \xrightarrow{L} N(0,1) ,$

$(N(d)-1)^{(1-a)/2}\bigl(\hat f_{N(d)-1}(T_{N(d)-1}) - f(x_0)\bigr)/A(F) \xrightarrow{L} N(0,1) .$

Now, by the definition of stopping rule 4.1,

$N(d)^{1-a} \ge (b/d)^2\,\hat f_{N(d)}(T_{N(d)}) , \qquad (N(d)-1)^{1-a} < (b/d)^2\,\hat f_{N(d)-1}(T_{N(d)-1}) ,$

so that the normed difference $N(d)^{1-a} - v(d)^{1-a}$ inherits the asymptotic normality of the estimators. Thus the result follows.
LEMMA 4.5. Suppose there is a unique $x_0$ such that $f(x_0) = \max_x f(x)$ and $\max_x \hat f_n(x) \to f(x_0)$ a.s. as $n \to \infty$. If $\varepsilon_n = n^{-a}$, then $N(d)/v(d) \to 1$ a.s. as $d \to 0$. If, in addition, there is a constant $M^*$ for which

(4.6) $\sum_{n=1}^{\infty}\Pr\bigl\{\sup_x|\hat f_n(x) - f(x)| > M^*\bigr\} < \infty ,$

we have $EN(d)/v(d) \to 1$.

The proof of Lemma 4.5 is the same as that of Lemmas 4.1 and 4.3. Schuster (1969) has shown that the kernel estimators $\hat f_n(x)$ satisfy (4.6), but it is not known whether the estimators $\hat f_n^*(x)$ satisfy (4.6). We have also been unable to obtain a result similar to Lemma 4.4.
Stopping rule 4.2 is a competitor to stopping rule 4.1 in the case that $f$ is symmetric and unimodal and $x_0$ is the mode; $T_n$ here might be the sample median. The almost sure asymptotic properties of the two rules would be the same, but rule 4.2 would of course always take more observations than rule 4.1. However, in this particular case, both yield fixed-width confidence intervals of prescribed coverage probability for $f(x_0)$.
Finally, note that $N(d)$ defined by stopping rule 4.2 diverges with probability one as $d \to 0$, so that

(4.7) $\sup_x|\hat f_n(x) - f(x)| \to 0$ almost surely $\implies \sup_x|\hat f_{N(d)}(x) - f(x)| \to 0$ almost surely.
5. A Second Class of Stopping Rules
The second class of stopping rules we investigate is motivated by work of Farrell (1966) and Sen and Ghosh (1971). Their idea was to obtain upper and lower bounds on the parameter of interest and to stop when the difference in these two bounds becomes at most $2d$. The first rule of this section then becomes

STOPPING RULE 5.1. Define for some sequence of constants $\{b_n\}$ decreasing to zero

(5.1) $V_n = \bigl(\hat f_n(T_n + b_n) - \hat f_n(T_n)\bigr)/\hat f_n(T_n)$

and let $N(d)$ be the first time $n \ge n_0$ that $|V_n| \le 2d$.

The motivation for dividing by $\hat f_n(T_n)$ in (5.1) is that one will stop when there is little change in $\hat f_n$ (in a neighborhood of $x_0$) relative to $f(x_0)$. At the end of this section we briefly discuss a rule which does not divide by $\hat f_n(T_n)$. As a notational device we make the following:
DEFINITION 5.1. A sequence of statistics $Y_n = o^*(a_n)$ if for all $\epsilon > 0$,

$\sum_{n=1}^{\infty}\Pr\{|Y_n| > \epsilon a_n\} < \infty .$
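Stopping rule 5.1 is easy to simulate. The sketch below is our illustration, not the paper's prescription: it takes $b_n = \varepsilon_n = n^{-1/4}$, uses the sample median for $T_n$ (so $x_0$ is the population median of standard normal data), and stops the first time $|V_n| \le 2d$.

```python
import numpy as np
from math import sqrt, pi

def f_hat(x, sample, eps):
    u = (x - np.asarray(sample)) / eps
    return (np.exp(-0.5 * u * u) / sqrt(2.0 * pi)).sum() / (len(sample) * eps)

def stopping_rule_51(draw, T_of, d, a=0.25, n0=50, n_max=5000):
    """Stop the first n >= n0 with |V_n| <= 2d, where
    V_n = (f_n(T_n + b_n) - f_n(T_n)) / f_n(T_n) and b_n = eps_n = n^{-a}."""
    sample = []
    while len(sample) < n_max:
        sample.append(draw())
        n = len(sample)
        if n < n0:
            continue
        b_n = eps_n = n ** -a
        T_n = T_of(sample)
        f_T = f_hat(T_n, sample, eps_n)
        V_n = (f_hat(T_n + b_n, sample, eps_n) - f_T) / f_T
        if abs(V_n) <= 2.0 * d:
            return n, T_n, f_T
    return n_max, T_of(sample), f_hat(T_of(sample), sample, n_max ** -a)

rng = np.random.default_rng(3)
N, T_N, est = stopping_rule_51(rng.standard_normal, np.median, d=0.02)
```

At the median of a symmetric density $f^{(1)}(x_0) = 0$, so the rate of this rule is governed by $f^{(2)}(x_0)$, as Theorem 5.1 below makes precise.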
Now, in order to investigate the stopping rule $N(d)$ we want to look at $V_n$.
LEMMA 5.1. Suppose that $f$ has three continuous, bounded derivatives in a neighborhood of $x_0$, $f^{(2)}(x_0) \ne 0$, and that

(5.2a) $T_n - x_0 = o^*(b_n)$ ;

(5.2b) $\sup|\hat f_n(x) - f(x)| = o^*(b_n^2)$, where the supremum is taken in some neighborhood of $x_0$.

Then (5.3) and (5.4) hold.

PROOF: Note that

$\hat f_n(T_n + b_n) - \hat f_n(T_n) = \hat f_n(T_n + b_n) - f(T_n + b_n) + f(T_n) - \hat f_n(T_n) + f(T_n + b_n) - f(T_n) = H_n + f(T_n + b_n) - f(T_n) ,$

where $|H_n| \le 2\sup|\hat f_n(x) - f(x)|$. Thus,

$\hat f_n(T_n + b_n) - \hat f_n(T_n) = b_n f^{(1)}(x_0) + \tfrac12 b_n^2 f^{(2)}(x_0) + o^*(b_n^2) .$

Now (5.3) and (5.4) follow easily.

REMARK 5.1. For a discussion of (5.2b), see the remarks after Lemma 4.5.
We are now in a position to discuss the almost sure behavior of $N(d)$.

THEOREM 5.1. Suppose $b_n = n^{-a}$ satisfies the conditions of Lemma 5.1. Define

$v_1(d) = \bigl|4f(x_0)d/f^{(2)}(x_0)\bigr|^{-1/(2a)} , \qquad v_2(d) = \bigl|2f(x_0)d/f^{(1)}(x_0)\bigr|^{-1/a} .$

Then, if $f^{(1)}(x_0) = 0$, $N(d)/v_1(d) \to 1$ almost surely as $d \to 0$; if $f^{(1)}(x_0) \ne 0$, $N(d)/v_2(d) \to 1$ almost surely as $d \to 0$.

PROOF:
The proof follows a method of Sen and Ghosh (1971). We consider only the case $f^{(1)}(x_0) = 0$. If "$\subset$" denotes set inclusion and "$\cup$" set union, we obtain the decomposition (5.5). The first set on the right hand side of (5.5) is contained in

$\Bigl\{\,V_{[(1+\epsilon)v_1(d)]} > (1+\epsilon)^{2a}\,\frac{2f(x_0)}{f^{(2)}(x_0)\bigl((1+\epsilon)v_1(d)\bigr)^{2a}}\,\Bigr\} ,$

so that by Lemma 5.1, it suffices to consider only the last event in (5.5). Now, a similar bound applies to the last event, so that again by Lemma 5.1 and since $d \to 0$, the proof is now complete.
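To get a feel for these rates, the constants can be evaluated for standard normal data with $a = 1/4$ and $d = 0.01$ (our choices). At the mode $x_0 = 0$, where $f^{(1)}(x_0) = 0$, the rule needs on the order of $v_1(d)$ observations; at a point with $f^{(1)}(x_0) \ne 0$, such as $x_0 = 1$, it needs the far larger $v_2(d)$:

```python
from math import sqrt, pi, exp

# standard normal density and its first two derivatives
f  = lambda x: exp(-0.5 * x * x) / sqrt(2.0 * pi)
f1 = lambda x: -x * f(x)
f2 = lambda x: (x * x - 1.0) * f(x)

a, d = 0.25, 0.01

# f'(0) = 0: rate v1(d) = |4 f(x0) d / f''(x0)|^{-1/(2a)}
v1 = abs(4.0 * f(0.0) * d / f2(0.0)) ** (-1.0 / (2.0 * a))   # 0.04^{-2} = 625
# f'(1) != 0: rate v2(d) = |2 f(x0) d / f'(x0)|^{-1/a}
v2 = abs(2.0 * f(1.0) * d / f1(1.0)) ** (-1.0 / a)           # 0.02^{-4} = 6.25e6
```

The $f^{(1)}(x_0) = 0$ case stops four orders of magnitude sooner here, since at a critical point $V_n$ is already of the smaller order $b_n^2$.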
The next step is to show that an analogue to Lemma 4.3 holds. Before proceeding, a few definitions are needed.

DEFINITION 5.2. Let

$H_\epsilon(d) = \sum_{n_1(d)}^{\infty}\Pr\{N(d) > n\} , \qquad H_\epsilon^*(d) = \sum_{n_2(d)}^{\infty}\Pr\{N(d) > n\} .$

LEMMA 5.3. Under the conditions of Lemma 5.1,

$\lim_{d \to 0}H_\epsilon(d) < \infty$ if $f^{(1)}(x_0) = 0$, $\qquad \lim_{d \to 0}H_\epsilon^*(d) < \infty$ if $f^{(1)}(x_0) \ne 0$.

PROOF: Again, consider only the case $f^{(1)}(x_0) = 0$. Then for some $\epsilon_1 > 0$,

$H_\epsilon(d) \le \sum_{n_1(d)}^{\infty}\Pr\Bigl\{\frac{2f(x_0)}{b_n^2 f^{(2)}(x_0)}\,V_n - 1 > \frac{4f(x_0)d}{b_n^2 f^{(2)}(x_0)} - 1\Bigr\} .$

The last sum converges and is a decreasing function of $d$.

LEMMA 5.4. Under the conditions of Lemma 5.1,

(5.6a) $EN(d)/v_1(d) \to 1$ if $f^{(1)}(x_0) = 0$ ;

(5.6b) $EN(d)/v_2(d) \to 1$ if $f^{(1)}(x_0) \ne 0$ .
PROOF: We will only show (5.6a). We have

$EN(d) = \sum\nolimits_{n \ge 0}\Pr\{N(d) > n\} = \sum\nolimits_1 + \sum\nolimits_2 + \sum\nolimits_3 ,$

where $\sum_1$ extends over $\{n \le n_1(d)\}$, $\sum_2$ extends over $\{n_1(d) < n < n_2(d)\}$, and $\sum_3$ extends over $\{n \ge n_2(d)\}$. Now, $\sum_1$ is handled by Lemma 5.1. Also, $\sum_2$ is handled by Lemma 5.2. Finally, $\sum_3 \to 0$ as $d \to 0$. This completes the proof.
The choice of $V_n$ given in (5.1) is certainly not the only possible one available. We list below more statistics $V_n$ and the sequences of constants $v_1(d)$, $v_2(d)$ that go with them:
(5.7) $|V_{n,2}| = \max\bigl\{\,|\hat f_n(T_n + b_n) - \hat f_n(T_n)| ,\ |\hat f_n(T_n - b_n) - \hat f_n(T_n)|\,\bigr\}\big/\hat f_n(T_n)$

(5.8) $v_2(d) = \bigl|2d/f^{(1)}(x_0)\bigr|^{-1/a}$

(5.9) $|V_{n,3}| = \max\bigl\{\,|\hat f_n(T_n + b_n) - \hat f_n(T_n)| ,\ |\hat f_n(T_n - b_n) - \hat f_n(T_n)|\,\bigr\}$
Again, modifications of these stopping rules along the lines of Lemma 4.5 are also possible. When this is done, we again see that since $N(d)$ diverges with probability one as $d \to 0$,

$\sup_x|\hat f_n(x) - f(x)| \to 0$ almost surely $\implies \sup_x|\hat f_{N(d)}(x) - f(x)| \to 0$ almost surely.
REFERENCES

[1] Bahadur, R.R. (1966). A note on quantiles in large samples. Ann. Math. Statist. (37) 577-580.

[2] Bickel, P.J. and Wichura, M.J. (1971). Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Statist. (42) 1656-1670.

[3] Bickel, P.J. and Yahav, J.A. (1968). Asymptotically optimal Bayes and minimax procedures in sequential estimation. Ann. Math. Statist. (39) 442-456.

[4] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

[5] Cacoullos, T. (1966). Estimation of a multivariate density. Ann. Inst. Statist. Math. (18) 178-189.

[6] Chow, Y.S. and Robbins, H. (1965). On the asymptotic theory of fixed-width sequential confidence intervals for the mean. Ann. Math. Statist. (36) 463-467.

[7] Davies, H.I. (1973). Strong consistency of a sequential estimation of a probability density function. Bull. Math. Statist. (15) 49-54.

[8] Davies, H.I. and Wegman, E.J. (1975). Sequential nonparametric density estimation, to appear IEEE Transactions on Information Theory, November, (1975).

[9] Farrell, R.H. (1966). Bounded length confidence intervals for the p-point of a distribution function III. Ann. Math. Statist. (37) 586-592.

[10] Ghosh, M. and Mukhopadhyay, N. (1975). Asymptotic normality of stopping times in sequential analysis. Unpublished paper.

[11] Govindarajulu, Z. (1975). Sequential Statistical Procedures. Academic Press, New York.

[12] Jain, N.C., Jogdeo, K., and Stout, W.F. (1975). Upper and lower functions for martingales and mixing processes. Ann. Prob. (3) 119-145.

[13] Loève, M. (1968). Probability Theory, 3rd ed., Van Nostrand, Princeton.

[14] Parzen, E. (1962). On the estimation of a probability density function and the mode. Ann. Math. Statist. (33) 1065-1076.

[15] Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. (27) 832-837.

[16] Sager, T.W. (1975). Consistency in nonparametric estimation of the mode. Ann. Statist. (3) 698-706.

[17] Schuster, E.F. (1969). Estimation of a probability density function and its derivatives. Ann. Math. Statist. (40) 1187-1195.

[18] Sen, P.K. and Ghosh, M. (1971). On bounded length confidence intervals based on one-sample rank order statistics. Ann. Math. Statist. (42) 189-203.

[19] Srivastava, R.C. (1973). Estimation of probability density function based on random number of observations with applications. Int. Statist. Rev. (41) 77-86.

[20] Starr, N. (1966). The performance of a sequential procedure for the fixed-width interval estimate. Ann. Math. Statist. (36) 36-50.

[21] Venter, J.H. (1967). On estimation of the mode. Ann. Math. Statist. (38) 1446-1455.

[22] Wegman, E.J. (1972). Nonparametric probability density estimation: I. A summary of available methods. Technometrics (14) 533-546.

[23] Wegman, E.J. and Davies, H.I. (1975). Remarks on some recursive estimators of a probability density. Institute of Statistics Mimeo Series #1021, University of North Carolina at Chapel Hill.

[24] Woodroofe, M. (1967). On the maximum deviation of the sample density. Ann. Math. Statist. (38) 475-481.

[25] Yamato, H. (1971). Sequential estimation of a continuous probability density function and mode. Bull. Math. Statist. (14) 1-12.