Remarks on Some Recursive Estimators
of a Probability Density

by

Edward J. Wegman†
Department of Statistics
University of North Carolina at Chapel Hill

and

H. I. Davies
Department of Mathematics
University of New England
Armidale, Australia

Institute of Statistics Mimeo Series No. 1021
July, 1975

† The work of this author was supported by the Air Force Office of Scientific Research under Grant No. AFOSR-75-2840.
ABSTRACT

The density estimator of Yamato (1971),

$$f_n^*(x) = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{h_j}\, K\!\left(\frac{x-X_j}{h_j}\right),$$

as well as a closely related one,

$$f_n^\dagger(x) = \frac{1}{n\sqrt{h_n}} \sum_{j=1}^{n} \frac{1}{\sqrt{h_j}}\, K\!\left(\frac{x-X_j}{h_j}\right),$$

are considered. Expressions for asymptotic bias and variance are developed and weak consistency is shown. Using the almost sure invariance principle, laws of the iterated logarithm are developed. Finally, applications of these results to sequential estimation procedures are made.

Keywords: recursive estimators, asymptotic bias, asymptotic variance, weak consistency, almost sure invariance principle, law of the iterated logarithm, strong consistency, asymptotic distribution, sequential procedure.

AMS Classification Nos. 62G05, 62L12, 60F20, 60G50.
1. Introduction.

Let $X_1, X_2, \ldots$ be a sequence of i.i.d. observations drawn according to a probability density, $f$. Rosenblatt (1956) introduced the kernel estimator of the density $f(x)$,

$$\hat{f}_n(x) = \frac{1}{nh_n} \sum_{j=1}^{n} K\!\left(\frac{x-X_j}{h_n}\right),$$
and, in a now classic paper, Parzen (1962) developed many of the important properties of these estimators.
A closely related estimator,

$$f_n^*(x) = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{h_j}\, K\!\left(\frac{x-X_j}{h_j}\right),$$

was introduced by Yamato (1971). This latter estimator has the very useful property that it can be calculated recursively, i.e.

$$f_n^*(x) = \frac{n-1}{n}\, f_{n-1}^*(x) + \frac{1}{nh_n}\, K\!\left(\frac{x-X_n}{h_n}\right).$$

This property is particularly useful for fairly large sample sizes, since the addition of a few extra observations means $\hat{f}_n(x)$ must be entirely re-computed - a tedious chore even with a computer.
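The recursion is easy to state in code. The following minimal Python sketch (an illustration added to this transcription, not part of the original paper) updates the Yamato estimate in O(1) work per new observation; the Gaussian kernel and the bandwidths $h_n = n^{-1/5}$ are assumed choices satisfying (1) and (2), not prescribed by the text.

```python
import numpy as np

def gaussian_kernel(u):
    # A symmetric, bounded probability density kernel satisfying (1).
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def yamato_update(f_prev, n, x, x_n, h_n, K=gaussian_kernel):
    # One-step recursion: f*_n(x) = ((n-1)/n) f*_{n-1}(x) + K((x-X_n)/h_n)/(n h_n).
    # For n = 1 the first term vanishes, so f_prev = 0 starts the recursion.
    return (n - 1) / n * f_prev + K((x - x_n) / h_n) / (n * h_n)

rng = np.random.default_rng(0)
x, f_star = 0.0, 0.0
for n in range(1, 5001):
    X_n = rng.standard_normal()    # stream of N(0,1) observations
    h_n = n ** (-0.2)              # assumed bandwidths: h_n -> 0, n h_n -> infinity
    f_star = yamato_update(f_star, n, x, X_n, h_n)
print(f_star)                      # near f(0) = 1/sqrt(2*pi), about 0.3989
```

By contrast, updating $\hat{f}_n(x)$ would require re-summing all $n$ kernel terms, since the common bandwidth $h_n$ changes with $n$.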
In this paper we shall explore some properties of $f_n^*$, as well as a related estimator, $f_n^\dagger$, defined by

$$f_n^\dagger(x) = \frac{1}{n\sqrt{h_n}} \sum_{j=1}^{n} \frac{1}{\sqrt{h_j}}\, K\!\left(\frac{x-X_j}{h_j}\right).$$

This latter estimator can also be recursively formulated:

$$f_n^\dagger(x) = \frac{n-1}{n} \sqrt{\frac{h_{n-1}}{h_n}}\, f_{n-1}^\dagger(x) + \frac{1}{nh_n}\, K\!\left(\frac{x-X_n}{h_n}\right).$$

In addition, we will give a law of the iterated logarithm for $f_n^\dagger$, a rate of convergence for $f_n^*$, and some properties of both when used as sequential density estimators.
Throughout this paper, we shall deal with univariate estimators. The extension to the multivariate case is straightforward. We shall assume throughout that $K$ is symmetric about 0,

(1)  $K(u) \ge 0$, $\quad \int_{-\infty}^{\infty} K(u)\,du = 1$, $\quad \sup_u K(u) < \infty$, $\quad |u|K(u) \to 0$ as $u \to \pm\infty$,

and that $\{h_n\}$ is a sequence of numbers such that

(2)  $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$.

Other assumptions on $K$ and $\{h_n\}$ will be made as needed.
2. Asymptotic Bias, Variance and Consistency.

Throughout this paper, it will be convenient to deal with the sum

$$n\sqrt{h_n}\, f_n^\dagger(x) = \sum_{j=1}^{n} \frac{1}{\sqrt{h_j}}\, K\!\left(\frac{x-X_j}{h_j}\right).$$

We recall a useful lemma from Parzen (1962).
Lemma 1: Suppose $K(u)$ is a Borel function satisfying (1). Let $g(u)$ satisfy $\int_{-\infty}^{\infty} |g(u)|\,du < \infty$, and let $\{h_n\}$ satisfy (2). Then

$$\frac{1}{h_n} \int_{-\infty}^{\infty} K\!\left(\frac{u}{h_n}\right) g(x-u)\,du \;\to\; g(x) \int_{-\infty}^{\infty} K(u)\,du \qquad \text{as } n \to \infty.$$
Theorem 1:

(a) Let $K$ and $\{h_n\}$ satisfy (1) and (2). If $f$ is a bounded density,

$$nh_n \operatorname{var} f_n^\dagger(x) \to f(x) \int_{-\infty}^{\infty} K^2(u)\,du.$$

(b) Let us suppose $K$ has Fourier transform $K^*(u) = \int_{-\infty}^{\infty} e^{-iuy} K(y)\,dy$. Suppose further that for some $r$, $\lim_{u\to 0}\{[1-K^*(u)]/|u|^r\}$ is finite and that $f^{(r)}(x)$ exists. Then

$$|E f_n^*(x) - f(x)| \le O\!\left(\frac{1}{n} \sum_{j=1}^{n} h_j^r\right).$$

(c) Under the assumptions of (b), and choosing $h_n = bn^{-\gamma}$,

$$E f_n^\dagger(x) \to \frac{f(x)}{1-\gamma/2}.$$
Proof: Now

$$nh_n \operatorname{var} f_n^\dagger(x) = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{h_j} \left[ E K^2\!\left(\frac{x-X_j}{h_j}\right) - \left( E K\!\left(\frac{x-X_j}{h_j}\right) \right)^2 \right]$$

$$= \frac{1}{n} \sum_{j=1}^{n} \left[ \frac{1}{h_j} \int_{-\infty}^{\infty} K^2\!\left(\frac{x-u}{h_j}\right) f(u)\,du - \frac{1}{h_j} \left( \int_{-\infty}^{\infty} K\!\left(\frac{x-u}{h_j}\right) f(u)\,du \right)^2 \right].$$

But making a simple change of variable,

$$\frac{1}{h_j} \left( \int_{-\infty}^{\infty} K\!\left(\frac{x-u}{h_j}\right) f(u)\,du \right)^2 = h_j \left( \int_{-\infty}^{\infty} K(u)\, f(x-h_j u)\,du \right)^2.$$

It follows that the Cesàro sum

$$\frac{1}{n} \sum_{j=1}^{n} h_j \left( \int_{-\infty}^{\infty} K(u)\, f(x-h_j u)\,du \right)^2 \to 0.$$

Similarly,

$$\frac{1}{h_j} \int_{-\infty}^{\infty} K^2\!\left(\frac{x-u}{h_j}\right) f(u)\,du = \frac{1}{h_j} \int_{-\infty}^{\infty} K^2\!\left(\frac{u}{h_j}\right) f(x-u)\,du.$$

Since $K$ is a bounded function, $\int_{-\infty}^{\infty} K^2(u)\,du < \infty$. By Lemma 1,

$$\frac{1}{h_j} \int_{-\infty}^{\infty} K^2\!\left(\frac{x-u}{h_j}\right) f(u)\,du \to f(x) \int_{-\infty}^{\infty} K^2(u)\,du,$$

hence for the Cesàro sum,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} \frac{1}{h_j} \int_{-\infty}^{\infty} K^2\!\left(\frac{x-u}{h_j}\right) f(u)\,du = f(x) \int_{-\infty}^{\infty} K^2(u)\,du.$$

The conclusion of (a) follows.
Parzen (1962) shows

$$\frac{1}{h_n^r} \left[ \frac{1}{h_n} \int_{-\infty}^{\infty} K\!\left(\frac{x-y}{h_n}\right) f(y)\,dy - f(x) \right] \to k_r f^{(r)}(x),$$

where $k_r = \lim_{u\to 0}\{[1-K^*(u)]/|u|^r\}$. Clearly, there exists $c_r$ such that

$$\left| \frac{1}{h_n} \int_{-\infty}^{\infty} K\!\left(\frac{x-y}{h_n}\right) f(y)\,dy - f(x) \right| \le c_r h_n^r \qquad \text{for all } n.$$

But

$$|E f_n^*(x) - f(x)| \le \frac{1}{n} \sum_{j=1}^{n} \left| \frac{1}{h_j} \int_{-\infty}^{\infty} K\!\left(\frac{x-y}{h_j}\right) f(y)\,dy - f(x) \right| \le \frac{1}{n} \sum_{j=1}^{n} c_r h_j^r.$$

To observe the result for $f_n^\dagger(x)$, multiplying by $\sqrt{h_j}$, dividing by $n\sqrt{h_n}$ and summing yields

$$f(x)\, \frac{1}{n} \sum_{j=1}^{n} \sqrt{\frac{h_j}{h_n}} - \frac{c_r}{n\sqrt{h_n}} \sum_{j=1}^{n} h_j^{r+1/2} \;\le\; E f_n^\dagger(x) \;\le\; f(x)\, \frac{1}{n} \sum_{j=1}^{n} \sqrt{\frac{h_j}{h_n}} + \frac{c_r}{n\sqrt{h_n}} \sum_{j=1}^{n} h_j^{r+1/2}.$$
Under the assumptions of (c),

$$\frac{1}{n} \sum_{j=1}^{n} \sqrt{\frac{h_j}{h_n}} = \frac{n^{\gamma/2}}{n} \sum_{j=1}^{n} j^{-\gamma/2},$$

and, using integral approximations,

$$\frac{n^{\gamma/2}}{n} \sum_{j=1}^{n} j^{-\gamma/2} \to \frac{1}{1-\gamma/2}.$$

Similarly, using integral approximations,

$$\frac{1}{n\sqrt{h_n}} \sum_{j=1}^{n} h_j^{r+1/2} \to 0.$$

Thus

$$\lim_{n\to\infty} E f_n^\dagger(x) = \frac{f(x)}{1-\gamma/2}. \qquad \square$$
Using the integral approximation, it can be shown that if $h_n = bn^{-\gamma}$, then

$$\frac{1}{n} \sum_{j=1}^{n} h_j^r = O(n^{-\gamma r}) = O(h_n^r).$$

Thus for $h_n = bn^{-\gamma}$, the Yamato estimator has the same rate of convergence for the bias term as the Parzen estimator. However, this need not always be the case. For example, if $r = 1$ and $h_n = b(\log n)/n$, then

$$\frac{1}{n} \sum_{j=1}^{n} h_j = O\!\left(\frac{\log n}{n}\, \log n\right) = O(\log n \cdot h_n).$$

Thus the Yamato estimator may have worse bias properties than the standard kernel estimators.
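These two rates are easy to check numerically (an illustration added here, not in the original). With $r = 1$, the Cesàro average $(1/n)\sum_j h_j$ stays within a constant multiple of $h_n$ for $h_n = n^{-1/3}$, but the ratio grows like $\log n$ for $h_n = (\log n)/n$:

```python
import numpy as np

n = np.arange(1, 200001)

# Case 1: h_j = j^(-1/3); ratio of Cesaro average to h_n tends to 1/(1 - 1/3) = 1.5.
h = n ** (-1.0 / 3.0)
print((np.cumsum(h) / n)[-1] / h[-1])

# Case 2: h_j = log(j)/j (j >= 2); the ratio behaves like (log n)/2, i.e. unbounded.
m = n[1:]
h2 = np.log(m) / m
ratio = (np.cumsum(h2) / m) / h2
for k in (10**3, 10**4, 10**5):
    print(k, ratio[k - 2])
```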
3. An Almost Sure Invariance Principle.

Strassen (1964, 1965) introduced the idea of an almost sure invariance principle, and his notion has been developed by Jain, Jogdeo and Stout (1975). Briefly put, we will use the almost sure invariance principle by showing that the sum

$$\sum_{j=1}^{n} \frac{1}{\sqrt{h_j}} \left[ K\!\left(\frac{x-X_j}{h_j}\right) - E K\!\left(\frac{x-X_j}{h_j}\right) \right]$$

is with probability one close to Brownian motion in a sense made precise below. The asymptotic fluctuation behavior of Brownian motion has been investigated, and by use of the almost sure invariance principle we may transfer results about Brownian motion to our density estimates.

We first shall reproduce some relevant results from Jain, Jogdeo and Stout (1975).
Theorem 2 represents a less general version of Theorems 3.2 and 5.1 of Jain, Jogdeo and Stout (1975).

Theorem 2: Let $Y_1, \ldots, Y_n, \ldots$ be a sequence of zero mean random variables with finite second moments. Let $S_n = \sum_{j=1}^{n} Y_j$ and $V_n = \sum_{j=1}^{n} E[Y_j^2]$, $S_0 = 0 = V_0$. For a fixed $\alpha \ge 0$, assume

(3)  $V_n \to \infty$

and

(4)  $\displaystyle\sum_{k=1}^{\infty} \frac{1}{V_k}\, E\!\left[ Y_k^2\, I\!\left( Y_k^2 > \frac{V_k}{\log V_k\, (\log_2 V_k)^{2(\alpha+1)}} \right) \right] < \infty.$

Let $S$ be a random function defined on $[0,\infty)$ obtained by setting $S(t) = S_n$ for $t \in [V_n, V_{n+1})$. Then, redefining $\{S(t),\, t \ge 0\}$, if necessary, on a new probability space, there exists a Brownian motion $\zeta$ such that

(5)  $|S(t) - \zeta(t)| = o\!\left( t^{1/2} (\log_2 t)^{(1-\alpha)/2} \right)$ a.s.

Here $\log_2 t = \log\log t$. In particular, if (4) holds with $\alpha = 2$ and $\phi > 0$ is a nondecreasing function, then

$$P[\, S_n > V_n^{1/2} \phi(V_n) \text{ i.o.}\,] = 0 \text{ or } 1$$

according as

$$\int_1^{\infty} \frac{\phi(t)}{t} \exp(-\phi^2(t)/2)\,dt < \infty \text{ or } = \infty.$$

Let us identify

$$Y_j = \frac{1}{\sqrt{h_j}} \left[ K\!\left(\frac{x-X_j}{h_j}\right) - E K\!\left(\frac{x-X_j}{h_j}\right) \right].$$

Thus we have

$$V_n = \sum_{j=1}^{n} E Y_j^2 = \sum_{j=1}^{n} \frac{1}{h_j}\, E\!\left[ K\!\left(\frac{x-X_j}{h_j}\right) - E K\!\left(\frac{x-X_j}{h_j}\right) \right]^2 = h_n \operatorname{var}\!\left( n f_n^\dagger(x) \right).$$

But under the assumptions of Theorem 1,

$$nh_n \operatorname{var} f_n^\dagger(x) \to f(x) \int_{-\infty}^{\infty} K^2(u)\,du,$$

so that $V_n/n \to f(x) \int_{-\infty}^{\infty} K^2(u)\,du$. Thus for $\delta > 0$ and for $n$ sufficiently large,

$$\left( f(x) \int_{-\infty}^{\infty} K^2(u)\,du - \delta \right) n \;\le\; V_n \;\le\; \left( f(x) \int_{-\infty}^{\infty} K^2(u)\,du + \delta \right) n.$$
Theorem 3:

(a) Let $K$ satisfy (1) and $\{h_n\}$ satisfy (2). Let $f$ satisfy the conditions of Theorem 1. If in addition,

(6)  $\dfrac{nh_n}{\log n\, (\log_2 n)^{2(\alpha+1)}}$ diverges to $\infty$,

then (5) holds for $S_n$ defined above.

(b) In particular, if $nh_n/(\log n\, (\log_2 n)^6)$ diverges to $\infty$, then

$$P[\, S_n > V_n^{1/2} \phi(V_n) \text{ i.o.}\,] = 0 \text{ or } 1$$

according as

$$\int_1^{\infty} \frac{\phi(t)}{t} \exp(-\phi^2(t)/2)\,dt < \infty \text{ or } = \infty.$$

(c) For $\alpha \ge 0$,

$$\limsup_{n\to\infty} \left( \frac{nh_n}{\log_2 n} \right)^{1/2} \left( f_n^\dagger(x) - E f_n^\dagger(x) \right) = \left( 2 f(x) \int_{-\infty}^{\infty} K^2(u)\,du \right)^{1/2} \quad \text{a.s.}$$

(d) For $\alpha > 1$,

$$\lim_{n\to\infty} P\!\left[ \frac{f_n^\dagger(x) - E f_n^\dagger(x)}{\sqrt{\operatorname{var} f_n^\dagger(x)}} \le w \right] = \Phi(w),$$

where $\Phi$ is the $n(0,1)$ distribution function.
Proof: Consider first the event

$$\left[\, Y_k^2 > \frac{V_k}{\log V_k\, (\log_2 V_k)^{2(\alpha+1)}} \,\right].$$

Now $V_k \ge c^* k$, where $c^*$ is some constant. Thus

$$\left[\, Y_k^2 > \frac{V_k}{\log V_k\, (\log_2 V_k)^{2(\alpha+1)}} \,\right] \subseteq \left[\, Y_k^2 \ge \frac{c^* k}{\log k\, (\log_2 k)^{2(\alpha+1)}} \,\right].$$

But

$$Y_k^2 \ge \frac{c^* k}{\log k\, (\log_2 k)^{2(\alpha+1)}}$$

if and only if

$$\left[ K\!\left(\frac{x-X_k}{h_k}\right) - E K\!\left(\frac{x-X_k}{h_k}\right) \right]^2 \ge \frac{c^* h_k k}{\log k\, (\log_2 k)^{2(\alpha+1)}}.$$

Since $K$ is bounded and (6) holds, this is an impossible event for $k$ sufficiently large. Thus

$$E\!\left[ Y_k^2\, I\!\left( Y_k^2 > \frac{V_k}{\log V_k\, (\log_2 V_k)^{2(\alpha+1)}} \right) \right] = 0 \qquad \text{for } k \text{ sufficiently large},$$

and it follows that (4) holds. Hence the conclusion of (a) holds. Part (b) is immediate.

To see part (c), divide (5) by $(2t \log_2 t)^{1/2}$:

$$\frac{|S(t) - \zeta(t)|}{(2t \log_2 t)^{1/2}} = \frac{o\!\left( t^{1/2} (\log_2 t)^{(1-\alpha)/2} \right)}{(2t \log_2 t)^{1/2}} = o\!\left( (\log_2 t)^{-\alpha/2} \right).$$

But

$$\limsup_{t\to\infty} \frac{\zeta(t)}{(2t \log_2 t)^{1/2}} = 1 \quad \text{a.s.},$$

hence for $\alpha \ge 0$,

$$\limsup_{t\to\infty} \frac{S(t)}{(2t \log_2 t)^{1/2}} = 1 \quad \text{a.s.}$$

Thus

$$\limsup_{n\to\infty} \frac{S_n}{(2 V_n \log_2 V_n)^{1/2}} = 1 \quad \text{a.s.}$$

But $V_n/n \to f(x) \int_{-\infty}^{\infty} K^2(u)\,du$. Hence

(7)  $\displaystyle\limsup_{n\to\infty} \frac{S_n}{(2n \log_2 n)^{1/2}} = \left( f(x) \int_{-\infty}^{\infty} K^2(u)\,du \right)^{1/2}$ a.s.

Finally, noting that $S_n = n\sqrt{h_n}\,\left( f_n^\dagger(x) - E f_n^\dagger(x) \right)$, we have

$$\limsup_{n\to\infty} \left( \frac{nh_n}{\log_2 n} \right)^{1/2} \left( f_n^\dagger(x) - E f_n^\dagger(x) \right) = \left( 2 f(x) \int_{-\infty}^{\infty} K^2(u)\,du \right)^{1/2} \quad \text{a.s.}$$

Hence part (c).

For part (d), we observe that since $\alpha > 1$,

$$\frac{|S(t) - \zeta(t)|}{t^{1/2}} \le o\!\left( (\log_2 t)^{(1-\alpha)/2} \right) \to 0 \quad \text{a.s. as } t \to \infty.$$

But $\zeta(t)/t^{1/2}$ is normal with mean zero and variance one, so $S(t)/t^{1/2}$ is asymptotically $n(0,1)$. Letting $t = V_n$, $S_n/V_n^{1/2}$ is asymptotically $n(0,1)$. But

$$V_n = n^2 h_n \operatorname{var} f_n^\dagger(x) = \operatorname{var}\!\left( n\sqrt{h_n}\, f_n^\dagger(x) \right) \qquad \text{and} \qquad S_n = \sum_{j=1}^{n} Y_j = n\sqrt{h_n}\,\left( f_n^\dagger(x) - E f_n^\dagger(x) \right),$$

so that $\left( f_n^\dagger(x) - E f_n^\dagger(x) \right)/\sqrt{\operatorname{var} f_n^\dagger(x)}$ is asymptotically $n(0,1)$. $\square$
While we know the exact order of $f_n^\dagger(x) - E f_n^\dagger(x)$, the fact that $f_n^\dagger$ is a biased estimator and the fact that we do not have any rate on the bias term limit the usefulness of $f_n^\dagger$. Of course, $\gamma$ is a parameter of $h_n$ and hence known to us. We could therefore consider $(1-\gamma/2)\, f_n^\dagger(x)$, which would be asymptotically unbiased. Combining this result with Theorem 1 part (a) and Theorem 3 part (c) yields a weakly and a strongly consistent estimator respectively.
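Part (d) of Theorem 3 can be visualized by simulation. The sketch below (an added illustration; the Gaussian kernel, $f$ the $n(0,1)$ density and $h_n = n^{-1/5}$ are assumed choices) standardizes $f_n^\dagger(x)$ across independent replications and compares its empirical distribution with $\Phi$:

```python
import numpy as np
from math import erf, sqrt

def K(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def f_dagger(xs, x, h):
    # f†_n(x) = (1/(n sqrt(h_n))) sum_j h_j^{-1/2} K((x - X_j)/h_j)
    n = len(xs)
    return np.sum(K((x - xs) / h) / np.sqrt(h)) / (n * np.sqrt(h[-1]))

rng = np.random.default_rng(1)
n, reps, x = 2000, 500, 0.0
h = np.arange(1, n + 1) ** (-0.2)       # assumed bandwidth sequence
vals = np.array([f_dagger(rng.standard_normal(n), x, h) for _ in range(reps)])
z = (vals - vals.mean()) / vals.std()   # standardized in lieu of the unknown E f† and var f†
for w in (-1.0, 0.0, 1.0):
    Phi_w = 0.5 * (1 + erf(w / sqrt(2)))
    print(w, (z <= w).mean(), round(Phi_w, 3))   # empirical CDF vs Phi(w)
```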
Results for $f_n^\dagger$ can be translated to results for $f_n^*$ by the next two very useful lemmas. These were suggested by the Toeplitz Lemma and the Kronecker Lemma; see Loève (1963).
Lemma 2: Let $b_n \uparrow \infty$ and $c_n \uparrow \infty$, and let $s_n$ be a sequence such that $s_n/c_n \to s$. Let $a_j = b_j - b_{j-1}$, $j \ge 2$, with $a_1 = b_1$. Then

$$\frac{1}{b_n c_n} \sum_{j=1}^{n} a_j s_j \to s.$$

Proof: Note that $b_n = \sum_{j=1}^{n} a_j$. Let $\varepsilon > 0$. There is $N_\varepsilon$ such that $n > N_\varepsilon$ implies $s - \varepsilon \le s_n/c_n \le s + \varepsilon$. Let

$$s_n' = \frac{1}{b_n c_n} \sum_{j=1}^{n} a_j s_j = \frac{1}{b_n c_n} \sum_{j=1}^{N_\varepsilon} a_j s_j + \frac{1}{b_n c_n} \sum_{j=N_\varepsilon+1}^{n} a_j s_j.$$

Since $b_n c_n \to \infty$, the first sum tends to 0, while

$$\frac{1}{b_n c_n} \sum_{j=N_\varepsilon+1}^{n} a_j c_j (s - \varepsilon) \;\le\; \frac{1}{b_n c_n} \sum_{j=N_\varepsilon+1}^{n} a_j s_j \;\le\; \frac{1}{b_n c_n} \sum_{j=N_\varepsilon+1}^{n} a_j c_j (s + \varepsilon).$$

Taking lim inf and lim sup, $s - \varepsilon \le \liminf s_n'$ and $\limsup s_n' \le s + \varepsilon$. Since $\varepsilon$ was arbitrary, $s_n' \to s$. $\square$
Lemma 3: Let $s_n = \sum_{j=1}^{n} y_j$, $s_0 = 0$, with $s_n/c_n \to s$, and let $a_j = b_j - b_{j-1}$ with $a_1 = b_1$. Then

$$\frac{1}{b_n c_n} \sum_{j=1}^{n} b_j y_j \to 0.$$

Proof:

$$\frac{1}{b_n c_n} \sum_{j=1}^{n} b_j y_j = \frac{1}{b_n c_n} \sum_{j=1}^{n} b_j (s_j - s_{j-1}) = \frac{s_n}{c_n} - \frac{1}{b_n c_n} \sum_{j=1}^{n-1} (b_{j+1} - b_j) s_j = \frac{s_n}{c_n} - \frac{1}{b_n c_n} \sum_{j=1}^{n-1} a_{j+1} s_j.$$

Using Lemma 2,

$$\frac{1}{b_n c_n} \sum_{j=1}^{n} b_j y_j \to s - s = 0. \qquad \square$$
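A quick numerical instance of Lemma 3 (an added illustration): take $y_j \equiv 1$ and $c_n = n$, so $s_n/c_n \to 1$, together with the rapidly increasing $b_n = 2^n$; then $(1/(b_n c_n)) \sum_{j=1}^{n} b_j y_j = (2 - 2^{1-n})/n \to 0$.

```python
import numpy as np

n = np.arange(1, 61)
b = 2.0 ** n                   # b_n increasing to infinity; y_j = 1, c_n = n, s_n/c_n = 1
lhs = np.cumsum(b) / (b * n)   # (1/(b_n c_n)) sum_{j<=n} b_j y_j = (2 - 2^(1-n))/n
print(lhs[[9, 29, 59]])        # ~0.200, 0.067, 0.033: tending to 0
```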
Theorem 4: Let $K$ satisfy (1) and $\{h_n\}$ satisfy (2). Let $f$ satisfy the conditions of Theorem 1. If in addition $nh_n/(\log n\, (\log_2 n)^2)$ diverges to $\infty$, then

$$\left( \frac{nh_n}{\log_2 n} \right)^{1/2} \left( f_n^*(x) - E f_n^*(x) \right) \to 0 \quad \text{a.s. as } n \to \infty.$$

Moreover, if the conditions of Theorem 1 part (b) hold and $h_n = bn^{-\gamma}$ with $\gamma \ge \frac{1}{2r+1}$, then

$$\left( \frac{n^{1-\gamma}}{\log_2 n} \right)^{1/2} \left( f_n^*(x) - f(x) \right) \to 0 \quad \text{a.s. as } n \to \infty.$$
Proof: We observe that

$$n \left( f_n^*(x) - E f_n^*(x) \right) = \sum_{j=1}^{n} \frac{1}{h_j} \left[ K\!\left(\frac{x-X_j}{h_j}\right) - E K\!\left(\frac{x-X_j}{h_j}\right) \right].$$

Identify $c_n = (n \log_2 n)^{1/2}$, $b_n = 1/\sqrt{h_n}$ and

$$y_n = \frac{1}{\sqrt{h_n}} \left[ K\!\left(\frac{x-X_n}{h_n}\right) - E K\!\left(\frac{x-X_n}{h_n}\right) \right]$$

in Lemma 3. The result follows from (7) and Lemma 3.

Notice that $E f_n^*(x) - f(x) = O(n^{-\gamma r})$, hence

$$\left( \frac{n^{1-\gamma}}{\log_2 n} \right)^{1/2} \left| E f_n^*(x) - f(x) \right| \le O\!\left[ \left( \frac{n^{1-\gamma(2r+1)}}{\log_2 n} \right)^{1/2} \right].$$

Since $\gamma \ge \frac{1}{2r+1}$, $1 - \gamma(2r+1) \le 0$, so that

$$\left( \frac{n^{1-\gamma}}{\log_2 n} \right)^{1/2} \left| E f_n^*(x) - f(x) \right| \to 0 \quad \text{as } n \to \infty. \qquad \square$$
In fact, Yamato (1971, p. 6) concludes under suitable conditions that if

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} \frac{h_n}{h_j} = a,$$

then

$$\lim_{n\to\infty} nh_n \operatorname{var}[f_n^*(x)] = a f(x) \int_{-\infty}^{\infty} K^2(u)\,du.$$

In fact, we can apply Lemma 3 again to the expression $\frac{1}{n} \sum_{j=1}^{n} h_n/h_j$. Let $c_n = n$, $b_n = 1/h_n$ and $y_n = 1$. Clearly, $\frac{1}{c_n} \sum_{j=1}^{n} y_j = \frac{n}{n} = 1 \to 1$, so that

$$\frac{1}{n} \sum_{j=1}^{n} \frac{h_n}{h_j} \to 0.$$

Hence for the Yamato estimator, $a = 0$, and

$$\lim_{n\to\infty} nh_n \operatorname{var}[f_n^*(x)] = 0.$$

It follows that the Yamato estimator, $f_n^*(x)$, while possibly somewhat worse in terms of bias, is better in terms of variance.
5. A Sequential Procedure.

One particularly useful application of recursively formulated density estimators is to sequential procedures. Davies and Wegman (1975) introduce sequential density estimation, studying in some detail rules of the form:

Stop if $|\hat{f}_n(x) - \hat{f}_{n-1}(x)| < \varepsilon$; otherwise continue.

In this section we shall discuss briefly a rule suggested by the recursive estimator itself. For both the Yamato estimator, $f_n^*(x)$, and the estimator introduced in this paper, $f_n^\dagger(x)$, the correction term due to observation $X_n$ is $\frac{1}{nh_n} K\!\left(\frac{x-X_n}{h_n}\right)$. A reasonable stopping rule might be to stop when $\frac{1}{nh_n} K\!\left(\frac{x-X_n}{h_n}\right)$ gets "too small". Unfortunately, since $nh_n \to \infty$ and $K$ is bounded, $\frac{1}{nh_n} K\!\left(\frac{x-X_n}{h_n}\right)$ gets "too small" independent of the observations. Thus we choose a stopping variable $N_\varepsilon$ such that

$$N_\varepsilon = \text{first } n \text{ such that } \frac{1}{h_n} K\!\left(\frac{x-X_n}{h_n}\right) < \varepsilon; \qquad N_\varepsilon = \infty \text{ if no such } n \text{ exists.}$$

Theorem 5: We assume (1) and (2) hold for $K$ and $\{h_n\}$ respectively.

(a) $P[N_\varepsilon < \infty] = 1$, i.e. $N_\varepsilon$ is a closed stopping rule.

(b) $E N_\varepsilon^k < \infty$ for every $k$. Moreover, there is a number $\rho$, with $0 < \rho < 1$, such that $E e^{t N_\varepsilon}$ exists for $t < -\log \rho$.

(c) If $K(x) > 0$ for all $x$, then $N_\varepsilon \to \infty$ in probability as $\varepsilon \downarrow 0$.

(d) If $K(x) > 0$ for all $x$, then $N_\varepsilon \to \infty$ a.s. as $\varepsilon \downarrow 0$.

(e) Under the hypotheses of Theorems 3 and 4 and if $K(x) > 0$ for all $x$,

$$f_{N_\varepsilon}^*(x) \to f(x) \text{ a.s.} \quad \text{and} \quad (1-\gamma/2)\, f_{N_\varepsilon}^\dagger(x) \to f(x) \text{ a.s.} \qquad \text{as } \varepsilon \downarrow 0.$$
Proof: Let $X$ have density $f$. We first observe

$$P[N_\varepsilon = n] = P\!\left[ \frac{1}{h_1} K\!\left(\frac{x-X_1}{h_1}\right) \ge \varepsilon \right] \cdots P\!\left[ \frac{1}{h_{n-1}} K\!\left(\frac{x-X_{n-1}}{h_{n-1}}\right) \ge \varepsilon \right] P\!\left[ \frac{1}{h_n} K\!\left(\frac{x-X_n}{h_n}\right) < \varepsilon \right] = p_1 \cdots p_{n-1} (1 - p_n),$$

where

$$p_j = P\!\left[ \frac{1}{h_j} K\!\left(\frac{x-X_j}{h_j}\right) \ge \varepsilon \right].$$

Since $|u| K(u) \to 0$ as $u \to \pm\infty$, it follows that $p_j \to 0$ as $j \to \infty$. Thus

$$P[N_\varepsilon < \infty] = \sum_{j=1}^{\infty} P[N_\varepsilon = j] = 1 - \prod_{j=1}^{\infty} p_j = 1.$$

Let $0 < \rho < 1$. Since $p_j \to 0$, there is $n_\rho$ such that $p_j \le \rho$ for $j > n_\rho$. Hence for fixed $k$,

$$E N_\varepsilon^k = \sum_{n=1}^{\infty} n^k P[N_\varepsilon = n] \le \sum_{n=1}^{n_\rho} n^k P[N_\varepsilon = n] + \sum_{n=n_\rho+1}^{\infty} n^k \rho^{n-1-n_\rho} < \infty.$$

Similarly,

$$E e^{t N_\varepsilon} = \sum_{n=1}^{\infty} e^{tn} P[N_\varepsilon = n] \le \sum_{n=1}^{n_\rho} e^{tn} + e^{t(1+n_\rho)} \sum_{n=n_\rho+1}^{\infty} (e^t \rho)^{n-1-n_\rho}.$$

This latter sum will be finite provided $e^t \rho < 1$, or $t < -\log \rho$.

To show (c), we note that $p_j \to 1$ as $\varepsilon \downarrow 0$ for each $j$. Thus

$$P[N_\varepsilon > n] = p_1 \cdots p_n \to 1 \quad \text{as } \varepsilon \downarrow 0,$$

i.e. $N_\varepsilon \to \infty$ in probability as $\varepsilon \downarrow 0$.

Next let $\omega$ be any point in the basic probability space. Since $K(x) > 0$ for all $x$, we have $\frac{1}{h_j} K\!\left(\frac{x-X_j(\omega)}{h_j}\right) > 0$ for every $j$. Let $N_0$ be any positive integer. Choose

$$\varepsilon < \min_{1 \le j \le N_0} \frac{1}{h_j} K\!\left(\frac{x-X_j(\omega)}{h_j}\right)$$

($\varepsilon$ may depend on $\omega$). Thus $N_\varepsilon(\omega) > N_0$, and hence $\liminf_{\varepsilon \to 0} N_\varepsilon \ge N_0$ a.s. But $N_0$ was arbitrary, so $\liminf_{\varepsilon \to 0} N_\varepsilon = \infty$ a.s. Part (e) follows immediately. $\square$
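The stopping rule is straightforward to simulate (an added sketch; the Gaussian kernel and $h_n = n^{-1/5}$ are assumed, as earlier). Theorem 5(a) guarantees the loop terminates with probability one, and part (c) is visible in the growth of the average $N_\varepsilon$ as $\varepsilon$ decreases:

```python
import numpy as np

def K(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def stop_N(rng, x, eps, max_n=10**6):
    # N_eps = first n with (1/h_n) K((x - X_n)/h_n) < eps; max_n is only a safeguard.
    for n in range(1, max_n + 1):
        h_n = n ** (-0.2)          # assumed bandwidth sequence
        if K((x - rng.standard_normal()) / h_n) / h_n < eps:
            return n
    return max_n

rng = np.random.default_rng(2)
for eps in (0.2, 0.1, 0.05):
    draws = [stop_N(rng, 0.0, eps) for _ in range(200)]
    print(eps, np.mean(draws))     # average N_eps grows as eps decreases
```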
A slightly more general stopping rule might be

$$N = \text{first } n \text{ such that } \frac{1}{g(h_n)} K\!\left(\frac{x-X_n}{h_n}\right) < \varepsilon; \qquad N = \infty \text{ if no such } n \text{ exists,}$$

where $g(x)$ is some monotone non-decreasing function of $x$. To illustrate, consider the rule (taking $g \equiv 1$)

$$N = \text{first } n \text{ such that } K\!\left(\frac{x-X_n}{h_n}\right) < \varepsilon; \qquad N = \infty \text{ if no such } n \text{ exists.}$$

In this example, we presume $X_1, \ldots, X_n, \ldots$ is a $n(0,1)$ sample and we are estimating $f(0)$. Let us assume that

$$K(x) = \frac{1}{\pi(1+x^2)}, \qquad -\infty < x < \infty.$$

We observe then
$$p_n = P\!\left[ K\!\left(\frac{X_n}{h_n}\right) \ge \varepsilon \right] = P\!\left[ |X_n| \le h_n \sqrt{\frac{1}{\pi\varepsilon} - 1}\, \right] = 2\Phi\!\left( h_n \sqrt{\frac{1}{\pi\varepsilon} - 1} \right) - 1,$$

where $\Phi$ is the $n(0,1)$ distribution function. In this case, we notice that $p_n$ depends on $n$ only through the slowly varying bandwidth $h_n$. Thus $P[N_\varepsilon = n]$ is very close to a geometric distribution. We also note here that, in general, we can compute the exact distribution of $N_\varepsilon$ given the knowledge of $K$, $\{h_n\}$ and $f$.
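For the Cauchy-kernel illustration above, this exact computation is a few lines (an added sketch; $h_n = n^{-1/5}$ is again an assumed bandwidth sequence, and $\varepsilon$ must be less than $\sup K = 1/\pi$ for the formula to apply):

```python
from math import erf, pi, sqrt

def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def p(n, eps):
    # p_n = P[K(X_n/h_n) >= eps] = 2 Phi(h_n sqrt(1/(pi*eps) - 1)) - 1
    h_n = n ** (-0.2)
    return 2 * Phi(h_n * sqrt(1 / (pi * eps) - 1)) - 1

eps, prod, dist = 0.05, 1.0, []
for n in range(1, 51):
    pn = p(n, eps)
    dist.append(prod * (1 - pn))   # P[N_eps = n] = p_1 ... p_{n-1} (1 - p_n)
    prod *= pn
print(sum(dist), dist[:5])         # total mass accumulates toward 1
```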
REFERENCES

Davies, H.I. and Wegman, E.J. (1975), Sequential nonparametric density estimation, to appear, IEEE Transactions on Information Theory, November 1975.

Jain, N.C., Jogdeo, K., and Stout, W.F. (1975), Upper and lower functions for martingales and mixing processes, Ann. Prob., 3, pp. 119-145.

Loève, M. (1963), Probability Theory, D. Van Nostrand, Princeton, N.J.

Parzen, E. (1962), On estimation of a probability density function and mode, Ann. Math. Statist., 33, pp. 1065-1076.

Rosenblatt, M. (1956), Remarks on some non-parametric estimates of a density function, Ann. Math. Statist., 27, pp. 832-837.

Strassen, V. (1964), An invariance principle for the law of the iterated logarithm, Z. Wahrscheinlichkeitstheorie verw. Geb., 3, pp. 211-226.

Strassen, V. (1965), Almost sure behavior of sums of independent random variables and martingales, Proc. Fifth Berkeley Symp. Math. Statist. Prob., 2, pp. 315-343.

Yamato, H. (1971), Sequential estimation of a continuous probability density function and mode, Bull. Math. Statist., 14, pp. 1-12.