Davies, H.I. (1974). "On the sequential estimation of a probability density function."

A PhD dissertation under the direction of EDWARD J. WEGMAN.
ON THE SEQUENTIAL ESTIMATION OF A
PROBABILITY DENSITY FUNCTION
by
H.I. Davies
Department of Statistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 918
April, 1974
H. I. DAVIES.  On the sequential estimation of a probability density function.
(Under the direction of EDWARD J. WEGMAN.)

ABSTRACT
Using kernel estimators of both the Parzen and Yamato types, a
"naive" sequential procedure for estimating an unknown probability
density function is developed.  It is based on the function

    V_n(x) = f̂_{nM}(x) - f̂_{(n-1)M}(x)

through the inequality |V_n(x)| ≤ ε, where ε > 0 is given.  The asymptotic
structure of V_n(x) is examined for both the Parzen and Yamato estimators.

Properties of the procedure are investigated and in particular
it is shown that the stopping rule obtained has finite expectation and
variance and that it is closed.  It is shown that the mean square error
of the estimate f̂_N(x) tends to zero as ε → 0, where N is the stopping
variable defined by the "naive" stopping rule.  Some properties of the
stopping variable are examined and it is shown that under certain
conditions N → ∞ with probability one as ε → 0.

Properties of an estimator f_{N_ε}(x), where N_ε is a stopping
variable, are investigated and in particular it is shown that under
appropriate conditions f_{N_ε}(x) is both mean square consistent and
asymptotically normal.
TABLE OF CONTENTS

CHAPTER                                                         Page

ACKNOWLEDGEMENTS                                                  iv

1   INTRODUCTION                                                   1
    1.1  Introduction and Literature Review                        1
    1.2  Choice of K(u)                                            6
    1.3  Choice of h_n                                             8
    1.4  Summary of Results                                       10

2   A SEQUENTIAL PROCEDURE                                        12
    2.1  Introduction                                             12
    2.2  Properties                                               14
    2.3  Finiteness of EN                                         28
    2.4  Variance of N                                            42
    2.5  Closure of the Stopping Rule                             43
    2.6  Divergence of N as ε → 0                                 44
    2.7  Mean Square Error                                        53

3   SOME GENERAL RESULTS                                          54
    3.1  Introduction and Assumptions                             54
    3.2  Finite Expected Value of f_N(x)                          56
    3.3  Convergence Theorems                                     57

4   THE YAMATO SEQUENTIAL ESTIMATOR                               73
    4.1  Introduction                                             73
    4.2  Strong Consistency                                       74
    4.3  The Sequential Procedure                                 86
    4.4  Properties                                               87
    4.5  An Alternative Procedure                                 92
    4.6  Finiteness of EN_1                                       93
    4.7  Divergence of N_1 as ε → 0                               98
    4.8  Mean Square Error                                        99

5   COMPLEMENTS                                                  100
    5.1  Introduction                                            100
    5.2  Mean and Variance                                       100
    5.3  Choice of ε                                             103
    5.4  Choice of M                                             104
    5.5  Global Stopping Rules                                   106
    5.6  Choice of h_n                                           108
    5.7  Variation                                               110

APPENDIX 1                                                       114

REFERENCES                                                       116
ACKNOWLEDGEMENTS
Firstly, I am greatly indebted to my adviser, Professor
E.J. Wegman who proposed this investigation and who made so many valuable
comments during its course.
His encouragement and stimulating ideas were
of great inspiration to me.
My sincere thanks go also to Professors N.L. Johnson and
W. Hoeffding for reading the manuscript and for making so many
helpful suggestions.
A special thanks is due to Professor G.D. Simons
for his reading of the manuscript at such short notice.
I also wish to thank Professor R.L. Davis of the mathematics
department for his help and for his contribution to my education.
In addition I thank Professor M.R. Leadbetter for his advice
during my first three years as a graduate student in the Department of
Statistics.
I also extend my thanks to all those members of the faculty
in the Department of Statistics and Mathematics who have contributed to
my education at the University of North Carolina.
I am also grateful for the help extended to me by the Departmental Secretaries, especially Mrs. June Maxwell, for her help and
suggestions as to the typing of this dissertation.
I gratefully thank Mrs. Chas. Burnett who allowed my wife and me
to live in her home for the last three months of our stay in Chapel Hill,
a circumstance that contributed greatly to the completion of this
dissertation.
I also gratefully acknowledge the Commonwealth Scientific and
Industrial Research Organisation under whose studentship I have been
studying.
Without this aid this work could not have been carried out.
Finally, there are no words to thank my wife Sue enough.
Not
only did she painstakingly and accurately type the manuscript but she
also encouraged and supported me during the years leading to its
completion.
CHAPTER 1
INTRODUCTION
1.1.  Introduction and Literature Review:

The estimation of a probability density function, f(x), using
non-parametric methods has been considered by several authors.  Three
principal types of estimator have been extensively discussed in the
literature, namely kernel, orthogonal series and maximum likelihood, although
a fourth method based on the use of spline functions has just recently
been investigated (Wahba 1971, Hill 1973).  Each of the four approaches
is based on obtaining a random sample, X_1, X_2, ..., X_n, of size n, of
independently and identically distributed random variables from a
population with density function f(x) and using this sample to calculate
an estimate, f_n(x), of f(x).  Depending on which of the four estimators
we use, the estimate we obtain may be for a fixed point x only or it may
be a global estimate true for all x.  In this dissertation we will be
concerned only with fixed point estimators.

A natural question that arises is how to develop a rule for
determining the sample size n so that one may expect to satisfy (or
better) some predetermined error criterion.  This leads us to consider
a sequential method.  However, in order to develop a sequential method
it is necessary to know the fixed sample properties of the estimator we
are to use.  Since kernel estimators have received the most discussion
in the literature we will confine our attention in this
dissertation to that type of estimator.
The kernel estimator has the
form

(1.1.1)    f̂_n(x) = ∫ K_n(x,y) dF_n(y) = (1/n) Σ_{j=1}^{n} K_n(x, X_j)

where F_n is the empirical distribution and K_n is the kernel function.

Rosenblatt (1956) was the first to introduce estimators of the
form (1.1.1) and used the kernel

(1.1.2)    K_n(x,y) = (1/h_n) K((x - y)/h_n)

where {h_n} is a sequence of positive numbers satisfying lim_{n→∞} h_n = 0
and lim_{n→∞} n h_n = ∞.  Estimators of this form were more fully
discussed by Parzen (1962).  They will be denoted by f̂_n(x) throughout
this dissertation.  Since the results of Parzen will be important in
later chapters we summarize his results in the following theorem.

Theorem 1.1.1.  If K(y) and {h_n} satisfy appropriate conditions¹ then the
kernel estimate, f̂_n(x), based on (1.1.2)

    (i) is an asymptotically unbiased estimator of f(x) at all
points x at which f(x) is continuous.

    (ii) If σ²(f̂_n(x)) denotes the variance of f̂_n(x) then

        lim_{n→∞} n h_n σ²(f̂_n(x)) = f(x) ∫_{-∞}^{∞} K²(y) dy,

at all points of continuity of f(x).

    (iii) If n h_n → ∞ as n → ∞, then lim_{n→∞} E[f̂_n(x) - f(x)]² = 0

¹ Chapter 2, equations (2.1.3) and (2.1.4).
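For concreteness, a minimal sketch of the fixed-sample estimate defined by (1.1.1) and (1.1.2) is given below; the Gaussian kernel and the bandwidth sequence h_n = n^{-1/5} are illustrative assumptions, not choices made in the dissertation.

```python
import numpy as np

def parzen_estimate(x, sample, h):
    """Parzen estimate f_n(x) = (1/(n*h)) * sum_j K((x - X_j)/h), with K taken
    here to be the standard normal density (an illustrative choice of kernel)."""
    u = (x - sample) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)) / h

# illustrative use: estimate a standard normal density at x = 0
rng = np.random.default_rng(0)
data = rng.normal(size=500)
h_n = len(data) ** (-1 / 5)   # assumed bandwidth sequence h_n = n**(-1/5)
print(parzen_estimate(0.0, data, h_n))
```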
and we say f̂_n(x) is weakly consistent.

    (iv) If f(x) is uniformly continuous then f̂_n(x) is uniformly
weakly consistent.  That is, given any ε > 0,

        P{ sup_x |f̂_n(x) - f(x)| < ε } → 1  as n → ∞.

    (v) f̂_n(x) is asymptotically normally distributed; that is,

        lim_{n→∞} P[ (f̂_n(x) - E f̂_n(x)) / σ(f̂_n(x)) ≤ c ]
            = (2π)^{-1/2} ∫_{-∞}^{c} e^{-u²/2} du = Φ(c).

Van Ryzin (1969) also considers the estimate f̂_n(x), showing
that if x is a continuity point of f(x) then, provided K(u) and h_n
satisfy certain conditions², f̂_n(x) → f(x) as n → ∞ with probability one.
He then strengthens the conditions to show that provided f(x) is
uniformly continuous, sup_x |f̂_n(x) - f(x)| → 0 as n → ∞, with probability
one.  We call the first of these properties strong consistency and the
latter uniform strong consistency.  Van Ryzin's results are proven in
the multivariate case in that he assumes x is an m-dimensional vector
and X_i is an m-dimensional random vector.

Both Parzen and Van Ryzin discuss estimation of the mode.
However, we will not consider that problem.

The uniform consistency of f̂_n(x) with probability one is also
proved by Nadaraya (1965).  He, however, uses a different set of
conditions to Van Ryzin.

We will also discuss kernel estimators of the form introduced
by Yamato (1972).  The kernel in this case has the form

² See Chapter 2, conditions for Lemma 2.2.2.
(1.1.3)    K_j(x,y) = (1/h_j^m) K((x - y)/h_j)

where X_j is an m-dimensional random vector and x is an m-dimensional
vector.  The Yamato-type estimator will be denoted by f̃_n(x).  When m = 1,
Yamato's results parallel those of Parzen summarized in Theorem 1.1.1,
the main difference being that if σ²(f̃_n(x)) denotes the variance of
f̃_n(x) then

    lim_{n→∞} n h_n σ²(f̃_n(x)) = v_0 f(x) ∫_{-∞}^{∞} K²(y) dy

where v_0 is a constant such that 0 < v_0 ≤ 1.  This means that f̃_n(x) is
asymptotically at least as good as f̂_n(x) in the sense that the
asymptotic variance of f̃_n(x) is less than or equal to the asymptotic
variance of f̂_n(x).

Yamato also notes that f̃_n(x) can be written as

    f̃_n(x) = ((n-1)/n) f̃_{n-1}(x) + (1/(n h_n^m)) K((x - X_n)/h_n)

so that he claims it is more suitable than f̂_n(x) in order to correct
the estimate successively in the case where a sequence of random
vectors or variables is observed.  Yamato in fact refers to his
estimator as sequential though he makes no attempt to use it in a
sequential context.  By this we mean that he is concerned only with
fixed sample properties and asymptotic properties of f̃_n(x) and makes
no attempt to suggest a stopping rule for use with f̃_n(x).
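The recursive form above suggests a one-pass update as observations arrive; the sketch below is a minimal illustration for the univariate case (m = 1), again assuming a Gaussian kernel and h_n = n^{-1/5}, neither of which is prescribed by the text.

```python
import numpy as np

def yamato_update(f_prev, n, x, x_new, h_n):
    """One step of the Yamato recursion at a fixed point x:
    f_n(x) = ((n-1)/n) * f_{n-1}(x) + (1/(n*h_n)) * K((x - X_n)/h_n)."""
    u = (x - x_new) / h_n
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # assumed Gaussian kernel
    return (n - 1) / n * f_prev + kernel / (n * h_n)

# illustrative use: stream 500 standard normal observations through the recursion
rng = np.random.default_rng(0)
f_tilde, x0 = 0.0, 0.0
for n, obs in enumerate(rng.normal(size=500), start=1):
    f_tilde = yamato_update(f_tilde, n, x0, obs, n ** (-1 / 5))
print(f_tilde)   # should be close to 1/sqrt(2*pi) ~ 0.399
```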
A more general kernel function is discussed by Watson and
Leadbetter (1963).
They consider
(1.1.4)    K_n(x,y) = δ_n(x - y)

where {δ_n(x)} satisfies (i) ∫ δ_n(x) dx = 1, all n, (ii) ∫ |δ_n(x)| dx < A,
all n, for some fixed A, (iii) δ_n(x) → 0 as n → ∞ uniformly in |x| ≥ λ, for
any fixed λ > 0, and (iv) ∫_{|x|≥λ} |δ_n(x)| dx → 0 as n → ∞, for any fixed λ > 0.
We will not discuss estimators using kernels of the form (1.1.4) but
will only note that, using mean integrated square error (M.I.S.E.) as a
basis, that is

    E ∫ (f_n(x) - f(x))² dx,

Watson and Leadbetter find the optimal δ_n which minimizes that quantity.
They show that the form of the optimum estimate is in general a
complicated expression dependent on the explicit form of the density
function and depends heavily on the form of the characteristic function
of f.  Special forms of the characteristic function are considered and
consistency in quadratic mean is shown to be attained in each of the
special cases.

Leadbetter (1963) discusses pointwise consistency results for
L₂ densities using mean square error as a basis for his discussion.

Srivastava (1973) treats the problem of the estimation of a
probability density function when the sample size is a random variable.
He shows that if N_t is the number of observations that occur in a time
interval (0,t], then as t → ∞, f_{N_t}(x) is both a consistent and uniformly
consistent estimator of f(x) at the point x.  Further, he proves a
corresponding asymptotic normality result.
Srivastava discusses both the case when N_t is independent and also
when it is dependent on the observations.  To my knowledge this paper
is the only one that considers density estimation when the sample size
is a random variable.
Two questions now arise quite naturally in the discussion of
kernel estimators:

    (i) How should we choose the kernel function, K(u), and
    (ii) how should we select the sequence h_n?

Since these questions are not only relevant to the fixed sample problem
but also to the sequential problem we will briefly review the
information that is available in the literature.  First we consider the
choice of K(u).

1.2.  Choice of K(u):

Parzen (1962) requires that K(u) be a non-negative, even, Borel
function satisfying (i) sup_u K(u) < ∞, (ii) ∫_{-∞}^{∞} K(u) du < ∞, and
(iii) lim_{|u|→∞} |u| K(u) = 0.  In addition he requires ∫_{-∞}^{∞} K(y) dy = 1
so that f̂_n(x) will be a density function.  He suggests
several possible weighting functions (see Appendix 1) and evaluates
∫_{-∞}^{∞} K²(y) dy for these kernels.  From the form of the asymptotic variance
of f̂_n(x) (Theorem 1.1.1) we see that it may be minimized by minimizing
∫ K²(y) dy, so that Parzen's calculations do give some idea which kernel
is to be preferred in this sense.  No attempt is made to find the
"optimal" kernel which would minimize ∫_{-∞}^{∞} K²(y) dy subject to the given
conditions.
This problem is discussed by Epanechnikov (1969).  Epanechnikov
defines the "relative global approximation error" as the mean integrated
square error normalized by the quantity Q, where x is an m-dimensional
vector and

    Q = ∫ f²(x) dx.

We note that for m = 1, f_n(x) corresponds to Parzen's estimate.

Letting n → ∞, Epanechnikov obtains a limiting "relative global
approximation error" in which the kernel enters through the quantity

    L = ∫_{-∞}^{∞} K²(y) dy.

The "optimal" kernel, K_0(y), is obtained by minimizing the "relative global
error", which amounts to minimizing L subject to certain constraints.
He obtains

    K_0(y) = (3/(4√5)) (1 - y²/5)    for |y| ≤ √5
           = 0                       for |y| > √5.

Epanechnikov shows that K_0(y) is independent of the true
probability density, the sample size and the dimensionality of the
space.  He calculates L and r = L / ∫_{-∞}^{∞} K_0²(y) dy for several kernels (see
Appendix 1, Table 2).  The calculations indicate that relative global
error is not necessarily greatly altered by using a non-optimal kernel
and for this reason we feel justified in using a non-optimal kernel in
later chapters, especially if calculations are made simpler by so doing.
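The optimal kernel can be checked numerically; the short sketch below (not part of the original text) evaluates K_0(y) and verifies that it integrates to one and that ∫ K_0²(y) dy = 3/(5√5) ≈ 0.268.

```python
import numpy as np

def epanechnikov(y):
    """Epanechnikov's optimal kernel: K_0(y) = 3/(4*sqrt(5)) * (1 - y**2/5) for |y| <= sqrt(5)."""
    y = np.asarray(y, dtype=float)
    return np.where(np.abs(y) <= np.sqrt(5), 3 / (4 * np.sqrt(5)) * (1 - y**2 / 5), 0.0)

# numerical check: K_0 integrates to 1 and its squared integral is 3/(5*sqrt(5)) ~ 0.268
grid = np.linspace(-3, 3, 600001)
dy = grid[1] - grid[0]
print(epanechnikov(grid).sum() * dy)          # ~ 1.0
print((epanechnikov(grid) ** 2).sum() * dy)   # ~ 0.268
```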
Anderson (1969) carried out a Monte Carlo study and showed
that if the optimum h_n is used then M.I.S.E. is also relatively
independent of the kernel.  The choice of kernel seems far less critical
than the choice of h_n, which we will now discuss.

1.3.  Choice of h_n:
Rosenblatt (1956) shows that for large n the mean square error
can be written approximately as

    E[f̂_n(x) - f(x)]² ≈ f(x)/(2 n h_n) + (h_n⁴/36) (f''(x))².

If h_n = B n^{-α}, α > 0 and B a constant independent of n, then the
optimum choice of h_n that minimizes mean square error is attained for
α = 1/5 and

    B = [9 f(x) / (2 (f''(x))²)]^{1/5}.

Similarly, if M.I.S.E. is to be minimized then again α = 1/5.

Unfortunately, the unknown density function f appears in the
choice of B.  However, we can still obtain both pointwise consistency
and consistency in quadratic mean for a suitable choice of B.
Rosenblatt shows that if the first three derivatives of f exist at x,
then the optimum choice of h_n gives a mean square error no smaller than
O(n^{-4/5}).
Parzen (1962) obtains the optimum sequence h_n that minimizes
mean square error as

    h_n ∝ n^{-1/(2r+1)},    r > 0,

where k_r = lim_{u→0} [1 - k(u)] / |u|^r is not zero and

    f^{(r)}(x) = -(2π)^{-1} ∫_{-∞}^{∞} e^{iux} |u|^r φ(u) du.

Note k(u) is the Fourier transform of K(x) and φ(u) is the characteristic
function of f.

Woodroofe (1970) uses a two stage procedure to estimate h_n.
The first step is to estimate f and its derivatives using f̂_n(x) with an
h_n sequence satisfying the usual conditions.  An estimate of h_n, ĥ_n,
is then obtained by finding the optimal h_n sequence with respect to the
density f̂_n(x).  We can then estimate f by replacing h_n with ĥ_n in f̂_n(x).
Under appropriate regularity conditions, Woodroofe then proves that if
h_n is the optimal sequence with respect to the true density f(x) then

(1.3.1)    E(f̂_n(x; ĥ_n) - f(x))² ~ E(f̂_n(x; h_n) - f(x))²    as n → ∞.

Unfortunately the estimates ĥ_n still depend on arbitrary
sequences "b_n" and "t_{n,i}" introduced by Woodroofe.  Of these he says
"... the determination of the b_n and t_{n,i} sequences ... is not as
crucial as that of the h_n sequence.  The former affects only the rate
of convergence in (1.3.1) while the latter affects the rate of mean
square consistency".
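To make the two-stage idea concrete, here is a rough sketch under several simplifying assumptions not made by Woodroofe: a Gaussian kernel, a pilot bandwidth of n^{-1/5}, and the standard pointwise MSE-optimal plug-in formula h(x) = [f(x) ∫K² / ((f''(x))² (∫u²K)² n)]^{1/5} in place of his exact construction.

```python
import numpy as np

SQRT2PI = np.sqrt(2 * np.pi)

def kde(x, data, h, deriv=0):
    """Gaussian-kernel estimate of f(x) (deriv=0) or f''(x) (deriv=2)."""
    u = (x - data) / h
    phi = np.exp(-0.5 * u**2) / SQRT2PI
    if deriv == 0:
        return np.mean(phi) / h
    return np.mean((u**2 - 1) * phi) / h**3      # second derivative of the Gaussian kernel

def two_stage_bandwidth(x, data):
    """Stage 1: pilot estimates of f(x) and f''(x); stage 2: plug them into the
    pointwise formula h = [f(x)*R(K) / (f''(x)**2 * mu2**2 * n)]**(1/5)."""
    n = len(data)
    pilot = n ** (-1 / 5)                        # assumed pilot bandwidth
    f_hat = kde(x, data, pilot)
    f2_hat = kde(x, data, pilot, deriv=2)
    r_k, mu2 = 1 / (2 * np.sqrt(np.pi)), 1.0     # R(K) and mu_2 for the Gaussian kernel
    return (f_hat * r_k / (f2_hat**2 * mu2**2 * n)) ** (1 / 5)

rng = np.random.default_rng(0)
sample = rng.normal(size=500)
h_hat = two_stage_bandwidth(0.5, sample)         # x = 0.5, where f'' is nonzero
print(h_hat, kde(0.5, sample, h_hat))
```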
From the discussion above we conclude that the choice of the
sequence h_n using only the observations is still an open problem.  We
will conjecture a possible solution in chapter 5.
1.4.  Summary of Results:

Epanechnikov discusses the determination of the sample size to
assure a prescribed level for the minimum relative global error.  However,
this discussion entails assuming the form of f(x), an assumption
that will usually not be justified.  For this reason we felt a new
approach to be necessary.  One approach that is often employed in
determining a sample size necessary to satisfy a given criterion is
that of sequential methods.  Thus in order to obtain the sample size
required to use the estimator f̂_n(x) and satisfy some given criterion,
we will use a sequential method.  We feel that the results obtained in
this dissertation bear out the appropriateness of the approach.
In chapter two we use the estimator f̂_n(x) to define a function
V_n(x) = f̂_{nM}(x) - f̂_{(n-1)M}(x).  We define a "naive" sequential procedure
based on this function V_n(x) and a constant ε.  The asymptotic structure
of the function V_n(x) is studied and properties of the stopping rule, N,
are obtained.  We show that under appropriate conditions, P[N > nM] has
an upper bound which decreases exponentially to zero as n → ∞ and that
EN^r is finite for r finite.

In chapter 3 we consider the properties of an estimator f_{N_ε}(x),
where N_ε is a stopping variable.  As few assumptions as possible are
made as to the form of either f_n(x) or the stopping rule N_ε.  We show
that if N_ε → ∞ in probability as ε → 0 then, provided f_n(x) satisfies
appropriate conditions, E f_{N_ε}(x) → f(x) as ε → 0.  It is also shown
that under certain conditions E[f_{N_ε}(x) - E f_{N_ε}(x)]² → 0 as ε → 0 and
E[f_{N_ε}(x) - f(x)]² → 0 as ε → 0.  We also show that if in fact N_ε → ∞
a.s. as ε → 0 and f_n(x) → f(x) a.s. as n → ∞, then f_{N_ε}(x) → f(x)
a.s. as ε → 0.

In chapter four we show that f̃_n(x) is both pointwise and
uniformly strongly consistent.  We use these results to parallel the
results obtained in chapter 2, using f̃_n(x) instead of f̂_n(x).
CHAPTER 2

A SEQUENTIAL PROCEDURE

2.1.  Introduction.

Under appropriate conditions, we have seen that the density
estimate

    f̂_n(x) = (1/n) Σ_{j=1}^{n} (1/h_n) K((x - X_j)/h_n)

is asymptotically consistent.  We now show how this property may be
utilised to define a "naive" sequential procedure.  Basically the
procedure will consist of taking successive samples of size M, each
consisting of M mutually independent and identically distributed random
variables, and defining the difference

(2.1.1)    V_n(x) = f̂_{nM}(x) - f̂_{(n-1)M}(x)

where f̂_{nM}(x) and f̂_{(n-1)M}(x) are the density estimates based on sample
sizes of nM and (n-1)M respectively.  The stopping rule we will consider
is then of the form

(2.1.2)    N(ε,M) = first n such that |V_n(x)| < ε, for given ε > 0;
                  = ∞ if no such n exists.
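Stated algorithmically, the rule reads as follows; the sketch below is illustrative only, assuming a Gaussian kernel, the bandwidth sequence h_n = n^{-1/5}, and a user-supplied routine draw_batch that returns M fresh i.i.d. observations (none of these choices is made in the dissertation).

```python
import numpy as np

def naive_sequential_estimate(x, draw_batch, eps, M, max_batches=10_000):
    """Take batches of size M, recompute the Parzen estimate f_{nM}(x) after each batch,
    and stop at the first n >= 2 with |V_n(x)| = |f_{nM}(x) - f_{(n-1)M}(x)| < eps."""
    data = np.empty(0)
    f_prev = None
    for n in range(1, max_batches + 1):
        data = np.concatenate([data, draw_batch(M)])
        h = len(data) ** (-1 / 5)                       # assumed bandwidth sequence
        u = (x - data) / h
        f_curr = np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)) / h
        if f_prev is not None and abs(f_curr - f_prev) < eps:
            return len(data), f_curr                    # N = n*M and the estimate f_N(x)
        f_prev = f_curr
    return len(data), f_curr                            # rule did not stop within max_batches

rng = np.random.default_rng(0)
N, f_at_0 = naive_sequential_estimate(0.0, lambda m: rng.normal(size=m), eps=0.01, M=50)
print(N, f_at_0)
```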
In this chapter we discuss the properties of the stopping rule
and some of the properties of the function V_n(x) upon which the stopping
rule depends.  We examine the asymptotic structure of the function V_n(x)
and prove that both the expected sample size and the variance of N are
finite.  In fact we prove that under appropriate conditions all moments
of the distribution of N are finite and that P[N > nM] has an exponential
upper bound which tends to zero as n → ∞.  Further, we will show that
N(ε,M) → ∞ as ε → 0 in both probability and with probability one and that
N(ε,M) as defined by (2.1.2) is a "closed" stopping rule, by which we
will mean that P[N < ∞] = 1.

Finally, we will redefine N(ε,M) so that the class of kernels
for which N(ε,M) → ∞ as ε → 0 is considerably enlarged.

In order to obtain the results in this chapter we will make
the following assumptions concerning the kernel function and the h_n
sequence.  The kernel, K(u), will be assumed to satisfy:

(2.1.3)    (i)   K(u) is a density on R,
           (ii)  sup_{u∈R} K(u) < ∞,
           (iii) lim_{|u|→∞} |u| K(u) = 0.

The sequence {h_n}, n = 1, 2, ..., satisfies

(2.1.4)    (i)   h_n > 0 for all n,
           (ii)  lim_{n→∞} h_n = 0,
           (iii) lim_{n→∞} n h_n = ∞,

and

(2.1.5)    lim_{n→∞} h_{(n+1)M} / h_{nM} = 1.

We note that conditions (2.1.3) and conditions (2.1.4) (i)
and (ii) are necessary for obtaining the asymptotic unbiasedness of f̂_n(x),
and these together with (2.1.4) (iii) are necessary for obtaining
consistency.
14
2.2.
Properties.
Le~ma
2.2.1 gives some elementary properties of V (x) which allow
n
its use in the stopping rule (2.1.2).
Lemma 2. 2. 1.
n~
As
, if K(u) and {h }, n
n
= 1,2,000
satisfy (2.1.3) and
(2.1. 4) then
(a) Ivn(x)I
+
0 in probability if x is a continuity point of
f(x), and
Ivn (x)1
(b) sup
continuous.
x
+0
in probability if f(x) is uniformly
Proof: (a) By definition
A
"
Ivn(x)I • IfnM(x) - f(n_I)M(x)l.
,.,
Under the conditions stated {f (x)} is a Cauchy sequence in
n
probability since f (x)
n
+
f(x) in probability and so the result follows.
A
(b)
sup Ivn(x)I
x
= sup
,.,
IfnM(x) - f(n_l)M(X)I
x
,.,
S
,.,
sup IfnM(x) - f (x) I + sup If (n-l)N(x) - f(x)
x
x
Under the conditions stated each term on the right hand side
can be made arbitrarily small by making n large enough and so the
result follows.
Lemma 2.2.2
If K(u) and {h }, n
n
= 1,2,000
are as in Lemma 2.2.1, and in
addition condition 2.1.5. and
(i) suplulm{K(cu) - K(u)}2 is locally Lipschitz of order
lul~a
a at c
=1
for some a>O where a>O.
I
15
f
(ii)
2
~
{K(cu) - K(u)}
du is locally Lipschitz of order a at
-~
c • 1 for some a>O,
(iii)
1
nh
then
Ivn (x)1
- 1
I-B
h
n
<
~,
n
~ 0 with probability one if x is a continuity point of
f(x).
Proof:
The conditions stated are sufficient to ensure
{fn (x)}
is a
Cauchy sequence (a.s) and the result follows immediately.
Parzen (1962) shows his estimator is asymptotically normal.
While we cannot show V (x) is also asymptotically normal we will show
n
that it may be written as the sum of two independent random variables,
one of which does have an asymptotic normal distribution.
We develop
this result with the sequence of lemmas 2.2.3 and 2.2.4.
Lemma 2.2.3
If K(y) is a piecewise continuous Borel function satisfying the
conditions (2.1.3) and g is a real function satisfying
-~
and if {h } is a sequence of positive constants satisfying the
n
conditions (2.1.4) then
g (x) •
n
f~ ~
-~
converges to
n
16
_00
at every point of continuity of g(o).
Proof:
The proof is in two stages.
(A)
First we will show that
[00
-00
lim gn (x) .. lim.B1&
n-+ao h
n
n-+ao
K
(L) (-L.)
K
h
n
h
n-
dy
I
and then that
Proof of (A):
Using the definition of g (x),
n
We now split the range of integration into Iyl>o andlylso
where 0 is an arbitrary, positive number.
Thus, the last quantity is less than
max Ig(x-y) - g(x) I 1..lylSo
hn
+
~
f
n
f
Iylso
Ig(x-y) I
Iyl>o
K(.J. .) K(-L)
h
hn
K(~) K(~)
n
+
~
f
n
dy
n-l
Ig(x) I
Iyl>o
dy
n-l
K(~) K(~)
n
n-1
dy
17
We now make the transformation, z
D
y- in the first and third
h
n
terms so that we can write (2.2.1) as
max
lyl~t5
Ig(x-y) - g(x) I
K(z)
J
Izls~
+
J
lyl>t5
+
K(:n z) dz
n-l
n
Ig(x-y) I ~
K(-;-)K(~)
f
IK(z) K(~
hn_l
y
Ig(x) I
n
Izl~~
n
dy
n-1
z) I dz
n
which in turn is less than
(2.2.2)
l;j~o
Ig(x-y) - g(x)
I
f K(z) K[::_l zJ dz
But, by Schwarz's inequality we obtain
<
=.
by assumption.
Since 0 is arbitrary, the first term can be made arbitrarily
small by making t5 arbitrarily small.
Then, letting n+=, the second and
18
third teres in (2.2.2) tend to zero.
Thus the proof of (A) is complete.
Proof of (B): Consider
On letting z
= -y
h
this becomes
n
JOO IK(:n
-00
z) - K(z)
I K(z)
dz.
n-l
Let
Itn(z)! • IK(:n =z) - K(z)1
K(z)
n-l
~ (2 sup I K(z)l) K(z)
which by condition (2.1.3) (ii)
z
is finite since sup K(z)
<00.
Thus
z
since
JOO \K(y)! dy
<
00.
But, since K(y) is assumed piecewise
_00
continuous,
Thus, by the dominated convergence theorem,
and the proof of (B) is completed.
The conclusion of the lemma then follows.
Lemma 2.2.4.
Let V (x) be defined by (2.1.1), then if K(u) and {h } satisfy
n
n
19
(2.1.3), (2.1.4) and (2.1.5) and if in addition
lim n[hnM
-1)
n-+-co
h(n-1)M
(2.2.3)
=1
- v
then
2
lim n Mh
nM Var Vn (x)
n-+-co
=v
JOO K2 (u) duo
f(x)
Proof:
From the definition of f (x) we have
n
(n-1)H
I
j=l
Since X ,x ooa , X are independently and identically
nM
1
2'
distributed
(2.2.4)
Var V (x)
n
\
=
K(:nM-XI)
Var
+ __..:;1:..-_
2
nMhnM
(n-1)r1h(n_1)H
2
---....=.,.---
Cov
nMh nM h{n-l)H
= f(x)
JCO K2 (u) du
-co
Using this with lemma 2.2.3 we have from (2.2.4)
2
lim n Mh
Var V (x)
nM
n
n-+-co
lim
= n-+-co
n
2
+ (n-I)
hnM
2
h(n-1)l'1
K( XnX1 )
h(n-1)M
1) K( X-Xl) ).
(K(X-X
hnM '
h{n-l)M
Parzen (1962) shows that
X-X)
lim __1__ Var K ~
(
n-+-co hnM
nM
Var
Var
K( X-Xl)
h
(n-1)~1
20
2n
hoM
h
(n-1)H
r
2
h
lim
nl1
• n~ ( n + ...,.:n==--~
(n-l) h
(n-l)M
n~
h n- 111
~
... v. f(x)
) f(x)
(n-1)M
+ __1_ hh nM ) f(x)
h nN
... lim n( 1 -
2n h h nN
K2 (u)
(n"l)
n-1H
2
K (u) du
-00
oo
J
K2
(u) du
-00
duo using our assumptions (2.1.5) and
00
(2.2.3).
This proves the lemma.
Lemma 2.2.5.
vn (x) can be written as the sum of independent random
variables 7 that is
vn (x) = An (x) +
Bn (x)
such that under the conditions of lemma 2.2.4
and
Proof:
We can write
vn (x)
(n-1)M
...
l
j=l
(nl1h1nM K(~) _-:--.--=--1_
(n-1)Mh(n_1)H
hoM
(2.2.4A)
+
1
nMh
nM
n11
l
j=(n-1)M+1
K(X-Xj )
hoM
K( X-Xj
) )
\n-1)M
21
=
A (x) + B (x)
n
say.
n
Since A (x) depends only on X , X,··· , X( l)u and B (x)
n
1
2
n- 1~
n
depends only on X( -l)M+l' ••• , X v' A (x) and B (x) are independent.
n r
nn, n
n
Thus
Var Vn (x) • Var An (x) + Var Bn (x).
But
=
Var Bn (x)
1
M Var K(hx-nMX1)
222
n 11 h
nH
so that
-co
But, since A (x) and B (x) are independent we can now write
n
n
Var A (x) • Var V (x) - Var B (x)
n
n
n
so that
2 2 2
lim n M h \. Var A (x)
n~
na
n
= lim
n-+oo
=v
n M h_\'I Var V (x) - lim n M hoM Var B (x)
llr
n
n~
n
2
oo
f(x)
J
co
K (u) du - f(x)
_co
-co
-co
and the lemma is proved.
Remarks:
(1)
v
~
1 so that -1 + v
J
~
o.
2
K (u)du
22
(2)
The quantity B (x) is a finite sum depending only on the
n
final sample which consists of M independent random variables.
Further
Bn (x) is always positive since we assume the kernel, K(u), is always
positive.
If the density function f(x) were known then under
(3)
appropriate conditions on the kernel function, K(u), the exact
distribution of B (x) could be found.
It would depend on n and would
n
be an M- fold convolution of densities of functions of the form
z.
1
k
nt1h
(4)
K(X-~)
h
nM
(n-l)11 + 1
~
k ~ nM
nN
The quantity A (x) depends on the first (n-l)M
n
observations and we will show it has an asymptotic normal distribution.
Example 2.2.1.
We will now give an example of a {h } sequence which satisfies
n
the condition (2.2.3).
h
n
Let
= B n -a.
o<
a. < 1
where B is a constant independent of n, then (2.2.3) gives
n( hnN
(2.2.5)
-
1) • n( (n:1) - 1)
h(n-l)M
.. -a +
Thus, taking the limit as
n~
0
(1)
we have
1 -v .. -a..
That is for the sequence h
v
=1 +
n
.. B n-a., 0 < a. < 1, we have that
ClO
2
a. and lim n M h M A (x)
n+co
n.
n
=a
f(x)
I
_00
2-
K (u) duo
23
Lemma 2.2.6.
If K{u) and {h } satisfy (2.1.3) and (2.1.4) then
n
B (x) s
M
n
Proof:
nMh
sup
nM
K{u)
+
0 as
n~
u
By the definition of B (x)
n
nM
B (x)
n
l
1
D-
j-{n-1)UH
nM
1
hnM
KF~)
hnH
nM
s
1
nHh
nM
H
• nl1h
nl1
l
j-{n-l)M+l
sup K{u)
u
sup K{u)
u
which is finite, since by assumption sup K{u)<oo.
Clearly, since
u
nMh ....~ as n+oo the bound tends to zero as n~.
nu
We will now show that A (x) is asymptotically normally
n
distributed.
To do this we first prove two lemmas, in which we will
require the definition,
1 K(
n-l K(X-X l ) nhnM hnM
h(n-l)M
An1 (x) ...
X-Xl).
h(n-l)M
Lemma 2.2.7.
If K(u) and {h } satisfy (2.1.3) and (2.1.4) then if
n
lim (n-l) 2 + n
hnM
- 2(n-l) h hnH )
n~
n
h(n-1)M
(n-l)M
oo
lim n M hn11 Var Anl(x)
n~
Proof:
omitted.
= "o
f(x)
f
•
"o
2
K (u) du
_00
The proof exactly parallels that of lemma 2.2.4. and so will be
24
Examp Ie 2. 2•2•
Ifh
then v
o
n
-Bn-a
, 0 < a < 1
as defined in Lemma 2.2.7 equals
C'£,
that is v
0
= C'£.
Lemma 2.2.8.
If K(u) and {h } satisfy (2.1.3), (2.1.4) and (2.1.5) and if
n
e»
J
-e»
3
IK
du
(u)
<00,
and
lim (n-l) - hn hnM )
n-+<»
(n-1) 11
= 'VI ,where
'VI is a constant and,
if {C } is a sequence such that C +1 as n-+<» and K(C u)+K(u) uniformly
n
n
n
223
in u ,then,
Proof:
lim
n-+<»
n
h M EIA l(x)1
n
n
=0
By definition
3
EIAnl(X)1
•
K(X-Y) I
K(X-Y
n hnM hnM
h(n-l)M h(n-l)M
-00
=
)1
Je» I(n-l)
1j
njh
nM
3
fey) dy
3
fl(n-l) K(X-Y) - nh llH K( x-Y ) 1 fey) dy
-00
hrUt
h(n-l)M h(n-l)M
2 2
Thus multiplying both sides by n hnM and making the transformation on the right hand side of u
= X-Y we obtain
h n._
M
2 2
n hnM E AnI (x)
=~_(
_
But,
~
I
(n-l) K(u) -
l(n-1) K(u) -
nh nH K(h nH
u)13 f(x-hnHu)du.
h(n-l)M h(n-l)M
nh nM K( hnM
u)
h(n-l)M h(n-l)M
I
I
• lim I(n-l) - hnhnM
K(u)
n-+<»
(n-l)M
25
= vl
K(u)
by assumption.
Thus there exists a 6 • 6(n) such that for n > n ,
o
I(n-l)
nh
h
3
K(u) - h nH
K(
nJ1
u) I
(n-1)M
h(n-1)11
3 I31
- vlIK(u)
s 6
Thus for n > n ,
o
+.2. Jco f (X-hn}f
n
u) du
-co
But,
so that,
223
n hn~M EIAn l(x)1
+
0 as n~.
To prove the asymptotic normality of Anl(X) we will now use a
theorem from Loeve.
Theorem 2.2.9.
The theorem is stated here without proof.
(Normal Convergence Criterion -
Lo~ve
(1960) p 316)
If X are independent summands, then for every £>0,
nk
tX [ a
P k nk
II
and,
if and only if, for every £>0 and a T>O,
(ii)
tk an2k
2
(T) +
a
26
~nk(T)
where
•
J
xdPnk(x)
Ixl<!
Using lemmas 2.2.7 and 2.2.8 together with Theorem 2.2.9 we
now prove the asymptotic normality of A (x).
n
Theorem 2.2.10.
If
~
IK(u)1
3
du <m and conditions (2.1.3), (2.1.4) and (2.1.5)
_00
are satisfied, then provided K(u) is piecewise continuous
lim P (An (x) - E A (X»)
(2.2.6)
n~
y
du • Hc)
(Var A (x») /2
n
By theorem 2.2.9 a necessary and sufficient condition for (2.2.6)
Proof:
to hold is that
(2.2.7)
as n-+<», where An! (x) • (n-l) K(X-Xl ) 1
K( X-Xl)
nh nM
hnM
h(n-l)M h(n-l)M
A sufficient condition (Liapounov's condition) for (2.2.7) to
hold is that for some 0>0,
as
(2.2.8)
2
where a (Anl(X») denotes the variance of Anl(x).
We will consider 0 • 1.
n~
27
Since (a + b)
333
4(a + b}
we obtain
But,
so that
(2.2.9)
3
EIAnl(X} + EAn1(x}I
3
~ 8 EIAnl(X}I •
Substituting (2.2.9) in (2.2.8) we then get
(2.2.l0)
But, from lemmas 2.2.7 and 2.2.8 we have that
and
2
C1
(AnI (x))
=
0(_1_)
nh
nM
so that (2.2.l0) is of the order,
That is
as n-+<o.
This completes the proof of the Theorem.
We have now shown that V (x)
n
= An (x)
+ B (x) consists of one
n
28
component, A (x), that has an asymptotic normal distribution and another
n
component, B (x) which is bounded above by a bound which tends to zero
n
as n~. Notice, however, that the variances of the two components are
of the same order,
(o( 21
))' for estimators of the Parzen type.
n hn}I
2.3
Finiteness of EN:
A property that is desirable for any stopping variable is that
the expected sample size, EN, should be finite.
If we assume no
knowledge of the density function, f(x), we will in general not be able
to determine the exact value of EN.
Indeed, if f(x) is known it is
likely that we can obtain an exact expression for EN in only a
relatively small number of cases.
For this reason we will confine ourselves to proving in this
section that with our "naive" stopping rule
the expected sample size is finite given certain restrictions on the
kernel.
To do this we prove a sequence of lemmas.
Lemma 2.3.1
For arbitrary t>O, and given e>O,
(2.3.1)
where
nr-!
(2.3.2)
S (x)
n
=
r
j=1
(n-l)H
r
j=l
n
n-l
Proof:
From the definition of V (x), we have
n
p(Vn(X) > e)
= p[Sn(X)
> nMhru1 e)
Define, (using a method described in Hoeffding 1963)
29
T(x)=
S (x) - nUh
1
if
o
otherwise
n
n!1
e:
>
0
But for arbitrary t>O,
t(S (x) - nl:1h
e:)
nM
T(x) s e n
so that
P (Sn(x) > nMh
nM
e: ) s Ee
• e
t(S (x) - ru1h
e:)
nM
n
-nMhnM e:t tSn(x)
Ee
•
Lemma 2.3.2.
For arbitrary t>O and given e:>O
-nMhnMe:t
e
(2.3.3)
Ee
-tS (x)
n
where S (x) is given by (3.4.2)
n
Proof:
The proof follows as for lemma 3.4.1 if we note
and define
I
T (x) •
o
otherwise
To see the relevance of Lemmas 2.3.1 and 2.3.2 we will now
look at P(N)nM) and obtain a simple inequality concerning this quantity.
Define
(2.3.4)
A
k
= {(x , x ,
1
2
""XnM) : IVk(x) > e:} , 2
Then, from definition (2.1.2) we have that for n
~
S
2,
k
S
n.
30
Thus, since we can write
we have that
(2.3.5)
Further, we note that
co
(2.3.6)
= L
EN
where we will assume
k=O
P(N)O) = P(N)M) = 1.
(This is consistent with
the way we define our stopping rule since we need to take at least two
samples of size 11 to define V (x».
n
Thus, provided we can show the
convergence of the right hand side of (2.3.6), we have reached our goal.
To do this we use equation (2.3. 6) ,together with Lemmas (2.3.1) and
tS (x)
(2.3.2). Some further results concerning Ee n
must now be obtained
however, before the latter two lemmas will be useful.
Lemma 2.3.3
If {h } is a sequence of positive constants satisfying (2.1.4)
n
and such that there exists an N = N (€) for which if n
>
N (€),
0 0 0
-nMh
(Al)
e
nH
€
~
1
---=---(nM) Y
+
(1+tS)
,tS > 0
31
and if
Ee
± S (x)
n
~ C. nY
where both C and yare finite positive constants, then
EN <
Note 1:
00.
Since lemmas 3.3.1 and 3.3.2 are true for arbitrary t we will
only consider the case t • 1 in the remainder of this section.
Note 2:
y can be negative in the statement of this lemma, however, we
will see that to obtain meaningful results in later lemmas and theorems
(in particular theorem 2.3.6) we require y
~
O.
Proof of Lemma 2.3.3.
It will be sufficient to prove that
00
L
P (N)nM) <
naN
co
0
i f the above conditions are satisfied since this represents the tail of
Now from (2.3.5)
an infinite series for EN.
co
00
L
P
naN
[N>n~1) ~ l
n""N
0
e
-nMh M
n
€
[
8 (x)
Ee n
-8
+ Ee n
(X»)
0
00
~ 2C
l
naN
o
But, since for n
~
N we assume that (AI) holds we have that
o
co
L
naN
Thus, since the tail of the series (2.3.6) is convergent.
EN <
00
as required.
32
Example 2.3.1:
Consider
h .. B n-a
o<
n
a < 1,
B a constant.
If condition (AI) is to hold then for n
~
N
o
so that
nHhnM ~
log oM
y + (1+0)
&
This means that nMh
must increase faster than log nM.
nM
(nM) I-a
log nM
-+-co
as
But
11-+<lO
so that the condition will hold for the sequence given.
S (x).
We now turn our attention to Ee n
Notice first that we can
write
(n-l)M
nM
S (x). l
A (x,~) + l
B (x):1<)
n
k=l
n
k=(n-l)MH n.
where
An (x,~)
..
K(hx-nM~) -
n
huM
(n-l)
Bn (x,~) • K(X-~)
h
nH
x so
.e.
h(n-l)H
K( x-~
) , k
h(n-l)H
k
= 1,
=
2,·.·, (n-l)M
(n-l)M+l,.
0
.,
that we have written S (x) as the sum of nM mutually independent
random variables.
n
Further An (x,X l ),···, An (x,X( n- l)~)
are identically
~J
distributed random variables, as are Bn (x,X(n_l)l1+l)'···' B(x,XnM ).
This means we may write
nM
33
(n-1)M
s
I
(x)
Ee n
... Ee
k=l
nM
A (x.~) + l
n
-K
k=(n-1)M+1
(n-1)M
~
nM
I
I
... Ee
Bn(x,~.)
k=l
Ee
B
k=(n-1)M+1
n
(x,~)
(as the sums are
independent), so that
(2.3.7)
We now proceed to show that the second term in the product on
the right hand side is bounded above.
Lemma 2.3.4
If K(u) satisfies (2.1.3) then
B (x.X M»)M
n
(Ee n
where
L =
sup
-
Proof:
K(u)'
< u <
00
S
«
ML
e·
~
by (2.1.3»)
00
By definition
EeBn (x,XnM) =
r K[~:J
e
f(u) du
-00
s eL
I~ f(u) du
-~
= eL
and the result is immediate.
Before considering the first term in the product (2.3.7) we
now examine "how 1arge ii S (x) can be and still allow us to retain EN <
n
00.
34
Lemma 2.3.5
If there exists an
Isn (x)1
1'1
0
such that for n > N
0
~ c log n
(a.s)
then
P(N)nM) ~ 2 n c e
(2.3.8)
-nMhnM€ •
Further, i f
Isn (x)1
~ c
(a.s)
then this inequality can be improved to
(2.3.9)
If either (2.3.8) or (2.3.9) holds and if in addition {h }
n
satisfies condition (AI) then EN
Proof:
<
00.
By definition
Ee
Is
n
(x)
I
e
_00
-00
~ foo...
-00
= nC
Isn I
1
00
e
clog n f(x
l'
'"iX ) dx o,odx
nM
1
nM
-00
~oo. ~f(Xl",o'XnM)
-co
dxlo"dxnM
_00
= nc
But both,
s
Ee n
~
Ee
~
Ee
Is I
n
and
Ee
-5
n
Is I
n
Thus from lemmas (2.3.1) and (2.3.2) and using inequality
(2.3.5) we get,
35
P (N)nM)
n > N .
2
S
o
Hence, by lemma 2.3.3 it follows that EN <
~.
Using a similar argument we see that if
Ee
Is n (x)1
S
e
c
Isn (x)1
S
c
then
= constant
and lemma 2.3.3 again gives us that EN <
~.
Example 2.3.2
The conditions of lemma 2.3.5 are satisfied by the normal
kernel and the liusual" {h } sequence, that is
n
K(u)
=
e
1
- -1u 2
2
ili
and
,0<a<1.
To show this we write (with BDl for convenience)
nlAn (x,u) I
.- I
n
(nM)2a
- - - (x-u)
2
e
2
( n )1-a
e
n-1
a
...
{(n_1)11]2 (
)2
x-u
(-!l.-) 1-ae
2
-
-n
n-1
Differentiating this with respect to
'/ i.1T
-
_ (x_u)(Mn)2a e
(x-u)
2
or
2 10 (....!L)3a-1
g n-1
= ----=-..;;;.------=2a
(Mn)2a_[(n_1)M1
2a
2
..1!-[(x-u) [(n_1H-i]2a(-!l.-)1-a e
rnn-1
21T
=0
(nH)2a
2
(x-u)
2
u we obtain
_ [(n-1)1-1]
which is zero when either (x-u)
-e
2
(x-u)
2a
(En},
2
(x-u)
2
1
36
The turning points represented by the latter equation are real
only ~f
1 ( an d f or a = 3'
1 u=x.
)
a >- 3
4
=0
continuous and that lim nlA (x,u)1
n
u+±co
n IAn (x,u)
at u
I
= x.
for all
lim ~ 11 - (~)l-al
ili
1-n
n~
However,
m > n 1 and 0 < a <
1
3'
31
S
Isn(x)
I
ili
so that there exists a
S n sup IAn(X,u)I S c
u
31 .
a < 1
= 1-a
for all u
1
we have three real turning points.
when
By different-
nlAn (x,u)1 a second time we can show that nlAn (x,u)1 has a local
minimum at
that
we see that for 0 < a <
ffIT
u.
iating
is
nIAn(x,U)I
n 1-a, ,its maximum value
is always less than -n- I1"· (l-n)
But
For
By not i ng t h at
u
=x
nIAn(x,u)I
n
n 1-3a
---[(n-1)
]
and from the nature of the function we can conclude
is always less than
1_«n_l)2a/(n2a_(n_1)2a»
.
[(~)2a_l]
ili
which tends to
n-1
2 e 3a-1 as n + co. Thus, as above there exists a constant c and an
--2
/2;
n 2= n 2 (c) such that Isn(x) I S C z for all u when n > n 2 and; S a < 1.
If we choose c • max(c l ,c 2 ) and nO=max(n l ,n 2 ) then for n>nO' Isn(x)1 ~ c
for all u,
0 < a < 1; and EN <
co
in this case.
To this point we have obtained J.emmas cuat: g1.ve
SUI:J:1.c.umL
S (x)
conditions for either S (x) or Ee n
n
to imply EN
<
co.
In practice
however it is the kernel function K(u) and the sequence {hn } that we
control.
We now give a theorem that places a condition directly on the
kernel to ensure EN
< co.
Theorem 2.3.6:
For a piecewise continuous kernel, K(u), and a sequence of
positive constants {h } satisfying (2.1.3), (2.1.4) and (2.1.5) such
n
that condition (A1) holds and if
37
(2.3.10)
11m.
n
n-
log n
II» IA
(x,u)1 e
IA
(x,u)\
n
f(u) du • Y
n
-I»
~
o.
then
EN <
1».
Proof:
From equation (2.3.7) and using lemma (2.3.4) we may write
s (x)
(2.3.11)
Ee n
~
») (n-1)H
(A
(x,X
IA
(x,X1 )I
Ee n
1
e ML
But,
<
- e
n
An(x,X
= 1 + IAn (x'X1 )1 + I
e
(say)
n
where
a
n
= fl» IAn (x,u)1
e
IAn (x,u)\
f(u) duo
Now, it follows from assumption (2.3.10) that
(2.3.12)
a
n
=0
Thus,
(2.3.13)
Now let
y
=
(1 + y10g
n) n
so that
n
log y • n log (1 + y10g
n
n) ,
21
IAn (x,X1 )I
so that
A (x,X )
1 ~ 1 + a
Ee n
1 )I
2
+
000
38
and since log n
+
o as
n~,
there exists an n
n
o
such that for n
~
n ,
0
ylog n < 1 and
n
(2.3.14)
log Y
~
n. ylog n
= ylog
n
n
Hence, asymptotically
and
x, Xl») n-l
(EeA(
n
~
so that
e
Similarly we can show that
Ee
-5 (x)
n
~ (l+a ) (n-l)M e ML
n
e
MI..
It now follows from lemma 2.3.3 that EN <
~.
Clearly condition (2.3.10) still involves the unknown density
function, f(u).
Theorem 2.3.6 by itself therefore does not completely
solve the problem of putting a condition onK(u) directly.
In lerrma 2.3.7
we obtain a sufficient condition that (2.3.10) is satisfied for any
density f(u), though example 2.3.2 will show the condition is not necessary if a weak condition is placed on f(u).
Lemma 2.3.7
If K(u) satisfies (2.1.3) and if {h } satisfies the conditions
n
of theorem (2.3.6) and if in addition
(2.3.15)
lim
n
n~
log n
IAn (x,u) I = y
<
00
uniformly in u,
39
where y is a constant independent of u, then condition (2.3.10) holds so
that EN < "".
Proof:
IAn (x,u)1
By assumption (2.3.15),
given a
~ >0, there exists an n
n
IAn (x,u)
log n
But for n > n ,
and
o
+
0 uniformly in u.
such that for n ~ n
0
both
Thus
IAn (x,u) I
~ 1
I ~ y + o.
o
n
log n
IAn (x,u) I
e
IAn (x,u)1
f(u)
S (y+o)
1
e. f(u) €L
l
so that using the dominated convergence theorem
oo
lim n
n+oo log n
I
IAn (x,u) I e
IAn (x,u)1
f(u) du
=
fy
e
0
f(u) du
_00
_00
=y
and condition (2.3.10) holds.
Comment:
(1)
It is not difficult to show that both the normal kernel
and the double exponential kernel satisfy the condition in lemma 2.3.7
(with y
= 0).
(2)
In example 2.3.3 we show that with the addition of a weak
condition on f(u), the uniform kernel satisfies (2.3.10).
However, the
uniform kernel does not satisfy condition (2.3.15) since
n
n
h
nH
+ ""
as
n+oo.
log n (n-1)
(3)
We note that for the case of the uniform kernel condition
(2.3.10) does answer our question since it will hold for any density f(u)
that satisfies the condition given.
(4)
Comments 2 and 3 considered together show that condition
40
< ~
(2.3.15) is sufficient to ensure EN
(with the other conditions of
lemma 2.3.7) but it is not a necessary condition.
Example 2.3.3
Consider the uniform kernel defined by
1
K(u).
o
and again let h
n
lui
if
2
.. B n
~ 1
otherwise
-a
, 0 < a <1
Then, from the definition of a ,
n
where
JX+hnM
x-h
W ..
2
t-
1\1 - (n-1)
nM
1
x+h
=f
(n-llM
X+h~1
1
2
i
n
(n-1)
2
hnH
n
h(n-l)M (n-l)
I
feu) du
hnH
hn~'1
n
2 (n-l)
h
n
(n-l)
h
e
(n-l)M feu) du
h (n-l)M
1
n
(n-l)
111 -
h(n-l)M
hnM
x-h(n_l)M
W3
ru1
n
'2
Ie
h
2'
hnH
h
e
hn~1
(n-l)H feu) duo
(n-l)M
For the purpose of this example we will make the additional
assumption that there exists an
sup feu) .. C <
u
~
for
N
0
such that for n
~
N ,
0
x - h(n-l)M ~ u S x + h(n-l)M
Then,
n
log n
Wl~
n
1 112
(n-·l)
n
(2hnM.12111n
log n
(n-l)
41
But, expanding
and
1
1.-!
1 a
(1 - -)
n
in powers of 1 we get,
n
n
n-a
log n
I1-.-!L.
n-l
I
l-a 1 - (1 + 1 + 0(1)) (1 - a + o(1))
(!!-1) a, • __n_
n
log n
=
so that
n
n 1-a ( In- a +
log n
n
W tends to zero as
log n l
0
n
n
\n
I
(-nl) )
n~.
We may consider W and W together since they are both bounded
3
2
above by
m-1) a
n
n
(n-1)
tn-l)
a
2(n-l) C n
e
Cn
Now,
(
1
- 1 )
(n-1) a
-
Bn
a a
M n log n
na
(1
1 a
(1--)
n
- 1)
which equals, after some minor algebraic manipulation
Thus
n
(W +W ) tends to zero as
2
3
log n
n~.
This means that
lim
n
n-+-co log n
and so EN <
= by
sequence h
a
n
a
n
=0
theorem 2.3.6 when we use the uniform kernel and the
B n-a •
42
2.4
Variance of N:
When considering the distribution of a random variable we
usually like to obtain its variance.  As we noted for EN, unless f(x)
is known, it is unlikely that we will be able to determine Var(N).
However, through the next lemma we can at least see that the variance of
N is finite.
Lemma 2.4.1
Under the same conditions as theorem 2.3.6 and provided h
n
satisfies condition (AI) of lemma 2.3.3 then
Var l'i <
00.
Proof:
Since in theorem 2.3.6 we have already shown EN <
sufficient to show EN
2
<
00.
it will be
c
Defining A as in equation (2.3.4) and A by
00,
j
j
(2.4.1)
we have from the definition of N that
P(n=na) = P[A~ n. [n~1 A.]J ]
j=l
S P (An_I)
= P (IVn-l (x)
I
> €:).
The conditions of Theorem 2.3.6 ensure that
Ee
IS n I
c
S n, so that using lemmas 2.3.1 and 2.3.2
(2.4.2)
P (N=rh>l) S 2 (n-l) c e
-(n-l)E h(n-l)H€:
Now, since by condition (AI) we have that for n
I
(2.4.3)
o
Y
~
0, 5
~
0
(nN) Y + (1+5)
Let us consider the tail of the series for EU
terms.
> N
Then using the inequality (2.4.2)
2
beyound the first N - 1
o
43
co
I
naN
n
co
2
P(N=nM)
~
0
L
naN
2
2 n (n-l)
c
e
-(n-l)l1h(n_l)M e
0
and further, using (2.4.3) we get, putting y = 2 +c, that
co
L
n=N
n
2
co
P(N=nH)
~
2
L
n
c+2
naN
0
0
- (c+2)- (1+0)
co
= 2 L
(1+0)+2
n=N
(n~1) c +
1
Thus since the tail of EN
2
is finite, EN
2
<
M
n
co.
1+0
is finite and the
proof is complete.
Observations:
(1)
that P(N=nM)
We have obtained an upper bound on P(N=nM) and we note
+
0 as n+co at an exponential rate provided h satisfies
n
appropriate conditions.
(2)
Because of the form of the upper bound on P(N=nM) we can
say that for finite r, EN
r
<
co.
To prove this we need only look at the tail of the series for
EN
r
and proceed as in lemma 2.4.1.
(3)
Evidently, (using the information in (2»
we have that the
distribution for the random variable N is a discrete probability
distribution with all finite moments existing.
(4)
Professor W. Hoeffding has pointed out the existence of a
method for showing Ee
bound on P(N)nM).
tN
< co for some t > 0 given a (suitable) exponential
r
This implies EN < cofor all r > O. See C. Stein, "A
note on cumulative sums," Ann. Math. Statist. 17 (1946), 498-499.
Unfortunately, this method cannot be applied directly to our problem as
the bound on P(N > nM) is not of the right form.
2.5  Closure of the Stopping Rule:

By definition, the closure of a stopping rule is the property
by which P(N < ∞) = 1.  (If a stopping rule is not closed it is called an
extended stopping rule.)  The property of closure will now be shown to
hold for our "naive" stopping rule.
Lemma 2.5.1

If the conditions of Theorem 2.3.6 hold then the stopping rule
defined by (2.1.2) is a closed stopping rule.  That is, P(N < ∞) = 1.

Proof:

The stated conditions are sufficient to ensure

    P(N > nM) ≤ C n^γ e^{-nM h_{nM} ε},    γ > 0.

Thus,

    P(N = ∞) = lim_{n→∞} P(N > nM) = 0.
2.6  Divergence of N as ε → 0:

In the previous sections we have considered a fixed value of ε.
We will now consider the problem of the behaviour of the stopping
variable N(ε,M) as ε → 0.  Intuition tells us that, since N(ε,M)
depends on the criterion |V_n(x)| ≤ ε, as ε → 0 the stopping variable
N(ε,M) should increase and should be unbounded above.  That is, it would
be desirable if we could obtain that lim_{ε→0} N(ε,M) = ∞ in probability
(or with probability one).  Unfortunately, we show that for a certain
class of kernels this result may not always be true for our naive
stopping rule.
Lemma 2.6.1
For finite n» if we define
then as e:
~
0,
o
P(N~nM) ~ P(~nM )
45
Proof:
By the definition of our stopping rule, for any fixed, finite n,
P (N:SnN)
= P (B
k=l
where A~ is defined by (2.4.1).
\)
Let,
()
E;~8e:
= nu
k=l
Ill"l
Akc •
c
Since n is fixed and finite, from the definition of A we see
k
that E;nM(e:) decreases monotonely as e: decreases so that we may write
But,
lim E;nH (e:)
e:+()
=
o
~nl1
so that the result follows.
o
If P(E; n_.M)
probability as e:
~
= 0,
then it would follow that N(e:,M) ~ ~ in
0, since we would have then proved that
lim P(N)nM)
=1
e:~0
o
However, p(~n~ is not zero for all kernels as example
for any finite n.
o
2.6.1 will show.
That is we will show that P(E;nM) can be strictly
positive.
Example 2.6.1:
Let
1
K(u)
=
if
lui : ;
1
2
o
otherwise
For convenience in calculation we will assume that {h } is a
n
monotone decreasing sequence of positive constants(the monotone property
is not necessary).
That is, for all n, hn
~
hn
+1.
46
Certainly, Vn(x)
X-X) = a for all k
= 0 if K ~
(h N
~
nM, and from
the definition of K(u) we have that n
(2.6.1)
=1
- p(x - hnM
x+h
=1
-
J
~ ~ ~ x + hnM~
nN
f(u) du
x-h
nH
Certainly for a wide class of density functions,
o<
I
X+hn~1
f(u) du<l
x-h
nM
so that the right hand side of (2.6.1) is strictly positive.
Since h
n
is monotone we have that
r x-~
)
=a
implies K rx-~. ) = 0
(hnH
K (h(n-1)l'.
so that
(2.6.2)
P(~:M) ~
P(Vn(x)
= 0)
~
(p(l
x-u I >
h (n-l)H
l))(n-l)H(p(I~_ul
> l))M
n1'1
and the last quantity on the right hand side of (2.6.2) can be strictly
positive.
o
Thus we have an example for which P(~nM) can be strictly
positive and hence for which N(£,M)
-f>
~
in probability.
Remark:
We note that the essential feature in the example is that the
uniform kernel has only bounded support which allows the possibility of
47
obtaining a sample for which K(X-~)
=
0 for k
= 1,2, 0.0, nM.
This
h l-f
reasoning could be applied to anY~ernel with only bounded support.
Thus we have found a class of kernels for which N(e,M) does not
necessarily tend to infinity in probability as e
Two questions will nOlo1 be considered.
-+
O.
The first is IIAre there
any (or is there a class) of kernels for which lim N(e,M)
e+O
probabili ty? II
=
= in
Secondly, "Can we redefine our "naive ll stopping rule to include
the kernels with bounded support in the class above?l
The answer is yes to both questions and we now proceed to a
proof of this statement.
Lemma 2.6.2
Let K be the class of kernels satisfying condition (2.1.3)
o
and for which
P(V (x) = 0) = 0
n
for any finite n, then for Ke
lim N(e:,M) =
K,
o
=
e+Q
Proof:
By definition,
and from lemma 2.6.1 we have that
But,
in probability.
48
• 0
Thus, lim P(NSnM)
by assumption.
=0
and the result follows.
E~
It remains to show that K is not an empty class ( and if
o
possible to show it includes at least some of the more commonly used
kernels).
To accomplish this we will
appropriate
show in lemma 2.6.3 that under
conditions, V (x) has an absolutely continuous distribution.
n
Lennna 2.6.3
If K(u) is a kernel satisfying conditions (2.1.3) and if
Xl' ' •• , X are independently and identically distributed random
nM
variables with density function f(x), and, if in addition K(u) satisfies
(i)
K(u) is differentiable for all but a finite number of
values of u
(2.6.3)
(i1) K'(u) is continuous and non-zero at all but a finite
number of values of u, then V (x) has an absolutely
n
continuous distribution.
Proof:
Let
y
k
=
K(X-~)
1
nMh
nM
hoM
k
'
then under the conditions stated,
continuous distribution.
But then z2
= Yl
= 1,2,···,
nH.
Yk' k • 1,2,··· , nM, have absolutely
(Parzen 1960 p3l3)
+Y2' has a density function which is the
convolution of the densities of Yl and Y2'
(Parzen (1960) p3l7)
By
induction we can thus show that
,..
f
n11 (x) = zntI
= y 1 + ••• +
Yn11
has a density function that is the nM - fold convolution of the density
of Y •
I
It then follows that
49
has a density function.
(Parzen (1960), p3l8)
Remarks:
1.
If V (x) has an absolutely continuous distribution then
n
clearly P(V (x)
n
2.
= 0) = O.
The normal, double exponential and Cauchy kernels satisfy
the conditions stated so that
3.
K is not empty.
o
4.
We note that for the uniform kernel, K' (u)
D
0 a.e.
so
that the uniform kernel is not in the class considered.
Before redefining our stopping variable we will first prove
that for kernels, Ke K , that N tends to infinity with probability one.
o
Theorem 2.6.4:
Define V (x) and f (x) as previously.
n
n
The kernel,K(u), is
assumed to satisfy conditions (2.1.3) and (2.6.3) and {h } satisfies
n
Then the stopping variable N, defined
conditions (2.1.4) and (2.1.5).
by (2.1.2) is such that
N(e,M)
~ ~
as
e~
with probability one.
Proof:
In lemma 2.2.1 we proved that under conditions (2.1.3),
V (x)
n
~
0
a.s.
There are three possibilities to consider
V (x) -f> 0
as
n~
(ii)
V (x)
~
0
as
n~
but V (x)
(iii)
vn (x)
~
0
as
n~
and for at least one finite n, V (x)=O.
(i)
n
n
n
~
0 for any finite n, and,
n
We consider these in turn.
(i)
If V (x) ~ 0 then by the definition of N(e,H), N • ~
n
so that the theorem holds in this case.
(Note also that V (x) -f> 0
n
50
only on a set with probability zero).
By assumption, V (x) ~ 0 as n~ but Iv (x)! > 0 for all
(H)
n
n
finite n.
Choose a finite, fixed number N.
We will now show that for any
o
point w in the subspace for which (ii)
e
holds, we can choose an
= e(w,N
) such that N(w) > N , for any finite N •
0 0 0
By assumption, Ivj(x)
all e.
This implies Ivj(x)
for all j
S
I
I
>
0 for j S No'
Assume N(w) S No for
< e for at least one j S No'
NO we can choose e(w) < min
lSjSN
we have a contradiction.
Ivj(x)
I so
Since Ivj(x)
that N(w) > No and
o
Since Ke K we have by assumption that P(V (x)
(Hi)
o
n
= 0) = 0
for all finite n so that this third case can occur at most on a set of
probability zero.
This completes the proof of the theorem.
We have seen however that all kernels are not in the class
K.
o
It is now our purpose to redefine N say N' in such a way so that even
if K
t K we may still obtain lim N' (e,H)
o
e~
~
where N' will be the new stopping variable.
at least in probability,
In redefining N we
endeavour to keep the properties of the class K from changing.
o
Definition (2.6.4):

Given ε > 0, and M > 0,

    N'(ε,M) = first n such that |V_n(x)| < ε but |V_n(x)| > 0;
            = ∞ if no such n exists.

Remarks:

1.  The only difference between definition (2.6.4) and
definition (2.1.2) is the requirement that we continue sampling if
V_n(x) = 0.  We thus remove the problems caused by the fact that
P(V_n(x) = 0) may be strictly positive.
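In code, the change amounts to replacing the stopping test of the earlier sequential sketch with one that refuses to stop while V_n(x) is exactly zero; the helper below is a hypothetical illustration, not part of the text.

```python
# modified stopping test for N'(eps, M): stop only when 0 < |V_n(x)| < eps
# (replaces the test `abs(f_curr - f_prev) < eps` in the earlier sequential sketch)
def should_stop(f_curr, f_prev, eps):
    v_n = abs(f_curr - f_prev)
    return 0.0 < v_n < eps
```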
2.
p(v (x)
n
If K€ K then by the definition of the class
o
= 0)
K ,
o
Thus, for the class K , the
= 0, so that N = N' a.s.
o
stopping rules defined by Nand N' are essentially the same.
Theorem 2.6.6:
If K(u) and h
n
satisfy the conditions of theorem 2.3.6 then
EN i <
co •
Proof:
By definition (2.6.4)
P (N '>nM)
where \
=~
u
AO
k
~,
=
{(x
•
P
(~
\)
k=l
being defined by (2.3.4) and
\
l'
~
by,
, k S nH}
000
Thus,
,
P(N'>nM)
(2.6.5)
S
P(A )
n
Now, using (2.6.5) we get
co
EN' =
l
P(N'>nM)
n=O
(2.6.6)
co
S
l
n=O
co
P(A) +
n
l
n=O
P(AO)
n
But by theorem (2.3.6) the first term on the right hand side of (2.6.6)
is finite so it only remains to prove
52
Now, by definition (2.1.2)
P(N=nM) =
p( k=le ~)
A~J
u
~ P(A c )
n
= p (I Vn (x) I S E)
This means that we have the inequality
P(N=nM) ~
p(lvn (x)l· 0)
But,
00
EN
=
I
ncO
nM P (N=n~1)
00
~
I
ncO
n~1 P (I V (x)
n
I = 0)
00
~
I
ncO
I
P ( V (x)
n
I = 0)
However, by Theorem 2.3.6, we have that EN is finite so that we have
proved that
00
I
<
00
ncO
and the proof of the theorem is complete.
Remarks:
1.
With N' defined by (2.6.4) and using the uniform kernel we
will now have
lim N' (E ,11)
E-+O
=
00
in probability
53
since we now have
P(NSnM)
+
0
as E+Q.
If KE K ,since P(V (x) = 0) = 0 for all finite n we have
o
n
that N = N' a.s. and this will imply that EN = EN'.
2.
3.
If K
~
K then from the definitions of the two stopping
o
variables we have that
ENS EN'
2.7
Mean Square Error:
We would like to show that
(2.7.1)
is true.
E(f
2
+ 0
as E+O.
N
Under appropriate conditions we can show that result (2.7.1)
(x) - f(x»)
However, we will leave the proof of this until the theory for
a general stopping rule has been developed in Chapter 3.
The result will
then follow as a special case of the general theory.
We state the result here only for the information of the reader
and since the question of mean square error might logically have been
treated at this point.
CHAPTER 3
SQf'JE GENERP,L RESULTS
3.1.
Introduction and Assumptions:
In chapter 2 we discussed a "naive" sequential procedure,
N(e,M).
Clearly, there are many possible sequential procedures that
could be considered and it is our purpose in this chapter to discuss some
of the properties of an estimator fN(x) that depend on the specific
properties of a stopping rule, N.
We note that these properties will
also depend on the properties of the fixed sample estimate, f (x).
n
For
the purposes of this chapter we will make as few assumptions as possible
about the estimator fn(x) and the stopping rule
~~,
noting in particular
that 1:1 should not be considered as the "naive'; stopping variable of
chapter 2 and f (x) should not be considered as a Parzen type estimator
n
of a density function (although the I'naive il stopping rule and the
Parzen estimator will provide examples of the general theory).
Throughout this chapter however, we will assume that f (x)
n
~
O.
We begin by making several assumptions about the sequential
procedure N.
Assume we are given a set of observations Xl' X2 ,··· which are
random variables on the same measurable space (X,F) and assume there
exists an (unknown) probability measure P over the measurable space.
P
determines the distribution functions of finite subsets of the X's.
The sequential procedure will consist of a stopping rule Nand
a terminal decision rule D.
Note that N is often used to denote the
55
random size of the sample which results from applying the rule N.
The
sense in which we use the symbol N will be clear from the context in
which it appears.
Similarly, D is often used to denote the action taken.
Let,
Bo = {~,F}.
We then assume
(i)
(ii)
N is an extended random variable on (X, F), that is
[N=n] = {set on which N = n} ~ B
n
(iii)
Throughout this chapter we will use a slightly modified notation for the estimator f_n(x) at the point x. We will write

    (3.1.1)    f_n(x) = f_n(x; X_1, ..., X_n)

where we wish to demonstrate more clearly the dependence of f_n(x) on the random variables X_1, X_2, ..., X_n.

In the sections that follow we will make use of a theorem giving the expectation of a function g_N(X_1, ..., X_N) where N is a stopping variable. We now state this theorem without proof.

Theorem 3.1.1

Let g_1(x_1), g_2(x_1, x_2), ... be a sequence of non-negative, Borel measurable functions on R^1, R^2, .... Let N be a stopping variable and suppose P(1 \le N < \infty) = 1. Then Y = g_N(X_1, ..., X_N) is a random variable, EY is defined (possibly +\infty), and

    EY = \sum_{\ell=1}^{\infty} \int_{\Omega_\ell} g_\ell(x_1, ..., x_\ell)\, dF(x_1, ..., x_\ell),

where

    \Omega_\ell = \{(x_1, ..., x_\ell) : [N = \ell]\} = the subset of R^\ell on which [N = \ell].
3.2 Finite Expected Value of f_N(x):

In chapter 2 considerable effort was placed in proving EN < \infty. This result is desirable in its own right, but it also implies that Ef_N(x) < \infty. We prove this in the next lemma.

Lemma 3.2.1

If f_n(x) \ge 0 and if

    \sup_n \sup_{(x_1, ..., x_n)} \frac{1}{n} f_n(x; x_1, ..., x_n) < \infty,

then a sufficient condition that Ef_N(x) is finite is that EN < \infty.

Proof:

By Theorem 3.1.1,

    Ef_N(x) = \sum_{\ell=1}^{\infty} \int_{\Omega_\ell} f_\ell(x; x_1, ..., x_\ell)\, dF(x_1, ..., x_\ell),

where \Omega_\ell = \{(x_1, x_2, ...) : N = \ell\} = \{the set on which we stop when N = \ell\}. Thus,

    Ef_N(x) \le \sup_n \sup_{(x_1, ..., x_n)} \Bigl(\frac{1}{\ell} f_\ell(x; x_1, ..., x_\ell)\Bigr) \sum_{\ell=1}^{\infty} \ell \int_{\Omega_\ell} dF(x_1, ..., x_\ell),

and since

    \int_{\Omega_\ell} dF(x_1, ..., x_\ell) = P[N = \ell],

we get

    Ef_N(x) \le \sup_n \sup_{(x_1, ..., x_n)} \frac{1}{n} f_n(x; x_1, ..., x_n) \cdot EN < \infty

by assumption.
Corollary 3.2.2

If f_n(x) = \hat{f}_n(x) = \frac{1}{n h_n}\sum_{j=1}^{n} K\bigl(\frac{x - X_j}{h_n}\bigr) and N is defined by (2.1.2), then under the assumptions of Theorem (2.3.6), Ef_N(x) < \infty.

Proof:

By Theorem (2.3.6), EN < \infty. Thus it remains only to show that \sup_n \frac{1}{n}\hat{f}_n(x) < \infty. Now,

    \frac{1}{n}\hat{f}_n(x) = \frac{1}{n^2 h_n}\sum_{j=1}^{n} K\Bigl(\frac{x - X_j}{h_n}\Bigr) \le \frac{1}{n^2 h_n}\, n \sup_u K(u) = \frac{1}{n h_n}\sup_u K(u),

and since \sup_u K(u) < \infty and n h_n \to \infty as n \to \infty, we have that \sup_n \frac{1}{n}\hat{f}_n(x) < \infty. Thus the result follows.
3.3 Convergence Theorems:

Previously in chapter 2 we made a point of obtaining results such as N(\varepsilon, M) tending to infinity as \varepsilon tends to zero, both in probability and with probability one. These results are desirable in themselves, but in this section we will show that they are also required in order to obtain results concerning the density estimator f_N(x) where N is a random variable. We will discuss these results as a series of theorems.

Theorem 3.3.1:

If N_\varepsilon \to \infty in probability as \varepsilon \to 0, if f_n(x) \to f(x) a.s. as n \to \infty, and if in addition

    \sup_n \sup_{(X_1, ..., X_n)} f_n(x; X_1, ..., X_n) < \infty,

then Ef_{N_\varepsilon}(x) \to f(x) as \varepsilon \to 0.
Proof:

Since f_n(x) \to f(x) a.s. as n \to \infty, by Egoroff's theorem f_n(x) \to f(x) a.u. That is, given \delta > 0 there exists n_0 = n_0(\delta) such that if n > n_0, |f_n(x) - f(x)| < \delta, except possibly on a set A of probability less than \delta. Also, since N_\varepsilon \to \infty in probability as \varepsilon \to 0, there exists an \varepsilon_0 > 0 such that for \varepsilon \le \varepsilon_0 we have P(N_\varepsilon \le n_0) \le \delta.

By Theorem 3.1.1,

    Ef_{N_\varepsilon}(x) = \sum_{\ell=1}^{n_0-1}\int_{\Omega_\ell} f_\ell(x; x_1,...,x_\ell)\,dF + \sum_{\ell=n_0}^{\infty}\int_{\Omega_\ell \cap A^c} f_\ell\,dF + \sum_{\ell=n_0}^{\infty}\int_{\Omega_\ell \cap A} f_\ell\,dF.

But for \ell \ge n_0, on the set \Omega_\ell \cap A^c we have f(x) - \delta \le f_\ell(x) \le f(x) + \delta, so that the middle term is at most f(x) + \delta. Moreover,

    \sum_{\ell=n_0}^{\infty}\int_{\Omega_\ell \cap A} f_\ell\,dF \le c\,P(A) \le c\delta,   \sum_{\ell=1}^{n_0-1}\int_{\Omega_\ell} f_\ell\,dF \le c\sum_{\ell=1}^{n_0-1} P(N_\varepsilon = \ell) \le c\delta,

where c = \sup_\ell \sup_{(x_1,...,x_\ell)} f_\ell(x; x_1,...,x_\ell), the last inequality holding by assumption for \varepsilon \le \varepsilon_0. The right hand side of the inequality is then less than or equal to f(x) + \delta + 2c\delta. Thus we have shown that for \varepsilon \le \varepsilon_0,

    (3.3.1)    Ef_{N_\varepsilon}(x) \le f(x) + \delta(1 + 2c).

For the lower bound,

    Ef_{N_\varepsilon}(x) \ge \sum_{\ell=n_0}^{\infty}\int_{\Omega_\ell \cap A^c} f_\ell\,dF \ge (f(x) - \delta)\sum_{\ell=n_0}^{\infty} P(A^c \cap \Omega_\ell).

Now

    \sum_{\ell=n_0}^{\infty} P(\Omega_\ell \cap A) \le \sum_{\ell=1}^{\infty} P(\Omega_\ell \cap A) = P(A) \le \delta,

and

    \sum_{\ell=n_0}^{\infty} P(\Omega_\ell) = \sum_{\ell=n_0}^{\infty} P(\Omega_\ell \cap A^c) + \sum_{\ell=n_0}^{\infty} P(\Omega_\ell \cap A) \ge 1 - \delta.

Thus,

    \sum_{\ell=n_0}^{\infty} P(\Omega_\ell \cap A^c) \ge 1 - \delta - \sum_{\ell=n_0}^{\infty} P(\Omega_\ell \cap A) \ge 1 - 2\delta,

and so finally we obtain

    (3.3.2)    Ef_{N_\varepsilon}(x) \ge (f(x) - \delta)(1 - 2\delta).

Since \delta is arbitrary, the required result now follows from the inequalities (3.3.1) and (3.3.2).
EXAMPLE 3.3.1:

Let f_n(x) = \hat{f}_n(x) and let N_\varepsilon be defined by (2.1.2). Since \sup_n \sup_{(x_1,...,x_n)} \hat{f}_n(x) may be \infty, let us define

    g_n(x) = \hat{f}_n(x)  if \hat{f}_n(x) \le R,
    g_n(x) = R             if \hat{f}_n(x) > R,

where R is a finite constant. R is chosen very large and we assume that R > f(x). Note we still use the stopping rule defined by (2.1.2) since (i) to calculate g_n(x) we first must calculate \hat{f}_n(x), and (ii) if the stopping rule were based on g_n(x) there would be some difficulty, as the property N_\varepsilon \to \infty a.s. as \varepsilon \to 0 would be lost. Notice also that since R > f(x), then provided \hat{f}_n(x) \to f(x) a.s. as n \to \infty (for conditions see Lemma 2.2.2), g_n(x) \to f(x) a.s. as n \to \infty, so that the conditions of Theorem 3.3.1 are satisfied and

    E g_{N_\varepsilon}(x) \to f(x)  as  \varepsilon \to 0.

Remarks:

1. The theorem remains true if we have only \sup_n \sup_{(X_1, ..., X_n)} f_n(x) < \infty (a.s.).

2. From a practical standpoint, provided R is made large enough, the truncation of \hat{f}_n(x) should have little effect on the actual estimates we obtain for f(x).

3. We now show that asymptotically, as n \to \infty, \hat{f}_n(x) and g_n(x) have the same properties. To do this we will consider the general case where g_n(x) is the truncated estimator corresponding to f_n(x), where it is assumed f_n(x) \to f(x) a.s.
Asymptotic Properties of a Truncated Estimator:

Define,

    (3.3.3)    g_n(x) = f_n(x)  if f_n(x) \le R,
               g_n(x) = R        if f_n(x) > R.

Lemma 3.3.2 (Asymptotic unbiasedness):

If f_n(x) \to f(x) a.s. and if g_n(x) is defined by (3.3.3), then E g_n(x) \to f(x) as n \to \infty, provided that R > f(x).

Proof:

Since R > f(x), g_n(x) \to f(x) a.s., so that by Egoroff's theorem g_n(x) \to f(x) a.u. That is, given \delta > 0 there exists an n_0 = n_0(\delta) such that if n > n_0, |g_n(x) - f(x)| < \delta, except possibly on a set A of probability less than \delta. But

    E g_n(x) \ge \int_{A^c} g_n(x)\, dF(x_1, ..., x_n) \ge (f(x) - \delta)\, P(A^c),

so that

    (3.3.4)    E g_n(x) \ge (f(x) - \delta)(1 - \delta).

Now

    E g_n(x) = \int_{A^c} g_n(x)\, dF(x_1, ..., x_n) + \int_{A} g_n(x)\, dF(x_1, ..., x_n) \le (f(x) + \delta)P(A^c) + R\,P(A) \le (f(x) + \delta) + R\delta,

so that

    (3.3.5)    E g_n(x) \le f(x) + \delta(1 + R).

Since \delta is arbitrary, the result follows immediately from inequalities (3.3.4) and (3.3.5).
Lemma 3.3.3:

(i)  E f_n^j(x) \ge E g_n^j(x)  for every j > 0.
(ii) Var(f_n(x)) \ge Var(g_n(x)).

Proof:

These results are well known. We give only a sketch of the proof.

(i)

    E f_n^j(x) = \int f_n^j(x)\, dF(x_1, ..., x_n) = \int_{B_n^c} f_n^j(x)\, dF(x_1, ..., x_n) + \int_{B_n} f_n^j(x)\, dF(x_1, ..., x_n),

where B_n = \{(x_1, ..., x_n) : f_n(x) > R\}. Hence

    E f_n^j(x) \ge \int_{B_n^c} f_n^j(x)\, dF(x_1, ..., x_n) + \int_{B_n} R^j\, dF(x_1, ..., x_n) = \int g_n^j(x)\, dF(x_1, ..., x_n) = E g_n^j(x).

(ii) The result for variances follows by subtracting R from both f_n(x) and g_n(x) and then examining cov(g_n(x) - R, h_n(x) - R), where h_n(x) = max(f_n(x), R).

Observations:

1. If Var f_n(x) \to 0 as n \to \infty, then so also does Var(g_n(x)) \to 0 as n \to \infty. The rate of convergence of Var(g_n(x)) to zero is at least as fast as the rate of Var(f_n(x)).

2. Since

    E(g_n(x) - f(x))^2 = E(g_n(x) - E g_n(x) + E g_n(x) - f(x))^2 = E(g_n(x) - E g_n(x))^2 + (E g_n(x) - f(x))^2,

using Lemmas 3.3.2 and 3.3.3 we see that E(g_n(x) - f(x))^2 \to 0 as n \to \infty, so that g_n(x) is mean square consistent.
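The truncated estimator g_n(x) of (3.3.3) is simple to realize in computation. The short Python sketch below is an illustration only (the Gaussian kernel and the particular cap R are assumptions, not part of the dissertation); it makes explicit that the truncation touches only the evaluation of the estimate, never the data or the stopping rule.

```python
import numpy as np

def parzen_estimate(x, data, h):
    """Parzen estimate f_n(x) with a Gaussian kernel (one illustrative choice of K)."""
    u = (x - np.asarray(data)) / h
    return np.exp(-0.5 * u**2).sum() / (len(data) * h * np.sqrt(2 * np.pi))

def truncated_estimate(x, data, h, R):
    """g_n(x) of (3.3.3): the Parzen estimate capped at the constant R > f(x)."""
    return min(parzen_estimate(x, data, h), R)

# g_n inherits a.s. convergence from f_n as long as R exceeds f(x), so the cap
# only matters on the exceptional set where f_n happens to be unusually large.
rng = np.random.default_rng(1)
sample = rng.standard_normal(2000)
print(truncated_estimate(0.0, sample, h=2000 ** (-1 / 5), R=5.0))
```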
Comment on Notation:

We use the stopping variable N_\varepsilon to denote the dependence of N on some criterion. For consistency with chapter 2 we take this criterion as dependent on a parameter \varepsilon for which we wish to consider the behavior of N and f_N(\cdot) as \varepsilon \to 0. Note, however, that only minor modifications are necessary in the proofs in this chapter if the criterion were assumed to depend on a parameter t, and t were assumed to tend to infinity.
Theorem 3.3.4:

If N_\varepsilon \to \infty in probability as \varepsilon \to 0, if f_n(x) \to f(x) a.s. as n \to \infty, and if

    \sup_n \sup_{(X_1, ..., X_n)} f_n(x; X_1, ..., X_n) < \infty,

then

    E\bigl(f_{N_\varepsilon}(x) - E f_{N_\varepsilon}(x)\bigr)^2 \to 0  as  \varepsilon \to 0.

Proof:

Since we have already shown that E f_{N_\varepsilon}(x) \to f(x) under these conditions, it will be sufficient to show that E f_{N_\varepsilon}^2(x) \to f^2(x). An upper bound is obtained using a similar method to that employed in obtaining (3.3.1). The proof is therefore omitted.

Theorem 3.3.5:

If the conditions of Theorem 3.3.4 hold, then

    E\bigl(f_{N_\varepsilon}(x) - f(x)\bigr)^2 \to 0  as  \varepsilon \to 0.

Proof:

Since

    E\bigl(f_{N_\varepsilon}(x) - f(x)\bigr)^2 = E\bigl(f_{N_\varepsilon}(x) - E f_{N_\varepsilon}(x)\bigr)^2 + \bigl(E f_{N_\varepsilon}(x) - f(x)\bigr)^2,

the result follows immediately from Theorems 3.3.1 and 3.3.4.
Theorem 3.3.6:

If f_n(x) \to f(x) a.s. as n \to \infty and if N_\varepsilon \to \infty a.s. as \varepsilon \to 0, then

    f_{N_\varepsilon}(x) \to f(x)  with probability one as  \varepsilon \to 0.

Proof:

By combining the two sets of probability zero implicit in the statement of the theorem, the result is immediate.

Corollary 3.3.7:

Let f_n(x) = \hat{f}_n(x) and let N be defined by (2.1.2). If K(u) and h_n satisfy the conditions of Lemma 2.2.2 and K \in K_0, then

    \lim_{\varepsilon \to 0} \hat{f}_{N_\varepsilon}(x) = f(x)  with probability one.

Proof:

The conditions are sufficient for both \hat{f}_n(x) \to f(x) a.s. as n \to \infty and N_\varepsilon \to \infty a.s. as \varepsilon \to 0, so the result follows immediately from Theorem 3.3.6.
Remarks on Other Solutions:

In the course of this investigation two other methods were studied for obtaining results on the unbiasedness, asymptotic variance and mean square error of f_{N_\varepsilon}(x). The following result constitutes alternate sufficient conditions for Theorem 3.3.1. The difficulty with both methods is that I cannot prove that the conditions can be satisfied for my proposed sequential procedures. We now state the theorem for unbiasedness with these conditions and sketch the proofs.

Theorem 3.3.1.A:

If N_\varepsilon \to \infty in probability and if either

    (a)  E[f_n(x) \mid N_\varepsilon = n] \to f(x)  as  n \to \infty,

or

    (b)  E(f_n(x) - f(x))^2 \to 0  as  n \to \infty  and  \sum_{\ell=1}^{\infty}\{P(N_\varepsilon = \ell)\}^{1/2} = c < \infty,  where c is a constant independent of \varepsilon,

then

    E\bigl(f_{N_\varepsilon}(x) - f(x)\bigr) \to 0  as  \varepsilon \to 0.

Proof (a):
This proof is similar to that used by Srivastava (1973) for the case when N_\varepsilon is independent of the observations, since we can write

    E f_{N_\varepsilon}(x) = \sum_{n=1}^{\infty} E\bigl(f_{N_\varepsilon}(x) \mid N_\varepsilon = n\bigr) P(N_\varepsilon = n) = \sum_{n=1}^{\infty} E\bigl(f_n(x) \mid N_\varepsilon = n\bigr) P(N_\varepsilon = n).

Paralleling Srivastava's proof, using E(f_n(x) \mid N_\varepsilon = n) in place of E f_n(x), now gives the result. The details will be omitted.

The difficulty we encounter is that we have not been able to verify that condition (a) is able to be satisfied. Notice that

    E[f_n(x) \mid N_\varepsilon = n] = \frac{E\bigl[f_n(x)\, I_{[N_\varepsilon = n]}\bigr]}{P(N_\varepsilon = n)},

and for fixed \varepsilon both numerator and denominator tend to zero. We need to know the rates of convergence involved. The inequality we have for P(N_\varepsilon = n) is much too crude as \varepsilon \to 0, and we know little about the other rate, so the condition has not been verified for N and f_n(x) as defined in chapter 2 (or for any other example).

Proof (b):

By Theorem 3.1.1 we have

    E\bigl(f_{N_\varepsilon}(x) - f(x)\bigr) = \sum_{\ell=1}^{\infty}\int_{\Omega_\ell}\bigl(f_\ell(x; x_1,...,x_\ell) - f(x)\bigr)\, dF(x_1,...,x_\ell).

Choosing n_0 in a similar manner as in Theorem 3.3.1 we obtain

    \sum_{\ell=n_0}^{\infty}\int \bigl(f_\ell(x; x_1,...,x_\ell) - f(x)\bigr)\, I_{\Omega_\ell}(x_1,...,x_\ell)\, dF(x_1,...,x_\ell),

where I_{\Omega_\ell}(x_1,...,x_\ell) is the indicator function of \Omega_\ell. Using Schwarz's inequality we see that this is less than

    \sum_{\ell=n_0}^{\infty}\bigl[E(f_\ell(x) - f(x))^2\bigr]^{1/2}\{P(N = \ell)\}^{1/2},

and by the way we choose n_0 we ensure this is less than

    \delta \sum_{\ell=n_0}^{\infty}\{P(N = \ell)\}^{1/2} \le \delta c   (by assumption).

Given this step it is now not difficult to complete the proof. However, the difficulty lies in showing that

    \sum_{\ell=1}^{\infty}\{P(N = \ell)\}^{1/2} \le c < \infty,  where c is independent of \varepsilon,

can be satisfied in practice. We can show that, for any fixed \varepsilon > 0,

    \sum_{\ell=1}^{\infty}\{P(N = \ell)\}^{1/2} = c(\varepsilon) < \infty.

However, the supremum over all such sums may be infinite. Once again the problem reduces to finding a suitable bound for P(N = \ell).
Observations:

i. The proof of Theorem 3.3.1.A under condition (b) is based on the use of Schwarz's inequality. A similar result follows by using the more general Hölder's inequality. The sufficient condition (b) may be replaced by

    (b')  E(f_n(x) - f(x))^p \to 0  as  n \to \infty  and  \sum_{\ell=1}^{\infty}\{P(N_\varepsilon = \ell)\}^{1/q} \le c,

where c is a constant independent of \varepsilon and 1/p + 1/q = 1. The same difficulty in determining the existence of a uniform bound, c, applies here also.

ii. The proof of Theorem 3.3.1 involves the assumption that

    \sup_n \sup_{(X_1, ..., X_n)} f_n(x; X_1, ..., X_n) < \infty.

This is made to ensure that

    (3.3.6)    \int_{A \cap \Omega_\ell} f_\ell(x; x_1, ..., x_\ell)\, dF(x_1, ..., x_\ell)

is sufficiently small whenever P(A) is sufficiently small. Letting P be the probability measure on the usual infinite product space, (3.3.6) may be rewritten; let us assume

    (3.3.7)    \int_B f_\ell\, dP \le k\, P(B)

for some k \ge 0 and for every measurable set B. Clearly, taking B = A \cap \Omega_\ell gives

    \int_{A \cap \Omega_\ell} f_\ell\, dP \le k\, P(A \cap \Omega_\ell) \le k\, P(A),

hence (3.3.7) is a more general condition than \sup_n \sup_{(X_1, ..., X_n)} f_n(x; X_1, ..., X_n) < \infty. Unfortunately, it appears that this is difficult to verify (P being unknown) except in the circumstance that f_n is uniformly bounded in n and in (X_1, ..., X_n).

iii. Analogues of Theorems 3.3.4 and 3.3.5 can be proved with sufficient conditions similar to those of Theorem 3.3.1.A.

Theorem 3.3.4.A:

If N_\varepsilon \to \infty in probability and if either

    (a)  E\bigl[(f_n(x) - E f_n(x))^2 \mid N = n\bigr] \to 0  as  n \to \infty,

or

    (b)  E(f_n(x) - E f_n(x))^{2p} \to 0  as  n \to \infty  and  \sum_{\ell=1}^{\infty}\{P(N = \ell)\}^{1/q} \le c,  with  1/p + 1/q = 1,

where c is a constant independent of \varepsilon, then

    E\bigl(f_{N_\varepsilon}(x) - E f_{N_\varepsilon}(x)\bigr)^2 \to 0.

Moreover, if the bias tends to zero (Theorem 3.3.1.A),

    E\bigl(f_{N_\varepsilon}(x) - f(x)\bigr)^2 \to 0.

Note, however, that to prove the mean square error result by route (b) we would also need to show E(f_n(x) - f(x))^{2p} \to 0 as n \to \infty. Using the results of Parzen (1962) this can be shown to be true for the estimator \hat{f}_n(x). The proof is not difficult, involving only some algebraic manipulation, and will be omitted.

iv. Finally we observe that Theorem 3.3.6 and Corollary 3.3.7 do not have any of the requirements for a uniform bound or for uniform integrability as required by Theorems 3.3.1, 3.3.4 and 3.3.5 or their analogues. Moreover, Theorem 3.3.6 and Corollary 3.3.7 also guarantee strong convergence.
CHAPTER 4

THE YAMATO SEQUENTIAL ESTIMATOR

4.1 Introduction:

Let X_1, X_2, ..., X_n be a sequence of independently distributed, m-dimensional random vectors having a probability density function f(x). Yamato (1972) then defines a "sequential" estimator of the density function f(x) as

    (4.1.1)    \bar{f}_n(x) = \frac{1}{n}\sum_{j=1}^{n}\frac{1}{h_j^m} K\Bigl(\frac{x - X_j}{h_j}\Bigr).

Notice that the estimate \bar{f}_n(x) may be written as

    (4.1.2)    \bar{f}_n(x) = \frac{n-1}{n}\bar{f}_{n-1}(x) + \frac{1}{n h_n^m} K\Bigl(\frac{x - X_n}{h_n}\Bigr).

This means that \bar{f}_n(x) may be more suitable than \hat{f}_n(x) for use as a "sequential estimator", since it has the property of correcting the estimate successively as observations are added.
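The recursion (4.1.2) is what makes the Yamato estimator convenient to update one observation at a time. A minimal Python sketch of that update is given below; it is an illustration only, and the Gaussian kernel together with the bandwidth sequence h_j = j^{-1/5} are assumed choices, not part of the dissertation.

```python
import numpy as np

def gaussian_kernel(u):
    """K(u): standard normal density, one kernel satisfying (2.1.3)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

class YamatoEstimator:
    """Recursive (Yamato-type) density estimate at a fixed point x, m = 1."""

    def __init__(self, x, bandwidth=lambda j: j ** (-1 / 5)):
        self.x = x
        self.bandwidth = bandwidth   # h_j sequence; j^{-1/5} is illustrative
        self.n = 0
        self.value = 0.0             # current value of f_bar_n(x)

    def update(self, x_new):
        """Apply (4.1.2): f_bar_n = ((n-1)/n) f_bar_{n-1} + (1/(n h_n)) K((x - X_n)/h_n)."""
        self.n += 1
        h = self.bandwidth(self.n)
        self.value = ((self.n - 1) / self.n) * self.value \
                     + gaussian_kernel((self.x - x_new) / h) / (self.n * h)
        return self.value

est = YamatoEstimator(x=0.0)
for obs in np.random.default_rng(2).standard_normal(1000):
    est.update(obs)
print(est.value)   # should be near the standard normal density at 0, about 0.3989
```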
A further reason for considering the Yamato type estimator is that Yamato proves that, in the case m = 1,

    (4.1.3)    \lim_{n\to\infty}\frac{\mathrm{Var}(\bar{f}_n(x))}{\mathrm{Var}(\hat{f}_n(x))} = v_0 \le 1,

where v_0 depends on the sequence \{h_n\}, so that \bar{f}_n(x) is at least as efficient as \hat{f}_n(x), the Parzen type estimate.

We will first show that the estimator \bar{f}_n(x) defined by (4.1.1) is strongly consistent. That is, if x is contained in the continuity set C(f), then \bar{f}_n(x) \to f(x) with probability one. Our method of proof is to use a technique due to Van Ryzin (1969), who studied the strong consistency of estimates of the form \hat{f}_n(x). Notice that, due to the special form of \bar{f}_n(x), Van Ryzin's conditions have been considerably weakened.

Since we will be considering m-dimensional space and not 1-dimensional space as in Chapters 2 and 3, we will require the following conditions:

    (4.1.4)
    (i)   K(u) is a density on R^m;
    (ii)  \sup_{u \in R^m} K(u) < \infty;
    (iii) \|u\|^m K(u) \to 0  as  \|u\| \to \infty,  where  \|u\|^2 = \sum_{i=1}^{m} u_i^2;

where K(u) = K(u_1, u_2, ..., u_m) is a real valued, Borel measurable function on R^m and R^m is an m-dimensional Euclidean space. Further, we will require that \{h_n\} is a sequence of real numbers such that condition (2.1.4) is satisfied, that is, (i) h_n > 0, n = 1, 2, ...; (ii) \lim_{n\to\infty} h_n = 0 and \lim_{n\to\infty} n h_n = \infty.

The strong consistency of \bar{f}_n(x) is proved in section 4.2. In sections 4.3 through 4.7 we will parallel the work of Chapter 2 and define a "naive" stopping rule based on

    (4.1.5)    \bar{V}_n(x) = \bar{f}_{nM}(x) - \bar{f}_{(n-1)M}(x),

where \bar{f}_{nM}(x) is defined by (4.1.1) with m = 1.

4.2 Strong Consistency:

In order to show strong consistency we will use the following lemma, which is proved in Van Ryzin (1969). The proof is omitted here.
Lemma 4.2.1:

Let \{Y_n\} and \{Y'_n\} be two sequences of random variables on a probability space (X, F, P). Let \{F_n\} be a sequence of Borel fields, F_n \subset F_{n+1} \subset F, where Y_n and Y'_n are measurable with respect to F_n. If

    (4.2.1)
    (i)   0 \le Y_n  a.e.,
    (ii)  E Y_1 < \infty,
    (iii) E(Y_{n+1} \mid F_n) \le Y_n + Y'_n  a.e.,
    (iv)  \sum_{n=1}^{\infty} E|Y'_n| < \infty,

then Y_n converges a.e. to a finite limit.

Theorem 4.2.2 now gives the pointwise strong consistency of the estimator \bar{f}_n(x).

Theorem 4.2.2: (Pointwise Strong Consistency)

If K(u) satisfies (4.1.4) and \{h_n\} is a monotone decreasing sequence of numbers satisfying (2.1.4), and if in addition

    (4.2.2)    \sum_{n=1}^{\infty}\frac{1}{n^2 h_n^m} < \infty

and

    (4.2.3)    0 < \lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n}\frac{h_n^m}{h_j^m} = v_0 \le 1,

then \bar{f}_n(x) \to f(x) with probability one if x \in C(f).
Proof:

First note that E\bar{f}_n(x) \to f(x) as n \to \infty (Yamato (1972)). Thus it is sufficient to show that \bar{f}_n(x) - E\bar{f}_n(x) \to 0 with probability one as n \to \infty.

Let Y_n = (\bar{f}_n(x) - E\bar{f}_n(x))^2, let F_n be the Borel field generated by X_1, X_2, ..., X_n, and note that \lim_{n\to\infty} E(Y_n) = 0 (Yamato (1972)). From (4.1.2) we now have that

    \bar{f}_{n+1}(x) - E\bar{f}_{n+1}(x) = \frac{n}{n+1}\bigl(\bar{f}_n(x) - E\bar{f}_n(x)\bigr) + \frac{1}{(n+1)h_{n+1}^m}\Bigl(K\Bigl(\frac{x - X_{n+1}}{h_{n+1}}\Bigr) - E\,K\Bigl(\frac{x - X_{n+1}}{h_{n+1}}\Bigr)\Bigr).

Thus,

    E(Y_{n+1} \mid F_n) = Y_n - \frac{2}{n+1}Y_n + \frac{1}{(n+1)^2}Y_n + \frac{1}{(n+1)^2 h_{n+1}^{2m}}\,\mathrm{Var}\,K\Bigl(\frac{x - X_{n+1}}{h_{n+1}}\Bigr).

Now define

    Y'_n = \frac{1}{(n+1)^2}Y_n + \frac{1}{(n+1)^2 h_{n+1}^{2m}}\,\mathrm{Var}\,K\Bigl(\frac{x - X_{n+1}}{h_{n+1}}\Bigr),

so that E(Y_{n+1} \mid F_n) \le Y_n + Y'_n. In order to use Lemma 4.2.1 we must verify conditions (4.2.1), of which (i), (ii) and (iii) are already satisfied. Thus to complete the proof we must verify \sum_{n=1}^{\infty} E|Y'_n| < \infty. But,

    \sum_{n=1}^{\infty}\frac{1}{(n+1)^2}E|Y_n| = \sum_{n=1}^{\infty}\frac{1}{(n+1)^2}\,\mathrm{Var}\,\bar{f}_n(x).

Now Yamato (1972), using a monotone sequence satisfying (2.1.4) and with v_0 defined by (4.2.3), has shown (Theorem 3) that

    (4.2.4)    \lim_{n\to\infty} n h_n^m\,\mathrm{Var}\,\bar{f}_n(x) = v_0\, f(x)\int_{R^m} K^2(y)\, dy.

We note that since \{h_n\} is a decreasing sequence, v_0 \le 1. Now, for large n we thus have

    \mathrm{Var}\,\bar{f}_n(x) = O\Bigl(\frac{1}{n h_n^m}\Bigr).

Hence, using condition (4.2.2), we have that

    \sum_{n=1}^{\infty}\frac{1}{(n+1)^2}\,\mathrm{Var}\,\bar{f}_n(x) < \infty.

Further, Van Ryzin (1969) shows that

    \sum_{n=1}^{\infty}\frac{1}{(n+1)^2 h_{n+1}^{2m}}\,\mathrm{Var}\,K\Bigl(\frac{x - X_{n+1}}{h_{n+1}}\Bigr) < \infty,

using condition (4.2.2). Thus \sum_{n=1}^{\infty} E|Y'_n| < \infty and the result follows from Lemma 4.2.1 and the fact that the mean square limit and the almost sure limit coincide with probability one.
In order to prove the uniform result with respect to x, we introduce the Fourier transform of the kernel. Define

    (4.2.5)    k(t) = \int_{R^m} e^{it'u}\,K(u)\, du,   where   t'u = \sum_{j=1}^{m} t_j u_j.

Also let

    \phi_n(t) = \frac{1}{n}\sum_{j=1}^{n} e^{it'X_j}   and   \phi(t) = E\,e^{it'X}.

Theorem 4.2.3: (Uniform Strong Consistency)

If K(u) satisfies conditions (4.1.4) and \{h_n\} is a monotone decreasing sequence of real numbers satisfying (2.1.4), and if both

    \lim_{n\to\infty}\frac{h_n}{h_{n+1}} = 1   and   \lim_{n\to\infty} n h_n^{2m} = \infty,

and if

    (4.2.6)    \sum_{n=1}^{\infty}\frac{1}{(n h_n^m)^2} < \infty

and

    (4.2.7)    \sum_{n=1}^{\infty}\frac{1}{n h_n^{2m-1}}\Bigl|\frac{1}{h_{n+1}} - \frac{1}{h_n}\Bigr| < \infty,

and if also

    (4.2.8)    \int |k(t)|\, dt < \infty,

where k(u) is defined by (4.2.5) and |k(u)| is non-decreasing on u < 0 and non-increasing on u \ge 0, then, if f(x) is uniformly continuous on R^m,

    \sup_x |\bar{f}_n(x) - f(x)| \to 0  with probability one as  n \to \infty.

(All integrals are over R^m unless otherwise specified.)
Proof:

We first prove that \sup_x |\bar{f}_n(x) - E\bar{f}_n(x)| \to 0 a.s. as n \to \infty. Since both k(u) and K(u) are in L_1, using the inversion theorem for the Fourier transform,

    \sup_x |\bar{f}_n(x) - E\bar{f}_n(x)| = \sup_x \Bigl|\frac{1}{(2\pi)^m}\int \frac{1}{n}\sum_{j=1}^{n}\bigl(e^{iu'X_j} - \phi(u)\bigr)k(h_j u)\, e^{-iu'x}\, du\Bigr|.

Taking the absolute value signs inside the integral and noting that |e^{-iu'x}| \le 1, we now have that the right hand side is less than

    \frac{1}{(2\pi)^m}\int\Bigl|\frac{1}{n}\sum_{j=1}^{n}\bigl(e^{iu'X_j} - \phi(u)\bigr)k(h_j u)\Bigr|\, du,

and we can write this as

    \frac{1}{(2\pi)^m}\int\Bigl|\frac{1}{n}\sum_{j=1}^{n}\bigl(e^{iu'X_j} - \phi(u)\bigr)\frac{k(h_j u)}{k(h_n u)}\Bigr|\,|k(h_n u)|\, du.

Using Schwarz's inequality this is less than (apart from the constant (2\pi)^{-m})

    \Bigl[\int |k(u)|\, du\Bigr]^{1/2}\Bigl[\frac{1}{h_n^m}\int\Bigl|\frac{1}{n}\sum_{j=1}^{n}\bigl(e^{iu'X_j} - \phi(u)\bigr)\frac{k(h_j u)}{k(h_n u)}\Bigr|^2 |k(h_n u)|\, du\Bigr]^{1/2}.

Since \int|k(u)|\, du < \infty by assumption (4.2.8), we need only consider the term

    Y_n = \frac{1}{h_n^m}\int\Bigl|\frac{1}{n}\sum_{j=1}^{n}\bigl(e^{iu'X_j} - \phi(u)\bigr)\frac{k(h_j u)}{k(h_n u)}\Bigr|^2 |k(h_n u)|\, du.

Taking expectations we get

    E Y_n = \frac{1}{n^2 h_n^m}\int E\Bigl|\sum_{j=1}^{n}\bigl(e^{iu'X_j} - \phi(u)\bigr)\frac{k(h_j u)}{k(h_n u)}\Bigr|^2 |k(h_n u)|\, du,

the interchange of the integral and expectation signs being justified by the use of Fubini's theorem for positive functions. But,

    E\bigl|e^{iu'X_j} - \phi(u)\bigr|^2 = E\bigl(e^{iu'X_j} - \phi(u)\bigr)\overline{\bigl(e^{iu'X_j} - \phi(u)\bigr)} = 1 - |\phi(u)|^2,

so that we obtain

    (4.2.9)    E Y_n = \frac{1}{n^2 h_n^m}\int\bigl(1 - |\phi(u)|^2\bigr)\sum_{j=1}^{n}\Bigl|\frac{k(h_j u)}{k(h_n u)}\Bigr|^2 |k(h_n u)|\, du.

However, since |k(u)| is non-decreasing on u < 0 and non-increasing on u \ge 0, we have

    (4.2.10)    \Bigl|\frac{k(h_j u)}{k(h_n u)}\Bigr| \le 1,   j \le n.

Thus, using inequality (4.2.10) in equation (4.2.9), we get

    E Y_n \le \frac{1}{n h_n^m}\int |k(h_n u)|\, du = \frac{1}{n h_n^{2m}}\int |k(u)|\, du.

Condition (4.2.8) and the assumption n h_n^{2m} \to \infty ensure that E Y_n tends to zero as n \to \infty. We now prove that the convergence of Y_n occurs with probability one.
Write,

    Y_{n+1} = \frac{1}{(n+1)^2 h_{n+1}^m}\int\Bigl|\sum_{j=1}^{n+1}\bigl(e^{iu'X_j} - \phi(u)\bigr)\frac{k(h_j u)}{k(h_{n+1}u)}\Bigr|^2 |k(h_{n+1}u)|\, du.

Let F_n be defined as in Theorem 4.2.2. We now expand the first term in the integral using the fact that

    E\bigl(e^{iu'X_{n+1}} - \phi(u)\mid F_n\bigr) = 0   and   E\bigl(\overline{e^{iu'X_{n+1}} - \phi(u)}\mid F_n\bigr) = 0.

Thus,

    E(Y_{n+1}\mid F_n) = \frac{1}{(n+1)^2 h_{n+1}^m}\int\Bigl|\sum_{j=1}^{n}\bigl(e^{iu'X_j} - \phi(u)\bigr)\frac{k(h_j u)}{k(h_{n+1}u)}\Bigr|^2 |k(h_{n+1}u)|\, du + \frac{1}{(n+1)^2 h_{n+1}^m}\int\bigl(1 - |\phi(u)|^2\bigr)|k(h_{n+1}u)|\, du.

But 1 - |\phi(u)|^2 \le 1 and |k(u)| \le 1, since we assume K(u) is a density function, and |k(h_{n+1}u)| \ge |k(h_n u)|. Thus

    E(Y_{n+1}\mid F_n) \le \frac{1}{n^2 h_{n+1}^m}\int\Bigl|\sum_{j=1}^{n}\bigl(e^{iu'X_j} - \phi(u)\bigr)\frac{k(h_j u)}{k(h_n u)}\Bigr|^2 |k(h_n u)|\, du + \frac{1}{n^2 h_{n+1}^{2m}}\int |k(u)|\, du.

Adding and subtracting Y_n we obtain

    E(Y_{n+1}\mid F_n) \le Y_n + \frac{1}{n^2}\Bigl(\frac{1}{h_{n+1}^m} - \frac{1}{h_n^m}\Bigr)\int\Bigl|\sum_{j=1}^{n}\bigl(e^{iu'X_j} - \phi(u)\bigr)\frac{k(h_j u)}{k(h_n u)}\Bigr|^2 |k(h_n u)|\, du + \frac{1}{n^2 h_{n+1}^{2m}}\int |k(u)|\, du.

Let,

    (4.2.11)    W_n = \frac{1}{n^2 h_{n+1}^{2m}}\int |k(u)|\, du

and

    (4.2.12)    U_n = \frac{1}{n^2}\Bigl(\frac{1}{h_{n+1}^m} - \frac{1}{h_n^m}\Bigr)\int\Bigl|\sum_{j=1}^{n}\bigl(e^{iu'X_j} - \phi(u)\bigr)\frac{k(h_j u)}{k(h_n u)}\Bigr|^2 |k(h_n u)|\, du.

Note that W_n is not a random variable, hence E W_n = W_n, and, using condition (4.2.6), we have

    (4.2.13)    \sum_{n=1}^{\infty} E|W_n| = \sum_{n=1}^{\infty}|W_n| < \infty.

Thus, in order to use Lemma 4.2.1 we need to show that \sum_{n=1}^{\infty} E|U_n| < \infty. From (4.2.12) we have, on taking expectations,

    (4.2.14)    E|U_n| \le \frac{1}{n h_n^{2m}}\Bigl(\Bigl(\frac{h_n}{h_{n+1}}\Bigr)^m - 1\Bigr)\int |k(u)|\, du.

By using the fact that (1 - t^m) = \bigl(\sum_{\ell=0}^{m-1} t^\ell\bigr)(1 - t) and using the condition that h_n/h_{n+1} \to 1 as n \to \infty, Van Ryzin (1969) shows that the upper bound in (4.2.14) is asymptotically equivalent to

    \frac{m}{n h_n^{2m-1}}\Bigl|\frac{1}{h_{n+1}} - \frac{1}{h_n}\Bigr|\int |k(u)|\, du.

Hence, using condition (4.2.7), \sum_{n=1}^{\infty} E|U_n| < \infty. Thus conditions (iii) and (iv) of Lemma 4.2.1 are satisfied, where Y'_n = U_n + W_n. The conditions of Lemma 4.2.1 are satisfied for Y_n and so Y_n tends to zero almost everywhere. Thus,

    \sup_x |\bar{f}_{n+1}(x) - E\bar{f}_{n+1}(x)| \to 0   a.s. as n \to \infty.

Now, using the definition of \bar{f}_n(x) and noting \int K(u)\, du = 1, we have that

    \sup_x |E\bar{f}_n(x) - f(x)| = \sup_x \Bigl|\int\frac{1}{n}\sum_{j=1}^{n}K_j(u)\bigl(f(x-u) - f(x)\bigr)\, du\Bigr|,

where K_j(u) = \frac{1}{h_j^m}K\bigl(\frac{u}{h_j}\bigr). Then the right hand side is less than

    (4.2.15)    \sup_x \int\frac{1}{n}\sum_{j=1}^{n}K_j(u)\,|f(x-u) - f(x)|\, du,

and splitting the range of integration into the two sets \|u\| \le \delta and \|u\| > \delta we have that (4.2.15) is less than

    \sup_x\sup_{\|u\|\le\delta}|f(x-u) - f(x)|\int\frac{1}{n}\sum_{j=1}^{n}K_j(u)\, du + 2\sup_x f(x)\,\frac{1}{n}\sum_{j=1}^{n}\int_{\|u\|>\delta}K_j(u)\, du.

Since f(x) is uniformly continuous, the first term can be made arbitrarily small by choosing \delta sufficiently small. In the second term, notice that for this fixed \delta,

    \int_{\|u\|>\delta}K_j(u)\, du \to 0   as  j \to \infty,

at all points x. Hence the Cesàro sum approaches zero and the proof is complete.
Example 4.2.1

Consider the \{h_n\} sequence given by

    h_n = \frac{1}{(\log n)^{1/m}}.

This sequence satisfies conditions (2.1.4) and in addition \lim_{n\to\infty} h_n/h_{n+1} = 1. Also

    \sum_{n=1}^{\infty}\frac{1}{n^2 h_n^m} = \sum_{n=1}^{\infty}\frac{\log n}{n^2} \le \sum_{n=1}^{\infty}\frac{1}{n^{2-r}},   0 < r < 1,

(since \log n \le n^r for large n), which is finite, so that condition (4.2.2) is also satisfied. Hence the sequence h_n = (\log n)^{-1/m} satisfies the conditions of Theorem 4.2.2.

Let us now evaluate the constant we have called v_0, defined by (4.2.3). From the definition of v_0,

    v_0 = \lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n}\frac{h_n^m}{h_j^m} = \lim_{n\to\infty}\frac{1}{n\log n}\sum_{j=1}^{n}\log j.

Let y = \log x, so that

    \int_{1}^{n+1}\log x\, dx \ge \sum_{j=1}^{n}\log j \ge \int_{2}^{n+1}\log(x-1)\, dx.

That is, on integration by parts,

    (n+1)\log(n+1) - n \ge \sum_{j=1}^{n}\log j \ge (n+1)\log n - (n-1) - \log n.

Hence, using these inequalities we see that

    \lim_{n\to\infty}\frac{1}{n\log n}\sum_{j=1}^{n}\log j = 1.

That is, for the sequence h_n = (\log n)^{-1/m} the conditions of Theorem 4.2.2 are satisfied and v_0 = 1.

Example 4.2.2

Consider the \{h_n\} sequence defined by

    h_n = n^{-r/m},   0 < r < 1/2.

This sequence satisfies (2.1.4) and \lim_{n\to\infty} h_n/h_{n+1} = 1. Also we have that

    \sum_{n=1}^{\infty}\frac{1}{n^2 h_n^m} = \sum_{n=1}^{\infty}\frac{1}{n^{2-r}} < \infty,

so this sequence satisfies the conditions of Theorem 4.2.2. In addition, using a method similar to that in Example 4.2.1, we can show that

    v_0 = \lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n}\frac{h_n^m}{h_j^m} = \frac{1}{r+1}   (0 < r < 1/2).

Now clearly n h_n^{2m} = n^{1-2r} \to \infty as n \to \infty, and

    \sum_{n=1}^{\infty}\frac{1}{(n h_n^m)^2} = \sum_{n=1}^{\infty}\frac{1}{n^{2-2r}} < \infty   since  0 < r < 1/2.

But,

    \sum_{n=1}^{\infty}\frac{1}{n h_n^{2m-1}}\Bigl|\frac{1}{h_{n+1}} - \frac{1}{h_n}\Bigr| = \sum_{n=1}^{\infty}\frac{1}{n h_n^{2m-1}}\bigl((n+1)^{r/m} - n^{r/m}\bigr) = \sum_{n=1}^{\infty}\frac{1}{n^{1-2r}}\Bigl(\bigl(1 + \tfrac{1}{n}\bigr)^{r/m} - 1\Bigr),

and

    \bigl(1 + \tfrac{1}{n}\bigr)^{r/m} - 1 \le \frac{r}{m}\cdot\frac{1}{n}   since   \frac{r}{m} < 1.

Thus,

    \sum_{n=1}^{\infty}\frac{1}{n^{1-2r}}\Bigl(\bigl(1 + \tfrac{1}{n}\bigr)^{r/m} - 1\Bigr) \le \frac{r}{m}\sum_{n=1}^{\infty}\frac{1}{n^{2-2r}} < \infty   since  0 < r < 1/2.

Hence, the sequence h_n = n^{-r/m} also satisfies the conditions of Theorem 4.2.3.
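As a quick numerical check of the limits computed in these two examples (purely illustrative; not part of the dissertation, and using log(j+1) in place of log j only to avoid dividing by zero at j = 1), one can evaluate the finite-n average (1/n) \sum_j (h_n/h_j)^m directly:

```python
import numpy as np

def v0_finite_n(bandwidth, n, m=1):
    """Finite-n value of (1/n) * sum_j (h_n / h_j)^m, whose limit is v_0 of (4.2.3)."""
    j = np.arange(1, n + 1)
    h = bandwidth(j)
    return np.mean((h[-1] / h) ** m)

# h_n = n^{-r/m} with r = 0.2, m = 1: the limit should be 1/(r+1) = 0.8333...
print(v0_finite_n(lambda j: j ** -0.2, n=200_000))
# h_n = (log n)^{-1/m}: the limit should be 1 (approached slowly)
print(v0_finite_n(lambda j: 1 / np.log(j + 1), n=200_000))
```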
We will now return to the case when m = 1 and consider a "naive" sequential procedure for the Yamato estimator.

4.3 The Sequential Procedure:

We will now consider \bar{f}_n(x) defined as in (4.1.1) but with m = 1, that is

    (4.3.1)    \bar{f}_n(x) = \frac{1}{n}\sum_{j=1}^{n}\frac{1}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr).

In section 4.2, Theorems 4.2.2 and 4.2.3 show that \bar{f}_n(x) is both pointwise and uniformly strongly consistent under appropriate conditions, so that, as in the case of Parzen type estimators, a "naive" sequential procedure can be defined. Again the procedure consists of taking successive samples of size M consisting of independently and identically distributed random variables from a population with density function f(x). \bar{V}_n(x) is defined by (4.1.5). Since we can write, using a modification of (4.1.2),

    \bar{f}_{nM}(x) = \frac{n-1}{n}\bar{f}_{(n-1)M}(x) + \frac{1}{nM}\sum_{j=(n-1)M+1}^{nM}\frac{1}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr),

we see that \bar{V}_n(x) can be written as

    \bar{V}_n(x) = \bar{f}_{nM}(x) - \bar{f}_{(n-1)M}(x) = \frac{n-1}{n}\bar{f}_{(n-1)M}(x) - \bar{f}_{(n-1)M}(x) + \frac{1}{nM}\sum_{j=(n-1)M+1}^{nM}\frac{1}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr),

so that

    (4.3.2)    \bar{V}_n(x) = \frac{1}{nM}\sum_{j=(n-1)M+1}^{nM}\frac{1}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr) - \frac{1}{n}\bar{f}_{(n-1)M}(x).

Formally, the stopping rule we will consider is defined by

    (4.3.3)    N_1(\varepsilon, M) = first n such that |\bar{V}_n(x)| \le \varepsilon, for given \varepsilon > 0;
               N_1(\varepsilon, M) = \infty if no such n exists.

We will assume that conditions (2.1.3) and (2.1.4) hold for K(u) and the sequence \{h_n\} respectively. Paralleling our previous discussion for the Parzen-type estimators \hat{f}_n(x), we now discuss some of the properties of \bar{f}_{N_1}(x) and N_1(\varepsilon, M).
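A minimal computational sketch of the procedure just defined may be helpful; it is illustrative only (the Gaussian kernel and the bandwidth h_j = j^{-1/5} are assumed choices), and it works directly from the batch form (4.3.2).

```python
import numpy as np

def yamato_batch_update(f_prev, new_batch, start_index, M, x,
                        bandwidth=lambda j: j ** (-1 / 5)):
    """One stage of the 'naive' Yamato procedure at a point x.

    f_prev is f_bar_{(n-1)M}(x); new_batch holds the M observations
    X_{(n-1)M+1}, ..., X_{nM}.  Returns (f_bar_{nM}(x), V_bar_n(x)) as in (4.3.2)."""
    n_total = start_index + M                    # nM
    increment = 0.0
    for offset, obs in enumerate(new_batch, start=1):
        j = start_index + offset                 # global index of the observation
        h = bandwidth(j)
        increment += np.exp(-0.5 * ((x - obs) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    v_n = increment / n_total - (M / n_total) * f_prev   # V_bar_n(x), cf. (4.3.2)
    return f_prev + v_n, v_n

# Sequential use: stop the first time |V_n(x)| <= eps (the first batch only initializes).
rng, M, eps, x = np.random.default_rng(3), 50, 0.002, 0.0
f_bar, n = 0.0, 0
while True:
    f_bar, v = yamato_batch_update(f_bar, rng.standard_normal(M), n * M, M, x)
    n += 1
    if n > 1 and abs(v) <= eps:
        break
print(f_bar, n * M)
```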
4.4 Properties:

Lemmas 4.4.1 and 4.4.2 correspond to Lemmas 2.2.1 and 2.2.2 and give elementary properties of \bar{V}_n(x) that allow its use in the stopping rule (4.3.3).

Lemma 4.4.1:

As n \to \infty, if K(u) and \{h_n\} satisfy conditions (2.1.3) and (2.1.4) and if h_n is monotone decreasing, then

(a) |\bar{V}_n(x)| \to 0 in probability if x is a continuity point of f(x), and
(b) \sup_x |\bar{V}_n(x)| \to 0 in probability if f(x) is uniformly continuous.

Proof:

Yamato (1972) shows that, given the above conditions, (a) \bar{f}_n(x) \to f(x) in probability and (b) \sup_x|\bar{f}_n(x) - f(x)| \to 0 in probability. Thus from the definition of \bar{V}_n(x) we have that |\bar{V}_n(x)| = |\bar{f}_{nM}(x) - \bar{f}_{(n-1)M}(x)| is a Cauchy difference in probability and part (a) follows. Similarly, (b) \sup_x|\bar{V}_n(x)| = \sup_x|\bar{f}_{nM}(x) - \bar{f}_{(n-1)M}(x)|; this is also a Cauchy difference under the given conditions. Thus the lemma is proved.

Lemma 4.4.2

If K(u) and the sequence \{h_n\} satisfy the conditions of Theorem 4.2.2, then \bar{V}_n(x) \to 0 with probability one if x is a continuity point of f(x).

Proof:

Under the conditions stated, \bar{f}_n(x) is a Cauchy sequence with probability one, so the result follows.

Lemma 4.4.3

As n \to \infty, if K(u) and \{h_n\} satisfy the conditions of Theorem 4.2.3, then \sup_x|\bar{V}_n(x)| \to 0 with probability one if f(x) is uniformly continuous.

Proof:

The result follows since under the conditions stated \bar{f}_n(x) is uniformly Cauchy.

In order to examine the properties of the stopping rule we must first study the properties of \bar{V}_n(x). We begin an examination of the properties of \bar{V}_n(x) by examining its variance.

Lemma 4.4.4

If K(u) and \{h_n\} satisfy conditions (2.1.3) and (2.1.4), and if in addition h_n is monotone decreasing, then we can write

    \bar{V}_n(x) = \bar{A}_n(x) + \bar{B}_n(x)

and

    \lim_{n\to\infty} n^2 M h_{nM}\,\mathrm{Var}\,\bar{V}_n(x) = \lim_{n\to\infty} n^2 M h_{nM}\,\mathrm{Var}\,\bar{B}_n(x) = f(x)\int_{-\infty}^{\infty}K^2(u)\, du.

Proof:

From equation (4.3.2) we have that

    \bar{A}_n(x) = -\frac{1}{n}\bar{f}_{(n-1)M}(x)   and   \bar{B}_n(x) = \frac{1}{nM}\sum_{j=(n-1)M+1}^{nM}\frac{1}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr).

Further, since \bar{A}_n(x) depends only on X_1, X_2, ..., X_{(n-1)M} and \bar{B}_n(x) only on X_{(n-1)M+1}, ..., X_{nM}, the two terms are independent. Hence we can write

    (4.4.1)    \mathrm{Var}\,\bar{V}_n(x) = \frac{1}{n^2}\mathrm{Var}\,\bar{f}_{(n-1)M}(x) + \frac{1}{(nM)^2}\sum_{j=(n-1)M+1}^{nM}\frac{1}{h_j^2}\mathrm{Var}\,K\Bigl(\frac{x - X_j}{h_j}\Bigr).

In the last expression on the right hand side of (4.4.1) we notice that (n-1)M + 1 \le j \le nM, so that j \to \infty as n \to \infty, and so, using Parzen (1962),

    \lim_{n\to\infty}\frac{1}{h_j}\mathrm{Var}\,K\Bigl(\frac{x - X_j}{h_j}\Bigr) = f(x)\int_{-\infty}^{\infty}K^2(u)\, du,   (n-1)M < j \le nM.

But, by assumption, \lim_{n\to\infty} h_{nM}/h_j = 1 for (n-1)M < j \le nM, so that

    \lim_{n\to\infty}\frac{1}{M}\sum_{j=(n-1)M+1}^{nM}\frac{h_{nM}}{h_j^2}\mathrm{Var}\,K\Bigl(\frac{x - X_j}{h_j}\Bigr) = f(x)\int_{-\infty}^{\infty}K^2(u)\, du.

That is,

    \lim_{n\to\infty} n^2 M h_{nM}\,\mathrm{Var}\,\bar{B}_n(x) = f(x)\int_{-\infty}^{\infty}K^2(u)\, du.

Further, Yamato (1972) shows that \lim_{n\to\infty}\mathrm{Var}\,\bar{f}_n(x) = 0, so that

    \lim_{n\to\infty} n^2 M h_{nM}\,\mathrm{Var}\,\bar{A}_n(x) = \lim_{n\to\infty} M h_{nM}\,\mathrm{Var}\,\bar{f}_{(n-1)M}(x) = 0.

Hence,

    \lim_{n\to\infty} n^2 M h_{nM}\,\mathrm{Var}\,\bar{V}_n(x) = 0 + \lim_{n\to\infty} n^2 M h_{nM}\,\mathrm{Var}\,\bar{B}_n(x).

This proves the lemma.
Remarks:

(1) As in the Parzen case, \bar{B}_n(x) is a finite sum, dependent only on the final sample, which consists of M independent random variables.

(2) \bar{B}_n(x) is always positive. If the density function f(x) were known, then under appropriate conditions on the kernel K(u) the exact distribution of \bar{B}_n(x) could be found (at least in theory). It would be more complicated than in the Parzen case since we no longer have M identically distributed random variables.

(3) n\bar{A}_n(x) = -\bar{f}_{(n-1)M}(x) will have an asymptotic normal distribution, since \bar{f}_{(n-1)M}(x) is distributed normally asymptotically. However,

    \mathrm{Var}\,\bar{A}_n(x) \sim \frac{v_0}{n^3 M h_{nM}}\, f(x)\int_{-\infty}^{\infty}K^2(u)\, du.

(4) Some comparison between the stopping rules N and N_1 can be made. At this point we can compare the variances of \bar{V}_n(x) and \hat{V}_n(x) using Lemmas 4.4.4 and 2.2.5, and we show that the variance of \bar{V}_n(x) is always less than or equal to the variance of \hat{V}_n(x).

Lemma 4.4.5

Given the conditions of Lemmas 4.4.4 and 2.2.4, there exists a constant v, defined by (2.2.2), such that

    \lim_{n\to\infty}\frac{\mathrm{Var}\,\bar{V}_n(x)}{\mathrm{Var}\,\hat{V}_n(x)} = \frac{1}{v} \le 1.

Proof:

From Lemmas 4.4.4 and 2.2.4 we get that

    \lim_{n\to\infty}\frac{\mathrm{Var}\,\bar{V}_n(x)}{\mathrm{Var}\,\hat{V}_n(x)} = \lim_{n\to\infty}\frac{n^2 M h_{nM}\,\mathrm{Var}\,\bar{V}_n(x)}{n^2 M h_{nM}\,\mathrm{Var}\,\hat{V}_n(x)} = \frac{f(x)\int_{-\infty}^{\infty}K^2(u)\, du}{v\, f(x)\int_{-\infty}^{\infty}K^2(u)\, du} = \frac{1}{v}.

But in Lemma 2.2.4 we saw that v \ge 1, so the result follows.

Example 4.4.1:

Let h_n = \beta n^{-\alpha}, 0 < \alpha < 1. In Example 2.2.1 we showed that v = 1 + \alpha, while in Example 4.2.2 (with m = 1) we showed that

    \lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n}\frac{h_n}{h_j} = \frac{1}{1+\alpha},

and we note that this last quantity is exactly the result of Lemma 4.4.5. We thus have a link between Parzen type and Yamato type estimators. Notice that in the variance of \hat{V}_n(x) the \hat{A}_n(x) term gives the term \alpha f(x)\int K^2(u)\, du while \hat{B}_n(x) gives the term f(x)\int K^2(u)\, du, whereas for \bar{V}_n(x) the \bar{A}_n(x) term leads to a zero.
4.5 An Alternative Procedure:

In this section we will examine briefly an intuitively appealing, but unfortunately impractical, stopping rule. We have seen that

    \bar{V}_n(x) = \frac{n-1}{n}\bar{f}_{(n-1)M}(x) - \bar{f}_{(n-1)M}(x) + \frac{1}{nM}\sum_{j=(n-1)M+1}^{nM}\frac{1}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr).

Notice that if we should define

    \bar{V}'_n(x) = \bar{f}_{nM}(x) - \frac{n-1}{n}\bar{f}_{(n-1)M}(x),

then on simplification

    \bar{V}'_n(x) = \frac{1}{nM}\sum_{j=(n-1)M+1}^{nM}\frac{1}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr),

which depends only on the last sample of M observations. That is, we could then define

    N_2(\varepsilon, M) = first n such that \bar{V}'_n(x) \le \varepsilon, given \varepsilon > 0;
    N_2(\varepsilon, M) = \infty if no such n exists.

Lemma 4.5.1 shows that this stopping rule is unsuitable for use as a sequential stopping rule.

Lemma 4.5.1

If K(u) and \{h_n\} satisfy (2.1.3) and (2.1.4), then N_2(\varepsilon, M) is bounded above. Further, this bound does not depend on the convergence properties of \bar{f}_n(x) (thus making N_2(\varepsilon, M) an unsuitable stopping rule).

Proof:

By definition,

    \bar{V}'_n(x) = \frac{1}{nM}\sum_{j=(n-1)M+1}^{nM}\frac{1}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr) \le \frac{1}{n h_{nM}}\sup_u K(u).

Thus N_2(\varepsilon, M) is always less than or equal to the smallest n that satisfies

    \frac{\sup_u K(u)}{n h_{nM}} \le \varepsilon,

and this value does not depend on the convergence properties of \bar{f}_n(x).
4.6 Finiteness of EN_1:

We now return to the stopping rule defined by (4.3.3) and will show that EN_1 < \infty. First, we will state two lemmas without proof, the proofs being similar to those of Lemmas 2.3.1 and 2.3.2.

Lemma 4.6.1

For arbitrary t > 0,

    (4.6.1)    P(\bar{V}_n(x) > \varepsilon) \le e^{-t\,nM h_{nM}\varepsilon}\, E e^{t S_n(x)},

where

    (4.6.2)    S_n(x) = \sum_{j=(n-1)M+1}^{nM}\frac{h_{nM}}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr) - \frac{1}{n-1}\sum_{j=1}^{(n-1)M}\frac{h_{nM}}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr).

Lemma 4.6.2

For arbitrary t > 0,

    (4.6.3)    P(\bar{V}_n(x) < -\varepsilon) \le e^{-t\,nM h_{nM}\varepsilon}\, E e^{-t S_n(x)},

where S_n(x) is defined by (4.6.2).

Since t is arbitrary we will consider t = 1 only in the following lemmas. We also note at this stage that we may write, by using (4.3.2),

    (4.6.4)    S_n(x) = nM h_{nM}\,\bar{V}_n(x) = \sum_{j=(n-1)M+1}^{nM}\frac{h_{nM}}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr) - M h_{nM}\,\bar{f}_{(n-1)M}(x).

Lemma 4.6.3

If \{h_n\} is a monotone decreasing sequence of real numbers satisfying conditions (2.1.4), (2.1.5) and condition (A1), if K(u) satisfies (2.1.3), and if in addition v_0 is defined by (4.2.3), then

    E e^{S_n(x)} \le e^{ML},   where  L = \sup_u K(u).

Proof:

From (4.6.4) we have that

    E e^{S_n(x)} = E\exp\Bigl(\sum_{j=(n-1)M+1}^{nM}\frac{h_{nM}}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr) - \frac{1}{n-1}\sum_{j=1}^{(n-1)M}\frac{h_{nM}}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr)\Bigr).

But, since h_{nM}/h_j \le 1 for j \le nM, we get that

    (4.6.5)    \exp\Bigl(\sum_{j=(n-1)M+1}^{nM}\frac{h_{nM}}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr)\Bigr) \le e^{ML}.

Further, since K(u) \ge 0 we have that

    (4.6.6)    \exp\Bigl(-\frac{1}{n-1}\sum_{j=1}^{(n-1)M}\frac{h_{nM}}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr)\Bigr) \le 1.

Thus, using inequalities (4.6.5) and (4.6.6), we have that

    E e^{S_n(x)} \le e^{ML}\,E(1) = e^{ML}.

Lemma 4.6.4

Under the conditions of Lemma 4.6.3,

    E e^{-S_n(x)} \le e^{L}.

Proof:

From (4.6.4) we have (the two factors below being independent, since they involve disjoint sets of the X's)

    E e^{-S_n(x)} = E\exp\Bigl(-\sum_{j=(n-1)M+1}^{nM}\frac{h_{nM}}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr)\Bigr)\, E\exp\Bigl(\frac{1}{n-1}\sum_{j=1}^{(n-1)M}\frac{h_{nM}}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr)\Bigr).

Since K(u) \ge 0 for all u, the first expectation on the right hand side is less than one. Now,

    \frac{1}{n-1}\sum_{j=1}^{(n-1)M}\frac{h_{nM}}{h_j}K\Bigl(\frac{x - X_j}{h_j}\Bigr) \le L\,\frac{1}{n-1}\sum_{j=1}^{(n-1)M}\frac{h_{nM}}{h_j} \le L,

since \frac{1}{n-1}\sum_{j=1}^{(n-1)M} h_{nM}/h_j \le 1. Thus we have that

    E e^{-S_n(x)} \le 1 \cdot e^{L} = e^{L},

as required.
Theorem 4.6.5:

If K(u) and the sequence \{h_n\} satisfy the conditions of Lemma 4.6.3, then EN_1 < \infty.

Proof:

Using a method similar to that used in the Parzen case we can show

    P(N_1 > nM) \le P(|\bar{V}_n(x)| > \varepsilon),

and, using Lemma 4.6.2 together with Lemmas 4.6.3 and 4.6.4, we get

    P(N_1 > nM) \le C e^{-nM h_{nM}\varepsilon},   where  C = e^{ML} + e^{L}.

But,

    EN_1 \le M\sum_{k=0}^{\infty} P(N_1 > kM) \le M\sum_{k=0}^{\infty} C e^{-kM h_{kM}\varepsilon},

which is finite since \{h_n\} satisfies condition (A1). Thus the proof of the theorem is complete.

Lemma 4.6.6

If K(u) and the sequence \{h_n\} satisfy the conditions of Lemma 4.6.3, then Var N_1 < \infty.

Proof:

It will be sufficient to prove EN_1^2 < \infty, since we have already shown EN_1 < \infty. We have from the definition of N_1(\varepsilon, M) that

    (4.6.7)    P(N_1 > nM) \le P(|\bar{V}_n(x)| > \varepsilon),

so that, using Lemmas 4.6.3 and 4.6.4, we get that P(N_1 > nM) \le C e^{-nM h_{nM}\varepsilon}. Thus it follows that, since \{h_n\} satisfies condition (A1),

    EN_1^2 \le M^2\sum_{n=1}^{\infty} C n^2 e^{-nM h_{nM}\varepsilon} < \infty,

and the proof is complete.

Observations:

1. As \varepsilon \to 0 the upper bound on EN_1 becomes infinite. This is what we would expect given the form of the stopping rule.

2. Given M, \varepsilon and the form of h_n we can calculate the upper bound of EN_1.

3. Notice that the Yamato estimator leads to a much simpler set of conditions to ensure the finiteness of the expected sample size than does the Parzen estimator. The main reason for this is the simpler form of equation (4.3.2) compared to that of equation (2.2.4A). This is one more advantage of working with \bar{f}_n(x) compared to \hat{f}_n(x) when we wish to use sequential methods.
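Observation 2 can be made concrete with a few lines of code: under the bound P(N_1 > nM) \le C e^{-nM h_{nM}\varepsilon} used above, the expected total sample size is dominated by a computable series. The sketch below is illustrative only; the choices h_n = n^{-1/5}, sup K = (2\pi)^{-1/2} (Gaussian kernel), the capping of each summand at one, and the truncation of the series are all assumptions.

```python
import math

def en1_upper_bound(eps, M, L, bandwidth, terms=100_000):
    """Numerically evaluate M * sum_{n>=0} min(1, C * exp(-n M h_{nM} eps)),
    the style of bound used for EN_1 in Theorem 4.6.5, with C = e^{M L} + e^{L}."""
    C = math.exp(M * L) + math.exp(L)
    total = 0.0
    for n in range(terms):
        h = bandwidth(max(n * M, 1))                       # h_{nM}; guard n = 0
        total += min(1.0, C * math.exp(-n * M * h * eps))  # a probability is at most 1
    return M * total

L = 1 / math.sqrt(2 * math.pi)   # sup of the Gaussian kernel
print(en1_upper_bound(eps=0.01, M=20, L=L, bandwidth=lambda n: n ** (-1 / 5)))
```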
We now prove the closure of the sequential procedure.

Lemma 4.6.7 (Closure)

If K(u) and \{h_n\} satisfy the conditions of Lemma 4.6.3, then P(N_1 < \infty) = 1.

Proof:

Under the stated conditions we have that

    P(N_1 > nM) \le C e^{-nM h_{nM}\varepsilon} \to 0,

so the result is immediate.

4.7. Divergence of N_1 as \varepsilon \to 0:

We will now examine the stopping rule's behaviour when we let \varepsilon \to 0. As for the stopping rule N(\varepsilon, M) defined in chapter 2, we would like to show that

    \lim_{\varepsilon\to 0} N_1(\varepsilon, M) = \infty   in probability and with probability one,

and, as in that case, it can be shown that this result will not be true for all kernels. We state a lemma (that corresponds to Lemma 2.6.1) without proof.

Lemma 4.7.1

Let A_1 = \{|\bar{V}_j(x)| = 0 for some j\}; then as \varepsilon \to 0, N_1(\varepsilon, M) \to \infty with probability one on the complement of A_1.

Again we note that P(A_1) may be greater than zero (the uniform kernel again provides an example) but that a class of kernels, K_0, exists for which P(\bar{V}_n(x) = 0) = 0. In fact K \in K_0 if it satisfies the conditions of Lemma 2.7.3; that is, if K(u) satisfies (2.1.3) and

(i)  K(u) is differentiable for all u, and
(ii) K'(u) is continuous and non-zero at all but a finite number of values of u.

Thus Lemmas 2.6.2 and 2.6.3 will hold if we replace \hat{V}_n(x) by \bar{V}_n(x) and N by N_1 in their statements. Similarly, Theorem 2.6.4 will hold with \hat{V}_n(x), \hat{f}_n(x) and N(\varepsilon, M) replaced by \bar{V}_n(x), \bar{f}_n(x) and N_1(\varepsilon, M) respectively. That is, we can show, under the conditions stated in the lemmas and theorem, that

    \lim_{\varepsilon\to 0} N_1(\varepsilon, M) = \infty   in probability and with probability one.

We can also redefine N_1(\varepsilon, M) in the same manner that N(\varepsilon, M) was redefined, in order to ensure that as \varepsilon \to 0, N'_1 \to \infty at least in probability. Thus we define

    N'_1(\varepsilon, M) = first n such that |\bar{V}_n(x)| \le \varepsilon and |\bar{V}_n(x)| > 0;
    N'_1(\varepsilon, M) = \infty if no such n exists.

In this case, by using a similar method to that used in Theorem 2.6.6, we can show EN'_1 < \infty.

4.8. Mean Square Error:

One of the properties we wish to look at for \bar{f}_{N_1}(x) is mean square error. We wish to show that as \varepsilon \to 0 the mean square error tends to zero. This is done in Theorem 4.8.1.

Theorem 4.8.1:

If K \in K_0, then

    \lim_{\varepsilon\to 0} E\bigl[\bar{f}_{N_1}(x) - f(x)\bigr]^2 = 0,

provided that the conditions of Theorem 4.2.2 are satisfied.

Proof:

If K \in K_0, then by our discussion in section 4.7 we know that \lim_{\varepsilon\to 0} N_1(\varepsilon, M) = \infty with probability one. But by Theorem 4.2.2, \bar{f}_n(x) \to f(x) a.s., and the result follows from Theorem 3.3.4.
CHAPTER 5

COMPLEMENTS

5.1. Introduction:

In this chapter we give some results that complement those in chapters two through four. We also try to give some avenues for future research.

5.2 Mean and Variance:

By definition, the mean of a random variable with density f(x) is \int_{-\infty}^{\infty} x f(x)\, dx. Thus, for a given set of observations x_1, x_2, ..., x_n, let us calculate the mean of \bar{f}_n(x) as

    \mu(x_1, ..., x_n) = \int_{-\infty}^{\infty} x\,\bar{f}_n(x)\, dx.

Note that in this chapter K(u) is always assumed to satisfy (2.1.3) and the sequence \{h_n\} to satisfy (2.1.4) and (2.1.5).

Lemma 5.2.1

For the estimator \bar{f}_n(x) defined by (4.1.1),

    \mu(x_1, ..., x_n) = \bar{x} + \mu_K\,\frac{1}{n}\sum_{j=1}^{n} h_j,

where \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i and \mu_K = \int_{-\infty}^{\infty} u\,K(u)\, du.

Proof:

By the definition of \bar{f}_n(x), and since x_1, x_2, ..., x_n is a given sample,

    \mu(x_1, ..., x_n) = \int_{-\infty}^{\infty} x\,\frac{1}{n}\sum_{j=1}^{n}\frac{1}{h_j}K\Bigl(\frac{x - x_j}{h_j}\Bigr)\, dx = \frac{1}{n}\sum_{j=1}^{n}\int_{-\infty}^{\infty}\frac{x}{h_j}K\Bigl(\frac{x - x_j}{h_j}\Bigr)\, dx.

Transforming by u = (x - x_j)/h_j we get

    \mu(x_1, ..., x_n) = \frac{1}{n}\sum_{j=1}^{n}\int_{-\infty}^{\infty}(x_j + h_j u)\,K(u)\, du,

and the result follows immediately.
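Lemma 5.2.1 is easy to verify numerically by integrating the estimate on a grid. The sketch below is illustrative only; it assumes a Gaussian kernel (for which \mu_K = 0) and a simple h_j sequence, neither of which is prescribed by the dissertation.

```python
import numpy as np

def yamato_density(xs, data, h):
    """Values of f_bar_n on a grid xs: Gaussian kernel, bandwidth h[j] for observation j."""
    xs = xs[:, None]
    return np.mean(np.exp(-0.5 * ((xs - data) / h) ** 2) / (h * np.sqrt(2 * np.pi)), axis=1)

rng = np.random.default_rng(4)
data = rng.standard_normal(50)
h = np.arange(1, 51) ** (-1 / 5)

xs = np.linspace(-10, 10, 20001)
f_vals = yamato_density(xs, data, h)
numeric_mean = np.trapz(xs * f_vals, xs)      # int x * f_bar_n(x) dx, computed numerically
formula_mean = data.mean() + 0.0 * h.mean()   # x_bar + mu_K * (1/n) sum h_j, with mu_K = 0 here
print(numeric_mean, formula_mean)             # the two values should agree closely
```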
Corollary 5.2.2:

For Parzen-type estimators \hat{f}_n(x), h_j = h_n for j = 1, 2, ..., n, so that

    \mu(x_1, ..., x_n) = \bar{x} + \mu_K h_n.

Corollary 5.2.3:

If X_1, X_2, ..., X_n is a sample of independently and identically distributed random variables and if K(u) = K(-u), then

    E\,\mu(X_1, ..., X_n) = \mu,   where  \mu = E X_1.

Proof:

Since K(u) = K(-u), \mu_K = 0 and the result is immediate.

Comment:

For the case K(u) = K(-u) we have that \mu_K = 0, so that we can write \mu(x_1, ..., x_n) = \bar{x} (for both the Yamato- and Parzen-type estimators), and we have seen this is an unbiased estimate of \mu. This gives us a reason for assuming the kernel is symmetric about zero.
Corollary 5.2.4:
Proof:
Clear.
Now let us calculate the variance of \bar{f}_n(x) for a given sample:

    \sigma^2(x_1, ..., x_n) = \int_{-\infty}^{\infty} x^2\,\bar{f}_n(x)\, dx - \mu^2(x_1, ..., x_n).

Lemma 5.2.5

For Yamato's estimate \bar{f}_n(x), given x_1, ..., x_n,

    \sigma^2(x_1, ..., x_n) = s_n^2 + \mu_{2K}\,\frac{1}{n}\sum_{j=1}^{n} h_j^2 - \mu_K^2\Bigl(\frac{1}{n}\sum_{j=1}^{n} h_j\Bigr)^2 + 2\mu_K\Bigl(\frac{1}{n}\sum_{j=1}^{n} x_j h_j - \bar{x}\,\frac{1}{n}\sum_{j=1}^{n} h_j\Bigr),

where s_n^2 = \frac{1}{n}\sum_{j=1}^{n} x_j^2 - \bar{x}^2 and \mu_{2K} = \int_{-\infty}^{\infty} u^2 K(u)\, du.

Proof:

Let

    \mu_2(x_1, ..., x_n) = \int_{-\infty}^{\infty} x^2\,\bar{f}_n(x)\, dx = \frac{1}{n}\sum_{j=1}^{n}\int_{-\infty}^{\infty}(x_j + h_j u)^2 K(u)\, du = \frac{1}{n}\sum_{j=1}^{n}\bigl(x_j^2 + 2 x_j h_j \mu_K + h_j^2 \mu_{2K}\bigr),

after some simple calculations. Thus,

    \sigma^2(x_1, ..., x_n) = \mu_2(x_1, ..., x_n) - \mu^2(x_1, ..., x_n),

and substituting the result of Lemma 5.2.1 for \mu(x_1, ..., x_n) and collecting terms gives the lemma.

Corollary 5.2.6:

    \hat{\sigma}^2(x_1, ..., x_n) = \frac{1}{n}\sum_{j=1}^{n} x_j^2 - \bar{x}^2 + h_n^2\,\sigma_K^2,

where \sigma_K^2 = \mu_{2K} - \mu_K^2.

Corollary 5.2.7:

If X_1, X_2, ..., X_n is a sample of independently and identically distributed random variables with Var X_i = \sigma^2 < \infty, then

    E\,\sigma^2(X_1, ..., X_n) = \frac{n-1}{n}\,\sigma^2 + \mu_{2K}\,\frac{1}{n}\sum_{j=1}^{n} h_j^2 - \mu_K^2\Bigl(\frac{1}{n}\sum_{j=1}^{n} h_j\Bigr)^2,

    E\,\hat{\sigma}^2(X_1, ..., X_n) = \frac{n-1}{n}\,\sigma^2 + h_n^2\,\sigma_K^2,

and

    \lim_{n\to\infty} E\,\sigma^2(X_1, ..., X_n) = \lim_{n\to\infty} E\,\hat{\sigma}^2(X_1, ..., X_n) = \sigma^2 = \mathrm{Var}\,X_i.

5.3. Choice of \varepsilon:

The choice of \varepsilon is important in the use of our sequential procedure. We would like to choose \varepsilon in such a way as to link it to mean square error. An upper bound for \varepsilon is obtained through the following lemma.

Lemma 5.3.1

    E|\hat{V}_n(x)| \le \bigl[E(\hat{f}_{nM}(x) - f(x))^2\bigr]^{1/2} + \bigl[E(\hat{f}_{(n-1)M}(x) - f(x))^2\bigr]^{1/2}.

Proof:

From the definition of \hat{V}_n(x) we have

    |\hat{V}_n(x)| = |\hat{f}_{nM}(x) - \hat{f}_{(n-1)M}(x)| \le |\hat{f}_{nM}(x) - f(x)| + |\hat{f}_{(n-1)M}(x) - f(x)|,

so that, using the triangle inequality and taking expectations, the result follows by noting that for any distribution EY^2 \ge [E|Y|]^2.

Now, E|\hat{f}_n(x) - f(x)|^2 \to 0 as n \to \infty. If we require that E|\hat{f}_{nM}(x) - f(x)|^2 \le \gamma for some \gamma > 0, then it will certainly be sufficient to require that E|\hat{f}_{(n-1)M}(x) - f(x)|^2 \le \gamma also. Then,

    E|\hat{V}_n(x)| \le 2\gamma^{1/2}.

Thus we should choose \varepsilon \le 2\gamma^{1/2}.

One question that arises is: can we choose \varepsilon so that a specific value of the mean square error is not exceeded? To do this we would have to find a function g(\varepsilon) such that, as \varepsilon \to 0, g(\varepsilon) \to 0 and

    E|\hat{f}_{N_\varepsilon}(x) - f(x)|^2 \le g(\varepsilon).

Unfortunately, I was unable to obtain this bound and this remains an open question.
5.4. Choice of M:

Our sequential procedure depends on the size, M, of the successive samples which we observe. We assume M is constant. Intuitively, if we choose M too large then the final total sample size attained will tend to be too large, in the sense that many of the observations in the final sample may not be necessary. On the other hand, if M is too small, the number of samples of size M required to stop will tend to be large. Clearly, in a practical situation the size of M will thus depend on the cost of individual observations and the extra cost involved (if any) in using small samples.

Formally, we stop the first time |\hat{V}_n(x)| \le \varepsilon. We actually stop when |\hat{V}_n(x)| = \varepsilon_1 \le \varepsilon, and we would like to choose M so that \varepsilon_1 is approximately equal to \varepsilon. If \varepsilon_1 is very much different from \varepsilon it would indicate that either

(a) M is too large, or
(b) \hat{V}_n(x) is very sensitive to extra observations.

From the nature of \hat{V}_n(x) we would conclude that, except perhaps for small n, (a) is the case that applies. Thus one possible way of choosing M is to look at the effect of adding a sample of size M on the mean square error.
"
Parzen (1962) show that for large n
E If (x) - f(x)
n
2
I =0L I
~
)'
so that
a(
1
( '+/5 '+/5
n
H
(5.4.1)
= 4.
5
a[
1
].
'+/5 '+/5
U n
In chapter 2 we assumed M was a constant independent of n.
To
be consistent with that assumption we assume that at any stage, (5.4.l)
'+/5 •
- 4/ 5
should be made less than n
in which case M
~
'54 C, where C is
the constant of proportionality in (5.4.1) and C is given by.
106
~/S
2
K (Y)d Y)
•
Comments:

1. We are "forcing" M to be a constant by our choice of rate of decrease for equation (5.4.1). Clearly, we can choose to make (5.4.1) decrease at a different rate than n^{-9/5}, in which case M will be a function of n. The choice of M as a function of n, and its effect on the results in chapters two to four, is a question for future study.

2. In section 5.3 we mentioned that it should be possible to find a g(\varepsilon) such that E|\hat{f}_{N_\varepsilon}(x) - f(x)|^2 \le g(\varepsilon). Given g(\varepsilon) it would probably be more desirable to use this inequality to choose M. For example, we might choose M so as to decrease the mean square error by a specified percentage at each stage of the procedure. This is also a topic for future research.
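The relation between \varepsilon and the value \varepsilon_1 actually attained at stopping can also be explored by simulation. The sketch below is illustrative only (Gaussian kernel, h_n = n^{-1/5}, standard normal data, and the simulation sizes are all assumed choices); it records \varepsilon_1 = |V_n(x)| at the stopping stage for several values of M.

```python
import numpy as np

def parzen(x, data):
    """Parzen estimate with the illustrative bandwidth h_n = n^{-1/5}, Gaussian kernel."""
    n = len(data)
    h = n ** (-1 / 5)
    u = (x - data) / h
    return np.exp(-0.5 * u**2).sum() / (n * h * np.sqrt(2 * np.pi))

def eps1_at_stopping(rng, M, eps, x=0.0, max_batches=400):
    """Return eps_1 = |V_n(x)| observed at the stopping stage of the naive rule."""
    data = rng.standard_normal(M)
    prev = parzen(x, data)
    v = eps
    for _ in range(max_batches):
        data = np.concatenate([data, rng.standard_normal(M)])
        cur = parzen(x, data)
        v = abs(cur - prev)
        if v <= eps:
            break
        prev = cur
    return v

rng, eps = np.random.default_rng(5), 0.01
for M in (5, 25, 100):
    vals = [eps1_at_stopping(rng, M, eps) for _ in range(200)]
    print(M, np.mean(vals))   # larger M typically pushes eps_1 well below eps
```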
5.5. Global Stopping Rules:

This dissertation has been concerned only with pointwise stopping rules. It would be useful to also have stopping rules applicable to the global problem. Two possible quantities upon which a global stopping rule could be based are

    (i)   \sup_x |\hat{V}_n(x)|,
    (ii)  \int_{-\infty}^{\infty} |\hat{V}_n(x)|\, dx.

Since (i) is closely related to our \hat{V}_n(x) function of chapter 2, we will consider it. Lemma 2.2.1(b) shows that \sup_x|\hat{V}_n(x)| \to 0 in probability if f(x) is uniformly continuous. Lemma 5.5.1, which follows, gives an analogue of Lemma 2.2.2.
Lemma 5.5.1

Let K(u) be such that (2.1.3) is satisfied and let k(u) be defined by (4.2.5). Assume

    \int |k(u)|\, du < \infty,   \int |k(Ct) - k(t)|\, dt < \infty,

and that

    g(C) = \int |k(Ct) - k(t)|\, dt

is locally Lipschitz of order 1 at C = 1. Let \{h_n\} be a sequence such that (2.1.4) and (2.1.5) are satisfied and also that

    \sum_{n=1}^{\infty}\frac{1}{(n h_n)^2} < \infty,   \sum_{n=1}^{\infty}\frac{1}{n h_n}\Bigl|\frac{1}{h_{n+1}} - \frac{1}{h_n}\Bigr| < \infty,   \lim_{n\to\infty} n h_n^2 = \infty.

Then, if f(x) is uniformly continuous on R,

    \sup_x |\hat{f}_{nM}(x) - \hat{f}_{(n-1)M}(x)| \to 0   a.s.

Proof:

The conditions stated are sufficient to ensure that \hat{f}_n(x) is a uniform Cauchy sequence a.s. (Van Ryzin, 1969), so that the result follows.
Corresponding to definition (2.1.2) we can define our stopping rule as: given \varepsilon > 0,

    (5.5.1)    N_S(\varepsilon, M) = first n such that \sup_x|\hat{V}_n(x)| \le \varepsilon;
               N_S(\varepsilon, M) = \infty if no such n exists.

Lemma 5.5.2

If N is defined by (2.1.2) we have EN \le EN_S.

Proof:

From the definitions of N and N_S it is immediate that N \le N_S, which implies EN \le EN_S.

Remarks:

1. The result of this lemma is as expected. From the definition of N_S we intuitively expect to take more observations to satisfy the criterion. Also, from the nature of the problem we expect the result: if we wish to satisfy some criterion for all x, and not just a specific x, we would expect to increase the sample size.

2. The results we would now like to prove are such things as EN_S < \infty. This should be possible. In fact we should be able to parallel the work in both chapters two and three for the global case.
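A global version of the naive rule simply replaces the pointwise difference with a supremum over a grid of points; the sketch below mirrors (5.5.1) and is illustrative only — the grid, kernel, and bandwidth are assumptions, and the grid maximum only approximates the supremum over x.

```python
import numpy as np

def parzen_on_grid(grid, data):
    """Parzen estimates on a grid with h_n = n^{-1/5} and a Gaussian kernel."""
    n = len(data)
    h = n ** (-1 / 5)
    u = (grid[:, None] - data) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

def global_naive_rule(sample_batch, eps, M, grid, max_batches=300):
    """Stop the first time sup_x |f_{nM}(x) - f_{(n-1)M}(x)| <= eps (sup over the grid)."""
    data = np.asarray(sample_batch(M))
    prev = parzen_on_grid(grid, data)
    for _ in range(max_batches):
        data = np.concatenate([data, sample_batch(M)])
        cur = parzen_on_grid(grid, data)
        if np.max(np.abs(cur - prev)) <= eps:
            return cur, len(data)
        prev = cur
    return prev, len(data)

rng = np.random.default_rng(6)
grid = np.linspace(-4, 4, 401)
f_hat, N_S = global_naive_rule(lambda M: rng.standard_normal(M), eps=0.01, M=50, grid=grid)
print(N_S)
```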
5.6. Choice of h_n:

We saw in chapter 1, section 3, that one of the open problems in the use of kernel estimators is the choice of h_n from the available observations. In this section we will suggest a method for choosing h_n. The idea has only intuition as a basis and will be a topic for future research.

Assume X_1, X_2, ..., X_n is a sample of independently and identically distributed random variables with density function f(x). We will consider the estimate \hat{f}_n(x), and since we assume \int_{-\infty}^{\infty} K(u)\, du = 1, we have that \int_{-\infty}^{\infty}\hat{f}_n(x)\, dx = 1.

Suppose now that we order the sample obtained to get the order statistics -\infty < X_{(1)} < X_{(2)} < \cdots < X_{(n)} < \infty. If x is the point at which we wish to estimate f_n(x), then for some r, X_{(r)} \le x \le X_{(r+1)} (X_{(r)} may be -\infty and X_{(r+1)} may be +\infty). Further, we can write

    E\int_{X_{(r)}}^{X_{(r+1)}} f(x)\, dx = \frac{1}{n+1}.

Thus, since \hat{f}_n(x) is an estimate of f(x), let us estimate h_n by solving the equation

    (5.6.1)    \int_{X_{(r)}}^{X_{(r+1)}}\hat{f}_n(x)\, dx = \frac{1}{n+1}.

Since K(u) is known and since we can write (5.6.1) as

    \frac{1}{n h_n}\sum_{j=1}^{n}\int_{X_{(r)}}^{X_{(r+1)}} K\Bigl(\frac{x - X_j}{h_n}\Bigr)\, dx = \frac{1}{n+1},

we see that the only unknown is h_n for any given n. This corresponds to finding h_n for the pointwise problem.
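For a kernel whose integral over an interval is available in closed form, equation (5.6.1) can be solved numerically in one variable. The Python sketch below is illustrative only: the Gaussian kernel (whose interval mass is a difference of normal distribution functions), the root-finding bracket, and the assumption that x lies strictly inside the sample range are all choices made for the example.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def kernel_mass(a, b, centers, h):
    """(1/(n h)) * sum_j int_a^b K((x - X_j)/h) dx for a Gaussian K; equals
    (1/n) * sum_j [Phi((b - X_j)/h) - Phi((a - X_j)/h)]."""
    return np.mean(norm.cdf((b - centers) / h) - norm.cdf((a - centers) / h))

def solve_h_pointwise(data, x):
    """Solve (5.6.1): the mass of the Parzen estimate on (X_(r), X_(r+1)) equals 1/(n+1)."""
    data = np.sort(data)
    n = len(data)
    r = np.searchsorted(data, x)          # data[r-1] <= x <= data[r]
    a, b = data[r - 1], data[r]           # assumes x is strictly inside the sample range
    g = lambda h: kernel_mass(a, b, data, h) - 1.0 / (n + 1)
    return brentq(g, 1e-6, 1e3)           # sign change: mass ~ 1/n for tiny h, ~ 0 for huge h

rng = np.random.default_rng(7)
sample = rng.standard_normal(200)
print(solve_h_pointwise(sample, x=0.0))
```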
For the global problem, where we wish to find an h_n that can be used at all points x, we solve

    \int_{X_{(1)}}^{X_{(n)}}\hat{f}_n(x)\, dx = \frac{n-1}{n+1},

where (n-1)/(n+1) is chosen since

    E\int_{X_{(1)}}^{X_{(n)}} f(x)\, dx = \frac{n-1}{n+1} = 1 - \frac{2}{n+1}.

Clearly there are many questions to be answered before we can claim this as a viable method for choosing an h_n sequence. At the very least we should prove (or disprove) that

    (i)  h_n \to 0 (in probability? a.s.?) as n \to \infty, and
    (ii) n h_n \to \infty (in probability? a.s.?) as n \to \infty.

In addition, since h_n is now a random variable, some attempt should be made to look at the distribution theory involved.

It may also be possible to make some judgement of "how good" a particular value of h_n is by use of a quantity we call the variation of \hat{f}_n(x). This will be the subject of the next section.

5.7. Variation:

Define the variation of \hat{f}_n(x) on a partition Y, y_1 < y_2 < \cdots < y_m, as

    \zeta(h) = \sum_{j=2}^{m}|\hat{f}_n(y_j) - \hat{f}_n(y_{j-1})|.

As usual, total variation is defined as \sup_Y\zeta(h), where \sup_Y denotes taking the supremum over all possible partitions Y.

We believe that as h_n decreases, \zeta(h_n) increases. If, on the other hand, h_n is too large, the estimate may begin to look like the kernel function and the total variation would be approximately that of the kernel function. By studying the function \zeta(h) we may be able to obtain some idea as to whether a given h_n is too large or too small.

Lemma 5.7.1

    h\,\zeta(h) \le Total Variation of K(u).

Proof:

By definition, for a partition y_1 < y_2 < \cdots < y_m,

    \zeta(h) = \sum_{j=2}^{m}\Bigl|\frac{1}{nh}\sum_{k=1}^{n}\Bigl[K\Bigl(\frac{y_j - x_k}{h}\Bigr) - K\Bigl(\frac{y_{j-1} - x_k}{h}\Bigr)\Bigr]\Bigr|
            \le \frac{1}{nh}\sum_{k=1}^{n}\sum_{j=2}^{m}\Bigl|K\Bigl(\frac{y_j - x_k}{h}\Bigr) - K\Bigl(\frac{y_{j-1} - x_k}{h}\Bigr)\Bigr|
            \le \frac{1}{nh}\sum_{k=1}^{n}\sup_Y\sum_{j=2}^{m}\Bigl|K\Bigl(\frac{y_j - x_k}{h}\Bigr) - K\Bigl(\frac{y_{j-1} - x_k}{h}\Bigr)\Bigr|
            \le \frac{1}{h}\,Total Variation of K(u).

Observations:

1. As h \to 0 this bound becomes infinite. This agrees with our intuition that the variation may be very large when h is small.

2. The upper bound decreases as h increases.

3. Given X_1, X_2, ..., X_n we can plot \zeta(h). By looking at the value \zeta(h_n) and its position on the curve we may be able to get some idea as to "how good" h_n is as an estimate of h.
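Observation 3 suggests plotting \zeta(h) as a diagnostic. A small sketch follows; it is illustrative only, with a Gaussian kernel and an evenly spaced partition standing in for the supremum over all partitions.

```python
import numpy as np

def zeta(h, data, grid):
    """Variation of the Parzen estimate over the partition 'grid' (a proxy for zeta(h))."""
    u = (grid[:, None] - data) / h
    f_hat = np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))
    return np.abs(np.diff(f_hat)).sum()

rng = np.random.default_rng(8)
data = rng.standard_normal(300)
grid = np.linspace(-5, 5, 2001)
for h in (0.02, 0.1, 0.5, 2.0):
    print(h, zeta(h, data, grid))   # the variation blows up as h shrinks, flattens as h grows
```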
Theorem 5.7.2:

Total Variation of \hat{f}_n(x) \to \infty as h \to 0, for a fixed sample size.

Proof:

By definition,

    \zeta(h) = \sum_{j=2}^{m}\Bigl|\frac{1}{nh}\sum_{k=1}^{n}\Bigl(K\Bigl(\frac{y_j - x_k}{h}\Bigr) - K\Bigl(\frac{y_{j-1} - x_k}{h}\Bigr)\Bigr)\Bigr|.

Choose a partition Y such that, for some k, one partition point equals x_k while an adjacent point y is fixed with y \ne x_k; then \zeta(h) \ge \frac{1}{nh}\bigl|K(0) - K\bigl(\frac{y - x_k}{h}\bigr)\bigr|, and as h \to 0 the right hand side tends to infinity with the partition chosen. But

    \sup_Y \zeta(h) \ge \zeta(h),

so that Total Variation of \hat{f}_n(x) \to \infty as h \to 0 for a fixed sample size n.

Notes:

1. If the sample size is allowed to vary, and h_n is such that n h_n \to \infty as n \to \infty, then the lower bound does not necessarily become infinite.

2. The condition n h_n \to \infty as n \to \infty is necessary for consistency, so that the increase in variation as h becomes very small for fixed n is to be expected.
Lemma 5.7.3

    \sup_Y \lim_{n\to\infty}\zeta(h) = Total Variation of f(x)   a.s.

Proof:

    \lim_{n\to\infty}\zeta(h) = \lim_{n\to\infty}\sum_{j=2}^{m}|\hat{f}_n(y_j) - \hat{f}_n(y_{j-1})| = \sum_{j=2}^{m}\lim_{n\to\infty}|\hat{f}_n(y_j) - \hat{f}_n(y_{j-1})| = \sum_{j=2}^{m}|f(y_j) - f(y_{j-1})|   a.s.

Thus,

    \sup_Y\lim_{n\to\infty}\zeta(h) = \sup_Y\sum_{j=2}^{m}|f(y_j) - f(y_{j-1})| = Total Variation of f(x)   a.s.

Lemma 5.7.4

    Total Variation of f(x) \le \lim_{n\to\infty} Total Variation of \hat{f}_n(x),

where Total Variation of \hat{f}_n(x) = \sup_Y\zeta(h).

Proof:

By definition,

    \zeta(h) \le Total Variation of \hat{f}_n(x).

Thus,

    \lim_{n\to\infty}\zeta(h) \le \lim_{n\to\infty} Total Variation of \hat{f}_n(x).

But this inequality holds for all partitions Y, so that

    \sup_Y\lim_{n\to\infty}\zeta(h) \le \lim_{n\to\infty} Total Variation of \hat{f}_n(x),

and the left hand side is Total Variation of f(x), as required.
APPENDIX 1

TABLE 1 (Parzen). Kernels K(y), the corresponding values of \int_{-\infty}^{\infty} K^2(y)\, dy, and the Fourier transforms k(u). Recoverable entries:

    K(y) = 1/2, |y| \le 1; 0, |y| > 1                 \int K^2 = 1/2                               k(u) = (\sin u)/u
    K(y) = 1 - |y|, |y| \le 1; 0, |y| > 1              \int K^2 = 2/3                               k(u) = [\sin(u/2)/(u/2)]^2
    K(y) = (2\pi)^{-1/2} e^{-y^2/2}                    \int K^2 = (2\sqrt{\pi})^{-1} \approx 0.2821 k(u) = e^{-u^2/2}
    K(y) = (2\pi)^{-1}[\sin(y/2)/(y/2)]^2              \int K^2 = (3\pi)^{-1} \approx 0.1061        k(u) = 1 - |u|, |u| \le 1; 0, |u| \ge 1

TABLE 2 (Epanechnikov). Kernels K(y), the corresponding values of L = \int_{-\infty}^{\infty} K^2(y)\, dy, and the ratios L / \int_{-\infty}^{\infty} K_0^2(y)\, dy relative to Epanechnikov's optimal kernel K_0(y).
REFERENCES

[1]  Anderson, G. D. (1969). "A comparison of methods for estimating a probability density function," Ph.D. Dissertation, University of Washington.

[2]  Anscombe, F. J. (1952). "Large sample theory of sequential estimation," Proc. Camb. Philos. Soc., 48, 600-607.

[3]  Davies, H. I. (1973). "Strong consistency of a sequential estimator of a probability density function," Bull. Math. Statist., 15, No. 3-4, 49-54.

[4]  Doob, J. L. (1953). "Stochastic Processes," John Wiley, New York.

[5]  Epanechnikov, V. A. (1969). "Non-parametric estimation of a multivariate probability density," Theor. Prob. Appl., 14, 153-158.

[6]  Hill, D. H. (1973). "Estimation of probability functions using splines," Ph.D. Dissertation, University of New Mexico.

[7]  Hoeffding, Wassily (1963). "Probability inequalities for sums of bounded random variables," Journal of Amer. Stat. Assoc., 58, 13-30.

[8]  Leadbetter, M. R. (1963). "On the nonparametric estimation of probability densities," Technical Report No. 11, Research Triangle Institute. (Doctoral dissertation at the University of North Carolina at Chapel Hill.)

[9]  Loève, M. (1960). "Probability Theory," Van Nostrand, Princeton, New Jersey.

[10] Moran, P. A. P. (1968). "Introduction to Probability Theory," Oxford University Press.

[11] Nadaraya, E. A. (1965). "On nonparametric estimates of density functions and regression curves," Theor. Prob. Appl., 10, 186-190.

[12] Parzen, E. (1960). "Modern Probability Theory and its Applications," John Wiley, New York.

[13] Parzen, E. (1962). "On estimation of a probability density function and mode," Ann. Math. Statist., 33, 1065-1076.

[14] Rosenblatt, M. (1956). "Remarks on some nonparametric estimates of a density function," Ann. Math. Statist., 27, 832-837.

[15] Srivastava, R. C. (1973). "Estimation of a probability density function based on random number of observations with applications," Int. Stat. Rev., 41, No. 1, 77-86.

[16] Van Ryzin, J. (1969). "On strong consistency of density estimates," Ann. Math. Statist., 40, 1765-1772.

[17] Wald, A. (1950). "Asymptotic minimax solutions of sequential point estimation problems," Proc. 2nd Berkeley Symposium on Math. Stat. and Probability, 1-12.

[18] Wahba, Grace (1971). "A polynomial algorithm for density estimation," Ann. Math. Statist., 42, 1870-1886.

[19] Watson, G. S. and Leadbetter, M. R. (1963). "On estimating a probability density, I," Ann. Math. Statist., 34, 480-491.

[20] Wegman, E. J. (1972). "Nonparametric probability density estimation: I. A summary of available methods," Technometrics, 14, 533-545.

[21] Wegman, E. J. (1972). "Nonparametric probability density estimation: II. A comparison of density estimation methods," J. Statist. Comput. Simul., 1, 225-245.

[22] Woodroofe, M. (1970). "On choosing a delta sequence," Ann. Math. Statist., 41, 1665-1671.

[23] Yamato, H. (1972). "Sequential estimation of a continuous probability density function and mode," Bull. Math. Statist., 14, No. 3-4, 1-12.