OF THE INTENSITY FUNCTION
OF A
NONHOMOGENEOUS POISSON PROCESS
by
Maria Mori Brooks
A Dissertation submitted to the faculty of
The University of North Carolina at Chapel Hill
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in the Department of Statistics
Chapel Hill
1991
Approved by:
Reader
ABSTRACT
Kernel function methods are appealing tools for estimating the
intensity function of a nonhomogeneous Poisson process.
The bandwidth,
which quantifies the smoothness of the kernel estimator, dramatically
affects the accuracy of the estimator.  Therefore, one would like to
find a data based bandwidth selection method that performs well for
kernel intensity estimation.
We consider two models for
putting a mathematical structure on the intensity function of the point
process: a simple multiplicative intensity model and a stationary Cox
process model.
Asymptotic analysis allows one to see the basic structure of an
estimator.
In this dissertation, we show that the least-squares
cross-validation bandwidth is asymptotically optimal for intensity
estimation under the simple multiplicative intensity model.
We also
show that given the stationary Cox process model, this bandwidth is not
asymptotically optimal.
Once asymptotic optimality of an estimator has
been established, it is important to find the rate of convergence
for that estimator.
We determine the convergence rate for the
cross-validation bandwidth by finding the asymptotic distribution of
this bandwidth under the simple multiplicative intensity model.
Finally, we include an example with coffee purchase data: this example
demonstrates how kernel estimation and the cross-validation bandwidth
perform on a real data set.
PREFACE
I would like to express my great appreciation to Steve Marron, my
dissertation advisor. His guidance, many valuable insights and
continuous encouragement have helped me immensely throughout the past
few years. In addition, the entire dissertation committee has provided
useful comments and advice regarding my research.
With heartfelt gratitude, I thank my parents and sister for their
constant support and thoughtfulness.
Finally, I would like to thank my husband Rob for his big bright smiles
and his love - they allow me to find happiness in statistics and in
life.
TABLE OF CONTENTS

                                                                    Page

LIST OF FIGURES ....................................................   v

Chapter

I.   LITERATURE REVIEW AND INTRODUCTION ............................   1
     1.a. Kernel Density Estimation Literature Review ..............   2
     1.b. Kernel Intensity Estimation Literature Review ............   9
     1.c. Models for Intensity Estimation ..........................  11
     1.d. Results for the Least-Squares Cross-Validation
          Bandwidth ................................................  14

II.  ASYMPTOTIC OPTIMALITY OF THE LEAST-SQUARES CROSS-VALIDATION
     BANDWIDTH UNDER THE SIMPLE MULTIPLICATIVE INTENSITY MODEL .....  18
     2.a. The Model and the Kernel Estimator .......................  19
     2.b. Results ..................................................  23
     2.c. Proofs ...................................................  26

III. THE LEAST-SQUARES CROSS-VALIDATION BANDWIDTH UNDER THE
     STATIONARY COX PROCESS MODEL ..................................  45
     3.a. The Model ................................................  45
     3.b. Differences Between the Stationary Cox Process Model
          and the Simple Multiplicative Intensity Model ............  47
     3.c. Results ..................................................  51
     3.d. Proofs ...................................................  52

IV.  THE ASYMPTOTIC DISTRIBUTION OF THE LEAST-SQUARES CROSS-
     VALIDATION BANDWIDTH UNDER THE SIMPLE MULTIPLICATIVE
     INTENSITY MODEL ...............................................  56
     4.a. Assumptions and Notation .................................  57
     4.b. Results and Discussion ...................................  59
     4.c. Proofs of the Theorems ...................................  66
     4.d. The Lemmas and Their Proofs ..............................  69

V.   AN EXAMPLE OF KERNEL INTENSITY ESTIMATION WITH THE LEAST-
     SQUARES CROSS-VALIDATION BANDWIDTH APPLIED TO COFFEE SALES
     DATA ..........................................................  99
     5.a. Kernel Estimation of the Coffee Data .....................  99
     5.b. The Least-Squares Cross-Validation Bandwidth ............. 107

REFERENCES ......................................................... 116
LIST OF FIGURES

Figure 4.1:  Intensity Function #1 .................................  65
Figure 4.2:  Intensity Function #2 .................................  65
Figure 5.1:  Brand 4, h=2.82 ....................................... 103
Figure 5.2:  Brand 4, h=.75 ........................................ 103
Figure 5.3:  Brand 4, h=10.57 ...................................... 104
Figure 5.4:  Brand 4 Sales and Promotions .......................... 106
Figure 5.5:  Brand 4 Cross-Validation Curve ........................ 109
Figure 5.6:  Brand 29 Cross-Validation Curve ....................... 109
Figure 5.7:  Brand 29, h=4.08 ...................................... 111
Figure 5.8:  Brand 29, h=31.13 ..................................... 111
Figure 5.9:  Brand 20 Cross-Validation Curve ....................... 113
Figure 5.10: Brand 20, h=55.56 ..................................... 113
Figure 5.11: Brand 20, h=4.00 ...................................... 114
CHAPTER 1
LITERATURE REVIEW AND INTRODUCTION
The goal of this dissertation is to evaluate the least-squares
cross-validation bandwidth for the kernel estimator of the intensity
function of a nonstationary Poisson process.
This chapter contains a
literature review for kernel estimation and bandwidth selection
methods.
The literature related to kernel intensity estimation comes
from two fields of research.
First of all. many of the ideas and
methods developed for kernel density estimation can be applied to the
intensity estimation context.
Secondly, the work in intensity
estimation provides several models for the intensity function and a
variety of methods for studying this function.
Therefore, the
literature review is divided into two parts: section 1.a covers kernel
density estimation methods, and section 1.b reviews intensity
estimation methods.
The last two sections of this chapter deal with the work contained
in the following chapters.
In section 1.c we discuss the mathematical
frameworks that we will assume for the point processes.
Finally,
section l.d includes a summary and a brief discussion of the main
results found in this dissertation.
1.a. Kernel Density Estimation Literature Review.
In the density setting, assume that $X_1, X_2, \ldots, X_n$ are independent
identically distributed observations from an unknown probability
density $f(x)$.  A delta sequence density estimator is a general type of
estimate of $f(x)$ which is written in the form
$$\hat{f}(x) = n^{-1} \sum_{i=1}^{n} \delta_h(x, X_i)$$
where $h \in \mathbb{R}^{+}$ is the smoothing parameter.  The general formulation of this
nonparametric estimator was introduced by Watson and Leadbetter (1963)
and then extended by Foldes and Revesz (1974).
By changing the definition of $\delta_h(x, X_i)$, delta estimators include
orthogonal series estimators, spline estimators, histogram estimators
and kernel estimators.  See Walter and Blum (1979) for examples of
various delta estimators.
The kernel density estimator is a specific type of delta sequence
density estimator.  It is defined as
$$\hat{f}_h(x) = n^{-1} \sum_{i=1}^{n} K_h(x - X_i)  \qquad (1.1)$$
where $K_h(x) = h^{-1} K(x/h)$.  $K(\cdot)$, known as the kernel function, is the
"shape" of the weight that is placed on each data point; $h$, called the
bandwidth or the smoothing parameter, is the quantity that controls the
amount of smoothing done by $\hat{f}_h(x)$.  In general, the bandwidth depends
on the sample size such that as $n$ increases, $h \to 0$ and $nh \to \infty$.  The kernel
density estimator takes on large values where the data are dense and
small values where the data are sparse.  Due to its intuitive appeal as
well as its mathematical tractability, the kernel density estimator has
been widely studied over the past thirty years.
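To make (1.1) concrete, here is a minimal numerical sketch, assuming a
Gaussian kernel and NumPy; the function name and the example data are
editorial illustrations, not part of the original text.

    import numpy as np

    def kernel_density_estimate(x, data, h):
        """Evaluate the kernel density estimate (1.1) at the points x.

        A Gaussian kernel K(u) = exp(-u^2/2)/sqrt(2*pi) is assumed, so
        K_h(u) = K(u/h)/h; `data` holds the observations X_1, ..., X_n.
        """
        x = np.atleast_1d(x)
        u = (x[:, None] - data[None, :]) / h          # (x - X_i)/h
        K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # kernel weights
        return (K / h).sum(axis=1) / len(data)        # n^{-1} sum_i K_h(x - X_i)

    # Illustrative use: estimate a density from 200 standard normal draws.
    rng = np.random.default_rng(0)
    data = rng.normal(size=200)
    fhat = kernel_density_estimate(np.linspace(-3, 3, 61), data, h=0.4)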
Early work in kernel density estimation was done by Rosenblatt
(1956), Whittle (1958), Parzen (1962), and Watson and Leadbetter
(1963).
Rosenblatt (1956) introduced the kernel density estimator with
a uniform kernel.
He also found expressions for the mean squared error
(MSE) and the mean integrated square error (MISE) of the first kernel
estimator.
Whittle (1958) considered the problem of density estimation
with a random number of observations on a finite interval.
Among other
things, he calculated the MISE for the kernel density estimator in this
more complicated setting.
Parzen (1962) generalized Rosenblatt's kernel estimator by
allowing $K(\cdot)$ to be any function that satisfies relatively
nonrestrictive conditions.  When $K(\cdot)$ satisfies the specified
conditions, he showed that $\hat{f}_h(x)$ is a consistent estimate of $f(x)$ and
that $\hat{f}_h(x)$ is asymptotically normal.
Watson and Leadbetter (1963)
presented further consistency results for the kernel estimator.
More
recently, Castellana and Leadbetter (1986) proved that kernel
estimators for the marginal probability density function of a
stationary sequence or a continuous parameter process are also
consistent and asymptotically normal.
In order to fully determine the kernel density estimator, the
kernel function and the bandwidth must be specified.
Epanechnikov
(1969) found the kernel function, K(.), that minimizes the asymptotic
MSE among density functions with unit variance.
Parzen (1962) and
Watson and Leadbetter (1963) first suggested that the kernel estimator
could be improved asymptotically by using higher order kernels, that
is, the kernel function can assume negative values.
Later, Schucanny
and Sommers (1977) and Gasser, Mue11erand Mammitzsch (1985) used
simulation studies to verify that higher order kernels are beneficial
for finite data sets, and Hall and Marron"(1987b) proposed a method of
choosing the kernel order.
However, it is believed that the choice of
the smoothing parameter is far more important than the choice of the
kernel function; see, for example, Silverman (1986, pp. 15-17) for a
discussion and examples illustrating this point.
Therefore, for
convenience, the kernel function is often chosen to be a symmetric
probability density function.
An important goal in kernel estimation is to find the bandwidth
that minimizes the "distance" between the estimate $\hat{f}_h(x)$ and the true
density $f(x)$.  A criterion that is often used to measure this error is
the mean integrated square error, where
$$\mathrm{MISE}(h) = E\left[\int (\hat{f}_h(x) - f(x))^2 \, dx\right].  \qquad (1.2)$$
Rosenblatt (1971) presented an intuitive description for the tradeoff
between smaller and larger values of the bandwidth.  The MSE of $\hat{f}_h(x)$
is the sum of the squared bias and the variance of $\hat{f}_h(x)$.  Rosenblatt
showed that a small value of $h$ results in high variance (i.e. $\hat{f}_h(x)$ is
affected by individual observations and hence is more variable), while
a large value of $h$ results in high bias (i.e. $\hat{f}_h(x)$ is very smooth but
may not include minor features of the true distribution).  Thus, it is
desirable to find a bandwidth that balances the effects of the bias and
the variance of the estimate.
A number of methods have been proposed for choosing $h$.  The
"plug-in" method, introduced by Woodroofe (1970) and Scott, Tapia and
Thompson (1977), uses the concept that
$$h_0 = \left[\frac{C(K)\, f(x)}{f''(x)^2}\right]^{1/5} n^{-1/5}  \qquad (1.3)$$
where $h_0$ is the bandwidth that minimizes MSE.  Therefore, they estimate
$f$ and $f''$ with kernel estimators, and then substitute these estimates
into equation (1.3) to obtain a value for $h$.  With this procedure, a
smoothing parameter still must be chosen in order to estimate $f(x)$ and
$f''(x)$; in particular, $f''(x)$ is even more difficult to estimate than
$f(x)$.  Moreover, equation (1.3) is an asymptotic equation, and thus it
may not give a good solution for $h$ when $n$ is finite.
The Kullback-Leibler or pseudo-likelihood method was introduced by
Habbema, Hermans and van den Broek (1974) and Duin (1976), and later
improved by Hall (1987a).  The motivation for this method is that a
function which is similar to the likelihood function is maximized over
$h$, and consequently, the solution is analogous to the maximum
likelihood solution.  The Kullback-Leibler method maximizes the
function
$$\mathrm{KL}(h) = \prod_{i=1}^{n} \hat{f}_{hi}(X_i)  \qquad (1.4)$$
where
$$\hat{f}_{hi}(x) = (n-1)^{-1} \sum_{\substack{j=1 \\ j \neq i}}^{n} K_h(x - X_j).  \qquad (1.5)$$
$\hat{f}_{hi}(\cdot)$ is known as the leave-one-out estimator.  If $\hat{f}_h(x)$, as in (1.1),
were used in place of $\hat{f}_{hi}(x)$ then the trivial solution $h = 0$ would
maximize equation (1.4).
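As a concrete sketch of the method (a hedged illustration, not the
original authors' code), the score (1.4) can be evaluated through its
logarithm with a Gaussian kernel; the bandwidth is then chosen to
maximize this score over a grid of $h$ values.

    import numpy as np

    def log_pseudo_likelihood(h, data):
        """Logarithm of the Kullback-Leibler score (1.4), built from the
        leave-one-out estimates (1.5); Gaussian kernel assumed."""
        n = len(data)
        u = (data[:, None] - data[None, :]) / h
        K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
        np.fill_diagonal(K, 0.0)               # leave X_i out of its own estimate
        f_loo = K.sum(axis=1) / ((n - 1) * h)  # \hat{f}_{hi}(X_i)
        return np.sum(np.log(f_loo))           # -inf if some point is isolated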
Rudemo (1982) and Bowman (1984) suggested using the method of
least-squares cross-validation for selecting the bandwidth.
Their goal
was to find a bandwidth that minimizes the integrated square error
(ISE) of $\hat{f}_h(x)$, where
$$\mathrm{ISE}(h) = \int [\hat{f}_h(x) - f(x)]^2 \, dx
= \int \hat{f}_h(x)^2 dx - 2 \int \hat{f}_h(x) f(x) dx + \int f(x)^2 dx.  \qquad (1.6)$$
Rudemo (1982) and Bowman (1984) proposed choosing $h$ to be the minimizer
of the cross-validation score
$$\mathrm{CV}(h) = \int \hat{f}_h(x)^2 dx - 2 n^{-1} \sum_{i=1}^{n} \hat{f}_{hi}(X_i)$$
where $\hat{f}_{hi}(x)$ is the leave-one-out estimator as in (1.5).  Since
$n^{-1} \sum_{i=1}^{n} \hat{f}_{hi}(X_i)$ is a method of moments estimator of $\int \hat{f}_h f$, and
$\int f(x)^2 dx$ is independent of $h$, $\mathrm{CV}(h)$ is a reasonable estimate of the
terms in the ISE that depend on $h$.  Therefore, the bandwidth that
minimizes $\mathrm{CV}(h)$ should be close to the bandwidth that minimizes the
ISE of $\hat{f}_h(x)$.
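For comparison, the following sketch evaluates the cross-validation
score with a Gaussian kernel; for that kernel the term $\int \hat{f}_h^2$ has a
closed form because $K_h * K_h$ is a normal density with standard
deviation $h\sqrt{2}$.  The function name is editorial.

    import numpy as np

    def cv_score(h, data):
        """Least-squares cross-validation score for a Gaussian kernel."""
        n = len(data)
        d = data[:, None] - data[None, :]
        phi = lambda x, s: np.exp(-0.5 * (x / s)**2) / (s * np.sqrt(2 * np.pi))
        int_fhat_sq = phi(d, h * np.sqrt(2)).sum() / n**2  # integral of fhat^2
        K = phi(d, h)
        np.fill_diagonal(K, 0.0)
        loo_sum = K.sum() / (n * (n - 1))      # n^{-1} sum_i \hat{f}_{hi}(X_i)
        return int_fhat_sq - 2 * loo_sum

    # The cross-validation bandwidth minimizes this score over a grid:
    # h_cv = grid[np.argmin([cv_score(h, data) for h in grid])]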
The optimal rate of convergence for $\hat{f}_h(x)$ was first considered by
Farrell (1972) and Stone (1980) in order to evaluate different
bandwidth selection procedures.  Let $C$ be a given set of density
functions with two existing derivatives; then Farrell (1972) showed
that for $f(x) \in C$, $r = 2/5$ is the smallest value of $r$ such that
$$\lim_{a \to 0}\ \liminf_{n \to \infty}\ \inf_{f \in C} P\{|\hat{f}_h(x) - f(x)| \ge a n^{-r}\} = 1.$$
In other words, $n^{-2/5}$ is the best possible rate of convergence for
$|\hat{f}_h(x) - f(x)|$.  Furthermore, he showed that when $p$ is the number of
existing derivatives for the density $f(\cdot)$, $n^{-p/(2p+1)}$ is the optimal
rate of convergence for $|\hat{f}_h(x) - f(x)|$.  For a clear presentation of this
result as well as other optimal rate results, see Stone (1980).
Asymptotic rates of convergence were also used to study various
bandwidths and their error criteria, ISE or MISE.  Let:
     $h_0$ be the bandwidth that minimizes $\mathrm{MISE}(h)$,
     $\hat{h}_0$ be the bandwidth that minimizes $\mathrm{ISE}(h)$,
     $\hat{h}_{cv}$ be the bandwidth that minimizes $\mathrm{CV}(h)$,
     $\hat{h}_{kl}$ be the bandwidth that maximizes $\mathrm{KL}(h)$.
Under the assumption that $f(x)$ has a continuous second derivative, Hall
(1983) proved that $\hat{h}_{cv}$ is asymptotically equivalent to $\hat{h}_0$ in the sense
that
$$\hat{h}_{cv} / \hat{h}_0 \to 1 \quad \text{in probability}$$
and
$$\mathrm{ISE}(\hat{h}_{cv}) / \mathrm{ISE}(\hat{h}_0) \to 1 \quad \text{in probability}.$$
Stone (1984) and Burman (1985) showed that under the weaker assumption
that $f(x)$ is continuous,
$$\mathrm{ISE}(\hat{h}_{cv}) / \mathrm{ISE}(\hat{h}_0) \to 1 \quad \text{a.s.} \qquad \text{and} \qquad
\mathrm{MISE}(\hat{h}_{cv}) / \mathrm{MISE}(h_0) \to 1 \quad \text{a.s.}$$
Therefore, the ISE and MISE obtained with the cross-validation
bandwidth converge to the minimum ISE and MISE respectively.  The
results of Hall (1983), Stone (1984) and Burman (1985) imply that the
least-squares cross-validation method is asymptotically optimal for
choosing $h$ to estimate a density with any amount of underlying
smoothness.
Marron and Hardle (1986) showed that ISE and MISE are
essentially the same for large $n$ by proving that under reasonable
assumptions
$$\lim_{n \to \infty} \sup_{h \in H_n} \frac{|\mathrm{ISE}(h) - \mathrm{MISE}(h)|}{\mathrm{MISE}(h)} = 0 \quad \text{a.s.}$$
where $H_n$ is a finite set that depends on $n$.
It has been proven that if $\hat{h}$ is either $\hat{h}_{cv}$ or $\hat{h}_{kl}$, then
$$\lim_{n \to \infty} \frac{D(\hat{h})}{\inf_{h \in H_n} D(h)} = 1 \quad \text{a.s.}$$
where $D$ is either ISE or MISE; see for example Marron (1987).  However,
Hall (1987a) and Marron (1987) concluded that the least-squares
cross-validation method has several advantages over the
Kullback-Leibler method for choosing $h$.  These advantages include
weaker assumptions needed and fewer sources of error for the
cross-validation method.
The rate of convergence for the least-squares cross-validation
bandwidth was introduced by Hall and Marron (1987a).  Since $h_0$ depends
on the unknown density function $f(\cdot)$ while $\hat{h}_0$ is a function of the
sample observations, they suggested that one should aim to minimize
ISE.  Hall and Marron (1987a) proved that $\hat{h}_{cv}$ performs as well as $\hat{h}_0$,
an unattainable value, when $h_0$ is the standard of comparison.  This is
seen from the following results:
$$n^{3/10}(\hat{h}_0 - h_0) \xrightarrow{D} N(0, \sigma_0^2), \qquad
n^{3/10}(\hat{h}_{cv} - h_0) \xrightarrow{D} N(0, \sigma_{cv}^2),$$
$$n[\mathrm{ISE}(h_0) - \mathrm{ISE}(\hat{h}_0)] \xrightarrow{D} (\sigma_0^2/2)\,\chi_1^2, \qquad
n[\mathrm{ISE}(\hat{h}_{cv}) - \mathrm{ISE}(\hat{h}_0)] \xrightarrow{D} (\sigma_{cv}^2/2)\,\chi_1^2,$$
where $\sigma_0^2$ and $\sigma_{cv}^2$ are constants depending on $f(\cdot)$ and $K(\cdot)$.
Finally, boundary adjustments are very important for estimating
the intensity function near the endpoints of the interval.  Results
relating to boundary adjustments in the density estimation context can
be found in Rice (1984), Schuster (1985), Gasser, Mueller and
Mammitzsch (1985), and Cline and Hart (1986).
1.b. Kernel Intensity Estimation Literature Review.
Let $X_1, X_2, \ldots, X_N$ be all of the observations on the interval $[0,T]$
from a point process with intensity function $\lambda(x)$.  That is, the number
of occurrences in $(0,t]$ has expected value equal to $\int_0^t \lambda(u) du$.  See Cox
and Isham (1980), Ripley (1981), and Diggle (1983) for further
information regarding point processes.  The intensity function is very
similar to a density function in that we expect to have more
observations when the intensity function takes on high values and have
fewer observations when the intensity function takes on low values.
Hence, a natural estimate for $\lambda(x)$ is the kernel estimator:
$$\hat{\lambda}_h(x) = \sum_{i=1}^{N} K_h(x - X_i) \qquad \text{for } x \in [0,T]  \qquad (1.8)$$
where $K_h(x) = h^{-1} K(x/h)$.  Again, the kernel function, $K(\cdot)$, is
generally a symmetric probability density function, and the bandwidth,
$h$, is the smoothing parameter for $\hat{\lambda}_h(x)$.  In the intensity estimation
setting, $N$, the number of observations in $[0,T]$, is a random variable.
Note that the kernel density estimate, $\hat{f}_h(x)$, includes a normalization
factor of $n^{-1}$, which makes $\hat{f}_h(x)$ a probability density function; this
adjustment is not needed for estimating an intensity function.
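A minimal sketch of (1.8), again assuming a Gaussian kernel; note that
the only change from the density sketch given earlier is that the sum
over the $N$ observed occurrence times is not divided by the sample size.

    import numpy as np

    def kernel_intensity_estimate(x, events, h):
        """Evaluate the kernel intensity estimate (1.8) at the points x;
        `events` holds the occurrence times X_1, ..., X_N on [0, T].
        Unlike the density estimator, there is no 1/N normalization."""
        x = np.atleast_1d(x)
        u = (x[:, None] - events[None, :]) / h
        K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
        return (K / h).sum(axis=1)             # sum_i K_h(x - X_i)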
Theoretical properties of the kernel intensity estimator have been
developed by Devroye and Gyorfi (1984), Leadbetter and Wold (1983), and
Ellis (1986).  In particular, Leadbetter and Wold (1983) proved that $\hat{\lambda}_h(x)$
is almost sure pointwise consistent, almost sure uniformly
consistent and asymptotically normal under specified conditions.
A variety of approaches have been taken to study kernel intensity
estimation.  The multiplicative intensity model, introduced by Aalen
(1978), is frequently used to model counting processes.
This model
assumes that the intensity function takes the form
$$\lambda(x) = \alpha(x)\, Y(x) \qquad \text{for } x \in [0,T]$$
where $\alpha(x)$ is an unknown non-negative deterministic function, and $Y(x)$
is a nonnegative observable stochastic process such that $Y(x)$ is known
just prior to time $x$.  Often $\alpha(x)$ can be interpreted as an individual
intensity for making an occurrence, and $Y(x)$ as the number of
individuals that are at risk at time $x$.  In order to estimate $\alpha(x)$,
Ramlau-Hansen (1983) proposed the following estimator:
$$\hat{\alpha}(x) = \sum_{i=1}^{n} K_h(x - X_i) / Y(X_i).$$
He proved that this estimator is uniformly consistent and
asymptotically normal as $n \to \infty$, $h \to 0$ and $nh \to \infty$.  See Andersen and Borgan
(1985) for an overview of the multiplicative intensity model and the
above kernel estimator $\hat{\alpha}(x)$.
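A sketch of the Ramlau-Hansen estimator under illustrative assumptions:
the occurrence times and the values of the at-risk process $Y$ evaluated
at those times are taken as given, and a Gaussian kernel is used.

    import numpy as np

    def ramlau_hansen_estimate(x, events, Y_at_events, h):
        """Kernel estimate of alpha(x) in the multiplicative intensity
        model: each event time X_i is weighted by 1/Y(X_i), the number
        at risk just prior to X_i."""
        x = np.atleast_1d(x)
        u = (x[:, None] - events[None, :]) / h
        K = np.exp(-0.5 * u**2) / (h * np.sqrt(2 * np.pi))
        return (K / Y_at_events[None, :]).sum(axis=1)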
Diggle (1985) considered the estimation of the intensity function,
$\lambda(x)$, assuming that the underlying point process is a stationary Cox
process (also known as a doubly stochastic Poisson process).  This
process is defined by:
1) $\{\Lambda(x),\ x \in \mathbb{R}\}$ is a stationary, non-negative valued random process;
2) conditional on the realization $\lambda(x)$ of $\Lambda(x)$, the point process
is a nonhomogeneous Poisson process with rate function $\lambda(x)$.
See Cox and Isham (1980) for a discussion of doubly stochastic Poisson
processes.  Diggle (1985) recommended the standard kernel intensity
estimator, $\hat{\lambda}_h(x)$, and used an empirical Bayes method to calculate its
$\mathrm{MSE}(h) = E[\hat{\lambda}_h(x) - \Lambda(x)]^2$.  For $x$ sufficiently far from the boundaries
$0$ and $T$, the MSE does not depend on $x$ since the process $\Lambda(x)$ is
stationary.  In order to estimate the bandwidth, Diggle (1985) proposed
substituting estimates for the unknown terms in the MSE, and then
minimizing this function over $h$.  In an earlier paper, Clevenson and
Zidek (1977) developed similar methods to study $\hat{\lambda}_h(x)$ with a uniform
kernel in the stationary Cox process setting.
When $\hat{\lambda}_h(x)$ is used with a uniform kernel, Diggle and Marron (1988)
proved that Diggle's (1985) minimum MSE method and the least-squares
cross-validation bandwidth selection method choose the same bandwidth.
These two methods are motivated from very different viewpoints, and the
fact that they both select the same bandwidth confirms that this
bandwidth is a reasonable choice.  More importantly, this result also
indicates how asymptotic methods that were developed for density
estimation can be used in the intensity setting.  Finally, Diggle
(1985) and Diggle and Marron (1988) recommend procedures that provide
boundary adjustments for the kernel intensity estimator.
1.c. Models for Intensity Estimation.
Two mathematical models for the intensity function are considered
in this thesis: a simple multiplicative intensity model and a
stationary Cox process model.  Intensity estimation under the
assumptions of these models is discussed in this section.
First, consider a very simple version of the multiplicative
intensity model.  Let $X_1, X_2, \ldots, X_N$ be the observations from a
nonhomogeneous Poisson process on the interval $[0,T]$ with intensity
$\lambda_c(x)$.  Assume that
$$\lambda_c(x) = c\, \alpha(x)$$
where $c$ is a constant, and $\alpha(x)$ is a nonnegative deterministic function
such that $\int_0^T \alpha(x) dx = 1$.  Moreover, the kernel estimate of $\lambda_c$ is
$$\hat{\lambda}_h(x) = \sum_{i=1}^{N(c)} K_h(x - X_i) = \sum_{i=1}^{N(c)} h^{-1} K[(x - X_i)/h] \qquad \text{for } x \in [0,T].$$
$K(\cdot)$ is the kernel function, which we assume is a symmetric probability
density function, and $h$ is the bandwidth.  Under this model, $N$, the
number of observations that occur in $[0,T]$, is a Poisson random
variable with mean $\int_0^T \lambda_c(x) dx = c$.  Note that $\alpha_c(x) = \lambda_c(x)[\int_0^T \lambda_c(u) du]^{-1}$;
thus, conditional on $N$, the observations $X_1, X_2, \ldots, X_N$ have the same
distribution as the order statistics of $N$ independent random variables
with probability density $\alpha_c(x) I_{[0,T]}(x)$.  As a result, previously
developed kernel density methods can be used to study $\alpha(x) I_{[0,T]}(x)$.
Throughout the dissertation, we will assume a circular design where
$\lambda_c(0) = \lambda_c(T)$, $\lambda_c'(0) = \lambda_c'(T)$ and $\lambda_c''(0) = \lambda_c''(T)$.  This allows us to ignore the
boundary effects at 0 and T.
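The order-statistics property above gives a direct way to simulate this
model.  The following sketch, with an illustrative choice of $\alpha$
satisfying the circular design assumptions, draws $N \sim$ Poisson($c$) and
then $N$ i.i.d. points with density $\alpha$ by inverse-CDF sampling on a grid.

    import numpy as np

    def simulate_simple_multiplicative(c, T, alpha, rng, grid_size=10_000):
        """Simulate a nonhomogeneous Poisson process with intensity
        c*alpha(x) on [0, T]: draw N ~ Poisson(c), then N i.i.d. points
        with density alpha via inverse-CDF sampling on a grid."""
        N = rng.poisson(c)
        x = np.linspace(0.0, T, grid_size)
        cdf = np.cumsum(alpha(x))
        cdf /= cdf[-1]                           # normalized discretized CDF
        return np.sort(np.interp(rng.uniform(size=N), cdf, x))

    # Illustrative alpha on [0, 1]: integrates to 1 and is "circular".
    rng = np.random.default_rng(1)
    events = simulate_simple_multiplicative(
        c=500, T=1.0, alpha=lambda x: 1 + 0.5 * np.cos(2 * np.pi * x), rng=rng)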
Asymptotic analysis is a powerful tool for understanding the
behavior of an estimator.  In the density setting, asymptotic analysis
with $n \to \infty$ allows the researcher to evaluate the kernel estimator.  In
this intensity model, letting $c \to \infty$ has the same desirable effect of
adding observations everywhere on the interval $[0,T]$ and not changing
the relative shape of the target function $\lambda_c(x)$ in the limiting
process.  In other words, $c^{-1} \lambda_c(x)$ is a fixed function as $c \to \infty$.  In this
thesis, we will be considering convergence of sequences of random
variables; therefore, we will construct an appropriate sequence $\{c_s\}_{s=1}^{\infty}$
of values for $c$ indexed by $s$.  Throughout, we will assume that $c_s \to \infty$
and $h \to 0$.
Next, we look at the stationary Cox process model.  Let $X_1,
X_2, \ldots, X_N$ be all of the observations from a stationary Cox process or a
doubly stochastic Poisson process with intensity function $\lambda_\mu(x)$ on the
interval $[0,T]$.  Assume that $\lambda_\mu(x)$ is a realization of the stationary
nonnegative random process $\{\Lambda(x),\ x \in \mathbb{R}\}$.  Moreover, for each $\mu$, assume
that for all $x, y \in [0,T]$:
1) $E[\Lambda(x)] = \mu$,
2) $E[\Lambda(x)\Lambda(y)] = v(|x-y|)$
where $v(x) = \mu^2 v_0(x)$ for $v_0(x)$ a fixed function.
This model is identical to the Cox model used in Diggle (1985).
In this intensity model, asymptotic analysis is done by letting $\mu \to \infty$.
Consequently, assumption 2) above ensures that the relative shape
of the curve $\Lambda(x)$ in the limiting process does not change.  Again, we
construct a sequence $\{\mu_s\}_{s=1}^{\infty}$, so that we can discuss convergence
properties for sequences of random variables.
The kernel estimate of $\lambda_\mu(x)$ is
$$\hat{\lambda}_h(x) = \sum_{i=1}^{N(\mu)} K_h(x - X_i) \qquad \text{for } x \in [0,T].$$
$\hat{\lambda}_h(x)$ in the Cox process model is the same as $\hat{\lambda}_h(x)$ in the
multiplicative intensity model for estimating $\lambda(x)$ from a data set.
The difference between the two models is only seen when the estimators
are evaluated mathematically.  Note that under the Cox process model,
$E[N] = E[\int_0^T \Lambda(x) dx] = \mu T$ where $N$ is the number of observations that occur in
$[0,T]$.  Moreover, if we condition on $\{\Lambda(x),\ x \in [0,T]\}$, then $N$ is a
Poisson random variable with mean $\int_0^T \Lambda$; however, $N$ is not in general a
Poisson random variable.  Finally, define $\alpha(x) = \Lambda(x)[\int_0^T \Lambda(u) du]^{-1}$; then,
conditional on $N$, the $N$ occurrence times have the same distribution as
the order statistics from $N$ independent identically distributed random
variables with density $\alpha(x) I_{[0,T]}(x)$.
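For contrast, a stationary Cox process can be simulated in two stages:
first draw a realization of the random intensity, then generate a
Poisson process conditional on it.  The stationary intensity used below
(exponentiated smoothed Gaussian noise rescaled to level mu) is purely
an illustrative assumption.

    import numpy as np

    def simulate_cox_process(mu, T, rng, grid_size=10_000,
                             width_frac=0.05, sigma=0.5):
        """Simulate a stationary Cox process on [0, T] on a fine grid:
        1) draw a stationary nonnegative intensity realization Lambda(x);
        2) conditional on Lambda, place Poisson counts in each grid cell."""
        x = np.linspace(0.0, T, grid_size)
        dx = x[1] - x[0]
        noise = rng.normal(size=grid_size)
        width = max(int(width_frac * grid_size), 1)
        smooth = np.convolve(noise, np.ones(width) / width, mode="same")
        lam = np.exp(sigma * smooth)
        lam *= mu / lam.mean()                  # illustrative rescaling to level mu
        counts = rng.poisson(lam * dx)          # Poisson counts per cell
        events = np.repeat(x, counts) + rng.uniform(0, dx, size=counts.sum())
        return np.sort(events)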
1.d. Results for the Least-Squares Cross-Validation Bandwidth.
An important consideration in kernel intensity estimation is the
selection of the bandwidth or smoothing parameter.  Our aim is to find
a data based bandwidth selection method which has favorable asymptotic
properties.  For both of the above models, we are interested in finding
a bandwidth that minimizes the distance between the kernel estimate $\hat{\lambda}_h$
and the true intensity function $\lambda$.  Therefore, we aim to minimize the
integrated square error (ISE) of $\hat{\lambda}_h$, where
$$\mathrm{ISE}_\lambda(h) = \int_0^T \hat{\lambda}_h^2 - 2 \int_0^T \hat{\lambda}_h \lambda + \int_0^T \lambda^2.$$
The first term, $\int_0^T \hat{\lambda}_h^2$, is a function of the data, and the third term,
$\int_0^T \lambda^2$, is independent of $h$.  In addition, let $\hat{\lambda}_{hi}(x)$ be the
leave-one-out estimator,
$$\hat{\lambda}_{hi}(x) = \sum_{\substack{j=1 \\ j \neq i}}^{N} K_h(x - X_j) \qquad \text{for } x \in [0,T].$$
Then, $\sum_{i=1}^{N} \hat{\lambda}_{hi}(X_i)$ is a method of moments estimator for $\int_0^T \hat{\lambda}_h \lambda$.  As a
result, we can adapt Rudemo's (1982) cross-validation score function to
the intensity estimation setting by defining
$$\mathrm{CV}_\lambda(h) = \int_0^T \hat{\lambda}_h^2 - 2 \sum_{i=1}^{N} \hat{\lambda}_{hi}(X_i).$$
The cross-validation score function for $\lambda$, $\mathrm{CV}_\lambda(h)$, provides a good
estimate of the first two terms of $\mathrm{ISE}_\lambda(h)$, and thus, the bandwidth
that minimizes $\mathrm{CV}_\lambda(h)$ should be close to the bandwidth that minimizes
$\mathrm{ISE}_\lambda(h)$.
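Computationally, $\mathrm{CV}_\lambda(h)$ mirrors the density score.  The sketch below
assumes a Gaussian kernel and ignores boundary effects, so that $\int \hat{\lambda}_h^2$
reduces to a double sum over the kernel convolution; both
simplifications are editorial.

    import numpy as np

    def cv_lambda(h, events):
        """Cross-validation score CV_lambda(h) for the kernel intensity
        estimate (Gaussian kernel, boundary effects ignored)."""
        d = events[:, None] - events[None, :]
        phi = lambda x, s: np.exp(-0.5 * (x / s)**2) / (s * np.sqrt(2 * np.pi))
        int_sq = phi(d, h * np.sqrt(2)).sum()   # integral of lambda-hat^2
        K = phi(d, h)
        np.fill_diagonal(K, 0.0)
        return int_sq - 2 * K.sum()             # minus twice the leave-one-out sum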
Again let $\hat{h}_{cv}$ be the bandwidth that minimizes $\mathrm{CV}_\lambda(h)$ and $\hat{h}_0$ the
bandwidth that minimizes $\mathrm{ISE}_\lambda(h)$.  The goal in this thesis is to
minimize the ISE; hence, there are two ways that a bandwidth $\hat{h}$ can be
optimal.  First of all, $\mathrm{ISE}_\lambda(\hat{h})$, the ISE of the kernel estimator with
bandwidth $\hat{h}$, can be asymptotically equivalent to the minimum ISE.
Secondly, the bandwidth $\hat{h}$ can be asymptotically equivalent to the
bandwidth that minimizes the ISE.
It is known that the cross-validation bandwidth is asymptotically
optimal for density estimation, and we show that this is also the case
in the context of intensity estimation.  In Chapter 2, given the simple
multiplicative intensity model and several reasonable assumptions, we
show that $\hat{h}_{cv}$ is asymptotically optimal in the sense that
$$\mathrm{ISE}_\lambda(\hat{h}_{cv}) / \mathrm{ISE}_\lambda(\hat{h}_0) \to 1 \quad \text{a.s.} \quad \text{as } c_s \to \infty  \qquad (1.9)$$
and
$$\hat{h}_{cv} / \hat{h}_0 \xrightarrow{P} 1 \quad \text{as } c_s \to \infty.  \qquad (1.10)$$
We prove (1.9) with martingale techniques similar to those used in
Hardle, Marron and Wand (1990) to study the asymptotics of density
derivatives.  In order to prove (1.10), we use an argument that is
based on the kernel density estimation proof found in Hall (1983).
Unfortunately, it turns out that the cross-validation bandwidth is
not asymptotically optimal under the Cox process model.  This follows
from the fact that the ISE and the MISE are not asymptotically
equivalent in this setting.  In Chapter 3, we discuss this more
thoroughly, and we give restrictive conditions under which the
cross-validation bandwidth is asymptotically optimal.
Once we determine that the least-squares cross-validation
bandwidth is asymptotically optimal, the next step is to find the rate
of convergence.  Using ideas found in Hall and Marron (1987a), we show
that under the simple multiplicative intensity model and mild
assumptions,
$$c_s^{1/10}(\hat{h}_0/h_0 - 1) \xrightarrow{D} N(0, \sigma_0^2), \qquad
c_s^{1/10}(\hat{h}_{cv}/h_0 - 1) \xrightarrow{D} N(0, \sigma_{cv}^2),$$
$$c_s^{-1}[\mathrm{ISE}_\lambda(h_0) - \mathrm{ISE}_\lambda(\hat{h}_0)] \xrightarrow{D} (\sigma_0^2/2)\,\chi_1^2, \qquad
c_s^{-1}[\mathrm{ISE}_\lambda(\hat{h}_{cv}) - \mathrm{ISE}_\lambda(\hat{h}_0)] \xrightarrow{D} (\sigma_{cv}^2/2)\,\chi_1^2,$$
where $\sigma_0^2$ and $\sigma_{cv}^2$ are constants depending on $\lambda(\cdot)$ and $K(\cdot)$, and $h_0$ is
the bandwidth that minimizes the MISE of $\hat{\lambda}_h(x)$.  These results imply
that when $h_0$ is used as the basis for comparison, the cross-validation
bandwidth performs as well as the bandwidth that minimizes the ISE in
terms of the convergence rate.  Thus, the convergence rate, $c_s^{-1/10}$, is
slow, but we cannot reasonably expect it to be any faster.  We also
show that the rate of convergence under the multiplicative intensity
model is comparable to the density convergence rate.  In addition, the
variances $\sigma_0^2$ and $\sigma_{cv}^2$ are similar for the two estimation settings.
Finally, we examine the variances for a couple of different intensity
functions and see that the cross-validation bandwidth performs
relatively well for the intensity function with more curvature.
In the last chapter, we use the kernel intensity estimator with
the cross-validation bandwidth to analyze a coffee sales data set.  In
this example, we are interested in estimating the purchase rate for
various brands of coffee and in relating the purchase rate to
promotions.  The kernel estimator with the cross-validation bandwidth
assists the researcher in seeing structure in the data sets; however,
there are some problems since the cross-validation score function has
multiple minima.  The coffee example shows that the least-squares
cross-validation bandwidth not only has good theoretical properties
but also is effective for data analysis.
CHAPTER 2
ASYMPTOTIC OPTIMALITY OF THE LEAST-SQUARES CROSS-VALIDATION BANDWIDTH
UNDER THE SIMPLE MULTIPLICATIVE INTENSITY MODEL
An important aspect of kernel intensity estimation is the
selection of the bandwidth or smoothing parameter.  We are interested
in finding a data based bandwidth which has desirable asymptotic
properties.  It has been proven that in the related setting of kernel
density estimation, the least-squares cross-validation bandwidth is
asymptotically optimal.  There are two ways in which a bandwidth can be
asymptotically optimal.  First, the performance of the bandwidth as
measured by some error criterion can be asymptotically equivalent to
the "best" performance; secondly, the bandwidth can be asymptotically
equal to the "best" bandwidth.
In this chapter, a simple multiplicative intensity model is used
to put a mathematical structure on the intensity function of the point
process.  Then, we show that the performance, measured by the
integrated squared error (ISE), of the cross-validation bandwidth is
asymptotically optimal under the simple multiplicative intensity model.
We also prove that under this model the least-squares cross-validation
bandwidth is asymptotically equivalent to the bandwidth that minimizes
the ISE.  In the next chapter, we explore the same asymptotic questions
for the stationary Cox process model.
2.a. The Model and the Kernel Estimator.
Let $X_1, X_2, \ldots, X_N$ be all of the observations from a nonhomogeneous
Poisson process over $[0,T]$ with intensity $\lambda_c(x)$.  We use a simple
multiplicative intensity model to put a mathematical structure on the
intensity function.  The general multiplicative intensity model,
introduced by Aalen (1978), is frequently used to model counting
processes.  Assume that
$$\lambda_c(x) = c\, \alpha(x) \qquad \text{for } x \in \mathbb{R}$$
where $c$ is a positive constant and $\alpha(x)$ is a nonnegative deterministic
function such that $\int_0^T \alpha(x) dx = 1$.  This model is
appropriately called a multiplicative intensity model since frequently
$c$ can be interpreted as the number of subjects in an experiment and
$\alpha(x)$ as the "individual" intensity for each subject.  Under this model,
$N$ is a Poisson random variable that has expected value equal to $c$.
Conditional on $N$, the occurrence times, $X_1, X_2, \ldots, X_N$, have the same
distribution as the order statistics corresponding to $N$ independent
random variables with probability density $\alpha(x) I_{[0,T]}(x)$.  In order to
avoid boundary effects at the endpoints 0 and T, we assume a circular
design such that $\lambda_c(0) = \lambda_c(T)$, $\lambda_c'(0) = \lambda_c'(T)$ and $\lambda_c''(0) = \lambda_c''(T)$.  A circular
design implies that for $0 < r, s < T$, $\lambda_c(-r) = \lambda_c(T-r)$ and $\lambda_c(T+s) = \lambda_c(s)$.
Finally, under the simple multiplicative intensity model, we have
seen that letting $c \to \infty$ has the desirable effect of adding observations
everywhere on the interval $[0,T]$ without changing the relative shape of
the target function $\lambda_c(x)$ in the limiting process.  Since we will
consider the convergence properties for sequences of random variables,
we require the values of $c$ to come from the sequence of positive real
numbers $\{c_s\}_{s=1}^{\infty}$ such that $c_s/s \to \tau$ for some constant $\tau > 0$ as $s \to \infty$.  Note
that letting $s \to \infty$ also implies that $c_s \to \infty$.
The natural kernel estimate of $\lambda_c(x)$ is
$$\hat{\lambda}_h(x) = \sum_{i=1}^{N} K_h(x - X_i) \qquad \text{for } x \in [0,T].$$
$K(\cdot)$ is the kernel function, and $h$ is the bandwidth or smoothing
parameter.  The kernel estimator differs from the typical kernel
density estimator in two ways.  First, $\hat{\lambda}_h$ does not include a
normalization factor, $n^{-1}$, since $\int_r^t \lambda(x)$ is the expected number rather
than the expected proportion of observations between $r$ and $t$.  Second,
$N$ is a random variable in the intensity estimation setting.
For the sequence of intensity functions $\lambda_{c_s}(x) = c_s \alpha(x)$ indexed by $s$,
we construct a corresponding sequence of kernel estimators $\hat{\lambda}_h^s(x)$.  We
assume that as $c_s \to \infty$, $h \to 0$ such that $h c_s \to \infty$.  It follows from
Ramlau-Hansen (1983) that $\hat{\lambda}_h^s(x)$ is uniformly consistent and
asymptotically normal as $c_s \to \infty$, $h \to 0$ and $h c_s \to \infty$.
Our goal is to find a data based bandwidth that minimizes the
integrated square error (ISE) of $\hat{\lambda}_h$, where
$$\mathrm{ISE}_\lambda(h) = \int_0^T \hat{\lambda}_h^2 - 2 \int_0^T \hat{\lambda}_h \lambda + \int_0^T \lambda^2.$$
In the intensity estimation setting, the cross-validation score
function is defined as
$$\mathrm{CV}_\lambda(h) = \int_0^T \hat{\lambda}_h^2 - 2 \sum_{i=1}^{N} \hat{\lambda}_{hi}(X_i)$$
where $\hat{\lambda}_{hi}(x)$ is the leave-one-out estimator,
$$\hat{\lambda}_{hi}(x) = \sum_{j \neq i} h^{-1} K((x - X_j)/h).$$
$\sum_{i=1}^{N} \hat{\lambda}_{hi}(X_i)$ is a method of moments estimator of $\int_0^T \hat{\lambda}_h \lambda$, and since
$\int_0^T \lambda^2$ is independent of $h$, $\mathrm{CV}_\lambda(h)$ is a reasonable estimate of the terms
in $\mathrm{ISE}_\lambda(h)$ that depend on $h$.  Therefore, the bandwidth that minimizes
$\mathrm{CV}_\lambda(h)$ should be near the bandwidth that minimizes $\mathrm{ISE}_\lambda(h)$.
The mean integrated square error (MISE) is another error criterion
that is used to evaluate bandwidth selection procedures.  Define
$k = (\int u^2 K(u) du)/2$.  Assuming that $K$ is a probability density function
and that $\lambda$ has two continuous bounded derivatives,
$$\mathrm{MISE}_\lambda(h) = \int_0^T E[(\hat{\lambda}_h(x) - \lambda(x))^2]\, dx
= \int_0^T \mathrm{Var}[\hat{\lambda}_h(x)] + \{\mathrm{Bias}[\hat{\lambda}_h(x)]\}^2\, dx$$
$$= h^{-1} c_s \left(\int K^2\right) + h^4 c_s^2 k^2 \int_0^T (\alpha'')^2 + o(h^{-1} c_s + h^4 c_s^2)$$
as $c_s \to \infty$, $h \to 0$ and $h c_s \to \infty$.  Let $B(h)$ be the asymptotic mean integrated
square error (AMISE) of $\hat{\lambda}_h$; then,
$$B(h) = h^{-1} c_s \left(\int K^2\right) + h^4 c_s^2 k^2 \int_0^T (\alpha'')^2.$$
There are two interesting things to note about the AMISE of $\hat{\lambda}_h$.
First of all, the two terms of $B(h)$ are in fact the asymptotic
integrated variance and the asymptotic integrated squared bias.  As $h$
gets larger, there are fewer "squiggles" in the kernel estimate and
hence, the variance is lower; this is reflected in the first summand
where $h^{-1}$ decreases as $h$ increases.  On the other hand, as $h$ gets
larger, the kernel estimate is smoother and misses features of the true
intensity function.  This causes the bias to become larger, which is
seen in the second summand where $h^4$ increases along with $h$.  The second
point to note is that $B(h)$ is equal to the AMISE in the density
setting multiplied by $c_s^2$.  Thus, accounting for the difference of scale
between the density and intensity settings, the AMISE is the same in
both estimation situations.
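Minimizing $B(h)$ over $h$ makes this tradeoff explicit.  The following
short derivation is standard calculus under the stated assumptions (it
is not spelled out at this point in the text) and anticipates the form
$h^0 = a_0 c_s^{-1/5}$ used in section 2.c:
$$\frac{dB}{dh} = -h^{-2} c_s \int K^2 + 4 h^3 c_s^2 k^2 \int_0^T (\alpha'')^2 = 0
\quad \Longrightarrow \quad
h^0 = \left[\frac{\int K^2}{4 k^2 \int_0^T (\alpha'')^2}\right]^{1/5} c_s^{-1/5},$$
so the AMISE-optimal bandwidth shrinks at the rate $c_s^{-1/5}$, the analogue
of the familiar $n^{-1/5}$ rate in density estimation.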
It turns out that the squared bias of the kernel estimator in the
intensity setting is $c_s^2$ times the squared bias in the density setting.
However, the variances in the two settings are different.  One can show
that
$$\mathrm{Var}[\hat{\lambda}_h(x)] = E[\mathrm{Var}[\hat{\lambda}_h(x)|N]] + \mathrm{Var}[E[\hat{\lambda}_h(x)|N]]
= [K_h^2 * \lambda - c^{-1}(K_h * \lambda)^2] + c^{-1}(K_h * \lambda)^2 = K_h^2 * \lambda  \qquad (2.1)$$
where $*$ is the convolution operator.  Meanwhile, for the density
estimate $\hat{f}_h(x) = n^{-1} \sum_{i=1}^{n} K_h(x - X_i)$,
$$\mathrm{Var}[n \hat{f}_h(x)] = K_h^2 * nf - n^{-1}(K_h * nf)^2.$$
Note that the variance of $n\hat{f}_h$ is similar to the first two summands of
(2.1).  Since the number of observations coming from a Poisson process
is random, the variance of $\hat{\lambda}_h$ has an additional positive term,
$\mathrm{Var}[E[\hat{\lambda}_h(x)|N]]$.  Thus, the variance of the kernel estimator is larger
in the intensity setting.  However, the difference between the
variances is seen only in the second and higher order terms; hence, the
asymptotic variance is similar for $\hat{\lambda}_h$ and $n\hat{f}_h$.
Since the number of observations in $[0,T]$, $N$, will play an
important role in the proofs in this section, we state a few properties
of this random variable.  $N$ has a Poisson distribution with mean
$\int_0^T \lambda_{c_s}(x) dx$, and hence,
$$E(N) = \int_0^T \lambda_{c_s}(x) dx = c_s \int_0^T \alpha(x) dx = c_s.$$
As a result,
$$E[(N)_m] = E[N(N-1)(N-2)\cdots(N-m+1)] = c_s^m \quad \text{and} \quad
E[N^m] = c_s^m + o(c_s^m).  \qquad (2.2)$$
In addition, it is well known (e.g. Hoel, Port and Stone, 1971) that
given $N$, a Poisson random variable with mean $c_s$, then
$$\frac{N - c_s}{\sqrt{c_s}} \xrightarrow{D} N(0,1) \quad \text{as } s \to \infty
\text{ (or equivalently } c_s \to \infty).$$
This implies that $N/c_s = 1 + O_p(c_s^{-1/2})$; in other words,
$$N/c_s \xrightarrow{P} 1 \quad \text{as } s \to \infty.  \qquad (2.3)$$
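As a quick numerical illustration of (2.3) (an editorial aside, not
part of the original argument), the concentration of $N/c_s$ can be
checked by simulation:

    import numpy as np

    rng = np.random.default_rng(2)
    for c_s in [10**2, 10**4, 10**6]:
        N = rng.poisson(c_s, size=10_000)
        # N/c_s concentrates at 1 with spread of order c_s^{-1/2}.
        print(c_s, N.mean() / c_s, (N / c_s).std())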
2.b. Results.
Let $\hat{h}_0$ be any bandwidth that minimizes $\mathrm{ISE}_\lambda(h)$ and $\hat{h}_{cv}$ any
bandwidth that minimizes $\mathrm{CV}_\lambda(h)$ (these minima always exist since
$\mathrm{ISE}_\lambda(h)$ and $\mathrm{CV}_\lambda(h)$ are continuous and bounded functions).  In this
section, assume that:
a) The kernel function, $K(\cdot)$, is a compactly supported bounded
symmetric probability density function.  Without loss of
generality assume that $K(\cdot)$ is supported on $[-1,1]$.
b) The true intensity function, $\lambda$, has two continuous bounded
derivatives.
c) For each value of $s$, the bandwidths under consideration come
from a set $H_s$ where for some constants $\eta, \rho, \delta > 0$,
$$\#(H_s) = \{\text{the number of elements in } H_s\} \le c_s^{\rho}$$
and for $h \in H_s$, $\eta^{-1} c_s^{-1+\delta} \le h \le \eta c_s^{-\delta}$.
d) For some constant $\tau > 0$, $c_s/s \to \tau$ as $s \to \infty$.
Assumption b) is a common technical assumption which allows Taylor
expansion methods to be used for studying the error functions of $\hat{\lambda}_h(x)$.
Assumption c) can be weakened so that $H_s$ is a continuous interval by
using a continuity argument found in Hardle and Marron (1985).  This
set of possible bandwidths nearly covers the range of consistent
bandwidths.
Under these assumptions, the least-squares
cross-validation bandwidth is asymptotically optimal for kernel
intensity estimation.
That is, the performance, measured in terms of
the ISE, of the kernel estimator is asymptotically optimal when the
least-squares cross-validation bandwidth is used.
This result is
stated in Theorem 2.1.
Theorem 2.1: If assumptions a), b), c) and d) hold,
then under the simple multiplicative intensity model,
$$\mathrm{ISE}_\lambda(\hat{h}_{cv}) / \mathrm{ISE}_\lambda(\hat{h}_0) \to 1 \quad \text{a.s.} \quad \text{as } s \to \infty.$$
In order to prove Theorem 2.1, we use arguments similar to the
martingale methods employed by Hardle, Marron and Wand (1990) to prove
the asymptotic optimality of density derivatives.  This method is based
on a martingale inequality given by Burkholder (1973).  The theorem is
a direct result of the two following lemmas.  In both of these lemmas,
we assume the simple multiplicative intensity model and that
assumptions a), b), c) and d) hold.

Lemma 2.1: $\sup_{h \in H_s} |[\mathrm{ISE}_\lambda(h) - B(h)] / B(h)| \to 0$ a.s. as $s \to \infty$.

Lemma 2.2: $\sup_{h \in H_s} |[\mathrm{CV}_\lambda(h) - \mathrm{ISE}_\lambda(h) + \int_0^T \lambda^2 + 2 c_s^2 G] / B(h)| \to 0$ a.s. as
$s \to \infty$, where $G$ (defined in the proof of the lemma) does not depend on $h$.
Recall that $B(h)$ is the AMISE for $\hat{\lambda}_h$.  Thus, Lemma 2.1 states that
the ISE and MISE are essentially the same for large $s$.  If we let $h_0$ be
the bandwidth that minimizes $\mathrm{MISE}_\lambda(h)$, a straightforward consequence
of Theorem 2.1 is that $\hat{h}_{cv}$ is also asymptotically optimal with respect
to MISE in the sense that
$$\mathrm{MISE}_\lambda(\hat{h}_{cv}) / \mathrm{MISE}_\lambda(h_0) \to 1 \quad \text{as } s \to \infty.$$
Furthermore, Theorem 2.1 can be used to prove that the cross-validation
bandwidth is asymptotically equivalent to the minimum ISE
bandwidth.  However, we need to change assumption c) to
c*) For each value of $s$, the bandwidths under consideration come
from a set $H_s$ where for some constants $\rho, \delta, \epsilon, \eta > 0$,
$$\#(H_s) = \{\text{the number of elements in } H_s\} \le c_s^{\rho}$$
and for $h \in H_s$, $\epsilon c_s^{-1/5} \le h \le \eta c_s^{-1/5}$.
The second asymptotic optimality result for the cross-validation
bandwidth is given in Theorem 2.2.

Theorem 2.2: If assumptions a), b), c*) and d) hold, then under the
simple multiplicative intensity model,
$$\hat{h}_{cv} / \hat{h}_0 \to 1 \quad \text{in probability} \quad \text{as } s \to \infty.$$
The proofs of both theorems and the two lemmas are given in the next
section.
2.c. Proofs.
We will begin with the proof of Theorem 2.1, which states that the
ISE of $\hat{\lambda}_h$ with the cross-validation bandwidth is asymptotically
equivalent to the minimum ISE.

Proof of Theorem 2.1:
Lemma 2.1 says that the ISE and the asymptotic MISE of $\hat{\lambda}_h(x)$ are
asymptotically equivalent as $s \to \infty$; thus we get that:
$$\mathrm{ISE}_\lambda(h) = B(h) + o(B(h)).$$
Combining Lemma 2.2 and Lemma 2.1 implies that as $s \to \infty$,
$$\sup_{h \in H_s} \left| \left[\mathrm{CV}_\lambda(h) - \mathrm{ISE}_\lambda(h) + \int_0^T \lambda^2 + 2 c_s^2 G\right] / \mathrm{ISE}_\lambda(h) \right| \to 0 \quad \text{a.s.}  \qquad (2.4)$$
Since $\hat{h}_0$ minimizes $\mathrm{ISE}_\lambda(h)$ and $\hat{h}_{cv}$ minimizes $\mathrm{CV}_\lambda(h)$,
$$\mathrm{ISE}_\lambda(\hat{h}_0) \le \mathrm{ISE}_\lambda(\hat{h}_{cv}) \quad \text{and} \quad \mathrm{CV}_\lambda(\hat{h}_{cv}) \le \mathrm{CV}_\lambda(\hat{h}_0).  \qquad (2.5)$$
As a result of (2.5), as $s \to \infty$,
$$|\mathrm{ISE}_\lambda(\hat{h}_0)/\mathrm{ISE}_\lambda(\hat{h}_{cv}) - 1| = |[\mathrm{ISE}_\lambda(\hat{h}_0) - \mathrm{ISE}_\lambda(\hat{h}_{cv})] / \mathrm{ISE}_\lambda(\hat{h}_{cv})|$$
$$\le |[\mathrm{ISE}_\lambda(\hat{h}_{cv}) - \mathrm{ISE}_\lambda(\hat{h}_0) + \mathrm{CV}_\lambda(\hat{h}_0) - \mathrm{CV}_\lambda(\hat{h}_{cv})] / \mathrm{ISE}_\lambda(\hat{h}_{cv})|$$
$$\le 2\,|[\mathrm{CV}_\lambda(\hat{h}_0) - \mathrm{ISE}_\lambda(\hat{h}_0) - \mathrm{CV}_\lambda(\hat{h}_{cv}) + \mathrm{ISE}_\lambda(\hat{h}_{cv})] / [\mathrm{ISE}_\lambda(\hat{h}_0) + \mathrm{ISE}_\lambda(\hat{h}_{cv})]|.  \qquad (2.6)$$
From (2.4) and (2.6), we can conclude that
$$|\mathrm{ISE}_\lambda(\hat{h}_0)/\mathrm{ISE}_\lambda(\hat{h}_{cv}) - 1| \to 0 \quad \text{a.s.} \quad \text{as } s \to \infty,$$
and hence,
$$\mathrm{ISE}_\lambda(\hat{h}_{cv})/\mathrm{ISE}_\lambda(\hat{h}_0) \to 1 \quad \text{a.s.} \quad \text{as } s \to \infty.$$
This completes the proof of Theorem 2.1. **
It is now necessary to prove Lemma 2.1 and Lemma 2.2.  The proofs
of the two lemmas are given below in full detail.
Proof of Lemma 2.2:
Let $g: (1,2,\ldots,N) \to (1,2,\ldots,N)$ be a random permutation of the
numbers $1,2,\ldots,N$.  Define $Y_i = X_{g(i)}$.  Essentially, the $Y_i$'s are the
"unordered" $X_i$'s.  Since the $X_i$'s are observations from a Poisson
process with intensity $\lambda_{c_s}(x)$, then conditional on $N$, the $Y_i$'s are i.i.d.
random variables with density $\alpha(x) I_{[0,T]}(x)$.  Therefore, the methods
developed by Hardle, Marron and Wand (1990) for kernel density
estimates can be applied to $\alpha(x) I_{[0,T]}(x)$.
We begin by introducing some notation.  Define:
$$\hat{\alpha}_h(x) = c_s^{-1} \sum_{i=1}^{N} K_h(x - X_i) = c_s^{-1} \hat{\lambda}_h(x),$$
$$\hat{\alpha}_{hi}(x) = c_s^{-1} \sum_{j \neq i} K_h(x - X_j) = c_s^{-1} \hat{\lambda}_{hi}(x),$$
$$\widehat{\mathrm{CV}}(h) = \int_0^T \hat{\alpha}_h^2 - 2 c_s^{-1} \sum_{i=1}^{N} \hat{\alpha}_{hi}(X_i) = c_s^{-2}\, \mathrm{CV}_\lambda(h).$$
These are the typical definitions for kernel density estimates of $\alpha(x)$
except that $N$ is a random variable and $c_s^{-1}$ is used instead of $n^{-1}$.  As
a result, the ISE for $\hat{\alpha}_h$ is
$$\mathrm{ISE}_\alpha(h) = \int_0^T [\hat{\alpha}_h(x) - \alpha(x)]^2 dx = c_s^{-2}\, \mathrm{ISE}_\lambda(h).$$
Moreover, the MISE of $\hat{\alpha}_h(x)$ is
$$M(h) = h^{-1} c_s^{-1} \left(\int K^2\right) + h^4 k^2 \int_0^T (\alpha''(x))^2 + o(h^{-1} c_s^{-1} + h^4),$$
and therefore, the asymptotic MISE of $\hat{\alpha}_h$ is
$$A(h) = h^{-1} c_s^{-1} \left(\int K^2\right) + h^4 k^2 \int_0^T (\alpha''(x))^2 = c_s^{-2} B(h).$$
Finally, define:
$$U_{ij}(h) = K_h(Y_i - Y_j),$$
$$V_i(h) = \int K_h(Y_i - y)\alpha(y)dy - \int\!\!\int K_h(y - z)\alpha(y)\alpha(z)dydz
- \alpha(Y_i) + \int_0^T \alpha^2(y)dy,$$
$$W_{ij}(h) = K_h(Y_i - Y_j) - \int K_h(y - Y_j)\alpha(y)dy - \int K_h(Y_i - y)\alpha(y)dy
+ \int\!\!\int K_h(y - z)\alpha(y)\alpha(z)dydz,$$
$$R_i(h) = [(N-1)c_s^{-1} - 1]\left[2\int K_h(Y_i - y)\alpha(y)dy
- \int\!\!\int K_h(y - z)\alpha(y)\alpha(z)dydz - 2\alpha(Y_i) + \int_0^T \alpha^2(y)dy\right].$$
Note that for $i,j = 1,2,\ldots,N$: $E(V_i|N) = 0$, $W_{ij} = W_{ji}$, and $E(W_{ij}|Y_i, N) = 0$.
If we sum over all of the data points, we can replace $Y_i$ (the
unordered observations) by $X_i$ (the ordered observations).  Therefore,
substituting in the definitions of $V_i$, $W_{ij}$ and $R_i$ and recalling that we
are working with a circular design,
$$c_s^{-1} \sum_{i=1}^{N} V_i + c_s^{-2} \sum_{i=1}^{N} \sum_{j \neq i} W_{ij} + c_s^{-1} \sum_{i=1}^{N} R_i$$
$$= c_s^{-2} \sum_{i=1}^{N} \sum_{j \neq i} K_h(X_i - X_j)
- c_s^{-1} \sum_{i=1}^{N} \left[\int K_h(X_i - x)\alpha(x)dx\right]$$
$$\quad - [2(N-1)c_s^{-1} - 1]\, c_s^{-1} \sum_{i=1}^{N} \alpha(X_i)
+ [N(N-1)c_s^{-2}] \int_0^T \alpha^2(x)dx.$$
That is,
$$c_s^{-1} \sum_{i=1}^{N} V_i + c_s^{-2} \sum_{i=1}^{N} \sum_{j \neq i} W_{ij} + c_s^{-1} \sum_{i=1}^{N} R_i
= c_s^{-1} \sum_{i=1}^{N} \hat{\alpha}_{hi}(X_i) - \int_0^T \hat{\alpha}_h \alpha - G  \qquad (2.7)$$
where
$$G = [2(N-1)c_s^{-1} - 1]\, c_s^{-1} \sum_{i=1}^{N} \alpha(X_i) - [N(N-1)c_s^{-2}] \int_0^T \alpha^2(x)dx.$$
Toward the end of this proof, we will show that Lemma 2.2 follows
directly from:
$$\sup_{h \in H_s} \left| \left[c_s^{-1} \sum_{i=1}^{N} \hat{\alpha}_{hi}(X_i) - \int_0^T \hat{\alpha}_h \alpha - G\right] / A(h) \right| \to 0 \quad \text{a.s.} \quad \text{as } s \to \infty.  \qquad (2.8)$$
By (2.7), it follows that statement (2.8) holds when (2.9), (2.10)
and (2.11) hold:
$$\sup_{h \in H_s} \left| c_s^{-1} \sum_{i=1}^{N} V_i / A(h) \right| \to 0 \quad \text{a.s.}  \qquad (2.9)$$
$$\sup_{h \in H_s} \left| c_s^{-2} \sum_{i=1}^{N} \sum_{j \neq i} W_{ij} / A(h) \right| \to 0 \quad \text{a.s.}  \qquad (2.10)$$
$$\sup_{h \in H_s} \left| c_s^{-1} \sum_{i=1}^{N} R_i / A(h) \right| \to 0 \quad \text{a.s.}  \qquad (2.11)$$
Thus, in order to prove Lemma 2.2 it is sufficient to prove (2.9),
(2.10) and (2.11).  Observe that the $V_i$ and $W_{ij}$ terms are similar
to the terms that arise in the density estimation setting except that
the number of observations is random.  The $R_i$ term is an additional
term that must be included because we are considering data from a
Poisson process.
We begin by proving (2.9).  Let $k < N$.  Then, one can easily show
that
1) $\sigma\{Y_1,\ldots,Y_k\} \subset \sigma\{Y_1,\ldots,Y_{k+1}\}$;
2) $\sum_{i=1}^{k} V_i$ is measurable w.r.t. $\sigma\{Y_1,\ldots,Y_k\}$;
3) $E\left[\sum_{i=1}^{k+1} V_i \mid \sigma\{Y_1,\ldots,Y_k\}, N\right]
= \sum_{i=1}^{k} V_i + E[V_{k+1} \mid \sigma\{Y_1,\ldots,Y_k\}, N]
= \sum_{i=1}^{k} V_i + E[V_{k+1} \mid N] = \sum_{i=1}^{k} V_i$.
Thus, conditional on $N$, $\{\sum_{i=1}^{k} V_i\}_{k=1}^{N}$ is a martingale with respect to
the $\sigma$-fields generated by $\{Y_1, Y_2, \ldots, Y_k\}$.
Applying a martingale inequality given by Burkholder (1973, p. 40)
to $\{\sum_{i=1}^{k} V_i\}_{k=1}^{N}$ implies that
$$E\left[\left(\sup_{k=1,\ldots,N} \sum_{i=1}^{k} V_i\right)^{2m} \Big| N\right]
\le a\, E\left[\left(\sum_{i=1}^{N} E[V_i^2 | N]\right)^{m} \Big| N\right]
+ a \sum_{k=1}^{\infty} E[|V_k|^{2m} | N].  \qquad (2.12)$$
First, we find the order of the two summands on the right side of
the above inequality.  Then, these expressions will be substituted back
into the inequality.  Let $a$ be a generic constant throughout this
discussion and let $a_m$ be a generic constant that depends on $m$.  Since
the squared bias of $\hat{\alpha}_h$ has order $h^4$, it follows that
$$E\left[\left(\sum_{i=1}^{N} E(V_i^2 | N)\right)^m \Big| N\right]
\le E[(N a h^4)^m | N] \le a_m N^m h^{4m}.  \qquad (2.13)$$
Moreover,
$$\sum_{k=1}^{\infty} E[|V_k|^{2m} | N] \le E\left[\sum_{k=1}^{N} |V_k|^{2m} \Big| N\right] \le a_m N.  \qquad (2.14)$$
Substituting (2.13) and (2.14) into Burkholder's inequality gives
$$E\left[c_s^{-1} \sum_{i=1}^{N} V_i\right]^{2m}
\le c_s^{-2m} E\left[\sup_{k \le N} \left(\sum_{i=1}^{k} V_i\right)^{2m}\right]
= c_s^{-2m} E\left[E\left[\left(\sup_{k=1,\ldots,N} \sum_{i=1}^{k} V_i\right)^{2m} \Big| N\right]\right]$$
$$\le c_s^{-2m} E\left[a\, E\left[\left(\sum_{i=1}^{N} E[V_i^2|N]\right)^m \Big| N\right]
+ a \sum_{k=1}^{\infty} E[|V_k|^{2m}|N]\right] \qquad \text{by (2.12)}$$
$$\le c_s^{-2m} E[a_m N^m h^{4m} + a_m N] \qquad \text{by (2.13) and (2.14)}$$
$$= a_m c_s^{-2m} [h^{4m} E(N^m) + E(N)]
= a_m c_s^{-2m} [h^{4m}(c_s^m + o(c_s^m)) + c_s] \qquad \text{by (2.2)}$$
$$\le a_m (c_s^{-m} h^{4m} + c_s^{-2m+1}).  \qquad (2.15)$$
Recall that $A(h)$ is the AMISE of $\hat{\alpha}_h$, and that $A(h) \sim c_s^{-1} h^{-1} + h^4$.
Then, by (2.15), for $h \in H_s$,
$$E\left[c_s^{-1} \sum_{i=1}^{N} V_i / A(h)\right]^{2m} \le a_m [h^m + c_s h^{2m}] \le a_m c_s^{-\gamma m}  \qquad (2.16)$$
for some $\gamma > 0$ and $m$ sufficiently large.  Since (2.16) holds for an
arbitrary $h \in H_s$, we can conclude that
$$\sup_{h \in H_s} \left\{ E\left[c_s^{-1} \sum_{i=1}^{N} V_i / A(h)\right]^{2m} \right\} \le a_m c_s^{-\gamma m}.  \qquad (2.17)$$
Using Chebyshev's inequality on (2.17), we get
$$\sup_{h \in H_s} P\left[\left|c_s^{-1} \sum_{i=1}^{N} V_i\right| / A(h) > c_s^{-\gamma/4}\right] \le a_m c_s^{-(\gamma/2)m}.  \qquad (2.18)$$
Recall that by assumption c), $\#(H_s) \le c_s^{\rho}$.  In addition, choose $m$ such
that $m > 2(\rho+2)/\gamma$.  Let $K_1$ and $r$ be finite constants.  Then, given a
sequence $\{c_s\}_{s=1}^{\infty}$ that satisfies assumption d),
$$\sum_{s=1}^{\infty} \#(H_s) \sup_{h \in H_s} P\left[\left|c_s^{-1} \sum_{i=1}^{N} V_i\right| > \epsilon A(h)\right]$$
$$\le \sum_{s=1}^{\infty} \#(H_s) \sup_{h \in H_s} P\left[\left|c_s^{-1} \sum_{i=1}^{N} V_i\right| > \min(\epsilon,\, c_s^{-\gamma/4})\, A(h)\right]$$
$$\le \sum_{s=1}^{r} \#(H_s) \sup_{h \in H_s} P\left[\left|c_s^{-1} \sum_{i=1}^{N} V_i\right| > \epsilon A(h)\right]
+ \sum_{s=r+1}^{\infty} \#(H_s) \sup_{h \in H_s} P\left[\left|c_s^{-1} \sum_{i=1}^{N} V_i\right| > c_s^{-\gamma/4} A(h)\right]$$
$$\le K_1 + \sum_{s=r+1}^{\infty} \#(H_s)\, a_m c_s^{-(\gamma/2)m} \qquad \text{by (2.18)}$$
$$\le K_1 + \sum_{s=r+1}^{\infty} c_s^{\rho}\, a_m c_s^{-(\rho+2)} \qquad \text{since } m > 2(\rho+2)/\gamma$$
$$= K_1 + a_m \sum_{s=r+1}^{\infty} c_s^{-2} < \infty \qquad \text{by assumption d)}.$$
Hence,
$$\sum_{s=1}^{\infty} \#(H_s) \sup_{h \in H_s} P\left[\left|c_s^{-1} \sum_{i=1}^{N} V_i\right| > \epsilon A(h)\right] < \infty.  \qquad (2.19)$$
(2.19) leads to
$$\sum_{s=1}^{\infty} P\left[\sup_{h \in H_s} \left\{\left|c_s^{-1} \sum_{i=1}^{N} V_i\right| / A(h)\right\} > \epsilon\right]
\le \sum_{s=1}^{\infty} \sum_{h \in H_s} P\left[\left|c_s^{-1} \sum_{i=1}^{N} V_i\right| / A(h) > \epsilon\right]$$
$$\le \sum_{s=1}^{\infty} \#(H_s) \sup_{h \in H_s} P\left[\left|c_s^{-1} \sum_{i=1}^{N} V_i\right| / A(h) > \epsilon\right] < \infty.  \qquad (2.20)$$
Using the Borel-Cantelli Lemma with (2.20), we can conclude that
$$P\left[\sup_{h \in H_s} \left\{\left|c_s^{-1} \sum_{i=1}^{N} V_i\right| / A(h)\right\} > \epsilon \ \text{for infinitely many } s\right] = 0.$$
Consequently, for the sequence $\{c_s\}$,
$$\sup_{h \in H_s} \left|c_s^{-1} \sum_{i=1}^{N} V_i / A(h)\right| \to 0 \quad \text{a.s.} \quad \text{as } s \to \infty.  \qquad (2.21)$$
Thus, we've proven statement (2.9).
Following a similar procedure, we will now prove statement (2.10).
For $k < N$:
1) $\sum_{i=1}^{k} \sum_{j=1}^{i-1} W_{ij}$ is measurable w.r.t. $\sigma\{Y_1,\ldots,Y_k\}$;
2) $E\left[\sum_{i=1}^{k+1} \sum_{j=1}^{i-1} W_{ij} \mid \sigma\{Y_1,\ldots,Y_k\}, N\right]
= \sum_{i=1}^{k} \sum_{j=1}^{i-1} W_{ij} + E\left[\sum_{j=1}^{k} W_{k+1,j} \mid \sigma\{Y_1,\ldots,Y_k\}, N\right]
= \sum_{i=1}^{k} \sum_{j=1}^{i-1} W_{ij}$.
Hence, conditional on $N$, $\{\sum_{i=1}^{k} \sum_{j=1}^{i-1} W_{ij}\}_{k=1}^{N}$ is a martingale with respect
to the $\sigma$-fields generated by $\{Y_1, Y_2, \ldots, Y_k\}$.
Applying Burkholder's inequality to the martingale $\{\sum_{i=1}^{k} \sum_{j=1}^{i-1} W_{ij}\}_{k=1}^{N}$
gives
$$E\left[\left(\sup_{k=1,\ldots,N} \sum_{i=1}^{k} \sum_{j=1}^{i-1} W_{ij}\right)^{2m} \Big| N\right]
\le a\, E\left[\left\{\sum_{i=1}^{N} E\left[\left(\sum_{j=1}^{i-1} W_{ij}\right)^2 \Big| Y_1, Y_2, \ldots, Y_{i-1}, N\right]\right\}^m \Big| N\right]
+ a \sum_{k=1}^{\infty} E\left[\left(\sum_{j=1}^{k-1} W_{kj}\right)^{2m} \Big| N\right].  \qquad (2.22)$$
Now, consider the two terms on the right side of the inequality.
Observe that for $i \neq j \neq k$,
$$E[W_{ij}^2 | N] \le a h^{-1}, \quad \text{and} \quad
E(W_{ij} W_{ik} | N) = E[E(W_{ij} W_{ik} | Y_i, N) | N]
= E[E(W_{ij} | Y_i, N) E(W_{ik} | Y_i, N) | N] = 0.$$
Thus, for the first term of Burkholder's inequality, we can show that
$$E\left[\left\{\sum_{i=1}^{N} E\left[\left(\sum_{j=1}^{i-1} W_{ij}\right)^2 \Big| Y_1, \ldots, Y_{i-1}, N\right]\right\}^m \Big| N\right]
\le a_m N^{2m} h^{-m}.  \qquad (2.23)$$
For the second term of the inequality, we use Lemma 2.3, which is stated
and proven below, to show that
$$\sum_{k=1}^{\infty} E\left[\left(\sum_{j=1}^{k-1} W_{kj}\right)^{2m} \Big| N\right]
= E\left[\sum_{k=1}^{N} \left(\sum_{j=1}^{k-1} W_{kj}\right)^{2m} \Big| N\right]
\le a_m N^{m+1} h^{-2m+1}.  \qquad (2.24)$$
Lemma 2.3: For any positive integer $m$ and $W_{kj}$ defined as above,
$$E\left[\sum_{k=1}^{N} \left(\sum_{j=1}^{k-1} W_{kj}\right)^{2m} \Big| N\right] \le a_m N^{m+1} h^{-2m+1}.$$

Proof of Lemma 2.3:
Expanding the sums gives
$$E\left[\sum_{k=1}^{N} \left(\sum_{j=1}^{k-1} W_{kj}\right)^{2m} \Big| N\right]
\le \sum_{k=1}^{N} E\left[\left(\sum_{j=1}^{N} W_{kj}\right)^{2m} \Big| N\right]
= \sum_{k=1}^{N} \left[\sum_{j_1=1}^{N} \cdots \sum_{j_{2m}=1}^{N} E(W_{kj_1} W_{kj_2} \cdots W_{kj_{2m}} | N)\right].$$
For each summand $E(W_{kj_1} W_{kj_2} \cdots W_{kj_{2m}} | N)$, the set of indices
$(j_1, j_2, \ldots, j_{2m})$ takes values from the set $(1,2,\ldots,N)^{2m}$.  Define the
function $f$ such that
$$f(j_1, \ldots, j_{2m}) = \{\text{the number of distinct values in } (j_1, \ldots, j_{2m})\}.$$
Thus, the summation can be written in terms of $f(\cdot)$.  Note also that
$E(W_{kj_1} W_{kj_2} \cdots W_{kj_{2m}} | N) = 0$ if there exists some $\ell$ such that $j_\ell \neq j_i$ for
all $i \neq \ell$.  Then,
$$E\left[\sum_{k=1}^{N} \left(\sum_{j=1}^{k-1} W_{kj}\right)^{2m} \Big| N\right]
\le N \left[\sum_{r=1}^{2m} \sum_{\{(j_1,\ldots,j_{2m}):\, f(\cdot)=r\}} E(W_{kj_1} \cdots W_{kj_{2m}} | N)\right]$$
$$= N \left[\sum_{r=1}^{m} \sum_{\{(j_1,\ldots,j_{2m}):\, f(\cdot)=r\}} E(W_{kj_1} \cdots W_{kj_{2m}} | N)\right] + N \sum_{r=m+1}^{2m} 0
\le N \left[\sum_{r=1}^{m} \sum_{\{(j_1,\ldots,j_{2m}):\, f(\cdot)=r\}} a h^{-(2m-1)}\right].$$
For $r \le m$, let $\{i_1, \ldots, i_r\}$ be a set of $r$ distinct numbers such that
$i_\ell \in \{1,2,\ldots,N\}$ for each $\ell$.  Thus, the number of sets $\{i_1, \ldots, i_r\}$ is
less than or equal to $N^r$.  Moreover, given a set $\{i_1, \ldots, i_r\}$, the
number of sets $(j_1, \ldots, j_{2m})$ such that $j_\ell \in \{i_1, \ldots, i_r\}$ for all
$\ell = 1, 2, \ldots, 2m$ is less than or equal to $r^{2m}$.  Thus, the number of sets
$(j_1, \ldots, j_{2m})$ such that $f(\cdot) = r$ is less than or equal to $N^r r^{2m}$.
Hence, one gets the final result:
$$E\left[\sum_{k=1}^{N} \left(\sum_{j=1}^{k-1} W_{kj}\right)^{2m} \Big| N\right]
\le N \sum_{r=1}^{m} N^r r^{2m}\, a h^{-(2m-1)} \le a_m N^{m+1} h^{-2m+1}.$$
Thus, we have proven Lemma 2.3. **
Now, we return to the proof of Lemma 2.2.  We substitute the
results (2.23) and (2.24) into Burkholder's inequality (2.22):
$$E\left[c_s^{-2} \sum_{i=1}^{N} \sum_{j=1}^{i-1} W_{ij}\right]^{2m}
\le c_s^{-4m} E\left[E\left[\left(\sup_{k=1,\ldots,N} \sum_{i=1}^{k} \sum_{j=1}^{i-1} W_{ij}\right)^{2m} \Big| N\right]\right]$$
$$\le c_s^{-4m} E[a_m N^{2m} h^{-m} + a_m N^{m+1} h^{-2m+1}]
\le a_m (c_s^{-2m} h^{-m} + c_s^{-3m+1} h^{-2m+1}).$$
Therefore, it follows that for some $\gamma > 0$ and $m$ sufficiently large, and
for $h \in H_s$,
$$E\left[c_s^{-2} \sum_{i=1}^{N} \sum_{j=1}^{i-1} W_{ij} / A(h)\right]^{2m}
\le \frac{a_m (c_s^{-2m} h^{-m} + c_s^{-3m+1} h^{-2m+1})}{(c_s^{-1} h^{-1} + h^4)^{2m}}
\le a_m [h^m + c_s^{-m+1} h] \le a_m c_s^{-\gamma m}.  \qquad (2.25)$$
Similar to the argument used for $\sum V_i$, we can conclude from (2.25) that
$$\sup_{h \in H_s} \left| c_s^{-2} \sum_{i=1}^{N} \sum_{j \neq i} W_{ij} / A(h) \right| \to 0 \quad \text{a.s.}$$
This proves statement (2.10).
Finally, consider (2.11), the term involving $R_i$.  Write
$c_s^{-1} \sum_{i=1}^{N} R_i = [(N-1)c_s^{-1} - 1]\, F_2$, where $F_2$ denotes the second factor
$$F_2 = c_s^{-1} \sum_{i=1}^{N} \left[2\int K_h(Y_i - y)\alpha(y)dy
- \int\!\!\int K_h(y - z)\alpha(y)\alpha(z)dydz - 2\alpha(Y_i) + \int_0^T \alpha^2\right]$$
$$= 2 c_s^{-1} \sum_{i=1}^{N} \left[\int K_h(Y_i - y)\alpha(y)dy - \alpha(Y_i)\right]
- N c_s^{-1} \left[\int\!\!\int_0^T K_h(y - z)\alpha(y)\alpha(z)dydz - \int_0^T \alpha^2\right].$$
Using Taylor expansions on $F_2$, one can show that as $s \to \infty$,
$$F_2 = h^2 k \left\{2 c_s^{-1} \sum_{i=1}^{N} [\alpha''(Y_i)] - N c_s^{-1} \int_0^T \alpha'' \alpha\right\} + o_p(h^2).  \qquad (2.26)$$
Note that since $h \in H_s$, $s \to \infty$ implies that $c_s \to \infty$ as well as $h \to 0$.
Moreover, by assumption b), $\alpha$ and $\alpha''$ are bounded; thus, we get
$$E\left[c_s^{-1} \sum_{i=1}^{N} \alpha''(Y_i)\right] \to \int_0^T \alpha'' \alpha \quad \text{as } s \to \infty,  \qquad (2.27)$$
and
$$\mathrm{Var}\left[c_s^{-1} \sum_{i=1}^{N} \alpha''(Y_i)\right] \to 0 \quad \text{as } s \to \infty;  \qquad (2.28)$$
that is, $c_s^{-1} \sum_{i=1}^{N} \alpha''(Y_i)$ converges to $\int_0^T \alpha'' \alpha$ in $L_2$.  Therefore, (2.27) and
(2.28) imply that as $s \to \infty$,
$$c_s^{-1} \sum_{i=1}^{N} [\alpha''(Y_i)] \xrightarrow{P} \int_0^T \alpha'' \alpha.  \qquad (2.29)$$
Since $N c_s^{-1} \xrightarrow{P} 1$, we can also conclude that as $s \to \infty$,
$$N c_s^{-1} \int_0^T \alpha'' \alpha \xrightarrow{P} \int_0^T \alpha'' \alpha.  \qquad (2.30)$$
Combining (2.29) and (2.30) with (2.26) leads to the result that
$$h^{-2} F_2 \xrightarrow{P} k \left\{\int_0^T \alpha'' \alpha\right\} \quad \text{as } s \to \infty,$$
and hence, for $0 < \epsilon < \delta/2$,
$$c_s^{-\epsilon} h^{-2} F_2 \xrightarrow{P} 0 \quad \text{as } s \to \infty.  \qquad (2.31)$$
Furthermore, since $N$ is Poisson($c_s$), it follows that as $s \to \infty$,
$$c_s^{1/2} [(N-1)/c_s - 1] \xrightarrow{d} N(0,1).  \qquad (2.32)$$
Using the definition of $R_i$ along with (2.31) and (2.32), we can use
Slutsky's theorem to conclude that
$$c_s^{1/2-\epsilon} h^{-2} \left(c_s^{-1} \sum_{i=1}^{N} R_i\right) \xrightarrow{P} 0 \quad \text{as } s \to \infty.  \qquad (2.33)$$
It can be shown that for all positive integers $m$,
$[c_s^{1/2-\epsilon} h^{-2} (c_s^{-1} \sum_{i=1}^{N} R_i)] \in L_{2m}$ and
$\{[c_s^{1/2-\epsilon} h^{-2} (c_s^{-1} \sum_{i=1}^{N} R_i)]^{2m} : s = 1, 2, \ldots\}$
is a uniformly integrable family.  Hence, it follows from (2.33) that
$$c_s^{m-2\epsilon m} h^{-4m}\, E\left[c_s^{-1} \sum_{i=1}^{N} R_i\right]^{2m} \to 0 \quad \text{as } s \to \infty.$$
Thus, there exists some $c_0 > 0$ such that for $c_s > c_0$,
$$E\left[c_s^{-1} \sum_{i=1}^{N} R_i\right]^{2m} \le a_m c_s^{-m+2\epsilon m} h^{4m}.$$
Since $h \in H_s$, then for $c_s > c_0$,
$$E\left[c_s^{-1} \sum_{i=1}^{N} R_i / A(h)\right]^{2m} \le a_m c_s^{-\gamma m}  \qquad (2.34)$$
for $m$ sufficiently large and some $\gamma > 0$.  (2.11) follows from (2.34) as
seen in the proof of statement (2.9).
At this point, we collect the above results to show that (2.8)
holds.  By (2.7), (2.9), (2.10) and (2.11), it follows that
$$\lim_{s \to \infty} \sup_{h \in H_s} \left| \left(c_s^{-1} \sum_{i=1}^{N} \hat{\alpha}_{hi}(X_i) - \int_0^T \hat{\alpha}_h \alpha - G\right) / A(h) \right|$$
$$= \lim_{s \to \infty} \sup_{h \in H_s} \left\{ \left| \left(c_s^{-1} \sum_{i=1}^{N} V_i
+ c_s^{-2} \sum_{i=1}^{N} \sum_{j \neq i} W_{ij} + c_s^{-1} \sum_{i=1}^{N} R_i\right) / A(h) \right| \right\}$$
$$\le \lim_{s \to \infty} \sup_{h \in H_s} \left|c_s^{-1} \sum_{i=1}^{N} V_i / A(h)\right|
+ \lim_{s \to \infty} \sup_{h \in H_s} \left|c_s^{-2} \sum_{i=1}^{N} \sum_{j \neq i} W_{ij} / A(h)\right|
+ \lim_{s \to \infty} \sup_{h \in H_s} \left|c_s^{-1} \sum_{i=1}^{N} R_i / A(h)\right| = 0 \quad \text{a.s.}$$
This limit is the same as (2.8).  Since
$$\widehat{\mathrm{CV}}(h) - \mathrm{ISE}_\alpha(h) + \int_0^T \alpha^2 + 2G
= -2\left[c_s^{-1} \sum_{i=1}^{N} \hat{\alpha}_{hi}(X_i) - \int_0^T \hat{\alpha}_h \alpha - G\right],$$
(2.8) implies the final result of Lemma 2.2:
$$\sup_{h \in H_s} \left| \left[\mathrm{CV}_\lambda(h) - \mathrm{ISE}_\lambda(h) + \int_0^T \lambda^2 + 2 c_s^2 G\right] / B(h) \right| \to 0 \quad \text{a.s.}$$
This completes the proof of Lemma 2.2. **
Proof of Lemma 2.1:
Let $Y_1, Y_2, \ldots, Y_N$ be the "unordered" observations as in Lemma 2.2.
From the definitions of $\mathrm{ISE}_\alpha(h)$ and $M(h)$ and the fact that we are using a
circular design, we get that, with $\bar{\alpha}_h = K_h * \alpha$,
$$\mathrm{ISE}_\alpha(h) - M(h) = \int_0^T [\hat{\alpha}_h(y) - \alpha(y)]^2 dy - \int_0^T E[\hat{\alpha}_h(y) - \alpha(y)]^2 dy$$
$$= \int_0^T [\hat{\alpha}_h(y) - \bar{\alpha}_h(y)]^2 dy
+ 2\int_0^T [\hat{\alpha}_h(y) - \bar{\alpha}_h(y)][\bar{\alpha}_h(y) - \alpha(y)] dy
- \int_0^T E[\hat{\alpha}_h(y) - \bar{\alpha}_h(y)]^2 dy$$
$$= c_s^{-2} \sum_{i=1}^{N} \sum_{j \neq i} \int K_h(y - Y_i) K_h(y - Y_j) dy
+ c_s^{-2} \sum_{i=1}^{N} \int [K_h(y - Y_i)]^2 dy$$
$$\quad - 2 c_s^{-1} \sum_{i=1}^{N} \int\!\!\int K_h(y - Y_i) K_h(y - x)\alpha(x) dx dy
+ \int\!\!\int\!\!\int K_h(y - x) K_h(y - z)\alpha(x)\alpha(z) dx dz dy$$
$$\quad + 2\left[c_s^{-1} \sum_{i=1}^{N} \int\!\!\int K_h(y - Y_i) K_h(y - x)\alpha(x) dx dy
- c_s^{-1} \sum_{i=1}^{N} \int K_h(y - Y_i)\alpha(y) dy \right.$$
$$\qquad \left. - \int\!\!\int\!\!\int K_h(y - x) K_h(y - z)\alpha(x)\alpha(z) dx dz dy
+ \int\!\!\int K_h(y - x)\alpha(x)\alpha(y) dx dy\right]$$
$$\quad - c_s^{-1} \int\!\!\int [K_h(x - y)]^2 \alpha(x) dx dy.$$
Define:
$$V_i = \int\!\!\int K_h(y - Y_i) K_h(y - x)\alpha(x) dx dy - \int K_h(y - Y_i)\alpha(y) dy$$
$$\qquad - \int\!\!\int\!\!\int K_h(y - x) K_h(y - z)\alpha(x)\alpha(z) dx dz dy
+ \int\!\!\int K_h(y - x)\alpha(x)\alpha(y) dx dy,$$
$$T_i = c_s^{-1} \int [K_h(Y_i - y)]^2 dy - c_s^{-1} \int\!\!\int [K_h(x - y)]^2 \alpha(x) dx dy,$$
$$W_{ij} = \int K_h(y - Y_i) K_h(y - Y_j) dy
- \int\!\!\int K_h(y - Y_i) K_h(y - x)\alpha(x) dx dy$$
$$\qquad - \int\!\!\int K_h(y - x) K_h(y - Y_j)\alpha(x) dx dy
+ \int\!\!\int\!\!\int K_h(y - x) K_h(y - z)\alpha(x)\alpha(z) dx dz dy,$$
$$R = -[N(N-1)/c_s^2 - 2N/c_s + 1] \int\!\!\int\!\!\int K_h(y - x) K_h(y - z)\alpha(x)\alpha(z) dx dz dy$$
$$\quad + 2[N/c_s - 1]\left[c_s^{-1} \sum_{i=1}^{N} \int\!\!\int K_h(y - Y_i) K_h(y - x)\alpha(x) dx dy
- \int\!\!\int K_h(y - x)\alpha(x)\alpha(y) dx dy\right]$$
$$\quad + [N/c_s - 1]\, c_s^{-1} \int\!\!\int [K_h(x - y)]^2 \alpha(x) dx dy
- 2 c_s^{-2} \sum_{i=1}^{N} \int\!\!\int K_h(y - Y_i) K_h(y - x)\alpha(x) dx dy.$$
Therefore, it follows that
$$\mathrm{ISE}_\alpha(h) - M(h) = c_s^{-2} \sum_{i=1}^{N} \sum_{j \neq i} W_{ij}
+ 2 c_s^{-1} \sum_{i=1}^{N} V_i + c_s^{-1} \sum_{i=1}^{N} T_i + R.  \qquad (2.35)$$
Consider each of the summands in the expression above.  First of
all, note that $E[T_i] = 0$ and $|T_i| \le a c_s^{-1} h^{-1}$.  With assumption b), this
leads to
$$\sup_{h \in H_s} \left| c_s^{-1} \sum_{i=1}^{N} T_i / A(h) \right| \to 0 \quad \text{a.s.} \quad \text{as } s \to \infty.  \qquad (2.36)$$
Using Burkholder's inequality, as in Lemma 2.2, it can be shown that
$$\sup_{h \in H_s} \left| c_s^{-1} \sum_{i=1}^{N} V_i / A(h) \right| \to 0 \quad \text{a.s.} \quad \text{as } s \to \infty,  \qquad (2.37)$$
$$\sup_{h \in H_s} \left| c_s^{-2} \sum_{i=1}^{N} \sum_{j \neq i} W_{ij} / A(h) \right| \to 0 \quad \text{a.s.} \quad \text{as } s \to \infty.  \qquad (2.38)$$
Finally, consider the term $R/A(h)$.  We can write $R = P_1 + P_2 + P_3 + P_4$
where:
$$P_1 = -[N(N-1)/c_s^2 - 2N/c_s + 1] \int\!\!\int\!\!\int K_h(y - x) K_h(y - z)\alpha(x)\alpha(z) dx dz dy,$$
$$P_2 = [N/c_s - 1]\, c_s^{-1} \int\!\!\int [K_h(x - y)]^2 \alpha(x) dx dy,$$
$$P_3 = 2[N/c_s - 1]\left[c_s^{-1} \sum_{i=1}^{N} \int\!\!\int K_h(y - Y_i) K_h(y - x)\alpha(x) dx dy
- \int\!\!\int K_h(y - x)\alpha(x)\alpha(y) dx dy\right],$$
$$P_4 = -2 c_s^{-2} \sum_{i=1}^{N} \int\!\!\int K_h(y - Y_i) K_h(y - x)\alpha(x) dx dy.$$
Since $K$ and $\alpha$ are bounded and $c_s^{1/2}[N/c_s - 1] \xrightarrow{d} N(0,1)$, one can easily
show that
$$P_1 = O_p(c_s^{-1}),  \qquad (2.39)$$
$$P_2 = O_p(c_s^{-3/2}) = O_p(c_s^{-1}),  \qquad (2.40)$$
$$P_4 = O_p(c_s^{-1}).  \qquad (2.41)$$
First look at $P_1$ and $P_4$.  (2.39) and (2.41) imply that for $0 < \epsilon < \delta/2$,
$$c_s^{1-\epsilon}(P_1 + P_4) \xrightarrow{P} 0 \quad \text{as } s \to \infty.  \qquad (2.42)$$
Since for all positive integers $m$, $[c_s^{1-\epsilon}(P_1 + P_4)] \in L_{2m}$ and
$\{[c_s^{1-\epsilon}(P_1 + P_4)]^{2m} : s = 1, 2, \ldots\}$ is a uniformly integrable family, it
follows from (2.42) that
$$c_s^{(1-\epsilon)2m}\, E[P_1 + P_4]^{2m} \to 0 \quad \text{as } s \to \infty.  \qquad (2.43)$$
Next consider $P_3$.  Taylor expansions can be used to show that as $s \to \infty$,
$$c_s^{-1} \sum_{i=1}^{N} \int\!\!\int K_h(y - Y_i) K_h(y - x)\alpha(x) dx dy
- \int\!\!\int \alpha(y) K_h(y - x)\alpha(x) dx dy = O_p(h^2).$$
Hence, it follows from Slutsky's theorem that
$$P_3 = O_p(c_s^{-1/2+\epsilon} h^2).  \qquad (2.44)$$
As seen above, (2.40) and (2.44) can be used to show that there exists
some $c_2 > 0$ such that for $c_s > c_2$,
$$E[P_2 + P_3]^{2m} \le a_m (c_s^{-2m} + c_s^{-(1-2\epsilon)m} h^{4m}).  \qquad (2.45)$$
Thus, using (2.43) and (2.45), there exists some $c_0 > 0$ such that
for $c_s > c_0$ and $h \in H_s$,
$$E[R / A(h)]^{2m} \le a_m c_s^{-\gamma m}$$
for $m$ sufficiently large and some $\gamma > 0$.  Hence, using similar methods as
seen in Lemma 2.2,
$$\sup_{h \in H_s} |R / A(h)| \to 0 \quad \text{a.s.} \quad \text{as } s \to \infty.  \qquad (2.46)$$
Combining the results (2.36), (2.37), (2.38) and (2.46) in (2.35),
it follows that
$$\lim_{s \to \infty} \sup_{h \in H_s} |[\mathrm{ISE}_\alpha(h) - M(h)] / A(h)|$$
$$= \lim_{s \to \infty} \sup_{h \in H_s} \left| \left[2 c_s^{-1} \sum_{i=1}^{N} V_i + c_s^{-1} \sum_{i=1}^{N} T_i
+ c_s^{-2} \sum_{i=1}^{N} \sum_{j \neq i} W_{ij} + R\right] / A(h) \right|$$
$$\le 2 \lim_{s \to \infty} \sup_{h \in H_s} \left|c_s^{-1} \sum_{i=1}^{N} V_i / A(h)\right|
+ \lim_{s \to \infty} \sup_{h \in H_s} \left|c_s^{-1} \sum_{i=1}^{N} T_i / A(h)\right|$$
$$\quad + \lim_{s \to \infty} \sup_{h \in H_s} \left|c_s^{-2} \sum_{i=1}^{N} \sum_{j \neq i} W_{ij} / A(h)\right|
+ \lim_{s \to \infty} \sup_{h \in H_s} |R / A(h)| = 0 \quad \text{a.s.}  \qquad (2.47)$$
One can also show that for $A(h)$, the asymptotic MISE of $\hat{\alpha}_h$,
$$\sup_{h \in H_s} |[M(h) - A(h)] / A(h)| \to 0 \quad \text{a.s.} \quad \text{as } s \to \infty.  \qquad (2.48)$$
As a result of (2.47) and (2.48),
$$\sup_{h \in H_s} |[\mathrm{ISE}_\lambda(h) - B(h)] / B(h)| = \sup_{h \in H_s} |[\mathrm{ISE}_\alpha(h) - A(h)] / A(h)|$$
$$\le \sup_{h \in H_s} |[\mathrm{ISE}_\alpha(h) - M(h)] / A(h)|
+ \sup_{h \in H_s} |[M(h) - A(h)] / A(h)| \to 0 \quad \text{a.s.}$$
Thus, we have proven Lemma 2.1. **
Finally, we prove the second theorem; that is, we show that the
cross-validation bandwidth is asymptotically equivalent to the minimum
ISE bandwidth.

Proof of Theorem 2.2:
As seen earlier in this chapter, the asymptotic MISE of the kernel
intensity estimator is defined by
$$B(h) = c_s h^{-1} \left(\int K^2\right) + c_s^2 h^4 k^2 \int_0^T (\alpha'')^2.$$
Let $h^0$ be the value that minimizes $B(h)$; then, as $c_s \to \infty$,
$$h^0 = a_0 c_s^{-1/5} \quad \text{and} \quad c_s h^0 \to \infty$$
for some constant $a_0$ which depends on $\alpha$ and $K$.  Moreover,
$$B(h^0) = a_1 c_s^{6/5}  \qquad (2.49)$$
for some constant $a_1$.  Therefore, in order to prove Theorem 2.2, it is
sufficient to prove the following statements:
$$\hat{h}_0 / h^0 \to 1 \quad \text{in probability} \quad \text{as } s \to \infty,  \qquad (2.50)$$
$$\hat{h}_{cv} / h^0 \to 1 \quad \text{in probability} \quad \text{as } s \to \infty.  \qquad (2.51)$$
By Lemma 2.1, $\mathrm{ISE}_\lambda(h)$ and $B(h)$ are asymptotically equivalent, and
hence, one can show that
$$\mathrm{ISE}_\lambda(h^0) / B(h^0) \to 1 \quad \text{a.s.} \quad \text{as } s \to \infty.  \qquad (2.52)$$
As a direct result of Theorem 2.1 and (2.52), we get that
$$\mathrm{ISE}_\lambda(\hat{h}_{cv}) / B(h^0) \to 1 \quad \text{a.s.} \quad \text{as } s \to \infty.  \qquad (2.53)$$
Given (2.52) and (2.53), the argument proving (2.50) and (2.51) is
similar to the proof found in Hall (1983) (p. 1160).  We will prove
(2.51) here; the proof of (2.50) is analogous.
Note that $\hat{h}_{cv}$ is a function of $s$.  By assumption c*), it follows
that for all $s$, $c_s^{1/5} \hat{h}_{cv} \in [\epsilon, \eta]$, and thus,
$$(\hat{h}_{cv}/h^0) \in [\epsilon/a_0,\ \eta/a_0].  \qquad (2.54)$$
Given c ' let F be the distribution function for (h /ho). Then
s
s
by (2.54). {Fs : s=l,2 .... } is said to be "tight" sequence of
=
functions.
CD
{~}k=l
Using Helly's extraction principle. there is subsequence
and a distribution function F such that Fsk(x)
for all xEffi.
That is. F
~
converges weakly to F.
~
F(x)
(2.54) further
implies that the limiting distribution is proper and supported on
[E/a • n/a].
o
0
distribution.
Let
e>o be a point of support of the limiting
Now suppose that (2.51) is not true.
choose a subsequence
{~}
where
e~l.
Let
Then. it 'is possible to
O<e<l; the case of e>l
follows in a similar fashion. but we will not present that argument
43
here.
Define
C=
1/2 min{2.l-2}.
Now choose d>O sufficiently small
so that for all z€[2-C.2+C] and some p>l.
°
cs(h z)
-1
T-d
_.2
2 ° <t 2 T-d
2
{fd a(x)<Ix}{JK-(y)dy} + cs(h z) K Jd {a"(x)} <Ix
c (ho)-l~(y)dY + c 2 (ho),\2JT{a"(x)}2<1x
s
s
O
> p.
(2.55)
It is always possible to find such a d because h ° minimizes the
denominator.
Moreover. since h °= a c -1/5 . the left side of (2.55) is
o s
independent of s.
From the definition of B(h) and Lemma 2.1. one can .
show that
When
Therefore.
(h )/B(ho) > p. + 0 p (1) for (hA cv/h° )€[2-C.2+C].
(2.57)
- " cv
Furthermore. since 2 is a point of support of the subsequence limit of
ISF_
A
° it follows that
(h
/h).
cv
A
°)€[ 2-C. 2+C]} ,> O.
lim sup P{ (hcv/h
s.
(2.58)
The combination of (2.57) and (2.58) contradicts statement (2.53).
Hence. (2.51) must be true. and consequently the theorem has been
proven. -
44
aJAYI'ER 3
TIlE LEAST-SQUARES CROSS-VALIDATION BANDWIDIlI UNDER TIlE
STATIONARY
rox
PR<lCEiS MODEL
For kernel intensity estimation. we would like to find a data
based bandwidth has asymptotically appealing properties.
In Chapter 2.
we proved that the least-squares cross-validation bandwidth is
asymptotically optimal under the simple multiplicative intensity model.
In this chapter. a stationary Cox process model is considered for
putting
a
mathematical structure on the intensity function.
This
situation is more complicated since the true intensity function is a
random function.
We will show that in general the the least-squares
cross-validation bandwidth is not asymptotically optimal (measured by
the ISE) for a stationary Cox process model.
3.a. The Hodel.
A second model that is frequently used to put a mathematical
structure on counting processes is the stationary Cox process model.
Throughout this section. let
X1'~""'~
form a partial realization of
a stationary Cox process also known as a doubly stochastic Poisson
process with intensity function
A~(X)
on the interval [O.T].
A
stationary Cox process is defined by :
1) {A(x). xEffi} is a stationary nonnegative valued random process.
2) conditional on the realization X (x) of A(x). the point process
.
J.L
is a nonhomogeneous Poisson process with rate function X (x).
J.L
Furthermore, for each J.L, assume that for all x,y€[O,T],
3) E[A(x)] = J.L,
4) E[A(x)A(y)] = v(lx-yl)
2
where v(x)=J.L v (x)
o
for vo(x) a fixed function.
The above definition and assumptions are identical to those found in
Diggle and Marron (1987).
In addition. to avoid boundary effects, we
employ a circular design where X (O)=X (T), X'(O)=X'(T) and
J.L
J.L
J.L
J.L
X"(O)=X"(T). The Cox process model differs from the simple
J.L
. J.L
multiplicative intensity model since the true intensity function XJ.L(x)
of a stationary Cox process is a random function with mean J.L.
Under the Cox process model. letting
J.L~
again has the effect of
adding observations everywhere on the interval [O,T] and not changing
the relative shape of the target function XJ.L(x) in the limiting
process.
Therefore, we require the values of J.L to come from the
sequence {J.L}
CD
s s=
1 such that J.L /s
s
~ K
for some constant K)O.
The kernel estimate of X (x) is
J.L s
N
= }; ~(x-Xi) for x€[O,T].
i=l
The kernel estimate of X under the Cox process model is identical to
~(x)
~
~(x)
in the simple multiplicative intensity model; the difference
between the two models is only seen when the estimators are analyzed
mathematically.
~
The ISE of
~(x)
and the cross-validation score function are
defined as
46
I~(h)
T .... 2
CYX(h) =
So
~
=
sb ~2 - 2 Sb ~A - Sb A2 .
-
2 ~ ~i(Xi)
i=l
N .....
,..
~i(x) =
where
~ ~(x-X.).
j;ti
J
In the Cox process setting. it is natural to consider the
A
expectation of the ISE of
~
conditional on {A(x). xE[O.T]}.
When
k=(Su~)/2.
E[I~(h) I A] =
E[Sb
(X(x) - A(x»2
I A]
S~(X-Y)A(Y)dYdx - S[~(x-Y)A(y)dy - A(x)]2dx
=
= h-1(SbA)(~) + h
as Il ....... h-<l and
s
4
k2Sb(A")~
+ o(h-1S6A + h4S6(A")2)
Let B(h) be the asymptotic conditional
Il~.
s
A
expectation of the ISE of
then we define
B(h) = h-1(SbA)(~) + h4 k2Sb{A,,)2.
~(x).
Finally. one can show that E[(A"(x»2] = 112 v(4){O). and hence. the
s
0
A
MISE and the asymptotic MISE (AMISE) of
~(x)
are
MI~{h) = E[Sb{X{x) - A(x»2]
= Il T h-1(~) + Il~ h4 E[{A"/1l)2] k2 + o{1l2 h4 )
s
s
= Il T h-1(~) +·Il~ h 4 v(4)(O) k 2 + o(1l2 h 4 )
s
s
0
AMIfW_ (h) = Il T h-1(~) + Il~ h4 v(4)(O) k2 .
-"
5
5
0
3.b. Differences Between the Stationary Cox Process Model and the
Simple Multiplicative Intensity Model.
In Chapter 2. we saw that the number of observations in [O.T]
under the multiplicative intensity model is a Poisson random variable
with mean c.
s
As a result. Nis· asymptotically normal with standard
deviation c 1/2 . and N/c
s
s
converges to 1 in probability.
Under the
stationary Cox process model. N is not a Poisson random variable. and
the relationship between N and E[N] is different.
47
Note that.
E[f~
E[N] = E[E[N/A]] =
A(x)dx] = MsT
In addition.
MT
s
and as s ... "'.
2
Var[N/(M s T)] = [(G(T)-r2) Ms + Ms T]/(M s T)2
--+
[G(T)-r2]~.
Hence. under the stationary Cox process model. as s_.
1) N/E[N] = N/(M T) does not converge in probability.
s
1/2 (N/(usT)
2)" <usT)
~
~
- 1) d oes not converge i n di strl'but i on
to N(O.I).
l'
Through out this chapter. let c=fOA(x)dx.
Restrict the values of c to
'" such that cs/s ... T for some constant T)O.
the sequence {c s }s=1
Then
conditional on {c s }'
1) N is a Poisson(c ).
s
2) c~/2 (N/c
s
- 1 ) ~ N(O.I)
as s ... '" •
3) N/E[Nlc ] = 1 + 0 (c- 1/ 2 ).
s
p s
As a result. we might want to consider the effect of conditioning on
l'
(fOA) or the effect of conditioning on {A(x). x€~} in the Cox process
setting.
The ISE and the MISE are two error criteria used for evaluating
~
~.
In the density estimation setting and under the multiplicative
intensity model. the ISE and the MISE are asymptotically'equivalent.
We will show that this is not the case under the stationary Cox process
model.
48
For a simple example. consider a homogeneous doubly stochastic
Poisson process.
That is. let A(x) = A for all xE[O.T]. and assume
o
that A has an exponentiaf distribution with mean
o
In this situation. for x.yE[O.T].
E[A(x)A(y)]
and
= E[Ao2] = ~2s
~
s
.
(i.e. v 0 (\x-YI)
= 2)
A"(x) = 0 .
Using the definitions given in section 3.a.
AMISE,.(h) = ~Th-l(~)
E[Iss.(h)IA] = (JbAo)h-l(~) + 0(h-1JbAo)
. Var[E[Iss.(h) IA]] = ~Th-l(~) + o(h-l~)
Thus.
E[ISE,.IA]
Var
~
1
as s -+
0).
AMISE
This implies that for the homogeneous doubly stochastic Poisson
process. the variance of I(E[Iss.(h)I A] - MISE(~»/AMIss.(h) I
converges to one. and hence. \ (E[Iss.(h)I A] - MISE(~»/AMIss.(h)1
does not converge to zero in probability.
Furthermore. consider the difference between E[ISE,.(h)I A] and
MISE(h) under a more general stationary Cox process model.
First. note
that.
Second. define the random variables Zl and Z2 to be
ZI ;: [(Jb
A)/~sT - 1]
and
Z2;: [(Jb(A.. )2)/(~;r) -
v~4)(O)].
If we choose h such that h_~-1/5 then. as s ~ ~.
s
E[Iss.(h)I A] - MIss.(h) = ~sTh-l(~) ZI + ~~~2 Z2 + 0(~sh-l+~~4)
= ,,6/5.r(~} Zl + ,,6/~2 Z2 + 0(,,6/5)
s
s
49
s
and hence .
. E[lS~(h) I A] - Ml~(h)
= aZ l + bZ2 + 0(1)
AMl~(h)
where
a = (~)/[(~)+k2v(4)(0)]
o
Moreover. one can show that
Var[aZ
l
+ bZ ] = a 2{T-2:r:r
JOJO v o ( Ix-y I)dxdy - I}
2
2
+ b {Jl;tr-2 E[fbJ b(A"(x)A"(y»2dxdy]
-
2ab{Jl:~-2
E[fbJbA(x)(A"(y»2dXdY]
As a result. Var[aZ +bZ ] may be a strictly positive constant.
l
2
When
this is the case. the variance of I{E[lS~(h)IA]-MlS~(h)}/AMlS~(h)1
converges to a strictly positive constant as
and hence.
s~.
I{E[l~(h)IA]-MlS~(h)}/AMlS~(h)1does not converge to zero in
probability.
Consequently. E[l~(h)1 A] and Ml~(h) are not asymptotically
equivalent in general.
On the other hand.
l~(h)
and E[lS~(h)1 A]
are asymptotically equivalent under the stationary Cox process model.
A
Since we are interested in minimizing the lSE of
~.
we will study the
asymptotic behavior of the cross-validation bandwidth conditional on
the process A.
Unfortunately, conditioning on A makes the stationary
Cox process much less interesting.
Essentially, given {A(x). x€[O,T]},
the stationary Cox process setting is equivalent to the simple
multiplicative intensity setting.
Thus. conditional on
{A(x). x€[O.T]}. we can get the same asymptotic properties for the
stationary Cox process model as we proved for the simple multiplicative
intensity model.
50
3.c. ResuLts.
A
As in Chapter 2. let h
A
o
I~(h)
minimize
and h
cv
minimize CVA(h).
Also. assume that
a) K is a compactly supported bounded symmetric probability
density function.
b) A has two continuous bounded derivatives.
c) For each value of s.
the bandwidths under consideration come
from a set H where for some constants f3.P.o)O.
s
~(H ) ~ c P , and for hEH , f3-1c(-1+o)~ h < RC- O.
S
S
5
5
d) For some constant T)O. c Is
s
~
T as
-
.... 5
s~.
Under these assumptions. the least-squares cross-validation bandwidth
is asymptotically optimal for the conditional Cox process setting.
This is stated in Theorem 3.1.
THEOREM 3.1: Let assumptions a). b). c) and d) and the stationary Cox
process model hold.
T
Then, conditional on the random uariable cs=JOA
T
and the random function {a(x)=A(x)/(JOA). x€[O,T]}.
A
ISF_ (h )
"
cu
~
1
a.s.
ass-+O).
The proof of Theorem 3.1 is very similar to the proof of Theorem 2.1 in
the simple multiplicative case.
of the two lemmas below.
Again the theorem is a direct result
In these lemmas. assume that the assumptions
a). b). c). and d) hold .
.
51
Lemma 3.1: Under the stationary Cox process modet, and conditionat
T
T
on cs=IOA
and
{a(x)=A(x)/(J'OA),x€[O,T]}.
sup I[IS~(h) - B(h)] / B(h)/ ~ 0
h€H
a.s.
as s ~
m.
s
Lemma 3.2: Under the stationary Cox process modet, and conditionat
T
on cs=IOA
and
T
{a(x)=A(x)/(IOA)
,
x€[O,T]},
sup I[CvA(h)-IS~(h)·CVA(b)+IS~(b)]/ [B(h)+B(b)] I ~ 0
h,b€H
a.s s
a.s.
s
~ 00.
First of all, note that under the Cox process model, B(h) is the
asymptotic expression of E[IS~(h}IA]. Thus, the proof of the Lemma
3.1 above is identical to the proof of Lemma 2.1 .in Chapter 2 except
that E[I~(h)IA] isused instead of MIS~(h).
In this Chapter, we
will only present an outline of the proof of Lemma 3.2.
As in Chapter 2, we can use Theorem 3.1 to give results regarding
~
the convergence of (hcv /h)
conditional on the process A.
0
3.d. ProoFs.
prooF of Lemma 3.2:
We begin with some notation.
a(x) ;;; A(x)/c
s
Then. we can estimate a(x) by
~
. -1 N
~(x) = cs i:1
Again, define
52
Kh (x-X i )
where
A
~i(x)
n
= c
-1
! ~ (x-X.)
s j;o<i"n
J
Therefore.
PO(a,,)2 + o(h- 1 c-s 1 + h4 )
E[A(h) I A] = h- 1 c-1(~)+ h4 k 2
s
Finally. let A(h) be the asymptotic conditional expectation of A(h).
Po(a,,)2
A(h) ;; c- l h-l(~) + h4 k2
s
= c-~(h) .
s
Let Y . Y ... 'Y be the "unordered" observations as in Lenuna 2.2.
l
2
N
Conditional on N and on {A(x). x€[O.T]}. the Yi's are i.i.d. random
variables with probability density function a(x)I[O.T](x).
Define:
Uij(h) = h
-1
~(Yi-Yj) - !Kn(y-Yj)a(y)dy - a(Y ) +
i
T 2
IOa (y)dy
Vi (h) = E(Uijlyi.N)
Wij(h}= Uij -.V i
-1
Ri(h) = [(N-l)c s -1] [2!Kn(Y i -y)a(y)dy - I!Kn(y-z)a(y)a(z)dydz
T 2
- 2a(Y i ) + IOa (y)dy]
k
Conditional on N and on the process A. {! Vi}N
and
i=l
k=1
k i-I
{!
! W }N
are both martingales with respect to the a-fields
i=l j=l ij k=l
generated by {Y .Y ..... Y }. Now we proceed exactly as in Lenuna 2.2
k
1 2
for the simple multiplicative intensity case except that we use
conditional expectations and probabilities given {A(x). x€[O.T]}.
By Burkholder's 1nequa1ity. it follows that
-1 N
E[(c s
2m
! Vi) I A] ~
i=l·
~ cs
2m
-2m
C
s
E[ E[(
N
k
sup
! Vi)
k=l .... N 1=1
2m
I N. A]
I
A]
m
E[aE[( ! E[~I N.A])ml N.A] + a! E[lvkl2m\ N.A]
1=1
1
k=l
53
A].
In addition,
N
E[(c- 1 ! V )2m 1 A]
i
5 i=1
-2m
4m
m
m
cs
[h
(am Cs + o(c s » + ac s ]
~ a [c-m h4m + c-2m+I]
~
m
5
5
-1 -1
4
Similar to Lemma 2.2, since A(h) - c h
5
~>o
+ h , there exists some
so that for m sufficiently large and h€H ,
s
E[(c
-1 N
! Vi / A(h»2m
I
A] ~
[c-m h4m + c-2m+l]
a
m
s i=1
5
~
5
[c- 1 h- 1 + h 4 ]2m
a
m
5
Equivalently, one can write,
N
E[(c- 1 ! V. / A(h) ) 2m
s i=1 1
I aocs]
~
a mc -~m
s
Therefore, using the same procedure as seen in Lemma 2.2, conditional
on a and c s '
1 N
sup /c- ! V. / A(h) 1 .... 0
h€H
s i=1 1
a.s.
as
S
....
oo
(3.1)
5
Using conditional expectations and probabilities and arguments
similar to Lemma 2.2 in the simple multiplicative case it can be shown
that given a and c ,
s
2 N
sup /c-!
! Wij/A(h)I .... 0
h€H
s i=1 j,H
a.s.
as s ....
(3.2)
00,
5
1 N
sup Ic- ! R. / A(h)
h€H
s i=1 1
I .... 0
G = [2(N-l)c~1 - 1] c
-1
a.s. as s ....
(3.3)
00.
s
Now, let
N
-2
:r
2
! a(X i ) - [N(N-l)c ] JOa (x)dx.
s i=1
s
54
Then. conditional On a and c •
s
---+0
a.s.
by (3.1). (3.2) and (3.3)
ass~(X)
This completes the proof of Lemma 3.2.
55
**
TIlE ASYMP'IUI'IC DISfRIBUfION OF TIlE
LEAST~ARES
CROSS-VALIDATION
BANDWIIYIlI UNDER TIlE SDIPLEMULTIPLICATIVE INfENSITY JIODE!..
In Chapter 2. we showed that the least-squares cross-validation
bandwidth is asymptotically optimal for kernel intensity estimation
under the simple multiplicative intensity model.
The next problem to
consider is the rate of convergence of this bandwidth to an optimal
bandwidth.
If the rate of convergence for an asymptotically optimal
bandwidth is extraordinarily slow. then that bandwidth will often
perform poorly for moderately sized data sets.
Therefore. the
convergence rate of the cross-validation bandwidth. h
cv
• is critical.
There are two bandwidths that could be considered "optimal"; they
are h ' the bandwidth that minimizes the integrated square error of A,
o
and h , the bandwidth that minimizes the mean integrated square error
o
of A. If we aim to approximate the optimal bandwidth h • then. it
o
~
turns out that the cross-validation bandwidth does as well as h
first and second order.
to the
o
We examine the relationship between these
~
~
bandwidths by finding the asymptotic distributions of (h - h ) and
cv
0
~
(h - h). Moreover, in order to evaluate the performance of the
o
0
cross-validation bandwidth, we find the asymptotic distribution of the
distance between the ISE with the cross-validation bandwidth and the
minimum ISE.
4.a. Assumptions and Notation.
~'"
Assume that Xl'
"~
are observations in the interval [O,T]
from a nonhomogeneous Poisson process with a simple multiplicative
intensity function
xc (x) = c s a(x) for xEm
s
where c
s
is a positive constant, a(x) is a nonnegative deterministic
T
function such that JOa(x)dx=l.
variable with mean c .
s
Under this model, N is a Poisson random
Moreover, conditional on N, a(x)I[O.T](x) is
the density function of the "unordered" observations.
circular design with Xc(O)=Xc(T),
s
s
X~(O)=X~(T)
s
and
We assume a
X~(O)=A~(T).
s
s
s
The kernel estimate of the intensity function is defined as in
Chapter 2,
~(x) =
where
-1
~(x)=h
for x€[O,T]
A
K(x/h).
Again,
A
~(x)
=
~(x)/cs
will be the kernel
estimator of a(x).
We will assume the following technical conditions throughout this
Chapter.
1) K is a compactly supported, symmetric probability density
function on
mwith
(Jz~(z)dz)/2
Holder continuous derivative K' such that,
=k~.
Without loss of generality, assume that
K is supported on [-1.1].
2) X is bounded and twice differentiable.
and integrable.
X' and X" are bounded
X" is uniformly continuous.
3) For some constant T)O. c /s
s
~
T as
s~.
Regarding the first assumption, recall that a function g is Holder
57
continuous if Ig{x)-g{y)/ ~ alx-yl~ for some a.~>O
and all x.y.
Assumption 2) implies that the same properties hold for a as well as
for A.
Since we are going to deal with sequences of random variables
'co
we need a countable sequence of indices {c s }s=l'
s~
gives a sequence of values {c s } such that
The third assumption
implies that c s ~.
We define the following functions of the ISE and MISE of
-2
-2 T A '
2
A{h)
C
I~{h) = C s fa (~{x) -A{X) dx
s
~:
=
M{h)
=c~~IS~{h) = c~2f6 E[{~{X) D{h) =A{h) - M{h)
A{x»)2]dx
A
Essentially. A{h) and M{h) are the ISE and the MISE for a{x).
A
~
.(x)
. In
=
~
j;o!i
h
-1
.
Let
.
K{{x-X.)/h) be the leave-one-out estimator. then define
J,
.
A
several functions of the cross-validation score function of
CV{h)
=c-s2"'
CV~{h) =
o{h)
Thus.
[fa
s
~-(x) .b
.
2
N A
~ ~ .(X )]
i=I' bl i
T 2
faa
.
A
MI~{h). IS~{h)
-2 T A2
AN A
=2c-2s [faT
~(x)A{x)dx - i:l~i{Xi)]
CV{h) = A{h) + o{h) Finally. let
c
A
h . h and h
o
0
cv
be the bandwidths that minimize
and CVA{h) respectively.
Hence. these bandwidths
also minimize M{h). A{h) and CV{h) respectively.
a,
~:
=~
and
a2
Define
= k2f6{a,,)2
Then. given a circular design. we can conclude that as c s ~.
1
+ h4a 2 + o{c- h-
s
1 + h4 ) .
2
-1 -3
2
+ 12h a2 + o{c h
+ h )
s
1 5 where
Thus. M{h) is minimized by h o such that h o- a 0 c-s /
1/5
Furthermore. M"{h) - a 3 c-2/5 where
a E [a,/(4a2 )]
.
o
s
o
58
h~.
and
a~
-3
2
+ 12a2 a
o
0
;: 2a,a
It will also be useful to denote
Finally, define:
L(z) ;: -zK' (z)
0
2 = 8/(a5a~) (ST 2)S{fK(y+z)[K(z)-L(z)]dz)2dy
Oa
o
0
+ 16k2/(a~) Ub(a,,)2a + Uba"a)2]
(4.1)
o~v= 8/(a~a~) U~a2)(fL2) + 16k2/(a~) U~(a,,)2a + Uba"a)2]
(4.2)
4.b. ResuLts and Discussion.
Given all of the definitions and assumptions in the previous
section, We are now prepared to present the asymptotic distribution
~
functions for (h
~
cv
~
- h ) and (h - h).
0
0
It is reasonable to standardize
0
these differences by diViding them by h
o
which is of order c
-115
s
~
Hence, [(h - h )Ib ] is the relative distance between the minimum MISE
000
~
bandwidth and the minimum ISE bandwidth, while [(h
~
cv
- h )Ib ] is the
0
0
appropriately scaled distance between the cross-validation bandwidth
and the minimum ISE bandWidth.
Using these variables. we get the
asymptotic distributions found in Theorem 4.1.
Theorem 4.1:
Given assumptions 1), 2) and 3), under the simpLe
muL tipLtcative intensity model
c;/10[
ho~oho ] ~ N(O, o~)
ass-+
CD •
and
a.s s -+
59
CD.
There are a few important things to notice regarding this theorem.,
~
~
~
First of all. the convergence rate for [(h' - h )Ih ] and [(h - h )Ih ]
evoo
i
5
-1/10 .
C
000
Thus. it turns out that the cross-validation bandwidth
s
converges very slowly to the bandwidth that minimizes the ISE; however,
in terms of the asymptotic rate. h
cv
does as well as the minimum MISE
bandwidth.
In the kernel density estimation setting. Hall and Marron (1987)
~
showed that [(h
- h )Ih ] and [(h - h )Ih ] converge in distribution
cv
0
0
o· 0
0
. t h e expect ed va I ue 0 f t h e number 0 f
with rate n -,1/10. S'Ince c IS
s
observations, the convergence rate for the multiplicative intensity
model is parallel to the density estimation resul t.
~
~
Furthermore. Hall
~
and Marron (1987) proved that [(h cv- ho)lh ] and [(h - ho)lh ] are each
o
o
o
asymptotically normal.
Considering this fact. our result for the
intensity estimation setting makes ,sense intuitively.
~
N. [(h
~
Conditional on
~
- h )Ih ] and [(h
evoo
~
h )Ih ] are asymptotically normal as in
000
density estimation.
Moreover. the number of observations. N. is a
Poisson random variable. and a Poisson random
asymptotically normally distributed.
vari~ble
is known to be
Essentially. the unconditional
~
~
asymptotic cumulative distributions of [(h
h
- h )Ih ] and [(h - h )Ih ]
evoo
000
are the 11mi ts of weighted normal 'cumulative distributions where the
weight function is the normal density.
Taking the limit of this
weighted normal cumulative distribution. we find that the asymptotic
~
~
~
distributions of [(h - h )Ih ] and [(h - h )Ih ] are also normal for
cvoo
000
intensity estimation.
The variances for the asymptotic distributions in the'intensity
setting are similar to the variances in the density setting.
60
However,
2
2
00 are not 0cv equivalent.
We compare these variances in more detail
after we present the second theorem.
The integrated square error measures the distance between the
kernel estimator and the true intensity function X. and hence, I~(h*)
evaluates the performance of the kernel estimator with the bandwidth
h*.
Since h
minimizes Il'W_ ,h performs "best" according this error
o
-"
0
Therefore, when II~(h*)-ISEx(ho)1 is comparatively small,
criterion.
the kernel estimator with bandwidth h * is doing a relatively good job
of estimating X.
At this point, we will compare the degree to which ISEx(h ) and
o
I~(hcv)
2
c .
s
differ from ISE(ho )'
Notice that the lSE of
~
is of order
Thus, it makes sense to consider the scaled version of the
integrated square error
= c -2
s ISE(h).
.
A(h)
In order to compare the
ISE's of these bandwidths, an extra technical condition is necessary.
Thus, assume that a fourth condition holds:
4) K has a second derivative on Ill, and K" is Holder
continuous.
With the above condition as well as all of the conditions and
definitions from the previous section, we can calculate the asymptotic
distributions found in Theorem 4.2.
Theorem 4.2:
GilJen assumptions 1), 2), 3) and 4), under the simple
multiplicatilJe intensity model
~
cs{A(ho ) - A(ho )}
D
--4
2
00
and
61
a~
2
Xl
as s
~ ~,
Our aim is to find a bandwidth that minimizes ISE(h) or similarly
In general. we believe that the minimum MISE bandwidth, h ,
A(h).
o.
would perform quite well; however. h
o
is unknown.
Theorem 2.2 implies
that the data based cross-validation bandwidth performs almost as well
as h
since the distance from the minimum.scaled integrated square
o
A
error (i.e. A(h
o
»
is of order c
Again, the convergence rate
to the n
-1
-1
.....
for both A(h
S
-1
C
s
cv
) and A(h ).
0
seen in Theorem 4.2 is analogous
rate of convergence found by Hall and Marron (1987) for
density estimation.
In addition, both estimation settings yield
asymptotic Chi-square distributions.
At thiS point, it is interesting to compare the asymptotic
variances in Theorems 4.1 and 4.2 for a nonhomogeneous Poisson process
to the respective variances derived from a density function.
i=1,2,3 be constants in terms of the kernel function K(.).
Let Ai(K)
Then, under
the multiplicative intensity model,
a~
U~(a"h-l/5 (f~a2)
= Al (K)
+
~(K) U~(a,,)2]-6/5 U~(a,,)2a + (f6a"a)2].
a~v= ~(K) U~(a,,)2]-l/5 (f6a2)
+ A (K)
2
U~(a,,)2]-6/5 U6(a,,)2a + (f~a"a)2].
For kernel estimation of a density function f estimation, let the
1 1
asymptotic variances be defined by AVar[n / 0(h -h )/h ]
o 0
0
AVar[n
1/10
~
(h
~
cv
-h)/h]
0
0
=a c2d'
=a 2d
0
and
Then, it follows from Hall and Marron
(1987) that,
a~d = Al (K) [f(f,,)2]-l/5 (ff2)
+ A (K) U(f,,)2]-6/5 [f(f,,)2 f - (ff"f)2],
2
62
O~d = A:3(K) [f(f")2]-1I5 (ff2)
+ ~(K) [f(f")2]-6/5 [f(f")2 f - (ff"f)2].
Hence, the asymptotic variances in the density setting are similar to
the asymptotic variances in the intensity setting.
Recall that when we
contrasted the variances of the kernel estimators in the two settings
(Chapter 2, section 2.a), we found that the variance of the kernel
intensity estimator was larger than the variance of the kernel density
estimator, but that the highest order terms were the same (i.e. the
asymptotic variances were equal).
A
Similarly, the variance of
A .
A
[(h - h )/h ] and the variance of [(h - h )/h ] are larger in the
cvoo
000
intensity estimation setting since the number of observations from a
Poisson process is random; however, this difference shows up only in
smaller order terms.
Hence, the asymptotic distribution of the
·cross-validation bandwidth is equivalent for the two estimation
settings.
The more important issue is the comparison of
simple multiplicative intensity model.
0
2 to
o
0
2 under the
cv
First of all, we will show that
2
2
~ o.
By the Cauchy-Schwartz inequality and the fact that fK=I, it
cv
0
follows that
o
{fK(y+z)[K(z)-L(z)]dz}
2
~
2
fK(y+z)dz fK(y+z)[K(z)-L(z)] dz
2
= fK(y+z)[K(z)-L(z)] dz
Integrating over y leads to the conclusion that
S{fK(y+z)[K(z)-L(z)]dz}2dy ~ S[K(z)-L(z)]2dz =
JL2 (z)dz.
Thus, using the definitions of o~ and o~v given in (4.1) and (4.2), we
get the resul t:
63
2
2
5 2 -1 T 2
2
2
- a = 8(a a3) (JOa )[(JL )-J{fK(y+z)[K(z)-L(z)]dz) dy]
cv
0
o
2
2
a - a ~ O.
(4.3)
cv
0
a
A
As
s~,
A
A
the asymptotic variances of [(ho-ho)/ho] and [(hcv-ho)/ho] are
-1/5 2
-1/5 2
a ) and (c
a ) respectively, and the asymptotic variances
. s
0
s
CV
(c
-2 2 4
-2 2 4
of {A(ho)-A(ho » and {A(hcv)-A(ho )} are (2c s a 4 a o ) and (2c s a 4 aCyl
A
A
respectively.
A
Therefore. since our goal is to minimize 1SE, (4.3)
A
implies that h
o
performs slightly better than h
=
Secondly, we illustrate the relative sizes of a~ and a~v by giving
a couple of examples.
i=l,2
Consider the intensity functions
A.1 (x)=ca.1 (x)
where
alex) = (1/8) [sin(wx). + 1] 1[O,8](x),
22
a 2 (x) = (.0005255) [20 - (x-4)] 1[O,8](x).
See figures 4.1 and 4.2.
We calculate a
2'
2
o
cv
and a
from their definitions for the two
intensity functions Al and A2 .
table.
The results appear in the following
a
2
a
0
2
cv
Al
.1468
.1787
A
2
.5283
.6318
2
2
Both variances, a cv and a 0 • are bigger for A2 compared to AI'
Of
the two intensity functions, Al has more "curvature" as i t rises and
falls four times over the interval [0,8].
Therefore, from the two
2
examples it follows that a cv
can be lower when the intensity function
64
Figure 4.1
Figure 4.2
Intensity Function
#1
Intensity Function #2
on
N
o
o
N
......0
X
'-'
Non
«Io
c..
...J
«~
o
on
o
o
o
ci LO~~""""'~-2""""'~-.3""""'~-4'-'-~-:5~~""6'-'-~-:7:- ..........--:8
x
65
has more curvature.
The curvature of a function is often measured by
the second derivative of that function.
Notice that a" (which is
proportional to A") appears in the the formulas of a
2
cv
2
and a.
In
0
general, both of the variance functions are the sum of two ratios of
measures of curvature (1.e. (J~a2) / [S~(a,,)2]l/5and
[.f~(a,,)2a +
(J~a"a)2] / [.f~(a,,)2]6/5 ).
The fact that a
2
is comparatively small when the underlying
cv
intensity function has more curvature is very encouraging.
earlier, the convergence rate c
As stated
1/10
is disappointingly slow; on the
s
other hand, this slow convergence rate implies that the constants in
this situation are more important than usual.
Therefore, it is useful
to know that the cross-validation bandwidth might perform reasonably
well when there exists a fair amount of curvature in the true intensity
function.
4.c. Proofs of the Theorems.
In this section, we will present "the proofs of the two theorems
given in section 4.b.
The proof Theorem 4.1 depends on a number of
lemmas which are stated and proven in section 4.d.
Theorem 4.2 is a
direct consequence of Theorem 4.1 and Lemma 4.6.
proof of Theorem 4.1:
We will use the lemmas that are found in section 4.d to prove this
A
theorem.
We begin by proving the asymptotic distribution of (h -h ).
First using the fact that h
o
o
0
minimizes the function A(h) and second
using the Mean-Value Theorem, we get
66
~
~
~
0= A'(h ) = M'(h ) + D'(h ) = (h -h ) M"(h*) + D'(h )
(4.3)
o
o
o
o 0
0
where h* lies between h and h~ . It follows from Lemma 4.3. that for
o
0
some E.)O,
h = h + 0 (c- 1/5-E.)
o
as s~.
s
p
0
(4.4)
Thus, by letting h,=h , Lemma 4.2 implies that
o
D'(h) = D'(h ) + 0 (c- 7/10 ).
(4.5)
o
0
p s
7/10
From Lemma 4.4, we know that c
D'(h ) is asymptotically normal with
s
0
Therefore, it follows that
zero mean and standard deviation a a3u .
o
c
7/10
J!... N(O,
D'(h )
s
0
0
(ao a3)
2
2
u)
0
(4.6)
as s-J».
P
Since (4.4) impl ies that h*Ib ---+ 1 and since M"(h )-a3c- 2I5 , one can
0
o
s
show that
*)
= a3 c -215+
MOO (h
s
0
°
p
(c -215 ).
s
With (4.6), we know that D'(ho ) = p (c-s
(4.3) and (4.7), we get
7/IO
).
(4.7)
Combining this with
-(h - h ) = D'(h ) I M"(h*) = a;l c 2l5 D'(h ) +
o
0
0
5
(-3110)
0
0p C s
.. (4.B)
(I).
(4.9)
Using the fact that h - a c-l/~ gives the result,
o
_cIllO (h _ h)1b
5
0
0
0
s
0
= c 7/10 (a a )-ID'(h ) +
S
0:J
0
0
p
Consequently, using Slutsky's Theorem with (4.6) and (4.9) leads to the
final limi t,
cIllO (h - h}1b
5
0
0
0
~ N(O, u 2 ) as s~.
0
~
Next, we'll prove the asymptotic distribution of (h
~
cv
- h ).
0
~
Again, since h
o
cv
minimizes CV(h),
A
A
A
A
= CV' (hcv ) = M' (hcv ) + D' (hcv ) + 6' (hcv )
= (h
- h ) M"(h*) + D'(h
eva
where h* lies in between h~ cv and h 0 .
67
cv
) + 6'(h
cv
)
(4.10)
p
By Lemma 4.3, h*Ib ---+
o
1,
and
hence,
..
-2/5
M"(h ) = a3 c s
+
0
p
-2/5
(c s
)
as s-.
(4.11)
As above, utilizing Lemma 4.2 and Lemma 4.3 and then utilizing Lemma
4.4 and Lemma 4.5 leads to the result that as s-,
D'(h
cv
) + O'(h
cv
) = D'(h ) + o'(h ) +
0
=
° (c-7/IO ).
p
0
0
p
(cs
7/IO )
(4.12)
s
Combining (4.11) and (4.12) with (4.10) gives
°
- h ) [a3 c-2/5+ 0 (c-2/5)] +
(c- 7/10 ).
cvo
s
ps
ps
This implies that (hcv - h 0 )=0p (C-s 3/10 ) , and hence,
it follows from
.
0= (h
(4.11) that
(i~
- h ) M';(h") =
cvo
(hcvo
-h )
a 3 c-2/5+
s
0
(c- 7/10 ).
ps
(4.13)
Therefore, substituting (4.13) into (4.10) results in,
0= (h
- h ) a 3 c-2/5 + D'(h ) + o'(h ) + 0 (c-7/10 ).
cv
0
s
0
0
p s
'.
(4.14)
From (4.5) and (4.8) in the first half of the proof, we can conclude
that
0=
(h0- 0
h )
a3 c-2/5 + D'(h ) + 0 (c- 7/10 ).
sops·
(4.15)
Subtracting (4.15) from (4.14) gives,
-2/5
7110
+ O'(ho ) + op(c~
0= (hcv - ho) a 3 C s
),
A
A
and consequently,
1/1
7/10
0(h - h)/h = c
(a a3)-1 o'(h ) + 0 (1).
s
cvoo
so
0
P
By Lemma 4.5, we !mow that
_c
c 7 / l O o'(h) ...Q..N(O,
s
0
as s-iOO.
Finally, using Slutsky's Theorem with (4.16) and (4.17) gives,
cIllO (h - h)/h ...Q.. N(O, 0 2 ) as S-.
s
cvoo
cv
This completes the proof of Theorem 4.1 .....
58
(4.16)
(4.17)
proof of Theorem 4.2:
A
First, consider [A(h )-A(h »).
o
0
Observe that by repeatedly
applying the Mean-Value Theorem, we get,
A
A
A
A(h )-A(h ) = (h -h )A'«h -h )/2) = (h -h )[A'«h -h )/2)-A'(h
o
0
o o
o o
o 0
0
0
0
A
A
A
= (1I2)(h -h )2A"(h*)
o
(4.18)
0
..
where h
,.,
lies between h
from Lemma 4.6 that as
o
P
M
and h.
0
»)
Since h Ih
0
~
0, it is immediate
s~,
and hence,
215
) = a" c- 215 + 0 (c- 215 ).
(cs s p s
By Theorem 4.1, it directly follows that (h -h )=0 (c- 3/ 10 ).
A"(h*) = M"(h*) +
0
p
o
p
0
s
(4.19)
Thus,
substituting (4.19) into (4.18) yields,
A(h )-A(h )
o
0
= (1/2)(h0 -h0 )2
a"
Finally, by Theorem 4.1, we know that as
10
c3
s / (h0 _
h)
(4.20)
s~,
D N(O , a2 °02) •
o
( 4.21)
~
0
Therefore, using Slutsky's Theorem, (4.20) and (4.21) imply that as
A...Q...2
cs{A(ho)-A(ho )}
(ao a" ./2)
22
Xl = a 4
00
22
Xl'
°0
A
The proof for [A(h )-A(h ») is exactly the same as the argument
o
0
2
2
presented above replacing h and
by h
and
respectively. Thus,
o
0
cv
cv
°
the proof of Theorem 4.2 is finished.
A
°
**
4.d. The Lemmas and Their Proofs.
Assume throughout this section that the definitions and the three
technical assumptions given in section 4.a hold.' We will begin by
69
presenting some material that is useful for the proofs of the six
lemmas.
The number of observations. N. is a Poisson random varaible with
mean c • and therefore. it can be shown that
·s
m
m
E[N ; N>O] = E[N ] = c:+ o(c:)
(4.22)
d
(N-c s )/v'C"->N(O.l)
s.
as c s (4.23)
(N/c ) =' 1 + 0 (c-1/2) as c _.
(4.24)
s
P s
s
Recall that L(z)
-zK·(z). Using integration by parts. one can
=
show that
JL(z)dz = fK(z)dz = 1.
and
(4.25)
Jz~(z) = 3Jz~(z)dz = 3k .
(4.26)
Moreover. since K(z) is symmetric about zero.
2
JzL(z)dz = Jz (-K'(z»dz = 0 .
A
We will denote
~h(x)
(4.27)
N
=c-1s h-1 i=l
L L«x-Xi)/h).
It is also useful for us to calculate several derivatives
~d
A
moments of
~(x).
Suppose that {Y l .Y2 ..... YN} are the unordered
observations (i.e. the Yi's are the observations in some random order).
In addition. assume that Y and Z are independent random variables with
density function a(x)I[O.T](x).
d[~ (x)]/dh
n
= c- l ~ [-h-1K'«x-Y)/h) «x_Y)/h2) s i=l
L h [1_ (x-X.)-K (x-X )]
n
1
uh
i
s i=l
-1 .....
,.
[~h(x)~(x)].
= h
Moreover.
A
E[~(x)]
h-~«x-Y)/h)]
-1 N -1
. = c
,..
d[~(x)]/dh
Then. for x€[O.T].
-1
= C s ~(x-Y)A(Y)dy
70
(4.28)
~
E[~(x)]
= fKh(x-y)a(y)dy =
2
E[~(x-Y)]
(4.29)
2
2-_
= a(x) + h a"(x)[Su-K(u)/2] + o(h ),
and,
~
2
-2 N ..,2
-2
E[~(x) ] = E[c s i:l~(X-Xi) +.c s
;~; ~(x-Xi) ~(x-Xj)]
= c~2~(x-Y)A(Y)dY + c~2SfKh(x-y)~(x-z)A(Y)A(Z)dYdz
~
2
-1..,2
_2
E[~(x) ] = C
E[~(x-Y)] + I:.-[~(x-Y)]. .
(4.30)
s
Given (4.29) and (4.30), one can show that
A
d(~)/dh
=
E[d(~(x-Y»/dh]
-1
= h
A
E~h
- h
-1
A
~,
(4.31)
and
~2
d(Eab)/dh = -2c
-1 -1 ..,2
-1 -1
h E[~ (x-Y)]. + 2c h E[~ (x-Y)ln(x-Y)]
- 2h-l~[Kh(x-Y)] + 2h-lE[~(x-Y)] E[~(x-Y)]
d(~)/dh.= _2h-l{c-lE[~(X_Y)] - c-lE[~(x-Y)~(x-Y)]
+
~
[~(x)]
2
-
E[~(x)]E[~h(x)]}.
(4.32}
For these lemmas, we are interested in calculating
D (h) = -(h/2}D'(h) = (-h/2)A'(h) - (-h/2)M'(h).
1
We will find a decomposition of D (h} that will be convenient for the
1
proofs of several lemmas. Define:
Then, since we are using a circular design, we claim that
=
D1(h)
-(h/2)D'(h) = 81(h)+82(h)+83(h)+84(h)+85(h)+86(h)
(4.33)
where 8 = 8 - 8 , 8 = 821 + 8 22 , 83 = 831 - 832 , 8 4 = 841 - 842 ,
11
12
1
2
8 = 8 + 852 , and 86 = 861 - 862 , and
51
5
-2
8 11 (h) = 2c
! ! SKi(x) Kj(x) dx
s l~i<j~N
8 (h) = C- 2 ! ! S{K.(x)L.(x) + L.(x)K.(x)}dx
12
s l~i<HN
1
J
1
J
71
-1 N
A
A
S21(h) = C s i:1fKi(X){~(X) - E~h(X) - a(x)}dx
-1 N
A
•
S22(h) = Cs i:1JLi(x){a(x) - ~(x)}dx
S31(h) =
c~2i~1~(X-Yi)'- E[~(X-Y)]dx
-2
N
.
S32(h) = cs ' i:1~(X-Yi)~(X-Yi) - E[Kb(X-Y)~(x-Y)]dx
S41(h)
-1
S42(h)
-1 N
{(N-1)c s -1}
=
i:1~(X-Yi)E[Kb(X-Y)]
Cs
(1/2){Kb(x-Yi)E[~(x-Y)]+~(x-Yi)E[Kb(x-Y)]}dx
= {N(N-1)c-2
s -1}
2
E[Kb(x-Y)]E[~(x-Y)]}dx
J{E [Kb(x-Y)] -
-1
A..'"
S51(h) = (Nc s -1) JE[Kb(X-Y)] {~(x) - E~h(x) - a(x)}dx
-1
S52(h) = (Nc s -1) JE[~(x-Y)] {a(x) - ~(x)}dx
A
S61(h) = (Nc~1_1) c~1JE[~(~-Y)]dx
-1
S62(h) = (Nc s -1)
C
-1
JE[Kb(x-Y)~(x-Y)]dx.
s
Now, we must prove the claim (4.33).
(-h/2)A'(h) = -h/2
=
-h/2
TA2
d[fO~
TA
]/dh
TA
[fO(~)' A
T 2
2fO~a - JOa
-
T A2
T
First, expand (-h/2)A'(h).
2JOaj, a]
A
= -h [fo aj,(~-a)dx].
Using (4.28), we get
T
Jo(~- a)(~- ~h)dx
A
(-h/2)A'(h) =
(-h/2)A'(h)
A.
A
T
A
+
Jo(~- ~)(~- E~h-
+
Jo(~- a)(~- E~h)'
A
= Jo(~-~)
r ...
r
2
-
T
A
A
A
A
....
A
A
[Jo(~-~)
-
definitions of
A
A
a) +
r........
A
JO(~h- E~h)(a- ~)
(4.34)
A
~
A
JO(~-~)(~h-E~h)]
A
....
A
By substituting the
T A
A
2
T A
A
JO(~- ~)(~h- E~h)
and
~h
into
•
and then expanding these terms as a
sum of integrals of squares plus a sum of integrals of products, one
72
can show that:
Likewise,
TA
A.....
A.
TA
A
A
Jo(~-~)(~-E~h-a) + JO(~h- E~h)(a - ~) = 821 + 822+ 851 + 852 ,
Thus, by (4.34),
(-h/2)A'(h)
-2 N
= 8 1+
rv2
i:1J~(X-Yi)dx
82+ 84+ 85 + C s
-2 N T . . . . .
s i:l~(X-Yi)~(X-Yi)dx + JO(~-a)(~-E~h)' (4.35)
A
-
C
In addition, we need to calculate (-h/2)M'(h).
....
First of all,
(-h/2)M' (h) = (-h/2) d[fbE(~-a)2]/dh = (-h/2)JbE(~)' + hfbE(~) 'a.
8econdly, using (4.31) and (4.32), one can show that
(-h/2)M'(h) = c:lfE[~(X-Y)]dx - c:lfE[~(X-Y)~(x-Y)]dx + Jb(~)2
r . . . ",
r ..
T'"
- JO~E~h - JO~a + JOE~ha
=
c~lfE[~(X-Y)]dx - c~lfE[~(x-Y)~(x-Y)]dx
T
+ Jo(~ - a)(~ - E~h)'
A
.........
(4.36)
Combining (4.35) and (4.36), we can conclude that (4.33) holds:
D1 (h) = (-h/2)D'(h)
= (-h/2)A'(h) - (-h/2)M'(h)
-2 N rv2
'-2 N
= 8 +8 +8 +8 + c
L J~(x-Y.)dx - c L ~(x-Y.)~(x-y.)dx
1 2 4 5
s i=1
1
s i=1
.
1
1
+
Jb(~-a)(~-E~h) - {c~lfE[~(x-Y)]dx
-
C
-1
s
.
fE[~(x-Y)~(x-Y)]dx
= 8 + 8 + 8 + 8 + 8 + 86'
1
2 3 4 5
73
+
T.....
A.
A
JO(~-a)(~-E~h)}
(4.37)
Furthermore, we will be interested in D'(h).
Define
Then,
T
-1 N
A
= h d[fO ~(x)~(x)dx .
T
A
= h fa ~(x)~(x)dx = c
C
! ~i(Yi)]/dh
s, i=1
-1 N
Cs
i:l~i(Yi)
-1 N
-1 N
! n_ (x-Yi)a(x)dx - c
! rK. (x-Y.)a(x)dx
s i=I'n
s i=1"n
1
-2 N N
-2 N N
+ c
!
! ~n (Yi-Y.) - c
!
! I. (Yi-Y.).
s 1=1 j=I'
J
s i=1 j=1n
J
j¢1
Let
Y
and
Z
j¢1
be'random variables with distribution a(x)I[O,T](x).
Then, D1(h) can be decomposed such that
D1(h) = (h/2)o'(h) = T1(h) + T2 (h) + T3 (h)
where T1 = TIC T12 , T2 = T 21 - T22 and T3 = T31 - T32 , and
2
T11 (h) = 2c- ! ! {~(Y'-Yj) - rK (y.-y)a(~)dy - rK (y-Yj)a(y)dy
s 1~i<j~N 'n 1
"n 1
"n
+ f~(y-z)a(y)a(z)dYdz)
TI2 (h) = 2c- 2 ! ! {L (Y.-Y.) - n (Yi-y)a(y)dy - n (y-Y.)a(y)dy
s 1~l<j~N -h 1 J
'n
'n
J
+ fJ!n(y-z)a{y)a(z)dydz}
-1 N
T 2
-1 N
T 2
T21 (h) = C s l:l{~(Yi-y)a(Y)dY - f~(y-z)a(y)a(z)dydz - a(YI) + faa)
T22 (h) = C s i:l{J!n(Yi-y)a(y)dy - fJ!n(y-z)a(y)a(z)dydz - a(Yi) + faa)
N
T31 (h) = [(N-l)c- 1 - 1] c- 1 ! irK (Y.-y)a(y)dy - n. (Y.-y)a(y)dy}
s
s ii:l "n 1
'n 1
-1
T32 (h) = [N(N-l)c -2
s - Nc s ][f~(y-z)a(y)a(z)dydz
- fJ!n(y-z)a(y)a(z)dYdz].
74
It is convenient to denote
such that
(-h/2) n*'(h) - (5 1+
(-h/2) R* '(h) - (54+
(-h/2) O*'(h)
52+ 53)
55+ 56)
(T + T )
2
l
(-h/2) p*'(h) E T3 .
E
Essentially, n* and 0* include the terms that are found in the density
estimation case, and R* and p* include the extra terms that are
introduced by considering a nonhomogenous Poisson process.
Thus, the
intensity setting differs from the density setting in two ways.
First,
the number of observations in n* and 0* is now a random variable, and
second, R*.and p* are new terms that have to be considered.
At this
stage, we will present and prove the Lemmas that are necessary to prove
Theorem 4.1. and Theorem 4.2.
°(
Lemma 4.1: For each
a ( b ( ~ and aLL positive integers
sup EClc7 / 10 nx'(c-l/St)12e
N) 0] ~ A,(a,b,e),
s;a~t~b
s
s
sup EC!c7/100x'(c·l/5t)12e
N ) 0] ~ A,(a,b,e).
s;a~t~b
s
Furthermore 3
c)
e,
(1)
(2)
s
° such that
EClc7/10{nx'(c-l/Sr)
N)O] ~ A2 (a,b,e)!r-tI Ce , (3)
EClc7/10{ox'(c-l/Sr)
N)O] ~ A2 (a,b,e)lr-tI Ce
s
s
s
whenever a
s
~
r
~
t
~
b
75
(")
proof of Lemma 4.1:
We will present the proofs of parts (2) and (3) of Lemma 4.1.
We
Since h-c- l / 5 , in order to prove (3) it
begin by looking at part (3).
s
is sufficient to prove that for some e>O and constant A:
E[lc;/10{Sll(c~1/5r) - Sll(c~1/5t)}12e
N > 0] ~ Alr_tl ee ,
E[lc;/10{S12(c~1/5r) -S12(c~1/5t)}12e
N > 0] ~ Alr_tl ee ,
E[lc;/10{S2l(c~1/5r) - S2l(c~1/5t)}12e
N > 0] ~ A/r_tl ee ,
E[lc9/l0{S22(c-l/5r) _ S22(c- l / 5 t)}1 2e
N > 0] ~ A/r_tl ee •
5
S
s
'
E[lc;3/l0{S3l(c~1/5r) - S3l(c~1/5t)}/2e
(4.38)
(4.39)
N > 0] ~ Alr_t/ ee . (.4.40)
ee
N > 0] ~ Alr_tl .
E[lc;3/l0{S32(c~1/5r) - S32(c~1/5t)}12e
In this chapter, we will prove (4.38), (4.39), and (4.40).
The proofs
of the other inequalites are similar.
Throughout this proof let
a
~
r,t
e.
constant that depends on K, a, band
~
b and let A be a generic
First, consider (4.38).
We
can write Sll as
-1/5
Sll(c s
where
E[U .. (t)
IJ
I Yi,N]
t) = c
-2
~ ~ U .(t)
s l~i<j~N iJ
= E[U.j(t) I-Y .,N] = O.
J
1
Furthermore, let
Wij(r,t) = Uij(r) - Uij(t) .
The left side of (4.38) can be written in terms of W.. (s,t):
IJ
1 5
2e
; N > 0]
E[lc;/10{Sll(c~1/5r) - Sll(c- / t)}1
2e
= E[ Ic 9/10c -2 ~.~ Wij(r,t) 1
;.N > 0]
s
s l~i<HN
= c -11 us....
-E[ E[
s
I
~ ~
l~i<HN
W.. (r, t)
,2e I N];
N
IJ
> 0].
(4.41 )
Consequently, we are interested in finding an upper bound for
E[I
~ ~-
l~i<j~N
w. j (r,t)1 2e IN].
Given the definition of K.(x), we can eXpand
1
W such that
ij
Wij(r,t)
1
= W1 (r,t,Y i ,Y j )
76
+ W (r,t,Y ) + W (.r,t)
2
i
3
where
=JK(c-1/Sr)(x-Y i ) K(c-1/Sr)(x-Yj)dx
W1(r,t,Yi,Y j )
5
5
- JK(C-1/St)(x-Y i ) K(C-1/St)(x-Yj)dx.
5
5
=JK(c-1/Sr)(x-Yi)
W2 (r,t,Y i )
E[K(c-1/Sr}(x-Y)]dx
5
5
- JK(C-1/St)(x-Y i ) E[K(C-l/St)(X-Y)]dx,
5
5
W3 (r,t)
=JE2[K(C- l/Sr)(X-Y)]dx 5
~[K(C-l/St)(X-Y)]dx.
5
We begin with WI' For the first integral, use the substitution
-liS
-liS -1
u=(x-Yi)/(c s
r) (hence, du=(c
r) dx); and for the second integal.
use u=(x-Y.)/(c -liS t). Then,
1
5
-liS -2
-liS
-liS
r) JK((x-Yi)/(c s
r»K((x-Yj}/(c s
r»dx
IWl(r,t,Yi,Y j ) I = I(c s
_(c-l/St)-2JK((x-Yi)/(c-l/St»K((X-Yj)/(c-l/St»dxl
5
5
5
= l(c-l/Sr}-lSllK(U}K(U+(Y'-Yj)/(c-l/Sr)}du
S
-
1
5
- (C~l/St}-lS~lK(U)K(U+(Yi-Yj)/(c~l/St»dul
~ (c-l/Sr)-lSllK(u)IK(U+(Y'-Yj)/(C-l/Sr»-K(u+(Yi-Yj)/(C-l/St»ldu
s
-
I S S
+ !(c-l/Sr)-l_(c-l/St)-ll SllK(U)K(U+(Yi-Yj)/(c-l/St»du.
S
5
Since K is supported on [-1,1],
I(Yi-Yj)/(c~l/~)1 > 2.
continuous.
-
5
K(u+(Yi-Yj)/(c -liS
r) equals zero when
s
In addition, K is bounded and Holder
Therefore,
77
(4.42)
Next consider W2 .·
IW2 (r,t,Y i )1 = IIK(c-1/Sr)(x-Yi) [IK(c-1/Sr)(x-y)a(y)dy] dx
s
s
- IK(c- 1/ S t)(x-Y i ) [IK(c-1/St)(x-y)a(y)dy] dxl
s
~
I
s
IIK( c -l/S r
)(x-Y.)
K( -l/S r )(x-y)dx
Ic
s·
s
- IK(c-1/St)(X-Yi) K(c-1/St)(x-y)dxI a(y)dy
s
=I
s
IW 1(r,t,Y i ,y)1 a(y)dy.
By (4.42), we get
~
-1/Fo- -1
-1/Fo1[_2,2]{(Y i -y)/c s -b} a(y)dy.
IW2 (r,t.Y i ).I ~ I A Ir-t I 2(C S -b)
Since aCyl is bounded,
-1
-1/FoIW2 (r.t,Y i ) I ~· A Ir-t I~ 2 f(c -1/Fos -b) 1[_2,2]{(Y i -y}/c s -b}dy
IW2 (r,t.Y i )1 ~ Alr-tl~2.
(4.43)
Finally, look at W . Using similar methods as those used to
3
obtain (4.43), one can show that
IW3 (r,t)1 ~ IffIK(c-1/Sr) (x-y} K(c- 1/ Sr)(x-z)a(y)a(z)dydzdx
s
s
- ffIK(c-1/St)(x-y) K(c-1/St)(x-z)a(y)a(z)dydzdxI
s
.
s
y
<
- If IIK( c -1/S r )(x- ) K(·c -1/S r )(X-Z)dx
s
.
s
- IK(c-1/St)(x-y) K(c-l/St)(x-z)dxI a(y)a(z)dydz
s
~
IIA Ir-t I
~
s
-1/Fo- -1
-1/5.
2 (c s -b)
1[_2,2]{(y-z)/cs -b)a(y)a(z)dYdz.
78
Hence.
IW (r.t)1 ~ Alr-tl~2.
3
(4.44)
Substituting (4.42). (4.43) and (4.44) into the definition of
+1].
N
Observe that. conditional on N. { ! ! Wij(s.t)}k_l is a
l~i<j~k
martingale.
As a result. we can utilize a martingale inequality found
in Burkholder (1973. p. 40}.
2e
E[I ! ! Wij (r.t)1 l N] ~ E[I
l~i<j~N
sup
k=l ..... N
j-l
N
~ AE[( ! E[( ! Wij(r.t»
j=l
i=l
m
2
I
k-l
!!
Wij (r.t}1
l~i<j~k
2e
l
N]
e
{Yl.···Yj-l}.N]) 1 N]
2el
(4.46)
N].
k=l
i=l
Consider the first term on the right side of the inequality in (4.46).
+ A ! E[( ! Wkj(r.t»
Observe that
N-l
E[(i~lWiN(r.t) )21
Yl.···.YN-l.N]
N-l
=
i~lE[~NI
Yl·····YN-l·N]
N-l N-l
+ !
! E(WiNIY l ···· .YN_l·N] E(WkNIY 1 ···· .YN-l·N]
i=l k=l
kl'!i
N
Yl.N] +
N
!
! O.
j=l k=l
kl'!j
79
Then. the entire first term in the inequality of (4.46) equals.
N
j-l
2·
e
E[( ! E[( ! Wij(r.t) ) I {yl.· .. y ._l}.N]) IN]
j=1
i=1
J
e
N-l
2
e
~ E[N (E[(i:1 WiN (r.t»
I {yl.· .. yN-l}.N]) IN]
~ E[N e (Alr_tI 2c2 Nc s1/ 5 )e
~ A /r_t/ 2C2e N2e c el5
.
s
I
N]
(4.48)
Next. consider the second term in Burkholder's inequality.
Using
a partitioning argument as in Lemma 2.3 in Section 2.c. we can show
that
k-l
! E[( ! Wik(r.t)
k=1
i=1
co
2e
) I N]
N k-l
= E[ ! ( ! Wike r. t»
k=1 i=1
~
AN
e+l
80
_2e
E[WI2(r.t)
2e
.
I N]
I N].
Since
2~1,
CD
k-1
! E[( ! W (r.t»221 N] ~ Alr_tl 2e22 N2+1 c22/5-1/5
k=1
i=l ik
s
(4.49)
Thus. substituting (4.48) and (4.49) into (4.46). we get
E[I ! ! W. (r.t)1 22 1 N] ~ Alr_tl~2[N22c2/5 + N2+1c22/5-1/5].
l~1<HN Ij
s
s
and hence. from (4.41). we can conclude that
E[!c;/10{Sll(c~1/5r) - Sll(c~1/5t»122 ; N>O]
~ c- 112/ 5 E[ E[I
2e
W (r.t)! l N]
N>O]
s
l~i<HN ij
1 5
1 5
2e
; N>O]
E[lc;/10{Sll(c: / r) - Sll(c: / t)}1
~ c- 11e/ 5 E[ Alr-tl~e [N 2e c 2/ 5 + Ne+1 c2e/5-1/5]
s
s
s
4e
~ Alr-tl~2 [1 + c- /5+4/5]
by (4.22)
!!
N>O]
s
.~ A Ir-t I~e.
Thus. we have proven (4.38)
Next. we consider (4.39).
We can write.
~
1 5
1
S21(c: / t) = NViet) where
i=l
.
Let
Bi(r.t) = Vier) - Viet).
E[Vi(t)!N] = 0
Note that. utilizing Taylor expansion
methods along with (4.25). (4.26) and (4.27) gives
A
A
2E[a(c-1/5t)(X)] - E[~(c-1/5t)(x)] - a(x}
s
s
-1/5
-1/5
= I[2K(u)u(x-c
tu) - L(u)a(x-c s
tu)]du - u(x)
.
s
-1/5 2
2
-1/5 2
= 2a(x} + (c s
t) IK(u)u du u"(x) + o«c s
t»
- a(x)
- (c -1/5
t) 2IL(u)u 2
du "
u (x)/2 + o«c -1/5
t» 2 - u(x)
s
s
2
= (c -1/5
t) 2"
u (x) I[K(u)+(L(u)/2)]u2du + o«c -1/5
t».
s
s
.
81
With arguments similar to those used to prove (4.38), one can
prove that
IBi(r,t)1 ~ J{IE[~(c-1/5r)(X)] - E[~(C-1/5r)(X)] - a(x)l}
s
s
{IK(c- 1/ 5r)(x-Y i ) - E[K(c- 1/ 5r)(x-Y)]
s
s
- K(c- 1/ 5 t)(x-Y i ) + E[K(c- 1/ 5 t)(x-Y)]I}dx
s
+
s
J{IK(C- 1/ 5 t)(x-Y i ) - E[K(c-1/5t)(x-Y)]I}
s
s
A
A
{IE[a(c- 1/ 5 r)(X)] - E[~(c-1/5r)(x)] - a(x)
s
s
A
A
- E[a(C-V5t)(X)] + E[~(C-V5t)(x)] + a(x)l}dx,
s
s
and hence,
(4.50)
Again, Burkholder's martingale inequality can be used for the
k
N
martingale { ! B.(r,t)}
to show that
i=1 1
k=1
N
N
2e
E[I ! B.(r,t)1 l N] ~ AE[( ! E[B~(r,t)])eIN]
i=1 1
i=1
1
'"
A! E[IB.(r,t)1 2e IN]
i=1
1
N
~ AE[Ne(E[B~(r,t)])el N] + AE[ ! IB.(r,t)1 2e l N]
1
i=1 1
~ AE[N e (c-4/5 Ir_tI2~1)el N] + AE[N(c-2/5 Ir_tl~1)2el N]
s
s
~ AN e c-4e / 5 Ir_tI2~le + ANc-4e/5Ir_tI2~le
s
+
s
~ AN e c-4e / 5 Ir_tl~e.
(4.51)
s
As before. the left side of (4.39) can be written in terms of B.(s,t):
1
2e
E[lc;/10{S21(c;1/5r ) - S21(c;1/5t )}1
; N>O]
N
= E[lc 9/ 10 c- 1 ! Bi (r,t)1 2e ; N>O ]
s
s i=1
-e/5
2e
N
.
E[ E[ I ! B. (r, t) I I N] ; N>O ]
s
i=1 1
e
5
e
~ c- / E[(AN c-4e / 5 Ir-tl~e); N>O]
=c
s
s
82
N~O]
by (4.22).
Thus, (4.39) has been proven.
Finally, we'll prove (4.40).
-1/5
S3l(c s
t)
Let
N
= C-2
s i:lVi(t)
Bi(r,t) = Vi(r)-Vi(t).
B2 (r,t)
I
where E[Vi(t) N]=O .
Then, Bi(r,t) can be written as
Bi(r,t)= Bl(r,t,Y ) + B2 (r,t)
i
Bl(r,t,Y i ) E
As before, we represent
where,
~c-1/5r)(x-Yi)dx
s
-
~c-1/5t)(x-Yi)dx,
s
JE[~c-1/5r)(X-Y)]dx - JE[~c-1/5t)(X-Y)]dx.
E
s
s
1/5
First consider B . Using the substitution w=(x-Y.)/(cr) for
,
s
l
-1/5
the first integral and w=(x-Yi)/(c s
t) for the second integral, we
get
IBl(r,t,Y,.)1 = l(c-1/5r)-2~«x_y.)/(c-1/5r»dx
S I S
.
_ (c-1/5t)-2~«x_y )/(c- 1/5 t»dxl
s i s
= l(c~1/5r)-1 J~lI2!(w)dW - (c~1/5t)-lJ~1I2!(w)dwl
~ Ac~/5 [Il/r - l/tl J~lI2!(w)dx]
~
Since K is bounded and (rt)
IBl(r,t,Y i ) I
Likewise,
~
IB (r, t)
2
Ir-t.I J_ll'(w)dx].
1.2
-1
Ac s1/5 [(rt).
Ac s1/5
-1
~
a
-2
,
Ir-t le 2 .
I ~ Ac~/5
IBi(r,t) I
(4.52)
Ir-t le 2
~
,
and hence,
Ac s1/5 Ir-t Ie 2.
83
(4.53)
Now, (4.53) and Burkholder's
ineq~lity
imply that
E[ Ic 13/ 1O{S . (c-1/5 r ) - S
(c-1/5 t )} 122 . N>O]
s
31a s
31a s
'
N
= E[lc13/10c-2! B (r,t)1 22 ; N>O]
s
s i=l i
= c -72/5 E[ E[ 1 !N Bi(r,t) 122 1 N] ; N>O].
s
i=1
~ c-72 / 5 E[Alr-tl~e (N e c 22/ 5 + Nc22/ 5 ) N>O]
. s
s
~ Alr-tl~e c~e E[N 2 ; N>0J.
~ Alr-tl~2.
Therefore, we've proven (4.40).
This concludes the proof of part (3)
of Lennna 4.1.
.
Now we prove part (2) of Lennna 4.1.
Since h-c
-1/5
,in order to
s
show that part (2) holds, it is sufficient to prove:
N>O]
~.A,
N>O]
~
A,
N>O]
~
A,
N>O]
~
A.
We will present the proofs of (4.54) and (4.55) here.
are similar.
where
(4.54)
(4.55)
The other parts
We begin with (4.54). Notice that
-1/5 .'
-2
T (c
t) = 2c
! ! W.. (q
11 s
s 1~i<HN lJ
E[Wij(t)1 Yi,N] = E[Wij(t)! Yj,N] = O.
Consequently, we can
write,
N>O]
84
2i " N>O]
E[I c 9/10 T11 (-1/5
cs
t )1
s
s;a~t~b
_
sup
9i~
= Ac s
E[s;:~~~b
I~
cs
l~i<~~NWij{t)
12i
; N>O].
We now can use the same argument as seen in Lemma 2.2 from Chapter 2.
That is, one can use Burkholder's inequality to show that for h-c- 1/ 5
s
and
i~l,
sup
E[lc;/10Tll{c~I/5t)12e;N>0]
s;a~t~b
Next, look at (4.55).
We can represent
T21 {c-1/5 t) = c -1 ~ V.{t)
s
s i=1 1
where E[V.{t)
I
N]=O.
1
Thus, we can write,
Again using the same argument as in Lemma 2.2 (Chapter 2) for h-c
-1/5
s
,
E[lc9 / IO T2l{c-l/5t)12i ; N>O] ~ c 9i / 5 [Ac-9i / 5 ] ~ A.
s;a~t~b
s
s
s
s
sup
Hence, we have proved part (2) of the lemma.
of Lemma 4.1.
**
85
This completes the proof
Lemma 4.2:
For some
and any O(a(b(m,
~)O
sup {/n'(c- 1 / 5 tJI + 16'(c- 1 / 5 tJI} = 0 (c-3/5-~J
~t~
s s p s
Furthermore, for any
mut tLpte of c
~I)O
as
(1 J
s~.
and any nonrandom hi asymptotic to a constant
-1/5
s
as s_.
'(2J
proof of Lemma 4.2:
We'll present a proof for the'D' term in part (2) and for the 6' term
in (I) of Lemma 4.2.
Start by considering D'(h).
Note 'that
D'(h} = D"'(h} + R"'(h}.
Thus, we can treat the two summands separately.
o~
The methods used
D", are similar to those in Lemma 3.2 in Hall and Marron (1987) .
.
Since K and L are both Holder continuous and have compact support,
there exists a
~>O
sufficiently large such that
sup
(4.56)
a~r~t~b
Ir-t l~c~~+lI5
Assume that a ( lim c hi
s
s-
< b.
Then, a sequence can be constructed such
that
lIf>-hi - c -~1 = t < t <. . . < t 1
s
0
ml
s
where t - t _ = c-~ for each i. Let
i
i l
s
.
-lI5-~1
~ = {(ti,t ) . 0 < t - tj~ c
' i~m }.
i
s
j
C
Note that there are (2C-~1/C-~) points t., and for each t
s
s
1
. t s t j such tha t (t i - tj) "'/ c -lI5-~1
( c -1/5-~11
s
c -~)
s
pOln
s
.
86
there are
i
Therefore, the
number of pai rs in ~ has order c -~ 1 -115+2/3.
s
P[
sup
(ti,tj)€~
c7/1010*'(c-I/5ti) - 0*' (c- I / 5 t )\ > 7)]
j
s
s
s
~ P[
sup
c7/10Io*'(c-I/5ti) - 0*' (c- I / 5 t )1 > 7): N>O]
s
s .j
( t i ' t j )€~ s
+
~
Given any 7»0,
P[N=O]
~
E[7)-lc~/10Io*'{c~I/5ti) - O*'{c~I/5tj)12e: N>O]
(ti,tj)€~
+ exp( -c ).
s
By allowing
s-.
e
to be sufficiently large, this term converges to zero as
Thus,
p
c7/10Io*'(c-I/5t) _ 0*' {c- I / 5 t )1 ---.0.
(t.,t.)€~s
sis
j
sup
1
(4.57)
J
(4.57) together with (4.56) imply that as s-.
(4.58)
Now consider R* '(h)=(-h/2) -i [S4(h)+S5(h)+S6(h)].
at S5'
We can define
= (Nc-l-l) Q(h).
s
87
Begin by looking
Then, as s- (and hence h-;Q),
A
Q(h) = f E[Kh(x-Y)]
A
{~(x)-E~h(x)-a(x)}dx
A
+
f
E(~(x-Y)] {a(x)-~(x)}dx
= f~{a(x)+h2a"(x)[fu~2]+o(h2)}{h2a"(x)[fu2(2K-L)/2]+o(h2)}dx
+ f~{a(x)+h2a"(x)[fu2u2]+O(h2)} {-h2a"(x) [fu~2]+o(h2)}dx
2 T
2 .
2.. _
2
= h (fOa"a)[fu (2K-L)/2 - fU-K/2] + op(h )
Q(h) =
h2(f~a"a)(-2k) + Op(h2 ).
(4.S9)
For any t such that It-c1/~, I~c-~'.
s
s
IQ( c-1/5t )-Q(h,) I = Ic-2IS t 2 -h~ I (fToa"a) (-2k) + 0 (c-2IS) .
s s p s
-lIS'
-~ " we know that It+c 1/5.
I
Since h,~cs
and It-c s1/5.
-h, I~cs
s -h, =Op(l).
In addition, a and a" are bounded. Therefore,
IQ(c- 1/S t)-Q(h,)I ~ A c-2ISlt_c1/~, 1+ 0 (c-2IS ).
s
s.
s
p s
Moreover, I t-c;/~, I ~ c~~' implies that
c 2lS [Q(cs
s
1/S
t) - Q(h,)]
~0
as s-.
(4.60)
By (4.24), we know that
d
c~/2 {Nc~l_l} ~ N(O,l)
as s-.
(4.61)
Using Slutsky's theorem on (4.60) and (4.61) gives,
p
c;/10 {ISS(c~l/St) - SS(h,)I} ~ 0
as s-.
Using the same procedure forS 4 and S6'
Since t was chosen arbitrarily,
P
----fo
0 as s-f» .
(4.62)
Combining (4.58) and (4.62) gives the main result for D' in part (2) of
Lemma 4.2.
88
Next, consider the 6' term ih part (1) of Lemma 4.2.
.. .
recall that 6(h)=6 (h)+p (h).
First,
Using an argument that involves the
partitioning method found in Hall and Marron (1987) and seen above for
.
D , one can show that
sup {16"'(c- 1/ S t)I} = 0 (c-3 / S- E)
a~t~b
s
p s
Now, look at p"'(h)=(-h/2)-1 T (h).
3
as s~.
First of all,
-1
T3 (h) = [(N-1)c s ' - 1] B(h)
where as
(4.63)
(4.64)
s~.
B(h) = c
-1 N
~ {~(x-Yi)a(x)dx
-
s i=l
~(x-y.)a(x)dx}
1
-1
- Nc s
{J~(x-y)a(x)a(Y)dxdy
-
J~(x-y)a(x)a(y)dxdy}
N
= h 2c-1 ~ a"(Y i) [fu2 (K-L)/2] - Nc -lh2(fba"a) [fu 2 (K-L)/2]
s i=l
s
+
= h
2
0
p
2
(h )
2
(fba"a) [fu (K-L)/2] - h
2
(fba "a)[fu2 (K-L)/2] + Op(h2 )
2
= op(h ).
Hence, for any t such that
O<a~t~b<~,
B(c- 1/ S t) =
s
Since [(N-1)c- 1-1] is 0 (c- 1/ 2 ),
s
p
s
0
p
(c-2/S)
s
as s~.
for some E)O,
sup {IT (c- 1/ S t)I} = 0 (C-41S-E).
3 s
p s
a~t~b
Therefore, as
s~,
sup {lp"'(c- 1/5t)!} = 0 (c-3 / S- E).
a~t~b
s
p s
(4.65)
(4.65) together with (4.63) imply that the main result for 6' in part
(1) of the lemma- holds.
This completes Lemma 4.2 .....
89
Lemma 4.3:
For some 6)0,
Iho .
h
0
I
IhCV .
+
h
0
I
= 0 (C· 3 /5-6)
P
as s~.
s
prooF of Lemma 4.3:
The proof of this lemma is parallel to the proof of Lemma 3.3 in
Hall and Marron {1987}.
Consider first
Ihev-
h
0
I.
By definition of
A
h ev ' {j and D.
A
CV' {ho} = CV' {ho} - CV' {hev }
A
A
= A' {ho} - A' {h } + {j' {ho} - {j' {h }
ev
ev
A
A
A
CV'{h } = M'{h } - M'{h } + D'{h } - D'{h } + {j'{h } - {j'(h }'
cv
ev
o
o
o
o
ev
p
A
We know from the proof of Theorem 2.2 that h
ev
/h
0
~
1.
Hence. by
Lemma 4.2,
W{hev } + 0p {e-s 3/S- 6 }
CV'{h } = M'{h } o
0
In addition, using Lemma 4.2 and the fact that h
o
as
s~.
{4.66}
minimizes M{h}, we
get
CV'{h } = A'{h } + {j'{h }
o
o
o
= M'(h } + D'{h } + {j'{h }
o
o
o
= D' {h }.+ {j' (h )
o
o
CV'{h } = 0 {e-3/S-~} as s~.
o
{4.67}
s
p
Combining {4.66} and {4.67} implies that
op (e-s 3/S- 6 ) = M'{h
}
. 0
-
W{he}v= .{h0 - hev } M"{h*}
.
Since h*/ho ~ 1 and . M"{h0
M"{h*} = a"e-21S + 0 {e-21S } .
. s
p s
when h* lies between h
o
and
hev
{4.68}
}-a"e-21~
5
Substituting this into {4.68} gives
{h o
hev } = 0 p {e-s 1/S-
This is the first part of the Lemma.
90
6
}
as s~.
{4.69}
Consider now
.'
A
P
h /h --> 1.
o
0
Iho -h0 I.
In the proof of Theorem 2.2, we saw that
Thus, Lenuna 4.2 implies that as
S-,
A
A'(ho ) = A'(h ) - A'(h )
o
o
= /lI' (h ) - /lI' (h ) + D' (h ) - D' (h )
o
o
o
o
3
= /lI'(h ) - /II'(h ) + 0 (c- /5-,,).
o
0
p s
A
Since h
o
A
(4.70)
is the minimizer of /lI(h), and then by Lenuna 4.2 it follows
that
A' (h )
o
= D' (h0 ) = 0p (c -3/5-,,)
.
s
(4.71)
Using (4.70) and (4.71), we get
op (c -3/5-,,)
s·
= /II' (h ) - /II' (h ).
0
0
As above, this implies that as s -
h ) = 0 p (c-s 1/ 5 -,,).
(h o
(4.72)
0
Combining (4.69) and (4.72) gives Lemma 4.3.
**
Leuma 4.4:
as s
-+
00.
proof of Lemma 4.4:
it is sufficient to prove that
c
9/10
D
2
D (h ) --> N(O, a
s
4
1 o
0
2
0
(4.73)
),
We saw in the proof of Lemma 4.1 that E[~3(h )] = 0(c- 13/ 5 ).and
o
s
consequently S3(h ) = 0 (c-9 / 10 ). Similarly, S6(h ) = 0 (c- 9 / 10 ).
o
p s o p s
Next consider S4(ho )'
=
[(N-1)c-s 1-1] B(h)
91
.
•
where
-1 N
B(h) = C s i:1f{Kh(X-Yi)E[Kh(X-Y)]
-
(1/2)(Kh(x-Yi)E[~(x-Y)]
+
(1/2)~(X-Yi)E[Kh(X-Y)])}dx
-1
:r _2
- (Nc s +l)fO{e-[Kh(X-Y)]-E[Kh(X-Y)]E[~(x-Y)]}dx + 0p(l)
2 1N
2
= h c- ! a"(Y ) [Ju (K-L)/2]
i
s i=l
2
- (Nc-1+1) h2 (fT
oa "a) [JU2(K~L)/2] + 0p (h )
s
=
h2(f~a"a)( -2k)
2
B(h) = 2kh
Recall that
-
2h2(f~a"a)( -2k)
2
+ Op(h )
:r
2
(fOa"a) + op(h ).
S5(h) = (Nc
-1
-1) Q(h) where by (4.59),
s
2:r
2
Q(h) = -2kh (fOa"a) + o(h ).
Thus. f or h - C -1/5
'
s
-1
-2/5
,[S4(h) + S5(h)] = (Nc s -1) [0,+ o(c s
)].
1/2 - 1 '
-1/5
Since c S [Nc 5 -1] is asymptotically N(O,l). and h0 -a0 c s
,
one can show using Slutsky's theorem that [S4(h )+S5(h )] = 0 (c-9/ 10 ).
o
0
p
s
Let Si = Si(ho ) in this proof. Then. in order to prove (4.73) and
hence the Lemma, it is sufficient to prove that
(4.74)
Define:
2_':r2
'
2
= (2/ao ) (fOa) f{JK(y+z)[K(z)
- L(z)]dz} dy
'a 1
a; :;
4a:
and observe that
a
k
[J~(a,,)2a - (fba"a)2]
2
2
-2 2
2
= a (a + a ).
1
2
o
4
In order to prove (4.74) and hence Lemma 4.4, it is sufficient to
prove that conditional on N,
S
( C 9/10
l'
s
C
9/10 S)
2
s
D (Z I' Z)
2 as c s""
--+
92
(4.75)
where Zl and Z2 are independent normal variables with zero mean.
2 -2 2
First. we will prove
var(Zl) = N c s 0 1 and var(Z2)
(4.75). and then we will show how (4.75) implies that (4.74) holds.
Conditional on N. 8
density setting.
1
and 8
2
are similar to the terms found in'the
Therefore. the proof of (4.75) is parallel to the
proof found in Hall and Marron (1987).
conditional on N. 8
1
and 8
2
It can be shown that
are uncorrelated.
In addition. we can
write
8 1 = :I :I V(Y .. Yj )
i<j
1
such that E[v(Yi·Yj)IYj.N] =
.
N
:I v(Y.)
=
i=l
(4.76)
1
a a.s. for j<i. Hence. conditional on N.
for any a and b .
N
aS 1 + bS2 =
:I Wi =
i=l
N
:I [a :I V(Yi.Y j ) + bv(Y )]
i
i=l
j<i
is a martingale with respect to the a-fields generated by {Yl .... Y }.
N
The method found in the proof of Theorem 1 in Hall (1984) can be
utilized to show that conditional on N.
(aS
D
2
I
2
l + bS2 ) ~ N(O. a var[8 l N] + b var[82 IN]).
(4.77)
Moreover. we show below that
c 9/5 var[8 1 IN]
s
~
N2 c -2
s
0
2
1
= N2c~2 2a:l(fba2) f[JK(y+z) {K(z)-L(z)}dz]2dy .
9/5
I
-1
var[82 N] ~ Nc s
cs
O
2
-1
2 = Nc s
4 2
T . 2
T.. 2
o k (fO(a') a - (faa a)}.
4a
(4.78)
(4.89)
Using the Cramer-Wold device. (4.77). (4.78) and (4.79) imply that
(4.75) holds.
At this stage. we will prove (4.78) and (4.79).
From Hall and
Marron (1987). we can conclude that given observations Y . Yj from a
i
density f(x).
93
= 2ho-1.(If 2 ){I[fK(y+z){K(z)-L(z)}dz] 2dy}
Var[V(Yi,Y j )]
(4.80)
Var[v(Y )] = 4114 k 2 {f(f,,)2f - (ff"f)2} + o(h4 )
i
o
0
where V(.) and v(.) are defined as in (4.76).
(4.81)
Moreover, in the
intensity estimation situation, we know that conditional on N, the Y. 's
1
are distributed with density a(x)I[O,T](x).
Var[Sl IN] = Var[c-2
s
I
7<; V(Yi,Y j )
Now,
2
N] = c -4
s (N -N) Var[V(Yi,Y j
)
I N].
Thus, by (4.80),
Var[Sl IN ] = c-4 (N2_N) [2h-1(ITOa2){I[fK(y+Z){K(z)-L(z)}dz]2dY}+O(h-1)]
s
0
0
1
2
2
2
T
= c-s h-0 ~ c-s 2(Ioa ) {I[fK(y+z){K(z)-L(z)}dz]2dy} + o(c-~-l).
s 0
This means that
Var[Sl/ N] ~ c~9/5 N2 c~2 2a~1 (Iba2 ){I[fK(y+z){K(z) - L(z)}dz]2dy}.
In addition, (4.81) leads to
Var[S2 IN ] = Var[c-1~ v(Yi) I N]
s i
I N]
= C-2
s N Var[v(Y i )
= c-2 N [4114 k 2 {fT (a,,)2a s
0
O
(POa"a)2}
+ o(h4 )]
0
1 4
= c- 1 h4 Nc- 1 4k2 {fT (a,,)2a _ (fT
oa"a)2} + o(c- h ).
S
0
s .
o
5
0
Var[S2IN] ~ c~9/5 Nc~l 4k2 a: {fb(a,,)2a - (fba "a)2}.
The limits of Var[SlIN] and Var[S2IN] above confirm that (4.78) and
(4.79) hold.
Finally, given that (4.75) is true, we will prove (4.74) and hence
the main result of Lemma 4.4.  Let \Phi(\cdot) be the standard normal
cumulative distribution function, and let \phi(x,\sigma) be the normal
density function at x with mean zero and standard deviation \sigma.
Then,

    \lim_{s\to\infty} P[c_s^{9/10}(S_1+S_2) \le x]
      = \lim_{s\to\infty} P[(S_1+S_2) \le c_s^{-9/10} x]
      = \lim_{s\to\infty}\Big\{\sum_{n=0}^{\infty} P[(S_1+S_2) \le c_s^{-9/10} x \mid N=n]\, P[N=n]\Big\}.
Since the summands are positive numbers, we can move the limit inside
of the summation.

      = \sum_{n=0}^{\infty} \lim_{s\to\infty}\big\{P[(S_1+S_2) \le c_s^{-9/10} x \mid N=n]\, P[N=n]\big\}
      = \sum_{n=0}^{\infty} \Big[\lim_{s\to\infty}\big\{P[(S_1+S_2) \le c_s^{-9/10} x \mid N=n]\big\}\; \lim_{s\to\infty}\big\{P[N=n]\big\}\Big].
By (4.75), conditional on N, (S_1+S_2) is asymptotically normal with
variance c_s^{-9/5}(N^2 c_s^{-2}\sigma_1^2 + N c_s^{-1}\sigma_2^2).
Moreover, by (4.25), N is asymptotically normal with mean c_s and
variance c_s.  Hence,

    \lim_{s\to\infty} P[c_s^{9/10}(S_1+S_2) \le x]
      = \lim_{s\to\infty} \int \Phi\Big[\frac{c_s^{-9/10} x}{c_s^{-9/10}(n^2 c_s^{-2}\sigma_1^2 + n c_s^{-1}\sigma_2^2)^{1/2}}\Big]\, \frac{1}{\sqrt{2\pi c_s}}\, e^{-(n-c_s)^2/2c_s}\, dn
      = \lim_{s\to\infty} \int \Phi\Big[\frac{x}{(n^2 c_s^{-2}\sigma_1^2 + n c_s^{-1}\sigma_2^2)^{1/2}}\Big]\, \frac{1}{\sqrt{2\pi c_s}}\, e^{-(n-c_s)^2/2c_s}\, dn.
Using the substitution w = c_s^{-1/2}(n - c_s) (i.e.
n c_s^{-1} = [w c_s^{-1/2} + 1]), we get

    \lim_{s\to\infty} P[c_s^{9/10}(S_1+S_2) \le x]
      = \lim_{s\to\infty}\Big\{\int \Phi\Big[\frac{x}{((w c_s^{-1/2}+1)^2\sigma_1^2 + (w c_s^{-1/2}+1)\sigma_2^2)^{1/2}}\Big]\, \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw\Big\}.   (4.82)
(4.82) implies that the asymptotic cumulative distribution function of
(S_1+S_2) is the limit of a weighted normal cumulative distribution
function with normal weights.  We show below that this weighted
function actually is a normal distribution function.  First, by the
monotone convergence theorem, it follows that

    \lim_{s\to\infty}\Big\{\int \Phi\Big[\frac{x}{((w c_s^{-1/2}+1)^2\sigma_1^2 + (w c_s^{-1/2}+1)\sigma_2^2)^{1/2}}\Big]\, \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw\Big\}
      = \int \lim_{s\to\infty}\Big\{\Phi\Big[\frac{x}{((w c_s^{-1/2}+1)^2\sigma_1^2 + (w c_s^{-1/2}+1)\sigma_2^2)^{1/2}}\Big]\Big\}\, \phi(w,1)\, dw
      = \int \Phi\Big[\frac{x}{(\sigma_1^2+\sigma_2^2)^{1/2}}\Big]\, \phi(w,1)\, dw.

Hence,

    \lim_{s\to\infty} P[c_s^{9/10}(S_1+S_2) \le x]
      = \int \Phi\Big[\frac{x}{(\sigma_1^2+\sigma_2^2)^{1/2}}\Big]\, \phi(w,1)\, dw
      = \Phi\Big[\frac{x}{(\sigma_1^2+\sigma_2^2)^{1/2}}\Big] = \Phi\Big[\frac{x}{2a_0\sigma}\Big].   (4.83)

Thus,

    c_s^{9/10}(S_1+S_2) \to_D N(0,\, 4a_0^2\sigma^2),

and this completes the proof of Lemma 4.4.
**
Lemma 4.5:  Given assumptions 1), 2) and 3),

    c_s^{9/10}[D_1(h_0) - (h_0/2)D_1'(h_0)] \to_D N(0,\, 4a_0^2\sigma_{CV}^2)  \quad \text{as } s \to \infty.

Proof of Lemma 4.5:  First, define

    D_1(h) - (h/2)D_1'(h) = T_1 + T_2 + T_3,

    \sigma_1^2 = (2/a_0)(\int_0^T a^2)(\int L^2),

    \sigma_2^2 = 4a_0^4 k^2[\int_0^T (a'')^2 a - (\int_0^T a''a)^2],

and observe that \sigma_{CV}^2 = a_0^{-2}(\sigma_1^2 + \sigma_2^2)/4.
Consider T_3.  Recall from (4.64) that

    T_3(h_0) = [(N-1)c_s^{-1} - 1]\, W(h_0), \qquad \text{where } W(h_0) = o(h_0^2).

Thus, one can show that T_3(h_0) = o_p(c_s^{-9/10}).

Therefore, in order to prove Lemma 4.5, it is sufficient to show
that

    c_s^{9/10}(T_1 + T_2) \to_D N(0,\, 4a_0^2\sigma_{CV}^2).   (4.84)
Given N, T_1 and T_2 are terms that are found in the density
setting.  Therefore, we use a procedure similar to the one seen in
Lemma 4.4 for T_1 and T_2.  The density setting result of Lemma 3.5 in
Hall and Marron (1987) is utilized to show that, conditional on N,
aT_1 + bT_2 is asymptotically normal and

    c_s^{9/5}\,\mathrm{var}(T_1|N) \sim N^2 c_s^{-2}\sigma_1^2 = N^2 c_s^{-2}\, 2a_0^{-1}(\int_0^T a^2)\int L^2,

    c_s^{9/5}\,\mathrm{var}(T_2|N) \sim N c_s^{-1}\sigma_2^2 = N c_s^{-1}\, 4a_0^4 k^2\{\int_0^T (a'')^2 a - (\int_0^T a''a)^2\}.

Then, the Cramer-Wold device implies that, conditional on N,

    (c_s^{9/10} T_1,\; c_s^{9/10} T_2) \to_D (Z_1, Z_2)  \quad \text{as } c_s \to \infty,   (4.85)

where Z_1 and Z_2 are independent normal variables with zero mean.
Statement (4.84) and hence Lemma 4.5 follow from (4.85) using a similar
argument to the one found in Lemma 4.4.
**
Lemma 4.6:  Given assumption 4) as well as assumptions 1), 2) and 3),
for any 0 < a < b < \infty,

    \sup_{a \le t \le b} |D''(c_s^{-1/5} t)| = o_p(c_s^{-2/5})  \quad \text{as } s \to \infty.

Proof of Lemma 4.6:  We begin by creating a decomposition of
D(h) - (h^2/2)D''(h).  Then, as in Lemma 4.1, we can prove that

    \sup_{a \le t \le b} E[\,|c_s^{1/2} D^{*\prime\prime}(c_s^{-1/5} t)|^{2e};\; N > 0\,] \le A(a, b, e).

Moreover, using the same methods as those found in Lemma 4.2, it can be
shown that for some \epsilon > 0,

    \sup_{a \le t \le b} |D^{*\prime\prime}(c_s^{-1/5} t)| = O_p(c_s^{-2/5-\epsilon})  \quad \text{as } s \to \infty,

    \sup_{a \le t \le b} |R^{*\prime\prime}(c_s^{-1/5} t)| = O_p(c_s^{-2/5-\epsilon})  \quad \text{as } s \to \infty,

and hence,

    \sup_{a \le t \le b} |D''(c_s^{-1/5} t)| = o_p(c_s^{-2/5})  \quad \text{as } s \to \infty.

This completes the proof of Lemma 4.6.

**
CHAPTER 5

AN EXAMPLE OF KERNEL INTENSITY ESTIMATION WITH THE LEAST-SQUARES
CROSS-VALIDATION BANDWIDTH APPLIED TO COFFEE SALES DATA
In general, there are two ways of viewing kernel estimation.
First of all, our goal might be to estimate a "true" underlying
intensity function.
Certainly, this goal motivates the theory that was
presented in the previous chapters.
The alternative is to view the
kernel estimator as an exploratory data analysis tool.  That is, we use
smoothing techniques in order to see important features in the data.
This is typically the big advantage of using kernel estimation in
practice.
We saw in Chapters 2 and 4 that the least-squares cross-validation
bandwidth has desirable asymptotic properties.
Asymptotic analysis
helps us see the basic structure of an estimator; however, since data
sets always have a finite number of observations, it is essential that
an estimator also performs reasonably well for moderately large data
sets.
We will show how the kernel intensity estimator with the
cross-validation bandwidth can be utilized to analyze data by
presenting an actual example with coffee purchase data.
5.a. Kernel Estimation of the Coffee Data.
In this chapter, we aim to describe the rate of purchases for
various brands of coffee.
In particular, we are interested in the
sales of regular coffee (i.e. not decaffeinated and not instant) in
standard sized packages (between 13 and 32 ounces) at one given store.
The data were obtained from the Academic Research Data Base at
Information Resources Inc.
Two thousand families were observed over a
period of one hundred and eight weeks (April 1980 through April
1982).
Each time a participating family member purchased coffee, the
time of the sale and the coffee brand were recorded.
Therefore, each
data set represents the sales of one brand of coffee at one store, and
the observations are the ordered times that the particular brand was
purchased by any of the participants in the study.
For a process such as the sales of coffee, we do not expect the
rate of observations to be constant over time.  For instance, we expect
coupons, price changes and advertising campaigns to have an effect on
sales.
Thus, we are interested in presenting a curve which describes
the purchase rate for a given brand of coffee.
If we assume that the
purchase times form a nonhomogeneous Poisson process, then the
intensity function describes the rate of coffee purchases over the
relevant time.
In addition, we are interested in observing the
relationship between sales and marketing efforts.
By overlaying the
promotional activities above the intensity estimate, we hope to explore
the connection between promotions and sales.
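To make this model concrete, one can simulate a nonhomogeneous Poisson
process with a specified intensity by the standard thinning method.
The sketch below is ours, not part of the original study; the rate
function lam is an entirely hypothetical stand-in for a purchase rate
with one promotion-driven peak.

import numpy as np

rng = np.random.default_rng(0)

def simulate_nhpp(lam, lam_max, T):
    # thinning: candidates arrive as a homogeneous Poisson process of
    # rate lam_max; a candidate at time t is kept with probability
    # lam(t) / lam_max, which requires lam(t) <= lam_max on [0, T]
    t, times = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)
        if t > T:
            return np.array(times)
        if rng.uniform() < lam(t) / lam_max:
            times.append(t)

# hypothetical weekly purchase rate: a baseline of 10 sales per week
# plus a bump centered at week 40 (say, a promotion)
lam = lambda t: 10.0 + 30.0 * np.exp(-0.5 * ((t - 40.0) / 2.0) ** 2)
X = simulate_nhpp(lam, lam_max=41.0, T=108.0)   # simulated purchase times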
A kernel estimator of the intensity function is appealing since it
does not restrict the researcher to any prespecified family of
functions.
This method allows one to see the existing structure in a
data set since the kernel estimate is high at times when there are many
observations and is low where there are few observations.
In the purchase rate context, the area under the curve over an
interval [r,s] is approximately the number of coffee sales that
occurred between time r and time s.
Let t be time measured in weeks; then, for this example we are
interested in the interval [0,T] where T=108.  Let N be the number of
purchases and X_1, X_2, ..., X_N be the purchase times that we observe.
The kernel smoothed purchase rate is:

    \hat\lambda_h(t) = \sum_{i=1}^{N} K_h(t - X_i)  \quad \text{for } t \in [0,T],   (5.1)

where h is the bandwidth or smoothing parameter and K(\cdot) is the
kernel function.
Since we are estimating an intensity function on a finite interval,
it is important to consider boundary effects.  In particular, there are
no observations outside of the interval [0,T], and hence, the kernel
estimator is artificially low at the endpoints.  In fact, if \lambda(0)
and \lambda(T) are positive, it is known that \hat\lambda_h is
inconsistent at the two endpoints.  This is parallel to the problem in
density estimation when there are discontinuities in the density
function.  Several solutions have been studied by Rice (1984), Gasser,
Muller and Mammitzsch (1985), Schuster (1985) and Cline and Hart (1986).
Furthermore, when boundary corrections are not used, the kernel
intensity estimate has mass outside of the relevant interval, and
hence, \int_0^T \hat\lambda_h(t)\,dt is smaller than the number of
observations in [0,T].
In this chapter, we use the kernel estimator with "mirror image"
corrections at the two endpoints which was presented in Schuster (1985)
and Cline and Hart (1986).  With this method, the mass from the kernel
functions that extends beyond the boundaries of the interval is folded
over at 0 and T so that all of the area is inside of the interval
[0,T].  Therefore, the resulting kernel estimator with boundary
adjustments is defined by

    \hat\lambda^*_h(t) = \sum_{i=1}^{N} [K_h(t - X_i) + K_h(t + X_i) + K_h(2T - t - X_i)]  \quad \text{for } t \in [0,T].   (5.2)
The kernel function K(\cdot) in (5.2) is generally a symmetric
probability density function.  In our analysis, for computational
simplicity we use the biweight kernel:

    K(x) = 0.9375(1 - x^2)^2  for -1 \le x \le 1,  and  K(x) = 0  otherwise.
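As an illustration of (5.1), (5.2) and the biweight kernel, the
following minimal Python sketch evaluates the boundary-corrected
estimate on a grid.  The function names and the vector X of purchase
times (for instance, the simulated times from the sketch in the
previous section) are ours, not part of the dissertation's software.

import numpy as np

def biweight(u):
    # biweight kernel K(u) = 0.9375 (1 - u^2)^2 on [-1, 1], 0 elsewhere
    u = np.asarray(u)
    return np.where(np.abs(u) <= 1.0, 0.9375 * (1.0 - u ** 2) ** 2, 0.0)

def intensity_estimate(t, X, h, T):
    # mirror-image corrected kernel intensity estimate (5.2): kernel
    # mass falling outside [0, T] is reflected back in at 0 and T, so
    # the estimate integrates to approximately N over [0, T]
    t = np.asarray(t, dtype=float)[:, None]   # evaluation points
    Kh = lambda u: biweight(u / h) / h        # rescaled kernel K_h
    return (Kh(t - X) + Kh(t + X) + Kh(2.0 * T - t - X)).sum(axis=1)

T = 108.0
grid = np.linspace(0.0, T, 500)
rate = intensity_estimate(grid, X, h=2.82, T=T)  # X: purchase times
# the area under `rate` between weeks r and s approximates the number
# of purchases observed in [r, s]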
With kernel smoothing, the main question is not what shape the
kernel should be, but rather how wide the kernel should be.  The
smoothing parameter, h, quantifies the width of the kernel.  Thus,
choosing the smoothing parameter for a kernel estimator is an important
concern.
Consider figures 5.1, 5.2 and 5.3.  These are kernel intensity
estimates of the purchase rate for the coffee sales of Brand 4 with
three different bandwidths.  The vertical axis represents the number of
sales per week, and this brand has a total of 1506 sales between week 0
and week 108.
In figure 5.1, h=2.82 weeks is used.  This kernel estimate displays a
number of obvious peaks, but the curve is fairly smooth over the range
of the 108 weeks.  Figure 5.2 shows the curve obtained from using a
very small bandwidth (h=.75).  The purchase rate curve has many sharp
increases and decreases.  These peaks occur
precisely when there are a relatively large number of sales in a small
time period; however, we believe that this is an undersmoothed version
of the purchase rate, since the little peaks are probably due to noise
in the data set.  For example, consider the intensity estimate from
week 28 to 35 and from week 48 to 56.  Finally, in figure 5.3, a large
bandwidth is used (h=10.57).  With this oversmoothed curve, we lose
some information about the purchase rate.  Notice how increasing the
bandwidth results in one mild peak where there had been two dramatic
peaks in figure 5.1 (e.g. near weeks 23 and 43).

[Figure 5.1: Brand 4, h=2.82.  Kernel purchase rate estimate (sales
per week) over weeks 0-110.]

[Figure 5.2: Brand 4, h=.75.  Kernel purchase rate estimate (sales
per week) over weeks 0-110.]

[Figure 5.3: Brand 4, h=10.57.  Kernel purchase rate estimate (sales
per week) over weeks 0-110.]
In general, small bandwidths allow one to pinpoint exactly when
the purchase rate is high and low for each brand; however, it is
important to keep in mind that these pictures also display changes in
the rates that may be due to random variation rather than identifiable
causes.  Alternatively, with large bandwidths, the resulting curve may
summarize the purchase rate, but this might be an oversimplified view.
It is crucial, therefore, to choose a bandwidth so that the resulting
curve displays the important features of the data without letting the
random variation of the data set affect the curve disproportionately.
For Brand 4, the middle bandwidth, h=2.82, might be a reasonable
choice.
Fortunately, we also have information about several covariates
that may influence the rate of coffee sales.
For each brand, we have
weekly data that gives the average price of the coffee for that week
and indicates what type or types of marketing efforts were employed
during the week.
See figure 5.4.
First, we overlay the price data;
the "+" signs indicate the average price per ounce of Brand 4 coffee.
It appears that decreasing the price does affect the purchase rate;
however, several of the peaks are not explained by price changes.  We
also include the information regarding three additional covariates.

[Figure 5.4: Brand 4, h=2.82.  Kernel purchase rate estimate with the
average price per ounce ("+") and the promotional covariates (coupon,
feature advertisement, special display) overlaid.]

A diamond is printed at each week that Brand 4 coupons were distributed;
a triangle marks the weeks when the brand was advertised in the local
circular; and a square is placed at the weeks when the brand was on
"special display" (i.e. either the coffee was displayed on a rack at
the end of the store aisle or else there was a big sign advertising the
coffee).  We can now see that most of these large peaks are associated
with marketing efforts.  Thus, the purchase rate kernel intensity
estimate allows a researcher to decide whether a promotion influenced
the sales rate, and she can determine the size of the effect since the
area under the curve is approximately equal to the number of purchases
in that period.
5.b. The Least-Squares Cross-Validation Bandwidth.
We would like to choose the bandwidth, h, such that the resulting
kernel intensity estimator reflects the important features (and only
those) of the data.  The cross-validation bandwidth selection method
can be motivated by thinking about choosing the bandwidth to minimize
the distance between the kernel smooth and the "true" purchase rate
function, denoted by \lambda(t).  In this context, \lambda(t) is the
noiseless ideal purchase rate underlying the sampling that is used in
this study.  In order to quantify the error of \hat\lambda^*_h, we use
the integrated square error (ISE), which is defined by

    ISE(h) = \int_0^T [\hat\lambda^*_h(t)]^2\,dt - 2\int_0^T \hat\lambda^*_h(t)\lambda(t)\,dt + \int_0^T \lambda^2(t)\,dt.
We would like to find the bandwidth that minimizes ISE(h), but since
\lambda is unknown, this is not possible.  In Chapter 2, we saw that the
cross-validation score function has roughly the same behavior as
ISE(h).  Thus, we choose h by minimizing the cross-validation score
function:

    CV(h) = \int_0^T [\hat\lambda^*_h(t)]^2\,dt - 2\sum_{i=1}^{N} \hat\lambda^*_{h,i}(X_i),

where

    \hat\lambda^*_{h,i}(t) = \sum_{j=1, j \ne i}^{N} [K_h(t-X_j) + K_h(t+X_j) + K_h(2T-t-X_j)].

Thus, the cross-validation bandwidth should be close to the bandwidth
that minimizes ISE(h).
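The connection can be sketched in two lines.  The following is our
compressed restatement of the Chapter 2 argument (a heuristic summary
using the Mecke equation for Poisson processes, not a quotation of the
original proof): the last term of ISE(h) does not depend on h, while
the leave-one-out sum is an unbiased estimate of the cross term,

    E\Big[\sum_{i=1}^{N} \hat\lambda^*_{h,i}(X_i)\Big] = \int_0^T E[\hat\lambda^*_h(t)]\,\lambda(t)\,dt,

so that

    E[CV(h)] = E\Big[\int_0^T [\hat\lambda^*_h(t)]^2\,dt\Big] - 2\int_0^T E[\hat\lambda^*_h(t)]\,\lambda(t)\,dt
             = E[ISE(h)] - \int_0^T \lambda^2(t)\,dt.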
In particular, we saw in Chapters 2 and 4 that the
cross-validation bandwidth has some good theoretical properties, for
instance asymptotic optimality.  Therefore, we used the
cross-validation bandwidth for the following purchase rate kernel
intensity estimators.
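A direct numerical implementation of this selection rule is short.
The sketch below is our own code (not the program used for the
dissertation's figures): it evaluates CV(h) for the mirror-image
corrected biweight estimator over a logarithmic grid of bandwidths,
approximating the integral by a Riemann sum, and picks the minimizer.
In practice one should plot the whole score curve and inspect it for
multiple local minima, as discussed below.

import numpy as np

def cv_score(h, X, T, ngrid=400):
    # least-squares cross-validation score
    #   CV(h) = int_0^T lam_hat^2 - 2 sum_i lam_hat_{h,i}(X_i)
    # for the mirror-image corrected biweight kernel estimator (5.2)
    Kh = lambda u: np.where(np.abs(u / h) <= 1.0,
                            0.9375 * (1.0 - (u / h) ** 2) ** 2, 0.0) / h
    t = np.linspace(0.0, T, ngrid)
    lam_hat = (Kh(t[:, None] - X) + Kh(t[:, None] + X)
               + Kh(2.0 * T - t[:, None] - X)).sum(axis=1)
    int_sq = (lam_hat ** 2).sum() * (t[1] - t[0])    # Riemann sum
    # leave-one-out values at the observations: the full sum at X_i
    # minus observation i's own direct and reflected contributions
    xi = X[:, None]
    full = (Kh(xi - X) + Kh(xi + X) + Kh(2.0 * T - xi - X)).sum(axis=1)
    own = Kh(np.zeros_like(X)) + Kh(2.0 * X) + Kh(2.0 * (T - X))
    return int_sq - 2.0 * (full - own).sum()

T = 108.0
hs = np.exp(np.linspace(np.log(0.5), np.log(60.0), 60))
scores = np.array([cv_score(h, X, T) for h in hs])   # X: purchase times
h_cv = hs[scores.argmin()]                           # CV bandwidth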
In general, the cross-validation bandwidth selection method
performed well for the coffee data.  Unfortunately, there are some
practical problems to be dealt with.  Chief among them is the fact that
the cross-validation score function, CV(h), often has several local
minima.  For example, consider Brand 4.  The cross-validation score
function appears in figure 5.5.  There are two local minima at h=2.82
and h=10.57; note log(2.82)=.45 and log(10.57)=1.02.  The global
minimum of the cross-validation score function is larger than 100
weeks; however, since the entire study lasted 108 weeks, this
possibility was considered unreasonable.  The kernel purchase rate
estimator that is created by using the smaller local minimum bandwidth
(h=2.82) is shown in figure 5.4, and the estimator that is created
using the larger local minimum bandwidth (h=10.57) is seen in figure
5.3.

[Figure 5.5: Brand 4 cross-validation curve.  CV(h) plotted against
log(h), with local minima at h=2.82 and h=10.57.]

[Figure 5.6: Brand 29 cross-validation curve.  CV(h) plotted against
log(h), with local minima at h=4.08 and h=31.13.]
As mentioned above, the bandwidth h=2.82 seems to perform better
because the larger bandwidth, h=10.57, tends to smooth the data such
that two distinct clusters of observations are sometimes combined to
form one peak.  We believe that the rather small change in CV(h) at the
second local minimum (log(h)=1.02) may indicate that the first local
minimum (log(h)=.45) gives the better bandwidth.
It is known that the cross-validation bandwidth is strongly
affected by clusters in the data.  From simulated examples, data that
have large clusters which contain smaller clusters of observations
often produce cross-validation score functions with local minima.  In
addition, Hall and Marron (1991) have shown that for density estimation
the expected number of local minima in CV(h) is proportional to

    \rho = [\int f^2][\int (f'')^2]^{-1/5}.

Thus, \rho, a ratio of two measures of roughness, indicates the
likelihood of having multiple minima.  The existence of several minima
in CV(h) is not desirable; on the other hand, the curves that result
from the various cross-validation bandwidths often provide
complementary and relevant insights into the data.  Hence, for a
complete data analysis, looking at the curve estimators for each of the
local minima is recommended.
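As a concrete numerical illustration of this ratio (our own check,
applying the formula for \rho exactly as written above to the standard
normal density, for which \int f^2 = 1/(2\sqrt{\pi}) and
\int (f'')^2 = 3/(8\sqrt{\pi})):

import numpy as np

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
f = np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)  # standard normal density
f2 = (x ** 2 - 1.0) * f                           # its second derivative
rho = (f ** 2).sum() * dx * ((f2 ** 2).sum() * dx) ** (-1.0 / 5.0)
# rho is approximately 0.282 * 0.423**(-0.2), about 0.34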
Next, look at Brand 29, a brand that was introduced in the middle
of the study and had 513 observations.  The cross-validation curve in
figure 5.6 displays two significant minima at h=4.08 and h=31.13
(log(4.08)=.61 and log(31.13)=1.49).
The kernel intensity estimate
with the smaller bandwidth, h=4.08, is seen in figure 5.7.  With this
curve, the peaks occur at or near the time of the promotions and thus
suggest a strong correlation between the marketing efforts and the
purchase rate.

[Figure 5.7: Brand 29, h=4.08.  Kernel purchase rate estimate (sales
per week) over weeks 0-110, with the price ("+") and the promotion
indicators (coupon, feature, display) overlaid.]

[Figure 5.8: Brand 29, h=31.13.  Kernel purchase rate estimate (sales
per week) over weeks 0-110, with the price ("+") and the promotion
indicators (coupon, feature, display) overlaid.]
In figure 5.8, the intensity estimate with the larger
bandwidth, h=31.13, displays the overall trend of the purchase rate
over time.  This brand demonstrates how the local minima in the
cross-validation curve each provide interesting and valid viewpoints
for studying the data.  That is, combining the two pictures shows very
clearly what is happening to the sales of Brand 29.  First of all,
sales responded very well to the marketing techniques employed (seen in
figure 5.7), and looking macroscopically, the purchase rate decreased
after the introduction of the brand and then increased after week 80
(seen in figure 5.8).
Finally, consider Brand 20, a relatively small brand with only 217
observations.  In figure 5.9, the cross-validation curve for Brand 20
has only one minimum at h=55.56 (log(55.56)=1.74).  The resulting
purchase rate curve in figure 5.10 is extremely smooth.  This kernel
intensity estimate indicates that the purchase rate gently rises in the
early weeks and then declines in the later weeks, but remains fairly
steady during the entire study.  In figure 5.11, we consider a smaller
bandwidth (h=4.00).  When h=4, peaks are evident at the times that
marketing efforts were used; however, by looking at the purchase rate
curve alone, it is difficult to distinguish the explained peaks from
the unexplained ones.  Note that the intensity estimator represents
between one and seven sales per week; therefore, these peaks may simply
represent Poisson variability in the data.  As a consequence, the
smooth kernel purchase rate estimate given by the cross-validation
bandwidth (h=55.56) in figure 5.10 seems more reasonable in this
situation.
[Figure 5.9: Brand 20 cross-validation curve.  CV(h) plotted against
log(h), with a single minimum at h=55.56.]

[Figure 5.10: Brand 20, h=55.56.  Kernel purchase rate estimate (sales
per week) over weeks 0-110, with the price ("+") and the promotion
indicators (coupon, feature, display) overlaid.]

[Figure 5.11: Brand 20, h=4.00.  Kernel purchase rate estimate (sales
per week) over weeks 0-110, with the price ("+") and the promotion
indicators (coupon, feature, display) overlaid.]
The fairly constant sales rate for Brand 20 can be justified by
the marketing theory that small brands may have a loyal group of
customers who continue to buy their coffee with or without promotions.
Combining the marketing theory and the purchase rate kernel estimator
with the cross-validation bandwidth leads us to believe that promotions
are not very effective for increasing the Brand 20 sales rate.  Recall
that for the large brand, Brand 4, there appeared to be a strong
relationship between promotions and the purchase rate.  Thus, one can
hypothesize that large brands derive greater benefits from marketing
efforts compared to small brands.

The purchase data for the three coffee brands discussed above are
examples where the kernel estimator with the cross-validation bandwidth
seems to perform quite well.  In particular, this bandwidth is quite
effective in distinguishing increases in sales that are due to
promotions from those due to random variation in the data.  Therefore,
the kernel estimator with the least-squares cross-validation bandwidth
assists us in interpreting the relationship between promotions and the
purchase rate and thus provides greater insight into the coffee data.
This example demonstrates that the cross-validation bandwidth can be a
practical bandwidth for analyzing data from a counting process.
REFERENCES

Aalen, Odd (1978), "Nonparametric Inference for a Family of Counting
Processes," The Annals of Statistics, 6, 701-726.
Andersen, Per Kragh and Borgan, Ornulf (1985), "Counting Process Models
for Life History Data: A Review," Scandinavian Journal of
Statistics, 12, 97-158.

Bowman, Adrian W. (1984), "An Alternative Method of Cross-Validation
for the Smoothing of Density Estimates," Biometrika, 71, 353-360.
Burkholder, D. L. (1973), "Distribution Function Inequalities for
Martingales," The Annals of Probability, 1, 19-42.

Burman, Prabir (1985), "A Data Dependent Approach to Density
Estimation," Zeitschrift fur Wahrscheinlichkeitstheorie und
verwandte Gebiete, 69, 609-628.

Castellana, J. V. and Leadbetter, M. R. (1986), "On Smoothed
Probability Density Estimation for Stationary Processes,"
Stochastic Processes and their Applications, 21, 179-193.
Clevenson, M. Lawrence and Zidek, James V. (1977), "Bayes Linear
Estimators of the Intensity Function of the Nonstationary Poisson
Process," Journal of the American Statistical Association, 72,
112-120.

Cline, D. and Hart, J. (1986), "Kernel Estimation of Densities with
Discontinuities or Discontinuous Derivatives," unpublished
manuscript.

Cox, D. R. and Isham, Valerie (1980), Point Processes, Chapman and
Hall, London.

Devroye, Luc and Gyorfi, Laszlo (1984), Nonparametric Density
Estimation: The L1 View, John Wiley and Sons, New York.

Diggle, Peter J. (1983), Statistical Analysis of Spatial Point
Patterns, Academic Press, London.
Diggle, Peter (1985), "A Kernel Method for Smoothing Point Process
Data," Applied Statistics, 34, 138-147.

Diggle, Peter and Marron, J. S. (1988), "Equivalence of Smoothing
Parameter Selectors in Density and Intensity Estimation," Journal
of the American Statistical Association, 83, 793-800.

Duin, Robert P. W. (1976), "On the Choice of Smoothing Parameters for
Parzen Estimators of Probability Density Functions," IEEE
Transactions on Computers, C-25, 1175-1179.
Ellis, Steven (1986), "A Limit Theorem for Spatial Point Processes,"
Advances in Applied Probability, 18, 646-65.
Epanechnikov, V. A. (1969), "Non-parametric Estimation of a
Multivariate Probability Density," Theory of Probability and Its
Applications, 14, 153-158.
Farrell, R. H. (1972), "On the Best Obtainable Asymptotic Rates of
Convergence in Estimation of a Density Function at a Point," The
Annals of Mathematical Statistics, 43, 170-180.
Fiksel, Thomas (1988), "Edge-Corrected Density Estimators for Point
Processes," Statistics, 19, 67-75.
Foldes, A. and Revesz, P. (1974), "A General Method for Density
Estimation," Studia Scientiarum Mathematicarum Hungarica, 9, 81-90.
Gasser, T., Mueller, H. G. and Mammitzsch, V. (1985), "Kernels for
Nonparametric Curve Estimation," Journal of the Royal Statistical
Society, Series B, 47, 238-252.

Habbema, J. D. F., Hermans, J. and van den Broek, K. (1974),
"A Stepwise Discriminant Analysis Program Using Density
Estimation," Compstat 1974: Proceedings in Computational
Statistics, Physica Verlag, Vienna, 101-110.
Hall, Peter (1983), "Large Sample Optimality of Least Squares
Cross-Validation in Density Estimation," The Annals of Statistics,
11, 1156-1174.

Hall, Peter (1984), "Central Limit Theorem for Integrated Square Error
of Multivariate Nonparametric Density Estimators," Journal of
Multivariate Analysis, 14, 1-16.

Hall, Peter (1987a), "On Kullback-Leibler Loss and Density Estimation,"
The Annals of Statistics, 15, 1491-1519.

Hall, Peter (1987b), "On the Use of Compactly Supported Density
Estimates in Problems of Discrimination," Journal of Multivariate
Analysis, 23, 131-158.

Hall, Peter and Marron, J. S. (1987a), "Extent to which Least-Squares
Cross-Validation Minimises Integrated Square Error in
Nonparametric Density Estimation," Probability Theory and Related
Fields, 74, 567-581.

Hall, Peter and Marron, J. S. (1987b), "Choice of Kernel Order in
Density Estimation," The Annals of Statistics, 16, 161-173.

Hall, Peter and Marron, J. S. (1991), "Local Minima in Cross-Validation
Functions," Journal of the Royal Statistical Society, Series B, 53,
245-252.
Härdle, Wolfgang, Hall, Peter and Marron, J. S. (1988), "How Far Are
Automatically Chosen Regression Smoothing Parameters From Their
Optimum?" Journal of the American Statistical Association, 83,
86-101.

Härdle, Wolfgang and Marron, J. S. (1985), "Optimal Bandwidth
Selection in Nonparametric Regression Estimation," The Annals of
Statistics, 13, 1465-1481.

Härdle, Wolfgang, Marron, J. S. and Wand, M. P. (1990), "Bandwidth
Choice for Density Derivatives," Journal of the Royal Statistical
Society, Series B, 52, 223-232.
Leadbetter, M. R. and Wold, Diane (1983), "On Estimation of Point
Process Intensities," Contributions to Statistics: Essays in Honour
of Norman L. Johnson (P. K. Sen, ed.), North-Holland, Amsterdam,
299-312.

Marron, James Stephen (1985), "An Asymptotically Efficient Solution to
the Bandwidth Problem of Kernel Density Estimation," The Annals of
Statistics, 13, 1011-1023.

Marron, J. S. (1987), "A Comparison of Cross-Validation Techniques
in Density Estimation," The Annals of Statistics, 15, 152-162.

Marron, James Stephen and Härdle, Wolfgang (1986), "Random
Approximations to Some Measures of Accuracy in Nonparametric Curve
Estimation," Journal of Multivariate Analysis, 20, 91-113.

Parzen, Emanuel (1962), "On Estimation of a Probability Density
Function and Mode," The Annals of Mathematical Statistics, 33,
1065-1076.
Ramlau-Hansen, Henrik (1983a), "Smoothing Counting Process Intensities
by Means of Kernel Functions," The Annals of Statistics, 11,
453-466.

Ramlau-Hansen, Henrik (1983b), "The Choice of a Kernel Function in the
Graduation of Counting Process Intensities," Scandinavian Actuarial
Journal, 165-182.

Rice, John (1984), "Boundary Modification for Kernel Regression,"
Communications in Statistics, Series A, 13, 893-900.

Ripley, B. D. (1981), Spatial Statistics, John Wiley and Sons, New York.

Rosenblatt, Murray (1956), "Remarks on Some Nonparametric Estimates of
a Density Function," The Annals of Mathematical Statistics, 27,
832-837.

Rosenblatt, Murray (1971), "Curve Estimates," The Annals of
Mathematical Statistics, 42, 1815-1842.
Rudemo, Mats (1982), "Empirical Choice of Histograms and Kernel Density
Estimators," Scandinavian Journal of Statistics, 9, 65-78.

Schucany, W. R. and Sommers, J. P. (1977), "Improvement of Kernel Type
Density Estimators," Journal of the American Statistical
Association, 72, 420-423.

Schuster, Eugene F. (1985), "Incorporating Support Constraints into
Nonparametric Estimators of Densities," Communications in
Statistics, Series A, 14, 1123-1136.

Scott, David W., Tapia, Richard A. and Thompson, James R. (1977),
"Kernel Density Estimation Revisited," Nonlinear Analysis, Theory,
Methods and Applications, 1, 339-372.

Silverman, B. W. (1986), Density Estimation for Statistics and Data
Analysis, Chapman and Hall, New York.

Stone, Charles (1980), "Optimal Rates of Convergence for Nonparametric
Estimators," The Annals of Statistics, 8, 1348-1360.
Stone, Charles (1984), "An Asymptotically Optimal Window Selection Rule
for Kernel Density Estimates," The Annals of Statistics, 12,
1285-1297.

Tapia, Richard A. and Thompson, James R. (1978), Nonparametric
Probability Density Estimation, The Johns Hopkins University
Press, Baltimore.

Walter, G. and Blum, J. (1979), "Probability Density Estimation Using
Delta Sequences," The Annals of Statistics, 7, 328-340.

Watson, G. S. and Leadbetter, M. R. (1963), "On the Estimation of the
Probability Density, I," The Annals of Mathematical Statistics,
34, 480-491.

Whittle, P. (1958), "On the Smoothing of Probability Density
Functions," Journal of the Royal Statistical Society, Series B,
20, 334-343.