Equivalence of smoothing parameter selectors
in density and intensity estimation

by

J.S. Marron¹
Australian National University
and University of North Carolina

Peter Diggle
CSIRO Division of Mathematics
and Statistics, Canberra

June 1987

AMS 1980 subject classification: primary 62G05; secondary 60G35.

Key words and phrases: Cox process, cross-validation, kernel estimators, nonparametric density estimation, Poisson intensity estimation, smoothing.

¹Research partially supported by NSF Grant DMS-8400602.
Kernel smoothing is an attractive method for the nonparametric estimation of either a probability density function or the intensity function of a nonstationary Poisson process. In each case the amount of smoothing, controlled by the bandwidth, i.e. the smoothing parameter, is crucial to the performance of the estimator. Bandwidth selection by cross-validation has been widely studied in the context of density estimation. A bandwidth selector in the intensity estimation case has been proposed which minimizes an estimate of the mean squared error under the assumption that the data are generated by a stationary Cox process. This paper shows that these two methods each select the same bandwidth, even though they are motivated in very different ways. In addition to providing further justification of each method, this equivalence of smoothing parameter selectors yields new insights for both density and intensity estimation. A benefit for intensity estimation is that the equivalence makes it clear how the Cox process method may be applied to kernels which are nonuniform, or even of higher order. Another benefit is that this duality between problems makes it clear how to apply the well developed asymptotic methods for understanding density estimation to the intensity setting. A benefit for density estimation is that it motivates an analogue of the Cox process method which provides a useful nonasymptotic means of studying that problem.
The specific forms of the estimators and smoothing parameter selectors are introduced in Section 1. The basic equivalence result is stated in Section 2. Sections 3 and 4 describe new insights which follow for intensity and density estimation respectively. Section 5 discusses modification of these ideas to take boundary effects into consideration, and shows how they can be used to motivate new boundary adjustments in intensity estimation.
1. The Estimators
The raw data for both density and intensity estimation consist of a set of points X_1, ..., X_n ∈ ℝ. For density estimation, these are thought of as realizations of n independent random variables, all having probability density function f(x). For intensity estimation these are thought of as a realization on an interval [0,T] of a nonstationary Poisson process with intensity function λ(x).
A reasonable estimate of either f or λ should be a function which takes on large values in regions where the data are dense, and values close to zero where the data are sparse. The kernel smoothing method of constructing such a function is to let

    f̂_t(x) = n⁻¹ Σ_{i=1}^n δ_t(x − X_i)

for density estimation, or

    λ̂_t(x) = Σ_{i=1}^n δ_t(x − X_i)

for intensity estimation, where δ_t(·) = t⁻¹ δ(·/t) for some probability density δ. The parameter t controls the amount of smoothing that is done and is called the bandwidth. The normalization factor of n⁻¹ makes f̂_t(x) a probability density.
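As a concrete illustration of the two estimators, here is a minimal numerical sketch. The code is ours, not the authors'; the kernel δ is taken to be the quartic kernel used in the coal-mining example below, and the data values are arbitrary.

```python
def quartic(u):
    # quartic kernel of the text: a probability density supported on [-1, 1]
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def delta_t(x, t):
    # rescaled kernel: delta_t(x) = delta(x / t) / t
    return quartic(x / t) / t

def f_hat(x, data, t):
    # kernel density estimate: the n^-1 factor makes it a probability density
    return sum(delta_t(x - X, t) for X in data) / len(data)

def lam_hat(x, data, t):
    # kernel intensity estimate: the same sum without the n^-1 factor
    return sum(delta_t(x - X, t) for X in data)
```

Away from the boundary, f̂_t integrates to one while λ̂_t integrates to n, reflecting the n⁻¹ normalization.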
Figure 1
For access to the literature on theoretical properties of these estimators see Devroye and Gyorfi (1984), Leadbetter and Wold (1983) and Ellis (1986).
In Figure 1 we show how varying the bandwidth t affects the smoothness of the intensity estimator λ̂_t(x). The data are the times of 191 coal-mining disasters in a total time period of 40,550 days, as compiled by Jarrett (1979). Rudemo (1982) used an earlier, incomplete version of these data to illustrate the use of least squares cross-validation for bandwidth selection. We use a quartic kernel,

    δ(x) = (15/16)(1 − x²)²  for |x| ≤ 1,  0 otherwise.

Figure 1 shows three estimates λ̂_t(x), with a four-fold increase in t between each pair of estimates, and with the intermediate value t = 5957.3 the optimum value according to the method of bandwidth selection described in Diggle (1985).
Note that the estimate with t = 1489.3 oscillates wildly and contains many features that one cannot expect to distinguish with only 191 observations. On the other hand, the curve with t = 23829.2 has lost features which are more likely to be reflected by the underlying intensity function. This is the essence of the smoothing problem: smoothing enough to eliminate spurious sample noise, while not smoothing so much that interesting features disappear.
It is generally the case that the choice of t is far more important for the effective performance of a kernel estimator than is the choice of δ(·); see, for example, Silverman (1986) for further discussion and theoretical analysis.

For density estimation, Rudemo (1982) and Bowman (1984) have proposed choosing t by the method of least squares cross-validation. This essentially entails letting t_CV denote the minimizer of the cross-validation score

(1.1)    CV(t) = ∫ [f̂_t(x)]² dx − 2n⁻¹ Σ_{j=1}^n f̂_{t,j}(X_j),

where f̂_{t,j}(x) is the leave-one-out estimator

(1.2)    f̂_{t,j}(x) = n⁻¹ Σ_{i≠j} δ_t(x − X_i).
The motivation for the form of CV(t) is that it provides an intuitively reasonable estimate of the first two terms of

    ∫(f̂_t − f)² = ∫f̂_t² − 2∫f̂_t f + ∫f².

See Stone (1984), Marron (1987) and Hall and Marron (1987a,b) for properties of t_CV and for access to additional literature. This particular form of CV(t) is treated in Stone (1984). Several slightly different, but essentially equivalent forms may be found elsewhere.
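A direct numerical transcription of (1.1) and (1.2) can be sketched as follows (our own illustration, not the authors' code; the quartic kernel of the coal-mining example is assumed, and the integration interval [a, b] and grid size are arbitrary choices):

```python
def quartic(u):
    # quartic kernel: probability density on [-1, 1]
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def f_hat(x, data, t):
    return sum(quartic((x - X) / t) / t for X in data) / len(data)

def f_hat_loo(x, data, t, j):
    # leave-one-out estimator (1.2): X_j is omitted, the n^-1 factor is kept
    return sum(quartic((x - X) / t) / t
               for i, X in enumerate(data) if i != j) / len(data)

def cv(t, data, a, b, m=2000):
    # cross-validation score (1.1); the integral of f_hat^2 is computed by a
    # midpoint rule on [a, b], which must contain the support of f_hat
    h = (b - a) / m
    int_f2 = sum(f_hat(a + (k + 0.5) * h, data, t) ** 2 for k in range(m)) * h
    loo = sum(f_hat_loo(Xj, data, t, j) for j, Xj in enumerate(data))
    return int_f2 - 2.0 * loo / len(data)
```

The selector t_CV is then located by evaluating cv over a grid of candidate bandwidths and taking the minimizer.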
For intensity estimation, Diggle (1985) has proposed choosing t = t_M by the following empirical Bayesian method. First suppose that the intensity λ(x) is a realization of a stationary, nonnegative valued random process {Λ(x) : x ∈ ℝ}, with

    E[Λ(x)] = μ  and  E[Λ(x)Λ(y)] = v(|x − y|).

Then X_1, ..., X_n form a partial realization of a Cox process, or doubly stochastic Poisson process; see, for example, Cox and Isham (1980, pp. 70-75). Letting E denote expectation over both the randomness in X_1, ..., X_n and in Λ(x), Diggle shows that for x more than t units from the boundary of [0,T],
(1.3)    MSE(t) ≡ E{[λ̂_t(x) − Λ(x)]²}
                = v(0) + μ(2t)⁻¹ − μ²t⁻¹K(t) + μ²(2t)⁻² ∫₀^{2t} K(y) dy,

where

    K(t) = 2μ⁻² ∫₀^t v(x) dx,

and where the special case of the uniform kernel

(1.4)    δ(x) = (1/2) · 1_{[−1,1]}(x)

has been used in the estimator λ̂_t.
The function K(t) is very useful for the analysis of spatial point processes; see Ripley (1981) and Diggle (1983). In the one-dimensional case it can be estimated by

    K̂(t) = T n⁻² Σ Σ_{i≠j} 1_{[−t,t]}(X_i − X_j).
Hence, the bandwidth which minimizes the mean squared error (1.3) should be fairly well approximated by the bandwidth t_M which minimizes

(1.5)    M̂(t) = (2μ̂t)⁻¹ − t⁻¹K̂(t) + (2t)⁻² ∫₀^{2t} K̂(y) dy,

where μ has been estimated by μ̂ = n/T. (The additive term v(0) and the overall factor μ², which do not affect the location of the minimizer of (1.3), have been dropped.)
It should be noted that in all of the above, no attempt has been made to adjust for boundary effects. These adjustments are very important, but have a tendency to obscure the main points being made. Hence, we ignore boundary effects and adjustments until Section 5, where a detailed treatment is given.
2. The Equivalence

Note that the bandwidths t_CV and t_M are both well defined in either setting. The surprising fact is that they are exactly the same, under no additional assumptions.

Theorem 2.1: In the case of the uniform kernel (1.4),

    t_CV = t_M,

in the sense that each minimizer of CV(t) is a minimizer of M̂(t) and vice-versa.
Proof: Theorem 2.1 follows immediately from

Lemma 2.2: For the uniform kernel,

    M̂(t) = T · CV(t).

The proof of Lemma 2.2 is in the appendix.
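Lemma 2.2 can also be checked numerically. The sketch below is ours: both criteria are evaluated in closed form from the pairwise distances, using the uniform-kernel identities derived in the appendix, with μ̂ = n/T as in (1.5).

```python
def _pair_dists(data):
    # ordered pairwise distances |X_i - X_j|, i != j
    return [abs(x - y) for i, x in enumerate(data)
                       for j, y in enumerate(data) if i != j]

def cv_uniform(t, data):
    # CV(t) of (1.1)-(1.2) for the uniform kernel, in closed form
    n = len(data)
    d = _pair_dists(data)
    int_f2 = (1.0 / (2.0 * n * t)
              + sum(max(0.0, 2.0 * t - u) for u in d) / (4.0 * n * n * t * t))
    loo = sum(1.0 for u in d if u <= t) / (2.0 * t * n * n)
    return int_f2 - 2.0 * loo

def m_hat(t, data, T):
    # Diggle's criterion (1.5), with K_hat(t) = T n^-2 sum 1[|X_i - X_j| <= t]
    n = len(data)
    mu_hat = n / T
    d = _pair_dists(data)
    k_hat = T * sum(1.0 for u in d if u <= t) / (n * n)
    int_k = T * sum(max(0.0, 2.0 * t - u) for u in d) / (n * n)  # integral of K_hat on [0, 2t]
    return 1.0 / (2.0 * mu_hat * t) - k_hat / t + int_k / (4.0 * t * t)
```

For any t the two criteria agree exactly up to the factor T, so they have the same minimizers.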
The first consequence of Theorem 2.1 is that it shows each method is more natural than previously suspected, because it can be derived from quite different ideas than those originally used. However, the benefits go far deeper than this, because ideas in one setting may be brought into the other setting. In Section 3 we show how the well developed asymptotic theory of density estimation can point the way to new ideas for intensity estimation by discussing different kernel functions, including those of higher order. In Section 4, an empirical Bayes approach is developed for density estimation.

3. Benefits for Intensity Estimation
Note that the results in Section 2 are only stated in terms of the uniform kernel, (1.4). This is because in Diggle (1985) the motivation for t_M appeared to apply only to this case. However, the motivation for t_CV works for any kernel, so this suggests using Lemma 2.2 to find an appropriate M̂(t) for the general case. In particular, define

    M̂(t) = T · CV(t).
A form of M̂(t) which will allow a mean squared error interpretation is given by:

Lemma 3.1:

    M̂(t) = (μ̂t)⁻¹[∫δ²] + t⁻¹K̂_{δ∗δ}(t) − 2t⁻¹K̂_δ(t),

where

    K̂_δ(t) = T n⁻² Σ Σ_{i≠j} δ((X_i − X_j)/t),

K̂_{δ∗δ}(t) is defined in the same way with δ replaced by δ∗δ(·) = ∫δ(· − x)δ(x)dx, and μ̂ is defined at (1.5).
The proof of Lemma 3.1 is in the appendix.
Next it should be verified that choosing t to be the minimizer of this general M̂(t) still makes sense in the context of Diggle (1985). Much of the work for this is summarized in the following lemma, whose proof is also in the appendix. Recall that boundary effects are ignored until Section 5.
Lemma 3.2:

    MSE(t) = μ²[(μt)⁻¹[∫δ²] + t⁻¹K_{δ∗δ}(t) − 2t⁻¹K_δ(t)] + v(0),

where

    K_δ(t) = 2tμ⁻² ∫₀^∞ δ_t(u)v(u)du,

and K_{δ∗δ}(t) is defined analogously.

Since K̂_δ(t) is a reasonable estimate of K_δ(t), the minimizer of μ̂²M̂(t) (which also minimizes M̂(t)) should be close to the minimizer of MSE(t). Note that when δ(·) is the uniform kernel, K_δ(t) differs from K(t) by a factor of two.
A very useful tool in the study of density estimation is asymptotics with n → ∞. The above strong connection between bandwidth selectors in the two settings motivates looking for an analogue in the intensity estimation case. The first type of asymptotics that one might consider is T → ∞. However, as pointed out at the end of Section 2.2 of Diggle (1985), this will only add new information at the right hand endpoint, instead of everywhere as for n → ∞ in density estimation. A way to add information everywhere is to allow μ → ∞ (where μ is the mean of the stochastic process Λ introduced at the end of Section 1). Note that care must be taken to avoid changing the relative shape of the curve Λ(x) in the limiting underlying process. This is accomplished by taking the expected product function v(x) (defined in Section 1) to be of the form

(3.1)    v(x) = μ² v₀(x),

for some fixed function v₀(x). For insight into the effects of μ → ∞ with v of the form (3.1), consider the special case of the "linear Cox process," described in Section 2.2 of Diggle (1985). For this process {Λ(x)} can be represented as

(3.2)    Λ(x) = Σ_{i=1}^∞ h(x − Z_i),

where the Z_i are the points of a homogeneous Poisson process and h(·) is a non-negative valued function.
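To make (3.2) concrete, the linear Cox process can be simulated as below. This is our own sketch: the rate rho of the Z process and the triangular choice of h are arbitrary illustrative assumptions. Since ∫h = 1 here, E[Λ(x)] = μ = rho.

```python
import random

def poisson_points(lo, hi, rho, rng):
    # homogeneous Poisson process of rate rho on [lo, hi], via exponential gaps
    pts, x = [], lo
    while True:
        x += rng.expovariate(rho)
        if x >= hi:
            return pts
        pts.append(x)

def h_triangle(u):
    # a nonnegative bump with integral one (our illustrative choice of h)
    return max(0.0, 1.0 - abs(u))

def lam(x, z_points):
    # the linear Cox process intensity (3.2): Lambda(x) = sum_i h(x - Z_i)
    return sum(h_triangle(x - z) for z in z_points)
```

Note that multiplying h by a constant c > 1 scales Λ(x) pointwise, so μ grows while the relative shape of Λ, and hence v₀ in (3.1), is unchanged; this is one concrete route to the μ → ∞ regime.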
A very important application of n → ∞ asymptotics in density estimation is that it allows a very simple and elegant quantification of the smoothing problem, i.e. the fact that t too small admits too much sample noise while t too large smooths away features of the underlying curve. In particular, see Rosenblatt (1971) for example, various squared error criteria can be expanded into simple and easily interpreted variance and squared bias components, which become large when t is small and large respectively.

To apply these ideas to intensity estimation, we work with μ⁻²MSE(t). The normalization factor μ⁻² may be thought of as adjusting for the difference between the estimators f̂_t and λ̂_t. From Lemma 3.2,

    μ⁻²MSE(t) = μ⁻¹t⁻¹[∫δ²] + t⁻¹K_{δ∗δ}(t) − 2t⁻¹K_δ(t) + v₀(0).

An inspection of the proof of Lemma 3.2 shows that the first term can be thought of as a type of variance, while the remaining terms are the corresponding squared bias. Hence we define

    b²(t) = t⁻¹K_{δ∗δ}(t) − 2t⁻¹K_δ(t) + v₀(0).
To gain more insight into the behaviour of the squared bias, consider the expansion summarized in the following lemma, whose proof is in the appendix.
Lemma 3.3: If v₀(|x|) has a continuous fourth derivative at the origin, then as t → 0,

    b²(t) = (t⁴/4) v₀⁽⁴⁾(0) [∫u²δ(u)du]² + o(t⁴).

It is interesting to compare this with the expression for the squared bias in density estimation; see, for example, (1.6) of Rosenblatt (1971). The only difference is that v₀⁽⁴⁾(0) replaces [f⁽²⁾(x)]². This is not surprising in view of the well known relationship between the k-th derivative of a process Λ(x) and the 2k-th derivative at the origin of its covariance function v(x) − μ².
In density estimation, the variance term also admits a simple asymptotic expansion; see (9) of Rosenblatt (1971). Note that the dominant term of this expansion is in fact the same as the variance term above, if we identify n with μ. The fact that no expansion is required in the present context seems related to the fact that the variance of the Poisson distribution has a simpler form than that of the Binomial.
Parzen (1962) demonstrated that if a density has more than 2 derivatives, say k, then a faster rate of convergence of f̂_t to f can be obtained by using a "higher order kernel", i.e. assuming that

(3.3)    ∫x^ℓ δ(x)dx = 1 if ℓ = 0;  = 0 if ℓ = 1, 2, ..., k−1;  = c > 0 if ℓ = k.

It is straightforward to extend the above computations to this case. The answer is summarized as
Lemma 3.4: If δ satisfies (3.3) and v₀(|x|) has a continuous 2k-th derivative at the origin, then as t → 0,

    b²(t) = t^{2k} v₀⁽²ᵏ⁾(0) [∫u^k δ(u)du / k!]² + o(t^{2k}).

The proof of Lemma 3.4 is omitted because it is essentially the same as the proof of Lemma 3.3.
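One concrete way to manufacture a kernel satisfying (3.3) with k = 4 is "twicing": δ₄ = 2δ − δ∗δ, with δ the quartic kernel. This construction is a standard device and is our own illustrative choice, not one discussed in the paper; its moment conditions can be verified numerically.

```python
def quartic(u):
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def conv(w, mz=400):
    # (delta * delta)(w) by midpoint quadrature over the support [-1, 1]
    hz = 2.0 / mz
    return sum(quartic(w - (-1.0 + (k + 0.5) * hz)) * quartic(-1.0 + (k + 0.5) * hz)
               for k in range(mz)) * hz

def delta4(u):
    # fourth order kernel by twicing; supported on [-2, 2], takes negative values
    return 2.0 * quartic(u) - conv(u)

def moment(g, p, lo=-2.0, hi=2.0, m=1000):
    # integral of x^p g(x) by midpoint rule
    h = (hi - lo) / m
    return sum(((lo + (k + 0.5) * h) ** p) * g(lo + (k + 0.5) * h)
               for k in range(m)) * h
```

Here ∫δ₄ = 1, the first three moments vanish, and ∫u⁴δ₄(u)du = −6[∫u²δ(u)du]² = −6/49 ≠ 0, so δ₄ has order k = 4 in the sense of (3.3).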
The above results are a very small part of what can be done in terms of finding intensity estimation analogs of what is already known about density estimation. Other possibilities, that will require much more work than can be done in this paper, include analogs of:

(i) the optimal rates ideas of Farrell (1972) and Stone (1980), for example.

(ii) the asymptotic optimality results of Stone (1984) and Marron (1987), for example.

(iii) the noise in bandwidth selection, see for example Hall and Marron (1987a, b).

(iv) the location dependent smoothing ideas of, for example, Abramson (1982) and Hall and Marron (1986a, b).
4. Benefits for Density Estimation

The equivalence described in Section 2 motivates consideration of the density estimation problem from the empirical Bayes type of viewpoint considered in Diggle (1985). An analog of the Cox process may be defined where X_1, ..., X_n is an iid sample from a probability density Λ(x) on [0,T] which is a realization of a stationary stochastic process Λ, with E[Λ(x)] = T⁻¹ and E[Λ(x)Λ(y)] = v(|x − y|). An example of such a process Λ is a normalized version of the "linear Cox process" described in Section 2.2 of Diggle (1985) and at (3.2). Note that in this context, a kernel density estimate seems especially appropriate, since the underlying density itself is random.
An expression for the mean squared error MSE*(t) of the estimator f̂_t(x) is contained in

Lemma 4.1: In the Cox process density estimation setting,

    MSE*(t) = (ntT)⁻¹[∫δ²] + (1 − n⁻¹)t⁻¹T⁻²K_{δ∗δ}(t) − 2t⁻¹T⁻²K_δ(t) + v(0).

The proof of Lemma 4.1 is in the appendix. Note that when μ is identified with T⁻¹, this is very similar to Lemma 3.2, the only difference being that a factor of 1/n now appears in the first term and a negligible factor of (1 − n⁻¹) is now in the second term. Thus the Poisson type variance structure has been exchanged for one closer to the Binomial.
Now, MSE*(t) is, by considerations similar to those above, reasonably estimated by

    M̂*(t) = (ntT)⁻¹[∫δ²] + (1 − n⁻¹)t⁻¹T⁻²K̃_{δ∗δ}(t) − 2t⁻¹T⁻²K̃_δ(t),

where K̃_δ(t) = T(n(n−1))⁻¹ Σ Σ_{i≠j} δ((X_i − X_j)/t) and K̃_{δ∗δ}(t) is defined analogously. The minimizer of M̂*(t), say t*_M, should be close to the minimizer of MSE*(t). The ideas of this paper are brought full circle by showing that t*_M has a representation in terms of cross-validation:
Lemma 4.2:

    M̂*(t) = T⁻¹ · CV*(t),

where

    CV*(t) = ∫[f̂_t(x)]² dx − 2n⁻¹ Σ_{j=1}^n f̃_{t,j}(X_j)

and

    f̃_{t,j}(x) = (n−1)⁻¹ Σ_{i≠j} δ_t(x − X_i).

The proof of Lemma 4.2 is omitted because it is very similar to the proof of Lemma 2.2.
The difference between the present estimators, K̃_δ and f̃_{t,j}, and their original analogs K̂_δ and f̂_{t,j}, is a negligible factor of 1 − n⁻¹. The reason for introducing new notation instead of simply remarking that they are approximately the same is that the cross-validation score CV*(t) appears more often in the literature than CV(t); see Rudemo (1982), Marron (1987), Hall and Marron (1987a). The calculations done in this section make CV*(t) seem slightly more natural; however, there is essentially no difference in practice.
5. Boundary Adjustments

Boundary effects can be a problem in intensity estimation. If the definition of λ(x) is extended outside of [0,T] by taking it to have the value 0, then λ(x) is discontinuous at 0 and T when λ(0) and λ(T) are positive. In this case, the continuous estimate λ̂_t(x) will perform poorly on neighborhoods of 0 and T. By studying asymptotics of the type discussed in Section 3, this difficulty can be quantified. In particular it can be shown that λ̂_t(0) and λ̂_t(T) are inconsistent.
The same problem exists in density estimation when there are discontinuities in the density, such as happens if the density is bounded above zero on an interval and equal to zero off it. See Rice (1984), Schuster (1985), Gasser, Müller and Mammitzsch (1985) and Cline and Hart (1986) for further discussion of this topic.
A simple method of correcting this problem is the "mirror image" type of adjustment considered by Schuster (1985) and Cline and Hart (1986). We illustrate the method by showing how to adjust λ̂_t at 0; adjustments at T and for density estimation are similar. The basic idea is that the piece of each δ_t(· − X_i) that extends to the left of 0 should be "folded at 0" so that all of its mass is inside [0,T]. This is done by defining

(5.1)    λ̂⁺_t(x) = Σ_{i=1}^n [δ_t(x − X_i) + δ_t(x + X_i)] · 1_{[0,T]}(x).
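The folding operation (5.1) can be sketched as follows (our own code; the uniform kernel is used so the folded mass is easy to see):

```python
def uniform_dt(x, t):
    # uniform kernel (1.4), rescaled: delta_t(x) = (2t)^-1 on [-t, t]
    return 1.0 / (2.0 * t) if abs(x) <= t else 0.0

def lam_plus(x, data, t, T):
    # mirror-image estimator (5.1): kernel mass spilling past 0 is folded back
    if x < 0.0 or x > T:
        return 0.0
    return sum(uniform_dt(x - X, t) + uniform_dt(x + X, t) for X in data)
```

For data near 0 the uncorrected λ̂_t loses the kernel mass lying below 0, while λ̂⁺_t retains it, so its integral over [0,T] is still n (provided no kernel spills past T).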
Another possible boundary adjustment is given in (1.1) of Diggle (1985). A simple asymptotic expansion of the type in Lemma 3.3 shows that Diggle's estimator will typically have slightly more bias than (5.1). To see the practical implications of this, consider Figure 2, which shows the two edge corrected estimates, which are in close agreement, together with the uncorrected version. Clearly, some form of boundary adjustment is vital.
Note that (5.1) is exactly the type of boundary adjustment that is being done by the estimator K̂(t) in Section 2.3 of Diggle (1985). Hence, one sensible method of adjusting for boundary effects is to use the estimator λ̂⁺_t, with the bandwidth t chosen to minimize M̂(t) as defined in Diggle (1985). Similar remarks hold for density estimation.

Figure 2
A drawback of the above recipe is that, for the bandwidth selection part, the boundary will still have a tendency to introduce a bias towards undersmoothing (see Cline and Hart 1986). One means of approaching this problem is to use the more sophisticated boundary adjustments proposed by Rice (1984) and Gasser, Müller and Mammitzsch (1985).

Another means of approaching the bias towards undersmoothing can be motivated from density estimation; see Marron (1987). Suppose that δ(x) is supported on [−1,1]. Find t₀ so that reasonable values of t are smaller than t₀. Now, as in Section 1, consider estimating the first two terms of

    ∫_{t₀}^{T−t₀} [f̂_t − f]² = ∫_{t₀}^{T−t₀} f̂_t² − 2 ∫_{t₀}^{T−t₀} f̂_t f + ∫_{t₀}^{T−t₀} f²

by

    CV⁺(t) = ∫_{t₀}^{T−t₀} [f̂_t(x)]² dx − 2n⁻¹ Σ_{X_j ∈ [t₀, T−t₀]} f̂_{t,j}(X_j).

It is important to note that all of the X_i are used to construct the estimates f̂_t and f̂_{t,j}, but only those X_j ∈ [t₀, T−t₀] appear in the sum inside CV⁺. This avoids moving the problem of the boundaries at 0 and T to t₀ and T−t₀.
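The trimmed score CV⁺ described above can be sketched numerically as follows (our own illustration; the quartic kernel and the grid size are arbitrary choices). All observations enter the estimates, but the integral and the sum are restricted to [t₀, T − t₀]:

```python
def quartic(u):
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def f_hat(x, data, t):
    return sum(quartic((x - X) / t) / t for X in data) / len(data)

def f_hat_loo(x, data, t, j):
    return sum(quartic((x - X) / t) / t
               for i, X in enumerate(data) if i != j) / len(data)

def cv_plus(t, data, T, t0, m=2000):
    # boundary-trimmed cross-validation score: all X_i build the estimates,
    # but only [t0, T - t0] contributes to the integral and to the sum
    a, b = t0, T - t0
    h = (b - a) / m
    int_f2 = sum(f_hat(a + (k + 0.5) * h, data, t) ** 2 for k in range(m)) * h
    loo = sum(f_hat_loo(Xj, data, t, j)
              for j, Xj in enumerate(data) if a <= Xj <= b)
    return int_f2 - 2.0 * loo / len(data)
```

With t₀ = 0 this reduces to the ordinary score CV(t) computed over [0, T]; when the support of f̂_t already lies inside [t₀, T − t₀], the trimming changes nothing.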
To find the analog of this in the intensity estimation setting, Lemma 2.2 motivates looking for another interpretation of

    M̂⁺(t) = T · CV⁺(t).

Much of the work in this is summarized in:
Lemma 5.1: If δ is supported on [−1,1] and t < t₀, then in the intensity estimation setting

    E[Σ_i ∫_{t₀}^{T−t₀} δ_t(x − X_i)² dx] = (T − 2t₀) μ t⁻¹ [∫δ²],

with analogous expressions for the expectations of the remaining terms of M̂⁺(t).

The proof of Lemma 5.1 is in the appendix. It now follows from Lemma 3.2 that M̂⁺(t), with proper attention paid to boundary effects, is essentially estimating the mean squared error criterion restricted to [t₀, T − t₀].
To get some feeling for the importance of CV⁺, consider Table 1, which shows the approximate value of t̂⁺_M, the minimizer of M̂⁺(t), for several values of t₀.

Table 1.

    t₀:     0      2000   4000   6000   8000   10,000
    t̂⁺_M:  6000   6100   6200   6800   6000   5300

Note that the values are quite stable until t₀ becomes 10,000. To understand this, note that in Figure 1, two quite pronounced peaks are removed from the interval [t₀, T−t₀] when t₀ approaches 10,000, which has a large effect on the amount of information available to M̂⁺. The practical difference between the bandwidths in Table 1 is demonstrated by Figure 3. The fact that the curve corresponding to t₀ = 10,000 represents a materially different amount of smoothing is clearly demonstrated by the extra small mode near 2000.
Appendix

Proof of Lemma 2.2: First note that

    ∫ δ_t(x − X_i) δ_t(x − X_j) dx = (2t)⁻² (2t − |X_i − X_j|)₊ = (2t)⁻² ∫₀^{2t} 1_{[−y,y]}(X_i − X_j) dy.

Hence

(A.1)    ∫[f̂_t(x)]² dx = n⁻² [Σ_{i=1}^n ∫δ_t(x − X_i)² dx + Σ Σ_{i≠j} ∫δ_t(x − X_i)δ_t(x − X_j) dx]
             = (2t)⁻² n⁻² [Σ_{i=1}^n ∫1_{[X_i−t, X_i+t]}(x) dx + Σ Σ_{i≠j} ∫₀^{2t} 1_{[−y,y]}(X_i − X_j) dy].

Thus,

    CV(t) = (2nt)⁻¹ + (2t)⁻² T⁻¹ ∫₀^{2t} K̂(y) dy − (tT)⁻¹ K̂(t) = T⁻¹ M̂(t).
Proof of Lemma 3.1: First note that (A.1) becomes

    ∫[f̂_t(x)]² dx = (nt)⁻¹[∫δ²] + n⁻² t⁻¹ Σ Σ_{i≠j} δ∗δ((X_i − X_j)/t),

and that, by (1.2),

    −2n⁻¹ Σ_{j=1}^n f̂_{t,j}(X_j) = −2(tT)⁻¹ K̂_δ(t).

Lemma 3.1 now follows from the definitions (1.1) and (1.5).

Proof of Lemma 3.2: First observe that, conditioned on the process Λ,

    (E[λ̂_t(x)|Λ] − Λ(x))² = (∫δ_t(x−u)Λ(u)du − Λ(x))²
        = ∫∫ δ_t(x−u)δ_t(x−v)Λ(u)Λ(v) du dv − 2∫δ_t(x−u)Λ(u)Λ(x) du + Λ(x)²,

and

    var[λ̂_t(x)|Λ] = ∫ δ_t(x−u)² Λ(u) du.

Thus

    MSE(t) = E{var[λ̂_t(x)|Λ] + (E[λ̂_t(x)|Λ] − Λ(x))²}
        = μt⁻¹[∫δ²] + ∫∫ δ_t(x−u)δ_t(x−v)v(|u−v|) du dv − 2∫δ_t(x−u)v(|x−u|) du + v(0)
        = μt⁻¹[∫δ²] + μ²t⁻¹K_{δ∗δ}(t) − 2μ²t⁻¹K_δ(t) + v(0).
Proof of Lemma 3.3: Note that

    b²(t) = t⁻¹[2∫₀^∞ δ∗δ(u/t) v₀(u) du] − 2t⁻¹[2∫₀^∞ δ(u/t) v₀(u) du] + v₀(0).

Thus, expanding v₀ about the origin,

(A.2)    b²(t) = Σ_{j=0}^4 c_j t^j v₀⁽ʲ⁾(0)/j! + v₀(0) + o(t⁴),

where

    c_j = ∫∫ v^j δ(v−z) dv δ(z) dz − 2∫ v^j δ(v) dv
        = ∫∫ (u+z)^j δ(u) δ(z) du dz − 2∫ v^j δ(v) dv
        = Σ_{ℓ=0}^j (j choose ℓ) [∫u^ℓ δ(u)du][∫z^{j−ℓ} δ(z)dz] − 2∫ v^j δ(v) dv.

But δ is a probability density which is symmetric about the origin, so

    c₀ = −1,  c_j = 0 for j = 1, 2, 3,  c₄ = 6[∫u²δ(u)du]².

Lemma 3.3 now follows on applying this to (A.2).
Proof of Lemma 4.1: Working as in the proof of Lemma 3.2 above, we first condition on the process Λ and get the same expression for the conditional squared bias. This time the conditional variance is

    var[f̂_t(x)|Λ] = n⁻¹ ∫δ_t(x−u)² Λ(u) du − n⁻¹ [∫δ_t(x−u)Λ(u)du]².

Hence,

    MSE*(t) = (nT)⁻¹ ∫δ_t(v)² dv + (1 − n⁻¹) ∫∫ δ_t(x−u)δ_t(x−v) v(|u−v|) du dv
              − 2∫ δ_t(x−u) v(|x−u|) du + v(0).
Proof of Lemma 5.1: Since δ is supported on [−1,1] and t < t₀,

    E[Σ_i ∫_{t₀}^{T−t₀} δ_t(x − X_i)² dx] = E[∫₀^T ∫_{t₀}^{T−t₀} δ_t(x − y)² Λ(y) dx dy]
        = (T − 2t₀) μ t⁻¹ [∫δ²],

which is the first part of Lemma 5.1. The expectations of the remaining terms of M̂⁺(t) are computed in the same way, which completes the proof of Lemma 5.1.
References

Abramson, I.S. (1982). "On bandwidth variation in kernel estimates - a square root law." Annals of Statistics, 10, 1217-1223.

Bowman, A. (1984). "An alternative method of cross-validation for the smoothing of density estimates." Biometrika, 71, 353-360.

Cline, D. and Hart, J. (1986). "Kernel estimation of densities with discontinuities or discontinuous derivatives." Unpublished manuscript.

Cox, D.R. and Isham, V. (1980). Point Processes. Chapman and Hall, London.

Devroye, L. and Gyorfi, L. (1984). Nonparametric Density Estimation: The L1 View. Wiley, New York.

Diggle, P.J. (1983). Statistical Analysis of Spatial Point Patterns. Academic Press, London.

Diggle, P. (1985). "A kernel method for smoothing point process data." Applied Statistics, 34, 138-147.

Ellis, S.P. (1986). "A limit theorem for spatial point processes." Advances in Applied Probability, 18, 649-659.

Epanechnikov, V.A. (1969). "Nonparametric estimation of a multivariate probability density." Theory of Probability and Its Applications, 14, 153-158.

Farrell, R.H. (1972). "On the best obtainable rates of convergence in estimation of a density function at a point." Annals of Mathematical Statistics, 43, 170-180.

Gasser, T., Müller, H.G. and Mammitzsch, V. (1985). "Kernels for nonparametric curve estimation." Journal of the Royal Statistical Society, Series B, 47, 238-252.

Hall, P. and Marron, J.S. (1986a). "Choice of kernel order in density estimation." Unpublished manuscript.

Hall, P. and Marron, J.S. (1987a). "Extent to which least-squares cross-validation minimises integrated square error in nonparametric density estimation." Probability Theory and Related Fields, to appear.

Jarrett, R.G. (1979). "A note on the intervals between coal-mining disasters." Biometrika, 66, 191-193.

Leadbetter, M.R. and Wold, D. (1983). "On estimation of point process intensities." Contributions to Statistics: Essays in Honour of Norman L. Johnson (P.K. Sen, ed.), North-Holland, Amsterdam, 299-312.

Marron, J.S. (1987). "A comparison of cross-validation methods in density estimation." Annals of Statistics, 15, to appear.

Parzen, E. (1962). "On estimation of a probability density function and mode." Annals of Mathematical Statistics, 33, 1065-1076.

Rice, J. (1984). "Boundary modification for kernel regression." Communications in Statistics, Series A, 13, 893-900.

Ripley, B.D. (1981). Spatial Statistics. Wiley, New York.

Rosenblatt, M. (1971). "Curve estimates." Annals of Mathematical Statistics, 42, 1815-1842.

Rudemo, M. (1982). "Empirical choice of histograms and kernel density estimators." Scandinavian Journal of Statistics, 9, 65-78.

Schuster, E.A. (1985). "Incorporating support constraints into nonparametric estimators of densities." Communications in Statistics, Series A, 14, 1123-1136.

Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York.

Stone, C.J. (1980). "Optimal convergence rates for nonparametric estimators." Annals of Statistics, 8, 1348-1360.

Stone, C.J. (1984). "An asymptotically optimal window selection rule for kernel density estimates." Annals of Statistics, 12, 1285-1297.
[Figure 1. Effect of bandwidth on kernel estimates for the coal-mining disaster data. Horizontal axis: time (days), 0 to 50,000.]
[Figure 2. Edge corrected ("folded at 0") and uncorrected kernel estimates for the coal-mining disaster data. Horizontal axis: time (days), 0 to 50,000.]
[Figure 3. Extreme kernel estimates for the coal-mining disaster data, as selected by CV⁺ with t₀ = 0 and t₀ = 10,000. Horizontal axis: time (days), 0 to 50,000.]