Patil, Prakash N. (1990). "Automatic Smoothing Parameter Selection in Hazard Rate Estimation."

AUTOMATIC SMOOTHING PARAMETER SELECTION IN
HAZARD RATE ESTIMATION

by
Prakash N. Patil

A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistics.

Chapel Hill
1990

Approved by
Adviser
Reader
Reader
PRAKASH N. PATIL. Automatic Smoothing Parameter Selection in Hazard Rate Estimation (Under the direction of J. S. Marron).

ABSTRACT

The two commonly used kernel based estimators of the hazard rate, in either the uncensored or the censored setting, are essentially variants of the same estimator and share similar optimal properties. However, they are not comparable in the sense of mean integrated squared error. By studying their effective kernels, from a practical point of view and for mathematical tractability, a different viewpoint is given for choosing between them.

The smoothing parameter obtained by minimizing mean integrated squared error depends on an unknown functional of the parent distribution. For this choice of bandwidth to be of any practical use, one needs to estimate and study the estimate of that functional of the parent distribution. Alternatively, least squares cross-validation is employed to choose the bandwidth, and it is shown that such a choice is asymptotically optimal in both complete and censored samples.

From a practical point of view it is important to know at what rate the cross-validation bandwidth converges to the optimal one. The rate of such convergence is obtained in a general format so as to cover both the complete and the censored sample cases of density and hazard rate estimation. The relative magnitude by which the use of the cross-validation bandwidth fails to minimize the integrated squared error is also obtained.
ACKNOWLEDGEMENTS

First and foremost, thanks go to my adviser J. S. Marron for his guidance in the preparation of this dissertation.

I would also like to express my appreciation to the members of my committee (Gordon Simons, Norman Johnson, Edward Carlstein, Jianqing Fan and Kinh Truong) for their assistance, as well as to the entire faculty of the Department of Statistics for their encouragement during my studies.

I also wish to thank Mr. Muley, Secretary, Ahmednagar Jilha Maratha Vidya Prasarak Samaj, Ahmednagar, and Professor K. H. Shitole, Principal, New Arts Commerce and Science College, Ahmednagar, India, for granting me study leave.

Thanks also go to the staff, in particular June Maxwell, Courtenay Stark, Lisa Brooks and Peggy Ravitch, for so many different things.

The constant support and encouragement of my parents and family are something for which I am always very grateful.

Final thanks go to my fellow students for their support and friendship, especially my officemate of four years, John Crowell.
TABLE OF CONTENTS

CHAPTER 1. INTRODUCTION TO NONPARAMETRIC HAZARD RATE ESTIMATION
    1.1. Introduction ... 1
    1.2. Kernel Based Estimators ... 2
    1.3. Practical Problems Associated With the Estimators ... 4
    1.4. Summary ... 9

CHAPTER 2. KERNEL BASED ESTIMATORS OF RATIO FUNCTIONS
    2.1. Introduction ... 12
    2.2. The Estimators ... 13
    2.3. The Natural Hazard Rate Estimator ... 18
    2.4. The Natural Density Estimator for Right Censored Samples ... 22
    2.5. Censored Hazard Rate Estimation ... 24

CHAPTER 3. AUTOMATIC SELECTION AND OPTIMALITY OF THE SMOOTHING PARAMETER
    3.1. Introduction ... 25
    3.2. Uncensored Case ... 30
    3.3. Censored Case ... 33
    3.4. Proofs ... 37

CHAPTER 4. CENTRAL LIMIT THEOREM FOR THE INTEGRATED SQUARED ERROR EVALUATED AT A FIXED SEQUENCE OF BANDWIDTHS
    4.1. Introduction ... 55
    4.2. Central Limit Theorem for ISE ... 56
    4.3. Proof of the Theorem ... 62
    4.4. Lemmas ... 66

CHAPTER 5. COMPARATIVE STUDY OF THE BANDWIDTHS $\hat h_c$ AND $\hat h_0$, WITH $h_0$ AS REFERENCE
    5.1. Introduction ... 77
    5.2. Main Results ... 78
    5.3. Lemmas ... 84

BIBLIOGRAPHY ... 111
CHAPTER 1
INTRODUCTION TO NONPARAMETRIC HAZARD RATE ESTIMATION

1.1 Introduction

Suppose $X$, the time to failure of an article, is a nonnegative random variable with distribution function $F(x)$ and probability density function $f(x)$. Then the chance that the article, functioning at time $x$, will fail in the interval $(x, x+\delta x]$ is
$$P[X \in (x, x+\delta x] \mid X > x] = \frac{P[X \in (x, x+\delta x]]}{P[X > x]} \quad \text{for } P[X > x] > 0$$
$$= \frac{\int_x^{x+\delta x} f(u)\,du}{1 - F(x)} \quad \text{for } F(x) < 1,$$
and by continuity of the probability density function $f(x)$ we have
$$\lim_{\delta x \to 0} P[X \in (x, x+\delta x] \mid X > x]/\delta x = \frac{f(x)}{1 - F(x)} \quad \text{for } F(x) < 1$$
$$= \lambda(x).$$
The function $\lambda(x)$, which indicates the "instantaneous probability" that a failure occurs at time $x$, is called the hazard rate function. It is also known as the conditional death or failure rate.
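As a concrete illustration (an added example, not part of the original text), the two most familiar lifetime models show how $\lambda$ is computed from $F$ and $f$:

% Exponential lifetimes have a constant hazard rate, while Weibull
% lifetimes (shape alpha, scale beta) have a monotone hazard rate.
\[
F(x)=1-e^{-\theta x}
  \;\Longrightarrow\;
  \lambda(x)=\frac{f(x)}{1-F(x)}
            =\frac{\theta e^{-\theta x}}{e^{-\theta x}}=\theta ,
\qquad
F(x)=1-e^{-(x/\beta)^{\alpha}}
  \;\Longrightarrow\;
  \lambda(x)=\frac{\alpha}{\beta}\Bigl(\frac{x}{\beta}\Bigr)^{\alpha-1}.
\]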
It is useful particularly in the context of reliability theory and survival analysis, and hence in fields as diverse as engineering, medical statistics and geophysics. Rice and Rosenblatt (1976) deal with a problem of geophysics in which $\lambda(x)$ is the risk of having a new micro earthquake at time $x$. Silverman (1986) uses the Copas and Fryer (1980) data on lengths of treatment spells (in days) of control patients in a suicide study to study suicide risk. A discussion of the role of hazard rate functions in the interpretation and modelling of survival data is given in Cox and Oakes (1984). Estimation of the hazard rate function is therefore one of the questions of prime interest in survival analysis.

If the functional form of $f(x)$ is known except for some unknown parameters, then the functional form of the hazard rate function will be known except for the same unknown parameters. Therefore the sample data can be used to estimate the unknown parameters and hence the hazard rate.
Such parametric models of $\lambda(x)$, and the inferential problems associated with them, have been extensively studied; see Barlow and Proschan (1975). When the assumed parametric model is appropriate, the corresponding inferential methods are usually efficient. But if the assumed parametric model is grossly in error, then parametric inferential procedures may be very inappropriate. For those cases where one does not have knowledge of a parametric form, inferential procedures have been proposed to estimate the curve $\lambda(x)$ directly. A variety of such nonparametric procedures to estimate $\lambda(x)$ is discussed in the literature; see for example the survey paper by Singpurwalla and Wong (1983a). Here we will focus our attention on one such method, based on kernel estimators. In the next section we will define such estimators, and further investigation of them is carried out in section 1.3, where we will identify the major problems associated with them. A brief summary of the following chapters is given in section 1.4.
1.2 Kernel Based Estimators

The nonparametric estimation of $\lambda(x)$ was initiated by Watson and Leadbetter (1964a, 1964b). Subsequent research works include Rice and Rosenblatt (1976), Singpurwalla and Wong (1983a, 1983b), Tanner and Wong (1983, 1984) and Lo, Mack and Wang (1989). There are essentially two variants of the same estimator which are commonly used, both based on the delta sequence smoothing introduced by Watson and Leadbetter (1964a, 1964b) and Rice and Rosenblatt (1976).

To define the estimators, consider a random sample $X_1, X_2, \ldots, X_n$ from the density $f(x)$ and distribution function $F(x)$ having support on the positive half of the real line. Define, for any $x$,
$$F_n^*(x) = n^{-1}\sum_{i=1}^n I[X_i \le x], \qquad F_n(x) = \frac{n}{n+1}\,F_n^*(x),$$
$$f_n(x) = \int K_h(x-u)\,dF_n^*(u) = n^{-1}\sum_{i=1}^n K_h(x-X_i),$$
where $K_h(u) = h^{-1}K(u/h)$ is a kernel, with $K$ usually assumed to be a symmetric probability density function, and $h$ is called the bandwidth of the kernel. Note that $f_n(x)$ is the kernel estimate of $f(x)$. So, combining the estimates $F_n(x)$ and $f_n(x)$, we get
$$\lambda_{n,1}(x) = f_n(x)/\bar F_n(x), \qquad \text{where } \bar F_n = 1 - F_n,$$
as an estimate of $\lambda(x)$. A variant of $\lambda_{n,1}(x)$ is
$$\lambda_{n,2}(x) = \int \frac{K_h(x-u)}{1 - F_n(u)}\,dF_n^*(u),$$
which was proposed and motivated in Watson and Leadbetter (1964a). In the case of randomly right censored samples, Tanner and Wong (1983) have studied the generalization of the estimator $\lambda_{n,2}(x)$, and Marron and Padgett (1987) have studied analogs of this kind of estimator for the probability density function. The motivation for, and a comparative discussion of, these variants of the same estimator is considered in detail in Chapter 2.
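To make the two variants concrete, here is a minimal computational sketch (not from the dissertation; all function names are illustrative). It evaluates $\lambda_{n,1}$ and $\lambda_{n,2}$ on a grid, using a Gaussian kernel purely for simplicity, even though the theory of Chapter 3 assumes a compactly supported $K$.

import numpy as np

def K(u):
    """Symmetric probability density used as the kernel (Gaussian here)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def hazard_estimates(x, sample, h):
    """Return (lambda_n1(x), lambda_n2(x)) on a grid x."""
    n = len(sample)
    srt = np.sort(sample)
    # F_n = (n/(n+1)) F_n^*  keeps 1 - F_n strictly positive
    Fn = lambda t: np.searchsorted(srt, t, side="right") / (n + 1.0)
    Ku = K((x[:, None] - sample[None, :]) / h)
    f_n = Ku.sum(axis=1) / (n * h)               # kernel density estimate f_n(x)
    lam1 = f_n / (1.0 - Fn(x))                   # ratio estimator lambda_{n,1}
    lam2 = (Ku / (1.0 - Fn(sample))[None, :]).sum(axis=1) / (n * h)  # lambda_{n,2}
    return lam1, lam2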
To bring out the problems associated with the above mentioned kernel based estimators, we will focus our attention on $\lambda_{n,2}(x)$ only, and we will compute its mean and variance. Now, as noted by Watson and Leadbetter (1964a),
$$E(\lambda_{n,2}(x)) = \int [1 - F^n(y)]\,\lambda(y)\,K_h(x-y)\,dy$$
(where $F^n$ denotes the $n$-th power of $F$). So as the sample size increases to $\infty$, the dominant part of $E[\lambda_{n,2}(x)]$ is the convolution $\int \lambda(y)K_h(x-y)\,dy$, which can be regarded as an approximation of $\lambda(x)$ by the weighted average $\int \lambda(y)K_h(x-y)\,dy$. For this to be a good approximation it is necessary that the values of $\lambda(y)$, for $y$ far away from $x$, must not be so large that the down-weighting of the kernel is insufficient. Since $\lambda(y) = f(y)(1-F(y))^{-1}$, this amounts to a so-called "compatibility condition", given by Tanner and Wong (1983), on the tail behaviors of $K_h$ and $1-F$, which says that for any $M > 0$ there exists an $h$ small enough such that $K_h(y-x)[1-F(y)]^{-1}$ is uniformly bounded for $|y-x| > M$. In this sense $K$ is assumed compatible with the distribution function $F$. Therefore, by Theorem 2 of Tanner and Wong (1983), if $K$ is compatible with $F$ then
$$(\mathrm{Bias})^2 = \bigl(E[\lambda_{n,2}(x) - \lambda(x)]\bigr)^2 = h^4\,\lambda''(x)^2\Bigl[\int (u^2/2)K(u)\,du\Bigr]^2 + o(h^4) \tag{2.1}$$
and
$$\mathrm{Var}(\lambda_{n,2}(x)) = (nh)^{-1}\Bigl[\int K^2(u)\,du\Bigr]\frac{\lambda(x)}{1-F(x)} + o(n^{-1}h^{-1}). \tag{2.2}$$
We close this section with the important observation that the bias is large for bigger bandwidths and the variance is large for smaller bandwidths.
1.3 Practical Problems Associated With the Estimators

The major practical difficulty in using kernel based nonparametric estimators of the hazard function is the choice of the bandwidth $h$. To illustrate its importance, a random sample of 200 observations is generated from a mixture of N(4,1) and N(9,1) with mixing parameter $p = 1/2$, and the estimate $\lambda_{n,2}(x)$ and the true curve are plotted (see Figures 1.1, 1.2 and 1.3). Figure 1.1 shows the estimate of $\lambda(x)$ with $h$, the bandwidth, equal to 0.2. Figure 1.2 shows the same estimate when $h$ equals 0.8, and Figure 1.3 shows the true curve. Estimates of the distribution function and the density function are overlaid in Figures 1.1 and 1.2, while the true distribution function and density function curves are overlaid in Figure 1.3. The estimate of $\lambda(x)$ is too wiggly when $h = 0.2$. This is because there is not enough local averaging when $h$ is small. In contrast to this, the estimate of $\lambda(x)$ when $h = 0.8$ has neither peaks so high nor valleys as low as for the case $h = 0.2$, and there is less wiggliness. This is because there is more averaging when $h$ is big. Thus for $h = 0.2$ the estimate is under-smoothed and for $h = 0.8$ the estimate is over-smoothed. Since the smoothness of the estimate is controlled by the bandwidth $h$, it is also called the "smoothing parameter". Also, as noted earlier, smaller $h$ leads to larger variance and bigger $h$ leads to larger bias in the estimate. The dependence of all this on $h$, the smoothing parameter, is why the choice of $h$ is so crucial. The above figures illustrate that by changing the bandwidth we get a very large amount of flexibility in the nonparametric curve estimator, which is very desirable, because this means that the estimator does not impose any preassigned structure on the data, as is always done by parametric estimators. Having to choose $h$ is the price we have to pay for the flexibility that we earn.
[Figures 1.1-1.3: the hazard rate estimate $\lambda_{n,2}$ with $h = 0.2$ (Figure 1.1) and $h = 0.8$ (Figure 1.2), and the true hazard rate (Figure 1.3), each overlaid with the corresponding distribution and density curves.]
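The simulation behind these figures is easy to reproduce. The following self-contained sketch (illustrative code with an arbitrary seed — the original plots cannot be recreated exactly) draws 200 observations from the mixture and evaluates $\lambda_{n,2}$ at the two bandwidths.

import numpy as np

rng = np.random.default_rng(0)                     # arbitrary seed
n = 200
means = np.where(rng.random(n) < 0.5, 4.0, 9.0)    # mixing parameter p = 1/2
sample = rng.normal(means, 1.0)

def lambda_n2(x, data, h):
    m = len(data)
    Fn = np.searchsorted(np.sort(data), data, side="right") / (m + 1.0)
    Ku = np.exp(-0.5 * ((x[:, None] - data[None, :]) / h) ** 2) / np.sqrt(2 * np.pi)
    return (Ku / (1.0 - Fn)[None, :]).sum(axis=1) / (m * h)

x = np.linspace(1.0, 11.0, 400)
wiggly = lambda_n2(x, sample, 0.2)   # under-smoothed, as in Figure 1.1
smooth = lambda_n2(x, sample, 0.8)   # over-smoothed, as in Figure 1.2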
The choice of $h$ in the above simulated example is not difficult, because we know the right answer. But for real data sets this is not the case. The dependence of bias and variance on $h$ suggests a naive method of selecting $h$ as a trade-off between bias and variance. This can be achieved by considering the mean squared error or integrated mean squared error of the estimator. Note that by (2.1) and (2.2), for the estimator $\lambda_{n,2}(x)$,
$$\mathrm{MSE} \approx h^4\,\lambda''(x)^2\Bigl[\int (u^2/2)K(u)\,du\Bigr]^2 + (nh)^{-1}\Bigl[\int K^2(u)\,du\Bigr]\frac{\lambda(x)}{1-F(x)}. \tag{3.1}$$
Therefore the $h$ which minimizes the MSE is
$$h_{\mathrm{MSE}} = \left\{\frac{\bigl[\int K^2(u)\,du\bigr]\,\lambda(x)/(1-F(x))}{\lambda''(x)^2\bigl[\int u^2K(u)\,du\bigr]^2}\right\}^{1/5} n^{-1/5}, \tag{3.2}$$
which depends on unknown quantities. Thus it is not an immediate practical solution to the problem of selecting $h$.
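The minimization behind (3.2) is elementary calculus; spelling it out (an added step, abbreviating the two constants in (3.1) as $a$ and $b$):

% Write MSE(h) ~ b h^4 + a/(n h); differentiate and solve for h.
\[
\frac{d}{dh}\Bigl(bh^{4}+\frac{a}{nh}\Bigr)=4bh^{3}-\frac{a}{nh^{2}}=0
\;\Longrightarrow\;
h^{5}=\frac{a}{4bn}
\;\Longrightarrow\;
h=\Bigl(\frac{a}{4b}\Bigr)^{1/5}n^{-1/5},
\]
% where a = [\int K^2(u) du] lambda(x)/(1-F(x)) and
% b = lambda''(x)^2 [\int (u^2/2)K(u) du]^2; since
% [\int (u^2/2)K(u) du]^2 = (1/4)[\int u^2 K(u) du]^2, this is exactly (3.2).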
In general, for whatever error criterion we decide on to assess the performance of the estimator, when $h$ is selected to minimize that error criterion the result will depend on unknown population quantities. Also, since we do not know the right answer, the trial and error method of selecting $h$ will not guarantee the correct choice. Hence a method which uses the data in some objective, or automatic, way to choose the smoothing parameter is needed. Such methods are proposed in Chapter 3, with discussion.
1.4 Summary

We have seen two essentially similar estimators of the hazard rate in section 1.2. Similar estimators for the probability density function and the hazard rate function in the case of randomly right censored data were proposed by Marron and Padgett (1987) and Tanner and Wong (1983). In Chapter 2 we show that these are ratio type estimators and provide a different motivation for them. We also provide new insight into choosing between them.

In Chapter 3 we propose various possible methods of selecting the smoothing parameter, with discussion. Among those proposed, we have investigated the cross-validation method of selecting $h$. We have shown that the integrated squared error of the kernel hazard estimator, in the censored as well as the uncensored setting, is a good approximation to the mean integrated squared error, under some reasonable assumptions. Also, it is shown that the bandwidth selected by the cross-validation method is optimal.

To study the order of closeness between the integrated squared error and the mean integrated squared error, for all the estimators discussed in Chapter 2, we derive in Chapter 4 the asymptotic distribution of the integrated squared error evaluated at a fixed sequence of bandwidths $h = h(n) \to 0$ as $n \to \infty$. This also develops the machinery essential to the derivation of the joint asymptotic distribution of the random bandwidths obtained by minimizing the integrated squared error and the cross-validation score function.

In Chapter 5 the joint asymptotic distribution of the random bandwidths, obtained by minimizing the integrated squared error and the cross-validation score function, is obtained. It is seen that the rate of convergence of the bandwidth obtained by the cross-validation method to the optimal one is $n^{-1/10}$, even for hazard rate estimation. In summary, we find that although the cross-validation method selects a bandwidth which converges to the right answer, as Chapter 3 shows, Chapter 5 shows that it needs an exorbitantly large sample size to achieve that!
CHAPTER 2
KERNEL BASED ESTIMATORS OF RATIO FUNCTIONS

2.1 Introduction

Two types of kernel estimators of the hazard rate function were defined in Chapter 1, and it will be seen in section 2.2 that these are easily generalized to randomly right censored samples. The purpose of this chapter is the motivation and comparison of these two estimators. Asymptotic analysis of their mean integrated squared error, as done in section 2.2, shows that the variances are the same, but with regard to bias neither estimator is uniformly superior to the other. That is to say, for some choices of the underlying probability density function the bias of one estimator is smaller, while for other choices the bias of the other estimator is smaller. Moreover, it will be seen later that the asymptotic optimality results of Chapters 3, 4 and 5 are true for both estimators, with slight variation. So neither the bias and variance analysis nor the asymptotic optimality results reveal any order of preference between these two commonly used estimators. An analogous problem also occurs in the case of kernel density estimation from right censored samples, where Marron and Padgett (1987) have discussed optimal properties of these two related estimators.

This chapter provides a new insight into the choice between these two estimators. We argue that one of the estimators is more natural. To make the main idea clear we will focus attention first on the hazard rate function estimators, as these are the estimators with which we will be dealing in the following chapters. As an example, these ideas will then be applied to kernel density estimators from randomly right censored samples.

The basis of our comparison of these estimators, as will be seen in section 2.3, is their structure. The estimator $\lambda_{n,1}(x)$ may be viewed as separately estimating the numerator and denominator of the target function, while the estimator $\lambda_{n,2}(x)$ is motivated only in terms of the entire target function. We show that the latter estimator is more natural in two senses. First, it admits an important and simple type of martingale representation which the former does not seem to have; and second, the "effective kernel" of the estimator $\lambda_{n,2}(x)$ is smooth, while the "effective kernel" of the estimator $\lambda_{n,1}(x)$ has undesirable discontinuities. The idea is applied to analogous issues in kernel density estimation from randomly right censored samples in section 2.4, and extended to hazard rate function estimation in the censored setting in section 2.5.
2.2 The Estimators

We have defined the estimators $\lambda_{n,i}(x)$, $i = 1, 2$, of the hazard rate $\lambda(x)$ in the uncensored setting in Chapter 1, and these are
$$\lambda_{n,1}(x) = (nh)^{-1}\sum_{i=1}^n \frac{K\!\left(\frac{x-X_i}{h}\right)}{1 - F_n(x)} \tag{2.1}$$
and its simple (and natural, as will be seen) modification considered by Watson and Leadbetter (1964a),
$$\lambda_{n,2}(x) = (nh)^{-1}\sum_{i=1}^n \frac{K\!\left(\frac{x-X_i}{h}\right)}{1 - F_n(X_i)}. \tag{2.2}$$
As $F_n$ converges to $F$ at the rate $n^{-1/2}$, asymptotically $\lambda_{n,1}(x)$ and $\lambda_{n,2}(x)$ are equivalent to
$$\bar\lambda_{n,1}(x) = (nh)^{-1}\sum_{i=1}^n \frac{K\!\left(\frac{x-X_i}{h}\right)}{1 - F(x)}
\qquad\text{and}\qquad
\bar\lambda_{n,2}(x) = (nh)^{-1}\sum_{i=1}^n \frac{K\!\left(\frac{x-X_i}{h}\right)}{1 - F(X_i)}.$$

To generalize these estimators to randomly right censored samples we introduce additional notation. Let $X_1^0, X_2^0, \ldots, X_n^0$ denote the independent identically distributed (i.i.d.) survival times of $n$ items or individuals that are censored on the right by i.i.d. random variables $U_1, U_2, \ldots, U_n$ which are independent of the $X_i^0$'s. Denote the common distribution of the $X_i^0$'s by $F^0$ and that of the $U_i$'s by $H$. It is assumed that $F^0$ is absolutely continuous with density $f^0$ and that $H$ is continuous; the hazard rate function we want to estimate is $\lambda^0(x) = f^0(x)/[1-F^0(x)]$. The observed randomly right censored data are denoted by the pairs $(X_i, \Delta_i)$, $i = 1, 2, \ldots, n$, where $X_i = \min\{X_i^0, U_i\}$ and $\Delta_i = I[X_i^0 \le U_i]$. The $X_i$'s form an i.i.d. sample from a distribution $F$ where $1-F = (1-F^0)(1-H)$. Define the empirical distribution function
$$F_n^*(x) = n^{-1}\sum_{i=1}^n I[X_i \le x] \tag{2.3a}$$
and
$$F_n(x) = \frac{n}{n+1}\,F_n^*(x). \tag{2.3b}$$
Let $F_n^0(x)$ be the Kaplan-Meier estimate of $F^0$, and let $H_n$ be such that
$$(1 - F_n(x)) = (1 - F_n^0(x))(1 - H_n(x)). \tag{2.4}$$
Let $f_{n,1}(x)$ be the kernel estimator of $f^0(x)$,
$$f_{n,1}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1 - H_n(x)}\,\Delta_i, \tag{2.5}$$
which is one of the estimators Marron and Padgett (1987) considered. Therefore, combining the estimates $f_{n,1}(x)$ and $F_n^0(x)$, the straightforward ratio estimator $\lambda^0_{n,1}(x)$ of $\lambda^0(x)$ is
$$\lambda^0_{n,1}(x) = \Bigl\{n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1 - H_n(x)}\,\Delta_i\Bigr\}\Big/\bigl[1 - F_n^0(x)\bigr] = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1 - F_n(x)}\,\Delta_i. \tag{2.6}$$
Again, the simple modification of $\lambda^0_{n,1}(x)$ considered by Tanner and Wong (1983) is
$$\lambda^0_{n,2}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1 - F_n(X_i)}\,\Delta_i. \tag{2.7}$$
As $F_n$ converges to $F$ at the rate $n^{-1/2}$, asymptotically $\lambda^0_{n,1}(x)$ and $\lambda^0_{n,2}(x)$ are equivalent to
$$\bar\lambda^0_{n,1}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1 - F(x)}\,\Delta_i
\qquad\text{and}\qquad
\bar\lambda^0_{n,2}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1 - F(X_i)}\,\Delta_i.$$

We will see in section 2.4 that $f_{n,1}(x)$ is also obtained by considering the ratio of two estimates. Marron and Padgett (1987) have also considered the simple and natural (as will be seen later) modification of $f_{n,1}(x)$,
$$f_{n,2}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1 - H_n(X_i)}\,\Delta_i. \tag{2.8}$$
Again, $f_{n,1}(x)$ and $f_{n,2}(x)$ are asymptotically equivalent to
$$\bar f_{n,1}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1 - H(x)}\,\Delta_i
\qquad\text{and}\qquad
\bar f_{n,2}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1 - H(X_i)}\,\Delta_i.$$
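A minimal computational sketch of (2.7) (illustrative code, not the author's): given observed pairs $(X_i, \Delta_i)$, each uncensored point is weighted by $1/(1-F_n(X_i))$, with $F_n$ the rescaled empirical distribution of the observed times. An Epanechnikov kernel is used so that $K$ has compact support, as assumed in Chapter 3.

import numpy as np

def censored_hazard_n2(x, times, delta, h):
    """lambda^0_{n,2}(x): times = min(X^0, U); delta = 1 if uncensored."""
    n = len(times)
    Fn = np.searchsorted(np.sort(times), times, side="right") / (n + 1.0)
    u = (x[:, None] - times[None, :]) / h
    Ku = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov
    weights = delta / (1.0 - Fn)        # only uncensored points contribute
    return (Ku * weights[None, :]).sum(axis=1) / (n * h)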
Theorem 2.1: Assume that the conditions of Theorem 2 of Tanner and Wong (1983) hold. Then
$$E[\bar\lambda^0_{n,1}(x) - \lambda^0(x)] = [1-F(x)]^{-1}\int K(u)\{\lambda^0(x-hu)[1-F(x-hu)] - \lambda^0(x)[1-F(x)]\}\,du,$$
$$\mathrm{Var}(\bar\lambda^0_{n,1}(x)) = (nh)^{-1}\Bigl[\int K^2(u)\,du\Bigr]\lambda^0(x)[1-F(x)]^{-1} + o(n^{-1}h^{-1}),$$
$$\mathrm{MISE}(\bar\lambda^0_{n,1}(x)) = a\,n^{-1}h^{-1} + b_1 + o(n^{-1}h^{-1})$$
and
$$E[\bar\lambda^0_{n,2}(x) - \lambda^0(x)] = \int K(u)[\lambda^0(x-hu) - \lambda^0(x)]\,du,$$
$$\mathrm{Var}(\bar\lambda^0_{n,2}(x)) = (nh)^{-1}\Bigl[\int K^2(u)\,du\Bigr]\lambda^0(x)[1-F(x)]^{-1} + o(n^{-1}h^{-1}),$$
$$\mathrm{MISE}(\bar\lambda^0_{n,2}(x)) = a\,n^{-1}h^{-1} + b_2 + o(n^{-1}h^{-1}),$$
where
$$a = \Bigl[\int K^2(u)\,du\Bigr]\int \lambda^0(x)[1-F(x)]^{-1}w(x)\,dx,$$
$$b_1 = \int B_1^2(x,h)\,w(x)[1-F(x)]^{-2}\,dx, \qquad b_2 = \int B_2^2(x,h)\,w(x)\,dx,$$
$$B_1(x,h) = \int K(u)\{\lambda^0(x-hu)[1-F(x-hu)] - \lambda^0(x)[1-F(x)]\}\,du,$$
$$B_2(x,h) = \int K(u)[\lambda^0(x-hu) - \lambda^0(x)]\,du,$$
and $w(x)$ is a weight function.

Proof: See Tanner and Wong (1983).

Remark 2.1: The difference in the asymptotic analysis of these MISE's shows up in the bias part. To understand this difference, observe that
$$B_1(x,h)/[1-F(x)] = \int K(u)\,\lambda^0(x-hu)\bigl\{[F(x-hu)-F(x)]/[1-F(x)]\bigr\}\,du + B_2(x,h). \tag{2.9}$$
Note that for some choices of $f(x)$, $b_2$ will be smaller, while for others $b_1$ will be smaller. So the two estimators are not comparable in terms of MISE.

Corollary 2.1: Assume that the conditions of Theorem 2 of Tanner and Wong (1983) hold. Then
$$E[\bar\lambda_{n,1}(x) - \lambda(x)] = [1-F(x)]^{-1}\int K(u)[f(x-hu) - f(x)]\,du,$$
$$\mathrm{Var}(\bar\lambda_{n,1}(x)) = (nh)^{-1}\Bigl[\int K^2(u)\,du\Bigr]\lambda(x)[1-F(x)]^{-1} + o(n^{-1}h^{-1}),$$
$$\mathrm{MISE}(\bar\lambda_{n,1}(x)) = a\,n^{-1}h^{-1} + b_1 + o(n^{-1}h^{-1})$$
and
$$E[\bar\lambda_{n,2}(x) - \lambda(x)] = \int K(u)[\lambda(x-hu) - \lambda(x)]\,du,$$
$$\mathrm{Var}(\bar\lambda_{n,2}(x)) = (nh)^{-1}\Bigl[\int K^2(u)\,du\Bigr]\lambda(x)[1-F(x)]^{-1} + o(n^{-1}h^{-1}),$$
$$\mathrm{MISE}(\bar\lambda_{n,2}(x)) = a\,n^{-1}h^{-1} + b_2 + o(n^{-1}h^{-1}),$$
with
$$a = \Bigl[\int K^2(u)\,du\Bigr]\int \lambda(x)[1-F(x)]^{-1}w(x)\,dx,$$
$$b_1 = \int B_1^2(x,h)\,w(x)[1-F(x)]^{-2}\,dx, \qquad b_2 = \int B_2^2(x,h)\,w(x)\,dx,$$
$$B_1(x,h) = \int K(u)[f(x-hu) - f(x)]\,du, \qquad B_2(x,h) = \int K(u)[\lambda(x-hu) - \lambda(x)]\,du,$$
and $w(x)$ is a weight function.

Proof: The proof follows by taking $H(x) = 0$ for $x \ge 0$ in the above theorem. For an independent proof see Watson and Leadbetter (1964a).
Corollary 2.2: Assume that $K$ is compatible with $H$. Then
$$E[\bar f_{n,1}(x) - f^0(x)] = \int K(u)\{f^0(x-hu)[1-H(x-hu)] - f^0(x)[1-H(x)]\}\,du,$$
$$\mathrm{Var}(\bar f_{n,1}(x)) = (nh)^{-1}\Bigl[\int K^2(u)\,du\Bigr]f^0(x)[1-H(x)]^{-1} + o(n^{-1}h^{-1}),$$
$$\mathrm{MISE}(\bar f_{n,1}(x)) = a\,n^{-1}h^{-1} + b_1 + o(n^{-1}h^{-1})$$
and
$$E[\bar f_{n,2}(x) - f^0(x)] = \int K(u)[f^0(x-hu) - f^0(x)]\,du,$$
$$\mathrm{Var}(\bar f_{n,2}(x)) = (nh)^{-1}\Bigl[\int K^2(u)\,du\Bigr]f^0(x)[1-H(x)]^{-1} + o(n^{-1}h^{-1}),$$
$$\mathrm{MISE}(\bar f_{n,2}(x)) = a\,n^{-1}h^{-1} + b_2 + o(n^{-1}h^{-1}),$$
where
$$a = \Bigl[\int K^2(u)\,du\Bigr]\int f^0(x)[1-H(x)]^{-1}w(x)\,dx,$$
$$b_1 = \int B_1^2(x,h)\,w(x)[1-H(x)]^{-2}\,dx, \qquad b_2 = \int B_2^2(x,h)\,w(x)\,dx,$$
$$B_1(x,h) = \int K(u)\{f^0(x-hu)[1-H(x-hu)] - f^0(x)[1-H(x)]\}\,du,$$
$$B_2(x,h) = \int K(u)[f^0(x-hu) - f^0(x)]\,du,$$
and $w(x)$ is a weight function.

Proof: The proof is based on steps identical to those of the above theorem.

In the next section we argue that $\lambda_{n,2}(x)$ is more natural than $\lambda_{n,1}(x)$, and the argument is extended to the estimators of $f^0(x)$ and $\lambda^0(x)$ in the following sections.
2.3 The Natural Hazard Rate Estimator

As noted earlier, $\lambda_{n,1}(x)$ is obtained by estimating the numerator and denominator of $\lambda(x)$ individually. On the other hand, $\lambda_{n,2}(x)$ may be viewed as directly targeting $\lambda(x)$ itself. To see this, note that the kernel density estimator $f_n(x)$ can be motivated heuristically by
$$f(x) \approx \int K_h(x-u)f(u)\,du = \int K_h(x-u)\,dF(u) = \int K_h(x-u)\,dF_n^*(u) + \int K_h(x-u)\,d[F(u) - F_n^*(u)]. \tag{3.1}$$
So a reasonable estimator of $f(x)$ is
$$f_n(x) = \int K_h(x-u)\,dF_n^*(u) = n^{-1}\sum_{i=1}^n K_h(x-X_i).$$
The random part of the error term, $\int K_h(x-u)\,d[F(u) - F_n^*(u)]$, can be conveniently analyzed using the fact that $F_n^*(x) - F(x)$ is an empirical process. The representation in (3.1) also shows that the approximation of $f(x)$ by $\int K_h(x-u)f(u)\,du$ introduces the bias part of the error.

For a hazard function analog of (3.1), define the cumulative hazard function
$$\Lambda(x) = \int_0^x \lambda(u)\,du = \int_0^x \frac{dF(u)}{1-F(u)} \tag{3.2}$$
and its empirical version
$$\int_0^x \frac{dF_n^*(u)}{1-F_n^*(u)} \qquad \text{for } 0 \le x < \infty;$$
to avoid the infinity in the denominator we modify it to
$$\Lambda_n(x) = \int_0^x \frac{dF_n^*(u)}{1-F_n(u)} \qquad \text{for } 0 \le x \le \infty. \tag{3.3}$$
Heuristic motivation as at (3.1) gives
$$\lambda(x) \approx \int K_h(x-u)\lambda(u)\,du = \int K_h(x-u)\,d\Lambda(u) = \int K_h(x-u)\,d\Lambda_n(u) + \int K_h(x-u)\,d[\Lambda(u) - \Lambda_n(u)]. \tag{3.4}$$
So the natural estimator of $\lambda(x)$ will be
$$\lambda_{n,2}(x) = \int K_h(x-u)\,d\Lambda_n(u),$$
and to see that the resulting random error term, $\int K_h(x-u)\,d[\Lambda(u) - \Lambda_n(u)]$, has a similarly convenient probability structure, note that
$$n^{1/2}\{\Lambda_n(x) - \Lambda(x)\} = \int_0^x \frac{dM_n(u)}{1-F_n(u)}, \tag{3.5}$$
which is a stochastic integral of the predictable process $(1-F_n(u))^{-1}$ with respect to the martingale
$$M_n(x) = n^{1/2}\Bigl[F_n^*(x) - \int_0^x (1-F_n(u))\,d\Lambda(u)\Bigr] \tag{3.6}$$
with associated filtration $\mathcal{F}_x = \sigma(I[X_i \le u],\ u \le x)$. The weak convergence of the process $n^{1/2}\{\Lambda_n(x) - \Lambda(x)\}$ is discussed in Chapter 6 of Shorack and Wellner (1986).
[Figure 2.1: the effective kernels of $\lambda_{n,1}$ (dotted) and $\lambda_{n,2}$ (dashed) at two sample points, with three sample points marked by solid vertical lines.]
The elegance and power of this representation is seen in Wells (1989), who uses the representation in (3.4) to show that $(nh)^{1/2}\{\lambda_{n,2}(x) - \lambda(x)\}$ converges weakly to a Gaussian process on $D[0,\infty)$. The main step of the proof is the application of Rebolledo's central limit theorem to
$$\sqrt{nh}\,\{\lambda_{n,2}(x) - \lambda^*(x)\} = \int \frac{h^{-1/2}K_h(x-u)}{1-F_n(u)}\,dM_n(u),$$
where $\lambda^*(x) = \int K_h(x-u)\,d\Lambda(u)$.

Another difference between the above discussed estimators, which seems to have been overlooked so far, is the form of their "effective kernel". The effective kernels of the estimators $\lambda_{n,1}$ and $\lambda_{n,2}$ are $K_h(x-X_i)[1-F_n(x)]^{-1}$ and $K_h(x-X_i)[1-F_n(X_i)]^{-1}$ respectively, and the difference between them is made evident in Figure 2.1. The effective kernels are drawn at two sample points, $x = 1, 3$, with the effective kernels of $\lambda_{n,1}$ represented by dotted lines and those of $\lambda_{n,2}$ by dashed lines. The solid vertical lines are three sample points. The effective kernel of $\lambda_{n,1}$ has as many discontinuities as the number of sample observations caught in the window width of the kernel. This is an undesirable property for a smoothing method. On the other hand, the effective kernel of $\lambda_{n,2}$ is a smooth curve. Thus $\lambda_{n,2}$ will give a smooth estimate of $\lambda$, while the estimator $\lambda_{n,1}$ will always result in a discontinuous estimate of $\lambda$.
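The contrast in effective kernels can be seen numerically with a few lines of code (an illustrative sketch with a toy sample, not the dissertation's figure): as a function of $x$, the factor $[1-F_n(x)]^{-1}$ used by $\lambda_{n,1}$ jumps at every observation inside the kernel window, while the factor $[1-F_n(X_i)]^{-1}$ used by $\lambda_{n,2}$ is a constant.

import numpy as np

times = np.array([0.8, 1.3, 1.9, 2.4])     # toy sample, already sorted
n, h, Xi = len(times), 0.6, 1.3            # effective kernel centered at X_i = 1.3
x = np.linspace(0.0, 3.5, 701)

u = (x - Xi) / h
Ku = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0) / h  # Epanechnikov K_h

Fn_x = np.searchsorted(times, x, side="right") / (n + 1.0)     # step function of x
Fn_Xi = np.searchsorted(times, Xi, side="right") / (n + 1.0)   # a single number

eff1 = Ku / (1.0 - Fn_x)    # lambda_{n,1}: jumps at each X_j inside the window
eff2 = Ku / (1.0 - Fn_Xi)   # lambda_{n,2}: a smooth rescaling of K_h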
2.4 The Natural Density Estimator for Right Censored Samples

We now apply the above idea to the case of kernel density estimation for right censored samples. Marron and Padgett (1987) have proposed two closely related estimators, (2.5) and (2.8), of the density, but again found through MISE analysis that neither one is uniformly better than the other. However, here we show that $f_{n,2}(x)$, the analog of $\lambda_{n,2}(x)$, is more natural in this same sense.

Define the sub-distribution functions
$$F^+(x) = P[X_i \le x,\ \Delta_i = 0] = \int_0^x (1-F^0(u))\,dH(u) \tag{4.1}$$
and
$$F^-(x) = P[X_i \le x,\ \Delta_i = 1] = \int_0^x (1-H(u))\,dF^0(u), \tag{4.2}$$
and their empirical versions
$$F_n^+(x) = n^{-1}\sum_{i=1}^n I[X_i \le x](1-\Delta_i) \tag{4.3}$$
and
$$F_n^-(x) = n^{-1}\sum_{i=1}^n I[X_i \le x]\,\Delta_i. \tag{4.4}$$
Clearly $F(x) = F^+(x) + F^-(x)$ and $F_n^*(x) = F_n^+(x) + F_n^-(x)$. Thus to estimate $f^0$ we have only those $X_i$'s for which $\Delta_i = 1$. Observe that
$$n^{-1}\sum_{i=1}^n K_h(x-X_i)\,\Delta_i \tag{4.5}$$
provides an estimate of $f^0(x)(1-H(x))$. So to get an estimator of $f^0(x)$ it is natural to divide the above expression by an estimate of $(1-H(x))$. Hence $f^0(x)$ may be viewed as the "ratio function" $f^0(x)(1-H(x))/(1-H(x))$, which has a structure with a strong connection to $\lambda(x)$. The obvious analog of $\lambda_{n,1}(x)$ is $f_{n,1}(x)$.

Now define the cumulative hazard function
$$\Lambda^0(x) = \int_0^x \frac{dF^0(u)}{1-F^0(u)} = \int_0^x \frac{(1-H(u))\,dF^0(u)}{1-F(u)} = \int_0^x \frac{dF^-(u)}{1-F(u)}. \tag{4.6}$$
As above, an obvious estimate of $\Lambda^0(x)$ is
$$\Lambda_n^0(x) = \int_0^x \frac{dF_n^-(u)}{1-F_n(u)}. \tag{4.7}$$
Again,
$$f^0(x) \approx \int K_h(x-u)\,dF^0(u) = \int K_h(x-u)(1-F^0(u))\,d\Lambda^0(u)$$
$$= \int K_h(x-u)(1-F_n^0(u))\,d\Lambda_n^0(u) + \int K_h(x-u)(F_n^0(u)-F^0(u))\,d\Lambda_n^0(u)$$
$$\quad + \int K_h(x-u)(1-F^0(u))\,d[\Lambda^0(u)-\Lambda_n^0(u)], \tag{4.8}$$
which gives the natural estimator, in the sense of section 2.3, of $f^0(x)$ to be
$$f_{n,2}(x) = \int K_h(x-u)(1-F_n^0(u))\,d\Lambda_n^0(u) = \int K_h(x-u)\,dF_n^0(u).$$
This is precisely the second estimator of Marron and Padgett (1987), although no motivation of this type is given there. The random error in this approximation is slightly more complicated than in section 2.3, so a convenient representation of it requires two terms. Note that one involves an empirical process, while the other involves the process
$$n^{1/2}\{\Lambda_n^0(x) - \Lambda^0(x)\} = \int_0^x \frac{dM_n^0(u)}{1-F_n(u)},$$
which is a stochastic integral of a predictable process with respect to the martingale
$$M_n^0(x) = n^{1/2}\Bigl[F_n^-(x) - \int_0^x (1-F_n(u))\,d\Lambda^0(u)\Bigr]$$
with associated filtration $\mathcal{F}_x = \sigma(I[X_i \le u],\ I[X_i \le u]\Delta_i,\ u \le x)$. The weak convergence of the process $n^{1/2}\{\Lambda_n^0(x) - \Lambda^0(x)\}$ is discussed in Chapter 7 of Shorack and Wellner (1986). The estimator $f_{n,2}(x)$ is more natural in this sense and also has a continuous effective kernel.
2.5 Censored Hazard Rate Estimation

Again, as noted earlier, $\lambda^0_{n,1}(x)$ is obtained by estimating the numerator and denominator of $\lambda^0(x)$ individually. However, this is not natural in the sense of being a martingale with respect to $x$, and it has a discontinuous effective kernel. On the other hand, heuristic motivation as in section 2.3 gives
$$\lambda^0(x) \approx \int K_h(x-u)\lambda^0(u)\,du = \int K_h(x-u)\,d\Lambda^0(u) = \int K_h(x-u)\,d\Lambda_n^0(u) + \int K_h(x-u)\,d[\Lambda^0(u) - \Lambda_n^0(u)]. \tag{5.1}$$
So the natural estimator of $\lambda^0(x)$ will be
$$\lambda^0_{n,2}(x) = \int K_h(x-u)\,d\Lambda_n^0(u),$$
which has a smooth effective kernel, and the resulting random error term, $\int K_h(x-u)\,d[\Lambda^0(u) - \Lambda_n^0(u)]$, has a convenient probability structure as seen in section 2.4.
CHAPTER 3
AUTOMATIC SELECTION AND OPTIMALITY OF THE SMOOTHING PARAMETER

3.1 Introduction

As noted in Chapter 1, the basic problem in using the hazard rate function estimators in practice is choosing an appropriate bandwidth. The only work in this direction is by Tanner and Wong (1984). In contrast, this problem has been discussed at length for density estimation and regression function estimation; see for example the survey paper by Marron (1988). Here we will develop similar techniques for selecting the smoothing parameter and assessing its performance for hazard function estimation, in censored and uncensored settings.

The traditional method of assessing the performance of estimators which use an automatic smoothing parameter selector is to consider some sort of error criterion. The usual criteria may be separated into two classes, global and pointwise. As most applications of curve estimation call for a picture of an entire curve, instead of its value at one particular point, only global measures will be discussed here. The global error criteria used in the literature are the $L_1$-norm, the $L_2$-norm and the $L_\infty$-norm. Each of these error criteria has its own merits and demerits. For example, both the $L_1$ and $L_\infty$ norms have the major drawback that they are technically very difficult and give no simple quantification of smoothing. For the advantages of the $L_1$-norm we refer the reader to Devroye and Györfi (1985). Unlike the $L_1$ and $L_\infty$ norms, the $L_2$-norm is technically much easier because of the inner product structure and the notion of orthogonality. As a matter of fact this leads to much deeper results in the density estimation case. We therefore use the widely accepted $L_2$ error criterion, the integrated squared error,
$$\mathrm{ISE}(h) = \int (\hat\lambda(x|h) - \lambda(x))^2\,w(x)\,dx.$$
For the non-random assessment of the performance of the estimator we will also consider the expected value of the $L_2$-norm, the mean integrated squared error,
$$\mathrm{MISE}(h) = E(\mathrm{ISE}(h)).$$

To select a smoothing parameter it is natural to attempt to minimize these error criteria with respect to $h$. It should be noted that the considered error criteria are continuous functions of $h$, so the minimizers with respect to $h$ exist. But for each of these error criteria the minimizer may not be unique; if there is a tie we select either one. Such minimizers of these error criteria will be denoted by $h$ with appropriate subscripts, e.g. $h_{\mathrm{ISE}}$. At this point one should note that $h_{\mathrm{ISE}}$ will be a random quantity. So, even though $h_{\mathrm{ISE}}$ is best for the data set at hand, a disturbing feature is that two experimenters whose data have the same underlying distribution will have two different "optimal bandwidths". On the other hand, $h_{\mathrm{MISE}}$ is fixed and best with respect to the average over all possible data sets. A practical problem with $h_{\mathrm{ISE}}$ or $h_{\mathrm{MISE}}$ is that they both depend on unknown quantities. To resolve this, several methods have been proposed, including the following.
3.1.1 Plug-in Method

Recall from Corollary 2.1 of Chapter 2 that
$$\mathrm{MISE} = a(nh)^{-1} + b_2 + o(n^{-1}h^{-1})$$
$$= (nh)^{-1}\Bigl[\int K^2(u)\,du\Bigr]\Bigl[\int \lambda(x)(1-F(x))^{-1}w(x)\,dx\Bigr] + o(n^{-1}h^{-1})$$
$$\quad + h^4\Bigl[\int \lambda''^2(x)w(x)\,dx\Bigr]\Bigl[\int (u^2/2)K(u)\,du\Bigr]^2 + o(h^4).$$
Thus, since asymptotically
$$\mathrm{MISE} \sim (nh)^{-1}\Bigl[\int K^2(u)\,du\Bigr]\Bigl[\int \lambda(x)(1-F(x))^{-1}w(x)\,dx\Bigr] + h^4\Bigl[\int \lambda''^2(x)w(x)\,dx\Bigr]\Bigl[\int (u^2/2)K(u)\,du\Bigr]^2,$$
one generally defines the asymptotic mean integrated squared error
$$\mathrm{AMISE} = (nh)^{-1}\Bigl[\int K^2(u)\,du\Bigr]\Bigl[\int \lambda(x)(1-F(x))^{-1}w(x)\,dx\Bigr] + h^4\Bigl[\int \lambda''^2(x)w(x)\,dx\Bigr]\Bigl[\int (u^2/2)K(u)\,du\Bigr]^2.$$
Thus a smaller $h$ will give a large value of the first term and a larger $h$ will give a large value of the second term in the AMISE, and the trade-off is achieved at
$$h_0 = \left\{\frac{\bigl[\int K^2(u)\,du\bigr]\bigl[\int \lambda(x)(1-F(x))^{-1}w(x)\,dx\bigr]}{\bigl[\int \lambda''^2(x)w(x)\,dx\bigr]\bigl[\int u^2K(u)\,du\bigr]^2}\right\}^{1/5} n^{-1/5}.$$
Now the plug-in method requires the asymptotic representation of the MISE to be
$$\mathrm{MISE}(h) = \mathrm{AMISE}(h) + o(\mathrm{AMISE}(h))$$
under whatever conditions may be necessary. Then the essential idea is to work with AMISE and plug in estimates of the unknown parts. A possible drawback of this approach is that the estimation of the unknown quantities requires pilot estimation, which itself requires specification of smoothing parameters. Another weakness of the plug-in estimator is that the difference between MISE and AMISE can be substantial. This method and related problems have been investigated in relation to density estimation; see for example Woodroofe (1970), Scott, Tapia and Thomson (1977), Scott and Factor (1981), Krieger and Pickands (1981), Sheather (1983, 1986), and Hall and Marron (1987a). A similar investigation needs to be done in relation to hazard function estimation and is an interesting open problem.
3.1.2 Pseudo Likelihood Cross-Validation

This was proposed independently by Habbema, Hermans and van den Broek (1974) and by Duin (1976) in the density estimation setting. For hazard function estimation, Tanner and Wong (1984) were the first to use this method, where they call it modified likelihood cross-validation.

Essentially this method is the cross-validation idea applied to the likelihood criterion. In the case of density estimation, Schuster and Gregory (1981) have shown that the selector based on this method is severely affected by the tail behavior of $f$. If both the kernel and the density are compactly supported, then Chow, Geman and Wu (1983) demonstrated that the resulting density estimator will be consistent. But this consistency can be very slow and the selected bandwidth very poor, as demonstrated by Marron (1985). Hall (1987a,b) explores this method much more deeply and concludes that this method of selecting the bandwidth is usually not appropriate for curve estimation. We expect similar performance of this method if one were to use it for hazard rate function estimation, although verification of this is an open problem. This method will not be considered further here.
3.1.3 Least Squares Cross-Validation

This was proposed independently by Rudemo (1982) and Bowman (1984) in relation to density estimation. To adapt it to hazard rate function estimation, the idea is to target
$$\mathrm{ISE}(h) = \int \hat\lambda^2 - 2\int \hat\lambda\,\lambda + \int \lambda^2 = \int \hat\lambda^2(x)\,dx - 2\int \hat\lambda(x)\,\frac{f(x)}{1-F(x)}\,dx + \int \lambda^2(x)\,dx.$$
The first term of this expansion is available to the experimenter, the moment estimator of the second term seems tractable, and the last term is independent of $h$. So one would get the criterion
$$\int \hat\lambda^2 - 2\,\Bigl(\text{Estimate of } \int \hat\lambda(x)\,\frac{f(x)}{1-F(x)}\,dx\Bigr),$$
which is then minimized to give the cross-validation smoothing parameter.

This method has been discussed at length in the density estimation and nonparametric regression estimation literature; see Hall (1983, 1985), Stone (1984), Burman (1985), Nolan and Pollard (1987), Marron and Padgett (1987), Härdle (1988), Eubank (1988), Härdle, Hall and Marron (1988) and Györfi, Härdle, Sarda and Vieu (1989). A good discussion of the strengths and weaknesses of this method can be found in the survey paper by Marron (1988). We will be analyzing this method for smoothing parameter selection in hazard rate function estimation.

The basic purpose of this chapter is to develop a bandwidth selector using cross-validation and to show that it is asymptotically optimal. The program is carried out as follows. Section 3.2 deals with the uncensored case, where an approximation result for ISE is obtained and the least squares cross-validation bandwidth is proposed. This bandwidth is also shown to be optimal in the sense of ISE. A similar program is carried out for the censored case in section 3.3. Section 3.4 contains proofs of all the results in this chapter.
3.2 Uncensored Case

3.2.1 Asymptotics

Recall the plug-in estimator $\lambda_{n,1}(x)$ and the natural estimator $\lambda_{n,2}(x)$, which are
$$\lambda_{n,1}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1-F_n(x)} \tag{2.1}$$
and
$$\lambda_{n,2}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1-F_n(X_i)}. \tag{2.2}$$
Now note that $F_n$ is $\sqrt{n}$-convergent (see Serfling (1980)), which is a faster rate of convergence than that of the density estimator, which converges typically at the rate $n^{-2/5}$. So $\lambda_{n,1}(x)$ and $\lambda_{n,2}(x)$ are essentially the same as
$$\bar\lambda_{n,1}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1-F(x)} \tag{2.3}$$
and
$$\bar\lambda_{n,2}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1-F(X_i)}. \tag{2.4}$$
For $\hat\lambda(x)$ equal to $\lambda_{n,1}(x)$, $\lambda_{n,2}(x)$, $\bar\lambda_{n,1}(x)$ or $\bar\lambda_{n,2}(x)$ we choose to analyze its performance by studying the ISE,
$$\mathrm{ISE}(\hat\lambda(x)) = \int_0^\infty (\hat\lambda(x) - \lambda(x))^2\,w(x)\,dx, \tag{2.5}$$
where $w(x)$ is a non-negative weight function. The role of $w$ is to eliminate the endpoint effects.

Assumptions:

A.1. The weight function $w$ is bounded and supported on $[0,T]$, where $T = \sup\{t;\ F(t) < 1-\epsilon\}$ for some small $\epsilon > 0$.

A.2. $K$ is compatible with the distribution function $F$, and $f$ and $\lambda$ are twice differentiable.

A.3. $K$ is a symmetric probability density function with compact support.

A.4. $|K(x) - K(y)| \le C|x-y|^\alpha$ for some $\alpha > 0$, where $C$ is a positive constant.

A.5. The bandwidths under consideration are assumed to come, for each $n$, from a set $H_n$ such that
$$\inf_{h \in H_n} h \ge n^{-1+\delta} \quad\text{and}\quad \#H_n \le n^\rho \quad\text{for some } 1/2 > \delta > 0,\ \rho > 0.$$

Theorem 3.1: Under the above conditions on $w$, $K$, $f$ and $\lambda$ we have
$$\sup_{h \in H_n}\left|\frac{\mathrm{ISE}(\lambda_{n,1}(x)) - [a\,n^{-1}h^{-1} + b_1]}{a\,n^{-1}h^{-1} + b_1}\right| \longrightarrow 0 \quad\text{a.s.} \tag{2.6}$$
and
$$\sup_{h \in H_n}\left|\frac{\mathrm{ISE}(\lambda_{n,2}(x)) - [a\,n^{-1}h^{-1} + b_2]}{a\,n^{-1}h^{-1} + b_2}\right| \longrightarrow 0 \quad\text{a.s.,} \tag{2.7}$$
where $a$ and the $b_i$'s are as defined in Corollary 2.1 of Chapter 2.

Remark 3.1: In Theorem 3.1 the bandwidths are assumed to come from the finite set $H_n$, but the result can be extended to the interval $I_{H_n} = [\underline{h}, \bar{h}]$, which is achieved in the following corollary. Before stating the corollary, first note that (i) $H_n \subset I_{H_n}$; (ii) the cardinality of $H_n$ grows algebraically fast, $\#H_n \le n^\rho$ for some $\rho > 0$; and (iii) for $h' \in I_{H_n}$ there is $h \in H_n$ such that $|h'-h| \le n^{-\eta}$ for given $\eta > 0$.

Corollary 3.1: Under the assumptions A.1-A.4,
$$\sup_{h \in I_{H_n}}\left|\frac{\mathrm{ISE}(\lambda_{n,1}(x)) - [a\,n^{-1}h^{-1} + b_1]}{a\,n^{-1}h^{-1} + b_1}\right| \longrightarrow 0 \quad\text{a.s.} \tag{2.8}$$
and
$$\sup_{h \in I_{H_n}}\left|\frac{\mathrm{ISE}(\lambda_{n,2}(x)) - [a\,n^{-1}h^{-1} + b_2]}{a\,n^{-1}h^{-1} + b_2}\right| \longrightarrow 0 \quad\text{a.s.} \tag{2.9}$$
Corollary 3.1 is an immediate consequence of Theorem 3.1 and the following lemma.

Lemma 3.1: Under the assumptions A.1-A.5,
$$\sup_{h \in H_n,\ h':\,|h'-h| \le n^{-\eta}} |A_i(h') - A_i(h)| \longrightarrow 0 \quad\text{a.s.,} \tag{2.10}$$
where
$$A_i(h) = \frac{\mathrm{ISE}(\lambda_{n,i}(h)) - [a(nh)^{-1} + b_i]}{a(nh)^{-1} + b_i}, \qquad i = 1, 2.$$
3.2.2 Automatic Bandwidth Selection

For the data based bandwidth selection we propose least squares cross-validation. This is motivated as follows. Write $\hat\lambda(x)$ for the estimator of $\lambda(x)$, i.e. $\hat\lambda(x)$ may be $\lambda_{n,1}(x)$ or $\lambda_{n,2}(x)$; then
$$\mathrm{ISE}(\hat\lambda(x)) = \int \hat\lambda^2(x)w(x)\,dx - 2\int \hat\lambda(x)\lambda(x)w(x)\,dx + \int \lambda^2(x)w(x)\,dx$$
$$= \int \hat\lambda^2(x)w(x)\,dx - 2\int \frac{\hat\lambda(x)}{1-F(x)}\,w(x)f(x)\,dx + \int \lambda^2(x)w(x)\,dx. \tag{2.11}$$
(i) The third term is independent of $h$, so it is enough to choose $h$ to minimize the sum of the first two terms.
(ii) The second term can be nearly unbiasedly estimated by
$$2n^{-1}\sum_{i=1}^n \frac{\hat\lambda_i(X_i)}{1-F_n(X_i)}\,w(X_i), \tag{2.12}$$
where $\hat\lambda_i$ is the leave-one-out version of $\hat\lambda$, given by
$$\lambda_{n,1i}(x) = (n-1)^{-1}\sum_{j \ne i} \frac{K_h(x-X_j)}{1-F_n(x)} \quad\text{when } \hat\lambda = \lambda_{n,1} \tag{2.13a}$$
and
$$\lambda_{n,2i}(x) = (n-1)^{-1}\sum_{j \ne i} \frac{K_h(x-X_j)}{1-F_n(X_j)} \quad\text{when } \hat\lambda = \lambda_{n,2}. \tag{2.13b}$$
Thus we define $\hat h_c$ to be the minimizer of the least squares cross-validation criterion
$$\mathrm{CV}(h) = \int \hat\lambda^2(x)w(x)\,dx - 2n^{-1}\sum_{i=1}^n \frac{\hat\lambda_i(X_i)}{1-F_n(X_i)}\,w(X_i). \tag{2.14}$$

Theorem 3.2: Under the conditions of Theorem 3.1, $\hat h_c$ is optimal in the sense that
$$\frac{\mathrm{ISE}(\hat\lambda, \hat h_c)}{\inf_h \mathrm{ISE}(\hat\lambda, h)} \longrightarrow 1 \quad\text{a.s.} \tag{2.15}$$

Remark 3.2: Again note that the $h$'s are assumed to come from $H_n$, but this set could be extended to the interval $I_{H_n}$.
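A computational sketch of the criterion (2.14) follows (illustrative code, not the author's implementation); it takes $w \equiv 1$ on the range of the grid, evaluates the first term by numerical integration, and uses the leave-one-out estimator (2.13b).

import numpy as np

def cv_score(h, data, x_grid):
    """Least squares cross-validation score (2.14) for lambda_{n,2}, w = 1."""
    n = len(data)
    Fn = np.searchsorted(np.sort(data), data, side="right") / (n + 1.0)
    # first term: integral of the squared estimate over the grid
    Kg = np.exp(-0.5 * ((x_grid[:, None] - data[None, :]) / h) ** 2) / np.sqrt(2 * np.pi)
    lam = (Kg / (1.0 - Fn)[None, :]).sum(axis=1) / (n * h)
    term1 = np.trapz(lam ** 2, x_grid)
    # second term: leave-one-out values lambda_i(X_i) as in (2.13b), then (2.12)
    Kxx = np.exp(-0.5 * ((data[:, None] - data[None, :]) / h) ** 2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(Kxx, 0.0)
    lam_i = (Kxx / (1.0 - Fn)[None, :]).sum(axis=1) / ((n - 1) * h)
    term2 = 2.0 * np.mean(lam_i / (1.0 - Fn))
    return term1 - term2

# h_c: minimizer of CV over a bandwidth grid
data = np.random.default_rng(1).exponential(1.0, 100)
grid = np.linspace(0.0, np.quantile(data, 0.9), 400)
bandwidths = np.linspace(0.05, 1.0, 40)
h_c = bandwidths[np.argmin([cv_score(h, data, grid) for h in bandwidths])]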
3.3 Censored Case

3.3.1 Asymptotics

Recall the plug-in estimator $\lambda^0_{n,1}(x)$ and the natural estimator $\lambda^0_{n,2}(x)$, which are
$$\lambda^0_{n,1}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1-F_n(x)}\,I[\Delta_i = 1] \tag{3.1}$$
and
$$\lambda^0_{n,2}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1-F_n(X_i)}\,I[\Delta_i = 1]. \tag{3.2}$$
Now note that $F_n$ is $\sqrt{n}$-convergent (see Serfling (1980)), which is a faster rate of convergence than that for the density estimator, which converges typically at the rate $n^{-2/5}$. So $\lambda^0_{n,1}(x)$ and $\lambda^0_{n,2}(x)$ are essentially the same as
$$\bar\lambda^0_{n,1}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1-F(x)}\,I[\Delta_i = 1] \tag{3.3}$$
and
$$\bar\lambda^0_{n,2}(x) = n^{-1}\sum_{i=1}^n \frac{K_h(x-X_i)}{1-F(X_i)}\,I[\Delta_i = 1]. \tag{3.4}$$
For $\hat\lambda(x)$ equal to any of these estimators we analyze its performance by studying the ISE,
$$\mathrm{ISE}(\hat\lambda(x)) = \int_0^\infty (\hat\lambda(x) - \lambda^0(x))^2\,w(x)\,dx, \tag{3.5}$$
where $w(x)$ is a non-negative weight function. The role of $w$ is to eliminate the endpoint effects.

Assumptions: The stated conditions A.3-A.5 of the uncensored setting will again be assumed, together with the following modifications of A.1 and A.2.

A'.1. The weight function $w$ is bounded and supported on $[0,T]$, where $T < \min(T_H, T_{F^0})$, with $T_G = \sup\{t;\ G(t) < 1-\epsilon\}$ for small $\epsilon > 0$, for a distribution function $G$.

A'.2. $K$ is compatible with the distribution functions $F^0$ and $H$, and $f$ and $\lambda^0$ are twice differentiable.

Theorem 3.3: Under the above conditions on $w$, $K$, $f$ and $\lambda^0$ we have
$$\sup_{h \in H_n}\left|\frac{\mathrm{ISE}(\lambda^0_{n,1}(x)) - [a\,n^{-1}h^{-1} + b_1]}{a\,n^{-1}h^{-1} + b_1}\right| \longrightarrow 0 \quad\text{a.s.} \tag{3.6}$$
and
$$\sup_{h \in H_n}\left|\frac{\mathrm{ISE}(\lambda^0_{n,2}(x)) - [a\,n^{-1}h^{-1} + b_2]}{a\,n^{-1}h^{-1} + b_2}\right| \longrightarrow 0 \quad\text{a.s.,} \tag{3.7}$$
where $a$ and the $b_i$'s are as defined in Theorem 2.1 of Chapter 2.

Remark 3.3: Again, in the above theorem the bandwidths are assumed to come from the finite set $H_n$, but the result can be extended to the interval $I_{H_n} = [\underline{h}, \bar{h}]$, which is achieved in the following corollary.

Corollary 3.2: Under the assumptions of Theorem 3.3,
$$\sup_{h \in I_{H_n}}\left|\frac{\mathrm{ISE}(\lambda^0_{n,1}(x)) - [a\,n^{-1}h^{-1} + b_1]}{a\,n^{-1}h^{-1} + b_1}\right| \longrightarrow 0 \quad\text{a.s.} \tag{3.8}$$
and
$$\sup_{h \in I_{H_n}}\left|\frac{\mathrm{ISE}(\lambda^0_{n,2}(x)) - [a\,n^{-1}h^{-1} + b_2]}{a\,n^{-1}h^{-1} + b_2}\right| \longrightarrow 0 \quad\text{a.s.} \tag{3.9}$$
Corollary 3.2 is an immediate consequence of Theorem 3.3 and the following lemma.

Lemma 3.2: Under the assumptions of Theorem 3.3,
$$\sup_{h \in H_n,\ h':\,|h'-h| \le n^{-\eta}} |A_i(h') - A_i(h)| \longrightarrow 0 \quad\text{a.s.,} \tag{3.10}$$
where
$$A_i(h) = \frac{\mathrm{ISE}(\lambda^0_{n,i}(h)) - [a(nh)^{-1} + b_i]}{a(nh)^{-1} + b_i}, \qquad i = 1, 2.$$
3.3.2 Automatic Bandwidth Selection

For the data based bandwidth selection we propose least squares cross-validation. This is motivated as follows. Write $\hat\lambda(x)$ for the estimator of $\lambda^0(x)$, i.e. $\hat\lambda(x)$ may be $\lambda^0_{n,1}(x)$ or $\lambda^0_{n,2}(x)$; then
$$\mathrm{ISE}(\hat\lambda(x)) = \int \hat\lambda^2(x)w(x)\,dx - 2\int \hat\lambda(x)\lambda^0(x)w(x)\,dx + \int \lambda^{0\,2}(x)w(x)\,dx$$
$$= \int \hat\lambda^2(x)w(x)\,dx - 2\int \frac{\hat\lambda(x)}{1-F^0(x)}\,w(x)f^0(x)\,dx + \int \lambda^{0\,2}(x)w(x)\,dx. \tag{3.11}$$
(i) The third term is independent of $h$, so it is enough to choose $h$ to minimize the sum of the first two terms.
(ii) The second term can be nearly unbiasedly estimated by
$$2n^{-1}\sum_{i=1}^n \frac{\hat\lambda_i(X_i)}{[1-F_n^0(X_i)][1-H_n(X_i)]}\,w(X_i)\,I[\Delta_i = 1], \tag{3.12}$$
where $\hat\lambda_i$ is the leave-one-out version of $\hat\lambda$, given by
$$\lambda^0_{n,1i}(x) = (n-1)^{-1}\sum_{j \ne i} \frac{K_h(x-X_j)}{1-F_n(x)}\,I[\Delta_j = 1] \quad\text{when } \hat\lambda = \lambda^0_{n,1} \tag{3.13a}$$
and
$$\lambda^0_{n,2i}(x) = (n-1)^{-1}\sum_{j \ne i} \frac{K_h(x-X_j)}{1-F_n(X_j)}\,I[\Delta_j = 1] \quad\text{when } \hat\lambda = \lambda^0_{n,2}. \tag{3.13b}$$
By (2.4) of Chapter 2, the nearly unbiased estimate of the second term can be rewritten as
$$2n^{-1}\sum_{i=1}^n \frac{\hat\lambda_i(X_i)}{1-F_n(X_i)}\,w(X_i)\,I[\Delta_i = 1]. \tag{3.12'}$$
Thus we define $\hat h_c$ to be the minimizer of the least squares cross-validation criterion
$$\mathrm{CV}(h) = \int \hat\lambda^2(x)w(x)\,dx - 2n^{-1}\sum_{i=1}^n \frac{\hat\lambda_i(X_i)}{1-F_n(X_i)}\,w(X_i)\,I[\Delta_i = 1]. \tag{3.14}$$

Theorem 3.4: Under the conditions of Theorem 3.3, $\hat h_c$ is optimal in the sense that
$$\frac{\mathrm{ISE}(\hat\lambda, \hat h_c)}{\inf_h \mathrm{ISE}(\hat\lambda, h)} \longrightarrow 1 \quad\text{a.s.} \tag{3.15}$$

Remark 3.4: Again note that the $h$'s are assumed to come from $H_n$, but as above this set could be extended to the interval $I_{H_n}$.
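For completeness, here is an analogous sketch of the censored criterion (3.14) via (3.12') (illustrative code): the only change from the uncensored score is that the leave-one-out sums run over uncensored points only.

import numpy as np

def cv_score_censored(h, times, delta, x_grid):
    """Censored CV score (3.14) for lambda^0_{n,2}, with w = 1 on the grid."""
    n = len(times)
    Fn = np.searchsorted(np.sort(times), times, side="right") / (n + 1.0)
    u = (x_grid[:, None] - times[None, :]) / h
    Ku = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)   # Epanechnikov
    lam = (Ku * (delta / (1.0 - Fn))[None, :]).sum(axis=1) / (n * h)
    term1 = np.trapz(lam ** 2, x_grid)
    uu = (times[:, None] - times[None, :]) / h
    Kxx = np.where(np.abs(uu) <= 1.0, 0.75 * (1.0 - uu**2), 0.0)
    np.fill_diagonal(Kxx, 0.0)
    lam_i = (Kxx * (delta / (1.0 - Fn))[None, :]).sum(axis=1) / ((n - 1) * h)
    term2 = 2.0 * np.mean(lam_i * delta / (1.0 - Fn))           # the term (3.12')
    return term1 - term2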
3.4 Proofs

We will prove Theorem 3.3, Theorem 3.4 and Lemma 3.2. Theorem 3.1, Theorem 3.2 and Lemma 3.1 then follow as particular cases of Theorem 3.3, Theorem 3.4 and Lemma 3.2 respectively; to see this, take the censoring distribution such that $H(x) = 0$ for $x \ge 0$. We will also state and prove some additional lemmas which are needed in the proofs of the theorems. We assume the conditions A'.1, A'.2 and A.3-A.5, and we take the compact support of $K$ to be $[-1,1]$.
3.4.1 Proof of Theorem 3.3

Here we give the proof explicitly for the estimator $\lambda^0_{n,2}(x)$; the proof can be adapted for the estimator $\lambda^0_{n,1}(x)$. For convenience we will write $\lambda_n(x)$ for $\lambda^0_{n,2}(x)$ and $\bar\lambda_n(x)$ for $\bar\lambda^0_{n,2}(x)$ in the following proof. Note first that
$$\mathrm{ISE}(\lambda_n(x)) = \mathrm{ISE}(\bar\lambda_n(x)) + \mathrm{II} + \mathrm{III}, \tag{4.1}$$
where
$$\mathrm{II} = 2\int_0^\infty [\bar\lambda_n(x) - \lambda^0(x)][\lambda_n(x) - \bar\lambda_n(x)]\,w(x)\,dx,$$
$$\mathrm{III} = \int_0^\infty [\lambda_n(x) - \bar\lambda_n(x)]^2\,w(x)\,dx.$$
And
$$\mathrm{MISE}(\bar\lambda_n(x)) = a\,n^{-1}h^{-1} + b + o(n^{-1}h^{-1}),$$
where $a$ and $b$ are as defined earlier. The proof of the theorem will be completed if we prove the following three lemmas.

Lemma 4.1:
$$\sup_{h \in H_n}\bigl|[\mathrm{ISE}(\bar\lambda_n(x)) - \mathrm{MISE}(\bar\lambda_n(x))]/\mathrm{MISE}(\bar\lambda_n(x))\bigr| \longrightarrow 0 \quad\text{a.s.}$$

Lemma 4.2:
$$\sup_{h \in H_n}\bigl|\mathrm{III}/\mathrm{MISE}(\bar\lambda_n(x))\bigr| \longrightarrow 0 \quad\text{a.s.}$$

Lemma 4.3:
$$\sup_{h \in H_n}\bigl|\mathrm{II}/\mathrm{MISE}(\bar\lambda_n(x))\bigr| \longrightarrow 0 \quad\text{a.s.}$$
Proof of Lemma 4.1:
$$\mathrm{ISE}(\bar\lambda_n(x)) = \int \bar\lambda_n^2(x)w(x)\,dx - 2\int \bar\lambda_n(x)\lambda^0(x)w(x)\,dx + \int \lambda^{0\,2}(x)w(x)\,dx$$
$$= \sum_{i=1}^n\sum_{j=1}^n n^{-2}\int K_h(x-X_i)K_h(x-X_j)[1-F(X_i)]^{-1}[1-F(X_j)]^{-1} I[\Delta_i=1]I[\Delta_j=1]\,w(x)\,dx$$
$$\quad - 2\sum_{i=1}^n n^{-1}\int K_h(x-X_i)[1-F(X_i)]^{-1}I[\Delta_i=1]\,[1-F^0(x)]^{-1}f^0(x)\,w(x)\,dx + \int \lambda^{0\,2}(x)w(x)\,dx$$
$$= n^{-1}(n-1)^{-1}\sum_{i=1}^n\sum_{j\ne i}\int K_h(x-X_i)K_h(x-X_j)[1-F(X_i)]^{-1}[1-F(X_j)]^{-1} I[\Delta_i=1]I[\Delta_j=1]\,w(x)\,dx$$
$$\quad + \sum_{i=1}^n n^{-2}\int K_h^2(x-X_i)[1-F(X_i)]^{-2} I[\Delta_i=1]\,w(x)\,dx + o(n^{-1}h^{-1})$$
$$\quad - 2\sum_{i=1}^n n^{-1}\int K_h(x-X_i)[1-F(X_i)]^{-1}I[\Delta_i=1]\,[1-F^0(x)]^{-1}f^0(x)\,w(x)\,dx + \int \lambda^{0\,2}(x)w(x)\,dx.$$
So
$$\mathrm{ISE}(\bar\lambda_n(x)) - \mathrm{MISE}(\bar\lambda_n(x))$$
$$= n^{-1}(n-1)^{-1}\sum_{i=1}^n\sum_{j\ne i}\Bigl\{\int K_h(x-X_i)K_h(x-X_j)[1-F(X_i)]^{-1}[1-F(X_j)]^{-1} I[\Delta_i=1]I[\Delta_j=1]\,w(x)\,dx$$
$$\quad - 2\int K_h(x-X_i)[1-F(X_i)]^{-1}I[\Delta_i=1]\,[1-F^0(x)]^{-1}f^0(x)\,w(x)\,dx + \int \lambda^{0\,2}(x)w(x)\,dx - b$$
$$\quad + n^{-1}\int K_h^2(x-X_i)[1-F(X_i)]^{-2}I[\Delta_i=1]\,w(x)\,dx - n^{-1}h^{-1}\int\!\!\int K^2(u)\,\lambda^0(x-hu)[1-F(x-hu)]^{-1}w(x)\,dx\,du\Bigr\}$$
$$\quad - \Bigl[a\,n^{-1}h^{-1} - n^{-1}h^{-1}\int\!\!\int K^2(u)\,\lambda^0(x-hu)[1-F(x-hu)]^{-1}w(x)\,dx\,du\Bigr]$$
$$= n^{-1}(n-1)^{-1}\sum_{i=1}^n\sum_{j\ne i} U_{ij} - \Bigl[a\,n^{-1}h^{-1} - n^{-1}h^{-1}\int\!\!\int K^2(u)\,\lambda^0(x-hu)[1-F(x-hu)]^{-1}w(x)\,dx\,du\Bigr].$$
Expanding $\lambda^0(x-hu)[1-F(x-hu)]^{-1}$ in a Taylor series, it is easy to see that the term inside the square bracket is of order $n^{-1}$, i.e. this term goes to zero faster than $\mathrm{MISE}(\bar\lambda_n(x))$. So now to complete the proof we need only show that
$$\sup_{h \in H_n}\Bigl|n^{-1}(n-1)^{-1}\sum_{i=1}^n\sum_{j\ne i} U_{ij}\,/\,\mathrm{MISE}(\bar\lambda_n(x))\Bigr| \longrightarrow 0 \quad\text{a.s.} \tag{4.2}$$
For that write
$$U_{ij} = W_{ij} + V_{iE} + V_{Ej} + U_{EE},$$
where
$$U_{iE} = E[U_{ij}|X_i], \qquad U_{Ej} = E[U_{ij}|X_j], \qquad U_{EE} = E[U_{ij}],$$
$$W_{ij} = U_{ij} - U_{iE} - U_{Ej} + U_{EE}, \qquad V_{iE} = U_{iE} - U_{EE} \quad\text{and}\quad V_{Ej} = U_{Ej} - U_{EE}.$$
So we get
$$V_{iE} = \int K_h(x-X_i)[1-F(X_i)]^{-1}I[\Delta_i=1]\int K_h(x-y)[1-F^0(y)]^{-1}f^0(y)\,w(x)\,dy\,dx$$
$$\quad - \int\!\!\int K_h(x-y)[1-F^0(y)]^{-1}\int K_h(x-u)[1-F^0(u)]^{-1}f^0(y)f^0(u)\,w(x)\,du\,dy\,dx$$
$$\quad - \int K_h(x-X_i)[1-F(X_i)]^{-1}I[\Delta_i=1]\,\lambda^0(x)\,w(x)\,dx$$
$$\quad + \int\!\!\int K_h(x-y)[1-F^0(y)]^{-1}\lambda^0(x)f^0(y)\,w(x)\,dy\,dx$$
$$\quad + n^{-1}\int K_h^2(x-X_i)[1-F(X_i)]^{-2}I[\Delta_i=1]\,w(x)\,dx$$
$$\quad - n^{-1}\int\!\!\int K_h^2(x-y)[1-F^0(y)]^{-1}[1-F(y)]^{-1}f^0(y)\,w(x)\,dy\,dx, \tag{4.3}$$
$$V_{Ej} = \int\!\!\int K_h(x-y)[1-F^0(y)]^{-1}K_h(x-X_j)[1-F(X_j)]^{-1}I[\Delta_j=1]\,f^0(y)\,w(x)\,dy\,dx$$
$$\quad - \int\!\!\int K_h(x-y)[1-F^0(y)]^{-1}\int K_h(x-u)[1-F^0(u)]^{-1}f^0(y)f^0(u)\,w(x)\,du\,dy\,dx, \tag{4.4}$$
and
$$W_{ij} = \int K_h(x-X_i)[1-F(X_i)]^{-1}K_h(x-X_j)[1-F(X_j)]^{-1}I[\Delta_i=1]I[\Delta_j=1]\,w(x)\,dx$$
$$\quad - \int\!\!\int K_h(x-X_i)[1-F(X_i)]^{-1}I[\Delta_i=1]\,K_h(x-y)[1-F^0(y)]^{-1}f^0(y)\,w(x)\,dy\,dx$$
$$\quad - \int\!\!\int K_h(x-y)[1-F^0(y)]^{-1}K_h(x-X_j)[1-F(X_j)]^{-1}I[\Delta_j=1]\,f^0(y)\,w(x)\,dy\,dx$$
$$\quad + \int\!\!\int\!\!\int K_h(x-y)[1-F^0(y)]^{-1}K_h(x-u)[1-F^0(u)]^{-1}f^0(y)f^0(u)\,w(x)\,dy\,du\,dx. \tag{4.5}$$
Note here that $W_{ij} = W_{ji}$ and $U_{EE} = 0$.
So to finish the proof it is enough to show that
$$\sup_{h \in H_n}\Bigl|n^{-1}\sum_{i=1}^n V_{iE}\,/\,\mathrm{MISE}\,\bar\lambda_n\Bigr| \longrightarrow 0 \quad\text{a.s.,} \tag{4.6}$$
$$\sup_{h \in H_n}\Bigl|n^{-1}\sum_{j=1}^n V_{Ej}\,/\,\mathrm{MISE}\,\bar\lambda_n\Bigr| \longrightarrow 0 \quad\text{a.s.} \tag{4.7}$$
and
$$\sup_{h \in H_n}\Bigl|n^{-1}(n-1)^{-1}\sum_{i=1}^n\sum_{j\ne i} W_{ij}\,/\,\mathrm{MISE}\,\bar\lambda_n\Bigr| \longrightarrow 0 \quad\text{a.s.} \tag{4.8}$$
First we will prove (4.6). Note that by the Borel-Cantelli lemma it is enough to show that, for $\epsilon > 0$,
$$\sum_{n=1}^\infty \#(H_n)\,\sup_{h \in H_n} P\Bigl[\Bigl|n^{-1}\sum_{i=1}^n V_{iE}\Bigr| > \epsilon\,\mathrm{MISE}\,\bar\lambda_n\Bigr| < \infty.$$
Hence by Chebyshev's inequality it is enough to show that there is a constant $\tau > 0$ so that for $m = 1, 2, \ldots$ there are constants $C_m$ such that
$$\sup_{h \in H_n} E\Bigl[n^{-1}\sum_{i=1}^n V_{iE}\,/\,\mathrm{MISE}\,\bar\lambda_n\Bigr]^{2m} \le C_m n^{-\tau m}. \tag{4.9}$$
Define the sequence of sigma fields
$$\mathcal{S}_k = \sigma\{(X_1,\Delta_1), (X_2,\Delta_2), \ldots, (X_k,\Delta_k)\} \quad\text{for } k < n,$$
$$\mathcal{S}_k = \sigma\{(X_1,\Delta_1), (X_2,\Delta_2), \ldots, (X_n,\Delta_n)\} \quad\text{for } k \ge n.$$
Now note that, as $E[V_{nE}] = 0$,
$$E\Bigl[\sum_{i=1}^n V_{iE}\,\Big|\,\mathcal{S}_{n-1}\Bigr] = E[V_{nE}] + \sum_{i=1}^{n-1} V_{iE} = \sum_{i=1}^{n-1} V_{iE},$$
i.e. $\{\sum_{i=1}^n V_{iE}\}$ is a martingale with respect to the sequence of sigma fields generated by $\{\mathcal{S}_k,\ k \ge 1\}$. Now we will use inequality (21.5) of Burkholder (1973) for martingales, which is
$$E[\phi(f^*)] \le C\,E[\phi(s(f))] + C\sum_{i=1}^\infty E[\phi(|d_i|)],$$
where $C$ is a constant. We will apply this inequality to the finitely indexed martingale $\{\sum_{i=1}^k V_{iE}\}$ with
$$\phi(x) = x^{2m}, \qquad f^* = \sup_{k=1,\ldots,n}\Bigl|\sum_{i=1}^k V_{iE}\Bigr|, \qquad d_i = V_{iE} \text{ for } i \le n \text{ and zero otherwise, and} \qquad s(f) = \Bigl(\sum_{i=1}^n E[V_{iE}^2]\Bigr)^{1/2}.$$
So, using the explicit form (4.3) of $V_{iE}$,
$$E[\phi(s(f))] = \Bigl[\Bigl(\sum_{i=1}^n E[V_{iE}^2]\Bigr)^{1/2}\Bigr]^{2m} = \Bigl(\sum_{i=1}^n E[V_{iE}^2]\Bigr)^m \le (n\,C\,h^4)^m \quad\text{for some } C,$$
and
$$\sum_{i=1}^\infty E[\phi(|d_i|)] = \sum_{i=1}^n E[|V_{iE}|^{2m}] \le n\,C \quad\text{for some } C.$$
Therefore,
$$E\Bigl[n^{-1}\sum_{i=1}^n V_{iE}\,/\,\mathrm{MISE}\,\bar\lambda_n\Bigr]^{2m} \le C_m\,\frac{n^{-m}h^{4m} + n^{-2m+1}}{(n^{-1}h^{-1} + h^4)^{2m}} \le C\,(h^m + n\,h^{2m}) \le C_m n^{-\tau m}$$
for $m$ sufficiently large and $\tau > 0$. Hence we get (4.9) and the proof of (4.6) is complete.

Now to prove (4.7), by the Borel-Cantelli lemma it is enough to show that, for $\epsilon > 0$,
$$\sum_{n=1}^\infty \#(H_n)\,\sup_{h \in H_n} P\Bigl[\Bigl|n^{-1}\sum_{j=1}^n V_{Ej}\Bigr| > \epsilon\,\mathrm{MISE}\,\bar\lambda_n\Bigr] < \infty.$$
Hence by Chebyshev's inequality it is enough to show that there is a constant $\tau > 0$ so that for $m = 1, 2, \ldots$ there are constants $C_m$ such that
$$\sup_{h \in H_n} E\Bigl[n^{-1}\sum_{j=1}^n V_{Ej}\,/\,\mathrm{MISE}\,\bar\lambda_n\Bigr]^{2m} \le C_m n^{-\tau m}.$$
Now note that, as $E[V_{En}|\mathcal{S}_{n-1}] = 0$,
$$E\Bigl[\sum_{j=1}^n V_{Ej}\,\Big|\,\mathcal{S}_{n-1}\Bigr] = E[V_{En}] + \sum_{j=1}^{n-1} V_{Ej} = \sum_{j=1}^{n-1} V_{Ej},$$
i.e. $\{\sum_{j=1}^n V_{Ej}\}$ is a martingale with respect to the sequence of sigma fields generated by $\{\mathcal{S}_k,\ k \ge 1\}$. So again we apply the martingale inequality (21.5) of Burkholder (1973) to the finitely indexed martingale $\{\sum_{j=1}^k V_{Ej}\}$ with
$$\phi(x) = x^{2m}, \qquad f^* = \sup_{k=1,\ldots,n}\Bigl|\sum_{j=1}^k V_{Ej}\Bigr|, \qquad d_k = V_{Ek} \text{ for } k \le n \text{ and zero otherwise, and} \qquad s(f) = \Bigl(\sum_{j=1}^n E[V_{Ej}^2]\Bigr)^{1/2}.$$
So, using the explicit form (4.4) of $V_{Ej}$,
$$E[\phi(s(f))] = \Bigl(\sum_{j=1}^n E[V_{Ej}^2]\Bigr)^m \le (n\,C\,h^4)^m \quad\text{for some } C,$$
and
$$\sum_{j=1}^\infty E[\phi(|d_j|)] = \sum_{j=1}^n E[|V_{Ej}|^{2m}] \le n\,C \quad\text{for some } C.$$
Therefore,
$$E\Bigl[n^{-1}\sum_{j=1}^n V_{Ej}\,/\,\mathrm{MISE}\,\bar\lambda_n\Bigr]^{2m} \le C_m n^{-\tau m}$$
for $m$ sufficiently large and $\tau > 0$. Hence the proof of (4.7) is complete.
To prove (4.8), by the Borel-Cantelli lemma it is enough to show that, for $\epsilon > 0$,
$$\sum_{n=1}^\infty \#(H_n)\,\sup_{h \in H_n} P\Bigl[\Bigl|n^{-2}\sum\sum_{1\le i<j\le n} W_{ij}\Bigr| > \epsilon\,\mathrm{MISE}\,\bar\lambda_n\Bigr] < \infty.$$
Hence by Chebyshev's inequality it is enough to show that there is a constant $\tau > 0$ so that for $m = 1, 2, \ldots$ there are constants $C_m$ such that
$$\sup_{h \in H_n} E\Bigl[n^{-2}\sum\sum_{1\le i<j\le n} W_{ij}\,/\,\mathrm{MISE}\,\bar\lambda_n\Bigr]^{2m} \le C_m n^{-\tau m}.$$
To see this, note that
$$E\Bigl\{\sum\sum_{1\le i<j\le n} W_{ij}\,\Big|\,\mathcal{S}_{n-1}\Bigr\} = E\Bigl\{\sum_{i=2}^n\sum_{j=1}^{i-1} W_{ij}\,\Big|\,\mathcal{S}_{n-1}\Bigr\}
= E\Bigl\{\sum_{i=2}^{n-1}\sum_{j=1}^{i-1} W_{ij} + \sum_{j=1}^{n-1} W_{nj}\,\Big|\,\mathcal{S}_{n-1}\Bigr\}$$
$$= \sum_{i=2}^{n-1}\sum_{j=1}^{i-1} W_{ij} + E\Bigl\{\sum_{j=1}^{n-1} W_{nj}\,\Big|\,\mathcal{S}_{n-1}\Bigr\}
= \sum_{i=2}^{n-1}\sum_{j=1}^{i-1} W_{ij} = \sum\sum_{1\le i<j\le n-1} W_{ij},$$
i.e. $\sum\sum_{1\le i<j\le n} W_{ij}$ is a martingale with respect to the sequence of sigma fields $\{\mathcal{S}_k,\ k \ge 1\}$. Now again we apply the martingale inequality (21.5) of Burkholder (1973) to the finitely indexed martingale $\{\sum\sum_{1\le i<j\le n} W_{ij}\}$ with
$$\phi(x) = x^{2m}, \qquad f^* = \sup_{k=2,\ldots,n}\Bigl|\sum_{i=2}^k\sum_{j=1}^{i-1} W_{ij}\Bigr|, \qquad d_k = \sum_{j=1}^{k-1} W_{kj} \text{ for } k \le n \text{ and zero otherwise,}$$
and $s(f) = (\sum_{k=2}^n E[d_k^2])^{1/2}$. This gives the required moment bound, where $C$ is a constant. Therefore finally we get
$$\sup_{h \in H_n} E\Bigl[n^{-2}\sum\sum_{1\le i<j\le n} W_{ij}\,/\,\mathrm{MISE}\,\bar\lambda_n\Bigr]^{2m} \le C_m n^{-\tau m},$$
as required. This completes the proof of (4.8) and of Lemma 4.1.
Proof of Lemma 4.2:
$$\sup_{h \in H_n} \mathrm{III} \le \Bigl\{\sup_{t \in [0,T]}\bigl|[1-F_n(t)]^{-1} - [1-F(t)]^{-1}\bigr|\Bigr\}^2 \times \sup_{h \in H_n} \int \bigl[(f^0(1-H))_n(x)\bigr]^2\,w(x)\,dx,$$
where $(f^0(1-H))_n(x)$ is the kernel density estimator of $f^0(x)(1-H(x))$ as defined in (4.5) of Chapter 2. Now note that the rate of strong uniform consistency of the empirical distribution function (see Serfling (1980)) is much faster than $\mathrm{MISE}\,\bar\lambda_n$. So the first factor on the right hand side of the last expression goes to zero as $n$ becomes large. Therefore, to complete the proof one needs only to show that the second factor is bounded almost surely, which follows from (5.5) and (5.7) of Marron and Padgett (1987). This completes the proof of Lemma 4.2.

Lemma 4.3 follows from the Schwarz inequality, Lemma 4.1 and Lemma 4.2.
3.4.2 Proof of Lemma 3.2

We will prove the lemma for the case $i = 2$; i.e. we need to prove
$$\sup_{h \in H_n,\ h':\,|h'-h| \le n^{-\eta}} |A_2(h') - A_2(h)| \longrightarrow 0 \quad\text{a.s.,}$$
where
$$A_2(h) = \frac{\mathrm{ISE}(\lambda^0_{n,2}(h)) - [a(nh)^{-1} + b_2]}{a(nh)^{-1} + b_2}.$$
To emphasize that $b_2$ depends on $h$ we will denote it by $b_h$, and $\sup_{h,h'}$ will denote the supremum over $h \in H_n,\ h': |h'-h| \le n^{-\eta}$. Now note that
$$A_2(h') - A_2(h) = \frac{\mathrm{ISE}(\lambda^0_{n,2}(h')) - \mathrm{ISE}(\lambda^0_{n,2}(h))}{a(nh')^{-1} + b_{h'}} + \frac{\mathrm{ISE}(\lambda^0_{n,2}(h))}{a(nh)^{-1} + b_h}\left\{\frac{a(nh)^{-1} + b_h - a(nh')^{-1} - b_{h'}}{a(nh')^{-1} + b_{h'}}\right\}.$$
So it is enough to prove that
$$\sup_{h,h'}\left|\frac{\mathrm{ISE}(\lambda^0_{n,2}(h')) - \mathrm{ISE}(\lambda^0_{n,2}(h))}{a(nh')^{-1} + b_{h'}}\right| \longrightarrow 0 \quad\text{a.s.} \tag{4.10}$$
and
$$\sup_{h,h'}\left|\frac{a(nh)^{-1} + b_h - a(nh')^{-1} - b_{h'}}{a(nh')^{-1} + b_{h'}}\right| \longrightarrow 0 \quad\text{a.s.} \tag{4.11}$$
To see (4.11), note that
$$|h^{-1} - h'^{-1}| = |(h'-h)/(hh')| \le n^{2(1-\delta)}|h'-h| \le n^{-2-2/\alpha} \quad\text{for } \eta \ge 4 - 2\delta + 2/\alpha,$$
and hence
$$\sup_{h,h'}\left|\frac{a(nh)^{-1} - a(nh')^{-1}}{a(nh')^{-1} + b_{h'}}\right| \longrightarrow 0. \tag{4.12}$$
Now
$$|b_h - b_{h'}| = \Bigl|\int (B_2^2(x,h) - B_2^2(x,h'))\,w(x)\,dx\Bigr| \le C\int |B_2(x,h) - B_2(x,h')|\,w(x)\,dx,$$
where $C$ is a generic constant, and
$$|B_2(x,h) - B_2(x,h')| = \Bigl|\int K(u)[\lambda^0(x-hu) - \lambda^0(x-h'u)]\,du\Bigr| \le C|h^2 - h'^2| \le C|h - h'| \le C\,n^{-\eta},$$
which gives
$$|b_h - b_{h'}| \le C\,n^{-\eta}. \tag{4.13}$$
Thus (4.12) and (4.13) now give (4.11).

For (4.10), write
$$\mathrm{ISE}(\lambda^0_{n,2}(h)) = R_1(h) - R_2(h) + \int \lambda^{0\,2}(x)w(x)\,dx, \tag{4.14}$$
where
$$R_1(h) = n^{-2}\sum_{i=1}^n\sum_{j=1}^n \int K_h(x-X_i)K_h(x-X_j)[1-F_n(X_i)]^{-1}[1-F_n(X_j)]^{-1} I[\Delta_i=1]I[\Delta_j=1]\,w(x)\,dx \tag{4.15}$$
and
$$R_2(h) = 2\sum_{i=1}^n n^{-1}\int K_h(x-X_i)[1-F_n(X_i)]^{-1}I[\Delta_i=1]\,[1-F^0(x)]^{-1}f^0(x)\,w(x)\,dx.$$
Now split $R_2(h) = R_2^*(h) + R_2^{**}(h)$, where
$$R_2^*(h) = 2\sum_{i=1}^n n^{-1}\int I[X_i < T+1]\,K_h(x-X_i)[1-F_n(X_i)]^{-1}I[\Delta_i=1]\,[1-F^0(x)]^{-1}f^0(x)\,w(x)\,dx$$
and
$$R_2^{**}(h) = 2\sum_{i=1}^n n^{-1}\int I[X_i \ge T+1]\,K_h(x-X_i)[1-F_n(X_i)]^{-1}I[\Delta_i=1]\,[1-F^0(x)]^{-1}f^0(x)\,w(x)\,dx.$$
But $I[X_i \ge T+1]\,K_h(x-X_i)\,w(x) \equiv 0$, as $w(x)$ is supported on $[0,T]$ and $K$ is compactly supported on $[-1,1]$; thus $R_2 = R_2^*$. Now to complete the proof we need to show that
$$\sup_{h,h'}\left|\frac{R_1(h') - R_1(h)}{a(nh')^{-1} + b_{h'}}\right| \longrightarrow 0 \quad\text{a.s.} \tag{4.16}$$
and
$$\sup_{h,h'}\left|\frac{R_2(h') - R_2(h)}{a(nh')^{-1} + b_{h'}}\right| \longrightarrow 0 \quad\text{a.s.} \tag{4.17}$$
For that, first note that
$$|K_{h'}(x-X_i) - K_h(x-X_i)| = \Bigl|h'^{-1}K\Bigl(\frac{x-X_i}{h'}\Bigr) - h^{-1}K\Bigl(\frac{x-X_i}{h'}\Bigr) + h^{-1}K\Bigl(\frac{x-X_i}{h'}\Bigr) - h^{-1}K\Bigl(\frac{x-X_i}{h}\Bigr)\Bigr|$$
$$\le C_1|h'^{-1} - h^{-1}| + C_2\,h^{-1}|h'^{-1} - h^{-1}|^\alpha \tag{4.18}$$
and
$$|K_{h'}(x-X_i)K_{h'}(x-X_j) - K_h(x-X_i)K_h(x-X_j)|$$
$$\le h'^{-1}K\Bigl(\frac{x-X_j}{h'}\Bigr)|K_{h'}(x-X_i) - K_h(x-X_i)| + h^{-1}K\Bigl(\frac{x-X_i}{h}\Bigr)|K_{h'}(x-X_j) - K_h(x-X_j)|$$
$$\le C_3\,h^{-1}|h'^{-1} - h^{-1}| + C_4\,h^{-2}|h'^{-1} - h^{-1}|^\alpha, \tag{4.19}$$
where the $C_i$'s are generic constants.

Also note that by the strong law of large numbers
$$V_n = n^{-1}\sum_{i=1}^n I[X_i < T+1]\,[1-F_n(X_i)]^{-1}\,I[\Delta_i = 1] \quad\text{is bounded a.s.,}$$
and $W = \int [1-F^0(x)]^{-1}f^0(x)\,w(x)\,dx$ is deterministic and bounded. So by (4.18),
$$\sup_{h,h'}\left|\frac{R_2(h') - R_2(h)}{a(nh')^{-1} + b_{h'}}\right| \le 2\,\sup_{h,h'}\Bigl\{\bigl[C_1|h'^{-1}-h^{-1}| + C_2 h^{-1}|h'^{-1}-h^{-1}|^\alpha\bigr]\big/\bigl[a(nh')^{-1} + b_{h'}\bigr]\Bigr\}\,V_n\,W.$$
Since $|h'^{-1}-h^{-1}| \le n^{-2-2/\alpha}$ and
$$h^{-1}|h'^{-1}-h^{-1}|^\alpha \le n^{1-\delta}\,n^{-2-2\alpha} = n^{-1-\delta-2\alpha},$$
we have
$$\sup_{h,h'}\Bigl\{\bigl[C_1|h'^{-1}-h^{-1}| + C_2 h^{-1}|h'^{-1}-h^{-1}|^\alpha\bigr]\big/\bigl[a(nh')^{-1} + b_{h'}\bigr]\Bigr\} \longrightarrow 0,$$
and as $V_n$ is bounded a.s. and $W$ is deterministic we get
$$\sup_{h,h'}\left|\frac{R_2(h') - R_2(h)}{a(nh')^{-1} + b_{h'}}\right| \longrightarrow 0 \quad\text{a.s.}$$
An exactly analogous argument, using (4.19), gives (4.16). This completes the proof of Lemma 3.2.
3.4.3 Proof of Theorem 3.4

In the following all sup's are taken over $H_n$. Here we give the proof for $\hat\lambda = \lambda^0_{n,2}$, and for simplicity we will write $\lambda_n(x)$ for $\lambda^0_{n,2}(x)$. Let $\mathrm{ISE}(\lambda_n, h^*) = \inf_h \mathrm{ISE}(\lambda_n, h)$, so that
$$\left|\frac{\mathrm{ISE}(\lambda_n, \hat h_c) - \mathrm{ISE}(\lambda_n, h^*)}{\mathrm{ISE}(\lambda_n, h^*)}\right|
= \left(\frac{\mathrm{ISE}(\lambda_n, \hat h_c) + \mathrm{ISE}(\lambda_n, h^*)}{\mathrm{ISE}(\lambda_n, h^*)}\right)\left|\frac{\mathrm{ISE}(\lambda_n, \hat h_c) - \mathrm{ISE}(\lambda_n, h^*)}{\mathrm{ISE}(\lambda_n, \hat h_c) + \mathrm{ISE}(\lambda_n, h^*)}\right|$$
$$\le C\left|\frac{\mathrm{ISE}(\lambda_n, \hat h_c) - \mathrm{ISE}(\lambda_n, h^*)}{\mathrm{ISE}(\lambda_n, \hat h_c) + \mathrm{ISE}(\lambda_n, h^*)}\right|,$$
where $C$ is a constant. Further, $\mathrm{CV}(h^*) - \mathrm{CV}(\hat h_c) \ge 0$; therefore
$$\frac{\mathrm{ISE}(\lambda_n, \hat h_c) - \mathrm{ISE}(\lambda_n, h^*)}{\mathrm{ISE}(\lambda_n, \hat h_c) + \mathrm{ISE}(\lambda_n, h^*)} \le \frac{\mathrm{ISE}(\lambda_n, \hat h_c) - \mathrm{ISE}(\lambda_n, h^*) + \mathrm{CV}(h^*) - \mathrm{CV}(\hat h_c)}{\mathrm{ISE}(\lambda_n, \hat h_c) + \mathrm{ISE}(\lambda_n, h^*)}. \tag{4.20}$$
By (3.7) and (4.20), the theorem will be true if we can show that
$$\sup_{h,h'}\bigl|\{\mathrm{CV}(h) - \mathrm{ISE}(\lambda_n, h) - [\mathrm{CV}(h') - \mathrm{ISE}(\lambda_n, h')]\}/\{\mathrm{MISE}(\lambda_n, h) + \mathrm{MISE}(\lambda_n, h')\}\bigr| \longrightarrow 0 \quad\text{a.s.} \tag{4.21}$$
Now (4.21) is a consequence of the following lemma.

Lemma 4.4:
$$\sup_h \left|\frac{\mathrm{CV}(h) - \mathrm{ISE}(\lambda_n, h) - T}{\mathrm{MISE}(\lambda_n, h)}\right| \longrightarrow 0 \quad\text{a.s.,} \tag{4.22}$$
where
$$T = -\int \lambda^{0\,2}(x)w(x)\,dx - 2R$$
and
$$R = n^{-1}\sum_{i=1}^n f^0(X_i)[1-F^0(X_i)]^{-2}[1-H_n(X_i)]^{-1}w(X_i)\,I[\Delta_i=1] - \int f^{0\,2}(x)[1-F^0(x)]^{-2}w(x)\,dx.$$
To see how (4.22) implies (4.21), first note that Lemma 4.4 essentially says that
$$\mathrm{CV}(h) = \mathrm{ISE}(\lambda_n, h) + T + o(\mathrm{MISE}(\lambda_n, h)).$$
Now adding and subtracting $T$ in the numerator of
$$\sup_{h,h'}\bigl|\{\mathrm{CV}(h) - \mathrm{ISE}(\lambda_n, h) - [\mathrm{CV}(h') - \mathrm{ISE}(\lambda_n, h')]\}/\{\mathrm{MISE}(\lambda_n, h) + \mathrm{MISE}(\lambda_n, h')\}\bigr|$$
gives (4.21).

Now we will prove Lemma 4.4, i.e. (4.22). The numerator of the term in (4.22) can be written as $n^{-1}(n-1)^{-1}\sum_{i=1}^n\sum_{j\ne i} t_{ij}$, where
$$t_{ij} = K_h(X_i - X_j)[1-F_n(X_j)]^{-1}[1-F_n(X_i)]^{-1} I[\Delta_i=1, \Delta_j=1]\,w(X_i)$$
$$\quad - \int K_h(x-X_j)[1-F_n(X_j)]^{-1}I[\Delta_j=1]\,f^0(x)[1-F^0(x)]^{-1}w(x)\,dx$$
$$\quad - f^0(X_i)[1-F^0(X_i)]^{-2}[1-H_n(X_i)]^{-1}I[\Delta_i=1]\,w(X_i)$$
$$\quad + \int f^{0\,2}(x)[1-F^0(x)]^{-2}w(x)\,dx. \tag{4.24}$$
Now write
$$t_{ij} = U_{ij} + Z_{1ij} + Z_{2ij} + Z_{3ij} + Z_{4ij}, \tag{4.25}$$
where
$$U_{ij} = K_h(X_i - X_j)[1-F(X_i)]^{-1}[1-F(X_j)]^{-1} I[\Delta_i=1, \Delta_j=1]\,w(X_i)$$
$$\quad - \int K_h(x-X_j)[1-F(X_j)]^{-1}I[\Delta_j=1]\,f^0(x)[1-F^0(x)]^{-1}w(x)\,dx$$
$$\quad - f^0(X_i)[1-F^0(X_i)]^{-2}[1-H(X_i)]^{-1}I[\Delta_i=1]\,w(X_i)$$
$$\quad + \int f^{0\,2}(x)[1-F^0(x)]^{-2}w(x)\,dx,$$
$$Z_{1ij} = K_h(X_i - X_j)[1-F_n(X_j)]^{-1}[1-F_n(X_i)]^{-1}[1-F(X_i)]^{-1} I[\Delta_i=1, \Delta_j=1]\,[F_n(X_i) - F(X_i)]\,w(X_i), \tag{4.26}$$
$$Z_{2ij} = K_h(X_i - X_j)[1-F_n(X_j)]^{-1}[1-F(X_i)]^{-1}[1-F(X_j)]^{-1} I[\Delta_i=1, \Delta_j=1]\,[F_n(X_j) - F(X_j)]\,w(X_i), \tag{4.27}$$
$$Z_{3ij} = \int K_h(x-X_j)[1-F(X_j)]^{-1}[1-F_n(X_j)]^{-1}I[\Delta_j=1]\,[F(X_j) - F_n(X_j)]\,f^0(x)[1-F^0(x)]^{-1}w(x)\,dx, \tag{4.28}$$
$$Z_{4ij} = f^0(X_i)[1-F^0(X_i)]^{-2}[1-H(X_i)]^{-1}[1-H_n(X_i)]^{-1} I[\Delta_i=1]\,w(X_i)\,[H(X_i) - H_n(X_i)]. \tag{4.29}$$
Now take $U_{ij} = V_i + W_{ij}$, where $V_i = E[U_{ij}|X_i]$ and $W_{ij} = U_{ij} - V_i$. So
$$V_i = \int K_h(X_i - y)[1-F(X_i)]^{-1}[1-F^0(y)]^{-1}I[\Delta_i=1]\,w(X_i)\,f^0(y)\,dy$$
$$\quad - \int\!\!\int K_h(x-y)[1-F^0(x)]^{-1}[1-F^0(y)]^{-1}f^0(x)f^0(y)\,w(x)\,dy\,dx$$
$$\quad - f^0(X_i)[1-F^0(X_i)]^{-2}[1-H(X_i)]^{-1}I[\Delta_i=1]\,w(X_i)$$
$$\quad + \int f^{0\,2}(x)[1-F^0(x)]^{-2}w(x)\,dx,$$
and $E[V_n] = 0$. Also
$$W_{ij} = K_h(X_i - X_j)[1-F(X_i)]^{-1}[1-F(X_j)]^{-1} I[\Delta_i=1, \Delta_j=1]\,w(X_i)$$
$$\quad - \int K_h(x-X_j)[1-F(X_j)]^{-1}I[\Delta_j=1]\,w(x)f^0(x)[1-F^0(x)]^{-1}\,dx$$
$$\quad - \int K_h(X_i - y)[1-F(X_i)]^{-1}I[\Delta_i=1]\,w(X_i)\,f^0(y)[1-F^0(y)]^{-1}\,dy$$
$$\quad + \int\!\!\int K_h(x-y)[1-F^0(x)]^{-1}[1-F^0(y)]^{-1}f^0(x)f^0(y)\,w(x)\,dy\,dx,$$
and $W_{ij} = W_{ji}$.

At this point note that $\{\sum_{i=1}^n V_i\}$ and $\{\sum\sum_{i>j} W_{ij}\}$ are martingales with respect to the sequence of sigma fields generated by $\{\mathcal{S}_k,\ k \ge 1\}$. So to complete the proof we need to show the following:
$$\sup_h \Bigl|n^{-1}\sum_{i=1}^n V_i\,/\,\mathrm{MISE}(\lambda_n, h)\Bigr| \longrightarrow 0 \quad\text{a.s.,} \tag{4.30}$$
$$\sup_h \Bigl|n^{-1}(n-1)^{-1}\sum_{i=1}^n\sum_{j\ne i} W_{ij}\,/\,\mathrm{MISE}(\lambda_n, h)\Bigr| \longrightarrow 0 \quad\text{a.s.,} \tag{4.31}$$
$$\sup_h \Bigl|n^{-1}(n-1)^{-1}\sum_{i=1}^n\sum_{j\ne i} Z_{rij}\,/\,\mathrm{MISE}(\lambda_n, h)\Bigr| \longrightarrow 0 \quad\text{a.s.,} \qquad r = 1, 2, 3, 4. \tag{4.32}$$
Since $\{\sum_i V_i\}$ and $\{\sum\sum_{i>j} W_{ij}\}$ are martingales, (4.30) and (4.31) will follow from an argument similar to that used in Theorem 3.3. To see (4.32) for $r = 1$, note that $Z_{1ij} = Z_{1ij}^* + Z_{1ij}^{**}$, where
$$Z_{1ij}^* = I[X_j < T+1]\,K_h(X_i-X_j)[1-F_n(X_j)]^{-1}[1-F_n(X_i)]^{-1}[1-F(X_i)]^{-1} I[\Delta_i=1, \Delta_j=1]\,[F_n(X_i)-F(X_i)]\,w(X_i)$$
and
$$Z_{1ij}^{**} = I[X_j \ge T+1]\,K_h(X_i-X_j)[1-F_n(X_j)]^{-1}[1-F_n(X_i)]^{-1}[1-F(X_i)]^{-1} I[\Delta_i=1, \Delta_j=1]\,[F_n(X_i)-F(X_i)]\,w(X_i).$$
But as $w$ is supported on $[0,T]$ and $K$ is compactly supported on $[-1,1]$,
$$I[X_j \ge T+1]\,K_h(X_i-X_j)\,w(X_i) \equiv 0, \quad\text{and hence } Z_{1ij}^{**} \equiv 0.$$
So to prove (4.32) for $r = 1$ it is enough to prove that
$$\sup_h \Bigl|n^{-1}(n-1)^{-1}\sum_{i=1}^n\sum_{j\ne i} Z_{1ij}^*\,/\,\mathrm{MISE}(\lambda_n, h)\Bigr| \longrightarrow 0 \quad\text{a.s.} \tag{4.33}$$
Observe that
$$I[X_j < T+1]\,K_h(X_i-X_j)[1-F_n(X_j)]^{-1}[1-F_n(X_i)]^{-1}[1-F(X_i)]^{-1} I[\Delta_j=1]\,w(X_i)$$
is bounded almost surely. Therefore the strong convergence of $\sup_{x\in[0,T]}|F_n(x)-F(x)|$ to zero gives (4.33). Using a similar argument we get (4.32) for $r = 2, 3, 4$, which completes the proof of Lemma 4.4. Hence the proof of Theorem 3.4 is completed.
aIAPTER 4
CENTRAL LIMIT THEOREM
FOR THE INTEGRATED SQUARED ERROR
EVALUATED AT A FIXED SEQUENCE OF BANDWIrrIlIS
4.1 Introduction
In
this
chapter
our
main aim
is
to
derive
the
asymptotic
distribution of ISE of the density estimator from the randomly censored
samples and of the hazard rate estimator. The study of this asymptotic
distribution is important for the following two reasons. (1) Results of
the last chapter about the hazard rate estimator involve the assumption
that
the integrated squared error is somehow "close"
integrated squared error.
to
the mean
Same is true of the resul ts on censored
density estimator of Marron and Padgett (1987). The limit distribution
of an integrated squared error will provide an explicit description of
the order of this closeness. (2) We have seen in the last chapter that
A
cross-validation "works" in the sense that h
c
is asymptotically optimal
for hazard rate estimation and Marron and Padgett (1987) have shown the
same for the censored density estimation. Now the interest will be to
A
see how well it works by studying the asymptotic distribution of h
c
and
A
of integrated squared error evaluated at h . This study is the main
c
theme of the next chapter where it will be seen that. as a first step,
one needs
the asymptotic distribution of
integrated squared error
evaluated at a fixed sequence of bandwidths h=h(n) ---+ 0 as n ---+
co.
To derive the asymptotic distribution of ISE, our main focus will
be on proving central
function estimator,
limit
theorem for
ISE for
in censored and uncensored
censored density estimator.
In each of
these
the hazard rate
settings,
cases
we
and
for
have
two
different estimators, the so called natural and plug-in estimators. To
avoid
the
repetitious
argument
we
consider
the
ISE
for
natural
estimators but the results can be adapted to the plug-in estimators
that we have defined.
Our aim is to give unified proof for all the cases discussed above
which calls for additional notation. In section 4.2 we introduce these
and state the central limit theorem for ISE and also identify the main
steps of its proof. The proof of the result is given in section 4.3
through a sequence of lemmas which are proved in section 4.4.
4.2 Central Limit Theorem for ISE
Assume the censored settings as described in Chapter 2. A general
formulation of the target function in all of our example is
~(x)
~
(x) of
n
(1-H~(l~f (x) , for Q(x»O
1 n
(x) = - };
n
n 1=1
~
O~Q(x)~1, x~
(2.1)
Define
~(x)
~
where
where
o
=
and Q(x) is a non-increasing function such that
the estimators of
~(x)
(2.2a)
Is an estimator of Q such that
~
converges to Q at the rate
of n-a where a>215. Also define
(2.2b)
56
a
and note that I~n {x)-nn (x) I = 0p (n- ).
Remark 2.1.:{i) If Q{x)=1-F{x) and
~(x)=1-Fn{x)
then we have the case
of hazard rate estimation in the censored setting with ~(X)=AO{X) and
~n (x)=X°n. 2{x).
~(x)=1-Hn{x)
(ii) If Q{x)=1-H{x) and
then we have the case of density
estimation in the censored setting with ~(x)=fo{x) and ~ (x)=f o 2{x).
n
n.
Remark 2.2.:If the censoring random variable has all its mass at
~
then
Ai =1 with probability one. H{x)=O for ~ • FO{x)=F{x) and fO{x)=f{x).
~(x)=1-Fn{x)
(i) If Q{x)=1-F{x) and
we have the case of hazard rate
estimation in the uncensored setting with
Also
note
that
(ii)
if
we
~(X)=A{X)
take Q{x)=1=Gri{x)
and
~n{x)=An.2{x).
we get
the
usual
probability density and its kernel based estimator.
The weighted integrated square error of the estimator
~
n
is
ISE{~n{x» = J {~n{x) - ~(x)}2w{x)dx
where as before w{x) is a non-negative weight function and wi 11 be
defined later. Now we will decompose ISE into simple terms and analyze
them individually.
ISE{~n{x» = J {~n{x) - ~n{x)
+
~n{x) - ~{x)}2w{x)dx
= J{~n{x)-~{x)}2w{x)dx + J{~n{x)-nn{x)}2w{x)dx
+ 2J{~n (x)-~n {x)}{~n (x)~{x)}w{x)dx
= ISE{~n{x)
+
J{~n{x)-nn{x)}2w{x)dx
+ 2J{~n (x)-~n {x)}{~n {x)-~{x)}w{x)dx
and further
-
ISE{~
n (x»
=
-
-
J{~ {x)-E[~
n
2
n (x)]} w{x)dx
57
+ 2I{~n {x)-E[~n {x)]}{E[~n (x)]-~{x)}w{x)dx
2
+ I{E[~ (x)]~{x)} w{x)dx
n
i.e.
where
lSE{~n{x»
= In + 211 n + Ill n + IVn + Vn .
I n = I{~n {x)-E[~n {x)]}2w{x)dx
(2.3)
II n = I{~n {x)-E[~n {x)]}{E[~n {x)]~{x)}w{x)dx
IIIn = I{E[~n {x)]~{x)}2w{x)dx
IVn = I{~n{x)~n{x)}2w{x)dx
Vn = 2I{~n (x)~n {x)}{~n {x)~{x)}w{x)dx.
Now to prove the central limit theorem we will analyze the above five
terms individually.
First consider
I n = I{~n {x)-E[~n (x)]}2w{x)dx = 1*
n + 1**
n
where
.
(2.4)
*
-2
~(x-Xi)
~(x-Xi)
In = 2{n) 1~i(~~nI{ Q{X ) I[A =1] - E[ Q{X ) I[Ai=l]] }
i
i
i
~(x-Xj)
.
~(x-Xj)
x {
Q{X ) I[Aj=l] - E[ Q{X ) I[Aj=l]] }w{x)dx.
j
j
**
-2 n
~(x-Xi)
~(x-Xi)
2
I
In = n i:1 { Q{X ) I[Ai=l] - E[ Q{X ) I[Ai=l]] } w{x)dx
i
i
**
The term I n . being a sum of independent random variables. is very
easily described by a central limit theorem. while the term I n* equals
2{n)-2 times the centered. degenerate U-statistic whose variable kernel
function is given by
which later on we will denote cpn{X l ,X2 ). So we will use the central
limit theorem for degenerate U-statistics with variable kernel to treat
the term 1*. The term II
n
n
can be written as a sum of independent and
58
identically distributed random variables and so is readily described by
the
ordinary
limit
central
theorem.
The
term
III
n
is
purely
deterministic in character and so can be analyzed by routine analytic
methods. The terms IV and V will be shown to be negligible.
n
n
Assumptions:
B.1. K is a bounded and non-negative function on
j'K{u)du=l •
B.2. f.
~
fuK{u)du=O
~
satisfying
fu~{u)du=2k.
and
and their second order derivatives are bounded and uniformly
continuous on
~.
B.3. w{x)=I[O.T] where T=sup{x;
Q{x»~}
~>o.
for
Define
~n = IE{~n{x)~{x)}2w{x)dx
.
a
n
-
2
-
-
2
= f{E~n (x)-~{x)} w{x)dx + IE{~n (x)-E~n (x)} w{x)dx.
=
nh 1/ 2
if
nh5
J
01)
n9 / 10
if
nh5
J
c.
nl/~-2
if
nh5
J
O.
(2.5)
O<c<OI)
(2.6)
An= ISE{~n (x»-~n
and
An = lSE{~n (x»-~n .
Theorem 2.1 :Assuming B.I-B.3 are true and that h
n
~
01)
•
0 and nh
~
01)
as
we have
5
if nh - - .
2ko Z
2
A
(i) a nn
~
L
(4k2c4/5u~ + 2c-1/5u~)1/2z
2 1/2
01)
5
if nh - - . c. O<c<OI)
5
if nh - - . O.
u3Z
59
(2.7)
and
(ii) n9/ 10A
n
L) (4k2c4/5a~ + 2c-1/5a~)1/2z if nh5~ c. O<c<oo (2.8)
where
2
a2
= {In" 2 (x)
a~
=
and Z is
n{x)
Q{xj
2
W
2
(x)dx - [In"(x)n(x)w(x)dx] }
(2.9)
[I( gf~} w(x»2dx ][I(fK(z)K(z+u)dz)2du ]
(2.10)
a N(O.l) random variable.
Remark 2.3 :(i) We may replace ~n by
*
2
-1 _2
n{x)
~n = I{Enn(x)-n(x)} w(x)dx + (nh) [JK-(u)du][f Q{xjw(x)dx]
without affecting the asymptotics.
(ii) The choice of weight function in B.3. and B.2 gives us that
In,,2(x)w(x)dx <
a,consequence of which is that
2
I{Enn (x)-n(x)} w(x)dx
00
2
= h~2
KIn" (x)w(x)dx
4
+ o(h ).
Therefore
2
-1 _2
n{x)
I{nn(x)-n(x)} w(x)dx = (nh) [JK-(u)du][f Q{xjw(x)dx]
~
2
2
+ h KIn" (x)w(x)dx +
0
p
[(nh)
-1
4
+ h ]
(2.11)
which can be interpreted as a weak law of large numbers corresponding
to theorem 2.1 and
I{nn(x)-n(x)}2w(x)dx
= (nh)-l[~(u)du][f gf~~w(x)dx]
+
0
p
+ h~2In,,2(x)w(x)dx
[(nh)-l+ h4 ] + 0 [n- 19/ 20 ].
(2.12)
P
The following corollaries are immediate consequences of the above
theorem.
Corollary 2.1 :Under the conditions of the theorem 2.1
1 1
o
(i) a [ISE(X 2(x) - (an- h- + b)]
n
n.
60
S
2ko Z
4
L )
{4k2c4/Sa~ + 2c-1/Sa~} 1I2z
21/2aSz
if nh ---+
CD
if nhS---+
c. O<C<CD
if nhS---+
O.
and
{ii} n9/ 10[ISE{XO {x} - {an- 1h- 1+ b}]
n.2
L ) {4k2c4/Sa~ + 2c-1/Sa~} 1/2z
where a
b
if nhS---+
c. O<C<CD
= [~(u}du]JAo{x}[I-F{x}]-lw{x}dx
= I{JK{u}[Xo{x-hu}-Xo{x}]du}2w{x}dx
2
a4 = {I{X
2
as
0"
2 XO{x) 2
0"
2
(x)] I-F{x) w (x}dx - [JA (x)X{x}w(x}dx]}
XO{x)
= [I{I-F{x)
2
2
w{x}} dx][I(JK{z}K{z+u}dz} du]
and Z is a N(O.I} random variable.
Corollary 2.2 :Under the conditions of the theorem 2.1
-1 -1
-0
(i) a n [ISE(f n . 2 {x} - (an h + b)]
2ka6Z
(4k2c4/Sa~ + 2c-1/Sa~) 1I2z
L
if nhS---+
CD
if nhS---+
c. O<C<CO
if nhS---+ O.
21/2ar-
and
{ii} n9/ 10[ISE{fo {x} - {an- 1h- 1+ b}]
n.2
L ) (4k2c4/Sa~ + 2c -1/Sa~) 1I2z
if nhS---+
c. O<C<CO
= [~(u}du]Ifo(x}[I-H(x}]-lw(x}dx
where a
b = I{JK{u}[fo{x-hu}-fo{x}]du}2w{x}dx
2
a 6 = {I[f
2
a7
0"
2 fO{x) 2
0"
°
2
(x)] I-H{x) w {x}dx - [If {x}f (x}w(x}dx] }
fO{x)
= [I{I-H{x)
2
2
w{x}} dx][I{JK{z}K{z+u}dz} du]
and Z is a N(O.I} random variable.
61
Corollary 2.3 :Under the conditions of the theorem 2.1
{i} a [ISE{~ 2{x} - {an- 1h- 1+ b}]
n
n.
L
I
2ka Z
if nh5~ CXI
S
{4k2c 4/5a~ + 2c -1/5a~} 1/2z if nh5~ c. O<C<CXI
if nh5~ O.
2 1/2a9Z
and
{ii} n9 / 10[ISE{X {x} - {an- 1h- 1+ b}]
n.2
L I {4k2c4/5a~ + 2c -1/5a~} 1/2z if nh5~ c. O<C<CXI
where a = [~{u}du]SX{x}[l-F{x}]-lw{x}dx
b = I{fK{u}[X{x-hu}-X{x}]du}2w{x}dx
X(x} 2
2
as2 = {SX" 2 {x}l_F{x}W'
{x}dx - [SX"{x}X{x}w{x}dx] }
2
X(x}
2
2
a9 = [I{l_F{x}w{x}} dx][I{fK{z}K{z+u}dz} du]
and Z is a N{O.l} random variable.
4.3 Proof of the Theorem
Let
I
**
-2 n
= {nh}
~ Z 1i
n
i=l n
x-X
x-X
K{ _ i }
h
Zn1i = I{
were
K{ _ i }
h } I [A =l] - E[
Q{X
i
i
h
I=l]
] }2w{x)dx
} [A
Q{X
i
i
2
**
-2 n
-1 -2Let a l=E[I ] = {nh}
~ E[Z 1i] = n h -c[Z 1i]
n
n
i=l
n
n
By lemma 4.1 of the section 4.4
2
-1 J-.2
n(x-hu}
a n1 = {nh} [JK-{U} Q{x-hu} w{x}dudx
-
hIIfK{u}K{v}~{x-hu}~{x-hv}w{x}dvdudx].
Again by lemma 4.1 .
2
2
2
E[Zn1i] = O{h } and Var{Zn1i} = O{h }
**
-4
-::t -2
which implies
Var{I n } = {nh} nVar{Zn1i} = O{n -h }.
62
Therefore.
(3.1)
Now consider
IIn = 2f{~n (x)-E[~n (x)]}{E[~n (x)]-n(x)}w(x)dx
= 2(nh)
-1 n
1 Zn2i
i=1
where
n
=
~
-2 -4
4
s
1 E[Zn2.]
n 1=
. 1
Now by lemma 4.2 of the section 4.4.
This implies that
s
-1 n
1 Zn2.
n i=1
}
(3.2)
1
-...,;;L~,
s
-4 n
4
1 E[Zn2i]
n 1=
. 1
N(0. 1)
as n
-+
----+,
0
as n
-+
CD
CD
and therefore.
211
where Nn2
= s (nh)
n
n
L
---+,
N(O.I)
-1
(s
-1 n
1 Zn2i) = un2 Nn2 •
n i=1
as n
-+
CD.
63
(3.3)
2
and
-1
0n2 = 4n
2 2
4.-
(3.4)
h K 02 .
Write
where
~n(X1'~} = !A (u,X }A (u,X }w(u}du
1 n
n
2
u-X
u-X
K( h i }
and
An(u,X i } =
K( h i }
Q(X } I[A =1] - E
i
i
Q(X } I[A =1]
i
i
for i=1,2.
Gn (X 1 ,X2 } = E{~n(~,X1}~n(~,X2}IX1,X2}.
Define
Now notice that
(i)
~
n
is symmetric by definition,
and by lemma 4.3, proved in next section, we have the following
So the conditions of the theorem 1 of the Hall (1984) are satisfied.
Hence
}; };
~
(Xi ,X )
1~i<Hn n
2
2
variance (n
j
/2)E~n(X1,X2}.
*
In
where
N
n3
is asymptotically normal wi th mean zero and
Therefore,
= 0n3
L ) N(O,1} as n
Nn3
--+
(3.5)
00,
and
-4 2
2
-~-1 2
2
0n3 = 4(nh} (n /2)E~n(X1,X2} = 2n n 03
(3.6)
So the two terms of the ISE decomposi tion (2.3) have asymptotically
64
normal distributions. Therefore to derive the asymptotic distribution
of ISE now we need to study the joint distribution of these two terms.
Towards that end we have by lemma 4.4 that II
*
n and I n are uncorrelated.
Define. for any real numbers a and b
M
n
= a[V(II n }]-1/2II n
Now lemma 4.5 gives that. M
n
+ b[Var(I*}]-1/2 I * .
n
n
is asymptotically normal wi th mean zero
and variance a 2+b2 . So by the Cramer-Wold device II
asymptotically
independent
and
n
distributed.
normally
and 1* are
n
Therefore
asymptotically.
*
2
2 1/~L
= [un2 + u n3 ] ~n4
IIn+ In
where
L
N
n4
) N(O.I}
as n
5
and in the case where nb
u
2
n2
+ u
(3.1)
) 00,
) c , O<c<oo.
2
-9/5
2 4/5 2
-1/5 2
= n
(4k c
u + 2c
u )
n3
2
3
(3.8)
So by (3.1). (3.3). and (3.5) for sufficiently large n.
A = 2n-l/~~2 N + 21/2n-lh-l/2u3 N + Op(n-3/~-I}.
n
n2
n3
Therefore
n
1/2
h
-2 -
An
L
-;;;;""'-+1
5
2ka Z2 if nb
2
- - - + 1 00
and
nb 1/ 2 An
if nb5
L) 21/2u3 Z3
) O.
where Z2 and Z3 are N(O.I} variables.
By (3.1). (3.3). (3.5) and (3.1)
An
= [u~
+ u~]1/2 N + Op(n-3/~-I}
n4
Therefore by (3.8)
9
n / 10"ian
L ) (4k2 C4/5u 2 + 2c -1/5u 2}1/2L
2
3
1
. f nh 5
L
---+1
c. 0< c<oo
where ZI is a N(O,I} random variable. which completes the proof of the
first part of the theorem.
65
For the second part note that
An = An+ IVn+ Vn
9/10
So it is enough to show that n
(IVn + Vn )
P
J
o.
Now IVn = I{~n (x)~n (x)}2w{x)dx
-1 n
1
1
2
= I{n i:1 {X-X i )I[A =1][ ~(Xi)
Q{X )]} w{x)dx
i
i
1
1
2 -1 n
2
= {X~[~,T][~~~{X~) - Q{x)]} I{n i:1 {X-X i )I[A =1]} w{x)dx
i
1
(3.9)
= Op{n- )
Kb
Kb
and by the Schwarz inequality
Vn = 2I{~n (x)-~n {x)}{~n (x)~{x)}w{x)dx
~ 2 [I{~n{X)-~n{X)}2W{x)dxI{~n{x)-~{X)}2W{X)dx]1/2
= 2[IVnxlSE{~n{x»]1/2.
By first part of the theorem lSE{~n (x»
=
° (n-9/ 10 ) and by (3.9)
p
IV = Open-1), which implies that V ~ Open-19/20).
n
n
Hence
where Z is a N{O,l) random variable.
4.4 Lemmas
For the following lemmas we assume the conditions B.1-B.3.
Lemma 4.1 :
_2{ )
~(x-hz)
E[ Zn1i] = h SJK z Q{x-hz) w{x)dzdx
- h2IIIK{u)K{z)~{x-hu)~{x-hz)w{x)dzdudx.
(4.1)
and
2
E[Zn 1.]
1
as h
J
= O{h2 )
(4.2)
O.
66
Proof
So
Now
x-Xi
K{ h )
(E[ Q{X ) I[Ai=I]])
i
2
= (h1K{z)~{x-hz)dz)
2
= h2IJK{u)K{v)~{x-hu)~{x~hv)dudv .
and
x-X
i)
K{x-u)
h
2
h
2
0
. _2
n(x-hz)
E[ Q{X ) I[Ai=I]] = I[ Q{u) ] {I-H{u»f (u)du = hJK-{z) Q{x-hv) dz
i
K{
So
x-X
K{ hi)
2
,._2
n(x-hz)
lE[ Q{X ) I[Ai=I]] w(x}dx = I~JK-(z) Q{x-hz) w(x)dzdx.
i
L
(4.3)
and
) n(x-hz)
E[Znli ] = L,._2
I~JK-{Z Q{x-hz) w{x)dzdx
- h2 IIJK{u)K{v)~{x-hu)~{x-hv)w{x)dudvdx.
as reqUired.
Now
.
e
61
Now for any
r~I
and
s~I.
x-x
y-X
Kr ( _1)
KS( _1)
I.fE[
h I
h ]dxd
y
Qr(X ) [AI=I] QS(X )
I
I
= h2II[J
n(x-hu) Kr(u)Ks(u+v)du]dvdx
Qr+s-I(x_hu)
2
= O(h ).
(4.5)
By (4.3). (4.4) and (4.5)
68
E[Zn 1.]
I
2
~
2
O(h }.
which completes the proof of lemma 4.1.
Lemma 4.2 :
and
as h
)
o.
Proof :Clearly. E[Zn2i] = o.
Zn2i = Yn2i-E[Yn2i]'
x-X
Note that
K( h i }
_
~n(x}
where Yn2i = J
Q(X } I[A =l][E
i
i
Define. t~j}=E[Y~i] for positive j.
x-X.I
(1)
K(
h
)
- ~(x}]w(x}dx.
-
Q(X } I[A =l][E ~n(x} - ~(x}]w(x}dx.
i
i
But E Tjn(x) - ~(x) = f[~(ul-~(x}]~(x-u}du = h~"(x} + o(h2 }.
x-X
So t n
= EJ
K( h i }
Q(X } I[A =l] = hfK(z}~(x-hz}dz.
i
i
Therefore. t(l}
= hf[.fK(z}~(x-hz}dz]h~"(x}w(x}dx + o(h3 }
n
and E
= h~~"(x}[~(x}+o(h}]w(x}dx + o(h3 }
= h~~"(x}~(x}w(x}dx + o(h3 }.
x-Xi
(2)
__2
K( h
}
2
t n = E[r-n2i ] = E{f Q(X } I[A =l][E ~n(x} - ~(x}]w(x}dx}
i
i
x-X
K( h i }
=
Eff
Q(X } I[A =l]
i
i
u-X
K( h i }
_
Q(X } I[A =l][E ~n(x} - ~(x)]
i
i
x [E Tj (u) - ~(u}]w(x}w(u}dxdu
n
69
.
e
x-x
u-X
K( h i )
=IJE[
K( h i )
_
Q(X ) ][E ~n(x) - ~(x)]
i
x [E ~ (u) - ~(u)]w(x)w(u)dxdu
n
Q(X ) I[Ai=l]
i
u-x
~(x-hz)
= II{h.fK(z)K(~z) Q(x-hz) dz}
x [h~"(x)+0(h2)][h~"(u)+0(h2)]w(x)w(u)dxdu.
u-x = u, x = x and z = z we get
Now by substituting h
t~2)= h~2II~"(x)~"(x+hu)w(x)w(x+hu)IK(z)K(u+z)g~~=~~ dzdudx + 0(h6 )
6
= h~2I~..2(x)w2(x) g~~~ dx + 0(h ).
For any
t(k)
n
k~l
~ II ... I{ ~ IE[~ (x(j»]_~(x(j»I}dx(l)dx(2)
j=l
n
... dx(k)
.
k K(
x I[j~l
.
k K(
~ Ckh2kII... Idx(l)dx(2) ... dx(k)I[j~l
i.e
So
x(jLz
x(jLz
h)
Q(z)
]f(z)dz
)
Q(~) ]f(z)dz.
t~k) ~ q:h3k .
E[Z~i] = t~2) - (t~l»2
= h~2I~ ..2(x)w2(x) g~~~dx - h6k2[I~"(x)~(x)w(x)dx]2 + 0(h6 )
'" h6k2{I~ ..2(x)w2(x) g~~~ dx - [I~"(x)~(x)w(x)dx]2},
which completes the proof of (4.7).
And finally,
E[Z4 ] = t(4) _ 4t(3)t(l) + 6t(2)(t(l»2 _ 3(t(l»4
n2i
n
n n
n
n
n
l2
9
3
6
3
= O(h ) _ O(h )O(h ) + O(h )(O(h »2 _ (O(h3 »4
'" O(h l2 ).
which completes the proof of lemma 4.2.
Lemma 4.3 :
(4.9)
70
2
E[~n(X1'~)] ~
3
T}(x}
2
2
h [I( Q{x} w(x» dx][I(fK(z)K(z+u)dz) du],
4
E[~n(X1,X2)]
5
= O(h ),
(4.10)
(4.11)
and
(4.12)
ash
)0.
But
Therefore,
) v-u ) T}(u-hz}
E[ ~n2( X1'~ )] = JI{~rv
11,Jn.(Z K(~z Q(u-hz) dz
- [hfK(z)T}(u-hz)dz][hfK(z)T}(v-hz)dz]}2w(u)w(v)dudvo
Now b y
·
.
v-u
d u = u we get
sub stltUtlng
~ = x, z = z an
2
3
fK
D(u-hz}
E[ ~n(X1'~)] = h SI { (z)K(x+z) Q(u-hz) dz
- h[fK(z)T}(u-hz)dz][fK(z)T}(u+h(x-z»dz]}2w(u)w(u+hx)dudx
~ h3II{fK(z)K(x+z) ij~~=~~} dz}2w(u)w(u+hx)dudx
~
D( u-hz} ) 2W2 (u)du}{ I [ fK (z)K(z+x)dz] 2c:1~:},
h3 {I ( Q(u-hz)
by using the Taylor's expansion. That completes the proof of (4.10).
To prove (4011) consider
4
E[~ (X ,X2 )] =
n 1
EIIII[ 4IT A (u (i) ,X 1)A (u (i) ,X2 )] 4IT w(u (i) )du (0)
1
i=1 n
n
i=1
71
•
e
Now the integrand may be expanded into several terms and each of these
term can be shown to be of order h5 . Here we will illustrate the
To prove (4.l2) define
Bn (x,y) = E[An (x,Xl)An (y,X l )]
x-X
K{
= E[
h..1)
y-X
K{ h I )
Q{X ) I[Al=l]
l
Q{X ) I[Al=l]]
l
x-X
K{ h I )
- E[
y-X
K{ h I )
Q{X ) I[Al=l]]E[ Q{X ) I[Al=l]]' (4.l3)
l
l
Now
Gn {X l ,X2 ) = E{~n{~,Xl)~n{~,X2)IXl,X2}
= IIAn{u,Xl)An{v,X2)E[An{u,~)An{v,X3)]w{u)w{v)dudv
=
IIAn {u,Xl)An {v,X2 )Bn {u,v)w{u)w{v)dudv.
Therefore,
72
and
~(X1'~) =
JJJJBn (u(l),V(l»
B (U(2),V(2»B (U(1),u(2»
n
n
x B (v(1),v(2»
n
~
W(U(i»W(V(i»du(i)dv(i).
i=l
Now using (4.13), last integral may be expanded into several terms,
each of which is of order h7 . For illustration we treat only the first
such term,
T
n2
=
JJJJDn (u(1),u(2),v(1),v(2» ~
w(u(i»w(v(i»du(i)dv(i)
i=l
1) (1)
(2) u(2)_u(I)
(3) v(2)_u(1)
and subs ti tu ting x ( =u
,x =
h
,x = """"""--:-h-+
T
v(l)_u(l)
(4) v(2)_u(1)
h
' and x
=
h
we get
n2
= h3JJJJD (x(1),x(I)+x(2)h,x(1)+x(3)h-x(4)h,x(I)-x(4)h)w(xCI»~
n
x w(x(1)+x(2)h)w(x(1)+x(3)h-x(4)h)w(x(I)_x(4)h) ; dx(i)
i=l
where
73
•
v(l)_u(l)
n(u(l)-hz)
h
IZ)
(1)
dz]
Q(u -hz)
u(2)_v(2)
n(v(2)-hz)
x [IK(z)K(
h
I z)
(2)
dz].
Q(v -hz)
x
[IK(z)K(
Therefore.
(1)
T = h 1 IIII[IK(z)K(z+x(2»
n2
n(x -hz) dz]
Q(x(l)-hz)
(3) n(X(1)-x(4)h-hz)
x [IK(z)K(z+x
)
(1) (4)
dz]
Q(x -x h-hz)
( 3) -x (4) ) n(x(l)-hz)
x [IK(z)K(z+x
1
dz]
Q(x( L hz )
(1)
x
(4)
4.
[IK(z)K(z+x(2)+x(4» n(x 1 -x 4 h-hz) dz] rr dx(l)
Q(x( )-x( )h-hz)
i=l
= O(h1 ).
Lenuna 4.4
Proof
I *n and IIn are uncorrelated.
Note that by (3.4)
*
~d
n
Cov(II .1 ) = ~
n n k=l
~
(4.3).
~
Cov(Zn2k'~
l~i<j~n
n
(Xi.X j »
n
=k:1 l~i<~~nE(Zn2k~n(Xi,Xj».
E(Zn2k~n(Xi,Xj»
If k#i. k#j then.
and
if
=
E(Zn2k)E(~n(Xi,Xj»
= O.
k=i. k#j then.
E(Zn2k~n(Xi,Xj» = E[E(Zn2k~n(Xi,Xj)IXi)] = E[Zn2kE(~n(Xi,Xj)IXi)] = O.
and similarly for k#i. k=j
E(Zn2k~n(Xi,Xj»
= O.
i.e. II n and I *n are uncorrelated.
Lenuna 4.5
Mn
L
--=~)
N(O. a 2+b2 ) as n
n
Proof
Let M= ~ Y i
n i=l n
14
---+
00.
where
-1/2
-1
* -1/2
-2
Yni = a[V(II n )]
(nh) Zn2i + 2b[Var(In )]
(rib)
Vni •
i-I
where V i= ! ~ (Xi.X j ).
n
j=1 n
(4.14)
Now by the corollary 3.9 of Hall and Heyde (1980). we only need to
check the following two conditions.
(4.15)
and
(4.16)
as n
-.00
~k
where
=
0
=0
{(Xl.Al).(~.A2).···.(~.Ak)}
for k~n
{(Xl.Al).(X2.A2) •...• (Xn.An)} for
k~n.
Now
E[~il~i-l] = a2[Var(IIn)]-I(nh)-~~i
*
2
+ 4b [Var(In )]
-1
-4- _..2
I
(nh) lE[~i ~i-l]
as E[Zn2iVnil~i-l] =0 by lemma 4.4.
Further.
a 2 2
a 2 -4.. -2 2
2_
= h-k [nh K 0n2/4] = nh-Var(II n ).
n
n
2
2
! E[y2il~i_l] = a + 4b [Var(I*)]-I(nh)-4 ! E[y2il~'-I]'
i=1
n
n
i=2 n
1
2
EZn2i = h-k
So
O
2
Therefore to prove (4.16) it is enough to show that as n
*
4[Var(I)]
-1
n
(nh)
-4
_..2
p
! E[V- I~i-l] ~---+) 1 .
i
. 2
n
1=
n
Also to prove (4.15) it is enough to show that as n
I1
~
2
i:1E[YniI[IYnil>~]] ------~) 0 .
Now notice that
75
00.
(4.17)
~OO.
(4.18)
y2 I
ni [IYnil>e]
* -112 (nh) -~-2
Zn2i + 2b[Var(I n )]
-Vni ] I[IYnil>e]'
2
and that la+~12I[la+~I>e] ~ 4a I[lal>e/2] + ~I[I~I>e/2]'
= [a[V(II n )]
-1/2
(nh)
-1
So to prove (4.18) we will prove that as n
~~,
(4.19)
and
(4.20)
Lenuna 4.2 and the exact similar argument used to get (3.2) gives
(4.19). By theorem 1 of Hall (1984) the sufficient condi tions for
(4.17) and (4.20) to be true are
(i)
~
n
is sYmmetric ,
(ii) E[~n(Xl.~)IX1] = E[~n(Xl.~)I~]
(iii)
(iv)
2
E[~n(Xl'~)]
=0
•
< ~ for each n • and
{~(Xl'~) + n-1E~:(Xl,X2)}
2
[E~n(Xl'~)]
2
' 0 as
n -.
~
.
By lenuna 4.3 these condi tions are satisfied. Hence the proof of the
lenuna 4.5 is complete.
76
aIAPTER 5
,.,
roMPARATIVE STUDY OF BANDWIurns h
,.,
AND h WITH h AS REFERENCE
coo
5.1 Introduction
As
noted
earlier
the
bandwidth
h,
the
o
minimizer
of
mean
integrated squared error, is best in an average sense over all possible
data sets and the random bandwidth h,
o
the minimizer of integrated
squared error, is best for the data set at hand. Which one of these is
to be considered as a benchmark is a debatable issue. But thinking only
about the data set at hand, h
,.,
o
seems to be a more reasonable benchmark.
So if h is a bandwidth obtained by any rational methodology, then our
interest would be to see how it compares wi th h . lbat is to say,
o
,.,
,.,
examine the relative distance between h and h
o
and examine relatively
,.,
how much greater the integrated squared error evaluated at h is than
the integrated squared error evaluated at h .
o
There are quite a few different methods of constructing h. Several
of
these are:
estimate
h
o
the
(Le.
classical
the
argument
plug-in
which essentially
method
of
estimating
tries
h).
o
to
the
pseudo-likelihood method and least squares cross validation. Here we
,.,
,.,
,.,
will only consider h = h and h = h . Our interest is two fold. As
o
mentioned earlier the
h
o
prilli~
c
interest is to compare h to the benchmark
and the other interest is comparison among h's,
Le. h
0
and h .
c
These goals are achieved in two main theorems which are stated and
proved in section 5.2. In section 5.3 we state and prove a sequence of
lemmas, which supply. all the rigour required for the main theorems of
the section 5.2.
5.2 Main Results
We impose the following conditions on K, fO and ~
(B.I) K is a cOmPactly supported, sYmmetric function on
~,
with Walder
continuous derivative K', and satisfies
.fK(u)du
=I
and
Iu~(u)du = 2k
"# 0 .
(B.2) fO and ~ are bounded and twice differentiable, fO', ~', fO" and
~"
are bounded and integrable
and f
continuous on [O,T] where T = sup{x
(B.3) K has a second derivative on
~
0"
and~"
are uniformly
I Q(x) > ~ },
~>O.
and Kit is Walder continuous.
Now for simplicity we will change the notation slightly to denote
integrated squared error by
f(h) = I{~n (x)-n(x)}2w(x)dx,
mean integrated squared error by
M(h)
= E[f(h)]
,
and set
D(h) = f(h) - M(h).
Now
f(h)
= I{~n (x)~n (x)~n (x)-n(x)}2w(x)dx
-
= fl(h) + f 2 (h) + 2f3 (h)
2
where fl(h) = I{~n(x)-~(x)} w(x)dx,
f 2 (h) = I{~n (x)~n (x)}2w(x)dx,
and
f 3 (h) = I{~n (x)-~(x)}{~n (x)-~n (x)}w(x)dx
78
Denote [[fi(h}] = Mi(h} , i=I,2, and 3, so that
M(h} = M1 (h} + M2 (h} + 2M3(h}
and D1(h} = f 1 (h} - M1 (h}.
Now recall that CV, the cross validation criterion, is
CV = f(h} + 6(h} - I~2(x}w(x}dx
where 6(h} = 61 (h} + 62 (h}
and
_
-1 n ~ni(Xi}
6 1(h} = 2I~(x}~n(x}w(x}dx - 2n i:l Q(X } w(X i }I[4 =l] ,
i
i
62 (h} = 2I~(x}[~n (x)~n (x}]w(x}dx
-1 n ~ni (Xi)
- 2n
Also,
recall
that
'"
h,
o
~ [ 0 (X )
i=1
'"
h,
c
i
~
and
h
0
minimize
respectively. Now the contribution from
~(h)
f(h},
CV and
M(h}
and M:3(h} to M(h} is
negligible as cOmPared to M (h}. So M(h} ,.. M1 (h} and
1
M (h} = a(nh}-1 + bh4 + 0{(nh}-I+h4}
1
as h ---+ 0 and nh ---+
where
2
2
a = [~(u}du] x Ig{~~w(x}dx and b = k I(~"(x)} w(x}dx.
(1),
Now the following are expressions for the first and second derivatives
of M1 (h} obtained by differentiating under the integral sign,
Mi(h} = _a(nh2 }-I+ 4bh3+ 0{(nh2 }-I+ h3 },
Mi(h} = 2a(nh3 }-I+ 12bh2+ 0{(nh3 }-I+ h2 }.
Therefore by setting Mi(h} = 0 we get
-1/5
a
1/5
h0
,.. C n 0where
C
=
(-4b)
and
0
Mi(h } ,.. 2a[n2/5( ~b }3/5]-1+ 12b[( ~b }2/5n-2/5]
o
,.. [2a( :b }3/5+ 12b( ~b }2/5] n-2/5
,.. [2aC-3+ 12~] n-2/5= C n-2/5.
o
0
1
Also
M (h }'" C n-4/5 where C = a4/~1/5[41/5+ 4-4/5].
1 o
2
2
Set L(u} = -uK'(u} •
G
2
3
~(x)
2
2
o= (2/Co ) [I( Q(x) w(x}} dx]I[fK(z+u}(K(z}-L(z}}dz] du
79
2
2 n(x) 2
2
+ (4kCo ) {f(n"(x» Q{x) W (x)dx - [In"(x)n(x)w(x)dx] },
2
3
n(x)
2
2
0c= (21Co ) I[ Q(x)w(x)] dx JL (u)du
2
2 n(x) 2
2
+ (4kCo ) {I(n"(x» Q{x) W (x)dx - [In"(x)n(x)w(x)dx] },
a oc= - {(21C0 )3[I( g{X~
w(x»2dx]I[K(u)-L(u)][K*K(u)-L*K(u)]du
x
I
2 n(x)
+ (4kCo ) 2 [(n"(x»
Q(x)
where * denotes the convolution.
(x)dx - ( I n"(x)n(x)w(x)dx) 2 ]},
2'
W
Theorem 2.1.: Under the conditions B.1 and B.2
A
h - h
o
0
A
n
h
1/10
L
o
A
[0]
--";;~-+J N( 0
A
h - h
c
0
2
' a [
°o~
oc
a
oc
2
a
c
A
h
o
Proof:
Observe that ,
A
o = f' (ho )
A
A
A
A
= M' (ho)+D' (ho )
A
= Mi(ho)+Di(ho ) + R1 (ho ),
A
A
A
where R1(ho ) = [f (h )+2f (ho )].
o
3
2
I.e.
A
0 = f' (h )
o
= (ho-h0 )M'l'(h* )+D 1'(h0 )
A
A
where h* lies between h and h .
A
+ R1(h0 ),
A
o
0
By lemma 3.4, which is proved in the next section,
ho
= h + 0 (n-1/5-~) for some ~>O,
0
p
and so by lemma 3.2 with h 1=h
o
'(ho ) = D1'(h0 )
D
1
+
0
p
(n-7/10 ).
But by lemma 3.5, given in the next section,
80
(2.1 )
n7/10D1, {h}
o
L
• N{O, (72) .
0
7/10
"p
1,
So n
Di {ho } must have the same weak limi t. Since h*/ho ---+
M..1{h* } = C1n-2/5+ open-2/5 }.
Thus {2.1} becomes
"-2/5
-7/10
"= {ho-ho }C1n
+ Di{ho } + open
} + R1{ho }'
So by lemmas 3.3 and 3.5 we conclude that
°
0
{2.2}
{2.3}
"-
"-
"-
For the other component in the vector [h - h ' h - h ]' note that
o o
o c
CV{h} = f{h} + o{h} -J~2{x}w{x}dx
= M{h} + D{h} + o{h} - J~2{x}w{x}dx.
Therefore,
° = CV'{hc } = M1'{hc }+D1'{hc }+ol'{hc }+R1{hc }+R2{hc },
A
"-
A
A
A
A
A
"-
where ~{hc} = 02{hc }'
° = CV' {hc }
"-
1. e.
**
A
A
A
A
A
= {hc -ho }M1{h }+Di{hc}+oi{hc}+Rl{hc}+R2{hc},
**
"where h
lies between h and h .
o
c
Now lemma 3.4 gives
ho = h0 + 0p {n-l/5-~} ,
for some ~>o,
and lemma 3.2 gives with h =h
1 o
D'{h }+o'{h } = D'{h }+o'{h }+o {n-7/ 10 }.
1 c
1 c
1 0
1 0 P
Also by lemma 3.5 and 3.6
D'{h }+ol'{h } = 0 {n-7/ 10 }.
1 o
0
p
**
~ ~ 1 M {h**} = C n-2/5+ 0p{n-2/5}.
Further, since
1
1
h
o
So {2.4} can be expressed as
° = {hc
-h }C1n-2/5[1+0{1}]
+ ° {n-7/ 10 } + R {hc }+R {hc }
o p
1
2
81
{2.4}
This implies that
hc
-h = 0 (n-3/ 10 ).
op
Therefore,
"
*="
-2/5
-2/5
+ 0 (n
(hc
-h )[C
)]
o
1n p
(hc -h0 )M 1 (h )
-2/5
= (h" c -h0 )C1n-2/5"
-ho)0
)
+ (hc
p (n
= (h -h )C n-2/5+ 0 (n-7/ 10 ) .
co
p
1
Using the last representation, refinement of (2.4) gives
°
7 10 ).
= (h -h )C n-2/5+D1'(h )+o1'(h )+R (h )+R_(h )+0 (n- /
co
o
0
P
1
1 c -~ c
Now subtracting (2.2) from (2.5) we get
"
-2/5
"-7/10
= (h"c
-h )Co
+01'(h
).
o )+R2 (hc )+0p (n
1n
°
(2.5)
(2.6)
Hence by lemma 3.6 and lemma 3.3 we get
(2.7) .
Note that by (2.2), (2.6) and lemma 3.3, for any p,q € ~,
"
-2/5""
-2/5
-7/10
p(h -h )C n
)
+ q(h -h )C n
+ pD '(h ) + q01'(h ) + 0 (n
o 0 1
c 1o o
1
0
p
= 0.
Therefore by lemma 3.7 we get
Hence we conclude that
3/10[ h o- h o ]
",.
h-h
c
0
Now h _P---+l h
C n-1/5 implies that
000
n
-2[ a~
C
1
a
oc
<V
"
h - h
o
0
"
h
a oc
1/10
o
n
"
,..
2
h - h
a
c
0
c
"
h
o
- ~~
-1
-2
and C C = 2aC + 12~ = a
which completes the proof of theorem 2.1.
0
o
1 o
82
Next is the other main result describing the relative amount by
A
A
which h (ho or hc ) fails to minimize integrated square error.
Theorem 2.2.: Under conditions B.1, B.2 and B.3
(i)
and
(il)
Proof:
A
A
Let h denote either h
"
A
or h and consider
o
c
A
A.
At
A
A
A
f(h) - f(h o ) = f 1 (h) - f 1 (ho ) + f 2 (h) - f 2 (ho ) + 2[f3 (h) - f 3 (ho )].
Notice that, Taylor's expansion of f 1(h) at h gives
o
A
f(h) - f(h o )
= ~h
A
- ho )2f1 (h*) + f 2 (h) - f 2 (h )
o
A
A
+ 2[f (h) - f (h )]
*
h
where
3
3
A
o
A
lies between h and h .
o
By lemma 3.8 and by the fact that
l h* ~
1
o
f1(h*) = M1(h*) + op(n-
* = C n-215+
1
But M1(h )
open
-215
A
A
1
A
= ~h
A
2
).
) and by (2.3) and (2.6),
A
A
-3/10
h - ho = 0p (n
). Therefore,
f(h) - f(h o )
2I5
- ho) C1n
-215
-1
A
A
+ open ) + f 2 (h) - f 2 (h )
o
A
A
+ 2[f3 (h) - f 3 (h )]·
o
Hence by (2.3), (2.7) and by lemma 3.9 we get
ALl
-1 2 2
n[f(ho ) - f(h o )]
) "2"" C1 a 0 '<1 '
and
S3
and
-1 -1
where C1 C2
= 13·
5.3 Lenunas
In this section we will state and prove the lenunas which we have
used to prove the theorems of the last section. But before that we will
prove some identities which are used in the lenunas.
Identity l.
-
h2 f 1'(h)
= I(~n (xlh)-~(x»(~n (xlh)-~n (xlh»w(x)dx
=
I(~n (xlh)-E[~n (xlh)])2w(x)dx
- I(~n (xlh)-E[~n (xlh)])(~n (xlh)-E[~n (xlh)])w(x)dx
+ I(~n (xlh)-E[~n
(xlh)])(2E[~
.
n (xlh)]-E[~n (xlh)]-~(x»w(x)dx
+ I(~n (xlh)-E[~n (xlh)])(~(x)-E[~n (xlh)])w(x)dx
+ I(E[~n (xlh)]-~(x»(E[~n (xlh)]-E[~n (xlh)])w(x)dx
(3.1)
where
-1 n
~ (xlh) = (nh)
~
n
i=l
Proof:
First note that by (3.2),
84
x-x.I
L(~)
Q(X.)
I
Ai'
(3.2)
Therefore
f ' (h) =
l
2S{Tin (x Ih)-T}{x) )Ti'n {x Ih)w{x)dx
= -2h- I I{Tin {xlh)-T}{x»{Tin (xlh)-;n (xlh»w{x)dx
and hence the first part of the identity. W&get the second part of the
identi ty by expanding all the squares and products of the rightmost
side of the identity, and by adding and subtracting suitable expected
values.
Identi ty 2. :
~ Mi{h)
= -IE[1i~{xlh)]W{x)dx
+
In{x)E[l1n {xlh)]w{x)dx
IE[Tin (xlh);n {xlh)]w{x)dx - In{x)E[;n (xlh)]w{x)dx.
+
(3.4)
Proof:
Again by (3.3) note that
= 2E[II{Tin {xlh)-T}{x»Ti'{xlh)w{x)dx]
n
MI'{h)
-2h- IE[I{Tin {xlh)-n{x»{Tin (xlh)-;n {xlh»w{x)dx]
= 2h- I {-IE[Ti2{x/h)]dx + In{x)E[Ti {x/h)]w{x)dx
=
.
+
n
n
IE[l1n (xlh);n (xlh)]w{x)dx - In{x)E[;n {xlh)]w{x)dx}.
Hence the identity.
Identi ty 3. :
- ~ Di{h)
= I{l1n {xlh)-E[l1n {xlh)])2w{x)dx
- I{Tin {xlh)-E[l1n (xlh)]){;n {x/h)-E[;n {xlh)])w{x)dx
+
I{Tin {xlh)-E[Tin {xlh)]){2E[Tin {xlh)]-E[;n {xlh)]-n{x»w{x)dx
+
I{; n {xlh)-E[;n {xlh)]){n{x)-E[Tin {xlh)])w{x)dx
85
- lE{~n (xlh)-E[~n (xlh)]}2w(x)dx
+ lE{(~ (xlh)-E[~ (xlh)])(~ (xlh)-E[~
n
n
n (xlh)])}w(x)dx. (3.5)
n
Proof:
Recall that,
Therefore
~ Di (h) = - ~ ri (h)
-
~ Mi (h).
+
~ riCh) is given by (3.1) and ~ Mi(h) is given by (3.4),
But -
thus by adding this together we get the required identity.
The following is a consequence of the above three identities.
Define
S(h) = -
~ Di (h),
x-x
x-x
K(~)
Ki(x) =
K(~)
Q(X ) I[A =l] - E[ Q(X ) I[A =I]]'
i
i
i
i
x-x
x-x
L(~)
and
Li(x) =
L(~)
Q(X ) I[A =l] - E[ Q(X ) I[A =I]].
i
i
i
i
Therefore,
S(h) = -
h
~
Di(h) = (nh)
- (nh)
-2 n
2
J[ I Ki(x)] w(x)dx
i=1
n
n
J[ I Ki(x)][ I Li(x)]w(x)dx
-2
i=1
n
+ (nh)-I J I
i=1
n
+ (nh)-I J I
i=1
-2
i=1
Ki(x){2E[~ (xlh)]-E[~ (xlh)]-~(x)}w(x)dx
n
n
Li(x){~(x)-E[~ (xlh)]}w(x)dx
n
2
n
- (nh) lE[ I K.(x)] w(x)dx
i=1 1
+
(nh)
n
n
lE[ I K1(x)][ I Li(x)]w(x)dx.
-2
1=1
i=1
86
Therefore, finally the decomposition of S(h) (or Di(h»
S(h) = -
~ Di(h)
is
= SI(h) + S2(h) + S3(h)
(3.6)
where
where
SII(h) = (nh)
-2
IKi(x)Kj(x)w(x)dx,
l l
l~i<j~n
SI2(h) = (nh)
-2
![Ki(x)Lj(x)+Kj(x)Li(x)]w(x)dx,
l l
l~i<j~n
n
S21(h) = (nh)-1 l IKi(x){2E[~ (xlh)]-E[; (xlh)]-~(x)}w(x)dx,
i=1
n
n
.
n
S22(h) = (nh)-1 l Jli(x){~(x)-E[~ (xlh)]}w(x)dx,
i=1
n
S31(h) = (nh)-2
S32(h) = (nh)
~ ![~(X)-E(~(X»]w(x)dx,
i=1 .
-2 n
i:l![K i (X)L i (X)-E(K i (X)L i (X»]W(X)dx.
We will now obtain a similar decomposition for 6i(h). For that we
define,
K(
X~
i
h
j)
B1(X i ,X j ) = Q(Xi)Q(X ) [w(X i )+W(X j )]I[A =I]I[A =l]
j
j
i
= B11 (X i ,X j ) + BI2 (X i ,X j ) ,
X -X
L(
i
h
j)
B2 (X i ,X j ) = Q(Xi)Q(X ) [w(X i )+W(X j )]I[A =I]I[A =l]
j
j
i
= B21 (X i ,X j ) + B22 (X i ,X j ) ,
br(X i ) = E[Br(Xi,Xj)IX i ] = b r1 (X i ) + b r2 (X i }, r=I,2,
and
~
r
= E[b r (Xi)]'
r=I,2.
Identity 4.
(3.7)
•
r=1,2,
r=1,2.
Proof:
First note that by (3.3)
d
(1/2)6i{h) = dh[I~{x)[~n(x)-~n(x)]w(x)dx -
1 n
-n- :
~ni{Xi)
i 1
Q(X ) W(X i )I[A =l]]
i
i
- l I n ~ni{Xi)-~ni{Xi)
= h {-n- :
Q(X )
w(X i )I[A =l]
i 1
i
i
- I~{x)[~n (x)-~n {x)]w{x)dx}.
Now the identity follows by algebra which consists of showing that the
sum
T1(h) + T {h) is equal to the terms in bracket in the last
2
equation.
The symbols C, C and C occurring in the following lemmas denote
1
2
generic positive constants. In lemmas 3.1 -3.7 and 3.9 we assume
conditions B.1 and B.2.
Lemma 3.1. :For each O<a<b<oo and all positive integers m,
sup
2m
E In7/10,
D {n-1/5 t) 1
1
~
C1{a,b,m),
(3.8)
E In7/10,
6 {n-1/5 t) 12m
1
~
C1{a,b,m).
(3.9)
n;a~t~b
sup
n;a~t~b
Furthermore, there exists
~1>0
such that
Eln7/10[Di{n-1/5s) - Di{n- 1/ 5 t)]1 2m ~ C {a,b,m)ls- t
2
~
l
m
1,
~
m
Eln7/10[6i{n-1/5s) - 6i{n- 1/ 5 t)] 12m ~ C (a,b,m)ls- t l 1,
2
whenever
a~s~t~b.
88
(3.10)
(3.11)
Proof:
By (3.6) to prove (3.10) we shall show that for some ~>O.
Eln9/10[SII(n-1/5s) - SII(n- 1/5 t)] 12m ~ cls-tl~m.
(3.12)
Eln9/10[S21(n-1/5s) - S21(n- 1/5 t)] 12m ~ cls-tl~m.
(3.13)
ElnI3/10[S31(n-1/5s) - S31(n- 1/5 t)] 12m ~ cls-tl~m.
(3.14)
Similar inequalities may be established for the functions S12' S22 and
~2'
To verify (3.12) note that SII may be written as
-1/5
-2
SII(n
t) = n
~ ~ Ut(i.j)
l~i<j~n
where
-1/5 -2
Ut(i.j) = 2(n
t) JKi(x)Kj(x)w(x)dx.
°
At this point note that as E[K.(x)]
= for every i=I.2 ....• n.
1
-1/5 -2
E[Ut(i.j) IXi] = 2(n
t) JKi(x)E[Kj(x)]w(x)dx =
and similarly E[Ut(i.j)IX j ] = 0.
Now consider the difference
where
89
°
(3.15)
and
-1/5 -2
14 = 2(n
s) fEE
x-Xi
x-X
K( -1/5 )
K(
n
s
n
s
Q(X ) I[A =l]]XE[
Q(X ) I[A =l]]w(x)dx
j
j
i
i
x-X
x-X
-l/l )
-1/5 -2 rr
- 2(n
t) JL[
K(
-1/~
)
K(
-l/l )
n
t n t
Q(X) I[A =l]]xE[
Q(X) I[A _l]]w(x)dx.
i
i
j
j-
Now by using the compactness of support and ffdlder continuity of K one
can show that, for s,t€(a,b), 1
n
1/5
that
I[-2,2](
Xi-X j
n-l/~)
the compact
1
is bounded by a constant x Is-tiC x
. For that, without loss of generality, assume
support of K is
[-1,1]
and as defined earlier
Now by the Hoelder continuity of K,
11 1 1
~ cls-tlCn
1/5
X.-X.
1[-2,2]( n~l/~)
Also each of
(3.16)
the terms 1 , 1 , and 1 is bounded by a constant
2
3
4
5
c
multiplier of n-2/ Is_t1 and E[I~(i,j) IX ] is bounded by a constant
j
5
c
multiplier of n-4/ Is_tl . To illustrate this consider the term 1
2
-1/5
-1/5
where hI=n
t and h2 =n
s,
90
K(
-2
12 = 2(h 1} I
x-x
x-x
K(
i}
hi
Q(X } I[A =l]E[
i
i
x-x
K(
j}
hi
Q(X } I[A =l]]w(X}dx
j
j
x-x
K(
i}
j}
~
h2
- 2(~} I
Q(X } I[A =l]E[ Q(X } I[A =l]]W(X}dx
j
j
i
i
1
x-Xl
2
~ Clh~ IK( h }[Tl(x} - kh 1Tl"(x}]w(x}dx
1
-2
-1
-~ IK(
-1
~ ClfTl(x}h 1 K(
~ C Ih~-~I
x-Xl
2
~}[Tl(X} - kh2Tl"(x}]w(x}dx I
x-Xl
-1 x-Xl
2 2
hi }w(x}dx -fTl(x}~ K( ~}w(x}dxl + C Ih1-h21
+ C
Ih~-h~1 ~ Clh~-h~l.
1121 ~ C n-2/5Is_tl~ ~ C Is-tl~.
Hence we get
Also note that E[1 1(i. j }IX j ] = 12 (i}.
Therefore
IUS(i.j} -Ut(i.j}1
~ 11 1 + C n-2/5Is_tl~
1
~
CIs-t I (n
~
-1/5..
-b)
-1
{1[-2.2](
X -X
i
j
n_1/~}+constant}.
Define the sequence of sigma fields
~k
=a
{(X1.A1}.(X2.A2}.···.(~.Ak}}.
= a {(X1.A1}.(~.A2} ..... (Xn.An}}.
for k<n
for
k~n.
Now note that by (3.15)
[U (i.j) -Ut(i·j}]1 ~ -I}
s
n
n i-I
= E{! ! [U (i.j) -Ut(i.j}] ~ -I}
i=2 j=l s
n
n-l i-I
n-1
= E{! ! [U (i.j) -Ut(i.j}] + ! [U (n.j) -Vt(n.j}]
i=2 j=l s
j=l s
n-1 i-I
n-1
=! ! [U (i.j) -Ut(i.j}] + E{ ! [U (n.j) -Ut(n.j}]
i=2 j=l s
j=l s
E{! !
l~i<j~n
I
91
I
I
~
-I}
n
~
-I}
n
(3.17)
•
n-1 i-1
= I
~ [Us(i,j) -Ut(i,j)] = I I
[Us(i,j) -Ut(i,j)].
i=2 j=1
1~i<j~n-1
1. e.
I I [U (1, j) -Ut (1, j)] is a martingale wi th respect to the
1~i<j~n s
sequence of sigma. fields
{~k'
k~1}.
Now we will use the inequality
(21.5) of Burkholder (1973) for the martingale which is
co
E[¢(f*)] ~ C E[¢(s(f)] + C I E[¢(ldil)]
i=1
where C is a constant. This inequali ty is applied to the finitely
2m
indexed martingale {I I [U (i,j) -Ut(i,j)]} with ¢(x) = x ,
1~i<Hn s
k-1
k i-1
f*= sup
1I
I [Us(i,j) -Ut(i,j)]I, ~= .I [Us(k,j) -Ut(k,j)] for
k=2, ... ,n i=2 j=1
J=1
n
21
1/2
k~n and zero otherwise, and s(f) = { I E[d ~k-1]}
.
k
k=2
So by (3.17),
2
k-1
2
E[dkl~k-1] = E[( I Us(k,j) -Ut(k,j» l~k-1]
. j=1
215
2
k-1
~ E[( I I 1 (k,j) + C nIs-tl~) l~k-1]
j=1
k-1 k-1
= I
I E[I (k,i)I (k,j)
1
1
i=1 j=1
+ I (k,i)C n-2I5Is_tI2~ + C n-4I5Is-tI2~I~k_1]
1
k-1 k-1
~
I
I
i=1 j=1
E[I~(k,i)
+ I 1 (k,i)C n-2151 s-t I~ + C n-
~ Cls_tI2~2n-4I5 .
Therefore
n
E[¢(s(f»] = E[{ I E[d~l~k_1]}1/2]2m
k=2
n
m
2
= E[{ I E[dkl~k_1]} ]
k=2
n
~ E[{ I Cls_tI2~2 n- 4I5 }m]
k=2
92
s-t I~I ~k-1]
415 1
(3.18)
and
n
n
k-1
~ E[¢(I~I)] ~ ~ E[( ~ IUs(k,j) -U t (k,j)I)2m]
i=l
i=l
j=l
~ C Is_tI2~m
n
~ n2m-(4ml5)
i=l
~ C Is_tI2~m n(6m+5)/5 .
(3.19)
Finally the Burkholder inequality, (3.18) and (3.19) give
Eln9/10[Sll(n-1/5s) - Sll(n- 1/ 5 t)]1 2m
n
i-I
= n-(11/5)~1 ~ ~ [U (i,j) -U (i,j)]I 2m
i=2 j=l s
t
= n-(11/5)m{E[¢(f*)]}
n
~ n-(11/5)m{C E[¢(s(f)] + C ~ E[¢(ld. I)]}
i=l
1
~ n-(11/5)m{C Is_tI2~m n(11/5)m + C Is_tI2~m n(6m+5)/5}
~ C Is_tI2~m + C Is-tl~m n 1- m
~ C Is-t 12~m ,
which completes the proof of (3.12).
To show (3.13), note that
12E[~n (xlh)]-E[~n (xlh)]-~(x)1 = 12(E[~n (xlh)]-~(x»-(E[~n (xlh)]-~(x»I.
Now B.1, B.2, Taylor's Theorem and the fact that L is also symmetric
and integrates to I, imply that for t€(a,b),
12E[~ (xln-1/5t)]-E[~ (xln-1/5t)]_~(x)1
n
n
= 12(E[~ (xln-1/5t)]_~(x»-(E[~ (xln-1/5t)]_~(x»1
n
n
~ C n-2/5.
Write
where
93
•
V (i)
t
= (n- 1/5 t)-l S{
x-Xi
x-Xi
K( -1/5 )
n
t I
K( -1/5 )
n
t I
Q(X i )
[A i =l] -
E[
Q(X i )
[A i =l]
]}
x {2E[~ (xln-1/5t)]-E[~ (xln-1/5t)]~(x)}w(x)dx.
n
Observe that
n
E[Vt(i)] = 0 •
(3.20)
and
x-X.I
-1/5
- E«n
s)
-1
K( -1/5 )
n
Q(X )
i
s
I[A =l])]}W(X)dx.
i
~ C[n-2I5 + C n-2I5 n-2I5]ls_tl~
Therefore
Vs(i) - Vt(i)
~
215 1 I~
C ns-t
.
Now note that by (3.20)
E[i~l(VS(i) -
Vt(i»1
~n-1)]
n-1
= E[V (n) - Vt(n)] + ! (V (i) - Vt(i»
s
i=l s
94
(3.21)
n-1
= I (V (i) - Vt{i».
i=1 s
n
Le.
{
I
i=1
(V (i) - Vt{i» } is a martingale with respect to the
s
sequence of sigma fields
{"k'
k~1}.
So again we will
martingale inequality (21.S) of Burkholder (1973)
n
indexed martingale { I (V (i) - Vt{i»
i=1 s
apply the
to the finitely
} with ¢(x) = x
~
,
*
k
sup
I I [Vs{i) - Vt{i)]I, ~= [Vs{k) - Vt{k)] for k~n and zero
k=1, ... ,n i=1
otherwise, and
f =
s{f) ={
~ E[d~I"
k=1
-k
_ ]}1/2 ={ ~ E[{V {k)-V (k»2]}1/2.
k 1
k=1
s
t
Therefore using (3.21) we get
n
E[¢(s{f»] = E[{ I E[{V {k)_V {k»2]}1/2]2m
t
k=1
s
n
~ {I E[{V (k)-V t {k»2]}m
k=1
s
~ C Is-tl 2em n{1/S)m
,
(3.22)
and
n
I
k=1
E[¢(I~I)] =
n
2m
I E[IVs{k)-Vt{k) 1
]
k=1
~ C Is_tl 2cm n 1-{4IS)m .
(3.23)
Finally the Burkholder inequality, (3.22) and (3.23) give
Eln9/10[S21{n-1/Ss) - S21{n- 1/ S t)]1 2m
n
= n-{1/S)~1 I [V {i)-V (i)]I 2m
i=1 s
t
= n-{1/S)m{E[¢(f*)]}
~
(1/S)m{C E[¢(s{f)] + C ~ E[¢(ld. I)]}
i=1
1
~ n-{1/S)m{C Is_tl 2cm n{1/S)m + C Is_tl 2cm n 1-{4IS)m
n
~ C Is_tl 2cm + C Is_tl 2cm n 1- m
9S
•
which completes the proof of (3.13).
To show (3.14) note that
where
-1/5
-2 n
S31(n
t) = n ! ft(i)
i=1
ft(i) = (n-l/5t)I{~(x) - E[~(x)]}w(x)dx
E[ft(i)] = 0
and
For s.t
€ (a.b)
for every i.
consider the difference
where
x-x.I
-1/5 -2
IJ 1 I = I(n s) I(
K( -1/5 )
n
Q(X )
i
s
2
I[A =I]) w(x)dx
i
K(
x-x
i)
-1/5 -2
t) I(
- (n
n-1/5 t
2
Q(X ) I[A =I]) w(x)dxl.
i
i
x-Xi
x-Xi
K( -1/5 )
K( -1/5 )
1/ 5 t)-2
=
12(nn
t
I
E[
n
t I
] ( )dx
f Q(X ) [A =l]
IJ2 I
Q(X i )
[Ai=l] w x
i
i
x-Xi
x-Xi
K( -1/5 )
K( -1/5 )
-1/5 -2
n
s
n
s
I
- 2(n
s) f
Q(X ) I[A =I]E[
Q(X ) I[A =I]]w(x)dx.
i
i
i
i
x-Xi
K( -1/5 )
-1/5 -2
n
s
2
IJ3 I = I(n s) I{E[ Q(X ) I[A =I]]} w(x)dx
i
i
x-X.
K(
I)
2
-1/5 -2
n- l / 5 t
t) I{E[
Q(X.) I[A.=l]]} w(x)dxl.
- (n
I
and
96
1
As we have illustrated in (3.16). by using compactness of the support
and Walder continuity of K. each of the above four terms is bounded by
constant x nl/5Is_tl~. Thus
(3.24)
n
Further {:I (f (i) - f t{i» } is a martingale with respect to the
i=1 s
sequence of sigma fields
{~k' k~l}
because
~1
n
E[ :I {f (i) - ft{i»1 ~ 1] = ~ (f (i) - ft{i» + E[f en) - ft{n)]
i=1 s
ni=1 s
s
n-l
= :I (f (i) - ft{i».
i=1 s
So again we will apply the martingale inequality (21.5) of Burkholder
n
(1973) to the finitely indexed martingale {
*
} with
k
sup
I:I [fs{i) - ft{i)]I. ~= [fs{k) - ft{k)] for
k=I •...• n i=1
and zero otherwise. and
¢(x) = x
k~n
2m
:I (f (i) - ft{i»
i=1 s
s{f) ={
• f =
~ E[d~l~
k=1
Ik
k-l
]}1/2 ={
~
k=1
E[{f {k)-f (k»2]}1/2.
s
t
Therefore using (3.24) we get
n
E[¢(s{f»] = E[{ :I E[{f {k)-f {k»2]}1/2]2m
t
k=1
s
n
~ {:I E[{f (k)-f t {k»2]}m
k=1
s
~ C Is_tI2~m n{7/5)m •
(3.25)
and
n
n
2m
:I E[¢(ldkl)] = :I E[lfs{k)-ft{k) 1 ]
k=1
k=1
~ C Is_tI2~m n{2/5)m+l .
97
(3.26)
•
Finally the Burkholder inequality, (3.25) and (3.26) give
Eln13/10[S31(n-1/5s) - S31(n- 1/ 5 t)]1 2m
n
= n-(7/5)~1 ~ [E (i)-E (i)]I 2m
i=l s
= n-(7/5)m{E[¢(f*)]}
t
n
~ n-(7/5)m{C E[¢(s(f)] + C ~ E[¢(ldil)]}
i=l
~ n-(7/5)m{C Is-tl 2cm n(7/5)m + C Is_tI2~m n(2/5)m+1}
~ C Is_tI2~m + C Is-tl 2cm n 1- m
~ C Is_tI2~m,
which completes the proof of (3.14) and hence that of (3.10). The same
type of argument gives (3.8), (3.9) and (3.11).
Lemma 3.2. : For some
sup
a~t~b
~>o
and any O<a<b<m,
{IDi(n-I/5t)I + 16i(n-I/5t)I}
Furthermore, for any
= ° (n-3/5-~).
(3.27)
p
~2>0
and any nonrandom
hI' asymptotic to a
constant multiple of n-1/5 ,
P
----t)
°.
(3.28)
Proof :
First we will prove (3.28). Note that (3.28) will be true if
sup
n7/10IDi(n-1/5t)-Di(hl)I
It-nl/~11~n-~2
p)
°
(3.28a)
P
°.
(3.28b)
and
sup
n7/1016i(n-1/5t)-6i(hl)I
It-n1/~II~n-~2
----t)
The proof of (3.28a) and (3.28b) basically involves two steps. In
98
the first step supremum over the uncountable set will be reduced to the
supremum over a countable set and in the second step the Markov
inequality is used to get the required result. To illustrate, we will
prove the result for 0i because the proof for 6i will be along parallel
lines.
To check (3.28a) note that using the decomposition (3.6) of 0i,
the }folder continui ty of K and L and the fact that both of these
function have compact support, there is an a>O sufficiently large that
sup
IOi(n-
1/5
s)-Oi (n-
1/5
t)
a~s~t~2b
I = O(n- 1).
(3.29)
Is-t I~n-a+1/5
For a<lim n
1If\..
-h <b, suppose
1
n
-~2
1/&
-h 1-n
where t -t _ = n
i i 1
-a
= to<t1<···<tm-1~n
1/&
-h 1+n
-~2
<tm ,
for each i. Therefore now to finish the proof of
(3.28a), in view of (3.29) it suffices to check that
sup
n7/1010i(n-1/5ti)-Oi(n-1/5tj)I
(ti,tj)G
p)
•
°,
-~2-1/5
where
~
is the set of all pairs (ti,t j ) with
O<ti-tj~
n
and
i~m.
For any (j>O,
P{
sup
n7/1010i(n-1/5ti)-Oi(n-1/5tj)I
(ti,tj)G
> (j}
~
~
E{(j-1n7/1010'(n-1/5t )_O'(n- 1/5 t.)1}2t
(ti,tj)G
1
i
1
J
~
C (j
-2t
n
2(a-~2-1/5)
-1/5-~2
(n
~lt
),
using (3.10) and the fact that the number of elements in
(3.30)
~
is of order
2(a-~2-1/5)
n
. By choosing t sufficiently large we may ensure that the
term in (3.30) converges to zero as n
~ ~
. This proves (3.28a) and
hence (3.28). A similar partitioning argument will give (3.27).
99
Lemma 3.3. : For some
sup
a~t~b
and any O<a<b<oo,
IR (n- 1/ 5 t) I + IR (n- 1/ 5 t) I = Op(n-3/4+~)
1
2
O<~<3/20
Proof :
We will give an argument for R1 . The proof for
First note that R1(h) = f 2(h) + 2fj(h) and
f '(h)
2
~
is similar.
= [-(2/h)]I(~n (xlh)~n (xlh»2w(x)dx
- I(~n (xlh)-~n (xlh»(~n (xlh)-~n (xlh»w(x)dx
f '(h) = (2/h)I(~ (xlh)-~ (xlh»(2n (xlh)-~ (xlh)-n(x»w(x)dx
3
n
n
n
n
- I(~n (xlh)-n(x»(~n (xlh)-~n (xlh»w(x)dx.
1 2
Now as I~(X)-Q(X)I = 0p(n- / )
I(~n (xlh)~n (xlh»2w(x)dx
1
= 0 (n- )
p
and by the Schwarz inequality
I(~n (xlh)-~n (xlh»(~n (xlh)-~n (xlh»w(x)dx
= 0p (n- 1).
Further by theorem 2.1 of Chapter 4 and the Schwarz inequality
I(~n (xlh)-nn (xlh»(2~n (xlh)-~n (xlh)-n(x»w(x)dx = 0p (n- 19/ 20 )
and similarly
I(~n (xlh)-n(x»(~n (xlh)-~n (xlh»w(x)dx
19 20
= 0p
(n- / .
)
Therefore,
,
-4/5
,
-3/4
(n
)o
and f 3 (hp
) = 0 (n
),
2 (ho ) = 0p
and hence R (h ) = 0 (n-3/ 4 ).
1 o
p
Now to get the magnitude of R (n- 1/ 5 t) uniform in t consider
1
1
5
sup IR (n- / t)I and then reducing the supremum to a countable set as
1
f
a~t~b
done in lemma 3.2 we get the required result by an argument similar to
that used in the last lemma.
Lemma 3.4.
For some
~>O,
100
Ih -h 1 + Ih -h 1 = 0 (n-~-1/5).
o 0
Cy 0
P
Proof:
First we treat
h" /h
o
P
0
---.;;.-+-
o.
Iho -h0 I. Theorem 3.3 of Chapter 3 implies that
Therefore by lemma 3.2 and lemma 3.3,
"
r' (ho) = r' (ho) - r' (ho)
= Mi{ho ) - Mi{h" o ) + Di{ho ) - Di{h" o ) + R1{ho ) - R1{h" o )
= M1'{ho ) - M1'{h0 ) + 0p (n-~-3/5).
Also r'{h) = D'{h ) + R {h ) and by lemma 3.2, D'{h )=0 (n-~-3(5),
1 0
1 0
1 0
o
p
3
4
and by lemma 3.3 R {h) = 0 (n- / ). So r'{h )=0 (n-~-3/5). Therefore
1 o
pop
o (n-~-3/5) = M'{h ) - M'{h ) + 0 (n-~-3/5)
p
1 0
lop
o (n-~-3/5) = M'{h ) - M'{h )
I.e.
1
p
1
0
0
= {ho-h~Mi: (h*) ,
where
h*
lies
in
between
h
o
and
(3.31)
h
0
As
in
section
5.2,
Mi:{h* )=C 1n-215+op{n-215 ) . Using this estimate in (3.31) we conclude
that Ih -h 1
o
0
= 0 p (n-~-1/5)
To treat
IhCy-h0 I,
P
implies that hCy/h0
CV'{ho )
as required.
first note that theorem 3.4 of Chapter 3
- O. Therefore by lemma 3.2,
= CV'{ho )
- CV'{hCY )
= Mi{ho)+Di{ho)+6i{ho)+Rl{ho)+R2{ho)
- [Mi{hCy)+Di{hCy)+6i{hCy)+Rl{hCy)+R2{hCY)]
= M'{h ) - M'{h ) + 0 (n-~-3/5)
1 0
1 Cy
P
Also by lemma 3.2 and 3.3,
CV'{h ) = M1'{h )+D 1'{h )+6 1'{h )+R {h )+R {h )
o
000
1 0
2 0
= Di{ho)+6i{ho)+Rl{ho)+~{ho)
= 0 (n-~-3/5)
p
101
•
Therefore ,
op (n-~-3/5)
= M'{h ) - M'{h )
1 0
1 cv
= (h -hcv )M1{h*),
o
where h* lies in between h and h . So as before Ih -h
o
~
~
0
(3.32)
1=0p (n-~-1/5).
Lemma 3.5:
L
"'-';;~-+l
2
a).
N{O,
o
Proof:
We shall start from the decomposition (3.G) and prove that
n 9/10S{h)
L
I N{O,
a 2 ).
o
0
Now the argument leading to (3.14) gives E[~{ho)] = O{n-9110
S3{h ) = o{n
). Therefore it suffices to show that
o
9/10S )
L
(Z
Z)
( 9/10S
n
I' n
2
I'
I
13/5
2
) and so
(3.33)
where Si=5 i (ho ) i=1,2, and Zl and Z2 are independent normal variable
c?
wi th zero mean and variances adding up to
~ a~. The argument leading
to (3.33) is given in Chapter 4 with the terms equivalent to Sl and 82
being I *
n and II n . Therefore to complete the proof we need only to show
that
9/
n 5yar(8 1 )
n
-+
2
C~l[f{ g~~} w{x»2dx]f[fK{z+u) {K{z)-L{z»dz]2du , (3.34)
4 2
2 n(x)
-Yar{S2) -tCo 4k {f{ll"{X»
Q(x)
2
9/5-_
W
{x)dx
- [fll"{x)1l{x)w{x)dx]2}.
(3.35)
Let
K{~)
- E{
Q{X) I[A=1])]
K{y~X)
[
102
Q{X) I[A=l] - E{
K{y~X)
Q{X) I[A=1])]}'
L(X~X)
L(x~X)
a2 (x,y) = E{[ Q(X) I[A=l] - E( Q(X) I[A=l])]
L(Y~X)
L(Y~X)
[ Q(X) I[A=l] - E( Q(X) I[A=l])]}'
K(x~X)
K(x~X)
~(x,y) = E{[ Q(X) I[A=l] - E( Q(X) I[A=l])]
L(Y~X)
L(Y~X)
[ Q(X) I[A=l] - E( Q(X) I[A=l])]}'
and
a4(x,y) =
~(y,x).
Now
Var(Sl) = Var(Sll) + Var(S12) - 2Cov(Sll' S12)'
where Sij= Sij(ho)'
Now
Var(Sll) = 4(nh )-4yar (~ ~ JK i (x)K j (x)w(x)dx)
l~i<j~n
o
= 4(nh) -4 ~ ~ Var(JKi(x)Kj(x)w(x)dx).
o l~i<j~n
2
= 2(nho )-4 n(n-l) IIa1(x,y)w(x)w(y)dxdY,
•
Var(S12) = (nh )-4yar {~ ~ I[Ki(x)Lj(x)+Li(x)Kj(x)]w(x)dx}
o
l~i<j~n
= (nh )-4 ~ ~ Var{I[Ki(x)Lj(x)+L.(x)K.(x)]w(x)dx}
o l~i<j~n
1
J
-4
= (nh) ~ ~ [Var{JKi(x)Lj(x)w(x)dx}+ Var{lLi(x)Kj(x)w(x)dx}
o l~i<Hn
+ 2Cov{JK i (x)L j (x)w(x)dx, lLi(x)Kj(x)w(x)dx}]
= (nh) -4 n(n-l) ~ ~ II2[a1(x,y)a2(x,y)
o
l~i<j~n
+ ~(x,y)a4(x,y)]w(x)w(y)dxdy
= (nho) -4 n(n-1) II[a 1(x,y)a2(x,y) + a 3(x,y)a4(x,y)]w(x)w(y)dxdy.
and
CoV(Sll' S12) = (nh )-4 Cov{2 ~ ~ JKi(x)K.(x)w(x~dx.
o
l~i<j~n
J
~ ~ I[Ki(x)Lj(x)+L.(x)Kj(x)]w(x)dx}
l~i<j~n
= (nho )
-4
2
~
~
Hi<Hn
~
~
1
Cov{JKi(x)Kj(x)w(x)dx,
Hr<s~n
103
I[K r {x)Ls {x)+L r {x)Ks {x)]w{x)dx}.
but as
=0
for
= IIa l {x,y)a3 {x.y)w{x)w{y)dxdy
= IIa l {x.y)a4 {x.y)w{x)w{y)dxdy
for
s=j
rjll!i, sjll!j
r=i. sjll!j
r=i. s=j
for
r=j. s=i.
{ r,<i.
we get
Therefore
-4
2
Var{SI) = (nbo )
n{n-l)II{2al+ala2+a3a4-2ala3-2ala4) {x.y)w{x)w{y)dxdy.
Note that by (4.10) of Chapter 4
IIa~{x,y)w{X)W{Y)dxdY ~ h~I{ g~:~ w{x»2dx
where
~1{u)
x
I~{u)du,
= fK{z)K{u+z)dz and the similar computation gives
IIa l a j {x. y )w{x)w{y)dxdY ~
h~I{ g~:~ w{x»2dx
x
I~I{u)~j{u)du
~ h~I{ g~:~ w{x»2dx
x
I~3{u)~4{u)du.
j=I.2.3.
and
IIa3a 4 {x,y)w{x)w{y)dxdy
where
~2{u)
= fL{z)L{u+z)dz
~3{u)
= fK{z)L{u+z)dz = fL{z)K{u+z)dz =
~4{u).
Thus
Var{SI)
~
n
-2
-1
1](x) _.
ho I{ Q{x) w{x»
2
..2
dxf{2Pl+~1~2+~3~4-2~1~3-2~1~4){u)du.
Now using the fact that ~3{u) = ~4{u) and I~I~2{u)du = I~{u)du we get
-2 -1
n(x)
2
2
Var{SI) ~ n ho I{ Q{x) w{x» dxf2{~I-~3) (u)du
-9/5 -1
n(x)
2
2
= 2 n
CO I{ Q{x) w{x» dxf[fK{z+u){K{z)-L{z»dz] duo
and hence the proof of (3.34) is complete.
104
To prove (3.35). observe that
Var(S2) = (nho )
-2
2
n (v2-v l ).
where
=
E[A + B]i
and
E[A] = -II[K(x~u)- L(x~u)]~(u)~(x)w(x)dudx
= -h3 {f(u2/2)[K(u)-L(u)]du}{I~"(x)~(x)w(x)dx}+ o(h3 )
and
E[B] = (2Ih)III[K(x~u)- L(x~u)]K(X~Y)~(y)~(u)W(X)dudYdx
= 2h3{f(u2/2)[K(u)-L(u) ]du}{I~"(x)~(x)w(x)dx} + o(h4).
Since Iu~(u)du = -Iu~'(u)du = -I[~u~(U»]dU + 3Iu2A(U)du = 6k.
vI = -2kh3
I~"(x)~(x)w(x)dx +
3
o(h ).
2
K(~)
L(x~X)
2
Again E[A ] = E{I{ Q(X) I[A=l]-f Q(X) I[A=l]}~(x)w(x)dx}
105
4It
x-u
x-u
2 n(u)
I
= I {[K(~)L(~)]~(x)w(x)dx} Q(u) du
=
h2I{I[K(Y)-L(y)]~(u+hy)w(u+hY)dy}2 g~~}
by substituting
x-u
~
du
= y in the inside integral.
2
2 6
2 ~(u)
So finally E[A ] = 4 k h I(~"(u» Q(u}
2
W
6
(u)du + o(h ).
It can be seen from a calculation similar to lenuna. 4.2 of O1apter 4
6
2
6
that E[B ] = 0(h ) and by the Schwarz inequality E(AB) = 0(h ). Hence
2
2
6
v2 = 4 k h6I(~"(u»2 g~~} w (u)du + 0(h ).
Therefore
Var(S2) '" (000 )-2 n
4k~: {I(~"(u»2 g~~} w2 (u)du
- [I~"(x)~(x)w(x)dx]2}
9 5
= n- / C:
4k2{I(~"(u»2 g~~} du - [I~"(x)~(x)w(x)dx]2}
Hence the proof of lenuna. 3.5 is complete.
Lenuna 3.6:
2
a c ).
Proof:
We shall start from the decomposition (3.7) and prove that
9 IO
9 1
(n / TI' n / 0T2 )
L
) (ZI' Z2)
(3.38)
where Ti = Ti(h )' i=I,2, and ZI and Z2 are independent normal variables
o
with zero means and variances as in (3.38) and (3.35) . Again the
martingale method and Cramer-Wold device used to prove theorem 2.1 of
O1apter 4 and lenuna 3.5 are applicable here. The term T is a centered
1
U-statistic with the variable kernel
"
Bl(Xi,Xj)-B2(Xi,Xj)+b2(Xi)-bl(Xi)+b2(Xj)-bl(Xj)+~I-~2·
So T1 is of the same form as SI and the term T is of the same form as
2
S2. Infact
106
T2 (h) = (nh)
-1 n
!
i=l
[bll(Xi)-b2l(Xi)+
~2-~1]
L
and
b ll (X)-b2l (X) +
= A + B.
(~2-~1)
where A and B are as given in
(3.36) and (3.37). Therefore.
9/5-_
4 2
2 T}{u) 2
n -Yar(T2 ) ----+ Co 4k {f(T}"(u» Q{uj W(u)du
- [IT}"(x)T}(x)w(x)dx]2}.
So to complete the proof we need only to prove
n9/5yar(T l )
----+
2 C~l [I( g~~}w(x»2dx][(JK(U)-L(u»2dU]. (3.39)
For that write
then.
Var(T l ) = [n(n-l)ho] ~ [n(n-l)/2] E{B* (X l ,X2 ) - b(X l ) - b(X2 ) +
2 -1
*2
2
2
= (1/2)[n(n-l)ho] E{B (Xl'~) - 2b (Xl) + ~ }
2
.
~}
2
- (1/2)(nho )-2 E{B* (Xl'~)}
= (1/2)(nho)-~{[Bll(Xl·~)-B2l(Xl,X2)+B12(Xl·~)-B22(Xl·~)]2}
= (1/2)(nho)-~{2[Bll(Xl,X2)-B2l(Xl·~)]2}
-~
= 2(nho ) ~[
K(Xl-~) _ L(Xl-~)
h
h
Q(X )Q(X )
I[Al=l]I[Al=l]w(X l )]
l
2
= 2(nh )-2I [K(x-y) _ L(x-y)]2w(x)2 n{x) n{y)dxdy
o
h
h
Q(x) Q(y)
- 2(n-~~1)I[ g~~}W(X)]2dx x I[K(u)-L(u)]2du .
Further. since d[~(u)] = [~(u)+2uK(u)K'(u)]du gives
I[K(u)-L(u)]2du = JL2 (u)du we get
-9/5
-1 T}{x)
2
2
Var(T l ) - n
2 CO I[ Q{xjw(x)] dx JL (u)du
Lemma 3.7: For any p,q € ~
7 lO
n / (p Di(h ) + q 6i(h »
o
o
2
2
2
2
2
where 0 = p 0 o + q 0 c + 2pqooc
107
L~) N(O.
0
2)
2
~
Proof : By the argument leading to (3.33) and (3.38) it is enough to
show that
c20
9 10
[p(Sl+S2) + q(T1+T2 )]
L
~ N(O, 4 0 )
Since P(Sl+S2) + q(T 1+T2 ) is a martingale, the corollary 3.9 of Hall
n /
2
and Heyde (1980) is applicable. The steps to verify the conditions of
the corollary are identical to those in lemma 4.5 of Chapter 4, so we
omit the details. The only addition will be to show that
n
9/5
Cov(Sl+S2,T 1+T2 )
~ {2 C~l [I( Q~~~ w(x»2dx ]I[K(u)-L(u)][K*K(u)-L*K(u)]du
2 n(x)
2 4
+ 4k Co [f(n"(x»
Q(x}
Now as Si and Tj ,
i~j
2
W
2
(x)dx - (In"(x)n(x)w(x)dx) ]}.
are uncorrelated,
+ Cov(S2,T2 ) = Cov(Sl,T 1) - Var(S2)·
and therefore by (3.35) it is enough to prove that
Cov(Sl+S2,T 1+T2 )
n
9/5
= Cov(Sl,T1)
Cov(Sl ' T1) .
~ 2 c~l [I( g~~~ w(x»2dx ]I[K(u)-L(u)][K*K(u)-L*K(u)]du
)=!Kr (x)Ks (x)w(x)dx, then,
For that write A(K,K
r s
Cov(Sl,T ) = n-3 (n_l)-lh-3 Cov( ~ ~ [2A(K K )-A(K ,L )-A(L ,K )],
1
o
l~r<s~n
r, s
r s
r s
1~i<~~~B1(Xi,Xj)-B2(Xi,Xj)+b2(Xi)-bl(Xi)+b2(Xj)-bl(Xj)+~1-~2])
3
= n- (n_l)-lh-3 Cov(
o
~ ~ [2A(K K )-A(K ,L )-A(L ,K )],
l~r<s~n
r, s
~
~
r
s
r
[Bl(Xi,Xj)-B2(Xi,Xj)]).
l~i<Hn
Now Cov([2A(Kr ,K s )-A(Kr ,Ls )-A(Lr ,K s )], [B 1(X.,X.)-B
j )])
1
J
2 (X.,X
1
r~i, s=j
for
r~i, s~j ,
=0
r=i, s#j
~
108
s
Cov(A(Li,Kj),B2(Xi,Xj»
~
I n(x)
2
- h2~
J~(y)[K*L(y)]dy x 2 ( Q(x)w(x» dx.
Also note that fK(u)[L*L(u)]du = fL(u)[K*L(u)]du.
Therefore
CoV(Sl,T~) =
-
n-3(n_l)-lh~3 n(~-l) 4h~I( g~~}W(x»2dx
x I{K(u)[K*K(u)]-K(u) [L*K(u)]-L(u) [K*K(u)]+L(u)[K*L(u)]}du
-~-l
n(x)
2
2 n -ho I( Q(xjw(x» dx I[K(u)-L(u)][K*K(u)-L*K(u)]du
~ - 2 n-9/5c~lI( g~~}w(x»2dx I[K(u)-L(u)][K*K(u)-L*K(u)]du,
=-
which completes the proof of lemma 3.7.
Lemma 3.8
Under conditions B.l, B.2 and B.3 and for any O<a<b<oo,
sup
a~t~b
IDI(n-1/5 t) I = 0
p
(n-2/5 ).
Proof:
Again the proof will follow by arguments similar to those of lemma
3.2. But to use such an argument, first we have to prove analogue of
(3.8),
109
•
sup
E[n
1/2-1/5 t) 121
-ui(n
~
C(a,b,l).
(3.4O)
n:a~t~b
Since the proof of (3.40) is almost identical
to (3.8) we omi t
the
details.
Lemma 3.9 : For any
multiple of n
-1/5
~>O
and any nonrandom hI' asymptotic to a constant
,
p
----+~
0 .
Proof: Since the proof is based on the identical steps to those of
(3.28) we omit the details.
110
BIBLIOGRAPHY
Barlow. R. E. and Proschan. F. (1975). Statistical theory of
reliability and life testing: probability models. Holt. Rinehart.
Winston. New York.
Bowman. A. (1984). "An al ternative method of cross-validation for the
smoothing of density estimates". Biometrika. 65. 521-528.
Burkholder. D. L. (1973). "Distribution function inequali ties for
martingales". Annals of Statistics. 1. 19-42.
Burman. P. (1985). "A data dependent approach to density estimation".
Zeitscrift fur Wahrscheinlichkeitstheorie und verwandte Gebiete.
69. 609-628.
Chow. Y. S .• Geman. S. and Wu. L. D. (1983). "Consistent
cross-validated densi ty estimation". Annals of Statistics. 11.
25-38.
Copas. J. B. and Fryer. M. J. (1980). "Density estimation and suicide
risk in psychiatric treatment". Journal of Royal Statistical
Society A. 143. 167-176.
Cox. D. R. and Oakes. D. (1984). Analysis of Survival Data. Chapman
Hall. London.
Devroye. L. and Gyerfi. L. (1985). Nonparametric Density Estimation:
the L View. Wiley. New York.
1
Duin. R. P. W. (1976). "On the choice of smoothing parameters of Parzen
estimators of probability density functions". IEEE Transactions
on Computers. C-5. 1175-1179.
Eubank. R. (1988). Spline Smoothing and Nonparametric Regression.
Wiely. New York.
Foldes. A. and Revesz. P. (1974). "A general method for density
estimation". Studia Sci. Math. Hungar. 9. 81-82.
Gyerfi. L.. Haerdle. W.• Sarda. P. and Vieu. P. (1989). Nonparametric
Curve Estimation from Time Series. Springer-Verlag. Berlia.
Habbema. J. D. F.• Hermans. J. and van den Broek. K. (1974). "A
stepwise discrimination analysis program using density
estimation". Compstat 1974: Proceedings in Computational
Statistics. 101-110. Physica Verlag. Vienna.
•
Haerdle. W. {1988}. Applied Nonparametric Regression. Econometric
society monograph series. Cambridge University Press.
Haerdle. W.• Hall. P. and Marron. J. S. {1988}. "How far are
automatically chosen regression smoothers from their optimum?".
Journal of the American Statistical Association 83. 86-101. with
discussion.
Haerdle. W.• Marron. J. S. and Wand. M. P {1990}. "Bandwidth choice for
density derivatives". Journal of Royal Statistical Society. B.
52. 223-232.
Hall. P. {1983}. "Large sample optimali ty of leas t square
cross-validation in densi ty estimation". Annals of Statistics 11.
1156-1114.
Hall. P. {l984}. "Central limit theorem for integrated square of
multivariate nonparametric density estimator". Journal of
Multivariate Analysis. 14. 1-16.
Hall, P. (1985), "Asymptotic theory of minimum integrated square error for multivariate density estimation", Proceedings of the Sixth International Symposium on Multivariate Analysis at Pittsburgh, 25-29.
Hall, P. (1987a), "On Kullback-Leibler loss and density estimation", Annals of Statistics, 15, 1491-1519.
Hall, P. (1987b), "On estimation of probability densities using compactly supported kernels", Journal of Multivariate Analysis, 23, 131-158.
Hall, P. and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Academic Press, New York.
Hall, P. and Marron, J. S. (1987), "Extent to which least squares cross-validation minimises integrated square error in nonparametric density estimation", Probability Theory and Related Fields, 74, 567-581.
Krieger, A. M. and Pickands, J. (1981), "Weak convergence and efficient density estimation at a point", Annals of Statistics, 9, 1066-1078.
Lo, S. H., Mack, Y. P. and Wang, J. L. (1989), "Density and hazard rate estimation for censored data via strong representation of the Kaplan-Meier estimator", Probability Theory and Related Fields, 80, 461-473.
Marron, J. S. (1985), "An asymptotically efficient solution to the bandwidth problem of kernel density estimation", Annals of Statistics, 13, 1011-1023.
Marron, J. S. (1987), "A comparison of cross-validation techniques in density estimation", Annals of Statistics, 15, 152-162.
Marron, J. S. (1988), "Automatic smoothing parameter selection: A survey", Empirical Economics, 13, 187-208.
Marron, J. S. and Haerdle, W. (1986), "Random approximations to some
measures of accuracy in nonparametric curve estimation", Journal
of Multivariate Analysis, 20, 91-113.
Marron, J. S. and Padgett, W. J. (1987), "Asymptotically optimal
bandwidth selection for kernel density estimators from randomly
right censored samples", Annals of Statistics, 15, 1520-1535.
Nolan, D. and Pollard, D. (1987), "U-processes: rates of convergence",
Annals of Statistics, 15, 780-799.
Parzen, E. (1962), "On estimation of a probability density function and mode", Annals of Mathematical Statistics, 33, 1065-1076.
Rice, J. and Rosenblatt, M. (1976), "Estimation of the log survivor function and hazard function", Sankhya, Series A, 38, 60-78.
Rosenblatt, M. (1956), "Remarks on some non-parametric estimates of a density function", Annals of Mathematical Statistics, 27, 832-837.
Rudemo, M. (1982), "Empirical choice of histograms and kernel density estimators", Scandinavian Journal of Statistics, 9, 65-78.
Schuster, E. A. and Gregory, C. G. (1981), "On the nonconsistency of maximum likelihood nonparametric density estimators", Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface (W. F. Eddy, ed.), Springer-Verlag, New York, 295-298.
Scott, D. W. and Factor, L. E. (1981), "Monte Carlo study of three
data-based nonparametric probability density estimators", Journal
of the American Statistical Association, 76, 9-15.
Scott, D. W., Tapia, R. A. and Thompson, J. R. (1977), "Kernel density
estimation revisited", Nonlinear Analysis, Theory, Methods and
Applications, 1, 339-372.
Serfling, R. J. (1980), Approximation Theorems of Mathematical Statistics, Wiley, New York.
Sheather, S. J. (1983), "A data-based algorithm for choosing the window width when estimating the density at a point", Computational Statistics and Data Analysis, 1, 229-238.
Sheather, S. J. (1986), "An improved data-based algorithm for choosing the window width when estimating the density at a point", Computational Statistics and Data Analysis, 4, 61-65.
Shorack, G. R. and Wellner, J. A. (1986), Empirical Processes with Applications to Statistics, Wiley, New York.
Silverman, B. W. (1986), Density Estimation for Statistics and Data Analysis, Chapman and Hall, New York.
Singpurwalla, N. D. and Wong, M. Y. (1983a), "Estimation of the failure rate: a survey of nonparametric methods. Part I: Non-Bayesian methods", Communications in Statistics - Theory and Methods, 12, 559-588.
Singpurwalla, N. D. and Wong, M. Y. (1983b), "Kernel estimators of the failure rate function and density function: An analogy", Journal of the American Statistical Association, 78, 478-481.
Stone, C. J. (1984), "An asymptotically optimal window selection rule
for kernel density estimates", Annals of Statistics, 12,
1285-1297.
Tanner, M. A. and Wong, W. H. (1984), "Data-based nonparametric
estimation of the hazard function with applications to model
diagnostics and exploratory analysis", Journal of the American
Statistical Association, 79, 174-182.
Tanner, M. A. and Wong, W. H. (1983), "The estimation of the hazard function from randomly censored data by the kernel method", Annals of Statistics, 11, 989-993.
Walter, G. and Blum, J. R. (1979), "Probability density estimation
using delta sequences", Annals of Statistics, 7, 328-340.
Watson, G. S. and Leadbetter, M. R. (1964a), "Hazard analysis I", Biometrika, 51, 175-184.
Watson, G. S. and Leadbetter, M. R. (1964b), "Hazard analysis II", Sankhya, Series A, 26, 101-116.
Wells, M. T. (1989), "On the estimation of hazard rates and their
extrema from general randomly censored data", Unpublished
manuscript.
Woodroofe, M. (1970), "On choosing a delta sequence", Annals of
Mathematical Statistics, 41, 1665-1671.