Chu, Chih-Kang (1989). "Some Results in Nonparametric Regression."

SOME RESULTS IN NONPARAMETRIC REGRESSION
by
Chih-Kang Chu
A dissertation submitted to the faculty of the
University of North Carolina at Chapel Hill in
partial fulfillment of the requirements for the
degree of Doctor of Philosophy in the Department
of Statistics
Chapel Hill
1989
Approved by

Advisor

Reader

Reader
CHIH-KANG CHU. Some Results in Nonparametric Regression (under the direction of J. S. MARRON).

ABSTRACT
For nonparametric regression, the two most popular methods for constructing kernel estimators, which choose weights either by kernel evaluation or by subinterval integration, are compared in terms of their asymptotic mean square errors. Their performance is quite different when the design points depart seriously from equal spacing, or when the design points are randomly chosen.

For choosing the bandwidth in nonparametric regression, ordinary cross-validation provides poor bandwidth estimates when the observations are correlated. For the case of short range dependent observations, a modified cross-validation is proposed. Based on the reduction of the asymptotic bias of the bandwidth estimates, the modified cross-validation provides asymptotically optimal bandwidths. However, the modified cross-validated bandwidth suffers from a large amount of sample variability.

Partitioned cross-validation is applied to reduce the sample variability. However, partitioned cross-validation provides asymptotically biased bandwidths. The two criteria are compared in terms of the asymptotic mean square errors of their bandwidth estimates. A simulation study shows that the performance of the two criteria depends on the amount of sample variability.
ACKNOWLEDGEMENTS

I wish to thank my advisor, Dr. Steve Marron, for his constant support, enthusiastic guidance, and helpful criticism in all parts of the research presented in this dissertation. I also wish to thank the other members of my committee, Dr. Edward Carlstein, Dr. Gordon Simons, Dr. Walter Smith, and Dr. Young Truong, for their valuable comments and suggestions. Special thanks go to my fellow students for their warm encouragement and help during the years I spent in Chapel Hill. Thanks to Miguel Nakamura for his excellent knowledge of GAUSS and T3. Thanks to Mike Frey for his support and helpful advice.

I am grateful to the Department of Statistics for providing a good study environment and financial support.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER I    INTRODUCTION TO NONPARAMETRIC REGRESSION
    1.1  Introduction
    1.2  Kernel Estimators
    1.3  Cross-validation
    1.4  Partitioned Cross-validation
    1.5  Summary of Results

CHAPTER II   COMPARISON OF TWO KERNEL ESTIMATORS
    2.1  Introduction
    2.2  A Comparison for the Unequally Spaced Fixed Design
    2.3  A Comparison for the Random Design
    2.4  An Improved Gasser-Mueller Estimator
    2.5  Discussion
    2.6  Proofs

CHAPTER III  A DATA-DRIVEN BANDWIDTH ESTIMATOR
    3.1  Introduction
    3.2  ARMA Process
    3.3  Behaviors of the Two Optimal Bandwidths
    3.4  Behaviors of the Modified Cross-validated Bandwidth
    3.5  Discussion
    3.6  Proofs

CHAPTER IV   THE PERFORMANCE OF THE MODIFIED CROSS-VALIDATED BANDWIDTH
    4.1  Introduction
    4.2  Limiting Distributions for Bandwidths
    4.3  Discussion
    4.4  Proofs

CHAPTER V    AN APPLICATION OF THE PARTITIONED CROSS-VALIDATION
    5.1  Introduction
    5.2  Partitioned Cross-validation
    5.3  Asymptotic Normality
    5.4  Discussion
    5.5  Proofs

CHAPTER VI   A SIMULATION STUDY
    6.1  Introduction
    6.2  Simulation Results
    6.3  Estimation of the Optimal Number of Subgroups for the PCV

BIBLIOGRAPHY
LIST OF TABLES

Table 6.1: The sample mean square errors (MSE) of the ratios of the modified cross-validated bandwidths and the partitioned cross-validated bandwidths to the optimal bandwidth for the independent observations.

Table 6.2: The sample mean square errors (MSE) of the ratios of the modified cross-validated bandwidths and the partitioned cross-validated bandwidths to the optimal bandwidth for the positively correlated observations with a large amount of sample variability.

Table 6.3: The sample mean square errors (MSE) of the ratios of the modified cross-validated bandwidths and the partitioned cross-validated bandwidths to the optimal bandwidth for the positively correlated observations with a small amount of sample variability.

Table 6.4: The sample mean square errors (MSE) of the ratios of the modified cross-validated bandwidths and the partitioned cross-validated bandwidths to the optimal bandwidth for the negatively correlated observations with a large amount of sample variability.

Table 6.5: The sample mean square errors (MSE) of the ratios of the modified cross-validated bandwidths and the partitioned cross-validated bandwidths to the optimal bandwidth for the negatively correlated observations with a small amount of sample variability.

Table 6.6: The average and the standard deviation of the estimated values of the theoretically optimal number of subgroups derived by the methods given in Solo (1981) and Truong (1989), and fitted by the AR(1) process of the regression errors.
LIST OF FIGURES

Figure 1.1: Plot of the regression function (dashed curve), the observations (solid squares), the Nadaraya-Watson estimate (solid curve), and the kernel function (dashed curve shown at bottom) for the sample size n = 200, the bandwidth h = 0.05, and the IID regression errors.

Figure 1.2: Plot of the regression function (dashed curve), the observations (solid squares), the Nadaraya-Watson estimate (solid curve), and the kernel function (dashed curve shown at bottom) for the sample size n = 200, the bandwidth h = 0.5, and the IID regression errors.

Figure 2.1: Plot of the relative weights assigned by the Nadaraya-Watson estimator (horizontal line) and the Gasser-Mueller estimator (vertical bars) to the observations for the case of λ = 1.

Figure 2.2: Plot of the relative weights assigned by the Nadaraya-Watson estimator (horizontal line) and the Gasser-Mueller estimator (vertical bars) to the observations for the case of λ = 1/2.

Figure 3.1: Relationships among the modified cross-validated bandwidths (solid vertical lines) and the optimal bandwidth (dashed vertical line) for the average function and the positively correlated observations.

Figure 3.2: Relationships among the modified cross-validated bandwidths (solid vertical lines) and the optimal bandwidth (dashed vertical line) for the average function and the negatively correlated observations.

Figure 3.3: Relationships among the modified cross-validated bandwidths (solid vertical lines) and the optimal bandwidth (dashed vertical line) for the average function and the one step positively correlated observations.

Figure 3.4: Relationships among the modified cross-validated bandwidths (solid vertical lines) and the optimal bandwidth (dashed vertical line) for the average function and the one step negatively correlated observations.

Figure 6.1: Kernel density estimates of the ratios of the modified cross-validated bandwidths to the optimal bandwidth for the independent observations.

Figure 6.2: Kernel density estimates of the ratios of the partitioned cross-validated bandwidths to the optimal bandwidth for the independent observations.

Figure 6.3: Kernel density estimates of the ratios of the modified cross-validated bandwidths to the optimal bandwidth for the positively correlated observations with a large amount of sample variability.

Figure 6.4: Kernel density estimates of the ratios of the partitioned cross-validated bandwidths to the optimal bandwidth for the positively correlated observations with a large amount of sample variability.

Figure 6.5: Kernel density estimates of the ratios of the modified cross-validated bandwidths to the optimal bandwidth for the positively correlated observations with a small amount of sample variability.

Figure 6.6: Kernel density estimates of the ratios of the partitioned cross-validated bandwidths to the optimal bandwidth for the positively correlated observations with a small amount of sample variability.

Figure 6.7: Kernel density estimates of the ratios of the modified cross-validated bandwidths to the optimal bandwidth for the negatively correlated observations with a large amount of sample variability.

Figure 6.8: Kernel density estimates of the ratios of the partitioned cross-validated bandwidths to the optimal bandwidth for the negatively correlated observations with a large amount of sample variability.

Figure 6.9: Kernel density estimates of the ratios of the modified cross-validated bandwidths to the optimal bandwidth for the negatively correlated observations with a small amount of sample variability.

Figure 6.10: Kernel density estimates of the ratios of the partitioned cross-validated bandwidths to the optimal bandwidth for the negatively correlated observations with a small amount of sample variability.
CHAPTER I

INTRODUCTION TO NONPARAMETRIC REGRESSION

1.1 Introduction
In regression, one seeks to find a relationship between the response Y and the design point x. Frequently there is a functional relationship between the mean response of Y and the design point x. If the functional form is known except for some unknown parameters, then the regression is called a parametric regression. A parametric regression model is often expressed as

Y = m(x, β) + ε.

Here m is some known function, β is a vector of unknown parameters, and ε is a zero mean regression error introduced to account for measurement and all other possible errors. If the functional form is unknown but smooth, then the regression is called a nonparametric regression. A nonparametric regression model is expressed as

Y = m(x) + ε.

In this case, m is a smooth unknown function, and ε is a zero mean regression error introduced to account for measurement and all other possible errors.
When appropriate, parametric regression models have some definite advantages. For example, the corresponding inferential methods usually have nice efficiency properties. Also, the parameters generally have some physical meaning which makes them interpretable and of interest in their own right. If the assumed parametric model is grossly in error, however, the above advantages of the parametric approach will not be realized. Thus, there are few benefits from using a poorly specified parametric form of m. Parametric models are most appropriate when theory, past experience, or other sources are available that provide detailed knowledge about the process under study.
The result of a nonparametric regression is a curve fitted to a set of data. Since this is produced without assuming a parametric form of m, there is some loss in the interpretability and efficiency of estimators obtained in this fashion. In contrast to physics or engineering, it is not often appropriate to give a specific functional relationship between the mean response of Y and the design point x in biological systems. One reason is that modeling is more difficult for living organisms; a second is that it may be difficult to find one well-fitting parametric family of functions even for descriptive purposes. Whenever there is no appropriate parametric model available, the data themselves may provide the information needed to suggest a parametric regression. Thus, nonparametric regression may represent the final stage of data analysis or merely an exploratory or confirmatory step in the modeling process. See, for example, the monographs by Eubank (1988), Mueller (1988), and Haerdle (1988) for a large variety of interesting real data examples where applications of this method have yielded analyses essentially unobtainable by other techniques.
In Section 1.2 the two most popular methods for recovering the unknown regression function m(x) are discussed. In Section 1.3 we discuss cross-validation, a method for choosing the bandwidth in nonparametric regression. Partitioned cross-validation, a method for improving the rate of convergence of the cross-validated bandwidth, is briefly discussed in Section 1.4. Finally, Section 1.5 contains a brief summary of the results obtained.
1.2 Kernel Estimators

Nonparametric regression, a smoothing method, has been the focus of substantial theoretical work in recent years and also has been applied increasingly to real data. Typical examples of nonparametric regression from the medical field are: the fixed design case, where the x_i are times at which the observations Y_i are made on a patient, and the random design case, in which X_i and Y_i are observed on the same patient, e.g. height and weight or blood pressure and cholesterol level.
The fixed design nonparametric regression model can be written as

(1.1)  Y_i = m(x_i) + ε_i,

for i = 1, 2, ..., n. Here m is a smooth unknown regression function defined on the interval [0,1] (without loss of generality), the x_i are nonrandom design points with 0 ≤ x_1 ≤ x_2 ≤ ... ≤ x_n ≤ 1, the Y_i are noisy observations of the regression function m at the design points x_i, and the ε_i are uncorrelated random variables with mean zero and finite variance σ². If the x_i are equally spaced, i.e. x_i = i/n, then (1.1) is called the equally spaced fixed design nonparametric regression model.
If the regression function m is believed to be smooth, then the observations at points x_i near x should contain information about the value of m at x. Kernel estimators are local weighted averages of the response variables, i.e. they estimate m(x) by a weighted average of those Y_i which have x_i close to x. The width of the neighborhood in which averaging is performed is called the bandwidth or smoothing parameter. The kernel function is a given function used to calculate the weights assigned to the observations. In this dissertation, the kernel function K is taken as a probability density function. The two most popular methods for constructing kernel estimators involve choosing weights either by kernel evaluation or by subinterval integration. For other related kernel estimators, see, for example, Mack and Mueller (1989) and the monographs by Eubank (1988), Mueller (1988), and Haerdle (1988).
To construct kernel estimators, Nadaraya (1964) and Watson (1964) first proposed choosing weights by evaluating a kernel function at the design points and then dividing by the sum of the weights, so that they add up to one. Then Gasser and Mueller (1979) proposed choosing weights as the integrals of the kernel function over small subintervals which contain the design points. In both situations, given a kernel function K, a bandwidth h, and K_h(·) = h^{-1} K(·/h), for 0 < x < 1, the Nadaraya-Watson estimator is defined by

(1.2)  m̂_NW(x) = [ Σ_{i=1}^n K_h(x - x_i) Y_i ] / [ Σ_{i=1}^n K_h(x - x_i) ]

(if the denominator is zero, take m̂_NW(x) = 0), and the Gasser-Mueller estimator is defined by

(1.3)  m̂_GM(x) = Σ_{i=1}^n [ ∫_{s_{i-1}}^{s_i} K_h(x - t) dt ] Y_i,

where s_0 = 0, s_n = 1, and x_i ≤ s_i ≤ x_{i+1} for i = 1, 2, ..., n-1. See Gasser and Mueller (1979) for a different choice of the kernel function.
If the design points x_i are equally spaced, then the two estimators are almost equivalent to each other, as shown by the integral mean value theorem. The two estimators are equivalent in their asymptotic mean square error, almost sure convergence, L_2 convergence, rate of convergence, and asymptotic normality.
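To make the two constructions concrete, the following sketch (in Python; the dissertation's own computations were done in GAUSS, and all names here are illustrative) evaluates both estimators on a fixed design, using the bi-weight kernel introduced later in Section 1.3 and the midpoints between consecutive design points as one reasonable choice of the integral points s_i.

    import numpy as np

    def biweight(u):
        # Bi-weight kernel supported on [-1/2, 1/2]; it integrates to one.
        return (15.0 / 8.0) * (1.0 - 4.0 * u**2) ** 2 * (np.abs(u) <= 0.5)

    def nadaraya_watson(x, xs, ys, h, kernel=biweight):
        # Weights obtained by kernel evaluation at the design points,
        # normalized so that they add up to one.
        w = kernel((x - xs) / h) / h
        denom = w.sum()
        return 0.0 if denom == 0 else float(w @ ys) / denom

    def gasser_mueller(x, xs, ys, h, kernel=biweight, n_grid=200):
        # Weights obtained by integrating K_h(x - t) over subintervals
        # [s_{i-1}, s_i]; midpoints between design points are one reasonable
        # choice of the integral points s_i (not the only one in the text).
        s = np.concatenate(([0.0], (xs[:-1] + xs[1:]) / 2.0, [1.0]))
        est = 0.0
        for i in range(len(xs)):
            t = np.linspace(s[i], s[i + 1], n_grid)
            est += np.trapz(kernel((x - t) / h) / h, t) * ys[i]
        return est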
The random design nonparametric regression model is written as

(1.4)  Y_i = m(X_i) + ε_i,

for i = 1, 2, ..., n. In this case, m, Y_i, and ε_i are assumed to have the same properties as they did in (1.1); only the X_i are different. The X_i are independent and identically distributed (IID) random variables with a density function f on the interval [0,1], and X_i and ε_i are uncorrelated random variables.

For the random design case, m̂_NW(x) is expressed as

(1.5)  m̂_NW(x) = [ Σ_{i=1}^n K_h(x - X_i) Y_i ] / [ Σ_{i=1}^n K_h(x - X_i) ]

(if the denominator is zero, take m̂_NW(x) = 0), and m̂_GM(x) is expressed as

(1.6)  m̂_GM(x) = Σ_{i=1}^n [ ∫_{s_{i-1}}^{s_i} K_h(x - t) dt ] Y_(i).

Here the Y_(i) are the observations corresponding to the i-th order statistic X_(i) of the random design points. The integral points s_i are taken as

s_i = λ X_(i) + (1 - λ) X_(i+1),  0 ≤ λ ≤ 1,

for i = 1, 2, ..., n-1, and s_0 and s_n are any reasonable choice, e.g. s_0 = 0 and s_n = 1. In the random design case, see Mack and Mueller (1989) for other kernel estimators.
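A minimal sketch of this integral-point construction, with λ as the weight on the lower order statistic (the boundary choice s_0 = 0, s_n = 1 follows the text; the function name is illustrative):

    import numpy as np

    def gm_integral_points(x_sorted, lam):
        # s_i = lam * X_(i) + (1 - lam) * X_(i+1), i = 1, ..., n-1,
        # with s_0 = 0 and s_n = 1 as one reasonable boundary choice.
        inner = lam * x_sorted[:-1] + (1.0 - lam) * x_sorted[1:]
        return np.concatenate(([0.0], inner, [1.0]))

    # lam = 1 uses the order statistics themselves as integral points;
    # lam = 1/2 uses midpoints, the choice shown in Chapter 2 to minimize
    # the asymptotic variance of the Gasser-Mueller estimator.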
A drawback of the Nadaraya-Watson estimator in the random design case is that the estimator becomes tricky to analyze from a technical point of view. Indeed, for some high order kernel functions, Haerdle and Marron (1983) have shown that the moments of the Nadaraya-Watson estimator can fail to exist. A high order kernel function requires the value of the kernel function to be negative in some places (see Gasser and Mueller (1979)).

For the random design case, Devroye and Wagner (1980) showed the weak convergence of m̂_NW(x), and Schuster (1972) and Stute (1984) established an asymptotic normality of m̂_NW(x) under different conditions. Based on the existence of the continuous second derivative m'' of m and the continuous first derivative f' of f, Rosenblatt (1969) and Collomb (1981) gave an asymptotic mean square error of m̂_NW(x) based on

(1.7)  Var(m̂_NW(x)) = n^{-1} h^{-1} (f(x))^{-1} σ² ∫K² + o(n^{-1}h^{-1}),

(1.8)  E(m̂_NW(x)) = m(x) + h² (m''/2 + m'f'/f)(x) ∫u²K + o(h²) + O(n^{-1}h^{-1}).
Here and throughout this chapter, the notation ∫ denotes ∫ · du. Working under the same conditions as Rosenblatt (1969) and Collomb (1981), Jennen-Steinmetz and Gasser (1987) and Mack and Mueller (1989) gave an asymptotic mean square error of m̂_GM(x) based on

(1.9)  Var(m̂_GM(x)) = 2 n^{-1} h^{-1} (f(x))^{-1} σ² ∫K² + o(n^{-1}h^{-1}),

(1.10)  E(m̂_GM(x)) = m(x) + (1/2) h² m''(x) ∫u²K + o(h²) + O(n^{-1}),

where the integral points of m̂_GM(x) are taken as the order statistics of the design points, i.e. λ is taken as 1 in (1.6). Note that m̂_GM(x) has twice the variance of m̂_NW(x). Jennen-Steinmetz and Gasser (1987) and Mack and Mueller (1989) mentioned that m̂_NW(x) is unattractive because its bias term has the extra factor m'f'/f. For the random design case, there is no overall conclusion about which kernel estimator is better in the sense of asymptotic mean square error.
1.3 Cross-validation

The magnitude of the bandwidth controls the smoothness of the resulting estimate of m(x). In practice, we need to choose a value of the bandwidth and plug it into the kernel estimator. Figures 1.1 and 1.2 show a simulated regression setting with 100 equally spaced design points. The observations are generated by adding pseudo-random normal variables ε_i, with mean 0 and variance σ² = 0.0015², to the regression curve m(x) = x³(1-x)³ for 0 ≤ x ≤ 1. The kernel function is taken as the bi-weight kernel, K(x) = (15/8)(1 - 4x²)² I_[-1/2,1/2](x). The goal of the regression is to use the observations and the Nadaraya-Watson estimator (1.2) to recover the curve m(x). If the bandwidth is too small, i.e. the average is taken over only a few observations, then the resulting estimate of m(x) is undersmoothed (see Figure 1.1). On the other hand, if the bandwidth is too large, i.e. the average is taken over many observations, then the resulting estimate of m(x) is oversmoothed (see Figure 1.2). This is the essence of the smoothing problem.
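A sketch of this simulation setting, reusing the nadaraya_watson and biweight functions from the Section 1.2 sketch (the sample size is taken as n = 200 to match the figure captions, although the text above mentions 100 design points; the seed is arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200                                   # figure captions use n = 200
    x = np.arange(1, n + 1) / n               # equally spaced fixed design
    m = x**3 * (1.0 - x)**3                   # regression curve m(x) = x^3 (1-x)^3
    y = m + rng.normal(0.0, 0.0015, size=n)   # IID N(0, 0.0015^2) regression errors

    # Undersmoothed and oversmoothed Nadaraya-Watson fits, as in Figures 1.1 and 1.2.
    fit_h_small = np.array([nadaraya_watson(xi, x, y, h=0.05) for xi in x])
    fit_h_large = np.array([nadaraya_watson(xi, x, y, h=0.5) for xi in x])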
In the extreme case, as the bandwidth approaches zero, the kernel estimators approach a δ-function, i.e. the estimate of m(x) is Y_i if x equals x_i for some i, and otherwise the estimate is zero. This is because no design points are included in a sufficiently small neighbourhood of x if x is not a design point. On the other hand, if the bandwidth approaches infinity, then the kernel estimators become the overall average of the observations, i.e. the estimate of m(x) is Ȳ = n^{-1} Σ_{i=1}^n Y_i for all x. This is due to the fact that all design points are included in a sufficiently large neighbourhood of x, for all x.

Figure 1.1: Plot of the regression function (dashed curve), the observations (solid squares), the Nadaraya-Watson estimate (solid curve), and the kernel function (dashed curve shown at bottom) for the sample size n = 200, the bandwidth h = 0.05, and the IID regression errors N(0, 0.0015²).

Figure 1.2: Plot of the regression function (dashed curve), the observations (solid squares), the Nadaraya-Watson estimate (solid curve), and the kernel function (dashed curve shown at bottom) for the sample size n = 200, the bandwidth h = 0.5, and the IID regression errors N(0, 0.0015²).
The two optimal bandwidths, h_A and h_M, are taken as the minimizers of the average square error (ASE) and the mean average square error (MASE), respectively. The ASE function is defined by

(1.11)  ASE(h) = n^{-1} Σ_{j=1}^n [ m̂(x_j) - m(x_j) ]² W(x_j),

where the m̂(x_j) are kernel estimators of the m(x_j) for every j. The weight function W is introduced to allow elimination (or at least significant reduction) of boundary effects, by taking W to be supported on a subinterval of the unit interval (see Gasser and Mueller (1979)). The MASE function is defined by

(1.12)  MASE(h) = E(ASE(h)).

We call h_A and h_M the optimal bandwidths among all possible choices of the bandwidth h because the estimated curve m̂(x), as derived with the bandwidth h_A, is the closest to the regression function m(x) for the data set at hand. Similarly, h_M makes m̂(x) closest to m(x) according to the average over all possible data sets. In practice, the values of the two optimal bandwidths h_A and h_M are not available because these quantities depend on the unknown function m(x). Since the values of h_A and h_M cannot be calculated, bandwidth selection methods have been designed with the purpose of providing bandwidths which approximate h_A and h_M in some mode. Most bandwidth selection methods are based on minimization of some function of the bandwidth h.
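In a simulation, where m is known, the ASE criterion can be evaluated directly; a minimal sketch (illustrative names, assuming an estimator with the call signature of the earlier nadaraya_watson sketch, and an interior indicator weight W):

    import numpy as np

    def ase(h, x, y, m_true, estimator, w_lo=0.1, w_hi=0.9):
        # ASE(h) = n^{-1} * sum_j (mhat(x_j) - m(x_j))^2 * W(x_j), with W the
        # indicator of an interior subinterval to reduce boundary effects.
        w = (x >= w_lo) & (x <= w_hi)
        mhat = np.array([estimator(xj, x, y, h) for xj in x])
        return float(np.mean((mhat - m_true) ** 2 * w))

    # The ASE-optimal bandwidth h_A can be approximated by evaluating ase()
    # over a grid of h values and taking the minimizer.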
For the equally spaced fixed design nonparametric regression model (1.1), Clark (1975) introduced the cross-validation idea as the bandwidth-selection rule for kernel estimators. The essential idea is based upon the fact that the regression function m(x) is the best mean square predictor of a new observation taken at x. This idea suggests choosing the bandwidth so that m̂(x) is a good predictor or, in other words, taking the bandwidth as the minimizer of the estimated prediction error

EPE(h) = n^{-1} Σ_{j=1}^n [ m̂(x_j) - Y_j ]² W(x_j).

This criterion has a problem, namely that it has a trivial minimum at h = 0. This is because, as h approaches zero, m̂(x) becomes a δ-function. This problem can be viewed as a result of using the same data both to construct and to assess the estimator. A reasonable approach to this problem is provided by replacing m̂(x_j) with m̂_j(x_j), where m̂_j(x_j) is the "leave-one-out" version of m̂(x_j), i.e. the observation (x_j, Y_j) is left out in constructing m̂(x_j). To illustrate, for the Nadaraya-Watson estimator, m̂_j(x_j) is defined by

(1.13)  m̂_j(x_j) = [ (n-1)^{-1} Σ_{i≠j} K_h(x_j - x_i) Y_i ] / [ (n-1)^{-1} Σ_{i≠j} K_h(x_j - x_i) ].
Hence, the cross-validation idea is to choose the bandwidth by minimizing the cross-validation score

(1.14)  CV(h) = n^{-1} Σ_{j=1}^n [ m̂_j(x_j) - Y_j ]² W(x_j).

In the extreme case, as the bandwidth h approaches zero, CV(h) approaches n^{-1} Σ_{j=1}^n Y_j² W(x_j). On the other hand, if h approaches infinity, then CV(h) approaches n^{-1} Σ_{j=1}^n [ Ȳ - Y_j ]² W(x_j). The minimizer, ĥ_CV, of CV(h) is called the cross-validated bandwidth. See Haerdle and Marron (1985) for a motivation of this criterion.
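A sketch of the cross-validation score (1.14) for the Nadaraya-Watson estimator, reusing the biweight kernel from the Section 1.2 sketch; the interior weight window and the grid of candidate bandwidths are illustrative choices:

    import numpy as np

    def cv_score(h, x, y, kernel=biweight, w_lo=0.1, w_hi=0.9):
        # CV(h) = n^{-1} * sum_j (mhat_j(x_j) - Y_j)^2 * W(x_j), where mhat_j
        # is the leave-one-out Nadaraya-Watson estimate at x_j.
        n = len(x)
        w = (x >= w_lo) & (x <= w_hi)
        score = 0.0
        for j in range(n):
            kw = kernel((x[j] - x) / h) / h
            kw[j] = 0.0                        # leave observation j out
            denom = kw.sum()
            mhat_j = float(kw @ y) / denom if denom > 0 else 0.0
            score += (mhat_j - y[j]) ** 2 * w[j]
        return score / n

    # The cross-validated bandwidth h_CV is the minimizer of cv_score over a
    # grid of candidate bandwidths, e.g.
    # h_grid = np.linspace(0.02, 0.5, 50)
    # h_cv = h_grid[np.argmin([cv_score(h, x, y) for h in h_grid])]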
In the case of independent observations, there is another popular bandwidth-selection rule. Mallows' criterion is to choose the bandwidth by minimizing the risk function

(1.15)  R̂(h) = n^{-1} Σ_{j=1}^n [ m̂(x_j) - Y_j ]² W(x_j) + 2 σ̂² n^{-1} h^{-1} K(0).

Here R̂(h) is an asymptotically unbiased estimate of the MASE when nh → ∞. See Mallows (1973) for details. In practice, σ² is usually replaced by

σ̂² = (2(n-1))^{-1} Σ_{j=1}^{n-1} [ Y_{j+1} - Y_j ]².
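A one-line sketch of this first-difference variance estimate (names are illustrative):

    import numpy as np

    def sigma2_hat(y):
        # sigma^2_hat = (2(n-1))^{-1} * sum_{j=1}^{n-1} (Y_{j+1} - Y_j)^2
        d = np.diff(y)
        return float(np.sum(d ** 2)) / (2.0 * (len(y) - 1))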
For bandwidth selection, Rice (1984) worked with the reciprocal of h and the equally spaced fixed, circular design Mallows' criterion. The equally spaced fixed, circular design means that the observations Y_{j+kn} at the design points (j+kn)/n are the same as the observations Y_j at the design points j/n, for all integers k and j = 1, 2, ..., n. Finally, see Rice (1984) and Haerdle, Hall, and Marron (1988) for other bandwidth-selection rules.
After one bandwidth selection rule has been chosen, we arrive at a bandwidth estimate ĥ. We need to measure how good the bandwidth estimate is. Suppose h is in some set H_n ⊂ R+ which is a range of bandwidths of interest. Let the distance d(m̂_h, m) denote a given measure of accuracy for the kernel estimator m̂_h, where the subscript h means that the kernel estimator m̂_h involves the bandwidth h. Following Shibata (1981), the bandwidth-selection rule ĥ is said to be asymptotically optimal with respect to the measure d if

lim_{n→∞} [ d(m̂_ĥ, m) / inf_{h∈H_n} d(m̂_h, m) ] = 1

with probability one.
Based on the equally spaced fixed design nonparametric regression model (1.1), Rice (1984) showed that, for the circular design case, Mallows' criterion provides weakly consistent estimates of h_M with a relative rate n^{-1/10}. In the same case, Haerdle, Hall, and Marron (1988) showed that ĥ_CV, h_A, and h_M approach zero at the same rate n^{-1/5}, and that ĥ_CV converges weakly to h_A and h_M with a rate n^{-3/10}. From Haerdle, Hall, and Marron (1988), it follows that ĥ_CV converges to h_A and h_M with a relative rate n^{-1/10}. For the random design case, Haerdle and Marron (1985) showed that ĥ_CV is an asymptotically optimal bandwidth estimator with respect to the measures ASE and MASE.

For the equally spaced fixed design nonparametric regression model (1.1), Haerdle, Hall, and Marron (1988) compared cross-validation and some other bandwidth selection methods. They showed that the bandwidths derived by these bandwidth selection methods are asymptotically equivalent to one another. Hence, all properties described for the cross-validated bandwidth apply to these as well. See Marron (1988), a survey paper, for the properties of other bandwidth selectors.
1.4 Partitioned Cross-validation

In kernel density estimation, the cross-validated bandwidth also suffers an excruciatingly slow relative rate of convergence, n^{-1/10}, to the optimal bandwidths (see Hall and Marron (1987)). For kernel density estimation, Marron (1988) proposed the partitioned cross-validation (PCV) criterion, which improves the relative rate of convergence of the cross-validated bandwidth from n^{-1/10} to n^{-1/4}. The idea of PCV can be applied to nonparametric regression and all other smoothing methods.

The basic idea of the PCV is to use extra smoothness of the regression function m(x): split the observations into g subgroups by taking every g-th observation, calculate the cross-validation score for each subgroup separately, and minimize the average of these score functions. The resulting bandwidth is rescaled by the number of subgroups g to get the partitioned cross-validated bandwidth ĥ_PCV(g). The asymptotic mean square error (AMSE) of (ĥ_PCV(g)/h_M - 1) then depends on the parameters g and n, and on other factors. For the PCV, we minimize the AMSE of (ĥ_PCV(g)/h_M - 1) with respect to g and get a theoretically optimal value of g yielding the smallest value of the AMSE. One drawback of this approach is that the optimal value of g depends on the unknown regression function m(x). To address this drawback, we use the "plug-in" approach, in Section 6.3, to show how the value of g is arrived at from the simulated data sets.
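A sketch of the partitioned cross-validation recipe just described; cv_score_fn can be the cv_score sketch of Section 1.3, and the g^{-1/5} rescaling is an assumption that matches the n^{-1/5} rate of the optimal bandwidth rather than a quotation from Chapter 5:

    import numpy as np

    def pcv_bandwidth(x, y, g, h_grid, cv_score_fn):
        # Split the data into g subgroups by taking every g-th observation,
        # average the ordinary CV scores of the subgroups, minimize over h,
        # and rescale the minimizer back to the full sample size.
        scores = np.zeros(len(h_grid))
        for k in range(g):
            xs, ys = x[k::g], y[k::g]
            scores += np.array([cv_score_fn(h, xs, ys) for h in h_grid])
        h_sub = h_grid[np.argmin(scores / g)]
        # Rescaling by g**(-1/5) is an assumed choice consistent with the
        # n^{-1/5} bandwidth rate; it is not quoted from Chapter 5.
        return h_sub * g ** (-1.0 / 5.0)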
1.5 Summary of Results

In Chapter 2, we compare the two kernel estimators m̂_NW(x) and m̂_GM(x) in the unequally spaced fixed design and random design cases. In both cases we find different asymptotic representations for the asymptotic biases of m̂_NW(x). We also find that, in the random design case, the minimum asymptotic variance of m̂_GM(x) is arrived at by taking the integral points of m̂_GM(x) as the middle points of each consecutive pair of order statistics of the design points. In both cases, the asymptotic variances of m̂_GM(x) are greater than those of m̂_NW(x).

In Chapter 3, we explore the effect of dependent observations on the modified cross-validated bandwidth. When the observations are subject to a short range dependence, the asymptotically optimal bandwidth is attainable by deleting enough points in the construction of the modified cross-validation score.

In Chapter 4, we derive an asymptotic normality of the modified cross-validated bandwidth when the observations are subject to a short range dependence. The relative rate of convergence of the modified cross-validated bandwidth to h_M is n^{-1/10}, which is the same as in the case of independent observations.

In Chapter 5, we use the partitioned cross-validation to improve the relative rate of convergence of the cross-validated bandwidth to h_M when the observations are subject to a short range dependence. Here the results are quite surprising in view of those from Marron (1988).

In Chapter 6, we carry out a simulation study to compare the performance of the modified cross-validated and the partitioned cross-validated bandwidths when the regression errors are an AR(1) process from time series analysis.
CHAPTER II

COMPARISON OF TWO KERNEL ESTIMATORS

2.1 Introduction

Kernel estimators are local weighted averages of the response variables. The two most popular methods for constructing kernel estimators, in the case of the equally spaced fixed design, involve choosing weights either by kernel evaluation or by subinterval integration. If the design points are equally spaced, then the two estimators are very nearly the same, as demonstrated by the integral mean value theorem. The purpose of this chapter is to compare the two estimators for the unequally spaced fixed design and the random design. Then, the estimator that proves more efficient will be chosen for further work.

Section 2.2 uses an example to illustrate the efficiency of the two kernel estimators in the case of the unequally spaced fixed design. A comparison of the two estimators for the random design is given in Section 2.3. An improved version of the Gasser-Mueller estimator is given in Section 2.4. Section 2.5 contains a discussion of our results. Finally, the proofs are given in Section 2.6.

2.2 A Comparison for the Unequally Spaced Fixed Design
Given the equally spaced fixed design nonparametric regression model (1.1), the Nadaraya-Watson estimator (1.2), and the Gasser-Mueller estimator (1.3), we will construct an example of an unequally spaced fixed design to illustrate the point that the kernel estimator (1.3) is inefficient in the case of the unequally spaced fixed design.

For the Gasser-Mueller estimator m̂_GM(x), when looking at the weights ∫_{s_{i-1}}^{s_i} K_h(x-t) dt assigned to the observations Y_i, it becomes immediately clear that Y_i will be downweighted if s_{i-1} and s_i are too close. We study this effect by starting with an equally spaced fixed design and considering consecutive triples of design points. For each triple, we move the first and the third points toward the center. The amount of shift toward the center can be parameterized by a value a ∈ [0,1]. This results in the design points

(2.2.1)  x_i = (3ℓ+2-a)/n if i = 3ℓ+1,  (3ℓ+2)/n if i = 3ℓ+2,  (3ℓ+2+a)/n if i = 3ℓ+3,

for ℓ = 0, 1, ..., (n/3)-1. Here, for convenience of notation, we assume that n is a multiple of 3. Note that a = 1 is the usual equally spaced fixed design, while a = 0 is a design which is also essentially equally spaced but which has three replications at each point.
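A sketch generating the design points (2.2.1) (illustrative names):

    import numpy as np

    def clustered_design(n, a):
        # Design points (2.2.1): within each triple the outer two points sit
        # at distance a/n from the center point (3*ell + 2)/n.
        assert n % 3 == 0
        ell = np.arange(n // 3)
        x = np.empty(n)
        x[0::3] = (3 * ell + 2 - a) / n        # i = 3*ell + 1
        x[1::3] = (3 * ell + 2) / n            # i = 3*ell + 2
        x[2::3] = (3 * ell + 2 + a) / n        # i = 3*ell + 3
        return x

    # a = 1 gives the usual equally spaced design; a = 0 gives three
    # replications at each point (3*ell + 2)/n.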
As a approaches zero, the weight given to the center observation is essentially zero. This is because the integral points on either side of the center observation Y_{3ℓ+2} approach each other. In this case, m̂_GM(x) is making use of only 2/3 of the available observations, hence its inefficiency. However, in this case, the Nadaraya-Watson estimator m̂_NW(x) is nearly independent of a. This is because the weights assigned to each triple of observations Y_{3ℓ+1}, Y_{3ℓ+2}, and Y_{3ℓ+3} are roughly equal to n^{-1} K_h(x - x_{3ℓ+2}), as determined by the continuity of K and (2.6.1) given later. The extent of the inefficiency of the Gasser-Mueller estimator can be quantified by studying the asymptotic variances of the two kernel estimators. We are not claiming that this unequally spaced fixed design is important; it is considered here because it provides a clear and simple illustration of the point being made. The effects described will provide an intuitive basis for understanding the causes of similar effects in the more complicated random design case.
We now do an asymptotic analysis for the design points (2.2.1). Assume that the kernel function K is a probability density function with support contained in the interval [-1,1], and that the kernel function K and the regression function m are Hoelder continuous of order 1. The definition of Hoelder continuity of order 1 of a function g is that there is a constant M such that

|g(s) - g(t)| ≤ M |s - t|

for any s and t in the domain of the function g. Then as n → ∞, h → 0, with nh → ∞, for 0 < x < 1, it is shown in Section 2.6 that

(2.2.2)  Var(m̂_NW(x)) = n^{-1} h^{-1} σ² ∫K² + O(n^{-2}h^{-2}),

(2.2.3)  Var(m̂_GM(x)) = C(a) n^{-1} h^{-1} σ² ∫K² + O(n^{-2}h^{-2}),

(2.2.4)  Bias(m̂_NW(x)) = ∫K_h(x-t)(m(t) - m(x)) dt + O(n^{-1}),

(2.2.5)  Bias(m̂_GM(x)) = ∫K_h(x-t)(m(t) - m(x)) dt + O(n^{-1}),

where C(a) = 1 + (1/2)(a-1)², 0 ≤ a ≤ 1. Here and throughout this chapter, the notation ∫ denotes ∫ · du.
Looking at (2.2.3), we observe that when a = 1 (the minimizer of C(a)), the two estimators have the same performance, which, as noted above, is expected from the integral mean value theorem. However, for any a ∈ [0,1), the variance of m̂_GM(x) is larger than that of m̂_NW(x). In the extreme case of a = 0, note that m̂_GM(x) has 3/2 times the variance of m̂_NW(x). This is because m̂_GM(x) is only using 2/3 of the available data. Of course, if one really had three replications at each design point (as we have when a = 0), then the obvious thing to do is to pool, i.e. work with the average of the observations at these points. One of the features of m̂_NW(x) is that it makes this adjustment automatically as a → 0. On the other hand, m̂_GM(x) has a disturbing tendency to delete observations. The biases of the two estimators are exactly the same in their asymptotic expression and rate of convergence. So this example suggests that the variance of m̂_GM(x) will be increased by integral points that are too close together.
The above results may be generalized in a straightforward manner to the case of forming clusters of k points, instead of the 3-point case. When this is done, (2.2.2), (2.2.4), and (2.2.5) remain the same, except that (2.2.3) becomes

(2.2.6)  Var(m̂_GM(x)) = C_k(a) n^{-1} h^{-1} σ² ∫K² + O(n^{-2}h^{-2}),

where C_k(a) = 1 + (1/2)(a-1)²(k-2), 0 ≤ a ≤ 1 and k ≥ 2. Note that the downweighting effect of m̂_GM(x) can be magnified when k ≥ 3, subject of course to the fact that these asymptotics describe only the situation where nh » k.
2.3 A Comparison for the Random Design

In the random design case, it is clear that, by chance, some design points will be closer to their neighbours than others are. It is expected that the efficiency of the Gasser-Mueller estimator is affected by this situation. Given the random design nonparametric regression model (1.4), the Nadaraya-Watson estimator (1.5), and the Gasser-Mueller estimator (1.6), we shall do an asymptotic analysis of this point.

Using the integral mean value theorem and the Hoelder continuity of K, the ratio of the weights assigned to the observations Y_(i) by the Nadaraya-Watson estimator m̂_NW(x) and the Gasser-Mueller estimator m̂_GM(x) is roughly equal to (1/n) : D_i, where D_i = s_i - s_{i-1}. Figures 2.1 and 2.2 use the same 100 Uniform[0,1] pseudo-random variables to show graphically how strong the variability of the integral points is in the random design case. Figure 2.1 shows the case λ = 1 and Figure 2.2 shows the case λ = 1/2. Note that the relative weights differ across the observations to a surprising degree, with a substantial number of the points significantly downweighted, which means that the Gasser-Mueller estimator is making very inefficient use of the data. In view of (2.2.3), this effect is expected to give a dramatically increased variance of m̂_GM(x) with respect to m̂_NW(x), whose relative weights are roughly illustrated by the horizontal line. It is also expected that the variance of m̂_GM(x) at λ = 1/2 is less than that at λ = 1.

To make the above ideas more precise, we now start an asymptotic analysis.
Figure 2.1: Plot of the relative weights assigned by the Nadaraya-Watson estimator (horizontal line) and the Gasser-Mueller estimator (vertical bars) to the observations for the case of λ = 1.

Figure 2.2: Plot of the relative weights assigned by the Nadaraya-Watson estimator (horizontal line) and the Gasser-Mueller estimator (vertical bars) to the observations for the case of λ = 1/2.
Assume that the kernel function K, the regression function m, and the density function f of the design points are Hoelder continuous of order 1, and that K is a probability density function with support contained in the interval [-1,1]. Without loss of generality, K has a positive lower bound ℓ_K on [-1/2, 1/2] and f has a positive lower bound ℓ_f on [0,1]. This positive lower bound of f guarantees that the design has no holes on [0,1]. Then as n → ∞, with n^{-1+ε} < h < n^{-ε} for some ε ∈ (0, 1/2), and 0 < x < 1, it is shown in Section 2.6 that

(2.3.1)  Var(m̂_NW(x)) = n^{-1} h^{-1} (f(x))^{-1} σ² ∫K² + o(n^{-1}h^{-1}),

(2.3.2)  Var(m̂_GM(x)) = 2(1 - λ + λ²) n^{-1} h^{-1} (f(x))^{-1} σ² ∫K² + o(n^{-1}h^{-1}),

(2.3.3)  Bias(m̂_NW(x)) = [ ∫K_h(x-t)(m(t) - m(x)) f(t) dt ] / [ ∫K_h(x-t) f(t) dt ] + O(n^{-1}h),

(2.3.4)  Bias(m̂_GM(x)) = ∫K_h(x-t)(m(t) - m(x)) dt + O(n^{-1}).

The above results are based on the assumption of the Hoelder continuity of m and f, which is a little weaker than the continuity of the second derivative m'' of m and the first derivative f' of f needed by Rosenblatt (1969), Collomb (1981), Jennen-Steinmetz and Gasser (1987), and Mack and Mueller (1989).
Given λ = 1/2 in (2.3.2), the Gasser-Mueller estimator has the minimum asymptotic variance

1.5 n^{-1} h^{-1} (f(x))^{-1} σ² ∫K² + o(n^{-1}h^{-1}).

Thus, the choice of λ = 1, made in Jennen-Steinmetz and Gasser (1987) and Mack and Mueller (1989), seems inadvisable in practice. In the uniform random design case, both asymptotic biases have the same asymptotic expression ∫K_h(x-t)(m(t) - m(x)) dt. So, in the uniform random design case, we can conclude that m̂_NW(x) is better than m̂_GM(x) because m̂_NW(x) has a smaller value of asymptotic mean square error.
has a smaller value of asymptotic mean square error.
An Improved Gasser-Mueller Estimator
The results of Section 2.3 clearly show that the increased
"
variance of mGM(x) in the random design case is caused by the
instability of the integral points si'
A means of reducing this
instability is to average together more order statistics in the
definition of the integral points si'
One means of doing this, given a
positive integer k, is to define
(2.4. 1)
for i = k. k+l:
2k
satisfy
:I
13
j=1
j
n-k, where the weights f3
= I,
j
are nonnegative and
and where sO' sl' ... , sk-l and sn-k+l' ... , sn are
defined in any reasonable manner (again the exact definition does not
affect our main ideas).
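A sketch of the integral points (2.4.1); the boundary points are omitted, and the default weights are the variance-minimizing choice β_j = 1/(2k) discussed below (names are illustrative):

    import numpy as np

    def improved_integral_points(x_sorted, k, beta=None):
        # Integral points (2.4.1): s_i = sum_{j=1}^{2k} beta_j * X_(i-k+j),
        # a weighted average of 2k neighbouring order statistics, for
        # i = k, ..., n-k; the boundary points are left undefined here.
        n = len(x_sorted)
        if beta is None:
            beta = np.full(2 * k, 1.0 / (2 * k))   # variance-minimizing weights
        s = {}
        for i in range(k, n - k + 1):
            # X_(i-k+1), ..., X_(i+k) are x_sorted[i-k], ..., x_sorted[i+k-1]
            s[i] = float(beta @ x_sorted[i - k: i + k])
        return s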
Assumptions for an asymptotic analysis of the new version of m̂_GM(x) are the same as the assumptions given in Section 2.3. Then as n → ∞, h → 0, with nh → ∞, it is shown in Section 2.6 that

(2.4.2)  Var(m̂_GM(x)) = (1 + Σ_{j=1}^{2k} β_j²) n^{-1} h^{-1} (f(x))^{-1} σ² ∫K² + o(n^{-1}h^{-1}),

(2.4.3)  Bias(m̂_GM(x)) = ∫K_h(x-t)(m(t) - m(x)) dt + O(n^{-1}).

From (2.4.2), it is easy to see that the best choice of the β_j is β_j = 1/(2k), for j = 1, 2, ..., 2k. In this case, the Gasser-Mueller estimator has the minimum asymptotic variance

(1 + 1/(2k)) n^{-1} h^{-1} (f(x))^{-1} σ² ∫K² + o(n^{-1}h^{-1}).

Thus the amount by which m̂_NW(x) improves over m̂_GM(x) can be reduced, if k is large. Of course, a practical limitation is that these asymptotics are only meaningful when nh » k.
2.5 Discussion

Through the instability of the integral points s_i of the Gasser-Mueller estimator m̂_GM(x), the observations with close neighbours will be downweighted. This effect causes the inefficiency of m̂_GM(x). When the observations are dependent, we expect that the minimum variance of m̂_GM(x) is arrived at by taking the middle points of each consecutive pair of order statistics of the design points as the integral points. We also expect that, even in this case, the variance of m̂_GM(x) is still larger than that of m̂_NW(x).

In further analysis of (2.3.3) and (2.3.4), if m and f have three continuous derivatives at x and the kernel function K is a symmetric density function, then by Taylor's theorem, the asymptotic biases of the two kernel estimators are

(2.5.1)  Bias(m̂_NW(x)) = h² (2f(x))^{-1} (m''f + 2m'f')(x) ∫u²K + O(h³ + n^{-1}h),

(2.5.2)  Bias(m̂_GM(x)) = (1/2) h² m''(x) ∫u²K + O(h³ + n^{-1}).

Unfortunately, these expressions are not directly comparable. In some cases, one asymptotic bias will be bigger in magnitude (depending on the signs of m''(x)f(x) and m'(x)f'(x)), while in other cases, the other asymptotic bias will be bigger. The bias of m̂_GM(x) is certainly simpler and, at first glance, one might think it is more natural. However, a case can be made for the bias of m̂_NW(x) being the more natural one. In particular, when the data are being used efficiently, the design density f(x) is an important issue which should affect how well one can estimate m(x). The fact that f(x) nearly disappears in the bias of m̂_GM(x) can be considered an unattractive feature of that estimator. A completely different view of this is that, in order to make the design density essentially disappear from the bias, one must pay some price, which, in this case, is increased variance.

It would be nice to find some way to resolve this completely, say by finding some other way to construct the integral points of m̂_GM(x) for which one bias is always bigger than the other, but we do not know how to do this. We do, however, speculate that if this can be done in a reasonable manner, the net result in terms of the asymptotic mean square error will be in favor of m̂_NW(x).
2.6 Proofs
Let c denote a generic constant.
Proofs of (2.2.2) and (2.2.4):

We first derive an asymptotic expression for the denominator of m̂_NW(x). Since the design points are clustered in groups of three, we rearrange the denominator of m̂_NW(x) as

n^{-1} Σ_{i=1}^n K_h(x - x_i) = A_1 + B_1 + C_1,

where

A_1 = n^{-1} Σ_{ℓ=0}^{(n/3)-1} K_h(x - x_{3ℓ+1}),  B_1 = n^{-1} Σ_{ℓ=0}^{(n/3)-1} K_h(x - x_{3ℓ+2}),  C_1 = n^{-1} Σ_{ℓ=0}^{(n/3)-1} K_h(x - x_{3ℓ+3}),

and where n is assumed to be a multiple of 3 for simplicity. Each of these summations is over an equally spaced grid, with width 3/n, so A_1, B_1, and C_1 are all Riemann summations for

(1/3) ∫K_h(x - t) dt + O(n^{-1}h^{-1}).

Hence, by the Hoelder continuity of K, we have

(2.6.1)  n^{-1} Σ_{i=1}^n K_h(x - x_i) = 1 + O(n^{-1}h^{-1}).

Using (2.6.1), the variance and bias of m̂_NW(x) can be expressed as

Var(m̂_NW(x)) = σ² [ n^{-2} Σ_{i=1}^n K_h(x - x_i)² ] / [ n^{-1} Σ_{i=1}^n K_h(x - x_i) ]²,

Bias(m̂_NW(x)) = n^{-1} Σ_{i=1}^n K_h(x - x_i)(m(x_i) - m(x)) (1 + O(n^{-1}h^{-1})).

The Riemann summation method used above shows that, asymptotically,

(2.6.2)  n^{-2} Σ_{i=1}^n K_h(x - x_i)² = n^{-1} h^{-1} ∫K² + O(n^{-2}h^{-2}),

(2.6.3)  n^{-1} Σ_{i=1}^n K_h(x - x_i)(m(x_i) - m(x)) = ∫K_h(x - t)(m(t) - m(x)) dt + O(n^{-1}).

This completes the proofs of (2.2.2) and (2.2.4).
Proof of (2.2.5): It is immediate from equation (6) of Gasser and
Mueller (1984).
Proof of (2.2.3):

Using the integral mean value theorem, the variance of m̂_GM(x) can be expressed as

Var(m̂_GM(x)) = Var( Σ_{i=1}^n [ ∫_{s_{i-1}}^{s_i} K_h(x - t) dt ] (m(x_i) + ε_i) )
= σ² Σ_{i=1}^n [ ∫_{s_{i-1}}^{s_i} K_h(x - t) dt ]²
= σ² Σ_{i=1}^n (s_i - s_{i-1})² K_h(x - t_i)²,

for some t_i ∈ [s_{i-1}, s_i], i = 1, 2, ..., n. So, by the Hoelder continuity of K, we have

Var(m̂_GM(x)) = σ² Σ_{i=1}^n (s_i - s_{i-1})² K_h(x - x_i)² + O(n^{-2}h^{-2})
= σ² { a² A_2 + ((3-a)/2)² B_2 + ((3-a)/2)² C_2 } + O(n^{-2}h^{-2}),

where, for ℓ = 0, 1, ..., (n/3)-2,

s_i - s_{i-1} = a/n if i = 3ℓ+2, and (3-a)/(2n) if i = 3ℓ+3 or 3ℓ+4,

and where A_2, B_2, and C_2 are the sums n^{-2} Σ_ℓ K_h(x - x_i)² taken over the subsequences i = 3ℓ+2, i = 3ℓ+3, and i = 3ℓ+4, respectively. Since A_2, B_2, and C_2 all have the same asymptotic expression

(1/3) n^{-1} h^{-1} ∫K² + O(n^{-2}h^{-2}),

which follows from (2.6.2), we conclude that Var(m̂_GM(x)) has the asymptotic expression

σ² { a² + ((3-a)/2)² + ((3-a)/2)² } (1/3) n^{-1} h^{-1} ∫K² + O(n^{-2}h^{-2}) = C(a) n^{-1} h^{-1} σ² ∫K² + O(n^{-2}h^{-2}).

The proof of (2.2.3) is complete.

Proof of (2.2.6): The case of clustering k points can be derived by the same procedure as for the case of k = 3.
Proofs of {2.3.1} through {2.3.4}:
For the Nadaraya-Watson estimator. before we use Taylor's theorem
to handle the denominator. we need to make sure the denominator of
A
~(x)
is bounded above zero. i.e. there are enough number of Xi in the
interval
[x~.x+ ~
]. where K
~ t K.
For the Gasser-Mueller
estimator. before we use the integral mean value theorem to approximate
the integrals which are the weights assigned to the observations. we
need to make sure the approximation error is negligible. i.e. there are
not too many Xi in the interval [x-h.x+h].
Let N be the event that the number of Xi in the interval
[x~.x~] is
constant and CN
less than
~
CN-n.J~~~~ f{t)dt.
where CN is a positive
1/4. W the event that the number of Xi in the
h
h
interval [x--2--'x+~]
is greater than CW- n • IX+W2
x-W2 f{t)dt where CW is a
1
constant and CW ~ e • and G the event that the number of X. in the
1
interval [x-h.x+h] is greater than C .n-Ix+h f(t)dt. where C is a
G
x-h
G
-29-
constant and C
G
1
e .
~
For deriving the probability of the events N. W, and G. we shall
use (1) and (2) of the theorem given in Section 10.3.1 of Serfling
(1980) which is described below:
Theorem (Chernoff).
T
be independent and identically
n
distributed random variables with distribution F and put S
n
Assume existence of the moment generating function
real. and put b(T)
= inf
-ZT
e
~(z)
=
n
2 T ..
. 1 J
J=
= EF(e zT ).
z
~(z).
z€R
<T
~
E(T). then
If E(T) ~
T
< 00.
If
-00
(1)
then
(2)
Using (1) and (2) of the theorem. as n
(2.6.4)
(2.6.5)
we have
00.
f(t)dt]n + [1 - J~~~~ f(t)dt]n
n
n
~ [1 - CNefh] + [1 - efh] = O(exp(-nh».
X+h
n
n
P(G) ~ [1- x-h f(t)dt] ~ [l-tfh] = O(exp(-nh».
P(NUW)
~
~
CNJ~~~~
[1 -
J
where if is a positive lower bound of f on [0.1].
The proofs of
(2.6.4) and (2.6.5) are given at the end of this section.
Notation used to prove (2.3.1) and (2.3.3) includes:

Z_n = n^{-1} Σ_{i=1}^n K_h(x - X_i),  M_n = n^{-1} Σ_{i=1}^n K_h(x - X_i) m(X_i),
B_n = n^{-1} Σ_{i=1}^n K_h(x - X_i)(m(X_i) - m(x)),  V_n = n^{-2} Σ_{i=1}^n K_h(x - X_i)²,
ζ = ∫K_h(x - t) f(t) dt,  θ = ∫K_h(x - t) m(t) f(t) dt,  C = ∫K_h(x - t)(m(t) - m(x)) f(t) dt.

Because the X_i are IID random variables, and K, m, and f are Hoelder continuous, Z_n, M_n, B_n, and V_n have the following properties:

(a) Given the event (N∪W)^c, the denominator Z_n of m̂_NW(x) is bounded by C_W f_max K_max ≥ Z_n ≥ C_N ℓ_f ℓ_K > 0, since there are O(nh) terms in Σ_{i=1}^n K_h(x - X_i).

(b) E(Z_n) = ζ = f(x) + O(h),  E(M_n) = θ = m(x) f(x) + O(h),  E(B_n) = C = O(h),
E(V_n) = n^{-1} ∫K_h(x - t)² f(t) dt = n^{-1} h^{-1} f(x) ∫K² + O(n^{-1}).

(c) E(Z_n²) = n^{-1} ∫K_h(x - t)² f(t) dt + (1 - n^{-1}) [∫K_h(x - t) f(t) dt]² = n^{-1} h^{-1} f(x) ∫K² + ζ² + O(n^{-1}) = O(1),
E(M_n²) = n^{-1} ∫K_h(x - t)² m(t)² f(t) dt + (1 - n^{-1}) [∫K_h(x - t) m(t) f(t) dt]² = n^{-1} h^{-1} m(x)² f(x) ∫K² + θ² + O(n^{-1}) = O(1),
E(M_n Z_n) = n^{-1} ∫K_h(x - t)² m(t) f(t) dt + (1 - n^{-1}) [∫K_h(x - t) m(t) f(t) dt] [∫K_h(x - t) f(t) dt] = n^{-1} h^{-1} m(x) f(x) ∫K² + θζ + O(n^{-1}),
E(V_n²) = O(n^{-2}h^{-2}).

(d) Given the event (N∪W)^c, there are O(nh) terms in Σ_{i=1}^n K_h(x - X_i). Using Whittle's inequality (Whittle (1960)) as described in Section 3.5, we have, for each k = 1, 2, ...,

E((ζ - Z_n)^{2k} | (N∪W)^c) = O(n^{-k}h^{-k+1}),
E((θ - M_n)^{2k} | (N∪W)^c) = O(n^{-k}h^{-k+1}),
E((C - B_n)^{2k} | (N∪W)^c) = O(n^{-k}h^{k+1}),
E(V_n² | (N∪W)^c) = O(n^{-2}).
For the proof of (2.3.3), using the conditional expectation, the bias of m̂_NW(x) can be expressed as

(2.6.6)  Bias(m̂_NW(x)) = E(B_n/Z_n) = E(B_n/Z_n | N∪W) P(N∪W) + E(B_n/Z_n | (N∪W)^c) P((N∪W)^c).

Using P(N∪W) = O(exp(-nh)) and |B_n/Z_n| ≤ 2|m|_max, the first term on the right hand side of (2.6.6) is

E(B_n/Z_n | N∪W) P(N∪W) = O(exp(-nh)).

Using property (a) and applying the first order Taylor's theorem to the reciprocal function, the second term on the right hand side of (2.6.6) can be expressed as

E(B_n/Z_n | (N∪W)^c) = E(B_n (ζ^{-1} + (ζ - Z_n) η_n^{-2}) | (N∪W)^c),

where η_n is between ζ and Z_n, i.e. η_n^{-2} = O(1). Using properties (b) and (d), we have

|E(B_n (ζ - Z_n) η_n^{-2} | (N∪W)^c)| ≤ c |E((B_n - C)(ζ - Z_n) | (N∪W)^c)| + c |C E((ζ - Z_n) | (N∪W)^c)|
≤ c [ E((B_n - C)² | (N∪W)^c) E((ζ - Z_n)² | (N∪W)^c) ]^{1/2} + O(exp(-nh))
= c [ O(n^{-1}h²) O(n^{-1}) ]^{1/2} + O(exp(-nh)) = O(n^{-1}h).

Using the fact that

E(B_n ζ^{-1}) = C/ζ = E(B_n ζ^{-1} | N∪W) P(N∪W) + E(B_n ζ^{-1} | (N∪W)^c) P((N∪W)^c) = O(exp(-nh)) + E(B_n ζ^{-1} | (N∪W)^c) P((N∪W)^c),

we have

E(B_n ζ^{-1} | (N∪W)^c) P((N∪W)^c) = C/ζ + O(exp(-nh)).

Then the bias of m̂_NW(x) can be expressed as

Bias(m̂_NW(x)) = C/ζ + O(n^{-1}h) + O(exp(-nh))
= ∫K_h(x - t)(m(t) - m(x)) f(t) dt / ∫K_h(x - t) f(t) dt + O(n^{-1}h).

The proof of (2.3.3) is complete.
For the proof of (2.3.1), using the fact that the X_i and ε_i are uncorrelated, the variance of m̂_NW(x) can be expressed as

(2.6.7)  Var(m̂_NW(x)) = σ² E(V_n / Z_n²) + Var(M_n / Z_n).

Now we check the first term of the right hand side in (2.6.7). Using the conditional expectation, it can be expressed as

E(V_n / Z_n²) = E(V_n / Z_n² | N∪W) P(N∪W) + E(V_n / Z_n² | (N∪W)^c) P((N∪W)^c).

Using P(N∪W) = O(exp(-nh)) and |V_n / Z_n²| ≤ 1, we have

E(V_n / Z_n² | N∪W) P(N∪W) = O(exp(-nh)).

Using property (a) and applying the first order Taylor's theorem to the reciprocal function, we have

E(V_n / Z_n² | (N∪W)^c) = E(V_n (ζ^{-1} + (ζ - Z_n) η_n^{-2})² | (N∪W)^c)
= E(V_n ζ^{-2} + 2 V_n ζ^{-1} (ζ - Z_n) η_n^{-2} + V_n (ζ - Z_n)² η_n^{-4} | (N∪W)^c),

where η_n is between ζ and Z_n, i.e. η_n^{-2} and η_n^{-4} are O(1). Using properties (b), (c), and (d), we have the following expressions:

|E(V_n ζ^{-1} (ζ - Z_n) η_n^{-2} | (N∪W)^c)| ≤ c |E(V_n (ζ - Z_n) | (N∪W)^c)|
≤ c [ E(V_n² | (N∪W)^c) E((ζ - Z_n)² | (N∪W)^c) ]^{1/2}
= c [ O(n^{-2}) O(n^{-1}) ]^{1/2} = o(n^{-1}h^{-1}),

and

|E(V_n (ζ - Z_n)² η_n^{-4} | (N∪W)^c)| ≤ c |E(V_n (ζ - Z_n)² | (N∪W)^c)|
≤ c [ E(V_n² | (N∪W)^c) E((ζ - Z_n)^4 | (N∪W)^c) ]^{1/2}
= c [ O(n^{-2}) O(n^{-2}h^{-1}) ]^{1/2} = o(n^{-1}h^{-1}).

Thus, the first term of the right hand side in (2.6.7) becomes

E(V_n / Z_n²) = E(V_n ζ^{-2} | (N∪W)^c) P((N∪W)^c) + o(n^{-1}h^{-1}).

Using property (b), we have

E(V_n ζ^{-2}) = n^{-1} h^{-1} f(x) ∫K² (f(x) + O(h))^{-2} = n^{-1} h^{-1} (f(x))^{-1} ∫K² + o(n^{-1}h^{-1}).

Using the conditional expectation, we have

E(V_n ζ^{-2}) = E(V_n ζ^{-2} | N∪W) P(N∪W) + E(V_n ζ^{-2} | (N∪W)^c) P((N∪W)^c)
= O(n^{-1} h^{-2} exp(-nh)) + E(V_n ζ^{-2} | (N∪W)^c) P((N∪W)^c)
= o(n^{-1}h^{-1}) + E(V_n ζ^{-2} | (N∪W)^c) P((N∪W)^c).

Combining the two results, we have

E(V_n ζ^{-2} | (N∪W)^c) P((N∪W)^c) = n^{-1} h^{-1} (f(x))^{-1} ∫K² + o(n^{-1}h^{-1}),

i.e.

E(V_n / Z_n²) = n^{-1} h^{-1} (f(x))^{-1} ∫K² + o(n^{-1}h^{-1}).

Now we check the second term of the right hand side in (2.6.7). Let

L_n = M_n / Z_n - E(M_n / Z_n).

Using the conditional expectation, the second term of the right hand side in (2.6.7) can be expressed as

Var(M_n / Z_n) = E(L_n² | N∪W) P(N∪W) + E(L_n² | (N∪W)^c) P((N∪W)^c).

Using P(N∪W) = O(exp(-nh)) and L_n² ≤ (2|m|_max)², we have

E(L_n² | N∪W) P(N∪W) = O(exp(-nh)).

Using property (a) and E(M_n / Z_n) = θ/ζ + O(n^{-1}h), as given in (2.6.6), and applying the second order Taylor's theorem to the reciprocal function, we have

E(L_n² | (N∪W)^c) = E([ M_n (ζ^{-1} + ζ^{-2}(ζ - Z_n) + η_n^{-3}(ζ - Z_n)²) - θ/ζ + O(n^{-1}h) ]² | (N∪W)^c)
= E([ A + B + C + D ]² | (N∪W)^c),

where η_n is between ζ and Z_n, i.e. η_n^{-3} = O(1), and where

A = ζ^{-1}(M_n - (θ/ζ) Z_n),  B = ζ^{-2}(M_n - θ)(ζ - Z_n),  C = M_n η_n^{-3}(ζ - Z_n)²,  D = O(n^{-1}h).

Using properties (b), (c), and (d), we have that E(B² | (N∪W)^c), E(C² | (N∪W)^c), and E(D² | (N∪W)^c) are o(n^{-1}h^{-1}). The proof of (2.3.1) is complete when we show that E(A² | (N∪W)^c) is o(n^{-1}h^{-1}). Using the conditional expectation, P(N∪W) = O(exp(-nh)), θ/ζ = m(x) + O(h), 0 ≤ A² ≤ (K_max L_m)², where L_m is the constant of the Hoelder continuity of m, and E(A² | N∪W) P(N∪W) = o(n^{-1}h^{-1}), we have

E(A² | (N∪W)^c) = E(A²) + o(n^{-1}h^{-1})
= ζ^{-2} [ E(M_n²) - 2(θ/ζ) E(M_n Z_n) + (θ/ζ)² E(Z_n²) ]
= ζ^{-2} [ n^{-1} h^{-1} ∫K² (m(x)² f(x) - 2(θ/ζ) m(x) f(x) + (θ/ζ)² f(x)) ] + o(n^{-1}h^{-1})
= ζ^{-2} n^{-1} h^{-1} ∫K² O(h) + o(n^{-1}h^{-1}) = o(n^{-1}h^{-1}).

The proof of (2.3.1) is complete.
Now we give notation and properties for the proofs of (2.3.2) and (2.3.4). Let

Q_i = F^{-1}(i/(n+1)),  d_i = X_(i) - X_(i-1),  for i = 2, 3, ..., n,

where F is the distribution function of the design density f, and F^{-1}(u) = inf{ t ∈ R: F(t) ≥ u }. Let

D_i = s_i - s_{i-1} = λ d_i + (1 - λ) d_{i+1},  for i = 2, 3, ..., n-1.

We shall use the sampling distribution, the exponential-type probability inequality for the Kolmogorov-Smirnov distance as given in Section 2.1.3 of Serfling (1980), and the Hoelder continuity of f, to derive a relationship between f(x) and f(Q_i), for each X_(i) ∈ [x-h, x+h]. If X_(i) falls into the interval [x-h, x+h], then (without loss of generality) we have

F_n(x-h) ≤ i/(n+1) ≤ F_n(x+h).

For all n = 1, 2, ..., we have

P( sup_{t∈R} |F_n(t) - F(t)| > d ) ≤ 2 exp(-2nd²).

Taking d = n^{-0.4}, there is a probability 1 - 2 exp(-2n^{0.2}) that

F(x-h) - n^{-0.4} ≤ F_n(x-h) ≤ i/(n+1) ≤ F_n(x+h) ≤ F(x+h) + n^{-0.4}.

Thus we have, asymptotically, F(x-h) - n^{-0.4} ≤ i/(n+1) ≤ F(x+h) + n^{-0.4}. Because of the Hoelder continuity and the positive lower bound of f, we have

x - h + O(n^{-0.4}) ≤ Q_i ≤ x + h + O(n^{-0.4}),

which implies, by the Hoelder continuity of f,

f(Q_i) = f(x) + O(h + n^{-0.4}).

Using the results given in Section 4.6 of David (1981) and the Hoelder continuity of f, we have the following properties:

(e) E(d_i) = (n f(Q_i))^{-1} + O(n^{-2}),
E(d_i d_j) = 2(n² f(Q_i)²)^{-1} + O(n^{-3}) if i = j, and (n² f(Q_i) f(Q_j))^{-1} + O(n^{-3}) if i ≠ j,
E(D_i) = (n f(Q_i))^{-1} + O(n^{-2}),
E(D_i D_j) = 2(1 - λ + λ²)(n² f(Q_i)²)^{-1} + O(n^{-3}) if i = j, (1 + λ - λ²)(n² f(Q_i)²)^{-1} + O(n^{-3}) if |i - j| = 1, and (n² f(Q_i) f(Q_j))^{-1} + O(n^{-3}) if |i - j| ≥ 2,
E(D_i^k) = O(n^{-k}), for k = 2, 3, 4,
for i, j = 2, 3, ..., n-1. The proof of property (e) is given at the end of this section.

Whenever X_(i), X_(j) ∈ [x-h, x+h], using f(Q_i) = f(x) + O(h + n^{-0.4}), property (e) can be modified as

(f) E(d_i) = (n f(x))^{-1} + O(n^{-1}h + n^{-1.4}),
E(d_i d_j) = 2(n² f(x)²)^{-1} + O(n^{-2}h + n^{-2.4}) if i = j, and (n² f(x)²)^{-1} + O(n^{-2}h + n^{-2.4}) if i ≠ j,
E(D_i) = (n f(x))^{-1} + O(n^{-1}h + n^{-1.4}),
E(D_i D_j) = 2(1 - λ + λ²)(n² f(x)²)^{-1} + O(n^{-2}h + n^{-2.4}) if i = j, (1 + λ - λ²)(n² f(x)²)^{-1} + O(n^{-2}h + n^{-2.4}) if |i - j| = 1, and (n² f(x)²)^{-1} + O(n^{-2}h + n^{-2.4}) if |i - j| ≥ 2,
for i, j = 2, 3, ..., n-1.
For the proof of (2.3.4), letting

I_n = Σ_{i=1}^n ∫_{s_{i-1}}^{s_i} K_h(x - t)(m(X_(i)) - m(t)) dt,

the bias of m̂_GM(x) can be expressed as

Bias(m̂_GM(x)) = E( Σ_{i=1}^n ∫_{s_{i-1}}^{s_i} K_h(x - t)(m(X_(i)) - m(x)) dt ) = ∫K_h(x - t)(m(t) - m(x)) dt + E(I_n).

Using the conditional expectation, E(I_n) can be expressed as

E(I_n) = E(I_n | G) P(G) + E(I_n | G^c) P(G^c).

Using P(G) = O(exp(-nh)) and |I_n| ≤ 2|m|_max, we have

E(I_n | G) P(G) = O(exp(-nh)).

Using the Hoelder continuity of m, the definition of the event G^c, and property (e), we have

|E(I_n | G^c)| ≤ E( Σ_{i=1}^n D_i² h^{-1} K_max L_m | G^c ) ≤ c (C_G n ∫_{x-h}^{x+h} f(t) dt) n^{-2} h^{-1} = O(n^{-1}).

Then the bias of m̂_GM(x) can be expressed as

Bias(m̂_GM(x)) = ∫K_h(x - t)(m(t) - m(x)) dt + O(n^{-1}).

The proof of (2.3.4) is complete.
For the proof of (2.3.2), letting

W_n = Σ_{i=1}^n [ ∫_{s_{i-1}}^{s_i} K_h(x - t) dt ]²,
J_n = Σ_{i=1}^n ∫_{s_{i-1}}^{s_i} K_h(x - t) m(X_(i)) dt,

the variance of m̂_GM(x) can be expressed as

(2.6.8)  Var(m̂_GM(x)) = σ² E(W_n) + Var(J_n).

Now we check the first term of the right hand side in (2.6.8). Using the conditional expectation, E(W_n) can be expressed as

E(W_n) = E(W_n | G) P(G) + E(W_n | G^c) P(G^c).

Using P(G) = O(exp(-nh)) and 0 ≤ W_n ≤ 1, we have

E(W_n | G) P(G) = O(exp(-nh)).

Using the integral mean value theorem, the Hoelder continuity of f and K, and properties (e) and (f), we have the following expressions:

E(W_n | G^c) = E( Σ_{i=1}^n D_i² K_h(x - X_(i))² | G^c ) + O(n^{-2}h^{-2}),

since

|E( Σ_{i=1}^n { [ ∫_{s_{i-1}}^{s_i} K_h(x - t) dt ]² - D_i² K_h(x - X_(i))² } | G^c )| ≤ |E( Σ_{i=1}^n D_i³ h^{-3} L_K | G^c )| ≤ c (C_G n ∫_{x-h}^{x+h} f(t) dt) n^{-3} h^{-3} = O(n^{-2}h^{-2}),

where L_K is the constant of the Hoelder continuity of K;

E( Σ_{i=1}^n D_i² K_h(x - X_(i))² | G^c ) = 2(1 - λ + λ²)(n f(x))^{-2} E( Σ_{i=1}^n K_h(x - X_(i))² | G^c ) + O(n^{-1} + n^{-1.4}h^{-1} + n^{-2}h^{-2}),

since

|E( Σ_{i=1}^n [ D_i² - 2(1 - λ + λ²)(n f(x))^{-2} ] K_h(x - X_(i))² | G^c )| ≤ (C_G n ∫_{x-h}^{x+h} f(t) dt) O(n^{-2}h + n^{-2.4}) h^{-2} K_max² = O(n^{-1} + n^{-1.4}h^{-1});

and

2(1 - λ + λ²)(n f(x))^{-2} E( Σ_{i=1}^n K_h(x - X_(i))² | G^c ) = 2(1 - λ + λ²)(f(x))^{-2} E( n^{-2} Σ_{i=1}^n K_h(x - X_i)² | G^c ) = 2(1 - λ + λ²) n^{-1} h^{-1} (f(x))^{-1} ∫K² + o(n^{-1}h^{-1}),

since P(G) = O(exp(-nh)) and, given the event G, V_n = n^{-2} Σ_{i=1}^n K_h(x - X_i)² ≤ n^{-1} h^{-2} K_max², so that E(V_n | G) P(G) = o(n^{-1}h^{-1}), and then

E(V_n | G^c) P(G^c) = E(V_n) - E(V_n | G) P(G) = E(V_n) + o(n^{-1}h^{-1}) = n^{-1} h^{-1} f(x) ∫K² + o(n^{-1}h^{-1}).

So we have

E(W_n) = 2(1 - λ + λ²) n^{-1} h^{-1} (f(x))^{-1} ∫K² + o(n^{-1}h^{-1}).

Now we check the second term of the right hand side in (2.6.8). Using the conditional expectation, Var(J_n) can be expressed as

Var(J_n) = Var(J_n | G) P(G) + Var(J_n | G^c) P(G^c).

Using P(G) = O(exp(-nh)) and |J_n| ≤ |m|_max, we have

Var(J_n | G) P(G) = O(exp(-nh)).

Using the Hoelder continuity of m, Minkowski's inequality, and property (f), we have the following expressions:

E(J_n) = E(m̂_GM(x)) = ∫K_h(x - t) m(t) dt + O(n^{-1}),

and

Var(J_n | G^c) = E((J_n - E(J_n))² | G^c)
= E([ Σ_{i=1}^n ∫_{s_{i-1}}^{s_i} K_h(x - t) m(X_(i)) dt - ∫K_h(x - t) m(t) dt + O(n^{-1}) ]² | G^c)
= E([ Σ_{i=1}^n ∫_{s_{i-1}}^{s_i} K_h(x - t)(m(X_(i)) - m(t)) dt + O(n^{-1}) ]² | G^c)
≤ E([ Σ_{i=1}^n D_i² h^{-1} K_max L_m + O(n^{-1}) ]² | G^c)
= E([ Σ_{i=1}^n D_i² h^{-1} K_max L_m ]² | G^c) + O(n^{-1}) E([ Σ_{i=1}^n D_i² h^{-1} K_max L_m ] | G^c) + O(n^{-2})
≤ c (C_G n ∫_{x-h}^{x+h} f(t) dt · n^{-2} h^{-1})² + O(n^{-1})(C_G n ∫_{x-h}^{x+h} f(t) dt · n^{-2} h^{-1}) + O(n^{-2})
= O(n^{-2}).

The proof of (2.3.2) is complete.
Proofs of (2.4.2) and (2.4.3):
Using property (e) and
D_i = s_i - s_{i-1} = Σ_{j=1}^{2k} β_j ( X_(i-k+j) - X_(i-k+j-1) ) = Σ_{j=1}^{2k} β_j d_{i-k+j},
for i = k+1, k+2, ..., n-k, then we have
E(D_i^2) = ( 1 + Σ_{j=1}^{2k} β_j^2 ) (n f(Q_i))^{-2} + O(n^{-3}).
Furthermore, if X_(i) ∈ [x-h, x+h], then we have (without loss of generality)
E(D_i^2) = ( 1 + Σ_{j=1}^{2k} β_j^2 ) (n f(x))^{-2} + O(n^{-2}h + n^{-2.4}),
E(D_i^k) = O(n^{-k}), for k = 2, 3, 4.
Following the same steps as in the proofs of (2.3.2) and (2.3.4), the proofs of (2.4.2) and (2.4.3) are complete.
Proofs of (2.6.4) and (2.6.5):
Since the event W is the same as the event G, the proof of P(G) will be given only.  We shall use (1) of the theorem as given in Section 10.3.1 of Serfling (1980) to prove P(N) = O(exp(-nh)).  Let
p = ∫_{x-h/2}^{x+h/2} f(t) dt,
where p > e_f h.  Let
T_j = 1_{[x-h/2, x+h/2]}(X_j),  for j = 1, 2, ..., n.
So the T_j are IID binomial random variables with mean p and moment generating function ψ(z) = 1 - p + p e^z, z ∈ R.  The event N is equivalent to the event that
S_n = Σ_{j=1}^{n} T_j ≤ C_N n p = nτ,
where the value of τ corresponding to (1) of the theorem is
τ = C_N p < E(T) = p,
and where 0 < C_N ≤ 1/4.  We have, using c to express C_N,
(2.6.9)  b(τ) = inf_{z∈R} e^{-zτ} ψ(z) = c^{-cp} ((1-p)/(1-cp))^{1-cp}.
The proof of (2.6.9) is given later.  Now we need to show b(τ) = b(cp) ≤ 1 - cp.  This is equivalent to showing that
ln(1-cp) - ln(b(cp)) ≥ 0.
Using Taylor's theorem, we have the following expressions:
ln(1-cp) = -cp - Σ_{k=2}^{∞} (c^k p^k / k),
ln(b(cp)) = ln[ c^{-cp} ((1-p)/(1-cp))^{1-cp} ] = -cp ln c + (1-cp)[ ln(1-p) - ln(1-cp) ].
Thus, we have
ln(1-cp) - ln(b(cp)) = [1 - 2c + c ln c] p + Σ_{k=2}^{∞} { [ k(1 - (1/k) - c - c^k) + 2c^k ] / [ k(k-1) ] } p^k.
For the coefficient of p, 1 - 2c + c ln c is decreasing in c on (0, 1/4].  The minimum value of the coefficient of p occurs at c = 1/4, i.e.
1 - 2c + c ln c ≥ (1 - 2c + c ln c)|_{c=1/4} > 0.
So the coefficient of p is positive for all c ∈ (0, 1/4].  For each k ≥ 2, the coefficient of p^k is
[ k(1 - (1/k) - c - c^k) + 2c^k ] / [ k(k-1) ],
where
1 - (1/k) - c - c^k ≥ 1 - (1/2) - 2c ≥ 0 when c ∈ (0, 1/4].
So the coefficients of p^k are positive for all k ≥ 1 when c ∈ (0, 1/4].  Then by (1) of the theorem, we have, asymptotically,
P(N) ≤ [b(cp)]^n ≤ [1 - cp]^n ≤ [1 - c e_f h]^n = O(exp(-nh)),
where n^{-1+δ} < h < n^{-δ}.  The proof of P(N) = O(exp(-nh)) is complete.
We shall use (2) of the theorem as given in Section 10.3.1 of Serfling (1980) to prove P(G) = O(exp(-nh)).  Let
p = ∫_{x-h}^{x+h} f(t) dt,
where p > 2 e_f h.  Let
T_j = 1_{[x-h, x+h]}(X_j),  for j = 1, 2, ..., n.
So the T_j are IID binomial random variables with mean p and moment generating function ψ(z) = 1 - p + p e^z, z ∈ R.  The event G is equivalent to the event that
S_n = Σ_{j=1}^{n} T_j ≥ C_G n p = nτ,
where the value of τ corresponding to (2) of the theorem is
τ = C_G p > E(T) = p,
and where C_G ≥ e.  Through (2.6.9), we have, using c to express C_G,
b(τ) = inf_{z∈R} e^{-zτ} ψ(z) = c^{-cp} ((1-p)/(1-cp))^{1-cp}.
Now we need to show that b(cp) ≤ 1 - p.
If p > 1/c, then (2.6.5) is true by the fact that
P(S_n ≥ npc) = P(S_n > n) = 0 < 1 - p.
If p = 1/c, then (2.6.5) is true by the fact that
P(S_n ≥ npc) = P(S_n = n) = p^n < 1 - p.
If 0 < p < 1/c, then we need to show that
ln(1-p) - ln(b(cp)) ≥ 0.
Using Taylor's theorem, we have
ln(1-p) - ln(b(cp)) = [ c(ln c - 1) ] p + Σ_{k=2}^{∞} { (c^k - ck) / [ k(k-1) ] } p^k.
Since c = C_G ≥ e, the coefficients of p^k are positive for all k ≥ 1.  Then by (2) of the theorem, we have, asymptotically,
P(G) ≤ [b(cp)]^n ≤ [1 - p]^n ≤ [1 - 2 e_f h]^n = O(exp(-nh)),
where n^{-1+δ} < h < n^{-δ}.  The proof of (2.6.5) is complete.
Proof of (2.6.9):
The moment generating function of a binomial random variable T with mean p is ψ(z) = 1 - p + p e^z, z ∈ R.  The root of the first derivative of e^{-zcp}(1 - p + p e^z) is z = ln( c(1-p)/(1-cp) ).  The second derivative of e^{-zcp}(1 - p + p e^z) is
e^{-zcp} p [ c^2 p(1-p) + (1-cp)^2 e^z ],
which is positive for all z ∈ R.  So e^{-zcp}(1 - p + p e^z) has its unique minimum value at z = ln( c(1-p)/(1-cp) ), i.e.
( e^{-zcp}(1 - p + p e^z) ) |_{e^z = c(1-p)/(1-cp)} = c^{-cp} ((1-p)/(1-cp))^{1-cp}.
The proof of (2.6.9) is complete.
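The closed form in (2.6.9) is also easy to check numerically (a minimal sketch, assuming Python with numpy; the particular values of c and p below are arbitrary illustrations, while in the proofs p is of the order h): the closed form should agree with a direct minimization of e^{-zcp} ψ(z) over z and should not exceed 1 - cp when c ≤ 1/4.

    import numpy as np

    def b_tau(c, p):
        """Closed form of inf_z exp(-z*c*p) * (1 - p + p*exp(z)) from (2.6.9)."""
        return c ** (-c * p) * ((1 - p) / (1 - c * p)) ** (1 - c * p)

    def b_tau_grid(c, p, zgrid=np.linspace(-20, 5, 200001)):
        """Direct minimization of exp(-z*c*p)*(1 - p + p*exp(z)) over a z grid."""
        return np.min(np.exp(-zgrid * c * p) * (1 - p + p * np.exp(zgrid)))

    c, p = 0.25, 0.05                       # illustrative values
    print(b_tau(c, p), b_tau_grid(c, p))    # the two agree
    print(b_tau(c, p) <= 1 - c * p)         # True: b(cp) <= 1 - cp, as used for P(N)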
Proof of property (e):
Let U_1, U_2, ..., U_n be IID Uniform[0,1] random variables.  Let U_(i) denote the i-th order statistic of the U_i.  Let p_i = i/(n+1), q_i = 1 - p_i, Q_i = F^{-1}(p_i), Q_i' = Q'(p_i), etc.  David (1981) has the following results (including his indices):
(4.6.2)  X_(i) = Q_i + (U_(i) - p_i) Q_i' + (1/2)(U_(i) - p_i)^2 Q_i'' + (1/6)(U_(i) - p_i)^3 Q_i''' + ...,
(4.6.3)  E(X_(i)) = Q_i + p_i q_i (2(n+2))^{-1} Q_i'' + p_i q_i (n+2)^{-2} [ (1/3)(q_i - p_i) Q_i''' + (1/8) p_i q_i Q_i'''' ] + O(n^{-3}),
(4.6.4)  Var(X_(i)) = p_i q_i (n+2)^{-1} (Q_i')^2 + p_i q_i (n+2)^{-2} [ 2(q_i - p_i) Q_i' Q_i'' + p_i q_i ( Q_i' Q_i''' + (1/2)(Q_i'')^2 ) ] + O(n^{-3}),
(4.6.5)  Cov(X_(i), X_(j)) = p_i q_j (n+2)^{-1} Q_i' Q_j' + p_i q_j (n+2)^{-2} [ (q_i - p_i) Q_i'' Q_j' + (q_j - p_j) Q_i' Q_j'' + (1/2)( p_i q_i Q_i''' Q_j' + p_j q_j Q_i' Q_j''' + p_i q_j Q_i'' Q_j'' ) ] + O(n^{-3}),  for i < j.
Using (4.6.3) of David (1981), the expectation of d_i can be expressed as
E(d_i) = E(X_(i) - X_(i-1)) = (Q_i - Q_{i-1}) + (2(n+2))^{-1} [ p_i q_i Q_i'' - p_{i-1} q_{i-1} Q_{i-1}'' ] + O(n^{-2}).
Using the Hoelder continuity of f, through a straightforward calculation, we have obtained the following quantities:
Q_i - Q_{i-1} = (n f(Q_i))^{-1} + O(n^{-2}),
Q_{i-1}'' = Q_i'' + O(n^{-1}),
p_i q_i - p_{i-1} q_{i-1} = O(n^{-1}).
Then
E(d_i) = (n f(Q_i))^{-1} + O(n^{-2}).
To check E(d_i d_j), using (4.6.3), (4.6.4), and (4.6.5) of David (1981), through a straightforward calculation, we have obtained the following quantities:
E(X_(i)^2) = Q_i^2 + p_i q_i (n+2)^{-1} [ (Q_i')^2 + Q_i Q_i'' ]
+ p_i q_i (n+2)^{-2} [ (q_i - p_i)( (2/3) Q_i Q_i''' + 2 Q_i' Q_i'' ) + p_i q_i ( (3/4)(Q_i'')^2 + (1/4) Q_i Q_i'''' + Q_i' Q_i''' ) ] + O(n^{-3}),
E(X_(i) X_(j)) = Q_i Q_j + (n+2)^{-1} [ p_i q_j Q_i' Q_j' + (1/2)( p_i q_i Q_i'' Q_j + p_j q_j Q_i Q_j'' ) ]
+ (n+2)^{-2} { (q_i - p_i)[ (1/3) p_i q_i Q_i''' Q_j + p_i q_j Q_i'' Q_j' ] + (q_j - p_j)[ (1/3) p_j q_j Q_i Q_j''' + p_i q_j Q_i' Q_j'' ]
+ p_i^2 q_i [ (1/2) q_j Q_i''' Q_j' + (1/8) q_i Q_i'''' Q_j ] + p_j q_j^2 [ (1/2) p_i Q_i' Q_j''' + (1/8) p_j Q_i Q_j'''' ]
+ [ (1/2) p_i^2 q_j^2 Q_i'' Q_j'' + (1/4) p_i p_j q_i q_j Q_i'' Q_j'' ] } + O(n^{-3}),  for i < j.
Then
E(d_i^2) = E( (X_(i) - X_(i-1))^2 ) = E(X_(i)^2) + E(X_(i-1)^2) - 2 E(X_(i-1) X_(i)) = A + B + C + O(n^{-3}),
where
A = Q_i^2 + Q_{i-1}^2 - 2 Q_i Q_{i-1},
B = (n+2)^{-1} { p_i q_i [ (Q_i')^2 + Q_i Q_i'' ] + p_{i-1} q_{i-1} [ (Q_{i-1}')^2 + Q_{i-1} Q_{i-1}'' ] - 2 [ p_{i-1} q_i Q_{i-1}' Q_i' + p_{i-1} q_{i-1} Q_{i-1}'' Q_i + p_i q_i Q_{i-1} Q_i'' ] },
C = (n+2)^{-2} { (q_i - p_i) p_i q_i [ (2/3) Q_i Q_i''' + 2 Q_i' Q_i'' ] + p_i^2 q_i^2 [ (3/4)(Q_i'')^2 + (1/4) Q_i Q_i'''' + Q_i' Q_i''' ]
+ (q_{i-1} - p_{i-1}) p_{i-1} q_{i-1} [ (2/3) Q_{i-1} Q_{i-1}''' + 2 Q_{i-1}' Q_{i-1}'' ] + p_{i-1}^2 q_{i-1}^2 [ (3/4)(Q_{i-1}'')^2 + (1/4) Q_{i-1} Q_{i-1}'''' + Q_{i-1}' Q_{i-1}''' ]
+ (q_{i-1} - p_{i-1}) p_{i-1} [ (-2/3) q_{i-1} Q_{i-1}''' Q_i - 2 q_i Q_{i-1}'' Q_i' ] + (q_i - p_i) q_i [ (-2/3) p_i Q_{i-1} Q_i''' - 2 p_{i-1} Q_{i-1}' Q_i'' ]
+ p_{i-1}^2 q_{i-1} [ -q_i Q_{i-1}''' Q_i' - (1/4) q_{i-1} Q_{i-1}'''' Q_i ] + p_i q_i^2 [ -p_{i-1} Q_{i-1}' Q_i''' - (1/4) p_i Q_{i-1} Q_i'''' ]
+ [ -p_{i-1}^2 q_i^2 Q_{i-1}'' Q_i'' - (1/2) p_{i-1} p_i q_{i-1} q_i Q_{i-1}'' Q_i'' ] }.
Using the Hoelder continuity of f to check each sum of coefficients of QQ, QQ'', Q'Q', QQ''', Q'Q'', QQ'''', Q'Q''', and Q''Q'', through a straightforward calculation, we have obtained the following quantities:
A = (n^2 f^2(Q_i))^{-1} + O(n^{-3}),
B = n^{-2} (Q_i')^2 + O(n^{-3}) = (n^2 f^2(Q_i))^{-1} + O(n^{-3}),
C = O(n^{-3}).
The proof of E(d_i^2) is complete.  Use the same method to check E(d_i d_j) for i < j, and use (4.6.2) of David (1981) to check the higher moments of X_(i).  Then the proof of property (e) is complete.
CHAPTER III
A DATA-DRIVEN BANDWIDTH ESTIMATOR

3.1  Introduction

The two optimal bandwidths, h_A and h_M, are the minimizers of the average square error (ASE or d_A(h)) as given in (1.11) and the mean average square error (MASE or d_M(h)) as given in (1.12), respectively.  Based on independent observations, it has been shown that the cross-validated bandwidth ĥ_CV as given in (1.14) is asymptotically optimal, i.e. ĥ_CV/h_A or ĥ_CV/h_M converges to 1 in some mode.  This was established by Rice (1984) in the equally spaced fixed circular design context and by Haerdle and Marron (1985) in the random design setting.

However, Hart and Wehrly (1986) showed that the MASE criterion will be affected when the observations are dependent.  Hart (1987) showed that, when the observations are positively correlated, the cross-validation criterion will produce rough kernel estimates.  Chiu (1987) showed that Mallows' criterion does not select a proper bandwidth when the regression errors follow a first order moving average process.

When dependent observations are considered in nonparametric regression, a first convenient choice of dependence structure for our analysis is the class of autoregressive-moving average (ARMA) processes from time series analysis.  In the nonparametric sense, the ARMA parameters will not be estimated.  The ARMA process will only be used to give the regression model a short range dependence structure.

In this chapter, the equally spaced fixed design nonparametric regression model (1.1) is used and the regression errors are taken as an unknown ARMA process.  In Chapter 2, it was shown that the Nadaraya-Watson estimator m̂_NW(x) as given in (1.2) is more efficient than the Gasser-Mueller estimator m̂_GM(x) as given in (1.3).  Subsequently, only m̂_NW(x) will be used to estimate the unknown regression function m(x).

Looking at (1.14), the cross-validation score CV(h) is affected by the dependence structure of the observations because of the dependence between m̂_j(x_j) as given in (1.13) and Y_j.  Since the observations are assumed to suffer from a short range dependence, an immediate remedy for the cross-validation score (1.14) is to leave more than one point out when the m̂_j(x_j) are constructed.  This allows us to reduce the amount of dependence between m̂_j(x_j) and Y_j.  We hope that the dependence effect on the cross-validation score is negligible whenever enough points are deleted in the construction of m̂_j(x_j).
Let the new version of the cross-validation score be
(3.1.1)  CV_ℓ(h) = n^{-1} Σ_{j=1}^{n} [ m̂_j(x_j) - Y_j ]^2 W(x_j),
where the subscript ℓ means that the m̂_j(x_j) are a "leave-(2ℓ+1)-out" version of m̂(x_j) for all j, i.e.
(3.1.2)  m̂_j(x_j) = [ (n-2ℓ-1)^{-1} Σ_{i:|i-j|>ℓ} K_h(x_j - x_i) Y_i ] / [ (n-2ℓ-1)^{-1} Σ_{i:|i-j|>ℓ} K_h(x_j - x_i) ].
When ℓ = 0, CV_ℓ(h) is the ordinary cross-validation score CV(h) as given in (1.14).  The minimizer of CV_ℓ(h) is denoted by ĥ_CV(ℓ).
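A direct computational sketch of (3.1.1) and (3.1.2) may help fix ideas (this is only an illustration, assuming Python with numpy; the kernel and weight function below are the bi-weight kernel and the weight used in the simulations of Section 3.5, and ĥ_CV(ℓ) would be found by evaluating the score over a grid of h and taking its minimizer):

    import numpy as np

    def nw_leave_out(x, y, h, ell, kernel):
        """Leave-(2*ell+1)-out Nadaraya-Watson fits m_hat_j(x_j) of (3.1.2)."""
        n = len(x)
        fits = np.empty(n)
        for j in range(n):
            keep = np.abs(np.arange(n) - j) > ell      # drop the 2*ell+1 indices nearest j
            w = kernel((x[j] - x[keep]) / h) / h        # K_h(x_j - x_i)
            fits[j] = np.sum(w * y[keep]) / np.sum(w)
        return fits

    def cv_ell(x, y, h, ell, kernel, weight):
        """Modified cross-validation score CV_ell(h) of (3.1.1)."""
        fits = nw_leave_out(x, y, h, ell, kernel)
        return np.mean((fits - y) ** 2 * weight(x))

    biweight = lambda u: (15 / 8) * (1 - 4 * u ** 2) ** 2 * (np.abs(u) <= 0.5)
    weight = lambda x: (5 / 3) * ((x >= 0.2) & (x <= 0.8))
    # h_CV(ell): evaluate cv_ell(x, y, h, ell, biweight, weight) over a grid of h and minimize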
The idea of the modified cross-validation has been proposed and analyzed in Collomb (1985), Haerdle and Vieu (1987), and Vieu and Hart (1989) for settings of strong mixing data.  In these settings, the modified cross-validated bandwidth is asymptotically optimal when the value of ℓ is taken as O(n^r) for some r > 0.  For correlated data, Chiu (1989) and Hart (1987) proposed methods to estimate the covariance function of the regression errors, and plugged the estimated covariance function into bandwidth selection methods.  Chiu (1989) and Hart (1987) showed, in simulation studies, that these plug-in bandwidth selection methods would produce good bandwidths.

The purpose of this chapter is to study how the short range dependence among the observations affects the modified cross-validation criterion CV_ℓ(h) for each value of ℓ.  The definition and the properties of ARMA processes are given in Section 3.2.  Section 3.3 describes the asymptotic behaviors of the optimal bandwidths h_A and h_M.  The asymptotic behaviors of the data-driven bandwidths ĥ_CV(ℓ) are described in Section 3.4.  Section 3.5 contains a discussion of our results.  Finally, the proofs are given in Section 3.6.
3.2  ARMA Processes

In this section, we only give the definition and the properties of ARMA processes, which are quoted, including the indices, from Brockwell and Davis (1987):

(a)  (Definition 3.1.2 (The ARMA(p,q) Process))  The process {ε_j} is said to be an ARMA(p,q) process if {ε_j} is stationary and if for every j,
φ(B) ε_j = θ(B) e_j,
where {e_j} is a sequence of uncorrelated random variables with mean zero and finite variance σ^2, φ and θ are the p-th and q-th degree polynomials
φ(z) = 1 - φ_1 z - ... - φ_p z^p  and  θ(z) = 1 + θ_1 z + ... + θ_q z^q,
and B is the backward shift operator defined by B^k ε_j = ε_{j-k} for all integers k.

(b)  (Theorems 3.1.1 and 3.1.3)  If {ε_j} is an ARMA(p,q) process for which the polynomials φ(z) and θ(z) have no common zeros and φ(z) has no zeros on |z| ≤ 1, then {ε_j} has the unique solution, for every j,
ε_j = Σ_{i=0}^{∞} ψ_i e_{j-i},
with Σ_{i=0}^{∞} |ψ_i| < ∞, where the coefficients {ψ_i} are determined by the relation
ψ(z) = Σ_{i=0}^{∞} ψ_i z^i = θ(z)/φ(z),  |z| ≤ 1.

(c)  (Theorem 3.2.1 and Example 3.2.3)  If {ε_j} is an ARMA(p,q) process for which the polynomials φ(z) and θ(z) have no common zeros and φ(z) has no zeros on |z| ≤ 1, then the autocovariance function γ(·) of {ε_j} is determined by
γ(k) = Cov(ε_j, ε_{j+k}) = σ^2 Σ_{i=0}^{∞} ψ_i ψ_{i+|k|},
for every j and all integers k.  Then we have
Σ_{k=-∞}^{∞} γ(k) = σ^2 ( Σ_{i=0}^{∞} ψ_i )^2.
The implication of Σ_{i=0}^{∞} |ψ_i| < ∞ is that Σ_{k=-∞}^{∞} |γ(k)| < ∞.

(d)  (Exercise 3.11)  If {ε_j} is an ARMA(p,q) process for which the polynomial φ(z) is not equal to zero on |z| = 1, then the autocovariance function γ(·) of {ε_j} is geometrically bounded, i.e. there exist constants C > 0 and s ∈ (0,1) such that, for all integers k,
|γ(k)| ≤ C s^{|k|},
and hence that Σ_{k=-∞}^{∞} |k γ(k)| < ∞.

(e)  (Theorem 7.1.1)  If {ε_j} is an ARMA(p,q) process for which the polynomials φ(z) and θ(z) have no common zeros and φ(z) has no zeros on |z| ≤ 1, then, as n → ∞,
n^{-1} Var( Σ_{j=1}^{n} ε_j ) = Σ_{|k|<n} (1 - |k|/n) γ(k) → Σ_{k=-∞}^{∞} γ(k),
where γ(·) is the autocovariance function of {ε_j}.  This implies Σ_{k=-∞}^{∞} γ(k) ≥ 0.
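The quantities in properties (c) and (e) are easy to reproduce for a simple error model (a minimal sketch, assuming Python with numpy; the AR(1) parameters and the Monte Carlo sizes below are illustrative choices, not values from the text): for an AR(1) process ψ_i = φ^i, so Σ_k γ(k) = σ^2 (Σ_i ψ_i)^2 = σ^2/(1-φ)^2, and n^{-1} Var(Σ_j ε_j) should approach that value.

    import numpy as np

    rng = np.random.default_rng(0)

    def ar1_errors(n, phi, sigma, burn=500):
        """AR(1) regression errors: eps_j = phi * eps_{j-1} + e_j, e_j iid N(0, sigma^2)."""
        e = rng.normal(0.0, sigma, n + burn)
        eps = np.zeros(n + burn)
        for j in range(1, n + burn):
            eps[j] = phi * eps[j - 1] + e[j]
        return eps[burn:]                  # drop the burn-in so the series is stationary

    n, phi, sigma = 200, 0.5, 0.1
    sums = np.array([ar1_errors(n, phi, sigma).sum() for _ in range(2000)])
    print(sums.var() / n, sigma ** 2 / (1 - phi) ** 2)  # n^{-1} Var(sum) vs sum_k gamma(k)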
3.3  Behaviors of the Two Optimal Bandwidths

In this section, the asymptotic behaviors of the two optimal bandwidths h_A and h_M are studied.  In order to derive the asymptotic behaviors, using the equally spaced fixed design nonparametric regression model (1.1) and the Nadaraya-Watson estimator (1.2), we must impose the following assumptions:

(A.1)  The regression function m(x), supported on the interval [0,1], has a uniformly continuous and square integrable second derivative m''(x) on the interval (0,1).
(A.2)  The kernel function K is a symmetric probability density function with support contained in the interval [-1,1].  The second derivative K'' of K is Hoelder continuous of order 1.
(A.3)  The weight function W is bounded, Hoelder continuous of order 1, and compactly supported on the interval (0,1) with a nonempty interior.
(A.4)  The regression errors ε_j are an unknown ARMA(p,q) process for which the polynomials φ(z) and θ(z) have no common zeros, φ(z) has no zeros on |z| ≤ 1, and the e_j are IID random variables with mean zero and all finite moments μ_k = E(e_1^k) for all positive integers k.
(A.5)  The autocovariance function γ(·) of the regression errors ε_j has a positive sum, i.e. 0 < Σ_{k=-∞}^{∞} γ(k) < ∞.
(A.6)  The bandwidth h is chosen from the interval H_n = [C^{-1} n^{-1+δ}, C n^{-δ}] for n = 1, 2, ..., with constants C > 0 and δ ∈ (0, 1/5).
(A.7)  The number of observations deleted in the construction of the modified cross-validation score is 2ℓ+1, with ℓ << nh.  The total number of observations in this regression setting is n, with n → ∞.

Let the notation X_n = o_u(v_n) mean that, as n → ∞, |X_n / v_n| → 0 almost surely, and uniformly on H_n if v_n involves h.
As n → ∞, under the above assumptions, it is shown in Section 3.6 that d_M(h) and d_A(h) have the following asymptotic expressions:
(3.3.1)  d_M(h) = d_M^T(h) + o_u(n^{-1}h^{-1} + h^4),
(3.3.2)  d_A(h) = d_M^T(h) + o_u(n^{-1}h^{-1} + h^4),
where the superscript T represents a truncation and
d_M^T(h) = a_1 n^{-1}h^{-1} + b_1 h^4,
and where
a_1 = ( Σ_{k=-∞}^{∞} γ(k) ) ∫K^2 ∫W,
b_1 = (1/4) (∫u^2 K)^2 ∫(m'')^2 W.
Here and throughout this chapter, the notation ∫ denotes ∫ du.  For the components of mean average square error, the terms a_1 n^{-1}h^{-1} and b_1 h^4 represent the variance and the squared bias, respectively.  Through a straightforward calculation, d_M^T(h) has the unique minimizer
(3.3.3)  h_M^T = C_0 n^{-1/5},
where
C_0 = [ a_1 / (4 b_1) ]^{1/5} = [ ( Σ_{k=-∞}^{∞} γ(k) ) ∫K^2 ∫W (∫u^2 K)^{-2} (∫(m'')^2 W)^{-1} ]^{1/5}.
Here, if the coefficient b_1 is zero, then d_M^T(h) has no minimizer.  This implies that d_M(h) and d_A(h) have no minimizer asymptotically.  Also, if h is zero, then d_M(h) is γ(0) ∫W.  The implication of (3.3.1) and (3.3.2) is that h_A and h_M are of the same order n^{-1/5} as h_M^T.
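As a small illustration of (3.3.3) (a sketch only, assuming Python with numpy and simple numerical integration; the regression function, kernel, and weight are those used in the simulations of Section 3.5, while the value of Σ_k γ(k) below is a hypothetical illustration rather than a value from the text):

    import numpy as np

    u = np.linspace(-0.5, 0.5, 20001)
    x = np.linspace(0.0, 1.0, 20001)

    K = (15 / 8) * (1 - 4 * u ** 2) ** 2              # bi-weight kernel on [-1/2, 1/2]
    W = (5 / 3) * ((x >= 0.2) & (x <= 0.8))           # weight function (5/3) on [1/5, 4/5]
    m2 = 6 * x * (1 - x) * (1 - 5 * x + 5 * x ** 2)   # m''(x) for m(x) = x^3 (1-x)^3

    int_K2, int_u2K = np.trapz(K ** 2, u), np.trapz(u ** 2 * K, u)
    int_W, int_m2W = np.trapz(W, x), np.trapz(m2 ** 2 * W, x)

    def h_MT(n, sum_gamma):
        """h_M^T = C_0 n^(-1/5), with C_0 from (3.3.3)."""
        C0 = (sum_gamma * int_K2 * int_W / (int_u2K ** 2 * int_m2W)) ** 0.2
        return C0 * n ** (-0.2)

    sum_gamma = 0.003      # illustrative value of sum_k gamma(k)
    print(h_MT(200, sum_gamma))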
The rest of this section is devoted to deriving the relationships between h_M^T, h_A, and h_M.  Let
D_M(h) = d_M(h) - d_M^T(h),  D_A(h) = d_A(h) - d_M^T(h).
Observe that
(3.3.4)  0 = d_M'(h_M) = d_M^{T}'(h_M) + D_M'(h_M) = (h_M - h_M^T) d_M^{T}''(h_M^*) + D_M'(h_M),
where h_M^* lies in between h_M and h_M^T, and
(3.3.5)  0 = d_A'(h_A) = d_M^{T}'(h_A) + D_A'(h_A) = (h_A - h_M^T) d_M^{T}''(h_A^*) + D_A'(h_A),
where h_A^* lies in between h_A and h_M^T.
Following the facts that d_M^{T}''(h) = O(n^{-1}h^{-3} + h^2), D_M'(h) = o(n^{-1}h^{-2} + h^3), and D_A'(h) = o_u(n^{-1}h^{-2} + h^3), and since h_A, h_M, and h_M^T are all of the same order n^{-1/5}, then h_A^* and h_M^* are of the order n^{-1/5} and
d_M^{T}''(h_M^*) = O(n^{-2/5}),  d_M^{T}''(h_A^*) = O(n^{-2/5}).
Thus, it is concluded that, as n → ∞, combining with (3.3.4), (3.3.5), and h_M^T = O(n^{-1/5}), we have the following results:
(3.3.6)  h_M = h_M^T (1 + o(1)) = C_0 n^{-1/5} (1 + o(1)),
(3.3.7)  h_A = h_M^T (1 + o_u(1)) = C_0 n^{-1/5} (1 + o_u(1)).
3.4  Behaviors of the Modified Cross-validated Bandwidth

In this section, the asymptotic behavior of ĥ_CV(ℓ), the minimizer of CV_ℓ(h), is discussed for each ℓ << nh.  Through adding and subtracting the terms m̂(x_j) and m(x_j), the modified cross-validation score CV_ℓ(h) can be expressed as
(3.4.1)  CV_ℓ(h) = n^{-1} Σ_{j=1}^{n} ε_j^2 W(x_j) + d_A(h) - 2 Cross_ℓ(h) + Remainder_ℓ(h),
where
Cross_ℓ(h) = n^{-1} Σ_{j=1}^{n} ε_j [ m̂_j(x_j) - m(x_j) ] W(x_j),
Remainder_ℓ(h) = n^{-1} Σ_{j=1}^{n} [ m̂_j(x_j) - m̂(x_j) ][ m̂_j(x_j) + m̂(x_j) - 2 m(x_j) ] W(x_j),
and where m̂_j(x_j) is a "leave-(2ℓ+1)-out" version of m̂(x_j) as given in (3.1.2) for every j.
For each ℓ << nh, under the assumptions given in Section 3.3, as n → ∞, it is shown in Section 3.6 that
(3.4.2)  Remainder_ℓ(h) = o_u(n^{-1}h^{-1} + h^4),
(3.4.3)  E(Cross_ℓ(h)) = 2 n^{-1}h^{-1} ( Σ_{k>ℓ} γ(k) ) K(0) ∫W + o(ℓ n^{-2}h^{-2}),
(3.4.4)  Cross_ℓ(h) = E(Cross_ℓ(h)) + o_u(n^{-1}h^{-1} + h^4).
Looking at (3.4.1) through (3.4.4), we see that the effect of the dependent observations on the modified cross-validation criterion CV_ℓ(h) is determined asymptotically by the term -2 Cross_ℓ(h) as given in (3.4.1).  In the rest of this section, we shall study how this term affects the modified cross-validation criterion.
For each ℓ << nh, let
(3.4.5)  d_{Mℓ}^S(h) = d_M^T(h) - 2 E(Cross_ℓ(h)),
where the superscript S represents a subtraction and the subscript ℓ applies to the "leave-(2ℓ+1)-out" version of the modified cross-validation.  Using (3.3.1), (3.3.2), and (3.4.1) through (3.4.4), we obtain the following asymptotic expressions for each ℓ << nh:
(3.4.6)  CV_ℓ(h) = n^{-1} Σ_{j=1}^{n} ε_j^2 W(x_j) + d_{Mℓ}^S(h) + o_u(n^{-1}h^{-1} + h^4),
(3.4.7)  CV_ℓ(h) = n^{-1} Σ_{j=1}^{n} ε_j^2 W(x_j) + a_{1ℓ}^S n^{-1}h^{-1} + b_1 h^4 + o_u(n^{-1}h^{-1} + h^4),
where
a_{1ℓ}^S = [ ( Σ_{k=-∞}^{∞} γ(k) ) ∫K^2 - 4 ( Σ_{k>ℓ} γ(k) ) K(0) ] ∫W.
If a_{1ℓ}^S > 0, then, through a straightforward calculation, a_{1ℓ}^S n^{-1}h^{-1} + b_1 h^4 has the unique minimizer C_{0ℓ}^S n^{-1/5}, where
C_{0ℓ}^S = [ ( ( Σ_{k=-∞}^{∞} γ(k) ) ∫K^2 - 4 ( Σ_{k>ℓ} γ(k) ) K(0) ) ∫W (∫u^2 K)^{-2} (∫(m'')^2 W)^{-1} ]^{1/5}.
If a_{1ℓ}^S is negative or zero, then a_{1ℓ}^S n^{-1}h^{-1} + b_1 h^4 has no minimizer on H_n asymptotically.
Let h_{Mℓ}^S be the minimizer of d_{Mℓ}^S(h).  Since the first term on the right hand side of (3.4.7), n^{-1} Σ_{j=1}^{n} ε_j^2 W(x_j), is independent of h, and following the same arguments as in (3.3.4) through (3.3.7), we have the following asymptotic results for any ℓ << nh, as n → ∞ and C_{0ℓ}^S > 0:
(3.4.8)  h_{Mℓ}^S = C_{0ℓ}^S n^{-1/5} (1 + o(1)),
(3.4.9)  ĥ_CV(ℓ) = C_{0ℓ}^S n^{-1/5} (1 + o_u(1)).
Finally, combining (3.3.6) with (3.4.9), we have, as n → ∞ and C_{0ℓ}^S > 0,
(3.4.10)  ĥ_CV(ℓ)/h_M = (C_{0ℓ}^S / C_0)(1 + o_u(1)),
where
C_{0ℓ}^S / C_0 = [ 1 - (4 K(0) / ∫K^2) ( Σ_{k>ℓ} γ(k) / Σ_{k=-∞}^{∞} γ(k) ) ]^{1/5}.

3.5  Discussion
From (3.3.7), (3.4.7), (3.4.10), and the coefficient a_{1ℓ}^S, we arrive at the following ideas about the effect of the dependence structure on the modified cross-validation criterion:
a.  It moves ĥ_CV(ℓ) toward the right or the left of h_M depending on the value of Σ_{k>ℓ} γ(k).  If Σ_{k>ℓ} γ(k) is positive, then ĥ_CV(ℓ) is on the left side of h_M and the value of the ratio C_{0ℓ}^S/C_0 is between 0 and 1.  If Σ_{k>ℓ} γ(k) is negative, then ĥ_CV(ℓ) is on the right side of h_M and the value of the ratio C_{0ℓ}^S/C_0 is greater than 1.  If Σ_{k>ℓ} γ(k) is too large, i.e. a_{1ℓ}^S ≤ 0, then CV_ℓ(h) has no minimizer on H_n.
b.  As the value of ℓ increases, i.e. a_{1ℓ}^S approaches a_1, ĥ_CV(ℓ) approaches h_A or h_M.
To illustrate the above ideas, we do a simulated regression.  Figures 3.1, 3.2, 3.3, and 3.4 use the average of 100 curves of CV_ℓ(h) and d_A(h) generated from the pseudo AR(1) regression errors ε_j = φ ε_{j-1} + e_j and the MA(1) regression errors ε_j = θ e_{j-1} + e_j, where the e_j are IID N(0, σ^2).
Figure 3.1: Relationships among the modified cross-validated bandwidths
(solid vertical lines) and the optimal bandwidth (dashed vertical line)
for the average function and the positively correlated observations.
[Plot: averaged curves of CV_0, CV_2, CV_5 and d_M against LOG(h) over [-0.70, 0.00]; solid vertical lines mark the cross-validated minimizers and a dashed vertical line marks the optimal bandwidth.]
Figure 3.2: Relationships among the modified cross-validated bandwidths
(solid vertical lines) and the optimal bandwidth (dashed vertical line)
for the average function and the negatively correlated observations.
[Plot: averaged curves of CV_0, CV_2, CV_5 and d_M against LOG(h) over [-0.70, 0.00]; solid vertical lines mark the cross-validated minimizers and a dashed vertical line marks the optimal bandwidth.]
Figure 3.3: Relationships among the modified cross-validated bandwidths
(solid vertical lines) and the optimal bandwidth (dashed vertical line)
for the average function and the one step positively correlated
observations.
[Plot: averaged curves of CV_0, CV_2, CV_5 and d_M against LOG(h) over [-0.70, 0.00]; solid vertical lines mark the cross-validated minimizers and a dashed vertical line marks the optimal bandwidth.]
Figure 3.4: Relationships among the modified cross-validated bandwidths
(solid vertical lines) and the optimal bandwidth (dashed vertical line)
for the average function and the one step negatively correlated
observations.
[Plot: averaged curves of CV_0, CV_2, CV_5 and d_M against LOG(h) over [-0.70, 0.00]; solid vertical lines mark the cross-validated minimizers and a dashed vertical line marks the optimal bandwidth.]
The regression function is given as m(x) = x^3 (1-x)^3 for 0 ≤ x ≤ 1.  The kernel function is taken as the bi-weight kernel, K(x) = (15/8)(1 - 4x^2)^2 1_{[-1/2, 1/2]}(x).  The weight function is taken as W(x) = (5/3) 1_{[1/5, 4/5]}(x).  The Nadaraya-Watson estimator (1.2) is applied to get the kernel estimates.  The sample size is n = 200, and the values of ℓ are ℓ = 0, 1, 2, 5, and 15.  Figure 3.1 shows the case of AR(1) regression errors with φ = 0.5 and σ^2 = 0.02654, and Figure 3.2 shows the case of AR(1) regression errors with φ = -0.5 and σ^2 = 0.008852.  Figure 3.3 shows the case of MA(1) regression errors with θ = 2 and σ^2 = 0.0058992, and Figure 3.4 shows the case of MA(1) regression errors with θ = -2 and σ^2 = 0.0177.  In the above cases, the values of σ^2 are taken to make h_M roughly 1/2.  Vertical lines are the minimizers of the curves.  The notation * denotes the intersection point where the vertical line and the CV_ℓ(h) curve meet.

If the regression errors are an MA(q) process with an unknown order q, then the dependence effect is totally avoided whenever ℓ ≥ q, i.e. Σ_{k>ℓ} γ(k) = 0.  In practice, the MA(q) regression errors are detectable by looking at the sample autocorrelation function of Y_j - Y_{j-1}, which should vanish at lags greater than q+1.
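This diagnostic is simple to code (a sketch, assuming Python with numpy; the data-generating choices below are illustrative and echo the MA(1) setting of Figure 3.3, not a prescription from the text): compute the sample autocorrelations of the first differences Y_j - Y_{j-1}; for MA(q) errors they should be negligible beyond lag q+1.

    import numpy as np

    def sample_acf(z, max_lag):
        """Sample autocorrelations of z at lags 1..max_lag."""
        z = z - z.mean()
        denom = np.sum(z ** 2)
        return np.array([np.sum(z[k:] * z[:-k]) / denom for k in range(1, max_lag + 1)])

    n = 200
    xg = np.arange(1, n + 1) / n
    e = np.random.default_rng(2).normal(0.0, 0.05, n + 1)
    eps = 2.0 * e[:-1] + e[1:]                     # MA(1) errors with theta = 2
    Y = xg ** 3 * (1 - xg) ** 3 + eps              # Y_j = m(x_j) + eps_j
    print(np.round(sample_acf(np.diff(Y), 6), 2))  # roughly: sizable at lags 1-2, small beyond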
If the regression errors are an AR(1) process with parameter φ, |φ| < 1, and finite variance σ^2, then
| Σ_{k>ℓ} γ(k) | ≤ σ^2 |φ|^{ℓ+1} / ( (1 - |φ|)(1 - φ^2) ).
We solve the inequality |φ|^ℓ ≤ n^{-1/5} to get
ℓ ≥ (-1/5)(log n)/(log |φ|).
If |φ| is not larger than 0.9, then taking ℓ = 4.5 (log n) is enough to make the quantity Σ_{k>ℓ} γ(k) negligible.  In practice, the AR(1) regression errors are also detectable by looking at the sample autocorrelation function of Y_j - Y_{j-1}, which would act as the usual AR(1) process at lags greater than or equal to 1.
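For a concrete AR(1) case, the recommended deletion parameter and the neglected tail sum can be computed directly (a minimal sketch, assuming Python with numpy; the function names and the example values n = 200, φ = 0.5 are illustrative):

    import numpy as np

    def ell_for_ar1(n, phi):
        """Smallest ell with |phi|**ell <= n**(-1/5), i.e. ell >= (-1/5) log n / log |phi|."""
        return int(np.ceil(-0.2 * np.log(n) / np.log(abs(phi))))

    def ar1_tail_sum(phi, sigma2, ell):
        """sum_{k > ell} gamma(k) for AR(1), with gamma(k) = sigma2 * phi**|k| / (1 - phi**2)."""
        return sigma2 * phi ** (ell + 1) / ((1 - phi) * (1 - phi ** 2))

    print(ell_for_ar1(200, 0.5))            # 2 for n = 200, phi = 0.5
    print(ar1_tail_sum(0.5, 0.01, 2))       # the tail neglected by leaving 2*ell+1 points out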
For general ARMA regression errors ε_j, using the geometric boundedness of the autocovariance function γ(·) of ε_j (property (d) as given in Section 3.2), we have
| Σ_{k>ℓ} γ(k) | ≤ C s^{ℓ+1} / (1 - s).
This implies that (C_{0ℓ}^S / C_0) converges to 1 at a polynomial rate as ℓ → ∞.  Therefore, taking ℓ = O(log n), the modified cross-validation criterion CV_ℓ(h) would provide asymptotically optimal bandwidths when the observations suffer from a short range dependence.

Since the optimal bandwidths h_A and h_M are of the same order n^{-1/5}, the value of ℓ is allowed to be o(n^{4/5}) around the optimal bandwidths.  This leaves a lot of room for the modified cross-validation criterion to handle more complicated and longer range dependence structures than the ARMA processes used in this chapter.  See Granger and Joyeux (1980) and Cox (1984) for a detailed discussion of long range dependence processes.
The results above may be extended to the case where the regression errors ε_j are a linear process instead of only an ARMA process, i.e. the ε_j are defined by
ε_j = Σ_{i=0}^{∞} ψ_i e_{j-i}  for every j.
The linear process ε_j still has the same results obtained in this chapter when the following assumptions hold:
a.  The IID random variables e_j have mean zero and all finite moments, for every j.
b.  The coefficients ψ_i are real numbers such that
Σ_{k=-∞}^{∞} | k γ(k) | < ∞,
where γ(k) = E(e_1^2) Σ_{i=-∞}^{∞} ψ_i ψ_{i+k} for all integers k.
c.  When ℓ = o(n^{4/5}),
| Σ_{k>ℓ} γ(k) | = O(n^{-r}), for some r > 0.
3.6  Proofs

Let c, c_1, and c_2 denote generic constants.  Before we start the proofs, we need to introduce some properties which will be used in this section.

Theorem A (Section 1.4 of Serfling (1980)).  Suppose that, as n → ∞, X_n → X in distribution and the sequence {X_n^r} is uniformly integrable, where r > 0.  Then E(|X|^r) < ∞,
lim_{n→∞} E(X_n^r) = E(X^r),  and  lim_{n→∞} E(|X_n|^r) = E(|X|^r).

Whittle's inequality (Theorem 2 of Whittle (1960)).  Let
A = Σ_{i=1}^{n} a_i X_i,  B = Σ_{i=1}^{n} Σ_{j=1}^{n} b_{ij} X_i X_j,  and  ν_i(k) = [ E(|X_i|^k) ]^{1/k},
where X_1, ..., X_n are independent random variables with mean zero and the a_i, b_{ij} are real numbers for i, j = 1, 2, ..., n.  Then the following inequalities are valid:
(i)   E(|A|^k) ≤ 2^k C(k) ( Σ_{i=1}^{n} a_i^2 ν_i(k)^2 )^{k/2},
(ii)  E(|B - E(B)|^k) ≤ 2^{3k} C(k) C(2k)^{1/2} ( Σ_{i=1}^{n} Σ_{j=1}^{n} b_{ij}^2 ν_i(2k)^2 ν_j(2k)^2 )^{k/2},
where C(k) = 2^{k/2} Γ((k+1)/2) π^{-1/2}, provided k ≥ 2 and the right-hand members exist.
We first extend Whittle's inequality to infinite series.  If the X_i are IID random variables with mean zero and all finite moments, and the a_i and b_{ij} are real numbers such that Σ_{i=-∞}^{∞} |a_i| < ∞ and Σ_{i=-∞}^{∞} Σ_{j=-∞}^{∞} |b_{ij}| < ∞, then for all positive integers k, letting
A = Σ_{i=-∞}^{∞} a_i X_i  and  B = Σ_{i=-∞, i≠j}^{∞} Σ_{j=-∞}^{∞} b_{ij} X_i X_j,
we have
(3.6.1)  E(A^{2k}) ≤ c_1 ( Σ_{i=-∞}^{∞} a_i^2 )^k,
(3.6.2)  E(B^{2k}) ≤ c_2 ( Σ_{i=-∞}^{∞} Σ_{j=-∞}^{∞} b_{ij}^2 )^k,
where c_1 and c_2 are constants involving k and the moments of X_1.
Proofs of (3.6.1) and (3.6.2):
Because Σ_{i=-∞}^{∞} |a_i| and Σ_{i=-∞}^{∞} Σ_{j=-∞}^{∞} |b_{ij}| absolutely converge, we have, as n → ∞,
a.  A_n = Σ_{i=-n}^{n} a_i X_i → A in probability,
b.  B_n = Σ_{i=-n, i≠j}^{n} Σ_{j=-n}^{n} b_{ij} X_i X_j → B in probability.
We first check that, for each k = 1, 2, ...,
1.  E(A_n^{2k}) < ∞ and E(B_n^{2k}) < ∞, for all n = 1, 2, ...,
2.  {A_n^{2k}} and {B_n^{2k}} are uniformly integrable.
For the first one, using Whittle's inequality, we have
E(A_n^{2k}) ≤ c_1 ( Σ_{i=-n}^{n} a_i^2 )^k ≤ c_1 ( Σ_{i=-∞}^{∞} a_i^2 )^k < ∞,
E(B_n^{2k}) ≤ c_2 ( Σ_{i=-n}^{n} Σ_{j=-n}^{n} b_{ij}^2 )^k ≤ c_2 ( Σ_{i=-∞}^{∞} Σ_{j=-∞}^{∞} b_{ij}^2 )^k < ∞.
For the second one, the uniform integrability follows since
E(A_n^{4k}) ≤ c_1 ( Σ_{i=-∞}^{∞} |a_i| )^{4k} < ∞,
E(B_n^{4k}) ≤ c_2 ( Σ_{i=-∞}^{∞} Σ_{j=-∞}^{∞} |b_{ij}| )^{4k} < ∞.
Then, using Theorem A in Section 1.4 of Serfling (1980) as described above, we have
E(A_n^{2k}) → E(A^{2k}) ≤ c_1 ( Σ_{i=-∞}^{∞} a_i^2 )^k,
E(B_n^{2k}) → E(B^{2k}) ≤ c_2 ( Σ_{i=-∞}^{∞} Σ_{j=-∞}^{∞} b_{ij}^2 )^k.
The proofs of (3.6.1) and (3.6.2) are complete.
If the ε_j are a linear process defined by
ε_j = Σ_{i=0}^{∞} ψ_i e_{j-i},  for j = 1, 2, ..., n,
where the ψ_i are real numbers with Σ_{i=0}^{∞} |ψ_i| < ∞, and the e_j are IID random variables with mean zero and all finite moments, then
(3.6.3)  E( ( Σ_{j=1}^{n} ε_j )^{2k} ) = O(n^k),  for all positive integers k.
The equation (3.6.3) implies
(3.6.4)  n^{-1} Σ_{j=1}^{n} ε_j = o_u(n^{-2/5}),
(3.6.5)  n^{-1} Σ_{i=1}^{n} K_h(x - x_i) ε_i = o_u(n^{-2/5} h^{-2/5}),  for 0 < x < 1.
Proofs of (3.6.3) through (3.6.5):
For each finite n, let
S = Σ_{j=1}^{n} ε_j.
By Fubini's Theorem, the absolute convergence of Σ_{i=0}^{∞} |ψ_i|, and the finite moments of the e_j, we have
S = Σ_{j=1}^{n} ε_j = Σ_{j=1}^{n} Σ_{i=0}^{∞} ψ_i e_{j-i} = Σ_{i=0}^{∞} ψ_i ( Σ_{j=1}^{n} e_{j-i} ) = Σ_{i=0}^{∞} ψ_i X_i,
where, for each i = 0, 1, ...,
X_i = Σ_{j=1}^{n} e_{j-i},
and where X_i satisfies, for all positive integers k,
E(X_i^{2k}) = E(X_1^{2k}) ≤ c_1 n^k,  by Whittle's inequality.
Let, for each ℓ = 1, 2, ...,
S_ℓ = Σ_{i=0}^{ℓ} ψ_i X_i.
Applying the absolute convergence of Σ_{i=0}^{∞} |ψ_i|, we have, as ℓ → ∞,
E(|S_ℓ - S|) = E( | Σ_{i>ℓ} ψ_i X_i | ) ≤ Σ_{i>ℓ} |ψ_i| E(|X_i|) = E(|X_1|) Σ_{i>ℓ} |ψ_i| → 0,
i.e. S_ℓ → S in probability.
Applying Minkowski's inequality and Whittle's inequality, for each finite ℓ and all positive integers k, we have
E(S_ℓ^{2k}) ≤ E(X_1^{2k}) ( Σ_{i=0}^{ℓ} |ψ_i| )^{2k} ≤ c_1 n^k ( Σ_{i=0}^{∞} |ψ_i| )^{2k}.
Using the finite moments of S_ℓ, we have
lim sup_{c→∞} sup_ℓ E( S_ℓ^{2k} 1[S_ℓ^{2k} > c] ) ≤ lim sup_{c→∞} sup_ℓ c^{-1} E(S_ℓ^{4k}) = 0,
i.e. {S_ℓ^{2k}} is uniformly integrable.
Then, using Theorem A in Section 1.4 of Serfling (1980) as described above, we have, as ℓ → ∞,
E(S^{2k}) ≤ c_1 n^k ( Σ_{i=0}^{∞} |ψ_i| )^{2k} = O(n^k).
The proof of (3.6.3) is complete.
For (3.6.4), use the Borel-Cantelli lemma: for any ξ > 0,
Σ_n P( n^{2/5} | n^{-1} Σ_{j=1}^{n} ε_j | > ξ ) ≤ Σ_n ξ^{-2k} E( ( n^{-3/5} Σ_{j=1}^{n} ε_j )^{2k} ) ≤ Σ_n ξ^{-2k} c_1 n^{-k/5} < ∞,  if k > 5.
The proof of (3.6.4) is complete.
For (3.6.5), by the definition of K and K_h(·) = h^{-1} K(·/h), there are O(nh) nonzero terms in n^{-1} Σ_{i=1}^{n} K_h(x - x_i) ε_i.  For h ∈ H_n, we have
C^{-1} n^{-1+δ} ≤ n^{-1} h^{-1} ≤ C n^{-δ}.
Using the Borel-Cantelli lemma again, for any ξ > 0, the corresponding probability series converges provided k > 5/δ.  The proof of (3.6.5) is complete.
For any ℓ << nh and each x_j with W(x_j) ≠ 0, or h < x_j < 1-h, under the assumptions given in Section 3.3, we have the following asymptotic results:
(3.6.6)  n^{-1} Σ_{i=1}^{n} K_h(x_j - x_i) = 1 + O(n^{-1}h^{-1}),
(3.6.7)  (n-2ℓ-1)^{-1} Σ_{i:|i-j|>ℓ} K_h(x_j - x_i) = 1 + O(ℓ n^{-1}h^{-1}),
(3.6.8)  b_j = [ n^{-1} Σ_{i=1}^{n} K_h(x_j - x_i)(m(x_i) - m(x_j)) ] / [ n^{-1} Σ_{i=1}^{n} K_h(x_j - x_i) ] = (1/2) h^2 m''(x_j) ∫u^2 K + o(h^2),
(3.6.9)  b_{ℓj} = [ (n-2ℓ-1)^{-1} Σ_{i:|i-j|>ℓ} K_h(x_j - x_i)(m(x_i) - m(x_j)) ] / [ (n-2ℓ-1)^{-1} Σ_{i:|i-j|>ℓ} K_h(x_j - x_i) ] = b_j + o(h^2),
(3.6.10)  v_j = [ n^{-1} Σ_{i=1}^{n} K_h(x_j - x_i) ε_i ] / [ n^{-1} Σ_{i=1}^{n} K_h(x_j - x_i) ] = n^{-1} Σ_{i=1}^{n} K_h(x_j - x_i) ε_i + o_u((nh)^{-7/5}),
(3.6.11)  v_{ℓj} = [ (n-2ℓ-1)^{-1} Σ_{i:|i-j|>ℓ} K_h(x_j - x_i) ε_i ] / [ (n-2ℓ-1)^{-1} Σ_{i:|i-j|>ℓ} K_h(x_j - x_i) ] = (n-2ℓ-1)^{-1} Σ_{i:|i-j|>ℓ} K_h(x_j - x_i) ε_i + o_u((nh)^{-7/5}).
Proof of (3.6.9):
Here we only need to check
n^{-1} Σ_{i:|i-j|≤ℓ} K_h(x_j - x_i)(m(x_i) - m(x_j)) = o(h^2).
Using the symmetry of K, the Hoelder continuity of K and m, and ℓ << nh, we have
n^{-1} Σ_{i:|i-j|≤ℓ} K_h(x_j - x_i)(m(x_i) - m(x_j))
= ∫_{(j-ℓ)/n}^{(j+ℓ)/n} K_h(x_j - t)(m(t) - m(x_j)) dt + O(ℓ n^{-2}h^{-1})
= ∫_{-ℓ/nh}^{ℓ/nh} K(u)(m(x_j - hu) - m(x_j)) du + O(ℓ n^{-2}h^{-1})
= (1/2) h^2 (1 + o(1)) m''(x_j) ∫_{-ℓ/nh}^{ℓ/nh} u^2 K(u) du + O(ℓ n^{-2}h^{-1})
= O(ℓ n^{-1}h + ℓ n^{-2}h^{-1}) = o(h^2).
The proof of (3.6.9) is complete.
Proof of (3.3.1):
Since m̂(x_j) - m(x_j) = b_j + v_j, then d_M(h) can be expressed as
d_M(h) = E( n^{-1} Σ_{j=1}^{n} (m̂(x_j) - m(x_j))^2 W(x_j) )
= E( n^{-1} Σ_{j=1}^{n} (b_j + v_j)^2 W(x_j) )
= n^{-1} Σ_{j=1}^{n} b_j^2 W(x_j) + n^{-1} Σ_{j=1}^{n} E(v_j^2) W(x_j).
Using b_j = (1/2) h^2 m''(x_j) ∫u^2 K + o(h^2) as given in (3.6.8) and the uniform continuity of m'', we have
(3.6.12)  n^{-1} Σ_{j=1}^{n} b_j^2 W(x_j) = n^{-1} Σ_{j=1}^{n} [ (1/2) h^2 m''(x_j) ∫u^2 K + o(h^2) ]^2 W(x_j)
= (1/4) h^4 (∫u^2 K)^2 n^{-1} Σ_{j=1}^{n} (m''(x_j))^2 W(x_j) + o(h^4)
= (1/4) h^4 (∫u^2 K)^2 ∫(m'')^2 W + o(h^4).
Using n^{-1} Σ_{i=1}^{n} K_h(x_j - x_i) = 1 + O(n^{-1}h^{-1}) as given in (3.6.6), we have
E(v_j^2) = n^{-2} Σ_{s=1}^{n} Σ_{t=1}^{n} K_h(x_j - x_s) K_h(x_j - x_t) γ(s-t) (1 + O(n^{-1}h^{-1})).
Then, letting k = s-t and using Σ_{k=-∞}^{∞} |k γ(k)| < ∞, K_h(x_j - x_s + k/n) = K_h(x_j - x_s) + O(|k| n^{-1}h^{-2}), and the O(nh) nonzero terms in Σ_s, we have
(3.6.13)  n^{-1} Σ_{j=1}^{n} E(v_j^2) W(x_j)
= n^{-3} Σ_{j=1}^{n} Σ_{s=1}^{n} Σ_{t=1}^{n} K_h(x_j - x_s) K_h(x_j - x_t) γ(s-t) W(x_j) (1 + O(n^{-1}h^{-1}))
= n^{-3} Σ_{j=1}^{n} Σ_{|k|<n} Σ_{s} K_h(x_j - x_s) K_h(x_j - x_s + k/n) γ(k) W(x_j) (1 + O(n^{-1}h^{-1}))
= n^{-1} Σ_{|k|<n} [ n^{-1} Σ_{s} K_h(x_j - x_s) K_h(x_j - x_s + k/n) ] γ(k) [ n^{-1} Σ_{j=1}^{n} W(x_j) ] (1 + O(n^{-1}h^{-1}))
= n^{-1} Σ_{|k|<n} [ n^{-1} Σ_{s} K_h(x_j - x_s)^2 + O(|k| n^{-1}h^{-2}) ] γ(k) [ n^{-1} Σ_{j=1}^{n} W(x_j) ] (1 + O(n^{-1}h^{-1}))
= n^{-1} Σ_{|k|<n} [ h^{-1} ∫K^2 + O(|k| n^{-1}h^{-2}) ] γ(k) [ ∫W + O(n^{-1}) ] (1 + O(n^{-1}h^{-1}))
= n^{-1}h^{-1} ∫K^2 ∫W [ Σ_{|k|<n} γ(k) ] + O(n^{-2}h^{-2}).
The proof of (3.3.1) is complete.
Proof of (3.3.2):
Using (3.3.1), the proof of (3.3.2) is complete if we show that,
as n
--+ co,
(3.6.14)
where .
dA(h) - dM(h)
= 2n-1 .!n
bjv.W(x j ) + n
J=1
J
-1 n
!
(v.
j=1
J
2
2
- E{v. »W{x.).
J
J
Using the Hoelder continuity of K and m, the compactness of the
support of K, the boundedness of W, and b.{h) = O{h2 ) as given in
J
(3.6.8), we have the following expressions: for any h, hI € Hn . with
h ~ hI'
Then, we have
n
-1 n
!
j=1
[b.{h)vj(h) - bj{hl)Vj(hl)]W{Xj)
J
-72-
n
}: [b.(h)(vj(h) - v.(h 1» + (bj(h) j=1
J
J
1
= [h2 (h- 1 - hI-I) + (h2 - h 2 )h 1- ]-OU(I)
I
= n
-1
,
b.(hl»v.(hl)]~(x,)
J
J
J
= h1I(h-h1)/hI-oU(I),
and
-1 n
2
2
2
2
}: [(v.(h) - E(v.(h) » - (v j (h 1 ) - E(v .(h 1) »]W(x .)
J
J
j=I
J
J
-2
-2
= (h
- hI )-ou(l)
2
= h- I(h-h 1)/hI-ou{l)·
-1 -1+0
-0
.
For any constant r ~ 3, h, hI € Hn = [C n
,en], wIth h ~ hI'
we have
n
sup
l(dA(h) - dM(h»
\ (h-hI)/h I~n-r
~
sup
- (d A(h 1) - dM(h 1»I
1 n
12n- }: [bj(h)vj(h) - b j (h 1)v j (h 1)]W(x.)! +
I(h-hl)/hl~n-r
j=1
J
sup
l(h-hI)/hI~n-r
-1 n
In}:
[(vj(h)
2
j=I
2
2
- E(vj(h) » - (v j (h 1 ) E(v.(h 1)2»]W(X.)
=
2
sup
(hI + h- )I(h-h 1)/h\-0 (1)
\(h-hI)/h \ ~n-r
u
J
J
I
= n 2n -r -0u (1) = 0 u (n-1 ).
Hence, it is sufficient to restrict the supremum in (3.6.14) to a
r 1
set H'n which is a subset of Hn so that ~(H')
n ~ n + and so that for any
r
h € Hn there is an hI € H~ with \(h-h 1)/h1 ~ n- . Then we have
sup
h€H
n
IdA(h) - dM(h) \ ~
sup \d A(h 1) - dM(h I )I +
h €H'
1
n
sup
_ \(dA(h)-dM(h»
\(h-hI)/h I~n r
- (d A(h 1)-d M(h I » I
I
IdA(h I ) - dM(h I )I + 0 (n- ).
h €H'
u
~ sup
1
n
T
4
To verify (3.6.14), using the fact that dM(h)
= O(n-1 h-1 +h)
and
-73-
the above Hoelder-continuity considerations. it is enough to show
1 n
sup In- ! bjvjW(x j
hEH'
j=1
(3.6.15)
T
} /
dM(h}
I
~ 0
a.s.
n
(3.6.16)
sup In
hEH'
-1 n
! (v.
j=1
J
n
2
2
T
- E(v. »W(x.) / dM(h)
J
J
To verify (3.6.15). given
~
> O.
I
~ 0
a.s.
for any k = 1.2..... we have
1 n
T
P( sup In- ! b.v.W(x j ) / dM(h)
hEH'
j=1 J J
I > ~)
n
~ ~
-2k
-#(H')- sup E«n
n hEH'
n
-1 n
T
2k
!
bjv.W(x.) / dM(h)} ).
j=1
J
J
The proof of (3.6.15) is complete when it is seen that there is a
constant T
> O.
so that for k = 1.2 •.... there are constants c
k
so
that
(3.6.17)
Using the Borel-cantelli lemma. there is a sufficiently large k to make
(r+l-Tk)
< -1.
then for any given
~
> O.
1 n
T
!. P( sup In- .! b.vjW(x j ) / dM(h)
n=1
h€H'
J=1 J
Q)
we have
I > ~}
n
Q)
~ c!
n
r+l-Tk
<
Q)
n=1
Similarly. the proof of (3.6.16) .is complete by showing that
(3.6.18)
sup E«n
h€Hn
-1 n
2
2
T
2k
! (v j - E(v j »W(x j ) / dM(h»
)
j=1
To check (3.6.17). using n-
1
n
!
i=1
~
-Tk
ckn
~(Xj-x,) = 0(1) as given in
2
1
(3.6.6). b j = O(h ) as given in (3.6.8). and the boundedness of W. we
have
-74-
Using E«
en-0.
n
L c.)
. 1 I
1=
2k
= O(nk )
)
-1 -1+0
as given in (3.6.3), and C n
<h <
we have
n
E«n- I h2 ! c / d~(h}}2k}
i
i=I
The proof of (3.6.I7) is complete.
= O(hk }
To check (3.6.I8). express c s ' Ct' and
C
C
s
=
t
=
!
p=O
!
q=O
~ ckn- Ok
~(s-t) by
-Ppe s-p •
-Pq e t-q •
GO
~(s-t)
= ~2! -Pp-Pp _ s + t '
p=O
for each s. t = 1.2....• n.
n -1
V
j =
~
1<..(
nh x-x. }c
~
i=1
1
i
-_ 0 «nh}-2/5) as gIven
.
. (3 .6. 5) .
In
U
n-li~1 ~(Xj-Xi}Ci
n
-1 n
!
j=1
Using
+
0u«nh)-7/~}
as given in (3.6.IO).
2
E(v j }W(x }
j
n
n
L
1: ~(x.-x }Kh(xj-x }~(s-t) + O(n-~-2)
1:
J s
t
j=1 5=1 t=1
as given in (3.6.13), then we have
= n
n
-3 n
-1 n
L
j=1
n
n
1:
! Kh(xj-x }Kh(xj-x }(c C 5
t
5 t
j=1 5=1 t=1
= A + B + o «nh) -9/5} ,
U
= n
-3 n
!
+ 0 «nh}-9/5)
~(s-t}}W(x.)
U
J
where
-3 n
n
n
GO
2
1: ! -fi
K_ (xj-x 5 )Kh(xj-x } 1: -PP-Pp-s +t(e s-p - ~2}W(x,},
J
t
j=1 5=1 t=1
p=O
n
n
-3 n
B=n
!
1:
1: Kh(x.-x }~(Xj-Xt}(1: !
-P -P e
e
)W(x.}.
J 5
q~p-s+t P q s-p t-q
J
j=1 5=1 t=1
A = n
!
-75-
Then the inequality (3.6.18) is complete by showing that
(3.6.19)
E(A2k } = 0(n- 3kh- 2k },
(3.6.20)
E(B2k } = 0(n-
2k
k
h- }.
To check (3.6.19), letting u = s-p and f u = e s-p
2
-
~2'
express A
as
where
-3 n
a
and f
u
u
n
~
= n
n
~
~
Kh(xj-x }Kh(xj-x }W(x.}~ ~
,
s
t
J s-u -u+t
s=l t=1 j=l
are lID random variables with mean zero and all finite moments
for all u.
Using the extended Whittle's inequality as given in
00
(3.6.1), we need to check
~
u=-ClO
00
lau I
< 00
first, then we have
00
~ I~s-u~-u+tl ~ (~ I~i 1)2 <
u=-oo
00,
i=O
and n-O(nh) terms in
n
n
~
~
then
s=l t=l
we have
00
00
lau I
u=-ClO
~
~
n
~
-3 n
~
u=-oo
~
n
~
~
s=l t=l j=l
n
n
00
c
n
n-~-1 ~
~
u=-ClO
~
s=1 t=l
n
n
~ c ~
~ n-~-1
s=1 t=1
00
Now we calculate
~
u=-ClO
-1
O(h
2
a
u
.
<
Kh(Xj-xs)~(Xj-Xt)W(xj)l~s_u~_u+tl
I~s-u~-u+t I
00
-1 n
~
Because of n
j=1
n
) for any
5,
t,
~
s=1
n
n
~
~
~(t-t')
Kh(x.-x
J
s
)~(x.-Xt)W(x.)
J
J
00
~s-u = 0(1), ~2
~
u=-ClO
= O(n), then we have
t=1 t'=1
-76-
~-u+t~-u+t' = ~(t - t'),
=
ClO
ClO
n
n
n
2
3 !
2
[n!
! ~(Xj-X )Kh{x,-Xt)W{x.)~
!
! au =
~ +t J
s
J
J s-u-u
U=-ClO
U=-ClO
s=1 t=1 j=1
ClO
n
n
-4h -2 !
2
!
en
[ !
~
~s-u~-u+t]
U=-ClO s=1 t=1
ClO
n
n
-4 -2
!
cn
h
!
!
~
~-u+t~-u+t •
u=-ClO t=1 t'=1
n
-4 -2 n
'Y{t-t') =O{n-3h-2 ).
cn
h
!
!
~
t=1 t'=1
The proof of (3.6.19) is complete.
To check (3.6.20). letting u = s-p. v = t-q. express B as
ClO
ClO
B =!
! b
uveue V .
u=-ClO. u;ll!v •v=-Gl
where
-3 n
n
n
!
!
!
Kh{x.-x
)Kh{xj-x t )W{Xj)~s-u~ t-v .
J s
s=1 t=1 j=1
Using the extended Whittle's inequality as given in (3.6.2), we need to
ClO
ClO
check
Ibuv I < ClO first. then we have
b
uv = n
and n-O{nh) terms in
n
n
!
!
then we
s=1 t=1
have
ClO
~ ~ ~
ClO
-3 n
n
n
n s:1 t:1 j:1
~{Xj-xs)~{Xj-Xt)w{xj)l~s_u~t_vl
n
n
!
I~ ~
I
s=1 t=1
s-u t-v
~ e! ! n-~-I!
u v
n
n
~ c!
!
n-~-1
s=1 t=1
Now we calculate
< ClO
Because of. for any s. t,
-77-
n
-1 n
! Kh(x.-x 5 )Kh(xj-x t )W(x.)
J = O(h
J
j=1
00
n
!
~2
v=-oo
.J.
'f'
t
.J.
-v 'f'
= ~ (t
t ' -v
- t'),
-1
00
),
~2
5'=1
and noO(nh) terms in
~
~,
S-U 5
!
t
n
n
!
! , then we have
-U
~(s
=
n
= 0(1),
~(S-S')
!
!
U=-oo
~(t-t')
'=1
- 5'),
= 0(1),
s=1 t=1
00
00
i
i
b
u=-oo v=-OO
2
=i
UV
n
i [n-3 i
n
n
i
i
~(Xj-X )Kh(x,-Xt)W(x,)~
s
Uv
J
J
s-u ~t -v ]
s=1 t=1 j=1
n
n
2
[ !
! ~s-u~t-v]
s=1 t=l
n
n
n
n
!
i
! ~s-u~t-v~s'-u~t'-v
!
5=1 t=1 s'=l t '=1
n
n
n
n
~ cn-4h-2 !
!
!
! ~(s-s')~(t-t')
s=l t=1 s'=l t'=l
n
n
-4 -2
-1
!
n h
= O(n-2
n
).
~ c !
s=1 t=1
The proof of (3.6.20) is complete, i.e. the proof of (3.3.2) is
complete.
Proof of (3.4.2):
A
A
Since m(xj)-m(x j ) = b j + v j and mj(xj)-m(x j ) = b tj + v ej ' then
Remaindert(h) can be expressed as
(3.6.21)
Remaindere(h)
-1 n
= n
!
A
A
A
A
[mj(Xj)-m(Xj)][mj(Xj)+m(xj)-2m(xj)]W(Xj)
j=1
= n
-1 n
!
j=1
=A +
[(bej-b j ) + (Vej-vj)][(bej+bj) + (veJ.+vJ.)]W(x .)
J
B + C.
where
-78-
2
= 2n
-1 n
.! (vn.b nj - vjb.)W(x.).
J=1
~J ~
J
J
-1 n
C = n
j:l (v2j-Vj)(v2j+Vj)W(Xj)'
The proof of (3.4.2) is complete by showing that A. B. and Care
T
all of the order 0u(dM(h»
T
-1 -1
4
where dM(h) = a 1n h
+ b 1h. To check
2
using b j = O(h ) as given in (3.6.8) and b 2j = b j +
A = 0u(d~(h».
o(h2 ) as given in (3.6.9). we have
(b2j-bj)(bej+bj) = b ej
224
- b j = o(h ).
Then. using the boundedness of Wand 2 = o(nh). we have. for any
h € H . as n
n
~ m.
4
T
A = o(h ) = o(dM(h».
The
pro~f of
A =
o(d~(h»
is complete.
T
To check B = 0u(dM(h». following the same steps as in the proof
of n
-1 n
T
b.v.W(x.) = 0u(dM(h)} as given in (3.6.15). then we have
J
j=1 J J
-1 n
n
!
b .v W(x.)
j=1 2 J ej. J
!
n
-1
-1
-1 -1
= ! [n (n-2e~l)
!
Kh (x j -x i }b e .W(x.)(1+0(2n h »Je.
i=1
i:li-jl>e
J
J
1
T
= O(n-1 h2 ) !n e. = 0 (dM(h».
i=1 1
u
Thus. n
-1 n
! b.v.W(x.) and n
j=1 J J
J
T
The proof of B = 0u(dM(h»
-1 n
T
! bejvejW(Xj} are 0u(dM(h».
j=1
is complete.
To'check C = 0u(d~(h». express C as
-1 n
! (v2j-Vj)(vej+Vj)W(Xj)
j=1
= n -1 n! (vej 2 - v j 2 )W(x j )
j=1
-1 n
C = n
=n
!
j=1
-79-
n
Following the proof of n-
~
1
j=1
(V j
2
-1 n
:I
j=1
2
2
(E(v ej ) - E(v. »W(x.).
J
2
- E(v j »W(X j ) =
0u(d~(h»
J
as given
in (3.6.16). then we have
n
-1 n
j:l (v ej
T
Thus we have C = 0u(dM(h»
n
-1 n
:I
j=1
j+e
!
~(s-t)
:I
2
2
- E(v ej »W(x j
T
)
= °u(dM(h».
by checking that
2
2
T
(E(v ej ) - E(v j »W(x j ) = o(dM(h».
= o(e). and the boundedness of W. then we have
s t=j-e
n
-1 n
:I
2
2
IE(v j ) - E(v ej )IW(x j )
j=1
j+e
n
~ n-3 :I
:I
:I 1~(Xj-Xs)Kh(Xj-Xt)~(s-t)W(Xj)I(I+0(en-lh-l» +
j=1 s=j-e t
-3 n
j+e I
I
-1 -1
n
:I:I
:I
Kh(Xj-xs)Kh(Xj-Xt)~(s-t)W(x.) (1+0(en h
»
j=1 s t=j-e
J
-3 n
-2
-~-2
= n
:I O(eh ) = O(en ~ ) = o(dM(h» for e = o(nh).
j=1
The proof of (3.4.2) is completed.
Proof of (3.4.3):
Using m̂_j(x_j) - m(x_j) = b_{ℓj} + v_{ℓj}, we have
Cross_ℓ(h) = n^{-1} Σ_{j=1}^{n} ε_j ( b_{ℓj} + v_{ℓj} ) W(x_j),
E(Cross_ℓ(h)) = E( n^{-1} Σ_{j=1}^{n} ε_j v_{ℓj} W(x_j) ).
Using (n-2ℓ-1)^{-1} Σ_{i:|i-j|>ℓ} K_h(x_j - x_i) = 1 + O(ℓ n^{-1}h^{-1}) as given in (3.6.7), Σ_{k=-∞}^{∞} |k γ(k)| < ∞, and letting k = j - i, then we have
E(Cross_ℓ(h)) = n^{-1} Σ_{j=1}^{n} W(x_j) (n-2ℓ-1)^{-1} Σ_{i:|i-j|>ℓ} K_h(x_j - x_i) γ(i-j) (1 + O(ℓ n^{-1}h^{-1}))
= n^{-1} (n-2ℓ-1)^{-1} Σ_{|k|>ℓ} Σ_{i} W(x_{i+k}) K_h(k/n) γ(k) (1 + O(ℓ n^{-1}h^{-1}))
= n^{-1} Σ_{|k|>ℓ} K_h(k/n) γ(k) [ (n-2ℓ-1)^{-1} Σ_{i} W(x_{i+k}) ] (1 + O(ℓ n^{-1}h^{-1}))
= n^{-1} Σ_{|k|>ℓ} [ h^{-1} K(0) + O(|k| n^{-1}h^{-2}) ] γ(k) [ ∫W + O(|k| n^{-1}) ] (1 + O(ℓ n^{-1}h^{-1}))
= 2 n^{-1}h^{-1} ( Σ_{k>ℓ} γ(k) ) K(0) ∫W + o(ℓ n^{-2}h^{-2}).
The proof of (3.4.3) is complete.
Proof of (3.4.4):
Since Crosse(h) = n
-1 n
L
~j(bej
j=1
+ Vej)W(x j ), we have
Crosse(h) - E(Crosse(h»
-1 n
.L
= n
J=I
~jbejW(Xj)
co
Using
~.
1
= L
p=O
(n-2e-l)-1
+ n
-1 n
E(~jvej»W(Xj)'
L (ejv ej -
j=1
co
~
e
•
p i -p
~j
co
~
= L
q=O
e
.
q j -q
~(i-j)
=
~2
L
p=O
~ ~
p p-
i+"
J
K. (Xj-X ) = 1 + O(en- 1h- l ) as given in (3.6.7),
i
i: li-j\>e- 11
L
then Crosse(h) - E(Crosse(h»
A= n
B1 = n
-1
(n-2e-l)
can be expressed as A + B1 + B2 , where
-I n
L
j=1
-I n
L
p=O
B2 = n
-1
(n-2e-l)
J
K (x.-x.)h J 1
L
j=li:li-jl>e
co
(L
~jbejW(x,),
~ ~
i j(e i -P
p p- +
2
-
~2»W(x.)(1
J
+ O(en
-I -I
h»,
-1 n
L
L
Kh(xj-x i )j=1 i: li-j \>e
-1 -1
( L L
~ ~ e _ e _ )W(Xj)(1 + O(en h » .
j
i
q#p-i+j p q
P q
Following the same arguments as the proof of (3.3.2). it is enough
to show that. for each k = 1. 2.
-81-
2k
(3.6.22)
E(A
(3.6.23)
E(B 2k )
I
k 4k
) = O(n- h ),
= O(n- 3kh- 2k ),
2k ) = O(n- 2kh- k ).
E(B
2
(3.6.24)
CD
= q::O
!
~j
To check (3.6.22), expressing
~
q
ej
-q
and letting u
=
j-q,
then we have
CD
!
a e .
u u
where
au
=n
-1 n
j:l bejW(Xj)~j_u'
Using the extended Whittle's inequality as given in (3.6.1), we need to
CD
check . ! la I
u
u=-CD
< CD first, then we have
CD
2
Using b . = O(h ) as given in (3.6.9),
eJ
!
u=-CD
I~.
J-U
I<
and the
CD,
boundedness of W, we have
CD
Now we calculate
!
a
2
u
Using the boundedness of W,
n
!
1~(j-j')1 = O(n), then we have
n
!
j'=1
n
~j-U~j'-U
n
2.4
4
cn-n
!
! ~(j-j') ~ cn- 1h .
j=1 j'=1
The proof of (3.6.22) is complete.
~
To check (3.6.23), letting u = i-p and f u = e.I-p
B
1
-1 n
= n -1 (n-2e-l)!
!
K (x -x.)-
j=1 i: li-j I>e -0
-82-
j
1
2
-
~2'
we have
(~
P=O
(Xl
where
b
u
+p +p-1.+ J..(e.1-P
2
~2»W(x.)(1
-
+ olen
J
-1 -1
h».
.
= n
-1 n
-1
~
(n-2e-l)
~
j=1
1: li-jl>e
Kh(Xj-x1)W(x.)~.
~
.(1+0(en
1-U -u+J
J
-1 -1
h »
and f U are lID random variables with mean zero and all finite moments,
for all u.
Using the extended Whittle's inequality as given in
(Xl
(3.6.1), we need to check
~
Ib 1 < f1rst, then we have
u=-c:D
u
(Xl
Using the boundedness of W,
(Xl
(Xl
~
1~1-u+-u+jl ~ (.~
u=-c:D
1=0
1+ 1 1)2
< (Xl, and
n
~
n-O{nh) terms in
(Xl
~
~ c ~ n
-1 n
U
then we have
.
I
Ibu
. u=-(Xl
~ cn
~ ~(Xj-x1)'
j=1 1
~
(n-2e-l)
-1
-1 n
~
(n-2e-l)-
1
j=1
~
1: Ii-j 1>e
j=1
~
Kh(xj-xi)W(x.)
1+.1-U+-u+.1
J
J
~(x -x )
i:li-jl>e
j
< (Xl.
1
(Xl
Now we calculate
~
U=-c:D
b 2.
u
According to
~ ~
1 -n
(x.-x.)+.
= O(h-1 ), the
J 1 1-U
(Xl
boundedness of W,
~2
~
u=-c:D
+-u+ j~-u+ j' =
~(j-j'),
and
n
n
~
~
~(j-j')
=
j=1 j'=1
O{n), then we have
(Xl
(Xl
n
2
1
~ bu ~ c ~ {n- ~ {n-2e-l)-1
~
K_ (x.-x.)W{x.)+. + +.)2
J 1-U -u J
u=-c:D
u=-c:D
j=1
1: I1-j I>e -n J 1
n
~ c ~ {n- 1 (n-2e-l)-1 ~ h- 1+
)2
u=-c:D
j=1
-u+j
n
n
(Xl
<_
cn-2{n-2e-l)-~-2
(Xl
~
~
~
u=-c:D j=l j'=l
-83-
.1.
.1.
-u+J' 't'-u+J' ,
't'
2
2- 2 n
n
-3 -2
cn- (n-2e-l)-~- L
L ~(j-j') = O(n h ).
j=1 j'=1
~
The proof of (3.6.23) is complete.
To check (3.6.24), letting u = i-p, v = j-q, we have
B = n
-1
2
co
L
=
-1 n
L
L
~(x -x.). 1 1·
'. I'1-J'1 >e
j
1
J=
-1 -1
( L L ~ ~ e i _ e j _ )W(X j )(1 + O(en h »
q~p-i+j p q
P q
co
L b
uveueV,
(n-2e-l)
U=-CD,u~v,V=-CD
where
bUY = n
-1 n
-1
-1 -1
L (n-2e-l)
L
Kh(Xj-xi)W(x,)~,
~j
(1+0(en h » .
j =1
i: 1i - j I >e
J 1-U -v
Using the extended Whittle's inequality as given in (3.6.2), we need to
co
co
Ib I < co, then we have
check
uv
Using the boundedness of W, L I~i . I
u
-u
< m, L
v
I~.
J-V
I < m,
e'
and n-O(nh)
n
L L Kh(x.-x.), then we have
terms in
j=1 i
J
1
u::-co v=-m
Sc
co
co
-1 n
1
L
L n
L (n-2e-l)L
Kh(x.-x.)W(x.) I~. ~. I
u=-co V=-CD
j=1
i: li-jl>e
J 1
J
1-U J-V
-1 n
~ cn
1
L (n-2e-l)L
Kh(xj-xi)W(x ) < co
j
j=1
i:li-jl>e
co
co
2
-1
Now we calculate
L
L b uv ' Using Kh(xj-x ) = O(h ).
i
U=-CD V=-CD
~2
L
u=-CD
~.
1-U~.,
1 -u
= ~(i-i·).
~2
L
v=-CD
~j
-v~j' -v
= ~(j-j').
n
!
~(i-i') = 0(1). L ~(j-j') = 0(1). n-O(nh) terms in !
L. and the
i'
j'
j=1 i
boundedness of W. then we have
-84-
co
co
2:
2: b 2
Uy
u=-co y=-co
~
c2: 2: (n
U Y
~
~
~
-1 n
2:
. 1
J=
(n-2e-1)
-1
..
1·
2:
K_ (x.-x. )W(x.H.
I·1- J·1 >e -n
J
1
J
'iJ.
1-U J-Y
-2
-~-2
n
2
cn (n-2e-1) n
2: 2: (2: 2: 'iJi_u'iJ j _ y )
U Y
j=1 i
-2
-~-2
n
n
cn (n-2e-1) n ! ! 2: !
2: ! 'iJ i 'iJ j 'iJ., -IJ • ,
U Y j=1 i j'=1 i'
-u -Y 1 -U J -Y
cn
-2
-~-2 n
(n-2e-1) n
n
2: 2:
2: !
j=1 i j'=1 i'
n
~ cn-2(n-2e-1)-~-2 2: 2: 0(1)
~(i-i')~(j-j')
= O(n-~-1).
j=1 i
The proof of (3.6.24) is complete.
-85-
)
2
CHAPTER IV
THE PERFORMANCE OF THE MODIFIED CROSS-VALIDATED BANDWIDTH

4.1  Introduction

The two optimal bandwidths, h_A and h_M, are the minimizers of the average square error (ASE or d_A(h)) as given in (1.11) and the mean average square error (MASE or d_M(h)) as given in (1.12), respectively.  In Chapter 3, we showed that the modified cross-validation criterion CV_ℓ(h), as given in (3.1.1), could provide asymptotically optimal bandwidths with respect to the measures ASE and MASE when the observations suffer from a short range dependence.

However, this property is asymptotic in character.  In simulation studies, we often observed that ĥ_CV(ℓ), the minimizer of CV_ℓ(h), is not close to h_A or h_M, no matter how large the value of ℓ is.  The purpose of this chapter is to study the rates of convergence of h_A/h_M, ĥ_CV(ℓ)/h_M, d_M(h_A)/d_M(h_M), and d_M(ĥ_CV(ℓ))/d_M(h_M).

For the equally spaced fixed circular design and independent observations, Rice (1984) showed that ĥ_M, the minimizer of Mallows' criterion as given in (1.15), converges to h_M at a relative rate n^{-1/10}.  For the equally spaced fixed design and independent observations, the fact that ĥ_CV(0) converges to h_A and h_M at a relative rate n^{-1/10}, and that d_A(ĥ_CV(0)) converges to d_A(h_A) and d_A(h_M) at a relative rate n^{-1/5}, could be derived from Haerdle, Hall, and Marron (1988) through a straightforward calculation.

Section 4.2 gives the limiting distributions for h_A/h_M, d_M(h_A)/d_M(h_M), ĥ_CV(ℓ)/h_M, and d_M(ĥ_CV(ℓ))/d_M(h_M).  Section 4.3 contains a discussion of our results.  Finally, the proofs are given in Section 4.4.
4.2  Limiting Distributions for Bandwidths

The purpose of this section is to derive the limiting distributions for h_A/h_M, d_M(h_A)/d_M(h_M), ĥ_CV(ℓ)/h_M, and d_M(ĥ_CV(ℓ))/d_M(h_M).  In Chapter 3, we showed that the bandwidths h_A, h_M, h_M^T (the minimizer of d_M^T(h) as given in (3.3.1)), h_{Mℓ}^S (the minimizer of d_{Mℓ}^S(h) as given in (3.4.5)), and ĥ_CV(ℓ) are all of the same order n^{-1/5}.  Thus we may confine our attention to h within the interval [a n^{-1/5}, b n^{-1/5}] for arbitrarily small a and large b.  In order to derive the limiting distributions, using the equally spaced fixed design nonparametric regression model (1.1) and the assumptions (A.1) through (A.7) as given in Section 3.3, we must also impose additional assumptions:

(A.8)  The bandwidth h is chosen from the interval H_n = [a n^{-1/5}, b n^{-1/5}] for arbitrarily small a and large b and for n = 1, 2, ....
(A.9)  The number of observations deleted in the construction of the modified cross-validation score is 2ℓ+1, with ℓ << n^{1/2}.  The total number of observations in this regression setting is n, with n → ∞.

Before we show the limiting distributions, we shall first repeat some notation which will be used later:
C_0 = [ ( Σ_{k=-∞}^{∞} γ(k) ) ∫K^2 ∫W (∫u^2 K)^{-2} (∫(m'')^2 W)^{-1} ]^{1/5},
C_{0ℓ}^S = [ ( ( Σ_{k=-∞}^{∞} γ(k) ) ∫K^2 - 4 ( Σ_{k>ℓ} γ(k) ) K(0) ) ∫W (∫u^2 K)^{-2} (∫(m'')^2 W)^{-1} ]^{1/5}.
Here and throughout this chapter, the notation ∫ denotes ∫ du.  As n → ∞ and C_{0ℓ}^S > 0, given the above assumptions, it is shown in Section 4.4 that
(4.2.1)  n^{1/10} ( h_A/h_M - 1 ) ⇒ N( 0, ( Σ_{k=-∞}^{∞} γ(k) )^{1/5} Var_AM ),
(4.2.2)  n^{1/5} ( d_M(h_A)/d_M(h_M) - 1 ) ⇒ C_AM χ_1^2,
(4.2.3)  n^{1/10} ( ĥ_CV(ℓ)/h_M - C_{0ℓ}^S/C_0 ) ⇒ N( 0, (C_{0ℓ}^S/C_0)^{-7} ( Σ_{k=-∞}^{∞} γ(k) )^{1/5} Var_M ),
(4.2.4)
where
Var_AM = (8/25) [ ∫(K*(K-L))^2 ∫W^2 ∫(m'')^2 W + 4 ∫W ∫K^2 ∫(m''W)^2 ] / [ (∫K^2)^9 (∫W)^9 (∫u^2 K)^2 (∫(m'')^2 W)^6 ]^{1/5},
C_AM = 2 ( Σ_{k=-∞}^{∞} γ(k) )^{1/5} Var_AM,
for L(u) = -u K'(u) and * meaning convolution, and where χ_1^2 denotes a chi-squared random variable with one degree of freedom.
Given an extra condition on the functions m and W:
(A.10)  The fourth derivative m^(4) of m and the first derivative W' of W are uniformly continuous.
Then it is shown in Section 4.4 that, as n → ∞,
(4.2.5)  h_M = C_0 n^{-1/5} + B_0 n^{-3/5} + o(n^{-3/5}),
where
B_0 = (1/20) [ ( Σ_{k=-∞}^{∞} γ(k) ∫K^2 ∫W )^{3/5} (∫u^4 K) ( ∫(m^(3))^2 W + ∫m'' m^(3) W' ) ] / [ (∫u^2 K)^{11} (∫(m'')^2 W)^8 ]^{1/5}.
It follows from (3.4.8), (3.4.9), and (4.2.5) that
(4.2.6)  ĥ_CV(ℓ)/h_M = C_{0ℓ}^S/C_0 + (B_{0ℓ}^S/C_0) n^{-2/5} + o_u(n^{-2/5}),
where B_{0ℓ}^S is defined from a_{1ℓ}^S in the same way that B_0 is defined from a_1.
4.3  Discussion

In (4.2.3), the rate of convergence n^{-1/10} of ĥ_CV(ℓ)/h_M is excruciatingly slow.  In (4.2.4), the relative loss d_M(ĥ_CV(ℓ))/d_M(h_M) also has a slow rate of convergence, n^{-1/5}.  These explain, in practice, why the modified cross-validation criterion does not always provide good bandwidth estimates.  In these two limiting distributions for ĥ_CV(ℓ), we also see that the value of ℓ controls the amount of asymptotic bias and asymptotic variance, and is independent of the rates of convergence.  Although the two rates of convergence for ĥ_CV(ℓ) seem, at first glance, to be excruciatingly slow, it should not be too disappointing because h_A has the same rates of convergence as ĥ_CV(ℓ).  The limiting distributions derived in this chapter are still true when the regression errors are a short range dependent linear process as defined in Section 3.5.  The only difference in this case is that the range of allowable values of ℓ has been significantly reduced from o(nh) to o(n^{1/2}).

In the rest of this section, we shall use two examples of MA(1) and AR(1) processes, the two easiest cases in time series analysis, of regression errors to illustrate how the asymptotic bias-squares and the asymptotic variances depend on the value of ℓ.  Note that while (4.2.3) and (4.2.4) quantify the sample noise or "variance" of the ratios ĥ_CV(ℓ)/h_M and d_M(ĥ_CV(ℓ))/d_M(h_M), the quantities [ (C_{0ℓ}^S/C_0) + (B_{0ℓ}^S/C_0) n^{-2/5} - 1 ] and [ (C_{0ℓ}^S/C_0)^4 - 1 ] can be thought of as measuring a type of "bias" for these two ratios, respectively.  Looking at (4.2.3), (4.2.4), (4.2.5), and (4.2.6), we have the following asymptotic mean square errors (AMSE) for [ ĥ_CV(ℓ)/h_M - 1 ] and [ d_M(ĥ_CV(ℓ))/d_M(h_M) - 1 ] as n → ∞ and C_{0ℓ}^S > 0:
(4.3.1)  AMSE( ĥ_CV(ℓ)/h_M - 1 ) = [ C_{0ℓ}^S/C_0 + (B_{0ℓ}^S/C_0) n^{-2/5} - 1 ]^2 + n^{-1/5} [ C_{0ℓ}^S/C_0 ]^{-7} ( Σ_{k=-∞}^{∞} γ(k) )^{1/5} Var_M,
(4.3.2)  AMSE( d_M(ĥ_CV(ℓ))/d_M(h_M) - 1 ) = [ (C_{0ℓ}^S/C_0)^4 - 1 ]^2 + 8 n^{-2/5} (C_{0ℓ}^S/C_0)^{-10} ( Σ_{k=-∞}^{∞} γ(k) )^{2/5} Var_M^2.
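The trade-off between the bias and variance components of (4.3.1) as ℓ varies can be tabulated for a given error model (a minimal sketch, assuming Python with numpy and the bi-weight kernel of Section 3.5; Var_M is treated as an unknown scale and set to 1, and the B-term of the bias is omitted, so only the relative behavior in ℓ is meaningful):

    import numpy as np

    u = np.linspace(-0.5, 0.5, 20001)
    K = (15 / 8) * (1 - 4 * u ** 2) ** 2
    K0, int_K2 = K[len(u) // 2], np.trapz(K ** 2, u)     # K(0) and the integral of K^2

    def gamma_sums(gamma, ell, kmax=200):
        tail = sum(gamma(k) for k in range(ell + 1, kmax))
        total = gamma(0) + 2 * sum(gamma(k) for k in range(1, kmax))
        return tail, total

    def ratio_C(ell, gamma):
        """C_{0,ell}^S / C_0 from (3.4.10)."""
        tail, total = gamma_sums(gamma, ell)
        arg = 1 - 4 * K0 * tail / (int_K2 * total)
        return arg ** 0.2 if arg > 0 else float("nan")   # arg <= 0: CV_ell has no minimizer

    def amse_h(ell, gamma, n=200, var_M=1.0):
        """Bias^2 plus variance of (4.3.1), up to the unknown scale Var_M (B-term omitted)."""
        r = ratio_C(ell, gamma)
        _, total = gamma_sums(gamma, ell)
        return (r - 1) ** 2 + n ** (-0.2) * r ** (-7) * total ** 0.2 * var_M

    ar1 = lambda k, phi=0.5, s2=1.0: s2 * phi ** abs(k) / (1 - phi ** 2)
    for ell in range(6):
        print(ell, round(ratio_C(ell, ar1), 3), round(amse_h(ell, ar1), 3))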
Now we begin to discuss theoretically how the asymptotic bias-squares and the asymptotic variances of these two AMSE depend on ℓ.  If the observations are independent, then ℓ has no effect on these two AMSE.  This is because the ratio C_{0ℓ}^S/C_0 equals 1 for any ℓ ≥ 0.  However, for dependent observations, the effect of ℓ on these two AMSE is different from the case of independent observations.  This is because the ratio C_{0ℓ}^S/C_0 as given in (4.2.3) depends on ℓ.

For the MA(1) process of regression errors, ε_j = θ e_{j-1} + e_j, where the e_j are IID N(0, σ^2) and -1 < θ ≤ 1, which has been discussed in Section 3.5, we have, through a straightforward calculation,
C_{0ℓ}^S/C_0 = [ 1 - 4 θ K(0) / ( (1+θ)^2 ∫K^2 ) ]^{1/5}  if ℓ = 0,
C_{0ℓ}^S/C_0 = 1  if ℓ ≥ 1.
For each ℓ ≥ 1, the AMSE(ĥ_CV(ℓ)/h_M - 1) becomes n^{-1/5} (1+θ)^{2/5} σ^{2/5} Var_M, and the AMSE(d_M(ĥ_CV(ℓ))/d_M(h_M) - 1) becomes 8 n^{-2/5} (1+θ)^{4/5} σ^{4/5} Var_M^2, where θ is the MA(1) parameter.  These are the same as in the case of independent observations.  This is because the "leave-(2ℓ+1)-out" version m̂_j(x_j) as given in (3.1.2) is independent of Y_j for every j when ℓ ≥ 1 in the MA(1) case.  If θ > 0, then the asymptotic bias-squares and the asymptotic variances of these two AMSE at ℓ = 0 are greater than those at ℓ = 1.  In this case, we prefer to take the value of ℓ as any positive integer, with ℓ << n^{1/2}.  If θ < 0, then there are some values of K, n, θ, and σ^2 for which the minimum values of these two AMSE might be at ℓ = 0.  This is because, for these two AMSE, the two asymptotic bias-squares decrease and the two asymptotic variances increase as ℓ increases from 0 to 1.  However, the value of Var_M is not obtainable in practice since it involves the unknown regression function m(x).  In Chapter 6, Var_M is estimated by plug-in methods in the simulated regression setting.  As n → ∞, the two asymptotic bias-squares are the dominant terms in these two AMSE since the asymptotic variances tend to 0.  So, in this case, we conclude that, asymptotically, ℓ should be taken as any positive integer, with ℓ << n^{1/2}.
When the regression errors are an AR(1) process, ε_j = φ ε_{j-1} + e_j, where the e_j are IID N(0, σ^2) and |φ| < 1, we have, through a straightforward calculation,
C_{0ℓ}^S/C_0 = [ 1 - 4 K(0) φ^{ℓ+1} / ( (1+φ) ∫K^2 ) ]^{1/5}.
If φ > 0, then the asymptotic bias-squares and the asymptotic variances of these two AMSE decrease as ℓ increases.  This is because the ratio C_{0ℓ}^S/C_0 is between 0 and 1, and increases to 1 as ℓ increases.  So, in this case, we prefer to take the value of ℓ as large as possible.  Of course, ℓ is restricted to ℓ << n^{1/2}.  If φ < 0, then the situations are discussed for the odd numbers of ℓ and the even numbers of ℓ separately.  When ℓ increases as an odd number, the situation is the same as in the case of φ > 0.  When ℓ increases as an even number, the two asymptotic bias-squares decrease and the two asymptotic variances increase.  Then the two AMSE are convex upward on the even numbers of ℓ.  We have a simulation study to show this situation in Chapter 6.

For general ARMA processes of regression errors, the situation becomes more complicated than that of the MA(1) and AR(1) processes.  In the general case, we only know that the ratio
C_{0ℓ}^S/C_0 = [ 1 - (4 K(0) / ∫K^2) ( Σ_{k>ℓ} γ(k) / Σ_{k=-∞}^{∞} γ(k) ) ]^{1/5}
converges to 1 at a polynomial rate as ℓ goes to infinity.  Thus, asymptotically, taking large values of ℓ could result in smaller values of the two AMSE than in the case of ℓ = 0.
4.4  Proofs

For the proofs of (4.2.1) through (4.2.4), we need the following notation:
D(h) = d_A(h) - d_M(h),  D_1(h) = (-h/2) D'(h),
δ(h) = -2 ( Cross_ℓ(h) - E(Cross_ℓ(h)) ),  δ_1(h) = (h/2) δ'(h),
where Cross_ℓ(h) is defined in (3.4.1).  Using the assumptions (A.1) through (A.9) as given above, it is shown later that D_1(h) and δ_1(h) have the following asymptotic decompositions:
(4.4.1)  D_1(h) = S_1(h) + S_2(h),
(4.4.2)  δ_1(h) = T_1(h) + T_2(h),
where
S_1(h) = Σ_{s=1, s≠t}^{n} Σ_{t=1}^{n} A_{st}(h) ( ε_s ε_t - γ(s-t) ),
S_2(h) = Σ_{s=1}^{n} B_s(h) ε_s,
T_1(h) = Σ_{j=1}^{n} Σ_{i:|i-j|>ℓ} D_{ij}(h) ( ε_i ε_j - γ(i-j) ),
T_2(h) = Σ_{j=1}^{n} E_j(h) ε_j,
and where the coefficients A_{st}(h), B_s(h), D_{ij}(h), and E_j(h) are given in their proofs.
Given the assumptions above. as n
later that
(4.4.3)
-94-
~~.
it is shown
~
for any
€ [a,b] and where
= 2u2v-1 J(KM(K-L))2Jw2
VS(u, v)
Vsr(u,v) = 2U
.
-1/5
SInce hM = COn
2
V-
+ uv\Ju~)2J(m..w)2,
1
2
J(KM(K-L) - (K-L»2JW .
S
~en-1/5(1+0(1»
as given in (3.3.6), and hMe =
as given in (3.4.8), then (4.4.1) through (4.4.5)
imply that, as n
-4
(4.4.6)
n
(1+0(1»
co
and ~e
7/10 ,
D (hM)
7 10
(4.4.7) n / (D'+O')(h: e )
> 0,
=> N(O,
~>
N(O,
4C
-2
0
co
-V (
!
S k=-Cll
4(~e)-2ysr(
~(k),CO»'
co
!
k=-Cll
~(k)'~e»'
Under the assumptions given in Section 4.2, we have the following
asymptotic properties (4.4.8) through (4.4.13) which are shown later:
For each positive integer k, there is a constant C ' so that
4
2k
2k
(4.4.8)
sup E[r -l h 1/2(ID'(h) 1 + lo'(h) 1 )] ~ C '
4
h€H
n
n
furthermore, there is an ~ > 0 and a constant C ' so that
5
1 1 2
2k
+ IO'{h) - O'{h )1 2k )]
(4.4.9) E[r - h / {ID'{h) - D'{h )1
n
1
1
~ c51 (h-h 1 )/h I~k,
whenever h, hI
sup
h€H
(4.4.10)
n , with h ~ hI' For any p € (0, 1/10), we have
[r -l h 1/2( ID'(h)1 + lo'(h)I)] = 0 (n P),
€ H
n
furthermore, if n 1/ 5h
1
tends to a constant, then we have
sup
(4.4.11)
Ih-h11~n-3/10+P
[r -l h 1/2(ID'(h) - D'(h )1 +
·n
1
IO'{h) - o'(h 1 )1)] = 0p{l),
S I
-3/10+p
Ih A - hMI + IhCV(e) - hMe = open
).
1/2
n
, we have
A
(4.4.12)
For any
p
n
e«
-95-
(4.4.13)
Remaindere'(h)
= 0p(n-7/10 ),
where Remaindere(h) is defined in (3.4.1).
The proof of (4.2.1) is based on the expansion

(4.4.14)  0 = d_A'(ĥ_A) = d_M'(ĥ_A) + D'(ĥ_A) = (ĥ_A − h_M) d_M''(h*) + D'(ĥ_A),

where h* lies in between ĥ_A and h_M. Using ĥ_A = h_M(1+o_u(1)) and h_M = C_0 n^{-1/5}(1+o(1)) as given in (3.3.6), we have

(4.4.15)  d_M''(h*) = C_2 n^{-2/5}(1+o_u(1)).

Using (4.4.12), for any ρ ∈ (0, 1/10), ĥ_A = h_M + o_p(n^{-3/10+ρ}), and using (4.4.11) (with h_1 = h_M = C_0 n^{-1/5}(1+o(1))), we have

D'(ĥ_A) = D'(h_M) + o_p(n^{-7/10}).

So (4.4.14) becomes

(4.4.16)  0 = (ĥ_A − h_M) C_2 n^{-2/5}(1+o_u(1)) + D'(h_M) + o_p(n^{-7/10}).

Combining (4.4.16) with D'(h_M) = O_p(n^{-7/10}) as given in (4.4.6), we have

(4.4.17)  ĥ_A − h_M = O_p(n^{-3/10}).

Multiplying (4.4.16) by n^{7/10}, and combining the result with (4.4.17), we have

(4.4.18)  0 = n^{3/10}(ĥ_A − h_M) C_2 + n^{7/10} D'(h_M) + o_p(1)
            = n^{1/10}(ĥ_A/h_M − 1) C_0 C_2 + n^{7/10} D'(h_M) + o_p(1),

for h_M = C_0 n^{-1/5}(1+o(1)). Then, combining (4.4.18) with (4.4.6), the proof of (4.2.1) is complete.
For the limiting distribution of d_M(ĥ_A)/d_M(h_M), observe that

(4.4.19)  [d_M(ĥ_A) − d_M(h_M)]/d_M(h_M) = (1/2)(ĥ_A − h_M)² d_M''(h*)/d_M(h_M),

where h* lies in between ĥ_A and h_M. Given the facts that d_M''(h*) = C_2 n^{-2/5}(1+o_u(1)) as stated in (4.4.15), ĥ_A − h_M = O_p(n^{-3/10}) as stated in (4.4.17), and d_M(h_M) = C_1 n^{-4/5}(1+o(1)), then (4.4.19) becomes

(4.4.20)  d_M(ĥ_A)/d_M(h_M) − 1 = (1/2)(ĥ_A − h_M)² C_1^{-1} C_2 n^{2/5} + o_u(n^{-1/5}).

Then, combining (4.4.20), (4.4.18), and (4.4.6), the proof of (4.2.2) is complete.

The proofs of (4.2.3) and (4.2.4) are based on the expansion of CV_ℓ(h). According to the definitions of D(h), δ(h), and d^S_{Mℓ}(h) as given in (3.4.5), CV_ℓ(h) as given in (3.4.1) can be expressed as

(4.4.21)  CV_ℓ(h) = n^{-1} Σ_{j=1}^{n} ε_j² W(x_j) + d^S_{Mℓ}(h) + (D + δ)(h) + Remainder_ℓ(h).

Then, the proofs of (4.2.3) and (4.2.4) are complete by taking the derivative of (4.4.21) and following the same steps of (4.4.14) through (4.4.20).
Proof of (4.2.5):

The proof of (4.2.5) is based on the proof of (3.3.1). Since the regression function m(x) has a uniformly continuous fourth derivative and h = O(n^{-1/5}), the Taylor expansion of b_j as given in (3.6.8) becomes

b_j = [n^{-1} Σ_{i=1}^{n} K_h(x_j−x_i)(m(x_i)−m(x_j))] / [n^{-1} Σ_{i=1}^{n} K_h(x_j−x_i)]
    = (1/2) h² m''(x_j) ∫u²K + (1/24) h⁴ m^{(4)}(x_j) ∫u⁴K + o(h⁴).

Thus, the asymptotic expression of d_M(h) as given in (3.6.12) and (3.6.13) becomes

(4.4.22)  d_M(h) = a_1 n^{-1} h^{-1} + b_1 h⁴ + b_2 h⁶ + o(h⁶),

where a_1 and b_1 are as given in (3.3.1). Let D_M(h) = d_M(h) − [a_1 n^{-1} h^{-1} + b_1 h⁴] = b_2 h⁶ + o(h⁶). Using Taylor's theorem and h_M = C_0 n^{-1/5}(1+o(1)) as given in (3.3.6), where C_0 = [a_1/(4b_1)]^{1/5}, we have

0 = d_M'(h_M) = −a_1 n^{-1} h_M^{-2} + 4 b_1 h_M³ + D_M'(h_M)
  = (h_M − C_0 n^{-1/5})(2 a_1 n^{-1} h̃^{-3} + 12 b_1 h̃²) + 6 b_2 C_0^5 n^{-1} + o(n^{-1}),

where h̃ is in between h_M and C_0 n^{-1/5}, i.e. h̃ = C_0 n^{-1/5}(1+o(1)). Thus, solving the above equality for h_M − C_0 n^{-1/5}, we have

h_M − C_0 n^{-1/5} = [−6 b_2 C_0^5 n^{-1} + o(n^{-1})] / [2 a_1 n^{-1} h̃^{-3} + 12 b_1 h̃²]
  = [−3 a_1 b_2 (2b_1)^{-1} n^{-1} + o(n^{-1})] / [5 a_1^{2/5} (4b_1)^{3/5} n^{-2/5}(1+o(1))]
  = B_0 n^{-3/5}(1+o(1)),

where B_0 has been given in (4.2.5). The proof of (4.2.5) is complete.
The notation and properties used to prove (4.4.1) through (4.4.13) include:

(a) The constant c is a generic one.

(b) Given L(u) = −uK'(u), K_h(u) = h^{-1} K(u/h), and L_h(u) = h^{-1} L(u/h) = −u h^{-2} K'(u/h), then we have (∂/∂h) K_h(u) = −h^{-1}(K_h(u) − L_h(u)).

(c) For each x_j with W(x_j) ≠ 0, or h < x_j < 1−h, under the assumptions as given in Section 4.2, we have the following asymptotic results:

b_h(x_j) = n^{-1} Σ_{i=1}^{n} K_h(x_j−x_i) m(x_i) − m(x_j) = (1/2) h² m''(x_j) ∫u²K + o(h²),
c_h(x_j) = n^{-1} Σ_{i=1}^{n} L_h(x_j−x_i) m(x_i) − m(x_j) = (3/2) h² m''(x_j) ∫u²K + o(h²),
v_h(x_j) = n^{-1} Σ_{i=1}^{n} K_h(x_j−x_i) ε_i = o_u((nh)^{-2/5}),
(∂/∂h)(b_h(x_j)) = h^{-1} (c_h(x_j) − b_h(x_j)),
(∂/∂h)(v_h(x_j)) = h^{-1} [n^{-1} Σ_{i=1}^{n} L_h(x_j−x_i) ε_i − v_h(x_j)],
b_j = [n^{-1} Σ_{i=1}^{n} K_h(x_j−x_i)(m(x_i)−m(x_j))] / [n^{-1} Σ_{i=1}^{n} K_h(x_j−x_i)] = b_h(x_j) + O(n^{-1} h^{-1}),
b_{ℓj} = [(n−2ℓ−1)^{-1} Σ_{i:|i−j|>ℓ} K_h(x_j−x_i)(m(x_i)−m(x_j))] / [(n−2ℓ−1)^{-1} Σ_{i:|i−j|>ℓ} K_h(x_j−x_i)] = b_j + O(ℓ n^{-1} h + ℓ n^{-2} h^{-1}) = b_j + o(h²),
v_j = [n^{-1} Σ_{i=1}^{n} K_h(x_j−x_i) ε_i] / [n^{-1} Σ_{i=1}^{n} K_h(x_j−x_i)] = v_h(x_j) + o_u((nh)^{-7/5}),
v_{ℓj} = [(n−2ℓ−1)^{-1} Σ_{i:|i−j|>ℓ} K_h(x_j−x_i) ε_i] / [(n−2ℓ−1)^{-1} Σ_{i:|i−j|>ℓ} K_h(x_j−x_i)] = v_j + o_u((nh)^{-4/5} + ℓ^{3/5}(nh)^{-1}) = v_j + o((nh)^{-4/5}).

(d) For any i, j, s, t = 1, 2, ..., n, using the Riemann summation and the Hoelder continuity of K, L, and m, the coefficients given in (4.4.1) and (4.4.2) are

A_st(h) = n^{-3} Σ_{j=1}^{n} [K_h(x_j−x_s) K_h(x_j−x_t) − (1/2) K_h(x_j−x_s) L_h(x_j−x_t) − (1/2) L_h(x_j−x_s) K_h(x_j−x_t)] W(x_j)
        = n^{-2} h^{-1} W(x_s) [K*(K−L)((s−t)/nh)] + O(n^{-2})
        = O(n^{-2} h^{-1}).
Proof of (4.4.1):

Using the decomposition of d_A(h) − d_M(h) as given in the proof of (3.6.14), D(h) can be expressed as

D(h) = d_A(h) − d_M(h)
     = 2n^{-1} Σ_{j=1}^{n} b_j v_j W(x_j) + n^{-1} Σ_{j=1}^{n} (v_j² − E(v_j²)) W(x_j)
     = 2n^{-1} Σ_{j=1}^{n} (b_h(x_j) + O(n^{-1}h^{-1}))(v_h(x_j) + o_u((nh)^{-7/5})) W(x_j) + n^{-1} Σ_{j=1}^{n} (v_j² − E(v_j²)) W(x_j)
     = 2n^{-1} Σ_{j=1}^{n} b_h(x_j) v_h(x_j) W(x_j)
       + n^{-3} Σ_{j=1}^{n} Σ_{s=1}^{n} Σ_{t=1}^{n} K_h(x_j−x_s) K_h(x_j−x_t)(ε_s ε_t − γ(s−t)) W(x_j) + o_u((nh)^{-7/5}).

Using the derivatives of K_h, b_h, and v_h, we have

D_1(h) = (−h/2) D'(h) = S_1(h) + S_2(h) + S_3(h) + O_p((nh)^{-7/5}),

where

S_1(h) = Σ_{s=1, s≠t, t=1}^{n} A_st(h)(ε_s ε_t − γ(s−t)),
S_2(h) = Σ_{s=1}^{n} B_s(h) ε_s,
S_3(h) = Σ_{s=1}^{n} A_ss(h)(ε_s² − γ(0)),

and where the definitions of A_st(h) and B_s(h) have been given above. The proof of (4.4.1) is complete.
Proof of (4.4.2):

Using the decomposition of Cross_ℓ(h) − E(Cross_ℓ(h)) as given in the proof of (3.4.4), δ(h) can be expressed as

δ(h) = −2(Cross_ℓ(h) − E(Cross_ℓ(h)))
     = −2n^{-1} Σ_{j=1}^{n} (ε_j v_{ℓj} − E(ε_j v_{ℓj})) W(x_j) − 2n^{-1} Σ_{j=1}^{n} ε_j b_{ℓj} W(x_j).

To check T_1(h), using (n−2ℓ−1)^{-1} Σ_{i:|i−j|>ℓ} K_h(x_j−x_i) = 1 + O(2n^{-1}h^{-1}) as given in (3.6.7), the first term in the decomposition of δ(h) becomes

−2n^{-1} Σ_{j=1}^{n} (ε_j v_{ℓj} − E(ε_j v_{ℓj})) W(x_j)
  = −2n^{-2} Σ_{j=1}^{n} Σ_{i:|i−j|>ℓ} K_h(x_j−x_i)(ε_i ε_j − γ(i−j)) W(x_j)(1 + O(2n^{-1}h^{-1})).

Using (3.6.23) and (3.6.24), the second moment of the remainder term in the above equation is

E([n^{-2} Σ_{j=1}^{n} Σ_{i:|i−j|>ℓ} K_h(x_j−x_i)(ε_i ε_j − γ(i−j)) W(x_j)]²) = O(n^{-2} h^{-1}).

This implies that the remainder term is O_p(n^{-1} h^{-1/2}). Using the derivative of K_h as given above, we have

(h/2)(∂/∂h)(−2n^{-1} Σ_{j=1}^{n} (ε_j v_{ℓj} − E(ε_j v_{ℓj})) W(x_j))
  = −h(∂/∂h)[n^{-2} Σ_{j=1}^{n} Σ_{i:|i−j|>ℓ} K_h(x_j−x_i)(ε_i ε_j − γ(i−j)) W(x_j)] + O_p(n^{-1} h^{-1/2})
  = T_1(h) + O_p(n^{-1} h^{-1/2}),

where D_ij(h) = n^{-2}(K_h − L_h)(x_j−x_i) W(x_j). Next, using b_{ℓj} = b_j + o(h²) = b_h(x_j) + o(h²) as given in (3.6.9), the second term in the decomposition of δ(h) becomes

−2n^{-1} Σ_{j=1}^{n} ε_j b_{ℓj} W(x_j),

whose (h/2)(∂/∂h) transform is T_2(h) = Σ_{j=1}^{n} E_j(h) ε_j, where E_j(h) = n^{-1}(b_h(x_j) − c_h(x_j)) W(x_j). The proof of (4.4.2) is complete.
Proof of (4.4.3):

Using the linear expressions of ε_s, ε_s², and γ(0):

ε_s = Σ_{i=0}^{∞} ψ_i e_{s−i},   ε_s² = Σ_{i=0}^{∞} Σ_{j=0}^{∞} ψ_i ψ_j e_{s−i} e_{s−j},   γ(0) = μ_2 Σ_{i=0}^{∞} ψ_i²,

we have

ε_s² − γ(0) = Σ_{i=0}^{∞} ψ_i² (e_{s−i}² − μ_2) + Σ_{i≥0, i≠j, j≥0} ψ_i ψ_j e_{s−i} e_{s−j}.

Letting u = s−i, f_u = e_u² − μ_2, and v = s−j, S_3(h) becomes

S_3(h) = Σ_{s=1}^{n} A_ss(h)(ε_s² − γ(0))
       = Σ_{s=1}^{n} A_ss(h) Σ_{i=0}^{∞} ψ_i² (e_{s−i}² − μ_2) + Σ_{s=1}^{n} A_ss(h) Σ_{i≠j} ψ_i ψ_j e_{s−i} e_{s−j}
       = A + B,

where

A = Σ_{u=-∞}^{∞} a_u f_u,   B = Σ_{u=-∞, u≠v, v=-∞}^{∞} b_uv e_u e_v,

and where

a_u = Σ_{s=1}^{n} A_ss(h) ψ_{s−u}²,   b_uv = Σ_{s=1}^{n} A_ss(h) ψ_{s−u} ψ_{s−v},

and f_u are IID random variables with mean zero and all finite moments. So E(S_3(h)²) = E(A²) + E(B²).

Using the extended Whittle's inequality (3.6.1) to calculate E(A²), we need to check Σ_{u=-∞}^{∞} |a_u| < ∞ first. Since A_ss(h) = O(n^{-2}h^{-1}) and Σ_u ψ_{s−u}² < ∞, we have Σ_{u=-∞}^{∞} |a_u| < ∞. Using A_ss(h) = O(n^{-2}h^{-1}), Σ_u ψ_{s−u}² < ∞, and h = O(n^{-1/5}), we have

E(A²) ≤ c Σ_{u=-∞}^{∞} a_u² = c Σ_{u=-∞}^{∞} [Σ_{s=1}^{n} A_ss(h) ψ_{s−u}²]² ≤ c n^{-4} h^{-2} n = O(n^{-3} h^{-2}) = O(n^{-13/5}).

Using the extended Whittle's inequality (3.6.2) to calculate E(B²), we need to check Σ_u Σ_v |b_uv| < ∞ first. Since A_ss(h) = O(n^{-2}h^{-1}), we have

Σ_{u=-∞}^{∞} Σ_{v=-∞}^{∞} |b_uv| ≤ c Σ_u Σ_v n^{-2} h^{-1} |Σ_{s=1}^{n} ψ_{s−u} ψ_{s−v}| < ∞.

Using A_ss(h) = O(n^{-2}h^{-1}), μ_2 Σ_u ψ_{s−u} ψ_{s'−u} = γ(s−s'), and h = O(n^{-1/5}), we have

E(B²) ≤ c Σ_{u=-∞}^{∞} Σ_{v=-∞}^{∞} b_uv² = c Σ_u Σ_v [Σ_{s=1}^{n} A_ss(h) ψ_{s−u} ψ_{s−v}]²
     ≤ c n^{-4} h^{-2} Σ_u Σ_v Σ_{s=1}^{n} Σ_{s'=1}^{n} ψ_{s−u} ψ_{s−v} ψ_{s'−u} ψ_{s'−v}
     ≤ c n^{-4} h^{-2} Σ_{s=1}^{n} Σ_{s'=1}^{n} γ(s−s')²
     ≤ c Σ_{s=1}^{n} n^{-4} h^{-2} = O(n^{-3} h^{-2}) = O(n^{-13/5}).

The proof of (4.4.3) is complete.
Proof of (4.4.4):
Using the property (b) of Section 3.2, then the ARMA process of
regression errors
can be uniquely expressed as
~j
co
(XI
where~. are real numbers with
1
I~il
!
i=O
<
(XI
and e. are lID random
1
variables with mean zero and all finite moments
~
= E(e
k
1
).
For the
proof of the asymptotic normality of Sl+S2' we start from the case that
are lID random variables, i.e.
~j
~i
= 0 for all i > O.
In this case,
Haerdle, Hall, and Marron (1988) have shown that, as n -+ co, for any
13
€
[a,b],
n
where
~(O)
9/10
(Sl+S2)(l3n
= Var(e ).
j
-1/5
) => N(O,
VS(~(O),I3»,
We now need to show that this asymptotic
normality holds for the linear process
= 1.2. . . . .
Define. for j
~j
= i=O
!
n, and 0 ~ q
~.e . .
1 J-l
,
< n4/5,
q
= i=O
! ~iej-i'
q
9/10
n
(SI +s2q}(~n-1/5}.
q
~j
Ynq (~)
xn (~) = n
9110
=
(SI+S2}(Pn
-1/5
ro
) (with ~j
= i=O
!
~.e
1
. . ).
J-l
where Slq(h} and S2q(h} are SI(h) and S2(h) replaced ~j by ~jq
First
of all. the asymptotic normality holds for an MA(q) process with
q
< n 4/5 •
i . e. as n
(4.4.4.1)
~ ro.
=>
Y (~)
nq
where the notation A
~
q
2
N(O. VS(~2(! ~1) .~»
i=O
~
y
q
(~).
B means A and B have the same distribution.
The
proof of (4.4.4.1) is given later.
To continue extending the asymptotic normality to the linear process given above, we shall use Exercise 6.16 of Brockwell and Davis (1987):
(Exercise 6.16) Suppose that X
n
o < v < ro. as n
~ ro.
Thus, we have. as q
y
q
Then X
n
~ N(~
n
=>
.v }. where
n
X. where X
~
n
~~.
~ N(~.v}.
as n
v
n
~
v and
~ ro.
~ ro.
(~) =>
ro
N(O.
In order to claim that X (P)
n
VS(~2(! ~i}2.~)} ~ Y(~}.
i=O
=~ Y(~)
as n
~ ro.
we shall use the
Proposition 6.3.9 of Brockwell and Davis (1987):
(Proposition 6.3.9) Let X • n
n
n
= 1.2....•
(i)
(ii)
= 1.2•....
and Y
nq
q = 1.2•...•
be random p-vectors such that
Ynq
=)
Yq as n
Yq
=>
~ ro
Y as
for each q
Q ~ ro.
= 1.2....•
and
lim limsup P(
(iii)
q~
n~
Ix n
Y I > ~) = 0 for every ~ > O.
nq
Then
xn
=> Y as n
~ m.
Now we only need to check the condition (iii) of the property described
above to get X
n
=>
(~)
Y(~).
Using the following quantities which are
shown later:
(4.4.4.2) Var(S1(h»
= 2n
-
m
~ -1 (k=:m
~(k»
r 2_ 2J(m"W) 2+
m
(4.4.4.3) Var(S2(h» = n
(4.4.4.4)
-1 4
h (
:I
k=-m
J(K*(K-L» 2Jw2 + o(n--1
~ ).
2
~(k»(JuJ{)
4
o(n-1 h).
= O(n-~).
Cov(Sl(h),S2(h»
where
then we have
For any q. we have
The variance of this term is
Var(X
n
(~) - Y (~» = VS(~2(:I ~.)2.~)
nq
i>q
+ 0(1).
1
Hence. using Chebychev's inequality. given any
~
> O. we have
lim limsup p(IX (~) - Y (~)I > ~)
n~
n
nq
q~
~ lim limsup ~-2yar(X (~) - Y (~»
q~
n~
n
~ lim ~-2yS{~2(:I ~i)2.~)
q~
i>q
nq
= O.
The proof of (4.4.4) is complete.
For the proof of (4.4.4.1). using the linear expressions:
q
q
q
= ! '/J.e ., e
= ! '/J.e ., "'T(s-t) = /l2! '/J.'/J. +t' and
tq
'_A
1 t-J
'_A
11-S
sq
. 0 1 S-1
J=v
1=v
1=
q
q
q
2
e sq e tq - "'T(s-t) =.
!, .
!
'/Ji'/J j e s- i e t- j + . ! '/J.'/J.
+t(e
.
1 1-S
S-1
1=O,J#1-s+t,j=O
1=0
e
/l2)' then Slq(h) can be decomposed into
n
n
Slq(h) = !
!
A (h)(e e
- "'T(s-t»
s=I,s#t,t=l st
sq tq
= a (h) + b (h),
q
q
where
n
q
n
q
a q (h) = !
!
Ast(h) !
!
'/J.'/J.e
1 J S-1.e t - J.,
s=l,s#t,t=l
i=O,j#i-s+t,j=O
n
n
q
2
b q (h) = !
!
As t(h)! ~i~'1-S+t(e S-1. - Jl2)'
s=l,s#t,t=l
i=O
Here, a (h) can be further decomposed into
q
n
n
q
q
a (h) = !
!
A t(h) !
!
q
s
s=l s#t , t=1
i=O,j#i-s+t,j=O
'/J.~.e
.e ·
1 J S-1 t -J
l
q
n
q
n
=.!
! '/Ji'/J j
Ast(h)es_iet_j
!
!
1=0 j=O
s=l,s-t#O,i-j,t=l
q
q
n
n
=.!
! ~i~j!!
AU+i,V+j(h)euev
1=0 j=O
u,v=-q+l,u-v#O,i-j
by letting u=s-i,v=t-j,
where
a 1q (h)
=
q
q
!
!
n
i=O j=O
~i~j
U
n
n
I
n
!
!
AU +i
v=-q+ 1 u-v#O
IV
+ j (h)eU e V
I
q
=!
!
(!
'_A
U,v=-q+ 1 , u#v 1=v
q
!
.-"
J=v
~.'/J.
1 J
A . +.(h»e e
U+1,V J
u V
I
q
a 2 q (h) = (-1)
! [ ! ! ~i~i - k AU+i
k=-q u i
. 2k(h)eU e u-k]'
,U+1-
Based on the assumptions given in Section 4.2, it is shown later that
( 4.4.4.5)
(4.4.4.6)
9
(n- / 10 ) for all h € H ,
n
-9/10
a2q{h) = 0 (n
) for all h € H .
b (h) =
q
0
p
n
p
Now we rearrange S2q(h) as
n
S2q(h) = ! B (h)f..
s=l s
Jq
q
n
= ! B (h)! -li i e i
i::O
ss=l s
q
n
!
=
( ! -li i B +i(h»e
u
u
u=-q+1 i::O
by letting u=s-i.
Thus. we have
Slq(h) + S2q(h)
= a 1q (h) + S2q(h) +
n
n
=!
!
0
q
u.v=-q+1.u~v
p
q
(n
-9/10)
(!
! ~'~jAU.V
+i +j(h»eUe v +
i::O j::O 1
n
q
!
( ! ~iBu+i(h»eu + 0 (n
u=-q+1 i::O
P
-9/10
).
which is the same form as in the case of IID regression errors given in Haerdle, Hall, and Marron (1988). So, the asymptotic normality of S_1q + S_2q is true. The expectation of S_1q + S_2q is zero for any h ∈ H_n. Now we only need to calculate the variance of S_1q + S_2q. Using the following quantities which are shown later:
(4.4.4.7)
Var(Slq(h»
-?-
= 2n ~
-1
q
2 2J
2 r, 2
-2- -1
(K*(K-L}) JW + o(n ~ ).
(~2(i:O -Iii)}
(4.4.4.8) Var(S2q(h» = n
r
-1 4
q
2
2-. 2J
2
-1 4
h ~2( i:O ~i) (JuJ() (m"W) + o(n h).
Cov(Slq(h),S2q(h}) = O(n-~}.
(4.4.4.9)
Replacing h with ~n-1/5 in (4.4.4.7) through (4.4.4.9). then we have
9/10
Var(n
-liS
(Slq+S2q)(~n
q
2
}) = VS(~2(i:a -Iii) .~).
The proof of (4.4.4.1) is complete.
For the proof of (4.4.4.5). changing of indices. then b q (h) can be
expressed as
b q (h)
=
n
n
n
=!
!
Ast{h)
!
~s-u~-u+tfu
u=-q+1
s=l. s;l!t. t=l
by letting u=s-i. f u =eu
2
-~2'
n
=
!
u=-q+l
Using (1) of Whittle's inequality. independence between f u ' and Ast{h)
= O{n-~-l) for any s. t. then we have
n
n
n
!
(!
!
Ast (h)~s-u~-u+t )2
u=-q+1 s=l.s;l!t.t=l
n
~ c !
n-4 h- 2 = O{n- 13/S ).
u=-q+l
l/S
since h = O{n). Then. we have b (h) = 0 (n-g/lO).
E{bq (h)2) ~ c
q
p
The proof of (4.4.4.S) is complete.
-2 -1
For the proof of (4.4.4.6). using Ast(h) = O{n n ) for any s. t.
~ ~i~i-k
O(n) terms in !.
u
-liS
= 0(1). and h = O(n
). then we have
1
2
q
E{a2q (h) ) = E(k=:q ~
~ cn-4h-~(
"
~
7~i~i-kAU+i.U+i-2k(h)eueu+k)
q
2
q
!!
! ! eueu+keu,eu'+k')
k=-q u k'=-q u·
cn-4.lh-2n 4/Sn4/S= O{ n-2)
since the subindices of the random variables must be equal in pairs; otherwise the expectations are zero.
The proof of (4.4.4.6) is complete.
For the proof of (4.4.4.7). using the Hoelder continuity of K. L.
n
n
and m. noO{nh) terms i n !
! . and the following quantities:
u . v=-q+ 1. u;l!v
2
Ast(h) = n-~-lW(Xs)(K*{K-L)«s-t)/nh» + 0(n- ).
l
W{xu +.)
= W{xu ) + O{lil/n) = W(xU ) + O{n- ) for lil<q.
1
K*(K-L)({u+i-v-j)/nh) = KM(K-L)«u-v)/nh) + O{li-jl/nh)
= KM(K-L)«u-v)/nh) + O(n-lh- l )
for li-jl~q.
then we have
Var(a 1q (h))
= Yare
n
n
~
~
u,v=-q+l,UjII!V
2
= ~2
n
~
q
(~
q
~
q
q
i=O j=O
n
~
~i~·A i +j(h»e e )
J u+ ,v
Uv
(.~
~ ~i~jAU+i,V+j(h»
u, v=-q+1, ujII!v 1=0 j=O
2
since AU+i ,v+j(h) = Av+ j ,U+1.(h)
= ~2
2
n
~
n
~
q
q
[~
u, v=-q+l, ujII!v
~
i=O j=O
-2-1
~i~.n ~
J
W(x
u+
i)(K*(K-L)«u+i-v-j)/nh»
+
0(n-2 )]2
= ~2
2
n
~
n
~
q
U, v=-q+ 1, ujII!v
q
[~
~
i=O j=O
-2-1
~i~jn ~
W(x )(K*(K-L)«u-v)/nh»
U
,
+
0(n- 2 )]2
2 -4 -2 q
4 n
n
2
h (~~i)
~
~
W(xu ) (K*(K-L)«u-v)/nh»
i=O
u,v=-q+1,Ujll!V
= ~2 n
2
+ O(n- )
2 -2 -1 q
4 ~ 2J
2
-2 -1
= ~2 n ~ (i:O ~i) JW (K*(K-L»
+ o(n ~ ).
Since S1 q (h) = a 1q (h) + a q (h) + bq (h), Var(a q (h»
2
1
2
2
E(bq(h) ) and E(a2q(h) ) are O(n
-3 -2
h
= 0(n-~-1), and
) as given 1n the proof of
(4.4.4.5) and (4.4.4.6), then we have Var(S1q(h»
0(n- 3h- 2 ). The proof of (4.4.4.7) 1s complete.
= Var(a 1q (h»
+
For the proof of (4.4.4.8), using the continuity of Wand m", and
r
-1 2
2.
-1 2
h (Jul()(m"W)(xs) + o(n h) for all s
Bs(h) = (-I)n
(m"W)(xu +i
)
for Ii I~q,
= (m"W)(xu ) + 0(1)
then we' have
Var(S2q(h))
n
q
= Yare
~
(~~.B +.(h»e )
u=-q+l i=O 1 u 1
U
=
~2 ~ [.i ~iBu+i(h)]2
u=-q+l
= ~2
n
~
u=-q+l
1=0
q
[~~ . ( -1 ) n
S
-1 2
2.
h ( u I()( m"W)( x i ) +
1=0 1 ,
u+
0
(n
-1 2
h)]
2
r
n
q
-1 2
2.
-1 2 2
!
[.! "'1(-I)n h (JuJ<)(m"W)(x) + o(n h)]
u=-q+I 1=0
n
2
-1 4
q
2 -2. 4 } 2. 2
!
(m"W}(x) + o(n h)
= ~2(! "'.) n n (uJ<)
·_A
1
u
.
1~
u=-q+ 1
= ~2
r
2.. 2J
2
-1 4
q
2 -1 4
= ~2( 1:0 "'1) n h (JUJ<) (m"W) + o(n h).
The proof of (4.4.4.8) is complete.
E(a
2q
-1 4
For the proof of (4.4.4.9). using Yar(S2q(h» = O(n h).
(h)2) = 0(n-3h- 2 ). and E(b (h)2) = O(n- 3h- 2 ). then we have
q
Cov(Slq(h),SZq(h»
= Cov(a 1q (h)+a2q(h)+bq (h),S2q(h»
= Cov(a 1q (h),S2q(h»
+
O(n-~)
+ O(n-~).
= E(a 1q (h)-S2q(h»
Thus.
E(a 1q (h) -S2
(h»
. q
n·
n
n
q
q
q
= ~
~'
~
(!
! "'i"'jA +i +j(h»(! "'.B +.(h»E(e e e )
u.v=-q+l.u~v z=-q+I i=O j=O
u.v
1=0 1 z 1
U V Z
= O.
since
E(e e e ) =
u v z
~3
if u=v=z
0
otherwise
~
The proof of (4.4.4.9) is complete.
For the proof of (4.4.4.2). using the linear expressions of
Q)
Q)
SI(h) can be decomposed into A + B. where
Q)
B=
Q)
Q)
!
!
U=-.u~v.v=-
b
uveuev .
Eo •
S
and where
n
=
n
~
~
Ast(h)~s-u~-u+t'
s=l,sjl!t,t=l
n
n
b uv = ~
~
Ast(h)~s-u~t-v'
s=l ,sjl!t, t=l
au
and f
.
u are lID random variables with mean zero and all finite moments.
Since A and Bare uncorrelated, then Var(Sl(h»
2
2
= E(A ) + E(B ).
Using the extended Whittle's inequality (3.6.1), we need to check
u:-Ol:l
lau
I<
CII
2
then we have E(A )
CII,
CII
for any s, t, and
~
~ ~
I~s-u~-u+tl <
u=-CIl
a
2
u
O(n~)
CII,
terms in
n
n
~
~
s=l,sjl!t,t=l
then we have
CII
lau I =
u=-CIl
~
-~
~
-1
n
~
~
Ast(h)~s-u~-u+tl
u=-CIl s=l,sjl!t,t=l
n
n
_~ -1
c ~
~
n ~
s=l, sjl!t, t=l
~
Since Ast(h) = O(n
I
n
n
~
) for any s, t,
.1.
t=l
~2 ~
u=-CIl
~s-u~s'-u = ~(s-s'), and
CII
~
u:-Ol:l
au
2
Ol)
=
= 0(1).
n
n
~
~
~
~(s-s')
= 0(1),
= O(n), then we have
s=1 s'=l
n
u=-CIl
"'-u+t
n
[~
~
Ast(h)~s_u~_u+t]2
s=l,sjl!t,t=l
n
CII
n
~ c ~ n- 4 h- 2 ( ~
~
.1.
.1.
}2
"'s-u"'-u+t
u=-CIl
s=I,sjl!t,t=1
n
CII
n
~ c ~ n- 4h- 2 ~ ~ ~ ~,
u=-CIl
s=1 s'=1 s-u s -u
-3 -2
-~-1
= O(n h } = o(n ~ ).
.
2
Now we calculate E(B).
2
Ol)
Ol)
~
~
Since b
uv
where
b
u=-Ol)
uu
2 ~ c ~
u=-CIl
n
n- 4h- 2 ( ~
n
~
s=1 t=1
= b
vu
,we have
Using the Riemann summation, a change of variable, and the Hoelder continuity of K, L, and W, we have the following quantities:
A (h) = n-~-1W(x )[KM(K-L)«s-t)/nh)] + O(n- 2 ).
st
5
As-p.t-q (h) = n
-2-1
~
-2
W(x 5-p )[KM(K-L)«5-p-t+q)/nh)] + O(n )
-1
-2 I I -3h-1 + Ip-q In-3h-2 ).
= n -2
~ W(x )[KM(K-L)«5-t)/nh)] + O(n + p n
5
Thus, we have
c:o
c:o
~22 !
!
b 2
u=-CXl v=-«) uv
c:o
c:o
n
n
2
= ~2 !
!
[!
!
u=-«) v=-CXl 5=1,s#t,t=1
c:o
c:o
n
n
n
n
2
!
!
=~2!'!
!
!
u=-«) v=-c:o s=1,s#t,t=1 s'=1,s'#t' ,t'=1
n
n
n
n
=!
!
!
!
A5t(h)A5't,(h}~(s-s'}~(t-t'}
s=1,s#t,t=1 s'=1,5#t,t'=1
c:o
c:o
for ~2! ~ S-U.s
~,
~
~t'
-u = ~(5-s'), ~2!
-v =
u=-«)
v=-CXl t-v
n
n
=!
! ~(p)~(q)!
!
A t(h}A
(h)
Ipl<n Iql<n
5=1,s#t,t=1 s
s-p,t-q
~(t-t').
by letting p=s-s', q=t-t',
-4 -2
h
(
!
2
n
n
2
2
-2-1
!
W(x)
(KM(K-L)«x
-x )Ib»
+ o(n -h )
s
s t
5=1,s#t,t=1
Ip I<n
c:o
c:o
for!
Ip~(p)1 < c:o, !
Iq~(q)1 < C:O,
p=-CXl
q=-CXl
= n-~-1( ! ~(p})2[n-1! W(x )2][n- 1! (KM(K-L)(rn- 1h- 1 »2] + o(n-~-1)
. Ipl<n
5
5
r
= n
~(p»!
by letting r=s-t and u=«xs-xt)Ib),
!
~(p»2rw2J(KM(K-L»2 +
Ipl<n
J
The proof of (4.4.4.2) is complete.
=
n-~-1(
o(n-~-1).
For the proof of (4.4.4.3), using the linear expression of
then S2(h) can be expressed as
Eo
5
,
ClO
be,
u u
where
.
n
bu = L Bs(h)~s_u.
s=1
Using the Hoelder continuity of K, L, and W, the uniform continuity of
m", and the following quanti ties:
J'ru~)(m"W)(xs ) + o(n-l h2) ,
(-I)n-lh2(ru~)(m"W)(x
) + o(lpln- 1h2 ).
J'
s
B (h) = (-I)n-l h 2(
s
B
(h) =
s-p
(m"W)(x +hu) = (m"W)(x ) + o( 1),
s
s
then we have
Var(S2(h»
n
n
2
L
Bs (h)B s
= L
bu Jl2 = L
u=..(X)
s=1 s'=1
ClO
=
=
L
n
. ~(p) L
Ipl<n
L
Ipl<n
s=1
.(h)~(s-s·)
by letting p=s-s',
B (h)B
(h)
s
s-p
~(p) n-~4(Ju~)2 ~
s=1
III
for
= n- 1h4 (
~(p»(ru~)2J(m"w)2
L
J'
Ipl<n
L
p=..(X)
4
1
+ o(n- h ).
Ip~(p)1 <
ClO,
The proof of (4.4.4.3) is complete.
For the proof of (4.4.4.4), because SI(h), S2(h) , and
~j
have mean
zero, we have
n
L
= E(
n
L
Ast(h)(~s~t
s=I,s~t.t=1
=
n
n
L
!
s=l.s~t.t=1
To calculate
E(~s~t~u)'
n
- ~(s-t»· L Bu(h)~u)
u=1
n
L
u=1
A t(h)B (h)·E(~ ~ ~ ).
sus t u
using the linear expressions of
~
s
.
~
t'
and
~
u
:
co
=
~s
!
p=O
= !q
~Pe s-p '~t
E(~ ~t~ )
s
u
e,
q-s+t s-q
~
~
=!
u
r
~
e
,then we have
r-s+u
s-r
,
=
co
where
E( e s-pe s-qe s-r)
=
Hp=q=r
Jl3
t
0
otherwise
for e. are lID random variables with mean zero and all finite moments
1
~ = E(e 1k ) for all positive integers k.
Using Ast(h) = O(n
-~-1
-1 2
), Bu(h) = O(n h).
n
co
Jl2 !
p=O
~p~p-s+t = ~(s-t), and
n
=!
~
~p-s+u = 0(1),
n
!
!
s=l t=l
n
n
!
n
U:l
~(s-t)
= O(n). then we have
co
!
s=l,sjll!t,t=l u=1
n
-3 n
n
!
cn h!
!
s=1 t=l u=l
=O(n-~).
The proof of (4.4.4.4) is complete.
Then the proof of (4.4.4) is
complete.
Proof of (4.4.5):
The unique solution of the ARMA process of regression errors
~.
J
co
can be expressed as
~j
=
!
~iej-i'
where
~i
are real numbers with
i=O
co
!
I~il
< co
and e
1=0
finite moments.
i
are lID random variables with mean zero and all
For the proof of the asymptotic normality of
(-Sl-S2+Tl+T2)' we shall use the same arguments as in the proof of
(4.4.4).
Starting from the case that
~j
are lID random variables i.e.
~.
1
= 0 for all i > O. and e = O. Haerdle. Hall. and Marron (1988)
showed that
n
~(O)
where
9/10
(-81-82+Tl+T2)(~n
= Var(e ).
j
-1/5
) => N(O.
VST(~(O).~».
Using the same arguments in Haerdle. Hall. and
Marron (1988). this asymptotic normality can be extended to the general
case
e«
(4.4.5.1)
nh for
lID regression errors by showing that
Var(T (h»
1
2n-~-la4J(K-L)2Jw2
=
o(n-~-l).
+
The proof of (4.4.5.1) is given later.
Following the same steps as in the proof of (4.4.4.4), the asymptotic normality of (−S_1−S_2+T_1+T_2) holds for the MA(q) process with q < n^{4/5}. Then, using Exercise 6.16 and Proposition 6.3.9 of Brockwell and Davis (1987), the asymptotic normality of (−S_1−S_2+T_1+T_2) holds for the linear process ε_j = Σ_{i=0}^{∞} ψ_i e_{j−i}. Thus, the proof of (4.4.5) is complete by showing the following quantities, whose proofs are given later:

(4.4.5.2)  Var(T_1(h)) = 2n^{-2}h^{-1} (Σ_{k=-∞}^{∞} γ(k))² ∫(K−L)² ∫W² + o(n^{-2}h^{-1}),
(4.4.5.3)  Var(T_2(h)) = n^{-1}h⁴ (Σ_{k=-∞}^{∞} γ(k)) (∫u²K)² ∫(m''W)² + o(n^{-1}h⁴),
(4.4.5.4)  Cov(T_1(h), S_1(h)) = 2n^{-2}h^{-1} (Σ_{k=-∞}^{∞} γ(k))² ∫(K*(K−L))(K−L) ∫W² + o(n^{-2}h^{-1}),
(4.4.5.5)  Cov(T_2(h), S_2(h)) = n^{-1}h⁴ (Σ_{k=-∞}^{∞} γ(k)) (∫u²K)² ∫(m''W)² + o(n^{-1}h⁴),
(4.4.5.6)  Cov(T_1(h), T_2(h)) + Cov(T_1(h), S_2(h)) + Cov(T_2(h), S_1(h)) = O(n^{-2}).

The proof of (4.4.5) is complete.
For the proof of (4.4.5.1), since Dij(h) = D.. (h) and
JI
~.
J
are lID
random variables with mean zero and all finite moments, then we have
n
2
2
= 2 I
I
Dij(h) ~(O) ,
j=l i: li-j I>e
Letting k=j-i, then we have
Var(T 1(h»
n
2 I
I
D. (h)2
j=1 i: li-j I>e Ij
_4
= 2n
= 2n
j:l i: li:jl>e (~(Xj-xi) - ~(Xj-xi»
~
(~(kln)
I
Ikl>e
= 2n- [J(Kh (t) -
2n-~-lJ(K-L}2Jw2
+
2
~(kln»
-
~(t»2dt
2
=
2w
n
·I W(xk+ i )
2
(xj )
2
i
2
1 2
+ O(tn- h- )][Jw + O(h)]
o(n-~-I).
The proof of (4.4.5.1) is complete.
For the proofs of (4.4.5.2) and
expressions of
~i' ~j'
and
~(i-j),
(4.4~5.3),
using the linear
then D1 (h) can be decomposed into
T1A(h) + T1B (h) + T2 (h), where
a f ,
T1A(h) = I
u=-CI»
=
T1B(h)
U U
CD
CD
I
I
b
uveueV,
u=-CI» , ujll!v , v=-CI»
CD
be,
u u
and where
a
u
n
I
= j=l
i:
I
li-jl>e
Di.(h)~i ~
j'
J
-u -u+
n
b
uv
=
I
I
j=l i: li-jl>t
D
ij
(h)~
n
b
u
=
I
j=1
Ej(h)~j
Thus, we have
-u
.
~
i-u j-v
.
Use the extended Whittle's inequality (3.6.1) to check that
2
E{T 1A {h) )
= o{n-2...-1
~
).
00
~
< 00 and !
I~
u=-OO
.1 <
-u+J
n
00
!
Iau I
u=-OO
For ! I~i-ul
i
-11 ~i ~ +j 1
c !
!
!
n-2...
~
u=-OO j=1 i: li-jl>e
-u -u
n
00
we have
00,
cn-1 h-1 .
~
n
Using ~2!
~-u+'~-u+j' = ~(j-j') and !
!
~{j-j'} = O(n}, we have
u=-OO
J
j=1 j' =1
00
E(T lA (h}2) ~ c !
au 2
u=-oo
for !
i
To check E(T 1B (h}2}, use
~he
I~.
l-U
I<
00,
extended Whittle's inequality
00
(3.6.2).
Since
!
b 2 = o{n-~-I} and b
= b , then we have
u=-OO uu
UV
vu
2
00
00
E(T 1B (h} } = E(!
!
bee}
u=-OO v=-OO UV U V
00
2
-2... -1
+ o(n ~ }
00
= ~22 !
!
b 2 + o(n-~-I},
u=-oo v=-oo uv
where
2
00
2
00
n
00
= ~2!
!
(!
!
D'j(h)~i_ ~j_ )
u=-oo v=-oo j=1 i: li-jl>e 1
u
v
=
00
n
2
n
~2!
!
!
!
!
!
u=-OO v=-OO j=1 i: li-jl>e j'=1 i':li'-j'l>e
Dij{h)Di'j,(h}~.
~. ~i' -u~.,
l-U J-v
J -v
n
n
=!
!
!
!
Dij{h}Di'j,(h}~(i-i'}~{j-j'}
j=1 i:li-jl>e j'=1 i':li'-j'l>e
~2
for
~.
L
l-U
u=-ro
~i'
-u =
~(i-i
~2
') and
L
v=-ro
~.
~.,
=
J-V J -v
~(j_jO),
n
= Ipl<n Iql~n ~(p)~(q)j:1 i: li:jl>e Dij(h)Di_q,j_p(h)
by letting p = j - j', q = i - i',
L
=
~(p)~(q)
L
Ipl<n Iql<n
L Ln
-2
Ikl>e i
n
-2
(Kh(xk ) -
~(xk»W(x'+k)·
1
(Kh(~_p+q) - ~(xk_p+q»W(xk+i_P)
2
by letting k=j-i, and D. (h)=n- (K_-J_ )«x.-x.»W(x.),
1
= (
2
~(p»
L
Ipl<?
L Ln
-4
Ikl>e I
(Kh(xk ) -
-n
j
~-
~(xk»-W(x'+k)
1
for W(~+I_p) = W(x i +k ) + O(lpl/n),
~
2
J
J
1
-3 -1
+ O(n h
)
ro
L
p=O
Ip~(p) I < ro,
ro
< ro,
Iq~(q)1
L
q~
(Kh-~)(xk_p+q)
and
= (
L
Ipl<n
~(p»2n-2[I(~(t)
-
=
~(t»2dt
-1-1
+ O( Ip-q 1n h ),
+ O(en- 1h- 2 )][IW 2 + O(h)] +
.
O(n- 3 h- 1 )
(Kh-~)(xk)
L
~(p»2I(K-L)2Iw2 + o(n-~-I).
Ipl<n
The proof of (4.4.5.2) Is complete.
=
n-~-1(
For the proof of (4.4.5.3), through a straightforward calculation,
we have
= ~2 (
~
b 2)
~
~2
=
~2
=
2
n
(L
ro
=
u
u=-ro
L
E.(h)~·_u)
J
J
u=-ro j=1
ro
n n
L
L
L
u=-ro j=1 j'=1
n
n
L
L
j=l j'=1
Ej(h)Ej,(h)~j_ ~j'_
u
Ej(h)Ej,(h)~(j-j')
u
for
~2
L
u=-ro
~.
J-U ~j' -u =
n
=
L
Ipl<n
~(p)
L
j=1
by letting p=j-j',
E.(h)Ej_(h)
J
P
~(j-j'),
where
-1 2 } 2.._
E.(h) = (-I)n
J
h ( ul<)(m"W)(x.) + o(n
-1 2
J
h),
} 2.._
I I -1 2
E. (h) = (-I)n-1 h2 (u-K)(m"W)(x
j ) + o( p n h).
J-P
Thus
Var(T (h) )
2
r 2._ 2
-2. 4
2
n
:I
..,(p»n n (JUI<) :I (m"W)(x j) + o(n
Ipl<n
j=1
..,(p»n-l h \ ru~)2J(m"w)2 + o(n- 1h4).
= ( :I
Ip I<n
J'
= (
-1 4
h)
The proof of (4.4.5.3) is complete.
For the proof of (4.4.5.4), follow the decompositions used in the
proof of (4.4.4.1) and (4.4.5.2).
Using the superscripts T and S
denote the coefficients bUY used for Tl(h) and Sl(h) respectively, we
have
co
= Cov(:I
u=-ClO
co
=C)..
2~
~2 ~
u=-ClO
2
=
co
co
co
co
T
S
:I
b e e ,:I
:I
b e e )
v=-ClO UV U V u=-co v=-ClO UV U V
co
~
bTbS
~
uv uv
for bT =bT . bS =bS
v=-ClO
UV vu
uv vu
n
co
n
n
~2:I
!
[!
!
D .(h)~i ~j ][:I
!
u=-ClO v=-ClO j=1 i: li-jl>t i J
-u -v s=I,s~t,t=1
Ast(h)~s-u~t-v]
n
= 2!
! ..,(p)..,(q)!
!
D (h)A
. (h)
Ipl<n Iql<n
j=l i: li-jl>t ij
i-p,J-q
co
for
~2:I
co
u=-ClO
~.
~
1-U s-u
= ..,(i-s),
~2!
v=-ClO
~.
J-V~t -v = "'(j-t),
and by letting p=i-s, q=j-t,
= 2!
! ..,(p)..,(q)
! n-~-I! (K-L) (xklh)W{x. k)A
k' (h)
i -p, +l-q
Ipl<n Iql<n
Ikl>t
i
1+
-2
by letting k=j-i, Dij(h)=n (Kb-~)(Xj-xi)W(Xj)'
= 2{
! ..,(p»
Ipl<n
2 -4 -2
n h
!
Ikl>i
({K-L).(K*(K-L»(~Ih).:I
i
W(x.)
1
2
+ O(n
-2
)
because in the proof of (4.4.4.1) we had
2
A.I-p. k+ i -q (h) = n-~-IW(xi -p )K*(K-L) (xk+p-qIh) + O(n- ).
1 1
K*(K-L)(~K+p-qIh) = K*(K-L)(~Ih)
+ O(lp-qln- h- ).
K
=
2n-~-1( ~
W(x.I-p ) . = W(x.)
+ O(lpl/n), and W(x. k) = W(x.) + O(h),
1
1+
1
~(p»2JK*(K-L).{K-L)Jw2
o(n-~-1).
+
Ipl<n
The proof of (4.4.5.4) is complete.
For the proof of (4.4.5.5), following the same decompositions used
in the proof of (4.4.5.2) and (4.4.4.3), where the superscripts T and S
denote the coefficients for T2 (h) and S2(h) respectively, we have
co
= Cov(
co
T
be,:I
:I
u u
u=-co
U=-Q)
S
be)
u U
(Xl
= Jl2
T S
:I
U=-Q)
co
since e
bu bU
n
u
are lID random variables,
n
= Jl2 :I
(j:1 Ej(h)~j-u){s:1 Bs(h)~s_u)
U=-Q)
n
co
n
= .:I
~
j=1 s=1
Ej(h)Bs(h)~(j-s)
for
Jl
~
2 U=-Q)
~.
J-u~ s-u =
~(j-s)
n
= :I
Ipl<n
~(p):I
=:I
Ipl<n
~(p)n
s=1
Ep+ {h)B (h)
s
s
-2. 4
by letting p=j-s,
r 2.._ 2 n
n (JUI<:)
~
(m"W)(x)
s=1
s
2
-1 4
+ o(n h)
since in the proof of (4.4.4.2) we had
2.._
-1 2
Bs (h) = -n-1 h2}ul<:(m"W)(x)
+ o(n h) and
s
2.._
2
Ep +s (h) = {-I}n-1 h2}ul<:(m"W)(x}
+ o( Ip In-1 h).
s
= n -1 h 4 (
r
2.._ 2J
4
:I
~(p) )(JUI<:)
(m"W) 2 + o(n-1 h}.
Ipl<n
The proof of {4.4.5.5} is complete.
For the proof of (4.4.5.6), following the proof of (4.4.4.4), we
have similar results.
Through a straightforward calculation. we have
n
.(h)(~i~j-~(i-j». ~
n
= CoV(
~
~
j=l i: Ii - j
I>e
Di
J
k= 1
~(h)~k)
n
n
= };
~
j=l i: Ii - j
~
I>e k= 1
D,. (h ) ~ (h) E( ~ i ~ j ~k )
1J
co
where E(~i~j~k) = ~3 ~
p=O
~p~p-i+j~p-i+k'
n
co
};
}; D.. (h)~(h)~3}; ~ ~ .+.~ '+k
p=o p p-l J p-l
j=1 i: li-jl>e k=1 IJ
where D.. (h) = O(n-~-l). ~ (h) = O(n- 1h2 ).
n
=
};
IJ
K
n
~
~
n-~-ln-lh2~(1_j) = O(n-~)
j=1 i:11-jl>e
c };
n
for};
};
1=11:11-jl>e
Using the same reason as above. we have
~(1-j)
= O(n).
Cov(T 1(h)'S2(h»
n
n
= CoV(j:l i: li:jl>e Dij(h)(~1~j-~(1-j»'k:lBk(h)~k)
n
n
= j:l i:
Ii:j I>e
n
k:l Dij(h)~(h)·E(~1~ j~k)
n
co
= j:l 1: 11:jl>e k:l Dij(h)~(h).p:O ~p~p-1+j~P-i+k
= O(n-~)
for D..
(h) = O(n-~-I)
IJ
and R (h)
-k
= O(n- 1h2 ).
Using the same reason as above. we have
Cov(T2 (h),SI(h»
n
n
n
~
= Cov( ~ E.(h)~ .. !
j=l J
J s=l.s~t.t=1
n
n
~
~
Ast(h)(~s~t-~(s-t»
n
}; Ej(h)Ast(h)E(~j~s~t)
j=1 5=1. s~t. t=1
n
n
n
co
=};
~
~
Ej(h)Ast(h) ~ ~ ~ -i+s~ -'+t
j=1 s=l.s~t.t=1
p=o p p
p J
2
1
= O(n-~)
for Ej(h) = O(n- h ). and A (h) = O(n-~-I),
st
The proof of "(4.4,5.6) is complete. So. the proof of (4.4.5) is
=
complete.
Proofs of (4.4.8) through (4.4.12):
The proofs of the first parts of (4.4.8) through (4.4.13) are
given first.
Using the linear expressions of
C , C ,
s
t
and
~(s-t),
we
rearrange D1 (h) as
n
n
= !
!
Ast(h)(csc t - ~(s-t»
s=l t=l
+ S2(h)
= SlA(h) + SlB(h) + S2(h),
where
CD
SlA(h) =
a u!.u'
~
I
u=-CXl
CD
CD
SlB(h) = I
I
bee,
U=-CXl, ujll!v ,v=-CXl uv u V
CD
S2(h) =
I
u=-CXl
be,
u u
and where
n
n
I
I A (h)-I's-u-l'-u+t'
=
st
u
s=l t=l
n
n
b
I Ast(h)-I's-u-l't-v'
I
=
uv
s=l t=l
a
n
b
= I B (h)-I'
.
s=l s
s-u
u
For the proof'of (4.4.8), it is enough to check that, for any
h = O(n- 1/ 5 ) and any positive integer k,
E(lrn-1h-1/2S1A(h) 12k ) ~ C ,
4
E(lrn-1h-1/2S1B(h)12k) ~ C ,
4
2k
E(lrn-1h-1/2S2(h) 1 } ~ C .
4
Since r_n^{-1} h^{-1/2} = O(n^{9/10}), it is equivalent to check that, for any h = O(n^{-1/5}), the expectations of S_1A(h)^{2k}, S_1B(h)^{2k}, and S_2(h)^{2k} are all of order O(n^{-9k/5}).
Using the extended Whittle's inequality (3.6.1) and
(3.6.2) to check the absolute convergence of
I.
00
00
00
!
lau
u=-oo
Ib
uv
I.
and!
Ib I first. then we have
u=-CXI
u
co
2
E(SIA(h)2k) ~ ( !
a )k.
u
U=-Q)
co
co
2
E(SIB(h)2k) ~ ( !
!
b
)k.
uv
U=-Q) v=-CXI
co
E(S2(h)2k) ~ ( !
b 2 )k.
u=-CXI u
which imply that we need to check. for h = O(n
co
00
-1/5
00
).
a
2
u
co
!
!
b 2. and !
b 2 are all of the order 0(n- 9 / 5 ).
uv
u=-CXI v=-CXI
U=-Q) u
In the proof of (4.4.4.3) through (4.4.4.4). we had
00
la I
u
u=-Q)
= 0(n- 1h- 1 ).
co
00
co
!
a 2 = 0(n-3 h- 2 ) = 0(n- 9 / 5 ).
U=-Q) U
co
co
1b uv I = 0(1).!
!
b
U=-Q) V=-Q) UV
2
-2. -1
= O(n
~
) = O(n
-9/5
e'
).
The proof of (4.4.8) is complete.
For the proof of (4.4.9). since r -lh- 1/ 2 = 0(n9 / 10 ). it is enough
n
to check-that. there is an
~
> 0.
such that. for any h. hI
.
€
Hn . with
~
h ~ hI' the expectatIons of (SIA(h) - SIA(h 1 »
. (SIB(h) - SIB(h 1 »
9k15 1
Ick
and (S2(h) - S2(h l » 2k are all of the order O(n (h-h )1h).
1
Using the absolutely convergent series
(Xl
(Xl
! la (h)l.
U=-Q)
u
(Xl
Ibuv (h)
I.
and
Ibu (h)1 as given in the proofs of
(4.4.4.2) through (4.4.4.4). then we have the absolutely convergent
series . ! lau (h) - a u (hl)l.
!
! Ibuv (h) - b UV (hl)l. and
U=-(Xl
U=-Q) V=-Q)
~
.
Q)
L
U=-Q)
Ib (h) - b (h
u
1
u
)1.
Using the extended Whittle's inequality (3.6.1)
and (3.6.2), we have
E«SIA(h) - SIA(h 1»
~
Q)
)
~
(L
(au(h) - a u (h 1 »
U=-Q)
Q)
2k
) ,
CD
E«SlB(h) - SlB(h 1»2k) ~ (L
L
(buv(h) - buv (h 1 »2)k,
U=-Q) V=-Q)
Q)
E«S2(h) - S2(h »2k) ~ (L
(bu(h) - b (h »2)k.
1
u 1
U=-Q)
Q)
These inequalities imply that we need to check L (au(h) - a u (h »2,
1
U=-Q)
Q)
Q)
L
(buv(h) - buv (h 1»2, L
(bu(h) - bu (h 1»2 are all of the
U=-Q) V=-Q)
U=-Q)
It:.
order O(n-9/51 (h-h 1)1h».
For SIA(h), using Ast(h) = O(n-~-l),
Q)
L
U=-Q)
n
CD
(L ~s u)2 = L
5=1
-
n
n
L
L
U=-Q) 5=1 5'=1
~s-u~s'-u
n
L
t=l
~
-u+
n
n
t = 0(1), and
= L L
5=1 5'=1
~(s-s')
= O(n).
then we have
Q)
(au(h) - a u (h 1»2
n
n
= L (L L (Ast(h) U=-Q) s=l t=l
L
U=-Q)
Q)
-2. -1
~
~
c(n
- n
-2.
~1
CD
-1 2
)
Ast(hl»~5_u~_u+t)2
L
0
u=-CD
~
en
-4
-1
(h
- hi
-1 2
) on
~ cn-3hl-21(h-hl)1h12 = 0(n-13/51(h-hl)1h12).
For SIB(h), we have
Q)
CD
(b
=
CD
Q)
L
L
u=-CD v=-CD
uv
(b
UV
(h) - buv (h 1»2
(h)2 - 2b (h)b (hi) + b (h)2).
UV
uv
uv
From the proof of (4.4.4.2), 'we have
CD
CD
L
L
U=-CD V=-CD
b
UV
(h)2 = O(n-~-I) and
Following the same steps as in the
proof of (4.4.4.2). we have
co
co
!
!
u=-c:o v=-c:o
=
n-~1-1(
buv(h)buv(hl)
!
Ipl<n
~(p»2rw2JK*(K-L)(t)eK*(K-L)(th/h1)
J
dt +
o(n-~1-1)
= O(n-~I-I).
Thus. we have
co
co
!
!
(buv (h) - buv (h 1»2
u=-c:o v=-c:o
-~ -1
-~ -II
c In ~
- n ~l
~
~ cn-~I-II(h-hl)1b1
-11 (h-h1)1b 1
cn-~
~1
2
~
For S2(h). using Bs(h)
= O(n- 1h2 )
= O(n-9/5 1(h-h 1)1b I ).
and
n
(!
!
u=-c:o
~
5=1
2
s-u ) = O(n).
then we have
co
u=-c:o
(bu (h) - bu (h 1»2
co
n
=!
(!
u=-c:o
s (hl»~ s-u )2
(B (h) - B
s
s=1
~
c(n- I h2 - n- 1h 2 )2;
(~~s_u)2
I
u=-c:o 5=1
-2 2
2 2
~ cn
(h - hI ) en
~ cn- 1h 4 «h2 _ h 2 )1b2 )2
1
-1
4
2
2 = O(n-9/5 (h-h )1b).
12
~ cn h «h+hl)lb) «h-h )1b)
1
1
1
So. we 'take e as 2.
The proof of (4.4.9)
is
complete.
For the proof of (4.4.10). it is equivalent to show that
sup
h€H
Ir -l h-I/2n (h)I
I
n.
=0
(n P).
p
n
2
-3 -2
In the proof of (4.4.8). we have E(SIA(h) ) = O(n h }.
2
E(SIB(h) ) = O(n
-~
~
-1
2
). E(S2(h) ) = O(n
-I 4
h). from which. we have
SlA(h), SlB(h), S2(h) are h
~
L 6/5,
hI' taking r
sup
I(h-hI)/h I~n-r
~
~
sup
I(h-h 1 )/h I~n
_
r
-1
.0p(I).
Thus, for any h. hI
€
Hn , with h
then we have
IDI(h) - D1 (h 1 )
I
(IS 1A (h)-SlA(h 1 )1 + IS 1B (h)-SlB(h 1)I
I (1)
sup
h-11 (h-h 1 )/h·o
I(h-h )/h I~n-r
p
= n-1/5-r
n·o
1
(1)
p
= 0 p (n-1 ),
which implies that
sup ID (h)
1
h€Hn
I
~
sup ID 1(h I )1 +
h €H'
1 n
sup
I(h-h I)/h \ ~n-r
1
sup \D I {h 1)1 + op(n- ).
h €H'
1 n
Since r -l h -1/2 = 0{n9/10 ) and 0 < p < 1/10. the second term of the
~
n
right hand side in the above inequality is 0 (r h
p
n
I/2 P
n ).
Hence, it is
sufficient to restrict the supremum in the statement of (4.4.10) to a
r 1
set H~ •. which is a subset of Hn so that #(H~) ~ n + and s6 that for
r
any h € Hn , there is an hI € H~ with l(h-h 1 )/h\ ~ n- .
Using Bonferroni's inequality and (4.4.S), we have, for any
any positive integer k, as n
~ m,
by taking k sufficiently large to make r+1-2kp
< O.
The proof of (4.4.10) is complete.
For the proof of (4.4.11), it is equivalent to check
sUP_3/10+
Ih-h11~n
p
Ir n-1 h-1/2 (Dl(h)
- Dl(h l »
I
= 0p(l).
~
> 0,
Using the result of (4.4.10). through adding and subtracting D1 (h'),
then we have
Thus we only need to check that
sup
-3/10+
Ih '-hll~n
p
For a
< lim
n
hI - n
where t
i
l/5
h
l
<b
-3/lO+p
- t - = n
i l
Ir
-1 -1/2
h
n
(D 1(h') - Dl(h l »
and for any r
= to
<
-(1I5)-r
sup
-3/10+
Ih'-hll~n
p
Ir
<
tl
tl
> 1110.
< ... <
for each i.
-1 -1/2
n
h
I = 0p(l).
suppose that
t p = hI + n
-3/l0+p
.
To check
(Dl(h') - Dl(~l»
I
e'
= opel).
it suffices to check that
sup
r n -1 h -1/21 D1 (t i ) - Dl(t j )
(ti·tj)€F
where f is the set of all pairs (ti.t j ) with 0
and 0
n
~
i. j
2(r+p-l/lO)
~
p.
I
= opel).
< I t i - t j I ~ n-3/l0+p
The number of elements in F is of order
.
Given any
~
>0
and any positive integer k. using Bonferroni's
inequality and (4.4.9). where l(ti-tj)/t
we have
i
l
~ n- 1/l O+ P . as n ~
m.
then
~ c
E«~-lr -lhI/2ID'(ti} - D'(t.}1}2k}
sup
n
(t .. t . }€F
~
J
J
1
cn2(r+p-l/10) n (-1/10+p}ck
~
0.
by taking k sufficiently large to make 2(r+p-l/10) + (-1/10+p}ck
< O.
The proof of (4.4.11) is complete.
For the proof of (4.4.12). using (4.4.10). we have
7/1 0+ P) -- D' (h )
d' (h)
d' (h)
d' (h )
0 p(n M = A M - M M = AM·
Thus.
dA'(hM}= dA'(hM} - dA'(hA}
= (D' (hM}+d M' (hM»
-
»
(D' (hA}+d M' (hA
= dM'(hM) - dM'(hA) + 0p(n-7/10 ).
So. we have
0p{n-
7/1
0+ P}
= dM'{hM}
where h* lies inbetween h
A
- dM'{h A)
and h .
M
= {hM -
hA}dM"{h*}.
Combining this result with
{4.4.15}. we have
hM - h A = open
The proof of (4.4.12) is complete.
-3/10+p
).
For the proofs of the second parts of {4.4.8} through {4.4.12}. we
shall, use the decomposition of D1(h)
in the proof of {4.4.5.2}.
= T1A (h)
+ T1B (h) + T (h) as given
2
The orders of the coefficients of T1A (h).
T1B (h). and T2 {h) are the same as those of SIA{h). SIB{h). and S2(h)
respectively.
So the proofs of the second parts of (4.4.8), (4.4.10), and (4.4.11) are exactly the same as the proofs of their first parts.
Following the proof of {4.4.5.2}, the proof of the second part of
(4.4.9) is exactly the same as the proof of its first part.
For the proof of {4.4.12}. we start from the first derivative of
7/10
CV {h) as given in (4.4.21). Using Remainder ' (h) = 0p{n) as
2
2
given in (4.4.13), then
CVe'(h) = d~e'(h) + (D + o)'(h) +
0p(n-
7/10
).
From (4.4.10) and (4.4.11), we have
0p(n.-
7/1O P
+ )
= (D +
o)'(h~e) = (D
= (D +
o)'(h~e)
=
o)'(h~e) - CVe'(hcv(e»
+
d~e'(hCV(e»
-
D'(h~e)-D'(hcv(e»
- (D + O)'(hcv(e»
O'(h~e)-O'(hcv(e»
+
-
+ 0p(n-
d~e'(hcv(e»
7/10
)
+ 0p(n-
7/10
)
S ,
-7110
= -d Me (hCV(e» + open
)
S
hA
)dS "(h*)
( -7/10)
= (hMe - cv(e) Me
+ 0p n
,
A
*
S
where h
A
lies inbetween hMe and hCV(e)'
given in (3.4.9), we have
A
S
Using hCV(e)/hMe
~
1 a.s. as
* = C2en
-.S -215
(l+o u (I».
S..
d Me (h )
Thus, we have
Op ( n-7/10+p)
= (hSMe
-
hCV(e) )c.2en -2/5(1 +ou (1»
'
which implies
h
S
Me -
hcv(e) = 0 p (n- 3/1 0+ p ).
The proof of the second part of (4.4.12) is complete.
Proof of (4.4.13):
Using the same decomposition as given in the proof of (3.4.2),
then Remainderi(h) can be decomposed as Remaindere(h) = A + B + C,
where
A= n
-1 n
!
j=1
(b ej - bj)(b ej + bj)W(x j ),
-1 n
!
B = 2n
C= n
-1 n
!
j=1
j=1
(ve.be . - vjb.)W(x.),
J J
J
J
(v e . - vj)(v ej + vJ.)W(x .).
J
J
In the proof of (3.4.2), we have that, for term A,
A = O(en- 1h3 + en-~).
Thus. the first derivative of A can be expressed as
(818h)(A)
for e
«
n
l/2
and h
= O(en- l h 2
= O(n- 1/5 ).
= 0 u (n- 7/10 )
2
+ en- )
For term B. through subtracting and
adding the term vjb ej • we have
B
-1 n
!
= 2n
j=1
[(Yeo - vj)b e · + vJ.(b eJ· - bJ.)]W(x J.).
J
J
where
3/5 -1 -1 2
3/5 -1
v.)b . = 0 (e
n h oh) = 0 (e
n h).
J eJ
u
u
-2/5
-1
-~-1
vj(b ej - b j ) = 0u«nh)
)oO(en h + en ~ ).
(v
eJ. -
Thus. we have the first derivative
(818h)[(v ej - vj)b ej + vj(b ej - b j )]
= 0 (e 3/5n- 1 + en-7/5h-215 + en-12/5h-12/5)
u
= 0u (n-7/10 )
for
e«
n
l/2
and h
This implies that the first derivative of B is 0 (np
using the following quantities: v ej - v
j
-1/2
v ej + v j = 0p«nh)
). then we have
= O(n- 1/5 ).
7/10 ).
For term C.
= 0u(e 3/5n-1 h-1 ).
(818h)«v ej - vj)(v ej + v j »
= (818h)(o (e3/5n-lh-lo(nh)-1/2»
u
= 0u (e 3/5n- 1 )
= 0u (n-7/10 )
for e
«
n
1/2
This implies that the first derivative of C is 0 (np
The proof of (4.4.13) is complete.
and h
7/10 ).
= O(n- 1/5 ).
CHAPTER V
AN APPLICATION OF THE PARTITIONED CROSS-VALIDATION

5.1 Introduction
Recall from Chapters 3 and 4 that the optimal bandwidth, h_M, and the data-driven bandwidth, ĥ_CV(ℓ), are the minimizers of the mean average square error (MASE, or d_M(h)) as given in (1.12) and of the modified cross-validation score CV_ℓ(h) as given in (3.1.1), respectively. The ordinary cross-validation criterion CV(h) as given in (1.14) could not provide asymptotically optimal bandwidths because of the dependence between the "leave-1-out" version of m̂_j(x_j) as given in (1.13) and Y_j. In Chapters 3 and 4, we used the "leave-(2ℓ+1)-out" version of m̂_j(x_j) as given in (3.1.2) to reduce the effect of dependence on bandwidth estimation. However, (4.2.3) and (4.2.4) showed that ĥ_CV(ℓ) suffers from slow relative rates of convergence to the optimal bandwidth h_M.
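The leave-(2ℓ+1)-out idea behind CV_ℓ(h) can be sketched as follows (a minimal illustration in Python, assuming a Gaussian kernel and the weight function W ≡ 1; it is not the code used for the simulations in this dissertation): for each design point, the 2ℓ+1 observations closest in index are dropped from the kernel estimate before the prediction error is computed.

```python
import numpy as np

def modified_cv_score(x, y, h, ell):
    """Leave-(2*ell+1)-out cross-validation score CV_ell(h); ell = 0
    gives ordinary cross-validation.  x, y are numpy arrays."""
    n = len(x)
    score = 0.0
    for j in range(n):
        keep = np.abs(np.arange(n) - j) > ell            # drop indices i with |i-j| <= ell
        w = np.exp(-0.5 * ((x[j] - x[keep]) / h) ** 2)   # Gaussian kernel weights (assumed)
        m_hat_j = np.sum(w * y[keep]) / np.sum(w)        # leave-(2*ell+1)-out estimate at x_j
        score += (y[j] - m_hat_j) ** 2
    return score / n

# usage sketch: choose h as the minimizer of the score over a grid
# grid = np.linspace(0.02, 0.3, 30)
# h_cv_ell = grid[np.argmin([modified_cv_score(x, y, h, ell=2) for h in grid])]
```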
To improve the relative rate of convergence of the cross-validated bandwidth, Marron (1988) proposed the partitioned cross-validation (PCV) for kernel density estimation. In the PCV, the observations are split into g subgroups by taking every g-th observation. This means that the observations of each subgroup are distanced by g/n. Through this property, we could use the PCV to estimate bandwidth when the observations are dependent. This is because the dependence effect on the ordinary cross-validation score of each subgroup of observations is reduced as the value of g is increased. So, the PCV and the CV_ℓ(h) criterion have a similar ability to alleviate the dependence effect on bandwidth estimation.
Section 5.2 gives some results for the ordinary cross-validated bandwidth of each subgroup of observations. The asymptotic normality of the partitioned cross-validated bandwidth is given in Section 5.3. A discussion of these results is given in Section 5.4. Finally, the proofs are given in Section 5.5.
5.2 Partitioned Cross-validation

The purpose of this section is to derive the asymptotic properties of the ordinary cross-validated bandwidth of each subgroup of observations. The number of observations in each subgroup is η = n/g. Here, for notational convenience, n is assumed to be a multiple of g. To derive the asymptotic properties, using the equally spaced fixed design nonparametric regression model (1.1), the assumptions (A.1) through (A.6) as given in Section 3.2, and the assumption (A.10) as given in Section 4.2, we must also impose the following assumptions:

(A.11) For the PCV, the number of subgroups is g. The number of observations in each subgroup is η = n/g, with η → ∞.

(A.12) For the PCV, the bandwidth h is chosen from the interval H_{n,g} = [aη^{-1/5}, bη^{-1/5}] for arbitrarily small a and large b, and for g = 1, 2, ....
Before we show the asymptotic properties for the ordinary cross-validated bandwidth of each subgroup of observations, we shall first introduce notation. The notation defined in Chapters 3 and 4 with a subscript ℓ is simplified by ignoring the subscript ℓ, because in this chapter we only consider the ordinary cross-validated bandwidth, i.e. ℓ = 0 in the modified cross-validation criterion. For example, d^S_M(h) and h^S_M now represent d^S_{M0}(h) and h^S_{M0} respectively. Secondly, a symbol with a subscript k means that the value is taken on the k-th subgroup of observations, for each k = 1, 2, ..., g. For example, CV_k(h) denotes the ordinary cross-validation score CV(h) as given in (1.14) for the k-th subgroup of observations. Thirdly, the notation with a superscript * denotes the average of the values of the notation over all g subgroups of observations. For example, CV*(h) is defined by

CV*(h) = g^{-1} Σ_{k=1}^{g} CV_k(h).

Finally, ĥ denotes the minimizer of the function expressed by its subscript or superscript. For example, ĥ_CVk and ĥ*_CV are the minimizers of CV_k(h) and CV*(h) respectively.
Now we start to derive the MASE of each subgroup of observations. Given the assumptions above, the definition of d_M(h), and the results of (4.2.5) and (4.4.22), we have, as n → ∞,

(5.2.1)  d_{M1}(h) = d_{M2}(h) = ⋯ = d_{Mg}(h) = a_{1g} η^{-1} h^{-1} + b_1 h⁴ + b_2 h⁶ + o(η^{-1}h^{-1} + h⁶),

which have these minimizers respectively,

(5.2.2)  h*_M = h_{M1} = h_{M2} = ⋯ = h_{Mg} = C_{0g} η^{-1/5} + B_{0g} η^{-3/5} + o(η^{-3/5}),

where

a_{1g} = (Σ_{k=-∞}^{∞} γ(gk)) ∫K² ∫W,

B_{0g} = (1/20)[(Σ_{k=-∞}^{∞} γ(gk) ∫K² ∫W)^{3/5} (∫u⁴K)(∫(m^{(3)})²W + ∫m''m^{(3)}W')] / [(∫u²K)^{11} (∫(m'')²W)^{8}]^{1/5},

and where the subscript g denotes that the notation applies to the PCV with g subgroups of observations. Here and throughout this chapter, the notation ∫ denotes ∫ · du. The term a_{1g} η^{-1} h^{-1} is an asymptotic representation of the variance of the MASE. Looking at the coefficients a_{1g} and a_1 as given in (3.3.1), the PCV would not reflect the dependence structure of the whole data set.
Now we start to derive the asymptotic properties of the ordinary cross-validated bandwidth. Given the assumptions above, the definition of d^S_M(h) as given in (3.4.5), and the results of (4.2.5) and (4.4.22), we have, as n → ∞,

(5.2.3)  d^{S*}_M(h) = d^S_{M1}(h) = ⋯ = d^S_{Mg}(h) = a^S_{1g} η^{-1} h^{-1} + b_1 h⁴ + b_2 h⁶ + o(η^{-1}h^{-1} + h⁶),

which have these minimizers respectively,

(5.2.4)  h^{S*}_M = h^S_{M1} = h^S_{M2} = ⋯ = h^S_{Mg} = C^S_{0g} η^{-1/5} + B^S_{0g} η^{-3/5} + o(η^{-3/5}),

where

a^S_{1g} = [Σ_{k=-∞}^{∞} γ(gk) ∫K² − 4 Σ_{k>0} γ(gk) K(0)] ∫W,

and where the superscript S represents subtraction. Here, a^S_{1g} must be positive; otherwise d^{S*}_M(h), d^S_{M1}(h), ..., d^S_{Mg}(h) have no minimizers on H_{n,g}.
Using (3.4.7), the ordinary cross-validation score of the k-th subgroup of observations, CV_k(h), has an asymptotic expression

(5.2.5)  CV_k(h) = η^{-1} Σ_{j in the k-th subgroup} ε_j² W(x_j) + d^S_{Mk}(h) + o_u(η^{-1}h^{-1} + h⁴),

for each k = 1, 2, ..., g. The notation X_n = o_u(v_n) denotes that X_n/v_n converges to zero almost surely, and uniformly on H_{n,g} if v_n involves h. Using (3.4.9), we have, as η → ∞ and C^S_{0g} > 0,

(5.2.6)  ĥ_CVk / h^S_{Mk} → 1   a.s., for every k.

Taking the average of CV_k(h), we have

(5.2.7)  CV*(h) = n^{-1} Σ_{j=1}^{n} ε_j² W(x_j) + d^{S*}_M(h) + o_u(η^{-1}h^{-1} + h⁴).

Then, by (3.4.7) and (3.4.9), we have, as η → ∞ and C^S_{0g} > 0,

(5.2.8)  ĥ*_CV / h^{S*}_M → 1   a.s.

If C^S_{0g} is negative or zero, then CV*(h) has no minimizer on H_{n,g} asymptotically.
Thus, having looked at these asymptotic relationships of the ordinary cross-validated bandwidths, we are now ready to define the partitioned cross-validated bandwidth. Since the optimal bandwidth h_M is of the order n^{-1/5} as given in (4.2.5), and ĥ*_CV is of the order η^{-1/5} = n^{-1/5} g^{1/5} as given in (5.2.8), the partitioned cross-validated bandwidth is defined as the rescaled ĥ*_CV:

(5.2.9)  ĥ_PCV(g) = ĥ*_CV · g^{-1/5}.
Then, (5.2.4), (5.2.8), and (5.2.9) imply that

(5.2.10)  ĥ_PCV(g) / h_M = [C^S_{0g}/C_0] + [B^S_{0g}/C_0] n^{-2/5} g^{2/5} + o_u(n^{-2/5} g^{2/5}).

When g = 1, the partitioned cross-validated bandwidth is equal to the ordinary cross-validated bandwidth and to the modified cross-validated bandwidth with ℓ = 0, i.e. ĥ_PCV(1) = ĥ_CV = ĥ_CV(0). However, as g → ∞ with g << n^{1/2}, we have

C^S_{0g}/C_0 → [γ(0) / Σ_{k=-∞}^{∞} γ(k)]^{1/5} ≠ 1.

This means that the partitioned cross-validation criterion could not provide an asymptotically optimal bandwidth estimate for the measure MASE, no matter how large the value of g is.
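The PCV procedure just described can be summarized in a short sketch (illustrative Python, assuming a Gaussian kernel and an ordinary leave-one-out CV score; the grid of candidate bandwidths is arbitrary): every g-th observation forms a subgroup, the ordinary cross-validation scores of the g subgroups are averaged, and the minimizer of the average is rescaled by g^{-1/5} as in (5.2.9).

```python
import numpy as np

def ordinary_cv_score(x, y, h):
    """Ordinary (leave-one-out) cross-validation score for a kernel estimator."""
    n = len(x)
    score = 0.0
    for j in range(n):
        keep = np.arange(n) != j
        w = np.exp(-0.5 * ((x[j] - x[keep]) / h) ** 2)   # Gaussian kernel (assumed)
        score += (y[j] - np.sum(w * y[keep]) / np.sum(w)) ** 2
    return score / n

def partitioned_cv_bandwidth(x, y, g, grid):
    """h_PCV(g): minimize the average of the subgroup CV scores, then rescale."""
    cv_star = []
    for h in grid:
        scores = [ordinary_cv_score(x[k::g], y[k::g], h) for k in range(g)]
        cv_star.append(np.mean(scores))                  # CV*(h) = g^{-1} sum_k CV_k(h)
    h_cv_star = grid[int(np.argmin(cv_star))]            # minimizer of CV*(h)
    return h_cv_star * g ** (-0.2)                       # rescale as in (5.2.9)
```

Taking g = 1 in this sketch reduces it to the ordinary cross-validated bandwidth, in line with the remark above.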
5.3 Asymptotic Normality

The object of the PCV is to improve the rate of convergence of the cross-validated bandwidth, whereas the modified cross-validation puts the emphasis on the reduction of the asymptotic bias of bandwidth estimates. In Section 5.2 we showed that the PCV could not provide asymptotically optimal bandwidths when the observations are dependent. It is still worth calculating the limiting distribution of ĥ_PCV(g)/h_M. Based on the assumptions given in Section 5.2, it is shown in Section 5.5 that, as n → ∞, C^S_{0g} > 0, and g = o(n^{1/2}),

(5.3.1)  n^{1/10} g^{2/5} [ĥ_PCV(g)/h_M − C^S_{0g}/C_0] ⇒ N(0, v_g (Σ_{k=-∞}^{∞} γ(k))^{-2/5} Var_M),

where

v_g = (Σ_{i=-∞}^{∞} Σ_{j=-∞}^{∞} γ(j) γ(j−ig)) (Σ_{k=-∞}^{∞} γ(gk) ∫K² − 4 Σ_{k>0} γ(gk) K(0))^{-7/5},

for L(u) = −uK'(u) and * meaning convolution. Based on (5.3.1), d_M(ĥ_PCV(g))/d_M(h_M) has a noncentral χ²_1 distribution.
5.4 Discussion

First of all, the results obtained in this chapter are still true when the regression errors are a linear process as defined in Section 4.4. Pertaining to the PCV and the CV_ℓ(h), by (4.2.3) and (5.3.1) we know that ĥ_PCV(g) has a faster relative rate of convergence than ĥ_CV(ℓ). However, as derived in (5.3.1), ĥ_PCV(g)/h_M − 1 suffers from an asymptotic bias C^S_{0g}/C_0 − 1, which converges to (γ(0) / Σ_{k=-∞}^{∞} γ(k))^{1/5} − 1 at a polynomial rate as g → ∞. This asymptotic bias is caused by the distance g/n among the observations of each subgroup. This means that the PCV could not reflect the dependence structure of the observations.
An immediate remedy for this asymptotic bias is to split the observations into g subgroups by taking every g-th cluster, where each cluster is composed of r consecutive observations. Thus, the PCV would be able to reflect the dependence structure of the observations. Since the autocovariance function of ARMA regression errors is geometrically bounded (the property (d) of Section 3.2), it is enough to take the value of r as c(log n) for some constant c. A drawback to this approach is that it requires too many observations.
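A minimal sketch of this cluster-based grouping (illustrative only; the cluster length r is the single tuning constant): the observations are cut into consecutive blocks of length r, and every g-th block is assigned to the same subgroup, so each subgroup keeps runs of neighboring, and hence dependent, observations.

```python
import numpy as np

def cluster_subgroup_indices(n, g, r):
    """Indices of the g subgroups when every g-th cluster of r consecutive
    observations is taken, instead of every g-th single observation.
    Assumes there are at least g clusters, i.e. n >= g * r."""
    blocks = [np.arange(b * r, min((b + 1) * r, n))
              for b in range(int(np.ceil(n / r)))]
    return [np.concatenate(blocks[k::g]) for k in range(g)]

# example: n = 120 observations, g = 3 subgroups, clusters of r = 4 points
# groups = cluster_subgroup_indices(120, 3, 4)
# groups[0][:8] -> array([ 0,  1,  2,  3, 12, 13, 14, 15])
```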
Looking at (5.3.1), as n → ∞, C^S_{0g} > 0, and g << n^{1/2}, the asymptotic mean square error (AMSE) of ĥ_PCV(g)/h_M − 1 is

(5.4.1)  AMSE(ĥ_PCV(g)/h_M − 1) = (C^S_{0g}/C_0 − 1 + B^S_{0g}/C_0 · n^{-2/5} g^{2/5})² + g^{-4/5} n^{-1/5} v_g (Σ_{k=-∞}^{∞} γ(k))^{-2/5} Var_M.
Since the autocovariance function of an ARMA process is geometrically bounded, the coefficients in (5.4.1) have the following polynomial rates of convergence, as g → ∞ and g = O(log n):

C^S_{0g}/C_0 − 1 → A,   B^S_{0g}/C_0 → B,   v_g (Σ_{k=-∞}^{∞} γ(k))^{-2/5} Var_M → V,

where

A = [γ(0) / Σ_{k=-∞}^{∞} γ(k)]^{1/5} − 1,

B = [γ(0)³ / Σ_{k=-∞}^{∞} γ(k)]^{1/5} · b,

V = (Σ_{k=-∞}^{∞} γ(k)²) [(Σ_{k=-∞}^{∞} γ(k))² γ(0)⁷]^{-1/5} · v,

and where

b = (1/20)[(∫K² ∫W)^{2/5} (∫u⁴K)(∫(m^{(3)})²W + ∫m''m^{(3)}W')] / [(∫u²K)⁹ (∫(m'')²W)⁷]^{1/5},

v = Var_M / (∫K²)^{7/5}.
n). this AMSE can be expressed as
'"
(5.4.2) AMSE(hpcv(g)/hM - 1)
= [A
+ Bn
-2/5 2/5 2
-115 -415
g
] + Vn
g
.
where B and V are positive quantities.
AMSE decreases as g increases.
If A is negative. then this
This is because. asymptotically. A is
the dominant term in the asymptotic bias.
If A is positive. then this
AMSE may be minimized over g giving the minimizer
Note that while this result is of theoretical interest, its practical applicability is somewhat hampered by the fact that it depends on the unknowns γ(0), Σ_{k=-∞}^{∞} γ(k), ∫(m^{(3)})²W, and ∫(m'')²W. For this choice of g_0, the asymptotic expression of (5.4.2) implies that

(5.4.4)

which provides an improvement over (4.2.3). Here the notation ~ means that the random variable on the left side has an asymptotic normal distribution, as n → ∞, when normalized by the sequence on the right side. In Section 6.3, we do a simulation study to show the performance of ĥ_PCV(g) when the value of g_0 is decided by the plug-in approach.
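As a numerical illustration of this choice (a minimal sketch; the plug-in values of A, B, and V below are hypothetical and are not the estimates used in Section 6.3), the AMSE in (5.4.2) can simply be evaluated over the admissible values of g and minimized:

```python
import numpy as np

def amse_pcv(g, n, A, B, V):
    """Asymptotic MSE of h_PCV(g)/h_M - 1 as in (5.4.2)."""
    bias = A + B * n ** (-0.4) * g ** 0.4
    return bias ** 2 + V * n ** (-0.2) * g ** (-0.8)

def choose_g(n, A, B, V):
    gs = np.arange(1, int(np.sqrt(n)))   # respect the restriction g << n^{1/2}
    return gs[int(np.argmin([amse_pcv(g, n, A, B, V) for g in gs]))]

# with hypothetical plug-in values A = 0.1, B = 0.5, V = 2.0 and n = 400:
# g0 = choose_g(400, 0.1, 0.5, 2.0)
```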
We shall use two examples of MA(1) and AR(1) processes of regression errors, which have been given in Sections 3.5 and 4.3, to discuss how the asymptotic bias-square and the asymptotic variance depend on the value of g. For the MA(1) process of regression errors, we have, through a straightforward calculation,

C^S_{0g}/C_0 = [1 − (4K(0)/∫K²) θ(1+θ)^{-2}]^{1/5}    if g = 1,
             = [(1+θ²)(1+θ)^{-2}]^{1/5}               if g ≥ 2,

B^S_{0g}/C_0 = (1+θ)^{4/5} [1 − (4K(0)/∫K²) θ(1+θ)^{-2}]^{3/5} σ^{4/5} b    if g = 1,
             = (1+θ²)^{3/5} (1+θ)^{-2/5} σ^{4/5} b                          if g ≥ 2,

v_g = (1 + 4θ + 6θ² + 4θ³ + θ⁴) σ^{6/5} ((1+θ)² ∫K² − 4θ K(0))^{-7/5}    if g = 1,
    = (1 + 6θ² + θ⁴) σ^{6/5} ((1+θ²) ∫K²)^{-7/5}                          if g = 2,
    = (1 + 4θ² + θ⁴) σ^{6/5} ((1+θ²) ∫K²)^{-7/5}                          if g ≥ 3,

where θ and σ² are the MA(1) parameters, and θ ≠ −1. For any sensible choice of kernel function K, the maximum value of K is assumed at 0, so we have

∫K² < K(0) ∫K = K(0) < 2K(0).

Looking at the above coefficients C^S_{0g}/C_0, B^S_{0g}/C_0, and v_g, the effect of the dependent observations on the asymptotic bias-square and the asymptotic variance becomes constant when g ≥ 3. This is because the observations of each subgroup are then independent. However, the minimum value of the AMSE could possibly be at a large value of g. This is because the asymptotic variance decreases with a rate g^{-4/5}, and the asymptotic bias-square increases or decreases with a rate g^{2/5}. For example, if θ > 0 and C^S_{0g} > 0, then

0 < C^S_{01}/C_0 < C^S_{0g}/C_0 < 1   for all g ≥ 2,
0 < B^S_{01}/C_0 < B^S_{0g}/C_0       for all g ≥ 2.

In this case, the minimum value of the AMSE might be at any g ≥ 1 for the different combinations of the values of the factors n, K, b, v, θ, and σ². If θ < 0 and C^S_{0g} > 0, then

1 < C^S_{0g}/C_0 < C^S_{01}/C_0   for all g ≥ 2,
0 < B^S_{0g}/C_0 < B^S_{01}/C_0   for all g ≥ 2.

In this case, there are some values of K, n, and Var_M for which the minimum value of the AMSE might be at g = 1. However, these are rare cases. Generally, in the MA(1) case, we prefer to take the value of g as large as possible, no matter what the values of θ and σ² are. Of course, the value of g is restricted to g << n^{1/2}.
For the AR(1) process of regression errors, we have, through a straightforward calculation,

C^S_{0g}/C_0 = [(1−φ)(1+φ)^{-1} r_{1g}]^{1/5},
B^S_{0g}/C_0 = [σ⁴ (1−φ)^{-1} (1+φ)^{-3}]^{1/5} r_{1g}^{3/5} b,
v_g = [σ⁶ (1−φ²)^{-3} (∫K²)^{-7}]^{1/5} r_{1g}^{-7/5} r_{2g},

where

r_{1g} = 1 − [(4K(0)/∫K²) − 2] φ^g (1−φ^g)^{-1},
r_{2g} = (1+φ²)(1−φ²)^{-1} (1+φ^g)(1−φ^g)^{-1} + 2g φ^g (1−φ^g)^{-2},

and where the meaning of the notation b has been given above. If φ > 0 and C^S_{0g} > 0, then r_{1g} increases to 1 and r_{2g} decreases (when g ≥ −(log φ)^{-1}) as g increases. This implies that both the asymptotic bias-square and the asymptotic variance decrease. In this case, the minimum value of the AMSE is at the large value of g, with g << n^{1/2}. If φ < 0 and C^S_{0g} > 0, then the cases of even and odd g are discussed separately. When g increases as an even number, the results are the same as in the case of φ > 0. This is because φ^g = (φ²)^k for the even number g and some k.
When g increases as an odd number, using (5.4.3),
then the minimum value of the AMSE is at
5.5 Proofs

Proof of (5.3.1):

We shall give the outline of the proof of (5.3.1) first. The details are given later. Let t_n = n^{-1}h^{-1} + h⁴. The meaning of the notation with a superscript * or a subscript k has been described in Section 5.2. Based on the assumptions as given in Section 5.2, as n → ∞, C^S_{0g} > 0, and g << n^{1/2}, it is shown later that

(5.5.1)  S*_3(h) = o_p(g^{-1/2} η^{-9/10}),
(5.5.2)  g^{1/2} η^{9/10} (−S*_1 − S*_2 + T*_1 + T*_2)(βη^{-1/5}) ⇒ N(0, P_ST(g, β)),

for any β ∈ [a,b], where

P_ST(g, β) = 2β^{-1} (Σ_{p=-∞}^{∞} Σ_{t=-∞}^{∞} γ(t) γ(t−pg)) ∫W² ∫(K*(K−L) − (K−L))².

The implications of (5.5.1) and (5.5.2) are, as n → ∞, for any β ∈ [a,b],

(5.5.3)  g^{1/2} η^{7/10} (D*+δ*)'(βη^{-1/5}) ⇒ N(0, 4β^{-2} P_ST(g, β)),
(5.5.4)  g^{1/2} η^{7/10} (D*+δ*)'(h^{S*}_M) ⇒ N(0, 4(C^S_{0g})^{-2} P_ST(g, C^S_{0g})),

where h^{S*}_M = C^S_{0g} η^{-1/5}(1+o(1)).
asymptotic properties (5.5.5) through (5.5.10) which are shown later:
For each positive integer k. there is a constant C . so that
4
2k
(5.5.5)
sup E(lg1/2~ -l h 1/2(D*+6*)'(h} 1 ) ~ C .
4
h€H
n
.
n.g
furthermore. there is an
(5.5.6)
>0
and a constant C . so that
5
E(lg1/2tn-1h1/2[(D*+6*}'(h) - (D*+6*)'(h )]1 2k )
1
~
~ C51(h-hl)/hI~k.
whenever h. hi
(5.5.7)
n.g wi th h ~ hi' For any p € (0.1110). we have
sup Ig 1/2 t -l h 1/2(D*+6*)'(h)I = 0 (nP ).
€ H
~
p
n
n.g
1/5
furthermore. if n
h 1 tends to a constant. then we have
-144-
(5.5.8).
(5.5.9)
Finally. we have
. *. (h) = 0 (g-1/2~-7/10 ).
Remalnder
u
(5.5.10)
A
Now we derive a limiting distribution for hpcv(g)/hM'
According
to (4.4.21). the average of all ordinary cross-validation scores CV* (h)
can be expressed as
*
-1 n
CV (h) = n
~
(5.5.11)
2-_
~j-W(Xj)
j=l
S*
*
+ d M (h) + D (h) +
6* (h) + Remainder* (h).
Taking
~
A*
= hCV in the first derivative of (5.5.11). we have
(5.5.12) 0
= CV*. (hA*CV )
= (d
S*
M
* *
*
A*
+D +6 +Remainder )'(h )
CV
A*
S* S*
*
* *
A*
* A*
= (hCV - hM
)d "(h) + (D +6 ). (h ) + Remainder . (h )'
M
CV
CV
*
A*
S*
A*
-S -1/5
where h lies inbetween hCV and hi. Using hCV = ~Og~
(l+ou (l» as
given in (5.2.4). and (5.2.8). we have
S*" *
dM (h)
-~ -215
= C2g~
(l+ou(l».
where
-S
C2g
d? -3
= 2aS1g (Og)
d?
+ 12b 1 ( Og)
= 5[(k=~ClO ..,.(gk)J~
2
- 4 ;0
k
,.(gk)K(0»2(JW)2(Ju~)6(J(m")2w)3]1I5.
A*
S*
-1/2 -3/10+p
Using (5.5.9). for any p € (0.1/10). hCV = hM + 0p(g
~
). and
.
S*
-S -1/5
using (5.5.8) (wlth hI = hM = ~Og~
(1+0(1»). we have
(D*+6*)'(h~) = (D*+6*)'(h~*) + Op(g-1/2~-7/10).
Combining this result with (5.5.10). then (5.5.12) becomes
(5.5.13)
0 = (h~ - ~*)~g~-2I5(1+0u(l»
+
(D*+6*)'(~*) +
o (g-1I2T/-7/10).
p
Next. combining (5.5.13) with (D*+6*)'(hS*)
M
= 0p (g-1/2T/-7/10)
as given
in (5.5.4). then we have
h*
CV
A
(5.5.14)
_ hSM* _
0 ( -1/2 -3/10)
- p g
~
.
Mu 1 tip 1ying (5 .5. 13) by g 1/2~7/10 and comb·In i ng t h e result with
(5.5.14). we have
0 = g1/2~3/10(h~ - ~*)~g + g1/2~7/10(D*+O*)'(h~*) + 0p(l)
(5.5.15)
2/5 1/10
.-S_S
~
[hpcv(g)/hM - c-Og/CO]COSg +
A
= g
gl/2~7/10(D*+O*)'(h~*) + 0p(l).
A
A*
-1/5
-1/5
.
where hpcv(g) = hcvg
as gIven in (5.2.9). hM = COn
(1+0(1»
.
S*
~C;
-1/5
.
given In (3.3.6). and h M = C-Og~
(1+0(1» as given In (5.2.4).
as
Then. combining (5.5.15) and (5.5.4). the proof of (5.3.1) is complete.
The notation and the properties used to prove (5.5.1) through
(5.5.10) are the same as (a). (b). (c). and (d) given in Section 4.4.
Proof of (5.5.1):
Using the expression of Ast(h) as given in property (d) of Section
4.4. for each k = 1.2....• g. we have
-rr-1
S3k (h)
= s=o
!
Asg+k.sg+k (h)~ sg+k'
where
Asg+k .sg+k(h) =
~-~-I(I+0{1»rK(K-L)-W(x
J
sg+k)'
Thus
*
S3(h) = g
-1 g
! S3k(h) = g
k=1
-1 -2. -1
~ ~
(1+0(1»
.
YK(K-L)!n
j=1
W(x.)~ ..
J
J
*
Through a straightforward calculation. the second moment of S3(h)
is
Y
*
2 ) = g-2~-4h-2 (1+0(1»{ K(K-L» 2 -E(!
n
E(S3(h)
2
. 1
J=
W(x.)~.)
J
J
= O(g-2~-4h-2.n) = O(g-1~-3h-2).
n
for E( !
j=l
~j)
2 .
= O(n) as gIven in (3.6.3). g «n
boundedness of W.
1/2
• and the
*
This implies S3(h)
= op(g-1/2~-9/10 ).
The proof of
(5.5.1) is complete.
Proof of (5.5.2):
Using the expression of B (h) as given in property (d) of Section
s
4.4. for each k
= 1.2 •....
S2k (h)
g. we have
=
-ry-l
L Bsg+k (h)c sg+k'
s=O
where
B
sg+k(h) = _T}-l h2(l+0(1»( J'ru~)(m"W)(xsg+k)'
Thus
(5.5.16)
*
S2(h) = g
-1 g
L S2k(h)
k=1
~_
= -g-1 T}-1 h2 (1+0(1»( } uJ<:)
n
L (m"W)(x.)c ..
j=1
J J
Using the same reason as S2(h).
*
we have
(5.5.17)
r ~_
= -g-1 T}-1 h2 (1+0(1»(JUJ<:)
*
T (h)
2
n
L
j=1
Using the boundedness of m"W and
n
L
c
j=1
j
=0
(n
(m"W)(xj)c j .
-1/2
) as given (3.6.3).
p
we have
*
*
(5.5.18) -S2(h)+T2 (h)
= o(l)oOp(g-1 T}-I h 2n 1/2 ) = op(g-I T}-9/10 ).
. the expression of SI(h).
*
For
using the Ast(h) as given in property
(d) of Section 4.4. we have
-ry-l
Slk(h) =
-ry-l
L
L Asg+k.sg+k (h)(c sg+kc tg+k -
s=O.S;l!t.t=O
where
Thus
(5.5.19)
*
SI(h)
= g -1
g
L
Slk(h)
k=1
~(sg-tg»
.
= g
-1 -2 -1
~ ~
g 1/1
1/1
k
(1+0(1»!!
! K*(K_L)(S~tg)W(s~+)_
k=ls=O,s;l!t,t=O
( c sg+kCtg+k n
~(sg-tg»
n
= g-1~-~-1(1+0(1»!!
! K*(K-L)(~h)W(~)(C C - ~(pg».
n
n
s t
p~ s=1,s-t=pg,t=1
*
Using the same reason as S1(h),
we have
*
T (h)
(5.5.20)
1
= g
-1 -2-1
~ ~
n
(1+0(1»!
p~ i
n
!
!
(K-L) (~)W(~)= 1 . i - j =pg , j = 1
(cic j - oy(pg».
Using (3.6.3), we have
* * * *
(-S1-S2+T1+T2) (h)
n
=
n
g-1~-~-1!!
!
p#O s=l,s-t=pg,t=1
(-KM(K-L)+(K-L»(~)W(~)-
= U + 0p (-1/2
-9/10) ,
g
~
where
U = g
-1 -2 -1
~ ~
E&
s
! ! (-K*(K-L)+(K-L»(nh)W(-)(c C
- oy(pg».
p#O s
n
s s-pg
Thus, using the same arguments as in the proof of (4.4.5), then the
asymptotic normality of (5.5.2) is true.
Then, the proof of (5.5.2) is
complete by checking that its asymptotic variance is correct.
Using the linear expressions of ε_s, ε_{s−pg}, and γ(pg), i.e.

    ε_s = Σ_{i=0}^{∞} ψ_i e_{s−i},    ε_{s−pg} = Σ_{j=0}^{∞} ψ_j e_{s−pg−j},    γ(pg) = σ² Σ_{i=0}^{∞} ψ_i ψ_{i−pg},

and

    ε_s ε_{s−pg} − γ(pg) = Σ_{i=0}^{∞} ψ_i ψ_{i−pg} (e²_{s−i} − σ²) + Σ_i Σ_{j≠i−pg} ψ_i ψ_j e_{s−i} e_{s−pg−j},

then U can be decomposed into U = U_A + U_B, where

    U_A = Σ_u a_u f_u,    U_B = Σ_{u≠v} b_{uv} e_u e_v,

    a_u  = g^{-1} η^{-2} h^{-1} Σ_{p≠0} Σ_s (−K*(K−L) + (K−L))(pg/(nh)) W(s/n) ψ_{s−u} ψ_{s−pg−u},

    b_{uv} = g^{-1} η^{-2} h^{-1} Σ_{p≠0} Σ_s (−K*(K−L) + (K−L))(pg/(nh)) W(s/n) ψ_{s−u} ψ_{s−pg−v},

for u = s−i, v = s−pg−j, and f_u = e_u² − σ².  Thus we have Var(U) = E(U_A²) + E(U_B²).

To calculate E(U_A²), using the boundedness of W and (−K*(K−L) + (K−L)),
Σ_{p≠0} ψ_{s−pg−u} = O(1) for any u, O(n) terms in Σ_s, Σ_s Σ_{s′} γ(s−s′) = O(n), and
Σ_u ψ_{s−u} ψ_{s′−u} = σ^{-2} γ(s−s′), then we have

    E(U_A²) ≤ c Σ_u a_u²
            = c Σ_u [ g^{-1} η^{-2} h^{-1} Σ_{p≠0} Σ_s (−K*(K−L) + (K−L))(pg/(nh)) W(s/n) ψ_{s−u} ψ_{s−pg−u} ]²
            ≤ c g^{-2} η^{-4} h^{-2} Σ_u [ Σ_s ψ_{s−u} ]²  =  c g^{-2} η^{-4} h^{-2} Σ_u Σ_s Σ_{s′} ψ_{s−u} ψ_{s′−u}
            ≤ C g^{-2} η^{-4} h^{-2} · n  =  O(g^{-1} η^{-3} h^{-2})  =  o(g^{-1} η^{-9/5}).
To calculate E(U_B²), using b_{uv} = b_{vu} for any u and v,
σ² Σ_u ψ_{s−u} ψ_{s′−u} = γ(s−s′), and σ² Σ_v ψ_{s−pg−v} ψ_{s′−p′g−v} = γ(s−s′−pg+p′g),
then we have

    E(U_B²) = E( Σ_{u≠v} b_{uv} e_u e_v )²  =  2σ⁴ Σ_{u≠v} b_{uv}²
            = 2σ⁴ Σ_{u≠v} [ g^{-1} η^{-2} h^{-1} Σ_{p≠0} Σ_s (−K*(K−L) + (K−L))(pg/(nh)) W(s/n) ψ_{s−u} ψ_{s−pg−v} ]²
            = 2σ⁴ g^{-2} η^{-4} h^{-2} Σ_{u≠v} Σ_{p≠0} Σ_s Σ_{p′≠0} Σ_{s′} (−K*(K−L) + (K−L))(pg/(nh)) W(s/n) ·
                  (−K*(K−L) + (K−L))(p′g/(nh)) W(s′/n) ψ_{s−u} ψ_{s−pg−v} ψ_{s′−u} ψ_{s′−p′g−v}
            = 2 g^{-2} η^{-4} h^{-2} Σ_{p≠0} Σ_s Σ_{p′≠0} Σ_{s′} (−K*(K−L) + (K−L))(pg/(nh)) W(s/n) ·
                  (−K*(K−L) + (K−L))(p′g/(nh)) W(s′/n) γ(s−s′) γ(s−s′−pg+p′g)
            = 2 g^{-2} η^{-4} h^{-2} Σ_{p≠0} Σ_s Σ_i Σ_j (−K*(K−L) + (K−L))(pg/(nh)) W(s/n) ·
                  (−K*(K−L) + (K−L))((p−i)g/(nh)) W((s−j)/n) γ(j) γ(j−ig),
by letting i = p−p′ and j = s−s′,
            = 2 g^{-2} η^{-4} h^{-2} Σ_{p≠0} Σ_s Σ_i Σ_j (−K*(K−L) + (K−L))²(pg/(nh)) W²(s/n) γ(j) γ(j−ig) + O(g^{-2} η^{-3} h^{-2}).

Since (−K*(K−L) + (K−L))((p−i)g/(nh)) = (−K*(K−L) + (K−L))(pg/(nh)) + O(ig/(nh)),
W((s−j)/n) = W(s/n) + O(j/n), Σ_{j=-∞}^{∞} |j γ(j)| = O(1), Σ_{i=-∞}^{∞} |ig γ(j−ig)| = O(|j|),
the boundedness of W and (−K*(K−L) + (K−L)), O(ηh) terms in Σ_{p≠0}, and O(n) terms
in Σ_s, then we have

    E(U_B²) = 2 g^{-1} η^{-2} h^{-1} ∫(K*(K−L) − (K−L))² ∫W² [ Σ_{i=-∞}^{∞} Σ_{j=-∞}^{∞} γ(j) γ(j−ig) ] + o(g^{-1} η^{-2} h^{-1}).

The proof of (5.5.2) is complete.
For the proofs of (5.5.5) through (5.5.10), using the expressions of S*_1(h), S*_2(h),
T*_1(h), and T*_2(h) as given in (5.5.16) through (5.5.20), and following the same
arguments given in the proofs of (4.4.8) through (4.4.13), through a straightforward
calculation, the proofs of (5.5.5) through (5.5.10) are complete.
CHAPTER VI

A SIMULATION STUDY
6.1  Introduction

To investigate the practical implications of the asymptotic results for bandwidth
estimates presented in Chapters 3, 4, and 5, an empirical study is carried out.  In
this section, we describe the simulated regression setting.  The regression model is
taken as the equally spaced fixed design nonparametric regression model (1.1).  The
underlying regression function is taken to be m(x) = x³(1−x)³ for 0 ≤ x ≤ 1.  This
function has the nice effect of allowing a circular design to eliminate boundary
effects, and it has a uniformly continuous fourth derivative, a condition needed by
(4.2.5).  The regression errors are taken to be a first order autoregressive process,
AR(1), i.e.

    ε₁ ~ N(0, σ²/(1−φ²))   and   ε_i = φ ε_{i−1} + e_i,   for i = 2, 3, ..., n,

where the e_i are independent and identically distributed (IID) N(0, σ²), and φ and
σ² are the AR(1) parameters.  The notation A ~ B means that A and B have the same
distribution.  The observations are generated by adding the regression values and
the regression errors together, i.e. Y_i = m(x_i) + ε_i, for i = 1, 2, ..., n.  Five
combinations of φ and σ (φ = 0, σ = 0.0177; 0.6, 0.0071; 0.6, 0.0018; −0.6, 0.0283;
−0.6, 0.0029) and the sample size n = 200 are investigated.  Here the values of σ
make h_M correspond roughly to 1/5, 1/4, or 1/2.  The kernel function is
K(x) = (15/8)(1−4x²)² I_{[−1/2,1/2]}(x).  The weight function is
W(x) = (5/3) I_{[1/5,4/5]}(x).  The Nadaraya-Watson estimator (1.2) is applied to get
the kernel estimates.  The same functions K and m were also used in Rice (1984) and
Haerdle, Hall, and Marron (1988).  For each combination of φ and σ, 1000 independent
sets of data are generated.  All the normal random variables are generated by the
function RNDNS in GAUSS, where the seed is taken as 123.
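The data-generating step and the kernel estimator described above can be summarized
by the following sketch.  Python is used here purely for illustration; the original
computations were carried out in GAUSS with its RNDNS generator, which this sketch
does not reproduce, and the function names are hypothetical.

```python
import numpy as np

def simulate_data(n=200, phi=0.6, sigma=0.0071, seed=123):
    """One data set from the equally spaced fixed design model
    Y_i = m(x_i) + eps_i with AR(1) errors, as described above."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1) / n                              # equally spaced design points
    m = x**3 * (1 - x)**3                                    # regression function m(x) = x^3 (1-x)^3
    eps = np.empty(n)
    eps[0] = rng.normal(0.0, sigma / np.sqrt(1 - phi**2))    # stationary starting value
    e = rng.normal(0.0, sigma, size=n)                       # IID N(0, sigma^2) innovations
    for i in range(1, n):
        eps[i] = phi * eps[i - 1] + e[i]                     # AR(1) recursion
    return x, m + eps

def K(u):
    """Kernel K(u) = (15/8)(1 - 4u^2)^2 on [-1/2, 1/2]."""
    return np.where(np.abs(u) <= 0.5, (15 / 8) * (1 - 4 * u**2)**2, 0.0)

def nadaraya_watson(x, y, h, grid):
    """Nadaraya-Watson estimate of m evaluated at the points in `grid`."""
    grid = np.asarray(grid)
    w = K((grid[:, None] - x[None, :]) / h)                  # kernel weights
    return (w @ y) / w.sum(axis=1)
```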
For the modified cross-validation criterion CV_ℓ(h) as given in (3.1.1), the values
of ℓ are 0, 1, 2, ..., 14.  Recall that 2ℓ+1 observations are deleted in the
construction of the cross-validated estimators.  For the partitioned cross-validation
criterion PCV as given in (5.2.10), the values of g are 1, 2, ..., 15.  Recall that
the observations are split into g subgroups by taking every g-th observation.  The
score function d_M(h) as given in (1.12) is approximated by the average of the 1000
values of d_A(h) as given in (1.11).  The approximate expected value of the
cross-validated bandwidth ĥ_CV(ℓ) is approximated by the minimizer of the average of
the 1000 values of CV_ℓ(h) for each ℓ.  The approximate expected value of the
partitioned cross-validated bandwidth ĥ_PCV(g) is approximated by the product of
g^{-1/5} and the minimizer of the average of the 1000 values of CV*(h) for each g,
where CV*(h) was defined in Section 5.2.  The values of the score functions d_A(h),
CV_ℓ(h), and CV*(h) are calculated on an equally spaced logarithmic grid of 11 values.
The endpoints of the grid are different for the different settings, and are chosen to
contain essentially all the bandwidths of interest.  The minimizers ĥ_A, ĥ_M,
ĥ_CV(ℓ), and ĥ*_CV of the score functions d_A(h), d_M(h), CV_ℓ(h), and CV*(h),
respectively, are calculated.  The partitioned cross-validation bandwidth ĥ_PCV(g)
was defined as g^{-1/5} ĥ*_CV.  After evaluation on the grid, a one step interpolation
improvement is done, with the results taken as the selected bandwidths.  If a score
function has multiple minimizers on the grid, the algorithm chooses the smaller of
them (this choice is made arbitrarily).  A sketch of these selection criteria is
given below.
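The following sketch illustrates the two criteria and the grid search just described.
It is only a sketch: it assumes that the modified criterion deletes the 2ℓ+1
observations centred at the point being predicted, that CV*(h) averages ordinary
cross-validation scores over the g subgroups, and it omits the one step interpolation
improvement.  It reuses the kernel K from the previous sketch, and the function names
are hypothetical.

```python
import numpy as np

W = lambda t: (5 / 3) * (1 / 5 <= t <= 4 / 5)                # weight function W(x)

def cv_modified(x, y, h, ell):
    """Modified cross-validation score CV_ell(h); ell = 0 gives ordinary CV."""
    n, score = len(x), 0.0
    for i in range(n):
        keep = np.abs(np.arange(n) - i) > ell                # delete observations i-ell, ..., i+ell
        w = K((x[i] - x[keep]) / h)
        if w.sum() > 0:
            m_hat = np.sum(w * y[keep]) / w.sum()            # leave-(2*ell+1)-out estimate at x_i
            score += (y[i] - m_hat) ** 2 * W(x[i])
    return score / n

def cv_star(x, y, h, g):
    """CV*(h): ordinary CV applied to each of the g subgroups
    (every g-th observation), then averaged."""
    return np.mean([cv_modified(x[k::g], y[k::g], h, 0) for k in range(g)])

def select_bandwidth(score, h_grid):
    """Minimizer over an ascending logarithmic grid; ties go to the smaller value."""
    return h_grid[np.argmin([score(h) for h in h_grid])]

# e.g. the partitioned cross-validated bandwidth for a given g:
# h_pcv = g ** (-1 / 5) * select_bandwidth(lambda h: cv_star(x, y, h, g), h_grid)
```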
The approximate expected values of ĥ_M, ĥ_CV(ℓ), and ĥ_PCV(g) could also be
calculated by the formulas C₀ n^{-1/5} + B₀ n^{-3/5} as given in (4.2.5),
C_{0ℓ} n^{-1/5} + B_{0ℓ} n^{-3/5} as given in (4.2.6), and
C̄^S_{0g} n^{-1/5} g^{1/5} + B̄^S_{1g} n^{-3/5} g^{3/5} as given in (5.2.4), respectively.
The coefficients B and C are derived from the following quantities:

    ∫u²K = 1/28,    ∫W² = 5/3,    ∫u⁴K ≈ 0.002976,    ∫(m″)²W ≈ 0.05612,
    ∫(K*(K−L) − (K−L))² ≈ 0.7629,

where the notation a ≈ b denotes that b approximates a with accuracy to the last
decimal of b.
To show the structure of the population of bandwidth ratios, we use a kernel density
estimate with the same kernel function K(x) = (15/8)(1−4x²)² I_{[−1/2,1/2]}(x) as for
the nonparametric regression, where ĥ denotes ĥ_A, ĥ_CV(ℓ), or ĥ_PCV(g).  The
bandwidth h of the density estimate is taken as 1000^{-1/5}.  The density estimates
are then calculated by (1/1000) Σ_{i=1}^{1000} K_h(x − X_i), where X_i denotes the
value of ĥ/ĥ_M for the i-th data set and K_h(·) denotes h^{-1} K(·/h).  Section 6.2
gives our simulation results.  Section 6.3 gives the estimated values of the optimal
number of subgroups as given in (5.4.3).
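A minimal sketch of the density estimate of the bandwidth ratios just described,
again reusing K from the first sketch; the helper name and the grid of evaluation
points are assumptions.

```python
import numpy as np

def ratio_density(ratios, grid, h=1000 ** (-1 / 5)):
    """Kernel density estimate (1/1000) * sum_i K_h(x - X_i) of the 1000
    bandwidth ratios X_i = h_hat / h_M, with bandwidth h = 1000^(-1/5)."""
    u = (np.asarray(grid)[:, None] - np.asarray(ratios)[None, :]) / h
    return K(u).mean(axis=1) / h
```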
6.2  Simulation Results

In this section, we give the sample mean square errors (MSE) and the kernel density
estimates for the population of bandwidth ratios for each combination of φ and σ².
The sample variances, the sample bias-squares, and the MSE of the bandwidth estimates
ĥ/ĥ_M are summarized in Tables 6.1 through 6.5, where ĥ denotes ĥ_A, ĥ_CV(ℓ), and
ĥ_PCV(g).  The bias-square of the bandwidth estimates is derived as the square of the
average of the 1000 values of ĥ/ĥ_M − 1.  The MSE is the sum of the variance and the
bias-square, as illustrated in the sketch below.
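The decomposition reported in the tables can be computed, for each criterion and each
value of ℓ or g, from the 1000 simulated ratios as in this sketch (the helper name is
hypothetical):

```python
import numpy as np

def mse_decomposition(ratios):
    """Sample variance, bias-square, and MSE of the ratios h_hat / h_M."""
    r = np.asarray(ratios)
    variance = r.var()                        # sample variance of the ratios
    bias_sq = (r.mean() - 1.0) ** 2           # square of the average of (h_hat/h_M - 1)
    return variance, bias_sq, variance + bias_sq
```

For example, the first row of Table 6.1 decomposes in exactly this way:
0.067707 = 0.066793 + 0.000915.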
Table 6.1 presents the results for the independent observations, where σ² = 0.0177²
and ĥ_M roughly equals 1/2.  Table 6.2 contains the results for the positively
correlated observations with a large amount of sample variability, where φ = 0.6,
σ² = 0.0071², and ĥ_M roughly equals 1/2.  Table 6.3 gives the results for the
positively correlated observations with a small amount of sample variability, where
φ = 0.6, σ² = 0.0018², and ĥ_M roughly equals 1/4.  Table 6.4 contains the results
for the negatively correlated observations with a large amount of sample variability,
where φ = −0.6, σ² = 0.0283², and ĥ_M roughly equals 1/2.  Finally, Table 6.5 gives
the results for the negatively correlated observations with a small amount of sample
variability, where φ = −0.6, σ² = 0.0029², and ĥ_M roughly equals 1/5.
Looking at Table 6.1, the case of the independent observations, the bias-squares of
ĥ_CV(ℓ)/ĥ_M and ĥ_PCV(g)/ĥ_M are of the same order of magnitude for all values of ℓ
and g.  As g increases, the variance of ĥ_PCV(g)/ĥ_M decreases.  However, the
variance of ĥ_CV(ℓ)/ĥ_M stays at about the same level as ℓ increases.  In this case,
the PCV is preferred to the CV_ℓ(h).
Table 6.1:  The sample mean square errors (MSE) of the ratios of the modified
cross-validated bandwidths and the partitioned cross-validated bandwidths to the
optimal bandwidth for the independent observations.

Ratios                    Variance    Bias-square    MSE
ĥ_A/ĥ_M                   0.066793    0.000915       0.067707

ĥ_CV(ℓ)/ĥ_M
  ℓ =  0                  0.141572    0.000382       0.141954
  ℓ =  1                  0.143701    0.000023       0.143724
  ℓ =  2                  0.147578    0.000599       0.148177
  ℓ =  3                  0.153700    0.001122       0.154822
  ℓ =  4                  0.155854    0.002106       0.157960
  ℓ =  5                  0.162344    0.004151       0.166495
  ℓ =  6                  0.159727    0.004408       0.164135
  ℓ =  7                  0.155042    0.003674       0.158716
  ℓ =  8                  0.155480    0.003543       0.159023
  ℓ =  9                  0.152513    0.003024       0.155536
  ℓ = 10                  0.147163    0.002397       0.149560
  ℓ = 11                  0.149591    0.003348       0.152939
  ℓ = 12                  0.150184    0.001787       0.151971
  ℓ = 13                  0.143045    0.000368       0.143413
  ℓ = 14                  0.140391    0.000324       0.140714

ĥ_PCV(g)/ĥ_M
  g =  1                  0.142141    0.000472       0.142613
  g =  2                  0.122478    0.000189       0.122667
  g =  3                  0.104857    0.001186       0.106043
  g =  4                  0.090875    0.000703       0.091578
  g =  5                  0.082542    0.000901       0.083443
  g =  6                  0.069547    0.001303       0.070850
  g =  7                  0.065137    0.000890       0.066027
  g =  8                  0.052551    0.001325       0.053876
  g =  9                  0.046748    0.001629       0.048377
  g = 10                  0.044378    0.000631       0.045009
  g = 11                  0.034843    0.001059       0.035902
  g = 12                  0.040050    0.000105       0.040155
  g = 13                  0.032656    0.000096       0.032753
  g = 14                  0.026781    0.000121       0.026902
  g = 15                  0.025813    0.000053       0.025866
Table 6.2:  The sample mean square errors (MSE) of the ratios of the modified
cross-validated bandwidths and the partitioned cross-validated bandwidths to the
optimal bandwidth for the positively correlated observations with a large amount of
sample variability.

Ratios                    Variance    Bias-square    MSE
ĥ_A/ĥ_M                   0.067989    0.000839       0.068828

ĥ_CV(ℓ)/ĥ_M
  ℓ =  0                  0.015217    0.340901       0.356118
  ℓ =  1                  0.094015    0.157793       0.251808
  ℓ =  2                  0.129529    0.066091       0.195620
  ℓ =  3                  0.142950    0.035485       0.178436
  ℓ =  4                  0.144861    0.022526       0.167387
  ℓ =  5                  0.150511    0.016509       0.167020
  ℓ =  6                  0.152457    0.013700       0.166157
  ℓ =  7                  0.153662    0.011340       0.165001
  ℓ =  8                  0.152041    0.010281       0.162322
  ℓ =  9                  0.154379    0.009622       0.164000
  ℓ = 10                  0.148788    0.008326       0.157113
  ℓ = 11                  0.148136    0.008082       0.156217
  ℓ = 12                  0.145901    0.007954       0.153855
  ℓ = 13                  0.142961    0.005565       0.148526
  ℓ = 14                  0.143305    0.004422       0.147727

ĥ_PCV(g)/ĥ_M
  g =  1                  0.014969    0.342923       0.357892
  g =  2                  0.051444    0.285233       0.336677
  g =  3                  0.079858    0.186568       0.266425
  g =  4                  0.082228    0.127764       0.209992
  g =  5                  0.075802    0.091330       0.167132
  g =  6                  0.072712    0.071990       0.144702
  g =  7                  0.065812    0.059694       0.125507
  g =  8                  0.063662    0.055051       0.118712
  g =  9                  0.059589    0.049922       0.109511
  g = 10                  0.056709    0.050466       0.107175
  g = 11                  0.054767    0.049093       0.103862
  g = 12                  0.054734    0.046613       0.101347
  g = 13                  0.052097    0.046844       0.098941
  g = 14                  0.046694    0.048023       0.094718
  g = 15                  0.045878    0.048859       0.094738
Table 6.3:  The sample mean square errors (MSE) of the ratios of the modified
cross-validated bandwidths and the partitioned cross-validated bandwidths to the
optimal bandwidth for the positively correlated observations with a small amount of
sample variability.

Ratios                    Variance    Bias-square    MSE
ĥ_A/ĥ_M                   0.057907    0.000035       0.057942

ĥ_CV(ℓ)/ĥ_M
  ℓ =  0                  0.001158    0.424437       0.425595
  ℓ =  1                  0.041539    0.279507       0.321046
  ℓ =  2                  0.074666    0.144123       0.218790
  ℓ =  3                  0.084220    0.081520       0.165740
  ℓ =  4                  0.086808    0.052977       0.139785
  ℓ =  5                  0.084461    0.039172       0.123633
  ℓ =  6                  0.079699    0.031801       0.111500
  ℓ =  7                  0.073712    0.027187       0.100900
  ℓ =  8                  0.069808    0.025106       0.094913
  ℓ =  9                  0.061553    0.018699       0.080252
  ℓ = 10                  0.057085    0.015261       0.072346
  ℓ = 11                  0.053950    0.012969       0.066919
  ℓ = 12                  0.049806    0.011853       0.061659
  ℓ = 13                  0.045128    0.009669       0.054797
  ℓ = 14                  0.041218    0.007396       0.048614

ĥ_PCV(g)/ĥ_M
  g =  1                  0.001068    0.418798       0.419865
  g =  2                  0.022676    0.378389       0.401064
  g =  3                  0.044861    0.241548       0.286410
  g =  4                  0.043792    0.152653       0.196446
  g =  5                  0.037051    0.107216       0.144268
  g =  6                  0.030271    0.085374       0.115645
  g =  7                  0.028279    0.077010       0.105289
  g =  8                  0.025144    0.070429       0.095573
  g =  9                  0.026751    0.068851       0.095602
  g = 10                  0.022914    0.066056       0.088970
  g = 11                  0.022240    0.062975       0.085215
  g = 12                  0.022638    0.066291       0.088929
  g = 13                  0.023159    0.068660       0.091819
  g = 14                  0.023357    0.066338       0.089695
  g = 15                  0.028926    0.074854       0.103780
Table 6.4:  The sample mean square errors (MSE) of the ratios of the modified
cross-validated bandwidths and the partitioned cross-validated bandwidths to the
optimal bandwidth for the negatively correlated observations with a large amount of
sample variability.

Ratios                    Variance    Bias-square    MSE
ĥ_A/ĥ_M                   0.066622    0.000923       0.067545

ĥ_CV(ℓ)/ĥ_M
  ℓ =  0                  0.053738    0.480186       0.533925
  ℓ =  1                  0.008799    0.369738       0.378538
  ℓ =  2                  0.115370    0.167648       0.283019
  ℓ =  3                  0.141201    0.132474       0.273675
  ℓ =  4                  0.181069    0.023440       0.204509
  ℓ =  5                  0.202484    0.026484       0.228968
  ℓ =  6                  0.201588    0.000285       0.201873
  ℓ =  7                  0.206957    0.008039       0.214996
  ℓ =  8                  0.203060    0.000264       0.203324
  ℓ =  9                  0.201329    0.002119       0.203449
  ℓ = 10                  0.196773    0.000693       0.197466
  ℓ = 11                  0.190999    0.001972       0.192973
  ℓ = 12                  0.196088    0.000592       0.196680
  ℓ = 13                  0.194194    0.000100       0.194294
  ℓ = 14                  0.194244    0.000125       0.194368

ĥ_PCV(g)/ĥ_M
  g =  1                  0.054103    0.499512       0.553615
  g =  2                  0.196045    0.147679       0.343724
  g =  3                  0.038372    0.185286       0.223658
  g =  4                  0.211217    0.004263       0.215481
  g =  5                  0.064431    0.054817       0.119249
  g =  6                  0.130354    0.002965       0.133319
  g =  7                  0.076233    0.017172       0.093404
  g =  8                  0.089382    0.004054       0.093436
  g =  9                  0.061838    0.007784       0.069622
  g = 10                  0.068736    0.002063       0.070798
  g = 11                  0.042000    0.005920       0.047920
  g = 12                  0.049226    0.001418       0.050644
  g = 13                  0.042437    0.001181       0.043618
  g = 14                  0.034366    0.001194       0.035560
  g = 15                  0.037513    0.000116       0.037630
Table 6.5:  The sample mean square errors (MSE) of the ratios of the modified
cross-validated bandwidths and the partitioned cross-validated bandwidths to the
optimal bandwidth for the negatively correlated observations with a small amount of
sample variability.

Ratios                    Variance    Bias-square    MSE
ĥ_A/ĥ_M                   0.041937    0.000077       0.042014

ĥ_CV(ℓ)/ĥ_M
  ℓ =  0                  0.008722    0.333118       0.341840
  ℓ =  1                  0.002672    0.246337       0.249008
  ℓ =  2                  0.027080    0.103456       0.130537
  ℓ =  3                  0.068115    0.082290       0.150405
  ℓ =  4                  0.060550    0.016934       0.077483
  ℓ =  5                  0.080834    0.006755       0.087589
  ℓ =  6                  0.066284    0.004151       0.070435
  ℓ =  7                  0.067686    0.000001       0.067687
  ℓ =  8                  0.056633    0.005088       0.061721
  ℓ =  9                  0.048331    0.004142       0.052473
  ℓ = 10                  0.042212    0.011381       0.053593
  ℓ = 11                  0.035176    0.011860       0.047036
  ℓ = 12                  0.033291    0.021996       0.055287
  ℓ = 13                  0.027471    0.028349       0.055820
  ℓ = 14                  0.026269    0.039337       0.065606

ĥ_PCV(g)/ĥ_M
  g =  1                  0.008956    0.366789       0.375746
  g =  2                  0.077615    0.173606       0.251221
  g =  3                  0.027660    0.253039       0.280670
  g =  4                  0.159741    0.001283       0.161025
  g =  5                  0.070814    0.148202       0.219017
  g =  6                  0.109601    0.045872       0.155472
  g =  7                  0.079275    0.107663       0.186938
  g =  8                  0.097323    0.067464       0.164787
  g =  9                  0.082095    0.102081       0.184175
  g = 10                  0.081028    0.089276       0.170304
  g = 11                  0.067858    0.099434       0.167292
  g = 12                  0.072630    0.094587       0.167218
  g = 13                  0.078190    0.094887       0.173077
  g = 14                  0.053247    0.088721       0.141969
  g = 15                  0.065906    0.093108       0.159014
Next we discuss the contents of the tables for the dependent observations.  When the
observations are positively correlated or negatively correlated, the ordinary
cross-validation always produces bandwidth estimates that are too small or too large,
respectively, as mentioned in Chiu (1989a) and Hart (1987).  In our simulated
regression setting, the CV_ℓ(h) and the PCV still suffer from the dependence effect
for the small values of ℓ and g, e.g. ℓ ≤ 2 and g ≤ 3.  Thus, the bandwidth estimates
would be close to the right or the left bound of the bandwidth selection interval for
the small values of ℓ and g.  This implies that the bandwidth estimates would have
small variances when ℓ and g are small.
In Tables 6.2 and 6.3, we give the results for the combinations of the AR(1)
parameters φ = 0.6, and σ² = 0.0071² and σ² = 0.0018².  In these two cases, the
bias-square of ĥ_CV(ℓ)/ĥ_M decreases to 0 as ℓ increases.  However, the bias-square
of ĥ_PCV(g)/ĥ_M decreases to a nonzero constant.  This is because the PCV cannot
reflect the actual dependence structure of the observations, and the bias-square of
ĥ_PCV(g) depends on σ² as given in (5.4.2).  In contrast to the bias-square, the
variance of ĥ_CV(ℓ)/ĥ_M stays about the same for all ℓ, and the variance of
ĥ_PCV(g)/ĥ_M decreases monotonely as g increases.  Thus, the MSE of ĥ_CV(ℓ)/ĥ_M
decreases because of the decrease of its bias-square, and the MSE of ĥ_PCV(g)/ĥ_M
decreases because of the decrease of its variance.

In Table 6.2, variance is the dominant term in the MSE.  Thus, using the PCV to
reduce the variance of the bandwidth estimates results in a smaller value of the MSE
than using the CV_ℓ(h) to reduce the bias-square of the bandwidth estimates.  On the
other hand, in Table 6.3, bias-square is the dominant term in the MSE.  In this case,
using the CV_ℓ(h) to reduce the bias-square of the bandwidth estimates is better than
using the PCV to reduce the variance of the bandwidth estimates.
The choice between the CV_ℓ(h) and the PCV should be made on the basis of which
component, variance or bias-square, is the dominant term in the MSE.  In Section 6.3,
we use the plug-in approach to estimate the unknown quantities for the regression
setting.  This method could be used to estimate the variance and the bias-square of
ĥ_CV(ℓ)/ĥ_M and ĥ_PCV(g)/ĥ_M.
In Tables 6.4 and 6.5, we give the results for the combinations of the AR(1)
parameters φ = −0.6, and σ² = 0.0283² and σ² = 0.0029².  The variance and the
bias-square of ĥ_PCV(g)/ĥ_M decrease along even g's and odd g's separately.  This is
because φ^g = φ^{2k}, where g = 2k for the even values of g and some k, as discussed
in Section 5.4.  The conclusions for Tables 6.4 and 6.5 are the same as those for
Tables 6.2 and 6.3.
Kernel density estimates of the ratios ĥ_A/ĥ_M, ĥ_CV(ℓ)/ĥ_M, and ĥ_PCV(g)/ĥ_M are
given in Figures 6.1 through 6.10, corresponding to the combinations of φ and σ² as
given in Tables 6.1 through 6.5.  For the density estimates of ĥ_CV(ℓ)/ĥ_M, the
integers 1, 2, 3, and 4 correspond to the curves for the values of ℓ = 1, 6, 10, and
14 respectively.  For the density estimates of ĥ_PCV(g)/ĥ_M, the integers 1, 2, 3,
and 4 correspond to the curves for the values of g = 3, 4, 9, and 15 respectively.
Here the values of ℓ and g are chosen arbitrarily.  In all cases, the character A
corresponds to the curve for ĥ_A/ĥ_M.  The vertical lines are the ratios of the
approximate expected values of ĥ_A, ĥ_CV(ℓ), and ĥ_PCV(g) to that of ĥ_M, for the
corresponding values of ℓ and g.  Here the approximate expected values of ĥ_A and
ĥ_CV(ℓ) are the minimizers of the average of the 1000 curves d_A(h) and CV_ℓ(h)
respectively.  Also, the approximate expected value of ĥ_PCV(g) is the product of
g^{-1/5} and the minimizer of the average of the 1000 curves CV*(h).
Figure 6.1:  Kernel density estimates of the ratios of the modified cross-validated
bandwidths to the optimal bandwidth for the independent observations.
[Density curves plotted against LOG(ĥ_CV(ℓ)/ĥ_M).]

Figure 6.2:  Kernel density estimates of the ratios of the partitioned cross-validated
bandwidths to the optimal bandwidth for the independent observations.
[Density curves plotted against LOG(ĥ_PCV(g)/ĥ_M).]

Figure 6.3:  Kernel density estimates of the ratios of the modified cross-validated
bandwidths to the optimal bandwidth for the positively correlated observations with a
large amount of sample variability.
[Density curves plotted against LOG(ĥ_CV(ℓ)/ĥ_M).]

Figure 6.4:  Kernel density estimates of the ratios of the partitioned cross-validated
bandwidths to the optimal bandwidth for the positively correlated observations with a
large amount of sample variability.
[Density curves plotted against LOG(ĥ_PCV(g)/ĥ_M).]

Figure 6.5:  Kernel density estimates of the ratios of the modified cross-validated
bandwidths to the optimal bandwidth for the positively correlated observations with a
small amount of sample variability.
[Density curves plotted against LOG(ĥ_CV(ℓ)/ĥ_M).]

Figure 6.6:  Kernel density estimates of the ratios of the partitioned cross-validated
bandwidths to the optimal bandwidth for the positively correlated observations with a
small amount of sample variability.
[Density curves plotted against LOG(ĥ_PCV(g)/ĥ_M).]

Figure 6.7:  Kernel density estimates of the ratios of the modified cross-validated
bandwidths to the optimal bandwidth for the negatively correlated observations with a
large amount of sample variability.
[Density curves plotted against LOG(ĥ_CV(ℓ)/ĥ_M).]

Figure 6.8:  Kernel density estimates of the ratios of the partitioned cross-validated
bandwidths to the optimal bandwidth for the negatively correlated observations with a
large amount of sample variability.
[Density curves plotted against LOG(ĥ_PCV(g)/ĥ_M).]

Figure 6.9:  Kernel density estimates of the ratios of the modified cross-validated
bandwidths to the optimal bandwidth for the negatively correlated observations with a
small amount of sample variability.
[Density curves plotted against LOG(ĥ_CV(ℓ)/ĥ_M).]

Figure 6.10:  Kernel density estimates of the ratios of the partitioned cross-validated
bandwidths to the optimal bandwidth for the negatively correlated observations with a
small amount of sample variability.
[Density curves plotted against LOG(ĥ_PCV(g)/ĥ_M).]
The peak on the left or the right side of a density estimate is caused by the
dependence effect and the bandwidth selection interval.  If there still exists a
strong dependence effect for a given value of ℓ or g, then the bandwidth estimate
would be on the left or the right bound of the bandwidth selection interval.

Observe that the density estimates of ĥ_A/ĥ_M, ĥ_CV(ℓ)/ĥ_M, and ĥ_PCV(g)/ĥ_M show
quite clearly that there is substantial skewness and kurtosis, despite the limiting
normal distributions given in (4.2.1), (4.2.3), and (5.3.1) respectively.  This
indicates that, for each ℓ and g,

a.  The ratios of bandwidths have very slow rates of convergence.

b.  There may be some other limiting distributions which provide a better fit than
    the asymptotic normality.  For example, a limiting exponential distribution or a
    limiting χ² distribution with some parameters are possible.  For the equally
    spaced fixed, circular design, Mallows' criterion, and independent observations,
    Chiu (1989b) established that the bandwidth estimator has a limiting χ²
    distribution.
,
For the dependent observations. the bias of the partitioned crossvalidated bandwidths could be understood by looking at Figures 6.4.
6.6.6.8. and 6.10.
For all values of g. the kernel density estimates
of hpcv(g)Ih M are not close to that of hAA1 M.
The curves marked by the
numbers 3 and 4. corresponding to the large values of g. are very
close.
This means that as g increases. lhe density estimate of
hpcv(g)/h M converges together. but not close to that of hAIh .
M
The results given in Table 6.2 and Figures 6.3 and 6.4 were derived from the same
regression setting.  In the analysis of the results given in Table 6.2, we concluded
that the PCV was better than the CV_ℓ(h).  By looking at the kernel density
estimates, we can also decide which one, the PCV or the CV_ℓ(h), is better.  Looking
at Figures 6.3 and 6.4, the area under the curve for ĥ_A/ĥ_M not covered by
ĥ_CV(ℓ)/ĥ_M is larger than that not covered by ĥ_PCV(g)/ĥ_M.  Thus, in this case, the
PCV is better than the CV_ℓ(h).  On the other hand, Figures 6.5 and 6.6, which were
derived from the same regression setting as Table 6.3, show that the CV_ℓ(h) is
better than the PCV.  This is because the area under the curve for ĥ_A/ĥ_M not
covered by ĥ_CV(ℓ)/ĥ_M is smaller than that not covered by ĥ_PCV(g)/ĥ_M.  This
conclusion is the same as the analysis of the results given in Table 6.3.
6.3  Estimation of the Optimal Number of Subgroups for the PCV

In the analysis of the asymptotic mean square error of ĥ_PCV(g)/ĥ_M as given in
(5.4.1), there exists a theoretically optimal number of subgroups g₀, as given in
(5.1.3), for the negatively correlated observations.  However, g₀ involves some
unknown quantities.  In this section, we use plug-in methods to estimate the
regression function and the autocovariance function γ(·) of the regression errors.

Since the regression function is assumed to be smooth, we may use a polynomial to
approximate the regression function.  Given the parametric model
m(x) = a + bx + cx² + dx³ + ex⁴ and the weight function W(x) = (5/3) I_{[1/5,4/5]}(x),
through a straightforward calculation we have the following quantities:

    ∫(m″)²W   = 4c² + 12cd + 13.44ce + 10.08d² + 24.48de + 15.71328e²,
    ∫(m⁽³⁾)²W = 36d² + 144de + 161.28e²,
    ∫m″m⁽³⁾W  ≈ 0,

where the notation ∫ denotes ∫ dx.  Then the ordinary least squares estimates of the
regression parameters are plugged into the above equations.
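A sketch of this plug-in step, using the two integral formulas just given; the helper
name is hypothetical.

```python
import numpy as np

def plug_in_integrals(x, y):
    """Fit m(x) = a + b x + c x^2 + d x^3 + e x^4 by ordinary least squares and
    plug the estimates into the weighted integral formulas above."""
    X = np.vander(x, 5, increasing=True)                 # columns 1, x, x^2, x^3, x^4
    a, b, c, d, e = np.linalg.lstsq(X, y, rcond=None)[0]
    int_m2_W = (4 * c**2 + 12 * c * d + 13.44 * c * e
                + 10.08 * d**2 + 24.48 * d * e + 15.71328 * e**2)
    int_m3_W = 36 * d**2 + 144 * d * e + 161.28 * e**2
    return int_m2_W, int_m3_W
```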
For the estimation of the autocovariance function γ(·) of the regression errors,
there are two different approaches available.  Solo (1981) showed that the ordinary
least squares estimates of the regression parameters are strongly consistent if the
regression errors are a stationary process with a bounded spectral density.  The
spectral density of an ARMA process as defined in Section 3.2 is bounded if
|φ(z)| ≥ r for some r > 0 on |z| = 1.  Thus the residuals from the ordinary least
squares fit may be used to estimate the autocovariance function γ(·) of the
regression errors.
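For the AR(1) errors used in this chapter, the first approach amounts to the
following sketch: estimate the autocovariances at lags 0 and 1 from the ordinary
least squares residuals and solve the AR(1) moment equations γ(0) = σ²/(1−φ²) and
γ(1) = φγ(0).  The helper name is hypothetical.

```python
import numpy as np

def ar1_from_ols_residuals(x, y):
    """AR(1) parameters estimated from the residuals of the OLS quartic fit."""
    X = np.vander(x, 5, increasing=True)
    r = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]     # OLS residuals
    gamma0 = np.mean(r * r)                              # autocovariance at lag 0
    gamma1 = np.mean(r[1:] * r[:-1])                     # autocovariance at lag 1
    phi_hat = gamma1 / gamma0
    sigma2_hat = gamma0 * (1 - phi_hat**2)
    return phi_hat, sigma2_hat
```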
For the second method, given the AR(p) process of the regression errors, Truong
(1989) showed that the estimator of the AR(p) parameters converges to the true values
weakly with a rate of convergence n^{-1/2}.  Specifically, the semiparametric
regression model given in Truong (1989) is

    Y_i = m(x_i) + ε_i,   for i = 1, 2, ..., n.

Here, the design points x_i equal i/n.  The regression function m is Hoelder
continuous of order 1.  The regression errors ε_i are an AR(p) process, defined as

    ε_i = φ₁ ε_{i−1} + φ₂ ε_{i−2} + ... + φ_p ε_{i−p} + e_i,

where the e_i are IID N(0, σ²) and 1 − φ₁z − φ₂z² − ... − φ_p z^p ≠ 0 on |z| ≤ 1.
Given the uniform kernel function K(x) = (1/2) I_{[−1,1]}(x), the Nadaraya-Watson
estimator (1.2), and a bandwidth h ≍ (n^{-1} log n)^{1/3}, then

    n^{1/2} (φ̂ − φ)  ⇒  N(0, σ² Γ_p^{-1}),

where φ = (φ₁, φ₂, ..., φ_p), Γ_p = [Cov(ε_i, ε_j)] for all i, j = 1, 2, ..., p, and
φ̂ is an estimator of φ obtained by regressing ε̂_j on ε̂_{j−1}, ε̂_{j−2}, ..., ε̂_{j−p}.
Here the ε̂_j are defined as Y_j − m̂(x_j) for all j, where m̂ is the kernel estimate.
The notation a_n ≍ b_n means that a_n/b_n is bounded away from zero and infinity.  A
weakness of this approach is that the e_i are assumed to be N(0, σ²).
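A sketch of how the second approach is used in this section: residuals are formed
from a Nadaraya-Watson fit with the uniform kernel and bandwidth
(3/10)(n^{-1} log n)^{1/3}, and the AR(p) coefficients are obtained by a least squares
regression of ε̂_j on its p lagged values.  The helper name is hypothetical, and the
details of the fit are an assumption about Truong's procedure rather than a
restatement of it.

```python
import numpy as np

def ar_p_from_kernel_residuals(x, y, p=1):
    """Estimate the AR(p) coefficients from kernel-fit residuals."""
    n = len(x)
    h = 0.3 * (np.log(n) / n) ** (1 / 3)                 # bandwidth (3/10)(n^{-1} log n)^{1/3}
    w = (np.abs((x[:, None] - x[None, :]) / h) <= 1).astype(float)  # uniform kernel weights
    eps_hat = y - (w @ y) / w.sum(axis=1)                # residuals Y_j - m_hat(x_j)
    Z = np.column_stack([eps_hat[p - k - 1:n - k - 1] for k in range(p)])
    phi_hat = np.linalg.lstsq(Z, eps_hat[p:], rcond=None)[0]
    return phi_hat
```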
In the rest of this section, we give the results for the estimation of g₀, the
optimal number of subgroups for the PCV.  The simulated regression setting is the
same as the one described in Section 6.1.  The regression errors are an AR(1)
process.  The combination of φ and σ² is the same as the one given in Table 6.5.  We
do not use the combination of φ and σ² given in Table 6.4; this is because the value
of the MSE in Table 6.4 decreases monotonely.  The pseudo normal random variables are
generated by the function RNDNS in GAUSS, where the seed is given as 456.  One
thousand sets of data are generated.  The average and the standard deviation of the
estimated values of g₀, fitted by the AR(1) process of regression errors, are given
in Table 6.6.  The symbol * in Table 6.6 denotes that the two methods use the same
value.  Here the bandwidth h is taken as (3/10)(n^{-1} log n)^{1/3}.  The two methods
of Solo (1981) and Truong (1989) use the same estimates of ∫(m″)²W and ∫(m⁽³⁾)²W.
The value of g₀ is calculated by the formula given in (5.1.3).  The quantities V, A,
and B were given in (5.4.3).
Table 6.6:  The average and the standard deviation of the estimated values of the
theoretically optimal number of subgroups, derived by the methods given in Solo
(1981) and Truong (1989) and fitted by the AR(1) process of the regression errors.
The notation * denotes that the two methods use the same data.

                          φ        ∫(m″)²W   ∫(m⁽³⁾)²W   γ(0)         A      B      V      g₀
Real value               -0.6      0.056     4.221       0.0000128    0.21   0.32   0.34   6.9

Average of the above quantity
  Solo                   -0.59     0.047     2.895       0.0000129    0.32   0.21   0.36   8.1
  Truong                 -0.60     *         *           0.0000128    0.33   0.21   0.38   8.2

Standard deviation of the above quantity
  Solo                    0.00     0.003     0.396       0.0000018    0.05   0.02   0.09   0.6
  Truong                  0.00     *         *           0.0000019    0.05   0.02   0.10   0.7
Looking at Table 6.5, the minimum value of the MSE is at g = 6.  Using the one step
interpolation improvement, the minimizer of the MSE is 6.25.  In Table 6.6, the
theoretical value of g₀ is 6.9.  The two methods provide good estimates of the AR(1)
parameters and of γ(·).  However, the fitted polynomial does not provide good
estimates of the integrals of the derivatives of the regression function.  This means
that we need other methods which give good estimates of the derivatives of the
regression function.
BIBLIOGRAPHY

Brockwell, P. J. and Davis, R. A. (1987). Time Series: Theory and Methods. Springer
    Series in Statistics, Springer-Verlag, New York.

Chiu, S. T. (1987). Estimating the parameters of the noise spectrum for a time series
    with trend: with application to bandwidth selection for nonparametric regression.
    To appear.

Chiu, S. T. (1989a). Bandwidth selection for kernel estimation with correlated noise.
    To appear in Statistics and Probability Letters.

Chiu, S. T. (1989b). On the asymptotic distributions of bandwidth estimates. To
    appear in Annals of Statistics.

Chu, C. K. and Marron, J. S. (1988). Comparison of kernel regression estimators.
    North Carolina Institute of Statistics, Mimeo Series No. 1754.

Clark, R. M. (1975). A calibration curve for radiocarbon data. Antiquity, 49,
    251-266.

Collomb, G. (1981). Estimation non-paramétrique de la régression: revue
    bibliographique. International Statistical Review, 49, 75-93.

Collomb, G. (1985). Non parametric time series analysis and prediction: uniform
    almost sure convergence of the window and K-NN autoregressive estimates.
    Statistics, 16, 297-307.

Collomb, G. (1985). Nonparametric regression: an up-to-date bibliography. Statistics,
    16, 309-324.

Cox, D. R. (1984). Long-range dependence: a review. In: Statistics: An Appraisal,
    H. A. David and H. T. David, eds., 55-74. Iowa State Univ. Press.

David, H. A. (1981). Order Statistics. Wiley, New York.

Devroye, L. P. and Wagner, T. J. (1980). On the L1 convergence of kernel estimators
    of regression functions with applications in discrimination. Z. Wahrsch. Verw.
    Gebiete, 51, 15-25.

Eubank, R. L. (1988). Spline Smoothing and Nonparametric Regression. Marcel Dekker
    Inc., New York.

Gasser, T. and Mueller, H. G. (1979). Kernel estimation of regression functions. In
    Smoothing Techniques for Curve Estimation, Lecture Notes in Math., No. 757,
    23-68. Springer-Verlag, New York.

Gasser, T. and Mueller, H. G. (1984). Estimating regression functions and their
    derivatives by the kernel methods. Scandinavian Journal of Statistics, 11,
    171-185.

Granger, C. W. J. and Joyeux, R. (1980). An introduction to long-memory time series
    and fractional differencing. Journal of Time Series Analysis, 1, 15-30.

Haerdle, W. (1988). Applied nonparametric regression. Unpublished monograph.

Haerdle, W., Hall, P., and Marron, J. S. (1988). How far are automatically chosen
    regression smoothing parameters from their optimum? Journal of the American
    Statistical Association, 83, 86-101.

Haerdle, W. and Marron, J. S. (1983). The nonexistence of moments of some kernel
    regression estimators. North Carolina Institute of Statistics, Mimeo Series
    No. 1537.

Haerdle, W. and Marron, J. S. (1985). Optimal bandwidth selection in nonparametric
    regression function estimation. Annals of Statistics, 13, 1465-1481.

Haerdle, W. and Vieu, P. (1987). Nonparametric kernel regression function estimation
    for α-mixing observations. Part I: Optimal squared error estimation. To appear.

Hall, P. (1984). Central limit theorem for integrated squared error of multivariate
    nonparametric density estimators. Journal of Multivariate Analysis, 14, 1-16.

Hall, P. and Marron, J. S. (1987). Extent to which least-squares cross-validation
    minimises integrated square error in nonparametric density estimation.
    Probability Theory and Related Fields, 74, 567-581.

Hart, J. D. (1987). Kernel regression estimation with time series errors. To appear.

Hart, J. D. and Wehrly, T. E. (1986). Kernel regression using repeated measurements
    data. Journal of the American Statistical Association, 81, 1080-1088.

Jennen-Steinmetz, C. and Gasser, T. (1987). A unifying approach to nonparametric
    regression estimation. Journal of the American Statistical Association, 83,
    1084-1089.

Mack, Y. P. and Mueller, H. G. (1989). Convolution type estimators for nonparametric
    regression. Statistics and Probability Letters, 7, 229-239.

Mallows, C. L. (1973). Some comments on C_p. Technometrics, 15, 661-675.

Marron, J. S. (1987). Partitioned cross-validation. Econometric Reviews, 6, 271-284.

Marron, J. S. (1988). Automatic smoothing parameter selection: A survey. Empirical
    Economics, 13, 187-208.

Mueller, H. G. (1988). Nonparametric Analysis of Longitudinal Data. Lecture Notes in
    Statistics, No. 46, Springer-Verlag, Berlin.

Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and its
    Applications, 9, 141-142.

Rice, J. (1984). Bandwidth choice for nonparametric regression. Annals of Statistics,
    12, 1215-1230.

Rosenblatt, M. (1969). Conditional probability density and regression estimates. In:
    Multivariate Analysis II, P. R. Krishnaiah, ed., 25-31. Wiley, New York.

Schuster, E. F. (1972). Joint asymptotic distribution of the estimated regression
    function at a finite number of distinct points. Annals of Mathematical
    Statistics, 43, 84-88.

Serfling, R. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New
    York.

Shibata, R. (1981). An optimal selection of regression variables. Biometrika, 68,
    45-54.

Solo, V. (1981). Strong consistency of least squares estimators in regression with
    correlated disturbances. The Annals of Statistics, 9, 689-693.

Stute, W. (1984). Asymptotic normality of nearest neighbor regression function
    estimates. Annals of Statistics, 12, 917-926.

Truong, Y. K. (1989). Nonparametric curve estimation with time series errors. To
    appear.

Vieu, P. and Hart, J. (1989). Nonparametric regression under dependence: A class of
    asymptotically optimal data-driven bandwidths. To appear.

Wahba, G. and Wold, S. (1975). A completely automatic French curve: fitting spline
    functions by cross-validation. Communications in Statistics, 4, 1-17.

Watson, G. S. (1964). Smooth regression analysis. Sankhya, Series A, 26, 359-372.

Whittle, P. (1960). Bounds for the moments of linear and quadratic forms in
    independent variables. Theory of Probability and its Applications, 5, 302-305.