Chu, C. K. and Marron, J. S. (1989). "Comparison of Two Bandwidth Selectors with Dependent Errors."

No. 2007
Comparison of Two Bandwidth Selectors with Dependent Errors
C. K. Chu and J. S. Marron^1
National Tsing Hua University and University of North Carolina
September 1, 1989
ABSTRACT
For nonparametric regression in the case of dependent observations, cross-validation is known to be severely affected by dependence. This effect is precisely quantified through a limiting distribution for the cross-validated bandwidth. The performance of two methods which adjust for the dependence effect on bandwidth selection, the "leave-(2ℓ+1)-out" version of cross-validation and partitioned cross-validation, is investigated. The bandwidths produced by these two methods are analyzed by further limiting distributions, which reveal significantly different characteristics. Simulations demonstrate that the asymptotic effects hold for reasonable sample sizes.
AMS 1980 subject classifications: Primary 62G05; secondary 62G20.
Keywords: cross-validation, autoregressive-moving average process, bandwidth selector, nonparametric regression.
^1 This research is part of the Ph.D. dissertation of the first author, under the supervision of the second, at the University of North Carolina, Chapel Hill. It was partially supported by NSF Grant DMS-8701201.
1. INTRODUCTION
Nonparametric regression is a smoothing method for recovering the mean function from noisy data. It has been well established as a powerful and useful data-analytic tool. See the monographs by Eubank (1988), Haerdle (1988), and Mueller (1988) for a large variety of interesting real data examples where applications of this method have yielded analyses essentially unobtainable by other techniques.

The simplest and most widely used regression smoothers are based on kernel methods. Kernel estimators are local weighted averages of the response variables. The width of the neighborhood in which averaging is performed is called the bandwidth or smoothing parameter. The magnitude of the bandwidth controls the smoothness of the resulting estimate of the regression function. For independent observations, cross-validation provides an attractive data-based method for choosing the bandwidth, although it suffers from considerable sample noise; see Haerdle, Hall, and Marron (1988) for a detailed discussion of this. For other bandwidth selectors, see also Rice (1984) and Marron (1988).
However, if the observations are dependent, then bandwidth selectors designed for independent observations will not produce good bandwidths. For instance, if the observations are positively correlated, then cross-validation will produce small bandwidths, which result in rough kernel estimates. On the other hand, if the observations are negatively correlated, then cross-validation will produce large bandwidths, which result in oversmoothed kernel estimates. See Hart and Wehrly (1986), Hart (1987), Chiu (1987), and Diggle and Hutchinson (1989) for a detailed discussion of the dependence effect on bandwidth selection.
For dependent observations, a central limit theorem (CLT) for the cross-validated bandwidth is given in Section 3. It quantifies the dependence effect on cross-validation by showing what this bandwidth converges to and by giving its rate of convergence. The rate of convergence is of the same order as that given in Haerdle, Hall, and Marron (1988) for the case of independent observations, although the convergence is now not to the optimal bandwidth. This quantification motivates a modification of cross-validation to eliminate the dependence effect.

This adjustment is called modified cross-validation (MCV) and is simply the "leave-(2ℓ+1)-out" version of cross-validation. See Collomb (1985), Haerdle and Vieu (1987), and Vieu and Hart (1989) for earlier results on the application of this method in the setting of strong mixing data. Based on an autoregressive-moving average (ARMA) model for the dependent regression errors, a CLT is given in Section 3 for the modified cross-validated bandwidth, for each ℓ ≥ 0. This CLT shows clearly how the dependence effect on cross-validation is alleviated as the value of ℓ is increased. However, the value of ℓ does not appear in the rate of convergence.
There are other possibilities for overcoming the dependence effect. Marron (1987) proposed partitioned cross-validation (PCV) for kernel density estimation, to eliminate the sample noise inherent to cross-validation. The idea of PCV is to split the observations into g subgroups by taking every g-th observation. For correlated data, as long as g is large enough, the errors associated with each subgroup are essentially independent. Marron (1987) mentioned that this method of cross-validation should effectively overcome the dependence effect. While this is true, the resulting bandwidth is poor, for a surprising reason. In Section 3, a CLT for the partitioned cross-validated bandwidth is derived, for each g ≥ 1. The rate of convergence is faster than that for the modified cross-validated bandwidth, and is of the same order as that given in Marron (1987) for kernel density estimation. However, the asymptotic expectation reveals that there is a significant distance between the partitioned cross-validated bandwidth and the optimal bandwidth which minimizes the mean average square error. In fact, the limiting distribution of this bandwidth is centered at the bandwidth which is optimal under no dependence, which is different from the true optimum. Essentially, partitioned cross-validation does not work well because it is too effective at removing the dependence.
Another approach to bandwidth selection for correlated data is that of Hart (1987), Chiu (1989), and Diggle and Hutchinson (1989), who proposed methods to estimate the covariance function of the regression errors and plug the estimated covariance function into bandwidth selectors. They showed, in simulation studies, that these plug-in bandwidth selectors produce good bandwidths.

When dependent observations are considered in nonparametric regression, a convenient dependence structure for analysis is the class of ARMA processes from time series analysis. Section 2 describes the regression setting and the precise formulation of the two bandwidth selectors. The asymptotic behaviors of the bandwidth estimates produced by these two methods are given as theorems in Section 3. Section 4 contains simulation results which give additional insight into what the theoretical results mean. Finally, sketches of proofs are given in Section 5.
2. REGRESSION MODEL AND BANDWIDTH SELECTORS
In this paper, the equally spaced, fixed design, short range dependence nonparametric regression model is considered. The model is given by

(2.1)    Y_j = m(x_j) + ε_j,    for j = 1, 2, ..., n.

Here m is a smooth unknown regression function defined on the interval [0,1] (without loss of generality), the x_j = j/n are equally spaced fixed design points, the ε_j are an unknown ARMA process, and the Y_j are noisy observations of the regression function m at the design points x_j.
Using Definition 3.1.2 of Brockwell and Davis (1987), the process ε_j is an ARMA(p,q) process if ε_j is a stationary process and if there are positive integers p and q such that

    φ(B) ε_j = θ(B) e_j,    for all j.

Here the e_j are uncorrelated random variables with mean zero and finite variance σ², φ(z) and θ(z) are polynomials of degrees p and q respectively, and B is the backward shift operator.
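As a concrete illustration of this error model, the following Python sketch (ours, not from the paper) simulates the ARMA(1,1) special case, discarding a burn-in period so the retained values are approximately stationary; the parameter values are placeholders.

import numpy as np

def simulate_arma11(n, phi, theta, sigma, burn_in=500, seed=None):
    # ARMA(1,1) recursion: eps_j - phi*eps_{j-1} = e_j + theta*e_{j-1},
    # i.e. phi(B) eps_j = theta(B) e_j with first-degree polynomials.
    rng = np.random.default_rng(seed)
    e = rng.normal(0.0, sigma, size=n + burn_in)
    eps = np.zeros(n + burn_in)
    for j in range(1, n + burn_in):
        eps[j] = phi * eps[j - 1] + e[j] + theta * e[j - 1]
    return eps[burn_in:]  # drop the burn-in segment

eps = simulate_arma11(200, phi=0.6, theta=0.0, sigma=0.0071)  # AR(1) as a special case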
To estimate the regression function m(x), we consider the kernel estimator introduced by Nadaraya (1964) and Watson (1964). Given a kernel function K and a bandwidth h, for 0 < x < 1, the Nadaraya-Watson estimator is defined by

(2.2)    m̂(x) = [n^{-1} Σ_{i=1}^n K_h(x - x_i) Y_i] / [n^{-1} Σ_{i=1}^n K_h(x - x_i)],

where K_h(·) = h^{-1} K(·/h) (if the denominator is zero, take m̂(x) = 0). See Chu and Marron (1989) for a comparison of this estimator to other types of kernel estimators.
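A minimal Python sketch of (2.2) follows (illustrative code; the function names are ours, and the quartic kernel of Section 4 is supplied as an example default).

import numpy as np

def quartic_kernel(u):
    # K(u) = (15/8)(1 - 4u^2)^2 on [-1/2, 1/2]; the kernel used in Section 4
    return np.where(np.abs(u) <= 0.5, (15 / 8) * (1 - 4 * u**2) ** 2, 0.0)

def nadaraya_watson(x, xi, yi, h, kernel=quartic_kernel):
    # m_hat(x) from (2.2); xi, yi are numpy arrays, x a scalar or array
    u = (np.atleast_1d(x).astype(float)[:, None] - xi[None, :]) / h
    w = kernel(u) / h                                  # K_h(x - x_i)
    num = (w * yi[None, :]).mean(axis=1)
    den = w.mean(axis=1)
    # take m_hat(x) = 0 where the denominator vanishes
    return np.where(den == 0.0, 0.0, num / np.where(den == 0.0, 1.0, den))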
The optimal bandwidth, h_M, is taken as the minimizer of the mean average square error (MASE). The MASE function is defined by

(2.3)    d_M(h) = E n^{-1} Σ_{j=1}^n [m̂(x_j) - m(x_j)]² W(x_j),

where the m̂(x_j) are kernel estimators of the m(x_j). The weight function W is introduced to allow elimination (or at least significant reduction) of boundary effects, by taking W to be supported on a subinterval of the unit interval (see Gasser and Mueller (1979)).
For any ℓ ≥ 0, the "leave-(2ℓ+1)-out" version of MCV chooses the bandwidth by minimizing the modified cross-validation score

    CV_ℓ(h) = n^{-1} Σ_{j=1}^n [m̂_j(x_j) - Y_j]² W(x_j).

Here the m̂_j(x_j) are a "leave-(2ℓ+1)-out" version of m̂(x_j), i.e. the observations (x_{j+i}, Y_{j+i}), -ℓ ≤ i ≤ ℓ, are left out in constructing m̂(x_j). For the Nadaraya-Watson estimator, the m̂_j(x_j) are defined by

    m̂_j(x_j) = [(n-2ℓ-1)^{-1} Σ_{i: |i-j|>ℓ} K_h(x_j - x_i) Y_i] / [(n-2ℓ-1)^{-1} Σ_{i: |i-j|>ℓ} K_h(x_j - x_i)].

The amount of dependence between m̂_j(x_j) and Y_j is reduced as ℓ is increased. When ℓ = 0, MCV is ordinary cross-validation. The minimizer of CV_ℓ(h) is denoted by ĥ_CV(ℓ).
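A sketch of the MCV score in Python (our own illustrative code; the normalizing factors (n-2ℓ-1)^{-1} cancel in the ratio, so plain sums are used):

import numpy as np

def mcv_score(xi, yi, h, ell, kernel, weight):
    # CV_ell(h): the fit at x_j omits the 2*ell + 1 observations with |i - j| <= ell
    n = len(xi)
    w = kernel((xi[:, None] - xi[None, :]) / h) / h        # w[j, i] = K_h(x_j - x_i)
    keep = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]) > ell
    w = w * keep                                           # leave (2*ell + 1) out
    den = w.sum(axis=1)
    mj = np.where(den == 0.0, 0.0, (w @ yi) / np.where(den == 0.0, 1.0, den))
    return np.mean((mj - yi) ** 2 * weight(xi))

Minimizing this score over a bandwidth grid gives ĥ_CV(ℓ); taking ell = 0 reproduces ordinary cross-validation.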
For any g ≥ 1, PCV calculates the ordinary cross-validation score CV_{0,k}(h) of the k-th subgroup of observations, k = 1, 2, ..., g, and minimizes the average of these score functions,

    CV*(h) = g^{-1} Σ_{k=1}^g CV_{0,k}(h).

The minimizer of CV*(h) is denoted by ĥ*_CV. Since ĥ*_CV is appropriate for a sample of size only n/g, the partitioned cross-validated bandwidth ĥ_PCV(g) is defined to be the rescaled ĥ*_CV, namely ĥ_PCV(g) = g^{-1/5} ĥ*_CV. This scale factor is justified in Section 5. When g = 1, PCV is ordinary cross-validation. For g large enough, the dependence effect inherent in CV_{0,k}(h), for all k, becomes negligible.
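The following sketch (again our own illustration, reusing the mcv_score sketch above) computes ĥ_PCV(g) over a bandwidth grid:

import numpy as np

def pcv_bandwidth(xi, yi, g, h_grid, kernel, weight):
    # Average the ordinary (ell = 0) CV scores of the g subgroups over
    # h_grid, minimize, then rescale the minimizer by g**(-1/5).
    scores = np.zeros(len(h_grid))
    for k in range(g):
        idx = np.arange(k, len(xi), g)         # k-th subgroup: every g-th point
        scores += np.array([mcv_score(xi[idx], yi[idx], h, 0, kernel, weight)
                            for h in h_grid])
    h_star = h_grid[np.argmin(scores / g)]     # minimizer of CV*(h)
    return g ** (-1 / 5) * h_star              # h_PCV(g)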
3. RESULTS
In this section, we study the asymptotic behaviors of ĥ_CV(ℓ) and ĥ_PCV(g), for any ℓ ≥ 0 and g ≥ 1. In order to derive these, using the regression model (2.1) and the Nadaraya-Watson estimator (2.2), we impose the following assumptions:
(A.1) The regression function m(x), supported on the interval [0,1], has a uniformly continuous fourth derivative m^(4)(x) on the interval (0,1).

(A.2) The kernel function K is a symmetric probability density function with support contained in the interval [-1,1]. The second derivative K'' of K is Hoelder continuous of order 1.

(A.3) The weight function W, compactly supported on the interval (0,1) with a nonempty interior, has a uniformly continuous first derivative W'.

(A.4) The regression errors ε_j are an unknown ARMA(p,q) process for which the polynomials φ(z) and θ(z) have no common zeros, φ(z) has no zeros on |z| ≤ 1, and the e_j are independent and identically distributed (IID) random variables with mean zero and all moments finite.

(A.5) The autocovariance function γ(·) of the regression errors ε_j has a positive sum, i.e. 0 < Σ_{k=-∞}^∞ γ(k) < ∞.

(A.6) The total number of observations in this regression setting is n, with n → ∞. The "leave-(2ℓ+1)-out" version of MCV is applied, with ℓ << n^{1/2}. The number of subgroups of PCV is g, with g << n^{1/2}. The number of observations in each subgroup of PCV is η = n/g. For simplicity of notation, n is assumed to be a multiple of g.

(A.7) For any ℓ ≥ 0, the minimizer of CV_ℓ(h) is searched for on the interval H_n = [a n^{-1/5}, b n^{-1/5}], for n = 1, 2, .... For any g ≥ 1, the minimizer of CV*(h) is searched for on the interval H_{n,g} = [a η^{-1/5}, b η^{-1/5}], for η = 1, 2, .... Here the constant a is arbitrarily small and b is arbitrarily large.
Let the notation X_n = o_u(v_n) mean that, as n → ∞, |X_n / v_n| → 0 almost surely, and uniformly on H_n if v_n involves h. Under the above assumptions, it is shown briefly in Section 5 that d_M(h) can be asymptotically expressed as

(3.1)    d_M(h) = a_1 n^{-1} h^{-1} + b_1 h^4 + b_2 h^6 + o_u(h^6),

where

    a_1 = (Σ_{k=-∞}^∞ γ(k)) ∫K² ∫W,
    b_1 = (1/4) (∫u²K)² ∫(m'')² W,
    b_2 = (-1/24) (∫u²K) (∫u⁴K) (∫(m^(3))² W + ∫ m'' m^(3) W').

Here and throughout this paper, the notation ∫f denotes ∫f(u) du. Of the components of MASE, the terms a_1 n^{-1} h^{-1} and b_1 h^4 + b_2 h^6 represent the variance and the squared bias respectively. A consequence of (3.1) is that the optimal bandwidth h_M can be asymptotically expressed as

(3.2)    h_M = C_0 n^{-1/5} + B_0 n^{-3/5} + o(n^{-3/5}),

where

    C_0 = [a_1 / (4 b_1)]^{1/5} = [(Σ_{k=-∞}^∞ γ(k)) ∫K² ∫W (∫u²K)^{-2} (∫(m'')² W)^{-1}]^{1/5},

    B_0 = (1/20) [(Σ_{k=-∞}^∞ γ(k)) ∫K² ∫W]^{3/5} (∫u⁴K) (∫(m^(3))² W + ∫ m'' m^(3) W') / [(∫u²K)^{11} (∫(m'')² W)^8]^{1/5}.
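As a numerical check (our own illustration, not from the paper), the constants a_1, b_1, and C_0 can be evaluated for the design used in Section 4: the quartic kernel, W = 5/3 on [1/5, 4/5], m(x) = x³(1-x)³, and AR(1) errors with φ = 0.6 and σ = 0.0071, for which Σ_k γ(k) = σ²/(1-φ)².

import numpy as np

# Quartic kernel: int K^2 = 10/7 and int u^2 K = 1/28 (exact values)
int_K2, int_u2K = 10 / 7, 1 / 28
int_W = (5 / 3) * 0.6                          # integral of W = 5/3 over [1/5, 4/5]

x = np.linspace(0.2, 0.8, 200001)
m2 = 6*x - 36*x**2 + 60*x**3 - 30*x**4         # m''(x) for m(x) = x^3 (1 - x)^3
int_m2sqW = np.mean(m2**2) * 0.6 * (5 / 3)     # integral of (m'')^2 W on [1/5, 4/5]

phi, sigma, n = 0.6, 0.0071, 200
gamma_sum = sigma**2 / (1 - phi) ** 2          # sum over k of gamma(k) for AR(1)

a1 = gamma_sum * int_K2 * int_W                # variance coefficient in (3.1)
b1 = 0.25 * int_u2K**2 * int_m2sqW             # squared-bias coefficient in (3.1)
C0 = (a1 / (4 * b1)) ** (1 / 5)
print(C0 * n ** (-1 / 5))                      # leading term of h_M

The printed value is roughly 1/2, matching the description of the second parameter combination in Section 4.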
We now quantify the dependence effect on the methods of cross-validation, MCV for each ℓ ≥ 0 and PCV for each g ≥ 1, through the following limiting distributions.
Let the coefficients a^S_{1ℓ} and C̃_{0ℓ} be the coefficients a_1 and C_0 with

    [Σ_{k=-∞}^∞ γ(k) ∫K²]   replaced by   [Σ_{k=-∞}^∞ γ(k) ∫K² - 4K(0) Σ_{k>ℓ} γ(k)]

in each case. Let the coefficients a^S_{1g} and C̃_{0g} be the coefficients a_1 and C_0 with

    [Σ_{k=-∞}^∞ γ(k) ∫K²]   replaced by   [Σ_{k=-∞}^∞ γ(gk) ∫K² - 4K(0) Σ_{k>0} γ(gk)]

in each case.

Theorem 1: Under the above assumptions, as n → ∞, if b_1 > 0, a^S_{1ℓ} > 0 for ĥ_CV(ℓ), and a^S_{1g} > 0 for ĥ_PCV(g), then

(3.3)    n^{1/10} [ĥ_CV(ℓ)/h_M - C̃_{0ℓ}/C_0] ⇒ N(0, (C̃_{0ℓ}/C_0)^{-7} (Σ_{k=-∞}^∞ γ(k))^{1/5} Var_M),

(3.4)    g^{2/5} n^{1/10} [ĥ_PCV(g)/h_M - C̃_{0g}/C_0] ⇒ N(0, v_g (Σ_{k=-∞}^∞ γ(k))^{-2/5} Var_M),

where

    Var_M = (8/25) ∫(K*(K-L) - (K-L))² ∫W² / [(∫K²)^9 (∫W)^9 (∫u²K)² (∫(m'')² W)]^{1/5},

    v_g = [Σ_{i=-∞}^∞ Σ_{j=-∞}^∞ γ(j) γ(j-ig)] [Σ_{k=-∞}^∞ γ(gk) ∫K² - 4 Σ_{k>0} γ(gk) K(0)]^{-7/5},

and where L(u) = -u K'(u) and * denotes convolution.
Remark 3.1: If a^S_{1ℓ} ≤ 0, then CV_ℓ(h) is asymptotically minimized at the left end of H_n. If a^S_{1g} ≤ 0, then CV*(h) is likewise asymptotically minimized at the left end of H_{n,g}. If b_1 = 0, then CV_ℓ(h) and CV*(h) are minimized at the right or the left ends of H_n and H_{n,g} respectively, depending on the values of a^S_{1ℓ} and a^S_{1g}.
Remark 3.2: The rates of convergence for ĥ_CV(ℓ) and ĥ_PCV(g) are of the same order as those given in Haerdle, Hall, and Marron (1988) and Marron (1987) for the respective cases with independent observations.
Remark 3.3: In the case of independent observations, the ratios C̃_{0ℓ}/C_0 and C̃_{0g}/C_0 are equal to 1 for any values of ℓ and g. However, for dependent observations, these two ratios take different values. For MCV, if ℓ → ∞ with ℓ << n^{1/2}, then C̃_{0ℓ}/C_0 → 1 at a polynomial rate, by the geometric boundedness of γ(k) as given in Exercise 3.11 of Brockwell and Davis (1987). This means that MCV produces an asymptotically unbiased bandwidth with respect to h_M whenever ℓ is moderately large. Thus MCV is suited to the case where the expectation of m̂ is important. For PCV, if g → ∞ with g << n^{1/2}, then C̃_{0g}/C_0 → [γ(0) / Σ_{k=-∞}^∞ γ(k)]^{1/5} at a polynomial rate. This means that PCV produces an asymptotically biased bandwidth with respect to h_M, no matter how large the value of g is. This asymptotic bias is caused by the spacing g/n between the observations of each subgroup. An immediate remedy for this bias is to split the observations into g subgroups by taking every g-th cluster of consecutive observations. PCV would then be able to reflect the dependence structure of the whole data set. Since γ(k) is geometrically bounded, it is enough to take the cluster size to be O(log n). A drawback of this approach is that it requires too many observations. Since ĥ_PCV(g) has a faster rate of convergence than ĥ_CV(ℓ), PCV is suited to the case where the variance of m̂ is important. The sketch below illustrates these two limits numerically.
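For AR(1) errors with γ(k) proportional to φ^{|k|} and the quartic kernel, the two ratios can be computed directly (our own illustrative code; parameter values are those of Section 4):

import numpy as np

phi = 0.6
K0 = 15 / 8                                   # K(0) for the quartic kernel
int_K2 = 10 / 7                               # integral of K^2 (exact value)
S = (1 + phi) / (1 - phi)                     # sum over k of gamma(k), gamma(0) = 1

for ell in range(6):                          # MCV ratio: C0_ell / C0
    num = S * int_K2 - 4 * K0 * phi ** (ell + 1) / (1 - phi)
    if num <= 0:
        print(f"ell = {ell}: a1_ell <= 0 (degenerate case of Remark 3.1)")
    else:
        print(f"ell = {ell}: ratio = {(num / (S * int_K2)) ** 0.2:.3f}")

for g in range(1, 6):                         # PCV ratio: C0_g / C0
    Sg = (1 + phi ** g) / (1 - phi ** g)      # sum over k of gamma(gk)
    num = Sg * int_K2 - 4 * K0 * phi ** g / (1 - phi ** g)
    if num <= 0:
        print(f"g = {g}: a1_g <= 0 (degenerate case of Remark 3.1)")
    else:
        print(f"g = {g}: ratio = {(num / (S * int_K2)) ** 0.2:.3f}")

Here the bracket is negative for ℓ = 0, 1 and g = 1, 2, so (nearly) ordinary cross-validation degenerates as in Remark 3.1; for larger ℓ the MCV ratio climbs toward 1, while the PCV ratio approaches [γ(0)/Σ_k γ(k)]^{1/5} = (1/4)^{1/5}, about 0.76.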
Remark 3.4: See Sections 3.5, 4.3, and 5.4 of Chu (1989) for a more detailed comparison of these two methods in the important special cases where the regression errors are an MA(1) process and an AR(1) process.

Remark 3.5: See Sections 5.4 and 6.3 of Chu (1989) for the choice of the optimal value of g for PCV.
4. SIMULATIONS
To investigate the practical implications of the asymptotic results for ĥ_CV(ℓ) and ĥ_PCV(g) presented in Section 3, an empirical study was carried out. We first introduce the simulated regression settings. The sample size was n = 200. The regression model (2.1) and the kernel estimator (2.2) were considered. The regression function was m(x) = x³(1-x)³ for 0 ≤ x ≤ 1. The regression errors ε_j were an AR(1) process, i.e. ε_j = φ ε_{j-1} + e_j, where the e_j were IID N(0, σ²). The AR(1) parameters φ and σ were given the following five combinations: (0, 0.0177); (0.6, 0.0071); (0.6, 0.0018); (-0.6, 0.0283); (-0.6, 0.0029). Only the second combination is treated in the numerical results of Table 1. The values of σ make h_M roughly equal to 1/5, 1/4, or 1/2, and give different amounts of sample variability in the regression settings with the same value of φ. The kernel function was K(x) = (15/8)(1-4x²)² for -1/2 ≤ x ≤ 1/2. The weight function was W(x) = 5/3 for 1/5 ≤ x ≤ 4/5. The same functions K and m were also used in Rice (1984) and Haerdle, Hall, and Marron (1988). For each combination of φ and σ, 1000 independent sets of data were generated.
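A sketch of this data-generating step (our own illustration; the paper itself gives no code):

import numpy as np

def make_dataset(n=200, phi=0.6, sigma=0.0071, burn_in=200, seed=None):
    # One pseudo data set from the Section 4 design: x_j = j/n,
    # m(x) = x^3 (1 - x)^3, AR(1) errors eps_j = phi*eps_{j-1} + e_j.
    rng = np.random.default_rng(seed)
    e = rng.normal(0.0, sigma, size=n + burn_in)
    eps = np.zeros(n + burn_in)
    for j in range(1, n + burn_in):
        eps[j] = phi * eps[j - 1] + e[j]
    x = np.arange(1, n + 1) / n
    return x, x**3 * (1 - x) ** 3 + eps[burn_in:]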
For MCV, the values of ℓ were 0, 1, 2, ..., 14. For PCV, the values of g were 1, 2, ..., 15. The values of the score functions, the average square function in d_M(h), CV_ℓ(h), and CV*(h), were calculated on an equally spaced logarithmic grid of 11 values. The endpoints of the grid were different for the different settings, and were chosen to contain essentially all the bandwidths of interest. The expectation in d_M(h) was empirically approximated by averaging the average square error over the 1000 pseudo data sets. The minimizers ĥ_M, ĥ_CV(ℓ), and ĥ*_CV of the score functions d_M(h), CV_ℓ(h), and CV*(h) respectively were calculated. After evaluation on the grid, a one-step interpolation improvement was done, with the results taken as the selected bandwidths. If a score function had multiple minimizers on the grid, the algorithm chose the smaller of them (this choice was made arbitrarily).
The sample variances, the sample bias-squares, and the MSE of the ratios ĥ/h_M of the bandwidth estimates were summarized, where ĥ denotes ĥ_CV(ℓ) and ĥ_PCV(g). The sample bias-square of the bandwidth estimates was taken as the square of the average of the 1000 values of ĥ/h_M - 1. The MSE was the sum of the sample variance and the sample bias-square. For the first combination, where φ = 0 (independent observations) and σ = 0.0177 (h_M roughly equals 1/2), the bias-squares for ĥ_CV(ℓ) and ĥ_PCV(g) were roughly constant over ℓ and g, as predicted by our theorem. As ℓ and g increased, the variances for ĥ_CV(ℓ) stayed the same, but the variances for ĥ_PCV(g) decreased, also as predicted. In this case, PCV is preferred to MCV.
The numerical results given in Table 1 represent the second combination, where φ = 0.6 and σ = 0.0071 (h_M roughly equals 1/2). In this case, the bias-squares for ĥ_CV(ℓ) decreased to 0 as ℓ increased. However, the bias-squares for ĥ_PCV(g) converged to a nonzero constant as g increased. In contrast to the bias-squares, the variances for ĥ_CV(ℓ) stayed the same for all ℓ, and the variances for ĥ_PCV(g) decreased monotonely as g increased. Here, variance is the dominant term in MSE. Thus, using PCV to reduce the variance of the bandwidth estimate results in a smaller value of MSE than using MCV to reduce the bias-square of the bandwidth estimate.

[Put Table 1 about here.]
In the case of the third combination, variance and bias-square had the same tendency as in the second combination. In this case, bias-square is the dominant term in MSE. Thus using MCV to reduce bias-square gives a better MSE than using PCV to reduce variance. In the final two cases, where φ = -0.6 and σ = 0.0283 (h_M roughly equals 1/2) and σ = 0.0029 (h_M roughly equals 1/5), the variances and the bias-squares for ĥ_PCV(g) decreased along the even g's and the odd g's separately. This is because φ^g = (φ²)^k for even g = 2k and some integer k, so the sign of the subgroup correlation alternates between even and odd g. The conclusions for these two cases are the same as those for the second and the third combinations. Finally, the choice between MCV and PCV should be made on the basis of which component, variance or bias-square, is the dominant term in MSE.
5. SKETCHES OF PROOFS
The following notation and results will be used in this section. For all integers i and j, let the X_i be IID random variables with mean zero and all moments finite, and let a_i and b_{ij} be real numbers such that Σ_{i=-∞}^∞ |a_i| < ∞ and Σ_{j=-∞}^∞ |b_{ij}| < ∞. Using Theorem 2 of Whittle (1960) and Theorem A of Section 1.4 of Serfling (1980), then, for all positive integers k, we have

    E((Σ_{i=-∞}^∞ a_i X_i)^{2k}) ≤ c_1 (Σ_{i=-∞}^∞ a_i²)^k,

and similarly for the quadratic forms in the b_{ij}, where c_1 and c_2 are constants involving k and the moments of X. Let ε_j be a linear process defined by ε_j = Σ_{i=0}^∞ ψ_i e_{j-i}, for j = 1, 2, ..., n, where the ψ_i are real numbers with Σ_{i=0}^∞ |ψ_i| < ∞ and the e_j are IID random variables with mean zero and all moments finite. Using Fubini's Theorem, Theorem 2 of Whittle (1960), Minkowski's inequality, and Theorem A of Serfling (1980), then, for all positive integers k, we have

    E((Σ_{j=1}^n ε_j)^{2k}) = O(n^k).
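A quick Monte Carlo check of the k = 1 case of this bound, for an AR(1) linear process (our own illustration): the normalized second moment settles near Σ_k γ(k) = σ²/(1-φ)², so E((Σ_{j≤n} ε_j)²) indeed grows like n.

import numpy as np

rng = np.random.default_rng(0)
phi, sigma = 0.6, 1.0
for n in (200, 800, 3200):
    totals = []
    for _ in range(500):
        e = rng.normal(0.0, sigma, size=n + 100)
        eps = np.zeros(n + 100)
        for j in range(1, n + 100):
            eps[j] = phi * eps[j - 1] + e[j]
        totals.append(eps[100:].sum())
    # E((sum of eps_j)^2) / n should approach sigma^2 / (1 - phi)^2 = 6.25
    print(n, np.mean(np.square(totals)) / n)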
For any i << n^{1/2} and each x_j with W(x_j) ≠ 0, or h < x_j < 1-h, under the assumptions given in Section 3, we have asymptotic results for the bias and variance components b_j and v_j of the kernel estimator.

Proof of (3.1): Using the asymptotic results of b_j and v_j given above, through a straightforward calculation, the proof of (3.1) is complete.
Proof of Theorem 1: We first give asymptotic expressions for ĥ_CV(ℓ) and ĥ_PCV(g), for each ℓ and g. Through adding and subtracting the terms m̂(x_j) and m(x_j), CV_ℓ(h) can be expressed as

(5.4)    CV_ℓ(h) = n^{-1} Σ_{j=1}^n ε_j² W(x_j) + d_A(h) - 2 Cross_ℓ(h) + Remainder_ℓ(h),

where d_A(h) = n^{-1} Σ_{j=1}^n [m̂(x_j) - m(x_j)]² W(x_j) is the average square error,

    Cross_ℓ(h) = n^{-1} Σ_{j=1}^n [m̂_j(x_j) - m(x_j)] ε_j W(x_j),

and

    Remainder_ℓ(h) = n^{-1} Σ_{j=1}^n [m̂_j(x_j) - m̂(x_j)] [m̂_j(x_j) + m̂(x_j) - 2m(x_j)] W(x_j).

Using the asymptotic results of b_j and v_j, through a straightforward calculation, as n → ∞,

(5.5)    Cross_ℓ(h) = 2 n^{-1} h^{-1} (Σ_{k>ℓ} γ(k)) K(0) ∫W + o_u(d_M(h)),

(5.6)    Remainder_ℓ(h) = o_u(d_M(h)).

As n → ∞, if a^S_{1ℓ} > 0 and b_1 > 0, then, through a straightforward calculation,

    ĥ_CV(ℓ) = C̃_{0ℓ} n^{-1/5} (1 + o_u(1)).

Using the results of (5.4) through (5.6), CV*(h) can be asymptotically expressed as

    CV*(h) = n^{-1} Σ_{j=1}^n ε_j² W(x_j) + a^S_{1g} η^{-1} h^{-1} + b_1 h^4 + o_u(η^{-1} h^{-1} + h^4).

This implies that, as n → ∞, if a^S_{1g} > 0 and b_1 > 0, then

    ĥ*_CV = C̃_{0g} η^{-1/5} (1 + o_u(1)).

Since the optimal bandwidth h_M is of order n^{-1/5} and ĥ*_CV is of order η^{-1/5} = g^{1/5} n^{-1/5}, ĥ_PCV(g) is defined as g^{-1/5} ĥ*_CV.

Using the unique linear solution of the ARMA process ε_j, the asymptotic properties given above, and Fubini's Theorem, the proof of Theorem 1 is essentially the same as the proofs of Theorem 1 of Marron (1987) and Theorems 1 and 2 of Haerdle, Hall, and Marron (1988). The only difference is that ĥ_CV(ℓ) should be close to C̃_{0ℓ} n^{-1/5} and ĥ_PCV(g) close to C̃_{0g} n^{-1/5}, not to h_M.
REFERENCES

Brockwell, P. J. and Davis, R. A. (1987). Time Series: Theory and Methods. Springer Series in Statistics, Springer-Verlag, New York.

Chiu, S. T. (1987). Estimating the parameters of the noise spectrum for a time series with trend: with application to bandwidth selection for nonparametric regression. To appear.

Chiu, S. T. (1989). Bandwidth selection for kernel estimation with correlated noise. To appear in Statistics and Probability Letters.

Chu, C. K. (1989). Some results in nonparametric regression. Ph.D. Dissertation, Department of Statistics, University of North Carolina.

Chu, C. K. and Marron, J. S. (1989). Comparison of kernel regression estimators. Unpublished paper.

Clark, R. M. (1975). A calibration curve for radiocarbon dates. Antiquity, 49, 251-266.

Collomb, G. (1985). Nonparametric time series analysis and prediction: uniform almost sure convergence of the window and K-NN autoregressive estimates. Statistics, 16, 297-307.

Diggle, P. J. and Hutchinson, M. F. (1989). On spline smoothing with autocorrelated errors. Australian Journal of Statistics, 31, 166-182.

Eubank, R. L. (1988). Spline Smoothing and Nonparametric Regression. Marcel Dekker, New York.

Gasser, T. and Mueller, H. G. (1979). Kernel estimation of regression functions. In Smoothing Techniques for Curve Estimation, Lecture Notes in Mathematics, No. 757, 23-68, Springer-Verlag, New York.

Haerdle, W. (1988). Applied Nonparametric Regression. Unpublished monograph.

Haerdle, W., Hall, P., and Marron, J. S. (1988). How far are automatically chosen regression smoothing parameters from their optimum? Journal of the American Statistical Association, 83, 86-101.

Haerdle, W. and Marron, J. S. (1985). Optimal bandwidth selection in nonparametric regression function estimation. Annals of Statistics, 13, 1465-1481.

Haerdle, W. and Vieu, P. (1987). Nonparametric kernel regression function estimation for α-mixing observations. Part I: Optimal squared error estimation. To appear.

Hart, J. D. (1987). Kernel regression estimation with time series errors. To appear.

Hart, J. D. and Wehrly, T. E. (1986). Kernel regression using repeated measurements data. Journal of the American Statistical Association, 81, 1080-1088.

Marron, J. S. (1987). Partitioned cross-validation. Econometric Reviews, 6, 271-284.

Marron, J. S. (1988). Automatic smoothing parameter selection: A survey. Empirical Economics, 13, 187-208.

Mueller, H. G. (1988). Nonparametric Analysis of Longitudinal Data. Lecture Notes in Statistics, No. 46, Springer-Verlag, Berlin.

Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and its Applications, 9, 141-142.

Rice, J. (1984). Bandwidth choice for nonparametric regression. Annals of Statistics, 12, 1215-1230.

Serfling, R. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.

Vieu, P. and Hart, J. (1989). Nonparametric regression under dependence: A class of asymptotically optimal data-driven bandwidths. To appear.

Watson, G. S. (1964). Smooth regression analysis. Sankhya, Series A, 26, 359-372.

Whittle, P. (1960). Bounds for the moments of linear and quadratic forms in independent variables. Theory of Probability and its Applications, 5, 302-305.
Table 1: The sample MSE of ĥ_CV(ℓ)/h_M and ĥ_PCV(g)/h_M for the positively correlated observations with a large amount of sample variability.

Ratio          Variance    Bias-square    MSE
ĥ_A/h_M        0.067989    0.000839       0.068828

ĥ_CV(ℓ)/h_M:
ℓ Value    Variance    Bias-square    MSE
 0         0.015217    0.340901       0.356118
 1         0.094015    0.157793       0.251808
 2         0.129529    0.066091       0.195620
 3         0.142950    0.035485       0.178436
 4         0.144861    0.022526       0.167387
 5         0.150511    0.016509       0.167020
 6         0.152457    0.013700       0.166157
 7         0.153662    0.011340       0.165001
 8         0.152041    0.010281       0.162322
 9         0.154379    0.009622       0.164000
10         0.148788    0.008326       0.157113
11         0.148136    0.008082       0.156217
12         0.145901    0.007954       0.153855
13         0.142961    0.005565       0.148526
14         0.143305    0.004422       0.147727

ĥ_PCV(g)/h_M:
g Value    Variance    Bias-square    MSE
 1         0.014969    0.342923       0.357892
 2         0.051444    0.285233       0.336677
 3         0.079858    0.186568       0.266425
 4         0.082228    0.127764       0.209992
 5         0.075802    0.091330       0.167132
 6         0.072712    0.071990       0.144702
 7         0.065812    0.059694       0.125507
 8         0.063662    0.055051       0.118712
 9         0.059589    0.049922       0.109511
10         0.056709    0.050466       0.107175
11         0.054767    0.049093       0.103862
12         0.054734    0.046613       0.101347
13         0.052097    0.046844       0.098941
14         0.046694    0.048023       0.094718
15         0.045878    0.048859       0.094738