APPROXIMATIONS TO THE MEAN INTEGRATED SQUARED ERROR
WITH APPLICATIONS TO OPTIMAL BANDWIDTH SELECTION
FOR NONPARAMETRIC REGRESSION FUNCTION ESTIMATORS
Wolfgang Härdle*
Universität Heidelberg
Sonderforschungsbereich 123
Im Neuenheimer Feld 293
D-6900 Heidelberg 1

University of North Carolina
Department of Statistics
321 Phillips Hall 039A
Chapel Hill, NC 27514

AMS 1980 Subject Classifications: Primary 60F05, Secondary 62G05

KEYWORDS AND PHRASES: stochastic measure of accuracy, nonparametric regression function estimation, optimal bandwidth selection, limit theorems, mean square error.
*Research partially supported by the "Deutsche Forschungsgemeinschaft", SFB 123, "Stochastische Mathematische Modelle". Research partially supported by the Air Force Office of Scientific Research Contract AFOSR-F49620-82-C-0009.
SUMMARY
Discrete versions of the Mean Integrated Squared Error (MISE) provide stochastic measures of accuracy to compare different estimators of regression functions. These measures of accuracy have been used in Monte Carlo trials and have been employed for the optimal bandwidth selection for kernel regression function estimators, as shown in Härdle and Marron (1983). In the present paper it is shown that these stochastic measures of accuracy converge to a weighted version of the MISE of kernel regression function estimators, extending a result of Hall (1982) and Marron (1983) to regression function estimation.
1. INTRODUCTION AND BACKGROUND
Let $(X_1,Y_1), (X_2,Y_2), \ldots$ be independent random vectors distributed as $(X,Y)$ with common joint probability density function $f(x,y)$ and let $m(x) = E(Y \mid X=x) = \int y\,f(x,y)\,dy / f_X(x)$, $f_X$ the marginal density of $X$, be the regression curve of $Y$ on $X$. Let $m_n^*(x)$ denote the nonparametric kernel estimate of $m(x)$, as introduced by Nadaraya (1964) and Watson (1964),

(1.1)    $m_n^*(x) = m_n(x)/f_n(x),$

where

$m_n(x) = n^{-1}h^{-1}\sum_{i=1}^{n} K((x-X_i)/h)\,Y_i$

and

$f_n(x) = n^{-1}h^{-1}\sum_{i=1}^{n} K((x-X_i)/h).$
Here K is a kernel function and $h = h(n)$ is a sequence of "bandwidths" converging to zero as n tends to infinity.
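The estimator (1.1) is straightforward to compute. The following sketch is only illustrative and is not part of the original development: the quartic kernel and the simulated data are arbitrary choices satisfying (A4) below.

```python
import numpy as np

def quartic_kernel(u):
    # Quartic (biweight) kernel: integrates to 1, support [-1, 1],
    # differentiable with K' of bounded variation, as required by (A4).
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1 - u**2) ** 2, 0.0)

def nadaraya_watson(x, X, Y, h, kernel=quartic_kernel):
    # m*_n(x) = m_n(x) / f_n(x); the common factor (nh)^{-1} of the
    # numerator and the denominator cancels in the ratio.
    w = kernel((x[:, None] - X[None, :]) / h)  # w[k, i] = K((x_k - X_i)/h)
    return w @ Y / w.sum(axis=1)

# Illustrative data: m(x) = sin(2*pi*x), Gaussian noise, uniform design.
rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 1.5, size=400)   # support of f_X properly contains [0,1]
Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=400)
m_star = nadaraya_watson(np.linspace(0, 1, 101), X, Y, h=0.1)
```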
This estimator was studied by Rosenblatt (1969), who derived bias, variance and asymptotic normality; Schuster (1972) demonstrated the multivariate normality at a finite number of distinct points. For further results we refer to the bibliography of Collomb (1981).
In the present paper we show that

(1.2)    $A_n^*(h) = n^{-1}\sum_{j\in J}\,[m_n^*(X_j) - m(X_j)]^2, \qquad J = \{j : X_j \in [0,1]\},$

a stochastic measure of accuracy for the estimate $m_n^*$, exhibits the same limiting behaviour as the deterministic measure

(1.3)    $\mathrm{MISE} = \int_0^1 \mathrm{MSE}(t)\,f_X(t)\,dt,$

where MSE(t) is the mean squared error (MSE) of $m_n^*(t)$. The proper definition of the MSE for $m_n^*$ will be delayed to section 2.
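In a Monte Carlo trial, where m is known, the measure (1.2) can be computed directly from the sample. A minimal sketch, reusing `nadaraya_watson` and the simulated data from the sketch above:

```python
def stochastic_accuracy(X, Y, h, m):
    # A*_n(h) = n^{-1} * sum over J = {j : X_j in [0,1]} of
    # [m*_n(X_j) - m(X_j)]^2.  Evaluating at the design points
    # weights the squared error by f_X, as in the weighted MISE (1.3).
    J = (X >= 0) & (X <= 1)
    fitted = nadaraya_watson(X[J], X, Y, h)
    return np.sum((fitted - m(X[J])) ** 2) / len(X)

A_star = stochastic_accuracy(X, Y, h=0.1, m=lambda x: np.sin(2 * np.pi * x))
```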
The result of this paper addresses two problems. Firstly, in a survey paper, Wegman (1972) was interested in comparing the Mean Integrated Squared Error (MISE) of several different density estimators. As Wegman pointed out, the computation of the actual MISE can be quite tedious. Hence, Wegman used an empirical measure of accuracy of the structure as in formula (1.2) and gave some heuristic justification. Now, since the bias/variance decomposition of regression function estimators is rather similar to that of density estimators (Rosenblatt (1969), (1971)), it may be argued that Wegman's heuristics hold also in the regression function estimation setting.
The answer is positive: it is shown here that, as $n \to \infty$,

(1.4)    $A_n^*(h) = \mathrm{MISE} + o_p(\mathrm{MISE})$, uniformly over $h \in [\underline{h}, \bar{h}]$.

The appealing feature of this approximation is that it holds uniformly in $h \in [\underline{h}, \bar{h}]$. A Monte Carlo trial comparing different estimators of m(x) (with respect to MISE) at different sequences of bandwidths can thus be based on $A_n^*(h)$, which is faster to compute than the MISE as defined in (1.3).
Secondly, the approximation (1.4) contributes to the solution of the "optimal bandwidth selection" problem. As the optimal bandwidth h* we understand that sequence $h = h(n)$ which minimizes the MISE for each n. Härdle and Marron (1983) demonstrated by a cross-validation argument that minimization (with respect to h) of $A_n^*(h)$ is asymptotically equivalent to minimization of

(1.5)    $CV(h) = n^{-1}\sum_{j\in J}\,[Y_j - m_n^{*(j)}(X_j)]^2,$

where

$m_n^{*(j)}(x) = n^{-1}h^{-1}\sum_{i\ne j} K((x-X_i)/h)\,Y_i \Big/ f_n^{(j)}(x)$

is the "leave-one-out" estimator. So the result of this paper, as stated in (1.4), ensures that the minimization of (1.5) with respect to h yields the (MISE-)optimal sequence of bandwidths h* and solves, as is shown in Härdle and Marron, a problem raised by Stone (1982), Question 3, p. 1054.
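A hedged sketch of this selection rule, under the same illustrative setting as before (the bandwidth grid is an arbitrary choice, and the leave-one-out fits are obtained by zeroing each observation's own weight):

```python
def cv_score(X, Y, h, kernel=quartic_kernel):
    # CV(h) = n^{-1} * sum over J = {j : X_j in [0,1]} of
    # [Y_j - m*(j)_n(X_j)]^2, where the j-th observation is left
    # out of both numerator and denominator.
    W = kernel((X[:, None] - X[None, :]) / h)
    np.fill_diagonal(W, 0.0)              # the "leave-one-out" step
    loo = W @ Y / W.sum(axis=1)
    J = (X >= 0) & (X <= 1)
    return np.sum((Y[J] - loo[J]) ** 2) / len(X)

bandwidths = np.linspace(0.05, 0.5, 46)
h_star = bandwidths[np.argmin([cv_score(X, Y, h) for h in bandwidths])]
```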
We will not only analyze $m_n^*(x)$, as defined in (1.1), but also

(1.6)    $m_n(x)/f_X(x),$

where $f_X$ denotes the marginal density of X. This estimator of m(x) is reasonable if we know the marginal density, and it is somewhat more tractable than $m_n^*$. The estimator (1.6) was studied by Johnston (1982), who also observed that $m_n/f_X$ has in general a higher asymptotic variance than $m_n^*$.
The stochastic measure of accuracy (1.2) was defined only on the interval [0,1]. It will later be assumed that the support of $f_X$ properly contains this interval. This is due to "boundary effects"; more precisely, the bias at the endpoints of the support of $f_X$ inflates and has a slower rate than in the interior (Gasser and Müller, 1979; Rice and Rosenblatt, 1983). Thus, defining the MISE over the whole support of $f_X$ would ultimately lead to the unappealing situation that the optimal bandwidth with respect to MISE would be determined in such a way that it minimizes the mean squared error at the boundaries, since that is of lower order. The estimate in the interior would thus exhibit suboptimal behaviour.
The results of this paper are improvements over some previous work for several reasons. Firstly, we do not need such strong smoothness assumptions on $f_X$ as in Hall (1982), who proves similar results in the density estimation setting. Secondly, our assumptions on the variance curve $V^2(t) = \mathrm{var}(Y \mid X=t)$ and the range of allowable bandwidths are considerably weaker than those in Johnston (1982), who demonstrates a Gaussian approximation to $(nh)^{\frac{1}{2}}[m_n - E\,m_n]$ along the same lines as Bickel and Rosenblatt (1973). Thirdly, our work extends the result of Wong (1982), who deals only with the fixed design case, i.e. the $X_i$ are nonrandom. Finally, we may note that Hall's proof would simplify if one uses the approximation provided by the Bickel and Rosenblatt paper and the outline of the proof given here for regression function estimators.
Note that although only the two-dimensional case is considered here, the proof can certainly be extended to the higher-dimensional case where we observe a (d+1)-dimensional random vector $(X_1,\ldots,X_d,Y)$, $d > 1$. The assumptions will be different in that case, since it is still unknown whether the multivariate empirical process can be strongly approximated by Brownian bridges with compatible rates as in the univariate or bivariate case. This approximation technique by Brownian bridges, as carried out in the appendix, is vital to our results. A similar technique, exploiting the idea of invariance principles in nonparametric regression, was used by Mack and Silverman (1982), who showed weak and strong uniform consistency (in sup-norm) of $m_n^*$.

The paper is organized as follows. First we prove that $m_n(t) - E\,m_n(t)$ can be uniformly (in t and h) approximated by a Gaussian process similar to that occurring in Bickel and Rosenblatt (1973), p. 1074, formula (2.5). Secondly, we plug this approximating process into the formula (1.2), which defined the discrete version of the MISE, and by evaluation of covariances and higher moments we finally arrive at the deterministic measure (1.3).

2. RESULTS
We will make use of the following definition.
Definition. A function w is called Lipschitz-continuous of order $\alpha$ (LC($\alpha$)) iff, with a constant $L_w$,

$|w(t) - w(t')| \le L_w\,|t-t'|^{\alpha}, \qquad 0 < \alpha \le 1.$
The following assumptions fix the range of allowable bandwidths $[\underline{h}, \bar{h}]$, determine the kernel function K and describe some smoothness of $m(t)$, $\mathrm{var}(Y \mid X=t)$ and $f_X(t)$.

(A1) Let $\{\underline{h}_n\}$ denote a sequence for which there is an $\varepsilon > 0$ so that

$\lim_{n\to\infty} \underline{h}_n\,n^{\frac{1}{2}-\varepsilon}/\log n = 0, \qquad \lim_{n\to\infty} \underline{h}_n\,n^{\frac{1}{2}} = \infty,$

and let $\{\bar{h}_n\}$ denote a sequence for which

$\lim_{n\to\infty} \bar{h}_n = 0, \qquad \lim_{n\to\infty} \bar{h}_n\,\log n = \infty.$

Assume of $h = h(n)$ that it satisfies $\underline{h} \le h \le \bar{h}$.

(A2) There exists a sequence of positive constants $\{a_n\}$, $a_n = O(\{\log(1/\underline{h})\}^{\frac{1}{2}})$, such that

$\lim_{n\to\infty}\ \sup_{\underline{h}\le h\le\bar{h}}\ h^{-3}\int_{|y|>a_n} y^2\,f_Y(y)\,dy = 0, \qquad f_Y \text{ the marginal density of } Y,$

$\lim_{n\to\infty}\ \sup_{0\le x\le 1}\ \int_{|y|>a_n} y^2\,f(x,y)\,dy = 0, \qquad \int y^2\,f(x,y)\,dy \le c < \infty \text{ for all } 0 \le x \le 1.$

(A3) The functions $S^2(t) = E[Y^2 \mid X=t]$, $f_X(t)$ and $m(t)$ are LC($\alpha$) with $\alpha > \frac{1}{2}$ and are all of bounded variation. The marginal density of X is bounded from below: $\inf_{0\le t\le 1} f_X(t) \ge \gamma > 0$. Furthermore, with $g_n(t) = \int_{|y|\le a_n} y^2\,f(t,y)\,dy$,

$\sup_{\underline{h}\le h\le\bar{h}}\ \int_{-a_n}^{a_n} \big|d_u[g_n(uh)]\big| = O\big(\{\log(1/\underline{h})\}^{\frac{1}{2}}\big).$

(A4) The kernel function K is differentiable with K' of bounded variation and fulfills

$\int K(u)\,du = 1, \qquad \mathrm{support}\{K\} \subset [-A,A].$
By straightforward computations it can be shown that $g(t) = S^2(t)\,f_X(t)$ is LC($\alpha$), $\alpha > \frac{1}{2}$, and of bounded variation by assumption (A3) on $S^2(t)$ and $f_X(t)$. It is also not hard to see that if $g$ is LC(1) the last condition in (A3) follows. Note that the set of assumptions in (A2) holds if Y is bounded ($a_n = \log\log n$), an assumption that is often made in other papers to avoid conditions on moments of Y as in (A2). (A2) also holds if $a_n = n^{\beta}$, $\beta$ small, while (X,Y) are jointly normally distributed. For simplicity of notation, we will not explicitly write the indices of $\underline{h}, h, \bar{h}$.
The following results show that the approximation (1.4) holds for both $m_n/f_X$ and $m_n^*$. Only the proof of Theorem 1 (dealing with $m_n/f_X$) will be given in full detail, since the result for $m_n^*$ can be obtained quite analogously.

Let us define $B_K = \int_{-A}^{A} K^2(u)\,du$ and

$b_n(t) = f_X^{-1}(t)\int_{-A}^{A} K(u)\,[m(t-uh)\,f_X(t-uh) - m(t)\,f_X(t)]\,du,$

the bias of $m_n/f_X$.
Theorem 1

Assume that (A1) to (A4) hold and that $b_n(t)$ is of bounded variation. Then, uniformly over $h \in [\underline{h}, \bar{h}]$,

$\hat{A}_n(h) = n^{-1}\sum_{j\in J}\,[m_n(X_j)/f_X(X_j) - m(X_j)]^2$
$\qquad = (nh)^{-1}\,B_K\int_0^1 S^2(t)\,dt + \int_0^1 [b_n(t)]^2\,f_X(t)\,dt + o_p\Big((nh)^{-1} + \int_0^1 [b_n(t)]^2\,dt\Big)$
$\qquad = \mathrm{MISE}[m_n/f_X] + o_p(\mathrm{MISE}).$
Assume that $f_X$ is d times continuously differentiable and that m is d times continuously differentiable. Then, as in Rosenblatt (1971), the bias $b_n(t)$ would read as

$b_n(t) = h^d\,\kappa_d\,(m\,f_X)^{(d)}(t)/f_X(t) + o(h^d),$

provided that K satisfies $\int u^j K(u)\,du = 0$, $j = 1,\ldots,d-1$, and $\int u^d K(u)\,du = d!\,\kappa_d$. Many papers in nonparametric regression function estimation assume such a kind of differentiability as above and deal with methods to balance the contributions from the variance and the bias (see Collomb (1981) for a review).
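As a numerical aside (not part of the original argument), such moment conditions can be checked by quadrature. For the quartic kernel of the earlier sketch, $\int uK(u)\,du = 0$ by symmetry and $\int u^2 K(u)\,du = 1/7$, so $d = 2$ and $\kappa_2 = 1/14$ in the notation above:

```python
from scipy.integrate import quad

def kernel_moment(j, A=1.0):
    # \int_{-A}^{A} u^j K(u) du for the quartic kernel on [-1, 1].
    return quad(lambda u: u**j * 15.0 / 16.0 * (1 - u**2) ** 2, -A, A)[0]

moments = [kernel_moment(j) for j in range(3)]   # approx. [1.0, 0.0, 1/7]
kappa_2 = moments[2] / 2                         # d! * kappa_d with d = 2
```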
In a similar manner define $b_n^*(t)$, the bias of $m_n^*(t)$, as follows:

$b_n^*(t) = f_X^{-1}(t)\int_{-A}^{A} K(u)\,[m(t-uh) - m(t)]\,f_X(t-uh)\,du,$

where the expression "bias" has to be understood as the expected value of $f_X^{-1}[m_n - m\,f_n]$, $f_n(t) = n^{-1}h^{-1}\sum_{i=1}^{n} K((t-X_i)/h)$ a density estimate of the marginal density $f_X$. This is justified by the observation that

$m_n^* - m = f_X^{-1}[m_n - m\,f_n] + o_p(m_n - m\,f_n)$

(see Härdle and Marron (1983)) and that moments of $m_n^*$ need not exist in general (Rosenblatt (1969)).
The next theorem shows how $A_n^*(h)$ approximates the MISE.

Theorem 2

Assume that (A1) to (A4) hold and that $b_n^*(t)$ is of bounded variation. Then, uniformly over $h \in [\underline{h}, \bar{h}]$,

$A_n^*(h) = n^{-1}\sum_{j\in J}\,[m_n^*(X_j) - m(X_j)]^2$
$\qquad = (nh)^{-1}\,B_K\int_0^1 V^2(t)\,dt + \int_0^1 [b_n^*(t)]^2\,f_X(t)\,dt + o_p\Big((nh)^{-1} + \int_0^1 [b_n^*(t)]^2\,dt\Big)$
$\qquad = \mathrm{MISE}[m_n^*] + o_p(\mathrm{MISE}),$

where $V^2(t) = S^2(t) - m^2(t)$.
Note that the variance terms and the bias terms of the two estimators $m_n/f_X$ and $m_n^*$ are completely different. Since $V^2(t) \le S^2(t)$, the Nadaraya-Watson estimator $m_n^*(t)$ attains in general a smaller (asymptotic) variance than $m_n/f_X$. This was also observed by Johnston (1982). The condition "$nh^5 \to 0$", appearing in the work of the latter, implies that the bias vanishes asymptotically faster than the variance; therefore, any difference in bias terms does not show up in that work. It would be interesting to find a similar comparison of bias terms, but this would lead to complicated and rather unnatural assumptions on derivatives of m and $f_X$, as can be seen from the formula for $b_n$ following Theorem 1.
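This variance comparison is visible in a small simulation. The sketch below (again with the illustrative data of section 1, and with the uniform design density $f_X \equiv 1/2$ on $[-0.5, 1.5]$ assumed known) contrasts the discrete errors of the two estimators:

```python
def m_over_fX(x, X, Y, h, fX, kernel=quartic_kernel):
    # The estimator (1.6): kernel average m_n(x) divided by the *true*
    # marginal density f_X(x) instead of the estimate f_n(x).
    w = kernel((x[:, None] - X[None, :]) / h)
    return (w @ Y) / (len(X) * h * fX(x))

J = (X >= 0) & (X <= 1)
m_true = np.sin(2 * np.pi * X[J])
err_NW = np.sum((nadaraya_watson(X[J], X, Y, 0.1) - m_true) ** 2) / len(X)
err_fX = np.sum((m_over_fX(X[J], X, Y, 0.1, lambda x: np.full_like(x, 0.5))
                 - m_true) ** 2) / len(X)
# Since V^2(t) <= S^2(t), err_NW is typically the smaller of the two.
```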
3. THE PROOFS

We shall prove Theorem 1 in full detail; the proof of Theorem 2 will only be sketched, since the technical details are similar to those of the proof of Theorem 1. F(x,y) will denote the joint cumulative distribution function (df) of (X,Y) and $F_n(x,y)$ will denote the two-dimensional empirical df, defined as usual. It is understood throughout these proofs that the o, O in remainder terms are uniform over $h \in [\underline{h}, \bar{h}]$.
Proof of Theorem 1

The basic decomposition is

(3.1)    $m_n(t)/f_X(t) - m(t) = Y_n(t) + b_n(t),$

where

$Y_n(t) = f_X^{-1}(t)\,h^{-1}\iint_{-\infty}^{\infty} y\,K((t-x)/h)\,d[F_n(x,y) - F(x,y)].$

In the appendix it is shown that

$Y_{0,n}(t) = [S^2(t)/f_X(t)]^{-\frac{1}{2}}\,Y_n(t) = n^{-\frac{1}{2}}h^{-1}\int_{-\infty}^{\infty} K((t-x)/h)\,dW(x) + o_p(n^{-\frac{1}{2}}h^{-\frac{1}{2}}),$

where the remainder term is uniform in t. The basic decomposition (3.1) now reads

(3.2)    $m_n(t)/f_X(t) - m(t) = n^{-\frac{1}{2}}h^{-\frac{1}{2}}\,V_n(t) + b_n(t) + \rho_n(t),$

where $\rho_n(t) = o_p(n^{-\frac{1}{2}}h^{-\frac{1}{2}})$ uniformly in t and

(3.3)    $V_n(t) = [S^2(t)/f_X(t)]^{\frac{1}{2}}\,h^{-\frac{1}{2}}\int_{-\infty}^{\infty} K((t-x)/h)\,dW(x).$

Using (3.2) and (3.3), the stochastic measure of accuracy is then

$\hat{A}_n(h) = \int_0^1 [b_n(t)]^2\,dF_{X,n}(t) + n^{-1}h^{-1}\int_0^1 V_n^2(t)\,dF_{X,n}(t) + 2n^{-\frac{1}{2}}h^{-\frac{1}{2}}\int_0^1 b_n(t)\,V_n(t)\,dF_{X,n}(t)$
$\qquad + \rho_n\Big\{2\Big[\int_0^1 b_n(t)\,dF_{X,n}(t) + n^{-\frac{1}{2}}h^{-\frac{1}{2}}\int_0^1 V_n(t)\,dF_{X,n}(t)\Big] + \rho_n\Big\},$

where $F_{X,n}$ denotes the empirical distribution function of $\{X_i\}_{i=1}^{n}$.
This can be rewritten as

$\hat{A}_n(h) = n^{-1}\sum_{j\in J}\,[b_n(X_j)]^2 + n^{-1}h^{-1}\,(U_{n1} + U_{n2}) + 2n^{-\frac{1}{2}}h^{-\frac{1}{2}}\,(U_{n3} + U_{n4}) + \rho_n\{\cdots\},$

with $\rho_n\{\cdots\}$ the remainder term of the preceding display, and where

$U_{n1} = \int_0^1 V_n^2(t)\,f_X(t)\,dt,$
$U_{n2} = \int_0^1 V_n^2(t)\,d[F_{X,n}(t) - F_X(t)],$
$U_{n3} = \int_0^1 V_n(t)\,b_n(t)\,f_X(t)\,dt,$
$U_{n4} = \int_0^1 V_n(t)\,b_n(t)\,d[F_{X,n}(t) - F_X(t)].$
We now show that the limits of $U_{ni}$, $i = 1,2,3,4$, give us the desired limit behaviour of $\hat{A}_n(h)$. We may note that the approximations, as carried out in Bickel and Rosenblatt (1973), would have led to a process similar to $V_n(t)$ when estimating a density. So the technique developed here would be useful in density estimation also and would provide an alternative proof of Hall's (1982) result on stochastic measures of accuracy for density estimators.
Let us begin with the limit behaviour of $U_{n1}$. Note first that

$E\,U_{n1} = \int_0^1 E\Big\{h^{-\frac{1}{2}}\int_{-\infty}^{\infty} K((t-x)/h)\,dW(x)\Big\}^2\,S^2(t)\,dt = \int_0^1 h^{-1}\int_{-\infty}^{\infty} K^2((t-x)/h)\,dx\ S^2(t)\,dt$
$\qquad = \int_0^1\int_{-A}^{A} K^2(u)\,S^2(t-uh)\,du\,dt + o(1) = B_K\int_0^1 S^2(t)\,dt + o(1),$

where the remainder term is uniform in h, since $S^2(t)$ is LC($\alpha$), $\alpha > \frac{1}{2}$, by assumption (A3).
To show that

(3.4)    $U_{n1} \xrightarrow{P} \int_{-A}^{A} K^2(u)\,du\,\int_0^1 S^2(t)\,dt,$

we demonstrate $E(U_{n1}^2) \to (E\,U_{n1})^2$. The statement (3.4) will then follow from Chebyshev's inequality. Since $Z(t) = h^{-\frac{1}{2}}\int_{-\infty}^{\infty} K((t-x)/h)\,dW(x)$ is a Gaussian process, we conclude by the Isserlis (1918) formula that $E(U_{n1}^2)$ splits into two summands. The first summand tends to $(E\,U_{n1})^2$ by assumption (A4) on the kernel K. The second summand vanishes asymptotically, by evaluation of the integral inside the brackets. This shows that $U_{n1} = B_K\int_0^1 S^2(t)\,dt + o_p(1)$.
Next we show that

(3.5)    $U_{n2} = o_p(1)$.

We obtain by partial integration

$h\,U_{n2} = -2\int_0^1 H_n(t)\,q(t)\,Z_n(t)\,\Big[h^{-1}\int_{-\infty}^{\infty} K'((t-x)/h)\,dW(x)\Big]\,dt - \int_0^1 H_n(t)\,Z_n^2(t)\,dq(t) + H_n(t)\,q(t)\,Z_n^2(t)\Big|_0^1,$

where $q(t) = S^2(t)/f_X(t)$. Now, since $H_n(t) = O_p(n^{-\frac{1}{2}})$ uniformly in t and $V_n^2(t_0) = O_p(1)$, $t_0 = 0,1$, as is easily verified by Chebyshev's inequality, we only have to consider the first two summands in the equality above. These are further estimated by Schwarz's inequality:

$|h\,U_{n2}| \le 2\,S_1\,\sup_{0\le t\le 1}|H_n(t)|\,\Big[\int_0^1 Z_n^2(t)\,dt\Big]^{\frac{1}{2}}\Big[\int_0^1\Big[h^{-\frac{1}{2}}\int_{-\infty}^{\infty} K'((t-x)/h)\,dW(x)\Big]^2\,dt\Big]^{\frac{1}{2}}$
$\qquad + S_2\,\sup_{0\le t\le 1}|H_n(t)|\,\sup_{0\le t\le 1}|Z_n^2(t)|\,\int_0^1 |dq(t)|,$

where $S_1 = \sup_{0\le t\le 1} q^{\frac{1}{2}}(t)$ and $S_2 = \sup_{0\le t\le 1} q(t)$. By Chebyshev's inequality we have

$\int_0^1\Big[h^{-\frac{1}{2}}\int_{-\infty}^{\infty} L((t-x)/h)\,dW(x)\Big]^2\,dt = O_p(1),$

where L is either K or K'. Integration by parts applied to $Z_n^2(t)$ shows immediately that $\sup_{0\le t\le 1} Z_n^2(t) = O_p(1)$; therefore (3.5) holds.
Now, since $E\,U_{n3}^2 = O\big(\int_0^1 [b_n(t)]^2\,dt\big)$ by an application of Schwarz's inequality, we conclude that

(3.6)    $U_{n3} = O_p\Big(\Big[\int_0^1 [b_n(t)]^2\,dt\Big]^{\frac{1}{2}}\Big).$
The term $U_{n4}$ is estimated again by a partial integration argument:

$U_{n4} = h^{-\frac{1}{2}}\int_0^1 H_n(t)\,b_n(t)\,h^{-1}\,q(t)\int_{-\infty}^{\infty} K'((t-x)/h)\,dW(x)\,dt + h^{-\frac{1}{2}}\int_0^1 H_n(t)\,b_n(t)\,Z_n(t)\,dq(t)$
$\qquad + h^{-\frac{1}{2}}\int_0^1 H_n(t)\,q(t)\,Z_n(t)\,db_n(t) + H_n(t)\,V_n(t)\,b_n(t)\Big|_0^1$
$\quad = T_{1n} + T_{2n} + T_{3n} + T_{4n},$

where, as in the computations for $U_{n2}$, $H_n(t) = F_{X,n}(t) - F_X(t)$ and $Z_n(t) = \int_{-\infty}^{\infty} K((t-x)/h)\,dW(x)$. The last summand $T_{4n}$ is obviously $O_p(n^{-\frac{1}{2}}) = o_p(n^{-\frac{1}{2}}h^{-\frac{1}{2}})$ by (A1).
The first term, $T_{1n}$, can be estimated as follows:

$|T_{1n}| \le h^{-1}\,\sup_{0\le t\le 1}|H_n(t)|\,\sup_{0\le t\le 1} q(t)\,\Big[\int_0^1 [b_n(t)]^2\,dt\Big]^{\frac{1}{2}}\Big[\int_0^1\Big[h^{-\frac{1}{2}}\int_{-\infty}^{\infty} K'((t-x)/h)\,dW(x)\Big]^2\,dt\Big]^{\frac{1}{2}}.$

Now, since

$\int_0^1\Big[h^{-\frac{1}{2}}\int_{-\infty}^{\infty} K'((t-x)/h)\,dW(x)\Big]^2\,dt = O_p(1)$

and $n^{\frac{1}{2}}\,\sup_{0\le t\le 1}|H_n(t)| = O_p(1)$, we conclude that $T_{1n} = O_p\big(n^{-\frac{1}{2}}h^{-1}\big[\int_0^1 [b_n(t)]^2\,dt\big]^{\frac{1}{2}}\big)$. The terms $T_{2n}$ and $T_{3n}$ are estimated in a similar fashion as we did estimate the terms of $U_{n2}$, employing the Lipschitz continuity of $b_n(t)$ and $q(t)$, and we thus obtain

$T_{2n} = O_p(n^{-\frac{1}{2}}) = o_p(n^{-\frac{1}{2}}h^{-\frac{1}{2}}), \qquad T_{3n} = O_p(n^{-\frac{1}{2}}) = o_p(n^{-\frac{1}{2}}h^{-\frac{1}{2}}).$

This shows finally that

(3.8)    $2n^{-\frac{1}{2}}h^{-\frac{1}{2}}\,U_{n4} = o_p\Big((nh)^{-1} + \int_0^1 [b_n(t)]^2\,dt\Big).$
It remains to show that

(3.9)    $\int_0^1 [b_n(t)]^2\,d[F_{X,n}(t) - F_X(t)] = o_p\Big((nh)^{-1} + \int_0^1 [b_n(t)]^2\,dt\Big).$

Again by partial integration, the LHS of (3.9) is

$-2\int_0^1 H_n(t)\,b_n(t)\,db_n(t) + H_n(t)\,b_n^2(t)\Big|_0^1.$

As before, the last summand is $O_p(n^{-\frac{1}{2}})$ and so is the first summand.
Now, putting together (3.5) to (3.9), we finally have that

$\hat{A}_n(h) = \int_0^1 [b_n(t)]^2\,f_X(t)\,dt + n^{-1}h^{-1}\,B_K\int_0^1 S^2(t)\,dt + o_p\Big(n^{-1}h^{-1} + \int_0^1 [b_n(t)]^2\,dt\Big),$

which proves the theorem.
Proof of Theorem 2

This proof goes mainly along the lines of the proof of Theorem 1. From Härdle and Marron (1983), formula (2.4), we have

(3.10)    $m_n^*(t) - m(t) = Y_n^*(t) + b_n^*(t) + o_p(n^{-\frac{1}{2}}h^{-\frac{1}{2}}),$

where

$b_n^*(t) = f_X^{-1}(t)\,h^{-1}\int_{-\infty}^{\infty} K((t-u)/h)\,[m(u) - m(t)]\,f_X(u)\,du$

and

$Y_n^*(t) = f_X^{-1}(t)\,h^{-1}\iint_{-\infty}^{\infty} [y - m(t)]\,K((t-x)/h)\,d[F_n(x,y) - F(x,y)].$

This process can now be approximated as $Y_n(t)$ (see the appendix), but with $V^2(t) = S^2(t) - m^2(t)$ in the place of $S^2(t)$. So we obtain that

$Y_{0,n}^*(t) = [V^2(t)/f_X(t)]^{-\frac{1}{2}}\,Y_n^*(t) = n^{-\frac{1}{2}}h^{-1}\int_{-\infty}^{\infty} K((t-x)/h)\,dW(x) + o_p(n^{-\frac{1}{2}}h^{-\frac{1}{2}})$

uniformly in t. The decomposition (3.10) then reads as

(3.11)    $m_n^*(t) - m(t) = b_n^*(t) + n^{-\frac{1}{2}}h^{-\frac{1}{2}}\,V_n^*(t) + \rho_n^*(t),$

where $\rho_n^*(t) = o_p(n^{-\frac{1}{2}}h^{-\frac{1}{2}})$ uniformly in t and

$V_n^*(t) = [V^2(t)/f_X(t)]^{\frac{1}{2}}\,h^{-\frac{1}{2}}\int_{-\infty}^{\infty} K((t-x)/h)\,dW(x).$

We then carry out the same procedures as for $V_n(t)$ in the proof of Theorem 1.
4. APPENDIX

It is shown here that the variance terms in (3.1) can be approximated by a sequence of Gaussian processes. The crucial step in these approximations is provided by the following lemma, due to Tusnády (1977).

Lemma 1

Let $T(x,y) = (F_X, F_{Y|X})(x,y)$ be the Rosenblatt transformation (Rosenblatt, 1952). Then on a suitable probability space there exists a sequence of Brownian bridges $B_n(x',y')$ on $[0,1]\times[0,1]$ such that

$\sup_{x,y}\,\big|\,[F_n(x,y) - F(x,y)] - n^{-\frac{1}{2}}B_n(T(x,y))\,\big| = O_p\big(n^{-1}(\log n)^2\big).$

It is next shown that $Y_n(t)$ can be approximated (uniformly in t) by Gaussian processes. For this define

$Y_{0,n}(t) = [S^2(t)/f_X(t)]^{-\frac{1}{2}}\,Y_n(t),$

$Y_{1,n}(t) = [S^2(t)\,f_X(t)]^{-\frac{1}{2}}\,h^{-1}\iint_{|y|\le a_n} y\,K((t-x)/h)\,d[F_n(x,y) - F(x,y)],$

$Y_{2,n}(t) = [S_n^2(t)\,f_X(t)]^{-\frac{1}{2}}\,h^{-1}\iint_{|y|\le a_n} y\,K((t-x)/h)\,d[F_n(x,y) - F(x,y)],$

where $S_n^2(t) = f_X^{-1}(t)\int_{|y|\le a_n} y^2\,f(t,y)\,dy$ denotes the truncated conditional second moment, so that $g_n(t) = S_n^2(t)\,f_X(t)$ is the function appearing in (A3),

$Y_{3,n}(t) = [S_n^2(t)\,f_X(t)]^{-\frac{1}{2}}\,h^{-1}\,n^{-\frac{1}{2}}\iint_{|y|\le a_n} y\,K((t-x)/h)\,dB_n(T(x,y)),$

where $\{B_n\}$ is the sequence of Brownian bridges as in Lemma 1,

$Y_{4,n}(t) = [S_n^2(t)\,f_X(t)]^{-\frac{1}{2}}\,h^{-1}\,n^{-\frac{1}{2}}\iint_{|y|\le a_n} y\,K((t-x)/h)\,dW_n(T(x,y)),$

where $\{W_n\}$ is a sequence of Wiener processes used in constructing $\{B_n\}$ as in Tusnády (1977),

$Y_{5,n}(t) = [S_n^2(t)\,f_X(t)]^{-\frac{1}{2}}\,h^{-1}\,n^{-\frac{1}{2}}\int_{-\infty}^{\infty} [S_n^2(x)\,f_X(x)]^{\frac{1}{2}}\,K((t-x)/h)\,dW(x),$

$Y_{6,n}(t) = n^{-\frac{1}{2}}\,h^{-1}\int_{-\infty}^{\infty} K((t-x)/h)\,dW(x),$

where W(x) is a standard Wiener process on $(-\infty,\infty)$. For the following lemmas, $\|Y\|$ will denote $\sup_{0\le t\le 1} |Y(t)|$.
Lemma 2

$(nh)^{\frac{1}{2}}\,\|Y_{0,n} - Y_{1,n}\| \xrightarrow{P} 0.$

Proof

We have to show that $\|U_n\| \xrightarrow{P} 0$, where

$U_n(t) = n^{\frac{1}{2}}\,h^{-\frac{1}{2}}\iint_{|y|>a_n} y\,K((t-x)/h)\,d[F_n(x,y) - F(x,y)] = \sum_{i=1}^{n} X_{n,i}(t).$

Note that $E\,X_{n,i}(t) = 0$ for all t and that the $X_{n,i}(\cdot)$ are independent, identically distributed for each n. Therefore

(4.2)    $E\,X_{n,i}^2(t) \le n^{-1}h^{-1}\,\sup|K|^2\int_{|y|>a_n} y^2\,f_Y(y)\,dy,$

which establishes $U_n(t) \xrightarrow{P} 0$ for each t by assumption (A2). By (A4) and the Cauchy-Schwarz inequality we have

$E\,|U_n(t) - U_n(t_1)|\,|U_n(t_2) - U_n(t)| \le M_0\,h^{-3}\,|t_1 - t|\,|t_2 - t|\int_{|y|>a_n} y^2\,f_Y(y)\,dy,$

establishing by (A2) tightness of $U_n(t)$ [Theorem 15.6 of Billingsley (1968)]. Note that the proof of this lemma was done as in Johnston's paper, but note also that our assumption is somewhat weaker than his, since we are employing Lemma 1, due to Tusnády (1977), establishing a faster rate for the two-dimensional empirical process.
Lemma 3

$(nh)^{\frac{1}{2}}\,\|Y_{1,n} - Y_{2,n}\| \xrightarrow{P} 0.$

Proof

Define $g(t) = S^2(t)\,f_X(t)$, $g_n(t) = S_n^2(t)\,f_X(t)$. We must show that

$(nh)^{\frac{1}{2}}\,\Big\|\big\{[g_n(t)]^{-\frac{1}{2}} - [g(t)]^{-\frac{1}{2}}\big\}\,\Big\{h^{-1}\iint_{|y|\le a_n} y\,K((t-x)/h)\,d[F_n(x,y) - F(x,y)]\Big\}\Big\| \xrightarrow{P} 0.$

Now, from Johnston (1982) we have that the second factor inside the curly brackets is $O_p(n^{-\frac{1}{2}}h^{-\frac{1}{2}})$ uniformly in t. Now, from the mean value theorem,

$[g_n(t)]^{-\frac{1}{2}} - [g(t)]^{-\frac{1}{2}} = -\tfrac{1}{2}\,\xi_n^{-\frac{3}{2}}(t)\,[g_n(t) - g(t)],$

where $\xi_n$ is between $g_n$ and $g$. Since $g_n$, $g$ are bounded away from zero by assumption (A3), $\|\xi_n^{-3/2}\|$ is a bounded sequence. Now, from (A2) it follows that $\|g_n - g\| \to 0$, and thus the lemma follows.
Lemma 4

$(nh)^{\frac{1}{2}}\,\|Y_{2,n} - Y_{3,n}\| \xrightarrow{P} 0.$

Proof

Using integration by parts (see Johnston (1982), Lemma A.5, for details), we obtain

$(nh)^{\frac{1}{2}}\,|Y_{2,n}(t) - Y_{3,n}(t)| = O_p\big(n^{-\frac{1}{2}}(\log n)^2\big)\,h^{-\frac{1}{2}}\Big\{4a_n\int_{-A}^{A} |K'(u)|\,du + 4a_n\,[K(A) + K(-A)]\Big\}$
$\qquad = O_p\big(n^{-\frac{1}{2}}\,h^{-\frac{1}{2}}\,a_n\,(\log n)^2\big)$

uniformly in t. The proof thus follows using assumption (A2).
Lemma 5

$(nh)^{\frac{1}{2}}\,\|Y_{3,n} - Y_{4,n}\| \xrightarrow{P} 0.$

Proof

Since the Jacobian of the transformation T, introduced in Lemma 1, is f(x,y), we have by Masani (1968), Theorem 5.19,

$n^{\frac{1}{2}}\,|Y_{3,n}(t) - Y_{4,n}(t)| \le [g_n(t)]^{-\frac{1}{2}}\,h^{-1}\,\Big|\iint_{|y|\le a_n} y\,K((t-x)/h)\,f(x,y)\,dx\,dy\Big|\cdot|W_n(1,1)|.$

So we finally have

$n^{\frac{1}{2}}\,\|Y_{3,n} - Y_{4,n}\| \le |W_n(1,1)|\;A_1\,h^{-1}\int |K((t-x)/h)|\,dx,$

where $A_1$ is a constant ($A_1 = \sup_{0\le t\le 1} |m(t)\,f_X(t)|$, up to the bounded factor $[g_n(t)]^{-\frac{1}{2}}$). This proves the lemma.

Note that $Y_{4,n}(t)$ is a zero mean Gaussian process with covariance

$r_n(t_1,t_2) = [g_n(t_1)\,g_n(t_2)]^{-\frac{1}{2}}\;n^{-1}h^{-2}\iint_{|y|\le a_n} y^2\,K((t_1-x)/h)\,K((t_2-x)/h)\,f(x,y)\,dx\,dy.$

So both $Y_{4,n}$ and $Y_{5,n}$ are Gaussian processes with the same covariance structure and can thus be identified.
Lemma 6

$(nh)^{\frac{1}{2}}\,\|Y_{5,n} - Y_{6,n}\| \xrightarrow{P} 0.$

Proof

Note that by assumption (A3), $g_n(t) = S_n^2(t)\,f_X(t)$ is also LC($\alpha$), $\alpha > \frac{1}{2}$, i.e.

$|g_n(t) - g_n(t')| \le L_G\,|t - t'|^{\alpha},$

where $L_G$ is independent of n by (A3). The difference of interest is now

$(nh)^{\frac{1}{2}}\,|Y_{5,n}(t) - Y_{6,n}(t)| = |R_n(t)|,$

say, where, with $G_{n,t}(u) = [g_n(t-uh)/g_n(t)]^{\frac{1}{2}} - 1$ for all n and t,

$R_n(t) = h^{-\frac{1}{2}}\int_{-A}^{A} G_{n,t}(u)\,K(u)\,dW(t-uh).$

We will now show that

$\sup_{0\le t\le 1} |R_n(t)| = o_p(1).$

By partial integration we have

$|R_n(t)| \le \Big|h^{-\frac{1}{2}}\int_{-A}^{A} W(t-uh)\,G_{n,t}(u)\,K'(u)\,du\Big| + \Big|h^{-\frac{1}{2}}\int_{-A}^{A} [W(t-uh) - W(t)]\,K(u)\,d[G_{n,t}(u)]\Big|$
$\qquad + \Big|h^{-\frac{1}{2}}\int_{-A}^{A} W(t)\,G_{n,t}(u)\,K'(u)\,du\Big| + o_p(h^{\frac{1}{2}})$
$\quad = R_{1,n}(t) + R_{2,n}(t) + R_{3,n}(t) + R_{4,n},$

where $R_{4,n}$ is independent of t. The term $R_{1,n}(t)$ is estimated as in Johnston (1982), Lemma 4.6, p. 411, to obtain

$\sup_{0\le t\le 1} |R_{1,n}(t)| = o_p(1).$

We now show that

$\sup_{0\le t\le 1} |R_{2,n}(t)| = o_p(1).$

Let $\omega(s)$ denote the modulus of continuity of W(t) and let $\bar{K} = \sup_{-A\le u\le A} |K(u)|$; we then have, with Silverman (1978), formulas (7), (8) and his definitions of p, q, B,

$R_{2,n}(t) \le h^{-\frac{1}{2}}\,\bar{K}\int_{-A}^{A} \omega(|u|h)\,|dG_{n,t}(u)| + h^{-\frac{1}{2}}\,16\,\bar{K}\,(\log B)^{\frac{1}{2}}\int_{-A}^{A} p(|u|h)\,|dG_{n,t}(u)|.$

Now, following the proof of Silverman (1978), Prop. 4, we see that both summands are, by assumption (A3) on $|dg_n(u)|$, of the order $o_p(1)$ uniformly in t. It remains to show that

$\sup_{0\le t\le 1} |R_{3,n}(t)| = o_p(1).$

This follows again from assumption (A3), the LC($\alpha$), $\alpha > \frac{1}{2}$, condition on $g_n(\cdot)$, and the following inequality:

$\sup_{0\le t\le 1} |R_{3,n}(t)| \le c_0\,\sup_{0\le t\le 1} |W(t)|\;h^{-\frac{1}{2}}\,L_G\,h^{\alpha}\int_{-A}^{A} |u|^{\alpha}\,|K'(u)|\,du = o_p(1),$

with $c_0$ a constant.
ACKNOWLEDGEMENT

I am grateful to Steve Marron for helpful discussions. Ray Carroll contributed much to the approximations of the appendix.
REFERENCES

BICKEL, P. and ROSENBLATT, M. (1973). On some global measures of the deviation of density function estimators. Ann. Stat. 1, 1071-1095.

BILLINGSLEY, P. (1968). Convergence of Probability Measures. John Wiley and Sons, New York.

COLLOMB, G. (1981). Estimation non-paramétrique de la régression: revue bibliographique. International Statistical Review 49, 75-93.

GASSER, T. and MÜLLER, H.G. (1979). Kernel estimation of regression functions. In: Smoothing Techniques for Curve Estimation, ed. T. Gasser and M. Rosenblatt. Lecture Notes in Mathematics 757, Springer-Verlag, Heidelberg.

HÄRDLE, W. and MARRON, J.S. (1983). Optimal bandwidth selection in nonparametric regression function estimation. University of North Carolina, Institute of Statistics Mimeo Series #

HALL, P. (1982). Cross-validation in density estimation. Biometrika, 69, 383-390.

ISSERLIS, L. (1918). On a formula for the product moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika, 12, 134-139.

JOHNSTON, G. (1982). Probabilities of maximal deviations of nonparametric regression function estimation. J. Mult. Analysis, 12, 402-414.

MACK, Y.P. and SILVERMAN, B.W. (1982). Weak and strong uniform consistency of kernel regression estimates. Z. f. Wahrscheinlichkeitstheorie und verw. Gebiete, 61, 405-415.

MARRON, J.S. (1983). Convergence properties of an empirical error criterion for multivariate density estimation. Institute of Statistics Mimeo Series #1520, University of North Carolina at Chapel Hill.

MASANI, P. (1968). Orthogonally scattered measures. Adv. in Math., 2, 61-117.

NADARAYA, E.A. (1964). On estimating regression. Theor. Prob. Appl. 9, 141-142.

RICE, J. and ROSENBLATT, M. (1983). Smoothing splines: regression, derivatives and deconvolution. Ann. Stat., 11, 141-156.

ROSENBLATT, M. (1952). Remarks on a multivariate transformation. Ann. Math. Stat., 23, 470-472.

ROSENBLATT, M. (1969). Conditional probability density and regression estimation. In: Multivariate Analysis, ed. Krishnaiah, 25-31.

ROSENBLATT, M. (1971). Curve estimates. Ann. Math. Stat., 42, 1815-1842.

SCHUSTER, E.F. (1972). Joint asymptotic distribution of the estimated regression function at a finite number of distinct points. Ann. Math. Stat., 43, 84-88.

SILVERMAN, B.W. (1978). Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. Ann. Stat., 6, 177-184.

STONE, C.J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Stat., 10, 1040-1053.

TUSNÁDY, G. (1977). A remark on the approximation of the sample distribution function in the multidimensional case. Period. Math. Hung., 8, 53-55.

WATSON, G.S. (1964). Smooth regression analysis. Sankhyā, Series A, 26, 359-372.

WEGMAN, E.J. (1972). Nonparametric probability density estimation: A comparison of density estimation methods. J. Statist. Comput. Simulation, 1, 225-245.

WONG, W.H. (1982). On the consistency of cross-validation in kernel nonparametric regression. Technical Report, Univ. Chicago.