Marron, James Stephen and Härdle, Wolfgang, "Random Approximations to an Error Criterion of Nonparametric Statistics."

RANDOM APPROXIMATIONS TO AN ERROR CRITERION
OF NONPARAMETRIC STATISTICS
by
James Stephen Marron¹
University of North Carolina
Chapel Hill, N.C. 27514
and
Wolfgang Härdle²
Universität Heidelberg
Sonderforschungsbereich 123,
Heidelberg, West Germany

¹Research partially supported by Office of Naval Research Contract N00014-75-C-0809.
²Research partially supported by the Deutsche Forschungsgemeinschaft and by Air Force Office of Scientific Research Grant AFOSR-F49620-82-C-0009.
SUMMARY
This paper deals with a quite general nonparametric statistical curve estimation problem. Special cases include estimation of probability density functions, regression functions and hazard functions. The class of "fractional delta sequence estimators" is defined and treated here. This class includes the familiar kernel, orthogonal series and histogram methods. It is seen that, under some mild assumptions, both the Average Square Error and the Integrated Square Error provide reasonable (random) approximations to the Mean Integrated Square Error. This is important for two reasons. First, it provides theoretical backing to a practice that has been employed in several empirical studies. Second, it provides a vital tool for use in selecting smoothing parameters for several different nonparametric curve estimators by cross-validation.
1. Introduction and Background
In nonparametric statistics, the problem of estimating a curve g(x) (where x ∈ R^d) is treated as follows. Information about g(x) is contained in a "sample" of independent d-dimensional random vectors having common probability density function f(x) and cumulative distribution function F(x). An "estimate" of g(x) is a measurable function of x and the sample, X_1, ..., X_n.
Some special cases of this problem are:

D - density estimation. Here g(x) is taken to be the same as f(x), the d-dimensional common density of X_1, ..., X_n.

H - hazard function estimation. Given a density, f(x), and the corresponding cumulative distribution function, F(x), the hazard function is defined by

    g(x) = f(x) (1 - F(x))^{-1}.    (1.1)

The importance of estimation of this function is discussed in Watson and Leadbetter [35]. This function is most useful in the case d = 1, so only that case will be treated here.
R - regression estimation. For j = 1, ..., n write the vector X_j in terms of its components as X_j = (X_j^(1), ..., X_j^(d-1), Y_j). It is convenient to define

    d' = d - 1,
    X'_j = (X_j^(1), ..., X_j^(d')).    (1.2)

In a similar spirit, write

    x = (x^(1), ..., x^(d'), y),
    x' = (x^(1), ..., x^(d')).    (1.3)

Now take g(x) to be the regression function of Y_j on X'_j, which is defined by

    g(x) = g(x') = r(x') = E[Y_j | X'_j = x'].    (1.4)

It is important to note that g(x) depends only on the first d' components of x.
This somewhat unusual formulation allows regression estimation to be
included in the above general framework.
It should be noted that this list of special cases is meant to be
representative, not exhaustive.
For example, the theorems of this paper may
be easily seen to apply to derivatives (or other functionals) of the above
functions.
Quite a number of different estimators have been proposed for each of the curves given above. For comparison of these estimators, a "measure of accuracy" is needed. Given an estimator ĝ(x), a very common error criterion is the Mean Integrated Square Error, defined by

    MISE = E ∫ [ĝ(x) - g(x)]² w(x) dx,    (1.5)

where w(x) is some nonnegative "weight function".
One drawback to the use of MISE as an error criterion is that, for fixed n, it involves (n+1)-fold integration, and thus requires a great deal of computation.
The literature contains two different ways of overcoming this difficulty. The first is to study the asymptotic (as n → ∞) behavior of MISE. The second is to consider Monte Carlo (and hence random) approximations to MISE.

The most common stochastic (i.e., random) error criteria are the Integrated Square Error, given by

    ISE = ∫ [ĝ(x) - g(x)]² w(x) dx,

and the Average Square Error, given by

    ASE = n^{-1} Σ_{j=1}^{n} [ĝ(X_j) - g(X_j)]² w̃(X_j).    (1.6)
Wegman [37], working in the setting of density estimation, has pointed out that if w̃(x) is defined by

    w̃(x) f(x) = w(x),    (1.7)

then, by a heuristic Law of Large Numbers, for n large, ASE should be a good approximation of MISE. The convention (1.7) will be used throughout this paper.
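For concreteness, the following simulation sketch (an illustration added here, not taken from the paper; the Gaussian kernel and all numerical settings are assumptions) compares one realization of ISE and ASE with a Monte Carlo approximation of MISE for a kernel density estimate of the standard normal density. The weight is taken as w = f, so that w̃ ≡ 1 by (1.7).

```python
import numpy as np

rng = np.random.default_rng(0)
SQ2PI = np.sqrt(2.0 * np.pi)

def true_f(x):
    # Standard normal density: the curve g(x) = f(x) being estimated.
    return np.exp(-0.5 * x * x) / SQ2PI

def kde(x, sample, m):
    # Kernel estimate ghat(x) = n^{-1} sum_i m K(m(x - X_i)), Gaussian K.
    u = m * (x[:, None] - sample[None, :])
    return np.mean(m * np.exp(-0.5 * u * u) / SQ2PI, axis=1)

n, m = 500, 5.0                       # sample size and smoothing parameter
grid = np.linspace(-4.0, 4.0, 801)
dx = grid[1] - grid[0]

def ise(sample):
    # ISE = integral of (ghat - g)^2 w, with w = f, computed on a grid.
    err = kde(grid, sample, m) - true_f(grid)
    return np.sum(err * err * true_f(grid)) * dx

sample = rng.standard_normal(n)
ISE = ise(sample)
# ASE = n^{-1} sum_j (ghat(X_j) - g(X_j))^2, since w-tilde == 1 here.
err = kde(sample, sample, m) - true_f(sample)
ASE = np.mean(err * err)
# MISE approximated by averaging ISE over independent replications.
MISE = np.mean([ise(rng.standard_normal(n)) for _ in range(100)])
print(ISE, ASE, MISE)   # the three agree up to sampling fluctuation
```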
With the above heuristics in mind, Wegman used ASE as an error criterion for a Monte Carlo comparison of several density estimators. ASE has also been employed for this purpose by Fryer [7] and Wahba [31]. Breiman, Meisel and Purcell [3] and Raatgever and Duin [18] used a "normalized version" of ASE in their Monte Carlo studies. The error criterion ISE also has been attractive to several authors; see for example Rust and Tsokos [22], Scott and Factor [23], Bean and Tsokos [1] and Bowman [2]. In the regression setting, Stone [27] has used a "leave-one-out" version of ASE, and Engle, Granger, Rice and Weiss [6] and Silverman [25] have used ASE to study cross-validated estimators. In the hazard function setting, Tanner and Wong [29] have compared two estimators by computing the difference of their ASE's.
The use of ASE and ISE as error criteria was criticized by Steele [26], who gave an example in which, asymptotically as n → ∞, ASE behaved very differently from ISE (hence, at least one is a poor approximation to MISE). In reply to this objection, Hall [8] showed that Steele's example was somewhat artificial by showing that, in the case d = 1, if ĝ(x) is a kernel density estimator, then under some reasonable assumptions, as n → ∞,

    ASE = MISE + o_p(MISE),    (1.8)
    ISE = MISE + o_p(MISE),    (1.9)

and if ĝ(x) is a trigonometric series density estimator, (1.9) holds.
In this paper, Hall's results are extended in a number of ways, most notably to a much wider class of estimators and to the more general setting given above. The importance of these results is two-fold. First, they demonstrate that the objections of Steele [26] need cause no concern in the case of many commonly considered estimators. Hence, ISE and ASE are both reasonable error criteria for empirical investigations of the behavior of different estimators. Second, the results of this paper provide an important tool for use in establishing MISE optimality results for certain cross-validatory choices of smoothing parameters. This application has already been made for kernel density estimators in Marron [13] and for kernel regression estimators in Härdle and Marron [10]. Using the theorems of this paper, it should be straightforward to obtain similar results for the other estimators treated here.
Section 2 introduces the class of "fractional delta sequence estimators" and contains examples showing that many of the most widely studied nonparametric estimators are contained in this class. Section 3 contains a theorem which gives sufficient conditions for (1.8) for a subset of these estimators. Section 4 illustrates how the hypotheses of the theorem are satisfied in some specific cases. Section 5 contains a theorem which extends the theorem of Section 3 to all fractional delta sequence estimators. Section 6 contains examples for illustration of this second theorem. Section 7 presents a theorem which gives sufficient conditions for (1.9), together with some examples. The proofs of the theorems are in Sections 8 and 9. Section 10 discusses how the theorems of this paper can be reformulated in terms of a type of uniform convergence that is useful for cross-validatory choices of smoothing parameters.
2. Fractional Delta Sequence Estimators
The class of fractional delta sequence estimators is defined to consist of all estimators of the form

    ĝ(x) = [Σ_{i=1}^{n} δ_m(x, X_i)] / [Σ_{i=1}^{n} δ'_m(x, X_i)],    (2.1)

where δ_m and δ'_m are measurable functions on R^d × R^d, which are indexed by a "smoothing parameter" m = m(n) ∈ R⁺. Note that the special case δ'_m(x, X_i) ≡ 1 gives the delta sequence estimators studied by Watson and Leadbetter [36], Walter and Blum [33], Susarla and Walter [28] and others.
In the setting of density estimation, some well known estimators of this
type are:
D-1 kernel estimators: Introduced by Rosenblatt [19] and Parzen [17]. Given a "kernel function", K(x), and a smoothing parameter, m, define

    δ_m(x, X_i) = m^d K(m(x - X_i)).    (2.2)
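The name "delta sequence" reflects the fact that, as m → ∞, δ_m acts like an approximate identity: ∫ δ_m(x, y) f(y) dy → f(x). A small numerical sketch of this (with d = 1 and a Gaussian kernel — an assumed choice, since the paper allows any K meeting its conditions):

```python
import numpy as np

SQ2PI = np.sqrt(2.0 * np.pi)

def delta_m(x, y, m):
    # D-1 with d = 1: delta_m(x, y) = m K(m(x - y)), Gaussian K assumed.
    u = m * (x - y)
    return m * np.exp(-0.5 * u * u) / SQ2PI

f = lambda y: np.exp(-0.5 * y * y) / SQ2PI   # example density: N(0, 1)
y = np.linspace(-8.0, 8.0, 4001)
dy = y[1] - y[0]
x0 = 0.5
smoothed = {m: np.sum(delta_m(x0, y, m) * f(y)) * dy for m in (1.0, 4.0, 16.0)}
print(smoothed)   # values approach f(0.5) ~= 0.3521 as m grows
```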
D-2 orthogonal series estimators: Introduced by Cencov [4]. Suppose {ψ_k(x)} is a sequence of functions which is orthonormal and complete with respect to the inner product

    ⟨φ, φ'⟩ = ∫ φ(x) φ'(x) w(x) dx.    (2.3)

Given the smoothing parameter m, define

    δ_m(x, X_i) = Σ_{k=1}^{m} ψ_k(x) ψ_k(X_i) w(X_i).    (2.4)
D-3 histogram estimators: Let B denote a bounded subset of R^d, and suppose that (given the smoothing parameter, m) A_1, ..., A_m form a partition of B. For k = 1, ..., m, let χ_k(x) denote the indicator of A_k, and let μ denote Lebesgue measure. Define

    δ_m(x, X_i) = Σ_{k=1}^{m} μ(A_k)^{-1} χ_k(x) χ_k(X_i) w(X_i),    δ'_m(x, X_i) ≡ 1.

It should be noted that this estimator is a special case of the estimator D-2 if that estimator is generalized to allow not only the number of terms, but also the choice of orthogonal series, to depend on m. In order to handle both estimators simultaneously, it will be assumed in the rest of this paper that this generalization has been made. However, to simplify the notation, the dependence of ψ_k(x) on m will be suppressed.
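A sketch of the D-3 histogram estimator (one-dimensional, equal-width cells, and weight w ≡ 1 — all illustrative choices, not prescribed by the paper):

```python
import numpy as np

def histogram_estimate(x, sample, m, B=(-4.0, 4.0)):
    # D-3 on m equal-width cells A_1, ..., A_m partitioning B, with w == 1:
    # ghat(x) = n^{-1} sum_i sum_k mu(A_k)^{-1} chi_k(x) chi_k(X_i).
    edges = np.linspace(B[0], B[1], m + 1)
    width = (B[1] - B[0]) / m                 # mu(A_k), Lebesgue measure
    counts, _ = np.histogram(sample, bins=edges)
    k = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, m - 1)
    inside = (x >= B[0]) & (x < B[1])
    return np.where(inside, counts[k] / (len(sample) * width), 0.0)

rng = np.random.default_rng(1)
sample = rng.standard_normal(1000)
est = histogram_estimate(np.array([0.0, 1.0]), sample, m=40)
print(est)   # roughly the N(0,1) density values at 0 and 1
```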
Further examples of delta sequence density estimators may be found in Walter and Blum [33] and Susarla and Walter [28]. Some examples of fractional delta sequence estimators in the regression setting are:

R-1 kernel estimators: Introduced by Nadaraya [15] and Watson [34]. Given a kernel function, K(x'), and a smoothing parameter, m, using the notation (1.2) and (1.3), define

    δ_m(x, X_i) = m^{d'} K(m(x' - X'_i)) Y_i,
    δ'_m(x, X_i) = m^{d'} K(m(x' - X'_i)).

To see the intuition behind this estimator, note that, by (2.1), ĝ(x) is a weighted average of the Y_i.
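A sketch of R-1 with d' = 1, a Gaussian kernel, and a simulated regression function — all choices made here for illustration only:

```python
import numpy as np

def nadaraya_watson(x, X, Y, m):
    # R-1: ghat(x) = sum_i K(m(x - X_i)) Y_i / sum_i K(m(x - X_i));
    # the common factor m^{d'} cancels in the ratio (Gaussian K assumed).
    u = m * (x[:, None] - X[None, :])
    W = np.exp(-0.5 * u * u)
    return (W @ Y) / W.sum(axis=1)

rng = np.random.default_rng(2)
n, m = 400, 4.0
X = rng.uniform(-2.0, 2.0, n)
Y = np.sin(X) + 0.3 * rng.standard_normal(n)   # r(x') = sin(x')
ghat = nadaraya_watson(np.array([-1.0, 0.0, 1.0]), X, Y, m)
print(ghat)   # close to sin(-1), sin(0), sin(1)
```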
R-2 known-marginal kernel estimators: Studied by Johnston [12]. Let f_M(x') denote the marginal density of X'_i and define

    δ_m(x, X_i) = m^{d'} K(m(x' - X'_i)) Y_i,
    δ'_m(x, X_i) = f_M(x').

To see the idea behind the estimator, note that when the denominator of R-1 is properly normalized, it becomes the estimate D-1 of the marginal density, f_M(x').
R-3 delta sequence estimators: A generalization of R-1, discussed in Collomb [5]: define δ̃_m(x', X'_i) as for any of the density estimators and let

    δ_m(x, X_i) = δ̃_m(x', X'_i) Y_i,
    δ'_m(x, X_i) = δ̃_m(x', X'_i).

Note that the regressogram of Tukey [30] is the special case where δ̃_m is defined as for D-3.
In the setting of hazard function estimation, Watson and Leadbetter [35] have introduced the following fractional delta sequence estimators.

H-1 kernel estimators: Given a kernel function, K(x), and a smoothing parameter, m, define

    δ_m(x, X_i) = m K(m(x - X_i)),
    δ'_m(x, X_i) = 1 - ∫_{-∞}^{x} m K(m(t - X_i)) dt.    (2.5)

H-2 delta sequence estimators: A straightforward generalization of H-1: define δ_m(x, X_i) as in any of the density estimators and let

    δ'_m(x, X_i) = 1 - ∫_{-∞}^{x} δ_m(t, X_i) dt.
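For illustration, here is H-1 with a Gaussian kernel (so the integral in (2.5) is the normal c.d.f. Φ), applied to Exp(1) data, whose true hazard g(x) = f(x)(1 - F(x))^{-1} is identically 1. The kernel and the data-generating choices are assumptions made for this sketch:

```python
import numpy as np
from math import erf

def hazard_h1(x, sample, m):
    # H-1: numerator n^{-1} sum_i m K(m(x - X_i)) estimates f(x);
    # denominator n^{-1} sum_i [1 - Phi(m(x - X_i))] estimates 1 - F(x),
    # since int_{-inf}^x m K(m(t - X_i)) dt = Phi(m(x - X_i)) for Gaussian K.
    u = m * (x[:, None] - sample[None, :])
    num = np.mean(m * np.exp(-0.5 * u * u) / np.sqrt(2.0 * np.pi), axis=1)
    Phi = 0.5 * (1.0 + np.vectorize(erf)(u / np.sqrt(2.0)))
    den = np.mean(1.0 - Phi, axis=1)
    return num / den

rng = np.random.default_rng(3)
sample = rng.exponential(1.0, 2000)
h = hazard_h1(np.array([0.5, 1.0, 1.5]), sample, m=4.0)
print(h)   # each entry roughly 1, the constant hazard of Exp(1)
```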
For the optimal smoothing parameter selection results of Marron [13] and Härdle and Marron [10], it is important to work with "leave-one-out" estimators. So that the theorems of this paper may be used for similar results, define, for j = 1, ..., n, the "leave-one-out fractional delta sequence estimators"

    ĝ_j(x) = [Σ_{i≠j} δ_m(x, X_i)] / [Σ_{i≠j} δ'_m(x, X_i)],

and the corresponding Average Square Error

    ASE‾ = n^{-1} Σ_{j=1}^{n} [ĝ_j(X_j) - g(X_j)]² w̃(X_j).    (2.6)
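A sketch of the leave-one-out construction in the density case (δ'_m ≡ 1, Gaussian kernel, w̃ ≡ 1 — illustrative assumptions): ASE‾ is computed over a grid of smoothing parameters m and minimized, mimicking how a cross-validatory criterion is used; in practice the unknown g(X_j) is replaced by a data-driven score, but the simulation below uses the true g to show that ASE‾ picks an interior m.

```python
import numpy as np

rng = np.random.default_rng(4)
SQ2PI = np.sqrt(2.0 * np.pi)
f = lambda x: np.exp(-0.5 * x * x) / SQ2PI   # true density, g = f
n = 300
X = rng.standard_normal(n)

def ase_bar(m):
    # Leave-one-out estimates ghat_j(X_j), then
    # ASE-bar = n^{-1} sum_j [ghat_j(X_j) - g(X_j)]^2   (w-tilde == 1).
    u = m * (X[:, None] - X[None, :])
    K = m * np.exp(-0.5 * u * u) / SQ2PI
    np.fill_diagonal(K, 0.0)                 # leave observation j out
    ghat_loo = K.sum(axis=0) / (n - 1)
    return np.mean((ghat_loo - f(X)) ** 2)

ms = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
scores = [ase_bar(m) for m in ms]
best = ms[int(np.argmin(scores))]
print(best, scores)   # an interior m wins: neither over- nor under-smoothing
```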
3. Theorem for Delta Sequence Estimators
This section gives sufficient conditions for (1.8) in the special case of delta sequence estimators, which are of the form

    ĝ(x) = n^{-1} Σ_{i=1}^{n} δ_m(x, X_i).

Note that, from (1.5), by adding and subtracting E[ĝ(x)], MISE admits the variance-bias² decomposition

    MISE = n^{-1} v(m) + b(m),    (3.1)

where

    v(m) = ∫ var[δ_m(x, X_i)] w(x) dx,    (3.2)
    b(m) = ∫ [Eĝ(x) - g(x)]² w(x) dx.    (3.3)

It is well known that the rate of convergence of b(m) to 0 depends on the "smoothness" of g(x). It is convenient to define, for i, j = 1, ..., n,

    V_ij = [δ_m(X_j, X_i) - g(X_j)] w̃(X_j)^{1/2}.    (3.4)
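The decomposition (3.1)-(3.3) can be checked numerically for the D-1 estimator. In the sketch below (Gaussian kernel, w ≡ 1 on a grid, N(0,1) data — all illustrative assumptions), n^{-1}v(m) + b(m) computed by quadrature is compared with a direct Monte Carlo approximation of MISE:

```python
import numpy as np

SQ2PI = np.sqrt(2.0 * np.pi)
f = lambda t: np.exp(-0.5 * t * t) / SQ2PI
x = np.linspace(-5.0, 5.0, 1001); dx = x[1] - x[0]
y = np.linspace(-8.0, 8.0, 1601); dy = y[1] - y[0]
n, m = 200, 4.0

delta = m * np.exp(-0.5 * (m * (x[:, None] - y[None, :])) ** 2) / SQ2PI
Ed = delta @ f(y) * dy             # E delta_m(x, X_i) as a function of x
Ed2 = (delta ** 2) @ f(y) * dy     # E delta_m(x, X_i)^2
v = np.sum(Ed2 - Ed ** 2) * dx     # (3.2), with w == 1
b = np.sum((Ed - f(x)) ** 2) * dx  # (3.3)
mise_formula = v / n + b           # (3.1)

rng = np.random.default_rng(5)
ises = []
for _ in range(200):               # Monte Carlo: average ISE over samples
    X = rng.standard_normal(n)
    u = m * (x[:, None] - X[None, :])
    ghat = np.mean(m * np.exp(-0.5 * u * u) / SQ2PI, axis=1)
    ises.append(np.sum((ghat - f(x)) ** 2) * dx)
mc = np.mean(ises)
print(mise_formula, mc)            # the two values agree closely
```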
Note that by (1.6) and (2.6),

    ASE = n^{-1} Σ_{j=1}^{n} [n^{-1} Σ_{i=1}^{n} V_ij]²,    (3.5)
    ASE‾ = n^{-1} Σ_{j=1}^{n} [(n-1)^{-1} Σ_{i≠j} V_ij]²,

and, for i ≠ j, by (1.7) and (3.3),

    E{(E[V_ij | X_j])²} = b(m).    (3.6)
In the theorem of this section it will be assumed that (as n → ∞)

    n^{-1} v(m) → 0,  v(m) → ∞,  b(m) → 0,    (3.7)
    E[V_jj²] = o(n² MISE).    (3.8)

It will also be assumed that, for i, i', j, j' all different,

    E[V_ij⁴] = O(v(m)³),    (3.9)
    E{(E[V_ij² | X_j])²} = O(v(m)^{5/2}),    (3.10)
    E{(E[V_ij | X_j])⁴} = o(v(m) b(m)),    (3.11)
    E(V_ij V_i'j V_ij' V_i'j') = o(v(m)²).    (3.12)

These assumptions are sufficient for the approximation of MISE by ASE and ASE‾.
Theorem 1: If ĝ(x) is a delta sequence estimator which satisfies (3.7) to (3.12), then

    ASE = MISE + o_p(MISE),
    ASE‾ = MISE + o_p(MISE).
In the next section it is seen that the assumptions (3.7) to (3.12)
are very mild for the delta sequence estimators discussed in Section 2.
4. Examples for Delta Sequence Estimators
In this section, examples are given which illustrate how the hypotheses
of Theorem 1 may be satisfied, under mild conditions, by the delta sequence
estimators D-1, D-2, D-3 and R-2 that were defined in Section 2.
D-1 kernel estimators:
Using computations very similar to those in Marron [14],
it may be seen that (3.7) - (3.12) hold whenever the estimator satisfies the
assumptions of the following corollary to Theorem 1.
Corollary 1 (D-1): Sufficient conditions for the estimator D-1 to satisfy the hypotheses of Theorem 1 are:

    f is uniformly continuous,    (4.1)
    ∫ K(u) du = 1,    (4.2)
    ∫ K(u)⁴ du < ∞,    (4.3)
    K(0) < ∞,    (4.4)
    for some α ∈ (0,1),  lim_{‖x‖→∞} ‖x‖^α |K(x)| < ∞,    (4.5)
    w(x) is bounded,    (4.6)
    n^{-1} m^d → 0,  m → ∞.    (4.7)

It is worth noting that this corollary slightly improves Theorems 1 and 2 of Marron [14]. Also, (4.1)-(4.7) apply to a substantially wider class of estimators than the assumptions of Theorem 1 of Hall [8].
D-2, D-3 orthogonal series estimators: Recall from the introduction that both orthogonal series estimators and histogram estimators may be handled simultaneously by allowing the set of orthonormal functions to depend on m. Using the inner product (2.3), for k = 1, ..., m, the Fourier coefficients of f(x) are given by (recall the dependence on m is suppressed)

    c_k = ∫ ψ_k(y) f(y) w(y) dy.    (4.8)

Assume that f is in L²(w) in the sense that

    ‖f‖² = ∫ f(y)² w(y) dy < ∞.    (4.9)

Also assume that the collection of orthogonal systems is asymptotically complete in the sense that, for x in the support of w,

    lim_{m→∞} Σ_{k=1}^{m} c_k ψ_k(x) = f(x).    (4.10)

Observe that, by (2.4),

    Eĝ(x) = Eδ_m(x, X_i) = ∫ Σ_{k=1}^{m} ψ_k(x) ψ_k(y) w(y) f(y) dy = Σ_{k=1}^{m} c_k ψ_k(x).

Similarly,

    ∫ var[ĝ(x)] w(x) dx = n^{-1} ∫ E[δ_m(x, X_i) - Eδ_m(x, X_i)]² w(x) dx
        = n^{-1} ∫∫ [Σ_{k=1}^{m} (ψ_k(y) w(y) - c_k) ψ_k(x)]² f(y) dy w(x) dx
        = n^{-1} ∫ Σ_{k=1}^{m} (ψ_k(y) w(y) - c_k)² f(y) dy
        = n^{-1} Σ_{k=1}^{m} (∫ ψ_k(y)² w(y)² f(y) dy - c_k²).

But, by Bessel's Inequality,

    Σ_{k=1}^{m} c_k² ≤ ‖f‖² < ∞.    (4.11)

Thus, from (3.2),

    v(m) = Cm + o(m)

when it is assumed that there is a C > 0 so that

    lim_{m→∞} m^{-1} Σ_{k=1}^{m} ∫ ψ_k(y)² w(y)² f(y) dy = C.

Note this condition is little stronger than assuming w(y)f(y) is bounded and is bounded above 0 on a set of positive measure.
Condition (3.7) follows from assuming

    m → ∞,  n^{-1} m → 0,

together with (4.9), (4.10), (4.11) and the dominated convergence theorem.

To check (3.8), note that

    E[V_jj²] = ∫ [Σ_{k=1}^{m} ψ_k(x)² w(x) - f(x)]² w̃(x) f(x) dx
        ≤ 2 ∫ [Σ_{k,k'=1}^{m} ψ_k(x)² ψ_{k'}(x)² w(x)² + f(x)²] w(x) dx
        = 2 (Σ_{k,k'=1}^{m} ‖ψ_k ψ_{k'} w‖² + ‖f‖²).

So, from the assumption

    Σ_{k,k'=1}^{m} ‖ψ_k ψ_{k'} w‖² = O(m²),

it follows that

    E[V_jj²] = O(m²) = O(n² MISE²) = o(n² MISE).
To check (3.9), note that, for i ≠ j,

    E[V_ij⁴] = ∫∫ [Σ_{k=1}^{m} ψ_k(x) ψ_k(y) w(y) - f(x)]⁴ w̃(x)² f(x) f(y) dx dy
        ≤ 8 ∫∫ ([Σ_{k=1}^{m} ψ_k(x) ψ_k(y) w(y)]⁴ + f(x)⁴) w̃(x)² f(x) f(y) dx dy.

For k, k', k'', k* = 1, ..., m, let C_{k,k',k'',k*} and C'_{k,k',k'',k*} denote the k*-th Fourier coefficients of ψ_k ψ_{k'} ψ_{k''} w and ψ_k ψ_{k'} ψ_{k''} w³ f, respectively. Then,

    E[V_ij⁴] ≤ 8 Σ_{k,k',k'',k*} C_{k,k',k'',k*} C'_{k,k',k'',k*} + 8 ∫ f(x)³ w(x)² dx.

By the Schwarz Inequality and the Bessel Inequality, the first term on the right is bounded by

    8 (Σ_{k,k',k''} ‖ψ_k ψ_{k'} ψ_{k''} w‖²)^{1/2} (Σ_{k,k',k''} ‖ψ_k ψ_{k'} ψ_{k''} w³ f‖²)^{1/2}.

Thus, by assuming

    Σ_{k,k',k''} ‖ψ_k ψ_{k'} ψ_{k''} w‖² = O(m³),
    Σ_{k,k',k''} ‖ψ_k ψ_{k'} ψ_{k''} w³ f‖² = O(m³),
    ∫ f(x)³ w(x)² dx < ∞,

it follows that

    E[V_ij⁴] = O(m³).
To verify (3.10), note that

    E[V_ij² | X_j] = ∫ [Σ_{k=1}^{m} ψ_k(X_j) ψ_k(y) w(y) - f(X_j)]² w̃(X_j) f(y) dy
        ≤ 2 ∫ (Σ_{k,k'=1}^{m} ψ_k(X_j) ψ_k(y) ψ_{k'}(X_j) ψ_{k'}(y) w(y)² + f(X_j)²) w̃(X_j) f(y) dy
        = 2 (Σ_{k,k'=1}^{m} ψ_k(X_j) ψ_{k'}(X_j) w̃(X_j) C_{k,k'} + f(X_j)² w̃(X_j)),

where C_{k,k'} denotes the k'-th Fourier coefficient of ψ_k w f. Thus, using C_{k,k',k'',k*} as defined above,

    E{(E[V_ij² | X_j])²} ≤ 8 Σ_{k,k',k'',k*} C_{k,k'} C_{k'',k*} C_{k,k',k'',k*} + 8 E[f(X_j)⁴ w̃(X_j)²].

But, as above,

    Σ_{k,k',k'',k*} C_{k,k'} C_{k'',k*} C_{k,k',k'',k*}
        ≤ [(Σ_{k,k'} C_{k,k'}²)(Σ_{k'',k*} C_{k'',k*}²)(Σ_{k,k',k'',k*} C_{k,k',k'',k*}²)]^{1/2}.

Hence, by the above assumptions, together with

    Σ_{k} ‖ψ_k w f‖² = O(m),    (4.12)

it follows that

    E{(E[V_ij² | X_j])²} ≤ [O(m) O(m) O(m³)]^{1/2} = O(m^{5/2}).
For condition (3.11), using (3.6) and the assumption

    sup_x |Σ_{k=1}^{m} c_k ψ_k(x) - f(x)| w̃(x)^{1/2} = o(m^{1/2}),

note that

    E{(E[V_ij | X_j])⁴} ≤ o(m) E{(E[V_ij | X_j])²} = o(m) b(m) = o(v(m) b(m)).
To check (3.12), note that, by (4.8) and (4.9),

    E(V_ij V_i'j V_ij' V_i'j') = E{(E[V_ij V_i'j | X_i, X_i'])²},

and that

    E[V_ij V_i'j | X_i, X_i']
        = ∫ [Σ_k ψ_k(x) ψ_k(X_i) w(X_i) - f(x)] [Σ_k ψ_k(x) ψ_k(X_i') w(X_i') - f(x)] w(x) dx
        = Σ_k ψ_k(X_i) w(X_i) ψ_k(X_i') w(X_i') - Σ_k c_k ψ_k(X_i) w(X_i) - Σ_k c_k ψ_k(X_i') w(X_i') + ‖f‖²
        = Σ_k (ψ_k(X_i) w(X_i) - c_k)(ψ_k(X_i') w(X_i') - c_k) - Σ_k c_k² + ‖f‖².

Hence,

    E(V_ij V_i'j V_ij' V_i'j')
        ≤ 2 E{(Σ_k (ψ_k(X_i) w(X_i) - c_k)(ψ_k(X_i') w(X_i') - c_k))²} + 2 (‖f‖² - Σ_k c_k²)².

Using the notation C_{k,k'} as defined above, the first term equals

    2 Σ_{k,k'} (E[(ψ_k(X_i) w(X_i) - c_k)(ψ_{k'}(X_i) w(X_i) - c_{k'})])²
        = 2 Σ_{k,k'} (C_{k,k'} - c_k c_{k'})² = o(m²),

using assumption (4.12). Now, by (4.11), the second term on the right of this last inequality is bounded. Hence, E(V_ij V_i'j V_ij' V_i'j') = o(m²) = o(v(m)²).
The assumptions made in the above argument are summarized in the following corollary to Theorem 1:

Corollary 1 (D-2,3): Sufficient conditions for the estimators D-2 or D-3 to satisfy the hypotheses of Theorem 1 are:

    f and w are bounded,    (4.13)
    for all x in the support of w(x),  lim_{m→∞} Σ_{k=1}^{m} c_k ψ_k(x) = f(x),    (4.14)
    there is a C > 0 with lim_{m→∞} m^{-1} Σ_{k=1}^{m} ∫ ψ_k(y)² w(y)² f(y) dy = C,    (4.15)
    Σ_{k,k'=1}^{m} ‖ψ_k ψ_{k'} w‖² = O(m²),    (4.16)
    Σ_{k,k',k''=1}^{m} ‖ψ_k ψ_{k'} ψ_{k''} w‖² = O(m³) and Σ_{k,k',k''=1}^{m} ‖ψ_k ψ_{k'} ψ_{k''} w³ f‖² = O(m³),    (4.17)
    n^{-1} m → 0,  m → ∞,    (4.18)
    sup_x |Σ_{k=1}^{m} c_k ψ_k(x) - f(x)| = o(m^{1/2}).    (4.19)

Verification of conditions (4.14)-(4.19) is straightforward in the case of the histogram estimator, D-3, when it is assumed that f is uniformly continuous and

    sup_{k=1,...,m} μ(A_k) = O(m^{-1}),    ψ_k(x) = μ(A_k)^{-1/2} χ_k(x).

For the estimator D-2, in the case of the Hermite series, (4.16)-(4.19) follow easily from (4) of Walter [32], and in the case of trigonometric series, (4.16)-(4.18) are trivial, but (4.19) seems to require an extra assumption on f(x).
R-2 known-marginal kernel estimators: As defined in Section 2, this estimator is not a delta sequence estimator, so redefine (using the notation (1.2) and (1.3))

    δ_m(x, X_i) = m^{d'} K(m(x' - X'_i)) Y_i f_M(x')^{-1},    δ'_m(x, X_i) ≡ 1.

In the regression setting, note that both g(x) and ĝ(x) depend on x only through x' (see (1.4)). Thus MISE should also be restricted to x'. In order to continue using the notation (1.5), (1.6) and (2.6), this restriction will be accomplished by assuming that the weight function w(x) has the form

    w(x) = w(x', y) = w_0(x') f_c(y | x'),

where w_0(x') is the desired nonnegative weight function and f_c(y | x') denotes the conditional density of Y_j given X'_j. This restriction entails no loss of generality, because now

    MISE = E ∫ [ĝ(x') - g(x')]² w_0(x') dx',

and by (1.7),

    ASE = n^{-1} Σ_{j=1}^{n} [ĝ(X'_j) - g(X'_j)]² w̃(X'_j),
    ASE‾ = n^{-1} Σ_{j=1}^{n} [ĝ_j(X'_j) - g(X'_j)]² w̃(X'_j),

where

    w̃(x) = w(x)/f(x) = w_0(x')/f_M(x') = w̃(x').

Analogous to (1.4), define, for k = 1, 2, 3, 4,

    M_k(x') = E[Y_j^k | X'_j = x'].

Assuming M_1, M_2 and f_M are uniformly continuous, standard calculations, such as those of Rosenblatt [21], show that condition (3.7) is satisfied with

    v(m) = (∫ M_2(x') w_0(x') f_M(x')^{-1} dx')(∫ K(u')² du') m^{d'} + o(m^{d'}),

when it is also assumed that both integrals are finite, that ∫ K(u') du' = 1, and that m → ∞ with n^{-1} m^{d'} → 0.
To check (3.8), note that, by (3.4),

    E[V_jj²] ≤ 2 E[(δ_m(X_j, X_j)² + r(X_j)²) w̃(X_j)]
        = 2 ∫ (m^{2d'} K(0)² M_2(x') f_M(x')^{-2} + r(x')²) w_0(x') dx' = O(m^{2d'}),

where it has been assumed that K(0) < ∞, that f_M is bounded above 0 on the support of w_0, that r and M_2 are bounded, and that w_0 is integrable. Thus, by (3.1) and (3.7),

    E[V_jj²] = O(m^{2d'}) = o(n m^{d'}) = o(n² MISE).
For (3.9), note that, for i ≠ j,

    E[V_ij⁴] ≤ 8 E[(δ_m(X_j, X_i)⁴ + r(X_j)⁴) w̃(X_j)²]
        = 8 ∫∫ [m^{4d'} K(m(x' - u'))⁴ M_4(u') f_M(x')^{-4} + r(x')⁴] w̃(x')² f_M(x') f_M(u') dx' du'
        = O(m^{3d'}),

by the above assumptions together with assuming that M_4 and w̃ are bounded and that ∫ K(u')⁴ du' < ∞.
Condition (3.10) follows from the same assumptions, since

    E[V_ij² | X_j] ≤ 2 E[(δ_m(X_j, X_i)² + r(X_j)²) | X_j] w̃(X_j)
        = 2 ∫ [m^{2d'} K(m(X'_j - x'))² M_2(x') f_M(X'_j)^{-2} + r(X'_j)²] w̃(X'_j) f_M(x') dx'
        = 2 ∫ [m^{d'} K(u')² M_2(X'_j - u'/m) f_M(X'_j)^{-2} + m^{-d'} r(X'_j)²] w̃(X'_j) f_M(X'_j - u'/m) du'
        = O(m^{d'}),

independently of X'_j. Thus,

    E{(E[V_ij² | X_j])²} = O(m^{2d'}) = O(v(m)^{5/2}).
To verify (3.11), note that

    E[V_ij | X_j = x'_1] = ∫ [m^{d'} K(m(x'_1 - x'_2)) r(x'_2) f_M(x'_1)^{-1} - r(x'_1)] w̃(x'_1)^{1/2} f_M(x'_2) dx'_2.

Thus, by the above assumptions,

    sup_{x'_1} |E[V_ij | X_j = x'_1]| < ∞.

Hence, from (3.6), it follows that

    E{(E[V_ij | X_j])⁴} = O(b(m)) = o(v(m) b(m)).
Condition (3.12) will require more effort. Given α > 0, define the sets

    A = {(x'_1, x'_2) : ‖x'_1 - x'_2‖ > 2m^{-α/2}},    A^c = R^{2d'} \ A.

Note that

    E(V_ij V_i'j V_ij' V_i'j') = E{(E[V_ij V_ij' | X_j, X_j'])²} = I + II,    (4.20)

where I is the part of the resulting double integral taken over A^c and II is the part taken over A. But, by the above assumptions, uniformly over x'_1, x'_2,

    E{[m^{d'} K(m(x'_1 - X'_i)) Y_i f_M(x'_1)^{-1} - r(x'_1)] [m^{d'} K(m(x'_2 - X'_i)) Y_i f_M(x'_2)^{-1} - r(x'_2)]}
        = ∫ [m^{2d'} K(m(x'_1 - x'_3)) K(m(x'_2 - x'_3)) M_2(x'_3) f_M(x'_1)^{-1} f_M(x'_2)^{-1}
            - m^{d'} K(m(x'_1 - x'_3)) r(x'_3) f_M(x'_1)^{-1} r(x'_2)
            - r(x'_1) m^{d'} K(m(x'_2 - x'_3)) r(x'_3) f_M(x'_2)^{-1}
            + r(x'_1) r(x'_2)] f_M(x'_3) dx'_3
        = O(m^{d'}).

Thus, since for each x'_1 ∈ R^{d'}, the set {x'_2 : (x'_1, x'_2) ∈ A^c} has Lebesgue measure O(m^{-d'α/2}),

    I = O(m^{2d' - d'α/2}).

Now given (x'_1, x'_2) ∈ A, note that the sets

    A_1 = {x'_3 : ‖x'_1 - x'_3‖ < m^{-α/2}},    A_2 = {x'_3 : ‖x'_2 - x'_3‖ < m^{-α/2}}

are disjoint. Let A_3 = R^{d'} \ (A_1 ∪ A_2). Define I_1, I_2 and I_3 by

    II = ∫∫_A {∫_{A_1} dx'_3 + ∫_{A_2} dx'_3 + ∫_{A_3} dx'_3}² w_0(x'_1) w_0(x'_2) dx'_1 dx'_2
       = ∫∫_A {I_1 + I_2 + I_3}² w_0(x'_1) w_0(x'_2) dx'_1 dx'_2,    (4.21)

where the integrands are those appearing in the expansion above. Now assume that, for some α ∈ (0,1),

    lim_{‖x'‖→∞} ‖x'‖^α |K(x')| < ∞.    (4.22)

It follows that, uniformly over (x'_1, x'_2) ∈ A and x'_3 ∈ A_1, the above assumptions give

    sup_A I_1 = O(m^{d' - α/2}).

Similarly,

    sup_A I_2 = O(m^{d' - α/2}),    sup_A I_3 = O(m^{d' - α/2}).

Thus,

    II = O(m^{2d' - α}).

So now, from (4.20) and (4.21),

    E(V_ij V_i'j V_ij' V_i'j') = O(m^{2d' - α}) + O(m^{2d' - d'α/2}) = o(m^{2d'}),    (4.23)

which verifies (3.12).
The assumptions made in the above argument are summarized in the following corollary to Theorem 1.

Corollary 1 (R-2): Sufficient conditions for the estimator R-2 to satisfy the hypotheses of Theorem 1 are:

    f_M, r, M_2 are uniformly continuous,    (4.24)
    M_4, w̃ are bounded,    (4.25)
    f_M is bounded above 0 on the support of w_0,    (4.26)
    ∫ K(u') du' = 1,    (4.27)
    ∫ K(u')⁴ du' < ∞,    (4.28)
    K(0) < ∞,    (4.29)
    for some α ∈ (0,1),  lim_{‖x'‖→∞} ‖x'‖^α |K(x')| < ∞,    (4.30)
    n^{-1} m^{d'} → 0,  m → ∞.    (4.31)

It is worth noting that this corollary considerably improves Theorem 1 of Härdle [9]. Also, the proof uses techniques quite similar (although more complicated because of the Y_i) to those used in Marron [14].
5. Theorem for Fractional Delta Sequence Estimators
This section extends Theorem 1 to include fractional delta sequence estimators. Since these estimators have denominators containing random variables, they are technically more difficult to work with. In fact, for the estimator R-1, if the kernel function, K, is allowed to take on negative values, then the moments of ĝ(x) may not exist (see Rosenblatt [20] and Härdle and Marron [11]), so MISE could be infinite.

One way around these difficulties is the following. Assume there is a function D(x) and a set S ⊂ R^d so that, uniformly over x ∈ S,

    n^{-1} Σ_{i=1}^{n} δ'_m(x, X_i) → D(x)    (5.1)

in probability, and assume that

    inf_{x∈S} D(x) > 0.    (5.2)
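For R-1, condition (5.1) says that the denominator n^{-1} Σ_i δ'_m(x, X_i) = n^{-1} Σ_i m^{d'} K(m(x' - X'_i)) — itself a D-1 density estimate — settles down to D(x) = f_M(x'), the marginal density of X'_i. A quick numerical sketch (Gaussian K, X' uniform on (0,1) so that f_M ≡ 1 on S; these are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
SQ2PI = np.sqrt(2.0 * np.pi)
Xp = rng.uniform(0.0, 1.0, 50_000)          # X'_i with f_M = 1 on (0, 1)
m = 20.0
xs = np.array([0.3, 0.5, 0.7])              # interior points of S
u = m * (xs[:, None] - Xp[None, :])
denom = np.mean(m * np.exp(-0.5 * u * u) / SQ2PI, axis=1)
print(denom)   # each entry close to D(x) = f_M(x') = 1
```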
Then, uniformly over x ∈ S,

    ĝ(x) - g(x) = [n^{-1} Σ_i (δ_m(x, X_i) - δ'_m(x, X_i) g(x))] / [n^{-1} Σ_i δ'_m(x, X_i)]
        = n^{-1} Σ_i δ*_m(x, X_i)
          + [D(x) - n^{-1} Σ_i δ'_m(x, X_i)] [n^{-1} Σ_i (δ_m(x, X_i) - δ'_m(x, X_i) g(x))] / [D(x) n^{-1} Σ_i δ'_m(x, X_i)]
        = n^{-1} Σ_i δ*_m(x, X_i) + o_p(n^{-1} Σ_i δ*_m(x, X_i)),

where

    δ*_m(x, X_i) = [δ_m(x, X_i) - δ'_m(x, X_i) g(x)] / D(x).    (5.3)
Thus, for w(x) supported inside S, it makes sense to replace MISE with the more tractable error criterion

    MISE* = E ∫ [n^{-1} Σ_{i=1}^{n} δ*_m(x, X_i)]² w(x) dx.    (5.4)

Similarly, ASE and ASE‾ may be replaced with

    ASE* = n^{-1} Σ_{j=1}^{n} [n^{-1} Σ_{i=1}^{n} δ*_m(X_j, X_i)]² w̃(X_j),    (5.5)
    ASE‾* = n^{-1} Σ_{j=1}^{n} [(n-1)^{-1} Σ_{i≠j} δ*_m(X_j, X_i)]² w̃(X_j).

The main theorem of this section consists of a set of sufficient conditions for MISE* to be approximated by ASE* and ASE‾*. An alternative (and quite similar) means of getting around the nonexistence of MISE for the estimator R-1 may be found in Härdle and Marron [10].
Before the theorem is stated, note that, from another point of view, MISE* may be considered to be an assessment of how accurately the delta sequence estimator ĝ*(x), defined by

    ĝ*(x) = n^{-1} Σ_{i=1}^{n} δ*_m(x, X_i),

estimates the function g*(x), defined by g*(x) ≡ 0. Similarly, ASE* and ASE‾* are the ASE and ASE‾ for this new estimation problem. This observation allows immediate application of Theorem 1.

Following Section 3, write

    MISE* = n^{-1} v*(m) + b*(m),    (5.6)

where

    v*(m) = ∫ var[δ*_m(x, X_i)] w(x) dx,    b*(m) = ∫ [Eĝ*(x)]² w(x) dx.

For i, j = 1, ..., n, define

    V*_ij = δ*_m(X_j, X_i) w̃(X_j)^{1/2}.    (5.7)
It will be assumed that (as n → ∞)

    n^{-1} v*(m) → 0,  v*(m) → ∞,  b*(m) → 0,    (5.8)
    E[V*_jj²] = o(n² MISE*).    (5.9)

It will also be assumed that, for i, i', j, j' all different,

    E[V*_ij⁴] = O(v*(m)³),    (5.10)
    E{(E[V*_ij² | X_j])²} = O(v*(m)^{5/2}),    (5.11)
    E{(E[V*_ij | X_j])⁴} = o(v*(m) b*(m)),    (5.12)
    E(V*_ij V*_i'j V*_ij' V*_i'j') = o(v*(m)²).    (5.13)

The theorem of this section is:
Theorem 2: If ĝ(x) is a fractional delta sequence estimator which satisfies (5.8)-(5.13), then

    ASE* = MISE* + o_p(MISE*),
    ASE‾* = MISE* + o_p(MISE*).

To see how Theorem 1 and Theorem 2 are intimately related, note that in the special case where ĝ(x) is a delta sequence estimator (i.e., δ'_m(x, X_i) ≡ 1), conditions (5.1) and (5.2) hold trivially, and the quantities MISE*, ASE*, ASE‾*, v*(m), b*(m) and V*_ij are the same as their unasterisked analogues. Thus Theorem 1 is a special case of Theorem 2. On the other hand, using the second viewpoint given above, Theorem 2 is a consequence of Theorem 1.
6. Examples for Fractional Delta Sequence Estimators

In this section, examples are given which illustrate how the hypotheses of Theorem 2 may be satisfied, under mild conditions, by the fractional delta sequence estimators R-1 and H-1 that were defined in Section 2.
R-1 kernel estimators: Note that by results such as Theorem A of Silverman [24], conditions (5.1) and (5.2) are easily satisfied under reasonable assumptions. Thus, it makes sense to use (5.4) and (5.5) as error criteria. The computations to derive conditions under which the estimator R-1 satisfies (5.8)-(5.13) are very similar to those for the estimator R-2 (studied in Section 4) and hence are omitted. The results are summarized in the following corollary to Theorem 2.
Corollary 2 (R-1): Sufficient conditions for the estimator R-1 to satisfy the hypotheses of Theorem 2 are:

    f_M is bounded above 0 on the support of w_0,    (6.1)
    f_M, r, M_2 are uniformly continuous,    (6.2)
    M_4, w̃ are bounded,    (6.3)
    ∫ K(u') du' = 1,    (6.4)
    ∫ K(u')⁴ du' < ∞,    (6.5)
    K(0) < ∞,    (6.6)
    for some α ∈ (0,1),  lim_{‖x'‖→∞} ‖x'‖^α |K(x')| < ∞,    (6.7)
    n^{-1} m^{d'} → 0,  m → ∞.    (6.8)

It should be noted that the assumptions of this corollary are substantially weaker than those of Theorem 2 of Härdle [9].
H-1 kernel estimators: Recall that, in this setting, the usual convention d = 1 has been adopted. Note that by results such as Theorem 2 of Nadaraya [16], conditions (5.1) and (5.2) are easily satisfied with D(x) = 1 - F(x), under reasonable assumptions. Thus, it makes sense to use (5.4) and (5.5) as error criteria. Note that from (2.5) and (5.3), using (1.1) (here and below, the factor D(x)^{-1} = (1 - F(x))^{-1}, which is bounded above and below on the support of w, is suppressed, since it does not affect the rates),

    E[δ*_m(x, X_i)] = ∫ [m K(m(x - y)) - g(x) ∫_x^∞ m K(m(t - y)) dt] f(y) dy
        = ∫ K(u) f(x - u/m) du - g(x) ∫ K(u) [1 - F(x - u/m)] du
        = ∫ K(u) [f(x - u/m) - f(x)] du + g(x) ∫ K(u) [F(x - u/m) - F(x)] du,    (6.9)

where it has been assumed that ∫ K(u) du = 1. Using the notation

    K*(z) = ∫_z^∞ K(u) du,

note that

    E[δ*_m(x, X_i)²] = E[m K(m(x - X_i)) - g(x) K*(m(x - X_i))]²
        = ∫ [m K(u)² - 2 g(x) K(u) K*(u) + m^{-1} g(x)² K*(u)²] f(x - u/m) du
        = m f(x) ∫ K(u)² du + o(m),

uniformly in x, where it has been assumed (in addition to the above) that g is bounded and f is uniformly continuous. Condition (5.8) follows easily from the above, together with the further assumptions that ∫ K(u)² du < ∞, w(x) is bounded, m → ∞ and n^{-1} m → 0.
To check (5.9), note that

    E[V*_jj²] = ∫ [m K(0) - g(x) K*(0)]² w̃(x) f(x) dx = O(m²) = o(n² MISE*),

by the above assumptions and K(0) < ∞. For (5.10), using the above assumptions, for i ≠ j,

    E[V*_ij⁴] ≤ 8 ∫∫ [m⁴ K(m(x - y))⁴ + g(x)⁴ K*(m(x - y))⁴] w̃(x)² f(x) f(y) dx dy = O(m³),

where it has also been assumed that ∫ K(u)⁴ du < ∞.

Using similar techniques and the same assumptions, (5.11) and (5.12) may be checked by noting that E[V*_ij² | X_j] = O(m) uniformly in X_j, and, by (6.9),

    sup_x |E[V*_ij | X_j = x]| < ∞,

from which it follows that

    E{(E[V*_ij | X_j])⁴} = O(b*(m)) = o(v*(m) b*(m)).

Verification of condition (5.13) requires a computation very similar to that leading to (4.23), using the assumption (4.22). To save space, the details are omitted.
The assumptions made in the above argument are summarized in the following corollary to Theorem 2.

Corollary 2 (H-1): Sufficient conditions for the estimator H-1 to satisfy the hypotheses of Theorem 2 are:

    f is uniformly continuous,    (6.10)
    g and w are bounded,    (6.11)
    ∫ K(u) du = 1,    (6.12)
    ∫ K(u)⁴ du < ∞,    (6.13)
    K(0) < ∞,    (6.14)
    for some α ∈ (0,1),  lim_{‖x‖→∞} ‖x‖^α |K(x)| < ∞,    (6.15)
    n^{-1} m → 0,  m → ∞.    (6.16)
7. Approximation by ISE
This section contains a result which provides sufficient conditions for (1.9), together with some examples illustrating its applicability. Recall that, using the notation of Section 5, both delta sequence estimators and fractional delta sequence estimators may be handled simultaneously. Analogous to (5.4) and (5.5), define

    ISE* = ∫ [n^{-1} Σ_{i=1}^{n} δ*_m(x, X_i)]² w(x) dx.    (7.1)

It will be assumed that

    n^{-1} v*(m) → 0,  v*(m) → ∞,  b*(m) → 0,    (7.2)
    E ∫ δ*_m(x, X_i)⁴ w(x) dx = O(v*(m)³),    (7.3)
    E[∫ δ*_m(x, X_i)² w(x) dx]² = o(v*(m)^{8/3}),    (7.4)

and, for i ≠ i',

    E[∫ δ*_m(x, X_i) δ*_m(x, X_i') w(x) dx]² = o(v*(m)²).    (7.5)

Theorem 3: If ĝ(x) is a fractional delta sequence estimator which satisfies (7.2)-(7.5), then

    ISE* = MISE* + o_p(MISE*).
Computations very similar to those of Sections 4 and 6 may be carried out to derive sufficient conditions for the estimators D-1, D-2, D-3, R-1, R-2 and H-1 to satisfy (7.2)-(7.5). For the purposes of illustration, the results are recorded here for the estimators D-1, D-2 and D-3.

Corollary 3 (D-1): Sufficient conditions for the estimator D-1 to satisfy the hypotheses of Theorem 3 are:

    f is uniformly continuous,    (7.6)
    ∫ K(u) du = 1,    (7.7)
    ∫ K(u)⁴ du < ∞,    (7.8)
    for some α ∈ (0,1),  lim_{‖x‖→∞} ‖x‖^α |K(x)| < ∞,    (7.9)
    w(x) is bounded,    (7.10)
    n^{-1} m^d → 0,  m → ∞.    (7.11)
It should be noted that these assumptions are considerably weaker than
those of Theorem 2 of Hall [8].
Recall from section 2 that the estimators D-2 and D-3 may be handled
simultaneously using the notation of that section.
Corollary 3 (D-2,3): Sufficient conditions for the estimators D-2 or D-3 to satisfy the hypotheses of Theorem 3 are:

f and w are bounded,   (7.12)

for all x in the support of w(x) , \lim_{m \to \infty} \sum_{k=1}^{m} a_k \psi_k(x) = f(x) ,   (7.13)

\lim_{m \to \infty} m^{-1} \sum_{k=1}^{m} \| \psi_k w^{1/2} \|^2 = c ,   (7.14)

\sum_{k,k'=1}^{m} \| \psi_k \psi_{k'} \|^2 = O(m^2) ,   (7.15)

\sum_{k,k',k''=1}^{m} \| \psi_k \psi_{k'} \psi_{k''} \|^2 = O(m^3) ,   (7.16)

n^{-1} m \to 0 , \quad m \to \infty .   (7.17)

As in section 4, it is easily seen that this set of assumptions is very reasonable for either of the estimators D-2 or D-3. Note also that this corollary provides a substantial improvement over Theorem 3 of Hall [8].
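The estimators D-2 and D-3 are orthogonal series estimators built from basis functions \psi_k. The following minimal sketch shows such an estimator in the regime n^{-1} m \to 0 of condition (7.17); the cosine basis on [0,1], the Beta(2,2) target density and the truncation point m are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

# A minimal orthogonal series density estimator on [0,1] with the cosine
# basis psi_0 = 1, psi_k(x) = sqrt(2) cos(k pi x).
def series_density(x, sample, m):
    est = np.ones_like(x)  # a_0 = 1 for any density on [0,1]
    for k in range(1, m + 1):
        a_k = np.mean(np.sqrt(2.0) * np.cos(k * np.pi * sample))  # empirical coefficient
        est += a_k * np.sqrt(2.0) * np.cos(k * np.pi * x)
    return est

rng = np.random.default_rng(1)
sample = rng.beta(2, 2, size=2000)      # true density 6x(1-x) on [0,1]
x = np.linspace(0.0, 1.0, 201)
fhat = series_density(x, sample, m=8)   # modest m, so n^{-1} m is small
max_err = np.max(np.abs(fhat - 6.0 * x * (1.0 - x)))
print(max_err)  # combines truncation bias and sampling noise
```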
8.
Proof of Theorem 1
First it will be seen that the first conclusion of Theorem 1 follows from the second. Note that by (3.5)

ASE = n^{-1} \sum_{j=1}^{n} \Big[ \Big( n^{-1} \sum_{i \ne j} V_{ij} \Big)^2 + 2 n^{-2} \Big( \sum_{i \ne j} V_{ij} \Big) V_{jj} + n^{-2} V_{jj}^2 \Big]

= n^{-2} (n-1)^2 \overline{ASE} + 2 (n-1) n^{-2} \sum_{j=1}^{n} \Big( (n-1)^{-1} \sum_{i \ne j} V_{ij} \Big) \big( n^{-1} V_{jj} \big) + n^{-1} \sum_{j=1}^{n} n^{-2} V_{jj}^2 .

But by (3.8) and the Markov Inequality

n^{-1} \sum_{j=1}^{n} n^{-2} V_{jj}^2 = o_P(MISE) .

Hence, by the Schwarz Inequality, Theorem 1 will be established when it is seen that

\overline{ASE} = MISE + o_P(MISE) .   (8.1)
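The algebraic decomposition of ASE used above can be checked numerically for any array of values V_{ij}. The interpretation below (ASE built from full sums over i, the leave-one-out version \overline{ASE} from sums with i \ne j) is an assumption made explicit for the sketch:

```python
import numpy as np

# Check of the identity
#   ASE = n^{-2}(n-1)^2 ASEbar + cross term + n^{-1} sum_j n^{-2} V_jj^2 ,
# where ASE    = n^{-1} sum_j ( n^{-1} sum_i V_ij )^2 and
#       ASEbar = n^{-1} sum_j ( (n-1)^{-1} sum_{i != j} V_ij )^2 .
rng = np.random.default_rng(2)
n = 7
V = rng.standard_normal((n, n))

col = V.sum(axis=0)        # sum_i V_ij for each j
diag = np.diag(V).copy()   # V_jj
off = col - diag           # sum_{i != j} V_ij

ase = np.mean((col / n) ** 2)
ase_bar = np.mean((off / (n - 1)) ** 2)

rhs = ((n - 1) ** 2 / n**2) * ase_bar \
    + 2 * (n - 1) / n**2 * np.sum((off / (n - 1)) * (diag / n)) \
    + (1 / n) * np.sum(diag**2 / n**2)

gap = abs(ase - rhs)
print(gap)  # zero up to floating point round-off
```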
To check this, first note that

E(\overline{ASE}) = \int E\big[ (\hat{g}_j(x) - g(x))^2 w(x) \mid X_j = x \big] f(x) \, dx

= \int var[\hat{g}_j(x)] \, w(x) \, dx + \int [E \hat{g}_j(x) - g(x)]^2 \, w(x) \, dx

= (n-1)^{-1} n \int var[\hat{g}(x)] \, w(x) \, dx + b(m)

= MISE + o(MISE) .

Thus, by the Chebyshev Inequality, (8.1) and hence Theorem 1 will be established when it is seen that

var(\overline{ASE}) = o(MISE^2) .   (8.2)
(8.2)
To verify this, note that by (3.5), for j
~
j'
Now by a further application of the linearity of variances and covariances, the
variance part of (8.3) may be expanded into (n-l)4 terms of the following types
(where i, i', i", i *, j are all different):
Term Number
1
2
3
_
4
Number of Terms
O(n)
O(n 2)
O(n 2)
O(n 2)
6
o(n3)
O(n 3)
7
O(n 4)
5
General Term
n var(V 2.. )
1J
-5
n var(V ..V.,.)
-5
1J 1 J
n -5 COy (V 2.. , V.2 , .)
1J
1 J
1J
1J 1 J
1J
1 J 1 J
1J
1 J
n -5 coy (V 2.. ,V ..V. , .)
n -5 COy (V 2.. ,V. , .V." .)
n -5 COy (V .. ,V.
-5
I .
,V ..V. II" )
1J 1 J
n COy (V .. V. , . ,V. II" V. * .)
1J 1 J
1 J 1 J
Similarly, the covariance part of (8.3) may be expanded into (n-1)^4 terms of the following types (where i, i', i'', i^* , j, j' are all different):

Term Number   Number of Terms   General Term
8             O(1)              n^{-4} cov(V_{j'j}^2 , V_{jj'}^2)
9             O(n)              n^{-4} cov(V_{j'j}^2 , V_{ij'}^2)
10            O(n)              n^{-4} cov(V_{ij}^2 , V_{ij'}^2)
11            O(n^2)            n^{-4} cov(V_{ij}^2 , V_{i'j'}^2)
12            O(n)              n^{-4} cov(V_{j'j}^2 , V_{jj'} V_{ij'})
13            O(n^2)            n^{-4} cov(V_{j'j}^2 , V_{ij'} V_{i'j'})
14            O(n)              n^{-4} cov(V_{ij}^2 , V_{ij'} V_{jj'})
15            O(n^2)            n^{-4} cov(V_{ij}^2 , V_{ij'} V_{i'j'})
16            O(n^2)            n^{-4} cov(V_{ij}^2 , V_{jj'} V_{i'j'})
17            O(n^3)            n^{-4} cov(V_{ij}^2 , V_{i'j'} V_{i''j'})
18            O(n)              n^{-4} cov(V_{j'j} V_{ij} , V_{jj'} V_{ij'})
19            O(n^2)            n^{-4} cov(V_{j'j} V_{ij} , V_{jj'} V_{i'j'})
20            O(n^2)            n^{-4} cov(V_{j'j} V_{ij} , V_{ij'} V_{i'j'})
21            O(n^3)            n^{-4} cov(V_{j'j} V_{ij} , V_{i'j'} V_{i''j'})
22            O(n^2)            n^{-4} cov(V_{ij} V_{i'j} , V_{ij'} V_{i'j'})
23            O(n^3)            n^{-4} cov(V_{ij} V_{i'j} , V_{ij'} V_{i''j'})
24            O(n^4)            n^{-4} cov(V_{ij} V_{i'j} , V_{i''j'} V_{i^*j'})
The statement (8.2) will be verified by showing that each general term, multiplied by the number of such terms, is of a lower order than MISE^2. First note that by (3.6),

E(V_{ij} V_{i'j}) = E(E[V_{ij} | X_j] E[V_{i'j} | X_j]) = b(m) ,   (8.4)

and by (3.10) and the Moment Inequality,

E(V_{ij}^2) \le [E(E[V_{ij}^2 | X_j]^2)]^{1/2} = O(v(m)^{5/4}) .   (8.5)
Term 1: By (3.1), (3.7) and (3.9),

n^{-4} var(V_{ij}^2) = o(MISE^2) .

Term 2: By the Schwarz inequality and the computations for Term 1,

n^{-3} var(V_{ij} V_{i'j}) \le n^{-3} E[V_{ij}^2 V_{i'j}^2] \le n^{-3} (E[V_{ij}^4] E[V_{i'j}^4])^{1/2} = o(MISE^2) .

Term 3: By the Schwarz inequality for covariances and the bound on Term 2,

n^{-3} cov(V_{ij}^2 , V_{i'j}^2) \le n^{-3} [var(V_{ij}^2) var(V_{i'j}^2)]^{1/2} = o(MISE^2) .

Term 4: Similarly,

n^{-3} cov(V_{ij}^2 , V_{ij} V_{i'j}) \le n^{-3} [var(V_{ij}^2) var(V_{ij} V_{i'j})]^{1/2} = o(MISE^2) .

Term 5: By (3.10), (3.11) and the independence of X_1 , \ldots , X_n ,

E(V_{ij}^2 V_{i'j} V_{i''j}) = E(E[V_{ij}^2 | X_j] E[V_{i'j} | X_j] E[V_{i''j} | X_j]) \le [E(E[V_{ij}^2 | X_j]^2) E(E[V_{ij} | X_j]^4)]^{1/2} = O(v(m)^{5/4}) \, o(v(m)^{1/2} b(m)^{1/2}) .

Thus by (3.1), (3.7), (8.4) and (8.5),

n^{-2} cov(V_{ij}^2 , V_{i'j} V_{i''j}) = o(MISE^2) .

Term 6: Using the same computations as Term 5,

n^{-2} cov(V_{ij} V_{i'j} , V_{ij} V_{i''j}) = o(MISE^2) .

Term 7: By (3.11) and independence,

E(V_{ij} V_{i'j} V_{i''j} V_{i^*j}) = E(E[V_{ij} | X_j]^4) = o(v(m) b(m)) .

Thus,

n^{-1} cov(V_{ij} V_{i'j} , V_{i''j} V_{i^*j}) = o(MISE^2) .

Term 8: By the same method as that used on Term 3,

n^{-4} cov(V_{j'j}^2 , V_{jj'}^2) = o(MISE^2) .

Terms 9, 10: Similarly,

n^{-3} cov(V_{j'j}^2 , V_{ij'}^2) = o(MISE^2) , \quad n^{-3} cov(V_{ij}^2 , V_{ij'}^2) = o(MISE^2) .

Term 11: By independence,

n^{-2} cov(V_{ij}^2 , V_{i'j'}^2) = 0 .

Term 12: By the same method as that used on Term 4,

n^{-3} cov(V_{j'j}^2 , V_{jj'} V_{ij'}) = o(MISE^2) .

Term 13: By (8.4), (8.5) and independence,

E(V_{j'j}^2 V_{ij'} V_{i'j'}) = E(E[V_{j'j}^2 | X_{j'}] E[V_{ij'} | X_{j'}] E[V_{i'j'} | X_{j'}]) = O(v(m)^{5/4}) b(m) .

Thus,

n^{-2} cov(V_{j'j}^2 , V_{ij'} V_{i'j'}) = o(MISE^2) .

Term 14: Similar to Term 12,

n^{-3} cov(V_{ij}^2 , V_{ij'} V_{jj'}) = o(MISE^2) .

Term 15: In a spirit similar to Term 13,

E(V_{ij}^2 V_{ij'} V_{i'j'}) = E(V_{ij}^2 E[V_{ij'} | X_i , X_{j'}] E[V_{i'j'} | X_{j'}]) \le [E(E[V_{ij}^2 | X_j]^2)]^{1/2} [E(V_{ij'}^2)]^{1/2} b(m)^{1/2} = O(v(m)^{5/4}) O(v(m)^{5/8}) b(m)^{1/2} .

Thus,

n^{-2} cov(V_{ij}^2 , V_{ij'} V_{i'j'}) = o(MISE^2) .

Term 16: Similarly,

E(V_{ij}^2 V_{jj'} V_{i'j'}) \le [E(E[V_{ij}^2 | X_j]^2) E(V_{jj'}^2)]^{1/2} b(m)^{1/2} .

Thus,

n^{-2} cov(V_{ij}^2 , V_{jj'} V_{i'j'}) = o(MISE^2) .

Term 17: By independence,

n^{-1} cov(V_{ij}^2 , V_{i'j'} V_{i''j'}) = 0 .

Term 18: By the same method as that used on Term 4,

n^{-3} cov(V_{j'j} V_{ij} , V_{jj'} V_{ij'}) = o(MISE^2) .

Term 19: By the above methods,

E(V_{j'j} V_{ij} V_{jj'} V_{i'j'}) = E(V_{j'j} V_{jj'} E[V_{ij} | X_j , X_{j'}] E[V_{i'j'} | X_j , X_{j'}])

\le [E(V_{j'j}^2 V_{jj'}^2) E(E[V_{ij} | X_j]^2 E[V_{i'j'} | X_{j'}]^2)]^{1/2}

\le [[E(V_{j'j}^4) E(V_{jj'}^4)]^{1/2} E(E[V_{ij} | X_j]^2) E(E[V_{i'j'} | X_{j'}]^2)]^{1/2} = O(v(m)^{3/2}) b(m) .

Thus,

n^{-2} cov(V_{j'j} V_{ij} , V_{jj'} V_{i'j'}) = o(MISE^2) .

Term 20: Similarly,

E(V_{j'j} V_{ij} V_{ij'} V_{i'j'}) = E(V_{j'j} V_{ij} V_{ij'} E[V_{i'j'} | X_{i'} , X_j , X_{j'}])

\le [E(V_{j'j}^2 V_{ij'}^2) E(V_{ij}^2 E[V_{i'j'} | X_{j'}]^2)]^{1/2}

= [E(E[V_{j'j}^2 | X_{j'}] E[V_{ij'}^2 | X_{j'}]) E(V_{ij}^2) b(m)]^{1/2}

\le [[E(E[V_{j'j}^2 | X_{j'}]^2) E(E[V_{ij'}^2 | X_{j'}]^2)]^{1/2} E(V_{ij}^2) b(m)]^{1/2} = O(v(m)^{5/2})^{1/2} O(v(m)^{5/4})^{1/2} b(m)^{1/2} .

Thus,

n^{-2} cov(V_{j'j} V_{ij} , V_{ij'} V_{i'j'}) = o(MISE^2) .

Term 21: Using the computations from Term 5,

E(V_{j'j} V_{ij} V_{i'j'} V_{i''j'}) = E(V_{j'j} E[V_{ij} | X_j] E[V_{i'j'} | X_{j'}] E[V_{i''j'} | X_{j'}])

\le [E(V_{j'j}^2 E[V_{i'j'} | X_{j'}]^2) E(E[V_{ij} | X_j]^2) E(E[V_{i''j'} | X_{j'}]^2)]^{1/2} ,

so that

n^{-1} cov(V_{j'j} V_{ij} , V_{i'j'} V_{i''j'}) = o(MISE^2) .

Term 22: By (3.12),

E(V_{ij} V_{i'j} V_{ij'} V_{i'j'}) = o(v(m)^2) .

Thus,

n^{-2} cov(V_{ij} V_{i'j} , V_{ij'} V_{i'j'}) = o(MISE^2) .

Term 23: Again using (3.12),

E(V_{ij} V_{i'j} V_{ij'} V_{i''j'}) = E(E[V_{ij} V_{ij'} | X_j , X_{j'}] E[V_{i'j} | X_j] E[V_{i''j'} | X_{j'}])

\le [E(E[V_{ij} V_{ij'} | X_j , X_{j'}]^2) E(E[V_{i'j} | X_j]^2) E(E[V_{i''j'} | X_{j'}]^2)]^{1/2}

= [E(E[V_{ij} V_{ij'} | X_j , X_{j'}] E[V_{i'j} V_{i'j'} | X_j , X_{j'}]) b(m)^2]^{1/2} = o(v(m)) b(m) .

Thus,

n^{-1} cov(V_{ij} V_{i'j} , V_{ij'} V_{i''j'}) = o(MISE^2) .

Term 24: By independence,

cov(V_{ij} V_{i'j} , V_{i''j'} V_{i^*j'}) = 0 .

Now, looking back to (8.3), the proof of (8.2) and hence that of Theorem 1 is complete.
9. Proof of Theorem 3

This proof is very similar in spirit to the proof of Theorem 1. First note that by (5.4) and (7.1)

E[ISE*] = MISE* .

Thus, Theorem 3 will follow from Chebyshev's Inequality when it is seen that

var[ISE*] = o(MISE*^2) .   (9.1)
To check this, note that
var[ISE*]
=
In
2
In
2
ffcov([nL
o*(x,X.)] ,[n- L o*(y,X.)] )w(x)w(y)dxdy
.
'lm
1
'lm
1
1=
n
=
1=
I f f n-4cov(0*m(x,X.)
6* (x,X. ,)0* (y ,X.,,) 0* (y ,x. *) )w(x)w(y) dxdy.
1 m
1
m
1
m
1
.
. I
1,1
,1. " ,1• *-1.
-
-36Most of these covariances are
a by
independence.
The ones that are not may
be slUllll1arized in the following table (where i, it, i", i* are assumed to be
different).
Tenn Number
General Tenn
Number of Tenns
I
O(n)
n- 4cov(8*(x,x.)2,8*(y,x.)2)
II
o(n 2)
n- 4cov(8*(x,X.)2,8*(y,X.)8*(y,X.,))
m
1
m
1 m
1
III
2
O(n )
4
n- cov(cS*(X,X.)cS*(x,X. ,) ,8*(y,X.)cS* (y,X. I))
m
1m
1
m
1m
1
IV
o(n3)
n -4cov(8* (x ,X.) 8* (x ,X. ,) ,8* (y ,X.) 8* (y ,X. II))
m
1m
1
m
1m
1
m
1
m
1
The statement (9.1) will be verified by checking that each general term, when integrated and multiplied by the number of such terms, is of a lower order than MISE*^2. First note that, by (5.6),

\int [E \delta_m^*(x, X_i)]^2 w(x) \, dx = b^*(m) ,

and by (7.4) and the Moment Inequality,

E \int \delta_m^*(x, X_i)^2 w(x) \, dx \le \Big[ E \Big( \int \delta_m^*(x, X_i)^2 w(x) \, dx \Big)^2 \Big]^{1/2} = o(v^*(m)^{4/3}) .

Term I: By computations very similar to those used on Term 3 in section 8, it is easily seen that, by (7.3),

n^{-3} \int\int cov(\delta_m^*(x,X_i)^2 , \delta_m^*(y,X_i)^2) w(x) w(y) \, dx \, dy = o(MISE*^2) .

Term II: By computations very similar to those used on Term 15 in section 8, it is easily seen that, by (7.4),

n^{-2} \int\int cov(\delta_m^*(x,X_i)^2 , \delta_m^*(y,X_i) \delta_m^*(y,X_{i'})) w(x) w(y) \, dx \, dy = o(MISE*^2) .

Term III: By computations very similar to those used on Term 22 in section 8, it is easily seen that, by (7.5),

n^{-2} \int\int cov(\delta_m^*(x,X_i) \delta_m^*(x,X_{i'}) , \delta_m^*(y,X_i) \delta_m^*(y,X_{i'})) w(x) w(y) \, dx \, dy = o(MISE*^2) .

Term IV: By computations very similar to those used on Term 23 in section 8, it is easily seen that

n^{-1} \int\int cov(\delta_m^*(x,X_i) \delta_m^*(x,X_{i'}) , \delta_m^*(y,X_i) \delta_m^*(y,X_{i''})) w(x) w(y) \, dx \, dy = o(MISE*^2) .

This completes the proof of Theorem 3.
10. Uniform Convergence

For studying optimal selection of the smoothing parameter by cross-validatory techniques, it has been seen in Marron [13] and Hardle and Marron [10] that it is useful to consider MISE and ASE as functions of m. In particular, it is important to know that ASE is approximately MISE, "uniformly over the smoothing parameter, m."
To make this precise, let \underline{m}(n) and \overline{m}(n) be sequences for which \underline{m}(n) \le \overline{m}(n). Define

"ASE(m) .uapp. MISE(m)"

to mean, for all \gamma > 0 ,

\sup_{m \in [\underline{m} , \overline{m}]} P[ \, |ASE/MISE - 1| > \gamma \, ] \to 0 .

The notion of "uniform over [\underline{m} , \overline{m}]" may be similarly defined for other quantities (both random and nonrandom).
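The following sketch illustrates this uniform notion in a hypothetical kernel density setting: the expected worst-case relative deviation of ISE from MISE, taken over a grid of bandwidths playing the role of the interval of smoothing parameters, shrinks as n grows. Normal data and a Gaussian kernel are illustrative assumptions, not the paper's general setting.

```python
import numpy as np

rng = np.random.default_rng(3)
grid = np.linspace(-4.0, 4.0, 201)
dx = grid[1] - grid[0]
f_true = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
bandwidths = np.linspace(0.2, 1.0, 9)  # the grid of smoothing parameters

def ise_curve(n):
    # ISE(h) over the bandwidth grid for one simulated sample of size n.
    sample = rng.standard_normal(n)
    u = grid[:, None] - sample[None, :]
    return np.array([
        np.sum((np.exp(-0.5 * (u / h) ** 2).sum(axis=1)
                / (n * h * np.sqrt(2.0 * np.pi)) - f_true) ** 2) * dx
        for h in bandwidths
    ])

sup_dev = {}
for n in (100, 800):
    curves = np.array([ise_curve(n) for _ in range(60)])
    mise = curves.mean(axis=0)  # Monte Carlo estimate of MISE(h)
    # Average over replications of sup_h |ISE(h)/MISE(h) - 1|.
    sup_dev[n] = np.abs(curves / mise - 1.0).max(axis=1).mean()

print(sup_dev)
```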
A careful inspection of the proof of Theorem 1 will show that that theorem may be extended to:

Theorem 4: If \hat{g}(x) is a delta sequence estimator which satisfies (3.7) to (3.12) uniformly over m \in [\underline{m} , \overline{m}] , then

ASE .uapp. MISE , \quad \overline{ASE} .uapp. MISE .

Similarly, Theorem 2 can be extended to:

Theorem 5: If \hat{g}(x) is a fractional delta sequence estimator which satisfies (5.8)-(5.13) uniformly over m \in [\underline{m} , \overline{m}] , then

ASE* .uapp. MISE* , \quad \overline{ASE}* .uapp. MISE* .

In the same fashion Theorem 3 can be extended to:

Theorem 6: If \hat{g}(x) is a fractional delta sequence estimator which satisfies (7.2)-(7.5) uniformly over m \in [\underline{m} , \overline{m}] , then

ISE* .uapp. MISE* .

In the same way the corollaries of sections 4, 6 and 7 may be extended to:
Corollary 4 (D-1): Sufficient conditions for the estimator D-1 to satisfy the hypotheses of Theorem 4 are (4.1)-(4.6) together with

n^{-1} \overline{m}^d \to 0 , \quad \underline{m} \to \infty .   (10.1)

Corollary 4 (D-2,3): Sufficient conditions for the estimators D-2 or D-3 to satisfy the hypotheses of Theorem 4 are (4.13)-(4.18) together with

n^{-1} \overline{m} \to 0 , \quad \underline{m} \to \infty .   (10.2)

Corollary 4 (R-2): Sufficient conditions for the estimator R-2 to satisfy the hypotheses of Theorem 4 are (4.24)-(4.30) together with

n^{-1} \overline{m}^{d'} \to 0 , \quad \underline{m} \to \infty .   (10.3)

Corollary 5 (R-1): Sufficient conditions for the estimator R-1 to satisfy the hypotheses of Theorem 5 are (6.1)-(6.8) together with

n^{-1} \overline{m}^{d'} \to 0 , \quad \underline{m} \to \infty .   (10.4)

Corollary 5 (H-1): Sufficient conditions for the estimator H-1 to satisfy the hypotheses of Theorem 5 are (6.10)-(6.15) together with

n^{-1} \overline{m} \to 0 , \quad \underline{m} \to \infty .   (10.5)

Corollary 6 (D-1): Sufficient conditions for the estimator D-1 to satisfy the hypotheses of Theorem 6 are (7.6)-(7.10) together with

n^{-1} \overline{m}^d \to 0 , \quad \underline{m} \to \infty .   (10.6)

Corollary 6 (D-2,3): Sufficient conditions for the estimators D-2 or D-3 to satisfy the hypotheses of Theorem 6 are (7.12)-(7.16) together with

n^{-1} \overline{m} \to 0 , \quad \underline{m} \to \infty .   (10.7)
REFERENCES

1. S. BEAN and C.P. TSOKOS, Bandwidth selection procedures for kernel density estimates, Comm. Statist., A11 (1982), 1045-1069.

2. A.W. BOWMAN, A comparative study of some kernel-based nonparametric density estimators, Research Report, Statistical Laboratory, University of Manchester (1982).

3. L. BREIMAN, W. MEISEL and E. PURCELL, Variable kernel estimates of multivariate densities, Technometrics, 19 (1977), 135-144.

4. N.N. CENCOV, Evaluation of an unknown distribution density from observations, Soviet Math., 3 (1962), 1559-1562.

5. G. COLLOMB, Estimation non-parametrique de la regression: revue bibliographique, Int. Statist. Rev., 49 (1981), 75-93.

6. R.F. ENGLE, C.W.J. GRANGER, J. RICE and A. WEISS, Nonparametric estimates of the relation between weather and elasticity of demand, Discussion paper #83-17, Department of Economics, University of California at San Diego.

7. M.J. FRYER, Review of some non-parametric methods of density estimation, J. Inst. Math. Applic., 20 (1977), 335-354.

8. P. HALL, Limit theorems for stochastic measures of the accuracy of density estimators, Stochastic Processes and their Appl., 13 (1982), 11-25.

9. W. HARDLE, Approximations to the mean integrated squared error with applications to optimal bandwidth selection for nonparametric regression function estimators, North Carolina Institute of Statistics, Mimeo Series #1529.

10. W. HARDLE and J.S. MARRON, Optimal bandwidth selection in nonparametric regression function estimation, North Carolina Institute of Statistics, Mimeo Series #1530.

11. W. HARDLE and J.S. MARRON, The nonexistence of moments of some kernel regression estimators, North Carolina Institute of Statistics, Mimeo Series #1537.

12. G.J. JOHNSTON, Properties of maximal deviations for nonparametric regression function estimates, J. Multivar. Anal., 12 (1982), 402-414.

13. J.S. MARRON, An asymptotically efficient solution to the bandwidth problem of kernel density estimation, North Carolina Institute of Statistics, Mimeo Series #1518.

14. J.S. MARRON, Convergence properties of an empirical error criterion for multivariate density estimation, North Carolina Institute of Statistics, Mimeo Series #1520.

15. E.A. NADARAYA, On estimating regression, Theor. Prob. Appl., 9 (1964), 141-142.

16. E.A. NADARAYA, Some new estimates for distribution functions, Theor. Prob. Appl., 9 (1964), 497-500.

17. E. PARZEN, On estimation of a probability density function and mode, Ann. Math. Statist., 33 (1962), 1065-1076.

18. J.W. RAATGEVER and R.P.W. DUIN, On the variable kernel model for multivariate nonparametric density estimation, Compstat 1978, eds. L.C.A. Corsten and J. Hermans, 524-533.

19. M. ROSENBLATT, Remarks on some nonparametric estimates of a density function, Ann. Math. Statist., 27 (1956), 832-837.

20. M. ROSENBLATT, Conditional probability density and regression estimators, Multivariate Analysis II, ed. P.R. Krishnaiah, (1969), 25-31.

21. M. ROSENBLATT, Curve estimates, Ann. Math. Statist., 42 (1971), 1815-1842.

22. A.E. RUST and C.P. TSOKOS, On the convergence of kernel estimators of probability density functions, Ann. Inst. Statist. Math., 33 (1981), 233-246.

23. D.W. SCOTT and L.E. FACTOR, Monte Carlo study of three data-based nonparametric probability density estimators, J. Amer. Statist. Assoc., 76 (1981), 9-15.

24. B.W. SILVERMAN, Weak and strong uniform consistency of the kernel estimate of a density and its derivatives, Ann. Statist., 6 (1978), 177-184.

25. B.W. SILVERMAN, A fast and efficient cross-validation method for smoothing parameter choice in spline regression, (unpublished manuscript).

26. J.M. STEELE, Invalidity of average squared error criterion in density estimation, Canad. J. Statist., 6 (1978), 193-200.

27. C.J. STONE, Nearest neighbor estimators of a nonlinear regression function, Proc. Comp. Sci. Statist. 8th Annual Symposium on the Interface (1976), Health Sciences Computing Facility, U.C.L.A., 413-418.

28. V. SUSARLA and G. WALTER, Estimation of a multivariate density function using delta sequences, Ann. Statist., 9 (1981), 347-355.

29. M.A. TANNER and W.H. WONG, Data-based nonparametric estimation of the hazard function with applications to model diagnostics and exploratory analysis, Tech. Rep. No. 137, Dept. of Statist., Univ. of Chicago (1982).

30. J.W. TUKEY, Curves as parameters, and touch estimation, Proc. 4th Berkeley Symp. (1961), 681-694.

31. G. WAHBA, Optimal smoothing of density estimates, in Classification and Clustering (ed. J. Van Ryzin), Academic Press, New York (1977), 423-458.

32. G. WALTER, Properties of Hermite series estimation of probability density, Ann. Statist., 5 (1977), 1258-1264.

33. G. WALTER and J. BLUM, Probability density estimation using delta sequences, Ann. Statist., 7 (1979), 328-340.

34. G.S. WATSON, Smooth regression analysis, Sankhya, Ser. A, 26 (1964), 359-372.

35. G.S. WATSON and M.R. LEADBETTER, Hazard analysis I, Biometrika, 51 (1964), 175-184.

36. G.S. WATSON and M.R. LEADBETTER, Hazard analysis II, Sankhya, Ser. A, 26 (1964), 101-116.

37. E.J. WEGMAN, Nonparametric probability density estimation: II. A comparison of density estimation methods, J. Statist. Comp. and Simul., 1 (1972), 225-245.