
Multivariate Regression Estimation with Errors-in-Variables:
Asymptotic Normality for Mixing Processes

Jianqing Fan†
Department of Statistics
University of North Carolina
Chapel Hill, N.C. 27599-3260

Elias Masry*
Department of Electrical and Computer Engineering
University of California at San Diego
La Jolla, CA 92093

July 14, 1991
Abstract

Errors-in-variables regression is the study of the association between covariates and responses when the covariates are observed with errors. In this paper, we consider the estimation of multivariate regression functions for dependent data with errors in the covariates. A nonparametric deconvolution technique is used to account for the errors-in-variables. The asymptotic behavior of the regression estimators depends on the smoothness of the error distributions, which are characterized as either ordinary smooth or super smooth. Asymptotic normality is established for both strongly mixing and ρ-mixing processes, when the error distribution is either ordinary smooth or super smooth.
* Supported by the Office of Naval Research under Grant N00014-90-J-1175.
† Completed while visiting the Department of Electrical and Computer Engineering, University of California at San Diego.
Abbreviated title. Errors-in-variables regression.
AMS 1991 Subject Classification. Primary 62G07. Secondary 62H10, 60F05.
Key words and phrases. Asymptotic normality, deconvolution, errors-in-variables, multivariate regression, mixing processes.
1  Introduction
In data analysis, it is customary to explore the association between covariates and responses via regression analysis. Let X^0 denote the covariate variable and Y the response variable. The regression function is defined by m(x) = E(Y | X^0 = x), which is assumed to exist. This paper deals with the regression problem with errors-in-variables: we wish to estimate m(x), but direct observations of the covariate X^0 are not available. Instead, due to the measuring mechanism or the nature of the environment, the covariate X^0 is measured with error \varepsilon:

    X_j = X_j^0 + \varepsilon_j,

so that X_j instead of X_j^0 is observed, and one desires to explore the association between X^0 and Y based on the observations (X_j, Y_j)_{j=1}^{n}. This problem arises, for example, in medical and epidemiologic studies where risk factors are partially observed. See Prentice (1986) and Whittemore and Keller (1988).
In the i.i.d. case, the nonparametric errors-in-variables problem was studied by Fan and Truong (1990) and Fan, Truong and Wang (1990), where optimal rates of convergence and asymptotic normality are established. Let K(\cdot) be a kernel function whose Fourier transform is given by

    \phi_K(t) = \int_{-\infty}^{\infty} \exp(itx)\, K(x)\, dx,    (1.1)

and let \phi_\varepsilon(t) be the characteristic function of the error variable \varepsilon. Set

    W_b(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \exp(-itx)\, \frac{\phi_K(t)}{\phi_\varepsilon(t/b)}\, dt,    (1.2)

which is a deconvolution kernel. Fan and Truong (1990) proposed the following kernel estimate for m(x):

    \hat m_n(x) = \frac{\sum_{j=1}^{n} Y_j\, W_{b_n}((x - X_j)/b_n)}{\sum_{j=1}^{n} W_{b_n}((x - X_j)/b_n)},    (1.3)

where b_n is the bandwidth parameter. We remark that the deconvolution kernel W_{b_n} is used to account for the fact that the covariates are observed with error. For more discussions on deconvolution, see Carroll and Hall (1988), Liu and Taylor (1989), Stefanski and Carroll (1990), Zhang (1990), and Fan (1991a, b, 1992) in the i.i.d. setting and Masry (1991a, b, c) for dependent observations.
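To make (1.2) concrete, the following numerical sketch evaluates the deconvolution kernel W_b. It assumes, purely for illustration, a Laplace error distribution and a kernel whose Fourier transform is \phi_K(t) = (1 - t^2)^3 on [-1, 1]; neither choice is prescribed by the paper, and the quadrature grid is arbitrary.

# Numerical sketch of the deconvolution kernel W_b of (1.2), assuming a
# Laplace measurement error and a kernel with phi_K(t) = (1 - t^2)^3 on [-1, 1].
# These particular choices are illustrative, not taken from the paper.
import numpy as np

def phi_K(t):
    """Fourier transform of the kernel: compactly supported on [-1, 1]."""
    return np.where(np.abs(t) <= 1.0, (1.0 - t**2) ** 3, 0.0)

def phi_eps(t, sigma=0.5):
    """Characteristic function of a Laplace(0, sigma) error."""
    return 1.0 / (1.0 + (sigma * t) ** 2)

def W_b(x, b, sigma=0.5, grid=2001):
    """Deconvolution kernel (1.2) by trapezoidal quadrature.

    Since phi_K is even and real, W_b(x) = (1/2pi) * integral of
    cos(t x) phi_K(t) / phi_eps(t/b) over t in [-1, 1]."""
    t = np.linspace(-1.0, 1.0, grid)          # phi_K vanishes outside [-1, 1]
    integrand = np.cos(np.outer(np.atleast_1d(x), t)) * phi_K(t) / phi_eps(t / b, sigma)
    return np.trapz(integrand, t, axis=1) / (2.0 * np.pi)

# The deconvolution kernel inflates as b -> 0: the variance cost of deconvolution.
xs = np.linspace(-5, 5, 11)
print(W_b(xs, b=1.0))
print(W_b(xs, b=0.3))

The oscillating, inflating shape of W_b as b decreases is what drives the slower convergence rates discussed later.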
Our goal in this paper is to establish the asymptotic normality for estimators of the form (1.3) in the following more general setting.

• The processes {X_j^0} and {Y_j} are individually and jointly dependent.

• Multivariate regression from past vector data is considered.

• Estimation of a general regression function of the form

    m(x; p) = E\big( \psi(Y_p) \mid X_1^0 = x_1, \ldots, X_p^0 = x_p \big)    (1.4)

is studied, where \psi(\cdot) is an arbitrary measurable function. These functions include the usual mean regression and conditional moment functions as well as conditional distribution functions.

We note that in this general setting, sharp almost sure convergence rates were established in Masry (1991d).
When the error variable \varepsilon \equiv 0, the errors-in-variables problem reduces to the ordinary nonparametric regression where covariates are observable. In that case, the deconvolution kernel (1.2) is just an ordinary kernel, in which case the estimator (1.3) was proposed by Nadaraya (1964) and Watson (1964). The estimator (1.3) has been thoroughly studied with no errors in covariates. See, for example, Mack and Silverman (1982) and Härdle (1990) and references therein for i.i.d. observations, and Rosenblatt (1969), Robinson (1983, 1986), Collomb and Härdle (1986), Truong and Stone (1991), Truong (1991) and Roussas and Tran (1991), among others, for dependent observations.

We now introduce the regression estimator in the more general setting mentioned above. Let {X_j^0}_{-\infty}^{\infty} and {Y_j}_{-\infty}^{\infty} be jointly stationary processes and let {\varepsilon_j}_{-\infty}^{\infty} be i.i.d. random variables, independent of the processes {X_j^0}_{-\infty}^{\infty} and {Y_j}_{-\infty}^{\infty}. Denote the probability density and the characteristic function of the error variable \varepsilon by h(x) and \phi_\varepsilon(t), respectively. Set

    X_j = X_j^0 + \varepsilon_j, \qquad j = 0, \pm 1, \ldots.
Let r(x; p) = r(x_1, \ldots, x_p; p) be the joint probability density function of the random variables X_1^0, \ldots, X_p^0, which is assumed to exist. Then the joint probability density function of X_1, \ldots, X_p is given by

    f(x; p) = \int_{R^p} r(x - u; p)\, h(u)\, du,    (1.5)

where, for u = (u_1, \ldots, u_p),

    h(u) = \prod_{j=1}^{p} h(u_j).    (1.6)
Let

    X_j = (X_{j+1}, \ldots, X_{j+p}).    (1.7)
For simplicity, we use a product kernel for the multivariate nonparametric regression estimation. Let the kernel K be a real-valued, even, and bounded density function on the real line satisfying K(x) = O(|x|^{-1-\delta}) for some \delta > 0, and let \phi_K(t) be its Fourier transform. A basic assumption on the error distribution and the kernel function is that for every b > 0

    \int_{-\infty}^{\infty} \Big| \frac{\phi_K(t)}{\phi_\varepsilon(t/b)} \Big|\, dt < \infty.

With W_b defined by (1.2), set, for x = (x_1, \ldots, x_p),

    K(x) = \prod_{j=1}^{p} K(x_j); \qquad W_b(x) = \prod_{j=1}^{p} W_b(x_j),    (1.8)

so that

    \phi_K(t) = \prod_{j=1}^{p} \phi_K(t_j).    (1.9)

Let {b_n} be a sequence of positive numbers such that b_n \to 0 as n \to \infty. Given the observations (X_j, Y_j)_{j=1}^{n}, we estimate the regression function (1.4) by

    \hat m_n(x; p) = \frac{\hat R_n(x; p)}{\hat f_n(x; p)},    (1.10)

where

    \hat R_n(x; p) = \frac{1}{(n - p + 1)\, b_n^{p}} \sum_{j=0}^{n-p} \psi(Y_{j+p})\, W_{b_n}\big( (x - X_j)/b_n \big)    (1.11)
and

    \hat f_n(x; p) = \frac{1}{(n - p + 1)\, b_n^{p}} \sum_{j=0}^{n-p} W_{b_n}\big( (x - X_j)/b_n \big).    (1.12)

We remark that \hat f_n(x; p) is a deconvolution density estimate of r(x; p).
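The following sketch shows how (1.10)-(1.12) can be computed for p = 1 and \psi(y) = y (ordinary mean regression). The Laplace error, the kernel with \phi_K(t) = (1 - t^2)^3 on [-1, 1], and the data-generating model are all assumptions made only so the example runs; none of them come from the paper.

# Illustrative sketch of the errors-in-variables regression estimate (1.10)-(1.12)
# for p = 1 and psi(y) = y.  All concrete choices below are hypothetical.
import numpy as np

def decon_weights(u, b, sigma):
    """W_b(u) of (1.2) for phi_K(t) = (1 - t^2)^3 on [-1,1], Laplace(0, sigma) errors."""
    t = np.linspace(-1.0, 1.0, 2001)
    phi_ratio = (1.0 - t**2) ** 3 * (1.0 + (sigma * t / b) ** 2)   # phi_K(t)/phi_eps(t/b)
    return np.trapz(np.cos(np.outer(u, t)) * phi_ratio, t, axis=1) / (2.0 * np.pi)

rng = np.random.default_rng(0)
n, sigma, b_n = 2000, 0.5, 0.4
X0 = rng.normal(size=n)                           # latent covariate X_j^0
X = X0 + rng.laplace(scale=sigma, size=n)         # observed covariate X_j = X_j^0 + eps_j
Y = np.sin(2.0 * X0) + 0.3 * rng.normal(size=n)   # response; true m(x) = sin(2x)

def m_hat(x, b=b_n):
    """Ratio estimate (1.10): numerator (1.11) over the density estimate (1.12)."""
    w = decon_weights((x - X) / b, b, sigma)      # weights W_{b_n}((x - X_j)/b_n)
    return np.sum(Y * w) / np.sum(w)

print([round(m_hat(x), 2) for x in np.linspace(-1.5, 1.5, 7)])

Replacing decon_weights by an ordinary kernel would give back the Nadaraya-Watson estimator; the deconvolution weights are what correct for the measurement error.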
For considering the asymptotic normality of \hat m_n(x; p), we define the centralizing parameter by

    B_n(x; p) = \frac{E\hat R_n(x; p) - R(x; p) - m(x; p)\,\big( E\hat f_n(x; p) - r(x; p) \big)}{E\hat f_n(x; p)},    (1.13)

where R(x; p) = m(x; p)\, r(x; p). We will see that B_n(x; p) is the 'asymptotic bias' of the estimator \hat m_n(x; p). With \hat R_n, \hat f_n and B_n defined respectively by (1.11), (1.12) and (1.13), it is easy to verify that

    \hat m_n(x; p) - m(x; p) - B_n(x; p) = \frac{\bar Q_n(x; p) - B_n(x; p)\big( \hat f_n(x; p) - E\hat f_n(x; p) \big)}{\hat f_n(x; p)},

where

    \bar Q_n(x; p) = \hat R_n(x; p) - E\hat R_n(x; p) - m(x; p)\big( \hat f_n(x; p) - E\hat f_n(x; p) \big).

It will be shown in Proposition 1.1 that B_n(x; p) = o(1) under some mild conditions. Therefore, the dominant term in the numerator is \bar Q_n:

    \hat m_n(x; p) - m(x; p) - B_n(x; p) = \bar Q_n(x; p)\,\big( 1 + o_p(1) \big) / \hat f_n(x; p).    (1.14)

Note that \bar Q_n is centered and has the form of an average of a sequence of stationary random variables. Hence, we need first to establish asymptotic normality for \bar Q_n, and then the asymptotic normality of \hat m_n follows easily from (1.14).
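For completeness, the elementary algebra behind the decomposition preceding (1.14) (my own verification, using only (1.13) and R = m r) is:

    \hat m_n - m - B_n = \frac{\hat R_n - m\hat f_n - B_n\hat f_n}{\hat f_n},
    \qquad
    B_n\, E\hat f_n = E\hat R_n - R - m\big( E\hat f_n - r \big) = E\hat R_n - m\, E\hat f_n,

so the numerator equals (\hat R_n - E\hat R_n) - m(\hat f_n - E\hat f_n) - B_n(\hat f_n - E\hat f_n) = \bar Q_n - B_n(\hat f_n - E\hat f_n).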
The bias of the estimators \hat f_n(x; p) and \hat R_n(x; p) and the asymptotic value of B_n(x; p) are given by the following proposition.

Proposition 1.1.

a) For almost all x \in R^p, we have as n \to \infty

    E\hat f_n(x; p) \to r(x; p); \qquad E\hat R_n(x; p) \to R(x; p); \qquad B_n(x; p) \to 0,

where R(x; p) = m(x; p)\, r(x; p).

b) If r(x; p) and R(x; p) are twice differentiable and their second partial derivatives are bounded and continuous on R^p, and the kernel function K satisfies \int_{-\infty}^{\infty} u^2 K(u)\, du < \infty, then as n \to \infty

    1.\quad b_n^{-2}\, \mathrm{bias}\big( \hat f_n(x; p) \big) \to \tfrac12 \int_{R^p} u\, G_r(x; p)\, u^T\, K(u)\, du;

    2.\quad b_n^{-2}\, \mathrm{bias}\big( \hat R_n(x; p) \big) \to \tfrac12 \int_{R^p} u\, G_R(x; p)\, u^T\, K(u)\, du;

where u^T is the transpose of the row vector u and the p \times p matrices G are given by

    G_r(x; p) = \Big( \frac{\partial^2 r(x; p)}{\partial x_i\, \partial x_j} \Big), \qquad
    G_R(x; p) = \Big( \frac{\partial^2 R(x; p)}{\partial x_i\, \partial x_j} \Big).

Consequently,

    b_n^{-2}\, B_n(x; p) \to B(x; p) = \int_{R^p} u\, \big( G_R(x; p) - m(x; p)\, G_r(x; p) \big)\, u^T\, K(u)\, du \,\big/\, \big( 2 r(x; p) \big).

Proof. See Fan and Truong (1990) and Masry (1991d).
We remark that the above bias expressions do not depend on the error distribution. However, the asymptotic variance and the optimal rates of convergence depend strongly on the smoothness of the error distributions. Fan (1991a) shows that such a dependence is an intrinsic part of the regression problem, not an artifact produced by the kernel method being used. Following Fan (1991a), we call a distribution

• super smooth of order \beta if the characteristic function \phi_\varepsilon(\cdot) of the error distribution satisfies

    a_0\, |t|^{\beta_0}\, \exp\big( -a |t|^{\beta} \big) \le |\phi_\varepsilon(t)| \le a_1\, |t|^{\beta_1}\, \exp\big( -a |t|^{\beta} \big) \qquad \text{as } t \to \infty,    (1.15)

where a, a_0, a_1, \beta are positive constants and \beta_0, \beta_1 are constants;

• ordinary smooth of order \beta if the characteristic function \phi_\varepsilon(\cdot) of the error distribution satisfies

    d_0\, |t|^{-\beta} \le |\phi_\varepsilon(t)| \le d_1\, |t|^{-\beta} \qquad \text{as } t \to \infty,

for positive constants d_0, d_1, \beta.

Note that the above conditions are imposed in the Fourier domain and only on the tail of the characteristic function. The faster the decay of the tail of the characteristic function, the smoother its corresponding density. The super smooth distributions include normal and Cauchy distributions and their mixtures; the ordinary smooth distributions include gamma and Laplace distributions.
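For instance (worked examples of mine, using only the definitions above):

    \phi_\varepsilon(t) = e^{-\sigma^2 t^2/2}\ (\text{normal}): \text{super smooth of order } 2\ (a = \sigma^2/2,\ \beta_0 = \beta_1 = 0);
    \qquad \phi_\varepsilon(t) = e^{-\sigma|t|}\ (\text{Cauchy}): \text{super smooth of order } 1;

    \phi_\varepsilon(t) = (1 + \sigma^2 t^2)^{-1}\ (\text{Laplace}): \text{ordinary smooth of order } 2;
    \qquad |\phi_\varepsilon(t)| \sim \text{const}\cdot|t|^{-\alpha}\ (\text{gamma}(\alpha)): \text{ordinary smooth of order } \alpha.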
It will be seen in the sequel that the technical conditions needed to establish the asymptotic normality of \hat m_n(x; p), as well as the nature of the proofs, depend strongly on the type of the error distribution. Section 2 establishes asymptotic normality when the error distribution is ordinary smooth, and Section 3 accomplishes the same for super smooth error distributions. Both strongly mixing and ρ-mixing processes are studied. Some technical proofs are given in the Appendix.
2  Asymptotic normality for ordinary smooth error distributions

2.1  Preliminaries
Let \mathcal F_i^k be the \sigma-algebra of events generated by the random variables \{X_j^0, \varepsilon_j, Y_j,\ i \le j \le k\}, and let L_2(\mathcal F_i^k) denote the collection of all second-order random variables which are \mathcal F_i^k-measurable. The stationary processes \{X_j^0, \varepsilon_j, Y_j\} are called strongly mixing (Rosenblatt, 1956) if

    \sup_{A \in \mathcal F_{-\infty}^{0},\ B \in \mathcal F_{k}^{\infty}} |P(AB) - P(A)P(B)| = \alpha(k) \to 0 \quad \text{as } k \to \infty,

and are said to be uniformly mixing if

    \sup_{A \in \mathcal F_{-\infty}^{0},\ B \in \mathcal F_{k}^{\infty}} |P(B \mid A) - P(B)| = \phi(k) \to 0 \quad \text{as } k \to \infty,

and are called ρ-mixing (Kolmogorov and Rozanov, 1960) if

    \sup_{U \in L_2(\mathcal F_{-\infty}^{0}),\ V \in L_2(\mathcal F_{k}^{\infty})} \frac{|\mathrm{cov}(U, V)|}{\mathrm{var}^{1/2}(U)\, \mathrm{var}^{1/2}(V)} = \rho(k) \to 0 \quad \text{as } k \to \infty.

It is well known that these mixing coefficients satisfy

    4\alpha(k) \le \rho(k) \le 2\, \phi^{1/2}(k),

and thus the class of ρ-mixing processes is intermediate between strongly mixing and uniformly mixing.
We begin by imposing some conditions on the kernel function, the error distribution, as well as the mixing coefficients.

Condition 2.1.
\phi_\varepsilon(t) and \phi_K(t) are twice continuously differentiable with bounded derivatives such that

i) \phi_\varepsilon(t) \ne 0 for all t \in R;

ii) t^{\beta}\, \phi_\varepsilon(t) \to B as t \to \infty for some \beta \ge 1 and B > 0,

where \delta_{\beta,1} is the Kronecker delta.
Let f_{X_0, X_l \mid Y_p, Y_{p+l}}(u, v \mid y_p, y_{p+l}) be the conditional density of (X_0, X_l) given Y_p = y_p and Y_{p+l} = y_{p+l}, where X_j is given by (1.7), and when 1 \le l < p the vector (X_0, X_l) means (X_1, \ldots, X_{l+p}). Recall that f(\cdot) is the joint density of X_j given by (1.5). Let f(x_0, x_l) be the probability density function of (X_0, X_l), with a similar meaning as above when 1 \le l < p. Denote

    V(x; p) = E\big\{ [\psi(Y_p) - m(x; p)]^2 \mid X_0 = x \big\}.    (2.1)
We make the following assumptions on the processes involved.

Condition 2.2.

i) E\big\{ |\psi(Y_p)|^{\nu} \mid X_0 = u \big\} \le A_1 for some \nu > 2 and all u \in R^p.

ii) f(x; p) \le A_2.

iii) f_{X_0, X_l \mid Y_p, Y_{p+l}}(u, v \mid y_p, y_{p+l}) \le A_3 for all u, v, y_p, y_{p+l} and all l \ge 1.

iv) Either the processes \{X_j^0, \varepsilon_j, Y_j\} are ρ-mixing with \sum_{j=1}^{\infty} \rho(j) < \infty, or are strongly mixing with \sum_{l=1}^{\infty} l^{a} [\alpha(l)]^{1 - 2/\nu} < \infty for some a > 1 - 2/\nu.

v) f(x; q) \le A_4 for all 1 \le q \le 2p, and f(x_0, x_l) \le A_4 for all l \ge p,

where A_j (j = 1, \ldots, 4) are some positive constants.
We remark that Condition 2.2 ii), iv) and v) are imposed on the X-variable. By the convolution theorem, they are satisfied when the density h(\cdot) of the error variable \varepsilon is bounded.

With B and \beta given in Condition 2.1, let

    D = \frac{1}{2\pi |B|^2} \int_{-\infty}^{+\infty} |t|^{2\beta}\, |\phi_K(t)|^2\, dt.    (2.2)
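As a small numerical check of (2.2), the sketch below evaluates D under the same illustrative Laplace-error / compactly-supported \phi_K assumptions used earlier (so that \beta = 2 and B = \lim t^2 \phi_\varepsilon(t) = 1/\sigma^2); these values are mine, not the paper's.

# Numerical evaluation of the constant D in (2.2) for the illustrative choices
# phi_eps = Laplace(0, sigma) characteristic function and phi_K(t) = (1 - t^2)^3.
import numpy as np

sigma, beta = 0.5, 2
B = 1.0 / sigma**2                          # t^beta * phi_eps(t) -> B for the Laplace cf
t = np.linspace(-1.0, 1.0, 4001)            # support of phi_K
phi_K = (1.0 - t**2) ** 3
D = np.trapz(np.abs(t) ** (2 * beta) * phi_K**2, t) / (2.0 * np.pi * B**2)
print(D)   # enters the asymptotic variance of Theorem 2.1 through D^p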
We need the following lemmas.

Lemma 2.1 (Masry, 1991a). Under Condition 2.1 and Condition 2.2 iv) and v), we have

    \mathrm{var}\big( \hat f_n(x; p) \big) = O\Big( \frac{1}{n\, b_n^{(2\beta+1)p}} \Big)

at points of continuity of f(x; p). If in addition n\, b_n^{(2\beta+1)p} \to \infty as n \to \infty, then by Proposition 1.1

    \hat f_n(x; p) \overset{P}{\longrightarrow} r(x; p).

Lemma 2.2 (Masry, 1991a). Under Condition 2.1, we have the following results:

a) There exists a constant c such that

    \sup_x |W_{b_n}(x)| \le c\, b_n^{-\beta}.

b) For all points x of continuity of a bounded function g(\cdot),

    \lim_{b_n \to 0} \frac{1}{b_n^{p}} \int_{R^p} \prod_{j=1}^{p} \Big( b_n^{2\beta}\, W_{b_n}^{2}\big( (x_j - u_j)/b_n \big) \Big)\, g(u)\, du = D^p\, g(x),

where W_{b_n} is the deconvolution kernel given by (1.2) and D is given by (2.2).
To study the asymptotic normality for \bar Q_n, and hence for \hat m_n, we put

    \bar Z_{n,j} = b_n^{-p}\, \big[ \psi(Y_{j+p}) - m(x; p) \big]\, W_{b_n}\big( (x - X_j)/b_n \big) - \mu_n,    (2.3)

where K and W_{b_n} are defined by (1.8) and

    \mu_n = E\Big( b_n^{-p}\, \big[ \psi(Y_p) - m(x; p) \big]\, W_{b_n}\big( (x - X_0)/b_n \big) \Big).

By Proposition 1.1, \mu_n is independent of the error distribution and goes to zero for almost all x \in R^p:

    \mu_n = E\Big( b_n^{-p}\, \big[ \psi(Y_p) - m(x; p) \big]\, K\big( (x - X_0^0)/b_n \big) \Big) = o(1),    (2.4)

where X_0^0 = (X_1^0, \ldots, X_p^0). Then, we have

    \bar Q_n = \frac{1}{n - p + 1} \sum_{j=0}^{n-p} \bar Z_{n,j}.    (2.5)
With V(x; p) given by (2.1), let

    \theta^2(x; p) = D^p\, V(x; p)\, f(x; p).

Lemma 2.3. Under Conditions 2.1 and 2.2 i) - iii), we have at continuity points of V(x; p) and f(x; p) that

    b_n^{(2\beta+1)p}\, \mathrm{var}(\bar Z_{n,0}) \to \theta^2(x; p),

    \sum_{l=1}^{n-1} |\mathrm{cov}(\bar Z_{n,0}, \bar Z_{n,l})| = o\big( \mathrm{var}(\bar Z_{n,0}) \big),

and

    n\, b_n^{(2\beta+1)p}\, \mathrm{var}\big( \bar Q_n(x; p) \big) \to \theta^2(x; p),

where \bar Z_{n,l} is given by (2.3).
Proof. We first remark that the third result follows directly from the first two results together with the stationarity assumption:

    \mathrm{var}\Big( \sum_{i=0}^{n-1} \bar Z_{n,i} \Big) = n\, \mathrm{var}(\bar Z_{n,0}) + 2n \sum_{j=1}^{n-1} \Big( 1 - \frac{j}{n} \Big)\, \mathrm{cov}(\bar Z_{n,0}, \bar Z_{n,j}).    (2.6)

By conditioning on X_0, we have from (2.3) and (2.4) that

    \mathrm{var}(\bar Z_{n,0})
      = b_n^{-2p}\, E\Big( [\psi(Y_p) - m(x;p)]^2\, W_{b_n}^{2}\big( (x - X_0)/b_n \big) \Big) + O(1)
      = b_n^{-2p}\, E\Big( V(X_0; p)\, W_{b_n}^{2}\big( (x - X_0)/b_n \big) \Big) + O(1)
      = \frac{1}{b_n^{(2\beta+1)p}} \cdot \frac{1}{b_n^{p}} \int_{R^p} \prod_{j=1}^{p} \Big( b_n^{2\beta}\, W_{b_n}^{2}\big( (x_j - u_j)/b_n \big) \Big)\, V(u; p)\, f(u; p)\, du + O(1).

Applying Lemma 2.2 b) with g(x) = V(x; p) f(x; p), which is bounded by Condition 2.2 i) and ii), we obtain the first part of the result.

Next, with a sequence of integers c_n \to \infty such that c_n b_n^p \to 0, we write

    \sum_{l=1}^{n-1} |\mathrm{cov}(\bar Z_{n,0}, \bar Z_{n,l})| = \sum_{l=1}^{p-1} + \sum_{l=p}^{c_n} + \sum_{l=c_n+1}^{n-1} \equiv J_1 + J_2 + J_3.

For 1 \le l \le p - 1, (2.3) and (2.4) lead to

    \mathrm{cov}(\bar Z_{n,0}, \bar Z_{n,l}) = b_n^{-2p}\, E\Big( [\psi(Y_p) - m(x;p)][\psi(Y_{p+l}) - m(x;p)]\, W_{b_n}\big( (x - X_0)/b_n \big)\, W_{b_n}\big( (x - X_l)/b_n \big) \Big) + O(1).

Let u', u'', u''' denote respectively an l-, (p-l)-, and l-dimensional vector, where u'' represents the overlapping part of the vectors X_0 and X_l. Conditioning on (Y_p, Y_{p+l}) and using Condition 2.2 iii) and a change of variables, the above covariance is further bounded, uniformly in l, by a quantity of order O(b_n^{-(2\beta+1)p + l}); the last step uses the factorization (1.8) and Lemma 2.2 a). This leads to

    J_1 = \sum_{l=1}^{p-1} O\big( b_n^{-(2\beta+1)p + 1} \big) = o\big( \mathrm{var}(\bar Z_{n,0}) \big).

For p \le l \le c_n, the vectors X_0 and X_l have no common components, and we have similarly, using Condition 2.2 v), that

    \mathrm{cov}(\bar Z_{n,0}, \bar Z_{n,l}) = O\big( b_n^{-2\beta p} \big)

uniformly in l. Therefore, by the choice of c_n,

    J_2 = O\big( c_n\, b_n^{-2\beta p} \big) = o\big( b_n^{-(2\beta+1)p} \big) = o\big( \mathrm{var}(\bar Z_{n,0}) \big).

We now deal with J_3. We separate the argument into two cases, depending on whether the processes are ρ-mixing or strongly mixing. For ρ-mixing processes, we have

    J_3 \le \sum_{l=c_n+1}^{n} \rho(l - p + 1)\, \mathrm{var}(\bar Z_{n,0}) = o\big( \mathrm{var}(\bar Z_{n,0}) \big),

by the summability of the mixing coefficients. For strongly mixing processes, we proceed as follows. By Davydov's lemma [see Hall and Heyde (1980), Corollary A.2], we have

    |\mathrm{cov}(\bar Z_{n,0}, \bar Z_{n,l})| \le 8\, [\alpha(l - p + 1)]^{1 - 2/\nu}\, \big( E|\bar Z_{n,0}|^{\nu} \big)^{2/\nu}.    (2.7)

Conditioning on X_0 and using Condition 2.2 i), we have

    E|\bar Z_{n,0}|^{\nu} \le b_n^{-p\nu}\, E\Big( \big| W_{b_n}\big( (x - X_0)/b_n \big) \big|^{\nu}\, E\big[ |\psi(Y_p) - m(x;p)|^{\nu} \mid X_0 \big] \Big) + O(1)
                        \le C_1\, b_n^{-p\nu}\, E\big| W_{b_n}\big( (x - X_0)/b_n \big) \big|^{\nu},

where C_1 is a positive constant. Therefore, by a change of variables and Condition 2.2 ii), we obtain

    E|\bar Z_{n,0}|^{\nu} \le C_1 A_2\, b_n^{-p(\nu - 1)}\, \|W_{b_n}\|_{\nu}^{\nu} = O\big( b_n^{-p(\nu - 1 + \beta\nu)} \big),

where the last equality follows from (1.8) and Lemma 2.2 a). This together with (2.7) entails

    J_3 \le \frac{C_2}{b_n^{2p(\beta + 1 - 1/\nu)}} \sum_{l=c_n+1}^{n-1} [\alpha(l - p + 1)]^{1 - 2/\nu}
        \le \frac{C_2}{c_n^{a}\, b_n^{2p(\beta + 1 - 1/\nu)}} \sum_{l=c_n+1}^{n-1} l^{a}\, [\alpha(l - p + 1)]^{1 - 2/\nu},

for some positive constant C_2. Choose c_n = \big[ b_n^{-p(1 - 2/\nu)/a} \big], so that c_n b_n^p \to 0 since a > 1 - 2/\nu. In view of Condition 2.2 iv), we have J_3 = o\big( b_n^{-(2\beta+1)p} \big). This completes the proof.  □
Lemma 2.4 (Volkonskii and Rozanov, 1959). Let V_1, \ldots, V_L be random variables measurable with respect to the \sigma-algebras \mathcal F_{i_1}^{j_1}, \ldots, \mathcal F_{i_L}^{j_L} respectively, with 1 \le i_1 < j_1 < i_2 < \cdots < j_L \le n, i_{l+1} - j_l \ge w \ge 1, and |V_j| \le 1 for j = 1, \ldots, L. Then

    \Big| E\Big( \prod_{j=1}^{L} V_j \Big) - \prod_{j=1}^{L} E(V_j) \Big| \le 16 (L - 1)\, \alpha(w),

where \alpha(w) is the strongly mixing coefficient.
2.2  Main Results

The principal result of this section gives the asymptotic normality of the regression estimator (1.10) for both strongly mixing and ρ-mixing processes.

Condition 2.3.
Let \{s_n\} be a sequence of positive integers such that s_n \to \infty and s_n = o\big( (n b_n^{p})^{1/2} \big). For strongly mixing processes, \alpha(k) satisfies (n b_n^{-p})^{1/2}\, \alpha(s_n) \to 0 as n \to \infty; for ρ-mixing processes, \rho(k) satisfies (n b_n^{-p})^{1/2}\, \rho(s_n) \to 0 as n \to \infty.

Theorem 2.1. Under Conditions 2.1 - 2.3 and n\, b_n^{(2\beta+1)p} \to \infty as n \to \infty, we have

    \sqrt{n\, b_n^{(2\beta+1)p}}\, \big( \hat m_n(x; p) - m(x; p) - B_n(x; p) \big) \overset{\mathcal L}{\longrightarrow} N\big( 0, \tau^2(x; p) \big)

at continuity points of r(x; p) and V(x; p), where \tau^2(x; p) = D^p\, V(x; p)\, f(x; p) / (r(x; p))^2 and f, V and D are given respectively by (1.5), (2.1) and (2.2).

The following corollary follows from Theorem 2.1 and Part b) of Proposition 1.1 together with the choice of bandwidth.

Corollary 2.1. Under Conditions 2.1 - 2.3, if the functions m(x; p) and f(x; p) have bounded and continuous second partial derivatives on R^p, then

    \sqrt{n\, b_n^{(2\beta+1)p}}\, \big( \hat m_n(x; p) - m(x; p) \big) \overset{\mathcal L}{\longrightarrow} N\big( 0, \tau^2(x; p) \big),

provided that n\, b_n^{4 + (2\beta+1)p} \to 0 and that \int_{-\infty}^{\infty} u^2 K(u)\, du < \infty.
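For example (my own remark, not part of the paper), with b_n = c\, n^{-\gamma} both rate requirements of Theorem 2.1 and Corollary 2.1 are met whenever

    \frac{1}{4 + (2\beta+1)p} < \gamma < \frac{1}{(2\beta+1)p},
    \qquad \text{so that } n\, b_n^{(2\beta+1)p} \to \infty \text{ and } n\, b_n^{4 + (2\beta+1)p} \to 0,

provided the mixing coefficients decay fast enough for Condition 2.3 to hold.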
Proof of Theorem 2.1.
The idea of the proof of Theorem 2.1 is as follows. We first establish the asymptotic normality for \bar Q_n and then use (1.14) to conclude the desired result. We employ the following big-block and small-block argument.

Set

    Z_{n,j} = b_n^{(2\beta+1)p/2}\, \bar Z_{n,j}, \qquad S_n = \sum_{j=0}^{n-1} Z_{n,j}.

Then, (2.5) leads to

    \sqrt{n\, b_n^{(2\beta+1)p}}\, \bar Q_n(x; p) = \frac{\sqrt n}{n - p + 1}\, S_{n-p+1}.    (2.8)

In view of Lemma 2.3, we have

    \mathrm{var}(Z_{n,0}) \to \theta^2(x; p); \qquad \sum_{j=1}^{n-1} |\mathrm{cov}(Z_{n,0}, Z_{n,j})| \to 0.    (2.9)

Partition the set \{1, 2, \ldots, n\} into 2k_n + 1 subsets with large blocks of size r = r_n and small blocks of size s = s_n, where, with [\cdot] denoting integer part,

    k = k_n = \Big[ \frac{n}{r_n + s_n} \Big].    (2.10)

Define the random variables

    \eta_j = \sum_{i = j(r+s)}^{j(r+s) + r - 1} Z_{n,i}, \qquad 0 \le j \le k - 1,    (2.11)

    \xi_j = \sum_{i = j(r+s) + r}^{(j+1)(r+s) - 1} Z_{n,i}, \qquad 0 \le j \le k - 1,    (2.12)

and

    \zeta_k = \sum_{i = k(r+s)}^{n-1} Z_{n,i}.    (2.13)

Then,

    S_n = \sum_{j=0}^{k-1} \eta_j + \sum_{j=0}^{k-1} \xi_j + \zeta_k \equiv S_n' + S_n'' + S_n'''.    (2.14)

We will show that as n \to \infty,
    \frac{1}{n} E(S_n'')^2 \to 0, \qquad \frac{1}{n} E(S_n''')^2 \to 0,    (2.15)

    \Big| E\big[ \exp(itS_n') \big] - \prod_{j=0}^{k-1} E\big[ \exp(it\eta_j) \big] \Big| \to 0,    (2.16)

    \frac{1}{n} \sum_{j=0}^{k-1} E(\eta_j^2) \to \theta^2(x; p),    (2.17)

    \frac{1}{n} \sum_{j=0}^{k-1} E\Big( \eta_j^2\, I\big\{ |\eta_j| \ge \epsilon\, \theta(x; p) \sqrt n \big\} \Big) \to 0    (2.18)

for every \epsilon > 0. (2.15) implies that S_n'' and S_n''' are asymptotically negligible, (2.16) implies that the summands \{\eta_j\} in S_n' are asymptotically independent, and (2.17) and (2.18) are the standard Lindeberg-Feller conditions for asymptotic normality of S_n' under independence. Expressions (2.15) - (2.18) entail the following asymptotic normality:

    \frac{1}{\sqrt n}\, S_n \overset{\mathcal L}{\longrightarrow} N\big( 0, \theta^2(x; p) \big),    (2.19)

so that by (2.8)

    \sqrt{n\, b_n^{(2\beta+1)p}}\, \bar Q_n(x; p) \overset{\mathcal L}{\longrightarrow} N\big( 0, \theta^2(x; p) \big).

This together with (1.14) and Lemma 2.1 proves Theorem 2.1.

We now establish (2.15) - (2.18). The proof concentrates on the strongly mixing case as it is more involved; we remark on the differences for ρ-mixing processes.

We first choose the block sizes. Condition 2.3 implies that there exist constants q_n \to \infty such that for strongly mixing processes,

    q_n\, s_n = o\big( (n b_n^{p})^{1/2} \big), \qquad q_n\, (n b_n^{-p})^{1/2}\, \alpha(s_n) \to 0,

and for ρ-mixing processes,

    q_n\, s_n = o\big( (n b_n^{p})^{1/2} \big), \qquad q_n\, (n b_n^{-p})^{1/2}\, \rho(s_n) \to 0.

Define the large block size r_n = \big[ (n b_n^{p})^{1/2} / q_n \big]. Then, simple algebra shows the following properties:

    \frac{s_n}{r_n} \to 0, \qquad \frac{r_n}{n} \to 0, \qquad \frac{r_n}{(n b_n^{p})^{1/2}} \to 0,    (2.20)

and

    \frac{n}{r_n}\, \alpha(s_n) \to 0.    (2.21)

[For ρ-mixing processes, (2.21) is proved via the inequality \alpha(s_n) \le \rho(s_n)/4.]
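The following sketch illustrates the big-block / small-block partition (2.10)-(2.13). The concrete rates (b_n = n^{-1/6}, q_n = log n) are my own illustrative choices; the proof only requires q_n to grow slowly enough.

# Sketch of the big-block / small-block index partition used in the proof.
import numpy as np

def blocks(n, p=1):
    b_n = n ** (-1.0 / 6.0)                       # an admissible bandwidth rate (illustrative)
    q_n = np.log(n)
    r = int((n * b_n**p) ** 0.5 / q_n)            # large block size r_n = [(n b_n^p)^{1/2}/q_n]
    s = int(r / np.log(n)) + 1                    # small block size s_n = o(r_n)
    k = n // (r + s)                              # number of (big, small) block pairs, (2.10)
    big   = [range(j * (r + s), j * (r + s) + r) for j in range(k)]        # eta_j, (2.11)
    small = [range(j * (r + s) + r, (j + 1) * (r + s)) for j in range(k)]  # xi_j, (2.12)
    tail  = range(k * (r + s), n)                                          # zeta_k, (2.13)
    return r, s, k, big, small, tail

r, s, k, big, small, tail = blocks(10_000)
print(r, s, k, len(tail))   # the big blocks dominate: k*r/n -> 1 while k*s/n -> 0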
We now establish (2.15). Note that

    E(S_n'')^2 = \sum_{j=0}^{k-1} \mathrm{var}(\xi_j) + \sum_{\substack{i, j = 0 \\ i \ne j}}^{k-1} \mathrm{cov}(\xi_i, \xi_j) \equiv F_1 + F_2.    (2.22)

Using stationarity and (2.9), we obtain that

    \mathrm{var}(\xi_j) = s\, \mathrm{var}(Z_{n,0}) + 2s \sum_{l=1}^{s-1} \big( 1 - l/s \big)\, \mathrm{cov}(Z_{n,0}, Z_{n,l})
                        = s\, \mathrm{var}(Z_{n,0}) + O\Big( s \sum_{l=1}^{s-1} |\mathrm{cov}(Z_{n,0}, Z_{n,l})| \Big)
                        = s\, \theta^2(x; p)\, (1 + o(1)).

By (2.22),

    F_1 = O(ks) = o(n),

since by (2.10) and (2.20), k_n s_n / n \to 0. Now, we consider F_2. We first note that, with m_j = j(r + s) + r,

    F_2 = \sum_{\substack{i, j = 0 \\ i \ne j}}^{k-1} \sum_{l_1 = 0}^{s-1} \sum_{l_2 = 0}^{s-1} \mathrm{cov}\big( Z_{n, m_i + l_1}, Z_{n, m_j + l_2} \big),

but since i \ne j, |m_i - m_j + l_1 - l_2| \ge r, so that

    |F_2| \le 2 \sum_{l_1 = 0}^{n - r - 1} \sum_{l_2 = l_1 + r}^{n - 1} |\mathrm{cov}(Z_{n, l_1}, Z_{n, l_2})|.    (2.23)

By stationarity and (2.9),

    |F_2| \le 2n \sum_{j = r}^{n-1} |\mathrm{cov}(Z_{n,0}, Z_{n,j})| = o(n).

By (2.22), we have validated the first part of (2.15). For the second part of (2.15), using a similar argument together with (2.9), we obtain that

    \frac{1}{n} E(S_n''')^2 \le \frac{1}{n}\, \big( n - k(r+s) \big)\, \mathrm{var}(Z_{n,0}) + 2 \sum_{j=1}^{n-1} |\mathrm{cov}(Z_{n,0}, Z_{n,j})|
                            \le \frac{r_n + s_n}{n}\, \theta^2(x; p)\, (1 + o(1)) + o(1) \longrightarrow 0.
For (2.16) we proceed as follows. We note that \eta_a is a function of the random variables \{X_j^0, \varepsilon_j, Y_j\} with indices between a(r+s)+1 and a(r+s)+r+p-1; that is, \eta_a is \mathcal F_{i_a}^{j_a}-measurable with i_a = a(r+s) + 1 and j_a = a(r+s) + r + p - 1. Hence, applying Lemma 2.4 with V_j = \exp(it\eta_j), we have

    \Big| E\Big( \prod_{j=0}^{k-1} \exp(it\eta_j) \Big) - \prod_{j=0}^{k-1} E\big[ \exp(it\eta_j) \big] \Big| \le 16 k\, \alpha(s_n + 2 - p) \le 16 \frac{n}{r_n}\, \alpha(s_n + 2 - p),

which tends to zero by (2.21).

We now show (2.17). By stationarity and the same argument as for \mathrm{var}(\xi_j) (with s replaced by r), we have

    \mathrm{var}(\eta_j) = \mathrm{var}(\eta_0) = r_n\, \theta^2(x; p)\, (1 + o(1)),

so that

    \frac{1}{n} \sum_{j=0}^{k-1} E(\eta_j^2) = \frac{k_n r_n}{n}\, \theta^2(x; p)\, (1 + o(1)) \to \theta^2(x; p),

since s_n / r_n \to 0.
It remains to establish (2.18). We first prove (2.18) when \psi(\cdot) is bounded; this establishes the asymptotic normality (2.19) for this particular case. The general case of \psi possibly unbounded is then established by using a truncation argument.

Assume that |\psi(\cdot)| \le L. Then by the definition of \bar Z_{n,j},

    |\bar Z_{n,j}| \le 2L\, b_n^{-p}\, \|W_{b_n}\|_{\infty}^{p} + |\mu_n|,

and by Lemma 2.2 a) and the fact that \mu_n \to 0, we have

    |Z_{n,j}| = b_n^{(2\beta+1)p/2}\, |\bar Z_{n,j}| \le C_3\, b_n^{-p/2}

for some constant C_3. This and (2.11) entail

    \max_{0 \le j \le k-1} \frac{|\eta_j|}{\sqrt n} \le \frac{C_3\, r_n}{(n b_n^{p})^{1/2}},

which tends to zero by (2.20). Hence, when n is large the set \{ |\eta_j| \ge \theta(x; p)\, \epsilon \sqrt n \} becomes an empty set. Hence, (2.18) follows, and consequently the asymptotic normality (2.19) holds for bounded \psi(\cdot).
To complete the proof for the general case, we utilize the following truncation argument. Put

    \psi_L(y) = \psi(y)\, I\{ |\psi(y)| \le L \},

where L is a fixed truncation point. Correspondingly, let m_L(x; p) and \theta_L^2(x; p) denote the quantities m(x; p) and \theta^2(x; p) with \psi replaced by \psi_L, and let \tilde S_n denote the sum defined in (2.24) below with \psi_L replaced by \psi - \psi_L, so that S_n = S_n^{L} + \tilde S_n. Put

    \mu_{n,L} = E\Big[ b_n^{-p}\, \big( \psi_L(Y_p) - m_L(x; p) \big)\, W_{b_n}\big( (x - X_0)/b_n \big) \Big],

    Z_{n,j}^{L} = b_n^{(2\beta+1)p/2}\, \Big[ b_n^{-p}\, \big( \psi_L(Y_{j+p}) - m_L(x; p) \big)\, W_{b_n}\big( (x - X_j)/b_n \big) - \mu_{n,L} \Big],

and

    S_n^{L} = \sum_{j=0}^{n-1} Z_{n,j}^{L}.    (2.24)

Then, by the asymptotic normality for bounded \psi(\cdot), we have

    \frac{1}{\sqrt n}\, S_n^{L} \overset{\mathcal L}{\longrightarrow} N\big( 0, \theta_L^2(x; p) \big).    (2.25)

In order to complete the proof, namely to establish (2.19) for the general case, it suffices to show that, as first n \to \infty and then L \to \infty, we have

    \frac{1}{n}\, \mathrm{var}\big( \tilde S_n \big) \to 0.    (2.26)

Indeed,

    \big| E \exp(itS_n/\sqrt n) - \exp\big( -t^2 \theta^2(x; p)/2 \big) \big|
      = \big| E \exp\big( it (S_n^{L} + \tilde S_n)/\sqrt n \big) - \exp( -t^2 \theta_L^2/2 ) + \exp( -t^2 \theta_L^2/2 ) - \exp( -t^2 \theta^2/2 ) \big|
      \le \big| E \exp(itS_n^{L}/\sqrt n) - \exp( -t^2 \theta_L^2/2 ) \big| + E\big| \exp(it\tilde S_n/\sqrt n) - 1 \big| + \big| \exp( -t^2 \theta_L^2/2 ) - \exp( -t^2 \theta^2/2 ) \big|.

Letting n \to \infty, the first term goes to zero by (2.25) for every L > 0; the second term converges to zero by (2.26) as first n \to \infty and then L \to \infty; the third term goes to zero as L \to \infty by the dominated convergence theorem. Therefore, it remains to prove (2.26). Note that by (2.24), \tilde S_n has the same structure as S_n^{L} except that the function \psi_L is replaced by \psi - \psi_L. Hence, by Lemma 2.3 (note the different scaling between Z_{n,j} and \bar Z_{n,j}), we have

    \limsup_{n \to \infty} \frac{1}{n}\, \mathrm{var}\big( \tilde S_n \big)
      \le D^p\, E\Big\{ \big[ \psi(Y_p) - \psi_L(Y_p) - \big( m(x; p) - m_L(x; p) \big) \big]^2 \,\Big|\, X_0 = x \Big\}\, f(x; p).

By the dominated convergence theorem, the right-hand side converges to 0 as L \to \infty. This establishes (2.26) and completes the proof of Theorem 2.1.  □

3  Asymptotic normality for super smooth error distributions
In this section, we deal with super smooth error distributions, whose characteristic function decays exponentially fast. Recall that \bar Q_n(x; p), given by (2.5), is central to our discussion of asymptotic normality. Set

    \sigma_0^2(n) = \mathrm{var}\big( \bar Z_{n,0} \big)    (3.1)

and

    \sigma^2(n) = \mathrm{var}\big( \bar Q_n(x; p) \big).    (3.2)

Since the asymptotic rates and constants of \sigma_0^2(n) and \sigma^2(n) are not available, the technical arguments are more involved here than in the ordinary smooth case. We first derive both lower and upper bounds for \sigma_0^2(n) and then use these bounds to establish

    \sigma^2(n) = \frac{1}{n}\, \sigma_0^2(n)\, (1 + o(1)).

These bounds are also useful in validating the Lindeberg-Feller condition for asymptotic normality.
3.1  Preliminaries

We make the following assumptions on the characteristic function \phi_\varepsilon(t) of the error variable \varepsilon and on the Fourier transform \phi_K(t) of the kernel function K.

Condition 3.1.

i) \phi_\varepsilon(t) \ne 0 for all t \in R. Moreover, expression (1.15) holds with \beta_1 = \beta_0.

ii) \phi_K(t) has a finite support (-d, d).

iii) There exist positive constants \Delta, a_2 and \ell such that |\phi_K(t)| \le a_2 (d - t)^{\ell} for t \in (d - \Delta, d).

iv) \phi_K(t) \ge a_3 (d - t)^{\ell} for t \in (d - \Delta, d), where a_3 is a positive constant.

v) With R_\varepsilon(t) and I_\varepsilon(t) being the real and the imaginary parts of \phi_\varepsilon(t), assume that either I_\varepsilon(t) = o(R_\varepsilon(t)) or R_\varepsilon(t) = o(I_\varepsilon(t)) as t \to \infty.

We remark that condition i) assumes that the error distribution is super smooth. Under such an assumption, by Fourier inversion, the density h(u) of the error variable \varepsilon is bounded and has bounded derivatives of all orders. This entails that the marginal density f(x; p) given by (1.5) is bounded: f(x; p) \le M for some M > 0. We also remark that the conditional density f_{X_0 \mid Y_p}(u \mid y_p) of X_0 given Y_p = y_p exists and is bounded and continuous, in view of the smoothness of h(u) and the convolution theorem. Condition ii) is a sufficient condition for the existence of the deconvolution kernel (1.2) for b > 0. Note that \phi_K(d) = 0 since \phi_K(\cdot) is continuous. Condition iii) describes the behavior of \phi_K(t) in a neighborhood of t = d. Conditions iv) and v) are used to develop lower bounds. Condition v) says that at the tail, the characteristic function \phi_\varepsilon(\cdot) is either purely real or purely imaginary.
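A standard example (mine, not stated in the paper) of a kernel satisfying conditions ii)-iv) is one whose Fourier transform is

    \phi_K(t) = (1 - t^2)^3\, \mathbf 1\{ |t| \le 1 \},
    \qquad \text{so that } d = 1,\ \ell = 3, \text{ and } (1 - t)^3 \le \phi_K(t) \le 8 (1 - t)^3 \text{ for } t \in (0, 1),

i.e., iii) and iv) hold with a_3 = 1 and a_2 = 8 on any neighborhood (1 - \Delta, 1) with 0 < \Delta < 1.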
The following lemma gives both lower and upper bounds on the norms of the deconvolution kernel (1.2). The proof is given in the Appendix.

Lemma 3.1. Under Condition 3.1 i) - iii), we have as b_n \to 0

    \|W_{b_n}\|_{\infty} = O\Big( b_n^{(\ell+1)\beta + \beta_0}\, \Big( \log \frac{1}{b_n} \Big)^{\ell}\, \exp\big( a (d/b_n)^{\beta} \big) \Big)

and

    \|W_{b_n}\|_{2} = O\Big( b_n^{(\ell + 0.5)\beta + \beta_0}\, \Big( \log \frac{1}{b_n} \Big)^{\ell}\, \exp\big( a (d/b_n)^{\beta} \big) \Big).

If moreover Condition 3.1 iv) and v) hold, then we have

    |W_{b_n}(x)| \ge a_4\, \bar H(x)\, b_n^{(\ell+1)\beta + \beta_0}\, \exp\big( a (d/b_n)^{\beta} \big)

for some a_4 > 0, uniformly in x on a bounded interval, where

    \bar H(x) = \begin{cases} |\cos dx|, & \text{if } I_\varepsilon(t) = o(R_\varepsilon(t)), \\ |\sin dx|, & \text{if } R_\varepsilon(t) = o(I_\varepsilon(t)). \end{cases}

The following two lemmas establish a lower bound on \sigma_0^2(n) and the identity \sigma^2(n) = \frac{1}{n}\sigma_0^2(n)(1 + o(1)). We impose the following conditions.
Condition 3.2.

i) E|\psi(Y_1)|^{\nu} < \infty for some \nu > 2.

ii) The processes \{X_j^0, \varepsilon_j, Y_j\} are either ρ-mixing with \sum_{j=1}^{\infty} \rho(j) < \infty, or strongly mixing with \sum_{j=1}^{\infty} j^{\lambda}\, [\alpha(j)]^{1 - 2/\nu} < \infty for some \lambda > 0.

Lemma 3.2. Under Condition 3.1, we have for large n

    \sigma_0^2(n) \ge a_5\, b_n^{2p[(\ell+1)\beta + \beta_0 - 0.5]}\, \exp\big( 2ap (d/b_n)^{\beta} \big)

and

    \sigma_0^2(n) \le a_6\, b_n^{2p[(\ell+0.5)\beta + \beta_0 - 0.5]}\, \Big( \log \frac{1}{b_n} \Big)^{2p\ell}\, \exp\big( 2ap (d/b_n)^{\beta} \big),

for some constants a_5, a_6 > 0.

Proof. Since \mu_n is bounded [see (2.4)], we have by conditioning on Y_p and using a change of variables that

    \sigma_0^2(n) = b_n^{-2p}\, E\Big( [\psi(Y_p) - m(x; p)]^2\, W_{b_n}^{2}\big( (x - X_0)/b_n \big) \Big) + O(1)
      = b_n^{-p}\, E\Big( [\psi(Y_p) - m(x; p)]^2 \int_{R^p} W_{b_n}^{2}(u)\, f_{X_0 \mid Y_p}(x - b_n u \mid Y_p)\, du \Big) + O(1)
      \ge b_n^{-p}\, E\Big( [\psi(Y_p) - m(x; p)]^2 \int_{[-1,1]^p} W_{b_n}^{2}(u)\, f_{X_0 \mid Y_p}(x - b_n u \mid Y_p)\, du \Big) + O(1),

and by Lemma 3.1 and the factorization (1.8) of W_{b_n},

    \sigma_0^2(n) \ge a_4^{2p}\, b_n^{2p[(\ell+1)\beta + \beta_0 - 0.5]}\, \exp\big( 2ap (d/b_n)^{\beta} \big)\,
        E\Big( [\psi(Y_p) - m(x; p)]^2 \int_{[-1,1]^p} \prod_{j=1}^{p} \bar H^{2}(u_j)\, f_{X_0 \mid Y_p}(x - b_n u \mid Y_p)\, du \Big) + O(1).

By the continuity of f_{X_0 \mid Y_p}, we then have

    \sigma_0^2(n) \ge a_4^{2p}\, b_n^{2p[(\ell+1)\beta + \beta_0 - 0.5]}\, \exp\big( 2ap (d/b_n)^{\beta} \big)\,
        \Big( \int_{-1}^{1} \bar H^{2}(u)\, du \Big)^{p}\,
        E\Big( [\psi(Y_p) - m(x; p)]^2\, f_{X_0 \mid Y_p}(x \mid Y_p) \Big)\, (1 + o(1)).

The first conclusion follows. The second conclusion follows immediately from the bound on \|W_{b_n}\|_{2} given in Lemma 3.1.  □
Lemma 3.3. Under Conditions 3.1 and 3.2, we have

    \sum_{i=1}^{n-1} |\mathrm{cov}(\bar Z_{n,0}, \bar Z_{n,i})| = o\big( \sigma_0^2(n) \big)

and

    \sigma^2(n) = \frac{1}{n}\, \sigma_0^2(n)\, (1 + o(1)),

where \sigma^2(n) and \sigma_0^2(n) are given by (3.2) and (3.1).

Proof. Let

    \gamma_{n,j} = |\mathrm{cov}(\bar Z_{n,0}, \bar Z_{n,j})| \qquad \text{and} \qquad c_n = \big[ \exp\big( ap (d/b_n)^{\beta} \big) \big].    (3.3)

Then

    \sum_{j=1}^{n-1} \gamma_{n,j} = \sum_{j=1}^{p-1} + \sum_{j=p}^{c_n} + \sum_{j=c_n+1}^{n-1} \equiv J_1 + J_2 + J_3.    (3.4)

We now deal with each of the above three terms. For 1 \le j \le p - 1, by (2.3), we write

    \gamma_{n,j} = b_n^{-2p}\, \Big| E\Big( [\psi(Y_p) - m(x; p)][\psi(Y_{p+j}) - m(x; p)]\, W_{b_n}\big( (x - X_0)/b_n \big)\, W_{b_n}\big( (x - X_j)/b_n \big) \Big) \Big| + O(1),

since \mu_n is bounded by (2.4). Put

    (X^0)' = (X_1^0, \ldots, X_j^0); \qquad (X^0)'' = (X_{j+1}^0, \ldots, X_p^0); \qquad (X^0)''' = (X_{p+1}^0, \ldots, X_{p+j}^0),

with \varepsilon', \varepsilon'', \varepsilon''' the corresponding error vectors, and X' = (X^0)' + \varepsilon', X'' = (X^0)'' + \varepsilon'', X''' = (X^0)''' + \varepsilon'''. Then, by conditioning on (X^0)'', \varepsilon'', Y_p and Y_{p+j}, we have

    \gamma_{n,j} \le b_n^{-2p}\, E\Big( Q_4(Y_p, Y_{p+j})\, Q_2(X'')\, \big| E\big[ Q_1(X')\, Q_3(X''') \mid (X^0)'', \varepsilon'', Y_p, Y_{p+j} \big] \big| \Big) + O(1),    (3.5)

where Q_4(Y_p, Y_{p+j}) = |[\psi(Y_p) - m(x; p)][\psi(Y_{p+j}) - m(x; p)]| and

    Q_1(X') = \prod_{l=1}^{j} W_{b_n}\Big( \frac{x_l - X_l}{b_n} \Big), \qquad
    Q_2(X'') = \prod_{l=j+1}^{p} W_{b_n}\Big( \frac{x_l - X_l}{b_n} \Big)\, W_{b_n}\Big( \frac{x_{l-j} - X_l}{b_n} \Big), \qquad
    Q_3(X''') = \prod_{l=p+1}^{p+j} W_{b_n}\Big( \frac{x_{l-j} - X_l}{b_n} \Big).

By (1.2) and Fubini's theorem, the inner conditional expectation is

    E\big[ Q_1(X')\, Q_3(X''') \mid (X^0)'', \varepsilon'', Y_p, Y_{p+j} \big]
      = \frac{1}{(2\pi)^{2j}} \int_{R^{2j}} \exp\Big( -\frac{i}{b_n} \sum_{l=1}^{j} t_l x_l - \frac{i}{b_n} \sum_{l=p+1}^{p+j} t_l x_{l-j} \Big)\,
        \Phi\big( t'/b_n, t'''/b_n \mid (X^0)'', Y_p, Y_{p+j} \big)\, \prod_{l=1}^{j} \phi_K(t_l)\, \prod_{l=p+1}^{p+j} \phi_K(t_l)\, dt'\, dt''',

where the i.i.d. errors have been integrated out (their characteristic functions cancel the factors \phi_\varepsilon(t_l/b_n) in (1.2)) and \Phi(\cdot, \cdot \mid (X^0)'', Y_p, Y_{p+j}) is the conditional characteristic function of ((X^0)', (X^0)''') given \{(X^0)'', Y_p, Y_{p+j}\}. Therefore,

    \big| E\big[ Q_1(X')\, Q_3(X''') \mid (X^0)'', \varepsilon'', Y_p, Y_{p+j} \big] \big| \le \frac{1}{(2\pi)^{2j}} \Big( \int_{-d}^{d} |\phi_K(t)|\, dt \Big)^{2j}.

Consequently, by (3.5), for 1 \le j \le p - 1, we have

    \gamma_{n,j} \le \frac{C_1}{b_n^{2p}}\, E\big| Q_2(X'')\, Q_4(Y_p, Y_{p+j}) \big| + O(1) \le \frac{C_1'}{b_n^{2p}}\, \|W_{b_n}\|_{\infty}^{2(p-j)},    (3.6)

for some positive constants C_1, C_1'. The same argument yields

    \gamma_{n,j} \le \frac{C_2}{b_n^{2p}}, \qquad j \ge p.    (3.7)

Thus, by (3.4) and (3.6), we have

    J_1 \le \frac{C_1'}{b_n^{2p}} \sum_{j=1}^{p-1} \|W_{b_n}\|_{\infty}^{2(p-j)} = O\big( b_n^{-2p}\, \|W_{b_n}\|_{\infty}^{2(p-1)} \big).

This together with the upper bound on \|W_{b_n}\|_{\infty} in Lemma 3.1 and the lower bound on \sigma_0^2(n) in Lemma 3.2 shows that

    \frac{J_1}{\sigma_0^2(n)} \le \frac{C_3}{b_n^{p + 2(\ell+1)\beta + 2\beta_0}}\, \Big( \log \frac{1}{b_n} \Big)^{2\ell(p-1)}\, \exp\big( -2a (d/b_n)^{\beta} \big) = o(1),    (3.8)

where C_3 is a positive constant. For J_2, (3.7) leads to

    J_2 \le c_n\, \frac{C_2}{b_n^{2p}}.

Hence, by the choice of c_n given in (3.3) together with the lower bound on \sigma_0^2(n) given by Lemma 3.2, we have

    J_2 = o\big( \sigma_0^2(n) \big).    (3.9)

Finally, we consider J_3. For ρ-mixing processes, we have

    J_3 \le \sigma_0^2(n) \sum_{j=c_n+1}^{\infty} \rho(j - p + 1) = o\big( \sigma_0^2(n) \big).

From this together with (3.4), (3.8) and (3.9), we have proved the first conclusion for ρ-mixing processes. For strongly mixing processes, we first note that by (2.3)

    |\bar Z_{n,j}| \le C\, b_n^{-p}\, \big( |\psi(Y_{j+p})| + |m(x; p)| \big)\, \|W_{b_n}\|_{\infty}^{p} + |\mu_n|.    (3.10)

Then employing Davydov's lemma, we obtain that

    \gamma_{n,j} \le 8\, [\alpha(j - p + 1)]^{1 - 2/\nu}\, \big( E|\bar Z_{n,0}|^{\nu} \big)^{2/\nu}
              \le C_4\, [\alpha(j - p + 1)]^{1 - 2/\nu}\, \frac{\|W_{b_n}\|_{\infty}^{2p}}{b_n^{2p}}

for some constant C_4. Thus,

    J_3 \le C_4\, \frac{\|W_{b_n}\|_{\infty}^{2p}}{b_n^{2p}} \sum_{j=c_n+1}^{n-1} [\alpha(j - p + 1)]^{1 - 2/\nu}
        \le C_4\, \frac{\|W_{b_n}\|_{\infty}^{2p}}{c_n^{\lambda}\, b_n^{2p}} \sum_{j=c_n+1}^{n-1} j^{\lambda}\, [\alpha(j - p + 1)]^{1 - 2/\nu}.

Using again the upper bound on \|W_{b_n}\|_{\infty} and the lower bound on \sigma_0^2(n) given by Lemma 3.2, we have

    \frac{J_3}{\sigma_0^2(n)} \le O\Big( c_n^{-\lambda}\, b_n^{-p}\, \Big[ \log \frac{1}{b_n} \Big]^{2\ell p}\, \sum_{j = c_n}^{\infty} j^{\lambda}\, [\alpha(j - p + 1)]^{1 - 2/\nu} \Big) = o(1).

Combining this with (3.4), (3.8) and (3.9) proves the first part of the lemma for strongly mixing processes. The second conclusion follows directly from (2.6) and the first one.  □
Lemma 3.4. Under Condition 3.1, we have

    \mathrm{var}\big( \hat f_n(x; p) \big) = O\Big( \frac{1}{n}\, b_n^{2p[(\ell+1)\beta + \beta_0 - 1]}\, \Big( \log \frac{1}{b_n} \Big)^{2p\ell}\, \exp\big( 2ap (d/b_n)^{\beta} \big) \Big).

Moreover, if b_n \to 0 such that b_n \ge \gamma d\, (2ap/\log n)^{1/\beta} for some \gamma > 1, then

    \hat f_n(x; p) \overset{P}{\longrightarrow} r(x; p)

at the continuity points of r(x; p).

Proof. The same argument as in the proof of Lemma 3.3 leads to

    \mathrm{var}\big( \hat f_n(x; p) \big) = O\Big( \frac{1}{n}\, E\big| b_n^{-p}\, W_{b_n}\big( (x - X_0)/b_n \big) \big|^2 \Big) = O\Big( \frac{1}{n\, b_n^{2p}}\, \|W_{b_n}\|_{\infty}^{2p} \Big).

By using the upper bound on \|W_{b_n}\|_{\infty} given in Lemma 3.1, we obtain the first result. For the second result, by the assumption on the bandwidth, we have as n \to \infty

    \mathrm{var}\big( \hat f_n(x; p) \big) \to 0,

and by Proposition 1.1 we have E\hat f_n(x; p) \to r(x; p); hence \hat f_n(x; p) \to r(x; p) in quadratic mean, and the second conclusion follows.  □

We remark that a similar conclusion to Lemma 3.4 was proved in Masry (1991a). The current result is slightly stronger and broader.
3.2  Main Results

The goal of this section is to establish the asymptotic normality for the regression estimator (1.10). To this end, we first discuss the asymptotic normality for \bar Q_n, the dominating term in the numerator of (1.14), and then use Lemma 3.4 to show the asymptotic normality for \hat m_n(x; p) via (1.14). We need the following conditions.

Condition 3.3A.
Assume n\, b_n^{p\gamma} \to \infty as n \to \infty for some \gamma > 1. Let \{s_n\} be a sequence of positive integers defined by s_n = \big[ (n b_n^{p\gamma})^{1/2} \big]. For strongly mixing processes, \alpha(k) satisfies (n b_n^{-p\gamma})^{1/2}\, \alpha(s_n) \to 0 as n \to \infty; for ρ-mixing processes, \rho(k) satisfies (n b_n^{-p\gamma})^{1/2}\, \rho(s_n) \to 0 as n \to \infty.
Condition 3.3B.
Assume that n^{\nu - 2}\, b_n^{p\nu\gamma} \to \infty as n \to \infty for some \gamma > 1, where \nu is given in Condition 3.2. Let \{s_n\} be a sequence of positive integers given by

    s_n = \Big[ \big( n^{\nu - 2}\, b_n^{p\nu\gamma} \big)^{1/(2(\nu - 1))} \Big].

For strongly mixing processes, \alpha(k) satisfies (n b_n^{-p\gamma})^{1/2}\, \alpha(s_n) \to 0 as n \to \infty; for ρ-mixing processes, \rho(k) satisfies (n b_n^{-p\gamma})^{1/2}\, \rho(s_n) \to 0 as n \to \infty.
We remark that Condition 3.3A is weaker than Condition 3.3B, since s_n given in Condition 3.3A is larger. Note that Condition 3.3A is very similar to Condition 2.3 in the ordinary smooth case. For bounded \psi, Theorem 3.1 shows that the asymptotic normality holds under this weaker condition. Interesting examples of bounded \psi include estimating a conditional cdf and estimating the mean regression function when the responses are bounded (e.g. binary responses).

Theorem 3.1. Under Conditions 3.1 and 3.2, we have:

A. If \psi(\cdot) is bounded and Condition 3.3A holds, then

    \sqrt n\, \bar Q_n(x; p) / \sigma_0(n) \overset{\mathcal L}{\longrightarrow} N(0, 1).

B. If Condition 3.3B holds, then the above asymptotic normality holds.
Proof. We first prove part A; the proof of part B is outlined following the proof of part A.

We first normalize \bar Z_{n,j} in (2.3) as follows:

    Z_{n,j} = \frac{\bar Z_{n,j}}{\sigma_0(n)}, \qquad S_n = \sum_{j=0}^{n-1} Z_{n,j},

so that

    \mathrm{var}(Z_{n,j}) = 1.    (3.11)

In view of Lemma 3.3, we have

    \sum_{j=1}^{n-1} |\mathrm{cov}(Z_{n,0}, Z_{n,j})| \to 0.    (3.12)

With such a normalization, (2.5) leads to

    \sqrt n\, \bar Q_n(x; p)/\sigma_0(n) = \frac{\sqrt n}{n - p + 1}\, S_{n-p+1}.

Thus, it suffices to show that

    \frac{1}{\sqrt n}\, S_n \overset{\mathcal L}{\longrightarrow} N(0, 1).    (3.13)

We now employ the big- and small-block arguments as in the proof of Theorem 2.1. Let \gamma_1 be a real number satisfying 1 < \gamma_1 < \gamma, where \gamma is given in Condition 3.3A. Let the small block size be s = s_n, the big block size be r = r_n = \big[ (n b_n^{p\gamma_1})^{1/2} \big], and the number of blocks be

    k = k_n = \Big[ \frac{n}{r_n + s_n} \Big].

Then, it is easy to verify as n \to \infty that

    \frac{s_n}{r_n} \to 0; \qquad \frac{r_n}{n} \to 0; \qquad \frac{r_n}{(n b_n^{p})^{1/2}} \Big( \log \frac{1}{b_n} \Big)^{\ell p} \to 0,    (3.14)

and that

    \frac{n}{r_n}\, \alpha(s_n) \to 0.    (3.15)

Define respectively \eta_j, \xi_j, \zeta_k, and S_n', S_n'', S_n''' as in (2.11) - (2.14). To prove (3.13), we need only to verify that as n \to \infty,

    \frac{1}{n} E(S_n'')^2 \to 0, \qquad \frac{1}{n} E(S_n''')^2 \to 0,    (3.16)

    \Big| E\big[ \exp(itS_n') \big] - \prod_{j=0}^{k-1} E\big[ \exp(it\eta_j) \big] \Big| \to 0,    (3.17)

    \frac{1}{n} \sum_{j=0}^{k-1} E(\eta_j^2) \to 1,    (3.18)

    \frac{1}{n} \sum_{j=0}^{k-1} E\big( \eta_j^2\, I\{ |\eta_j| \ge \epsilon \sqrt n \} \big) \to 0    (3.19)

for every \epsilon > 0.

As in (2.23), using (3.11) and (3.12), we have

    \mathrm{var}(\xi_j) = s\, (1 + o(1)).    (3.20)

One can similarly show that, with F_1 and F_2 given by (2.22), F_1 = o(n) and F_2 = o(n). Thus \frac{1}{n} E(S_n'')^2 \to 0. Note that by (3.11) and (3.12),

    \frac{1}{n} E(S_n''')^2 \le \frac{1}{n}\, \big( n - k(r+s) \big) + 2 \sum_{j=1}^{n-1} |\mathrm{cov}(Z_{n,0}, Z_{n,j})| \to 0.

Hence, (3.16) holds.

Next, using Lemma 2.4 and (3.15), we have

    \Big| E\big[ \exp(itS_n') \big] - \prod_{j=0}^{k-1} E\big[ \exp(it\eta_j) \big] \Big| \le 16 k_n\, \alpha(s_n + 2 - p) \le 16 \frac{n}{r_n}\, \alpha(s_n + 2 - p) \to 0,

establishing (3.17).

Now, using (3.20) with s replaced by r, we obtain

    \mathrm{var}(\eta_j) = \mathrm{var}(\eta_0) = r_n\, (1 + o(1)),    (3.21)

so that by stationarity

    \frac{1}{n} \sum_{j=0}^{k-1} E(\eta_j^2) = \frac{k_n r_n}{n}\, (1 + o(1)) \to 1.

This establishes (3.18).

Finally, we verify the Lindeberg condition (3.19). Since \psi(\cdot) is bounded, we have

    |Z_{n,j}| \le C_1\, b_n^{-p}\, \|W_{b_n}\|_{\infty}^{p} / \sigma_0(n),

where C_1 is a positive constant. By using the upper bound on \|W_{b_n}\|_{\infty} in Lemma 3.1 and the lower bound on \sigma_0(n) in Lemma 3.2, we obtain

    \frac{|\eta_0|}{\sqrt n} = O\Big( \frac{r_n}{(n b_n^{p})^{1/2}} \Big( \log \frac{1}{b_n} \Big)^{\ell p} \Big),

which tends to zero by (3.14). Hence (3.19) holds for bounded \psi, since the set \{ |\eta_0| \ge \epsilon \sqrt n \} is asymptotically empty. This completes the proof of the first part.
For part B, we proceed as follows. Define the small block size s_n as in Condition 3.3B, and let

    r_n = \Big[ \big( n^{\nu - 2}\, b_n^{p\nu\gamma_1} \big)^{1/(2(\nu - 1))} \Big] \qquad \text{and} \qquad k_n = \Big[ \frac{n}{r_n + s_n} \Big],

where 1 < \gamma_1 < \gamma. Then, it is easy to show that

    \frac{s_n}{r_n} \to 0; \qquad \frac{r_n}{n} \to 0; \qquad b_n^{-p}\, \Big( \log \frac{1}{b_n} \Big)^{2\ell p}\, \frac{r_n^{2(1 - 1/\nu)}}{n^{1 - 2/\nu}} \to 0,    (3.22)

and that (3.15) holds. The same argument as in part A establishes (3.16) - (3.18). Thus, we need only to show (3.19). Since k_n r_n / n \to 1, it suffices to show, by stationarity, that

    F_3 \equiv \frac{1}{r_n}\, E\big( \eta_0^2\, I\{ |\eta_0| > \epsilon \sqrt n \} \big) \to 0    (3.23)

as n \to \infty. Let A_n = \{ |\eta_0| > \epsilon \sqrt n \}. Then by Jensen's inequality,

    \eta_0^2 = \Big( \sum_{j=1}^{r_n} Z_{n,j} \Big)^2 \le r_n \sum_{j=1}^{r_n} Z_{n,j}^2,

so that

    F_3 \le \sum_{j=1}^{r_n} E\big( Z_{n,j}^2\, I_{A_n} \big).

Let \bar p = \nu/2 and \bar q = \nu/(\nu - 2) be its conjugate number, so that

    \frac{1}{\bar p} + \frac{1}{\bar q} = 1,

where \nu was given in Condition 3.2. By Hölder's inequality and stationarity, we obtain

    F_3 \le r_n\, \big( E|Z_{n,0}|^{2\bar p} \big)^{1/\bar p}\, \big( P(A_n) \big)^{1/\bar q},

and by Tchebychev's inequality

    F_3 \le r_n\, \big( E|Z_{n,0}|^{2\bar p} \big)^{1/\bar p}\, \Big( \frac{E|\eta_0|^2}{\epsilon^2 n} \Big)^{1/\bar q}.

By (3.10), Lemma 3.1 and Lemma 3.2,

    \big( E|Z_{n,0}|^{\nu} \big)^{2/\nu} = O\Big( b_n^{-p}\, \Big( \log \frac{1}{b_n} \Big)^{2\ell p} \Big)\, \big( E|\psi(Y_p)|^{\nu} \big)^{2/\nu}.

Using this, together with (3.21), we have

    F_3 = O\Big( b_n^{-p}\, \Big( \log \frac{1}{b_n} \Big)^{2\ell p}\, r_n^{2(1 - 1/\nu)}\, n^{-(1 - 2/\nu)} \Big),

which tends to zero by (3.22). Hence, (3.23) holds and this completes the proof.  □
By using (1.14) together with Lemma 3.4 and Theorem 3.1, we have

Theorem 3.2. If the assumptions of Theorem 3.1 hold, then for x \in R^p such that r(x; p) > 0 we have

    \frac{\sqrt n\, \big( \hat m_n(x; p) - m(x; p) - B_n(x; p) \big)}{\sigma_0(n)} \overset{\mathcal L}{\longrightarrow} N\Big( 0, \frac{1}{r^2(x; p)} \Big),

provided that b_n \to 0 such that b_n \ge \gamma d\, (2ap/\log n)^{1/\beta} for some \gamma > 1, where \sigma_0^2(n) is given by (3.1).

We remark that for the ordinary smooth case, the asymptotic normality [see Theorem 2.1 and Corollary 2.1] admits a classical form. However, for the super smooth case, the asymptotic normality does not have a classical form, due to the unavailability of the asymptotic rate and constant of \sigma_0^2(n). Note that explicit upper and lower bounds on \sigma_0^2(n) are given in Lemma 3.2. These bounds are nearly sharp.

Remark 3.1. Under Condition 3.3B, there is a tradeoff between the order of the moment \nu in E|\psi(Y_1)|^{\nu} < \infty and the rate of decay of the mixing coefficients: the larger \nu, the weaker the condition on the mixing coefficients. For example, if b_n \asymp \gamma d\, (2ap/\log n)^{1/\beta} as in Theorem 3.2 and the strongly mixing coefficient satisfies \alpha(j) = O(j^{-c}), then Conditions 3.2 and 3.3B hold when

    c > \frac{\nu}{\nu - 2}.
Appendix: Proof of Lemma 3.1

Let c be a generic constant and

    \gamma_n = \lambda\, b_n^{\beta} \log \frac{1}{b_n},    (A.1)

where \lambda is a positive constant. Then, by (1.2) we obtain

    |W_{b_n}(x)| \le \frac{1}{\pi} \Big( \int_{0}^{d - \gamma_n} + \int_{d - \gamma_n}^{d} \Big) \frac{|\phi_K(t)|}{|\phi_\varepsilon(t/b_n)|}\, dt \equiv I_1 + I_2.    (A.2)

We first deal with I_1. With M large but fixed, Condition 3.1 i) leads to

    I_1 \le c + c \int_{M b_n}^{d - \gamma_n} (t/b_n)^{-\beta_0}\, \exp\big( a (t/b_n)^{\beta} \big)\, dt.    (A.3)

By taking the derivative with respect to t, it is easy to show that the integrand in (A.3) is an increasing function of t when t \ge M b_n and hence is bounded by its value at the point t = d - \gamma_n. Therefore, we have

    I_1 \le c + c\, d\, \big( (d - \gamma_n)/b_n \big)^{-\beta_0}\, \exp\big( a \big( (d - \gamma_n)/b_n \big)^{\beta} \big).

Taylor's expansion gives

    \Big( \frac{d - \gamma_n}{b_n} \Big)^{\beta} = \Big( \frac{d}{b_n} \Big)^{\beta} - \beta\, \frac{\gamma_n}{d}\, \Big( \frac{d}{b_n} \Big)^{\beta}\, (1 + o(1)).

This together with (A.1) leads to

    I_1 = O\Big( b_n^{-\beta_0}\, b_n^{\lambda a \beta d^{\beta - 1} (1 + o(1))}\, \exp\big( a (d/b_n)^{\beta} \big) \Big).    (A.4)

Next, we consider I_2. We first note that for t \in [d - \gamma_n, d],

    t^{-\beta_0 - (\beta - 1)} \le c.

This together with Condition 3.1 iii) and (1.15) entails that

    I_2 \le c \int_{d - \gamma_n}^{d} (d - t)^{\ell}\, (t/b_n)^{-\beta_0}\, \exp\big( a (t/b_n)^{\beta} \big)\, dt
        \le c\, \gamma_n^{\ell}\, b_n^{\beta_0} \int_{d - \gamma_n}^{d} t^{\beta - 1}\, \exp\big( a (t/b_n)^{\beta} \big)\, dt
        = O\Big( b_n^{\beta_0 + \beta + \ell\beta}\, \Big( \log \frac{1}{b_n} \Big)^{\ell}\, \exp\big( a (d/b_n)^{\beta} \big) \Big).

By choosing a large value of the constant \lambda, the upper bound for I_2 dominates I_1. Hence, by (A.2), we obtain the first conclusion. The second conclusion follows from Parseval's identity,

    \|W_{b_n}\|_2^2 = \frac{1}{2\pi} \int_{-d}^{d} \frac{|\phi_K(t)|^2}{|\phi_\varepsilon(t/b_n)|^2}\, dt,    (A.5)

and a similar argument.

Now, we establish the third conclusion. First, write

    W_{b_n}(x) = \frac{1}{\pi} \Big( \int_{0}^{d - \gamma_n} + \int_{d - \gamma_n}^{d} \Big) \equiv J_1 + J_2.

By (A.4),

    |J_1| = O\Big( b_n^{-\beta_0}\, b_n^{\lambda a \beta d^{\beta - 1}(1 + o(1))}\, \exp\big( a (d/b_n)^{\beta} \big) \Big).    (A.6)

Next, by symmetry, write J_2 = J_{2,1} + J_{2,2}, where J_{2,1} and J_{2,2} collect, respectively, the contributions of the real and the imaginary parts of \phi_\varepsilon(t/b_n).    (A.7)

We remark that R_\varepsilon(t/b_n) cannot change its sign on the interval [d - \gamma_n, d]; otherwise, R_\varepsilon(t/b_n) would have a root, say t_n, which implies \phi_\varepsilon(t_n/b_n) = R_\varepsilon(t_n/b_n) + iI_\varepsilon(t_n/b_n) = 0 [since we assume I_\varepsilon(t/b_n) = o(R_\varepsilon(t/b_n))] and contradicts the assumption that \phi_\varepsilon(t) \ne 0. Also, by Condition 3.1 iv), \phi_K(t) > 0 on the interval (d - \gamma_n, d). For the points x = (k + 0.5)\pi/d, k = 0, \pm 1, \pm 2, \ldots, the third conclusion follows naturally since \cos(dx) = 0. When x \ne (k + 0.5)\pi/d, on the interval t \in [d - \gamma_n, d] we have

    \cos(tx) = \cos(dx)\, (1 + o(1))

uniformly in x on a bounded interval. Thus, the function \cos(tx) cannot change its sign on [d - \gamma_n, d]. These imply that J_{2,1} and J_{2,2} have the same signs, say positive. Thus (A.7) entails |J_2| \ge |J_{2,1}|. Using the tail condition (1.15) and Condition 3.1 iv), we obtain

    |J_2| \ge c\, \big| \cos(dx)\, (1 + o(1)) \big| \int_{d - b_n^{\beta}}^{d} (d - t)^{\ell}\, (t/b_n)^{-\beta_0}\, \exp\big( a (t/b_n)^{\beta} \big)\, dt
          \ge c\, |\cos dx|\, \Big( \frac{d - b_n^{\beta}}{b_n} \Big)^{-\beta_0}\, \exp\Big( a \Big( \frac{d - b_n^{\beta}}{b_n} \Big)^{\beta} \Big) \int_{d - b_n^{\beta}}^{d} (d - t)^{\ell}\, dt
          \ge c\, |\cos dx|\, b_n^{\beta_0 + (\ell+1)\beta}\, \exp\big( a (d/b_n)^{\beta} (1 - b_n^{\beta}/d)^{\beta} \big).

The second inequality follows from the fact that the function t^{-\beta_0} \exp(a(t/b_n)^{\beta}) is an increasing function when t \in [d - b_n^{\beta}, d]. Using the fact that for small x \ge 0,

    (1 - x)^{\beta} \ge 1 - 2\beta x,

we have

    J_2 \ge c\, |\cos dx|\, b_n^{\beta_0 + (\ell+1)\beta}\, \exp\big( a (d/b_n)^{\beta} \big).

This together with (A.5) and (A.6) leads to the desired lower bound by choosing a large value of \lambda so that J_2 dominates J_1.  □
References

[1] Carroll, R. J. and Hall, P. (1988). Optimal rates of convergence for deconvolving a density. J. Amer. Statist. Assoc., 83, 1184-1186.

[2] Collomb, G. and Härdle, W. (1986). Strong uniform convergence rates in robust nonparametric time series analysis and prediction: Kernel regression estimation from dependent observations. Stoch. Processes Appl., 23, 77-89.

[3] Fan, J. (1991a). On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist., to appear.

[4] Fan, J. (1991b). Asymptotic normality for deconvolving kernel density estimators. Sankhyā, to appear.

[5] Fan, J. (1992). Deconvolution with supersmooth distributions. Canad. J. Statist., to appear.

[6] Fan, J. and Truong, Y. (1990). Nonparametric regression with errors-in-variables. Tentatively accepted by Ann. Statist.

[7] Fan, J., Truong, Y. and Wang, Y. (1990). Measurement errors regression: A nonparametric approach. Institute of Statistics Mimeo Series #2042, University of North Carolina, Chapel Hill.

[8] Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and its Applications. Academic Press, New York.

[9] Härdle, W. (1990). Applied Nonparametric Regression. Cambridge University Press, Boston.

[10] Kolmogorov, A. N. and Rozanov, Yu. A. (1960). On strong mixing conditions for stationary Gaussian processes. Theory Prob. Appl., 5, 204-207.

[11] Liu, M. C. and Taylor, R. (1989). A consistent nonparametric density estimator for the deconvolution problem. Canad. J. Statist., 17, 399-410.

[12] Mack, Y. P. and Silverman, B. W. (1982). Weak and strong uniform consistency of kernel regression estimates. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 61, 405-415.

[13] Masry, E. (1991a). Multivariate probability density deconvolution for stationary random processes. IEEE Trans. Inform. Theory, 37, 1105-1115.

[14] Masry, E. (1991b). Asymptotic normality for deconvolution estimators of multivariate densities of stationary processes. Submitted for publication.

[15] Masry, E. (1991c). Strong consistency and rates for deconvolution of multivariate densities of stationary processes. Submitted for publication.

[16] Masry, E. (1991d). Regression estimation with errors-in-variables for stationary processes. Submitted for publication.

[17] Nadaraya, E. A. (1964). On estimating regression. Theor. Prob. Appl., 9, 141-142.

[18] Prentice, R. L. (1986). Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. J. Amer. Statist. Assoc., 81, 321-327.

[19] Robinson, P. M. (1983). Nonparametric estimators for time series. J. Time Series Anal., 4, 185-207.

[20] Robinson, P. M. (1986). On the consistency and finite sample properties of nonparametric kernel time series regression, autoregression, and density estimators. Ann. Inst. Statist. Math., 38, 539-549.

[21] Rosenblatt, M. (1956). A central limit theorem and strong mixing conditions. Proc. Nat. Acad. Sci., 42, 43-47.

[22] Rosenblatt, M. (1969). Conditional probability density and regression estimates. In Multivariate Analysis II, ed. Krishnaiah, 25-31.

[23] Roussas, G. G. and Tran, L. T. (1991). Asymptotic normality of the recursive kernel regression estimate under dependence conditions. Ann. Statist., to appear.

[24] Stefanski, L. and Carroll, R. J. (1990). Deconvolving kernel density estimators. Statistics, 21, 169-184.

[25] Truong, Y. K. and Stone, C. J. (1991). Nonparametric function estimation involving time series. Ann. Statist., to appear.

[26] Truong, Y. K. (1991). Nonparametric curve estimation with time series errors. J. Statistical Planning and Inference, to appear.

[27] Volkonskii, V. A. and Rozanov, Yu. A. (1959). Some limit theorems for random functions. Theory Prob. Appl., 4, 178-197.

[28] Watson, G. S. (1964). Smooth regression analysis. Sankhyā Ser. A, 26, 359-372.

[29] Whittemore, A. S. and Keller, J. B. (1988). Approximations for errors in variables regression. J. Amer. Statist. Assoc., 83, 1057-1066.

[30] Zhang, C. H. (1990). Fourier methods for estimating mixing densities and distributions. Ann. Statist., 18, 806-830.