MEASUREMENT ERRORS REGRESSION:
A NONPARAMETRIC APPROACH
Jianqing Fan
Department of Statistics
University of North Carolina
Chapel Hill, N.C. 27599

Young K. Truong
Department of Biostatistics
University of North Carolina
Chapel Hill, N.C. 27599

Yonghua Wang
Department of Statistics & Biostatistics
University of North Carolina
Chapel Hill, N.C. 27599
October 25, 1990
Abstract
Nonparametric regression provides a useful tool for exploring the association between
the responses and covariates. In many applications, the actual values of the covariates
are not known, but are measured with errors or through surrogates. In this paper,
we present nonparametric regression techniques to study the association between the
response and the underlying unobserved covariate. Deconvolution techniques are used
to account for the errors in the covariates. The proposed regression estimators are
shown to have limiting normal distributions and methods for constructing confidence
intervals are also introduced. Various simulated examples based on both continuous
and binary responses are included to illustrate the usefulness of the proposed methods.
Abbreviated title: Measurement errors regression.
AMS 1980 subject classification: Primary 62G20. Secondary 62G05, 62J99.
Key words and phrases: Binary response; Confidence interval; Deconvolution; Errors-in-variables; Kernel estimator; Nonparametric regression.
1 Introduction
Recently, there has been a great deal of interest in regression problems involving errors-in-covariates. This is due largely to important medical and epidemiologic studies in which risk factors are only partially available. Prentice (1986) presented a study of the effect of radiation on chromosomal aberrations based on 649 survivors of the Hiroshima atomic blast; the amount of radiation actually received is not known, but measurements were obtained through physical models of radioactive transmission, along with interview data for each of the survivors, primarily regarding their distance from the blast. Whittemore and Keller (1988) considered respiratory illness versus exposure to nitrogen dioxide (NO2) in a group of school children; the actual amount of NO2 is not known, but it was measured by the concentration in each child's room, a surrogate for personal exposure to NO2.
To describe this regression problem more formally, consider a response $Y$ and a covariate $X^0$ that is observed through
$$X = X^0 + \varepsilon,$$
where $\varepsilon$ is independent of $X^0$ and $Y$. Suppose now that the error distribution is known and that it is desired to estimate the regression function $m(x) = E(Y \mid X^0 = x)$ based on a random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ from the distribution of $(X, Y)$.
In the parametric approach, the regression function is assumed to be a specific function with unknown parameters, which are then estimated by (say) the maximum likelihood method. See, for example, Armstrong (1985), Stefanski (1985), Stefanski and Carroll (1985, 1987), Prentice (1986), Fuller (1987), Whittemore and Keller (1988), and Whittemore (1989). One drawback of this approach is that the resulting maximum likelihood equations are usually very complicated. Another is that erroneous conclusions can be drawn if the parametric model is misspecified. To overcome these problems and to enhance flexibility in fitting data, we propose a modified kernel method for estimating the regression function, in which the idea of deconvolution is used to account for the effect of errors-in-covariates. As in nonparametric curve estimation, this method does not require a particular functional form for the regression curve. Thus it has an advantage for exploring the relationship between the response and the covariate. Further, it also provides a useful diagnostic tool for regression analysis with errors-in-variables and a good alternative to parametric approaches.
There is a strong connection between errors-in-variables and the deconvolution problem, which has been studied extensively in the literature. See, for example, Carroll and Hall (1988), Fan (1991), Masry and Rice (1991), Stefanski and Carroll (1990) and Zhang (1990).
The paper is organized as follows. Section 2 describes the deconvoluting kernel estimators and gives examples of these estimators for different error distributions. Numerical examples based on continuous and binary responses with different error distributions are presented in Section 3. Section 4 discusses the limiting distributions of the proposed estimators and their consequences for constructing confidence intervals. Section 5 contains some concluding remarks. The justifications of the limiting distributions of the deconvoluting kernel estimators are given in Section 6.
2 Methods

2.1 Kernel Estimators
Given a random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ from the distribution of $(X, Y)$, where $X = X^0 + \varepsilon$, we are interested in estimating the regression function of $Y$ on the unobserved variable $X^0$: $m(x) = E(Y \mid X^0 = x)$. Note that the value of the covariate $X_j^0$ is not available, so the estimating procedure must have the ability to extract $X_j^0$ from the observed $X_j = X_j^0 + \varepsilon_j$.

Fan and Truong (1990) consider the following deconvoluting kernel estimator:
$$\hat{m}_n(x) = \frac{\sum_{j=1}^n Y_j\, K_n\!\left(\frac{x - X_j}{h_n}\right)}{\sum_{j=1}^n K_n\!\left(\frac{x - X_j}{h_n}\right)}, \qquad (2.1)$$
with $K_n(\cdot)$ given by
$$K_n(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp(-itx)\, \frac{\phi_K(t)}{\phi_\varepsilon(t/h_n)}\, dt. \qquad (2.2)$$
Here $h_n \to 0$ is a smoothing parameter, and $\phi_K(\cdot)$ is the Fourier transform of the kernel function $K(\cdot)$:
$$\phi_K(t) = \int \exp(itx)\, K(x)\, dx,$$
and $\phi_\varepsilon(\cdot)$ is the characteristic function of the error variable $\varepsilon$.
Note that (2.1) is a kernel-type estimator with kernel weights modified to account for the fact that the covariates are measured with errors. The kernel function (2.2) can be motivated as follows. Suppose that we are interested in estimating the density function of the unobserved $X^0$. Given a set of observations $X_1, \ldots, X_n$ with $X_j = X_j^0 + \varepsilon_j$, Stefanski and Carroll (1990) and Fan (1991) consider the following deconvoluting density estimator:
$$\hat{f}_n(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp(-itx)\, \phi_K(h_n t)\, \frac{\hat{\phi}_n(t)}{\phi_\varepsilon(t)}\, dt, \qquad (2.3)$$
where $\hat{\phi}_n(\cdot)$ is the empirical characteristic function:
$$\hat{\phi}_n(t) = \frac{1}{n}\sum_{j=1}^n \exp(itX_j).$$
With notation (2.2), the estimator (2.3) can be written in the kernel form:
$$\hat{f}_n(x) = \frac{1}{nh_n}\sum_{j=1}^n K_n\!\left(\frac{x - X_j}{h_n}\right).$$
Now recall that the ordinary kernel regression estimator (i.e., with no errors in the covariates) is obtained from the kernel density estimator via the same line of reasoning. This leads to our deconvoluting estimator (2.1).
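To make the construction concrete, here is a minimal numerical sketch of (2.1)-(2.2). It is our illustration, not code from the paper; the quadrature grid, the assumption that $\phi_K$ is supported on $[-1,1]$ with an even integrand, and all names are ours.

```python
import numpy as np

def deconvoluting_kernel(u, h, phi_K, phi_eps):
    """K_n(u) = (2*pi)^{-1} * integral of exp(-i*t*u) phi_K(t) / phi_eps(t/h) dt,
    via a Riemann sum; assumes phi_K vanishes outside [-1, 1] and the integrand
    is even, so exp(-i*t*u) reduces to cos(t*u)."""
    t = np.linspace(-1.0, 1.0, 2001)
    dt = t[1] - t[0]
    w = phi_K(t) / phi_eps(t / h)
    return (np.cos(np.outer(np.atleast_1d(u), t)) * w).sum(axis=1) * dt / (2 * np.pi)

def m_hat(x_grid, X, Y, h, phi_K, phi_eps):
    """Estimator (2.1): Nadaraya-Watson weights built from K_n in place of K
    (the 1/h factors cancel in the ratio)."""
    est = np.empty(len(x_grid))
    for k, x in enumerate(x_grid):
        w = deconvoluting_kernel((x - X) / h, h, phi_K, phi_eps)
        est[k] = np.sum(w * Y) / np.sum(w)
    return est
```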
Throughout this paper, our discussion is also based on the assumption that the regression function has two derivatives.
Some theoretical aspects of the problem were investigated by Fan and Truong (1990), which can be highlighted as follows.

- The estimator (2.1) is optimal in terms of rates of convergence.

- The rates of convergence depend on the smoothness of the error distribution, which can be characterized into two categories: ordinary smooth and super smooth. Let $\phi_\varepsilon$ be the characteristic function of the error distribution. Call a distribution
  - super smooth of order $\beta$ if $\phi_\varepsilon(\cdot)$ satisfies
    $$d_0 |t|^{\beta_0} \exp(-|t|^\beta/\gamma) \le |\phi_\varepsilon(t)| \le d_1 |t|^{\beta_1} \exp(-|t|^\beta/\gamma) \quad \text{as } |t| \to \infty, \qquad (2.4)$$
    where $d_0, d_1, \beta, \gamma$ are positive constants and $\beta_0, \beta_1$ are constants;
  - ordinary smooth of order $\beta$ if $\phi_\varepsilon(\cdot)$ satisfies
    $$d_0 |t|^{-\beta} \le |\phi_\varepsilon(t)| \le d_1 |t|^{-\beta} \quad \text{as } |t| \to \infty, \qquad (2.5)$$
    for positive constants $d_0, d_1, \beta$.

  If the second derivative of the regression function exists, the optimal rate of convergence is $O((\log n)^{-2/\beta})$ when the error distribution is super smooth of order $\beta$, and is $O(n^{-2/(2\beta+5)})$ when the error distribution is ordinary smooth of order $\beta$.

- The optimal choice of bandwidth $h_n$ also depends on whether the error distribution is ordinary smooth or super smooth. For a super smooth error distribution of order $\beta$, it turns out that the optimal choice is $h_n = c(\log n)^{-1/\beta}$ for some constant $c$, which is known (see Example 2.1) and depends only on the error distribution and the kernel function. In the ordinary smooth case, the optimal choice of bandwidth is $h_n = d\, n^{-1/(2\beta+5)}$ for some constant $d > 0$. The constant $d$ is usually selected to balance the bias and variance of the estimate.
2.2 Examples

In this section, we give two examples of the deconvoluting kernel estimator (2.1), based on the normal error and double exponential error distributions, which are super smooth of order 2 and ordinary smooth of order 2, respectively.
Example 2.1: Normal Errors

Suppose $\varepsilon$ has a normal distribution with mean 0 and variance $\sigma_0^2$. Then $\phi_\varepsilon(t) = \exp(-\sigma_0^2 t^2/2)$. Further, suppose the kernel $K(\cdot)$ is a function whose Fourier transform is given by
$$\phi_K(t) = (1 - t^2)^3 \ \text{ for } |t| \le 1, \qquad \phi_K(t) = 0 \ \text{ otherwise}.$$
(This is a modified version of the inverse triangular kernel, so that $K$ is a second order kernel.) By (2.2),
$$K_n(x) = \frac{1}{2\pi}\int_{-1}^{1} \cos(tx)\,(1 - t^2)^3 \exp\!\left(\frac{\sigma_0^2 t^2}{2h_n^2}\right) dt. \qquad (2.6)$$
Graphs of this function for different values of $h_n$ are given in Figure 1. According to Fan and Truong (1990), to achieve the optimal rate of convergence, the bandwidth $h_n$ is chosen so that
$$h_n = c\,\sigma_0 (\log n)^{-1/2} \quad \text{for } c \ge 1. \qquad (2.7)$$
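As a numerical illustration of (2.6) and the bandwidth rule (2.7), the snippet below evaluates the normal-error kernel by quadrature; the sample size and error standard deviation are illustrative values of ours, not quantities from the paper.

```python
import numpy as np

def K_n_normal(u, h, sigma0):
    """Kernel (2.6): the exp(+sigma0^2 t^2 / (2 h^2)) factor inside the integral
    is what forbids bandwidths shrinking faster than sigma0 * (log n)^{-1/2}."""
    t = np.linspace(-1.0, 1.0, 4001)
    dt = t[1] - t[0]
    w = (1 - t**2)**3 * np.exp(0.5 * (sigma0 * t / h)**2)
    return (np.cos(np.outer(np.atleast_1d(u), t)) * w).sum(axis=1) * dt / (2 * np.pi)

n, c, sigma0 = 200, 1.0, 0.12             # illustrative values
h = c * sigma0 * (np.log(n))**(-0.5)      # bandwidth rule (2.7)
grid = np.linspace(-10, 10, 401)
Kn_vals = K_n_normal(grid, h, sigma0)     # compare the shapes in Figure 1
```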
Figure 1. Deconvoluted kernel functions (2.6). The dotted curve has c = 1.0 and the dashed curve has c = 0.8.
Example 2.2: Double Exponential Errors

Suppose $\varepsilon$ has a double exponential distribution:
$$f_\varepsilon(x) = \frac{1}{\sigma_0}\exp\!\left(-\frac{2|x|}{\sigma_0}\right). \qquad (2.8)$$
Then
$$\phi_\varepsilon(t) = \left(1 + \frac{\sigma_0^2 t^2}{4}\right)^{-1}.$$
By (2.2),
$$K_n(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp(-itx)\,\phi_K(t)\left(1 + \frac{\sigma_0^2 t^2}{4h_n^2}\right) dt
= K(x) + \frac{\sigma_0^2}{4h_n^2}\cdot\frac{1}{2\pi}\int_{-\infty}^{\infty} \exp(-itx)\, t^2\phi_K(t)\, dt
= K(x) - \frac{\sigma_0^2}{4h_n^2}\, K''(x). \qquad (2.9)$$
If $K(\cdot)$ is the Gaussian kernel $K(x) = (\sqrt{2\pi})^{-1}\exp(-x^2/2)$, then
$$K_n(x) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{x^2}{2}\right)\left[1 - \frac{\sigma_0^2(x^2 - 1)}{4h_n^2}\right].$$
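Because (2.9) is available in closed form for a Gaussian base kernel, no numerical integration is needed; the following sketch (our illustration, with our names) implements it directly.

```python
import numpy as np

def K_n_dexp(u, h, sigma0):
    """Closed-form kernel (2.9) with a Gaussian base kernel K:
    K_n(u) = phi(u) * [1 - sigma0^2 (u^2 - 1) / (4 h^2)],
    since K''(u) = (u^2 - 1) * phi(u) for the standard normal density phi."""
    u = np.asarray(u, dtype=float)
    phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return phi * (1 - sigma0**2 * (u**2 - 1) / (4 * h**2))
```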
To obtain the optimal rate of convergence, Fan and Truong (1990) showed that $h_n$ should be chosen so that $h_n = c\, n^{-1/9}$ for some $c > 0$.

2.3 Reliability Ratio
For the above examples, it is assumed that the variance of the error term, $\sigma_0^2$, is known. This may seem rather restrictive at first. However, there are a number of situations, for example in psychology and sociology, where information about $\sigma_0^2$ can be obtained via a battery of questions called an instrument [see Fuller (1987, p. 5)]. In fact, knowledge about $\sigma_0^2$ is usually expressed through the reliability ratio, which is defined by
$$r = \frac{\sigma_{X^0}^2}{\sigma_{X^0}^2 + \sigma_0^2} \approx \frac{s_X^2 - \sigma_0^2}{s_X^2},$$
where $s_X^2$ is the sample variance of $X$, $\sigma_{X^0}^2$ is the variance of $X^0$, and $\sigma_0^2$ is the variance of the error distribution. See Fuller (1987) for a comprehensive account of regression analysis based on the reliability ratio. Nevertheless, we take a different approach to this issue, viewing $r$ as a signal-to-noise ratio, in the sense that if the ratio is too small, the signal in the covariate is too weak to give a meaningful analysis of the data. On the other hand, if the ratio is known to be about 90%, the above approximation gives $\sigma_0^2 = 0.1\, s_X^2$, and hence the value of $\sigma_0$ is approximately known. In practice, one could analyze the data by trying, say, $r = 0.5, 0.6, \ldots, 1.0$, and see which $r$ yields meaningful interpretations for the data being analyzed. The range $r \ge 0.5$ is chosen since we are trying to model the relationship between $Y$ and $X^0$ rather than associating $Y$ with the error term $\varepsilon$.
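In code, the role of the reliability ratio is simply to convert an assumed $r$ into a working error standard deviation. A hedged sketch (our function name and usage, not the paper's):

```python
import numpy as np

def sigma0_from_r(X, r):
    """Given an assumed reliability ratio r, back out the error s.d. via
    var(X) = var(X^0) + sigma0^2, i.e. sigma0^2 ~= (1 - r) * s_X^2."""
    return np.sqrt((1 - r) * np.var(X, ddof=1))

# sensitivity analysis as suggested in the text: refit for several ratios
# for r in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0): use sigma0_from_r(X, r) in the kernel
```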
Compared with parametric approaches, the nonparametric approach is much more flexible. Indeed, for parametric approaches, in addition to specifying the reliability ratio, one further assumes a particular form of the regression function and a specific joint distribution of the covariate and the response. To judge the appropriateness of the model, one has to follow trial-and-error procedures, which amounts to choosing parametric functions in an infinite dimensional space and involves complicated computation of maximum likelihood estimates. In contrast, our approach only requires tuning the signal-to-noise ratio $r$, which is much simpler. Another advantage is that the shape of the deconvoluting kernel estimator (2.1) usually suggests a good parametric form for the regression function, in addition to being a useful diagnostic tool for regression analysis involving errors-in-variables.
2.4 Viewing Errors-in-Covariates as Errors-in-Responses

In this section, we give a new viewpoint on the errors-in-variables problem, which provides some insight into how errors in the covariates affect the scatterplot of the data. Suppose the regression function is smooth and the amount of error $\varepsilon$ is small or, equivalently, $r = \sigma_{X^0}^2/(\sigma_0^2 + \sigma_{X^0}^2)$ is large. Then, with high probability, the error $\varepsilon$ contaminating the true covariate $X^0$ (with $X = X^0 + \varepsilon$) is small. If $X$ were the true design point, then its response at this point would be
$$Y^0 = m(X) + \varepsilon^0 = m(X^0 + \varepsilon) + \varepsilon^0 \approx m(X^0) + m'(X^0)\varepsilon + \varepsilon^0 \quad \text{(Taylor expansion)},$$
where $\varepsilon^0$ is the error in the regression model. But our observed response is $Y = m(X^0) + \varepsilon^0$. Thus, viewing the covariate $X$ as the true design point, the observed response $Y$ has an additional error $-m'(X^0)\varepsilon$ compared with the true response $Y^0$ at the point $X$.
There are two important consequences of this new viewpoint on errors-in-variables, illustrated by the numerical check after this list:

- When the true regression curve is very flat (so that $|m'(\cdot)|$ is small), the errors-in-covariates do not appreciably affect the response. In other words, we can treat the errors-in-covariates as if there were no error contaminating the covariates. Thus, in the linear regression model $m(x) = ax + b$, when $|a|$ is small, error in the covariate does not greatly affect the ordinary linear regression analysis.

- When the reliability ratio is large, with high probability the error $\varepsilon$ contaminating the covariate is small. Thus, viewing $X$ as the design point, the additional errors in the response are still negligible. In other words, the error-in-covariate is negligible.
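A quick numerical check of the first-order approximation above; the curve $m$, the error scale, and the sample size are arbitrary choices of ours, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m  = lambda x: np.sin(2 * x)              # a smooth toy curve (not the paper's)
dm = lambda x: 2 * np.cos(2 * x)          # its derivative

x0  = rng.uniform(0, 1, 10_000)
eps = rng.normal(0, 0.05, x0.size)        # small covariate error
exact  = m(x0) - m(x0 + eps)              # additional response error, Y - Y^0
approx = -dm(x0) * eps                    # first-order term -m'(X^0) * eps
print(np.abs(exact - approx).max())       # O(eps^2): the Taylor remainder
```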
3 Simulation

3.1 Continuous Responses
In this section, we are interested in estimating the regression function $m(x) = x^3(1-x)^3$ [see, for example, Rice (1984)] based on a set of observations $(X_1, Y_1), \ldots, (X_n, Y_n)$ in which
$$Y_i = m(X_i^0) + \varepsilon_i^0, \qquad \varepsilon_i^0 \sim_{\text{iid}} N(0, 0.0015^2), \qquad (3.1)$$
and the covariates $X_i^0$ are measured with errors according to
$$X_i = X_i^0 + \varepsilon_i, \qquad X_i^0 \sim_{\text{iid}} \text{unif}(0, 1). \qquad (3.2)$$
Here $\{\varepsilon_i\}$ is a random sample from either $N(0, \sigma_0^2)$ or the double exponential distribution (2.8). The simulated data can be generated as in the sketch below.
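A minimal sketch of the data-generating scheme (3.1)-(3.2); the seed is ours, and the Laplace scale parameterization is our reading of (2.8).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
m = lambda x: x**3 * (1 - x)**3

x0 = rng.uniform(0, 1, n)                  # latent covariates, (3.2)
Y = m(x0) + rng.normal(0, 0.0015, n)       # responses, (3.1)

r = 0.85                                   # reliability ratio (normal-error case)
sigma0 = np.sqrt((1 / 12) * (1 - r) / r)   # var(unif(0,1)) = 1/12, so sigma0^2 = 0.0147
X = x0 + rng.normal(0, sigma0, n)          # observed covariates, normal errors
# double exponential alternative: scale sigma0/2 matches density (2.8) as written
X_dexp = x0 + rng.laplace(0, sigma0 / 2, n)
```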
Before errors are added to the underlying covariates, it is helpful to show the scatterplot of the 200 original observations $(X_i^0, Y_i)$ (Figure 2), so that one can study the effect of measurement errors.
Figure 2. Scatterplot of the responses (3.1) and the covariates without measurement errors.
Example 3.1: Normal Errors

Figure 3 is the scatterplot of the responses $Y_i$ against $X_i$, obtained by adding to the covariates $X_i^0$ normal errors $N(0, \sigma_0^2)$ with $\sigma_0^2$ chosen so that the reliability ratio is $r = 0.85$. Equivalently, these are normal errors with
$$\sigma_0^2 = \frac{1}{12}\cdot\frac{1-r}{r} = 0.0147.$$
The scatterplot looks noisier than Figure 2. This can easily be explained by viewing errors in the covariates as additional errors in the responses; see Section 2.4. The regression function is then estimated by the deconvoluting kernel estimator (2.1) with different degrees of smoothing, using the bandwidths (2.7) with c = 1.0 and c = 0.8.
Figure 3. Scatterplot of the responses and the covariates contaminated with normal errors (r = 0.85).
Figure 4. Deconvoluted kernel estimates: normal errors with r = 0.85. The solid and dashed curves are the estimates with c = 1.0 and c = 0.80, respectively. The estimates are evaluated at x = 1/200, 2/200, ..., 1. The dotted curve is the true regression function.
Figure 4 shows the estimates (at x = 1/200, 2/200, ..., 1) superimposed on the underlying regression function $m(x) = x^3(1-x)^3$. Note that in this case the bandwidth $h_n = \sigma_0(\log n)^{-1/2}$ provides a threshold for nonparametric estimation involving normal errors. If $h_n = c\sigma_0(\log n)^{-1/2}$ is chosen with $c > 1$, then the variance converges to 0 much faster than the bias term. If $c$ were chosen to be much less than 1, then the variance can be quite large; in fact, it can even diverge to $\infty$.

The graphs of the corresponding deconvoluting kernel functions used in the above estimates are shown in Figure 1. Note that, in contrast to the ordinary approach, these kernel functions depend on $n$.
Figure 5. Scatterplot of the responses and the covariates contaminated with normal errors (r = 0.70).
Figures 5-6 show the results when the variance of the error distribution is increased by decreasing the reliability ratio r to 0.70. The resulting scatterplot is much noisier than the one in Figure 3 with r = 0.85. The effect of choosing c < 1 (hence inflating the variance) is more transparent; see the estimates with c = 0.6 and c = 0.8 in Figure 6. This is compatible with the theory in Fan and Truong (1990).
Figure 6. Deconvoluted kernel estimates: normal errors with r = 0.70. The solid, dashed and dotted curves are the estimates with c = 1.0, c = 0.80 and c = 0.60, respectively. The true regression function is also plotted for comparison.
Example 3.2: Double Exponential Errors

Figure 7 shows the data with double exponential errors added to the covariates. These errors are generated from (2.8), with $\sigma_0$ chosen so that r = 0.80.

The regression function is then estimated by smoothing the data in Figure 7 using the deconvoluting kernel (2.9). The results are presented in Figure 8.

Figures 9-10 illustrate the effect of a larger error variance (r = 0.60) for the double exponential error distribution. In contrast with the normal errors, even though the reliability ratios are smaller, the resulting estimates are closer to the true curve: regression with double exponential errors in the covariates is easier than regression with normal errors. This is consistent with the remarks in Section 2.
Figure 7. Scatterplot of the responses and the covariates contaminated with double exponential errors (r = 0.80).
Figure 8. Deconvoluted kernel estimates: double exponential errors with r = 0.80. The solid and dashed curves are the estimates with h_n = 0.07 and h_n = 0.08, respectively. The dotted curve is the true regression function.
Figure 9. Scatterplot of the responses and the covariates contaminated with double exponential errors (r = 0.60).
Figure 10. Deconvoluted kernel estimates: double exponential errors with r = 0.60. The solid and dashed curves are the estimates with h_n = 0.09 and h_n = 0.10, respectively. The dotted curve is the underlying regression function m(·).
3.2 Binary Responses

Many interesting applications in medical and epidemiologic studies involve binary responses. See, for example, Carroll et al. (1984), Prentice (1986) and Whittemore and Keller (1988). The nonparametric estimators considered in the previous sections can be applied to this type of response as well.

Let $Y$ denote a 0-1 random variable with distribution depending on the covariate $X^0$, so that the logit of $Y$ on $X^0 = x$ is given by
$$\text{logit}(Y \mid X^0 = x) = 8(x - 0.5)^3.$$
That is, the regression function is given by
$$m(x) = E(Y \mid X^0 = x) = P(Y = 1 \mid X^0 = x) = \frac{\exp(8(x - 0.5)^3)}{1 + \exp(8(x - 0.5)^3)}.$$
As before, let $X^0$ have a uniform distribution on (0,1). $X^0$ is not available; it is observed through $X = X^0 + \varepsilon$. The distribution of the error $\varepsilon$ is treated separately as follows.
Example 3.3: Double Exponential Errors

Let $(X_1, Y_1), \ldots, (X_{200}, Y_{200})$ denote a random sample from the distribution of $(X, Y)$, with $\varepsilon$ having the double exponential distribution (2.8). Here $\sigma_0$ is chosen so that the reliability ratio is r = 0.90. Thus
$$X_i = X_i^0 + \varepsilon_i, \qquad X_i^0 \sim_{\text{iid}} \text{unif}(0,1), \qquad \varepsilon_i \sim_{\text{iid}} f_\varepsilon \quad [\text{see } (2.8)],$$
$$Y_i = \begin{cases} 1, & \text{with probability } m(X_i^0); \\ 0, & \text{with probability } 1 - m(X_i^0). \end{cases}$$
The data are presented in Figure 11, where the estimates (at x = 1/200, 2/200, ..., 199/200, 1) are obtained via (2.1) with the deconvoluting kernel function given by (2.9) and the bandwidth $h_n = 0.10$. A sketch of this data-generating scheme follows.
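The sketch below is our illustration of the binary scheme (seed and helper names are ours); the estimation step would reuse (2.1) with the kernel (2.9).

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 200, 0.90
m = lambda x: 1 / (1 + np.exp(-8 * (x - 0.5)**3))  # inverse logit of 8(x - 0.5)^3

x0 = rng.uniform(0, 1, n)                  # latent covariates
Y = rng.binomial(1, m(x0))                 # 0-1 responses with P(Y = 1) = m(X^0)
sigma0 = np.sqrt((1 / 12) * (1 - r) / r)   # error scale from the reliability ratio
X = x0 + rng.laplace(0, sigma0 / 2, n)     # double exponential errors, cf. (2.8)
# fit: apply estimator (2.1) with kernel (2.9) and h_n = 0.10 on a grid over (0, 1)
```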
Figure 11. Scatterplot of the responses and the covariates contaminated with double exponential errors (r = 0.90). The solid curve is the estimate (2.1) and (2.9) with h_n = 0.1. The dotted curve is the underlying regression function, with logit(Y|X⁰ = x) = 8(x - 0.5)³.
Example 3.4: Normal Errors

In this example, let $X^0$ denote a unif(0,1) random variable, and let $\varepsilon$ have a normal distribution $N(0, \sigma_0^2)$ with $\sigma_0^2$ chosen so that the reliability ratio is r = 0.90. $(X_1, Y_1), \ldots, (X_{200}, Y_{200})$ is a random sample from the distribution of $(X, Y)$, where $X = X^0 + \varepsilon$. The scatterplot of the data is shown in Figure 12. Choosing $h_n$ by setting c = 1.3 in (2.7), we obtain the deconvoluting kernel estimates (at x = 1/200, 2/200, ..., 199/200, 1) via (2.1) and (2.6). See Figure 12.
Figure 12. Scatterplot of the responses and the covariates contaminated with normal errors (r = 0.90). The solid curve is the estimate (2.1) and (2.6) with h_n = 0.06. The dotted curve is the underlying regression function, with logit(Y|X⁰ = x) = 8(x - 0.5)³.

4 Confidence Intervals
To construct confidence intervals for the unknown regression function in the presence of errors-in-variables, we need to validate the asymptotic normality. For simplicity of notation, denote
$$K_{nj}(x) = \frac{1}{h_n}\, K_n\!\left(\frac{x - X_j}{h_n}\right), \qquad (4.1)$$
which is the deconvolution weight of the estimator (2.1). Let
$$b_n(x) = \frac{1}{h_n}\int_{-\infty}^{\infty} [m(u) - m(x)]\, K\!\left(\frac{x - u}{h_n}\right) f_{X^0}(u)\, du, \qquad (4.2)$$
where $f_{X^0}(\cdot)$ is the density of $X_j^0$. For the bandwidth and the kernel that we will use, we have
$$\hat{f}_n(x) \stackrel{p}{\to} f_{X^0}(x), \qquad (4.3)$$
where $\hat{f}_n(x)$ is defined by (2.3). Thus,
$$\hat{m}_n(x) - m(x) = \frac{\frac{1}{n}\sum_{j=1}^n (Y_j - m(x))\, K_{nj}(x)}{f_{X^0}(x)\left(1 + \left(\frac{\hat{f}_n(x)}{f_{X^0}(x)} - 1\right)\right)} = \frac{1}{n}\sum_{j=1}^n \frac{(Y_j - m(x))\, K_{nj}(x)}{f_{X^0}(x)}\,(1 + o_p(1)). \qquad (4.4)$$
Hence the asymptotic normality of $\hat{m}_n(x) - m(x)$ is equivalent to the asymptotic normality of $n^{-1}\sum_j (Y_j - m(x)) K_{nj}(x)$, which is the mean of i.i.d. random variables.
We begin with some assumptions.
CONDITIONS

1. The marginal density $f_{X^0}(\cdot)$ of the unobserved $X^0$ is bounded away from zero at the point $x$, and has a continuous bounded derivative.

2. The regression function $m(\cdot)$ has a bounded second derivative.

3. The conditional variance $\sigma^2(x) = \mathrm{var}(Y \mid X^0 = x)$ is bounded away from zero and infinity. Moreover, there is a $\delta > 0$ such that $\mu_{2+\delta}(x) = E(|Y|^{2+\delta} \mid X^0 = x) < \infty$.

4. The characteristic function $\phi_\varepsilon(\cdot)$ of the error variable $\varepsilon$ does not vanish.
Let us first consider double exponential errors. Assume, more generally, that there exist constants $c \ne 0$ and $\beta \ge 0$ such that
$$|t|^\beta \phi_\varepsilon(t) \to c \quad \text{as } |t| \to \infty. \qquad (4.5)$$
Then it is easy to verify that the double exponential distribution (2.8) is a special case of (4.5), with $\beta = 2$.
Call a kernel $K$ a second order kernel if $K$ is bounded and satisfies
$$\int_{-\infty}^{\infty} K(x)\, dx = 1, \qquad \int_{-\infty}^{\infty} xK(x)\, dx = 0, \qquad \int_{-\infty}^{\infty} x^2 K(x)\, dx \ne 0.$$
Moreover, we always assume that
$$\int_{-\infty}^{\infty} |t|^\beta\, |\phi_K(t)|\, dt < \infty,$$
where $\phi_K(\cdot)$ is the Fourier transform of the kernel $K$.
Now we are ready to study the asymptotic normality for double exponential error distributions.

Theorem 1. Under Conditions 1-4 and (4.5), if the kernel function $K(\cdot)$ is a second order kernel, then
$$\frac{\sqrt{n}\,\{f_{X^0}(x)[\hat{m}_n(x) - m(x)] - b_n(x)\}}{\sqrt{E\,\sigma^2(X_1)\, K_{n1}^2(x)}} \stackrel{d}{\to} N(0, 1), \qquad (4.6)$$
provided that $h_n \to 0$ and $n h_n^{2+2\beta} \to \infty$.
A practical version of Theorem 1 requires estimating the bias and variance in (4.6). This is usually done by letting $h_n$ converge to 0 a little faster than the optimal choice [which is $h_n = O(n^{-1/(2\beta+5)})$ by Fan and Truong (1990)], so that the bias is negligible. To avoid technical difficulties, assume $\sigma^2(x) = \sigma^2$, independent of $x$. Then the asymptotic variance is $\sigma^2 E K_{n1}^2(x)$.

Take an interval $[a, b]$ such that the marginal density $f_{X^0}$ satisfies $\min_{x\in[a,b]} f_{X^0}(x) > 0$. Then $\sigma^2$ can be estimated by
$$\hat{\sigma}_n^2 = \sum_{j=1}^n (Y_j - \hat{m}_n(X_j))^2\, 1_{[a \le X_j \le b]} \Big/ \sum_{j=1}^n 1_{[a \le X_j \le b]},$$
and the factor $E K_{n1}^2(x)$ can be estimated by $n^{-1}\sum_{j=1}^n K_{nj}^2(x)$; see (4.1). Denote
$$V_n(x) = \hat{\sigma}_n \left[\frac{1}{n}\sum_{j=1}^n K_{nj}^2(x)\right]^{1/2}. \qquad (4.7)$$
Corollary 1. Under the conditions of Theorem 1 and the assumption that $\sigma^2(x) = \sigma^2$, if $h_n = o(n^{-1/(2\beta+5)})$ and $n h_n^{2+2\beta} \to \infty$, then
$$\frac{\sqrt{n}\,\hat{f}_n(x)\,[\hat{m}_n(x) - m(x)]}{V_n(x)} \stackrel{d}{\to} N(0, 1). \qquad (4.8)$$

As a direct consequence, a $(1-\alpha)$ confidence interval for $m(x)$ is given by
$$m(x) \in \hat{m}_n(x) \pm z_{1-\alpha/2}\, \frac{V_n(x)}{\sqrt{n}\,\hat{f}_n(x)}, \qquad (4.9)$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of the standard normal distribution.
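The interval (4.9) is straightforward to compute once the fit is in hand. A hedged sketch (our function and argument names; `Kn` is whichever deconvoluting kernel is in use, and `mhat_fun`/`fhat_x` stand for the fitted curve and the density estimate (2.3) at x):

```python
import numpy as np
from scipy.stats import norm

def ci_at(x, X, Y, h, Kn, mhat_fun, fhat_x, a, b, alpha=0.05):
    """Pointwise interval (4.9): mhat(x) +/- z_{1-alpha/2} V_n(x) / (sqrt(n) fhat(x)),
    with sigma^2 estimated from residuals on [a, b] and V_n(x) as in (4.7)."""
    n = len(X)
    inside = (X >= a) & (X <= b)
    sigma2 = np.mean((Y[inside] - mhat_fun(X[inside]))**2)  # sigma_hat_n^2
    Knj = Kn((x - X) / h) / h                               # weights (4.1)
    Vn = np.sqrt(sigma2 * np.mean(Knj**2))                  # (4.7)
    half = norm.ppf(1 - alpha / 2) * Vn / (np.sqrt(n) * fhat_x)
    return mhat_fun(x) - half, mhat_fun(x) + half
```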
Example 3.2: Double Exponential Errors (cont'd)

Recall that in this case the covariates are measured with double exponential errors (r = 0.80). Based on the estimate (2.1) and (2.9) with h_n = 0.08 (Figure 8), the 95% pointwise confidence intervals (4.9) are given in Figure 13. It is interesting to note that the whole regression curve $m(x) = x^3(1-x)^3$ lies within the confidence limits.
Figure 13. 95% pointwise confidence intervals for m(x) = x³(1-x)³, based on the data in Figure 7.
Now let us state the asymptotic normality for normal error distributions.

Theorem 2. Under Conditions 1-4, if $\varepsilon \sim N(0, \sigma_0^2)$, then for the kernel estimator (2.1) with $K_n$ given by (2.6), we have
$$\frac{\sqrt{n}\,\{f_{X^0}(x)[\hat{m}_n(x) - m(x)] - b_n(x)\}}{\sqrt{E\,\sigma^2(X_1)\, K_{n1}^2(x)}} \stackrel{d}{\to} N(0, 1), \qquad (4.10)$$
provided that $h_n = d\,\sigma_0 (\log n)^{-1/2}$ with $d \ge 1 - \frac{3\log\log n}{2\log n}$.

Note that the condition $d \ge 1 - \frac{3\log\log n}{2\log n}$ is used to guarantee that (4.3) holds. To wit, note that for $K_n$ defined by (2.6),
$$|K_n(\cdot)| \le \frac{1}{\pi}\int_0^1 (1 - t^2)^3 \exp\!\left(\frac{\sigma_0^2 t^2}{2h_n^2}\right) dt
\le \exp\!\left(\frac{(1 - h_n^{0.8})^2\sigma_0^2}{2h_n^2}\right) + \frac{8 h_n^{2.4}}{\pi}\int_{1-h_n^{0.8}}^1 t\, \exp\!\left(\frac{\sigma_0^2 t^2}{2h_n^2}\right) dt
= O\!\left(h_n^{4.4}\, \exp\!\left(\frac{\sigma_0^2}{2h_n^2}\right)\right). \qquad (4.11)$$
Thus,
$$\mathrm{var}(\hat{f}_n(x)) \le \frac{1}{n}\, E K_{n1}^2(x) = O\!\left(\frac{h_n^{6.8}}{n}\exp\!\left(\frac{\sigma_0^2}{h_n^2}\right)\right) = O\!\left(\frac{(\log n)^{-3.4}}{n}\exp\!\left(\frac{\log n}{d^2}\right)\right) = O\!\left((\log n)^{-0.4}\right) \to 0,$$
where the last step uses $d \ge 1 - \frac{3\log\log n}{2\log n}$, so that $\exp(\log n/d^2) \le n(\log n)^{3+o(1)}$. Now the bias has the expression [see Fan (1991)]
$$E\hat{f}_n(x) - f_{X^0}(x) = \int_{-\infty}^{\infty} \left[f_{X^0}(x - h_n y) - f_{X^0}(x)\right] K(y)\, dy \to 0.$$
Thus, (4.3) follows, since the bias and the variance of $\hat{f}_n(x)$ both converge to zero.
Corollary 2. Under the conditions of Theorem 2, if $\sigma^2(x) = \sigma^2$ and
$$h_n = \left(1 - \frac{3\log\log n}{2\log n} + o\!\left(\frac{\log\log n}{\log n}\right)\right) \sigma_0 (\log n)^{-1/2},$$
we have
$$\frac{\sqrt{n}\,\hat{f}_n(x)\,[\hat{m}_n(x) - m(x)]}{V_n(x)} \stackrel{d}{\to} N(0, 1), \qquad (4.12)$$
where $V_n(x)$ is defined by (4.7).

For normal error distributions, the confidence interval for $m(x)$ has the same form as (4.9).
Example 3.1: Normal Errors (cont'd)

Here the covariates are measured with normal errors (r = 0.85). Using (4.9) and the estimate (2.1), (2.6), (2.7) with c = 0.8 (see Figure 4), the 95% pointwise confidence intervals are presented in Figure 14. Note that the regression curve does not lie entirely within the confidence limits (compare with the double exponential error case); this may be associated with the fact that the rate of convergence of the deconvoluting kernel estimator (2.1) is very slow. See Fan and Truong (1990).
Figure 14. 95% pointwise confidence intervals for m(x) = x³(1-x)³, based on the data in Figure 3.
Note that for normal error distributions, in order for the variance of the deconvoluting kernel estimator to converge to zero, the bandwidth must satisfy $h_n = d\sigma_0(\log n)^{-1/2}$ with $d \ge 1$ (up to the $\log\log n$ correction above), while for the variance to be of lower order than the bias, one requires $h_n = d\sigma_0(\log n)^{-1/2}$ with $d \le 1$. This determines the bandwidth in Corollary 2.

For the binary response models in Section 3.2, $m(x)$ is the conditional probability $P(Y = 1 \mid X^0 = x)$, with conditional variance $m(x)(1 - m(x))$, which depends on $x$. Hence the confidence intervals (4.9) should be modified to account for the heteroscedasticity as follows:
$$m(x) \in \hat{m}_n(x) \pm z_{1-\alpha/2}\, \frac{\sqrt{\sum_j K_{nj}^2(x)}\, \sqrt{\hat{m}_n(x)(1 - \hat{m}_n(x))}}{n\, \hat{f}_n(x)}.$$
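For the binary case, the only change is the plug-in binomial variance; a minimal sketch under the same naming assumptions as before:

```python
import numpy as np
from scipy.stats import norm

def binary_ci_at(x, X, h, Kn, mhat_x, fhat_x, alpha=0.05):
    """Heteroscedastic interval: replace sigma_hat^2 with the binomial variance
    mhat(x) * (1 - mhat(x)), as in the displayed formula above."""
    Knj = Kn((x - X) / h) / h                                # weights (4.1)
    half = (norm.ppf(1 - alpha / 2)
            * np.sqrt(np.sum(Knj**2) * mhat_x * (1 - mhat_x)) / (len(X) * fhat_x))
    return mhat_x - half, mhat_x + half
```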
Figures 15-16 show the 95% confidence intervals based on the binary response data
presented in Examples 3.3 and 3.4. See also Figures 11 and 12.
Figure 15. 95% pointwise confidence intervals for logit(Y|X⁰ = x) = 8(x - 0.5)³, based on the binary response data in Figure 11.
Figure 16. 95% pointwise confidence intervals for logit(Y|X⁰ = x) = 8(x - 0.5)³, based on the binary response data in Figure 12.
5 Conclusions
Much work has been done on the errors-in-variables problem under parametric assumptions on the regression function, with the error distribution usually taken to be Gaussian. Although recent attention has turned to other error distributions and nonlinear (parametric) regression models [Carroll et al. (1984) and Prentice (1986)], the computations needed to obtain maximum likelihood estimates are arduous and intractable [Whittemore (1989)]. This difficulty has prompted the development of methods based on approximation. Still, there are further problems along this line: the sampling properties of these approximating procedures are not easy to obtain [Stefanski (1985), Whittemore and Keller (1988)], and there are no diagnostic tools. Such methodology is needed in, for instance, Examples 3.3 and 3.4 on nonlinear logistic regression analysis.

In contrast, we have proposed nonparametric estimators based on the idea of deconvolution. These estimates are easy to compute, flexible in describing the relationship between the response and the covariate, and useful as diagnostic tools for data analysis involving errors-in-variables. More importantly, both the sampling properties of these estimators and confidence intervals are established. The analyses given in the above examples demonstrate the importance of these features, which also make the nonparametric procedures potentially useful for a broad class of problems.

Further work in this direction is needed to address the issue of bandwidth selection. Our simulations showed that visual bandwidth selection (i.e., bandwidths chosen by the user) performed reasonably well. Methods based on cross-validation appear promising and offer interesting research problems.
6 Appendix: Proofs

6.1 Proofs of Theorems 1 and 2
Denote [see (4.4)]
$$U_{nj} = (Y_j - m(x))\, K_{nj}(x). \qquad (6.1)$$
Note that (4.4) is a sum of i.i.d. random variables. By the central limit theorem for triangular arrays,
$$\frac{\sqrt{n}\left(n^{-1}\sum_{j=1}^n U_{nj} - EU_{n1}\right)}{\sqrt{\mathrm{var}(U_{n1})}} \stackrel{d}{\to} N(0, 1), \qquad (6.2)$$
if Lyapounov's condition holds:
$$\frac{E|U_{n1} - EU_{n1}|^{2+\delta}}{n^{\delta/2}\,[\mathrm{var}(U_{n1})]^{1+\delta/2}} \to 0. \qquad (6.3)$$
It has been calculated [see Lemma 6.2 of Fan and Truong (1990)] that
$$EU_{n1} = b_n(x) = O(h_n^2). \qquad (6.4)$$
A direct consequence of (6.4) is that the sequence $\{EU_{n1}\}$ is bounded. Since the sequences $\{E|U_{n1}|^\alpha\}$ ($\alpha = 2, 2+\delta$) are unbounded, the condition (6.3) is equivalent to
$$\frac{E|U_{n1}|^{2+\delta}}{n^{\delta/2}\,[EU_{n1}^2]^{1+\delta/2}} = \frac{E|Y_1 - m(x)|^{2+\delta}\,|K_{n1}(x)|^{2+\delta}}{n^{\delta/2}\left[E|(Y_1 - m(x))K_{n1}(x)|^2\right]^{1+\delta/2}} \to 0, \qquad (6.5)$$
where $K_{n1}(x)$ is defined by (4.1). By Condition 3, the right-hand side of (6.5) is of order
$$\frac{E\,\mu_{2+\delta}(X_1)\,|K_{n1}(x)|^{2+\delta}}{n^{\delta/2}\left[E\sigma^2(X_1)K_{n1}^2(x)\right]^{1+\delta/2}} \le C\, \frac{E|K_{n1}(x)|^{2+\delta}}{n^{\delta/2}\left[EK_{n1}^2(x)\right]^{1+\delta/2}}, \qquad (6.6)$$
for some constant $C$. If we use the same argument for the asymptotic normality of the deconvolution problem (estimating the density $f_{X^0}$), we end up with exactly the same task: showing that (6.6) converges to zero, which has been justified by Fan (1990). Thus, Lyapounov's condition (6.3) holds. Since $\{EU_{n1}\}$ is bounded while $EU_{n1}^2 \to \infty$, we also have $\mathrm{var}(U_{n1}) = EU_{n1}^2(1 + o(1))$. Thus, by (6.2), (6.4) and (4.4), the conclusions of Theorems 1 and 2 follow.
6.2 Proof of Corollary 1

With the bandwidth selection in Corollary 1, the bias term is negligible [see Fan and Truong (1990)]:
$$\frac{\sqrt{n}\, b_n(x)}{\sqrt{E\sigma^2(X_1)K_{n1}^2(x)}} \to 0. \qquad (6.7)$$
Consequently,
$$\frac{\sqrt{n}\, f_{X^0}(x)\,(\hat{m}_n(x) - m(x))}{\sqrt{\sigma^2\, E K_{n1}^2(x)}} \stackrel{d}{\to} N(0, 1).$$
Thus, we need only show that
$$\frac{V_n^2(x)}{\sigma^2\, E K_{n1}^2(x)} \stackrel{p}{\to} 1.$$
According to the weak law of large numbers [Chow and Teicher (1978), p. 328],
$$\frac{n^{-1}\sum_j K_{nj}^2(x)}{E K_{n1}^2(x)} \stackrel{p}{\to} 1$$
holds if, for each $\epsilon > 0$,
$$\frac{1}{E K_{n1}^2(x)}\, E\!\left(K_{n1}^2(x)\, 1_{[K_{n1}^2(x) \ge \epsilon n E K_{n1}^2(x)]}\right) \to 0, \qquad (6.8)$$
which is proved by Fan (1990) in the deconvolution setting. Thus, to complete the proof, we need only show that $\hat{\sigma}_n^2 \stackrel{p}{\to} \sigma^2$. Since the density $f_{X^0}(\cdot)$ is bounded away from 0 on $[a, b]$, we have
$$\sup_{x\in[a,b]} |\hat{m}_n(x) - m(x)| \stackrel{p}{\to} 0. \qquad (6.9)$$
Thus, by (6.9), we have
$$\frac{1}{n}\sum_j (Y_j - \hat{m}_n(X_j))^2\, 1_{[a \le X_j \le b]} = \frac{1}{n}\sum_j (Y_j - m(X_j))^2\, 1_{[a \le X_j \le b]} + o_p(1) = E(Y_1 - m(X_1))^2\, 1_{[a \le X_1 \le b]} + o_p(1) = \sigma^2\, P(a \le X_1 \le b) + o_p(1),$$
and
$$\frac{1}{n}\sum_j 1_{[a \le X_j \le b]} \stackrel{p}{\to} P(a \le X_1 \le b).$$
Thus $\hat{\sigma}_n^2 \stackrel{p}{\to} \sigma^2$, and the conclusion follows.
6.3 Proof of Corollary 2

Similarly to the proof of Corollary 1, we need only show that both (6.7) and (6.8) hold for $K_n$ given by (2.6). To this end, for $x \in [0, \pi/4]$,
$$K_n(x) \ge \frac{1}{2\pi}\int_{1-h_n^2}^1 \cos(tx)\,(1 - t^2)^3 \exp\!\left(\frac{\sigma_0^2 t^2}{2h_n^2}\right) dt
\ge \frac{\sqrt{2}}{4\pi}\,\exp\!\left(\frac{\sigma_0^2(1 - h_n^2)^2}{2h_n^2}\right)\int_{1-h_n^2}^1 (1 - t)^3(1 + t)^3\, dt
\ge \frac{1}{8\pi}\, h_n^8\, \exp\!\left(\frac{\sigma_0^2}{2h_n^2} - \sigma_0^2\right). \qquad (6.10)$$
Let $C = \frac{1}{8\pi}\exp(-\sigma_0^2)$, so that $K_n(x) \ge C h_n^8 \exp(\sigma_0^2/(2h_n^2))$ for $x \in [0, \pi/4]$. Note that the density of $X$ is the convolution of $f_{X^0}$ with a normal density; thus $P(0 \le (x - X)/h_n \le \pi/4) > 0$. By (6.10),
$$E K_{n1}^2(x) \ge C^2\, h_n^{14}\, \exp\!\left(\frac{\sigma_0^2}{h_n^2}\right) P\!\left(0 \le \frac{x - X}{h_n} \le \frac{\pi}{4}\right). \qquad (6.11)$$
It follows from (6.11) that (6.7) holds. Using (6.11) together with (4.11), the set $\{K_{n1}^2(x) \ge \epsilon n E K_{n1}^2(x)\}$ is empty when $n$ is large. Hence (6.8) holds.
References
[1] Armstrong, B. (1985). Measurement error in the generalized linear models. Communi-
cations in Statistics - Simulation and Computation, 14, 529-544.
[2] Carroll, R. J., Spiegelman, C. H., Lan, K. K. G., Bailey, K. T., and Abbott, R. D.
(1984). On errors-in-variables for binary regression models. Biometrika, 70, 19-25.
[3] Carroll, R. J. and Hall, P. (1988). Optimal rates of convergence for deconvoluting a
density. J. Amer. Statist. Assoc., 83, 1184-1186.
[4] Chow, Y. S. and Teicher, H. (1978). Probability Theory: independence, interchange-
ability, martingales. Springer Verlag, New York.
[5] Fan, J. (1990). Asymptotic normality for deconvolving kernel density estimators. To
appear in Sankhyā.
[6] Fan, J. (1991). On the optimal rates of convergence for nonparametric deconvolution
problem. Ann. Statist. To appear.
[7] Fan, J. and Truong, Y. (1990). Nonparametric regression with errors-in-variables. In-
stitute of Statistics Mimeo Series #2023, Univ. of North Carolina, Chapel Hill.
[8] Fuller, W. A. (1987). Measurement error models. Wiley, New York.
[9] Masry, E. and Rice, J. A. (1991). Gaussian deconvolution via differentiation. To appear
in Can. J. Statist..
29
[10] Prentice, R. L. (1986). Binary regression using an extended Beta-Binomial distribution, with discussion of correlation induced by covariate measurement errors. J. Amer.
Statist. Assoc., 81, 321-327.
[11] Rice, J. (1984). Bandwidth choice for nonparametric regression. Ann. Statist., 12, 1215-1230.
[12] Stefanski, L. A. (1985). The effects of measurement error on parameter estimation.
Biometrika 72, 583-592.
[13] Stefanski, L. A. and Carroll, R. J. (1985). Covariate measurement error in logistic
regression. Ann. Statist., 13, 1335-1351.
[14] Stefanski, L. A. and Carroll, R. J. (1987). Conditional scores and optimal scores for
generalized linear measurement-error models. Biometrika, 74, 703-716.
[15] Stefanski, L. A. and Carroll, R. J. (1990). Deconvoluting kernel density estimators.
Statistics, 21, 169-184.
[16] Whittemore, A.S. and Keller, J.B. (1988). Approximations for errors in variables regression. J. Amer. Statist. Assoc., 83, 1057-1066.
[17] Whittemore, A. S. (1989). Errors-in-variables regression using Stein estimates. Amer-
ican Statistician. 43, 226-228.
[18] Zhang, C. H. (1990). Fourier methods for estimating mixing densities and distributions.
Ann. Statist., 18, 806-830.