Fan, Jianqing (1990). "Deconvolution with Supersmooth Distributions."

Deconvolution with Supersmooth Distributions
Jianqing Fan
Department of Statistics
University of North Carolina
Chapel Hill, N.C. 27514
July 15, 1990
Abstract
The desire to recover the unknown density when data are contaminated with errors
leads to nonparametric deconvolution problems. Optimal global rates of convergence
are found under the weighted $L_p$-loss ($1 \le p \le \infty$). It appears that the optimal rates of
convergence are extremely slow for supersmooth error distributions. To overcome the
difficulty, we examine how large the noise level can be for deconvolution to be feasible,
and for the deconvolution estimate to be as good as the ordinary density estimate. It
is shown that if the noise level is not too large, nonparametric Gaussian deconvolution can
still be practical. Several simulation studies are also presented.
Abbreviated title. Supersmooth Deconvolution.
AMS 1980 subject classification. Primary 62G20. Secondary 62G05.
Key words and phrases. Deconvolution, Fourier transforms, kernel density estimates, $L_p$-norm, global
rates of convergence, minimax risks.
Section 4 examines how the theory works for moderate sample sizes via simulation
studies. Further remarks are given in Section 5. Proofs are deferred to Section 6.
2. Optimal Global Rates
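Let us give a global lower bound on rates for supersmooth error distributions. Assume
that the second half of inequality (1.4) holds:

$$|\phi_\varepsilon(t)| \le d_1 |t|^{\beta_1} \exp(-|t|^{\beta}/\gamma) \quad (\text{as } t \to \infty), \tag{2.1}$$

for some constants $\beta, \gamma > 0$, $d_1 \ge 0$, and $\beta_1$, and that

$$P\{|\varepsilon - x| \le |x|^{\alpha_0}\} = O(|x|^{-(a - \alpha_0)}) \quad (\text{as } x \to \pm\infty), \tag{2.2}$$

for some $0 < \alpha_0 < 1$ and $a > 1 + \alpha_0$.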
Theorem 1. Suppose that the distribution of the error variable $\varepsilon$ satisfies (2.1) and (2.2)
and $f \in C_{m,B}$. Then no estimator can estimate $f^{(l)}(x)$ faster than the rate $O((\log n)^{-(m-l)/\beta})$,
in the sense that for any estimator $T_n(x)$,
(2.3)
for all $1 \le p \le \infty$, provided that the weight function $w(\cdot)$ is positive and continuous on some
interval, where $C_{p,l}$ is a positive constant independent of the estimator.
In terms of the technical argument of Theorem 1, we will use the technique of adaptively
local one-dimensional subproblems developed by Fan (1989), and then reduce the global
problem to a pointwise problem so that the existing lower bound (Fan (1990))
on pointwise rates of convergence can be used. To our knowledge, the technical argument
appears to be new.
Now, let us show that the rate above is indeed attained by the deconvolution kernel
estimator (1.3), and hence that it is optimal. Some assumptions on the kernel function $K(\cdot)$ are needed.
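Condition 1:
• $K(\cdot)$ is bounded and continuous, and $\int_{-\infty}^{+\infty} |y|^m |K(y)|\,dy < \infty$.
• The Fourier transform $\phi_K$ of $K$ has a bounded support $|t| \le M_0$. Moreover, $\phi_K(t) = 1 + O(|t|^m)$ as $t \to 0$.

Theorem 2. Assume that $\phi_\varepsilon(t) \ne 0$ for any $t$, and that

$$|\phi_\varepsilon(t)| \ge d_2 |t|^{\beta_2} \exp(-|t|^{\beta}/\gamma) \quad (\text{as } t \to \infty), \tag{2.4}$$

for some positive constants $\beta, \gamma, d_2$ and constant $\beta_2$. If the kernel function $K$ satisfies
Condition 1, then for $h_n = c M_0 (2/\gamma)^{1/\beta} (\log n)^{-1/\beta}$ with $c > 1$,
(2.5)
for all $0 \le p \le \infty$, provided that the weight function is integrable.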
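To make the object of Theorem 2 concrete, here is a minimal sketch of a deconvolution kernel density estimate. Formula (1.3) is not restated in this section, so the code assumes the standard Fourier-inversion form, in which the empirical characteristic function of the contaminated data is divided by $\phi_\varepsilon$ and damped by $\phi_K(t h)$. The Gaussian error model, the choice $\phi_K(t) = (1 - t^2)^3$ on $[-1, 1]$ (so $M_0 = 1$), and the names `phi_K` and `deconv_density` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def phi_K(t):
    """Kernel Fourier transform (1 - t^2)^3 on [-1, 1]; an illustrative choice with M_0 = 1."""
    return (1.0 - t * t) ** 3 if abs(t) <= 1.0 else 0.0


def deconv_density(x, ys, h, sigma, grid=400):
    """Deconvolution kernel density estimate at x, assuming N(0, sigma^2) errors.

    Evaluates (1/pi) * int_0^{1/h} phi_K(t h) * Re[e^{-itx} * ecf(t)] / phi_eps(t) dt
    with the trapezoidal rule, where ecf is the empirical characteristic function
    of the contaminated sample ys and phi_eps(t) = exp(-sigma^2 t^2 / 2).
    """
    n = len(ys)
    t_max = 1.0 / h  # phi_K(t h) vanishes beyond |t| = 1/h
    dt = t_max / grid
    total = 0.0
    for k in range(grid + 1):
        t = k * dt
        # Real part of e^{-itx} times the empirical characteristic function.
        re_ecf = sum(math.cos(t * (y - x)) for y in ys) / n
        inv_phi_eps = math.exp(0.5 * sigma * sigma * t * t)  # 1 / phi_eps(t)
        weight = 0.5 if k in (0, grid) else 1.0  # trapezoid end-point weights
        total += weight * phi_K(t * h) * re_ecf * inv_phi_eps
    return total * dt / math.pi


if __name__ == "__main__":
    rng = random.Random(0)
    sigma, n = 0.3, 300
    # Y = X + eps with X ~ N(0, 1) and eps ~ N(0, sigma^2).
    ys = [rng.gauss(0.0, 1.0) + rng.gauss(0.0, sigma) for _ in range(n)]
    # Bandwidth of the Theorem 2 form with beta = 2, gamma = 2 / sigma^2, M_0 = 1.
    c = 1.1
    h = c * sigma / math.sqrt(math.log(n))
    for x in (-1.0, 0.0, 1.0):
        print(x, deconv_density(x, ys, h, sigma))
```

The factor `inv_phi_eps` is where supersmoothness bites: it grows like $\exp(\sigma^2 t^2 / 2)$, so the integration range $|t| \le 1/h$ has to grow very slowly with $n$.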
In light of the bandwidth given by Theorem 2, there is not much room for bandwidth
selection. If $c > 1$, then the variance converges to 0 much faster than the bias does, while
if $c < 1$, the variance goes to infinity. Thus, a practical selection of the bandwidth would choose a
constant $c$ close to 1 in Theorem 2.
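The role of the constant $c$ can be checked numerically. The sketch below is a rough illustration under one stated assumption: the stochastic part of the estimator behaves like $n^{-1/2} \exp((M_0/h_n)^{\beta}/\gamma)$, that is, $1/|\phi_\varepsilon(t)|$ grows like $\exp(|t|^{\beta}/\gamma)$ on $|t| \le M_0/h_n$ and polynomial factors are ignored. With the bandwidth of Theorem 2 this factor equals $n^{-1/2 + 1/(2c^{\beta})}$. The function names are hypothetical.

```python
import math


def bandwidth(n, c, M0, beta, gamma):
    """h_n = c * M0 * (2/gamma)^(1/beta) * (log n)^(-1/beta), as in Theorem 2."""
    return c * M0 * (2.0 / gamma) ** (1.0 / beta) * math.log(n) ** (-1.0 / beta)


def log_noise_factor(n, c, M0, beta, gamma):
    """Log of n^(-1/2) * exp((M0/h_n)^beta / gamma).

    Assumption of this sketch: 1/|phi_eps(t)| grows like exp(|t|^beta / gamma) on
    |t| <= M0/h_n, and polynomial factors in t are ignored.
    """
    h = bandwidth(n, c, M0, beta, gamma)
    return -0.5 * math.log(n) + (M0 / h) ** beta / gamma


if __name__ == "__main__":
    n, M0, beta, gamma = 10_000, 1.0, 2.0, 2.0  # gamma = 2 matches N(0, 1) errors
    for c in (0.8, 1.0, 1.2):
        exponent = log_noise_factor(n, c, M0, beta, gamma) / math.log(n)
        print(f"c={c}: h_n={bandwidth(n, c, M0, beta, gamma):.3f}, factor ~ n^{exponent:.3f}")
```

Under this assumption the exponent is negative for $c > 1$, zero at $c = 1$, and positive for $c < 1$, which matches the dichotomy described above.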
The distributions satisfying conditions (2.1), (2.2), and (2.4) include normal, mixture
normal, and Cauchy distributions. For these supersmooth error distributions, nonparametric
deconvolution is extremely hard: the optimal rate of convergence is only of order
$(\log n)^{-m/\beta}$. One way of resolving this difficulty will be discussed in the next section.
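To get a feeling for how slow a logarithmic rate is, the following sketch compares $(\log n)^{-m/\beta}$ with the familiar order $n^{-m/(2m+1)}$ of error-free density estimation. All constants are dropped, so only orders of magnitude are meaningful; the function names and the choice $m = \beta = 2$ are illustrative assumptions.

```python
import math


def supersmooth_rate(n, m, beta):
    """(log n)^(-m/beta): the optimal order for supersmooth errors, constants dropped."""
    return math.log(n) ** (-m / beta)


def ordinary_rate(n, m):
    """n^(-m/(2m+1)): the usual error-free density estimation order, constants dropped."""
    return n ** (-m / (2 * m + 1))


def sample_size_for(eps, m, beta):
    """Smallest n (as a float) with (log n)^(-m/beta) <= eps."""
    return math.exp(eps ** (-beta / m))


if __name__ == "__main__":
    m, beta = 2, 2  # e.g. normal errors (beta = 2) and two bounded derivatives
    for n in (10**2, 10**4, 10**6, 10**8):
        print(n, supersmooth_rate(n, m, beta), ordinary_rate(n, m))
    print(sample_size_for(0.05, m, beta))
```

With $m = \beta = 2$ the logarithmic order is simply $1/\log n$, so driving it down to 0.05 requires $\log n = 20$.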
Some special global results (basically $p = m = 2$, $l = 0$, $\varepsilon$ normal or Cauchy) are obtained
independently by Zhang (1990) under a different formulation. The results in Theorems 1 and
2 provide better insights: they show that both the lower and upper bounds depend on the error distribution only
through the tail of $\phi_\varepsilon$, and the dependence is explicitly addressed.
Remark 1. In an early version of the proof of Theorem 1 (see Fan (1988), on which the
results in this section are based), a 1-dimensional subproblem is hard enough to capture the
difficulty of the full global deconvolution problem. In contrast with the ordinary density
estimation (Stone (1982)), in order to construct an attainable lower bound under the global
and
Let $F_\varepsilon$ be the distribution of $\varepsilon$, and let
$\chi^2(f, g) = \int (f - g)^2 / f \, dx$ be the $\chi^2$-distance. By
Theorem 1 of Fan (1989), if
(6.3)
then
$$\inf_{T_n(x)} \sup_{f \in C_{m,B}} E_f \int_0^1 |T_n(x) - f^{(l)}(x)|^p w(x)\,dx \ge \bigl(1 - \sqrt{1 - \exp(-c_1)}\bigr) \int_0^1 w(x)\,dx \int_0^1 |H^{(l)}(x)|^p\,dx \; m_n^{-(m-l)p}. \tag{6.4}$$
Thus, $m_n^{-(m-l)}$ is the global lower rate.
Let us determine $m_n$ from (6.3). Note that there exists a positive constant $c_2$ such that
$f_0(x) \ge c_2 f_0(x + j/m_n)$ $(1 \le j \le m_n)$. By (6.1) and (6.2) with a change of variable, we
have
To construct a pointwise minimax lower bound, one also has to select $m_n$ such that (6.5)
is of order $O(1/n)$, which is determined by Fan (1990) to be $m_n = c_3 (\log n)^{1/\beta}$, for some
constant $c_3 > 0$. Consequently, the global rate is of order $m_n^{-(m-l)} = c_3^{-(m-l)} (\log n)^{-(m-l)/\beta}$.
The conclusion follows.
6.2. Proof of Theorem 2
We need only prove the result for $p = \infty$; the result for other $p$ follows from it
by assuming that $\int_{-\infty}^{+\infty} w(x)\,dx = 1$, because the weighted $L_p$-norm is then bounded by the sup-norm.
Note that

$$E \hat f_n^{(l)}(x) = \int_{-\infty}^{+\infty} f_X^{(l)}(x - h_n y) K(y)\,dy,$$

which is independent of the error distribution $F_\varepsilon$. Thus, by the results in the ordinary
density estimation, or Taylor's expansion,
$$\sup_{f \in C_{m,B}} \bigl\| E \hat f_n^{(l)}(\cdot) - f^{(l)}(\cdot) \bigr\|_\infty \le \frac{B}{(m-l)!} \int_{-\infty}^{+\infty} |y|^{m-l} |K(y)|\,dy \; h_n^{m-l}. \tag{6.6}$$
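The bound (6.6) can be traced through a short Taylor-expansion argument. The sketch below assumes, since the definitions are not restated in this section, that every $f \in C_{m,B}$ satisfies $|f^{(m)}| \le B$, that $\int K(y)\,dy = 1$, and that $\int y^j K(y)\,dy = 0$ for $1 \le j \le m-l-1$. The intermediate point $\xi_{x,y}$ is notation introduced here, and $f$ stands for the density of $X$.

```latex
\begin{aligned}
E \hat f_n^{(l)}(x) - f^{(l)}(x)
  &= \int_{-\infty}^{+\infty} \bigl[ f^{(l)}(x - h_n y) - f^{(l)}(x) \bigr] K(y)\,dy \\
  &= \int_{-\infty}^{+\infty} \Bigl[ \sum_{j=1}^{m-l-1} \frac{(-h_n y)^{j}}{j!}\, f^{(l+j)}(x)
       + \frac{(-h_n y)^{m-l}}{(m-l)!}\, f^{(m)}(\xi_{x,y}) \Bigr] K(y)\,dy \\
  &= \frac{(-h_n)^{m-l}}{(m-l)!} \int_{-\infty}^{+\infty} y^{m-l}\, f^{(m)}(\xi_{x,y})\, K(y)\,dy .
\end{aligned}
```

The sum drops out because of the assumed vanishing moments of $K$. Taking absolute values and using $|f^{(m)}| \le B$ then gives the right-hand side of (6.6).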
Thus, we need only verify that
Note that by (1.3),
(6.7)
by using the fact that $\phi_K$ has support $[-M_0, M_0]$, and that
By (2.4), there exists a constant $t_0$ such that when $|t| \ge t_0$,
Consequently, by (6.7) and the fact that $\min_{|t| \le t_0} |\phi_\varepsilon(t)| > 0$, we have
With the bandwidth given by Theorem 2, the last display is of order $o((\log n)^{-d})$ for any
positive constant $d$. This completes the proof.
6.5. Proof of Theorem 5
First, using integration by parts twice, it is easy to see that $K_n$ defined by (3.5) is
bounded by

$$|K_n(x)| \le \frac{C}{1 + x^2} \quad (\text{for some constant } C),$$

i.e. $K_n(x)$ is bounded and decays at the rate $|x|^{-2}$ as $|x| \to \infty$. Thus, $\hat F_n$ is well defined,
and can be expressed as

$$\hat F_n(x) = \frac{1}{n} \sum_{i=1}^{n} K^*\Bigl(\frac{x - Y_i}{h_n}\Bigr), \quad \text{with } K^*(x) = \int_{-\infty}^{x} K_n(y)\,dy.$$

Note that

$$\sup_x |E \hat F_n(x) - F(x)| = \sup_x \Bigl| \int_{-\infty}^{+\infty} F_X(x - h_n y) K(y)\,dy - F(x) \Bigr| = O(n^{-1/2}).$$

Thus, we need to prove that
which follows from Marcinkiewicz-Zygmund's inequality (Chow and Teicher (1988), p. 356)
or from direct expansion by assuming $p = 2j$ as in Theorem 4,
for some constant $D$, as had to be shown.
References
[1] Carroll, R. J. and Hall, P. (1988). Optimal rates of convergence for deconvoluting a
density. J. Amer. Statist. Assoc., 83, 1184-1186.
[2] Chow, Y.S. and Teicher, H. (1988). Probability Theory, 2nd edition. Springer-Verlag,
New York.
[3] Devroye, L. (1989). Consistent deconvolution in density estimation. Canadian J. Statist.,
17, 235-239.
[4] Fan, J. (1988). Optimal global rates of convergence for nonparametric deconvolution
problems. Tech. Report 162, Dept. of Statist., Univ. of California, Berkeley.
[5] Fan, J. (1989). Adaptively local 1-dimensional subproblems. Institute of Statistics
Mimeo Series #2010, Univ. of North Carolina, Chapel Hill.
[6] Fan, J. (1990). On the optimal rates of convergence for nonparametric deconvolution
problem. Ann. Statist., to appear.
[7] Fan, J. and Truong, Y. (1990). Nonparametric regression with errors-in-variables.
Institute of Statistics Mimeo Series #2023, Univ. of North Carolina, Chapel Hill.
[8] Fuller, W. A. (1987). Measurement error models. Wiley, New York.
[9] Liu, M. C. and Taylor, R. L. (1989). A consistent nonparametric density estimator for
the deconvolution problem. Canadian J. Statist., 17, 427-438.
[10] Masry, E. and Rice, J.A. (1990). Gaussian deconvolution via differentiation.
Manuscript.
[11] Mendelson, J. and Rice, J.A. (1982). Deconvolution of microfluorometric histograms
with B splines. J. Amer. Statist. Assoc., 77, 748-753.
[12] Stefanski, L.A. (1990). Rates of convergence of some estimators in a class of deconvolution
problems. Statist. Probab. Letters, 9, 229-235.
[13] Stefanski, L. A. and Carroll, R. J. (1990). Deconvoluting kernel density estimators.
Statistics, to appear.
[14] Stone, C. (1982). Optimal global rates of convergence for nonparametric regression.
Ann. Statist., 10, 1040-1053.
[15] Wise, G. L., Traganitis, A. D., and Thomas, J. B. (1977). The estimation of a probability density from measurements corrupted by Poisson noise. IEEE Trans. Inform.
Theory, 23, 764-766.