
Bias Correction and Higher Order Kernel Functions
Tien-Chung Hu¹
Department of Mathematics
National Tsing-Hua University
Hsinchu, Taiwan

Jianqing Fan
Department of Statistics
University of North Carolina
Chapel Hill, N.C. 27514
Abstract
Kernel density estimates are frequently used, based on a second order kernel. Thus, the bias inherent in the estimates has an order of $O(h_n^2)$. In this note, a method of correcting the bias in the kernel density estimates is provided, which reduces the bias to a smaller order. Effectively, this method produces a higher order kernel based on a second order kernel. For a kernel function $K$, the functions

$$\sum_{l=0}^{k-1}\binom{k}{l+1}\frac{x^l K^{(l)}(x)}{l!} \qquad\text{and}\qquad \frac{K^{(k-1)}(z)/z}{\int_{-\infty}^{\infty} K^{(k-1)}(z)/z\,dz}$$

are kernels of order $k$, under some mild conditions.
¹Completed while visiting Department of Statistics, University of North Carolina, Chapel Hill.
Abbreviated title. Bias Correction.
AMS 1980 subject classification. Primary 62G05. Secondary 62G20.
Key words and phrases. Bias correction, higher order kernel, kernel density estimate, nonparametrics.
1 Introduction
Consider data that can be thought of as a random sample from a distribution having an unknown density. It is common practice to summarize the data with some kinds of statistics. Unless the form of the density is known, it is also very helpful to examine graphical representations and overall structures of the data. Kernel density estimates provide a useful tool for this purpose. See Silverman (1986), Eubank (1988), Müller (1988), Härdle (1990) and Wahba (1990) for many examples of this, and good introductions to the general subject area.
Great efforts have been made to select a bandwidth for a kernel density estimate based on a second order kernel, because such an estimate is easily explainable. A large amount of recent progress has been made on data-based smoothing parameter selection; see Rice (1984), Marron (1988), Hall et al. (1990), Jones, Marron and Park (1990), Chiu (1990), and Fan and Marron (1990), among others. Most of these bandwidth selectors have extremely fast rates of convergence to their theoretical optima. However, since a second order kernel is used in the density estimate, the bias inherent in the estimate is always of order $n^{-2/5}$, no matter how good an automatic bandwidth selector is. This amount of bias may sometimes obscure interesting features such as the number of modes and the height of the underlying density at the modes. In such cases, a bias correction to the kernel density estimate is desirable. The discussion of this issue forms the core of this paper.
For a random sample $X_1, \dots, X_n$, a kernel density estimator is defined by (see Rosenblatt (1956))

$$\hat{f}_n(x) = \frac{1}{nh_n}\sum_{j=1}^{n} K\!\left(\frac{x - X_j}{h_n}\right), \tag{1.1}$$

where $K$ is a kernel function and $h_n$ is a bandwidth. We will concentrate on how the bias of (1.1) can be estimated for a nonrandom bandwidth $h_n$.
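As a concrete illustration, here is a minimal numerical sketch of (1.1). The language (Python), the numpy dependency, the Gaussian choice of $K$, and all names are ours, added for illustration; they are not from the paper.

```python
import numpy as np

def kde(x, data, h, K=lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)):
    """Kernel density estimate (1.1) at the points x, with bandwidth h."""
    x = np.atleast_1d(x)
    u = (x[:, None] - data[None, :]) / h      # (x - X_j) / h_n
    return K(u).sum(axis=1) / (len(data) * h)

rng = np.random.default_rng(0)
X = rng.normal(size=200)                      # sample from an "unknown" density
grid = np.linspace(-3, 3, 7)
print(kde(grid, X, h=0.4))
```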
A method of correcting the bias is given in section 2. Effectively, we give a method for constructing a higher order kernel based on a second order kernel. This provides new insight into the effect of a higher order kernel.
There are several methods of constructing a higher order kernel. Schucany and Sommers (1977) propose a method based on the generalized jackknife for constructing higher order kernels. A useful class of higher order kernels based on the Gaussian density can be found in Wand and Schucany (1990). Berlinet (1990) uses the idea of reproducing kernels in a Hilbert space to construct a class of higher order kernels and discusses its consequences. The optimality of higher order kernels is discussed in Müller (1984), Gasser et al. (1985), and Granovsky and Müller (1990).
Mathematically, most of the methods above are directly targeted at finding a function $K_r$ satisfying

$$\int_{-\infty}^{\infty} K_r(z)\,dz = 1, \qquad \int_{-\infty}^{\infty} z^q K_r(z)\,dz = 0,\ q = 1, \dots, r-1, \qquad\text{and}\qquad \int_{-\infty}^{\infty} |z^r K_r(z)|\,dz < \infty.$$

We take a different approach from the pioneering work, by correcting the bias directly. As a result of the bias correction, a class of higher order kernels is constructed.

Section 2 gives a precise formulation, and discussion, of the main results. Proofs are in section 3.
2 Main Results
Let's illustrate how the bias of the kernel density estimate can be corrected. Mathematical justifications are given in section 3. Observe that the kernel density estimator (1.1) is an unbiased estimator of $\int_{-\infty}^{\infty} f(x - h_n y)K(y)\,dy$:

$$E\hat{f}_n(x) = \int_{-\infty}^{\infty} f(x - h_n y)K(y)\,dy. \tag{2.1}$$

Differentiating $j$ times with respect to $h_n$ yields an unbiased estimate of the functional (viewing $x$ as a fixed point)

$$\theta_j(x) = \frac{\partial^j}{\partial h_n^j}\int_{-\infty}^{\infty} f(x - h_n y)K(y)\,dy = \int_{-\infty}^{\infty} (-y)^j f^{(j)}(x - h_n y)K(y)\,dy, \tag{2.2}$$

and the unbiased estimate of $\theta_j$ is given by

$$\hat{\theta}_j(x) = \frac{\partial^j}{\partial h_n^j}\hat{f}_n(x), \tag{2.3}$$

where $\hat{f}_n$ is the kernel density estimate defined by (1.1).
Let's assume that the unknown density $f$ has $k$ bounded continuous derivatives. Now, the Taylor expansion of $f(x)$ yields

$$f(x) = f(x - h_n y + h_n y) = \sum_{j=0}^{k-1}\frac{1}{j!} f^{(j)}(x - h_n y)(h_n y)^j + O(h_n^k). \tag{2.4}$$

Multiplying by $K(y)$ and then integrating both sides of (2.4) with respect to $y$, we have

$$f(x) = \sum_{j=0}^{k-1}\frac{(-h_n)^j}{j!}\,\theta_j(x) + O(h_n^k),$$

where the fact $\int_{-\infty}^{\infty} K(y)\,dy = 1$ is used. Thus, one can use $\hat{\theta}_j(x)$ to correct the bias

$$\theta_0(x) - f(x) = -\sum_{j=1}^{k-1}\frac{(-h_n)^j}{j!}\,\theta_j(x) + O(h_n^k)$$
of the kernel density estimate (1.1). In other words, a bias-corrected estimate is defined as

$$\hat{f}_b(x) = \sum_{j=0}^{k-1}\frac{(-h_n)^j}{j!}\,\hat{\theta}_j(x). \tag{2.5}$$
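Since $\hat{\theta}_j$ in (2.3) is just the $j$th partial derivative of $\hat{f}_n(x)$ with respect to the bandwidth, the correction (2.5) can be sketched numerically by finite differences in $h_n$. The following Python fragment is a rough illustration under that reading (the function names, the Gaussian kernel, and the step size dh are ours, not the paper's); Lemma 2.1 below gives the exact closed form.

```python
import numpy as np
from math import comb, factorial

def f_hat(x, data, h):
    # kernel density estimate (1.1) with a Gaussian kernel, at a single point x
    u = (x - data) / h
    return np.exp(-u**2 / 2).sum() / (len(data) * h * np.sqrt(2 * np.pi))

def theta_hat(j, x, data, h, dh=1e-3):
    # central finite-difference approximation to (2.3): (d/dh)^j f_hat
    return sum((-1)**m * comb(j, m) * f_hat(x, data, h + (j / 2 - m) * dh)
               for m in range(j + 1)) / dh**j

def f_b(x, data, h, k=4):
    # bias-corrected estimate (2.5)
    return sum((-h)**j / factorial(j) * theta_hat(j, x, data, h)
               for j in range(k))

rng = np.random.default_rng(1)
X = rng.normal(size=500)
print(f_hat(0.0, X, 0.5), f_b(0.0, X, 0.5))
```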
Let's give a simpler formula for the bias-corrected estimate (2.5).

Lemma 2.1. If $K(\cdot)$ has a bounded $k$th derivative, then for $j = 0, 1, \dots, k-1$,

$$\hat{\theta}_j(x) = \frac{(-1)^j j!}{n h_n^{j+1}}\sum_{i=1}^{n}\sum_{l=0}^{j}\binom{j}{l}\frac{1}{l!}\, K_l\!\left(\frac{x - X_i}{h_n}\right),$$

where $K_l(z) = z^l K^{(l)}(z)$, and $\hat{\theta}_j$ was defined by (2.3).

By Lemma 2.1, the bias-corrected estimate (2.5) can be written as

$$\hat{f}_b(x) = \frac{1}{n h_n}\sum_{i=1}^{n} W_k\!\left(\frac{x - X_i}{h_n}\right) \tag{2.6}$$

with

$$W_k(x) = \sum_{l=0}^{k-1}\binom{k}{l+1}\frac{1}{l!}\, K_l(x) = \sum_{l=0}^{k-1}\binom{k}{l+1}\frac{x^l K^{(l)}(x)}{l!}, \tag{2.7}$$

where the identity

$$\sum_{j=l}^{k-1}\binom{j}{l} = \binom{k}{l+1}$$

was used. Thus, effectively the effort of bias correction of the kernel density estimate produces another kernel function $W_k(\cdot)$ defined by (2.7). As intuitively expected, $W_k(x)$ is a $k$th order kernel, which is justified by
Theorem 1. If the kernel function $K(\cdot)$ satisfies

$$\int_{-\infty}^{\infty} K(y)\,dy = 1 \qquad\text{and}\qquad \int_{-\infty}^{\infty} |y^{2k} K^{(k)}(y)|\,dy < \infty,$$

then the function $W_k(\cdot)$ is a $k$th order kernel:

$$\int_{-\infty}^{\infty} W_k(x)\,dx = 1, \qquad \int_{-\infty}^{\infty} x^s W_k(x)\,dx = 0, \ \text{for } s = 1, \dots, k-1,$$

and

$$\int_{-\infty}^{\infty} x^k W_k(x)\,dx = (-1)^{k+1}\int_{-\infty}^{\infty} x^k K(x)\,dx. \tag{2.8}$$
Since $W_k(x)$ is a $k$th order kernel, it follows that

Theorem 2. Let $K$ satisfy the conditions of Theorem 1 and let $f(\cdot)$ have a $k$th bounded continuous derivative. Then,

$$E\hat{f}_b(x) = f(x) - \frac{f^{(k)}(x)}{k!}\left(\int_{-\infty}^{\infty} x^k K(x)\,dx\right) h_n^k\,(1 + o(1)).$$

Thus, the bias-corrected estimate does have a bias of the expected order $O(h_n^k)$. Since $W_k$ is a $k$th order kernel, a similar conclusion holds for the Mean Integrated Squared Error (MISE).
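As a quick sanity check on Theorem 1, one can build $W_k$ from (2.7) symbolically and inspect its moments. The sketch below uses Python with sympy (our choice of tools and names, with a Gaussian base kernel as one admissible $K$):

```python
import sympy as sp

x = sp.symbols('x')
K = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)   # a second order base kernel

def W(k):
    # (2.7): W_k(x) = sum_{l=0}^{k-1} C(k, l+1) x^l K^(l)(x) / l!
    return sum(sp.binomial(k, l + 1) * x**l * sp.diff(K, x, l) / sp.factorial(l)
               for l in range(k))

k = 4
Wk = W(k)
print([sp.integrate(x**s * Wk, (x, -sp.oo, sp.oo)) for s in range(k + 1)])
# -> [1, 0, 0, 0, -3]; the last entry is (-1)^(k+1) * integral of x^k K(x), as in (2.8)
```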
Remark 1. When $K$ is symmetric, the kernel function (2.7) is also symmetric. In such a case, if $k = 2r - 1$ is an odd integer, then $W_{2r-1}$ is also a kernel of order $2r$, which is easily justified by (2.8), and satisfies

$$\int_{-\infty}^{\infty} x^{2r}\, W_{2r-1}(x)\,dx = (2r-1)\int_{-\infty}^{\infty} x^{2r} K(x)\,dx.$$
Example 1. Let's take the standard normal density

$$K(x) = \phi(x) = \frac{1}{\sqrt{2\pi}}\exp(-x^2/2)$$

as a kernel function. Then,

$$\phi^{(l)}(x) = (-1)^l H_l(x)\phi(x),$$

where $H_l(x)$ is the Hermite polynomial of order $l$. Thus, by (2.7),

$$W_k(x) = \sum_{l=0}^{k-1}\binom{k}{l+1}(-x)^l H_l(x)\phi(x)/l! \tag{2.9}$$

is a $k$th order kernel with

$$\int_{-\infty}^{\infty} x^k W_k(x)\,dx = \begin{cases} 0, & \text{if } k = 2r-1, \\ -(2r-1)!!, & \text{if } k = 2r. \end{cases}$$

Note also that $W_{2r-1}(x)$ is a kernel of order $2r$ with

$$\int_{-\infty}^{\infty} x^{2r}\, W_{2r-1}(x)\,dx = (2r-1)\,(2r-1)!!.$$
These kernel functions are different from the kernel functions derived by Wand and Schucany (1990). The following table lists the first few kernel functions (2.9), computed by a computer program.

Table 1: Gaussian-based kernels of order 2-7

  $k$   $W_k(x)$                                                                              $\int_{-\infty}^{\infty} W_k^2(x)\,dx$
  2     $(-x^2 + 2)\phi(x)$                                                                   0.7758
  3     $(x^4 - 7x^2 + 6)\phi(x)/2$                                                           1.4149
  4     $(-x^6 + 15x^4 - 48x^2 + 24)\phi(x)/6$                                                2.2336
  5     $(x^8 - 26x^6 + 183x^4 - 360x^2 + 120)\phi(x)/24$                                     3.3145
  6     $(-x^{10} + 40x^8 - 495x^6 + 2190x^4 - 3000x^2 + 720)\phi(x)/120$                     4.8096
  7     $(x^{12} - 57x^{10} + 1095x^8 - 8625x^6 + 27090x^4 - 27720x^2 + 5040)\phi(x)/720$     6.9908
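The table above can be regenerated with a few lines of computer algebra. A sketch in Python with sympy (our tooling and names, not the paper's program) that prints the polynomial factor of each $W_k$ and the corresponding $\int W_k^2$:

```python
import sympy as sp

x = sp.symbols('x')
phi = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)

for k in range(2, 8):
    # (2.9), written via the identity x^l phi^(l)(x) = (-x)^l H_l(x) phi(x)
    Wk = sum(sp.binomial(k, l + 1) * x**l * sp.diff(phi, x, l) / sp.factorial(l)
             for l in range(k))
    poly = sp.expand(sp.simplify(Wk / phi))   # polynomial multiplying phi(x)
    R = sp.integrate(Wk**2, (x, -sp.oo, sp.oo))
    print(k, poly, float(R))
```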
It appears that the higher order kernels produced by (2.9) are quite complicated, which makes them less useful. However, a simpler method is possible. Observe that for $l \ge 1$,

$$\int_{-\infty}^{\infty} K^{(l)}(x)\,dx = 0.$$

By integration by parts, we obtain

$$\int_{-\infty}^{\infty} x^l K^{(k-1)}(x)\,dx = \begin{cases} 0, & \text{if } l = 0, \dots, k-2, \\ (-1)^{k-1}(k-1)!, & \text{if } l = k-1. \end{cases} \tag{2.10}$$

Thus, if $\int_{-\infty}^{\infty} |K^{(k-1)}(x)/x|\,dx < \infty$, then by (2.10),

$$K_k(x) = \frac{K^{(k-1)}(x)/x}{\int_{-\infty}^{\infty} K^{(k-1)}(z)/z\,dz} \tag{2.11}$$

is a kernel of order $k$.
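A small symbolic check of this construction (a Python/sympy sketch with our names; any sufficiently smooth $K$ works, and we take the Gaussian): normalize $K^{(k-1)}(x)/x$ as in (2.11) and verify the moment conditions.

```python
import sympy as sp

x = sp.symbols('x')
K = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)

def K_order(k):
    # (2.11): K_k(x) = (K^(k-1)(x)/x) / integral of K^(k-1)(z)/z dz
    g = sp.diff(K, x, k - 1) / x
    return g / sp.integrate(g, (x, -sp.oo, sp.oo))

K4 = K_order(4)
print([sp.integrate(x**l * K4, (x, -sp.oo, sp.oo)) for l in range(5)])
# -> [1, 0, 0, 0, -3]: a kernel of order 4
```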
Theorem 3. Let $K(x)$ be a kernel function satisfying

$$\int_{-\infty}^{\infty} \left|K^{(k-1)}(x)/x\right| dx < \infty$$

and

$$x^l K^{(l-1)}(x) \to 0, \ \text{as } |x| \to \infty, \ \text{for } l = 1, \dots, k-1.$$

Then, $K_k$ defined by (2.11) is a kernel function satisfying

$$\int_{-\infty}^{\infty} x^l K_k(x)\,dx = \begin{cases} 0, & \text{if } l = 1, \dots, k-1, \\[4pt] \dfrac{(-1)^{k-1}(k-1)!}{\int_{-\infty}^{\infty} K^{(k-1)}(z)/z\,dz}, & \text{if } l = k. \end{cases}$$

Remark 2. When $K(\cdot)$ is an even function, then $K^{(2r-1)}(0) = 0$. Thus,

$$\lim_{x \to 0}\frac{K^{(2r-1)}(x)}{x} = K^{(2r)}(0),$$

if $K^{(2r)}(0)$ exists. In other words, the function $K^{(2r-1)}(x)/x$ is well defined at the point $x = 0$. Consequently, if $\int_{-\infty}^{\infty} |K^{(2r-1)}(x)|\,dx < \infty$, then $\int_{-\infty}^{\infty} |K^{(2r-1)}(x)/x|\,dx < \infty$, and (2.11) with $k = 2r$ gives a kernel of order $2r$, if the other conditions of Theorem 3 are satisfied.
Example 1 (continued). If $K(x) = \phi(x)$, then

$$K_{2r-1}(x) = \frac{\phi^{(2r-1)}(x)/x}{\int_{-\infty}^{\infty} \phi^{(2r-1)}(z)/z\,dz} = \frac{(-1)^r\,\phi^{(2r-1)}(x)}{2^{r-1}(r-1)!\,x},$$

and $K_{2r-1}(x)$ is a kernel function of order $2r$. The result is found in Wand and Schucany (1990). See also Wand and Schucany (1990) for the kernel functions $K_{2r-1}(x)$, $r = 1, \dots, 5$.

Example 2. Let $K(x) = \frac{1}{\pi}\frac{1}{1+x^2}$ be the standard Cauchy density. Then,

$$(1+x^2)K^{(2r-1)}(x) + 2(2r-1)x K^{(2r-2)}(x) + (2r-1)(2r-2)K^{(2r-3)}(x) = 0.$$
This recursive formula is used to compute the higher order kernels. The following table gives the higher order kernel functions resulting from (2.11). The renormalization constants and $\int_{-\infty}^{\infty} K_{2r-1}^2(x)\,dx$ are computed by using numerical integration.

Table 2: Cauchy density based kernels of order 2-8

  order $2r$   $K_{2r-1}(x)$                                       $\int_{-\infty}^{\infty} K_{2r-1}^2(x)\,dx$
  2            $\frac{2}{\pi}\frac{1}{(1+x^2)^2}$                   0.4
  4            $\frac{4}{\pi}\frac{1-x^2}{(1+x^2)^4}$               0.3581
  6            $\frac{2}{\pi}\frac{3x^4-10x^2+3}{(1+x^2)^6}$        0.4023
  8            $\frac{8}{\pi}\frac{1-7x^2+7x^4-x^6}{(1+x^2)^8}$     0.4464
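The recursion and the renormalization are easy to mechanize. A Python/sympy sketch (our variable names) that generates the Cauchy derivatives by the displayed recursion and prints the normalized kernels of (2.11):

```python
import sympy as sp

x = sp.symbols('x')
d = [1 / (sp.pi * (1 + x**2))]                # d[m] holds K^(m) for the Cauchy density
for m in range(1, 8):
    # differentiating (1 + x^2) K(x) = 1/pi  m times gives
    # (1 + x^2) K^(m) + 2 m x K^(m-1) + m (m-1) K^(m-2) = 0
    prev = d[m - 2] if m >= 2 else 0
    d.append(sp.cancel((-2 * m * x * d[m - 1] - m * (m - 1) * prev) / (1 + x**2)))

for r in (1, 2, 3, 4):                        # kernels of order 2, 4, 6, 8 via (2.11)
    g = sp.cancel(d[2 * r - 1] / x)
    K2r = sp.cancel(g / sp.integrate(g, (x, -sp.oo, sp.oo)))
    print(2 * r, K2r)
```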
Example 3. Let $K_n(x) = c_n(1 - x^2)_+^n$ be a kernel function, where $c_n$ is a normalization constant. Then, by (2.11),

$$K_{n,2r-1}(x) = c_{n,r}^{-1}\sum_{j=r}^{n}(-1)^j\binom{n}{j}\frac{(2j)!}{(2j-2r+1)!}\,x^{2j-2r}\,1_{\{|x|\le 1\}}, \qquad\text{for } r = 1, \dots, [n/2],$$

where

$$c_{n,r} = 2\sum_{j=r}^{n}(-1)^j\binom{n}{j}\frac{(2j)!}{(2j-2r+1)!\,(2j-2r+1)}.$$

The following table gives the results for $K_{8,2r-1}(x)$, $x \in [-1,1]$.
Table 3: Polynomial based kernels of order 4-8

  order $2r$   $K_{8,2r-1}(x)$                                              $\int_{-\infty}^{\infty} K_{8,2r-1}^2(x)\,dx$
  4            $\frac{9009}{4096}(1-x^2)^5(1-5x^2)$                          1.8190
  6            $\frac{3465}{4096}(1-x^2)^3(3-26x^2+39x^4)$                   2.2435
  8            $\frac{315}{4096}(1-x^2)(35-385x^2+1001x^4-715x^6)$           2.5333
If one is interested in finding a fourth order kernel, a simpler one would be $K_{4,3}(x) = \frac{15}{32}(1-x^2)(3-7x^2)\,1_{\{|x|\le 1\}}$.
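For the compactly supported family, the whole of Table 3 can be reproduced the same way. A Python/sympy sketch (our names; the constant $c_n$ cancels in (2.11), so it is omitted):

```python
import sympy as sp

x = sp.symbols('x')
n = 8
Kn = (1 - x**2)**n                            # c_n drops out of (2.11)

for r in (2, 3, 4):                           # orders 4, 6, 8, as in Table 3
    g = sp.cancel(sp.diff(Kn, x, 2 * r - 1) / x)
    K = g / sp.integrate(g, (x, -1, 1))       # normalize over the support [-1, 1]
    print(2 * r, sp.factor(K), float(sp.integrate(K**2, (x, -1, 1))))
```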
3 Proofs

3.1 Proof of Lemma 2.1
Since differentiation is a linear operator, we need only consider the case $n = 1$. We use induction on $j$ to prove the result. Note that Lemma 2.1 holds for $j = 0$. Assume that Lemma 2.1 holds for $j = m$. Then,

$$\hat{\theta}_{m+1}(x) = \frac{\partial}{\partial h_n}\hat{\theta}_m(x) = \frac{(-1)^{m+1}m!}{h_n^{m+2}}\sum_{l=0}^{m}\binom{m}{l}\left[K_{l+1}\!\left(\frac{x-X_1}{h_n}\right) + (l+m+1)K_l\!\left(\frac{x-X_1}{h_n}\right)\right]\frac{1}{l!}$$

$$= \frac{(-1)^{m+1}m!}{h_n^{m+2}}\left[\frac{1}{m!}K_{m+1}\!\left(\frac{x-X_1}{h_n}\right) + \sum_{l=1}^{m} a_{m,l}\,K_l\!\left(\frac{x-X_1}{h_n}\right) + (m+1)K_0\!\left(\frac{x-X_1}{h_n}\right)\right],$$

where

$$a_{m,l} = \binom{m}{l-1}\frac{1}{(l-1)!} + \binom{m}{l}\frac{l+m+1}{l!} = (m+1)\binom{m+1}{l}\frac{1}{l!}.$$

Combining the last two displays yields that

$$\hat{\theta}_{m+1}(x) = \frac{(-1)^{m+1}(m+1)!}{h_n^{m+2}}\sum_{l=0}^{m+1}\binom{m+1}{l}K_l\!\left(\frac{x-X_1}{h_n}\right)\frac{1}{l!}.$$

Thus, Lemma 2.1 holds for $j = m + 1$.

3.2 Proof of Theorem 1
Let's give two simple lemmas, which will be used in the proof of Theorem 1.

Lemma 3.1. $\displaystyle\sum_{i=\max(0,\,k-s)}^{\min(r,\,k)}\binom{r}{i}\binom{s}{k-i} = \binom{r+s}{k}.$

Proof. Think of $r + s$ products consisting of $r$ good and $s$ bad products. Choosing $k$ products is equivalent to selecting $i$ good products and $k - i$ bad products, for all possible $i$.

Lemma 3.2. Under the conditions of Theorem 1,

$$\int_{-\infty}^{\infty} x^s K_j(x)\,dx = (-1)^j\frac{(s+j)!}{s!}\,\tau_s,$$

where $\tau_s = \int_{-\infty}^{\infty} x^s K(x)\,dx$.

Proof. Integration by parts $j$ times yields the result.
Proof of Theorem 1. By Lemma 3.2 and the definition of $W_k$, we have for $1 \le s \le k$,

$$\int_{-\infty}^{\infty} x^s W_k(x)\,dx = \sum_{l=0}^{k-1}\binom{k}{l+1}\frac{(-1)^l}{l!}\frac{(s+l)!}{s!}\,\tau_s = \tau_s\sum_{l=1}^{k}\binom{k}{l}\binom{l+s-1}{s}(-1)^{l-1}. \tag{3.1}$$

By Lemma 3.1, the summation in (3.1) can be written as

$$\sum_{l=1}^{k}(-1)^{l-1}\binom{k}{l}\binom{l+s-1}{s} = \sum_{l=1}^{k}(-1)^{l-1}\binom{k}{l}\sum_{i=1}^{\min(l,s)}\binom{l}{i}\binom{s-1}{s-i}$$

$$= \sum_{i=1}^{\min(k,s)}\left[\sum_{l=i}^{k}(-1)^{l-1}\binom{k-i}{l-i}\right]\binom{k}{i}\binom{s-1}{s-i},$$

where the identity $\binom{k}{l}\binom{l}{i} = \binom{k}{i}\binom{k-i}{l-i}$ was used. Note that

$$\sum_{l=0}^{k-i}(-1)^l\binom{k-i}{l} = \begin{cases} 0, & \text{if } i < k, \\ 1, & \text{if } i = k. \end{cases} \tag{3.2}$$

Thus, by (3.1) and (3.2),

$$\int_{-\infty}^{\infty} x^s W_k(x)\,dx = \begin{cases} 0, & \text{if } s < k, \\ (-1)^{k+1}\tau_k, & \text{if } s = k. \end{cases}$$

Similarly, by (3.1) we have

$$\int_{-\infty}^{\infty} W_k(x)\,dx = \sum_{l=1}^{k}(-1)^{l-1}\binom{k}{l} = 1.$$

This completes the proof.
ACKNOWLEDGEMENTS

We would like to express our sincere thanks to Professor J. S. Marron for many helpful discussions.
References

[1] Berlinet, A. (1990). Reproducing kernels and finite order kernels. Manuscript.

[2] Chiu, S. T. (1990). Bandwidth selection for kernel density estimation. Ann. Statist., to appear.

[3] Eubank, R. L. (1988). Spline Smoothing and Nonparametric Regression. Dekker, New York.

[4] Fan, J. and Marron, J. S. (1990). Best possible constant for bandwidth selection. Institute of Statistics Mimeo Series #2041, University of North Carolina, Chapel Hill.

[5] Gasser, T., Müller, H.-G., and Mammitzsch, V. (1985). Kernels for nonparametric curve estimation. J. Roy. Statist. Soc. Ser. B, 47, 238-252.

[6] Härdle, W. (1990). Applied Nonparametric Regression. Cambridge University Press, Boston.

[7] Hall, P., Sheather, S. J., Jones, M. C. and Marron, J. S. (1990). On optimal data-based bandwidth selection in kernel density estimation. Biometrika, to appear.

[8] Jones, M. C., Marron, J. S. and Park, B. U. (1990). A simple root n bandwidth selector. Ann. Statist., to appear.

[9] Marron, J. S. (1988). Automatic smoothing parameter selection: a survey. Empirical Economics, 13, 187-208.

[10] Müller, H.-G. (1984). Smooth optimum kernel estimators of densities, regression curves and modes. Ann. Statist., 12, 766-774.

[11] Müller, H.-G. (1988). Nonparametric Analysis of Longitudinal Data. Springer-Verlag, Berlin.

[12] Granovsky, B. and Müller, H.-G. (1990). Optimizing kernel methods for the nonparametric estimation of functions and characteristic points: a unifying variational principle. Manuscript.

[13] Rice, J. (1984). Bandwidth choice for nonparametric regression. Ann. Statist., 12, 1215-1230.

[14] Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statist., 27, 832-837.

[15] Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.

[16] Wahba, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia.

[17] Wand, M. P. and Schucany, W. R. (1990). Gaussian-based kernels. Canadian J. Statist., 18, 197-204.

[18] Schucany, W. R. and Sommers, J. P. (1977). Improvement of kernel type density estimators. J. Amer. Statist. Assoc., 72, 420-423.