Université de Rouen – Master 2 MFA – Statistique non asymptotique: intro. à la statistique non paramétrique
2016-2017

Kernel and convolution

Contents
1 Definitions
  1.1 Definition and examples
  1.2 The order of a kernel
    1.2.1 Definition
    1.2.2 Construction
2 Approximation properties
  2.1 Approximate identity (reminders)
  2.2 Kernel and approximation
    2.2.1 Local control of the approximation error
    2.2.2 Global control of the approximation error
This document is an introduction to the notion of kernel, which is used in nonparametric estimation. Its approximation properties are also stated. Most of the results can also be found in the books of Tsybakov (2009) and Comte (2015).
1 Definitions

1.1 Definition and examples
Definition 1 A kernel is a function $K : \mathbb{R}^d \to \mathbb{R}$ which is integrable and satisfies $\int_{\mathbb{R}^d} K(u)\,du = 1$.
Remark. If we denote by $K_1, \dots, K_d$ some kernel functions on $\mathbb{R}$, then $K : x = (x_1, \dots, x_d)^T \mapsto K_1(x_1) \cdots K_d(x_d)$ is a kernel on $\mathbb{R}^d$. In the sequel, we thus consider mainly kernels on $\mathbb{R}$.
Examples of kernels on $\mathbb{R}$ (see Figure 1).
(a) The rectangular kernel: $x \mapsto (1/2)\,\mathbf{1}_{|x| \le 1}$.
(b) The triangular kernel: $x \mapsto (1 - |x|)\,\mathbf{1}_{|x| \le 1}$.
(c) The Epanechnikov or parabolic kernel: $x \mapsto (3/4)(1 - x^2)\,\mathbf{1}_{|x| \le 1}$.
(d) The biweight kernel: $x \mapsto (15/16)(1 - x^2)^2\,\mathbf{1}_{|x| \le 1}$.
(e) The Gaussian kernel: $x \mapsto (1/\sqrt{2\pi}) \exp(-x^2/2)$.
(f) The Silverman kernel: $x \mapsto (1/2) \exp(-|x|/\sqrt{2}) \sin(|x|/\sqrt{2} + \pi/4)$.
Remark. All these kernels are even functions, and all of them take only nonnegative values except the Silverman kernel.
Figure 1: Classical kernels in statistics ((a)–(f) as in the list above).
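For readers who want to experiment, the following Python sketch (not part of the original notes) implements the six kernels above and checks on a fine grid that each one integrates to 1; the integration window and step are arbitrary choices.

import numpy as np

# The six classical kernels of Figure 1.
kernels = {
    "rectangular":  lambda x: 0.5 * (np.abs(x) <= 1),
    "triangular":   lambda x: (1 - np.abs(x)) * (np.abs(x) <= 1),
    "epanechnikov": lambda x: 0.75 * (1 - x**2) * (np.abs(x) <= 1),
    "biweight":     lambda x: (15 / 16) * (1 - x**2) ** 2 * (np.abs(x) <= 1),
    "gaussian":     lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi),
    "silverman":    lambda x: 0.5 * np.exp(-np.abs(x) / np.sqrt(2))
                              * np.sin(np.abs(x) / np.sqrt(2) + np.pi / 4),
}

# Riemann sum over a window outside of which all these kernels are negligible.
x = np.linspace(-40, 40, 800_001)
dx = x[1] - x[0]
for name, K in kernels.items():
    print(f"{name:12s} integral ~ {np.sum(K(x)) * dx:.4f}")   # all close to 1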
1.2 The order of a kernel

1.2.1 Definition
Definition 2 Let $l$ be a positive integer ($l \neq 0$). We say that a kernel $K$ on $\mathbb{R}$ has order $l$ if the functions $x \mapsto x^j K(x)$, $j = 0, \dots, l$, are integrable and
$$\int_{\mathbb{R}} x^j K(x)\,dx = 0 \quad \text{for } j = 1, \dots, l.$$
Example. The order of the Gaussian kernel is 1: its first moment vanishes (it is an even function), but $\int_{\mathbb{R}} x^2 K(x)\,dx = 1 \neq 0$, so it does not have order 2.
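A small numerical sketch (again not part of the notes) makes this visible: approximating the moments of the Gaussian kernel on a grid, the first moment vanishes while the second equals 1.

import numpy as np

x = np.linspace(-40, 40, 800_001)
dx = x[1] - x[0]
K = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)     # Gaussian kernel on the grid

for j in range(3):
    moment = np.sum(x**j * K) * dx             # approximates int x^j K(x) dx
    print(f"j = {j}: {moment:+.4f}")           # ~ +1, 0, +1: the second moment is nonzero, so the order is 1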
1.2.2 Construction
Kernels of a given order l can be built with at least two methods.
A first construction is proposed by Kerkyacharian et al. (2001).
Proposition 3 Let $g : \mathbb{R} \to \mathbb{R}$ be an integrable function such that $\int_{\mathbb{R}} g(x)\,dx = 1$. Let $l \in \mathbb{N}\setminus\{0\}$. We define
$$K_l : \mathbb{R} \longrightarrow \mathbb{R}, \qquad u \mapsto K_l(u) = \sum_{k=1}^{l} \binom{l}{k} \frac{(-1)^{k+1}}{k}\, g\!\left(\frac{u}{k}\right).$$
Then,
1. The function $K_l$ is a kernel, and if the support of $g$ is a segment $[a, b]$, then that of $K_l$ is $[a, lb]$;
2. Moreover, if for any $k \in \{1, \dots, l\}$ the function $u \mapsto u^k g(u)$ is integrable, then $K_l$ is a kernel of order $l - 1$.
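The construction can be tried numerically. In the Python sketch below (an illustration only), we take $l = 3$ and choose $g$ to be the standard exponential density, so that the moments of $g$ are nonzero; the approximate moments of $K_3$ are consistent with $K_3$ being a kernel of order $l - 1 = 2$.

import numpy as np
from math import comb

l = 3
g = lambda x: np.exp(-x) * (x >= 0)            # standard exponential density, integral 1

def K_l(u):
    # K_l(u) = sum_{k=1}^{l} C(l, k) (-1)^{k+1} (1/k) g(u/k), as in Proposition 3
    return sum(comb(l, k) * (-1) ** (k + 1) * g(u / k) / k for k in range(1, l + 1))

u = np.linspace(0, 200, 2_000_001)             # K_3 vanishes on the negative half-line here
du = u[1] - u[0]
for j in range(4):
    print(f"j = {j}: {np.sum(u**j * K_l(u)) * du:+.3f}")
# expected: +1.000 (kernel), ~0 (j = 1), ~0 (j = 2), nonzero (j = 3): order 2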
Exercise. The aim is to provide a proof for Proposition 3.
1. Prove the first point of the result.
2. Let $j, k, l$ be some integers such that $1 \le j \le k \le l$. Justify that
$$\binom{l}{k}\binom{k}{j} = \binom{l}{j}\binom{l-j}{k-j}.$$
If $j = 1$, the identity amounts to $k\binom{l}{k} = l\binom{l-1}{k-1}$.
3. Let $j, l$ be some integers such that $1 \le j < l$. Then prove that
$$\sum_{k=1}^{l} (-1)^k \binom{l}{k}\binom{k}{j} = 0.$$
4. Let $j, l$ be some integers such that $1 \le j < l$. Deduce that
$$\sum_{k=1}^{l} \binom{l}{k} (-1)^k Q_j(k) = 0, \quad \text{with } Q_j(X) = X(X-1)(X-2)\cdots(X-j+1). \qquad (1)$$
5. With a "finite induction" over $j$, prove that
$$\forall j \in \{1, \dots, l-1\}, \qquad \sum_{k=1}^{l} \binom{l}{k} (-1)^k k^j = 0.$$
6. Deduce that $\int_{\mathbb{R}} x^j K_l(x)\,dx = 0$ for $j \in \{1, \dots, l-1\}$ (which ends the proof of Proposition 3).
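The identity of step 5 can also be checked numerically for a fixed $l$ (a sanity check, of course not a proof):

from math import comb

l = 6
for j in range(1, l):
    s = sum(comb(l, k) * (-1) ** k * k**j for k in range(1, l + 1))
    print(f"j = {j}: sum = {s}")               # 0 for every j in {1, ..., l-1}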
A second way to build kernels of a given order is to take advantage of the Legendre polynomials.
Definition 4 The Legendre polynomials $(Q_n)_{n \in \mathbb{N}}$ are the functions defined by the following formulas, for $x \in [-1, 1]$:
$$Q_0(x) = \frac{1}{\sqrt{2}}, \qquad Q_n(x) = \sqrt{\frac{2n+1}{2}}\,\frac{1}{2^n n!}\,\frac{d^n}{dx^n}\big(x^2 - 1\big)^n, \quad n \ge 1.$$
Proposition 5 The Legendre polynomials $(Q_n)_{n \in \mathbb{N}}$ form an orthonormal basis of $L^2([-1, 1])$, and $Q_n$ has degree $n$ for any $n \in \mathbb{N}$.
A proof of this result is suggested in Exercise 6.2.8, Chap. 6 of the book of Lacombe and
Massat (1999). Tsybakov (2009) proposed the following construction.
Proposition 6 Let $l \in \mathbb{N}\setminus\{0\}$. We define
$$K_l : \mathbb{R} \longrightarrow \mathbb{R}, \qquad u \mapsto K_l(u) = \sum_{m=0}^{l} Q_m(0)\,Q_m(u)\,\mathbf{1}_{|u| \le 1}.$$
Then, $K_l$ is a kernel of order $l$.
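This construction is easy to try numerically: numpy.polynomial.legendre provides the standard Legendre polynomials $P_m$ (normalized by $P_m(1) = 1$), and multiplying $P_m$ by $\sqrt{(2m+1)/2}$ gives the orthonormal $Q_m$ of Definition 4. The sketch below (with the arbitrary choice $l = 2$) approximates the moments of $K_l$.

import numpy as np
from math import sqrt
from numpy.polynomial.legendre import Legendre

l = 2
# Orthonormal Legendre polynomials on [-1, 1]: Q_m = sqrt((2m+1)/2) * P_m.
Q = [sqrt((2 * m + 1) / 2) * Legendre.basis(m) for m in range(l + 1)]

def K_l(u):
    # K_l(u) = sum_{m=0}^{l} Q_m(0) Q_m(u) 1_{|u| <= 1}, as in Proposition 6
    return sum(q(0.0) * q(u) for q in Q) * (np.abs(u) <= 1)

u = np.linspace(-1, 1, 200_001)
du = u[1] - u[0]
for j in range(5):
    print(f"j = {j}: {np.sum(u**j * K_l(u)) * du:+.4f}")
# expected: +1, then ~0 for j = 1, ..., l; here the first nonzero moment appears at j = 4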
Exercise. Prove Proposition 6. Hint: to compute $\int_{\mathbb{R}} u^j K_l(u)\,du$, one may expand the function $P_j(u) = u^j$ in the Legendre basis and keep in mind that the $(Q_m)_m$ are orthogonal functions.
2 Approximation properties
Notation. Let $f$ and $g$ be two functions on $\mathbb{R}$ (or $\mathbb{R}^d$). We define the convolution product of $f$ with $g$ by
$$f \star g : x \mapsto f \star g(x) = \int_{\mathbb{R}} f(x - t)\,g(t)\,dt,$$
when this integral makes sense. For example, the product is well defined for functions $f$ and $g$ such that $f \in L^p(\mathbb{R})$ and $g \in L^q(\mathbb{R})$ with $p, q \ge 1$ such that $p^{-1} + q^{-1} = 1$. It is also defined almost everywhere when $f \in L^1(\mathbb{R})$ and $g \in L^p(\mathbb{R})$. We refer to Chap. 13 of Briane and Pagès (2000) for more details.
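A direct numerical transcription of this definition (a sketch; the test functions and the integration window are arbitrary choices) approximates $(f \star g)(x_0)$ by a Riemann sum:

import numpy as np

t = np.linspace(-20, 20, 400_001)
dt = t[1] - t[0]

f = lambda x: (np.abs(x) <= 1).astype(float)           # indicator of [-1, 1], in L^1 and L^2
g = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # Gaussian density, in L^1 and L^2

def conv(x0):
    # Riemann-sum approximation of (f * g)(x0) = int f(x0 - t) g(t) dt
    return np.sum(f(x0 - t) * g(t)) * dt

print(conv(0.0))   # = P(-1 <= Z <= 1) for Z ~ N(0, 1), roughly 0.683
print(conv(3.0))   # = P(2 <= Z <= 4), roughly 0.023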
2.1 Approximate identity (reminders)
The set $(L^1(\mathbb{R}), +, \cdot, \star)$ is a commutative algebra, with no identity element: one cannot find a function $e \in L^1(\mathbb{R})$ such that for any $f \in L^1(\mathbb{R})$, $f \star e = f$. A solution is to introduce some sequences or families of functions that act as substitutes for an identity element.
Definition 7 An approximate identity for the convolution product is a net of functions $(g_h)_{h>0} \subset L^1(\mathbb{R})$ such that
(i) for any $h > 0$, $\int_{\mathbb{R}} g_h(x)\,dx = 1$,
(ii) $\sup_{h>0} \int_{\mathbb{R}} |g_h(x)|\,dx < +\infty$,
(iii) $\forall \varepsilon > 0$, $\lim_{h \to 0} \int_{|x| > \varepsilon} |g_h(x)|\,dx = 0$.
Proposition 8 Let $(g_h)_{h>0} \subset L^1(\mathbb{R})$ be an approximate identity. Let $f \in L^p(\mathbb{R})$, $p \ge 1$. Then for any $h > 0$, $f \star g_h \in L^p(\mathbb{R})$ and
$$\lim_{h \to 0} \|f \star g_h - f\|_p = 0.$$
The proof of this result, as well as other approximation results, can be found in Chapter 13 of the book of Briane and Pagès (2000).
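As an illustration of Definition 7 (a sketch; the rescaled Gaussian family $g_h(x) = (1/h)\varphi(x/h)$, with $\varphi$ the standard Gaussian density, is chosen only as a concrete example), the three conditions can be checked on a grid:

import numpy as np

x = np.linspace(-50, 50, 1_000_001)
dx = x[1] - x[0]

def g_h(h):
    # Gaussian density rescaled by h: g_h(x) = (1/h) * phi(x/h)
    return np.exp(-(x / h) ** 2 / 2) / (h * np.sqrt(2 * np.pi))

eps = 0.5
for h in (1.0, 0.1, 0.01):
    total = np.sum(g_h(h)) * dx                          # (i): equal to 1 for every h
    sup_l1 = np.sum(np.abs(g_h(h))) * dx                 # (ii): bounded in h (here also 1)
    tail = np.sum(np.abs(g_h(h))[np.abs(x) > eps]) * dx  # (iii): tends to 0 as h -> 0
    print(f"h = {h:5.2f}: (i) {total:.4f}  (ii) {sup_l1:.4f}  (iii) {tail:.2e}")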
2.2 Kernel and approximation
Notation. In the sequel, for any kernel $K$ on $\mathbb{R}$ and any real number $h > 0$, we denote by $K_h$ the following function
$$K_h : x \in \mathbb{R} \mapsto \frac{1}{h} K\!\left(\frac{x}{h}\right). \qquad (2)$$
Some examples are plotted in Figure 2.
Figure 2: Examples of functions $(K_h)_{h>0}$, for the rectangular kernel and the Gaussian kernel.
Proposition 9 Let $K$ be a kernel and $(K_h)_{h>0}$ the associated family of functions (see notation (2)). Then $(K_h)_{h>0}$ is an approximate identity for the convolution product.
Exercise. Prove Proposition 9.
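The following sketch illustrates Propositions 8 and 9 together: with $K$ the rectangular kernel, $K_h$ as in (2), and a triangular "hat" function $f \in L^2(\mathbb{R})$ chosen only as a test function, the approximate distance $\|K_h \star f - f\|_2$ decreases towards 0 as $h$ decreases.

import numpy as np

t = np.linspace(-5, 5, 10_001)                 # symmetric grid with odd length
dt = t[1] - t[0]

K = lambda x: 0.5 * (np.abs(x) <= 1)           # rectangular kernel
f = np.maximum(1 - np.abs(t), 0.0)             # hat function sampled on the grid

for h in (1.0, 0.3, 0.1, 0.03):
    Kh = K(t / h) / h                           # K_h sampled on the grid, notation (2)
    Kh_f = np.convolve(f, Kh, mode="same") * dt # (K_h * f) on the same grid
    err = np.sqrt(np.sum((Kh_f - f) ** 2) * dt) # approximates ||K_h * f - f||_2
    print(f"h = {h:4.2f}: L2 error = {err:.4f}")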
Consequence. Kernels are used in statistics for their approximation properties. The following two results show that one can obtain upper bounds which are sharper than the convergence of Proposition 8, when the approximated functions are smooth enough.
In the sequel, for any real number $\alpha > 0$, we denote by $\lfloor\alpha\rfloor$ the greatest integer strictly less than $\alpha$.
2.2.1 Local control of the approximation error
We first define the Hölder functional subspaces, which allow us to quantify the local smoothness of a function.
Definition 10 Let $I \subset \mathbb{R}$ be an interval, and $\alpha, L \in \mathbb{R}_+^*$. The Hölder functional class is the set $H_I(\alpha, L)$ of functions $g$ defined on $I$ which are $\lfloor\alpha\rfloor$-times differentiable and such that, for any $x, x' \in I$,
$$\big|g^{(\lfloor\alpha\rfloor)}(x) - g^{(\lfloor\alpha\rfloor)}(x')\big| \le L\,|x - x'|^{\alpha - \lfloor\alpha\rfloor}. \qquad (3)$$
Remarks.
1. If $\alpha \in\, ]0, 1]$, then $\lfloor\alpha\rfloor = 0$ and $H_I(\alpha, L)$ is the set of $\alpha$-Hölder continuous functions with constant $L$. If $\alpha = 1$, we obtain the $L$-Lipschitz functions.
2. The index $\alpha$ of the class $H_I(\alpha, L)$ can be understood as a "smoothness index" which measures the smoothness of the functions that belong to the class (similarly, the index $k$ of the set $C^k(I)$ of $k$-times continuously differentiable functions is a measure of the smoothness of the functions of this class).
Proposition 11 Let $\alpha, L \in \mathbb{R}_+^*$. Consider a kernel $K$ of order $l = \lfloor\alpha\rfloor$ that satisfies $\int_{\mathbb{R}} |u|^\alpha |K(u)|\,du < +\infty$, and let $(K_h)_{h>0}$ be the associated family of functions (see notation (2)). Let $g \in H_{\mathbb{R}}(\alpha, L)$. Then, for any $x_0 \in \mathbb{R}$ and $h > 0$,
$$|K_h \star g(x_0) - g(x_0)| \le C h^\alpha, \quad \text{with } C = \frac{L}{l!} \int_{\mathbb{R}} |u|^\alpha |K(u)|\,du.$$
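A numerical illustration of Proposition 11 (a sketch, with choices made only for the example: $\alpha = 2$, hence $l = \lfloor\alpha\rfloor = 1$, the Gaussian kernel, which has order 1, and $g = \sin$, whose derivative is 1-Lipschitz, so that $g \in H_{\mathbb{R}}(2, 1)$): the pointwise bias behaves like $h^2$, and the observed ratio bias$/h^2$ stays below the constant $C = (L/l!)\int u^2 |K(u)|\,du = 1$ given by the proposition.

import numpy as np

t = np.linspace(-60, 60, 1_200_001)
dt = t[1] - t[0]
K = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # Gaussian kernel, order 1
g = np.sin                                             # g' = cos is 1-Lipschitz
x0 = 1.0

for h in (0.5, 0.25, 0.125, 0.0625):
    Khg_x0 = np.sum(K((x0 - t) / h) / h * g(t)) * dt   # (K_h * g)(x0)
    bias = abs(Khg_x0 - g(x0))
    print(f"h = {h:6.4f}: bias = {bias:.2e}, bias / h^2 = {bias / h**2:.3f}")
# the ratio stays bounded (around 0.4 here), below C = (L / l!) * int u^2 |K(u)| du = 1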
2.2.2 Global control of the approximation error
The Nikol'skiĭ spaces, introduced by Nikol'skiĭ (1975), are integrated versions of the Hölder spaces: they allow one to measure a global smoothness.
Definition 12 Let $\alpha, L \in \mathbb{R}_+^*$. The Nikol'skiĭ functional class $N_2(\alpha, L)$ is the set of functions $g$ which are $\lfloor\alpha\rfloor$-times differentiable and satisfy, for any $x \in \mathbb{R}$,
$$\big\|g^{(\lfloor\alpha\rfloor)}(\cdot + x) - g^{(\lfloor\alpha\rfloor)}\big\|_2 = \left(\int_{\mathbb{R}} \big(g^{(\lfloor\alpha\rfloor)}(x' + x) - g^{(\lfloor\alpha\rfloor)}(x')\big)^2\,dx'\right)^{1/2} \le L\,|x|^{\alpha - \lfloor\alpha\rfloor}.$$
Remark. One can generalize the previous definition by defining Nikol'skiĭ spaces $N_p(\alpha, L)$ for any $p \ge 1$ ($p$ is just the index of the considered norm).
Proposition 13 Let $\alpha, L \in \mathbb{R}_+^*$. Consider a kernel $K$ of order $l = \lfloor\alpha\rfloor$ that satisfies $\int_{\mathbb{R}} |u|^\alpha |K(u)|\,du < +\infty$, and let $(K_h)_{h>0}$ be the associated family of functions (see notation (2)). Let $g \in N_2(\alpha, L)$. Then, for any $h > 0$,
$$\|K_h \star g - g\|_2 \le C h^\alpha, \quad \text{with } C = \frac{L}{l!} \int_{\mathbb{R}} |u|^\alpha |K(u)|\,du.$$
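The same experiment can be carried out in the integrated setting of Proposition 13 (a sketch; here $g$ is the standard Gaussian density, which belongs to $N_2(2, L)$ for a suitable $L$, and $K$ is the Gaussian kernel): the ratio $\|K_h \star g - g\|_2 / h^2$ remains bounded as $h$ decreases.

import numpy as np

t = np.linspace(-15, 15, 6_001)                # symmetric grid with odd length
dt = t[1] - t[0]
phi = np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)   # g = standard Gaussian density

for h in (0.5, 0.25, 0.125, 0.0625):
    Kh = np.exp(-(t / h) ** 2 / 2) / (h * np.sqrt(2 * np.pi))   # K_h, notation (2)
    Khg = np.convolve(phi, Kh, mode="same") * dt                # K_h * g on the grid
    err = np.sqrt(np.sum((Khg - phi) ** 2) * dt)                # ||K_h * g - g||_2
    print(f"h = {h:6.4f}: L2 error = {err:.2e}, error / h^2 = {err / h**2:.3f}")
# the ratio stays of the same order (around 0.2 here), consistent with the h^alpha bound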
Propositions 11 and 13 justify the use of kernels in statistics: if a function $g$ to be estimated is smooth enough, we can replace it by $g \star K_h$ for a kernel $K$. The advantage is that $g \star K_h$ is defined by an integral, which can be rewritten as an expectation; we thus get a simple estimator by the moment method. Proposition 11 allows us to upper-bound the bias of such an estimator when a pointwise risk is considered, while Proposition 13 allows us to control the bias for an integrated risk.
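To make this last remark concrete, here is a minimal sketch (not from the notes) of the moment-method estimator it suggests: since $(K_h \star g)(x) = \mathbb{E}[K_h(x - X_1)]$ when $X_1$ has density $g$, replacing the expectation by an empirical mean over a sample $X_1, \dots, X_n$ gives the classical kernel density estimator. The Gaussian kernel, the $N(0, 1)$ sample and the bandwidth $h = 0.2$ below are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
n, h = 10_000, 0.2
X = rng.standard_normal(n)                     # sample X_1, ..., X_n with density g = N(0, 1)

K = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def g_hat(x):
    # moment estimator of (K_h * g)(x) = E[K_h(x - X_1)]
    return np.mean(K((x - X) / h) / h)

for x in (-1.0, 0.0, 1.0):
    true = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    print(f"x = {x:+.1f}: estimate = {g_hat(x):.3f}, true g(x) = {true:.3f}")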
References
Marc Briane and Gilles Pagès. Théorie de l’intégration: licence de mathématiques; cours et
exercices. Vuibert, 2000.
Fabienne Comte. Estimation non-paramétrique. Spartacus IDH, 2015.
Gérard Kerkyacharian, Oleg Lepski, and Dominique Picard. Nonlinear estimation in anisotropic
multi-index denoising. Probab. Theory Related Fields, 121(2):137–170, 2001.
Gilles Lacombe and Pascal Massat. Analyse fonctionnelle: exercices corrigés [2e cycle universitaire-agrégation]. Dunod, 1999.
Sergeĭ Mikhaĭlovich Nikol'skiĭ. Approximation of functions of several variables and imbedding theorems. Springer-Verlag, New York, 1975. Translated from the Russian by John M. Danskin, Jr., Die Grundlehren der Mathematischen Wissenschaften, Band 205.
Alexandre B. Tsybakov. Introduction to nonparametric estimation. Springer Series in Statistics.
Springer, New York, 2009. Revised and extended from the 2004 French original, Translated
by Vladimir Zaiats.