A concentration inequality for kernel regression estimators
Tatiana Sinotina, Silvia Vogel
TU Ilmenau, Germany
MSC classification: 62G08, 90C15
Key Words: Nonparametric regression, Priestley-Chao estimator, Gasser-Müller estimator, stochastic programming
1. Introduction. We consider the regression model

Y_i := m(x_i) + ε_i,   i = 1, 2, ..., n,   (1.1)

where m is an unknown regression function, the x_i are non-random design points from the interval [0, 1], and the ε_i are independent random variables with mean 0 and variance σ². Thus, we consider the case where Y is a random variable and x is deterministic. This model is also called the fixed design regression model, and the design points can be chosen as in a designed experiment. In the absence of other information it is natural to take these points equally spaced, and this is frequently the case. Thus,

x_i := i/n,   i = 1, 2, ..., n.
The main goal of this paper is to estimate the maximum (and its location) of the unknown regression function. The procedure can be divided into three stages. We start with the approximation of the regression function. The estimated function is then maximized, which yields the estimated value and location of the mode. Finally, we assess the "goodness" of this result. In this paper we mostly deal with the last part of the problem: using different kernel estimators, we determine a confidence set for the maximum of the regression function.
In our previous work ([SV1]) we considered the random design regression model, i.e. the model based on the assumption that both X and Y are random and we are given an i.i.d. sample (X_i, Y_i), i = 1, ..., n. The unknown function was approximated by the Nadaraya-Watson kernel estimator. However, other estimators are sometimes preferable, as they are considerably easier to analyze ([MT04]). Moreover, in our problem the application of the Nadaraya-Watson estimator involves additional difficulties, because we have to choose a set of design variables X that allows us to avoid a zero denominator.
This research was supported by Deutsche Forschungsgemeinschaft under grant number VO 790/3.
We will estimate m with two well-known estimators that are commonly used for the fixed design model: the Priestley-Chao kernel estimator

m̂_PC(x) := (1/h_n) Σ_{i=1}^{n} (x_i − x_{i−1}) K( (x − x_i)/h_n ) Y_i,   x ∈ [0, 1],   (1.2)

and the Gasser-Müller kernel estimator

m̂_GM(x) := (1/h_n) Σ_{i=1}^{n} Y_i [ ∫_{s_{i−1}}^{s_i} K( (x − u)/h_n ) du ],   x ∈ [0, 1],   (1.3)

with s_i = (x_i + x_{i+1})/2.
As we deal with kernel estimators, a few words about their parameters, the kernel function K and the bandwidth h_n, are in order. Kernel estimation is well studied and usually gives good results; however, the quality of the estimate can vary significantly with the choice of the kernel function or the bandwidth. The choice of each of these parameters is a problem in itself. There are many publications on methods for making the right choice, and it is not our purpose to develop a new one. However, we would like to recall the main properties of these parameters.
By K we denote a kernel function, which is supposed to be a measurable mapping with compact support, K : [−τ, τ] → R, satisfying the following conditions:

(K1) ∫_{−τ}^{τ} |K(x)| dx < ∞,

(K2) ∫_{−τ}^{τ} K(x) dx = 1,

(K3) K(x) = K(−x) for all x ∈ R, so that ∫_{−τ}^{τ} x^j K(x) dx = 0 for odd j ∈ N,

(K4) sup_{x∈[−τ,τ]} |K(x)| = C < ∞.
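For example, the Epanechnikov kernel K(x) = (3/4)(1 − x²) for x ∈ [−1, 1] (and K(x) = 0 otherwise) satisfies (K1)-(K4) with τ = 1:

∫_{−1}^{1} |K(x)| dx = ∫_{−1}^{1} K(x) dx = (3/4)(2 − 2/3) = 1,   K(−x) = K(x),   sup_{x∈[−1,1]} |K(x)| = K(0) = 3/4.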
Many functions satisfy these conditions; the Epanechnikov kernel above is the most common (and in some cases optimal, see [DL01]) choice. We will not discuss the choice of an optimal kernel for our problem, because the second parameter, the bandwidth, has the greater influence on the approximation quality (for more details see [Si86]). Although kernel estimators are not new, there are still no fully satisfactory procedures for determining the "best" value of h_n. Traditionally one assumes h ≡ h_n → 0 and nh_n → ∞. In practice, however, h_n should be neither too small nor too large. Small values of h_n make the estimate look "wiggly" and show spurious features, whereas large values of h_n lead to an estimate which is too smooth, in the sense that it is too biased and may not reveal structural features. In our previous paper [SV1] we proposed a possible way to determine a bandwidth that is optimal in the sense of this particular problem. The same approach can also be used here, and we will not discuss it again.
Now we assume that the maximum of m is located at some unknown point x_max ∈ [0, 1]. Using each of the two estimators, we estimate the value m(x_max) by

m̂(x̂_max) = max_{x∈[0,1]} m̂(x).

Thus we have an approximate maximum, and we try to assess the quality of this result. There is a wide range of methods for doing so. One of the most common is to construct a suitable interval (or, more generally, a set) that covers the "true" solution at least with a prescribed probability. Such intervals are known as confidence intervals. However, this tool usually requires additional knowledge which is usually not available. In this paper we use another method to derive confidence bounds. It is based on uniform concentration-of-measure results and some additional assumptions about the true model. This means that for each sample size n we derive a random set which covers the parameter to be estimated with the prescribed probability. These random sets are called (strong) universal confidence sets. The idea of this method was first proposed by G. Pflug [Pf03] and was further developed by S. Vogel in [Vog07] and [Vog08].
This paper is organized as follows. In the second section we introduce universal confidence sets and explain the method in general. In Section 3 we present the main result of the paper, and in Section 4 we consider it in more detail: we investigate both the Gasser-Müller and the Priestley-Chao estimator with respect to the convergence properties needed for the derivation of the universal confidence sets, considering a single-valued solution set. In Section 5 we extend our results to the case where the function has multiple peaks.
2. Universal confidence sets (UCS). In this section we state the main definitions and theorems that are necessary for constructing UCS (for more information see [Vog07], [Vog08]).
Let (E, ‖·‖) be a complete separable metric space, χ a compact subset of E, and [Ω, Σ, P] a complete probability space. We assume that a deterministic optimization problem

(P_0)   min_{x∈χ} f_0(x)

is approximated by a sequence of random problems

(P_n)   min_{x∈χ} f_n(x),   n ∈ N.

f_0 : E → R̄¹ is a lower semicontinuous function and the f_n : E × Ω → R̄¹ are lower semicontinuous random functions which are supposed to be (B(E) ⊗ Σ, B̄¹)-measurable. B(E) denotes the Borel σ-field of E and B̄¹ the σ-field of Borel sets of R̄¹. Finally, we assume that all objective functions are (almost surely) proper functions, i.e. functions with values in (−∞, +∞] which are not identically +∞.
By Φ_n we denote the optimal value and by Ψ_n the solution set of the random approximate problems (P_n). Correspondingly, by Φ_0 we denote the optimal value and by Ψ_0 the solution set of the deterministic limit problem (P_0). The solution sets can be written as

Ψ_0 = {x ∈ χ : f_0(x) − Φ_0 ≤ 0},

Ψ_n = {x ∈ χ : f_n(x) − Φ_n ≤ 0},   n ∈ N.
We make use of assertions of the following form:

∀κ > 0 ∀n ∈ N : P{ Ψ_n \ U_{β_{n,κ}} Ψ_0 ≠ ∅ } ≤ H(κ),   (2.1)

∀κ > 0 ∀n ∈ N : P{ Ψ_0 \ U_{β_{n,κ}} Ψ_n ≠ ∅ } ≤ H(κ).   (2.2)
We assume throughout the paper that the sequence (β_{n,κ})_{n∈N} belongs to the class B of non-increasing sequences of positive numbers and that the functions H belong to the class H of non-increasing functions H : R_+ → R_+. Of course, one is interested in small confidence sets; hence (β_{n,κ})_{n∈N} should go to zero as fast as possible, and H(κ) should converge to zero as fast as possible as κ tends to infinity. U_α X denotes the open neighborhood of a set X ⊂ E with radius α: U_α X := {x ∈ E : d(x, X) < α}.
If the sequence (Ψ_n)_{n∈N} fulfills relation (2.1), we call it an inner approximation in probability to Ψ_0 with convergence rate β_{n,κ} and tail behavior function H, or simply an inner (β_{n,κ}, H)-approximation. Correspondingly, if a sequence (Ψ_n)_{n∈N} fulfills relation (2.2), it is an outer approximation in probability to Ψ_0 with convergence rate β_{n,κ} and tail behavior function H, or in short an outer (β_{n,κ}, H)-approximation. Since supersets of outer approximations are again outer approximations, one is especially interested in outer approximations which are also inner approximations.
Unfortunately, under reasonable conditions one can only prove inequality (2.1), which means, roughly speaking, that only a subset of the 'true' solution set Ψ_0 is approximated. However, if Ψ_0 is single-valued and the sets Ψ_n, n ∈ N, are uniformly bounded, inequality (2.1) implies inequality (2.2). The uniform boundedness condition is satisfied here because of the compactness of χ.
Crucial assumptions are uniform concentration-of-measure conditions for the objective functions and conditions on the limit problem which concern the growth of the objective function.
The growth condition will be described by a function µ which belongs to the set Λ := {µ : R_+ → R_+ : µ is increasing, right-continuous, and satisfies µ(0) = 0}. As mentioned above, the constraint set is fixed.
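As a simple illustration (not taken from the paper): if χ = [0, 1] and f_0(x) = (x − x_0)² for some x_0 ∈ [0, 1], then Φ_0 = 0, Ψ_0 = {x_0}, and the growth condition (2.4) used in Theorem 2.1 below holds with µ(κ) = κ², since |x − x_0| ≥ κ implies f_0(x) ≥ κ² = Φ_0 + µ(κ); this µ is increasing, right-continuous, and satisfies µ(0) = 0, hence µ ∈ Λ.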
For the reader's convenience we provide below the most important theorems without proofs. More information can be found in [Vog07], [Vog08].
Theorem 2.1 (Inner Approximation of the Solution Set). Assume that the following conditions are satisfied:

(i) There exist a function H ∈ H and, for all κ > 0, a sequence (β_{n,κ})_{n∈N} ∈ B such that

sup_{n∈N} P{ sup_{x∈χ} |f_n(x) − f_0(x)| ≥ β_{n,κ} } ≤ H(κ)   (2.3)

holds.

(ii) There exists a function µ ∈ Λ such that for all κ > 0

∀x ∈ χ \ U_κ Ψ_0 : f_0(x) ≥ Φ_0 + µ(κ).   (2.4)

Then for all κ > 0 and β̃_{n,κ} := µ^{−1}(2β_{n,κ}) the relation

sup_{n∈N} P( U_{β̃_{n,κ}} Ψ_0 ⊂ χ and Ψ_n \ U_{β̃_{n,κ}} Ψ_0 ≠ ∅ ) ≤ 2H(κ)

holds.
However, if the solution set is not single-valued, one obtains only an inner approximation and thus cannot guarantee that the whole "true" solution set is covered. Nevertheless, an outer approximation can be constructed as well. To that end we consider "relaxed" problems, i.e. we deal with "relaxed" solution sets. They are accurate only up to a small parameter which depends on n and κ and tends to zero for each κ as n → ∞. In other words, we consider a suitable relaxing sequence (ρ_{n,κ})_{n∈N}, which tends to zero for each κ > 0, and consider the ρ_{n,κ}-optimal solutions

Ψ^r_{n,κ} = {x ∈ χ : f_n(x) − Φ_n ≤ ρ_{n,κ}},   n ∈ N.   (2.5)
Theorem 2.2 (Outer Approximation of the Solution Set, Relaxation). Assume that there exist a function H ∈ H and, for all κ > 0, a sequence (β_{n,κ})_{n∈N} ∈ B such that

sup_{n∈N} P{ sup_{x∈χ} |f_n(x) − f_0(x)| ≥ β_{n,κ} } ≤ H(κ)   (2.6)

holds.

Then for all κ > 0 and ρ_{n,κ} = β̃_{n,κ} := 2β_{n,κ} the relation

sup_{n≥n_0(κ)} P{ U_{β̃_{n,κ}} Ψ_0 ⊂ χ and Ψ_0 \ (Ψ^r_{n,κ} ∩ U_{β̃_{n,κ}} χ) ≠ ∅ } ≤ 2H(κ)

holds.
3. Approximation of the regression function. Main results. We observe a sample of pairs (Y_i, x_i), i = 1, 2, ..., n, where the Y_i are independent real-valued random variables and the x_i are deterministic design points with values in [0, 1]. We consider the fixed design regression model (1.1), which is approximated by a kernel estimator of the form (1.2) or (1.3). Our main concern now is the estimation of the peak of the unknown function. This problem can be described as a deterministic optimization problem

(P_0)   max_{x∈χ} m(x),   (3.1)

with m as in (1.1), which is approximated by the sequence of random problems

(P_n)   max_{x∈[0,1]} m̂(x),   (3.2)

where m̂ is as in (1.2) or (1.3).
Assume that the regression function has a unique maximum, so that the solution set is single-valued. Then Theorem 2.1 can be applied, and an outer approximation of this set is obtained as well. In other words, we will find β̃_{n,κ} and H(κ) for the following inequality:

sup_{n∈N} P( U_{β̃_{n,κ}} Ψ_0 ⊂ [0, 1] and Ψ_n \ U_{β̃_{n,κ}} Ψ_0 ≠ ∅ ) ≤ 2H(κ),

with Ψ_0 and Ψ_n the solution sets of problems (3.1) and (3.2), respectively.
However, this result can be obtained only if the parameters in inequality (2.3) are known, so the challenge is to estimate them. The solution of this problem is the most important result of this paper. In Lemma 4.4 we will show that both kernel estimators (m̂_GM and m̂_PC) satisfy an inequality of the form

P( sup_{x∈[0,1]} |m̂(x) − m(x)| ≥ β_{n,κ} ) ≤ H(κ)

with

H(κ) = 2 exp(−α_1 κ²)

and

β_{n,κ} = κ/(√n h_n) + α_2 h_n + K_n.

Here α_1 and α_2 are constants that will be determined later, and K_n is a parameter that tends to zero as n tends to infinity. It is obvious that the main properties of H(κ) and β_{n,κ} described in Section 2 are satisfied. It is worth mentioning that with the estimators (1.2) and (1.3) we are in a better situation than with the Nadaraya-Watson estimator. First, we do not need any additional bounds for the constraint set. Second, the constants α_1 and α_2 are less complicated and thus easier to estimate. The only disadvantage we did not manage to avoid is the boundedness condition for the response variable Y:

sup |Y| < ∞.

However, the application of concentration-of-measure inequalities makes this condition essential.
In the next section we provide the detailed results with proofs.
4. Approximation of the regression function. Proofs.
Our main concern is the estimation of H(κ) and β_{n,κ} in (2.3). A useful approach to such an estimate is based on concentration-of-measure inequalities, such as McDiarmid's inequality, the Dvoretzky-Kiefer-Wolfowitz inequality and others (for more information see [MP07]). Thus, there are several ways to establish (2.3). In this paper we use only McDiarmid's inequality, because its application yields quite a good rate of convergence without complicated proofs. However, this approach has one significant disadvantage: we need a boundedness condition for the response variable Y. So we assume that Y is a.s. contained in some compact interval [−C_Y, C_Y], i.e.

sup |Y| ≤ C_Y < ∞.   (4.1)
We recall McDiarmid's inequality ([DL01]). Let X_1, X_2, ..., X_n be independent random variables, all taking values in a set A. Further, let q : A^n → R be a function of X_1, X_2, ..., X_n that satisfies, for all i and all x_1, ..., x_n, x_i^* ∈ A,

sup_{x_1,...,x_n, x_i^* ∈ A} | q(x_1, ..., x_n) − q(x_1, ..., x_{i−1}, x_i^*, x_{i+1}, ..., x_n) | ≤ c_i.   (4.2)

Then for all ε > 0,

P( |q(X_1, ..., X_n) − E[q(X_1, ..., X_n)]| ≥ ε ) ≤ 2 exp( −2ε² / Σ_{i=1}^{n} c_i² ).
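As a quick numerical illustration (not part of the paper), the bound can be evaluated directly once the bounded-difference constants c_i are known; the values below are arbitrary.

```python
import numpy as np

def mcdiarmid_bound(eps, c):
    """McDiarmid tail bound: P(|q - E q| >= eps) <= 2*exp(-2*eps^2 / sum_i c_i^2)."""
    c = np.asarray(c, dtype=float)
    return 2.0 * np.exp(-2.0 * eps**2 / np.sum(c**2))

# example: n = 100 variables, each with bounded difference c_i = 0.05
print(mcdiarmid_bound(eps=0.5, c=[0.05] * 100))
```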
Let us now consider the triangle inequality. Here and in the following, m̂ denotes either the Priestley-Chao or the Gasser-Müller estimator, i.e. the expressions hold for either of them:

sup_{x∈[0,1]} |m̂(x) − m(x)| ≤ S_1 + S_2   (4.3)

with

S_1 := sup_{x∈[0,1]} |m̂(x) − E(m̂(x))|

and

S_2 := sup_{x∈[0,1]} |E(m̂(x)) − m(x)|.

In the first lemma we apply McDiarmid's inequality to the summand S_1 in order to obtain an estimate of the form (2.3). The second lemma provides an auxiliary result which is used for the estimation of the summand S_2 in the third lemma. After that we put both results together, and in the final theorem we obtain our goal: a universal confidence set for the mode of m.
Lemma 4.1. Assume that condition (4.1) is satisfied. Furthermore, suppose that

sup_{x∈[−τ,τ]} |K(x)| = C_K < ∞.   (4.4)

Then for the kernel regression estimator m̂_GM with bandwidth h_n and kernel K the following inequality holds:

P( sup_{x∈[0,1]} |m̂_GM(x) − E(m̂_GM(x))| ≥ t(κ) ) ≤ H(κ) = 2 exp( −t(κ)² n h_n² / (2 C_Y² C_K²) ).

Furthermore, for the kernel regression estimator m̂_PC with bandwidth h_n and kernel K the following inequality holds:

P( sup_{x∈[0,1]} |m̂_PC(x) − E(m̂_PC(x))| ≥ t(κ) ) ≤ H(κ) = 2 exp( −t(κ)² n h_n² / (2 C_Y² C_K²) ).
Proof. We start with the first part of the assertion, namely with the Gasser-Müller estimator. Set

q(x, X_1, ..., X_n, Y_1, ..., Y_n) := m̂_GM(x, X_1, ..., X_n, Y_1, ..., Y_n).

Since we consider the fixed design model, the x_i, i = 1, ..., n, are constants, namely x_i = i/n. Thus the function q depends only on Y_1, Y_2, ..., Y_n. According to (4.1), the Y_i, i = 1, ..., n, are contained in a compact set A. Further, let Y_i^* ∈ A with Y_i^* ≠ Y_i for some i ∈ {1, ..., n}.

Now we show that the difference in (4.2) is bounded and, consequently, McDiarmid's inequality can be applied. First let us recall the definition of m̂_GM and "simplify" it a little: for each i,

(1/h_n) | ∫_{s_{i−1}}^{s_i} K( (x − u)/h_n ) du | ≤ (1/h_n) C_K ∫_{s_{i−1}}^{s_i} du = C_K (s_i − s_{i−1}) / h_n,

where s_i = (x_i + x_{i+1})/2. Since we consider the fixed design model with x_i = i/n, i = 1, ..., n, the s_i are non-random points in the interval [0, 1], and therefore |s_i − s_{i−1}| = 1/n.

Thus we obtain

sup_{x∈[0,1]} sup_{Y_1,...,Y_n, Y_i^* ∈ A} | m̂_GM(x, Y_1, ..., Y_n) − m̂_GM(x, Y_1, ..., Y_{i−1}, Y_i^*, Y_{i+1}, ..., Y_n) |

≤ sup_{x∈[0,1]} sup_{Y_i, Y_i^* ∈ A} | ( C_K (s_i − s_{i−1}) / h_n ) (Y_i − Y_i^*) |.

Further, applying assumptions (4.1) and (4.4), we have

sup_{x∈[0,1]} sup_{Y_i, Y_i^* ∈ A} | ( C_K (s_i − s_{i−1}) / h_n ) (Y_i − Y_i^*) | ≤ 2 C_K C_Y / (n h_n).

Thus we have obtained

sup_{x∈[0,1]} sup_{Y_1,...,Y_n, Y_i^* ∈ A} | m̂_GM(x, Y_1, ..., Y_n) − m̂_GM(x, Y_1, ..., Y_{i−1}, Y_i^*, Y_{i+1}, ..., Y_n) | ≤ 2 C_K C_Y / (n h_n),

i.e. (4.2) holds with c_i = 2 C_K C_Y / (n h_n), i = 1, ..., n, so that Σ_{i=1}^{n} c_i² = 4 C_K² C_Y² / (n h_n²), and McDiarmid's inequality yields the first claim.
The second part of the assertion, which concerns the Priestley-Chao estimator, can be proved similarly.

The next step is to estimate the second summand in (4.3); in fact, we need an upper bound for the bias. One of the most common approaches uses a Taylor expansion. However, before we start this estimation it is necessary to recall the integral approximation for the expected value of m̂_GM. This transformation was introduced by Gasser and Müller in [GM79]. In our paper we use their result with a few modifications.
Lemma 4.2. Let m be an unknown regression function and m̂_GM its kernel estimate. Furthermore, assume that m is Lipschitz continuous of order γ with constant L. Then E(m̂_GM) can be approximated as follows:

E(m̂_GM(x)) = (1/h_n) ∫_{−τ}^{τ} K(u) m(x − u h_n) du + K_n,   where K_n ≤ L C_K / (n^γ h_n) = O( 1/(n^γ h_n) ).

Proof. This follows from Appendix 1 in [GM79].
The next lemma gives the estimate for S_2 in (4.3).

Lemma 4.3. Assume that condition (4.1) is fulfilled. Furthermore, suppose that the function m and the kernel function K satisfy the following assumptions:

m ∈ C_{2,γ}(R),   (4.5)

∫_{−τ}^{τ} x² K(x) dx = C_T < ∞,   (4.6)

m'' exists and sup |m''| = C_m < ∞.   (4.7)

Then the following inequalities hold:

S_2 = sup_{x∈[0,1]} |E(m̂_GM(x)) − m(x)| ≤ h_n C_T C_m / 2 + K_n

and

S_2 = sup_{x∈[0,1]} |E(m̂_PC(x)) − m(x)| ≤ h_n C_T C_m / 2.
Proof. We show the first inequality. We start by applying the integral approximation introduced in the previous lemma:

sup_{x∈[0,1]} |E(m̂_GM(x)) − m(x)| = sup_{x∈[0,1]} | (1/h_n) ∫_{−τ}^{τ} K(u) m(x − u h_n) du + K_n − m(x) |.

We use a second-order Taylor expansion of m around x with the remainder in Lagrange form (ξ between x and x − u h_n):

m(x − u h_n) = m(x) − (u h_n) m'(x) + (1/2)(u h_n)² m''(ξ).

Further,

S_2 ≤ sup_{x∈[0,1]} | (1/h_n) ∫_{−τ}^{τ} K(u) ( m(x − u h_n) − m(x) ) du | + K_n

= sup_{x∈[0,1]} | (1/h_n) ∫_{−τ}^{τ} K(u) ( m(x) − (u h_n) m'(x) + (1/2)(u h_n)² m''(ξ) − m(x) ) du | + K_n

= sup_{x∈[0,1]} | (1/h_n) ∫_{−τ}^{τ} ( −K(u)(u h_n) m'(x) + (1/2) K(u)(u h_n)² m''(ξ) ) du | + K_n

≤ sup_{x∈[0,1]} | (1/h_n) ∫_{−τ}^{τ} (1/2) K(u)(u h_n)² m''(ξ) du | + K_n

≤ (h_n C_m / 2) sup_{x∈[0,1]} | ∫_{−τ}^{τ} u² K(u) du | + K_n ≤ h_n C_T C_m / 2 + K_n,

where the term containing m'(x) vanishes by the symmetry condition (K3). The proof of the second inequality, for the Priestley-Chao estimator, is similar.

Now we have all the information we need about the parameters in (2.3).
Lemma 4.4. Assume that the assumptions (4.1) and (4.4)-(4.7) are satisfied. Then for the kernel regression estimator m̂_GM with bandwidth h_n and kernel K the following inequality holds:

P( sup_{x∈[0,1]} |m̂_GM(x) − m(x)| ≥ β_{n,κ} ) ≤ H(κ),

where

β_{n,κ} = κ/(√n h_n) + h_n C_T C_m / 2 + K_n,   K_n ≤ L C_K / (n^γ h_n),

and

H(κ) = 2 exp( −κ² / (2 C_K² C_Y²) ).

Furthermore, for the kernel regression estimator m̂_PC with bandwidth h_n and kernel K the following result holds:

P( sup_{x∈[0,1]} |m̂_PC(x) − m(x)| ≥ β_{n,κ} ) ≤ H(κ),

where

β_{n,κ} = κ/(√n h_n) + h_n C_T C_m / 2

and

H(κ) = 2 exp( −κ² / (2 C_K² C_Y²) ).
Proof. We show the result for the Gasser-Müller kernel estimator; the proof for the Priestley-Chao estimator is the same.

We apply the triangle inequality as in (4.3) together with Lemma 4.3:

sup_{n∈N} P( sup_{x∈[0,1]} |m̂_GM(x) − m(x)| ≥ β_{n,κ} ) ≤ sup_{n∈N} P( S_1 + S_2 ≥ β_{n,κ} )

≤ sup_{n∈N} P( S_1 + h_n C_T C_m / 2 + K_n ≥ β_{n,κ} )

= sup_{n∈N} P( sup_{x∈[0,1]} |m̂_GM(x) − E(m̂_GM(x))| ≥ κ/(√n h_n) ) ≤ 2 exp( −κ² / (2 C_K² C_Y²) ),

where the last inequality is the result of Lemma 4.1 with t(κ) = κ/(√n h_n). This completes the proof.
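To make the quantities in Lemma 4.4 concrete, the following sketch evaluates β_{n,κ} and H(κ) for given constants; all numerical values below are hypothetical placeholders (they are not prescribed by the paper) and merely illustrate how the bound scales.

```python
import numpy as np

def beta_n_kappa(kappa, n, h, C_T, C_m, K_n=0.0):
    """beta_{n,kappa} = kappa/(sqrt(n)*h_n) + h_n*C_T*C_m/2 + K_n (Lemma 4.4; K_n = 0 for Priestley-Chao)."""
    return kappa / (np.sqrt(n) * h) + h * C_T * C_m / 2.0 + K_n

def H(kappa, C_K, C_Y):
    """Tail function H(kappa) = 2*exp(-kappa^2/(2*C_K^2*C_Y^2))."""
    return 2.0 * np.exp(-kappa**2 / (2.0 * C_K**2 * C_Y**2))

# hypothetical constants, chosen only to make the formulas concrete
n, h = 500, 0.1
C_K, C_Y = 0.75, 2.0       # sup of the Epanechnikov kernel, assumed bound on |Y|
C_T, C_m = 0.2, 4.0        # int u^2 K(u) du for the Epanechnikov kernel, assumed bound on |m''|
L_const, gamma = 1.0, 1.0  # assumed Lipschitz data of m, giving K_n <= L*C_K/(n^gamma * h)
K_n = L_const * C_K / (n**gamma * h)
kappa = 3.0
print(beta_n_kappa(kappa, n, h, C_T, C_m, K_n), H(kappa, C_K, C_Y))
```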
Finally, we can construct a universal confidence set for the mode of the regression model. In the following theorem we use an additional condition: knowledge of the so-called growth function µ, which relates to the true function m and guarantees the existence of the extreme point. However, we have no knowledge about m and can therefore only estimate the corresponding growth function. This question was studied in detail in [Vog08]. In this article we simply assume that a growth function µ exists.
Theorem 4.5. Assume that the unknown regression function is approximated by the Gasser-Müller kernel estimator and that the conditions (4.1) and (4.4)-(4.7) are satisfied. Furthermore, assume that there exists a function µ ∈ Λ such that

∀κ > 0 ∀x ∈ [0, 1] \ U_κ Ψ_0 : −m(x) ≥ Φ_0 + µ(κ).

Then the confidence set Ψ_n for the solution set Ψ_0 of problem (3.1) can be derived as follows:

sup_{n∈N} P( U_{β̃_{n,κ}} Ψ_0 ⊂ [0, 1] and Ψ_n \ U_{β̃_{n,κ}} Ψ_0 ≠ ∅ ) ≤ 4 exp( −κ² / (2 C_K² C_Y²) )

with

β̃_{n,κ} = µ^{−1}( 2( κ/(√n h_n) + h_n C_T C_m / 2 + K_n ) ).

Moreover, under the above assumptions, for the Priestley-Chao estimator we obtain the following confidence set Ψ_n:

sup_{n∈N} P( U_{β̃_{n,κ}} Ψ_0 ⊂ [0, 1] and Ψ_n \ U_{β̃_{n,κ}} Ψ_0 ≠ ∅ ) ≤ 4 exp( −κ² / (8 C_K² C_Y²) )

with

β̃_{n,κ} = µ^{−1}( 2( κ/(√n h_n) + h_n C_T C_m / 2 ) ).
Proof. This theorem follows from Theorem 2.1 and Lemma 4.4.
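As an illustration of how Theorem 4.5 would be used (a sketch under assumed quantities, not a procedure given in the paper), one can maximize the kernel estimate on a grid and report the neighborhood U_{β̃_{n,κ}}(x̂_max) ∩ [0, 1] as the confidence set. The sketch assumes a quadratic growth function µ(t) = c t², so that µ^{−1}(s) = √(s/c), and hypothetical constants; for moderate n the resulting set can be conservative (possibly all of [0, 1]).

```python
import numpy as np

def confidence_radius(kappa, n, h, C_T, C_m, K_n, c_growth):
    """beta_tilde = mu^{-1}(2*beta_{n,kappa}) for the assumed growth function mu(t) = c_growth*t^2."""
    beta = kappa / (np.sqrt(n) * h) + h * C_T * C_m / 2.0 + K_n
    return np.sqrt(2.0 * beta / c_growth)

rng = np.random.default_rng(1)
n, h = 500, 0.1
xs = np.arange(1, n + 1) / n
ys = np.sin(np.pi * xs) + 0.2 * rng.standard_normal(n)   # toy data, true mode at x = 0.5

def m_hat_pc(x):
    """Priestley-Chao estimate (1.2) with the Epanechnikov kernel and spacing 1/n."""
    u = (x - xs) / h
    return np.sum(0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0) * ys) / (n * h)

grid = np.linspace(0.0, 1.0, 1001)
x_hat = grid[np.argmax([m_hat_pc(x) for x in grid])]      # estimated mode location

# hypothetical constants: C_m = sup|m''| for sin(pi*x), K_n = 0 for Priestley-Chao, mu(t) = t^2
r = confidence_radius(kappa=2.0, n=n, h=h, C_T=0.2, C_m=np.pi**2, K_n=0.0, c_growth=1.0)
print("estimated mode:", x_hat, "confidence set:", (max(0.0, x_hat - r), min(1.0, x_hat + r)))
```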
5. Approximation of the multimodal regression function. In the previous section we dealt with problem (3.1) and obtained a confidence set for its solution. Unfortunately, the assumption of a unique mode is often too restrictive, so we have to deal with the fact that the regression function can be multimodal, i.e. have several peaks. This immediately leads to the problem that only an inner approximation of the solution set can be obtained directly as in Section 4. In the general case an outer approximation is not possible, and we have to deal with the "relaxed" problems, or in our case with the "relaxed" (ρ_{n,κ}-optimal) solution sets Ψ^r_{n,κ} as in (2.5).
Theorem 2.2 provides a criterion that makes it possible to construct a universal confidence set in this case. Obviously, condition (2.6) is identical to condition (2.3), and Lemma 4.4 yields all the required information about it. Thus, we obtain an outer approximation without any further calculations.
Theorem 5.1. Assume that the unknown regression function is approximated by the Gasser-Müller kernel estimator and that the conditions (4.1) and (4.4)-(4.7) are satisfied. Then for all κ > 0 and ρ_{n,κ} = β̃_{n,κ} = 2β_{n,κ} the solution set Ψ_0 of problem (3.1) can be approximated by Ψ^r_{n,κ} as follows:

sup_{n≥n_0(κ)} P{ U_{β̃_{n,κ}} Ψ_0 ⊂ [0, 1] and Ψ_0 \ (Ψ^r_{n,κ} ∩ U_{β̃_{n,κ}} χ) ≠ ∅ } ≤ 4 exp( −κ² / (8 C_K² C_Y²) )

with β̃_{n,κ} = 2( κ/(√n h_n) + h_n C_T C_m / 2 + K_n ).

Moreover, under the same hypotheses, for the Priestley-Chao estimator we obtain the following confidence set Ψ^r_{n,κ}:

sup_{n≥n_0(κ)} P{ U_{β̃_{n,κ}} Ψ_0 ⊂ [0, 1] and Ψ_0 \ (Ψ^r_{n,κ} ∩ U_{β̃_{n,κ}} χ) ≠ ∅ } ≤ 4 exp( −κ² / (8 C_K² C_Y²) )

with β̃_{n,κ} = 2( κ/(√n h_n) + h_n C_T C_m / 2 ).

Proof. This theorem follows from Theorem 2.2 and Lemma 4.4.
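Finally, for the multimodal case the relaxed solution set Ψ^r_{n,κ} of the approximate problem can be read off a grid by keeping every point whose estimated value lies within ρ_{n,κ} of the estimated maximum. The sketch below is again only an illustration: m_hat stands for any kernel estimate of the form (1.2) or (1.3), and the value of rho would in practice be the quantity ρ_{n,κ} = 2β_{n,κ} from Theorem 5.1.

```python
import numpy as np

def relaxed_solution_set(m_hat, rho, grid):
    """Grid approximation of the rho-optimal set {x : m_hat(x) >= max m_hat - rho}, cf. (2.5)."""
    values = np.array([m_hat(x) for x in grid])
    return grid[values >= values.max() - rho]

# toy bimodal "estimate" and an arbitrary relaxation level
m_hat = lambda x: np.sin(2 * np.pi * x) + 0.8 * np.sin(4 * np.pi * x)
grid = np.linspace(0.0, 1.0, 1001)
print(relaxed_solution_set(m_hat, rho=0.1, grid=grid))
```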
References:
[DL01] Devroye, L. and Lugosi, G.: Combinatorial Methods in Density Estimation. Springer, 2001.
[HW91] Härdle, W.: Applied Nonparametric Regression. Cambridge University Press, 1991.
[GM79] Gasser, T. and Müller, H.-G.: Kernel estimation of regression functions. In: Smoothing Techniques for Curve Estimation, Lecture Notes in Mathematics, pp. 23-68, 1979.
[MP07] Massart, P.: Concentration Inequalities and Model Selection. Ecole d'Eté de Probabilités de Saint-Flour XXXIII, 2003. Springer, 2007.
[MT04] Mackenzie, M. and Tieu, K.: Asymmetric kernel regression. IEEE Transactions on Neural Networks, pp. 276-282, March 2004.
[Pf03] Pflug, G. Ch.: Stochastic Optimization and Statistical Inference. In: Stochastic Programming, Handbooks in Operations Research and Management Science, pp. 427-482, 2003.
[Si86] Silverman, B. W.: Density Estimation for Statistics and Data Analysis. Chapman and Hall, 1986.
[SV1] Sinotina, T. and Vogel, S.: Universal confidence sets for the mode of a regression function. Preprint.
[Str94] Stroock, D. W.: A Concise Introduction to the Theory of Integration. Birkhäuser, 1994.
[Vog07] Vogel, S.: Universal Confidence Sets for Solutions of Optimization Problems. SIAM Journal on Optimization 19 (2008), 3, 1467-1488.
[Vog08] Vogel, S.: Confidence Sets and Convergence of Random Functions. In: Festschrift in Celebration of Prof. Dr. Wilfried Grecksch's 60th Birthday, C. Tammer and F. Heyde (eds.), Shaker-Verlag, 2008.