THE k-NEAREST NEIGHBORS ESTIMATION
OF THE CONDITIONAL MODE FOR FUNCTIONAL DATA
MOHAMMED KADI ATTOUCH and WAHIBA BOUABÇA
Communicated by Dan Crişan
The aim of this article is to study the nonparametric estimation of the conditional
mode by the k-nearest neighbors (k-NN) method for functional explanatory variables.
We give the rate of almost complete convergence of the k-NN kernel estimator of the conditional mode. A real data application illustrates
the effectiveness of this estimation method compared with the classical kernel
estimator.
AMS 2010 Subject Classification: 62G05, 62G08, 62G20, 62G35.
Key words: functional data, the conditional mode, nonparametric regression,
k-NN estimator, rate of convergence, random bandwidth.
1. INTRODUCTION
The conditional mode, owing to its importance in nonparametric forecasting, has motivated many researchers to investigate mode
estimators; it constitutes an alternative to the estimation of the conditional
regression (see Ould-Saïd [24] for more discussion and examples).
In finite-dimensional spaces, there exists an extensive bibliography for both the
independent and the dependent data cases. In the independent case, the strong
consistency and asymptotic normality of the kernel estimator
of the conditional mode were given by Samanta and Thavaneswaran [26]. In
the dependent case, the strong consistency of the conditional mode estimator
was obtained by Collomb et al. [6] and Ould-Saïd [24] under strong mixing
conditions (see Ezzahrioui and Ould-Saïd [14] for an extensive literature review).
In infinite-dimensional spaces, the almost complete convergence of the conditional mode estimator was obtained by Ferraty et al. [18]; Ezzahrioui and Ould-Saïd
[15] established the asymptotic normality of nonparametric estimators of the
conditional mode for both independent and dependent functional data. The
consistency in Lp-norm of the conditional mode estimator is given in
Dabo-Niang and Laksaci [10].
Estimating the conditional mode is directly linked to density estimation,
and for the latter the bandwidth selection is extremely important for the
performance of the estimate. The bandwidth must not be too large, so as to
prevent over-smoothing (substantial bias), and must not be too small either,
so as to prevent under-smoothing, where the estimate follows the noise and hides the underlying structure. In nonparametric curve estimation in particular, the smoothing parameter is critical for performance.
Based on this fact, this work deals with nonparametric estimation
by the k-nearest neighbors (k-NN) method. More precisely, we consider a kernel
estimator of the conditional mode built from a local window taking into
account the exact k nearest neighbors, with a real response variable Y and functional curves X.
The k-nearest neighbor (k-NN) estimator is a weighted average of the response variables in a neighborhood of x. The literature on
k-NN estimation dates back to Royall [25] and Stone [27] and has received
continuous development: Mack [22] derived the rates of convergence for
the bias and variance as well as asymptotic normality in the multivariate case; Collomb [5] studied different types of convergence (in probability, almost sure (a.s.)
and almost complete (a.co.)) of the estimator of the regression function; Devroye [12] obtained strong consistency and uniform convergence. For
functional data, the k-NN kernel estimate was first introduced in
the monograph of Ferraty and Vieu [17]; Burba et al. [4] obtained the rate of
almost complete convergence of the regression function estimator using the k-NN method
for independent data, and Attouch and Benchikh [1] established the asymptotic
normality of a robust nonparametric k-NN regression estimator.
The paper is organized as follows: Section 2 presents the k-NN
estimator of the conditional mode. Section 3 gathers the technical tools, and
Section 4 states the hypotheses. Section 5 contains our main results, the almost complete convergence and its rate, and in Section 6 we illustrate the
effectiveness of the k-NN estimation method on real data. All proofs are given
in the Appendix.
2. MODELS AND ESTIMATORS
Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be $n$ independent pairs, identically distributed
as $(X, Y)$, a random pair valued in $\mathcal{F} \times \mathbb{R}$, where $\mathcal{F}$ is a semi-metric
space and $d(\cdot,\cdot)$ denotes the semi-metric. We do not suppose the existence of a
density for the functional random variable (f.r.v.) $X$.

Throughout the paper, when no confusion is possible, we denote
by $c$, $c'$ or $C_x$ generic constants in $\mathbb{R}^+$; for any real
function, an integer in brackets as an exponent, such as $f^{(j)}$, denotes the derivative of the
corresponding order. From the sample of independent pairs $(X_i, Y_i)$, each having
the same distribution as $(X, Y)$, our aim is to build nonparametric estimates
of several functions related to the conditional probability distribution of $Y$
given $X$. For $x \in \mathcal{F}$ we denote the conditional cdf of $Y$ given $X = x$ by:

$$\forall y \in \mathbb{R}, \qquad F^x(y) = \mathbb{P}(Y \le y \mid X = x).$$

If this distribution is absolutely continuous with respect to Lebesgue
measure on $\mathbb{R}$, we denote by $f^x$ (resp. $f^{x(j)}$) the conditional density
(resp. its $j$-th order derivative) of $Y$ given $X = x$. We will give almost
complete convergence results (with rates) for nonparametric estimates of $f^{x(j)}$;
the convergence of the conditional density estimate follows immediately
from the general results concerning $f^{x(j)}$.

In the following, $x$ is a fixed point in $\mathcal{F}$, $N_x$ denotes a fixed neighborhood of $x$, $S$ is a fixed compact subset of $\mathbb{R}$, and we use the notation
$B(x, h) = \{x' \in \mathcal{F} : d(x, x') < h\}$.
The k-NN estimator of the conditional cdf $F^x$ is defined by:

$$\widehat{F}^x_{kNN}(y) = \frac{\sum_{i=1}^{n} K\big(H_{n,k}^{-1}\, d(x, X_i)\big)\, G\big(h_G^{-1}(y - Y_i)\big)}{\sum_{i=1}^{n} K\big(H_{n,k}^{-1}\, d(x, X_i)\big)}.$$
Now, we focus on the estimation of the $j$-th order derivative of the conditional density, $f^{x(j)}$, at $x$ by the k nearest neighbors (k-NN) method, where $\widehat{f}^{x(j)}_{kNN}(y)$ is
defined by:

$$(1)\qquad \widehat{f}^{x(j)}_{kNN}(y) = \frac{h_G^{-j-1} \sum_{i=1}^{n} K\big(H_{n,k}^{-1}\, d(x, X_i)\big)\, G^{(j+1)}\big(h_G^{-1}(y - Y_i)\big)}{\sum_{i=1}^{n} K\big(H_{n,k}^{-1}\, d(x, X_i)\big)},$$
where $K$ is a kernel function and $H_{n,k}(\cdot)$ is defined as follows:

$$H_{n,k}(x) = \min\Big\{ h \in \mathbb{R}^{+} : \sum_{i=1}^{n} \mathbb{1}_{B(x,h)}(X_i) = k \Big\}.$$

$H_{n,k}$ is a positive random variable (r.v.) which depends on $(X_1, \ldots, X_n)$, $G$
is a cdf, and $h_G = h_{G,n}$ is a sequence of positive real numbers.
We propose the estimate $\widehat{\theta}_{kNN}$ of the conditional mode $\theta$ (assumed to be uniquely
defined on the compact set $S$), defined below in (2). Our estimate is based on
the functional conditional density estimate (1):

$$(2)\qquad \widehat{f}^x_{kNN}(\widehat{\theta}_{kNN}) = \sup_{y \in S} \widehat{f}^x_{kNN}(y).$$
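In practice, the estimator (1)–(2) with $j = 0$ takes only a few lines of code. The following Python sketch is our own illustration (the function names, the grid search over $y$, and the kernels, which anticipate the choices made in Section 6, are not prescribed by the paper): it computes $H_{n,k}(x)$ as the distance to the $k$-th nearest curve and maximizes the resulting density estimate over a grid.

```python
import numpy as np

def knn_conditional_mode(x0, X, Y, k, h_G, y_grid, d):
    """k-NN estimate of the conditional mode of Y given X = x0.

    X: array of n discretized curves, Y: (n,) responses,
    d: semi-metric between two curves, y_grid: candidate mode values."""
    dists = np.array([d(x0, Xi) for Xi in X])
    H_nk = np.sort(dists)[k - 1]                 # k-NN bandwidth H_{n,k}(x0)
    K = lambda u: 1.5 * (1 - u**2) * ((u >= 0) & (u < 1))       # quadratic kernel
    G1 = lambda u: 3.75 * u**2 * (1 - u**2) * (np.abs(u) <= 1)  # G'(u)
    w = K(dists / H_nk)                          # only nearby curves weigh in
    # conditional density estimate (1) with j = 0, evaluated on the grid
    f_hat = np.array([np.sum(w * G1((y - Y) / h_G)) for y in y_grid]) / (h_G * np.sum(w))
    return y_grid[np.argmax(f_hat)]              # mode estimate (2)
```

Only curves strictly closer than the $k$-th neighbor receive a nonzero weight here, since this $K$ is supported on $[0, 1)$.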
Note that the estimate $\widehat{\theta}_{kNN}$ is not necessarily unique; if this is the
case, all the remainder of the paper concerns any value $\widehat{\theta}_{kNN}$ satisfying (2).
To prove the almost complete convergence and the rate of almost complete
convergence of the k-NN conditional mode estimate, we need some results on the density estimate of Ferraty et al. [19]. This
density estimate is given by

$$\widehat{f}^{x(j)}(y) = \frac{h_G^{-j-1} \sum_{i=1}^{n} K\big(h_K^{-1}\, d(x, X_i)\big)\, G^{(j+1)}\big(h_G^{-1}(y - Y_i)\big)}{\sum_{i=1}^{n} K\big(h_K^{-1}\, d(x, X_i)\big)},$$

and the corresponding conditional mode estimator by

$$(3)\qquad \widehat{f}^x(\widehat{\theta}_{mode}) = \sup_{y \in S} \widehat{f}^x(y).$$

In this case the bandwidth $h := h_n$ is non-random (a sequence of positive real
numbers which goes to zero as $n$ goes to infinity). The random feature of the
k-NN bandwidth is both its main quality and its major disadvantage. Indeed, the fact that $H_{n,k}(x)$ is a r.v. creates technical difficulties in the
proofs, because we cannot use the same tools as in the standard kernel method;
but the randomness of $H_{n,k}(x)$ allows one to define a neighborhood adapted to $x$
and to respect the local structure of the data.
We will consider two kinds of nonparametric models. The first one, called
the “continuity-type” model, is defined as:

$$(4)\qquad f \in C^0_E = \Big\{ f : \mathcal{F} \to \mathbb{R} : \lim_{d(x,x') \to 0} f(x') = f(x) \Big\},$$

and will yield pointwise consistency results. The second one, called the “Lipschitz-type” model, assumes the existence of an $\alpha > 0$ such that

$$(5)\qquad f \in Lip_{E,\alpha} = \big\{ f : \mathcal{F} \to \mathbb{R} : \exists C > 0,\ \forall x' \in E,\ |f(x') - f(x)| < C\, d(x, x')^{\alpha} \big\},$$

and will allow us to obtain rates of convergence.
3. TECHNICAL TOOLS
The first difficulty arises because the window $H_{n,k}(x)$ is random: the numerator and the denominator of $\widehat{f}^x_{kNN}(y)$ are therefore not sums of
independent variables. To resolve this problem, the idea is to bracket $H_{n,k}(x)$
between two non-random windows. More generally, these technical tools can be
useful whenever one has to deal with random bandwidths.
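The bracketing can be visualized with a small simulation, entirely ours and given only for illustration: take $X$ uniform on $[0,1]^2$, $x$ at the center (so that $\varphi_x(h) = \pi h^2$ for small $h$), and deterministic radii $D_n^\pm$ defined through $\varphi_x(D_n^-) = \sqrt{\beta}\, k/n$ and $\varphi_x(D_n^+) = (1/\sqrt{\beta})\, k/n$, as in the Appendix. The random k-NN radius $H_{n,k}(x)$ then falls between them in nearly every replication:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, beta = 2000, 100, 0.5
x = np.array([0.5, 0.5])

# deterministic bracketing radii: phi_x(h) = pi * h^2, so h = sqrt(p / pi)
d_minus = np.sqrt(np.sqrt(beta) * k / n / np.pi)
d_plus = np.sqrt(k / (n * np.sqrt(beta)) / np.pi)

inside = 0
for _ in range(500):
    X = rng.random((n, 2))
    H_nk = np.sort(np.linalg.norm(X - x, axis=1))[k - 1]  # random k-NN radius
    inside += (d_minus <= H_nk <= d_plus)
print(inside / 500)  # typically very close to 1
```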
Lemma 1 (Burba et al. [4]). Let $(A_i, B_i)_{i=1,\ldots,n}$ be $n$ independent, identically distributed random pairs valued in $\mathcal{F} \times \mathbb{R}^+$, where $(\mathcal{F}, \alpha)$ is a
generic measurable space, and let $W : \mathbb{R} \times \mathcal{F} \to \mathbb{R}$ be a measurable function such
that:

$(L_0)$ $\forall z \in \mathcal{F},\ \forall t, t' \in \mathbb{R}:\quad t \le t' \Rightarrow W(t, z) \le W(t', z).$

For every $n \in \mathbb{N}^*$ and any real r.v. $T$, put:

$$C_n(T) = \sup_{y \in S} \frac{\sum_{i=1}^{n} W(T, A_i)\, h_G^{-1}\, G^{(1)}\big(h_G^{-1}(y - B_i)\big)}{\sum_{i=1}^{n} W(T, A_i)}.$$

Let $(D_n)_{n \in \mathbb{N}}$ be a sequence of real random variables. If, for any $\beta \in\, ]0, 1[$, there
exist two sequences of r.r.v. $D_n^-(\beta)$ and $D_n^+(\beta)$ which verify:

$(L_1)$ $\forall n \in \mathbb{N}^*:\ D_n^- \le D_n^+$ and $\mathbb{1}_{\{D_n^- \le D_n \le D_n^+\}} \xrightarrow{a.co.} 1$,

$(L_2)$ $\displaystyle \frac{\sum_{i=1}^{n} W(D_n^-, A_i)}{\sum_{i=1}^{n} W(D_n^+, A_i)} \xrightarrow{a.co.} \beta$,

$(L_3)$ $\exists c > 0$ such that $C_n(D_n^-) \xrightarrow{a.co.} c$ and $C_n(D_n^+) \xrightarrow{a.co.} c$,

then we have:

$$C_n(D_n) \xrightarrow{a.co.} c.$$
Lemma 2 (Burba et al. [4]). Let $(A_i, B_i)_{i=1,\ldots,n}$, $W$ and $C_n(T)$ be as in Lemma 1, with $W$ satisfying $(L_0)$. Let $(D_n)_{n \in \mathbb{N}}$ be a sequence of r.r.v. and $(v_n)_{n \in \mathbb{N}}$ a positive decreasing
sequence.

1. If $l = \lim v_n \neq 0$, and if for every increasing sequence $\beta_n \in\, ]0, 1[$ there
exist two sequences of r.r.v. $D_n^-(\beta_n)$ and $D_n^+(\beta_n)$ which verify:

$(L_1)$ $\forall n \in \mathbb{N}^*:\ D_n^- \le D_n^+$ and $\mathbb{1}_{\{D_n^- \le D_n \le D_n^+\}} \xrightarrow{a.co.} 1$,

$(L'_2)$ $\displaystyle \frac{\sum_{i=1}^{n} W(D_n^-, A_i)}{\sum_{i=1}^{n} W(D_n^+, A_i)} - \beta_n = o_{a.co.}(v_n)$,

$(L'_3)$ $\exists c > 0$ such that $C_n(D_n^-) - c = o_{a.co.}(v_n)$ and $C_n(D_n^+) - c = o_{a.co.}(v_n)$;

2. or if $l = 0$, and if for every increasing sequence $\beta_n \in\, ]0, 1[$ with limit
1, $(L_1)$, $(L'_2)$ and $(L'_3)$ are verified;

then we have:

$$C_n(D_n) - c = o_{a.co.}(v_n).$$
In their consistency proof of the k-NN kernel estimate for independent data,
Burba et al. [4] use a Chernoff-type exponential inequality to check condition
$(L_1)$. In the case of the k-NN conditional mode, we can use the same
exponential inequality.
Proposition 1 (Burba et al. [4]). Let $X_1, \ldots, X_n$ be independent random
variables valued in $\{0, 1\}$ with $\mathbb{P}(X_i = 1) = p_i$. Put $X = \sum_{i=1}^{n} X_i$ and
$\mu = \mathbb{E}[X]$. Then, for all $\delta > 0$:

$$1.\quad \mathbb{P}\big(X > (1 + \delta)\mu\big) < \left( \frac{e^{\delta}}{(1 + \delta)^{1+\delta}} \right)^{\mu}.$$

$$2.\quad \mathbb{P}\big(X < (1 - \delta)\mu\big) < e^{-\delta^2 \mu / 2}.$$
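As a quick sanity check (our own illustration, not part of the original argument), both bounds can be compared with Monte Carlo estimates for a sum of Bernoulli variables:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, delta = 500, 0.1, 0.5
mu = n * p
X = rng.binomial(1, p, size=(20000, n)).sum(axis=1)  # 20000 replicates of X

print(np.mean(X > (1 + delta) * mu),                             # empirical tail
      "<=", (np.exp(delta) / (1 + delta) ** (1 + delta)) ** mu)  # bound 1
print(np.mean(X < (1 - delta) * mu),                             # empirical tail
      "<=", np.exp(-delta**2 * mu / 2))                          # bound 2
```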
Lemma 3 (Ferraty et al. [18]). Under the hypotheses (H1) and (H12) we
have:

$$\frac{1}{n\, \varphi_x(h_K)} \sum_{i=1}^{n} K\left( \frac{d(x, X_i)}{h_K} \right) \xrightarrow{a.co.} 1.$$
4. HYPOTHESES AND RESULTS
We need the following hypotheses gathered together for easy reference:
(H1) There exists a nonnegative differentiable function $\varphi$, strictly increasing, such that:

$$\varphi_x(h) = \mathbb{P}\big(X \in B(x, h)\big) > 0.$$

(H2) $\forall (y_1, y_2) \in S \times S$, $\forall (x_1, x_2) \in N_x \times N_x$:

$$|F^{x_1}(y_1) - F^{x_2}(y_2)| \le C_x \big( d(x_1, x_2)^{b_1} + |y_1 - y_2|^{b_2} \big).$$

(H3) For some $j \ge 0$: $\forall (y_1, y_2) \in S \times S$, $\forall (x_1, x_2) \in N_x \times N_x$:

$$|f^{x_1 (j)}(y_1) - f^{x_2 (j)}(y_2)| \le C_x \big( d(x_1, x_2)^{b_1} + |y_1 - y_2|^{b_2} \big).$$

(H4) $\forall (y_1, y_2) \in \mathbb{R}^2$: $|G(y_1) - G(y_2)| \le C |y_1 - y_2|$ and $\int |t|^{b_2}\, G^{(1)}(t)\, dt < \infty$.

(H5) $K$ is a function with support $(0, 1)$ such that $0 < C_1 < K(t) < C_2 < \infty$.

(H6) The kernel $G$ is a positive function supported on $[0, 1]$. Its derivative
$G'$ exists and is such that there exist two constants $C_3$ and $C_4$ with
$-\infty < C_3 < G'(t) < C_4 < 0$ for $0 \le t \le 1$.

(H7) $\lim_{n \to \infty} h_G = 0$ with $\lim_{n \to \infty} n^{\alpha} h_G = \infty$ for some $\alpha > 0$.

(H8) $\forall (y_1, y_2) \in \mathbb{R}^2$: $|G^{(j+1)}(y_1) - G^{(j+1)}(y_2)| \le C |y_1 - y_2|$;
$\exists \nu > 0$, $\forall j'$ with $0 \le j' \le j + 1$: $\lim_{y \to \infty} |y|^{1+\nu}\, |G^{(j'+1)}(y)| = 0$;
and $G^{(j+1)}$ is bounded.

(H9) $\exists \xi > 0$ such that $f^x$ is increasing on $(\theta - \xi, \theta)$ and decreasing on $(\theta, \theta + \xi)$.

(H10) $f^x$ is $j$-times continuously differentiable with respect to $y$ on $(\theta - \xi, \theta + \xi)$.

(H11) $f^{x(l)}(\theta) = 0$ if $1 \le l < j$, and $f^{x(j)}(\theta) > 0$.

(H12) The sequence of positive real numbers $k_n = k$ satisfies:

$$\frac{k}{n} \to 0, \qquad \frac{\log n}{k} \to 0 \qquad \text{and} \qquad \frac{\log n}{k\, h_G^{2j+1}} \to 0 \quad \text{as } n \to \infty.$$
Comments on the hypotheses:
1. (H1) is an assumption commonly used for the explanatory variable X
(see Ferraty and Vieu [17] for more details and examples).
2. Hypotheses (H2)–(H3) characterize the structural functional space of
our model and are needed to evaluate the bias term in our asymptotic
properties.
3. (H5) gives complementary assumptions on the kernel function K, with a
focus on discontinuous kernels.
4. Hypotheses (H6)–(H8) are the same as those used in Ferraty et al. [18].
5. The convergence of the estimator can be obtained under the minimal
assumption (H9).
6. (H10) and (H11) are classical hypotheses in functional estimation, in
finite- or infinite-dimensional spaces.
7. (H12) collects the technical conditions imposed for the brevity of the proofs.
5. ALMOST COMPLETE CONVERGENCE
AND ALMOST COMPLETE CONVERGENCE RATE
In order to link the existing literature with this work, we first present the
consistency result for the conditional mode estimator given in Ferraty et al. [18].
Theorem 1 (Ferraty et al. [18]). Under the model (4), suppose that hypotheses (H3) and (H8)–(H12) are verified for $j = 0$; then, if (H1), (H4)–(H5),
(H7) and (H9)–(H11) hold, we have

$$\lim_{n \to \infty} \widehat{\theta}_{mode}(x) = \theta(x) \qquad \text{almost completely (a.co.)}.$$

Under the model (5) and the same conditions of Theorem 1 we have:

$$\widehat{\theta}_{mode}(x) - \theta(x) = O\Big( \big( h_K^{b_1} + h_G^{b_2} \big)^{1/j} \Big) + O\left( \left( \frac{\log n}{n\, h_G\, \varphi_x(h_K)} \right)^{1/(2j)} \right) \quad a.co.$$
Now, we state the almost complete convergence and the rate of almost complete
convergence of the nonparametric k-NN mode estimator introduced in (2).

Theorem 2. Under the model (4), suppose that hypotheses (H3) and (H8)
are verified for $j = 0$. Then, if (H1), (H4)–(H5), (H7) and (H9)–(H12) hold, we have

$$(6)\qquad \lim_{n \to \infty} \widehat{\theta}_{kNN}(x) = \theta(x) \quad a.co.$$

Under the model (5) and the same conditions of Theorem 2 we have:

$$(7)\qquad \widehat{\theta}_{kNN}(x) - \theta(x) = O\left( \left( \Big( \varphi_x^{-1}\big(\tfrac{k}{n}\big) \Big)^{b_1} + h_G^{b_2} \right)^{1/j} \right) + O\left( \left( \frac{\log n}{h_G\, k} \right)^{1/(2j)} \right) \quad a.co.$$
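To make the rate in (7) concrete, consider the fractal-type situation, used here only as an illustration and not assumed by the theorem, where the small ball probability behaves as $\varphi_x(h) \approx C_x h^{\tau}$ for some $\tau > 0$. Then $\varphi_x^{-1}(k/n) \approx (k/(C_x n))^{1/\tau}$ and (7) becomes:

$$\widehat{\theta}_{kNN}(x) - \theta(x) = O\left( \left( \Big( \frac{k}{n} \Big)^{b_1/\tau} + h_G^{b_2} \right)^{1/j} \right) + O\left( \left( \frac{\log n}{h_G\, k} \right)^{1/(2j)} \right) \quad a.co.,$$

so that $k$ plays for the k-NN estimator exactly the role that $h_K$ plays in Theorem 1: the first term asks for $k$ small relative to $n$ (small bias), while the second asks for $k$ large (small dispersion).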
Proof. The conditional density $f^x(\cdot)$ is continuous (by (H3) and (H9)).
We get:

$$\forall \varepsilon > 0,\ \exists \sigma(\varepsilon) > 0,\ \forall y \in (\theta(x) - \xi, \theta(x) + \xi):\quad |f^x(y) - f^x(\theta(x))| \le \sigma(\varepsilon) \Rightarrow |y - \theta(x)| \le \varepsilon.$$

By construction $\widehat{\theta}_{kNN}(x) \in (\theta(x) - \xi, \theta(x) + \xi)$, then:

$$\forall \varepsilon > 0,\ \exists \sigma(\varepsilon) > 0:\quad |f^x(\widehat{\theta}_{kNN}(x)) - f^x(\theta(x))| \le \sigma(\varepsilon) \Rightarrow |\widehat{\theta}_{kNN}(x) - \theta(x)| \le \varepsilon.$$

So, we finally arrive at:

$$\exists \sigma(\varepsilon) > 0:\quad \mathbb{P}\big( |\widehat{\theta}_{kNN}(x) - \theta(x)| > \varepsilon \big) \le \mathbb{P}\big( |f^x(\widehat{\theta}_{kNN}(x)) - f^x(\theta(x))| > \sigma(\varepsilon) \big).$$

On the other hand, it comes directly from the definitions of $\theta(x)$ and $\widehat{\theta}_{kNN}(x)$
that:

$$(8)\quad |f^x(\widehat{\theta}_{kNN}(x)) - f^x(\theta(x))| = |f^x(\widehat{\theta}_{kNN}(x)) - \widehat{f}^x_{kNN}(\widehat{\theta}_{kNN}(x)) + \widehat{f}^x_{kNN}(\widehat{\theta}_{kNN}(x)) - f^x(\theta(x))|$$
$$\le |f^x(\widehat{\theta}_{kNN}(x)) - \widehat{f}^x_{kNN}(\widehat{\theta}_{kNN}(x))| + |\widehat{f}^x_{kNN}(\widehat{\theta}_{kNN}(x)) - f^x(\theta(x))| \le 2 \sup_{y \in S} |\widehat{f}^x_{kNN}(y) - f^x(y)|.$$

The uniform almost complete convergence of the conditional density estimate
over the compact set $[\theta - \xi, \theta + \xi]$ (see Lemma 4 below), combined with both previous inequalities, leads
directly to:

$$\forall \varepsilon > 0,\qquad \sum_{n=1}^{\infty} \mathbb{P}\big( |\widehat{\theta}_{kNN}(x) - \theta(x)| > \varepsilon \big) < \infty.$$
Finally, the claimed consistency result (6) will be proved as soon as the
following lemma is checked.

Lemma 4. Under the conditions of Theorem 2 we have:

$$(9)\qquad \lim_{n \to \infty} \sup_{y \in S} |\widehat{f}^x_{kNN}(y) - f^x(y)| = 0 \quad a.co.$$

For the rate, we have already shown in (8) that

$$|f^x(\widehat{\theta}_{kNN}) - f^x(\theta)| \le 2 \sup_{y \in S} |\widehat{f}^x_{kNN}(y) - f^x(y)|.$$

Let us now write the following Taylor expansion of the function $f^x$:

$$f^x(\widehat{\theta}_{kNN}) = f^x(\theta) + \frac{1}{j!}\, f^{x(j)}(\theta^*)\, (\widehat{\theta}_{kNN} - \theta)^j,$$

for some $\theta^*$ between $\theta$ and $\widehat{\theta}_{kNN}$ (which is possible because of (9)), as long as we can
check that:

$$(10)\qquad \forall \tau > 0,\qquad \sum_{n=1}^{\infty} \mathbb{P}\big( f^{x(j)}(\theta^*) < \tau \big) < \infty.$$

We would then have:

$$(11)\qquad (\widehat{\theta}_{kNN} - \theta)^j = O\Big( \sup_{y \in S} |\widehat{f}^x_{kNN}(y) - f^x(y)| \Big) \quad a.co.,$$

so it suffices to check (10), and this is done directly by using the second part
of (H11) together with (6).
6. REAL DATA APPLICATION
Now, we apply the described method to some chemometrical real data.
We work on spectrometric data used to classify samples according to a physicochemical property which is not directly accessible (and therefore, requires a
specific analysis). The data were obtained using a Tecator InfratecFood and
Feed Analyzer working in a near-infrared (wavelengths between 850 and 1050
nanometers). The measurement is performed by transmission through a sample
finely chopped meat which is then analyzed by a chemical process to determine
its fat. The spectra correspond to the absorbance (–log10 of the transmittance
measured by the device) 100 for wavelengths between 850 regularly distributed
and 1050 nm. Meat samples were divided into two classes according to their
contents more or less than 20% fat (77 was more spectra corresponding to 20%
fat and 138 with less than 20% fat). The problem is then to discriminate the
spectra to avoid the chemical analysis, which is costly and time consuming. The
figure shows absorbance versus wavelength (850–1050) for 215 selected pieces
of meat. Note that, the main goal of spectrometric analysis is to allow the
discovery of the proportion of some specific chemical content (see Ferraty and
Vieu (2006) for further details related to spectrometric data). At this stage,
one would like to use the spectrometric curve X to predict Y the proportion
of fat content in the piece of meat. The data are available on the web site
“http://lib.stat.cmu.edu/datasets/tecator”.
Fig. 1. The 215 spectrometric curves, {Xi (t), t ∈ [850, 1050], i = 1, · · · , 215}.
However, as described in the real data application of Ferraty et al. [18],
this prediction problem can be studied by using the conditional mode approach.
Starting from this idea, our objective is to compare the two conditional mode
estimation methods: the kernel estimator defined in (3) and the k-NN
estimator defined in (2).
For both methods, we use the quadratic kernel function $K$ defined by:

$$K(x) = \frac{3}{2}\,(1 - x^2)\, \mathbb{1}_{[0,1]}(x),$$

and the distribution function $G(\cdot)$ defined by:

$$G(x) = \int_{-\infty}^{x} \frac{15}{4}\, t^2 (1 - t^2)\, \mathbb{1}_{[-1,1]}(t)\, dt.$$
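For implementation purposes, note that $G$ admits the closed form $G(x) = (5x^3 - 3x^5)/4 + 1/2$ on $[-1, 1]$ (with $G = 0$ to the left and $G = 1$ to the right). A small Python sketch of both kernels (our own code, given only for reproducibility):

```python
import numpy as np

def K(u):
    """Quadratic kernel (3/2)(1 - u^2) on [0, 1]."""
    u = np.asarray(u, dtype=float)
    return 1.5 * (1.0 - u**2) * ((u >= 0) & (u <= 1))

def G(x):
    """cdf of the density (15/4) t^2 (1 - t^2) on [-1, 1], in closed form."""
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    return (5 * x**3 - 3 * x**5) / 4 + 0.5

def G1(u):
    """Derivative G'(u) = (15/4) u^2 (1 - u^2) on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return 3.75 * u**2 * (1 - u**2) * (np.abs(u) <= 1)
```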
Note that the shape of these spectrometric curves is very smooth; for
this reason we use the semi-metric defined by the L2 distance between the second
derivatives of the curves (see Ferraty and Vieu [16] for more motivation of this
choice); this semi-metric is implemented in the sketch that follows the algorithm below. We proceed by the following algorithm:
• Step 1. We split our data into two subsets:
– $(X_j, Y_j)_{j=1,\ldots,172}$: training sample,
– $(X_i, Y_i)_{i=173,\ldots,215}$: test sample.
• Step 2. We compute the mode estimator $\widehat{\theta}^{X_j}$, for all $j$, by using the
training sample.
• Step 3. For each $X_i$ in the test sample, we set $i^* = \arg\min_{j=1,\ldots,172} d(X_i, X_j)$.
• Step 4. For all $i = 173, \ldots, 215$ we take

$$\widehat{Y}_i = \widehat{\theta}_{mode}(X_{i^*}),$$

where $X_{i^*}$ is the nearest curve to $X_i$ in the training sample, and

$$\widetilde{Y}_i = \widehat{\theta}_{kNN}(X_{i^*}),$$

where the optimal number of neighbors $k_{opt}$ is defined by

$$k_{opt} = \arg\min_k CV(k), \qquad CV(k) = \sum_{i=173}^{215} \big( Y_i - \widehat{\theta}^{-i}_{kNN}(X_i) \big)^2,$$

with

$$\widehat{\theta}^{-i}_{kNN}(x) = \arg\max_y \big( \widehat{f}^x_{kNN} \big)^{-i}(y)$$

and

$$\big( \widehat{f}^x_{kNN} \big)^{-i}(y) = \frac{h_G^{-1} \sum_{j=1, j \neq i}^{172} K\big( H_k^{-1}(x)\, d(x, X_j) \big)\, G^{(1)}\big( h_G^{-1}(y - Y_j) \big)}{\sum_{j=1, j \neq i}^{172} K\big( H_k^{-1}(x)\, d(x, X_j) \big)}.$$
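A hedged sketch of this selection of $k_{opt}$ follows (our own illustrative rendering with $j = 0$: the helper names, the grid search and the generic leave-one-out loop are simplifications of the scheme above, not the authors' code), combining the second-derivative semi-metric with the k-NN density estimate:

```python
import numpy as np

def semimetric_d2(u, v, t):
    """L2 distance between the second derivatives of two discretized curves."""
    du2 = np.gradient(np.gradient(u, t), t)
    dv2 = np.gradient(np.gradient(v, t), t)
    return np.sqrt(np.trapz((du2 - dv2) ** 2, t))

def cv_choose_k(X, Y, t, h_G, y_grid, k_values):
    """Leave-one-out cross-validation over the number of neighbors k."""
    n = len(Y)
    D = np.array([[semimetric_d2(X[i], X[j], t) for j in range(n)]
                  for i in range(n)])
    K = lambda u: 1.5 * (1 - u**2) * ((u >= 0) & (u < 1))
    G1 = lambda u: 3.75 * u**2 * (1 - u**2) * (np.abs(u) <= 1)
    cv = []
    for k in k_values:
        err = 0.0
        for i in range(n):
            d = np.delete(D[i], i)               # leave curve i out
            y = np.delete(Y, i)
            H = np.sort(d)[k - 1]                # k-NN bandwidth at X_i
            w = K(d / H)
            f = np.array([np.sum(w * G1((yy - y) / h_G)) for yy in y_grid])
            err += (Y[i] - y_grid[np.argmax(f)]) ** 2
        cv.append(err)
    return k_values[int(np.argmin(cv))]
```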
The error used to evaluate this comparison is the mean square error
(MSE) expressed by

$$\frac{1}{43} \sum_{i=173}^{215} \big| Y_i - \widehat{T}(X_i) \big|^2,$$

where $\widehat{T}$ designates the estimator used (kernel or k-NN method) to estimate the
conditional mode.
The MSE of the k-NN mode estimator is 1.716115, whereas the MSE of the
classical kernel mode estimator used in Ferraty et al. [18] is 2.97524; this result
exhibits the effectiveness of the k-NN method.
7. APPENDIX
Proof of Lemma 1. For $i = 1, \ldots, n$, we consider the quantities $G_i^{(1)}(y) = G^{(1)}\big(h_G^{-1}(y - B_i)\big)$.
On the event $\{D_n^- \le D_n \le D_n^+\}$ appearing in $(L_1)$, condition $(L_0)$ gives, for every $i$:

$$W\big( D_n^-(\beta), A_i \big) \le W\big( D_n, A_i \big) \le W\big( D_n^+(\beta), A_i \big),$$

hence

$$\sum_{i=1}^{n} W\big( D_n^-(\beta), A_i \big) \le \sum_{i=1}^{n} W\big( D_n, A_i \big) \le \sum_{i=1}^{n} W\big( D_n^+(\beta), A_i \big).$$

So:

$$\frac{1}{\sum_{i=1}^{n} W\big( D_n^+(\beta), A_i \big)} \le \frac{1}{\sum_{i=1}^{n} W\big( D_n, A_i \big)} \le \frac{1}{\sum_{i=1}^{n} W\big( D_n^-(\beta), A_i \big)}.$$
Under the hypotheses (H7)–(H8) and (H9), we have:

$$\underbrace{\sup_{y \in S} \frac{\sum_{i=1}^{n} W(D_n^-(\beta), A_i)\, h_G^{-1} G_i^{(1)}(y)}{\sum_{i=1}^{n} W(D_n^+(\beta), A_i)}}_{C_n^-(\beta)} \le \underbrace{\sup_{y \in S} \frac{\sum_{i=1}^{n} W(D_n(\beta), A_i)\, h_G^{-1} G_i^{(1)}(y)}{\sum_{i=1}^{n} W(D_n(\beta), A_i)}}_{C_n(D_n)} \le \underbrace{\sup_{y \in S} \frac{\sum_{i=1}^{n} W(D_n^+(\beta), A_i)\, h_G^{-1} G_i^{(1)}(y)}{\sum_{i=1}^{n} W(D_n^-(\beta), A_i)}}_{C_n^+(\beta)},$$

where we have put the following r.v.:

$$C_n^-(\beta) = \sup_{y \in S} \frac{\sum_{i=1}^{n} W\big( D_n^-(\beta), A_i \big)\, h_G^{-1} G_i^{(1)}(y)}{\sum_{i=1}^{n} W\big( D_n^+(\beta), A_i \big)} \quad \text{and} \quad C_n^+(\beta) = \sup_{y \in S} \frac{\sum_{i=1}^{n} W\big( D_n^+(\beta), A_i \big)\, h_G^{-1} G_i^{(1)}(y)}{\sum_{i=1}^{n} W\big( D_n^-(\beta), A_i \big)}.$$
We consider, for every $\varepsilon > 0$:

$$T_n(\varepsilon) = \{ C_n(D_n) :\ c - \varepsilon \le C_n(D_n) \le c + \varepsilon \},$$

and for every $\beta \in\, ]0, 1[$ we note:

$$S_n^-(\varepsilon, \beta) = \{ C_n^-(\beta) :\ c - \varepsilon \le C_n^-(\beta) \le c + \varepsilon \}, \qquad S_n^+(\varepsilon, \beta) = \{ C_n^+(\beta) :\ c - \varepsilon \le C_n^+(\beta) \le c + \varepsilon \},$$

and

$$S_n(\beta) = \{ C_n(D_n) :\ C_n^-(\beta) \le C_n(D_n) \le C_n^+(\beta) \}.$$

It is obvious that:

$$(12)\qquad S_n(\beta) \cap S_n^-(\varepsilon, \beta) \cap S_n^+(\varepsilon, \beta) \subset T_n(\varepsilon).$$

Let $\varepsilon_0 = \dfrac{3c}{2}$ and $\beta_\varepsilon = 1 - \dfrac{\varepsilon}{3c}$. By denoting:

$$W_n^-(\varepsilon) = \Big\{ C_n^-(\beta_\varepsilon) :\ \beta_\varepsilon c - \frac{\varepsilon}{3} \le C_n^-(\beta_\varepsilon) \le \beta_\varepsilon c + \frac{\varepsilon}{3} \Big\},$$
$$W_n^+(\varepsilon) = \Big\{ C_n^+(\beta_\varepsilon) :\ \frac{c}{\beta_\varepsilon} - \frac{\varepsilon}{3} \le C_n^+(\beta_\varepsilon) \le \frac{c}{\beta_\varepsilon} + \frac{\varepsilon}{3} \Big\},$$
$$W_n(\varepsilon) = \{ D_n :\ D_n^-(\beta_\varepsilon) \le D_n \le D_n^+(\beta_\varepsilon) \},$$

we see that, for $\varepsilon \in\, ]0, \varepsilon_0[$:

$$\beta_\varepsilon c - \frac{\varepsilon}{3} > c - \varepsilon, \qquad \frac{c}{\beta_\varepsilon} - \frac{\varepsilon}{3} > c - \varepsilon,$$

and

$$\beta_\varepsilon c + \frac{\varepsilon}{3} \le c + \varepsilon, \qquad \frac{c}{\beta_\varepsilon} + \frac{\varepsilon}{3} \le c + \varepsilon.$$

So, for $\varepsilon \in\, ]0, \varepsilon_0[$:

$$(13)\qquad W_n^-(\varepsilon) \subset S_n^-(\varepsilon, \beta_\varepsilon) \quad \text{and} \quad W_n^+(\varepsilon) \subset S_n^+(\varepsilon, \beta_\varepsilon).$$

$(L_0)$ implies that $\forall z \in \mathcal{F},\ \forall t, t' \in \mathbb{R}:\ t \le t' \Rightarrow W(t, z) \le W(t', z)$, and in particular:

$$D_n \in W_n(\varepsilon) \Rightarrow D_n^-(\beta_\varepsilon) \le D_n \le D_n^+(\beta_\varepsilon) \Rightarrow C_n^-(\beta_\varepsilon) \le C_n(D_n) \le C_n^+(\beta_\varepsilon) \Rightarrow C_n(D_n) \in S_n(\beta_\varepsilon).$$

So:

$$(14)\qquad W_n(\varepsilon) \subset S_n(\beta_\varepsilon).$$

The results (12), (13) and (14) imply:

$$\forall \varepsilon \in\, ]0, \varepsilon_0[\,, \qquad W_n^-(\varepsilon) \cap W_n^+(\varepsilon) \cap W_n(\varepsilon) \subset T_n(\varepsilon).$$

In other words:

$$(15)\qquad \forall \varepsilon \in\, ]0, \varepsilon_0[\,, \qquad \mathbb{1}_{T_n(\varepsilon)^c} \le \mathbb{1}_{W_n^-(\varepsilon)^c} + \mathbb{1}_{W_n^+(\varepsilon)^c} + \mathbb{1}_{W_n(\varepsilon)^c}.$$
On the other hand, we can express the r.r.v. $C_n^-$ and $C_n^+$ in the following way:

$$C_n^-(\beta) = \sup_{y \in S} \frac{\sum_{i=1}^{n} W\big( D_n^-(\beta), A_i \big)\, h_G^{-1} G_i^{(1)}(y)}{\sum_{i=1}^{n} W\big( D_n^+(\beta), A_i \big)} = C_n(D_n^-) \times \frac{\sum_{i=1}^{n} W\big( D_n^-(\beta), A_i \big)}{\sum_{i=1}^{n} W\big( D_n^+(\beta), A_i \big)},$$

and

$$C_n^+(\beta) = \sup_{y \in S} \frac{\sum_{i=1}^{n} W\big( D_n^+(\beta), A_i \big)\, h_G^{-1} G_i^{(1)}(y)}{\sum_{i=1}^{n} W\big( D_n^-(\beta), A_i \big)} = C_n(D_n^+) \times \frac{\sum_{i=1}^{n} W\big( D_n^+(\beta), A_i \big)}{\sum_{i=1}^{n} W\big( D_n^-(\beta), A_i \big)}.$$

So, under $(L_2)$ and $(L_3)$:

$$(16)\qquad C_n^-(\beta) \xrightarrow{a.co.} \beta c \quad \text{and} \quad C_n^+(\beta) \xrightarrow{a.co.} c/\beta.$$

Because

$$W_n^-(\varepsilon)^c = \Big\{ C_n^-(\beta_\varepsilon) :\ \big| C_n^-(\beta_\varepsilon) - \beta_\varepsilon c \big| > \frac{\varepsilon}{3} \Big\},$$

the first part of (16) implies:

$$C_n^-(\beta_\varepsilon) \xrightarrow{a.co.} \beta_\varepsilon c \ \Rightarrow\ \forall \varepsilon' > 0: \sum_n \mathbb{P}\big( |C_n^-(\beta_\varepsilon) - \beta_\varepsilon c| > \varepsilon' \big) < \infty \ \Rightarrow\ \forall \varepsilon \in\, ]0, \varepsilon_0[: \sum_n \mathbb{P}\big( \mathbb{1}_{W_n^-(\varepsilon)^c} > \varepsilon \big) < \infty.$$

So,

$$(17)\qquad \forall \varepsilon \in\, ]0, \varepsilon_0[\,, \qquad \mathbb{1}_{W_n^-(\varepsilon)^c} \xrightarrow{a.co.} 0.$$

In the same way,

$$(18)\qquad \forall \varepsilon \in\, ]0, \varepsilon_0[\,, \qquad \mathbb{1}_{W_n^+(\varepsilon)^c} \xrightarrow{a.co.} 0.$$

Under $(L_1)$, $\mathbb{1}_{\{D_n^- \le D_n \le D_n^+\}} \xrightarrow{a.co.} 1$, which implies:

$$\forall \varepsilon \in\, ]0, \varepsilon_0[:\qquad \mathbb{1}_{W_n(\varepsilon)} \xrightarrow{a.co.} 1.$$
In other words,

$$(19)\qquad \forall \varepsilon \in\, ]0, \varepsilon_0[\,, \qquad \mathbb{1}_{W_n(\varepsilon)^c} \xrightarrow{a.co.} 0.$$

Combining the results (15), (17), (18) and (19), we get:

$$\forall \varepsilon \in\, ]0, \varepsilon_0[\,, \qquad \mathbb{1}_{T_n(\varepsilon)^c} \xrightarrow{a.co.} 0.$$

We can rewrite this as follows:

$$\mathbb{1}_{T_n(\varepsilon)^c} \xrightarrow{a.co.} 0 \ \Rightarrow\ \forall \varepsilon \in\, ]0, \varepsilon_0[: \sum_n \mathbb{P}\big( |\mathbb{1}_{T_n(\varepsilon)^c}| > \varepsilon \big) < \infty \ \Rightarrow\ \forall \varepsilon \in\, ]0, \varepsilon_0[: \sum_n \mathbb{P}\big( |C_n(D_n) - c| > \varepsilon \big) < \infty.$$

And we finally obtain:

$$C_n(D_n) \xrightarrow{a.co.} c. \qquad \square$$
Proof of Theorem 2 (almost complete convergence). Let $G_i^{(1)}(y) = G^{(1)}\big(h_G^{-1}(y - Y_i)\big)$.
For the demonstration of this part of the theorem, we use Lemma 1 with:

• $\mathcal{F} = E$, $\alpha = \mathcal{B}_E$;
• $\forall i = 1, \ldots, n$: $(A_i, B_i) = (X_i, Y_i)$;
• $D_n = H_{n,k}$ and $C_n(D_n) = \sup_{y \in S} \widehat{f}^x_{kNN}(y)$;
• $\forall (t, z) \in \mathbb{R}^+ \times E$: $W(t, z) = K\Big( \dfrac{d(x, z)}{t} \Big)$.

For a given $\beta \in\, ]0, 1[$, we choose $D_n^-$ and $D_n^+$ such that:

$$\varphi_x(D_n^-) = \sqrt{\beta}\, \frac{k}{n} \quad \text{and} \quad \varphi_x(D_n^+) = \frac{1}{\sqrt{\beta}}\, \frac{k}{n}.$$
For the demonstration, we just have to show that the kernel $K$ verifies $(L_0)$ and that the sequences $(D_n^-)$, $(D_n^+)$ and $(D_n)$ verify $(L_1)$, $(L_2)$
and $(L_3)$.

Verification of $(L_0)$: For all $t, t' \in \mathbb{R}^+$:

$$t \le t' \ \Leftrightarrow\ \exists v \in [1, +\infty[\,:\ t' = vt \ \Leftrightarrow\ \exists u \in\, ]0, 1]\,:\ \frac{1}{t'} = u\, \frac{1}{t} \ \Leftrightarrow\ \exists u \in\, ]0, 1]\,:\ K\Big( \frac{\|x - z\|}{t'} \Big) = K\Big( u\, \frac{\|x - z\|}{t} \Big), \quad \forall z \in E.$$

$K$ is a bounded kernel verifying $K(uz) \ge K(z)$ for all $z \in E$ and $u \in [0, 1]$. So, we have:

$$K\Big( \frac{\|x - z\|}{t} \Big) \le K\Big( u\, \frac{\|x - z\|}{t} \Big),$$

whence:

$$\forall t, t' \in \mathbb{R}^+,\ \forall z \in E: \qquad t \le t' \ \Rightarrow\ K\Big( \frac{\|x - z\|}{t} \Big) \le K\Big( \frac{\|x - z\|}{t'} \Big).$$

So, $(L_0)$ is verified.
Verification of $(L_1)$:
The proof of $(L_1)$ is the same as in Burba et al. [4].
Verification of $(L_2)$:
According to (H12):

$$\lim_{n \to +\infty} \varphi_x(D_n^-) = \lim_{n \to +\infty} \sqrt{\beta}\, \frac{k}{n} = 0, \qquad \lim_{n \to +\infty} \varphi_x(D_n^+) = \lim_{n \to +\infty} \frac{1}{\sqrt{\beta}}\, \frac{k}{n} = 0.$$

From (H1), we deduce that:

$$\lim_{n \to +\infty} D_n^- = 0 \quad \text{and} \quad \lim_{n \to +\infty} D_n^+ = 0.$$

And by (H12) we have:

$$\lim_{n \to +\infty} \frac{\log n}{n\, \varphi_x(D_n^-)} = \lim_{n \to +\infty} \frac{\log n}{\sqrt{\beta}\, k} = 0, \qquad \lim_{n \to +\infty} \frac{\log n}{n\, \varphi_x(D_n^+)} = \lim_{n \to +\infty} \sqrt{\beta}\, \frac{\log n}{k} = 0.$$

Because $(D_n^-)$ and $(D_n^+)$ verify the conditions of Lemma 3, we have:

$$Q(D_n^-) = \frac{1}{n\, \varphi_x(D_n^-)} \sum_{i=1}^{n} K\Big( \frac{d(x, X_i)}{D_n^-} \Big) \xrightarrow{a.co.} 1, \qquad Q(D_n^+) = \frac{1}{n\, \varphi_x(D_n^+)} \sum_{i=1}^{n} K\Big( \frac{d(x, X_i)}{D_n^+} \Big) \xrightarrow{a.co.} 1.$$

So,

$$\frac{Q(D_n^-)}{Q(D_n^+)} \xrightarrow{a.co.} 1,$$

and, since $\varphi_x(D_n^-)/\varphi_x(D_n^+) = \beta$, we can deduce that:

$$\frac{\sum_{i=1}^{n} K\big( d(x, X_i)/D_n^- \big)}{\sum_{i=1}^{n} K\big( d(x, X_i)/D_n^+ \big)} \xrightarrow{a.co.} \beta.$$

So, $(L_2)$ is checked.
Verification of $(L_3)$:
Because $D_n^-$ and $D_n^+$ verify the conditions of Theorem 1, we obtain:

$$\sup_{y \in S} \left| \frac{\sum_{i=1}^{n} K\big( d(x, X_i)/D_n^- \big)\, h_G^{-1} G_i^{(1)}(y)}{\sum_{i=1}^{n} K\big( d(x, X_i)/D_n^- \big)} - f^x(y) \right| \xrightarrow{a.co.} 0,$$

and similarly with $D_n^+$ in place of $D_n^-$. Hence $C_n(D_n^-)$ and $C_n(D_n^+)$ both converge a.co. to $c = \sup_{y \in S} f^x(y)$, so $(L_3)$ is verified. That finishes the proof of the first part of Theorem 2.
Proof of Theorem 2 (almost complete convergence rate). We proceed in the same
way as in the demonstration of the first part of Theorem 2 (almost complete
convergence), with the following notation:

• $\mathcal{F} = E$, $\alpha = \mathcal{B}_E$;
• $\forall i = 1, \ldots, n$: $(A_i, B_i) = (X_i, Y_i)$;
• $D_n = H_{n,k}$ and $C_n(D_n) = \sup_{y \in S} \widehat{f}^x_{kNN}(y)$;
• $\forall (t, z) \in \mathbb{R}^+ \times E$: $W(t, z) = K\Big( \dfrac{d(x, z)}{t} \Big)$.

For a given $\beta_n \in\, ]0, 1[$, we choose $D_n^-$ and $D_n^+$ such that:

$$\varphi_x(D_n^-) = \sqrt{\beta_n}\, \frac{k}{n} \quad \text{and} \quad \varphi_x(D_n^+) = \frac{1}{\sqrt{\beta_n}}\, \frac{k}{n}.$$

For the proof, it is enough to show that the kernel $K$ verifies $(L_0)$ and
that the sequences $(D_n^-)$, $(D_n^+)$ and $(D_n)$ verify $(L_1)$, $(L'_2)$ and $(L'_3)$.
Verification of (L0 ) and (L1 ) :
The verification of (L0 ) and (L1 ) is made in the same way as for Lemma 1.
Verification of $(L'_2)$:
We have already shown in Ferraty et al. [18] that, with

$$\widehat{f}_N^x(y) = \frac{1}{n\, h_G\, \mathbb{E}[K_1]} \sum_{i=1}^{n} K\big( h_K^{-1}\, d(x, X_i) \big)\, G_i^{(1)}(y) \quad \text{and} \quad \widehat{F}_D^x = \frac{1}{n\, \mathbb{E}[K_1]} \sum_{i=1}^{n} K\big( h_K^{-1}\, d(x, X_i) \big),$$

one can write:

$$\widehat{f}^x(y) - f^x(y) = \frac{1}{\widehat{F}_D^x} \Big\{ \widehat{f}_N^x(y) - \mathbb{E}\widehat{f}_N^x(y) - \big( f^x(y) - \mathbb{E}\widehat{f}_N^x(y) \big) \Big\} + \frac{\widehat{f}_N^x(y)}{\widehat{F}_D^x} \Big\{ \mathbb{E}\widehat{F}_D^x - \widehat{F}_D^x \Big\}.$$

To show that

$$\sup_{y \in S} \big| \widehat{f}^x(y) - f^x(y) \big| = O\big( h_K^{b_1} \big) + O\big( h_G^{b_2} \big) + O_{a.co.}\left( \sqrt{\frac{\log n}{n\, h_G\, \varphi_x(h_K)}} \right),$$

it is enough to show that:

$$(A)\qquad \frac{1}{\widehat{F}_D^x} \sup_{y \in S} \big| \mathbb{E}\widehat{f}_N^{x(j)}(y) - f^{x(j)}(y) \big| = O\big( h_K^{b_1} \big) + O\big( h_G^{b_2} \big),$$

$$(B)\qquad \frac{1}{\widehat{F}_D^x} \sup_{y \in S} \big| \widehat{f}_N^{x(j)}(y) - \mathbb{E}\widehat{f}_N^{x(j)}(y) \big| = O_{a.co.}\left( \sqrt{\frac{\log n}{n\, h_G^{2j+1}\, \varphi_x(h_K)}} \right),$$

$$(C)\qquad \big| \widehat{F}_D^x - \mathbb{E}\widehat{F}_D^x \big| = O_{a.co.}\left( \sqrt{\frac{\log n}{n\, \varphi_x(h_K)}} \right),$$

$$(D)\qquad \exists\, \delta > 0 \ \text{such that} \ \sum_n \mathbb{P}\big( \widehat{F}_D^x < \delta \big) < +\infty.$$
We put $h^- = D_n^- = \varphi_x^{-1}\big( \sqrt{\beta_n}\, \tfrac{k}{n} \big)$ and $h^+ = D_n^+ = \varphi_x^{-1}\big( \tfrac{1}{\sqrt{\beta_n}}\, \tfrac{k}{n} \big)$.
Because

$$\lim_{n \to +\infty} h^- = \lim_{n \to +\infty} h^+ = 0$$

and

$$\lim_{n \to +\infty} \frac{\log n}{n\, h_G\, \varphi_x(D_n^-)} = \lim_{n \to +\infty} \frac{\log n}{n\, h_G\, \varphi_x(D_n^+)} = \lim_{n \to +\infty} \frac{\log n}{h_G\, k} = 0$$

(because $\beta_n$ is bounded by 1, and by (H12)).
We apply Lemma 4 and we obtain both results:

$$1.\quad \frac{1}{n} \sum_{i=1}^{n} K\Big( \frac{d(x, X_i)}{h^-} \Big) - \varphi_x(h^-) = o_{a.co.}\left( \sqrt{\frac{\log n}{n\, \varphi_x(h^-)}} \right),$$

$$2.\quad \frac{1}{n} \sum_{i=1}^{n} K\Big( \frac{d(x, X_i)}{h^+} \Big) - \varphi_x(h^+) = o_{a.co.}\left( \sqrt{\frac{\log n}{n\, \varphi_x(h^+)}} \right),$$

which implies:

$$\frac{1}{n} \sum_{i=1}^{n} K\Big( \frac{d(x, X_i)}{D_n^-} \Big) - \sqrt{\beta_n}\, \frac{k}{n} = o_{a.co.}\left( \sqrt{\frac{\log n}{k}} \right)$$

and

$$\frac{1}{n} \sum_{i=1}^{n} K\Big( \frac{d(x, X_i)}{D_n^+} \Big) - \frac{1}{\sqrt{\beta_n}}\, \frac{k}{n} = o_{a.co.}\left( \sqrt{\frac{\log n}{k}} \right).$$

We thus have:

$$\frac{\sum_{i=1}^{n} K\big( d(x, X_i)/D_n^- \big)}{\sum_{i=1}^{n} K\big( d(x, X_i)/D_n^+ \big)} - \beta_n = o_{a.co.}\left( \sqrt{\frac{\log n}{k}} \right),$$

and, according to (H12):

$$\frac{\sum_{i=1}^{n} K\big( d(x, X_i)/D_n^- \big)}{\sum_{i=1}^{n} K\big( d(x, X_i)/D_n^+ \big)} - \beta_n = o_{a.co.}\left( \sqrt{\frac{\log n}{h_G\, k}} \right).$$

$(L'_2)$ is also verified.
Verification of $(L'_3)$:
Because $h^+$ and $h^-$ verify the conditions, we can apply Theorem 1, which
gives:

$$\left| \sup_{y \in S} \frac{\sum_{i=1}^{n} K\big( d(x, X_i)/D_n^- \big)\, h_G^{-1} G_i^{(1)}(y)}{\sum_{i=1}^{n} K\big( d(x, X_i)/D_n^- \big)} - \sup_{y \in S} f^x(y) \right| = O\big( (D_n^-)^{b_1} \big) + O\big( h_G^{b_2} \big) + O_{a.co.}\left( \sqrt{\frac{\log n}{n\, h_G\, \varphi_x(D_n^-)}} \right),$$

$$\left| \sup_{y \in S} \frac{\sum_{i=1}^{n} K\big( d(x, X_i)/D_n^+ \big)\, h_G^{-1} G_i^{(1)}(y)}{\sum_{i=1}^{n} K\big( d(x, X_i)/D_n^+ \big)} - \sup_{y \in S} f^x(y) \right| = O\big( (D_n^+)^{b_1} \big) + O\big( h_G^{b_2} \big) + O_{a.co.}\left( \sqrt{\frac{\log n}{n\, h_G\, \varphi_x(D_n^+)}} \right).$$

In other words,

$$\left| C_n(D_n^-) - \sup_{y \in S} f^x(y) \right| = O\left( \Big( \varphi_x^{-1}\big(\tfrac{k}{n}\big) \Big)^{b_1} \right) + O\big( h_G^{b_2} \big) + O_{a.co.}\left( \sqrt{\frac{\log n}{h_G\, k}} \right),$$

$$\left| C_n(D_n^+) - \sup_{y \in S} f^x(y) \right| = O\left( \Big( \varphi_x^{-1}\big(\tfrac{k}{n}\big) \Big)^{b_1} \right) + O\big( h_G^{b_2} \big) + O_{a.co.}\left( \sqrt{\frac{\log n}{h_G\, k}} \right).$$

So, there exists $c = \sup_{y \in S} f^x(y) > 0$ such that:

$$\big| C_n(D_n^-) - c \big| = O_{a.co.}\left( \sqrt{\frac{\log n}{h_G\, k}} \right) \quad \text{and} \quad \big| C_n(D_n^+) - c \big| = O_{a.co.}\left( \sqrt{\frac{\log n}{h_G\, k}} \right).$$

So, $(L'_3)$ is verified.
We finally have:

$$\sup_{y \in S} \big| \widehat{f}^x_{kNN}(y) - f^x(y) \big| = O\left( \Big( \varphi_x^{-1}\big(\tfrac{k}{n}\big) \Big)^{b_1} \right) + O\big( h_G^{b_2} \big) + O_{a.co.}\left( \sqrt{\frac{\log n}{h_G\, k}} \right). \qquad \square$$
8. CONCLUSION AND PERSPECTIVES
We proposed in this work a convergence result for the nonparametric
k-NN estimator of the conditional mode for functional data: the
almost complete pointwise convergence of this estimator, with rates, is given.
The effectiveness of the k-NN method shows up in practical examples: since
the smoothing parameter k takes its values in a discrete set, its selection leads
to a simpler implementation than the choice of a continuous bandwidth.

This work can be generalized to dependent data (see Ezzahrioui and Ould-Saïd [14]).
In nonparametric statistics, uniform convergence is considered a preliminary
step towards sharper results; in this context, it would be very interesting to
extend the results of Ferraty et al. [20]. Obtaining explicit expressions for the
dominant terms of the centered moments can also be envisaged, in view of an
asymptotic normality result (see Delsol [11]); this idea can be investigated in the future.
Acknowledgments. The authors thank the referee and the editors-in-chief for helpful
comments. Supported by “L’Agence Thématique de Recherche en Sciences et Technologie ATRST (Ex ANDRU)” in P.N.R., No.46/15.
REFERENCES
[1] M. Attouch and T. Benchikh, Asymptotic distribution of robust k-nearest neighbour estimator for functional nonparametric models. Math. Vesnik 64 (2012), 4, 275–285.
[2] D. Bosq, Linear processes in function spaces. Theory and Applications. Lecture Notes
in Statist. 149, Springer-Verlag, New York, 2000.
[3] F. Burba, F. Ferraty and P. Vieu, Convergence de l’estimateur noyau des k plus proches
voisins en régression fonctionnelle non-paramétrique. C. R. Math. Acad. Sci. Paris
346 (2008), 5–6, 339–342.
[4] F. Burba, F. Ferraty and P. Vieu, k-nearest neighbour method in functional nonparametric regression. J. Nonparametr. Stat. 21 (2009), 4, 453–469.
[5] G. Collomb, Estimation de la régression par la méthode des k points les plus proches
avec noyau: quelques propriétés de convergence ponctuelle. Nonparametric Asymptotic
Statistics, Lecture Notes in Math. 821, Springer-Verlag, 159–175, 1980.
[6] G. Collomb, W. Härdle and S. Hassani, A note on prediction via conditional mode
estimation. J. Statist. Plann. Inference 15 (1987), 227–236.
[7] S. Dabo-Niang, Estimation de la densité dans un espace de dimension infinie: application
aux diffusions. C. R. Acad. Sci. Paris 334 (2002), 3, 213–216.
[8] S. Dabo-Niang, Kernel density estimator in an infinite dimensional space with a rate of
convergence in the case of diffusion process. Appl. Math. Lett. 17 (2004), 381– 386.
[9] S. Dabo-Niang, F. Ferraty and P. Vieu, Mode estimation for functional random variable
and its application for curves classification. Far East J. Theor. Stat. 18 (2006), 93–119.
[10] S. Dabo-Niang and A. Laksaci, Estimation non paramétrique du mode conditionnel pour
variable explicative fonctionnelle. C. R. Math. Acad. Sci. Paris 344 (2007), 49–52.
[11] L. Delsol, Advances on asymptotic normality in nonparametric functional time series
analysis. Statistics 43 (2008), 1, 13–33.
[12] L.P. Devroye, The uniform convergence of nearest neighbour regression function estimators and their application in optimization. IEEE Trans. Inform. Theory 24 (1978),
142–151.
[13] W.F. Eddy, The asymptotic distribution of kernel estimators of the mode. Z. Wahrsch.
Verw. Gebiete 59 (1982), 279–290.
[14] M. Ezzahrioui and E. Ould-Saı̈d, On the asymptotic properties of a nonparametric estimator of the conditional mode for functional dependent data. LMPA No 277, Univ. du
Littoral Côte d’Opale, 2006, preprint.
[15] M. Ezzahrioui and E. Ould-Saı̈d, Asymptotic normality of nonparametric estimator of
the conditional mode for functional data. J. Nonparamet. Stat. 20 (2008), 3–18.
[16] F. Ferraty and P. Vieu, The functional nonparametric model and application to spectrometric data. Comput. Statist. Data Anal. 4 (2002), 545–564.
[17] F. Ferraty and P. Vieu, Nonparametric Functional Data Analysis: Theory and Practice.
Springer Series in Stat., Springer, New York, 2006.
[18] F. Ferraty, A. Laksaci and P. Vieu, Estimating some characteristics of the conditional
distribution in nonparametric functional models. Stat. Inference Stoch. Process. 9
(2006), 47–76.
[19] F. Ferraty, A. Mas and P. Vieu, Nonparametric regression on functional data: Inference
and practical aspects. Aust. N. Z. J. Stat. 49 (2007), 267–286.
[20] F. Ferraty, A. Laksaci, A. Tadj and P. Vieu, Rate of uniform consistency for nonparametric
estimates with functional variables. J. Stat. Plann. Inference 140 (2010), 335–352.
[21] T. Gasser, P. Hall and B. Presnell, Nonparametric estimation of the mode of a distribution of random curves. J. R. Stat. Soc. Ser B Stat. Methodol. 60 (1998), 681–691.
[22] Y.P. Mack, Local properties of kNN regression estimates. SIAM J. Algebr. Discrete
Methods 2 (1981), 311–323.
[23] E. Ould-Saïd, Estimation non paramétrique du mode conditionnel. Application à la prévision. C. R. Acad. Sci. Paris 316 (1993), 943–947.
[24] E. Ould-Saı̈d, A note on ergodic processes prediction via estimation of the conditional
mode function. Scand. J. Stat. 24 (1997), 231–239.
[25] R.M. Royall, A class of nonparametric estimates of a smooth regression function, Ph. D.
Dissertation, Stanford University, 1966.
[26] M. Samanta and A. Thavaneswaran, Non-parametric estimation of conditional mode.
Comm. Statist. Theory Methods 19 (1990), 4515–4524.
[27] C.J. Stone, Consistent nonparametric regression. Ann. Stat. 5 (1977), 595–645.
Received 3 June 2012
Université Djillali Liabès de Sidi Bel Abbès,
Laboratoire de Statistique Processus Stochastiques,
BP 89, Sidi Bel Abbès 22000, Algérie
attou [email protected]
wahiba [email protected]