
KERNEL AND NEAREST NEIGHBOR ESTIMATION OF
A CONDITIONAL QUANTILE
by
P.K. Bhattacharya
University of California, Davis
and
Ashis K. Gangopadhyay
University of North Carolina, Chapel Hill
ABSTRACT
Let $(X_1,Z_1), (X_2,Z_2), \ldots, (X_n,Z_n)$ be i.i.d. as $(X,Z)$, $Z$ taking values in $\mathbb{R}^1$, and for $0 < p < 1$, let $\xi_p(x)$ denote the conditional $p$-quantile of $Z$ given $X=x$, i.e., $P(Z \le \xi_p(x) \mid X=x) = p$. In this paper, kernel and nearest neighbor estimators of $\xi_p(x)$ are proposed. As a first step in studying the asymptotics of these estimates, Bahadur-type representations of the sample conditional quantile functions are obtained. These representations are used to examine the important issue of adaptive choice of the smoothing parameter by a local approach (for a fixed $x$) based on weak convergence of these estimators with varying $k$ in the $k$-nearest neighbor method and with varying $h$ in the kernel method with bandwidth $h$. These weak convergence results lead to asymptotic linear models which motivate certain estimators.
1. Introduction
Let $(X_1,Z_1), (X_2,Z_2), \ldots$ be two-dimensional random vectors which are iid as $(X,Z)$, and for $0 < p < 1$, let $\xi_p(x)$ denote the conditional $p$-quantile of $Z$ given $X=x$. We consider the problem of estimating $\xi_p(x)$ from the data $(X_1,Z_1),\ldots,(X_n,Z_n)$, and study the asymptotic properties of the kernel and nearest neighbor (NN) estimators as $n \to \infty$.

Usefulness of conditional quantile functions as good descriptive statistics has been discussed by Hogg (1975), who calls them percentile regression lines. The problem of conditional quantile estimation has been investigated by Bhattacharya (1963) following the fractile approach, and is also included in the general scheme of nonparametric regression considered by Stone (1977). More recently, asymptotic normality of estimators of conditional quantiles has been proved by Cheng (1983), who considered kernel estimators in the fixed design case, and by Stute (1986), who considered NN-type estimators in the random design case. However, both of these authors took the bandwidth $h_n$ to be $o(n^{-1/5})$, which kept the bias smaller than the random error by an order of magnitude, and thereby made the rate of convergence slower than optimal.
In this paper, we obtain Bahadur-type representations [Bahadur (1966)] of NN and kernel estimators of conditional quantiles as a first step in studying their asymptotics (Theorems N1 and K1). Bias terms show up in these representations because of our choice of $k = k_n = O(n^{4/5})$ in the $k$-NN method and $h = h_n = O(n^{-1/5})$ in the kernel method (with uniform kernel), for the purpose of achieving optimum balance between bias and random error. These representations are then used to examine the important issue of adaptive choice of the smoothing parameters by a local approach (for fixed $x$). Weak convergence results (Theorems N2 and K2) are proved for these estimators with varying $k$ in the NN method and with varying $h$ in the kernel method. These weak convergence results are analogous to the one obtained by Bhattacharya and Mack (1987) for $k$-NN regression estimators with varying $k$ and lead to asymptotic linear models similar to theirs, which motivate certain estimators.
"
-::-
.,..
\
i
",
I
,J
2. The Main Results
For the random vector $(X,Z)$, let $f$ denote the pdf of $X$ and $g(\cdot|x)$ the conditional pdf of $Z$ given $X=x$, with corresponding conditional cdf $G(\cdot|x)$. We want to estimate $\xi_p(x_0)$, the conditional $p$-quantile of $Z$ given $X=x_0$. Since $p \in (0,1)$ and $x_0$ will remain fixed throughout our discussion, we shall write $\xi_p(x_0) = \xi$.

The following regularity conditions are assumed.

1. (a) $f(x_0) > 0$.
   (b) $f''(x)$ exists in a neighborhood of $x_0$, and there exist $\epsilon > 0$ and $A < \infty$ such that $|x - x_0| \le \epsilon$ implies $|f''(x) - f''(x_0)| \le A|x - x_0|$.

2. (a) $g(\xi|x_0) > 0$, where $G(\xi|x_0) = p$.
   (b) The partial derivatives $g_x(z|x)$ and $g_{xx}(z|x)$ of $g(z|x)$ and $G_{xx}(z|x)$ of $G(z|x)$ exist in a neighborhood of $(x_0,\xi)$, and there exist $\epsilon > 0$ and $A < \infty$ such that $|x - x_0| \le \epsilon$ and $|z - \xi| \le \epsilon$ together imply
   $|g_z(z|x)| \le A$, $|g_x(z|x_0)| \le A$, $|g_{xx}(z|x_0)| \le A$,
   $|g_{xx}(z|x) - g_{xx}(z|x_0)| \le A|x - x_0|$, $|G_{xx}(z|x) - G_{xx}(z|x_0)| \le A|x - x_0|$.

By condition 2, $\xi$ is uniquely defined by $G(\xi|x_0) = p$.

Now let $\{(X_i,Z_i),\ i=1,2,\ldots\}$ be iid as $(X,Z)$, and let $Y_i = |X_i - x_0|$, so that $\{(Y_i,Z_i),\ i=1,2,\ldots\}$ are iid as $(Y = |X - x_0|,\ Z)$ with the pdf $f_Y$ of $Y$, the conditional pdf $g^*(\cdot|y)$ of $Z$ given $Y=y$ and the corresponding conditional cdf $G^*(\cdot|y)$ given by
-3-
(1)
g*(zly)
= [f(xO +
y) g(zlx o + y) + f(xo-y) g(zlxo-Y)] / fy(Y),
G*(zly) = [f(xO + y) G(zlxo+Y) + f(xo-y)G(zlxo-Y)] / fy(Y).
Note that
g*(zIO)
= g(zlxo) = g(z),
G*(zIO) = G(zlxo) = G(z).
Here and in what follows, we write g(zlx )
o
= g(z)
and G(zlx )
o
= G(z)
for simplicity.
Let $Y_{n1} < \cdots < Y_{nn}$ denote the order statistics and $Z_{n1},\ldots,Z_{nn}$ the induced order statistics of $(Y_1,Z_1),\ldots,(Y_n,Z_n)$, i.e., $Z_{ni} = Z_j$ if $Y_{ni} = Y_j$. For any positive integer $k \le n$, the $k$-NN empirical cdf of $Z$ (with respect to $x_0$) is now defined as:

(2) $\hat G_{nk}(z) = k^{-1} \sum_{i=1}^{k} 1(Z_{ni} \le z)$,

where $1(S)$ denotes the indicator of the event $S$. The $k$-NN estimator of $\xi$ can now be expressed as the $p$-quantile of $\hat G_{nk}$, i.e.,

(3) $\hat\xi_{nk}$ = the $[kp]$th order statistic of $Z_{n1},\ldots,Z_{nk}$ = $\inf\{z : \hat G_{nk}(z) \ge [kp]/k\}$.

The kernel estimator of $\xi$ with uniform kernel and bandwidth $h$ can also be expressed in the same manner, viz.,

(4) $\hat\xi_{nh} = \inf\{z : \hat G_{n,K_n(h)}(z) \ge [K_n(h)p]/K_n(h)\}$, $\quad K_n(h) = \sum_{i=1}^{n} 1(Y_i \le h/2) = n F_{Y,n}(h/2)$,

$F_{Y,n}$ being the empirical cdf of $Y_1,\ldots,Y_n$. The kernel estimators are thus related to the NN estimators by

(5) $\hat\xi_{nh} = \hat\xi_{n,K_n(h)}$,

where $K_n(h)$ is the random integer given by (4).
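As a concrete illustration of (2)-(5), the following minimal Python sketch (ours, not part of the paper's formal development; the function names `knn_quantile` and `kernel_quantile` and the toy model are our own choices) computes the $k$-NN estimator (3) and the kernel estimator (4), and exhibits the relation (5) by reducing the kernel estimator to a NN estimator with the random integer $K_n(h)$:

```python
import numpy as np

def knn_quantile(x, z, x0, k, p):
    """k-NN estimator (3): the [kp]-th order statistic of the induced
    order statistics Z_n1, ..., Z_nk of the k nearest neighbors of x0."""
    y = np.abs(x - x0)                 # Y_i = |X_i - x0|
    z_induced = z[np.argsort(y)][:k]   # induced order statistics Z_n1, ..., Z_nk
    j = max(int(k * p), 1)             # [kp], kept >= 1
    return np.sort(z_induced)[j - 1]   # [kp]-th smallest

def kernel_quantile(x, z, x0, h, p):
    """Kernel estimator (4) with uniform kernel and bandwidth h; by (5) it
    is the NN estimator with the random k = K_n(h) = #{i : Y_i <= h/2}."""
    K = int(np.sum(np.abs(x - x0) <= h / 2))
    if K == 0:
        raise ValueError("empty window; increase h")
    return knn_quantile(x, z, x0, K, p)

# Toy usage: X ~ uniform(-1, 1), Z | X=x ~ N(sin 2x, 1/4).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1.0, 1.0, n)
z = np.sin(2 * x) + 0.5 * rng.standard_normal(n)
k = int(n ** 0.8)                      # k_n = O(n^{4/5}), as in Section 1
print(knn_quantile(x, z, x0=0.3, k=k, p=0.5))
print(kernel_quantile(x, z, x0=0.3, h=2 * k / n, p=0.5))
```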
We now state our main results in the following two theorems, of which Theorem N1 gives a Bahadur-type representation for the $k$-NN estimator $\hat\xi_{nk}$ of $\xi$ with $k$ lying in $I_n(a,b) = \{k : a n^{4/5} \le k \le b n^{4/5}\}$, and Theorem K1 gives a corresponding representation for the kernel estimator $\hat\xi_{nh}$ with $h$ lying in $J_n(c,d) = [c n^{-1/5},\ d n^{-1/5}]$.

Theorem N1. For $0 < a < b$,
$\hat\xi_{nk} - \xi = \beta(\xi)(k/n)^2 + \{k g(\xi)\}^{-1} \sum_{i=1}^{k} [1(Z^*_{ni} > \xi) - (1-p)] + R_{nk}$,
where
$\beta(\xi) = -[f(x_0) G_{xx}(\xi|x_0) + 2 f'(x_0) G_x(\xi|x_0)] / \{24 f^3(x_0) g(\xi)\}$,
$Z^*_{ni} = G^{-1}(G^*(Z_{ni}|Y_{ni}))$, and
$\max_{k \in I_n(a,b)} |R_{nk}| = O(n^{-3/5} \log n)$, a.s.

Theorem K1. For $0 < c < d$,
$\hat\xi_{nh} - \xi = \beta(\xi) f^2(x_0) h^2 + \{m_n(h) g(\xi)\}^{-1} \sum_{i=1}^{m_n(h)} [1(Z^*_{ni} > \xi) - (1-p)] + R^*_{nh}$,
where $\beta(\xi)$ and $Z^*_{ni}$ are as in Theorem N1, $m_n(h) = [n h f(x_0)]$, and
$\sup_{h \in J_n(c,d)} |R^*_{nh}| = O(n^{-3/5} \log n)$, a.s.
Remarks.

1. Let $\mathcal{A} = \sigma\{Y_1, Y_2, \ldots\}$. Then $Z_{n1},\ldots,Z_{nn}$ are conditionally independent given $\mathcal{A}$, with $Z_{ni}$ having conditional cdf $G^*(\cdot|Y_{ni})$, as shown by Bhattacharya (1974). Hence $G^*(Z_{ni}|Y_{ni})$, $1 \le i \le n$, are conditionally independent and uniform $(0,1)$ given $\mathcal{A}$, and therefore $Z^*_{ni} = G^{-1}(G^*(Z_{ni}|Y_{ni}))$, $1 \le i \le n$, have common cdf $G$. Thus for each $n$, $Z^*_{n1},\ldots,Z^*_{nn}$ are iid with cdf $G$. Since $G(\xi) = p$, it follows that for each $n$, the summands $1(Z^*_{ni} > \xi) - (1-p)$ in the above representations are independent random variables with mean $0$ and variance $p(1-p)$. (A small simulation illustrating this construction follows these remarks.)
2. The remainder terms in both theorems are $O(n^{-3/5} \log n)$, a.s. In Theorem N1, this corresponds to $O(k^{-3/4} \log k)$ with $k = O(n^{4/5})$, as one would expect. The same explanation applies to Theorem K1, because $[n h f(x_0)] = O(n^{4/5})$ for $h = O(n^{-1/5})$.

3. Weaker versions of the above theorems were proved by Gangopadhyay (1987). His remainder terms were $o(n^{-2/5})$, a.s., in Theorem N1 and $o_p(n^{-2/5})$ in Theorem K1.
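The construction in Remark 1 can be checked by simulation. The sketch below is our illustration only; it assumes, purely for concreteness, the model $X \sim N(0,1)$, $Z \mid X = x \sim N(x,1)$, for which $G^*$ in (1) has a closed form. It verifies that the $G^*(Z_i|Y_i)$ are uniform $(0,1)$ and that the $Z^*_i = G^{-1}(G^*(Z_i|Y_i))$ have cdf $G(\cdot) = \Phi(\cdot - x_0)$:

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(1)
x0, n = 0.25, 5000
X = rng.standard_normal(n)        # X ~ N(0,1), so f is the standard normal pdf
Z = X + rng.standard_normal(n)    # Z | X=x ~ N(x,1), so G(z|x) = Phi(z - x)
Y = np.abs(X - x0)

def G_star(z, y):
    """Conditional cdf G*(z|y) of Z given Y = y, computed from display (1)."""
    w1, w2 = norm.pdf(x0 + y), norm.pdf(x0 - y)
    return (w1 * norm.cdf(z - x0 - y) + w2 * norm.cdf(z - x0 + y)) / (w1 + w2)

U = G_star(Z, Y)                  # probability integral transform given Y
Z_star = x0 + norm.ppf(U)         # Z* = G^{-1}(G*(Z|Y)) with G = N(x0,1) cdf
print(kstest(U, "uniform"))                        # consistent with uniform(0,1)
print(kstest(Z_star, lambda t: norm.cdf(t - x0)))  # consistent with cdf G
```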
3. Weak Convergence Properties of NN Estimators with Varying k and Kernel Estimators with Varying Bandwidth

Consider the stochastic processes $\{\hat\xi_{nk},\ k \in I_n(a,b)\}$ and $\{\hat\xi_{nh},\ h \in J_n(c,d)\}$. The two theorems in this section describe the weak convergence properties of suitably normalized versions of these processes, as $n \to \infty$. The symbol $\Rightarrow$ indicates convergence in distribution, i.e., weak convergence of the distributions of the stochastic processes (or random vectors) under consideration, and $\{B(t),\ t \ge 0\}$ denotes a standard Brownian motion.
Theorem N2. Let $T_n(t) = \hat\xi_{n,[n^{4/5} t]}$. Then for any $0 < a < b$,
$\{n^{2/5}[T_n(t) - \xi] - \beta t^2,\ a \le t \le b\} \Rightarrow \{\sigma t^{-1} B(t),\ a \le t \le b\}$,
where $\beta = \beta(\xi)$ as given in Theorem N1 and $\sigma^2 = p(1-p)/g^2(\xi)$.
Proof. In the representation for $\hat\xi_{nk}$ given in Theorem N1, take $k = [n^{4/5} t] = n^{4/5} t + \epsilon_n(t)$ with $0 \le \epsilon_n(t) < 1$. After a little rearrangement of terms, this leads to
$n^{2/5}[T_n(t) - \xi] - \beta(\xi) t^2 = \sqrt{p(1-p)}\,\{g(\xi)\, t\}^{-1}\, n^{-2/5} \sum_{i=1}^{[n^{4/5}t]} W_{ni} + \sum_{j=1}^{3} R_{nj}(t)$,
where

(6) $W_{ni} = \{p(1-p)\}^{-1/2}\, [1(Z^*_{ni} > \xi) - (1-p)]$

are iid with mean $0$ and variance $1$ for each $n$, in view of Remark 1. Of the three remainder terms, $R_{n1}(t) = n^{2/5} R_{n,[n^{4/5}t]}$ comes from Theorem N1, while $R_{n2}(t)$ and $R_{n3}(t)$ come from the discrepancy $0 \le \epsilon_n(t) < 1$ due to replacing $k = [n^{4/5} t]$ by $n^{4/5} t$ in the first two terms of the representation. Hence
$\sup_{a \le t \le b} |R_{n1}(t)| = O(n^{-1/5} \log n)$, a.s.,
while $R_{n2}(t)$ and $R_{n3}(t)$ are easily seen to be $O_p(n^{-2/5})$ by virtue of
$\sup_{a \le t \le b} |n^{-2/5} \sum_{i=1}^{[n^{4/5}t]} W_{ni}| = O_p(1)$.
The three remainder terms are together $O_p(n^{-1/5} \log n)$ uniformly in $a \le t \le b$. We thus have, with $\sigma = \sqrt{p(1-p)}/g(\xi)$,
$n^{2/5}[T_n(t) - \xi] - \beta(\xi) t^2 = \sigma t^{-1}\, n^{-2/5} \sum_{i=1}^{[n^{4/5}t]} W_{ni} + o_p(1)$,
uniformly in $a \le t \le b$. Now use Theorem 1, page 452 of Gikhman and Skorokhod (1969) to see that $\{n^{-2/5} \sum_{i=1}^{[n^{4/5}t]} W_{ni},\ a \le t \le b\} \Rightarrow \{B(t),\ a \le t \le b\}$. This proves the theorem.
Theorem K2. Let $S_n(t) = \hat\xi_{n, n^{-1/5} t}$. Then for any $0 < c < d$,
$\{n^{2/5}[S_n(t) - \xi] - \gamma t^2,\ c \le t \le d\} \Rightarrow \{\tau t^{-1} B(t),\ c \le t \le d\}$,
where $\gamma = \beta f^2(x_0)$ and $\tau^2 = \sigma^2 / f(x_0)$, with $\beta$ and $\sigma$ as in Theorem N2.

Proof. In the representation for $\hat\xi_{nh}$ given in Theorem K1, take $h = n^{-1/5} t$ and rearrange terms to obtain
$n^{2/5}[S_n(t) - \xi] - \gamma t^2 = \sqrt{p(1-p)}\,\{g(\xi)\}^{-1}\, n^{2/5}\, \{m_n(n^{-1/5}t)\}^{-1} \sum_{i=1}^{m_n(n^{-1/5}t)} W_{ni} + n^{2/5} R^*_{n, n^{-1/5} t}$,
where $W_{ni}$, $1 \le i \le n$, are as in the proof of Theorem N2 and $R^*_{n, n^{-1/5} t}$ is from Theorem K1. Thus
$\sup_{c \le t \le d} n^{2/5} |R^*_{n, n^{-1/5} t}| = O(n^{-1/5} \log n)$, a.s.,
and the theorem follows from the triangular array version of Donsker's theorem, referred to above.
Remark. The uniformity of the order of magnitude of the remainder terms in Theorems N1 and K1 was crucial in proving Theorems N2 and K2.
From Theorems N2 and K2, it follows that $n^{2/5}[T_n(t) - \xi]$ converges in distribution to $N(\beta t^2, \sigma^2 t^{-1})$ and $n^{2/5}[S_n(t) - \xi]$ to $N(\gamma t^2, \tau^2 t^{-1})$ for each $t$, where $N(\mu, \sigma^2)$ denotes a Gaussian r.v. with mean $\mu$ and variance $\sigma^2$. Hence the asymptotic mean-squared error (AMSE) of $T_n(t)$ is
$n^{-4/5}(\beta^2 t^4 + \sigma^2 t^{-1})$,
which is minimized at $t_1 = \{\sigma^2/(4\beta^2)\}^{1/5}$, and the AMSE of $S_n(t)$ is
$n^{-4/5}(\gamma^2 t^4 + \tau^2 t^{-1})$,
which is minimized at $t_2 = \{\tau^2/(4\gamma^2)\}^{1/5}$. (Setting the derivative $4\beta^2 t^3 - \sigma^2 t^{-2}$ equal to zero gives $t^5 = \sigma^2/(4\beta^2)$, and similarly for $t_2$.) However, these optimum $t_1$ and $t_2$ involve unknown quantities depending on the marginal distribution of $X$ and the conditional distribution of $Z$ given $X = x_0$. Although one could attempt to use consistent (but possibly non-optimal) estimates of $\beta$, $\sigma^2$, $\gamma$ and $\tau^2$ to approximate $t_1$ and $t_2$ in the spirit of Woodroofe (1970) and Krieger and Pickands (1981), we shall take another approach in the next section by considering linear combinations of $k$-NN estimators with varying $k$ and kernel estimators with varying bandwidth.
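Although we do not pursue the plug-in route, it is easy to make concrete. The sketch below is ours; the numerical values of $\beta$, $\sigma^2$ and $f(x_0)$ are invented stand-ins for the consistent estimates mentioned above. It computes $t_1$, $t_2$ and the corresponding smoothing parameters $[n^{4/5} t_1]$ and $n^{-1/5} t_2$:

```python
import numpy as np

def optimal_t(bias_coef, var_coef):
    """Minimize t -> bias_coef**2 * t**4 + var_coef / t: setting the
    derivative 4*bias_coef**2*t**3 - var_coef/t**2 to zero gives
    t**5 = var_coef / (4 * bias_coef**2)."""
    return (var_coef / (4.0 * bias_coef ** 2)) ** 0.2

# Invented values of the unknown constants, for illustration only.
beta, sigma2, f_x0 = -0.8, 2.1, 0.5
gamma = beta * f_x0 ** 2          # Theorem K2: gamma = beta * f(x0)^2
tau2 = sigma2 / f_x0              # Theorem K2: tau^2 = sigma^2 / f(x0)

n = 2000
t1, t2 = optimal_t(beta, sigma2), optimal_t(gamma, tau2)
k_opt = int(n ** 0.8 * t1)        # optimum number of NN's, [n^{4/5} t1]
h_opt = n ** -0.2 * t2            # optimum bandwidth, n^{-1/5} t2
print(t1, t2, k_opt, h_opt)
```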
4. Asymptotic Linear Models and Linear Combinations of $\hat\xi_{nk}$ and $\hat\xi_{nh}$

Neglect the remainder term in Theorem N1 to obtain the following asymptotic linear model for $\{\hat\xi_{nk},\ k_0 \le k \le k_1\}$:

(7) $\hat\xi_{nk} \approx \xi + (k/n)^2 \beta + \sigma \bar W_{nk}$, $\quad k_0 \le k \le k_1$,

where $\beta$ and $\sigma$ are as in Theorem N2 and
$\bar W_{nk} = k^{-1} \sum_{i=1}^{k} W_{ni}$,
in which $W_{ni}$, $1 \le i \le n$, are the iid r.v.'s with mean $0$ and variance $1$ given by (6). Hence $\mathrm{Cov}(\bar W_{nk}, \bar W_{nk'}) = \{\max(k,k')\}^{-1}$. This is exactly like the asymptotic linear model for $k$-NN regression estimates obtained by Bhattacharya and Mack (1987). Following their approach, we can now obtain the BLUE of $\xi$ in the model (7) and other suitably biased linear combinations of the $\hat\xi_{nk}$'s. The asymptotic distribution of the BLUE of $\xi$ and its ARE with respect to $\hat\xi_{n,[n^{4/5} t_1]}$, where $[n^{4/5} t_1]$ is the optimum number of NN's, follow from Theorem N2 in exactly the same way as in the case of regression estimates discussed in [4].
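To indicate how the BLUE in (7) can be computed in practice, here is a sketch (our own illustration of the approach of Bhattacharya and Mack (1987), not code from that paper). It assumes the $\hat\xi_{nk}$'s have been evaluated on a grid of $k$ values and fits the design $(1, (k/n)^2)$ with unknown coefficients $(\xi, \beta)$ by generalized least squares, using the error covariance $\mathrm{Cov}(\bar W_{nk}, \bar W_{nk'}) = \{\max(k,k')\}^{-1}$ noted above:

```python
import numpy as np

def blue_of_xi(xi_hat, ks, n):
    """GLS in the asymptotic linear model (7):
    xi_hat[k] = xi + beta * (k/n)**2 + sigma * Wbar_k,
    Cov(Wbar_k, Wbar_k') = 1 / max(k, k'). Returns (xi_BLUE, beta_hat)."""
    ks = np.asarray(ks, dtype=float)
    X = np.column_stack([np.ones_like(ks), (ks / n) ** 2])
    Vinv = np.linalg.inv(1.0 / np.maximum.outer(ks, ks))  # inverse error covariance
    A = X.T @ Vinv @ X
    return np.linalg.solve(A, X.T @ Vinv @ np.asarray(xi_hat, dtype=float))

# Usage with synthetic xi_hat values drawn from the model itself:
rng = np.random.default_rng(2)
n, xi, beta, sigma = 2000, 1.0, -0.8, 1.4
ks = np.arange(200, 440, 8)
V = 1.0 / np.maximum.outer(ks.astype(float), ks.astype(float))
xi_hat = xi + beta * (ks / n) ** 2 + sigma * np.linalg.cholesky(V) @ rng.standard_normal(ks.size)
print(blue_of_xi(xi_hat, ks, n))   # close to (xi, beta)
```

The same routine covers the kernel case after the reduction to the model (9) described in the next paragraph.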
The kernel estimators $\hat\xi_{nh}$ with $n^{-1/5} c \le h \le n^{-1/5} d$ also satisfy an asymptotic linear model similar to (7). For this, first neglect the remainder term in Theorem K1 to obtain

(8) $\hat\xi_{nh} \approx \xi + h^2 f^2(x_0) \beta + \sigma \bar W_{n, m_n(h)}$, $\quad n^{-1/5} c \le h \le n^{-1/5} d$,

where $\beta$, $\sigma$ and $\bar W_{nk}$ are as in (7). Although this linear model is indexed by a continuous parameter $h$, the problems of finding the BLUE of $\xi$ and other suitably biased linear combinations in this model are essentially the same as the corresponding problems in (7). All questions about these linear combinations are thus covered by our discussion in the previous paragraph. To see this, let $m_0 = [n^{4/5} c f(x_0)]$, $m_1 = [n^{4/5} d f(x_0)]$, and define $h(m) = m / \{n f(x_0)\}$. Then for $h(m) \le h < h(m+1)$, $m_n(h) = m$, so that using this in (8), we have $\hat\xi_{nh} \approx \hat\xi_{nh(m)}$ and

(9) $\hat\xi_{nh(m)} \approx \xi + (m/n)^2 \beta + \sigma \bar W_{nm}$, $\quad m_0 \le m \le m_1$.

All linear combinations $\int_{n^{-1/5} c}^{n^{-1/5} d} \ell(h)\, \hat\xi_{nh}\, dh$ of kernel estimators with varying bandwidth are, therefore, approximated by linear combinations of $\{\hat\xi_{nh(m)},\ m_0 \le m \le m_1\}$, for which the asymptotic linear model (9) is identical to (7). The asymptotic properties of these linear combinations follow from Theorem K2. The details are as in [7].
5. Proof of Theorem N1: Preliminary Lemmas

The $k$-NN estimator $\hat\xi_{nk}$ of $\xi$ is the $p$-quantile of the empirical cdf $\hat G_{nk}$ of $Z_{n1},\ldots,Z_{nk}$, which are conditionally independent with $Z_{ni}$ having conditional cdf $G^*(\cdot|Y_{ni})$. It is, therefore, natural to think of $\hat\xi_{nk}$ as an estimator of the $p$-quantile $\xi_{nk}$ of the random cdf $k^{-1} \sum_{i=1}^{k} G^*(\cdot|Y_{ni})$. The discrepancy between this random cdf and the cdf $G^*(\cdot|0) = G(\cdot)$, of which $\xi$ is the $p$-quantile, is going to give rise to a bias in addition to the random error in estimating $\xi_{nk}$ by $\hat\xi_{nk}$. To facilitate the examination of this bias and the random error, we introduce some notations. Let

(10) $g^*(\cdot|Y_{ni}) = g_{ni}(\cdot)$, $\quad G^*(\cdot|Y_{ni}) = G_{ni}(\cdot)$, $\quad \bar G_{nk}(\cdot) = k^{-1} \sum_{i=1}^{k} G_{ni}(\cdot)$.

Then $\xi_{nk}$, which is the target of $\hat\xi_{nk}$, is given by

(11) $\xi_{nk} = \inf\{z : \bar G_{nk}(z) \ge p\}$.

To examine the asymptotic properties of $\hat\xi_{nk} - \xi_{nk}$ and $\xi_{nk} - \xi$ for $k \in I_n(a,b)$, we now analyze the corresponding properties of $\hat G_{nk}(\cdot) - \bar G_{nk}(\cdot)$ and $\bar G_{nk}(\cdot) - G(\cdot)$. The order statistics $0 < Y_{n1} < \cdots < Y_{n,[n^{4/5}b]}$ are of the order of $n^{-1/5}$. Consequently, for $k \in I_n(a,b)$, it should be possible to approximate the pdf's $g_{ni}$ and the cdf's $G_{ni}$ defined in (10) by the first few terms of their expansions in powers of $Y_{ni}$, $i \le [n^{4/5} b]$. To this end, we have the following lemmas.
Lemma 1. For $B > b/f(x_0)$ and for sufficiently large $n$,
$P[Y_{n,[n^{4/5}b]} > n^{-1/5} B] \le 2 \exp[-2 n^{3/5} \{B f(x_0) - b\}^2]$.

Proof. This is proved in [4].
Lemma 2. $k^{-1} \sum_{i=1}^{k} Y_{ni}^2 = \{12 f^2(x_0)\}^{-1} (k/n)^2 + R_{nk}$, where
$\max_{k \le [n^{4/5}b]} |R_{nk}| = O(n^{-7/10} (\log\log n)^{1/2})$, a.s.
Proof. Let $0 < U_{n1} < \cdots < U_{nn} < 1$ denote the order statistics of a random sample of size $n$ from uniform $(0,1)$, and let $\varphi = F_Y^{-1}$. Then $\varphi(0) = \varphi''(0) = 0$, $\varphi'(0) = \{2 f(x_0)\}^{-1}$ and $\varphi'''$ is continuous at $0$ by condition 1. Use this to expand $Y_{ni} = \varphi(U_{ni})$ to the cubic term and observe that $U_{n,[n^{4/5}b]} = O(n^{-1/5})$, a.s., by Lemma 1. This leads to

(12) $k^{-1} \sum_{i=1}^{k} Y_{ni}^2 = \{2 f(x_0)\}^{-2}\, k^{-1} \sum_{i=1}^{k} U_{ni}^2 + R_{nk}(1)$,

where $\max_{k \le [n^{4/5}b]} |R_{nk}(1)| = O(n^{-4/5})$, a.s. Now write

(13) $k^{-1} \sum_{i=1}^{k} U_{ni}^2 = k^{-1} \sum_{i=1}^{k} (i/n)^2 + R_{nk}(2) = \tfrac{1}{3}(k/n)^2 + R_{nk}(2) + R_{nk}(3)$,

where
$|R_{nk}(2)| \le 2 k^{-1} \sum_{i=1}^{k} (i/n) |U_{ni} - i/n| + k^{-1} \sum_{i=1}^{k} (U_{ni} - i/n)^2 \le 2(k/n) \max_{1 \le i \le n} |U_{ni} - i/n| + \max_{1 \le i \le n} (U_{ni} - i/n)^2$,
so that
$\max_{k \le [n^{4/5}b]} |R_{nk}(2)| = O(n^{-7/10} (\log\log n)^{1/2})$, a.s.,
by the law of the iterated logarithm (see [6], page 157), and
$|R_{nk}(3)| = |k^{-1} \sum_{i=1}^{k} (i/k)^2 - \int_0^1 x^2\, dx|\,(k/n)^2 \le (k+1)/n^2$,
so that
$\max_{k \le [n^{4/5}b]} |R_{nk}(3)| = O(n^{-6/5})$.
Combining (12) and (13), the lemma is proved.
Lemma 3. The following expansions hold for the conditional pdf $g^*(z|y)$ and the conditional cdf $G^*(z|y)$:
$g^*(z|y) = g(z) + \tfrac{1}{2} y^2 q(z) + y^3 r(y,z)$,
$G^*(z|y) = G(z) + \tfrac{1}{2} y^2 Q(z) + y^3 R(y,z)$,
where
$q(z) = [f(x_0)\, g_{xx}(z|x_0) + 2 f'(x_0)\, g_x(z|x_0)] / f(x_0)$,
$Q(z) = [f(x_0)\, G_{xx}(z|x_0) + 2 f'(x_0)\, G_x(z|x_0)] / f(x_0)$,
and there exist $\epsilon > 0$ and $M < \infty$ such that $|q(z)|$, $|Q(z)|$, $|r(y,z)|$ and $|R(y,z)|$ are all bounded by $M$ for $0 \le y \le \epsilon$ and $|z - \xi| \le \epsilon$.

Proof. Expand $f(x_0 \pm y)$, $g(z|x_0 \pm y)$ and $G(z|x_0 \pm y)$ about $y = 0$ to obtain:
$f(x_0 \pm y) = f(x_0) \pm y f'(x_0) + \tfrac{1}{2} y^2 \{f''(x_0) + \Delta_1(y)\}$,
$g(z|x_0 \pm y) = g(z|x_0) \pm y g_x(z|x_0) + \tfrac{1}{2} y^2 \{g_{xx}(z|x_0) + \Delta_2(y,z)\}$,
$G(z|x_0 \pm y) = G(z|x_0) \pm y G_x(z|x_0) + \tfrac{1}{2} y^2 \{G_{xx}(z|x_0) + \Delta_3(y,z)\}$,
with $\max\{|\Delta_1(y)|,\ |\Delta_2(y,z)|,\ |\Delta_3(y,z)|\} \le A y$ for $0 \le y \le \epsilon$ and $|z - \xi| \le \epsilon$, where $\epsilon > 0$ and $A < \infty$ are as in conditions 1 and 2. To obtain the formulas for $g^*(z|y)$ and $G^*(z|y)$ stated in the lemma, use the above expansions in (1). It will be seen by appropriate arrangement of terms that in the remainders $y^3 r(y,z)$ for $g^*(z|y)$ and $y^3 R(y,z)$ for $G^*(z|y)$, the quantities $|r(y,z)|$ and $|R(y,z)|$ for $0 \le y \le \epsilon$ and $|z - \xi| \le \epsilon$ remain bounded by a constant $M$ determined by $f(x_0)$, $f''(x_0)$, $g(\xi|x_0)$, $g_{xx}(\xi|x_0)$, $G_x(\xi|x_0)$, $G_{xx}(\xi|x_0)$ and $A$, where $\epsilon > 0$ and $A < \infty$ are as in conditions 1 and 2. The boundedness of $|q(z)|$ and $|Q(z)|$ for $|z - \xi| \le \epsilon$ also follows from condition 2.

6. Proof of Theorem N1: Bias in $\xi_{nk}$
.
Recall that the target of (nk is (nk' the p-quantile of the random cdf G"nk(·) =
k
k-1
G*(·I Y ni)' while ( is the p-{}uantile of G(·). The leading term of ~nk - ~ is
f
-14 -
non-stochastic with probability 1~ which is determined in this section. This is the bias in
A
~nk' We first use Lemma 3 to bound the discrepancy between G"nk(') and G(') near (,
.
:: :
',"
in terms of Y k' This makes
max
I~ k-~ I small whenever Y 54 is small, in
kE I n(a,b) n
n,[n b]
n
the manner described in the foiI6wing lemma.
•
The almost sure order of magnitude of
~nk-~ is obtained as a corollar~. "
I
;
Lemma 4. For every $B$, there exist $N$ and $C$ such that in the sample space of infinite sequences $\{(Y_i, Z_i),\ i = 1, 2, \ldots\}$, $Y_{n,[n^{4/5}b]} < B n^{-1/5}$ implies $\max_{k \in I_n(a,b)} |\xi_{nk} - \xi| \le C n^{-2/5}$ for all $n \ge N$.

Proof. Fix $B < \infty$ and $0 < a < b$. By (11), it is enough to show the existence of $N$ and $C$ such that for $n \ge N$ and $k \in I_n(a,b)$, $Y_{n,[n^{4/5}b]} < B n^{-1/5}$ implies
$\bar G_{nk}(\xi - C n^{-2/5}) \le G(\xi) \le \bar G_{nk}(\xi + C n^{-2/5})$.
For this, assume $Y_{n,[n^{4/5}b]} < B n^{-1/5}$, choose $N$ and $C$ so that

(14) $B n^{-1/5} \le \epsilon$ and $C n^{-2/5} \le \epsilon$ for $n \ge N$,

and use Lemma 3 to obtain

(15) $|\bar G_{nk}(z) - G(z)| \le M B^2 n^{-2/5}$ for $|z - \xi| \le \epsilon$ and $n \ge N$.

Moreover, since by condition 2,
$G(\xi - C n^{-2/5}) \le G(\xi) - \tfrac{1}{2} C n^{-2/5} g(\xi)$ and $G(\xi + C n^{-2/5}) \ge G(\xi) + \tfrac{1}{2} C n^{-2/5} g(\xi)$
for $C n^{-2/5} \le \epsilon$ and $n \ge N$, (15) implies
$\bar G_{nk}(\xi + C n^{-2/5}) \ge G(\xi) + \tfrac{1}{2} C n^{-2/5} g(\xi) - M B^2 n^{-2/5}$,
and similarly $\bar G_{nk}(\xi - C n^{-2/5}) \le G(\xi) - \tfrac{1}{2} C n^{-2/5} g(\xi) + M B^2 n^{-2/5}$. The lemma is proved by choosing $C > 2 M B^2 / g(\xi)$ and then choosing $N$ so as to satisfy (14).
Corollary. For $0 < a < b$, $\max_{k \in I_n(a,b)} |\xi_{nk} - \xi| = O(n^{-2/5})$, a.s.

Proof. Take $B > b/f(x_0)$ and apply Lemma 4 using $C$ and $N$ appropriately determined by $B$. Then by Lemma 1,
$\sum_{n=N}^{\infty} P[Y_{n,[n^{4/5}b]} > B n^{-1/5}] \le \sum_{n=N}^{\infty} 2 \exp[-2 n^{3/5} \{B f(x_0) - b\}^2] < \infty$,
and the conclusion follows from the Borel-Cantelli lemma.

We now determine the leading term of $\xi_{nk} - \xi$.

Lemma 5. For $0 < a < b$, $\max_{k \in I_n(a,b)} |\xi_{nk} - \xi - \beta(\xi)(k/n)^2| = O(n^{-3/5})$, a.s.

Proof. By (11) and Lemma 3,
$p = \bar G_{nk}(\xi_{nk}) = k^{-1} \sum_{i=1}^{k} [G_{ni}(\xi) + (\xi_{nk} - \xi)\, g_{ni}(z_{ni})]$
$\quad = G(\xi) + k^{-1} \sum_{i=1}^{k} \{\tfrac{1}{2} Y_{ni}^2 Q(\xi) + Y_{ni}^3 R(Y_{ni}, \xi)\} + (\xi_{nk} - \xi)\, [g(\xi) + k^{-1} \sum_{i=1}^{k} \{g(z_{ni}) - g(\xi) + \tfrac{1}{2} Y_{ni}^2 q(z_{ni}) + Y_{ni}^3 r(Y_{ni}, z_{ni})\}]$,
where

(16) $z_{ni}$ lies between $\xi$ and $\xi_{nk}$.

Hence, since $G(\xi) = p$,
$\xi_{nk} - \xi = -\,\dfrac{\tfrac{1}{2} Q(\xi)\, k^{-1} \sum_{i=1}^{k} Y_{ni}^2 + k^{-1} \sum_{i=1}^{k} Y_{ni}^3 R(Y_{ni}, \xi)}{g(\xi) + k^{-1} \sum_{i=1}^{k} \{g(z_{ni}) - g(\xi) + \tfrac{1}{2} Y_{ni}^2 q(z_{ni}) + Y_{ni}^3 r(Y_{ni}, z_{ni})\}} = \beta(\xi)(k/n)^2 + R_{nk}(1) + R_{nk}(2) + R_{nk}(3)$,
where $\max_{k \in I_n(a,b)} |R_{nk}(1)| = O(n^{-3/5})$, a.s., by Lemma 2, $\max_{k \in I_n(a,b)} |R_{nk}(2)| = O(n^{-3/5})$, a.s., by Lemmas 1 and 3, and $\max_{k \in I_n(a,b)} |R_{nk}(3)| = O(n^{-4/5})$, a.s., by (16), Lemmas 1 and 3, and condition 2. Since $(k/n)^2 \le n^{-2/5} b^2$ for $k \in I_n(a,b)$, the lemma is proved.
7. Proof of Theorem N1: Conclusion

We shall first express $\hat\xi_{nk} - \xi_{nk}$ in a manner analogous to the familiar representation of sample quantiles [Bahadur (1966)]. The random variables involved in this representation are $1(Z_{ni} > \xi_{nk}) - (1 - G_{ni}(\xi_{nk}))$, which are conditionally independent given $\mathcal{A} = \sigma\{Y_1, Y_2, \ldots\}$, and the remainder term is $O(n^{-3/5} \log n)$, a.s., which corresponds to $O(k^{-3/4} \log k)$ with $k = O(n^{4/5})$, as one would expect. This representation is then modified to another one in which the random variables $1(Z_{ni} > \xi_{nk}) - (1 - G_{ni}(\xi_{nk}))$ are replaced by $1(Z_{ni} > \xi) - (1 - G_{ni}(\xi))$, and to yet another one in which these random variables are replaced by $1(Z^*_{ni} > \xi) - (1 - G(\xi))$, where $Z^*_{ni} = G^{-1} \circ G_{ni}(Z_{ni})$, $1 \le i \le n$, are iid with cdf $G(\cdot)$. Together with the bias term obtained in Lemma 5, this will establish the representation of $\hat\xi_{nk}$ given in Theorem N1. Having completed the proof of Theorem N1 in this section, we shall then prove Theorem K1 in the next section, via formula (5), establishing the corresponding representation for the kernel estimator $\hat\xi_{nh}$ (with uniform kernel and bandwidth $h$). It should be emphasized that Theorem N1 is proved uniformly in $k \in I_n(a,b)$ with $0 < a < b$ and Theorem K1 is proved uniformly in $h \in J_n(c,d)$ with $0 < c < d$, where $I_n(a,b)$ and $J_n(c,d)$ are as given in Section 2.

The first representation of $\hat\xi_{nk} - \xi_{nk}$ rests on Lemmas 8 and 9, which run parallel to Bahadur's proof [1]. However, we start with a lemma which provides an exponential bound for the deviation of sums of independent Bernoulli variables from their mean, and then prove another lemma dealing with fluctuations of $\hat G_{nk}(\cdot) - \bar G_{nk}(\cdot)$.
Lemma 6. Let $U_{n1},\ldots,U_{nn}$ be independent Bernoulli variables with $P(U_{ni} = 1) = \pi_{ni}$. Then
$P[|n^{-1} \sum_{i=1}^{n} (U_{ni} - \pi_{ni})| > t_n] \le 2 \exp[-\tfrac{1}{2}\, n t_n^2 / \{\max_{1 \le i \le n} \pi_{ni} + t_n\}]$.
In particular, if $t_n / \max_{1 \le i \le n} \pi_{ni} \to 0$, then for large $n$,
$P[|n^{-1} \sum_{i=1}^{n} (U_{ni} - \pi_{ni})| > t_n] \le 2 \exp[-\tfrac{1}{4}\, n t_n^2 / \max_{1 \le i \le n} \pi_{ni}]$,
and if $\max_{1 \le i \le n} \pi_{ni} / t_n \to 0$, then for large $n$,
$P[|n^{-1} \sum_{i=1}^{n} (U_{ni} - \pi_{ni})| > t_n] \le 2 \exp[-\tfrac{1}{4}\, n t_n]$.

Proof. The first inequality is a simplified version of Bernstein's inequality (see [14], page 205), from which the other two follow as special cases.
Lemma 7. Suppose $\zeta_{nk}$, $k \in I_n(a,b)$, are $\mathcal{A}$-measurable random variables with $|\zeta_{nk} - \xi_{nk}| \le C n^{-2/5} \log n = \epsilon_n(C)$. Then for any $\gamma$, there exists $M$ such that
$\sum_{n=1}^{\infty} P[\max_{k \in I_n(a,b)} |k^{-1} \sum_{i=1}^{k} (U_{nki} - \mu_{nki})| > M n^{-3/5} \log n] < \infty$,
where $U_{nki} = 1(Z_{ni} \le \zeta_{nk}) - 1(Z_{ni} \le \xi_{nk})$ and $\mu_{nki} = E(U_{nki} \mid \mathcal{A})$.

Proof. Choose $B > b/f(x_0)$, and for each $n$, let $S_n = \{Y_{n,[n^{4/5}b]} \le B n^{-1/5}\}$. Since $|\zeta_{nk} - \xi_{nk}| \le \epsilon_n(C)$, Lemma 4 implies that there exist $C'$ and $N$ such that for $n \ge N$ and for $z$ lying between $\zeta_{nk}$ and $\xi_{nk}$,
$|z - \xi| \le C n^{-2/5} \log n + C' n^{-2/5} \le 2 \epsilon_n(C)$
holds on the set $S_n$. Using Lemma 3, we now conclude that when $n$ is large, then on $S_n$,
$|\mu_{nki}| \le \tfrac{5}{4}\, g(\xi)\, \epsilon_n(C)$ for all $k \in I_n(a,b)$, $1 \le i \le k$.
It now follows from Lemma 6 that for sufficiently large $n$,
$\max_{k \in I_n(a,b)} P[|k^{-1} \sum_{i=1}^{k} (U_{nki} - \mu_{nki})| > M n^{-3/5} \log n,\ S_n] \le 2 \exp[-M^2 a \{5 C g(\xi)\}^{-1} \log n]$,
so that
$P[\max_{k \in I_n(a,b)} |k^{-1} \sum_{i=1}^{k} (U_{nki} - \mu_{nki})| > M n^{-3/5} \log n] \le 2(b-a)\, n^{4/5} \exp[-M^2 a \{5 C g(\xi)\}^{-1} \log n] + P(S_n^c)$.
To complete the proof, observe that
$n^{4/5} \exp[-M^2 a \{5 C g(\xi)\}^{-1} \log n] = n^{4/5 - M^2 a \{5 C g(\xi)\}^{-1}} \le n^{-\gamma}$
for sufficiently large $M$ (so that the series converges when $\gamma > 1$), and $\sum_{n=1}^{\infty} P(S_n^c) < \infty$ by Lemma 1.
We now define $a_n = n^{-2/5} \log n$ and $b_n = n^{1/5}$, and divide the interval $[\xi_{nk} - a_n,\ \xi_{nk} + a_n]$ into $2 b_n$ equal intervals
$J_{nk,r} = [\eta_{nk,r},\ \eta_{nk,r+1}]$, $\quad \eta_{nk,r} = \xi_{nk} + r\, a_n / b_n$, $\quad -b_n \le r \le b_n - 1$,
each of length $a_n / b_n = n^{-3/5} \log n$. Let

(17) $H_{nk}(z) = \{\hat G_{nk}(z) - \hat G_{nk}(\xi_{nk})\} - \{\bar G_{nk}(z) - \bar G_{nk}(\xi_{nk})\}$,
$H^*_{nk} = \sup_{|z - \xi_{nk}| \le a_n} |H_{nk}(z)| = \max_{-b_n \le r \le b_n - 1}\ \sup_{z \in J_{nk,r}} |H_{nk}(z)|$,
$H^*_n = \max_{k \in I_n(a,b)} H^*_{nk}$.
Lemma 8. $P[\max_{k \in I_n(a,b)} |\hat\xi_{nk} - \xi_{nk}| > a_n$ i.o.$] = 0$.

Proof. Consider $\min_{k \in I_n(a,b)} \{[kp]/k - \bar G_{nk}(\xi_{nk} - a_n)\}$. Fix $B > b/f(x_0)$ and let $S_n = \{Y_{n,[n^{4/5}b]} \le B n^{-1/5}\}$. Then by Lemma 3,
$\{[kp]/k - \bar G_{nk}(\xi_{nk} - a_n)\} \ge \tfrac{1}{2}\, g(\xi)\, a_n$
on the set $S_n$ when $n$ is large. Hence for large $n$,
$P[\min_{k \in I_n(a,b)} (\hat\xi_{nk} - \xi_{nk}) \le -a_n] \le \sum_{k \in I_n(a,b)} P[\hat G_{nk}(\xi_{nk} - a_n) \ge [kp]/k,\ S_n] + P(S_n^c) \le n^{4/5}(b-a) \exp[-\tfrac{1}{2}\, a\, g^2(\xi) (\log n)^2] + P(S_n^c)$
by Theorem 1 of Hoeffding (1963). Since
$\sum_{n=1}^{\infty} n^{4/5} \exp[-\tfrac{1}{2}\, a\, g^2(\xi) (\log n)^2] < \infty$ and $\sum_{n=1}^{\infty} P(S_n^c) < \infty$
by Lemma 1,
$P[\min_{k \in I_n(a,b)} (\hat\xi_{nk} - \xi_{nk}) \le -a_n$ i.o.$] = 0$.
In the same way,
$P[\max_{k \in I_n(a,b)} (\hat\xi_{nk} - \xi_{nk}) \ge a_n$ i.o.$] = 0$,
and the lemma is proved.
Lemma 9. $P[H^*_n > C n^{-3/5} \log n$ i.o.$] = 0$ for large $C$.

Proof. It follows from the monotonicity of $\hat G_{nk}(\cdot)$ and $\bar G_{nk}(\cdot)$ that for $z \in J_{nk,r} = [\eta_{nk,r}, \eta_{nk,r+1}]$,
$|H_{nk}(z)| \le \max\{|H_{nk}(\eta_{nk,r})|,\ |H_{nk}(\eta_{nk,r+1})|\} + \alpha_{nk,r}$,
where $H_{nk}(\cdot)$ is given by (17), and $\alpha_{nk,r} = \bar G_{nk}(\eta_{nk,r+1}) - \bar G_{nk}(\eta_{nk,r})$. Hence
$H^*_{nk} \le \max_{-b_n \le r \le b_n} |H_{nk}(\eta_{nk,r})| + \max_{-b_n \le r \le b_n - 1} \alpha_{nk,r}$.
Let $S_n = \{Y_{n,[n^{4/5}b]} \le B n^{-1/5}\}$ as in the previous proofs. Then by Lemmas 3 and 4,
$\max_{-b_n \le r \le b_n - 1} \alpha_{nk,r} \le 2\, g(\xi)\, n^{-3/5} \log n$
on the set $S_n$ when $n$ is large. Hence
$P[H^*_n > \{M + 2 g(\xi)\}\, n^{-3/5} \log n] \le P[\max_{k \in I_n(a,b)} \max_{-b_n \le r \le b_n} |H_{nk}(\eta_{nk,r})| > M n^{-3/5} \log n,\ S_n] + P(S_n^c)$
$\le 2 n (b-a) \max_{k \in I_n(a,b)} \max_{-b_n \le r \le b_n} P[|H_{nk}(\eta_{nk,r})| > M n^{-3/5} \log n,\ S_n] + P(S_n^c)$.
But $\max_{-b_n \le r \le b_n} |\eta_{nk,r} - \xi_{nk}| \le n^{-2/5} \log n$, and by Lemma 7,
$\sum_{n=1}^{\infty} n \max_{k \in I_n(a,b)} \max_{-b_n \le r \le b_n} P[|H_{nk}(\eta_{nk,r})| > M n^{-3/5} \log n,\ S_n] < \infty$,
while $\sum_{n=1}^{\infty} P(S_n^c) < \infty$ by Lemma 1. This completes the proof.
By Lemmas 8 and 9, we now have:

(18) $p - \hat G_{nk}(\xi_{nk}) = \bar G_{nk}(\hat\xi_{nk}) - \bar G_{nk}(\xi_{nk}) + R_{nk}(1) = (\hat\xi_{nk} - \xi_{nk})\, \bar g_{nk}(\xi^*_{nk}) + R_{nk}(1)$,

where $\xi^*_{nk}$ lies between $\xi_{nk}$ and $\hat\xi_{nk}$, $\bar g_{nk} = k^{-1} \sum_{i=1}^{k} g_{ni}$, and

(19) $\max_{k \in I_n(a,b)} |R_{nk}(1)| = O(n^{-3/5} \log n)$, a.s.

By the corollary to Lemma 4 and Lemma 8,
$\max_{k \in I_n(a,b)} |\xi^*_{nk} - \xi| = O(n^{-2/5} \log n)$, a.s.
Consequently, Lemmas 1 and 3 now imply

(20) $\max_{k \in I_n(a,b)} |\bar g_{nk}(\xi^*_{nk}) - g(\xi)| = O(n^{-2/5} \log n)$, a.s.

Again, letting $U_{nki} = 1(Z_{ni} \le \xi_{nk})$, we have
$P[\max_{k \in I_n(a,b)} |p - \hat G_{nk}(\xi_{nk})| > n^{-2/5} \log n] \le \sum_{k \in I_n(a,b)} E\, P[|k^{-1} \sum_{i=1}^{k} \{U_{nki} - E(U_{nki} \mid \mathcal{A})\}| > n^{-2/5} \log n \mid \mathcal{A}] \le 2(b-a)\, n^{4/5} \exp[-2 a (\log n)^2]$
by Theorem 1 of Hoeffding (1963), and $\sum_{n=1}^{\infty} n^{4/5} \exp[-2 a (\log n)^2] < \infty$. Hence

(21) $\max_{k \in I_n(a,b)} |p - \hat G_{nk}(\xi_{nk})| = O(n^{-2/5} \log n)$, a.s.

From (18), (19), (20), and (21), we have
$\max_{k \in I_n(a,b)} |(\hat\xi_{nk} - \xi_{nk}) - \{g(\xi)\}^{-1}\, [p - \hat G_{nk}(\xi_{nk})]| = O(n^{-3/5} \log n)$, a.s.
Since $p - \hat G_{nk}(\xi_{nk}) = k^{-1} \sum_{i=1}^{k} [1(Z_{ni} > \xi_{nk}) - \{1 - G_{ni}(\xi_{nk})\}]$, we now have the following representation:

(22a) $\hat\xi_{nk} = \xi_{nk} + \{k g(\xi)\}^{-1} \sum_{i=1}^{k} [1(Z_{ni} > \xi_{nk}) - \{1 - G_{ni}(\xi_{nk})\}] + R_{nk}$, $\quad \max_{k \in I_n(a,b)} |R_{nk}| = O(n^{-3/5} \log n)$, a.s.
This representation can be easily modified to two other slightly different forms, viz.,

(22b) $\hat\xi_{nk} = \xi_{nk} + \{k g(\xi)\}^{-1} \sum_{i=1}^{k} [1(Z_{ni} > \xi) - \{1 - G_{ni}(\xi)\}] + R_{nk}$

and

(22c) $\hat\xi_{nk} = \xi_{nk} + \{k g(\xi)\}^{-1} \sum_{i=1}^{k} [1(Z^*_{ni} > \xi) - \{1 - G(\xi)\}] + R_{nk}$,

where $\max_{k \in I_n(a,b)} |R_{nk}| = O(n^{-3/5} \log n)$, a.s., in both (22b) and (22c), and $Z^*_{ni} = G^{-1} \circ G_{ni}(Z_{ni})$ and $G(\cdot) = G(\cdot|x_0)$ is the conditional cdf of $Z$ given $X = x_0$. Note that since the $Z_{ni}$ are conditionally independent given $\mathcal{A}$, with $Z_{ni}$ having conditional cdf $G_{ni}$,
$P[Z^*_{ni} \le z_i,\ i=1,\ldots,n] = E\, P[Z^*_{ni} \le z_i,\ i=1,\ldots,n \mid \mathcal{A}] = E\, P[Z_{ni} \le G_{ni}^{-1} \circ G(z_i),\ i=1,\ldots,n \mid \mathcal{A}] = E \prod_{i=1}^{n} G(z_i) = \prod_{i=1}^{n} G(z_i)$,
so that for each $n$, $Z^*_{n1},\ldots,Z^*_{nn}$ are iid with cdf $G(\cdot)$.

To obtain (22b) from (22a), use Lemma 7 with $\zeta_{nk} = \xi$ to conclude that
$\max_{k \in I_n(a,b)} |k^{-1} \sum_{i=1}^{k} [\{1(Z_{ni} \le \xi_{nk}) - 1(Z_{ni} \le \xi)\} - \{G_{ni}(\xi_{nk}) - G_{ni}(\xi)\}]| = O(n^{-3/5} \log n)$, a.s.
To obtain (22c) from (22b), it is enough to show that

(23) $\max_{k \in I_n(a,b)} |k^{-1} \sum_{i=1}^{k} (U_{ni} - \mu_{ni})| = O(n^{-3/5} \log n)$, a.s.,

where $U_{ni} = 1(Z^*_{ni} > G^{-1} \circ G_{ni}(\xi)) - 1(Z^*_{ni} > \xi)$ and $\mu_{ni} = E(U_{ni} \mid \mathcal{A}) = G(\xi) - G_{ni}(\xi)$. For this, use Lemma 3 to observe that for large $n$,
$\max_{i \le [n^{4/5}b]} |\mu_{ni}| \le B^2 |Q(\xi)|\, n^{-2/5}$ on the set $S_n = \{Y_{n,[n^{4/5}b]} \le B n^{-1/5}\}$,
and then use Lemma 6 and Lemma 1 to obtain
$\sum_{n=1}^{\infty} P[\max_{k \in I_n(a,b)} |k^{-1} \sum_{i=1}^{k} (U_{ni} - \mu_{ni})| > M n^{-3/5} \log n] \le 2(b-a) \sum_{n=1}^{\infty} n^{4/5} \exp[-a \{4 B^2 |Q(\xi)|\}^{-1} (\log n)^2] + \sum_{n=1}^{\infty} P(S_n^c) < \infty$.
This proves (23) and the representation (22c) is established.

Combine Lemma 5 with (22c) and note that $G(\xi) = p$ to complete the proof of Theorem N1.
8. Proof of Theorem K1

The kernel estimator $\hat\xi_{nh}$ can be regarded as the NN estimator $\hat\xi_{n,K_n(h)}$ in which $K_n(h)$ is the random integer given by (4). A formal substitution of $K_n(h)$ for $k$ in the representation given in Theorem N1 leads to

(24) $\hat\xi_{nh} - \xi = \beta(\xi)\{K_n(h)/n\}^2 + \{K_n(h) g(\xi)\}^{-1} \sum_{i=1}^{K_n(h)} [1(Z^*_{ni} > \xi) - (1-p)] + R_{n,K_n(h)}$.

However, this is of no use unless we can show that

(a) $\sup_{h \in J_n(c,d)} |R_{n,K_n(h)}|$ converges at a fast rate, where $J_n(c,d) = [n^{-1/5} c,\ n^{-1/5} d]$, $0 < c < d$, and

(b) in the first two terms on the RHS of (24), $K_n(h)$ can be replaced by the leading term of its deterministic component without slowing down the rate of convergence of the remainder term.

We now proceed to establish (a) and (b). First examine the magnitude of $K_n(h) - n h f(x_0)$ in the following lemma.
Lemma 10. Let $\Delta_n(h) = K_n(h) - n h f(x_0)$. Then
$\sup_{h \in J_n(c,d)} |\Delta_n(h)| = O(n^{2/5} \log n)$, a.s.

Proof. Let $\mu(h) = P(Y \le h/2)$ and write $\Delta_n(h) = \Delta_{n1}(h) + \Delta_{n2}(h)$, where
$\Delta_{n1}(h) = \sum_{i=1}^{n} [1(Y_i \le h/2) - \mu(h)]$, $\quad \Delta_{n2}(h) = n[\mu(h) - h f(x_0)]$.
By condition 1,
$|\mu(h) - h f(x_0)| \le (h^3/24) \sup_{|x - x_0| \le h/2} |f''(x)|$. Thus
$\sup_{h \in J_n(c,d)} |\Delta_{n2}(h)| = O(n^{2/5})$.
Next consider $\Delta_{n1}(h)$, which is the difference of the non-decreasing functions $\sum_{i=1}^{n} 1(Y_i \le h/2)$ and $n\mu(h)$. Since $\mu'(h) < 2 f(x_0)$ on $J_n(c,d)$ for large $n$, we divide $J_n(c,d)$ into $\nu_n = 2(d-c) f(x_0)\, n^{2/5}/\log n$ equal intervals to ensure that $n\mu(h)$ increases by at most $n^{2/5} \log n$ in each of these intervals. Consequently, if $h'_0 < h'_1 < \cdots < h'_{\nu_n}$ denote the endpoints of these intervals, then
$\sup_{h \in J_n(c,d)} |\Delta_{n1}(h)| \le \max_{0 \le j \le \nu_n} |\Delta_{n1}(h'_j)| + n^{2/5} \log n$. Thus
$P[\sup_{h \in J_n(c,d)} |\Delta_{n1}(h)| > 2 n^{2/5} \log n] \le (\nu_n + 1) \sup_{h \in J_n(c,d)} P[|\Delta_{n1}(h)| > n^{2/5} \log n]$.
But
$\sup_{h \in J_n(c,d)} P[|\Delta_{n1}(h)| > n^{2/5} \log n] \le 2 \exp[-(\log n)^2 / \{8 d f(x_0)\}]$
by Bernstein's inequality, and $\sum_{n=1}^{\infty} n^{2/5} \exp[-\alpha (\log n)^2] < \infty$ for all $\alpha > 0$. Hence
$\sup_{h \in J_n(c,d)} |\Delta_{n1}(h)| = O(n^{2/5} \log n)$, a.s., and the lemma is proved.
The convergence rate of $\sup_{h \in J_n(c,d)} |R_{n,K_n(h)}|$ is now determined in the following lemma.

Lemma 11. $\sup_{h \in J_n(c,d)} |R_{n,K_n(h)}| = O(n^{-3/5} \log n)$, a.s.

Proof. For $0 < c < d$, let $0 < a = c f(x_0)/2 < 2 d f(x_0) = b$, and let
$A_n = \{\sup_{h \in J_n(c,d)} |R_{n,K_n(h)}| > M n^{-3/5} \log n\}$,
$B_n = \{\max_{k \in I_n(a,b)} |R_{nk}| > M n^{-3/5} \log n\}$,
$C_n = \{\sup_{h \in J_n(c,d)} |\Delta_n(h)| > M n^{2/5} \log n\}$.
Then there exists $N_0 = N_0(M)$ such that for $n > N_0$, $C_n^c$ implies $K_n(h) \in I_n(a,b)$ for all $h \in J_n(c,d)$. Thus
$C_n^c \cap A_n \subset C_n^c \cap B_n$ for $n > N_0$.
It now follows that for sufficiently large $M$,
$P[A_n$ i.o.$] \le P[C_n$ i.o.$] + P[C_n^c \cap A_n$ i.o.$] \le P[C_n$ i.o.$] + P[B_n$ i.o.$] = 0$,
since for large $M$, $P[C_n$ i.o.$] = 0$ by Lemma 10 and $P[B_n$ i.o.$] = 0$ by Theorem N1.
We now consider the first two terms on the RHS of (24). Of these,

(25) $\beta(\xi)\{K_n(h)/n\}^2 = \beta(\xi) f^2(x_0) h^2 + R'_{nh}$,

where $R'_{nh} = \beta(\xi) f^2(x_0) h^2\, [\Delta_n(h)/\{n h f(x_0)\}]\, [2 + \Delta_n(h)/\{n h f(x_0)\}]$, and by Lemma 10,
$\sup_{h \in J_n(c,d)} |R'_{nh}| = O(n^{-4/5} \log n)$, a.s.
To examine the other term, let

(26) $U_{ni} = 1(Z^*_{ni} > \xi) - (1-p)$, $\quad m_n(h) = [n h f(x_0)]$.

Then

(27) $\{K_n(h) g(\xi)\}^{-1} \sum_{i=1}^{K_n(h)} U_{ni} = \{m_n(h) g(\xi)\}^{-1} \sum_{i=1}^{m_n(h)} U_{ni} + R''_{nh} + R'''_{nh}$,

where
$R''_{nh} = -[\{K_n(h) - m_n(h)\}/K_n(h)]\, \{m_n(h) g(\xi)\}^{-1} \sum_{i=1}^{m_n(h)} U_{ni}$,
$R'''_{nh} = \{K_n(h) g(\xi)\}^{-1}\, [\sum_{i=1}^{K_n(h)} U_{ni} - \sum_{i=1}^{m_n(h)} U_{ni}]$,
and $U_{n1},\ldots,U_{nn}$ are conditionally independent given $\mathcal{A}$, with $E(U_{ni} \mid \mathcal{A}) = 0$. By Lemma 10,
$\sup_{h \in J_n(c,d)} |\{K_n(h) - m_n(h)\}/K_n(h)| = O(n^{-2/5} \log n)$, a.s.,
and
$\sup_{h \in J_n(c,d)} |\{m_n(h)\}^{-1} \sum_{i=1}^{m_n(h)} U_{ni}| \le \max_{n^{4/5} c f(x_0) \le k \le n^{4/5} d f(x_0)} |k^{-1} \sum_{i=1}^{k} U_{ni}| = O(n^{-1/5})$, a.s.,
by an application of Theorem 1 of Hoeffding (1963). Hence
$\sup_{h \in J_n(c,d)} |R''_{nh}| = O(n^{-3/5} \log n)$, a.s.
Now consider the jump-points of $m_n(h) = [n h f(x_0)]$ in $J_n(c,d)$ together with the endpoints $n^{-1/5} c$ and $n^{-1/5} d$, and call these points $n^{-1/5} c = h_{n0} < h_{n1} < \cdots < h_{n\nu_n} = n^{-1/5} d$. Then $m_n(h)$ is constant on each of the $\nu_n$ intervals $[h_{nj}, h_{n,j+1})$. At the same time, $K_n(h)$ is also integer-valued and non-decreasing, and $|U_{ni}| \le 1$. Hence for each $j$ and for all $h_{nj} \le h < h_{n,j+1}$,
$|\sum_{i=1}^{K_n(h)} U_{ni} - \sum_{i=1}^{m_n(h)} U_{ni}| \le |\sum_{i=1}^{K_n(h_{nj})} U_{ni} - \sum_{i=1}^{m_n(h_{nj})} U_{ni}| + K_n(h_{n,j+1}) - K_n(h_{nj})$.
Therefore, if we can show that

(28) $\max_{0 \le j \le \nu_n} |\sum_{i=1}^{K_n(h_{nj})} U_{ni} - \sum_{i=1}^{m_n(h_{nj})} U_{ni}| = O(n^{1/5} \log n)$, a.s.,

and

(29) $\max_{0 \le j \le \nu_n - 1} \{K_n(h_{n,j+1}) - K_n(h_{nj})\} = O(n^{1/5} \log n)$, a.s.,

then in (27),
$\sup_{h \in J_n(c,d)} |R'''_{nh}| = O(n^{-3/5} \log n)$, a.s.,
will follow, because by Lemma 10,
$\sup_{h \in J_n(c,d)} \{K_n(h)\}^{-1} = O(n^{-4/5})$, a.s.

To prove (29), note that $n\{\mu(h_{n,j+1}) - \mu(h_{nj})\}$ remains bounded, since $h_{n,j+1} - h_{nj} \le \{n f(x_0)\}^{-1}$. Hence for large $n$ and for all $j$,
$P[K_n(h_{n,j+1}) - K_n(h_{nj}) > 2 M n^{1/5} \log n] \le P[n^{-1} \sum_{i=1}^{n} \{1(h_{nj}/2 < Y_i \le h_{n,j+1}/2) - \pi_{nj}\} > M n^{-4/5} \log n] \le 2 \exp[-(M/4)\, n^{1/5} \log n]$,
where $\pi_{nj} = \mu(h_{n,j+1}) - \mu(h_{nj})$, by Lemma 6, and $\sum_{n=1}^{\infty} \nu_n \exp[-(M/4)\, n^{1/5} \log n] < \infty$, because $\nu_n = \text{const.}\, n^{4/5}$. This proves (29).

Finally, note that $\sum_{i=1}^{K_n(h_{nj})} U_{ni} - \sum_{i=1}^{m_n(h_{nj})} U_{ni}$ is a sum of $|K_n(h_{nj}) - m_n(h_{nj})|$ terms in which the summands $U_{ni}$, given in (26), when $K_n(h_{nj}) > m_n(h_{nj})$, and $-U_{ni}$, when $K_n(h_{nj}) < m_n(h_{nj})$, are conditionally independent given $\mathcal{A}$, with $E(U_{ni} \mid \mathcal{A}) = 0$. Using Hoeffding's inequality, we therefore have
$P[|\sum_{i=1}^{K_n(h_{nj})} U_{ni} - \sum_{i=1}^{m_n(h_{nj})} U_{ni}| > M n^{1/5} \log n] \le 2 P[|K_n(h_{nj}) - m_n(h_{nj})| > M n^{2/5} \log n] + 2 \exp[-2 M \log n]$.
Since $\sum_{n=1}^{\infty} \nu_n \exp[-2 M \log n] = \text{const.} \sum_{n=1}^{\infty} n^{4/5 - 2M} < \infty$ for sufficiently large $M$, we only need to show that

(30) $\sum_{n=1}^{\infty} n^{4/5}\, P[|K_n(h_{nj}) - m_n(h_{nj})| > M n^{2/5} \log n] < \infty$

for large $M$, in order to establish (28). But $K_n(h) - m_n(h) = K_n(h) - [n h f(x_0)]$ differs from $\Delta_n(h) = K_n(h) - n h f(x_0)$ by at most 1, and it was shown in the proof of Lemma 10 that $P[|\Delta_n(h)| > M n^{2/5} \log n] \le \exp[-\alpha (\log n)^2]$ for some $\alpha > 0$, which implies (30), and thus (28) is established. We have now shown that in (27),
$\sup_{h \in J_n(c,d)} |R'''_{nh}| = O(n^{-3/5} \log n)$, a.s.

To complete the proof of Theorem K1, substitute the expressions in (25) and (27) for the first two terms on the RHS of (24). The remainder terms $R'_{nh}$, $R''_{nh}$ and $R'''_{nh}$ in these expressions have all been shown to be $O(n^{-3/5} \log n)$, a.s., uniformly in $h \in J_n(c,d)$, and the other remainder term $R_{n,K_n(h)}$ has also been shown to be of this order in Lemma 11. This establishes the order of magnitude of $R^*_{nh} = R'_{nh} + R''_{nh} + R'''_{nh} + R_{n,K_n(h)}$ claimed in the theorem.
REFERENCES
(1) Bahadur, R.R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37, 577-580.
(2) Bhattacharya, P.K. (1963). On an analog of regression analysis. Ann. Math. Statist. 34, 1459-1473.
(3) Bhattacharya, P.K. (1974). Convergence of sample paths of normalized sums of induced order statistics. Ann. Statist. 2, 1034-1039.
(4) Bhattacharya, P.K. and Mack, Y.P. (1987). Weak convergence of k-NN density and regression estimators with varying k and applications. Ann. Statist. 15, 976-994.
(5) Cheng, K.F. (1983). Nonparametric estimators for percentile regression functions. Commun. Statist.-Theor. Meth. 12, 681-692.
(6) Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic Press, New York.
(7) Gangopadhyay, A.K. (1987). Nonparametric estimation of conditional quantile function. Ph.D. dissertation, University of California, Davis.
(8) Gikhman, I.I. and Skorokhod, A.V. (1969). Introduction to the Theory of Random Processes. W.B. Saunders Company, Philadelphia.
(9) Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13-30.
(10) Hogg, R.V. (1975). Estimates of percentile regression lines using salary data. J. Amer. Statist. Assoc. 70, 56-59.
(11) Krieger, A.M. and Pickands, J., III (1981). Weak convergence and efficient density estimation at a point. Ann. Statist. 9, 1066-1078.
(12) Stone, C.J. (1977). Consistent nonparametric regression. Ann. Statist. 5, 595-645.
(13) Stute, W. (1986). Conditional empirical processes. Ann. Statist. 14, 638-647.
(14) Uspensky, J.V. (1937). Introduction to Mathematical Probability. McGraw-Hill, New York.