BOOTSTRAP CONFIDENCE INTERVALS FOR CONDITIONAL QUANTILE FUNCTIONS

By

Ashis K. Gangopadhyay and Pranab K. Sen
University of North Carolina, Chapel Hill

SUMMARY. Based on the (k_n-) nearest neighbor as well as the (h_n-) kernel methods of estimation, bootstrap confidence intervals for a conditional quantile function are considered. Along with a Bahadur-type representation for bootstrap sample quantiles, the crucial choices of k_n and h_n are examined, and the related asymptotic theory is presented in a systematic manner.
1. INTRODUCTION

Let {(X_i, Z_i), i ≥ 1} be a sequence of independent and identically distributed (i.i.d.) random vectors (r.v.) with a distribution function (d.f.) U(x,z), (x,z) ∈ R^2 (= (-∞,∞)^2). Let F(x) = U(x,∞), x ∈ R, and let G(z|x) be the conditional d.f. of Z given X = x, for z ∈ R, x ∈ R. A conditional quantile function (of Z given X = x) is defined by

    ξ_p(x) = inf{z : G(z|x) ≥ p},  x ∈ R  (0 < p < 1).                    (1.1)

Asymptotic normality of kernel estimators of ξ_p(x) (in the fixed design case) was studied by Cheng (1983), while, in the random design case, Stute
AMS (1980) Subject Classifications: 62 E 20, 62 F 12.
Keywords and Phrases: conditional quantile, kernel estimator, nearest
neighbor estimator, Bahadur representation, bootstrap confidence interval,
weak convergence, Wiener process, order statistics, induced order
statistics.
Short Title:
Bootstrap Intervals for Conditional Quantiles
(1986) considered nearest neighbor (NN-) type estimators of ξ_p(x). In this context, k_n, the cardinality of the NN, is taken as O(n^{2/3}), and h_n, the bandwidth in the (uniform) kernel method, is taken as O(n^{-1/3}). Recently, Bhattacharya and Gangopadhyay (1988) have studied both the NN and kernel type estimators of ξ_p(x), and incorporating the celebrated Bahadur (1966) representation for sample quantiles in a conditional framework, they were able to consider k_n = O(n^{4/5}) and h_n = O(n^{-1/5}). However, for this optimal choice of k_n and h_n, bias terms crop up, raising the question of attainability of such optimal rates.
The object of the present study is to construct bootstrap confidence intervals for ξ_p(x), employing both the NN and kernel methods. It is shown that if, for some δ > 0, k = k_n = O(n^{4/5-2δ}) and h = h_n = O(n^{-1/5-2δ}), then the proposed bootstrap methods work out neatly and the leading bias terms may be eliminated readily. Thus, without any essential reduction in the rate of convergence, the bias terms are eliminated while the other asymptotic properties are retained. In this context too, a Bahadur-type representation for the bootstrap sample (conditional) distributions plays a vital role. Based on local bootstrap samples, estimators of ξ_p(x) are considered, and incorporating a Bahadur-type representation, their asymptotic properties are studied in a systematic manner. This shows that bootstrap distributions can be validly used to approximate closely the sampling distributions of estimators of ξ_p(x), and this, in turn, provides asymptotic confidence intervals for the conditional quantile function ξ_p(x).

Along with the preliminary notions, the main theorems are presented in Section 2. Section 3 deals with a Bahadur-type representation for local bootstrap samples, and this is then incorporated in Section 4 in the derivation of the main theorems. In this context, certain other asymptotic results on bootstrapping, having interest of their own, are also considered. The concluding section deals with some general remarks.
2. THE MAIN RESULTS

Let ξ_p(x) be defined by (1.1), and consider a fixed p ∈ (0,1) and x_0: -∞ < x_0 < ∞. Since x_0 and p are held fixed, for notational simplicity, we may write ξ_p(x_0) = ξ_0. The following regularity conditions are assumed to be true:

[A1] The d.f. F admits an absolutely continuous density function f, such that

    (a)  f(x_0) = f_0 > 0,                                              (2.1)

and

    (b)  f''(x) = (d^2/dx^2) f(x) exists in a neighborhood of x_0, and there exist positive numbers ε and k_0 such that |x - x_0| ≤ ε implies that |f''(x) - f''(x_0)| ≤ k_0 |x - x_0|.

[A2] The d.f. G(z|x_0) has a continuous density function g(z|x_0) for all z close to ξ_0, such that

    (a)  g(ξ_0|x_0) > 0,  where  G(ξ_0|x_0) = p,                        (2.2)

and

    (b)  the partial derivatives g_x(z|x) of g(z|x) and G_xx(z|x) of G(z|x) exist in a neighborhood of (x_0, ξ_0), and there exist positive constants ε and k_0 such that |x - x_0| ≤ ε and |z - ξ_0| ≤ ε together imply that the corresponding Lipschitz-type condition (2.3) holds.

Incidentally, (a) in [A2] ensures that ξ_0 is uniquely defined by G(ξ_0|x_0) = p.
Consider next the transformation (X_i, Z_i) → (Y_i, Z_i), where Y_i = |X_i - x_0|, for i ≥ 1. The marginal d.f. of Y_i is denoted by F_Y(y), so that by [A1], F_Y(y) admits a density function f_Y(y) = f(x_0+y) + f(x_0-y), and for y "close to" 0, the condition (b) in [A1] holds for f_Y(y) as well. The conditional density g*(z|y) of Z_i, given Y_i = y, and the corresponding d.f. G*(z|y), are given by

    g*(z|y) = {f(x_0+y) g(z|x_0+y) + f(x_0-y) g(z|x_0-y)} / f_Y(y),     (2.4)

    G*(z|y) = {f(x_0+y) G(z|x_0+y) + f(x_0-y) G(z|x_0-y)} / f_Y(y).     (2.5)

Note that

    g*(z|0) = g(z|x_0)  and  G*(z|0) = G(z|x_0).                        (2.6)

In the sequel, we shall write G(z|x_0) = G(z) and g(z|x_0) = g(z).
For the collection {(Y_1,Z_1),...,(Y_n,Z_n)} of r.v.'s, let Y_n1 < ... < Y_nn be the order statistics corresponding to Y_1,...,Y_n, and let Z_n1,...,Z_nn be the induced order statistics (i.e., Z_ni = Z_j if Y_ni = Y_j, for i,j = 1,...,n). For every positive integer k (≤ n), the k-NN empirical d.f. of Z (with respect to x_0) is given by

    Ĝ_nk(z) = k^{-1} Σ_{i=1}^{k} 1(Z_ni ≤ z),  z ∈ R,                   (2.7)

where 1(A) stands for the indicator function of the set A. The following estimators of ξ_0 are due to Bhattacharya and Gangopadhyay (1988):

(i) The k-NN estimator of ξ_0 is

    ξ̂_nk = the [kp]-th order statistic of Z_n1,...,Z_nk
         = inf{z : Ĝ_nk(z) ≥ k^{-1}[kp]}.                               (2.8)

(ii) The kernel estimator (with uniform kernel and bandwidth h) is

    ξ̂_nh = inf{z : Ĝ_{n,K_n(h)}(z) ≥ {K_n(h)}^{-1}[K_n(h)p]},          (2.9)

where

    K_n(h) = Σ_{i=1}^{n} 1(Y_i ≤ h/2)                                   (2.10)

is a positive integer-valued r.v. Note that by (2.8) and (2.9),

    ξ̂_nh = ξ̂_{n,K_n(h)},                                               (2.11)
where h is usually non-stochastic and K_n(h) is stochastic in nature. In order that ξ̂_nk (and ξ̂_nh) be consistent estimators of ξ_0, we need that, as n → ∞, k = k_n → ∞ (and h = h_n ↓ 0). For the asymptotic normality results, one usually needs that k_n = O(n^a), for some a ∈ (0,1): Stute (1986) considered the case a = 2/3, while Bhattacharya and Gangopadhyay (1988) studied the case a = 4/5. However, in the latter case, there is generally a bias term of the order n^{-2/5}, and it may be necessary to eliminate this bias term in drawing statistical conclusions on ξ_0. We shall see that this bias term can be eliminated if we choose a < 4/5. We shall discuss the case of a ≥ 4/5 in the concluding section.
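As a concrete numerical illustration of the two estimators (2.8) and (2.9), the following sketch (ours, not part of the original development; the function names and the simulated model are purely hypothetical) computes ξ̂_nk and ξ̂_nh at a point x_0 from a simulated sample:

```python
import numpy as np

def knn_quantile(x, z, x0, k, p):
    """k-NN estimator (2.8): the [kp]-th order statistic of the Z-values
    induced by the k smallest Y_i = |X_i - x0|."""
    order = np.argsort(np.abs(x - x0))      # ranks by Y_i = |X_i - x0|
    z_knn = np.sort(z[order[:k]])           # Z_n1,...,Z_nk, then sorted
    j = max(int(np.floor(k * p)), 1)        # [kp]-th order statistic
    return z_knn[j - 1]

def kernel_quantile(x, z, x0, h, p):
    """Uniform-kernel estimator (2.9): the same construction over the
    random neighborhood {i : |X_i - x0| <= h/2} of size K_n(h)."""
    in_window = np.abs(x - x0) <= h / 2.0
    K = int(in_window.sum())                # K_n(h), a random sample size
    if K == 0:
        raise ValueError("empty window: increase h")
    z_win = np.sort(z[in_window])
    j = max(int(np.floor(K * p)), 1)
    return z_win[j - 1]

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, n)
z = x**2 + rng.normal(0, 0.1, n)            # true conditional median at x0 = 0 is 0
k = int(n ** 0.75)                          # an undersmoothed k_n = O(n^{4/5 - 2*delta})
print(knn_quantile(x, z, 0.0, k, 0.5))
print(kernel_quantile(x, z, 0.0, 0.2, 0.5))
```

Both estimates should fall near the true conditional median 0; the choice k = n^{3/4} corresponds to a = 4/5 - 2δ with δ = 0.025, i.e. to the undersmoothed regime in which the bias term is negligible.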
To formulate the main results, we define, for every n,

    I_n(a,b) = [an^{4/5-2δ}, bn^{4/5-2δ}],                              (2.12)

where 0 < a < b < ∞ and 0 < δ < 1/10. Also, let

    J_n(c,d) = [cn^{-1/5-2δ}, dn^{-1/5-2δ}],  0 < c < d < ∞,  0 < δ < 1/10.   (2.13)

Then the following asymptotic results follow along the lines of Bhattacharya and Gangopadhyay (1988) [and hence, the derivations are omitted]:
THEOREM 2.1. Under assumptions [A1] and [A2], as n → ∞,

    ξ̂_nk - ξ_0 = (k/n)^2 ψ(ξ_0) + {kg(ξ_0)}^{-1} Σ_{i=1}^{k} [1(Z⁰_ni > ξ_0) - (1-p)] + R_nk,   (2.14)

where

    ψ(ξ_0) = -[f(x_0) G_xx(ξ_0|x_0) + 2f'(x_0) G_x(ξ_0|x_0)] / {24 f^3(x_0) g(ξ_0)},

    Z⁰_ni = G^{-1} ∘ G*(Z_ni|Y_ni),  i = 1,...,n,

and

    max_{k∈I_n(a,b)} |R_nk| = O(n^{-3/5+3δ/2} log n)  a.s.
THEOREM 2.2. Under the hypothesis of Theorem 2.1, as n → ∞,

    {n^{2/5-δ} [ξ̂_{n[tn^{4/5-2δ}]} - ξ_0], t ∈ [a,b]}
        →_w  {p(1-p)}^{1/2} {g(ξ_0)}^{-1} {t^{-1} B(t), t ∈ [a,b]},     (2.15)

where B = {B(t), t ∈ R⁺} is a standard Wiener process on R⁺, and →_w stands for the convergence in law. Hence, for every k: k = [tn^{4/5-2δ}], t ∈ [a,b], as n → ∞,

    n^{2/5-δ} [ξ̂_nk - ξ_0]  →_d  N(0, t^{-1} p(1-p)/g^2(ξ_0)).         (2.16)
THEOREM 2.3. Under Assumptions [A1] and [A2], as n → ∞,

    ξ̂_nh - ξ_0 = h^2 f^2(x_0) ψ(ξ_0)
        + {[nhf(x_0)] g(ξ_0)}^{-1} Σ_{i=1}^{[nhf(x_0)]} {1(Z⁰_ni > ξ_0) - (1-p)} + R*_nh,   (2.17)

where ψ(ξ_0) and the Z⁰_ni are defined as in Theorem 2.1, and

    sup_{h∈J_n(c,d)} |R*_nh| = O(n^{-3/5+3δ/2} log n)  a.s.             (2.18)

THEOREM 2.4. Under the hypothesis of Theorem 2.3, as n → ∞,

    {n^{2/5-δ} [ξ̂_{n(tn^{-1/5-2δ})} - ξ_0], t ∈ [c,d]}
        →_w  {p(1-p)/f(x_0)}^{1/2} {g(ξ_0)}^{-1} {t^{-1} B(t), t ∈ [c,d]},   (2.19)

where {B(t), t ∈ R⁺} is defined as in Theorem 2.2. Hence, for every t ∈ [c,d], as n → ∞,

    n^{2/5-δ} [ξ̂_{n(tn^{-1/5-2δ})} - ξ_0]  →_d  N(0, t^{-1} p(1-p)/{f(x_0) g^2(ξ_0)}).   (2.20)
Our main interest lies in the construction of (local) bootstrap confidence intervals for ξ_0, based on both the k-NN and kernel methods, and in the study of their asymptotic properties. For this purpose, in the k-NN procedure, a bootstrap sample (Z*_n1,...,Z*_nk) is obtained from the empirical d.f. Ĝ_nk, defined by (2.7). Thus, given the induced order statistics (Z_n1,...,Z_nk), the Z*_ni are conditionally i.i.d. r.v.'s with the d.f. Ĝ_nk. Similarly, in the kernel method, given the Z_ni (and h > 0), we define K_n(h) by (2.10), and then a bootstrap sample of size K_n(h) is obtained by resampling from all the Z_ni for which Y_i = |X_i - x_0| ≤ h/2. First, let us introduce the bootstrap sample estimates of ξ_0.
For the k-NN procedure, let Ĝ*_nk be the bootstrap sample d.f. (based on Z*_n1,...,Z*_nk). Then the bootstrap estimator is

    ξ̂*_nk = the [kp]-th order statistic of Z*_n1,...,Z*_nk
          = inf{z : Ĝ*_nk(z) ≥ k^{-1}[kp]}.                             (2.21)

Similarly, in the kernel method, the bootstrap estimator is

    ξ̂*_nh = ξ̂*_{n,K_n(h)},                                             (2.22)

where K_n(h) = Σ_{i=1}^{n} 1(Y_i ≤ h/2) is conditionally held fixed. Now, parallel to Theorems 2.1 through 2.4, we have the following asymptotic representations for the bootstrap estimates.
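In words, the k-NN bootstrap draws k values with replacement from (Z_n1,...,Z_nk), i.e., from Ĝ_nk, and recomputes the [kp]-th order statistic. One replicate of ξ̂*_nk in (2.21) can be generated as in the following minimal sketch (our illustration; the function name and the simulated stand-in for Z_n1,...,Z_nk are hypothetical):

```python
import numpy as np

def knn_bootstrap_estimate(z_knn, p, rng):
    """One draw of xi*_nk per (2.21): resample k values with replacement
    from the induced order statistics Z_n1,...,Z_nk (i.e., from G_nk),
    then take the [kp]-th order statistic of the resample."""
    k = len(z_knn)
    z_star = rng.choice(z_knn, size=k, replace=True)   # Z*_n1,...,Z*_nk
    j = max(int(np.floor(k * p)), 1)
    return np.sort(z_star)[j - 1]

rng = np.random.default_rng(1)
z_knn = rng.normal(size=400)          # stand-in for the k nearest-neighbor Z-values
print(knn_bootstrap_estimate(z_knn, 0.5, rng))
```

Drawing with replacement from the list (Z_n1,...,Z_nk) is exactly sampling from Ĝ_nk, since Ĝ_nk places mass k^{-1} at each induced order statistic; the kernel version is identical with k replaced by K_n(h).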
THEOREM 2.5. Under [A1] and [A2], for every k ∈ I_n(a,b), defined by (2.12), 0 < a < b < ∞, as n → ∞,

    ξ̂*_nk - ξ̂_nk = {kg(ξ_0)}^{-1} Σ_{i=1}^{k} [1(Z⁰*_ni > ξ_0) - (1-p)] + R*_nk,   (2.23)

where

    Z⁰*_ni = G^{-1} ∘ Ĝ_nk(Z*_ni),  i = 1,...,k,                        (2.24)

are i.i.d. r.v.'s with the d.f. G(·), and as n → ∞,

    max_{k∈I_n(a,b)} |R*_nk| = O(n^{-3/5+3δ/2} log n)  a.s.
THEOREM 2.6. Under the hypothesis of Theorem 2.5, as n → ∞,

    {n^{2/5-δ} (ξ̂*_{n[tn^{4/5-2δ}]} - ξ̂_{n[tn^{4/5-2δ}]}), t ∈ [a,b]}
        →_w  {p(1-p)}^{1/2} {g(ξ_0)}^{-1} {t^{-1} B(t), t ∈ [a,b]},     (2.25)

where {B(t), t ∈ R⁺} is defined as in Theorem 2.2. Hence, for each t ∈ [a,b], as n → ∞,

    n^{2/5-δ} (ξ̂*_nk - ξ̂_nk)  →_d  N(0, t^{-1} p(1-p)/g^2(ξ_0)),       (2.26)

for every δ: 0 < δ < 1/10, and every 0 < a < b < ∞.

THEOREM 2.7. Under [A1] and [A2], for every h ∈ J_n(c,d), 0 < c < d < ∞, as n → ∞,

    ξ̂*_nh - ξ̂_nh = {[nhf(x_0)] g(ξ_0)}^{-1} Σ_{i=1}^{[nhf(x_0)]} {1(Z⁰*_ni > ξ_0) - (1-p)} + R**_nh,   (2.28)

where the Z⁰*_ni are defined by (2.24), and

    sup_{h∈J_n(c,d)} |R**_nh| = O(n^{-3/5+3δ/2} log n)  a.s., as n → ∞.   (2.29)

THEOREM 2.8. Under the hypothesis of Theorem 2.7, as n → ∞,

    {n^{2/5-δ} (ξ̂*_{n(tn^{-1/5-2δ})} - ξ̂_{n(tn^{-1/5-2δ})}), t ∈ [c,d]}
        →_w  {p(1-p)/(f(x_0) g^2(ξ_0))}^{1/2} {t^{-1} B(t), t ∈ [c,d]},   (2.30)

where {B(t), t ∈ R⁺} is defined as in (2.25) and 0 < c < d < ∞. Hence, for each t ∈ [c,d], as n → ∞,

    n^{2/5-δ} (ξ̂*_{n(tn^{-1/5-2δ})} - ξ̂_{n(tn^{-1/5-2δ})})  →_d  N(0, t^{-1} p(1-p)/{f(x_0) g^2(ξ_0)}).
It is interesting to note that, for each r (= 1,2,3,4), the asymptotic representation in Theorem 2.r for the original sample and that in Theorem 2.(r+4) for the bootstrap sample are the same. As such, one can generate a set of M bootstrap samples, from each one compute the bootstrap estimates, and then use the empirical quantiles (of the ξ̂*_nk - ξ̂_nk or ξ̂*_nh - ξ̂_nh) to set the desired bootstrap confidence intervals for ξ_0. The asymptotic properties of such bootstrap intervals would then follow from Theorems 2.5 through 2.8.
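The interval construction just described (M resamples, empirical quantiles of the differences ξ̂*_nk - ξ̂_nk) can be sketched as follows. This is our own illustration, not code from the paper; the function name, the choice M = 500, and the simulated input are all hypothetical:

```python
import numpy as np

def knn_bootstrap_ci(z_knn, p, M=500, alpha=0.10, seed=0):
    """Bootstrap CI for xi_0: generate M resamples from G_nk, form the
    differences xi*_nk - xi_nk, and invert their empirical quantiles."""
    rng = np.random.default_rng(seed)
    k = len(z_knn)
    j = max(int(np.floor(k * p)), 1)
    xi_hat = np.sort(z_knn)[j - 1]                      # xi_nk, as in (2.8)
    diffs = np.empty(M)
    for m in range(M):
        z_star = rng.choice(z_knn, size=k, replace=True)
        diffs[m] = np.sort(z_star)[j - 1] - xi_hat      # xi*_nk - xi_nk
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return xi_hat - hi, xi_hat - lo                     # quantile-inverted interval

ci = knn_bootstrap_ci(np.random.default_rng(2).normal(size=400), 0.5)
print(ci)
```

The interval (ξ̂_nk - q_{1-α/2}, ξ̂_nk - q_{α/2}), with q the empirical quantiles of the differences, is one standard way of inverting the bootstrap distribution; its asymptotic validity here rests on the agreement of Theorems 2.r and 2.(r+4).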
3. BAHADUR REPRESENTATION FOR BOOTSTRAP (CONDITIONAL) QUANTILES

For the proofs of Theorems 2.5 through 2.8, we need a few preliminary lemmas and a Bahadur (1966) type representation for local bootstrap conditional distributions. These are considered in this section.

Note that given the (X_i or the) Y_ni, the Z_n1,...,Z_nk are conditionally independent r.v.'s with d.f.'s G*(·|Y_n1),...,G*(·|Y_nk), respectively [viz., Bhattacharya (1974)]. We define

    g*(z|Y_ni) = g*_ni(z),  G*(z|Y_ni) = G*_ni(z),  i ≥ 1,  z ∈ R,      (3.1)

    Ḡ*_nk(z) = k^{-1} Σ_{i=1}^{k} G*_ni(z),  z ∈ R,                     (3.2)

    ḡ*_nk(z) = k^{-1} Σ_{i=1}^{k} g*_ni(z),  z ∈ R.                     (3.3)

Further, note that, by definition,

    0 < Y_n1 < ... < Y_nk,                                              (3.4)

and by Lemma 3.1 (to follow), Y_nk = O(n^{-1/5-2δ}) a.s., as n → ∞ [when [A1] holds and k ∈ I_n(a,b)]. As such, we may be tempted to make local expansions for ḡ*_nk and Ḡ*_nk in terms of powers of the Y_ni. Towards this, we have the following:
Lemma 3.1. Under [A1] and for k ∈ I_n(a,b), for every B with Bf(x_0) > b > a > 0, there exists an n_0 (< ∞) such that

    P{Y_{n[bn^{4/5-2δ}]} > Bn^{-1/5-2δ}} ≤ exp{-2n^{3/5-4δ} (Bf(x_0) - b)^2},  ∀ n ≥ n_0,   (3.5)

so that Y_{n[bn^{4/5-2δ}]} = O(n^{-1/5-2δ}) a.s., as n → ∞.

For a proof of the lemma, we may refer to Bhattacharya and Mack (1987), and hence, the details are omitted. Likewise, Lemmas 3.2 through 3.5 (on the base sample) are formulated along the same lines as in Bhattacharya and Gangopadhyay (1988) [with only a change in the order of k], and hence, these will be stated without proof. We adopt the same notations as in Section 2 [see (2.1) through (2.6), for example].
Lemma 3.2. Under [A1] and [A2], the local expansions (3.6) and (3.7) hold for ḡ*_nk and Ḡ*_nk, with remainder terms r(z,y;x_0) and R(z,y;x_0), where

    q(z;x_0) = g_xx(z|x_0) + 2 f'(x_0) g_x(z|x_0)/f(x_0),

    Q(z;x_0) = G_xx(z|x_0) + 2 f'(x_0) G_x(z|x_0)/f(x_0),

and there exist ε > 0 and M (0 < M < ∞) such that for every y: 0 ≤ y ≤ ε and |z - ξ_0| ≤ ε, |q(z;x_0)|, |Q(z;x_0)|, |r(z,y;x_0)| and |R(z,y;x_0)| are all bounded by M.

LEMMA 3.3. Under the hypothesis of Lemma 3.2, for a_n = n^{-2/5+δ} log n,

    P{ max_{k∈I_n(a,b)} sup_{|z-ξ_0|≤a_n} |[Ĝ_nk(z) - Ḡ*_nk(z)] - [Ĝ_nk(ξ_0) - Ḡ*_nk(ξ_0)]| > Mn^{-3/5+3δ/2} log n  i.o.} = 0,   (3.8)

where M (< ∞) is a generic constant.
Lemma 3.4. Under the hypothesis of Lemma 3.3,

    P{ max_{k∈I_n(a,b)} |ξ̂_nk - ξ̃_nk| > a_n  i.o.} = 0,                (3.9)

where ξ̃_nk = inf{z : Ḡ*_nk(z) ≥ p} denotes the counterpart of ξ̂_nk based on Ḡ*_nk.

Lemma 3.5. For every B > b/f(x_0), there exist n_0 and C (both < ∞), such that in the sample space of infinite sequences {(Y_i,Z_i), 1 ≤ i ≤ n; n ≥ 1}, Y_{n[bn^{4/5-2δ}]} ≤ Bn^{-1/5-2δ} implies that max_{k∈I_n(a,b)} |ξ̃_nk - ξ_0| ≤ C n^{-2/5-4δ}, for every n ≥ n_0. Hence, for every 0 < a < b < ∞,

    max_{k∈I_n(a,b)} |ξ̃_nk - ξ_0| = O(n^{-2/5-4δ})  a.s., as n → ∞.    (3.10)
With these results at hand, we now proceed to consider parallel results for the bootstrap sample Z*_n1,...,Z*_nk. Let a_n = C_1 n^{-2/5+δ} log n, where C_1 (< ∞) is an arbitrarily (large) positive number. Let then

    H*_nk(z) = [Ĝ*_nk(z) - Ĝ_nk(z)] - [Ĝ*_nk(ξ̂_nk) - Ĝ_nk(ξ̂_nk)],     (3.11)

    H*_nk = sup{|H*_nk(z)| : |z - ξ̂_nk| ≤ a_n},                        (3.12)

    H*_n = max_{k∈I_n(a,b)} H*_nk.                                      (3.13)

Then the following theorem is an extension of the classical Bahadur (1966) representation for the conditional empirical d.f.'s in a bootstrap model.

THEOREM 3.1. Under [A1] and [A2], defining I_n(a,b) as in (2.12),

    P{H*_n > M_1 n^{-3/5+3δ/2} log n  i.o.} = 0.                        (3.14)
Proof: Let b_n = n^{1/5-δ/2}, and let

    η_nk,r = ξ̂_nk + r(a_n/b_n),  r = -b_n, -b_n+1, ..., -1, 0, 1, ..., b_n,   (3.15)

    J_nk,r = [η_nk,r, η_nk,r+1],  r = -b_n, ..., b_n - 1.               (3.16)

Then

    H*_n = max_{k∈I_n(a,b)} max_{-b_n≤r≤b_n-1} sup_{z∈J_nk,r} |H*_nk(z)|.   (3.17)

It follows from the monotonicity of Ĝ*_nk(·) and Ĝ_nk(·) that for z ∈ J_nk,r = [η_nk,r, η_nk,r+1],

    |H*_nk(z)| ≤ max{|H*_nk(η_nk,r)|, |H*_nk(η_nk,r+1)|} + α_nk,r,      (3.18)

where H*_nk(·) is given by (3.11), and

    α_nk,r = Ĝ_nk(η_nk,r+1) - Ĝ_nk(η_nk,r).                            (3.19)

Hence

    H*_n ≤ max_{k∈I_n(a,b)} max_{-b_n≤r≤b_n} |H*_nk(η_nk,r)| + max_{k∈I_n(a,b)} max_{-b_n≤r≤b_n-1} α_nk,r.   (3.20)

Now by Lemma 3.3,

    α_nk,r = Ĝ_nk(η_nk,r+1) - Ĝ_nk(η_nk,r)
           = Ḡ*_nk(η_nk,r+1) - Ḡ*_nk(η_nk,r) + O(n^{-3/5+3δ/2} log n)
           = (a_n/b_n) ḡ*_nk(η*_nk) + O(n^{-3/5+3δ/2} log n),          (3.21)

where η_nk,r ≤ η*_nk ≤ η_nk,r+1. But by Lemma 3.5, on the set S_n = {Y_{n[bn^{4/5-2δ}]} ≤ Bn^{-1/5-2δ}}, for any z lying between η_nk,r and η_nk,r+1, we have

    |z - ξ_0| ≤ (a_n/b_n) + |ξ̂_nk - ξ_0| + r(a_n/b_n) ≤ C_2 n^{-2/5+δ} log n,   (3.22)

for a large constant C_2. So, on the set S_n, by Lemma 3.1 and Lemma 3.2,

    α_nk,r ≤ (a_n/b_n) g(ξ_0)[1 + o(1)] + O(n^{-3/5+3δ/2} log n)
           ≤ C_3 g(ξ_0) n^{-3/5+3δ/2} log n,  for a large constant C_3.   (3.23)
Hence,

    P[H*_n > M_1 n^{-3/5+3δ/2} log n]
        ≤ P[ max_{k∈I_n(a,b)} max_{-b_n≤r≤b_n} |H*_nk(η_nk,r)| > M_2 n^{-3/5+3δ/2} log n] + P(S_n^c).   (3.24)

Now, conditionally on (Z_n1,...,Z_nk),

    |H*_nk(η_nk,r)| > M_2 n^{-3/5+3δ/2} log n                           (3.25)

iff

    |[Ĝ*_nk(η_nk,r) - Ĝ*_nk(ξ̂_nk)] - [Ĝ_nk(η_nk,r) - Ĝ_nk(ξ̂_nk)]| > M_2 n^{-3/5+3δ/2} log n,   (3.26)

iff

    |Σ_{i=1}^{k} U_nki - kμ_nk| > k(M_2 n^{-3/5+3δ/2} log n),           (3.27)

iff

    |Σ_{i=1}^{k} {U_nki - μ_nk}| > k(M_2 n^{-3/5+3δ/2} log n),          (3.28)

where (taking r > 0; the case r < 0 is symmetric)

    U_nki = 1(ξ̂_nk < Z*_ni ≤ η_nk,r),                                  (3.29)

and

    μ_nk = E[U_nki | Z_n1,...,Z_nk] = Ĝ_nk(η_nk,r) - Ĝ_nk(ξ̂_nk).       (3.30)

Also by Lemma 3.3,

    μ_nk = Ĝ_nk(ξ̂_nk + r(a_n/b_n)) - Ĝ_nk(ξ̂_nk)
         = Ḡ*_nk(ξ̂_nk + r(a_n/b_n)) - Ḡ*_nk(ξ̂_nk) + O(n^{-3/5+3δ/2} log n)
         = r(a_n/b_n) ḡ*_nk(ξ*_nk) + O(n^{-3/5+3δ/2} log n),           (3.31)

where ξ*_nk lies between ξ̂_nk and η_nk,r. But for any z lying between ξ̂_nk and η_nk,r, by Lemma 3.5, on the set S_n, |z - ξ_0| ≤ C_4 a_n. Using Lemma 3.2, we can now conclude that for large n, on the set S_n,

    μ_nk ≤ a_n [g(ξ_0) + 2B^3 n^{-3/5-6δ} sup{|r(z,y;x_0)| : 0 ≤ y ≤ Bn^{-1/5-2δ}, |z - ξ_0| ≤ C_4 a_n}] + O(n^{-3/5+3δ/2} log n)
         ≤ M_3 g(ξ_0) a_n,  for some large constant M_3.               (3.32)

So, by Bernstein's inequality, we have

    P[|H*_nk(η_nk,r)| > M_2 n^{-3/5+3δ/2} log n | Z_n1,...,Z_nk]
        = P[|Σ_{i=1}^{k} {U_nki - μ_nk}| > k(M_2 n^{-3/5+3δ/2} log n) | Z_n1,...,Z_nk]
        ≤ 2 exp{-M_2^2 (log n)^2/[4 C_1 g(ξ_0)]},                       (3.33)

for sufficiently large n. Thus,

    P(H*_n > M_1 n^{-3/5+3δ/2} log n)
        ≤ 4(b-a) n^{1-5δ/2} exp{-M_2^2 (log n)^2/[4 C_1 g(ξ_0)]}
          + exp[-2n^{3/5-4δ} (Bf(x_0) - b)^2].                          (3.34)

Now choose the constants in such a way that

    Σ_{n=1}^{∞} P(H*_n > M_1 n^{-3/5+3δ/2} log n) < ∞.                  (3.35)

This concludes the proof of Theorem 3.1. •
Lemma 3.6.

    P( max_{k∈I_n(a,b)} |ξ̂*_nk - ξ̂_nk| > a_n  i.o.) = 0.               (3.36)

Proof. First note that by Lemma 3.4, it is enough to show that

    P( max_{k∈I_n(a,b)} |ξ̂*_nk - ξ̃_nk| > C_5 a_n  i.o.) = 0,           (3.37)

for some constant C_5. Note that ξ̂*_nk ≤ ξ̂_nk - C_5 a_n implies that Ĝ*_nk(ξ̂_nk - C_5 a_n) ≥ k^{-1}[kp]. By Lemma 3.3 and Lemma 3.4, we have

    Ĝ_nk(ξ̂_nk) - Ĝ_nk(ξ̂_nk - C_5 a_n)
        = [Ĝ_nk(ξ̂_nk) - Ĝ_nk(ξ̃_nk)] + [Ĝ_nk(ξ̃_nk) - Ĝ_nk(ξ̂_nk - C_5 a_n)] + O(n^{-3/5+3δ/2} log n).   (3.38)

Note that both |ξ̃_nk - ξ_0| and |ξ̂_nk - ξ_0| are bounded by C_6 a_n for some large constant C_6. Then by Lemma 3.1, Lemma 3.2 and Lemma 3.4, on the set S_n, for large n, the right-hand side of (3.38) is bounded below at the rate a_n g(ξ_0), choosing C_5 large enough.   (3.39)

Hence, for large n,

    P( ξ̂*_nk ≤ ξ̂_nk - C_5 a_n  for some k ∈ I_n(a,b))
        ≤ 2(b-a) n^{4/5-2δ} exp[-2 C_1 g^2(ξ_0)(log n)^2] + P(S_n^c),   (3.40)

by Hoeffding (1963). Since Σ_{n=1}^{∞} P(S_n^c) < ∞ by Lemma 3.1, and

    Σ_{n=1}^{∞} n^{4/5-2δ} exp[-2 C_1 g^2(ξ_0)(log n)^2] < ∞,           (3.41)

we have

    P( ξ̂*_nk ≤ ξ̂_nk - C_5 a_n  for some k ∈ I_n(a,b),  i.o.) = 0.      (3.42)

In the same way, the other tail is handled,                             (3.43)

and the lemma is proved. •
Proof of Theorem 2.5.

First note that Theorem 3.1, Lemma 3.3 and Lemma 3.4 together imply that

    P[ max_{k∈I_n(a,b)} sup_{|z-ξ̂_nk|≤Ca_n} |[Ĝ*_nk(z) - Ĝ_nk(z)] - [Ĝ*_nk(ξ̂_nk) - Ĝ_nk(ξ̂_nk)]| > M_4 n^{-3/5+3δ/2} log n  i.o.] = 0,   (3.44)

for some large constant M_4. So, by Lemma 3.3 and Lemma 3.4, we now have

    Ĝ*_nk(ξ̂*_nk) - Ĝ*_nk(ξ̂_nk) = g(ξ_0)(ξ̂*_nk - ξ̂_nk) + R_nk^{(1)},   (3.45)

with

    max_{k∈I_n(a,b)} |R_nk^{(1)}| = O(n^{-3/5+3δ/2} log n)  a.s.        (3.46)

By Lemma 3.5,

    max_{k∈I_n(a,b)} |ξ̃_nk - ξ̂_nk| = O(n^{-2/5+δ} log n)  a.s.

Then by Lemma 3.1 and Lemma 3.2,

    max_{k∈I_n(a,b)} |ḡ*_nk(ξ̂_nk) - g(ξ_0)| = O(n^{-2/5+δ} log n)  a.s.   (3.47)

Again, letting V_ni = 1(Z*_ni ≤ ξ̂_nk), we have

    P[ max_{k∈I_n(a,b)} |Ĝ*_nk(ξ̂_nk) - Ĝ_nk(ξ̂_nk)| > n^{-2/5+δ} log n | Z_n1,...,Z_nk]
        ≤ 2(b-a) n^{4/5-2δ} exp(-2(b-a)(log n)^2),                      (3.48)

by Theorem 1 of Hoeffding (1963), and

    Σ_{n=1}^{∞} n^{4/5-2δ} exp[-2(b-a)(log n)^2] < ∞.                   (3.49)

Hence

    p - Ĝ*_nk(ξ̂_nk) = O(n^{-2/5+δ} log n)  a.s.                        (3.50)

From (3.45), (3.47) and (3.50), we have

    ξ̂*_nk - ξ̂_nk - {g(ξ_0)}^{-1}[p - Ĝ*_nk(ξ̂_nk)] = O(n^{-3/5+3δ/2} log n)  a.s.

Since

    p - Ĝ*_nk(ξ̂_nk) = k^{-1} Σ_{i=1}^{k} [1(Z*_ni > ξ̂_nk) - (1-p)],    (3.51)

we now have the following representation:

    ξ̂*_nk - ξ̂_nk = {kg(ξ_0)}^{-1} Σ_{i=1}^{k} [1(Z*_ni > ξ̂_nk) - (1-p)] + R*⁰_nk,   (3.52)

where

    max_{k∈I_n(a,b)} |R*⁰_nk| = O(n^{-3/5+3δ/2} log n)  a.s.            (3.53)

This representation can be modified to the following forms:

    ξ̂*_nk - ξ̂_nk = {kg(ξ_0)}^{-1} Σ_{i=1}^{k} [1(Z⁰*_ni > G^{-1}(Ĝ_nk(ξ̂_nk))) - (1-p)] + R*¹_nk,   (3.54)

and

    ξ̂*_nk - ξ̂_nk = {kg(ξ_0)}^{-1} Σ_{i=1}^{k} [1(Z⁰*_ni > ξ_0) - (1-p)] + R*²_nk,   (3.55)

where the remainder terms are O(n^{-3/5+3δ/2} log n) a.s., uniformly in k ∈ I_n(a,b), in both (3.54) and (3.55), and Z⁰*_ni = G^{-1}(Ĝ_nk(Z*_ni)), with G(·) = G(·|x_0) the conditional c.d.f. of Z given X = x_0. Note that since Z*_n1,...,Z*_nk are conditionally i.i.d. given (Z_n1,...,Z_nk), with each Z*_ni having c.d.f. Ĝ_nk,

    P(Z⁰*_ni ≤ z_i, i = 1,2,...,k) = Π_{i=1}^{k} G(z_i),                (3.56)

so that, for each k, Z⁰*_n1,...,Z⁰*_nk are i.i.d. with c.d.f. G(·).

To obtain (3.54) from (3.52), we need to show that the change of the threshold contributes at most O(n^{-3/5+3δ/2} log n) a.s., uniformly in k ∈ I_n(a,b).   (3.57)

Since, by Lemma 3.4 and Lemma 3.5,

    max_{k∈I_n(a,b)} |ξ̃_nk - ξ_0| = O(n^{-2/5+δ} log n)  a.s.,         (3.58)

the result follows immediately from (3.44).

To obtain (3.55) from (3.54), it is enough to show that

    max_{k∈I_n(a,b)} |k^{-1} Σ_{i=1}^{k} {W_ni - μ̄_nk}| = O(n^{-3/5+3δ/2} log n)  a.s.,   (3.59)

where

    W_ni = 1{Z⁰*_ni > G^{-1}(Ĝ_nk(ξ_0))} - 1{Z⁰*_ni > ξ_0}              (3.60)

and μ̄_nk = E[W_ni | Z_n1,...,Z_nk]. By Lemma 3.2, for large n,

    |G(ξ_0) - Ḡ*_nk(ξ_0)| ≤ B^2 |Q(ξ_0)| n^{-2/5-4δ}                   (3.61)

on the set S_n = {Y_{n[bn^{4/5-2δ}]} ≤ Bn^{-1/5-2δ}}. Now use (3.61) and Bernstein's inequality to show:

    P[ max_{k∈I_n(a,b)} |Ĝ_nk(ξ_0) - Ḡ*_nk(ξ_0)| > n^{-2/5+δ} log n | 𝔅_n]
        = P[ max_{k∈I_n(a,b)} |k^{-1} Σ_{i=1}^{k} {1(Z_ni > ξ_0) - [1 - G*_ni(ξ_0)]}| > n^{-2/5+δ} log n | 𝔅_n]
        ≤ 2(b-a) n^{4/5-2δ} exp[-2a{4G(ξ_0)}^{-1}(log n)^2] + P(S_n^c),   (3.62)

where 𝔅_n = σ(Y_1, Y_2, ...). By Lemma 3.1, this implies

    max_{k∈I_n(a,b)} |Ĝ_nk(ξ_0) - Ḡ*_nk(ξ_0)| = O(n^{-2/5+δ} log n)  a.s.   (3.63)

Hence, combining (3.61) and (3.63), we have, for large n, a.s.,

    max_{k∈I_n(a,b)} |Ĝ_nk(ξ_0) - G(ξ_0)| ≤ M_5 n^{-2/5+δ} log n,      (3.64)

for some constant M_5. Now, we use Bernstein's inequality and Lemma 3.1 to obtain

    Σ_{n=1}^{∞} P[ max_{k∈I_n(a,b)} |k^{-1} Σ_{i=1}^{k} (W_ni - μ̄_nk)| > n^{-3/5+3δ/2} log n]
        ≤ 2(b-a) Σ_{n=1}^{∞} n^{4/5-2δ} exp[-a(4M_5)^{-1} log n] + Σ_{n=1}^{∞} P(S_n^c) < ∞.   (3.65)

This proves (3.59), and the representation (3.55) is established. This completes the proof of Theorem 2.5.
•
Proof of Theorem 2.7.

First, we use (2.11) to rewrite the representation given in Theorem 2.5 as follows:

    ξ̂*_nh - ξ̂_nh = {K_n(h) g(ξ_0)}^{-1} Σ_{i=1}^{K_n(h)} [1(Z⁰*_ni > ξ_0) - (1-p)] + R*⁰_{nK_n(h)}.   (3.66)

Our objective at this point is to show that:

(1) sup_{h∈J_n(c,d)} |R*⁰_{nK_n(h)}| = O(n^{-3/5+3δ/2} log n)  a.s.;    (3.67)

(2) we can replace K_n(h) by [nhf(x_0)] in the first term of the expansion without slowing down the rate of convergence of the remainder term.

Our approach is very similar to the one of Bhattacharya and Gangopadhyay (1988). First note:

Lemma 3.7. Let Λ_n(h) = K_n(h) - nhf(x_0). Then

    sup_{h∈J_n(c,d)} |Λ_n(h)| = O(n^{2/5-δ} log n)  a.s.

Proof: Let π(h) = P(Y ≤ h/2). Now write

    Λ_n(h) = [Σ_{i=1}^{n} (1(Y_i ≤ h/2) - π(h))] + [nπ(h) - nhf(x_0)]
           = Λ_n1(h) + Λ_n2(h), say.                                    (3.68)

But, by condition [A1],

    |π(h) - hf(x_0)| ≤ (h^3/24) {sup_{|x-x_0|≤h/2} |f''(x)|}.           (3.69)

Thus

    sup_{h∈J_n(c,d)} |Λ_n2(h)| = O(n^{2/5-6δ}) = o(n^{2/5-δ} log n).    (3.70)

Now, since π'(h) < 2f(x_0) on J_n(c,d), we divide J_n(c,d) into v_n = 2(d-c) f(x_0) n^{2/5-δ} (log n)^{-1} equal intervals, to ensure that nπ(h) increases by at most (n^{2/5-δ} log n) over each interval; then, monitoring Λ_n1 at the (v_n + 1) endpoints of these intervals,

    P[ sup_{h∈J_n(c,d)} |Λ_n1(h)| > 2n^{2/5-δ} log n]
        ≤ (v_n + 1) sup_{h∈J_n(c,d)} P[|Λ_n1(h)| > n^{2/5-δ} log n].    (3.71)

Finally, since π(h) ≤ 2df(x_0) n^{-1/5-2δ} for all h ∈ J_n(c,d),

    sup_{h∈J_n(c,d)} P[|Λ_n1(h)| > n^{2/5-δ} log n] ≤ 2 exp[-(log n)^2/(8 df(x_0))],   (3.72)

by Bernstein's inequality, and Σ_{n=1}^{∞} n^{2/5-δ} exp[-a(log n)^2] < ∞ for all a > 0. Hence

    sup_{h∈J_n(c,d)} |Λ_n1(h)| = O(n^{2/5-δ} log n)  a.s.,              (3.73)

and the lemma is proved. •
Now to prove (3.67), for 0 < c < d, let a = cf(x_0)/2 < 2df(x_0) = b, and let

    A_n = { sup_{h∈J_n(c,d)} |R*⁰_{nK_n(h)}| > M n^{-3/5+3δ/2} log n},

    B_n = { max_{k∈I_n(a,b)} |R*⁰_nk| > M n^{-3/5+3δ/2} log n},

    C_n = { sup_{h∈J_n(c,d)} |Λ_n(h)| > M n^{2/5-δ} log n}.

So there exists N_0 = N_0(M) such that, for n > N_0, C_n^c implies K_n(h) ∈ I_n(a,b) for all h ∈ J_n(c,d). It now follows that, for sufficiently large M,

    P[A_n i.o.] ≤ P[C_n i.o.] + P[C_n^c and A_n i.o.]
                ≤ P[C_n i.o.] + P[C_n^c and B_n i.o.] = 0,

since, for large M, P(C_n i.o.) = 0 by Lemma 3.7, and P[B_n i.o.] = 0 by Theorem 2.5. This proves (3.67).
Finally, let

    U_ni = 1(Z⁰*_ni > ξ_0) - (1-p)                                      (3.74)

and

    m_n(h) = [nhf(x_0)].                                                (3.75)

Then

    {K_n(h) g(ξ_0)}^{-1} Σ_{i=1}^{K_n(h)} U_ni
        = {m_n(h) g(ξ_0)}^{-1} Σ_{i=1}^{m_n(h)} U_ni + R'_nh + R''_nh,  (3.76)

where

    R'_nh = {Λ_n(h)/K_n(h)} {m_n(h) g(ξ_0)}^{-1} Σ_{i=1}^{m_n(h)} U_ni,   (3.77)

    R''_nh = {1 - Λ_n(h)/K_n(h)} {m_n(h) g(ξ_0)}^{-1} [Σ_{i=1}^{K_n(h)} U_ni - Σ_{i=1}^{m_n(h)} U_ni],   (3.78)

and where U_n1,...,U_nn are i.i.d. with mean 0. By Lemma 3.7,

    sup_{h∈J_n(c,d)} |Λ_n(h)/K_n(h)| = O(n^{-2/5+δ} log n)  a.s.,       (3.79)

and

    sup_{h∈J_n(c,d)} |{m_n(h)}^{-1} Σ_{i=1}^{m_n(h)} U_ni| = O(n^{-1/5+δ/2})  a.s.,   (3.80)

by an application of Theorem 1 of Hoeffding (1963). Hence

    sup_{h∈J_n(c,d)} |R'_nh| = O(n^{-3/5+3δ/2} log n)  a.s.             (3.81)

Now consider the jump points of m_n(h) = [nhf(x_0)] in J_n(c,d), together with the end points cn^{-1/5-2δ} and dn^{-1/5-2δ}, and call these points

    cn^{-1/5-2δ} = h_n0 < h_n1 < ... < h_nv_n = dn^{-1/5-2δ};           (3.82)

then v_n ≤ n^{4/5-2δ}(d-c) f(x_0), h_{n,j+1} - h_nj ≤ {nf(x_0)}^{-1}, and m_n(h) is constant on each of the v_n intervals [h_nj, h_{n,j+1}). At the same time, K_n(h) is also integer-valued and non-decreasing, and |U_ni| ≤ 1. Hence, for each j and for all h_nj ≤ h < h_{n,j+1},

    |Σ_{i=1}^{K_n(h)} U_ni - Σ_{i=1}^{m_n(h)} U_ni|
        ≤ |Σ_{i=1}^{K_n(h_nj)} U_ni - Σ_{i=1}^{m_n(h_nj)} U_ni| + 1 + {K_n(h_{n,j+1}) - K_n(h_nj)}.   (3.83)
Therefore, if we can show that

    max_{0≤j≤v_n} |Σ_{i=1}^{K_n(h_nj)} U_ni - Σ_{i=1}^{m_n(h_nj)} U_ni| = O(n^{1/5-δ/2} log n)  a.s.   (3.84)

and

    max_{0≤j≤v_n} |K_n(h_{n,j+1}) - K_n(h_nj)| = O(n^{1/5-δ/2} log n)  a.s.,   (3.85)

then by Lemma 3.7, we have in (3.78),

    sup_{h∈J_n(c,d)} |R''_nh| = O(n^{-3/5+3δ/2} log n)  a.s.,           (3.86)

since

    sup_{h∈J_n(c,d)} |1 - Λ_n(h)/K_n(h)| {m_n(h)}^{-1} = O(n^{-4/5+2δ})  a.s.   (3.87)

To prove (3.85), note that K_n(h_{n,j+1}) - K_n(h_nj) is Binomial(n, π_nj), where

    π_nj = P(h_nj/2 < Y_i ≤ h_{n,j+1}/2) = n^{-1}[1 + o(1)],

since h_{n,j+1} - h_nj ≤ {nf(x_0)}^{-1}. Hence, for large n and for all j,

    P[K_n(h_{n,j+1}) - K_n(h_nj) ≥ 2M n^{1/5-δ/2} log n]
        ≤ P[n^{-1}({K_n(h_{n,j+1}) - K_n(h_nj)} - nπ_nj) ≥ M n^{-4/5-δ/2} log n]
        ≤ 2 exp[-(M/4) n^{1/5-δ/2} log n],                              (3.88)

by a variation of the Bernstein inequality, and the fact that, for v_n = (constant) n^{4/5-2δ},

    Σ_{n=1}^{∞} v_n exp[-(M/4) n^{1/5-δ/2} log n] < ∞.

This proves (3.85).
Finally, note that, by Hoeffding's inequality,

    P[|Σ_{i=1}^{K_n(h_nj)} U_ni - Σ_{i=1}^{m_n(h_nj)} U_ni| > M n^{1/5-δ/2} log n]
        ≤ 2 E exp[-2 M^2 n^{2/5-δ}(log n)^2 / |K_n(h_nj) - m_n(h_nj)|] + 2 exp(-2M log n).   (3.89)

Since

    Σ_{n=1}^{∞} v_n exp[-2M log n] = Const. Σ_{n=1}^{∞} n^{4/5-2δ-2M} < ∞   (3.90)

for sufficiently large M, to establish (3.84) we only need to show that

    Σ_{n=1}^{∞} n^{4/5-2δ} P[|K_n(h_nj) - m_n(h_nj)| > M n^{2/5-δ} log n] < ∞.   (3.91)

But K_n(h) - m_n(h) = K_n(h) - [nhf(x_0)] differs from Λ_n(h) = K_n(h) - nhf(x_0) by at most 1, and it is shown in the proof of Lemma 3.7 that

    P[|Λ_n(h)| > M n^{2/5-δ} log n] ≤ exp[-a(log n)^2],                 (3.92)

for some a > 0, which implies (3.91), and thus (3.84) is established.

Now note that (3.66), (3.67), (3.76), (3.81), and (3.86) together imply the result given in Theorem 2.7.
•
4. PROOFS OF THEOREMS 2.6 AND 2.8

Proof of Theorem 2.6.

In the representation of ξ̂*_nk of Theorem 2.5, take k = [n^{4/5-2δ} t] = n^{4/5-2δ} t + λ_n(t), with 0 ≤ |λ_n(t)| ≤ 1. After a little rearrangement of terms, this leads to

    n^{2/5-δ}(ξ̂*_{n[tn^{4/5-2δ}]} - ξ̂_{n[tn^{4/5-2δ}]})
        = σ t^{-1} n^{-2/5+δ} Σ_{i=1}^{[n^{4/5-2δ}t]} W*_ni + R_n1(t) + R_n2(t),   (4.1)

where σ = {p(1-p)}^{1/2}/g(ξ_0) and

    W*_ni = [1(Z⁰*_ni > ξ_0) - (1-p)]/{p(1-p)}^{1/2},  i = 1,2,...,n,   (4.2)

are i.i.d. with mean 0 and variance 1 for each n, in view of (3.56). Also, the remainder term R_n1(t) = n^{2/5-δ} R*_{n,[n^{4/5-2δ}t]} satisfies

    sup_{a≤t≤b} |R_n1(t)| = n^{2/5-δ} max_{k∈I_n(a,b)} |R*_nk| = O(n^{-1/5+δ/2} log n)  a.s.   (4.3)

The other remainder term R_n2(t) comes from the discrepancy 0 ≤ |λ_n(t)| ≤ 1 due to replacing k = [n^{4/5-2δ} t] by n^{4/5-2δ} t in the first term of the representation. But

    n^{-2/5+δ} |Σ_{i=1}^{[n^{4/5-2δ}t]} W*_ni| = O_p(1),                (4.4)

hence

    sup_{a≤t≤b} |R_n2(t)| = o_p(1).                                     (4.5)

Thus we have

    n^{2/5-δ}(ξ̂*_{n[tn^{4/5-2δ}]} - ξ̂_{n[tn^{4/5-2δ}]}) = σ t^{-1} n^{-2/5+δ} Σ_{i=1}^{[n^{4/5-2δ}t]} W*_ni + o_p(1),   (4.6)

uniformly in a ≤ t ≤ b. Now we use Theorem 1, page 452, of Gikhman and Skorokhod (1969) to see that

    {n^{-2/5+δ} Σ_{i=1}^{[n^{4/5-2δ}t]} W*_ni, a ≤ t ≤ b}  →_w  {B(t); a ≤ t ≤ b}.   (4.8)

This proves the theorem. •

The proof of Theorem 2.8 is exactly the same.
5. REMARKS

1. Similar results can be obtained by choosing k = O(n^{4/5}(log n)^{-1}) or h = O(n^{-1/5}(log n)^{-1}). Thus, "δ > 0" may be replaced by this less stringent condition.

2. If we choose k = O(n^{4/5}) [such that (n^{-4/5} k) → t, as n → ∞], this will result in a non-zero bias term (ψ(ξ_0) t^2) in the asymptotic distribution of ξ̂_nk. So, under this setup, to obtain a bootstrap confidence interval for ξ_0, we need to obtain a consistent estimator of ψ(ξ_0). Such an estimator may be obtained by replacing the unknown functionals appearing in ψ(ξ_0) by respective consistent estimators. However, note that if we define such an estimator in terms of a smoothing parameter b_n, then it can be seen easily that, for it to be consistent, we need to choose b_n such that b_n^{-2} O(n^{-2/5} log n) → 0 as n → ∞. But the corresponding rate of b_n cannot be achieved with k = O(n^{4/5}). Similar problems are encountered in the kernel case as well, if we choose the bandwidth h = O(n^{-1/5}).
REFERENCES

(1) Bahadur, R.R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37, 577-580.

(2) Bhattacharya, P.K. (1974). Convergence of sample paths of normalized sums of induced order statistics. Ann. Statist. 2, 1034-1039.

(3) Bhattacharya, P.K. and Gangopadhyay, A.K. (1988). Kernel and nearest neighbor estimation of a conditional quantile. Technical Report Series of the Intercollege Division of Statistics, University of California at Davis, Technical Report #104.

(4) Bhattacharya, P.K. and Mack, Y.P. (1987). Weak convergence of k-NN density and regression estimators with varying k and applications. Ann. Statist. 15, 976-994.

(5) Cheng, K.F. (1983). Nonparametric estimators for percentile regression functions. Commun. Statist. Theor. Meth. 12, 681-692.

(6) Gikhman, I.I. and Skorokhod, A.V. (1969). Introduction to the Theory of Random Processes. W.B. Saunders Company, Philadelphia.

(7) Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13-30.

(8) Stute, W. (1986). Conditional empirical processes. Ann. Statist. 14, 638-647.