KERNEL AND NEAREST NEIGHBOR ESTIMATION OF A CONDITIONAL QUANTILE

by

P.K. Bhattacharya, University of California, Davis

and

Ashis K. Gangopadhyay, University of North Carolina, Chapel Hill

ABSTRACT

Let $(X_1,Z_1), (X_2,Z_2), \ldots, (X_n,Z_n)$ be i.i.d. as $(X,Z)$, $Z$ taking values in $\mathbb{R}^1$, and for $0 < p < 1$, let $\xi_p(x)$ denote the conditional $p$-quantile of $Z$ given $X=x$, i.e., $P(Z \le \xi_p(x) \mid X=x) = p$. In this paper, kernel and nearest neighbor estimators of $\xi_p(x)$ are proposed. As a first step in studying the asymptotics of these estimates, Bahadur-type representations of the sample conditional quantile functions are obtained. These representations are used to examine the important issue of adaptive choice of the smoothing parameter by a local approach (for a fixed $x$), based on weak convergence of these estimators with varying $k$ in the $k$-nearest neighbor method and with varying $h$ in the kernel method with bandwidth $h$. These weak convergence results lead to asymptotic linear models which motivate certain estimators.

1. Introduction

Let $(X_1,Z_1), (X_2,Z_2), \ldots$ be two-dimensional random vectors which are i.i.d. as $(X,Z)$, and for $0 < p < 1$, let $\xi_p(x)$ denote the conditional $p$-quantile of $Z$ given $X=x$. We consider the problem of estimating $\xi_p(x)$ from the data $(X_1,Z_1),\ldots,(X_n,Z_n)$, and study the asymptotic properties of the kernel and nearest neighbor (NN) estimators as $n \to \infty$.

Usefulness of conditional quantile functions as good descriptive statistics has been discussed by Hogg (1975), who calls them percentile regression lines. The problem of conditional quantile estimation has been investigated by Bhattacharya (1963) following the fractile approach, and is also included in the general scheme of nonparametric regression considered by Stone (1977). More recently, asymptotic normality of estimators of conditional quantiles has been proved by Cheng (1983), who considered kernel estimators in the fixed design case, and by Stute (1986), who considered NN-type estimators in the random design case. However, both of these authors took the bandwidth $h_n$ to be $o(n^{-1/5})$, which kept the bias smaller than the random error by an order of magnitude, and thereby made the rate of convergence slower than optimal.

In this paper, we obtain Bahadur-type representations [Bahadur (1966)] of NN and kernel estimators of conditional quantiles as a first step in studying their asymptotics (Theorems N1 and K1). Bias terms show up in these representations because of our choice of $k = k_n = O(n^{4/5})$ in the $k$-NN method and $h = h_n = O(n^{-1/5})$ in the kernel method (with uniform kernel), for the purpose of achieving optimum balance between bias and random error. These representations are then used to examine the important issue of adaptive choice of the smoothing parameters by a local approach (for fixed $x$). Weak convergence results (Theorems N2 and K2) are proved for these estimators with varying $k$ in the NN method and with varying $h$ in the kernel method. These weak convergence results are analogous to the one obtained by Bhattacharya and Mack (1987) for $k$-NN regression estimators with varying $k$, and lead to asymptotic linear models similar to theirs, which motivate certain estimators.

2. The Main Results

For the random vector $(X,Z)$, let $f$ denote the pdf of $X$ and $g(\cdot \mid x)$ the conditional pdf of $Z$ given $X=x$, with corresponding conditional cdf $G(\cdot \mid x)$. We want to estimate $\xi_p(x_0)$, the conditional $p$-quantile of $Z$ given $X=x_0$. Since $p$ and $x_0$ will remain fixed throughout our discussion, we shall write $\xi_p(x_0) = \xi$.
The following regularity conditions are assumed.

1. (a) $f(x_0) > 0$.
   (b) $f''(x)$ exists in a neighborhood of $x_0$, and there exist $\epsilon > 0$ and $A < \infty$ such that $|x - x_0| \le \epsilon$ implies $|f''(x) - f''(x_0)| \le A|x - x_0|$.

2. (a) $g(\xi \mid x_0) > 0$, where $G(\xi \mid x_0) = p$.
   (b) The partial derivatives $g_x(z \mid x)$ and $g_{xx}(z \mid x)$ of $g(z \mid x)$ and $G_{xx}(z \mid x)$ of $G(z \mid x)$ exist in a neighborhood of $(x_0,\xi)$, and there exist $\epsilon > 0$ and $A < \infty$ such that $|x - x_0| \le \epsilon$ and $|z - \xi| \le \epsilon$ together imply
   $|g_z(z \mid x)| \le A$, $|g_x(z \mid x_0)| \le A$, $|g_{xx}(z \mid x_0)| \le A$,
   $|g_{xx}(z \mid x) - g_{xx}(z \mid x_0)| \le A|x - x_0|$, $|G_{xx}(z \mid x) - G_{xx}(z \mid x_0)| \le A|x - x_0|$.

By condition 2, $\xi$ is uniquely defined by $G(\xi \mid x_0) = p$.

Now let $\{(X_i,Z_i), i=1,2,\ldots\}$ be i.i.d. as $(X,Z)$, and let $Y_i = |X_i - x_0|$, so that $\{(Y_i,Z_i), i=1,2,\ldots\}$ are i.i.d. as $(Y,Z)$ with $Y = |X - x_0|$, the pdf $f_Y$ of $Y$, the conditional pdf $g^*(\cdot \mid y)$ of $Z$ given $Y=y$ and the corresponding conditional cdf $G^*(\cdot \mid y)$ given by

(1)  $g^*(z \mid y) = [f(x_0+y)\,g(z \mid x_0+y) + f(x_0-y)\,g(z \mid x_0-y)] / f_Y(y)$,
     $G^*(z \mid y) = [f(x_0+y)\,G(z \mid x_0+y) + f(x_0-y)\,G(z \mid x_0-y)] / f_Y(y)$.

Note that $g^*(z \mid 0) = g(z \mid x_0)$ and $G^*(z \mid 0) = G(z \mid x_0)$. Here and in what follows, we write $g(z \mid x_0) = g(z)$ and $G(z \mid x_0) = G(z)$ for simplicity.

Let $Y_{n1} < \cdots < Y_{nn}$ denote the order statistics of $Y_1,\ldots,Y_n$ and $Z_{n1},\ldots,Z_{nn}$ the induced order statistics of $(Y_1,Z_1),\ldots,(Y_n,Z_n)$, i.e., $Z_{ni} = Z_j$ if $Y_{ni} = Y_j$. For any positive integer $k \le n$, the $k$-NN empirical cdf of $Z$ (with respect to $x_0$) is now defined as

(2)  $\hat G_{nk}(z) = k^{-1} \sum_{i=1}^{k} 1(Z_{ni} \le z)$,

where $1(S)$ denotes the indicator of the event $S$. The $k$-NN estimator of $\xi$ can now be expressed as the $p$-quantile of $\hat G_{nk}$, i.e.,

(3)  $\hat\xi_{nk}$ = the $[kp]$th order statistic of $Z_{n1},\ldots,Z_{nk}$ = $\inf\{z : \hat G_{nk}(z) \ge [kp]/k\}$.

The kernel estimator of $\xi$ with uniform kernel and bandwidth $h$ can also be expressed in the same manner, viz.,

(4)  $\hat\xi_{nh} = \inf\{z : \hat G_{n,K_n(h)}(z) \ge [K_n(h)p]/K_n(h)\}$, where $K_n(h) = \sum_{i=1}^{n} 1(Y_i \le h/2) = n F_{Y,n}(h/2)$,

$F_{Y,n}$ being the empirical cdf of $Y_1,\ldots,Y_n$. The kernel estimators are thus related to the NN estimators by

(5)  $\hat\xi_{nh} = \hat\xi_{n,K_n(h)}$,

where $K_n(h)$ is the random integer given by (4).

We now state our main results in the following two theorems, of which Theorem N1 gives a Bahadur-type representation for the $k$-NN estimator $\hat\xi_{nk}$ of $\xi$ with $k$ lying in $I_n(a,b) = \{k : n^{4/5}a \le k \le n^{4/5}b\}$, $0 < a < b$, and Theorem K1 gives a corresponding representation for the kernel estimator $\hat\xi_{nh}$ with $h$ lying in $J_n(c,d) = [n^{-1/5}c,\, n^{-1/5}d]$, $0 < c < d$.

Theorem N1. For $0 < a < b$,
$\hat\xi_{nk} - \xi = \beta(\xi)(k/n)^2 + \{k g(\xi)\}^{-1} \sum_{i=1}^{k} [1(Z^*_{ni} > \xi) - (1-p)] + R_{nk}$,
where
$\beta(\xi) = -[f(x_0)\,G_{xx}(\xi \mid x_0) + 2f'(x_0)\,G_x(\xi \mid x_0)] / \{24 f^3(x_0)\,g(\xi)\}$,
$Z^*_{ni} = G^{-1} \circ G^*(Z_{ni} \mid Y_{ni})$, and
$\max_{k \in I_n(a,b)} |R_{nk}| = O(n^{-3/5}\log n)$, a.s.

Theorem K1. For $0 < c < d$,
$\hat\xi_{nh} - \xi = \beta(\xi) f^2(x_0) h^2 + \{[nhf(x_0)]\, g(\xi)\}^{-1} \sum_{i=1}^{[nhf(x_0)]} [1(Z^*_{ni} > \xi) - (1-p)] + R^*_{nh}$,
where $\beta(\xi)$ and $Z^*_{ni}$ are as in Theorem N1, and
$\sup_{h \in J_n(c,d)} |R^*_{nh}| = O(n^{-3/5}\log n)$, a.s.

Remarks.

1. Let $\mathcal{A} = \sigma\{Y_1, Y_2, \ldots\}$. Then $Z_{n1},\ldots,Z_{nn}$ are conditionally independent given $\mathcal{A}$, with $Z_{ni}$ having conditional cdf $G^*(\cdot \mid Y_{ni})$, as shown by Bhattacharya (1974). Hence $G^*(Z_{ni} \mid Y_{ni})$, $1 \le i \le n$, are conditionally independent and uniform $(0,1)$ given $\mathcal{A}$, and therefore, for each $n$, $Z^*_{n1},\ldots,Z^*_{nn}$ are i.i.d. with cdf $G$. Since $G(\xi) = p$, it follows that for each $n$, the summands $1(Z^*_{ni} > \xi) - (1-p)$ in the above representations are independent random variables with mean 0 and variance $p(1-p)$.

2. The remainder terms in both theorems are $O(n^{-3/5}\log n)$, a.s. In Theorem N1, this corresponds to $O(k^{-3/4}\log k)$ with $k = O(n^{4/5})$, as one would expect. The same explanation applies to Theorem K1, because $[nhf(x_0)] = O(n^{4/5})$ for $h = O(n^{-1/5})$.

3. Weaker versions of the above theorems were proved by Gangopadhyay (1987). His remainder terms were $o(n^{-2/5})$, a.s., in Theorem N1 and $o_p(n^{-2/5})$ in Theorem K1.
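The estimators (2)-(5) translate directly into code. The following Python sketch is an illustration only (the function names and the toy model $Z \mid X=x \sim N(x,1)$ are ours, not from the paper); it implements the $k$-NN estimator (3) and obtains the uniform-kernel estimator through the relation (5).

    import numpy as np

    def knn_conditional_quantile(x, z, x0, k, p):
        """k-NN estimate of the conditional p-quantile of Z given X = x0,
        per (2)-(3): order the sample by Y_i = |X_i - x0| and take the
        [kp]-th order statistic of the induced order statistics Z_n1,...,Z_nk."""
        nearest = np.argsort(np.abs(x - x0))[:k]     # indices of the k NN's of x0
        z_sorted = np.sort(z[nearest])               # Z_n1,...,Z_nk, sorted
        j = max(int(k * p), 1)                       # [kp], kept at least 1
        return z_sorted[j - 1]

    def kernel_conditional_quantile(x, z, x0, h, p):
        """Uniform-kernel estimate with bandwidth h, per (4)-(5): it equals the
        k-NN estimate with the random k = K_n(h) = #{i : |X_i - x0| <= h/2}."""
        K = int(np.count_nonzero(np.abs(x - x0) <= h / 2.0))
        if K == 0:
            raise ValueError("no observations within h/2 of x0")
        return knn_conditional_quantile(x, z, x0, K, p)

    # Toy model (ours): Z | X=x ~ N(x, 1), so xi_p(x0) = x0 + Phi^{-1}(p).
    rng = np.random.default_rng(0)
    n = 100_000
    x = rng.normal(size=n)
    z = x + rng.normal(size=n)
    print(knn_conditional_quantile(x, z, 0.0, int(n ** 0.8), 0.5))   # ~ 0.0
    print(kernel_conditional_quantile(x, z, 0.0, n ** -0.2, 0.5))    # ~ 0.0

The smoothing choices $k = n^{4/5}$ and $h = n^{-1/5}$ in the usage lines mirror the rates under which Theorems N1 and K1 are stated.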
3. Weak Convergence Properties of NN Estimators with Varying k and Kernel Estimators with Varying Bandwidth

Consider the stochastic processes $\{\hat\xi_{nk},\, k \in I_n(a,b)\}$ and $\{\hat\xi_{nh},\, h \in J_n(c,d)\}$. The two theorems in this section describe the weak convergence properties of suitably normalized versions of these processes as $n \to \infty$. The symbol $\Rightarrow$ indicates convergence in distribution, i.e., weak convergence of the distributions of the stochastic processes (or random vectors) under consideration, and $\{B(t), t \ge 0\}$ denotes a standard Brownian motion.

Theorem N2. Let $T_n(t) = \hat\xi_{n,[n^{4/5}t]}$. Then for any $0 < a < b$,
$\{n^{2/5}[T_n(t) - \xi] - \beta t^2,\ a \le t \le b\} \Rightarrow \{\sigma t^{-1} B(t),\ a \le t \le b\}$,
where $\beta = \beta(\xi)$ is given in Theorem N1 and $\sigma^2 = p(1-p)/g^2(\xi)$.

Proof. In the representation for $\hat\xi_{nk}$ given in Theorem N1, take $k = [n^{4/5}t] = n^{4/5}t - \epsilon_n(t)$ with $0 \le \epsilon_n(t) < 1$. After a little rearrangement of terms, this leads to
$n^{2/5}[T_n(t) - \xi] - \beta(\xi)t^2 = t^{-1}\{\sqrt{p(1-p)}/g(\xi)\}\, n^{-2/5} \sum_{i=1}^{[n^{4/5}t]} W_{ni} + \sum_{j=1}^{3} R_{nj}(t)$,
where

(6)  $W_{ni} = [1(Z^*_{ni} > \xi) - (1-p)]/\sqrt{p(1-p)}$

are i.i.d. with mean 0 and variance 1 for each $n$, in view of Remark 1. Of the three remainder terms, $R_{n1}(t) = n^{2/5} R_{n,[n^{4/5}t]}$ comes from Theorem N1, while $R_{n2}(t)$ and $R_{n3}(t)$ come from the discrepancy $0 \le \epsilon_n(t) < 1$ due to replacing $k = [n^{4/5}t]$ by $n^{4/5}t$ in the first two terms of the representation. Hence $\sup_{a \le t \le b} |R_{n1}(t)| = O(n^{-1/5}\log n)$, a.s., while $R_{n2}(t)$ and $R_{n3}(t)$ are easily seen to be uniformly $O_p(n^{-2/5})$ by virtue of
$\sup_{a \le t \le b} |n^{-2/5}\sum_{i=1}^{[n^{4/5}t]} W_{ni}| = O_p(1)$.
The three remainder terms are together $O_p(n^{-1/5}\log n)$ uniformly in $a \le t \le b$. We thus have, with $\sigma = \sqrt{p(1-p)}/g(\xi)$,
$n^{2/5}[T_n(t) - \xi] - \beta(\xi)t^2 = \sigma t^{-1} n^{-2/5}\sum_{i=1}^{[n^{4/5}t]} W_{ni} + o_p(1)$,
uniformly in $a \le t \le b$. Now use Theorem 1, page 452 of Gikhman and Skorokhod (1969) to see that $\{n^{-2/5}\sum_{i=1}^{[n^{4/5}t]} W_{ni},\ a \le t \le b\} \Rightarrow \{B(t),\ a \le t \le b\}$. This proves the theorem. □

Theorem K2. Let $S_n(t) = \hat\xi_{n,n^{-1/5}t}$. Then for any $0 < c < d$,
$\{n^{2/5}[S_n(t) - \xi] - \gamma t^2,\ c \le t \le d\} \Rightarrow \{\tau t^{-1} B(t),\ c \le t \le d\}$,
where $\gamma = \beta f^2(x_0)$ and $\tau^2 = \sigma^2/f(x_0)$, with $\beta$ and $\sigma$ as in Theorem N2.

Proof. In the representation for $\hat\xi_{nh}$ given in Theorem K1, take $h = n^{-1/5}t$ and rearrange terms to obtain
$n^{2/5}[S_n(t) - \xi] - \gamma t^2 = \sigma\{tf(x_0)\}^{-1} n^{-2/5}\sum_{i=1}^{[n^{4/5}tf(x_0)]} W_{ni} + R^*_n(t)$,
where $W_{ni}$, $1 \le i \le n$, are as in the proof of Theorem N2 and $R^*_n(t) = n^{2/5}R^*_{n,n^{-1/5}t}$ is from Theorem K1. Thus $\sup_{c \le t \le d} |R^*_n(t)| = O(n^{-1/5}\log n)$, a.s., and the theorem follows from the triangular-array version of Donsker's theorem referred to above, since $\{f(x_0)^{-1/2} B(tf(x_0)),\ t \ge 0\}$ is again a standard Brownian motion. □

Remark. The uniformity of the order of magnitude of the remainder terms in Theorems N1 and K1 was crucial in proving Theorems N2 and K2.

From Theorems N2 and K2, it follows that $n^{2/5}[T_n(t) - \xi] \Rightarrow N(\beta t^2, \sigma^2 t^{-1})$ and $n^{2/5}[S_n(t) - \xi] \Rightarrow N(\gamma t^2, \tau^2 t^{-1})$ for each $t$, where $N(\mu, \sigma^2)$ denotes a Gaussian r.v. with mean $\mu$ and variance $\sigma^2$. Hence the asymptotic mean-squared error (AMSE) of $T_n(t)$ is $n^{-4/5}(\beta^2 t^4 + \sigma^2 t^{-1})$, which is minimized at $t_1 = \{\sigma^2/(4\beta^2)\}^{1/5}$, and the AMSE of $S_n(t)$ is $n^{-4/5}(\gamma^2 t^4 + \tau^2 t^{-1})$, which is minimized at $t_2 = \{\tau^2/(4\gamma^2)\}^{1/5}$. However, these optimum $t_1$ and $t_2$ involve unknown quantities depending on the marginal distribution of $X$ and the conditional distribution of $Z$ given $X = x_0$. Although one could attempt to use consistent (but possibly non-optimal) estimates of $\beta$, $\sigma^2$, $\gamma$ and $\tau^2$ to approximate $t_1$ and $t_2$ in the spirit of Woodroofe (1970) and Krieger and Pickands (1981), we shall take another approach in the next section by considering linear combinations of $k$-NN estimators with varying $k$ and kernel estimators with varying bandwidth.
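The values $t_1$ and $t_2$ follow by elementary calculus; we record the computation for $T_n(t)$ (the case of $S_n(t)$ is identical with $(\gamma,\tau)$ in place of $(\beta,\sigma)$):

    \[
      \frac{d}{dt}\left(\beta^2 t^4 + \sigma^2 t^{-1}\right)
        = 4\beta^2 t^3 - \sigma^2 t^{-2} = 0
      \quad\Longleftrightarrow\quad
      t^5 = \frac{\sigma^2}{4\beta^2}
      \quad\Longleftrightarrow\quad
      t_1 = \left(\frac{\sigma^2}{4\beta^2}\right)^{1/5},
    \]

and the second derivative $12\beta^2 t^2 + 2\sigma^2 t^{-3} > 0$ confirms that this stationary point is a minimum.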
4. Asymptotic Linear Models and Linear Combinations of $\hat\xi_{nk}$ and $\hat\xi_{nh}$

Neglect the remainder term in Theorem N1 to obtain the following asymptotic linear model for $\{\hat\xi_{nk},\, k_0 \le k \le k_1\}$, where $k_0 = [n^{4/5}a]$ and $k_1 = [n^{4/5}b]$:

(7)  $\hat\xi_{nk} \approx \xi + (k/n)^2\beta + \sigma\eta_{nk}$, $k_0 \le k \le k_1$,

where $\beta$ and $\sigma$ are as in Theorem N2 and
$\eta_{nk} = k^{-1}\sum_{i=1}^{k} W_{ni}$,
in which $W_{ni}$, $1 \le i \le n$, are the i.i.d. r.v.'s with mean 0 and variance 1 given by (6). Hence
$E(\eta_{nk}) = 0$ and $\mathrm{Cov}(\eta_{nk}, \eta_{nk'}) = \{\max(k,k')\}^{-1}$.
This is exactly like the asymptotic linear model for $k$-NN regression estimates obtained by Bhattacharya and Mack (1987). Following their approach, we can now obtain the BLUE of $\xi$ in the model (7) and other suitably biased linear combinations of the $\hat\xi_{nk}$'s. The asymptotic distribution of the BLUE of $\xi$ and its ARE with respect to $\hat\xi_{n,[n^{4/5}t_1]}$, where $[n^{4/5}t_1]$ is the optimum number of NN's, follow from Theorem N2 in exactly the same way as in the case of regression estimates discussed in [4].

The kernel estimators $\hat\xi_{nh}$ with $n^{-1/5}c \le h \le n^{-1/5}d$ also satisfy an asymptotic linear model similar to (7). For this, first neglect the remainder term in Theorem K1 to obtain

(8)  $\hat\xi_{nh} \approx \xi + h^2 f^2(x_0)\beta + \sigma\eta_{n,[nhf(x_0)]}$,

where $\beta$, $\sigma$ and $\eta_{nk}$ are as in (7). Although this linear model is indexed by a continuous parameter $h$, the problems of finding the BLUE and other suitably biased linear combinations of the $\hat\xi_{nh}$'s in this model are essentially the same as the corresponding problems in (7). All questions about these linear combinations are thus covered by our discussion in the previous paragraph. To see this, let $m_0 = [n^{4/5}c f(x_0)]$, $m_1 = [n^{4/5}d f(x_0)]$, and define $h(m) = m/\{nf(x_0)\}$. Then for $h(m) \le h < h(m+1)$, $[nhf(x_0)] = m$. Using this in (8), we have $\hat\xi_{nh} \approx \hat\xi_{nh(m)}$, and

(9)  $\hat\xi_{nh(m)} \approx \xi + (m/n)^2\beta + \sigma\eta_{nm}$, $m_0 \le m \le m_1$, for $h(m) \le h < h(m+1)$.

All linear combinations $\int_{n^{-1/5}c}^{n^{-1/5}d} \ell(h)\,\hat\xi_{nh}\,dh$ of kernel estimators with varying bandwidth are, therefore, approximated by linear combinations of $\{\hat\xi_{nh(m)},\, m_0 \le m \le m_1\}$, for which the asymptotic linear model (9) is identical to (7). The asymptotic properties of these linear combinations follow from Theorem K2. The details are as in [7].
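As an illustration of how such linear combinations can be computed, the following Python sketch (ours, not from [4]; it treats $\beta$ and $\sigma$ as known, whereas in practice they would be replaced by estimates) obtains the BLUE of $\xi$ in model (7) by generalized least squares, using the covariance structure $\mathrm{Cov}(\eta_{nk},\eta_{nk'}) = \{\max(k,k')\}^{-1}$ derived above.

    import numpy as np

    def blue_conditional_quantile(xi_hat, ks, n, beta, sigma):
        """BLUE of xi in the model xi_hat_k = xi + beta*(k/n)**2 + sigma*eta_k,
        with Cov(eta_k, eta_k') = 1/max(k,k'), via generalized least squares."""
        ks = np.asarray(ks, dtype=float)
        y = np.asarray(xi_hat, dtype=float) - beta * (ks / n) ** 2  # de-biased responses
        V = sigma ** 2 / np.maximum.outer(ks, ks)                   # error covariance
        w = np.linalg.solve(V, np.ones(len(ks)))                    # V^{-1} 1
        return float(w @ y / w.sum())                               # (1'V^{-1}y)/(1'V^{-1}1)

Here `xi_hat` would hold the estimates $\hat\xi_{nk}$ for $k = k_0,\ldots,k_1$ (or the $\hat\xi_{nh(m)}$ of (9), which obey the same model).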
5. Proof of Theorem N1: Preliminary Lemmas

The $k$-NN estimator $\hat\xi_{nk}$ of $\xi$ is the $p$-quantile of the empirical cdf $\hat G_{nk}$ of $Z_{n1},\ldots,Z_{nk}$, which are conditionally independent given $\mathcal{A}$, with $Z_{ni}$ having conditional cdf $G^*(\cdot \mid Y_{ni})$. It is, therefore, natural to think of $\hat\xi_{nk}$ as an estimator of the $p$-quantile $\xi_{nk}$ of the random cdf $k^{-1}\sum_{i=1}^{k} G^*(\cdot \mid Y_{ni})$. The discrepancy between this random cdf and the cdf $G^*(\cdot \mid 0) = G(\cdot)$, of which $\xi$ is the $p$-quantile, is going to give rise to a bias in addition to the random error in estimating $\xi$ by $\hat\xi_{nk}$. To facilitate the examination of this bias and the random error, we introduce some notation. Let

(10)  $g^*(\cdot \mid Y_{ni}) = g_{ni}(\cdot)$, $G^*(\cdot \mid Y_{ni}) = G_{ni}(\cdot)$, $\bar g_{nk}(\cdot) = k^{-1}\sum_{i=1}^{k} g_{ni}(\cdot)$, $\bar G_{nk}(\cdot) = k^{-1}\sum_{i=1}^{k} G_{ni}(\cdot)$.

Then $\xi_{nk}$, which is the target of $\hat\xi_{nk}$, is given by

(11)  $\xi_{nk} = \inf\{z : \bar G_{nk}(z) \ge p\}$.

To examine the asymptotic properties of $\hat\xi_{nk} - \xi_{nk}$ and $\xi_{nk} - \xi$ for $k \in I_n(a,b)$, we now analyze the corresponding properties of $\hat G_{nk}(\cdot) - \bar G_{nk}(\cdot)$ and $\bar G_{nk}(\cdot) - G(\cdot)$. The order statistics $0 < Y_{n1} < \cdots < Y_{n,[n^{4/5}b]}$ are of the order of $n^{-1/5}$. Consequently, for $k \in I_n(a,b)$, it should be possible to approximate the pdf's $\bar g_{nk}$ and the cdf's $\bar G_{nk}$ defined in (10) by the first few terms of their expansions in powers of $Y_{ni}$, $i \le [n^{4/5}b]$. To this end, we have the following lemmas.

Lemma 1. For $B > b/f(x_0)$ and for sufficiently large $n$,
$P[Y_{n,[n^{4/5}b]} > n^{-1/5}B] \le 2\exp[-2n^{3/5}\{Bf(x_0) - b\}^2]$.

Proof. This is proved in [4]. □

Lemma 2. $k^{-1}\sum_{i=1}^{k} Y_{ni}^2 = \{12 f^2(x_0)\}^{-1}(k/n)^2 + R_{nk}$, where
$\max_{k \le [n^{4/5}b]} |R_{nk}| = O(n^{-7/10}\sqrt{\log\log n})$, a.s.

Proof. Let $0 < U_{n1} < \cdots < U_{nn} < 1$ denote the order statistics of a random sample of size $n$ from uniform $(0,1)$, and let $\varphi = F_Y^{-1}$. Then $\varphi(0) = \varphi''(0) = 0$, $\varphi'(0) = \{2f(x_0)\}^{-1}$ and $\varphi'''$ is continuous at 0 by condition 1. Use this to expand $Y_{ni} = \varphi(U_{ni})$ to the cubic term and observe that $U_{n,[n^{4/5}b]} = O(n^{-1/5})$, a.s., by Lemma 1. This leads to

(12)  $k^{-1}\sum_{i=1}^{k} Y_{ni}^2 = \{2f(x_0)\}^{-2}\, k^{-1}\sum_{i=1}^{k} U_{ni}^2 + R_{nk}(1)$, where $\max_{k \le [n^{4/5}b]} |R_{nk}(1)| = O(n^{-4/5})$, a.s.

Now write

(13)  $k^{-1}\sum_{i=1}^{k} U_{ni}^2 = \frac{1}{3}(k/n)^2 + R_{nk}(2) + R_{nk}(3)$,

where
$R_{nk}(2) = k^{-1}\sum_{i=1}^{k}\{U_{ni}^2 - (i/n)^2\}$, so that
$|R_{nk}(2)| \le 2(k/n)\max_{1 \le i \le n}|U_{ni} - i/n| + \max_{1 \le i \le n}|U_{ni} - i/n|^2$
and $\max_{k \le [n^{4/5}b]} |R_{nk}(2)| = O(n^{-7/10}\sqrt{\log\log n})$, a.s., by the law of the iterated logarithm (see [6], page 157), while
$R_{nk}(3) = k^{-1}\sum_{i=1}^{k}(i/n)^2 - \frac{1}{3}(k/n)^2 = (k/n)^2\{k^{-1}\sum_{i=1}^{k}(i/k)^2 - \int_0^1 x^2\,dx\}$,
so that $|R_{nk}(3)| \le (k+1)/n^2$ and $\max_{k \le [n^{4/5}b]} |R_{nk}(3)| = O(n^{-6/5})$. Combining (12) and (13), the lemma is proved. □

Lemma 3. The following expansions hold for the conditional pdf $g^*(z \mid y)$ and the conditional cdf $G^*(z \mid y)$:
$g^*(z \mid y) = g(z) + \frac{1}{2}y^2 q(z) + y^3 r(y,z)$,
$G^*(z \mid y) = G(z) + \frac{1}{2}y^2 Q(z) + y^3 R(y,z)$,
where
$q(z) = g_{xx}(z \mid x_0) + 2\{f'(x_0)/f(x_0)\}\, g_x(z \mid x_0)$,
$Q(z) = G_{xx}(z \mid x_0) + 2\{f'(x_0)/f(x_0)\}\, G_x(z \mid x_0)$,
and there exist $\epsilon > 0$ and $M < \infty$ such that $|q(z)|$, $|Q(z)|$, $|r(y,z)|$ and $|R(y,z)|$ are all bounded by $M$ for $0 \le y \le \epsilon$ and $|z - \xi| \le \epsilon$.

Proof. Expand $f(x_0 \pm y)$, $g(z \mid x_0 \pm y)$ and $G(z \mid x_0 \pm y)$ about $y = 0$ to obtain:
$f(x_0 \pm y) = f(x_0) \pm y f'(x_0) + \frac{1}{2}y^2\{f''(x_0) + \delta_1(y)\}$,
$g(z \mid x_0 \pm y) = g(z \mid x_0) \pm y\, g_x(z \mid x_0) + \frac{1}{2}y^2\{g_{xx}(z \mid x_0) + \delta_2(y,z)\}$,
$G(z \mid x_0 \pm y) = G(z \mid x_0) \pm y\, G_x(z \mid x_0) + \frac{1}{2}y^2\{G_{xx}(z \mid x_0) + \delta_3(y,z)\}$,
with $\max\{|\delta_1(y)|, |\delta_2(y,z)|, |\delta_3(y,z)|\} \le Ay$ for $0 \le y \le \epsilon$ and $|z - \xi| \le \epsilon$, where $\epsilon > 0$ and $A < \infty$ are as in conditions 1 and 2. To obtain the formulas for $g^*(z \mid y)$ and $G^*(z \mid y)$ stated in the lemma, use the above expansions in (1). It will be seen by appropriate arrangement of terms that in the remainders $y^3 r(y,z)$ for $g^*(z \mid y)$ and $y^3 R(y,z)$ for $G^*(z \mid y)$, the quantities $|r(y,z)|$ and $|R(y,z)|$ for $0 \le y \le \epsilon$ and $|z - \xi| \le \epsilon$ remain bounded by a constant $M$ determined by $f(x_0)$, $f''(x_0)$, $g(\xi \mid x_0)$, $g_{xx}(\xi \mid x_0)$, $G_x(\xi \mid x_0)$, $G_{xx}(\xi \mid x_0)$ and $A$, where $\epsilon > 0$ and $A < \infty$ are as in conditions 1 and 2. The boundedness of $|q(z)|$ and $|Q(z)|$ for $|z - \xi| \le \epsilon$ also follows from condition 2. □
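Lemma 3 can be checked numerically in a concrete model. In the sketch below (our illustration; the model $X \sim N(0,1)$, $Z \mid X=x \sim N(x,1)$, $x_0 = 0$ is ours), $f'(x_0) = 0$, so $Q(z) = G_{xx}(z \mid x_0) = -z\varphi(z)$ and $G^*(z \mid y) = \{\Phi(z-y) + \Phi(z+y)\}/2$; by the symmetry of this particular model the error is actually $O(y^4)$ rather than the general $O(y^3)$.

    import math

    def Phi(u):                      # standard normal cdf
        return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

    def phi(u):                      # standard normal pdf
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

    # Model (ours): X ~ N(0,1), Z | X=x ~ N(x,1), x0 = 0, hence f'(x0) = 0,
    # G*(z|y) = (Phi(z-y) + Phi(z+y))/2 and Q(z) = G_xx(z|x0) = -z*phi(z).
    z = 0.6745                        # roughly the p = 0.75 quantile at x0
    for y in (0.2, 0.1, 0.05):
        exact = 0.5 * (Phi(z - y) + Phi(z + y))
        approx = Phi(z) + 0.5 * y * y * (-z * phi(z))
        print(y, abs(exact - approx))   # shrinks like y^4 here (y^3 in general)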
6. Proof of Theorem N1: Bias in $\hat\xi_{nk}$

Recall that the target of $\hat\xi_{nk}$ is $\xi_{nk}$, the $p$-quantile of the random cdf $\bar G_{nk}(\cdot) = k^{-1}\sum_{i=1}^{k} G^*(\cdot \mid Y_{ni})$, while $\xi$ is the $p$-quantile of $G(\cdot)$. The leading term of $\xi_{nk} - \xi$ is non-stochastic with probability 1, and is determined in this section. This is the bias in $\hat\xi_{nk}$. We first use Lemma 3 to bound the discrepancy between $\bar G_{nk}(\cdot)$ and $G(\cdot)$ near $\xi$ in terms of $Y_{n,[n^{4/5}b]}$. This makes $\max_{k \in I_n(a,b)} |\xi_{nk} - \xi|$ small whenever $Y_{n,[n^{4/5}b]}$ is small, in the manner described in the following lemma. The almost sure order of magnitude of $\xi_{nk} - \xi$ is obtained as a corollary.

Lemma 4. For every $B$, there exist $N$ and $C$ such that in the sample space of infinite sequences $\{(Y_i,Z_i),\, i=1,2,\ldots\}$, $Y_{n,[n^{4/5}b]} \le Bn^{-1/5}$ implies $\max_{k \in I_n(a,b)} |\xi_{nk} - \xi| \le Cn^{-2/5}$ for $n \ge N$.

Proof. Fix $B < \infty$ and $0 < a < b$. By (11), it is enough to show the existence of $N$ and $C$ such that for $n \ge N$ and $k \in I_n(a,b)$, $Y_{n,[n^{4/5}b]} \le Bn^{-1/5}$ implies
$\bar G_{nk}(\xi - Cn^{-2/5}) \le G(\xi) \le \bar G_{nk}(\xi + Cn^{-2/5})$.
For this, assume $Y_{n,[n^{4/5}b]} \le Bn^{-1/5}$, choose $N$ and $C$ so that

(14)  $BN^{-1/5} \le \epsilon$ and $CN^{-2/5} \le \epsilon$, with $\epsilon$ as in Lemma 3,

and use Lemma 3 to obtain

(15)  $|\bar G_{nk}(\xi \pm Cn^{-2/5}) - G(\xi \pm Cn^{-2/5})| \le \frac{1}{2}MB^2 n^{-2/5} + MB^3 n^{-3/5} \le MB^2 n^{-2/5}$

for $n \ge N$. Moreover, since by condition 2,
$G(\xi - Cn^{-2/5}) \le G(\xi) - \frac{1}{2}Cn^{-2/5}g(\xi)$ and $G(\xi + Cn^{-2/5}) \ge G(\xi) + \frac{1}{2}Cn^{-2/5}g(\xi)$
for large $n$, (15) implies
$\bar G_{nk}(\xi - Cn^{-2/5}) \le G(\xi) - \frac{1}{2}Cn^{-2/5}g(\xi) + MB^2 n^{-2/5}$,
$\bar G_{nk}(\xi + Cn^{-2/5}) \ge G(\xi) + \frac{1}{2}Cn^{-2/5}g(\xi) - MB^2 n^{-2/5}$
for $n \ge N$. The lemma is proved by choosing $C > 2MB^2/g(\xi)$ and then choosing $N$ so as to satisfy (14). □

Corollary. For $0 < a < b$, $\max_{k \in I_n(a,b)} |\xi_{nk} - \xi| = O(n^{-2/5})$, a.s.

Proof. Take $B > b/f(x_0)$ and apply Lemma 4 with $C$ and $N$ appropriately determined by $B$. Then by Lemma 1,
$\sum_{n=N}^{\infty} P[Y_{n,[n^{4/5}b]} > Bn^{-1/5}] \le \sum_{n=N}^{\infty} 2\exp[-2n^{3/5}\{Bf(x_0) - b\}^2] < \infty$,
and the corollary follows from the Borel-Cantelli lemma. □

We now determine the leading term of $\xi_{nk} - \xi$.

Lemma 5. For $0 < a < b$, $\xi_{nk} - \xi = \beta(\xi)(k/n)^2 + R_{nk}$, where $\beta(\xi)$ is as in Theorem N1 and $\max_{k \in I_n(a,b)} |R_{nk}| = O(n^{-3/5})$, a.s.

Proof. By (11), the mean value theorem and Lemma 3,
$p = \bar G_{nk}(\xi_{nk}) = k^{-1}\sum_{i=1}^{k}[G_{ni}(\xi) + (\xi_{nk} - \xi)\,g_{ni}(z_{ni})]$
$= G(\xi) + k^{-1}\sum_{i=1}^{k}\{\frac{1}{2}Y_{ni}^2 Q(\xi) + Y_{ni}^3 R(Y_{ni},\xi)\} + (\xi_{nk} - \xi)\,[g(\xi) + k^{-1}\sum_{i=1}^{k}\{g(z_{ni}) - g(\xi) + \frac{1}{2}Y_{ni}^2 q(z_{ni}) + Y_{ni}^3 r(Y_{ni},z_{ni})\}]$,
where

(16)  $z_{ni}$ lies between $\xi$ and $\xi_{nk}$, so that $\max_{k \in I_n(a,b)}\max_{i \le k} |z_{ni} - \xi| = O(n^{-2/5})$, a.s., by the corollary to Lemma 4.

Hence, since $G(\xi) = p$,
$\xi_{nk} - \xi = -\,\dfrac{\frac{1}{2}Q(\xi)\,k^{-1}\sum_{i=1}^{k} Y_{ni}^2 + k^{-1}\sum_{i=1}^{k} Y_{ni}^3 R(Y_{ni},\xi)}{g(\xi) + k^{-1}\sum_{i=1}^{k}\{g(z_{ni}) - g(\xi) + \frac{1}{2}Y_{ni}^2 q(z_{ni}) + Y_{ni}^3 r(Y_{ni},z_{ni})\}}$
$= -\,\dfrac{Q(\xi)}{24 f^2(x_0)\, g(\xi)}\,(k/n)^2 + R_{nk}(1) + R_{nk}(2) + R_{nk}(3) = \beta(\xi)(k/n)^2 + R_{nk}(1) + R_{nk}(2) + R_{nk}(3)$,
where $\max_{k \in I_n(a,b)} |R_{nk}(1)| = O(n^{-3/5})$, a.s., by Lemma 2; $\max_{k \in I_n(a,b)} |R_{nk}(2)| = O(n^{-3/5})$, a.s., by Lemmas 1 and 3; and $\max_{k \in I_n(a,b)} |R_{nk}(3)| = O(n^{-4/5})$, a.s., by (16), Lemmas 1 and 3, and condition 2. Since $(k/n)^2 \le n^{-2/5}b^2$ for $k \in I_n(a,b)$, the lemma is proved. □

7. Proof of Theorem N1: Conclusion

We shall first express $\hat\xi_{nk} - \xi_{nk}$ in a manner analogous to the familiar representation of sample quantiles [Bahadur (1966)]. The random variables involved in this representation are $1(Z_{ni} > \xi_{nk}) - \{1 - G_{ni}(\xi_{nk})\}$, which are conditionally independent given $\mathcal{A} = \sigma\{Y_1,Y_2,\ldots\}$, and the remainder term is $O(n^{-3/5}\log n)$, a.s., which corresponds to $O(k^{-3/4}\log k)$ with $k = O(n^{4/5})$, as one would expect. This representation is then modified to another one in which the random variables $1(Z_{ni} > \xi_{nk}) - \{1 - G_{ni}(\xi_{nk})\}$ are replaced by $1(Z_{ni} > \xi) - \{1 - G_{ni}(\xi)\}$, and to yet another one in which these random variables are replaced by $1(Z^*_{ni} > \xi) - \{1 - G(\xi)\}$, where $Z^*_{ni} = G^{-1} \circ G_{ni}(Z_{ni})$, $1 \le i \le n$, are i.i.d. with cdf $G$. Together with the bias term obtained in Lemma 5, this will establish the representation of $\hat\xi_{nk}$ given in Theorem N1. Having completed the proof of Theorem N1 in this section, we shall then prove Theorem K1 in the next section, via formula (5), establishing the corresponding representation for the kernel estimator $\hat\xi_{nh}$ (with uniform kernel and bandwidth $h$). It should be emphasized that Theorem N1 is proved uniformly in $k \in I_n(a,b)$ with $0 < a < b$, and Theorem K1 is proved uniformly in $h \in J_n(c,d)$ with $0 < c < d$, where $I_n(a,b)$ and $J_n(c,d)$ are as given in Section 2.

The first representation of $\hat\xi_{nk} - \xi_{nk}$ rests on Lemmas 8 and 9, which run parallel to Bahadur's proof [1]. However, we start with a lemma which provides an exponential bound for the deviation of sums of independent Bernoulli variables from their means, and then prove another lemma dealing with fluctuations of $\hat G_{nk}(\cdot) - \bar G_{nk}(\cdot)$.

Lemma 6. Let $U_{n1},\ldots,U_{nn}$ be independent Bernoulli variables with $P(U_{ni} = 1) = \pi_{ni} = 1 - P(U_{ni} = 0)$. Then
$P[|n^{-1}\sum_{i=1}^{n}(U_{ni} - \pi_{ni})| > t_n] \le 2\exp[-\tfrac{1}{2} n t_n^2 / \{\max_{1 \le i \le n}\pi_{ni} + t_n\}]$.
In particular, if $t_n / \max_{1 \le i \le n}\pi_{ni} \to 0$, then for large $n$,
$P[|n^{-1}\sum_{i=1}^{n}(U_{ni} - \pi_{ni})| > t_n] \le 2\exp[-\tfrac{1}{4} n t_n^2 / \max_{1 \le i \le n}\pi_{ni}]$,
and if $\max_{1 \le i \le n}\pi_{ni} / t_n \to 0$, then for large $n$,
$P[|n^{-1}\sum_{i=1}^{n}(U_{ni} - \pi_{ni})| > t_n] \le 2\exp[-\tfrac{1}{4} n t_n]$.

Proof. The first inequality is a simplified version of Bernstein's inequality (see [14], page 205), from which the other two follow as special cases. □
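The following short simulation (ours; the configuration of success probabilities and all numeric choices are assumptions for illustration) shows the first bound of Lemma 6 in action: the empirical tail probability sits well below the Bernstein-type bound.

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps, t = 1000, 5000, 0.02
    pi = rng.uniform(0.0, 0.05, size=n)                  # small pi_ni, fixed across reps
    dev = (rng.random((reps, n)) < pi).mean(axis=1) - pi.mean()
    emp = np.mean(np.abs(dev) > t)                       # empirical tail probability
    bound = 2.0 * np.exp(-0.5 * n * t ** 2 / (pi.max() + t))
    print(emp, bound)                                    # empirical tail << bound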
Lemma 7. Suppose $\zeta_{nk}$ are $\mathcal{A}$-measurable random variables with $|\zeta_{nk} - \xi_{nk}| \le Cn^{-2/5}\log n = \epsilon_n(C)$ for all $k \in I_n(a,b)$. Then for any $\gamma > 0$, there exists $M$ such that
$\sum_{n=1}^{\infty} n^{\gamma}\, P[\max_{k \in I_n(a,b)} |k^{-1}\sum_{i=1}^{k}(U_{nki} - \mu_{nki})| > Mn^{-3/5}\log n] < \infty$,
where $U_{nki} = 1(Z_{ni} \le \zeta_{nk}) - 1(Z_{ni} \le \xi_{nk})$ and $\mu_{nki} = E(U_{nki} \mid \mathcal{A}) = G_{ni}(\zeta_{nk}) - G_{ni}(\xi_{nk})$.

Proof. Choose $B > b/f(x_0)$, and for each $n$, let $S_n = \{Y_{n,[n^{4/5}b]} \le Bn^{-1/5}\}$. Lemma 4 implies that there exist $C'$ and $N$ such that for $n \ge N$ and for $z$ lying between $\zeta_{nk}$ and $\xi_{nk}$,
$|z - \xi| \le Cn^{-2/5}\log n + C'n^{-2/5} \le 2\epsilon_n(C)$
holds on the set $S_n$. Using Lemma 3, we now conclude that when $n$ is large, then on $S_n$,
$\max_{k \in I_n(a,b)}\max_{i \le k} |\mu_{nki}| \le \tfrac{5}{4}Cg(\xi)\,n^{-2/5}\log n$.
It now follows from Lemma 6 that for sufficiently large $n$,
$\max_{k \in I_n(a,b)} P[|k^{-1}\sum_{i=1}^{k}(U_{nki} - \mu_{nki})| > Mn^{-3/5}\log n]$
$\le \max_{k \in I_n(a,b)} E\,P[|k^{-1}\sum_{i=1}^{k}(U_{nki} - \mu_{nki})| > Mn^{-3/5}\log n \mid \mathcal{A}] \le 2\exp[-M^2 a\{5Cg(\xi)\}^{-1}\log n] + P(S_n^c)$,
so that
$P[\max_{k \in I_n(a,b)} |k^{-1}\sum_{i=1}^{k}(U_{nki} - \mu_{nki})| > Mn^{-3/5}\log n] \le 2(b-a)n^{4/5}\, n^{-M^2 a/\{5Cg(\xi)\}} + P(S_n^c)$.
To complete the proof, observe that $n^{\gamma + 4/5}\, n^{-M^2 a/\{5Cg(\xi)\}}$ is summable for sufficiently large $M$, and $\sum_n n^{\gamma} P(S_n^c) < \infty$ by Lemma 1, since the bound of Lemma 1 decays faster than any power of $n$. □

We now define $a_n = n^{-2/5}\log n$ and $b_n = n^{1/5}$, and divide the interval $[\xi_{nk} - a_n,\, \xi_{nk} + a_n]$ into $2b_n$ equal intervals $J_{nk,r} = [\eta_{nk,r}, \eta_{nk,r+1}]$, $-b_n \le r \le b_n - 1$, each of length $a_n/b_n = n^{-3/5}\log n$, where $\eta_{nk,r} = \xi_{nk} + r\,a_n/b_n$. Let

(17)  $H_{nk}(z) = \{\hat G_{nk}(z) - \bar G_{nk}(z)\} - \{\hat G_{nk}(\xi_{nk}) - \bar G_{nk}(\xi_{nk})\}$,
$H^*_{nk} = \sup_{|z - \xi_{nk}| \le a_n} |H_{nk}(z)|$, $H^*_n = \max_{k \in I_n(a,b)} H^*_{nk}$.

Lemma 8. $P[\max_{k \in I_n(a,b)} |\hat\xi_{nk} - \xi_{nk}| > a_n \text{ i.o.}] = 0$.

Proof. Fix $B > b/f(x_0)$ and let $S_n = \{Y_{n,[n^{4/5}b]} \le Bn^{-1/5}\}$. Then by Lemma 3,
$\min_{k \in I_n(a,b)}\{[kp]/k - \bar G_{nk}(\xi_{nk} - a_n)\} \ge \tfrac{1}{2}g(\xi)a_n$
on the set $S_n$ when $n$ is large. Hence for large $n$,
$P[\min_{k \in I_n(a,b)}(\hat\xi_{nk} - \xi_{nk}) \le -a_n] \le \sum_{k \in I_n(a,b)} P[\hat G_{nk}(\xi_{nk} - a_n) \ge [kp]/k,\ S_n] + P(S_n^c) \le n^{4/5}\cdot 2\exp[-\tfrac{1}{2}a g^2(\xi)(\log n)^2] + P(S_n^c)$
by Theorem 1 of Hoeffding (1963). Since $\sum_n n^{4/5}\exp[-\tfrac{1}{2}a g^2(\xi)(\log n)^2] < \infty$ and $\sum_n P(S_n^c) < \infty$,
$P[\min_{k \in I_n(a,b)}(\hat\xi_{nk} - \xi_{nk}) \le -a_n \text{ i.o.}] = 0$.
In the same way, $P[\max_{k \in I_n(a,b)}(\hat\xi_{nk} - \xi_{nk}) \ge a_n \text{ i.o.}] = 0$, and the lemma is proved. □

Lemma 9. $P[H^*_n > Cn^{-3/5}\log n \text{ i.o.}] = 0$ for large $C$.

Proof. It follows from the monotonicity of $\hat G_{nk}(\cdot)$ and $\bar G_{nk}(\cdot)$ that for $z \in J_{nk,r} = [\eta_{nk,r}, \eta_{nk,r+1}]$,
$H_{nk}(\eta_{nk,r}) - \alpha_{nk,r} \le H_{nk}(z) \le H_{nk}(\eta_{nk,r+1}) + \alpha_{nk,r}$,
where $H_{nk}(\cdot)$ is given by (17) and $\alpha_{nk,r} = \bar G_{nk}(\eta_{nk,r+1}) - \bar G_{nk}(\eta_{nk,r})$. Hence
$H^*_{nk} \le \max_{-b_n \le r \le b_n} |H_{nk}(\eta_{nk,r})| + \max_{-b_n \le r \le b_n - 1} \alpha_{nk,r}$.
Let $S_n = \{Y_{n,[n^{4/5}b]} \le Bn^{-1/5}\}$ as in the previous proofs. Then by Lemmas 3 and 4,
$\max_{-b_n \le r \le b_n - 1} \alpha_{nk,r} \le 2g(\xi)\,n^{-3/5}\log n$
on the set $S_n$ when $n$ is large. Hence
$P[H^*_n > \{M + 2g(\xi)\}n^{-3/5}\log n] \le P[\max_{k \in I_n(a,b)}\max_{-b_n \le r \le b_n} |H_{nk}(\eta_{nk,r})| > Mn^{-3/5}\log n] + P(S_n^c)$
$\le 2n(b-a)\max_{k \in I_n(a,b)}\max_{-b_n \le r \le b_n} P[|H_{nk}(\eta_{nk,r})| > Mn^{-3/5}\log n] + P(S_n^c)$.
But $|\eta_{nk,r} - \xi_{nk}| \le a_n = n^{-2/5}\log n$ and $\eta_{nk,r}$ is $\mathcal{A}$-measurable, so by Lemma 7 (applied with $\zeta_{nk} = \eta_{nk,r}$ for each fixed $r$ and $\gamma = 2$),
$\sum_{n=1}^{\infty} 2n(b-a)\max_{k \in I_n(a,b)}\max_{-b_n \le r \le b_n} P[|H_{nk}(\eta_{nk,r})| > Mn^{-3/5}\log n] < \infty$,
while $\sum_n P(S_n^c) < \infty$ by Lemma 1. This completes the proof. □

By Lemmas 8 and 9, we now have:

(18)  $p - \hat G_{nk}(\xi_{nk}) = \bar G_{nk}(\hat\xi_{nk}) - \bar G_{nk}(\xi_{nk}) + R_{nk}(1) = (\hat\xi_{nk} - \xi_{nk})\,\bar g_{nk}(\xi^*_{nk}) + R_{nk}(1)$,

where $\xi^*_{nk}$ lies between $\xi_{nk}$ and $\hat\xi_{nk}$ (the difference between $[kp]/k$ and $p$, being at most $1/k$, is absorbed in $R_{nk}(1)$), and

(19)  $\max_{k \in I_n(a,b)} |R_{nk}(1)| = O(n^{-3/5}\log n)$, a.s.

By the corollary to Lemma 4 and Lemma 8, $\max_{k \in I_n(a,b)} |\xi^*_{nk} - \xi| = O(n^{-2/5}\log n)$, a.s. Consequently, Lemmas 1 and 3 now imply

(20)  $\max_{k \in I_n(a,b)} |\bar g_{nk}(\xi^*_{nk}) - g(\xi)| = O(n^{-2/5}\log n)$, a.s.

Again, letting $U_{nki} = 1(Z_{ni} \le \xi_{nk})$, we have
$P[\max_{k \in I_n(a,b)} |p - \hat G_{nk}(\xi_{nk})| > n^{-2/5}\log n] \le \sum_{k \in I_n(a,b)} E\,P[|k^{-1}\sum_{i=1}^{k}\{U_{nki} - E(U_{nki} \mid \mathcal{A})\}| > n^{-2/5}\log n \mid \mathcal{A}] \le 2(b-a)n^{4/5}\exp[-2a(\log n)^2]$
by Theorem 1 of Hoeffding (1963), and $\sum_n n^{4/5}\exp[-2a(\log n)^2] < \infty$. Hence

(21)  $\max_{k \in I_n(a,b)} |p - \hat G_{nk}(\xi_{nk})| = O(n^{-2/5}\log n)$, a.s.

From (18), (19), (20), and (21), we have
$\max_{k \in I_n(a,b)} |(\hat\xi_{nk} - \xi_{nk}) - \{g(\xi)\}^{-1}[p - \hat G_{nk}(\xi_{nk})]| = O(n^{-3/5}\log n)$, a.s.
Since $p - \hat G_{nk}(\xi_{nk}) = k^{-1}\sum_{i=1}^{k}[1(Z_{ni} > \xi_{nk}) - \{1 - G_{ni}(\xi_{nk})\}]$, we now have the following representation:

(22a)  $\hat\xi_{nk} = \xi_{nk} + \{kg(\xi)\}^{-1}\sum_{i=1}^{k}[1(Z_{ni} > \xi_{nk}) - \{1 - G_{ni}(\xi_{nk})\}] + R_{nk}$, with $\max_{k \in I_n(a,b)} |R_{nk}| = O(n^{-3/5}\log n)$, a.s.
This representation can easily be modified to two other slightly different forms, viz.,

(22b)  $\hat\xi_{nk} = \xi_{nk} + \{kg(\xi)\}^{-1}\sum_{i=1}^{k}[1(Z_{ni} > \xi) - \{1 - G_{ni}(\xi)\}] + R_{nk}$

and

(22c)  $\hat\xi_{nk} = \xi_{nk} + \{kg(\xi)\}^{-1}\sum_{i=1}^{k}[1(Z^*_{ni} > \xi) - \{1 - G(\xi)\}] + R_{nk}$,

where $\max_{k \in I_n(a,b)} |R_{nk}| = O(n^{-3/5}\log n)$, a.s., in both (22b) and (22c), and $Z^*_{ni} = G^{-1} \circ G_{ni}(Z_{ni})$ and $G(\cdot) = G(\cdot \mid x_0)$ is the conditional cdf of $Z$ given $X = x_0$. Note that since $Z_{ni}$ are conditionally independent given $\mathcal{A}$, with $Z_{ni}$ having conditional cdf $G_{ni}$,
$P[Z^*_{ni} \le z_i,\ i=1,\ldots,n] = E\,P[Z^*_{ni} \le z_i,\ i=1,\ldots,n \mid \mathcal{A}] = E\,P[Z_{ni} \le G_{ni}^{-1} \circ G(z_i),\ i=1,\ldots,n \mid \mathcal{A}] = \prod_{i=1}^{n} G(z_i)$,
so that for each $n$, $Z^*_{n1},\ldots,Z^*_{nn}$ are i.i.d. with cdf $G(\cdot)$.

To obtain (22b) from (22a), use Lemma 7 with $\zeta_{nk} = \xi$ to conclude that
$\max_{k \in I_n(a,b)} |k^{-1}\sum_{i=1}^{k}[\{1(Z_{ni} \le \xi_{nk}) - 1(Z_{ni} \le \xi)\} - \{G_{ni}(\xi_{nk}) - G_{ni}(\xi)\}]| = O(n^{-3/5}\log n)$, a.s.

To obtain (22c) from (22b), it is enough to show that

(23)  $\max_{k \in I_n(a,b)} |k^{-1}\sum_{i=1}^{k}(U_{ni} - \mu_{ni})| = O(n^{-3/5}\log n)$, a.s.,

where $U_{ni} = 1(Z^*_{ni} > G^{-1} \circ G_{ni}(\xi)) - 1(Z^*_{ni} > \xi)$ and $\mu_{ni} = E(U_{ni} \mid \mathcal{A}) = G(\xi) - G_{ni}(\xi)$. For this, use Lemma 3 to observe that for large $n$,
$\max_{i \le [n^{4/5}b]} |\mu_{ni}| \le B^2 |Q(\xi)|\, n^{-2/5}$ on the set $S_n = \{Y_{n,[n^{4/5}b]} \le Bn^{-1/5}\}$,
and then use Lemma 6 and Lemma 1 to obtain
$\sum_{n=1}^{\infty} P[\max_{k \in I_n(a,b)} |k^{-1}\sum_{i=1}^{k}(U_{ni} - \mu_{ni})| > Mn^{-3/5}\log n] \le 2(b-a)\sum_{n=1}^{\infty} n^{4/5}\exp[-a\{4B^2|Q(\xi)|\}^{-1}(\log n)^2] + \sum_{n=1}^{\infty} P(S_n^c) < \infty$.
This proves (23), and the representation (22c) is established. Combine Lemma 5 with (22c) and note that $G(\xi) = p$ to complete the proof of Theorem N1. □

8. Proof of Theorem K1

The kernel estimator $\hat\xi_{nh}$ can be regarded as the NN estimator $\hat\xi_{n,K_n(h)}$, in which $K_n(h)$ is the random integer given by (4). A formal substitution of $K_n(h)$ for $k$ in the representation given in Theorem N1 leads to

(24)  $\hat\xi_{nh} - \xi = \beta(\xi)\{K_n(h)/n\}^2 + \{K_n(h)g(\xi)\}^{-1}\sum_{i=1}^{K_n(h)}[1(Z^*_{ni} > \xi) - (1-p)] + R_{n,K_n(h)}$.

However, this is of no use unless we can show that

(a) $\sup_{h \in J_n(c,d)} |R_{n,K_n(h)}|$ converges at a fast rate, where $J_n(c,d) = [n^{-1/5}c,\, n^{-1/5}d]$, $0 < c < d$, and

(b) in the first two terms of the RHS of (24), $K_n(h)$ can be replaced by the leading term of its deterministic component without slowing down the rate of convergence of the remainder term.

We now proceed to establish (a) and (b). First examine the magnitude of $K_n(h) - nhf(x_0)$ in the following lemma.

Lemma 10. Let $\Delta_n(h) = K_n(h) - nhf(x_0)$. Then $\sup_{h \in J_n(c,d)} |\Delta_n(h)| = O(n^{2/5}\log n)$, a.s.

Proof. Let $\mu(h) = P(Y \le h/2)$ and write $\Delta_n(h) = \Delta_{n1}(h) + \Delta_{n2}(h)$, where
$\Delta_{n1}(h) = \sum_{i=1}^{n}[1(Y_i \le h/2) - \mu(h)]$, $\Delta_{n2}(h) = n[\mu(h) - hf(x_0)]$.
By condition 1, $|\mu(h) - hf(x_0)| \le (h^3/24)\sup_{|x - x_0| \le h/2}|f''(x)|$. Thus
$\sup_{h \in J_n(c,d)} |\Delta_{n2}(h)| = O(n^{2/5})$.

Next consider $\Delta_{n1}(h)$, which is the difference of the non-decreasing functions $\sum_{i=1}^{n} 1(Y_i \le h/2)$ and $n\mu(h)$. Since $\mu'(h) < 2f(x_0)$ on $J_n(c,d)$ for large $n$, we divide $J_n(c,d)$ into $\nu_n = 2(d-c)f(x_0)n^{2/5}/\log n$ equal intervals to ensure that $n\mu(h)$ increases by at most $n^{2/5}\log n$ on each of these intervals. Consequently, if $\sup_{h} |\Delta_{n1}(h)| > 2n^{2/5}\log n$, then $|\Delta_{n1}(h)| > n^{2/5}\log n$ at one of the endpoints of these intervals, so that
$P[\sup_{h \in J_n(c,d)} |\Delta_{n1}(h)| > 2n^{2/5}\log n] \le (\nu_n + 1)\sup_{h \in J_n(c,d)} P[|\Delta_{n1}(h)| > n^{2/5}\log n]$.
But
$\sup_{h \in J_n(c,d)} P[|\Delta_{n1}(h)| > n^{2/5}\log n] \le 2\exp[-(\log n)^2/\{8df(x_0)\}]$
by Bernstein's inequality, and $\sum_n \nu_n \exp[-\alpha(\log n)^2] < \infty$ for all $\alpha > 0$. Hence $\sup_{h \in J_n(c,d)} |\Delta_{n1}(h)| = O(n^{2/5}\log n)$, a.s., and the lemma is proved. □
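Lemma 10 is easy to visualize by simulation. In the sketch below (ours; the model $X \sim N(0,1)$, $x_0 = 0$ and the grid over $J_n(c,d)$ are assumptions for illustration), the normalized deviation $\sup_h |\Delta_n(h)|/(n^{2/5}\log n)$ stays bounded as $n$ grows.

    import numpy as np

    rng = np.random.default_rng(2)
    x0, c, d = 0.0, 0.5, 2.0
    f_x0 = 1.0 / np.sqrt(2.0 * np.pi)                    # f(x0) for X ~ N(0,1)
    for n in (10**4, 10**5, 10**6):
        y = np.sort(np.abs(rng.normal(size=n) - x0))     # ordered Y_i = |X_i - x0|
        hs = np.linspace(c * n ** -0.2, d * n ** -0.2, 500)  # grid over J_n(c,d)
        K = np.searchsorted(y, hs / 2.0, side='right')   # K_n(h) = n F_{Y,n}(h/2)
        dev = np.abs(K - n * hs * f_x0).max()
        print(n, dev / (n ** 0.4 * np.log(n)))           # stays bounded (Lemma 10)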
Lemma 11. $\sup_{h \in J_n(c,d)} |R_{n,K_n(h)}| = O(n^{-3/5}\log n)$, a.s.

Proof. For $0 < c < d$, let $a = cf(x_0)/2$ and $b = 2df(x_0)$, and let
$A_n = \{\sup_{h \in J_n(c,d)} |R_{n,K_n(h)}| > Mn^{-3/5}\log n\}$,
$B_n = \{\max_{k \in I_n(a,b)} |R_{nk}| > Mn^{-3/5}\log n\}$,
$C_n = \{\sup_{h \in J_n(c,d)} |\Delta_n(h)| > Mn^{2/5}\log n\}$.
There exists $N_0 = N_0(M)$ such that for $n > N_0$, $C_n^c$ implies $K_n(h) \in I_n(a,b)$ for all $h \in J_n(c,d)$. Thus $C_n^c \cap A_n \subset C_n^c \cap B_n$ for $n > N_0$. It now follows that for sufficiently large $M$,
$P[A_n \text{ i.o.}] \le P[C_n \text{ i.o.}] + P[C_n^c \cap A_n \text{ i.o.}] \le P[C_n \text{ i.o.}] + P[B_n \text{ i.o.}] = 0$,
since for large $M$, $P[C_n \text{ i.o.}] = 0$ by Lemma 10 and $P[B_n \text{ i.o.}] = 0$ by Theorem N1. □

We now consider the first two terms on the RHS of (24). Of these,

(25)  $\beta(\xi)\{K_n(h)/n\}^2 = \beta(\xi)f^2(x_0)h^2 + R'_{nh}$,

where $R'_{nh} = \beta(\xi)f^2(x_0)h^2\,[\Delta_n(h)/\{nhf(x_0)\}]\,[2 + \Delta_n(h)/\{nhf(x_0)\}]$, and by Lemma 10,
$\sup_{h \in J_n(c,d)} |R'_{nh}| = O(n^{-4/5}\log n)$, a.s.

To examine the other term, let

(26)  $U_{ni} = 1(Z^*_{ni} > \xi) - (1-p)$ and $m_n(h) = [nhf(x_0)]$.

Then

(27)  $\{K_n(h)g(\xi)\}^{-1}\sum_{i=1}^{K_n(h)} U_{ni} = \{m_n(h)g(\xi)\}^{-1}\sum_{i=1}^{m_n(h)} U_{ni} + R''_{nh} + R'''_{nh}$,

where
$R''_{nh} = -\{(K_n(h) - m_n(h))/K_n(h)\}\,\{m_n(h)g(\xi)\}^{-1}\sum_{i=1}^{m_n(h)} U_{ni}$,
$R'''_{nh} = \{K_n(h)g(\xi)\}^{-1}\{\sum_{i=1}^{K_n(h)} U_{ni} - \sum_{i=1}^{m_n(h)} U_{ni}\}$,
and $U_{n1},\ldots,U_{nn}$ are conditionally independent given $\mathcal{A}$ with $E(U_{ni} \mid \mathcal{A}) = 0$. By Lemma 10,
$\sup_{h \in J_n(c,d)} |(K_n(h) - m_n(h))/K_n(h)| = O(n^{-2/5}\log n)$, a.s.,
and
$\sup_{h \in J_n(c,d)} |\{m_n(h)\}^{-1}\sum_{i=1}^{m_n(h)} U_{ni}| \le \max_{n^{4/5}cf(x_0) \le k \le n^{4/5}df(x_0)} |k^{-1}\sum_{i=1}^{k} U_{ni}| = O(n^{-2/5}\log n)$, a.s.,
by an application of Theorem 1 of Hoeffding (1963). Hence
$\sup_{h \in J_n(c,d)} |R''_{nh}| = O(n^{-4/5}(\log n)^2)$, a.s.

Now consider the jump points of $m_n(h) = [nhf(x_0)]$ in $J_n(c,d)$ together with the endpoints $n^{-1/5}c$ and $n^{-1/5}d$, and call these points $n^{-1/5}c = h_{n0} < h_{n1} < \cdots < h_{n\nu_n} = n^{-1/5}d$. Then $m_n(h)$ is constant on each of the $\nu_n$ intervals $[h_{nj}, h_{n,j+1})$. At the same time, $K_n(h)$ is also integer-valued and non-decreasing, and $|U_{ni}| \le 1$. Hence for each $j$ and for all $h_{nj} \le h < h_{n,j+1}$,
$|\sum_{i=1}^{K_n(h)} U_{ni} - \sum_{i=1}^{m_n(h)} U_{ni}| \le |\sum_{i=1}^{K_n(h_{nj})} U_{ni} - \sum_{i=1}^{m_n(h_{nj})} U_{ni}| + K_n(h_{n,j+1}) - K_n(h_{nj})$.
Therefore, if we can show that

(28)  $\max_{0 \le j \le \nu_n} |\sum_{i=1}^{K_n(h_{nj})} U_{ni} - \sum_{i=1}^{m_n(h_{nj})} U_{ni}| = O(n^{1/5}\log n)$, a.s.,

and

(29)  $\max_{0 \le j \le \nu_n - 1}\{K_n(h_{n,j+1}) - K_n(h_{nj})\} = O(n^{1/5}\log n)$, a.s.,

then in (27), $\sup_{h \in J_n(c,d)} |R'''_{nh}| = O(n^{-3/5}\log n)$, a.s., will follow, because by Lemma 10, $\sup_{h \in J_n(c,d)}\{K_n(h)\}^{-1} = O(n^{-4/5})$, a.s.

For (29), note that $h_{n,j+1} - h_{nj} \le \{nf(x_0)\}^{-1}$. Hence for large $n$ and for all $j$,
$P[K_n(h_{n,j+1}) - K_n(h_{nj}) > \tfrac{1}{2}Mn^{1/5}\log n] \le P[n^{-1}\{K_n(h_{n,j+1}) - K_n(h_{nj}) - n\mu(h_{n,j+1}) + n\mu(h_{nj})\} > \tfrac{1}{4}Mn^{-4/5}\log n] \le \exp[-(M/16)\,n^{1/5}\log n]$
by Lemma 6, and $\sum_n \nu_n \exp[-(M/16)\,n^{1/5}\log n] < \infty$, because $\nu_n = O(n^{4/5})$. This proves (29).

Finally, note that $\sum_{i=1}^{K_n(h_{nj})} U_{ni} - \sum_{i=1}^{m_n(h_{nj})} U_{ni}$ is a sum of $|K_n(h_{nj}) - m_n(h_{nj})|$ terms, in which the summands ($U_{ni}$, given by (26), when $K_n(h_{nj}) > m_n(h_{nj})$, and $-U_{ni}$ when $K_n(h_{nj}) < m_n(h_{nj})$) are conditionally independent given $\mathcal{A}$ with $E(U_{ni} \mid \mathcal{A}) = 0$. Using Hoeffding's inequality, we therefore have
$P[|\sum_{i=1}^{K_n(h_{nj})} U_{ni} - \sum_{i=1}^{m_n(h_{nj})} U_{ni}| > Mn^{1/5}\log n] \le P[|K_n(h_{nj}) - m_n(h_{nj})| > Mn^{2/5}\log n] + 2\exp[-(M/2)\log n]$.
Since $\sum_n \nu_n \exp[-(M/2)\log n] = \mathrm{const.}\sum_n n^{4/5}\, n^{-M/2} < \infty$ for sufficiently large $M$, we only need to show that

(30)  $\sum_{n=1}^{\infty} n^{4/5}\, P[|K_n(h_{nj}) - m_n(h_{nj})| > Mn^{2/5}\log n] < \infty$

for large $M$, in order to establish (28). But $K_n(h) - m_n(h) = K_n(h) - [nhf(x_0)]$ differs from $\Delta_n(h) = K_n(h) - nhf(x_0)$ by at most 1, and it was shown in the proof of Lemma 10 that $P[|\Delta_n(h)| > Mn^{2/5}\log n] \le \exp[-\alpha(\log n)^2]$ for some $\alpha > 0$, which implies (30), and thus (28) is established. We have now shown that in (27),
$\sup_{h \in J_n(c,d)} |R'''_{nh}| = O(n^{-3/5}\log n)$, a.s.

To complete the proof of Theorem K1, substitute the expressions in (25) and (27) for the first two terms on the RHS of (24). The remainder terms $R'_{nh}$, $R''_{nh}$ and $R'''_{nh}$ in these expressions have all been shown to be $O(n^{-3/5}\log n)$, a.s., uniformly in $h \in J_n(c,d)$, and the other remainder term $R_{n,K_n(h)}$ has also been shown to be of this order in Lemma 11. This establishes the order of magnitude of $R^*_{nh} = R'_{nh} + R''_{nh} + R'''_{nh} + R_{n,K_n(h)}$ claimed in the theorem. □
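As a closing sanity check on Theorems N1 and N2, here is a rough Monte Carlo experiment (entirely ours: the model, the choice $k = [2n^{4/5}]$ and all constants are assumptions for illustration). For $X \sim N(0,1)$, $Z \mid X=x \sim N(x,1)$, $x_0 = 0$ and $p = 0.75$, one has $f'(x_0) = 0$, $G_{xx}(\xi \mid x_0) = -\xi\varphi(\xi)$ and $g(\xi) = \varphi(\xi)$, so the bias constant reduces to $\beta(\xi) = \pi\xi/12$; the empirical bias of $\hat\xi_{nk}$ should then track $\beta(\xi)(k/n)^2$ and its standard deviation should track $\sigma/\sqrt{k}$.

    import numpy as np

    rng = np.random.default_rng(3)
    p, x0, xi = 0.75, 0.0, 0.6744897501960817      # xi = Phi^{-1}(0.75)
    beta = np.pi * xi / 12.0                        # beta(xi) in this model
    g_xi = np.exp(-0.5 * xi * xi) / np.sqrt(2.0 * np.pi)   # g(xi) = phi(xi)
    n, reps = 100_000, 400
    k = int(2.0 * n ** 0.8)                         # k = [n^{4/5} t] with t = 2
    err = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        z = x + rng.normal(size=n)
        nearest = np.argsort(np.abs(x - x0))[:k]
        err[r] = np.sort(z[nearest])[max(int(k * p), 1) - 1] - xi
    print(err.mean(), beta * (k / n) ** 2)          # empirical bias vs beta*(k/n)^2
    print(err.std(), np.sqrt(p * (1 - p)) / (g_xi * np.sqrt(k)))  # sd vs sigma/sqrt(k)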
and uniformly in h E In(c,d), and the other remainder term RnKn(h) has also been shown to be of this order in Lemma 11. This establishes the order of magnitude of R*h n = R'nh + R"h n + + RnK (h) claimed in the theorem. n R"'] nl -32- REFERENCES (1) Bahadur, RR (1966). A note on quantiles in large samples. Ann. Math. Statist. 37, 577-580. (2) Bhattacharya, P .K. (1963). On an analog of regression analysis. Ann. Math. Statist. 34, 1459-1473. (3) Bhattacharya, P .K. (974). Convergence of sample paths of normalized sums of induced order statistics. Ann. Statist. 2, 1034-1039. (4) Bhattacharya, P.K. and Mack Y.P. (1987). Weak convergence of k-NN density and regression estimators with varying k and applications. 15, 976-994. (5) Cheng, K.F. (1983). Nonparametric estimators for percentile regression functions. Commun.Statist.-Theor. Meth. 12, 681-692. (6) Csorgo, M. and Revesz, P. (1981). Strong approximations in Probability and Statistics. Academic Press, New York. (7) Gangopadhyay, A.K. (1987). Nonparametric estimation of conditional quantile function. Ph.D. dissertation at the University of California at Davis. (8) Gikhman 1.1. and Skorokhod, A.V. (1969). Introduction to the theory of random processes. \V.B. Saunders company, Philadelphia. (9) Hoeffding, \V. (1963). Probability ineqalities for sums of bounded random variables. 1. Amer. Statist. Assoc. 58, 13-30. (10) Hogg, RV. (1975). Estimates of percentile regression lines using salary data. J. Amer. Statist. Assoc. 70, 56-59. (11) Krieger, A.M. and Pickands, III, J. (1981). Weak convergence and efficient density estimation at a point. Ann. Statist. 9, 1066-1078. (12) Stone, C.J. (1977). Consistent nonparametric regression. Ann. Statist. 5,595-645. (13) Stute, W. (1986). Conditional empirical processes. Ann. Stistist., 14,638-647. (14) Uspensky, J.V. (1937). Introduction to Mathematical Probability. McGraw-Hill, New York. e)