APPROXIMATIONS TO THE MEAN INTEGRATED SQUARED ERROR WITH APPLICATIONS TO OPTIMAL BANDWIDTH SELECTION FOR NONPARAMETRIC REGRESSION FUNCTION ESTIMATORS

Wolfgang Härdle*
Universität Heidelberg, Sonderforschungsbereich 123, Im Neuenheimer Feld 293, D-6900 Heidelberg 1
University of North Carolina, Department of Statistics, 321 Phillips Hall 039A, Chapel Hill, NC 27514

AMS 1980 Subject Classifications: Primary 60F05, Secondary 62G05

KEYWORDS AND PHRASES: stochastic measure of accuracy, nonparametric regression function estimation, optimal bandwidth selection, limit theorems, mean square error.

*Research partially supported by the "Deutsche Forschungsgemeinschaft", SFB 123, "Stochastische Mathematische Modelle". Research partially supported by the Air Force Office of Scientific Research, Contract AFOSR-F49620-82-C-0009.

SUMMARY

Discrete versions of the Mean Integrated Squared Error (MISE) provide stochastic measures of accuracy for comparing different estimators of regression functions. These measures of accuracy have been used in Monte Carlo trials and have been employed for the optimal bandwidth selection for kernel regression function estimators, as shown in Härdle and Marron (1983). In the present paper it is shown that these stochastic measures of accuracy converge to a weighted version of the MISE of kernel regression function estimators, extending a result of Hall (1982) and Marron (1983) to regression function estimation.

1. INTRODUCTION AND BACKGROUND

Let $(X_1,Y_1), (X_2,Y_2), \ldots$ be independent random vectors distributed as $(X,Y)$ with common joint probability density function $f(x,y)$, and let

$$m(x) = E(Y \mid X = x) = \int y\,f(x,y)\,dy \big/ f_X(x),$$

$f_X$ the marginal density of $X$, denote the regression curve of $Y$ on $X$. Let $m_n^*(x)$ denote the nonparametric kernel estimate of $m(x)$, as introduced by Nadaraya (1964) and Watson (1964),

(1.1)  $$m_n^*(x) = \hat m_n(x) / f_n(x),$$

where

$$\hat m_n(x) = n^{-1} h^{-1} \sum_{i=1}^{n} K((x - X_i)/h)\,Y_i$$

and

$$f_n(x) = n^{-1} h^{-1} \sum_{i=1}^{n} K((x - X_i)/h).$$

Here $K$ is a kernel function and $h = h(n)$ is a sequence of "bandwidths" converging to zero as $n$ tends to infinity. This estimator was studied by Rosenblatt (1969), who derived bias, variance and asymptotic normality; Schuster (1972) demonstrated the multivariate normality at a finite number of distinct points. For further results we refer to the bibliography of Collomb (1981).

In the present paper we show that

(1.2)  $$A_n^*(h) = n^{-1} \sum_{j \in J} \big[m_n^*(X_j) - m(X_j)\big]^2,$$

a stochastic measure of accuracy for the estimate $m_n^*$, exhibits the same limiting behaviour as the deterministic measure

(1.3)  $$\mathrm{MISE} = \int_0^1 \mathrm{MSE}(t)\, f_X(t)\,dt,$$

where $\mathrm{MSE}(t)$ is the mean squared error (MSE) of $m_n^*(t)$. The proper definition of the MSE for $m_n^*$ will be delayed to section 2.

The result of this paper addresses two problems. Firstly, in a survey paper, Wegman (1972) was interested in comparing the Mean Integrated Squared Error (MISE) of several different density estimators. As Wegman pointed out, the computation of the actual MISE can be quite tedious. Hence, Wegman used an empirical measure of accuracy of the structure as in formula (1.2) and gave some heuristic justification. Now, since the bias/variance decomposition of regression function estimators is rather similar to that of density estimators (Rosenblatt (1969), (1971)), it may be argued that Wegman's heuristics hold also in the regression function estimation setting. The answer is positive: it is shown here that, as $n \to \infty$, uniformly over an interval $[\underline h, \bar h]$,

(1.4)  $$A_n^*(h) = \mathrm{MISE} + o_p(\mathrm{MISE}), \qquad h \in [\underline h, \bar h].$$

The appealing feature of this approximation is that it holds uniformly in $h \in [\underline h, \bar h]$. A Monte Carlo trial comparing different estimators of $m(x)$ (with respect to MISE) at different sequences of bandwidths can thus be based on $A_n^*(h)$, which is faster to compute than MISE as defined in (1.3).

Secondly, the approximation (1.4) contributes to the solution of the "optimal bandwidth selection" problem.
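To illustrate the objects just introduced, the following sketch implements the kernel estimator (1.1) and a discrete accuracy measure in the spirit of (1.2) in Python. The Epanechnikov kernel, the simulated regression curve and the interior interval [0.1, 0.9] are assumptions made for this example, not choices from the paper.

```python
import numpy as np

def epanechnikov(u):
    # Compactly supported kernel on [-1, 1]; integrates to one.
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def nadaraya_watson(x, X, Y, h):
    # m_n*(x) = m_hat_n(x) / f_n(x), formula (1.1).
    u = (x[:, None] - X[None, :]) / h
    w = epanechnikov(u)
    m_hat = w @ Y / (len(X) * h)            # kernel-weighted responses
    f_n = w.sum(axis=1) / (len(X) * h)      # kernel density estimate of f_X
    return m_hat / np.maximum(f_n, 1e-12)   # guard against empty windows

def accuracy_measure(X, Y, m, h, lo=0.1, hi=0.9):
    # Discrete MISE version in the spirit of (1.2): average squared error
    # at the design points X_j, restricted to an interior interval to
    # stay clear of boundary effects.
    J = (X >= lo) & (X <= hi)
    fitted = nadaraya_watson(X[J], X, Y, h)
    return np.mean((fitted - m(X[J])) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = lambda x: np.sin(2.0 * np.pi * x)    # assumed true regression curve
    X = rng.uniform(0.0, 1.0, 500)
    Y = m(X) + 0.1 * rng.standard_normal(500)
    for h in (0.02, 0.05, 0.1, 0.2):
        print(h, accuracy_measure(X, Y, m, h))
```

Since this measure needs only one pass over the sample, sweeping it over a grid of bandwidths is far cheaper than evaluating the integral (1.3), which is exactly the Monte Carlo use case described above.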
As the optimal bandwidth $h^*$ we understand that sequence $h = h(n)$ which minimizes the MISE for each $n$. Härdle and Marron (1983) demonstrated by a cross-validation argument that minimization (with respect to $h$) of $A_n^*(h)$ is asymptotically equivalent to minimization of the cross-validation criterion

(1.5)  $$CV(h) = n^{-1} \sum_{j \in J} \big[Y_j - m_n^{*(j)}(X_j)\big]^2,$$

where

$$m_n^{*(j)}(x) = n^{-1} h^{-1} \sum_{i \ne j} K((x - X_i)/h)\,Y_i \Big/ f_n^{(j)}(x)$$

is the "leave-one-out" estimator. So the result of this paper, as stated in (1.4), ensures that the minimization of (1.5) with respect to $h$ yields the (MISE-) optimal sequence of bandwidths $h^*$ and solves, as is shown in Härdle and Marron, a problem raised by Stone (1982), Question 3, p. 1054.

We will not only analyze $m_n^*(x)$, as defined in (1.1), but also

(1.6)  $$\hat m_n(x) / f_X(x),$$

where $f_X$ denotes the marginal density of $X$. This estimator of $m(x)$ is reasonable if we know the marginal density and is somewhat more tractable than $m_n^*$. The estimator (1.6) was studied by Johnston (1982), who also observed that $\hat m_n / f_X$ has in general a higher asymptotic variance than $m_n^*$.

The stochastic measure of accuracy (1.2) was defined only on the interval $[0,1]$. It will later be assumed that the support of $f_X$ properly contains this interval. This is due to "boundary effects"; more precisely, the bias at the endpoints of the support of $f_X$ inflates and has a slower rate than in the interior (Gasser and Müller, 1979; Rice and Rosenblatt, 1983). Thus, defining the MISE over the whole support of $f_X$ would ultimately lead to the unappealing situation that the optimal bandwidth with respect to MISE would be determined in such a way that it minimizes the mean square error at the boundaries, since that is of lower order. The estimate in the interior would thus exhibit suboptimal behaviour.

The results of this paper are improvements over some previous work for several reasons. Firstly, we do not need such strong smoothness assumptions on $f_X$ as in Hall (1982), who proves similar results in the density estimation setting.
Secondly, our assumptions on the variance curve $V^2(t) = \mathrm{var}(Y \mid X = t)$ and the range of allowable bandwidths are considerably weaker than those in Johnston (1982), who demonstrates a Gaussian approximation to $(nh)^{1/2}[\hat m_n - E\hat m_n]$ along the same lines as Bickel and Rosenblatt (1973). Thirdly, our work extends the result of Wong (1982), who deals only with the fixed design case, i.e. the $X_i$ are nonrandom. Finally, we may note that Hall's proof would simplify if one uses the approximation provided by the Bickel and Rosenblatt paper and the outline of the proof given here for regression function estimators.

Note that although only the two-dimensional case is considered here, the proof can certainly be extended to the higher dimensional case where we observe a $(d+1)$-dimensional random vector $(X_1, \ldots, X_d, Y)$, $d > 1$. The assumptions will be different in that case, since it is still unknown whether the multivariate empirical process can be strongly approximated by Brownian bridges with compatible rates as in the univariate or bivariate case. This approximation technique by Brownian bridges, as carried out in the appendix, is vital to our results. A similar technique, exploiting the idea of invariance principles in nonparametric regression, was used by Mack and Silverman (1982), who showed weak and strong uniform consistency (in sup-norm) of $m_n^*$.

The paper is organized as follows. First we prove that $\hat m_n(t) - E\hat m_n(t)$ can be uniformly (in $t$ and $h$) approximated by a Gaussian process similar to that occurring in Bickel and Rosenblatt (1973), p. 1074, formula (2.5). Secondly, we plug this approximating process into the formula (1.2), which defined the discrete version of MISE, and by evaluation of covariances and higher moments we finally arrive at the deterministic measure (1.3).

2. RESULTS

We will make use of the following definition.

Definition.
A function $w$ is called Lipschitz continuous of order $\alpha$ (LC($\alpha$)) iff, with a constant $L_w$,

$$|w(t) - w(t')| \le L_w\,|t - t'|^{\alpha}, \qquad 0 < \alpha \le 1.$$

The following assumptions fix the range of allowable bandwidths $[\underline h, \bar h]$, determine the kernel function $K$ and describe some smoothness of $m(t)$, $\mathrm{var}(Y \mid X = t)$ and $f_X(t)$.

(A1) Let $\{\underline h_n\}$ denote a sequence for which there is an $\varepsilon > 0$ so that

$$\lim_{n\to\infty} \underline h_n\, n^{\frac12 - \varepsilon} / \log n = 0, \qquad \lim_{n\to\infty} \underline h_n\, n^{\frac12 - \varepsilon} = \infty,$$

and let $\{\bar h_n\}$ denote a sequence for which

$$\lim_{n\to\infty} \bar h_n = 0, \qquad \lim_{n\to\infty} \bar h_n \log n = \infty.$$

Assume that $h = h(n)$ satisfies $\underline h_n \le h \le \bar h_n$.

(A2) There exist a sequence of positive constants $\{a_n\}$, $a_n \uparrow \infty$, and a constant $c < \infty$ such that

$$\lim_{n\to\infty} \; \sup_{\underline h \le h \le \bar h} \; h^{-3} \int_{|y| > a_n} y^2\, f_Y(y)\,dy = 0,$$

$f_Y$ denoting the marginal density of $Y$,

$$\int_{|y| > a_n} y^2\, f(x,y)\,dy \le c \qquad \text{for all } 0 \le x \le 1,\ n \ge 1,$$

and, with $\tilde m(t) = m(t)\,f_X(t)$,

$$\int_{-a_n}^{a_n} \big|\, d_u[\tilde m(uh)] \,\big| = O\big(\{\log(1/h)\}^{\frac12}\big).$$

(A3) The functions $S^2(t) = E[Y^2 \mid X = t]$, $f_X(t)$ and $m(t)$ are LC($\alpha$) with $\alpha > \frac12$ and are all of bounded variation. The marginal density of $X$ is bounded from below: $\inf_{0 \le t \le 1} f_X(t) \ge \gamma > 0$.

(A4) The kernel function $K$ is differentiable with $K'$ of bounded variation and fulfills

$$\int K(u)\,du = 1, \qquad \mathrm{support}\{K\} \subset [-A, A].$$

By straightforward computations it can be shown that $\tilde m = m\,f_X$ is LC($\alpha$), $\alpha > \frac12$, and of bounded variation by assumption (A3) on $S^2(t)$ and $f_X(t)$. It is also not hard to see that if $\tilde m$ is LC(1), the last condition in (A2) follows. Note that the set of assumptions in (A2) holds if $Y$ is bounded ($a_n = \log\log n$), an assumption that is often made in other papers to avoid conditions on moments of $Y$ as in (A2). (A2) also holds if $a_n = n^{\beta}$, $\beta$ small, while $(X,Y)$ are jointly normally distributed. For simplicity of notation, we will not explicitly write the indices $n$ of $h$, $\underline h$, $\bar h$.

The following results show that the approximation (1.4) holds for both $\hat m_n / f_X$ and $m_n^*$. Only the proof of theorem 1 (dealing with $\hat m_n / f_X$) will be given in full detail since the result for $m_n^*$ can be obtained quite analogously.
Let us define

$$B_K = \int_{-A}^{A} K^2(u)\,du$$

and

$$b_n(t) = f_X^{-1}(t) \int_{-A}^{A} K(u)\,\big[m(t-uh)\,f_X(t-uh) - m(t)\,f_X(t)\big]\,du,$$

the bias of $\hat m_n / f_X$.

Theorem 1. Assume that (A1) to (A4) hold and that $b_n(t)$ is of bounded variation. Then uniformly over $h \in [\underline h, \bar h]$

$$A_n(h) = n^{-1} \sum_{j \in J} \big[\hat m_n(X_j)/f_X(X_j) - m(X_j)\big]^2$$
$$= (nh)^{-1} B_K \int_0^1 S^2(t)\,dt + \int_0^1 [b_n(t)]^2\, f_X(t)\,dt + o_p\Big( (nh)^{-1} + \int_0^1 [b_n(t)]^2\,dt \Big)$$
$$= \mathrm{MISE}[\hat m_n / f_X] + o_p(\mathrm{MISE}).$$

Assume that $f_X$ is $d$ times continuously differentiable and $m$ is $d$ times continuously differentiable. Then, as in Rosenblatt (1971), the bias $b_n(t)$ would read as

$$b_n(t) = h^d\, \kappa_d\, \big[ (m f_X)^{(d)}(t) - m(t)\, f_X^{(d)}(t) \big] \big/ f_X(t) + o(h^d),$$

provided that $K$ satisfies $\int u^j K(u)\,du = 0$, $j = 1, \ldots, d-1$, and $\int u^d K(u)\,du = d!\,\kappa_d$. Many papers in nonparametric regression function estimation assume such a kind of differentiability as above and deal with methods to balance the contributions from the variance and the bias (see Collomb (1981) for a review).

In a similar manner define $b_n^*(t)$, the bias of $m_n^*(t)$, as follows:

$$b_n^*(t) = f_X^{-1}(t) \int_{-A}^{A} K(u)\,\big[m(t-uh) - m(t)\big]\, f_X(t-uh)\,du,$$

where the expression "bias" has to be understood as the expected value of $f_X^{-1}[\hat m_n - m f_n]$, with

$$f_n(t) = n^{-1} h^{-1} \sum_{i=1}^{n} K((t - X_i)/h)$$

a density estimate of the marginal density $f_X$. This is justified by the observation that

$$m_n^* - m = f_X^{-1}\,[\hat m_n - m f_n] + o_p(\hat m_n - m f_n)$$

(see Härdle and Marron (1983)), and that moments of $m_n^*$ need not exist in general (Rosenblatt (1969)). The next theorem shows how $A_n^*(h)$ approximates the MISE.

Theorem 2. Assume that (A1) to (A4) hold and that $b_n^*(t)$ is of bounded variation. Then uniformly over $h \in [\underline h, \bar h]$

$$A_n^*(h) = n^{-1} \sum_{j \in J} \big[m_n^*(X_j) - m(X_j)\big]^2$$
$$= (nh)^{-1} B_K \int_0^1 V^2(t)\,dt + \int_0^1 [b_n^*(t)]^2\, f_X(t)\,dt + o_p\Big( (nh)^{-1} + \int_0^1 [b_n^*(t)]^2\,dt \Big)$$
$$= \mathrm{MISE}[m_n^*] + o_p(\mathrm{MISE}),$$

where $V^2(t) = S^2(t) - m^2(t)$.

Note that the variance terms and the bias terms of the two estimators $\hat m_n / f_X$ and $m_n^*$ are completely different. Since $V^2(t) \le S^2(t)$, the Nadaraya-Watson estimator $m_n^*(t)$ attains in general a smaller (asymptotic) variance than $\hat m_n / f_X$. This was also observed by Johnston (1982).
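The variance comparison can be illustrated numerically. The sketch below (an illustration, not part of the paper) uses a uniform design on [0,1], so that $f_X \equiv 1$ is known and (1.6) reduces to $\hat m_n$ itself; at a point $t$ with $m(t) \neq 0$ one expects the Nadaraya-Watson estimator to show the smaller spread, since there $V^2(t) = \sigma^2$ while $S^2(t) = \sigma^2 + m^2(t)$. All concrete choices (kernel, curve, $\sigma$, $h$) are assumptions of the example.

```python
import numpy as np

def kernel(u):
    # Epanechnikov kernel (an assumed choice compatible with (A4)).
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def both_estimators(t, X, Y, h):
    # For a uniform design f_X = 1, so estimator (1.6) is m_hat_n itself,
    # while (1.1) divides by the kernel density estimate f_n.
    w = kernel((t - X) / h)
    m_hat = np.sum(w * Y) / (len(X) * h)
    f_n = np.sum(w) / (len(X) * h)
    return m_hat, m_hat / max(f_n, 1e-12)

def mc_variances(t, n=400, reps=300, h=0.1, sigma=0.3, seed=1):
    # Monte Carlo spread of the two estimators at the fixed point t.
    rng = np.random.default_rng(seed)
    a, b = [], []
    for _ in range(reps):
        X = rng.uniform(0.0, 1.0, n)
        Y = np.sin(2.0 * np.pi * X) + sigma * rng.standard_normal(n)
        u, v = both_estimators(t, X, Y, h)
        a.append(u)
        b.append(v)
    return np.var(a), np.var(b)

if __name__ == "__main__":
    # At t = 0.25 the curve value is 1, so S^2 = sigma^2 + 1 > V^2 = sigma^2.
    print(mc_variances(0.25))
```

The simulated spread of $\hat m_n / f_X$ should exceed that of $m_n^*$ by roughly the factor $S^2(t)/V^2(t)$, in line with the theorems above.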
The condition "$nh^5 \to 0$", appearing in the work of Johnston (1982), implies that the bias vanishes asymptotically faster than the variance. Therefore, any difference in bias terms does not show up in that work. It would be interesting to find a similar comparison of bias terms, but this would lead to complicated and rather unnatural assumptions on derivatives of $m$ and $f_X$, as can be seen from the formula for $b_n$ following Theorem 1.

3. THE PROOFS

We shall prove theorem 1 in full detail; the proof of theorem 2 will only be sketched since the technical details are similar to those of theorem 1. $F(x,y)$ will denote the joint cumulative distribution function (df) of $(X,Y)$ and $F_n(x,y)$ will denote the two-dimensional empirical df, defined as usual. It is understood throughout these proofs that $o$, $O$ in remainder terms are uniform over $h \in [\underline h, \bar h]$.

Proof of theorem 1. The basic decomposition is

(3.1)  $$\hat m_n(t)/f_X(t) - m(t) = Y_n(t) + b_n(t),$$

where

$$Y_n(t) = f_X^{-1}(t)\, h^{-1} \int\!\!\int y\, K((t-x)/h)\, d[F_n(x,y) - F(x,y)].$$

In the appendix it is shown that the standardized process

$$Y_{0,n}(t) = [S^2(t)/f_X(t)]^{-\frac12}\, Y_n(t)$$

can be approximated by a Gaussian process, where the remainder term is uniform in $t$. The basic decomposition (3.1) now reads

(3.2)  $$\hat m_n(t)/f_X(t) - m(t) = n^{-\frac12} h^{-\frac12}\, V_n(t) + b_n(t) + \rho_n(t),$$

where

(3.3)  $$\rho_n(t) = o_p(n^{-\frac12} h^{-\frac12}) \text{ uniformly in } t, \qquad V_n(t) = [S^2(t)/f_X(t)]^{\frac12}\, h^{-\frac12} \int_{-\infty}^{\infty} K((t-x)/h)\, dW(x).$$

Using (3.2) and (3.3), the stochastic measure of accuracy is then

$$A_n(h) = \int_0^1 [b_n(t)]^2\, dF_{X,n}(t) + n^{-1} h^{-1} \int_0^1 V_n^2(t)\, dF_{X,n}(t) + 2\, n^{-\frac12} h^{-\frac12} \int_0^1 b_n(t)\, V_n(t)\, dF_{X,n}(t)$$
$$\qquad + \rho_n \Big\{ 2 \Big[ \int_0^1 b_n(t)\, dF_{X,n}(t) + n^{-\frac12} h^{-\frac12} \int_0^1 V_n(t)\, dF_{X,n}(t) \Big] + \rho_n \Big\},$$

where $F_{X,n}$ denotes the empirical distribution function of $\{X_i\}_{i=1}^n$. This can be rewritten as

$$A_n(h) = n^{-1} \sum_{j \in J} [b_n(X_j)]^2 + n^{-1} h^{-1} (U_{n1} + U_{n2}) + 2\, n^{-\frac12} h^{-\frac12} (U_{n3} + U_{n4}) + \text{remainder terms},$$

where

$$U_{n1} = \int_0^1 V_n^2(t)\, f_X(t)\,dt, \qquad U_{n2} = \int_0^1 V_n^2(t)\, d[F_{X,n}(t) - F_X(t)],$$
$$U_{n3} = \int_0^1 V_n(t)\, b_n(t)\, f_X(t)\,dt, \qquad U_{n4} = \int_0^1 V_n(t)\, b_n(t)\, d[F_{X,n}(t) - F_X(t)].$$

We now show that the limits of $U_{ni}$, $i = 1,2,3,4$, give us the desired limit behaviour of $A_n(h)$.
We may note that the approximations, as carried out in Bickel and Rosenblatt (1973), would have led to a process similar to $V_n(t)$ when estimating a density. So the technique developed here would be useful in density estimation also and would provide an alternative proof of Hall's (1982) result on stochastic measures of accuracy for density estimators.

Let us begin with the limit behaviour of $U_{n1}$. Note first that

$$E\,U_{n1} = \int_0^1 h^{-1} \int_{-\infty}^{\infty} K^2((t-x)/h)\, S^2(x)\,dx\,dt = \int_0^1 \int_{-A}^{A} K^2(u)\, S^2(t - uh)\,du\,dt = B_K \int_0^1 S^2(t)\,dt + o(1),$$

where the remainder term is uniform in $h$, since $S^2(t)$ is LC($\alpha$), $\alpha > \frac12$, by assumption (A3). To show that

(3.4)  $$U_{n1} \xrightarrow{\;p\;} \int_{-A}^{A} K^2(u)\,du \int_0^1 S^2(t)\,dt,$$

we demonstrate $E(U_{n1}^2) \to (E\,U_{n1})^2$. The statement (3.4) will then follow from Chebyshev's inequality. Since

$$Z_n(t) = h^{-\frac12} \int_{-\infty}^{\infty} K((t-x)/h)\,dW(x)$$

is a Gaussian process, we conclude by the Isserlis (1918) formula that $E(U_{n1}^2) - (E\,U_{n1})^2$ splits into two summands: the first vanishes by assumption (A4) on the kernel $K$, the second by evaluation of the integral inside the brackets. This shows that

$$U_{n1} = B_K \int_0^1 S^2(t)\,dt + o_p(1).$$

Next we show that

(3.5)  $$U_{n2} = o_p(1).$$

We obtain by partial integration

$$U_{n2} = -2 \int_0^1 H_n(t)\, q(t)\, Z_n(t) \Big[ h^{-1} \int_{-\infty}^{\infty} K'((t-x)/h)\,dW(x) \Big] dt - \int_0^1 H_n(t)\, Z_n^2(t)\, dq(t) + H_n(t)\, q(t)\, Z_n^2(t) \Big|_0^1,$$

where $q(t) = S^2(t)/f_X(t)$ and $H_n(t) = F_{X,n}(t) - F_X(t)$. Now, since $H_n(t) = O_p(n^{-1/2})$ uniformly in $t$ and $Z_n^2(t_0) = O_p(1)$, $t_0 = 0,1$, as is easily verified by Chebyshev's inequality, we only have to consider the first two summands in the equality above. These are further estimated by Schwarz's inequality:

$$|U_{n2}| \le 2\, \sup_{0\le t\le 1}|H_n(t)|\; S_2 \Big[ \int_0^1 Z_n^2(t)\,dt \Big]^{\frac12} \Big[ \int_0^1 \Big( h^{-\frac12} \int_{-\infty}^{\infty} K'((t-x)/h)\,dW(x) \Big)^2 dt \Big]^{\frac12} + \sup_{0\le t\le 1}|H_n(t)|\; \sup_{0\le t\le 1} Z_n^2(t) \int_0^1 |dq(t)|,$$

where $S_2 = \sup_{0\le t\le 1} q(t)$. By Chebyshev's inequality we have

$$\int_0^1 \Big[ h^{-\frac12} \int_{-\infty}^{\infty} L((t-x)/h)\,dW(x) \Big]^2 dt = O_p(1),$$

where $L$ is either $K$ or $K'$. Integration by parts applied to $Z_n^2(t)$ shows immediately that

$$\sup_{0\le t\le 1} Z_n^2(t) = O_p(\log(1/h));$$

therefore (3.5) holds.
Now, by an application of Schwarz's inequality,

$$|U_{n3}| \le \Big[ \int_0^1 V_n^2(t)\, f_X(t)\,dt \Big]^{\frac12} \Big[ \int_0^1 [b_n(t)]^2\, f_X(t)\,dt \Big]^{\frac12},$$

so that we conclude

(3.6)  $$U_{n3} = O_p\Big( \Big[ \int_0^1 [b_n(t)]^2\,dt \Big]^{\frac12} \Big).$$

The term $U_{n4}$ is estimated again by a partial integration argument:

$$U_{n4} = - \int_0^1 H_n(t)\, b_n(t)\, q^{\frac12}(t) \Big[ h^{-1} \int_{-\infty}^{\infty} K'((t-x)/h)\,dW(x) \Big] dt - \int_0^1 H_n(t)\, b_n(t)\, Z_n(t)\, dq^{\frac12}(t) - \int_0^1 H_n(t)\, q^{\frac12}(t)\, Z_n(t)\, db_n(t) + H_n(t)\, V_n(t)\, b_n(t) \Big|_0^1$$
$$= T_{1n} + T_{2n} + T_{3n} + T_{4n},$$

where, as in the computations for $U_{n2}$, $H_n(t) = F_{X,n}(t) - F_X(t)$ and $Z_n(t) = h^{-1/2} \int_{-\infty}^{\infty} K((t-x)/h)\,dW(x)$. The last summand $T_{4n}$ is obviously $O_p(n^{-1/2}) = o_p(n^{-1}h^{-1})$ by (A1). The first term, $T_{1n}$, can be estimated as follows. Since

$$\int_0^1 \Big[ h^{-\frac12} \int_{-\infty}^{\infty} K'((t-x)/h)\,dW(x) \Big]^2 dt = O_p(1)$$

and $n^{1/2} \sup_{0\le t\le 1} |H_n(t)| = O_p(1)$, we conclude that $T_{1n} = o_p(n^{-1}h^{-1})$. The terms $T_{2n}$ and $T_{3n}$ are estimated in a similar fashion as the terms of $U_{n2}$, employing the Lipschitz continuity of $b_n(t)$ and $q(t)$, and we thus obtain

$$T_{2n} = O_p(n^{-\frac12}) = o_p(n^{-1}h^{-1}), \qquad T_{3n} = O_p(n^{-\frac12}) = o_p(n^{-1}h^{-1}).$$

This shows finally that

(3.8)  $$n^{-1}h^{-1}\,U_{n2} + 2\,n^{-\frac12}h^{-\frac12}\,(U_{n3} + U_{n4}) = o_p\Big( n^{-1}h^{-1} + \int_0^1 [b_n(t)]^2\,dt \Big).$$

It remains to show that

(3.9)  $$\int_0^1 [b_n(t)]^2\, d[F_{X,n}(t) - F_X(t)] = o_p\Big( n^{-1}h^{-1} + \int_0^1 [b_n(t)]^2\,dt \Big).$$

Again by partial integration, the LHS of (3.9) is

$$-2 \int_0^1 H_n(t)\, b_n(t)\, db_n(t) + H_n(t)\, b_n^2(t) \Big|_0^1.$$

As before, the last summand is $O_p(n^{-1/2})$ and so is the first summand. Now, putting together (3.5) to (3.9), we finally have that

$$A_n(h) = \int_0^1 [b_n(t)]^2\, f_X(t)\,dt + n^{-1}h^{-1} B_K \int_0^1 S^2(t)\,dt + o_p\Big( n^{-1}h^{-1} + \int_0^1 [b_n(t)]^2\,dt \Big),$$

which proves the theorem.

Proof of theorem 2. This proof goes mainly along the lines of the proof of theorem 1. From Härdle and Marron (1983), formula (2.4), we have

(3.10)  $$m_n^*(t) - m(t) = b_n^*(t) + Y_n^*(t),$$

where

$$b_n^*(t) = f_X^{-1}(t)\, h^{-1} \int_{-\infty}^{\infty} K((t-u)/h)\, [m(u) - m(t)]\, f_X(u)\,du$$

and

$$Y_n^*(t) = f_X^{-1}(t)\, h^{-1} \int\!\!\int [y - m(t)]\, K((t-x)/h)\, d[F_n(x,y) - F(x,y)].$$

This process can now be approximated as $Y_n(t)$ (see the appendix), but with $V^2(t) = S^2(t) - m^2(t)$ in the place of $S^2(t)$. So we obtain that the standardized process

$$Y_{0,n}^*(t) = [V^2(t)/f_X(t)]^{-\frac12}\, Y_n^*(t)$$

can be approximated by the same construction, uniformly in $t$.
The decomposition (3.10) then reads as

(3.11)  $$m_n^*(t) - m(t) = b_n^*(t) + n^{-\frac12} h^{-\frac12}\, V_n^*(t) + \rho_n^*(t),$$

where $\rho_n^*(t) = o_p(n^{-1/2} h^{-1/2})$ uniformly in $t$ and

$$V_n^*(t) = [V^2(t)/f_X(t)]^{\frac12}\, h^{-\frac12} \int_{-\infty}^{\infty} K((t-x)/h)\, dW(x).$$

We then carry out the same procedures as for $V_n(t)$ in the proof of theorem 1.

4. APPENDIX

It is shown here that the variance terms in (3.1) can be approximated by a sequence of Gaussian processes. The crucial step in these approximations is provided by the following lemma, due to Tusnády (1977).

Lemma 1. Let $T(x,y) = (F_X, F_{Y|X})(x,y)$ be the Rosenblatt transformation (Rosenblatt, 1952). Then on a suitable probability space there exists a sequence of Brownian bridges $B_n(x', y')$ on $[0,1] \times [0,1]$ such that

$$\sup_{x,y} \big| [F_n(x,y) - F(x,y)] - n^{-\frac12} B_n(T(x,y)) \big| = O_p\big( n^{-1} [\log n]^2 \big).$$

It is next shown that $Y_n(t)$ can be approximated (uniformly in $t$) by Gaussian processes. For this define

$$Y_{0,n}(t) = [S^2(t)/f_X(t)]^{-\frac12}\, Y_n(t),$$

intermediate processes $Y_{1,n}$, $Y_{2,n}$, $Y_{3,n}$, obtained by truncating to $|y| \le a_n$ and by replacing the empirical process with the Brownian bridges $\{B_n\}$ of Lemma 1,

$$Y_{4,n}(t) = [S_n^2(t)\, f_X(t)]^{-\frac12}\, h^{-1} n^{-\frac12} \int\!\!\int y\, K((t-x)/h)\, dW_n(T(x,y)),$$

where $\{W_n\}$ is a sequence of Wiener processes used in constructing $\{B_n\}$ as in Tusnády (1977),

$$Y_{5,n}(t) = [S_n^2(t)\, f_X(t)]^{-\frac12}\, h^{-1} n^{-\frac12} \int_{-\infty}^{\infty} [S_n^2(x)\, f_X(x)]^{\frac12}\, K((t-x)/h)\, dW(x),$$

$$Y_{6,n}(t) = n^{-\frac12} h^{-1} \int_{-\infty}^{\infty} K((t-x)/h)\, dW(x),$$

where $W(x)$ is a standard Wiener process on $(-\infty, \infty)$. For the following lemmas, $\|Y\|$ will denote $\sup_{0\le t\le 1} |Y(t)|$.

Lemma 2. $\|U_n\| \xrightarrow{\;p\;} 0$, where

$$U_n(t) = n^{\frac12} h^{-\frac12} \int\!\!\int_{|y| > a_n} y\, K((t-x)/h)\, d[F_n(x,y) - F(x,y)] = \sum_{i=1}^{n} X_{n,i}(t).$$

Proof. Note that $E X_{n,i}(t) = 0$ for all $t$ and that the $X_{n,i}(\cdot)$ are independent, identically distributed for each $n$. Therefore

(4.2)  $$E X_{n,i}^2(t) \le n^{-1} h^{-1}\, \sup|K|^2 \int_{|y| > a_n} y^2\, f_Y(y)\,dy$$

establishes $U_n(t) \xrightarrow{\;p\;} 0$ for each $t$ by assumption (A2). By (A4) and the Cauchy-Schwarz inequality we have

$$E\,|U_n(t) - U_n(t_1)|\;|U_n(t_2) - U_n(t)| \le M_0\, h^{-3}\, |t - t_1|\, |t_2 - t| \int_{|y| > a_n} y^2\, f_Y(y)\,dy,$$

establishing by (A2) tightness of $U_n(t)$ [Theorem 15.6 of Billingsley (1968)].
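The kernel-smoothed Wiener integral $h^{-1/2}\int K((t-x)/h)\,dW(x)$ that drives these Gaussian approximations is easy to simulate: on a fine grid it is a discrete convolution of the kernel with independent Gaussian increments. A minimal sketch (an illustration with an assumed Epanechnikov kernel and grid, not from the paper):

```python
import numpy as np

def smoothed_wiener(h, grid=2000, seed=0):
    # Simulates Z_n(t) = h^{-1/2} * int K((t-x)/h) dW(x) on [0, 1]
    # by convolving an Epanechnikov kernel with Gaussian increments dW.
    rng = np.random.default_rng(seed)
    dx = 1.0 / grid
    x = np.arange(-0.5, 1.5, dx)               # padded so the kernel window fits
    dW = rng.standard_normal(x.size) * np.sqrt(dx)
    t = np.linspace(0.0, 1.0, 200)
    u = (t[:, None] - x[None, :]) / h
    K = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    # Riemann approximation of the stochastic integral, rescaled by h^{-1/2}.
    return t, (K @ dW) / np.sqrt(h)

if __name__ == "__main__":
    t, Z = smoothed_wiener(h=0.05)
    print(Z.min(), Z.max())   # one sample path of the limiting process
```

The pointwise variance of this process should be close to $B_K = \int K^2(u)\,du$ ($= 3/5$ for the Epanechnikov kernel), matching the variance term $(nh)^{-1} B_K \int S^2$ in Theorem 1 after rescaling.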
Note that the proof of this lemma was done as in Johnston's paper, but note also that our assumption is somewhat weaker than his, since we are employing Lemma 1, due to Tusnády (1977), which establishes a faster rate for the two-dimensional empirical process.

Lemma 3.

Proof. Define $g(t) = S^2(t)\, f_X(t)$ and $\hat g(t) = S_n^2(t)\, f_X(t)$. We must show that the corresponding normalized processes are uniformly close. Now, from Johnston (1982) we have that the second factor inside the curly brackets is $O_p(n^{-1/2} h^{-1/2})$. Now, from the mean value theorem,

$$\big| g^{-\frac12}(t) - \hat g^{-\frac12}(t) \big| = \tfrac12\, |\xi_n(t)|^{-\frac32}\, |g(t) - \hat g(t)|,$$

where $\xi_n$ is between $\hat g$ and $g$. Since $\hat g$, $g$ are bounded away from zero by assumption (A3), $\|\xi_n^{-3/2}\|$ is a bounded sequence. Now, from (A2) it follows that $\|\hat g - g\| \to 0$, and thus the lemma follows.

Lemma 4.

Proof. Using integration by parts (see Johnston (1982), Lemma A.5, for details), we obtain a bound of the form

$$O_p\big( n^{-\frac12} (\log n)^2 \big)\; h^{-\frac12} \Big\{ 4 a_n \int_{-A}^{A} |K'(u)|\,du + 4 a_n\, [K(A) + K(-A)] \Big\} = O_p\big( n^{-\frac12} h^{-\frac12}\, a_n\, (\log n)^2 \big)$$

uniformly in $t$. The proof thus follows using assumption (A2).

Lemma 5.

Proof. Since the Jacobian of the transformation $T$, introduced in Lemma 1, is $f(x,y)$, we have by Masani (1968), Theorem 5.19,

$$n^{\frac12}\, |Y_{3,n}(t) - Y_{4,n}(t)| \le \Big| \hat g^{-\frac12}(t)\, h^{-1} \int\!\!\int y\, K((t-x)/h)\, f(x,y)\,dx\,dy \Big| \cdot |W_n(1,1)|.$$

So we finally have

$$n^{\frac12}\, \|Y_{3,n} - Y_{4,n}\| \le |W_n(1,1)|\; A_1\; h^{-1} \int \big| K((t-x)/h) \big|\,dx,$$

where $A_1$ is a constant ($A_1 = \sup_{0\le t\le 1} |m(t)\, f_X(t)|$). This proves the lemma.

Note that $Y_{4,n}(t)$ is a zero mean Gaussian process with covariance

$$\mathrm{cov}\big( Y_{4,n}(t_1),\, Y_{4,n}(t_2) \big) = \big[ \hat g(t_1)\, \hat g(t_2) \big]^{-\frac12}\, n^{-1} h^{-2} \int\!\!\int y^2\, K((t_1-x)/h)\, K((t_2-x)/h)\, f(x,y)\,dx\,dy.$$

So both $Y_{4,n}$ and $Y_{5,n}$ are Gaussian processes with the same covariance structure and can thus be identified.

Lemma 6.

Proof. Note that by assumption (A3), $\hat g(t) = S_n^2(t)\, f_X(t)$ is also LC($\alpha$), $\alpha > \frac12$, i.e.

$$|\hat g(t) - \hat g(t')| \le L_G\, |t - t'|^{\alpha},$$

where $L_G$ is independent of $t$ by (A3). The difference of interest is now

$$(nh)^{\frac12}\, |Y_{5,n}(t) - Y_{6,n}(t)| = \big| \hat g^{-\frac12}(t) \big| \cdot |R_n(t)|,$$

where

$$R_n(t) = h^{-\frac12} \int_{-\infty}^{\infty} \big[ \hat g^{\frac12}(x) - \hat g^{\frac12}(t) \big]\, K((t-x)/h)\, dW(x)$$

(with $G_{n,t}(u) = \hat g^{1/2}(t - uh) - \hat g^{1/2}(t)$ after the substitution $x = t - uh$). We will now show that for all $n$ and $t$,

$$\sup_{0\le t\le 1} |R_n(t)| = o_p(1).$$
By partial integration we have

$$|R_n(t)| \le \Big| h^{-\frac12} \int_{-A}^{A} W(t-uh)\, G_{n,t}(u)\, K'(u)\,du \Big| + \Big| h^{-\frac12} \int_{-A}^{A} [W(t-uh) - W(t)]\, K(u)\, d[G_{n,t}(u)] \Big| + \Big| h^{-\frac12} \int_{-A}^{A} W(t)\, G_{n,t}(u)\, K'(u)\,du \Big| + R_{4,n} + O_p(h^{\frac12})$$
$$= R_{1,n}(t) + R_{2,n}(t) + R_{3,n}(t) + R_{4,n} + O_p(h^{\frac12}),$$

where $R_{4,n}$ is independent of $t$. The term $R_{1,n}(t)$ is estimated as in Johnston (1982), Lemma 4.6, p. 411, to obtain

$$\sup_{0\le t\le 1} |R_{1,n}(t)| = o_p(1).$$

We now show that

$$\sup_{0\le t\le 1} |R_{2,n}(t)| = o_p(1).$$

Let $\omega(s)$ denote the modulus of continuity of $W(t)$ and let $\bar K = \sup_{-A\le u\le A} |K(u)|$. We then have, with Silverman (1978), formulas (7) and (8) and his definitions of $p$, $q$ and $B$, that $R_{2,n}(t)$ is bounded by two summands of the form

$$h^{-\frac12}\, 16\, \bar K\, (\log B)^{\frac12} \int_{-A}^{A} p(|u|h)\, \big| dG_{n,t}(u) \big|.$$

Now, following the proof of Silverman (1978), Prop. 4, we see that both summands are, by assumption (A3) on $|d\hat g(u)|$, of the order $o_p(1)$ uniformly in $t$. It remains to show that $\sup_{0\le t\le 1} |R_{3,n}(t)| = o_p(1)$. This follows again from the LC($\alpha$), $\alpha > \frac12$, condition on $\hat g(\cdot)$ in assumption (A3) and the following inequality:

$$\sup_{0\le t\le 1} |R_{3,n}(t)| \le \sup_{0\le t\le 1} |W(t)|\; h^{-\frac12}\, L_G\, h^{\alpha} \int_{-A}^{A} |u|^{\alpha}\, |K'(u)|\,du = o_p(1).$$

ACKNOWLEDGEMENT

I am grateful to Steve Marron for helpful discussions. Ray Carroll contributed much to the approximations of the appendix.

REFERENCES

BICKEL, P. and ROSENBLATT, M. (1973). On some global measures of the deviation of density function estimators. Ann. Statist., 1, 1071-1095.

BILLINGSLEY, P. (1968). Convergence of Probability Measures. John Wiley and Sons, New York.

COLLOMB, G. (1981). Estimation non-paramétrique de la régression: revue bibliographique. International Statistical Review, 49, 75-93.

GASSER, T. and MÜLLER, H.G. (1979). Kernel estimation of regression functions. In: Smoothing Techniques for Curve Estimation (T. Gasser and M. Rosenblatt, eds.). Lecture Notes in Mathematics 757, Springer-Verlag, Heidelberg.

HÄRDLE, W. and MARRON, J.S. (1983). Optimal bandwidth selection in nonparametric regression function estimation. Institute of Statistics Mimeo Series #, University of North Carolina.

HALL, P. (1982).
Cross-validation in density estimation. Biometrika, 69, 383-390.

ISSERLIS, L. (1918). On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika, 12, 134-139.

JOHNSTON, G. (1982). Probabilities of maximal deviations of nonparametric regression function estimation. J. Multivariate Analysis, 12, 402-414.

MACK, Y.P. and SILVERMAN, B.W. (1982). Weak and strong uniform consistency of kernel regression estimates. Z. Wahrscheinlichkeitstheorie und verw. Gebiete, 61, 405-415.

MARRON, J.S. (1983). Convergence properties of an empirical error criterion for multivariate density estimation. Institute of Statistics Mimeo Series # 1520, University of North Carolina at Chapel Hill.

MASANI, P. (1968). Orthogonally scattered measures. Adv. in Math., 2, 61-117.

NADARAYA, E.A. (1964). On estimating regression. Theor. Prob. Appl., 9, 141-142.

RICE, J. and ROSENBLATT, M. (1983). Smoothing splines: regression, derivatives and deconvolution. Ann. Statist., 11, 141-156.

ROSENBLATT, M. (1952). Remarks on a multivariate transformation. Ann. Math. Statist., 23, 470-472.

ROSENBLATT, M. (1969). Conditional probability density and regression estimation. In: Multivariate Analysis II (P.R. Krishnaiah, ed.), 25-31.

ROSENBLATT, M. (1971). Curve estimates. Ann. Math. Statist., 42, 1815-1842.

SCHUSTER, E.F. (1972). Joint asymptotic distribution of the estimated regression function at a finite number of distinct points. Ann. Math. Statist., 43, 84-88.

SILVERMAN, B. (1978). Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. Ann. Statist., 6, 177-184.

STONE, C.J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Statist., 10, 1040-1053.

TUSNÁDY, G. (1977). A remark on the approximation of the sample distribution function in the multidimensional case. Period. Math. Hungar., 8, 53-55.

WATSON, G.S. (1964). Smooth regression analysis. Sankhyā, Series A, 26, 359-372.

WEGMAN, E.J. (1972).
Nonparametric probability density estimation: a comparison of density estimation methods. J. Statist. Comput. Simulation, 1, 225-245.

WONG, W.H. (1982). On the consistency of cross-validation in kernel nonparametric regression. Technical Report, Univ. Chicago.