CONVERGENCE PROPERTIES OF AN EMPIRICAL ERROR CRITERION FOR MULTIVARIATE DENSITY ESTIMATION

James Stephen Marron
University of North Carolina at Chapel Hill

AMS 1980 Subject Classifications: Primary 62G20; Secondary 62H12.
Key Words and Phrases: Nonparametric density estimation, kernel estimator, empirical error criterion, stochastic measures of accuracy.

ABSTRACT

For the purpose of comparing different nonparametric density estimators, Wegman (1972) introduced an empirical error criterion. In a recent paper, Hall (1982) showed that this empirical error criterion converges to the mean integrated square error. Here, in the case of kernel estimation, the results of Hall are improved in several ways; most notably, multivariate densities are treated and the range of allowable bandwidths is extended. The techniques used here are quite different from those of Hall, which demonstrates that the elegant Brownian bridge approximation of Komlos, Major and Tusnady (1975) does not always give the strongest results possible.

1. INTRODUCTION

Consider the problem of estimating a probability density f(x), using a sample of random vectors X_1,...,X_n from f. If f̂(x) = f̂(x, X_1,...,X_n) denotes some estimator of f(x), a very popular means of measuring the accuracy of f̂ is Mean Integrated Square Error. Given a nonnegative weight function w(x), this error criterion is defined by

(1.1)    MISE = E ∫ [f̂(x) - f(x)]² w(x) f(x) dx.

Writing the weight function in the form w(x)f(x) is for notational convenience in the proofs of this paper. There is no loss of generality in this device because, if a weight function w*(x) is desired, one may simply take w(x) = w*(x) f(x)^(-1).

In a survey paper, Wegman (1972) was interested in comparing the MISE of several different density estimators in the case d = 1. Unfortunately, as Wegman pointed out, the computation of MISE can be quite tedious.
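The definition (1.1) can be made concrete with a small numerical sketch (not part of the original paper). The fragment below approximates MISE by simulation under illustrative assumptions: d = 1, a Gaussian kernel, f the standard normal density, and w ≡ 1; the function name `mise_gaussian_kernel` is hypothetical.

```python
import numpy as np

def mise_gaussian_kernel(n, h, n_rep=200, seed=0):
    """Monte Carlo approximation of (1.1) with d = 1, f = N(0,1) and w = 1:
    MISE = E int [f_hat(x) - f(x)]^2 w(x) f(x) dx,
    where f_hat is a Gaussian-kernel estimator with bandwidth h.
    The inner integral is approximated by a Riemann sum over a fixed grid."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-4.0, 4.0, 401)
    dx = grid[1] - grid[0]
    f = np.exp(-0.5 * grid ** 2) / np.sqrt(2.0 * np.pi)   # true density f(x)
    ise = np.empty(n_rep)
    for r in range(n_rep):
        data = rng.standard_normal(n)
        u = (grid[:, None] - data[None, :]) / h
        fhat = np.exp(-0.5 * u ** 2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))
        ise[r] = ((fhat - f) ** 2 * f).sum() * dx         # weight w(x)f(x), with w = 1
    return float(ise.mean())
```

Averaging the integrated squared error over many replications approximates the outer expectation in (1.1); this is exactly the tedious computation that motivates the empirical criterion below.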
Hence Wegman used the empirical error criterion

    MISE^ = n^(-1) Σ_{j=1}^n [f̂(X_j) - f(X_j)]² w(X_j),

and gave some heuristic justification. Recently there has been some controversy regarding this practice. The difficulties have been essentially settled by Hall (1982), who has shown that Wegman's heuristics are valid if f̂(x) is either a kernel estimator or an orthogonal series estimator. Hall has shown that, in the case d = 1, under relatively mild conditions on f and w, as n → ∞,

    MISE^ = MISE + o_P(MISE).

In this paper, Hall's Theorem 1 (dealing with kernel estimation) is extended in several ways. First, the assumptions made on f, K and w are substantially weaker here. Of more interest, the case of general dimension d is treated here. Also of importance is the fact that the bandwidth of the kernel estimator satisfies much weaker restrictions than in Hall's theorem. This is vital to the results of Marron (1983), where an asymptotically efficient means of choosing the bandwidth is proposed. It is interesting to note that the crude, "brute force" method of proof used in this paper gives stronger results than the elegant techniques employed by Hall.

2. ASSUMPTIONS AND STATEMENT OF THEOREM

Given a "bandwidth" h > 0 and a "kernel function" K defined on ℝ^d, the usual kernel estimator of f(x) is given by

(2.1)    f̂(x,h) = (n h^d)^(-1) Σ_{j=1}^n K((x - X_j)/h).

For the rest of this paper it is assumed that K satisfies the following assumptions:

(K.1) ∫ K(x) dx = 1,
(K.2) K is bounded,
(K.3) ∫ K(x)² dx < ∞,
(K.4) lim_{‖x‖→∞} ‖x‖^d |K(x)| = 0,

where the integrals are taken over ℝ^d and ‖·‖ denotes the usual Euclidean norm. Assumption (K.1) is required to make f̂(x,h) integrate to 1. Assumptions (K.2), (K.3) and (K.4) are milder than those used in Theorem 1 of Hall (1982).

It will also be assumed that the underlying density satisfies the following assumptions:

(f.1) f is bounded,
(f.2) f is uniformly continuous.
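As an illustration of (2.1), the estimator can be written in a few lines. This sketch is added here and is not from the paper; the default Gaussian kernel is one convenient choice satisfying (K.1)-(K.4), and the function name `kernel_estimator` is hypothetical.

```python
import numpy as np

def kernel_estimator(x, data, h, K=None):
    """Kernel estimator (2.1): f_hat(x, h) = (n h^d)^(-1) sum_j K((x - X_j)/h).

    x    : (m, d) array of evaluation points,
    data : (n, d) array holding the sample X_1, ..., X_n,
    h    : bandwidth,
    K    : kernel mapping (..., d) arrays to (...) values; the default is the
           standard d-variate Gaussian density, which satisfies (K.1)-(K.4)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    data = np.atleast_2d(np.asarray(data, dtype=float))
    n, d = data.shape
    if K is None:
        K = lambda u: np.exp(-0.5 * (u ** 2).sum(axis=-1)) / (2.0 * np.pi) ** (d / 2)
    u = (x[:, None, :] - data[None, :, :]) / h      # shape (m, n, d)
    return K(u).sum(axis=1) / (n * h ** d)
```

Since the Gaussian kernel integrates to 1, assumption (K.1) guarantees that the resulting estimate also integrates to 1, which a quick numerical check confirms.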
Assumption (f.1) is a consequence of (f.2), but it is required often enough in the upcoming proofs to be listed separately. Assumption (f.2) is much weaker than the bounded variation and differentiability assumptions used in Theorem 1 of Hall.

Finally, it is assumed that

(w.1) w(x) is nonnegative,
(w.2) w(x) is bounded.

Hall deals only with the special case w(x) ≡ 1.

Given sequences {a_n} and {b_n}, it will be convenient to let the phrase "h is between a_n and b_n" mean that the sequence h = h(n) satisfies

    lim_{n→∞} a_n h^(-1) = 0,    lim_{n→∞} b_n h^(-1) = ∞.

The main theorem of this paper can now be stated.

Theorem 1: Under the assumptions (K.1)-(K.4), (f.1), (f.2), (w.1) and (w.2), if f̂ is the kernel estimator of (2.1) and if h is between n^(-1/d) and 1, then as n → ∞,

    MISE^ = MISE + o_P(MISE).

It is worth noting that in the case d = 1 the range of allowable h is much larger than that of Hall (1982). This fact is vital to the results of Marron (1983).

3. PROOF OF THEOREM 1

First note that by the familiar variance plus squared-bias decomposition (see Rosenblatt (1971) or (3.6) of Marron (1983)),

(3.1)    MISE = n^(-1) h^(-d) (∫ f(x)² w(x) dx)(∫ K(u)² du) + o(n^(-1) h^(-d))
                + ∫ [∫ K(u) f(x-hu) du - f(x)]² w(x) f(x) dx.

It will be convenient to let the squared-bias part of this expansion be denoted by

(3.2)    s_f(h) = ∫ [∫ K(u) f(x-hu) du - f(x)]² w(x) f(x) dx.

Note that by (f.2),

(3.3)    lim_{h→0} s_f(h) = 0.

It is seen in Marron (1983) that the rate of convergence of s_f(h) to 0 provides a measure of the "smoothness" of f.

Next, for j = 1,...,n, it will be useful to define the "leave one out" kernel estimator:

(3.4)    f̂_j(x,h) = ((n-1) h^d)^(-1) Σ_{i≠j} K((x - X_i)/h).

A quantity which is more tractable than MISE^ is:

(3.5)    MISE~ = n^(-1) Σ_{j=1}^n [f̂_j(X_j,h) - f(X_j)]² w(X_j).

Note that for j = 1,...,n,

    f̂_j(x,h) - f̂(x,h) = n^(-1) f̂_j(x,h) - (n h^d)^(-1) K((x - X_j)/h).

Hence, by (K.2), for h between n^(-1/d) and 1,

(3.6)    sup_{j=1,...,n} sup_x |f̂_j(x,h) - f̂(x,h)| = O(n^(-1) h^(-d)).

Theorem 1 is now a consequence of the following.
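The leave-one-out construction (3.4)-(3.5) is straightforward to compute from the matrix of pairwise kernel evaluations. The following sketch is added here, not taken from the paper; it assumes d = 1, a Gaussian kernel, and w ≡ 1 by default, and the function name `mise_tilde` is hypothetical.

```python
import numpy as np

def mise_tilde(data, h, f, w=None):
    """Empirical error criterion (3.5) for a 1-d Gaussian-kernel estimator:
    MISE~ = n^(-1) sum_j [f_hat_j(X_j, h) - f(X_j)]^2 w(X_j),
    where f_hat_j is the leave-one-out estimate (3.4).

    data : 1-d sample X_1, ..., X_n;  f : true density;  w : weight (default 1)."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    u = (data[:, None] - data[None, :]) / h
    Kmat = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # K((X_j - X_i)/h)
    np.fill_diagonal(Kmat, 0.0)                           # leave X_j out of its own estimate
    fhat_loo = Kmat.sum(axis=1) / ((n - 1) * h)           # f_hat_j(X_j, h), as in (3.4)
    wvals = np.ones(n) if w is None else w(data)
    return float(np.mean((fhat_loo - f(data)) ** 2 * wvals))
```

Leaving X_j out of its own estimate removes the O(n^(-1) h^(-d)) self-contribution quantified in (3.6), which is what makes MISE~ more tractable than MISE^ in the proofs.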
Theorem 2: Under the conditions of Theorem 1, as n → ∞,

    MISE~ = MISE + o_P(MISE).

4. PROOF OF THEOREM 2

First, for j = 1,...,n, define

(4.1)    U_j = [f̂_j(X_j,h) - f(X_j)]² w(X_j) / MISE.

Note that by (1.1), (3.1) and (3.6), for h between n^(-1/d) and 1,

    EU_j = MISE^(-1) ∫ E[f̂_j(x,h) - f(x)]² w(x) f(x) dx
         = MISE^(-1) ∫ E[f̂(x,h) - f(x)]² w(x) f(x) dx + O(n^(-1) h^(-d))
(4.2)    = 1 + o(1).

The proof of Theorem 2 will be complete when a weak law of large numbers is established for the U_j.

For i ≠ j = 1,...,n, define

(4.3)    V_ij = [h^(-d) K((X_j - X_i)/h) - f(X_j)] w(X_j)^(1/2).

It follows from (3.4) that, for j = 1,...,n,

    [f̂_j(X_j,h) - f(X_j)]² w(X_j) = [(n-1)^(-1) Σ_{i≠j} V_ij]².

Hence,

(4.4)    var(U_j) = (n-1)^(-4) MISE^(-2) var[(Σ_{i≠j} V_ij)²].

Now if this last variance is expanded in terms of variances and covariances, there will be (n-1)^4 terms of the following types (where i, i', i'', i*, j are all different):

    # of Terms    General Term
    O(n)          var(V_ij²)
    O(n²)         cov(V_ij², V_i'j²)
    O(n²)         cov(V_ij², V_ij V_i'j)
    O(n³)         cov(V_ij², V_i'j V_i''j)
    O(n³)         cov(V_ij V_i'j, V_ij V_i''j)
    O(n⁴)         cov(V_ij V_i'j, V_i''j V_i*j)

Bounds will now be computed for each general term. Using (K.2), (K.3), (f.1), (w.2) and (4.3), as h → 0,

(4.5)    E V_ij⁴ = ∫∫ [h^(-3d) K(u)⁴ - 4h^(-2d) K(u)³ f(x) + 6h^(-d) K(u)² f(x)²
                       - 4 K(u) f(x)³ + h^d f(x)⁴] w(x)² f(x) f(x-hu) du dx
                 = O(h^(-3d)).

Similarly,

(4.6)    var(V_ij V_i'j) ≤ E V_ij² V_i'j²
       = ∫∫∫ [h^(-d) K((x-y)/h) - f(x)]² [h^(-d) K((x-z)/h) - f(x)]² w(x)² f(x) f(y) f(z) dx dy dz
       = ∫ {∫ [h^(-d) K(u)² - 2 K(u) f(x) + h^d f(x)²] f(x-hu) du}² w(x)² f(x) dx
       = O(h^(-2d)).

Very similar computations show that

(4.7)    E V_ij² = O(h^(-d)).

It follows from the above that cov(V_ij², V_i'j²) = O(h^(-2d)). In the same way it is easily seen that

    cov(V_ij², V_ij V_i'j) = O(h^(-2d)),
    cov(V_ij², V_i'j V_i''j) = O(h^(-d)),
    cov(V_ij V_i'j, V_ij V_i''j) = O(h^(-d)).

To bound the final term, note that

(4.8)    E[V_ij | X_j] = ∫ [h^(-d) K((X_j - y)/h) - f(X_j)] w(X_j)^(1/2) f(y) dy.

Hence, by (3.2),

(4.9)    E[V_ij V_i'j] = E(E[V_ij | X_j]²) = ∫ [∫ K(u) f(x-hu) du - f(x)]² w(x) f(x) dx = s_f(h).

As another consequence of (4.8), note that by (K.2), (f.1) and (w.2),

(4.10)   sup_x |E[V_ij | X_j = x]| = sup_x |w(x)^(1/2) [∫ K(u) f(x-hu) du - f(x)]| < ∞.

It follows from the above that

    E(E[V_ij | X_j]⁴) ≤ sup_x E[V_ij | X_j = x]² · E(E[V_ij | X_j]²) = O(s_f(h)).

Hence,

    cov(V_ij V_i'j, V_i''j V_i*j) = O(s_f(h)).

Looking back to (4.4), it is now apparent that for h between n^(-1/d) and 1,

    var(U_j) = [O(n^(-1) h^(-d)) + O(s_f(h))] / MISE².

Hence, by (3.1) and (3.2), there is a constant C so that, for h between n^(-1/d) and 1,

(4.11)   n^(-1) var(U_j) ≤ [O(n^(-1) h^(-d)) + O(s_f(h))] / (n [C n^(-1) h^(-d) + s_f(h) + o(n^(-1) h^(-d))]²)
                        ≤ O(n^(-1) h^(-d)) / (C² n^(-1) h^(-2d)) + O(s_f(h)) / (2C h^(-d) s_f(h))
                        = O(h^d) → 0.

Next, for j ≠ j', a similar bound will be computed for

(4.12)   cov(U_j, U_j') = (n-1)^(-4) MISE^(-2) cov[(Σ_{i≠j} V_ij)², (Σ_{i≠j'} V_ij')²].

Once again, by the linearity of covariance, the right-hand covariance in (4.12) may be written as the sum of (n-1)^4 terms of the following types (again assume i, i', i'', i*, j and j' are all different):

    # of Terms    General Term
    O(1)          cov(V_j'j², V_jj'²)
    O(n)          cov(V_j'j², V_ij'²)
    O(n)          cov(V_ij², V_jj'²)
    O(n²)         cov(V_ij², V_i'j'²)
    O(n)          cov(V_j'j², V_jj' V_ij')
    O(n²)         cov(V_j'j², V_ij' V_i'j')
    O(n)          cov(V_ij², V_jj' V_ij')
    O(n²)         cov(V_ij², V_ij' V_i'j')
    O(n²)         cov(V_ij², V_jj' V_i'j')
    O(n³)         cov(V_ij², V_i'j' V_i''j')
    O(n)          cov(V_j'j V_ij, V_jj' V_ij')
    O(n²)         cov(V_j'j V_ij, V_jj' V_i'j')
    O(n²)         cov(V_j'j V_ij, V_ij' V_i'j')
    O(n³)         cov(V_j'j V_ij, V_i'j' V_i''j')
    O(n²)         cov(V_ij V_i'j, V_ij' V_i'j')
    O(n³)         cov(V_ij V_i'j, V_ij' V_i''j')
    O(n⁴)         cov(V_ij V_i'j, V_i''j' V_i*j')

These general terms will now be bounded in order of increasing difficulty. By the independence of X_1,...,X_n, any term in which the two products have no X's in common is zero; in particular, the O(n⁴) term vanishes. Using the Schwartz inequality, (4.5) and (4.6), each term which appears O(1) or O(n) times may be bounded by O(h^(-3d)).

Using (4.7) and (4.10),

    E[V_j'j² V_ij' V_i'j'] = E(V_j'j² E[V_ij' | X_j, X_j'] E[V_i'j' | X_j, X_j'])
                           ≤ sup_x E[V_ij | X_j = x]² E(V_j'j²) = O(h^(-d)).

Thus, by (3.3) and (4.9),

    cov(V_j'j², V_ij' V_i'j') = O(h^(-d)).

Similarly, again using the Schwartz inequality,

    E(V_j'j V_ij V_jj' V_i'j') = E(V_j'j V_jj' E[V_ij | X_j, X_j'] E[V_i'j' | X_j, X_j'])
                               ≤ sup_x E[V_ij | X_j = x]² E(V_j'j V_jj') = O(h^(-d)),

from which it follows that

    cov(V_j'j V_ij, V_jj' V_i'j') = O(h^(-d)).

By the above techniques it is easy to see that

    cov(V_j'j V_ij, V_i'j' V_i''j') = O(s_f(h)).

The remaining terms will be a little more difficult. By (4.3),

    E[V_ij² V_ij' V_i'j'] = E(V_ij² V_ij' E[V_i'j' | X_i, X_j, X_j'])
        = ∫∫∫ h^(-d) [K(u) - h^d f(x+hu)]² [K(v) - h^d f(x+hv)]
              w(x+hu) w(x+hv)^(1/2) E[V_ij | X_j = x+hv] f(x) f(x+hu) f(x+hv) dx du dv
        = O(h^(-d)),

and hence,

    cov(V_ij², V_ij' V_i'j') = O(h^(-d)).

Similarly,

    E[V_ij² V_jj' V_i'j'] = E(V_ij² V_jj' E[V_i'j' | X_i, X_j, X_j'])
        = ∫∫∫ [h^(-d) K((y-x)/h) - f(y)]² [h^(-d) K((z-y)/h) - f(z)]
              w(y) w(z)^(1/2) E[V_ij | X_j = z] f(x) f(y) f(z) dx dy dz
        = O(h^(-d)),

and so,

    cov(V_ij², V_jj' V_i'j') = O(h^(-d)).

By the same technique,

    E[V_j'j V_ij V_ij' V_i'j'] = E(V_j'j V_ij V_ij' E[V_i'j' | X_j, X_j', X_i])
        = ∫∫∫ [h^(-d) K((y-x)/h) - f(y)] [h^(-d) K((y-z)/h) - f(y)] [h^(-d) K((x-z)/h) - f(x)]
              w(y) w(x)^(1/2) E[V_ij | X_j = x] f(x) f(y) f(z) dx dy dz
        = O(h^(-d)),

and hence,

    cov(V_j'j V_ij, V_ij' V_i'j') = O(h^(-d)).

The most difficult terms have been saved for last. It will be convenient to define the sets

    A = {(x,y): ‖x-y‖ > 2h^(1/2)},    A^c = (ℝ^d × ℝ^d) \ A.

Note that

(4.13)   E(V_ij V_i'j V_ij' V_i'j') = E(E[V_ij V_ij' | X_j, X_j']²)
        = ∫∫ {∫ [h^(-d) K((x-z)/h) - f(x)] [h^(-d) K((y-z)/h) - f(y)] f(z) dz}² w(x) w(y) f(x) f(y) dx dy
        = I + II,

where

    I  = ∫∫_A   {·}² w(x) w(y) f(x) f(y) dx dy,
    II = ∫∫_A^c {·}² w(x) w(y) f(x) f(y) dx dy.

Note that, uniformly over x, y,

    ∫ [h^(-d) K((x-z)/h) - f(x)] [h^(-d) K((y-z)/h) - f(y)] f(z) dz = O(h^(-d)).

Thus, since for each x ∈ ℝ^d the set {y: (x,y) ∈ A^c} has Lebesgue measure O(h^(d/2)),

(4.14)   II = O(h^(-3d/2)).

Now given (x,y) ∈ A, note that the sets

    A_1 = {z: ‖x-z‖ < h^(1/2)},    A_2 = {z: ‖y-z‖ < h^(1/2)}

are disjoint. Let A_3 = ℝ^d \ (A_1 ∪ A_2). Define I_1, I_2 and I_3 by restricting the inner integral in I to A_1, A_2 and A_3 respectively, so that

    I = ∫∫_A [I_1 + I_2 + I_3]² w(x) w(y) f(x) f(y) dx dy.

Using (K.4), uniformly over (x,y) ∈ A and z ∈ A_1, the factor h^(-d) K((y-z)/h) is negligible, and hence

    sup_{(x,y)∈A} |I_1| = O(h^(1/2 - d)).

Similarly,

    sup_{(x,y)∈A} |I_2|, sup_{(x,y)∈A} |I_3| = O(h^(1/2 - d)).

It follows from the above that I = O(h^(1-2d)). Now from (4.13) and (4.14),

    cov(V_ij V_i'j, V_ij' V_i'j') = O(h^(-3d/2)) + O(h^(1-2d)) = o(h^(-2d)).

The above computations will prove useful in bounding the last term. Using the Schwartz inequality and the independence of X_j and X_j',

    |E(V_ij V_i'j V_ij' V_i''j')| = |E(E[V_ij V_ij' | X_j, X_j'] E[V_i'j | X_j] E[V_i''j' | X_j'])|
        ≤ [E(E[V_ij V_ij' | X_j, X_j']²)]^(1/2) [E(E[V_i'j | X_j]² E[V_i''j' | X_j']²)]^(1/2).

Note that the first factor on the right-hand side appears in (4.13). Thus the computations following (4.13), together with (4.9), imply

    cov(V_ij V_i'j, V_ij' V_i''j') = o(h^(-d) s_f(h)).

Now looking back to (4.12) it is apparent that, for h between n^(-1/d) and 1,

    cov(U_j, U_j') = [o(n^(-2) h^(-2d)) + o(n^(-1) h^(-d) s_f(h))] / MISE².

Thus, by computations similar to (4.11), for h between n^(-1/d) and 1,

    cov(U_j, U_j') → 0.

It follows from this together with (4.11) that for h between n^(-1/d) and 1,

    var(n^(-1) Σ_{j=1}^n U_j) → 0,

and hence, by the Chebyshev inequality,

    n^(-1) Σ_{j=1}^n U_j → E(U_j)

in probability.
Theorem 2 is an easy consequence of this together with (3.5), (4.1) and (4.2).

ACKNOWLEDGEMENT

The author is grateful to David Ruppert for recommending the "brute force" techniques employed in this paper.

REFERENCES

Hall, P. (1982). Limit theorems for stochastic measures of the accuracy of nonparametric density estimators. Stoch. Processes Appl., 13, 11-25.

Komlos, J., Major, P. and Tusnady, G. (1975). An approximation of partial sums of independent random variables, and the sample distribution function. Z. Wahrsch. Verw. Geb., 32, 111-131.

Marron, J.S. (1983). An asymptotically efficient solution to the bandwidth problem of kernel density estimation. North Carolina Institute of Statistics, Mimeo Series #1518.

Wegman, E.J. (1972). Nonparametric density estimation. II. A comparison of density estimation methods. J. Statist. Comput. Simul., 1, 225-245.