ON SEQUENTIAL DENSITY ESTIMATION

by Raymond J. Carroll
Department of Statistics, University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series #1025, August 1975

SUMMARY

We consider the problem of sequential estimation of a density function f at a point x_0 which may be known or unknown. Let {T_n} be a sequence of estimators of x_0. For two classes of density estimators of f, namely the kernel estimates and a recursive modification of these, we show that if N(d) is a sequence of integer-valued random variables and n(d) a sequence of constants with N(d)/n(d) → 1 in probability as d → 0, then f_{N(d)}(T_{N(d)}) − f(x_0) is asymptotically normally distributed (when properly normed). We also propose two new classes of stopping rules based on the ideas of fixed-width interval estimation and show that for these rules, N(d)/n(d) → 1 almost surely and EN(d)/n(d) → 1 as d → 0. One of the stopping rules is itself asymptotically normally distributed when properly normed and yields a confidence interval for f(x_0) of fixed width and prescribed coverage probability.

1. Introduction

While there have been many papers on the estimation of a probability density function f(x) (see Wegman (1972)), the literature on sequential density estimation is relatively small. Srivastava (1973) notes that one often takes as many observations as possible in a certain time period, so that the number of observations is random. Davies and Wegman (1975) were interested in developing sequential rules which satisfy a certain error control. In this paper we provide a treatment of the asymptotic distributions of two types of density estimators when the number of observations is random; the estimators considered are the kernel estimators (Rosenblatt (1956), Parzen (1962)) and a variant of these due to Yamato (1971).
We then develop two new classes of sequential stopping rules for estimating f(x) and obtain their precise asymptotic behavior; one of the stopping rules actually yields a confidence interval for f(x) of fixed width and prescribed coverage probability.

Specifically, we focus our attention on the following problem. We are interested in estimating f(x_0), where x_0 may be known or unknown; an example of the latter is the case where x_0 is the population median or mode. Typically, there will be a sequence of estimators of x_0, say {T_n}. Suppose N(d) is a sequence of integer-valued random variables (i.e., stopping rules) and n(d) is a sequence of constants for which N(d)/n(d) → 1 in probability as d → 0. It is known that in many cases, if the density estimators are denoted by f_n(x), then for some sequence {ε_n} decreasing to zero, as n → ∞,

(1.1a)  (nε_n)^{1/2}(f_n(x_0) − Ef_n(x_0)),
(1.1b)  (nε_n)^{1/2}(f_n(x_0) − f(x_0))

converge in distribution to a normal random variable. The first two sections of this paper are concerned with finding conditions under which both

(1.2a)  (N(d)ε_{N(d)})^{1/2}(f_{N(d)}(T_{N(d)}) − Ef_{N(d)}(T_{N(d)})),
(1.2b)  (N(d)ε_{N(d)})^{1/2}(f_{N(d)}(T_{N(d)}) − f(x_0))

still converge in distribution to a normal random variable. Srivastava (1973) attempts to show for the kernel estimators that

(1.3)  (N(d)ε_{N(d)})^{1/2}(f_{N(d)}(x_0) − f(x_0))

is asymptotically normally distributed, but the proof of his Theorem 4.1 contains a number of errors; for example, the statement after equation (4.9) is not correct (take the kernel to be uniform on the interval [−1/2, 1/2]). Wegman and Davies (1975) have shown the asymptotic normality of (1.3) for the Yamato (1971) recursive estimators indirectly by use of an almost sure invariance principle. Hence, finding the asymptotic distributions of (1.2a) and (1.2b) are still unsolved problems, while that of (1.3) is unresolved for the kernel estimators.
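The limit in (1.1b) can be illustrated numerically. The sketch below is ours, not the paper's: it uses a Gaussian kernel, standard normal data, a known point x_0 = 0, and the illustrative bandwidth ε_n = n^{-1/3} (chosen so that nε_n^5 → 0 and the bias term is negligible), then replicates the normed error and checks that its spread is near the limiting standard deviation (f(x_0)∫K²(y)dy)^{1/2}.

```python
import numpy as np

def kernel_estimate(x0, data, eps):
    """f_n(x0) = (n * eps)^(-1) * sum_i K((x0 - X_i)/eps), Gaussian kernel K."""
    u = (x0 - data) / eps
    return np.exp(-0.5 * u ** 2).sum() / (np.sqrt(2 * np.pi) * data.size * eps)

rng = np.random.default_rng(0)
n, x0 = 2000, 0.0
eps = n ** (-1 / 3)               # eps_n -> 0 with n * eps_n^5 -> 0, so bias vanishes
f_true = 1 / np.sqrt(2 * np.pi)   # standard normal density at x0 = 0

# Replications of (n eps_n)^{1/2} (f_n(x0) - f(x0)); the limit variance is
# f(x0) * Int K^2 = f(0) / (2 sqrt(pi)) ~ 0.113, i.e. a standard deviation ~ 0.34.
z = np.array([np.sqrt(n * eps) * (kernel_estimate(x0, rng.standard_normal(n), eps) - f_true)
              for _ in range(400)])
print(round(float(z.mean()), 3), round(float(z.std()), 3))
```

With the fixed seed the printed mean is near zero and the printed standard deviation is near 0.34, in line with the normal limit in (1.1b).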
In Sections 2 and 3 we obtain the necessary results by means of the theory of weak convergence (Billingsley (1968)) and a random change of time argument. Incidentally, the approach yields the asymptotic normality of (1.3) for both types of estimators in a general and reasonably straightforward manner, although it is much harder to obtain asymptotic normality for (1.2a) and (1.2b).

In the final two sections of the paper, we propose two new classes of stopping rules N(d), both based on the ideas of fixed-width interval estimation. Davies and Wegman (1975) have proposed one class of stopping rules, shown that they stop with probability one, and investigated the existence of moments, but the exact large sample behavior of their rules is unknown. For both classes we propose, we find sequences of constants n(d) for which

(1.4)  N(d)/n(d) → 1 almost surely as d → 0,
(1.5)  EN(d)/n(d) → 1 as d → 0.

This is precise information about the stopping rules and gives the user an idea of the approximate number of observations to be taken. Interestingly enough, one of the stopping rules yields a confidence interval for f(x_0) of fixed width and prescribed coverage probability, and we have been able to show the asymptotic normality of this stopping rule itself.

2. Kernel Estimators

In this section we investigate the asymptotic normality under random sample sizes of estimates of the density f(x) using the kernel estimates due to Rosenblatt (1956) and Parzen (1962). The kernel estimator of the density is given by

f_n(x) = (nε_n)^{-1} Σ_{i=1}^{n} K(ε_n^{-1}(x − X_i)),

where the kernel K is a bounded density function and {ε_n} is a sequence of constants decreasing to zero. We wish to estimate f(x_0), where x_0 is an unknown point; we will thus assume the existence of a sequence of estimators {T_n} of x_0. For example, if x_0 were the population median, T_n would be taken as the sample median. Although the details are fairly complicated, our method is contained in a number of basic steps.
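As a concrete illustration of f_n(T_n) with T_n the sample median, the following sketch (our own; the Epanechnikov kernel and the bandwidth ε_n = n^{-1/5} are illustrative choices, not the paper's) estimates the density of a standard normal sample at its unknown median x_0 = 0.

```python
import numpy as np

def epanechnikov(u):
    """A bounded density kernel vanishing off [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def f_n_at_median(data, eps):
    """f_n(T_n): the kernel estimate evaluated at T_n = sample median."""
    t_n = np.median(data)              # estimator of the unknown point x0
    u = (t_n - data) / eps
    return epanechnikov(u).sum() / (data.size * eps)

rng = np.random.default_rng(1)
sample = rng.standard_normal(5000)     # true median x0 = 0, f(x0) = 1/sqrt(2*pi) ~ 0.399
est = f_n_at_median(sample, eps=5000 ** (-1 / 5))
print(round(float(est), 3))
```

For this sample size the estimate lands close to the true value f(0) ≈ 0.399, consistent with (4.1)-type almost sure convergence of f_n(T_n) to f(x_0).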
Initially, we consider the asymptotic normality of f_n(x_0) under random sample sizes by giving a weak convergence argument applied to a process closely related to f_{[nt]}(x_0) (where [·] is the greatest integer function) and then using a random change of time argument (Billingsley (1968)). Then f_n(x_0) and f_n(T_n) are shown to be sufficiently close by modifying the maximal deviation argument of Woodroofe (1967). The two basic processes with which we will work in this section are given below.

DEFINITION 2.1. Let 0 < α < 1 be fixed, let {a_n} converge to zero, and define for 0 ≤ s, t ≤ 1,

V_n(s) = (nε_n)^{-1/2} Σ_{i=1}^{[ns]} {K(ε_{[ns]}^{-1}(x_0 − X_i)) − EK(ε_{[ns]}^{-1}(x_0 − X))}  if s ≥ [nα]/n,
       = (nε_n)^{-1/2} Σ_{i=1}^{[ns]} {K(ε_{[nα]}^{-1}(x_0 − X_i)) − EK(ε_{[nα]}^{-1}(x_0 − X))}  if s ≤ [nα]/n;

V_n*(s,t) = (nε_n)^{-1/2} Σ_{i=1}^{[ns]} {K(ε_{[ns]}^{-1}(x_0 + ta_n − X_i)) − EK(ε_{[ns]}^{-1}(x_0 + ta_n − X))}  if s ≥ [nα]/n,
          = (nε_n)^{-1/2} Σ_{i=1}^{[ns]} {K(ε_{[nα]}^{-1}(x_0 + ta_n − X_i)) − EK(ε_{[nα]}^{-1}(x_0 + ta_n − X))}  if s ≤ [nα]/n.

What we eventually will assume is that T_n converges to x_0 faster than a_n converges to zero, so that we can replace the parameter t in V_n* by a_n^{-1}(T_n − x_0). It is clear that V_n is a random element of D[0,1], while if K is right continuous with limits from the left, V_n* is a random element of D_2 (see Billingsley (1968) and Bickel and Wichura (1971) respectively for definitions). The idea is now reasonably clear: we will first consider the weak convergence of the process V_n, then show that V_n and V_n* are "close" in probability. Then, since the normed random sample size version of the estimate of f(x_0) can basically be obtained from V*_{N(d)}(N(d)/n(d), a_{N(d)}^{-1}(T_{N(d)} − x_0)), a random change of time argument and a few computations will yield the final result.

It will be assumed throughout the rest of this paper that for some function h, ε_{[ns]}/ε_n → h(s) as n → ∞, and A(f,K) will be defined by

(2.1)  A(f,K) = f(x_0) ∫ K²(y) dy.

The proofs of all results will be delayed to the end of the section.

LEMMA 2.1.
Suppose that for any c_0 > 0 there is a constant M(c_0) for which (2.2) holds. Then the sequence {V_n} is tight and there exists a process V for which

(2.3a) the finite dimensional distributions of V_n converge to those of V;
(2.3b) for 0 ≤ s ≤ 1, V(s) is normally distributed with mean zero and variance sh(s)A(f,K) if s ≥ α and sh(α)A(f,K) if s ≤ α.

REMARK 2.1. Assumption (2.2) is satisfied in many cases. Suppose that K vanishes off some interval [a,b] and f is Lipschitz on this interval; then (2.2) will hold. It will also hold if K is bounded and continuously differentiable and if, for any sequence of constants converging to zero, ε_n = n^{-β} for some 0 < β < 1/2.

LEMMA 2.2. Suppose that, for some 0 < β < 1, where O(·) is the standard "big oh",

(2.4a)  ε_n a_n^{-1} = O(a_n^β);
(2.4b)  nε_n increases in n;
(2.4c)  n a_n^{-2+β} exp{−c a_n^{-β/2}} → 0 for all c > 0;
(2.4d)  K is Lipschitz of order one and satisfies Woodroofe's (1967) conditions.

Then

sup{|V_n(s) − V_n*(s,t)| : 0 ≤ s,t ≤ 1} →_P 0,

where →_P denotes convergence in probability.

REMARK 2.2. Lemma 2.2 is the key result, since as will be seen it says that estimation of x_0 by T_n does not change the asymptotic distribution.

THEOREM 2.1. Suppose that

(2.5a)  a_n^{-1}(T_n − x_0) = o(1) almost surely as n → ∞, where o(·) is the "little oh" notation;
(2.5b)  {a_n}, {ε_n}, K satisfy the conditions of Lemmas 2.1 and 2.2;
(2.5c)  for some sequence of constants n(d) → ∞, the integer-valued random variables N(d) satisfy N(d)/n(d) → 1 in probability as d → 0.

Then

(2.6)  (N(d)ε_{N(d)})^{1/2} (f_{N(d)}(T_{N(d)}) − ε_{N(d)}^{-1} ∫ K(ε_{N(d)}^{-1}(T_{N(d)} − y)) f(y) dy)

converges in distribution to a normal random variable with mean zero and variance A(f,K).

REMARK 2.3. The conditions (2.5a) and (2.5b) are satisfied in many cases. For example, if x_0 is the pth population quantile and T_n is the pth sample quantile, Bahadur (1966) has shown that T_n − x_0 = O(n^{-1/2}(log log n)^{1/2}) almost surely. If x_0 is the population mode, Venter (1967) and Sager (1975) have given rates of convergence of T_n to x_0. Note that if one wanted merely to estimate f(x_0) for a known x_0,
(2.5a) and (2.5b) clearly hold by choosing T_n = x_0; thus Theorem 2.1 is a generalization of the problem considered by Srivastava (1973).

While Theorem 2.1 shows the asymptotic normality of a normed version of f_{N(d)}(T_{N(d)}), it is useful to ask when the integral appearing in (2.6) can be replaced by f(x_0). This is the gist of the following corollary. The results here are comparable to those given by Cacoullos (1966).

COROLLARY 2.1. Suppose that the conditions of Theorem 2.1 hold and that on the support of K the density f is twice boundedly continuously differentiable. Suppose further that

(2.7)  ∫ yK(y) dy = 0,  ∫ y²K(y) dy < ∞,
(2.8)  (nε_n)^{1/2} a_n → 0,  nε_n^5 → 0.

Then

(2.9)  (N(d)ε_{N(d)})^{1/2}(f_{N(d)}(T_{N(d)}) − f(x_0))

is asymptotically normally distributed with mean zero and variance A(f,K).

PROOF OF LEMMA 2.1. By using the Cramér-Wold device and the method given in Parzen (1962, page 1069), we see that the finite dimensional distributions of V_n converge, so it suffices to show that the sequence {V_n} is tight. From, for example, the extension of Theorem 3 of Bickel and Wichura (1971) given in that paper, it suffices to show that there exist β > 1/2, M > 0 such that

E|V_n(s_1) − V_n(s_2)|² ≤ M|s_1 − s_2|^β,

which follows from assumption (2.2). The final case (s_1 ≤ [nα]/n ≤ s_2) follows in a similar manner, so that β = 1 suffices.

PROOF OF LEMMA 2.2. Here we make use of the results of Woodroofe (1967). First define

Z_n(s) = ([ns]ε_{[ng(s)]})^{-1/2} (nε_n)^{1/2} V_n(s)/σ_{[ng(s)]}(x_0),
Z_n*(s,t) = ([ns]ε_{[ng(s)]})^{-1/2} (nε_n)^{1/2} V_n*(s,t)/σ_{[ng(s)]}(x_0 + ta_n),

where g(s) = g_n(s) = s if s ≥ [nα]/n and g(s) = [nα]/n if s < [nα]/n. If we show that

sup{|Z_n(s) − Z_n*(s,t)| : 0 ≤ s,t ≤ 1} = sup{|Z_n(i/n) − Z_n*(i/n,t)| : 0 ≤ t ≤ 1, 0 ≤ i ≤ n} →_P 0,

this will yield the result, because of Lemma 2.1 and since

sup{|σ_{[ng(s)]}(x_0 + ta_n) − σ_{[ng(s)]}(x_0)| : 0 ≤ s,t ≤ 1} → 0.

Now fix i and let, for p = 0, 1, ..., [ε_i^{-1}],

x_np = x_0 + pε_i,  σ_np(x) = σ_i(x_np + xε_i)/σ_i(x_np),
for 0 ≤ x ≤ 1, and define

Z_np(x) = σ_np(x) Z_n*(i/n, pε_i a_n^{-1} + xε_i a_n^{-1}),  0 ≤ x ≤ 1,
Z_np^{(k)}(x) = Z_np(j2^{-k})  if x = j2^{-k}, j = 0, 1, ..., 2^k,
             defined by linear interpolation otherwise.

From Lemma 3.2 of Woodroofe (1967), there exist constants D, c such that

Pr{ sup_{0≤x≤1} |Z_np(x) − Z_np^{(k)}(x)| > ε } ≤ D exp{−cε a_n^{-1}},

so that, choosing n_1 suitably, we have for n ≥ n_1

sup{|Z_np(x) − Z_np^{(k)}(x)| : 0 ≤ x ≤ 1, n_1 ≤ i ≤ n} →_P 0.

Since Z_np^{(k)}(x) is piecewise linear for 0 ≤ x ≤ 1, Lemma 3.1 of Woodroofe (1967) gives

Pr{ sup_{0≤t≤1} |Z_np(0) − Z_np^{(k)}(t a_n ε_i^{-1})| > ε } ≤ M* a_n^{-2(1−β)} exp{−εM a_n^{-β/2}}.

Thus, as n → ∞,

sup{|Z_np(x) − Z_np(0)| : 0 ≤ x ≤ a_n ε_i^{-1}, n_1 ≤ i ≤ n} →_P 0.

Since Z_np(0) = Z_n(i/n) and Z_np(t a_n ε_i^{-1}) = Z_n*(i/n,t) σ_np(t a_n ε_i^{-1}), we have

sup{|V_n(i/n) − V_n*(i/n,t)| : 0 ≤ t ≤ 1, n_1 ≤ i ≤ n} →_P 0.

Since K is Lipschitz, it is continuous, so the result now follows.

PROOF OF THEOREM 2.1. Define m(d) = 2n(d) and W_n(s) = V_n(s) − V_n(1/2). Then {W_n} is tight and the quantity of interest is bounded by

sup_{1/2 ≤ s ≤ 1/2+η} min{|W_{m(d)}(s) − W_{m(d)}(t)|, |W_{m(d)}(s) − W_{m(d)}(t+η)|}

plus a remainder term. Since the remainder converges in probability to zero as d, η → 0 (because of (2.2) and Chebyshev's inequality) and the displayed term is bounded by the modulus of continuity, we have (if →_L denotes convergence in distribution)

V_{m(d)}(N(d)/m(d)) →_L V(1/2).

Because of Lemma 2.2 and assumption (2.5a), we thus obtain

(2.10)  V*_{m(d)}(N(d)/m(d), a_{N(d)}^{-1}(T_{N(d)} − x_0)) →_L V(1/2).

But, by choosing α = 1/2 in Definition 2.1, we obtain

(2.11)  (m(d)ε_{m(d)})^{-1/2} (N(d)ε_{N(d)}) (f_{N(d)}(T_{N(d)}) − ε_{N(d)}^{-1} ∫ K(ε_{N(d)}^{-1}(T_{N(d)} − y)) f(y) dy) →_L V(1/2).

Now, since (N(d)ε_{N(d)})(m(d)ε_{m(d)})^{-1} →_P (1/2)h(1/2) and V(1/2) has a normal distribution with mean zero and variance (1/2)h(1/2)A(f,K), the proof is complete.

PROOF OF COROLLARY 2.1.
Because of Theorem 2.1, it suffices to show that

(nε_n)^{1/2} sup_{|t|≤1} | ε_n^{-1} ∫ K(ε_n^{-1}(x_0 + ta_n − y)) f(y) dy − f(x_0) | → 0.

Since K is a density function, this term is bounded above by

(nε_n)^{1/2} sup_{|t|≤1} ∫ K(y) |f(x_0 − ε_n y − ta_n) − f(x_0)| dy.

A Taylor expansion now completes the proof.

3. A Second Class of Estimators

In this section we investigate the asymptotic normality under random sample sizes of the recursive density estimators introduced by Yamato (1971) and defined by

f_n*(x) = n^{-1} Σ_{i=1}^{n} ε_i^{-1} K(ε_i^{-1}(x − X_i))
        = (n−1)n^{-1} f*_{n−1}(x) + (nε_n)^{-1} K(ε_n^{-1}(x − X_n)).

The recursive property of this class of density estimators is clearly useful in sequential investigations, and also for fairly large sample sizes, since the addition of a few extra observations means the kernel estimator f_n(x) must be entirely recomputed. Wegman and Davies (1975) have recently shown that, for a given x_0, f_n*(x_0) satisfies an almost sure invariance principle; this method is closely related to that of Jain, Jogdeo and Stout (1975). We are still left with the problem of estimating f(x_0) where x_0 is unknown. The outline of our results is very much like that of Section 2, but the methods are different (especially the analogue of Lemma 2.2, where we use a weak convergence argument), and we are able to relax the Lipschitz condition on K. One can obtain an analogue of Lemma 2.1 using the results of Wegman and Davies (1975); however, their assumptions and methods are different from those used here. We again start with two processes.

DEFINITION 3.1. Define

V_n(s) = (n/ε_n)^{-1/2} Σ_{i=1}^{[ns]} {K(ε_i^{-1}(x_0 − X_i)) − EK(ε_i^{-1}(x_0 − X))}/ε_i,
V_n*(s,t) = (n/ε_n)^{-1/2} Σ_{i=1}^{[ns]} {K(ε_i^{-1}(x_0 + ta_n − X_i)) − EK(ε_i^{-1}(x_0 + ta_n − X))}/ε_i.

LEMMA 3.1. Suppose that

limsup_n ∫ K²(y) f(x_0 − yε_n) dy < ∞,  ∫ K²(y) dy < ∞,

that K, f satisfy the conditions of Theorem 5 of Yamato (1971), and that there exists a number a (0 ≤ a ≤ 1) for which n^{-1} Σ_{i=1}^{n} ε_n/ε_i → a. Then V_n is tight and there is a process V in
D[0,1] for which the finite dimensional distributions of V_n converge to those of V, and V(s) is normally distributed with mean zero and variance a·s·h(s)^{-1}·A(f,K).

The following two results form an analogue to Lemma 2.2. Note that while the Lipschitz condition on K is somewhat reduced in Lemma 3.3, the price being paid is a stronger relationship between the sequences {a_n} and {ε_n}.

LEMMA 3.2. Suppose one of the following holds:

(3.1a)  K is Lipschitz of order one and a_n²/ε_n³ → 0;
(3.1b)  K is continuously differentiable and a_n⁴/ε_n⁵ → 0.

Then

sup{|V_n(s) − V_n*(s,t)| : 0 ≤ s,t ≤ 1} →_P 0.

LEMMA 3.3. Suppose K vanishes off a closed interval [a,b] and is Lipschitz on [a,b]. Suppose further that for some β > 0 and some γ > 0, (3.2) holds. Then

sup{|V_n(s) − V_n*(s,t)| : 0 ≤ s,t ≤ 1} →_P 0.

Theorem 3.1 and Corollary 3.1 (which are given below) follow in a manner similar to Theorem 2.1 and Corollary 2.1.

THEOREM 3.1. Assume (2.5a), (2.5c), the conditions of Lemma 3.1, and that the conditions of either Lemma 3.2 or Lemma 3.3 hold. Then the analogue of (2.6), with f_{N(d)} replaced by f*_{N(d)}, converges in distribution to a normal random variable with mean zero and variance aA(f,K).

COROLLARY 3.1. Under the conditions of Theorem 3.1 and Corollary 2.1,

(N(d)ε_{N(d)})^{1/2}(f*_{N(d)}(T_{N(d)}) − f(x_0))

is asymptotically normally distributed with mean zero and variance aA(f,K).

PROOF OF LEMMA 3.1. As in Lemma 2.1, it is straightforward to verify the convergence of the finite dimensional distributions of V_n. Thus, it is again sufficient to show the existence of M > 0, β > 1/2 such that for s_1 = j/n, s_2 = k/n,

(3.4)  E|V_n(s_1) − V_n(s_2)|² ≤ M|s_1 − s_2|^β.

Now, the left hand side of (3.4) is bounded by (ε_n/n) times a sum of the individual variances, which completes the proof with β = 1.

PROOF OF LEMMA 3.2. We will use a weak convergence argument, so it is first necessary to verify the convergence of the finite dimensional distributions. Fix s and t. Then

E|V_n(s) − V_n*(s,t)|² = (ε_n/n) Σ_{i=1}^{[ns]} ε_i^{-1} ∫ {K(y) − K(y + ta_n/ε_i)}² f(x_0 − ε_i y) dy.

If (3.1a) holds, the first integral expression is bounded by M a_n²/ε_n³.
If (3.1b) holds, a Taylor expansion shows that the last integral expression is bounded by M(a_n/ε_n)². Hence, by Chebyshev's inequality, the finite dimensional distributions each converge to zero in probability. To verify tightness, we again use the extension given by Bickel and Wichura (1971) of their Theorem 3; to do so, first define the process

V_n**(s,t) = V_n*(s, [nt]/n).

It is clear that V_n* − V_n** →_P 0, so that we may work with V_n**. It is thus sufficient to verify the moments condition given by equation (3) of Bickel and Wichura (1971). We will adapt their notation and let B, C be neighboring blocks. By the Schwarz inequality, it suffices to show that there exist M > 0, β > 1 for which, if i, p, q are integers, the moment bound (3.5) holds. Letting

Z_in = ε_i^{-1} {K(ε_i^{-1}(x_0 + pa_n/n − X_i)) − K(ε_i^{-1}(x_0 + qa_n/n − X_i))},

we see that the term on the left hand side of (3.5) is

(ε_n/n)² Σ_{j=1}^{i} E Z_jn⁴ + (ε_n/n)² Σ_{j≠l; j,l≤i} E Z_jn² Z_ln².

If (3.1a) holds, we have thus bounded (3.5) by the required quantity (for some M > 0), so that tightness holds in this case. If (3.1b) holds, we have

Z_in = ε_i^{-1} {K(ε_i^{-1}(x_0 + pa_n/n − X_i)) − K(ε_i^{-1}(x_0 + qa_n/n − X_i))} + O((p−q)a_n/(nε_n)),

so that (3.5) is again bounded as required, which completes the proof.

PROOF OF LEMMA 3.3. We first show that each of the finite dimensional distributions converges to zero. Fix s, t, and assume (without loss of generality) that a = −1, b = 1. Then

E|V_n(s) − V_n*(s,t)|² ≤ (ε_n/n) Σ_{i=1}^{n} ε_i^{-2} ∫_{x_0+ε_i}^{x_0+ε_i+ta_n} K²(ε_i^{-1}(x_0 − y)) f(y) dy
  + (ε_n/n) Σ_{i=1}^{n} ∫_{−1+ta_n/ε_i}^{1} [K(z) − K(z − ta_n/ε_i)]² f(x_0 − ε_i z) dz
  ≤ M(ε_n/n) Σ_{i=1}^{n} {a_n² ε_i^{-3} + a_n ε_i^{-2}} → 0.

Thus the finite dimensional distributions converge, so it remains only to verify tightness. As before, we first verify

(3.6)  V_n* − V_n** →_P 0.

Let I_A denote the indicator function of the event A, and define A_in as the union of the intervals [ε_i, ε_i + a_n + n^{-1}] and [−ε_i, −ε_i + a_n + n^{-1}].
Since K is bounded and Lipschitz on its support, there is a constant M for which

(3.7)  |V_n*(s,t) − V_n**(s,t)| ≤ M{(a_n/ε_n)^{1/2} + (ε_n/n)^{1/2} Σ_{i=1}^{n} I_{A_in}}.

By using Chebyshev's inequality for fourth moments, we see that each term on the right hand side of (3.7) converges almost surely to zero. Hence we need only verify the tightness of V_n − V_n**. Defining Z_in as in the proof of Lemma 3.2, we see that for p < q, p, q integers,

E Z_in² ≤ M ε_i^{-2} ((p−q)/n)² (a_n/ε_i)² + ε_i^{-2} ∫_{x_0−ε_i}^{x_0−ε_i+(q−p)a_n/n} K²(ε_i^{-1}(x_0 − y)) f(y − pa_n/n) dy.

Also, one shows by similar computations that (3.5) is bounded by

(3.8)  M{(i/n) ε_n^{-2} (|p−q|/n)(a_n/n) + (i/n)² (|p−q|/n)² (a_n/ε_n)²}

for some δ > 0, and (3.8) is in turn bounded by the required moment bound (3.9), which completes the proof.

4. First Class of Stopping Rules

In the previous sections we have assumed (in (2.5c)) the existence of a stopping rule with the property that N(d)/n(d) → 1 in probability as d → 0. While it is easy to write down various stopping rules, it is not clear how to develop rules which are based on reasonable criteria for density estimation. We are aware of only one class of stopping rules for the density estimation problem; this class was suggested by Davies and Wegman (1975), who base their idea on stopping when f_n and f_{n−1} are close together. However, their stopping rules have not been shown to satisfy (2.5c), and the precise asymptotic behavior of the rules is not known, other than that they terminate with probability one and have certain moment properties. In this and the next section we will discuss two classes of stopping rules which are motivated in a natural manner by the ideas in the theory of fixed-width confidence intervals (Chow and Robbins (1965), Govindarajulu (1975)). We will verify (2.5c) and the related property EN(d)/n(d) → 1 for all the rules we propose, thus making their properties clear.
One of the stopping rules, discussed in this section, will actually yield fixed-length confidence intervals for f(x_0); we will also be able to discuss the asymptotic normality of this stopping rule itself.

In this and the next section we will make the following general assumptions. There will exist a sequence of density estimators f_n(x) such that:

(4.1)  for a sequence of statistics {T_n}, f_n(T_n) → f(x_0) almost surely;
(4.2)  if N(d)/n(d) → 1, then the properly normed version of f_{N(d)}(T_{N(d)}) − f(x_0) converges in distribution to N(0, σ²) with σ² = B(K)f(x_0), where B(K) is a constant depending on the known function K and N(0, σ²) denotes a normal random variable with mean zero and variance σ².

Some discussion is in order. We are not assuming f_n(x) is a special type of estimator such as those discussed in Sections 2 and 3. However, as we have seen, these latter estimators often satisfy (4.2), and it is not in general very difficult to verify (4.1). For example, if f_n converges uniformly to f in some neighborhood of x_0, f is continuous, and T_n converges to x_0 almost surely, then (4.1) holds, since

|f_n(T_n) − f(x_0)| ≤ sup |f_n(x) − f(x)| + |f(T_n) − f(x_0)|.

Conditions on uniform convergence are found in Schuster (1969) and Davies (1973).

Another important point to note is that many of the results in the next two sections hold if (4.2) is replaced by convergence as in Theorems 2.1 and 3.1, provided f(x_0) is merely replaced by the correct function. Unless specified otherwise (as in Lemma 4.4) this will be the case.

The first stopping rules we discuss arise in a manner similar to those introduced by Chow and Robbins (1965). If Φ is the normal distribution function and Φ^{-1} its inverse, define b by (4.3).

STOPPING RULE 4.1. The stopping rule N(d) stops the first time n ≥ n_0 that

nε_n ≥ (b/d)² f_n(T_n).

LEMMA 4.1. Suppose that (4.1) holds and that ε_n/ε_{n−1} → 1. Then

N(d)ε_{N(d)} (b²f(x_0)/d²)^{-1} → 1  a.s. as d → 0.

LEMMA 4.2. Suppose (4.1) holds and ε_n/ε_{n−1} → 1. If (4.2) holds, then the interval f_{N(d)}(T_{N(d)}) ± d covers f(x_0) with limiting probability 1 − α.

The proof of Lemma 4.1 is contained in Lemma 1 of Chow and Robbins (1965). The proof of Lemma 4.2 is immediate from (4.2) and (4.3).
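Stopping rule 4.1 can be simulated directly. In the sketch below (ours throughout), we take the known-x_0 case with T_n = x_0, a Gaussian kernel, ε_n = n^{-a} with a = 1/5, and the hypothetical calibration b² = Φ^{-1}(1 − α/2)² ∫K²(y)dy, which is consistent with a limiting variance of the form B(K)f(x_0); since (4.3) is not displayed here, this choice of b is an assumption. Lemma 4.1 then predicts N(d) ≈ v(d) = (b²f(x_0)/d²)^{1/(1−a)}.

```python
import numpy as np

# Illustrative calibration (an assumption standing in for the undisplayed (4.3)):
# b^2 = Phi^{-1}(1 - alpha/2)^2 * Int K^2, matching a limit variance B(K) f(x0).
Z975 = 1.959964                         # Phi^{-1}(0.975), i.e. alpha = 0.05
INT_K2 = 1.0 / (2.0 * np.sqrt(np.pi))   # Int K^2 for the Gaussian kernel
A_EXP = 0.2                             # eps_n = n^{-a} with a = 1/5

def f_hat(x0, data, eps):
    u = (x0 - data) / eps
    return np.exp(-0.5 * u ** 2).sum() / (np.sqrt(2.0 * np.pi) * data.size * eps)

def stopping_rule_41(rng, d, x0=0.0, n0=50):
    """Stop at the first n >= n0 with n*eps_n >= (b/d)^2 * f_n(T_n); here T_n = x0."""
    data = rng.standard_normal(n0)
    while True:
        n = data.size
        eps = n ** (-A_EXP)
        if n * eps >= (Z975 ** 2 * INT_K2 / d ** 2) * f_hat(x0, data, eps):
            return n, f_hat(x0, data, eps)
        data = np.append(data, rng.standard_normal())  # one more observation

rng = np.random.default_rng(2)
d = 0.05
N, f_stop = stopping_rule_41(rng, d)
v = (Z975 ** 2 * INT_K2 * (1.0 / np.sqrt(2.0 * np.pi)) / d ** 2) ** (1.0 / (1.0 - A_EXP))
print(N, round(N / v, 2))   # Lemma 4.1 predicts N(d)/v(d) near 1 for small d
```

For this d the rule stops after several hundred observations, with N(d)/v(d) close to one, illustrating the almost sure behavior described by Lemmas 4.1 and 4.3.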
Because of Lemma 4.2, we say that stopping rule 4.1 yields a confidence interval of fixed width 2d and prescribed coverage probability 1 − α.

LEMMA 4.3. Suppose that Ef_n(T_n) → f(x_0) and that there is a constant M* for which

(4.4)  Σ_{n=1}^{∞} Pr{|f_n(T_n) − Ef_n(T_n)| > M*} < ∞.

If ε_n = n^{-a} for some a > 0 and v(d) = (b²f(x_0)/d²)^{1/(1−a)}, then

EN(d)/v(d) → 1.

REMARK 4.3. Schuster (1969) has shown that the kernel estimators f_n(x) satisfy (4.4), while under the assumption n^{-1} Σ_{i=1}^{n} ε_n/ε_i → a, one may use the exponential bounds (Loève (1968), page 254) to show that f_n*(x) satisfies (4.4).

PROOF OF LEMMA 4.3. Because of Lemma 4.1 and by Bickel and Yahav (1968), it suffices to show there is a d_0 > 0 for which

sup_{0<d<d_0} Σ_{m=1}^{∞} Pr{N(d) > m d^{-2/(1−a)}} < ∞.

Letting m(d) = m d^{-2/(1−a)}, we see that

(4.5a)  N(d)^{1−a} − v(d)^{1−a} ≤ (v(d))^{1−a} (f_{N(d)}(T_{N(d)}) − f(x_0))/f(x_0),
(4.5b)  N(d)^{1−a} − v(d)^{1−a} ≥ (v(d))^{1−a} (f_{N(d)−1}(T_{N(d)−1}) − f(x_0))/f(x_0) + 2n_0.

Since N(d)/v(d) → 1 a.s. as d → 0, multiplying all terms by v(d)^{-(1−a)/2} completes the proof.

Thus, taken together, Lemmas 4.1, 4.3 and 4.4 yield a great deal of information, telling us in detail how the stopping rule 4.1 behaves. Lemma 4.2 gives us a very nice property of the stopping rule, namely that it yields confidence intervals for f(x_0) of fixed length and prescribed coverage probability. It should be mentioned here that Starr (1966) has shown, for this same class of stopping rules applied to estimating the mean of a normal distribution, that the prescribed coverage is very nearly attained for all values of d. This indicates that stopping rule 4.1 may well achieve its asymptotic behavior for moderate values of d.

A second stopping rule of this type is more global in scope and would appear more useful in situations where one is interested in estimating the density at the mode. Specifically:

STOPPING RULE 4.2. The stopping rule N(d) stops the first time n ≥ n_0 that

nε_n ≥ (b/d)² sup_x f_n(x).

LEMMA 4.4.
Suppose (4.1) and (4.2) hold and ε_n = n^{-a} for some a > 0. Then, with A(F) = B(K)f(x_0), the stopping rule N(d) is itself asymptotically normally distributed when properly normed.

REMARK 4.1. This lemma requires (4.2). It may not be true if convergence only of the type in Theorems 2.1 and 3.1 is known.

PROOF OF LEMMA 4.4. The proof follows exactly along the lines of a result due to Ghosh and Mukhopadhyay (1975). Since their result is as yet unpublished, we will sketch the proof. Since N(d) → ∞ a.s., by (4.1) and (4.2),

N(d)^{(1−a)/2} (f_{N(d)}(T_{N(d)}) − f(x_0))/A(F) →_L N(0,1),
(N(d)−1)^{(1−a)/2} (f_{N(d)−1}(T_{N(d)−1}) − f(x_0))/A(F) →_L N(0,1).

Now, N(d)^{1−a} and (N(d)−1)^{1−a} are asymptotically equivalent, so that combining the two displays and norming yields the asymptotic normality of N(d) itself.

LEMMA 4.5. Suppose there is a unique x_0 such that f(x_0) = max_x f(x), and that max_x f_n(x) → f(x_0) a.s. as n → ∞. If ε_n = n^{-a}, then N(d)/v(d) → 1 a.s. as d → 0. If, in addition, there is a constant M* for which

(4.6)  Σ_{n=1}^{∞} Pr{ sup_x |f_n(x) − f(x)| > M* } < ∞,

we have EN(d)/v(d) → 1.

The proof of Lemma 4.5 is the same as that of Lemmas 4.1 and 4.3. Schuster (1969) has shown that the kernel estimators f_n(x) satisfy (4.6), but it is not known whether the estimators f_n*(x) satisfy (4.6). We have also been unable to obtain a result similar to Lemma 4.4 for this rule.

Stopping rule 4.2 is a competitor to stopping rule 4.1 in the case that f is symmetric and unimodal and x_0 is the mode; here T_n might be the sample median. The almost sure asymptotic properties of the two rules would be the same, but rule 4.2 would of course always take more observations than rule 4.1. However, in this particular case, both yield fixed-width confidence intervals of prescribed coverage probability for f(x_0).

Finally, note that N(d) defined by stopping rule 4.2 diverges with probability one as d → 0, so that

(4.7)  sup_x |f_n(x) − f(x)| → 0 almost surely  implies  sup_x |f_{N(d)}(x) − f(x)| → 0 almost surely.

5. A Second Class of Stopping Rules

The second class of stopping rules we investigate is motivated by work of Farrell (1966) and Sen and Ghosh (1971).
Their idea was to obtain upper and lower bounds on the parameter of interest and to stop when the difference between these two bounds becomes at most 2d. The first rule of this section is then the following.

STOPPING RULE 5.1. For some sequence of constants {b_n} decreasing to zero, define

(5.1)  v_n = (f_n(T_n + b_n) − f_n(T_n))/f_n(T_n),

and let N(d) be the first time n ≥ n_0 that |v_n| ≤ 2d.

The motivation for dividing by f_n(T_n) in (5.1) is that one will stop when there is little change in f_n (in a neighborhood of x_0) relative to f(x_0). At the end of this section we briefly discuss a rule which does not divide by f_n(T_n). As a notational device we make the following definition.

DEFINITION 5.1. A sequence of statistics {Y_n} satisfies Y_n = o*(a_n) if

Σ_{n=1}^{∞} Pr{|Y_n| > εa_n} < ∞  for all ε > 0.

Now, in order to investigate the stopping rule N(d), we want to look at v_n.

LEMMA 5.1. Suppose that f has three continuous, bounded derivatives in a neighborhood of x_0, that f^{(2)}(x_0) ≠ 0, and that

(5.2a)  T_n − x_0 = o*(b_n),
(5.2b)  sup |f_n(x) − f(x)| = O*(b_n²), where the supremum is taken over some neighborhood of x_0.

Then

(5.3)  v_n = b_n f^{(1)}(x_0)/f(x_0) + o*(b_n)  if f^{(1)}(x_0) ≠ 0,
(5.4)  v_n = (1/2) b_n² f^{(2)}(x_0)/f(x_0) + o*(b_n²)  if f^{(1)}(x_0) = 0.

PROOF: Note that

f_n(T_n + b_n) − f_n(T_n) = {f_n(T_n + b_n) − f(T_n + b_n)} + {f(T_n) − f_n(T_n)} + {f(T_n + b_n) − f(T_n)}
                         = f(T_n + b_n) − f(T_n) + H_n,

where |H_n| ≤ 2 sup |f_n(x) − f(x)|. Thus,

f_n(T_n + b_n) − f_n(T_n) = b_n f^{(1)}(x_0) + (1/2) b_n² f^{(2)}(x_0) + o*(b_n²).

Now (5.3) and (5.4) follow easily.

REMARK 5.1. For a discussion of (5.2b), see the remarks after Lemma 4.5.

We are now in a position to discuss the almost sure behavior of N(d).

THEOREM 5.1. Suppose b_n = n^{-a} satisfies the conditions of Lemma 5.1. Define

v_1(d) = |4f(x_0)d/f^{(2)}(x_0)|^{-1/(2a)},  v_2(d) = |2f(x_0)d/f^{(1)}(x_0)|^{-1/a}.

Then N(d)/v_1(d) → 1 almost surely as d → 0 if f^{(1)}(x_0) = 0, and N(d)/v_2(d) → 1 almost surely as d → 0 if f^{(1)}(x_0) ≠ 0.

PROOF: The proof follows a method of Sen and Ghosh (1971). We consider only the case f^{(1)}(x_0) = 0.
If ⊂ denotes set inclusion and ∪ denotes set union, then, since N(d) is the first n ≥ n_0 with |v_n| ≤ 2d, we have, with m = [(1+ε)v_1(d)],

(5.5)  {N(d) > (1+ε)v_1(d)} ⊂ { |v_m − b_m² f^{(2)}(x_0)/(2f(x_0))| > 2d(1 − (1+ε)^{-2a}) } ∪ { m < n_0 }.

The first set on the right hand side of (5.5) has summable probability by (5.4) and Lemma 5.1, so that by Lemma 5.1 it suffices to consider only the last event in (5.5). A symmetric argument with (1−ε) in place of (1+ε) gives the lower bound, so that again by Lemma 5.1 and since d → 0, the proof is now complete.

The next step is to show that an analogue of Lemma 4.3 holds. Before proceeding, a few definitions are needed.

DEFINITION 5.2. For ε_1 > 0, let n_1(d) = [(1−ε_1)v(d)], n_2(d) = [(1+ε_1)v(d)], and

H_ε(d) = Σ_{n ≥ n_2(d)} Pr{N(d) > n},  H*_ε(d) = Σ_{n ≥ n_1(d)} Pr{N(d) > n}.

LEMMA 5.3. Under the conditions of Lemma 5.1,

lim_{d→0} H_ε(d) = 0  and  limsup_{d→0} H*_ε(d) < ∞,

whether f^{(1)}(x_0) = 0 or f^{(1)}(x_0) ≠ 0.

PROOF: Again, consider only the case f^{(1)}(x_0) = 0. Then for some ε_1 > 0,

H_ε(d) ≤ Σ_{n ≥ n_2(d)} Pr{ |2f(x_0)v_n/(b_n² f^{(2)}(x_0)) − 1| > 1 − 4f(x_0)d/(b_n² f^{(2)}(x_0)) }.

The last sum converges and is a decreasing function of d.

LEMMA 5.4. Under the conditions of Lemma 5.1,

(5.6a)  EN(d)/v_1(d) → 1  if f^{(1)}(x_0) = 0,
(5.6b)  EN(d)/v_2(d) → 1  if f^{(1)}(x_0) ≠ 0.

PROOF: We will only show (5.6a). We have

EN(d) = Σ_1 + Σ_2 + Σ_3,

where Σ_1 extends over {n ≤ n_1(d)}, Σ_2 extends over {n_1(d) < n < n_2(d)}, and Σ_3 extends over {n ≥ n_2(d)}. Now, Σ_1 and Σ_2 are handled by Lemma 5.1, and Σ_3 by Lemma 5.3. Finally, letting ε_1 → 0 as d → 0 completes the proof.

The choice of v_n given in (5.1) is certainly not the only possible one available. We list below more statistics v_n and the sequences of constants v_1(d), v_2(d) that go with them:

(5.7)  |v_n| = max{ |f_n(T_n + b_n) − f_n(T_n)|/f_n(T_n), |f_n(T_n − b_n) − f_n(T_n)|/f_n(T_n) },
(5.8)  v_2(d) = |2d/f^{(1)}(x_0)|^{-1/a},
(5.9)  |v_n| = max{ |f_n(T_n + b_n) − f_n(T_n)|, |f_n(T_n − b_n) − f_n(T_n)| }.

Again, modifications of these stopping rules along the lines of Lemma 4.5 are also possible. When this is done, we again see that, since N(d) diverges with probability one as d → 0,

sup_x |f_n(x) − f(x)| → 0 almost surely  implies  sup_x |f_{N(d)}(x) − f(x)| → 0 almost surely.

REFERENCES

[1] Bahadur, R.R. (1966). A note on quantiles in large samples.
Ann. Math. Statist. (37) 577-580.

[2] Bickel, P.J. and Wichura, M.J. (1971). Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Statist. (42) 1656-1670.

[3] Bickel, P.J. and Yahav, J.A. (1968). Asymptotically optimal Bayes and minimax procedures in sequential estimation. Ann. Math. Statist. (39) 442-456.

[4] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

[5] Cacoullos, T. (1966). Estimation of a multivariate density. Ann. Inst. Statist. Math. (18) 178-189.

[6] Chow, Y.S. and Robbins, H. (1965). On the asymptotic theory of fixed-width sequential confidence intervals for the mean. Ann. Math. Statist. (36) 463-467.

[7] Davies, H.I. (1973). Strong consistency of a sequential estimator of a probability density function. Bull. Math. Statist. (15) 49-54.

[8] Davies, H.I. and Wegman, E.J. (1975). Sequential nonparametric density estimation. To appear, IEEE Transactions on Information Theory, November 1975.

[9] Farrell, R.H. (1966). Bounded length confidence intervals for the p-point of a distribution function III. Ann. Math. Statist. (37) 586-592.

[10] Ghosh, M. and Mukhopadhyay, N. (1975). Asymptotic normality of stopping times in sequential analysis. Unpublished paper.

[11] Govindarajulu, Z. (1975). Sequential Statistical Procedures. Academic Press, New York.

[12] Jain, N.C., Jogdeo, K., and Stout, W.F. (1975). Upper and lower functions for martingales and mixing processes. Ann. Prob. (3) 119-145.

[13] Loève, M. (1968). Probability Theory, 3rd ed. Van Nostrand, Princeton.

[14] Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Statist. (33) 1065-1076.

[15] Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. (27) 832-837.

[16] Sager, T.W. (1975). Consistency in nonparametric estimation of the mode. Ann. Statist. (3) 698-706.

[17] Schuster, E.F. (1969). Estimation of a probability density function and its derivatives. Ann. Math. Statist. (40) 1187-1195.

[18] Sen, P.K. and Ghosh, M. (1971).
On bounded length confidence intervals based on one-sample rank order statistics. Ann. Math. Statist. (42) 189-203.

[19] Srivastava, R.C. (1973). Estimation of the probability density function based on random number of observations with applications. Int. Statist. Rev. (41) 77-86.

[20] Starr, N. (1966). The performance of a sequential procedure for the fixed-width interval estimate. Ann. Math. Statist. (36) 36-50.

[21] Venter, J.H. (1967). On estimation of the mode. Ann. Math. Statist. (38) 1446-1455.

[22] Wegman, E.J. (1972). Nonparametric probability density estimation: I. A summary of available methods. Technometrics (14) 533-546.

[23] Wegman, E.J. and Davies, H.I. (1975). Remarks on some recursive estimators of a probability density. Institute of Statistics Mimeo Series #1021, University of North Carolina at Chapel Hill.

[24] Woodroofe, M. (1967). On the maximum deviation of the sample density. Ann. Math. Statist. (38) 475-481.

[25] Yamato, H. (1971). Sequential estimation of a continuous probability density function and mode. Bull. Math. Statist. (14) 1-12.