THE USE OF SUBSERIES VALUES FOR ESTIMATING THE VARIANCE OF A GENERAL STATISTIC FROM A STATIONARY SEQUENCE

Edward G. Carlstein
University of North Carolina, Chapel Hill

SUMMARY

Let $\{Z_i : -\infty < i < +\infty\}$ be a strictly stationary $\alpha$-mixing sequence. Let $t_n = t_n(Z_1, \ldots, Z_n)$ be a statistic computed from the observed series. Without specifying the dependence model giving rise to the sequence $\{Z_i\}$, and without specifying the marginal distribution of $Z_i$, we address the question of variance estimation for $t_n$. For estimating the variance of $t_n$ from just the available data $\vec Z_n$, we propose computing subseries values $t_m(Z_{i+1}, Z_{i+2}, \ldots, Z_{i+m})$, $0 \le i < i+m \le n$. These subseries values are used as replicates in order to model the sampling variability of the statistic $t_n$. In particular we use adjacent non-overlapping subseries of length $m_n$, with $m_n \to \infty$ and $m_n/n \to 0$. Our basic variance estimator is just the usual sample variance computed amongst these subseries values (after appropriate standardization). This estimator is shown to be consistent under mild integrability conditions. A simulation study is conducted, leading to the introduction of overlapping subseries and improved performance of the variance estimator.

Running Heading: Using Subseries Values to Estimate Variance.

AMS 1980 subject classifications. Primary 62G05; Secondary 60G10.

Key Words and Phrases. Variance estimation, subseries values, general statistic, dependence, $\alpha$-mixing, stationary sequence.

Supported by NSF Grant MCS-8102725.

THE USE OF SUBSERIES VALUES FOR ESTIMATING THE VARIANCE OF A GENERAL STATISTIC FROM A STATIONARY SEQUENCE

by Edward G. Carlstein

1. Introduction.

Consider a strictly stationary sequence $\{Z_i : -\infty < i < +\infty\}$ from which we observe $\vec Z_n = (Z_1, Z_2, \ldots, Z_n)$, $n \ge 1$. A statistic $t_n = t_n(\vec Z_n)$ is computed from the observed series. In the absence of assumptions about the underlying dependence model in the sequence (e.g. autoregression), and in the absence of specific distributional assumptions about the $Z_i$'s (e.g. joint normality), we would like to be able to estimate the variance of $t_n$ from the available data $\vec Z_n$.

Most variance estimation techniques for general statistics have been aimed at iid samples, making heavy use of exchangeability in their schemes for generating replicates of $t$. This is true of the theory and intuition behind Tukey's (1958) "jackknife," Efron's (1979) "bootstrap," and Hartigan's (1969) "typical values." Recently, Freedman (1984) and Freedman and Peters (1984) have considered applying the bootstrap to a linear model with autoregressive component, but this still assumes additive iid perturbations.

We propose computing the statistic $t$ on subseries $(Z_{i+1}, \ldots, Z_{i+m})$, $0 \le i < i+m \le n$, within the sample $\vec Z_n$, as a way of obtaining replicates of $t$ without disturbing the natural ordering in the data. Our basic variance estimator uses adjacent non-overlapping subseries of length $m_n$, where $m_n \to \infty$ and $m_n/n \to 0$ as $n \to \infty$. Section 3 gives a detailed comparison of the motivating factors behind our variance estimator and those behind the standard variance estimators for iid data. In Section 4 we establish conditions under which our estimator will be consistent in the $L^2$ sense. Parallel theory is developed in Section 5, involving only $\mathbb P$-consistency of the variance estimator.
These consistency results are combined with the asymptotic normality results of Carlstein (1984) to obtain asymptotic normality for general statistics from $\alpha$-mixing sequences, with the limiting distribution being free of the nuisance parameter $\sigma^2$. Then simulation studies are conducted in order to investigate the finite-sample performance of the variance estimator. The results of these studies (Section 6) give insight regarding the choice of subseries length ($m_n$); they also suggest a way to use longer overlapping subseries.

2. Notation and Definitions.

Let $\{Z_i(\omega) : -\infty < i < +\infty\}$ be a strictly stationary sequence of real-valued random variables (r.v.) defined on probability space $(\Omega, \mathcal F, \mathbb P)$. Let $\mathcal F_p^+$ ($\mathcal F_q^-$ respectively) be the $\sigma$-field generated by $\{Z_p(\omega), Z_{p+1}(\omega), \ldots\}$ ($\{\ldots, Z_{q-1}(\omega), Z_q(\omega)\}$ respectively). For $N \ge 1$ denote:
$$\alpha(N) = \sup\{|\mathbb P\{A \cap B\} - \mathbb P\{A\}\mathbb P\{B\}| : A \in \mathcal F_N^+,\ B \in \mathcal F_0^-\},$$
and define $\alpha$-mixing to mean $\lim_{N\to\infty} \alpha(N) = 0$. This is a standard mixing condition which guarantees approximate independence between observations that are separated by a great distance in time (see Rosenblatt (1956)). $\alpha$-mixing is known to be satisfied by normal, double-exponential, and Cauchy AR(1) sequences (Gastwirth and Rubin (1975)), as well as by Markov sequences with finite state space (Billingsley (1968), p. 167). In fact, Gastwirth and Rubin (1975) bound the mixing coefficient $\alpha(N)$ by $C|\rho|^N$ for the normal and double-exponential AR(1) sequences, and by $CN|\rho|^N$ for the Cauchy AR(1) sequence (where $-1 < \rho < 1$ is the AR parameter).

Let $t_n(z_1, z_2, \ldots, z_n)$ be a function from $\mathbb R^n$ to $\mathbb R^1$, defined for each $n \ge 1$, so that $t_n(Z_1(\omega), Z_2(\omega), \ldots, Z_n(\omega))$ is $\mathcal F$-measurable. We will suppress the argument $\omega$ of $Z_i(\cdot)$ from here on. Denote $\vec Z_n^{+i} = (Z_{i+1}, Z_{i+2}, \ldots, Z_{i+n})$ and $t_n^i = t_n(\vec Z_n^{+i})$; as a particular case: $\bar Z_n^i = \sum_{j=1}^n Z_{i+j}/n$. For $B > 0$ denote the truncation ${}^B\!X = X\,\mathbb I\{|X| \le B\}$ and the tail $X^B = X\,\mathbb I\{|X| > B\}$, so that $X = {}^B\!X + X^B$.

Definition: Random variables $\{X_n\}$ will be said to be eventually uniformly integrable (e.u.i.) iff $\exists\, n_0$ s.t.
$$\lim_{A\to\infty}\ \sup_{n \ge n_0} \mathbb E\{|X_n^A|\} = 0.$$
At times it will be convenient to use the equivalence: $\{X_n\}$ are e.u.i. iff $\lim_{A\to\infty} \limsup_{n\to\infty} \mathbb E\{|X_n^A|\} = 0$.
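To fix the indexing conventions in code, here is a minimal Python sketch of the subseries notation (the helper names `subseries` and `subseries_value` are ours, not the paper's; a numpy array stands in for the observed series):

```python
import numpy as np

def subseries(z, i, n):
    """The subseries (Z_{i+1}, ..., Z_{i+n}); with 0-based arrays this is z[i:i+n]."""
    return z[i:i + n]

def subseries_value(t, z, i, n):
    """The subseries value t_n^i = t_n(Z_{i+1}, ..., Z_{i+n})."""
    return t(subseries(z, i, n))

z = np.arange(10.0)                        # stand-in for an observed series
print(subseries_value(np.mean, z, 3, 4))   # Zbar_4^3 = mean of the 4 terms after index 3 -> 4.5
```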
3. The Variance Estimator.

It would be useful to have a procedure for estimating the variance $\sigma^2$ of a general statistic $t_n$ using only the available data $\vec Z_n$. In the same spirit as Carlstein (1984), we wish to avoid making assumptions about the specific marginal distribution $F$ of $Z_0$ and about the dependence model $M$ in $\{Z_i\}$. Moreover, calculation of the theoretical variance of $t_n$ in terms of the parameters of $F$ and $M$ --even if they were specified-- may be intractable. Hence our objective is a non-parametric variance estimator for general statistics from stationary $\alpha$-mixing sequences.

In the special case where $\{Z_i\}$ is iid, non-parametric error estimation has been addressed within the broader context of subsampling and resampling. Hartigan's (1969) "typical values" can be used to obtain confidence intervals in a very general setting, without explicitly estimating $\sigma^2$. The "bootstrap" approach (see for example Efron (1982)) may be applied for estimating virtually any characteristic of the distribution of $t_n$ --including its variance. These techniques are based on the idea that by computing the statistic $t$ on subsamples of the data, we can gain insight about the sampling distribution of $t_n$. This is the intuition behind the bootstrap: the empirical c.d.f. $F_n$ from $\vec Z_n$ becomes close to the true c.d.f. $F$ of $Z_0$, since $\vec Z_n$ is a random sample from $F$. We would like to observe many replications of $t_n$, each based on a new sample $\vec Z_n$ from $F$; as the number of such replications becomes large, the empirical distribution of $t_n$ would become close to the true sampling distribution of $t_n$. We are, however, stuck with but one sample $\vec Z_n$ from $F$. So instead of drawing many samples from $F$, we draw many "bootstrap" samples $\vec Z_n^{0*}$ from $F_n$ (with replacement). Since $F_n$ is "close" to $F$, the corresponding replications $t_n^{0*} = t_n(\vec Z_n^{0*})$ have an empirical distribution that satisfactorily approximates the true sampling distribution of $t_n$.

When non-trivial dependence is present in $\{Z_i\}$, there is in principle nothing wrong with using the empirical c.d.f. $F_n$ to estimate the common marginal c.d.f. $F$. For example, ergodic theorems may be used to show that $F_n(t)$ converges to $F(t)$ with probability 1; and Gastwirth and Rubin (1975) have demonstrated that $(F_n(t) - F(t))n^{1/2}$ converges to a Gaussian process when $\{Z_i\}$ is $\alpha$-mixing with $\alpha(k) = O(k^{-5/2})$. The problem is that the sampling distribution of $t_n$ depends not only on the marginal distribution $F$, but also on the dependence in $\vec Z_n$ (i.e. the joint distribution). Replications of $t$ computed on bootstrap samples from $F_n$ --or even from $F$-- will not accurately reflect this dependence. The resultant "empirical distribution" of $t_n^{0*}$ will be of little value in approximating the true sampling distribution of $t_n$. In fact, the dependence structure in $\{Z_i\}$ is preserved only by those subsamples of the form $\vec Z_k^{+j}$, $0 \le j \le n-k$, $n \ge k \ge 1$.

There are several competing considerations in designing a variance estimator based on $\{t_k^j : 0 \le j \le n-k\}$. It is clear that the performance of such an estimator will depend upon how many representative subseries values $t_k^j$ are used, how different the $t_k^j$'s are from each other, and how accurately the $t_k^j$'s model the behavior of $t_n$. For a particular value of $k$, one would not expect $t_k^j$ and $t_k^{j+1}$ to differ by much --especially in light of the dependence between $\vec Z_k^{+j}$ and $Z_{j+k+1}$. Hence the collection $\{t_k^j : 0 \le j \le n-k\}$ contains a great deal of redundancy that may not contribute information about $t_n$'s sampling variability. The collection $\{t_k^{jk} : 0 \le j \le [n/k]-1\}$, on the other hand, contains only non-overlapping subseries values. If $k$ is growing, each $t_k^{jk}$ will eventually behave as if it were independent of all but two of the other $t_k^{jk}$'s. Furthermore, if $k$ remained fixed, a subseries value $t_k^{jk}$ would never be able to reflect the dependencies of lag $k+1$ or greater. These arguments suggest the use of $\{t_{k_n}^{jk_n} : 0 \le j \le [n/k_n]-1\}$, with $k_n \to \infty$ as $n \to \infty$. Within this framework it seems reasonable to consider $k_n = [\beta n]$ ($0 < \beta < 1$), since the corresponding $t_{k_n}^{jk_n}$'s are based on subseries of the same order of magnitude as $t_n$ itself. Unfortunately, only about $1/\beta$ representative subseries values of this form will ever be available as $n \to \infty$. So an estimator based on such $t_{k_n}^{jk_n}$'s will never stabilize and home in on $\sigma^2$, even as $n \to \infty$. (Ironically, the bootstrap and typical-value methods use randomly selected subsets of the possible subsamples, since it is computationally impractical to use all the subsamples available.)
In light of these factors we propose the use of the subseries values
$$\{t_{m_n}^{jm_n} : 0 \le j \le [n/m_n]-1\},$$
where $\{m_n : n \ge 1\}$ are positive integers s.t. $m_n \to \infty$ and $m_n/n \to 0$ as $n \to \infty$. Thus we obtain an increasing number of subseries values ($[n/m_n] \to \infty$), each of which is based on an ever-growing subseries ($m_n \to \infty$), and each $t_{m_n}^{jm_n}$ is becoming increasingly distant ($m_n$) from all but two of the other $t_{m_n}^{jm_n}$'s.

From this point on we will assume the following set-up: $s_n^i := s_n(\vec Z_n^{+i})$ is a statistic that is wholly computable from the data $\vec Z_n^{+i}$, and does not involve any unknown parameters; $t_n^i := (s_n^i - \mathbb E\{s_n^i\})\,n^{1/2}$ is the correct theoretical standardization for $s_n^i$, in the sense that $\lim_{n\to\infty} \mathbb E\{(t_n^0)^2\} = \sigma^2 \in (0, \infty)$. The proposed estimator for $\sigma^2$ is simply:
$$\hat\sigma_n^2 := m_n \sum_{i=0}^{[n/m_n]-1} \big(s_{m_n}^{im_n} - \bar s_{m_n}\big)^2 \Big/ \big([n/m_n]-1\big), \qquad \text{where } \bar s_{m_n} := \sum_{i=0}^{[n/m_n]-1} s_{m_n}^{im_n} \Big/ [n/m_n].$$
This is nothing more than the usual sample variance computed amongst the standardized subseries values $\{m_n^{1/2}\, s_{m_n}^{jm_n} : 0 \le j \le [n/m_n]-1\}$. In Section 6 we will investigate the choice of $\{m_n\}$, and we will introduce some modifications (involving longer overlapping subseries) which enhance the performance of $\hat\sigma_n^2$.
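As a concrete illustration of the definition above, here is a minimal Python sketch (the function name `subseries_var` is ours; we take the sample variance with divisor $[n/m_n]-1$, matching our reading of the display):

```python
import numpy as np

def subseries_var(z, m, stat=np.mean):
    """Basic subseries estimate of sigma^2 = lim n*Var(s_n^0).

    Splits z into adjacent non-overlapping subseries of length m, computes the
    subseries values s_m^{im}, and returns m times their sample variance.
    """
    n = len(z)
    q = n // m                                                  # [n/m] subseries
    vals = np.array([stat(z[i * m:(i + 1) * m]) for i in range(q)])
    return m * vals.var(ddof=1)
```

For $s_n^i = \bar Z_n^i$ this is an estimate of the long-run variance of the series; any other wholly computable statistic can be passed as `stat`.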
4. $L^2$-Consistency.

In this section we work out some theory for subseries values. The first main result is a law of large numbers for these entities. This result is used to obtain consistency of $\hat\sigma_n^2$. Finally we arrive at an asymptotic normality result for $t_n^0$ in which the limiting distribution is free of $\sigma^2$.

Let us begin with a useful truncation lemma:

Lemma 1: Let $X$ be $\mathcal F_q^+$-measurable and $Y$ be $\mathcal F_p^-$-measurable, $q > p$. Suppose $\max\{\mathbb E\{X^2\}, \mathbb E\{Y^2\}\} \le C < \infty$. Then for any $A > 0$:
$$|\mathbb E\{XY\} - \mathbb E\{X\}\mathbb E\{Y\}| \le 4A^2\alpha(q-p) + 6C^{1/2}\max\{\mathbb E\{(X^A)^2\},\ \mathbb E\{(Y^A)^2\}\}^{1/2}.$$

Proof: Using the representation $X = {}^A\!X + X^A$ (and similarly for $Y$), we see that:
$$\mathbb E\{XY\} - \mathbb E\{X\}\mathbb E\{Y\} = \mathrm{Cov}\{{}^A\!X, {}^A\!Y\} + \mathrm{Cov}\{{}^A\!X, Y^A\} + \mathrm{Cov}\{X^A, {}^A\!Y\} + \mathrm{Cov}\{X^A, Y^A\}. \tag{1}$$
The first term on the right hand side (r.h.s.) of (1) is bounded above by $4A^2\alpha(q-p)$, by Theorem 17.2.1 of Ibragimov and Linnik (1971). The required bounds on the other terms follow from the Schwarz inequality. []

Applying this lemma we can establish the following law of large numbers for subseries values from an $\alpha$-mixing process.

Theorem 2: Let $\{Z_i\}$ be $\alpha$-mixing, let $f_n^i := f_n(\vec Z_n^{+i})$ be a statistic with $f_n : \mathbb R^n \to \mathbb R^1$, and let $\phi \in \mathbb R^1$. Let $\{m_n : n \ge 1\}$ be s.t. $m_n \to \infty$ and $m_n/n \to 0$. Define
$$\bar f_{m_n} := \sum_{i=0}^{[n/m_n]-1} f_{m_n}^{im_n} \Big/ [n/m_n].$$
If:
$$\mathbb E\{f_n^0\} \to \phi \tag{2.a}$$
and
$$(f_n^0)^2 \text{ are e.u.i.}, \tag{2.b}$$
then:
$$\bar f_{m_n} \xrightarrow{\ L^2\ } \phi \quad \text{as } n \to \infty. \tag{2.c}$$

Proof: By (2.a) it suffices to show $\lim_n V(\bar f_{m_n}) = 0$. Now
$$\tfrac12 V(\bar f_{m_n}) \le \sum_{0 \le i \le j \le [n/m_n]-1} |\mathrm{Cov}\{f_{m_n}^{im_n}, f_{m_n}^{jm_n}\}|\,[n/m_n]^{-2} \le 4\,\mathbb E\{(f_{m_n}^0)^2\}\,[n/m_n]^{-1} + \sum_{k=2}^{[n/m_n]-1} |\mathrm{Cov}\{f_{m_n}^0, f_{m_n}^{km_n}\}|\,([n/m_n]-k)\,[n/m_n]^{-2}.$$
The idea here is that the covariance between non-adjacent $f_{m_n}^{jm_n}$'s is approximately zero, dropping off as the separation ($m_n$) increases; although there are of order $n/m_n$ of these terms, their average becomes negligible as $n \to \infty$. Formally, we note first that by (2.b) the $\mathbb E\{(f_n^0)^2\}$ are bounded uniformly in $n \ge n_0$ by $C < \infty$. Assume now that $n$ is sufficiently large so that $m_n \ge n_0$. Then for each $k \in \{2, 3, \ldots, [n/m_n]-1\}$, Lemma 1 applies to the pair $(f_{m_n}^0, f_{m_n}^{km_n})$, whose index sets are separated by a gap of at least $m_n$. Hence:
$$\tfrac12 V(\bar f_{m_n}) \le 4C\,[n/m_n]^{-1} + 4A^2\alpha(m_n) + 6C^{1/2}\big(\mathbb E\{((f_{m_n}^0)^A)^2\}\big)^{1/2} \quad \text{for any } A > 0.$$
Now take $\lim_{A\to\infty}\limsup_{n\to\infty}(\cdot)$ of this last expression. []

Now we are ready to prove the $L^2$-consistency of $\hat\sigma_n^2$. This result follows in part from Theorem 2, since $\hat\sigma_n^2$ is essentially a mean.

Theorem 3: Let $\{Z_i\}$ be $\alpha$-mixing and let $\{m_n\}$ be s.t. $m_n \to \infty$ and $m_n/n \to 0$. Let $s_n^i$, $t_n^i$, $\sigma^2$, $\hat\sigma_n^2$ be as defined in Section 3. If:
$$(t_n^0)^4 \text{ are e.u.i.}, \tag{3.a}$$
then:
$$\hat\sigma_n^2 \xrightarrow{\ L^2\ } \sigma^2 \quad \text{as } n \to \infty. \tag{3.b}$$

Proof: Since the centering constant $\mathbb E\{s_{m_n}^0\}$ cancels out of the sample variance, write
$$\hat\sigma_n^2 = \frac{[n/m_n]}{[n/m_n]-1}\Big(\sum_{i=0}^{[n/m_n]-1} (t_{m_n}^{im_n})^2 \big/ [n/m_n] \; - \; (\bar t_{m_n})^2\Big), \qquad \bar t_{m_n} := \sum_{i=0}^{[n/m_n]-1} t_{m_n}^{im_n} \big/ [n/m_n].$$
Since the leading factor tends to 1, clearly we only need to show $\sum_i (t_{m_n}^{im_n})^2/[n/m_n] \xrightarrow{L^2} \sigma^2$ and $(\bar t_{m_n})^2 \xrightarrow{L^2} 0$. The former follows from Theorem 2, applied to $f_n^i = (t_n^i)^2$ with (3.a) and the definition of $\sigma^2$. In order to show $(\bar t_{m_n})^2 \xrightarrow{L^2} 0$, i.e. $\bar t_{m_n} \xrightarrow{L^4} 0$, recall

Lemma 4 (Chung (1974), p. 97): Let $r \in (0,\infty)$, and suppose that $\{X_n\}$ are s.t. $X_n \xrightarrow{\mathbb P} X$. Then: $X_n \xrightarrow{L^r} X$ iff the $|X_n|^r$ are uniformly integrable.

By (3.a) we have $\mathbb E\{(\bar t_{m_n})^4\} < \infty$ for all $n$ s.t. $m_n \ge n_0$, and applying Theorem 2 (to $f_n^i = t_n^i$) we have $\bar t_{m_n} \xrightarrow{\mathbb P} 0$. Hence by Lemma 4 it will suffice to establish that the $(\bar t_{m_n})^4$ are e.u.i. Now $(\bar t_{m_n})^4 \le \big(\sum_i (t_{m_n}^{im_n})^2/[n/m_n]\big)^2$, so that for $A > 0$ the tail of $(\bar t_{m_n})^4$ is controlled by the tail of this latter quantity; therefore we only need to show e.u.i. of $\big(\sum_i (t_{m_n}^{im_n})^2/[n/m_n]\big)^2$. But by (3.a) again we know that $\mathbb E\{(\sum_i (t_{m_n}^{im_n})^2/[n/m_n])^2\} < \infty$ when $m_n \ge n_0$, and $\sum_i (t_{m_n}^{im_n})^2/[n/m_n] \to \sigma^2$ in $\mathbb P$ and in $L^2$ by Theorem 2, as discussed above. So Lemma 4 yields the required result. []

Notice that both Theorem 2 and Theorem 3 are logically independent of the question of convergence in distribution. These results give moment and integrability conditions that guarantee $L^2$-consistency of estimators based on the subseries values from an $\alpha$-mixing sequence --regardless of whether the $t_n^0$'s (or $f_n^0$'s) are converging in distribution. Furthermore, we have not constrained the mixing coefficient $\alpha$ or the subseries length $m_n$ in any way other than $\alpha(n) \to 0$, $m_n \to \infty$, $m_n/n \to 0$. In practice the $L^2$-consistency is desirable because it translates into shrinking variance and bias for the estimator.

We can now combine the variance estimation result (Theorem 3) with the distributional results of Carlstein (1984), and obtain:

Theorem 5: Let $\{Z_i\}$ be $\alpha$-mixing and let $s_n^i$, $t_n^i$, $\hat\sigma_n^2$, $\{m_n\}$ be as in Theorem 3. If:
$$\exists\, \sigma^2 \in (0,\infty) \text{ s.t. } \lim_{n\to\infty} (N_n/R_n)^{1/2}\,\mathrm{Cov}\{t_{N_n}^0,\ t_{R_n}^{M_n}\} = \sigma^2 \text{ whenever } \{N_n\}, \{M_n\}, \{R_n\} \text{ are s.t. } N_n \ge M_n + R_n \ge R_n \to \infty; \tag{5.a}$$
and
$$\lim_{n\to\infty} \mathbb E\{(t_n^0)^4\} = 3\sigma^4; \tag{5.b}$$
then:
$$\hat\sigma_n^2 \xrightarrow{\ L^2\ } \sigma^2 \quad \text{as } n \to \infty; \tag{5.c}$$
and
$$\big(t_{N_n}^0/\hat\sigma_{N_n},\ t_{R_n}^{M_n}/\hat\sigma_{R_n}\big) \xrightarrow{\ V\ } N_2(0, 0, 1, 1, \rho) \quad \text{as } n \to \infty, \tag{5.d}$$
whenever $\{N_n\}, \{M_n\}, \{R_n\}$ are s.t. $N_n \ge M_n + R_n \ge R_n \to \infty$ and $R_n/N_n \to \rho^2$.

Proof: We will begin by showing that $(t_{N_n}^0/\sigma,\ t_{R_n}^{M_n}/\sigma) \xrightarrow{V} N_2(0, 0, 1, 1, \rho)$, via Theorem 4 of Carlstein (1984). Since $\mathbb E\{t_n^0\} \equiv 0$, it suffices to observe that (5.b) implies that the $(t_n^0)^2$ are e.u.i. Next we want to use Theorem 3 to conclude that (5.c) holds. In light of (5.a) with $M_n \equiv 0$ and $N_n = R_n = n$, it is enough to verify (3.a). But e.u.i. of the $(t_n^0)^4$ follows directly from (5.b) together with $t_n^0 \xrightarrow{V} N(0, \sigma^2)$ (established above). []

(Condition (5.b) may of course be replaced by the less specific condition: the $(t_n^0)^4$ are e.u.i.)
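As an informal numerical illustration of Theorem 3 (our experiment, not the paper's), one can simulate the AR(1) example used in Section 6, for which $s_n^i = \bar Z_n^i$ gives $\sigma^2 = (1-\phi)^{-2}$, and watch $\hat\sigma_n^2$ settle down as $n$ grows; `subseries_var` is the sketch from Section 3, and we take $m_n = [n^{1/2}]$:

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(n, phi, rng):
    """Stationary AR(1) path Z_i = phi*Z_{i-1} + eps_i with iid N(0,1) innovations."""
    z = np.empty(n)
    z[0] = rng.normal() / np.sqrt(1.0 - phi**2)   # draw the start from the stationary law
    for i in range(1, n):
        z[i] = phi * z[i - 1] + rng.normal()
    return z

phi, reps = 0.5, 200                              # true sigma^2 = (1 - phi)**-2 = 4
for n in (100, 1000, 10000):
    m = int(np.sqrt(n))                           # m_n -> inf and m_n/n -> 0
    est = [subseries_var(ar1(n, phi, rng), m) for _ in range(reps)]
    # the average estimate creeps up toward 4 and the spread shrinks as n grows
    print(n, round(float(np.mean(est)), 2), round(float(np.var(est)), 3))
```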
5. $\mathbb P$-Consistency.

In order to get the convergence in distribution (5.d) of Theorem 5, we really need just $\hat\sigma_n^2 \xrightarrow{\mathbb P} \sigma^2$. It is possible to obtain results analogous to Theorem 2 and Theorem 3 which only require integrability conditions on the moments being estimated --not on the higher moments-- and which yield only convergence in $\mathbb P$ for the subseries estimators. The trade-off, however, is that we must explicitly relate the subseries length $m_n$ to the rate of decay in $\alpha(\cdot)$. Specifically, we use $[n/m_n]\,\alpha(m_n) \to 0$. This says essentially that if the dependence is strong (i.e. $\alpha(\cdot)$ decreases slowly), the subseries length should be large relative to $n$. This is reasonable since under strong dependence we need larger "gaps" separating non-adjacent subseries values if we want them to behave as if they were independent. Proceeding in this spirit:

Theorem 6: Let $\{Z_i\}$ be $\alpha$-mixing and let $f_n^i$, $\phi \in \mathbb R^1$, $\{m_n\}$ be as in Theorem 2. If:
$$\mathbb E\{f_n^0\} \to \phi; \tag{6.a}$$
$$f_n^0 \text{ are e.u.i.}; \tag{6.b}$$
and
$$\lim_{n\to\infty} n\,\alpha(m_n)/m_n = 0; \tag{6.c}$$
then:
$$\bar f_{m_n} \xrightarrow{\ \mathbb P\ } \phi \quad \text{as } n \to \infty. \tag{6.d}$$

Proof: Denote $k_n = [n/m_n]$, $p_n = \sup\{j \le k_n - 1 : j \text{ an even integer}\}$, and $p'_n = \sup\{j \le k_n - 1 : j \text{ an odd integer}\}$. We write $\bar f_{m_n}$ as $\bar f^{(1)}_{m_n} + \bar f^{(2)}_{m_n}$, where
$$\bar f^{(1)}_{m_n} = \big(f_{m_n}^0 + f_{m_n}^{2m_n} + f_{m_n}^{4m_n} + \cdots + f_{m_n}^{p_n m_n}\big)\big/k_n, \qquad \bar f^{(2)}_{m_n} = \big(f_{m_n}^{m_n} + f_{m_n}^{3m_n} + f_{m_n}^{5m_n} + \cdots + f_{m_n}^{p'_n m_n}\big)\big/k_n.$$
It will suffice to show that both $\bar f^{(1)}_{m_n} \xrightarrow{\mathbb P} \phi/2$ and $\bar f^{(2)}_{m_n} \xrightarrow{\mathbb P} \phi/2$. We consider $\bar f^{(1)}_{m_n}$ first.

Define r.v.'s $\{g_{m_n}^{jm_n} : j \in \{0, 2, 4, \ldots, p_n\},\ n \ge 1\}$ having the same marginal distributions as $\{f_{m_n}^{jm_n} : j \in \{0, 2, 4, \ldots, p_n\},\ n \ge 1\}$, but s.t. $\{g_{m_n}^{jm_n} : j \in \{0, 2, 4, \ldots, p_n\}\}$ are independent for fixed $n \ge 1$. Denote: $\psi_n(s) = \mathbb E\{\exp\{is\bar f^{(1)}_{m_n}\}\}$, $\tilde\psi_n(s) = \mathbb E\{\exp\{is(g_{m_n}^0 + g_{m_n}^{2m_n} + \cdots + g_{m_n}^{p_n m_n})/k_n\}\}$, $Y_{nj}(s) = \exp\{is f_{m_n}^{jm_n}/k_n\}$, and $\theta_n(s) = \mathbb E\{Y_{n0}(s)\}$. Now, peeling off one factor at a time,
$$|\psi_n(s) - \tilde\psi_n(s)| \le \Big|\psi_n(s) - \mathbb E\Big\{\prod_{j=0,2,\ldots,p_n-2} Y_{nj}(s)\Big\}\theta_n(s)\Big| + \Big|\mathbb E\Big\{\prod_{j=0,2,\ldots,p_n-2} Y_{nj}(s)\Big\}\theta_n(s) - \mathbb E\Big\{\prod_{j=0,2,\ldots,p_n-4} Y_{nj}(s)\Big\}\theta_n(s)^2\Big| + \cdots \le 16\,\alpha(m_n)\,p_n/2,$$
by Ibragimov and Linnik (1971), p. 307, because $|Y_{nj}(s)| = 1$ and successive even-indexed subseries are separated by a gap of $m_n$. By (6.c) this bound tends to 0, so it will suffice to show that the corresponding average of independent copies converges. Put $r_n = p_n/2 + 1$ and denote $X_{nj} = g_{m_n}^{2(j-1)m_n}$, $j \in \{1, 2, \ldots, r_n\}$. Since $k_n/r_n \to 2$ and $\mathbb E\{X_{n1}\} = \mathbb E\{f_{m_n}^0\} \to \phi$, we need only show $r_n^{-1}\sum_{j=1}^{r_n} X_{nj} - \mathbb E\{X_{n1}\} \xrightarrow{\mathbb P} 0$. Note that $\{X_{n1}\}$ are e.u.i. by (6.b), which in turn implies that $\lim_n \mathbb E\{|X_{n1}^{r_n}|\} = 0$. Now truncating the $X_{nj}$ at $r_n$ we obtain
$$r_n^{-1}\sum_{j=1}^{r_n} X_{nj} - \mathbb E\{X_{n1}\} = r_n^{-1}\sum_{j=1}^{r_n}\big({}^{r_n}\!X_{nj} - \mathbb E\{{}^{r_n}\!X_{n1}\}\big) - \mathbb E\{X_{n1}^{r_n}\} + r_n^{-1}\sum_{j=1}^{r_n} X_{nj}^{r_n}.$$
We will show that each of the 3 terms on the r.h.s. converges to zero in $\mathbb P$, using an argument similar to Chow and Teicher (1978), pp. 125-126. First, $|\mathbb E\{X_{n1}^{r_n}\}| \le \mathbb E\{|X_{n1}^{r_n}|\} \to 0$ as $n \to \infty$. Also,
$$\mathbb P\Big\{r_n^{-1}\sum_{j=1}^{r_n} X_{nj}^{r_n} \ne 0\Big\} \le \mathbb P\{|X_{nj}| > r_n \text{ for some } 1 \le j \le r_n\} \le r_n\,\mathbb P\{|X_{n1}| \ge r_n\} \le \mathbb E\{|X_{n1}^{r_n}|\} \to 0 \quad \text{as } n \to \infty.$$
Lastly, by Chebyshev's inequality the first term is controlled by
$$r_n^{-1}\,\mathbb E\{({}^{r_n}\!X_{n1})^2\} \le r_n^{-1}\sum_{j=0}^{r_n-1}(j+1)^2\big(\mathbb P\{|X_{n1}| \ge j\} - \mathbb P\{|X_{n1}| \ge j+1\}\big) \le r_n^{-1}\Big(1 + 3\sum_{j=1}^{r_n} j\,\mathbb P\{|X_{n1}| \ge j\}\Big). \tag{2}$$
Since $\{X_{n1}\}$ are e.u.i., $\exists\, C < \infty$ s.t., for any $A > 0$ and $n$ sufficiently large, $\sum_{j=[A]+1}^{r_n} j\,\mathbb P\{|X_{n1}| \ge j\} \le A\,C + r_n\,\mathbb E\{|X_{n1}^A|\}$. Substituting into (2) and taking $\lim_{A\to\infty}\limsup_{n\to\infty}(\cdot)$ establishes the required convergence in $\mathbb P$ of $\bar f^{(1)}_{m_n}$. An exactly analogous argument may be used on $\bar f^{(2)}_{m_n}$. []

Corollary 7: Let $\{Z_i\}$ be $\alpha$-mixing and let $s_n^i$, $t_n^i$, $\sigma^2$, $\hat\sigma_n^2$, $\{m_n\}$ be as in Theorem 3. If:
$$(t_n^0)^2 \text{ are e.u.i.}, \tag{7.a}$$
and
$$\lim_{n\to\infty} n\,\alpha(m_n)/m_n = 0, \tag{7.b}$$
then:
$$\hat\sigma_n^2 \xrightarrow{\ \mathbb P\ } \sigma^2 \quad \text{as } n \to \infty. \tag{7.c}$$

Proof: Write $\hat\sigma_n^2$ in terms of $\sum_i (t_{m_n}^{im_n})^2/[n/m_n]$ and $(\bar t_{m_n})^2$ as in the proof of Theorem 3. Then $\sum_i (t_{m_n}^{im_n})^2/[n/m_n] \xrightarrow{\mathbb P} \sigma^2$ follows directly from Theorem 6; so does $\bar t_{m_n} \xrightarrow{\mathbb P} 0$, since $\mathbb E\{t_n^0\} \equiv 0$. []

We can finally give a version of Theorem 4 of Carlstein (1984) whose conclusion is free of $\sigma^2$ and whose moment conditions are no stronger than those in that earlier result. Of course we pay by assuming more about the relationship between mixing rate and subseries length.

Corollary 8: Let $\{Z_i\}$ be $\alpha$-mixing and let $s_n^i$, $t_n^i$, $\hat\sigma_n^2$, $\{m_n\}$ be as in Theorem 3. If:
$$\exists\, \sigma^2 \in (0,\infty) \text{ s.t. } \lim_{n\to\infty} (N_n/R_n)^{1/2}\,\mathrm{Cov}\{t_{N_n}^0,\ t_{R_n}^{M_n}\} = \sigma^2 \text{ whenever } \{N_n\}, \{M_n\}, \{R_n\} \text{ are s.t. } N_n \ge M_n + R_n \ge R_n \to \infty; \tag{8.a}$$
$$(t_n^0)^2 \text{ are e.u.i.}; \tag{8.b}$$
and
$$\lim_{n\to\infty} n\,\alpha(m_n)/m_n = 0; \tag{8.c}$$
then:
$$\hat\sigma_n^2 \xrightarrow{\ \mathbb P\ } \sigma^2 \quad \text{as } n \to \infty; \tag{8.d}$$
and
$$\big(t_{N_n}^0/\hat\sigma_{N_n},\ t_{R_n}^{M_n}/\hat\sigma_{R_n}\big) \xrightarrow{\ V\ } N_2(0, 0, 1, 1, \rho) \quad \text{as } n \to \infty, \tag{8.e}$$
whenever $\{N_n\}, \{M_n\}, \{R_n\}$ are s.t. $N_n \ge M_n + R_n \ge R_n \to \infty$ and $R_n/N_n \to \rho^2$.

Proof: This is an immediate consequence of Theorem 4 of Carlstein (1984) and Corollary 7 (above). []

Notice that if $\alpha(n) \le C\,n\,\beta^n$, $0 < \beta < 1$, as in the normal, double-exponential and Cauchy autoregressive examples of Section 2, then choosing $m_n = n^\gamma$ ($0 < \gamma < 1$) yields $m_n \to \infty$ and $m_n/n \to 0$, as well as $n\,\alpha(m_n)/m_n \le C\,n\,\beta^{n^\gamma} \to 0$ as $n \to \infty$. The sample mean and sample fractile statistics are discussed as examples in Corollary 14 and Theorem 17 (respectively) of Carlstein (1984).
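For a feel of how easily condition (8.c) is met under this geometric mixing bound, here is a small numeric check (our illustration; $C$, $\beta$ and $\gamma$ are assumed constants, not estimated quantities):

```python
C, beta, gamma = 1.0, 0.9, 0.5                 # assumed alpha(N) <= C*N*beta**N; m_n = n**gamma
for n in (10**2, 10**3, 10**4, 10**5):
    m = int(n**gamma)
    alpha_bound = C * m * beta**m              # bound on alpha(m_n)
    print(n, n * alpha_bound / m)              # = C*n*beta**m -> 0 once m_n is moderately large
```

The decay sets in only once $m_n$ is moderately large, which foreshadows the finite-sample care taken over the choice of $\{m_n\}$ in Section 6.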
6. Simulation Study of $\hat\sigma_n^2$.

Section 3 gave intuitive motivation for the general form of $\hat\sigma_n^2$, and Sections 4 and 5 established certain reasonable asymptotic properties of this variance estimator. In the present section we consider the finite-sample behavior of $\hat\sigma_n^2$ and the choice of $\{m_n\}$, and we suggest some modifications of $\hat\sigma_n^2$ that yield superior performance. Here the method of investigation is large-scale simulation rather than theoretical calculation.

At this stage it is helpful to write $\hat\sigma_n^2$ as:
$$\hat\sigma_n^2 = \sum_{0 \le i < j \le [n/m_n]-1} \big(s_{m_n}^{im_n} - s_{m_n}^{jm_n}\big)^2\, m_n \Big/ \big([n/m_n]([n/m_n]-1)\big) = \sum_{0 \le i < j \le [n/m_n]-1} \big(t_{m_n}^{im_n} - t_{m_n}^{jm_n}\big)^2 \Big/ \big([n/m_n]([n/m_n]-1)\big).$$
There are $[n/m_n]([n/m_n]-1)/2$ squared paired differences, each contributing 2 replicates of $(t_{m_n}^{im_n})^2$ to our estimate of $\sigma^2$.
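This pairwise-difference representation is just the standard identity for the sample variance with divisor $N-1$; a quick numeric check (our snippet, with arbitrary values standing in for the subseries values):

```python
import numpy as np

v = np.array([1.0, 2.5, 0.3, 4.2, 3.1])     # any collection of subseries values
N = len(v)
pairwise = sum((v[i] - v[j])**2 for i in range(N) for j in range(i + 1, N))
assert np.isclose(pairwise / (N * (N - 1)), v.var(ddof=1))
```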
As mentioned in Section 3, $\hat\sigma_n^2$ will be biased if $m_n$ is not long enough to make $(t_{m_n}^{im_n})^2$ a good "representative." The cross-product terms $t_{m_n}^{im_n} t_{m_n}^{jm_n}$ will add to the bias if $jm_n - (i+1)m_n$ is not large enough to make $t_{m_n}^{im_n}$ and $t_{m_n}^{jm_n}$ approximately independent. And we need a fair number of $(t_{m_n}^{im_n})^2$ replicates if our estimator is to be stable. These considerations led us to define $\hat\sigma_n^2$ and $\{m_n\}$ with $m_n \to \infty$ and $n/m_n \to \infty$, and led us to impose $\alpha$-mixing on the underlying sequence. The theoretical framework we arrived at was tractable, and yielded encouraging results. But these same considerations also suggest modifications to improve the performance of $\hat\sigma_n^2$ for finite $n$.

For fixed $n$, we want our subseries to be as long as possible so that $(t_{m_n}^{im_n})^2$ reflects all of the "relevant" dependence in $\{Z_i\}$. We are restrained, however, by the fact that there are not enough non-overlapping long subseries. In practice, then, it is worthwhile to consider allowing the subseries to overlap, so that quality need not be sacrificed for quantity. That is, we may use subseries starting at the same intervals $\{im_n : i = 0, 1, 2, \ldots\}$, but lasting for $k_n = \ell m_n$ terms rather than just $m_n$ terms. (Here $\ell$ is a fixed positive integer.) The number of replicates available ($r_n := [n/m_n] - \ell + 1$) is virtually unchanged, but their approximate independence is undermined. On the other hand, since $\ell$ is fixed, the asymptotic properties of the estimator will still hold: now each subseries is approximately independent of all but $2\ell$ other subseries (rather than all but just 2 other subseries). In the finite-sample setting we would expect to reduce the bias of $\hat\sigma_n^2$ in so far as $(t_{k_n}^{im_n})^2$ is a better representative than $(t_{m_n}^{im_n})^2$. Yet the magnitude of the cross-product terms $t_{k_n}^{im_n} t_{k_n}^{jm_n}$ will probably be greater than that of $t_{m_n}^{im_n} t_{m_n}^{jm_n}$, especially for $j-i$ small; this could offset the reductions in bias. And furthermore, although the number of $(t_{k_n}^{im_n})^2$ replicates is nearly the same as the number of $(t_{m_n}^{im_n})^2$ replicates, the covariances between the former are likely to be larger than the covariances between the latter. Hence the variance estimator $\hat\sigma_n^2$ based on the $(t_{k_n}^{im_n})^2$'s would have larger variance than the version based on the $(t_{m_n}^{im_n})^2$'s. Our simulation study investigates these trade-offs.

Based on the above arguments, it would appear that the generalized variance estimator being proposed is:
$$\hat\sigma_n^2 = \sum_{0 \le i < j \le r_n - 1} \big(s_{k_n}^{im_n} - s_{k_n}^{jm_n}\big)^2\, k_n \Big/ \big(r_n(r_n - 1)\big),$$
which reduces to our old $\hat\sigma_n^2$ when $\ell = 1$. But reflecting on the case $s_n^i = \bar Z_n^i$, it is clear that the paired differences involving overlapping subseries require special treatment: for $1 \le j - i < \ell$ we have
$$\bar Z_{k_n}^{im_n} - \bar Z_{k_n}^{jm_n} = \Big(\sum_{p=im_n+1}^{jm_n} Z_p \; - \sum_{p=im_n+k_n+1}^{jm_n+k_n} Z_p\Big)\Big/ k_n,$$
which should be standardized by a factor of $k_n/((j-i)m_n)^{1/2}$ if it is to be used to model $t_n$'s variability. This suggests that the appropriate variance estimator (for mean-like statistics) is:
$$\hat\sigma_n^2 := \sum_{0 \le i < j \le r_n - 1} \big(s_{k_n}^{im_n} - s_{k_n}^{jm_n}\big)^2 \Big(\mathbb I\{j-i \ge \ell\}\, k_n + \mathbb I\{j-i < \ell\}\, \frac{k_n^2}{(j-i)m_n}\Big) \Big/ \big(r_n(r_n - 1)\big),$$
which again reduces to the old $\hat\sigma_n^2$ when $\ell = 1$.
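In code, this generalized estimator might be sketched as follows (our reading of the display above; the name `subseries_var_overlap` is ours, and `stat` should be mean-like for the overlapping-pair standardization to be appropriate):

```python
import numpy as np

def subseries_var_overlap(z, m, ell=2, stat=np.mean):
    """Overlapping-subseries estimate of sigma^2 for mean-like statistics.

    Subseries start at 0, m, 2m, ... and have length k = ell*m.  Pairs with
    j - i < ell overlap; their squared difference gets the k**2/((j-i)*m)
    standardization instead of the usual factor k.
    """
    n, k = len(z), ell * m
    r = n // m - ell + 1                       # r_n subseries values
    vals = np.array([stat(z[i * m:i * m + k]) for i in range(r)])
    total = 0.0
    for i in range(r - 1):
        for j in range(i + 1, r):
            w = k if j - i >= ell else k**2 / ((j - i) * m)
            total += w * (vals[i] - vals[j])**2
    return total / (r * (r - 1))
```

With `ell=1` every pair has $j-i \ge \ell$, and the function returns $k_n$ times the usual sample variance of the subseries values, i.e. the basic estimator of Section 3.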
To test the performance of this $\hat\sigma_n^2$ we began with a simple situation where theoretical checks can be made: $\{Z_i\}$ comes from an AR(1) sequence $Z_i = \phi Z_{i-1} + \epsilon_i$, $\epsilon_i$ iid $N(0,1)$, with $s_n^i = \bar Z_n^i$. In this situation it is easy to show that $\sigma^2 = (1-\phi)^{-2}$. We considered weak, moderate, and strong positive dependence in $\{Z_i\}$ ($\phi$ = .1, .5, .9); samples of realistic sizes for time-series analysis ($n$ = 100, 250, 500, 1000); short, medium, and long "base-lengths" for the subseries ($m_n = [\ln n], [n^{1/2}], [n^{3/4}]$); and subseries overlaps of 2/3, 1/2, and none (corresponding to $\ell$ = 3, 2, 1). For each combination of $(\phi, n, m_n, \ell)$, 1000 realizations of $\vec Z_n$ (and hence $\hat\sigma_n^2$) were generated. The routine generating the $\epsilon_i$'s was adapted from the uniform random number generator of Wichmann and Hill (1982) and the inverse-normal approximation of Beasley and Springer (1977).

The criteria used to evaluate $\hat\sigma_n^2$ are: $V\{\hat\sigma_n^2\}$ and $\mathrm{MSE}\{\hat\sigma_n^2\} = V\{\hat\sigma_n^2\} + (\mathbb E\{\hat\sigma_n^2\} - \sigma^2)^2$, each of these being estimated from the 1000 realizations of $\hat\sigma_n^2$; the standard deviation of the estimated $\mathbb E\{\hat\sigma_n^2\}$ is estimated by $(V\{\hat\sigma_n^2\}/1000)^{1/2}$. Because the true values of $\sigma^2$ vary so dramatically ($\sigma^2$ = 1.23 for $\phi$ = .1, $\sigma^2$ = 100 for $\phi$ = .9), it aids comparisons across $\phi$ to consider $V\{\hat\sigma_n^2\}/\mathrm{MSE}\{\hat\sigma_n^2\}$, and also to consider $V\{\hat\sigma_n^2\}$ and $(\mathbb E\{\hat\sigma_n^2\} - \sigma^2)^2$ as proportions of $\sigma^4$. The results are presented in Table 1.

TABLE 1. Simulation study of $\hat\sigma_n^2$. $\{Z_i\}$ is an AR(1) sequence with coefficient $\phi$; $s_n^i = \bar Z_n^i$; $\sigma^2 = \lim_{n\to\infty} n V\{s_n^0\} = (1-\phi)^{-2}$. Based on 1000 realizations per cell. Columns: $n$ = sample size, $m_n$ = base subseries length, $\ell$ = overlap factor, $k_n = \ell m_n$ = actual subseries length, $r_n$ = number of subseries per sample, followed by the estimated mean, its standard deviation, the variance, $V/\mathrm{MSE}$, and $\mathrm{MSE}/\sigma^4$; one panel each for $\phi$ = .1 ($\sigma^2$ = 1.23), $\phi$ = .5 ($\sigma^2$ = 4), and $\phi$ = .9 ($\sigma^2$ = 100). An asterisk (*) marks the best (or approximately best) estimator for each criterion, for fixed $n$ and $\phi$. (The table entries are not reproduced here.)

As suggested by the theoretical results of Section 4, $\hat\sigma_n^2$ is converging to $\sigma^2$ in m.s.e. as $n$ increases --for all values of $\phi$ and all choices of $\{m_n\}$ and $\ell$. Under weak dependence ($\phi$ = .1), virtually all of the m.s.e. is due to variance, because even the short subseries are long enough to represent the relevant dependence. Comparing across $m_n = \ln n,\ n^{1/2},\ n^{3/4}$ for fixed values of $n$ and $\ell$, we see the smallest variance for $m_n = \ln n$ and the largest variance for $m_n = n^{3/4}$. This is due to the large number of subseries values ($r_n$) available when $m_n$ is short, and the scarcity of subseries values when $m_n$ is long. On the other hand, for fixed $n$ and $m_n$ we see a substantial increase in variance as $\ell$ increases. This cannot be attributed to the corresponding but relatively minor decrease in $r_n$ (except perhaps in the case $m_n = n^{3/4}$). Rather, this effect must be from the larger covariances between the longer overlapping subseries values $(t_{k_n}^{im_n})^2$.
Since variance is the name of the game here, the introduction of overlap doesn't pay off in a big way. But it is worth noting that in the cases where there was some significant bias (i.e. $m_n = \ln n$, $\ell$ = 1), doubling the subseries length ($\ell$ = 2) does eliminate it. In practice one is faced with a fixed $n$, a fixed (but unknown) $\phi$, and a choice of 9 estimators ($m_n = \ln n,\ n^{1/2},\ n^{3/4}$; $\ell$ = 1, 2, 3). So the "bottom line" of this analysis is to identify which of the 9 estimators is best for each criterion (bias, variance, m.s.e.), given $n$ and $\phi$. In Table 1, an asterisk (*) indicates the best (or approximate best, in the case of close races) estimator. Clearly, $m_n = \ln n$, $\ell$ = 1 is the big winner for all sample sizes when $\phi$ = .1.

Moving on to the case of moderate dependence ($\phi$ = .5), we begin to see the biasing effect of insufficient subseries length. In the case of $m_n = \ln n$, $\ell$ = 1, where the bias is most substantial, there are pronounced gains for doubling and tripling the subseries length ($\ell$ = 2, 3). When $m_n = n^{1/2}$ and $\ell$ = 1, there are again improvements for doubling, but less decidedly so for tripling. When $m_n = n^{3/4}$, the subseries are already so long that increasing $\ell$ is of little value. The pattern of variances is parallel to the $\phi$ = .1 case: for fixed $n$, variance increases in response to fewer replications and increased overlap. Although this makes $m_n = \ln n$, $\ell$ = 1 the best choice for minimizing variance, the bias contribution is of enough consequence to push $m_n = \ln n$, $\ell$ = 2 ahead in m.s.e. Note that $m_n = \ln n$, $\ell$ = 3 also beats $m_n = \ln n$, $\ell$ = 1 for $n \ge 250$ (in terms of m.s.e.). The variances for $m_n = n^{3/4}$ are so large --due to the excruciatingly small sizes of $r_n$-- that it seems unwise to use such estimators, in spite of their relatively good performance in terms of bias. For $m_n = n^{1/2}$, $\ell$ = 2 and 3, the biases are nearly as good as for $n^{3/4}$, but the variances and m.s.e. are much more reasonable (relative to $\sigma^4$). If one places higher priority on bias reduction, it would not be unreasonable to prefer $m_n = n^{1/2}$, $\ell$ = 2: the estimator that minimizes variance and m.s.e. amongst those estimators that are "best" on the bias criterion. Thus there are several arguments supporting the use of overlapping subseries under moderate dependence.

When the dependence is strong ($\phi$ = .9) we are embarrassed to find that, as $n$ increases, the minimum-variance estimator ($m_n = \ln n$, $\ell$ = 1) is zeroing-in on a value that is roughly one-quarter of the true $\sigma^2$. Now it takes the mammoth subseries of $m_n = n^{3/4}$ to wipe out the bias portion of m.s.e. But again the variances of the $m_n = n^{3/4}$ estimators seem prohibitively large. The overall pattern of variances is as in the previous cases, but when bias and variance contributions are combined into m.s.e., the estimator with $m_n = n^{1/2}$, $\ell$ = 2 is superior for all values of $n$.

On the whole, this simulation lends credence to the use of overlapping subseries, particularly $\ell$ = 2 (since $\ell$ = 3 seems to suffer from increased covariances more than it gains by reducing bias). And, for this range of sample sizes, $m_n = n^{3/4}$ yields too few replications for it to be a stable estimator.
Looking carefully at the case $\phi$ = .9 (where the true value of $\sigma^2$ is 100), it appears that the gains for allowing overlap do not quite measure up to what would be expected. In principle the bias reduction should be constant for a fixed subseries length $k_n$, but here the estimator does worse when more overlap is involved. For example, when $k_n$ = 12 with $\ell$ = 3 the estimator has expectation 34.7, but when $k_n$ = 12 with $\ell$ = 2 the expectation is approximately 42. Similarly, the expectations for $k_n$ = 31, $\ell$ = 1; $k_n$ = 30, $\ell$ = 2; and $k_n$ = 30, $\ell$ = 3 are respectively: 69.8; 61.5; 55.1. Likewise the expectations for $k_n$ = 22, $\ell$ = 1 and $k_n$ = 20, $\ell$ = 2 are: 58.6; 47.9. And those for $k_n$ = 62, $\ell$ = 1 and $k_n$ = 62, $\ell$ = 2 ($m_n = n^{3/4}$) are: 85.3; 64.8. In each case there is substantially more bias as $\ell$ increases.

The explanation is that when $\ell$ = 1 all of the terms contribute 2 replicates of $(t_{k_n}^{im_n})^2$, but when $\ell \ge 2$ there are pairs with $1 \le j-i \le \ell-1$ that instead contribute replicates based on only $(j-i)m_n$ terms (cf. the differenced-sum identity above). Being based upon shorter subseries, these latter replicates do not have the debiasing effect which was the motivation for introducing overlap. Our special standardization of these pairs by $k_n/((j-i)m_n)^{1/2}$ makes these terms the correct order of magnitude (if $s_n^i$ is mean-like), and including them in our estimator gives us more paired differences and hence more stability. But in terms of bias we would be better off excluding them and defining:
$$\tilde\sigma_n^2 := \sum_{0 \le i < j \le r_n - 1} \big(s_{k_n}^{im_n} - s_{k_n}^{jm_n}\big)^2\, \mathbb I\{j-i \ge \ell\}\, k_n \Big/ \big((r_n - \ell)(r_n - \ell + 1)\big)$$
as our variance estimator.
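A sketch of this modified estimator (again our names and our reading of the display; only the non-overlapping pairs $j-i \ge \ell$ enter, with the reduced pair count $(r_n-\ell)(r_n-\ell+1)/2$ reflected in the divisor):

```python
import numpy as np

def subseries_var_debiased(z, m, ell=2, stat=np.mean):
    """sigma-tilde^2: overlapping subseries, keeping only pairs with j - i >= ell.

    Requires r_n > ell so that at least one non-overlapping pair exists.
    """
    n, k = len(z), ell * m
    r = n // m - ell + 1
    vals = np.array([stat(z[i * m:i * m + k]) for i in range(r)])
    total = sum((vals[i] - vals[j])**2
                for i in range(r - 1)
                for j in range(i + 1, r)
                if j - i >= ell)
    return k * total / ((r - ell) * (r - ell + 1))
```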
Of course, if the original number of pairs ($r_n(r_n-1)/2$) is small (e.g. $m_n = n^{3/4}$), the reduction to $(r_n-\ell)(r_n-\ell+1)/2$ non-overlapping pairs will be disastrous. And in general we would expect $\tilde\sigma_n^2$ to have larger variance but smaller bias than $\hat\sigma_n^2$. To investigate these effects, simulations of $\tilde\sigma_n^2$ were conducted, but excluding $m_n = n^{3/4}$ (due to insufficient $r_n$), and excluding $\ell$ = 1 (for obvious reasons) and $\ell$ = 3 (because the m.s.e. of $\ell$ = 2 was usually better). The results for $\phi$ = .1, .5, .9 are in Table 2.

TABLE 2. Simulation study of $\tilde\sigma_n^2$. $\{Z_i\}$ is an AR(1) sequence with coefficient $\phi$; $s_n^i = \bar Z_n^i$; $\sigma^2 = \lim_{n\to\infty} n V\{s_n^0\}$. Based on 1000 realizations per cell. Columns: $n$ = sample size, $m_n$ = "base" subseries length, $\ell$ = 2 = overlap factor, $k_n = \ell m_n$ = actual subseries length, $r_n$ = # of subseries per sample; $\hat{\mathbb E}$, $\hat V$, $\mathrm{MSE}$ are simulated estimates of $\mathbb E\{\tilde\sigma_n^2\}$, $V\{\tilde\sigma_n^2\}$, $\mathrm{MSE}\{\tilde\sigma_n^2\}$ respectively. A bubble (°) marks a value better than (or approximately equal to) the corresponding value for $\hat\sigma_n^2$; a plus (+) marks the best (or approximately best) value for that criterion, for fixed $n$ and $\phi$. (The table entries are not reproduced here.)

Under weak dependence ($\phi$ = .1) the $\hat\sigma_n^2$ estimators were nearly unbiased, and so are the $\tilde\sigma_n^2$ estimators. (A bubble (°) indicates that for fixed $\phi$, $n$, $m_n$, $\ell$ the $\tilde\sigma_n^2$ estimator is superior to or approximately equivalent to the corresponding $\hat\sigma_n^2$ estimator. A plus (+) indicates that for fixed $\phi$ and $n$ the $\tilde\sigma_n^2$ estimator is the best or approximate best amongst all eleven $\hat\sigma_n^2$ and $\tilde\sigma_n^2$ estimators.) In terms of variance and m.s.e., $\tilde\sigma_n^2$ is almost as good as $\hat\sigma_n^2$. The variance of $\tilde\sigma_n^2$ is hurt more when $r_n$ is small ($m_n = n^{1/2}$), because then the loss of $(2r_n-\ell)(\ell-1)/2$ overlapping pairs is relatively greater.

The story is similar for bias, variance and m.s.e. when $\phi$ = .5: $\tilde\sigma_n^2$ does about as well as $\hat\sigma_n^2$ did, but again suffers in terms of variance when $r_n$ is small. Notice that $\tilde\sigma_n^2$ is an optimal (+) estimator in terms of m.s.e. when $m_n = \ln n$, and is usually optimal for bias reduction when $m_n = n^{1/2}$.

Turning to the case of strong dependence ($\phi$ = .9), we now see significant gains in debiasing by using $\tilde\sigma_n^2$: every expectation shows an improvement relative to the corresponding entry for $\hat\sigma_n^2$, and five out of eight of these increments are in excess of 2 s.d. units. Once again the variance of $\tilde\sigma_n^2$ tends to be inferior, but its bias is so superior that $\tilde\sigma_n^2$ actually ends up with smaller m.s.e. than $\hat\sigma_n^2$ in six out of eight cases.

Overall it seems that $\tilde\sigma_n^2$ performs as well as $\hat\sigma_n^2$ in terms of bias and m.s.e., but somewhat worse in terms of variance. Moreover, whenever $\hat\sigma_n^2$ ($\ell$ = 2) was optimal (*) for bias or m.s.e., $\tilde\sigma_n^2$ retained that optimality. And when the dependence is strong, $\tilde\sigma_n^2$ offers substantial gains over $\hat\sigma_n^2$ in terms of bias and m.s.e. If one is more concerned with bias and m.s.e. than with variance, then nothing is to be lost by using $\tilde\sigma_n^2$. And if one would like "insurance" against strong dependence, then there is something to be gained in using $\tilde\sigma_n^2$.

An analogous simulation study was conducted in order to investigate the behavior of the estimators when $s_n^i$ is the ratio statistic. The results here echo those for $s_n^i = \bar Z_n^i$: there are substantial gains in debiasing for using $\ell$ = 2 rather than $\ell$ = 1 (for fixed $\phi$, $n$, $m_n$), and this effect is more pronounced under heavier dependence. The estimator using $m_n = \ln n$, $\ell$ = 2 minimizes m.s.e. when $\phi$ = .1 and $\phi$ = .5 (for all $n$); but when $\phi$ = .9 the debiasing effect of long subseries is so important that $m_n = n^{1/2}$, $\ell$ = 2 has the best m.s.e. Thus there is further support for using longer overlapping subseries in $\tilde\sigma_n^2$.
Throughout these simulations, $\ell$ = 2 in particular has made noticeable improvements over $\ell$ = 1, while $\ell$ = 3 had unacceptably inflated variance. The choice of $\{m_n\}$ seems to hinge upon the strength of dependence in $\{Z_i\}$: when augmented by overlap ($\ell$ = 2), $m_n = \ln n$ was quite acceptable for $\phi$ = .1 and $\phi$ = .5; but for $\phi$ = .9 the extra length of $m_n = n^{1/2}$ was really necessary to control the bias. (Recall that in Section 5 our theoretical work suggested the need to relate $\{m_n\}$ to $\alpha(\cdot)$ by $(n/m_n)\alpha(m_n) \to 0$; this again requires longer subseries under strong dependence.) Perhaps the "safest" and most intuitive estimator under unknown dependence would be $\tilde\sigma_n^2$ with $\ell$ = 2, $m_n = n^{1/2}$. It gives equal priority to $r_n$ (# of replicates) and $m_n$ (base subseries length), but then beefs up the subseries length for debiasing ($k_n = \ell m_n = 2m_n$) and ignores the confounding overlapping pairs ($j-i < \ell$). And it minimized m.s.e. when $\phi$ = .9, for all values of $n$, for both statistics $s_n^i$.

7. Acknowledgment.

I thank Professor John Hartigan, my thesis advisor, for his guidance on this research.

REFERENCES

Beasley, J.D. and Springer, S.G. (1977). The Percentage Points of the Normal Distribution. Appl. Stat., 26, 118-120.

Billingsley, P. (1968). Convergence of Probability Measures. John Wiley and Sons, New York.

Carlstein, E. (1984). Asymptotic Normality for a General Statistic from a Stationary Sequence. University of North Carolina Institute of Statistics, Mimeo Series #1561.

Chow, Y.S. and Teicher, H. (1978). Probability Theory. Springer-Verlag, New York.

Chung, K.L. (1974). A Course in Probability Theory. Academic Press, New York.

Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Ann. of Stat., 7, 1-26.

Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Society for Industrial and Applied Mathematics, Philadelphia.

Freedman, D. (1984). On Bootstrapping Two-stage Least-squares Estimates in Stationary Linear Models. The Ann. of Stat., 12, 827-842.

Freedman, D.A. and Peters, S.C. (1984). Bootstrapping a Regression Equation: Some Empirical Results. Jour. of the Am. Stat. Assoc., 79, 97-106.

Gastwirth, J.L. and Rubin, H. (1975). The Asymptotic Distribution Theory of the Empiric CDF for Mixing Stochastic Processes. The Ann. of Stat., 3, 809-824.

Hartigan, J.A. (1969). Using Subsample Values as Typical Values. Jour. of the Am. Stat. Assoc., 64, 1303-1317.

Ibragimov, I.A. and Linnik, Yu.V. (1971). Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff Publishing, Groningen, The Netherlands.

Rosenblatt, M. (1956). A Central Limit Theorem and a Strong Mixing Condition. Proc. of the Nat. Acad. of Sc., 42, 43-47.

Tukey, J.W. (1958). Bias and Confidence in Not-quite Large Samples. The Ann. of Math. Stat., 29, 614.

Wichmann, B.A. and Hill, I.D. (1982). An Efficient and Portable Pseudo-random Number Generator. Appl. Stat., 31, 188-190.

Ed Carlstein
Dept. of Statistics
Univ. of N.C.
Phillips Hall, 039A
Chapel Hill, N.C. 27514