
ASYMPTOTIC MULTIVARIATE NORMALITY FOR THE SUBSERIES VALUES OF A GENERAL STATISTIC FROM A STATIONARY SEQUENCE--WITH APPLICATIONS TO NONPARAMETRIC CONFIDENCE INTERVALS

E. Carlstein

University of North Carolina, Chapel Hill
Summary
Let $\{Z_i: -\infty < i < +\infty\}$ be a strictly stationary $\alpha$-mixing sequence with unknown marginal distributions and unknown dependence structure. Suppose that, given data $Z_m^i = (Z_{i+1}, Z_{i+2}, \ldots, Z_{i+m})$, the statistic $s_m^i = s_m(Z_m^i)$ is a point estimator of the unknown parameter $\theta$. If a sample series $Z_n^0$ is available, then the subseries values $s_m^i$ ($0 \leq i < i+m \leq n$) may be used to construct a nonparametric confidence interval on $\theta$ via either Student's distribution or via the Typical Value principle. The asymptotic justification for both methods rests upon a more general result regarding asymptotic multivariate normality of subseries values.
Supported by NSF Grant DMS-8400602.
Running Heading: Subseries-based Confidence Intervals
AMS 1980 Subject Classification: Primary 62G15, Secondary 62E20.
Keywords and Phrases: Confidence intervals, subseries, dependence, typical values, Student's t, jackknife, nonparametric, stationary, multivariate normality.
1. Introduction.
The jackknife (Tukey, 1958) and the typical-value principle (Hartigan, 1969)
both employ subsample values of a general statistic as the building blocks of
confidence intervals on an unknown parameter.
The idea behind the jackknife is
that the "pseudovalues" (which are based upon subsamples of the data) are approximately iid normal random variables; hence, they can be used to construct
approximate confidence intervals based on Student's distribution.
Hartigan
(1975) gave sufficient conditions for the asymptotic multivariate normality of
these pseudovalues, thus providing a theoretical justification for the jackknife
procedure.
The idea behind the typical-value principle is that certain sets of
subsample values of a statistic are (approximately) "typical values" for the unknown parameter; i.e.
each of the intervals between the ordered subsample values
will include the unknown parameter with (approximately) equal probability.
Hartigan (1975) gave sufficient conditions for a set of subsample values to behave asymptotically like a set of typical values, thus providing a theoretical justification for the concomitant confidence intervals.
All of the work discussed above deals with subsample values of a general
statistic computed from iid data.
If there is dependence in the data, then
there is an even greater need for confidence interval techniques that free the
user from parametric modeling and theoretical analysis.
Indeed, the dependence
structure would provide one more unknown in the model and one more complication
in the theory.
The present paper provides confidence interval procedures for the dependent case. The procedures are analogous to the jackknife and typical-value methods both in physical form and in spirit: "Subseries" of the data are employed rather than "subsamples" (the former referring only to subsamples composed of successive observations). On the one hand, equi-lengthed non-overlapping subseries values of a general statistic are shown to be asymptotically distributed as iid normal random variables. This justifies the use of confidence intervals based on Student's distribution, analogously to the jackknife procedure.
On the other hand, certain sets of linear combinations of subseries values behave
asymptotically like a set of typical values.
This justifies the use of these
entities in constructing confidence intervals via the typical-value principle.
The theoretical justifications that Hartigan (1975) provided for the jackknife and typical-value methods (in the case of iid data) both rested on a more general result: His Theorem 2 gave conditions under which (possibly overlapping) subsample values of a general statistic are asymptotically multivariate normal (with possibly nonzero covariances). Our theoretical justifications for confidence intervals based on Student's distribution and on the typical-value principle (under dependence) are likewise based upon a more general result: Theorem 1 (below) establishes conditions under which (possibly overlapping) subseries values of a general statistic are asymptotically multivariate normal (with possibly nonzero covariances). Quite surprisingly, the conditions in Theorem 1 are virtually identical to those in Hartigan's (1975) Theorem 2, even though the former permits dependence in the data while the latter forbids it.
The results presented below assume no knowledge of the marginal distributions
of the data beyond stationarity.
The only assumption about the dependence is that it be $\alpha$-mixing (a relatively weak assumption in the mixing hierarchy; see Ibragimov and Linnik, 1971). The literature contains no other analogies to the jackknife or the typical-value methods for dependent data.
Freedman (1984) has
considered applying the bootstrap to a linear model with an autoregressive component; but, as he emphasizes, the bootstrap calculations assume that the user
has correctly specified the form of the underlying autoregressive model.
The next section presents basic notation and definitions.
Section 3 contains
the fundamental asymptotic multivariate normality result for subseries values of
a general statistic from an $\alpha$-mixing sequence (Theorem 1). Corollaries 1 and 2 (in Section 4) provide the justifications for confidence intervals based on Student's distribution and the typical-value principle.
The proof of Theorem 1
is deferred to Section 5.
2. Definitions and Notation.
Let $\{Z_i(\omega): -\infty < i < +\infty\}$ be a strictly stationary sequence of real-valued random variables (r.v.) defined on probability space $(\Omega, F, P)$. Let $F_p^+$ ($F_q^-$ respectively) be the $\sigma$-field generated by $\{Z_p(\omega), Z_{p+1}(\omega), \ldots\}$ ($\{\ldots, Z_{q-1}(\omega), Z_q(\omega)\}$ respectively). For $N \geq 1$ denote: $\alpha(N) = \sup\{|P\{A \cap B\} - P\{A\}P\{B\}|: A \in F_0^-,\ B \in F_N^+\}$, and define $\alpha$-mixing to mean $\lim_{N\to\infty} \alpha(N) = 0$.
Let $t_m(Z_1, \ldots, Z_m)$ be a function from $R^m \to R^1$, defined for each $m \geq 1$, so that $t_m^i := t_m(Z_{i+1}, \ldots, Z_{i+m})$ is a r.v. Suppressing the argument $\omega$ of $Z_i(\omega)$: $E$, $V$, and $C$ denote expectation, variance, and covariance respectively. Indicator functions are denoted by $I\{\cdot\}$. For $B \geq 0$ denote: ${}^B X = X \cdot I\{|X| < B\}$ and ${}^B\bar{X} = X - {}^B X$. Let $\{a_n\}$ be a sequence of real vectors, and let $A$ be a set of conditions to be satisfied by the $a_n$'s as $n \to \infty$ (e.g. $|a_n| \to \infty$). Then the notation $\lim_A x_{a_n} = x$ means that, for a single finite constant $x$, $\lim_{n\to\infty} x_{a_n} = x$ for all sequences $\{a_n\}$ satisfying $A$.
3. Main Result.

For each $n \geq 1$ the data $Z_n^0$ from $\{Z_i\}$ is available. Consider an arbitrary but fixed number $k \geq 1$ of subseries values $\{T_{in}: 1 \leq i \leq k\}$, where $T_{in} = t_{r_{in}}^{m_{in}}$. Assume $r_{in}/n \to p_i^2 > 0$ as $n \to \infty$ for each $i$, so that none of the subseries are asymptotically negligible. Of course we must have $0 \leq m_{in} < m_{in} + r_{in} \leq n$ $\forall\, i, n$, so that each subseries is in fact contained in $Z_n^0$. Also assume $m_{in}/n \to u_i^2$ as $n \to \infty$ for each $i$, so that the proportion of overlap between subseries is asymptotically constant. In fact the asymptotic covariances between the $T_{in}$'s depend precisely upon the limiting proportions of overlap. Therefore it is convenient to pool the $2k$ integers $\{m_{in},\, m_{in} + r_{in}: 1 \leq i \leq k\}$ for fixed $n$, and to order them and relabel them as $C_n = \{c_{in}: 1 \leq i \leq 2k\}$. Thus $0 \leq c_{1n} \leq c_{2n} \leq \cdots \leq c_{2k,n} \leq n$, where $c_{1n} = \min_{1 \leq i \leq k}\{m_{in}\}$ and $c_{2k,n} = \max_{1 \leq i \leq k}\{m_{in} + r_{in}\}$. To avoid trivial complications, we assume that the ranks of $\{m_{in},\, m_{in} + r_{in}: 1 \leq i \leq k\}$ remain the same for all $n$. That is, if we define $I_n(i)$ to be the rank of $m_{in}$ in $C_n$ (i.e. $c_{I_n(i),n} = m_{in}$), we may write $I(i) \equiv I_n(i)$. Similarly the rank of $m_{in} + r_{in}$ in $C_n$ may be denoted $J(i)$. Finally we define $\gamma_{i-1} = \lim_{n\to\infty} (c_{in} - c_{i-1,n})/n$; in particular $\gamma_{i-1} \in [0,1]$ and $\sum_{\ell=I(i)}^{J(i)-1} \gamma_\ell = \lim_{n\to\infty} r_{in}/n = p_i^2$.
The statistics defined by $\{t_m(\cdot): m \geq 1\}$ will be called central with parameter $\sigma^2$ if:

(I) $\lim_{A\to\infty} \lim_{n\to\infty} P\{|{}^A\bar{t}_n^0| > \varepsilon\} = 0$ for every $\varepsilon > 0$; and

(II) $\lim_{A\to\infty} \lim_{n\to\infty} |E\{{}^A t_n^0\}| = 0$; and

(III) $\lim_{A\to\infty} \lim_{u_n/w_n \to p^2,\ w_n \geq v_n + u_n \geq u_n \to \infty} E\{{}^A t_{w_n}^0 \cdot {}^A t_{u_n}^{v_n}\} = \sigma^2 p$, $\forall\, p^2 \in [0,1]$.
These conditions are virtually identical to those set forth by Hartigan (1975)
in his definition of centrality for the independent case.
Condition (I) controls the tails of $t_n^0$'s distribution; (II) centers the statistic; (III) requires the
statistic to have covariance behavior analogous to the sample mean of iid r.v.s:
the squared correlation between the statistic and its subseries value should
be approximately equal to the proportion of overlap.
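As a concrete illustration of this analogy (a worked special case of our own, with truncation ignored), take $t_m^i = m^{1/2}(\bar{Z}_m^i - \mu)$ for the sample mean $\bar{Z}_m^i$ of iid r.v.s with $V\{Z_1\} = \sigma^2$; the computation is exact:

$$C\{t_w^0,\, t_u^v\} = (wu)^{-1/2}\, C\Big\{ \sum_{j=1}^{w} (Z_j - \mu),\ \sum_{j=v+1}^{v+u} (Z_j - \mu) \Big\} = (wu)^{-1/2}\, u \sigma^2 = \sigma^2 (u/w)^{1/2},$$

since the subseries $(v, v+u]$ lies inside $(0, w]$ when $w \geq v + u$. Thus the squared correlation between $t_w^0$ and $t_u^v$ is exactly $u/w$, the proportion of overlap, and the limit in (III) is $\sigma^2 p$ whenever $u_n/w_n \to p^2$.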
Similarly to Hartigan's (1975) result for the independent case, centrality (as defined above) implies multivariate asymptotic normality of $\vec{T}_n = (T_{1n}, T_{2n}, \ldots, T_{kn})$ in the $\alpha$-mixing case.
Theorem 1: Let $\{Z_i\}$ be $\alpha$-mixing. If $\{t_m(\cdot)\}$ are central with parameter $\sigma^2 > 0$, then $\vec{T}_n \stackrel{D}{\to} N_k(\vec{0}, \Sigma)$ as $n \to \infty$. The entries of $\Sigma$ are:

$$\sigma_{ij} = \sigma^2 \sum_{\ell=\max\{I(i),\, I(j)\}}^{\min\{J(i),\, J(j)\}-1} \gamma_\ell \big/ p_i p_j,$$

where empty sums are zero.
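For a worked illustration of this formula (our own example): take $k = 2$ with $m_{1n} = 0$, $r_{1n} = n/2$ and $m_{2n} = n/4$, $r_{2n} = n/2$, so $p_1^2 = p_2^2 = 1/2$ and the two subseries overlap on half their length. Pooling endpoints gives $c_{1n} = 0$, $c_{2n} = n/4$, $c_{3n} = n/2$, $c_{4n} = 3n/4$; hence $\gamma_1 = \gamma_2 = \gamma_3 = 1/4$, $I(1) = 1$, $J(1) = 3$, $I(2) = 2$, $J(2) = 4$, and

$$\sigma_{11} = \sigma^2 (\gamma_1 + \gamma_2)/p_1^2 = \sigma^2, \qquad \sigma_{22} = \sigma^2 (\gamma_2 + \gamma_3)/p_2^2 = \sigma^2, \qquad \sigma_{12} = \sigma^2 \gamma_2/(p_1 p_2) = \sigma^2/2,$$

so the limiting correlation between $T_{1n}$ and $T_{2n}$ equals $1/2$, the limiting proportion of overlap.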
Note that centrality does not even assume the existence of $t_m^i$'s moments. Carlstein (1985) establishes centrality for sample means, sample fractiles, and smooth functions of central statistics, all under $\alpha$-mixing. As a practical matter, the following alternative conditions (which are analogous but more restrictive) are known to imply centrality under $\alpha$-mixing (see Carlstein (1985)):

(I') $\{t_n^0\}$ are uniformly square-integrable; and

(II') $\lim_{n\to\infty} E\{t_n^0\} = 0$; and

(III') $\lim_{w_n \geq v_n + u_n \geq u_n \to \infty} (w_n/u_n)^{1/2}\, C\{t_{w_n}^0,\, t_{u_n}^{v_n}\} = \sigma^2$.

4. Applications to Confidence Intervals.
Throughout this section assume the following set-up: The statistic $s_m^i = s_m(Z_m^i)$ is wholly computable from the data $Z_m^i$, and does not depend upon any unknown parameters. Furthermore, $s_m^i$ estimates the unknown parameter $\theta$. Finally, put $t_m^i = (s_m^i - \theta) m^{1/2}$. The notation of Section 3 remains unchanged.
Corollary 1: Let $r_{in} = r_n$ and $m_{in} = (i-1) r_n$ $\forall\, i \in \{1, \ldots, k\}$ and $\forall n$, with: $r_n/n \to p^2 > 0$ and $1 \leq r_n \leq k r_n \leq n$ $\forall n$. If $\{Z_i\}$ is $\alpha$-mixing and $\{t_m(\cdot)\}$ are central with parameter $\sigma^2 > 0$, then $\vec{T}_n \stackrel{D}{\to} N_k(\vec{0}, \sigma^2 I_k)$ as $n \to \infty$.

Proof: Immediate from Theorem 1. □
Analogously to Hartigan's (1975) justification for the jackknife in the independent case, Corollary 1 provides the asymptotic justification in the $\alpha$-mixing case for treating

$$S_n = (\bar{s}_n - \theta)\, k^{1/2} \Big/ \Big( \sum_{i=0}^{k-1} (s_{r_n}^{i r_n} - \bar{s}_n)^2 \big/ (k-1) \Big)^{1/2}$$

as a r.v. with Student's distribution on $k-1$ degrees of freedom. (Here we denote $\bar{s}_n = \sum_{i=0}^{k-1} s_{r_n}^{i r_n} / k$.) Observe that $S_n$ is free of the nuisance parameter $\sigma^2$, and hence may be used as a pivot for constructing a confidence interval on $\theta$.
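In computational terms the procedure is simple. The following minimal sketch (our own Python illustration, with hypothetical function names; the statistic $s$ is user-supplied, and the AR(1) series in the usage example is a standard $\alpha$-mixing process) shows one way to carry it out:

import numpy as np
from scipy import stats

def subseries_t_interval(z, s, k, level=0.90):
    # Corollary 1 in practice: split the series into k non-overlapping,
    # equal-length subseries, compute the statistic s on each, and treat
    # S_n as Student's t on k-1 degrees of freedom.  (A sketch under the
    # assumption that s satisfies centrality, e.g. s = np.mean.)
    n = len(z)
    r = n // k                                     # subseries length r_n
    vals = np.array([s(z[i*r:(i+1)*r]) for i in range(k)])
    s_bar = vals.mean()                            # average subseries value
    se = vals.std(ddof=1) / np.sqrt(k)             # estimated std. error of s_bar
    t = stats.t.ppf(1.0 - (1.0 - level) / 2.0, df=k - 1)
    return (s_bar - t * se, s_bar + t * se)

# Usage: AR(1) data (alpha-mixing), statistic = sample mean, k = 8 subseries.
rng = np.random.default_rng(0)
e = rng.normal(size=2000)
z = np.empty(2000)
z[0] = e[0]
for i in range(1, 2000):
    z[i] = 0.5 * z[i - 1] + e[i]
print(subseries_t_interval(z, np.mean, k=8))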
Corollary 2 shows that, even under $\alpha$-mixing, the subseries values $s_m^i$ can be used to construct a set of statistics which are (asymptotically) typical values for $\theta$.

Corollary 2: Let $m_{1n} = 0$ and $m_{i+1,n} = m_{in} + r_{in}$ $\forall\, i \in \{1, \ldots, k-1\}$ and $\forall n$, with: $r_{in}/n \to p_i^2 > 0$ $\forall\, i \in \{1, \ldots, k\}$, and $1 \leq r_{jn} \leq \sum_{i=1}^{k} r_{in} \leq n$ $\forall\, j \in \{1, \ldots, k\}$ and $\forall n$. Let $\ell \in \{1, \ldots, k\}$ be arbitrary but fixed, and denote $K = \{i \in \{1, \ldots, k\}$ s.t. $i \neq \ell\}$. Define:

$$V_{in} = \big( r_{in}^{1/2}\, s_{r_{in}}^{m_{in}} + r_{\ell n}^{1/2}\, s_{r_{\ell n}}^{m_{\ell n}} \big) \big/ \big( r_{in}^{1/2} + r_{\ell n}^{1/2} \big) \quad \text{for each } i \in K,\ \forall n.$$

If $\{Z_i\}$ is $\alpha$-mixing and $\{t_m(\cdot)\}$ are central with parameter $\sigma^2 > 0$, then:

$$\lim_{n\to\infty} P\Big\{ \sum_{i \in K} I\{V_{in} < \theta\} = N \Big\} = 1/k \quad \text{for each } N \in \{0, 1, \ldots, k-1\}.$$

In particular:

$$P\big\{ \min_{i \in K} V_{in} < \theta < \max_{i \in K} V_{in} \big\} \to 1 - 2/k \quad \text{as } n \to \infty.$$
Proof: By Theorem 1, $\vec{T}_n \stackrel{D}{\to} N_k(\vec{0}, \sigma^2 I_k)$ as $n \to \infty$. For each $i \in K$, denote $\tilde{T}_{in} = T_{in} + T_{\ell n}$; also denote $\tilde{T}_n = (\tilde{T}_{in}: i \in K)$. Then $\tilde{T}_n \stackrel{D}{\to} N_{k-1}(\vec{0}, \tilde{\Sigma})$ as $n \to \infty$, where $\tilde{\Sigma} = \sigma^2 (I_{k-1} + [1]_{(k-1)\times(k-1)})$. Using the logic in the second paragraph of Hartigan's (1975) proof of his Theorem 6, we may conclude that the event $A_{Nn} = \{$exactly $N$ of the $\tilde{T}_{in}$'s, $i \in K$, are less than $0\}$ has asymptotic probability $1/k$ (for each $N \in \{0, 1, \ldots, k-1\}$). But since $\tilde{T}_{in} = (V_{in} - \theta)(r_{in}^{1/2} + r_{\ell n}^{1/2})$, the event $A_{Nn}$ is equivalent to the event $\sum_{i \in K} I\{V_{in} < \theta\} = N$. □
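When the subseries have equal lengths ($r_{in} \equiv r_n$), each $V_{in}$ reduces to the simple average of the $i$-th and $\ell$-th subseries values, and $(\min_{i \in K} V_{in},\ \max_{i \in K} V_{in})$ has asymptotic coverage $1 - 2/k$ (e.g. $k = 20$ yields an asymptotic 90% interval). A minimal sketch of this special case, in the same illustrative Python style as above:

def typical_value_interval(z, s, k, ell=0):
    # Typical-value interval via Corollary 2 (a sketch for equal-length
    # consecutive subseries; `s` is the user-supplied statistic).  With
    # equal lengths the weights r^{1/2} cancel, so V_in = (s_i + s_ell)/2.
    n = len(z)
    r = n // k                                     # common subseries length
    vals = [s(z[i*r:(i+1)*r]) for i in range(k)]   # subseries values s_r^{ir}
    v = [(vals[i] + vals[ell]) / 2.0
         for i in range(k) if i != ell]            # V_in for i in K
    return (min(v), max(v))                        # asymptotic coverage 1 - 2/k

# Usage (with z and np.mean as in the previous sketch):
# print(typical_value_interval(z, np.mean, k=20))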
5. Proof of Theorem 1.

Without loss of generality, put $\sigma^2 = 1$. For each $i \in \{2, 3, \ldots, 2k\}$ define $\{d_{in}: n \geq 1\}$ and $\{w_{in}: n \geq 1\}$ as follows: $d_{in} = c_{in} - c_{i-1,n}$. If $\gamma_{i-1} = 0$ then $w_{in} = 0$ $\forall n$; otherwise we can choose $w_{in}$ s.t. $0 \leq w_{in} \leq d_{in}$ $\forall n$, $w_{in} \to \infty$, and $w_{in}/d_{in} \to 0$ as $n \to \infty$. For each $\ell \in \{1, 2, \ldots, 2k-1\}$ and each $n$, denote $t_{\ell n} = t_{d_{\ell+1,n} - w_{\ell+1,n}}^{c_{\ell n}}$. We shall begin by showing that:

$$B_{in} := T_{in} - \sum_{\ell=I(i)}^{J(i)-1} \gamma_\ell^{1/2}\, t_{\ell n} \big/ p_i \stackrel{P}{\to} 0 \quad \text{as } n \to \infty \text{ for each } i \in \{1, \ldots, k\}.$$
Observe that

$$|B_{in}| \leq |{}^A\bar{T}_{in}| + \sum_{\ell=I(i)}^{J(i)-1} \gamma_\ell^{1/2}\, |{}^A\bar{t}_{\ell n}| \big/ p_i + \Big|\, {}^A T_{in} - \sum_{\ell=I(i)}^{J(i)-1} \gamma_\ell^{1/2}\, {}^A t_{\ell n} \big/ p_i \,\Big| \quad \text{for } A > 0. \tag{1}$$

Let $\varepsilon > 0$ be given. By (I), $\lim_{A\to\infty} \lim_{n\to\infty} P\{|{}^A\bar{T}_{in}| > \varepsilon\} = 0$ and $\lim_{A\to\infty} \lim_{n\to\infty} P\{\gamma_\ell^{1/2}\, |{}^A\bar{t}_{\ell n}| > \varepsilon\} = 0$ for each $\ell$. Let $B_{in}(A)$ denote the term within the last modulus of equation (1). We have:

$$P\{|B_{in}(A)| > \varepsilon\} \leq \varepsilon^{-2} \Big[\, \big|E\{({}^A T_{in})^2\} - 1\big| + \sum_{\ell=I(i)}^{J(i)-1} \Big( 2 \gamma_\ell^{1/2} \big| E\{{}^A T_{in} \cdot {}^A t_{\ell n}\} - \gamma_\ell^{1/2}/p_i \big| \big/ p_i + \gamma_\ell \big| E\{({}^A t_{\ell n})^2\} - 1 \big| \big/ p_i^2 \Big) + \sum_{\ell \neq \ell'} \gamma_\ell^{1/2} \gamma_{\ell'}^{1/2} \big| E\{{}^A t_{\ell n} \cdot {}^A t_{\ell' n}\} \big| \big/ p_i^2 \,\Big], \tag{2}$$

where the last summation runs over $\ell, \ell' \in \{I(i), \ldots, J(i)-1\}$ with $\ell \neq \ell'$.
Now take $\lim_{A\to\infty} \lim_{n\to\infty}$ of both sides in equation (2). Except for the terms within the last summation, each term on the r.h.s. of (2) vanishes by (III). Those remaining summands with $\gamma_\ell \cdot \gamma_{\ell'} \neq 0$ are each dominated by $|C\{{}^A t_{\ell n}, {}^A t_{\ell' n}\}| + |E\{{}^A t_{\ell n}\}\, E\{{}^A t_{\ell' n}\}|$. Now taking $\lim_{A\to\infty} \lim_{n\to\infty}$ we obtain zero, since the covariance term is bounded by $4A^2\, \alpha(c_{\ell' n} - c_{\ell+1,n} + w_{\ell+1,n})$ (see Ibragimov and Linnik, p. 306), and since (II) applies to the expectation terms. Combining these results establishes $B_{in} \stackrel{P}{\to} 0$ for each $i$.
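For reference, the covariance bound invoked here is the standard inequality for bounded r.v.s under $\alpha$-mixing, from the page of Ibragimov and Linnik (1971) just cited; in our notation:

$$|C\{X, Y\}| \leq 4 A_1 A_2\, \alpha(N) \quad \text{whenever } |X| \leq A_1,\ |Y| \leq A_2,$$

with $X$ and $Y$ measurable with respect to $\sigma$-fields separated by $N$ observations. Here $X = {}^A t_{\ell n}$ and $Y = {}^A t_{\ell' n}$ are each bounded by $A$, and the separation between their index sets is $c_{\ell' n} - c_{\ell+1,n} + w_{\ell+1,n}$, giving the $4A^2$ bound above.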
Let $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_k) \in R^k$ be fixed. It will suffice to consider the asymptotic distribution of $G_n := \sum_{i=1}^{k} \sum_{\ell=I(i)}^{J(i)-1} \lambda_i \gamma_\ell^{1/2}\, t_{\ell n} / p_i$ in place of that of $\lambda \cdot \vec{T}_n$, because $|\lambda \cdot \vec{T}_n - G_n| \leq \sum_{i=1}^{k} |\lambda_i|\, |B_{in}| \stackrel{P}{\to} 0$.

Write $G_n = \sum_{\ell=1}^{2k-1} t_{\ell n} \cdot g_\ell$, where $g_\ell = \gamma_\ell^{1/2} \sum_{i=1}^{k} \lambda_i I\{I(i) \leq \ell \leq J(i)-1\} / p_i$. Define r.v.s $\{t'_{\ell n}: 1 \leq \ell \leq 2k-1,\ n \geq 1\}$ to have the same marginal distributions as $\{t_{\ell n}: 1 \leq \ell \leq 2k-1,\ n \geq 1\}$, but with $\{t'_{\ell n}: 1 \leq \ell \leq 2k-1\}$ independent for fixed $n \geq 1$. Let $G'_n = \sum_{\ell=1}^{2k-1} t'_{\ell n} \cdot g_\ell$. Then for every $u \in R$:

$$|E\{\exp\{i u G_n\}\} - E\{\exp\{i u G'_n\}\}| \leq 16 \sum_{\{\ell:\, \gamma_\ell \neq 0\}} \alpha(w_{\ell+1,n}) \to 0 \quad \text{as } n \to \infty,$$

by the argument of Ibragimov and Linnik (p. 338) (blocks with $\gamma_\ell = 0$ have $g_\ell = 0$ and may be discarded). Hence we may simply consider the asymptotic distribution of $G'_n$ in place of that of $\lambda \cdot \vec{T}_n$.

By Theorem 6 of Carlstein (1985), each $t'_{\ell n}$ is marginally asymptotically $N(0,1)$ (provided $\gamma_\ell \neq 0$). And since $\{t'_{\ell n}: 1 \leq \ell \leq 2k-1\}$ are independent, we may conclude that $G'_n \stackrel{D}{\to} N(0, \sum_{\ell=1}^{2k-1} g_\ell^2)$. Observe that $\sum_{\ell=1}^{2k-1} g_\ell^2 = \sum_{i=1}^{k} \sum_{j=1}^{k} \lambda_i \lambda_j \sigma_{ij}$ for $\sigma_{ij}$ as defined in Theorem 1. Since this argument holds for each $\lambda \in R^k$, the proof is completed. □
References

Carlstein, E. (1985). Asymptotic Normality for a General Statistic from a Stationary Sequence. (To appear in Ann. Prob.)

Freedman, D. (1984). On Bootstrapping Two-Stage Least-Squares Estimates in Stationary Linear Models. Ann. Stat., 12, 827-842.

Hartigan, J. (1969). Using Subsample Values as Typical Values. JASA, 64, 1303-1317.

Hartigan, J. (1975). Necessary and Sufficient Conditions for Asymptotic Joint Normality of a Statistic and its Subsample Values. Ann. Stat., 3, 573-580.

Ibragimov, I. and Linnik, Y. (1971). Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff, The Netherlands.

Tukey, J. (1958). Bias and Confidence in Not-quite Large Samples. Ann. Math. Stat., 29, 614.