Sherman, Michael (1992). "Subsampling and Asymptotic Normality for a General Statistic from a Random Field."

SUBSAMPLING AND ASYMPTOTIC NORMALITY
FOR A GENERAL STATISTIC FROM A RANDOM FIELD
by
Michael Sherman
A Dissertation submitted to the faculty of
The University of North Carolina at Chapel Hill
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in the Department of Statistics
Chapel Hill
1992
________________ Advisor
________________ Reader
________________ Reader
ABSTRACT
Let $\{X_i : i \in Z^2\}$ be a strictly stationary random field indexed by the two-dimensional integer lattice. We observe the $X_i$'s in a finite ordered set $B \subset Z^2$, with cardinality $|B|$, and compute a statistic $t(B) = T_B(X_i : i \in B)$ from the observed data. The main question addressed here is: Under what conditions on the statistic t and the set B will t(B) be asymptotically normal as $|B| \to \infty$? Section II is a survey of results on asymptotic normality for the sample mean and for a general statistic $t(\cdot)$. In Section III necessary and sufficient conditions are given for the joint asymptotic normality of t with its subsample values when B is a rectangle. The result essentially says that a statistic t is asymptotically jointly normal with its subsample values if and only if it has sufficiently thin tails and limiting covariance behavior similar to that of the sample mean in the i.i.d. setup. A simpler set of sufficient conditions is given for the class of rectangles and for the case where B is a starshape.

As a practical counterpart to this characterization of normality, in Section IV a completely nonparametric diagnostic tool is given to assess whether a statistic is approximately normally distributed, and if not, how its distribution departs from normality (e.g., it is skewed). The method uses statistics computed on overlapping "subshapes" $(t(B_i) : B_i \subset B,\ i = 1,\ldots,m)$ of the data as replicates of t(B). The estimator of the distribution function is strongly consistent for arbitrary $t(\cdot)$ computed on dependent data. A small simulation indicates that it works well for reasonably sized data grids.

In the case that normality is a reasonable assumption, the variance of the limiting normal distribution is typically unknown. In Section V a nonparametric Method of Moments variance estimator using subshapes of data is given for complicated statistics computed on data from a random field. A simulation illustrates the finite sample effectiveness of the estimator.
CONTENTS

LIST OF FIGURES AND TABLES

I.    INTRODUCTION
      A)  Subsampling Algorithms for I.I.D. Data
      B)  Subsampling Algorithms for Stationary Sequences
      C)  Subsampling Algorithms for Stationary Random Fields
      D)  The Connection Between Subsampling and Asymptotic Normality
          for General Statistics

II.   EVOLUTION OF THE PROBLEM OF ASYMPTOTIC NORMALITY
      FOR A GENERAL STATISTIC
      A)  Asymptotic Normality for the Sample Mean
          1)  Independent and Identically Distributed Observations
          2)  Observations from a Stationary Sequence
          3)  Observations from a Stationary Random Field
      B)  Asymptotic Normality for General Statistics
          1)  Independent and Identically Distributed Observations
          2)  Observations from a Stationary Sequence
          3)  Observations from a Stationary Random Field

III.  ASYMPTOTIC NORMALITY FOR A GENERAL STATISTIC FROM
      A MIXING RANDOM FIELD
      A)  Necessary and Sufficient Conditions for Asymptotic Normality
      B)  Examples
          1)  The Sample Mean
          2)  The Delta Method
          3)  The Sample Percentiles
      C)  Sufficient Conditions for Asymptotic Normality of t
          Computed on Starshapes
          1)  Centrality of the Sample Mean

IV.   ESTIMATION OF THE DISTRIBUTION FUNCTION
      OF A GENERAL STATISTIC
      A)  Introduction
      B)  Properties of the Replicate Histogram
      C)  Examples
      D)  Appendix: Proof of Strong Consistency of the Replicate Histogram

V.    MOMENT ESTIMATION FOR A GENERAL STATISTIC
      A)  Introduction
      B)  How Good is Maximum Likelihood Under Misspecification?
      C)  Estimating the Moments of a Statistic Computed on a Random Field
          1)  Example: The Ising Model
      D)  Appendix: Proof of Consistency of the Method of Moments Estimator

VI.   REFERENCES
FIGURES AND TABLES

Figure 3.1   Layout for the Proof of Univariate Asymptotic Normality
Figure 3.2   Layout for the Proof of Bivariate Asymptotic Normality
Figure 3.3   Examples of Starshapes
Figure 4.1   Smoothed Subseries Replicates (I), n=200
Figure 4.2   Smoothed Subseries Replicates (II), n=200
Figure 4.3   Smoothed Subseries Replicates (I), n=1000
Figure 4.4   Smoothed Subseries Replicates (II), n=1000
Figure 4.5   P.D.F. of $T = (1 - W^2)/4$
Figure 4.6   P.D.F. of the Standard Normal Distribution
Figure 4.7   Smoothed Subshape Replicates (I), n=50
Figure 4.8   Smoothed Subshape Replicates (II), n=50
Figure 4.9   Smoothed Subshape Replicates (I), n=100
Figure 4.10  Smoothed Subshape Replicates (II), n=100
Figure 4.11  The Half Normal Distribution
Table 5.1    MSE's of ML and MOM Estimates of $\theta = E\{\exp(X_1)\}$
Figure 5.1   Plot $D_n$ with Subplot Replicates $D_i(n)$
Table 5.2    The Effectiveness of $\hat\beta$ as an Estimate of $\beta$ in the Ising Model
Table 5.3    The Effectiveness of the Method of Moments Variance Estimator
Figure 5.2   Smoothed Estimate of the P.D.F. of $\hat\beta$
I. INTRODUCTION
Much of the motivation for this study of asymptotic normality comes from the area of
subsampling. For this reason, in sections (A), (B) and (C) a brief overview of subsampling
algorithms is given in the i.i.d., stationary sequence and random field setups. How subsampling
and asymptotic normality are related will be discussed in section (D).
A) Subsampling Algorithms for I.I.D. Data
The jackknife (Quenouille (1949, 1956), and Tukey (1958)) and the bootstrap (Efron (1979)) have become popular subsampling procedures for estimating features of the distribution of a general statistic from an i.i.d. sequence. Quenouille used the jackknife to estimate bias in a statistic while Tukey used the jackknife to estimate the variance of a statistic. The bootstrap has been used to estimate the underlying distribution function of the statistic as well as functionals of that d.f.
Let $X_1, X_2, \ldots, X_n$ be i.i.d. observations from an unknown distribution function F. We compute a statistic $T_n(X_1,\ldots,X_n)$. The idea of the jackknife is to compute $T_i = T_{n-1}(X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n)$, $i = 1,2,\ldots,n$, and use these "replicates" to get information about the distribution of $T_n(X_1,\ldots,X_n)$. For example, the jackknife estimate of variance is $s_J^2 = \sum_{i=1}^{n}(n-1)(T_i - \bar T)^2 / n$, where $\bar T = \sum_{i=1}^{n} T_i / n$. Note that, given the data, this is a completely deterministic subsampling scheme.
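For concreteness, a minimal Python sketch of this delete-one scheme follows; the function and variable names are illustrative only, not taken from any cited source.

    import numpy as np

    def jackknife_variance(x, statistic):
        """Delete-one jackknife estimate of Var{T_n}, following s_J^2 above.

        `statistic` is any function mapping a 1-D sample to a scalar T."""
        x = np.asarray(x)
        n = len(x)
        # Leave-one-out replicates T_i = T_{n-1}(X_1,...,X_{i-1},X_{i+1},...,X_n)
        replicates = np.array([statistic(np.delete(x, i)) for i in range(n)])
        t_bar = replicates.mean()
        return (n - 1) * np.sum((replicates - t_bar) ** 2) / n

    # Example: jackknife variance of the sample mean (close to var(x)/n)
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    print(jackknife_variance(x, np.mean))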
The bootstrap, on the other hand, is, in practice, a random subsampling procedure. The basic principle is to generate "bootstrap resamples" $(X_1^*, X_2^*,\ldots,X_n^*)$, where the $\{X_i^* : 1 \le i \le n\}$ are conditionally i.i.d. from the empirical d.f. of $\{X_i : 1 \le i \le n\}$. We then compute $T^* = T(X_1^*,\ldots,X_n^*)$ as a replicate of T and use the conditional distribution of these $T^*$'s to model the distribution of T. If all $n^n$ subsamples are computed this is the exact, or deterministic, bootstrap. Typically, however, a random subset of the $n^n$ replicates is used. Bickel and Freedman (1981) and Singh (1981) have proven that the limiting bootstrap distribution of the $T^*$'s converges to the limiting distribution of T in the case of the sample mean, the sample quantiles and von Mises functionals, when these statistics are asymptotically normal.
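A minimal Monte Carlo sketch of this resampling scheme in Python (the number of resamples and the example statistic below are arbitrary choices, not part of the cited results):

    import numpy as np

    def bootstrap_replicates(x, statistic, n_boot=1000, rng=None):
        """Monte Carlo i.i.d. bootstrap: draw n observations with replacement
        from the empirical d.f. and recompute the statistic n_boot times."""
        rng = np.random.default_rng(rng)
        x = np.asarray(x)
        n = len(x)
        return np.array([statistic(rng.choice(x, size=n, replace=True))
                         for _ in range(n_boot)])

    # The empirical distribution of these T*'s models the distribution of T,
    # for example its standard error:
    x = np.random.default_rng(1).exponential(size=100)
    t_star = bootstrap_replicates(x, np.median, n_boot=2000, rng=2)
    print(t_star.std())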
Another approach to subsampling has been taken by Hartigan (1969). Suppose that T
1
estimates an unknown parameter 8 of interest.
Hartigan's method uses "typical values" to get
confidence intervals for 8. Hartigan's idea is to choose the subsamples in such a way that each
interval between 2 ordered replicates contains 8 with the same probability. That is, if we let
T(l) $ T(2) $ ... $ T(N) denote the ordered replicates of T then IP{ T(i) $ 8 < T(i+l)} =
1/(N+l) for i = 0,1,..., N , where T(O)=
-00
and T(N+l)= +
00.
So, for example, [T(l),
T(N)) is a (1-2/(N+l))100 % confidence interval for 8. Hartigan proves the validity of this
procedure when T is the sample mean of X/s which are independent (not necessarily with the
same distribution), continuous and symmetric about 8, where the set of subsamples is the set of
all nonempty subsets of the n observations. Hartigan (1975) also showed that if T
(appropriately standardized) has a limiting normal distribution and we choose the set of N
subsamples appropriately then p{T(i) $ 8
< T(i+l)} _ I/N+l as n-oo. Gordon (1974)
showed that the empirical distribution of the typical value replicates has the same limiting
distribution as T when T is the sample mean of LLd. finite variance random variables.
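A small sketch of the typical-value construction for the sample mean over all nonempty subsets, as described above; the helper name is ours and the enumeration is only practical for small n:

    import numpy as np
    from itertools import combinations

    def typical_value_interval(x):
        """Typical values for the mean: replicates are the means of all
        2^n - 1 nonempty subsets; [T_(1), T_(N)) covers theta with
        probability 1 - 2/(N+1) when the X_i are symmetric about theta."""
        x = np.asarray(x)
        n = len(x)
        reps = sorted(np.mean(c) for r in range(1, n + 1)
                      for c in combinations(x, r))
        N = len(reps)
        return reps[0], reps[-1], 1 - 2 / (N + 1)

    x = np.random.default_rng(3).normal(loc=5.0, size=10)  # keep n small
    lo, hi, level = typical_value_interval(x)
    print(lo, hi, level)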
B) Subsampling Algorithms for Stationary Sequences
In the case of stationary observations two basic approaches to subsampling have been
taken. One is a model-based approach and the other is a model-free approach. In the former, a
particular form of dependence is assumed. For example, we might assume $X_i = \rho X_{i-1} + \epsilon_i$, where $\rho \in (-1,1)$ and the $\epsilon_i$ are i.i.d. with mean zero. Say we compute a statistic $T(X_1,\ldots,X_n)$ from the data and are interested in the distribution of T. In order to generate a bootstrap resample $(X_1^*,\ldots,X_n^*)$ we would like to resample from the $\epsilon_i$'s as they have the i.i.d. structure. Because the $\epsilon_i$'s are unknown and $\rho$ is unknown, we resample $(\epsilon_1^*,\ldots,\epsilon_n^*)$ from the empirical d.f. of the residuals $\hat\epsilon_i = X_i - \hat\rho X_{i-1}$, where $\hat\rho$ is an estimate of $\rho$. We can then use these to generate $X_i^* = \hat\rho X_{i-1}^* + \epsilon_i^*$ : $1 \le i \le n$, and from them $T^* = T(X_1^*,\ldots,X_n^*)$. As in the i.i.d. case, the $T^*$'s can now be used to model the distribution of T. The validity of this procedure has been proven by Freedman (1984) for certain "linear dynamical models" and by Bose (1988) for autoregression. Bose showed that the bootstrap can be used to estimate the distribution of the p parameter estimates in an AR(p) process by showing that the difference between the bootstrap distribution (suitably normalized) and the true distribution of the estimates is $o(n^{-1/2})$, thus improving upon the usual limiting normal approximation. Basawa, et al. (1989) have used the bootstrap to estimate the distribution of $\rho$ in the case of an explosive AR(1) process ($|\rho| > 1$).
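A rough Python sketch of this residual-resampling scheme for an AR(1) model; the least-squares estimate of $\rho$, the centering of the residuals, and the starting value $X_1^* = X_1$ are our own simplifications, not prescribed by the cited works:

    import numpy as np

    def ar1_residual_bootstrap(x, statistic, n_boot=1000, rng=None):
        """Model-based bootstrap for X_i = rho*X_{i-1} + eps_i: estimate rho,
        resample the centered residuals, rebuild X* recursively, and
        recompute the statistic on each X* to get bootstrap replicates."""
        rng = np.random.default_rng(rng)
        x = np.asarray(x)
        n = len(x)
        rho_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
        resid = x[1:] - rho_hat * x[:-1]
        resid = resid - resid.mean()
        reps = np.empty(n_boot)
        for b in range(n_boot):
            eps = rng.choice(resid, size=n, replace=True)
            x_star = np.empty(n)
            x_star[0] = x[0]
            for i in range(1, n):
                x_star[i] = rho_hat * x_star[i - 1] + eps[i]
            reps[b] = statistic(x_star)
        return reps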
The main drawback of these model-based procedures is that it is assumed that the true model of dependence is known. In the i.i.d. case, both the jackknife and the bootstrap are truly nonparametric procedures (assuming the i.i.d. assumption holds). This is a main selling point of these methods. Therefore, assuming a specific dependence model takes away much of the original appeal of these procedures.
Model-free methods have been proposed by Rajarshi (1990), Carlstein (1986a) and Künsch (1989). Rajarshi assumes that the process is 1st order Markovian and uses the estimated transition density to generate replicates and do the bootstrap. Specifically, let $f(x|y)$ denote the conditional density of $X_i$ given $X_{i-1}$. We then estimate the conditional density by $\hat f(x|y)$ using a kernel-based estimator. Assuming that $\hat f$ is a good estimator of f, we next generate $X_i^*$ having density $\hat f(x \mid X_{i-1}^*)$ : $1 \le i \le n$, and from them compute $T^* = T(X_1^*,\ldots,X_n^*)$. We then use these $T^*$'s to model the distribution of T. Rajarshi proved the validity of this procedure when T is the asymptotically normal sample mean.
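One hedged way to realize such a transition-density bootstrap is to sample from a Gaussian-kernel estimate of $f(x|y)$; the bandwidth rule and the sampling device below are our own assumptions, not Rajarshi's specific construction.

    import numpy as np

    def markov_kde_bootstrap(x, statistic, n_boot=500, h=None, rng=None):
        """Markov bootstrap sketch: sample a chain X*_1,...,X*_n from a
        Gaussian-kernel estimate of the 1-step transition density.

        With a product Gaussian kernel, drawing from f_hat(.|y) amounts to
        picking a past transition (X_{j-1}, X_j) with weight proportional to
        K_h(y - X_{j-1}) and perturbing X_j by h * N(0,1)."""
        rng = np.random.default_rng(rng)
        x = np.asarray(x)
        n = len(x)
        if h is None:
            h = 1.06 * x.std() * n ** (-1 / 5)   # a rule-of-thumb bandwidth
        prev, nxt = x[:-1], x[1:]
        reps = np.empty(n_boot)
        for b in range(n_boot):
            chain = np.empty(n)
            chain[0] = rng.choice(x)
            for i in range(1, n):
                w = np.exp(-0.5 * ((chain[i - 1] - prev) / h) ** 2)
                j = rng.choice(len(prev), p=w / w.sum())
                chain[i] = nxt[j] + h * rng.standard_normal()
            reps[b] = statistic(chain)
        return reps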
Carlstein (1986a) uses "subseries" and Künsch proposes generalized jackknife and bootstrap procedures. The basic idea in both cases is similar. Let $X_i^j = (X_{i+1}, X_{i+2},\ldots,X_{i+j})$. Then $X_0^n$ is our data, and we are interested in the distribution of some statistic $T(X_0^n)$. In the stationary case the i.i.d. bootstrap procedure does not make sense. The reason is that the i.i.d. bootstrap drastically alters the original dependence structure. One way to get around this is to break the data up into k blocks of length $n/k = l$, say, compute "subseries values" $T(X_{(j-1)l}^{l})$, $j = 1,2,\ldots,k$, and treat these as replicates of $T(X_0^n)$. Note that blocks of data automatically retain the correct serial dependence structure. Carlstein (1986a & 1988a) used this procedure to get a consistent method of moments estimate $\hat\mu$ for $\mu = \lim E\,T^m(X_0^n)$. Künsch uses the jackknife and bootstrap algorithms but he deletes (jackknife) or resamples (bootstrap) blocks of data instead of single observations. The jackknife replicates are $T(X_1, X_2,\ldots,X_i, X_{i+l+1}, X_{i+l+2},\ldots,X_n)$ : $0 \le i \le n-l$, omitting each block of length l once. A bootstrap resample of $X_0^n$ is $(X_1^{l*}, X_2^{l*},\ldots,X_k^{l*})$, where the $X_i^{l*}$ : $1 \le i \le k$, are conditionally i.i.d. from the empirical d.f. of $\{X_i^l : 0 \le i \le n-l\}$, i.e., each $X_i^{l*}$ is one of the possible $n-l+1$ blocks of length l that can be chosen from n consecutive observations. Künsch showed the validity of this procedure for the (asymptotically normal) sample mean and for a class of asymptotically normal nonlinear statistics. These results are analogous to those of Singh (Theorem 1.A) and Bickel & Freedman in the i.i.d. bootstrap setup.
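A sketch of both ideas for a one-dimensional series is given below; the block length and replication counts are arbitrary, and the moving-block resampler is one common way to implement the block-resampling idea, not a verbatim transcription of Künsch's algorithm.

    import numpy as np

    def subseries_replicates(x, statistic, block_len):
        """Non-overlapping subseries values: split X_1,...,X_n into
        k = n // block_len consecutive blocks and recompute the statistic."""
        x = np.asarray(x)
        k = len(x) // block_len
        return np.array([statistic(x[j * block_len:(j + 1) * block_len])
                         for j in range(k)])

    def block_bootstrap(x, statistic, block_len, n_boot=1000, rng=None):
        """Moving-block bootstrap sketch: glue together k randomly chosen
        overlapping blocks of length l to form a resampled series."""
        rng = np.random.default_rng(rng)
        x = np.asarray(x)
        n = len(x)
        k = n // block_len
        starts = np.arange(n - block_len + 1)
        reps = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.choice(starts, size=k, replace=True)
            x_star = np.concatenate([x[s:s + block_len] for s in idx])
            reps[b] = statistic(x_star)
        return reps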
C) Subsampling Algorithms for Stationary Random Fields
Subsampling methodology has been extended to the random field case in recent years. Possolo (1991) used "subblocks" (analogous to subseries in the time series case) to get a consistent estimate of the variance of a general statistic. This can be used to construct confidence intervals for $\theta$ if t(B) is an estimate of $\theta$ with an asymptotic normal distribution and unknown variance. Lele (1988) proposed a bootstrap algorithm for the random field case where
the dependence is specified by an auto-model (see Section II.A.3 for a description of auto-models). His basic idea is as follows: Let N(i) denote the neighbors of a site $i \in Z^2$. Let $f[X_i \mid X(N(i))]$ denote the conditional distribution of $X_i$ given the values at neighboring sites. We then obtain an estimate $\hat f(x|y)$ of this conditional density and use it to generate $X_i^*$'s and hence a replicate $T(X_1^*,\ldots,X_n^*)$. We then use these $T^*$'s to estimate the distribution of T. This is analogous to Rajarshi's method for a stationary time series. However, unlike the time series case, generating the data grid $\{X_i^* : i \in B\}$ is nontrivial here, even after estimating $\hat f(x|y)$. Lele uses the "Metropolis, et al. (1953) algorithm" to generate $\{X_i^* : i \in B\}$. He showed the validity of this procedure in the case of the (asymptotically normal) sample mean.
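As an illustration of the subblock idea on a two-dimensional grid, the following sketch recomputes a statistic on non-overlapping subblocks and rescales their spread. It assumes the variance of t(B) scales like $1/|B|$ (root-$|B|$ consistency) and is our own simplified illustration, not Possolo's exact estimator.

    import numpy as np

    def subblock_variance(grid, statistic, block_shape):
        """Recompute a statistic on non-overlapping rectangular subblocks of a
        2-D data grid and use the spread of these replicates, rescaled by
        block size, to approximate Var{t(B)} for the full grid."""
        grid = np.asarray(grid)
        br, bc = block_shape
        reps = []
        for r in range(0, grid.shape[0] - br + 1, br):
            for c in range(0, grid.shape[1] - bc + 1, bc):
                reps.append(statistic(grid[r:r + br, c:c + bc]))
        reps = np.asarray(reps)
        # Var{t(B)} ~ (|subblock| / |B|) * sample variance of the replicates,
        # assuming the statistic's variance is inversely proportional to |B|.
        return (br * bc / grid.size) * reps.var(ddof=1)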
D) The Connection Between Subsampling and Asymptotic Normality of General Statistics
In establishing the validity of the subsampling plans discussed above, it becomes clear
that there is a close connection between subsampling algorithms and asymptotic normality. In
particular, it is known, in the i.i.d. case, that if the jackknife estimate of variance is
asymptotically unbiased and consistent then the limiting distribution of T is normal (van Zwet
1990). Similarly, much of the justification for the bootstrap (e.g., the Singh, Bickel and
Freedman, Künsch, Rajarshi and Lele results) comes from cases where the statistic is asymptotically normal. Carlstein (1986a, 1988a) appeals to asymptotic normality to motivate his subsampling, and Hartigan (1975) uses his asymptotic normality results to study the jackknife and typical values. So the question "When does the bootstrap (jackknife, typical values) work?" is intimately related to the question "Which statistics have a limiting normal distribution?". The connection between these two questions motivated the results reviewed in Sections II.B.1 and II.B.2, which deal with asymptotic normality for general statistics in the i.i.d. and stationary time series cases. Moreover, this connection motivates the present study of asymptotic normality for general statistics computed from a random field.
II. EVOLUTION OF THE PROBLEM OF ASYMPTOTIC NORMALITY
FOR A GENERAL STATISTIC
A) Asymptotic Normality for the Sample Mean
Let X have the same distribution as $X_i$. We will assume throughout this section that $EX = 0$ and that $t(B) = \sum_{i \in B} X_i / (|B|^{1/2} h_{|B|})$, where $h_n$ is a continuous, slowly varying function which is integrable over any finite interval. $\{Y_n : n \ge 1\}$ is said to be uniformly squared integrable if $\sup_n E\,Y_n^2\, I\{|Y_n| \ge A\} \to 0$ as $A \to \infty$.
1) Independent and Identically Distributed Observations

Our starting point is the classical Central Limit Theorem. Let $\{X_i\}$ be i.i.d. and let $B_n = (1,2,\ldots,n)$.

Theorem 2.1:
If
(2.1)    $EX^2 = \sigma^2 \in (0,\infty)$,
then (for $h_n \equiv 1$)
(2.2)    $t(B_n) \xrightarrow{d} N(0,\sigma^2)$
and
(2.3)    $\{t(B_n)\}$ is uniformly squared integrable.

Loeve (1977) actually gives necessary and sufficient conditions for normality in the following result:
Theorem 2.2:
(2.4)    $t(B_n) \xrightarrow{d} N(0,\sigma^2)$
iff
for each $\epsilon > 0$ and for some $\tau > 0$ the following three conditions hold as $n \to \infty$:
(2.5)    $n\,P(|X| \ge \epsilon\, h_n n^{1/2}) \to 0$,
(2.6)    $(n^{1/2}/h_n)\,E\,X\,I\{|X| < \tau h_n n^{1/2}\} \to 0$,
(2.7)    $(1/h_n^2)\,E\,X^2\,I\{|X| < \tau h_n n^{1/2}\} \to \sigma^2 \in (0,\infty)$.

Condition (2.5) controls the tails of the distribution, condition (2.6) centers the truncated random variable near 0, and condition (2.7) controls the truncated variance.
2) Observations from a Stationary Sequence
In any situation where we do not assume independence we must somehow quantify the
departure from independence in the sequence. As discussed in the introduction, it is unappealing
in the nonparametric subsampling context to assume the true dependence structure is known.
Because of this, we focus on model-free measures of dependence, i.e., mixing conditions. Rosenblatt (1956) introduced the concept of $\alpha$- (or strong) mixing for the time series case. This allows for dependence between the $X_i$'s without specifying the model giving rise to the
dependence. Bradley (1986) gives a detailed account of various mixing conditions, the relative
strength of each and examples that satisfy the conditions.
Let $\mathcal{F}_{-\infty}^{t}$ denote the $\sigma$-field generated by $(\ldots,X_{t-1},X_t)$ and $\mathcal{F}_{t}^{\infty}$ the $\sigma$-field generated by $(X_t, X_{t+1},\ldots)$, and define:

$\alpha(m) = \sup\{\,|P(AB) - P(A)P(B)| : A \in \mathcal{F}_{-\infty}^{0},\ B \in \mathcal{F}_{m}^{\infty}\,\}.$

We say that the stationary sequence $\{X_i\}$ is $\alpha$-mixing if $\alpha(m) \to 0$ as $m \to \infty$. Roughly speaking, the random variables become less dependent as the distance between them grows. There have been many Central Limit Theorems proven under different moment and mixing conditions. Hoeffding and Robbins (1948) proved a CLT for an M-dependent sequence ($\alpha(m) \equiv 0$ for all $m > M$) and Rosenblatt and Blum (1956) proved a CLT under $\alpha$-mixing.
Leadbetter and Rootzen (1990) prove a central limit theorem for a class of additive random functions which includes array sum results as a special case. The following result is given in Ibragimov and Linnik (1971, p.339) (I&L from now on) [here $B_n = (1,2,\ldots,n)$ and $h_n \equiv 1$]:

Theorem 2.3:
Let $\{X_i\}$ be $\alpha$-mixing. Assume
(2.8)    $\mathrm{Var}\{t(B_n)\} \to \sigma^2 \in (0,\infty)$.
Then
(2.9)    $t(B_n) \xrightarrow{d} N(0,\sigma^2)$
iff
(2.10)   $\{t(B_n)\}$ is uniformly squared integrable.

This result was proven by Volkonskii and Rozanov in 1959.

Note that the uniform integrability (2.3) that was a consequence of finite variance in the independent case [(2.1) $\Rightarrow$ (2.2) and (2.3)] now becomes an additional necessary condition for normality [if (2.8), then (2.9) $\Leftrightarrow$ (2.10)].
3) Observations from a Stationary Random Field

In 1968 Dobrushin published his results giving sufficient conditions for the existence and uniqueness of a random field with a given conditional distribution. As in the 1-dimensional case, there are model-dependent and model-free ways to characterize the dependence in a random field.

Besag (1974) gives various "auto-models" (a generalization of autoregression to the random field setup) which describe the distribution of a random field. Here a model is given by its conditional probability structure. We define the point $a \in Z^2$ to be a "neighbor" of the point $b \in Z^2$ if the conditional distribution of $X_b$ given the values at all other sites depends on the value of $X_a$. Any set of sites which consists of either a single site or else in which every site is a neighbor of every other site is called a clique. As an example, let $X_i$ be a random variable located at the point $i \in Z^2$ taking on the values 0 and 1. An auto-logistic model is defined by:

$P\{X_i = x_i \mid \text{all other sites}\} = \exp\bigl[x_i(\alpha_i + \textstyle\sum_j \beta_{i,j} x_j)\bigr]\,\bigl[1 + \exp(\alpha_i + \textstyle\sum_j \beta_{i,j} x_j)\bigr]^{-1},$

where the only non-zero $\beta_{i,j}$'s are those such that j is a neighbor of i. Similarly we can define auto-normal and auto-exponential schemes.
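A small sketch of the auto-logistic conditional probability, specialized to a homogeneous nearest-neighbour model with a common $\alpha$ and $\beta$ (our simplification of the general scheme above), together with a single-site Gibbs sampler as one simple way to generate such a field:

    import numpy as np

    def autologistic_conditional(x, i, j, alpha, beta):
        """P{X_ij = 1 | rest} for a homogeneous auto-logistic model with the
        four nearest neighbours (free boundaries)."""
        nbrs = 0
        for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            ni, nj = i + di, j + dj
            if 0 <= ni < x.shape[0] and 0 <= nj < x.shape[1]:
                nbrs += x[ni, nj]
        eta = alpha + beta * nbrs
        return np.exp(eta) / (1.0 + np.exp(eta))

    def gibbs_sample(shape=(30, 30), alpha=0.0, beta=0.5, sweeps=200, rng=None):
        """Approximate draw from the auto-logistic field via repeated
        single-site Gibbs updates."""
        rng = np.random.default_rng(rng)
        x = rng.integers(0, 2, size=shape)
        for _ in range(sweeps):
            for i in range(shape[0]):
                for j in range(shape[1]):
                    p = autologistic_conditional(x, i, j, alpha, beta)
                    x[i, j] = int(rng.random() < p)
        return x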
More generally, an auto-model is a special case of a Gibbs distribution. The description here is based on Preston (1974). Consider a random field where each site is empty (0) or occupied (1) by some entity. Let $\Lambda$ be a finite set of sites and $\Pi(\Lambda)$ be the set of all subsets of $\Lambda$. Let V be a function which maps $\Pi(\Lambda)$ to the real line. V is called a potential on $\Lambda$ if $V(\emptyset) = 0$. For any $A \in \Pi(\Lambda)$ we let $P(A) = \exp(V(A)) / \sum_{B \in \Pi(\Lambda)} \exp(V(B))$ be the probability that the occupied sites are precisely those at locations in A. This discrete probability measure P on $\Pi(\Lambda)$ is defined to be the Gibbs state (measure) with potential V. Actually, any discrete probability measure such that $P(A) > 0$ for all $A \in \Pi(\Lambda)$ can be taken as the Gibbs state for some potential function.
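For a very small site set the Gibbs state can be enumerated directly; the toy potential in the sketch below is our own example, chosen only to illustrate the normalization.

    import numpy as np
    from itertools import chain, combinations

    def gibbs_state(sites, potential):
        """Enumerate P(A) = exp(V(A)) / sum_B exp(V(B)) over all subsets A of
        a small finite site set, for a user-supplied potential V with
        V(empty set) = 0."""
        subsets = list(chain.from_iterable(combinations(sites, r)
                                           for r in range(len(sites) + 1)))
        weights = np.array([np.exp(potential(frozenset(A))) for A in subsets])
        probs = weights / weights.sum()
        return {frozenset(A): p for A, p in zip(subsets, probs)}

    # Toy potential: V(A) counts pairs of occupied sites that are lattice neighbours.
    sites = [(0, 0), (0, 1), (1, 0), (1, 1)]
    def V(A):
        return 0.8 * sum(1 for a in A for b in A
                         if a < b and abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1)
    print(sum(gibbs_state(sites, V).values()))   # ~ 1.0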
As in the time series case, we can quantify the dependence in a random field by means of a model-free mixing coefficient. We define:

$\alpha_{k,l}(r) = \sup\bigl\{\,|P(AB) - P(A)P(B)| :\ A \in \mathcal{F}(A_1),\ B \in \mathcal{F}(A_2),\ |A_1| \le k,\ |A_2| \le l,\ D(A_1,A_2) \ge r\,\bigr\},$

where $A_j \subset Z^2$ for $j=1,2$; $\mathcal{F}(A_j)$ is the $\sigma$-field generated by $\{X_i : i \in A_j\}$, and $D(\cdot,\cdot)$ is defined as follows. For $a = (a_1,a_2)$ and $b = (b_1,b_2) \in Z^2$, let $D_0(a,b) = \max_i |a_i - b_i|$ and for $A, B \subset Z^2$ let $D(A,B) = \inf\{D_0(a,b): a \in A,\ b \in B\}$. $D_0$ is the Manhattan or "city block" distance. This distance will be used in all that follows.
Many limit theorems have been proven for the sample mean from a stationary random field. For example, Bolthausen (1982) proves (here again $h_n \equiv 1$):

Theorem 2.4:
Let
(2.11)  $B_n \uparrow Z^2$ and $|\partial B_n| / |B_n| \to 0$, where $\partial B$ denotes the boundary of B.
If
(2.12)  $\sum_m m\,\alpha_{3,3}(m) < \infty$,
(2.13)  $\alpha_{1,\infty}(m) = o(m^{-2})$,
(2.14)  $E|X|^{2+\delta} < \infty$ for some $\delta > 0$,
and
(2.15)  $\sum_m m\,\alpha_{1,1}(m)^{\delta/(2+\delta)} < \infty$,
then
(2.16)  $\sigma^2 := \sum_{i \in Z^2} \mathrm{Cov}(X_0, X_i) < \infty$,
and if $\sigma^2 > 0$,
(2.17)  $t(B_n) \xrightarrow{d} N(0,\sigma^2)$.
(2.12) is a standard type of mixing condition and is analogous to the condition $\sum_m \alpha(m) < \infty$, which is required for many Central Limit Theorems in the time series case (for example, see I&L, Theorem 18.5.4). (2.14) and (2.15) show the "trade-off" between moment and mixing conditions. As $\delta$ decreases towards 0 the moment condition (2.14) becomes less restrictive and the mixing condition (2.15) becomes more restrictive.

Asymptotic normality of the sample mean has been proven under various conditions by Neaderhouser (1978) and Nahapetian (1980). Takahata (1983), Guyon and Richardson (1984) and Sunklodas (1986) have given rates of convergence.
It is interesting to note the differences between the setup for a time series (Theorem 2.3) and that for a random field (Theorem 2.4). In the random field case a standard requirement is that the number of sites on the boundary of the set $B_n$ is negligible compared to the total cardinality of the set [(2.11)]. We assume this automatically in the time series case because we exclusively consider "intervals" in Z, and for any interval $|\partial A| = 2$.

Another difference is: In the random field, the dependence between two sets of random variables (characterized by $\alpha_{k,l}(\cdot)$) is a function of not only the distance between the two sets, but also of each set's cardinality. In the time series setup cardinality is generally not accounted for. This is generally acceptable because there are many common processes which satisfy $\alpha$-mixing. For example, AR(1) processes with normal, double exponential or Cauchy errors (see Gastwirth and Rubin (1975)).
It appears that there is no consensus as to whether $\alpha_{\infty,\infty}(m) \to 0$ as $m \to \infty$ is an unrealistic assumption for random fields. Bolthausen states that "theorems based on $\alpha_{\infty,\infty}$ are quite useless for applications to Gibbs fields, as has been remarked by Dobrushin", and essentially the same statement is made by Goldie and Morrow (1986). Bolthausen is referring to Dobrushin's (1968) remark where a surprisingly simple, yet nontrivial, example is given of a random field such that $\varphi_{\infty,\infty}(m) \not\to 0$ as $m \to \infty$ (where $\varphi$ is the uniform mixing coefficient).

Neaderhouser, Takahata and many others do account for cardinality in their mixing conditions. However, Guyon and Richardson, Deo (1975) and Rosenblatt (1985) ignore cardinality in their mixing conditions. I account for cardinality in the mixing conditions used, as the methods of proof allow it and because mixing conditions which account for cardinality are less restrictive than those that ignore cardinality.
B) Asymptotic Normality for General Statistics

1) Independent and Identically Distributed Observations
Let $B_n = (1,2,\ldots,n)$, let $\{X_i\}$ be i.i.d., and for each n let $t(B_n) = T_{B_n}(X_i : i \in B_n)$, where $T_B(\cdot)$ depends on B only through $|B|$. Let $C_n$ be any ordered subset of $B_n$. Let ${}_AX = X\,I\{|X| < A\}$, ${}^AX = X - {}_AX$, and let E be a set of conditions on the sequence of sets of indices $\{m\}$; then $\lim_E x_m = x$ means $x_m$ converges to one common value x whenever E holds.
Hartigan (1975) proves the following general statistic analog to Loeve's Theorem 2.2:

Theorem 2.5:
(2.18)  $\{t(B_n),\ t(C_n)\} \xrightarrow{d} N_2(0,\,0,\,\sigma^2,\,\sigma^2,\,\sigma^2 p^{1/2})$ whenever $|C_n|/|B_n| \to p$
iff
(2.19)  $\lim_{A\to\infty}\lim_{n\to\infty} A^2\,P(|t(B_n)| \ge A) = 0$,
(2.20)  $\lim_{A\to\infty}\lim_{n\to\infty} |A\,E\,{}_At(B_n)| = 0$,
(2.21)  $\lim_{A\to\infty}\lim_{n\to\infty} \bigl|\,(|B_n|/|C_n|)^{1/2}\,E\{{}_At(B_n)\,{}_At(C_n)\} - \sigma^2\,\bigr| = 0$ whenever $|C_n|/|B_n| \to p \in (0,1]$.

Note that the result holds for one fixed constant $\sigma^2$ and for all $p \in [0,1]$. For a given p, however, the limiting bivariate normal distribution does not depend on a particular sequence $\{(B_n, C_n)\}$. Condition (2.19) controls the tails of the distribution, condition (2.20) centers the truncated statistic near 0, and condition (2.21) ensures that the statistic has "mean-like" covariance: i.e., the covariance is determined solely by the proportion of overlap between the two sets.
In addition to the above necessary and sufficient conditions for joint normality, Hartigan shows the following sufficiency result:

Theorem 2.6:
Let $E\,t(B_n) = 0$ for all n. If
(2.22)  $(|B_n|/|C_n|)^{1/2}\,\mathrm{Cov}\{t(B_n),\,t(C_n)\} \to \sigma^2$ whenever $|C_n|/|B_n| \to p \in (0,1]$,
then
(2.23)  $\{t(B_n),\ t(C_n)\} \xrightarrow{d} N_2(0,\,0,\,\sigma^2,\,\sigma^2,\,\sigma^2 p^{1/2})$
and
(2.24)  $\{t(B_n)\}$ is uniformly squared integrable.

Analogously to (2.21), (2.22) is a mean-like covariance condition on the statistic t. The reason for this terminology is that when t is the (standardized) sample mean, $(|B_n|/|C_n|)^{1/2}\,\mathrm{Cov}\{t(B_n),\,t(C_n)\} \equiv \sigma^2$ for all n. So here, as in Theorem 2.1, a covariance condition on the statistic yields both normality and uniform integrability.
Another method for obtaining asymptotic normality for a general statistic from an i.i.d. sequence uses the Hoeffding decomposition (called the ANOVA decomposition in Efron (1982)) of the statistic. The basic idea is to show that t is close to $\sum_i \psi(X_i)$, a sum of i.i.d. projections. The $\psi(X_i)$ terms then, being i.i.d., will obey the Central Limit Theorem. This projection method is a different analytical approach to asymptotic normality: the decomposition of t aids in obtaining sufficient conditions for univariate normality, while Hartigan's approach yields necessary and sufficient conditions for joint normality of $t(B_n)$ and $t(C_n)$. (This joint normality of a general statistic with its subsample values is very useful for studying subsampling procedures, as discussed in the introduction.) Also, the "projection" approach actually demands that we explicitly derive the projection of t. Hartigan's approach shows exactly which properties of a general statistic make it sufficiently "mean-like" to be asymptotically normal: namely, that it has mean-like covariance and sufficiently thin tails. Further, as we will see, Hartigan's approach works even under dependence.
2) Observations From a Stationary Sequence

Carlstein (1986b) extended Hartigan's result (Theorem 2.6) to the stationary case with $\alpha$-mixing. Now, however, the subsample $C_n$ is a set of consecutive indices contained in $B_n$. The reason for this, as discussed in the introduction, is that "blocks", or "subseries", of data allow the replicate $t(C_n)$ to have dependence structure similar to that of $t(B_n)$.

Carlstein's necessary and sufficient conditions are essentially the same as Hartigan's (Theorem 2.5). The corresponding sufficiency result becomes:

Theorem 2.7:
Let $E\,t(B_n) = 0$ for all n. If
$(|B_n|/|C_n|)^{1/2}\,\mathrm{Cov}\{t(B_n),\,t(C_n)\} \to \sigma^2$ whenever $|C_n|/|B_n| \to p \in (0,1]$,
then
$\{t(B_n),\ t(C_n)\} \xrightarrow{d} N_2(0,\,0,\,\sigma^2,\,\sigma^2,\,\sigma^2 p^{1/2})$
iff
$\{t(B_n)\}$ is uniformly squared integrable.
So the differences between Theorem 2.6 and Theorem 2.7 mirror those between Theorem
2.1 and Theorem 2.3. In the general statistic results {2.6 and 2.7} and the sample mean results
{2.1 and 2.3}, uniform integrability becomes an additional necessary condition for asymptotic
normality in the dependent cases {2.3 and 2.7}, rather than being a consequence of the
convergence of covariances as in the independent cases {2.1 and 2.6}.
3) Observations from a Stationary Random Field

There are two paths we have followed thus far. In (II.A) we traced the evolution of asymptotic normality for the sample mean from the i.i.d. setup to the random field case. In (II.B) we followed the results for asymptotic normality for a general statistic from the case of i.i.d. observations to the case of observations from a stationary time series. Thus, two questions naturally arise: "Can we extend results for the sample mean in a random field to more general statistics?" and "Can we extend the normality results for general statistics from an i.i.d. or stationary sequence to the random field setup?" The work in Section III answers these two questions.
III. ASYMPTOTIC NORMALITY FOR A GENERAL STATISTIC
FROM A MIXING RANDOM FIELD
A) Necessary and Sufficient Conditions For Asymptotic Normality

As discussed previously (following Theorem 2.4), there are some restrictions on the class of "shapes" in $Z^2$ on which a statistic t can be computed. Initially, we consider the class of rectangles which are n by $\gamma n^{\beta}$ for some $\gamma, \beta \in (0,1]$ (and those approximated by such rectangles). If $\gamma = \beta = 1$ then our data resides inside a square. For $\beta < 1$, however, we allow the dimensions of the rectangle to grow at different rates. The main result of this section gives necessary and sufficient conditions for t computed on such a rectangle (and a subrectangle) to be asymptotically (jointly) normal. This result is analogous to Theorem 2.5 in the i.i.d. case. Corollary 3.1 gives a simpler set of sufficient conditions analogous to Theorems 2.6 and 2.7 in the i.i.d. and stationary cases.
Let $\{X_i : i \in Z^2\}$ be a strictly stationary random field and let $t_{i,j}(n,m) \equiv t(B^{n,m}_{i,j}) \equiv T_{B^{n,m}_{i,j}}(X_u : u \in B^{n,m}_{i,j})$, where $B^{n,m}_{i,j} = \{(i+1,j+1), (i+1,j+2),\ldots,(i+1,j+m), (i+2,j+1),\ldots,(i+n,j+1),\ldots,(i+n,j+m)\}$, i.e., the $n \times m$ block of data with lower left corner at (i,j), and where $T_{B^{n,m}_{i,j}}(\cdot)$ depends on $B^{n,m}_{i,j}$ only through (n,m). We assume the following mixing condition holds:

(3.1)    $\alpha_{k,k}(r) \le C\,k\,r^{-(\beta_0+1)/\beta_0}$

for some fixed $\beta_0 \in (0,1)$ {$\alpha_{k,l}(r)$ is as defined in Section II.A.3}. Condition (3.1) can be interpreted as follows: For a fixed distance r, as the cardinality k increases we allow dependence (as characterized by $\alpha_{k,k}(r)$) to increase at a rate controlled by k. Note that this mixing condition does account for cardinality, as discussed in Section (II.A.3). As r increases we require $\alpha_{k,k}(r)$ to decrease at a polynomial rate in r. As can be seen from the definition of $E_1$ in Theorem 3.1, the mixing rate ($\beta_0$) is related to the index sets on which we compute the statistic t. Specifically, for small values of $\beta_0$ our random field is close to independence, and hence we expect that normality can be obtained when t is computed on more "irregular" shapes (note that this turns out to be the case by observing the tradeoff between condition (3.1) and the joint normality (IV) in Theorem 3.1).
Following the terminology from Hartigan (1975) we say that a statistic t is central with variance $\sigma^2$ if the following three conditions are satisfied:

(I)    $\lim_{A\to\infty}\ \lim_{E_1}\ A^2\,P\bigl(|t(B^{f(p),g(p)}_{0,0})| \ge A\bigr) = 0$,

(II)   $\lim_{A\to\infty}\ \lim_{E_1}\ \bigl|A\,E\,{}_At(B^{f(p),g(p)}_{0,0})\bigr| = 0$,

(III)  $\lim_{A\to\infty}\ \lim_{E_2}\ \bigl|E\{{}_At(B^{f_1(p),g_1(p)}_{0,0})\,{}_At(B^{f_2(p),g_2(p)}_{i_p,j_p})\} - \rho\,\sigma^2\bigr| = 0$,

where

$E_1 = \{\,p\to\infty,\ g(p)\to\infty,\ f^{\beta}(p)/g(p) \to r(\beta,f,g) \in (0,\infty),\ \beta \in (\beta_0,1]\,\}$,

$E_2 = \{\,p\to\infty,\ g_2(p)\to\infty,\ f_i^{\beta}(p)/g_i(p) \to r(\beta,f_i,g_i) \in (0,\infty)$ for $i=1,2$, $f_2(p) \le i_p + f_2(p) \le f_1(p)$, $g_2(p) \le j_p + g_2(p) \le g_1(p)$, $f_2(p)/f_1(p) \to p_* \in [0,1]$, $\rho^2 = p_*^{\beta+1}\,r(\beta,f_1,g_1)/r(\beta,f_2,g_2)$, $\beta \in (\beta_0,1]\,\}$.

Theorem 3.1:
(IV)  $\{t_{0,0}(f_1(p),g_1(p)),\ t_{i_p,j_p}(f_2(p),g_2(p))\} \xrightarrow{d} N_2(0,\,0,\,\sigma^2,\,\sigma^2,\,\sigma^2\rho)$ whenever $E_2$ holds
iff
t is central with variance $\sigma^2$.
Condition $E_1$ says that the dimensions of the rectangle grow at specified rates. $E_2$ adds joint conditions on a rectangle and a subrectangle. The conditions, analogous to the i.i.d. setup (Theorem 2.5), are that the subrectangle is completely contained in the larger rectangle and that the asymptotic covariance between t computed on the full rectangle and the subrectangle depends only on the limiting proportion of shared observations. To see this, note that $\rho^2 = \lim_{E_2} f_2(p)\,g_2(p)\,/\,f_1(p)\,g_1(p)$.

It is necessary to include the rectangular shapes in $E_1$ in order to obtain necessary and sufficient conditions for joint normality. This is because there is not a one-to-one correspondence between a shape on which t is normal and on which t is central. For example, no result like Theorem 3.1 can be obtained on the class of squares. The reason is that in order to obtain joint normality of t computed on a square and a subsquare it is necessary to have centrality of t when computed on a class of (nonsquare) rectangles. Thus, the class of shapes in $E_1$ is one where the shapes needed for centrality of t and normality of t "match up".

By including more shapes in $E_1$ we obtain normality for a larger class of shapes. However, this implies that t must be central on a larger class of shapes. Thus, more is gained but more is assumed. For example, it is true that centrality holds for a fixed $\beta \in (\beta_0,1]$ if and only if joint normality holds for this fixed $\beta$. It is not clear whether this result would be preferable to Theorem 3.1, which assumes centrality for all $\beta \in (\beta_0,1]$ but yields normality for all $\beta \in (\beta_0,1]$. The goal here is to find the largest possible class of shapes on which t will be normal if and only if t is central. Thus, Theorem 3.1 is the preferred statement.

Three examples of central statistics will be given in Section III.B.
It is assumed that f, g, $f_i$ and $g_i$ map N into N and that $i_p$ and $j_p$ are in N. Fix $\beta \in (\beta_0,1]$. The proof begins by showing that I, II and III (for this fixed $\beta$ only) imply asymptotic univariate normality of $t_{0,0}(n, \lfloor\gamma n^{\beta}\rfloor)$ for any $\gamma \in (0,\infty)$. Then it will be shown that any f and g satisfying $f^{\beta}(p)/g(p) \to 1/\gamma$ yield a statistic that is "close" to this t, to obtain $t_{0,0}(f(n),g(n)) \xrightarrow{d} N(0,\sigma^2)$.

To prove the asymptotic normality of $t_{0,0}(n, \lfloor\gamma n^{\beta}\rfloor)$ we use the following argument: we obtain replicates of t by breaking the n by $\lfloor\gamma n^{\beta}\rfloor$ rectangle up into $k\lceil k^{\beta}\rceil$ nonoverlapping blocks, each of dimension p by $\lceil\gamma p^{\beta}\rceil$ and separated by "strips" of width q. We then approximate the distribution of $t_{0,0}(n, \lfloor\gamma n^{\beta}\rfloor)$ by that of

$\sum_{i=1}^{k}\sum_{j=1}^{\lceil k^{\beta}\rceil} t_{(i-1)(p+q),\,(j-1)(\lceil\gamma p^{\beta}\rceil+q)}\bigl(p,\ \lceil\gamma p^{\beta}\rceil\bigr)\ \big/\ (k\lceil k^{\beta}\rceil)^{1/2},$

where $q \equiv q_p$ is such that $q/p^{\beta} \to 0$ and $n \approx k(p+q)$. The setup is sketched below:
Figure 3.1: Layout for the Proof of Asymptotic Univariate Normality
[Figure: the $n \times \lfloor\gamma n^{\beta}\rfloor$ rectangle partitioned into $p \times \lceil\gamma p^{\beta}\rceil$ blocks separated by strips of width q.]
Let $t(\mathfrak{B}_{i,j}) = t_{(i-1)(p+q),\,(j-1)(\lceil\gamma p^{\beta}\rceil+q)}(p,\ \lceil\gamma p^{\beta}\rceil)$. If $p \to \infty$ then each $t(\mathfrak{B}_{i,j})$ will behave like $t_{0,0}(n, \lfloor\gamma n^{\beta}\rfloor)$, and if $q \to \infty$ the $k\lceil k^{\beta}\rceil$ summands will become approximately independent and hence will follow the Central Limit Theorem as $k \to \infty$. If $q/p^{\beta} \to 0$ then the omitted strips of data will be negligible. Finally, $t_{0,0}(n, \lfloor\gamma n^{\beta}\rfloor)$ will be close to $\sum_{i=1}^{k}\sum_{j=1}^{\lceil k^{\beta}\rceil} t(\mathfrak{B}_{i,j}) / (k\lceil k^{\beta}\rceil)^{1/2}$ because of the mean-like covariance structure.

This argument is a version of Bernstein's (1927) "big-block, little-block" argument for approximating a sum of dependent random variables by a sum of independent ones in the time series case. However, as indicated here, this argument can be applied to general statistics computed from a random field as well.
Specifically, first assume that $\sigma^2 \in (0,\infty)$. Then without loss of generality we can take $\sigma^2 = 1$. Fix $\gamma$. Let $p \ge 1$ and let $k \ge 1$ be fixed but arbitrary. Let $t_{p,k} = t_{0,0}\bigl(k(p+q_p),\ \lceil k^{\beta}\rceil(\lceil\gamma p^{\beta}\rceil+q_p)\bigr)$, where $q_p = \lceil p^{\delta}\rceil$ and $\frac{\beta_0(\beta+1)}{\beta_0+1} < \delta < \beta$. We will show first that:

(3.4)    $d := \lim_{A\to\infty}\lim_{p\to\infty} \mathrm{Var}\Bigl\{{}_At_{p,k} - \sum_{i=1}^{k}\sum_{j=1}^{\lceil k^{\beta}\rceil} {}_At(\mathfrak{B}_{i,j}) \big/ (k\lceil k^{\beta}\rceil)^{1/2}\Bigr\} = 0.$

Expanding this variance gives seven types of terms: (1) $\lim_A\lim_p |E\{{}_At_{p,k}\}^2 - 1|$; (2) the analogous block-variance terms $\lim_A\lim_p |E\{{}_At(\mathfrak{B}_{i,j})\}^2 - 1|$, each weighted by $1/(k\lceil k^{\beta}\rceil)$; (3) and (4) centering terms involving $(E\,{}_At_{p,k})^2$ and $(E\,{}_At(\mathfrak{B}_{i,j}))^2$; (5) the cross terms $(2/(k\lceil k^{\beta}\rceil)^{1/2})\sum_i\sum_j\lim_A\lim_p|E\,{}_At_{p,k}\,{}_At(\mathfrak{B}_{i,j}) - 1/(k\lceil k^{\beta}\rceil)^{1/2}|$; (6) products of means, $\lim_A\lim_p|E\,{}_At_{p,k}|\,|E\,{}_At(\mathfrak{B}_{1,1})|$, weighted by $2(k\lceil k^{\beta}\rceil)^{1/2}$; and (7) the covariances between distinct block replicates.

The first term is equal to 0 by III $\{f_1 = f_2 = k(p+q_p) \Rightarrow p_* = 1$, $g_1 = g_2 = \lceil k^{\beta}\rceil(\lceil\gamma p^{\beta}\rceil+q_p)$, $f_i^{\beta}/g_i = k^{\beta}(p+q_p)^{\beta}/\lceil k^{\beta}\rceil(\lceil\gamma p^{\beta}\rceil+q_p) \to k^{\beta}/\gamma\lceil k^{\beta}\rceil$ for $i=1,2$ [using the fact that $q_p = o(p^{\beta})$]$\}$. Similarly the second term equals 0 $\{f_1 = f_2 = p \Rightarrow p_* = 1$, $g_1 = g_2 = \lceil\gamma p^{\beta}\rceil$, $f_i^{\beta}/g_i = p^{\beta}/\lceil\gamma p^{\beta}\rceil \to 1/\gamma$ for $i=1,2\}$.

The fifth term equals 0 using III $\{f_1 = k(p+q_p)$, $f_2 = p$, $g_1 = \lceil k^{\beta}\rceil(\lceil\gamma p^{\beta}\rceil+q_p)$, $g_2 = \lceil\gamma p^{\beta}\rceil \Rightarrow p_* = 1/k$, $f_1^{\beta}/g_1 \to k^{\beta}/\gamma\lceil k^{\beta}\rceil$, $f_2^{\beta}/g_2 \to 1/\gamma$, and so $\rho^2 = (1/k)^{\beta+1}\,k^{\beta}\gamma/\gamma\lceil k^{\beta}\rceil = 1/k\lceil k^{\beta}\rceil\}$.

For the sixth term note that $\lim_{A\to\infty}\lim_{p\to\infty}|E\,{}_At_{p,k}|\,|E\,{}_At(\mathfrak{B}_{1,1})| \le \lim_{A\to\infty}\lim_{p\to\infty}|A|\,|E\,{}_At_{p,k}| = 0$ by II. Similarly the third and fourth terms equal 0.

For the seventh term we need the following Lemma, which is a generalization of I&L, Theorem 17.2.1.
Lemma 3.1:
Let $A_1$ and $A_2$ be two sets of indices in $Z^2$ such that $|A_1| \le k$, $|A_2| \le l$ and $D(A_1,A_2) \ge r$. Let X and Y be two random variables which are measurable with respect to $\mathcal{F}(A_1)$ and $\mathcal{F}(A_2)$, respectively. Further assume that $|X| \le C_1$ and $|Y| \le C_2$. Then:

$|\mathrm{Cov}(X,Y)| \le 4\,C_1 C_2\,\alpha_{k,l}(r).$

Proof:
$|EXY - EX\,EY| = |E\{X[E(Y\mid\mathcal{F}(A_1)) - EY]\}| \le C_1\,E|E(Y\mid\mathcal{F}(A_1)) - EY| = C_1\,E\{\xi[E(Y\mid\mathcal{F}(A_1)) - EY]\}$ (where $\xi = \mathrm{sgn}[E(Y\mid\mathcal{F}(A_1)) - EY]$). The last term equals $C_1|E\xi Y - E\xi\,EY|$ because $\xi$ is $\mathcal{F}(A_1)$-measurable.

$|E\xi Y - E\xi\,EY| = |E\{Y[E(\xi\mid\mathcal{F}(A_2)) - E\xi]\}| \le C_2\,E|E(\xi\mid\mathcal{F}(A_2)) - E\xi| = C_2\,E\{\eta[E(\xi\mid\mathcal{F}(A_2)) - E\xi]\}$ (where $\eta = \mathrm{sgn}[E(\xi\mid\mathcal{F}(A_2)) - E\xi]$). The last term equals $C_2|E\xi\eta - E\xi\,E\eta|$ as $\eta$ is $\mathcal{F}(A_2)$-measurable, so we have:

(3.2)    $|EXY - EX\,EY| \le C_1 C_2\,|E\xi\eta - E\xi\,E\eta|.$

Now, let $A = \{\xi = 1\}$ and $B = \{\eta = 1\}$, which are $\mathcal{F}(A_1)$-measurable and $\mathcal{F}(A_2)$-measurable, respectively. Then:

$|E\xi\eta - E\xi\,E\eta| = |P(AB) - P(AB^c) - P(A^cB) + P(A^cB^c) - P(A)P(B) + P(A)P(B^c) + P(A^c)P(B) - P(A^c)P(B^c)| \le |P(AB) - P(A)P(B)| + |P(AB^c) - P(A)P(B^c)| + |P(A^cB) - P(A^c)P(B)| + |P(A^cB^c) - P(A^c)P(B^c)| \le 4\,\alpha_{k,l}(r).$

Combining this with (3.2) proves Lemma 3.1.
Using the lemma on the seventh term above we see that:

(3.3)    $|\mathrm{Cov}\{{}_At(\mathfrak{B}_{i,j}),\,{}_At(\mathfrak{B}_{l,m})\}| \le 4A^2\,\alpha_{p\lceil\gamma p^{\beta}\rceil,\,p\lceil\gamma p^{\beta}\rceil}(q_p),$

and by (3.1) the R.H.S. in the above inequality is no greater than $C_1\,A^2\,p^{\beta+1}\,p^{-\delta(\beta_0+1)/\beta_0}$. So $\lim_{A\to\infty}\lim_{p\to\infty}|\mathrm{Cov}\{{}_At(\mathfrak{B}_{i,j}),\,{}_At(\mathfrak{B}_{l,m})\}| \le C_2\,\lim_{A\to\infty}A^2\,\lim_{p\to\infty}p^{\beta+1-\delta(\beta_0+1)/\beta_0} = 0$ by the definition of $\delta$. This implies that the seventh term equals 0 as each summand is 0 and k is fixed, and hence $d = 0$.
We now need the following lemma:

Lemma 3.2:
Let $V_{A,p,k} \ge 0$ for all A, p, k and suppose that $\lim_{A\to\infty}\lim_{p\to\infty}V_{A,p,k} = 0$ for all $k \ge 1$. Then given any $\{b_k \to \infty\}$ there exists $\{A_k \uparrow \infty\}$ such that $A_k \le b_k$ for all k, and there exists $\{\tilde p_k \uparrow \infty\}$ such that $\lim_{k\to\infty}V_{A_k,p_k,k} = 0$ whenever $p_k \ge \tilde p_k$ for all k.

Now note that:

(3.5)    $P\bigl\{\bigl|t_{p,k} - \sum_{i=1}^{k}\sum_{j=1}^{\lceil k^{\beta}\rceil} t(\mathfrak{B}_{i,j})/(k\lceil k^{\beta}\rceil)^{1/2}\bigr| > \epsilon\bigr\} \le P\bigl\{\bigl|{}_At_{p,k} - \sum_i\sum_j {}_At(\mathfrak{B}_{i,j})/(k\lceil k^{\beta}\rceil)^{1/2}\bigr| > \epsilon/2\bigr\} + P\bigl[|{}^At_{p,k}| > \epsilon/4\bigr] + P\bigl[\bigl|\sum_i\sum_j {}^At(\mathfrak{B}_{i,j})/(k\lceil k^{\beta}\rceil)^{1/2}\bigr| > \epsilon/4\bigr]$

(3.6)    $\le (4/\epsilon^2)\,E\bigl\{{}_At_{p,k} - \sum_i\sum_j {}_At(\mathfrak{B}_{i,j})/(k\lceil k^{\beta}\rceil)^{1/2}\bigr\}^2 + P\bigl[|{}^At_{p,k}| > \epsilon/4\bigr] + P\bigl[\bigl|\sum_i\sum_j {}^At(\mathfrak{B}_{i,j})/(k\lceil k^{\beta}\rceil)^{1/2}\bigr| > \epsilon/4\bigr] =: (4/\epsilon^2)\,e(A,p,k) + \mathcal{D}(A,p,k) + A^2 S(A,p,k).$

We know that $\lim_{A\to\infty}\lim_{p\to\infty}e(A,p,k) = 0$ for all $k \ge 1$ by (3.4), and the remaining two terms vanish by I. This implies that

$\lim_{A\to\infty}\lim_{p\to\infty}\bigl[(4/\epsilon^2)\,e(A,p,k) + \mathcal{D}(A,p,k) + A^2 S(A,p,k)\bigr] =: \lim_{A\to\infty}\lim_{p\to\infty}\xi(A,p,k) = 0$

for all $k \ge 1$ and all $0 < \epsilon < 2$.

Now applying Lemma 3.2 we have that there exists $A_k \uparrow \infty$ such that $A_k \le (k\lceil k^{\beta}\rceil)^{1/2}$ for all k, and there exists $p_k^{(1)} \uparrow \infty$ such that $\lim_{k\to\infty}\xi(A_k,p_k,k) = 0$ when $p_k \ge p_k^{(1)}$ for all k. Setting $p = p_k$ ($p_k \ge p_k^{(1)}$) and $A = A_k$ in (3.5) and (3.6) and using the established inequality we have:

(3.7)    $t_{p_k,k} - \sum_{i=1}^{k}\sum_{j=1}^{\lceil k^{\beta}\rceil} t(\mathfrak{B}_{i,j})/(k\lceil k^{\beta}\rceil)^{1/2} \xrightarrow{P} 0$ as $k \to \infty$.
The next step is to prove that the distribution of $\sum_{i=1}^{k}\sum_{j=1}^{\lceil k^{\beta}\rceil} t(\mathfrak{B}_{i,j})/(k\lceil k^{\beta}\rceil)^{1/2}$ can be approximated by that of the corresponding independent sum.

Let $\{t^*(\mathfrak{B}_{i,j}): i=1,\ldots,k;\ j=1,\ldots,\lceil k^{\beta}\rceil\}$ have the same marginal distributions as $\{t(\mathfrak{B}_{i,j}): i=1,\ldots,k;\ j=1,\ldots,\lceil k^{\beta}\rceil\}$ but so that the $t^*$'s are independent. For fixed s let $Z_{i,j,k} = e^{\,\mathrm{i}\,s\,t(\mathfrak{B}_{i,j})/(k\lceil k^{\beta}\rceil)^{1/2}}$, and finally let $\phi_k(s)$ and $\phi_k^*(s)$ denote the characteristic functions of the dependent and independent sums, respectively. It will be shown that $\lim_{k\to\infty}|\phi_k(s) - \phi_k^*(s)| = 0$ when $p_k \ge$ (some) $p_k^{(2)} \uparrow \infty$.

For ease of notation, let $Y_1 = Z_{1,1,k}$, $Y_2 = Z_{1,2,k}$, ..., $Y_{\lceil k^{\beta}\rceil} = Z_{1,\lceil k^{\beta}\rceil,k}$, $Y_{\lceil k^{\beta}\rceil+1} = Z_{2,1,k}$, ..., $Y_{k\lceil k^{\beta}\rceil} = Z_{k,\lceil k^{\beta}\rceil,k}$. Then

$|\phi_k(s) - \phi_k^*(s)| = \Bigl|E\prod_{i,j} Z_{i,j,k} - (EZ_{1,1,k})^{k\lceil k^{\beta}\rceil}\Bigr| = \Bigl|E\prod_{i=1}^{k\lceil k^{\beta}\rceil} Y_i - (EY_1)^{k\lceil k^{\beta}\rceil}\Bigr| \le \Bigl|E\prod_{i=1}^{k\lceil k^{\beta}\rceil} Y_i - E\prod_{i=2}^{k\lceil k^{\beta}\rceil} Y_i\,(EY_1)\Bigr| + \Bigl|E\prod_{i=2}^{k\lceil k^{\beta}\rceil} Y_i\,(EY_1) - E\prod_{i=3}^{k\lceil k^{\beta}\rceil} Y_i\,(EY_1)^2\Bigr| + \cdots,$

using the fact that $|EY_1|^l \le 1$ for all $l \in N$.
Next, we need a corollary to Lemma 3.1:
Lemma 3.3:
If X and Yare complex-valued random variables then Lemma 3.1 holds with the
constant 16 replacing 4.
kr k.81
kr k.81
kr k.81
Lemma 3.3 implies that lEn Vi-En Yi(EY1)1= I Cov( n
Yi,Ym)
i=m
i=m+l
i=m+1
I
~
1 ~ m ~ krk.81- 1, and hence that l<Pk(s)-<PZ(s)1 ~ krk.8116a r.81 r .81 r.81 r .81(qp)·
k k Pk -YPk ' k k Pk -YPk
k
2
Now, using the same argument as in (3.3) we see that k rk.81a r .81 r .81 r.81 r .81(Qp)-+ 0
k k P -YP ,k k P-YP
as p-+oo. We now need the following:
Lemma 3.4:
If P-+OO
lim Lk, p=O for each fixed k, then 3 PkToo such that k-+oo
lim L k• P k =0 when Pk 2:
Pk'
So the sum with dependent summands can be well approximated by that with independent ones.
We now need the following Lemma from Loeve, Sec. 23.5:
Lemma 3.5:
Let {Xi,j,k: 1 ~ i ~ k, 1 ~ j ~ rk 131} be LLd. random variables.
20
if V £ > 0 and for some r
and
>0
(1)
krk131P[lX1,I,kl ~ £]-0 as k-oo,
(2)
krkI31EX1,I,kl{IX1,I,kr < r}-O as k-oo,
(3)
krkI31EXtl,kl{IX1,I,kl
< r}-1 as k-oo.
We will verify that (1), (2) and (3) hold when $X_{i,j,k} = t^*(\mathfrak{B}_{i,j})/(k\lceil k^{\beta}\rceil)^{1/2}$. First we need the following:

Lemma 3.6:
Suppose that $V_{A,p} \ge 0$ for all A, p, and that $\lim_{A\to\infty}\lim_{p\to\infty}V_{A,p} = 0$. Then given any $\{A_k \uparrow \infty\}$ there exists $\{\tilde p_k \uparrow \infty\}$ such that $\lim_{k\to\infty}V_{A_k,p_k} = 0$ when $p_k \ge \tilde p_k$ for all k.

Consider condition (2) first. We know that $\lim_{A\to\infty}\lim_{p\to\infty}|A\,E\,{}_At^*(\mathfrak{B}_{1,1})| = 0$ by II. Let $A_k = \tau(k\lceil k^{\beta}\rceil)^{1/2}$. Then $A_k \uparrow \infty$ and by Lemma 3.6 we see that there is a sequence $p_k^{(3)} \uparrow \infty$ such that $\lim_{k\to\infty}|A_k\,E\,{}_{A_k}t^*(\mathfrak{B}_{1,1})| = 0$ when $p_k \ge p_k^{(3)}$. Substituting in for $A_k$, this yields (2) with $X_{1,1,k} = t^*(\mathfrak{B}_{1,1})/(k\lceil k^{\beta}\rceil)^{1/2}$. So (2) holds when $p_k \ge p_k^{(3)}$.

To check that (3) holds, note that $\lim_{A\to\infty}\lim_{p\to\infty}|E\{{}_At^*(\mathfrak{B}_{1,1})\}^2 - 1| = 0$ by III. By Lemma 3.6 we see that there is a sequence $p_k^{(4)} \uparrow \infty$ such that $\lim_{k\to\infty}|E\{{}_{A_k}t^*(\mathfrak{B}_{1,1})\}^2 - 1| = 0$ when $p_k \ge p_k^{(4)}$. Substituting in for $A_k$ yields (3). So (3) holds when $p_k \ge p_k^{(4)}$.

It would be nice if we could use a similar argument for (1). However, (1) must hold for every $\epsilon > 0$. The problem is that $p_k$ depends on $\tau$. The method of verification for (2) and (3) would require a new $p_k$ (i.e., a new set of summands) for each new $\epsilon$. This would make the problem "ill-defined". To show that (1) holds we need the following two Lemmas:
Lemma 3.7a:
Let $C(\cdot): (0,\infty) \to [0,\infty)$ be a function such that $C(A) \to 0$ as $A\to\infty$. Then there exists $Q(A)$ which is continuous on $R^+$ and nondecreasing to $\infty$ such that $Q(A)\,C(A) \to 0$ as $A\to\infty$.

Lemma 3.7b:
Suppose that $V_{p,k} \ge 0$ for all p, k, and that $\lim_{k\to\infty}\lim_{p\to\infty}V_{p,k} = 0$. Then there exists $\{\tilde p_k \uparrow \infty\}$ such that $\lim_{k\to\infty}V_{p_k,k} = 0$ when $p_k \ge \tilde p_k$ for all k.

Let $C(A) = \lim_{p\to\infty} A^2\,P[|t^*(\mathfrak{B}_{1,1})| \ge A]$. Then $C(A) \to 0$ as $A\to\infty$, and using Lemma 3.7a there is a $Q(A)$ nondecreasing to $\infty$ such that $\lim_{A\to\infty}\lim_{p\to\infty}Q(A)\,A^2\,P[|t^*(\mathfrak{B}_{1,1})| \ge A] = 0$. Define $N(A) = Q(A)A^2$. Note that N(A) is continuous and strictly increasing. We can define $\epsilon_{N(A)}$ to be $1/Q(A)^{1/2}$. Then $\epsilon_{N(A)}$ is nonincreasing to 0 and we now have that:

$\lim_{A\to\infty}\lim_{p\to\infty}N(A)\,P\bigl[|t^*(\mathfrak{B}_{1,1})| \ge \epsilon_{N(A)}\,N(A)^{1/2}\bigr] = 0.$

Since N(A) is continuously increasing to $\infty$, this implies that $\lim_{k\to\infty}\lim_{p\to\infty}k\lceil k^{\beta}\rceil\,P\bigl[|t^*(\mathfrak{B}_{1,1})| \ge \epsilon_{k\lceil k^{\beta}\rceil}(k\lceil k^{\beta}\rceil)^{1/2}\bigr] = 0$, as $\{k\lceil k^{\beta}\rceil: k \in Z^+,\ k > K_0\} \subset \{N(A): A \in R^+\}$. This implies, by Lemma 3.7b, the existence of a sequence $p_k^{(5)} \uparrow \infty$ such that $\lim_{k\to\infty}k\lceil k^{\beta}\rceil\,P\bigl[|t^*(\mathfrak{B}_{1,1})| \ge \epsilon_{k\lceil k^{\beta}\rceil}(k\lceil k^{\beta}\rceil)^{1/2}\bigr] = 0$ when $p_k \ge p_k^{(5)}$. Now for a given $\epsilon > 0$, $\epsilon_{k\lceil k^{\beta}\rceil} < \epsilon$ when $k > K(\epsilon)$. This implies that $\lim_{k\to\infty}k\lceil k^{\beta}\rceil\,P\bigl[|t^*(\mathfrak{B}_{1,1})| \ge \epsilon(k\lceil k^{\beta}\rceil)^{1/2}\bigr] \le \lim_{k\to\infty}k\lceil k^{\beta}\rceil\,P\bigl[|t^*(\mathfrak{B}_{1,1})| \ge \epsilon_{k\lceil k^{\beta}\rceil}(k\lceil k^{\beta}\rceil)^{1/2}\bigr] = 0$ when $p_k \ge p_k^{(5)}$. So (1) holds. Loeve's three conditions are satisfied, so we have that

$\sum_{i=1}^{k}\sum_{j=1}^{\lceil k^{\beta}\rceil} t^*(\mathfrak{B}_{i,j})/(k\lceil k^{\beta}\rceil)^{1/2} \xrightarrow{d} N(0,1),$

and hence by (3.7) that:

(3.8)    $t_{p_k,k} \xrightarrow{d} N(0,1)$ whenever $p_k \ge \pi_k$, where $\pi_k := \max_{i=1,\ldots,5} p_k^{(i)}$.
It now remains to be shown that $t_{0,0}(n, \lfloor\gamma n^{\beta}\rfloor)$ is close to $t_{p_k,k}$. We will show that for any subsequence $\{n_j\}$ there is a further subsequence $\{n_{j_k}\}$ such that $t_{0,0}(n_{j_k}, \lfloor\gamma n_{j_k}^{\beta}\rfloor) \xrightarrow{d} N(0,1)$ as $k\to\infty$.

Let $\{n_j\}$ be given. Define, for fixed k, $n_{j_k}$ so that $n_{j_k} \ge k(\pi_k + q_{\pi_k})$, and next define $p_k$ so that $n_{j_k}/k \le p_k < n_{j_k}/k + 1$. Then $p_k \ge \pi_k$ and hence $t_{p_k,k} \xrightarrow{d} N(0,1)$ in this case.

We now proceed as follows: Let $\epsilon > 0$ be given; then for any A we have:

$P\bigl[|t_{0,0}(n_{j_k}, \lfloor\gamma n_{j_k}^{\beta}\rfloor) - t_{p_k,k}| > \epsilon\bigr] \le P\bigl[|{}_At_{0,0}(n_{j_k}, \lfloor\gamma n_{j_k}^{\beta}\rfloor) - {}_At_{p_k,k}| > \epsilon/2\bigr] + P\bigl[|t_{p_k,k}| \ge A\bigr] + P\bigl[|t_{0,0}(n_{j_k}, \lfloor\gamma n_{j_k}^{\beta}\rfloor)| \ge A\bigr].$

Now $\lim_{A\to\infty}\lim_{k\to\infty}P[|t_{p_k,k}| \ge A] = 0$ by I. Similarly, $\lim_{A\to\infty}\lim_{k\to\infty}P[|t_{0,0}(n_{j_k}, \lfloor\gamma n_{j_k}^{\beta}\rfloor)| \ge A] = 0$ since $n_{j_k}\to\infty$ and $n_{j_k}^{\beta}/\lfloor\gamma n_{j_k}^{\beta}\rfloor \to 1/\gamma$. Finally, by Chebyshev's inequality the first term is controlled by:

(3.9)    $\lim_{A\to\infty}\lim_{k\to\infty}E\bigl\{{}_At_{0,0}(n_{j_k}, \lfloor\gamma n_{j_k}^{\beta}\rfloor) - {}_At_{p_k,k}\bigr\}^2 \le \lim_{A\to\infty}\lim_{k\to\infty}\bigl|E\{{}_At_{0,0}(n_{j_k}, \lfloor\gamma n_{j_k}^{\beta}\rfloor)\}^2 - 1\bigr| + \lim_{A\to\infty}\lim_{k\to\infty}\bigl|E\{{}_At_{p_k,k}\}^2 - 1\bigr| + 2\lim_{A\to\infty}\lim_{k\to\infty}\bigl|E\,{}_At_{0,0}(n_{j_k}, \lfloor\gamma n_{j_k}^{\beta}\rfloor)\,{}_At_{p_k,k} - 1\bigr|.$

The first term in (3.9) is zero by III $\{f_1 = f_2 = n_{j_k} \Rightarrow p_* = 1$, $f_i^{\beta}/g_i = n_{j_k}^{\beta}/\lfloor\gamma n_{j_k}^{\beta}\rfloor \Rightarrow \rho = 1\}$. Similarly the second term is zero by III $\{f_1 = f_2 = k(p_k + q_{p_k}) \Rightarrow p_* = 1$, $f_i^{\beta}/g_i = k^{\beta}(p_k+q_{p_k})^{\beta}/\lceil k^{\beta}\rceil(\lceil\gamma p_k^{\beta}\rceil + q_{p_k}) \to 1/\gamma\}$.

For the third term we need to show that III applies with $\rho = 1$. Here $f_1 = k(p_k+q_{p_k})$, $f_2 = n_{j_k}$, $g_1 = \lceil k^{\beta}\rceil(\lceil\gamma p_k^{\beta}\rceil + q_{p_k})$, $g_2 = \lfloor\gamma n_{j_k}^{\beta}\rfloor$, $i_k = j_k = 0$. Recall that $n_{j_k}/k \le p_k < n_{j_k}/k + 1$. This implies that $n_{j_k} \le k(p_k + q_{p_k})$, so that $f_2(k) \le i_k + f_2(k) \le f_1(k)$ and $g_2(k) \le j_k + g_2(k) \le g_1(k)$. It has already been shown that $f_i^{\beta}/g_i \to 1/\gamma$ for $i = 1,2$, so it only remains to be shown that $f_2/f_1 \to 1$ to conclude that $\rho = 1$ and hence that the third term in (3.9) is zero.

So we have that $t_{0,0}(n_{j_k}, \lfloor\gamma n_{j_k}^{\beta}\rfloor) - t_{p_k,k} \xrightarrow{P} 0$ as $k\to\infty$ and hence that:

(3.10)    $t_{0,0}(n, \lfloor\gamma n^{\beta}\rfloor) \xrightarrow{d} N(0,1)$ as $n\to\infty$.
The next step is to show that whenever f(n) and g(n) are such that $n\to\infty$, $g(n)\to\infty$ and $f^{\beta}(n)/g(n) \to 1/\gamma$, we have $t_{0,0}(f(n),g(n)) \xrightarrow{d} N(0,1)$ as $n\to\infty$ when I, II and III hold ($E_i$ holds for fixed $\beta \in (\beta_0,1]$). To begin, note that $g(n)\to\infty \Rightarrow f(n)\to\infty$, and hence $t_{0,0}(f(n), \lfloor\gamma f(n)^{\beta}\rfloor) \xrightarrow{d} N(0,1)$ as $n\to\infty$ by (3.10). We will show that:

(3.11)    $t_{0,0}(f(n), \lfloor\gamma f(n)^{\beta}\rfloor) - t_{0,0}(f(n), g(n)) \xrightarrow{P} 0$ as $n\to\infty$.

By the triangle inequality we have:

$P\bigl[|t_{0,0}(f(n), \lfloor\gamma f(n)^{\beta}\rfloor) - t_{0,0}(f(n),g(n))| > \epsilon\bigr] \le P\bigl[|{}^At_{0,0}(f(n), \lfloor\gamma f(n)^{\beta}\rfloor)| > \epsilon/4\bigr] + P\bigl[|{}^At_{0,0}(f(n),g(n))| > \epsilon/4\bigr] + P\bigl[|{}_At_{0,0}(f(n), \lfloor\gamma f(n)^{\beta}\rfloor) - {}_At_{0,0}(f(n),g(n))| > \epsilon/2\bigr].$

We have that $\lim_{A\to\infty}\lim_{n\to\infty}P[|{}^At_{0,0}(f(n), \lfloor\gamma f(n)^{\beta}\rfloor)| > \epsilon/4] \le \lim_{A\to\infty}\lim_{n\to\infty}P[|t_{0,0}(f(n), \lfloor\gamma f(n)^{\beta}\rfloor)| \ge A] = 0$ by I, and similarly $\lim_{A\to\infty}\lim_{n\to\infty}P[|{}^At_{0,0}(f(n),g(n))| > \epsilon/4] = 0$ by I. For the third term, consider $t_{0,0}(f(n),\ g(n) \vee \lfloor\gamma f(n)^{\beta}\rfloor)$. By another application of the triangle inequality we have:

(3.12)    $P\bigl[|{}_At_{0,0}(f(n), \lfloor\gamma f(n)^{\beta}\rfloor) - {}_At_{0,0}(f(n),g(n))| > \epsilon/2\bigr] \le P\bigl[|{}_At_{0,0}(f(n), \lfloor\gamma f(n)^{\beta}\rfloor) - {}_At_{0,0}(f(n),\ g(n) \vee \lfloor\gamma f(n)^{\beta}\rfloor)| > \epsilon/4\bigr] + P\bigl[|{}_At_{0,0}(f(n),g(n)) - {}_At_{0,0}(f(n),\ g(n) \vee \lfloor\gamma f(n)^{\beta}\rfloor)| > \epsilon/4\bigr].$

We will show that $\lim_{A\to\infty}\lim_{n\to\infty}$ of the second term in (3.12) is zero. For the first term, the analogous argument can be used with $\lfloor\gamma f(n)^{\beta}\rfloor$ in place of g(n). The second term in (3.12) is no larger than:

$(16/\epsilon^2)\,E\bigl({}_At_{0,0}(f(n),g(n)) - {}_At_{0,0}(f(n),\ g(n) \vee \lfloor\gamma f(n)^{\beta}\rfloor)\bigr)^2 \le (16/\epsilon^2)\bigl\{\,|E\{{}_At_{0,0}(f(n),g(n))\}^2 - 1| + |E\{{}_At_{0,0}(f(n),\ g(n) \vee \lfloor\gamma f(n)^{\beta}\rfloor)\}^2 - 1| + 2|E({}_At_{0,0}(f(n),g(n))\,{}_At_{0,0}(f(n),\ g(n) \vee \lfloor\gamma f(n)^{\beta}\rfloor)) - 1|\,\bigr\}.$

$\lim_{A\to\infty}\lim_{n\to\infty}$ of the first term in the braces on the R.H.S. is zero by III. For the second term, note that $f^{\beta}(n)/g(n) \to 1/\gamma$ and $f^{\beta}(n)/\lfloor\gamma f(n)^{\beta}\rfloor \to 1/\gamma$ imply that $f^{\beta}(n)/(g(n) \vee \lfloor\gamma f(n)^{\beta}\rfloor) \to 1/\gamma$, so that $\lim_{A\to\infty}\lim_{n\to\infty}$ of the second term is zero. For the third term, let $f_1 = f_2 = f(n)$, $g_1 = g(n) \vee \lfloor\gamma f(n)^{\beta}\rfloor$, $g_2 = g(n)$. Clearly $f_2/f_1 = 1$, so that by the argument of the second sentence in this paragraph $\rho^2 = 1^{\beta+1}\cdot\gamma/\gamma = 1$ and hence $\lim_{A\to\infty}\lim_{n\to\infty}$ of the third term in the braces is zero. Thus, (3.11) holds and we can conclude, using (3.10), that:

(3.13)    $t_{0,0}(f(n),g(n)) \xrightarrow{d} N(0,1)$ whenever $n\to\infty$, $g(n)\to\infty$, $f^{\beta}(n)/g(n) \to 1/\gamma$.

Now because $\gamma \in (0,\infty)$ was arbitrary we can use an analogous argument for any $\gamma \in (0,\infty)$, and hence we have that $t_{0,0}(f(n),g(n)) \xrightarrow{d} N(0,1)$ whenever $E_1$ holds (for a fixed $\beta$).
We now need to extend this result to obtain the joint asymptotic normality stated in Theorem 3.1. Fix $\alpha_i \in (0,\infty)$, $i=1,2$, such that $f_i^{\beta}(n)/g_i(n) \to \alpha_i$. We need to show that for all $\lambda_1, \lambda_2 \in R$:

(3.14)    $\lambda_1\,t_{0,0}(f_1(n),g_1(n)) + \lambda_2\,t_{i_n,j_n}(f_2(n),g_2(n)) \xrightarrow{d} N(0,\ \lambda_1^2 + \lambda_2^2 + 2\lambda_1\lambda_2\rho).$
Figure 3.2: Layout for the Proof of Asymptotic Bivariate Normality
[Figure: the $f_1(n) \times g_1(n)$ rectangle partitioned into five subrectangles, one of which is the $f_2(n) \times g_2(n)$ subrectangle with lower left corner at $(i_n, j_n)$, separated by strips.]
To prove (3.14) assume that $i_n/f_1(n) \to w_1$, $j_n/g_1(n) \to w_2$, $f_2(n)/f_1(n) \to p_*$. These three conditions imply that the location and size of the subrectangle relative to the larger rectangle eventually "settle down". The first two constraints ultimately will not be needed (i.e., surprisingly we do not require the location of the subrectangle within the larger rectangle to "settle down"). We start the proof by breaking up the $f_1(n)$ by $g_1(n)$ rectangle into five subrectangles, one of which is $f_2(n)$ by $g_2(n)$ with lower left corner at $(i_n, j_n)$. If each of the subrectangles satisfies $E_1$ then each will be asymptotically normal by (3.13). We insert "strips" between each of the blocks, to get the t's computed on them to be asymptotically independent. Then the L.H.S. of (3.14) can be approximated by a sum of asymptotically independent terms, each of which is asymptotically normal, and hence is itself asymptotically normal.

We first define the "strips". If $w_1 = 0$ set $d_n = 0$. If $w_2 = 0$ set $a_n = 0$. If $1 - w_1 - p_* = 0$ set $b_n = 0$. If $1 - w_2 - (\alpha_1/\alpha_2)p_*^{\beta} = 0$ set $c_n = 0$. If any of the preceding four constants are greater than 0 then set the corresponding $a_n, b_n, c_n$ or $d_n$ equal to $\lfloor g_1^{\delta}\rfloor$, where $\frac{\beta_0(\beta+1)}{\beta(\beta_0+1)} < \delta < 1$. We will show that $t_{0,0}(f_1(n),g_1(n))$ is close to

$T_n := w_1^{1/2}(\alpha_1 p_*^{\beta}/\alpha_2)^{1/2}\,t_{0,j_n}(i_n - d_n,\ g_2(n)) + w_2^{1/2}\,t_{0,0}(f_1(n),\ j_n - a_n) + (1 - w_1 - p_*)^{1/2}(\alpha_1 p_*^{\beta}/\alpha_2)^{1/2}\,t_{i_n + f_2(n) + b_n,\,j_n}(f_1(n) - i_n - f_2(n) - b_n,\ g_2(n)) + (1 - w_2 - \alpha_1 p_*^{\beta}/\alpha_2)^{1/2}\,t_{0,\,j_n + c_n + g_2(n)}(f_1(n),\ g_1(n) - j_n - c_n - g_2(n)) + (\alpha_1 p_*^{\beta}/\alpha_2)^{1/2}p_*^{1/2}\,t_{i_n,j_n}(f_2(n),\ g_2(n)).$
We will show that $t_{0,0}(f_1(n),g_1(n)) - T_n \xrightarrow{P} 0$ as $n\to\infty$. Write $T_n = \sum_{i=1}^{5}\mu_i\,t_{in}$, where the $\mu_i$ are the five weights above. Using the triangle and Chebyshev's inequalities (as usual) we have:

(3.15)    $\lim_{n\to\infty}P\bigl[|t_{0,0}(f_1(n),g_1(n)) - T_n| > \epsilon\bigr] \le \lim_{A\to\infty}\lim_{n\to\infty}P\bigl[|{}^At_{0,0}(f_1(n),g_1(n))| > \epsilon/4\bigr] + \lim_{A\to\infty}\lim_{n\to\infty}P\bigl[\bigl|\sum_{i=1}^{5}\mu_i\,{}^At_{in}\bigr| > \epsilon/4\bigr] + (16/\epsilon^2)\lim_{A\to\infty}\lim_{n\to\infty}E\bigl\{{}_At_{0,0}(f_1(n),g_1(n)) - \sum_{i=1}^{5}\mu_i\,{}_At_{in}\bigr\}^2.$

The first term on the R.H.S. is zero by I. The second term is no larger than:

(3.16)    $\sum_{i=1}^{5}\lim_{A\to\infty}\lim_{n\to\infty}P\bigl[|{}^At_{in}| > \epsilon/(20\,\mu_i)\bigr].$

The i-th term in (3.16) is zero if $\mu_i = 0$. If $\mu_i > 0$ then the i-th term is no larger than $\lim_{A\to\infty}\lim_{n\to\infty}P[|t_{in}| \ge A]$. It will be shown that each of these five terms is zero by condition I. First note that any one of $a_n, b_n, c_n$ or $d_n$ divided by either $f_1$ or $g_1$ goes to zero as $n\to\infty$.

i=1: $\mu_1 > 0 \Rightarrow i_n/f_1(n) \to w_1 > 0$ and $f_2(n)/f_1(n) \to p_* > 0$. So $g(n) = g_2(n) \to \infty$ and $f^{\beta}/g = (i_n - d_n)^{\beta}/g_2(n) \to w_1^{\beta}\alpha_2/p_*^{\beta} \in (0,\infty)$, so the first term in (3.16) is zero by I.

i=2: $\mu_2 > 0 \Rightarrow j_n/g_1(n) \to w_2 > 0 \Rightarrow g(n) = j_n - a_n \to \infty$ and $f^{\beta}/g = f_1^{\beta}/(j_n - a_n) \to \alpha_1/w_2 \in (0,\infty)$, so we can use I to show that this term in (3.16) is zero; the remaining cases are analogous, i.e., the second term on the R.H.S. in (3.15) is zero.

For the third term in (3.15), noting that $\sum_{i=1}^{5}\mu_i^2 = 1$:

(3.17)    $\lim_{A\to\infty}\lim_{n\to\infty}E\bigl\{{}_At_{0,0}(f_1(n),g_1(n)) - \sum_{i=1}^{5}\mu_i\,{}_At_{in}\bigr\}^2 \le \lim_{A\to\infty}\lim_{n\to\infty}\bigl|E\{{}_At_{0,0}(f_1(n),g_1(n))\}^2 - 1\bigr| + \sum_{i=1}^{5}\mu_i^2\lim_{A\to\infty}\lim_{n\to\infty}\bigl|E\{{}_At_{in}\}^2 - 1\bigr| + 2\sum_{i=1}^{5}\mu_i\lim_{A\to\infty}\lim_{n\to\infty}\bigl|E\,{}_At_{0,0}(f_1(n),g_1(n))\,{}_At_{in} - \mu_i\bigr| + \sum_{i\ne j}\mu_i\mu_j\lim_{A\to\infty}\lim_{n\to\infty}\bigl|E\,{}_At_{in}\,{}_At_{jn}\bigr|.$

The first term on the R.H.S. in (3.17) is zero by III. Similarly, the second term is zero by III (using the calculations above in $i=1,\ldots,5$).

For the third term on the R.H.S. of (3.17) we will use III on each of the five terms in the sum. The i-th term is clearly 0 if $\mu_i = 0$, so assume $\mu_i > 0$. This implies, in particular, that $g_{2\cdot} \to \infty$ (here $f_{1\cdot}, g_{2\cdot}$, etc. denote the generic elements defined in $E_2$). The inclusions obviously hold by design, so we need only check that $\rho^2$ is $\mu_i^2$ for $i=1,\ldots,5$:

i=1: $f_{1\cdot}^{\beta}/g_{1\cdot} = f_1^{\beta}/g_1 \to \alpha_1$, $f_{2\cdot}^{\beta}/g_{2\cdot} = (i_n - d_n)^{\beta}/g_2 \to w_1^{\beta}\alpha_2/p_*^{\beta} \in (0,\infty)$ as $\mu_1 > 0$; $f_{2\cdot}/f_{1\cdot} = (i_n - d_n)/f_1 \to w_1 = p_{*\cdot}$, so $\rho^2 = w_1^{\beta+1}\,\alpha_1 p_*^{\beta}/(\alpha_2 w_1^{\beta}) = w_1\alpha_1 p_*^{\beta}/\alpha_2 = \mu_1^2$. The cases i=2 and i=3 are handled similarly.

i=4: $f_{1\cdot}^{\beta}/g_{1\cdot} = f_1^{\beta}/g_1 \to \alpha_1$, $f_{2\cdot}^{\beta}/g_{2\cdot} = f_1^{\beta}/(g_1 - j_n - c_n - g_2) \to \alpha_1/(1 - w_2 - \alpha_1 p_*^{\beta}/\alpha_2) \in (0,\infty)$ as $\mu_4 > 0$; $f_{2\cdot}/f_{1\cdot} = f_1/f_1 = 1 = p_{*\cdot}$, so $\rho^2 = \alpha_1(1 - w_2 - \alpha_1 p_*^{\beta}/\alpha_2)/\alpha_1 = \mu_4^2$.

i=5: $f_{1\cdot}^{\beta}/g_{1\cdot} = f_1^{\beta}/g_1 \to \alpha_1$, $f_{2\cdot}^{\beta}/g_{2\cdot} = f_2^{\beta}/g_2 \to \alpha_2$ by assumption; $f_{2\cdot}/f_{1\cdot} = f_2/f_1 \to p_* = p_{*\cdot}$, so $\rho^2 = p_*^{\beta+1}\alpha_1/\alpha_2 = \mu_5^2$,

so we see that the third term on the R.H.S. in (3.17) is zero by III.
For the fourth term on the R.H.S., $\sum_{i\ne j}\mu_i\mu_j\lim_{A\to\infty}\lim_{n\to\infty}|E\,{}_At_{in}\,{}_At_{jn}|$, let $t_{in} := t_{in}(f_{in},g_{in})$ (e.g., $f_{1n} = i_n - d_n$, $g_{1n} = g_2(n)$) and let $e_n = a_n, b_n, c_n$ or $d_n$. We will show that each term in the sum is zero with the aid of Lemma 3.1. If $\mu_i = 0$ or $\mu_j = 0$ then clearly the (i,j)-th summand is zero. So assume now that $\mu_i > 0$, $\mu_j > 0$.

$\lim_{A\to\infty}\lim_{n\to\infty}|E\,{}_At_{in}\,{}_At_{jn}| \le \lim_{A\to\infty}\lim_{n\to\infty}|\mathrm{Cov}\{{}_At_{in},\,{}_At_{jn}\}| + \lim_{A\to\infty}\lim_{n\to\infty}|E\,{}_At_{in}|\,|E\,{}_At_{jn}|.$

First note that $\lim_{A\to\infty}\lim_{n\to\infty}|E\,{}_At_{in}|\,|E\,{}_At_{jn}| \le \lim_{A\to\infty}\lim_{n\to\infty}A\,|E\,{}_At_{in}| = 0$ by II (using the calculations for i=1 to 5 above).

Next:

(3.18)    $|\mathrm{Cov}\{{}_At_{in},\,{}_At_{jn}\}| = |\mathrm{Cov}\{{}_At_{in}(f_{in},g_{in}),\,{}_At_{jn}(f_{jn},g_{jn})\}| \le 4A^2\,\alpha_{f_{in}g_{in},\,f_{jn}g_{jn}}(\lfloor g_1^{\delta}\rfloor) \le C_1^{ij}\,A^2\,f_1^{1+\beta}\,\lfloor g_1^{\delta}\rfloor^{-(\beta_0+1)/\beta_0} \le C_2^{ij}\,A^2\,f_1^{1+\beta}\,g_1^{-\delta(\beta_0+1)/\beta_0} \le C_3^{ij}\,A^2\,f_1^{1+\beta}\,f_1^{-\delta\beta(\beta_0+1)/\beta_0} = C_3^{ij}\,A^2\,f_1^{(1+\beta) - \delta\beta(\beta_0+1)/\beta_0}.$

The first inequality follows from Lemma 3.1 and the "definition" of $e_n$. The second follows from the fact that $g_{in} \le C^i f_1^{\beta}$, $g_{jn} \le C^j f_1^{\beta}$ for large n and that $f_{in}, f_{jn} \le f_1(n)$, while the third follows from mixing condition (3.1). The last inequality holds because $g_1 \ge C f_1^{\beta}$ for large n, and the final term tends to zero as $n\to\infty$ by the definition of $\delta$.

So $\lim_{A\to\infty}\lim_{n\to\infty}|\mathrm{Cov}\{{}_At_{in},\,{}_At_{jn}\}| = 0$, the third term on the R.H.S. in (3.15) is zero, and hence we can conclude that $t_{0,0}(f_1(n),g_1(n)) - T_n \xrightarrow{P} 0$.
s
The next step is to approximate T n:= EIJ itin by T~:= EIJ it in where the tin's have the
i=1
i=1
same marginal distributions as the tin's but are independent, and where IJ i=A1IJi for i=1,2,3,4,
_
i.it °t °
i.T
i.T*
and IJ S=A1IJs + A2' For fixed s, let Zi,n= e ))n and let 4>n(s)= lEe n, 4>~(s)=lEe n.
Consider the last term on the R.H.S. If IJ4 is zero then this term is clearly zero. If IJ4 is
not equal to zero then c n=Lgl 5 J.
S
S
!J4Zi, n-D4EZi, nl= I Cov{Z4, n'ZS, n} I ~ 16Ctf1 (gl-jn-c n-g2),f2g2(cn)
Then IE
i
~ 16CtCf
1
1+13 Cf 1+t3(Lg/J)
'
1
which tends to zero as n-oo (using the argument in 3.18 above). Similarly, the other three
terms tend to zero. This implies that lim l4>n(s)-4>~(s)I=O. So we need only consider the
n-oo
distribution of T~ to obtain (3.14).
(1-W2-CtlP~/Ct2)1/2to,in+cn+g2(fl,gl-jn-cn-g2) +
(CtlP~/Ct2)1/2p;/2tin' )0 n (f2,g2)} + A2{tin' )0 n (f2,g2)}
where the t*'s are independent and by earlier calculations each one has a limiting N(0,1)
distribution (provided IJi > 0, i=1,2,3,4). So the above expression is asymptotically normally
distributed with variance:
29
(~1(a1P2/a2)1/2p;/2 + A2
P
=A~(l- p.a 1P2/a2) + A~(p*a1P2/ a2) + 2A1 A2(a1P2/a 2)1/2 p;/2 + A~ =
So (3.14) holds. This concludes the sufficiency of I,ll and III (for fixed /3) for the joint
asymptotic normality in Theorem 3.1 (for fixed /3) provided that in/f1(n)-w1, jn/g1(n)-w 2.
Now assume that one (or both) of $i_n/f_1(n)$ and $j_n/g_1(n)$ does not converge. Then given any subsequence $\{n_k\}$ of $\{n\}$ there is some subsequence $\{n_{k_r}\}$ of $\{n_k\}$ (say) so that $i_{n_{k_r}}/f_1(n_{k_r}) \to w_1$ (say) $\in [0,1]$. There is a further subsequence $\{n_{k_{r_s}}\}$ such that $j_{n_{k_{r_s}}}/g_1(n_{k_{r_s}}) \to w_2$ (say) $\in [0,1]$. So (3.14) holds along the subsequence $\{n_{k_{r_s}}\}$, and hence along the whole sequence when I, II and III hold (for fixed $\beta$). The same argument can be used for any $r_i \equiv \alpha_i \in (0,\infty)$, $i=1,2$, so we conclude that IV holds in Theorem 3.1 (for fixed $\beta$) when $\sigma^2 \in (0,\infty)$.

In the case when $\sigma^2 = 0$, joint normality reduces to:

(3.19)    $t_{0,0}(f(n),g(n)) \xrightarrow{P} 0$ whenever $E_1$ holds.
$\lim_{E_1}P\bigl[|t_{0,0}(f(n),g(n))| > \epsilon\bigr] \le \lim_{A\to\infty}\lim_{E_1}P\bigl[|t_{0,0}(f(n),g(n))| \ge A\bigr] + \lim_{A\to\infty}\lim_{E_1}P\bigl[\epsilon < |t_{0,0}(f(n),g(n))| < A\bigr] \le \lim_{A\to\infty}\lim_{E_1}P\bigl[|t_{0,0}(f(n),g(n))| \ge A\bigr] + (1/\epsilon^2)\lim_{A\to\infty}\lim_{E_1}E\,\bigl\{{}_At_{0,0}(f(n),g(n))\bigr\}^2.$

The first term on the R.H.S. is zero by I and the second term is zero by III, so (3.19) holds. This concludes the proof of the sufficiency of conditions I, II and III for asymptotic, joint normality of the statistic t (for fixed $\beta$).
It will now be shown that I, II and III are necessary for the asymptotic joint normality of t with its subsample values.

Assume that $\{t_{0,0}(f_1(n),g_1(n)),\ t_{i_n,j_n}(f_2(n),g_2(n))\} \xrightarrow{d} N_2(0,0,\sigma^2,\sigma^2,\sigma^2\rho)$ whenever $E_2$ holds ($E_2$ for fixed $\beta$).

$\lim_{A\to\infty}\lim_{E_1}A^2\,P\bigl(|t_{0,0}(f(n),g(n))| \ge A\bigr) = \lim_{A\to\infty}A^2\,P(|Z| \ge A) = \lim_{A\to\infty}A^2\,E\,I\{|Z| \ge A\} \le \lim_{A\to\infty}E\,Z^2 I\{|Z| \ge A\} = 0$ by the Dominated Convergence Theorem (Z is $N(0,\sigma^2)$), so I holds.

For II note that $\lim_{E_1}|A\,E\,{}_At_{0,0}(f(p),g(p))| = A\,|E\,{}_AZ| = 0$ for all A by symmetry, so

$\lim_{A\to\infty}\lim_{E_1}|A\,E\,{}_At_{0,0}(f(p),g(p))| = 0$

and II holds.

For III, $\{t_{0,0}(f_1(n),g_1(n)),\ t_{i_n,j_n}(f_2(n),g_2(n))\} \xrightarrow{d} (Z_1,Z_2)$ (where $(Z_1,Z_2) \sim N_2(0,0,\sigma^2,\sigma^2,\sigma^2\rho)$) implies that $\lim_{E_2}E\,{}_At_{0,0}(f_1(n),g_1(n))\,{}_At_{i_n,j_n}(f_2(n),g_2(n)) = E\,Z_1Z_2\,I\{|Z_1| < A\}\,I\{|Z_2| < A\}$ by the Helly-Bray Theorem.

$Z_1Z_2\,I\{|Z_1| < A\}\,I\{|Z_2| < A\} \xrightarrow{a.s.} Z_1Z_2$ as $A\to\infty$, $|Z_1Z_2\,I\{|Z_1| < A\}\,I\{|Z_2| < A\}| \le |Z_1Z_2|$ for all A, and $E|Z_1Z_2| \le \sigma^2$, so that $\lim_{A\to\infty}E\,Z_1Z_2\,I\{|Z_1| < A\}\,I\{|Z_2| < A\} = E\,Z_1Z_2 = \rho\sigma^2$ by Dominated Convergence.

This implies that $\lim_{A\to\infty}\lim_{E_2}\bigl|E\,{}_At_{0,0}(f_1(n),g_1(n))\,{}_At_{i_n,j_n}(f_2(n),g_2(n)) - \rho\sigma^2\bigr| = 0$.

So I, II and III are necessary and sufficient conditions for asymptotic, joint normality of t for a fixed $\beta \in (\beta_0,1]$. Clearly, the same argument can be used for any $\beta$. Thus, Theorem 3.1 holds.
Theorem 3.1 is a characterization of normality. It shows, for data from a mixing random
fIeld, precisely which statistics are asymptotically normal: namely, those which are central, i.e.,
satisfy I, II, and III in Theorem 3.1. However, the conditions are difficult to directly check in
practice. The following corollary will enable us to check whether specifIc statistics are central,
and hence by the theorem, asymptotically Uointly) normal.
Corollary 3.1:
(3.21 )
If
then
(3.22)
iff
{t(B~~;),g(n»} is uniformly squared integrable (E 1 ).
(3.23)
The proof will be given after Theorem 3.2. This result is analogous to Theorems 2.3 and 2.7 in Section II. Comparing to Theorem 2.3, we need the covariance condition (rather than just a variance condition) to ensure that t is a sufficiently mean-like general statistic, and we require a faster mixing rate for our result to hold on a random field.
One question that comes to mind is: Is any statistic central? In other words, does any statistic satisfy the three necessary and sufficient conditions of Theorem 3.1 or those of Corollary 3.1? Note that if a statistic satisfies (3.21) and (3.23) then it is central by Theorem 3.1 and Corollary 3.1.
B) Examples
Example 3.1: The Sample Mean
Assume that EX = 0 and let X₀ denote the observation at the origin.
If
E{|X|^{2+δ}} < ∞, where δ > 2β₀(3 − β₀)/(1 − β₀)²,   (3.24)
(3.1) holds,   (3.25)
and
α_{1,∞}(m) = o(m^{−2}),   (3.26)
then
σ² := Σ_{i∈Z²} Cov{X₀, X_i} < ∞,   (3.27)
and if σ² > 0 then
t(B^{f(n),g(n)}_{0,0}) = |B^{f(n),g(n)}_{0,0}|^{−1/2} Σ_{i∈B^{f(n),g(n)}_{0,0}} X_i   (3.28)
is central with variance σ²,
and hence is jointly asymptotically normal with its subsample values.
Note that as β₀ approaches 0, δ can approach 0, (3.1) approaches the i.i.d. assumption, and (3.26) becomes unnecessary as we know that (3.28) holds in this case. At the other extreme, if we have bounded random variables then we can allow relatively strong dependence, as β₀ can be arbitrarily close to 1.
Proof:
The main work to be done is to show that (3.21) holds for σ² as defined in (3.27) whenever E₂ holds. For locations i,j ∈ Z², let γ_ij := Cov(X_i, X_j) and let γ_ij(w) := γ_ij I{D₀(i,j) = w}. Then σ² = Σ_{w=0}^{∞} Σ_{i∈Z²} γ_{0i}(w). This will enable us to establish (3.27) as follows: From a natural extension of an I&L Lemma (see Bolthausen, Lemma 1) we have that |γ_{0i}(w)| ≤ C_δ α_{1,1}^{δ/(2+δ)}(w) for all i with D₀(0,i) = w. This implies that:
σ² ≤ C_δ Σ_{w=0}^{∞} Σ_{i: D₀(0,i)=w} α_{1,1}^{δ/(2+δ)}(w) ≤ C_δ Σ_{w=0}^{∞} (8w + 1) α_{1,1}^{δ/(2+δ)}(w) < ∞   (3.29)
by (3.24) and (3.25) (in fact only δ > 4β₀/(1 − β₀) is needed here). This establishes (3.27). To simplify notation, let B₁(n) = B^{f₁(n),g₁(n)}_{0,0} = {i = (i₁,i₂): 1 ≤ i₁ ≤ f₁(n), 1 ≤ i₂ ≤ g₁(n)} and let B₂(n) = B^{f₂(n),g₂(n)}_{i_n,j_n} = {j = (j₁,j₂): i_n + 1 ≤ j₁ ≤ i_n + f₂(n), j_n + 1 ≤ j₂ ≤ j_n + g₂(n)}. Then the left hand side in (3.21) becomes
|B₂(n)|^{−1} Cov{Σ_{i∈B₁(n)} X_i, Σ_{j∈B₂(n)} X_j} = |B₂(n)|^{−1} Cov{Σ_{i∈B₂(n)} X_i, Σ_{j∈B₂(n)} X_j} + |B₂(n)|^{−1} Cov{Σ_{i∈B₁(n)−B₂(n)} X_i, Σ_{j∈B₂(n)} X_j} = A_n + B_n (say).
To obtain (3.21) for this statistic, it will be shown that |A_n − σ²| → 0 and that B_n → 0.
For the first implication note that
|A_n − σ²| = |{f₂(n)g₂(n)}^{−1} {Σ_{w=0}^{g₂(n) ∨ f₂(n)} Σ_{i∈B₂(n)} Σ_{j∈B₂(n)} γ_ij(w)} − Σ_{w=0}^{∞} Σ_i γ_{0i}(w)|
≤ |{f₂(n)g₂(n)}^{−1} {Σ_{w=0}^{g₂(n) ∨ f₂(n)} Σ_{i∈B₂(n)} Σ_{j∈B₂(n)} γ_ij(w)} − Σ_{w=0}^{g₂(n) ∨ f₂(n)} Σ_i γ_{0i}(w)| + |Σ_{w=g₂(n) ∨ f₂(n)}^{∞} Σ_i γ_{0i}(w)|   (3.30)
= C_n + D_n (say). D_n → 0 as n→∞ because σ² < ∞ by (3.29).
(F_n ≡ 0 if f₂(n) ≤ g₂(n)), where k_i, m_i ≤ w. The first equality follows by stationarity and a counting argument. The first inequality follows from the fact that f₂(n) > C g₂(n) for large n and the second from the same I&L Lemma used to obtain (3.29). We now need the following lemma (Ash (1972)):
Lemma 3.8 (Kronecker's Lemma):
If b_n → ∞ and Σ_{j=1}^{∞} a_j/b_j < ∞, then (1/b_n) Σ_{j=1}^{n} a_j → 0.
Using this Lemma (with a_j = j² α_{1,1}^{δ/(2+δ)}(j), b_j = j) and (3.29) we find that E_n → 0.
Note that F_n → 0 if Σ_w α_{1,1}^{δ/(2+δ)}(w) → 0 over the relevant range of w. Now, α_{1,1}^{δ/(2+δ)}(w) ≤ C₁ w^{β₀ − 3} by (3.1) and (3.24), and hence Σ_{w=1}^{∞} w α_{1,1}^{δ/(2+δ)}(w) < ∞. Now using Kronecker's Lemma again (with a_j = j^β α_{1,1}^{δ/(2+δ)}(j), b_j = j^β) we see that F_n → 0, implying that C_n → 0 and hence |A_n − σ²| → 0.
|B_n| ≤ {f₂(n)g₂(n)}^{−1} Σ_{i∈B₁(n)−B₂(n)} Σ_{j∈B₂(n)} |γ_ij|
≤ {f₂(n)g₂(n)}^{−1} C_δ Σ_{w=1}^{g₂(n)} (8w)(2w f₂(n) + 2w g₂(n)) α_{1,1}^{δ/(2+δ)}(w) + {f₂(n)g₂(n)}^{−1} C_δ Σ_{w=g₂(n)+1}^{g₁(n) ∨ f₁(n)} 8w f₂(n) g₂(n) α_{1,1}^{δ/(2+δ)}(w)
≤ C_δ {(g₂(n))^{−1} Σ_{w=1}^{g₂(n)} w² α_{1,1}^{δ/(2+δ)}(w) + (f₂(n))^{−1} Σ_{w=1}^{g₂(n)} w² α_{1,1}^{δ/(2+δ)}(w) + Σ_{w=g₂(n)+1}^{g₁(n) ∨ f₁(n)} w α_{1,1}^{δ/(2+δ)}(w)}
= C_δ {G_n + H_n + I_n} (say).
The second inequality follows by noting that for distances (w) no larger than g₂(n), and a fixed orientation (i,j), there are at most 2w(f₂(n) + g₂(n)) terms equal to γ_ij(w) in the double sum and for each w there are at most 8w possible orientations (first term); and that for any distance there are at most 8w f₂(n) g₂(n) terms at this distance (second term). G_n = E_n and hence G_n → 0, H_n ≤ C₀ G_n for large n so H_n → 0, and I_n → 0 by (3.29). Thus, (3.21) holds.
Now note that (3.24), (3.25) and (3.26) imply (2.12) - (2.15) in Theorem 2.4, section IIA.
Hence (2.17) holds. (3.21) and (2.17) together imply (3.23) (see the proof of Corollary 3.1 and
Lemma 3.14); (3.22) holds by Corollary 3.1 and hence (3.28) holds by Theorem 3.1.
The next example shows that a nice function of a statistic that is central is also central and
hence jointly asymptotically normal with its subsample values.
Example 3.2: The Delta Method
Let s(B^{f(n),g(n)}_{u,v}) be a statistic with E s(B^{f(n),g(n)}_{u,v}) = c. Also let b_n ≡ b(f(n),g(n)) be a sequence of real numbers s.t. b_n → ∞, and let t(B^{f(n),g(n)}_{u,v}) = b_n(s(B^{f(n),g(n)}_{u,v}) − c).
If t(B^{f(n),g(n)}_{u,v}) is central with variance σ², then for h(·) with derivative at c we have that b_n(h(s(B^{f(n),g(n)}_{u,v})) − h(c)) is central with variance (h'(c))² σ².
Proof:
Write b_n = b(f(n),g(n)), b_n^{(1)} = b(f₁(n),g₁(n)), and b_n^{(2)} = b(f₂(n),g₂(n)). We need to show that λ₁ b_n^{(1)}(h(s_{0,0}(f₁(n),g₁(n))) − h(c)) + λ₂ b_n^{(2)}(h(s_{i_n,j_n}(f₂(n),g₂(n))) − h(c)) converges to N(0, (λ₁² + λ₂² + 2λ₁λ₂ρ)(σ h'(c))²) = N* (say).
We will use the expansion h(s_{u_n,v_n}(f(n),g(n))) = h(c) + h'(c)(s_{u_n,v_n}(f(n),g(n)) − c) + R_{c,f,g,u,v}, where b_n R_{c,f,g,u,v} →p 0 as n→∞. Then
{λ₁ b_n^{(1)} h'(c)(s_{0,0}(f₁(n),g₁(n)) − c) + λ₂ b_n^{(2)} h'(c)(s_{i_n,j_n}(f₂(n),g₂(n)) − c)} →d N(0, σ²(h'(c))²(λ₁² + 2ρλ₁λ₂ + λ₂²)),
and λ₁ b_n^{(1)} R_{c,f₁,g₁,0,0} →p 0 and λ₂ b_n^{(2)} R_{c,f₂,g₂,i_n,j_n} →p 0 by the above expansion. This implies joint normality and hence centrality by Theorem 3.1.
Example 3.3: The Sample Percentiles
For η ∈ (0,1), let b denote the η-th percentile of F, i.e., F(b) = P[X₀ ≤ b] = η. Assume that F is absolutely continuous and strictly increasing and that X₀ has density f(·) with f(b) > 0; let X^η_{0,0,f,g} denote the ⌈η f(n)g(n)⌉-th smallest of the ordered X_k's, k ∈ B^{f,g}_{0,0}.
Assume that
(3.1) holds,   (3.31)
and that
α_{1,∞}(m) = o(m^{−2}).   (3.32)
Then
σ² := Σ_{i∈Z²} [P{X₀ ≤ b, X_i ≤ b} − P²{X₀ ≤ b}] < ∞,   (3.33)
and if σ² > 0 then
t(B^{f(n),g(n)}_{0,0}) = |B^{f(n),g(n)}_{0,0}|^{1/2} (X^η_{0,0,f(n),g(n)} − b)   (3.34)
is central with variance σ²/f²(b).
Proof:
First, σ² = Σ_{i∈Z²} Cov(I{X₀ ≤ b}, I{X_i ≤ b}) ≤ 4 Σ_{i∈Z²} α_{1,1}(D₀(i,0)) ≤ 4 Σ_{w=0}^{∞} (8w + 1) α_{1,1}(w) < ∞
by Lemma 3.1 and (3.31), using the fact that (β₀ + 1)/β₀ > 2. Thus, (3.33) holds.
To demonstrate that (3.34) holds, note that P{t_{0,0}(f₁(n),g₁(n)) ≤ x, t_{i_n,j_n}(f₂(n),g₂(n)) ≤ y} can be written in terms of Y_i = I{X_i ≤ b + x/(f₁g₁)^{1/2}} and Ỹ_j = I{X_j ≤ b + y/(f₂g₂)^{1/2}}, where B₁(n) and B₂(n) are as defined above in Example 3.1. Assume that x > 0 and y > 0; the other cases will follow similarly. The last term is handled using the expansion F(b + x/(fg)^{1/2}) = F(b) + f(b)x/(fg)^{1/2} + R(b,x,fg), where (fg)^{1/2} R(b,x,fg) → 0. We now need the following proposition:
Proposition 3.9:
Σ_{i∈B₁(n)} [I{X_i ≤ b + x/(f₁g₁)^{1/2}} − F(b + x/(f₁g₁)^{1/2})]/(f₁g₁)^{1/2} − Σ_{i∈B₁(n)} [I{X_i ≤ b} − η]/(f₁g₁)^{1/2} →p 0.
Proof:
We need to show that
Σ_{i∈B₁(n)} [I{X_i ≤ b + x/(f₁g₁)^{1/2}} − F(b + x/(f₁g₁)^{1/2})]/(f₁g₁)^{1/2} − Σ_{i∈B₁(n)} [I{X_i ≤ b} − η]/(f₁g₁)^{1/2} →p 0,
i.e.,
Σ_{i∈B₁(n)} [I{b < X_i ≤ b + x/(f₁g₁)^{1/2}} − P{b < X_i ≤ b + x/(f₁g₁)^{1/2}}]/(f₁g₁)^{1/2} →p 0.
This will follow, by Chebyshev's inequality, if we can show that (f₁g₁)^{−1} Var{Σ_{i∈B₁(n)} I{b < X_i ≤ b + x/(f₁g₁)^{1/2}}} → 0.
Note that Σ_{w=0}^{∞} w α_{1,1}(w) < ∞ implies that for any δ > 0 there exists N_δ s.t. Σ_{w=N_δ}^{∞} w α_{1,1}(w) < δ/64. Letting γ̃_ij := Cov{I{b < X_i ≤ b + x/(f₁g₁)^{1/2}}, I{b < X_j ≤ b + x/(f₁g₁)^{1/2}}}, we have that
Var{Σ_{i∈B₁(n)} I{b < X_i ≤ b + x/(f₁g₁)^{1/2}}} = Σ_{i,j∈B₁(n)} Σ_{w=0}^{N_δ−1} γ̃_ij(w) + Σ_{i,j∈B₁(n)} Σ_{w=N_δ}^{∞} γ̃_ij(w)
≤ f₁g₁ 4N_δ² Var{I{b < X₀ ≤ b + x/(f₁g₁)^{1/2}}} + f₁g₁ Σ_{w=N_δ}^{∞} 8w(4α_{1,1}(w)),
where γ̃_ij(w) is as defined earlier. Dividing by f₁g₁, this is
≤ 4N_δ² P{b < X₀ ≤ b + x/(f₁g₁)^{1/2}} + 32 Σ_{w=N_δ}^{∞} w α_{1,1}(w) < 4N_δ²(δ/8N_δ²) + δ/2 = δ for n > N_δ,
because Var{I{b < X₀ ≤ b + x/(f₁g₁)^{1/2}}} ≤ P{b < X₀ ≤ b + x/(f₁g₁)^{1/2}}, F(·) is continuous at b, and Σ_{w=N_δ}^{∞} w α_{1,1}(w) < δ/64. Hence Proposition 3.9 holds.
The two terms in the third sum tend to 0 because f(b) > 0. The last two terms tend to zero in probability by Proposition 3.9. For the first two terms, note that (3.31) and (3.32) imply that t_{0,0}(f₁(n),g₁(n)) := Σ_{i∈B₁(n)} [I{X_i ≤ b} − η]/(f₁g₁)^{1/2} is jointly asymptotically normal with its subsample values (by Example 3.1); i.e., the first two terms converge to a N(0, (λ₁² + λ₂²)σ²/f²(b) + 2λ₁λ₂ρσ²/f²(b)) random variable, and hence (3.34) holds.
C)  Sufficient Conditions for Asymptotic Normality of t Computed on Starshapes
A natural question that comes up is: Can we compute the statistic, t, on a different class of shapes than rectangles (and still get asymptotic normality)? Here we consider computing t on the indices contained in a "starshape" in R². A starshape (with center at the origin) is a bounded set A ⊂ R², such that for each x ∈ A we have that tx ∈ A for 0 ≤ t ≤ 1. In other words, A is a connected set such that each ray from the origin intersects A on a single segment going through the origin. As seen from the three examples in Figure 3.3 this is a broad class of shapes. This class of shapes has been used by Rudemo et al. (1990) for detecting a "change boundary".
Figure 3.3: Examples of Starshapes
In Lemma 3.10 we give sufficient conditions for asymptotic normality of t computed on a
starshape and in Theorem 3.2 we give a result analogous to Corollary 3.1 for t computed on a
class of sets containing all starshapes (sets in E1 as defined below). In Example 3.4 we show
that the sample mean satisfies the conditions of Theorem 3.2 and hence is central.
We now give the basic setup for this section. For any finite collection of indices E ⊂ Z², let T_E be a function from R^{|E|} to R¹ and let t(E) := T_E(X_j : j ∈ E) be a statistic of interest (T is assumed to be invariant under translations of the set E). Let 𝒜 = {A: A is a starshape, A ⊂ [−1,1]²}. To avoid pathological sets we assume that A contains an ε-ball around the origin, that the boundary of A, ∂A, has finite length, and that A includes its boundary. Now for each n, multiply the set A by n, to obtain the set A_n ⊂ [−n,n]². Our data are observed at the indices D_n := D_n(A) := {i ∈ A_n ∩ Z²}. D_n will always denote the indices contained in the n-th multiple of a starshape. We will let E_n and B_n denote generic sets of indices increasing to Z².
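As a computational illustration, the following sketch (in Python) shows one way to generate D_n = A_n ∩ Z² when the template starshape is described by a radial function; the particular "wavy" template and all constants below are illustrative assumptions, not part of the theory.

```python
# A minimal sketch of constructing the index set D_n = A_n ∩ Z^2 for a starshape template.
# The template is described here by a radial function r(phi) (an assumption for illustration);
# any bounded starshape containing an epsilon-ball around the origin could be used instead.
import numpy as np

def starshape_indices(n, radius_fn):
    """Lattice points of the inflated shape A_n = n*A, where A = {x: |x| <= radius_fn(angle(x))}."""
    pts = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            r = np.hypot(i, j)
            phi = np.arctan2(j, i)
            if r <= n * radius_fn(phi):          # x is in A_n iff x/n is in A
                pts.append((i, j))
    return pts

# Example template: a mildly "wavy" starshape inside [-1,1]^2 containing a ball around 0.
wavy = lambda phi: 0.6 + 0.3 * np.cos(3 * phi)
D_n = starshape_indices(50, wavy)
print(len(D_n))                                  # |D_n| grows like n^2 * area(A)
```

Any set of the form {x: |x| ≤ r(θ(x))} with r bounded away from 0 is a starshape with center at the origin, since scaling a point toward the origin leaves its angle unchanged.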
Here, analogously to the definition in Theorem 3.1, a statistic t is defined to be central with variance σ² if the following three conditions are satisfied:
(I) lim_{A→∞} lim_{Ē₁} A² P(|t(B_n)| ≥ A) = 0,
(II) lim_{A→∞} lim_{Ē₁} |A E ^A t(B_n)| = 0,
(III) lim_{A→∞} lim_{Ē₂} |E ^A t(B_n) ^A t(E_n) − ρσ²| = 0,
where Ē₁ = E₁ ∪ E₂,
E₁ = {B_n: B_n ⊇ D_n for all n, D_n(A): A ∈ 𝒜, |D_n|/|B_n| → 1, n→∞},
E₂ = {B_n: B_n = ∪_{i=1}^{l_n} B_{i,m_n}, B_{i,m_n} nonoverlapping m_n × m_n squares, n→∞},
Ē₂ = (∪_{i=3}^{5} E_i) ∩ E₆,
E₃ = {(B_n,E_n): (B_n,E_n) = (B_n,D_n) ∈ E₁},
E₄ = {(B_n,E_n): B_n = E_n ∈ E₁ ∪ E₂},
E₅ = {(B_n,E_n): E_n = p_n × p_n square, B_n ∈ E₂, |E_n|/|B_n| → 1/l, l as in E₂},
E₆ = {(B_n,E_n): E_n ⊆ B_n, |E_n|/|B_n| → ρ² (ρ² = 1/k, k ∈ Z⁺)}.
Let A be a fixed but arbitrary starshape in 𝒜 and for each n let D_n be the corresponding set of indices.
Lemma 3.10:
If t is central with variance σ² > 0 and (3.1) holds, then t(D_n) →d N(0,σ²) as n→∞ (D_n = D_n(A): A ∈ 𝒜).
Proof:
Initially the proof follows along the lines of that in Theorem 3.1. Assume that σ² ∈ (0,∞). Then without loss of generality we can take σ² = 1. Let p ≥ 1 and let k ≥ 1 be fixed but arbitrary integers. Let B^m_{i,j} denote the half-open square (i, i+m] × (j, j+m] in R² and let B̃^m_{i,j} denote the set of indices contained in B^m_{i,j}. Let R(A,k,p) denote the minimal union of (p + q_p) × (p + q_p) squares, B^{(p+q_p)}_{(i−1)(p+q_p),(j−1)(p+q_p)}, (i,j) ∈ Z², containing A_{k(p+q_p)} (where q_p = ⌈p^δ⌉, 2β₀/(1 + β₀) < δ < 1, β₀ as defined in (3.1)); let r := r(A,k) denote the number of (i,j) with B^{(p+q_p)}_{(i−1)(p+q_p),(j−1)(p+q_p)} ⊂ R(A,k,p), i.e., the number of squares comprising R(A,k,p); and let B̃^p_l, l = 1,...,r, denote the corresponding B̃^{(p+q_p)}_{(i−1)(p+q_p),(j−1)(p+q_p)} for some ordering of (i,j). Letting R̃(A,k,p) := {i ∈ R(A,k,p) ∩ Z²}, it will be shown that t(R̃(A,k,p)) is close to the standardized sum Σ_{l=1}^{r} t(B̃^p_l)/r^{1/2}, that the standardized sum is close to a standard normal random variable, and finally that t(D_n) is close to t(R̃(A,k,p)). Recall that ^A X = X I{|X| < A}.
lim_{A→∞} lim_{p→∞} Var{^A t(R̃(A,k,p)) − Σ_{l=1}^{r} ^A t(B̃^p_l)/r^{1/2}}
≤ lim_{A→∞} lim_{p→∞} |E ^A t²(R̃(A,k,p)) − 1| + lim_{A→∞} lim_{p→∞} |(1/r) Σ_{l=1}^{r} E ^A t²(B̃^p_l) − 1| + ⋯ + 2 r^{1/2} lim_{A→∞} lim_{p→∞} |E ^A t(R̃(A,k,p)) E ^A t(B̃^p_l)| + ⋯.
The first, second and fifth terms on the R.H.S. are zero by III. The third, fourth and sixth terms are zero by II. The seventh term is zero using Lemma 3.1 and an identical argument to that following the Lemma (with β = l = 1). In the same way as in the proof of Theorem 3.1 we find that lim_{A→∞} lim_{p→∞} E{^A t(R̃(A,k,p)) − Σ_{l=1}^{r} ^A t(B̃^p_l)/r^{1/2}}² = 0 using II, and that t(R̃(A,k,p_k)) − Σ_{l=1}^{r_k} t(B̃^{p_k}_l)/r_k^{1/2} →p 0 when p_k ≥ p_k^{(1)}, using Lemma 3.2 with b_k = r_k^{1/2}. Letting φ_k(s) and φ*_k(s) be the characteristic functions of Σ_{l=1}^{r_k} t(B̃^{p_k}_l)/r_k^{1/2} and Σ_{l=1}^{r_k} t*(B̃^{p_k}_l)/r_k^{1/2}, where the t*'s have the same marginal distributions as the t's but are independent, we find that |φ_k(s) − φ*_k(s)| → 0 when p_k ≥ p_k^{(2)}, and we arrive at:
t(R̃(A,k,p_k)) →d N(0,1) when p_k ≥ π_k := max_{1 ≤ i ≤ 5} {p_k^{(i)}}.   (3.35)
Using the same logic as in demonstrating (3.10), it now remains to be shown that t(D_n) →d N(0,1). This will be done in two steps. It will first be shown that t(D_{k(p_k+q_{p_k})}) is close to t(R̃(A,k,p_k)), and finally that t(D_n) is close to t(D_{k(p_k+q_{p_k})}). For the first implication, we will need the following proposition from Apostol (1957):
Proposition 3.11:
Let Φ be a rectifiable, simple, closed curve contained in [−1,1]² and partition [−1,1]² into disjoint squares of side δ. Then [the number of squares having points in common with Φ] ≤ 4 + 4L(Φ)/δ, where L(Φ) denotes the length of Φ.
Let R*(A,k,p_k) denote the union of squares B^{(p_k+q_{p_k})}_{(i−1)(p_k+q_{p_k}),(j−1)(p_k+q_{p_k})} that have nonempty intersection with ∂A_{k(p_k+q_{p_k})} (∂ denotes the boundary of a set) and let R̃*(A,k,p_k) denote the indices in this union. Clearly, R(A,k,p_k) − R*(A,k,p_k) ⊆ A_{k(p_k+q_{p_k})} ⊆ R(A,k,p_k). Let ‖∂A‖ denote the length of A's boundary. By Proposition 3.11, [the number of squares B^{(p_k+q_{p_k})}_{(i−1)(p_k+q_{p_k}),(j−1)(p_k+q_{p_k})} in R*(A,k,p_k)] ≤ 4 + 4‖∂A‖ k(p_k+q_{p_k})/(p_k+q_{p_k}). The number of lattice points in these squares is O(k)(p_k+q_{p_k})². Now, |D_{k(p_k+q_{p_k})}| ≤ |R̃(A,k,p_k)| and
| |D_{k(p_k+q_{p_k})}| (k²(p_k+q_{p_k})²)^{−1} − λ(A) | ≤ 2√2 ‖∂A‖ (k(p_k+q_{p_k}))^{−1} + 2π (k(p_k+q_{p_k}))^{−2}
(by Rudemo et al. (1990, Lemma A1)) imply that for large k, |R̃(A,k,p_k)| ≥ C(k²(p_k+q_{p_k})²),
and hence that
1 − O(k(p_k+q_{p_k})²)/(C k²(p_k+q_{p_k})²) ≤ (|R̃(A,k,p_k)| − O(k(p_k+q_{p_k})²))/|R̃(A,k,p_k)| ≤ |D_{k(p_k+q_{p_k})}|/|R̃(A,k,p_k)| ≤ 1.
Now note that the first term tends to 1 as k→∞, implying that |D_{k(p_k+q_{p_k})}|/|R̃(A,k,p_k)| → 1 as k→∞.
Taking lim_{A→∞} lim_{k→∞} of the first two terms on the R.H.S. yields 0 by I. The lim_{A→∞} lim_{k→∞} of the third term is no larger than
lim_{A→∞} lim_{k→∞} |E ^A t²(R̃(A,k,p_k)) − 1| + 2 lim_{A→∞} lim_{k→∞} |E ^A t(R̃(A,k,p_k)) ^A t(D_{k(p_k+q_{p_k})}) − 1| + lim_{A→∞} lim_{k→∞} |E ^A t²(D_{k(p_k+q_{p_k})}) − 1|.
All three of these terms are 0 by III (the third by the calculation after Proposition 3.11), so we can conclude that
t(R̃(A,k,p_k)) − t(D_{k(p_k+q_{p_k})}) →p 0 as k→∞.   (3.36)
Now for given {n_j} define n_{j_k} and p_k exactly as in the proof of Theorem 3.1 following (3.8). We have by (3.35) and (3.36) that t(D_{k(p_k+q_{p_k})}) →d N(0,1) in this case. If we can show that |D_{n_{j_k}}|/|D_{k(p_k+q_{p_k})}| → 1, then we will have that t(D_{k(p_k+q_{p_k})}) − t(D_{n_{j_k}}) →p 0 using the same argument as that which established (3.36), and hence that t(D_{n_{j_k}}) →d N(0,1). Toward this end, note that n_{j_k} ≤ k(p_k+q_{p_k}) implies that D_{n_{j_k}} ⊆ D_{k(p_k+q_{p_k})} by the fact that D_u is the set of indices contained in the starshape A_u and the definition of a starshape (in fact this is the only point where the starshape property is used).
|D_{n_{j_k}}|/|D_{k(p_k+q_{p_k})}| = [ (|D_{n_{j_k}}|/n_{j_k}²) / (|D_{k(p_k+q_{p_k})}|/k²(p_k+q_{p_k})²) ] · (n_{j_k}/(k(p_k+q_{p_k})))² =: (v_k/u_k)(w_k)²,
where v_k → λ(A) by the argument following Proposition 3.11; u_k → λ(A) similarly by Rudemo et al. (1990); and w_k → 1 as in the argument preceding equation (3.10). This concludes the proof of Lemma 3.10, because A ∈ 𝒜 was arbitrary. The next three Lemmas will be used to obtain Theorem 3.2 (and Corollary 3.1, by making the identification t(B_n) = t(B^{f₁(n),g₁(n)}_{0,0}) and t(E_n) = t(B^{f₂(n),g₂(n)}_{i_n,j_n})).
Lemma 3.12:
If t is central with variance σ² and (3.1) holds, then t(B_n) →d N(0,σ²) as n→∞ (Ē₁).
Proof:
We need only show that t(B_n) →d N(0,1), where either B_n ⊇ D_n for all n, D_n(A): A ∈ 𝒜, |D_n|/|B_n| → 1, or B_n = ∪_{i=1}^{l_n} B_{i,m_n} with the B_{i,m_n} nonoverlapping m_n × m_n squares. For the first case we can use Lemma 3.10 and an identical argument to that which established (3.36). For the second case break up each B_{i,m_n} into a big square B*_{i,m_n} with the same center as B_{i,m_n} and a "little" buffer. We can show that
|t(B_n) − Σ_{i=1}^{l_n} t(B*_{i,m_n})/l_n^{1/2}| →p 0
in the same manner as equation (3.5) was shown to tend to zero as p→∞, and that the characteristic functions of Σ_{i=1}^{l_n} t(B*_{i,m_n})/l_n^{1/2} and Σ_{i=1}^{l_n} t*(B*_{i,m_n})/l_n^{1/2} are asymptotically equivalent by using the same argument that follows (3.7). Now note that each t*(B*_{i,m_n}) →d N(0,1) by the fact that the indices in B*_{i,m_n} are contained in a translated starshape (hence Lemma 3.10 applies), and finally that Σ_{i=1}^{l_n} t*(B*_{i,m_n})/l_n^{1/2} →d N(0,1). Thus Lemma 3.12 holds for σ² > 0.
For σ² = 0, t(B_n) →d N(0,σ²) reduces to t(B_n) →p 0 (Ē₁). To see that this holds, note that for any 0 < ε < A,
P[|t(B_n)| > ε] = P[ε < |t(B_n)| < A] + P[|t(B_n)| ≥ A] ≤ (1/ε²) E ^A t²(B_n) + P[|t(B_n)| ≥ A],
so that lim_{Ē₁} P[|t(B_n)| > ε] ≤ (1/ε²) lim_{A→∞} lim_{Ē₁} E ^A t²(B_n) + lim_{A→∞} lim_{Ē₁} P[|t(B_n)| ≥ A] = 0 by III and I.
Lemma 3.13:
Assume that
E{t(B_n)} ≡ 0 (Ē₁).   (3.37)
If
(|B_n|^{1/2}/|E_n|^{1/2}) Cov{t(B_n), t(E_n)} → σ² (Ē₂),   (3.38)
and
{t²(B_n)} is uniformly integrable (Ē₁),   (3.39)
then
t is central with variance σ².
Proof:
For I: lim_{A→∞} lim_{Ē₁} A² P(|t(B_n)| ≥ A) ≤ lim_{A→∞} lim_{Ē₁} E_A t²(B_n) = 0 by (3.39).
For II: lim_{A→∞} lim_{Ē₁} |A E ^A t(B_n)| = lim_{A→∞} lim_{Ē₁} |A E t(B_n) − A E_A t(B_n)| ≤ lim_{A→∞} lim_{Ē₁} |A E t(B_n)| + lim_{A→∞} lim_{Ē₁} |E_A t²(B_n)| = 0 by (3.37) and (3.39).
For III:
|E ^A t(B_n) ^A t(E_n) − ρσ²| ≤ |Cov{t(B_n), t(E_n)} − ρσ²| + |E t(B_n) E t(E_n)| + |E t(B_n){t(E_n) − ^A t(E_n)}| + |E{t(B_n) − ^A t(B_n)} t(E_n)| + |E{t(B_n) − ^A t(B_n)}{t(E_n) − ^A t(E_n)}|.
Now,
lim_{Ē₂} |Cov{t(B_n), t(E_n)} − ρσ²| ≤ lim_{Ē₂} (|E_n|^{1/2}/|B_n|^{1/2}) |(|B_n|^{1/2}/|E_n|^{1/2}) Cov{t(B_n), t(E_n)} − σ²| + σ² lim_{Ē₂} | |E_n|^{1/2}/|B_n|^{1/2} − ρ | = 0 by (3.38).
lim_{A→∞} lim_{Ē₂} |E t(B_n) E t(E_n)| ≤ lim_{Ē₂} |E t(B_n)| lim_{Ē₂} |E t(E_n)| = 0 by (3.37), noting that any set that appears in Ē₂ appears in Ē₁.
lim_{A→∞} lim_{Ē₂} |E t(B_n){t(E_n) − ^A t(E_n)}| ≤ lim_{Ē₂} {E t²(B_n)}^{1/2} lim_{A→∞} lim_{Ē₂} {E_A t²(E_n)}^{1/2} ≤ (lim_{Ē₂} Var t(B_n) + lim_{Ē₂} E² t(B_n))^{1/2} lim_{A→∞} lim_{Ē₂} {E_A t²(E_n)}^{1/2} = (σ² + 0)^{1/2} · 0 = 0 by (3.38), (3.37) and (3.39).
Similarly, lim_{A→∞} lim_{Ē₂} |E{t(B_n) − ^A t(B_n)} t(E_n)| = 0, and
lim_{A→∞} lim_{Ē₂} |E{t(B_n) − ^A t(B_n)}{t(E_n) − ^A t(E_n)}| ≤ lim_{A→∞} lim_{Ē₂} (E_A t²(B_n))^{1/2} lim_{A→∞} lim_{Ē₂} (E_A t²(E_n))^{1/2} = 0 by (3.39).
Lemma 3.14:
Assume that
E{t(B_n)} ≡ 0 (Ē₁).   (3.40)
If
Var{t(B_n)} → σ² (Ē₁),   (3.41)
and
t(B_n) →d N(0,σ²) (Ē₁),   (3.42)
then
{t²(B_n)} is uniformly integrable (Ē₁).   (3.43)
Proof:
lim_{A→∞} lim_{Ē₁} E_A t²(B_n) = lim_{A→∞} lim_{Ē₁} |E t²(B_n) − σ² + σ² − E ^A t²(B_n)|
≤ lim_{Ē₁} |Var t(B_n) − σ²| + lim_{Ē₁} E² t(B_n) + lim_{A→∞} lim_{Ē₁} |E ^A t²(B_n) − σ²|.
The first term on the R.H.S. is zero by (3.41) while the second is zero by (3.40). So it is sufficient to show that lim_{A→∞} lim_{Ē₁} |E ^A t²(B_n) − σ²| = 0. Now lim_{Ē₁} E ^A t²(B_n) = E ^A Z², where Z has a N(0,σ²) distribution, by Helly-Bray, and finally lim_{A→∞} E ^A Z² = E Z² = σ² by Dominated Convergence.
Lemmas 3.12, 3.13 and 3.14 imply that the following Theorem (analogous to Corollary 3.1 in the case of rectangles) holds:
Theorem 3.2:
Let {X_i} satisfy mixing condition (3.1) and let the statistic t be standardized so that E{t(B_n)} ≡ 0 (Ē₁).
If
(|B_n|^{1/2}/|E_n|^{1/2}) Cov{t(B_n), t(E_n)} → σ² (Ē₂),   (3.44)
then
t(B_n) →d N(0,σ²) (Ē₁)   (3.45)
iff
{t²(B_n)} is uniformly integrable (Ē₁).   (3.46)
Note that Corollary 3.1 is proven using identical arguments to those in Lemmas 3.13 and
3.14 by using the identification preceding Lemma 3.12. Like Corollary 3.1, Theorem 3.2 enables
us to check if a specific statistic is asymptotically normal by examining its covariance behavior.
This result is analogous to Corollary 3.1 except only univariate normality is obtained here (3.45)
compared with bivariate normality in (3.22). This is the price we pay for considering the class
of irregular shapes in E 1 and E2. Theorem 3.2 is analogous to Theorems 2.6 and 2.7. Here, as
in the time series case (Theorem 2.7), uniform integrability becomes an additional necessary
condition to obtain asymptotic normality, as opposed to the i.i.d. case (Theorem 2.6) where
uniform integrability is a consequence of the covariance condition.
It will now be shown that under appropriate mixing and moment conditions (3.44) holds for the standardized sample mean. This will (using Theorem 2.4) imply that t(B_n) := |B_n|^{−1/2} Σ_{j∈B_n} X_j is central (satisfies I, II and III).
Example 3.4:
Assume that EX = 0 and let X₀ denote the observation at the origin.
If
E{|X|^{2+δ}} < ∞, where δ > 4β₀/(1 − β₀),   (3.47)
(3.1) holds,   (3.48)
and
α_{1,∞}(m) = o(m^{−2}),   (3.49)
then
σ² := Σ_{j∈Z²} Cov{X₀, X_j} < ∞,   (3.50)
and if σ² > 0 then
t(B_n) := |B_n|^{−1/2} Σ_{j∈B_n} X_j is central with variance σ².   (3.51)
Proof:
We will use Lemma 3.13. We show that if B_n is any set such that |∂B_n|/|B_n| → 0, where ∂ denotes the boundary of a set of indices, then Var(t(B_n)) → σ². This will enable us to show that (3.44) holds (noting that all sets considered here satisfy this boundary condition). Then (3.51) follows from the same logic as in the last paragraph in Example 3.1.
For each i ∈ B_n let S_{i,n} = Σ_{j∈B_n} X_j I{D₀(i,j) ≤ m_n}, a_n = Σ_{i∈B_n} E(X_i S_{i,n}), S*_{i,n} = Σ_{j∈Z²} X_j I{D₀(i,j) ≤ m_n}, and a*_n = Σ_{i∈B_n} E(X_i S*_{i,n}), where m_n is chosen such that m_n → ∞ but m_n² |∂B_n|/|B_n| → 0. Note first that a*_n/|B_n| = Σ_{i∈B_n} E(X_i S*_{i,n})/|B_n| → σ², because E(X_i S*_{i,n}) → σ² uniformly in i by stationarity. By the triangle inequality,
|Var(t(B_n)) − σ²| ≤ |Var(t(B_n)) − a_n/|B_n|| + |a_n/|B_n| − a*_n/|B_n|| + |a*_n/|B_n| − σ²|.
For the second term on the R.H.S. note that for each i there are at most (2m_n + 1)² indices in S_{i,n} and S*_{i,n}, that there are at most (2m_n + 1)² |∂B_n| values i contained in a*_n for which S_{i,n} ≠ S*_{i,n}, and that each |E X_i X_j| ≤ C by (3.47) and the Cauchy-Schwarz Inequality. Hence the second term on the R.H.S. is no larger than (C |∂B_n| m_n²)/|B_n| → 0 by the definition of m_n. For the first term on the R.H.S.:
|Var(t(B_n)) − a_n/|B_n|| ≤ C_δ Σ_{w=m_n}^{∞} 8w α_{1,1}^{δ/(2+δ)}(w) → 0 by (3.47) and (3.48).
Now note that for any B_n, E_n (E_n ⊆ B_n) the L.H.S. of (3.44) is equal to:
|E_n|^{−1} Cov{Σ_{i∈E_n} X_i, Σ_{j∈E_n} X_j} + |E_n|^{−1} Cov{Σ_{i∈B_n−E_n} X_i, Σ_{j∈E_n} X_j}.
The first term has been shown to converge to σ². The second term is bounded in absolute value by:
|E_n|^{−1} Σ_{i∈Z²−E_n} Σ_{j∈E_n} |γ_ij| = |E_n|^{−1} Σ_{i∈Z²−E_n} Σ_{j∈E_n} |γ_ij| I{D₀(i,j) ≤ m_n} + |E_n|^{−1} Σ_{i∈Z²−E_n} Σ_{j∈E_n} |γ_ij| I{D₀(i,j) > m_n}
≤ C*(|E_n|^{−1} |∂E_n| m_n² + Σ_{w=m_n}^{∞} 8w α_{1,1}^{δ/(2+δ)}(w)) → 0
(where m_n is such that m_n → ∞ but |E_n|^{−1} |∂E_n| m_n² → 0). The inequality follows by noting that there are at most |∂E_n|(2m_n + 1)⁴ pairs (i,j) for which i ∈ Z² − E_n, j ∈ E_n, D₀(i,j) ≤ m_n, and that for each j in the second term (and fixed w) there are at most 8w terms i, and each |γ_ij| ≤ C_δ α_{1,1}^{δ/(2+δ)}(w).
The work in this section has been theoretical in nature. A more applied question is: For a
statistic computed on a finite (and possibly irregularly shaped) data grid how can we empirically
assess whether or not it is asymptotically normal? We want to answer this question in the same
model free spirit in which this section was approached. This question is addressed in the
following section for both data from a stationary sequence and from a random field.
IV. THE REPLICATE HISTOGRAM:
ESTIMATING THE DISTRIBUTION FUNCTION OF A GENERAL STATISTIC
1. Introduction
Here is a typical scenario for statistical inference: A series of n observations X_n := (X₁, X₂, ..., X_n) is obtained from a random process that is controlled by unknown parameters (θ, ν). The "target parameter" θ ∈ R is estimated by a scalar statistic s_n := s_n(X_n), while ν is a "nuisance parameter" which may be a vector or even infinite-dimensional.
In order to draw statistical inferences from s_n to θ (e.g., confidence intervals, hypothesis tests), one must make assumptions about the sampling distribution of s_n, or, as a practical approximation, assumptions about the asymptotic distribution F of a standardized transform of s_n, say t_n := a_n(s_n − b_n). The validity of the inferences therefore relies upon the appropriateness of the chosen F (e.g., normal, χ²). The "Replicate Histogram" is a simple diagnostic tool for assessing the appropriateness of F, using only the data X_n at hand.
To illustrate the need for using the Replicate Histogram, consider the following obstacles which confront the statistician in trying to correctly determine F(y) := lim_{n→∞} P[t_n ≤ y]:
(i) The statistic s_n may be complicated (e.g., a robustified or adaptively defined statistic, like Switzer's adaptive trimmed mean [see Efron (1982), Ex. 5.2]), so that a theoretical derivation of F is analytically intractable.
(ii)
The observations might be serially dependent, so their joint probability structure must be
accounted for in deriving F. This in turn may require knowledge or assumptions about the
underlying serial dependence mechanism. For example an assumed AR(p) model would require
estimation of the noise parameter (part of v) as well as the p autoregressive coefficients.
(iii) The standardization constants (a_n, b_n) may involve the unknown parameters. Moreover, these parameters may actually be crucial in determining the fundamental form of F. For example, consider the simple case where s_n is the sample mean of i.i.d. observations, so that F is necessarily a stable distribution [see Ibragimov and Linnik (1971), Chapter 2]. The appropriate standardizing coefficient is a_n = n^{1/2} h(n) (where h(·) is a slowly varying function) when F is the [symmetric] normal distribution. On the other hand, the appropriate standardizing coefficient is n^{(α−1)/α} h(n) (where α ∈ (0,2)) when F is an α-stable distribution [which may actually be quite skewed].
Clearly, (i), (ii) and (iii) are not mutually exclusive and may actually all be present in any given situation. In order to see how the Replicate Histogram avoids these three obstacles, we introduce the following notation for a series of l consecutive observations: X_i^l := (X_{i+1}, X_{i+2}, ..., X_{i+l}), so that the observed data-set is X_0^n, and the collection of all available "subseries" is {X_i^l : 0 ≤ i ≤ n − l}. The corresponding "replicates" are denoted by s_i^l; the standardized versions are analogously denoted by t_i^l := a_l(s_i^l − b_l), where a_l > 0. For each y ∈ R, the Replicate Histogram is defined to be:
G_n(y) := (n − l_n + 1)^{−1} Σ_{i=0}^{n−l_n} I{s_i^{l_n} ≤ y},   (1.1)
where l_n := ⌊cn^γ⌋, for any fixed c > 0 and γ ∈ (0,1/2). This is simply the empirical distribution function of the replicates. Note how the Replicate Histogram avoids obstacles (i), (ii) and (iii):
(i) The Replicate Histogram is directly computable from the data so no theoretical analysis is
necessary. Moreover, the Replicate Histogram "works" [i.e. , is a strongly consistent diagnostic
tool, in a sense which will be made precise in Section 2] for virtually any statistic t n that has an
asymptotic distribution.
(ii) By employing "replicates" of the statistic of interest sn the correct serial dependence
structure is automatically retained, without any knowledge of or assumptions about the
underlying dependence mechanism.
(iii) No standardizing constants (an or b n) are present in the equation defining the Replicate
Histogram and hence there is no need to know (or guess) them.
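As a computational illustration, the following sketch (in Python) computes the subseries replicates and the Replicate Histogram of (1.1) for an arbitrary statistic; the particular statistic and the placeholder white-noise series in the usage example are illustrative assumptions.

```python
# A minimal sketch of the Replicate Histogram (1.1) for time-series data: compute the
# statistic on every length-l_n subseries and form the empirical d.f. of these replicates.
# The statistic and the series below are placeholders; c and gamma follow the definition after (1.1).
import numpy as np

def subseries_replicates(x, stat, c=3.0, gamma=0.49):
    """Replicates s_i^{l_n}, i = 0,...,n - l_n, with l_n = floor(c * n**gamma)."""
    n = len(x)
    l = int(np.floor(c * n ** gamma))
    return np.array([stat(x[i:i + l]) for i in range(n - l + 1)]), l

def replicate_histogram(y, reps):
    """G_n(y): proportion of replicates at or below y."""
    return np.mean(reps <= y)

# Usage: replicates of the sample mean for a (placeholder) white-noise series of length 200.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
reps, l_n = subseries_replicates(x, np.mean)
print(l_n, replicate_histogram(0.0, reps))
```

No standardizing constants appear in the computation, which is the point of obstacle (iii): the shape of the replicate distribution can be examined without knowing (a_n, b_n).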
Thus, the Replicate Histogram is an "omnibus" procedure: it applies to a general statistic
in a general setting, so that each new scenario (e.g., a new sn or a new serial dependence
mechanism) does not require the development of a new procedure. The Replicate Histogram
shares "the charm of the jackknife and the bootstrap", which is, quoting Efron (1982), "that
they can be applied to complicated situations where parametric modeling and/or theoretical
analysis is hopeless." Naturally, any method which achieves such seemingly far reaching
applicability must do so at some cost. This cost, for the Replicate Histogram, is that it returns
only diagnostic information about the shape of F (e.g., symmetry vs. skewness), but does not
return numerical estimates of the percentiles or probabilities induced by F.
Comparison with Other Methods
Many methods have been proposed for estimating the distribution of a general statistic in
various settings, but none are diagnostic tools in as general a setting as that considered here.
Wu (1990) says "a major purpose of resampling is to use the observed data to construct a
distribution that mimics the unknown distribution of [the statistic]". This is his motivation for
development of the "jackknife histogram". The bootstrap has been used for estimating the d.f.
of a statistic in many different settings: in the i.i.d. case, for statistics with a limiting normal distribution (Bickel and Freedman (1981) and Singh (1981)) and for statistics with non-normal limiting distributions (Athreya (1987), Bretagnolle (1983) and Swanepoel (1986)); in the time series case, for statistics with a limiting normal distribution (Freedman (1984), Bose (1988), Rajarshi (1990) and Künsch (1989)) and for statistics with a non-normal limiting distribution (Basawa et al. (1989)); and in the case of spatial data, for asymptotically normal statistics (Lele (1988)).
Despite Wu's comment on the purpose of resampling, he only considers statistics which
are asymptotically normal. In general, i.i.d. resampling results use the exchangeability of the
observations which, as discussed, is not compatible with dependent data. Further, in the
Athreya, Bretagnolle, and Swanepoel results for asymptotically nonnormal statistics, knowledge
of the particular limiting distribution (and the standardizing constants an and b n) of our
statistic determines the correct standardization for the resampled statistic and the correct
bootstrap resample size. The Freedman and Bose results assume that the dependence mechanism
which generated the data is known (Freedman emphasizes this in his paper) while our method
characterizes the dependence with model-free mixing coefficients (2.1 and 2.3). The Künsch,
Rajarshi and Lele results allow for a variety of dependence structures but they all assume that
the statistic of interest has a limiting normal distribution. In some cases (Singh, Bose),
however, the bootstrap picks up an extra term in the Edgeworth expansion.
In summary, the problem is that these methods are, in some sense, piecemeal. Each
method is situation specific and demands that the user have particular knowledge (Does my
data come from an AR(p) process? Is the statistic asymptotically normal? If not, what is the
correct limiting distribution? What is the correct resample size?). It is true that if the user has
this prior knowledge then the appropriate jackknife or bootstrap algorithm can (should) be used
(if it exists). In absence of this knowledge, the Replicate Histogram can still give a good idea of
the form of the limiting distribution, without any assumptions on the standardization or on the
dependence structure which generates the data. It will not, however, give percentiles of the
limiting distribution of the (standardized) statistic (because the standardizing constants are
assumed unknown). Thus, one strategy is to use the Replicate Histogram to estimate the
general shape (e.g., normal, skewed) of the distribution. The resulting information can be used,
in conjunction with knowledge about the dependence structure (if available), to select the proper
jackknife, bootstrap or any other procedure which is compatible with the information obtained.
The next Section formally defines our setup and the theoretical properties of the Replicate
Histogram (proofs are deferred to the Appendix) in both the case of a stationary time-series and
a random field (spatial data). Two examples, illustrating the use of the Replicate Histogram as
a diagnostic tool, are presented in Section 3.
2. Properties of the Replicate Histogram
A) Time-Series Data
The random process {X_i: −∞ < i < +∞} is assumed to be stationary, and the strength of serial dependence is measured by the standard model-free "mixing coefficient"
α(m) := sup{|P{A ∩ B} − P{A}P{B}|: A ∈ σ(..., X_{−1}, X₀), B ∈ σ(X_m, X_{m+1}, ...)},   (2.1)
as introduced by Rosenblatt (1956). Intuitively, the requirement that α(m) → 0 as m→∞ says
that observations separated by a large time-lag behave approximately as if they were
independent. This mixing property, together with stationarity, guarantees that the replicates
defined in Section 1 are "valid".
In order to state the consistency result for the Replicate Histogram we need to make the following transformation: F̂_n(y) := G_n(b_{l_n} + y/a_{l_n}). Notice that {G_n(y): y ∈ R} (as defined in (1.1)), which is directly computable from the observed data, contains essentially the same diagnostic information as the unknown {F̂_n(y): y ∈ R}. For example, the standardization transformation preserves normality, symmetry, skewness (as measured by, e.g., the skewness coefficient β(Y) := E²{(Y − E{Y})³}/V³{Y} [see Kendall and Stuart (1977), p. 87], or the skewness parameter β of the α-stable distributions [see Ibragimov and Linnik (1971), Theorem 2.2.2]).
The following result justifies the Replicate Histogram as a diagnostic tool for describing F:
Theorem 2.1:
If l_n is as defined in (1.1) and
α(m) = O(m^{−ε}) for some ε > 1/(2γ),   (2.2)
then
F̂_n(y) →a.s. F(y) for every y ∈ R.
Further, if F is continuous then
sup_y |F̂_n(y) − F(y)| →a.s. 0.
The rate (ε) can be interpreted as follows: Recall that γ in (1.1) determines the size of the subseries. For large ε, the time-series is nearly i.i.d., and in this case γ can be very small; for smaller ε, however, we see that γ needs to be larger. This is because small ε signifies strong serial dependence, and the subseries must be relatively large to capture the full extent of this dependence. The proof of Theorem 2.1 is in the appendix.
B) Spatial Data
We now show that the Replicate Histogram can be used to estimate the distribution of a
statistic computed on data contained in a possibly irregularly shaped set of indices from a
stationary random field {X j :iEZ 2}. One context where such an estimator is needed is in image
analysis problems. For a model that describes an image of interest, there will usually be
parameters to estimate. In order to make inferences from a statistic to a parameter it estimates
we need some information about the statistic's distribution.
The basic idea is the same as in the case of time-series data. Here we compute the statistic, s, on overlapping "subshapes" of data (analogous to subseries). For each n, let D_n be a finite set of lattice points in Z², at which observations are taken. In general the "shape" D_n may be quite irregular. Formally, let f_{D_n} be a function from R^{|D_n|} to R¹ and let s(D_n) := f_{D_n}(X_j : j ∈ D_n) be a statistic of interest (f is assumed to be invariant under translations of the set D_n). As in the time-series case, we assume that the standardized statistic t(D_n) := a_n(s(D_n) − b_n) has limiting distribution F, and our goal is to obtain diagnostic information about F. Let D^i_{l(n)}, i = 1,...,k_n, denote overlapping subshapes of D_n, where l(n) determines the common size of each subshape [l(n) = ⌊cn^γ⌋ as in Section 1] and k_n denotes the number of subshapes. Analogously to Section 2.A we compute the Replicate Histogram:
G_n(y) := k_n^{−1} Σ_{i=1}^{k_n} I{s(D^i_{l(n)}) ≤ y},
and use this to obtain information about F(y) := lim_{n→∞} P[t(D_n) ≤ y].
We now describe the index sets D_n and D^i_{l(n)} precisely. Let A ⊂ (0,1] × (0,1] be the interior of a simple closed curve which will serve as a template for D_n and D^i_{l(n)}. To avoid pathological cases we assume that the boundary of A has finite length. Now multiply the set A by n, to obtain the set A_n ⊂ (0,n] × (0,n]; i.e., A_n is the shape A inflated by a factor of n. The data are observed at the indices in D_n := {i ∈ A_n ∩ Z²}. This formulation allows for a wide variety of shapes on which our data can be observed.
"Subshapes" are obtained by dividing (0,n]² into l(n) × l(n) subsquares (of which there are (n − l(n) + 1)²). In (0, l(n)] × (0, l(n)] identify the set D_{l(n)}, and do the same in each subsquare by simply translating the origin. Since there is only data in D_n, we only use the k_n subshapes D^i_{l(n)}, i = 1,...,k_n, which are in subsquares whose indices are completely contained in A_n.
In order to formally state the consistency result for F̂_n(y) := G_n(b_{l(n)} + y/a_{l(n)}), the dependence in our random field needs to be quantified. As in Section 2.A we measure the strength of dependence by a model-free "mixing" coefficient. Define:
α_p(m) := sup{|P{E₁ ∩ E₂} − P{E₁}P{E₂}|: E_i ∈ F(A_i), |A_i| ≤ p, d(A₁,A₂) ≥ m, i = 1,2},   (2.3)
where F(A_i) contains the events depending on {X_j: j ∈ A_i} and d(A₁,A₂) is the minimal city-block distance between index sets A₁ and A₂. Note that in the random field setup, the dependence between two sets of random variables (characterized by α_p(·)) is a function not only of the distance between the two sets, but also of each set's cardinality. In time series cardinality is generally not accounted for. This is considered acceptable because there are many common processes which satisfy α-mixing. For example, AR(1) processes with normal, double exponential or Cauchy errors (see Gastwirth and Rubin (1975)). In the random field case there is no consensus as to whether accounting for cardinality is necessary. Bradley (1991) has shown that, for some random fields, mixing conditions that account for cardinality (like 2.4, below) hold while mixing conditions that do not account for cardinality fail. For this reason we account for cardinality in our mixing coefficient. We assume the following mixing condition holds:
sup_p [α_p(m)/p] = O(m^{−ε}), where ε > 2 + (1/γ).   (2.4)
Condition (2.4) says that, at a fixed distance (m), as the cardinality increases we allow dependence to increase at a rate controlled by p. As the distance increases we need the dependence to decrease at a polynomial rate in m. The relationship between the rate (ε) and the subshape size (γ) is the same as in the discussion after Theorem 2.1. The following result justifies using the Replicate Histogram G_n(y) for an arbitrary statistic (computed on dependent data in a possibly irregularly shaped set of indices) with arbitrary limiting distribution F:
Theorem 2.2:
If l(n) is as defined after (1.1) and (2.4) holds, then:
F̂_n(y) →a.s. F(y) for every y ∈ R as n→∞.
The proof is given in the appendix.
3. Examples
In this Section, we examine the finite sample behavior of the Replicate Histogram. Let {X_i: −∞ < i < ∞} be an AR(1) process with X_{i+1} = βX_i + ε_{i+1}, |β| < 1, ε_i i.i.d. N(0,1). Then {X_i} is a stationary sequence with X_i ~ N(0, 1/(1 − β²)). In many contexts it is of interest to detect observations that fall above some critical cutoff point η (which may be unknown and unobservable). We let Z_i = I{X_i > η}. Then {Z_i} is also a stationary sequence. In order to assess the variability of the process {Z_i} we want an estimate of σ² := V(Z₁).
Based on a sample sequence of length n, we use the usual s_n := Σ_{i=1}^{n} (Z_i − Z̄)²/(n − 1) as an estimate. This is a reasonable estimate, as s_n →a.s. σ² even under mixing dependence. In order to assess the sampling distribution of s_n (e.g., normality, skewness) we calculate the n − l_n + 1 subseries replicates:
s_i^{l_n} := Σ_{j=i+1}^{i+l_n} (Z_j − Z̄_i^{l_n})²/(l_n − 1), for i = 0,1,...,n − l_n.
We observe a sample sequence of length n=200 from {Z_i}. We let c=3 and γ=.49 and compute the n − l_n + 1 = 161 subseries replicates of s_i^{l_n} in the following 2 cases:
case 1: η=0, i.e., Z_i = I{X_i > 0};
case 2: η=1, i.e., Z_i = I{X_i > 1}.
The smoothed Replicate Histograms are drawn in Figures 4.1 and 4.2. Each has been
smoothed by eye in order to make the pictures less jagged. There are some apparent differences
between the 2 cases. The plot for η=0 is extremely skewed left and hence seems incompatible with a normal distribution, while the plot for η=1 does not contradict a normal distribution.
[Figure 4.1: Smoothed Subseries Replicates, n=200, l_n=40]
[Figure 4.2: Smoothed Subseries Replicates]
[Figure 4.3: Smoothed Subseries Replicates, n=1000, l_n=88]
[Figure 4.4: Smoothed Subseries Replicates, n=1000, l_n=88]
[Figure 4.5: p.d.f. of T=(1−W²)/4]
[Figure 4.6: p.d.f. of the Standard Normal Distribution]
To see if our suspicion is correct we generate a series of length n=1000, use the same c and γ, and plot the resulting Replicate Histograms in Figures 4.3 and 4.4. The plot for η=0 again strongly implies a skewed-left non-normal distribution, while the plot for η=1 again is compatible with a normal distribution.
For this data [from an AR(1) process with β=.5], asymptotic theory shows us why the pictures for the two cases (η=0) and (η=1) differ. When η=1: n^{1/2}(s_n − σ²) →d N(0,ν²) by Ibragimov and Linnik, Theorem 18.5.4. When η=0: n(s_n − σ²) →d T := (1/4)(1 − W²), where W is N(0,τ²), by Carlstein (1988b), Example 4. T has density:
f(t) = 2(2)^{1/2} exp((4t − 1)/(2τ²)) / (π^{1/2} (1 − 4t)^{1/2} τ), for −∞ < t < 1/4.
The densities of T and of the standard normal distribution are drawn in Figures 4.5 and 4.6.
The agreement of the Replicate Histograms for both cases (η=1, η=0) with the correct corresponding limiting distributions (N(0,ν²), T) is quite good. As expected, the Replicate Histogram gives a more accurate picture of the limiting distribution for the larger sample size (n=1000). However, even for a sample series of length n=200, we get a clear warning that normality is doubtful in the case η=0, while the η=1 plot is compatible with normality. This shows that even without knowledge of the proper standardizations (n^{1/2}, n), the centering (σ²), and the underlying dependence mechanism, the Replicate Histogram still provides useful diagnostic information about the correct form of the limiting distribution. In particular, it gives a good indication of whether the limiting distribution is normal or not, and if not, how it departs from normality.
To illustrate the use of the Replicate Histogram for spatial data, consider data arising from a random field with the following probability structure: For each i ∈ Z²,
P[X_i = u | X_j: j ∈ Z² − i] = P[X_i = u | Σ_{j∈N(i)} X_j] = 1 − P[X_i = v | Σ_{j∈N(i)} X_j], u,v ∈ R,
where N(i) denotes the indices of the four nearest neighbors of i. That is, we have a binary Markov random field where transition probabilities are completely determined by the sum of the values at the four neighboring sites. We observe data from two 50 × 50 random fields (so D_n is the 50 × 50 square) with this structure, with different values of (u,v) for each, and we are interested in the magnitude of the sample mean, i.e., s(D_n) = |x̄_{D_n}|. To assess the distribution of s(D_n) we let l(n)=40, and hence there are k_n = (n − l(n) + 1)² = 121 subshape replicates in each case.
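The subshape replicates themselves are straightforward to compute; the following sketch (in Python) illustrates the mechanics for the square plot described above, with a simple placeholder field standing in for the Markov random field of the text.

```python
# A minimal sketch of the subshape replicates for the spatial example above.
# Here D_n is the full n x n square, so each subshape is an l(n) x l(n) subsquare;
# the binary field below is a simple stand-in, not the Markov random field of the text.
import numpy as np

def subshape_replicates(field, l):
    """All overlapping l x l subsquare replicates of s(D) = |mean over D|."""
    n = field.shape[0]
    reps = []
    for a in range(n - l + 1):
        for b in range(n - l + 1):
            block = field[a:a + l, b:b + l]
            reps.append(abs(block.mean()))       # s(D^i_l(n)) = |x-bar over the subshape|
    return np.array(reps)                        # k_n = (n - l + 1)^2 replicates

rng = np.random.default_rng(1)
n, l = 50, 40
field = rng.choice([1.0, -1.0], size=(n, n))     # placeholder for the (u, v)-valued field
reps = subshape_replicates(field, l)
print(len(reps))                                 # 121 replicates, as in the text
```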
[Figure 4.7: Smoothed Subshape Replicates, s(D_n)=|x̄_{D_n}|, n=50, l(n)=40, k_n=121]
[Figure 4.8: Smoothed Subshape Replicates]
[Figure 4.9: Smoothed Subshape Replicates, s(D_n)=|x̄_{D_n}|, n=100, l(n)=80, k_n=441]
[Figure 4.10: Smoothed Subshape Replicates, s(D_n)=|x̄_{D_n}|, n=100, l(n)=80, k_n=441]
[Figure 4.11: The Half Normal Distribution]
Pictured in Figures 4.7 and 4.8 are the smoothed Replicate Histograms in the two cases. The statistic s(D_n) is clearly positive so we use a "fold over" boundary adjustment (Schuster (1985)) with our smoothing in both cases. In the first picture the mode of the distribution
appears to be at 0 and the distribution is skewed right. In the second picture the distribution is
reasonably compatible with a normal distribution.
To get a more stable estimate we generate two 100 x 100 random fields with the same
probability structure as above and with the same values of (u,v) as above. The first picture
( Figure 4.9) now strongly implies a skewed right distribution with mode at 0 while the second
(Figure 4.10) is quite compatible with a normal distribution.
Asymptotic theory (Ellis (1984), Theorems V.7.2 and V.7.7) tells us why the two pictures are so different. In the first case u = 1 and v = −1. In this situation |D_n|^{1/2} s(D_n) converges to a half-normal distribution (pictured in Figure 4.11). In the second case u = 3 and v = −1, and we have that |D_n|^{1/2}(s(D_n) − 1) is asymptotically normal.
The agreement of the Replicate Histograms for both cases (u=1, v=−1 and u=3, v=−1) with the correct corresponding limiting distributions (half-normal, normal) is quite good. Thus in the case of spatial data from a random field, as in the time series setup, the Replicate Histogram provides useful diagnostic information about the correct limiting distribution, even in the absence of knowledge about the centerings (0, 1), scaling (|D_n|^{1/2}), and underlying dependence mechanism.
Suppose that for a certain statistic, s(D_n), the Replicate Histogram is employed and an assumption of normality seems reasonable, i.e., t(D_n) := n^{1/2}(s(D_n) − θ) →d N(0,σ²). In order to draw inferences about θ we need an estimate of the nuisance parameter σ². It is desirable to get a consistent estimate, σ̂², of σ² in the model-free spirit of this Section and Section III. Then we would have that n^{1/2}(s(D_n) − θ)/σ̂ →d N(0,1), and thus obtain a confidence interval for θ. This is a major motivation for Section V.
Appendix: Proofs
Proof of Theorem 2.1:
Let l(n) := l_n, and for each k ∈ N and l ∈ N define F_l(y) := P{t_0^l ≤ y},
F̄_{k,l}(y) := k^{−2} Σ_{i=0}^{k²−1} I{t_i^l ≤ y}, and Δ_k := max_{l: l(k²) ≤ l ≤ l((k+1)²)} |F̄_{k,l}(y) − F_l(y)|.
Now note that for each n ∈ N there exists k(n) ∈ N such that k²(n) ≤ n < (k(n)+1)². By the triangle inequality we have:
|F̂_n(y) − F(y)| ≤ |F̂_n(y) − F̄_{k(n),l(n)}(y)| + |F̄_{k(n),l(n)}(y) − F_{l(n)}(y)| + |F_{l(n)}(y) − F(y)|.   (4.1)
We have that I F'(n)(y)-F(y) 1-0 as n-oo by assumption. We will use Lemma 1 (below) to
show that the second term on the R.H.S. of (4.1) tends to O.
Lemma 1:
Δ_k →a.s. 0 as k→∞.
Proof:
Let d(i,j) := |i − j|.
P[Δ_k > δ] ≤ Σ_{l: l(k²) ≤ l ≤ l((k+1)²)} P[|F̄_{k,l}(y) − F_l(y)| > δ].
Now, P[|F̄_{k,l}(y) − F_l(y)| > δ] ≤ (1/δ²) Var(F̄_{k,l}(y)), as E F̄_{k,l}(y) = F_l(y).
Var(F̄_{k,l}(y)) = k^{−4} [ Σ Σ_{d(i,r) < 2l} Cov[I{t_i^l ≤ y}, I{t_r^l ≤ y}] + Σ Σ_{d(i,r) ≥ 2l} Cov[I{t_i^l ≤ y}, I{t_r^l ≤ y}] ]
≤ k^{−4} [ k² · 4l + k⁴ α(l) ],
by letting each summand in the first sum be bounded in absolute value by 1 and noting that there are at most k² · 4l terms in the first sum (0 ≤ i,r ≤ k² − 1), and that each of the at most k⁴ summands in the second sum is no larger than α(l) in absolute value.
Now note that for k sufficiently large, (c/2)k^{2γ} < l(k²) ≤ l ≤ l((k+1)²) ≤ l(4k²) ≤ 2ck^{2γ} by (1.1), and α(l) ≤ c l^{−ε} by (2.2), and hence
Var(F̄_{k,l}(y)) ≤ C(k^{2γ−2} + k^{−2γε}).   (4.2)
Now,
Σ_{k=1}^{∞} P[Δ_k > δ] ≤ Σ_{k=1}^{∞} Σ_{l: l(k²) ≤ l ≤ l((k+1)²)} (1/δ²) Var(F̄_{k,l}(y)) ≤ (C/δ²) Σ_{k=1}^{∞} (l((k+1)²) − l(k²) + 1)(k^{2γ−2} + k^{−2γε}) ≤ (C(c+2)/δ²) Σ_{k=1}^{∞} (k^{2γ−2} + k^{−2γε}) < ∞.   (4.3)
The second inequality follows from (4.2), while the third comes from the fact that l((k+1)²) − l(k²) ≤ c{(k+1)^{2γ} − k^{2γ}} + 1 ≤ c + 1, because γ < 1/2. The finiteness of the last sum follows from the inequalities 2 − 2γ > 1 and 2γε > 1, by (1.1) and (2.2). We have that Σ_{k=1}^{∞} P[Δ_k > δ] < ∞ and hence that Δ_k →a.s. 0 as k→∞.
Now,
|F̄_{k(n),l(n)}(y) − F_{l(n)}(y)| ≤ max_{l: l(k²(n)) ≤ l ≤ l((k(n)+1)²)} |F̄_{k(n),l}(y) − F_l(y)| = Δ_{k(n)} →a.s. 0,
using Lemma 1 and the fact that k(n)→∞ as n→∞. From (4.1) it only remains to show that |F̂_n(y) − F̄_{k(n),l(n)}(y)| →a.s. 0 to conclude that |F̂_n(y) − F(y)| →a.s. 0.
|F̂_n(y) − F̄_{k(n),l(n)}(y)| = | Σ_{i=0}^{n−l_n} I{t_i ≤ y}/(n − l_n + 1) − Σ_{i=0}^{k²(n)−1} I{t_i ≤ y}/k²(n) |
≤ Σ_{i=0}^{k²(n)−1} | 1/(n − l_n + 1) − 1/k²(n) | + Σ_{i=k²(n)}^{n−l_n} 1/(n − l_n + 1)
≤ ( |k²(n) − n| + l(n) − 1 + n − k²(n) + l(n) − 1 ) / (n − l_n + 1)
≤ 2( n − k²(n) + l(n) − 1 )/(n − l_n + 1) ≤ 2(2k(n) + l(n))/(n − l(n)) → 0.
The first inequality is obtained from the triangle inequality and the fact that the indicator function is always less than or equal to one, and the fourth inequality follows from noting that n − k²(n) ≤ 2k(n). The last step follows from k²(n) ≤ n and l_n/n → 0 as n→∞. The uniform convergence follows in the usual way. This concludes the proof of Theorem 2.1.
Proof of Theorem 2.2:
Let λ(A) denote the area of A and let ‖∂A‖ denote the length of the boundary of A. λ(A) > 0 and ‖∂A‖ < ∞ imply that there exists δ > 0 such that a δ × δ square is completely contained in A, and hence that the corresponding δn × δn square is completely contained in A_n. This δn × δn square contains a square set of ⌊δn⌋ × ⌊δn⌋ lattice points, from which we form the (⌊δn⌋ − l_n + 1)² overlapping subsquares, each l_n × l_n, and the corresponding D^i_{l(n)}, i = 1,...,(⌊δn⌋ − l_n + 1)². Note that (⌊δn⌋ − l_n + 1)² ≤ k_n and hence that k_n ≥ Cn².
Using the same logic as in the proof of Lemma 1 we see that Var(F̂_n(y)) ≤ (1/k_n²){k_n(4l_n)² + k_n² α_{l_n²}(l_n)}. The R.H.S. is O(n^{−v}) for some v > 1 by (2.4). This implies that Σ_{n=1}^{∞} P[|F̂_n(y) − E F̂_n(y)| > ε] < ∞ and hence that F̂_n(y) →a.s. F(y).
V. MOMENT ESTIMATION FOR A GENERAL STATISTIC
1. INTRODUCTION
Consider the following two problems:
(1) Based on an i.i.d. sample of n observations {X_i}, we want to estimate θ = E{e^{X₁}}.
(2) Based on observations from a stationary random field on the integer lattice (Z²), we want to estimate the variance of the α-trimmed mean.
Although both problems are moment estimation problems, there are two fundamental
differences between them. In Problem (1), if we choose to specify a likelihood it will be
relatively simple due to the i.i.d. assumption, while in Problem (2) the spatial dependence
makes any likelihood function more complicated. A second fundamental difference is that in
Problem (1) the expectation is of a function (exp(·)) of a single observation (X₁), while in
Problem (2) the variance is of a function (trimmed mean) of all the observations in the data set.
These differences make estimation more difficult for problems like (2). Problem (2) would be
even more difficult if we wanted to estimate the variance of a complicated statistic (e.g.,
Switzer's adaptive trimmed mean (Efron 1982, p. 28)).
The purpose here is to show that, although these two problems occur in quite different
settings, we can use a Method of Moments (MOM) estimator effectively in each. In the i.i.d.
setup, Bickel and Doksum (1977, p. 93) say "[MOM estimates] generally lead to procedures that
are easy to compute", and "if the sample size is large, [MOM] estimates are likely to be close to
the value estimated (consistency)". We show (in Section 3) that these two properties hold, not
only for simple problems in the i.i.d. setup, but also for complicated statistics under
dependence.
Due to the possibility and the consequences of model misspecification (see Section 2) our
MOM approach is nonparametric. We want to estimate θ in Problem (1) without assuming knowledge of F, the distribution of X. For Problem (2), we want to estimate the variance without assuming knowledge of the marginal distribution F, or the dependence mechanism which generated the observations.
In Section 2 we define a MOM estimate of θ and compare it with the Maximum
Likelihood estimate derived under the assumption of normality. We compare the two estimates
when the data actually are normal and in a case where the data actually are nonnormal. In
Section 3 we define an intuitive and consistent MOM estimate for problems like (2) (estimating
the moments of a general statistic from dependent data).
2. HOW GOOD IS MAXIMUM LIKELIHOOD UNDER MISSPECIFICATION?
In introductory statistics textbooks, the MOM is quickly set aside in favor of ML (e.g.,
see Devore (1982) section 6.2). The main reason for this is the asymptotic efficiency of ML
estimates if the assumed model is correct. An interesting question is: How "robust" is the ML
estimate under misspecification of the likelihood function? For example, if we assume normality
when deriving our ML estimate, how will the ML estimate perform when the data actually arise
from a different distribution? This question has been studied, from another perspective, by
Huber (1981). We address the specific problem of estimating θ := E{g(X)}, where g(·) is some nice function.
Consider the setup discussed in the Introduction in Problem (1): We have n i.i.d. observations X₁,...,X_n and we want to estimate θ = E{exp(X₁)}, so in this case g(t) := exp(t). As will be seen, this choice of g(·) allows us to compute all finite sample and asymptotic quantities of interest. Under the assumption that X_i ~ N(μ,1), we find θ = exp(μ + (1/2)), so that the ML estimate of θ is θ̂_MLE := exp(x̄ + (1/2)), where x̄ is the usual sample mean.
A completely nonparametric MOM estimate of θ is:
θ̂_MOM := Σ_{i=1}^{n} exp(X_i)/n.
[Note that this is not the typical parametric MOM estimate, which would be the same as θ̂_MLE in this case]. In general, the nonparametric MOM estimate for the problem of estimating θ := E{g(X₁)} is Σ_i g(X_i)/n (see Hoel, Port and Stone (1971) section 2.6).
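The two estimates are simple to compute; the following sketch (in Python) shows both, together with a Monte Carlo check of the Situation A mean square errors tabulated below. The seed and the number of Monte Carlo replications are illustrative choices.

```python
# A minimal sketch comparing the two estimators of theta = E{exp(X_1)} described above,
# under the (possibly misspecified) working assumption X_i ~ N(mu, 1).
import numpy as np

def theta_mle(x):
    """ML estimate derived under normality: exp(xbar + 1/2)."""
    return np.exp(x.mean() + 0.5)

def theta_mom(x):
    """Nonparametric method-of-moments estimate: average of exp(X_i)."""
    return np.exp(x).mean()

# Monte Carlo check of the MSEs in Situation A (data really are N(0,1), theta = e^{1/2}).
rng = np.random.default_rng(0)
n, reps, theta = 10, 20000, np.exp(0.5)
est = np.array([(theta_mom(x), theta_mle(x))
                for x in rng.normal(size=(reps, n))])
mse = ((est - theta) ** 2).mean(axis=0) / np.exp(1.0)   # normalize by exp(2*mu+1), as in Table 5.1
print(mse)   # roughly (0.172, 0.119) for n = 10 in Situation A
```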
We will compare θ̂_MLE and θ̂_MOM in two different situations: (A) the X_i's are distributed N(μ,1), and (B) the X_i's have density f*(x), where f*(x) = .697(x − μ)², |x − μ| ≤ 1.291. This latter distribution has the same mean, variance and skewness as the N(μ,1) distribution, so it is interesting to see how θ̂_MLE will perform under this type of misspecification. As is customary, we will compare the estimates through their MSE's (Mean Square Errors), where MSE(θ̂) = Var(θ̂) + Bias²(θ̂).
Situation A:
Under the assumed normality, for any constant c we have that E{exp(cX)} = exp(cμ + c²/2). Using this, one can show that MSE(θ̂_MOM) = (exp(2μ+1))(e − 1)/n, while MSE(θ̂_MLE) = (exp(2μ+1))(exp(2/n) − 2exp(1/2n) + 1). Listed in Table 5.1, for different values of n, are the MSE's for each of the two estimates (setting θ = 1 for convenience) and the ratio of the two MSE's (which is independent of θ). Notice that even when the X_i's are normally distributed, the nonparametric θ̂_MOM has a smaller MSE than that of the parametric θ̂_MLE for n ≤ 3. In fact for n = 1 we have Bias(θ̂_MOM) < Bias(θ̂_MLE) and Var(θ̂_MOM) < Var(θ̂_MLE). For larger n this effect wears off and asymptotically we have:
MSE(θ̂_MOM)/MSE(θ̂_MLE) → e − 1 ≈ 1.718.
Situation B:
In this case we have the same estimates θ̂_MOM and θ̂_MLE, but the MSE's are calculated
using the density f*(x). Let a = .697 and let b = 1.291. Then for any constant c we find that

   E{exp(cX)} = exp(cμ)·a·[((b²/c) − (2b/c²) + (2/c³))exp(bc) − ((b²/c) + (2b/c²) + (2/c³))exp(−bc)].

Using this expression for c = 1 and c = 2 we find that θ = (1.552)exp(μ) and that
MSE(θ̂_MOM) = (1.537/n)exp(2μ). Finally, using this expression for c = 1/n and c = 2/n we can evaluate

   MSE(θ̂_MLE) = [exp(1/2)(E exp(X/n))^n − θ]² + e[(E exp(2X/n))^n − (E exp(X/n))^{2n}].

Table 5.1 lists the MSE's for the two estimates (setting μ = 0 for convenience) and their ratio
(which is independent of μ). One can check numerically that MSE(θ̂_MOM) < MSE(θ̂_MLE) for each
n ≤ 170. Observe that θ̂_MLE has an asymptotic squared bias of (e^{.5} − 1.552)² ≈ .009. Further,
by Jensen's inequality, E exp(X/n) ≥ 1, so that for any sample size n, Bias²(θ̂_MLE) ≥
(e^{.5} − 1.552)². Thus for n ≥ 171 we have MSE(θ̂_MLE) ≥ Bias²(θ̂_MLE) ≥ .009 >
1.537/n = MSE(θ̂_MOM). Hence MSE(θ̂_MOM) < MSE(θ̂_MLE) for all n. Moreover, θ̂_MOM is
unbiased for all n and has asymptotic variance equal to 0. This, together with the fact that
θ̂_MLE is asymptotically biased, implies:

   MSE(θ̂_MOM)/MSE(θ̂_MLE) → 0.

Thus, asymptotically the MLE is very inefficient.
The example here bears out the efficiency of Maximum Likelihood estimates if the
likelihood is correctly specified and if the sample size is sufficiently large (Situation A). If the
likelihood is misspecified (Situation B), however, a simple Method of Moments estimator can be
much better.
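A companion sketch for Situation B, using the closed-form expression for E{exp(cX)} given above (with μ = 0), reproduces the Situation B columns of Table 5.1; the helper m(·) is our notation for E{exp(cX)} under f*:

    import numpy as np

    A, B = 0.697, 1.291          # density f*(x) = A * x**2 on [-B, B], taking mu = 0

    def m(c):
        # E{exp(c X)} under f*, from the closed-form expression in the text
        g = lambda s: (B ** 2 / c - s * 2 * B / c ** 2 + 2 / c ** 3) * np.exp(s * B * c)
        return A * (g(1.0) - g(-1.0))

    theta = m(1.0)               # approximately 1.552
    for n in (1, 2, 3, 4, 10, 20, 30, 40, 50, 100):
        mse_mom = (m(2.0) - theta ** 2) / n                      # approximately 1.537 / n
        mse_mle = ((np.exp(0.5) * m(1.0 / n) ** n - theta) ** 2
                   + np.e * (m(2.0 / n) ** n - m(1.0 / n) ** (2 * n)))
        print(f"n={n:3d}  MOM={mse_mom:.3f}  MLE={mse_mle:.3f}  ratio={mse_mom / mse_mle:.3f}")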
Table 5.1. MSE's of ML and MOM estimates of θ = E{exp(X_1)} for Situations A and B
(Ratio = MSE(θ̂_MOM)/MSE(θ̂_MLE))

                     Situation A                          Situation B
   n    MSE(θ̂_MOM)  MSE(θ̂_MLE)  Ratio    MSE(θ̂_MOM)  MSE(θ̂_MLE)  Ratio
   1      1.718        5.092      0.338      1.537        5.191      0.296
   2      0.859        1.150      0.747      0.769        2.442      0.315
   3      0.573        0.585      0.979      0.512        1.465      0.350
   4      0.430        0.382      1.123      0.384        1.019      0.377
  10      0.172        0.119      1.446      0.154        0.345      0.445
  20      0.086        0.055      1.575      0.077        0.165      0.465
  30      0.057        0.035      1.621      0.051        0.111      0.463
  40      0.043        0.026      1.645      0.038        0.084      0.456
  50      0.034        0.021      1.659      0.031        0.069      0.446
 100      0.017        0.010      1.689      0.015        0.039      0.399
  ∞       0            0          1.718      0            .0094      0
3. ESTIMATING THE MOMENTS OF A STATISTIC COMPUTED ON A RANDOM FIELD
Now consider the problem of estimating the moments of a general statistic s, when the
observations may be dependent. The statistic s may be quite complicated, and the presence of
any dependence will make the estimation problem even more difficult. Due to insufficient
knowledge of F (the marginal distribution of each X_i) and the hazards that come with
misspecification (see Section 2), we want to estimate the moments of s without making
assumptions on F. Furthermore, if we have insufficient knowledge of the marginal distribution
F, it is unrealistic to assume that the joint distribution (i.e., the dependence structure) of the
observations is known. For these reasons, we propose a completely nonparametric MOM
estimator of θ, analogous to θ̂_MOM in Section 2, which works for a large class of statistics
under many different dependence structures. Moreover, our estimator avoids the need for
detailed theoretical analysis of the statistic s.
Example:
Suppose that we have equally spaced plants on a large (and possibly irregularly shaped)
plot. Let X_j denote the yield of the plant at site j and let D_n be the set of all sites in the plot
(see Figure 5.1). We compute the α-percent trimmed mean (TM_α) yield of the sites as a
summary statistic, denoting s(D_n) := TM_α(X_j : j ∈ D_n). In order to assess the variability of s(D_n),
we want an estimate of σ_n := Var{s(D_n)} = E{[s(D_n) − E s(D_n)]²}. The distribution of yields may
be complicated (or unknown), and observations close to one another will be dependent due to
similar environmental conditions (e.g., moisture in the soil). These factors make the estimation
of σ_n a nontrivial problem.
In order to nonparametrically estimate σ_n, using a method analogous to θ̂_MOM in
Section 2, we would like to have k independent plots D_n, compute s(D_n) for each, and then
compute the sample variance of these k values. But, because we only have data on a single plot,
we need to generate "replicates" of s(·) from D_n. The basic idea is as follows. Compute the
statistic s on nonoverlapping subplots D_{i(n)} for i = 1,...,k_n, where l(n) < n determines the
common size of each subplot D_{i(n)} ⊂ D_n, and k_n denotes the number of subplots (see Figure
5.1). Then the subplot replicates of s are s(D_{i(n)}) = TM_α(X_j : j ∈ D_{i(n)}), 1 ≤ i ≤ k_n. Let |D| be
the cardinality of a set D. Define:

   σ̂_n := (1/k_n) Σ_{i=1}^{k_n} |D_{i(n)}| [s(D_{i(n)}) − s̄_n]²,  where  s̄_n := Σ_{i=1}^{k_n} s(D_{i(n)})/k_n.     (5.1)

The nonparametric MOM estimator σ̂_n is simply the sample variance of the
[standardized] replicates s(D_{i(n)}), i = 1,...,k_n.
Figure 5.1: Plot D_n with Subplot Replicates D_{i(n)} (n = 100, l(n) = 10, k_n = 67, |D_n| = 7440, |D_{1(n)}| = 68).
This type of MOM estimator has been employed successfully by Carlstein (1986a, 1988a), Possolo (1991),
and Politis and Romano (1992) in simpler scenarios (e.g., for time-series data, rectangular shapes D_n,
and linear statistics).
The target parameter here is σ := lim_{n→∞} |D_n| σ_n, the asymptotic variance of s(D_n). The
following intuitive argument explains why σ̂_n is a reasonable estimator of σ. Algebraic
manipulations show that:

   σ̂_n = (1/k_n) Σ_{i=1}^{k_n} t²(D_{i(n)}) − [(1/k_n) Σ_{i=1}^{k_n} t(D_{i(n)})]²,

where t(D) := |D|^{1/2}(s(D) − E{s(D)}). Each D_{i(n)} is separated from all but a few of the other
subplots. This implies that the t(D_{i(n)}), i = 1,...,k_n, behave as approximately independent
replicates, assuming the dependence is weak at large distances. Therefore, since E{t(D_{i(n)})} = 0,
we expect (1/k_n) Σ_{i=1}^{k_n} t(D_{i(n)}) to tend towards 0 for large k_n. Similarly,
(1/k_n) Σ_{i=1}^{k_n} t²(D_{i(n)}) should be close to E{t²(D_{i(n)})} for large k_n. For large l(n),
E{t²(D_{i(n)})} is close to σ, because lim_{n→∞} E{t²(D_n)} = σ. Thus we expect σ̂_n → σ in
probability, provided k_n → ∞, l(n) → ∞, and provided that dependence decays as distance increases.
This is simply a Law of Large Numbers argument (the same argument used to establish consistency
of MOM estimates in the i.i.d. case). Essentially the same logic holds here for complicated
statistics computed on dependent data.
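To make the subplot scheme concrete, here is a minimal Python sketch of the estimator (5.1) for a rectangular plot, with a 10-percent trimmed mean standing in for s(·); the simulated field at the end is only a stand-in for real yield data (a lightly smoothed Gaussian grid), and all function and variable names are ours:

    import numpy as np

    def trimmed_mean(a, alpha=0.1):
        # alpha-trimmed mean: drop the lowest and highest alpha fraction of the values
        x = np.sort(np.ravel(a))
        k = int(alpha * x.size)
        return x[k:x.size - k].mean()

    def subblock_variance(field, l, stat=trimmed_mean):
        # Nonparametric MOM estimate (5.1), computed from nonoverlapping
        # l x l subblocks of a rectangular data grid.
        n1, n2 = field.shape
        reps, sizes = [], []
        for i in range(n1 // l):
            for j in range(n2 // l):
                block = field[i * l:(i + 1) * l, j * l:(j + 1) * l]
                reps.append(stat(block))
                sizes.append(block.size)
        reps, sizes = np.asarray(reps), np.asarray(sizes)
        return np.mean(sizes * (reps - reps.mean()) ** 2)      # equation (5.1)

    # Stand-in for yield data: a weakly dependent Gaussian grid (3 x 3 moving average)
    rng = np.random.default_rng(0)
    white = rng.standard_normal((104, 104))
    smooth = sum(np.roll(np.roll(white, di, 0), dj, 1)
                 for di in (-1, 0, 1) for dj in (-1, 0, 1)) / 9.0
    field = smooth[2:-2, 2:-2]                                  # 100 x 100 plot
    print(subblock_variance(field, l=10))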
General Case:
We now address the problem of estimating any moment of a general statistic t. Consider
observations from a stationary random field {X_i : i ∈ Z²}. For each n, let D_n be a finite set of
lattice points in Z², at which observations are taken. Let f_{D_n} be a function from ℝ^{|D_n|} to ℝ¹
and let t(D_n) := f_{D_n}(X_j : j ∈ D_n) be a statistic of interest (f is assumed to be invariant under
translations of the set D_n). Assume that E{t^r(D_n)} → θ ∈ ℝ¹ as n → ∞, so θ is the asymptotic
r-th moment of t. Here we give a consistent MOM estimate of the parameter θ. The estimate θ̂_n,
analogous to Section 2, is simply the empirical average of the subshape replicates t^r(D_{i(n)}),
i = 1,...,k_n, where D_{i(n)} ⊂ D_n; that is,

   θ̂_n := (1/k_n) Σ_{i=1}^{k_n} t^r(D_{i(n)}).     (5.2)
We now describe the index sets D_n and D_{i(n)} precisely. Let A ⊂ (0,1] × (0,1] be the
interior of a simple closed curve, which will serve as a template for D_n and D_{i(n)}. To avoid
pathological cases we assume that the boundary of A has finite length. Now multiply the set A
by n to obtain the set A_n ⊂ (0,n] × (0,n]; i.e., A_n is the shape A inflated by a factor of n. The
data are observed at the indices in D_n := {i ∈ A_n ∩ Z²}.
Subshape replicates are obtained by dividing (0,n]² into l(n) × l(n) subsquares (of which
there are ⌊n/l(n)⌋²). In (0,l(n)] × (0,l(n)] identify the set D_{1(n)}, and do the same in each subsquare
by simply translating the origin. Since there are only data in D_n, we only use the k_n subshapes
D_{i(n)}, i = 1,...,k_n, which lie in subsquares completely contained in A_n (see Figure 5.1). By
choosing

   l(n) = ⌊βn^δ⌋ for some β > 0 and δ ∈ (0,1),     (5.3)

we get k_n → ∞ (i.e., an increasing number of replicates in D_n) and l(n) → ∞ (so that, for each i,
E{t^r(D_{i(n)})} approaches the target parameter θ).
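The following sketch illustrates the subsquare construction for an irregular template; the disk template and the four-corner containment test (exact only for convex A) are simplifications introduced here for illustration, not part of the formal definition:

    import numpy as np

    def usable_subsquares(inside, n, beta=1.0, delta=0.5):
        # l(n) = floor(beta * n**delta) as in (5.3); return l(n) and the lower-left
        # corners of the l(n) x l(n) subsquares of (0, n]^2 whose four corners all
        # lie in A_n = n*A (a crude containment check, exact for convex templates).
        l = int(np.floor(beta * n ** delta))
        corners = []
        for i in range(n // l):
            for j in range(n // l):
                x0, y0 = i * l, j * l
                pts = [(x0, y0), (x0 + l, y0), (x0, y0 + l), (x0 + l, y0 + l)]
                if all(inside(x / n, y / n) for x, y in pts):
                    corners.append((x0, y0))
        return l, corners

    # Hypothetical template A: a disk of radius 0.4 centred at (0.5, 0.5)
    disk = lambda x, y: (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.4 ** 2
    l_n, kept = usable_subsquares(disk, n=100)
    print(l_n, len(kept))        # l(n) = 10 and the number k_n of usable subsquares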
In order to formally state a consistency result for θ̂_n, the dependence in our random field
needs to be quantified. Since our approach is nonparametric, we measure the strength of
dependence by a model-free "mixing" coefficient of the type introduced by Rosenblatt (1956).
The mixing condition says roughly that observations separated by large distances are
approximately independent. Define:

   α_p(m) := sup{ |P(E_1 ∩ E_2) − P(E_1)P(E_2)| : E_i ∈ σ(A_i), |A_i| ≤ p, d(A_1,A_2) ≥ m },

where σ(A_i) contains the events depending on {X_j : j ∈ A_i} and d(A_1,A_2) is the minimal city-block
distance between the index sets A_1 and A_2.
If the observations are independent, then α_p(m) = 0 for all m ≥ 1. Here we allow
dependence, but require that α_p(m) approach 0 for large m, at some rate depending on the
cardinality p. Specifically, we assume:
(5.4)
Condition (5.4) says that, at a fixed distance m, as the cardinality p increases we allow the
dependence to increase at a rate controlled by p. As the distance increases we need the
dependence to decrease at a polynomial rate in m. To interpret the polynomial rate ε, recall that δ
determines the size of the subshape replicates. For large ε, the random field is nearly i.i.d., and
in this case δ can be very small; for smaller ε, however, we see that δ needs to be larger. This is
because small ε signifies strong dependence between sites, and the subshape replicates must be
relatively large to capture the full extent of this dependence. Mixing conditions of the form
(5.4) have been studied and justified by Bradley (1991).
Here is a formal consistency result for the MOM estimator θ̂_n (a proof is sketched in the
Appendix).
Theorem:
Assume that E{t^r(D_n)} → θ ∈ ℝ¹, that l(n) is as defined in (5.3), and that θ̂_n is as defined
in (5.2). If {t^r(D_n)} is uniformly integrable and (5.4) holds, then θ̂_n → θ in probability as n → ∞.
Application:
In the agricultural example (above) we can establish consistency of the MOM variance
estimator (5.1) by simply applying the Theorem with r = 1 and r = 2. Note that this MOM
variance estimator is consistent for any statistic (not just the α-trimmed mean) satisfying the
conditions of the Theorem.
Example 5.1: The Two-Dimensional Ising Model
To study the performance of the nonparametric MOM variance estimator in a specific
situation we consider the two-dimensional Ising model. This is a special case of an auto-model,
a generalization of autoregression to spatial data (see Besag (1974)), and more generally of a
Gibbs Random Field. For each site i ∈ Z², define N(i) to be the four "neighbors" of i at Euclidean
distance one from i, and let f(X_i | Σ_{j∈N(i)} X_j) denote the conditional distribution of X_i given the
sum of the neighboring values. The random field is assumed to be Markovian, i.e.,
f(X_i | Σ_{j∈N(i)} X_j) = f(X_i | X_j : j ∈ Z² − {i}). If we assume that each X_i takes on only the values
1 and −1 and that:

   f(1 | Σ_{j∈N(i)} X_j) = exp(β Σ_{j∈N(i)} X_j) / [exp(β Σ_{j∈N(i)} X_j) + exp(−β Σ_{j∈N(i)} X_j)],
   f(−1 | Σ_{j∈N(i)} X_j) = 1 − f(1 | Σ_{j∈N(i)} X_j),     (5.5)

we have the Ising model.
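For concreteness, a minimal Gibbs-sampler sketch for simulating a field from (5.5) is given below; the periodic boundary conditions and the number of sweeps are ad hoc choices of the sketch (the simulations reported below used 500 × 500 grids, which would be slow in pure Python):

    import numpy as np

    def ising_gibbs(n=100, beta=0.1, sweeps=200, rng=None):
        # Single-site Gibbs sampler for an n x n field with conditionals (5.5).
        # Periodic boundary and the sweep count are choices of this sketch.
        rng = np.random.default_rng() if rng is None else rng
        x = rng.choice([-1, 1], size=(n, n))
        for _ in range(sweeps):
            for i in range(n):
                for j in range(n):
                    s = (x[(i - 1) % n, j] + x[(i + 1) % n, j]
                         + x[i, (j - 1) % n] + x[i, (j + 1) % n])
                    p1 = np.exp(beta * s) / (np.exp(beta * s) + np.exp(-beta * s))
                    x[i, j] = 1 if rng.random() < p1 else -1
        return x

    field = ising_gibbs(n=60, beta=0.1, sweeps=100)   # small grid; 500 x 500 is slow here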
For a set of data on a subset of Z², the problem of interest here is estimation of β. This
has been studied by Besag (1975), who proposed a "maximum pseudo-likelihood" method of
estimating β. His suggestion has been justified by an asymptotic consistency result of Geman
and Graffigne (1986). This is, however, a computationally expensive estimator. What is proposed
here is a simple estimator in the spirit of MOM.
The Estimator β̂
Note that we can solve for β in (5.5) to obtain: β = log{f(1 | Σ_{j∈N(i)} X_j) · f(−1 | Σ_{j∈N(i)} X_j)^{−1}} / (2 Σ_{j∈N(i)} X_j).
Let f̂(1|k) := #{i : (X_i = 1) ∩ (Σ_{j∈N(i)} X_j = k)} / #{i : Σ_{j∈N(i)} X_j = k}, the proportion of neighborhoods of i
with sum equal to k that have X_i = 1. If the random field is stationary and mixing (in the
ergodic sense) then f̂(1|k) → f(1|k) almost surely by the Ergodic theorem, for k = −4, −2, 0, 2, 4. This holds if
β < β_cr (see Ellis (1984) for a discussion of β_cr). Define β̂_k := log{f̂(1|k) · f̂(−1|k)^{−1}} / (2k) for
k = −4, −2, 2, 4. Then β̂_k → β almost surely for each of these four values of k. The estimator of β is

   β̂ := (β̂_{−4} + β̂_{−2} + β̂_2 + β̂_4)/4.
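A direct implementation of β̂ from an observed ±1 grid might look as follows (interior sites only, so no boundary convention is needed; all names are ours):

    import numpy as np

    def beta_hat(x):
        # beta_hat_k = log{ fhat(1|k) / fhat(-1|k) } / (2k), averaged over k = -4, -2, 2, 4,
        # where fhat(1|k) is the proportion of sites with neighbour sum k that equal +1.
        s = x[:-2, 1:-1] + x[2:, 1:-1] + x[1:-1, :-2] + x[1:-1, 2:]   # neighbour sums
        centre = x[1:-1, 1:-1]
        ests = []
        for k in (-4, -2, 2, 4):
            mask = (s == k)
            if mask.any():
                p1 = (centre[mask] == 1).mean()                        # fhat(1|k)
                if 0.0 < p1 < 1.0:
                    ests.append(np.log(p1 / (1.0 - p1)) / (2.0 * k))
        return float(np.mean(ests))

    print(beta_hat(field))        # e.g. applied to the simulated field from the sketch above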
To judge the effectiveness of this estimator, 100 random fields, each 500 × 500, were
generated with β = .1 from model (5.5) via the Gibbs Sampler (see Geman and Geman (1984)).
β̂ was computed for each of the 100 random fields; the results are in Table 5.2. The average of
the estimates is .10005 (with an estimated standard error of .00014) and the sample variance is
.000002. Thus, it appears that this approach is an effective method of estimation in this
context.
Table 5.2: The Effectiveness of β̂ as an Estimate of β in the Ising Model
(estimated standard error in parentheses)

                        E(β̂)              Var(β̂)
   The estimate β̂      .10005 (.00014)   .000002
   True β               .10000

The first row was estimated from 100 simulations.
Estimation of Var{β̂}
Define β̂_n to be the estimate of β based on an n × n data grid. In order to assess the
accuracy of β̂_n we desire an estimate of Var{β̂_n}. For each of the 100 independent random
fields generated in the previous section we use (5.1) with k_n = 25 and l_n = 100 (i.e., 25
100 × 100 blocks). The target here is σ² = lim_{n→∞} n² Var{β̂_n} (assumed to exist). The estimator is

   σ̂² = (100)² Σ_{i=1}^{25} (β̂_{i,100} − β̄)²/25,

where β̂_{i,100} denotes the estimate of β based on the i-th 100 × 100 block, i = 1,...,25, and β̄ is
the sample mean of the 25 subblock estimates of β. σ² was estimated empirically from 1220
independent copies of 500 × 500 random fields to be σ̂²_true = .48165 (actually this is an estimate
of (500)² Var{β̂_500}). So, MSE(σ̂²) := Bias²(σ̂²) + Var(σ̂²) is approximated by

   (Σ_{i=1}^{100} σ̂²_i/100 − σ̂²_true)² + Σ_{i=1}^{100} (σ̂²_i − σ̄²)²/100,

where σ̂²_i is the estimate of σ² based on the i-th simulation, i = 1,...,100, and σ̄² denotes the
average of the 100 estimates σ̂²_i.
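In code, this subblock estimate of σ² amounts to applying (5.1) with the β̂ sketch above as the statistic; a minimal version for a 500 × 500 grid split into 25 nonoverlapping 100 × 100 blocks:

    import numpy as np

    def sigma2_hat(field, block=100):
        # Subblock estimate of sigma^2 = lim n^2 Var{beta_hat_n}: apply (5.1) with
        # nonoverlapping block x block pieces, using the beta_hat sketch above.
        n = field.shape[0]
        reps = np.array([beta_hat(field[i * block:(i + 1) * block,
                                        j * block:(j + 1) * block])
                         for i in range(n // block) for j in range(n // block)])
        return block ** 2 * np.mean((reps - reps.mean()) ** 2)   # k_n = 25 when n = 500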
We also compute the variance estimate based on overlapping blocks {X_{25i+j, 25k+l} :
j = 1,...,100, l = 1,...,100}, i = 0,...,16, k = 0,...,16, i.e., 289 (= 17²) overlapping 100 × 100 blocks. This
scheme should have approximately the same estimated bias, as both estimators have the same
expectation. It is not clear whether the variance should be larger or smaller than for
nonoverlapping blocks: there are more overlapping subblocks, but the overlap leads to larger
positive covariance terms. In the time series case it has been shown (Künsch (1989), Remark
3.3) that using all possible overlapping blocks leads (asymptotically) to a smaller Var(σ̂²) than
using nonoverlapping blocks. The results in Table 5.3 indicate that this holds in the case of
spatial data as well. In fact, the improvement found here using "partially overlapping" blocks
is more dramatic than predicted by Künsch's results, raising the possibility that there is an
optimal amount of overlap somewhere between all possible overlapping blocks and
nonoverlapping blocks. Two other explanations are possible: 1) the improvement due to
overlapping blocks is different in the time series and random field cases, or 2) the variance
estimates based on the finite data grids are not close to their asymptotic limits.
Table 5.3: The Effectiveness of the Method of Moments Variance Estimator
(estimated standard errors in parentheses)

                                    E(σ̂²)             Var(σ̂²)   MSE(σ̂²)
   σ̂² (nonoverlapping blocks)     .49733 (.01491)    .02223     .02248
   σ̂² (overlapping blocks)        .49720 (.00974)    .00950     .00974
   True σ²                         .48165

The first two rows are estimated from 100 simulations. The third row was estimated
empirically from 1220 independent calculations of β̂_500.
Performance of Confidence Intervals Based on Normality
The previous section shows that σ̂² is a good estimate of the true σ² for the Ising model.
We might hope that (β̂_500 − 1.645 σ̂/500, β̂_500 + 1.645 σ̂/500) is an approximate 90% confidence
interval for β. Based on this simulation we found that the nonoverlapping scheme had a 90%
coverage rate and the overlapping scheme had an 89% coverage rate. Thus, the coverage rates
are compatible with those based on normality of the estimate. Figure 5.2 shows that this is not
surprising. Pictured is a smoothed estimate of the density of β̂_500, based on the 100 values of
β̂_500 from the simulation. The plot strongly suggests that β̂_n is asymptotically normal.
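The interval itself is immediate once β̂ and σ̂² are in hand; for example, using the sketches above (field500 denoting one simulated 500 × 500 grid, a hypothetical variable name):

    import numpy as np

    # Approximate 90% confidence interval for beta from one 500 x 500 field,
    # using the beta_hat and sigma2_hat sketches above.
    b = beta_hat(field500)
    se = np.sqrt(sigma2_hat(field500)) / 500.0
    interval = (b - 1.645 * se, b + 1.645 * se)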
Figure 5.2: Estimate of the Density of β̂_500 (smoothed density estimate from the 100 simulated values; horizontal axis runs from 0.092 to 0.110).
APPENDIX: PROOF OF THE THEOREM
Without loss of generality, take r = 1. Let λ(A) denote the area of A and let ‖∂A‖
denote the length of its boundary. Let ν_{A_n} be the number of subsquares that intersect
A_n's boundary. Then |D_n| ≤ l²(n)(k_n + ν_{A_n}); by Apostol (1957, Theorem 10-42(e)),
ν_{A_n} ≤ 4 + 4‖∂A‖ n/l(n); and from Lemma A1 of Rudemo et al. (1990) we have
|D_n|/n² → λ(A) > 0.
These three facts, together with the definition of l(n), imply that k_n → ∞ (actually
k_n l(n)/n → ∞).
Group the l(n) × l(n) subsquares in (0,n]² into disjoint "blocks" of 4, each block being
2l(n) × 2l(n); label the 4 subsquares within a block (1,2,3,4), beginning with "1" in the lower left
and proceeding clockwise through the block. Let k_n^j denote the number of usable subshapes with
label j (j = 1,2,3,4), and denote by D^j_{i(n)}, i = 1,...,k_n^j, the i-th subshape with label j. We have:
   θ̂_n = (1/k_n) Σ_{j=1}^{4} Σ_{i=1}^{k_n^j} t(D^j_{i(n)}).

Note that k_n^j/k_n → 1/4, since Σ_{j=1}^{4} k_n^j = k_n and |k_n^j − k_n^{j'}| ≤ ν_{A_n} = o(k_n) (this last
equality follows from the previous paragraph). Thus it suffices to prove that

   T_n := (1/k_n^1) Σ_{i=1}^{k_n^1} t(D^1_{i(n)}) → θ in probability;

an analogous argument applies to j = 2, 3, 4.
Now let t*_{n,i}, i = 1,...,k_n^1, have the same marginal distributions as t(D^1_{i(n)}), i = 1,...,k_n^1, but
such that the t*_{n,i}'s are independent for fixed n, and let T*_n denote the corresponding average.
Note that d(D^1_{u(n)}, D^1_{v(n)}) ≥ l(n) for u ≠ v. Let φ*_n(s) and φ_n(s) be the characteristic functions of
T*_n and T_n, respectively. Then a chain of inequalities shows that |φ_n(s) − φ*_n(s)| → 0 for each
fixed s. The first inequality in this chain follows from Lemma 3.3 and from Ibragimov and Linnik's
"telescoping" argument (p. 338). The third inequality follows from (5.4) and from the definition
of l(n). It now suffices to show T*_n → θ in probability; this follows from the same argument as in
Carlstein (1988a, p. 297).
VI. REFERENCES
Apostol, T.M. (1957). Mathematical Analysis: A Modern Approach to Advanced Calculus, Addison-Wesley, Reading.

Ash, R.B. (1972). Real Analysis and Probability, Academic Press, New York.

Athreya, K.B. (1987). Bootstrap of the Mean in the Infinite Variance Case, in Proceedings of the First World Congress of the Bernoulli Society (Y. Prohorov and V.V. Sazonov, editors), Vol. 2, 95-98.

Basawa, I.V., Mallik, A.K., McCormick, W.P., and Taylor, R.L. (1989). Bootstrapping Explosive Autoregressive Processes, Annals of Statistics, 17, 1479-1486.

Bernstein, S. (1927). Sur l'extension du théorème limite du Calcul des Probabilités aux sommes de quantités dépendantes, Mathematische Annalen, 97, 1-59.

Besag, J. (1974). Spatial Interaction and the Statistical Analysis of Lattice Systems, Journal of the Royal Statistical Society, B36, 192-236.

Besag, J. (1975). Statistical Analysis of Non-lattice Data, The Statistician, 24, 179-195.

Bickel, P.J. and Doksum, K.A. (1977). Mathematical Statistics, Holden Day Inc., San Francisco.

Bickel, P. and Freedman, D. (1981). Some Asymptotic Theory for the Bootstrap, Annals of Statistics, 9, 1196-1217.

Bolthausen, E. (1982). On the Central Limit Theorem for Stationary Mixing Random Fields, Annals of Probability, 10, 1047-1050.

Bose, A. (1988). Edgeworth Correction by Bootstrap in Autoregressions, Annals of Statistics, 16, 1709-1722.

Bradley, R.C. (1986). Basic Properties of Strong Mixing Conditions, in Dependence in Probability (Eberlein, E. and Taqqu, M.S., editors), 165-192.

Bradley, R.C. (1991). Some Examples of Mixing Random Fields, Technical Report 342, Center for Stochastic Processes, Dept. of Statistics, University of North Carolina, Chapel Hill.

Bretagnolle, J. (1983). Lois Limites du Bootstrap de Certaines Fonctionnelles, Ann. Inst. H. Poincaré, 19, 281-296.

Carlstein, E. (1986a). The Use of Subseries Values for Estimating the Variance of a General Statistic from a Stationary Sequence, Annals of Statistics, 14, 1171-1179.

Carlstein, E. (1986b). Asymptotic Normality for a General Statistic from a Stationary Sequence, Annals of Probability, 14, 1371-1379.

Carlstein, E. (1988a). Law of Large Numbers for the Subseries Values of a Statistic from a Stationary Sequence, Statistics, 19, 295-299.

Carlstein, E. (1988b). Degenerate U-Statistics Based on Non-Independent Observations, Calcutta Statistical Assoc. Bull., 37, 55-65.
Deo, C. (1975). A Functional Central Limit Theorem for Stationary Random Fields, Annals of Probability, 3, 708-715.

Devore, J.L. (1982). Probability and Statistics for Engineering and the Sciences, Brooks and Cole Publishing Company, Monterey.

Dobruschin, P.L. (1968). The Description of a Random Field by Means of Conditional Probabilities and Conditions of its Regularity, Theory of Probability and its Applications, 13, 197-224.

Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife, Annals of Statistics, 7, 1-26.

Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans, SIAM, Philadelphia.

Ellis, R.S. (1984). Entropy, Large Deviations and Statistical Mechanics, Springer-Verlag, New York.

Freedman, D. (1984). On Bootstrapping Two-Stage Least-Squares Estimates in Stationary Linear Models, Annals of Statistics, 12, 827-842.

Gastwirth, J.L. and Rubin, H. (1975). The Asymptotic Distribution Theory of the Empiric c.d.f. for Mixing Stochastic Processes, Annals of Statistics, 3, 809-824.

Geman, S. and Geman, D. (1984). Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, IEEE Trans. Pattern Anal. Machine Intelligence, 6, 721-741.

Geman, S. and Graffigne, C. (1986). Markov Random Field Image Models and Their Applications to Computer Vision, in Proceedings of the Int. Congress of Mathematicians, Berkeley.

Goldie, C.M. and Morrow, G.J. (1986). Central Limit Questions for Random Fields, in Dependence in Probability (Eberlein, E. and Taqqu, M.S., editors), 275-289.

Gordon, L. (1974). Efficiency in Subsampling, Annals of Statistics, 2, 739-750.

Guyon, X. and Richardson, S. (1984). Vitesse de Convergence du Théorème de la Limite Centrale pour des Champs Faiblement Dépendants, Z. Wahrsch. Verw. Gebiete, 66, 297-314.

Hartigan, J.A. (1969). Using Subsample Values as Typical Values, Journal of the Amer. Stat. Assoc., 64, 1303-1317.

Hartigan, J.A. (1975). Necessary and Sufficient Conditions for Asymptotic Joint Normality of a Statistic and its Subsample Values, Annals of Statistics, 3, 573-580.

Hoeffding, W. and Robbins, H. (1948). The Central Limit Theorem for Dependent Random Variables, Duke Mathematical Journal, 15, 773-780.

Hoel, P.G., Port, S.C. and Stone, C.J. (1971). Introduction to Statistical Theory, Houghton-Mifflin, Boston.

Huber, P.J. (1981). Robust Statistics, John Wiley and Sons, New York.
Ibragimov, I.A. and Linnik, Y.V. (1971). Independent and Stationary Sequences of Random Variables, Wolters-Noordhoff Publishing, Groningen.

Kendall, M. and Stuart, A. (1977). The Advanced Theory of Statistics, Vol. I, Charles Griffin and Co. Ltd., London.

Künsch, H. (1989). The Jackknife and the Bootstrap for General Stationary Observations, Annals of Statistics, 17, 1217-1241.

Leadbetter, M.R. and Rootzen, H. (1990). On Central Limit Theory for Families of Strongly Mixing Additive Random Functions, Technical Report 295, Center for Stochastic Processes, Dept. of Statistics, University of North Carolina, Chapel Hill.

Lele, S. (1988). Nonparametric Bootstrap for Spatial Processes, Technical Report 671, Dept. of Biostatistics, Johns Hopkins University.

Loève, M. (1977). Probability Theory I, Springer-Verlag, New York.

Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. and Teller, E. (1953). Equations of State Calculations by Fast Computing Machines, Journal of Chemical Physics, 21, 1087-1092.

Nahapetian, B.S. (1980). The Central Limit Theorem for Random Fields with Mixing Conditions, in Advances in Probability, 6, Multicomponent Systems (Dobrushin, R.L. and Sinai, Y.G., editors), 531-548.

Neaderhouser, C. (1978). Limit Theorems for Multiply Indexed Mixing Random Variables with Application to Gibbs Random Fields, Annals of Probability, 6, 207-215.

Politis, D.N. and Romano, J.P. (1992, to appear). On the Sample Variance of Linear Statistics Derived from Mixing Sequences, Stochastic Processes and their Applications.

Possolo, A. (1991). Subsampling a Random Field, in Spatial Statistics and Imaging, I.M.S. Lecture Notes-Monograph Series, Vol. 20 (A. Possolo, editor), 286-294.

Preston, C.J. (1974). Gibbs States on Countable Sets, Cambridge Tracts in Mathematics 68, Cambridge University Press.

Quenouille, M. (1949). Approximate Tests of Correlation in Time Series, Journal of the Royal Stat. Soc., B11, 18-84.

Quenouille, M. (1956). Notes on Bias in Estimation, Biometrika, 43, 353-360.

Rajarshi, M.B. (1990). Bootstrap in Markov-Sequences Based on Estimates of Transition Density, Ann. Inst. Statist. Math., 42, 253-268.

Rosenblatt, M. (1956). A Central Limit Theorem and a Strong Mixing Condition, Proc. Nat. Acad. Sci. U.S.A., 42, 43-47.

Rosenblatt, M. (1985). Stationary Sequences and Random Fields, Birkhäuser, Boston, Basel, Stuttgart.

Rosenblatt, M. and Blum, J.R. (1956). A Class of Stationary Processes and a Central Limit Theorem, Proc. Nat. Acad. Sci. U.S.A., 42, 412-413.
Rudemo, M., Skovgaard, I. and Stryhn, H. (1990). Maximum Likelihood Estimation of Curves in Images, Technical Report 90-4, Dept. of Mathematics and Physics, Royal Veterinary and Agricultural University, Copenhagen.

Schuster, E.F. (1985). Incorporating Support Constraints into Nonparametric Estimators of Densities, Communications in Statistics, Series A, 14, 1123-1136.

Singh, K. (1981). On the Asymptotic Accuracy of Efron's Bootstrap, Annals of Statistics, 9, 1187-1195.

Sunklodas, J. (1986). Estimates of the Rates of Convergence in the Central Limit Theorem for Weakly Dependent Random Variables, Lithuanian Mathematics Journal, 26, 273-277.

Swanepoel, J.W.H. (1986). A Note on Proving that the (Modified) Bootstrap Works, Commun. Statist. Theory Meth., 15, 3193-3203.

Takahata, H. (1983). On the Rates in the Central Limit Theorem for Weakly Dependent Random Fields, Z. Wahrsch. Verw. Gebiete, 445-456.

Tukey, J. (1958). Bias and Confidence in Not-Quite Large Samples (abstract), Annals of Math. Statist., 29, 614.

Volkonskii, V.A. and Rozanov, Y.A. (1959). Some Limit Theorems for Random Functions, Teor. Veroyatnost. i Primenen., 4, 186-207.

Wu, C.F.J. (1990). On the Asymptotic Properties of the Jackknife Histogram, Annals of Statistics, 18, 1438-1452.

van Zwet, W. (1990). Second Order Asymptotics and the Bootstrap, Lecture notes, Dept. of Statistics, University of North Carolina, Chapel Hill.