Generalized Runs Tests for the IID Assumption

JIN SEO CHO
School of Economics and Finance
Victoria University of Wellington
PO Box 600, Wellington, 6001, New Zealand
Email: [email protected]

HALBERT WHITE
Department of Economics
University of California, San Diego
9500 Gilman Dr., La Jolla, CA, 92093-0508, U.S.A.
Email: [email protected]

First version: March, 2004. This version: January, 2007.
Abstract
We provide a family of tests for the IID hypothesis based on generalized runs, powerful
against unspecified alternatives, providing a useful complement to tests designed for specific
alternatives, such as serial correlation, GARCH, or structural breaks. Our tests have appealing
computational simplicity in that they do not require kernel density estimation, with the associated challenge of bandwidth selection; and they have power against dependent alternatives
without having to examine dependence at specific lags. Simulations show levels close to nominal asymptotic levels and encouraging power against important alternatives. We apply our
tests to residuals from a male labor supply model using the PSID and find no evidence against
the IID hypothesis.
Key Words IID condition, Runs test, Geometric Distribution, Gaussian process, Dependence,
Structural Break, Model Specification, PSID, Male labor supply.
JEL Classification C12, C23, C80, J22.
Acknowledgements The authors have benefited from discussions with Robert Davies, Chirok Han, Estate
Khmaladze, Tae-Hwy Lee, Peter Thomson, David Vere-Jones and participants at NZESG, NZSA, SRA,
UC-Riverside, and SEF and SMCS of Victoria University of Wellington.
Preliminary Version DO NOT CITE OR CIRCULATE WITHOUT AUTHORS’ PERMISSION.
1 Introduction
The assumption that data are independent and identically distributed (IID) plays a central role in the
analysis of economic data. In cross-section settings, the IID assumption holds under pure random
sampling. As Heckman (2001) notes, violation of the IID property can indicate the presence of
sample selection bias. The IID assumption is also important in time-series settings, as processes
driving time series of interest are often assumed to be IID. Moreover, transformations of certain
time series can be shown to be IID under specific null hypotheses. For example, Diebold, Gunther, and Tay (1998) show that to test density forecast optimality, one can test whether the series of probability integral transforms of the forecast errors is IID uniform (U[0,1]).
There is a large number of tests designed to test the IID assumption against specific alternatives,
such as structural breaks, serial correlation, or autoregressive conditional heteroskedasticity. Such
special purpose tests may lack power in other directions, however, so it is useful to have available
broader diagnostics that may alert researchers to otherwise unsuspected properties of their data.
Thus, as a complement to special purpose tests, we consider tests for the IID hypothesis that are
sensitive to general alternatives. Here we exploit runs statistics to obtain a necessary and sufficient
condition for data to be IID. Specifically, we show that the runs are IID with geometric distribution
if and only if the underlying data are IID. By testing whether the runs have the requisite geometric
distribution, we obtain a new family of tests, the generalized runs tests, suitable for testing the IID
property. An appealing aspect of our tests is their computational convenience relative to other tests
sensitive to general alternatives to IID. For example, Hong and White’s (2005) entropy-based IID
tests require kernel density estimation, with its associated challenge of bandwidth selection, together with specification of particular lags at which to test for dependence. Our tests do not require
kernel estimation and, as we show, have power against dependence without explicit examination
of specific lags. Our tests also have power against structural break alternatives.
Runs have formed an effective means for understanding data properties since the early 1940’s.
Mood (1940), Dodd (1942), and Goodman (1958) first studied runs to test for randomness of
data with a fixed percentile p used in defining the runs. Granger (1963) proposes using runs as a
nonparametric diagnostic for serial correlation; he also emphasizes that the choice of p is important
for the power of the test. Fama (1965) extensively exploits a runs test to examine stylized facts
of asset returns in US industries, with a particular focus on testing for serial correlation of asset
returns. Heckman (2001) observes that runs tests can be exploited to detect sample selection bias
in cross sectional data.
Earlier runs tests compared the mean or other moments of the runs to those of the geometric
distribution for fixed p. Here we develop runs tests based on the probability generating function
(PGF) of the geometric distribution. Previously, Kocherlakota and Kocherlakota (1986) (KK) have used the PGF to devise tests for discrete random variables having a given distribution under the null hypothesis. Using fixed values of the PGF parameter s, KK develop tests for the Poisson, Pascal-Poisson, bivariate Poisson, or bivariate Neyman type A distributions. More recently, Rueda, Pérez-Abreu, and O'Reilly (1991) study PGF-based tests for the Poisson null hypothesis, constructing test statistics as functionals of stochastic processes indexed by the PGF parameter s. Here we contribute by developing PGF-based tests for the geometric distribution with parameter p, applied to the runs for a given sample.
We construct our test statistics as functionals of stochastic processes indexed by both the runs
percentile p and the PGF parameter s. By not restricting ourselves to fixed values for p and/or s,
we create the opportunity to construct tests with superior power. Further, we obtain weak limits
for our statistics in situations where the distribution of the raw data from which the runs are constructed may or may not be known and where there may or may not be estimated parameters. As
pointed out by Darling (1955), Sukhatme (1972), Durbin (1973), and Henze (1996), among others,
goodness-of-fit (GOF) based statistics such as ours may have limiting distributions affected by parameter estimation. As we show, however, our test statistics have asymptotic null distributions that
are not affected by parameter estimation under mild conditions. We also provide straightforward
simulation methods to consistently estimate asymptotic critical values for our test statistics.
We conduct Monte Carlo experiments to explore the properties of our tests in settings relevant
for economic applications. We find that our tests have well-behaved levels. In studying power,
we give particular attention to dependent alternatives and to alternatives containing an unknown
number of structural breaks. For dependent alternatives, we compare our generalized runs tests
to the entropy-based tests of Robinson (1991), Skaug and Tjøstheim (1996), and Hong and White
(2005). Our tests perform respectably, showing useful, and in some cases superior, power against
dependent alternatives. For structural break alternatives, we compare our generalized runs tests
to Feller’s (1951) and Kuan and Hornik’s (1995) RR test, Brown, Durbin and Evans’s (1975)
RE-CUSUM test, Sen’s (1980) and Ploberger, Krämer and Kontrus’s (1989) RE test, Ploberger
and Krämer’s (1992) OLS-CUSUM test, Andrews’s (1993) Sup-W test, Andrews and Ploberger’s
(1994) Exp-W and Avg-W tests, and Bai’s (1996) M-test. These prior tests are all designed to
detect a finite number of structural breaks at unknown locations. An innovation here is that we
consider alternatives in which the number of breaks grows with the sample size. We find that our
new tests perform well against such structural break alternatives, whereas the prior tests do not.
Finally, we consider a model of male labor supply developed by MaCurdy (1981) and Altonji
(1986), which they estimate using the Panel Study of Income Dynamics (PSID). Using the PSID,
we test whether the IID hypothesis holds for the residuals of this model. This enables us to investigate the success of the original random sampling design, its preservation over time, the presence
or absence of structural breaks, and the possible presence or absence of model misspecification.
We find little evidence against the IID null hypothesis, supporting the design of the PSID and the
specification of the MaCurdy (1981)-Altonji (1986) model.
This paper is organized as follows. In Section 2, we introduce our new family of generalized
runs statistics and derive their asymptotic null distributions. These involve Gaussian stochastic
processes. Section 3 provides methods for consistently estimating critical values for the test statistics of Section 2. We achieve this using other easily simulated Gaussian processes whose distributions are identical to those of Section 2. Section 4 contains Monte Carlo simulations, and
Section 5 applies our methods to the PSID data. Section 6 contains concluding remarks, and all
mathematical proofs are collected in the Appendix.
Before proceeding, we introduce mathematical notation used throughout. We let 1{ · } stand for the indicator function such that 1{A} = 1 if the event A is true, and 0 otherwise. ⇒ and → denote 'converge(s) weakly' and 'converge(s) to', respectively, and =_d denotes equality in distribution. Further, ‖ · ‖ and ‖ · ‖∞ denote the Euclidean and uniform metrics, respectively. We let C(A) and D(A) be the spaces of continuous and cadlag mappings from a set A to R, respectively, and we endow these spaces with Billingsley's (1968, 1999) or Bickel and Wichura's (1971) metric. We denote the unit interval as I := [0, 1].
2 Testing the IID Condition
2.1 Maintained Assumptions
We first specify the data generating process (DGP) and a parameterized function whose behavior
is of interest.
A1 (DGP): Let (Ω, F, P) be a complete probability space. For m ∈ N, {X_t : Ω → R^m, t = 1, 2, · · · } is a stochastic process on (Ω, F, P).

A2 (PARAMETERIZATION): For d ∈ N, let Θ be a non-empty convex compact subset of R^d. Let h : R^m × Θ → R be a function such that (i) for each θ ∈ Θ, h(X_t( · ), θ) is measurable; and (ii) for each ω ∈ Ω, h(X_t(ω), · ) is such that for each θ, θ′ ∈ Θ, |h(X_t(ω), θ) − h(X_t(ω), θ′)| ≤ M_t(ω)‖θ − θ′‖, where M_t is measurable and is O_P(1), uniformly in t.

Assumption A2 specifies that X_t is transformed via h. The Lipschitz condition of (ii) is mild and typically holds in applications involving estimation. Our next assumption restricts attention to continuously distributed random variables.

A3 (CONTINUOUS RANDOM VARIABLES): For given θ∗ ∈ Θ, the random variables Y_t := h(X_t, θ∗) have continuous cumulative distribution functions (CDFs) F_t : R → I, t = 1, 2, · · · .
Our main interest attaches to distinguishing the following hypotheses:

H0: {Y_t : t = 1, 2, · · · } is an IID sequence;
H1: {Y_t : t = 1, 2, · · · } is not an IID sequence.

Under H0, F_t ≡ F (say), t = 1, 2, .... We separately treat the cases in which F is known or unknown. In the latter case, we estimate F using the empirical distribution function.
We also separately consider cases in which θ∗ is known or unknown. In the latter case, we assume θ∗ is consistently estimated by θ̂_n. Formally, we impose

A4 (ESTIMATOR): There exists a sequence of measurable functions {θ̂_n : Ω → Θ} such that √n(θ̂_n − θ∗) = O_P(1).

This condition is relevant when the behavior of transformations of the data, such as regression errors, is of interest. We pay particular attention to the effect of parameter estimation on the asymptotic null distribution of our test statistics.
2.2 Generalized Runs (GR) Tests
We begin by analyzing the case in which θ∗ and F are known. We define runs in the following two steps: first, for each p ∈ I, we let
\[
T_n(p) := \{t \in \{1, \cdots, n\} : F(Y_t) < p\}, \qquad n = 1, 2, \cdots.
\]
This set contains those indices whose percentiles F(Y_t) are less than the given number p. Next, let M_n(p) denote the (random) number of elements of T_n(p), let t_{n,i}(p) denote the ith smallest element of T_n(p), i = 1, · · · , M_n(p), and define the p-runs R_{n,i}(p) as
\[
R_{n,i}(p) := \begin{cases} t_{n,i}(p), & i = 1; \\ t_{n,i}(p) - t_{n,i-1}(p), & i = 2, \cdots, M_n(p). \end{cases}
\]
Thus, a p-run R_{n,i}(p) is the number of observations separating data values whose percentiles are less than the given value p. This notion is related to that of stopping times. Note that M_n(p)/n = p + o_P(1).
Our first result provides a characterization of runs, new to the best of our knowledge, that can
be exploited to yield a variety of runs-based tests consistent against arbitrary departures from the
IID null hypothesis.
L EMMA 1: Suppose Assumptions A1, A2(i), and A3 hold. Then for each n = 1, 2, · · · , {Yt ,
t = 1, · · · , n} is IID if and only if for every p ∈ I, {Rn,i (p), i = 1, · · · , Mn (p)} is IID with
distribution Gp , the geometric distribution with parameter p.
By Lemma 1, alternatives to IID {Y_t} may manifest as p-runs with (i) distribution G_q, q ≠ p; (ii) non-geometric distribution; (iii) heterogeneous distributions; (iv) dependence; or (v) any combination of (i)–(iv). To keep our analysis manageable, we focus on detecting (i)–(iii) by testing the p-runs for distribution G_p. As we shall see, this delivers well-behaved tests with power against both dependent and heterogeneous alternatives to IID. Specifically, we exploit a GOF statistic based on the PGF to test the G_p hypothesis. For this, let −1 < s̲ < 0 < s̄ < 1; for each s ∈ S := [s̲, s̄], define
\[
G_n(p, s) := \frac{1}{\sqrt{n}} \sum_{i=1}^{M_n(p)} \left( s^{R_{n,i}(p)} - \frac{sp}{1 - s(1-p)} \right) \quad \text{if } p \in \left( \frac{1}{n}, \frac{n-1}{n} \right), \tag{1}
\]
and G_n(p, s) := 0 otherwise. This is a scaled difference between the p-runs sample PGF and the G_p PGF.
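To make the construction concrete, the following minimal sketch (ours, not from the paper; Python with NumPy) computes the p-runs and the statistic G_n(p, s) of equation (1) when the null CDF F is known; the example draws IID U[0,1] data, so that F(Y_t) = Y_t.

```python
import numpy as np

def p_runs(u, p):
    """p-runs: gaps between 1-based indices t with F(Y_t) < p.
    `u` holds the known percentiles F(Y_t); R_{n,1} = t_{n,1} itself."""
    idx = np.flatnonzero(u < p) + 1       # 1-based indices t_{n,i}(p)
    return np.diff(idx, prepend=0)        # R_{n,i} = t_{n,i} - t_{n,i-1}

def G_n(u, p, s):
    """Scaled difference between the sample PGF of the p-runs and the
    Geometric(p) PGF, as in equation (1)."""
    n = len(u)
    if not (1/n < p < (n - 1)/n):
        return 0.0                        # G_n := 0 outside (1/n, (n-1)/n)
    R = p_runs(u, p)
    return (s**R - s*p/(1 - s*(1 - p))).sum() / np.sqrt(n)

# Example: IID U[0,1] data, so the percentiles are the data themselves.
rng = np.random.default_rng(0)
u = rng.uniform(size=500)
print(G_n(u, p=0.5, s=0.3))
```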
Two types of GOF statistics are popular in the literature: those exploiting the empirical distribution function (e.g., Darling, 1955; Sukhatme, 1972; Durbin, 1973; and Henze, 1996) and those comparing empirical characteristic or moment generating functions (MGFs) with their sample estimates (e.g., Bierens, 1990; Brett and Pinkse, 1997; Stinchcombe and White, 1998; Hong, 1999; and Pinkse, 1998). The statistic in (1) belongs to the latter type, as the PGF for discrete random variables plays the same role as the MGF, as noted by Karlin and Taylor (1975). The PGF is especially convenient because it is a rational polynomial in s, enabling us to easily handle the weak limit of the process G_n. Specifically, the rational polynomial structure permits us to represent this weak limit as an infinite sum of independent Gaussian processes, enabling us to straightforwardly estimate critical values by simulation, as examined in detail in Section 3.
Our use of Gn builds on work of Kocherlakota and Kocherlakota (1986) (KK), who consider
tests for a number of discrete null distributions, based on a comparison of sample and theoretical
PGFs for a given finite set of s’s. To test their null distributions, KK recommend choosing s’s
close to zero. Subsequently, Rueda, Pérez-Abreu, and O’Reilly (1991) examined the weak limit of
an analog of Gn ( p, · ) to test the IID Poisson null hypothesis. Here we show that if {Rn,i (p)} is a
sequence of IID p-runs distributed as Gp then Gn ( p, · ) obeys the functional central limit theorem;
test statistics can be constructed accordingly. Specifically, for each p,
\[
G_n(p, \cdot) \Rightarrow G(p, \cdot),
\]
where G(p, ·) is a Gaussian process such that for each s, s′ ∈ S, E[G(p,s)] = 0, and
\[
E[G(p,s)G(p,s')] = \frac{ss'p^2(1-s)(1-s')(1-p)}{\{1-s(1-p)\}\{1-s'(1-p)\}\{1-ss'(1-p)\}}. \tag{2}
\]
This follows by showing that {G_n(p, ·) : n = 1, 2, · · · } is tight (see Billingsley, 1999); the given covariance structure (2) is derived from E[G_n(p,s)G_n(p,s′)] under the null. Let f : C(S) → R be a continuous mapping. Then by the continuous mapping theorem, under the null any test statistic f[G_n(p, ·)] obeys
\[
f[G_n(p, \cdot)] \Rightarrow f[G(p, \cdot)].
\]
As Granger (1963) emphasizes, the power of runs tests may depend critically on the specific choice of p. To take full advantage of this, we consider the behavior of G_n as a random function of both p and s, and not just the behavior of G_n(p, ·) for given p. Specifically, under the null, a functional central limit theorem ensures that
\[
G_n \Rightarrow G,
\]
where G is a Gaussian process such that for each (p,s) and (p′,s′) with p′ ≤ p, E[G(p,s)] = 0, and
\[
E[G(p,s)G(p',s')] = \frac{ss'p'^2(1-s)(1-s')(1-p)\{1-s'(1-p)\}}{\{1-s(1-p)\}\{1-s'(1-p')\}^2\{1-ss'(1-p)\}}. \tag{3}
\]
Note that if p = p′ then the covariance structure is as in (2). For each s, G_n(·, s) is cadlag, so the tightness of {G_n} must be proved differently from that of {G_n(p, ·)}. Further, though G is continuous in p, it is not differentiable almost surely. This is because R_{n,i} is a discrete random function of p. As n increases, the discreteness of G_n disappears, but its limit is not smooth enough to deliver differentiability in p.

As before, the continuous mapping theorem ensures that, given a continuous mapping f : D(I × S) → R, under the null the test statistic f[G_n] obeys
\[
f[G_n] \Rightarrow f[G].
\]
Another way to construct tests is to use the process G_n(·, s). Under the null, we have G_n(·, s) ⇒ G(·, s), where G(·, s) is a Gaussian process such that for each p and p′ with p′ ≤ p, E[G(·, s)] = 0, and
\[
E[G(p,s)G(p',s)] = \frac{s^2 p'^2 (1-s)^2 (1-p)\{1-s(1-p)\}}{\{1-s(1-p)\}\{1-s(1-p')\}^2\{1-s^2(1-p)\}}. \tag{4}
\]
Given a continuous mapping f : D(I) → R, under the null the test statistic f[G_n(·, s)] obeys
\[
f[G_n(\cdot, s)] \Rightarrow f[G(\cdot, s)].
\]
We call tests based on f [Gn (p, · )], f [Gn (· , s)], or f [Gn ] generalized runs tests (GR tests)
to emphasize their lack of dependence on specific values of p and/or s. Our first main result
summarizes our discussion so far.
7
THEOREM 1: Given conditions A1, A2(i), A3, and the null,
(i) for each p ∈ I, G_n(p, ·) ⇒ G(p, ·), and if f : C(S) → R is continuous, then f[G_n(p, ·)] ⇒ f[G(p, ·)];
(ii) for each s ∈ S, G_n(·, s) ⇒ G(·, s), and if f : D(I) → R is continuous, then f[G_n(·, s)] ⇒ f[G(·, s)];
(iii) G_n ⇒ G, and if f : D(I × S) → R is continuous, then f[G_n] ⇒ f[G].
2.3 Empirical Generalized Runs (EGR) Tests
We now consider the case in which θ ∗ is known, but the null CDF of Yt is unknown. This is a
common situation when interest attaches to the behavior of raw data. As the null CDF is unknown,
Gn cannot be computed. Nevertheless, we can proceed by replacing the unknown F with a suitable
estimator. The empirical distribution function is especially convenient here. Specifically, for each y ∈ R, we define
\[
\tilde{F}_n(y) := \frac{1}{n} \sum_{t=1}^{n} 1\{Y_t \le y\}.
\]
This estimation requires modifying our prior definition of p-runs as follows. First, for each p, let
\[
\tilde{T}_n(p) := \{t \in \mathbb{N} : \tilde{F}_n(Y_t) < p\},
\]
let M̃_n(p) denote the (random) number of elements of T̃_n(p), and let t̃_{n,i}(p) denote the ith smallest element of T̃_n(p), i = 1, · · · , M̃_n(p). (Note that M̃_n(p)/n = p.) We define the empirical p-runs as
\[
\tilde{R}_{n,i}(p) := \begin{cases} \tilde{t}_{n,i}(p), & i = 1; \\ \tilde{t}_{n,i}(p) - \tilde{t}_{n,i-1}(p), & i = 2, \cdots, \tilde{M}_n(p). \end{cases}
\]
For each s ∈ S, define
\[
\tilde{G}_n(p,s) := \frac{1}{\sqrt{n}} \sum_{i=1}^{\tilde{M}_n(p)} \left( s^{\tilde{R}_{n,i}(p)} - \frac{sp}{1-s(1-p)} \right) \quad \text{if } p \in \left(\frac{1}{n}, \frac{n-1}{n}\right), \tag{5}
\]
and G̃_n(p, s) := 0 otherwise.
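As an illustration (ours), the only change relative to the known-F sketch of Section 2.2 is that percentiles are replaced by their empirical counterparts, F̃_n(Y_t) = rank(Y_t)/n:

```python
import numpy as np

def empirical_G_n(y, p, s):
    """EGR statistic of equation (5): as G_n, but with F replaced by the
    empirical CDF, so F~_n(Y_t) = rank(Y_t)/n (1-based ranks)."""
    n = len(y)
    if not (1/n < p < (n - 1)/n):
        return 0.0
    u = (np.argsort(np.argsort(y)) + 1) / n   # empirical percentiles
    idx = np.flatnonzero(u < p) + 1           # indices with F~_n(Y_t) < p
    R = np.diff(idx, prepend=0)               # empirical p-runs R~_{n,i}(p)
    return (s**R - s*p/(1 - s*(1 - p))).sum() / np.sqrt(n)

rng = np.random.default_rng(1)
print(empirical_G_n(rng.normal(size=500), p=0.5, s=0.3))
```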
The presence of F̃_n leads to an asymptotic null distribution for G̃_n different from that for G_n. We now examine this in detail. For convenience, for each p ∈ I, let q̃_n(p) := inf{x ∈ R : F̃_n(x) ≥ p}, let p̃_n(p) := F(q̃_n(p)), and abbreviate p̃_n(p) as p̃_n. Then (5) can be decomposed into two pieces as
\[
\tilde{G}_n = W_n + H_n,
\]
where for each (p, s),
\[
W_n(p,s) := \frac{1}{\sqrt{n}} \sum_{i=1}^{\tilde{M}_n(p)} \left( s^{\tilde{R}_{n,i}(p)} - \frac{s\tilde{p}_n}{1-s(1-\tilde{p}_n)} \right),
\]
and
\[
H_n(p,s) := \frac{\tilde{M}_n(p)}{\sqrt{n}} \left( \frac{s\tilde{p}_n}{1-s(1-\tilde{p}_n)} - \frac{sp}{1-s(1-p)} \right).
\]
Our next result relates W_n to the random function G_n, revealing H_n to be the contribution of the CDF estimation error.
LEMMA 2: Given conditions A1, A2(i), A3, and the null,
(i) sup_{(p,s) ∈ I×S} |W_n(p,s) − G_n(p,s)| = o_P(1);
(ii) H_n ⇒ H, where H is a Gaussian process such that for each (p,s) and (p′,s′) with p′ ≤ p, E[H(p,s)] = 0, and
\[
E[H(p,s)H(p',s')] = \frac{ss'pp'^2(1-s)(1-s')(1-p)}{\{1-s(1-p)\}^2\{1-s'(1-p')\}^2}. \tag{6}
\]
(iii) (W_n, H_n) ⇒ (G, H), and for each (p,s) and (p′,s′),
\[
E[G(p,s)H(p',s')] = -E[H(p,s)H(p',s')].
\]
To state our next result, we let G̃ be a Gaussian process such that for each (p,s) and (p′,s′) with p′ ≤ p, E[G̃(p,s)] = 0, and
\[
E[\tilde{G}(p,s)\tilde{G}(p',s')] = \frac{ss'p'^2(1-s)^2(1-s')^2(1-p)^2}{\{1-s(1-p)\}^2\{1-s'(1-p')\}^2\{1-ss'(1-p)\}}. \tag{7}
\]
The analog of Theorem 1 can now be given as follows.

THEOREM 2: Given conditions A1, A2(i), A3, and the null,
(i) for each p ∈ I, G̃_n(p, ·) ⇒ G̃(p, ·), and if f : C(S) → R is continuous, then f[G̃_n(p, ·)] ⇒ f[G̃(p, ·)];
(ii) for each s ∈ S, G̃_n(·, s) ⇒ G̃(·, s), and if f : D(I) → R is continuous, then f[G̃_n(·, s)] ⇒ f[G̃(·, s)];
(iii) G̃_n ⇒ G̃, and if f : D(I × S) → R is continuous, then f[G̃_n] ⇒ f[G̃].
We call tests based on f[G̃_n(p, ·)], f[G̃_n(·, s)], or f[G̃_n] empirical generalized runs tests (EGR tests) to highlight their use of the empirical distribution function. We emphasize that the distributions of the GR and EGR tests differ, as the CDF estimation error survives in the limit.
2.4 Parametric Empirical Generalized Runs (PEGR) Tests
Now we consider the consequences of estimating θ ∗ by θ̂ n satisfying A4. As noted by Darling
(1955), Sukhatme (1972), Durbin (1973), and Henze (1996), this can affect the asymptotic null
distribution of GOF-based test statistics. Somewhat remarkably, it turns out that this does not
occur here.
We elaborate our notation to handle parameter estimation. Let Ŷ_{n,t} := h(X_t, θ̂_n) and let
\[
\hat{F}_n(y) := \frac{1}{n} \sum_{t=1}^{n} 1\{\hat{Y}_{n,t} \le y\},
\]
so that F̂_n is the empirical CDF of Ŷ_{n,t}. For each p, we now let
\[
\hat{T}_n(p) := \{t \in \mathbb{N} : \hat{F}_n(\hat{Y}_{n,t}) < p\},
\]
let M̂_n(p) denote the (random) number of elements of T̂_n(p), and let t̂_{n,i}(p) denote the ith smallest element of T̂_n(p), i = 1, · · · , M̂_n(p). (Note that M̂_n(p)/n = p.) We define the parametric empirical p-runs as
\[
\hat{R}_{n,i}(p) := \begin{cases} \hat{t}_{n,i}(p), & i = 1; \\ \hat{t}_{n,i}(p) - \hat{t}_{n,i-1}(p), & i = 2, \cdots, \hat{M}_n(p). \end{cases}
\]
For each s ∈ S, define
\[
\hat{G}_n(p,s) := \frac{1}{\sqrt{n}} \sum_{i=1}^{\hat{M}_n(p)} \left( s^{\hat{R}_{n,i}(p)} - \frac{sp}{1-s(1-p)} \right) \quad \text{if } p \in \left(\frac{1}{n}, \frac{n-1}{n}\right),
\]
and Ĝ_n(p, s) := 0 otherwise.
To show that estimating θ∗ has no asymptotic impact, we decompose Ĝ_n as Ĝ_n = G̈_n + Ḧ_n, where, letting q̃_n(p) := inf{y ∈ R : F̃_n(y) ≥ p} and p̃_n := F(q̃_n(p)) as above, we define
\[
\ddot{G}_n(p,s) := \frac{1}{\sqrt{n}} \sum_{i=1}^{\hat{M}_n(p)} \left( s^{\hat{R}_{n,i}(p)} - \frac{s\tilde{p}_n}{1-s(1-\tilde{p}_n)} \right),
\]
and
\[
\ddot{H}_n(p,s) := \frac{\hat{M}_n(p)}{\sqrt{n}} \left( \frac{s\tilde{p}_n}{1-s(1-\tilde{p}_n)} - \frac{sp}{1-s(1-p)} \right).
\]
Our next result extends Lemma 2.

LEMMA 3: Given conditions A1–A4 and the null,
(i) sup_{(p,s) ∈ I×S} |G̈_n(p,s) − G_n(p,s)| = o_P(1);
(ii) sup_{(p,s) ∈ I×S} |Ḧ_n(p,s) − H_n(p,s)| = o_P(1).
The analog of Theorems 1 and 2 can now be given.

THEOREM 3: Given conditions A1–A4 and the null,
(i) for each p ∈ I, Ĝ_n(p, ·) ⇒ G̃(p, ·), and if f : C(S) → R is continuous, then f[Ĝ_n(p, ·)] ⇒ f[G̃(p, ·)];
(ii) for each s ∈ S, Ĝ_n(·, s) ⇒ G̃(·, s), and if f : D(I) → R is continuous, then f[Ĝ_n(·, s)] ⇒ f[G̃(·, s)];
(iii) Ĝ_n ⇒ G̃, and if f : D(I × S) → R is continuous, then f[Ĝ_n] ⇒ f[G̃].

We call tests based on f[Ĝ_n(p, ·)], f[Ĝ_n(·, s)], or f[Ĝ_n] parametric empirical generalized runs tests (PEGR tests) to highlight their use of estimated parameters. By Theorem 3, the asymptotic null distribution of f[Ĝ_n] is identical to that of f[G̃_n], which takes θ∗ as known.
3 Simulating Asymptotic Critical Values
Obtaining critical values for test statistics constructed as functions of Gaussian processes can be
challenging. Nevertheless, the rational polynomial structure of our statistics permits us to construct
representations of G and G̃ as infinite sums of independent Gaussian random functions. Straightforward simulations then deliver the desired critical values.
For this, we use the Karhunen-Loève (K-L) representation (Loève, 1978, ch. 11) of a stochastic
process, which represents Brownian motion as an infinite sum of sine functions multiplied by independent Gaussian random coefficients. Grenander (1981) describes this representation as a complete orthogonal system (CONS) and provides many examples. In econometrics, Phillips (1998)
has used the K-L representation to obtain asymptotic critical values for testing cointegration. Andrews’s (2001) analysis of test statistics for a GARCH(1,1) model with nuisance parameter not
identified under the null also exploits a CONS representation. By theorem 2 of Jain and Kallianpur
(1970), Gaussian processes with almost surely continuous paths have a CONS representation and
can be approximated uniformly. We apply this result to our GR and (P)EGR test statistics; this
straightforwardly delivers reliable asymptotic critical values.
3.1 Generalized Runs Tests
A fundamental property of Gaussian processes is that two Gaussian processes have identical distributions if their covariance structures are the same. We use this fact to represent G(p, · ), G( · , s),
and G as infinite sums of independent Gaussian processes that can be straightforwardly simulated.
To obtain critical values for GR tests, we can use the Gaussian process Z defined by
\[
Z(p,s) := \frac{sp(1-s)B_0^0(p)}{\{1-s(1-p)\}^2} + \frac{(1-s)^2}{\{1-s(1-p)\}^2} \sum_{j=1}^{\infty} s^j (1-p)^{1+j} B_j\!\left(\frac{p^2}{(1-p)^{1+j}}\right), \tag{8}
\]
where {B_j : j = 1, 2, · · · } is a sequence of independent standard Brownian motions, and B_0^0 is a Brownian bridge independent of {B_j}. It is straightforward to compute E[Z(p,s)Z(p′,s′)]. Specifically, if p′ ≤ p then
\[
E[Z(p,s)Z(p',s')] = \frac{ss'p'^2(1-s)(1-s')(1-p)\{1-s'(1-p)\}}{\{1-s(1-p)\}\{1-s'(1-p')\}^2\{1-ss'(1-p)\}}.
\]
This covariance structure is identical to (3), so Z has the same distribution as G. The processes B_0^0 and {B_j} are readily simulated as a consequence of Donsker's (1951) theorem, ensuring that critical values for any statistic f[G_n] can be straightforwardly found by Monte Carlo methods.
An inconvenient computational aspect of Z is that the terms B_j require evaluation at arbitrary points on the real line, as p²/(1−p)^{1+j} becomes unbounded as p approaches unity. An alternative representation that avoids this is the Gaussian process
\[
Z^*(p,s) := \frac{sp(1-s)B_0^0(p)}{\{1-s(1-p)\}^2} + \frac{(1-s)^2}{\{1-s(1-p)\}^2} \sum_{j=1}^{\infty} s^j B_j^s\!\left(p^2, (1-p)^{1+j}\right), \tag{9}
\]
where {B_j^s(p,q) : j = 1, 2, · · · } is a sequence of independent Brownian sheets: E[B_j^s(p,q)B_i^s(p′,q′)] = 1{i=j} min[p,p′] · min[q,q′]. The arguments of B_j^s lie only in the unit interval, and it is readily verified that E[Z^*(p,s)Z^*(p′,s′)] is identical to (3).
Although one can obtain asymptotic critical values for p-runs test statistics f[G_n(p, ·)] by fixing p in (8) or (9), there is a much simpler representation for G(p, ·). Specifically, consider the process Z_p defined by
\[
Z_p(s) := \frac{sp(1-s)(1-p)^{1/2}}{1-s(1-p)} \sum_{j=0}^{\infty} s^j (1-p)^{j/2} Z_j,
\]
where {Z_j} is a sequence of IID standard normals. It is readily verified that for each p, E[Z_p(s)Z_p(s′)] is identical to (2). Because Z_p does not involve the Brownian bridge, Brownian motions, or Brownian sheets, it is more efficient to simulate than Z(p, ·). This convenient representation arises from the symmetry of equation (3) in s and s′ when p = p′. The fact that equation (3) is asymmetric in p and p′ when s = s′ implies that a similar convenient representation for G(·, s) is not available. Instead, we obtain asymptotic critical values for test statistics f[G_n(·, s)] by fixing s in (8) or (9).
We summarize these results as follows.

THEOREM 4: (i) For each p ∈ I, G(p, ·) =_d Z_p, and if f : C(S) → R is continuous, then f[G(p, ·)] =_d f[Z_p];
(ii) G =_d Z =_d Z^*, and if f : D(I × S) → R is continuous, then f[G] =_d f[Z] =_d f[Z^*].

As deriving the covariance structures of the relevant processes is straightforward, we omit the proof of Theorem 4 from the Appendix.
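To illustrate Theorem 4(i), here is a sketch (ours; the truncation point, grid, and replication count are our choices, not the paper's) that simulates Z_p and estimates asymptotic critical values for the sup-statistic sup_{s∈S} |G_n(p, ·)|:

```python
import numpy as np

def sup_Zp_draws(p, n_rep=10_000, n_terms=51, seed=0):
    """Monte Carlo draws of sup_{s in S} |Z_p(s)|, truncating the infinite
    sum in the Theorem 4(i) representation at n_terms."""
    rng = np.random.default_rng(seed)
    s = np.linspace(-0.99, 0.99, 199)                       # grid on S
    j = np.arange(n_terms)
    lead = s * p * (1 - s) * np.sqrt(1 - p) / (1 - s * (1 - p))
    basis = s[:, None] ** j * (1 - p) ** (j / 2)            # s^j (1-p)^{j/2}
    Z = rng.standard_normal((n_rep, n_terms))               # IID N(0,1) coefficients
    paths = (Z @ basis.T) * lead                            # Z_p(s) on the grid
    return np.abs(paths).max(axis=1)

draws = sup_Zp_draws(p=0.5)
print(np.quantile(draws, [0.90, 0.95, 0.99]))               # critical values
```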
3.2 (Parametric) Empirical Generalized Runs Tests
For the EGR statistics, we can similarly provide a Gaussian process whose covariance structure is
the same as (7) and that can be straightforwardly simulated. By Theorem 3, this Gaussian process
also yields critical values for PEGR test statistics.
We begin with a representation for H. Specifically, consider the Gaussian process X defined by
\[
X(p,s) := -\frac{sp(1-s)}{\{1-s(1-p)\}^2} B_0^0(p),
\]
where B_0^0 is a Brownian bridge as before. It is straightforward to show that when p′ ≤ p, E[X(p,s)X(p′,s′)] is the same as (6). The representation Z for G in Theorem 4(ii) and the covariance structure for G and H required by Lemma 2(iii) together suggest representing G̃ as Z̃ or Z̃^*, defined by
\[
\tilde{Z}(p,s) := \frac{(1-s)^2}{\{1-s(1-p)\}^2} \sum_{j=1}^{\infty} s^j (1-p)^{1+j} B_j\!\left(\frac{p^2}{(1-p)^{1+j}}\right) \tag{10}
\]
and
\[
\tilde{Z}^*(p,s) := \frac{(1-s)^2}{\{1-s(1-p)\}^2} \sum_{j=1}^{\infty} s^j B_j^s\!\left(p^2, (1-p)^{1+j}\right), \tag{11}
\]
respectively, so that Z̃ (resp. Z̃^*) is the sum of Z (resp. Z^*) and X with the identical B_0^0 in each. As is readily verified, (7) is the same as E[Z̃(p′,s′)Z̃(p,s)] and E[Z̃^*(p′,s′)Z̃^*(p,s)]. Thus, simulating (10) or (11) can deliver the asymptotic null distribution of G̃_n and Ĝ_n.
Similar to the previous case, the following representation is convenient when p is fixed:
\[
\tilde{Z}_p(s) := \frac{sp(1-s)(1-p)^{1/2}}{1-s(1-p)} \sum_{j=0}^{\infty} \left\{ s^j - \frac{p}{1-s(1-p)} \right\} (1-p)^{j/2} Z_j.
\]
For fixed s, we use the representation provided by Z̃(·, s) or Z̃^*(·, s).
We summarize these results as follows.

THEOREM 5: (i) H =_d X;
(ii) for each p ∈ I, G̃(p, ·) =_d Z̃_p, and if f : C(S) → R is continuous, then f[G̃(p, ·)] =_d f[Z̃_p];
(iii) G̃ =_d Z̃ =_d Z̃^*, and if f : D(I × S) → R is continuous, then f[G̃] =_d f[Z̃] =_d f[Z̃^*].

As deriving the covariance structures of the relevant processes is straightforward, we omit the proof of Theorem 5 from the Appendix.
4 Monte Carlo Simulations
In this section, we use Monte Carlo simulation to obtain critical values for test statistics constructed with f delivering the L1 and uniform norms of its argument. We also examine level and power properties of tests based on these critical values.
4.1 Critical Values
We consider the following statistics:
\[
T_{1,n}^{p} := \int_{-1}^{1} |G_n(p,s)|\,ds, \qquad
T_{\infty,n}^{p} := \sup_{s \in S} |G_n(p,s)|,
\]
\[
\tilde{T}_{1,n}^{p} := \int_{-1}^{1} |\tilde{G}_n(p,s)|\,ds, \qquad
\tilde{T}_{\infty,n}^{p} := \sup_{s \in S} |\tilde{G}_n(p,s)|,
\]
where S := [s̲, s̄] with s̲ = −0.99 and s̄ = 0.99, and p is an element of {0.1, 0.3, 0.5, 0.7, 0.9}; and
\[
T_{1,n} := \int_{0}^{1}\!\int_{-1}^{1} |G_n(p,s)|\,ds\,dp, \qquad
T_{\infty,n} := \sup_{(p,s) \in I \times S} |G_n(p,s)|,
\]
\[
\tilde{T}_{1,n} := \int_{0}^{1}\!\int_{-1}^{1} |\tilde{G}_n(p,s)|\,ds\,dp, \qquad
\tilde{T}_{\infty,n} := \sup_{(p,s) \in I \times S} |\tilde{G}_n(p,s)|.
\]
Theorems 4 and 5 ensure that the asymptotic null distributions of these statistics can be generated by simulating Z_p, Z (or Z^*), Z̃_p, and Z̃ (or Z̃^*), as suitably transformed. We approximate these using
\[
W_p(s) := \frac{sp(1-s)(1-p)^{1/2}}{1-s(1-p)} \sum_{j=0}^{50} s^j (1-p)^{j/2} Z_j,
\]
\[
W(p,s) := \frac{sp(1-s)}{\{1-s(1-p)\}^2}\,\tilde{B}_0^0(p) + \frac{(1-s)^2}{\{1-s(1-p)\}^2} \sum_{j=1}^{50} s^j (1-p)^{1+j}\, \tilde{B}_j\!\left(\frac{p^2}{(1-p)^{1+j}}\right),
\]
\[
\widetilde{W}_p(s) := \frac{sp(1-s)(1-p)^{1/2}}{1-s(1-p)} \sum_{j=0}^{50} \left\{ s^j - \frac{p}{1-s(1-p)} \right\} (1-p)^{j/2} Z_j,
\]
and
\[
\widetilde{W}(p,s) := \frac{(1-s)^2}{\{1-s(1-p)\}^2} \sum_{j=1}^{50} s^j (1-p)^{1+j}\, \tilde{B}_j\!\left(\frac{p^2}{(1-p)^{1+j}}\right),
\]
respectively, where
\[
\tilde{B}_0^0(p) := \tilde{B}_0(p) - p\tilde{B}_0(1), \qquad
\tilde{B}_j(x + p - 1) := 100^{-1/2} \sum_{k=1}^{x} \sum_{i=1}^{[100p]} Z_{i,k}^{(j)},
\]
j = 0, 1, · · · , 50; x = 1, 2, · · · , 7499; and Z_{i,k}^{(j)} ∼ IID N(0,1) with respect to i, k, and j. We evaluate these functions for p and s on the grids {0.01, 0.02, · · · , 1.00} and {−0.99, −0.98, · · · , 0.98, 0.99}, respectively.
Concerning these approximations, three comments are in order. First, the domains for p and
s are approximated using a relatively fine grid. Second, we truncate the sum of the independent
Brownian motions at 50 terms. The jth term contributes a random component with a standard deviation of s^j(1−p)^{(1+j)/2}p, which vanishes quickly as j tends to infinity. Finally, the Brownian
motions defined on the positive real line are approximated by partial sum processes with jumps
at {0, 0.01, 0.02, · · · , 7498.98, 7498.99, 7499.00}. Preliminary experiments showed the impact of
these approximations to be small; we briefly discuss certain aspects of these experiments below.
Table 1 contains the critical values generated by 10,000 replications of these processes.
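As an illustration (our sketch, not the authors' code), one way to implement this simulation uses the Brownian-sheet representation Z̃^* of equation (11), whose evaluation points stay in the unit square; the lattice size and replication count below are deliberately small, whereas Table 1 is based on 10,000 replications:

```python
import numpy as np

def Z_tilde_star_draw(rng, grid=100, n_terms=50):
    """One draw of the truncated Z~*(p,s) of equation (11); each Brownian
    sheet B_j^s on [0,1]^2 is approximated by a 2-D partial-sum process
    on a grid x grid lattice."""
    p = np.arange(0.01, 1.0, 0.01)
    s = np.linspace(-0.99, 0.99, 199)
    out = np.zeros((p.size, s.size))
    for j in range(1, n_terms + 1):
        # 2-D partial sums of IID N(0, 1/grid^2) increments approximate a sheet
        sheet = rng.standard_normal((grid, grid)).cumsum(0).cumsum(1) / grid
        ix = np.minimum((p**2 * grid).astype(int), grid - 1)
        iy = np.minimum(((1 - p) ** (1 + j) * grid).astype(int), grid - 1)
        out += np.outer(sheet[ix, iy], s**j)   # B_j^s(p^2, (1-p)^{1+j}) s^j
    return out * (1 - s) ** 2 / (1 - s * (1 - p)[:, None]) ** 2

rng = np.random.default_rng(3)
draws = [np.abs(Z_tilde_star_draw(rng)).max() for _ in range(100)]
print(np.quantile(draws, 0.95))   # approx. critical value for sup |G~_n|
```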
4.2 Level and Power of the Test Statistics
In this section, we compare the level and power of generalized runs tests with other tests in the
literature. We conduct two sets of experiments. The first examines power against dependent alternatives. The second examines power against structural break alternatives.
To examine power against dependent alternatives, we follow Hong and White (2005) and consider the following DGPs (a simulation sketch for the first three follows below):

DGP 1.1: X_t = ε_t;
DGP 1.2: X_t = 0.3X_{t−1} + ε_t;
DGP 1.3: X_t = h_t^{1/2}ε_t, with h_t = 1 + 0.8X²_{t−1};
DGP 1.4: X_t = h_t^{1/2}ε_t, with h_t = 0.25 + 0.6h_{t−1} + 0.5X²_{t−1}1{ε_{t−1} < 0} + 0.2X²_{t−1}1{ε_{t−1} ≥ 0};
DGP 1.5: X_t = 0.8X_{t−1}ε_{t−1} + ε_t;
DGP 1.6: X_t = 0.8ε²_{t−1} + ε_t;
DGP 1.7: X_t := 0.4X_{t−1}1{X_{t−1} > 1} − 0.5X_{t−1}1{X_{t−1} ≤ 1} + ε_t;
DGP 1.8: X_t = 0.8|X_{t−1}|^{0.5} + ε_t;
DGP 1.9: X_t = sgn(X_{t−1}) + 0.43ε_t;

where ε_t ∼ IID N(0,1). Note that DGP 1.1 satisfies the null hypothesis, whereas the other DGPs represent interesting dependent alternatives. As there is no parameter estimation, we apply our EGR statistics and compare these to the entropy-based nonparametric statistics of Robinson (1991), Skaug and Tjøstheim (1996), and Hong and White (2005), denoted as R_n, ST_n, and HW_n, respectively.
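A sketch (ours; the burn-in length is our choice) of how the first three series might be generated:

```python
import numpy as np

def simulate_dgp(name, n, rng, burn=500):
    """Generate DGPs 1.1-1.3; the first `burn` draws are discarded so the
    series starts near its stationary distribution."""
    e = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        if name == "1.1":                              # IID null
            x[t] = e[t]
        elif name == "1.2":                            # AR(1)
            x[t] = 0.3 * x[t-1] + e[t]
        elif name == "1.3":                            # ARCH(1)
            x[t] = np.sqrt(1 + 0.8 * x[t-1]**2) * e[t]
    return x[burn:]

rng = np.random.default_rng(4)
x = simulate_dgp("1.2", 200, rng)
```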
We present the results in Tables 2 to 4. To summarize, the EGR test statistics show approximately correct levels. As the sample size increases, we generally observe more accurate nominal levels. For the alternatives, the EGR test statistics generally gain power as n increases. As noted by Granger (1963), a particular selection of p or, more generally, the choice of mapping f can yield tests with better or worse power. Generally, we see that the T̃^p_{1,n}-based tests outperform the T̃^p_{∞,n}-based tests. Similarly, the T̃_{1,n}-based tests outperform the T̃_{∞,n}-based tests. Among the T̃^p_{1,n}-based tests, more extreme choices for p generally yield better power, although DGP 1.2 is a notable exception. Nor is there a clear-cut relation between T̃^p_{1,n}-based tests and T̃_{1,n}-based tests. The more powerful T̃^p_{1,n}-based tests dominate the T̃_{1,n}-based tests for DGPs 1.3–1.5, but the T̃_{1,n}-based tests dominate for DGPs 1.2 and 1.6–1.9. When compared to the entropy-based test statistics, we observe four notable features. First, T̃_{1,n}-based tests generally dominate entropy-based tests for DGPs 1.2 and 1.8. Second, for DGPs 1.3 and 1.7, entropy-based tests are more powerful than the EGR tests. Third, for the other DGPs, the best powered EGR tests exhibit power roughly similar to that of the best powered entropy-based tests. Finally, the best powered EGR tests outperform the best powered entropy-based tests more often for smaller sample sizes.
For structural break alternatives, we compare our PEGR tests to a variety of well-known tests. These include Feller's (1951) and Kuan and Hornik's (1995) RR test, Brown, Durbin and Evans's (1975) RE-CUSUM test, Sen's (1980) and Ploberger, Krämer and Kontrus's (1989) RE test, Ploberger and Krämer's (1992) OLS-CUSUM test, Andrews's (1993) Sup-W test, Andrews and Ploberger's (1994) Exp-W and Avg-W tests, and Bai's (1996) M-test.¹ Appendix B contains formulas for these statistics. As these are all devised to test for a single structural break at an unknown point, they may not perform well when there are multiple breaks. In contrast, our PEGR statistics are designed to detect general alternatives to IID, so we expect these may perform well in such situations.

¹ We also examined Chu, Hornik, and Kuan's (1995a) ME test and Chu, Hornik, and Kuan's (1995b) RE-MOSUM and OLS-MOSUM tests. Their performance is comparable to that of the other prior tests, so for brevity we do not report those results here.

We consider the following DGPs for our Monte Carlo simulations (a sketch of the alternating-regime design follows the list):

DGP 2.1: Y_t := Z_t + ε_t;
DGP 2.2: Y_t := exp(Z_t) + ε_t;
DGP 2.3: Y_t := Z_t1{t ≤ ⌊0.5n⌋} − Z_t1{t > ⌊0.5n⌋} + ε_t;
DGP 2.4: Y_t := Z_t1{t ≤ ⌊0.3n⌋} − Z_t1{t > ⌊0.3n⌋} + ε_t;
DGP 2.5: Y_t := Z_t1{t ≤ ⌊0.1n⌋} − Z_t1{t > ⌊0.1n⌋} + ε_t;
DGP 2.6: Y_t := exp(Z_t)1{t ≤ ⌊0.5n⌋} + exp(−Z_t)1{t > ⌊0.5n⌋} + ε_t;
DGP 2.7: Y_t := Z_t1{t ∈ K_n(0.02)} − Z_t1{t ∉ K_n(0.02)} + ε_t;
DGP 2.8: Y_t := Z_t1{t ∈ K_n(0.05)} − Z_t1{t ∉ K_n(0.05)} + ε_t;
DGP 2.9: Y_t := Z_t1{t ∈ K_n(0.1)} − Z_t1{t ∉ K_n(0.1)} + ε_t;
DGP 2.10: Y_t := Z_t1{t odd} − Z_t1{t even} + ε_t;
DGP 2.11: Y_t := exp(0.1Z_t)1{t odd} + exp(Z_t)1{t even} + ε_t,

where Z_t = 0.5Z_{t−1} + u_t; (ε_t, u_t)′ ∼ IID N(0, I₂); and K_n(r) := {t = 1, · · · , n : (k−1)/r + 1 ≤ t ≤ k/r, k = 1, 3, 5, · · · }.
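To make the alternating-regime design concrete, a sketch (ours) of the K_n(r) membership indicator and of DGP 2.8; the number of regime switches grows linearly with n while the regime length 1/r stays fixed:

```python
import numpy as np

def in_Kn(t, r):
    """True if period t (1-based) lies in K_n(r): consecutive blocks of
    length 1/r, keeping the blocks with odd k = 1, 3, 5, ..."""
    block = (np.asarray(t) - 1) // int(round(1 / r))  # 0-based block index
    return block % 2 == 0                             # odd k <=> even index

rng = np.random.default_rng(5)
n = 200
t = np.arange(1, n + 1)
u, eps = rng.standard_normal((2, n))
Z = np.zeros(n)
for i in range(1, n):                                 # Z_t = 0.5 Z_{t-1} + u_t
    Z[i] = 0.5 * Z[i-1] + u[i]
sign = np.where(in_Kn(t, 0.05), 1.0, -1.0)
Y = sign * Z + eps                                    # DGP 2.8: slope flips across regimes
```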
For DGPs 2.1, 2.3–2.5, and 2.7–2.10, we use ordinary least squares (OLS) to estimate the parameters of a linear model
\[
Y_t = \alpha + \beta Z_t + v_t,
\]
and we apply our PEGR statistics to the prediction errors v̂_t := Y_t − α̂ − β̂Z_t. The linear model is correctly specified for DGP 2.1, but is misspecified for DGPs 2.3–2.5 and 2.7–2.10. Thus, when DGP 2.1 is considered, the null hypothesis holds, permitting an examination of the level of the tests. As the model is misspecified for DGPs 2.3–2.5 and 2.7–2.10, the alternative holds for v̂_t, permitting an examination of power. DGPs 2.3–2.5 exhibit a single structural break at different break points, permitting us to see how the PEGR tests compare to standard structural break tests specifically designed to detect such alternatives. DGPs 2.7 to 2.10 are deterministic mixtures in which the true coefficient of Z_t depends on whether or not t belongs to a particular structural regime. The number of structural breaks increases as the sample size increases, but the proportion of breaks to the sample size is constant. Also, the break points are equally spaced. Thus, for example, there are four break points in DGP 2.8 when the sample size is 100 and nine break points when the sample size is 200. The extreme case is DGP 2.10, in which the proportion is equal to one, and the coefficient of Z_t depends on whether or not t is even. Given the regular pattern of these breaks, this may be hard to distinguish from a DGP without a structural break.
For DGPs 2.2, 2.6, and 2.11, we use nonlinear least squares (NLS) to estimate the parameters of a nonlinear model
\[
Y_t = \exp(\beta Z_t) + v_t,
\]
and we apply our PEGR statistics to the prediction errors v̂_t := Y_t − exp(β̂Z_t). The situation is analogous to that for the linear model, in that the null holds for DGP 2.2, whereas the alternative holds for 2.6 and 2.11. Examining these alternatives permits an interesting comparison of the PEGR tests, designed for general use, to the RR, RE, M, OLS-CUSUM, and RE-CUSUM statistics, which are expressly designed for use with linear models.
Our simulation results are presented in Tables 5 to 7. To summarize, the levels of the PEGR
tests are approximately correct for both linear and nonlinear cases and generally improve as the
sample size increases. On the other hand, there are evident level distortions for some of the other
statistics, especially for the nonlinear DGP 2.2 with the linear model statistics. The PEGR statistics also have respectable power. They appear consistent against our structural break alternatives,
although the PEGR tests are not as powerful as the other break tests when there is a single structural break. This is as expected, as the other tests are specifically designed to detect a single break,
whereas the PEGR test is not. As one might also expect, the power of the PEGR tests diminishes
notably as the break moves away from the center of the sample. Nevertheless, the relative performance of the tests reverses when there are multiple breaks. All test statistics lose power as the
proportion of breaks increases, but the loss of power for the non-PEGR tests is much faster than
for the PEGR tests. For the extreme alternative DGP 2.10, the PEGR tests appear to be the only
consistent tests.
We also note that, as for the dependent alternatives, the T̃^p_{1,n}- (resp. T̃_{1,n}-) based tests outperform the T̃^p_{∞,n}- (resp. T̃_{∞,n}-) based tests. For these specific structural break alternatives, however, the T̃_{1,n}-based tests always outperform the T̃^p_{1,n}-based tests.
5 Empirical Application
The Panel Study of Income Dynamics (PSID) is a dataset in which an initial panel for 1968 was
constructed by random sampling, followed by successive annual waves designed to preserve the
random sampling structure. This dataset has been extensively analyzed, especially by labor economists, who typically exploit the random sampling design to justify IID assumptions for particular
aspects of the data generating process. For example, MaCurdy (1981) and Altonji (1986) used
the PSID to estimate a model of male labor supply based on dynamic optimization under uncertainty. Here we estimate their model using panels from the PSID constructed using several different
waves. We apply our generalized runs tests to the estimated residuals of these models, permitting
us to assess not only whether or not the IID assumption is valid for a given panel, but also the
extent to which this validity may change through time. We can thus assess not only the success
of the original sample design, but also whether there may have been structural shifts over time or
misspecification of the labor supply model.
Formally, we test the null hypothesis that for some real vector β∗ and for each t = 1, · · · , T,
\[
H_0': \{Y_{it} - \beta_*' X_{it} : i = 1, \cdots, N_T\} \text{ is IID}
\]
against the alternative H1′ that the null is false, where Y_{it} and X_{it} are fully described below, and N_T is the total number of individuals in the balanced panel beginning in 1968 and ending in year T.
The PSID contains data both at the individual and at the household level. We restrict attention to household-level data, and we apply the following data exclusion rules. Observations are excluded if:

1. The household is from the Survey of Economic Opportunity (SEO) group;²
2. The household is not surveyed in 1968;
3. The household reports missing information for our dependent variable, regressors, or instruments;
4. The household head has zero income with positive working hours;
5. The household head is not a married male in 1968;
6. The household composition changes;
7. The household head is retired, disabled, a housewife, etc.;
8. The household is outside the US;
9. The household head worked more than 8,760 hours a year;
10. The household head has zero working hours for a year;
11. The household head's reported age is inconsistent.³

² The PSID consists of two samples, the SRC and the SEO. The latter is obtained by separately sampling low-income families. Applying the PEGR tests to SRC and SEO together strongly rejects the IID null.
³ For example, in some cases, the household head's age as reported in a particular year and an adjacent year differs by more than two years. For other examples, refer to http://www.vuw.ac.nz/staff/js-cho/iidtesting.html.
In order to have sufficient observations to obtain reliable coefficient estimates, we consider panel
data sets constructed using data through 1980. Table 8 displays the numbers of observations remaining after applying these exclusion rules for annual waves from 1968 through 1980. We apply
our tests to five data sets: an initial cross section (N68 = 1,129) and four panels, containing three,
six, nine, and twelve years of data respectively. For these panels, the datasets have NT × T =
2,817, 4,182, 4,401, and 4,152 observations, respectively.
Following MaCurdy (1981) and Altonji (1986), we estimate male labor supply models given by
\[
\ell_{it} = \beta_0^{(T)} + \beta_1^{(T)} w_{it} + \sum_{t=69}^{T-1} \beta_t^{(T)} D_t + u_{it}^{(T)},
\]
where ℓ_{it} := log(L_{it}) − log(L_{i(t−1)}), with L_{it} denoting annual labor hours for household head i in period t; w_{it} := log(W_{it}) − log(W_{i(t−1)}), with W_{it} denoting hourly earnings for household head i in period t; D_t is a time dummy for period t; and {u_{it}^{(T)} : i = 1, 2, · · · , N_T} is the set of stochastic residuals in period t, assumed IID by the null for each t; t = 69, · · · , T; and T = 69, 71, 74, 77, 80.⁴ In MaCurdy (1981) and Altonji (1986), interest attaches primarily to β_1^{(T)}, which relates the intertemporal substitution effect to the business cycle.

⁴ The dummy variable D_t is omitted when T = 69 to avoid the dummy variable trap.
We apply our PEGR test statistics to the estimated errors for these equations,
\[
\hat{u}_{it}^{(T)} := \ell_{it} - \hat{\beta}_0^{(T)} - \hat{\beta}_1^{(T)} w_{it} - \sum_{t=69}^{T-1} \hat{\beta}_t^{(T)} D_t, \qquad i = 1, 2, \cdots, N_T,
\]
where, following MaCurdy (1981), we estimate the unknown β∗ by pooled two-stage least squares, using as instruments the household head's father's and mother's education, dummy variables indicating economic status during the household head's childhood, household head's education, household head's education squared, and the time dummy variables. These instruments are those used by MaCurdy (1981).
We apply our PEGR tests separately for each year within a given panel. For example, for the 1968-1971 panel, we apply our tests separately for t = 69, 70, and 71; because the panel is balanced, there are N_71 = 939 observations for each t. As is readily verified, under correct specification, this model and estimation method satisfy the regularity conditions required for application of our statistics.
Table 9 contains the p-values for our test statistics T̃_{1,n} and T̃_{∞,n}. We also compute Hochberg's (1988) modified Bonferroni p-value bounds⁵ based on groups of reported individual p-values. The resulting bounds are given at the bottom of Table 9. As the reported p-values are nowhere near standard significance levels, we fail to reject the IID null hypothesis for every panel data set. We also report Hochberg bounds when the (not separately tabulated) test statistics T̃^p_{1,n} and T̃^p_{∞,n} (p ∈ {0.3, 0.4, 0.5, 0.6, 0.7}) and T̃^s_{1,n} and T̃^s_{∞,n} (s ∈ {−0.5, −0.3, −0.1, 0.1, 0.3, 0.5}) are computed. As before, we do not reject the null hypothesis.

⁵ Hochberg's bound is computed as min_{j=1,2,···,J} (J + 1 − j)p_{(j)}, where p_{(j)} is the j-th smallest p-value and J is the total number of models.
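A one-function sketch (ours) of the bound in footnote 5; the p-values in the example are illustrative, not from Table 9:

```python
import numpy as np

def hochberg_bound(pvals):
    """Hochberg's (1988) modified Bonferroni bound:
    min over j of (J + 1 - j) * p_(j), p_(j) the j-th smallest p-value."""
    p = np.sort(np.asarray(pvals))
    J = p.size
    return ((J - np.arange(J)) * p).min()   # coefficient J+1-j at 1-based j

print(hochberg_bound([0.62, 0.35, 0.81]))   # illustrative p-values
```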
6 Conclusion
The IID assumption plays a central role in economics and econometrics. Here we provide a family
of statistics based on generalized runs that are powerful against unspecified alternatives, providing
a useful complement to tests designed to have power against specific alternatives, such as serial
correlation, GARCH, or structural breaks. Relative to other tests of this sort, for example the
entropy-based tests of Hong and White (2005), our tests have an appealing computational simplicity in that they do not require kernel density estimation, with the associated challenge of bandwidth
selection; and they have power against dependent alternatives without having to examine dependence at specific lags. Our simulation studies show that our tests have empirical levels close to
their nominal asymptotic levels. They also have encouraging power against a variety of important alternatives. A feature of particular interest is their power against alternatives that involve a
number of structural breaks increasing with the sample size. We apply our tests to residuals from
a male labor supply model using the PSID and find no evidence against the IID hypothesis, suggesting the success of the original random sampling design, its preservation over time, the absence
of structural breaks, and the absence of misspecification for the MaCurdy (1981)-Altonji (1986)
model.
7 Appendix

7.1 Proofs
Proof of Lemma 1: As the necessity part is trivial, we prove the sufficiency part only. Suppose that {Y_t} is not independent; then there is at least one pair of dependent variables, say (Y_i, Y_{i+k}) (k > 0). For notational simplicity, we let i = 0 without loss of generality. Then, for some p ∈ I, P(F_0(Y_0) ≤ p, F_k(Y_k) ≤ p) ≠ P(F_0(Y_0) ≤ p)P(F_k(Y_k) ≤ p) = p². Further, from the given condition on R_{n,i}(p) and the definition of conditional expectation,
\[
P(F_k(Y_k) \le p \mid F_0(Y_0) \le p) = \sum_{\ell=1}^{k} P\left(\sum_{i=1}^{\ell} R_{n,i}(p) = k\right) = \sum_{\ell=1}^{k} {}_{k-1}C_{\ell-1}(1-p)^{k-\ell} p^{\ell} = p,
\]
where the second equality follows from the IID condition of {R_{n,i}(p)}_{i=1}^{M_n(p)}. Thus, P(F_k(Y_k) ≤ p, F_0(Y_0) ≤ p) = P(F_k(Y_k) ≤ p | F_0(Y_0) ≤ p)P(F_0(Y_0) ≤ p) = p². This contradicts the inequality above, derived from the assumption that {Y_t} is not independent. Thus, {Y_t} is independently distributed.

Next, suppose further that {Y_t} is not identically distributed. Then there is a pair, say (Y_i, Y_j), such that for some y ∈ R, p_i := F_i(y) ≠ p_j := F_j(y). Further, for the same y, P(R_{n,(j)}(p_j) = 1) = P(F_j(Y_j) ≤ F_j(y) | F_{j−1}(Y_{j−1}) ≤ F_{j−1}(y)) = P(F_j(Y_j) ≤ F_j(y)) = p_j, where the subscript (j) denotes the (j)-th run of {R_{n,i}(p_j)} corresponding to F_j(y), and the second equality follows from the independence property shown above. Similarly, P(R_{n,(i)}(p_j) = 1) = P(F_j(Y_i) ≤ F_j(y)) = P(Y_i ≤ y) = p_i. That is, P(R_{n,(j)}(p_j) = 1) ≠ P(R_{n,(i)}(p_j) = 1). This contradicts the assumption that {R_{n,i}(p)} is identically distributed for all p ∈ I. Hence, {Y_t} is identically distributed. This completes the proof. ∎
To prove Theorem 1, we first examine preliminary lemmas. We let K_j be a random variable satisfying
\[
\sum_{i=K_{j-1}+1}^{K_j} R_{n,i}(p) = R_{n,j}(p').
\]
LEMMA A1: Given conditions A1, A2, A3(i), A4(i), and the null, E[Σ_{i=1}^{K₁} s^{R_i}] = E[s^{R_i}] · E[K₁].

Proof of Lemma A1: For each r, ℓ ∈ N, P(R_i = r | K₁ = ℓ) = P(K₁ = ℓ | R_i = r)P(R_i = r)/P(K₁ = ℓ) = P(K₁ = ℓ)P(R_i = r)/P(K₁ = ℓ) = P(R_i = r), where the second equality follows from the fact that the event {K₁ = ℓ} is independent of {R_i = r} under the null, so that
\[
E\left[\sum_{i=1}^{K_1} s^{R_i}\right] = \sum_{\ell=1}^{\infty} E\left[\sum_{i=1}^{\ell} s^{R_i} \,\Big|\, K_1 = \ell\right] P(K_1 = \ell) = E[s^{R_i}] \sum_{\ell=1}^{\infty} \ell\, P(K_1 = \ell) = E[s^{R_i}] \cdot E[K_1].
\]
This completes the proof. ∎
LEMMA A2: Given conditions A1, A2, A3(i), A4(i), and the null, if p′ ≤ p, then
\[
E[K_1 s^{R_1'}] = \frac{sp'}{\{1-s(1-p')\}} \cdot \frac{1-s(1-p)}{\{1-s(1-p')\}}.
\]

Proof of Lemma A2: First, E[K₁s^{R₁′}] = E[s^{R₁′}E[K₁ | R₁′]], and
\[
P(K_1 = \ell \mid R_1' = r_1') = \frac{P(K_1 = \ell,\, R_1' = r_1')}{P(R_1' = r_1')} = {}_{r_1'-1}C_{\ell-1}\,(p-p')^{\ell-1}(1-p)^{r_1'-\ell}/(1-p')^{r_1'-1},
\]
where the last equality follows from the fact that P(R₁′ = r₁′) = p′(1−p′)^{r₁′−1} and P(K₁ = ℓ, R₁′ = r₁′) = {}_{r₁′−1}C_{ℓ−1} p′(p−p′)^{ℓ−1}(1−p)^{r₁′−ℓ}. Thus, E[K₁ | R₁′ = r₁′] = Σ_{ℓ=1}^∞ ℓ · P(K₁ = ℓ | R₁′ = r₁′) = 1 + (r₁′ − 1)((p−p′)/(1−p′)), implying that E[K₁s^{R₁′}] = E[s^{R₁′}] + ((p−p′)/(1−p′))E[s^{R₁′}(R₁′ − 1)]. Second, E[s^{R₁′}(R₁′ − 1)] = s²(d/ds)E[s^{R₁′−1}] = s²p′(1−p′)/{1−s(1−p′)}². Therefore,
\[
E[K_1 s^{R_1'}] = \frac{sp'}{\{1-s(1-p')\}} + \frac{p-p'}{1-p'} \cdot \frac{s^2 p'(1-p')}{\{1-s(1-p')\}^2} = \frac{sp'\{1-s(1-p)\}}{\{1-s(1-p')\}^2}.
\]
This completes the proof. ∎
LEMMA A3: Given conditions A1, A2, A3(i), A4(i), and the null, if p′ ≤ p, then
\[
E\left[\sum_{i=1}^{K_1} s^{R_i + R_1'}\right] = \frac{s^2 p'}{\{1-s^2(1-p)\}} \cdot \left[\frac{1-s(1-p)}{\{1-s(1-p')\}}\right]^2.
\]

Proof of Lemma A3: First, E[Σ_{i=1}^{K₁} s^{R_i+R₁′}] = E[Σ_{i=1}^{K₁} E[s^{R_i+R₁′} | K₁]], and
\[
E[s^{R_i+R_1'} \mid K_1] = E[s^{R_i + \sum_{j=1}^{K_1} R_j} \mid K_1] = E[s^{2R_i} \mid K_1] \prod_{j=1, j \ne i}^{K_1} E[s^{R_j} \mid K_1] = \left[\frac{s^2 p}{\{1-s^2(1-p)\}}\right] \left[\frac{sp}{\{1-s(1-p)\}}\right]^{K_1-1},
\]
so that
\[
E\left[\sum_{i=1}^{K_1} s^{R_i+R_1'}\right] = E\left[K_1 \left[\frac{s^2 p}{\{1-s^2(1-p)\}}\right] \left[\frac{sp}{\{1-s(1-p)\}}\right]^{K_1-1}\right].
\]
Next, P(K₁ = k) = (p′/p)[(p−p′)/p]^{k−1} by Lemma A4 given below. From this,
\[
E\left[K_1 \left[\frac{sp}{\{1-s(1-p)\}}\right]^{K_1-1}\right] = \frac{p'}{p} \sum_{k=1}^{\infty} k \left[\frac{s(p-p')}{\{1-s(1-p)\}}\right]^{k-1} = \frac{p'}{p} \cdot \left[\frac{1-s(1-p)}{\{1-s(1-p')\}}\right]^2.
\]
The desired result follows by plugging this into E[Σ_{i=1}^{K₁} s^{R_i+R₁′}]. ∎

LEMMA A4: Given A1, A2, A3(i), A4(i), and the null, if p′ ≤ p, then P(K₁ = k) = (p′/p) · [(p−p′)/p]^{k−1} and E[K₁] = p/p′.

Proof of Lemma A4: First,
\[
P(K_1 = 1) = P(R_1' = R_1) = \sum_{\ell=1}^{\infty} P(R_1' = R_1 = \ell) = P(F(X_1) < p') + \sum_{\ell=2}^{\infty} P(F(X_1) \ge p, \cdots, F(X_{\ell-1}) \ge p, F(X_\ell) < p') = \sum_{\ell=1}^{\infty} p'(1-p)^{\ell-1} = p'p^{-1}.
\]
Next,
\[
P(K_1 = 2) = P(R_1' + R_2' = R_1) = \sum_{\ell_1=1}^{\infty} \sum_{\ell_2=1}^{\infty} P(R_1' = \ell_1, R_2' = \ell_2, \text{ and } R_1 = \ell_1 + \ell_2) = \sum_{\ell_1=1}^{\infty} \sum_{\ell_2=1}^{\infty} (1-p)^{\ell_1-1}(p-p')(1-p)^{\ell_2-1}p' = p'(p-p')p^{-2}.
\]
Similarly, we can proceed with this computation for an arbitrarily chosen number, say k. Then
\[
P(K_1 = k) = P\left(\sum_{j=1}^{k} R_j' = R_1\right) = \sum_{\ell_1=1}^{\infty} \cdots \sum_{\ell_k=1}^{\infty} P\left(R_1' = \ell_1, \cdots, R_k' = \ell_k, \text{ and } R_1 = \sum_{j=1}^{k} \ell_j\right) = \prod_{j=1}^{k-1}\left\{\sum_{\ell_j=1}^{\infty} (1-p)^{\ell_j-1}(p-p')\right\} \cdot \sum_{\ell_k=1}^{\infty} (1-p)^{\ell_k-1} p' = \frac{p'}{p} \cdot \left[\frac{p-p'}{p}\right]^{k-1}.
\]
From these, it is trivial to compute that E[K₁] = p/p′. This completes the proof. ∎
Proof of Theorem 1: (i) We separate the proof into three pieces, (a), (b), and (c). In (a), we prove the weak convergence of G_n(p, ·). In (b), we show that for each s ∈ S, E[G(p,s)] = 0. Finally, we derive E[G(p,s)G(p,s′)] in (c).

(a) First, for each s, if we let g_{n,i}(p,s) := s^{R_{n,i}(p)} − sp{1 − s(1−p)}^{−1}, then
\[
\sup_{s \in S} \left| \frac{\partial}{\partial s} g_{n,i}(p,s) \right| \le R_{n,i}(p)\,\tilde{s}^{\,R_{n,i}(p)-1} + (1-\bar{s})^{-2},
\]
where s̃ := max[|s̲|, s̄], so that |g_{n,i}(p,s) − g_{n,i}(p,s′)| ≤ {R_{n,i}(p)s̃^{R_{n,i}(p)−1} + (1−s̄)^{−2}}|s − s′|. Second, from the independence condition, E[|G_n(p,s) − G_n(p,s′)|²] = E[|g_{n,i}(p,s) − g_{n,i}(p,s′)|²], so that E[|G_n(p,s) − G_n(p,s′)|²] ≤ K|s − s′|², where
\[
K := \frac{2}{(1-\tilde{s})^3} + \frac{2}{(1-\tilde{s})^2(1-\bar{s})^2} + \frac{1}{(1-\bar{s})^4} \ge \sup_{p \in I} E[\{R_{n,i}(p)\tilde{s}^{R_{n,i}(p)-1} + (1-\bar{s})^{-2}\}^2],
\]
and K < ∞. Third, therefore, E[|G_n(p,s) − G_n(p,s′)|²]^{1/2} E[|G_n(p,s′) − G_n(p,s″)|²]^{1/2} ≤ K|s − s″|², where we let s″ ≤ s′ ≤ s. These verify the conditions of theorem 13.5 in Billingsley (1999). The desired result follows from these and the finite dimensional weak convergence, which is obtained by applying the Cramér–Wold device.

(b) Given the conditions and the null,
\[
E\left[\sum_{i=1}^{M_n(p)} s^{R_{n,i}(p)} - \frac{sp}{1-s(1-p)}\right]
= E\left[\sum_{i=1}^{M_n(p)} E\left[s^{R_{n,i}(p)} - \frac{sp}{1-s(1-p)} \,\Big|\, M_n(p)\right]\right]
= E\left[\sum_{i=1}^{M_n(p)} \frac{sp}{1-s(1-p)} - \frac{sp}{1-s(1-p)}\right] = 0,
\]
where the second equality follows from the fact that R_{n,i}(p) given M_n(p) is independently and identically distributed under the null.

(c) Given the conditions and the null,
\[
E[G_n(p,s)G_n(p,s')] = n^{-1} E\left[\sum_{i=1}^{M_n(p)} E\left[\left(s^{R_{n,i}(p)} - \frac{sp}{1-s(1-p)}\right)\left(s'^{R_{n,i}(p)} - \frac{s'p}{1-s'(1-p)}\right) \Big| M_n(p)\right]\right]
\]
\[
= n^{-1} E\left[M_n(p)\left(\frac{ss'p}{1-ss'(1-p)} - \frac{ss'p^2}{\{1-s(1-p)\}\{1-s'(1-p)\}}\right)\right]
= \frac{ss'p^2(1-s)(1-s')(1-p)}{\{1-ss'(1-p)\}\{1-s(1-p)\}\{1-s'(1-p)\}},
\]
where the last equality follows from the fact that n^{−1}E[M_n(p)] = p. Finally, we can apply the continuous mapping theorem to show that f[G_n(p, ·)] ⇒ f[G(p, ·)]. This is the desired result.
Remark 1: (1) In addition to the weak convergence, it also follows that for any ε > 0, there is δ > 0 such that
\[
\limsup_{n \to \infty} P(w'''_{G_n(p,\cdot)}(\delta) > \varepsilon) = 0, \tag{12}
\]
where we define w‴_{G_n(p,·)}(δ) as
\[
\sup_{s,s' \in S}\ \sup_{s'' \in \{|s-s'|<\delta\}} \min\left[\sup_p |G_n(p,s'') - G_n(p,s')|,\ \sup_p |G_n(p,s'') - G_n(p,s)|\right].
\]
This follows from the proof of theorem 3 in Bickel and Wichura (1971).
(2) Verification of the second condition of theorem 13.5 in Billingsley (1999) is omitted, as G_n(p, ·) is defined on C(S).
(3) Theorem 1(i) can also be proved by applying Jain and Marcus (1975). That is, if we let H(S,u) := log(N(S,u)), where N(S,u) is the minimum number of balls of radius less than u covering S, then ∫₀^{s̄−s̲} H^{1/2}(S,u) du ≤ ∫₀² H((−1,1),u) du ≤ ∫₀² log(2 + 2/u) du = log(27) < ∞. These facts ensure the weak convergence of {G_n(p, ·)} by theorem 1 of Jain and Marcus (1975).
(ii) This can be proved in numerous ways. We show this by verifying the conditions in theorem 13.5 of Billingsley (1999). Our proof is separated into three pieces, (a), (b), and (c). In (a), we show the weak convergence of G_n(·, s). In (b), we prove that for each p, E[G(p,s)] = 0. Finally, in (c), we show that
\[
E[G(p,s)G(p',s)] = \frac{s^2 p'^2 (1-s)^2 (1-p)}{\{1-s(1-p')\}^2\{1-s^2(1-p)\}}.
\]
(a) First, for each s, G(1, s) ≡ 0 from that Gn (1, s) ≡ 0, and for any δ > 0,
s2 p2 (1 − s)2 (1 − p)
E(|G(p, s)|2 )
=
lim
=0
p→1
p→1 δ 2 {1 − s(1 − p)}2 {1 − s2 (1 − p)}
δ2
lim P(|G(p, s)| > δ) ≤ lim
p→1
by Markov’s inequality and the result in (c). Thus, for each s, G(p, s) − G(1, s) ⇒ 0 as p → 1.
Second, for each p'' ≤ p' ≤ p,

E[{G_n(p,s) − G_n(p',s)}² {G_n(p',s) − G_n(p'',s)}²] ≤ E[{G_n(p,s) − G_n(p',s)}⁴]^{1/2} E[{G_n(p',s) − G_n(p'',s)}⁴]^{1/2}

by the Cauchy–Schwarz inequality. Also, it is not hard to show that E[{G_n(p,s) − G_n(p',s)}⁴] = E[{G(p,s) − G(p',s)}⁴] + o(n^{−1}), using the fact that E[s^{4R_i(p)}] ≤ E[s^{2R_i(p)}] < ∞ and s ∈ S. Also, some algebra shows that

E[{G(p,s) − G(p',s)}⁴] = 3a(p,p',s)² + 12p'²(1−p) b(p,p',s) c(p,p',s) / [ {1 − s(1−p')}²{1 − s²(1−p)} ],

where for each p and p', a(p,p',s) = k(p,s)m(p,s) − k(p',s)m(p',s), b(p,p',s) = m(p',s) − m(p,s), c(p,p',s) = k(p,s) − k(p',s),

k(p,s) := p² / {1 − s(1−p)}²,    and    m(p,s) := (1−p) / {1 − s²(1−p)}.

Note that |∂k(p,s)/∂p| and |∂m(p,s)/∂p| are bounded by k̄ := 4/(1−s̄)² and m̄ := 1/(1 − max[s̄², s̲²]) uniformly in (p,s), respectively. Therefore, k( · ,s) and m( · ,s) are Lipschitz continuous with Lipschitz constants k̄ and m̄ respectively, which are independent of s. These imply that |b(p,p',s)| ≤ m̄|p−p'|, |c(p,p',s)| ≤ k̄|p−p'|, and that

|a(p,p',s)| = |m(p,s)c(p,p',s) − k(p',s)b(p,p',s)| ≤ m(p,s)|c(p,p',s)| + k(p',s)|b(p,p',s)|.

Note also that k̄ and m̄ are uniform bounds for k(p,s) and m(p,s) in (p,s), respectively. Thus,

E[{G(p,s) − G(p',s)}⁴] ≤ 3 · 2k̄⁴m̄⁴|p−p'|² + 12k̄²m̄²|p−p'|² ≤ 18k̄⁴m̄⁴|p−p'|².

Finally,

E[{G_n(p,s) − G_n(p',s)}² {G_n(p',s) − G_n(p'',s)}²] ≤ 18k̄⁴m̄⁴|p−p'| · |p'−p''| + o(n^{−1}) ≤ 18k̄⁴m̄⁴|p−p''|² + o(n^{−1})
by applying the same procedure to E[{G(p',s) − G(p'',s)}⁴].
Given these, the weak convergence of {G_n( · ,s)} holds by theorem 13.5 of Billingsley (1999) and the finite-dimensional weak convergence, which is straightforward and therefore not proved here.
(b) For each p, E[G(p, · )] = 0 follows from the proof of Theorem 1(i, b).
(c) First, for convenience, for each p and p', we let M and M' denote M_n(p) and M_n(p') respectively, and let R_i and R'_i be short-hand notation for R_{n,i}(p) and R_{n,i}(p'). Then from the definition of K_j,

E[G_n(p,s) G_n(p',s)] = n^{−1} E[ M' E[ Σ_{i=1}^{K_1} ( s^{R_i} − E[s^{R_i}] )( s^{R'_1} − E[s^{R'_1}] ) | M, M' ] ]
                      = n^{−1} E[M'] E[ Σ_{i=1}^{K_1} ( s^{R_i} − E[s^{R_i}] )( s^{R'_1} − E[s^{R'_1}] ) ]
                      = p' E[ Σ_{i=1}^{K_1} s^{R_i + R'_1} − Σ_{i=1}^{K_1} s^{R_i} E[s^{R'_1}] − K_1 s^{R'_1} E[s^{R_i}] + K_1 E[s^{R_i}] E[s^{R'_1}] ],

where the first equality follows from the fact that {K_j} is independently and identically distributed under the null and that R'_j is independent of R_ℓ if ℓ ≤ K_{j−1} or ℓ ≥ K_j + 1. The second equality also follows from the fact that {M, M'} is independent of {R_i, R'_1 : i = 1, 2, · · · , K_1}. Further,
E[ Σ_{i=1}^{K_1} s^{R_i} ] = E[s^{R_i}] E[K_1],        E[ K_1 s^{R'_1} ] = [ sp' / {1 − s(1−p')} ] · [ {1 − s(1−p)} / {1 − s(1−p')} ],

and

E[ Σ_{i=1}^{K_1} s^{R_i + R'_1} ] = [ s²p' / {1 − s²(1−p)} ] · [ {1 − s(1−p)} / {1 − s(1−p')} ]²

by Lemmas A1 to A4. Substituting these into the above equation yields the desired result.
Remark 2: (1) The proof of Theorem 1(ii) also implies that for any ε > 0, there is δ > 0 such that lim sup_{n→∞} P( w''_{G_n( · ,s)}(δ) > ε ) = 0, where

w''_{G_n( · ,s)}(δ) := sup_{p,p' ∈ I} sup_{p'' ∈ {|p−p'|<δ}} min[ |G_n(p'',s) − G_n(p',s)|, |G_n(p'',s) − G_n(p,s)| ],

implying that {G_n( · ,s)} is tight. The proof is given in Billingsley (1999, pp. 141–143).
(2) The given bounds k̄ and m̄ in the proof of Theorem 1(ii) are independent of s. This further implies that for any ε > 0, there is δ > 0 such that

lim sup_{n→∞} P( w'''_{G_n( · ,s)}(δ) > ε ) = 0,        (13)

where w'''_{G_n( · ,s)}(δ) is defined as

sup_{p,p' ∈ I} sup_{p'' ∈ {|p−p'|<δ}} min[ sup_s |G_n(p'',s) − G_n(p',s)|, sup_s |G_n(p'',s) − G_n(p,s)| ].

This follows from the proof of theorem 3 in Bickel and Wichura (1971).
(iii) We separate the proof into two pieces, (a) and (b). In (a), we prove the weak convergence of G_n, and in (b) we derive its covariance structure.
(a) In order to show the weak convergence of G_n, we exploit the corollary of Bickel and Wichura (1971, p. 1664). That is, we show that for any ε > 0, there is δ > 0 such that lim sup_{n→∞} P( w'''_{G_n}(δ) > ε ) = 0, where w'''_{G_n}(δ) := max[ w'''_{G_n( · ,s)}(δ), w'''_{G_n(p, · )}(δ) ]. For this, we note that

lim sup_{n→∞} P( w'''_{G_n}(δ) > ε ) ≤ lim sup_{n→∞} P( sup_p w'''_{G_n(p, · )}(δ) > ε/2 ) + lim sup_{n→∞} P( sup_s w'''_{G_n( · ,s)}(δ) > ε/2 ),

which is identical to zero by (12) and (13). Therefore, the desired result is obtained from this and the finite-dimensional weak convergence, which follows from the Cramér–Wold device.
(b) As before, for convenience, for each p and p', we let M and M' denote M_n(p) and M_n(p') respectively, and let R_i and R'_i be short-hand notation for R_{n,i}(p) and R_{n,i}(p'). Also, we let K_j be such that Σ_{i=K_{j−1}+1}^{K_j} R_{n,i}(p) = R_{n,j}(p'). Then, given the conditions and the null,

E[G_n(p,s) G_n(p',s')] = n^{−1} E[ Σ_{j=1}^{M'} Σ_{i=1}^{M} ( s^{R_i} − E[s^{R_i}] )( s'^{R'_j} − E[s'^{R'_j}] ) ]        (14)
                       = p' E[ Σ_{i=1}^{K_1} s^{R_i} s'^{R'_1} − Σ_{i=1}^{K_1} E[s'^{R'_1}] s^{R_i} − K_1 s'^{R'_1} E[s^{R_i}] + K_1 E[s^{R_i}] E[s'^{R'_1}] ].

From Lemmas A1 to A4, we note that

E[ Σ_{i=1}^{K_1} s^{R_i} ] = E[s^{R_i}] E[K_1],        E[ K_1 s'^{R'_1} ] = [ s'p' / {1 − s'(1−p')} ] · [ {1 − s'(1−p)} / {1 − s'(1−p')} ],

and

E[ Σ_{i=1}^{K_1} s^{R_i} s'^{R'_1} ] = [ ss'p' / {1 − ss'(1−p)} ] · [ {1 − s'(1−p)} / {1 − s'(1−p')} ]².

Thus, plugging these into (14) leads to

E[G_n(p,s) G_n(p',s')] = ss'p'²(1−s)(1−s')(1−p){1 − s'(1−p)} / [ {1 − s(1−p)}{1 − s'(1−p')}²{1 − ss'(1−p)} ].

Finally, it follows from the continuous mapping theorem that f[G_n] ⇒ f[G]. ∎
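As a consistency check on this covariance, setting s' = s should recover the covariance in part (ii), and setting p' = p should recover the one in part (i, c); a short symbolic computation (ours, using sympy, with q and t standing for p' and s') confirms both reductions.

    import sympy as sp

    p, q, s, t = sp.symbols('p q s t', positive=True)  # q = p', t = s'
    cov = (s*t*q**2*(1 - s)*(1 - t)*(1 - p)*(1 - t*(1 - p))
           / ((1 - s*(1 - p))*(1 - t*(1 - q))**2*(1 - s*t*(1 - p))))
    cov_ii = s**2*q**2*(1 - s)**2*(1 - p) / ((1 - s*(1 - q))**2*(1 - s**2*(1 - p)))
    cov_ic = (s*t*p**2*(1 - s)*(1 - t)*(1 - p)
              / ((1 - s*t*(1 - p))*(1 - s*(1 - p))*(1 - t*(1 - p))))
    assert sp.simplify(cov.subs(t, s) - cov_ii) == 0  # s' = s gives part (ii)
    assert sp.simplify(cov.subs(q, p) - cov_ic) == 0  # p' = p gives part (i, c)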
Proof of Lemma 2: (i) First, sup_p |p̃_n(p) − p| → 0 almost surely by the Glivenko–Cantelli theorem. Second, G_n ⇒ G by Theorem 1(ii). Third, (D(I × S) × D(I)) is a separable space. Thus, (G_n, p̃_n( · )) ⇒ (G, · ) by theorem 3.9 of Billingsley (1999). Fourth, |G(p,s) − G(p',s')| ≤ |G(p,s) − G(p',s)| + |G(p',s) − G(p',s')|, and each term on the right-hand side can be made arbitrarily small by letting |p − p'| and |s − s'| tend to zero, because G ∈ C(I × S) a.s. Finally, note that for each (p,s), W_n(p,s) = G_n(p̃_n(p), s). Therefore, W_n − G_n = G_n(p̃_n( · ), · ) − G_n( · , · ) ⇒ G − G = 0 by the lemma of Billingsley (1999, p. 151) and the first to fourth facts. This implies that sup_{(p,s)∈I×S} |W_n(p,s) − G_n(p,s)| → 0 in probability, as desired.
(ii) We abbreviate p̃_n(p) as p̃_n for convenience. Then, for some p*_n(p) between p and p̃_n,

| sp̃_n/{1 − s(1−p̃_n)} − sp/{1 − s(1−p)} − s(1−s)(p̃_n − p)/{1 − s(1−p)}² | = 2s²(1−s)(p̃_n − p)² / {1 − s(1−p*_n(p))}³

by the mean value theorem, such that sup_{p∈I} |p*_n(p) − p| → 0 a.s. by Glivenko–Cantelli. Also,

sup_{p,s} (1/√n) | M̃_n(p) s²(1−s)(p̃_n − p)² / {1 − s(1−p*_n(p))}³ | ≤ sup_{p,s} | M̃_n(p) s²(1−s) / ( n{1 − s(1−p*_n(p))}³ ) | · sup_p | n(p̃_n − p)²/√n |,

where n^{−1}M̃_n(p) and s²(1−s){1 − s(1−p*_n(p))}^{−3} are uniformly bounded by 1 and 1/(1−s̃)³ respectively, where s̃ := max[|s̲|, s̄], and n(p̃_n − p)² = O_P(1) uniformly in p. Thus,

sup_{p,s} ( M̃_n(p)/√n ) | sp̃_n/{1 − s(1−p̃_n)} − sp/{1 − s(1−p)} − s(1−s)(p̃_n − p)/{1 − s(1−p)}² | = o_P(1).

Given these, the weak convergence of H_n trivially follows, because sup_p |n^{−1}M̃_n(p) − p| = o_P(1) and √n(p̃_n − p), as a function of p, weakly converges to a Brownian bridge, so that we can apply the lemma of Billingsley (1999, p. 151). Also, all these prove the tightness of H_n as well.
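The leading term and the remainder in the expansion above come from differentiating φ(p) := sp/{1 − s(1−p)}: one has φ'(p) = s(1−s)/{1 − s(1−p)}² and φ''(p) = −2s²(1−s)/{1 − s(1−p)}³. A quick symbolic check of these two derivatives (ours, using sympy):

    import sympy as sp

    p, s = sp.symbols('p s')
    phi = s * p / (1 - s * (1 - p))
    assert sp.simplify(sp.diff(phi, p) - s*(1 - s)/(1 - s*(1 - p))**2) == 0
    assert sp.simplify(sp.diff(phi, p, 2) + 2*s**2*(1 - s)/(1 - s*(1 - p))**3) == 0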
Next, the covariance structure of H follows from the fact that for each (p,s) and (p',s') with p' ≤ p (writing p̃'_n := p̃_n(p')),

E[ M̃_n(p) s(1−s)(p̃_n − p) / ( √n {1 − s(1−p)}² ) ] = 0,

and

E[ ( M̃_n(p) s(1−s)(p̃_n − p) / ( √n {1 − s(1−p)}² ) ) ( M̃_n(p') s'(1−s')(p̃'_n − p') / ( √n {1 − s'(1−p')}² ) ) ] = ss'pp'²(1−s)(1−s')(1−p) / [ {1 − s(1−p)}²{1 − s'(1−p')}² ],

which is identical to E[H(p,s)H(p',s')].
(iii) To show the given claim, we first derive the given covariance structure. For each (p,s) and (p',s'),

E[W_n(p,s) H_n(p',s')] = E[ E[W_n(p,s) | X_1, · · · , X_n] H_n(p',s') ],

where the equality follows from the fact that H_n is measurable with respect to the smallest σ–algebra generated by {X_1, · · · , X_n}. Given this, note that

E[W_n(p,s) | X_1, · · · , X_n] = (1/√n) Σ_{i=1}^{M̃_n(p)} [ E[ s^{R̃_{n,i}(p)} | X_1, · · · , X_n ] − sp̃_n/{1 − s(1−p̃_n)} ]
                              = (1/√n) Σ_{i=1}^{M̃_n(p)} [ sp/{1 − s(1−p)} − sp̃_n/{1 − s(1−p̃_n)} ] = −H_n(p,s).

Thus, E[E[W_n(p,s) | X_1, · · · , X_n] H_n(p',s')] = −E[H_n(p,s) H_n(p',s')]. Second, we also note that E[G(p,s)H(p',s')] = lim_{n→∞} E[W_n(p,s)H_n(p',s')] by Lemma 2(i), and further E[H(p,s)H(p',s')] = lim_{n→∞} E[H_n(p,s)H_n(p',s')]. Therefore, it follows that E[G(p,s)H(p',s')] = −E[H(p,s)H(p',s')].
Next, we consider (G̃_n, H_n)' and apply example 1.4.6 of van der Vaart and Wellner (1996, p. 31) to show the weak convergence. Note that G̃_n = W_n + H_n = G_n + H_n + o_P(1), and each of G_n and H_n is tight, so that G̃_n is tight, too. Further, their limits are continuous by Theorem 1(ii) and Lemma 2(ii). Thus, if the finite-dimensional distributions of G_n + H_n have weak limits, then G̃_n must converge weakly to the Gaussian process G̃ with the covariance structure (7). We may apply the Lindeberg–Lévy CLT for this. That is, for each (p,s), E[G_n(p,s) + H_n(p,s)] = 0, and

E[{G_n(p,s) + H_n(p,s)}²] = E[G_n(p,s)²] − E[H_n(p,s)²] + o(1) ≤ E[G_n(p,s)²] + o(1) = s²p²(1−s)²(1−p) / [ {1 − s²(1−p)}{1 − s(1−p)}² ] + o(1),

which is uniformly bounded, so that for each (p,s) the sufficient conditions for the Lindeberg–Lévy CLT are easily verified. Here the first equality follows by applying Lemma 2(ii). The finite-dimensional weak convergence of G̃_n follows from the Cramér–Wold device.
Given this, we note that G̃_n is asymptotically independent of H_n, because E[G̃_n(p,s)H_n(p',s')] = E[W_n(p,s)H_n(p',s')] + E[H_n(p,s)H_n(p',s')] = 0 by the covariance structure given above, so that (G̃_n, H_n)' ⇒ (G̃, H)' by example 1.4.6 of van der Vaart and Wellner (1996, p. 31). To complete the proof, we consider (G̃_n − H_n, H_n)' = (W_n, H_n)' and apply the continuous mapping theorem. ∎
Remark 3: Durbin (1973) shows that empirical distributions with parameter estimation error are not distribution-free, using a proof similar to that of Lemma 2(i). We further exploit his proof to show that W_n is asymptotically equivalent to G_n.
Proof of Theorem 2: (i, ii, and iii) It is already shown in the proof of Lemma 2(iii) that G̃_n ⇒ G̃. This implies the given claims together with the continuous mapping theorem. ∎
Proof of Lemma 3: (i) First, as shown in the proof of Lemma 3(ii) given below, F̂_n(y( · )) converges to F(y( · )) in probability uniformly on I, where for each p, y(p) := inf{x ∈ R : F(x) ≥ p}. Second, G_n ⇒ G by Theorem 1(ii). Third, (D(I × S) × D(I)) is a separable space. Therefore, it follows that (G_n, F̂_n(y( · ))) ⇒ (G, F(y( · ))) by theorem 3.9 of Billingsley (1999). Fourth, G ∈ C(I × S). Finally, G̈_n( · , · ) = G_n(F̂_n(y( · )), · ), so that G̈_n( · , · ) − G_n( · , · ) = G_n(F̂_n(y( · )), · ) − G_n( · , · ) ⇒ G − G = 0, where the weak convergence follows from the lemma of Billingsley (1999, p. 151). Thus, sup_{(p,s)} |G̈_n(p,s) − G_n(p,s)| = o_P(1).
(ii) First, by the definition of Ḧ_n(p,s), we can represent it as a product of two pieces as follows:

Ḧ_n(p,s) = { M̂_n(p)/n } · { √n [ sp̃_n/{1 − s(1−p̃_n)} − sp/{1 − s(1−p)} ] }.

Second, it follows that

√n [ sp̃_n/{1 − s(1−p̃_n)} − sp/{1 − s(1−p)} ] ⇒ − s(1−s)B°(p) / {1 − s(1−p)}²

by Lemma 2(ii) and Theorem 5(i) given below. Third, if F̂_n( · ) converges to F( · ) in probability, then n^{−1}M̂_n(p) converges to p in probability uniformly in p, because for each p, M̂_n(p) = Σ_{t=1}^n 1{F̂_n(Ŷ_t) < p}. Finally, all these imply that sup_{(p,s)} |Ḧ_n(p,s) − H_n(p,s)| = o_P(1) by the lemma of Billingsley (1999, p. 151), and this completes the proof.
Therefore, we only have to show that F̂_n( · ) converges to F( · ) in probability, and for this we exploit the Glivenko–Cantelli theorem. That is, if for each p, F̂_n(y(p)) converges to F(y(p)) in probability, then the uniform convergence follows from the properties of the empirical distribution (boundedness, monotonicity, and right continuity). Thus, the proof can be completed by showing the pointwise convergence of F̂_n(y(p)). We follow the next steps. First, if we let y = y(p) for notational simplicity, then for each y and for any ε₁ > 0,

{ω ∈ Ω : h_t(θ̂_n) < y} ⊂ {ω ∈ Ω : h_t(θ*) < y + |h_t(θ̂_n) − h_t(θ*)|}
  = {ω ∈ Ω : h_t(θ*) < y + |h_t(θ̂_n) − h_t(θ*)|, |h_t(θ̂_n) − h_t(θ*)| < ε₁} ∪ {ω ∈ Ω : h_t(θ*) < y + |h_t(θ̂_n) − h_t(θ*)|, |h_t(θ̂_n) − h_t(θ*)| ≥ ε₁}
  ⊂ {ω ∈ Ω : h_t(θ*) < y + ε₁} ∪ {ω ∈ Ω : |h_t(θ̂_n) − h_t(θ*)| ≥ ε₁}.

Second, for the same y and ε₁, {ω ∈ Ω : h_t(θ̂_n) < y} ⊃ {ω ∈ Ω : h_t(θ*) < y − ε₁} \ {ω ∈ Ω : |h_t(θ*) − h_t(θ̂_n)| > ε₁}. These two facts imply that

(1/n) Σ_{t=1}^n 1{h_t(θ*) < y−ε₁} − (1/n) Σ_{t=1}^n 1{|h_t(θ̂_n) − h_t(θ*)| ≥ ε₁} ≤ (1/n) Σ_{t=1}^n 1{h_t(θ̂_n) < y} ≤ (1/n) Σ_{t=1}^n 1{h_t(θ*) < y+ε₁} + (1/n) Σ_{t=1}^n 1{|h_t(θ̂_n) − h_t(θ*)| ≥ ε₁}.

Thus, n^{−1} Σ_{t=1}^n 1{h_t(θ*) < y−ε₁} and n^{−1} Σ_{t=1}^n 1{h_t(θ*) < y+ε₁} converge to F(y − ε₁) and F(y + ε₁) a.s. by the SLLN and the null hypothesis. Further, for any δ > 0 and ε₂ > 0, there is an n* such that if n > n*, P( n^{−1} Σ_{t=1}^n 1{|h_t(θ*) − h_t(θ̂_n)| > ε₁} ≥ δ ) ≤ ε₂. This follows from P( n^{−1} Σ_{t=1}^n 1{|h_t(θ*) − h_t(θ̂_n)| > ε₁} ≥ δ ) ≤ (nδ)^{−1} Σ_{t=1}^n E( 1{|h_t(θ*) − h_t(θ̂_n)| > ε₁} ) = (δn)^{−1} Σ_{t=1}^n P( |h_t(θ*) − h_t(θ̂_n)| > ε₁ ) ≤ ε₂, where the first inequality follows from Markov's inequality, and the last inequality follows from the fact that |h_t(θ*) − h_t(θ̂_n)| ≤ M_t ‖θ̂_n − θ*‖ = o_P(1) uniformly in t by A2 and A3(ii). From these, therefore, for any ε₁ > 0, F(y − ε₁) + o_P(1) ≤ F̂_n(y) ≤ F(y + ε₁) + o_P(1). Since ε₁ can be chosen arbitrarily, we may let ε₁ tend to zero. Therefore, F̂_n(y) converges to F(y) in probability, as desired. ∎
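The estimation-error argument above can be illustrated in the simplest setting: take h_t(θ) = Y_t − θ with Y_t IID N(0,1), θ* = 0, and θ̂_n the sample mean, so that F̂_n is the empirical distribution of the residuals; the model choice and all names below are ours, purely for illustration.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    y = 0.5  # evaluation point
    for n in (100, 1000, 10000):
        Y = rng.standard_normal(n)
        theta_hat = Y.mean()                # |theta_hat - theta*| = O_P(n^(-1/2))
        F_hat = np.mean(Y - theta_hat < y)  # empirical df of h_t(theta_hat) at y
        print(n, F_hat, norm.cdf(y))        # F_hat approaches F(y) = Phi(y)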
Proof of Theorem 3: (i, ii, and iii) Ĝ_n = G̈_n + Ḧ_n = G_n + H_n + o_P(1) by Lemmas 3(i) and 3(ii). Further, G_n + H_n = G̃_n ⇒ G̃ by Theorem 2(ii). Thus, Ĝ_n ⇒ G̃, which implies the desired result together with the continuous mapping theorem. ∎
7.2 Other Test Statistics
In this section, we provide the formulae of other test statistics used for our Monte Carlo experiments.
The following statistics are used for DGPs 2.1, 2.2, and 2.3 (a computational sketch follows the definitions):

RE_n := max_k [k/(σ̂_n n)] max[α̃_k, β̃_k];
RR_n := max_k [k/(σ̂_n n)] max[α̃_k, β̃_k] − min_k [k/(σ̂_n n)] max[α̃_k, β̃_k];
RECUSUM_n := max_k [1/(σ̃_n √(n−1))] |Σ_{t=2}^k v_t|;
OLSCUSUM_n := max_k [1/(σ̂_n √n)] |Σ_{t=1}^k e_t|;
M_n := max_k sup_z |(k/n)(1 − k/n) √n ( k^{−1} Σ_{t=1}^k 1{e_t ≤ z} − (n−k)^{−1} Σ_{t=k+1}^n 1{e_t ≤ z} )|;
SupW_n := sup_{k₁ ≤ k ≤ k₂} W_n(k);
ExpW_n := ln{ [1/(k₂ − k₁ + 1)] Σ_{k=k₁}^{k₂} exp[0.5 W_n(k)] };
AvgW_n := [1/(k₂ − k₁ + 1)] Σ_{k=k₁}^{k₂} W_n(k),

where

[α̃_k, β̃_k]' = M_n^{1/2} [α̂_{1,k} − α̂_{1,n}, β̂_{1,k} − β̂_{1,n}]';  M_n := Z'Z;
Z is the n × 2 matrix of the regressors;
(α̂_{1,k}, β̂_{1,k}) is the OLS estimator using the first k observations;
(α̂_{2,k}, β̂_{2,k}) is the OLS estimator using the (k+1)-th to n-th observations;
e_t := Y_t − α̂_{1,n} − β̂_{1,n} Z_t;  σ̂_n² := (n−2)^{−1} Σ_{t=1}^n e_t²;
v_t := Y_t − α̂_{1,t−1} − β̂_{1,t−1} Z_t;  σ̃_n² := (n−2)^{−1} Σ_{t=3}^n v_t²;
W_n(k) := { (n−2)σ̂_n² − (n−3)σ̈_n²(k) } / σ̈_n²(k);  k₁ = ⌊0.15n⌋;  k₂ = ⌊0.85n⌋;
ŵ_{1,t}(k) := Y_t − α̂_{1,k} − β̂_{1,k} Z_t;  ŵ_{2,t}(k) := Y_t − α̂_{2,k} − β̂_{2,k} Z_t; and
σ̈_n²(k) := (n−3)^{−1} { Σ_{t=1}^k ŵ_{1,t}(k)² + Σ_{t=k+1}^n ŵ_{2,t}(k)² }.
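As referenced above, here is a minimal computational sketch of two of these statistics, OLSCUSUM_n and W_n(k) with its supremum over the trimmed range; it assumes the linear DGP Y_t = α + βZ_t + u_t, and all function and variable names are ours.

    import numpy as np

    def ols(y, X):
        return np.linalg.lstsq(X, y, rcond=None)[0]  # OLS coefficients

    def olscusum_supw(y, z):
        n = len(y)
        X = np.column_stack([np.ones(n), z])
        e = y - X @ ols(y, X)                        # full-sample OLS residuals e_t
        sigma2 = e @ e / (n - 2)                     # sigma-hat_n^2
        cusum = np.max(np.abs(np.cumsum(e))) / np.sqrt(sigma2 * n)  # OLSCUSUM_n
        k1, k2 = int(0.15 * n), int(0.85 * n)
        supw = -np.inf
        for k in range(k1, k2 + 1):
            b1, b2 = ols(y[:k], X[:k]), ols(y[k:], X[k:])
            rss = np.sum((y[:k] - X[:k] @ b1) ** 2) + np.sum((y[k:] - X[k:] @ b2) ** 2)
            sig2k = rss / (n - 3)                    # divisor (n - 3) as defined above
            supw = max(supw, ((n - 2) * sigma2 - (n - 3) * sig2k) / sig2k)  # W_n(k)
        return cusum, supw

    rng = np.random.default_rng(3)
    z = rng.standard_normal(200)
    y = 1.0 + 0.5 * z + rng.standard_normal(200)
    print(olscusum_supw(y, z))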
The following statistics are used for DGPs 2.4, 2.5, and 2.6 (a sketch of the split-sample Wald statistic follows the definitions):

RE_n := max_k [k/(σ̂_n n)] |β̃_k|;
RR_n := max_k [k/(σ̂_n n)] |β̃_k| − min_k [k/(σ̂_n n)] |β̃_k|;
RECUSUM_n := max_k [1/(σ̃_n √(n−1))] |Σ_{t=2}^k v_t|;
NLSCUSUM_n := max_k [1/(σ̂_n √n)] |Σ_{t=1}^k e_t|;
M_n := max_k sup_z |(k/n)(1 − k/n) √n ( k^{−1} Σ_{t=1}^k 1{e_t ≤ z} − (n−k)^{−1} Σ_{t=k+1}^n 1{e_t ≤ z} )|;
SupW_n := sup_{k₁ ≤ k ≤ k₂} W_n(k);
ExpW_n := ln{ [1/(k₂ − k₁ + 1)] Σ_{k=k₁}^{k₂} exp[0.5 W_n(k)] };
AvgW_n := [1/(k₂ − k₁ + 1)] Σ_{k=k₁}^{k₂} W_n(k),

where

β̃_k = M̃_n^{1/2} [β̂_{1,k} − β̂_{1,n}];  M̃_n := Σ_{t=1}^n exp(2Z_t β̂_{1,n}) Z_t²;
β̂_{1,k} is the NLS estimator using the first k observations;
β̂_{2,k} is the NLS estimator using the (k+1)-th to n-th observations;
e_t := Y_t − exp(β̂_{1,n} Z_t);  σ̂_n² := (n−1)^{−1} Σ_{t=1}^n e_t²;
v_t := Y_t − exp(β̂_{1,t−1} Z_t);  σ̃_n² := (n−1)^{−1} Σ_{t=2}^n v_t²;
W_n(k) := n {β̂_{1,k} − β̂_{2,k}} { nV̂_{1,k}/k + nV̂_{2,k}/(n−k) }^{−1} {β̂_{1,k} − β̂_{2,k}}; and
V̂_{1,k} and V̂_{2,k} are the variance estimators of β̂_{1,k} and β̂_{2,k}, respectively.
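For the nonlinear design, W_n(k) can be computed along the following lines. This is an illustrative sketch in which the NLS fits use scipy's curve_fit and the variance estimators V̂_{1,k}, V̂_{2,k} are rescaled from its returned covariances; these implementation choices, and all names, are ours.

    import numpy as np
    from scipy.optimize import curve_fit

    def expmodel(z, beta):
        return np.exp(beta * z)

    def wald_k(y, z, k):
        """Split-sample Wald statistic W_n(k) for Y_t = exp(beta * Z_t) + u_t."""
        n = len(y)
        (b1,), cov1 = curve_fit(expmodel, z[:k], y[:k], p0=[0.0])
        (b2,), cov2 = curve_fit(expmodel, z[k:], y[k:], p0=[0.0])
        v1, v2 = k * cov1[0, 0], (n - k) * cov2[0, 0]  # per-observation variances
        d = b1 - b2
        return n * d * d / (n * v1 / k + n * v2 / (n - k))

    rng = np.random.default_rng(4)
    z = rng.standard_normal(200)
    y = np.exp(0.5 * z) + rng.standard_normal(200)
    print(wald_k(y, z, 100))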
References
[1] Altonji, J. (1986). “Intertemporal Substitution in Labor Supply: Evidence from Micro Data,”
The Journal of Political Economy, 94, S176–S215.
[2] Andrews, D. (1993). “Tests for Parameter Instability and Structural Change with Unknown
Change Point,” Econometrica, 61, 821–856.
[3] Andrews, D. and Ploberger, W. (1994). “Optimal Tests When a Nuisance Parameter is Present
only under the Alternative,” Econometrica, 62, 1383–1414.
[4] Andrews, D., Lee, I. and Ploberger, W. (1996). “Optimal Changepoint Tests for Normal Linear Regression,” Journal of Econometrics, 70, 9–38.
[5] Andrews, D. (2001). “Testing When a Parameter is on the Boundary of the Maintained Hypothesis,” Econometrica, 69, 683–734.
[6] Bai, J. (1996). “Testing for Parameter Constancy in Linear Regressions: an Empirical Distribution Function Approach,” Econometrica, 64, 597–622.
[7] Bickel, P. and Wichura, M. (1971). “Convergence Criteria for Multiparameter Stochastic Processes and Some Applications,” The Annals of Mathematical Statistics, 42, 1656–1670.
[8] Bierens, H. (1990). “A Consistent Conditional Moment Test of Functional Form,” Econometrica, 58, 1443–1458.
[9] Billingsley, P. (1968, 1999). Convergence of Probability Measures. New York: Wiley.
[10] Brett, C. and Pinkse, J. (1997). “Those Taxes Are All over the Map! A Test for Spatial
Independence of Municipal Tax Rates in British Columbia,” International Regional Science
Review, 20, 131–151.
[11] Brown, R., Durbin, J. and Evans, J. (1975). “Techniques for Testing the Constancy of Regression Relationships over Time,” Journal of the Royal Statistical Society, Series B, 37,
149–163.
[12] Chu, J., Hornik, K. and Kuan, C. (1995a). “MOSUM Tests for Parameter Constancy,” Biometrika, 82, 603–617.
[13] Chu, J., Hornik, K. and Kuan, C. (1995b). “The Moving-Estimates Test for Parameter Stability,” Econometric Theory, 11, 699–720.
[14] Darling, D. (1955). “The Cramér–Smirnov Test in the Parametric Case,” The Annals of Mathematical Statistics, 28, 823–838.
[15] Diebold, F., Gunther, T. and Tay, A. (1998). “Evaluating Density Forecasts with Applications to Financial Risk Management,” International Economic Review, 39, 863–883.
[16] Dodd, E. (1942). “Certain Tests for Randomness Applied to Data Grouped into Small Sets,”
Econometrica, 10, 249–257.
[17] Donsker, M. (1951). “An Invariance Principle for Certain Probability Limit Theorems,” Memoirs of the American Mathematical Society, 6.
[18] Durbin, J. (1973). “Weak Convergence of the Sample Distribution Function When Parameters are Estimated,” The Annals of Statistics, 1, 279–290.
[19] Elliott, G. and Müller, U. (2006). “Optimally Testing General Breaking Processes in Linear
Time Series Models,” The Review of Economic Studies, 73, 907–940.
[20] Fama, E. (1965). “The Behavior of Stock Market Prices,” The Journal of Business, 38, 34–
105.
[21] Feller, W. (1951). “The Asymptotic Distribution of the Range of Sums of Independent Random Variables,” The Annals of Mathematical Statistics, 22, 427–432.
[22] Goodman, L. (1958). “Simplified Runs Tests and Likelihood Ratio Tests for Markov Chains,” Biometrika, 45, 181–197.
[23] Granger, C. (1963). “A Quick Test for Serial Correlation Suitable for Use with Nonstationary Time Series,” Journal of the American Statistical Association, 58, 728–736.
[24] Grenander, U. (1981). Abstract Inference. New York: Wiley.
[25] Heckman, J. (2001). “Micro Data, Heterogeneity, and the Evaluation of Public Policy: Nobel
Lecture,” The Journal of Political Economy, 109, 673–746.
[26] Henze, N. (1996). “Empirical Distribution Function Goodness-of-Fit Tests for Discrete Models,” Canadian Journal of Statistics, 24, 81–93.
[27] Hochberg, Y. (1988). “A Sharper Bonferroni Procedure for Multiple Tests of Significance,” Biometrika, 75, 800–802.
[28] Hong, Y. (1999). “Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach,” Journal of the American Statistical Association, 94, 1201–1220.
[29] Hong, Y. and White, H. (2005). “Asymptotic Distribution Theory for Nonparametric Entropy Measures of Serial Dependence,” Econometrica, 73, 837–901.
[30] Jain, N. and Kallianpur, G. (1970). “Norm Convergent Expansions for Gaussian Processes in
Banach Spaces,” Proceedings of the American Mathematical Society, 25, 890–895.
[31] Jain, N. and Marcus, M. (1975). “Central Limit Theorem for C(S)–valued Random Variables,” Journal of Functional Analysis, 19, 216–231.
[32] Karlin, S. and Taylor, H. (1975). A First Course in Stochastic Processes. San Diego: Academic Press.
[33] Kocherlakota, S. and Kocherlakota, K. (1986). “Goodness-of-Fit Tests for Discrete Distributions,” Communications in Statistics–Theory and Methods, 15, 815–829.
[34] Kuan, C. and Hornik, K. (1995). “The Generalized Fluctuation Test: a Unifying View,” Econometric Reviews, 14, 135–161.
[35] Loève, M. (1978). Probability Theory II. New York: Springer-Verlag.
[36] MaCurdy, T. (1981). “An Empirical Model of Labor Supply in a Life-Cycle Setting,” The Journal of Political Economy, 89, 1059–1085.
[37] Mood, A. (1940). “The Distribution Theory of Runs,” The Annals of Mathematical Statistics,
11, 367–392.
[38] Nyblom, J. (1989). “Testing for the Consistency of Parameters over Time,” Journal of the
American Statistical Association, 84, 223–230.
[39] Phillips, P. (1998). “New Tools for Understanding Spurious Regressions,” Econometrica, 66,
1299–1326.
[40] Pinkse, J. (1998). “Consistent Nonparametric Testing for Serial Independence,” Journal of
Econometrics, 84, 205–231.
[41] Ploberger, W., Krämer, W. and Kontrus, K. (1989). “A New Test for Structural Stability in
the Linear Regression Model,” Journal of Econometrics, 40, 307–318.
[42] Ploberger, W. and Krämer, W. (1992). “The CUSUM Test with OLS Residuals,” Econometrica, 60, 271–285.
[43] Robinson, P. (1991). “Consistent Nonparametric Entropy-Based Testing,” Review of Economic Studies, 58, 437–453.
[44] Rueda, R., Pérez-Abreu, V. and O’Reilly, F. (1991). “Goodness-of-Fit Test for the Poisson Distribution Based on the Probability Generating Function,” Communications in Statistics–Theory and Methods, 20, 3093–3110.
[45] Sen, P. (1980). “Asymptotic Theory of Some Tests for a Possible Change in the Regressional
Slope Occurring at an Unknown Time Point,” Zeitschrift für Wahrscheinlichkeitstheorie und
Verwandte Gebiete, 52, 203–218.
[46] Skaug, H. and Tjøstheim, D. (1996). “Measures of Distance between Densities with Application to Testing for Serial Independence,” Time Series Analysis in Memory of E. J. Hannan, ed. P. Robinson and M. Rosenblatt. New York: Springer-Verlag, 363–377.
[47] Stinchcombe, M. and White, H. (1998). “Consistent Specification Testing with Nuisance
Parameters Present Only under the Alternative,” Econometric Theory, 14, 295–324.
[48] Sukhatme, P. (1972). “Fredholm Determinant of a Positive Definite Kernel of a Special Type
and Its Applications,” The Annals of Mathematical Statistics, 43, 1914–1926.
[49] van der Vaart, A. and Wellner, J. (1996). Weak Convergence and Empirical Processes with Applications to Statistics. New York: Springer-Verlag.
Table 1: ASYMPTOTIC CRITICAL VALUES OF THE TEST STATISTICS

    Statistic \ Level    p       1%       5%       10%
    T^p_{1,n}            0.1     0.2474   0.1914   0.1639
                         0.3     0.5892   0.4512   0.3828
                         0.5     0.8124   0.6207   0.5239
                         0.7     0.9007   0.6841   0.5763
                         0.9     0.7052   0.5329   0.4478
    T^p_{∞,n}            0.1     0.7483   0.5683   0.4750
                         0.3     1.3517   1.0237   0.8582
                         0.5     1.6846   1.2818   1.0728
                         0.7     1.7834   1.3590   1.4000
                         0.9     1.3791   1.0486   0.8839
    T_{1,n}                      0.4524   0.3595   0.3157
    T_{∞,n}                      2.2586   1.9137   1.7382
    T̃^p_{1,n}           0.1     0.2230   0.1727   0.1420
                         0.3     0.5004   0.3842   0.3225
                         0.5     0.6413   0.4886   0.4092
                         0.7     0.6065   0.4632   0.3889
                         0.9     0.3066   0.2356   0.1973
    T̃^p_{∞,n}           0.1     0.7454   0.5677   0.4750
                         0.3     1.3239   1.0069   0.8441
                         0.5     1.5909   1.1990   1.0091
                         0.7     1.5019   1.1360   0.9617
                         0.9     0.7912   0.6028   0.5060
    T̃_{1,n}                     0.2994   0.2427   0.2159
    T̃_{∞,n}                     2.0346   1.7243   1.5603
Table 2: LEVEL SIMULATION AT 5% LEVEL (IN PERCENT, 10,000 ITERATIONS)

    DGP 1.1
    Statistic \ n           100     300     500
    T̃^p_{1,n}   p = 0.1    4.05    4.71    5.09
                 0.3        5.02    4.95    4.61
                 0.5        5.22    4.74    4.87
                 0.7        5.34    4.90    5.31
                 0.9        4.35    6.63    6.07
    T̃^p_{∞,n}   p = 0.1    3.86    4.44    4.49
                 0.3        3.54    4.65    4.59
                 0.5        5.42    4.40    5.05
                 0.7        7.00    5.03    5.08
                 0.9        4.21    6.27    4.71
    T̃_{1,n}                4.85    4.88    4.54
    T̃_{∞,n}                5.17    7.15    7.75
    R_n (a)                 4.7
    ST_n (a)                5.1
    HW_n (a)                6.5

    a: These are the results in Hong and White (2005). Their number of replications is 1,000.
Table 3: POWER SIMULATION AT 5% LEVEL (IN PERCENT, 3,000 ITERATIONS)

                            DGP 1.2         DGP 1.3         DGP 1.4         DGP 1.5
    Statistic \ n           100     200     100     200     100     200     100     200
    T̃^p_{1,n}   p = 0.1   20.67   33.17   29.50   45.00   27.53   47.47    5.80    9.27
                 0.3       33.27   59.37    8.43   10.57    8.77   12.07    9.77   15.40
                 0.5       36.60   65.17    5.33    5.47    5.20    5.80   18.77   32.93
                 0.7       34.13   60.30    9.33   11.37    5.37    6.10   76.17   97.33
                 0.9       23.77   38.07   30.30   51.57   15.40   23.97   75.37   97.10
    T̃^p_{∞,n}   p = 0.1    6.60    6.67    7.10    6.33    5.03    5.37    3.63    4.73
                 0.3       10.57   23.97    3.60    5.40    3.70    5.40    5.70   11.00
                 0.5       24.07   42.87    5.77    6.13    4.90    6.33   16.43   26.70
                 0.7       29.53   41.67    9.80    6.87    6.93    4.20   72.97   92.77
                 0.9       22.47   22.63   27.30   32.67   14.43   13.30   72.30   89.87
    T̃_{1,n}               62.20   90.73   14.53   25.00    8.87   15.03   16.13   26.20
    T̃_{∞,n}               31.47   63.30    6.83   12.60    5.30    7.30   18.63   34.63
    R_n (a)                13.8    25.4    26.4    52.2    15.0     7.2    59.8    75.4
    ST_n (a)               12.4    22.0    61.2    90.0    27.8    52.0    81.6    98.4
    HW_n (a)               14.0    27.0    37.6    67.6    20.6    35.2    69.6    95.6

    a: These are the results in Hong and White (2005). Their number of replications is 500.
Table 4: POWER SIMULATION AT 5% LEVEL (IN PERCENT, 3,000 ITERATIONS)

                            DGP 1.6         DGP 1.7         DGP 1.8         DGP 1.9
    Statistic \ n           100     200     100     200     100     200     100     200
    T̃^p_{1,n}   p = 0.1    1.30    6.83    1.00    5.33    2.77    3.73   27.83   47.70
                 0.3        9.93   16.33   18.90   33.67   17.73   32.57   39.47   64.23
                 0.5        8.33   10.87    9.27   11.37   31.77   59.80   43.13   68.00
                 0.7       30.87   57.17   18.07   28.73   33.60   58.43   39.97   62.13
                 0.9       26.43   41.27   24.00   39.03   22.37   34.63   26.30   37.50
    T̃^p_{∞,n}   p = 0.1    3.73    5.73    4.33    4.30    3.77    3.67    9.50   14.53
                 0.3        6.00   10.47    8.93   16.23    5.83   12.80   22.23   44.33
                 0.5        8.47    9.90    8.10    9.83   20.47   36.93   34.60   59.23
                 0.7       33.63   47.20   16.67   18.17   30.03   39.93   37.83   54.90
                 0.9       25.80   29.60   22.53   22.70   20.63   19.97   24.83   29.80
    T̃_{1,n}               26.93   62.00   20.87   52.60   48.60   81.47   58.03   81.90
    T̃_{∞,n}               23.33   53.27   15.60   31.93   27.67   56.30   52.60   77.60
    R_n (a)                31.8    65.2    24.6    80.8    14.2    34.6    60.2    84.0
    ST_n (a)               34.8    72.8    25.8    86.8    13.4    23.8    55.8    79.8
    HW_n (a)               34.0    74.0    25.6    85.4    17.0    26.2    60.8    84.6

    a: These are the results in Hong and White (2005). Their number of replications is 500.
Table 5: LEVEL SIMULATION AT 5% LEVEL (IN PERCENT, 10,000 ITERATIONS)

                            DGP 2.1                 DGP 2.2
    Statistic \ n           100     300     500     100     300     500
    T̃^p_{1,n}   p = 0.1    4.12    4.88    5.02    4.25    5.01    5.02
                 0.3        5.12    5.22    5.07    5.20    5.12    5.23
                 0.5        4.76    5.51    5.33    5.02    5.04    5.38
                 0.7        5.81    5.63    5.19    5.14    4.51    5.32
                 0.9        3.71    6.63    5.92    4.35    6.87    6.02
    T̃^p_{∞,n}   p = 0.1    3.86    6.38    6.04    4.33    6.44    6.32
                 0.3        5.56    4.56    6.10    5.68    4.82    6.38
                 0.5        5.17    5.34    5.60    5.60    5.06    5.77
                 0.7        7.44    5.70    4.88    6.89    4.54    5.15
                 0.9        3.58    6.25    7.43    4.14    6.47    7.49
    T̃_{1,n}                5.14    4.72    4.35    4.97    4.88    4.40
    T̃_{∞,n}                5.00    7.19    7.38    5.10    7.30    7.79
    M_n                     2.89    4.43    4.75    3.98    5.33    5.62
    RE_n                    7.89    4.14    3.11   71.90   70.70   31.01
    RR_n                    9.28    4.30    2.74   64.39   60.78   66.15
    SupW_n                  4.77    4.41    4.57    2.93    0.94    1.31
    AvgW_n                  5.81    5.29    5.10    2.60    1.69    2.22
    ExpW_n                  5.34    5.35    5.04    1.78    2.35    3.13
    RECUSUM_n               1.65    3.07    3.68    3.41    3.86    3.64
    O(N)LSCUSUM_n           2.68    4.20    4.01   28.91   30.22   55.50
Table 6: POWER SIMULATION AT 5% LEVEL (IN PERCENT, 3,000 ITERATIONS)

                            DGP 2.3         DGP 2.4         DGP 2.5         DGP 2.6
    Statistic \ n           100     200     100     200     100     200     100     200
    T̃^p_{1,n}   p = 0.1   18.13   29.60   22.20   39.93   16.70   24.70   35.67   59.00
                 0.3       27.47   48.43   20.53   37.60    7.80   10.37   22.87   39.23
                 0.5       29.03   55.43   21.83   37.77    5.80    7.77   19.43   35.07
                 0.7       29.03   51.70   22.33   38.20    7.67   10.17   27.37   48.37
                 0.9       19.37   32.77   22.63   39.23   17.30   24.07   43.27   71.47
    T̃^p_{∞,n}   p = 0.1    5.90    5.53    5.50    6.80    6.07    5.03    7.57    9.47
                 0.3        7.70   21.80    6.87   16.43    4.77    8.40    7.20   18.10
                 0.5       18.23   33.47   13.80   20.50    5.93    6.50   12.63   20.10
                 0.7       25.27   44.40   18.83   32.30    7.30    9.83   22.47   41.37
                 0.9       17.37   39.73   20.33   44.47   15.27   25.77   40.10   74.80
    T̃_{1,n}               44.99   79.94   36.70   67.27    3.60    3.83   44.44   76.60
    T̃_{∞,n}               17.72   46.81   14.10   32.10    4.53    5.40   16.81   40.58
    M_n                    100.0   100.0   12.93   23.03    1.97    2.70   35.96   61.43
    RE_n                   100.0   100.0   100.0   100.0   94.96   100.0   79.50   89.13
    RR_n                   99.97   100.0   99.96   100.0   87.90   99.93   69.43   81.73
    SupW_n                 100.0   100.0   100.0   100.0   85.63   97.86   94.89   99.03
    AvgW_n                 100.0   100.0   100.0   100.0   73.76   92.60   94.87   99.02
    ExpW_n                 100.0   100.0   100.0   100.0   85.73   97.70   90.19   98.14
    RECUSUM_n               6.77    9.92    9.30   11.43   14.73   19.10   15.60   20.20
    O(N)LSCUSUM_n          19.67   24.29   22.43   26.60   15.50   22.03   83.03   92.40
Table 7: POWER SIMULATION AT 5% LEVEL (IN PERCENT, 3,000 ITERATIONS)

                            DGP 2.7         DGP 2.8         DGP 2.9         DGP 2.10        DGP 2.11
    Statistic \ n           100     200     100     200     100     200     100     200     100     200
    T̃^p_{1,n}   p = 0.1   18.13   29.13   18.13   27.06   14.86   23.66    4.10   12.33   24.03   45.53
                 0.3       27.47   49.87   25.50   43.26   20.06   37.43   37.83   63.23   20.53   34.53
                 0.5       29.03   55.77   26.33   47.90   21.56   41.06   46.40   74.57   18.93   29.27
                 0.7       29.03   50.97   25.46   45.06   22.70   39.90   35.87   63.67   15.20   24.10
                 0.9       19.37   32.33   21.10   27.73   16.06   26.13    0.43    0.03    0.93    0.20
    T̃^p_{∞,n}   p = 0.1    5.90    5.27    5.43    4.93    5.10    4.23    4.23    7.13   24.27   46.60
                 0.3        7.70   20.83    7.86   15.83    6.43   14.50   31.27   43.87   23.07   30.87
                 0.5       18.23   32.73   16.40   28.16   13.10   23.96   37.17   65.13   16.87   26.87
                 0.7       25.27   44.43   22.43   28.73   19.76   25.56   38.13   56.77   17.60   20.17
                 0.9       17.37   39.57   19.90   15.06   14.80   14.23    0.40    0.07    0.90    0.53
    T̃_{1,n}               44.99   80.86   43.76   75.36   35.86   67.36   68.63   90.45   35.16   51.88
    T̃_{∞,n}               17.72   43.10   17.93   40.50   15.73   35.36   50.51   78.32   25.16   42.63
    M_n                    100.0   13.67    9.80   12.20    8.10    9.40    0.66    0.86    2.36    2.73
    RE_n                   100.0   100.0   78.06   60.00   36.23    3.12    7.77    4.91   80.70   89.16
    RR_n                   99.97   99.96   77.86   59.73   34.33   30.93    8.85    5.24   71.16   79.93
    SupW_n                 100.0   100.0   98.16   63.26   41.86   46.06    3.73    3.46    3.82    3.64
    AvgW_n                 100.0   100.0   92.80   50.16   34.16   32.63    4.35    3.63    3.34    3.15
    ExpW_n                 100.0   100.0   98.86   62.66   43.50   42.40    3.56    3.19    1.76    1.87
    RECUSUM_n               6.77    9.20    7.03    9.50    5.46    8.50    0.23    0.39    1.26    3.10
    OLSCUSUM_n             19.67   23.36   17.80   20.40   13.63   15.70    0.43    0.39   28.43   35.13
Table 8: SAMPLE SIZES

    Sample Period    Initial Number of Households    Remaining Number of Households
    68–69            6,798                           1,129
    68–71            6,871                             939
    68–74            6,962                             697
    68–77            7,529                             489
    68–80            8,439                             346
Table 9: p-VALUES OF T̃_{1,n} AND T̃_{∞,n} (IN PERCENT)

    Panel:                  1968–1980       1968–1977       1968–1974       1968–1971       1968–1969
    Year \ Statistic        T̃_1    T̃_∞    T̃_1    T̃_∞    T̃_1    T̃_∞    T̃_1    T̃_∞    T̃_1    T̃_∞
    1969                   32.64   38.55    9.35   36.31   84.4    57.6    13.5     4.4    77.7    55.1
    1970                   36.73   21.19   33.43   53.81   13.9     5.6    37.8    30.9
    1971                   42.76   67.24   42.61   28.88   60.2    32.6    45.9     6.4
    1972                   11.58   15.27   65.91   35.15   16.3    16.1
    1973                   34.64   10.66   41.30   21.68   39.9    30.3
    1974                   75.09   30.05   66.41   73.32   98.8    89.4
    1975                   53.07   47.80   38.88   24.50
    1976                   72.17   22.04   65.03   19.17
    1977                   27.80   15.57   36.69   31.42
    1978                   27.39   25.96
    1979                   34.46   31.97
    1980                    8.83    4.88
    Hochberg's Bounds (a)  75.0            73.3            68.1            26.7            77.7
    Hochberg's Bounds (b)  86.4            88.2            51.6            63.8            88.7

    a: These are obtained from the T̃_{1,n} and T̃_{∞,n} statistics.
    b: These are obtained from the T̃^p_{1,n} and T̃^p_{∞,n} statistics for p = 0.3, 0.4, 0.5, 0.6, and 0.7.
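The Hochberg (1988) bounds in the last two rows combine each panel's individual p-values into a single bound on the joint p-value. With ordered p-values p_(1) ≤ · · · ≤ p_(m), the bound min_j (m − j + 1) p_(j) appears consistent with the reported entries (e.g., the 1968–1971 entry 26.7 ≈ 6 × 4.4 up to rounding); the sketch below is our own reconstruction, not code from the paper.

    def hochberg_bound(pvals):
        """Hochberg-style bound on the joint p-value: min over ordered
        p-values of (m - j + 1) * p_(j), j = 1, ..., m (Hochberg, 1988)."""
        p = sorted(pvals)
        m = len(p)
        return min((m - j) * pj for j, pj in enumerate(p))  # j is 0-based here

    # 1968-1971 panel p-values (in percent), from Table 9:
    print(hochberg_bound([13.5, 37.8, 45.9, 4.4, 30.9, 6.4]))  # 26.4; table reports 26.7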