Generalized Runs Tests for the IID Assumption

JIN SEO CHO (School of Economics and Finance, Victoria University of Wellington, PO Box 600, Wellington 6001, New Zealand. Email: [email protected])
HALBERT WHITE (Department of Economics, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0508, U.S.A. Email: [email protected])

First version: March 2004. This version: January 2007.

Abstract: We provide a family of tests for the IID hypothesis based on generalized runs, powerful against unspecified alternatives, providing a useful complement to tests designed for specific alternatives, such as serial correlation, GARCH, or structural breaks. Our tests have appealing computational simplicity in that they do not require kernel density estimation, with the associated challenge of bandwidth selection; and they have power against dependent alternatives without having to examine dependence at specific lags. Simulations show levels close to nominal asymptotic levels and encouraging power against important alternatives. We apply our tests to residuals from a male labor supply model using the PSID and find no evidence against the IID hypothesis.

Key Words: IID condition; runs test; geometric distribution; Gaussian process; dependence; structural break; model specification; PSID; male labor supply.

JEL Classification: C12, C23, C80, J22.

Acknowledgements: The authors have benefited from discussions with Robert Davies, Chirok Han, Estate Khmaladze, Tae-Hwy Lee, Peter Thomson, David Vere-Jones and participants at NZESG, NZSA, SRA, UC-Riverside, and SEF and SMCS of Victoria University of Wellington.

Preliminary version. Do not cite or circulate without the authors' permission.

1 Introduction

The assumption that data are independent and identically distributed (IID) plays a central role in the analysis of economic data. In cross-section settings, the IID assumption holds under pure random sampling.
As Heckman (2001) notes, violation of the IID property can indicate the presence of sample selection bias. The IID assumption is also important in time-series settings, as the processes driving time series of interest are often assumed to be IID. Moreover, transformations of certain time series can be shown to be IID under specific null hypotheses. For example, Diebold, Gunther, and Tay (1998) show that to test density forecast optimality, one can test whether the series of probability integral transforms of the forecast errors is IID uniform (U[0,1]). There are many tests designed to test the IID assumption against specific alternatives, such as structural breaks, serial correlation, or autoregressive conditional heteroskedasticity. Such special-purpose tests may lack power in other directions, however, so it is useful to have available broader diagnostics that may alert researchers to otherwise unsuspected properties of their data. Thus, as a complement to special-purpose tests, we consider tests for the IID hypothesis that are sensitive to general alternatives. Here we exploit runs statistics to obtain a necessary and sufficient condition for data to be IID. Specifically, we show that the runs are IID with geometric distribution if and only if the underlying data are IID. By testing whether the runs have the requisite geometric distribution, we obtain a new family of tests, the generalized runs tests, suitable for testing the IID property. An appealing aspect of our tests is their computational convenience relative to other tests sensitive to general alternatives to IID. For example, Hong and White's (2005) entropy-based IID tests require kernel density estimation, with its associated challenge of bandwidth selection, together with specification of particular lags at which to test for dependence. Our tests do not require kernel estimation and, as we show, have power against dependence without explicit examination of specific lags.
Our tests also have power against structural break alternatives. Runs have formed an effective means for understanding data properties since the early 1940s. Mood (1940), Dodd (1942), and Goodman (1958) first studied runs to test for randomness of data with a fixed percentile p used in defining the runs. Granger (1963) proposes using runs as a nonparametric diagnostic for serial correlation; he also emphasizes that the choice of p is important for the power of the test. Fama (1965) extensively exploits a runs test to examine stylized facts of asset returns in US industries, with a particular focus on testing for serial correlation of asset returns. Heckman (2001) observes that runs tests can be exploited to detect sample selection bias in cross-sectional data. Earlier runs tests compared the mean or other moments of the runs to those of the geometric distribution for fixed p. Here we develop runs tests based on the probability generating function (PGF) of the geometric distribution. Previously, Kocherlakota and Kocherlakota (1986) (KK) have used the PGF to devise tests for discrete random variables having a given distribution under the null hypothesis. Using fixed values of the PGF parameter s, KK develop tests for the Poisson, Pascal-Poisson, bivariate Poisson, and bivariate Neyman type A distributions. More recently, Rueda, Pérez-Abreu, and O'Reilly (1991) study PGF-based tests for the Poisson null hypothesis, constructing test statistics as functionals of stochastic processes indexed by the PGF parameter s. Here we contribute by developing PGF-based tests for the geometric distribution with parameter p, applied to the runs for a given sample. We construct our test statistics as functionals of stochastic processes indexed by both the runs percentile p and the PGF parameter s. By not restricting ourselves to fixed values for p and/or s, we create the opportunity to construct tests with superior power.
Further, we obtain weak limits for our statistics in situations where the distribution of the raw data from which the runs are constructed may or may not be known and where there may or may not be estimated parameters. As pointed out by Darling (1955), Sukhatme (1972), Durbin (1973), and Henze (1996), among others, goodness-of-fit (GOF) based statistics such as ours may have limiting distributions affected by parameter estimation. As we show, however, our test statistics have asymptotic null distributions that are not affected by parameter estimation under mild conditions. We also provide straightforward simulation methods to consistently estimate asymptotic critical values for our test statistics. We conduct Monte Carlo experiments to explore the properties of our tests in settings relevant for economic applications. We find that our tests have well-behaved levels. In studying power, we give particular attention to dependent alternatives and to alternatives containing an unknown number of structural breaks. For dependent alternatives, we compare our generalized runs tests to the entropy-based tests of Robinson (1991), Skaug and Tjøstheim (1996), and Hong and White (2005). Our tests perform respectably, showing useful, and in some cases superior, power against dependent alternatives. For structural break alternatives, we compare our generalized runs tests to Feller's (1951) and Kuan and Hornik's (1995) RR test, Brown, Durbin and Evans's (1975) RE-CUSUM test, Sen's (1980) and Ploberger, Krämer and Kontrus's (1989) RE test, Ploberger and Krämer's (1992) OLS-CUSUM test, Andrews's (1993) Sup-W test, Andrews and Ploberger's (1994) Exp-W and Avg-W tests, and Bai's (1996) M-test. These prior tests are all designed to detect a finite number of structural breaks at unknown locations. An innovation here is that we consider alternatives in which the number of breaks grows with the sample size.
We find that our new tests perform well against such structural break alternatives, whereas the prior tests do not. Finally, we consider a model of male labor supply developed by MaCurdy (1981) and Altonji (1986), which they estimate using the Panel Study of Income Dynamics (PSID). Using the PSID, we test whether the IID hypothesis holds for the residuals of this model. This enables us to investigate the success of the original random sampling design, its preservation over time, the presence or absence of structural breaks, and the possible presence or absence of model misspecification. We find little evidence against the IID null hypothesis, supporting the design of the PSID and the specification of the MaCurdy (1981)-Altonji (1986) model. This paper is organized as follows. In Section 2, we introduce our new family of generalized runs statistics and derive their asymptotic null distributions. These involve Gaussian stochastic processes. Section 3 provides methods for consistently estimating critical values for the test statistics of Section 2. We achieve this using other easily simulated Gaussian processes whose distributions are identical to those of Section 2. Section 4 contains Monte Carlo simulations, and Section 5 applies our methods to the PSID data. Section 6 contains concluding remarks, and all mathematical proofs are collected in the Appendix. Before proceeding, we introduce mathematical notation used throughout. We let 1{ · } stand for the indicator function such that 1{A} = 1 if the event A is true, and 0 otherwise. ⇒ and → denote 'converge(s) weakly' and 'converge(s) to', respectively, and =d denotes equality in distribution. Further, ‖ · ‖ and ‖ · ‖∞ denote the Euclidean and uniform metrics, respectively. We let C(A) and D(A) be the spaces of continuous and cadlag mappings from a set A to R, respectively, and we endow these spaces with Billingsley's (1968, 1999) or Bickel and Wichura's (1971) metric. We denote the unit interval as I := [0, 1].
2 Testing the IID Condition

2.1 Maintained Assumptions

We first specify the data generating process (DGP) and a parameterized function whose behavior is of interest.

A1 (DGP): Let (Ω, F, P) be a complete probability space. For m ∈ N, {Xt : Ω → Rm, t = 1, 2, · · · } is a stochastic process on (Ω, F, P).

A2 (PARAMETERIZATION): For d ∈ N, let Θ be a non-empty convex compact subset of Rd. Let h : Rm × Θ → R be a function such that (i) for each θ ∈ Θ, h(Xt( · ), θ) is measurable; and (ii) for each ω ∈ Ω, h(Xt(ω), · ) is such that for each θ, θ′ ∈ Θ, |h(Xt(ω), θ) − h(Xt(ω), θ′)| ≤ Mt(ω)‖θ − θ′‖, where Mt is measurable and is OP(1), uniformly in t.

Assumption A2 specifies that Xt is transformed via h. The Lipschitz condition of (ii) is mild and typically holds in applications involving estimation. Our next assumption restricts attention to continuously distributed random variables.

A3 (CONTINUOUS RANDOM VARIABLES): For given θ∗ ∈ Θ, the random variables Yt := h(Xt, θ∗) have continuous cumulative distribution functions (CDFs) Ft : R → I, t = 1, 2, · · · .

Our main interest attaches to distinguishing the following hypotheses:

H0 : {Yt : t = 1, 2, · · · } is an IID sequence;
H1 : {Yt : t = 1, 2, · · · } is not an IID sequence.

Under H0, Ft ≡ F (say), t = 1, 2, . . . . We separately treat the cases in which F is known or unknown. In the latter case, we estimate F using the empirical distribution function. We also separately consider cases in which θ∗ is known or unknown. In the latter case, we assume θ∗ is consistently estimated by θ̂n. Formally, we impose

A4 (ESTIMATOR): There exists a sequence of measurable functions {θ̂n : Ω → Θ} such that √n(θ̂n − θ∗) = OP(1).

This condition is relevant when the behavior of transformations of the data, such as regression errors, is of interest. We pay particular attention to the effect of parameter estimation on the asymptotic null distribution of our test statistics.
2.2 Generalized Runs (GR) Tests

We begin by analyzing the case in which θ∗ and F are known. We define runs in the following two steps. First, for each p ∈ I, we let

Tn(p) := {t ∈ {1, · · · , n} : F(Yt) < p},  n = 1, 2, · · · .

This set contains those indices whose percentiles F(Yt) are less than the given number p. Next, let Mn(p) denote the (random) number of elements of Tn(p), let tn,i(p) denote the ith smallest element of Tn(p), i = 1, · · · , Mn(p), and define the p-runs Rn,i(p) as

Rn,i(p) := tn,i(p) if i = 1;  and  Rn,i(p) := tn,i(p) − tn,i−1(p) if i = 2, · · · , Mn(p).

Thus, a p-run Rn,i(p) is the number of observations separating data values whose percentiles are less than the given value p. This notion is related to that of stopping times. Note that Mn(p)/n = p + oP(1). Our first result provides a characterization of runs, new to the best of our knowledge, that can be exploited to yield a variety of runs-based tests consistent against arbitrary departures from the IID null hypothesis.

LEMMA 1: Suppose Assumptions A1, A2(i), and A3 hold. Then for each n = 1, 2, · · · , {Yt, t = 1, · · · , n} is IID if and only if for every p ∈ I, {Rn,i(p), i = 1, · · · , Mn(p)} is IID with distribution Gp, the geometric distribution with parameter p.

By Lemma 1, alternatives to IID {Yt} may manifest as p-runs with (i) distribution Gq, q ≠ p; (ii) non-geometric distribution; (iii) heterogeneous distributions; (iv) dependence; or (v) any combination of (i)–(iv). To keep our analysis manageable, we focus on detecting (i)–(iii) by testing the p-runs for distribution Gp. As we shall see, this delivers well-behaved tests with power against both dependent and heterogeneous alternatives to IID. Specifically, we exploit a GOF statistic based on the PGF to test the Gp hypothesis. For this, let −1 < s̲ < 0 < s̄ < 1; for each s ∈ S := [s̲, s̄], define

Gn(p, s) := (1/√n) Σi=1..Mn(p) [ s^Rn,i(p) − sp/{1 − s(1 − p)} ],  if p ∈ (1/n, (n−1)/n),  (1)

and Gn(p, s) := 0 otherwise.
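To make the two-step construction concrete, the p-runs and the statistic in (1) are straightforward to compute. The sketch below is our illustration, not the authors' code; the names `p_runs` and `G_n` are hypothetical, and it assumes the percentiles u_t = F(Y_t) have already been computed from the known null CDF F:

```python
import numpy as np

def p_runs(u, p):
    """p-runs R_{n,i}(p): gaps between (1-based) indices t with F(Y_t) < p."""
    t = np.flatnonzero(u < p) + 1       # T_n(p), as 1-based indices
    if t.size == 0:
        return np.array([], dtype=int)
    return np.diff(t, prepend=0)        # R_1 = t_1, R_i = t_i - t_{i-1}

def G_n(u, p, s):
    """PGF contrast of equation (1); u are the known-CDF percentiles F(Y_t)."""
    n = u.size
    if not (1.0 / n < p < (n - 1.0) / n):
        return 0.0                      # G_n(p, s) := 0 outside (1/n, (n-1)/n)
    r = p_runs(u, p)
    return (s ** r - s * p / (1.0 - s * (1.0 - p))).sum() / np.sqrt(n)

# Under the null, Y_t IID with known CDF F makes u_t = F(Y_t) IID U[0,1]:
rng = np.random.default_rng(0)
u = rng.uniform(size=1000)
stat = G_n(u, 0.5, 0.5)
```

Each summand compares s^R with the Gp PGF value sp/{1 − s(1 − p)}, so `stat` is OP(1) under the null.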
This is a scaled difference between the p-runs sample PGF and the Gp PGF. Two types of GOF statistics are popular in the literature: those exploiting the empirical distribution function (e.g., Darling, 1955; Sukhatme, 1972; Durbin, 1973; and Henze, 1996) and those comparing empirical characteristic or moment generating functions (MGFs) with their sample estimates (e.g., Bierens, 1990; Brett and Pinkse, 1997; Stinchcombe and White, 1998; Hong, 1999; and Pinkse, 1998). The statistic in (1) belongs to the latter type, as the PGF for discrete random variables plays the same role as the MGF, as noted by Karlin and Taylor (1975). The PGF is especially convenient because it is a rational polynomial in s, enabling us to easily handle the weak limit of the process Gn. Specifically, the rational polynomial structure permits us to represent this weak limit as an infinite sum of independent Gaussian processes, enabling us to straightforwardly estimate critical values by simulation, as examined in detail in Section 3. Our use of Gn builds on work of Kocherlakota and Kocherlakota (1986) (KK), who consider tests for a number of discrete null distributions, based on a comparison of sample and theoretical PGFs for a given finite set of s's. To test their null distributions, KK recommend choosing s's close to zero. Subsequently, Rueda, Pérez-Abreu, and O'Reilly (1991) examined the weak limit of an analog of Gn(p, · ) to test the IID Poisson null hypothesis. Here we show that if {Rn,i(p)} is a sequence of IID p-runs distributed as Gp, then Gn(p, · ) obeys the functional central limit theorem; test statistics can be constructed accordingly. Specifically, for each p, Gn(p, · ) ⇒ G(p, · ), where G(p, · ) is a Gaussian process such that for each s, s′ ∈ S, E[G(p, s)] = 0, and

E[G(p, s)G(p, s′)] = ss′p²(1 − s)(1 − s′)(1 − p) / [{1 − s(1 − p)}{1 − s′(1 − p)}{1 − ss′(1 − p)}].  (2)

This follows by showing that {Gn(p, · ) : n = 1, 2, · · · } is tight (see Billingsley, 1999); the given covariance structure (2) is derived from E[Gn(p, s)Gn(p, s′)] under the null. Let f : C(S) → R be a continuous mapping. Then by the continuous mapping theorem, under the null any test statistic f[Gn(p, · )] obeys f[Gn(p, · )] ⇒ f[G(p, · )]. As Granger (1963) emphasizes, the power of runs tests may depend critically on the specific choice of p. To take full advantage of this, we consider the behavior of Gn as a random function of both p and s, and not just the behavior of Gn(p, · ) for given p. Specifically, under the null, a functional central limit theorem ensures that Gn ⇒ G, where G is a Gaussian process such that for each (p, s) and (p′, s′) with p′ ≤ p, E[G(p, s)] = 0, and

E[G(p, s)G(p′, s′)] = ss′p′²(1 − s)(1 − s′)(1 − p){1 − s′(1 − p)} / [{1 − s(1 − p)}{1 − s′(1 − p′)}²{1 − ss′(1 − p)}].  (3)

Note that if p = p′ then the covariance structure is as in (2). For each s, Gn( · , s) is cadlag, so the tightness of {Gn} must be proved differently from that of {Gn(p, · )}. Further, though G is continuous in p, it is not differentiable almost surely. This is because Rn,i is a discrete random function of p. As n increases, the discreteness of Gn disappears, but its limit is not smooth enough to deliver differentiability in p. As before, the continuous mapping theorem ensures that, given a continuous mapping f : D(I × S) → R, under the null the test statistic f[Gn] obeys f[Gn] ⇒ f[G]. Another way to construct tests is to use the process Gn( · , s). Under the null, we have Gn( · , s) ⇒ G( · , s), where G( · , s) is a Gaussian process such that for each p and p′ with p′ ≤ p, E[G( · , s)] = 0, and

E[G(p, s)G(p′, s)] = s²p′²(1 − s)²(1 − p){1 − s(1 − p)} / [{1 − s(1 − p)}{1 − s(1 − p′)}²{1 − s²(1 − p)}].  (4)

Given a continuous mapping f : D(I) → R, under the null the test statistic f[Gn( · , s)] obeys f[Gn( · , s)] ⇒ f[G( · , s)]. We call tests based on f[Gn(p, · )], f[Gn( · , s)], or f[Gn] generalized runs tests (GR tests) to emphasize their lack of dependence on specific values of p and/or s. Our first main result summarizes our discussion so far.

THEOREM 1: Given conditions A1, A2(i), A3, and the null,
(i) for each p ∈ I, Gn(p, · ) ⇒ G(p, · ), and if f : C(S) → R is continuous, then f[Gn(p, · )] ⇒ f[G(p, · )];
(ii) for each s ∈ S, Gn( · , s) ⇒ G( · , s), and if f : D(I) → R is continuous, then f[Gn( · , s)] ⇒ f[G( · , s)];
(iii) Gn ⇒ G, and if f : D(I × S) → R is continuous, then f[Gn] ⇒ f[G].

2.3 Empirical Generalized Runs (EGR) Tests

We now consider the case in which θ∗ is known, but the null CDF of Yt is unknown. This is a common situation when interest attaches to the behavior of raw data. As the null CDF is unknown, Gn cannot be computed. Nevertheless, we can proceed by replacing the unknown F with a suitable estimator. The empirical distribution function is especially convenient here. Specifically, for each y ∈ R, we define

F̃n(y) := (1/n) Σt=1..n 1{Yt ≤ y}.

This estimation requires modifying our prior definition of p-runs as follows. First, for each p, let T̃n(p) := {t ∈ N : F̃n(Yt) < p}, let M̃n(p) denote the (random) number of elements of T̃n(p), and let t̃n,i(p) denote the ith smallest element of T̃n(p), i = 1, · · · , M̃n(p). (Note that M̃n(p)/n = p.) We define the empirical p-runs as

R̃n,i(p) := t̃n,i(p) if i = 1;  and  R̃n,i(p) := t̃n,i(p) − t̃n,i−1(p) if i = 2, · · · , M̃n(p).

For each s ∈ S, define

G̃n(p, s) := (1/√n) Σi=1..M̃n(p) [ s^R̃n,i(p) − sp/{1 − s(1 − p)} ]  if p ∈ (1/n, (n−1)/n),  (5)

and G̃n(p, s) := 0 otherwise. The presence of F̃n leads to an asymptotic null distribution for G̃n different from that for Gn. We now examine this in detail.
For convenience, for each p ∈ I, let q̃n(p) := inf{x ∈ R : F̃n(x) ≥ p}, let p̃n(p) := F(q̃n(p)), and abbreviate p̃n(p) as p̃n. Then (5) can be decomposed into two pieces as G̃n = Wn + Hn, where for each (p, s),

Wn(p, s) := (1/√n) Σi=1..M̃n(p) [ s^R̃n,i(p) − sp̃n/{1 − s(1 − p̃n)} ],

and

Hn(p, s) := (M̃n(p)/√n) [ sp̃n/{1 − s(1 − p̃n)} − sp/{1 − s(1 − p)} ].

Our next result relates Wn to the random function Gn, revealing Hn to be the contribution of the CDF estimation error.

LEMMA 2: Given conditions A1, A2(i), A3, and the null,
(i) sup(p,s) ∈ I×S |Wn(p, s) − Gn(p, s)| = oP(1);
(ii) Hn ⇒ H, where H is a Gaussian process such that for each (p, s) and (p′, s′) with p′ ≤ p, E[H(p, s)] = 0, and

E[H(p, s)H(p′, s′)] = ss′pp′²(1 − s)(1 − s′)(1 − p) / [{1 − s(1 − p)}²{1 − s′(1 − p′)}²];  (6)

(iii) (Wn, Hn) ⇒ (G, H), and for each (p, s) and (p′, s′), E[G(p, s)H(p′, s′)] = −E[H(p, s)H(p′, s′)].

To state our next result, we let G̃ be a Gaussian process such that for each (p, s) and (p′, s′) with p′ ≤ p, E[G̃(p, s)] = 0, and

E[G̃(p, s)G̃(p′, s′)] = ss′p′²(1 − s)²(1 − s′)²(1 − p)² / [{1 − s(1 − p)}²{1 − s′(1 − p′)}²{1 − ss′(1 − p)}].  (7)

The analog of Theorem 1 can now be given as follows.

THEOREM 2: Given conditions A1, A2(i), A3, and the null,
(i) for each p ∈ I, G̃n(p, · ) ⇒ G̃(p, · ), and if f : C(S) → R is continuous, then f[G̃n(p, · )] ⇒ f[G̃(p, · )];
(ii) for each s ∈ S, G̃n( · , s) ⇒ G̃( · , s), and if f : D(I) → R is continuous, then f[G̃n( · , s)] ⇒ f[G̃( · , s)];
(iii) G̃n ⇒ G̃, and if f : D(I × S) → R is continuous, then f[G̃n] ⇒ f[G̃].

We call tests based on f[G̃n(p, · )], f[G̃n( · , s)], or f[G̃n] empirical generalized runs tests (EGR tests) to highlight their use of the empirical distribution function. We emphasize that the distributions of the GR and EGR tests differ, as the CDF estimation error survives in the limit.
2.4 Parametric Empirical Generalized Runs (PEGR) Tests

Now we consider the consequences of estimating θ∗ by θ̂n satisfying A4. As noted by Darling (1955), Sukhatme (1972), Durbin (1973), and Henze (1996), this can affect the asymptotic null distribution of GOF-based test statistics. Somewhat remarkably, it turns out that this does not occur here. We elaborate our notation to handle parameter estimation. Let Ŷn,t := h(Xt, θ̂n) and let

F̂n(y) := (1/n) Σt=1..n 1{Ŷn,t ≤ y},

so that F̂n is the empirical CDF of Ŷn,t. For each p, we now let T̂n(p) := {t ∈ N : F̂n(Ŷn,t) < p}, let M̂n(p) denote the (random) number of elements of T̂n(p), and let t̂n,i(p) denote the ith smallest element of T̂n(p), i = 1, · · · , M̂n(p). (Note that M̂n(p)/n = p.) We define the parametric empirical p-runs as

R̂n,i(p) := t̂n,i(p) if i = 1;  and  R̂n,i(p) := t̂n,i(p) − t̂n,i−1(p) if i = 2, · · · , M̂n(p).

For each s ∈ S, define

Ĝn(p, s) := (1/√n) Σi=1..M̂n(p) [ s^R̂n,i(p) − sp/{1 − s(1 − p)} ]  if p ∈ (1/n, (n−1)/n),

and Ĝn(p, s) := 0 otherwise.

To show that estimating θ∗ has no asymptotic impact, we decompose Ĝn as Ĝn = G̈n + Ḧn, where, letting q̃n(p) := inf{y ∈ R : F̃n(y) ≥ p} and p̃n := F(q̃n(p)) as above, we define

G̈n(p, s) := (1/√n) Σi=1..M̂n(p) [ s^R̂n,i(p) − sp̃n/{1 − s(1 − p̃n)} ],

and

Ḧn(p, s) := (M̂n(p)/√n) [ sp̃n/{1 − s(1 − p̃n)} − sp/{1 − s(1 − p)} ].

Our next result extends Lemma 2.

LEMMA 3: Given conditions A1−A4 and the null,
(i) sup(p,s) ∈ I×S |G̈n(p, s) − Gn(p, s)| = oP(1);
(ii) sup(p,s) ∈ I×S |Ḧn(p, s) − Hn(p, s)| = oP(1).

The analog of Theorems 1 and 2 can now be given.

THEOREM 3: Given conditions A1−A4 and the null,
(i) for each p ∈ I, Ĝn(p, · ) ⇒ G̃(p, · ), and if f : C(S) → R is continuous, then f[Ĝn(p, · )] ⇒ f[G̃(p, · )];
(ii) for each s ∈ S, Ĝn( · , s) ⇒ G̃( · , s), and if f : D(I) → R is continuous, then f[Ĝn( · , s)] ⇒ f[G̃( · , s)];
(iii) Ĝn ⇒ G̃, and if f : D(I × S) → R is continuous, then f[Ĝn] ⇒ f[G̃].

We call tests based on f[Ĝn(p, · )], f[Ĝn( · , s)], or f[Ĝn] parametric empirical generalized runs tests (PEGR tests) to highlight their use of estimated parameters. By Theorem 3, the asymptotic null distribution of f[Ĝn] is identical to that of f[G̃n], which takes θ∗ as known.

3 Simulating Asymptotic Critical Values

Obtaining critical values for test statistics constructed as functions of Gaussian processes can be challenging. Nevertheless, the rational polynomial structure of our statistics permits us to construct representations of G and G̃ as infinite sums of independent Gaussian random functions. Straightforward simulations then deliver the desired critical values. For this, we use the Karhunen-Loève (K-L) representation (Loève, 1978, ch. 11) of a stochastic process, which represents Brownian motion as an infinite sum of sine functions multiplied by independent Gaussian random coefficients. Grenander (1981) describes this representation as a complete orthonormal system (CONS) and provides many examples. In econometrics, Phillips (1998) has used the K-L representation to obtain asymptotic critical values for testing cointegration. Andrews's (2001) analysis of test statistics for a GARCH(1,1) model with nuisance parameter not identified under the null also exploits a CONS representation. By Theorem 2 of Jain and Kallianpur (1970), Gaussian processes with almost surely continuous paths have a CONS representation and can be approximated uniformly. We apply this result to our GR and (P)EGR test statistics; this straightforwardly delivers reliable asymptotic critical values.

3.1 Generalized Runs Tests

A fundamental property of Gaussian processes is that two Gaussian processes have identical distributions if their covariance structures are the same. We use this fact to represent G(p, · ), G( · , s), and G as infinite sums of independent Gaussian processes that can be straightforwardly simulated.
To obtain critical values for GR tests, we can use the Gaussian process Z defined by

Z(p, s) := sp(1 − s)B₀⁰(p)/{1 − s(1 − p)}² + [(1 − s)²/{1 − s(1 − p)}²] Σj=1..∞ s^j(1 − p)^(1+j) Bj(p²/(1 − p)^(1+j)),  (8)

where {Bj : j = 1, 2, · · · } is a sequence of independent standard Brownian motions, and B₀⁰ is a Brownian bridge independent of {Bj}. It is straightforward to compute E[Z(p, s)Z(p′, s′)]. Specifically, if p′ ≤ p then

E[Z(p, s)Z(p′, s′)] = ss′p′²(1 − s)(1 − s′)(1 − p){1 − s′(1 − p)} / [{1 − s(1 − p)}{1 − s′(1 − p′)}²{1 − ss′(1 − p)}].

This covariance structure is identical to (3), so Z has the same distribution as G. The processes B₀⁰ and {Bj} are readily simulated as a consequence of Donsker's (1951) theorem, ensuring that critical values for any statistic f[Gn] can be straightforwardly found by Monte Carlo methods. An inconvenient computational aspect of Z is that the terms Bj require evaluation at arbitrary points on the real line, as p²/(1 − p)^(1+j) becomes unbounded as p approaches unity. Another representation, convenient in this regard, is the Gaussian process

Z∗(p, s) := sp(1 − s)B₀⁰(p)/{1 − s(1 − p)}² + [(1 − s)²/{1 − s(1 − p)}²] Σj=1..∞ s^j Bj^s(p², (1 − p)^(1+j)),  (9)

where {Bj^s(p, q) : j = 1, 2, · · · } is a sequence of independent Brownian sheets: E[Bj^s(p, q)Bi^s(p′, q′)] = 1{i=j} min[p, p′] · min[q, q′]. The arguments of Bj^s lie only in the unit interval, and it is readily verified that E[Z∗(p, s)Z∗(p′, s′)] is identical to (3). Although one can obtain asymptotic critical values for p-runs test statistics f[Gn(p, · )] by fixing p in (8) or (9), there is a much simpler representation for G(p, · ). Specifically, consider the process Zp defined by

Zp(s) := [sp(1 − s)(1 − p)^(1/2)/{1 − s(1 − p)}] Σj=0..∞ s^j(1 − p)^(j/2) Zj,

where {Zj} is a sequence of IID standard normals. It is readily verified that for each p, E[Zp(s)Zp(s′)] is identical to (2).
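Because Zp is just a weighted sum of IID standard normals, its law is easy to simulate by truncating the series. The sketch below is our illustration, not the authors' code; the function name `simulate_Zp_sup`, the truncation at 50 terms, and the s-grid are assumptions (chosen to mirror the approximations used in Section 4):

```python
import numpy as np

def simulate_Zp_sup(p, n_draws=10000, n_terms=50, seed=0):
    """Approximate the 95% quantile of sup_{s in S} |Z_p(s)| by truncating
    Z_p(s) = [s p (1-s) (1-p)^{1/2} / {1 - s(1-p)}] * sum_j s^j (1-p)^{j/2} Z_j,
    with Z_j IID N(0,1); S = [-0.99, 0.99] evaluated on a fine grid."""
    s_grid = np.linspace(-0.99, 0.99, 199)
    j = np.arange(n_terms + 1)
    # weights[j, s] = s^j (1-p)^{j/2}; the j-th term decays geometrically
    weights = (s_grid[None, :] ** j[:, None]) * (1 - p) ** (j[:, None] / 2)
    scale = s_grid * p * (1 - s_grid) * np.sqrt(1 - p) / (1 - s_grid * (1 - p))
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_draws, n_terms + 1))
    paths = scale[None, :] * (Z @ weights)      # each row is one draw of Z_p(.)
    return np.quantile(np.abs(paths).max(axis=1), 0.95)

# e.g. an approximate 5%-level critical value for sup_s |G_n(0.5, s)|:
cv = simulate_Zp_sup(0.5)
```

Replacing the sup in the last two lines with a grid integral of |paths| would give critical values for the L1-type functionals instead.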
Because Zp does not involve the Brownian bridge, Brownian motions, or Brownian sheets, it is more efficient to simulate than Z(p, · ). This convenient representation arises from the symmetry of equation (3) in s and s′ when p = p′. The fact that equation (3) is asymmetric in p and p′ when s = s′ implies that a similar convenient representation for G( · , s) is not available. Instead, we obtain asymptotic critical values for test statistics f[Gn( · , s)] by fixing s in (8) or (9). We summarize these results as follows.

THEOREM 4: (i) For each p ∈ I, G(p, · ) =d Zp, and if f : C(S) → R is continuous, then f[G(p, · )] =d f[Zp];
(ii) G =d Z =d Z∗, and if f : D(I × S) → R is continuous, then f[G] =d f[Z] =d f[Z∗].

As deriving the covariance structures of the relevant processes is straightforward, we omit the proof of Theorem 4 from the Appendix.

3.2 (Parametric) Empirical Generalized Runs Tests

For the EGR statistics, we can similarly provide a Gaussian process whose covariance structure is the same as (7) and that can be straightforwardly simulated. By Theorem 3, this Gaussian process also yields critical values for PEGR test statistics. We begin with a representation for H. Specifically, consider the Gaussian process X defined by

X(p, s) := − sp(1 − s)B₀⁰(p)/{1 − s(1 − p)}²,

where B₀⁰ is a Brownian bridge as before. It is straightforward to show that when p′ ≤ p, E[X(p, s)X(p′, s′)] is the same as (6). The representation Z for G in Theorem 4(ii) and the covariance structure for G and H required by Lemma 2(iii) together suggest representing G̃ as Z̃ or Z̃∗ defined by

Z̃(p, s) := [(1 − s)²/{1 − s(1 − p)}²] Σj=1..∞ s^j(1 − p)^(1+j) Bj(p²/(1 − p)^(1+j))  (10)

and

Z̃∗(p, s) := [(1 − s)²/{1 − s(1 − p)}²] Σj=1..∞ s^j Bj^s(p², (1 − p)^(1+j)),  (11)

respectively, so that Z̃ (resp. Z̃∗) is the sum of Z (resp. Z∗) and X with the identical B₀⁰ in each. As is readily verified, (7) is the same as E[Z̃(p′, s′)Z̃(p, s)] and E[Z̃∗(p′, s′)Z̃∗(p, s)]. Thus, simulating (10) or (11) can deliver the asymptotic null distribution of G̃n and Ĝn. Similar to the previous case, the following representation is convenient when p is fixed:

Z̃p(s) := [sp(1 − s)(1 − p)^(1/2)/{1 − s(1 − p)}] Σj=0..∞ [ s^j − p/{1 − s(1 − p)} ] (1 − p)^(j/2) Zj.

For fixed s, we use the representation provided by Z̃( · , s) or Z̃∗( · , s). We summarize these results as follows.

THEOREM 5: (i) H =d X;
(ii) for each p ∈ I, G̃(p, · ) =d Z̃p, and if f : C(S) → R is continuous, then f[G̃(p, · )] =d f[Z̃p];
(iii) G̃ =d Z̃ =d Z̃∗, and if f : D(I × S) → R is continuous, then f[G̃] =d f[Z̃] =d f[Z̃∗].

As deriving the covariance structures of the relevant processes is straightforward, we omit the proof of Theorem 5 from the Appendix.

4 Monte Carlo Simulations

In this section, we use Monte Carlo simulation to obtain critical values for test statistics constructed with f delivering the L1 and uniform norms of its argument. We also examine level and power
We approximate these using 50 sp(1 − s)(1 − p)1/2 X j Wp (s) := s (1 − p)j/2 Zj , {1 − s(1 − p)} j=0 µ ¶ 50 X (1 − s)2 p2 sp(1 − s) 0 j 1+j e e B (p) + s (1 − p) Bj W(p, s) := , {1 − s(1 − p)}2 0 {1 − s(1 − p)}2 j=1 (1 − p)1+j ¾ 50 ½ sp(1 − s)(1 − p)1/2 X j p f Wp (s) := s − (1 − p)j/2 Zj , {1 − s(1 − p)} j=0 {1 − s(1 − p)} and µ ¶ 50 2 2 X p (1 − s) j 1+j f s) := s (1 − p) Bej , W(p, {1 − s(1 − p)}2 j=1 (1 − p)1+j P P (j) respectively, where Be00 (p) := Be0 (p) − p Be0 (1), Bej (x + p − 1) := 100−1/2 xk=1 [100p] i=1 Zi,k , (j) j = 0, 1, · · · , 50; x = 1, 2, · · · , 7499; and Zi,k ∼ IID N (0, 1) with respect to i, k, and j. We evaluate these functions for p and s on the grids {0.01, 0.02, · · · , 1.00} and {−0.99, −0.98, · · · , 0.98, 0.99}, respectively. 15 Concerning these approximations, three comments are in order. First, the domains for p and s are approximated using a relatively fine grid. Second, we truncate the sum of the independent Brownian motions at 50 terms. The jth term contributes a random component with a standard deviation of sj (1 − p)(1+j)/2 p, which vanishes quickly as j tends to infinity. Finally, the Brownian motions defined on the positive real line are approximated by partial sum processes with jumps at {0, 0.01, 0.02, · · · , 7498.98, 7498.99, 7499.00}. Preliminary experiments showed the impact of these approximations to be small; we briefly discuss certain aspects of these experiments below. Table 1 contains the critical values generated by 10,000 replications of these processes. 4.2 Level and Power of the Test Statistics In this section, we compare the level and power of generalized runs tests with other tests in the literature. We conduct two sets of experiments. The first examines power against dependent alternatives. The second examines power against structural break alternatives. 
To examine power against dependent alternatives, we follow Hong and White (2005) and consider the following DGPs:

DGP 1.1: $X_t = \varepsilon_t$;
DGP 1.2: $X_t = 0.3X_{t-1} + \varepsilon_t$;
DGP 1.3: $X_t = h_t^{1/2}\varepsilon_t$, and $h_t = 1 + 0.8X_{t-1}^2$;
DGP 1.4: $X_t = h_t^{1/2}\varepsilon_t$, and $h_t = 0.25 + 0.6h_{t-1} + 0.5X_{t-1}^2 1_{\{\varepsilon_{t-1} < 0\}} + 0.2X_{t-1}^2 1_{\{\varepsilon_{t-1} \geq 0\}}$;
DGP 1.5: $X_t = 0.8X_{t-1}\varepsilon_{t-1} + \varepsilon_t$;
DGP 1.6: $X_t = 0.8\varepsilon_{t-1}^2 + \varepsilon_t$;
DGP 1.7: $X_t := 0.4X_{t-1}1_{\{X_{t-1} > 1\}} - 0.5X_{t-1}1_{\{X_{t-1} \leq 1\}} + \varepsilon_t$;
DGP 1.8: $X_t = 0.8|X_{t-1}|^{0.5} + \varepsilon_t$;
DGP 1.9: $X_t = \mathrm{sgn}(X_{t-1}) + 0.43\varepsilon_t$;

where $\varepsilon_t \sim$ IID $N(0,1)$. Note that DGP 1.1 satisfies the null hypothesis, whereas the other DGPs represent interesting dependent alternatives. As there is no parameter estimation, we apply our EGR statistics and compare these to the entropy-based nonparametric statistics of Robinson (1991), Skaug and Tjøstheim (1996), and Hong and White (2005), denoted as $R_n$, $ST_n$, and $HW_n$, respectively.

We present the results in Tables 2 to 4. To summarize, the EGR test statistics show approximately correct levels. As the sample size increases, the empirical levels generally move closer to the nominal levels. For the alternatives, the EGR test statistics generally gain power as n increases. As noted by Granger (1963), a particular selection of p or, more generally, the choice of mapping f can yield tests with better or worse power. Generally, we see that the $\tilde{T}_{1,n}^p$-based tests outperform the $\tilde{T}_{\infty,n}^p$-based tests. Similarly, the $\tilde{T}_{1,n}$-based tests outperform the $\tilde{T}_{\infty,n}$-based tests. Among the $\tilde{T}_{1,n}^p$-based tests, more extreme choices for p generally yield better power, although DGP 1.2 is a notable exception. Nor is there a clear-cut relation between $\tilde{T}_{1,n}^p$-based tests and $\tilde{T}_{1,n}$-based tests. The more powerful $\tilde{T}_{1,n}^p$-based tests dominate the $\tilde{T}_{1,n}$-based tests for DGPs 1.3-1.5, but the $\tilde{T}_{1,n}$-based tests dominate for DGPs 1.2 and 1.6-1.9. When compared to the entropy-based test statistics, we observe four notable features.
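The dependent-alternative DGPs above are straightforward to simulate; a minimal sketch for a few of them (initial values set to zero and one burn-in period, a simplifying assumption) is:

```python
import numpy as np

def simulate_dgp(name, n, seed=0):
    """Simulate some of the Hong-White dependent-alternative DGPs (sketch)."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n + 1)
    x = np.zeros(n + 1)
    for t in range(1, n + 1):
        if name == "1.1":                                   # IID null
            x[t] = e[t]
        elif name == "1.2":                                 # AR(1)
            x[t] = 0.3 * x[t - 1] + e[t]
        elif name == "1.3":                                 # ARCH(1)
            x[t] = np.sqrt(1.0 + 0.8 * x[t - 1] ** 2) * e[t]
        elif name == "1.6":                                 # nonlinear MA
            x[t] = 0.8 * e[t - 1] ** 2 + e[t]
        elif name == "1.9":                                 # sign autoregression
            x[t] = np.sign(x[t - 1]) + 0.43 * e[t]
        else:
            raise ValueError("DGP not implemented in this sketch")
    return x[1:]
```

Applying the runs statistics to draws from DGP 1.1 checks the level; applying them to the other DGPs checks power.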
First, $\tilde{T}_{1,n}$-based tests generally dominate entropy-based tests for DGP 1.2 and DGP 1.8. Second, for DGPs 1.3 and 1.7, entropy-based tests are more powerful than the EGR tests. Third, for the other DGPs, the best-powered EGR tests exhibit power roughly similar to that of the best-powered entropy-based tests. Finally, the best-powered EGR tests outperform the best-powered entropy-based tests more often for smaller sample sizes.

For structural break alternatives, we compare our PEGR tests to a variety of well-known tests. These include Feller's (1951) and Kuan and Hornik's (1995) RR test, Brown, Durbin and Evans's (1975) RE-CUSUM test, Sen's (1980) and Ploberger, Krämer and Kontrus's (1989) RE test, Ploberger and Krämer's (1992) OLS-CUSUM test, Andrews's (1993) Sup-W test, Andrews and Ploberger's (1994) Exp-W and Avg-W tests, and Bai's (1996) M-test.¹ Appendix B contains formulas for these statistics. As these are all devised to test for a single structural break at an unknown point, they may not perform well when there are multiple breaks. In contrast, our PEGR statistics are designed to detect general alternatives to IID, so we expect these may perform well in such situations. We consider the following DGPs for our Monte Carlo simulations:

DGP 2.1: $Y_t := Z_t + \varepsilon_t$;
DGP 2.2: $Y_t := \exp(Z_t) + \varepsilon_t$;

¹ We also examined Chu, Hornik, and Kuan's (1995a) ME test and Chu, Hornik, and Kuan's (1995b) RE-MOSUM and OLS-MOSUM tests. Their performance is comparable to that of the other prior tests, so for brevity we do not report those results here.
DGP 2.3: $Y_t := Z_t 1_{\{t \leq \lfloor 0.5n \rfloor\}} - Z_t 1_{\{t > \lfloor 0.5n \rfloor\}} + \varepsilon_t$;
DGP 2.4: $Y_t := Z_t 1_{\{t \leq \lfloor 0.3n \rfloor\}} - Z_t 1_{\{t > \lfloor 0.3n \rfloor\}} + \varepsilon_t$;
DGP 2.5: $Y_t := Z_t 1_{\{t \leq \lfloor 0.1n \rfloor\}} - Z_t 1_{\{t > \lfloor 0.1n \rfloor\}} + \varepsilon_t$;
DGP 2.6: $Y_t := \exp(Z_t)1_{\{t \leq \lfloor 0.5n \rfloor\}} + \exp(-Z_t)1_{\{t > \lfloor 0.5n \rfloor\}} + \varepsilon_t$;
DGP 2.7: $Y_t := Z_t 1_{\{t \in K_n(0.02)\}} - Z_t 1_{\{t \notin K_n(0.02)\}} + \varepsilon_t$;
DGP 2.8: $Y_t := Z_t 1_{\{t \in K_n(0.05)\}} - Z_t 1_{\{t \notin K_n(0.05)\}} + \varepsilon_t$;
DGP 2.9: $Y_t := Z_t 1_{\{t \in K_n(0.1)\}} - Z_t 1_{\{t \notin K_n(0.1)\}} + \varepsilon_t$;
DGP 2.10: $Y_t := Z_t 1_{\{t\ \mathrm{odd}\}} - Z_t 1_{\{t\ \mathrm{even}\}} + \varepsilon_t$;
DGP 2.11: $Y_t := \exp(0.1 Z_t)1_{\{t\ \mathrm{odd}\}} + \exp(Z_t)1_{\{t\ \mathrm{even}\}} + \varepsilon_t$,

where $Z_t = 0.5Z_{t-1} + u_t$; $(\varepsilon_t, u_t)' \sim$ IID $N(0, I_2)$; and $K_n(r) := \{t = 1, \cdots, n : (k-1)/r + 1 \leq t \leq k/r,\ k = 1, 3, 5, \cdots\}$.

For DGPs 2.1, 2.3-2.5, and 2.7-2.10, we use ordinary least squares (OLS) to estimate the parameters of a linear model $Y_t = \alpha + \beta Z_t + v_t$, and we apply our PEGR statistics to the prediction errors $\hat{v}_t := Y_t - \hat{\alpha} - \hat{\beta} Z_t$. The linear model is correctly specified for DGP 2.1, but is misspecified for DGPs 2.3-2.5 and 2.7-2.10. Thus, when DGP 2.1 is considered, the null hypothesis holds, permitting an examination of the level of the tests. As the model is misspecified for DGPs 2.3-2.5 and 2.7-2.10, the alternative holds for $\hat{v}_t$, permitting an examination of power. DGPs 2.3-2.5 exhibit a single structural break at different break points, permitting us to see how the PEGR tests compare to standard structural break tests specifically designed to detect such alternatives. DGPs 2.7 to 2.10 are deterministic mixtures in which the true coefficient of $Z_t$ depends on whether or not t belongs to a particular structural regime. The number of structural breaks increases as the sample size increases, but the proportion of breaks to the sample size is constant. Also, the break points are equally spaced. Thus, for example, there are four break points in DGP 2.8 when the sample size is 100 and nine break points when the sample size is 200.
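The regime set $K_n(r)$ above partitions the sample into alternating blocks of length $1/r$; a small sketch of the membership rule and the implied break count (which reproduces the four and nine break points just mentioned for DGP 2.8) is:

```python
import numpy as np

def in_Kn(t, r):
    """Membership of period t in K_n(r): t lies in an odd-indexed block of
    length 1/r, so the regime alternates every 1/r periods."""
    block = round(1.0 / r)            # block length, e.g. 20 for r = 0.05
    k = (t - 1) // block + 1          # 1-based block index of period t
    return k % 2 == 1

def n_breaks(n, r):
    """Number of regime switches in DGPs 2.7-2.9 for sample size n."""
    flags = np.array([in_Kn(t, r) for t in range(1, n + 1)])
    return int(np.sum(flags[1:] != flags[:-1]))
```

Using `round` for the block length guards against floating-point artifacts in $1/r$ (e.g., `1 / 0.05` evaluating just below 20).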
The extreme case is DGP 2.10, in which the proportion is equal to one, and the coefficient of $Z_t$ depends on whether or not t is even. Given the regular pattern of these breaks, this may be hard to distinguish from a DGP without a structural break. For DGPs 2.2, 2.6, and 2.11, we use nonlinear least squares (NLS) to estimate the parameters of a nonlinear model $Y_t = \exp(\beta Z_t) + v_t$, and we apply our PEGR statistics to the prediction errors $\hat{v}_t := Y_t - \exp(\hat{\beta} Z_t)$. The situation is analogous to that for the linear model, in that the null holds for DGP 2.2, whereas the alternative holds for 2.6 and 2.11. Examining these alternatives permits an interesting comparison of the PEGR tests, designed for general use, to the RR, RE, M, OLS-CUSUM and RE-CUSUM statistics, which are expressly designed for use with linear models.

Our simulation results are presented in Tables 5 to 7. To summarize, the levels of the PEGR tests are approximately correct for both linear and nonlinear cases and generally improve as the sample size increases. On the other hand, there are evident level distortions for some of the other statistics, especially for the nonlinear DGP 2.2 with the linear model statistics. The PEGR statistics also have respectable power. They appear consistent against our structural break alternatives, although the PEGR tests are not as powerful as the other break tests when there is a single structural break. This is as expected, as the other tests are specifically designed to detect a single break, whereas the PEGR test is not. As one might also expect, the power of the PEGR tests diminishes notably as the break moves away from the center of the sample. Nevertheless, the relative performance of the tests reverses when there are multiple breaks. All test statistics lose power as the proportion of breaks increases, but the loss of power for the non-PEGR tests is much faster than for the PEGR tests.
For the extreme alternative DGP 2.10, the PEGR tests appear to be the only consistent tests. We also note that, as for the dependent alternatives, the $\tilde{T}_{1,n}^p$ (resp. $\tilde{T}_{1,n}$)-based tests outperform the $\tilde{T}_{\infty,n}^p$ (resp. $\tilde{T}_{\infty,n}$)-based tests. For these specific structural break alternatives, however, the $\tilde{T}_{1,n}$-based tests always outperform the $\tilde{T}_{1,n}^p$-based tests.

5 Empirical Application

The Panel Study of Income Dynamics (PSID) is a dataset in which an initial panel for 1968 was constructed by random sampling, followed by successive annual waves designed to preserve the random sampling structure. This dataset has been extensively analyzed, especially by labor economists, who typically exploit the random sampling design to justify IID assumptions for particular aspects of the data generating process. For example, MaCurdy (1981) and Altonji (1986) used the PSID to estimate a model of male labor supply based on dynamic optimization under uncertainty. Here we estimate their model using panels from the PSID constructed using several different waves. We apply our generalized runs tests to the estimated residuals of these models, permitting us to assess not only whether or not the IID assumption is valid for a given panel, but also the extent to which this validity may change through time. We can thus assess not only the success of the original sample design, but also whether there may have been structural shifts over time or misspecification of the labor supply model. Formally, we test the null hypothesis that for some real vector $\beta^*$ and for each $t = 1, \cdots, T$,

$$H_0': \{Y_{it} - \beta^{*\prime} X_{it} : i = 1, \cdots, N_T\}\ \text{is IID}$$

against the alternative $H_1'$ that the null is false, where $Y_{it}$ and $X_{it}$ are fully described below, and $N_T$ is the total number of individuals in the balanced panel beginning in 1968 and ending in year T. The PSID contains data both at the individual and at the household level.
We restrict attention to household-level data, and we apply the following data exclusion rules. Observations are excluded if:

1. The household is from the Survey of Economic Opportunity (SEO) group;²
2. The household is not surveyed in 1968;
3. The household reports missing information for our dependent variable, regressors, or instruments;
4. The household head has zero income with positive working hours;
5. The household head is not a married male in 1968;
6. The household composition changes;
7. The household head is retired, disabled, a housewife, etc.;
8. The household is outside the US;
9. The household head worked more than 8,760 hours a year;
10. The household head has zero working hours for a year;
11. The household head's reported age is inconsistent.³

² The PSID consists of two samples, the SRC and the SEO. The latter is obtained by separately sampling low-income families. Applying the PEGR tests to the SRC and SEO together strongly rejects the IID null.

In order to have sufficient observations to obtain reliable coefficient estimates, we consider panel data sets constructed using data through 1980. Table 8 displays the numbers of observations remaining after applying these exclusion rules for annual waves from 1968 through 1980. We apply our tests to five data sets: an initial cross section ($N_{68}$ = 1,129) and four panels, containing three, six, nine, and twelve years of data respectively. For these panels, the datasets have $N_T \times T$ = 2,817, 4,182, 4,401, and 4,152 observations, respectively.
Following MaCurdy (1981) and Altonji (1986), we estimate male labor supply models given by

$$\ell_{it} = \beta_0^{(T)} + \beta_1^{(T)} w_{it} + \sum_{t=69}^{T-1} \beta_t^{(T)} D_t + u_{it}^{(T)},$$

where $\ell_{it} := \log(L_{it}) - \log(L_{i(t-1)})$, with $L_{it}$ denoting annual labor hours for household head i in period t; $w_{it} := \log(W_{it}) - \log(W_{i(t-1)})$, with $W_{it}$ denoting hourly earnings for household head i in period t; $D_t$ is a time dummy for period t; and $\{u_{it}^{(T)} : i = 1, 2, \cdots, N_T\}$ is the set of stochastic residuals in period t assumed IID by the null for each t; $t = 69, \cdots, T$; and T = 69, 71, 74, 77, 80.⁴ In MaCurdy (1981) and Altonji (1986), interest attaches primarily to $\beta_1^{(T)}$, which relates the intertemporal substitution effect to the business cycle. We apply our PEGR test statistics to the estimated errors for these equations,

$$\hat{u}_{it}^{(T)} := \ell_{it} - \hat{\beta}_0^{(T)} - \hat{\beta}_1^{(T)} w_{it} - \sum_{t=69}^{T-1} \hat{\beta}_t^{(T)} D_t, \qquad i = 1, 2, \cdots, N_T,$$

where, following MaCurdy (1981), we estimate the unknown $\beta^*$ by pooled two-stage least squares, using as instruments the household head's father's and mother's education, dummy variables indicating economic status during the household head's childhood, the household head's education, the household head's education squared, and the time dummy variables. These instruments are those used by MaCurdy (1981). We apply our PEGR tests separately for each year within a given panel. For example, for the 1968-1971 panel, we apply our tests separately for t = 69, 70, and 71; because the panel is balanced, there are $N_{71}$ = 939 observations for each t.

³ For example, in some cases, the household head's age as reported in a particular year and an adjacent year differs by more than two years. For other examples, refer to http://www.vuw.ac.nz/staff/js-cho/iidtesting.html.
⁴ The dummy variable $D_t$ is omitted when T = 69 to avoid the dummy variable trap.
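The pooled two-stage least squares step can be sketched generically: regress the endogenous regressors on the instruments, then regress the dependent variable on the fitted regressors. The function and variable names below are hypothetical; the actual instruments and data handling follow MaCurdy (1981).

```python
import numpy as np

def two_sls(y, X, Z):
    """Generic pooled 2SLS: project the regressor matrix X on the instrument
    matrix Z, then regress y on the fitted values (a sketch of the estimation
    method, not the authors' code)."""
    ZtZ_inv_Zt = np.linalg.solve(Z.T @ Z, Z.T)
    Xhat = Z @ (ZtZ_inv_Zt @ X)                       # first-stage fitted X
    return np.linalg.solve(Xhat.T @ X, Xhat.T @ y)    # second-stage coefficients
```

With a valid instrument the slope on an endogenous regressor is recovered consistently, whereas OLS would be biased by the correlation between the regressor and the error.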
As is readily verified, under correct specification, this model and estimation method satisfy the regularity conditions required for application of our statistics. Table 9 contains the p-values for our test statistics $\tilde{T}_{1,n}$ and $\tilde{T}_{\infty,n}$. We also compute Hochberg's (1988) modified Bonferroni p-value bounds⁵ based on groups of reported individual p-values. The resulting bounds are given at the bottom of Table 9. As the reported p-values are nowhere near standard significance levels, we fail to reject the IID null hypothesis for every panel data set. We also report Hochberg bounds when the (not separately tabulated) test statistics $\tilde{T}_{1,n}^p$ and $\tilde{T}_{\infty,n}^p$ ($p \in \{0.3, 0.4, 0.5, 0.6, 0.7\}$) and $\tilde{T}_{1,n}^s$ and $\tilde{T}_{\infty,n}^s$ ($s \in \{-0.5, -0.3, -0.1, 0.1, 0.3, 0.5\}$) are computed. As before, we do not reject the null hypothesis.

6 Conclusion

The IID assumption plays a central role in economics and econometrics. Here we provide a family of statistics based on generalized runs that are powerful against unspecified alternatives, providing a useful complement to tests designed to have power against specific alternatives, such as serial correlation, GARCH, or structural breaks. Relative to other tests of this sort, for example the entropy-based tests of Hong and White (2005), our tests have an appealing computational simplicity in that they do not require kernel density estimation, with the associated challenge of bandwidth selection; and they have power against dependent alternatives without having to examine dependence at specific lags. Our simulation studies show that our tests have empirical levels close to their nominal asymptotic levels. They also have encouraging power against a variety of important alternatives. A feature of particular interest is their power against alternatives that involve a number of structural breaks increasing with the sample size.
We apply our tests to residuals from a male labor supply model using the PSID and find no evidence against the IID hypothesis, suggesting the success of the original random sampling design, its preservation over time, the absence of structural breaks, and the absence of misspecification for the MaCurdy (1981)-Altonji (1986) model.

⁵ Hochberg's bound is computed as $\min_{j=1,2,\cdots,J} (J + 1 - j) p_{(j)}$, where $p_{(j)}$ is the j-th smallest p-value and J is the total number of models.

7 Appendix

7.1 Proofs

Proof of Lemma 1: As the necessity part is trivial, we prove the sufficiency part only. Suppose that $\{Y_t\}$ is not independent. Then there is at least one pair of dependent variables, say $(Y_i, Y_{i+k})$ $(k > 0)$. For notational simplicity, we let i = 0 without loss of generality. Then, for some $p \in I$,

$$P(F_0(Y_0) \leq p,\ F_k(Y_k) \leq p) \neq P(F_0(Y_0) \leq p)\, P(F_k(Y_k) \leq p) = p^2.$$

Further, from the given condition on $R_{n,i}(p)$ and the definition of conditional expectation,

$$P(F_k(Y_k) \leq p \mid F_0(Y_0) \leq p) = \sum_{\ell=1}^{k} P\Big(\sum_{i=1}^{\ell} R_{n,i}(p) = k\Big) = \sum_{\ell=1}^{k} {}_{k-1}C_{\ell-1} (1-p)^{k-\ell} p^{\ell} = p,$$

where the second equality follows from the IID condition of $\{R_{n,i}(p)\}_{i=1}^{M_n(p)}$. Thus, $P(F_k(Y_k) \leq p,\ F_0(Y_0) \leq p) = P(F_k(Y_k) \leq p \mid F_0(Y_0) \leq p)\, P(F_0(Y_0) \leq p) = p^2$. This contradicts the inequality derived above from the assumption that $\{Y_t\}$ is not independent. Thus, $\{Y_t\}$ is independently distributed.

Next, suppose further that $\{Y_t\}$ is not identically distributed. Then there is a pair, say $(Y_i, Y_j)$, such that for some $y \in \mathbb{R}$, $p_i := F_i(y) \neq p_j := F_j(y)$. Further, for the same y,

$$P(R_{n,(j)}(p_j) = 1) = P(F_j(Y_j) \leq F_j(y) \mid F_{j-1}(Y_{j-1}) \leq F_{j-1}(y)) = P(F_j(Y_j) \leq F_j(y)) = p_j,$$

where the subscript (j) denotes the (j)-th run of $\{R_{n,i}(p_j)\}$ corresponding to $F_j(y)$, and the second equality follows from the independence property shown above. Similarly, $P(R_{n,(i)}(p_j) = 1) = P(F_j(Y_i) \leq F_j(y)) = P(Y_i \leq y) = p_i$. That is, $P(R_{n,(j)}(p_j) = 1) \neq P(R_{n,(i)}(p_j) = 1)$.
This contradicts the assumption that $\{R_{n,i}(p)\}$ is identically distributed for all $p \in I$. Hence, $\{Y_t\}$ is identically distributed. This completes the proof. ∎

To prove Theorem 1, we first examine preliminary lemmas. We let $K_j$ be a random variable satisfying $\sum_{i=K_{j-1}+1}^{K_j} R_{n,i}(p) = R_{n,j}(p')$.

LEMMA A1: Given conditions A1, A2, A3(i), A4(i), and the null, $E[\sum_{i=1}^{K_1} s^{R_i}] = E[s^{R_i}] \cdot E[K_1]$.

Proof of Lemma A1: For each $s, \ell \in \mathbb{N}$, $P(R_i = s \mid K_1 = \ell) = P(K_1 = \ell \mid R_i = s) P(R_i = s) / P(K_1 = \ell) = P(K_1 = \ell) P(R_i = s) / P(K_1 = \ell) = P(R_i = s)$, where the second equality follows from the fact that the event $\{K_1 = \ell\}$ is independent of $\{R_i = s\}$ under the null, so that

$$E\Big[\sum_{i=1}^{K_1} s^{R_i}\Big] = \sum_{\ell=1}^{\infty} E\Big[\sum_{i=1}^{K_1} s^{R_i} \,\Big|\, K_1 = \ell\Big] P(K_1 = \ell) = E[s^{R_i}] \sum_{\ell=1}^{\infty} \ell\, P(K_1 = \ell) = E[s^{R_i}] \cdot E[K_1].$$

This completes the proof. ∎

LEMMA A2: Given conditions A1, A2, A3(i), A4(i), and the null, if $p' \leq p$, then

$$E[K_1 s^{R'_1}] = \frac{sp'}{\{1 - s(1-p')\}} \cdot \frac{1 - s(1-p)}{\{1 - s(1-p')\}}.$$

Proof of Lemma A2: First, $E[K_1 s^{R'_1}] = E[s^{R'_1} E[K_1 \mid R'_1]]$, and

$$P(K_1 = \ell \mid R'_1 = r'_1) = \frac{P(K_1 = \ell,\ R'_1 = r'_1)}{P(R'_1 = r'_1)} = \frac{{}_{r'_1-1}C_{\ell-1}\, (p - p')^{\ell-1} (1-p)^{r'_1 - \ell}}{(1-p')^{r'_1 - 1}},$$

where the last equality follows from the fact that $P(R'_1 = r'_1) = p'(1-p')^{r'_1 - 1}$ and $P(K_1 = \ell,\ R'_1 = r'_1) = {}_{r'_1-1}C_{\ell-1}\, p' (p - p')^{\ell-1} (1-p)^{r'_1 - \ell}$. Thus, $E[K_1 \mid R'_1 = r'_1] = \sum_{\ell=1}^{\infty} \ell \cdot P(K_1 = \ell \mid R'_1 = r'_1) = 1 + (r'_1 - 1)\big(\frac{p - p'}{1 - p'}\big)$, implying that $E[K_1 s^{R'_1}] = E[s^{R'_1}] + \big(\frac{p - p'}{1 - p'}\big) E[s^{R'_1}(R'_1 - 1)]$. Second, $E[s^{R'_1}(R'_1 - 1)] = s^2 \big(\frac{d}{ds} E[s^{R'_1 - 1}]\big) = s^2 p' (1-p') / \{1 - s(1-p')\}^2$. Therefore,

$$E[K_1 s^{R'_1}] = \frac{sp'}{\{1 - s(1-p')\}} + \Big[\frac{p - p'}{1 - p'}\Big] \frac{s^2 p' (1-p')}{\{1 - s(1-p')\}^2} = \frac{sp'\{1 - s(1-p)\}}{\{1 - s(1-p')\}^2}.$$

This completes the proof. ∎

LEMMA A3: Given conditions A1, A2, A3(i), A4(i), and the null, if $p' \leq p$, then

$$E\Big[\sum_{i=1}^{K_1} s^{R_i + R'_1}\Big] = \frac{s^2 p'}{\{1 - s^2(1-p)\}} \cdot \left[\frac{1 - s(1-p)}{\{1 - s(1-p')\}}\right]^2.$$

Proof of Lemma A3: First, $E[\sum_{i=1}^{K_1} s^{R_i + R'_1}] = E[\sum_{i=1}^{K_1} E[s^{R_i + R'_1} \mid K_1]]$, and

$$E[s^{R_i + R'_1} \mid K_1] = E[s^{R_i + \sum_{j=1}^{K_1} R_j} \mid K_1] = E[s^{2R_i} \mid K_1] \prod_{j=1, j \neq i}^{K_1} E[s^{R_j} \mid K_1] = \left[\frac{s^2 p}{\{1 - s^2(1-p)\}}\right] \left[\frac{sp}{\{1 - s(1-p)\}}\right]^{K_1 - 1},$$

so that $E[\sum_{i=1}^{K_1} s^{R_i + R'_1}] = \frac{s^2 p}{\{1 - s^2(1-p)\}} E\big[K_1 \big[\frac{sp}{\{1 - s(1-p)\}}\big]^{K_1 - 1}\big]$. Next, $P(K_1 = k) = \frac{p'}{p} \big[\frac{p - p'}{p}\big]^{k-1}$ by Lemma A4 given below. From this,

$$E\Big[K_1 \Big[\frac{sp}{\{1 - s(1-p)\}}\Big]^{K_1 - 1}\Big] = \frac{p'}{p} \sum_{k=1}^{\infty} k \left[\frac{s(p - p')}{\{1 - s(1-p)\}}\right]^{k-1} = \frac{p'}{p} \cdot \left[\frac{1 - s(1-p)}{\{1 - s(1-p')\}}\right]^2.$$

The desired result follows by plugging this into $E[\sum_{i=1}^{K_1} s^{R_i + R'_1}]$. ∎

LEMMA A4: Given A1, A2, A3(i), A4(i), and the null, if $p' \leq p$, then $P(K_1 = k) = \frac{p'}{p} \cdot \big[\frac{p - p'}{p}\big]^{k-1}$ and $E[K_1] = \frac{p}{p'}$.

Proof of Lemma A4: First, $P(K_1 = 1) = P(R_1 = R'_1) = \sum_{\ell=1}^{\infty} P(R_1 = R'_1 = \ell) = P(F(X_1) < p') + \sum_{\ell=2}^{\infty} P(F(X_1) \geq p, \cdots, F(X_{\ell-1}) \geq p,\ F(X_\ell) < p') = \sum_{\ell=1}^{\infty} p'(1-p)^{\ell-1} = p' p^{-1}$. Next, $P(K_1 = 2) = P(R_1 + R_2 = R'_1) = \sum_{\ell_1=1}^{\infty} \sum_{\ell_2=1}^{\infty} P(R_1 = \ell_1,\ R_2 = \ell_2,\ \text{and}\ R'_1 = \ell_1 + \ell_2) = \sum_{\ell_1=1}^{\infty} \sum_{\ell_2=1}^{\infty} (1-p)^{\ell_1-1}(p - p')(1-p)^{\ell_2-1} p' = p'(p - p')p^{-2}$. Similarly, we can proceed with the computation for an arbitrarily chosen number, say k. Then

$$P(K_1 = k) = P\Big(\sum_{j=1}^{k} R_j = R'_1\Big) = \prod_{j=1}^{k-1} \Big\{\sum_{\ell_j=1}^{\infty} (1-p)^{\ell_j - 1}(p - p')\Big\} \cdot \sum_{\ell_k=1}^{\infty} (1-p)^{\ell_k - 1} p' = \frac{p'}{p} \cdot \Big[\frac{p - p'}{p}\Big]^{k-1}.$$

From these, it is trivial to compute that $E[K_1] = p/p'$. This completes the proof. ∎

Proof of Theorem 1: (i) We separate the proof into three pieces, (a), (b), and (c). In (a), we prove the weak convergence of $G_n(p, \cdot)$. In (b), we show that for each $s \in \mathbb{S}$, $E[G(p, s)] = 0$. Finally, we derive $E[G(p, s)G(p, s')]$ in (c).
(a) First, for each s, if we let $g_{n,i}(p, s) := s^{R_{n,i}(p)} - sp\{1 - s(1-p)\}^{-1}$, then

$$\sup_{s \in \mathbb{S}} \left|\frac{\partial}{\partial s} g_{n,i}(p, s)\right| \leq R_{n,i}(p)\, \tilde{s}^{R_{n,i}(p) - 1} + (1 - \bar{s})^{-2},$$

where $\tilde{s} := \max[|\underline{s}|, \bar{s}]$, so that $|g_{n,i}(p, s) - g_{n,i}(p, s')| \leq \{R_{n,i}(p)\tilde{s}^{R_{n,i}(p)-1} + (1 - \bar{s})^{-2}\}\,|s - s'|$. Second, from the independence condition, $E[|G_n(p, s) - G_n(p, s')|^2] = E[|g_{n,i}(p, s) - g_{n,i}(p, s')|^2]$, so that $E[|G_n(p, s) - G_n(p, s')|^2] \leq K|s - s'|^2$, where

$$K := \frac{2}{(1 - \tilde{s})^3} + \frac{2}{(1 - \tilde{s})^2 (1 - \bar{s})^2} + \frac{1}{(1 - \bar{s})^4} \geq \sup_{p \in I} E[\{R_{n,i}(p)\tilde{s}^{R_{n,i}(p)-1} + (1 - \bar{s})^{-2}\}^2],$$

and $K < \infty$. Third, therefore, $E[|G_n(p, s) - G_n(p, s')|^2]^{1/2}\, E[|G_n(p, s') - G_n(p, s'')|^2]^{1/2} \leq K|s - s''|^2$, where we let $s'' \leq s' \leq s$. These verify the conditions of theorem 13.5 in Billingsley (1999). The desired result follows from these and the finite dimensional weak convergence, which is obtained by applying the Cramér-Wold device.

(b) Given the conditions and the null,

$$E\Big[\sum_{i=1}^{M_n(p)} s^{R_{n,i}(p)} - \frac{sp}{1 - s(1-p)}\Big] = E\Big[\sum_{i=1}^{M_n(p)} E\Big[s^{R_{n,i}(p)} - \frac{sp}{\{1 - s(1-p)\}} \,\Big|\, M_n(p)\Big]\Big] = E\Big[\sum_{i=1}^{M_n(p)} \frac{sp}{\{1 - s(1-p)\}} - \frac{sp}{\{1 - s(1-p)\}}\Big] = 0,$$

where the second equality follows from the fact that $R_{n,i}(p)$ given $M_n(p)$ is independently and identically distributed under the null.

(c) Given the conditions and the null,

$$E[G_n(p, s)G_n(p, s')] = n^{-1} E\Big[\sum_{i=1}^{M_n(p)} E\Big[\Big(s^{R_{n,i}(p)} - \frac{sp}{\{1 - s(1-p)\}}\Big)\Big(s'^{R_{n,i}(p)} - \frac{s'p}{\{1 - s'(1-p)\}}\Big) \,\Big|\, M_n(p)\Big]\Big]$$
$$= n^{-1} E[M_n(p)] \left[\frac{ss'p}{\{1 - ss'(1-p)\}} - \frac{ss'p^2}{\{1 - s(1-p)\}\{1 - s'(1-p)\}}\right] = \frac{ss'p^2(1-s)(1-s')(1-p)}{\{1 - ss'(1-p)\}\{1 - s(1-p)\}\{1 - s'(1-p)\}},$$

where the last equality follows from the fact that $n^{-1}E[M_n(p)] = p$. Finally, we can apply the continuous mapping theorem to show that $f[G_n(p, \cdot)] \Rightarrow f[G(p, \cdot)]$. This is the desired result.
Remark 1: (1) In addition to the weak convergence, it also follows that for any $\varepsilon > 0$, there is $\delta > 0$ such that

$$\limsup_{n \to \infty} P(w'''_{G_n(p, \cdot)}(\delta) > \varepsilon) = 0, \qquad (12)$$

where we define $w'''_{G_n(p, \cdot)}(\delta)$ as

$$\sup_{s, s' \in \mathbb{S}}\ \sup_{s'' \in \{|s - s'| < \delta\}} \min\Big[\sup_p |G_n(p, s'') - G_n(p, s')|,\ \sup_p |G_n(p, s'') - G_n(p, s)|\Big].$$

This follows from the proof of theorem 3 in Bickel and Wichura (1971). (2) Verification of the second condition of theorem 13.5 in Billingsley (1999) is omitted, as $G_n(p, \cdot)$ is defined on $C(\mathbb{S})$. (3) Theorem 1(i) can also be proved by applying Jain and Marcus (1975). That is, if we let $H(\mathbb{S}, u) := \log(N(\mathbb{S}, u))$, where $N(\mathbb{S}, u)$ is the minimum number of balls of radius less than u covering $\mathbb{S}$, then $\int_0^{\bar{s} - \underline{s}} H^{1/2}(\mathbb{S}, u)\,du \leq \int_0^2 H^{1/2}((-1,1), u)\,du \leq \int_0^2 \log^{1/2}(2 + 2/u)\,du < \infty$. These two facts ensure the weak convergence of $\{G_n(p, \cdot)\}$ by theorem 1 of Jain and Marcus (1975).

(ii) This can be proved in numerous ways. We show this by verifying the conditions in theorem 13.5 of Billingsley (1999). Our proof is separated into three pieces, (a), (b), and (c). In (a), we show the weak convergence of $G_n(\cdot, s)$. In (b), we prove that for each p, $E[G(p, s)] = 0$. Finally, in (c), we show that

$$E[G(p, s)G(p', s)] = \frac{s^2 p'^2 (1-s)^2 (1-p)}{\{1 - s(1-p')\}^2 \{1 - s^2(1-p)\}}.$$

(a) First, for each s, $G(1, s) \equiv 0$ from the fact that $G_n(1, s) \equiv 0$, and for any $\delta > 0$,

$$\lim_{p \to 1} P(|G(p, s)| > \delta) \leq \lim_{p \to 1} \frac{E(|G(p, s)|^2)}{\delta^2} = \lim_{p \to 1} \frac{s^2 p^2 (1-s)^2 (1-p)}{\delta^2 \{1 - s(1-p)\}^2 \{1 - s^2(1-p)\}} = 0$$

by Markov's inequality and the result in (c). Thus, for each s, $G(p, s) - G(1, s) \Rightarrow 0$ as $p \to 1$. Second, for each $p'' \leq p' \leq p$,

$$E[\{G_n(p, s) - G_n(p', s)\}^2 \{G_n(p', s) - G_n(p'', s)\}^2] \leq E[\{G_n(p, s) - G_n(p', s)\}^4]^{1/2}\, E[\{G_n(p', s) - G_n(p'', s)\}^4]^{1/2}$$

by the Cauchy-Schwarz inequality. Also, it is not hard to show that $E[\{G_n(p, s) - G_n(p', s)\}^4] = E[\{G(p, s) - G(p', s)\}^4] + o(n^{-1})$ using the fact that $E[s^{4R_i(p)}] \leq E[s^{2R_i(p)}] < \infty$ and $s \in \mathbb{S}$.
Also, some algebra shows that

$$E[\{G(p, s) - G(p', s)\}^4] = 3a(p, p', s)^2 + \frac{12 p'^2 (1-p)\, b(p, p', s)\, c(p, p', s)}{\{1 - s(1-p')\}^2 \{1 - s^2(1-p)\}},$$

where for each p and p',

$$a(p, p', s) = k(p, s)m(p, s) - k(p', s)m(p', s), \quad b(p, p', s) = m(p', s) - m(p, s), \quad c(p, p', s) = k(p, s) - k(p', s),$$
$$k(p, s) := \frac{p^2}{\{1 - s(1-p)\}^2}, \quad \text{and} \quad m(p, s) := \frac{1-p}{\{1 - s^2(1-p)\}}.$$

Note that $|\frac{\partial}{\partial p} k(p, s)|$ and $|\frac{\partial}{\partial p} m(p, s)|$ are bounded by $\bar{k} := 4/(1 - \bar{s})^2$ and $\bar{m} := 1/(1 - \max[\bar{s}^2, \underline{s}^2])$ uniformly in (p, s), respectively. Therefore, $k(\cdot, s)$ and $m(\cdot, s)$ are Lipschitz continuous with Lipschitz constants $\bar{k}$ and $\bar{m}$ respectively, which are independent of s. These imply that $|b(p, p', s)| \leq \bar{m}|p - p'|$, $|c(p, p', s)| \leq \bar{k}|p - p'|$, and that $|a(p, p', s)| = |m(p, s)c(p, p', s) - k(p', s)b(p, p', s)| \leq m(p, s)|c(p, p', s)| + k(p', s)|b(p, p', s)|$. Note also that $\bar{k}$ and $\bar{m}$ are uniform bounds for $k(p, s)$ and $m(p, s)$ in (p, s), respectively. Thus,

$$E[\{G(p, s) - G(p', s)\}^4] \leq 3 \cdot 2\bar{k}^4 \bar{m}^4 |p - p'|^2 + 12\bar{k}^2 \bar{m}^2 |p - p'|^2 \leq 18\bar{k}^4 \bar{m}^4 |p - p'|^2.$$

Finally,

$$E[\{G_n(p, s) - G_n(p', s)\}^2 \{G_n(p', s) - G_n(p'', s)\}^2] \leq 18\bar{k}^4 \bar{m}^4 |p - p'| \cdot |p' - p''| + o(n^{-1}) \leq 18\bar{k}^4 \bar{m}^4 |p - p''|^2 + o(n^{-1})$$

by applying the same procedure to $E[\{G(p', s) - G(p'', s)\}^4]$. Given these, the weak convergence of $\{G_n(\cdot, s)\}$ holds by theorem 13.5 of Billingsley (1999) and the finite dimensional weak convergence, which is straightforward and hence not proved here.

(b) For each p, $E[G(p, \cdot)] = 0$ follows from the proof of Theorem 1(i, b).

(c) First, for convenience, for each p and p', we let M and M' denote $M_n(p)$ and $M_n(p')$ respectively, and let $R_i$ and $R'_i$ be short-hand notations for $R_{n,i}(p)$ and $R_{n,i}(p')$.
Then from the definition of $K_j$,

$$E[G_n(p, s)G_n(p', s)] = n^{-1} E\Big[M' E\Big[\sum_{i=1}^{K_1} (s^{R_i} - E[s^{R_i}])(s^{R'_1} - E[s^{R'_1}]) \,\Big|\, M, M'\Big]\Big]$$
$$= n^{-1} E[M'] E\Big[\sum_{i=1}^{K_1} (s^{R_i} - E[s^{R_i}])(s^{R'_1} - E[s^{R'_1}])\Big]$$
$$= p' E\Big[\sum_{i=1}^{K_1} s^{R_i + R'_1} - \sum_{i=1}^{K_1} s^{R_i} E[s^{R'_1}] - K_1 s^{R'_1} E[s^{R_i}] + K_1 E[s^{R_i}] E[s^{R'_1}]\Big],$$

where the first equality follows from the fact that $\{K_j\}$ is independently and identically distributed under the null and that $R'_j$ is independent of $R_\ell$ if $\ell \leq K_{j-1}$ or $\ell \geq K_j + 1$. The second equality also follows from the fact that $\{M, M'\}$ is independent of $\{R_i, R'_1 : i = 1, 2, \cdots, K_1\}$. Further,

$$E\Big[\sum_{i=1}^{K_1} s^{R_i}\Big] = E[s^{R_i}] \cdot E[K_1], \qquad E[K_1 s^{R'_1}] = \frac{sp'}{\{1 - s(1-p')\}} \cdot \frac{1 - s(1-p)}{\{1 - s(1-p')\}},$$

and

$$E\Big[\sum_{i=1}^{K_1} s^{R_i + R'_1}\Big] = \frac{s^2 p'}{\{1 - s^2(1-p)\}} \cdot \left[\frac{1 - s(1-p)}{\{1 - s(1-p')\}}\right]^2$$

by Lemmas A1 to A4. Substituting these into the above equation yields the desired result.

Remark 2: (1) The proof of Theorem 1(ii) also implies that for any $\varepsilon > 0$, there is $\delta > 0$ such that $\limsup_{n \to \infty} P(w''_{G_n(\cdot, s)}(\delta) > \varepsilon) = 0$, where

$$w''_{G_n(\cdot, s)}(\delta) := \sup_{p, p' \in I}\ \sup_{p'' \in \{|p - p'| < \delta\}} \min[|G_n(p'', s) - G_n(p', s)|,\ |G_n(p'', s) - G_n(p, s)|],$$

implying that $\{G_n(\cdot, s)\}$ is tight. The proof is given in Billingsley (1999, pp. 141-143). (2) The given bounds $\bar{k}$ and $\bar{m}$ in the proof of Theorem 1(ii) are independent of s. This further implies that for any $\varepsilon > 0$, there is $\delta > 0$ such that

$$\limsup_{n \to \infty} P(w'''_{G_n(\cdot, s)}(\delta) > \varepsilon) = 0, \qquad (13)$$

where $w'''_{G_n(\cdot, s)}(\delta)$ is defined as

$$\sup_{p, p' \in I}\ \sup_{p'' \in \{|p - p'| < \delta\}} \min\Big[\sup_s |G_n(p'', s) - G_n(p', s)|,\ \sup_s |G_n(p'', s) - G_n(p, s)|\Big].$$

This follows from the proof of theorem 3 in Bickel and Wichura (1971).

(iii) We separate the proof into two pieces, (a) and (b). In (a), we prove the weak convergence of $G_n$, and we derive its covariance structure in (b).

(a) In order to show the weak convergence of $G_n$, we exploit the corollary of Bickel and Wichura (1971, p. 1664). That is, we show that for any $\varepsilon > 0$, there is $\delta > 0$ such that
$\limsup_{n \to \infty} P(w'''_{G_n}(\delta) > \varepsilon) = 0$, where $w'''_{G_n}(\delta) := \max[w'''_{G_n(\cdot, s)}(\delta),\ w'''_{G_n(p, \cdot)}(\delta)]$. For this, we note that

$$\limsup_{n \to \infty} P(w'''_{G_n}(\delta) > \varepsilon) \leq \limsup_{n \to \infty} P\Big(\sup_p w'''_{G_n(p, \cdot)}(\delta) > \varepsilon/2\Big) + \limsup_{n \to \infty} P\Big(\sup_s w'''_{G_n(\cdot, s)}(\delta) > \varepsilon/2\Big),$$

which is identical to zero by (12) and (13). Therefore, the desired result is obtained from this and the finite dimensional weak convergence, which follows from the Cramér-Wold device.

(b) As before, for convenience, for each p and p', we let M and M' denote $M_n(p)$ and $M_n(p')$ respectively, and let $R_i$ and $R'_i$ be short-hand notations for $R_{n,i}(p)$ and $R_{n,i}(p')$. Also, we let $K_j$ be such that $\sum_{i=K_{j-1}+1}^{K_j} R_{n,i}(p) = R_{n,j}(p')$. Then, given the conditions and the null,

$$E[G_n(p, s)G_n(p', s')] = n^{-1} E\Big[\sum_{j=1}^{M'} \sum_{i=1}^{M} (s^{R_i} - E[s^{R_i}])(s'^{R'_j} - E[s'^{R'_j}])\Big] \qquad (14)$$
$$= p' E\Big[\sum_{i=1}^{K_1} s^{R_i} s'^{R'_1} - E[s'^{R'_1}] \sum_{i=1}^{K_1} s^{R_i} - K_1 s'^{R'_1} E[s^{R_i}] + K_1 E[s^{R_i}] E[s'^{R'_1}]\Big].$$

From Lemmas A1 to A4, we note that

$$E\Big[\sum_{i=1}^{K_1} s^{R_i}\Big] = E[s^{R_i}] \cdot E[K_1], \qquad E[K_1 s'^{R'_1}] = \frac{s'p'}{\{1 - s'(1-p')\}} \cdot \frac{1 - s'(1-p)}{\{1 - s'(1-p')\}},$$

and

$$E\Big[\sum_{i=1}^{K_1} s^{R_i} s'^{R'_1}\Big] = \frac{ss'p'}{\{1 - ss'(1-p)\}} \cdot \left[\frac{1 - s'(1-p)}{\{1 - s'(1-p')\}}\right]^2.$$

Thus, plugging these into (14) leads to

$$E[G_n(p, s)G_n(p', s')] = \frac{ss'p'^2(1-s)(1-s')(1-p)\{1 - s'(1-p)\}}{\{1 - s(1-p)\}\{1 - s'(1-p')\}^2\{1 - ss'(1-p)\}}.$$

Finally, it follows from the continuous mapping theorem that $f[G_n] \Rightarrow f[G]$. ∎

Proof of Lemma 2: (i) First, $\sup_p |\tilde{p}_n(p) - p| \to 0$ almost surely by the Glivenko-Cantelli theorem. Second, $G_n \Rightarrow G$ by Theorem 1(ii). Third, $(D(I \times \mathbb{S}) \times D(I))$ is a separable space. Thus, $(G_n, \tilde{p}_n(\cdot)) \Rightarrow (G, \cdot)$ by theorem 3.9 of Billingsley (1999). Fourth, $|G(p, s) - G(p', s')| \leq |G(p, s) - G(p', s)| + |G(p', s) - G(p', s')|$, and each term on the RHS can be made as small as desired by letting $|p - p'|$ and $|s - s'|$ tend to zero, because $G \in C(I \times \mathbb{S})$ a.s. Finally, note that for each (p, s), $W_n(p, s) = G_n(\tilde{p}_n(p), s)$.
Therefore, $W_n - G_n = G_n(\tilde{p}_n(\cdot), \cdot) - G_n(\cdot, \cdot) \Rightarrow G - G = 0$ by the lemma of Billingsley (1999, p. 151) and the first to the fourth facts. This implies that $\sup_{(p,s) \in I \times \mathbb{S}} |W_n(p, s) - G_n(p, s)| \to 0$ in probability, as desired.

(ii) We abbreviate $\tilde{p}_n(p)$ to $\tilde{p}_n$ for convenience. Then, for some $p_n^*(p)$ between p and $\tilde{p}_n$,

$$\left|\frac{s\tilde{p}_n}{\{1 - s(1 - \tilde{p}_n)\}} - \frac{sp}{\{1 - s(1-p)\}} - \frac{s(1-s)(\tilde{p}_n - p)}{\{1 - s(1-p)\}^2}\right| \leq \frac{2s^2(1-s)(\tilde{p}_n - p)^2}{\{1 - s(1 - p_n^*(p))\}^3}$$

by the mean value theorem, where $\sup_{p \in I} |p_n^*(p) - p| \to 0$ a.s. by Glivenko-Cantelli. Also,

$$\sup_{p,s} \frac{\tilde{M}_n(p)}{\sqrt{n}} \left|\frac{2s^2(1-s)(\tilde{p}_n - p)^2}{\{1 - s(1 - p_n^*(p))\}^3}\right| \leq \frac{2}{\sqrt{n}} \Big(\sup_p \Big|\frac{\tilde{M}_n(p)}{n}\Big|\Big) \Big(\sup_{p,s} \left|\frac{s^2(1-s)}{\{1 - s(1 - p_n^*(p))\}^3}\right|\Big) \Big(\sup_p \big|n(\tilde{p}_n - p)^2\big|\Big),$$

where $n^{-1}\tilde{M}_n(p)$ and $s^2(1-s)\{1 - s(1 - p_n^*(p))\}^{-3}$ are uniformly bounded by 1 and $1/(1 - \tilde{s})^3$ respectively, where $\tilde{s} := \max[|\underline{s}|, \bar{s}]$, and $n(\tilde{p}_n - p)^2 = O_P(1)$ uniformly in p. Thus,

$$\sup_{p,s} \frac{\tilde{M}_n(p)}{\sqrt{n}} \left|\frac{s\tilde{p}_n}{\{1 - s(1 - \tilde{p}_n)\}} - \frac{sp}{\{1 - s(1-p)\}} - \frac{s(1-s)(\tilde{p}_n - p)}{\{1 - s(1-p)\}^2}\right| = o_P(1).$$

Given these, the weak convergence of $H_n$ trivially follows because $\sup_p |n^{-1}\tilde{M}_n(p) - p| = o_P(1)$, and $\sqrt{n}(\tilde{p}_n - p)$ as a function of p weakly converges to a Brownian bridge, so that we can apply the lemma of Billingsley (1999, p. 151). All of this establishes the tightness of $H_n$ as well. Next, the covariance structure of H follows from the fact that for each (p, s) and (p', s') with $p' \leq p$,

$$E\left[\frac{\tilde{M}_n(p) s(1-s)(\tilde{p}_n - p)}{\sqrt{n}\{1 - s(1-p)\}^2}\right] = 0,$$

and

$$E\left[\frac{\tilde{M}_n(p) s(1-s)(\tilde{p}_n - p)}{\sqrt{n}\{1 - s(1-p)\}^2} \cdot \frac{\tilde{M}_n(p') s'(1-s')(\tilde{p}'_n - p')}{\sqrt{n}\{1 - s'(1-p')\}^2}\right] = \frac{ss'pp'^2(1-s)(1-s')(1-p)}{\{1 - s(1-p)\}^2\{1 - s'(1-p')\}^2},$$

which is identical to $E[H(p, s)H(p', s')]$.

(iii) To show the given claim, we first derive the given covariance structure.
For each (p, s) and (p', s'), $E[W_n(p, s)H_n(p', s')] = E[E[W_n(p, s) \mid X_1, \cdots, X_n] H_n(p', s')]$, where the equality follows from the fact that $H_n$ is measurable with respect to the smallest σ-algebra generated by $\{X_1, \cdots, X_n\}$. Given this, note that

$$E[W_n(p, s) \mid X_1, \cdots, X_n] = \frac{1}{\sqrt{n}} \sum_{i=1}^{\tilde{M}_n(p)} \left[E[s^{\tilde{R}_{n,i}(p)} \mid X_1, \cdots, X_n] - \frac{s\tilde{p}_n}{\{1 - s(1 - \tilde{p}_n)\}}\right] = \frac{1}{\sqrt{n}} \sum_{i=1}^{\tilde{M}_n(p)} \left[\frac{sp}{\{1 - s(1-p)\}} - \frac{s\tilde{p}_n}{\{1 - s(1 - \tilde{p}_n)\}}\right] = -H_n(p, s).$$

Thus, $E[E[W_n(p, s) \mid X_1, \cdots, X_n] H_n(p', s')] = -E[H_n(p, s)H_n(p', s')]$. Second, we also note that $E[G(p, s)H(p', s')] = \lim_{n \to \infty} E[W_n(p, s)H_n(p', s')]$ by Lemma 2(i), and further $E[H(p, s)H(p', s')] = \lim_{n \to \infty} E[H_n(p, s)H_n(p', s')]$. Therefore, it follows that $E[G(p, s)H(p', s')] = -E[H(p, s)H(p', s')]$.

Next, we consider $(\tilde{G}_n, H_n)'$ and apply example 1.4.6 of van der Vaart and Wellner (1996, p. 31) to show the weak convergence. Note that $\tilde{G}_n = W_n + H_n = G_n + H_n + o_P(1)$, and each of $G_n$ and $H_n$ is tight, so that $\tilde{G}_n$ is tight, too. Further, their limits are continuous by Theorem 1(ii) and Lemma 2(ii). Thus, if the finite-dimensional distributions of $G_n + H_n$ have weak limits, then $\tilde{G}_n$ must converge weakly to the Gaussian process $\tilde{G}$ with the covariance structure (7). We may apply the Lindeberg-Lévy CLT for this. That is, for each (p, s), $E[G_n(p, s) + H_n(p, s)] = 0$, and

$$E[\{G_n(p, s) + H_n(p, s)\}^2] = E[G_n(p, s)^2] - E[H_n(p, s)^2] + o(1) \leq E[G_n(p, s)^2] + o(1) = \frac{s^2 p^2 (1-s)^2 (1-p)}{\{1 - s^2(1-p)\}\{1 - s(1-p)\}^2} + o(1),$$

which is uniformly bounded, so that for each (p, s), the sufficiency conditions for the Lindeberg-Lévy CLT are easily verified. Here the first equality follows by applying Lemma 2(ii). The
finite-dimensional weak convergence of G 31 en is asymptotically independent of Hn because E[G en (p, s) Hn (p0 , s0 )] Given this, we note that G = E[Wn (p, s)Hn (p0 , s0 )] + E[Hn (p, s)Hn (p0 , s0 )] = 0 by the covariance structure given above, so en , Hn )0 ⇒ (G, e H)0 by example 1.4.6 of van der Vaart and Wellner (1996, p. that it follows that (G en − Hn , Hn )0 = (Wn , Hn )0 and apply the continuous 31). To complete the proof, we consider (G mapping theorem. ¥ Remark 3: Durbin (1973) shows that empirical distributions with parameter estimation error are not distribution-free using the similar proof to that of Lemma 2(i). We further exploit his proof to show that Wn asymptotically equivalent to Gn . Proof of Theorem 2: (i, ii, and iii) By chance, it is already shown in the proof of Lemma 2(iii) en ⇒ G. e This implies the given claims together with the continuous mapping theorem. that G ¥ Proof of Lemma 3: (i) First, as shown in the proof of Lemma 3(ii) given below, F̂n (y( · )) converges to F (y( · )) in probability uniformly on I, where for each p, y(p) := inf{x ∈ R : F (x) ≥ p}. Second, Gn ⇒ G by Theorem 1(ii). Third, (D(I × S) × D(I)) is a separable space. Therefore, it follows that (Gn , F̂n (y( · ))) ⇒ (G, F (y( · ))) by theorem 3.9 of Billingsley (1999). Fourth, G ∈ C(I×S). Finally, G̈n ( · , · ) = Gn (F̂n (y( · )), · ), so that G̈n ( · , · )−Gn ( · , · ) = Gn (F̂n (y( · )), · )− Gn ( · , · ) ⇒ G −G = 0, where the weak convergence follows from the lemma of Billingsley (1999, p. 151). Thus, sup(p,s) |G̈n (p, s) − Gn (p, s)| = oP (1). (ii) First, by the definition of Ḧn (p, s), we can represent this as a product of two pieces as follows: ( )½ · ¸¾ √ M̂n (p) se pn sp Ḧn (p, s) = n − . n {1 − s(1 − pen )} {1 − s(1 − p)} Second, it follows that ¸ · √ sp s(1 − s)B00 (p) se pn − ⇒− n {1 − s(1 − pen )} {1 − s(1 − p)} {1 − s(1 − p)}2 by Lemma 2(ii) and Theorem 5(i) given below. 
Third, if $\hat F_n(\cdot)$ converges to $F(\cdot)$ in probability, then $n^{-1}\hat M_n(p)$ converges to $p$ in probability uniformly in $p$, because for each $p$, $\hat M_n(p) = \sum_{t=1}^n 1_{\{\hat F_n(\hat Y_t) < p\}}$. Finally, all these imply that $\sup_{(p,s)} |\ddot H_n(p,s) - H_n(p,s)| = o_P(1)$ by the lemma of Billingsley (1999, p. 151), and this completes the proof.

Therefore, we only have to show that $\hat F_n(\cdot)$ converges to $F(\cdot)$ in probability, and for this we exploit the Glivenko–Cantelli theorem. That is, if for each $p$, $\hat F_n(y(p))$ converges to $F(y(p))$ in probability, then the uniform convergence follows from the properties of empirical distribution functions (boundedness, monotonicity, and right continuity). Thus, the proof can be completed by showing the pointwise convergence of $\hat F_n$. We follow the next steps. First, if we let $y = y(p)$ for notational simplicity, then for each $y$ and for any $\varepsilon_1 > 0$,
$$\{\omega \in \Omega : h_t(\hat\theta_n) < y\} \subset \{\omega \in \Omega : h_t(\theta^*) < y + |h_t(\hat\theta_n) - h_t(\theta^*)|\}$$
$$= \{\omega \in \Omega : h_t(\theta^*) < y + |h_t(\hat\theta_n) - h_t(\theta^*)|,\ |h_t(\hat\theta_n) - h_t(\theta^*)| < \varepsilon_1\} \cup \{\omega \in \Omega : h_t(\theta^*) < y + |h_t(\hat\theta_n) - h_t(\theta^*)|,\ |h_t(\hat\theta_n) - h_t(\theta^*)| \ge \varepsilon_1\}$$
$$\subset \{\omega \in \Omega : h_t(\theta^*) < y + \varepsilon_1\} \cup \{|h_t(\hat\theta_n) - h_t(\theta^*)| \ge \varepsilon_1\}.$$
Second, for the same $y$ and $\varepsilon_1$, $\{\omega \in \Omega : h_t(\hat\theta_n) < y\} \supset \{\omega \in \Omega : h_t(\theta^*) < y - \varepsilon_1\} \setminus \{\omega \in \Omega : |h_t(\theta^*) - h_t(\hat\theta_n)| > \varepsilon_1\}$. These two facts imply that
$$\frac{1}{n}\sum_{t=1}^n 1_{\{h_t(\theta^*) < y - \varepsilon_1\}} - \frac{1}{n}\sum_{t=1}^n 1_{\{|h_t(\hat\theta_n) - h_t(\theta^*)| \ge \varepsilon_1\}} \le \frac{1}{n}\sum_{t=1}^n 1_{\{h_t(\hat\theta_n) < y\}} \le \frac{1}{n}\sum_{t=1}^n 1_{\{h_t(\theta^*) < y + \varepsilon_1\}} + \frac{1}{n}\sum_{t=1}^n 1_{\{|h_t(\hat\theta_n) - h_t(\theta^*)| \ge \varepsilon_1\}}.$$
Thus, $n^{-1}\sum_{t=1}^n 1_{\{h_t(\theta^*) < y - \varepsilon_1\}}$ and $n^{-1}\sum_{t=1}^n 1_{\{h_t(\theta^*) < y + \varepsilon_1\}}$ converge to $F(y - \varepsilon_1)$ and $F(y + \varepsilon_1)$ a.s. by the SLLN and the null hypothesis. Further, for any $\delta > 0$ and $\varepsilon_2 > 0$, there is an $n^*$ such that if $n > n^*$, then $P(n^{-1}\sum_{t=1}^n 1_{\{|h_t(\theta^*) - h_t(\hat\theta_n)| > \varepsilon_1\}} \ge \delta) \le \varepsilon_2$. This follows because
$$P\left( \frac{1}{n}\sum_{t=1}^n 1_{\{|h_t(\theta^*) - h_t(\hat\theta_n)| > \varepsilon_1\}} \ge \delta \right) \le \frac{1}{n\delta} \sum_{t=1}^n E\left( 1_{\{|h_t(\theta^*) - h_t(\hat\theta_n)| > \varepsilon_1\}} \right) = \frac{1}{\delta n} \sum_{t=1}^n P\left(|h_t(\theta^*) - h_t(\hat\theta_n)| > \varepsilon_1\right) \le \varepsilon_2,$$
where the first inequality follows from Markov's inequality, and the last inequality follows from the fact that $|h_t(\theta^*) - h_t(\hat\theta_n)| \le M_t \|\hat\theta_n - \theta^*\| = o_P(1)$ uniformly in $t$ by A2 and A3(ii). From these, therefore, for any $\varepsilon_1 > 0$, $F(y - \varepsilon_1) + o_P(1) \le \hat F_n(y) \le F(y + \varepsilon_1) + o_P(1)$. Since $\varepsilon_1$ can be chosen arbitrarily, we may let $\varepsilon_1$ tend to zero. Therefore, $\hat F_n(y)$ converges to $F(y)$ in probability, as desired. ∎

Proof of Theorem 3: (i, ii, and iii) $\hat G_n = \ddot G_n + \ddot H_n = G_n + H_n + o_P(1)$ by Lemmas 3(i) and 3(ii). Further, $G_n + H_n = \tilde G_n \Rightarrow \tilde{\mathcal{G}}$ by Theorem 2(ii). Thus, $\hat G_n \Rightarrow \tilde{\mathcal{G}}$, which implies the desired result together with the continuous mapping theorem. ∎

7.2 Other Test Statistics

In this section, we provide the formulae of the other test statistics used for our Monte Carlo experiments. The following statistics are used for DGPs 2.1, 2.2, and 2.3:

$RE_n := \max_k \frac{k}{\hat\sigma_n \sqrt{n}} \max[\tilde\alpha_k, \tilde\beta_k]$;

$RR_n := \max_k \frac{k}{\hat\sigma_n \sqrt{n}} \max[\tilde\alpha_k, \tilde\beta_k] - \min_k \frac{k}{\hat\sigma_n \sqrt{n}} \max[\tilde\alpha_k, \tilde\beta_k]$;

$RECUSUM_n := \max_k \frac{1}{\tilde\sigma_n \sqrt{n-1}} \left| \sum_{t=2}^k v_t \right|$;

$OLSCUSUM_n := \max_k \frac{1}{\hat\sigma_n \sqrt{n}} \left| \sum_{t=1}^k e_t \right|$;

$M_n := \max_k \sup_z \left| \frac{k}{n}\left(1 - \frac{k}{n}\right) \sqrt{n} \left( k^{-1} \sum_{t=1}^k 1_{\{e_t \le z\}} - (n-k)^{-1} \sum_{t=k+1}^n 1_{\{e_t \le z\}} \right) \right|$;

$SupW_n := \sup_{k_1 \le k \le k_2} W_n(k)$;

$ExpW_n := \ln\left\{ \frac{1}{k_2 - k_1 + 1} \sum_{k=k_1}^{k_2} \exp[0.5\, W_n(k)] \right\}$;

$AvgW_n := \frac{1}{k_2 - k_1 + 1} \sum_{k=k_1}^{k_2} W_n(k)$,

where $[\tilde\alpha_k, \tilde\beta_k]' = M_n^{1/2} [\hat\alpha_{1,k} - \hat\alpha_{1,n},\ \hat\beta_{1,k} - \hat\beta_{1,n}]'$; $M_n := Z'Z$; $Z$ is the $n \times 2$ matrix of the regressors; $(\hat\alpha_{1,k}, \hat\beta_{1,k})$ is the OLS estimator using the first $k$ observations; $(\hat\alpha_{2,k}, \hat\beta_{2,k})$ is the OLS estimator using the $(k+1)$-th to $n$-th observations; $e_t := Y_t - \hat\alpha_{1,n} - \hat\beta_{1,n} Z_t$; $\hat\sigma_n^2 := (n-2)^{-1} \sum_{t=1}^n e_t^2$; $v_t := Y_t - \hat\alpha_{1,t-1} - \hat\beta_{1,t-1} Z_t$; $\tilde\sigma_n^2 := (n-2)^{-1} \sum_{t=3}^n v_t^2$; $W_n(k) := \{(n-2)\hat\sigma_n^2 - (n-3)\ddot\sigma_n^2(k)\}/\ddot\sigma_n^2(k)$; $k_1 = \lfloor 0.15n \rfloor$; $k_2 = \lfloor 0.85n \rfloor$; $\hat w_{1,t}(k) := Y_t - \hat\alpha_{1,k} - \hat\beta_{1,k} Z_t$; $\hat w_{2,t}(k) := Y_t - \hat\alpha_{2,k} - \hat\beta_{2,k} Z_t$; and $\ddot\sigma_n^2(k) := (n-3)^{-1} \{\sum_{t=1}^k \hat w_{1,t}(k)^2 + \sum_{t=k+1}^n \hat w_{2,t}(k)^2\}$.

The following statistics are used for DGPs 2.4, 2.5, and 2.6:

$RE_n := \max_k \frac{k}{\hat\sigma_n \sqrt{n}} |\tilde\beta_k|$;

$RR_n := \max_k \frac{k}{\hat\sigma_n \sqrt{n}} |\tilde\beta_k| - \min_k \frac{k}{\hat\sigma_n \sqrt{n}} |\tilde\beta_k|$;

$RECUSUM_n := \max_k \frac{1}{\tilde\sigma_n \sqrt{n-1}} \left| \sum_{t=2}^k v_t \right|$;

$NLSCUSUM_n := \max_k \frac{1}{\hat\sigma_n \sqrt{n}} \left| \sum_{t=1}^k e_t \right|$;

$M_n := \max_k \sup_z \left| \frac{k}{n}\left(1 - \frac{k}{n}\right) \sqrt{n} \left( k^{-1} \sum_{t=1}^k 1_{\{e_t \le z\}} - (n-k)^{-1} \sum_{t=k+1}^n 1_{\{e_t \le z\}} \right) \right|$;

$SupW_n := \sup_{k_1 \le k \le k_2} W_n(k)$;

$ExpW_n := \ln\left\{ \frac{1}{k_2 - k_1 + 1} \sum_{k=k_1}^{k_2} \exp[0.5\, W_n(k)] \right\}$;

$AvgW_n := \frac{1}{k_2 - k_1 + 1} \sum_{k=k_1}^{k_2} W_n(k)$,

where $\tilde\beta_k = \tilde M_n^{1/2} [\hat\beta_{1,k} - \hat\beta_{1,n}]$; $\tilde M_n := \sum_{t=1}^n \exp(2 Z_t \hat\beta_{1,n}) Z_t^2$; $\hat\beta_{1,k}$ is the NLS estimator using the first $k$ observations; $\hat\beta_{2,k}$ is the NLS estimator using the $(k+1)$-th to $n$-th observations; $e_t := Y_t - \exp(\hat\beta_{1,n} Z_t)$; $\hat\sigma_n^2 := (n-1)^{-1} \sum_{t=1}^n e_t^2$; $v_t := Y_t - \exp(\hat\beta_{1,t-1} Z_t)$; $\tilde\sigma_n^2 := (n-1)^{-1} \sum_{t=2}^n v_t^2$; $W_n(k) := n \{\hat\beta_{1,k} - \hat\beta_{2,k}\} \{n\hat V_{1,k}/k + n\hat V_{2,k}/(n-k)\}^{-1} \{\hat\beta_{1,k} - \hat\beta_{2,k}\}$; and $\hat V_{1,k}$ and $\hat V_{2,k}$ are the variance estimators of $\hat\beta_{1,k}$ and $\hat\beta_{2,k}$, respectively.

References

[1] Altonji, J. (1986). "Intertemporal Substitution in Labor Supply: Evidence from Micro Data," The Journal of Political Economy, 94, S176–S215.

[2] Andrews, D. (1993). "Tests for Parameter Instability and Structural Change with Unknown Change Point," Econometrica, 61, 821–856.

[3] Andrews, D. and Ploberger, W. (1994). "Optimal Tests When a Nuisance Parameter is Present only under the Alternative," Econometrica, 62, 1383–1414.

[4] Andrews, D., Lee, I. and Ploberger, W. (1996). "Optimal Changepoint Tests for Normal Linear Regression," Journal of Econometrics, 70, 9–38.

[5] Andrews, D. (2001). "Testing When a Parameter is on the Boundary of the Maintained Hypothesis," Econometrica, 69, 683–734.

[6] Bai, J. (1996). "Testing for Parameter Constancy in Linear Regressions: an Empirical Distribution Function Approach," Econometrica, 64, 597–622.

[7] Bickel, P. and Wichura, M. (1971).
"Convergence Criteria for Multiparameter Stochastic Processes and Some Applications," The Annals of Mathematical Statistics, 42, 1656–1670.

[8] Bierens, H. (1990). "A Consistent Conditional Moment Test of Functional Form," Econometrica, 58, 1443–1458.

[9] Billingsley, P. (1968, 1999). Convergence of Probability Measures. New York: Wiley.

[10] Brett, C. and Pinkse, J. (1997). "Those Taxes Are All over the Map! A Test for Spatial Independence of Municipal Tax Rates in British Columbia," International Regional Science Review, 20, 131–151.

[11] Brown, R., Durbin, J. and Evans, J. (1975). "Techniques for Testing the Constancy of Regression Relationships over Time," Journal of the Royal Statistical Society, Series B, 37, 149–163.

[12] Chu, J., Hornik, K. and Kuan, C. (1995a). "MOSUM Tests for Parameter Constancy," Biometrika, 82, 603–617.

[13] Chu, J., Hornik, K. and Kuan, C. (1995b). "The Moving-Estimates Test for Parameter Stability," Econometric Theory, 11, 699–720.

[14] Darling, D. (1955). "The Cramér–Smirnov Test in the Parametric Case," The Annals of Mathematical Statistics, 28, 823–838.

[15] Diebold, F., Gunther, T. and Tay, A. (1998). "Evaluating Density Forecasts with Applications to Financial Risk Management," International Economic Review, 39, 863–883.

[16] Dodd, E. (1942). "Certain Tests for Randomness Applied to Data Grouped into Small Sets," Econometrica, 10, 249–257.

[17] Donsker, M. (1951). "An Invariance Principle for Certain Probability Limit Theorems," Memoirs of the American Mathematical Society, 6.

[18] Durbin, J. (1973). "Weak Convergence of the Sample Distribution Function When Parameters are Estimated," The Annals of Statistics, 1, 279–290.

[19] Elliott, G. and Müller, U. (2006). "Optimally Testing General Breaking Processes in Linear Time Series Models," The Review of Economic Studies, 73, 907–940.

[20] Fama, E. (1965). "The Behavior of Stock Market Prices," The Journal of Business, 38, 34–105.

[21] Feller, W. (1951). "The Asymptotic Distribution of the Range of Sums of Independent Random Variables," The Annals of Mathematical Statistics, 22, 427–432.

[22] Goodman, L. (1958). "Simplified Runs Tests and Likelihood Ratio Tests for Markov Chains," Biometrika, 45, 181–197.

[23] Granger, C. (1963). "A Quick Test for Serial Correlation Suitable for Use with Nonstationary Time Series," Journal of the American Statistical Association, 58, 728–736.

[24] Grenander, U. (1981). Abstract Inference. New York: Wiley.

[25] Heckman, J. (2001). "Micro Data, Heterogeneity, and the Evaluation of Public Policy: Nobel Lecture," The Journal of Political Economy, 109, 673–746.

[26] Henze, N. (1996). "Empirical Distribution Function Goodness-of-Fit Tests for Discrete Models," Canadian Journal of Statistics, 24, 81–93.

[27] Hochberg, Y. (1988). "A Sharper Bonferroni Procedure for Multiple Tests of Significance," Biometrika, 75, 800–802.

[28] Hong, Y. (1999). "Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach," Journal of the American Statistical Association, 94, 1201–1220.

[29] Hong, Y. and White, H. (2005). "Asymptotic Distribution Theory for Nonparametric Entropy Measures of Serial Dependence," Econometrica, 73, 837–901.

[30] Jain, N. and Kallianpur, G. (1970). "Norm Convergent Expansions for Gaussian Processes in Banach Spaces," Proceedings of the American Mathematical Society, 25, 890–895.

[31] Jain, N. and Marcus, M. (1975). "Central Limit Theorem for C(S)-valued Random Variables," Journal of Functional Analysis, 19, 216–231.

[32] Karlin, S. and Taylor, H. (1975). A First Course in Stochastic Processes. San Diego: Academic Press.

[33] Kocherlakota, S. and Kocherlakota, K. (1986). "Goodness-of-Fit Tests for Discrete Distributions," Communications in Statistics–Theory and Methods, 15, 815–829.

[34] Kuan, C. and Hornik, K. (1995). "The Generalized Fluctuation Test: a Unifying View," Econometric Reviews, 14, 135–161.

[35] Loève, M. (1978). Probability Theory II. New York: Springer-Verlag.

[36] MaCurdy, T. (1981). "An Empirical Model of Labor Supply in a Life-Cycle Setting," The Journal of Political Economy, 89, 1059–1085.

[37] Mood, A. (1940). "The Distribution Theory of Runs," The Annals of Mathematical Statistics, 11, 367–392.

[38] Nyblom, J. (1989). "Testing for the Constancy of Parameters over Time," Journal of the American Statistical Association, 84, 223–230.

[39] Phillips, P. (1998). "New Tools for Understanding Spurious Regressions," Econometrica, 66, 1299–1326.

[40] Pinkse, J. (1998). "Consistent Nonparametric Testing for Serial Independence," Journal of Econometrics, 84, 205–231.

[41] Ploberger, W., Krämer, W. and Kontrus, K. (1989). "A New Test for Structural Stability in the Linear Regression Model," Journal of Econometrics, 40, 307–318.

[42] Ploberger, W. and Krämer, W. (1992). "The CUSUM Test with OLS Residuals," Econometrica, 60, 271–285.

[43] Robinson, P. (1991). "Consistent Nonparametric Entropy-Based Testing," Review of Economic Studies, 58, 437–453.

[44] Rueda, R., Pérez-Abreu, V. and O'Reilly, F. (1991). "Goodness-of-Fit Test for the Poisson Distribution Based on the Probability Generating Function," Communications in Statistics–Theory and Methods, 20, 3093–3110.

[45] Sen, P. (1980). "Asymptotic Theory of Some Tests for a Possible Change in the Regression Slope Occurring at an Unknown Time Point," Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 52, 203–218.

[46] Skaug, H. and Tjøstheim, D. (1996). "Measures of Distance between Densities with Application to Testing for Serial Independence," in Time Series Analysis in Memory of E. J. Hannan, ed. P. Robinson and M. Rosenblatt. New York: Springer-Verlag, 363–377.

[47] Stinchcombe, M. and White, H. (1998). "Consistent Specification Testing with Nuisance Parameters Present Only under the Alternative," Econometric Theory, 14, 295–324.

[48] Sukhatme, P. (1972).
"Fredholm Determinant of a Positive Definite Kernel of a Special Type and Its Applications," The Annals of Mathematical Statistics, 43, 1914–1926.

[49] van der Vaart, A. and Wellner, J. (1996). Weak Convergence and Empirical Processes: with Applications to Statistics. New York: Springer-Verlag.

Table 1: ASYMPTOTIC CRITICAL VALUES OF THE TEST STATISTICS

Statistic \ Level    1%      5%      10%      Statistic \ Level    1%      5%      10%
T^p_1,n   p = 0.1   0.2474  0.1914  0.1639    T~^p_1,n  p = 0.1   0.2230  0.1727  0.1420
              0.3   0.5892  0.4512  0.3828                  0.3   0.5004  0.3842  0.3225
              0.5   0.8124  0.6207  0.5239                  0.5   0.6413  0.4886  0.4092
              0.7   0.9007  0.6841  0.5763                  0.7   0.6065  0.4632  0.3889
              0.9   0.7052  0.5329  0.4478                  0.9   0.3066  0.2356  0.1973
T^p_inf,n p = 0.1   0.7483  0.5683  0.4750    T~^p_inf,n p = 0.1  0.7454  0.5677  0.4750
              0.3   1.3517  1.0237  0.8582                  0.3   1.3239  1.0069  0.8441
              0.5   1.6846  1.2818  1.0728                  0.5   1.5909  1.1990  1.0091
              0.7   1.7834  1.3590  1.4000                  0.7   1.5019  1.1360  0.9617
              0.9   1.3791  1.0486  0.8839                  0.9   0.7912  0.6028  0.5060
T_1,n               0.4524  0.3595  0.3157    T~_1,n              0.2994  0.2427  0.2159
T_inf,n             2.2586  1.9137  1.7382    T~_inf,n            2.0346  1.7243  1.5603

Table 2: LEVEL SIMULATION AT 5% LEVEL (IN PERCENT, 10,000 ITERATIONS)

DGP 1.1
Statistic \ n          100    300    500
T~^p_1,n   p = 0.1    4.05   4.71   5.09
               0.3    5.02   4.95   4.61
               0.5    5.22   4.74   4.87
               0.7    5.34   4.90   5.31
               0.9    4.35   6.63   6.07
T~^p_inf,n p = 0.1    3.86   4.44   4.49
               0.3    3.54   4.65   4.59
               0.5    5.42   4.40   5.05
               0.7    7.00   5.03   5.08
               0.9    4.21   6.27   4.71
T~_1,n                4.85   4.88   4.54
T~_inf,n              5.17   7.15   7.75
R_n^a                 4.7
ST_n^a                5.1
HW_n^a                6.5

a: These are the results in Hong and White (2005), for n = 100. Their number of replications is 1,000.
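The level simulations above rest on the distributional fact exploited throughout the proofs: for IID data, the generalized runs have a geometric distribution with probability generating function E[s^R] = sp/{1 − s(1 − p)}. A minimal Monte Carlo sketch of this fact, under one natural construction of runs (gaps between successive observations falling below the threshold p); this is illustrative only, not the simulation code used in the paper, and the sample size and (p, s) values are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
p, s, n = 0.3, 0.5, 200_000   # arbitrary threshold, pgf argument, sample size

# For IID U[0,1] data, the gaps between successive observations that fall
# below the threshold p are IID geometric(p) on {1, 2, ...}.
u = rng.uniform(size=n)
hits = np.flatnonzero(u < p)
runs = np.diff(hits)          # run lengths between successive "successes"

# Geometric pgf appearing throughout the proofs: E[s^R] = s*p / (1 - s*(1-p)).
pgf_mc = np.mean(s ** runs)
pgf_theory = s * p / (1 - s * (1 - p))
print(pgf_mc, pgf_theory)     # the two agree up to Monte Carlo error
```

Under a dependent alternative (e.g., serial correlation in u), the empirical pgf of the runs drifts away from sp/{1 − s(1 − p)}, which is what the generalized runs statistics detect.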
Table 3: POWER SIMULATION AT 5% LEVEL (IN PERCENT, 3,000 ITERATIONS)

                     DGP 1.2        DGP 1.3        DGP 1.4        DGP 1.5
Statistic \ n        100    200     100    200     100    200     100    200
T~^p_1,n   p = 0.1  20.67  33.17   29.50  45.00   27.53  47.47    5.80   9.27
               0.3  33.27  59.37    8.43  10.57    8.77  12.07    9.77  15.40
               0.5  36.60  65.17    5.33   5.47    5.20   5.80   18.77  32.93
               0.7  34.13  60.30    9.33  11.37    5.37   6.10   76.17  97.33
               0.9  23.77  38.07   30.30  51.57   15.40  23.97   75.37  97.10
T~^p_inf,n p = 0.1   6.60   6.67    7.10   6.33    5.03   5.37    3.63   4.73
               0.3  10.57  23.97    3.60   5.40    3.70   5.40    5.70  11.00
               0.5  24.07  42.87    5.77   6.13    4.90   6.33   16.43  26.70
               0.7  29.53  41.67    9.80   6.87    6.93   4.20   72.97  92.77
               0.9  22.47  22.63   27.30  32.67   14.43  13.30   72.30  89.87
T~_1,n              62.20  90.73   14.53  25.00    8.87  15.03   16.13  26.20
T~_inf,n            31.47  63.30    6.83  12.60    5.30   7.30   18.63  34.63
R_n^a               13.8   25.4    26.4   52.2    15.0    7.2    59.8   75.4
ST_n^a              12.4   22.0    61.2   90.0    27.8   52.0    81.6   98.4
HW_n^a              14.0   27.0    37.6   67.6    20.6   35.2    69.6   95.6

a: These are the results in Hong and White (2005). Their number of replications is 500.

Table 4: POWER SIMULATION AT 5% LEVEL (IN PERCENT, 3,000 ITERATIONS)

                     DGP 1.6        DGP 1.7        DGP 1.8        DGP 1.9
Statistic \ n        100    200     100    200     100    200     100    200
T~^p_1,n   p = 0.1   1.30   6.83    1.00   5.33    2.77   3.73   27.83  47.70
               0.3   9.93  16.33   18.90  33.67   17.73  32.57   39.47  64.23
               0.5   8.33  10.87    9.27  11.37   31.77  59.80   43.13  68.00
               0.7  30.87  57.17   18.07  28.73   33.60  58.43   39.97  62.13
               0.9  26.43  41.27   24.00  39.03   22.37  34.63   26.30  37.50
T~^p_inf,n p = 0.1   3.73   5.73    4.33   4.30    3.77   3.67    9.50  14.53
               0.3   6.00  10.47    8.93  16.23    5.83  12.80   22.23  44.33
               0.5   8.47   9.90    8.10   9.83   20.47  36.93   34.60  59.23
               0.7  33.63  47.20   16.67  18.17   30.03  39.93   37.83  54.90
               0.9  25.80  29.60   22.53  22.70   20.63  19.97   24.83  29.80
T~_1,n              26.93  62.00   20.87  52.60   48.60  81.47   58.03  81.90
T~_inf,n            23.33  53.27   15.60  31.93   27.67  56.30   52.60  77.60
R_n^a               31.8   65.2    24.6   80.8    14.2   34.6    60.2   84.0
ST_n^a              34.8   72.8    25.8   86.8    13.4   23.8    55.8   79.8
HW_n^a              34.0   74.0    25.6   85.4    17.0   26.2    60.8   84.6

a: These are the results in Hong and White (2005). Their number of replications is 500.

Table 5: LEVEL SIMULATION AT 5% LEVEL (IN PERCENT, 10,000 ITERATIONS)

                        DGP 2.1               DGP 2.2
Statistic \ n        100    300    500     100    300    500
T~^p_1,n   p = 0.1   4.12   5.12   4.76    5.81   3.71   4.88
               0.3   5.22   5.51   5.63    6.63   5.02   5.07
               0.5   5.33   5.19   5.92    4.25   5.20   5.02
               0.7   5.14   4.35   5.01    5.12   5.04   4.51
               0.9   6.87   5.02   5.23    5.38   5.32   6.02
T~^p_inf,n p = 0.1   3.86   5.56   5.17    7.44   3.58   6.38
               0.3   4.56   5.34   5.70    6.25   6.04   6.10
               0.5   5.60   4.88   7.43    4.33   5.68   5.60
               0.7   6.89   4.14   6.44    4.82   5.06   4.54
               0.9   6.47   6.32   6.38    5.77   5.15   7.49
T~_1,n               5.14   5.00   4.72    7.19   4.35   7.38
T~_inf,n             4.97   5.10   4.88    7.30   4.40   7.79
M_n                  2.89   4.43   4.75    3.98   5.33   5.62
RE_n                 7.89   4.14   3.11   71.90  70.70  31.01
RR_n                 9.28   4.30   2.74   64.39  60.78  66.15
SupW_n               4.77   4.41   4.57    2.93   0.94   1.31
AvgW_n               5.81   5.29   5.10    2.60   1.69   2.22
ExpW_n               5.34   5.35   5.04    1.78   2.35   3.13
RECUSUM_n            1.65   3.07   3.68    3.41   3.86   3.64
O(N)LSCUSUM_n        2.68   4.20   4.01   28.91  30.22  55.50

Table 6: POWER SIMULATION AT 5% LEVEL (IN PERCENT, 3,000 ITERATIONS)

                     DGP 2.3        DGP 2.4        DGP 2.5        DGP 2.6
Statistic \ n        100    200     100    200     100    200     100    200
T~^p_1,n   p = 0.1  18.13  29.60   22.20  39.93   16.70  24.70   35.67  59.00
               0.3  27.47  48.43   20.53  37.60    7.80  10.37   22.87  39.23
               0.5  29.03  55.43   21.83  37.77    5.80   7.77   19.43  35.07
               0.7  29.03  51.70   22.33  38.20    7.67  10.17   27.37  48.37
               0.9  19.37  32.77   22.63  39.23   17.30  24.07   43.27  71.47
T~^p_inf,n p = 0.1   5.90   5.53    5.50   6.80    6.07   5.03    7.57   9.47
               0.3   7.70  21.80    6.87  16.43    4.77   8.40    7.20  18.10
               0.5  18.23  33.47   13.80  20.50    5.93   6.50   12.63  20.10
               0.7  25.27  44.40   18.83  32.30    7.30   9.83   22.47  41.37
               0.9  17.37  39.73   20.33  44.47   15.27  25.77   40.10  74.80
T~_1,n              44.99  79.94   36.70  67.27    3.60   3.83   44.44  76.60
T~_inf,n            17.72  46.81   14.10  32.10    4.53   5.40   16.81  40.58
M_n                 100.0  100.0   12.93  23.03    1.97   2.70   35.96  61.43
RE_n                100.0  100.0   100.0  100.0   94.96  100.0   79.50  89.13
RR_n                99.97  100.0   99.96  100.0   87.90  99.93   69.43  81.73
SupW_n              100.0  100.0   100.0  100.0   85.63  97.86   94.89  99.03
AvgW_n              100.0  100.0   100.0  100.0   73.76  92.60   94.87  99.02
ExpW_n              100.0  100.0   100.0  100.0   85.73  97.70   90.19  98.14
RECUSUM_n            6.77   9.92    9.30  11.43   14.73  19.10   15.60  20.20
O(N)LSCUSUM_n       19.67  24.29   22.43  26.60   15.50  22.03   83.03  92.40

Table 7: POWER SIMULATION AT 5% LEVEL (IN PERCENT, 3,000 ITERATIONS)

                     DGP 2.7        DGP 2.8        DGP 2.9        DGP 2.10       DGP 2.11
Statistic \ n        100    200     100    200     100    200     100    200     100    200
T~^p_1,n   p = 0.1  18.13  29.13   18.13  27.06   14.86  23.66    4.10  12.33   24.03  45.53
               0.3  27.47  49.87   25.50  43.26   20.06  37.43   37.83  63.23   20.53  34.53
               0.5  29.03  55.77   26.33  47.90   21.56  41.06   46.40  74.57   18.93  29.27
               0.7  29.03  50.97   25.46  45.06   22.70  39.90   35.87  63.67   15.20  24.10
               0.9  19.37  32.33   21.10  27.73   16.06  26.13    0.43   0.03    0.93   0.20
T~^p_inf,n p = 0.1   5.90   5.27    5.43   4.93    5.10   4.23    4.23   7.13   24.27  46.60
               0.3   7.70  20.83    7.86  15.83    6.43  14.50   31.27  43.87   23.07  30.87
               0.5  18.23  32.73   16.40  28.16   13.10  23.96   37.17  65.13   16.87  26.87
               0.7  25.27  44.43   22.43  28.73   19.76  25.56   38.13  56.77   17.60  20.17
               0.9  17.37  39.57   19.90  15.06   14.80  14.23    0.40   0.07    0.90   0.53
T~_1,n              44.99  80.86   43.76  75.36   35.86  67.36   68.63  90.45   35.16  51.88
T~_inf,n            17.72  43.10   17.93  40.50   15.73  35.36   50.51  78.32   25.16  42.63
M_n                 100.0  13.67    9.80  12.20    8.10   9.40    0.66   0.86    2.36   2.73
RE_n                100.0  100.0   78.06  60.00   36.23   3.12    7.77   4.91   80.70  89.16
RR_n                99.97  99.96   77.86  59.73   34.33  30.93    8.85   5.24   71.16  79.93
SupW_n              100.0  100.0   98.16  63.26   41.86  46.06    3.73   3.46    3.82   3.64
AvgW_n              100.0  100.0   92.80  50.16   34.16  32.63    4.35   3.63    3.34   3.15
ExpW_n              100.0  100.0   98.86  62.66   43.50  42.40    3.56   3.19    1.76   1.87
RECUSUM_n            6.77   9.20    7.03   9.50    5.46   8.50    0.23   0.39    1.26   3.10
OLSCUSUM_n          19.67  23.36   17.80  20.40   13.63  15.70    0.43   0.39   28.43  35.13

Table 8: SAMPLE SIZES

Sample Period   Initial Number of Households   Remaining Number of Households
68–69           6,798                          1,129
68–71           6,871                            939
68–74           6,962                            697
68–77           7,529                            489
68–80           8,439                            346

Table 9: p-VALUES OF T~_1,n AND T~_inf,n (IN PERCENT)

Panels              1968–1980      1968–1977      1968–1974      1968–1971      1968–1969
Year \ Statistic    T~_1   T~_inf  T~_1   T~_inf  T~_1   T~_inf  T~_1   T~_inf  T~_1   T~_inf
1969               32.64  38.55    9.35  36.31   84.4   57.6    13.5    4.4    77.7   55.1
1970               36.73  21.19   33.43  53.81   13.9    5.6    37.8   30.9
1971               42.76  67.24   42.61  28.88   60.2   32.6    45.9    6.4
1972               11.58  15.27   65.91  35.15   16.3   16.1
1973               34.64  10.66   41.30  21.68   39.9   30.3
1974               75.09  30.05   66.41  73.32   98.8   89.4
1975               53.07  47.80   38.88  24.50
1976               72.17  22.04   65.03  19.17
1977               27.80  15.57   36.69  31.42
1978               27.39  25.96
1979               34.46  31.97
1980                8.83   4.88
Hochberg's Bounds^a    75.0           73.3           68.1           26.7           77.7
Hochberg's Bounds^b    86.4           88.2           51.6           63.8           88.7

a: These are obtained from the T~_1,n and T~_inf,n statistics.
b: These are obtained from the T~^p_1,n and T~^p_inf,n statistics for p = 0.3, 0.4, 0.5, 0.6, and 0.7.
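The bounds in the last two rows of Table 9 combine the per-year p-values within each panel using Hochberg's (1988) step-up procedure, which sharpens the Bonferroni correction. A minimal sketch of the adjustment (a generic implementation, not the authors' code; the three-element example p-values are hypothetical):

```python
import numpy as np

def hochberg_adjusted(pvals):
    """Hochberg (1988) step-up adjusted p-values (sharper than Bonferroni).

    The j-th largest p-value is multiplied by j, and a running minimum
    enforces monotonicity; values are capped at 1.
    """
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)[::-1]        # indices, largest p-value first
    adj = np.empty_like(p)
    running = 1.0
    for j, idx in enumerate(order):    # j = 0 for the largest p-value
        running = min(running, (j + 1) * p[idx])
        adj[idx] = running
    return adj

# Hypothetical p-values for illustration:
print(hochberg_adjusted([0.04, 0.30, 0.01]))

# 1968-1969 panel of Table 9: per-year p-values 77.7% and 55.1%. The
# smallest adjusted p-value, min(1*0.777, 2*0.551) = 0.777, matches the
# 77.7% bound reported for that panel.
bound = hochberg_adjusted([0.777, 0.551]).min()
print(round(100 * bound, 1))
```

The same computation applied to the 1968–1980 panel reproduces its 75.0 bound (the largest p-value, 75.09, times one, is the binding term), which is consistent with the bounds having been computed from the unrounded p-values.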