Directed Tests of No Cross-Sectional Correlation in Large-N Panel Data Models∗

Matei Demetrescu†
Christian-Albrechts-University of Kiel

Ulrich Homm
University of Bonn

Revised version: April 19, 2015

Abstract

Cross-sectional dependence in panel data can significantly affect the inference about slope parameters. Existing procedures, however, test for cross-unit dependence per se. Based on the principle of information matrix equality tests of White (1982, Econometrica 50, 1-25), we propose directed tests for no cross-sectional dependence. The new tests essentially check whether there is cross-unit dependence that matters for the variance of OLS slope parameter estimators and thereby for standard inferential procedures. The tests rely on suitably weighted sample residual cross-covariances or cross-correlations, and are in this sense a generalization of Pesaran's (2004, CESifo Working Paper 1229) test for cross-sectional error dependence. We derive the joint N, T asymptotics of the proposed tests, which, under the null, follow asymptotic chi-squared distributions without restrictive sphericity or distributional assumptions. Moreover, the relative rates at which N and T grow to infinity are only mildly restricted. We use Monte Carlo simulation to gauge the finite-sample power of the directed tests and compare it to extant procedures. The performance of the proposed tests in terms of size and power is good, even when the number of cross-sections is decisively larger than the number of time observations. When using the outcome of the directed tests to decide whether or not to use panel-robust standard errors for inference on slope parameters, the slope parameter tests are not affected. They are, however, affected when alternative cross-dependence tests are used for this decision: the use of generic error cross-correlation tests may induce serious size distortions in slope parameter tests and lead to wrong conclusions in applied work.

Key words: Cross-unit correlation; Information matrix equality; Joint asymptotics; Pre-test
JEL classification: C12 (Hypothesis Testing), C23 (Models with Panel Data)

∗ The authors would like to thank the editor (M. Hashem Pesaran), three anonymous referees, Jörg Breitung, Kai Carstensen and Jean-Marie Dufour for very helpful comments and suggestions, as well as Benjamin Hillmann for computational research assistance.
† Corresponding Author: Institute for Statistics and Econometrics, Christian-Albrechts-University of Kiel, Olshausenstr. 40-60, D-24118 Kiel, Germany. E-mail address: [email protected]

1 Introduction

Cross-sectional dependence in panel data can arise for various reasons, such as global shocks unaccounted for—be they economic, political or technological in nature—or unobserved shocks affecting only a subset of cross-sectional units. Such dependence can have dramatic effects on the asymptotic and finite-sample properties of the least squares estimator and standard inferential procedures in panel data models; see e.g. Andrews (2005) and also the more recent survey of Chudik and Pesaran (2013). One way to deal with potential cross-unit error dependence is to use so-called panel-robust covariance matrix estimation techniques, for instance as proposed by Arellano (1987), Beck and Katz (1995) or Driscoll and Kraay (1998). Another way is to employ a preliminary test for cross-sectional dependence and adjust the estimation and inference procedures according to the outcome of the pretest.
The relevant issue for practitioners is then the reliability of the procedure as a whole, i.e. of the pretesting step and the inference on the slope parameters taken together.

To test for error cross-unit dependence, Breusch and Pagan (1980) proposed a Lagrange multiplier [LM] test. The test, relying on the squared pairwise correlations between the residuals of each unit, is appropriate for panels with large time dimension T and small cross-sectional dimension N, and is therefore more often employed in the seemingly unrelated regressions framework of Zellner (1962). When N is large relative to T, however, it is well known that the LM test is severely oversized; see for instance Pesaran (2004) and Pesaran, Ullah, and Yamagata (2008). While the LM test essentially relies on (squared) Pearson correlation coefficients, Frees (1995) suggests the use of Spearman rank correlation coefficients. Frees' statistic also tends to exhibit size distortions when N is large relative to T, even if the distortions are less pronounced than in the case of the LM test.

One way to amend the size distortions of the initial LM test is that of Pesaran et al. (2008). They compute the expected values and variances of the squared correlation coefficients under the assumption of normally distributed disturbances and adjust the LM test accordingly. The resulting adjusted LM test is asymptotically normal as T → ∞ followed by N → ∞. The empirical size of the test is controlled for moderate-to-large N and small T.¹ The normality assumption does not appear to be restrictive, as is documented by simulation experiments with χ²- and t-distributed errors; see e.g. Pesaran et al. (2008) and Baltagi et al. (2011). Another correction has recently been suggested by Baltagi et al. (2012), who compute the bias of the LM statistic based on fixed-effects pooled OLS residuals for N, T → ∞ and N/T → c, which leads to a bias-corrected version of the LM test.

1 Non-negligible size distortion can occur, however, when N is very large relative to T (cf. Baltagi et al., 2011).

Pesaran (2004, 2015) puts forward an alternative approach; the corresponding test employs residual Pearson correlation coefficients without squaring them, on the grounds that the null of an average zero correlation is often more relevant for practitioners, say in portfolio management. The test has correct empirical size in finite samples even when N is much larger than T. But it has, by construction, low power when error correlation is present and the correlation coefficients roughly sum up to zero. This can be the case when the disturbances are generated from a factor model where the loadings average to zero. Pesaran (2015) analyzes the implicit null hypothesis of Pesaran's (2004) test, which is given by weak dependence rather than by independence of errors between cross-sections. Finally, following John (1972), Baltagi et al. (2011) discuss a sphericity test which can be used to draw inferences about cross-correlation when the data are cross-sectionally homoskedastic.

The common feature of these three approaches is that one ultimately tests for the existence of cross-unit correlation per se. Here, we derive tests for the null of no cross-correlation based on the information matrix equality test principle of White (1982). We may interpret our procedure as being directed against cross-sectional correlation that impacts the estimate of the covariance matrix of the slope parameter estimators.
From this perspective, our tests check whether the simple OLS variance matrix estimate significantly differs from a cross-correlation robust estimate. As a consequence, our procedure "leverages" the residual cross-covariances with sample cross-section covariances of the explanatory variables. This, of course, restricts the set of alternatives against which the test has power; but it is precisely these detected alternatives that are relevant from the standpoint of covariance matrix estimation for slope parameter estimators and thus for standard inferential procedures.

Concretely, we discuss in Section 2 several versions of the test, which differ in how cross-sectional variance heterogeneity or heteroskedasticity is taken into account. They are all robust to cross-sectional heteroskedasticity under the null, and differences arise in the presence of cross-sectional heteroskedasticity only under the alternative. Moreover, they are equivalent under homoskedasticity. The baseline variant uses the residual cross-covariances directly, whereas the second weights them using the unit-specific residual variances. The third variant relies on cross-correlations rather than cross-covariances; when focussing on an intercept only, it turns out to be essentially Pesaran's (2004) test. While our procedures inherit the good power properties of Pesaran's (2004) test when the correlation coefficients all have the same sign, we expect power gains in other directions of the space of alternative hypotheses and provide Monte Carlo evidence in favour of this conjecture.

Furthermore, we establish the asymptotic distribution for N and T going to infinity jointly; a χ² limiting distribution of the test is obtained without relying on a specific distribution for the disturbances. For the test based on residual cross-covariances, the relative rates of N and T are not explicitly restricted to a certain path. For the other two variants, not restricting the shape of the distribution of the disturbances comes at the price of having to control the growth rate of the cross-sectional dimension N relative to T more strictly. Our Monte Carlo simulations in Section 3 show that the directed tests have correct empirical size for as few as 10 time periods (20 for the correlations-based test), even for much larger N. The Monte Carlo analysis furthermore illustrates the severe shortcomings of several alternative procedures for testing for no cross-correlation when used as a pretest to decide between ordinary and panel-robust standard errors. In contrast, our directed test procedures work reliably.

Let us introduce some notation before proceeding. We denote vectors by boldface symbols. Let $\|\cdot\|$ denote the Euclidean vector norm and the corresponding induced matrix norm. Further, $\|\cdot\|_r$ stands for both the $L_r$ vector norm, $\left(\sum|\cdot|^r\right)^{1/r}$, and for the $L_r$ norm of a random variable or vector, $\left(\mathrm{E}\,\|\cdot\|^r\right)^{1/r}$. The Kronecker product of two matrices is denoted by ⊗, and diag(d_i) denotes the diagonal matrix having d_i, i = 1, ..., N, as diagonal elements. Finally, C is a generic constant whose value may differ from occurrence to occurrence.

2 Testing for no cross-correlation

2.1 Information matrix equality testing

To motivate our tests, let us first analyze the homogenous panel data model
$$ y_{i,t} = \alpha + \boldsymbol{x}_{i,t}'\boldsymbol{\beta} + u_{i,t}, \qquad i = 1,\dots,N, \quad t = 1,\dots,T, \tag{1} $$
where the zero-mean disturbances $u_{i,t}$ are assumed to be independent of the regressors $\boldsymbol{x}_{i,t}$. We work with a total number of K regressors including the intercept, so $\boldsymbol{x}_{i,t} \in \mathbb{R}^{K-1}$.
We shall relax the assumptions on α and β later on. For deriving the likelihood function, we assume normality and independence of the disturbances, $u_{i,t} \sim \mathrm{iid}\,N(0,\sigma^2)$. Normality is only required to justify the test statistics; we shall establish their limiting behavior under considerably weaker conditions, including e.g. cross-sectional heteroskedasticity; see Subsection 2.3.

Let now $\boldsymbol{y}_i$ denote the T × 1 vector containing the observations for cross-section i, $X_i$ be the T × K regressor matrix $\{x_{i,t,k}\}$ with $x_{i,t,1}=1$, and X the matrix stacking the N individual regressor matrices $X_i$. The vectors $\boldsymbol{u}_i$ are the T × 1 individual-unit disturbance vectors, and the NT-dimensional vector $\boldsymbol{u}$ contains the stacked N individual-unit disturbances $\boldsymbol{u}_i$. Correspondingly, $\boldsymbol{u}_t = (u_{1,t},\dots,u_{N,t})'$ denotes the cross-section of errors at time t. Finally, the contemporaneous covariance matrix of the errors is given by $\mathrm{E}(\boldsymbol{u}_t\boldsymbol{u}_t') = \Sigma = \{\sigma_{ij}\}_{i,j=1,\dots,N}$. Under the null, $\Sigma = \Sigma_0 = \sigma^2 I_N$, whereas under the alternative we use the parameterization $\Sigma = \sigma^2\Omega$ with Ω positive definite and normalized such that $\operatorname{tr}\Omega = N$. The null is recovered when $\Omega = I_N$.

Then, conditional on the regressors, the log-likelihood of model (1) is given under the null of no cross-sectional correlation by
$$ \ell = C - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}\bigl(y_{i,t} - \alpha - \boldsymbol{x}_{i,t}'\boldsymbol{\beta}\bigr)^2, $$
from which the score and the Hessian can be derived,
$$ \boldsymbol{s} = \frac{1}{\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}\begin{pmatrix} u_{i,t} \\ u_{i,t}\,\boldsymbol{x}_{i,t} \end{pmatrix} = \frac{1}{\sigma^2}X'\boldsymbol{u}, \qquad H = -\frac{1}{\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}\begin{pmatrix} 1 & \boldsymbol{x}_{i,t}' \\ \boldsymbol{x}_{i,t} & \boldsymbol{x}_{i,t}\boldsymbol{x}_{i,t}' \end{pmatrix} = -\frac{1}{\sigma^2}X'X, $$
when treating for simplicity σ² as known. When not restricting Σ, the covariance matrix of the score is given by
$$ \operatorname{Cov}(\boldsymbol{s}\,|\,X) = \frac{1}{\sigma^2}X'(\Omega\otimes I_T)X = \frac{1}{\sigma^4}X'(\Sigma\otimes I_T)X; $$
under the null of no cross-sectional correlation and homoskedasticity, the information matrix equality holds,
$$ \frac{1}{\sigma^4}X'(\Sigma\otimes I_T)X = \frac{1}{\sigma^2}X'X, $$
with $\Sigma = \Sigma_0 = \sigma^2 I_N$ being correctly specified. Under the alternative, the equality does not hold, and a test statistic for sphericity is immediately obtained by plugging in an unrestricted estimate of Σ, say $\hat\Sigma$, and a restricted estimate of σ², say $\hat\sigma^2$. One rejects the null when the equality is significantly violated, i.e. when
$$ X'\bigl(\hat\Sigma\otimes I_T\bigr)X - \hat\sigma^2 X'X = \sum_{i=1}^{N}\sum_{j=1}^{N}\bigl(\hat\sigma_{ij} - \hat\sigma^2\,\mathbb{1}(i=j)\bigr)X_i'X_j, \tag{2} $$
with $\mathbb{1}(\cdot)$ the indicator function, is significantly different from zero. In this respect we obtain an alternative to the John test recently discussed by Baltagi et al. (2011).

Unlike the case of the John test, however, it is straightforward to test against cross-sectional correlation only. We simply need to check whether the terms not involving $\sigma_{ii}$ of the difference in (2) are zero or not, i.e. we focus on whether
$$ \sum_{\substack{i,j=1\\ i\neq j}}^{N}\hat\sigma_{ij}\,X_i'X_j = \sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat\sigma_{ij}\bigl(X_i'X_j + X_j'X_i\bigr) $$
is significantly different from zero. In fact, we only need to look at the lower triangular elements of the matrix on the r.h.s. due to its symmetry, so our directed tests rely on
$$ \boldsymbol{S} = \sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat\sigma_{ij}\,\operatorname{vech}\bigl(X_i'X_j + X_j'X_i\bigr), \tag{3} $$
where $\hat\sigma_{ij}$ are suitable estimators of $\sigma_{ij}$ and vech is the half-vec operator.

Having derived the statistic $\boldsymbol{S}$ on the basis of a homogenous panel data model, one possibility to compute error covariance estimates $\hat\sigma_{ij}$ is via pooled OLS residuals. Indeed, a number of cross-sectional dependence tests rely on pooled or pooled fixed-effects residuals (e.g. Baltagi et al., 2011, 2012). Other procedures, such as the ones of Pesaran (2004) or Pesaran et al. (2008), rely however on unit-wise residuals.
We too shall resort to unit-wise residuals in the following; the main reasons for doing so are discussed in more detail in Subsection 2.2, which also addresses the relation of the directed test to the literature. We therefore let
$$ \hat\sigma_{ij} = \frac{1}{T-K}\hat{\boldsymbol{u}}_i'\hat{\boldsymbol{u}}_j = \frac{1}{T-K}\boldsymbol{y}_i'M_{X_i}M_{X_j}\boldsymbol{y}_j $$
with $M_{X_i} = I_T - X_i(X_i'X_i)^{-1}X_i'$.²

2 Alternatively, one could use the (Q)ML estimator which is not corrected for degrees of freedom; the correction improves however the finite-sample properties of the test statistic while not affecting the asymptotics.

It may be interesting to note that, analogously, a cross-sectional heteroskedasticity test can be based on $\sum_{i=1}^{N}(\hat\sigma_{ii} - \hat\sigma^2)\operatorname{vech}(X_i'X_i)$; this is essentially a White heteroskedasticity test directed against cross-sectional heteroskedasticity.

In order to decide on significance, the statistic $\boldsymbol{S}$ has yet to be normalized. Lemma 1 in the Appendix indicates that
$$ \operatorname{Cov}(\boldsymbol{S}) = \frac{1}{(T-K)^2}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\sigma_{ii}\sigma_{jj}\,\mathrm{E}\Bigl[\operatorname{tr}\bigl(M_{X_i}M_{X_j}\bigr)\operatorname{vech}\bigl(X_i'X_j + X_j'X_i\bigr)\operatorname{vech}\bigl(X_i'X_j + X_j'X_i\bigr)'\Bigr], $$
provided that the disturbances are independent of the regressors and that the moments exist. (Precise assumptions about the components of our model are provided and discussed in the following subsection.)

A natural estimator for the covariance matrix of $\boldsymbol{S}$ is then given by
$$ \hat V = \frac{1}{(T-K)^2}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat\sigma_{ii}\hat\sigma_{jj}\operatorname{tr}\bigl(M_{X_i}M_{X_j}\bigr)\operatorname{vech}\bigl(X_i'X_j + X_j'X_i\bigr)\operatorname{vech}\bigl(X_i'X_j + X_j'X_i\bigr)', $$
where, as above, $\hat\sigma_{ij}$ is a consistent estimator of $\sigma_{ij}$ based on $\hat{\boldsymbol{u}}_i$. Standardizing $\boldsymbol{S}$ with its estimated covariance matrix leads to the following statistic for testing the null of no cross-sectional correlation:
$$ CD_X^{\sigma} = \boldsymbol{S}'\hat V^{-1}\boldsymbol{S}. $$
The following subsection will analyze its relation to the test of Pesaran (2004) and argue that it delivers a directed test of cross-section independence: should $\boldsymbol{S}$ be significantly different from zero, one rejects the null of no cross-sectional correlation in favor of cross-correlation that affects inference on the slope parameters β.

The statistic $CD_X^{\sigma}$ was derived under the assumption of cross-unit homoskedasticity. While we prove in Section 2.3 that the statistic is robust to cross-unit heteroskedasticity, it may be useful to examine a variant which explicitly accounts for heteroskedasticity to begin with. Under the null hypothesis of no cross-unit correlation, we have $\mathrm{E}(\boldsymbol{u}_t\boldsymbol{u}_t') = \Sigma_0 = \operatorname{diag}(\sigma_{ii})$, so the score becomes
$$ \boldsymbol{s} = X'\bigl(\Sigma_0^{-1}\otimes I_T\bigr)\boldsymbol{u} $$
and the Hessian is given by
$$ H = -X'\bigl(\Sigma_0^{-1}\otimes I_T\bigr)X. $$
At the same time, the covariance matrix of the score is given under the alternative by
$$ \operatorname{Cov}(\boldsymbol{s}\,|\,X) = X'\bigl(\Sigma_0^{-1}\otimes I_T\bigr)\bigl(\Sigma\otimes I_T\bigr)\bigl(\Sigma_0^{-1}\otimes I_T\bigr)X. $$
By having allowed for cross-unit heteroskedasticity, the diagonals of $\operatorname{Cov}(\boldsymbol{s}\,|\,X)$ and $-H$ are equal, so, by focussing again on the lower triangular elements and plugging in the corresponding estimates, we obtain
$$ \boldsymbol{S}^w = \sum_{i=2}^{N}\sum_{j=1}^{i-1}\frac{\hat\sigma_{ij}}{\hat\sigma_{ii}\hat\sigma_{jj}}\operatorname{vech}\bigl(X_i'X_j + X_j'X_i\bigr) \tag{4} $$
as basis for a no cross-correlation test. Analogously, its variance can be estimated by
$$ \hat V^w = \frac{1}{(T-K)^2}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\frac{1}{\hat\sigma_{ii}\hat\sigma_{jj}}\operatorname{tr}\bigl(M_{X_i}M_{X_j}\bigr)\operatorname{vech}\bigl(X_i'X_j + X_j'X_i\bigr)\operatorname{vech}\bigl(X_i'X_j + X_j'X_i\bigr)', $$
leading to
$$ CD_X^{w} = (\boldsymbol{S}^w)'\bigl(\hat V^w\bigr)^{-1}\boldsymbol{S}^w. $$
This statistic can be seen, with a mild abuse of terminology, as a "WLS variant" of $CD_X^{\sigma}$, since
$$ \boldsymbol{S}^w = \sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat\rho_{ij}\operatorname{vech}\!\left(\frac{1}{\sqrt{\hat\sigma_{ii}}}X_i'\,\frac{1}{\sqrt{\hat\sigma_{jj}}}X_j + \frac{1}{\sqrt{\hat\sigma_{jj}}}X_j'\,\frac{1}{\sqrt{\hat\sigma_{ii}}}X_i\right) $$
is, up to a negligible³ term, the statistic (3) computed in the WLS-transformed model
$$ \frac{y_{i,t}}{\sqrt{\sigma_{ii}}} = \alpha\,\frac{1}{\sqrt{\sigma_{ii}}} + \frac{\boldsymbol{x}_{i,t}'}{\sqrt{\sigma_{ii}}}\boldsymbol{\beta} + \frac{u_{i,t}}{\sqrt{\sigma_{ii}}}. $$

3 Under the conditions of Proposition 2 further below.
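To fix ideas, the following numpy/scipy sketch (our own illustration, not the authors' code; the helper names `vech` and `cd_x_sigma` and the toy data-generating step are hypothetical) assembles S, V̂ and CD_X^σ from unit-wise OLS residuals following the definitions above.

```python
# Illustrative sketch: computing S (eq. 3), V-hat and CD_X^sigma from unit-wise residuals.
import numpy as np
from scipy.stats import chi2

def vech(A):
    """Half-vectorization: stack the lower-triangular part (incl. diagonal) of A."""
    return A[np.tril_indices(A.shape[0])]

def cd_x_sigma(y, X):
    """y: (N, T) dependent variables; X: (N, T, K) regressors with a constant column.
    Returns the CD_X^sigma statistic and its chi-squared p-value."""
    N, T, K = X.shape
    resid = np.empty((N, T))
    M = np.empty((N, T, T))                       # annihilator matrices M_{X_i}
    for i in range(N):
        Xi = X[i]
        M[i] = np.eye(T) - Xi @ np.linalg.solve(Xi.T @ Xi, Xi.T)
        resid[i] = M[i] @ y[i]                    # unit-wise OLS residuals
    d = K * (K + 1) // 2
    S = np.zeros(d)
    V = np.zeros((d, d))
    for i in range(1, N):
        for j in range(i):
            sig_ij = resid[i] @ resid[j] / (T - K)
            sig_ii = resid[i] @ resid[i] / (T - K)
            sig_jj = resid[j] @ resid[j] / (T - K)
            cross = X[i].T @ X[j]
            v = vech(cross + cross.T)
            S += sig_ij * v
            V += sig_ii * sig_jj * np.trace(M[i] @ M[j]) * np.outer(v, v) / (T - K) ** 2
    stat = S @ np.linalg.solve(V, S)
    return stat, chi2.sf(stat, df=d)

# Toy data under the null (no cross-sectional error correlation), purely for illustration.
rng = np.random.default_rng(0)
N, T, K = 20, 50, 2
X = np.concatenate([np.ones((N, T, 1)), rng.standard_normal((N, T, K - 1))], axis=2)
y = 1.0 + X[:, :, 1] + rng.standard_normal((N, T))
print(cd_x_sigma(y, X))
```

The WLS variant would only change the weights in the double loop (dividing by the unit-specific residual variances), which is why the two statistics coincide under homoskedasticity.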
σii σii σii σii We compare the variants theoretically and in finite samples in Sections 2.3 and 3, but not before introducing a correlations-based version with quite a nice interpretation. 2.2 Discussion To put the new tests in relation with existing ones, let us now consider a third variant of the directed tests obtained by replacing the estimated covariances in S, cf. Equation (3), by estimated correlation coefficients: R= N X i−1 X ρ̂ij vech Xi0 Xj + Xj0 Xi , where ρ̂ij = i=2 j=1 û0i ûj 1 1 (û0i ûi ) 2 (û0j ûj ) 2 . (5) In practice, Cov(R) can be approximated using V̂R = N i−1 0 1 XX vech Xi0 Xj + Xj0 Xi vech Xi0 Xj + Xj0 Xi , T i=2 j=1 since E(ρ̂2 ) ≈ T −1 for large T and iid sampling. The resulting statistic is denoted as ρ CDX = R0 V̂R−1 R. 3 Under the conditions of Proposition 2 further below. 8 ρ σ w Under homoskedasticity, the three variants CDX , CDX and CDX can be checked to be asymptotically equivalent; see the proof of Proposition 2 in the appendix. Under crosssectional heteroskedasticity, however, we document power gains of the WLS version in Section 3 discussing the finite-sample properties of our tests. Note that, if the model only includes an intercept, the correlation-based form of our test statistic reduces to 2T ρ CDX = N (N − 1) N X i−1 X !2 ρ̂ij i=2 j=1 which is the square of the test statistic proposed by Pesaran (2004). The directed tests may thus be seen as an extension of Pesaran’s test idea. Then, to understand what the additional terms in our statistic stand for, let us now examine the standard fixed effect panel data model yi,t = αi + x0i,t β + ui,t , i = 1, . . . , N. (6) By letting ȳ i and X̄i denote the variables after within-group demeaning, we obtain by stacking equations ȳ = X̄β + ū, with ȳ = (ȳ 01 , . . . , ȳ 0N )0 and otherwise obvious notation. Under serial independence of the disturbances (we make this assumption explicit in the following subsection), the covariance matrix of the OLS estimator of β is given by −1 −1 0 . X̄ (Σ ⊗ IT ) X̄ X̄ 0 X̄ Cov β̂|X = X̄ 0 X̄ (Under serial error dependence, clustered standard errors would have been the suitable choice; see e.g. Driscoll and Kraay, 1998.) Under the null hypothesis of no crosssectional correlation we have that σij = 0 for all i 6= j, and Σ = Σ0 = diag σij . Comparing the covariance matrix of the fixed-effects OLS estimator with the one obtained assuming no cross-unit correlation, we should have equality under the null, X̄ 0 (Σ ⊗ IT )X̄ = X̄ 0 (Σ0 ⊗ IT )X̄. (7) Checking whether the equality is not significantly violated in the sample (i.e. comparing the usual heteroskedasticity-robust with the heteroskedasticity and cross-correlation robust, or panel-robust, covariance matrix estimator) takes us to a test directed at detecting cross-unit correlation affecting inference on the slope parameters in the fixed-effects σ framework. We thus obtain essentially the same test statistic as before, say CDX̄ , de9 rived from the information matrix equality principle with intercepts concentrated out. Moreover, this fixed-effects variant is nothing else than the complement of Pesaran’s 2004 CD test: the CD test focusses on cross-correlations “leveraged” by intercepts only, ρ while CDX̄ leverages with cross-products of mean-adjusted regressors – which are by construction orthogonal to the intercepts. Also, since S essentially consists of weighted pairwise error cross-unit covariances, the implicit null of our test is closely related to that of the CD test as discussed by Pesaran (2015). 
Actually, an adaptation of the arguments of Pesaran (2015) to the setup of Proposition 1 would show that the CD test and the directed tests have the same implicit null, i.e. an exponent of cross-sectional dependence $0 \le \alpha < (2-\epsilon)/4$ when $T = O(N^{\epsilon})$.⁴ In brief, the CD_X tests include the initial CD test, and the CD_X leverages (the expectations of the products of regressors from different units) are either zero or constant, so there is no qualitative difference in the kind of hypotheses that can be detected by CD and CD_X. The simulations in Section 3 show the behavior of the CD tests to be the same under some data generating processes exhibiting weak cross-dependence.

4 The implicit null is different when T ∼ CN, but the correlation-based versions require some restrictions on the relative rates at which N and T go to infinity which exclude proportionality.

To conclude the discussion on the construction and interpretation of the new tests, let us examine in more detail the issue of which residuals to use for computing the error covariance estimates $\hat\sigma_{ij}$. On the one hand, the interpretation as directed tests is based on a homogenous or at most a fixed-effects panel data model. On the other hand, the directed tests only fit the framework of Pesaran (2004) when using unit-wise residuals. Unit-wise residuals have the additional advantage of being consistent in panels with coefficient heterogeneity, whereas the pooled (fixed-effects) OLS residuals would contain a component due to heterogeneity. Under regressor cross-unit dependence, this component is cross-sectionally correlated, so, unless one is specifically interested in detecting such types of cross-dependence, heterogeneity turns out to be a nuisance when detecting error cross-correlation. To sum up, using residuals from individual regressions has the advantage of robustifying the cross-correlation testing procedure against parameter heterogeneity; the pooled residuals may be used too, should parameter heterogeneity not be of concern.⁵

5 The results provided in the following subsection build on unit-wise residuals. The proof of Proposition 1 first shows that the estimation effect is negligible, and then establishes the limiting distribution of the statistic building on the true disturbances. With pooled residuals converging at a higher rate under the null of no cross-correlation, Proposition 1 arguably holds for pooled residuals as well.

The following subsection derives the asymptotic limiting distribution of the three variants of our test and discusses the assumptions under which we work.

2.3 Limiting behavior

Let us now state the conditions for the panel data generating process. The assumptions on the disturbances are standard in the panel data literature.

Assumption 1. The disturbances $u_{i,t}$ are independent of the regressors $x_{i',t',k}$ for all $1 \le i, i' \le N$, $1 \le t, t' \le T$ and $1 \le k \le K$, and satisfy

1. $u_{i,t} = \varepsilon_{i,t}\sqrt{\sigma_{ii}}$ with $\lim_{N\to\infty}\inf_{1\le i\le N}\sigma_{ii} > 0$ and $\lim_{N\to\infty}\sup_{1\le i\le N}\sigma_{ii} < \infty$;

2. $\varepsilon_{i,t} \sim \mathrm{iid}(0,1)$ over both i and t with $\mathrm{E}\bigl(\varepsilon_{i,t}^{8}\bigr) < \infty$.

The first condition allows for cross-unit heteroskedasticity and excludes the possibility that some units dominate in the limit. The second is e.g. implied by the normality of the disturbances assumed by Pesaran et al. (2008) or Baltagi et al. (2012), but does not restrict the distribution of the disturbances beyond typical moment restrictions.

In what concerns the regressors, we essentially require mild forms of uniformity of their properties as N → ∞.
Without such uniformity or analogous conditions, there may not be a limit as N, T → ∞ jointly. Before stating the assumption, we introduce some auxiliary notation that helps dealing with the fact that some of the cross-product moments of the regressors from different units may converge to zero, while others may converge to a non-zero constant. In other words, different regressors from different units may, but do not have to, be uncorrelated. Let
$$ \boldsymbol{v}_T^{ij} = \frac{1}{\sqrt{T}}\,D_T^{-1}\operatorname{vech}\bigl(X_i'X_j + X_j'X_i\bigr), $$
where $D_T$ is a $\tfrac{1}{2}K(K+1) \times \tfrac{1}{2}K(K+1)$ diagonal matrix whose diagonal elements are either 1, when $\mathrm{E}\bigl(\sum_{t=1}^{T}x_{i,t,k}x_{j,t,k^*} + \sum_{t=1}^{T}x_{i,t,k^*}x_{j,t,k}\bigr) = 0$, or $\sqrt{T}$, when the respective expectation is nonzero. We implicitly assume that the regressors only exhibit short-range dependence. Long-range dependence can easily be dealt with using a different standardization, but we omit the details here; integration or cointegration of the regressors is however excluded, since it would lead to stochastic limiting behavior of the sample cross-product averages. Let us furthermore agree on the notation that the mth element of $\boldsymbol{v}_T^{ij}$, $m = 1,\dots,\tfrac{1}{2}K(K+1)$, is given by
$$ v_{T,m}^{ij} = \frac{1}{T}\left(\sum_{t=1}^{T}x_{i,t,k}x_{j,t,k^*} + \sum_{t=1}^{T}x_{i,t,k^*}x_{j,t,k}\right) \quad\text{or}\quad v_{T,m}^{ij} = \frac{1}{\sqrt{T}}\left(\sum_{t=1}^{T}x_{i,t,k}x_{j,t,k^*} + \sum_{t=1}^{T}x_{i,t,k^*}x_{j,t,k}\right) $$
for suitable k and k*.

Assumption 2. The regressors $x_{i,t,k}$, $k = 2,\dots,K$, are stochastic and satisfy

1. $\Pr\Bigl(\lim_{N,T\to\infty}\sup_{1\le i\le N}\bigl\|\bigl(\tfrac{X_i'X_i}{T}\bigr)^{-1}\bigr\| < \infty\Bigr) = 1$;

2. $\lim_{N,T\to\infty}\sup_{1\le i\le N;\,1\le t\le T}\mathrm{E}\bigl(x_{i,t,k}^{6}\bigr) < \infty$ for all $k = 1,\dots,K$;

3. $\lim_{N,T\to\infty}\sup_{1\le i,j\le N;\,1\le t\le T}\mathrm{E}\bigl|v_{T,m}^{ij}\bigr|^{4} < \infty$ for all $m = 1,\dots,\tfrac{1}{2}K(K+1)$;

where $\|\cdot\|$ is the matrix norm induced by the Euclidean vector norm and $v_{T,m}^{ij}$ the standardized cross-product of the regressors as defined above. Furthermore, the space spanned by the vectors $\mathrm{E}\bigl(\boldsymbol{v}_T^{ij}\bigr)$ has dimension $\tfrac{1}{2}K(K+1)$.

The first condition stated in the assumption is standard for stochastic regressors and just formalizes the requirement that the moment matrix of the regressors is invertible in each unit of the panel. Depending on the distributional properties of the regressors, it may imply some restrictions on the relative rates at which N and T are allowed to diverge, but the restrictions are not obvious. Considering e.g. independent units, the probability of observing regressor moment matrices in the neighbourhood of singularity is the key quantity: the faster it vanishes in T, the faster N can grow. At the other end of the scale, should the regressors be common to all units (i.e. extreme regressor cross-dependence), it suffices that the condition be fulfilled in T, a case which is well understood from standard regression analysis, and there is no relative rate restriction. The second and third conditions are typical moment restrictions: the second is similar to the moment condition on the disturbances, while the third focuses on the cross-product sample moments of the regressors in a given unit and requires a specific form of uniformity of their convergence. For instance, a factor model such as $\boldsymbol{x}_{i,t} = \Lambda_i\boldsymbol{f}_t + \boldsymbol{e}_{i,t}$, with $\Lambda_i$ a (K−1) × L matrix of loadings and independence of the factors from the idiosyncratic errors, generates them under suitable moment conditions on the components $\boldsymbol{f}_t$ and $\boldsymbol{e}_{i,t}$ and uniformity conditions on the loadings $\Lambda_i$, as can easily be checked. Finally, no independence across the panel is required and the regressors may be allowed to be common as long as the dimensionality condition is fulfilled.
The condition ensures that the covariance matrix $\hat V$ is well-behaved; see Remark 1 for how to deal with the situation where the condition is violated. The requirement that the regressors be stochastic for k = 2, ..., K simplifies proofs and notation, but is not essential. In fact, one can treat deterministic regressors as stochastic ones, provided that the sequence of regressor values behaves, in terms of distributions, like realizations of a stochastic regressor obeying the assumption; see e.g. Amemiya (1985, Chapter 4).

The two assumptions allow us to establish a χ² limiting distribution for the three variants of the proposed directed test.

Proposition 1. Under the above assumptions, we have as N, T → ∞ that
$$ CD_X^{\sigma} \xrightarrow{\;d\;} \chi^2_{\frac{1}{2}K(K+1)}. $$
Proof: see the Appendix.

Remark 1. The covariance matrix V is not always well-behaved, a leading example being the case where the regressors are common across units. There are two possibilities for dealing with a rank-deficient matrix V. The first would be to simply exclude some redundant regressor cross-products. The second relies on the work of Andrews (1987) and amounts to using a generalized inverse of an estimator of V, but one which ensures that the rank of $\hat V$ converges to the true rank of V: the limiting distribution then remains chi-squared, but with rk V degrees of freedom. We provide simulation evidence that the use of the Moore-Penrose inverse in the extreme case of common regressors (i.e. unity rank of V and $\hat V$), together with χ²₁ critical values, works reliably.

Remark 2. Assumption 1 requires strict exogeneity of the disturbance terms. It can be seen from the proof of the proposition that relaxing this to weak exogeneity, say, is difficult, since the limiting null distribution relies on uncorrelatedness of the disturbances and cross-moments of the regressors. This prevents the application of the directed tests in dynamic panels, for instance. It is possible though to eliminate the regressors that are not strictly exogenous from the vector of cross-moments; the CD test of Pesaran (2004), which has been proved to work in dynamic panels under certain circumstances, can be seen as such an "exogenized" statistic.

Remark 3. Halunga, Orme, and Yamagata (2012) bootstrap the LM test of Breusch and Pagan (1980) to obtain, besides an improved behavior for large N, robustness to heteroskedasticity in the time dimension as well. Along these lines, an examination of the proof of Proposition 1 reveals that an Eicker-White type covariance matrix estimator (Eicker, 1967; White, 1980) is given by
$$ \tilde V = \frac{1}{T}\sum_{t=1}^{T}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\sum_{k=2}^{N}\sum_{l=1}^{k-1}\hat u_{i,t}\hat u_{j,t}\hat u_{k,t}\hat u_{l,t}\operatorname{vech}\bigl(X_i'X_j + X_j'X_i\bigr)\operatorname{vech}\bigl(X_k'X_l + X_l'X_k\bigr)'. $$
Using $\tilde V$ instead of $\hat V$ thus robustifies $CD_X^{\sigma}$ against conditional as well as unconditional heteroskedasticity in the time dimension.

In what concerns the WLS and the correlation-based versions of our test, we need to seriously restrict the rate at which N may grow to infinity. The finite-sample experiments in Section 3 suggest that the restriction is only binding for very small T and large N. The literature sometimes assumes symmetry of the disturbances to relax such rate restrictions; we do not find the assumption plausible in general and rather recommend the use of $CD_X^{\sigma}$ when T is dangerously small.

Proposition 2. Under the additional assumption that N³/T → 0, we have as N, T → ∞ that
$$ CD_X^{\rho} = \boldsymbol{R}'\hat V_R^{-1}\boldsymbol{R} \xrightarrow{\;d\;} \chi^2_{\frac{1}{2}K(K+1)} \qquad\text{and}\qquad CD_X^{w} = (\boldsymbol{S}^w)'\bigl(\hat V^w\bigr)^{-1}\boldsymbol{S}^w \xrightarrow{\;d\;} \chi^2_{\frac{1}{2}K(K+1)}. $$
Proof: see the Appendix.
Remark 4. Results similar to Propositions 1 and 2 hold as well for the test variants building on X̄ rather than on X, yet with ½(K−1)K degrees of freedom.

3 Finite-sample behavior

In this section, the empirical size and power of the tests for cross-section independence is assessed by means of Monte Carlo experiments. We first present the competing procedures to keep the paper self-contained. The simulation scenarios are described in Section 3.2, and we discuss the results in Section 3.3.

3.1 Alternative test procedures

For completeness, we start with the LM test of Breusch and Pagan (1980), although, because of its known severe size distortions when N is large relative to T (cf. Pesaran et al., 2008, or Moscone and Tosetti, 2009), its use is not recommended for N comparable with, or larger than, T. The LM test of Breusch and Pagan (1980) builds on the statistic
$$ LM = \sqrt{\frac{1}{N(N-1)}}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\bigl(T\hat\rho_{ij}^2 - 1\bigr), $$
with $\hat\rho_{ij}$ defined as in (5). Under the null hypothesis and for fixed N and T → ∞, LM approaches a standard normal distribution. However, for small T, $(T\hat\rho_{ij}^2 - 1)$ is not centered at 0. For large cross-sectional dimension (in relation to the time dimension), this can lead to substantial overrejection.

Frees' (1995) rank correlation test is given by
$$ R^2_{AVE} = \frac{2}{N(N-1)}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat r_{ij}^2, $$
where $\hat r_{ij}$ is the Spearman rank correlation coefficient between the residuals $\hat{\boldsymbol{u}}_i$ and $\hat{\boldsymbol{u}}_j$. Frees (1995) derives the limit distribution of $Q_N = N\bigl(R^2_{AVE} - (T-1)^{-1}\bigr)$ for the case that the intercept is the only regressor in (6). The resulting limit, Q, is a weighted sum of two independent χ² random variables. Because of the dependence on T, critical values are cumbersome to compute. When T is not small, Frees (1995) suggests to use the following approximately normally distributed statistic
$$ FRE = \frac{Q_N}{\sqrt{\operatorname{Var}(Q)}}, \qquad\text{where}\qquad \operatorname{Var}(Q) = \frac{4(T-2)(25T^2 - 7T - 54)}{25T(T-1)^3(T+1)}. $$

The adjusted LM test: Pesaran et al. (2008) compute the exact finite-sample expectations and variances of $\hat\rho_{ij}^2$, imposing that, in addition to Pesaran's assumptions, the errors $\varepsilon_{i,t}$ are normally distributed. Their statistic is given by
$$ LM_{adj} = \sqrt{\frac{2}{N(N-1)}}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\frac{(T-K)\hat\rho_{ij}^2 - \mu_{Tij}}{\nu_{Tij}}, $$
where
$$ \mu_{Tij} = \frac{1}{T-K}\operatorname{tr}\mathrm{E}\bigl(M_{X_i}M_{X_j}\bigr), \qquad \nu_{Tij}^2 = \bigl[\operatorname{tr}\mathrm{E}\bigl(M_{X_i}M_{X_j}\bigr)\bigr]^2 a_{1T} + 2\operatorname{tr}\mathrm{E}\bigl[\bigl(M_{X_i}M_{X_j}\bigr)^2\bigr]a_{2T} $$
with
$$ a_{1T} = a_{2T} - \frac{1}{(T-K)^2} \qquad\text{and}\qquad a_{2T} = 3\left(\frac{(T-K-8)(T-K+2) + 24}{(T-K+2)(T-K-2)(T-K-4)}\right)^2. $$
As T → ∞ followed by N → ∞, $LM_{adj}$ converges to a standard normal distribution. The test controls size much better, also when N is large relative to T.

The bias-corrected LM test: Baltagi et al. (2012) analyze the asymptotics of the LM test under joint N, T asymptotics where N/T → c ∈ (0, ∞). When using the pooled fixed-effects residuals, $\tilde{\boldsymbol{u}} = M_{\bar X}\boldsymbol{u}$, they prove that the statistic
$$ LM_{bc} = \sqrt{\frac{1}{N(N-1)}}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\bigl(T\tilde\rho_{ij}^2 - 1\bigr) - \frac{N}{2(T-1)}, $$
with $\tilde\rho_{ij}$ the correlation of $\tilde{\boldsymbol{u}}_i$ and $\tilde{\boldsymbol{u}}_j$, has a limiting standard normal distribution.⁶ Like for the correction of Pesaran et al. (2008) or the John test below, normality is required. The size control is again essentially better than that of the initial LM statistic; see Baltagi et al. (2012).

6 This contrasts with the previous tests, where the residuals are obtained from separate cross-section regressions. One may conjecture that, applying techniques from Baltagi et al. (2011), the asymptotic results from Section 2.3 are still valid when fixed-effects residuals are employed, but we do not pursue the topic here.
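As a point of reference, the first two competing statistics above can be computed in a few lines. The following numpy/scipy sketch is our own illustration (the helper name `lm_and_fre` and the toy residuals are hypothetical); it assumes the unit-wise residuals come from regressions that include an intercept, so no further demeaning is applied.

```python
# Illustrative sketch: the scaled Breusch-Pagan LM statistic and Frees' FRE statistic.
import numpy as np
from scipy.stats import spearmanr

def lm_and_fre(resid):
    """resid: (N, T) array of unit-wise residuals. Returns (LM, FRE)."""
    N, T = resid.shape
    lm_sum, rave_sum = 0.0, 0.0
    for i in range(1, N):
        for j in range(i):
            rho = resid[i] @ resid[j] / np.sqrt((resid[i] @ resid[i]) * (resid[j] @ resid[j]))
            lm_sum += T * rho ** 2 - 1.0
            rave_sum += spearmanr(resid[i], resid[j])[0] ** 2   # squared rank correlation
    lm = np.sqrt(1.0 / (N * (N - 1))) * lm_sum
    rave2 = 2.0 / (N * (N - 1)) * rave_sum
    qn = N * (rave2 - 1.0 / (T - 1))
    var_q = 4 * (T - 2) * (25 * T ** 2 - 7 * T - 54) / (25 * T * (T - 1) ** 3 * (T + 1))
    return lm, qn / np.sqrt(var_q)

rng = np.random.default_rng(2)
print(lm_and_fre(rng.standard_normal((30, 25))))   # toy residuals under the null
```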
Pesaran's test: To work around the bias problem of the LM test for finite T, Pesaran (2004) suggests to use $\hat\rho_{ij}$ instead of $\hat\rho_{ij}^2$ and considers the statistic
$$ CD_P = \sqrt{\frac{2T}{N(N-1)}}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat\rho_{ij}. $$
He shows that, under his assumptions, $\mathrm{E}[\hat\rho_{ij}] = 0$ and $CD_P$ approaches a standard normal distribution for T, N → ∞. Indeed, his test has correct empirical size also when N is large relative to T; see Section 3. A standard critique of the test is that it lacks power against alternatives under which $\sum_{i=2}^{N}\sum_{j=1}^{i-1}\rho_{ij} \approx 0$.

The John test: A different approach is taken by Baltagi et al. (2011). Their procedure tests for spherical disturbances. That is, apart from independence, the null hypothesis includes homoskedasticity. Although this only allows for a limited comparison with tests of cross-section independence, the John test is included in the MC experiments, since Baltagi et al. (2011) found a favorable performance relative to $CD_P$ and $LM_{adj}$ under homoskedasticity. The test statistic is given by
$$ J = \frac{T\left[\frac{1}{N}\operatorname{tr}\bigl(\tilde\Omega^{2}\bigr)\left(\frac{1}{N}\operatorname{tr}\tilde\Omega\right)^{-2} - 1\right] - N}{2} - \frac{1}{2} - \frac{N}{2(T-1)}, \tag{8} $$
where $\tilde\Omega = \frac{1}{T-K}\sum_{t=1}^{T}\tilde{\boldsymbol{u}}_t\tilde{\boldsymbol{u}}_t'$ and $\tilde{\boldsymbol{u}}_t = (\tilde u_{1,t},\dots,\tilde u_{N,t})'$ contains the residuals for period t from a fixed-effects regression in model (6). A crucial assumption of Baltagi et al. (2011) is that the errors are normally distributed. Then, under H0, the statistic in (8) is asymptotically standard normal as N, T → ∞ with N/T → c ∈ [0, ∞).

3.2 Simulation setup

Similar to Pesaran et al. (2008) and Moscone and Tosetti (2009), we use the following data generating process (for i = 1, ..., N):
$$ y_{i,t} = \alpha_i + \beta_1 x_{1,i,t} + \beta_2 x_{2,i,t} + u_{i,t}, \qquad u_{i,t} = \gamma_i f_t + \sigma_i \varepsilon_{i,t}, $$
where $f_t \sim \mathrm{iid}\,N(0,1)$, $\alpha_i \sim \mathrm{iid}\,N(1,1)$, $\beta_1 = \beta_2 = 1$. Several scenarios are considered for the regressors $x_{l,i,t}$, for the factor loadings $\gamma_i$ and for the variances $\sigma_i^2$ of the idiosyncratic error components.

In addition to Pesaran et al. (2008) and Moscone and Tosetti (2009), we simulate regressors that, thanks to a factor structure, are correlated across cross-sections,
$$ x_{l,i,t} = f_{l,t}^{(x)}\gamma_{l,i}^{(x)} + \epsilon_{l,i,t}^{(x)}, $$
where $f_{l,t}^{(x)} \sim \mathrm{iid}\,N(0,1)$ and $\epsilon_{l,i,t}^{(x)} \sim \mathrm{iid}\,N(0,0.1)$. This makes the DGP a factor-augmented panel data model without correlation between the common components of the regressors and of the errors. It comes as no surprise that, without such regressor cross-dependence, the CD_X tests have little power, since the terms $\operatorname{vech}(X_i'X_j + X_j'X_i)$ in (3) basically contain empirical covariances between regressors in different sections. But the point of the CD_X tests is to check whether standard inferential procedures, such as the OLS-based t-test on slope coefficients, are invalid when neglecting error cross-section correlation.⁷

7 Unreported Monte Carlo simulations show that the t-test for the null β1 = 1 has correct empirical size in the absence of regressor cross-section correlation, even if cross-sectional error correlation is present.

Since the regressor cross-correlation affects the behavior of the CD_X family of tests under the alternative, we shall consider three cases for the regressor loadings $\gamma_{l,i}^{(x)}$. Thus, $\gamma_{l,i}^{(x)} \sim \mathrm{iid}\,U(-0.2, 0.2)$, with U(a, b) standing for a uniform distribution on (a, b), captures relatively weak regressor cross-dependence; $\gamma_{l,i}^{(x)} \sim \mathrm{iid}\,U(0.3, 0.7)$ stands for moderate regressor cross-dependence, and $\gamma_{l,i}^{(x)} \sim \mathrm{iid}\,U(3, 5)$ models strong regressor cross-dependence. We report here results for the case of moderate regressor cross-dependence: typically, weak and strong regressor cross-dependence lead to little difference, so we omit them to save space. In some of the cases (namely some of the power studies), the results were different, though not by a large margin; we comment briefly on the additional results in such cases, and the tables are available upon request from the authors.
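For concreteness, the following sketch (our own rendering with hypothetical helper names; the parameter values follow the moderate-regressor-cross-dependence case described above) draws one panel from this DGP.

```python
# Illustrative sketch of the Section 3.2 DGP (ours, not the authors' simulation code).
import numpy as np

def simulate_panel(N, T, gamma, rng, sigma2=None, skewed=False):
    """Draws one panel (y, x) with factor-correlated regressors and factor-driven
    error cross-dependence; gamma is the (N,) vector of error-factor loadings."""
    alpha = rng.normal(1.0, 1.0, size=N)                 # alpha_i ~ N(1, 1)
    f = rng.standard_normal(T)                           # common error factor f_t
    sigma2 = np.ones(N) if sigma2 is None else sigma2    # idiosyncratic error variances
    if skewed:                                           # centred, standardized chi2(1) errors
        eps = (rng.chisquare(1, size=(N, T)) - 1.0) / np.sqrt(2.0)
    else:
        eps = rng.standard_normal((N, T))
    u = gamma[:, None] * f[None, :] + np.sqrt(sigma2)[:, None] * eps
    x = np.empty((2, N, T))
    for l in range(2):                                   # factor structure in the regressors
        fx = rng.standard_normal(T)
        gx = rng.uniform(0.3, 0.7, size=N)               # moderate regressor cross-dependence
        x[l] = gx[:, None] * fx[None, :] + rng.normal(0.0, np.sqrt(0.1), size=(N, T))
    y = alpha[:, None] + x[0] + x[1] + u                 # beta1 = beta2 = 1
    return y, x

rng = np.random.default_rng(3)
N, T = 50, 30
gamma_S1 = rng.uniform(0.1, 0.3, size=N)     # scenario S1 loadings; S0 would use zeros
y, x = simulate_panel(N, T, gamma_S1, rng)
```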
The "null" scenario, S0, is designed to study size. We start with homoskedastic errors, $\sigma_i^2 = 1$, and simulate the idiosyncratic error components $\varepsilon_{i,t}$ as standard normal. There is no error cross-correlation,

S0: $\gamma_i = 0$ for all $i = 1,\dots,N$.

The regressors are moderately cross-dependent as described above. The errors fulfill both the symmetry assumption of Pesaran (2004) and the normality assumption of Pesaran et al. (2008) or Baltagi et al. (2011) such that, given the slope parameter homogeneity, all considered tests have correct asymptotic size; this will allow for meaningful power comparisons.

To check the robustness of the results, the baseline scenario S0 is expanded by several cases. We build on the baseline scenario and mention for each additional case only the features that differ. First, we consider a case where the errors are heteroskedastic (with $\sigma_i^2 \sim \mathrm{iid}\,\chi_1^2$) in addition to being non-normal, $\varepsilon_{i,t} \sim \mathrm{iid}\,(\chi_1^2 - 1)/\sqrt{2}$. Furthermore, building on normality and homoskedasticity again, we consider a case with slope heterogeneity (where the regressor cross-dependence is expected to induce residual cross-correlation), a case where the regressors are common such that V has reduced rank and we may study the behavior of the test when employing a generalized matrix inverse and adjusted critical values (see Remark 1), and two cases of weak cross-correlation. The implementation details are as follows. For the slope heterogeneity case, β1 is kept constant and we generate $\beta_{2i} \sim \mathrm{iid}\,U(0.5, 1)$; the variability roughly matches the one in the heterogenous simulation setup of Chudik and Pesaran (2013, see Eq. (67)); we did not follow the design of Chudik and Pesaran (2013) since it included weakly exogenous regressors. For the common regressors case, we simply set $x_{i,t,l} = f_{l,t}^{(x)}$ and $\epsilon_{l,i,t}^{(x)} = 0$, resort to the Moore-Penrose generalized inverse of $\hat V$, and employ χ²₁ critical values. For the weak cross-dependence designs we followed Chudik and Pesaran (2013) in setting the loadings $\gamma_i$ to zero for $i > [N^{\alpha}] + 1$; we work with α = 0.5 and α = 0.75 and $\gamma_i \sim \mathrm{iid}\,U(0.1, 0.3)$ for $i \le [N^{\alpha}]$, like in the first power scenario below. We also simulated with cross-dependence index α = 0.25, yet the figures are virtually the same as those for α = 0.5 and we do not report them to save space.

In the following scenarios we investigate the power of the tests. The baseline scenario 1 exhibits positive cross-unit covariances,

S1: $\gamma_i \sim \mathrm{iid}\,U(0.1, 0.3)$,

with moderate regressor cross-dependence as described above. The errors are, like in the baseline Scenario 0, homoskedastic and Gaussian.

While, in the three variations of scenario 1, all covariances $\sigma_{ij}$ are positive, they will approximately net out in scenario 2, $\sum_{i=2}^{N}\sum_{j=1}^{i-1}\rho_{ij} \approx 0$, thanks to the loadings $\gamma_i$ roughly averaging to zero. It has been argued (cf. Pesaran et al., 2008) that, in the latter case, the $CD_P$ test lacks power, and we examine our tests for similar behavior:

S2: $\gamma_i \sim \mathrm{iid}\,U(-0.3, 0.3)$.

The third considered scenario duplicates Scenario 1,

S3: $\gamma_i \sim \mathrm{iid}\,U(0.1, 0.3)$,

up to the distribution of the errors, which are now generated as $\varepsilon_{i,t} \sim \mathrm{iid}\,(\chi_1^2 - 1)/\sqrt{2}$ with constant unit variances, $\sigma_{ii} = 1$. For S4, we replicate the baseline Scenario 2,

S4: $\gamma_i \sim \mathrm{iid}\,U(-0.3, 0.3)$,
but with errors that are skewed as in S3 and, in addition, heteroskedastic (with $\sigma_i^2 \sim \mathrm{iid}\,\chi_1^2$), to allow for an assessment of the advantages of the "WLS variant" $CD_X^{w}$, if any. Finally, Scenario 5 works with the "common regressors" DGP considered as a variation of scenario S0, with the addition of cross-error correlation following the baseline scenario S1, i.e. the case of positive average loadings of the error factors. The question studied is whether the rank correction used to obtain size control affects the power properties of the CD tests and their effectiveness as a pretest.

For each scenario and varying numbers of cross-sections N and time periods T we employ 5000 Monte Carlo replications. All tests are conducted at the 5% nominal level. We also report, for each considered power study, the size of the tests of the null β1 = 1 using either the usual or the panel-robust standard errors (see Section 2.2). The choice of the standard error to be used for the slope parameter test is made according to the outcome of each cross-correlation test considered: a rejection of the null of no cross-correlation prompts the use of panel-robust standard errors, instead of standard errors accounting for cross-sectional heteroskedasticity only. We do this because, in terms of plain rejection frequencies, the CD_X tests benefit from multiplication with the elements of $X_i'X_j + X_j'X_i$, such that large sample covariances of the regressors boost the power of the directed tests. Raw power may therefore not be the best comparison criterion. Rather, it is more informative to find out how cross-dependence tests affect the behavior of subsequent slope parameter tests.

3.3 Simulation results

We begin with the discussion of scenario S0, which does not exhibit cross-unit error correlation. Table 1 gives the empirical rejection frequencies under the null of the compared tests. The LM test is clearly unreliable, even for the largest T considered and the smallest N. As expected from the literature, the Frees test behaves much better here, but is still oversized under error skewness and heteroskedasticity (Table 2), with rejection frequencies between 6% and 9%, typically around 7.5%. The adjusted and the bias-corrected LM tests perform even better (4% to 6%), in particular the LMb test; the figures worsen slightly in Table 2, where sizes up to 8% emerge for larger T under skewness and heteroskedasticity. Under skewness and heteroskedasticity the John test rejects in 90% to 100% of the cases, which is explained by the departure from the Gaussianity assumption under which its asymptotic properties have been derived (Table 1 with normal errors shows that the John test does hold size when the assumption is met). The behavior of the other tests is barely affected by the changed shape of the error distribution.

The best size control is offered by Pesaran's test $CD_p$, which is virtually at 5% throughout, and by our $CD_X^{\sigma}$ test, which is only marginally more liberal.
The $CD_X^{\rho}$ and $CD_X^{w}$ variants have good size control when T is not small compared to N, as predicted by Proposition 2; for T = 10 one should rely on $CD_X^{\sigma}$, but the CD_X tests are otherwise practically equivalent in terms of size control for all N.

Table 3 gives the behavior under slope coefficient heterogeneity. Given the moderate regressor cross-dependence, neglected heterogeneity induces residual cross-correlation. This, however, appears to be quite weak: the tests relying on fixed-effects OLS estimation still hold size. As before, the correlations-based CD_X tests have some difficulties for small T. Table 4 then gives size in the case where the use of a generalized inverse together with χ²₁ critical values is required. We note that T = 10 is too small for the asymptotics to deliver a good finite-sample approximation for the CD_X tests, but otherwise size control is quite good considering the nonstandard asymptotic problem. An interesting finding is that the extreme cross-dependence influences the finite-sample behavior of the LM and Frees tests: since the distortions decrease with T, the likely explanation is the difference between the errors $u_{i,t}$ and the residuals $\hat u_{i,t}$, which is cross-correlated due to the common regressors. The other tests' behavior does not change compared to the baseline S0. For the last two variations of scenario S0, Tables 5 and 6 illustrate the behavior under weak error cross-dependence.

  N    T     LM   FRE   LMa   LMb   CDp     J  CDX^σ  CDX^ρ  CDX^w
 10   10   12.1   6.5   6.0   5.2   4.8   6.6    5.2    8.4    8.1
 10   20    7.9   6.3   5.7   5.4   5.3   6.1    5.3    6.6    6.4
 10   30    7.6   6.7   5.9   6.0   5.5   5.7    5.6    6.2    5.8
 10   50    6.7   6.6   5.8   5.6   4.8   5.8    4.3    4.7    5.2
 20   10   26.1   5.9   5.7   5.3   5.2   8.5    6.2    9.2    9.0
 20   20   12.0   5.5   5.3   5.6   5.3   7.0    5.7    7.2    7.0
 20   30    9.2   5.3   5.1   4.8   5.1   6.1    5.2    5.8    6.1
 20   50    7.7   5.6   5.4   5.2   4.7   5.6    5.2    5.8    5.6
 50   10   88.3   5.8   5.9   5.3   5.0   8.7    6.4    9.0    8.9
 50   20   35.1   5.5   5.6   5.3   5.7   6.4    5.6    7.4    7.8
 50   30   19.3   4.6   4.7   4.7   4.7   6.1    5.6    6.5    6.6
 50   50   12.7   5.5   5.4   5.5   5.3   5.9    5.2    5.7    5.7
100   10  100.0   5.5   5.8   4.7   5.1   8.1    6.8    9.1    9.2
100   20   85.1   5.4   5.6   5.0   5.2   6.7    5.7    6.8    6.7
100   30   51.0   5.4   5.5   4.7   5.2   6.5    5.6    6.8    6.7
100   50   25.4   4.9   4.9   4.7   5.7   5.9    5.0    5.5    5.5
200   10  100.0   6.1   6.5   5.2   5.1   9.3    6.5    9.1    9.8
200   20  100.0   5.4   5.6   5.1   5.3   7.2    5.5    6.6    6.8
200   30   97.2   5.4   5.5   5.3   5.3   6.5    5.1    6.1    6.6
200   50   64.6   5.5   5.5   5.4   4.9   5.9    5.5    6.0    5.9

Table 1: Size: homoskedasticity and normal idiosyncratic error components; for further details see the text

  N    T     LM   FRE   LMa   LMb   CDp      J  CDX^σ  CDX^ρ  CDX^w
 10   10   12.7   7.4   7.0   5.9   5.3   92.9    5.2    8.8    7.3
 10   20    9.8   7.9   7.2   7.7   5.1   99.6    5.9    7.0    6.9
 10   30   10.2   9.5   8.7   8.4   5.3   99.9    6.6    7.5    6.6
 10   50    8.9   8.9   8.2   8.5   4.8  100.0    6.3    6.7    6.2
 20   10   25.7   6.3   6.3   6.0   5.1   98.1    5.5    9.2    7.9
 20   20   13.4   7.3   7.1   7.4   4.6   99.9    5.8    6.9    6.3
 20   30   11.3   8.0   7.8   7.1   4.5  100.0    5.7    5.8    6.2
 20   50   11.7   9.5   9.1   8.4   4.7  100.0    5.8    5.9    6.4
 50   10   87.9   6.4   6.5   5.8   5.2   99.7    5.6    9.6    7.9
 50   20   34.6   7.4   7.4   7.1   4.7  100.0    6.0    7.1    6.1
 50   30   22.9   7.7   7.7   7.3   5.3  100.0    6.1    7.0    6.5
 50   50   15.9   7.7   7.7   7.1   4.5  100.0    5.0    5.9    5.6
100   10  100.0   5.8   6.1   6.1   5.2   99.9    6.4    9.2    7.6
100   20   82.4   7.1   7.3   7.7   5.2  100.0    5.3    7.0    6.3
100   30   49.7   7.4   7.5   7.4   5.1  100.0    5.8    6.1    6.0
100   50   27.6   7.4   7.4   7.6   4.4  100.0    5.5    5.8    6.3
200   10  100.0   6.6   7.0   6.2   5.5   99.9    6.7   10.1    8.1
200   20  100.0   6.9   7.0   7.6   4.8  100.0    5.9    6.4    6.4
200   30   95.3   7.9   8.1   8.0   4.9  100.0    5.9    6.6    6.3
200   50   62.7   7.6   7.7   8.1   4.9  100.0    5.1    6.1    5.8

Table 2: Size: skewed heteroskedastic idiosyncratic error components; for details see the text
While for α = 0.5 the size is controlled by all tests that controlled size in the baseline scenario S0, we notice that, for α = 0.75, the CD tests tend to reject slightly more often as expected under the null. To sum up, only the LMa , LMb and in particular CDp tests are reliable alternatives to the directed tests in terms of size. Let us now discuss the results under the power scenarios. An examination of Table 7 shows that, under scenario S1, only CDp keeps up with ρ σ w the CDX and the CDX tests in terms of power. The covariance-based version CDX is less powerful than the cross-correlation based ones, though not by much, while still being more powerful than the LMa or LMb tests.8 The advantage over the CDp test is only visible for small N . But it is not the power per se that is most interesting in our setup. Examining Table 8 with the size of the slope parameter test computed with standard errors chosen by the cross-correlation tests, we notice that it can be severely oversized if using LMa or LMb as pretests, with sizes e.g. up to 30% for T = 10 and N = 200. In contrast, the CD tests (including CDp which can be seen as a particular case of the directed tests, see Section 2.2) come all close to holding size of the combined testing procedure, 8 The differences in power depend on the strength of the regressor cross-correlation, but are still ρ w visible when the regressor cross-dependence is weak, while the domination of CDX and CDX is quite visible when the regressor cross-dependence is strong; the exact figures are available from the authors upon request 21 N 10 20 50 100 200 T LM F RE LMa LMb CDp J σ CDX ρ CDX w CDX 10 20 30 50 10 20 30 50 10 20 30 50 10 20 30 50 10 20 30 50 31.7 12 9.2 6.7 70.9 26.7 16.5 10.6 100 83.5 50.4 25.4 100 100 95.4 64.2 100 100 100 100 20.7 8.8 8.6 6.5 35.1 14.8 11.8 8.3 88.2 39.8 21.8 13.5 99.2 83.1 53.5 27.5 100 99.6 94.7 97.3 7.2 4.7 5.8 4.5 7.1 5.9 6.3 5.3 7.6 6.1 4.8 4.6 8.4 5.9 6.6 5 10.6 7.1 6 5.8 5.5 4.5 5.3 5 5.1 5.1 5.3 5.9 5.7 5.1 6.4 6.3 4.9 5.2 5.9 5.3 6.3 5.7 6.7 6.1 6.9 5.9 5.2 5.9 5.1 5 5.4 4.6 5.8 6 6.7 4.2 6.7 4.2 5.8 5.3 6.9 5.7 4.6 4.8 6.8 6.3 5.7 5.2 8.1 7 6.1 7.6 10.1 6.4 7.3 6.1 9.8 6.7 5.8 5.2 8.3 7.4 6.2 6.1 6.2 5.3 4.6 6 8.2 5.1 6.8 5.1 9 6.6 5.9 4.5 10 5.5 6.4 5.5 9.2 6.9 5.5 5.8 13.2 8.5 6.5 7 13.8 7.7 8.7 5.9 15.5 8.8 7.3 5.5 14.2 8.1 8.8 6 14 7.5 7.8 5.8 13.4 8.3 6.8 6.8 13.7 7.5 8.2 6.2 14.3 8.4 7.2 5.5 16.4 7.8 7.9 6.5 14.1 6.9 7.4 5.7 Table 3: Size: heterogenous coefficients; for details see the text N 10 20 50 100 200 T LM F RE LMa LMb CDp J σ CDX ρ CDX w CDX 10 20 30 50 10 20 30 50 10 20 30 50 10 20 30 50 10 20 30 50 60.8 21.2 12.6 9 99.8 49.8 28.4 16.2 100 99.8 86.1 46.6 100 100 100 93.6 100 100 100 100 44.1 17.8 11.8 9 92.1 33.6 21.3 12.4 100 91.7 59.5 28.7 100 100 98.7 70.2 100 100 100 100 6.1 6.1 4.9 4.8 6 6.4 6 6.3 6.8 4.8 4 5.4 6 5.3 5 5.5 5.5 5.6 5.2 5.3 4.9 5.1 5.3 5.3 5.7 6.4 6.1 5.9 5.4 4.5 4.1 5.7 6.1 5 4.7 6.3 4.7 5 3.6 4.4 8.3 6.9 5.7 5.3 8.8 7 6 5.6 7.4 6.2 6.3 5.6 7.8 6.4 6.9 6.6 8.5 5.2 5.3 5.1 5.9 5.3 6.1 5.4 8.2 7.8 7.2 5.4 8.8 6.7 6.2 6.6 8.5 7.8 6 5.4 8 6.4 5.4 5.1 4.7 5.2 4.2 4.8 4.4 5.8 4.9 4.8 4.6 4.2 4.5 4.5 5.2 5 5.5 5.7 6.1 3.3 4.1 5.1 8.3 6.9 5.7 5.3 8.8 7 6 5.6 7.4 6.2 6.3 5.6 7.8 6.4 6.9 6.6 8.5 5.2 5.3 5.0 8.4 6.6 5.6 5.3 9.7 7 6.5 5.4 8.4 6.8 6.2 5.5 7.5 6.1 6.9 6 7.9 5.2 4.8 4.9 Table 4: Size: common regressors prompting the use of a generalized matrix inverse and adjusted degrees of freedom; for details see the text 22 N 10 20 50 100 200 T LM F RE LMa LMb CDp J σ CDX ρ CDX w CDX 10 20 30 50 
10 20 30 50 10 20 30 50 10 20 30 50 10 20 30 50 11.92 8.6 7.9 6.68 25.82 12.62 9.4 7.48 89.46 34.98 20.94 12.34 100.0 84.18 51.06 26.12 100.0 100.0 96.6 64 6.72 6.82 6.92 6.62 6.4 6.28 5.82 5.68 5.84 5.62 4.86 5.36 6.1 5.76 4.76 5.38 6.32 5.32 6.1 4.65 6.38 6.34 6.44 6.02 6.52 6.12 5.7 5.46 6.06 5.68 4.84 5.28 6.18 5.92 4.82 5.4 6.72 5.62 6.2 4.75 4.7 5.42 6.32 5.64 5.04 5.22 5.56 5.5 4.46 5.24 4.9 4.62 5.22 5.02 4.96 5.28 4.94 5.16 4.4 4.7 5.34 4.52 6 5 5.58 5 5.6 5.64 4.64 5.04 4.94 4.74 5.46 5.44 5.26 4.88 5.24 5.3 4.9 5.05 6.94 6.76 6.26 6.36 7.92 6.78 6.26 6.24 8.24 6.14 6.52 5.34 8.42 6.86 6.8 5.54 7.92 7.06 6.6 5.65 5.44 4.96 5.26 5.02 6.14 5.3 5.9 5.56 6.32 5.58 5.96 5.66 6.3 6.12 5.88 5.2 7.18 5.56 5.5 5.1 8.12 6.8 6.14 5.46 9.82 6.4 6.44 6 9.18 6.96 7.12 5.94 8.74 7.34 6.72 5.78 9.26 6.34 5.7 5.3 8.24 6.94 6.52 5.62 9.22 6.2 6.42 5.66 8.74 7.26 7.1 6 9.04 7.18 6.62 5.46 9.12 6.42 5.5 5.35 Table 5: Size: Weak cross-dependence of exponent α = 0.5; for details see the text N 10 20 50 100 200 T LM F RE LMa LMb CDp J σ CDX ρ CDX w CDX 10 20 30 50 10 20 30 50 10 20 30 50 10 20 30 50 10 20 30 50 11.64 8.62 7.92 6.7 25.92 12.86 9.52 7.64 89.42 35.02 21.2 12.94 100.0 84.48 51.76 27.3 100.0 100.0 96.6 65.5 6.3 6.92 7.1 6.62 6.2 6.28 5.92 5.92 5.64 5.54 5.08 5.56 6.22 5.84 5.26 5.84 6.28 5.32 6.4 4.95 6.16 6.34 6.46 5.78 6.22 6.18 5.74 5.68 5.82 5.62 5.06 5.52 6.48 6.06 5.34 5.88 6.56 5.52 6.5 5.05 4.8 5.22 6.2 5.68 5.32 5.44 5.44 5.4 4.52 5.22 5.24 4.88 5.24 5.34 5.04 5.34 4.94 5.22 4.5 5.55 5.48 4.7 6.44 5.42 6.08 5.7 6.48 6.88 5.7 6.82 6.78 7.1 7.06 7.96 8.72 9.06 8.22 10.34 10.3 13 7.18 6.64 6.14 6.5 8.14 6.8 6.34 6.24 8.4 6.1 6.38 5.64 8.3 6.6 6.9 5.84 8.26 7.14 7.2 5.75 5.24 5.08 5.32 5.34 6.38 6.08 6.66 6.5 7.2 6.66 7.16 6.92 7.76 7.9 7.94 7.84 9.54 8.24 9.5 10.45 8.2 7.12 6.36 5.84 10.3 7.12 7.26 6.84 9.88 8.18 8.44 7.52 9.96 9.08 9.1 8.5 11.8 9.78 10.7 11 8.36 7 6.78 6 9.78 6.86 7.26 6.44 9.32 7.88 8.24 7.52 10.02 8.88 8.86 8.3 10.7 9 9.3 10.8 Table 6: Size: Weak cross-dependence of exponent α = 0.75; for details see the text 23 about 6% except maybe for very small T . The point is that the LMa and LMb test against cross-correlation indiscriminately, whereas the CD family reacts precisely to those departures from the null that affect inference on slope parameters.9 Moving on to the baseline Scenario S2, we note that the zero average correlation of the errors affects the directed tests as expected, such that they have little power compared to the adjusted and bias-corrected LM statistics LMa and LMb ; see Table 9. Their power does not appear to increase in N either. More important, however, is the finding that the reduced power is not an issue when using the directed tests as a pretest for deciding which standard errors to use for a slope parameter test. See Table 10, where the size of the pretest-based slope parameter tests is practically 5% (except perhaps for mild oversizedness for T = 10). For Scenario S3, the power is somewhat higher than under S1, but note that the John test does not hold size under nonnormality. The overall image is otherwise the same as under S1; the size distortions of the slope parameter test with LMa and LMb as pretests is still far from 5%, see e.g. the case where T = 10 and N = 100. 
Scenario S4 confirms that the WLS version of the directed test has higher power than the other two variants, due to the presence of cross-sectional heteroskedasticity, at least for smaller N, but the results are otherwise similar to those of S2, so we omit the corresponding tables. They are available upon request from the authors. Finally, Scenario S5 (see Table 13) mostly replicates the findings of Scenario S1, with one interesting exception. Namely, the CD_X tests are clearly more powerful than under S1 (while controlling size fairly well, at least for larger T). This is most likely because the critical values are much smaller, while the displacement of the statistic under the alternative is apparently not affected by the use of a generalized inverse. While the CD_X tests may safely be used as pretests (again, except for very small T), the LMa and LMb tests fail in this respect (Table 14).

Summing up, size control is good for all three variants, with the exception of the case T = 10, where only the covariances-based version $CD_X^{\sigma}$ is reliable; only the adjusted and the bias-corrected LM tests of Pesaran et al. (2008) and Baltagi et al. (2012), and especially the test of Pesaran (2004), are serious competitors. In terms of power, the directed tests can dominate, but can also be dominated by the alternatives. But the use as a pretest can fail for the two corrected versions of the LM test, with sizes of the slope parameter test of up to 33%. This is not the case with the directed tests, which are designed to find the correlation that would affect inference about the slope parameters, and work fine in this respect as long as T is not too small. Thus, for T = 10, it may be wiser to resort to panel-robust standard errors directly.

9 Again, the distorting effect is even more visible for strong regressor correlation, but apparently negligible under weak regressor cross-dependence.
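To make explicit how the pretest feeds into the slope parameter test, here is a schematic numpy/scipy sketch (our own illustration; `fe_within`, `slope_test` and the period-clustered, Driscoll-Kraay-type form of the panel-robust variance without lags are assumptions of this sketch, not necessarily the exact implementation behind the reported figures).

```python
# Illustrative sketch: pretest-based inference on beta_1 in the fixed-effects model.
import numpy as np
from scipy.stats import norm

def fe_within(y, x):
    """Within (fixed-effects) transformation and pooled OLS; y: (N, T), x: (K, N, T)."""
    K, N, T = x.shape
    yd = y - y.mean(axis=1, keepdims=True)
    xd = x - x.mean(axis=2, keepdims=True)
    Xmat = xd.reshape(K, N * T).T                 # stacked demeaned regressors
    yvec = yd.reshape(N * T)
    beta = np.linalg.solve(Xmat.T @ Xmat, Xmat.T @ yvec)
    resid = (yvec - Xmat @ beta).reshape(N, T)
    return beta, resid, xd

def slope_test(y, x, pretest_rejects, beta0=1.0):
    """Two-sided p-value for H0: beta_1 = beta0; panel-robust variance if the pretest rejected."""
    beta, resid, xd = fe_within(y, x)
    K, N, T = xd.shape
    A = sum(xd[:, :, t] @ xd[:, :, t].T for t in range(T))          # X'X
    if pretest_rejects:
        # panel-robust "meat": scores summed over all units within each period
        B = sum(np.outer(xd[:, :, t] @ resid[:, t],
                         xd[:, :, t] @ resid[:, t]) for t in range(T))
    else:
        # heteroskedasticity-robust only: cross-unit terms dropped
        B = sum((xd[:, :, t] * resid[:, t]) @ (xd[:, :, t] * resid[:, t]).T
                for t in range(T))
    Ainv = np.linalg.inv(A)
    V = Ainv @ B @ Ainv
    tstat = (beta[0] - beta0) / np.sqrt(V[0, 0])
    return 2 * norm.sf(abs(tstat))

rng = np.random.default_rng(4)
N, T = 50, 30
x = rng.standard_normal((2, N, T))
y = 1.0 + x[0] + x[1] + rng.standard_normal((N, T))   # no error cross-correlation
print(slope_test(y, x, pretest_rejects=False))
```

In the Monte Carlo design above, `pretest_rejects` would be the 5%-level decision of, e.g., $CD_X^{\sigma}$ or of a competing cross-correlation test, and the reported slope-test sizes are the rejection frequencies of the resulting two-step procedure.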
[Table 7: Power: Scenario 1; for details see the text]

[Table 8: Size for slope parameter test: Scenario 1; for details see the text]

[Table 9: Power: Scenario 2; for details see the text]

[Table 10: Size for slope parameter test: Scenario 2; for details see the text]

[Table 11: Power: Scenario 3; for details see the text]

[Table 12: Size for slope parameter test: Scenario 3; for details see the text]
[Table 13: Power: Scenario 5; for details see the text]

[Table 14: Size for slope parameter test: Scenario 5; for details see the text]

Thus, for T = 10, it may be wiser to resort to panel-robust standard errors directly, since the pretesting step induces some oversizedness in this case. For data sets with larger dimensions, the correlation-based CD_X^ρ and CD_X^w appear to be the better choices, with CD_X^w having an edge under cross-sectional heteroskedasticity.

4 Summary

We have introduced directed tests of error cross-section independence. Relying on the information matrix equality, they essentially check whether cross-sectional error correlation can be neglected when estimating the covariance matrix of the fixed-effects slope estimator. Compared to extant testing procedures, this restricts the set of alternative hypotheses, since the directed tests augment cross-sectional error covariances or correlations by cross-sectional regressor correlation.

The asymptotic distribution of the directed tests has been shown to be χ² with K(K+1)/2 degrees of freedom, for T → ∞ and N → ∞ jointly, where K is the number of explanatory variables. The limiting distribution approximates the finite-sample distribution well, except perhaps when T is very small compared to N. In terms of power, the augmentation with cross-sectional regressor correlation can be a disadvantage when the regressor correlation is weak, but can turn into an advantage when the regressor correlation is strong. This suggests that a union-of-rejections approach, combining the evidence from the directed tests and an alternative test whose power does not depend on regressor cross-correlation (say, the corrected LM tests), may work better than either test alone in a wider range of situations.10

When used as a pretest for choosing the type of standard error to be used for slope parameter tests, the directed tests act reliably, unlike alternative procedures focussing on error cross-correlation only. Especially when the regressor cross-correlation is not weak, the latter tests fail to detect cross-dependence affecting the standard errors, indicating ordinary standard errors too often when robust clustered ones would have been in order. This leads to heavily oversized slope parameter tests for the non-directed tests. When T is small, one should use panel-robust standard errors to begin with.

10 This has been impressively illustrated by Harvey et al. (2009) for unit root tests.
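To make the construction concrete, the following sketch assembles a covariance-based statistic in the spirit of CD_X^σ: the weighted sum S of residual cross-covariances is standardized by the plug-in covariance suggested by Lemma 1(iv) below and compared against a χ² critical value with K(K+1)/2 degrees of freedom. The particular plug-in weights, the use of a balanced panel of within-transformed unit-by-unit regressors and residuals, the generalized inverse, and the function name are assumptions of this illustration rather than the authors' exact implementation.

```python
import numpy as np
from scipy.stats import chi2
from scipy.linalg import pinvh

def vech(A):
    """Half-vectorisation: stack the lower triangle (incl. diagonal) of a symmetric matrix."""
    r, c = np.tril_indices(A.shape[0])
    return A[r, c]

def directed_cd_sigma(X_list, u_hat_list, level=0.05):
    """Sketch of a covariance-based directed statistic: a quadratic form in
    S = sum_{i<j} sigma_ij_hat * vech(X_i'X_j + X_j'X_i), standardised by a
    plug-in covariance built as in Lemma 1(iv).
    X_list[i]: (T x K) regressors of unit i, u_hat_list[i]: (T,) residuals of unit i.
    Illustrative only; not the authors' implementation."""
    N = len(X_list)
    T, K = X_list[0].shape
    d = K * (K + 1) // 2
    S = np.zeros(d)
    V = np.zeros((d, d))
    # annihilator matrices M_i = I - X_i (X_i'X_i)^{-1} X_i'
    M = [np.eye(T) - Xi @ np.linalg.solve(Xi.T @ Xi, Xi.T) for Xi in X_list]
    sig = [ui @ ui / (T - K) for ui in u_hat_list]          # sigma_ii_hat
    for i in range(1, N):
        for j in range(i):
            v_ij = vech(X_list[i].T @ X_list[j] + X_list[j].T @ X_list[i])
            sig_ij = u_hat_list[i] @ u_hat_list[j] / (T - K)  # sigma_ij_hat
            S += sig_ij * v_ij
            w = sig[i] * sig[j] * np.trace(M[i] @ M[j]) / (T - K) ** 2
            V += w * np.outer(v_ij, v_ij)
    stat = S @ pinvh(V) @ S          # generalized inverse, cf. the Scenario S5 discussion
    crit = chi2.ppf(1 - level, df=d)
    return stat, stat > crit
```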
Appendix

Lemma 1. Under Assumptions 1 and 2, the following holds:

(i) $\mathrm{E}\big[\hat\sigma_{ij}\,\mathrm{vech}(X_i'X_j + X_j'X_i)\big] = 0$ for $i \neq j$,

(ii) $\mathrm{E}\big[\hat\sigma_{ij}\,\mathrm{vech}(X_i'X_j + X_j'X_i)\,\hat\sigma_{ik}\,\mathrm{vech}(X_i'X_k + X_k'X_i)'\big] = 0$, given that the indices $i$, $j$ and $k$ differ,

(iii) $\mathrm{Cov}\big(\hat\sigma_{ij}\,\mathrm{vech}(X_i'X_j + X_j'X_i)\big) = (T-K)^{-2}\,\sigma_{ii}\sigma_{jj}\,\mathrm{E}\big[\mathrm{tr}(M_{X_i}M_{X_j})\,\mathrm{vech}(X_i'X_j + X_j'X_i)\,\mathrm{vech}(X_i'X_j + X_j'X_i)'\big]$, and

(iv) $\mathrm{Cov}(S) = (T-K)^{-2}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\sigma_{ii}\sigma_{jj}\,\mathrm{E}\big[\mathrm{tr}(M_{X_i}M_{X_j})\,\mathrm{vech}(X_i'X_j + X_j'X_i)\,\mathrm{vech}(X_i'X_j + X_j'X_i)'\big]$.

Note that $T/(T-K) \to 1$; furthermore, since the projection matrices $P_i$ and $P_j$ are positive semidefinite, the Cauchy-Schwarz inequality indicates that $|\mathrm{tr}(P_iP_j)| \le \sqrt{\mathrm{tr}(P_i^2)\,\mathrm{tr}(P_j^2)} = K$. Thus, we may use $T$ throughout instead of $\mathrm{tr}(M_{X_i}M_{X_j})$ or $T-K$.

Before moving on to the main proofs, we require an additional lemma.

Lemma 2. Under the Assumptions of Proposition 1 it holds as $N, T \to \infty$ that

(i) $\frac{1}{N}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\frac{1}{\sqrt{T}}\,u_i'X_i(X_i'X_i)^{-1}X_i'u_j\,v_T^{ij} \overset{p}{\to} 0$,

(ii) $\frac{1}{N}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\frac{1}{\sqrt{T}}\,u_i'X_j(X_j'X_j)^{-1}X_j'u_j\,v_T^{ij} \overset{p}{\to} 0$,

(iii) $\frac{1}{N}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\frac{1}{\sqrt{T}}\,u_i'X_i(X_i'X_i)^{-1}X_i'X_j(X_j'X_j)^{-1}X_j'u_j\,v_T^{ij} \overset{p}{\to} 0$,

(iv) $\sup_{N,\,1\le t\le T}\mathrm{E}\big\|\tfrac{1}{N}\sum_{i=2}^{N}\sum_{j=1}^{i-1}u_{j,t}\,v_T^{ij}\,u_{i,t}\big\|^3 < \infty$,

(v) $\frac{1}{N^2T}\sum_{t=1}^{T}\sum_{i=2}^{N}\sum_{k=2}^{N}\sum_{j=1}^{i-1}\sum_{l=1}^{k-1}u_{j,t}u_{l,t}u_{i,t}u_{k,t}\,v_T^{ij}\,(v_T^{kl})' - \frac{1}{N}\sum_{i=2}^{N}\sigma_{ii}\,\frac{1}{N}\sum_{j=1}^{i-1}\sigma_{jj}\,v_T^{ij}(v_T^{ij})' \overset{p}{\to} 0$,

(vi) $\frac{1}{N^2}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat\sigma_{ii}\hat\sigma_{jj}\,v_T^{ij}(v_T^{ij})' - \frac{1}{N}\sum_{i=2}^{N}\sigma_{ii}\,\frac{1}{N}\sum_{j=1}^{i-1}\sigma_{jj}\,v_T^{ij}(v_T^{ij})' \overset{p}{\to} 0$.

The proofs of the above lemmas are available in an online appendix.

Proof of Proposition 1

Let
\[
\Sigma_{S,N} = \frac{1}{N}\sum_{i=2}^{N}\frac{1}{N}\sum_{j=1}^{i-1}\sigma_{ii}\sigma_{jj}\,v_T^{ij}\,(v_T^{ij})'.
\]
Note that $\Sigma_{S,N}$ does not depend on $t$, but depends on $N$ and, through $v_T^{ij}$, on $T$. The dimensionality restriction on $v_T^{ij}$ implies full rank of $\Sigma_{S,N}$ even in the limit, implying that $\Sigma_{S,N}^{-1}$ and $\Sigma_{S,N}^{-0.5}$ are uniformly bounded in probability. Consider now the following normalization of the test statistic:
\[
CD_X^{\sigma} = \left(\Sigma_{S,N}^{-0.5}\frac{1}{N}D_T^{-1}S\right)'\left(\Sigma_{S,N}^{-0.5}\,\frac{1}{N^2}D_T^{-1}\hat V D_T^{-1}\,\Sigma_{S,N}^{-0.5}\right)^{-1}\left(\Sigma_{S,N}^{-0.5}\frac{1}{N}D_T^{-1}S\right).
\]
We shall prove that
\[
\Sigma_{S,N}^{-0.5}\,\frac{1}{N}D_T^{-1}S \overset{d}{\to} N\big(0,\, I_{\frac{1}{2}K(K+1)}\big)
\]
and that
\[
\frac{1}{N^2}D_T^{-1}\hat V D_T^{-1} - \Sigma_{S,N} \overset{p}{\to} 0; \tag{9}
\]
together with the uniform boundedness w.p.1 of $\Sigma_{S,N}^{-0.5}$, the two imply that $CD_X^{\sigma}\overset{d}{\to}\chi^2_{\frac{1}{2}K(K+1)}$ as required.

Let us first examine the limiting distribution of the suitably normalized $S$. We have
\[
\frac{1}{N}D_T^{-1}S = \frac{1}{N}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\sqrt{T}\hat\sigma_{ij}\,\frac{D_T^{-1}\mathrm{vech}(X_i'X_j + X_j'X_i)}{\sqrt{T}} = \frac{1}{N}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\sqrt{T}\hat\sigma_{ij}\,v_T^{ij}
\]
with
\[
\sqrt{T}\hat\sigma_{ij} = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}u_{i,t}u_{j,t} - \frac{u_i'X_i}{T}\left(\frac{X_i'X_i}{T}\right)^{-1}\frac{X_i'u_j}{\sqrt{T}} - \frac{u_i'X_j}{T}\left(\frac{X_j'X_j}{T}\right)^{-1}\frac{X_j'u_j}{\sqrt{T}} + \frac{u_i'X_i}{T}\left(\frac{X_i'X_i}{T}\right)^{-1}\frac{X_i'X_j}{T}\left(\frac{X_j'X_j}{T}\right)^{-1}\frac{X_j'u_j}{\sqrt{T}}.
\]
Using Lemma 2, it follows immediately that
\[
\frac{1}{N}D_T^{-1}S = \frac{1}{N}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\frac{1}{\sqrt{T}}\sum_{t=1}^{T}u_{i,t}u_{j,t}\,v_T^{ij} + o_p(1)
\]
as $N, T \to \infty$. Rearrange the terms on the r.h.s. to obtain
\[
\frac{1}{N}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\frac{1}{\sqrt{T}}\sum_{t=1}^{T}u_{i,t}u_{j,t}\,v_T^{ij} = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\frac{1}{N}\sum_{i=2}^{N}\sum_{j=1}^{i-1}u_{i,t}u_{j,t}\,v_T^{ij},
\]
and let
\[
\xi_{t,T} = \Sigma_{S,N}^{-0.5}\,\frac{1}{N}\sum_{i=2}^{N}\sum_{j=1}^{i-1}u_{j,t}\,v_T^{ij}\,u_{i,t}.
\]
Given that $\mathrm{E}(u_i \mid u_j, X_k,\ j<i,\ 1\le k\le N) = 0$ and $\Sigma_{S,N}$ only depends on the regressors, the $\xi_{t,T}$ have a (multivariate) md array structure. To establish the limiting behavior of the suitably normalized $S$ (i.e. the sums of the lines of the md array) we need to show that
\[
\max_{1\le t\le T}\left\|\frac{1}{\sqrt{T}}\xi_{t,T}\right\| \overset{p}{\to} 0 \tag{10}
\]
and that
\[
\frac{1}{T}\sum_{t=1}^{T}\xi_{t,T}\xi_{t,T}' \overset{p}{\to} I_{\frac{1}{2}K(K+1)}. \tag{11}
\]
Since $\mathrm{Cov}(\xi_{t,T})$ is easily shown to be the $\frac{1}{2}K(K+1)$-dimensional identity matrix, (10) and (11) will allow us to conclude that
\[
\Sigma_{S,N}^{-0.5}D_T^{-1}\frac{1}{N}S \overset{d}{\to} N\big(0,\, I_{\frac{1}{2}K(K+1)}\big)
\]
using a central limit theorem for md arrays (Davidson, 1994, Theorem 24.3) and the Cramér-Wold device.

To establish the asymptotic negligibility condition (10), recall that $\Sigma_{S,N}^{-0.5}$ does not depend on $t$, so we have that
\[
\max_{1\le t\le T}\left\|\frac{1}{\sqrt{T}}\xi_{t,T}\right\| \le \left\|\Sigma_{S,N}^{-0.5}\right\|\,\max_{1\le t\le T}\frac{1}{\sqrt{T}}\left\|\frac{1}{N}\sum_{i=2}^{N}\sum_{j=1}^{i-1}u_{j,t}\,v_T^{ij}\,u_{i,t}\right\|.
\]
Since $\|\Sigma_{S,N}^{-0.5}\|$ is bounded w.p.1, it suffices that $\max_{1\le t\le T}\frac{1}{\sqrt{T}}\big\|\frac{1}{N}\sum_{i=2}^{N}\sum_{j=1}^{i-1}u_{j,t}v_T^{ij}u_{i,t}\big\|$ vanishes as $T\to\infty$. By a well-known relation between the uniform boundedness of the moments of a sequence and the maxima of the sequence, this is in turn implied by the uniform boundedness of $\mathrm{E}\big\|\frac{1}{N}\sum_{i=2}^{N}\sum_{j=1}^{i-1}u_{j,t}v_T^{ij}u_{i,t}\big\|^3$, which has been proven in Lemma 2 item (iv), so condition (10) follows. Item (v) of Lemma 2 then establishes (11).

We now only need to establish the asymptotic behavior of the suitably normalized $\hat V$ given in (9). With $\mathrm{tr}(M_{X_i}M_{X_j})/T \to 1$, the desired convergence follows from Lemma 2 item (vi), thus concluding the proof.

Proof of Proposition 2

The proof is essentially the same for both statistics, so we only give the part concerning $CD_X^{\rho}$. Letting $\hat\varepsilon_i = \big\{\varepsilon_{i,t} - x_{i,t}'(X_i'X_i)^{-1}X_i'\varepsilon_i\big\}_{t=1,\dots,T}$, it follows immediately that
\[
\hat\rho_{ij} = \frac{\frac{1}{T-K}\hat\varepsilon_i'\hat\varepsilon_j}{\sqrt{\frac{1}{T-K}\hat\varepsilon_i'\hat\varepsilon_i\;\frac{1}{T-K}\hat\varepsilon_j'\hat\varepsilon_j}}.
\]
Let then
\[
S = \sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat\sigma_{ij}\,\mathrm{vech}(X_i'X_j + X_j'X_i) \quad\text{with}\quad \hat\sigma_{ij} = \frac{1}{T-K}\hat\varepsilon_i'\hat\varepsilon_j.
\]
This is nothing else than the $CD_X^{\sigma}$ statistic computed as if the disturbances of the panel were $\varepsilon$ and not $u$. We shall prove that
\[
\frac{1}{N}D_T^{-1}R = \frac{1}{N}D_T^{-1}S + o_p(1)
\]
and the result will follow with the arguments of the proof of Proposition 1. A direct use of the arguments given there is not a choice without additional assumptions about the distribution of $\varepsilon_{i,t}$, since
\[
\zeta_{i,t} = \frac{\varepsilon_{i,t}}{\sqrt{\frac{1}{T-K}\hat\varepsilon_i'\hat\varepsilon_i}}
\]
may not have finite moments at all, and the independence of $\zeta_{i,t}$ in time is lost. Since $K$ is finite, $T/(T-K)\to 1$ and we may safely replace $T-K$ with $T$.

We show in a first step that, for some vanishing sequences $a_{ij}^T$ for which $\sup_{i,j}|a_{ij}^T| \overset{p}{\to} 0$ as $T\to\infty$, there exists a constant $C>0$ such that
\[
\left|\frac{1}{\sqrt{1+a_{ij}^T}} - 1\right| \le C\sup_{i,j}|a_{ij}^T| \quad\forall\, i,j. \tag{12}
\]
The Taylor expansion around 1 with rest in integral form indicates that $\exists\,\lambda\in[0,1]$ such that
\[
\frac{1}{\sqrt{1+a_{ij}^T}} - 1 = -\frac{a_{ij}^T}{2\big(1+\lambda a_{ij}^T\big)^{3/2}}.
\]
Since $\sup_{i,j}|a_{ij}^T|\to 0$, $\exists\,T_0$ fixed such that $1+\lambda a_{ij}^T > 0$ for all $T>T_0$ and thus
\[
\max_{\lambda\in[0,1]}\sup_{i,j}\frac{1}{2\big(1+\lambda a_{ij}^T\big)^{3/2}} \le \max\left\{\frac{1}{2},\ \frac{1}{2\big(1-\sup_{i,j}|a_{ij}^T|\big)^{3/2}}\right\} \quad\forall\, T>T_0,
\]
which is obviously uniformly bounded in $i$ and $j$ as $T\to\infty$. Equation (12) follows.

In a second step, we establish the behavior of the term $\frac{1}{T}\hat\varepsilon_i'\hat\varepsilon_i - 1$ across all units of the panel. We have that
\[
\frac{1}{T}\hat\varepsilon_i'\hat\varepsilon_i - 1 = \frac{1}{T}\sum_{t=1}^{T}\big(\varepsilon_{i,t}^2 - 1\big) - \frac{2}{T}\sum_{t=1}^{T}\varepsilon_{i,t}\,x_{i,t}'(X_i'X_i)^{-1}X_i'\varepsilon_i + \frac{1}{T}\sum_{t=1}^{T}x_{i,t}'(X_i'X_i)^{-1}X_i'\varepsilon_i\,\varepsilon_i'X_i(X_i'X_i)^{-1}x_{i,t}.
\]
The term $\frac{1}{T}\sum_{t=1}^{T}(\varepsilon_{i,t}^2-1)$ is of order $O_p(T^{-0.5})$ and is easily checked to have finite kurtosis given our assumptions; hence
\[
\sup_i\left|\frac{1}{T}\sum_{t=1}^{T}\big(\varepsilon_{i,t}^2-1\big)\right| = o_p\!\left(\frac{\sqrt{N}}{\sqrt{T}}\right).
\]
The terms $\frac{2}{T}\sum_{t=1}^{T}\varepsilon_{i,t}x_{i,t}'(X_i'X_i)^{-1}X_i'\varepsilon_i$ and $\frac{1}{T}\sum_{t=1}^{T}x_{i,t}'(X_i'X_i)^{-1}X_i'\varepsilon_i\,\varepsilon_i'X_i(X_i'X_i)^{-1}x_{i,t}$ are both of order $O_p(T^{-1})$, so their maximum over $i=1,\dots,N$ must be of order $O_p(N/T)$. Summing up,
\[
\sup_i\left|\frac{1}{T}\hat\varepsilon_i'\hat\varepsilon_i - 1\right| = O_p\!\left(\max\left\{\frac{\sqrt{N}}{\sqrt{T}},\,\frac{N}{T}\right\}\right) = O_p\!\left(\frac{\sqrt{N}}{\sqrt{T}}\right).
\]
To conclude the proof, we now examine
\[
\frac{1}{T}\hat\varepsilon_i'\hat\varepsilon_i\;\frac{1}{T}\hat\varepsilon_j'\hat\varepsilon_j - 1 = \left(\frac{1}{T}\hat\varepsilon_i'\hat\varepsilon_i - 1\right) + \left(\frac{1}{T}\hat\varepsilon_j'\hat\varepsilon_j - 1\right) + \left(\frac{1}{T}\hat\varepsilon_i'\hat\varepsilon_i - 1\right)\left(\frac{1}{T}\hat\varepsilon_j'\hat\varepsilon_j - 1\right).
\]
Note that if $\sup_i\big|\frac{1}{T}\hat\varepsilon_i'\hat\varepsilon_i - 1\big| = O_p(a_T)$ for some positive $a_T\to 0$, it follows that
\[
\sup_{i,j}\left|\left(\frac{1}{T}\hat\varepsilon_i'\hat\varepsilon_i - 1\right) + \left(\frac{1}{T}\hat\varepsilon_j'\hat\varepsilon_j - 1\right) + \left(\frac{1}{T}\hat\varepsilon_i'\hat\varepsilon_i - 1\right)\left(\frac{1}{T}\hat\varepsilon_j'\hat\varepsilon_j - 1\right)\right| = O_p(a_T),
\]
so, given that $a_T = \sqrt{N}/\sqrt{T}$,
\[
\sup_{i,j}\left|\frac{1}{T}\hat\varepsilon_i'\hat\varepsilon_i\;\frac{1}{T}\hat\varepsilon_j'\hat\varepsilon_j - 1\right| = O_p\!\left(\frac{\sqrt{N}}{\sqrt{T}}\right).
\]
Thus, we have for all pairs $i,j$
\[
\left|\sqrt{T}\hat\rho_{ij} - \frac{1}{\sqrt{T}}\hat\varepsilon_i'\hat\varepsilon_j\right| \le \left|\frac{1}{\sqrt{T}}\hat\varepsilon_i'\hat\varepsilon_j\right|\left|\frac{1}{\sqrt{1 + \big(\frac{1}{T}\hat\varepsilon_i'\hat\varepsilon_i\,\frac{1}{T}\hat\varepsilon_j'\hat\varepsilon_j - 1\big)}} - 1\right| \le C\,\frac{\sqrt{N}}{\sqrt{T}}\left|\frac{1}{\sqrt{T}}\hat\varepsilon_i'\hat\varepsilon_j\right|.
\]
Then,
\[
\left\|\frac{1}{N}D_T^{-1}R - \frac{1}{N}D_T^{-1}S\right\| \le \frac{1}{N}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\left|\hat\rho_{ij} - \frac{1}{T}\hat\varepsilon_i'\hat\varepsilon_j\right|\,\left\|D_T^{-1}\mathrm{vech}(X_i'X_j + X_j'X_i)\right\| \le C\,\frac{\sqrt{N}}{N\sqrt{T}}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\frac{1}{\sqrt{T}}\left|\hat\varepsilon_i'\hat\varepsilon_j\right|\,\left\|v_T^{ij}\right\|.
\]
But the expectation $\mathrm{E}\big[\frac{1}{\sqrt{T}}|\hat\varepsilon_i'\hat\varepsilon_j|\,\|v_T^{ij}\|\big]$ is easily shown to be uniformly bounded, from which it follows that $\sum_{i=2}^{N}\sum_{j=1}^{i-1}\frac{1}{\sqrt{T}}|\hat\varepsilon_i'\hat\varepsilon_j|\,\|v_T^{ij}\| = O_p(N^2)$. This leads ultimately to
\[
\left\|\frac{1}{N}D_T^{-1}R - \frac{1}{N}D_T^{-1}S\right\| = O_p\!\left(\frac{N^{1.5}}{\sqrt{T}}\right),
\]
which vanishes given the additional assumption $N^3/T \to 0$. This assumption can be dropped if, in exchange,
\[
\zeta_{i,t} = \frac{\varepsilon_{i,t}}{\sqrt{\frac{1}{T-K}\hat\varepsilon_i'\hat\varepsilon_i}}
\]
has finite variance, e.g. when $|\varepsilon_{i,t}|$ is bounded away from zero.

References

Amemiya, T. (1985). Advanced Econometrics. Harvard University Press.

Andrews, D. W. K. (1987). Asymptotic Results for Generalized Wald Tests. Econometric Theory 3, 348-358.

Andrews, D. W. K. (2005). Cross-Section Regression with Common Shocks. Econometrica 73, 1551-1585.

Arellano, M. (1987). Computing Robust Standard Errors for Within-Group Estimators. Oxford Bulletin of Economics and Statistics 49, 431-434.

Baltagi, B. H., Q. Feng, and C. Kao (2011). Testing for Sphericity in a Fixed Effects Panel Data Model. Econometrics Journal 14, 25-47.

Baltagi, B. H., Q. Feng, and C. Kao (2012). A Lagrange Multiplier Test for Cross-Sectional Dependence in a Fixed Effects Panel Data Model. Journal of Econometrics 170, 164-177.

Beck, N. and J. N. Katz (1995). What to Do (and Not to Do) with Time Series Cross-section Data. American Political Science Review 89, 634-647.

Breusch, T. S. and A. R. Pagan (1980). The Lagrange Multiplier Test and its Application to Model Specification Tests in Econometrics. Review of Economic Studies 47, 239-253.

Chudik, A. and M. H. Pesaran (2013). Large Panel Data Models with Cross-Sectional Dependence: A Survey. In B. H. Baltagi (Ed.), The Oxford Handbook of Panel Data, forthcoming. Oxford University Press.

Davidson, J. (1994). Stochastic Limit Theory. Oxford University Press.

Driscoll, J. C. and A. C. Kraay (1998). Consistent Covariance Matrix Estimation with Spatially Dependent Panel Data. The Review of Economics and Statistics 80, 549-560.

Eicker, F. (1967). Limit Theorems for Regressions with Unequal and Dependent Errors. In L. Le Cam and J. Neyman (Eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 59-82. Berkeley: University of California Press.

Frees, E. W. (1995). Assessing Cross-Sectional Correlation in Panel Data. Journal of Econometrics 69, 393-414.

Halunga, A., C. D. Orme, and T. Yamagata (2012). A Heteroskedasticity Robust Breusch-Pagan Test for Contemporaneous Correlation in Dynamic Panel Data Models. University of Manchester Economics Discussion Paper 1118.

Harvey, D. I., S. J. Leybourne, and A. M. R. Taylor (2009). Unit Root Testing in Practice: Dealing with Uncertainty over the Trend and Initial Condition. Econometric Theory 25, 587-636.

John, S. (1972). The Distribution of a Statistic Used for Testing Sphericity of Normal Distributions. Biometrika 59, 169-173.

Moscone, F. and E. Tosetti (2009). A Review and Comparison of Tests of Cross-Section Independence in Panels. Journal of Economic Surveys 23, 528-561.

Pesaran, M. H.
(2004). General Diagnostic Tests for Cross Section Dependence in Panels. CESifo Working Paper No. 1229. Pesaran, M. H. (2015). Testing Weak Cross-Sectional Dependence in Large Panels. Econometric Reviews 34, 1088–1116. Pesaran, M. H., A. Ullah, and T. Yamagata (2008). A Bias-Adjusted LM Test of Error Cross-Section Independence. Econometrics Journal 11, 105–127. White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica 48, 817–838. White, H. (1982). Maximum Likelihood Estimation of Misspecified Models. Econometrica 50, 1–25. Zellner, A. (1962). An Efficiency Method of Estimating Seemingly Unrelated Regression Equations and Tests for Aggregation Bias. Journal of the American Statistical Association 57, 348–368. 35 Directed Tests of No Cross-Sectional Correlation in Large-N Panel Data Models Matei Demetrescu, Ulrich Homm Appendix - for online publication Proof of Lemma 1 For notational simplicity we show the results for deterministic X. For stochastic X they can be obtained by conditioning on X and applying the LIE. (i): E σ̂ij vech Xi0 Xj + Xj0 Xi = vech Xi0 Xj + Xj0 Xi E = 1 0 u MXi MXj uj T i 1 vech Xi0 Xj + Xj0 Xi tr(σij MXi MXj ) T H0 = 0 (ii): Up to a scaling constant, E σ̂ij vech Xi0 Xj + Xj0 Xi σ̂ik v 0ik = T −2 vech Xi0 Xj + Xj0 Xi v 0ik E (MXi ui )0 MXj uj (MXi ui )0 MXk uk =0 since, under the null, ui , uj and uk are independent. (iii) Note that Var σ̂ij vech Xi0 Xj + Xj0 Xi ) = = = = = = = = 0 1 vech Xi0 Xj + Xj0 Xi E u0i MXi MXj uj u0j MXj MXi ui vech Xi0 Xj + Xj0 Xi T2 0 1 vech Xi0 Xj + Xj0 Xi E tr u0i MXi MXj uj u0j MXj MXi ui vech Xi0 Xj + Xj0 Xi T2 0 1 vech Xi0 Xj + Xj0 Xi tr MXi MXj E uj u0j MXj MXi ui u0i vech Xi0 Xj + Xj0 Xi T2 0 1 vech Xi0 Xj + Xj0 Xi tr MXi MXj E E uj u0j MXj MXi ui u0i |uj vech Xi0 Xj + Xj0 Xi T2 0 1 σii tr MXi MXj E uj u0j MXj MXi vech Xi0 Xj + Xj0 Xi vech Xi0 Xj + Xj0 Xi T2 0 1 σii σjj tr MXi MXj MXj MXi vech Xi0 Xj + Xj0 Xi vech Xi0 Xj + Xj0 Xi T2 0 1 σii σjj tr(MXi MXj ) vech Xi0 Xj + Xj0 Xi vech Xi0 Xj + Xj0 Xi . T2 (iv): follows from (ii) and (iii). Proof of Lemma 2 (i): 36 ij 1 Let vT,m denote the mth element of the vector v ij T . If, for all m = 1, . . . , /2 K (K + 1), Var 1 √ N X i−1 X N T −1 u0i Xi (Xi0 Xi ) ij Xi0 uj vT,m →0 i=2 j=1 as N, T → ∞, the result follows since the expectation of the term is 0. To this end, re-write the expression as −1 N i−1 N 0 X 1 1 X u0i Xi Xi0 Xi X u 1 1 X 1 ij √ √ √ √ √i j vT,m =√ √ ζi,T T T N i=2 T N j=1 T T N i=2 with u0i Xi ζi,T = √ T Xi0 Xi −1 T (13) i−1 0 X Xi uj ij √1 √ vT,m . N j=1 T Given the assumed independence of the disturbances for i 6= j we have together with the independence from the regressors that E (ui |uj , 1 ≤ j < i, Xi , 1 ≤ i ≤ N ) = 0 implying that E (ζi,T |ζi−1,T , ζi−2,T , . . . , ζ2,T ) = 0 and as such a martingale difference [md] array structure for ζi,T . Then, Var N 1 X √ ζi,T N i=2 ! = N 1 X Var (ζi,T ) ; N i=2 if the variances on the r.h.s. are uniformly bounded in N and T , the result follows thanks to dividing √ by T in (13), no matter which rate T has relatively to N. Using again the independence of ui and uj and conditioning on the Xi , we have thanks to the law of iterated expectations [LIE] that 2 Var (ζi,T ) = E E ζi,T |Xi , 1 ≤ i ≤ N 0 = E E ζi,T ζi,T X , since E (ζi,T ) = 0. 
Then, 0 E ζi,T ζi,T X 0 0 −1 0 −1 0 i−1 i−1 0 0 0 X X Xi uj ij 1 Xi uj ij Xi Xi Xi ui u Xi Xi Xi √1 √ vT,m √ √ vT,m √ X ; = E √i T T T N j=1 T N j=1 T T this equals in turn E tr 0 Xi ui √ T u0i Xi √ Xi0 Xi −1 T T 0 0 −1 i−1 i−1 0 0 X X Xi uj ij 1 Xi uj ij Xi Xi √1 X , √ vT,m √ √ vT,m T N j=1 T N j=1 T or, after using the linearity of the trace, X 0 ui u0i Xi √ tr E √i T T Xi0 Xi T −1 0 0 −1 i−1 i−1 0 0 X X 1 X u 1 X u X X j j i ij ij i X . √ √i vT,m √ √i vT,m T N j=1 T N j=1 T 37 Using the independence of ui and uj , j < i, we obtain that 0 −1 i−1 0 X X X 1 X u ij i i . √i j vT,m |X = σii tr Cov √ X T N j=1 T 0 E ζi,T ζi,T Let us now examine the covariance matrix on the r.h.s.: 0 i−1 i−1 i−1 0 X 1 X X ij ik 1 Xi uj ij Xi uj u0k Xi √ vT,m X = √ √ X . Cov √ vT,m vT,m E N j=1 N j=1 T T T k=1 Given the independence of ui,t and uj,s for i 6= j, the expectation on the r.h.s. is only nonzero for j = k, and we have that i−1 i−1 2 X 0 X 0 X 1 X 1 X u j ij ij i i i √ vT,m X = Cov √ σjj vT,m . N j=1 T N j=1 T The desired variance is then i−1 i−1 2 X X 2 1 ij ij = Kσii 1 σjj vT,m σjj E vT,m E Kσii N j=1 N j=1 2 E E ζi,T |X = ≤ K Considering that E ij vT,m max 1≤m≤ 21 K(K+1) 2 sup E i,j ij vT,m i−1 2 σ X ii σjj . · N j=1 is uniformly bounded as implied by (the stricter) Assumption 2(iii), the result follows. (ii): Analogous to (i) and omitted. (iii): Analogous to (i) and omitted. (iv): It suffices to establish the result elementwise, so write for some 1 ≤ m ≤ 1/2K (K + 1) N i−1 N X N N X X 1 X X 1 1 1 ij ij uj,t vT,m ui,t = ui,t uj,t vT,m − u2 v ii . N i=2 j=1 2 N i=1 j=1 N i=1 i,t T,m Note that the third absolute moment of the term on the l.h.s. is uniformly bounded if the L3 norm of the term is uniformly bounded. Apply then Minkowski’s inequality to obtain that X X i−1 N N 1 X 1 N X 1 N X 1 1 ij ij ii uj,t vT,m ui,t ui,t uj,t vT,m u2i,t vT,m , N ≤ 2 N +2 N i=2 j=1 i=1 j=1 i=1 3 3 3 so the result follows if both L3 norms on the r.h.s. are uniformly bounded. Equivalently, we show the corresponding expectations to be uniformly bounded in t. Making use of the LIE, we only need to 38 show that 3 N 1 X ii lim sup E E u2i,t vT,m X < ∞ N,T →∞ 1≤i≤N ;1≤t≤T N (14) 3 X N X N 1 ij lim sup E E ui,t uj,t vT,m X < ∞. N,T →∞ 1≤i≤N ;1≤t≤T N i=1 j=1 (15) i=1 and Thanks to the independence of ui,t from teh regressors, we have that 3 N N N N 1 X ii jj kk 1 XXX ii E u2i,t vT,m vT,m vT,m ; E u2i,t u2j,t u2k,t vT,m X ≤ 3 N N i=1 j=1 i=1 k=1 using Hölder’s inequality we also have that E u2i,t u2j,t u2k,t ≤ u2i,t 3 u2j,t u2k,t ≤ u2i,t 3 u2j,t 3 u2k,t 3/2 3 so 3 N N N N 1 X ii jj kk 1 XXX ii u2i,t u2j,t u2k,t E vT,m E ≤ 3 u2i,t vT,m vT,m vT,m . 3 3 3 N N i=1 j=1 i=1 k=1 ii jj kk ii jj kk vT,m vT,m ≤ vT,m Use again Hölder’s inequality to conclude that E vT,m v vT,m 3 , which T,m 3 3 leads to 3 !3 N N 1 X X 1 ii ii u2i,t vT,m E u2i,t vT,m . ≤ 3 3 N N i=1 i=1 With E u6i,t being finite and uniformly bounded, u2i,t 3 is itself uniformly bounded in i, and, with the ii 3 < ∞ so condition (14) is satconditions on Xi imposed in Assumption 2, we have that supi E vT,m 4 PN PN ij 1 X , isfied. 
For condition (15), note that it is implied by the finiteness of E E i=1 j=1 ui,t uj,t vT,m N for which we have that 4 N N 1 XX ij ui,t uj,t vT,m E X N i=1 j=1 = N N N N N N N N 1 X X X X X X X X i1 j1 i2 j2 i3 j3 i4 j4 v v v E u u u u u u u u v i1 ,t j1 ,t i2 ,t j2 ,t i3 ,t j3 ,t i4 ,t j4 ,t T,m T,m T,m T,m X N 4 i =1 j =1 i =1 j =1 i =1 j =1 i =1 j =1 1 = 1 N4 1 N X N X 2 N X 2 N X 3 N X 3 4 N X N X 4 N X i1 j 1 i2 j 2 i3 j 3 i4 j 4 E (ui1 ,t uj1 ,t ui2 ,t uj2 ,t ui3 ,t uj3 ,t ui4 ,t uj4 ,t ) vT,m vT,m vT,m vT,m i1 =1 j1 =1 i2 =1 j2 =1 i3 =1 j3 =1 i4 =1 j4 =1 due to the assumed independence of disturbances and regressors. Because the disturbances are independent, the expectation E (ui1 ,t uj1 ,t ui2 ,t uj2 ,t ui3 ,t uj3 ,t ui4 ,t uj4 ,t ) is nonzero only if the indices are pairwise equal, which leaves us with O N 4 nonzero summands; and, when nonzero, the expectation E (ui1 ,t uj1 ,t ui2 ,t uj2 ,t ui3 ,t uj3 ,t ui4 ,t uj4 ,t ) is obviously uniformly bounded according to Assumption 1. Furthermore, E i1 j 1 i2 j 2 i3 j 3 i4 j 4 vT,m vT,m vT,m vT,m s i1 j1 4 i2 j 2 4 i3 j 3 4 i4 j4 4 ≤ 4 E vT,m E vT,m E vT,m E vT,m thanks to the Hölder’s inequality, and the r.h.s. is uniformly bounded according to Assumption 2. 39 4 PN PN ij 1 X To sum up, E E is uniformly bounded, implying the required i=1 j=1 ui,t uj,t vT,m N condition (15). The result then follows. (v): Because only those products ui,t uj,t uk,t ul,t for which i = k can have non-zero expectation, we “split” the problem and show that ! T N N i−1 i−1 i−1 X X X 1 1 X 1 X X 1 ij ij p ij il σii σjj vT,m vT,n uj,t vT,m → 0 (16) ul,t vT,n u2i,t − T t=1 N 2 i=2 j=1 N i=2 N j=1 l=1 and that T N X i−1 X N k−1 X X X 1 ij kl p 1 ui,t uj,t uk,t ul,t vT,m vT,n → 0. T t=1 N 2 i=2 j=1 (17) i6=k=2 l=1 To establish (16), begin by writing for each time t ! i−1 N i−1 X X X 1 ij il uj,t vT,m ul,t vT,n u2i,t N 2 i=2 j=1 l=1 ! N i−1 i−1 X X X 1 ij il = 2 uj,t vT,m ul,t vT,n u2i,t − σii N i=2 j=1 l=1 ! N i−1 i−1 X 1 X X ij il + 2 uj,t vT,m ul,t vT,n σii N i=2 j=1 (18) l=1 and let ζt,T = = ! N i−1 i−1 X 1 X X ij il uj,t vT,m ul,t vT,n u2i,t − σii N 2 i=2 j=1 l=1 ! N i−1 i−1 1 X 1 X 1 X ij il √ √ uj,t vT,m ul,t vT,n u2i,t − σii N i=2 N j=1 N l=1 Since E u2i,t − σii u2j,s − σjj = 0 for all s 6= t, the LIE implies that ζt,T has zero expectation and is uncorrelated in t, so its average over t vanishes in mean square if lim sup kζt,T k2 < ∞. T →∞ 1≤t≤T Since ζt,T is itself an average over i, use Minkowski’s inequality to obtain that v u u i−1 X u 1 ij kζt,T k2 ≤ u uj,t vT,m tE √ N j=1 1 √ N i−1 X ! il ul,t vT,n 2 u2i,t − σii , l=1 so the desired condition limT →∞ sup1≤t≤T kζt,T k2 < ∞ follows from the uniform boundedness of the 40 expectation on the r.h.s., for which we have that 2 ! i−1 i−1 X X 1 1 ij il √ E √ uj,t vT,m ul,t vT,n u2i,t − σii N j=1 N l=1 !2 i−1 i−1 2 1 X 1 X ij il 2 √ uj,t vT,m ul,t vT,n . = E √ E ui,t − σii N j=1 N l=1 2 The uniform boundedness of E u2i,t − σii follows directly from Assumption 1, while the uniform 2 Pi−1 Pi−1 ij 1 il √1 √ u v u v is established using the arguments boundedness of E j=1 j,t T,m l=1 l,t T,n N N employed in the proof of item (i). We now establish (18), which completes the derivation of (16). We have at each time t ! N N i−1 i−1 i−1 X X X 1 X X 1 1 ij ij ij il σii σjj vT,m vT,n uj,t vT,m ul,t vT,n σii = νt,T + N 2 i=2 j=1 N i=2 N j=1 l=1 with νt,T = N i−1 X i−1 X 1 X ij il σ (uj,t ul,t − σjj δj,l ) vT,m vT,n , ii N 2 i=2 j=1 l=1 where δj,l is Kronecker’s symbol, δj.l = 1 if j = l and 0 otherwise. 
Thus, we only have to show that PT p t=1 νt,T → 0. The independence of disturbances and regressors leads us to 1/T E ( νt,T | X, νt−1,T , . . . ν1,T ) = 0, PT so the LIE indicates that νt,T has a md array structure; thus, T1 t=1 νt,T vanishes in mean square if νt,T has uniformly bounded variance. Uniformly bounded variance is implied by uniformly bounded L2 norm kνt,T k2 . Rearrange terms to obtain via Minkowski’s inequality that N i−1 X 1 X 1 ij √ kνt,T k2 ≤ uj,t vT,m N i=2 N j=1 i−1 1 X il √ ul,t vT,n N l=1 ! i−1 X ij ij + 1 σ v v . jj T,m T,n N j=1 2 2 Given the assumed uniform moment conditions on the regressors, it suffices to show that the product Pi−1 Pi−1 ij 1 il √1 √ j=1 uj,t vT,m l=1 ul,t vT,n has uniformly bounded variance, which in turn is implied N N by uniform boundedness of 4 i−1 X 1 ij uj,t vT,m . E √ N j=1 The latter expectation is 4 i−1 i−1 i−1 i−1 i−1 X 1 1 X X X X ij ij1 ij2 ij3 ij4 E √ uj,t vT,m = 2 E (uj1 ,t uj2 ,t uj3 ,t uj4 ,t ) E vT,m vT,m vT,m vT,m , N j =1 j =1 j =1 j =1 N j=1 1 2 3 4 2 and, due to the independence of theus, the sum only has O N nonzero terms. Then, the expec ij1 ij2 ij3 ij4 tations E (uj1 ,t uj2 ,t uj3 ,t uj4 ,t ) and E vT,m vT,m vT,m vT,m are uniformly bounded since E u4j,t and 41 E ij vT,m 4 are uniformly bounded according to Assumptions 2 and 1. To complete the proof, we only need to establish (17). Note that, like above, the elements averaged over t are uncorrelated, so (17) follows if N i−1 N k−1 1 XX X X ij kl ui,t uj,t uk,t ul,t vT,m vT,n N 2 i=2 j=1 i6=k=2 l=1 = 1 N2 N X i−1 X i−1 k−1 X X ij kl ui,t uj,t uk,t ul,t vT,m vT,n + i=2 j=1 k=2 l=1 N i−1 N k−1 1 XX X X ij kl ui,t uj,t uk,t ul,t vT,m vT,n N 2 i=2 j=1 k=i+1 l=1 P1 PN has uniformly bounded variance (where we make the convention that k=2 = k=N +1 = 0). This is the case if each of the two sums on the r.h.s. have themselves uniformly bounded variance. Note that, for the first sum, i > j and i > k > l, while, for the second, k > l and k > i > j. Thus, for each of the two sums alone, the summands are pairwise uncorrelated thanks to the independence and zero-mean of ui,t . The variance of the first sum then satisfies N X N X 1 Var N2 i=2 k=2 k<i ! i−1 k−1 X X ij kl ul,t vT,n ui,t uk,t uj,t vT,m j=1 l=1 N N i−1 X 1 1 XX ij Var √ uj,t vT,m = 4 N i=2 N j=1 k=2 k<i k−1 1 X kl √ ul,t vT,n N l=1 ! ui,t uk,t . There are ON 2 sum elements, so the variance on the l.h.s. is indeed uniformly bounded if the ex 2 P P i−1 k−1 ij kl √1 √1 pectation E is itself uniformly bounded; this j=1 uj,t vT,m l=1 ul,t vT,n ui,t uk,t N N last step is easily established using the LIE, the independence of disturbances and regressors, and the assumed uniform moment conditions. The same argument applies for the sum with k > i and the result follows. (vi): We establish the result elementwise and show that i−1 N X X 1 ij ij (σ̂ii σ̂jj − σii σjj ) vT,m vT,n =0 E 2 N i=2 j=1 and (19) N X i−1 X 1 ij ij Var 2 (σ̂ii σ̂jj − σii σjj ) vT,m vT,n →0 N i=2 j=1 (20) for all m, n = 1, . . . , 1/2K (K + 1). Using the LIE, the independence of disturbances for i 6= j, and the independence of disturbances and regressors, it is straightforward to show that ij ij E (σ̂ii σ̂jj − σii σjj ) vT,m vT,n = 0, which implies (19). The same arguments indicate that ij ij kl kl E (σ̂ii σ̂jj − σii σjj ) vT,m vT,n (σ̂kk σ̂ll − σkk σll ) vT,m vT,n =0 42 unless i = k or j = l. Then, N X i−1 X 1 ij ij Var 2 (σ̂ii σ̂jj − σii σjj ) vT,m vT,n N i=2 j=1 = N i−1 N i−1 1 XXXX ij ij kl kl Cov (σ̂ii σ̂jj − σii σjj ) vT,m vT,n , (σ̂kk σ̂ll − σkk σll ) vT,m vT,n . 
4 N i=2 j=1 k=2 l=1 ij ij Due to the zero expectation of (σ̂ii σ̂jj − σii σjj ) vT,m vT,n , the covariances equal the expectation of the products, and are as such only nonzero if i = k or j = l. Thanks to the assumed uniform moment conditions, the covariances are uniformly bounded. So there are O N 3 uniformly bounded sum terms in the expression for the variance (20), implying the variance to vanish and leading to the desired result. 43