Directed Tests of No Cross-Sectional Correlation in
Large-N Panel Data Models∗
Matei Demetrescu†
Christian Albrechts-University of Kiel
Ulrich Homm
University of Bonn
Revised version: April 19, 2015
Abstract
Cross-sectional dependence in panel data can significantly affect the inference
about slope parameters. Existing procedures, however, test for cross-unit dependence per se. Based on the principle of the information matrix equality tests of
White (1982, Econometrica 50, 1-25), we propose directed tests for no cross-sectional dependence. The new tests essentially check whether there is cross-unit
dependence that is consequential for the variance of OLS slope parameter estimators and thereby for standard inferential procedures. The tests rely on suitably
weighted sample residual cross-covariances or cross-correlations, and are in this
sense a generalization of Pesaran’s (2004, CESifo Working Paper 1229) test for
cross-sectional error dependence. We derive the joint N, T asymptotics of the
proposed tests, which, under the null, follow asymptotic chi-squared distributions without restrictive sphericity or distributional assumptions. Moreover, the
relative rates at which N and T grow to infinity are only mildly restricted. We
use Monte Carlo simulation to gauge the finite sample power of the directed tests
and compare it to extant procedures. The performance of the proposed tests
in terms of size and power is good, even when the number of cross-sections is
decisively larger than the number of time observations. When using the outcome
of the directed tests to decide whether to use panel-robust standard errors or
not for inference on slope parameters, the slope parameter tests are not affected.
They are, however, affected when using alternative cross-dependence tests for
such decision: the use of generic error cross-correlation tests may induce serious
size distortions in slope parameter tests and lead to wrong conclusions in applied
work.
Key words: Cross-unit correlation; Information matrix equality; Joint asymptotics; Pre-test
JEL classification: C12 (Hypothesis Testing), C23 (Models with Panel Data)
∗The authors would like to thank the editor (M. Hashem Pesaran), three anonymous referees, Jörg Breitung, Kai Carstensen and Jean-Marie Dufour for very helpful comments and suggestions, as well as Benjamin Hillmann for computational research assistance.
†Corresponding author: Institute for Statistics and Econometrics, Christian-Albrechts-University of Kiel, Olshausenstr. 40-60, D-24118 Kiel, Germany. E-mail address: [email protected].
1 Introduction
Cross-sectional dependence in panel data can arise for various reasons, such as global
shocks unaccounted for—be they economic, political or technological in nature—or
unobserved shocks affecting only a subset of cross-sectional units. Such dependence can
have dramatic effects on the asymptotic and finite-sample properties of the least squares
estimator and standard inferential procedures in panel data models; see e.g. Andrews
(2005) and also the more recent survey of Chudik and Pesaran (2013).
One way to deal with potential cross-unit error dependence is to use so-called panel-robust covariance matrix estimation techniques, for instance as proposed by Arellano
(1987), Beck and Katz (1995) or Driscoll and Kraay (1998). Another way is to employ a
preliminary test for cross-sectional dependence and adjust the estimation and inference
procedures according to the outcome of the pretest. The relevant issue for practitioners
is then the reliability of the procedure as a whole, i.e. of the pretesting step and the
inference on the slope parameters taken together.
To test for error cross-unit dependence, Breusch and Pagan (1980) proposed a
Lagrange multiplier [LM] test. The test, relying on the squared pairwise correlations
between the residuals of each unit, is appropriate for panels with large time dimension
T and small cross-sectional dimension N , and is therefore more often employed in
the seemingly unrelated regressions framework of Zellner (1962). When N is large
relative to T , however, it is well-known that the LM test is severely oversized; see for
instance Pesaran (2004) and Pesaran, Ullah, and Yamagata (2008). While the LM
test is essentially relying on (squared) Pearson correlation coefficients, Frees (1995)
suggests the use of Spearman rank correlation coefficients. Frees’ statistic also tends
to exhibit size distortions when N is large relative to T , even if the distortions are less
pronounced than in the case of the LM test.
One way to amend the size distortions of the initial LM test is that of Pesaran et al.
(2008). They compute the expected values and variances of the squared correlation
coefficients under the assumption of normally distributed disturbances and adjust the
LM test accordingly. The resulting adjusted LM test is asymptotically normal as
T → ∞ followed by N → ∞. The empirical size of the test is controlled for moderate-to-large N and small T.¹ The normality assumption does not appear to be restrictive,
as is documented by simulation experiments with χ2 - and t-distributed errors; see
e.g. Pesaran et al. (2008) and Baltagi et al. (2011). Another correction has recently
been suggested by Baltagi et al. (2012), who compute the bias of the LM statistic based
on fixed-effects pooled OLS residuals for N, T → ∞ and N/T → c, which leads to a
bias-corrected version of the LM test.
¹Non-negligible size distortions can occur, however, when N is very large relative to T (cf. Baltagi et al., 2011).
Pesaran (2004, 2015) puts forward an alternative approach; the corresponding
test employs residual Pearson correlation coefficients without squaring them, on the
grounds that the null of an average zero correlation is often more relevant for practitioners, say in portfolio management. The test has correct empirical size in finite
samples even when N is much larger than T . But it has, by construction, low power
when error correlation is present and the correlation coefficients roughly sum up to zero.
This can be the case when the disturbances are generated from a factor model where
the loadings average to zero. Pesaran (2015) analyzes the implicit null hypothesis of
Pesaran’s (2004) test, which is given by weak dependence rather than by independence
of errors between cross-sections.
Finally, following John (1972), Baltagi et al. (2011) discuss a sphericity test which
can be used to infer about cross-correlation when the data are cross-sectionally homoskedastic.
The common feature of these three approaches is that one ultimately tests for the existence of cross-unit correlation per se. Here, we derive tests for the null of no cross-correlation based on the information matrix equality test principle of White (1982).
We may interpret our procedure as being directed against cross-sectional correlation
that impacts the estimate of the covariance matrix of the slope parameter estimators.
From this perspective, our tests check whether the simple OLS variance matrix estimate significantly differs from a cross-correlation robust estimate. As a consequence,
our procedure “leverages” the residual cross-covariances with sample cross-section covariances of the explanatory variables. This, of course, restricts the set of alternatives
against which the test has power; but it is precisely these detected alternatives that
are relevant from the standpoint of covariance matrix estimation for slope parameter
estimators and thus for standard inferential procedures.
Concretely, we discuss in Section 2 several versions of the test, which differ in
how cross-sectional variance heterogeneity or heteroskedasticity is taken into account.
They are all robust to cross-sectional heteroskedasticity under the null, and differences
arise in the presence of cross-sectional heteroskedasticity only under the alternative.
Moreover, they are equivalent under homoskedasticity. The baseline variant uses the
residual cross-covariances directly, whereas the second weights them using the unit-specific residual variances. The third variant relies on cross-correlations rather than
cross-covariances; when focussing on an intercept only, it turns out to be essentially
Pesaran’s (2004) test. While our procedures inherit the good power properties of
Pesaran’s (2004) test when the correlation coefficients all have the same sign, we expect
power gains in other directions of the space of alternative hypotheses and provide Monte
Carlo evidence in favour of this conjecture.
Furthermore, we establish the asymptotic distribution for N and T going to infinity
jointly; a χ2 limiting distribution of the test is obtained without relying on a specific
distribution for the disturbances. For the test based on residual cross-covariances, the
relative rates of N and T are not explicitly restricted to a certain path. For the other
two variants, not restricting the shape of the distribution of the disturbances comes
at the price of having to control the growth rates of the cross-sectional dimension N
relative to T more strictly.
Our Monte Carlo simulations in Section 3 show that the directed tests have correct
empirical size for as little as 10 time periods (20 for the correlations-based test), even
for much larger N . The Monte Carlo analysis furthermore illustrates the severe shortcomings of several alternative procedures for testing for no cross-correlation when used
as a pretest to decide between ordinary and panel-robust standard errors. In contrast,
our directed test procedures work reliably.
Let us introduce some notation before proceeding. We denote vectors by boldface
symbols. Let $\|\cdot\|$ denote the Euclidean vector norm and the corresponding induced matrix norm. Further, $\|\cdot\|_r$ stands both for the $L_r$ vector norm, $\left(\sum |\cdot|^r\right)^{1/r}$, and for the $L_r$ norm of a random variable or vector, $\left(\mathrm{E}\,\|\cdot\|^r\right)^{1/r}$. The Kronecker product of
two matrices is denoted by ⊗, and diag(di ) denotes the diagonal matrix having di ,
i = 1, . . . , N , as diagonal elements. Finally, C is a generic constant whose value may
differ from occurrence to occurrence.
2 Testing for no cross-correlation

2.1 Information matrix equality testing
To motivate our tests, let us first analyze the homogenous panel data model
$$y_{i,t} = \alpha + \mathbf{x}_{i,t}'\boldsymbol\beta + u_{i,t}, \qquad i = 1,\dots,N,\; t = 1,\dots,T, \qquad (1)$$
where the zero-mean disturbances ui,t are assumed to be independent of the regressors
xi,t . We work with a total number of K regressors including the intercept, so xi,t ∈
RK−1 . We shall relax the assumptions on α and β later on.
For deriving the likelihood function, we assume normality and independence of the
disturbances, ui,t ∼ iidN (0, σ 2 ). Normality is only required to justify the test statistics; we shall establish their limiting behavior under considerably weaker conditions,
including e.g. cross-sectional heteroskedasticity; see Subsection 2.3.
Let now y i denote the T × 1 vector containing the observations for cross-section i,
Xi be the T × K regressor matrix {xi,t,k } with xi,t,1 = 1, and X the matrix stacking
the N individual regressor matrices Xi. The vectors ui are the T × 1 individual-unit disturbance vectors, and the N T-dimensional vector u contains the stacked N
individual-unit disturbances ui . Correspondingly, ut = (u1,t , . . . , uN,t )0 denotes the
cross-section of errors at time t. Finally, the contemporaneous covariance matrix of
the errors is given by E (ut u0t ) = Σ = {σij }i,j=1,...,N . Under the null, Σ = Σ0 = σ 2 IN ,
whereas under the alternative we use the parameterization Σ = σ 2 Ω with Ω positive
definite and normalized such that tr Ω = N . The null is recovered when Ω = IN .
Then, conditional on the regressors, the log-likelihood of model (1) is given under
the null of no cross-sectional correlation by
$$\ell = C - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(y_{i,t} - \alpha - \mathbf{x}_{i,t}'\boldsymbol\beta\right)^2,$$
from which the score and the Hessian can be derived,
$$\mathbf{s} = \frac{1}{\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}\begin{pmatrix} u_{i,t} \\ \mathbf{x}_{i,t}u_{i,t}\end{pmatrix} = \frac{1}{\sigma^2}\,X'\mathbf{u}$$
$$H = -\frac{1}{\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}\begin{pmatrix} 1 & \mathbf{x}_{i,t}' \\ \mathbf{x}_{i,t} & \mathbf{x}_{i,t}\mathbf{x}_{i,t}'\end{pmatrix} = -\frac{1}{\sigma^2}\,X'X
$$
when treating for simplicity σ 2 as known.
When not restricting Σ, the covariance matrix of the score is given by
$$\mathrm{Cov}(\mathbf{s}|X) = \frac{1}{\sigma^2}\,X'\left(\Omega\otimes I_T\right)X = \frac{1}{\sigma^4}\,X'\left(\Sigma\otimes I_T\right)X;$$
under the null of no cross-sectional correlation and homoskedasticity, the information matrix equality holds,
$$\frac{1}{\sigma^4}\,X'\left(\Sigma\otimes I_T\right)X = \frac{1}{\sigma^2}\,X'X,$$
with Σ = Σ0 = σ 2 IN being correctly specified. Under the alternative, the equality does
not hold, and a test statistic for sphericity is immediately obtained by plugging in an
unrestricted estimate of Σ, say Σ̂, and a restricted estimate of σ 2 , say σ̂ 2 . One rejects
the null when the equality is significantly violated, i.e. when
$$X'\left(\hat\Sigma\otimes I_T\right)X - \hat\sigma^2 X'X = \sum_{i=1}^{N}\sum_{j=1}^{N}\left(\hat\sigma_{ij} - \hat\sigma^2\,\mathbf{1}(i=j)\right)X_i'X_j, \qquad (2)$$
with 1(·) the indicator function, is significantly different from zero.
In this respect we obtain an alternative to the John test recently discussed by
Baltagi et al. (2011). Unlike the case of the John test, however, it is straightforward
to test against cross-sectional correlation only. We simply need to check whether the
terms not involving σii of the difference in (2) are zero or not, i.e. we focus on whether
$$\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}\hat\sigma_{ij}\,X_i'X_j = \sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat\sigma_{ij}\left(X_i'X_j + X_j'X_i\right)$$
is significantly different from zero. In fact, we only need to look at the lower triangular
elements of the matrix on the r.h.s. due to its symmetry, so our directed tests rely on
$$S = \sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat\sigma_{ij}\,\mathrm{vech}\left(X_i'X_j + X_j'X_i\right), \qquad (3)$$
where σ̂ij are suitable estimators of σij and vech is the half-vec operator.
Having derived the statistic S on the basis of a homogenous panel data model,
one possibility to compute error covariance estimates σ̂ij is via pooled OLS residuals.
Indeed, a number of cross-sectional dependence tests rely on pooled or pooled fixed-effects residuals (e.g. Baltagi et al., 2011, 2012). Other procedures, such as the ones
of Pesaran (2004) or Pesaran et al. (2008), rely however on unit-wise residuals. We
too shall resort to unit-wise residuals in the following; the main reasons to do so are
discussed in more detail in Subsection 2.2 which also addresses the relation of the
directed test to the literature. We therefore let
$$\hat\sigma_{ij} = \frac{1}{T-K}\,\hat{\mathbf{u}}_i'\hat{\mathbf{u}}_j = \frac{1}{T-K}\,\mathbf{y}_i'M_{X_i}M_{X_j}\mathbf{y}_j$$
with $M_{X_i} = I_T - X_i\left(X_i'X_i\right)^{-1}X_i'$.²
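As an illustration of this choice, the unit-wise residuals and the degrees-of-freedom-corrected covariance estimates can be sketched in a few lines. This is a minimal NumPy sketch; the array layout and function names are ours, not the paper's.

```python
import numpy as np

def unitwise_residuals(y, X):
    """OLS residuals u_hat_i = M_{X_i} y_i, computed unit by unit.

    y: (N, T) array of dependent-variable observations.
    X: (N, T, K) array of regressors (first column a constant).
    Returns an (N, T) array of residuals.
    """
    N, T, K = X.shape
    resid = np.empty((N, T))
    for i in range(N):
        # least-squares fit for unit i; residuals are orthogonal to X_i
        beta_i, *_ = np.linalg.lstsq(X[i], y[i], rcond=None)
        resid[i] = y[i] - X[i] @ beta_i
    return resid

def sigma_hat(resid, K):
    """Degrees-of-freedom-corrected estimates
    sigma_hat_{ij} = u_hat_i' u_hat_j / (T - K), as an (N, N) matrix."""
    T = resid.shape[1]
    return resid @ resid.T / (T - K)
```

The degrees-of-freedom correction by T − K rather than T matches the footnoted remark: it improves finite-sample behavior without affecting the asymptotics.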
It may be interesting to note that, analogously, a cross-sectional heteroskedasticity test can be based on $\sum_{i=1}^{N}\left(\hat\sigma_{ii} - \hat\sigma^2\right)\mathrm{vech}\left(X_i'X_i\right)$; this is essentially a White heteroskedasticity test against cross-sectional heteroskedasticity.
In order to decide on significance, the statistic S has yet to be normalized.
Lemma 1 in the Appendix indicates that
$$\mathrm{Cov}(S) = \frac{1}{(T-K)^2}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\sigma_{ii}\sigma_{jj}\,\mathrm{E}\!\left[\operatorname{tr}\!\left(M_{X_i}M_{X_j}\right)\mathrm{vech}\!\left(X_i'X_j + X_j'X_i\right)\mathrm{vech}\!\left(X_i'X_j + X_j'X_i\right)'\right],$$
provided that the disturbances are independent of the regressors and that the moments
exist. (Precise assumptions about the components of our model are provided and
discussed in the following subsection.)
²Alternatively, one could use the (Q)ML estimator, which is not corrected for degrees of freedom; the correction, however, improves the finite-sample properties of the test statistic while not affecting the asymptotics.
A natural estimator for the covariance matrix of S is then given by
$$\hat V = \frac{1}{(T-K)^2}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat\sigma_{ii}\hat\sigma_{jj}\operatorname{tr}\!\left(M_{X_i}M_{X_j}\right)\mathrm{vech}\!\left(X_i'X_j + X_j'X_i\right)\mathrm{vech}\!\left(X_i'X_j + X_j'X_i\right)',$$
where, as above, σ̂ij is a consistent estimator of σij based on ûi .
Standardizing S with its estimated covariance matrix leads to the following statistic
for testing the null of no cross-sectional correlation:
$$CD_X^{\sigma} = S'\hat V^{-1}S.$$
The following subsection will analyze its relation to the test of Pesaran (2004) and argue
that it delivers a directed test of cross-section independence: should S be significantly
different from zero, one rejects the null of no cross-sectional correlation in favor of
cross-correlation that affects inference on the slope parameters β.
The statistic $CD_X^{\sigma}$ was derived under the assumption of cross-unit homoskedasticity.
While we prove in Section 2.3 that the statistic is robust to cross-unit heteroskedasticity,
it may be useful to examine a variant which explicitly accounts for heteroskedasticity to
begin with. Under the null hypothesis of no cross-unit correlation, we have E (ut u0t ) =
$\Sigma_0 = \mathrm{diag}(\sigma_{ii})$, so the score becomes
$$\mathbf{s} = X'\left(\Sigma_0^{-1}\otimes I_T\right)\mathbf{u}$$
and the Hessian is given by
$$H = -X'\left(\Sigma_0^{-1}\otimes I_T\right)X.$$
At the same time, the covariance matrix of the score is given under the alternative by
$$\mathrm{Cov}\left(\mathbf{s}|X\right) = X'\left(\Sigma_0^{-1}\otimes I_T\right)\left(\Sigma\otimes I_T\right)\left(\Sigma_0^{-1}\otimes I_T\right)X.$$
By having allowed for cross-unit heteroskedasticity, the diagonals of Cov (s|X) and
−H are equal, so, by focussing again on the lower triangular elements and plugging in
the corresponding estimates, we obtain
$$S^w = \sum_{i=2}^{N}\sum_{j=1}^{i-1}\frac{\hat\sigma_{ij}}{\hat\sigma_{ii}\hat\sigma_{jj}}\,\mathrm{vech}\left(X_i'X_j + X_j'X_i\right) \qquad (4)$$
as basis for a no cross-correlation test. Analogously, its variance can be estimated by
$$\hat V^w = \frac{1}{(T-K)^2}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\frac{1}{\hat\sigma_{ii}\hat\sigma_{jj}}\operatorname{tr}\!\left(M_{X_i}M_{X_j}\right)\mathrm{vech}\!\left(X_i'X_j + X_j'X_i\right)\mathrm{vech}\!\left(X_i'X_j + X_j'X_i\right)',$$
leading to
$$CD_X^{w} = (S^w)'\left(\hat V^w\right)^{-1}S^w.$$
This statistic can be seen, with a mild abuse of terminology, as a “WLS variant” of $CD_X^{\sigma}$, since
$$S^w = \sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat\rho_{ij}\,\mathrm{vech}\!\left(\frac{1}{\sqrt{\hat\sigma_{ii}}}X_i'\,\frac{1}{\sqrt{\hat\sigma_{jj}}}X_j + \frac{1}{\sqrt{\hat\sigma_{jj}}}X_j'\,\frac{1}{\sqrt{\hat\sigma_{ii}}}X_i\right)$$
is, up to a negligible³ term, the statistic (3) computed in the WLS-transformed model
$$\frac{y_{i,t}}{\sqrt{\sigma_{ii}}} = \alpha\,\frac{1}{\sqrt{\sigma_{ii}}} + \frac{\mathbf{x}_{i,t}'}{\sqrt{\sigma_{ii}}}\,\boldsymbol\beta + \frac{u_{i,t}}{\sqrt{\sigma_{ii}}}.$$
We compare the variants theoretically and in finite samples in Sections 2.3 and 3, but
not before introducing a correlations-based version with quite a nice interpretation.
2.2 Discussion
To put the new tests in relation with existing ones, let us now consider a third variant
of the directed tests obtained by replacing the estimated covariances in S, cf. Equation
(3), by estimated correlation coefficients:
$$R = \sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat\rho_{ij}\,\mathrm{vech}\left(X_i'X_j + X_j'X_i\right), \quad\text{where } \hat\rho_{ij} = \frac{\hat{\mathbf{u}}_i'\hat{\mathbf{u}}_j}{\left(\hat{\mathbf{u}}_i'\hat{\mathbf{u}}_i\right)^{1/2}\left(\hat{\mathbf{u}}_j'\hat{\mathbf{u}}_j\right)^{1/2}}. \qquad (5)$$
In practice, Cov(R) can be approximated using
$$\hat V_R = \frac{1}{T}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\mathrm{vech}\left(X_i'X_j + X_j'X_i\right)\mathrm{vech}\left(X_i'X_j + X_j'X_i\right)',$$
since $\mathrm{E}(\hat\rho_{ij}^2) \approx T^{-1}$ for large T and iid sampling. The resulting statistic is denoted as
$$CD_X^{\rho} = R'\hat V_R^{-1}R.$$
³Under the conditions of Proposition 2 further below.
Under homoskedasticity, the three variants $CD_X^{\sigma}$, $CD_X^{w}$ and $CD_X^{\rho}$ can be checked to be asymptotically equivalent; see the proof of Proposition 2 in the appendix. Under cross-sectional heteroskedasticity, however, we document power gains of the WLS version in Section 3, which discusses the finite-sample properties of our tests.
Note that, if the model only includes an intercept, the correlation-based form of
our test statistic reduces to
$$CD_X^{\rho} = \frac{2T}{N(N-1)}\left(\sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat\rho_{ij}\right)^{2},$$
which is the square of the test statistic proposed by Pesaran (2004). The directed tests
may thus be seen as an extension of Pesaran’s test idea.
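For concreteness, Pesaran's CD statistic, whose square the intercept-only version of our statistic equals, might be computed as follows. This is a NumPy sketch; the function name is ours.

```python
import numpy as np

def pesaran_cd(resid):
    """Pesaran's (2004) CD statistic from an (N, T) array of residuals.

    CD = sqrt(2T / (N(N-1))) * sum_{i<j} rho_hat_{ij};
    asymptotically N(0,1) under the null, so CD**2 corresponds to the
    intercept-only CD_X^rho (asymptotically chi2 with 1 degree of freedom).
    Note: np.corrcoef demeans the rows, which is harmless here because
    residuals from regressions with an intercept are mean-zero anyway.
    """
    N, T = resid.shape
    R = np.corrcoef(resid)                 # pairwise Pearson correlations
    iu = np.triu_indices(N, k=1)
    return np.sqrt(2.0 * T / (N * (N - 1))) * R[iu].sum()
```

As the surrounding text notes, CD loses power when the pairwise correlations roughly sum to zero, since positive and negative terms cancel inside the sum.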
Then, to understand what the additional terms in our statistic stand for, let us now
examine the standard fixed-effects panel data model
$$y_{i,t} = \alpha_i + \mathbf{x}_{i,t}'\boldsymbol\beta + u_{i,t}, \qquad i = 1,\dots,N. \qquad (6)$$
By letting ȳ i and X̄i denote the variables after within-group demeaning, we obtain by
stacking equations
ȳ = X̄β + ū,
with ȳ = (ȳ 01 , . . . , ȳ 0N )0 and otherwise obvious notation.
Under serial independence of the disturbances (we make this assumption explicit in
the following subsection), the covariance matrix of the OLS estimator of β is given by
−1
−1 0
.
X̄ (Σ ⊗ IT ) X̄ X̄ 0 X̄
Cov β̂|X = X̄ 0 X̄
(Under serial error dependence, clustered standard errors would have been the suitable
choice; see e.g. Driscoll and Kraay, 1998.) Under the null hypothesis of no cross-sectional correlation we have that σij = 0 for all i ≠ j, and $\Sigma = \Sigma_0 = \mathrm{diag}(\sigma_{ii})$. Comparing the covariance matrix of the fixed-effects OLS estimator with the one obtained
assuming no cross-unit correlation, we should have equality under the null,
$$\bar X'\left(\Sigma\otimes I_T\right)\bar X = \bar X'\left(\Sigma_0\otimes I_T\right)\bar X. \qquad (7)$$
Checking whether the equality is not significantly violated in the sample (i.e. comparing
the usual heteroskedasticity-robust with the heteroskedasticity and cross-correlation robust, or panel-robust, covariance matrix estimator) takes us to a test directed at detecting cross-unit correlation affecting inference on the slope parameters in the fixed-effects
framework. We thus obtain essentially the same test statistic as before, say $CD_{\bar X}^{\sigma}$, derived from the information matrix equality principle with intercepts concentrated out.
Moreover, this fixed-effects variant is nothing else than the complement of Pesaran's (2004) CD test: the CD test focusses on cross-correlations “leveraged” by intercepts only, while $CD_{\bar X}^{\rho}$ leverages with cross-products of mean-adjusted regressors, which are by
construction orthogonal to the intercepts. Also, since S essentially consists of weighted
pairwise error cross-unit covariances, the implicit null of our test is closely related to
that of the CD test as discussed by Pesaran (2015). Actually, an adaptation of the
arguments of Pesaran (2015) to the setup of Proposition 1 would show that the CD test
and the directed tests have the same implicit null, i.e. an exponent of cross-sectional
dependence $0 \le \alpha < (2-\epsilon)/4$ when $T = O(N^{\epsilon})$.⁴ In brief, the $CD_X$ tests include the
initial CD test, and the CDX leverages (the expectation of the product of regressors
from different units) are either zero or constant, so there is no qualitative difference
in the kind of hypotheses that can be detected by $CD$ and $CD_X$. The simulations in Section 3 show the behavior of the CD tests to be the same under some data generating processes exhibiting weak cross-dependence.
To conclude the discussion on the construction and interpretation of the new tests,
let us examine the issue of which residuals to use for computing the error covariance
estimates σ̂ij in more detail.
On the one hand, the interpretation as directed tests is based on a homogenous
or at most a fixed-effects panel data model. On the other hand, the directed tests
only fit the framework of Pesaran (2004) when using unit-wise residuals. Unit-wise
residuals have the additional advantage of being consistent in panels with coefficient
heterogeneity, whereas the pooled (fixed-effects) OLS residuals would contain a component due to heterogeneity. Under regressor cross-unit dependence, this component
is cross-sectionally correlated, so, unless one is specifically interested in detecting such
types of cross-dependence, heterogeneity turns out to be a nuisance when detecting
error cross-correlation.
To sum up, using residuals from individual regressions has the advantage of robustifying the cross-correlation testing procedure against parameter heterogeneity; the
pooled residuals may be used too, should parameter heterogeneity not be of concern.⁵
The following subsection derives the asymptotic limiting distribution of the three
variants of our test and discusses the assumptions under which we work.
⁴The implicit null is different when T ∼ CN, but the correlation-based versions require some restrictions on the relative rates at which N and T go to infinity which exclude proportionality.
⁵The results provided in the following subsection build on unit-wise residuals. The proof of Proposition 1 first shows that the estimation effect is negligible, and then establishes the limiting distribution of the statistic building on the true disturbances. With pooled residuals converging at a higher rate under the null of no cross-correlation, Proposition 1 arguably holds for pooled residuals as well.
2.3 Limiting behavior
Let us now state the conditions for the panel data generating process. The assumptions
on the disturbances are standard in the panel data literature.
Assumption 1. The disturbances $u_{i,t}$ are independent of the regressors $x_{i',t',k}$ for all $1 \le i, i' \le N$, $1 \le t, t' \le T$ and $1 \le k \le K$, and satisfy
1. $u_{i,t} = \sqrt{\sigma_{ii}}\,\epsilon_{i,t}$ with $\lim_{N\to\infty}\inf_{1\le i\le N}\sigma_{ii} > 0$ and $\lim_{N\to\infty}\sup_{1\le i\le N}\sigma_{ii} < \infty$;
2. $\epsilon_{i,t} \sim \mathrm{iid}(0,1)$ over both i and t with $\mathrm{E}\,\epsilon_{i,t}^8 < \infty$.
The first condition allows for cross-unit heteroskedasticity and excludes the possibility that some units dominate in the limit. The second is e.g. implied by the normality
of the disturbances assumed by Pesaran et al. (2008) or Baltagi et al. (2012), but does
not restrict the distribution of the disturbances beyond typical moment restrictions.
In what concerns the regressors, we essentially require mild forms of uniformity of their properties as N → ∞. Without such uniformity or analogous conditions, there may not be a limit as N, T → ∞ jointly. Before stating the assumption, we introduce some auxiliary notation that helps dealing with the fact that some of the cross-product moments of the regressors from different units may converge to zero, while others may converge to a non-zero constant. In other words, different regressors from different units may, but do not have to, be uncorrelated. Let $\mathbf{v}_T^{ij} = \frac{1}{\sqrt T}\,D_T^{-1}\,\mathrm{vech}\left(X_i'X_j + X_j'X_i\right)$, where $D_T$ is a $\frac12 K(K+1)\times\frac12 K(K+1)$ diagonal matrix whose diagonal elements are either $1$, when $\mathrm{E}\left(\sum_{t=1}^{T}x_{i,t,k}x_{j,t,k^*} + \sum_{t=1}^{T}x_{i,t,k^*}x_{j,t,k}\right) = 0$, or $\sqrt T$, when the respective expectation is nonzero. We implicitly assume that the
regressors only exhibit short-range dependence. Long-range dependence can easily be
dealt with using a different standardization, but we omit the details here; integration
or cointegration of the regressors is however excluded since it would lead to stochastic
limiting behavior of the sample cross-product averages.
Let us furthermore agree on the notation that the mth element of $\mathbf{v}_T^{ij}$, $m = 1,\dots,\frac12 K(K+1)$, is given by $v_{T,m}^{ij} = \frac{1}{T}\left(\sum_{t=1}^{T}x_{i,t,k}x_{j,t,k^*} + \sum_{t=1}^{T}x_{i,t,k^*}x_{j,t,k}\right)$ or $v_{T,m}^{ij} = \frac{1}{\sqrt T}\left(\sum_{t=1}^{T}x_{i,t,k}x_{j,t,k^*} + \sum_{t=1}^{T}x_{i,t,k^*}x_{j,t,k}\right)$ for suitable k and $k^*$.
Assumption 2. The regressors $x_{i,t,k}$, $k = 2,\dots,K$, are stochastic and satisfy
1. $\Pr\left(\lim_{N,T\to\infty}\sup_{1\le i\le N}\left\|\left(\frac{X_i'X_i}{T}\right)^{-1}\right\| < \infty\right) = 1$;
2. $\lim_{N,T\to\infty}\sup_{1\le i\le N;\,1\le t\le T}\mathrm{E}\left(x_{i,t,k}^{6}\right) < \infty$ for all $k = 1,\dots,K$;
3. $\lim_{N,T\to\infty}\sup_{1\le i,j\le N;\,1\le t\le T}\mathrm{E}\left(\left(v_{T,m}^{ij}\right)^{4}\right) < \infty$ for all $m = 1,\dots,\frac12 K(K+1)$;
where $\|\cdot\|$ is the matrix norm induced by the Euclidean vector norm and $v_{T,m}^{ij}$ is the standardized cross-product of the regressors as defined above. Furthermore, the space spanned by the vectors $\mathrm{E}\,\mathbf{v}_T^{ij}$ has dimension $\frac12 K(K+1)$.
The first condition stated in the assumption is standard for stochastic regressors and
just formalizes the requirement that the moment matrix of the regressors is invertible
in each unit of the panel. Depending on the distributional properties of the regressors,
it may imply some restrictions on the relative rates at which N and T are allowed to
diverge, but the restrictions are not obvious. Considering e.g. independent units, the
probability to observe regressor moment matrices in the neighbourhood of singularity
is the key quantity: the faster it vanishes in T , the faster N can grow. At the other
end of the scale, should the regressors be common to all units (i.e. extreme regressor
cross-dependence), it suffices that the condition be fulfilled in T, a case which is well-understood from standard regression analysis, and there is no relative rate restriction.
The second and third conditions are typical moment restrictions: the second is
similar to the moment condition on the disturbances, while the third focuses on the
cross-product sample moments of the regressors in a given unit and requires a specific
form of uniformity of their convergence. For instance, a factor model such as
xi,t = Λi f t + ei,t
with $\Lambda_i$ a $(K-1)\times L$ matrix of loadings and the factors independent of the idiosyncratic errors, generates regressors satisfying these conditions under suitable moment conditions on the components $\mathbf{f}_t$ and $\mathbf{e}_{i,t}$ and uniformity conditions on the loadings $\Lambda_i$, as can easily be checked.
Finally, no independence across the panel is required and the regressors may be
allowed to be common as long as the dimensionality condition is fulfilled. The condition
ensures the covariance matrix V̂ to be well-behaved; see Remark 1 for how to deal with
the situation where the condition is violated. The requirement that the regressors be
stochastic for k = 2, . . . , K simplifies proofs and notation, but is not essential for the
proofs. In fact one can treat deterministic regressors as stochastic ones, provided that
the sequence of regressor values behaves, in terms of distributions, like realizations of
a stochastic regressor obeying the assumption; see e.g. Amemiya (1985, Chapter 4).
The two assumptions allow us to establish a χ2 limiting distribution for the three
variants of the proposed directed test.
Proposition 1. Under the above assumptions, we have as $N, T \to \infty$ that
$$CD_X^{\sigma} \xrightarrow{d} \chi^2_{\frac12 K(K+1)}.$$
Proof: see the Appendix.
Remark 1. The covariance matrix V is not always well-behaved, a leading example
being the case where the regressors are common across units. There are two possibilities
of dealing with a rank-deficient matrix V. The first would be to simply exclude some
redundant regressor cross-products. The second relies on the work of Andrews (1987)
and amounts to using a generalized inverse of an estimator of V , but one which ensures
that the rank of V̂ converges to the true rank of V : the limiting distribution then remains
chi-squared, but with rk V degrees of freedom. We provide simulation evidence that the
use of the Moore-Penrose inverse in the extreme case of common regressors (i.e. unit rank of V and V̂), together with $\chi^2_1$ critical values, works reliably.
Remark 2. Assumption 1 requires strict exogeneity of the disturbance terms. It can be
seen from the proof of the proposition that relaxing this to weak exogeneity, say, is difficult, since the limiting null distribution relies on uncorrelatedness of the disturbances
and cross-moments of the regressors. This prevents the application of the directed tests
in dynamic panels, for instance. It is possible though to eliminate the regressors that
are not strictly exogenous from the vector of cross-moments; the CD test of Pesaran
(2004), which has been proved to work in dynamic panels under certain circumstances,
can be seen as such an “exogenized” statistic.
Remark 3. Halunga, Orme, and Yamagata (2012) bootstrap the LM test of Breusch
and Pagan (1980) to obtain, besides an improved behavior for large N , robustness to
heteroskedasticity in the time dimension as well. Along these lines, an examination of
the proof of Proposition 1 reveals that an Eicker-White type covariance matrix estimator
(Eicker, 1967; White, 1980) is given by
$$\tilde V = \frac{1}{T}\sum_{t=1}^{T}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\sum_{k=2}^{N}\sum_{l=1}^{k-1}\hat u_{j,t}\hat u_{i,t}\hat u_{k,t}\hat u_{l,t}\,\mathrm{vech}\left(X_i'X_j + X_j'X_i\right)\mathrm{vech}\left(X_k'X_l + X_l'X_k\right)'.$$
Using Ṽ instead of V̂ thus robustifies $CD_X^{\sigma}$ against conditional as well as unconditional heteroskedasticity in the time dimension.
In what concerns the WLS and the correlation-based versions of our test, we need
to seriously restrict the rate at which N may grow to infinity. The finite-sample
experiments in Section 3 suggest that the restriction is only binding for very small
T and large N . The literature sometimes assumes symmetry of the disturbances to
relax such rate restrictions; we do not find the assumption plausible in general and
rather recommend the use of $CD_X^{\sigma}$ when T is dangerously small.
Proposition 2. Under the additional assumption that $N^3/T \to 0$, we have as $N, T \to \infty$ that
$$CD_X^{\rho} = R'\hat V_R^{-1}R \xrightarrow{d} \chi^2_{\frac12 K(K+1)}$$
and
$$CD_X^{w} = (S^w)'\left(\hat V^w\right)^{-1}S^w \xrightarrow{d} \chi^2_{\frac12 K(K+1)}.$$
Proof: see the Appendix.
Remark 4. Results similar to Propositions 1 and 2 hold as well for the test variants building on X̄ rather than on X, yet with $\frac12(K-1)K$ degrees of freedom.
3 Finite-sample behavior
In this section, the empirical size and power of the tests for cross-section independence
are assessed by means of Monte Carlo experiments. We first present the competing
procedures to keep the paper self-contained. The simulation scenarios are described in
Section 3.2, and we discuss the results in Section 3.3.
3.1 Alternative test procedures
For completeness, we start with the LM test of Breusch and Pagan (1980), although,
because of its known severe size distortions when N is large relative to T (cf. Pesaran et al., 2008, or Moscone and Tosetti, 2009), its use is not recommended for N
comparable with, or larger than, T .
The LM test of Breusch and Pagan (1980) builds on the statistic
$$LM = \sqrt{\frac{1}{N(N-1)}}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\left(T\hat{\rho}_{ij}^2 - 1\right),$$
with $\hat{\rho}_{ij}$ defined as in (5). Under the null hypothesis and for fixed $N$ and $T \to \infty$, $LM$ approaches a standard normal distribution. However, for small $T$, $T\hat{\rho}_{ij}^2 - 1$ is not centered at 0. For a large cross-sectional dimension (relative to the time dimension), this can lead to substantial overrejection.
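For concreteness, the scaled statistic can be computed from a $T \times N$ matrix of residuals in a few lines. This is a hedged numpy sketch of our own: the residual-matrix interface is an illustrative assumption, and $\hat{\rho}_{ij}$ is taken as the plain sample correlation over the time dimension.

```python
import numpy as np

def lm_statistic(uhat):
    """Scaled Breusch-Pagan LM statistic from a (T x N) residual matrix.

    uhat[:, i] holds the T residuals of cross-section unit i; rho_hat_ij is
    the sample correlation of columns i and j.
    """
    T, N = uhat.shape
    rho = np.corrcoef(uhat, rowvar=False)    # N x N residual correlations
    iu = np.triu_indices(N, k=1)             # all pairs i < j, counted once
    return np.sqrt(1.0 / (N * (N - 1))) * np.sum(T * rho[iu] ** 2 - 1)

rng = np.random.default_rng(0)
u = rng.standard_normal((50, 10))            # independent units under the null
print(lm_statistic(u))                       # approximately N(0, 1) under the null
```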
Frees' (1995) rank correlation test is given by
$$RAVE = \frac{2}{N(N-1)}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat{r}_{ij}^2,$$
where $\hat{r}_{ij}$ is the Spearman rank correlation coefficient between the residuals $\hat{u}_i$ and $\hat{u}_j$. Frees (1995) derives the limit distribution of $Q_N = N\left(RAVE - (T-1)^{-1}\right)$ for the case that the intercept is the only regressor in (6). The resulting limit, $Q$, is a weighted sum of two independent $\chi^2$ random variables. Because of the dependence on $T$, critical values are cumbersome to compute. When $T$ is not small, Frees (1995) suggests using the following approximately normally distributed statistic
$$FRE = \frac{Q_N}{\sqrt{\operatorname{Var}(Q)}}, \quad \text{where} \quad \operatorname{Var}(Q) = \frac{4(T-2)(25T^2 - 7T - 54)}{25T(T-1)^3(T+1)}.$$
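The statistic can be sketched with scipy's Spearman correlations; again the residual-matrix interface is our own illustrative assumption.

```python
import numpy as np
from scipy.stats import spearmanr

def frees_statistic(uhat):
    """Frees' FRE statistic from a (T x N) residual matrix (N >= 3, sketch)."""
    T, N = uhat.shape
    r, _ = spearmanr(uhat)                   # N x N Spearman rank correlations
    iu = np.triu_indices(N, k=1)
    rave = 2.0 / (N * (N - 1)) * np.sum(r[iu] ** 2)
    q_n = N * (rave - 1.0 / (T - 1))
    var_q = 4 * (T - 2) * (25 * T**2 - 7 * T - 54) / (25 * T * (T - 1) ** 3 * (T + 1))
    return q_n / np.sqrt(var_q)
```

Since ranks are invariant to monotone transformations of the residuals, the statistic is robust to heavy tails in a way the plain LM statistic is not.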
The adjusted LM test: Pesaran et al. (2008) compute the exact finite-sample expectation and variance of $\hat{\rho}_{ij}^2$, imposing that, in addition to Pesaran's assumptions, the errors $\varepsilon_{i,t}$ are normally distributed. Their statistic is given by
$$LM_{adj} = \sqrt{\frac{2}{N(N-1)}}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\frac{(T-K)\hat{\rho}_{ij}^2 - \mu_{Tij}}{\nu_{Tij}},$$
where
$$\mu_{Tij} = \frac{1}{T-K}\operatorname{tr} E\!\left(M_{X_i}M_{X_j}\right), \qquad \nu_{Tij}^2 = \left[\operatorname{tr} E\!\left(M_{X_i}M_{X_j}\right)\right]^2 a_{1T} + 2\operatorname{tr} E\!\left[\left(M_{X_i}M_{X_j}\right)^2\right] a_{2T},$$
with
$$a_{1T} = a_{2T} - \frac{1}{(T-K)^2} \quad \text{and} \quad a_{2T} = 3\left[\frac{(T-K-8)(T-K+2)+24}{(T-K+2)(T-K-2)(T-K-4)}\right]^2.$$
As $T \to \infty$ followed by $N \to \infty$, $LM_{adj}$ converges to a standard normal distribution. The test controls size much better, even when $N$ is large relative to $T$.
The bias-corrected LM test: Baltagi et al. (2012) analyze the asymptotics of the LM test under joint $N, T$ asymptotics where $N/T \to c \in (0, \infty)$. When using the pooled fixed-effects residuals, $\tilde{u} = M_{\bar{X}}u$, they prove that the statistic
$$LM_{bc} = \sqrt{\frac{1}{N(N-1)}}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\left(T\tilde{\rho}_{ij}^2 - 1\right) - \frac{N}{2(T-1)},$$
with $\tilde{\rho}_{ij}$ the correlation of $\tilde{u}_i$ and $\tilde{u}_j$, has a limiting standard normal distribution.^6 As for the correction of Pesaran et al. (2008) or the John test below, normality is required. The size control is again substantially better than that of the initial LM statistic; see Baltagi et al. (2012).
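The bias correction is a one-line adjustment to the scaled LM statistic; a sketch, assuming the fixed-effects residuals are collected in a $T \times N$ matrix (our own interface, not the authors' code):

```python
import numpy as np

def lm_bc_statistic(utilde):
    """Bias-corrected LM statistic of Baltagi et al. (2012) from a (T x N)
    matrix of fixed-effects residuals (illustrative sketch)."""
    T, N = utilde.shape
    rho = np.corrcoef(utilde, rowvar=False)  # rho_tilde_ij over time
    iu = np.triu_indices(N, k=1)
    lm = np.sqrt(1.0 / (N * (N - 1))) * np.sum(T * rho[iu] ** 2 - 1)
    return lm - N / (2.0 * (T - 1))          # subtract the N/(2(T-1)) bias term
```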
Pesaran's test: To work around the bias problem of the LM test for finite $T$, Pesaran (2004) suggests using $\hat{\rho}_{ij}$ instead of $\hat{\rho}_{ij}^2$ and considers the statistic
$$CD_P = \sqrt{\frac{2T}{N(N-1)}}\sum_{i=2}^{N}\sum_{j=1}^{i-1}\hat{\rho}_{ij}.$$
He shows that, under his assumptions, $E[\hat{\rho}_{ij}] = 0$ and $CD_P$ approaches a standard normal distribution for $T, N \to \infty$. Indeed, his test has correct empirical size even when $N$ is large relative to $T$; see Section 3. A standard critique of the test is that it lacks power against alternatives under which $\sum_{i=2}^{N}\sum_{j=1}^{i-1}\rho_{ij} \approx 0$.

^6 This contrasts with the previous tests, where the residuals are obtained from separate cross-section regressions. One may conjecture that, applying techniques from Baltagi et al. (2011), the asymptotic results from Section 2.3 remain valid when fixed-effects residuals are employed, but we do not pursue the topic here.
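A sketch of $CD_P$ under the same illustrative residual-matrix interface as above; because the statistic sums the signed correlations, offsetting positive and negative $\hat{\rho}_{ij}$ cancel, which is exactly the power critique just mentioned.

```python
import numpy as np

def cd_p_statistic(uhat):
    """Pesaran's (2004) CD statistic from a (T x N) residual matrix (sketch)."""
    T, N = uhat.shape
    rho = np.corrcoef(uhat, rowvar=False)
    iu = np.triu_indices(N, k=1)
    # signed sum: correlations of opposite sign cancel each other out
    return np.sqrt(2.0 * T / (N * (N - 1))) * np.sum(rho[iu])
```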
The John test: A different approach is taken by Baltagi et al. (2011). Their procedure tests for spherical disturbances; that is, apart from independence, the null hypothesis includes homoskedasticity. Although this only allows for a limited comparison with tests of cross-section independence, the John test is included in the MC experiments, since Baltagi et al. (2011) found a favorable performance relative to $CD_P$ and $LM_{adj}$ under homoskedasticity. The test statistic is given by
$$J = \frac{1}{2}\left[T\left(\frac{1}{N}\operatorname{tr}\tilde{\Omega}\right)^{-2}\frac{1}{N}\operatorname{tr}\tilde{\Omega}^2 - T - N\right] - \frac{N}{2(T-1)}, \tag{8}$$
where $\tilde{\Omega} = \frac{1}{T-K}\sum_{t=1}^{T}\tilde{u}_t\tilde{u}_t'$ and $\tilde{u}_t = (\tilde{u}_{1,t}, \ldots, \tilde{u}_{N,t})'$ contains the residuals for period $t$ from a fixed-effects regression in model (6). A crucial assumption of Baltagi et al. (2011) is that the errors are normally distributed. Then, under $H_0$, the statistic in (8) is asymptotically standard normal as $N, T \to \infty$ with $N/T \to c \in [0, \infty)$.
3.2 Simulation setup
Similar to Pesaran et al. (2008) and Moscone and Tosetti (2009), we use the following data generating process (for $i = 1, \ldots, N$):
$$y_{i,t} = \alpha_i + \beta_1 x_{1,i,t} + \beta_2 x_{2,i,t} + u_{i,t}, \qquad u_{i,t} = \gamma_i f_t + \sigma_i \varepsilon_{i,t},$$
where $f_t \sim iid\, N(0,1)$, $\alpha_i \sim iid\, N(1,1)$, and $\beta_1 = \beta_2 = 1$. Several scenarios are considered for the regressors $x_{l,i,t}$, for the factor loadings $\gamma_i$, and for the variances $\sigma_i^2$ of the idiosyncratic error components.
In addition to Pesaran et al. (2008) and Moscone and Tosetti (2009), we simulate regressors that, thanks to a factor structure, are correlated across cross-sections,
$$x_{l,i,t} = f_{l,t}^{(x)}\gamma_{l,i}^{(x)} + \varepsilon_{l,i,t}^{(x)},$$
where $f_{l,t}^{(x)} \sim iid\, N(0,1)$ and $\varepsilon_{l,i,t}^{(x)} \sim iid\, N(0, 0.1)$. This makes the DGP a factor-augmented panel data model without correlation between the common components of the regressors and of the errors.
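A minimal simulation of this DGP can be sketched as follows; the function name, array shapes, and the homoskedastic baseline ($\sigma_i = 1$) are our own illustrative assumptions, not the authors' code.

```python
import numpy as np

def simulate_panel(N, T, gamma_loading=0.0, seed=0):
    """One draw from the Section 3.2 DGP with moderate regressor
    cross-dependence (sketch). Returns y (T x N) and x (T x N x 2)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(1.0, 1.0, size=N)            # alpha_i ~ N(1, 1)
    f = rng.standard_normal(T)                      # common error factor f_t
    gamma = np.full(N, gamma_loading)               # error loadings gamma_i
    # factor-structure regressors: x_{l,i,t} = f^x_{l,t} gamma^x_{l,i} + eps^x
    fx = rng.standard_normal((T, 2))
    gx = rng.uniform(0.3, 0.7, size=(N, 2))         # moderate cross-dependence
    x = fx[:, None, :] * gx[None, :, :] + rng.normal(0.0, np.sqrt(0.1), (T, N, 2))
    eps = rng.standard_normal((T, N))
    u = np.outer(f, gamma) + eps                    # sigma_i = 1 (homoskedastic)
    y = alpha[None, :] + x[:, :, 0] + x[:, :, 1] + u  # beta1 = beta2 = 1
    return y, x
```

Setting `gamma_loading=0.0` reproduces the null scenario S0 below; nonzero loadings induce error cross-correlation.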
It comes as no surprise that, without such regressor cross-dependence, the $CD_X$ tests have little power, since the terms $\operatorname{vech}(X_i'X_j + X_j'X_i)$ in (3) basically contain empirical covariances between regressors in different cross-sections. But the point of the $CD_X$ tests is to check whether standard inferential procedures, such as the OLS-based $t$-test on slope coefficients, are invalid when neglecting error cross-section correlation.^7
Since the regressor cross-correlation affects the behavior of the $CD_X$ family of tests under the alternative, we shall consider three cases for the regressor loadings $\gamma_{l,i}^{(x)}$. Thus, $\gamma_{l,i}^{(x)} \sim iid\, U(-0.2, 0.2)$, with $U(a,b)$ standing for a uniform distribution on $(a,b)$, captures relatively weak regressor cross-dependence; $\gamma_{l,i}^{(x)} \sim iid\, U(0.3, 0.7)$ stands for moderate regressor cross-dependence; and $\gamma_{l,i}^{(x)} \sim iid\, U(3, 5)$ models strong regressor cross-dependence. We report here results for the case of moderate regressor cross-dependence: weak and strong regressor cross-dependence typically lead to little difference, so we omit them to save space. In some of the cases (namely some of the power studies), the results were different, though not by a large margin; we comment briefly on the additional results in such cases, and the tables are available upon request from the authors.
The "null" scenario, S0, is designed to study size. We start with homoskedastic errors, $\sigma_i^2 = 1$, and simulate the idiosyncratic error components $\varepsilon_{i,t}$ as standard normal. There is no error cross-correlation:

S0: $\gamma_i = 0$ for all $i = 1, \ldots, N$.

The regressors are moderately cross-dependent as described above. The errors fulfill both the symmetry assumption of Pesaran (2004) and the normality assumption of Pesaran et al. (2008) and Baltagi et al. (2011), such that, given the slope parameter homogeneity, all considered tests have correct asymptotic size; this will allow for meaningful power comparisons.
To check the robustness of the results, the baseline scenario S0 is expanded by several cases. We build on the baseline scenario and mention for each additional case only the features that differ. First, we consider a case where the errors are heteroskedastic (with $\sigma_i^2 \sim iid\, \chi_1^2$) in addition to being non-normal, $\varepsilon_{i,t} \sim iid\, (\chi_1^2 - 1)/\sqrt{2}$. Furthermore, building on normality and homoskedasticity again, we consider a case with slope heterogeneity (where the regressor cross-dependence is expected to induce residual cross-correlation), a case where the regressors are common such that $V$ has reduced rank and we may study the behavior of the test when employing a generalized matrix inverse and adjusted critical values (see Remark 1), and two cases of weak cross-correlation. The implementation details are as follows. For the slope heterogeneity case, $\beta_1$ is kept constant and we generate $\beta_{2i} \sim iid\, U(0.5, 1)$; the variability roughly matches that of the heterogeneous simulation setup of Chudik and Pesaran (2013, see Eq. (67)); we did not follow the design of Chudik and Pesaran (2013) since it includes weakly exogenous regressors. For the common regressors case, we simply set $x_{l,i,t} = f_{l,t}^{(x)}$ and $\varepsilon_{l,i,t}^{(x)} = 0$, resort to the Moore-Penrose generalized inverse of $\hat{V}$, and employ $\chi_1^2$ critical values. For the weak cross-dependence designs we followed Chudik and Pesaran (2013) in setting the loadings $\gamma_i$ to zero for $i > [N^\alpha] + 1$; we work with $\alpha = 0.5$ and $\alpha = 0.75$, and $\gamma_i \sim iid\, U(0.1, 0.3)$ for $i \leq [N^\alpha]$, as in the first power scenario below. We also simulated with cross-dependence exponent $\alpha = 0.25$, yet the figures are virtually the same as those for $\alpha = 0.5$ and we do not report them to save space.

^7 Unreported Monte Carlo simulations show that the $t$-test for the null $\beta_1 = 1$ has correct empirical size in the absence of regressor cross-section correlation, even if cross-sectional error correlation is present.
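The weak cross-dependence loadings can be sketched as follows; we use $\gamma_i \neq 0$ only for $i \leq [N^\alpha]$ (the boundary unit is an assumption, as the design leaves it ambiguous), and the function name is our own.

```python
import numpy as np

def weak_loadings(N, alpha, seed=0):
    """Error-factor loadings for the weak cross-dependence designs (sketch):
    gamma_i ~ U(0.1, 0.3) for i <= [N^alpha], zero otherwise, following the
    setup borrowed from Chudik and Pesaran (2013)."""
    rng = np.random.default_rng(seed)
    gamma = np.zeros(N)
    m = int(np.floor(N ** alpha))            # [N^alpha] units carry the factor
    gamma[:m] = rng.uniform(0.1, 0.3, size=m)
    return gamma

g = weak_loadings(100, 0.5)
print(np.count_nonzero(g))                   # 10 of the 100 units load on the factor
```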
In the following scenarios we investigate the power of the tests. The baseline Scenario 1 exhibits positive cross-unit covariances,

S1: $\gamma_i \sim iid\, U(0.1, 0.3)$,

with moderate regressor cross-dependence as described above. The errors are, as in the baseline Scenario 0, homoskedastic and Gaussian.

While, in the three variations of Scenario 1, all covariances $\sigma_{ij}$ are positive, they will approximately net out in Scenario 2, $\sum_{i=2}^{N}\sum_{j=1}^{i-1}\rho_{ij} \approx 0$, thanks to the loadings $\gamma_i$ roughly averaging to zero. It has been argued (cf. Pesaran et al., 2008) that, in the latter case, the $CD_P$ test lacks power, and we examine our tests for similar behavior:

S2: $\gamma_i \sim iid\, U(-0.3, 0.3)$.
The third considered scenario duplicates Scenario 1,

S3: $\gamma_i \sim iid\, U(0.1, 0.3)$,

up to the distribution of the errors, which are now generated as $\varepsilon_{i,t} \sim iid\, (\chi_1^2 - 1)/\sqrt{2}$ with constant unit variances, $\sigma_{ii} = 1$.
For S4, we replicate the baseline Scenario 2,

S4: $\gamma_i \sim iid\, U(-0.3, 0.3)$,

but allow for non-normality as in S3 and switch to heteroskedastic errors (with $\sigma_i^2 \sim iid\, \chi_1^2$) to allow for an assessment of the advantages, if any, of the "WLS variant" $CD_X^w$.
Finally, Scenario 5 works with the "common regressors" DGP considered as a variation of Scenario S0, with the addition of error cross-correlation following the baseline Scenario S1, i.e. the case of positive average loadings of the error factors. The question studied is whether the rank correction required for size control affects the power properties of the CD tests and their effectiveness as a pretest.
For each scenario and varying numbers of cross-sections N and time periods T we
employ 5000 Monte Carlo replications. All tests are conducted at the 5% nominal level.
We also report, for each considered power study, the size of the tests of the null $\beta_1 = 1$ using either the usual or the panel-robust standard errors (see Section 2.2). The choice of the standard error used for the slope parameter test is made according to the outcome of each cross-correlation test considered: a rejection of the null of no cross-correlation prompts the use of panel-robust standard errors instead of standard errors accounting for cross-sectional heteroskedasticity only. We do this because, in terms of plain rejection frequencies, the $CD_X$ tests benefit from the multiplication with the elements of $X_i'X_j + X_j'X_i$, such that large sample covariances of the regressors boost the power of the directed tests. Raw power may therefore not be the best comparison criterion. Rather, it is more informative to find out how cross-dependence tests affect the behavior of subsequent slope parameter tests.
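The pretest-based decision rule can be sketched as follows. This is illustrative only: we use a two-sided standard normal pretest as appropriate for $CD_P$, whereas the $\chi^2$-based $CD_X$ variants would compare against an upper $\chi^2$ quantile, and the standard-error values are hypothetical.

```python
from scipy.stats import norm

def choose_se(cd_stat, se_default, se_robust, level=0.05):
    """Pretest-based choice of standard error for the slope t-test: rejecting
    the null of no cross-correlation triggers the panel-robust SE."""
    reject = abs(cd_stat) > norm.ppf(1 - level / 2)   # two-sided 5% pretest
    return se_robust if reject else se_default

# e.g. a CD statistic of 0.8 keeps the default (heteroskedasticity-only) SE
print(choose_se(0.8, se_default=0.02, se_robust=0.035))
```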
3.3 Simulation results
We begin with the discussion of scenario S0, which does not exhibit cross-unit error correlation. Table 1 gives the empirical rejection frequencies of the compared tests under the null. The LM test is clearly unreliable, even for the largest $T$ considered and the smallest $N$. As expected from the literature, the Frees test behaves much better here, but is still oversized under error skewness and heteroskedasticity (Table 2), with rejection frequencies between 6% and 9%, typically around 7.5%. The adjusted and the bias-corrected LM tests perform even better (4% to 6%), in particular the $LM_b$ test; the figures worsen slightly in Table 2, where sizes of up to 8% emerge for larger $T$ under skewness and heteroskedasticity, and where the John test rejects in 90% to 100% of the cases, which is explained by the departure from the Gaussianity assumption under which its asymptotic properties have been derived (Table 1 with normal errors shows that the John test does hold size when the assumption is met). The behavior of the other tests is barely affected by the changed shape of the error distribution.

The best size control is offered by Pesaran's test $CD_P$, which is virtually at 5% throughout, and by our $CD_X^\rho$ test, which is only marginally more liberal. The $CD_X^\sigma$
  N   T     LM    FRE   LM_a   LM_b   CD_P      J  CD_X^σ  CD_X^ρ  CD_X^w
 10  10   12.1    6.5    6.0    5.2    4.8    6.6    5.2     8.4     8.1
 10  20    7.9    6.3    5.7    5.4    5.3    6.1    5.3     6.6     6.4
 10  30    7.6    6.7    5.9    6.0    5.5    5.7    5.6     6.2     5.8
 10  50    6.7    6.6    5.8    5.6    4.8    5.8    4.3     4.7     5.2
 20  10   26.1    5.9    5.7    5.3    5.2    8.5    6.2     9.2     9.0
 20  20   12.0    5.5    5.3    5.6    5.3    7.0    5.7     7.2     7.0
 20  30    9.2    5.3    5.1    4.8    5.1    6.1    5.2     5.8     6.1
 20  50    7.7    5.6    5.4    5.2    4.7    5.6    5.2     5.8     5.6
 50  10   88.3    5.8    5.9    5.3    5.0    8.7    6.4     9.0     8.9
 50  20   35.1    5.5    5.6    5.3    5.7    6.4    5.6     7.4     7.8
 50  30   19.3    4.6    4.7    4.7    4.7    6.1    5.6     6.5     6.6
 50  50   12.7    5.5    5.4    5.5    5.3    5.9    5.2     5.7     5.7
100  10  100.0    5.5    5.8    4.7    5.1    8.1    6.8     9.1     9.2
100  20   85.1    5.4    5.6    5.0    5.2    6.7    5.7     6.8     6.7
100  30   51.0    5.4    5.5    4.7    5.2    6.5    5.6     6.8     6.7
100  50   25.4    4.9    4.9    4.7    5.7    5.9    5.0     5.5     5.5
200  10  100.0    6.1    6.5    5.2    5.1    9.3    6.5     9.1     9.8
200  20  100.0    5.4    5.6    5.1    5.3    7.2    5.5     6.6     6.8
200  30   97.2    5.4    5.5    5.3    5.3    6.5    5.1     6.1     6.6
200  50   64.6    5.5    5.5    5.4    4.9    5.9    5.5     6.0     5.9

Table 1: Size: homoskedasticity and normal idiosyncratic error components; for further details see the text
  N   T     LM    FRE   LM_a   LM_b   CD_P      J  CD_X^σ  CD_X^ρ  CD_X^w
 10  10   12.7    7.4    7.0    5.9    5.3   92.9    5.2     8.8     7.3
 10  20    9.8    7.9    7.2    7.7    5.1   99.6    5.9     7.0     6.9
 10  30   10.2    9.5    8.7    8.4    5.3   99.9    6.6     7.5     6.6
 10  50    8.9    8.9    8.2    8.5    4.8  100.0    6.3     6.7     6.2
 20  10   25.7    6.3    6.3    6.0    5.1   98.1    5.5     9.2     7.9
 20  20   13.4    7.3    7.1    7.4    4.6   99.9    5.8     6.9     6.3
 20  30   11.3    8.0    7.8    7.1    4.5  100.0    5.7     5.8     6.2
 20  50   11.7    9.5    9.1    8.4    4.7  100.0    5.8     5.9     6.4
 50  10   87.9    6.4    6.5    5.8    5.2   99.7    5.6     9.6     7.9
 50  20   34.6    7.4    7.4    7.1    4.7  100.0    6.0     7.1     6.1
 50  30   22.9    7.7    7.7    7.3    5.3  100.0    6.1     7.0     6.5
 50  50   15.9    7.7    7.7    7.1    4.5  100.0    5.0     5.9     5.6
100  10  100.0    5.8    6.1    6.1    5.2   99.9    6.4     9.2     7.6
100  20   82.4    7.1    7.3    7.7    5.2  100.0    5.3     7.0     6.3
100  30   49.7    7.4    7.5    7.4    5.1  100.0    5.8     6.1     6.0
100  50   27.6    7.4    7.4    7.6    4.4  100.0    5.5     5.8     6.3
200  10  100.0    6.6    7.0    6.2    5.5   99.9    6.7    10.1     8.1
200  20  100.0    6.9    7.0    7.6    4.8  100.0    5.9     6.4     6.4
200  30   95.3    7.9    8.1    8.0    4.9  100.0    5.9     6.6     6.3
200  50   62.7    7.6    7.7    8.1    4.9  100.0    5.1     6.1     5.8

Table 2: Size: skewed heteroskedastic idiosyncratic error components; for details see the text
and $CD_X^w$ variants have good size control when $T$ is not small compared to $N$, as predicted by Proposition 2; for $T = 10$ one should rely on $CD_X^\sigma$, but the $CD_X$ tests are otherwise practically equivalent in terms of size control for all $N$.
Table 3 gives the behavior under slope coefficient heterogeneity. Given the moderate regressor cross-dependence, neglected heterogeneity induces residual cross-correlation. This, however, appears to be quite weak; the tests relying on fixed-effects OLS estimation still hold size. As before, the correlation-based $CD_X$ tests have some difficulties for small $T$.
Table 4 then gives size in the case where the use of a generalized inverse together with $\chi_1^2$ critical values is required. We note that $T = 10$ is too small for the asymptotics to deliver a good finite-sample approximation for the $CD_X$ tests, but otherwise size control is quite good considering the nonstandard asymptotic problem. An interesting finding is that the extreme cross-dependence influences the finite-sample behavior of the LM and Frees tests: since the distortions decrease with $T$, the likely explanation is the difference between the errors $u_{i,t}$ and the residuals $\hat{u}_{i,t}$, which cross-correlate due to the common regressors. The other tests' behavior does not change compared to the baseline S0.
For the last two variations of scenario S0, Tables 5 and 6 illustrate the behavior under weak error cross-dependence. While for $\alpha = 0.5$ size is controlled by all tests that controlled size in the baseline scenario S0, we notice that, for $\alpha = 0.75$, the CD tests tend to reject slightly more often than expected under the null.
To sum up, only the $LM_a$, $LM_b$ and in particular $CD_P$ tests are reliable alternatives to the directed tests in terms of size. Let us now discuss the results under the power scenarios.
An examination of Table 7 shows that, under Scenario S1, only $CD_P$ keeps up with the $CD_X^\rho$ and the $CD_X^w$ tests in terms of power. The covariance-based version $CD_X^\sigma$ is less powerful than the cross-correlation based ones, though not by much, while still being more powerful than the $LM_a$ or $LM_b$ tests.^8 The advantage over the $CD_P$ test is only visible for small $N$.

But it is not the power per se that is most interesting in our setup. Examining Table 8, with the size of the slope parameter test computed with standard errors chosen by the cross-correlation tests, we notice that it can be severely oversized if using $LM_a$ or $LM_b$ as pretests, with sizes e.g. up to 30% for $T = 10$ and $N = 200$. In contrast, the CD tests (including $CD_P$, which can be seen as a particular case of the directed tests; see Section 2.2) all come close to holding the size of the combined testing procedure,

^8 The differences in power depend on the strength of the regressor cross-correlation, but are still visible when the regressor cross-dependence is weak, while the domination of $CD_X^\rho$ and $CD_X^w$ is quite visible when the regressor cross-dependence is strong; the exact figures are available from the authors upon request.
  N   T     LM    FRE   LM_a   LM_b   CD_P      J  CD_X^σ  CD_X^ρ  CD_X^w
 10  10   31.7   20.7    7.2    5.5    6.9    6.8    6.2    13.2    13.4
 10  20   12.0    8.8    4.7    4.5    5.9    6.3    5.3     8.5     8.3
 10  30    9.2    8.6    5.8    5.3    5.2    5.7    4.6     6.5     6.8
 10  50    6.7    6.5    4.5    5.0    5.9    5.2    6.0     7.0     6.8
 20  10   70.9   35.1    7.1    5.1    5.1    8.1    8.2    13.8    13.7
 20  20   26.7   14.8    5.9    5.1    5.0    7.0    5.1     7.7     7.5
 20  30   16.5   11.8    6.3    5.3    5.4    6.1    6.8     8.7     8.2
 20  50   10.6    8.3    5.3    5.9    4.6    7.6    5.1     5.9     6.2
 50  10  100.0   88.2    7.6    5.7    5.8   10.1    9.0    15.5    14.3
 50  20   83.5   39.8    6.1    5.1    6.0    6.4    6.6     8.8     8.4
 50  30   50.4   21.8    4.8    6.4    6.7    7.3    5.9     7.3     7.2
 50  50   25.4   13.5    4.6    6.3    4.2    6.1    4.5     5.5     5.5
100  10  100.0   99.2    8.4    4.9    6.7    9.8   10.0    14.2    16.4
100  20  100.0   83.1    5.9    5.2    4.2    6.7    5.5     8.1     7.8
100  30   95.4   53.5    6.6    5.9    5.8    5.8    6.4     8.8     7.9
100  50   64.2   27.5    5.0    5.3    5.3    5.2    5.5     6.0     6.5
200  10  100.0  100.0   10.6    6.3    6.9    8.3    9.2    14.0    14.1
200  20  100.0   99.6    7.1    5.7    5.7    7.4    6.9     7.5     6.9
200  30  100.0   94.7    6.0    6.7    4.6    6.2    5.5     7.8     7.4
200  50  100.0   97.3    5.8    6.1    4.8    6.1    5.8     5.8     5.7

Table 3: Size: heterogenous coefficients; for details see the text
  N   T     LM    FRE   LM_a   LM_b   CD_P      J  CD_X^σ  CD_X^ρ  CD_X^w
 10  10   60.8   44.1    6.1    4.9    8.3    5.9    4.7     8.3     8.4
 10  20   21.2   17.8    6.1    5.1    6.9    5.3    5.2     6.9     6.6
 10  30   12.6   11.8    4.9    5.3    5.7    6.1    4.2     5.7     5.6
 10  50    9.0    9.0    4.8    5.3    5.3    5.4    4.8     5.3     5.3
 20  10   99.8   92.1    6.0    5.7    8.8    8.2    4.4     8.8     9.7
 20  20   49.8   33.6    6.4    6.4    7.0    7.8    5.8     7.0     7.0
 20  30   28.4   21.3    6.0    6.1    6.0    7.2    4.9     6.0     6.5
 20  50   16.2   12.4    6.3    5.9    5.6    5.4    4.8     5.6     5.4
 50  10  100.0  100.0    6.8    5.4    7.4    8.8    4.6     7.4     8.4
 50  20   99.8   91.7    4.8    4.5    6.2    6.7    4.2     6.2     6.8
 50  30   86.1   59.5    4.0    4.1    6.3    6.2    4.5     6.3     6.2
 50  50   46.6   28.7    5.4    5.7    5.6    6.6    4.5     5.6     5.5
100  10  100.0  100.0    6.0    6.1    7.8    8.5    5.2     7.8     7.5
100  20  100.0  100.0    5.3    5.0    6.4    7.8    5.0     6.4     6.1
100  30  100.0   98.7    5.0    4.7    6.9    6.0    5.5     6.9     6.9
100  50   93.6   70.2    5.5    6.3    6.6    5.4    5.7     6.6     6.0
200  10  100.0  100.0    5.5    4.7    8.5    8.0    6.1     8.5     7.9
200  20  100.0  100.0    5.6    5.0    5.2    6.4    3.3     5.2     5.2
200  30  100.0  100.0    5.2    3.6    5.3    5.4    4.1     5.3     4.8
200  50  100.0  100.0    5.3    4.4    5.1    5.1    5.1     5.0     4.9

Table 4: Size: common regressors prompting the use of a generalized matrix inverse and adjusted degrees of freedom; for details see the text
  N   T      LM    FRE   LM_a   LM_b   CD_P      J  CD_X^σ  CD_X^ρ  CD_X^w
 10  10   11.92   6.72   6.38   4.70   5.34   6.94   5.44    8.12    8.24
 10  20    8.60   6.82   6.34   5.42   4.52   6.76   4.96    6.80    6.94
 10  30    7.90   6.92   6.44   6.32   6.00   6.26   5.26    6.14    6.52
 10  50    6.68   6.62   6.02   5.64   5.00   6.36   5.02    5.46    5.62
 20  10   25.82   6.40   6.52   5.04   5.58   7.92   6.14    9.82    9.22
 20  20   12.62   6.28   6.12   5.22   5.00   6.78   5.30    6.40    6.20
 20  30    9.40   5.82   5.70   5.56   5.60   6.26   5.90    6.44    6.42
 20  50    7.48   5.68   5.46   5.50   5.64   6.24   5.56    6.00    5.66
 50  10   89.46   5.84   6.06   4.46   4.64   8.24   6.32    9.18    8.74
 50  20   34.98   5.62   5.68   5.24   5.04   6.14   5.58    6.96    7.26
 50  30   20.94   4.86   4.84   4.90   4.94   6.52   5.96    7.12    7.10
 50  50   12.34   5.36   5.28   4.62   4.74   5.34   5.66    5.94    6.00
100  10  100.00   6.10   6.18   5.22   5.46   8.42   6.30    8.74    9.04
100  20   84.18   5.76   5.92   5.02   5.44   6.86   6.12    7.34    7.18
100  30   51.06   4.76   4.82   4.96   5.26   6.80   5.88    6.72    6.62
100  50   26.12   5.38   5.40   5.28   4.88   5.54   5.20    5.78    5.46
200  10  100.00   6.32   6.72   4.94   5.24   7.92   7.18    9.26    9.12
200  20  100.00   5.32   5.62   5.16   5.30   7.06   5.56    6.34    6.42
200  30   96.60   6.10   6.20   4.40   4.90   6.60   5.50    5.70    5.50
200  50   64.00   4.65   4.75   4.70   5.05   5.65   5.10    5.30    5.35

Table 5: Size: Weak cross-dependence of exponent α = 0.5; for details see the text
  N   T      LM    FRE   LM_a   LM_b   CD_P      J  CD_X^σ  CD_X^ρ  CD_X^w
 10  10   11.64   6.30   6.16   4.80   5.48   7.18   5.24    8.20    8.36
 10  20    8.62   6.92   6.34   5.22   4.70   6.64   5.08    7.12    7.00
 10  30    7.92   7.10   6.46   6.20   6.44   6.14   5.32    6.36    6.78
 10  50    6.70   6.62   5.78   5.68   5.42   6.50   5.34    5.84    6.00
 20  10   25.92   6.20   6.22   5.32   6.08   8.14   6.38   10.30    9.78
 20  20   12.86   6.28   6.18   5.44   5.70   6.80   6.08    7.12    6.86
 20  30    9.52   5.92   5.74   5.44   6.48   6.34   6.66    7.26    7.26
 20  50    7.64   5.92   5.68   5.40   6.88   6.24   6.50    6.84    6.44
 50  10   89.42   5.64   5.82   4.52   5.70   8.40   7.20    9.88    9.32
 50  20   35.02   5.54   5.62   5.22   6.82   6.10   6.66    8.18    7.88
 50  30   21.20   5.08   5.06   5.24   6.78   6.38   7.16    8.44    8.24
 50  50   12.94   5.56   5.52   4.88   7.10   5.64   6.92    7.52    7.52
100  10  100.00   6.22   6.48   5.24   7.06   8.30   7.76    9.96   10.02
100  20   84.48   5.84   6.06   5.34   7.96   6.60   7.90    9.08    8.88
100  30   51.76   5.26   5.34   5.04   8.72   6.90   7.94    9.10    8.86
100  50   27.30   5.84   5.88   5.34   9.06   5.84   7.84    8.50    8.30
200  10  100.00   6.28   6.56   4.94   8.22   8.26   9.54   11.80   10.70
200  20  100.00   5.32   5.52   5.22  10.34   7.14   8.24    9.78    9.00
200  30   96.60   6.40   6.50   4.50  10.30   7.20   9.50   10.70    9.30
200  50   65.50   4.95   5.05   5.55  13.00   5.75  10.45   11.00   10.80

Table 6: Size: Weak cross-dependence of exponent α = 0.75; for details see the text
at about 6%, except maybe for very small $T$. The point is that $LM_a$ and $LM_b$ test against cross-correlation indiscriminately, whereas the CD family reacts precisely to those departures from the null that affect inference on slope parameters.^9
Moving on to the baseline Scenario S2, we note that the zero average correlation of the errors affects the directed tests as expected, such that they have little power compared to the adjusted and bias-corrected LM statistics $LM_a$ and $LM_b$; see Table 9. Their power does not appear to increase in $N$ either.
More important, however, is the finding that the reduced power is not an issue
when using the directed tests as a pretest for deciding which standard errors to use
for a slope parameter test. See Table 10, where the size of the pretest-based slope
parameter tests is practically 5% (except perhaps for mild oversizedness for T = 10).
For Scenario S3, the power is somewhat higher than under S1, but note that the John test does not hold size under nonnormality. The overall picture is otherwise the same as under S1; the size distortions of the slope parameter test with $LM_a$ and $LM_b$ as pretests are still far from 5%; see e.g. the case where $T = 10$ and $N = 100$.
Scenario S4 confirms that the WLS version of the directed test has higher power
than the other two variants, due to the presence of cross-sectional heteroskedasticity,
at least for smaller N , but the results are otherwise similar to those of S2 so we omit
the corresponding tables. They are available upon request from the authors.
Finally, Scenario S5 (see Table 13) mostly replicates the findings of Scenario S1, with one interesting exception. Namely, the $CD_X$ tests are clearly more powerful than under S1 (while controlling size fairly well, at least for larger $T$). This is most likely because the critical values are much smaller, while the displacement of the statistic under the alternative is apparently not affected by the use of a generalized inverse. While the $CD_X$ tests may safely be used as pretests (again, except for very small $T$), the $LM_a$ and $LM_b$ tests fail in this respect (Table 14).
Summing up, size control is good for all three variants, with the exception of the case $T = 10$, where only the covariance-based version $CD_X^\sigma$ is reliable; only the adjusted and the bias-corrected LM tests of Pesaran et al. (2008) and Baltagi et al. (2012), and especially the test of Pesaran (2004), are serious competitors. In terms of power, the directed tests can dominate, but can also be dominated by the alternatives. But the use as a pretest can fail for the two corrected versions of the LM test, with sizes of the slope parameter test of up to 33%. This is not the case with the directed tests, which are designed to find the correlation that would affect inference about the slope parameters, and work fine in this respect as long as $T$ is not too small.
Thus, for $T = 10$, it may be wiser to resort to panel-robust standard errors di-

^9 Again, the distorting effect is even more visible for strong regressor correlation, but is apparently negligible under weak regressor cross-dependence.
  N   T      LM     FRE   LM_a   LM_b   CD_P      J  CD_X^σ  CD_X^ρ  CD_X^w
 10  10   29.60   19.02   6.18   5.34  16.44   7.66  14.14   22.72   21.76
 10  20   16.74   14.02   8.08   6.80  23.52   8.06  19.36   23.74   22.78
 10  30   14.56   13.32   9.60   7.88  29.22   6.80  22.56   25.48   24.20
 10  50   16.16   16.06  12.88  12.00  42.90   9.42  31.78   34.18   32.98
 20  10   72.52   37.90   8.28   6.04  30.80   8.82  29.46   36.56   32.52
 20  20   33.92   21.64   9.58   8.38  50.08   8.34  43.62   47.74   45.02
 20  30   28.38   21.18  13.04  11.94  64.40  10.84  55.66   58.18   56.16
 20  50   31.74   28.04  21.80  20.00  83.30  16.16  74.04   74.92   72.96
 50  10  100.00   88.84  10.76   8.18  65.20  11.60  63.42   68.10   60.54
 50  20   91.44   58.38  18.94  16.48  91.50  17.26  87.66   88.46   85.86
 50  30   78.44   54.14  27.98  25.98  97.76  23.34  96.26   96.32   95.54
 50  50   77.50   64.46  50.26  48.78  99.86  44.22  99.58   99.64   99.56
100  10  100.00   99.54  18.24  13.28  89.14  17.00  87.90   89.94   84.38
100  20  100.00   94.90  34.50  32.02  99.44  32.50  99.08   99.10   98.62
100  30   99.72   91.94  55.36  53.56 100.00  51.06  99.98  100.00   99.94
100  50   99.30   95.50  84.68  84.46 100.00  81.58 100.00  100.00  100.00
200  10  100.00  100.00  27.56  21.30  97.56  26.82  97.34   97.72   95.88
200  20  100.00  100.00  60.40  59.98 100.00  60.22 100.00  100.00   99.96
200  30  100.00   99.94  84.12  83.58 100.00  82.32 100.00  100.00  100.00
200  50  100.00   99.94  98.00  98.00 100.00  97.58 100.00  100.00  100.00

Table 7: Power: Scenario 1; for details see the text
  N   T     LM    FRE   LM_a   LM_b   CD_P      J  CD_X^σ  CD_X^ρ  CD_X^w
 10  10   8.10   8.10   8.26   8.34   7.62   8.30   7.84    7.52    7.64
 10  20   7.10   7.18   7.40   7.38   6.62   7.24   6.80    6.58    6.66
 10  30   8.32   8.38   8.48   8.50   7.48   8.60   7.90    7.82    7.82
 10  50   7.32   7.32   7.50   7.62   6.10   7.58   6.54    6.44    6.42
 20  10   8.80   9.64  10.54  10.46   8.74  10.38   8.80    8.62    8.88
 20  20   8.96   9.46  10.10  10.10   7.34  10.24   7.82    7.62    7.84
 20  30   9.66  10.14  10.72  10.90   6.82  11.08   7.36    7.22    7.44
 20  50   9.04   9.34   9.80  10.08   5.90  10.26   6.38    6.44    6.52
 50  10   8.60   9.44  17.34  17.94  10.08  17.76  10.04    9.88   10.82
 50  20   7.32  10.96  15.94  16.30   6.88  16.08   7.24    7.16    7.42
 50  30   8.30  11.30  14.96  15.24   6.22  15.58   6.32    6.34    6.46
 50  50   8.00   9.68  11.42  11.62   5.56  12.18   5.62    5.62    5.62
100  10   8.52   8.58  24.04  25.06   9.00  24.16   9.24    9.14    9.88
100  20   6.22   6.96  19.98  20.50   6.28  20.36   6.30    6.30    6.36
100  30   6.94   8.48  16.32  16.54   6.86  17.18   6.86    6.86    6.86
100  50   6.48   7.30   9.80   9.86   6.28  10.40   6.28    6.28    6.28
200  10   8.94   8.94  31.74  33.30   9.12  31.70   9.08    9.12    9.46
200  20   6.64   6.64  19.02  18.96   6.64  18.72   6.64    6.64    6.64
200  30   6.28   6.30  11.64  11.78   6.28  12.20   6.28    6.28    6.28
200  50   5.68   5.70   6.14   6.24   5.68   6.40   5.68    5.68    5.68

Table 8: Size for slope parameter test: Scenario 1; for details see the text
  N   T     LM    FRE   LM_a   LM_b   CD_P      J  CD_X^σ  CD_X^ρ  CD_X^w
 10  10   28.5   17.9    6.6    5.5    6.0    7.1    6.9    13.1    12.8
 10  20   14.0   11.6    7.0    6.6    5.7    7.1    6.0     8.3     8.1
 10  30   11.8   10.8    6.8    7.3    5.5    7.2    5.6     7.2     7.6
 10  50   11.5   11.3    8.2    8.4    6.1    7.3    5.9     6.9     6.9
 20  10   72.6   36.5    6.9    4.9    6.1    7.8    8.1    13.3    13.1
 20  20   29.6   17.4    7.8    6.8    5.2    7.4    7.1     9.2     9.2
 20  30   23.1   16.1    9.6    9.7    6.0    9.1    6.6     8.2     7.7
 20  50   19.4   15.6   11.0   11.4    5.6    9.4    6.7     7.4     7.6
 50  10  100.0   88.6    9.4    7.5    6.1   10.5    9.1    14.3    13.6
 50  20   88.6   49.4   11.5   11.1    6.0   11.7    7.1     9.2     9.2
 50  30   66.9   38.3   14.0   15.2    5.3   14.5    6.7     8.1     7.9
 50  50   56.4   40.6   25.1   26.0    6.0   22.2    6.8     7.4     7.0
100  10  100.0   99.4   12.5   10.5    6.2   14.0    8.3    13.4    13.1
100  20  100.0   91.6   18.8   19.6    5.7   20.4    7.6    10.1     9.6
100  30   99.1   79.3   27.9   30.6    5.7   29.5    7.0     8.6     8.3
100  50   94.0   78.8   51.9   53.6    5.9   50.0    7.0     7.9     8.1
200  10  100.0  100.0   19.0   17.0    5.5   22.0    8.4    13.5    13.8
200  20  100.0   99.9   34.8   38.6    5.6   39.5    7.5     9.5     9.5
200  30  100.0   99.7   53.1   57.9    5.3   57.1    7.0     8.6     8.8
200  50  100.0   99.0   83.1   85.7    5.7   84.0    7.3     8.1     8.1

Table 9: Power: Scenario 2; for details see the text
  N   T     LM    FRE   LM_a   LM_b   CD_P      J  CD_X^σ  CD_X^ρ  CD_X^w
 10  10   6.14   5.96   5.80   5.78   5.46   5.74   5.42    5.40    5.46
 10  20   5.60   5.58   5.46   5.44   5.24   5.34   5.22    5.14    5.10
 10  30   5.42   5.42   5.42   5.44   5.38   5.46   5.26    5.24    5.26
 10  50   5.26   5.26   5.28   5.26   5.32   5.24   5.20    5.18    5.18
 20  10   5.88   5.48   5.08   5.18   4.84   5.12   4.84    4.74    4.82
 20  20   5.52   5.46   5.30   5.34   5.24   5.26   5.16    5.18    5.20
 20  30   5.00   4.96   4.88   4.90   4.78   4.82   4.68    4.68    4.76
 20  50   5.40   5.34   5.34   5.34   5.24   5.30   5.14    5.14    5.12
 50  10   7.02   6.94   5.52   5.54   5.16   5.54   5.08    5.12    5.28
 50  20   5.40   4.88   4.84   4.76   4.80   4.72   4.64    4.66    4.70
 50  30   5.32   5.08   5.04   5.10   5.00   5.02   4.90    4.88    4.84
 50  50   4.44   4.44   4.42   4.40   4.30   4.34   4.24    4.24    4.26
100  10   7.28   7.26   5.28   5.42   5.02   5.50   4.94    4.88    5.04
100  20   5.64   5.56   5.18   5.12   5.02   5.10   4.80    4.74    4.86
100  30   5.84   5.78   5.32   5.32   5.00   5.36   4.88    4.88    4.90
100  50   5.44   5.22   5.12   5.06   5.08   5.08   4.96    4.96    4.94
200  10   6.36   6.36   5.18   5.14   4.70   5.26   4.48    4.50    4.74
200  20   5.96   5.96   5.80   5.86   5.46   5.90   5.34    5.30    5.38
200  30   5.18   5.18   4.94   4.98   4.64   4.94   4.62    4.62    4.62
200  50   5.56   5.58   5.58   5.60   5.28   5.62   5.18    5.20    5.22

Table 10: Size for slope parameter test: Scenario 2; for details see the text
  N   T     LM    FRE   LM_a   LM_b   CD_P      J  CD_X^σ  CD_X^ρ  CD_X^w
 10  10   33.9   23.1   10.3   10.5   27.2   53.3   18.2    33.2    31.9
 10  20   22.8   19.8   13.8   13.2   33.3   65.8   23.6    32.0    31.5
 10  30   21.2   19.9   15.1   14.3   38.5   73.0   28.2    33.9    33.3
 10  50   22.2   22.1   18.8   17.8   48.6   77.3   34.0    38.9    39.1
 20  10   77.3   46.5   14.4   14.3   50.1   63.9   34.5    54.8    54.0
 20  20   43.8   31.3   18.4   17.3   66.9   78.5   49.1    64.1    64.6
 20  30   37.8   31.4   23.1   22.0   75.6   84.8   59.8    71.1    71.3
 20  50   41.0   36.9   31.6   30.6   87.4   92.5   74.5    80.9    81.3
 50  10  100.0   93.0   27.9   28.8   85.3   76.6   67.7    86.4    86.1
 50  20   94.5   73.6   38.7   38.8   96.8   91.2   89.2    95.5    95.5
 50  30   87.6   71.2   50.4   49.5   99.4   96.3   97.0    99.1    99.0
 50  50   85.2   78.2   67.8   66.0   99.9   98.9   99.6    99.9    99.8
100  10  100.0   99.8   46.2   47.9   97.4   83.3   88.6    97.4    97.0
100  20  100.0   98.1   64.7   63.8   99.8   95.9   99.2    99.8    99.9
100  30   99.8   96.5   77.5   77.2  100.0   99.0  100.0   100.0   100.0
100  50   99.5   97.8   92.8   92.5  100.0   99.9  100.0   100.0   100.0
200  10  100.0   99.9   67.0   70.6   99.7   89.2   97.9    99.7    99.7
200  20  100.0  100.0   86.4   86.4  100.0   98.7   99.9   100.0   100.0
200  30  100.0  100.0   96.0   95.8  100.0   99.9  100.0   100.0   100.0
200  50  100.0   99.9   99.6   99.6  100.0  100.0  100.0   100.0   100.0

Table 11: Power: Scenario 3; for details see the text
  N   T     LM    FRE   LM_a   LM_b   CD_P      J  CD_X^σ  CD_X^ρ  CD_X^w
 10  10    7.7    7.9    8.2    8.2    7.3    7.8    7.7     7.3     7.3
 10  20    6.9    6.9    7.2    7.2    6.4    6.5    6.8     6.6     6.7
 10  30    6.8    6.9    7.1    7.2    6.2    6.1    6.5     6.5     6.5
 10  50    7.2    7.2    7.3    7.4    6.3    6.2    6.7     6.5     6.6
 20  10    7.6    8.2    9.4    9.4    7.3    7.9    7.9     7.4     7.6
 20  20    8.3    9.1    9.8    9.9    7.0    7.2    8.0     7.3     7.4
 20  30    8.4    8.8    9.2    9.2    6.1    6.3    7.0     6.5     6.4
 20  50    8.5    8.7    9.1    9.2    6.0    6.2    6.4     6.3     6.3
 50  10    8.2    8.6   14.5   14.4    8.3   10.4    9.4     8.3     8.6
 50  20    6.7    8.7   12.8   12.6    6.3    7.0    6.8     6.5     6.5
 50  30    7.4    9.3   12.0   12.1    6.3    6.6    6.6     6.4     6.4
 50  50    7.5    8.2    9.7    9.9    5.8    6.0    5.8     5.8     5.8
100  10    8.3    8.3   18.0   17.5    8.3   11.0    9.2     8.3     8.5
100  20    6.5    6.8   13.4   13.6    6.5    7.4    6.6     6.5     6.6
100  30    6.3    6.8   10.4   10.2    6.2    6.3    6.2     6.2     6.2
100  50    5.5    5.9    6.9    7.0    5.5    5.5    5.5     5.5     5.5
200  10    8.6    8.6   17.6   15.9    8.6   11.7    8.9     8.6     8.6
200  20    7.3    7.3   11.0   11.0    7.3    7.6    7.3     7.3     7.3
200  30    6.5    6.5    7.8    7.7    6.5    6.6    6.5     6.5     6.5
200  50    5.3    5.3    5.4    5.5    5.3    5.3    5.3     5.3     5.3

Table 12: Size for slope parameter test: Scenario 3; for details see the text
  N   T     LM    FRE   LM_a   LM_b   CD_P      J  CD_X^σ  CD_X^ρ  CD_X^w
 10  10   27.3   17.8    7.1    6.0   16.7    7.6   75.3    84.0    84.5
 10  20   16.4   13.8    8.0    8.0   23.7    6.9   77.1    81.7    80.5
 10  30   14.5   13.4    8.2    7.4   29.0    7.1   82.1    84.9    84.8
 10  50   15.8   15.6   12.5   11.6   41.7    9.4   87.8    88.5    88.1
 20  10   72.6   37.3    7.5    5.4   31.6    7.6   79.5    83.8    84.0
 20  20   33.5   20.7    9.6    8.2   53.6    9.2   89.6    90.5    90.7
 20  30   32.1   23.7   15.7   12.7   65.5   12.3   94.1    94.5    94.1
 20  50   27.4   24.0   18.7   17.6   82.5   13.8   97.2    97.4    97.1
 50  10  100.0   89.8   12.3    8.3   66.7   11.4   93.2    94.8    93.3
 50  20   91.0   58.5   19.2   15.9   91.7   16.8   98.8    99.3    98.9
 50  30   80.8   54.3   28.1   25.6   97.9   24.3   99.7    99.8    99.8
 50  50   77.8   66.6   51.7   48.2   99.7   44.2  100.0   100.0   100.0
100  10  100.0   99.7   20.7   15.9   88.6   20.6   97.3    97.9    97.3
100  20  100.0   95.2   36.7   33.7   99.4   33.9   99.9   100.0    99.9
100  30   99.4   91.3   57.6   56.3  100.0   54.0  100.0   100.0   100.0
100  50   99.1   95.9   85.3   84.3  100.0   81.1  100.0   100.0   100.0
200  10  100.0  100.0   30.5   23.2   97.8   28.9   99.7    99.7    99.5
200  20  100.0  100.0   60.3   59.4  100.0   59.9  100.0   100.0   100.0
200  30  100.0  100.0   82.3   82.3  100.0   82.7  100.0   100.0   100.0
200  50  100.0  100.0   93.6   93.9  100.0   91.2  100.0   100.0   100.0

Table 13: Power: Scenario 5; for details see the text
[Table 14 entries: results (%) for the LM, FRE, LMa, LMb, CDp, J, CDX^σ, CDX^ρ and CDX^w columns, with N ∈ {10, 20, 50, 100, 200} and T ∈ {10, 20, 30, 50}; the flattened numerical entries are omitted.]
Table 14: Size for slope parameter test: Scenario 5; for details see the text
rectly, since the pretesting step induces some oversizedness in this case. For data sets with larger dimension, the correlations-based CDX^ρ and CDX^w appear to be the better choices, with CDX^w having an edge under cross-sectional heteroskedasticity.
4 Summary
We have introduced directed tests of error cross-section independence. Relying on the
information matrix equality, they essentially check whether cross-sectional error correlation can be neglected when estimating the covariance matrix of the fixed effect slope
estimator. Compared to extant testing procedures, this restricts the set of alternative hypotheses, since the directed tests augment cross-sectional error covariances or
correlations by cross-sectional regressor correlation.
The asymptotic distribution of the directed tests has been shown to be χ2 with
1/2K(K + 1) degrees of freedom, for T → ∞ and N → ∞ jointly, where K is the
number of explanatory variables. The limiting distribution approximates the finite
sample distribution well, except perhaps in the case when T is very small compared
to N. In terms of power, the augmentation with cross-sectional regressor correlation can be of disadvantage when the regressor correlation is weak, but can turn into an advantage when the regressor correlation is strong. This suggests that a union-of-rejections approach combining the evidence from the directed tests and an alternative test with power not depending on regressor cross-correlation (say the corrected LM tests) may work better than either test alone in a wider range of situations.¹⁰
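For reference, the ½K(K+1) degrees of freedom equal the number of distinct elements of a symmetric K × K matrix, i.e. the length of its half-vectorization vech(·). A minimal numpy check (illustrative only, not the authors' code):

```python
import numpy as np

def vech(A):
    """Stack the lower-triangular part (including the diagonal) of a square matrix."""
    rows, cols = np.tril_indices(A.shape[0])
    return A[rows, cols]

K = 3
A = np.arange(K * K, dtype=float).reshape(K, K)
A = A + A.T                                   # make the matrix symmetric
print(vech(A).size == K * (K + 1) // 2)       # True: vech has K(K+1)/2 elements
```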
When used as a pretest for choosing the type of standard error to be used for slope
parameter tests, the directed tests act reliably, unlike alternative procedures focussing
on error cross-correlation only. Especially when the regressor cross-correlation is not
weak, the latter tests fail to detect cross-dependence affecting the standard errors,
indicating ordinary standard errors too often when robust clustered ones would have
been in order. This leads to heavily oversized slope parameter tests for the non-directed
tests. When T is small, one should use panel-robust standard errors to begin with.
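The pretest decision discussed here is between ordinary and panel-robust (clustered) standard errors for the pooled OLS slope estimator. The following self-contained numpy sketch contrasts the two variance estimators on simulated data; it illustrates the clustered sandwich form in the spirit of Arellano (1987), and is not the paper's simulation design:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 50, 20, 1
x = rng.standard_normal((N, T))               # regressor, one unit per row
u = rng.standard_normal((N, T))               # cross-sectionally independent errors
y = 0.5 * x + u

X = x.reshape(-1, 1)                          # stacked NT x K design
Y = y.reshape(-1)
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ Y
resid = Y - X @ beta

# ordinary (homoskedastic) variance estimate
s2 = resid @ resid / (N * T - K)
var_ols = s2 * XtX_inv

# panel-robust variance: cluster the score by cross-section unit
meat = np.zeros((K, K))
for i in range(N):
    Xi = x[i].reshape(-1, 1)
    ui = resid[i * T:(i + 1) * T]
    g = Xi.T @ ui
    meat += np.outer(g, g)
var_rob = XtX_inv @ meat @ XtX_inv

print(np.sqrt(var_ols[0, 0]), np.sqrt(var_rob[0, 0]))
```

Under independent errors the two standard errors are close; under cross-sectional dependence they can diverge, which is what the directed pretest is meant to detect.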
Appendix
Lemma 1. Under Assumptions 1 and 2, the following holds:

(i) E(σ̂_ij vech(X_i′X_j + X_j′X_i)) = 0, for i ≠ j,

(ii) E(σ̂_ij vech(X_i′X_j + X_j′X_i) σ̂_ik vech(X_i′X_k + X_k′X_i)′) = 0, given that the indices i, j, and k differ,

(iii) Cov(σ̂_ij vech(X_i′X_j + X_j′X_i)) = (T − K)⁻² σ_ii σ_jj E(tr(M_{X_i} M_{X_j}) vech(X_i′X_j + X_j′X_i) vech(X_i′X_j + X_j′X_i)′), and

(iv) Cov(S) = (T − K)⁻² Σ_{i=2}^N Σ_{j=1}^{i−1} σ_ii σ_jj E(tr(M_{X_i} M_{X_j}) vech(X_i′X_j + X_j′X_i) vech(X_i′X_j + X_j′X_i)′).

¹⁰ This has been impressively illustrated by Harvey et al. (2009) for unit root tests.
Note that T/(T − K) → 1; furthermore, since the projection matrices P_i and P_j are positive semidefinite, the Cauchy-Schwarz inequality indicates that |tr(P_i P_j)| ≤ √(tr(P_i²) tr(P_j²)) = K. Thus, we may use T throughout instead of tr(M_{X_i} M_{X_j}) or T − K.
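As a numerical sanity check of the Cauchy-Schwarz bound |tr(P_i P_j)| ≤ K (simulated regressors; not part of the formal argument):

```python
import numpy as np

rng = np.random.default_rng(1)
T, K = 30, 3
Xi = rng.standard_normal((T, K))
Xj = rng.standard_normal((T, K))
Pi = Xi @ np.linalg.solve(Xi.T @ Xi, Xi.T)    # projection onto col(Xi)
Pj = Xj @ np.linalg.solve(Xj.T @ Xj, Xj.T)    # projection onto col(Xj)
tr = np.trace(Pi @ Pj)
print(0.0 <= tr <= K + 1e-8)                  # True: 0 <= tr(Pi Pj) <= K
```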
Before moving on to the main proofs, we require an additional lemma.
Lemma 2. Under the Assumptions of Proposition 1 it holds as N, T → ∞ that

(i) (1/(N√T)) Σ_{i=2}^N Σ_{j=1}^{i−1} u_i′X_i (X_i′X_i)⁻¹ X_i′u_j v_T^{ij} →p 0,

(ii) (1/(N√T)) Σ_{i=2}^N Σ_{j=1}^{i−1} u_i′X_j (X_j′X_j)⁻¹ X_j′u_j v_T^{ij} →p 0,

(iii) (1/(N√T)) Σ_{i=2}^N Σ_{j=1}^{i−1} u_i′X_i (X_i′X_i)⁻¹ X_i′X_j (X_j′X_j)⁻¹ X_j′u_j v_T^{ij} →p 0,

(iv) sup_{N, 1≤t≤T} E ‖(1/N) Σ_{i=2}^N Σ_{j=1}^{i−1} u_{j,t} v_T^{ij} u_{i,t}‖³ < ∞,

(v) (1/(N²T)) Σ_{t=1}^T Σ_{i=2}^N Σ_{k=2}^N Σ_{j=1}^{i−1} Σ_{l=1}^{k−1} u_{j,t} u_{l,t} u_{i,t} u_{k,t} v_T^{ij} (v_T^{kl})′ − (1/N) Σ_{i=2}^N σ_ii ((1/N) Σ_{j=1}^{i−1} σ_jj v_T^{ij} (v_T^{ij})′) →p 0,

(vi) (1/N²) Σ_{i=2}^N Σ_{j=1}^{i−1} σ̂_ii σ̂_jj v_T^{ij} (v_T^{ij})′ − (1/N) Σ_{i=2}^N σ_ii ((1/N) Σ_{j=1}^{i−1} σ_jj v_T^{ij} (v_T^{ij})′) →p 0.
The proofs of the above lemmas are available in an online appendix.
Proof of Proposition 1
Let
Σ_{S,N} = (1/N) Σ_{i=2}^N σ_ii ((1/N) Σ_{j=1}^{i−1} σ_jj v_T^{ij} (v_T^{ij})′).
Note that Σ_{S,N} does not depend on t, but depends on N and, via v_T^{ij}, on T. The dimensionality restriction on v_T^{ij} implies full rank of Σ_{S,N} even in the limit, implying that Σ_{S,N}⁻¹ and Σ_{S,N}^{−0.5} are uniformly bounded in probability.
Consider now the following normalization of the test statistic:
CDX^σ = (Σ_{S,N}^{−0.5} (1/N) D_T⁻¹ S)′ (Σ_{S,N}^{−0.5} (1/N²) D_T⁻¹ V̂ D_T⁻¹ Σ_{S,N}^{−0.5})⁻¹ Σ_{S,N}^{−0.5} (1/N) D_T⁻¹ S.
We shall prove that
Σ_{S,N}^{−0.5} (1/N) D_T⁻¹ S →d N(0, I_{½K(K+1)})
and that
(1/N²) D_T⁻¹ V̂ D_T⁻¹ − Σ_{S,N} →p 0;   (9)
together with the uniform w.p.1 boundedness of Σ_{S,N}^{−0.5}, the two imply that CDX^σ →d χ²_{½K(K+1)} as required.
Let us first examine the limiting distribution of the suitably normalized S. We have
(1/N) D_T⁻¹ S = (1/N) Σ_{i=2}^N Σ_{j=1}^{i−1} √T σ̂_ij D_T⁻¹ (vech(X_i′X_j + X_j′X_i)/√T) = (1/N) Σ_{i=2}^N Σ_{j=1}^{i−1} √T σ̂_ij v_T^{ij}
with
√T σ̂_ij = (1/√T) Σ_{t=1}^T u_{i,t} u_{j,t} − (u_i′X_i/T)(X_i′X_i/T)⁻¹(X_i′u_j/√T) − (u_i′X_j/T)(X_j′X_j/T)⁻¹(X_j′u_j/√T) + (u_i′X_i/T)(X_i′X_i/T)⁻¹(X_i′X_j/T)(X_j′X_j/T)⁻¹(X_j′u_j/√T).
Using Lemma 2, it follows immediately that
(1/N) D_T⁻¹ S = (1/N) Σ_{i=2}^N Σ_{j=1}^{i−1} (1/√T) Σ_{t=1}^T u_{i,t} u_{j,t} v_T^{ij} + o_p(1)
as N, T → ∞. Rearrange the terms on the r.h.s. to obtain
(1/N) Σ_{i=2}^N Σ_{j=1}^{i−1} (1/√T) Σ_{t=1}^T u_{i,t} u_{j,t} v_T^{ij} = (1/√T) Σ_{t=1}^T ((1/N) Σ_{i=2}^N Σ_{j=1}^{i−1} u_{i,t} u_{j,t} v_T^{ij}),
and let
ξ_{t,T} = Σ_{S,N}^{−0.5} (1/N) Σ_{i=2}^N (Σ_{j=1}^{i−1} u_{j,t} v_T^{ij}) u_{i,t}.
Given that
E(u_i | u_j, X_k, j < i, 1 ≤ k ≤ N) = 0
and Σ_{S,N} only depends on the regressors, the ξ_{t,T} have a (multivariate) md array structure. To establish the limiting behavior of the suitably normalized S (i.e. the sums of the lines of the md array) we need to show that
max_{1≤t≤T} (1/√T) ‖ξ_{t,T}‖ →p 0   (10)
and that
(1/T) Σ_{t=1}^T ξ_{t,T} ξ_{t,T}′ →p I_{½K(K+1)}.   (11)
Since Cov(ξ_{t,T}) is easily shown to be the ½K(K+1)-dimensional identity matrix, (10) and (11) will allow us to conclude that
Σ_{S,N}^{−0.5} D_T⁻¹ (1/N) S →d N(0, I_{½K(K+1)})
using a central limit theorem for md arrays (Davidson, 1994, Theorem 24.3) and the Cramér-Wold device.
To establish the asymptotic negligibility condition (10), recall that Σ_{S,N}^{−0.5} does not depend on t, so we have that
max_{1≤t≤T} (1/√T) ‖ξ_{t,T}‖ ≤ ‖Σ_{S,N}^{−0.5}‖ (1/√T) max_{1≤t≤T} ‖(1/N) Σ_{i=2}^N Σ_{j=1}^{i−1} u_{j,t} v_T^{ij} u_{i,t}‖.
Since Σ_{S,N}^{−0.5} is bounded w.p.1, it suffices that (1/√T) max_{1≤t≤T} ‖(1/N) Σ_{i=2}^N Σ_{j=1}^{i−1} u_{j,t} v_T^{ij} u_{i,t}‖ vanishes as T → ∞. By a well-known relation between the uniform boundedness of the moments of a sequence and the maxima of the sequence, this is in turn implied by the uniform boundedness of E ‖(1/N) Σ_{i=2}^N Σ_{j=1}^{i−1} u_{j,t} v_T^{ij} u_{i,t}‖³, which has been proven in Lemma 2 item (iv), so condition (10) follows. Item (v) of Lemma 2 then establishes (11).
We now only need to establish the asymptotic behavior of the suitably normalized V̂ given in (9). With tr(M_{X_i} M_{X_j})/T → 1, the desired convergence follows from Lemma 2 item (vi), thus concluding the proof.
Proof of Proposition 2
The proof is essentially the same for both statistics, so we only give the part concerning CDX^ρ. Letting ε̂_i = {ε_{i,t} − x_{i,t}′ (X_i′X_i)⁻¹ X_i′ε_i}_{t=1,…,T}, it follows immediately that
ρ̂_ij = ((1/(T−K)) ε̂_i′ε̂_j) / √( ((1/(T−K)) ε̂_i′ε̂_i) ((1/(T−K)) ε̂_j′ε̂_j) ).
Let then
S_ε = Σ_{i=2}^N Σ_{j=1}^{i−1} σ̂_ij^ε vech(X_i′X_j + X_j′X_i)   with   σ̂_ij^ε = (1/(T−K)) ε̂_i′ε̂_j.
This is nothing else than the CDX^σ statistic computed as if the disturbances of the panel were ε and not u. We shall prove that
(1/N) D_T⁻¹ S^ρ = (1/N) D_T⁻¹ S_ε + o_p(1),
where S^ρ = Σ_{i=2}^N Σ_{j=1}^{i−1} ρ̂_ij vech(X_i′X_j + X_j′X_i), and the result will follow with the arguments of the proof of Proposition 1. A direct use of the arguments given there is not a choice without additional assumptions about the distribution of ε_{i,t}, since
ζ_{i,t} = ε_{i,t} / √((1/(T−K)) ε̂_i′ε̂_i)
may not have finite moments at all, and the independence of ζ_{i,t} in time is lost. Since K is finite, T/(T−K) → 1 and we may safely replace T − K with T.
We show in a first step that, for some vanishing sequences a_ij^T for which sup_{i,j} |a_ij^T| →p 0 as T → ∞, there exists a constant C > 0 such that
|1/√(1 + a_ij^T) − 1| ≤ C sup_{i,j} |a_ij^T|   ∀ i, j.   (12)
The Taylor expansion around 1 with rest in integral form indicates that ∃ λ ∈ [0, 1] such that
1/√(1 + a_ij^T) − 1 = − a_ij^T / (2 (1 + λ a_ij^T)^{3/2}).
Since sup_{i,j} |a_ij^T| → 0, ∃ T₀ fixed such that 1 + λ a_ij^T > 0 ∀ T > T₀ and thus
max_{λ∈[0,1]} sup_{i,j} 1/(2 (1 + λ a_ij^T)^{3/2}) ≤ max{ 1/2, 1/(2 (1 − sup_{i,j} |a_ij^T|)^{3/2}) }
∀ T > T₀, which is obviously uniformly bounded in i and j as T → ∞. Equation (12) follows.
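As a quick numerical check of bound (12), with sup_{i,j}|a_ij^T| replaced by a grid bound of 1/2 and the constant C taken from the Taylor-expansion step above (illustrative only):

```python
import numpy as np

a = np.linspace(-0.5, 0.5, 1001)            # grid with sup |a| = 1/2
C = 1.0 / (2.0 * (1.0 - 0.5) ** 1.5)        # constant from the Taylor bound
lhs = np.abs(1.0 / np.sqrt(1.0 + a) - 1.0)
print(bool(np.all(lhs <= C * np.abs(a) + 1e-12)))  # True
```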
In a second step, we establish the behavior of the term (1/T) ε̂_i′ε̂_i − 1 across all units of the panel. We have that
(1/T) ε̂_i′ε̂_i − 1 = (1/T) Σ_{t=1}^T ε²_{i,t} − 1 − (2/T) Σ_{t=1}^T ε_{i,t} x_{i,t}′ (X_i′X_i)⁻¹ X_i′ε_i + (1/T) Σ_{t=1}^T x_{i,t}′ (X_i′X_i)⁻¹ X_i′ε_i ε_i′X_i (X_i′X_i)⁻¹ x_{i,t}.
The term (1/T) Σ_{t=1}^T (ε²_{i,t} − 1) is of order O_p(T^{−0.5}) and is easily checked to have finite kurtosis given our assumptions; hence
sup_i |(1/T) Σ_{t=1}^T (ε²_{i,t} − 1)| = o_p(√N/√T).
The terms (2/T) Σ_{t=1}^T ε_{i,t} x_{i,t}′ (X_i′X_i)⁻¹ X_i′ε_i and (1/T) Σ_{t=1}^T x_{i,t}′ (X_i′X_i)⁻¹ X_i′ε_i ε_i′X_i (X_i′X_i)⁻¹ x_{i,t} are both of order O_p(T⁻¹), so their maximum over i = 1, …, N must be of order O_p(N/T). Summing up,
sup_i |(1/T) ε̂_i′ε̂_i − 1| = O_p( max{ √N/√T, N/T } ) = O_p(√N/√T).
To conclude the proof, we now examine
(1/T) ε̂_i′ε̂_i (1/T) ε̂_j′ε̂_j − 1 = ((1/T) ε̂_i′ε̂_i − 1) + ((1/T) ε̂_j′ε̂_j − 1) + ((1/T) ε̂_i′ε̂_i − 1)((1/T) ε̂_j′ε̂_j − 1).
Note that if sup_i |(1/T) ε̂_i′ε̂_i − 1| = O_p(a_T) for some positive a_T → 0 it follows that
sup_{i,j} |((1/T) ε̂_i′ε̂_i − 1) + ((1/T) ε̂_j′ε̂_j − 1) + ((1/T) ε̂_i′ε̂_i − 1)((1/T) ε̂_j′ε̂_j − 1)| = O_p(a_T),
so, given that a_T = √N/√T,
sup_{i,j} |(1/T) ε̂_i′ε̂_i (1/T) ε̂_j′ε̂_j − 1| = O_p(√N/√T).
Thus, we have for all pairs i, j
√T |ρ̂_ij − (1/T) ε̂_i′ε̂_j| = |(1/√T) ε̂_i′ε̂_j| |1/√(1 + ((1/T) ε̂_i′ε̂_i (1/T) ε̂_j′ε̂_j − 1)) − 1| ≤ C (√N/√T) |(1/√T) ε̂_i′ε̂_j|.
Then,
|(1/N) D_T⁻¹ S^ρ − (1/N) D_T⁻¹ S_ε| ≤ (1/N) Σ_{i=2}^N Σ_{j=1}^{i−1} |ρ̂_ij − (1/T) ε̂_i′ε̂_j| ‖D_T⁻¹ vech(X_i′X_j + X_j′X_i)‖ ≤ C (√N/(N√T)) Σ_{i=2}^N Σ_{j=1}^{i−1} |(1/√T) ε̂_i′ε̂_j| ‖v_T^{ij}‖.
But the expectation E(|(1/√T) ε̂_i′ε̂_j| ‖v_T^{ij}‖) is easily shown to be uniformly bounded, from which it follows that Σ_{i=2}^N Σ_{j=1}^{i−1} |(1/√T) ε̂_i′ε̂_j| ‖v_T^{ij}‖ = O_p(N²). This leads ultimately to
|(1/N) D_T⁻¹ S^ρ − (1/N) D_T⁻¹ S_ε| = O_p(N^{1.5}/√T),
which vanishes given the additional assumption N³/T → 0. This assumption can be dropped if in exchange
ζ_{i,t} = ε_{i,t} / √((1/(T−K)) ε̂_i′ε̂_i)
has finite variance – e.g. when |ε_{i,t}| is bounded away from zero.
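To illustrate the residual cross-correlations ρ̂_ij used above, a small simulated numpy example (unit-by-unit OLS residuals; the data-generating process here is an arbitrary illustration, not the paper's design):

```python
import numpy as np

rng = np.random.default_rng(2)
T, K, N = 40, 2, 4
resid = []
for i in range(N):
    Xi = rng.standard_normal((T, K))
    yi = Xi @ np.ones(K) + rng.standard_normal(T)
    b = np.linalg.lstsq(Xi, yi, rcond=None)[0]   # unit-i OLS slope estimate
    resid.append(yi - Xi @ b)

rho = np.empty((N, N))
for i in range(N):
    for j in range(N):
        num = resid[i] @ resid[j] / (T - K)
        den = np.sqrt(resid[i] @ resid[i] / (T - K)) * np.sqrt(resid[j] @ resid[j] / (T - K))
        rho[i, j] = num / den

print(bool(np.allclose(np.diag(rho), 1.0)), bool(np.all(np.abs(rho) <= 1 + 1e-12)))  # True True
```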
References
Amemiya, T. (1985). Advanced econometrics. Harvard University Press.
Andrews, D. W. K. (1987). Asymptotic Results for Generalized Wald Tests. Econometric Theory 3,
348–358.
Andrews, D. W. K. (2005). Cross-Section Regression with Common Shocks. Econometrica 73, 1551–
1585.
Arellano, M. (1987). Computing Robust Standard Errors for Within-Group Estimators. Oxford
Bulletin of Economics and Statistics 49, 431–434.
Baltagi, B. H., Q. Feng, and C. Kao (2011). Testing for Sphericity in a Fixed Effects Panel Data
Model. Econometrics Journal 14, 25–47.
Baltagi, B. H., Q. Feng, and C. Kao (2012). A Lagrange Multiplier Test for Cross-Sectional Dependence in a Fixed Effects Panel Data Model. Journal of Econometrics 170, 164–177.
Beck, N. and J. N. Katz (1995). What to Do (and Not to Do) with Time Series Cross-section Data.
American Political Science Review 89, 634–647.
Breusch, T. S. and A. R. Pagan (1980). The Lagrange Multiplier Test and its Application to Model
Specification Tests in Econometrics. Review of Economic Studies 47, 239–253.
Chudik, A. and M. H. Pesaran (2013). Large Panel Data Models with Cross-Sectional Dependence: A
survey. In B. H. Baltagi (Ed.), The Oxford Handbook of Panel Data, forthcoming. Oxford University
Press.
Davidson, J. (1994). Stochastic Limit Theory. Oxford University Press.
Driscoll, J. C. and A. C. Kraay (1998). Consistent Covariance Matrix Estimation with Spatially
Dependent Panel Data. The Review of Economics and Statistics 80, 549–560.
Eicker, F. (1967). Limit Theorems for Regressions with Unequal and Dependent Errors. In L. Le Cam
and J. Neyman (Eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics
and Probability, pp. 59–82. Berkeley: University of California Press.
34
Frees, E. W. (1995). Assessing Cross-Sectional Correlation in Panel Data. Journal of Econometrics 69,
393–414.
Halunga, A., C. D. Orme, and T. Yamagata (2012). A Heteroskedasticity Robust Breusch-Pagan
Test for Contemporaneous Correlation in Dynamic Panel Data Models. University of Manchester
Economics Discussion Paper 1118.
Harvey, D. I., S. J. Leybourne, and A. M. R. Taylor (2009). Unit root testing in practice: Dealing
with uncertainty over the trend and initial condition. Econometric Theory 25, 587–636.
John, S. (1972). The Distribution of a Statistic Used for Testing Sphericity of Normal Distributions.
Biometrika 59, 169–173.
Moscone, F. and E. Tosetti (2009). A Review and Comparison of Tests of Cross-Section Independence
in Panels. Journal of Economic Surveys 23, 528–561.
Pesaran, M. H. (2004). General Diagnostic Tests for Cross Section Dependence in Panels. CESifo
Working Paper No. 1229.
Pesaran, M. H. (2015). Testing Weak Cross-Sectional Dependence in Large Panels. Econometric
Reviews 34, 1088–1116.
Pesaran, M. H., A. Ullah, and T. Yamagata (2008). A Bias-Adjusted LM Test of Error Cross-Section
Independence. Econometrics Journal 11, 105–127.
White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test
for Heteroskedasticity. Econometrica 48, 817–838.
White, H. (1982). Maximum Likelihood Estimation of Misspecified Models. Econometrica 50, 1–25.
Zellner, A. (1962). An Efficient Method of Estimating Seemingly Unrelated Regression Equations
and Tests for Aggregation Bias. Journal of the American Statistical Association 57, 348–368.
Directed Tests of No Cross-Sectional Correlation in Large-N
Panel Data Models
Matei Demetrescu, Ulrich Homm
Appendix - for online publication
Proof of Lemma 1
For notational simplicity we show the results for deterministic X. For stochastic X they can be
obtained by conditioning on X and applying the LIE.
(i):
E(σ̂_ij vech(X_i′X_j + X_j′X_i)) = vech(X_i′X_j + X_j′X_i) E((1/T) u_i′ M_{X_i} M_{X_j} u_j) = (1/T) vech(X_i′X_j + X_j′X_i) tr(σ_ij M_{X_i} M_{X_j}) = 0 under H₀.
(ii):
Up to a scaling constant,
E(σ̂_ij vech(X_i′X_j + X_j′X_i) σ̂_ik v_ik′) = T⁻² vech(X_i′X_j + X_j′X_i) v_ik′ E((M_{X_i}u_i)′ M_{X_j}u_j (M_{X_i}u_i)′ M_{X_k}u_k) = 0,
since, under the null, u_i, u_j and u_k are independent.
(iii):
Writing v = vech(X_i′X_j + X_j′X_i) for short, note that Var(σ̂_ij v)
= (1/T²) v E(u_i′ M_{X_i} M_{X_j} u_j u_j′ M_{X_j} M_{X_i} u_i) v′
= (1/T²) v E(tr(u_i′ M_{X_i} M_{X_j} u_j u_j′ M_{X_j} M_{X_i} u_i)) v′
= (1/T²) v tr(M_{X_i} M_{X_j} E(u_j u_j′ M_{X_j} M_{X_i} u_i u_i′)) v′
= (1/T²) v tr(M_{X_i} M_{X_j} E(E(u_j u_j′ M_{X_j} M_{X_i} u_i u_i′ | u_j))) v′
= (1/T²) σ_ii tr(M_{X_i} M_{X_j} E(u_j u_j′) M_{X_j} M_{X_i}) v v′
= (1/T²) σ_ii σ_jj tr(M_{X_i} M_{X_j} M_{X_j} M_{X_i}) v v′
= (1/T²) σ_ii σ_jj tr(M_{X_i} M_{X_j}) v v′.
(iv): Follows from (ii) and (iii).
Proof of Lemma 2
(i):
Let v_{T,m}^{ij} denote the m-th element of the vector v_T^{ij}. If, for all m = 1, …, ½K(K+1),
Var( (1/(N√T)) Σ_{i=2}^N Σ_{j=1}^{i−1} u_i′ X_i (X_i′X_i)⁻¹ X_i′ u_j v_{T,m}^{ij} ) → 0
as N, T → ∞, the result follows since the expectation of the term is 0. To this end, re-write the expression as
(1/√T) (1/√N) Σ_{i=2}^N (u_i′X_i/√T) (X_i′X_i/T)⁻¹ ((1/√N) Σ_{j=1}^{i−1} (X_i′u_j/√T) v_{T,m}^{ij}) = (1/√T) (1/√N) Σ_{i=2}^N ζ_{i,T}   (13)
with
ζ_{i,T} = (u_i′X_i/√T) (X_i′X_i/T)⁻¹ ((1/√N) Σ_{j=1}^{i−1} (X_i′u_j/√T) v_{T,m}^{ij}).
T
Given the assumed independence of the disturbances for i 6= j we have together with the independence
from the regressors that
E (ui |uj , 1 ≤ j < i, Xi , 1 ≤ i ≤ N ) = 0
implying that
E (ζi,T |ζi−1,T , ζi−2,T , . . . , ζ2,T ) = 0
and as such a martingale difference [md] array structure for ζi,T . Then,
Var
N
1 X
√
ζi,T
N i=2
!
=
N
1 X
Var (ζi,T ) ;
N i=2
if the variances on the r.h.s. are uniformly bounded in N and T , the result follows thanks to dividing
√
by T in (13), no matter which rate T has relatively to N. Using again the independence of ui and
uj and conditioning on the Xi , we have thanks to the law of iterated expectations [LIE] that
2
Var (ζi,T ) = E E ζi,T
|Xi , 1 ≤ i ≤ N
0 = E E ζi,T ζi,T
X ,
since E (ζi,T ) = 0. Then,
E(ζ_{i,T} ζ_{i,T}′ | X) = E( (u_i′X_i/√T)(X_i′X_i/T)⁻¹ ((1/√N) Σ_{j=1}^{i−1} (X_i′u_j/√T) v_{T,m}^{ij}) ((1/√N) Σ_{j=1}^{i−1} (X_i′u_j/√T) v_{T,m}^{ij})′ (X_i′X_i/T)⁻¹ (X_i′u_i/√T) | X );
this equals in turn
E( tr( (X_i′u_i/√T)(u_i′X_i/√T)(X_i′X_i/T)⁻¹ ((1/√N) Σ_{j=1}^{i−1} (X_i′u_j/√T) v_{T,m}^{ij}) ((1/√N) Σ_{j=1}^{i−1} (X_i′u_j/√T) v_{T,m}^{ij})′ (X_i′X_i/T)⁻¹ ) | X ),
or, after using the linearity of the trace,
tr( E( (X_i′u_i u_i′X_i/T) (X_i′X_i/T)⁻¹ ((1/√N) Σ_{j=1}^{i−1} (X_i′u_j/√T) v_{T,m}^{ij}) ((1/√N) Σ_{j=1}^{i−1} (X_i′u_j/√T) v_{T,m}^{ij})′ (X_i′X_i/T)⁻¹ | X ) ).
Using the independence of u_i and u_j, j < i, we obtain that
E(ζ_{i,T} ζ_{i,T}′ | X) = σ_ii tr( Cov( (1/√N) Σ_{j=1}^{i−1} (X_i′u_j/√T) v_{T,m}^{ij} | X ) (X_i′X_i/T)⁻¹ ).
Let us now examine the covariance matrix on the r.h.s.:
Cov( (1/√N) Σ_{j=1}^{i−1} (X_i′u_j/√T) v_{T,m}^{ij} | X ) = (1/N) Σ_{j=1}^{i−1} Σ_{k=1}^{i−1} v_{T,m}^{ij} v_{T,m}^{ik} E( X_i′u_j u_k′X_i/T | X ).
Given the independence of u_{i,t} and u_{j,s} for i ≠ j, the expectation on the r.h.s. is only nonzero for j = k, and we have that
Cov( (1/√N) Σ_{j=1}^{i−1} (X_i′u_j/√T) v_{T,m}^{ij} | X ) = (1/N) Σ_{j=1}^{i−1} σ_jj (v_{T,m}^{ij})² (X_i′X_i/T).
The desired variance is then
E(E(ζ²_{i,T} | X)) = E( K σ_ii (1/N) Σ_{j=1}^{i−1} σ_jj (v_{T,m}^{ij})² ) = K σ_ii (1/N) Σ_{j=1}^{i−1} σ_jj E((v_{T,m}^{ij})²) ≤ K max_{1≤m≤½K(K+1)} sup_{i,j} E((v_{T,m}^{ij})²) · (σ_ii/N) Σ_{j=1}^{i−1} σ_jj.
Considering that E((v_{T,m}^{ij})²) is uniformly bounded as implied by (the stricter) Assumption 2(iii), the result follows.
(ii): Analogous to (i) and omitted.
(iii): Analogous to (i) and omitted.
(iv):
It suffices to establish the result elementwise, so write for some 1 ≤ m ≤ ½K(K+1)
(1/N) Σ_{i=2}^N (Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) u_{i,t} = (1/2) ( (1/N) Σ_{i=1}^N Σ_{j=1}^N u_{i,t} u_{j,t} v_{T,m}^{ij} − (1/N) Σ_{i=1}^N u²_{i,t} v_{T,m}^{ii} ).
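The decomposition in (iv) rests on the symmetry v_T^{ij} = v_T^{ji}; a quick numerical check of the underlying identity (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6
u = rng.standard_normal(N)
v = rng.standard_normal((N, N))
v = (v + v.T) / 2.0   # symmetric weights, as for vech(Xi'Xj + Xj'Xi)

lower = sum(u[j] * v[i, j] * u[i] for i in range(1, N) for j in range(i))
full = sum(u[i] * u[j] * v[i, j] for i in range(N) for j in range(N))
diag = sum(u[i] ** 2 * v[i, i] for i in range(N))
print(bool(np.isclose(lower, 0.5 * (full - diag))))  # True
```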
Note that the third absolute moment of the term on the l.h.s. is uniformly bounded if the L₃ norm of the term is uniformly bounded. Apply then Minkowski's inequality to obtain that
‖(1/N) Σ_{i=2}^N (Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) u_{i,t}‖₃ ≤ (1/2) ‖(1/N) Σ_{i=1}^N Σ_{j=1}^N u_{i,t} u_{j,t} v_{T,m}^{ij}‖₃ + (1/2) ‖(1/N) Σ_{i=1}^N u²_{i,t} v_{T,m}^{ii}‖₃,
so the result follows if both L₃ norms on the r.h.s. are uniformly bounded. Equivalently, we show the corresponding expectations to be uniformly bounded in t. Making use of the LIE, we only need to show that
lim_{N,T→∞} sup_{1≤i≤N; 1≤t≤T} E( E( |(1/N) Σ_{i=1}^N u²_{i,t} v_{T,m}^{ii}|³ | X ) ) < ∞   (14)
and
lim_{N,T→∞} sup_{1≤i≤N; 1≤t≤T} E( E( |(1/N) Σ_{i=1}^N Σ_{j=1}^N u_{i,t} u_{j,t} v_{T,m}^{ij}|³ | X ) ) < ∞.   (15)
Thanks to the independence of u_{i,t} from the regressors, we have that
E( |(1/N) Σ_{i=1}^N u²_{i,t} v_{T,m}^{ii}|³ | X ) ≤ (1/N³) Σ_{i=1}^N Σ_{j=1}^N Σ_{k=1}^N E(u²_{i,t} u²_{j,t} u²_{k,t}) |v_{T,m}^{ii} v_{T,m}^{jj} v_{T,m}^{kk}|;
using Hölder's inequality we also have that E(u²_{i,t} u²_{j,t} u²_{k,t}) ≤ ‖u²_{i,t}‖₃ ‖u²_{j,t} u²_{k,t}‖_{3/2} ≤ ‖u²_{i,t}‖₃ ‖u²_{j,t}‖₃ ‖u²_{k,t}‖₃, so
E |(1/N) Σ_{i=1}^N u²_{i,t} v_{T,m}^{ii}|³ ≤ (1/N³) Σ_{i=1}^N Σ_{j=1}^N Σ_{k=1}^N ‖u²_{i,t}‖₃ ‖u²_{j,t}‖₃ ‖u²_{k,t}‖₃ E|v_{T,m}^{ii} v_{T,m}^{jj} v_{T,m}^{kk}|.
Use again Hölder's inequality to conclude that E|v_{T,m}^{ii} v_{T,m}^{jj} v_{T,m}^{kk}| ≤ ‖v_{T,m}^{ii}‖₃ ‖v_{T,m}^{jj}‖₃ ‖v_{T,m}^{kk}‖₃, which leads to
E |(1/N) Σ_{i=1}^N u²_{i,t} v_{T,m}^{ii}|³ ≤ ( (1/N) Σ_{i=1}^N ‖u²_{i,t}‖₃ ‖v_{T,m}^{ii}‖₃ )³.
With E(u⁶_{i,t}) being finite and uniformly bounded, ‖u²_{i,t}‖₃ is itself uniformly bounded in i, and, with the conditions on X_i imposed in Assumption 2, we have that sup_i E|v_{T,m}^{ii}|³ < ∞, so condition (14) is satisfied. For condition (15), note that it is implied by the finiteness of E( E( |(1/N) Σ_{i=1}^N Σ_{j=1}^N u_{i,t} u_{j,t} v_{T,m}^{ij}|⁴ | X ) ), for which we have that
E( |(1/N) Σ_{i=1}^N Σ_{j=1}^N u_{i,t} u_{j,t} v_{T,m}^{ij}|⁴ | X )
= (1/N⁴) Σ_{i₁=1}^N Σ_{j₁=1}^N Σ_{i₂=1}^N Σ_{j₂=1}^N Σ_{i₃=1}^N Σ_{j₃=1}^N Σ_{i₄=1}^N Σ_{j₄=1}^N E( u_{i₁,t} u_{j₁,t} u_{i₂,t} u_{j₂,t} u_{i₃,t} u_{j₃,t} u_{i₄,t} u_{j₄,t} | X ) v_{T,m}^{i₁j₁} v_{T,m}^{i₂j₂} v_{T,m}^{i₃j₃} v_{T,m}^{i₄j₄}
= (1/N⁴) Σ_{i₁=1}^N Σ_{j₁=1}^N Σ_{i₂=1}^N Σ_{j₂=1}^N Σ_{i₃=1}^N Σ_{j₃=1}^N Σ_{i₄=1}^N Σ_{j₄=1}^N E( u_{i₁,t} u_{j₁,t} u_{i₂,t} u_{j₂,t} u_{i₃,t} u_{j₃,t} u_{i₄,t} u_{j₄,t} ) v_{T,m}^{i₁j₁} v_{T,m}^{i₂j₂} v_{T,m}^{i₃j₃} v_{T,m}^{i₄j₄}
due to the assumed independence of disturbances and regressors. Because the disturbances are independent, the expectation E(u_{i₁,t} u_{j₁,t} u_{i₂,t} u_{j₂,t} u_{i₃,t} u_{j₃,t} u_{i₄,t} u_{j₄,t}) is nonzero only if the indices are pairwise equal, which leaves us with O(N⁴) nonzero summands; and, when nonzero, the expectation E(u_{i₁,t} u_{j₁,t} u_{i₂,t} u_{j₂,t} u_{i₃,t} u_{j₃,t} u_{i₄,t} u_{j₄,t}) is obviously uniformly bounded according to Assumption 1. Furthermore,
E( v_{T,m}^{i₁j₁} v_{T,m}^{i₂j₂} v_{T,m}^{i₃j₃} v_{T,m}^{i₄j₄} ) ≤ ( E|v_{T,m}^{i₁j₁}|⁴ E|v_{T,m}^{i₂j₂}|⁴ E|v_{T,m}^{i₃j₃}|⁴ E|v_{T,m}^{i₄j₄}|⁴ )^{1/4}
thanks to Hölder's inequality, and the r.h.s. is uniformly bounded according to Assumption 2.
To sum up, E( E( |(1/N) Σ_{i=1}^N Σ_{j=1}^N u_{i,t} u_{j,t} v_{T,m}^{ij}|⁴ | X ) ) is uniformly bounded, implying the required condition (15). The result then follows.
(v):
Because only those products u_{i,t} u_{j,t} u_{k,t} u_{l,t} for which i = k can have non-zero expectation, we “split” the problem and show that
(1/T) Σ_{t=1}^T (1/N²) Σ_{i=2}^N (Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) (Σ_{l=1}^{i−1} u_{l,t} v_{T,n}^{il}) u²_{i,t} − (1/N) Σ_{i=2}^N σ_ii ( (1/N) Σ_{j=1}^{i−1} σ_jj v_{T,m}^{ij} v_{T,n}^{ij} ) →p 0   (16)
and that
(1/T) Σ_{t=1}^T (1/N²) Σ_{i=2}^N Σ_{j=1}^{i−1} Σ_{i≠k=2}^N Σ_{l=1}^{k−1} u_{i,t} u_{j,t} u_{k,t} u_{l,t} v_{T,m}^{ij} v_{T,n}^{kl} →p 0.   (17)
To establish (16), begin by writing for each time t
(1/N²) Σ_{i=2}^N (Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) (Σ_{l=1}^{i−1} u_{l,t} v_{T,n}^{il}) u²_{i,t}
= (1/N²) Σ_{i=2}^N (Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) (Σ_{l=1}^{i−1} u_{l,t} v_{T,n}^{il}) (u²_{i,t} − σ_ii)
+ (1/N²) Σ_{i=2}^N (Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) (Σ_{l=1}^{i−1} u_{l,t} v_{T,n}^{il}) σ_ii   (18)
and let
ζ_{t,T} = (1/N²) Σ_{i=2}^N (Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) (Σ_{l=1}^{i−1} u_{l,t} v_{T,n}^{il}) (u²_{i,t} − σ_ii)
= (1/N) Σ_{i=2}^N ((1/√N) Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) ((1/√N) Σ_{l=1}^{i−1} u_{l,t} v_{T,n}^{il}) (u²_{i,t} − σ_ii).
Since E((u²_{i,t} − σ_ii)(u²_{j,s} − σ_jj)) = 0 for all s ≠ t, the LIE implies that ζ_{t,T} has zero expectation and is uncorrelated in t, so its average over t vanishes in mean square if
lim_{T→∞} sup_{1≤t≤T} ‖ζ_{t,T}‖₂ < ∞.
Since ζ_{t,T} is itself an average over i, use Minkowski's inequality to obtain that
‖ζ_{t,T}‖₂ ≤ sup_i √( E( ((1/√N) Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) ((1/√N) Σ_{l=1}^{i−1} u_{l,t} v_{T,n}^{il}) (u²_{i,t} − σ_ii) )² ),
so the desired condition lim_{T→∞} sup_{1≤t≤T} ‖ζ_{t,T}‖₂ < ∞ follows from the uniform boundedness of the expectation on the r.h.s., for which we have that
E( ((1/√N) Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) ((1/√N) Σ_{l=1}^{i−1} u_{l,t} v_{T,n}^{il}) (u²_{i,t} − σ_ii) )²
= E( ((1/√N) Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) ((1/√N) Σ_{l=1}^{i−1} u_{l,t} v_{T,n}^{il}) )² E( u²_{i,t} − σ_ii )².
The uniform boundedness of E(u²_{i,t} − σ_ii)² follows directly from Assumption 1, while the uniform boundedness of E( ((1/√N) Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) ((1/√N) Σ_{l=1}^{i−1} u_{l,t} v_{T,n}^{il}) )² is established using the arguments employed in the proof of item (i).
We now establish (18), which completes the derivation of (16). We have at each time t
(1/N²) Σ_{i=2}^N (Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) (Σ_{l=1}^{i−1} u_{l,t} v_{T,n}^{il}) σ_ii = ν_{t,T} + (1/N) Σ_{i=2}^N σ_ii ( (1/N) Σ_{j=1}^{i−1} σ_jj v_{T,m}^{ij} v_{T,n}^{ij} )
with
ν_{t,T} = (1/N²) Σ_{i=2}^N σ_ii Σ_{j=1}^{i−1} Σ_{l=1}^{i−1} (u_{j,t} u_{l,t} − σ_jj δ_{j,l}) v_{T,m}^{ij} v_{T,n}^{il},
where δ_{j,l} is Kronecker's symbol, δ_{j,l} = 1 if j = l and 0 otherwise. Thus, we only have to show that (1/T) Σ_{t=1}^T ν_{t,T} →p 0. The independence of disturbances and regressors leads us to
E( ν_{t,T} | X, ν_{t−1,T}, …, ν_{1,T} ) = 0,
so the LIE indicates that ν_{t,T} has a md array structure; thus, (1/T) Σ_{t=1}^T ν_{t,T} vanishes in mean square if ν_{t,T} has uniformly bounded variance. Uniformly bounded variance is implied by a uniformly bounded L₂ norm ‖ν_{t,T}‖₂. Rearrange terms to obtain via Minkowski's inequality that
‖ν_{t,T}‖₂ ≤ (1/N) Σ_{i=2}^N ( ‖((1/√N) Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) ((1/√N) Σ_{l=1}^{i−1} u_{l,t} v_{T,n}^{il})‖₂ + ‖(1/N) Σ_{j=1}^{i−1} σ_jj v_{T,m}^{ij} v_{T,n}^{ij}‖₂ ).
Given the assumed uniform moment conditions on the regressors, it suffices to show that the product ((1/√N) Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) ((1/√N) Σ_{l=1}^{i−1} u_{l,t} v_{T,n}^{il}) has uniformly bounded variance, which in turn is implied by uniform boundedness of
E( (1/√N) Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij} )⁴.
The latter expectation is
E( (1/√N) Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij} )⁴ = (1/N²) Σ_{j₁=1}^{i−1} Σ_{j₂=1}^{i−1} Σ_{j₃=1}^{i−1} Σ_{j₄=1}^{i−1} E( u_{j₁,t} u_{j₂,t} u_{j₃,t} u_{j₄,t} ) E( v_{T,m}^{ij₁} v_{T,m}^{ij₂} v_{T,m}^{ij₃} v_{T,m}^{ij₄} ),
and, due to the independence of the u's, the sum only has O(N²) nonzero terms. Then, the expectations E(u_{j₁,t} u_{j₂,t} u_{j₃,t} u_{j₄,t}) and E(v_{T,m}^{ij₁} v_{T,m}^{ij₂} v_{T,m}^{ij₃} v_{T,m}^{ij₄}) are uniformly bounded since E(u⁴_{j,t}) and E|v_{T,m}^{ij}|⁴ are uniformly bounded according to Assumptions 1 and 2.
To complete the proof, we only need to establish (17). Note that, like above, the elements averaged over t are uncorrelated, so (17) follows if
(1/N²) Σ_{i=2}^N Σ_{j=1}^{i−1} Σ_{i≠k=2}^N Σ_{l=1}^{k−1} u_{i,t} u_{j,t} u_{k,t} u_{l,t} v_{T,m}^{ij} v_{T,n}^{kl}
= (1/N²) Σ_{i=2}^N Σ_{j=1}^{i−1} Σ_{k=2}^{i−1} Σ_{l=1}^{k−1} u_{i,t} u_{j,t} u_{k,t} u_{l,t} v_{T,m}^{ij} v_{T,n}^{kl} + (1/N²) Σ_{i=2}^N Σ_{j=1}^{i−1} Σ_{k=i+1}^N Σ_{l=1}^{k−1} u_{i,t} u_{j,t} u_{k,t} u_{l,t} v_{T,m}^{ij} v_{T,n}^{kl}
has uniformly bounded variance (where we make the convention that Σ_{k=2}^1 = Σ_{k=N+1}^N = 0). This is the case if each of the two sums on the r.h.s. has itself uniformly bounded variance. Note that, for the first sum, i > j and i > k > l, while, for the second, k > l and k > i > j. Thus, for each of the two sums alone, the summands are pairwise uncorrelated thanks to the independence and zero mean of u_{i,t}. The variance of the first sum then satisfies
Var( (1/N²) Σ_{i=2}^N Σ_{k=2, k<i}^N (Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) (Σ_{l=1}^{k−1} u_{l,t} v_{T,n}^{kl}) u_{i,t} u_{k,t} ) = (1/N²) Σ_{i=2}^N Σ_{k=2, k<i}^N Var( ((1/√N) Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) ((1/√N) Σ_{l=1}^{k−1} u_{l,t} v_{T,n}^{kl}) u_{i,t} u_{k,t} ).
There are O(N²) sum elements, so the variance on the l.h.s. is indeed uniformly bounded if the expectation E( ((1/√N) Σ_{j=1}^{i−1} u_{j,t} v_{T,m}^{ij}) ((1/√N) Σ_{l=1}^{k−1} u_{l,t} v_{T,n}^{kl}) u_{i,t} u_{k,t} )² is itself uniformly bounded; this last step is easily established using the LIE, the independence of disturbances and regressors, and the assumed uniform moment conditions. The same argument applies for the sum with k > i and the result follows.
(vi):
We establish the result elementwise and show that
E( (1/N²) Σ_{i=2}^N Σ_{j=1}^{i−1} (σ̂_ii σ̂_jj − σ_ii σ_jj) v_{T,m}^{ij} v_{T,n}^{ij} ) = 0   (19)
and
Var( (1/N²) Σ_{i=2}^N Σ_{j=1}^{i−1} (σ̂_ii σ̂_jj − σ_ii σ_jj) v_{T,m}^{ij} v_{T,n}^{ij} ) → 0   (20)
for all m, n = 1, …, ½K(K+1). Using the LIE, the independence of disturbances for i ≠ j, and the independence of disturbances and regressors, it is straightforward to show that
E( (σ̂_ii σ̂_jj − σ_ii σ_jj) v_{T,m}^{ij} v_{T,n}^{ij} ) = 0,
which implies (19). The same arguments indicate that
E( (σ̂_ii σ̂_jj − σ_ii σ_jj) v_{T,m}^{ij} v_{T,n}^{ij} (σ̂_kk σ̂_ll − σ_kk σ_ll) v_{T,m}^{kl} v_{T,n}^{kl} ) = 0
unless i = k or j = l. Then,
Var( (1/N²) Σ_{i=2}^N Σ_{j=1}^{i−1} (σ̂_ii σ̂_jj − σ_ii σ_jj) v_{T,m}^{ij} v_{T,n}^{ij} )
= (1/N⁴) Σ_{i=2}^N Σ_{j=1}^{i−1} Σ_{k=2}^N Σ_{l=1}^{k−1} Cov( (σ̂_ii σ̂_jj − σ_ii σ_jj) v_{T,m}^{ij} v_{T,n}^{ij}, (σ̂_kk σ̂_ll − σ_kk σ_ll) v_{T,m}^{kl} v_{T,n}^{kl} ).
Due to the zero expectation of (σ̂_ii σ̂_jj − σ_ii σ_jj) v_{T,m}^{ij} v_{T,n}^{ij}, the covariances equal the expectation of the products, and are as such only nonzero if i = k or j = l. Thanks to the assumed uniform moment conditions, the covariances are uniformly bounded. So there are O(N³) uniformly bounded sum terms in the expression for the variance (20), implying the variance to vanish and leading to the desired result.