LM-type tests for slope homogeneity in panel data models∗
Jörg Breitung†
University of Cologne
Christoph Roling‡
Deutsche Bundesbank
Nazarii Salish
BGSE, University of Bonn.
July 14, 2016
Abstract
This paper employs the Lagrange Multiplier (LM) principle to test parameter homogeneity across cross-section units in panel data models. The test can be seen
as a generalization of the Breusch-Pagan test against random individual effects to
all regression coefficients. While the original test procedure assumes a likelihood
framework under normality, several useful variants of the LM test are presented to
allow for non-normality, heteroskedasticity and serially correlated errors. Moreover,
the tests can be conveniently computed via simple artificial regressions. We derive
the limiting distribution of the LM test and show that if the errors are not normally
distributed, the original LM test is asymptotically valid if the number of time periods tends to infinity. A simple modification of the score statistic yields an LM
test that is robust to non-normality if the number of time periods is fixed. Further
adjustments provide versions of the LM test that are robust to heteroskedasticity
and serial correlation. We compare the local power of our tests and the statistic
proposed by Pesaran and Yamagata (2008). The results of the Monte Carlo experiments suggest that the LM-type test can be substantially more powerful, in
particular, when the number of time periods is small.
JEL classification: C12; C33
Keywords: panel data model; random coefficients; LM test; heterogeneous coefficients
∗ We are grateful to the editor Michael Jansson and an anonymous referee for helpful and constructive comments on an earlier version of this paper.
† Corresponding author: University of Cologne, Institute of Econometrics, 50923 Cologne, Germany. Email: [email protected], [email protected], [email protected].
‡ The views expressed in this paper do not necessarily reflect the views of Deutsche Bundesbank or its staff.
1 Introduction
In classical panel data analysis it is assumed that unobserved heterogeneity is captured by
individual-specific constants, whether they are assumed to be fixed or random. In many applications, however, it cannot be ruled out that slope coefficients are also individual-specific. For instance, heterogeneous preferences among individuals may result in individual-specific price or income elasticities. Ignoring this form of heterogeneity may result in biased estimation and inference. Therefore, it is important to test the assumption of slope homogeneity before applying standard panel data techniques such as the least-squares dummy-variable (LSDV) estimator for the fixed effects panel data model.
If there is evidence for individual-specific slope parameters, economists are interested
in estimating a population average like the mean of the individual-specific coefficients.
Pesaran and Smith (1995) advocate mean group estimation, where in a first step the
model is estimated separately for each cross-section unit. In a second step, the unit-specific estimates are averaged to obtain an estimator for the population mean of the
parameters. Alternatively, Swamy (1970) proposes a generalized least squares (GLS)
estimator for the random coefficients model, which assumes that the individual regression
coefficients are randomly distributed around a common mean.
In this paper we derive a test for slope homogeneity by employing the LM principle
within a random coefficients framework, which allows us to formulate the null hypothesis
of slope homogeneity in terms of K restrictions on the variance parameters. Hence, the
LM approach substantially reduces the number of restrictions to be tested compared to
the set of K(N − 1) linear restrictions on the coefficients implied by the test proposed
by Pesaran and Yamagata (2008), henceforth referred to as PY. This does not mean, however, that our test is confined to detecting random deviations in the coefficients. In fact, our test is optimal against the alternative of random coefficients, but it is also powerful against any systematic variation of the regression coefficients.
Our approach is related but not identical to the conditional LM test recently suggested
by Juhl and Lugovskyy (2014) which is referred to as the JL test. The main difference
is that the latter test is derived for a more restrictive alternative, where it is assumed
that the individual-specific slope coefficients attached to the K regressors have identical
variances. In contrast, our test focuses on the alternative that the coefficients have different variances which allows us to test for heterogeneity in a subset of the regression
coefficients. Furthermore, the derivation of our test follows the original LM principle
involving the information matrix, whereas the JL test employs the outer product of the
scores as an estimator of the information matrix. Our simulation study suggests that both
non-standard features of the latter test may result in size distortions in small samples and
a sizable loss in power. An important advantage of the JL test, however, is that it is robust
against non-Gaussian and heteroskedastic errors. We therefore propose variants of the
original LM test that share the robustness against non-Gaussian and heteroskedastic errors. Furthermore, we also suggest a modified LM test that is robust to serially correlated
errors. Another contribution of the paper is the analysis of the local power of the test
that allows us to compare the power properties of the LM and PY tests. Specifically, we
find that the location parameter of the LM test depends on the cross-section dispersion of the regressor variances, whereas the location parameter of the PY test only depends
on the mean of the regressor variances. Thus, if the regressor variances differ across the
panel groups, the gain in power from using the LM test may be substantial.
The outline of the paper is as follows. In Section 2 we compare two tests for slope
heterogeneity recently proposed in the literature. We introduce the random coefficients
model in Section 3 and lay out the (standard) assumptions for analyzing the large-sample properties. In Section 4 we derive the LM statistic and establish its asymptotic distribution.
Section 5 discusses several variants of the proposed test. First, we relax the normality
assumption and extend the result of the previous section to this more general setting.
Second, we propose a regression-based version of the LM test. Section 6 investigates the
local asymptotic power of the LM test. Section 7 describes the design of our Monte Carlo
experiments and discusses the results. Section 8 concludes.
2 Existing tests
To prepare the theoretical discussion in the following sections, we briefly review the random coefficients model and existing tests. Following Swamy (1970), consider a linear
panel data model
$$y_{it} = x_{it}'\beta_i + \varepsilon_{it}, \qquad (1)$$
for i = 1, 2, . . . , N , and t = 1, 2, . . . , T , where yit is the dependent variable for unit i at
time period t, x_it is a K × 1 vector of explanatory variables and ε_it is an idiosyncratic error with zero mean and variance E(ε²_it) = σ²_i. For the slope coefficient β_i we assume
βi = β + vi ,
where β is a fixed K × 1 vector and v_i is an i.i.d. random vector with zero mean and K × K
covariance matrix Σv .1
1 For more details and extensions of the basic random coefficient model see Hsiao and Pesaran (2008). As pointed out by a referee, this specification may be replaced by some systematic variation of the coefficients that depends on observed variables. For example, we may specify the deviations as β_i − β = Γz_i + η_i, where z_i is some vector of observed variables possibly correlated with x_it. The corresponding variant of the LM test (which is different from our LM test based on assuming that v_i and x_it are independent) will be optimal against this particular form of systematic variation. In general, our test assuming independent variation with Γ = 0 will also have power against systematic variations, but admittedly our test is not optimal against alternatives with systematically varying coefficients.
The null hypothesis of slope homogeneity is
β1 = β2 = · · · = βN = β,
(2)
which is equivalent to testing Σ_v = 0. To test hypothesis (2), Swamy suggests the statistic
$$\hat S^* = \sum_{i=1}^{N} \left(\hat\beta_i - \hat\beta_{WLS}\right)' \frac{X_i' X_i}{s_i^2} \left(\hat\beta_i - \hat\beta_{WLS}\right),$$
with X_i = (x_i1, ..., x_iT)', y_i = (y_i1, ..., y_iT)', and β̂_i = (X_i'X_i)^{-1}X_i'y_i the ordinary least squares (OLS) estimator of (1) for panel unit i. The common slope parameter β is
estimated by the weighted least-squares estimator
$$\hat\beta_{WLS} = \left(\sum_{i=1}^{N} \frac{X_i' X_i}{s_i^2}\right)^{-1} \sum_{i=1}^{N} \frac{X_i' y_i}{s_i^2},$$
where s²_i denotes the standard OLS estimator of σ²_i.
Intuitively, if the regression coefficients are identical, the differences between the individual estimators and the pooled estimator should be small. Therefore, Swamy's test rejects the null hypothesis of homogeneous slopes for large values of this statistic, which
possesses a limiting χ2 distribution with K(N − 1) degrees of freedom as N is fixed and
T → ∞.
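Swamy's two-step construction is simple to compute unit by unit. The following numpy sketch (our own illustrative code, with hypothetical function and variable names, not the authors' implementation) contrasts a homogeneous and a heterogeneous panel:

```python
import numpy as np

def swamy_statistic(y_list, X_list):
    """Swamy's dispersion statistic S*: unit-by-unit OLS, a weighted pooled
    estimator, and a weighted sum of squared deviations between the two.
    Under H0 it is asymptotically chi-squared with K(N-1) degrees of freedom
    (N fixed, T -> infinity)."""
    T, K = X_list[0].shape
    betas, weights, wXy = [], [], []
    for y, X in zip(y_list, X_list):
        b = np.linalg.solve(X.T @ X, X.T @ y)        # unit-specific OLS estimate
        e = y - X @ b
        s2 = e @ e / (T - K)                         # OLS estimator of sigma_i^2
        betas.append(b)
        weights.append(X.T @ X / s2)                 # X_i'X_i / s_i^2
        wXy.append(X.T @ y / s2)
    b_wls = np.linalg.solve(sum(weights), sum(wXy))  # weighted LS estimator
    return sum((b - b_wls) @ W @ (b - b_wls) for b, W in zip(betas, weights))

rng = np.random.default_rng(0)
N, T, K = 5, 100, 2
X_list = [rng.standard_normal((T, K)) for _ in range(N)]
beta = np.array([1.0, -0.5])
y_hom = [X @ beta + rng.standard_normal(T) for X in X_list]          # H0 holds
y_het = [X @ (beta + rng.normal(0, 2, K)) + rng.standard_normal(T)   # H1 holds
         for X in X_list]
S_hom = swamy_statistic(y_hom, X_list)
S_het = swamy_statistic(y_het, X_list)
```

Under slope homogeneity the statistic stays close to its χ² reference distribution, while sizable random slope deviations inflate it sharply.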
Pesaran and Yamagata (2008) emphasize that in many empirical applications N is
large relative to T and the approximation by a χ2 distribution is unreliable. PY adapt
the test to a setting in which N and T jointly tend to infinity. In particular, they assume
individual-specific intercepts and derive a test for the hypothesis β1 = · · · = βN = β in
$$y_{it} = \alpha_i + x_{it}'\beta_i + \varepsilon_{it}. \qquad (3)$$
The analogue of the pooled weighted least-squares estimator above eliminates the unobserved fixed effects,
$$\hat\beta_{WFE} = \left(\sum_{i=1}^{N} \frac{X_i' M_\iota X_i}{\hat\sigma_i^2}\right)^{-1} \sum_{i=1}^{N} \frac{X_i' M_\iota y_i}{\hat\sigma_i^2},$$
where M_ι = I_T − ι_T ι_T'/T, and ι_T is a T × 1 vector of ones. A natural estimator for σ²_i is
$$\hat\sigma_i^2 = \frac{\left(y_i - X_i\hat\beta_i\right)' M_\iota \left(y_i - X_i\hat\beta_i\right)}{T - K - 1},$$
where β̂_i = (X_i' M_ι X_i)^{-1}(X_i' M_ι y_i) and the test statistic becomes
$$\hat S = \sum_{i=1}^{N} \left(\hat\beta_i - \hat\beta_{WFE}\right)' \frac{X_i' M_\iota X_i}{\hat\sigma_i^2} \left(\hat\beta_i - \hat\beta_{WFE}\right).$$
Employing a joint limit theory for N and T, PY obtain the limiting distribution as
$$\hat\Delta = \frac{\hat S - NK}{\sqrt{2NK}} \xrightarrow{d} N(0,1), \qquad (4)$$
provided that N → ∞, T → ∞ and √N/T → 0. Thus, by appropriately centering and standardizing the test statistic, inference can be carried out by resorting to the standard normal distribution, provided the time dimension is sufficiently large relative to the cross-section dimension. PY propose several modified versions of this test, which for brevity
we shall refer to as the ∆ tests or statistics. In particular, to improve the small sample
properties of the test, PY suggest the adjusted statistic under normally distributed errors
(see Remark 2 in PY),
$$\tilde\Delta_{adj} = \sqrt{N}\left(\frac{N^{-1}\tilde S - K}{\sqrt{2K(T-K-1)/(T+1)}}\right), \qquad (5)$$
where S̃ is computed as Ŝ but replacing σ̂²_i by the variance estimator
$$\tilde\sigma_i^2 = \frac{\left(y_i - X_i\tilde\beta_{FE}\right)' M_\iota \left(y_i - X_i\tilde\beta_{FE}\right)}{T - 1}, \qquad (6)$$
where $\tilde\beta_{FE} = \left(\sum_{i=1}^{N} X_i' M_\iota X_i\right)^{-1} \sum_{i=1}^{N} X_i' M_\iota y_i$ is the standard 'fixed effects' (within-group)
estimator. Note that this asymptotic framework does not seem to be well suited for typical
panel data applications where N is large relative to T . Therefore, it will be of interest to
derive a test statistic that is valid when T is small (say T = 10) and N is very large (say
N = 1000), which, for instance, is encountered in microeconomic panels.
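As an illustration, the adjusted PY statistic in (5) can be sketched as follows (our own hypothetical code; as a simplification the pooled within-group estimator replaces the weighted pooled estimator β̂_WFE, so this is an approximation of the original procedure):

```python
import numpy as np

def py_delta_adj(y, X):
    """Sketch of the adjusted Pesaran-Yamagata statistic (5). y is N x T,
    X is N x T x K; fixed effects are removed by within-demeaning (M_iota).
    Simplification: the pooled within-group estimator beta_FE replaces the
    weighted pooled estimator of the original test. Approximately standard
    normal under H0 when N and T are large."""
    N, T, K = X.shape
    yd = y - y.mean(axis=1, keepdims=True)              # M_iota y_i
    Xd = X - X.mean(axis=1, keepdims=True)              # M_iota X_i
    A = np.einsum('itk,itl->kl', Xd, Xd)
    b_fe = np.linalg.solve(A, np.einsum('itk,it->k', Xd, yd))
    s2 = ((yd - Xd @ b_fe) ** 2).sum(axis=1) / (T - 1)  # variance estimator (6)
    S = 0.0
    for i in range(N):
        G = Xd[i].T @ Xd[i]
        b_i = np.linalg.solve(G, Xd[i].T @ yd[i])       # unit-specific OLS
        d = b_i - b_fe
        S += d @ G @ d / s2[i]                          # dispersion measure
    return np.sqrt(N) * (S / N - K) / np.sqrt(2 * K * (T - K - 1) / (T + 1))

rng = np.random.default_rng(1)
N, T, K = 100, 20, 2
X = rng.standard_normal((N, T, K))
alpha = rng.standard_normal((N, 1))                     # fixed effects
y = alpha + np.einsum('itk,k->it', X, np.array([1.0, -0.5])) \
    + rng.standard_normal((N, T))
delta = py_delta_adj(y, X)
```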
The test statistic proposed by Juhl and Lugovskyy (2014) is based on the individual scores
$$S_i = \hat u_i' M_\iota X_i X_i' M_\iota \hat u_i - \hat\sigma_i^2\,\mathrm{tr}\left(X_i' M_\iota X_i\right),$$
where û_i = y_i − X_i β̃_FE and tr(A) denotes the trace of the matrix A. The (conditional) LM statistic results as
$$CLM = \left(\sum_{i=1}^{N} S_i\right)' \left(\sum_{i=1}^{N} S_i S_i'\right)^{-1} \left(\sum_{i=1}^{N} S_i\right). \qquad (7)$$
It is interesting to compare this test statistic to the PY test, which is based on the sum $\hat S = \sum_{i=1}^{N} \hat S_i$ with
$$\hat S_i = \left(\hat\beta_i - \hat\beta_{WFE}\right)' \frac{X_i' M_\iota X_i}{\hat\sigma_i^2} \left(\hat\beta_i - \hat\beta_{WFE}\right) = \frac{1}{\sigma_i^2}\, u_i' M_\iota X_i \left(X_i' M_\iota X_i\right)^{-1} X_i' M_\iota u_i + o_p(1)$$
if N and T tend to infinity. Note that lim_{N→∞} E(Ŝ_i) = K. The main difference between the JL and the PY statistics is that the statistic S_i neglects the additional inverse (σ²_i X_i' M_ι X_i)^{-1} in the statistic Ŝ_i. Thus, although these two test statistics are derived from different statistical principles, the final test statistics essentially test the independence of u_i and M_ι X_i, that is, E(u_i' M_ι X_i W_i X_i' M_ι u_i) = σ²_i E(tr[M_ι X_i W_i X_i' M_ι]) with W_i = I_K for the JL test and W_i = (σ²_i X_i' M_ι X_i)^{-1} for the PY test.
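The CLM statistic in (7) can be sketched in a few lines (our own illustrative code; with scalar scores S_i the quadratic form collapses to (Σ_i S_i)²/Σ_i S_i²):

```python
import numpy as np

def clm_statistic(y, X):
    """Juhl-Lugovskyy conditional LM statistic (7). y is N x T, X is N x T x K;
    residuals come from the pooled within-group (fixed effects) regression.
    Asymptotically chi-squared with one degree of freedom under H0."""
    N, T, K = X.shape
    yd = y - y.mean(axis=1, keepdims=True)       # within-demeaned data
    Xd = X - X.mean(axis=1, keepdims=True)
    A = np.einsum('itk,itl->kl', Xd, Xd)
    b_fe = np.linalg.solve(A, np.einsum('itk,it->k', Xd, yd))
    u = yd - Xd @ b_fe                           # u_hat_i = M_iota (y_i - X_i b)
    s2 = (u ** 2).sum(axis=1) / (T - 1)          # unit-specific sigma_i^2
    S = np.array([u[i] @ Xd[i] @ Xd[i].T @ u[i]
                  - s2[i] * np.trace(Xd[i].T @ Xd[i]) for i in range(N)])
    return S.sum() ** 2 / (S @ S)                # eq. (7) with scalar scores

rng = np.random.default_rng(2)
N, T, K = 200, 10, 2
X = rng.standard_normal((N, T, K))
y = rng.standard_normal((N, 1)) + np.einsum('itk,k->it', X, np.ones(K)) \
    + rng.standard_normal((N, T))               # homogeneous slopes
clm = clm_statistic(y, X)
```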
3 Model and Assumptions
Consider a linear panel data model with random coefficients,
$$y_i = X_i\beta_i + \varepsilon_i, \qquad (8)$$
$$\beta_i = \beta + v_i, \qquad (9)$$
for i = 1, 2, ..., N, where y_i is a T × 1 vector of observations on the dependent variable for cross-section unit i, and X_i is a T × K matrix of possibly stochastic regressors. To simplify the exposition we assume a balanced panel with the same number of observations in each panel unit (see also Remark 1 of Lemma 1). The vector of random coefficients is decomposed into a common non-stochastic vector β and a vector of individual-specific disturbances v_i. Let X = [X_1', X_2', ..., X_N']'.
In order to construct the LM test statistic for slope homogeneity we start with model
(8)-(9) under stylized assumptions. However, in Section 5 these assumptions will be
relaxed to accommodate more general and empirically relevant setups. The following
assumptions are imposed on the errors and the regressor matrix:
Assumption 1 The error vectors are distributed as ε_i|X ~ i.i.d. N(0, σ²I_T) and v_i|X ~ i.i.d. N(0, Σ_v), where Σ_v = diag(σ²_{v,1}, ..., σ²_{v,K}). The errors ε_i and v_j are independent of each other for all i and j.
Assumption 2 For the regressors we assume E|x_it,k|^{4+δ} < C < ∞ for some δ > 0, for all i = 1, 2, ..., N, t = 1, 2, ..., T and k = 1, 2, ..., K. The matrix N^{-1}E(X'X) is positive definite for all N and T, and the limit lim_{N→∞} N^{-1}E(X'X) exists and is positive definite.
In Assumption 1, the random components of the slope parameters are allowed to have
different variances but we assume that there is no correlation among the elements of vi .
Note that this framework is more general than the one considered by Juhl and Lugovskyy (2014), who assume E(v_i v_i') = σ̄²_v I_K. The latter assumption seems less appealing if there are sizable differences in the magnitudes of the coefficients. Furthermore, the power of their test depends on the scaling of the regressors, whereas the (local) power of our test is invariant to a rescaling of the regressors (see Theorem 5). The alternative hypothesis
can be further generalized by allowing for a correlation among the elements of the error
vector vi . However, this would increase the dimension of the null hypothesis to K(K +1)/2
restrictions and it is therefore not clear whether accounting for the covariances helps to
increase the power of the test. Obviously, if all variances are zero, then the covariances
are zero as well.2
Let u_i = X_i v_i + ε_i. Stacking observations with respect to i yields
$$y = X\beta + u, \qquad (10)$$
where y = (y_1', ..., y_N')' and u = (u_1', ..., u_N')'. The NT × NT covariance matrix of u is given by
$$\Omega \equiv E\left[uu'|X\right] = \begin{pmatrix} X_1\Sigma_v X_1' + \sigma^2 I_T & & 0 \\ & \ddots & \\ 0 & & X_N\Sigma_v X_N' + \sigma^2 I_T \end{pmatrix}.$$
The hypothesis of fixed homogeneous slope coefficients, β_i = β for all i, corresponds to testing
$$H_0: \sigma^2_{v,k} = 0, \quad \text{for } k = 1, \ldots, K,$$
against the alternative
$$H_1: \sum_{k=1}^{K} \sigma^2_{v,k} > 0, \qquad (11)$$
that is, under the alternative at least one of the variance parameters is larger than zero.
4 The LM Test for Slope Homogeneity
Let θ = (σ²_{v,1}, ..., σ²_{v,K}, σ²)'. Under Assumption 1 the corresponding log-likelihood function results as
$$\ell(\beta, \theta) = -\frac{NT}{2}\log(2\pi) - \frac{1}{2}\log|\Omega(\theta)| - \frac{1}{2}(y - X\beta)'\,\Omega(\theta)^{-1}(y - X\beta). \qquad (12)$$
2 We also conducted Monte Carlo simulations allowing for non-zero off-diagonal elements in the matrix Σ_v. We found that the results are quite similar to the setting where Σ_v is diagonal.
The restricted ML estimator of β under the null hypothesis coincides with the pooled OLS estimator β̃ = (X'X)^{-1}X'y; the corresponding residual vector and estimated residual variance are denoted by ũ_i = y_i − X_i β̃ and σ̃². The following lemma presents the score and the information matrix derived from the log-likelihood function in (12).
Lemma 1 The score vector evaluated under the null hypothesis is given by
$$\tilde S \equiv \frac{\partial\ell}{\partial\theta}\bigg|_{H_0} = \frac{1}{2\tilde\sigma^4}\begin{pmatrix} \sum_{i=1}^{N}\left(\tilde u_i' X_i^{(1)} X_i^{(1)\prime} \tilde u_i - \tilde\sigma^2 X_i^{(1)\prime} X_i^{(1)}\right) \\ \vdots \\ \sum_{i=1}^{N}\left(\tilde u_i' X_i^{(K)} X_i^{(K)\prime} \tilde u_i - \tilde\sigma^2 X_i^{(K)\prime} X_i^{(K)}\right) \\ 0 \end{pmatrix}, \qquad (13)$$
where X^{(k)} is the k-th column of X for k = 1, 2, ..., K.
The information matrix evaluated under the null hypothesis is
$$I(\tilde\sigma^2) \equiv -E\left[\frac{\partial^2\ell}{\partial\theta\,\partial\theta'}\bigg|_{H_0}\right] = \frac{1}{2\tilde\sigma^4}\begin{pmatrix} \sum_{i=1}^{N}\left(X_i^{(1)\prime}X_i^{(1)}\right)^2 & \cdots & \sum_{i=1}^{N}\left(X_i^{(1)\prime}X_i^{(K)}\right)^2 & X^{(1)\prime}X^{(1)} \\ \sum_{i=1}^{N}\left(X_i^{(2)\prime}X_i^{(1)}\right)^2 & \cdots & \sum_{i=1}^{N}\left(X_i^{(2)\prime}X_i^{(K)}\right)^2 & X^{(2)\prime}X^{(2)} \\ \vdots & \ddots & \vdots & \vdots \\ \sum_{i=1}^{N}\left(X_i^{(K)\prime}X_i^{(1)}\right)^2 & \cdots & \sum_{i=1}^{N}\left(X_i^{(K)\prime}X_i^{(K)}\right)^2 & X^{(K)\prime}X^{(K)} \\ X^{(1)\prime}X^{(1)} & \cdots & X^{(K)\prime}X^{(K)} & NT \end{pmatrix}, \qquad (14)$$
where X_i^{(k)} denotes the k-th column of the T × K matrix X_i, k = 1, 2, ..., K and i = 1, ..., N.
Remark 1 It is straightforward to extend Lemma 1 to unbalanced panel data, where observations are assumed to be missing at random. Let X_i be a T_i × K matrix and ũ_i be a conformable T_i × 1 vector. The score vector is given by
$$\tilde S = \frac{1}{2\tilde\sigma^4}\begin{pmatrix} \sum_{i=1}^{N}\left(\tilde u_i' X_i^{(1)} X_i^{(1)\prime} \tilde u_i - \tilde\sigma^2 X_i^{(1)\prime} X_i^{(1)}\right) \\ \vdots \\ \sum_{i=1}^{N}\left(\tilde u_i' X_i^{(K)} X_i^{(K)\prime} \tilde u_i - \tilde\sigma^2 X_i^{(K)\prime} X_i^{(K)}\right) \\ 0 \end{pmatrix},$$
where
$$\tilde\sigma^2 = \frac{1}{\sum_{i=1}^{N} T_i}\sum_{i=1}^{N}\tilde u_i'\tilde u_i.$$
The information matrix is computed accordingly.
Remark 2 If individual-specific constants α_i are included in the regression, then a conditional version of the test is available (cf. Juhl and Lugovskyy (2014)). The individual effects can be "conditioned out" by considering the transformed regression
$$M_\iota y_i = M_\iota X_i\beta + M_\iota u_i, \qquad (15)$$
with M_ι as defined in Section 2. The typical elements of the corresponding score vector result as
$$\frac{1}{\tilde\sigma^4}\left(\tilde u_i' M_\iota X_i^{(j)} X_i^{(j)\prime} M_\iota \tilde u_i - \tilde\sigma^2 X_i^{(j)\prime} M_\iota X_i^{(j)}\right), \qquad j = 1, \ldots, K,$$
where ũ_i = M_ι y_i − M_ι X_i β̃ and β̃ is the pooled OLS estimator of the transformed model (15), and σ̃² is the corresponding estimated residual variance. It follows that we just have to replace the vector X_i^{(j)} by the mean-adjusted vector M_ι X_i^{(j)} in Theorem 1.
Remark 3 It is easy to see that under the more restrictive alternative E(v_i v_i') = σ̄²_v I_K of Juhl and Lugovskyy (2014), where σ²_{v,1} = ··· = σ²_{v,K} = σ̄²_v, the score is simply the sum of all elements of S̃.
Remark 4 Notice also that the LM-type statistics do not require the restriction K < T, which is important for the PY approach. This is of course not an issue in the asymptotic framework where T → ∞; however, it can be a substantive restriction in many empirical applications where T is small.
In the following theorem it is shown that when T is fixed, the LM statistic possesses a χ² limiting null distribution with K degrees of freedom as N → ∞.
Theorem 1 Under Assumptions 1, 2 and the null hypothesis,
$$LM = \tilde S'\, I(\tilde\sigma^2)^{-1}\tilde S = \tilde s'\,\tilde V^{-1}\tilde s \xrightarrow{d} \chi^2_K, \qquad (16)$$
as N → ∞ and T is fixed, where s̃ is defined as the K × 1 vector with typical element
$$\tilde s_k = \frac{1}{2\tilde\sigma^4}\sum_{i=1}^{N}\left(\sum_{t=1}^{T}\tilde u_{it}x_{it,k}\right)^2 - \frac{1}{2\tilde\sigma^2}\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it,k}^2, \qquad (17)$$
and the (k, l) element of the matrix Ṽ is given by
$$\tilde V_{k,l} = \frac{1}{2\tilde\sigma^4}\left[\sum_{i=1}^{N}\left(\sum_{t=1}^{T}x_{it,k}x_{it,l}\right)^2 - \frac{1}{NT}\left(\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it,k}^2\right)\left(\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it,l}^2\right)\right]. \qquad (18)$$
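Equations (16)-(18) translate almost literally into code. The following is a hypothetical numpy sketch (our own names and setup) for the basic model (8)-(9) without individual effects:

```python
import numpy as np

def lm_statistic(y, X):
    """LM test of Theorem 1: pooled OLS under H0, score (17), covariance (18),
    quadratic form (16). y is N x T, X is N x T x K; chi-squared with K
    degrees of freedom as N -> infinity, T fixed (normal errors)."""
    N, T, K = X.shape
    Xs, ys = X.reshape(N * T, K), y.reshape(N * T)
    b = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)        # pooled OLS estimator
    u = (ys - Xs @ b).reshape(N, T)
    s2 = (u ** 2).mean()                             # sigma_tilde^2
    g = np.einsum('it,itk->ik', u, X)                # sum_t u_it x_it,k
    m = (X ** 2).sum(axis=(0, 1))                    # sum_{i,t} x_it,k^2
    s = (g ** 2).sum(axis=0) / (2 * s2 ** 2) - m / (2 * s2)              # (17)
    C = np.einsum('itk,itl->ikl', X, X)              # sum_t x_it,k x_it,l
    V = ((C ** 2).sum(axis=0) - np.outer(m, m) / (N * T)) / (2 * s2 ** 2)  # (18)
    return s @ np.linalg.solve(V, s)

rng = np.random.default_rng(4)
N, T, K = 500, 6, 2
X = rng.standard_normal((N, T, K))
beta = np.array([1.0, 2.0])
y_h0 = np.einsum('itk,k->it', X, beta) + rng.standard_normal((N, T))
lm_h0 = lm_statistic(y_h0, X)
v = rng.normal(0.0, 1.0, (N, K))                     # random slopes under H1
y_h1 = np.einsum('itk,k->it', X, beta) + np.einsum('itk,ik->it', X, v) \
    + rng.standard_normal((N, T))
lm_h1 = lm_statistic(y_h1, X)
```

Under homogeneity the statistic behaves like a χ²_K draw; under sizable random slope variation it becomes very large.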
Remark 5 If T is fixed, normality of the regression disturbances is required. If we relax
the normality assumption, an additional term enters the variance of the score vector and
the information matrix becomes an inconsistent estimator. Theorem 2 discusses this issue in more detail and derives the asymptotic distribution of the LM test when the errors are not normally distributed.
Remark 6 It may be of interest to restrict attention to a subset of coefficients. For example, in the classical panel data model it is assumed that the constants are individual-specific and, therefore, the respective parameters are not included in the null hypothesis. Another possibility is that a subset of coefficients is assumed to be constant across all panel units. To account for such specifications the model is partitioned as
$$y_{it} = \beta_{1i}' X_{it}^{a} + \beta_2' X_{it}^{b} + \beta_{3i}' X_{it}^{c} + u_{it}.$$
The K₁ × 1 vector X_it^a includes all regressors that are assumed to have individual-specific coefficients stacked in the vector β_1i. The K₂ × 1 vector X_it^b comprises all regressors that are supposed to have homogeneous coefficients. The null hypothesis is that the coefficient vector β_3i attached to the K₃ × 1 vector of regressors X_it^c is identical for all panel units, that is, β_3i = β_3 for all i, where β_3i = β_3 + v_3i. The null hypothesis implies Σ_{v3} = 0. Let
$$Z = \begin{pmatrix} X_1^a & 0 & \cdots & 0 & X_1^b & X_1^c \\ 0 & X_2^a & \cdots & 0 & X_2^b & X_2^c \\ \vdots & & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & X_N^a & X_N^b & X_N^c \end{pmatrix},$$
where X_i^a = [X_{i1}^a, ..., X_{iT}^a]' and the matrices X_i^b and X_i^c are defined accordingly. The residuals are obtained as ũ = (I − Z(Z'Z)^{-1}Z')y and the columns of the matrix X^c are used to compute the LM statistic. Some caution is required if a set of individual-specific coefficients is included in the panel regression, since in this case the ML estimator σ̃² = (NT)^{-1}Σ_{i=1}^N Σ_{t=1}^T ũ²_it is inconsistent for fixed T and N → ∞. This implies that the expectation of the score vector (13) is different from zero. Accordingly, the unbiased estimator
$$\hat\sigma^2 = \frac{1}{NT - K_1 - K_2 - K_3}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde u_{it}^2 \qquad (19)$$
must be employed. As a special case, assume that the constant is included in X_i^c, whereas all other regressors are included in the matrix X_i^b, and X_i^a is dropped. This case is equivalent to the test for random individual effects as suggested by Breusch and Pagan (1980). The LM statistic then reduces to
$$LM = \frac{NT}{2(T-1)}\left(1 - \frac{\tilde u'\left(I_N \otimes \iota_T\iota_T'\right)\tilde u}{\tilde u'\tilde u}\right)^2,$$
where ι_T is a T × 1 vector of ones, which is identical to the familiar LM statistic for random individual effects.
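This special case can be checked numerically with a minimal sketch (our own code, operating directly on an N × T array of residuals):

```python
import numpy as np

def breusch_pagan_lm(u):
    """Breusch-Pagan (1980) LM statistic for random individual effects,
    computed from the N x T array u of pooled OLS residuals; chi-squared
    with one degree of freedom under H0."""
    N, T = u.shape
    # u'(I_N kron iota iota')u = sum_i (sum_t u_it)^2
    ratio = (u.sum(axis=1) ** 2).sum() / (u ** 2).sum()
    return N * T / (2 * (T - 1)) * (1 - ratio) ** 2

rng = np.random.default_rng(3)
u_h0 = rng.standard_normal((200, 8))                 # no individual effects
u_h1 = u_h0 + 2.0 * rng.standard_normal((200, 1))    # strong random effects
lm_h0 = breusch_pagan_lm(u_h0)
lm_h1 = breusch_pagan_lm(u_h1)
```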
5 Variants of the LM Test
In this section we generalize the LM test statistic by allowing for non-normally distributed, heteroskedastic and serially dependent errors. First, we show in Section 5.1 that the proposed LM test remains valid with non-normally distributed errors once we assume N, T → ∞ jointly and specific restrictions on the existence of higher-order moments; moreover, variants of the test for non-normally distributed errors are proposed for settings where N → ∞ and T is fixed. Second, in Section 5.2 we propose a variant of the LM test that is robust to heteroskedastic errors. Finally, Section 5.3 discusses how to robustify the LM test when the errors are serially correlated.
5.1 The LM statistic under non-normality
In this section we consider useful variants of the original LM statistic under the assumption that the errors are not normally distributed. Therefore, we replace Assumptions 1 and 2 by:
Assumption 1′ ε_it is independently and identically distributed with E(ε_it|X) = 0, E(ε²_it|X) = σ² and E(|ε_it|⁶|X) < C < ∞ for all i and t. Furthermore, ε_it and ε_js are independently distributed for i ≠ j and t ≠ s.
Assumption 2′ For the regressors we assume E|x_it,k|⁶ < C < ∞ for all i = 1, 2, ..., N, t = 1, 2, ..., T and k = 1, 2, ..., K. Further, T^{-1}Σ_{t=1}^T E[x_it x_it'] tends to a positive definite matrix Q_i as T → ∞, and the limiting matrix Q := lim_{N,T→∞}(NT)^{-1}Σ_{i=1}^N Σ_{t=1}^T E[x_it x_it'] exists and is positive definite.
Assumption 3 The error vector v_i is independently and identically distributed with E(v_i|X) = 0 and E(v_i v_i'|X) = Σ_v, where Σ_v = diag(σ²_{v,1}, ..., σ²_{v,K}) and E(|v_ik|^{2+δ}|X) < C < ∞ for some δ > 0, for all i and k = 1, ..., K. Further, v_i and ε_j are independent of each other for all i and j.
Notice that, as in Section 3, under the null hypothesis Σ_v = 0, that is, v_i = 0 for all i. Hence, Assumption 3 is not required for the derivation of the asymptotic null distribution. To study the behaviour of the LM test statistic under (local) alternatives, Assumption 3 will be used in Section 6.
With these modifications of the previous setup, the limiting distribution of the LM statistic is given in
Theorem 2 Under Assumptions 1′, 2′ and the null hypothesis,
$$LM \xrightarrow{d} \chi^2_K, \qquad (20)$$
as N → ∞, T → ∞ jointly.
Generalizing the model to allow for non-normally distributed errors introduces a new term into the variance of the score: the (k, l) element of the covariance matrix now becomes (see equation (49) in Appendix A.2)
$$V_{k,l} + \frac{\mu_u^{(4)} - 3\sigma^4}{(2\sigma^4)^2}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(x_{it,k}^2 - \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it,k}^2\right)\left(x_{it,l}^2 - \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it,l}^2\right), \qquad (21)$$
where μ_u^{(4)} denotes the fourth moment of the error distribution, and V_{k,l} is as in (18) with σ̃⁴ replaced by σ⁴. The additional term depends on the excess kurtosis μ_u^{(4)} − 3σ⁴. Clearly, for normally distributed errors this term disappears, but it deviates from zero in the more general setup. Under Assumptions 1′ and 2′, the first term V_{k,l} is of order NT², while the new component is of order NT, such that, when the appropriate scaling underlying the LM statistic is adopted, it vanishes as T → ∞. Therefore, the LM statistic as presented in the previous section continues to be χ²_K distributed asymptotically. By incorporating a suitable estimator of the second term in (21), however, a test statistic becomes available that is valid in a framework with non-normally distributed errors as N → ∞, whether T is fixed or T → ∞. Therefore, denote the adjusted LM statistic by
$$LM_{adj} = \tilde s'\,\tilde V_{adj}^{-1}\,\tilde s,$$
where Ṽ_adj is as in (21) with V_{k,l}, σ⁴ and μ_u^{(4)} replaced by the consistent estimators Ṽ_{k,l} defined in (18), σ̃⁴ and μ̃_u^{(4)} = (NT)^{-1}Σ_{i=1}^N Σ_{t=1}^T ũ⁴_it, for k, l = 1, ..., K. As a consequence of Theorem 2 and the preceding discussion, we obtain the following result.
Corollary 1 Under Assumptions 1′, 2 and the null hypothesis,
$$LM_{adj} \xrightarrow{d} \chi^2_K,$$
as N → ∞ and T is fixed. Furthermore,
$$LM_{adj} - LM \xrightarrow{p} 0,$$
as N → ∞, T → ∞ jointly.
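A sketch of LM_adj (our own illustrative code, again for the model without individual effects): relative to the statistic of Theorem 1, only the kurtosis correction of (21) is added to the covariance matrix.

```python
import numpy as np

def lm_adjusted(y, X):
    """Kurtosis-adjusted LM statistic of Corollary 1: covariance (18) plus the
    excess-kurtosis term of (21), with sigma^4 and the fourth moment mu_u^(4)
    replaced by residual-based estimates. Valid for fixed T without normality.
    y is N x T, X is N x T x K."""
    N, T, K = X.shape
    Xs, ys = X.reshape(N * T, K), y.reshape(N * T)
    b = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)        # pooled OLS under H0
    u = (ys - Xs @ b).reshape(N, T)
    s2 = (u ** 2).mean()
    mu4 = (u ** 4).mean()                            # estimate of mu_u^(4)
    g = np.einsum('it,itk->ik', u, X)
    m = (X ** 2).sum(axis=(0, 1))
    s = (g ** 2).sum(axis=0) / (2 * s2 ** 2) - m / (2 * s2)        # score (17)
    C = np.einsum('itk,itl->ikl', X, X)
    V = ((C ** 2).sum(axis=0) - np.outer(m, m) / (N * T)) / (2 * s2 ** 2)
    D = X ** 2 - m / (N * T)                         # centred squared regressors
    V_adj = V + (mu4 - 3 * s2 ** 2) / (2 * s2 ** 2) ** 2 \
        * np.einsum('itk,itl->kl', D, D)             # correction term of (21)
    return s @ np.linalg.solve(V_adj, s)

rng = np.random.default_rng(6)
N, T, K = 500, 5, 2
X = rng.standard_normal((N, T, K))
eps = rng.laplace(0.0, 1.0, (N, T))                  # fat-tailed errors
y = np.einsum('itk,k->it', X, np.array([0.5, -1.0])) + eps
lm_adj = lm_adjusted(y, X)
```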
As mentioned above, once the regression disturbances are no longer normally distributed, the fourth moments of the error distribution enter the variance of the score. It is insightful to identify exactly which terms give rise to this new form of the covariance matrix. According to Lemma 1, the contribution of the i-th panel unit to the k-th element of the score vector is
$$\tilde u_i' X_i^{(k)} X_i^{(k)\prime} \tilde u_i - \tilde\sigma^2 X_i^{(k)\prime} X_i^{(k)} = \sum_{t=1}^{T} x_{it,k}^2\left(\tilde u_{it}^2 - \tilde\sigma^2\right) + \sum_{t=1}^{T}\sum_{s\neq t}\tilde u_{it}\tilde u_{is}\, x_{it,k}\, x_{is,k}. \qquad (22)$$
The variance of the first term on the right-hand side depends on the fourth moments of the errors. Since the contribution of this term vanishes if T gets large, it can be dropped without any severe effect on the power whenever T is sufficiently large. Hence, we consider a modified score vector as presented in the following theorem.
Theorem 3 Under Assumptions 1′, 2 and the null hypothesis, the modified LM statistic satisfies
$$LM^* = \tilde s^{*\prime}\left(\tilde V^*\right)^{-1}\tilde s^* \xrightarrow{d} \chi^2_K,$$
as N → ∞ and T fixed, where s̃* is the K × 1 vector with contributions for panel unit i
$$\tilde s^*_{i,k} = \frac{1}{\tilde\sigma^4}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\tilde u_{it}\tilde u_{is}\, x_{it,k}\, x_{is,k}, \qquad (23)$$
for i = 1, ..., N, k = 1, ..., K, and the (k, l) element of Ṽ* is given by
$$\tilde V^*_{k,l} = \frac{1}{\tilde\sigma^4}\sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1} x_{it,k}\, x_{it,l}\, x_{is,k}\, x_{is,l}, \qquad (24)$$
for k, l = 1, ..., K.
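The modified statistic of Theorem 3 can be sketched by enumerating all pairs s < t (our own illustrative code; the error distribution below is deliberately skewed and leptokurtic):

```python
import numpy as np

def lm_star(y, X):
    """Modified LM statistic of Theorem 3: only cross products u_it u_is with
    s < t enter the score (23), so fourth moments drop out of its variance
    (24). y is N x T, X is N x T x K; chi-squared with K degrees of freedom
    as N -> infinity, T fixed."""
    N, T, K = X.shape
    Xs, ys = X.reshape(N * T, K), y.reshape(N * T)
    b = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)        # pooled OLS under H0
    u = (ys - Xs @ b).reshape(N, T)
    s2 = (u ** 2).mean()
    r, c = np.tril_indices(T, k=-1)                  # all pairs (t, s), s < t
    Up = u[:, r] * u[:, c]                           # u_it u_is,   N x P
    Zp = X[:, r, :] * X[:, c, :]                     # x_itk x_isk, N x P x K
    s = np.einsum('ip,ipk->k', Up, Zp) / s2 ** 2     # score (23), summed over i
    V = np.einsum('ipk,ipl->kl', Zp, Zp) / s2 ** 2   # covariance (24)
    return s @ np.linalg.solve(V, s)

rng = np.random.default_rng(7)
N, T, K = 400, 6, 2
X = rng.standard_normal((N, T, K))
eps = rng.chisquare(3, (N, T)) - 3.0                 # non-normal errors
y = np.einsum('itk,k->it', X, np.array([1.0, 0.5])) + eps
lm_s = lm_star(y, X)
```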
Remark 7 It is important to note that this version of the LM test is invalid if the panel regression allows for individual-specific coefficients (cf. Remark 6). Consider for example the regression
$$y_{it} = \alpha_i + x_{it}'\beta_i + u_{it}, \qquad (25)$$
where α_i are fixed individual effects and we are interested in testing H₀: var(β_i) = 0. The residuals are obtained as
$$\tilde u_{it} = y_{it} - \bar y_i - (x_{it} - \bar x_i)'\tilde\beta = u_{it} - \bar u_i - (x_{it} - \bar x_i)'(\tilde\beta - \beta).$$
It follows that in this case E(ũ_it ũ_is x_it,k x_is,k) ≠ 0 and, therefore, the modified scores (23) result in a biased test. To sidestep this difficulty, orthogonal deviations (e.g. Arellano and Bover (1995)) can be employed to eliminate the individual-specific constants, yielding
$$y_{it}^* = \beta' x_{it}^* + u_{it}^*, \qquad t = 2, 3, \ldots, T,$$
$$\text{with } y_{it}^* = \sqrt{\frac{t-1}{t}}\left(y_{it} - \frac{1}{t-1}\sum_{s=1}^{t-1} y_{is}\right),$$
where x*_it and u*_it are defined analogously. It is well known that if u_it is i.i.d., so is u*_it. It follows that the modified LM statistic can be constructed by using the OLS residuals ũ*_it instead of ũ_it. This approach can be generalized to arbitrary individual-specific regressors x_it^a. Let X_i^a = [x_{i1}^a, ..., x_{iT}^a]' denote the individual-specific T × K₁ regressor matrix in the regression
$$y_i = X_i^a\beta_{1i} + X_i^b\beta_2 + X_i^c\beta_{3i} + u_i, \qquad (26)$$
(see Remark 6). Furthermore, let
$$M_i^a = I_T - X_i^a\left(X_i^{a\prime} X_i^a\right)^{-1} X_i^{a\prime},$$
and let M̃_i^a denote the (T − K₁) × T matrix that results from eliminating the last K₁ rows from M_i^a such that M̃_i^a M̃_i^{a\prime} is of full rank. The model (26) is transformed as
$$y_i^* = X_i^{b*}\beta_2 + X_i^{c*}\beta_{3i} + u_i^*, \qquad (27)$$
where y_i^* = Ξ_i^a y_i and Ξ_i^a = (M̃_i^a M̃_i^{a\prime})^{-1/2} M̃_i^a. It is not difficult to see that E(u_i^* u_i^{*\prime}) = σ² I_{T−K₁} and, thus, the modified scores (23) can be constructed by using the residuals of (27), where the time series dimension reduces to T − K₁. Note that orthogonal deviations result from letting X_i^a be a vector of ones.
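The orthogonal-deviations transform used above can be verified in a few lines (our own code); adding a constant to the series leaves the transform unchanged, which is exactly how the individual effect is eliminated:

```python
import numpy as np

def backward_orthogonal_deviations(w):
    """Transform of Remark 7: w*_t = sqrt((t-1)/t) (w_t - mean(w_1,...,w_{t-1}))
    for t = 2,...,T, mapping a length-T series to length T-1 and annihilating
    any constant added to the series, while preserving i.i.d. errors."""
    T = len(w)
    out = np.empty(T - 1)
    for t in range(1, T):                      # python index t <-> paper's t+1
        out[t - 1] = np.sqrt(t / (t + 1)) * (w[t] - w[:t].mean())
    return out

rng = np.random.default_rng(8)
w = rng.standard_normal(6)
d1 = backward_orthogonal_deviations(w)
d2 = backward_orthogonal_deviations(w + 7.0)   # constants drop out
```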
To review the results of this section, the important new feature in the model without assuming normality is that the fourth moments of the errors enter the variance of the score. The information matrix of the original LM test derived under normality does not incorporate higher-order moments, but the test remains applicable as T → ∞. To apply the LM test in the original framework when T is fixed and the errors are no longer normal, we can proceed in two ways. A direct adjustment of the information matrix to account for higher-order moments yields a valid test. Alternatively, we can adjust the score itself and restrict attention to that part of the score that does not introduce higher-order moments into the variance. In the next section, we further pursue the second route of dealing with non-normality and thereby robustify the test against heteroskedasticity and serial correlation.
5.2 The regression-based LM statistic
In this section we offer a convenient way to compute the proposed LM statistic via a simple artificial regression. Moreover, the regression-based form of the LM test is shown to be robust against heteroskedastic errors. Following the decomposition of the score contribution in (22) and the discussion thereafter, we construct the "Outer Product of Gradients" (OPG) variant of the LM test based on the second term in (22), rewriting the corresponding elements of the score contributions of panel unit i as
$$\tilde s^*_{i,k} = \sum_{t=2}^{T}\sum_{s=1}^{t-1}\tilde u_{it}\tilde u_{is}\, x_{it,k}\, x_{is,k}, \qquad (28)$$
for k = 1, ..., K. Note that we dropped the factor 1/σ̃⁴, as this factor cancels out in the final test statistic. This gives the usual LM-OPG variant
$$LM_{opg} = \left(\sum_{i=1}^{N}\tilde s_i^*\right)'\left(\sum_{i=1}^{N}\tilde s_i^*\tilde s_i^{*\prime}\right)^{-1}\left(\sum_{i=1}^{N}\tilde s_i^*\right), \qquad (29)$$
where s̃*_i = (s̃*_{i,1}, ..., s̃*_{i,K})'. An asymptotically equivalent form of the LM-OPG statistic can be formulated as a Wald-type test for the null hypothesis φ = 0 in the auxiliary regression
$$\tilde u_{it} = \sum_{k=1}^{K}\tilde z_{it,k}\,\varphi_k + e_{it}, \qquad i = 1,\ldots,N,\ t = 1,\ldots,T, \qquad (30)$$
where
$$\tilde z_{it,k} = x_{it,k}\sum_{s=1}^{t-1}\tilde u_{is}\, x_{is,k}$$
for k = 1, ..., K. Therefore, with the Eicker-White heteroskedasticity-consistent variance estimator, the regression-based test statistic results as
$$LM_{reg} = \left(\sum_{i=1}^{N}\sum_{t=2}^{T}\tilde u_{it}\tilde z_{it}\right)'\left(\sum_{i=1}^{N}\sum_{t=2}^{T}\tilde u_{it}^2\,\tilde z_{it}\tilde z_{it}'\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t=2}^{T}\tilde u_{it}\tilde z_{it}\right). \qquad (31)$$
It follows from arguments similar to those in Theorem 3 that the LM_reg statistic is asymptotically χ² distributed; moreover, it turns out to be robust against heteroskedasticity:
Corollary 2 Under Assumption 1′, but allowing for heteroskedastic errors such that E(ε²_it|X) = σ²_it < C < ∞, Assumption 2 and the null hypothesis,
$$LM_{reg} \xrightarrow{d} \chi^2_K, \qquad (32)$$
as N → ∞ and T is fixed.
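The artificial-regression statistic (31) only requires a cumulative sum to build z̃_it. A hypothetical numpy sketch (our own code and names):

```python
import numpy as np

def lm_reg(y, X):
    """Regression-based LM statistic (31): Wald test of phi = 0 in the
    artificial regression of u_it on z_it,k = x_it,k * sum_{s<t} u_is x_is,k,
    with the Eicker-White variance in the middle. Robust to heteroskedastic
    errors; y is N x T, X is N x T x K."""
    N, T, K = X.shape
    Xs, ys = X.reshape(N * T, K), y.reshape(N * T)
    b = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)          # pooled OLS under H0
    u = (ys - Xs @ b).reshape(N, T)
    cum = np.cumsum(u[:, :, None] * X, axis=1)         # sum_{s<=t} u_is x_is,k
    z = np.zeros_like(X)
    z[:, 1:, :] = X[:, 1:, :] * cum[:, :-1, :]         # lag once: s runs to t-1
    g = np.einsum('it,itk->k', u, z)                   # sum_it u_it z_it
    W = np.einsum('it,itk,itl->kl', u ** 2, z, z)      # White middle matrix
    return g @ np.linalg.solve(W, g)

rng = np.random.default_rng(9)
N, T, K = 400, 6, 2
X = rng.standard_normal((N, T, K))
scale = 0.5 + np.abs(X[:, :, 0])                       # heteroskedastic errors
y = np.einsum('itk,k->it', X, np.array([1.0, -1.0])) \
    + scale * rng.standard_normal((N, T))
lm_r = lm_reg(y, X)
```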
It is important to note that the LM-OPG variant cannot be applied to residuals from a fixed effects regression; see Remark 7. Furthermore, replacing the residuals by orthogonal forward deviations will not fix this problem, since orthogonal forward deviations are no longer serially uncorrelated if the errors are heteroskedastic. Therefore, a version of the test is required that is robust against autocorrelated errors.
5.3 The LM statistic under serially dependent errors

In this section we propose a variant of the LM test statistic that accommodates serially correlated errors; that is, we relax Assumption 1′ as follows:

Assumption 1′′ The T × 1 error vector $\epsilon_i$ is independently and identically distributed with $E(\epsilon_i|X) = 0$, $E(\epsilon_i\epsilon'_i|X) = E(\epsilon_i\epsilon'_i) = \Sigma$ and $E\left[|\epsilon_{it}|^{4+\delta}|X\right] < C < \infty$ for some δ > 0 and all i and t. The T × T matrix Σ is positive definite with typical element $\sigma_{ts}$ for t, s = 1, ..., T.

Note that Assumption 1′′ allows for heteroskedasticity and serial dependence across time; however, it restricts the error vectors $\epsilon_i$ to be iid across individuals.
Under this assumption, the expectation of a typical term of the score vector (23) under the null hypothesis is
$$E[u_{it}u_{is}x_{it,k}x_{is,k}] = \sigma_{ts}E[x_{it,k}x_{is,k}].$$
We therefore suggest a modification for autocorrelated errors based on the adjusted K × 1 score vector $\tilde{s}^{**}$ with typical element $\tilde{s}^{**}_k = \sum_{i=1}^{N}\tilde{s}^{**}_{i,k}$ for k = 1, ..., K and
$$\tilde{s}^{**}_{i,k} = \sum_{t=2}^{T}\sum_{s=1}^{t-1}\left(\tilde{u}_{it}\tilde{u}_{is} - \tilde{\sigma}_{ts}\right)x_{it,k}x_{is,k}, \qquad (33)$$
where $\tilde{\sigma}_{ts} = \frac{1}{N}\sum_{i=1}^{N}\tilde{u}_{it}\tilde{u}_{is}$. The asymptotic properties of the LM statistic based on the
modified score vector are presented in
Theorem 4 Let
$$LM_{ac} = \tilde{s}^{**\prime}\left(\tilde{V}^{**}\right)^{-1}\tilde{s}^{**},$$
where $\tilde{V}^{**}$ is a K × K matrix with typical element
$$\tilde{V}^{**}_{k,l} = \sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\sum_{\tau=2}^{T}\sum_{q=1}^{\tau-1}\hat{\delta}_{ts\tau q}\,x_{it,k}x_{is,k}x_{i\tau,l}x_{iq,l} \qquad (34)$$
and
$$\hat{\delta}_{ts\tau q} = \frac{1}{N}\sum_{j=1}^{N}\tilde{u}_{jt}\tilde{u}_{js}\tilde{u}_{j\tau}\tilde{u}_{jq} - \tilde{\sigma}_{ts}\tilde{\sigma}_{\tau q}.$$
Under Assumption 1′′, Assumption 2, the null hypothesis (2) and as N → ∞ with T fixed, the $LM_{ac}$ statistic has a $\chi^2_K$ limiting distribution.
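For small T, the $LM_{ac}$ statistic of Theorem 4 can be computed by brute force over the quadruple index sums. The numpy sketch below is our own construction under assumed array conventions (an (N, T) residual matrix, an (N, T, K) regressor array); it is not code from the paper and is intended only for small T.

```python
import numpy as np

def lm_ac(u, X):
    """Serial-correlation-robust statistic LM_ac (Theorem 4), O(T^4) sums."""
    N, T, K = X.shape
    P = u[:, :, None] * u[:, None, :]                  # u~_it u~_is, (N, T, T)
    sigma = P.mean(axis=0)                             # sigma~_ts, (T, T)
    lower = np.tril(np.ones((T, T)), k=-1)             # indicator {s < t}
    # masked x-products x_{it,k} x_{is,k} over the pairs s < t
    W = (X[:, :, None, :] * X[:, None, :, :]) * lower[None, :, :, None]
    s_i = np.einsum('nts,ntsk->nk', P - sigma, W)      # score contributions (33)
    score = s_i.sum(axis=0)
    # delta^_{ts,tau q} = mean_j u~_jt u~_js u~_jtau u~_jq - sigma~_ts sigma~_tau q
    delta = np.einsum('nts,nvq->tsvq', P, P) / N \
        - np.einsum('ts,vq->tsvq', sigma, sigma)
    V = np.einsum('tsvq,ntsk,nvql->kl', delta, W, W, optimize=True)  # eq. (34)
    return score @ np.linalg.solve(V, score)
```

Because both index pairs run over s < t and q < τ only, the lower-triangular mask applied to W enforces the summation limits in (34).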
Note that this version of the test has good size control irrespective of serial dependence in the errors. However, the test involves some power loss relative to the original test statistic when the errors are serially uncorrelated, which is not surprising given the more general setup of this variant of the test. The respective asymptotic power results are analyzed in the next section (see Remark 10). Section 7 elaborates in detail on the size and power properties of the $LM_{ac}$ test in finite samples.
6 Local Power
The aim of this section is twofold. First, we investigate the distributions of the LM-type tests under suitable sequences of local alternatives. Two cases are of interest, N → ∞ with T fixed and N, T → ∞ jointly, which are presented in the respective theorems below. Second, we adapt the results of PY to our model in order to compare the local asymptotic power of the two tests. To formulate an appropriate sequence of local alternatives, we specify the random coefficients in (9) in a setup in which T is fixed. The error term $v_i$ is as in Assumption 1 with elements of $\Sigma_v$ given by
$$\sigma^2_{v,k} = \frac{c_k}{\sqrt{N}}, \qquad (35)$$
where $c_k > 0$ are fixed constants for k = 1, ..., K. The asymptotic distribution of the LM statistic results as follows.
Theorem 5 Under Assumptions 1, 2 and the sequence of local alternatives (35),
$$LM \xrightarrow{d} \chi^2_K(\mu),$$
as N → ∞ and T fixed, with non-centrality parameter $\mu = c'\Psi c$, where $c = (c_1, \ldots, c_K)'$ and Ψ is a K × K matrix with (k, l) element
$$\Psi_{k,l} = \frac{1}{2\sigma^4}\operatorname*{plim}_{N\to\infty}\left[\frac{1}{N}\sum_{i=1}^{N}\left(\sum_{t=1}^{T}x_{it,k}x_{it,l}\right)^2 - \frac{1}{T}\left(\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}x^2_{it,k}\right)\left(\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}x^2_{it,l}\right)\right].$$
In order to relax the assumption of normally distributed errors, we adopt Assumption 1′ for $v_i$, where the sequence of local alternatives is now given by
$$\sigma^2_{v,k} = \frac{c_k}{T\sqrt{N}}, \qquad (36)$$
for k = 1, ..., K. Note that according to Theorem 2 we require T → ∞.
Theorem 6 Under Assumptions 1′, 2′, 3 and the sequence of alternatives (36),
$$LM \xrightarrow{d} \chi^2_K(\mu),$$
as N → ∞, T → ∞, with non-centrality parameter $\mu = c'\Psi c$, where $c = (c_1, \ldots, c_K)'$ and Ψ is a K × K matrix with (k, l) element
$$\Psi_{k,l} = \frac{1}{2\sigma^4}\operatorname*{plim}_{N,T\to\infty}\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{T}\sum_{t=1}^{T}x_{it,k}x_{it,l}\right)^2.$$
Remark 8 As in Section 5.1 above, when the normality assumption is relaxed, local power can be studied for LM* under Assumptions 1′, 2 and 3 when T is fixed. The specification of local alternatives as in Theorem 5 applies. The non-centrality parameter of the limiting non-central χ² distribution results as $\mu^* = c'\Psi^* c$ with
$$\Psi^*_{k,l} = \frac{1}{\sigma^4}\operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}x_{it,k}x_{it,l}x_{is,k}x_{is,l},$$
for k, l = 1, ..., K.
Remark 9 Given the results for the modified statistic LM* in Remark 8, and the fact that $\tilde{s}^* = \sum_{i=1}^{N}\tilde{s}^*_i = \tilde{Z}'\tilde{u}^*$, we expect a similar result to hold for the regression-based statistic $LM_{reg}$. Recall that LM* uses $N^{-1}\tilde{V}^*$ as an estimator of the variance of $\tilde{s}^*$ (see (24)), while $LM_{reg}$ employs $N^{-1}\sum_{i=1}^{N}\sum_{t=2}^{T}\tilde{u}^2_{it}\tilde{z}_{it}\tilde{z}'_{it}$. Under the null hypothesis, it is not difficult to see that these two estimators are asymptotically equivalent. Under the alternative, when studying the (k, l) element of the variance of $LM_{reg}$, we obtain (see Appendix A.2 for details)
$$\frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\tilde{u}^2_{it}\tilde{z}_{it,k}\tilde{z}_{it,l} = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\epsilon^2_{it}\,x_{it,k}x_{it,l}\left(\sum_{s=1}^{t-1}\epsilon_{is}x_{is,k}\right)\left(\sum_{s=1}^{t-1}\epsilon_{is}x_{is,l}\right) + \frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\epsilon^2_{it}\,v'_iB^X_{it}v_i + o_p(1), \qquad (37)$$
with the K × K matrix $B^X_{it} = \left(x_{it,k}\sum_{s=1}^{t-1}x_{is}x_{is,k}\right)\left(x_{it,l}\sum_{s=1}^{t-1}x'_{is}x_{is,l}\right)$. The first term on the right-hand side of (37) has the same probability limit as $N^{-1}\tilde{V}^*_{k,l}$, the limiting covariance matrix element $\Psi^*_{k,l}$. In contrast to LM*, however, the variance estimator of the regression-based test involves additional quadratic forms such as $v'_iB^X_{it}v_i$, contributing to the estimator. Since, in a setup with fixed T and the local alternatives $\sigma^2_{v,k} = c_k/\sqrt{N}$,
$$\frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\epsilon^2_{it}\,v'_iB^X_{it}v_i = O_p\left(N^{-1/2}\right),$$
the variance estimator remains consistent. In small samples, however, the additional term results in a bias of the variance estimator and may deteriorate the power of the regression-based test. See the appendix for details on the above result, and the Monte Carlo experiments in Section 7.
Remark 10 The arguments of Remark 8 can be used to derive the local power of the $LM_{ac}$ statistic that accounts for serial correlation in the errors. The same specification of local alternatives applies. The non-centrality parameter of the limiting non-central χ² distribution takes the quadratic form $\mu^{**} = c'\Psi^{**}c$ with
$$\Psi^{**}_{k,l} = \operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\sum_{\tau=2}^{T}\sum_{q=1}^{\tau-1}\left(u_{it}u_{is}u_{i\tau}u_{iq} - \sigma_{ts}\sigma_{\tau q}\right)x_{it,k}x_{is,k}x_{i\tau,l}x_{iq,l},$$
for k, l = 1, ..., K. In the absence of serial correlation it can be shown that the $LM_{ac}$ test involves a loss of power. To illustrate this fact, assume for simplicity that K = 1 (single-regressor case). The score vector in (33) can then be written equivalently as
$$\hat{s}^{**} = \sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\left(\tilde{u}_{it}\tilde{u}_{is} - \tilde{\sigma}_{ts}\right)x_{it,k}x_{is,k} = \sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\tilde{u}_{it}\tilde{u}_{is}\left(x_{it,k}x_{is,k} - \bar{C}_{ts}\right), \qquad (38)$$
where $\bar{C}_{ts} = \frac{1}{N}\sum_{i=1}^{N}x_{it}x_{is}$. Thus, demeaning of $\tilde{u}_{it}\tilde{u}_{is}$ is equivalent to demeaning of $x_{it,k}x_{is,k}$. In the case of no autocorrelation, it follows from (38) that $\Psi^* - \Psi^{**}$ is positive semi-definite. Therefore, the modification in (33) tends to reduce the power of the $LM_{ac}$ test when compared to LM*.
We now proceed to examine the local power of the Δ statistic of PY in model (8) and (9) under the sequence of local alternatives (36). In our homoskedastic setup, the dispersion statistic becomes
$$\tilde{S} = \sum_{i=1}^{N}\left(\tilde{\beta}_i - \tilde{\beta}\right)'\frac{X'_iX_i}{\tilde{\sigma}^2}\left(\tilde{\beta}_i - \tilde{\beta}\right),$$
with $\tilde{\beta}$ as the OLS estimator in (10) as above. Using this expression, the $\hat{\Delta}$ statistic is computed as in (4). The next theorem presents the asymptotic distribution of the $\hat{\Delta}$ statistic under the local alternatives as specified above. This result follows directly from Section 3.2 in PY.
Theorem 7 Under Assumptions 1′, 2′, 3 and the sequence of local alternatives (36),
$$\hat{\Delta} \xrightarrow{d} N(\lambda, 1),$$
as N → ∞, T → ∞, provided $\sqrt{N}/T \to 0$, where $\lambda = \Lambda'c/\sqrt{2K}$ and Λ is a K × 1 vector with typical element
$$\Lambda_k = \frac{1}{\sigma^2}\operatorname*{plim}_{N,T\to\infty}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}x^2_{it,k},$$
for k = 1, ..., K.
In Theorem 7, the mean of the limiting distribution of $\hat{\Delta}$ is slightly different from the result in Section 3.2 in PY. Here, $v_i$ is random and distributed independently of the regressors; therefore, the second term of the respective expression in PY is zero.
Remark 11 Consider for simplicity a scalar regressor $x_{it}$ that is i.i.d. across i and t with uniformly bounded fourth moments. Let $E[x_{it}] = 0$ and $E[x^2_{it}] = \sigma^2_{i,x}$; that is, the regressor is assumed to have a unit-specific variation which is constant over time for a given unit. We obtain
$$E\left[\left(\frac{1}{T}\sum_{t=1}^{T}x^2_{it}\right)^2\right] = \left(\sigma^2_{i,x}\right)^2 + O\left(T^{-1}\right),$$
implying $\mu = \left(c^2/2\sigma^4\right)\lim_{N\to\infty}N^{-1}\sum_{i=1}^{N}\left(\sigma^2_{i,x}\right)^2$ in Theorem 6. To gain further insight, we think of $\sigma^2_{i,x}$ as being randomly distributed in the cross-section, such that the non-centrality parameter results as
$$\mu = \frac{c^2}{2\sigma^4}E\left[\left(\sigma^2_{i,x}\right)^2\right] = \frac{c^2}{2\sigma^4}\left(Var\left(\sigma^2_{i,x}\right) + E\left[\sigma^2_{i,x}\right]^2\right). \qquad (39)$$
Similarly, under these assumptions, we find
$$\lambda = \frac{c}{\sigma^2\sqrt{2}}E\left[\sigma^2_{i,x}\right]. \qquad (40)$$
Comparing the mean of the normal distribution of the Δ statistic in (40) with the non-centrality parameter of the asymptotic χ²₁ distribution of the LM statistic in (39), we see that the main difference between the two tests is that the variance of $\sigma^2_{i,x}$ contributes to the power of the LM statistic but not to the power of the Δ test. If $Var\left(\sigma^2_{i,x}\right) = 0$, such that $\sigma^2_{i,x} = \sigma^2_x$ for all i, the LM test and the Δ test have the same asymptotic power in this example. If, however, $Var\left(\sigma^2_{i,x}\right) > 0$, so that there is variation in the variance of the regressor in the cross-section, the LM test has larger asymptotic power. To illustrate this point, we examine the local asymptotic power functions of the LM and the Δ test for two cases, using the expressions in (39) and (40). Figure 1 (see appendix C) shows the local asymptotic power of the LM test (solid line) and the Δ test (dashed line) as a function of c when $\sigma^2_{i,x}$ has a χ²₁ distribution. Figure 2 repeats this exercise for $\sigma^2_{i,x}$ drawn from a χ²₂ distribution. In both cases, the LM test has larger asymptotic power. The power gain is substantial in the first case, but diminishes in the second. This pattern is expected, as the variance of $\sigma^2_{i,x}$ contributes relatively more to the non-centrality parameter in the first specification.
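Expressions (39) and (40) make the power comparison easy to reproduce numerically. The sketch below is our own construction: the function name and arguments are illustrative, and the non-central χ²₁ tail probability is approximated by simulation (a χ²₁(μ) draw is $(Z+\sqrt{\mu})^2$ with $Z \sim N(0,1)$) rather than by an exact quantile routine.

```python
import math
import numpy as np

def power_curves(c_grid, var_s2x, mean_s2x, sigma2=1.0, n_draws=200_000, seed=0):
    """Local asymptotic power of the LM test (chi2_1 with non-centrality mu,
    eq. 39) versus the two-sided Delta test (N(lambda, 1), eq. 40)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_draws)
    crit_chi2 = 3.841458820694124           # chi2_1 95% quantile
    crit_norm = 1.959963984540054           # N(0,1) 97.5% quantile
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    lm_pow, delta_pow = [], []
    for c in c_grid:
        mu = c**2 / (2 * sigma2**2) * (var_s2x + mean_s2x**2)   # eq. (39)
        lam = c / (sigma2 * math.sqrt(2.0)) * mean_s2x          # eq. (40)
        lm_pow.append(np.mean((z + math.sqrt(mu))**2 > crit_chi2))
        delta_pow.append(1.0 - Phi(crit_norm - lam) + Phi(-crit_norm - lam))
    return np.array(lm_pow), np.array(delta_pow)
```

For $\sigma^2_{i,x} \sim \chi^2_1$ as in Figure 1, set `var_s2x=2` and `mean_s2x=1`; the LM curve then dominates the Δ curve for all c > 0.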
This discussion exemplifies the difference between the LM-type tests and the Δ statistic in terms of local asymptotic power in a simplified framework. The analysis suggests that the LM-type tests are particularly powerful in an empirically relevant setting in which there is non-negligible variation in the variances of the regressors between panel units. Having studied the large-sample properties of the LM tests under the null and the alternative hypothesis in our model, we now evaluate the finite-sample size and power properties of the LM-type tests in a Monte Carlo experiment.
7 Monte Carlo Experiments

7.1 Design
After deriving the LM-type tests in the random coefficient model, we now turn to the small-sample properties of the proposed test and its variants. The aim of this section is to evaluate the performance of the tests in terms of their empirical size and power in several different setups, relating to the theoretical discussion of Sections 4 - 6. We consider the following test statistics: the original LM statistic presented in Theorem 1, the adjusted LM statistic that adjusts the information matrix to account for fourth moments of the error distribution (see Corollary 1), the score-modified LM statistics (see Theorem 3 and Theorem 4) and the regression-based, heteroskedasticity-robust LM statistic (see Section 5.2). As a benchmark, we consider PY's statistic $\tilde{\Delta}_{adj}$ given in (5). Following the notes in Table 1 in PY, the test using $\tilde{\Delta}_{adj}$ is carried out as a two-sided test. In addition, the CLM test in (7) is included, which is also a two-sided test. We consider the following data-generating process with normally distributed errors as the standard design:
$$y_{it} = \alpha_i + x'_{it}\beta_i + \epsilon_{it}, \qquad \epsilon_{it} \overset{iid}{\sim} N(0, 1), \qquad (41)$$
$$x_{it,k} = \alpha_i + \vartheta^x_{it,k}, \quad k = 1, 2, 3, \qquad \alpha_i \overset{iid}{\sim} N(0, 0.25), \qquad \vartheta^x_{it,k} \overset{iid}{\sim} N\left(0, \sigma^2_{ix,k}\right), \qquad \beta_i \overset{iid}{\sim} N_3\left(\iota_3, \Sigma_v\right),$$
$$\text{under the null hypothesis: } \Sigma_v = 0, \qquad (42)$$
$$\text{under the alternative: } \Sigma_v = \begin{pmatrix} 0.03 & 0 & 0 \\ 0 & 0.02 & 0 \\ 0 & 0 & 0.01 \end{pmatrix}, \qquad (43)$$
where i = 1, 2, ..., N, t = 1, 2, ..., T. Hence, to simulate a model under the null, the slope vector $\beta_i$ is generated as a 3 × 1 vector of ones, $\iota_3$, for all i. As discussed in Section 6, the variances of the regressors play an important role. In our benchmark specification we generate the variances as
$$\sigma^2_{ix,k} = 0.25 + \eta_{i,k}, \qquad \eta_{i,k} \overset{iid}{\sim} \chi^2_1. \qquad (44)$$
The choice of the χ² distribution for $\sigma^2_{ix,k}$ is made analogous to the Monte Carlo experiment in PY. We then consider variations of this specification below. All results are based on 5,000 Monte Carlo replications. We choose
$$N \in \{10, 20, 30, 50, 100, 200\}, \qquad T \in \{10, 20, 30\},$$
as we would like to study the small-sample properties of the test procedures when the time dimension is small. In our first set of Monte Carlo experiments the errors are normally distributed; therefore we focus on the standard LM test. We also include the respective heteroskedasticity-robust regression variants for this exercise.
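The benchmark design (41)-(44) is straightforward to simulate. The sketch below is our own construction for illustration; the function name, seed handling and array layout are assumptions, not part of the paper.

```python
import numpy as np

def simulate_panel(N, T, under_null=True, seed=0):
    """Draw one panel from the benchmark design (41)-(44).

    Returns y with shape (N, T) and X with shape (N, T, 3)."""
    rng = np.random.default_rng(seed)
    K = 3
    alpha = rng.normal(0.0, 0.5, size=N)                    # alpha_i ~ N(0, 0.25)
    sig2_x = 0.25 + rng.chisquare(1, size=(N, K))           # eq. (44)
    theta = rng.standard_normal((N, T, K)) * np.sqrt(sig2_x)[:, None, :]
    X = alpha[:, None, None] + theta                        # x_{it,k} = alpha_i + theta
    if under_null:
        beta = np.ones((N, K))                              # Sigma_v = 0, eq. (42)
    else:
        sd = np.sqrt(np.array([0.03, 0.02, 0.01]))          # diagonal of (43)
        beta = 1.0 + rng.standard_normal((N, K)) * sd
    eps = rng.standard_normal((N, T))                       # epsilon_it ~ N(0, 1)
    y = alpha[:, None] + np.einsum('ntk,nk->nt', X, beta) + eps
    return y, X
```

A size (or power) experiment then simply loops this function over replications and records the rejection frequency of each statistic.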
7.2 Normally distributed errors

Panel A of Table 1 (see Appendix B) shows the rejection frequencies when the null hypothesis is true. The $\tilde{\Delta}_{adj}$ test has rejection frequencies close to the nominal size of 5% for all combinations of N and T, while the CLM test rejects the null hypothesis too often, in particular for small N. Deviations from the nominal size for the standard LM test and the regression-based test are small and disappear as N increases, as expected from Theorem 1. Panel B of Table 1 shows the corresponding rejection frequencies under the alternative hypothesis. The LM test outperforms the $\tilde{\Delta}_{adj}$ and the CLM test in general. This observation holds in particular for T = 10, where the power gain is considerable. The $LM_{reg}$ variant, although as powerful as the $\tilde{\Delta}_{adj}$ test for T = 10, suffers from a power loss relative to the standard LM test. This power loss may be due to the small-sample bias of the variance estimator; see Remark 9.

Following Remark 7, the variants of the LM tests are computed as follows. First, the individual-specific fixed effects $\alpha_i$ are eliminated by transforming the data using orthogonal forward deviations (see Arellano and Bover (1995)). The LM statistics are then computed using the transformed data. The results presented in Panel A of Table 2 indicate that, by employing forward orthogonalization, all variants of the LM test have size reasonably close to the nominal level. Comparing Panel B of Table 1 with the rejection rates under the alternative in Panel B of Table 2, we see that the power is very similar in both setups, confirming the usefulness of the forward orthogonalization procedure for the LM tests.
7.3 Non-normal errors

We now investigate the LM test when the errors are no longer normally distributed, thereby building on the results of Section 5.1. The errors in (41) are generated from a t-distribution with 5 degrees of freedom, scaled to have unit variance. All other specifications of the standard design remain unchanged. In addition to the statistics already considered, we now include the adjusted LM statistic (see Corollary 1) and the score-modified statistic (see Theorem 3). Panel A in Table 3 reports the rejection frequencies under the null hypothesis in this case. We notice that the LM test has substantial size distortions when T is fixed and N increases, which is expected from Theorem 2. However, the adjusted LM statistic $LM_{adj}$ and the modified score statistic LM* are both successful in controlling the type-I error.

Panel B of Table 3 shows rejection frequencies under the alternative hypothesis. The power gain of the LM test relative to the $\tilde{\Delta}_{adj}$ test is noticeable when T = 10 or T = 20. We found similar results when the errors are χ² distributed with two degrees of freedom, centered and standardized to have mean zero and variance equal to one. Given the similarity of the results for t and χ² distributed errors, we do not present the latter results.
7.4 Serially correlated errors

To study the impact of serially correlated errors on the test statistics, we adjust the DGP as follows:
$$y_{it} = x'_{it}\beta_i + \epsilon_{it}, \qquad \epsilon_{it} = \rho\epsilon_{i,t-1} + \left(1 - \rho^2\right)^{1/2}e_{it},$$
for i = 1, 2, ..., N, t = 1, 2, ..., T, where $e_{it} \overset{iid}{\sim} N(0, 1)$. Under the null hypothesis $\beta_i = 1$ for all i, while under the alternative $\beta_i$ is generated as in (43). The regressors $x_{it,k}$, k = 1, 2, 3, are generated as
$$x_{it,k} = \phi_{i,k}x_{i,t-1,k} + \left(1 - \phi^2_{i,k}\right)^{1/2}\vartheta^x_{it,k}, \qquad \phi_{i,k} \overset{iid}{\sim} U[0.05, 0.95], \qquad \vartheta^x_{it,k} \overset{iid}{\sim} N\left(0, \sigma^2_{ix,k}\right),$$
where $\sigma^2_{ix,k} = 0.25 + \eta_{i,k}$ with $\eta_{i,k} \overset{iid}{\sim} \chi^2_1$. The parameters $\phi_{i,k}$ and $\sigma^2_{ix,k}$ are fixed across replications.
Results of this simulation experiment are reported in Table 4. Panels A and B show the rejection frequencies under the null hypothesis in the case of "small" serial dependence (ρ = 0.2, Panel A) and "moderate" dependence (ρ = 0.5, Panel B). For all LM-based test statistics except the $LM_{ac}$ test, we observe substantial size deviations from the nominal level. The $LM_{ac}$ test, however, is successful in controlling the type-I error. Further, the size properties of the PY test are also significantly affected by autocorrelated errors. Note that this fact is already documented and studied in Blomquist and Westerlund (2013).

Panel C of Table 4 reports the power properties of the tests under no serial correlation (ρ = 0), building on the discussion in Remark 10. We observe that the $LM_{ac}$ test involves a 5-10% power loss compared to the LM* test. This relative power loss dies out as T increases.
8 Concluding remarks

In this paper we examine the problem of testing slope homogeneity in a panel data model. We develop testing procedures using the LM principle. Several variants are considered that robustify the original LM test with respect to non-normality, heteroskedasticity and serially correlated errors. By studying the local power we identify cases where the LM-type tests are particularly powerful relative to existing tests. In sum, our Monte Carlo experiments suggest that the LM tests are powerful procedures for testing slope homogeneity in short panels in which the time dimension is small relative to the cross-section dimension. The LM approach suggested in this paper may be extended in future research by allowing for dynamic specifications with lagged dependent variables and cross-sectionally or serially correlated errors.
References
Arellano, M. and O. Bover (1995). Another look at the instrumental variable estimation
of error-components models. Journal of Econometrics 68, 29–51.
Baltagi, B., Q. Feng, and C. Kao (2011). Testing for sphericity in a fixed effects panel
data model. The Econometrics Journal 14, 25–47.
Blomquist, J. and J. Westerlund (2013). Testing slope homogeneity in large panels with serial correlation. Economics Letters 121(3), 374–378.
Breusch, T. S. and A. R. Pagan (1980). The Lagrange multiplier test and its applications
to model specification in econometrics. Review of Economic Studies 47, 239–253.
Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association 72,
320–338.
Honda, Y. (1985). Testing the error components model with non-normal disturbances.
Review of Economic Studies 52, 681–690.
Hsiao, C. and M. H. Pesaran (2008). Random coefficient models. In L. Mátyás and
P. Sevestre (Eds.), The Econometrics of Panel Data, Chapter 6. Springer.
Juhl, T. and O. Lugovskyy (2014). A test for slope homogeneity in fixed effects models.
Econometric Reviews 33, 906–935.
Pesaran, M. H. and R. Smith (1995). Estimating long-run relationships from dynamic
heterogenous panels. Journal of Econometrics 68, 79–113.
Pesaran, M. H. and T. Yamagata (2008). Testing slope homogeneity in large panels.
Journal of Econometrics 142, 50–93.
Phillips, P. C. B. and H. R. Moon (1999). Linear regression limit theory for nonstationary panel data. Econometrica 67 (5), 1057–1111.
Swamy, P. A. V. B. (1970). Efficient inference in a random coefficient regression model.
Econometrica 38, 311–323.
Ullah, A. (2004). Finite-sample econometrics. Oxford University Press.
Wand, M. P. (2002). Vector differential calculus in statistics. The American Statistician 56, 1–8.
White, H. (2001). Asymptotic theory for econometricians. Emerald.
Wiens, D. P. (1992). On moments of quadratic forms in non-spherically distributed
variables. Statistics 23 (3), 265–270.
A Appendix: Proofs

To economize on notation, we use $\sum_i$ and $\sum_t$ instead of the full expressions $\sum_{i=1}^{N}$ and $\sum_{t=1}^{T}$ throughout this appendix.
A.1 Preliminary results

We first present an important result concerning the asymptotic effect of the estimation error $\tilde{\beta} - \beta$ on the test statistics. Define
$$A^{(k)}_i = X^{(k)}_iX^{(k)\prime}_i - \left(\frac{1}{NT}\sum_i X^{(k)\prime}_iX^{(k)}_i\right)I_T.$$

Lemma A.1 Let $R^{(k)}_{XAX} = \sum_i X'_iA^{(k)}_iX_i$ and $R^{(k)}_{XAu} = \sum_i X'_iA^{(k)}_iu_i$ for k = 1, ..., K. Furthermore, let
$$R^{(k)}_N = \frac{\sigma^4}{\tilde{\sigma}^4}\,\frac{1}{2\sigma^4}\left[\left(\tilde{\beta} - \beta\right)'R^{(k)}_{XAX}\left(\tilde{\beta} - \beta\right) - 2\left(\tilde{\beta} - \beta\right)'R^{(k)}_{XAu}\right],$$
for k = 1, ..., K. Under Assumptions 1, 2 and the null hypothesis, the following properties hold if T is fixed:
(i) $R^{(k)}_{XAX} = O_p(N)$,
(ii) $R^{(k)}_{XAu} = O_p\left(N^{1/2}\right)$,
(iii) $R^{(k)}_N = O_p(1)$,
for k = 1, ..., K.
Proof. (i) Using the definition of $A^{(k)}_i$ yields
$$R^{(k)}_{XAX} = \sum_i X'_iX^{(k)}_iX^{(k)\prime}_iX_i - \frac{1}{NT}\left(\sum_i\sum_t x^2_{it,k}\right)\left(\sum_i X'_iX_i\right).$$
The first term is a K × K matrix with typical (l, m) element
$$\sum_i\left(\sum_t x_{it,l}x_{it,k}\right)\left(\sum_t x_{it,m}x_{it,k}\right) = O_p(N),$$
as a consequence of Assumption 2, while $\sum_i\sum_t x^2_{it,k}/NT = O_p(1)$ and $\sum_i X'_iX_i = O_p(N)$.
(ii) Recall that under the null hypothesis, $u_i = \epsilon_i$. Thus
$$R^{(k)}_{XAu} = \sum_i X'_iX^{(k)}_iX^{(k)\prime}_iu_i - \frac{1}{NT}\left(\sum_i\sum_t x^2_{it,k}\right)\left(\sum_i X'_iu_i\right).$$
The first and the second term are $O_p\left(N^{1/2}\right)$ by the central limit theorem (CLT) for independent random variables and Assumption 2. (iii) Combining (i) and (ii) with the fact that $\sqrt{N}\left(\tilde{\beta} - \beta\right) = O_p(1)$ yields the result.
Lemma A.2 Under Assumptions 1′, 2′ and the null hypothesis, the following properties hold for N → ∞ and T → ∞:
(i) $R^{(k)}_{XAX} = O_p\left(NT^2\right)$,
(ii) $R^{(k)}_{XAu} = O_p\left(N^{1/2}T^{3/2}\right)$,
(iii) $R^{(k)}_{NT} = O_p(T)$, which is defined as $R^{(k)}_N$ in Lemma A.1,
for k = 1, ..., K.
Proof. Following the proof of Lemma A.1, the element of the first term of $R^{(k)}_{XAX}$ is $O_p(NT^2)$, whereas the second term is $O_p(NT)$ by Assumption 2′, which yields statement (i). Notice that in (ii) $R^{(k)}_{XAu}$ has two terms as in Lemma A.1, where the first one has zero mean and variance of order $T^3$. Therefore, by Lemma 1 in Baltagi, Feng, and Kao (2011) we have that $X'_iX^{(k)}_iX^{(k)\prime}_iu_i = O_p\left(T^{3/2}\right)$, and by Lemma 2 in PY that $\sum_i X'_iX^{(k)}_iX^{(k)\prime}_iu_i = O_p\left(N^{1/2}T^{3/2}\right)$ and $\sum_i X'_iu_i = O_p\left(N^{1/2}T^{1/2}\right)$. These results and the fact that $\sqrt{NT}\left(\tilde{\beta} - \beta\right) = O_p(1)$ imply (iii).
A.2 Proofs of the main results

Proof of Lemma 1
We use the following rules for matrix differentiation:
$$\frac{\partial\ell}{\partial\theta_k} = -\frac{1}{2}tr\left[\Omega^{-1}\frac{\partial\Omega}{\partial\theta_k}\right] + \frac{1}{2}u'\Omega^{-1}\frac{\partial\Omega}{\partial\theta_k}\Omega^{-1}u,$$
$$-E\left[\frac{\partial^2\ell}{\partial\theta_k\partial\theta_l}\right] = \frac{1}{2}tr\left[\Omega^{-1}\frac{\partial\Omega}{\partial\theta_k}\Omega^{-1}\frac{\partial\Omega}{\partial\theta_l}\right],$$
for k, l = 1, 2, ..., K + 1; see, e.g., Harville (1977) and Wand (2002). First,
$$X_i\Sigma_vX'_i = \sum_k \sigma^2_{v,k}X^{(k)}_iX^{(k)\prime}_i, \qquad (45)$$
with $X^{(k)}_i$ denoting the k-th column vector of $X_i$. Hence
$$\begin{pmatrix} X_1\Sigma_vX'_1 & & 0 \\ & \ddots & \\ 0 & & X_N\Sigma_vX'_N \end{pmatrix} = \sum_k \sigma^2_{v,k}A_k, \qquad (46)$$
with the NT × NT matrix
$$A_k = \begin{pmatrix} X^{(k)}_1X^{(k)\prime}_1 & & 0 \\ & \ddots & \\ 0 & & X^{(k)}_NX^{(k)\prime}_N \end{pmatrix},$$
for k = 1, ..., K, where $X^{(k)}_i$ denotes the k-th column of the T × K matrix $X_i$. Thus,
$$\Omega = \sum_k \sigma^2_{v,k}A_k + \sigma^2I_{NT}$$
and
$$\frac{\partial\Omega}{\partial\theta_k} = \begin{cases} A_k, & \text{for } k = 1, 2, \ldots, K, \\ I_{NT}, & \text{for } k = K + 1. \end{cases}$$
Under the null hypothesis we have $\Omega = \sigma^2I_{NT}$. Using (45) we obtain
$$\left.\frac{\partial\ell}{\partial\theta_k}\right|_{H_0} = \begin{cases} -\frac{1}{2\tilde{\sigma}^2}tr\left[A_k\right] + \frac{1}{2\tilde{\sigma}^4}\tilde{u}'A_k\tilde{u}, & \text{for } k = 1, 2, \ldots, K, \\ 0, & \text{for } k = K + 1, \end{cases}$$
where
$$\tilde{\sigma}^2 = \frac{1}{NT}\tilde{u}'\tilde{u}, \qquad \tilde{u} = \left(I_{NT} - X\left(X'X\right)^{-1}X'\right)y.$$
The representation of the score vector follows from
$$tr\left[A_k\right] = \sum_i\sum_t x^2_{it,k} = X^{(k)\prime}X^{(k)},$$
where $X^{(k)}$ denotes the k-th column of the NT × K matrix X. Similarly, (46) yields
$$-E\left[\left.\frac{\partial^2\ell}{\partial\theta_k\partial\theta_l}\right|_{H_0}\right] = \begin{cases} \frac{1}{2\sigma^4}tr\left[A_kA_l\right], & \text{for } k, l = 1, 2, \ldots, K, \\ \frac{1}{2\sigma^4}X^{(k)\prime}X^{(k)}, & \text{for } k = 1, 2, \ldots, K \text{ and } l = K + 1, \\ \frac{NT}{2\sigma^4}, & \text{for } k = l = K + 1. \end{cases}$$
Using the fact that $A_k$ and $A_l$ are block-diagonal,
$$tr\left[A_kA_l\right] = \sum_i tr\left[X^{(k)}_iX^{(k)\prime}_iX^{(l)}_iX^{(l)\prime}_i\right] = \sum_i\left(X^{(k)\prime}_iX^{(l)}_i\right)^2,$$
where $X^{(k)}_i$ denotes the k-th column of $X_i$, which yields the form of the information matrix presented in the lemma.
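The block-diagonal trace identity used in the last step, $tr[A_kA_l] = \sum_i \left(X^{(k)\prime}_iX^{(l)}_i\right)^2$, follows from $tr[(aa')(bb')] = (a'b)^2$ for column vectors a, b, and is easy to check numerically. The small sketch below is our own verification, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 5
Xk = [rng.standard_normal(T) for _ in range(N)]   # k-th regressor column per unit
Xl = [rng.standard_normal(T) for _ in range(N)]
# A_k is block-diagonal with blocks X_i^{(k)} X_i^{(k)'} on the diagonal, so the
# trace of A_k A_l decomposes into the block-wise traces
lhs = sum(np.trace(np.outer(a, a) @ np.outer(b, b)) for a, b in zip(Xk, Xl))
rhs = sum((a @ b) ** 2 for a, b in zip(Xk, Xl))
assert np.isclose(lhs, rhs)
```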
Proof of Theorem 1
Recall that
$$A^{(k)}_i = X^{(k)}_iX^{(k)\prime}_i - \left(\frac{1}{NT}\sum_i X^{(k)\prime}_iX^{(k)}_i\right)I_T,$$
and rewrite the elements of the scores as
$$\tilde{s}_k = \frac{\sigma^4}{\tilde{\sigma}^4}\,\frac{1}{2\sigma^4}\sum_i \tilde{u}'_iA^{(k)}_i\tilde{u}_i,$$
for k = 1, ..., K. Since $\tilde{u}_i = u_i - X_i(\tilde{\beta} - \beta)$ we have
$$\frac{1}{\sqrt{N}}\tilde{s}_k = \frac{\sigma^4}{\tilde{\sigma}^4}\,\frac{1}{\sqrt{N}}\,\frac{1}{2\sigma^4}\sum_i u'_iA^{(k)}_iu_i + \frac{1}{\sqrt{N}}R^{(k)}_N,$$
where $R^{(k)}_N = O_p(1)$ from Lemma A.1. Since $\sum_i tr\left[A^{(k)}_i\right] = 0$ it follows that $\sum_i E\left(u'_iA^{(k)}_iu_i\right) = 0$ and, therefore,
$$\lim_{N\to\infty}E\left[\frac{1}{\sqrt{N}}\tilde{s}\right] = 0.$$
The covariances are obtained as
$$Cov\left[u'_iA^{(k)}_iu_i,\, u'_iA^{(l)}_iu_i \,\middle|\, X\right] = 2\sigma^4\, tr\left[A^{(k)}_iA^{(l)}_i\right]$$
$$= 2\sigma^4\left[\left(X^{(k)\prime}_iX^{(l)}_i\right)^2 - \frac{1}{NT}\left(\sum_i X^{(k)\prime}_iX^{(k)}_i\right)X^{(l)\prime}_iX^{(l)}_i - \frac{1}{NT}\left(\sum_i X^{(l)\prime}_iX^{(l)}_i\right)X^{(k)\prime}_iX^{(k)}_i + T\left(\frac{1}{NT}\sum_i X^{(k)\prime}_iX^{(k)}_i\right)\left(\frac{1}{NT}\sum_i X^{(l)\prime}_iX^{(l)}_i\right)\right],$$
and since $u'_iA^{(k)}_iu_i$ is independent of $u'_jA^{(l)}_ju_j$ for all i ≠ j conditional on X,
$$\frac{1}{\left(2\sigma^4\right)^2}Cov\left(\sum_i u'_iA^{(k)}_iu_i,\, \sum_i u'_iA^{(l)}_iu_i\right) = \frac{1}{2\sigma^4}\left[\sum_i\left(X^{(k)\prime}_iX^{(l)}_i\right)^2 - \frac{1}{NT}\left(\sum_i X^{(k)\prime}_iX^{(k)}_i\right)\left(\sum_i X^{(l)\prime}_iX^{(l)}_i\right)\right] = V_{k,l}.$$
The Liapounov condition in the central limit theorem for independent random variables (see White (2001), Theorem 5.10) is satisfied by Assumption 2 and therefore
$$\left(\frac{1}{N}\tilde{V}\right)^{-1/2}\frac{1}{\sqrt{N}}\tilde{s} \xrightarrow{d} N(0, I_K),$$
where $\tilde{V}$ replaces $\sigma^4$ in V by $\tilde{\sigma}^4$. By the formula for the partitioned inverse,
$$\left\{I\left(\tilde{\sigma}^2\right)^{-1}\right\}_{1:K,1:K} = \tilde{V}^{-1},$$
where $\{\cdot\}_{1:K,1:K}$ denotes the upper-left K × K block of the matrix, it follows finally that
$$\tilde{S}'I\left(\tilde{\sigma}^2\right)^{-1}\tilde{S} = \tilde{s}'\tilde{V}^{-1}\tilde{s} \xrightarrow{d} \chi^2_K.$$
Proof of Theorem 2
The proof proceeds in three steps: (i) we derive the covariance matrix of the score vector, (ii) we establish the asymptotic normality of the score vector and (iii) we use these results to establish the asymptotic distribution of the LM statistic.
(i) Define the K × 1 vector $s = [s_1, \ldots, s_K]'$ with typical element
$$s_k = \frac{1}{2\sigma^4}\sum_i u'_iA^{(k)}_iu_i = \frac{1}{2\sigma^4}\sum_i s_{i,k}, \qquad (47)$$
where $s_{i,k} = u'_iA^{(k)}_iu_i$ and 1 ≤ k ≤ K. Using standard results for quadratic forms (see, e.g., Ullah (2004), appendix A.5),
$$E\left[s_{i,k}|X\right] = \sigma^2\, tr\left[A^{(k)}_i\right],$$
$$E\left[s_{i,k}s_{i,l}|X\right] = 2\sigma^4\, tr\left[A^{(k)}_iA^{(l)}_i\right] + \sigma^4\, tr\left[A^{(k)}_i\right]tr\left[A^{(l)}_i\right] + \left(\mu^{(4)}_u - 3\sigma^4\right)a^{(k)\prime}_ia^{(l)}_i,$$
where $a^{(k)}_i$ is a vector consisting of the main diagonal elements of the matrix $A^{(k)}_i$ and $\mu^{(4)}_u$ denotes the fourth moment of $u_{it}$. Since
$$E\left[s_{i,k}|X\right]E\left[s_{i,l}|X\right] = \sigma^4\, tr\left[A^{(k)}_i\right]tr\left[A^{(l)}_i\right],$$
we have
$$Cov\left[s_{i,k}, s_{i,l}|X\right] = 2\sigma^4\, tr\left[A^{(k)}_iA^{(l)}_i\right] + \left(\mu^{(4)}_u - 3\sigma^4\right)a^{(k)\prime}_ia^{(l)}_i. \qquad (48)$$
Due to the independence of $u'_iA^{(k)}_iu_i$ and $u'_jA^{(l)}_ju_j$ for i ≠ j, it follows that
$$Cov\left(\sum_i s_{i,k}, \sum_i s_{i,l}\,\middle|\, X\right) = 2\sigma^4\sum_i tr\left[A^{(k)}_iA^{(l)}_i\right] + \left(\mu^{(4)}_u - 3\sigma^4\right)\sum_i a^{(k)\prime}_ia^{(l)}_i.$$
Let $V_{NT}$ denote the covariance matrix of s. Inserting the expression for $tr\left[A^{(k)}_iA^{(l)}_i\right]$, we determine the (k, l) element of $V_{NT}$ as
$$V_{k,l} = \frac{1}{2\sigma^4}\left[\sum_i\left(\sum_t x_{it,k}x_{it,l}\right)^2 - \frac{1}{NT}\left(\sum_i\sum_t x^2_{it,k}\right)\left(\sum_i\sum_t x^2_{it,l}\right)\right] + \frac{\mu^{(4)}_u - 3\sigma^4}{\left(2\sigma^4\right)^2}\sum_i\sum_t\left(x^2_{it,k} - \frac{1}{NT}\sum_i\sum_t x^2_{it,k}\right)\left(x^2_{it,l} - \frac{1}{NT}\sum_i\sum_t x^2_{it,l}\right) = V_{1,k,l} + V_{2,k,l}. \qquad (49)$$
(ii) To verify that a central limit theorem applies to s, let $\lambda \in \mathbb{R}^K$, $||\lambda|| = 1$, and $Z_{i,T} = \frac{1}{T}\lambda's_i$, where $s_i$ is a K × 1 vector with elements $s_{i,k}$ for 1 ≤ k ≤ K. Further, $E[Z_{i,T}] = 0$ and $E\left[Z^2_{i,T}\right] = \frac{1}{T^2}\lambda'E[V_{i,T}]\lambda$, where $V_{i,T}$ is a K × K matrix with the typical (k, l) element defined in (48) for 1 ≤ k, l ≤ K. From the Cramer-Wold device we conclude that it is sufficient to show that
$$\frac{1}{\sqrt{N}}\sum_i Z_{i,T} \xrightarrow{d} N(0, \mathcal{V}), \qquad (50)$$
where $\mathcal{V} \equiv \lim_{N,T\to\infty}\frac{1}{NT^2}\sum_i\lambda'E[V_{i,T}]\lambda$. Assumption 2′ and (48) ensure that $\mathcal{V}$ exists and is positive definite.
The asymptotic normality result (50) follows from the central limit theorem for the double-indexed process (see, e.g., Phillips and Moon (1999), Theorem 2) if the following condition holds:
$$\sum_i E\left[\left|\frac{Z_{i,T}}{\sqrt{V_{NT}}}\right|^2 1\left(\left|\frac{Z_{i,T}}{\sqrt{V_{NT}}}\right| > \varepsilon\right)\right] \longrightarrow 0 \quad \text{for all } \varepsilon > 0, \qquad (51)$$
where $V_{NT} = \frac{1}{T^2}\sum_i\lambda'E[V_{i,T}]\lambda$. In turn, the Lindeberg condition (51) holds provided that
$$\sup_{i,T}E\left[\left\|Z_{i,T}\right\|^3\right] \le \sup_{i,T}E\left[\left\|\frac{s_i}{T}\right\|^3\right] < \infty. \qquad (52)$$
To study whether $s_i/T$ is uniformly $L_3$ bounded for all i and T, it suffices to consider $s_i/T$ elementwise. Furthermore, each element of $s_i/T$ can be written in terms of quadratic forms, i.e.,
$$\frac{1}{T}s_{i,k} = \frac{1}{T}\left[u'_iB_{i,k}u_i - E\left(u'_iB_{i,k}u_i|X\right)\right],$$
where $B_{i,k} = X^{(k)}_iX^{(k)\prime}_i$ for 1 ≤ k ≤ K, and by the triangle inequality
$$E\left|\frac{1}{T}s_{i,k}\right|^3 \le \frac{1}{T^3}E\left|u'_iB_{i,k}u_i\right|^3 + \frac{3}{T^3}E\left[\left|u'_iB_{i,k}u_i\right|^2\left|E\left(u'_iB_{i,k}u_i|X\right)\right|\right] + \frac{3}{T^3}E\left[\left|u'_iB_{i,k}u_i\right|\left|E\left(u'_iB_{i,k}u_i|X\right)\right|^2\right] + \frac{1}{T^3}E\left|E\left(u'_iB_{i,k}u_i|X\right)\right|^3. \qquad (53)$$
For the first term on the r.h.s. of (53) we make use of a formula for the third moment of a quadratic form (see, e.g., Wiens (1992) or Ullah (2004), appendix A.5), the law of iterated expectations and the uniform bounds $E\left[|u_{it}|^6|X\right] < C < \infty$ and $E|x_{it,k}|^6 < C < \infty$ given in Assumptions 1′ and 2′, i.e.,
$$\frac{1}{T^3}E\left|u'_iB_{i,k}u_i\right|^3 = \frac{1}{T^3}E\left[E\left(\left|u'_iB_{i,k}u_i\right|^3\middle|X\right)\right] = \frac{1}{T^3}E\left[\sum_{t_1}\sum_{t_2}\sum_{t_3}E\left(u^2_{it_1}u^2_{it_2}u^2_{it_3}x^2_{it_1,k}x^2_{it_2,k}x^2_{it_3,k}\middle|X\right)\right] + O\left(T^{-1}\right). \qquad (54)$$
Further, from Assumption 1′ we have $E\left[u^2_{it_1}u^2_{it_2}u^2_{it_3}|X\right] < C < \infty$, and from Assumption 2′ the term $E\left[x^2_{it_1,k}x^2_{it_2,k}x^2_{it_3,k}\right]$ is uniformly bounded for all i and T. It then follows from the triangle inequality that the first term on the r.h.s. of (54) is uniformly bounded. The same reasoning applies to the rest of the terms in (53) to show their uniform boundedness. This concludes the proof of (52) and the asymptotic normality of the score vector s.
(iii) Rewrite the first K elements of the score as
$$\tilde{s} = \frac{\sigma^4}{\tilde{\sigma}^4}s + R_{NT},$$
where $R_{NT}$ is given in Lemma A.2 and s has typical element as defined in (47). By (ii),
$$s'\left(V_{NT}\right)^{-1}s \xrightarrow{d} \chi^2_K, \qquad (55)$$
as N → ∞, T → ∞, where $V_{NT}$ has (k, l) element $V_{k,l}$ as in (49). Under Assumptions 1′ and 2′,
$$V_1 = O_p\left(NT^2\right), \qquad V_2 = O_p(NT),$$
where $V_1$ and $V_2$ are specified elementwise in (49). Given the expression for $\tilde{V}$ in Theorem 1,
$$\frac{\tilde{V}}{NT^2} - \frac{V_1}{NT^2} \xrightarrow{p} 0$$
and hence
$$s'\tilde{V}^{-1}s - s'\left(V_{NT}\right)^{-1}s \xrightarrow{p} 0 \qquad (56)$$
as N → ∞, T → ∞. The LM statistic can be expanded as
$$LM = \tilde{s}'\tilde{V}^{-1}\tilde{s} = \left(\frac{\sigma^4}{\tilde{\sigma}^4}s + R_{NT}\right)'\tilde{V}^{-1}\left(\frac{\sigma^4}{\tilde{\sigma}^4}s + R_{NT}\right) = \frac{\sigma^8}{\tilde{\sigma}^8}s'\tilde{V}^{-1}s + O_p\left(N^{-1/2}\right), \qquad (57)$$
where the last line follows from Lemma A.2. The theorem follows by combining (55), (56) and (57).
Proof of Corollary 1
The result follows immediately from the proof of Theorem 2 and the fact that $\tilde{\mu}^{(4)}_u = (NT)^{-1}\sum_i\sum_t\tilde{u}^4_{it}$ is a consistent estimator of $\mu^{(4)}_u$.
Proof of Theorem 3
Using similar arguments as in the proof of Theorem 1,
$$\frac{1}{\sqrt{N}}\tilde{s}^* = \frac{\sigma^4}{\tilde{\sigma}^4}\,\frac{1}{\sqrt{N}}\begin{pmatrix} \frac{1}{\sigma^4}\sum_i\sum_t\sum_{s=1}^{t-1}x_{it,1}u_{it}x_{is,1}u_{is} \\ \vdots \\ \frac{1}{\sigma^4}\sum_i\sum_t\sum_{s=1}^{t-1}x_{it,K}u_{it}x_{is,K}u_{is} \end{pmatrix} + o_p(1). \qquad (58)$$
Let $u^*_{it} = u_{it}/\sigma$ and $z^*_{itk} = x_{it,k}u^*_{it}$. Clearly, $E\left[\sum_t\sum_{s=1}^{t-1}z^*_{itk}z^*_{isk}\right] = 0$. Since, conditional on X, $\sum_t\sum_{s=1}^{t-1}z^*_{itk}z^*_{isk}$ and $\sum_t\sum_{s=1}^{t-1}z^*_{jtl}z^*_{jsl}$ are independent for i ≠ j, the covariances for two elements k and l of the vector (58) are
$$E\left[s^*_ks^*_l|X\right] = \frac{1}{\sigma^4}\sum_i E\left[\left(\sum_{t=2}^{T}\sum_{s=1}^{t-1}z^*_{itk}z^*_{isk}\right)\left(\sum_{t=2}^{T}\sum_{s=1}^{t-1}z^*_{itl}z^*_{isl}\right)\middle|X\right] = \frac{1}{\sigma^4}\sum_{i=1}^{N}\sum_{t=2}^{T}x_{it,k}x_{it,l}\left(\sum_{s=1}^{t-1}x_{is,k}x_{is,l}\right) = V^*_{k,l},$$
since all cross terms have zero expectation and $E\left[(u^*_{it})^2\right] = 1$. The central limit theorem for independent random variables and Slutsky's theorem imply
$$\left(\frac{1}{N}V^*\right)^{-1/2}\frac{1}{\sqrt{N}}s^* \xrightarrow{d} N(0, I_K),$$
and the result follows.
Proof of Corollary 2

Using the arguments in Theorems 1 and 3 (under Assumptions 1′ and 2, and allowing for $E[\epsilon_{it}^2\,|\,X]=\sigma_{it}^2$), $LM_{reg}$ is asymptotically $\chi_K^2$ if the Liapounov condition is satisfied and the asymptotic covariance matrix of the score vector is equal to the limit of $\sum_i\sum_t \tilde{u}_{it}^2\tilde{z}_{it,k}\tilde{z}_{it,l}$ as $N\to\infty$ and $T$ is fixed.

Regarding the Liapounov condition, it suffices to show that $E\,|s_{i,k}^{*}|^{2+\delta} < C < \infty$ for $k=1,\ldots,K$. By the Minkowski inequality,

$$E\,|s_{i,k}^{*}|^{2+\delta} \le \left(\sum_t\sum_{s=1}^{t-1}\Big(E\,|u_{it}u_{is}x_{it,k}x_{is,k}|^{2+\delta}\Big)^{\frac{1}{2+\delta}}\right)^{2+\delta}.$$

Further, by the Cauchy–Schwarz inequality, the law of iterated expectations and Assumptions 1′ and 2,

$$E\,|x_{it,1}u_{it}\,x_{is,1}u_{is}|^{2+\delta} \le E\left[\sqrt{E\big[|u_{it}|^{2(2+\delta)}\,\big|\,X\big]\,E\big[|u_{is}|^{2(2+\delta)}\,\big|\,X\big]}\;|x_{it,1}x_{is,1}|^{2+\delta}\right] < C < \infty.$$

Hence the Liapounov condition holds.

Regarding the $(k,l)$ element of the covariance matrix of $\tilde{s}^{*}$, note that

$$E\left[\sum_i\left(\sum_t x_{it,k}u_{it}\left(\sum_{s=1}^{t-1}x_{is,k}u_{is}\right)\right)\left(\sum_t x_{it,l}u_{it}\left(\sum_{s=1}^{t-1}x_{is,l}u_{is}\right)\right)\middle|\,X\right] = \sum_i\sum_t\sum_{s=1}^{t-1}\sigma_{it}^2\sigma_{is}^2\,x_{it,k}x_{it,l}x_{is,k}x_{is,l}.$$

Next let $z_{it,k} = x_{it,k}\sum_{s=1}^{t-1}u_{is}x_{is,k}$ and notice that

$$E\left[\sum_i\sum_t u_{it}^2 z_{it,k}z_{it,l}\,\middle|\,X\right] = \sum_i\sum_t\sum_{s=1}^{t-1}\sigma_{it}^2\sigma_{is}^2\,x_{it,k}x_{it,l}x_{is,k}x_{is,l}.$$

Furthermore,

$$\frac{1}{N}\sum_i\sum_t \tilde{u}_{it}^2\tilde{z}_{it,k}\tilde{z}_{it,l} - \frac{1}{N}\sum_i\sum_t u_{it}^2 z_{it,k}z_{it,l} \overset{p}{\longrightarrow} 0,$$

and result (32) follows.
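The estimator in (32) translates directly into a heteroskedasticity-robust statistic: the score is formed from the forward cross-products and studentized with the White-type matrix $\sum_i\sum_t \tilde{u}_{it}^2\tilde{z}_{it}\tilde{z}_{it}'$. A sketch under assumed array shapes (in practice $\tilde{u}$ would be the pooled OLS residuals):

```python
import numpy as np

def lm_reg(X, ures):
    """Robust score test in the spirit of Corollary 2:
    z~_{it,k} = x_{it,k} * sum_{s<t} x_{is,k} u~_is,
    score s_k = sum_{i,t} u~_it z~_{it,k},
    robust variance V_{k,l} = sum_{i,t} u~_it^2 z~_{it,k} z~_{it,l}."""
    N, T, K = X.shape
    s = np.zeros(K)
    V = np.zeros((K, K))
    for i in range(N):
        zcum = np.vstack([np.zeros((1, K)),
                          np.cumsum(X[i] * ures[i][:, None], axis=0)[:-1]])
        z = X[i] * zcum                              # z~_{it,k}
        s += z.T @ ures[i]                           # score contribution
        V += (z * ures[i][:, None] ** 2).T @ z       # sum u~^2 z~ z~'
    return float(s @ np.linalg.solve(V, s))          # approx. chi^2_K under H0
```

The quadratic form needs no estimate of a common $\sigma^2$, which is what makes it robust to cross-section and time-series heteroskedasticity.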
Proof of Theorem 4

Consider the normalized scores

$$\frac{1}{\sqrt{N}}\,\tilde{s}^{**} = \frac{1}{\sqrt{N}}\,s^{**} + o_p(1),$$

where the $k$-th element of the vector $s^{**}$ is given by

$$s_k^{**} = \sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\left(u_{it}u_{is}-\sigma_{ts}\right)x_{it,k}x_{is,k}.$$

By construction, $E[s_k^{**}]=0$ for $k=1,\ldots,K$ under the null hypothesis. The same arguments as for result (32) apply to verify the Liapounov condition and invoke the central limit theorem for independent and heterogeneously distributed random variables. It remains to show that the $(k,l)$ element of the covariance matrix of $s^{**}$ takes the form (34). Since, conditional on $X$, the contributions $s_{i,k}^{**}$ and $s_{j,k}^{**}$ are independent for $i\neq j$, the covariances for two elements $k$ and $l$ of the vector $s^{**}$ normalized with $N$ are

$$E\big[s_k^{**}s_l^{**}\,\big|\,X\big] = \sum_i E\left[\left(\sum_{t=2}^{T}\sum_{s=1}^{t-1}(u_{it}u_{is}-\sigma_{ts})x_{it,k}x_{is,k}\right)\left(\sum_{\tau=2}^{T}\sum_{q=1}^{\tau-1}(u_{i\tau}u_{iq}-\sigma_{\tau q})x_{i\tau,l}x_{iq,l}\right)\middle|\,X\right]
= \sum_i\sum_{t=2}^{T}\sum_{s=1}^{t-1}x_{it,k}x_{is,k}\left(\sum_{\tau=2}^{T}\sum_{q=1}^{\tau-1}E\big[(u_{it}u_{is}-\sigma_{ts})(u_{i\tau}u_{iq}-\sigma_{\tau q})\,\big|\,X\big]\,x_{i\tau,l}x_{iq,l}\right).$$

Further,

$$E\big[(u_{it}u_{is}-\sigma_{ts})(u_{i\tau}u_{iq}-\sigma_{\tau q})\,\big|\,X\big] = E\big[u_{it}u_{is}u_{i\tau}u_{iq}\,\big|\,X\big] - \sigma_{ts}\sigma_{\tau q},$$

and the $(k,l)$ element can be written as

$$E\big[s_k^{**}s_l^{**}\,\big|\,X\big] = \sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}x_{it,k}x_{is,k}\left(\sum_{\tau=2}^{T}\sum_{q=1}^{\tau-1}\delta_{ts\tau q}\,x_{i\tau,l}x_{iq,l}\right)
= \sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\sum_{\tau=2}^{T}\sum_{q=1}^{\tau-1}\delta_{ts\tau q}\,x_{it,k}x_{is,k}x_{i\tau,l}x_{iq,l},$$

where $\delta_{ts\tau q} = E[u_{jt}u_{js}u_{j\tau}u_{jq}\,|\,X] - \sigma_{ts}\sigma_{\tau q}$. Substituting $\delta_{ts\tau q}$ by the consistent estimator $\hat{\delta}_{ts\tau q} = N^{-1}\sum_{j=1}^{N}\tilde{u}_{jt}\tilde{u}_{js}\tilde{u}_{j\tau}\tilde{u}_{jq} - \tilde{\sigma}_{ts}\tilde{\sigma}_{\tau q}$ yields the limiting distribution of the modified test statistic.
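The substitution of $\hat{\delta}_{ts\tau q}$ can be organized over the $P=T(T-1)/2$ time pairs, so the fourth-moment array becomes a $P\times P$ matrix estimated across the cross-section. A sketch under assumed shapes ($u$ would again be residuals in practice):

```python
import numpy as np

def lm_ac(X, u):
    """Autocorrelation-robust statistic in the spirit of Theorem 4:
    s**_k = sum_i sum_{t>s} (u_it u_is - sig~_ts) x_{it,k} x_{is,k},
    with delta_{ts,tau q} replaced by its cross-section estimate."""
    N, T, K = X.shape
    pairs = [(t, s) for t in range(1, T) for s in range(t)]
    w = np.stack([u[:, t] * u[:, s] for (t, s) in pairs], axis=1)  # (N, P)
    sig = w.mean(axis=0)                                          # sig~_ts
    C = w.T @ w / N - np.outer(sig, sig)                          # delta-hat matrix
    xx = np.stack([[X[i, t] * X[i, s] for (t, s) in pairs]
                   for i in range(N)])                            # (N, P, K)
    s = np.einsum('ip,ipk->k', w - sig, xx)                       # centered score
    V = np.einsum('ipk,pq,iql->kl', xx, C, xx)                    # plug-in covariance
    return float(s @ np.linalg.solve(V, s))                       # approx. chi^2_K
```

Because $C$ is an empirical covariance matrix of the pair products $u_{it}u_{is}$, the plug-in matrix $V$ is positive semi-definite by construction.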
Proof of Theorem 5

As in Honda (1985), the proof of the theorem proceeds in three steps: (i) first we show that $\tilde{\sigma}^2$ remains consistent under the local alternative; (ii) second, we incorporate the local alternative into the score vector; and (iii) we establish the asymptotic distribution of the LM statistic.

(i) Note first that with $M_X = I_{NT} - X(X'X)^{-1}X'$,

$$\tilde{u} = M_X u = M_X\left(D_X v + \epsilon\right),$$

where

$$D_X = \begin{pmatrix} X_1 & & & 0\\ & X_2 & & \\ & & \ddots & \\ 0 & & & X_N \end{pmatrix}.$$

Hence,

$$\frac{\tilde{u}'\tilde{u}}{NT} = \frac{1}{NT}\left(\epsilon'\epsilon - \epsilon'P_X\epsilon + v'D_X'M_XD_Xv + v'D_X'M_X\epsilon + \epsilon'M_XD_Xv\right).$$

Using Assumptions 1′, 2 and 3, it is straightforward to show that

$$\frac{1}{N}\,\epsilon'X(X'X)^{-1}X'\epsilon = o_p(1),\qquad \frac{1}{N}\,v'D_X'M_XD_Xv = o_p(1),\qquad \frac{1}{N}\,v'D_X'M_X\epsilon = o_p(1),$$

and, thus, $\tilde{\sigma}^2 = \sigma^2 + o_p(1)$.

(ii) Since $u_i = X_iv_i + \epsilon_i$ and

$$\tilde{u}_i = X_iv_i + \epsilon_i - X_i\big(\tilde{\beta}-\beta\big),$$

we obtain

$$\frac{1}{\sqrt{N}}\,\tilde{s}_k = \frac{1}{\sqrt{N}}\,\frac{\sigma^4}{\tilde{\sigma}^4}\,\frac{1}{2\sigma^4}\sum_i \epsilon_i'A_i^{(k)}\epsilon_i + \frac{1}{\sqrt{N}}\,\frac{\sigma^4}{\tilde{\sigma}^4}\,\frac{1}{2\sigma^4}\sum_i v_i'X_i'A_i^{(k)}X_iv_i + o_p(1), \tag{59}$$

for $k=1,\ldots,K$, where the order of the remainder term follows by similar arguments as in Lemma A.1.

(iii) Using the same arguments as in the proof of Theorem 1, the first term of $\tilde{s}/\sqrt{N}$ in (59) is asymptotically normally distributed. Regarding the second term,

$$\frac{1}{\sqrt{N}}\sum_i v_i'X_i'A_i^{(k)}X_iv_i = \frac{1}{N}\sum_i \big(N^{1/4}v_i\big)'X_i'A_i^{(k)}X_i\big(N^{1/4}v_i\big),$$

and by standard results for quadratic forms,

$$E\left[\big(N^{1/4}v_i\big)'X_i'A_i^{(k)}X_i\big(N^{1/4}v_i\big)\,\middle|\,X\right] = \mathrm{tr}\big[X_i'A_i^{(k)}X_iD_c\big],$$

with $D_c = \mathrm{diag}(c_1,\ldots,c_K)$. Thus, by the law of large numbers for sums of independent random variables,

$$\frac{1}{2\sqrt{N}\sigma^4}\sum_i v_i'X_i'A_i^{(k)}X_iv_i \overset{p}{\longrightarrow} \lim_{N\to\infty}\frac{1}{2\sigma^4N}\sum_i \mathrm{tr}\big[X_i'A_i^{(k)}X_iD_c\big].$$

Now

$$\sum_i \mathrm{tr}\big[X_i'A_i^{(k)}X_iD_c\big] = \sum_{l=1}^{K}c_l\left[\sum_i\left(\sum_t x_{it,k}x_{it,l}\right)^2 - \frac{1}{NT}\left(\sum_i\sum_t x_{it,k}^2\right)\left(\sum_i\sum_t x_{it,l}^2\right)\right].$$

Define the $K\times 1$ vector $\psi$ elementwise by

$$\psi_k \equiv \sum_{l=1}^{K}c_l\,\underset{N\to\infty}{\mathrm{plim}}\left[\frac{1}{N}\sum_i\left(\sum_t x_{it,k}x_{it,l}\right)^2 - \frac{1}{T}\left(\frac{1}{N}\sum_i\sum_t x_{it,k}^2\right)\left(\frac{1}{N}\sum_i\sum_t x_{it,l}^2\right)\right].$$

By Slutsky's theorem we obtain

$$\left(\frac{1}{N}V\right)^{-1/2}\frac{1}{\sqrt{N}}\,\tilde{s} \overset{d}{\longrightarrow} N(\psi, I_K),$$

and the theorem follows by the definition of the non-central $\chi^2$ distributed random variable with $\psi = \Psi c$ and $c = (c_1,\ldots,c_K)'$.
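For a single regressor ($K=1$) the limiting statistic reduces to $(Z+\psi)^2$ with $Z$ standard normal, so the asymptotic local power curves of the kind shown in Figures 1 and 2 can be evaluated in closed form. A sketch for the scalar case, using only the standard library:

```python
from statistics import NormalDist

def local_power(psi, alpha=0.05):
    """Asymptotic local power of the LM test for K = 1: the statistic is
    distributed as (Z + psi)^2, a non-central chi^2_1 with noncentrality
    psi^2, so power = P((Z + psi)^2 > z_{1-alpha/2}^2)."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha / 2.0)   # chi^2_1 critical value is z^2
    return 1.0 - (nd.cdf(z - psi) - nd.cdf(-z - psi))
```

At $\psi=0$ this returns the nominal size, and power increases monotonically in $|\psi|$, which is the mechanism behind the local power comparison with the $\Delta$ test.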
Proof of Theorem 6

The proof is analogous to the proof of Theorem 5. To show that $\tilde{\sigma}^2$ remains consistent under the sequence of alternatives, we note that

$$\epsilon'X(X'X)^{-1}X'\epsilon = O_p(1),\qquad v'D_X'M_XD_Xv = O_p\big(N^{1/2}T\big),\qquad v'D_X'M_X\epsilon = O_p\big((NT)^{1/2}\big) + O_p\big(T^{1/2}\big).$$

Using the same arguments as in the proof of Theorem 2, $\tilde{s}/(T\sqrt{N})$ has a limiting normal distribution with a nonzero mean, which is determined by applying the law of large numbers to the second term in (59) with the proper normalization.

Proof of Theorem 7

With the Swamy statistic as described in the text, the proof follows the steps outlined in Appendix A.6 in PY.
Details for Remark 9

We study the $(k,l)$ element of $N^{-1}\sum_{i=1}^{N}\sum_{t=2}^{T}\tilde{u}_{it}^2\,\tilde{z}_{it}\tilde{z}_{it}'$ under the sequence of alternatives in Theorem 5. Note that

$$\tilde{u}_{it}^2 = \big(\epsilon_{it}+x_{it}'v_i\big)^2 + \big(\tilde{\beta}-\beta\big)'x_{it}x_{it}'\big(\tilde{\beta}-\beta\big) - 2\big(\epsilon_{it}+x_{it}'v_i\big)\,x_{it}'\big(\tilde{\beta}-\beta\big), \tag{60}$$

and

$$\tilde{z}_{it,k} = x_{it,k}\sum_{s=1}^{t-1}\Big(\epsilon_{is}+x_{is}'v_i - x_{is}'\big(\tilde{\beta}-\beta\big)\Big)x_{is,k} = x_{it,k}\big(a_{it,k}+b_{it,k}-d_{it,k}\big),$$

where, for brevity, $a_{it,k}=\sum_{s=1}^{t-1}\epsilon_{is}x_{is,k}$, $b_{it,k}=v_i'\sum_{s=1}^{t-1}x_{is}x_{is,k}$ and $d_{it,k}=(\tilde{\beta}-\beta)'\sum_{s=1}^{t-1}x_{is}x_{is,k}$, implying

$$\tilde{z}_{it,k}\tilde{z}_{it,l} = x_{it,k}x_{it,l}\,a_{it,k}a_{it,l} + x_{it,k}x_{it,l}\,b_{it,k}b_{it,l} + x_{it,k}x_{it,l}\,d_{it,k}d_{it,l}
+ x_{it,k}x_{it,l}\big(a_{it,k}b_{it,l}+a_{it,l}b_{it,k}\big)
- x_{it,k}x_{it,l}\big(a_{it,k}d_{it,l}+a_{it,l}d_{it,k}\big)
- x_{it,k}x_{it,l}\big(b_{it,k}d_{it,l}+b_{it,l}d_{it,k}\big). \tag{61}$$

First, from the first terms on the right-hand sides of (60) and (61), we obtain

$$\frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\epsilon_{it}^2\,x_{it,k}x_{it,l}\left(\sum_{s=1}^{t-1}\epsilon_{is}x_{is,k}\right)\left(\sum_{s=1}^{t-1}\epsilon_{is}x_{is,l}\right).$$

Notice that this term has the same probability limit as $\tilde{V}_{k,l}^{*}/N$, which is equal to $\Psi_{k,l}^{*}$.

Next, from the first term on the right-hand side of (60) and the second term on the right-hand side of (61),

$$\frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\epsilon_{it}^2\,v_i'\left(\sum_{s=1}^{t-1}x_{is}x_{is,k}\right)x_{it,k}\,x_{it,l}\left(\sum_{s=1}^{t-1}x_{is,l}x_{is}'\right)v_i
= \frac{1}{N^{3/2}}\sum_{i=1}^{N}\sum_{t=2}^{T}\epsilon_{it}^2\,\big(N^{1/4}v_i\big)'x_{it,k}\left(\sum_{s=1}^{t-1}x_{is}x_{is,k}\right)x_{it,l}\left(\sum_{s=1}^{t-1}x_{is,l}x_{is}'\right)\big(N^{1/4}v_i\big).$$

Since $\epsilon_{it}$ and $v_i$ are independent conditional on $X$,

$$E\left[\epsilon_{it}^2\,\big(N^{1/4}v_i\big)'x_{it,k}\left(\sum_{s=1}^{t-1}x_{is}x_{is,k}\right)x_{it,l}\left(\sum_{s=1}^{t-1}x_{is,l}x_{is}'\right)\big(N^{1/4}v_i\big)\,\middle|\,X\right] = \sigma^2\,\mathrm{tr}\big[B_{it}^{X}D_c\big],$$

with the $K\times K$ matrix $B_{it}^{X} = x_{it,k}\left(\sum_{s=1}^{t-1}x_{is}x_{is,k}\right)x_{it,l}\left(\sum_{s=1}^{t-1}x_{is,l}x_{is}'\right)$, such that

$$\frac{1}{N^{3/2}}\sum_{i=1}^{N}\sum_{t=2}^{T}\epsilon_{it}^2\,\big(N^{1/4}v_i\big)'x_{it,k}\left(\sum_{s=1}^{t-1}x_{is}x_{is,k}\right)x_{it,l}\left(\sum_{s=1}^{t-1}x_{is,l}x_{is}'\right)\big(N^{1/4}v_i\big) = O_p\big(N^{-1/2}\big).$$

Using the properties of $\epsilon_{it}$, $v_i$ and the fact that $\tilde{\beta}-\beta = o_p(1)$, it can be shown in a similar manner that all of the remaining terms are of lower order.
B  Appendix: Tables
Table 1: Rejection frequencies for H0 : all coefficients are homogenous

                    A) Size                           B) Power
           ∆̃adj   CLM    LM  LMreg         ∆̃adj    CLM     LM  LMreg
T = 10
N = 10      6.3  11.8   2.6    4.7           5.5    5.4   11.3    4.1
N = 20      5.6  12.0   3.2    4.3           8.4    4.2   24.8    7.0
N = 30      5.5  11.4   3.7    4.3          11.5    6.3   35.2   10.4
N = 50      5.2   8.9   3.7    4.1          18.9   15.9   51.2   19.6
N = 100     5.4   8.3   4.7    4.8          36.1   46.5   77.5   47.0
N = 200     4.6   7.5   4.9    4.6          65.6   87.3   96.9   83.1
T = 20
N = 10      5.3  15.1   2.6    6.3          17.1    3.4   28.1   12.4
N = 20      5.7  14.2   3.4    5.7          35.0    9.5   53.0   25.8
N = 30      5.9  12.8   3.8    5.8          50.7   23.4   70.5   43.2
N = 50      5.1  10.8   4.1    5.3          74.6   53.5   88.9   71.8
N = 100     4.5   8.3   4.3    4.7          95.7   90.1   99.1   96.5
N = 200     5.1   7.1   5.2    5.5          99.9   98.8  100.0  100.0
T = 30
N = 10      4.7  15.6   2.4    7.0          34.4    4.6   43.0   22.4
N = 20      4.5  14.5   3.5    6.2          64.8   19.7   74.3   50.9
N = 30      5.2  13.2   3.9    5.6          81.9   42.5   88.9   73.3
N = 50      5.3  11.6   4.5    5.9          96.3   76.9   98.2   94.5
N = 100     5.2   8.9   4.3    4.7         100.0   96.0  100.0   99.9
N = 200     5.2   7.4   5.0    5.8         100.0   99.3  100.0  100.0

Notes: Rejection frequencies (in %) for K = 3 under the null (panel A) and the alternative hypothesis (panel B). Nominal size is 5%.
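Rejection frequencies of this kind are obtained by repeatedly simulating the panel, computing the statistic and comparing it with the 5% chi-squared critical value. The sketch below is a stripped-down stand-in (homogeneous design, standard normal regressors and errors, no intercept, a simplified LM statistic), not the paper's exact data-generating process:

```python
import numpy as np

def lm_stat(X, y):
    """Pooled OLS residuals, then an LM-type statistic from forward
    cross-products of x_{it,k} u~_it, studentized with sigma~^4 V."""
    N, T, K = X.shape
    Xf, yf = X.reshape(N * T, K), y.reshape(N * T)
    beta = np.linalg.lstsq(Xf, yf, rcond=None)[0]
    u = (yf - Xf @ beta).reshape(N, T)
    sig2 = np.mean(u ** 2)
    s = np.zeros(K)
    V = np.zeros((K, K))
    for i in range(N):
        z = X[i] * u[i][:, None]
        zlag = np.vstack([np.zeros((1, K)), np.cumsum(z, axis=0)[:-1]])
        s += (z * zlag).sum(axis=0)
        M = np.einsum('tk,tl->tkl', X[i], X[i])
        Mlag = np.concatenate([np.zeros((1, K, K)), np.cumsum(M, axis=0)[:-1]])
        V += (M * Mlag).sum(axis=0)
    return float(s @ np.linalg.solve(V, s)) / sig2 ** 2

def rejection_rate(N, T, K=3, het=0.0, reps=200, seed=0):
    """Share of replications exceeding the 5% chi^2_K critical value;
    het > 0 draws heterogeneous slopes beta_i around beta = 1."""
    crit = {1: 3.841, 2: 5.991, 3: 7.815}[K]
    rng = np.random.default_rng(seed)
    rej = 0
    for _ in range(reps):
        X = rng.standard_normal((N, T, K))
        beta = 1.0 + het * rng.standard_normal((N, K))
        y = (X * beta[:, None, :]).sum(-1) + rng.standard_normal((N, T))
        rej += lm_stat(X, y) > crit
    return rej / reps
```

Under the null (`het=0.0`) the rate should hover near the nominal 5%, while heterogeneous slopes drive it toward one, mirroring the size/power contrast in the table.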
Table 2: Rejection frequencies: model with individual effects

                A) Size                                      B) Power
           LM   LM*  LMreg  LMac        ∆̃adj    CLM     LM    LM*  LMreg   LMac
T = 10
N = 10    2.6   2.9    4.8   3.4          5.7    5.3    8.3    8.6    4.2    7.8
N = 20    3.3   4.0    5.2   3.8          8.1    4.3   21.6   20.2    6.8   17.8
N = 30    3.1   3.9    4.7   4.1          8.7    5.3   26.5   24.2    8.8   22.5
N = 50    3.8   4.6    4.3   4.5         13.1   10.8   42.0   38.9   15.7   35.8
N = 100   4.1   4.8    4.7   4.6         29.2   40.8   73.4   67.7   40.4   64.5
N = 200   4.2   4.8    4.7   4.7         49.0   79.5   91.8   87.9   72.9   84.9
T = 20
N = 10    2.3   2.2    6.4   2.4         19.1    3.6   31.4   30.8   13.1   26.3
N = 20    3.4   3.5    6.0   3.9         39.1    9.4   59.3   57.8   31.7   53.6
N = 30    3.8   4.1    5.4   4.2         40.0   15.8   60.2   58.0   33.0   53.8
N = 50    3.8   4.2    5.1   3.8         72.5   53.9   89.0   88.1   73.3   84.7
N = 100   4.3   4.6    5.5   4.5         91.4   89.0   98.2   97.7   93.9   96.9
N = 200   4.8   4.8    5.3   4.7         99.8   98.4  100.0  100.0   99.9  100.0
T = 30
N = 10    2.8   2.9    8.0   3.5         26.6    3.9   37.8   37.6   18.0   29.4
N = 20    3.6   3.5    6.2   3.8         66.8   24.6   78.0   77.3   56.3   72.2
N = 30    3.7   3.7    5.8   4.1         91.8   47.8   95.3   94.9   86.4   93.7
N = 50    4.7   4.5    5.8   4.8         94.6   67.1   97.0   96.9   91.8   96.1
N = 100   4.4   4.4    5.1   4.7         99.9   92.8  100.0  100.0   99.9   99.9
N = 200   4.6   4.6    4.8   4.6        100.0   98.4  100.0  100.0  100.0  100.0

Notes: Left panel: rejection frequencies (in %) under the null hypothesis with the same design as in Table 1 and forward orthogonalization to eliminate fixed effects. Right panel: rejection frequencies (in %) under the alternative hypothesis and forward orthogonalization to eliminate fixed effects. Nominal size is 5%.
Table 3: Size and power for t-distributed errors

A) Size
           ∆̃adj   CLM     LM  LMadj    LM*  LMreg
T = 10
N = 10      6.3    9.0    3.5    2.9    4.0    3.9
N = 20      6.1   10.5    5.4    4.0    6.1    4.5
N = 30      5.6   10.2    6.9    5.0    6.2    4.9
N = 50      5.1    8.9    7.5    5.3    6.6    4.4
N = 100     5.0    7.5    9.8    6.2    7.2    4.9
N = 200     5.5    6.9   10.7    5.7    7.6    5.2
T = 20
N = 10      5.5   13.2    3.5    2.8    3.6    6.0
N = 20      5.8   13.0    5.1    4.0    4.6    6.3
N = 30      5.4   11.7    5.5    4.3    4.8    5.6
N = 50      5.2   10.4    6.7    4.8    5.1    5.3
N = 100     4.9    8.0    8.0    5.5    5.9    5.1
N = 200     5.1    7.0    8.9    5.5    5.4    4.8
T = 30
N = 10      5.8   15.0    3.0    2.6    3.0    7.0
N = 20      5.0   13.6    4.5    3.5    3.9    5.7
N = 30      4.7   12.4    5.2    4.3    4.2    5.5
N = 50      5.1   11.3    5.9    4.7    5.1    5.9
N = 100     5.0    8.5    6.4    5.0    4.8    5.1
N = 200     5.3    7.7    7.3    5.0    5.2    4.9

B) Power
           ∆̃adj   CLM     LM  LMadj    LM*  LMreg
T = 10
N = 10      5.9    3.4   12.4   10.9   11.8    4.9
N = 20      9.7    3.9   25.6   22.7   22.9    7.5
N = 30     13.6    6.9   36.1   31.8   32.4   11.9
N = 50     24.1   16.0   52.4   46.7   47.9   22.6
N = 100    45.3   44.4   78.9   71.8   73.9   49.4
N = 200    76.1   83.1   95.7   92.6   93.8   83.2
T = 20
N = 10     20.0    3.2   30.5   28.6   29.4   13.7
N = 20     39.5   10.1   53.6   50.0   52.5   29.7
N = 30     57.3   22.8   70.5   67.3   68.5   46.7
N = 50     80.0   51.5   88.3   85.5   86.7   72.2
N = 100    97.8   89.5   99.0   98.4   99.0   96.4
N = 200   100.0   98.8  100.0  100.0  100.0  100.0
T = 30
N = 10     37.5    3.7   45.8   43.8   45.1   25.4
N = 20     67.7   19.6   74.3   71.9   73.3   54.3
N = 30     85.4   43.0   88.6   86.6   87.4   74.4
N = 50     97.3   76.2   98.0   97.2   97.7   93.8
N = 100   100.0   95.9  100.0  100.0  100.0   99.9
N = 200   100.0   99.4  100.0  100.0  100.0  100.0

Notes: Rejection frequencies (in %) for K = 3 under the null (panel A) and the alternative hypothesis (panel B) when ε_it is drawn from a t-distribution with five degrees of freedom. Nominal size is 5%.
Table 4: Size and power in a model with autocorrelated errors

A) Size: ρ = 0.2
           ∆̃adj   CLM     LM    LM*  LMreg   LMac
T = 10
N = 10      5.3    7.4    4.8    4.8    3.9    5.5
N = 50      9.0    4.6   12.1   13.1    5.3    4.3
N = 100    15.9    6.0   17.1   21.0    7.6    5.0
T = 20
N = 10      7.6    6.8    6.4    6.7    4.4    4.5
N = 50     16.3    4.9   15.1   15.7    6.3    4.9
N = 100    34.7    9.7   23.8   25.7   11.5    4.4
T = 30
N = 10      7.3    6.6    6.7    6.9    5.1    4.1
N = 50     27.9    5.2   16.1   16.9    6.3    3.6
N = 100    46.7   12.7   25.6   27.6   12.4    4.1

B) Size: ρ = 0.5
T = 10
N = 10      6.5    3.4    9.3   11.0    3.6    7.7
N = 50     48.2   14.3   52.6   63.5   22.3    4.8
N = 100    68.6   25.9   67.7   79.8   30.9    4.8
T = 20
N = 10     14.3    3.0    8.8    9.6    4.4    4.7
N = 50     88.9   32.8   65.1   71.4   39.0    4.3
N = 100    99.4   74.8   90.5   94.2   73.8    4.6
T = 30
N = 10     32.9    4.1   24.6   25.6    9.6    4.7
N = 50     94.7   36.7   66.6   71.1   42.0    4.5
N = 100    99.9   81.1   92.1   94.6   77.2    4.5

C) Power: ρ = 0
T = 10
N = 10      5.4    8.1    8.3    8.1    4.3    7.7
N = 50     10.5    7.0   46.0   43.6   10.5   34.2
N = 100    16.9   23.2   69.1   65.2   22.3   54.5
T = 20
N = 10     11.6    3.7   26.7   26.4    9.6   17.9
N = 50     59.3   34.6   89.6   88.7   60.2   82.4
N = 100    85.9   80.0   99.0   99.0   90.5   97.9
T = 30
N = 10     14.4    4.6   28.3   27.7   10.3   18.2
N = 50     88.6   63.9   97.5   97.4   85.8   94.9
N = 100    99.7   95.7  100.0  100.0   99.9  100.0

Notes: Panels A) and B) present rejection frequencies (in %) for K = 3 under serial correlation of the errors (ρ = 0.2 and ρ = 0.5, respectively) and the null hypothesis. Panel C) presents rejection frequencies (in %) under the alternative hypothesis without serial correlation in the errors. Nominal size is 5%.
C  Appendix: Figures

Figure 1: Asymptotic local power of the LM (solid line) and the ∆ test (dashed line) when σ²_{i,x} ∼ χ²_1.

Figure 2: Asymptotic local power of the LM (solid line) and the ∆ test (dashed line) when σ²_{i,x} ∼ χ²_2.