LM-type tests for slope homogeneity in panel data models∗

Jörg Breitung† (University of Cologne), Christoph Roling‡ (Deutsche Bundesbank), Nazarii Salish (BGSE, University of Bonn)

July 14, 2016

Abstract. This paper employs the Lagrange Multiplier (LM) principle to test parameter homogeneity across cross-section units in panel data models. The test can be seen as a generalization of the Breusch-Pagan test against random individual effects to all regression coefficients. While the original test procedure assumes a likelihood framework under normality, several useful variants of the LM test are presented to allow for non-normality, heteroskedasticity and serially correlated errors. Moreover, the tests can be conveniently computed via simple artificial regressions. We derive the limiting distribution of the LM test and show that, if the errors are not normally distributed, the original LM test is asymptotically valid when the number of time periods tends to infinity. A simple modification of the score statistic yields an LM test that is robust to non-normality when the number of time periods is fixed. Further adjustments provide versions of the LM test that are robust to heteroskedasticity and serial correlation. We compare the local power of our tests with that of the statistic proposed by Pesaran and Yamagata (2008). The results of our Monte Carlo experiments suggest that the LM-type tests can be substantially more powerful, in particular when the number of time periods is small.

JEL classification: C12; C33

Keywords: panel data model; random coefficients; LM test; heterogeneous coefficients

∗ We are grateful to the editor Michael Jansson and an anonymous referee for helpful and constructive comments on an earlier version of this paper.
† Corresponding author: University of Cologne, Institute of Econometrics, 50923 Cologne, Germany. Email: [email protected], [email protected], [email protected].
‡ The views expressed in this paper do not necessarily reflect the views of Deutsche Bundesbank or its staff.

1 Introduction

In classical panel data analysis it is assumed that unobserved heterogeneity is captured by individual-specific constants, whether they are assumed to be fixed or random. In many applications, however, it cannot be ruled out that slope coefficients are also individual-specific. For instance, heterogeneous preferences among individuals may result in individual-specific price or income elasticities. Ignoring this form of heterogeneity may result in biased estimation and inference. It is therefore important to test the assumption of slope homogeneity before applying standard panel data techniques such as the least-squares dummy-variable (LSDV) estimator for the fixed-effects panel data model.

If there is evidence for individual-specific slope parameters, economists are interested in estimating a population average such as the mean of the individual-specific coefficients. Pesaran and Smith (1995) advocate mean group estimation, where in a first step the model is estimated separately for each cross-section unit. In a second step, the unit-specific estimates are averaged to obtain an estimator for the population mean of the parameters. Alternatively, Swamy (1970) proposes a generalized least squares (GLS) estimator for the random coefficients model, which assumes that the individual regression coefficients are randomly distributed around a common mean.

In this paper we derive a test for slope homogeneity by employing the LM principle within a random coefficients framework, which allows us to formulate the null hypothesis of slope homogeneity in terms of $K$ restrictions on the variance parameters. Hence, the LM approach substantially reduces the number of restrictions to be tested compared to the set of $K(N-1)$ linear restrictions on the coefficients implied by the test proposed by Pesaran and Yamagata (2008), henceforth referred to as PY.
This does not mean, however, that our test is confined to detecting random deviations in the coefficients. In fact, our test is optimal against the alternative of random coefficients, but it is also powerful against any systematic variation of the regression coefficients.

Our approach is related but not identical to the conditional LM test recently suggested by Juhl and Lugovskyy (2014), which is referred to as the JL test. The main difference is that the latter test is derived for a more restrictive alternative, where it is assumed that the individual-specific slope coefficients attached to the $K$ regressors have identical variances. In contrast, our test focuses on the alternative that the coefficients have different variances, which allows us to test for heterogeneity in a subset of the regression coefficients. Furthermore, the derivation of our test follows the original LM principle involving the information matrix, whereas the JL test employs the outer product of the scores as an estimator of the information matrix. Our simulation study suggests that both non-standard features of the latter test may result in size distortions in small samples and a sizable loss in power. An important advantage of the JL test, however, is that it is robust against non-Gaussian and heteroskedastic errors. We therefore propose variants of the original LM test that share the robustness against non-Gaussian and heteroskedastic errors. Furthermore, we also suggest a modified LM test that is robust to serially correlated errors.

Another contribution of the paper is the analysis of the local power of the test, which allows us to compare the power properties of the LM and PY tests. Specifically, we find that the location parameter of the LM test depends on the cross-section dispersion of the regressor variances, whereas the location parameter of the PY test only depends on the mean of the regressor variances.
Thus, if the regressor variances differ across the panel groups, the gain in power from using the LM test may be substantial.

The outline of the paper is as follows. In Section 2 we compare two tests for slope heterogeneity recently proposed in the literature. We introduce the random coefficients model in Section 3 and lay out the (standard) assumptions for analyzing the large-sample properties. In Section 4 we derive the LM statistic and establish its asymptotic distribution. Section 5 discusses several variants of the proposed test. First, we relax the normality assumption and extend the results of the previous section to this more general setting. Second, we propose a regression-based version of the LM test. Section 6 investigates the local asymptotic power of the LM test. Section 7 describes the design of our Monte Carlo experiments and discusses the results. Section 8 concludes.

2 Existing tests

To prepare the theoretical discussion in the following sections, we briefly review the random coefficients model and existing tests. Following Swamy (1970), consider a linear panel data model
\[
y_{it} = x_{it}'\beta_i + \varepsilon_{it}, \qquad (1)
\]
for $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$, where $y_{it}$ is the dependent variable for unit $i$ at time period $t$, $x_{it}$ is a $K \times 1$ vector of explanatory variables and $\varepsilon_{it}$ is an idiosyncratic error with zero mean and variance $E(\varepsilon_{it}^2) = \sigma_i^2$. For the slope coefficient $\beta_i$ we assume $\beta_i = \beta + v_i$, where $\beta$ is a fixed $K \times 1$ vector and $v_i$ is an i.i.d. random vector with zero mean and $K \times K$ covariance matrix $\Sigma_v$.¹

¹ For more details and extensions of the basic random coefficients model see Hsiao and Pesaran (2008). As pointed out by a referee, this specification may be replaced by some systematic variation of the coefficients that depends on observed variables. For example, we may specify the deviations as $\beta_i - \beta = \Gamma z_i + \eta_i$, where $z_i$ is some vector of observed variables possibly correlated with $x_{it}$.
The corresponding variant of the LM test (which is different from our LM test based on assuming that $v_i$ and $x_{it}$ are independent) will be optimal against this particular form of systematic variation. In general, our test assuming independent variation with $\Gamma = 0$ will also have power against systematic variations, but admittedly our test is not optimal against alternatives with systematically varying coefficients.

The null hypothesis of slope homogeneity is
\[
\beta_1 = \beta_2 = \cdots = \beta_N = \beta, \qquad (2)
\]
which is equivalent to testing $\Sigma_v = 0$. To test hypothesis (2), Swamy suggests the statistic
\[
\hat S^* = \sum_{i=1}^N \big(\hat\beta_i - \hat\beta_{\mathrm{WLS}}\big)' \frac{X_i' X_i}{s_i^2} \big(\hat\beta_i - \hat\beta_{\mathrm{WLS}}\big),
\]
with $X_i = (x_{i1}, \ldots, x_{iT})'$, where $\hat\beta_i = (X_i'X_i)^{-1} X_i' y_i$ is the ordinary least squares (OLS) estimator of (1) for panel unit $i$. The common slope parameter $\beta$ is estimated by the weighted least-squares estimator
\[
\hat\beta_{\mathrm{WLS}} = \left( \sum_{i=1}^N \frac{X_i' X_i}{s_i^2} \right)^{-1} \left( \sum_{i=1}^N \frac{X_i' y_i}{s_i^2} \right),
\]
where $s_i^2$ denotes the standard OLS estimator of $\sigma_i^2$. Intuitively, if the regression coefficients are identical, the differences between the individual estimators and the pooled estimator should be small. Therefore, Swamy's test rejects the null hypothesis of homogeneous slopes for large values of this statistic, which possesses a limiting $\chi^2$ distribution with $K(N-1)$ degrees of freedom as $N$ is fixed and $T \to \infty$.

Pesaran and Yamagata (2008) emphasize that in many empirical applications $N$ is large relative to $T$ and the approximation by a $\chi^2$ distribution is unreliable. PY adapt the test to a setting in which $N$ and $T$ jointly tend to infinity. In particular, they assume individual-specific intercepts and derive a test for the hypothesis $\beta_1 = \cdots = \beta_N = \beta$ in
\[
y_{it} = \alpha_i + x_{it}'\beta_i + \varepsilon_{it}. \qquad (3)
\]
The analogue of the pooled weighted least squares estimator above eliminates the unobserved fixed effects,
\[
\hat\beta_{\mathrm{WFE}} = \left( \sum_{i=1}^N \frac{X_i' M_\iota X_i}{\hat\sigma_i^2} \right)^{-1} \left( \sum_{i=1}^N \frac{X_i' M_\iota y_i}{\hat\sigma_i^2} \right),
\]
where $M_\iota = I_T - \iota_T \iota_T'/T$ and $\iota_T$ is a $T \times 1$ vector of ones. A natural estimator for $\sigma_i^2$ is
\[
\hat\sigma_i^2 = \frac{\big(y_i - X_i \hat\beta_i\big)' M_\iota \big(y_i - X_i \hat\beta_i\big)}{T - K - 1},
\]
where $\hat\beta_i = (X_i' M_\iota X_i)^{-1} (X_i' M_\iota y_i)$, and the test statistic becomes
\[
\hat S = \sum_{i=1}^N \big(\hat\beta_i - \hat\beta_{\mathrm{WFE}}\big)' \frac{X_i' M_\iota X_i}{\hat\sigma_i^2} \big(\hat\beta_i - \hat\beta_{\mathrm{WFE}}\big).
\]
Employing a joint limit theory for $N$ and $T$, PY obtain the limiting distribution as
\[
\hat\Delta = \frac{\hat S - NK}{\sqrt{2NK}} \overset{d}{\to} N(0,1), \qquad (4)
\]
provided that $N \to \infty$, $T \to \infty$ and $\sqrt{N}/T \to 0$. Thus, by appropriately centering and standardizing the test statistic, inference can be carried out by resorting to the standard normal distribution, provided the time dimension is sufficiently large relative to the cross-section dimension. PY propose several modified versions of this test, which for brevity we shall refer to as the $\Delta$ tests or statistics. In particular, to improve the small-sample properties of the test under normally distributed errors, PY suggest the adjusted statistic (see Remark 2 in PY)
\[
\tilde\Delta_{\mathrm{adj}} = \sqrt{N}\left( \frac{N^{-1}\tilde S - K}{\sqrt{2K(T-K-1)/(T+1)}} \right), \qquad (5)
\]
where $\tilde S$ is computed as $\hat S$ but replacing $\hat\sigma_i^2$ by the variance estimator
\[
\tilde\sigma_i^2 = \frac{\big(y_i - X_i \tilde\beta_{\mathrm{FE}}\big)' M_\iota \big(y_i - X_i \tilde\beta_{\mathrm{FE}}\big)}{T-1}, \qquad (6)
\]
where $\tilde\beta_{\mathrm{FE}} = \left( \sum_{i=1}^N X_i' M_\iota X_i \right)^{-1} \sum_{i=1}^N X_i' M_\iota y_i$ is the standard 'fixed effects' (within-group) estimator.

Note that this asymptotic framework does not seem to be well suited for typical panel data applications where $N$ is large relative to $T$. Therefore, it will be of interest to derive a test statistic that is valid when $T$ is small (say $T = 10$) and $N$ is very large (say $N = 1000$), as encountered, for instance, in microeconomic panels.

The test statistic proposed by Juhl and Lugovskyy (2014) is based on the individual scores
\[
S_i = \hat u_i' M_\iota X_i X_i' M_\iota \hat u_i - \hat\sigma_i^2 \operatorname{tr}(X_i' M_\iota X_i),
\]
where $\hat u_i = y_i - X_i \tilde\beta_{\mathrm{FE}}$ and $\operatorname{tr}(A)$ denotes the trace of the matrix $A$. The (conditional) LM statistic results as
\[
\mathrm{CLM} = \left( \sum_{i=1}^N S_i \right)' \left( \sum_{i=1}^N S_i S_i' \right)^{-1} \left( \sum_{i=1}^N S_i \right). \qquad (7)
\]
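For concreteness, the adjusted statistic $\tilde\Delta_{\mathrm{adj}}$ in (5) can be computed along the following lines. This is a minimal illustrative sketch, not the authors' code; the function and variable names are ours, and for simplicity the simulated example sets the individual intercepts $\alpha_i$ to zero.

```python
import numpy as np

def delta_adj(y, X):
    """Adjusted dispersion statistic (5) of Pesaran and Yamagata (2008).
    y: (N, T) array, X: (N, T, K) array. Illustrative sketch only."""
    N, T, K = X.shape
    M = np.eye(T) - np.ones((T, T)) / T              # M_iota = I_T - ii'/T

    # within-group ('fixed effects') estimator beta_tilde_FE used in (6)
    A = sum(X[i].T @ M @ X[i] for i in range(N))
    b = sum(X[i].T @ M @ y[i] for i in range(N))
    beta_fe = np.linalg.solve(A, b)

    # unit-specific OLS estimators and the variance estimators (6)
    betas, s2 = [], []
    for i in range(N):
        betas.append(np.linalg.solve(X[i].T @ M @ X[i], X[i].T @ M @ y[i]))
        e = M @ (y[i] - X[i] @ beta_fe)
        s2.append(e @ e / (T - 1))                   # sigma_tilde_i^2

    # weighted fixed-effects estimator beta_hat_WFE
    A = sum(X[i].T @ M @ X[i] / s2[i] for i in range(N))
    b = sum(X[i].T @ M @ y[i] / s2[i] for i in range(N))
    beta_wfe = np.linalg.solve(A, b)

    # dispersion statistic S_tilde and the standardization in (5)
    S = sum((betas[i] - beta_wfe) @ (X[i].T @ M @ X[i] / s2[i])
            @ (betas[i] - beta_wfe) for i in range(N))
    return np.sqrt(N) * (S / N - K) / np.sqrt(2 * K * (T - K - 1) / (T + 1))

# small simulated example with homogeneous slopes
rng = np.random.default_rng(0)
N, T, K = 50, 20, 2
X = rng.normal(size=(N, T, K))
y = X @ np.array([1.0, -0.5]) + rng.normal(size=(N, T))
stat = delta_adj(y, X)
```

Under the null of homogeneous slopes the statistic is approximately standard normal; large positive values indicate slope heterogeneity.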
It is interesting to compare this test statistic to the PY test, which is based on the sum $\hat S = \sum_{i=1}^N \hat S_i$ with
\[
\hat S_i = \big(\hat\beta_i - \hat\beta_{\mathrm{WFE}}\big)' \frac{X_i' M_\iota X_i}{\hat\sigma_i^2} \big(\hat\beta_i - \hat\beta_{\mathrm{WFE}}\big)
= \frac{1}{\sigma_i^2}\, u_i' M_\iota X_i (X_i' M_\iota X_i)^{-1} X_i' M_\iota u_i + o_p(1)
\]
if $N$ and $T$ tend to infinity. Note that $\lim_{N\to\infty} E(\hat S_i) = K$. The main difference between the JL and the PY statistics is that the statistic $S_i$ neglects the additional inverse $(\sigma_i^2 X_i' M_\iota X_i)^{-1}$ in the statistic $\hat S_i$. Thus, although these two test statistics are derived from different statistical principles, the final test statistics are essentially testing the independence of $u_i$ and $M_\iota X_i$, or
\[
E(u_i' M_\iota X_i W_i X_i' M_\iota u_i) = \sigma_i^2\, E\big(\operatorname{tr}[M_\iota X_i W_i X_i' M_\iota]\big),
\]
with $W_i = I_K$ for the JL test and $W_i = (\sigma_i^2 X_i' M_\iota X_i)^{-1}$ for the PY test.

3 Model and Assumptions

Consider a linear panel data model with random coefficients,
\[
y_i = X_i \beta_i + \varepsilon_i, \qquad (8)
\]
\[
\beta_i = \beta + v_i, \qquad (9)
\]
for $i = 1, 2, \ldots, N$, where $y_i$ is a $T \times 1$ vector of observations on the dependent variable for cross-section unit $i$, and $X_i$ is a $T \times K$ matrix of possibly stochastic regressors. To simplify the exposition we assume a balanced panel with the same number of observations in each panel unit (see also Remark 1 of Lemma 1). The vector of random coefficients is decomposed into a common non-stochastic vector $\beta$ and a vector of individual-specific disturbances $v_i$. Let $X = [X_1', X_2', \ldots, X_N']'$. In order to construct the LM test statistic for slope homogeneity we start with model (8)-(9) under stylized assumptions. However, in Section 5 these assumptions will be relaxed to accommodate more general and empirically relevant setups. The following assumptions are imposed on the errors and the regressor matrix:

Assumption 1 The error vectors are distributed as $\varepsilon_i | X \sim \text{i.i.d. } N(0, \sigma^2 I_T)$ and $v_i | X \sim \text{i.i.d. } N(0, \Sigma_v)$, where $\Sigma_v = \operatorname{diag}\big(\sigma_{v,1}^2, \ldots, \sigma_{v,K}^2\big)$. The errors $\varepsilon_i$ and $v_j$ are independent of each other for all $i$ and $j$.
Assumption 2 For the regressors we assume $E|x_{it,k}|^{4+\delta} < C < \infty$ for some $\delta > 0$, for all $i = 1, 2, \ldots, N$, $t = 1, 2, \ldots, T$ and $k = 1, 2, \ldots, K$. The limiting matrix $\lim_{N\to\infty} N^{-1} E(X'X)$ exists and is positive definite for all $T$.

In Assumption 1, the random components of the slope parameters are allowed to have different variances, but we assume that there is no correlation among the elements of $v_i$. Note that this framework is more general than the one considered by Juhl and Lugovskyy (2014), who assume $E(v_i v_i') = \sigma_v^2 I_K$. The latter assumption seems less appealing if there are sizable differences in the magnitudes of the coefficients. Furthermore, the power of the JL test depends on the scaling of the regressors, whereas the (local) power of our test is invariant to a rescaling of the regressors (see Theorem 5). The alternative hypothesis can be further generalized by allowing for correlation among the elements of the error vector $v_i$. However, this would increase the dimension of the null hypothesis to $K(K+1)/2$ restrictions and it is therefore not clear whether accounting for the covariances helps to increase the power of the test. Obviously, if all variances are zero, then the covariances are zero as well.²

Let $u_i = X_i v_i + \varepsilon_i$. Stacking observations with respect to $i$ yields
\[
y = X\beta + u, \qquad (10)
\]
where $y = (y_1', \ldots, y_N')'$ and $u = (u_1', \ldots, u_N')'$. The $NT \times NT$ covariance matrix of $u$ is given by
\[
\Omega \equiv E[u u' \mid X] = \begin{pmatrix} X_1 \Sigma_v X_1' + \sigma^2 I_T & & 0 \\ & \ddots & \\ 0 & & X_N \Sigma_v X_N' + \sigma^2 I_T \end{pmatrix}.
\]
The hypothesis of fixed homogeneous slope coefficients, $\beta_i = \beta$ for all $i$, corresponds to testing
\[
H_0: \ \sigma_{v,k}^2 = 0 \quad \text{for } k = 1, \ldots, K,
\]
against the alternative
\[
H_1: \ \sum_{k=1}^K \sigma_{v,k}^2 > 0, \qquad (11)
\]
that is, under the alternative at least one of the variance parameters is larger than zero.

4 The LM Test for Slope Homogeneity

Let $\theta = \big(\sigma_{v,1}^2, \ldots, \sigma_{v,K}^2, \sigma^2\big)'$.
Under Assumption 1 the corresponding log-likelihood function results as
\[
\ell(\beta, \theta) = -\frac{NT}{2}\log(2\pi) - \frac{1}{2}\log|\Omega(\theta)| - \frac{1}{2}(y - X\beta)'\,\Omega(\theta)^{-1}(y - X\beta). \qquad (12)
\]
² We also conducted Monte Carlo simulations allowing for non-zero off-diagonal elements in the matrix $\Sigma_v$. We found that the results are quite similar to the setting where $\Sigma_v$ is diagonal.

The restricted ML estimator of $\beta$ under the null hypothesis coincides with the pooled OLS estimator $\tilde\beta = (X'X)^{-1}X'y$; the corresponding residual vector and estimated residual variance are denoted by $\tilde u_i = y_i - X_i\tilde\beta$ and $\tilde\sigma^2$. The following lemma presents the score and the information matrix derived from the log-likelihood function in (12).

Lemma 1 The score vector evaluated under the null hypothesis is given by
\[
\tilde S \equiv \left.\frac{\partial \ell}{\partial \theta}\right|_{H_0}
= \frac{1}{2\tilde\sigma^4}
\begin{pmatrix}
\sum_{i=1}^N \big( \tilde u_i' X_i^{(1)} X_i^{(1)\prime} \tilde u_i - \tilde\sigma^2 X_i^{(1)\prime} X_i^{(1)} \big) \\
\vdots \\
\sum_{i=1}^N \big( \tilde u_i' X_i^{(K)} X_i^{(K)\prime} \tilde u_i - \tilde\sigma^2 X_i^{(K)\prime} X_i^{(K)} \big) \\
0
\end{pmatrix}, \qquad (13)
\]
where $X^{(k)}$ is the $k$-th column of $X$ for $k = 1, 2, \ldots, K$. The information matrix evaluated under the null hypothesis is
\[
\mathcal I(\tilde\sigma^2) \equiv -E\left[\left.\frac{\partial^2 \ell}{\partial\theta\,\partial\theta'}\right|_{H_0}\right]
= \frac{1}{2\tilde\sigma^4}
\begin{pmatrix}
\sum_{i=1}^N \big(X_i^{(1)\prime} X_i^{(1)}\big)^2 & \cdots & \sum_{i=1}^N \big(X_i^{(1)\prime} X_i^{(K)}\big)^2 & X^{(1)\prime} X^{(1)} \\
\sum_{i=1}^N \big(X_i^{(2)\prime} X_i^{(1)}\big)^2 & \cdots & \vdots & X^{(2)\prime} X^{(2)} \\
\vdots & \ddots & \vdots & \vdots \\
\sum_{i=1}^N \big(X_i^{(K)\prime} X_i^{(1)}\big)^2 & \cdots & \sum_{i=1}^N \big(X_i^{(K)\prime} X_i^{(K)}\big)^2 & X^{(K)\prime} X^{(K)} \\
X^{(1)\prime} X^{(1)} & \cdots & X^{(K)\prime} X^{(K)} & NT
\end{pmatrix}, \qquad (14)
\]
where $X_i^{(k)}$ denotes the $k$-th column of the $T \times K$ matrix $X_i$, $k = 1, 2, \ldots, K$ and $i = 1, \ldots, N$.

Remark 1 It is straightforward to extend Lemma 1 to unbalanced panel data, where observations are assumed to be missing at random. Let $X_i$ be a $T_i \times K$ matrix and $\tilde u_i$ a conformable $T_i \times 1$ vector. The score vector is given by
\[
\tilde S = \frac{1}{2\tilde\sigma^4}
\begin{pmatrix}
\sum_{i=1}^N \big( \tilde u_i' X_i^{(1)} X_i^{(1)\prime} \tilde u_i - \tilde\sigma^2 X_i^{(1)\prime} X_i^{(1)} \big) \\
\vdots \\
\sum_{i=1}^N \big( \tilde u_i' X_i^{(K)} X_i^{(K)\prime} \tilde u_i - \tilde\sigma^2 X_i^{(K)\prime} X_i^{(K)} \big) \\
0
\end{pmatrix},
\]
where
\[
\tilde\sigma^2 = \frac{1}{\sum_{i=1}^N T_i} \sum_{i=1}^N \tilde u_i' \tilde u_i.
\]
The information matrix is computed accordingly.
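To illustrate Lemma 1, the first $K$ elements of the score (13) can be computed directly from the pooled OLS residuals. The sketch below is ours (not the authors' code), assuming the balanced model (8)-(9) without individual effects:

```python
import numpy as np

def lm_score(y, X):
    """First K elements of the score vector (13), evaluated at the pooled
    OLS estimates under H0. y: (N, T), X: (N, T, K). Illustrative sketch."""
    N, T, K = X.shape
    Xs, ys = X.reshape(N * T, K), y.reshape(N * T)
    beta = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)   # restricted ML = pooled OLS
    u = (ys - Xs @ beta).reshape(N, T)             # residuals u_tilde_i
    s2 = np.mean(u ** 2)                           # sigma_tilde^2

    s = np.empty(K)
    for k in range(K):
        xk = X[:, :, k]                            # k-th regressor, all units
        # u_i' X^(k) X^(k)' u_i = (X^(k)' u_i)^2  and  X^(k)' X^(k) = sum_t x^2
        s[k] = np.sum((u * xk).sum(axis=1) ** 2 - s2 * (xk ** 2).sum(axis=1))
    return s / (2 * s2 ** 2)

# simulated example under the null of homogeneous slopes
rng = np.random.default_rng(1)
N, T, K = 200, 10, 2
X = rng.normal(size=(N, T, K))
y = X @ np.array([0.5, 1.0]) + rng.normal(size=(N, T))
score = lm_score(y, X)
```

Under $H_0$ each element of the score is centered at zero; the final element of $\tilde S$ in (13), corresponding to $\sigma^2$, is identically zero at the restricted estimates and is omitted here.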
Remark 2 If individual-specific constants $\alpha_i$ are included in the regression, then a conditional version of the test is available (cf. Juhl and Lugovskyy (2014)). The individual effects can be "conditioned out" by considering the transformed regression
\[
M_\iota y_i = M_\iota X_i \beta + M_\iota u_i, \qquad (15)
\]
with $M_\iota$ as defined in Section 2. The typical elements of the corresponding score vector result as
\[
\frac{1}{\tilde\sigma^4}\big( \tilde u_i' M_\iota X_i^{(j)} X_i^{(j)\prime} M_\iota \tilde u_i - \tilde\sigma^2 X_i^{(j)\prime} M_\iota X_i^{(j)} \big), \qquad j = 1, \ldots, K,
\]
where $\tilde u_i = M_\iota y_i - M_\iota X_i \tilde\beta$, $\tilde\beta$ is the pooled OLS estimator of the transformed model (15), and $\tilde\sigma^2$ is the corresponding estimated residual variance. It follows that we just have to replace the vector $X_i^{(j)}$ by the mean-adjusted vector $M_\iota X_i^{(j)}$ in Theorem 1.

Remark 3 It is easy to see that under the more restrictive alternative $E(v_i v_i') = \sigma_v^2 I_K$ of Juhl and Lugovskyy (2014), where $\sigma_{v,1}^2 = \cdots = \sigma_{v,K}^2 = \sigma_v^2$, the score is simply the sum of all elements of $\tilde S$.

Remark 4 Notice also that the LM-type statistics do not require the restriction $K < T$, which is important for the PY approach. This is of course not an issue in the asymptotic framework where $T \to \infty$; however, it can be a substantive restriction in many empirical applications where $T$ is small.

In the following theorem it is shown that when $T$ is fixed, the LM statistic possesses a $\chi^2$ limiting null distribution with $K$ degrees of freedom as $N \to \infty$.

Theorem 1 Under Assumptions 1, 2 and the null hypothesis,
\[
LM = \tilde S'\, \mathcal I(\tilde\sigma^2)^{-1}\, \tilde S = \tilde s'\, \tilde V^{-1} \tilde s \overset{d}{\to} \chi_K^2, \qquad (16)
\]
as $N \to \infty$ and $T$ is fixed, where $\tilde s$ is the $K \times 1$ vector with typical element
\[
\tilde s_k = \frac{1}{2\tilde\sigma^4} \sum_{i=1}^N \left( \sum_{t=1}^T \tilde u_{it}\, x_{it,k} \right)^2 - \frac{1}{2\tilde\sigma^2} \sum_{i=1}^N \sum_{t=1}^T x_{it,k}^2, \qquad (17)
\]
and the $(k,l)$ element of the matrix $\tilde V$ is given by
\[
\tilde V_{k,l} = \frac{1}{2\tilde\sigma^4} \left[ \sum_{i=1}^N \left( \sum_{t=1}^T x_{it,k}\, x_{it,l} \right)^2 - \frac{1}{NT} \left( \sum_{i=1}^N \sum_{t=1}^T x_{it,k}^2 \right) \left( \sum_{i=1}^N \sum_{t=1}^T x_{it,l}^2 \right) \right]. \qquad (18)
\]

Remark 5 If $T$ is fixed, normality of the regression disturbances is required.
If we relax the normality assumption, an additional term enters the variance of the score vector and the information matrix becomes an inconsistent estimator of this variance. Theorem 2 discusses this issue in more detail and derives the asymptotic distribution of the LM test if the errors are not normally distributed.

Remark 6 It may be of interest to restrict attention to a subset of coefficients. For example, in the classical panel data model it is assumed that the constants are individual-specific and, therefore, the respective parameters are not included in the null hypothesis. Another possibility is that a subset of coefficients is assumed to be constant across all panel units. To account for such specifications the model is partitioned as
\[
y_{it} = \beta_{1i}' X_{it}^a + \beta_2' X_{it}^b + \beta_{3i}' X_{it}^c + u_{it}.
\]
The $K_1 \times 1$ vector $X_{it}^a$ includes all regressors that are assumed to have individual-specific coefficients, stacked in the vector $\beta_{1i}$. The $K_2 \times 1$ vector $X_{it}^b$ comprises all regressors that are supposed to have homogeneous coefficients. The null hypothesis is that the coefficient vector $\beta_{3i}$ attached to the $K_3 \times 1$ vector of regressors $X_{it}^c$ is identical for all panel units, that is, $\beta_{3i} = \beta_3$ for all $i$, where $\beta_{3i} = \beta_3 + v_{3i}$. The null hypothesis implies $\Sigma_{v_3} = 0$. Let
\[
Z = \begin{pmatrix}
X_1^a & 0 & \cdots & 0 & X_1^b & X_1^c \\
0 & X_2^a & \cdots & 0 & X_2^b & X_2^c \\
\vdots & & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & X_N^a & X_N^b & X_N^c
\end{pmatrix},
\]
where $X_i^a = [X_{i1}^a, \ldots, X_{iT}^a]'$ and the matrices $X_i^b$ and $X_i^c$ are defined accordingly. The residuals are obtained as $\tilde u = (I - Z(Z'Z)^{-1}Z')y$ and the columns of the matrix $X^c$ are used to compute the LM statistic. Some caution is required if a set of individual-specific coefficients is included in the panel regression, since in this case the ML estimator $\tilde\sigma^2 = (NT)^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde u_{it}^2$ is inconsistent for fixed $T$ and $N \to \infty$. This implies that the expectation of the score vector (13) is different from zero. Accordingly, the unbiased estimator
\[
\hat\sigma^2 = \frac{1}{NT - NK_1 - K_2 - K_3} \sum_{i=1}^N \sum_{t=1}^T \tilde u_{it}^2 \qquad (19)
\]
As a special case, assume that the constant is included in Xic , whereas all other regressors are included in the matrix Xib , and Xia is dropped. This case is equivalent to the test for random individual effects as suggested by Breusch and Pagan (1980). The LM statistic then reduces to 2 NT u e0 (IN ⊗ ιT ι0T ) u e LM = 1− , 2 (T − 1) u e0 u e where ιT is a T × 1 vector of ones, which is identical to the familiar LM statistic for random individual effects. 5 Variants of the LM Test In this section we generalized the LM test statistic by allowing for non-normally distributed, heteroskedastic and serially dependent errors. First we show in Section 5 that the proposed LM test is robust against non-normally distributed errors once we assume N, T → ∞ jointly and specific restrictions on the existence of higher-order moments. Moreover, the variants of the test with non-normally distributed errors are proposed for the settings when N → ∞ and T is fixed. Second, in Section 5.2 we propose a variant of the LM test that is robust to heteroskedastic errors. Finally, Section 5.3 discusses how to robustify the LM test, when the errors are serially correlated. 5.1 The LM statistic under non-normality In this section we consider useful variants of the original LM statistic under the assumption that the errors are not normally distributed. Therefore, we replace Assumptions 1 and 2 by: Assumption 10 it is independently and identically distributed with E(it |X) = 0, E(2it |X) = σ 2 and E (|it |6 |X) < C < ∞ for all i and t. Furthermore, it and js are independently distributed for i 6= j and t 6= s. Assumption 2 0 For the regressors we assume E|xit,k |6 < C < ∞ for some δ > 0, for all T P i = 1, 2 . . . , N , t = 1, 2, . . . , T and k = 1, 2 . . . , K. Further, lim T −1 E [xit x0it ] tend to T →∞ a positive definite matrix Qi and the limiting matrix Q := t=1 lim (N T )−1 N,T →∞ N P T P E [xit x0it ] i=1 t=1 exists and is positive definite. 
Assumption 3 The error vector $v_i$ is independently and identically distributed with $E(v_i|X) = 0$ and $E(v_i v_i'|X) = \Sigma_v$, where $\Sigma_v = \operatorname{diag}\big(\sigma_{v,1}^2, \ldots, \sigma_{v,K}^2\big)$, and $E\big(|v_{ik}|^{2+\delta}\,\big|X\big) < C < \infty$ for some $\delta > 0$, for all $i$ and $k = 1, \ldots, K$. Further, $v_i$ and $\varepsilon_j$ are independent of each other for all $i$ and $j$.

Notice that, as in Section 3, under the null hypothesis $\Sigma_v = 0$, or $v_i = 0$ for all $i$. Hence, Assumption 3 is not required for the derivation of the asymptotic null distribution. To study the behaviour of the LM test statistic under (local) alternatives, Assumption 3 will be used in Section 6. With these modifications of the previous setup, the limiting distribution of the LM statistic is given in

Theorem 2 Under Assumptions 1′, 2′ and the null hypothesis,
\[
LM \overset{d}{\to} \chi_K^2, \qquad (20)
\]
as $N \to \infty$, $T \to \infty$ jointly.

Generalizing the model to allow for non-normally distributed errors introduces a new term into the variance of the score: the $(k,l)$ element of the covariance matrix now becomes (see equation (49) in Appendix A.2)
\[
V_{k,l} + \frac{\mu_u^{(4)} - 3\sigma^4}{(2\sigma^4)^2} \sum_{i=1}^N \sum_{t=1}^T \left( x_{it,k}^2 - \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T x_{it,k}^2 \right) \left( x_{it,l}^2 - \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T x_{it,l}^2 \right), \qquad (21)
\]
where $\mu_u^{(4)}$ denotes the fourth moment of the error distribution and $V_{k,l}$ is as in (18) with $\tilde\sigma^4$ replaced by $\sigma^4$. The additional term depends on the excess kurtosis $\mu_u^{(4)} - 3\sigma^4$. Clearly, for normally distributed errors this term disappears, but it deviates from zero in the more general setup. Under Assumptions 1′ and 2′, the first term $V_{k,l}$ is of order $NT^2$, while the new component is of order $NT$, such that, when the appropriate scaling underlying the LM statistic is adopted, it vanishes as $T \to \infty$. Therefore, the LM statistic as presented in the previous section continues to be $\chi_K^2$ distributed asymptotically. By incorporating a suitable estimator of the second term in (21), however, a test statistic becomes available that is valid in a framework with non-normally distributed errors as $N \to \infty$, whether $T$ is fixed or $T \to \infty$.
Therefore, denote the adjusted LM statistic by
\[
LM_{\mathrm{adj}} = \tilde s'\, \tilde V_{\mathrm{adj}}^{-1}\, \tilde s,
\]
where $\tilde V_{\mathrm{adj}}$ is as in (21) with $V_{k,l}$, $\sigma^4$ and $\mu_u^{(4)}$ replaced by the consistent estimators $\tilde V_{k,l}$ defined in (18), $\tilde\sigma^4$ and $\tilde\mu_u^{(4)} = (NT)^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde u_{it}^4$, for $k, l = 1, \ldots, K$. As a consequence of Theorem 2 and the preceding discussion, we obtain the following result.

Corollary 1 Under Assumptions 1′, 2 and the null hypothesis,
\[
LM_{\mathrm{adj}} \overset{d}{\to} \chi_K^2,
\]
as $N \to \infty$ and $T$ is fixed. Furthermore,
\[
LM_{\mathrm{adj}} - LM \overset{p}{\to} 0,
\]
as $N \to \infty$, $T \to \infty$ jointly.

As mentioned above, once the regression disturbances are no longer normally distributed, the fourth moments of the error distribution enter the variance of the score. It is insightful to identify exactly which terms give rise to this new form of the covariance matrix. According to Lemma 1, the contribution of the $i$-th panel unit to the $k$-th element of the score vector is
\[
\tilde u_i' X_i^{(k)} X_i^{(k)\prime} \tilde u_i - \tilde\sigma^2 X_i^{(k)\prime} X_i^{(k)}
= \sum_{t=1}^T x_{it,k}^2 \big( \tilde u_{it}^2 - \tilde\sigma^2 \big) + \sum_{t=1}^T \sum_{s \neq t} \tilde u_{it}\, \tilde u_{is}\, x_{it,k}\, x_{is,k}. \qquad (22)
\]
The variance of the first term on the right-hand side depends on the fourth moments of the errors. Since the contribution of this term vanishes as $T$ gets large, it can be dropped without any severe effect on the power whenever $T$ is sufficiently large. Hence, we consider a modified score vector as presented in the following theorem.

Theorem 3 Under Assumptions 1′, 2 and the null hypothesis, the modified LM statistic
\[
LM^* = \tilde s^{*\prime}\, \big(\tilde V^*\big)^{-1}\, \tilde s^* \overset{d}{\to} \chi_K^2,
\]
as $N \to \infty$ and $T$ fixed, where $\tilde s^*$ is the $K \times 1$ vector with contributions for panel unit $i$
\[
\tilde s_{i,k}^* = \frac{1}{\tilde\sigma^4} \sum_{t=2}^T \sum_{s=1}^{t-1} \tilde u_{it}\, \tilde u_{is}\, x_{it,k}\, x_{is,k}, \qquad (23)
\]
for $i = 1, \ldots, N$, $k = 1, \ldots, K$, and the $(k,l)$ element of $\tilde V^*$ is given by
\[
\tilde V_{k,l}^* = \frac{1}{\tilde\sigma^4} \sum_{i=1}^N \sum_{t=2}^T \sum_{s=1}^{t-1} x_{it,k}\, x_{it,l}\, x_{is,k}\, x_{is,l}, \qquad (24)
\]
for $k, l = 1, \ldots, K$.

Remark 7 It is important to note that this version of the LM test is invalid if the panel regression allows for individual-specific coefficients (cf. Remark 6).
Consider for example the regression
\[
y_{it} = \alpha_i + x_{it}'\beta_i + u_{it}, \qquad (25)
\]
where the $\alpha_i$ are fixed individual effects and we are interested in testing $H_0: \operatorname{var}(\beta_i) = 0$. The residuals are obtained as
\[
\tilde u_{it} = y_{it} - \bar y_i - (x_{it} - \bar x_i)'\tilde\beta = u_{it} - \bar u_i - (x_{it} - \bar x_i)'(\tilde\beta - \beta).
\]
It follows that in this case $E(\tilde u_{it}\,\tilde u_{is}\, x_{it,k}\, x_{is,k}) \neq 0$ and, therefore, the modified scores (23) result in a biased test. To sidestep this difficulty, orthogonal deviations (e.g. Arellano and Bover (1995)) can be employed to eliminate the individual-specific constants, yielding
\[
y_{it}^* = \beta' x_{it}^* + u_{it}^*, \qquad t = 2, 3, \ldots, T,
\]
with
\[
y_{it}^* = \sqrt{\frac{t-1}{t}} \left( y_{it} - \frac{1}{t-1} \sum_{s=1}^{t-1} y_{is} \right),
\]
where $x_{it}^*$ and $u_{it}^*$ are defined analogously. It is well known that if $u_{it}$ is i.i.d., then so is $u_{it}^*$. It follows that the modified LM statistic can be constructed by using the OLS residuals $\tilde u_{it}^*$ instead of $\tilde u_{it}$. This approach can be generalized to arbitrary individual-specific regressors $x_{it}^a$. Let $X_i^a = [x_{i1}^a, \ldots, x_{iT}^a]'$ denote the individual-specific $T \times K_1$ regressor matrix in the regression
\[
y_i = X_i^a \beta_{1i} + X_i^b \beta_2 + X_i^c \beta_{3i} + u_i, \qquad (26)
\]
(see Remark 6). Furthermore, let
\[
M_i^a = I_T - X_i^a (X_i^{a\prime} X_i^a)^{-1} X_i^{a\prime},
\]
and let $\widetilde M_i^a$ denote the $(T-K_1) \times T$ matrix that results from eliminating the last $K_1$ rows of $M_i^a$, such that $\widetilde M_i^a \widetilde M_i^{a\prime}$ is of full rank. The model (26) is transformed as
\[
y_i^* = X_i^{b*} \beta_2 + X_i^{c*} \beta_{3i} + u_i^*, \qquad (27)
\]
where $y_i^* = \Xi_i^a y_i$ and $\Xi_i^a = \big(\widetilde M_i^a \widetilde M_i^{a\prime}\big)^{-1/2} \widetilde M_i^a$. It is not difficult to see that $E(u_i^* u_i^{*\prime}) = \sigma^2 I_{T-K_1}$ and, thus, the modified scores (23) can be constructed by using the residuals of (27), where the time-series dimension reduces to $T - K_1$. Note that orthogonal deviations result from letting $X_i^a$ be a vector of ones.

To review the results of this section, the important new feature of the model without the normality assumption is that the fourth moments of the errors enter the variance of the score.
The information matrix of the original LM test derived under normality does not incorporate higher-order moments, but the test remains applicable as $T \to \infty$. To apply the LM test in the original framework when $T$ is fixed and the errors are no longer normal, we can proceed in two ways. A direct adjustment of the information matrix to account for higher-order moments yields a valid test. Alternatively, we can adjust the score itself and restrict attention to that part of the score that does not introduce higher-order moments into the variance. In the next section, we further pursue the second route of dealing with non-normality and thereby robustify the test against heteroskedasticity and serial correlation.

5.2 The regression-based LM statistic

In this section we offer a convenient way to compute the proposed LM statistic via a simple artificial regression. Moreover, the regression-based form of the LM test is shown to be robust against heteroskedastic errors. Following the decomposition of the score contribution in (22) and the discussion thereafter, we construct the "outer product of gradients" (OPG) variant of the LM test based on the second term in (22), rewriting the corresponding elements of the score contributions of panel unit $i$ as
\[
\tilde s_{i,k}^* = \sum_{t=2}^T \sum_{s=1}^{t-1} \tilde u_{it}\, \tilde u_{is}\, x_{it,k}\, x_{is,k}, \qquad (28)
\]
for $k = 1, \ldots, K$. Note that we dropped the factor $1/\tilde\sigma^4$, as this factor cancels out in the final test statistic. This gives the usual LM-OPG variant
\[
LM_{\mathrm{opg}} = \left( \sum_{i=1}^N \tilde s_i^* \right)' \left( \sum_{i=1}^N \tilde s_i^* \tilde s_i^{*\prime} \right)^{-1} \left( \sum_{i=1}^N \tilde s_i^* \right), \qquad (29)
\]
where $\tilde s_i^* = \big(\tilde s_{i,1}^*, \ldots, \tilde s_{i,K}^*\big)'$. An asymptotically equivalent form of the LM-OPG statistic can be formulated as a Wald-type test of the null hypothesis $\varphi = 0$ in the auxiliary regression
\[
\tilde u_{it} = \sum_{k=1}^K \tilde z_{it,k}\, \varphi_k + e_{it}, \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T, \qquad (30)
\]
where
\[
\tilde z_{it,k} = x_{it,k} \sum_{s=1}^{t-1} \tilde u_{is}\, x_{is,k},
\]
for $k = 1, \ldots, K$.
Therefore, with the Eicker-White heteroskedasticity-consistent variance estimator, the regression based test statistic results as LMreg = N X T X i=1 t=2 !0 u eit zeit N X T X i=1 t=2 !−1 u e2it zeit zeit0 T N X X ! u eit zeit , (31) i=1 t=2 It follows from the arguments similar as in Theorem 3 that Mreg test statistic is asymptotically χ2 distributed but it turns out to be robust against heteroskedasticity: 14 Corollary 2 Under Assumption 10 but allowing for heteroscedastic errors such that E[2it |X] = σit2 < C < ∞, Assumption 2 and the null hypothesis d LMreg → χ2K , (32) as N → ∞ and T is fixed. It is important to note that the LM-OPG variant cannot be applied to residuals from a fixed effect regression, see Remark 7. Furthermore, the replacement of the residuals by orthogonal forward deviation will not fix this problem since orthogonal forward deviations are no longer serially uncorrelated if the errors are heteroskedastic. Therefore, a version of the test is required that is robust against autocorrelated errors. 5.3 The LM statistic under serially dependent errors In this section we propose a variant of the LM test statistic that accommodates serially correlated errors, that is, we relax Assumptions 10 as follows: Assumption 100 The T × 1 error vector i is independently and identically distributed with E(i |X) = 0, E(i 0i |X) = E(i 0i ) = Σ and E |it |4+δ |X < C < ∞ for some δ > 0 and all i and t. The T × T matrix Σ is positive definite with typical element σts for t, s = 1, . . . , T . Note that Assumption 100 allows for heteroscedasticity and serial dependence across time, however, it restricts the error vector i to be iid across individuals. Under this assumption the expectation of the score vector (23) is under the null hypothesis E[uit uis xit,k , xis,k ] = σts E[xit,k xis,k ]. We therefore suggest a modification for autocorrelated errors based on the adjusted K × 1 PN ∗∗ score vector s̃∗∗ with typical element s̃∗∗ k = i=1 s̃i,k for k = 1, . . . 
The adjusted score vector $\tilde s^{**}$ has typical element $\tilde s^{**}_{k} = \sum_{i=1}^{N}\tilde s^{**}_{i,k}$ for $k = 1, \ldots, K$, where

$$\tilde s^{**}_{i,k} = \sum_{t=2}^{T}\sum_{s=1}^{t-1}\left(\tilde u_{it}\tilde u_{is} - \tilde\sigma_{ts}\right) x_{it,k}\, x_{is,k}, \qquad (33)$$

and $\tilde\sigma_{ts} = \frac{1}{N}\sum_{i=1}^{N}\tilde u_{it}\tilde u_{is}$. The asymptotic properties of the LM statistic based on the modified score vector are presented in

Theorem 4 Let $LM_{ac} = \tilde s^{**\prime}\left(\tilde V^{**}\right)^{-1}\tilde s^{**}$, where $\tilde V^{**}$ is a $K \times K$ matrix with typical element

$$\tilde V^{**}_{k,l} = \sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\sum_{\tau=2}^{T}\sum_{q=1}^{\tau-1}\hat\delta_{ts\tau q}\, x_{it,k}\, x_{is,k}\, x_{i\tau,l}\, x_{iq,l} \qquad (34)$$

and

$$\hat\delta_{ts\tau q} = \frac{1}{N}\sum_{j=1}^{N}\tilde u_{jt}\tilde u_{js}\tilde u_{j\tau}\tilde u_{jq} - \tilde\sigma_{ts}\tilde\sigma_{\tau q}.$$

Under Assumption 1$''$, Assumption 2, the null hypothesis (2) and as $N \to \infty$ with $T$ fixed, the $LM_{ac}$ statistic has a $\chi^2_K$ limiting distribution.

Note that this version of the test has good size control irrespective of serial dependence in the errors. However, the test involves some power loss relative to the original test statistics when the errors are serially uncorrelated, which is not surprising given the more general setup of this variant of the test. The respective asymptotic power results are analyzed in the next section (see Remark 10). Section 7 elaborates in detail on the size and power properties of $LM_{ac}$ in finite samples.

6 Local Power

The aim of this section is twofold. First, we investigate the distributions of the LM-type tests under suitable sequences of local alternatives. Two cases are of interest, $N \to \infty$ with $T$ fixed and $N, T \to \infty$ jointly, which are presented in the respective theorems below. Second, we adapt the results of PY to our model in order to compare the local asymptotic power of the two tests. To formulate an appropriate sequence of local alternatives, we specify the random coefficients in (9) in a setup in which $T$ is fixed. The error term $v_i$ is as in Assumption 1 with elements of $\Sigma_v$ given by

$$\sigma^{2}_{v,k} = \frac{c_k}{\sqrt{N}}, \qquad (35)$$

where $c_k > 0$ are fixed constants for $k = 1, \ldots, K$. The asymptotic distribution of the LM statistic results as follows.
Theorem 5 Under Assumptions 1, 2 and the sequence of local alternatives (35),

$$LM \overset{d}{\to} \chi^{2}_{K}(\mu),$$

as $N \to \infty$ and $T$ fixed, with non-centrality parameter $\mu = c'\Psi c$, where $c = (c_1, \ldots, c_K)'$ and $\Psi$ is a $K \times K$ matrix with $(k,l)$ element

$$\Psi_{k,l} = \frac{1}{2\sigma^{4}}\,\underset{N\to\infty}{\mathrm{plim}}\left[\frac{1}{N}\sum_{i=1}^{N}\left(\sum_{t=1}^{T}x_{it,k}x_{it,l}\right)^{2} - \frac{1}{T}\left(\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}x^{2}_{it,k}\right)\left(\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}x^{2}_{it,l}\right)\right].$$

In order to relax the assumption of normally distributed errors we adopt Assumption 1$'$ for $v_i$, where the sequence of local alternatives is now given by

$$\sigma^{2}_{v,k} = \frac{c_k}{T\sqrt{N}}, \qquad (36)$$

for $k = 1, \ldots, K$. Note that according to Theorem 2 we require $T \to \infty$.

Theorem 6 Under Assumptions 1$'$, 2$'$, 3 and the sequence of alternatives (36),

$$LM \overset{d}{\to} \chi^{2}_{K}(\mu),$$

as $N \to \infty$, $T \to \infty$, with non-centrality parameter $\mu = c'\Psi c$, where $c = (c_1, \ldots, c_K)'$ and $\Psi$ is a $K \times K$ matrix with $(k,l)$ element

$$\Psi_{k,l} = \frac{1}{2\sigma^{4}}\,\underset{N,T\to\infty}{\mathrm{plim}}\;\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{T}\sum_{t=1}^{T}x_{it,k}x_{it,l}\right)^{2}.$$

Remark 8 As in Section 5.1 above, when the normality assumption is relaxed, local power can be studied for $LM^{*}$ under Assumptions 1$'$, 2 and 3 when $T$ is fixed. The specification of local alternatives as in Theorem 5 applies. The non-centrality parameter of the limiting non-central $\chi^2$ distribution results as $\mu^{*} = c'\Psi^{*}c$ with

$$\Psi^{*}_{k,l} = \frac{1}{\sigma^{4}}\,\underset{N\to\infty}{\mathrm{plim}}\;\frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}x_{it,k}\,x_{it,l}\,x_{is,k}\,x_{is,l},$$

for $k, l = 1, \ldots, K$.

Remark 9 Given the results for the modified statistic $LM^{*}$ in Remark 8, and the fact that $\tilde s^{*} = \sum_{i=1}^{N}\tilde s^{*}_{i} = \tilde Z'\tilde u$, we expect a similar result to hold for the regression-based statistic $LM_{reg}$. Recall that $LM^{*}$ uses $N^{-1}\tilde V^{*}$ as an estimator of the variance of $\tilde s^{*}$ (see (24)), while $LM_{reg}$ employs $N^{-1}\sum_{i=1}^{N}\sum_{t=2}^{T}\tilde u^{2}_{it}\tilde z_{it}\tilde z_{it}'$. Under the null hypothesis, it is not difficult to see that these two estimators are asymptotically equivalent.
Under the alternative, when studying the $(k,l)$ element of the variance of $LM_{reg}$, we obtain (see Appendix A.2 for details)

$$\frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\tilde u^{2}_{it}\,\tilde z_{it,k}\tilde z_{it,l} = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\epsilon^{2}_{it}\,x_{it,k}x_{it,l}\left(\sum_{s=1}^{t-1}\epsilon_{is}x_{is,k}\right)\left(\sum_{s=1}^{t-1}\epsilon_{is}x_{is,l}\right) + \frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\epsilon^{2}_{it}\,v_i'B^{X}_{it}v_i + o_p(1), \qquad (37)$$

with the $K \times K$ matrix $B^{X}_{it} = x_{it,k}x_{it,l}\left(\sum_{s=1}^{t-1}x_{is}x_{is,k}\right)\left(\sum_{s=1}^{t-1}x_{is}'x_{is,l}\right)$. The first term on the right-hand side of (37) has the same probability limit as $N^{-1}\tilde V^{*}_{k,l}$, the limiting covariance matrix element $\Psi^{*}_{k,l}$. In contrast to $LM^{*}$, however, the variance estimator of the regression-based test involves additional quadratic forms such as $v_i'B^{X}_{it}v_i$, contributing to the estimator. Since, in a setup with fixed $T$ and the local alternatives $\sigma^{2}_{v,k} = c_k/\sqrt{N}$,

$$\frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\epsilon^{2}_{it}\,v_i'B^{X}_{it}v_i = O_p\left(N^{-1/2}\right),$$

the variance estimator remains consistent. In small samples, however, the additional term results in a bias of the variance estimator and may deteriorate the power of the regression-based test. See the appendix for details about the above result and the Monte Carlo experiments in Section 7.

Remark 10 The arguments of Remark 8 can be used to derive the local power of the $LM_{ac}$ statistic that accounts for serial correlation in the errors. The same specification of local alternatives applies. The non-centrality parameter of the limiting non-central $\chi^2$ distribution takes the quadratic form $\mu^{**} = c'\Psi^{**}c$ with

$$\Psi^{**}_{k,l} = \underset{N\to\infty}{\mathrm{plim}}\;\frac{1}{N}\sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\sum_{\tau=2}^{T}\sum_{q=1}^{\tau-1}\left(u_{it}u_{is}u_{i\tau}u_{iq} - \sigma_{ts}\sigma_{\tau q}\right)x_{it,k}\,x_{is,k}\,x_{i\tau,l}\,x_{iq,l},$$

for $k, l = 1, \ldots, K$. In the absence of serial correlation it can be shown that the $LM_{ac}$ test involves a loss of power. To illustrate this fact, assume for simplicity that $K = 1$ (single regressor case).
Further, the score vector in (33) can equivalently be written as

$$\hat s^{**} = \sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\left(\tilde u_{it}\tilde u_{is} - \tilde\sigma_{ts}\right)x_{it}x_{is} = \sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\tilde u_{it}\tilde u_{is}\left(x_{it}x_{is} - \bar C_{ts}\right), \qquad (38)$$

where $\bar C_{ts} = \frac{1}{N}\sum_{i=1}^{N}x_{it}x_{is}$. Thus, demeaning of $\tilde u_{it}\tilde u_{is}$ is equivalent to demeaning of $x_{it}x_{is}$. In the case of no autocorrelation it follows from (38) that $\Psi^{*} - \Psi^{**}$ is positive semi-definite. Therefore, the modification (33) tends to reduce the power of the $LM_{ac}$ test when compared to $LM^{*}$.

We now proceed to examine the local power of the $\Delta$ statistic of PY in model (8) and (9) under the sequence of local alternatives (36). In our homoskedastic setup, the dispersion statistic becomes

$$\tilde S = \sum_{i=1}^{N}\left(\tilde\beta_i - \tilde\beta\right)'\frac{X_i'X_i}{\tilde\sigma^{2}}\left(\tilde\beta_i - \tilde\beta\right),$$

with $\tilde\beta$ as the OLS estimator in (10) as above. Using this expression, the $\hat\Delta$ statistic is computed as in (4). The next theorem presents the asymptotic distribution of the $\hat\Delta$ statistic under the local alternatives as specified above. This result follows directly from Section 3.2 in PY.

Theorem 7 Under Assumptions 1$'$, 2$'$, 3 and the sequence of local alternatives (36),

$$\hat\Delta \overset{d}{\to} N(\lambda, 1),$$

as $N \to \infty$, $T \to \infty$, provided $\sqrt{N}/T \to 0$, where $\lambda = \Lambda'c/\sqrt{2K}$ and $\Lambda$ is a $K \times 1$ vector with typical element

$$\Lambda_k = \frac{1}{\sigma^{2}}\,\underset{N,T\to\infty}{\mathrm{plim}}\;\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}x^{2}_{it,k},$$

for $k = 1, \ldots, K$.

In Theorem 7, the mean of the limiting distribution of $\hat\Delta$ is slightly different from the result in Section 3.2 in PY. Here, $v_i$ is random and independently distributed from the regressors and, therefore, the second term of the respective expression in PY is zero.

Remark 11 Consider for simplicity a scalar regressor $x_{it}$ that is i.i.d. across $i$ and $t$ with uniformly bounded fourth moments. Let $E[x_{it}] = 0$ and $E[x^{2}_{it}] = \sigma^{2}_{i,x}$; that is, the regressor is assumed to have a unit-specific variance which is constant over time for a given unit. We obtain

$$E\left[\left(\frac{1}{T}\sum_{t=1}^{T}x^{2}_{it}\right)^{2}\right] = \sigma^{4}_{i,x} + O\left(T^{-1}\right),$$

implying $\mu = \left(c^{2}/2\sigma^{4}\right)\lim_{N\to\infty}N^{-1}\sum_{i=1}^{N}\sigma^{4}_{i,x}$ in Theorem 6.
To gain further insight, we think of $\sigma^{2}_{i,x}$ as being randomly distributed in the cross-section, such that the non-centrality parameter results as

$$\mu = \frac{c^{2}}{2\sigma^{4}}\,E\left[\sigma^{4}_{i,x}\right] = \frac{c^{2}}{2\sigma^{4}}\left(\mathrm{Var}\left(\sigma^{2}_{i,x}\right) + E\left[\sigma^{2}_{i,x}\right]^{2}\right). \qquad (39)$$

Similarly, under these assumptions, we find

$$\lambda = \frac{c}{\sigma^{2}\sqrt{2}}\,E\left[\sigma^{2}_{i,x}\right]. \qquad (40)$$

Comparing the mean of the normal distribution of the $\Delta$ statistic in (40) with the non-centrality parameter of the asymptotic $\chi^{2}_{1}$ distribution of the LM statistic in (39), we see that the main difference between the two tests is that the variance of $\sigma^{2}_{i,x}$ contributes to the power of the LM statistic but not to the power of the $\Delta$ test. If $\mathrm{Var}(\sigma^{2}_{i,x}) = 0$, such that $\sigma^{2}_{i,x} = \sigma^{2}_{x}$ for all $i$, the LM test and the $\Delta$ test have the same asymptotic power in this example. If, however, $\mathrm{Var}(\sigma^{2}_{i,x}) > 0$, so that there is variation in the variance of the regressor in the cross-section, the LM test has larger asymptotic power. To illustrate this point, we examine the local asymptotic power functions of the LM and the $\Delta$ test for two cases, using the expressions in (39) and (40). Figure 1 (see Appendix C) shows the local asymptotic power of the LM test (solid line) and the $\Delta$ test (dashed line) as a function of $c$ when $\sigma^{2}_{i,x}$ has a $\chi^{2}_{1}$ distribution. Figure 2 repeats this exercise for $\sigma^{2}_{i,x}$ drawn from a $\chi^{2}_{2}$ distribution. In both cases, the LM test has larger asymptotic power. The power gain is substantial in the first case, but diminishes in the second. This pattern is expected, as the variance of $\sigma^{2}_{i,x}$ contributes relatively more to the non-centrality parameter in the first specification. This discussion exemplifies the difference between the LM-type tests and the $\Delta$ statistic in terms of local asymptotic power in a simplified framework. The analysis suggests that the LM-type tests are particularly powerful in an empirically relevant setting in which there is non-negligible variation in the variances of the regressors between panel units.
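The comparison in (39) and (40) can be made concrete with a short numerical sketch, assuming $\sigma^2 = 1$, $K = 1$ and $\sigma^2_{i,x} \sim \chi^2_m$ (so $E[\sigma^2_{i,x}] = m$ and $\mathrm{Var}(\sigma^2_{i,x}) = 2m$); the function names are illustrative only.

```python
import numpy as np

def mu_lm(c, m, sigma2=1.0):
    """Non-centrality (39) when sigma^2_{i,x} ~ chi^2_m:
    E[sigma^4_{i,x}] = Var + mean^2 = 2m + m^2."""
    return c**2 * (2 * m + m**2) / (2 * sigma2**2)

def lam_delta(c, m, sigma2=1.0):
    """Mean shift (40): lambda = c * E[sigma^2_{i,x}] / (sigma^2 sqrt(2))."""
    return c * m / (sigma2 * np.sqrt(2))

# chi^2_1: the variance term accounts for 2/3 of E[sigma^4_{i,x}]
print(mu_lm(1.0, 1), lam_delta(1.0, 1))   # 1.5  0.7071...
# chi^2_2: the variance term accounts for only 1/2
print(mu_lm(1.0, 2), lam_delta(1.0, 2))   # 4.0  1.4142...
```

Consistent with the discussion above, the relative contribution of $\mathrm{Var}(\sigma^2_{i,x})$ to the non-centrality parameter is larger in the $\chi^2_1$ case.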
Having studied the large-sample properties of the LM tests under the null and the alternative hypothesis in our model, we now evaluate the finite-sample size and power properties of the LM-type tests in a Monte Carlo experiment.

7 Monte Carlo Experiments

7.1 Design

After deriving the LM-type tests in the random coefficients model, we now turn to the small-sample properties of the proposed test and its variants. The aim of this section is to evaluate the performance of the tests in terms of their empirical size and power in several different setups, relating to the theoretical discussion of Sections 4-6. We consider the following test statistics: the original LM statistic presented in Theorem 1, the adjusted LM statistic that adjusts the information matrix to account for fourth moments of the error distribution (see Corollary 1), the score-modified LM statistics (see Theorem 3 and Theorem 4) and the regression-based, heteroskedasticity-robust LM statistic (see Section 5.2). As a benchmark, we consider PY's statistic $\tilde\Delta_{adj}$ given in (5). Following the notes in Table 1 in PY, the test using $\tilde\Delta_{adj}$ is carried out as a two-sided test. In addition, the CLM test in (7) is included, which is also a two-sided test. We consider the following data-generating process with normally distributed errors as the standard design:

$$y_{it} = \alpha_i + x_{it}'\beta_i + \epsilon_{it}, \qquad \epsilon_{it} \overset{iid}{\sim} N(0,1), \qquad (41)$$

$$\alpha_i \overset{iid}{\sim} N(0, 0.25), \qquad x_{it,k} = \alpha_i + \vartheta^{x}_{it,k}, \quad k = 1, 2, 3, \qquad \vartheta^{x}_{it,k} \overset{iid}{\sim} N\left(0, \sigma^{2}_{ix,k}\right), \qquad \beta_i \overset{iid}{\sim} N_3\left(\iota_3, \Sigma_v\right),$$

with

$$\text{under the null hypothesis:}\quad \Sigma_v = 0, \qquad (42)$$

$$\text{under the alternative:}\quad \Sigma_v = \begin{pmatrix} 0.03 & 0 & 0\\ 0 & 0.02 & 0\\ 0 & 0 & 0.01 \end{pmatrix}, \qquad (43)$$

where $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$. Hence, to simulate the model under the null hypothesis the slope vector $\beta_i$ is set to the $3 \times 1$ vector of ones, $\iota_3$, for all $i$. As discussed in Section 6, the variances of the regressors play an important role.
In our benchmark specification we generate the variances as

$$\sigma^{2}_{ix,k} = 0.25 + \eta_{i,k}, \qquad \eta_{i,k} \overset{iid}{\sim} \chi^{2}_{1}. \qquad (44)$$

The choice of the $\chi^2$ distribution for $\sigma^{2}_{ix,k}$ is made analogous to the Monte Carlo experiment in PY. We then consider variations of this specification below. All results are based on 5,000 Monte Carlo replications. We choose $N \in \{10, 20, 30, 50, 100, 200\}$ and $T \in \{10, 20, 30\}$, as we would like to study the small-sample properties of the test procedures when the time dimension is small. In our first set of Monte Carlo experiments the errors are normally distributed; therefore we focus on the standard LM test. We also include the respective heteroskedasticity-robust regression variants for this exercise.

7.2 Normally distributed errors

Panel A of Table 1 (see Appendix B) shows the rejection frequencies when the null hypothesis is true. The $\tilde\Delta_{adj}$ test has rejection frequencies close to the nominal size of 5% for all combinations of $N$ and $T$, while the CLM test rejects the null hypothesis too often, in particular for small $N$. Deviations from the nominal size for the standard LM test and the regression-based test are small and disappear as $N$ increases, as expected from Theorem 1. Panel B of Table 1 shows the corresponding rejection frequencies under the alternative hypothesis. The LM test outperforms the $\tilde\Delta_{adj}$ and the CLM test in general. This observation holds in particular for $T = 10$, where the power gain is considerable. The $LM_{reg}$ variant, although as powerful as the $\tilde\Delta_{adj}$ test for $T = 10$, suffers from a power loss relative to the standard LM test. This power loss may be due to the small-sample bias of the variance estimator, see Remark 9. Following Remark 7, the variants of the LM tests are computed as follows. First, the individual-specific fixed effects $\alpha_i$ are eliminated by transforming the data using orthogonal forward deviations (see Arellano and Bover (1995)). The LM statistics are then computed using the transformed data.
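A sketch of the standard design (41)-(44); the function name is hypothetical, and the generated data would then be passed to the test statistics above.

```python
import numpy as np

def simulate_standard_design(N, T, alternative, rng):
    """Generate (y, X) from the standard Monte Carlo design (41)-(44)."""
    K = 3
    alpha = rng.normal(0.0, np.sqrt(0.25), N)            # individual effects
    sig2_x = 0.25 + rng.chisquare(1, (N, K))             # eq. (44)
    theta = rng.standard_normal((N, T, K)) * np.sqrt(sig2_x)[:, None, :]
    X = alpha[:, None, None] + theta                     # x_{it,k} = alpha_i + theta^x_{it,k}
    Sv = np.diag([0.03, 0.02, 0.01]) if alternative else np.zeros((K, K))
    beta = np.ones((N, K)) + rng.multivariate_normal(np.zeros(K), Sv, size=N)
    eps = rng.standard_normal((N, T))                    # eps_it ~ N(0, 1)
    y = alpha[:, None] + np.einsum('itk,ik->it', X, beta) + eps
    return y, X

rng = np.random.default_rng(42)
y0, X0 = simulate_standard_design(50, 10, alternative=False, rng=rng)
y1, X1 = simulate_standard_design(50, 10, alternative=True, rng=rng)
```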
The results presented in Panel A of Table 2 indicate that, by employing forward orthogonalization, all variants of the LM test have size reasonably close to the nominal level. Comparing Panel B of Table 1 with the rejection rates under the alternative in Panel B of Table 2, we see that the power is very similar in both setups, confirming the usefulness of the forward orthogonalization procedure for the LM tests.

7.3 Non-normal errors

We now investigate the LM test when the errors are no longer normally distributed, thereby building on the results of Section 5.1. The errors in (41) are generated from a t-distribution with 5 degrees of freedom, scaled to have unit variance. All other specifications of the standard design remain unchanged. In addition to the statistics already considered, we now include the adjusted LM statistic (see Corollary 1) and the score-modified statistic (see Theorem 3). Panel A of Table 3 reports the rejection frequencies under the null hypothesis in this case. We notice that the LM test has substantial size distortions when $T$ is fixed and $N$ increases, which is expected from Theorem 2. However, the adjusted LM statistic $LM_{adj}$ and the modified score statistic $LM^{*}$ are both successful in controlling the type-I error. Panel B of Table 3 shows rejection frequencies under the alternative hypothesis. The power gain of the LM test relative to the $\tilde\Delta_{adj}$ test is noticeable when $T = 10$ or $T = 20$. We found similar results when the errors are $\chi^2$ distributed with two degrees of freedom, centered and standardized to have mean zero and variance equal to one. Given the similarity of the results for $t$ and $\chi^2$ distributed errors, we do not present the latter results.

7.4 Serially correlated errors

To study the impact of serially correlated errors on the test statistics we adjust the DGP as follows:

$$y_{it} = x_{it}'\beta_i + \epsilon_{it}, \qquad \epsilon_{it} = \rho\,\epsilon_{it-1} + \left(1 - \rho^{2}\right)^{1/2}e_{it},$$

for $i = 1, 2, \ldots, N$, $t = 1, 2, \ldots, T$, where $e_{it} \overset{iid}{\sim} N(0,1)$.
Under the null hypothesis $\beta_i = \iota_3$ for all $i$, while under the alternative $\beta_i$ is generated as in (43). The regressors $x_{it,k}$, $k = 1, 2, 3$, are generated as

$$x_{it,k} = \phi_{i,k}\,x_{it-1,k} + \left(1 - \phi^{2}_{i,k}\right)^{1/2}\vartheta^{x}_{it,k}, \qquad \phi_{i,k} \overset{iid}{\sim} U[0.05, 0.95], \qquad \vartheta^{x}_{it,k} \overset{iid}{\sim} N\left(0, \sigma^{2}_{ix,k}\right),$$

where $\sigma^{2}_{ix,k} = 0.25 + \eta_{i,k}$ with $\eta_{i,k} \overset{iid}{\sim} \chi^{2}_{1}$. The parameters $\phi_{i,k}$ and $\sigma^{2}_{ix,k}$ are fixed across replications. The results of this simulation experiment are reported in Table 4. Panels A and B show the rejection frequencies under the null hypothesis in the case of "small" serial dependence ($\rho = 0.2$, Panel A) and "moderate" dependence ($\rho = 0.5$, Panel B). For all LM-based test statistics except the $LM_{ac}$ test, we observe substantial size deviations from the nominal level. The $LM_{ac}$ test, however, is successful in controlling the type-I error. Further, the size properties of the PY test are also significantly affected by autocorrelated errors; this fact is already documented and studied in Blomquist and Westerlund (2013). Panel C of Table 4 reports the power properties of the tests under no serial correlation ($\rho = 0$), building on the discussion in Remark 10. We observe that the $LM_{ac}$ test involves a 5-10% power loss compared to the $LM^{*}$ test. This relative power loss dies out as $T$ increases.

8 Concluding remarks

In this paper we examine the problem of testing slope homogeneity in a panel data model. We develop testing procedures using the LM principle. Several variants are considered that robustify the original LM test with respect to non-normality, heteroskedasticity and serially correlated errors. By studying the local power we identify cases in which the LM-type tests are particularly powerful relative to existing tests. In sum, our Monte Carlo experiments suggest that the LM tests are powerful procedures for detecting slope heterogeneity in short panels in which the time dimension is small relative to the cross-section dimension.
The LM approach suggested in this paper may be extended in future research by allowing for dynamic specifications with lagged dependent variables and cross-sectionally or serially correlated errors.

References

Arellano, M. and O. Bover (1995). Another look at the instrumental variable estimation of error-components models. Journal of Econometrics 68, 29–51.

Baltagi, B., Q. Feng, and C. Kao (2011). Testing for sphericity in a fixed effects panel data model. The Econometrics Journal 14, 25–47.

Blomquist, J. and J. Westerlund (2013). Testing slope homogeneity in large panels with serial correlation. Economics Letters 121 (3), 374–378.

Breusch, T. S. and A. R. Pagan (1980). The Lagrange multiplier test and its applications to model specification in econometrics. Review of Economic Studies 47, 239–253.

Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association 72, 320–338.

Honda, Y. (1985). Testing the error components model with non-normal disturbances. Review of Economic Studies 52, 681–690.

Hsiao, C. and M. H. Pesaran (2008). Random coefficient models. In L. Mátyás and P. Sevestre (Eds.), The Econometrics of Panel Data, Chapter 6. Springer.

Juhl, T. and O. Lugovskyy (2014). A test for slope homogeneity in fixed effects models. Econometric Reviews 33, 906–935.

Pesaran, M. H. and R. Smith (1995). Estimating long-run relationships from dynamic heterogenous panels. Journal of Econometrics 68, 79–113.

Pesaran, M. H. and T. Yamagata (2008). Testing slope homogeneity in large panels. Journal of Econometrics 142, 50–93.

Phillips, P. C. B. and H. R. Moon (1999). Linear regression limit theory for nonstationary panel data. Econometrica 67 (5), 1057–1111.

Swamy, P. A. V. B. (1970). Efficient inference in a random coefficient regression model. Econometrica 38, 311–323.

Ullah, A. (2004). Finite-sample econometrics. Oxford University Press.

Wand, M. P. (2002).
Vector differential calculus in statistics. The American Statistician 56, 1–8.

White, H. (2001). Asymptotic theory for econometricians. Emerald.

Wiens, D. P. (1992). On moments of quadratic forms in non-spherically distributed variables. Statistics 23 (3), 265–270.

A Appendix: Proofs

To economize on notation we use $\sum_i$ and $\sum_t$ instead of the full expressions $\sum_{i=1}^{N}$ and $\sum_{t=1}^{T}$ throughout this appendix.

A.1 Preliminary results

We first present an important result concerning the asymptotic effect of the estimation error $\tilde\beta - \beta$ on the test statistics. Define

$$A^{(k)}_i = X^{(k)}_iX^{(k)\prime}_i - \left(\frac{1}{NT}\sum_iX^{(k)\prime}_iX^{(k)}_i\right)I_T.$$

Lemma A.1 Let $R^{(k)}_{XAX} = \sum_iX_i'A^{(k)}_iX_i$ and $R^{(k)}_{XAu} = \sum_iX_i'A^{(k)}_iu_i$ for $k = 1, \ldots, K$. Furthermore, let

$$R^{(k)}_N = \frac{\sigma^4}{\tilde\sigma^4}\,\frac{1}{2\sigma^4}\left[\left(\tilde\beta - \beta\right)'R^{(k)}_{XAX}\left(\tilde\beta - \beta\right) - 2\left(\tilde\beta - \beta\right)'R^{(k)}_{XAu}\right],$$

for $k = 1, \ldots, K$. Under Assumptions 1, 2 and the null hypothesis the following properties hold if $T$ is fixed:

(i) $R^{(k)}_{XAX} = O_p(N)$,
(ii) $R^{(k)}_{XAu} = O_p\left(N^{1/2}\right)$,
(iii) $R^{(k)}_N = O_p(1)$,

for $k = 1, \ldots, K$.

Proof. (i) Using the definition of $A^{(k)}_i$ yields

$$R^{(k)}_{XAX} = \sum_iX_i'X^{(k)}_iX^{(k)\prime}_iX_i - \frac{1}{NT}\left(\sum_i\sum_tx^2_{it,k}\right)\sum_iX_i'X_i.$$

The first term is a $K \times K$ matrix with typical $(l,m)$ element

$$\sum_i\left(\sum_tx_{it,l}x_{it,k}\right)\left(\sum_tx_{it,m}x_{it,k}\right) = O_p(N),$$

as a consequence of Assumption 2, while $\sum_i\sum_tx^2_{it,k}/NT = O_p(1)$ and $\sum_iX_i'X_i = O_p(N)$.

(ii) Recall that under the null hypothesis $u_i = \epsilon_i$. Thus

$$R^{(k)}_{XAu} = \sum_iX_i'X^{(k)}_iX^{(k)\prime}_iu_i - \frac{1}{NT}\left(\sum_i\sum_tx^2_{it,k}\right)\sum_iX_i'u_i.$$

The first and the second term are $O_p\left(N^{1/2}\right)$ by the central limit theorem (CLT) for independent random variables and Assumption 2. (iii) Combining (i) and (ii) with the fact that $\sqrt{N}\left(\tilde\beta - \beta\right) = O_p(1)$ yields the result.

Lemma A.2 Under Assumptions 1$'$, 2$'$ and the null hypothesis the following properties hold for $N \to \infty$ and $T \to \infty$:

(i) $R^{(k)}_{XAX} = O_p\left(NT^2\right)$,
(ii) $R^{(k)}_{XAu} = O_p\left(N^{1/2}T^{3/2}\right)$,
(iii) $R^{(k)}_{NT} = O_p(T)$, which is defined as $R^{(k)}_N$ in Lemma A.1,

for $k = 1, \ldots, K$.

Proof.
Following the proof of Lemma A.1, the elements of the first term of $R^{(k)}_{XAX}$ are $O_p(NT^2)$, whereas the second term is $O_p(NT)$ by Assumption 2$'$, which yields statement (i). Notice that in (ii) $R^{(k)}_{XAu}$ has two terms as in Lemma A.1, where the first one has zero mean and variance of order $T^3$. Therefore, by Lemma 1 in Baltagi, Feng, and Kao (2011) we have that $X_i'X^{(k)}_iX^{(k)\prime}_iu_i = O_p(T^{3/2})$, and by Lemma 2 in PY that $\sum_iX_i'X^{(k)}_iX^{(k)\prime}_iu_i = O_p\left(N^{1/2}T^{3/2}\right)$ and $\sum_iX_i'u_i = O_p\left(N^{1/2}T^{1/2}\right)$. These results and the fact that $\sqrt{NT}\left(\tilde\beta - \beta\right) = O_p(1)$ imply (iii).

A.2 Proofs of the main results

Proof of Lemma 1 We use the following rules for matrix differentiation:

$$\frac{\partial\ell}{\partial\theta_k} = -\frac{1}{2}\mathrm{tr}\left[\Omega^{-1}\frac{\partial\Omega}{\partial\theta_k}\right] + \frac{1}{2}u'\Omega^{-1}\frac{\partial\Omega}{\partial\theta_k}\Omega^{-1}u, \qquad (45)$$

$$-E\left[\frac{\partial^2\ell}{\partial\theta_k\partial\theta_l}\right] = \frac{1}{2}\mathrm{tr}\left[\Omega^{-1}\frac{\partial\Omega}{\partial\theta_k}\Omega^{-1}\frac{\partial\Omega}{\partial\theta_l}\right], \qquad (46)$$

for $k, l = 1, 2, \ldots, K+1$; see, e.g., Harville (1977) and Wand (2002). First,

$$X_i\Sigma_vX_i' = \sum_k\sigma^2_{v,k}X^{(k)}_iX^{(k)\prime}_i,$$

with $X^{(k)}_i$ denoting the $k$-th column vector of the $T \times K$ matrix $X_i$. Hence

$$\begin{pmatrix} X_1\Sigma_vX_1' & & 0\\ & \ddots & \\ 0 & & X_N\Sigma_vX_N' \end{pmatrix} = \sum_k\sigma^2_{v,k}A_k,$$

with the $NT \times NT$ block-diagonal matrix

$$A_k = \begin{pmatrix} X^{(k)}_1X^{(k)\prime}_1 & & 0\\ & \ddots & \\ 0 & & X^{(k)}_NX^{(k)\prime}_N \end{pmatrix},$$

for $k = 1, \ldots, K$. Thus,

$$\Omega = \sum_k\sigma^2_{v,k}A_k + \sigma^2I_{NT} \qquad\text{and}\qquad \frac{\partial\Omega}{\partial\theta_k} = \begin{cases} A_k, & \text{for } k = 1, 2, \ldots, K,\\ I_{NT}, & \text{for } k = K+1.\end{cases}$$

Under the null hypothesis we have $\Omega = \sigma^2I_{NT}$. Using (45) we obtain

$$\left.\frac{\partial\ell}{\partial\theta_k}\right|_{H_0} = \begin{cases} -\frac{1}{2\tilde\sigma^2}\mathrm{tr}\left[A_k\right] + \frac{1}{2\tilde\sigma^4}\tilde u'A_k\tilde u, & \text{for } k = 1, 2, \ldots, K,\\ 0, & \text{for } k = K+1,\end{cases}$$

where

$$\tilde\sigma^2 = \frac{1}{NT}\tilde u'\tilde u, \qquad \tilde u = \left(I_{NT} - X(X'X)^{-1}X'\right)y.$$

The representation of the score vector follows from

$$\mathrm{tr}\left[A_k\right] = \sum_i\sum_tx^2_{it,k} = X^{(k)\prime}X^{(k)},$$

where $X^{(k)}$ denotes the $k$-th column of the $NT \times K$ matrix $X$. Similarly, (46) yields

$$-E\left[\frac{\partial^2\ell}{\partial\theta_k\partial\theta_l}\right]_{H_0} = \begin{cases} \frac{1}{2\sigma^4}\mathrm{tr}\left[A_kA_l\right], & \text{for } k, l = 1, 2, \ldots, K,\\[2pt] \frac{1}{2\sigma^4}X^{(k)\prime}X^{(k)}, & \text{for } k = 1, 2, \ldots, K \text{ and } l = K+1,\\[2pt] \frac{NT}{2\sigma^4}, & \text{for } k = l = K+1.\end{cases}$$

Using the fact that $A_k$ and $A_l$ are block-diagonal,

$$\mathrm{tr}\left[A_kA_l\right] = \sum_i\mathrm{tr}\left[X^{(k)}_iX^{(k)\prime}_iX^{(l)}_iX^{(l)\prime}_i\right] = \sum_i\left(X^{(k)\prime}_iX^{(l)}_i\right)^2,$$

where $X^{(k)}_i$ denotes the $k$-th column of $X_i$, which yields the form of the information matrix presented in the lemma.

Proof of Theorem 1 Recall that

$$A^{(k)}_i = X^{(k)}_iX^{(k)\prime}_i - \left(\frac{1}{NT}\sum_iX^{(k)\prime}_iX^{(k)}_i\right)I_T,$$

and rewrite the elements of the score as

$$\tilde s_k = \frac{\sigma^4}{\tilde\sigma^4}\,\frac{1}{2\sigma^4}\sum_i\tilde u_i'A^{(k)}_i\tilde u_i,$$

for $k = 1, \ldots, K$. Since $\tilde u_i = u_i - X_i\left(\tilde\beta - \beta\right)$ we have

$$\frac{1}{\sqrt{N}}\tilde s_k = \frac{\sigma^4}{\tilde\sigma^4}\,\frac{1}{2\sigma^4}\,\frac{1}{\sqrt{N}}\sum_iu_i'A^{(k)}_iu_i + \frac{1}{\sqrt{N}}R^{(k)}_N,$$

where $R^{(k)}_N = O_p(1)$ from Lemma A.1. Since $\sum_i\mathrm{tr}\left[A^{(k)}_i\right] = 0$ it follows that $E\left(u_i'A^{(k)}_iu_i\right) = 0$ and, therefore,

$$\lim_{N\to\infty}E\left[\frac{1}{\sqrt{N}}\tilde s\right] = 0.$$

The covariances are obtained as

$$\mathrm{Cov}\left[u_i'A^{(k)}_iu_i,\;u_i'A^{(l)}_iu_i\right] = 2\sigma^4\,\mathrm{tr}\left[A^{(k)}_iA^{(l)}_i\right] = 2\sigma^4\left[\left(X^{(k)\prime}_iX^{(l)}_i\right)^2 - \left(\frac{1}{NT}\sum_iX^{(l)\prime}_iX^{(l)}_i\right)X^{(k)\prime}_iX^{(k)}_i - \left(\frac{1}{NT}\sum_iX^{(k)\prime}_iX^{(k)}_i\right)X^{(l)\prime}_iX^{(l)}_i + T\left(\frac{1}{NT}\sum_iX^{(k)\prime}_iX^{(k)}_i\right)\left(\frac{1}{NT}\sum_iX^{(l)\prime}_iX^{(l)}_i\right)\right],$$

and since $u_i'A^{(k)}_iu_i$ is independent of $u_j'A^{(l)}_ju_j$ for all $i \neq j$ conditional on $X$,

$$\left(\frac{1}{2\sigma^4}\right)^2\mathrm{Cov}\left[\sum_iu_i'A^{(k)}_iu_i,\;\sum_iu_i'A^{(l)}_iu_i\right] = \frac{1}{2\sigma^4}\left[\sum_i\left(X^{(k)\prime}_iX^{(l)}_i\right)^2 - \frac{1}{NT}\left(\sum_iX^{(k)\prime}_iX^{(k)}_i\right)\left(\sum_iX^{(l)\prime}_iX^{(l)}_i\right)\right] = V_{k,l}.$$

The Liapounov condition in the central limit theorem for independent random variables (see White (2001), Theorem 5.10) is satisfied by Assumption 2 and therefore

$$\left(\frac{1}{N}\tilde V\right)^{-1/2}\frac{1}{\sqrt{N}}\tilde s \overset{d}{\to} N(0, I_K),$$

where $\tilde V$ replaces $\sigma^4$ in $V$ by $\tilde\sigma^4$. By the formula for the partitioned inverse,

$$\left\{I(\tilde\sigma^2)^{-1}\right\}_{1:K,1:K} = \tilde V^{-1},$$

where $\{\cdot\}_{1:K,1:K}$ denotes the upper-left $K \times K$ block of the matrix, it follows finally that

$$\tilde S'I(\tilde\sigma^2)^{-1}\tilde S = \tilde s'\tilde V^{-1}\tilde s \overset{d}{\to} \chi^2_K.$$

Proof of Theorem 2 The proof proceeds in three steps: (i) we derive the covariance matrix of the score vector, (ii) we establish the asymptotic normality of the score vector and (iii) we use these results to establish the asymptotic distribution of the LM statistic.
(i) Define the $K \times 1$ vector $s = \left(s_1, \ldots, s_K\right)'$ with typical element

$$s_k = \frac{1}{2\sigma^4}\sum_iu_i'A^{(k)}_iu_i = \frac{1}{2\sigma^4}\sum_is_{i,k}, \qquad (47)$$

where $s_{i,k} = u_i'A^{(k)}_iu_i$ and $1 \le k \le K$. Using standard results for quadratic forms (see, e.g., Ullah (2004), Appendix A.5),

$$E\left[s_{i,k}\big|X\right] = \sigma^2\,\mathrm{tr}\left[A^{(k)}_i\right],$$

$$E\left[s_{i,k}s_{i,l}\big|X\right] = 2\sigma^4\,\mathrm{tr}\left[A^{(k)}_iA^{(l)}_i\right] + \sigma^4\,\mathrm{tr}\left[A^{(k)}_i\right]\mathrm{tr}\left[A^{(l)}_i\right] + \left(\mu^{(4)}_u - 3\sigma^4\right)a^{(k)\prime}_ia^{(l)}_i,$$

where $a^{(k)}_i$ is a vector consisting of the main diagonal elements of the matrix $A^{(k)}_i$ and $\mu^{(4)}_u$ denotes the fourth moment of $u_{it}$. Since

$$E\left[s_{i,k}\big|X\right]E\left[s_{i,l}\big|X\right] = \sigma^4\,\mathrm{tr}\left[A^{(k)}_i\right]\mathrm{tr}\left[A^{(l)}_i\right],$$

we have

$$\mathrm{Cov}\left[s_{i,k}, s_{i,l}\big|X\right] = 2\sigma^4\,\mathrm{tr}\left[A^{(k)}_iA^{(l)}_i\right] + \left(\mu^{(4)}_u - 3\sigma^4\right)a^{(k)\prime}_ia^{(l)}_i. \qquad (48)$$

Due to the independence of $u_i'A^{(k)}_iu_i$ and $u_j'A^{(l)}_ju_j$ for $i \neq j$, it follows that

$$\mathrm{Cov}\left[\sum_is_{i,k},\;\sum_is_{i,l}\,\Big|\,X\right] = \sum_i\left\{2\sigma^4\,\mathrm{tr}\left[A^{(k)}_iA^{(l)}_i\right] + \left(\mu^{(4)}_u - 3\sigma^4\right)a^{(k)\prime}_ia^{(l)}_i\right\}.$$

Let $V_{NT}$ denote the covariance matrix of $s$. Inserting the expression for $\mathrm{tr}\left[A^{(k)}_iA^{(l)}_i\right]$, we determine the $(k,l)$ element of $V_{NT}$ as

$$V_{k,l} = \frac{1}{2\sigma^4}\left[\sum_i\left(\sum_tx_{it,k}x_{it,l}\right)^2 - \frac{1}{NT}\left(\sum_i\sum_tx^2_{it,k}\right)\left(\sum_i\sum_tx^2_{it,l}\right)\right] + \frac{\mu^{(4)}_u - 3\sigma^4}{(2\sigma^4)^2}\sum_i\sum_t\left(x^2_{it,k} - \frac{1}{NT}\sum_i\sum_tx^2_{it,k}\right)\left(x^2_{it,l} - \frac{1}{NT}\sum_i\sum_tx^2_{it,l}\right) = V_{1,k,l} + V_{2,k,l}. \qquad (49)$$

(ii) To verify that a central limit theorem applies to $s$, let $\lambda \in \mathbb{R}^K$, $\|\lambda\| = 1$, and $Z_{i,T} = \frac{1}{T}\lambda's_i$, where $s_i$ is a $K \times 1$ vector with elements $s_{i,k}$ for $1 \le k \le K$. Further, $E\left[Z_{i,T}\right] = 0$ and $E\left[Z^2_{i,T}\right] = \frac{1}{T^2}\lambda'E\left[V_{i,T}\right]\lambda$, where $V_{i,T}$ is a $K \times K$ matrix with the typical $(k,l)$ element defined in (48) for $1 \le k, l \le K$. From the Cramer-Wold device we conclude that it is sufficient to show that

$$\frac{1}{\sqrt{N}}\sum_iZ_{i,T} \overset{d}{\to} N(0, \mathcal{V}), \qquad (50)$$

where $\mathcal{V} \equiv \lim_{N,T\to\infty}\frac{1}{NT^2}\sum_i\lambda'E\left[V_{i,T}\right]\lambda$. Assumption 2$'$ and (48) ensure that $\mathcal{V}$ exists and is positive definite.
The asymptotic normality result (50) follows from the central limit theorem for the double-indexed process (see, e.g., Phillips and Moon (1999), Theorem 2) if the following condition holds:

$$\frac{1}{N}\sum_iE\left[\frac{Z^2_{i,T}}{V_{NT}}\,\mathbf{1}\left\{\frac{Z^2_{i,T}}{V_{NT}} > \varepsilon\right\}\right] \longrightarrow 0 \quad\text{for all } \varepsilon > 0, \qquad (51)$$

where $V_{NT} = \frac{1}{T^2}\sum_i\lambda'E\left[V_{i,T}\right]\lambda$. In turn, the Lindeberg condition (51) holds provided that

$$\sup_{i,T}E\left[\left\|Z_{i,T}\right\|^3\right] \le \sup_{i,T}E\left[\left\|\frac{s_i}{T}\right\|^3\right] < \infty. \qquad (52)$$

To study whether $s_i/T$ is uniformly $L_3$ bounded for all $i$ and $T$ it suffices to consider $s_i/T$ elementwise. Furthermore, each element of $s_i/T$ can be written in terms of quadratic forms, i.e.,

$$\frac{1}{T}s_{i,k} = \frac{1}{T}\left[u_i'B_{i,k}u_i - E\left(u_i'B_{i,k}u_i\big|X\right)\right],$$

where $B_{i,k} = X^{(k)}_iX^{(k)\prime}_i$ for $1 \le k \le K$, and by the triangle inequality

$$E\left|\frac{1}{T}s_{i,k}\right|^3 = \frac{1}{T^3}E\left|u_i'B_{i,k}u_i - E\left(u_i'B_{i,k}u_i\big|X\right)\right|^3 \le \frac{1}{T^3}E\left|u_i'B_{i,k}u_i\right|^3 + \frac{3}{T^3}E\left[\left|u_i'B_{i,k}u_i\right|^2\left|E\left(u_i'B_{i,k}u_i\big|X\right)\right|\right] + \frac{3}{T^3}E\left[\left|u_i'B_{i,k}u_i\right|\left|E\left(u_i'B_{i,k}u_i\big|X\right)\right|^2\right] + \frac{1}{T^3}E\left|E\left(u_i'B_{i,k}u_i\big|X\right)\right|^3. \qquad (53)$$

For the first term on the r.h.s. of (53) we make use of a formula for the third moment of a quadratic form (see, e.g., Wiens (1992) or Ullah (2004), Appendix A.5), the law of iterated expectations and the uniform bounds $E\left[|u_{it}|^6\big|X\right] < C < \infty$ and $E|x_{it,k}|^6 < C < \infty$ given in Assumptions 1$'$ and 2$'$, i.e.,

$$\frac{1}{T^3}E\left|u_i'B_{i,k}u_i\right|^3 = \frac{1}{T^3}E\left[E\left(\left|u_i'B_{i,k}u_i\right|^3\Big|X\right)\right] = E\left[\frac{1}{T^3}\sum_{t_1}\sum_{t_2}\sum_{t_3}E\left(u^2_{it_1}u^2_{it_2}u^2_{it_3}\Big|X\right)x^2_{it_1,k}x^2_{it_2,k}x^2_{it_3,k}\right] + O\left(T^{-1}\right). \qquad (54)$$

Further, from Assumption 1$'$ we have $E\left[u^2_{it_1}u^2_{it_2}u^2_{it_3}\big|X\right] < C < \infty$, and from Assumption 2$'$ the term $E\left[x^2_{it_1,k}x^2_{it_2,k}x^2_{it_3,k}\right]$ is uniformly bounded for all $i$ and $T$. It then follows from the triangle inequality that the first term on the r.h.s. of (54) is uniformly bounded. The same reasoning applies to the remaining terms in (53) to show their uniform boundedness. This concludes the proof of (52) and the asymptotic normality of the score vector $s$.
(iii) Rewrite the first $K$ elements of the score as

$$\tilde s = \frac{\sigma^4}{\tilde\sigma^4}\,s + R_{NT},$$

where $R_{NT}$ is given in Lemma A.2 and $s$ has typical element as defined in (47). By (ii),

$$s'\left(V_{NT}\right)^{-1}s \overset{d}{\to} \chi^2_K, \qquad (55)$$

as $N \to \infty$, $T \to \infty$, where $V_{NT}$ has $(k,l)$ element $V_{k,l}$ as in (49). Under Assumptions 1$'$ and 2$'$,

$$V_1 = O_p\left(NT^2\right), \qquad V_2 = O_p(NT),$$

where $V_1$ and $V_2$ are specified elementwise in (49). Given the expression for $\tilde V$ in Theorem 1,

$$\frac{\tilde V}{NT^2} - \frac{V_1}{NT^2} \overset{p}{\to} 0,$$

and hence

$$s'\tilde V^{-1}s - s'\left(V_{NT}\right)^{-1}s \overset{p}{\to} 0 \qquad (56)$$

as $N \to \infty$, $T \to \infty$. The LM statistic can be expanded as

$$LM = \tilde s'\tilde V^{-1}\tilde s = \left(\frac{\sigma^4}{\tilde\sigma^4}s + R_{NT}\right)'\tilde V^{-1}\left(\frac{\sigma^4}{\tilde\sigma^4}s + R_{NT}\right) = \frac{\sigma^8}{\tilde\sigma^8}\,s'\tilde V^{-1}s + O_p\left(N^{-1/2}\right), \qquad (57)$$

where the last equality follows from Lemma A.2. The theorem follows by combining (55), (56) and (57).

Proof of Corollary 1 The result follows immediately from the proof of Theorem 2 and the fact that $\tilde\mu^{(4)}_u = (NT)^{-1}\sum_i\sum_t\tilde u^4_{it}$ is a consistent estimator of $\mu^{(4)}_u$.

Proof of Theorem 3 Using similar arguments as in the proof of Theorem 1,

$$\frac{1}{\sqrt{N}}\tilde s^* = \frac{\sigma^4}{\tilde\sigma^4}\,\frac{1}{\sqrt{N}}\begin{pmatrix}\frac{1}{\sigma^4}\sum_i\sum_t\sum_{s=1}^{t-1}x_{it,1}u_{it}x_{is,1}u_{is}\\ \vdots\\ \frac{1}{\sigma^4}\sum_i\sum_t\sum_{s=1}^{t-1}x_{it,K}u_{it}x_{is,K}u_{is}\end{pmatrix} + o_p(1). \qquad (58)$$

Let $u^*_{it} = u_{it}/\sigma$ and $z^*_{it,k} = x_{it,k}u^*_{it}$. Clearly, $E\left[\sum_t\sum_{s=1}^{t-1}z^*_{it,k}z^*_{is,k}\right] = 0$. Since, conditional on $X$, $\sum_t\sum_{s=1}^{t-1}z^*_{it,k}z^*_{is,k}$ and $\sum_t\sum_{s=1}^{t-1}z^*_{jt,l}z^*_{js,l}$ are independent for $i \neq j$, the covariances for two elements $k$ and $l$ of the vector (58) are

$$E\left[s^*_ks^*_l\big|X\right] = \frac{1}{\sigma^4}\sum_iE\left[\left(\sum_{t=2}^{T}\sum_{s=1}^{t-1}z^*_{it,k}z^*_{is,k}\right)\left(\sum_{t=2}^{T}\sum_{s=1}^{t-1}z^*_{it,l}z^*_{is,l}\right)\Bigg|X\right] = \frac{1}{\sigma^4}\sum_{i=1}^{N}\sum_{t=2}^{T}x_{it,k}x_{it,l}\left(\sum_{s=1}^{t-1}x_{is,k}x_{is,l}\right) = V^*_{k,l},$$

since all cross terms have zero expectation and $E\left[(u^*_{it})^2\right] = 1$. The central limit theorem for independent random variables and Slutsky's theorem imply

$$\left(\frac{1}{N}V^*\right)^{-1/2}\frac{1}{\sqrt{N}}\tilde s^* \overset{d}{\to} N(0, I_K),$$

and the result follows.
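The block-diagonal trace identity $\mathrm{tr}[A_kA_l] = \sum_i\left(X^{(k)\prime}_iX^{(l)}_i\right)^2$ used in the proof of Lemma 1 can be checked numerically; the following is a small sketch with illustrative dimensions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 4, 5, 2
X = rng.standard_normal((N, T, K))

def A(k):
    """Block-diagonal A_k = diag(X_1^{(k)} X_1^{(k)'}, ..., X_N^{(k)} X_N^{(k)'})."""
    out = np.zeros((N * T, N * T))
    for i in range(N):
        out[i * T:(i + 1) * T, i * T:(i + 1) * T] = np.outer(X[i, :, k], X[i, :, k])
    return out

lhs = np.trace(A(0) @ A(1))                              # tr[A_k A_l]
rhs = sum((X[i, :, 0] @ X[i, :, 1]) ** 2 for i in range(N))
assert np.isclose(lhs, rhs)                              # the two sides agree
```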
Proof of Corollary 2 Using the arguments in Theorems 1 and 3 (under Assumption 1$'$, Assumption 2 and allowing for $E\left[\epsilon^2_{it}\big|X\right] = \sigma^2_{it}$), $LM_{reg}$ is asymptotically $\chi^2_K$ if the Liapounov condition is satisfied and the asymptotic covariance matrix of the score vector is equal to the limit of $\sum_i\sum_t\tilde u^2_{it}\tilde z_{it,k}\tilde z_{it,l}$ as $N \to \infty$ and $T$ is fixed.

Regarding the Liapounov condition, it suffices to show that $E\left|s^*_{i,k}\right|^{2+\delta} < C < \infty$ for $k = 1, \ldots, K$. By the Minkowski inequality,

$$\left(E\left|s^*_{i,k}\right|^{2+\delta}\right)^{\frac{1}{2+\delta}} \le \sum_t\sum_{s=1}^{t-1}\left(E\left|u_{it}u_{is}x_{it,k}x_{is,k}\right|^{2+\delta}\right)^{\frac{1}{2+\delta}}.$$

Further, by the Cauchy-Schwarz inequality, the law of iterated expectations and Assumptions 1$'$ and 2,

$$E\left|x_{it,1}u_{it}x_{is,1}u_{is}\right|^{2+\delta} \le E\left[\sqrt{E\left(\left|u^2_{it}u^2_{is}\right|^{2+\delta}\Big|X\right)}\,\left|x_{it,1}x_{is,1}\right|^{2+\delta}\right] = E\left[\sqrt{E\left(\left|u_{it}\right|^{4+\delta}\Big|X\right)E\left(\left|u_{is}\right|^{4+\delta}\Big|X\right)}\,\left|x_{it,1}x_{is,1}\right|^{2+\delta}\right] < C < \infty.$$

Hence the Liapounov condition holds. Regarding the $(k,l)$ element of the covariance matrix of $\tilde s^*$, note that

$$E\left[\sum_i\left(\sum_tx_{it,k}u_{it}\left(\sum_{s=1}^{t-1}x_{is,k}u_{is}\right)\right)\left(\sum_tx_{it,l}u_{it}\left(\sum_{s=1}^{t-1}x_{is,l}u_{is}\right)\right)\Bigg|X\right] = \sum_i\sum_t\sum_{s=1}^{t-1}\sigma^2_{it}\sigma^2_{is}\,x_{it,k}x_{it,l}x_{is,k}x_{is,l}.$$

Next, let $z_{it,k} = x_{it,k}\sum_{s=1}^{t-1}u_{is}x_{is,k}$ and notice that

$$\sum_i\sum_tE\left[u^2_{it}z_{it,k}z_{it,l}\big|X\right] = \sum_i\sum_t\sum_{s=1}^{t-1}\sigma^2_{it}\sigma^2_{is}\,x_{it,k}x_{it,l}x_{is,k}x_{is,l}.$$

Furthermore,

$$\frac{1}{N}\sum_i\sum_t\tilde u^2_{it}\tilde z_{it,k}\tilde z_{it,l} - \frac{1}{N}\sum_i\sum_tu^2_{it}z_{it,k}z_{it,l} \overset{p}{\to} 0,$$

and result (32) follows.

Proof of Theorem 4 Consider the normalized scores

$$\frac{1}{\sqrt{N}}\tilde s^{**} = \frac{1}{\sqrt{N}}s^{**} + o_p(1),$$

where the $k$-th element of the vector $s^{**}$ is given by

$$s^{**}_k = \sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\left(u_{it}u_{is} - \sigma_{ts}\right)x_{it,k}x_{is,k}.$$

By construction, $E\left[s^{**}_k\right] = 0$ for $k = 1, \ldots, K$ under the null hypothesis. The same arguments as for result (32) apply to show the Liapounov condition, making use of the central limit theorem for independent and heterogeneously distributed random variables. It remains to show that the $(k,l)$ element of the covariance matrix of $s^{**}$ takes the form (34).
Since, conditional on $X$, the contributions $s^{**}_{i,k}$ and $s^{**}_{j,k}$ are independent for $i \neq j$, the covariances for two elements $k$ and $l$ of the vector $s^{**}$ normalized with $N$ are

$$E\left[s^{**}_ks^{**}_l\big|X\right] = \sum_iE\left[\left(\sum_{t=2}^{T}\sum_{s=1}^{t-1}\left(u_{it}u_{is} - \sigma_{ts}\right)x_{it,k}x_{is,k}\right)\left(\sum_{\tau=2}^{T}\sum_{q=1}^{\tau-1}\left(u_{i\tau}u_{iq} - \sigma_{\tau q}\right)x_{i\tau,l}x_{iq,l}\right)\Bigg|X\right] = \sum_i\sum_{t=2}^{T}\sum_{s=1}^{t-1}x_{it,k}x_{is,k}\sum_{\tau=2}^{T}\sum_{q=1}^{\tau-1}E\left[\left(u_{it}u_{is} - \sigma_{ts}\right)\left(u_{i\tau}u_{iq} - \sigma_{\tau q}\right)\big|X\right]x_{i\tau,l}x_{iq,l}.$$

Further,

$$E\left[\left(u_{it}u_{is} - \sigma_{ts}\right)\left(u_{i\tau}u_{iq} - \sigma_{\tau q}\right)\big|X\right] = E\left[u_{it}u_{is}u_{i\tau}u_{iq}\big|X\right] - \sigma_{ts}\sigma_{\tau q},$$

and the $(k,l)$ element can be written as

$$E\left[s^{**}_ks^{**}_l\big|X\right] = \sum_{i=1}^{N}\sum_{t=2}^{T}\sum_{s=1}^{t-1}\sum_{\tau=2}^{T}\sum_{q=1}^{\tau-1}\delta_{ts\tau q}\,x_{it,k}x_{is,k}x_{i\tau,l}x_{iq,l},$$

where $\delta_{ts\tau q} = E\left[u_{jt}u_{js}u_{j\tau}u_{jq}\big|X\right] - \sigma_{ts}\sigma_{\tau q}$. Substituting $\delta_{ts\tau q}$ by the consistent estimator $\hat\delta_{ts\tau q} = \frac{1}{N}\sum_{j=1}^{N}\tilde u_{jt}\tilde u_{js}\tilde u_{j\tau}\tilde u_{jq} - \tilde\sigma_{ts}\tilde\sigma_{\tau q}$ yields the limiting distribution of the modified test statistic.

Proof of Theorem 5 As in Honda (1985) the proof of the theorem proceeds in three steps: (i) first we show that $\tilde\sigma^2$ remains consistent under the local alternative; (ii) second, we incorporate the local alternative into the score vector; and (iii) we establish the asymptotic distribution of the LM statistic.

(i) Note first that, with $M_X = I_{NT} - X(X'X)^{-1}X'$,

$$\tilde u = M_Xu = M_X\left(D_Xv + \epsilon\right),$$

where

$$D_X = \begin{pmatrix} X_1 & & & 0\\ & X_2 & & \\ & & \ddots & \\ 0 & & & X_N \end{pmatrix}.$$

Hence,

$$\frac{\tilde u'\tilde u}{NT} = \frac{1}{NT}\left(\epsilon'\epsilon - \epsilon'P_X\epsilon + v'D_X'M_XD_Xv + v'D_X'M_X\epsilon + \epsilon'M_XD_Xv\right).$$

Using Assumptions 1$'$, 2 and 3, it is straightforward to show that

$$\frac{1}{N}\epsilon'X(X'X)^{-1}X'\epsilon = o_p(1), \qquad \frac{1}{N}v'D_X'M_XD_Xv = o_p(1), \qquad \frac{1}{N}v'D_X'M_X\epsilon = o_p(1),$$

and, thus, $\tilde\sigma^2 = \sigma^2 + o_p(1)$.

(ii) Since $u_i = X_iv_i + \epsilon_i$ and $\tilde u_i = X_iv_i + \epsilon_i - X_i\left(\tilde\beta - \beta\right)$, we obtain

$$\frac{1}{\sqrt{N}}\tilde s_k = \frac{\sigma^4}{\tilde\sigma^4}\,\frac{1}{2\sigma^4}\,\frac{1}{\sqrt{N}}\sum_i\epsilon_i'A^{(k)}_i\epsilon_i + \frac{\sigma^4}{\tilde\sigma^4}\,\frac{1}{2\sigma^4}\,\frac{1}{\sqrt{N}}\sum_iv_i'X_i'A^{(k)}_iX_iv_i + o_p(1), \qquad (59)$$

for $k = 1, \ldots, K$, where the order of the remainder term follows by similar arguments as in Lemma A.1.
(iii) Using the same arguments as in the proof of Theorem 1, the first term of $\tilde s/\sqrt N$ in (59) is asymptotically normally distributed. Regarding the second term,
\[
\frac{1}{\sqrt N} \sum_i v_i' X_i' A_i^{(k)} X_i v_i = \frac{1}{N} \sum_i \big(N^{1/4} v_i\big)' X_i' A_i^{(k)} X_i \big(N^{1/4} v_i\big),
\]
and by standard results for quadratic forms,
\[
E\left[ \big(N^{1/4} v_i\big)' X_i' A_i^{(k)} X_i \big(N^{1/4} v_i\big) \,\Big|\, X \right] = \mathrm{tr}\big[ X_i' A_i^{(k)} X_i D_c \big],
\]
with $D_c = \mathrm{diag}(c_1, \ldots, c_K)$. Thus, by the law of large numbers for sums of independent random variables,
\[
\frac{1}{2\sigma^4} \frac{1}{\sqrt N} \sum_i v_i' X_i' A_i^{(k)} X_i v_i \overset{p}{\to} \lim_{N\to\infty} \frac{1}{2 N \sigma^4} \sum_i \mathrm{tr}\big[ X_i' A_i^{(k)} X_i D_c \big].
\]
Now
\[
\sum_i \mathrm{tr}\big[ X_i' A_i^{(k)} X_i D_c \big] = \sum_{l=1}^{K} c_l \sum_i \left( \sum_t x_{it,k} x_{it,l} \right)^2 .
\]
Define the $K \times 1$ vector $\psi$ elementwise by
\[
\psi_k \equiv \sum_{l=1}^{K} c_l \operatorname*{plim}_{N\to\infty} \left[ \frac{1}{NT} \sum_i \left( \sum_t x_{it,k} x_{it,l} \right)^2 - \frac{1}{NT} \left( \sum_i \sum_t x_{it,k}^2 \right) \left( \frac{1}{N} \sum_i \sum_t x_{it,l}^2 \right) \right].
\]
By Slutsky's theorem we obtain
\[
V^{-1/2}\, \frac{1}{\sqrt N}\, \tilde s \overset{d}{\to} N(\psi, I_K),
\]
and the theorem follows from the definition of the non-central $\chi^2$ distributed random variable, with $\psi = \Psi c$ and $c = (c_1, \ldots, c_K)'$.

Proof of Theorem 6

The proof is analogous to the proof of Theorem 5. To show that $\tilde\sigma^2$ remains consistent under the sequence of alternatives, we note that
\[
\epsilon' X (X'X)^{-1} X' \epsilon = O_p(1), \qquad
v' D_X' M_X D_X v = O_p\big(N^{1/2} T\big), \qquad
v' D_X' M_X \epsilon = O_p\big(N T^{1/2}\big) + O_p\big(T^{1/2}\big).
\]
Using the same arguments as in the proof of Theorem 2, $\tilde s/(T\sqrt N)$ has a limiting normal distribution with a nonzero mean, which is determined by applying the law of large numbers to the second term in (59) with the proper normalization.

Proof of Theorem 7

With the Swamy statistic as described in the text, the proof follows the steps outlined in Appendix A.6 in PY.

Details for Remark 9

We study the $(k,l)$ element of $N^{-1} \sum_{i=1}^{N} \sum_{t=2}^{T} \tilde u_{it}^2 \tilde z_{it,k} \tilde z_{it,l}$ under the sequence of alternatives in Theorem 5. Note that
\[
\tilde u_{it}^2 = (\epsilon_{it} + x_{it}' v_i)^2 + (\tilde\beta - \beta)' x_{it} x_{it}' (\tilde\beta - \beta) - 2\, (\epsilon_{it} + x_{it}' v_i)\, x_{it}' (\tilde\beta - \beta), \tag{60}
\]
and
\[
\tilde z_{it,k} = x_{it,k} \sum_{s=1}^{t-1} \big( \epsilon_{is} + x_{is}' v_i - x_{is}'(\tilde\beta - \beta) \big)\, x_{is,k},
\]
implying
\[
\tilde z_{it,k} \tilde z_{it,l} = x_{it,k} x_{it,l} \left[ \sum_{s=1}^{t-1} \big( \epsilon_{is} + x_{is}' v_i - x_{is}'(\tilde\beta - \beta) \big)\, x_{is,k} \right] \left[ \sum_{s=1}^{t-1} \big( \epsilon_{is} + x_{is}' v_i - x_{is}'(\tilde\beta - \beta) \big)\, x_{is,l} \right], \tag{61}
\]
which expands into nine cross-products. First, from the first terms on the right-hand sides of (60) and (61), we obtain
\[
\frac{1}{N} \sum_{i=1}^{N} \sum_{t=2}^{T} \epsilon_{it}^2\, x_{it,k} x_{it,l} \left( \sum_{s=1}^{t-1} \epsilon_{is} x_{is,k} \right) \left( \sum_{s=1}^{t-1} \epsilon_{is} x_{is,l} \right).
\]
Notice that this term has the same probability limit as $\tilde V^*_{k,l}/N$, which is equal to $\Psi^*_{k,l}$. Next, from the first term on the right-hand side of (60) and the second term on the right-hand side of (61),
\[
\frac{1}{N} \sum_{i=1}^{N} \sum_{t=2}^{T} \epsilon_{it}^2\, v_i' \left( x_{it,k} \sum_{s=1}^{t-1} x_{is} x_{is,k} \right) \left( x_{it,l} \sum_{s=1}^{t-1} x_{is}' x_{is,l} \right) v_i
= \frac{1}{N^{3/2}} \sum_{i=1}^{N} \sum_{t=2}^{T} \epsilon_{it}^2\, \big(N^{1/4} v_i\big)' B_{it}^X \big(N^{1/4} v_i\big),
\]
with the $K \times K$ matrix
\[
B_{it}^X = \left( x_{it,k} \sum_{s=1}^{t-1} x_{is} x_{is,k} \right) \left( x_{it,l} \sum_{s=1}^{t-1} x_{is}' x_{is,l} \right).
\]
Since $\epsilon_{it}$ and $v_i$ are independent conditional on $X$,
\[
E\left[ \epsilon_{it}^2\, \big(N^{1/4} v_i\big)' B_{it}^X \big(N^{1/4} v_i\big) \,\Big|\, X \right] = \sigma^2\, \mathrm{tr}\big[ B_{it}^X D_c \big],
\]
such that
\[
\frac{1}{N^{3/2}} \sum_{i=1}^{N} \sum_{t=2}^{T} \epsilon_{it}^2\, \big(N^{1/4} v_i\big)' B_{it}^X \big(N^{1/4} v_i\big) = O_p\big(N^{-1/2}\big).
\]
Using the properties of $\epsilon_{it}$ and $v_i$ and the fact that $\tilde\beta - \beta = o_p(1)$, it can be shown in a similar manner that all of the remaining terms are of lower order.
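The modified statistic of Theorem 4 is straightforward to compute in practice. The following is a minimal numerical sketch (synthetic data; the variable names, the use of raw errors in place of the residuals $\tilde u$, and the final quadratic form are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

# Hypothetical panel: N units, T periods, K regressors
rng = np.random.default_rng(0)
N, T, K = 200, 5, 2
X = rng.standard_normal((N, T, K))
u = rng.standard_normal((N, T))   # stand-in for the residuals u-tilde

# sigma_ts estimated by cross-section averages of u_t * u_s
sigma = np.einsum('it,is->ts', u, u) / N

# modified score: s_k = sum_i sum_{t=2}^T sum_{s<t} (u_it u_is - sigma_ts) x_it,k x_is,k
pairs = [(t, s) for t in range(1, T) for s in range(t)]
score = np.zeros(K)
for t, s in pairs:
    A = X[:, t, :] * X[:, s, :]                       # (N, K): x_it,k * x_is,k
    score += ((u[:, t] * u[:, s] - sigma[t, s])[:, None] * A).sum(axis=0)

# delta_hat_{ts,tau q} = (1/N) sum_j u_jt u_js u_jtau u_jq - sigma_ts sigma_tau q
delta = (np.einsum('it,is,iu,iq->tsuq', u, u, u, u) / N
         - np.einsum('ts,uq->tsuq', sigma, sigma))

# (k,l) element of the covariance estimate:
# V_kl = sum over pair combinations of delta * sum_i x_it,k x_is,k x_itau,l x_iq,l
V = np.zeros((K, K))
for t1, s1 in pairs:
    A = X[:, t1, :] * X[:, s1, :]
    for t2, s2 in pairs:
        B = X[:, t2, :] * X[:, s2, :]
        V += delta[t1, s1, t2, s2] * (A.T @ B)

lm_mod = score @ np.linalg.solve(V, score)  # asymptotically chi-square(K) under H0
```

The double loop over time pairs is $O(T^4)$, which is harmless here because the modified test is designed for fixed (small) $T$ with $N \to \infty$.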
B Appendix: Tables

Table 1: Rejection frequencies for H0: all coefficients are homogenous

                       A) Size                            B) Power
          ∆̃adj    CLM     LM   LMreg        ∆̃adj    CLM     LM   LMreg
T = 10
N = 10     6.3   11.8    2.6    4.7           5.5    5.4   11.3    4.1
N = 20     5.6   12.0    3.2    4.3           8.4    4.2   24.8    7.0
N = 30     5.5   11.4    3.7    4.3          11.5    6.3   35.2   10.4
N = 50     5.2    8.9    3.7    4.1          18.9   15.9   51.2   19.6
N = 100    5.4    8.3    4.7    4.8          36.1   46.5   77.5   47.0
N = 200    4.6    7.5    4.9    4.6          65.6   87.3   96.9   83.1
T = 20
N = 10     5.3   15.1    2.6    6.3          17.1    3.4   28.1   12.4
N = 20     5.7   14.2    3.4    5.7          35.0    9.5   53.0   25.8
N = 30     5.9   12.8    3.8    5.8          50.7   23.4   70.5   43.2
N = 50     5.1   10.8    4.1    5.3          74.6   53.5   88.9   71.8
N = 100    4.5    8.3    4.3    4.7          95.7   90.1   99.1   96.5
N = 200    5.1    7.1    5.2    5.5          99.9   98.8  100.0  100.0
T = 30
N = 10     4.7   15.6    2.4    7.0          34.4    4.6   43.0   22.4
N = 20     4.5   14.5    3.5    6.2          64.8   19.7   74.3   50.9
N = 30     5.2   13.2    3.9    5.6          81.9   42.5   88.9   73.3
N = 50     5.3   11.6    4.5    5.9          96.3   76.9   98.2   94.5
N = 100    5.2    8.9    4.3    4.7         100.0   96.0  100.0   99.9
N = 200    5.2    7.4    5.0    5.8         100.0   99.3  100.0  100.0

Notes: Rejection frequencies (in %) for K = 3 under the null (panel A) and the alternative hypothesis (panel B). Nominal size is 5%.
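A minimal sketch of how rejection frequencies of this kind are generated. The data-generating process below (standard normal regressors and errors, K = 2, true errors used in place of residuals, a homoskedastic variance estimate) is assumed for illustration and is not the paper's exact Monte Carlo design:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K, R = 100, 10, 2, 300
CRIT = 5.991  # 5% critical value of chi-square with K = 2 degrees of freedom

def lm_stat(u, X):
    """LM-type statistic from the score sum_i sum_{t>s} u_it u_is x_it,k x_is,k,
    normalized by its homoskedastic variance estimate (illustrative form)."""
    s = np.zeros(K)
    V = np.zeros((K, K))
    s2 = (u ** 2).mean()                     # sigma^2 estimate
    for t in range(1, T):
        for q in range(t):
            A = X[:, t, :] * X[:, q, :]      # (N, K): x_it,k * x_is,k
            s += ((u[:, t] * u[:, q])[:, None] * A).sum(axis=0)
            V += s2 ** 2 * (A.T @ A)
    return s @ np.linalg.solve(V, s)

rejections = 0
for _ in range(R):
    X = rng.standard_normal((N, T, K))
    u = rng.standard_normal((N, T))          # H0: homogeneous slopes, iid errors
    rejections += lm_stat(u, X) > CRIT
rate = rejections / R
print(f"empirical size: {rate:.3f}")         # should be near the nominal 5%
```

With R = 300 replications the empirical size is only a rough estimate; the tables in this appendix are based on many more replications per cell.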
Table 2: Rejection frequencies: model with individual effects

                 A) Size                                   B) Power
           LM    LM*  LMreg   LMac       ∆̃adj    CLM     LM    LM*  LMreg   LMac
T = 10
N = 10    2.6    2.9    4.8    3.4        5.7    5.3    8.3    8.6    4.2    7.8
N = 20    3.3    4.0    5.2    3.8        8.1    4.3   21.6   20.2    6.8   17.8
N = 30    3.1    3.9    4.7    4.1        8.7    5.3   26.5   24.2    8.8   22.5
N = 50    3.8    4.6    4.3    4.5       13.1   10.8   42.0   38.9   15.7   35.8
N = 100   4.1    4.8    4.7    4.6       29.2   40.8   73.4   67.7   40.4   64.5
N = 200   4.2    4.8    4.7    4.7       49.0   79.5   91.8   87.9   72.9   84.9
T = 20
N = 10    2.3    2.2    6.4    2.4       19.1    3.6   31.4   30.8   13.1   26.3
N = 20    3.4    3.5    6.0    3.9       39.1    9.4   59.3   57.8   31.7   53.6
N = 30    3.8    4.1    5.4    4.2       40.0   15.8   60.2   58.0   33.0   53.8
N = 50    3.8    4.2    5.1    3.8       72.5   53.9   89.0   88.1   73.3   84.7
N = 100   4.3    4.6    5.5    4.5       91.4   89.0   98.2   97.7   93.9   96.9
N = 200   4.8    4.8    5.3    4.7       99.8   98.4  100.0  100.0   99.9  100.0
T = 30
N = 10    2.8    2.9    8.0    3.5       26.6    3.9   37.8   37.6   18.0   29.4
N = 20    3.6    3.5    6.2    3.8       66.8   24.6   78.0   77.3   56.3   72.2
N = 30    3.7    3.7    5.8    4.1       91.8   47.8   95.3   94.9   86.4   93.7
N = 50    4.7    4.5    5.8    4.8       94.6   67.1   97.0   96.9   91.8   96.1
N = 100   4.4    4.4    5.1    4.7       99.9   92.8  100.0  100.0   99.9   99.9
N = 200   4.6    4.6    4.8    4.6      100.0   98.4  100.0  100.0  100.0  100.0

Notes: Left panel: rejection frequencies (in %) under the null hypothesis with the same design as in Table 1 and forward orthogonalization to eliminate fixed effects. Right panel: rejection frequencies (in %) under the alternative hypothesis and forward orthogonalization to eliminate fixed effects. Nominal size is 5%.
Table 3: Size and power for t-distributed errors

A) Size
          ∆̃adj    CLM     LM  LMadj    LM*  LMreg
T = 10
N = 10     6.3    9.0    3.5    2.9    4.0    3.9
N = 20     6.1   10.5    5.4    4.0    6.1    4.5
N = 30     5.6   10.2    6.9    5.0    6.2    4.9
N = 50     5.1    8.9    7.5    5.3    6.6    4.4
N = 100    5.0    7.5    9.8    6.2    7.2    4.9
N = 200    5.5    6.9   10.7    5.7    7.6    5.2
T = 20
N = 10     5.5   13.2    3.5    2.8    3.6    6.0
N = 20     5.8   13.0    5.1    4.0    4.6    6.3
N = 30     5.4   11.7    5.5    4.3    4.8    5.6
N = 50     5.2   10.4    6.7    4.8    5.1    5.3
N = 100    4.9    8.0    8.0    5.5    5.9    5.1
N = 200    5.1    7.0    8.9    5.5    5.4    4.8
T = 30
N = 10     5.8   15.0    3.0    2.6    3.0    7.0
N = 20     5.0   13.6    4.5    3.5    3.9    5.7
N = 30     4.7   12.4    5.2    4.3    4.2    5.5
N = 50     5.1   11.3    5.9    4.7    5.1    5.9
N = 100    5.0    8.5    6.4    5.0    4.8    5.1
N = 200    5.3    7.7    7.3    5.0    5.2    4.9

B) Power
          ∆̃adj    CLM     LM  LMadj    LM*  LMreg
T = 10
N = 10     5.9    3.4   12.4   10.9   11.8    4.9
N = 20     9.7    3.9   25.6   22.7   22.9    7.5
N = 30    13.6    6.9   36.1   31.8   32.4   11.9
N = 50    24.1   16.0   52.4   46.7   47.9   22.6
N = 100   45.3   44.4   78.9   71.8   73.9   49.4
N = 200   76.1   83.1   95.7   92.6   93.8   83.2
T = 20
N = 10    20.0    3.2   30.5   28.6   29.4   13.7
N = 20    39.5   10.1   53.6   50.0   52.5   29.7
N = 30    57.3   22.8   70.5   67.3   68.5   46.7
N = 50    80.0   51.5   88.3   85.5   86.7   72.2
N = 100   97.8   89.5   99.0   98.4   99.0   96.4
N = 200  100.0   98.8  100.0  100.0  100.0  100.0
T = 30
N = 10    37.5    3.7   45.8   43.8   45.1   25.4
N = 20    67.7   19.6   74.3   71.9   73.3   54.3
N = 30    85.4   43.0   88.6   86.6   87.4   74.4
N = 50    97.3   76.2   98.0   97.2   97.7   93.8
N = 100  100.0   95.9  100.0  100.0  100.0   99.9
N = 200  100.0   99.4  100.0  100.0  100.0  100.0

Notes: Rejection frequencies (in %) for K = 3 under the null (panel A) and the alternative hypothesis (panel B) when the errors are drawn from a t-distribution with five degrees of freedom. Nominal size is 5%.
Table 4: Size and power in a model with autocorrelated errors

A) Size: ρ = 0.2
          ∆̃adj    CLM     LM    LM*  LMreg   LMac
T = 10
N = 10     5.3    7.4    4.8    4.8    3.9    5.5
N = 50     9.0    4.6   12.1   13.1    5.3    4.3
N = 100   15.9    6.0   17.1   21.0    7.6    5.0
T = 20
N = 10     7.6    6.8    6.4    6.7    4.4    4.5
N = 50    16.3    4.9   15.1   15.7    6.3    4.9
N = 100   34.7    9.7   23.8   25.7   11.5    4.4
T = 30
N = 10     7.3    6.6    6.7    6.9    5.1    4.1
N = 50    27.9    5.2   16.1   16.9    6.3    3.6
N = 100   46.7   12.7   25.6   27.6   12.4    4.1

B) Size: ρ = 0.5
T = 10
N = 10     6.5    3.4    9.3   11.0    3.6    7.7
N = 50    48.2   14.3   52.6   63.5   22.3    4.8
N = 100   68.6   25.9   67.7   79.8   30.9    4.8
T = 20
N = 10    14.3    3.0    8.8    9.6    4.4    4.7
N = 50    88.9   32.8   65.1   71.4   39.0    4.3
N = 100   99.4   74.8   90.5   94.2   73.8    4.6
T = 30
N = 10    32.9    4.1   24.6   25.6    9.6    4.7
N = 50    94.7   36.7   66.6   71.1   42.0    4.5
N = 100   99.9   81.1   92.1   94.6   77.2    4.5

C) Power: ρ = 0
T = 10
N = 10     5.4    8.1    8.3    8.1    4.3    7.7
N = 50    10.5    7.0   46.0   43.6   10.5   34.2
N = 100   16.9   23.2   69.1   65.2   22.3   54.5
T = 20
N = 10    11.6    3.7   26.7   26.4    9.6   17.9
N = 50    59.3   34.6   89.6   88.7   60.2   82.4
N = 100   85.9   80.0   99.0   99.0   90.5   97.9
T = 30
N = 10    14.4    4.6   28.3   27.7   10.3   18.2
N = 50    88.6   63.9   97.5   97.4   85.8   94.9
N = 100   99.7   95.7  100.0  100.0   99.9  100.0

Notes: Panels A) and B) present rejection frequencies (in %) for K = 3 under serial correlation of the errors (ρ = 0.2 and ρ = 0.5, respectively) and the null hypothesis. Panel C) presents rejection frequencies (in %) under the alternative hypothesis without serial correlation in the errors. Nominal size is 5%.

C Appendix: Figures

Figure 1: Asymptotic local power of the LM (solid line) and the ∆ test (dashed line) when $\sigma^2_{i,x} \sim \chi^2_1$. [figure omitted]

Figure 2: Asymptotic local power of the LM (solid line) and the ∆ test (dashed line) when $\sigma^2_{i,x} \sim \chi^2_2$. [figure omitted]
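The curves in Figures 1 and 2 are the asymptotic local power implied by Theorem 5: since $V^{-1/2}\tilde s/\sqrt N \overset{d}{\to} N(\psi, I_K)$, the LM statistic is asymptotically non-central $\chi^2_K$, so (with $\alpha = 0.05$ in the figures, and writing the noncentrality as $\lambda$, an assumption of notation rather than an additional result)

```latex
\lim_{N\to\infty} P\left(\mathrm{LM} > \chi^2_{K,1-\alpha}\right)
   = P\left(\chi^2_K(\lambda) > \chi^2_{K,1-\alpha}\right),
\qquad \lambda = \psi'\psi, \quad \psi = \Psi c .
```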