Paolo Paruolo
The power of Lambda Max
2000/5
Università dell'Insubria, Facoltà di Economia
http://eco.uninsubria.it

The views expressed in the working papers reflect the opinions of the authors only, and not necessarily the ones of the Economics Faculty of the University of Insubria.

© Copyright Paolo Paruolo. Printed in Italy in October 2000. Università degli Studi dell'Insubria, Via Ravasi 2, 21100 Varese, Italy. All rights reserved. No part of this paper may be reproduced in any form without permission of the Author.

The power of lambda max
Paolo Paruolo∗
December 2000

Abstract
This paper considers likelihood ratio (LR) cointegration rank tests in vector autoregressive (VAR) models; the local power of the most widely used LR ‘trace’ test is compared with that of the LR ‘lambda max’ test. It is found that neither test uniformly dominates the other. Moreover, it is shown that the asymptotic properties of the estimator of the cointegration rank based on the trace test are shared by a similar estimator based on the lambda max test. These results indicate that both tests are admissible.

Keywords: Cointegration, Likelihood Ratio, Unit roots, Local Power.

1 Introduction
Likelihood ratio (LR) cointegration rank tests, see Johansen (1991a, 1996), are widely used in the empirical literature. The tests compare the maximized likelihood of the model with at most j cointegration vectors with that of the model with at most s cointegration vectors; these tests are indicated in the following by LR(j, s). The best known test is the LR(j, p) test, called the ‘trace’ test, where p is the number of variables in the VAR; the LR(j, j + 1) test is known as the ‘lambda max’ test.
In this paper i) it is shown that the asymptotic properties of the estimator of the cointegration rank based on the trace test are shared by a similar estimator based on the lambda max test; ii) the asymptotic local power of the lambda max test is computed and compared with that of the trace test; it is found that neither test dominates the other uniformly over the local alternative.

The rest of the paper is organized as follows: Section 2 introduces the model and notation, Section 3 presents the selection criteria and their properties. Section 4 reports the limit local power of the tests, calculated via simulation. Section 5 concludes. All proofs are placed in the Appendix. In the following $a := b$ indicates that $a$ is defined by $b$; moreover, for any full column rank matrix $H$, $H_\perp$ indicates a basis of the orthogonal complement of $\mathrm{span}(H)$.

∗ Department of Economics, University of Insubria, Via Ravasi 2, I-21100 Varese, Italy. Email: [email protected]. Partial financial support from Italian MURST grants ex 60% and 40% is gratefully acknowledged. First draft March 2000, revised October 2000.

2 Model and notation
Consider the standard autoregressive I(1) model

\[ \Delta X_t = \alpha\beta' X_{t-1} + (\Upsilon, \mu) U_t + \varepsilon_t, \tag{1} \]

where $X_t$ and $\varepsilon_t$ are $p \times 1$, $U_t := (Z_t' : d_t')'$, $Z_t := (\Delta X_{t-1}' : \dots : \Delta X_{t-k+1}')'$ is $p(k-1) \times 1$, $\varepsilon_t$ is i.i.d. $N(0, \Omega)$¹ and $d_t$ is a vector of deterministic terms. Let $\Pi = \alpha\beta'$. The assumption that $\alpha$, $\beta$ are full rank $p \times r$ matrices, that $\alpha_\perp' \Gamma \beta_\perp$ is of full rank $p - r$, where $\Gamma := I_p - \sum_{i=1}^{k-1} \Upsilon_i$ and $\Upsilon := (\Upsilon_1 : \dots : \Upsilon_{k-1})$, and that no roots of the characteristic polynomial of $X_t$ are on or inside the unit circle except at the point $z = 1$, is called ‘the I(1, r) assumption’ in the following. Under these assumptions, see Johansen (1991a, 1996), $X_t$ contains a random walk component $\sum_{i=1}^{t} \varepsilon_i$ with coefficient $C := \beta_\perp (\alpha_\perp' \Gamma \beta_\perp)^{-1} \alpha_\perp'$. The model (1) when $\alpha$ and $\beta$ are unrestricted matrices with $j$ columns is indicated as H(j).
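As a numerical illustration of the quantities entering the I(1, r) assumption, the following Python sketch computes $\Gamma$, bases of the orthogonal complements $\alpha_\perp$, $\beta_\perp$, and the random walk coefficient $C$. The function names (`orth_complement`, `random_walk_coefficient`) and the parameter values are hypothetical and chosen only for illustration; they do not come from the paper. The orthogonal complement is obtained from a singular value decomposition, which is one of several possible choices of basis; $C$ does not depend on that choice.

```python
import numpy as np

def orth_complement(h):
    """Basis of the orthogonal complement of span(h); h must have full column rank."""
    u, _, _ = np.linalg.svd(h, full_matrices=True)
    return u[:, h.shape[1]:]

def random_walk_coefficient(alpha, beta, upsilons):
    """Return (Gamma, C) with Gamma = I - sum_i Upsilon_i and
    C = beta_perp (alpha_perp' Gamma beta_perp)^{-1} alpha_perp'."""
    p = alpha.shape[0]
    gamma = np.eye(p) - sum(upsilons, np.zeros((p, p)))
    alpha_perp = orth_complement(alpha)
    beta_perp = orth_complement(beta)
    middle = alpha_perp.T @ gamma @ beta_perp
    # Part of the I(1, r) assumption: this (p - r) x (p - r) matrix has full rank.
    if np.linalg.matrix_rank(middle) < middle.shape[0]:
        raise ValueError("alpha_perp' Gamma beta_perp is singular")
    c = beta_perp @ np.linalg.inv(middle) @ alpha_perp.T
    return gamma, c

# Hypothetical example with p = 3, r = 1, k = 2 (illustrative values only).
alpha = np.array([[-0.5], [0.0], [0.0]])
beta = np.array([[1.0], [-1.0], [0.0]])
upsilon_1 = 0.2 * np.eye(3)
gamma, c = random_walk_coefficient(alpha, beta, [upsilon_1])
print(np.round(c, 4))
```

By construction $\beta' C = 0$ and $C\alpha = 0$, so the cointegrating combinations $\beta' X_t$ do not load on the random walk component.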
The LR(j, j + 1) ‘lambda max’ test of H(j) versus H(j + 1) is given by

\[ {}^{\max}Q(j) := -T \ln(1 - \hat\lambda_{j+1}), \tag{2} \]

while the LR(j, p) ‘trace’ test of H(j) versus H(p) is

\[ {}^{\mathrm{tr}}Q(j) := -T \sum_{i=j+1}^{p} \ln(1 - \hat\lambda_i) = \sum_{i=j}^{p-1} {}^{\max}Q(i), \tag{3} \]

where $\hat\lambda_1 > \dots > \hat\lambda_p$ are the eigenvalues solving $|\hat\lambda S_{11} - S_{10} S_{00}^{-1} S_{01}| = 0$, and $S_{hf}$ are the sample moment matrices of $Y_{ht}$, $Y_{ft}$ corrected for $U_t$; the subscripts 0 and 1 indicate $Y_{0t} := \Delta X_t$ and $Y_{1t} := X_{t-1}$ respectively.

Many different models for the deterministic part can be accommodated within specification (1), see Johansen (1992) or Johansen (1996), Section 5.7. Let H(j, h) indicate a submodel of H(j), with $h = 1, \dots, m$, such that H(j, 1) ⊂ H(j, 2) ⊂ ... ⊂ H(j, m) := H(j); for each submodel one can derive the lambda max and trace tests by (2) and (3), for an appropriate definition of the deterministic variables $d_t$ and their coefficient $\mu$. We indicate the corresponding statistics by ${}^{i}_{h}Q(j)$, where $i$ indicates either tr or max and $h$ indicates the submodel H(j, h).

Let ${}^{i}_{h}c(\eta, p - j)$ be the $1 - \eta$ quantile of the asymptotic distribution of ${}^{i}_{h}Q(j)$ under the I(1, r) assumption and correct specification of the deterministic terms present in $d_t$. These quantiles have been extensively tabulated, see e.g. Johansen (1996), Chapters 11 and 15, to which we refer for details.² Let ${}^{i}_{h}R_j = \{ {}^{i}_{h}Q(j) > {}^{i}_{h}c(\eta, p - j) \}$ be the size-$\eta$ rejection region of the test ${}^{i}_{h}Q(j)$, and let ${}^{i}_{h}A_j = \mathcal{X} \setminus {}^{i}_{h}R_j$ be its complement, the acceptance region, where $\mathcal{X}$ is the sample space.

¹ The assumption of Gaussian i.i.d. $\varepsilon_t$ may be relaxed considerably without changing the asymptotics, see e.g. Chan and Wei (1988), Hansen (1992).

² ‘Correct specification of the deterministic terms’ here means that the restrictions of the submodel H(r, h) hold while the ones of submodels H(r, h − 1) or H(r − 1, h) do not. It is well known that the limit distribution is different if the DGP satisfies the restrictions of some H(j, m) model with j < r or m < h. Thus under the null hypothesis the test statistic ${}^{i}_{h}Q(r)$ has many different limit distributions.

3 Selection criteria and their interpretation
Johansen (1992) has proposed a procedure to estimate the cointegration rank, based on the ideas in Pantula (1989); see also Johansen (1996), Chapter 12. For fixed $h$ this procedure starts testing from $j = 0$ and proceeds to $j = 1, 2, \dots$ until a non-rejection is obtained. The corresponding value of $j$ is the estimate of $r$. More formally

\[ \{ {}^{i}_{h}\hat r = j \} = \bigcap_{s=0}^{j-1} {}^{i}_{h}R_s \cap {}^{i}_{h}A_j, \qquad j = 1, \dots, p - 1, \tag{4} \]

while $\{ {}^{i}_{h}\hat r = p \} = \bigcap_{s=0}^{p-1} {}^{i}_{h}R_s$, i.e. if ${}^{i}_{h}Q(j)$ rejects for every $j$, and $\{ {}^{i}_{h}\hat r = 0 \} = {}^{i}_{h}A_0$, i.e. if ${}^{i}_{h}Q(0)$ does not reject. Johansen based this estimator on the trace test, $i = \mathrm{tr}$. We observe here that the same procedure can be applied for $i = \max$, thus defining the ${}^{\max}_{h}\hat r$ estimator.³

The procedure in (4) can also be described as follows. The econometrician performs all the tests ${}^{i}_{h}Q(j)$, $j = 0, 1, \dots, p - 1$, and collects all the values of $j$ that give a non-rejection. Let the set of these indices be called ${}^{i}_{h}I^{A}$, where A stands for ‘acceptance’. The procedure in (4) can then be seen as the procedure that selects the minimum among the elements in ${}^{i}_{h}I^{A}$; one could consider this procedure as an application of a parsimony principle (Ockham's Razor).

The application of the procedure in (4) for different values of $h$ provides several rank estimators ${}^{i}_{1}\hat r, {}^{i}_{2}\hat r, \dots$; a joint estimator of the cointegration rank may be obtained by applying the same ‘parsimony principle’ to $\{ {}^{i}_{h}\hat r, h = 1, 2, \dots, m \}$, selecting ${}^{i}\hat r$ as the minimum value among the obtained estimators. Similarly one can define an estimator for the specification of the deterministic components by selecting the smallest model class $h$ for which ${}^{i}_{h}\hat r$ equals ${}^{i}\hat r$; one thus obtains the combined estimators

\[ {}^{i}\hat r := \min_{1 \le h \le m} \{ {}^{i}_{h}\hat r \}, \qquad {}^{i}\hat h := \min_{1 \le h \le m} \{ h : {}^{i}_{h}\hat r = {}^{i}\hat r \}. \tag{5} \]

The Pantula-type procedure described in (4) can be compared with the standard testing-down procedure obtained by applying a general-to-specific strategy, as described in Johansen (1992, page 391). This procedure starts testing from $j = p - 1$ and proceeds to $j = p - 2, p - 3, \dots$ until a rejection is obtained. The last non-rejected value of $j$ is the estimate of $r$. More formally

\[ \{ {}^{i}_{h}\check r = j \} = \bigcap_{s=j}^{p-1} {}^{i}_{h}A_s \cap {}^{i}_{h}R_{j-1}, \qquad j = p - 1, p - 2, \dots, 1, \tag{6} \]

while $\{ {}^{i}_{h}\check r = 0 \} = \bigcap_{s=0}^{p-1} {}^{i}_{h}A_s$, i.e. when no test rejects, and $\{ {}^{i}_{h}\check r = p \} = {}^{i}_{h}R_{p-1}$, i.e. if ${}^{i}_{h}Q(p - 1)$ rejects.

³ For a different testing sequence based on lambda max see exercise 12.1 in Johansen (1996) and its solution in Hansen and Johansen (1998), p. 126.

The testing-down criterion (6) can also be described in terms of the set of indices that give a non-rejection, ${}^{i}_{h}I^{A}$, introduced above. Let ${}^{i}_{h}I^{R}$ be the set complementary to ${}^{i}_{h}I^{A}$, i.e. the set of integers which give a rejection. Then criterion (6) can be described as the procedure that selects as the estimate $\check r$ of $r$ the maximum among the elements in ${}^{i}_{h}I^{R}$ plus one.⁴ This interpretation shows how the selection procedures in (4) and (6) focus respectively on the first acceptance and the first rejection in sequences that are in reverse order.⁵

Here we want to stress that no special property of the trace test is used in defining the two criteria, and that they can be applied to the lambda max test as well. The asymptotic properties of the estimator ${}^{\mathrm{tr}}_{h}\hat r$ have been investigated by Johansen (1992), while the next proposition shows that the same results apply to ${}^{\max}_{h}\hat r$.

Proposition 1 Under the I(1, r) assumption, let the restrictions of the H(r, h) model hold while the ones of H(r, h − 1) do not; then, as $T$ diverges, $\Pr({}^{\max}_{h}\hat r < r) \to 0$, $\Pr({}^{\max}_{h}\hat r = r) \to 1 - \eta$, $\Pr({}^{\max}\hat r < r) \to 0$, $\Pr({}^{\max}\hat h < h) \to 0$ and $\Pr({}^{\max}\hat r = r, {}^{\max}\hat h = h) \to 1 - \eta$.

The key property in establishing Proposition 1 is that ${}^{i}_{h}Q(j)$ diverges if $j < r$.
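The statistics (2)-(3) and the two selection rules (4) and (6) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the eigenvalues and critical values used in the example are made-up numbers chosen to produce a non-monotone rejection pattern; they are not tabulated quantiles and should not be used for inference.

```python
import math

def lambda_max_stats(eigenvalues, T):
    """Statistics (2): entry j is -T ln(1 - lambda_{j+1}), j = 0, ..., p-1.
    `eigenvalues` must be sorted in decreasing order."""
    return [-T * math.log(1.0 - lam) for lam in eigenvalues]

def trace_stats(eigenvalues, T):
    """Statistics (3): entry j is the sum of the lambda max statistics from j to p-1."""
    qmax = lambda_max_stats(eigenvalues, T)
    return [sum(qmax[j:]) for j in range(len(qmax))]

def rank_first_acceptance(stats, crit):
    """Rule (4): smallest j whose test does not reject; p if every test rejects."""
    for j, (q, c) in enumerate(zip(stats, crit)):
        if q <= c:
            return j
    return len(stats)

def rank_testing_down(stats, crit):
    """Rule (6): one plus the largest j whose test rejects; 0 if no test rejects."""
    rejected = [j for j, (q, c) in enumerate(zip(stats, crit)) if q > c]
    return max(rejected) + 1 if rejected else 0

# Hypothetical inputs (p = 3, T = 100); crit[j] stands in for c(eta, p - j).
eigs = [0.30, 0.05, 0.04]
crit = [20.0, 14.0, 4.0]   # made-up values, NOT tabulated quantiles
qmax = lambda_max_stats(eigs, 100)
print(rank_first_acceptance(qmax, crit), rank_testing_down(qmax, crit))
```

With these made-up inputs the tests reject at j = 0, accept at j = 1 and reject at j = 2, so the two rules disagree: rule (4) stops at the first acceptance, while rule (6) stops at the first rejection met when moving down from j = p − 1.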
Since $\hat\lambda_1, \dots, \hat\lambda_r$ are $O_p(1)$ (they converge to strictly positive limits), one has ${}^{\max}_{h}Q(j) := -T \ln(1 - \hat\lambda_{j+1}) \to \infty$ for $j < r$. This is the same reason that ensures that ${}^{\mathrm{tr}}_{h}Q(j) = \sum_{i=j}^{p-1} {}^{\max}_{h}Q(i) \to \infty$ in the same circumstances. Thus the common properties of the estimators (4) based on the lambda max and the trace tests have a common genesis.

Johansen (1992) shows that under the same assumptions of Proposition 1, one has $\Pr({}^{\mathrm{tr}}_{h}\check r < r) \to 0$ and $\Pr({}^{\mathrm{tr}}_{h}\check r = r) \to c < 1 - \eta$ as $T$ diverges, i.e. that it is difficult to control the size of the overall test, while a smaller value of $r$ is never selected. The following proposition shows that the same applies to ${}^{\max}_{h}\check r$.

Proposition 2 Under the I(1, r) assumption, let the restrictions of the H(r, h) model hold while the ones of H(r, h − 1) do not; then $\Pr({}^{\max}_{h}\check r < r) \to 0$ and $\Pr({}^{\max}_{h}\check r = r) \to c < 1 - \eta$ as $T$ diverges.

The above propositions show that the asymptotic properties of selection criteria (4) and (6) do not depend on which test is used, $i = \max$ or $i = \mathrm{tr}$, and favour estimator (4). Thus one should start testing from the largest eigenvalue $\hat\lambda$.

⁴ With the convention that if ${}^{i}_{h}I^{R}$ is empty (no rejections) the maximum of the empty set is equal to −1. This convention would then deliver 0 as the corresponding estimate of $r$.

⁵ As argued in Johansen (1992, page 390), the Pantula-type selection criterion in (4) can also be interpreted as complying with a general-to-specific strategy.

Figure 1: Power of ${}^{\max}Q(r) = {}^{\mathrm{tr}}Q(r)$ for $p - r = 1$. Full circles indicate simulated data generating processes; lines are obtained by quadratic interpolation.

4 Local power
This section summarizes the asymptotics on the local power of the LR tests for the case of no deterministics; in the rest of the paper we thus drop the subscript $h$ from ${}^{i}_{h}Q$. The asymptotic power of ${}^{\mathrm{tr}}Q$ has been obtained by Johansen (1991b); results for ${}^{\max}Q$ are not reported in Johansen (1991b), even though they can be easily derived from there; this extension is reported here.

For any vector process $a(u)$, $u \in [0, 1]$, let

\[ N(a) := \int (da)\, a' \left( \int a a'\, du \right)^{-1} \int a\, (da)', \]

where all integrals are from 0 to 1 and the argument $u$ has been omitted for brevity. The local alternative is defined by substituting $\alpha\beta'$ with

\[ \alpha\beta' + T^{-1} \alpha_1 \beta_1', \tag{7} \]

where $\alpha$, $\beta$ are $p \times r$ full rank matrices and $\alpha_1$ and $\beta_1$ are taken to be of dimension $p \times 1$ and not to lie in $\mathrm{span}(\alpha)$, $\mathrm{span}(\beta)$ respectively. Let $B := (B_1 : B_2 : B_3')'$ indicate a standard Brownian motion with components respectively of dimension 1, 1, and $p - r - 2$. Let also $K$ denote an Ornstein-Uhlenbeck process of dimension $p - r$, partitioned conformably with $B$, $K := (K_1 : K_2 : B_3')'$, where $K_i(u) := B_i(u) + f_i \int_0^u K_1(s)\, ds$, $i = 1, 2$. Here

\[ f := f_1 := \beta_1' C \alpha_1, \qquad g := f_2 := \left( (\alpha_1' \alpha_\perp (\alpha_\perp' \Omega \alpha_\perp)^{-1} \alpha_\perp' \alpha_1)(\beta_1' C \Omega C' \beta_1) - (\beta_1' C \alpha_1)^2 \right)^{1/2}. \]

Finally we indicate by $\mathrm{eig}_{\max}(V)$ the maximal eigenvalue of the argument matrix $V$, and $\xrightarrow{w}$ indicates weak convergence.

Proposition 3 Let the I(1, r) assumption hold for $\Pi = \alpha\beta'$ and let the I(1, r + 1) assumption hold for $\Pi$ defined in (7); then under the local alternative (7), as $T \to \infty$, ${}^{\max}Q(r) \xrightarrow{w} \mathrm{eig}_{\max}(V)$, ${}^{\mathrm{tr}}Q(r) \xrightarrow{w} \mathrm{tr}(V)$, $V := N(K)$.

Figure 2: Left graph: power of ${}^{\max}Q$ for $p - r = 2$. Right graph: percentage relative power, $100\, {}^{\max}p / {}^{\mathrm{tr}}p$; empty circles indicate that the relative power was insignificantly different from 1.

The limit powers have been computed by simulation for $p - r = 1, 2, 3$, as in Johansen (1991b). The simulations have been performed discretizing the unit interval in $T = 400$ segments and using $10^6$ replications. The values of $(f, g)$ have been chosen in the set $\{0, -3, -6, -9, -12, -15, -18, -21\} \times \{0, 6, 12\}$ as in Johansen (1991b), for a total of 24 data generating processes (DGP). Each DGP is indexed by $(f, g)$, DGP(f, g).

Let ${}^{i}R = {}^{i}R_r$ indicate the event of rejection based on ${}^{i}Q(r)$, $i = \max, \mathrm{tr}$, and let ${}^{i}p := {}^{i}p(f, g) := \Pr({}^{i}R)$ indicate the power of test ${}^{i}Q$ under DGP(f, g).
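A minimal Python sketch of this kind of simulation is given below. It is not the code used for the paper: the Euler discretization of $K$, the function names `limit_stats` and `estimate_power`, and the small default numbers of steps and replications are assumptions made for illustration. The sketch builds one discretized draw of $V = N(K)$ from a matrix of Gaussian increments, returns $\mathrm{eig}_{\max}(V)$ and $\mathrm{tr}(V)$, and estimates power by reusing the same increments under DGP(0, 0) and DGP(f, g), with the empirical quantiles under DGP(0, 0) serving as critical values.

```python
import numpy as np

def limit_stats(dB, f, g):
    """One discretized draw of (eigmax(V), tr(V)) with V = N(K).
    dB: (steps, n) array of Brownian increments, n = p - r."""
    steps, n = dB.shape
    du = 1.0 / steps
    drift = np.zeros(n)
    drift[0] = f                 # f_1 = f acts on K_1
    if n > 1:
        drift[1] = g             # f_2 = g acts on K_2; other components stay Brownian
    K = np.zeros(n)
    S = np.zeros((n, n))         # approximates int K (dK)'
    M = np.zeros((n, n))         # approximates int K K' du
    for t in range(steps):
        dK = dB[t] + drift * K[0] * du   # Euler step: dK_i = dB_i + f_i K_1 du
        S += np.outer(K, dK)
        M += np.outer(K, K) * du
        K = K + dK
    V = S.T @ np.linalg.solve(M, S)      # int (dK)K' (int KK' du)^{-1} int K(dK)'
    eigs = np.linalg.eigvalsh((V + V.T) / 2.0)
    return eigs[-1], np.trace(V)

def estimate_power(n, f, g, reps=200, steps=100, level=0.05, seed=0):
    """Rejection frequencies (lambda max, trace) under DGP(f, g), using common
    innovations and the empirical DGP(0, 0) quantiles as critical values."""
    rng = np.random.default_rng(seed)
    null = np.empty((reps, 2))
    alt = np.empty((reps, 2))
    for i in range(reps):
        dB = rng.standard_normal((steps, n)) / np.sqrt(steps)
        null[i] = limit_stats(dB, 0.0, 0.0)
        alt[i] = limit_stats(dB, f, g)
    crit = np.quantile(null, 1.0 - level, axis=0)
    return (alt > crit).mean(axis=0)

if __name__ == "__main__":
    print(estimate_power(2, -12.0, 6.0))
```

The defaults are far smaller than the $T = 400$ segments and $10^6$ replications described above, so the output of this sketch is only a rough indication; matching the paper's design would require passing `steps=400` and a much larger `reps`.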
Let also ${}^{i}q = 1 - {}^{i}p$, and $\tau := \tau(f, g) := \gamma - {}^{\max}p \cdot {}^{\mathrm{tr}}p$, where $\gamma := \Pr({}^{\max}R \cap {}^{\mathrm{tr}}R)$ indicates the probability of rejection of both tests. In order to make the simulated size of the tests exactly equal to the nominal size (1% or 5%), the empirical quantiles of DGP(0, 0) were used as critical values ${}^{i}c(\eta, p - r)$.⁶

Let also ${}^{i}\hat p$ and $\hat\gamma$ indicate the Monte Carlo frequencies of the events of rejection of the test ${}^{i}Q$ and of both tests; define also $\hat\tau := \hat\gamma - {}^{\max}\hat p \cdot {}^{\mathrm{tr}}\hat p$. It is well known that, as the number $m$ of replications diverges, $\sqrt{m}\,(({}^{\max}\hat p : {}^{\mathrm{tr}}\hat p)' - ({}^{\max}p : {}^{\mathrm{tr}}p)')$ converges in law to a bivariate Gaussian variable with mean 0 and variance covariance matrix

\[ \begin{pmatrix} {}^{\max}p\, {}^{\max}q & \tau \\ \tau & {}^{\mathrm{tr}}p\, {}^{\mathrm{tr}}q \end{pmatrix}, \]

which can be consistently estimated by substituting ${}^{i}p$ and $\tau$ with ${}^{i}\hat p$ and $\hat\tau$. It is simple to note that $\sqrt{m}\,({}^{\max}\hat p - {}^{\mathrm{tr}}\hat p)$ has limit variance equal to ${}^{\max}p\, {}^{\max}q + {}^{\mathrm{tr}}p\, {}^{\mathrm{tr}}q - 2\tau$, so that the Monte Carlo variance is a decreasing function of $\tau$. In order to exploit this feature, each draw of innovations was used to generate all 24 DGPs, as suggested e.g. in Hendry et al. (1990), page 9. This result can also be applied to the comparison of powers of the same test ${}^{i}Q$ for different DGPs. Thus the t-ratio

\[ z := \left( \frac{m}{{}^{\max}\hat p\, {}^{\max}\hat q + {}^{\mathrm{tr}}\hat p\, {}^{\mathrm{tr}}\hat q - 2\hat\tau} \right)^{1/2} ({}^{\max}\hat p - {}^{\mathrm{tr}}\hat p) \xrightarrow{w} N(0, 1) \quad \text{as } m \to \infty \tag{8} \]

was used to test homogeneity of powers.

⁶ Following a common practice, we treat the critical values ${}^{i}c(\eta, p - r)$ as known constants and not as estimated values. This does not affect the consistency of the Monte Carlo, although it would influence the calculation of standard errors; this issue is developed in Paruolo (2001).

In order to save space, the results are reported only graphically. Fig. 1 reports results for $p - r = 1$; in this case ${}^{\max}Q = {}^{\mathrm{tr}}Q$ and the two tests share the same power function, which depends only on $f$. The power function is graphed in Fig. 1 for the 5% and the 1% significance levels.
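The t-ratio (8) can be computed directly from the paired rejection indicators of the two tests. The Python sketch below is an illustration under stated assumptions: the function name `power_difference_z` and the toy input are hypothetical, and the inputs are taken to be 0/1 rejection indicators obtained from the same replications, so that $\hat\tau$ captures their covariance.

```python
import math

def power_difference_z(reject_max, reject_tr):
    """t-ratio (8) for equality of the powers of the lambda max and trace tests.
    Both arguments are equal-length sequences of 0/1 rejection indicators
    computed on the same Monte Carlo replications."""
    m = len(reject_max)
    p_max = sum(reject_max) / m
    p_tr = sum(reject_tr) / m
    # gamma_hat: frequency with which both tests reject
    gamma = sum(int(bool(a) and bool(b)) for a, b in zip(reject_max, reject_tr)) / m
    tau = gamma - p_max * p_tr
    var = p_max * (1.0 - p_max) + p_tr * (1.0 - p_tr) - 2.0 * tau
    if var <= 0.0:
        raise ValueError("estimated variance of the power difference is not positive")
    return math.sqrt(m / var) * (p_max - p_tr)

# Toy input with four replications (illustrative only).
z = power_difference_z([1, 1, 0, 0], [1, 0, 0, 0])
print(round(z, 4))
```

Because the two indicators come from the same draws, a large positive $\hat\tau$ shrinks the estimated variance, which is the gain from reusing innovations that the text describes.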
The number of DGPs in this case reduces to 8; each DGP is marked with a filled circle in the graph. Here and in the following the power function is completed graphically by quadratic interpolation between points. The t-test applied to the comparison of the power of ${}^{\max}Q = {}^{\mathrm{tr}}Q$ for every DGP(f, g) with respect to DGP(0, 0) was always significant; this finding occurred for both ${}^{\max}Q$ and ${}^{\mathrm{tr}}Q$ also for the cases $p - r = 2, 3$.

Fig. 2 reports results for $p - r = 2$; in this case ${}^{\max}Q$ and ${}^{\mathrm{tr}}Q$ are not equal, and their limit local powers are functions of $f$ and $g$. In this and the following graphs only results for the 5% significance level are reported; results for the 1% level were similar. The graph on the left of Fig. 2 reports the power function of the test ${}^{\max}Q$ for the 24 DGPs. It is seen that the power function is not monotonic in $f$, and varies with $g$. These features are common to the power function of the trace test, see Johansen (1991b). The power difference ${}^{\max}p - {}^{\mathrm{tr}}p$ is very small and has range $-1.36\% \le {}^{\max}p - {}^{\mathrm{tr}}p \le 1.14\%$ across DGPs.

The graph on the right of Fig. 2 reports the percentage relative power $100\, {}^{\max}p / {}^{\mathrm{tr}}p$. Filled circles in the graph indicate DGPs for which the z test (8) was significant; empty circles indicate instead the DGPs for which the z test did not reject. It can be seen that neither test turned out to be uniformly more powerful than the other. In particular the ${}^{\mathrm{tr}}Q$ test is more powerful for moderate values of $f$ and $g$, while ${}^{\max}Q$ is superior for extreme values of $f$, $g$. It may be helpful to observe that the absolute power gains in regions of low power are emphasized in the graph of the relative power with respect to regions of high power.⁷ The power gain of the ${}^{\mathrm{tr}}Q$ test (at most equal to 1.36%) appears slightly larger than that of the ${}^{\max}Q$ test (at most equal to 1.14%).

Fig. 3 reports results for $p - r = 3$.
The power difference ${}^{\max}p - {}^{\mathrm{tr}}p$ is not as small as in the previous case and has range $-2.52\% \le {}^{\max}p - {}^{\mathrm{tr}}p \le 4.59\%$ across DGPs. The graph on the left of Fig. 3 reports the power function of the test ${}^{\max}Q$ for the 24 DGPs, while the graph on the right of Fig. 3 reports the percentage relative power $100\, {}^{\max}p / {}^{\mathrm{tr}}p$. Again it can be seen that the ${}^{\max}Q$ test is more powerful for extreme values of $f$ and $g$, while, unlike in the case $p - r = 2$, the power gain of the ${}^{\max}Q$ test (at most equal to 4.59%) appears slightly larger than that of the ${}^{\mathrm{tr}}Q$ test (at most equal to 2.52%).⁸

⁷ Where typically ${}^{\max}Q$ is more powerful.

Figure 3: Left graph: power of ${}^{\max}Q$ for $p - r = 3$. Right graph: percentage relative power, $100\, {}^{\max}p / {}^{\mathrm{tr}}p$; empty circles indicate that the relative power was insignificantly different from 1.

5 Conclusions
The results of this paper show that the ${}^{\max}Q$ test and the ${}^{\mathrm{tr}}Q$ test share many asymptotic properties, and that neither test uniformly dominates the other in terms of local power.

⁸ One referee pointed out that the local alternative (7) is 1-dimensional, and it is the kind of alternative that the lambda max test is designed for. He thus found it surprising that the trace test does better than the lambda max test for some DGPs.

References
Chan N. H. and C. Z. Wei (1988), “Limiting distributions of least-squares estimates of unstable autoregressive processes”, Annals of Statistics 16, 367-401.
Hansen B. (1992), “Convergence to stochastic integrals for dependent heterogeneous processes”, Econometric Theory 8, 489-500.
Hansen P. R. and S. Johansen (1998), Workbook on cointegration, Oxford: Oxford University Press.
Hendry D. F., A. J. Neale and N. R. Ericsson (1990), “Pc-Naive, an interactive program for Monte Carlo experimentation in econometrics”, Institute of Economics and Statistics, University of Oxford.
Johansen S. (1991a), “Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models”, Econometrica 59, 1551-80.
Johansen S. (1991b), “The power function of the likelihood ratio test for cointegration”, in J. Gruber (ed.) Econometric decision models: new methods of modelling and applications, 323-35, Springer Verlag, New York.
Johansen S. (1992), “Determination of cointegration rank in the presence of a linear trend”, Oxford Bulletin of Economics and Statistics 54, 383-97.
Johansen S. (1996), Likelihood-based inference in cointegrated vector auto-regressive models, Oxford: Oxford University Press.
Paruolo P. (2001), “On Monte Carlo Estimation of Relative Power”, mimeo, Università dell'Insubria, Varese, Italy.
Pantula S. G. (1989), “Testing for unit roots in time series data”, Econometric Theory 5, 256-71.

Appendix
Proof of Proposition 1. Apply Theorem 2 on page 390 and Theorem 2 on page 392 in Johansen (1992) to ${}^{\max}_{h}Q$, recognizing that Johansen's Theorem 1 holds also for ${}^{\max}_{h}Q$.

Proof of Proposition 2. Under the hypotheses of the proposition, $\hat\lambda_j = O_p(1)$ for $j \le r$, see e.g. Johansen (1996), and thus ${}^{\max}_{h}Q(j) \to \infty$ for all $j < r$. This proves that $\Pr({}^{\max}_{h}\check r < r) \to 0$. In order for ${}^{\max}_{h}\check r$ to select the value $r$, the event $A := \bigcap_{j=r+1}^{p} {}^{\max}_{h}R_j \cap {}^{\max}_{h}A_r$ must occur; this event has limit probability $c := \Pr(A) = \Pr((\bigcap_{j=r+1}^{p} {}^{\max}_{h}R_j) \cap {}^{\max}_{h}A_r) < \Pr({}^{\max}_{h}A_r) = 1 - \eta$, since $\Pr({}^{\max}_{h}R_j)$ is strictly less than 1 for $j > r$.

Proof of Proposition 3. As in the standard proof, see Johansen (1996), the ${}^{\max}Q(r)$ and the ${}^{\mathrm{tr}}Q(r)$ statistics converge weakly to the maximal eigenvalue and the trace of a limit stochastic matrix $F$. In Johansen (1991b) it is shown that $F = N(K)$; thus by the continuous mapping theorem the results for ${}^{\max}Q(r)$ follow.