The power of Lambda Max - Università dell`Insubria

Paolo Paruolo
The power of Lambda Max
2000/5
UNIVERSITÀ DELL'INSUBRIA
FACOLTÀ DI ECONOMIA
http://eco.uninsubria.it
The views expressed in the working papers reflect the
opinions of the authors only, and not necessarily the ones
of the Economics Faculty of the University of Insubria.
© Copyright Paolo Paruolo
Printed in Italy in October 2000
Università degli Studi dell'Insubria
Via Ravasi 2, 21100 Varese, Italy
All rights reserved. No part of this paper may be reproduced in
any form without permission of the Author.
The power of lambda max
Paolo Paruolo∗
December 2000
Abstract
This paper considers likelihood ratio (LR) cointegration rank tests in
vector autoregressive models (VAR); the local power of the most widely
used LR ‘trace’ test is compared with the LR ‘lambda max’ test. It is
found that neither test uniformly dominates the other one. Moreover it is
shown that the asymptotic properties of the estimator of the cointegration
rank based on the trace test are shared by a similar estimator based on the
lambda max test. These results indicate that both tests are admissible.
Keywords: Cointegration, Likelihood Ratio, Unit roots, Local Power.
1 Introduction
Likelihood ratio (LR) cointegration rank tests, see Johansen (1991a, 1996), are
widely used in the empirical literature. The tests compare the maximized likelihood of the model with at most j cointegration vectors with the one with at most
s cointegration vectors; these tests are indicated in the following by LR(j, s). The
best known test is the LR(j, p) test, called the ‘trace’ test, where p is the number
of variables in the VAR; the LR(j, j + 1) test is known as the ‘lambda max’ test.
In this paper i) it is shown that the asymptotic properties of the estimator of
the cointegration rank based on the trace test are shared by a similar estimator
based on the lambda-max test; ii) the asymptotic local power of the lambda-max
test is computed and compared with the one of the trace test; it is found that
neither test dominates the other one uniformly over the local alternative.
The rest of the paper is organized as follows: Section 2 introduces the model
and notation, Section 3 presents the selection criteria and their properties. Section 4 reports the limit local power of the tests, calculated via simulation. Section
5 concludes. All proofs are placed in the Appendix. In the following a := b indicates that a is defined by b; moreover, for any full column rank matrix H, H⊥
indicates a basis of the orthogonal complement of span(H).
∗ Department of Economics, University of Insubria, Via Ravasi 2, I-21100 Varese, Italy. Email: [email protected]. Partial financial support from Italian MURST grants ex 60% and 40% is gratefully acknowledged. First draft March 2000, revised October 2000.
2 Model and notation
Consider the standard autoregressive I(1) model
$$\Delta X_t = \alpha\beta' X_{t-1} + (\Upsilon, \mu) U_t + \epsilon_t \qquad (1)$$
where $X_t$ and $\epsilon_t$ are $p \times 1$, $U_t := (Z_t' : d_t')'$, $Z_t := (\Delta X_{t-1}' : \dots : \Delta X_{t-k+1}')'$ is $p(k-1) \times 1$, $\epsilon_t$ is i.i.d. $N(0, \Omega)$$^{1}$ and $d_t$ is a vector of deterministic terms. Let $\Pi = \alpha\beta'$. The assumption that $\alpha$, $\beta$ are full rank $p \times r$ matrices, $\alpha_\perp' \Gamma \beta_\perp$ is of full rank $p - r$, where $\Gamma := I_p - \sum_{i=1}^{k-1} \Upsilon_i$, $\Upsilon := (\Upsilon_1 : \dots : \Upsilon_{k-1})$, and that no roots of the characteristic polynomial of $X_t$ are on or inside the unit circle except at the point $z = 1$, is called ‘the $I(1, r)$ assumption’ in the following. Under these assumptions, see Johansen (1991a, 1996), $X_t$ contains a random walk component $\sum_{i=1}^{t} \epsilon_i$ with coefficient $C := \beta_\perp (\alpha_\perp' \Gamma \beta_\perp)^{-1} \alpha_\perp'$.
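As a small numerical illustration of the random walk coefficient C, the following sketch treats the simplest case p = 2, r = 1, k = 1 (so that Γ = I_2 and α⊥'Γβ⊥ is a scalar). The parameter values and the function names `orthogonal_complement_2d` and `random_walk_coefficient` are hypothetical and are not taken from the paper; the sketch only checks the two annihilation properties β'C = 0 and Cα = 0 that follow from the definition of C.

```python
def orthogonal_complement_2d(v):
    """Return a vector spanning the orthogonal complement of the 2-vector v."""
    return (-v[1], v[0])


def random_walk_coefficient(alpha, beta, gamma):
    """C = beta_perp (alpha_perp' Gamma beta_perp)^{-1} alpha_perp' for p = 2, r = 1."""
    a_perp = orthogonal_complement_2d(alpha)
    b_perp = orthogonal_complement_2d(beta)
    # With p - r = 1 the matrix alpha_perp' Gamma beta_perp reduces to a scalar.
    s = sum(a_perp[i] * gamma[i][j] * b_perp[j] for i in range(2) for j in range(2))
    if s == 0.0:
        raise ValueError("alpha_perp' Gamma beta_perp is singular: the I(1, r) assumption fails")
    return [[b_perp[i] * a_perp[j] / s for j in range(2)] for i in range(2)]


# Hypothetical parameter values (assumptions made for this example only).
alpha = (-0.5, 0.25)
beta = (1.0, -1.0)
gamma = [[1.0, 0.0], [0.0, 1.0]]  # k = 1, so Gamma = I_2

C = random_walk_coefficient(alpha, beta, gamma)
beta_C = [sum(beta[i] * C[i][j] for i in range(2)) for j in range(2)]    # beta' C
C_alpha = [sum(C[i][j] * alpha[j] for j in range(2)) for i in range(2)]  # C alpha
print(C, beta_C, C_alpha)
```

Both `beta_C` and `C_alpha` are numerically zero: the random walk component does not load on the cointegrating direction, and shocks in the direction of α have no permanent effect.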
The model (1) when $\alpha$ and $\beta$ are unrestricted matrices with $j$ columns is indicated as $H(j)$. The $LR(j, j+1)$ ‘lambda max’ test of $H(j)$ versus $H(j+1)$ is given by
$${}^{\max}Q(j) := -T \ln(1 - \hat\lambda_{j+1}), \qquad (2)$$
while the $LR(j, p)$ ‘trace’ test of $H(j)$ versus $H(p)$ is
$${}^{\mathrm{tr}}Q(j) := -T \sum_{i=j+1}^{p} \ln(1 - \hat\lambda_i) = \sum_{i=j}^{p-1} {}^{\max}Q(i), \qquad (3)$$
where $\hat\lambda_1 > \dots > \hat\lambda_p$ are the eigenvalues that solve $|\hat\lambda S_{11} - S_{10} S_{00}^{-1} S_{01}| = 0$, and $S_{hf}$ are the sample moment matrices of $Y_{ht}$, $Y_{ft}$ corrected for $U_t$; the subscripts 0 and 1 indicate $Y_{0t} := \Delta X_t$ and $Y_{1t} := X_{t-1}$ respectively.
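The relation between (2) and (3) can be sketched numerically as follows. The eigenvalues and the sample size below are hypothetical values chosen for illustration, and the function names are ours; the sketch takes the eigenvalues as given and does not compute them from the moment matrices.

```python
import math


def lambda_max_statistics(eigenvalues, T):
    """Return [Q_max(0), ..., Q_max(p-1)], where Q_max(j) = -T ln(1 - lambda_{j+1})."""
    ordered = sorted(eigenvalues, reverse=True)  # lambda_1 > ... > lambda_p
    return [-T * math.log(1.0 - lam) for lam in ordered]


def trace_statistics(eigenvalues, T):
    """Return [Q_tr(0), ..., Q_tr(p-1)], where Q_tr(j) = sum_{i=j}^{p-1} Q_max(i)."""
    q_max = lambda_max_statistics(eigenvalues, T)
    return [sum(q_max[j:]) for j in range(len(q_max))]


# Hypothetical eigenvalues and sample size (assumptions for this example only).
eigenvalues = [0.30, 0.12, 0.02]
T = 100

q_max = lambda_max_statistics(eigenvalues, T)
q_tr = trace_statistics(eigenvalues, T)
for j, (a, b) in enumerate(zip(q_max, q_tr)):
    print(f"j = {j}: lambda max = {a:.3f}, trace = {b:.3f}")
```

For j = p − 1 the two statistics coincide, since the sum in (3) then contains a single term.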
Many different models on the deterministic part can be accommodated within specification (1), see Johansen (1992) or Johansen (1996) Section 5.7. Let $H(j, h)$ indicate a submodel of $H(j)$, with $h = 1, \dots, m$, such that $H(j, 1) \subset H(j, 2) \subset \dots \subset H(j, m) := H(j)$; for each submodel one can derive the trace and lambda-max tests by (2) and (3), for appropriate definitions of the deterministic variables $d_t$ and their coefficient $\mu$. We indicate the corresponding statistics by ${}^{i}_{h}Q(j)$, where $i$ indicates either tr or max and $h$ indicates the submodel $H(j, h)$.
Let ${}^{i}_{h}c(\eta, p - j)$ be the $1 - \eta$ quantile of the asymptotic distribution of ${}^{i}_{h}Q(j)$ under the $I(1, r)$ assumption and correct specification of the deterministic terms present in $d_t$. These quantiles have been extensively tabulated, see e.g. Johansen (1996), Chapters 11, 15 to which we refer for details.$^{2}$ Let ${}^{i}_{h}R_j = \{{}^{i}_{h}Q(j) > {}^{i}_{h}c(\eta, p - j)\}$ be the size-$\eta$ rejection region of the test ${}^{i}_{h}Q(j)$, and let ${}^{i}_{h}A_j = \mathcal{X} \setminus {}^{i}_{h}R_j$ be its complement, the acceptance region, where $\mathcal{X}$ is the sample space.
1 The assumption of Gaussian i.i.d. $\epsilon_t$ may be relaxed considerably without changing the asymptotics, see e.g. Chan and Wei (1988), Hansen (1992).
2 ‘Correct specification of the deterministic terms’ here means that the restrictions of the submodel $H(r, h)$ hold while the ones of submodels $H(r, h - 1)$ or $H(r - 1, h)$ do not. It is well known that the limit distribution is different if the DGP satisfies the restrictions of some $H(j, m)$ model with $j < r$ or $m < h$. Thus under the null hypothesis the test statistic ${}^{i}_{h}Q(r)$ presents many different limit distributions.
3 Selection criteria and their interpretation
Johansen (1992) has proposed a procedure to estimate the cointegration rank,
based on the ideas in Pantula (1989), see also Johansen (1996) chapter 12. For
fixed h this procedure starts testing from j = 0 and proceeds to j = 1, 2, ... until
a non-rejection is obtained. The corresponding value of j is the estimate of r.
More formally
$$\{{}^{i}_{h}\hat r = j\} = \bigcap_{s=0}^{j-1} {}^{i}_{h}R_s \cap {}^{i}_{h}A_j, \qquad j = 1, \dots, p - 1, \qquad (4)$$
while $\{{}^{i}_{h}\hat r = p\} = \bigcap_{s=0}^{p-1} {}^{i}_{h}R_s$, i.e. if ${}^{i}_{h}Q(j)$ rejects for every $j$, and $\{{}^{i}_{h}\hat r = 0\} = {}^{i}_{h}A_0$, i.e. if ${}^{i}_{h}Q(0)$ does not reject. Johansen based this estimator on the trace test, $i = \mathrm{tr}$. We observe here that the same procedure can be applied for $i = \max$, thus defining the ${}^{\max}_{h}\hat r$ estimator.$^{3}$
The procedure in (4) can also be described as follows. The econometrician performs all the tests ${}^{i}_{h}Q(j)$, $j = 0, 1, \dots, p - 1$ and collects all the values of $j$ that give a non-rejection. Let the set of these indices be called ${}^{i}_{h}I^{A}$, where $A$ stands for ‘acceptance’. The procedure in (4) can then be seen as the procedure that selects the minimum among the elements in ${}^{i}_{h}I^{A}$; one could consider this procedure as an application of a parsimony principle (Ockham’s Razor).
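The description of (4) as the minimum of the acceptance set can be sketched as follows. The test statistics and critical values are hypothetical numbers, not tabulated quantiles, and the function names are ours.

```python
def acceptance_set(stats, crit):
    """Indices j whose statistic Q(j) does not exceed its critical value c(eta, p - j)."""
    return [j for j, (q, c) in enumerate(zip(stats, crit)) if q <= c]


def pantula_rank(stats, crit):
    """Rank estimate (4): the smallest non-rejected j, or p if every test rejects."""
    accepted = acceptance_set(stats, crit)
    return min(accepted) if accepted else len(stats)


# Hypothetical statistics Q(0), Q(1), Q(2) and critical values for p = 3.
crit = [30.0, 15.0, 4.0]
print(pantula_rank([40.0, 18.0, 2.0], crit))  # rejects at j = 0, 1 and accepts at j = 2
print(pantula_rank([40.0, 10.0, 6.0], crit))  # first acceptance at j = 1
print(pantula_rank([40.0, 18.0, 6.0], crit))  # every test rejects, so the estimate is p
```

Nothing in the function depends on whether `stats` contains trace or lambda max statistics; only the critical values change.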
The application of the procedure in (4) for different values of $h$ provides several rank estimators ${}^{i}_{1}\hat r, {}^{i}_{2}\hat r, \dots$; a joint estimator of the cointegration rank may be obtained applying the same ‘parsimony principle’ to $\{{}^{i}_{h}\hat r, h = 1, 2, \dots, m\}$ by selecting ${}^{i}\hat r$ as the minimum value among the obtained estimators. Similarly one can define an estimator for the specification of the deterministic components by selecting the smallest model class $h$ for which ${}^{i}_{h}\hat r$ is equal to ${}^{i}\hat r$; one thus obtains the combined estimators
$${}^{i}\hat r := \min_{1 \le h \le m} \{{}^{i}_{h}\hat r\}, \qquad {}^{i}\hat h := \min_{1 \le h \le m} \{h : {}^{i}_{h}\hat r = {}^{i}\hat r\}. \qquad (5)$$
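A minimal sketch of the combined estimators in (5), assuming the rank estimates for the submodels h = 1, ..., m have already been computed; the input list and the function name `joint_estimate` are hypothetical.

```python
def joint_estimate(rank_by_model):
    """Combined estimators (5).

    rank_by_model[h - 1] holds the rank estimate obtained in submodel h, h = 1, ..., m.
    Returns (r_hat, h_hat): the minimal rank and the smallest h that attains it.
    """
    r_hat = min(rank_by_model)
    h_hat = rank_by_model.index(r_hat) + 1  # list.index returns the first match
    return r_hat, h_hat


# Hypothetical rank estimates for m = 3 nested deterministic specifications.
print(joint_estimate([2, 1, 1]))
```

In this example the minimal rank 1 is attained by submodels 2 and 3, and the smaller of the two is selected.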
The Pantula-type procedure described in (4) can be compared with the standard testing-down procedure obtained applying a general-to-specific strategy, as
described in Johansen (1992, page 391). This procedure starts testing from
j = p − 1 and proceeds to j = p − 2, p − 3, ... until a rejection is obtained.
The last non rejected value of j is the estimate of r. More formally
$$\{{}^{i}_{h}\check r = j\} = \bigcap_{s=j}^{p-1} {}^{i}_{h}A_s \cap {}^{i}_{h}R_{j-1}, \qquad j = p - 1, p - 2, \dots, 1, \qquad (6)$$
while $\{{}^{i}_{h}\check r = 0\} = \bigcap_{s=0}^{p-1} {}^{i}_{h}A_s$, i.e. when all tests do not reject, and $\{{}^{i}_{h}\check r = p\} = {}^{i}_{h}R_{p-1}$, i.e. if ${}^{i}_{h}Q(p - 1)$ rejects.
3 For a different testing sequence based on lambda max see exercise 12.1 in Johansen (1996) and its solution in Hansen and Johansen (1998), p. 126.
The testing-down criterion (6) can also be described in terms of the set of indices that give a non-rejection, ${}^{i}_{h}I^{A}$, introduced above. Let ${}^{i}_{h}I^{R}$ be the set complementary to ${}^{i}_{h}I^{A}$, i.e. the set of integers which give a rejection. Then criterion (6) can be described as the procedure that selects as the estimate $\check r$ of $r$ the maximum among the elements in ${}^{i}_{h}I^{R}$ plus one.$^{4}$ This interpretation shows how the selection procedures in (4) and (6) focus respectively on the first acceptance and the first rejection in sequences that are in reverse order.$^{5}$
Here we want to stress that no special property of the trace test is used in
defining the two criteria, and that they can be applied to the lambda max test
as well. The asymptotic properties of the estimator ${}^{\mathrm{tr}}_{h}\hat r$ have been investigated by Johansen (1992), while the next proposition shows that the same results apply to ${}^{\max}_{h}\hat r$.
Proposition 1 Under the $I(1, r)$ assumption, let the restrictions of the $H(r, h)$ model hold while the ones of $H(r, h - 1)$ do not; then, as $T$ diverges, $\Pr({}^{\max}_{h}\hat r < r) \to 0$, $\Pr({}^{\max}_{h}\hat r = r) \to 1 - \eta$, $\Pr({}^{\max}\hat r < r) \to 0$, $\Pr({}^{\max}\hat h < h) \to 0$ and $\Pr({}^{\max}\hat r = r, {}^{\max}\hat h = h) \to 1 - \eta$.
The key property in establishing Proposition 1 is that ${}^{i}_{h}Q(j)$ diverges if $j < r$. Since $\hat\lambda_1, \dots, \hat\lambda_r$ are $O_p(1)$, one has ${}^{\max}_{h}Q(j) := -T \ln(1 - \hat\lambda_{j+1}) \to \infty$ for $j < r$. This is the same reason that ensures that ${}^{\mathrm{tr}}_{h}Q(j) = \sum_{i=j}^{p-1} {}^{\max}_{h}Q(i) \to \infty$ in the same circumstances. Thus the common properties of the estimators (4) based on the lambda max and the trace tests have a common genesis.
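The divergence argument can be illustrated with a purely deterministic sketch. It assumes, for illustration only, that an eigenvalue associated with j < r stays at a fixed positive value (here 0.2), while an eigenvalue associated with j ≥ r shrinks at rate 1/T (here 5/T); both numbers are hypothetical.

```python
import math


def lambda_max_stat(lam, T):
    """-T ln(1 - lam), the lambda max statistic for a given eigenvalue and sample size."""
    return -T * math.log(1.0 - lam)


for T in (100, 1000, 10000):
    diverging = lambda_max_stat(0.2, T)     # eigenvalue bounded away from zero
    bounded = lambda_max_stat(5.0 / T, T)   # eigenvalue of order 1/T
    print(f"T = {T:6d}: fixed eigenvalue -> {diverging:10.2f}, 1/T eigenvalue -> {bounded:.4f}")
```

The first statistic grows linearly in T, so any fixed critical value is eventually exceeded; the second stays close to 5, which is why the tests for j ≥ r have a non-degenerate limit distribution.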
Johansen (1992) shows that under the same assumptions as Proposition 1, one has $\Pr({}^{\mathrm{tr}}_{h}\check r < r) \to 0$ and $\Pr({}^{\mathrm{tr}}_{h}\check r = r) \to c < 1 - \eta$ as $T$ diverges, i.e. that it is difficult to control the size of the overall test, while a smaller value of $r$ is never selected. The following proposition shows that the same applies to ${}^{\max}_{h}\check r$.
Proposition 2 Under the $I(1, r)$ assumption, let the restrictions of the $H(r, h)$ model hold while the ones of $H(r, h - 1)$ do not; then $\Pr({}^{\max}_{h}\check r < r) \to 0$ and $\Pr({}^{\max}_{h}\check r = r) \to c < 1 - \eta$ as $T$ diverges.
The above propositions show that the asymptotic properties of selection criteria (4) and (6) do not depend on which test is used between i = max, tr, and
favour estimator (4). Thus one should start testing from the largest λ̂ eigenvalue.
4 With the convention that if ${}^{i}_{h}I^{R}$ is empty (no rejections) the maximum of the empty set is equal to −1. This convention would then deliver 0 as the corresponding estimate of $r$.
5 As argued in Johansen (1992, page 390), also the Pantula-type selection criterion in (4) can be interpreted as complying with a general-to-specific strategy.
Figure 1: Power of ${}^{\max}Q(r) = {}^{\mathrm{tr}}Q(r)$ for $p - r = 1$. Full circles indicate simulated data generating processes; lines are obtained by quadratic interpolation.
4 Local power
This section summarizes the asymptotics on the local power of the LR tests for the case of no deterministics; in the rest of the paper we thus drop the subscript $h$ from ${}^{i}_{h}Q$. The asymptotic power of ${}^{\mathrm{tr}}Q$ has been obtained by Johansen (1991b); results for ${}^{\max}Q$ are not reported in Johansen (1991b), even though they can be easily derived from there; this extension is reported here. For any vector process $a(u)$, $u \in [0, 1]$, let
$$N(a) := \int (da)a' \left( \int aa'\,du \right)^{-1} \int a(da)',$$
where all integrals are from 0 to 1 and the argument $u$ has been omitted for brevity. The local alternative is defined substituting $\alpha\beta'$ with
$$\alpha\beta' + T^{-1} \alpha_1 \beta_1', \qquad (7)$$
where $\alpha$, $\beta$ are $p \times r$ full rank matrices and $\alpha_1$ and $\beta_1$ are taken to be of dimension $p \times 1$ and not to lie in $\mathrm{span}(\alpha)$, $\mathrm{span}(\beta)$ respectively. Let $B := (B_1 : B_2 : B_3')'$ indicate a standard Brownian motion with components respectively of dimension 1, 1, and $p - r - 2$. Let also $K$ denote an Ornstein-Uhlenbeck process of dimension $p - r$, partitioned conformably with $B$, $K := (K_1 : K_2 : B_3')'$, where $K_i(u) := B_i(u) + f_i \int_0^u K_1(s)\,ds$, $i = 1, 2$. Here $f := f_1 := \beta_1' C \alpha_1$ and $g := f_2 := \left( (\alpha_1' \alpha_\perp (\alpha_\perp' \Omega \alpha_\perp)^{-1} \alpha_\perp' \alpha_1)(\beta_1' C \Omega C' \beta_1) - (\beta_1' C \alpha_1)^2 \right)^{1/2}$. Finally we indicate by $\mathrm{eig}_{\max}(V)$ the maximal eigenvalue of the argument matrix $V$, and $\xrightarrow{w}$ indicates weak convergence.
Proposition 3 Let the $I(1, r)$ assumption hold for $\Pi = \alpha\beta'$ and let the $I(1, r + 1)$ assumption hold for $\Pi$ defined in (7); then under the local alternative (7) as $T \to \infty$, ${}^{\max}Q(r) \xrightarrow{w} \mathrm{eig}_{\max}(V)$, ${}^{\mathrm{tr}}Q(r) \xrightarrow{w} \mathrm{tr}(V)$, $V := N(K)$.
Figure 2: Left graph: power of ${}^{\max}Q$ for $p - r = 2$. Right graph: percentage relative power, $100\,{}^{\max}p/{}^{\mathrm{tr}}p$; empty circles indicate that the relative power was insignificantly different from 1.
The limit powers have been computed by simulation for $p - r = 1, 2, 3$, as in Johansen (1991b). The simulations have been performed discretizing the unit interval in $T = 400$ segments and using $10^6$ replications. The values of $(f, g)$ have been chosen in the set $\{0, -3, -6, -9, -12, -15, -18, -21\} \times \{0, 6, 12\}$ as in Johansen (1991b), for a total of 24 data generating processes (DGPs). Each DGP is indexed by $(f, g)$ and denoted DGP$(f, g)$.
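The discretization described above can be sketched for the case p − r = 2 as follows. This is a minimal illustration under our own assumptions, not the code used for the paper: the Euler scheme for K, the function name `simulate_limit_statistics` and the seed handling are hypothetical choices, and a single draw is generated instead of a full Monte Carlo experiment.

```python
import math
import random


def simulate_limit_statistics(f, g, T=400, rng=None):
    """One draw of (eigmax(V), tr(V)), V = N(K), for p - r = 2 on a grid of T steps.

    K_1 and K_2 follow dK_i = dB_i + f_i K_1 du with f_1 = f and f_2 = g,
    approximated by an Euler scheme (an assumption of this sketch).
    """
    rng = rng or random.Random(0)
    k = [0.0, 0.0]
    m1 = [[0.0, 0.0], [0.0, 0.0]]  # approximates int (dK) K'
    m2 = [[0.0, 0.0], [0.0, 0.0]]  # approximates int K K' du
    scale = 1.0 / math.sqrt(T)
    for _ in range(T):
        dk = [scale * rng.gauss(0.0, 1.0) + f * k[0] / T,
              scale * rng.gauss(0.0, 1.0) + g * k[0] / T]
        for i in range(2):
            for j in range(2):
                m1[i][j] += dk[i] * k[j]
                m2[i][j] += k[i] * k[j] / T
        k = [k[0] + dk[0], k[1] + dk[1]]
    det = m2[0][0] * m2[1][1] - m2[0][1] * m2[1][0]
    inv = [[m2[1][1] / det, -m2[0][1] / det], [-m2[1][0] / det, m2[0][0] / det]]
    # V = M1 M2^{-1} M1'
    a = [[sum(m1[i][l] * inv[l][j] for l in range(2)) for j in range(2)] for i in range(2)]
    v = [[sum(a[i][l] * m1[j][l] for l in range(2)) for j in range(2)] for i in range(2)]
    trace = v[0][0] + v[1][1]
    det_v = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    # Largest eigenvalue of a symmetric 2 x 2 matrix in closed form.
    eig_max = 0.5 * (trace + math.sqrt(max(trace * trace - 4.0 * det_v, 0.0)))
    return eig_max, trace


# Same seed for two DGPs, so that both are driven by the same innovations.
print(simulate_limit_statistics(0.0, 0.0, rng=random.Random(1)))
print(simulate_limit_statistics(-12.0, 6.0, rng=random.Random(1)))
```

Repeating the call over many seeds and comparing each draw with a critical value gives the simulated rejection frequencies. Since V is positive semidefinite and 2 × 2, each draw satisfies tr(V)/2 ≤ eigmax(V) ≤ tr(V).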
Let ${}^{i}R = {}^{i}R_r$ indicate the event of rejection based on ${}^{i}Q(r)$, $i = \max, \mathrm{tr}$, and let ${}^{i}p := {}^{i}p(f, g) := \Pr({}^{i}R)$ indicate the power of test ${}^{i}Q$ under DGP$(f, g)$. Let also ${}^{i}q = 1 - {}^{i}p$, and $\tau := \tau(f, g) := \gamma - {}^{\max}p \cdot {}^{\mathrm{tr}}p$, where $\gamma := \Pr({}^{\max}R \cap {}^{\mathrm{tr}}R)$ indicates the probability of rejection of both tests. In order to make the simulated size of the tests exactly equal to the nominal size (1% or 5%), the empirical quantiles of the DGP$(0, 0)$ were used as critical values ${}^{i}c(\eta, p - r)$.$^{6}$
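The calibration of the critical values on the null DGP can be sketched as follows. The order-statistic definition of the empirical quantile used here is one possible convention and is an assumption of this sketch, as are the function names and the artificial null draws.

```python
import math


def empirical_critical_value(null_draws, eta):
    """Empirical 1 - eta quantile of the statistics simulated under DGP(0, 0)."""
    ordered = sorted(null_draws)
    n = len(ordered)
    # Smallest order statistic with at least a fraction 1 - eta of the draws at or
    # below it; the tolerance guards against rounding error in (1 - eta) * n.
    k = math.ceil((1.0 - eta) * n - 1e-9)
    return ordered[max(k, 1) - 1]


def rejection_frequency(draws, critical_value):
    """Monte Carlo frequency of the event {Q > critical value}."""
    return sum(q > critical_value for q in draws) / len(draws)


# Artificial 'null' draws 1, 2, ..., 100, used only to check the bookkeeping.
null_draws = [float(i) for i in range(1, 101)]
crit = empirical_critical_value(null_draws, 0.05)
size = rejection_frequency(null_draws, crit)
print(crit, size)
```

By construction the rejection frequency on the null draws equals the nominal size when (1 − η) times the number of draws is an integer and there are no ties; the same critical value is then applied to the draws from the other DGPs.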
Let also ${}^{i}\hat p$ and $\hat\gamma$ indicate the Monte Carlo frequencies of the events of rejection of the test ${}^{i}Q$ and of both tests; define also $\hat\tau := \hat\gamma - {}^{\max}\hat p \cdot {}^{\mathrm{tr}}\hat p$. It is well known that, as the number $m$ of replications diverges, $\sqrt{m}(({}^{\max}\hat p : {}^{\mathrm{tr}}\hat p)' - ({}^{\max}p : {}^{\mathrm{tr}}p)')$ converges in law to a bivariate Gaussian variable with mean 0 and variance covariance matrix
$$\begin{pmatrix} {}^{\max}p\,{}^{\max}q & \tau \\ \tau & {}^{\mathrm{tr}}p\,{}^{\mathrm{tr}}q \end{pmatrix}$$
which can be consistently estimated by substituting ${}^{i}p$ and $\tau$ with ${}^{i}\hat p$ and $\hat\tau$. It is simple to note that $\sqrt{m}({}^{\max}\hat p - {}^{\mathrm{tr}}\hat p)$ has limit variance equal to ${}^{\max}p\,{}^{\max}q + {}^{\mathrm{tr}}p\,{}^{\mathrm{tr}}q - 2\tau$, such that the Monte Carlo variance is a decreasing function of $\tau$. In order to exploit this feature, each draw of innovations was used to generate all 24 DGPs, as suggested e.g. in Hendry et al. (1990), page 9. This result can be applied to the comparison of powers of the same test ${}^{i}Q$ for different DGPs. Thus the t-ratio
$$z := \left( \frac{m}{{}^{\max}\hat p\,{}^{\max}\hat q + {}^{\mathrm{tr}}\hat p\,{}^{\mathrm{tr}}\hat q - 2\hat\tau} \right)^{1/2} ({}^{\max}\hat p - {}^{\mathrm{tr}}\hat p) \xrightarrow[m \to \infty]{w} N(0, 1) \qquad (8)$$
was used to test homogeneity of powers.
6 Following a common practice, we treat the critical values ${}^{i}c(\eta, p - r)$ as known constants and not as estimated values. This does not affect the consistency of the Monte Carlo, although it would influence the calculation of standard errors; this issue is developed in Paruolo (2001).
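The t-ratio (8) can be computed from the paired rejection indicators of the two tests, as in the following sketch. The ten replications below are hypothetical and far too few for a real experiment; they are chosen so that the arithmetic can be checked by hand. The function name `power_homogeneity_z` is ours, and the N(0, 1) reference distribution is understood to apply when the two powers are equal.

```python
import math


def power_homogeneity_z(rej_max, rej_tr):
    """t-ratio (8) from paired 0/1 rejection indicators of the two tests."""
    m = len(rej_max)
    p_max = sum(rej_max) / m
    p_tr = sum(rej_tr) / m
    gamma = sum(1 for a, b in zip(rej_max, rej_tr) if a and b) / m  # both tests reject
    tau = gamma - p_max * p_tr
    variance = p_max * (1.0 - p_max) + p_tr * (1.0 - p_tr) - 2.0 * tau
    if variance <= 0.0:
        raise ValueError("degenerate case: the two tests reject in exactly the same replications")
    return math.sqrt(m / variance) * (p_max - p_tr)


# Hypothetical outcomes of 10 replications driven by the same innovations.
rej_max = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
rej_tr = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
z = power_homogeneity_z(rej_max, rej_tr)
print(round(z, 4))
```

Here the rejection frequencies are 0.6 and 0.4, the joint frequency is 0.4, so τ̂ = 0.16 and the estimated variance is 0.24 + 0.24 − 0.32 = 0.16. The positive τ̂ produced by the common innovations is what lowers the variance relative to independent experiments.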
In order to save space, the results are reported only graphically. Fig. 1 reports results for $p - r = 1$; in this case ${}^{\max}Q = {}^{\mathrm{tr}}Q$ and the two tests share the same power function, which depends only on $f$. The power function is graphed in Fig. 1 for the 5% and the 1% significance levels. The number of DGPs reduces in this case to 8; each DGP is marked with a filled circle in the graph. Here and in the following the power function is completed graphically by quadratic interpolation between points. The t-test applied to the comparison of the power of ${}^{\max}Q = {}^{\mathrm{tr}}Q$ of every DGP$(f, g)$ with respect to DGP$(0, 0)$ was always significant; this finding occurred for both ${}^{\max}Q$ and ${}^{\mathrm{tr}}Q$ also for the cases $p - r = 2, 3$.
Fig. 2 reports results for $p - r = 2$; in this case ${}^{\max}Q$ and ${}^{\mathrm{tr}}Q$ are not equal, and their limit local powers are functions of $f$ and $g$. In this and the following graphs only results for the 5% significance level are reported; results for the 1% level were similar. The graph on the left of Fig. 2 reports the power function of the test ${}^{\max}Q$ for the 24 DGPs. It is seen that the power function is not monotonic in $f$, and varies with $g$. These features are common to the power function of the trace test, see Johansen (1991b).
The power difference ${}^{\max}p - {}^{\mathrm{tr}}p$ is very small and has range $-1.36\% \le {}^{\max}p - {}^{\mathrm{tr}}p \le 1.14\%$ across DGPs. The graph on the right of Fig. 2 reports the percentage relative power $100\,{}^{\max}p/{}^{\mathrm{tr}}p$. Filled circles in the graph indicate DGPs for which the z test (8) was significant; empty circles indicate instead the DGPs for which the z test did not reject. It can be seen that neither test is uniformly more powerful than the other. In particular the ${}^{\mathrm{tr}}Q$ test is more powerful for moderate values of $f$ and $g$, while ${}^{\max}Q$ is superior for extreme values of $f$, $g$. It may be helpful to observe that the absolute power gains at regions of low power are emphasized in the graph of the relative power with respect to regions of high power$^{7}$. The power gain of the ${}^{\mathrm{tr}}Q$ test (at most equal to 1.36%) appears slightly larger than the one of the ${}^{\max}Q$ test (at most equal to 1.14%).
Fig. 3 reports results for $p - r = 3$. The power difference ${}^{\max}p - {}^{\mathrm{tr}}p$ is not as small as in the previous case and has range $-2.52\% \le {}^{\max}p - {}^{\mathrm{tr}}p \le 4.59\%$ across DGPs. The graph on the left of Fig. 3 reports the power function of the test ${}^{\max}Q$ for the 24 DGPs, while the graph on the right of Fig. 3 reports the percentage relative power $100\,{}^{\max}p/{}^{\mathrm{tr}}p$. Again it can be seen that the ${}^{\max}Q$ test is more powerful for extreme values of $f$ and $g$, while, unlike in the case $p - r = 2$, the power gain of the ${}^{\max}Q$ test (at most equal to 4.59%) appears slightly larger than the one of the ${}^{\mathrm{tr}}Q$ test (at most equal to 2.52%).$^{8}$
7 Where typically ${}^{\max}Q$ is more powerful.
Figure 3: Left graph: power of ${}^{\max}Q$ for $p - r = 3$. Right graph: percentage relative power, $100\,{}^{\max}p/{}^{\mathrm{tr}}p$; empty circles indicate that the relative power was insignificantly different from 1.
5 Conclusions
The results of this paper show that the ${}^{\max}Q$ test and the ${}^{\mathrm{tr}}Q$ test share many asymptotic properties, and that neither test uniformly dominates the other one in terms of local power.
8 One referee pointed out that the local alternative (7) is 1-dimensional, and it is the kind of alternative that the lambda max test is designed for. He thus found it surprising that the trace test does better than the lambda max test for some DGPs.
References
Chan N. H. and C. Z. Wei (1988), “Limiting distributions of least-squares estimates of unstable autoregressive processes”, Annals of Statistics 16, 367-401.
Hansen B. (1992), “Convergence to stochastic integrals for dependent heterogeneous processes”, Econometric Theory 8, 489-500.
Hansen P. R., Johansen S. (1998), Workbook on cointegration, Oxford: Oxford University Press.
Hendry D. F., A. J. Neale, N. R. Ericsson (1990), “Pc-Naive, an interactive program for Monte Carlo experimentation in econometrics”, Institute of Economics and Statistics, University of Oxford.
Johansen S. (1991a), “Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models”, Econometrica 59, 1551-80.
Johansen S. (1991b), “The power function of the likelihood ratio test for cointegration”, in J. Gruber (ed.) Econometric decision models: new methods of modelling and applications, 323-35, Springer Verlag, New York.
Johansen S. (1992), “Determination of cointegration rank in the presence of a linear trend”, Oxford Bulletin of Economics and Statistics 54, 383-97.
Johansen S. (1996), Likelihood-based inference in cointegrated vector auto-regressive models, Oxford: Oxford University Press.
Paruolo P. (2001), “On Monte Carlo estimation of relative power”, mimeo, Università dell’Insubria, Varese, Italy.
Pantula S. G. (1989), “Testing for unit roots in time series data”, Econometric Theory 5, 256-71.
Appendix
Proof of Proposition 1. Apply Theorem 2 on page 390 and Theorem 2 on page 392 in Johansen (1992) to ${}^{\max}_{h}Q$, recognizing that Johansen’s Theorem 1 holds also for ${}^{\max}_{h}Q$.
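Proof of Proposition 2. Under the hypotheses of the proposition, $\hat\lambda_j = O_p(1)$ for $j \le r$, see e.g. Johansen (1996), and thus ${}^{\max}_{h}Q(j) \to \infty$ for all $j < r$. This proves that $\Pr({}^{\max}_{h}\check r < r) \to 0$. In order for ${}^{\max}_{h}\check r$ to select the value $r$, by (6) the event $A := \bigcap_{j=r+1}^{p-1} {}^{\max}_{h}A_j \cap {}^{\max}_{h}A_r \cap {}^{\max}_{h}R_{r-1}$ must occur; since $\Pr({}^{\max}_{h}R_{r-1}) \to 1$, this event has limit probability $c := \lim \Pr(A) = \lim \Pr((\bigcap_{j=r+1}^{p-1} {}^{\max}_{h}A_j) \cap {}^{\max}_{h}A_r) < \lim \Pr({}^{\max}_{h}A_r) = 1 - \eta$, since the limit of $\Pr({}^{\max}_{h}A_j)$ is strictly less than 1 for $j > r$.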
Proof of Proposition 3. As in the standard proof, see Johansen (1996), the ${}^{\max}Q(r)$ and the ${}^{\mathrm{tr}}Q(r)$ statistics converge weakly to the maximal eigenvalue and the trace of a limit stochastic matrix $F$. In Johansen (1991b) it is shown that $F = N(K)$; thus by the continuous mapping theorem the results for ${}^{\max}Q(r)$ follow.