Coint - 1
D. A. Dickey, April 1998
Notes on Cointegration and Unit Roots

Model                                    H0 / Initial Condition    Estimator (regression)
-----                                    ----------------------    ----------------------
$Y_t = \rho Y_{t-1} + e_t$               $\rho=1,\ Y_0=0$          $\hat\rho$: regress $Y_t$ on $Y_{t-1}$ (NOINT)
$Y_t-\mu = \rho(Y_{t-1}-\mu) + e_t$      $\rho=1,\ Y_0=\mu$        $\hat\rho_\mu$: regress $Y_t$ on $1,\ Y_{t-1}$
                                                                   (or $Y_t-\bar Y$ on $Y_{t-1}-\bar Y$)
$Y_t-\alpha-\beta t =$                   $\rho=1,\ Y_0=0$          $\hat\rho_\tau$: regress $Y_t$ on $1,\ t,\ Y_{t-1}$
  $\rho(Y_{t-1}-\alpha-\beta(t-1)) + e_t$

Throughout, data can be generated by $Y_t = Y_{t-1} + e_t$.
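The estimators in the table can be illustrated by simulation. Below is a minimal Python sketch (the function name is mine, not from the notes) that generates a random walk and computes the no-intercept estimator; under the unit root null the estimate sits close to 1:

```python
import random

def simulate_rho_hat(n, seed=0):
    # Generate Y_t = Y_{t-1} + e_t with Y_0 = 0, then compute the
    # no-intercept estimator rho_hat = sum(Y_{t-1} Y_t) / sum(Y_{t-1}^2).
    rng = random.Random(seed)
    y = 0.0
    num = den = 0.0
    for _ in range(n):
        prev = y
        y = y + rng.gauss(0.0, 1.0)
        num += prev * y
        den += prev * prev
    return num / den

rho_hat = simulate_rho_hat(500)
print(rho_hat)  # close to (slightly below) 1 under the unit root null
```

The other two estimators differ only in adding an intercept, or an intercept and trend, to the regression.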
1. If $\rho = 1$ then
$$ n(\hat\rho - \rho) = \frac{n^{-1}\sum_{t=1}^{n} Y_{t-1}e_t}{n^{-2}\sum_{t=2}^{n} Y_{t-1}^2}
= \frac{\dfrac{1}{2n\sigma^2}\Big(Y_n^2 - \sum e_t^2\Big)}{\dfrac{1}{n^2\sigma^2}\sum_{t=2}^{n} Y_{t-1}^2}
= \frac{\tfrac12\,(T_n^2 - 1) + O_p(1/\sqrt n)}{t_n} $$
where $T_n = Y_n/(\sqrt n\,\sigma)$ and $t_n = n^{-2}\sigma^{-2}\sum_{t=2}^{n} Y_{t-1}^2$.
To see the numerator result, expand
$$ Y_n^2 = (e_1 + e_2 + \cdots + e_n)^2, $$
an $n\times n$ array of products $e_ie_j$. The diagonal contributes $\sum_{t=1}^n e_t^2$, and the sub-diagonal entries collect as
$$ e_1e_2 = Y_1e_2,\qquad e_1e_3 + e_2e_3 = Y_2e_3,\qquad (e_1+e_2+e_3)e_4 = Y_3e_4,\qquad (e_1+\cdots+e_4)e_5 = Y_4e_5,\ \ldots $$
so the sum of all the elements is
$$ Y_n^2 = \sum_{t=1}^n e_t^2 + 2\sum_{t=2}^n Y_{t-1}e_t. $$
Also
$$ \frac{1}{\sqrt n\,\sigma}\,Y_n = T_n \xrightarrow{\;L\;} N(0,1), \qquad\text{so the numerator } \xrightarrow{\;L\;} \tfrac12\,(\chi^2_1 - 1). $$
Denominator: $E_n'A_nE_n$ where $E_n' = (e_1, e_2, \ldots, e_{n-1})$ and
$$ A_n = \begin{pmatrix}
n-1 & n-2 & n-3 & \cdots & 1\\
n-2 & n-2 & n-3 & \cdots & 1\\
n-3 & n-3 & n-3 & \cdots & 1\\
\vdots & & & \ddots & \vdots\\
1 & 1 & 1 & \cdots & 1
\end{pmatrix}, \qquad (A_n)_{ij} = n - \max(i,j). $$
Its inverse is the tridiagonal matrix
$$ A_n^{-1} = \begin{pmatrix}
1 & -1 & 0 & \cdots & 0\\
-1 & 2 & -1 & & 0\\
0 & -1 & 2 & \ddots & \vdots\\
\vdots & & \ddots & \ddots & -1\\
0 & 0 & \cdots & -1 & 2
\end{pmatrix}. $$
Rutherford (1946, J. Royal Soc. Edinburgh) shows that
$$ |R_m(x,a,b)| = \frac{\sin((m+1)\theta) + (a+b)\sin(m\theta) + ab\,\sin((m-1)\theta)}{\sin(\theta)} $$
where $\theta = \arccos(x/2)$ and
$$ R_m(x,a,b) = \begin{pmatrix}
x+b & 1 & 0 & \cdots & 0\\
1 & x & 1 & & \\
0 & 1 & x & \ddots & \vdots\\
\vdots & & \ddots & \ddots & 1\\
0 & \cdots & 0 & 1 & x+a
\end{pmatrix}. $$
Thus if $|A_n - \lambda I| = 0$ (eigenvalues) then $|A_n^{-1} - \lambda^{-1}I| = \pm\,|R_{n-1}(\lambda^{-1}-2,\ 0,\ 1)| = 0$, so the eigenvalues of $A_n$ are reciprocals of those of $A_n^{-1}$, from which we obtain the $i$th eigenvalue of $A_n$ to be
$$ \lambda_{i,n} = \Big[4\cos^2\Big(\frac{(n-i)\pi}{2n-1}\Big)\Big]^{-1},\qquad i = 1,2,\ldots,n-1. $$
Now let $v_{i,n} = \Big(\dfrac{2i-1}{2n-1}\Big)\dfrac{\pi}{2}$ and note that $\cos\big(\frac{(n-i)\pi}{2n-1}\big) = \sin\big(\frac{(2i-1)\pi}{2(2n-1)}\big)$. We have
$$ \lambda_{i,n} = \Big[2\cos\Big(\frac{(n-i)\pi}{2n-1}\Big)\Big]^{-2} = \big[2\sin(v_{i,n})\big]^{-2}, $$
so that $n^{-2}\lambda_{i,n} \to \Big[\dfrac{2}{(2i-1)\pi}\Big]^2$ as $n\to\infty$.

Now let $x_i$ be the corresponding $i$th eigenvector (column) of $A_n$ with $t$th element
$$ x_{i,t} = \frac{2}{\sqrt{2n-1}}\,\cos\big[(2t-1)\,v_{i,n}\big]. $$
Stack these columns side by side to get $X_n = [\,x_1 : x_2 : x_3 : \cdots : x_{n-1}\,]$, a symmetric, orthonormal matrix, i.e., $X_n = X_n^{-1} = X_n'$.
Now
$$ T_n = \frac{1}{\sqrt n\,\sigma}\,Y_n = \frac{1}{\sqrt n}\,(1,1,1,\ldots,1)\,(e_1/\sigma,\ e_2/\sigma,\ e_3/\sigma,\ \ldots,\ e_n/\sigma)'
\;\overset{\text{def}}{=}\; \frac{1}{\sqrt n\,\sigma}\,(1,1,1,\ldots,1)\,E_n. $$
Write the quadratic form in terms of the eigen decomposition:
$$ \frac{1}{\sigma^2}\,E_{n-1}'A_{n-1}E_{n-1}
= \Big[\big(\tfrac1\sigma E_{n-1}\big)'X_{n-1}\Big]\,X_{n-1}A_{n-1}X_{n-1}\,\Big[\big(\tfrac1\sigma E_{n-1}\big)'X_{n-1}\Big]'
= Z_{n-1}'\,\Delta_{n-1}\,Z_{n-1} $$
where $\Delta_{n-1}$ is a diagonal matrix with elements $\Big[4\cos^2\Big(\frac{(n+1-i)\pi}{2n+1}\Big)\Big]^{-1}$ and
$$ Z_{n-1} = \frac1\sigma\,X_{n-1}'E_{n-1}, \qquad\text{so}\qquad E_{n-1} = \sigma\,X_{n-1}Z_{n-1}. $$
Now the coefficient $b_i$ of $Z_i$ in $T_n = \frac{1}{\sqrt n}(1,1,\ldots,1)X_{n-1}Z_{n-1} + O_p(1/\sqrt n)$ is
$$ b_i = \frac{1}{\sqrt n}\sum_{t=1}^{n}\frac{2}{\sqrt{2n+1}}\cos\Big[\frac{(2t-1)(2i-1)\pi}{2(2n+1)}\Big]
= \frac{2}{\sqrt{2n^2+n}}\;\frac{\sin\Big\{2n\Big(\frac{2i-1}{2n+1}\Big)\frac\pi2\Big\}}{2\sin\Big\{\Big(\frac{2i-1}{2n+1}\Big)\frac\pi2\Big\}} \qquad (*) $$
(from Jolley, Summation of Series, Dover Press)
$$ = \frac{1}{\sqrt{2n^2+n}}\;\frac{(-1)^{i-1}\cos(v_{i,n+1})}{\sin(v_{i,n+1})}
\;\longrightarrow\; \frac{2\sqrt2\,(-1)^{i-1}}{\pi(2i-1)} = \sqrt2\,\lambda_i
\qquad\text{where}\qquad \lambda_i = \frac{2\,(-1)^{i-1}}{\pi(2i-1)}. $$
$(*)$ uses $\sin(A+B) = \sin A\cos B + \cos A\sin B$ with $A = (2n+1)\big(\frac{2i-1}{2n+1}\big)\frac\pi2$ and $B = -\big(\frac{2i-1}{2n+1}\big)\frac\pi2$.
If $\rho = 1$ then
$$ n(\hat\rho - 1) = \frac{\tfrac12\,(T_n^2 - 1)}{t_n} + O_p(1/\sqrt n), \qquad
T_n = \sum_{i=1}^{n}\frac{(-1)^{i-1}\cos(v_{i,n+1})}{\sqrt{2n^2+n}\,\sin(v_{i,n+1})}\,Z_i, \qquad
t_n = \sum_{i=1}^{n-1}\Big[\frac{1}{2n\sin(v_{i,n})}\Big]^2 Z_i^2, $$
and where $Z_i \sim NID(0,1)$. Now define the following limit random variables:
$$ T = \sum_{i=1}^{\infty}\sqrt2\,\lambda_i Z_i, \qquad t = \sum_{i=1}^{\infty}\lambda_i^2 Z_i^2,
\qquad\text{where } \lambda_i = \frac{2\,(-1)^{i-1}}{\pi(2i-1)}. $$
We thus see that
$$ n(\hat\rho - 1) \xrightarrow{\;L\;} \frac{\tfrac12\,(T^2-1)}{t}
\qquad\text{and}\qquad
\tau = \frac{\hat\rho - 1}{\text{s.e.}(\hat\rho)} \xrightarrow{\;L\;} \frac{\tfrac12\,(T^2-1)}{\sqrt t}\,, $$
where s.e. is the usual regression standard error.
*** Verify unit root eigenvalues and vectors for n=6 **;
options ls=72;
proc iml; reset fuzz=.00001 spaces=3;
AN = { 5 4 3 2 1,
4 4 3 2 1,
3 3 3 2 1,
2 2 2 2 1,
1 1 1 1 1};
inan = inv(An);
eval=eigval(An);
evec=eigvec(An);
print an [format=5.3] inan [format=5.3];
cval = shape(0,5,1); pi = 4*atan(1); cvec=shape(0,5,5);
do i=1 to 5; theta= (2*i-1)*pi/22 ;
cval[i,1] = 1/(2*sin(theta))**2;
do t=1 to 5;
cvec[i,t] = 2/sqrt(2*6-1)*cos((2*t-1)*theta);
end; end;
print cval eval;
print evec [format=5.3] cvec [format=5.3];
AN                          INAN
5  4  3  2  1                1 -1  0  0  0
4  4  3  2  1               -1  2 -1  0  0
3  3  3  2  1                0 -1  2 -1  0
2  2  2  2  1                0  0 -1  2 -1
1  1  1  1  1                0  0  0 -1  2

CVAL = EVAL
12.343538   1.4486906   0.5829645   0.3532533   0.2715541

CVEC (row i, column t; the printed EVEC agrees with CVEC up to the arbitrary
sign of each eigenvector)
 0.597   0.549   0.456   0.326   0.170
 0.549   0.170  -0.326  -0.597  -0.456
 0.456  -0.326  -0.549   0.170   0.597
 0.326  -0.597   0.170   0.456  -0.549
 0.170  -0.456   0.597  -0.549   0.326
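As a cross-check on the SAS run, here is a pure-Python sketch (helper names are mine) verifying that the closed-form eigenpairs really satisfy $Ax = \lambda x$ for n = 6:

```python
import math

n = 6
# (A_n)_{ij} = n - max(i, j); for n = 6 this is the 5x5 matrix in the SAS run
A = [[n - max(i, j) for j in range(1, n)] for i in range(1, n)]

def eigenpair_error(i):
    # Claimed i-th eigenpair: theta_i = (2i-1)pi/(2(2n-1)),
    # lambda_i = 1/(2 sin theta_i)^2, x_t = (2/sqrt(2n-1)) cos((2t-1) theta_i).
    # Returns max_t |(A x)_t - lambda x_t|.
    theta = (2 * i - 1) * math.pi / (2 * (2 * n - 1))
    lam = 1.0 / (2.0 * math.sin(theta)) ** 2
    x = [2.0 / math.sqrt(2 * n - 1) * math.cos((2 * t - 1) * theta)
         for t in range(1, n)]
    Ax = [sum(A[r][c] * x[c] for c in range(n - 1)) for r in range(n - 1)]
    return max(abs(Ax[r] - lam * x[r]) for r in range(n - 1))

errors = [eigenpair_error(i) for i in range(1, n)]
lam1 = 1.0 / (2.0 * math.sin(math.pi / 22)) ** 2  # should match CVAL[1] = 12.343538
```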
Alternative to quadratic form decomposition:

• D[0,1] = {all functions f on [0,1] such that f(t) is right continuous and has left limits}
  Examples: a CDF; $X(t) = \sum_{i=1}^{[nt]} Z_i$ for $t \in [0,1]$ where $\{Z_i\}_{i=1}^{\infty}$ is a sequence.

• Wiener Process:
  Probability space $(\Omega, \mathcal A, P)$; for each $\omega$ we get a function $W(t)$ such that
  $W(u_1),\ W(u_2)-W(u_1),\ W(u_3)-W(u_2),\ \ldots,\ W(u_k)-W(u_{k-1})$ are normal, independent, with
  means 0 and variances $u_1,\ u_2-u_1,\ \ldots,\ u_k-u_{k-1}$ for $0 \le u_1 \le u_2 \le \cdots \le u_k \le 1$.

• Donsker's Theorem:
  $e_t \sim \text{iid}(0,\sigma^2)$ and $S_n(t) = \sigma^{-1}n^{-1/2}\sum_{i=1}^{[nt]} e_i$. Then $S_n(t) \xrightarrow{\;L\;} W(t)$.
  [See Fuller, Theorem 5.3.5, for weaker (martingale-type) assumptions.]

• Example:
  $S_n(1) = \sigma^{-1}n^{-1/2}\sum_{i=1}^{n} e_i \xrightarrow{\;L\;} W(1) \sim N(0,1)$.
  This is the usual CLT.

• Continuous Mapping Theorem (a slight variation):
  Let $S_n(t)$ be a sequence of random functions on D[0,1] and $S(t)$ one such function.
  Let $f_1(\cdot), \ldots, f_m(\cdot)$ be real valued continuous functions on the real line.
  Define, for $i = 1,2,\ldots,m$ and $k_i \ge 0$,
  $$ Z_{in}(s) = \int_0^s t^{k_i} f_i[S_n(t)]\,dt \qquad\text{and}\qquad Z_i(s) = \int_0^s t^{k_i} f_i[S(t)]\,dt. $$
  If $S_n \xrightarrow{L} S$ then (jointly) $(S_n(t),\ Z_{1n}(s),\ Z_{2n}(s),\ \ldots,\ Z_{mn}(s)) \xrightarrow{\;L\;} (S(t),\ Z_1(s),\ Z_2(s),\ \ldots,\ Z_m(s))$.

• Example:
  $e_t \sim \text{iid}(0,\sigma^2)$, $Y_t = \sum_{i=1}^{t} e_i$ a random walk, $S_n(t) = \sigma^{-1}n^{-1/2}\sum_{i=1}^{[nt]} e_i$.
  Thus going from Y to S:
  (1) Rescales t so that it runs from 0 to 1 rather than 1 to n.
  (2) Rescales Y so $Y_n$ becomes $S_n(1)$, which has variance 1.
  (3) Sets $S_n(t)$ in the interval $[j/n,\ (j+1)/n)$ all equal to $S_n(j/n) = \sigma^{-1}n^{-1/2}\sum_{i=1}^{j} e_i$.
Thus, for example,
$$ \sigma^{-2}n^{-2}\sum_{t=1}^{n} Y_t^2 = \sum_{t=1}^{n}\big(\sigma^{-1}n^{-1/2}Y_t\big)^2\,\tfrac1n
\;\xrightarrow{\;L\;}\; \int_0^1 W^2(t)\,dt
\qquad\text{and}\qquad
Z_{1n} = \sum_{t=1}^{n}\big(\tfrac tn\big)^{j}\big(\sigma^{-1}n^{-1/2}Y_t\big)\,\tfrac1n
\;\xrightarrow{\;L\;}\; \int_0^1 t^{j}\,W(t)\,dt. $$
Note that the $\frac1n$ corresponds to $dt$ as one might expect; that is,
$$ \int_0^1 S_n^2(t)\,dt = \sum_{t=1}^{n}\big(\sigma^{-1}n^{-1/2}Y_t\big)^2\,\tfrac1n $$
because $S_n(t)$ is a step function with value $\sigma^{-1}n^{-1/2}Y_t$ over intervals of width $\frac1n$.

• Conclusions:
  $\int_0^1 W^2(t)\,dt$ must be a random variable, with $\int_0^1 W^2(t)\,dt = t = \sum_{i=1}^{\infty}\lambda_i^2 Z_i^2$.
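This convergence can be checked by Monte Carlo: since $E\int_0^1 W^2(t)\,dt = \int_0^1 t\,dt = 1/2$, the simulated mean of $n^{-2}\sigma^{-2}\sum Y_t^2$ should be near 1/2. A sketch (function name is mine, $\sigma = 1$ assumed):

```python
import random

def mean_scaled_sum_squares(n, reps, seed=1):
    # Monte Carlo check: n^{-2} * sum_t Y_t^2 -> integral_0^1 W^2(t) dt,
    # whose expectation is 1/2 (sigma = 1 here).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y, s = 0.0, 0.0
        for _ in range(n):
            y += rng.gauss(0.0, 1.0)
            s += y * y
        total += s / n ** 2
    return total / reps

m = mean_scaled_sum_squares(200, 2000)
```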
• Unit Root Test (zero mean assumed)
  $Y_t = Y_{t-1} + e_t$ with $Y_0 = 0$.
  Regression:
  $$ n(\hat\rho - 1) = \frac{n^{-1}\sum_{t=1}^{n} Y_{t-1}e_t}{n^{-2}\sum_{t=1}^{n} Y_{t-1}^2}
  = \frac{\tfrac12\,n^{-1}\sigma^{-2}\big(Y_n^2 - \sum_{t=1}^{n} e_t^2\big)}{n^{-2}\sigma^{-2}\sum_{t=1}^{n} Y_{t-1}^2}
  \;\xrightarrow{\;L\;}\; \frac{\tfrac12\big(W^2(1) - 1\big)}{\int_0^1 W^2(t)\,dt}. $$
  Notice that Donsker's Theorem implies this is the same limit distribution for any i.i.d. sequence $e_t$, so if you can compute it for normal $e_t$, you have it for any iid sequence.

• MSE $= n^{-1}\sum(Y_t - \hat\rho Y_{t-1})^2 = n^{-1}\sum(e_t + (1-\hat\rho)Y_{t-1})^2$, and we have
  $$ n(1-\hat\rho)^2\big[n^{-2}\textstyle\sum Y_{t-1}^2\big] = n\,O_p(n^{-2})\,O_p(1) = O_p(\tfrac1n), \quad\text{so}\quad
  \text{MSE} = n^{-1}\textstyle\sum e_t^2 + O_p\big(\tfrac{1}{\sqrt n}\big) + O_p\big(\tfrac1n\big) = \sigma^2 + O_p\big(\tfrac{1}{\sqrt n}\big). $$
• Studentized statistic (zero mean assumed)
  Because MSE $\xrightarrow{\;P\;} \sigma^2$, $n$ times the standard error, $\sqrt{\text{MSE}\big/\big(n^{-2}\sum_{t=1}^{n} Y_{t-1}^2\big)}$,
  converges in law to $\sqrt{1\big/\int_0^1 W^2(t)\,dt}$, and the studentized statistic
  $$ (\hat\rho - 1)\Big/\sqrt{\text{MSE}\big/\textstyle\sum_{t=1}^{n} Y_{t-1}^2}
  = n(\hat\rho - 1)\Big/\sqrt{\text{MSE}\big/\big(n^{-2}\textstyle\sum_{t=1}^{n} Y_{t-1}^2\big)}
  \;\xrightarrow{\;L\;}\; \frac{\tfrac12\big(W^2(1) - 1\big)}{\sqrt{\int_0^1 W^2(t)\,dt}}\,. $$
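A quick Monte Carlo sketch of this studentized statistic (names and sample sizes are mine); its distribution is shifted left of zero, which the simulated mean reproduces roughly:

```python
import random

def df_tau(n, rng):
    # One draw of the studentized unit root statistic (zero-mean case):
    # tau = (rho_hat - 1) / sqrt(MSE / sum Y_{t-1}^2)
    ys = [0.0]
    for _ in range(n):
        ys.append(ys[-1] + rng.gauss(0.0, 1.0))
    num = sum(ys[t - 1] * (ys[t] - ys[t - 1]) for t in range(1, n + 1))
    den = sum(ys[t - 1] ** 2 for t in range(1, n + 1))
    rho_hat = 1.0 + num / den
    mse = sum((ys[t] - rho_hat * ys[t - 1]) ** 2 for t in range(1, n + 1)) / (n - 1)
    return (rho_hat - 1.0) / (mse / den) ** 0.5

rng = random.Random(2)
taus = [df_tau(100, rng) for _ in range(2000)]
mean_tau = sum(taus) / len(taus)
```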
Now suppose our regression has an intercept.
$$ n(\hat\rho_\mu - 1) = \frac{n^{-1}\sum_{t=2}^{n} Y_{t-1}e_t - \frac{n-1}{n}\,\bar e\,\bar Y}{n^{-2}\sum_{t=2}^{n} Y_{t-1}^2 - \frac{n-1}{n^2}\,\bar Y^{\,2}} $$
where $\bar e = \frac{1}{n-1}\sum_{t=2}^{n} e_t$ and $\bar Y = \frac{1}{n-1}\sum_{t=1}^{n-1} Y_t$. Note that $\sqrt n\,\bar e/\sigma = T_n + O_p(1/\sqrt n)$ and
$$ \frac{1}{\sqrt n\,\sigma}\,\bar Y \approx \frac{1}{n\sqrt n\,\sigma}\,\big[(n-1)e_1 + (n-2)e_2 + \cdots + e_{n-1}\big]. $$
Call this last expression $W_n$. Then
$$ n(\hat\rho_\mu - 1) = \frac{\tfrac12\Big(T_n^2 - n^{-1}\sigma^{-2}\sum_{t=1}^{n} e_t^2\Big) - \frac{n-1}{n}\,T_nW_n}{t_n - W_n^2}
\;\xrightarrow{\;L\;}\; \frac{\tfrac12\,(T^2 - 1) - TW}{t - W^2} $$
and
$$ \tau_\mu = \frac{\hat\rho_\mu - 1}{\text{s.e.}} \;\xrightarrow{\;L\;}\; \frac{\tfrac12\,(T^2 - 1) - TW}{\sqrt{t - W^2}}\,. $$
Finally, for a regression with a linear trend we define
$$ V_n = \frac{2}{n\sqrt{n(n-2)}\,\sigma}\sum_{t=2}^{n}\Big(t - 1 - \tfrac n2\Big)Y_{t-1}
= \frac{1}{n\sqrt{n(n-2)}\,\sigma}\sum_{j=1}^{n-1}(n-j)(j-1)\,e_j. $$
Then
$$ n(\hat\rho_\tau - 1) = \frac{\tfrac12\,(T_n^2 - 1) - T_nW_n + 6\big(W_n - \tfrac12 T_n\big)V_n + O_p(1/\sqrt n)}{t_n - W_n^2 - 3V_n^2 + O_p(1/\sqrt n)}
\;\xrightarrow{\;L\;}\; \frac{\big(\tfrac12 T - W\big)\big(T - 6V\big) - \tfrac12}{t - W^2 - 3V^2} $$
and
$$ \tau_\tau \;\xrightarrow{\;L\;}\; \frac{\big(\tfrac12 T - W\big)\big(T - 6V\big) - \tfrac12}{\sqrt{t - W^2 - 3V^2}}\,. $$
The limit random variables can be expressed jointly in terms of a sequence of N(0,1) variates $Z_i$. We have (see Fuller, Thms. 10.1.3, 10.1.6, where $t$ is $G$, $W$ is $H$, and $V$ is $K$)
$$ (t_n,\ T_n,\ W_n,\ V_n) \;\xrightarrow{\;L\;}\; (t,\ T,\ W,\ V)
= \Big(\sum_{i=1}^{\infty}\lambda_i^2Z_i^2,\ \ \sum_{i=1}^{\infty}\sqrt2\,\lambda_iZ_i,\ \ \sum_{i=1}^{\infty}\sqrt2\,\lambda_i^2Z_i,\ \ \sum_{i=1}^{\infty}\sqrt2\,\big(2\lambda_i^3 - \lambda_i^2\big)Z_i\Big) $$
$$ = \Big(\int_0^1 W^2(t)\,dt,\ \ W(1),\ \ \int_0^1 W(t)\,dt,\ \ \int_0^1(2t-1)\,W(t)\,dt\Big),
\qquad \lambda_i = \frac{2\,(-1)^{i-1}}{\pi(2i-1)}. $$
Letting $W(t)$ represent a Wiener Process on [0,1] (Brownian Motion), we can represent the limits as, for example,
$$ n(\hat\rho - 1) \;\xrightarrow{\;L\;}\; \frac{\tfrac12\big(W^2(1) - 1\big)}{\int_0^1 W^2(t)\,dt}\,. $$
This representation was first suggested by J. S. White in the Annals of Mathematical Statistics and has been rigorously proved and popularized by P. C. B. Phillips of Yale and C. Z. Wei of U. Maryland and their students. Although the expression contains an integral, it is a random variable, and its distribution must somehow be simulated or further developed to get limit distributions.
Our approach (Dickey and Fuller) is to approximate the infinite weighted sum of $Z_i^2$ with a finite sum and simulate from that.
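For instance, one can truncate the sums defining $T$ and $t$ at a few hundred terms and simulate. A sketch of this idea (truncation point, seed, and names are my choices); note the deterministic check $\sum\lambda_i^2 = 1/2$, matching $E\int_0^1 W^2$:

```python
import math, random

# Weights lambda_i = 2(-1)^(i-1)/((2i-1)pi), truncated at 200 terms
lam = [2.0 * (-1) ** (i - 1) / ((2 * i - 1) * math.pi) for i in range(1, 201)]
sum_lam_sq = sum(l * l for l in lam)  # should be close to 1/2

def draw(rng):
    # One draw from the truncated limit: T = sum sqrt(2) lam_i Z_i,
    # t = sum lam_i^2 Z_i^2; the limit of n(rho_hat - 1) is (T^2 - 1)/(2 t).
    z = [rng.gauss(0.0, 1.0) for _ in lam]
    T = sum(math.sqrt(2.0) * l * zi for l, zi in zip(lam, z))
    t = sum((l * zi) ** 2 for l, zi in zip(lam, z))
    return 0.5 * (T * T - 1.0) / t

rng = random.Random(3)
draws = sorted(draw(rng) for _ in range(4000))
q05 = draws[int(0.05 * len(draws))]  # lower 5% point of the coefficient distribution
```

The simulated 5th percentile should land near the tabulated asymptotic critical value of $n(\hat\rho-1)$ (around −8).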
Higher Order Processes

1. Note: in the lag 1 model, regress $Y_t - Y_{t-1}$ on $Y_{t-1}$ to get $\hat\rho - 1$ and $\tau$ right off the printout. This is the most convenient form of the regression for extension to higher order.

2. Regress $\nabla Y_t$ on $Y_{t-1},\ \nabla Y_{t-1},\ \nabla Y_{t-2},\ \ldots,\ \nabla Y_{t-p}$ [where $\nabla Y_t = Y_t - Y_{t-1}$] to test $H_0$: ARIMA(p,1,0) versus $H_1$: ARIMA(p+1,0,0).

Example
$$ Y_t = (\alpha + \rho)Y_{t-1} - \alpha\rho\,Y_{t-2} + e_t \quad[\text{with roots }\alpha\text{ and }\rho\,]
\qquad\text{is the same as}\qquad
\nabla Y_t = (1-\alpha)(\rho - 1)\,Y_{t-1} + \alpha\rho\,\nabla Y_{t-1} + e_t. $$
Note: the coefficient of $Y_{t-1}$ estimates $(1-\alpha)(\rho-1)$, a multiple of $(\rho-1)$. The limit distribution will be $(1-\alpha)$ times that of $n(\hat\rho - 1)$, but the standard error will have the same multiplier, so $\tau$ will be unaffected in the limit (same for the mean and trend models). We will show this.
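The example can be simulated directly. The sketch below (pure Python, normal-equation solve by Cramer's rule; all names are mine) fits the augmented regression to data with $\rho = 1$, $\alpha = 0.5$: the lagged-difference coefficient should be near $\alpha\rho = 0.5$ and the $Y_{t-1}$ coefficient near 0:

```python
import random

def adf_fit(n, alpha, seed=4):
    # Simulate (1 - alpha*B)(1 - B) Y_t = e_t, i.e. dY_t = alpha*dY_{t-1} + e_t,
    # then regress dY_t on (Y_{t-1}, dY_{t-1}) via 2x2 normal equations.
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(n):
        dy_prev = y[-1] - y[-2]
        y.append(y[-1] + alpha * dy_prev + rng.gauss(0.0, 1.0))
    rows = [(y[t] - y[t - 1], y[t - 1], y[t - 1] - y[t - 2])
            for t in range(2, len(y))]
    s11 = sum(x1 * x1 for _, x1, _ in rows)
    s12 = sum(x1 * x2 for _, x1, x2 in rows)
    s22 = sum(x2 * x2 for _, _, x2 in rows)
    b1 = sum(d * x1 for d, x1, _ in rows)
    b2 = sum(d * x2 for d, _, x2 in rows)
    det = s11 * s22 - s12 * s12
    coef_level = (b1 * s22 - b2 * s12) / det    # estimates (1-alpha)(rho-1) = 0
    coef_lagdiff = (s11 * b2 - s12 * b1) / det  # estimates alpha*rho = alpha
    return coef_level, coef_lagdiff

c_level, c_lagdiff = adf_fit(2000, 0.5)
```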
Using standard matrix regression symbols, our regression gives $\hat\beta - \beta = (X'X)^{-1}(X'E)$, and letting $D_n = \text{diag}\{1/n,\ 1/\sqrt n\}$ we will normalize this as
$$ D_n^{-1}(\hat\beta - \beta) = (D_nX'XD_n)^{-1}(D_nX'E) $$
where, for one lagged difference,
$$ D_nX'XD_n = \begin{pmatrix}1/n & 0\\ 0 & 1/\sqrt n\end{pmatrix}X'X\begin{pmatrix}1/n & 0\\ 0 & 1/\sqrt n\end{pmatrix}
= \begin{pmatrix} n^{-2}\sum_{t=3}^{n} Y_{t-1}^2 & n^{-3/2}\sum_{t=3}^{n} Y_{t-1}\nabla Y_{t-1}\\[4pt]
n^{-3/2}\sum_{t=3}^{n} Y_{t-1}\nabla Y_{t-1} & n^{-1}\sum_{t=3}^{n}(\nabla Y_{t-1})^2\end{pmatrix} $$
and we have

[Claim:]
$$ \begin{pmatrix} n^{-2}\sum_{t=3}^{n} Y_{t-1}^2 & n^{-3/2}\sum_{t=3}^{n} Y_{t-1}\nabla Y_{t-1}\\[4pt]
n^{-3/2}\sum_{t=3}^{n} Y_{t-1}\nabla Y_{t-1} & n^{-1}\sum_{t=3}^{n}(\nabla Y_{t-1})^2\end{pmatrix}
- \begin{pmatrix} \dfrac{\sigma^2\,t_n}{(1-\alpha)^2} & 0\\[6pt] 0 & \dfrac{\sigma^2}{1-\alpha^2}\end{pmatrix}
\;\xrightarrow{\;P\;}\; 0. $$
Notice that the limit is a diagonal matrix. With more lagged differences, a block diagonal limit structure would be obtained.
[Proof:]
To show this limit result, note that under $H_0$: $\rho = 1$ we have these items:

item (a): $\displaystyle\quad n^{-1}\sum_{t=3}^{n}(\nabla Y_{t-1})^2 \;\xrightarrow{\;P\;}\; \gamma_\nabla(0) = \sigma^2/(1-\alpha^2)$

item (b):
$$ E\Big(\sum_{t=3}^{n} Y_{t-1}\nabla Y_{t-1}\Big)^2 = \sum_t\sum_s\Big[\,E(Y_{t-1}Y_{s-1})\,E(\nabla Y_{t-1}\nabla Y_{s-1})
+ E(Y_{s-1}\nabla Y_{t-1})\,E(Y_{t-1}\nabla Y_{s-1}) + E(Y_{s-1}\nabla Y_{s-1})\,E(Y_{t-1}\nabla Y_{t-1})\,\Big]. $$
Notice that the last two terms are O(1) because
$$ |E(Y_j\nabla Y_k)| = \big|\,E\{(\nabla Y_1 + \nabla Y_2 + \nabla Y_3 + \cdots + \nabla Y_j)\nabla Y_k\}\,\big| \le \sum_{h=-\infty}^{\infty}|\gamma_\nabla(h)| $$
which is O(1). Turning to the first term we have
$$ \sum_{t=3}^{n}\big|\,E(Y_{t-1}Y_{s-1})\,E(\nabla Y_{t-1}\nabla Y_{s-1})\,\big|
= \sum_{t=3}^{n}\Big|\sum_{i=1}^{t-1}\sum_{j=1}^{s-1}\gamma_\nabla(i-j)\Big|\,\big|\gamma_\nabla(t-s)\big|
\le \sum_{h=-\infty}^{\infty}\big|\gamma_\nabla(h)\big|\;\sum_{t=3}^{n}|t|\,\big|\gamma_\nabla(t-s)\big| = O(1), $$
and we sum this over $s$, that is, $\sum_s\sum_t\{\cdot\} = O(n)$, so that item (b) is $O(n^2)$ and $\sum Y_{t-1}\nabla Y_{t-1} = O_p(n)$.

item (c):
$$ Y_t - Y_{t-1} = \alpha(Y_{t-1} - Y_{t-2}) + e_t \;\Rightarrow\; Y_t - \alpha Y_{t-1} = Y_{t-1} - \alpha Y_{t-2} + e_t
\;\Rightarrow\; Y_t - \alpha Y_{t-1} = \sum_{j=1}^{t} e_j \;\overset{\text{def}}{=}\; S_t, $$
so $(1-\alpha)Y_t = Y_t - \alpha Y_t = S_t + \alpha Y_{t-1} - \alpha Y_t = S_t - \alpha(Y_t - Y_{t-1})$. Squaring and summing both sides we have
$$ (1-\alpha)^2\,\frac{\sum Y_t^2}{n^2} = \frac{\sum S_t^2}{n^2} + \alpha^2\,\frac{\sum(Y_t - Y_{t-1})^2}{n^2} + O_p\Big(\frac{1}{\sqrt n}\Big)
= \frac{\sum S_t^2}{n^2} + O_p\Big(\frac{1}{\sqrt n}\Big) = \sigma^2\,t_n + O_p\Big(\frac{1}{\sqrt n}\Big). \qquad\blacksquare $$
Coint - 10
Having studied Xw X we turn to Xw E.
Î 8" ! ]>" /> Ñ
1/n
0
Ð
Ó
>œ3
D8 Xw E œ Œ
Xw E œ Ð
Ó
8
"
0 1/Èn 
# !
8
f]
/
>" > Ò
Ï
8
>œ3
where the second element is the numerator of the AR(1) regression coefficient for the
differenced data. Thus by standard stationary arguments (e.g. Brown's martingale central
limit theorem) it converges to N(0, (1-!2 )-1 5 4 ). Because D8 Xw X D8 converges to a
(block) diagonal matrix with the appropriate lower right element (block) we see that the
coefficient(s) on the lagged difference(s) will have the same limit normal distribution
when we regress differences on a lagged level and lagged differences as when we regress
on just the lagged differences as would be appropriate if we knew we had a unit root.
Thus using F or t tests to determine an appropriate order is asymptotically justified when
our data has a unit root and we regress the first difference on the lagged level and lagged
differences.
Now the first element of D8 Xw E is 8" !]>" /> and what we have previously
studied would be written 8" !W>" /> in the model we are now studying. We have from
our previous calculations,
Ð"  !Ñ 8" ! ]>" /> œ 8" ! ÒW>"  !Ð]>"  ]># ÑÓ /> œ
n
n
8" ! W>" />  O: Ð È"8 Ñ =
t=3
n
t=3
2
"
2 2 !et
# [5 (Tn - n )]
t=3
 O: Ð È"8 Ñ .
Combining this with item (c) we find 8"^ 1 = 8Ðs
3  "Ñ œ ’
’ 5 2 >n /Ð"  !Ñ# “ ’ "# [5 2 (T2n -1
!e2t
n )]
!]># -1 ! ]>" />
8# “ ’
8
/Ð"  !Ñ “  O: Ð È"8 Ñ Ä
“œ
Ð"  !Ñ "# (T2 - ")Î> as expected. This means we cannot compare the raw coefficient
8"^ 1 to our tables without some adjustment. We could divide 8 "^ 1 by Ð"  "^ # Ñ
^ Ñ which we just showed to be consistent, then compare to our tables.
œ Ð"  !
The error mean square, MSE, from our regression is a consistent estimate of 5 2
and the standard error printed by the computer is asymptotically equivalent to the square
root of MSE/!]># (because Xw X is block diagonal). From the results above we see that
P
8# Ð MSE/!]># Ñ - [>n /Ð"  !Ñ# ]-1 Ä 0. Now [>n /Ð"  !Ñ# ] appears in the denominator
of 8"^ . Dividing 8"^ by n times the standard error produces the t test from the computer
and we see that this has the same limit distribution as ’ 5 2 È>n /Ð"  !Ñ# “ ’ "# [5 2 (Tn2 1
!e2t
n5 2 )]
1
/Ð"  !Ñ “ = (T2n -
-1
!e2t
n5 2 )
/È>n . This means that we can use our 7 tables for the t-
test without any further adjustment. Adding more lagged differences, a mean or trend
Coint - 11
term to the model poses no additional theoretical problem. The bank of tests (coefficients
and t tests for regressions possibly containing a mean or trend) have come to be known
collectively as the "Dickey-Fuller" test and when lagged differences are included as we
are now discussing, the tests are called "Augmented D-F tests". Note the appearance in
the denominators of several of our expressions, the term Ð"  !Ñ. Clearly our results will
not hold in the presence of a second unit root.
See Fuller, Chapter 10, for further discussion. Also our alternative estimators
have been studied in the case of unit roots. Gonzalez-Farias (NCSU PhD thesis)
develops the exact MLE based test, Dickey, Hasza and Fuller (JASA 1984) develop the
symmetric test and Park and Fuller (JTSA 1995 and ISU PhD Thesis) develop the
weighted symmetric estimator and test for unit root problems. Pantula, Gonzalez-Farias
and Fuller (JBES 1994) study power and size properties of these and other tests. There is
considerable power improvement in these versus the original OLS based tests with the
best ones appearing to be the exact MLE followed closely by weighted symmetric which
is much easier to compute. The limit distributions are not the same as the DF tables.
Tables of critical values are given in Fuller's book for all of these.
Cointegration - general discussion

Definitions:
A time series that requires d differences to make it stationary is said to be "integrated of order d". If the dth difference has p AutoRegressive and q Moving Average terms, the differenced series is said to be ARMA(p,q) and the original integrated series to be ARIMA(p,d,q).

Two series $X_t$ and $Y_t$ that are integrated of order d may, through linear combination, produce a series $aX_t + bY_t$ which is stationary (or integrated of order smaller than d), in which case we say that $X_t$ and $Y_t$ are cointegrated, and we refer to $(a,b)$ as the cointegrating vector. Granger and Weiss discuss this concept and terminology.

An example:
For example, if $X_t$ and $Y_t$ are wages in two similar industries, we may find that both are unit root processes. We may, however, reason that, by virtue of the similar skills and easy transfer between the two industries, the difference $X_t - Y_t$ cannot vary too far from 0 and thus certainly should not be a unit root process. The cointegrating vector is specified by our theory to be $(1,-1)$ or $(-1,1)$, or $(c,-c)$, all of which are equivalent.

The test for cointegration here consists of simply testing the original series for unit roots, not rejecting the unit root null, then testing the $X_t - Y_t$ series and rejecting the unit root null. We just use the standard D-F tables for all these tests. The reason we can use these D-F tables is that the cointegrating vector was specified by our theory, not estimated from the data.
Numerical examples: $Y_t = AY_{t-1} + E_t$

1. Bivariate, stationary:
$$ \begin{pmatrix}Y_{1t}\\ Y_{2t}\end{pmatrix} = \begin{pmatrix}1.2 & -.3\\ 0.4 & 0.5\end{pmatrix}\begin{pmatrix}Y_{1,t-1}\\ Y_{2,t-1}\end{pmatrix} + \begin{pmatrix}e_{1t}\\ e_{2t}\end{pmatrix},
\qquad E_t \sim N\Big(\begin{pmatrix}0\\0\end{pmatrix},\ \begin{pmatrix}4 & 1\\ 1 & 3\end{pmatrix}\Big) $$
$$ |A - \lambda I| = \begin{vmatrix}1.2-\lambda & -.3\\ 0.4 & .5-\lambda\end{vmatrix}
= \lambda^2 - 1.7\lambda + .6 + .12 = (\lambda - .9)(\lambda - .8) $$
This is stationary (note: the form of the $E_t$ distribution has no impact).

2. Bivariate, nonstationary:
$$ \begin{pmatrix}Y_{1t}\\ Y_{2t}\end{pmatrix} = \begin{pmatrix}1.2 & -.5\\ 0.2 & 0.5\end{pmatrix}\begin{pmatrix}Y_{1,t-1}\\ Y_{2,t-1}\end{pmatrix} + \begin{pmatrix}e_{1t}\\ e_{2t}\end{pmatrix} $$
$$ |A - \lambda I| = \begin{vmatrix}1.2-\lambda & -.5\\ 0.2 & .5-\lambda\end{vmatrix}
= \lambda^2 - 1.7\lambda + .6 + .1 = (\lambda - 1)(\lambda - .7) $$
This is a unit root process.

Using the spectral decomposition (eigenvalues, vectors) of A we have
$$ \underbrace{\begin{pmatrix}1.2 & -.5\\ 0.2 & .5\end{pmatrix}}_{A}
= \underbrace{\begin{pmatrix}1/\sqrt{1.16} & 1/\sqrt2\\ .4/\sqrt{1.16} & 1/\sqrt2\end{pmatrix}}_{T}
\begin{pmatrix}1 & 0\\ 0 & .7\end{pmatrix}
\underbrace{\begin{pmatrix}1/\sqrt{1.16} & 1/\sqrt2\\ .4/\sqrt{1.16} & 1/\sqrt2\end{pmatrix}^{-1}}_{T^{-1}} $$
$$ T^{-1}Y_t = (T^{-1}AT)\,T^{-1}Y_{t-1} + T^{-1}E_t
\qquad\Longrightarrow\qquad
Z_t = \begin{pmatrix}1 & 0\\ 0 & .7\end{pmatrix}Z_{t-1} + \eta_t $$
Components of the $Z_t$ vector:
$$ Z_{1,t} = Z_{1,t-1} + \eta_{1,t} \qquad\text{"common trend" (unit root)} $$
$$ Z_{2,t} = 0.7\,Z_{2,t-1} + \eta_{2,t} \qquad\text{(stationary root)} $$
$$ \left.\begin{aligned} Y_{1,t} &= w_1Z_{1,t} + w_2Z_{2,t} = \tfrac{1}{\sqrt{1.16}}\,Z_{1,t} + \tfrac{1}{\sqrt2}\,Z_{2,t}\\
Y_{2,t} &= w_3Z_{1,t} + w_4Z_{2,t} = \tfrac{.4}{\sqrt{1.16}}\,Z_{1,t} + \tfrac{1}{\sqrt2}\,Z_{2,t}\end{aligned}\right\}
\ \leftarrow\ \text{share "common trend"} $$
$Z_t = T^{-1}Y_t$, so the last row of $T^{-1}$ is the cointegrating vector. Notice that A is not symmetric and $T^{-1} \ne T'$.
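The stationary direction of this example can be checked numerically. The combination $-0.4\,Y_1 + Y_2$ (proportional to the last row of $T^{-1}$) is a left eigenvector of A for the root 0.7, so iterating $Y_t = AY_{t-1}$ with no noise shrinks it by exactly 0.7 each step. A short sketch (variable names are mine):

```python
# A has eigenvalues 1 and 0.7; (-0.4, 1) is a left eigenvector for 0.7, so the
# combination -0.4*Y1 + Y2 shrinks by exactly 0.7 each noiseless iteration.
A = [[1.2, -0.5], [0.2, 0.5]]

def step(y):
    return [A[0][0] * y[0] + A[0][1] * y[1],
            A[1][0] * y[0] + A[1][1] * y[1]]

y = [3.0, 7.0]
combos = []
for _ in range(10):
    combos.append(-0.4 * y[0] + y[1])
    y = step(y)
ratios = [combos[k + 1] / combos[k] for k in range(len(combos) - 1)]
```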
Engle - Granger method

This is one of the earliest and easiest to understand treatments of cointegration.
$$ Y_{1,t} = w_1Z_{1,t} + w_2Z_{2,t} \qquad\text{and}\qquad Y_{2,t} = w_3Z_{1,t} + w_4Z_{2,t} $$
where $n^{-2}\sum Z_{1,t}^2$ is $O_p(1)$ and $n^{-2}\sum Z_{2,t}^2$ is $O_p(1/n)$, so if we regress $Y_{1,t}$ on $Y_{2,t}$ our regression coefficient is
$$ \frac{n^{-2}\sum_{t=1}^{n} Y_{1t}Y_{2t}}{n^{-2}\sum_{t=1}^{n} Y_{2t}^2}
= \frac{n^{-2}\,w_1w_3\sum Z_{1,t}^2 + O_p(1/\sqrt n)}{n^{-2}\,w_3^2\sum Z_{1,t}^2 + O_p(1/\sqrt n)}
= \frac{w_1}{w_3} + O_p\Big(\frac{1}{\sqrt n}\Big) $$
and our residual series is thus approximately
$$ \frac{1}{w_3}\big[\,w_3Y_{1t} - w_1Y_{2t}\,\big] = \Big(w_2 - \frac{w_1w_4}{w_3}\Big)Z_{2,t}, $$
a stationary series. Thus a simple regression of $Y_{1,t}$ on $Y_{2,t}$ gives an estimate of the cointegrating vector, and a test for cointegration is just a test that the residuals are stationary. Let the residuals be $r_t$. Regress $r_t - r_{t-1}$ on $r_{t-1}$ (and possibly some lagged differences). Can we compare to our D-F tables? Engle and Granger argue that one cannot do so.

The null hypothesis is that there is no cointegration; thus the bivariate series has 2 unit roots and no linear combination is stationary. We have, in a sense, looked through all possible linear combinations of $Y_{1,t}$ and $Y_{2,t}$, finding the one that varies least (least squares) and hence the one that looks most stationary. It is as though we had computed unit root tests for all possible linear combinations and then selected the one most likely to reject. We are thus in the area of order statistics. (If you report the minimum heights from samples of 10 men each, the distribution of these minima will not be the same as the distribution of heights of individual men, nor will the distribution of unit root tests from these "best" linear combinations be the same as the distribution you would get for a prespecified linear combination.) Engle and Granger provide adjusted critical values. Here is a table comparing their E-G tables to our D-F tables for n=100. E-G used an augmented regression with 4 lagged differences and an intercept to calculate a t statistic $\tau_\mu$, so keep in mind that part of the discrepancy is due to finite sample effects of the (asymptotically negligible) lagged differences.

Prob of smaller $\tau_\mu$:     .01      .05      .10
    E-G                        -3.77    -3.17    -2.84
    D-F                        -3.51    -2.89    -2.58

Example:
    $P_t$ = cash price on delivery date, Texas steers
    $F_t$ = futures price
    (source: Ken Mathews, NCSU Ag. Econ.)
    Data are bimonthly Feb. '76 through Dec. '86 (60 obs.)

1. Test individual series for integration
$$ \nabla P_t = 7.6 - 0.117\,P_{t-1} + \sum_{i=1}^{5}\hat\beta_i\,\nabla P_{t-i}
\qquad \tau_\mu(\text{D-F}) = \frac{-.117}{.053} = -2.203 $$
$$ \nabla F_t = 7.7 - 0.120\,F_{t-1} + \sum_{i=1}^{5}\hat\beta_i\,\nabla F_{t-i}
\qquad \tau_\mu(\text{D-F}) = \frac{-.120}{.054} = -2.228 $$
Each series is integrated; we cannot reject the unit root at the 10% level.

2. Regress $F_t$ on $P_t$:
$$ \hat F_t = .5861 + .9899\,P_t, \quad\text{residual } R_t; \qquad
\nabla R_t = 0.110 - .9392\,R_{t-1}, \qquad \tau_\mu(\text{E-G}) = \frac{-.9392}{.1264} = -7.428 $$
Thus, with a bit of rounding, $F_t - 1.00\,P_t$ is stationary.
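The two-step procedure is easy to sketch on simulated data (this is an illustration with my own data-generating choices, not the steer data): two series sharing a common random walk are regressed on each other, and a D-F style regression is run on the residuals:

```python
import random

def engle_granger_sketch(n, seed=5):
    # Simulate a cointegrated pair sharing one common random walk z:
    #   Y1 = z + u, Y2 = z + v  (u, v iid noise), so Y1 - Y2 is stationary.
    rng = random.Random(seed)
    z = 0.0
    y1s, y2s = [], []
    for _ in range(n):
        z += rng.gauss(0.0, 1.0)
        y1s.append(z + rng.gauss(0.0, 1.0))
        y2s.append(z + rng.gauss(0.0, 1.0))
    # Step 1: OLS of Y1 on (1, Y2); slope should be near 1
    my1, my2 = sum(y1s) / n, sum(y2s) / n
    b = sum((a - my2) * (c - my1) for a, c in zip(y2s, y1s)) / \
        sum((a - my2) ** 2 for a in y2s)
    resid = [c - my1 - b * (a - my2) for a, c in zip(y2s, y1s)]
    # Step 2: DF regression on the residuals (no intercept)
    num = sum(resid[t - 1] * (resid[t] - resid[t - 1]) for t in range(1, n))
    den = sum(resid[t - 1] ** 2 for t in range(1, n))
    rho_hat = 1.0 + num / den
    mse = sum((resid[t] - rho_hat * resid[t - 1]) ** 2
              for t in range(1, n)) / (n - 2)
    tau = (rho_hat - 1.0) / (mse / den) ** 0.5
    return b, tau

b, tau = engle_granger_sketch(400)
```

With strongly stationary residuals the t statistic is far below any of the E-G critical values above; in practice it must be compared to the E-G table, not the D-F table.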
The Engle-Granger method requires the specification of one series as the dependent variable in the bivariate regression. Fountis and Dickey (Annals of Stat.) study distributions for the multivariate system.

If $Y_t = AY_{t-1} + E_t$ then $Y_t - Y_{t-1} = -(I - A)\,Y_{t-1} + E_t$.

We show that if the true series has one unit root, then the root of the least squares estimated matrix $I - \hat A$ that is closest to 0 has the same limit distribution, after multiplication by n, as the standard D-F tables, and we suggest the use of the eigenvectors of $I - \hat A$ to estimate the cointegrating vector. The only test we can do with this is the null of one unit root versus the alternative of stationarity. Johansen's test, discussed later, extends this in a very nice way. Our result also holds for higher dimension models, but requires the extraction of roots of the estimated characteristic polynomial.
For the Texas steer futures data, the regression gives
$$ \begin{pmatrix}\nabla P_t\\ \nabla F_t\end{pmatrix} = \begin{pmatrix}5.3\\ 6.9\end{pmatrix}
+ \begin{pmatrix}-1.77 & 1.69\\ -1.03 & 0.93\end{pmatrix}\begin{pmatrix}P_{t-1}\\ F_{t-1}\end{pmatrix}
+ 3\text{ lagged differences} $$
where
$$ \begin{pmatrix}0.54 & -0.82\\ -.69 & 0.72\end{pmatrix}
\begin{pmatrix}-1.77 & 1.69\\ -1.03 & 0.93\end{pmatrix}
\begin{pmatrix}-3.80 & -4.42\\ -3.63 & -2.84\end{pmatrix} $$
diagonalizes the lagged-level coefficient matrix (the rows of the left factor are its left eigenvectors), indicating that $-.69\,P_t + 0.72\,F_t$ is stationary. This is about .7 times the difference, so the two methods agree that $P_t - F_t$ is stationary, as is any multiple of it.
Johansen's Method

This method is similar to that just illustrated but has the advantage of being able to test for any number of unit roots. The method can be described as the application of standard multivariate calculations in the context of a vector autoregression, or VAR. The test statistics are those found in any multivariate text. Johansen's idea, as in univariate unit root tests, is to get the right distribution for these standard calculated statistics. The statistics are standard; their distributions are not.

We start with just a lag one model with mean 0 (no intercept):
$$ \nabla Y_t = \Pi\,Y_{t-1} + E_t $$
where $Y_t$ is a p-dimensional column vector, as is $E_t$; assume $E\{E_tE_t'\} = \Lambda$.
$$ H_0:\ \Pi = -\alpha\beta' = -\big[\,\alpha\,\big]_{p\times r}\big[\,\beta'\,\big]_{r\times p} $$
$r = 0$ ⟹ all linear combinations nonstationary
$r = p$ ⟹ all linear combinations stationary
$0 < r < p$ ⟹ cointegration

Note: for any $\Pi$ there are infinitely many $\alpha, \beta$ such that $\Pi = -\alpha\beta'$ [because $-\alpha\beta' = -\alpha TT^{-1}\beta'$], so we do not test hypotheses about $\alpha$ and $\beta$, only about the rank r.

Now define sums of squares and cross products:
$$ \begin{pmatrix}S_{00} & S_{01}\\ S_{10} & S_{11}\end{pmatrix}
= \sum_{t=1}^{n}\begin{pmatrix}\nabla Y_t\\ Y_{t-1}\end{pmatrix}\big(\nabla Y_t'\ \ Y_{t-1}'\big),
\qquad\text{for example}\quad S_{11} = \sum_{t=1}^{n} Y_{t-1}Y_{t-1}'. $$
Now write down the likelihood (conditional on $Y_0 = 0$):
$$ L = \frac{1}{(2\pi)^{np/2}\,|\Lambda|^{n/2}}\exp\Big\{-\tfrac12\sum_{t=1}^{n}(\nabla Y_t - \Pi Y_{t-1})'\,\Lambda^{-1}\,(\nabla Y_t - \Pi Y_{t-1})\Big\} $$
If $\Pi$ is assumed to be full rank (r = p), then the likelihood is maximized at the usual estimate -- the least squares regression estimate
$$ \hat\Pi = \sum_{t=1}^{n}\big(\nabla Y_t\,Y_{t-1}'\big)\Big[\sum_{t=1}^{n} Y_{t-1}Y_{t-1}'\Big]^{-1} = S_{01}S_{11}^{-1} $$
and
$$ \hat\Lambda = \frac1n\sum_{t=1}^{n}\big(\nabla Y_t - \hat\Pi Y_{t-1}\big)\big(\nabla Y_t - \hat\Pi Y_{t-1}\big)'
= \frac1n\big(S_{00} - \hat\Pi S_{11}\hat\Pi'\big). $$
$H_0$: r stationary linear combinations of $Y_t$ (linearly indep.) and thus (p-r) unit root linear combinations
$H_0$: r "cointegrating vectors" and (p-r) "common trends"
$H_0$: $\Pi = -\alpha\beta'$ with $\alpha_{\,p\times r}$ and $\beta_{\,p\times r}$

So far we have the unrestricted estimate $(\hat\Pi,\ \hat\Lambda)$ and can evaluate the likelihood there. The principle of likelihood ratio requires that we maximize the likelihood for $\Pi = -\alpha\beta'$ and compare to the unrestricted maximum. That is, we now want to maximize
$$ L = \frac{1}{(2\pi)^{np/2}\,|\Lambda|^{n/2}}\exp\Big\{-\tfrac12\sum_{t=1}^{n}(\nabla Y_t + \alpha\beta'Y_{t-1})'\,\Lambda^{-1}\,(\nabla Y_t + \alpha\beta'Y_{t-1})\Big\} $$

Step 1:
For any given $\beta$ we can compute $\beta'Y_{t-1}$ and find the corresponding $\alpha$ by regression in the model
$$ \nabla Y_t = -\alpha\,(\beta'Y_{t-1}) + E_t $$
and this is simply
$$ \hat\alpha(\beta) = -\sum_{t=1}^{n}\big(\nabla Y_t\,Y_{t-1}'\beta\big)\Big[\sum_{t=1}^{n}\big(\beta'Y_{t-1}Y_{t-1}'\beta\big)\Big]^{-1} $$

Step 2:
Search over $\beta$ for the maximum. To do this, plug $\hat\alpha(\beta)$ into the likelihood function, which now becomes a function of $\beta$ and $\Lambda$. Now recall (from general regression) that
$$ \exp\Big\{-\tfrac12\,\text{trace}\big[\,X\big(\tfrac1n X'X\big)^{-1}X'\,\big]\Big\}
= \exp\Big\{-\tfrac n2\,\text{trace}\big[\,(X'X)^{-1}X'X\,\big]\Big\} = \exp\Big\{-\tfrac{np}2\Big\} $$
where exp is the exponential function and p = rank(X). In our case X has t-th row $(\nabla Y_t + \alpha\beta'Y_{t-1})'$ and, by our usual maximum likelihood arguments, we will, for any given $\beta$, estimate $\Lambda$ by $\hat\Lambda(\beta) = n^{-1}X'X$, so that
$$ L = \frac{1}{(2\pi)^{np/2}\,|\hat\Lambda(\beta)|^{n/2}}\exp\Big\{-\frac{np}2\Big\}. $$
Our goal now is to maximize L, which we would do by minimizing $|\hat\Lambda(\beta)|$.
Step 2a:
Minimize $|\hat\Lambda(\beta)| = \big|\,S_{00} - S_{01}\beta(\beta'S_{11}\beta)^{-1}\beta'S_{10}\,\big|$.

Recall for a 2x2 matrix we have
$$ \begin{vmatrix} a & b\\ c & d\end{vmatrix} = d\,(a - b\,d^{-1}c) = a\,(d - c\,a^{-1}b) $$
and similarly for the determinant of a partitioned matrix. Thus
$$ \begin{vmatrix} S_{00} & S_{01}\beta\\ \beta'S_{10} & \beta'S_{11}\beta\end{vmatrix}
= |\hat\Lambda(\beta)|\;|\beta'S_{11}\beta|
= |S_{00}|\;|\beta'S_{11}\beta - \beta'S_{10}S_{00}^{-1}S_{01}\beta| $$
so our problem now becomes:
$$ \min_\beta\;\frac{|\hat\Lambda(\beta)|}{|S_{00}|}
\;\Longleftrightarrow\;
\min_\beta\;\frac{|\beta'S_{11}\beta - \beta'S_{10}S_{00}^{-1}S_{01}\beta|}{|\beta'S_{11}\beta|} = (*) $$

Recall: Cholesky root. $S_{11}$ p.s.d. and symmetric ⟹ $S_{11} = U'U$ ($U$ upper triangular, so $S_{11}$ is a product of a lower and an upper triangular factor). SAS: PROC IML; U=ROOT(S11);
$$ (*) = \frac{\big|\,\beta'U'\big(I - (U')^{-1}S_{10}S_{00}^{-1}S_{01}U^{-1}\big)U\beta\,\big|}{\big|\,\beta'U'U\beta\,\big|} $$

Note: We have seen that $\Pi = -\alpha\beta'$ allows a lot of flexibility in choosing the columns of $\beta$. (Corresponding adjustments in $\alpha$ will preserve $\Pi$.) We choose $\beta'U'U\beta = I$.

Let $Z = U\beta$ $\;[\,\beta = U^{-1}Z\,]$.
Fact: when $\zeta_i$, the $i$th column of $Z$, is an eigenvector of the symmetric matrix below (so you can get it in SAS),
$$ Z'\big(I - (U')^{-1}S_{10}S_{00}^{-1}S_{01}U^{-1}\big)Z = \text{diagonal matrix}. $$
(1) Cholesky on $S_{11}$
(2) form $(U')^{-1}S_{10}S_{00}^{-1}S_{01}U^{-1}$
(3) EIGENVECTORS $\zeta_i$, EIGENVALUES $\lambda_1 > \lambda_2 > \lambda_3 > \cdots > \lambda_p$
(4) $\hat\beta = U^{-1}Z$
(5) Get $\hat\alpha$ by regressing $\nabla Y_t$ on $(\hat\beta'Y_{t-1})$: $\;\hat\alpha' = -(\hat\beta'S_{11}\hat\beta)^{-1}(\hat\beta'S_{10})$

* Note: the eigenvalues are called "squared canonical correlations" between $\nabla Y_t$ and $Y_{t-1}$. PROC CANCORR will compute these for you.

Testing
Maximize $L$ unconditionally.
Maximize $L$ under $H_0$.
Now look at the likelihood ratio test.
Summary
(1) Choose $\hat\beta$ to minimize $|\hat\Lambda(\beta)|$.
(2) $U$ invertible, so any $\hat\beta$ is expressible as $\hat\beta = U^{-1}Z$ for some choice of $Z$.
(3) The length of each $\hat\beta$ vector is arbitrary, so we can specify $Z'Z = I$.
(4) Pick $Z$ to minimize $\big|\,Z'\big(I - (U')^{-1}S_{10}S_{00}^{-1}S_{01}U^{-1}\big)Z\,\big|$, and thus $Z$ would be picked as the matrix whose columns are those associated with the smallest $(1-\lambda_i)$, that is, the largest "squared canonical correlations" $\lambda_i$.

$H_0$: $r \le r_0$ versus $H_1$: $r > r_0$:
$$ \text{LRT} = \frac{\max_{H_0}(L)}{\max_{H_0\cup H_1}(L)} = \frac{|\hat\Lambda_1|^{n/2}}{|\hat\Lambda_0|^{n/2}}
= \Bigg[\frac{\prod_{i=1}^{p}(1-\lambda_i)}{\prod_{i=1}^{r_0}(1-\lambda_i)}\Bigg]^{n/2}
= \Big[\prod_{i=r_0+1}^{p}(1-\lambda_i)\Big]^{n/2} $$
Now in standard likelihood ratio testing we often take the log of the likelihood ratio. The reason for this is that it often leads to a Chi-square limit distribution. There is no hope of that happening in this nonstandard case, but Johansen still follows that tradition, suggesting that we reject when $\frac n2\sum_{i=r_0+1}^{p}\ln(1-\lambda_i)$ is small; that is, we will reject when
$$ -n\sum_{i=r_0+1}^{p}\ln(1-\lambda_i) \ \text{ is large, where }\ \lambda_{r_0+1} > \lambda_{r_0+2} > \lambda_{r_0+3} > \cdots > \lambda_p $$
are the $p - r_0$ smallest squared canonical correlations. This is Johansen's "trace" test.
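The whole computation fits in a short sketch. Below is a pure-Python illustration for p = 2 (my own simulated cointegrated pair and helper names; a real implementation would use the Cholesky route above rather than a direct eigen solve): the squared canonical correlations are the eigenvalues of $S_{11}^{-1}S_{10}S_{00}^{-1}S_{01}$, and the trace statistics follow:

```python
import math, random

def mat2_inv(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def mat2_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def cross(xs, ys):  # S = sum_t x_t y_t' for 2-vectors
    return [[sum(x[i] * y[j] for x, y in zip(xs, ys)) for j in range(2)]
            for i in range(2)]

random.seed(6)
n = 400
z, y_prev = 0.0, [0.0, 0.0]
dys, lags = [], []
for _ in range(n):  # cointegrated pair sharing the common trend z
    z += random.gauss(0, 1)
    y = [z + random.gauss(0, 1), z + random.gauss(0, 1)]
    dys.append([y[0] - y_prev[0], y[1] - y_prev[1]])
    lags.append(y_prev)
    y_prev = y
S00, S01, S11 = cross(dys, dys), cross(dys, lags), cross(lags, lags)
S10 = [[S01[j][i] for j in range(2)] for i in range(2)]
# Eigenvalues of S11^{-1} S10 S00^{-1} S01 = squared canonical correlations
M = mat2_mul(mat2_inv(S11), mat2_mul(S10, mat2_mul(mat2_inv(S00), S01)))
tr, det = M[0][0] + M[1][1], M[0][0] * M[1][1] - M[0][1] * M[1][0]
disc = math.sqrt(tr * tr - 4 * det)
lam = sorted([(tr + disc) / 2, (tr - disc) / 2], reverse=True)
trace_r0 = -n * (math.log(1 - lam[0]) + math.log(1 - lam[1]))  # test r <= 0
trace_r1 = -n * math.log(1 - lam[1])                            # test r <= 1
```

With one true cointegrating vector, the r ≤ 0 statistic should be very large while the r ≤ 1 statistic stays moderate.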
To keep things straight ........

1. You use the smallest squared canonical correlations, thus making $1-\lambda_i$ large (nearer 1) and hence making $-n\sum\ln(1-\lambda_i)$ small (i.e. you select the $\lambda_i$ that best protect $H_0$).

2. In a later article, Johansen notes that you may have better power if you opt to test $H_0$: $r = r_0$ versus $H_1$: $r = r_0+1$ and thus use the largest ($\lambda_{r_0+1}$) of the smallest $\lambda$'s. This is Johansen's "maximal eigenvalue" test.

3. Under $H_0$ or $H_1$ you have at least $r_0$ cointegrating vectors and hence at most $p-r_0$ "common trends." Therefore rejection of the null hypothesis means you have found yet another cointegrating vector.

4. The interpretation of a cointegrating vector is that you have found a linear combination of your vector components that cannot vary too far from 0 (i.e. you have a "law" that cannot be too badly violated). A departure from this relationship would be called an "error," and if we start at any point and forecast into the future with this model, the forecasts will eventually satisfy the relationship. Therefore this kind of model is referred to as an "error correction" model.
Example:
We observe
$$ Z_{1t} = Z_{1,t-1} + e_{1t} \qquad\qquad Z_{2t} = .8\,Z_{2,t-1} + e_{2t} $$
$$ Y_{1t} = Z_{1t} + .9\,Z_{2t} \qquad\qquad Y_{2t} = Z_{1t} - .6\,Z_{2t} $$
Notice that $Y_{1t} - Y_{2t} = 1.5\,Z_{2t}$ is stationary, so we are saying that $Y_{1t}$ can't wander too far from $Y_{2t}$, and yet both $Y$'s are nonstationary. They are wandering around, but wandering around together, you might say.

Now in practice we would just observe the Y's. Notice that
$$ Y_t = \begin{pmatrix}1 & .9\\ 1 & -.6\end{pmatrix}Z_t
\qquad\Longrightarrow\qquad
Y_t = \begin{pmatrix}1 & .9\\ 1 & -.6\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & .8\end{pmatrix}\begin{pmatrix}1 & .9\\ 1 & -.6\end{pmatrix}^{-1}Y_{t-1} + \text{noise}
\;=\; \begin{pmatrix}.88 & .12\\ .08 & .92\end{pmatrix}Y_{t-1} + \text{noise}\quad(\text{exactly}). $$
Now suppose $Y_{1t} = 12$ and $Y_{2t} = 2$. These are not very close to each other and thus are in violation of the equilibrium condition $Y_{1t} = Y_{2t}$. In the absence of future shocks, does the model indicate that this "error" will "correct" itself?
Error correction:
The next period we forecast
$$ \hat Y_{t+1} = \begin{pmatrix}.88 & .12\\ .08 & .92\end{pmatrix}\begin{pmatrix}12\\ 2\end{pmatrix} = \begin{pmatrix}10.8\\ 2.8\end{pmatrix} $$
whose components are closer together. Our two step ahead forecast is
$$ \begin{pmatrix}.88 & .12\\ .08 & .92\end{pmatrix}\begin{pmatrix}10.8\\ 2.8\end{pmatrix} = \begin{pmatrix}9.84\\ 3.44\end{pmatrix} $$
and, continuing, one finds that
$$ \begin{pmatrix}.88 & .12\\ .08 & .92\end{pmatrix}^{10}\begin{pmatrix}12\\ 2\end{pmatrix} = \begin{pmatrix}6.644\\ 5.571\end{pmatrix}
\qquad\text{and}\qquad
\begin{pmatrix}.88 & .12\\ .08 & .92\end{pmatrix}^{50}\begin{pmatrix}12\\ 2\end{pmatrix} = \begin{pmatrix}6.0001\\ 5.9999\end{pmatrix}. $$

Let us take this example a step further. Modeling the changes in the series as a function of the lagged levels, we have
$$ \nabla Y_t = \begin{pmatrix}.88-1 & .12\\ .08 & .92-1\end{pmatrix}Y_{t-1} + \text{noise}
= \begin{pmatrix}-.12\\ .08\end{pmatrix}\big(1\ \ {-1}\big)\,Y_{t-1} + \text{noise}, $$
so we see that the discrepancy from equilibrium, $(1\ \ {-1})\,Y_{t-1}$, which in our case is 10, is computed, then .12 times this is subtracted from $Y_1$ and .08 times this is added to $Y_2$. The "speed of adjustment" is thus faster in $Y_1$, and we end up farther from the original $Y_1$ than from the original $Y_2$. Also, although the model implies (assuming 0 initial condition) that $E\{Y_1\} = E\{Y_2\} = 0$, there is nothing drawing the series back toward their theoretical means (0). Shocks to this series have a permanent effect on the levels of the series, but only a temporary effect on the relationship (equality in this model) between the series components.
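The forecast iteration is easy to reproduce. A sketch (variable names are mine): the gap $Y_1 - Y_2$ shrinks by the stationary root 0.8 each step, while the forecasts settle toward equal levels near 6:

```python
# Iterate the forecast recursion Y_{t+1} = A Y_t from (12, 2): (1, -1) is a
# left eigenvector of A with eigenvalue 0.8, so the gap Y1 - Y2 shrinks by
# exactly 0.8 per step while the "common trend" component persists.
A = [[0.88, 0.12], [0.08, 0.92]]
y = [12.0, 2.0]
gaps = []
for _ in range(30):
    gaps.append(y[0] - y[1])
    y = [A[0][0] * y[0] + A[0][1] * y[1],
         A[1][0] * y[0] + A[1][1] * y[1]]
```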
Remaining to do:
Q1: What are the critical values for the test?
Q2: What if we have more than 1 lag?
Q3: What if we include intercepts, trends, etc.?

Q1. Usually the likelihood ratio test has a limit Chi-square distribution. For example, a regression F statistic with 4 numerator degrees of freedom satisfies $4F \xrightarrow{\;L\;} \chi^2_4$; also $F = t^2 \xrightarrow{\;L\;} Z^2$ (1 numerator degree of freedom) is a special case. Now we have seen that the t statistic, $\tau$, has a nonstandard limit distribution expressible as a functional of Brownian Motion $B(t)$ on [0,1]. We found, in the limit,
$$ \tau \;\xrightarrow{\;L\;}\; \frac{\int_0^1 B(t)\,dB(t)}{\big[\int_0^1 B^2(t)\,dt\big]^{1/2}}
= \frac{\tfrac12\,\big[B^2(1) - 1\big]}{\big[\int_0^1 B^2(t)\,dt\big]^{1/2}} $$
and we might thus expect Johansen's test statistic to converge to a multivariate analogue of this expression. Indeed, Johansen proved that his likelihood ratio (trace) test converges to a variable that can be expressed as a functional of a vector valued Brownian Motion with independent components (channels) as follows (i.e. the error term has variance matrix $I\sigma^2$):
Coint - 21
"
"
"
_
LRT Ä trace œ[ '! F Ð>Ñ . F Ð>Ñ ]w [ '! F Ð>Ñ F w Ð>Ñ .> ]" ['! F Ð>Ñ . F Ð>Ñ] 
For a Brownian motion of dimension m = 1,2,3,4,5 (Table 1, page 239 of Johansen) he
computes the distribution of the LRT by Monte Carlo. Empirically he notes that these
percentiles are quite close to the percentiles of c χ²_f where f = 2m² and c = .85 - .58/f.
We will not repeat Johansen's development of the limit distribution of the LRT
but will simply note that it is what would be expected by one familiar with the usual limit
results for F statistics and with the nonstandard limit distributions that arise with unit root
processes. As one might expect, the m = 1 case can be derived from the τ distribution.
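Johansen's c χ²_f approximation is simple to evaluate. The helper below is a hypothetical sketch, not from the notes: it uses the Wilson-Hilferty approximation to the χ² quantile (via the standard-library NormalDist) rather than a statistics package, so the resulting critical value is itself approximate.

```python
import math
from statistics import NormalDist

def approx_lrt_critical(m, p=0.95):
    """Johansen's empirical approximation: LRT percentiles ~ c * chi2_f
    with f = 2*m**2 and c = .85 - .58/f (m = number of common trends)."""
    f = 2 * m * m
    c = 0.85 - 0.58 / f
    z = NormalDist().inv_cdf(p)
    # Wilson-Hilferty approximation to the chi-square p-quantile with f df
    chi2_p = f * (1 - 2 / (9 * f) + z * math.sqrt(2 / (9 * f))) ** 3
    return c * chi2_p

crit = approx_lrt_critical(4)       # rough 95% point for m = 4 common trends
print(round(crit, 1))
```

This gives a value in the neighborhood of Johansen's tabulated percentiles; for careful work one would use his Monte Carlo tables directly.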
Q2. What happens in higher order processes?
    Y_t = A_1 Y_{t-1} + A_2 Y_{t-2} + A_3 Y_{t-3} + ... + A_k Y_{t-k} + ε_t

    ∇Y_t = -(I - A_1 - A_2 - ... - A_k) Y_{t-1} - (A_2 + A_3 + ... + A_k) ∇Y_{t-1}
            - (A_3 + A_4 + ... + A_k) ∇Y_{t-2} - ... - A_k ∇Y_{t-k+1} + ε_t

which has the form

    ∇Y_t = -(I - A_1 - A_2 - ... - A_k) Y_{t-1} + B_1 ∇Y_{t-1}
            + B_2 ∇Y_{t-2} + ... + B_{k-1} ∇Y_{t-k+1} + ε_t

The characteristic equation is |I m^k - A_1 m^{k-1} - A_2 m^{k-2} - ... - A_k| = 0. Now if
m = 1 is a root of this, we have |I - A_1 - A_2 - ... - A_k| = 0. The number of unit
roots in the system is the number of 0 eigenvalues of the matrix (I - A_1 - A_2 - ... - A_k) and the
rank of this matrix is the number of cointegrating vectors. I am assuming a vector
rank of this matrix is the number of cointegrating vectors. I am assuming a vector
autoregression in which each component has at most one unit root (i.e. differencing
makes each component stationary) here. While this parameterization most closely
resembles the usual unit root testing framework, Johansen chooses an equivalent way to
parameterize the model, placing the lagged levels at lag k instead of lag 1. His model is
written:
f]> œ  (M  E" Ñ f]>"  â  (M  E"  E2  â  E5 -1 Ñ f]>5 +"
 (M  E"  E2  â  E5 Ñ ]>5  I>
and he is checking the rank of C = (M  E"  E2  â  E5 Ñ .
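That the two parameterizations are algebraically equivalent can be checked numerically. In this sketch (k = 2, bivariate; the matrices A_1, A_2 and the lagged values are arbitrary illustrative numbers, not from the notes) both forms produce the same ∇Y_t:

```python
# Numeric check (k = 2, bivariate) that the lag-1-levels form and Johansen's
# lag-k-levels form of the VAR give identical dY_t.  Plain nested lists.

def mv(A, x):                       # 2x2 matrix times 2-vector
    return [A[0][0]*x[0] + A[0][1]*x[1], A[1][0]*x[0] + A[1][1]*x[1]]

def sub(u, v):
    return [u[0] - v[0], u[1] - v[1]]

A1 = [[0.5, 0.1], [0.2, 0.6]]       # illustrative coefficients
A2 = [[0.3, -0.1], [0.1, 0.2]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
y1, y2 = [2.0, -1.0], [0.5, 3.0]    # Y_{t-1}, Y_{t-2}

Pi = [[I2[i][j] - A1[i][j] - A2[i][j] for j in range(2)] for i in range(2)]  # I-A1-A2
G1 = [[I2[i][j] - A1[i][j] for j in range(2)] for i in range(2)]             # I-A1

# lag-1-levels form: dY_t = -(I-A1-A2) Y_{t-1} - A2 dY_{t-1}   (noise omitted)
d_form1 = sub(sub([0.0, 0.0], mv(Pi, y1)), mv(A2, sub(y1, y2)))

# Johansen's form:   dY_t = -(I-A1) dY_{t-1} - (I-A1-A2) Y_{t-2}
d_form2 = sub(sub([0.0, 0.0], mv(G1, sub(y1, y2))), mv(Pi, y2))

assert all(abs(a - b) < 1e-12 for a, b in zip(d_form1, d_form2))
```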
Recall: Regression can be done in 3 steps.
Suppose we are regressing Y on X_1, X_2, ..., X_{k-1}, X_k and we are interested in the
coefficient (matrix) of X_k. We can get that coefficient matrix in 3 steps:
Step 1: Regress Y on X_1, X_2, ..., X_{k-1} --- residuals R_Y
Step 2: Regress X_k on X_1, X_2, ..., X_{k-1} --- residuals R_k
Step 3: Regress R_Y on R_k --- gives the same coefficient matrix as in the full regression.
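A small numeric illustration of the 3-step recipe (the data are made up; this is the partialling-out idea in its simplest univariate, no-intercept form):

```python
# The coefficient of x2 in the full regression of y on (x1, x2) equals the
# slope from regressing the residuals R_y on the residuals R_2.

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]      # arbitrary illustrative data
x2 = [2.0, 1.0, 4.0, 3.0, 7.0]
y  = [3.1, 3.9, 8.2, 7.8, 15.0]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Full regression (no intercept): solve the 2x2 normal equations
a11, a12, a22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
b1, b2 = dot(x1, y), dot(x2, y)
det = a11 * a22 - a12 * a12
coef_x2_full = (a11 * b2 - a12 * b1) / det

# Step 1: regress y on x1, keep residuals R_y
ry = [yi - (b1 / a11) * xi for yi, xi in zip(y, x1)]
# Step 2: regress x2 on x1, keep residuals R_2
r2 = [x2i - (a12 / a11) * xi for x2i, xi in zip(x2, x1)]
# Step 3: simple regression of R_y on R_2 reproduces the coefficient
coef_x2_steps = dot(r2, ry) / dot(r2, r2)

assert abs(coef_x2_full - coef_x2_steps) < 1e-10
```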
so Johansen does the following in higher order models:
Step 1: Regress ∇Y_t on ∇Y_{t-1}, ∇Y_{t-2}, ..., ∇Y_{t-k+1} --- residuals R_∇
Step 2: Regress Y_{t-k} on ∇Y_{t-1}, ∇Y_{t-2}, ..., ∇Y_{t-k+1} --- residuals R_k
Step 3: Compute the squared canonical correlations between R_∇ and R_k.
The idea is very nice. Johansen maximizes the likelihood for any C with respect to the
other parameters by performing steps 1 and 2. Having done this, he's back to a lag 1 type
of problem.
By analogy, if in ordinary unit root testing the null hypothesis were true, you
could estimate the autoregressive coefficients consistently by regressing the first
difference on the lagged first differences. Having these estimates, α̂_i, you could
compute a "filtered" version of Y, namely Ẑ_t = Y_t - α̂_1 Y_{t-1} - ... - α̂_p Y_{t-p},
and under the null hypothesis Z_t = Y_t - α_1 Y_{t-1} - ... - α_p Y_{t-p} is a random
walk, so you could regress Ẑ_t on Ẑ_{t-1} and, in large samples, compare the results to
our D-F unit root tables both for the coefficient and t statistic. Note that all of
Johansen's statistics are multivariate
analogues of τ; there is nothing like the "normalized bias test" n(ρ̂ - 1).
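The filtering idea can be sketched in a few lines. Here (an illustration, not Dickey's code; the process, sample size, and seed are arbitrary choices) Y_t is generated from (1 - B)(1 - .5B)Y_t = e_t, so p = 1 and α_1 = .5, and the filtered series Ẑ_t should then behave like a random walk:

```python
# Simulate an AR(2) with one unit root, estimate alpha from differences,
# filter, and check that the filtered series has a levels coefficient near 1.
import random

random.seed(1)
n = 5000
y = [0.0, 0.0]
for _ in range(n):
    # dY_t = 0.5 dY_{t-1} + e_t
    y.append(y[-1] + 0.5 * (y[-1] - y[-2]) + random.gauss(0.0, 1.0))

d = [y[t] - y[t - 1] for t in range(1, len(y))]
alpha_hat = (sum(d[t] * d[t - 1] for t in range(1, len(d)))
             / sum(x * x for x in d[:-1]))          # regress dY_t on dY_{t-1}

z = [y[t] - alpha_hat * y[t - 1] for t in range(1, len(y))]   # filtered series
rho_hat = (sum(z[t] * z[t - 1] for t in range(1, len(z)))
           / sum(x * x for x in z[:-1]))            # regress Zhat_t on Zhat_{t-1}

print(round(alpha_hat, 3), round(rho_hat, 4))
```

In a real test one would compare n(ρ̂ - 1) and its t statistic to the D-F tables, as the text describes.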
Example:
    Y_t = A_1 Y_{t-1} + A_2 Y_{t-2} + A_3 Y_{t-3} + ε_t,   Y_t' = (Y_{1t}, Y_{2t}, Y_{3t}, Y_{4t})
100 observations
Step 1: Regress ∇Y_t on ∇Y_{t-1}, ∇Y_{t-2}
Step 2: Regress Y_{t-3} on ∇Y_{t-1}, ∇Y_{t-2}
Step 3: Squared canonical correlations 0.010, 0.020, 0.08, 0.460
Test H_0: r = 0 vs. H_1: r > 0.
r_0 = 0, p = 4, so use 4 - 0 = 4 smallest canonical correlations.
LRT = -100 [ ln(0.99) + ln(0.98) + ln(0.92) + ln(0.54) ] = 72.98
Johansen, Table 1, gives the critical value for m = 4 (H_0 implies m = 4 common trends).
Critical value is 41.2. Reject H_0. There is at least 1 cointegrating vector.
Test H_0: r = 1 vs. H_1: r > 1.
r_0 = 1, p = 4, so use 4 - 1 = 3 smallest canonical correlations.
LRT = -100 [ ln(0.99) + ln(0.98) + ln(0.92) ] = 11.36
(look in Johansen's table under m = 3 common trends)
Critical value is 23.8. Do not reject H_0. There are no more cointegrating vectors.
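The two trace statistics above can be reproduced directly from the listed squared canonical correlations:

```python
import math

# Trace statistics for the example: n = 100 observations, p = 4 series.
n = 100
lams = sorted([0.010, 0.020, 0.08, 0.460])   # squared canonical correlations

def trace_stat(r0):
    """-n * sum of ln(1 - lambda_i) over the p - r0 smallest correlations."""
    return -n * sum(math.log(1.0 - lam) for lam in lams[:len(lams) - r0])

print(round(trace_stat(0), 2))   # test of r = 0: 72.98, matching the text
print(round(trace_stat(1), 2))   # test of r = 1: 11.36, matching the text
```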
Q3. Johansen later wrote a paper addressing the intercept case. The trend issue is a bit
tricky as is true in the univariate case. Even the intercept case has some interesting
features in terms of how intercepts in the model for the observed series translate into the
underlying canonical series (common trends).
An additional note: We are used to taking -2 ln(L) in maximum likelihood
estimation and likelihood ratio testing. We do this because, under certain regularity
conditions, it produces test statistics with standard limiting distributions (Chi-square,
usually). In this case we do not have the required regularity conditions for the test of r
(the number of cointegrating vectors); otherwise, Johansen would not have needed to do any
new tabulations. Because this is the case, Johansen could have opted to use other
functions of the eigenvalues such as n Σ λ_i in place of his -n Σ ln(1 - λ_i) statistic. We
show (for the case of a univariate series, just one λ) that both of these statistics converge
to the same distribution, and it is the distribution of τ². First note that for the series
Y_t = Y_{t-1} + e_t we have ∇Y_t = Y_t - Y_{t-1} = e_t, so our regression MSE is
Σe_t²/n = S_{00}, our standard error is √(MSE / ΣY²_{t-1}) = √(S_{00}/(n S_{11})),
and ρ̂ - 1 = S_{01}/S_{11}, so
    τ = (ρ̂ - 1) / √(MSE / ΣY²_{t-1}) = (S_{01}/S_{11}) / √( S_{00}/(n S_{11}) )

so that

    τ/√n = S_{01} / √( S_{00} S_{11} ).

Now the "Cholesky root" of the univariate variable S_{11} is, of course, just √S_{11}, and
Johansen looks at the matrix S_{11}^{-1/2}( S_{11} - S_{10} S_{00}^{-1} S_{01} ) S_{11}^{-1/2},
which in the univariate case is 1 - S_{01}²/(S_{00} S_{11}) = 1 - τ²/n, calling its
eigenvalues 1 - λ_i. We see immediately that n λ_i is just
τ², so we already know its distribution (the multivariate cases are analogous).
This also shows that λ_i = O_p(1/n), so if we expand Johansen's statistic, using the
Taylor series for ln(1 - x) expanded around x = 0, we then have

    -n ln(1 - λ_i) = -n( ln(1) + [1/(1 - 0)](-λ_i) + O_p(1/n²) )

from which we see that -n ln(1 - λ_i) = n λ_i + O_p(1/n),
proving, as claimed, that these two statistics have the same limit distribution (of course
there are some extra details needed for the multivariate case). Notice that for a single λ
these two statistics are monotone transforms of each other, so even in finite samples,
provided we had the right distributions, they would give exactly equivalent tests. For the
more interesting multivariate case, they are the same only in the limit.
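A quick numeric check of this expansion (τ² = 4 is an arbitrary illustrative value, not from the notes): with λ = τ²/n the gap between -n ln(1 - λ) and nλ shrinks like 1/n.

```python
import math

# -n*ln(1 - lam) = n*lam + n*lam**2/2 + ... ; with lam = tau**2/n the
# remainder terms are of order 1/n, so the two statistics agree in the limit.
tau2 = 4.0                          # plays the role of tau^2 (hypothetical)
for n in (100, 1000, 10000):
    lam = tau2 / n
    stat_log = -n * math.log(1.0 - lam)
    stat_lin = n * lam              # equals tau2 exactly
    print(n, round(stat_log, 6), round(stat_log - stat_lin, 6))
```

The printed gap is approximately τ⁴/(2n), consistent with the O_p(1/n) remainder above.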