STAT 497 LECTURE NOTE 11
VAR MODELS AND GRANGER CAUSALITY
VECTOR TIME SERIES
• A vector series consists of multiple single series.
• Why do we need multiple series?
– To understand the relationships between the several components
– To obtain better forecasts
VECTOR TIME SERIES
• Price movements in one market can spread easily and instantly to another market. For this reason, financial markets are more dependent on each other than ever before, so we have to consider them jointly to better understand the dynamic structure of the global market. Knowing how markets are interrelated is of great importance in finance.
• For an investor or a financial institution holding multiple assets, the interdependence of markets plays an important role in decision making.
VECTOR TIME SERIES
• Consider an m-dimensional time series Y_t = (Y_{1,t}, Y_{2,t}, …, Y_{m,t})′. The series Y_t is weakly stationary if its first two moments are time invariant and the cross covariance between Y_{i,t} and Y_{j,s} for all i and j is a function of the time difference (s − t) only.
VECTOR TIME SERIES
• The mean vector:
$$E(Y_t) = \mu = (\mu_1, \mu_2, \ldots, \mu_m)'$$
• The covariance matrix function:
$$\Gamma(k) = \mathrm{Cov}(Y_{t+k}, Y_t) = E\left[(Y_{t+k}-\mu)(Y_t-\mu)'\right] = \begin{bmatrix} \gamma_{11}(k) & \gamma_{12}(k) & \cdots & \gamma_{1m}(k) \\ \gamma_{21}(k) & \gamma_{22}(k) & \cdots & \gamma_{2m}(k) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{m1}(k) & \gamma_{m2}(k) & \cdots & \gamma_{mm}(k) \end{bmatrix}$$
VECTOR TIME SERIES
• The correlation matrix function:
$$\rho(k) = D^{-1/2}\,\Gamma(k)\,D^{-1/2} = \left[\rho_{ij}(k)\right]$$
where D is a diagonal matrix in which the i-th diagonal element is the variance of the i-th process, i.e.
$$D = \mathrm{diag}\left(\gamma_{11}(0), \gamma_{22}(0), \ldots, \gamma_{mm}(0)\right).$$
• The covariance and correlation matrix functions are positive semi-definite.
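Since R is used for the applications later in these notes, here is a minimal R sketch of this rescaling; the function name rho_k and its inputs are illustrative, not from the notes.

# Minimal sketch: rho(k) = D^{-1/2} Gamma(k) D^{-1/2}, where the diagonal
# of D holds the process variances gamma_ii(0)
rho_k <- function(Gamma_k, Gamma_0) {
  D_inv_sqrt <- diag(1 / sqrt(diag(Gamma_0)))  # D^{-1/2}
  D_inv_sqrt %*% Gamma_k %*% D_inv_sqrt
}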
VECTOR WHITE NOISE PROCESS
• {a_t} ~ WN(0, Σ) iff {a_t} is stationary with mean vector 0 and
$$\Gamma(k) = \begin{cases} \Sigma, & k = 0 \\ 0, & \text{otherwise.} \end{cases}$$
VECTOR TIME SERIES
• {Y_t} is a linear process if it can be expressed as
$$Y_t = \mu + \sum_{j=0}^{\infty} \Psi_j a_{t-j}, \qquad a_t \sim WN(0, \Sigma)$$
where {Ψ_j} is a sequence of m×m matrices whose entries are absolutely summable, i.e.
$$\sum_{j=0}^{\infty} \left|\psi_j(i,l)\right| < \infty \quad \text{for } i,l = 1, 2, \ldots, m.$$
VECTOR TIME SERIES
• For a linear process, E(Y_t) = μ and
$$\Gamma(k) = \sum_{j=0}^{\infty} \Psi_{j+k}\,\Sigma\,\Psi_j', \qquad k = 0, \pm 1, \pm 2, \ldots$$
(with Ψ_0 = I and Ψ_s = 0 for s < 0).
MA (WOLD) REPRESENTATION
$$Y_t = \mu + \Psi(B) a_t, \qquad \text{where } \Psi(B) = \sum_{s=0}^{\infty} \Psi_s B^s.$$
• For the process to be stationary, the Ψ_s should be square summable in the sense that each of the m×m entry sequences ψ_{ij,s} is square summable.
AR REPRESENTATION
B Yt     at

where  B   1    s B
s
s 0
• For the process to be invertible, s should be
absolute summable.
16
THE VECTOR AUTOREGRESSIVE MOVING
AVERAGE (VARMA) PROCESSES
• VARMA(p,q) process:
$$\Phi_p(B)(Y_t - \mu) = \Theta_q(B) a_t$$
where
$$\Phi_p(B) = \Phi_0 - \Phi_1 B - \cdots - \Phi_p B^p, \qquad \Theta_q(B) = \Theta_0 - \Theta_1 B - \cdots - \Theta_q B^q.$$
• q = 0 ⇒ Φ_p(B)(Y_t − μ) = a_t : VAR(p)
• p = 0 ⇒ Y_t − μ = Θ_q(B) a_t : VMA(q)
VARMA PROCESS
• A VARMA process is stationary if the zeros of |Φ_p(B)| lie outside the unit circle. Then it has the MA representation
$$Y_t - \mu = \Psi(B) a_t, \qquad \Psi(B) = \left[\Phi_p(B)\right]^{-1} \Theta_q(B).$$
• A VARMA process is invertible if the zeros of |Θ_q(B)| lie outside the unit circle. Then it has the AR representation
$$\Pi(B)(Y_t - \mu) = a_t, \qquad \Pi(B) = \left[\Theta_q(B)\right]^{-1} \Phi_p(B).$$
IDENTIFIABILITY PROBLEM
• Multiplying the AR and MA operators by an arbitrary matrix polynomial may yield an identical covariance matrix. So, the VARMA(p,q) model is not identifiable: we cannot uniquely determine p and q.
IDENTIFIABILITY PROBLEM
• Example: Consider the bivariate VARMA(1,1) process
$$\begin{bmatrix} Y_{1,t} \\ Y_{2,t} \end{bmatrix} = \begin{bmatrix} 0 & \phi \\ 0 & 0 \end{bmatrix} \begin{bmatrix} Y_{1,t-1} \\ Y_{2,t-1} \end{bmatrix} + \begin{bmatrix} a_{1,t} \\ a_{2,t} \end{bmatrix} - \begin{bmatrix} 0 & \theta \\ 0 & 0 \end{bmatrix} \begin{bmatrix} a_{1,t-1} \\ a_{2,t-1} \end{bmatrix}$$
or, in operator form,
$$\begin{bmatrix} 1 & -\phi B \\ 0 & 1 \end{bmatrix} \begin{bmatrix} Y_{1,t} \\ Y_{2,t} \end{bmatrix} = \begin{bmatrix} 1 & -\theta B \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a_{1,t} \\ a_{2,t} \end{bmatrix}.$$
Inverting the AR operator gives
$$\begin{bmatrix} Y_{1,t} \\ Y_{2,t} \end{bmatrix} = \begin{bmatrix} 1 & \phi B \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & -\theta B \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a_{1,t} \\ a_{2,t} \end{bmatrix} = \begin{bmatrix} 1 & (\phi-\theta) B \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a_{1,t} \\ a_{2,t} \end{bmatrix},$$
which is a VMA(1) that depends on (φ, θ) only through the difference φ − θ. Replacing (φ, θ) by (φ + m, θ + m) for any constant m therefore yields an observationally equivalent VARMA(1,1): the parameters cannot be determined uniquely.
IDENTIFIABILITY
• To eliminate this problem, three methods were suggested by Hannan (1969, 1970, 1976, 1979):
– From each class of equivalent models, choose the minimum MA order q and AR order p. The resulting representation is unique if Rank(Φ_p(B)) = m.
– Represent Φ_p(B) in lower triangular form. If the order of φ_ij(B) is less than or equal to the order of φ_ii(B) for i, j = 1, 2, …, m, then the model is identifiable.
– Represent Φ_p(B) in the form Φ_p(B) = φ_p(B) I, where φ_p(B) is a univariate AR(p) polynomial. The model is identifiable if φ_p ≠ 0.
VAR(1) PROCESS
• Y_{i,t} depends not only on the lagged values of Y_{i,t} but also on the lagged values of the other variables:
$$(I - \Phi B)(Y_t - \mu) = a_t$$
• Always invertible.
• Stationary if the zeros of |I − ΦB| lie outside the unit circle. Letting λ = B^{-1},
$$\left|I - \Phi B\right| = 0 \iff \left|\lambda I - \Phi\right| = 0,$$
so the zeros of |I − ΦB| are related to the eigenvalues of Φ.
VAR(1) PROCESS
• Hence, a VAR(1) process is stationary if the eigenvalues λ_i, i = 1, 2, …, m, of Φ are all inside the unit circle.
• The autocovariance matrix (taking μ = 0):
$$\Gamma(k) = E\left(Y_{t+k} Y_t'\right) = E\left[\left(\Phi Y_{t+k-1} + a_{t+k}\right) Y_t'\right] = \Phi\,E\left(Y_{t+k-1} Y_t'\right) + E\left(a_{t+k} Y_t'\right)$$
$$\Gamma(k) = \begin{cases} \Phi\,\Gamma(1)' + \Sigma, & k = 0 \\ \Phi\,\Gamma(k-1) = \Phi^k\,\Gamma(0), & k \ge 1 \end{cases}$$
VAR(1) PROCESS
• For k = 1, Γ(1) = Φ Γ(0). Substituting into the k = 0 equation,
$$\Gamma(0) = \Phi\,\Gamma(1)' + \Sigma = \Phi\left[\Phi\,\Gamma(0)\right]' + \Sigma = \Phi\,\Gamma(0)\,\Phi' + \Sigma.$$
VAR(1) PROCESS
• Then, Γ(0) − Φ Γ(0) Φ′ = Σ, so that
$$\mathrm{vec}\,\Gamma(0) = \left(I - \Phi \otimes \Phi\right)^{-1} \mathrm{vec}(\Sigma)$$
where ⊗ denotes the Kronecker product, vec stacks the columns of a matrix, and
$$\mathrm{vec}(ABC) = \left(C' \otimes A\right) \mathrm{vec}(B).$$
e.g.
$$X = \begin{bmatrix} 3 & 2 \\ 4 & 6 \\ 1 & 7 \end{bmatrix} \Rightarrow \mathrm{vec}(X) = \left(3, 4, 1, 2, 6, 7\right)'$$
e.g.
$$A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}$$
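The vec example can be checked directly in R, where as.vector() performs the column stacking:

# vec stacks the columns of X
X <- matrix(c(3, 4, 1, 2, 6, 7), nrow = 3)  # columns (3, 4, 1) and (2, 6, 7)
as.vector(X)                                # 3 4 1 2 6 7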
VAR(1) PROCESS
• Example:
$$Y_t = \begin{bmatrix} 1.1 & -0.3 \\ 0.6 & 0.2 \end{bmatrix} Y_{t-1} + a_t$$
$$\det\left(\lambda I - \Phi\right) = (1.1-\lambda)(0.2-\lambda) + (0.3)(0.6) = \lambda^2 - 1.3\lambda + 0.4 = 0$$
$$\Rightarrow \lambda_1 = 0.8, \quad \lambda_2 = 0.5$$
Both eigenvalues are inside the unit circle, so the process is stationary.
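These calculations are easy to verify in base R. The sketch below also applies the vec formula from the previous slide; the innovation covariance Σ = I is an assumption made here for illustration, since the example does not specify Σ.

Phi <- matrix(c(1.1, 0.6, -0.3, 0.2), 2, 2)    # the example's coefficient matrix
eigen(Phi)$values                              # 0.8 and 0.5: inside the unit circle
Sigma <- diag(2)                               # assumed Sigma = I (illustration only)
vecG0 <- solve(diag(4) - kronecker(Phi, Phi), as.vector(Sigma))
Gamma0 <- matrix(vecG0, 2, 2)                  # from vec Gamma(0) = (I - Phi (x) Phi)^{-1} vec(Sigma)
Gamma1 <- Phi %*% Gamma0                       # Gamma(1) = Phi Gamma(0)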
VMA(1) PROCESS
$$Y_t = \mu + a_t - \Theta a_{t-1}, \qquad a_t \sim WN(0, \Sigma).$$
• Always stationary.
• The autocovariance function:
$$\Gamma(k) = \begin{cases} \Sigma + \Theta\,\Sigma\,\Theta', & k = 0 \\ -\Theta\,\Sigma, & k = 1 \\ -\Sigma\,\Theta', & k = -1 \\ 0, & \text{otherwise.} \end{cases}$$
• The autocovariance matrix function cuts off after lag 1.
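The cut-off property is easy to see by simulation; the coefficient matrix Θ and the choice Σ = I below are illustrative, not from the notes.

set.seed(42)
n <- 5000
a <- matrix(rnorm(2 * n), ncol = 2)            # white noise with Sigma = I (assumed)
Theta <- matrix(c(0.6, 0.3, -0.2, 0.4), 2, 2)  # illustrative coefficient matrix
Y <- a[2:n, ] - a[1:(n - 1), ] %*% t(Theta)    # Y_t = a_t - Theta a_{t-1}
acf(Y, lag.max = 4)                            # sample (cross-)correlations ~ 0 beyond lag 1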
VMA(1) PROCESS
• Hence, a VMA(1) process is invertible if the eigenvalues λ_i, i = 1, 2, …, m, of Θ are all inside the unit circle.
IDENTIFICATION OF VARMA
PROCESSES
• Same as in the univariate case.
• SAMPLE CORRELATION MATRIX FUNCTION: Given a vector series of n observations, the sample correlation matrix function is
$$\hat{\rho}(k) = \left[\hat{\rho}_{ij}(k)\right]$$
where the ρ̂_ij(k) are the sample cross-correlations between the i-th and j-th component series.
• It is very useful for identifying a VMA(q).
SAMPLE CORRELATION MATRIX FUNCTION
• Tiao and Box (1981) proposed using +, −, and . signs to show the significance of the cross-correlations:
+ sign: the value is greater than 2 times the estimated standard error
− sign: the value is less than −2 times the estimated standard error
. sign: the value is within ±2 times the estimated standard error
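A small R sketch of this coding scheme, assuming the observations are in an n × m matrix X; the threshold 2/√n approximates two estimated standard errors under white noise, and the function name is illustrative.

# Tiao-Box style +/-/. coding of the lag-k sample cross-correlation matrix
ccf_signs <- function(X, k) {
  n <- nrow(X)
  r <- acf(X, lag.max = k, plot = FALSE)$acf[k + 1, , ]  # rho_hat(k), an m x m matrix
  se2 <- 2 / sqrt(n)                                     # approx. two standard errors
  matrix(c(".", "+", "-")[1 + (r > se2) + 2 * (r < -se2)], nrow = nrow(r))
}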
PARTIAL AUTOREGRESSION OR PARTIAL
LAG CORRELATION MATRIX FUNCTION
• These are useful for identifying the VAR order. The partial autoregression matrix function was proposed by Tiao and Box (1981), but it is not a proper correlation coefficient. Heyse and Wei (1985) therefore proposed the partial lag correlation matrix function, which is a proper correlation coefficient. Both can be used to identify a VARMA(p,q).
EXAMPLE OF VAR MODELING IN R
• The “vars” package deals with VAR models.
• Let’s consider the Canadian data for an application of the model.
• Canadian time series for labour productivity (prod), employment (e), unemployment rate (U) and real wages (rw) (source: OECD database).
• The series are quarterly. The sample ranges from the 1st quarter of 1980 to the 4th quarter of 2000.
Canadian example
> library(vars)
> data(Canada)
> layout(matrix(1:4, nrow = 2, ncol = 2))
> plot.ts(Canada[, "e"], main = "Employment", ylab = "", xlab = "")
> plot.ts(Canada[, "prod"], main = "Productivity", ylab = "", xlab = "")
> plot.ts(Canada[, "rw"], main = "Real Wage", ylab = "", xlab = "")
> plot.ts(Canada[, "U"], main = "Unemployment Rate", ylab = "", xlab = "")
• An optimal lag order can be determined according to an information criterion or the final prediction error of a VAR(p) with the function VARselect().
> VARselect(Canada, lag.max = 5, type = "const")
$selection
AIC(n)  HQ(n)  SC(n) FPE(n)
     3      2      2      3
• According to the more conservative SC(n) and HQ(n) criteria, the empirical optimal lag order is 2.
• In the next step, the VAR(2) is estimated with the function VAR(), and a constant is included as deterministic regressor.
> var.2c <- VAR(Canada, p = 2, type = "const")
> names(var.2c)
[1] "varresult" "datamat" "y" "type" "p"
[6] "K" "obs" "totobs" "restrictions" "call"
> summary(var.2c)
> plot(var.2c)
• The OLS results of the example are shown in separate tables 1–4 below. It turns out that not all lagged endogenous variables enter significantly into the equations of the VAR(2).
[Tables 1–4: OLS estimation results for the e, prod, rw and U equations]
The stability of the system of difference equations has to be checked. If the moduli of the eigenvalues of the companion matrix are less than one, the system is stable.
> roots(var.2c)
[1] 0.9950338 0.9081062 0.9081062 0.7380565 0.7380565 0.1856381 0.1428889 0.1428889
Although the first eigenvalue is quite close to unity, for the sake of simplicity we assume a stable VAR(2) process with a constant as deterministic regressor.
Restricted VARs
• From tables 1–4 it is obvious that not all regressors enter significantly.
• With the function restrict() the user has the option to re-estimate the VAR either by significance (argument method = "ser") or by imposing zero restrictions manually (argument method = "manual").
• In the former case, each equation is re-estimated separately as long as there are t-values that are in absolute value below the threshold value set by the function’s argument thresh.
• In the latter case, a restriction matrix has to be provided that consists of 0/1 values, thereby selecting the coefficients to be retained in the model. The function is used as follows:
> var2c.ser <- restrict(var.2c, method = "ser", thresh = 2)
> var2c.ser$restrictions
     e.l1 prod.l1 rw.l1 U.l1 e.l2 prod.l2 rw.l2 U.l2 const
e       1       1     1    1    1       0     0    0     1
prod    0       1     0    0    1       0     1    1     1
rw      0       1     1    0    1       0     0    1     0
U       1       0     0    1    1       0     1    0     1
> B(var2c.ser)
Diagnostic testing
• In package ‘vars’ the functions for diagnostic
testing are arch(), normality(), serial() and
stability().
> var2c.arch <- arch(var.2c)
• The Jarque-Bera normality tests for univariate and multivariate series are implemented and applied to the residuals of a VAR(p), as well as separate tests for multivariate skewness and kurtosis (see Bera & Jarque [1980], [1981], Jarque & Bera [1987] and Lütkepohl [2006]).
• The univariate versions of the Jarque-Bera test are applied to the residuals of each equation.
• A multivariate version of this test can be computed using the residuals that are standardized by a Choleski decomposition of the variance-covariance matrix of the centered residuals.
> var2c.norm <- normality(var.2c, multivariate.only = TRUE)
> var2c.norm
$JB
JB-Test (multivariate)
Chi-squared = 5.094, df = 8, p-value = 0.7475
$Skewness
Skewness only (multivariate)
Chi-squared = 1.7761, df = 4, p-value = 0.7769
$Kurtosis
Kurtosis only (multivariate)
Chi-squared = 3.3179, df = 4, p-value = 0.5061
• For testing the lack of serial correlation in the
residuals of a VAR(p), a Portmanteau test and
the LM test proposed by Breusch & Godfrey are
implemented in the function serial().
> var2c.pt.asy <- serial(var.2c, lags.pt = 16, type = "PT.asymptotic")
> var2c.pt.asy
Portmanteau Test (asymptotic)
Chi-squared = 205.3538, df = 224, p-value = 0.8092
> var2c.pt.adj <- serial(var.2c, lags.pt = 16, type = "PT.adjusted")
> var2c.pt.adj
Portmanteau Test (adjusted)
Chi-squared = 231.5907, df = 224, p-value = 0.3497
• The Breusch-Godfrey LM statistic (see Breusch 1978, Godfrey 1978) is based on an auxiliary regression of the VAR residuals on the original regressors and lagged residuals.
> var2c.BG <- serial(var.2c, lags.pt = 16, type = "BG")
> var2c.BG
Breusch-Godfrey LM test
Chi-squared = 92.6282, df = 80, p-value = 0.1581
> var2c.ES <- serial(var.2c, lags.pt = 16, type = "ES")
> var2c.ES
Edgerton-Shukur F test
F statistic = 1.1186, df1 = 80, df2 = 199, p-value = 0.2648
• The stability of the regression relationships in
a VAR(p) can be assessed with the function
stability(). An empirical fluctuation process is
estimated for each regression by passing the
function’s arguments to the efp()-function
contained in the package strucchange.
> args(stability)
function (x, type = c("Rec-CUSUM", "OLS-CUSUM", "Rec-MOSUM",
"OLS-MOSUM", "RE", "ME", "Score-CUSUM", "Score-MOSUM", "fluctuation"),
h = 0.15, dynamic = FALSE, rescale = TRUE)
NULL
> var2c.stab <- stability(var.2c, type = "OLS-CUSUM")
> names(var2c.stab)
[1] "stability" "names" "K"
Forecasting
A predict method for objects with class attribute varest is available. The n.ahead forecasts are computed recursively for the estimated VAR, for h = 1, 2, …, n.ahead:
> var.f10 <- predict(var.2c, n.ahead = 10, ci = 0.95)
> names(var.f10)
[1] "fcst" "endog" "model" "exo.fcst"
> class(var.f10)
[1] "varprd"
> plot(var.f10)
> fanchart(var.f10)
GRANGER CAUSALITY
• In time series analysis, we sometimes would like to know whether changes in one variable have an impact on changes in other variables.
• To investigate this question more formally, we use the Granger causality test.
GRANGER CAUSALITY
• In principle, the concept is as follows: if X causes Y, then changes in X happen first and are followed by changes in Y.
GRANGER CAUSALITY
• If X causes Y, two conditions must be satisfied:
1. X helps in predicting Y: adding past X to a regression for Y yields a large increase in R².
2. Y does not help in predicting X.
GRANGER CAUSALITY
• In most regressions, it is very hard to discuss causality. For instance, the significance of the coefficient β in the regression
$$y_i = \beta x_i + \varepsilon_i$$
only tells us about the ‘co-occurrence’ of x and y, not that x causes y.
• In other words, the regression usually only tells us there is some ‘relationship’ between x and y; it does not tell us the nature of the relationship, such as whether x causes y or y causes x.
GRANGER CAUSALITY
• One good feature of time series vector autoregression is that we can test ‘causality’ in a certain sense. The test was first proposed by Granger (1969), and we therefore refer to it as Granger causality.
• We will restrict our discussion to a system of two variables, x and y. y is said to Granger-cause x if current or lagged values of y help to predict future values of x. Conversely, y fails to Granger-cause x if for all s > 0, the mean squared error of a forecast of x_{t+s} based on (x_t, x_{t−1}, …) is the same as that of a forecast based on both (y_t, y_{t−1}, …) and (x_t, x_{t−1}, …).
GRANGER CAUSALITY
• If we restrict ourselves to linear functions, y fails to Granger-cause x if
$$MSE\left[\hat{E}\left(x_{t+s} \mid x_t, x_{t-1}, \ldots\right)\right] = MSE\left[\hat{E}\left(x_{t+s} \mid x_t, x_{t-1}, \ldots, y_t, y_{t-1}, \ldots\right)\right].$$
• Equivalently, we can say that x is exogenous in the time series sense with respect to y, or that y is not linearly informative about future x.
GRANGER CAUSALITY
• A variable X is said to Granger-cause another variable Y if Y can be better predicted from the past of X and Y together than from the past of Y alone, other relevant information being used in the prediction (Pierce, 1977).
GRANGER CAUSALITY
• In the VAR equation, the example above implies a lower triangular coefficient matrix:
$$\begin{bmatrix} x_t \\ y_t \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + \begin{bmatrix} \phi_{11}^{1} & 0 \\ \phi_{21}^{1} & \phi_{22}^{1} \end{bmatrix} \begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix} + \cdots + \begin{bmatrix} \phi_{11}^{p} & 0 \\ \phi_{21}^{p} & \phi_{22}^{p} \end{bmatrix} \begin{bmatrix} x_{t-p} \\ y_{t-p} \end{bmatrix} + \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}$$
Or, if we use MA representations,
$$\begin{bmatrix} x_t \\ y_t \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} + \begin{bmatrix} \psi_{11}(B) & 0 \\ \psi_{21}(B) & \psi_{22}(B) \end{bmatrix} \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}$$
where $\psi_{ij}(B) = \psi_{ij}^{0} + \psi_{ij}^{1} B + \psi_{ij}^{2} B^{2} + \cdots$, with $\psi_{11}^{0} = \psi_{22}^{0} = 1$ and $\psi_{21}^{0} = 0$.
GRANGER CAUSALITY
• Consider a linear projection of y_t on past, present and future x’s:
$$y_t = c + \sum_{j=0}^{\infty} b_j x_{t-j} + \sum_{j=1}^{\infty} d_j x_{t+j} + e_t$$
where E(e_t x_τ) = 0 for all t and τ. Then y fails to Granger-cause x iff d_j = 0 for j = 1, 2, ….
TESTING GRANGER CAUSALITY
Procedure
1) Check that both series are stationary in mean, variance and covariance (if necessary, transform the data via logs or differences to ensure this).
2) Estimate AR(p) models for each series, where p is large enough to ensure white noise residuals. F tests and other criteria (e.g. Schwarz or Akaike) can be used to establish the maximum lag p that is needed.
3) Re-estimate both models, now including all the lags of the other variable.
4) Use F tests to determine whether, after controlling for past Y, past values of X can improve forecasts of Y (and vice versa). A worked R sketch of these steps is given after the F-test below.
TEST OUTCOMES
1. X Granger causes Y but Y does not Granger
cause X
2. Y Granger causes X but X does not Granger
cause Y
3. X Granger causes Y and Y Granger causes X
(i.e., there is a feedback system)
4. X does not Granger cause Y and Y does not
Granger cause X
TESTING GRANGER CAUSALITY
• The simplest test is to estimate the regression
$$x_t = c_1 + \sum_{i=1}^{p} \alpha_i x_{t-i} + \sum_{j=1}^{p} \beta_j y_{t-j} + u_t$$
using OLS and then conduct an F-test of the null hypothesis
$$H_0 : \beta_1 = \beta_2 = \cdots = \beta_p = 0.$$
TESTING GRANGER CAUSALITY
2. Run the following regression and calculate the RSS of the full model:
$$x_t = c_1 + \sum_{i=1}^{p} \alpha_i x_{t-i} + \sum_{j=1}^{p} \beta_j y_{t-j} + u_t$$
3. Run the following restricted regression and calculate its RSS:
$$x_t = c_1 + \sum_{i=1}^{p} \alpha_i x_{t-i} + u_t$$
TESTING GRANGER CAUSALITY
4. Do the following F-test using the RSS values obtained in steps 2 and 3:
$$F = \frac{(RSS_{restricted} - RSS_{full})/q}{RSS_{full}/(n-k)}$$
n: number of observations
k: number of parameters in the full model
q: number of restrictions (the number of lagged y terms excluded from the restricted model)
TESTING GRANGER CAUSALITY
5. If H0 is rejected, we conclude that Y Granger-causes X.
• The same technique, with the roles of the series exchanged, is used to investigate whether X causes Y.
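The whole procedure can be carried out with a few lines of base R. The sketch below uses simulated series and lag order p = 2, both illustrative; in practice, grangertest() in the lmtest package automates the same restricted-versus-full comparison.

# Manual Granger F-test: compare restricted and full regressions for x_t
set.seed(1)
n <- 200; p <- 2
x <- as.numeric(arima.sim(list(ar = 0.5), n))  # simulated series (illustration only)
y <- as.numeric(arima.sim(list(ar = 0.3), n))
d <- data.frame(xt = x[(p + 1):n],
                x1 = x[p:(n - 1)], x2 = x[1:(n - 2)],   # lags of x
                y1 = y[p:(n - 1)], y2 = y[1:(n - 2)])   # lags of y
full <- lm(xt ~ x1 + x2 + y1 + y2, data = d)   # full model
rest <- lm(xt ~ x1 + x2, data = d)             # restricted model
anova(rest, full)                              # F-test of H0: y does not Granger-cause x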
Example of the Usage of Granger Test
World Oil Price and Growth of US Economy
• Does an increase in the world oil price influence the growth of the US economy, or does the growth of the US economy affect the world oil price?
• James Hamilton studied this using the following model:
Zt = a0 + a1 Zt-1 + ... + am Zt-m + b1 Xt-1 + ... + bm Xt-m + εt
Zt = ΔPt : changes in the world price of oil
Xt = log(GNPt / GNPt-1)
World Oil Price and Growth of US Economy
• There are two causalities that need to be examined:
(i) H0: Growth of the US economy does not influence the world oil price
Full:
Zt = a0 + a1 Zt-1 + ... + am Zt-m + b1 Xt-1 + ... + bm Xt-m + εt
Restricted:
Zt = a0 + a1 Zt-1 + ... + am Zt-m + εt
World Oil Price and Growth of US Economy
(ii) H0: The world oil price does not influence the growth of the US economy
• Full:
Xt = a0 + a1 Xt-1 + ... + am Xt-m + b1 Zt-1 + ... + bm Zt-m + εt
• Restricted:
Xt = a0 + a1 Xt-1 + ... + am Xt-m + εt
World Oil Price and Growth of US Economy
• F test results:
1. The hypothesis that the world oil price does not influence the US economy is rejected: the world oil price does influence the US economy.
2. The hypothesis that the US economy does not affect the world oil price is not rejected: there is no evidence that the US economy has an effect on the world oil price.
World Oil Price and Growth of US Economy
• Summary of James Hamilton’s results:
Null Hypothesis (H0)                        (I) F(4,86)   (II) F(8,74)
I.  Economic growth ≠→ World Oil Price           0.58           0.71
II. World Oil Price ≠→ Economic growth           5.55           3.28
World Oil Price and Growth of US Economy
• Remark: The first experiment used data for 1949–1972 (95 observations) and m = 4, while the second experiment used data for 1950–1972 (91 observations) and m = 8.
Canadian example
The function causality() is now applied to investigate whether the real wage and productivity are causal to employment and unemployment.
> causality(var.2c, cause = c("rw", "prod"))
$Granger
Granger causality H0: prod rw do not Granger-cause e U
F-Test = 3.4529, df1 = 8, df2 = 292, p-value = 0.0008086
$Instant
H0: No instantaneous causality between: prod rw and e U
data: VAR object var.2c
Chi-squared = 2.5822, df = 4, p-value = 0.63
The null hypothesis of no Granger causality from the real wage and labour productivity to employment and unemployment must be rejected, whereas the null hypothesis of no instantaneous causality cannot be rejected. This test outcome is economically plausible, given the frictions observed in labour markets.
Instantaneous causality refers to the case where the current information on one set of variables helps to explain the current values of the others.
Chicken vs. Egg
• This causality test can also be used to address the question of which comes first: the chicken or the egg. More specifically, the test can be used to examine whether the existence of eggs causes the existence of chickens, or vice versa.
• Thurman and Fisher did this study using yearly data on chicken population and egg production in the US from 1930 to 1983.
• The results:
1. Egg Granger-causes chicken.
2. There is no evidence that chicken Granger-causes egg.
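This example can be reproduced directly in R: the Thurman and Fisher data ship with the lmtest package as ChickEgg, and grangertest() wraps the restricted-versus-full F-test. The lag order 3 below is an illustrative choice, not necessarily the one used in the original study.

library(lmtest)
data(ChickEgg)
grangertest(chicken ~ egg, order = 3, data = ChickEgg)  # H0: egg does not Granger-cause chicken
grangertest(egg ~ chicken, order = 3, data = ChickEgg)  # H0: chicken does not Granger-cause egg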
Chicken vs. Egg
• Remark: The hypothesis that eggs have no effect on the chicken population is rejected, while the hypothesis that chickens have no effect on eggs is not rejected. Why?
GRANGER CAUSALITY
• We have to be aware that Granger causality is not the same as what we usually mean by causality. Even if x1 does not cause x2, it may still help to predict x2, and thus Granger-cause x2, if changes in x1 precede changes in x2 for some reason.
• A simple example: dragonflies fly much lower before a rain storm because of the lower air pressure. Dragonflies do not cause the rain storm, but their low flight helps to predict it, and thus Granger-causes the rain storm.