Factor Models for Multiple Time Series

Factor Models for Multiple Time Series
Qiwei Yao
Department of Statistics, London School of Economics
[email protected]
Joint work with
Neil Bathia, University of Melbourne
Clifford Lam, London School of Mathematics
Jiazhu Pan, University of Strathclyde
Flavio Ziegelmann, Federal University of Rio Grande do Sul
– p.1
Econometric factor models: a brief survey
Statistical factor models: identification
Estimation
• expanding white noise space: non-stationary factors
• eigenanalysis: stationary cases
Asymptotic properties (stationary cases only in this talk)
• fixed p: fast convergence rate for zero-eigenvalues
• p → ∞: convergence rates independent of p
Illustration with real data sets
• temperature data
• implied volatility surfaces
• densities of intraday returns
– p.2
Econometric modelling: represent a p × 1 time series yt as
yt = ft + ξ t ,
both ft and ξt are unobservable, and
• ft : driven by r common factors, and r << p
• ξ t : idiosyncratic components
Basic idea. The dynamical structure of each component of yt is
driven by the r common factors plus one or a few
idiosyncratic components.
Practical motivation: asset pricing models, yield curves,
portfolio risk management, derivative pricing,
macroeconomic behaviour and forecasting,
consumer theory etc.
– p.3
Sargent & Sims (1977) and Geweke (1977): dynamic-factor
models
Chamberlain & Rothschild (1983): approximate and static factor
models
Forni, Hallin, Lippi & Reichlin (2002 – ): generalized dynamic
factor models – combining the above two together
yit = bi1 (L)u1t + · · · + bir (L)urt + ξit ,
i = 1, 2, · · · , t = 0, ±1, · · · ,
• ukt ∼ WN(0, 1), k = 1, · · · , r, are common (dynamic)
factors, and are uncorrelated with each other,
• ξit are stationary in t, are idiosyncratic noise,
and {ukt } and {ξit } are uncorrelated.
Only yit are observable.
– p.4
Let ξpt = (ξ1t , · · · , ξpt )τ and ypt = (y1t , · · · , ypt )τ .
Assumption: As p → ∞, it holds almost surely on [−π, π] that all
the eigenvalues of spectral density matrices of ξpt are uniformly
bounded, and only the r largest eigenvalues of (ypt − ξpt )
converge to ∞.
Intuition: The r common factors affect the dynamics of most
component series, while each idiosyncratic noise only affects the
dynamics of a few component series.
Characteristics result: As p → ∞, it holds almost surely on
[−π, π] that all the r largest eigenvalues of spectral density
matrices of ypt converge to ∞, and the (r + 1)-th largest
eigenvalue is uniformly bounded.
The model is asymptotically identifiable, when the number of
time series (i.e. the number of cross-sectional variables) p → ∞ !
– p.5
• Estimation for GDFM when r is given — Dynamic principle
component analysis (Brillinger 1981):
b
i. Obtain an estimator Σ(θ)
for spectral density matrix of yt ,
θ ∈ [−π, π]
b
ii. Find eigenvalues and eigenvectors of Σ(θ)
iii. Project yt onto the space spanned by the r eigenvectors
corresponding to the r largest eigenvalues:
the projection is defined as the mean square limit of
a Fourier sequence, and
each component of the projection is a sum of r
uncorrelated MA processes.
• Determine r: only identifiable when p → ∞!
‘There is no way a slowly diverging sequence can be told from an
eventually bounded sequence’ (Forni et al. 2000).
– p.6
Let {yt } be a p × 1 time series defined by
yt = Axt + εt ,
– p.7
Let {yt } be a p × 1 time series defined by
yt = Axt + εt ,
xt : r × 1 unobservable factors, r (< p) unknown
A: p × r unknown constant factor loading matrix
{εt }: vector WN(µε , Σε )
no linear combinations of xt are WN.
– p.7
Let {yt } be a p × 1 time series defined by
yt = Axt + εt ,
xt : r × 1 unobservable factors, r (< p) unknown
A: p × r unknown constant factor loading matrix
{εt }: vector WN(µε , Σε )
no linear combinations of xt are WN.
Lack of identification: (A, xt ) may be replaced by
(AH, H−1 xt ) for any invertible H.
Therefore, we assume Aτ A = Ir
But factor loading space M(A) is uniquely defined
– p.7
Let {yt } be a p × 1 time series defined by
yt = Axt + εt ,
xt : r × 1 unobservable factors, r (< p) unknown
A: p × r unknown constant factor loading matrix
{εt }: vector WN(µε , Σε )
no linear combinations of xt are WN.
Lack of identification: (A, xt ) may be replaced by
(AH, H−1 xt ) for any invertible H.
Therefore, we assume Aτ A = Ir
But factor loading space M(A) is uniquely defined
The model is not new: Peña & Box (1987).
– p.7
What is new?
• No distributional assumption on εt . More significantly, allow
correlation between εt and xt+k .
– p.8
What is new?
• No distributional assumption on εt . More significantly, allow
correlation between εt and xt+k .
• Factor xt , and therefore also yt , may be nonstationary, not
necessarily driven by unit roots.
– p.8
What is new?
• No distributional assumption on εt . More significantly, allow
correlation between εt and xt+k .
• Factor xt , and therefore also yt , may be nonstationary, not
necessarily driven by unit roots.
• A new estimation method for stationary cases: an
eigenanalysis, applicable when p >> n
– p.8
What is new?
• No distributional assumption on εt . More significantly, allow
correlation between εt and xt+k .
• Factor xt , and therefore also yt , may be nonstationary, not
necessarily driven by unit roots.
• A new estimation method for stationary cases: an
eigenanalysis, applicable when p >> n
Key: estimate A, or more precisely, M(A).
b , a natural estimator for factor
With available an estimator A
and the associated residuals are
b τ yt ,
bA
b τ )yt .
bt = A
b
x
εt = (Ip − A
bt , we obtain the
By modelling the lower-dimensional x
dynamical model for yt :
b xt .
bt = Ab
y
– p.8
Reconciling to econometric models
‘Common factors’ & ‘idiosyncratic noise’: conceptually appealing,
only identifiable when p → ∞.
Goal: identify those components of xt , each of them affects
most (or a few) components of yt .
Put A = (a1 , · · · , ar ) and xt = (xt1 , · · · , xtr )′ . Then
yt = a1 xt1 + · · · + ar xtr + εt .
Hence the number of non-zero coefficients of aj is the number of
components of yt which are affected by the factor xtj .
To avoid the correlation among the components of xt , apply PCA
to xt , i.e. replace (A, xt ) by (AΓ, Γ′ xt ), where Γ is an r × r
orthogonal matrix defined in Var(xt ) = ΓDΓ′ .
Eigenvalues of Var(xt ) are different, this representation is unique.
– p.9
Lemma 1. Let A1 z1 = A2 z2 , where, for i = 1, 2, Ai is p × r matrix,
A′i Ai = Ir , and zi = (zi1 , · · · , zir )′ is r × 1 random vector with
uncorrelated components, and Var(zi1 ) > · · · > Var(Zir ).
Furthermore M(A1 ) = M(A2 ). Then z1j = ±z2j for 1 ≤ j ≤ r.
bt .
In practice, we use the PCA-ed factor x
b is the
The number of non-zero elements of the j -th column of A
number of the components of yt whose dynamics depends on
the j -th factor x
btj .
– p.10
Nonstationary factors
C1. εt ∼ WN(µε , Σε ), c′ xt is not white noise for any
constant c ∈ Rp . Furthermore A′ A = Ir .
Let B = (b1 , · · · , bp−r ) be a p × (p − r) matrix such that
(A, B) is a p × p orthogonal matrix, i.e.
Bτ A = 0,
Since yt = Axt + εt ,
Bτ B = Ip−r .
Bτ yt = Bτ εt
i.e. {Bτ yt , t = 0, ±1, · · · } is WN.
Therefore
Corr(bτi yt , bτj yt−k ) = 0
∀ 1 ≤ i, j ≤ p − r and k ≥ 1.
– p.11
Search for mutually orthogonal directions b1 , b2 , · · · one by
one such that the projection of yt on each of those directions
is a white noise.
Stop the search when such a direction is no longer available,
and take p − k as the estimated value of r, where k is the
number of directions obtained in the search.
– p.12
Search for mutually orthogonal directions b1 , b2 , · · · one by
one such that the projection of yt on each of those directions
is a white noise.
Stop the search when such a direction is no longer available,
and take p − k as the estimated value of r, where k is the
number of directions obtained in the search.
See Pan and Yao (2008) for further details, and also some
(preliminary) asymptotic results.
– p.12
Stationary models
C2. xt is stationary, and Cov(xt , εt+k ) = 0 for any k ≥ 0.
Put Σy (k) = Cov(yt+k , yt ), Σx (k) = Cov(xt+k , xt ),
Σxε (k) = Cov(xt+k , εt ). By yt = Axt + εt ,
Σy (k) = AΣx (k)A′ + AΣxε (k),
k ≥ 1.
For a prescribed integer k0 ≥ 1, define
M=
k0
X
Σy (k)Σy (k)′ .
k=1
Then MB = 0, i.e. the columns of B are the eigenvectors of M
corresponding to zero-eigenvalues.
– p.13
Hence under C1 and C2,
– p.14
Hence under C1 and C2,
The factor loading space M(A) are spanned by the
eigenvectors of M corresponding to its non-zero
eigenvalues, and the number of the non-zero
eigenvalues is r.
– p.14
Hence under C1 and C2,
The factor loading space M(A) are spanned by the
eigenvectors of M corresponding to its non-zero
eigenvalues, and the number of the non-zero
eigenvalues is r.
Pk0 b
c
b y (k)′ , where Σ
b y (k) denotes the sample
Let M = k=1 Σy (k)Σ
covariance matrix of yt at lag k .
– p.14
Hence under C1 and C2,
The factor loading space M(A) are spanned by the
eigenvectors of M corresponding to its non-zero
eigenvalues, and the number of the non-zero
eigenvalues is r.
Pk0 b
c
b y (k)′ , where Σ
b y (k) denotes the sample
Let M = k=1 Σy (k)Σ
covariance matrix of yt at lag k .
c,
rb : No. of non-zero eigenvalues of M
b : its columns are the rb orthonormal eigenvectors of
A
c corresponding to its rb largest eigenvalues.
M
– p.14
Bootstrap test for r
Note that r = r0 iff the (r0 + 1)-th largest eigenvalue of M is 0 and
the r0 -th largest eigenvalue is nonzero.
Consider the testing for H0 : λr0 +1 = 0,
br0 +1 > lα .
We reject H0 if λ
Bootstrap to determine lα :
bt with rb = r0 . Let b
bt .
1. Compute y
εt = yt − y
bt + ε∗t , where ε∗t are drawn independently (with
2. Let yt∗ = y
replacement) from {b
ε1 , · · · , b
εn }.
c with {yt }
3. Form the operator M∗ in the same manner as M
replaced by {yt∗ }, compute the (r0 + 1)-th largest eigenvalue
λ∗r0 +1 of M∗ .
br0 +1 under H0 .
L(λ∗r0 +1 |{yt }) is taken as the distribution of λ
– p.15
Asymptotics I: n → ∞ and p fixed
(i) yt is strictly stationary, E||yt ||4+δ < ∞ for some δ > 0.
P
δ
(ii) yt is α-mixing satisfying j α(j) 2+δ < ∞.
(iii) M has r non-zero eigenvalues λ1 > · · · > λr > 0.
Then under condition C1 and C2, the following assertions hold.
bj − λj = OP (n−1/2 ) for 1 ≤ j ≤ r,
(i) λ
br+k = OP (n−1 ) for 1 ≤ k ≤ p − r,
(ii) λ
b M(A)} = OP (n−1/2 ) provided rb = r a.s.,
(iii) D{M(A),
where
1
bA
b τ ).
b
D{M(A), M(A)} = 1 − tr(AAτ A
r
– p.16
Numerical illustration: λ1 = 1.884, λ2 = λ3 = λ4 = 0 (p = 4, r = 1)
(Simulation replications: 10,000 times)
– p.17
√ b
Histogram of n(λ1 − λ1 )
n=20
n=50
n=100
20
40
−40
−20
20
40
0
20
40
0.03
0.02
0.01
0.0
0
20
40
−40
−20
0
20
40
0.030
n=2000
0.020
0.0
0.010
0.010
0.0
−20
−20
n=1000
0.020
0.030
0.020
0.010
0.0
−40
−40
n=500
0.030
n=200
0
0.020
0
0.010
−20
−40
−20
0
20
40
0.0
−40
0.0
0.0
0.0
0.01
0.02
0.02
0.01 0.02 0.03
0.04
0.03
0.04
n=10
−40
−20
0
20
40
−40
−20
0
20
40
– p.18
0.15
0
5
n=10
10
15
20
0
5
15
20
0
5
n=1000
10
n=50
b2
Histogram of nλ
n=20
10
10
n=500
15
20
0.15
n=200
0.10
20
0.05
15
0.0
5
0.15
0
0.10
20
0.05
15
0.0
5
0.10
0
0.05
20
0.0
15
0.10
5
0.05
0
10
0.15
0.0
0.05
0.10
0.15
0.0
0.05
0.10
0.15
0.0
0.05
0.10
0.15
0.0
0.05
0.10
0.15
10
0.0
0
0
5
5
n=100
10
n=2000
10
15
15
20
20
– p.19
Asymptotics II: n → ∞, p → ∞ and r fixed
Recall model: yt = Axt + εt ,
and A is p × r
1. Assumptions on Strength of factors:
(i) A = (a1 , · · · , ar ), ||ai ||2 ≍ p1−δ , i = 1, · · · , r, 0 ≤ δ ≤ 1.
(ii) For k = 0, 1, · · · , k0 , Σx (k) ≡ Cov(xt+k , xt ) is full-ranked,
and Σx,ǫ (k) ≡ Cov(xt+k , εt ) = O(1) elementwisely.
We call
• factors are strong if δ = 0,
• factors are weak if δ > 0.
Standardization ‘Aτ A = Ir ’ + (i, ii) imply:
kΣx (k)k ≍ p1−δ ≍ kΣx (k)kmin ,
kΣx,ǫ (k)k = O(p1−δ/2 ),
where a ≍ b represents a = O(b) & b = O(a), kAk2 = λmax (AAτ )
and kAk2min = min{λ(AAτ ) : λ(AAτ ) > 0}.
– p.20
2. For k = 0, 1, · · · , k0 , kΣx,ǫ (k)k = o(p1−δ ), and it holds
elementwisely that
Σ̃x (k) − Σx (k) = OP (n−lx ),
Σ̃ǫ (k) − Σǫ (k) = OP (n−lǫ ),
Σ̃x,ǫ (k) − Σx,ǫ (k) = OP (n−lxǫ ) = Σ̃ǫ,x (k)
for some constants 0 < lx , lxǫ , lǫ ≤ 1/2, and Σ̃ denotes the
sample version of Σ.
3. M has r different non-zero eigenvalues.
Then under condition C1 and C2,
b − Ak = OP (hn ) = OP (n−lx + pδ/2 n−lxǫ + pδ n−lǫ ),
kA
provided hn = o(1).
Remark. When all factors are strong (i.e. δ = 0), the convergence
rate hn is independent of the dimension p.
– p.21
Our asymptotic theory also shows:
1. Factor model-based estimator for Σy :
by = A
bΣ
b xA
bτ + Σ
b ǫ , where Σ
bx = A
b τ (Σ̃y − Σ
b ǫ )A,
b
Σ
cannot improve over the sample covariance estimator Σ̃y .
−1
b
But the convergence rate for kΣy − Σ−1
y k is independent of
p when all the factors are strong.
2. When the factors are of different levels of strengths, two (or
multiple) step estimation procedure is preferred to estimate
and to remove the stronger factors first.
– p.22
Simulation with r = 1 and δ = 0 (only one strong factor):
xt = 0.9xt−1 + N (0, 4),
εtj ∼iid N (0, 4), and the i-th element of A is 2 cos(2πi/p).
b − Ak
n = 200 kA
p = 20 .022(.005)
p = 180 .023(.004)
p = 400 .022(.004)
p = 1000 .023(.004)
−1
kΣ̃y
− Σ−1
y k
.24(.03)
79.8(29.8)
-
−1
b
kΣy − Σ−1
y k
.009(.002)
.007(.001)
.007(.001)
.007(.001)
b y − Σy k
n = 200 kΣ̃y − Σy k kΣ
p = 20
218(165)
218(165)
p = 180 1962(1500) 1963(1500)
p = 400 4102(3472) 4103(3471)
p = 1000 10797(6820) 10800(6818)
– p.23
Illustration With Real Data
Example 1. The monthly temperature data from 7 cities in
Eastern China in January 1954 — December 1986
n = 396,
p = 7,
rb = 4
Example 2. Daily implied volatility surfaces for IBM, Microsoft
and Dell call options in 2006
n = 100,
p = 130,
rb = 1
⇛
Example 3. Daily densities of one-minute returns of IBM
stock price in 2006
n = 251,
p = “∞”,
rb = 2
⇛
– p.24
Time plots of the monthly
temperature in 1959-1968 of
Nanjing, Dongtai, Huoshan,
Hefei, Shanghai, Anqing and
Hangzhow.
25
•
15
•
•
•
•
•
•
•
•
•
•••
•
•
25
•
15
•
•
•
•
0 5
•
• •
•
•
•
••
25
15
•
•
•
•
0 5
•
•
•
25
•
15
32.5
•
•
•
•
•
•
0 5
•
•
•
•
•
•••
•
••
•
25
•
15
•
•
•
•••
•
•
•
•••
•
25
•
•
15
•
•
•
•
0 5
•
•
•
•
•
•
•••
•
•
•
••
•
0
25
•
15
•
•
•
•
•
•
•
•
0
•
•
•
•
•
•
••
•
20
•
••
•
•
•
•
•
•
•
•
•
•
•
• •
•
•
•
•
•••
40
•
••
•
•
•
•
•
•
•
•
60
•
•
•
•
•
•
•
•
•
•
•
120
•
•
•
•
•
•
•
••
••
•
•
•
•
••
••
•
•
•
•
•
• •
•
•
•
100
•
•
•
•
80
•
•
•
•
•
•
• •
•
•
•
•
••
•
•
•
•
•
•
•
•
•
••
••
•
•
•
120
•
•
••
•
•
100
•
•
•
••
•
•
•
•
•
•
•
••
80
••
•
•
• •
•
•
•
• ••
•
••
•
•
•
•
•
60
•
•
•
•
•
•
•
•
•
••
••
•
•
•
••
•
•
•
••
•
•
•
•
•
120
•
•
••
•
•
100
•
•
•
•
•••
•
•
•
•
•
•••
•
80
••
••
•
• •
•
•
•
• ••
•
••
40
•
•
•
•
••
•
•
•
•
•
•
•
••
•
•••
•
•••
•
5
•
•
•
•
••
•
•
•
•
••
••
•
•
•
•
•
•
•
20
•
•
•
•
100
•
• •
•
•
•
•
•
•
•
•
120
•
•
•
•
••
•••
•
•
•
•
•
•
•••
•
•
60
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
80
•
•
• •
•
•
122
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
100
••
•
••
40
•
•
••
•
••
••
•
•
•
•
••
•
•
••
•
20
••
•
•
80
••
•
••
•
•
••
•
•
•
••
•
•
•
•
60
•
•
•
•
•••
•
•
•
•
•
•
•
•
•
•
•
•
0
•••
•
•
•
•
•
•
•
120
•
•
•
••
•
•
•
•
•
•
•
•••
•
•
•
•
•
•
80
•
•
100
•
• •
•
••
••
•
•
•
•
•
••
•
•
•
•
40
•
•
•
•
••
•
•
••
•
•
••
•
•
•
•••
••
•
60
•
•
•
•
•
•
•
•
•
•
•••
•
20
•
•
•
•
•
•
•
•
•••
•
••
•
••
•
40
•
•
•
•
•
•
••
••
•
••
••
•
••
••
•
•
•
•
•
•
•
•
•••
•
•
0 5
32.0
31.5
31.0
30.5
30.0
Latitude
•
•
•
•
•
•
•
80
•
•
20
•
Longitude
•
••
0
120
•
•
•
•
•
•
•
•
•
60
•
•
•
•
•
•
118
•
•
•
•••
••
•
•
116
•
••
•
•
•
•
•
•
•
••
•
•
••
••
•
•
••
40
•••
•
•
•
• •
•
•
•
0
Hangzhow
•
•
•
••
•
•
•
•
•
•
•
•
60
•
•
•
•
•
•
•
Anqing
•
•
•
•
•
•
•
••
•
•
•
•
•
•
40
20
•
•
•
•
••
•
•
••
•
•
•••
••
•
•
•
Shanghai
•
•
•
•
•
20
•
•
•
•
•
••
•
•
•
•
•
0
Hefei
•
Huoshan
•
•
•
•
•
Nanjing
•
•
•
•
•
0
Dongtai
•
••
•
•
•
0 5
•
•
••
120
••
•
•
•
•
•
•
•
••
••
••
•
•
•
•
•
•
• •
•
•
100
•
•
•
•
•
120
– p.25
b t + et , rb = 4,
With p = 12, α = 1%, the fitted model is yt = Ax
b ε ),
et ∼ WN(b
µε , Σ
0
B
B
B
B
B
B
B
be = B
µ
B
B
B
B
B
@
3.41
2.32
4.39
4.30
3.40
4.91
4.77
0
1
C
C
C
C
C
C
C
C,
C
C
C
C
C
A
.394
B
B −.086
b =B
A
B
B .395
@
.687
0
B
B
B
B
B
B
B
b
Σe = B
B
B
B
B
B
@
1
1.56
1.26
1.05
1.71
1.34
1.91
1.90
1.49
2.10
2.33
1.37
1.16
1.46
1.58
1.37
1.67
1.26
1.91
2.09
1.37
1.97
1.41
1.14
1.58
1.67
1.39
1.56
.386
.378
.387
.363
.376
.225
−.640
−.271
.658
−.014
.0638
−.600
.346
−.494
−.074
−.585
−.032
−.306
.173
.206
.366
1.53
C
C
C
C
C
C
C
C.
C
C
C
C
C
A
1τ
C
.164 C
C
C ,
.332 C
A
−.139
xt are PCAed factors: 1st PC accounts for 99% of TV of 4 factors,
and 97.6% of the original 7 series.
– p.26
Time plots of the 4 estimated factors
20 40 60
•
•
•
0
•
•
•
•••
•
•
• •
•
•
••
•
••
•
•
•
•
•
•
•••
3
2
•
•
•
1
••
•
0
•
•
•
•
•
•
••
••••
•
•
•
•
••
•••
•••
••
20
••
••
••
••
••
•
•
•
•
•
•
•
•
•
•
••
•
•
•
•
•
•
•
•
•
•
••
•
•••
•••
80
•
•
••••
•
•
•
•
•
•
60
•
•
•
•
•
••
•
• •
•••
•
•
•
•
•
••
•
•
•
•
•
•
•
•••
•
•
•
100
•
• ••
•
•
•
•
•
••
••
• •
•
•• •
•
40
60
0.6
0
•
••
120
•
••••
•
• •
••
•
40
•
•
•
•
•
20
•
•
••
•
•
4
0
•
•
•
•
•
•
•
•
•
•
•
•
•
•
VAR(1)
••
•
•
••••
•
•
•
•
•
••
•
80
•
•
• ••••
•
•
•
•
•
• ••
••
•
100
120
•
0.0
• •
•
•••
•
•
••
•
•
•
−0.6
•
•
•
• •
•
•
• •
•
•
•
•
•
•
••
•
•
0.4
•
•
•
•
•
•
•
•
•
••
•
•••
•
•
• •
•
•••
•••
••
•
•
••
•
•
•
••
•
••• •
• •
• •
20
40
•
••
••
••
•
••
• •
•
• ••
•
•••••
•••
••
•
•
••
••
•
•
•
•
80
•
• •
•
•
•
••• • •
•
•
•
••• ••••
•
•
•
•
100
••
•••
••
•
•
•
•
•
120
•
•
•
•
•
•
••
•• •
•
•
•
••
•
•
•• •
•
•
•
•
•
0
•••
•
•
•
•
•
•
60
•
•
•
•
•
•
•
•
••
•
• • •
• •
•
•••
• • •
•
•
•
•
•
•
•
• ••
•••
•
•
40
•
•
•
•
•
20
•
−0.4
••
•
0
0.0
•
•
•
•
•
60
80
100
120
– p.27
Sample cross-correlation of the 4 estimated factors
factor 1 and factor 2
factor 1 and factor 3
factor 1 and factor 4
15
20
0
5
10
15
20
−5
0
0.05
15
20
0
0.05
5
10
15
20
0
5
10
15
20
0
10
15
20
5
10
15
20
factor 3 and factor 4
factor 3
−10
−5
0
0.05
0.0
0.4
0.0
−15
−10
−5
0
0
factor 4 and factor 2
5
10
15
20
5
10
15
20
factor 4
−20
−15
−10
Lag
−5
0
0.8
−20
−15
−10
Lag
−5
0
0.4
0.0
−0.10
−0.10
0.0
0.0
0.05
ACF
−0.05
0.05
0
factor 4 and factor 3
0.10
factor 4 and factor 1
−20
−0.10
−15
−0.15
−0.05
−0.05
ACF
0.05
0.05
0.8
0.10
0.15
factor 3 and factor 2
−0.15
−20
−0.15
5
factor 2 and factor 4
0.0
0
−0.10
−10
10
0.10
1.0
0.0
−0.5
−15
factor 3 and factor 1
5
factor 2 and factor 3
0.5
0.5
ACF
0.0
−0.5
−20
0.0
0
factor 2
0.10
10
0.0
5
factor 2 and factor 1
−0.10
0
−0.10
0.0
−1.0
−0.10
−0.5
−0.5
0.0
ACF
0.0
0.5
0.5
0.10
1.0
factor 1
−20
−15
−10
Lag
−5
0
0
5
10
15
Lag
20
– p.28
b τ yt )
Sample cross-correlation of the 3 residuals (i.e. B
residual 1 and residual 2
residual 1 and residual 3
0
5
10
15
20
0.05
−0.15
−0.10
0.0
0.2
−0.05
−0.05
0.0
ACF
0.4
0.6
0.05
0.8
1.0
0.10
residual 1
0
5
15
20
0
residual 2
10
15
20
residual 2 and residual 3
1.0
−20
−15
−10
−5
0
0.0
−0.10
0.0
−0.10
0.2
−0.05
0.4
ACF
0.0
0.6
0.05
0.8
0.10
0.05
5
0.10
residual 2 and residual 1
10
0
10
15
20
0
5
residual 3 and residual 2
15
20
1.0
0.4
0.2
0.0
−0.10
−0.10
−0.05
−0.05
0.0
ACF
0.0
0.6
0.05
0.8
0.10
0.05
10
residual 3
0.10
residual 3 and residual 1
5
−20
−15
−10
Lag
−5
0
−20
−15
−10
Lag
−5
0
0
5
10
Lag
15
20
– p.29
Since the first two factors are dominated by periodic
components, we remove them before fitting.
60
•
•
50
•
•
40
•
30
•
20
•
•
0
10
•
•
•
•
2
•
•
4
•
•
6
•
•
8
•
•
10
•
••
12
– p.30
b t + et , the AICC selected
In the fitted factor model yt = Ax
VAR(1) for the factor process:
b 1 (xt−1 − αt−1 ) + ut ,
b0 + Φ
xt − αt = ϕ
where ατt = (pt1 , pt2 , 0, 0) is the periodic component, and
0
B
B
B
b
Φ1 = B
B
@
.27
−.31
.72
.01
.36
−.04
.00
−.01
.42
−.00
.03
.03
1
14.24
B
C
B −.17
.04 C
B
C b
C , Σu = B
B −.02
−.02 C
@
A
.042
.48
.40
1
0
.23
.03
.05
.01
−.00
.05
C
C
C
C,
C
A
b 0 = (.07, −.02, −.11, .10)τ .
ϕ
– p.31
• Temperature dynamics in the 7 cities may be modelled in
terms of 4 common factors
• The annual periodic fluctuations may be explained by a
single common factor
• Removing the periodic components, the dynamics of the 4
common factors may be represented by an AR(1) model
⇛
– p.32
Example 2. Implied volatility surfaces of IBM, Microsoft and Dell
stocks in 2006 (i.e. 251 trading days).
Source of Data: OptionMetrics at WRDS
Observations: for t = 1, · · · , 251, implied volatility wt (ui , vj )
computed from call options at
• time to maturity at 30, 60, 91, 122, 152, 182, 273, 365,
547 & 730 calendar days, denoted by u1 , · · · , u10 , and
• delta at 0.2, 0.25, 0.3, 0.35, 0.4, · · · , 0.8, denoted
by v1 , · · · , v13 .
Total: p = 10 × 13 = 130 time series with the length 251 each.
Take difference w̌t (ui , vj ) = wt (ui , vj ) − wt−1 (ui , vj ), and vectorize
the matrix w̌t (ui , vj ) into a 130 × 1 vector process yt .
– p.33
Average daily implied volatility surfaces over 251 days.
– p.34
Fitting a factor model on each of the rolling windows of length
100 days:
yi , yi+1 , · · · , yi+99 , i = 1, · · · , 150.
The estimated number of factors for all 3 stocks across different
windows is always rb = 1.
Based on a fitted AR model to the estimated factor process, we
predict the next value xi+100 , denoted by x̌i+100 . It leads to the
one-step ahead prediction for yi+100 :
b i+100 .
y̌i+100 = Ax̌
Put
1
RMSEi = √ ||y̌i+100 − yi+100 ||,
p
i = 1, · · · , 150.
– p.35
Average of the ordered
c over
eigenvalues of M
the 150 rolling windows.
3 panels on the left: 10
largest eigenvalues
3 panels on the right:
2nd–11th largest eigenvalues.
– p.36
Benchmark prediction for yi+100 : the previous value yi+99
Prediction based on Bai & Ng (2002) — factor-modelling based
b x
bt ) is the solution of
on the LSE: (A,
n
X
||yt − Axt ||2 , subject to Aτ A/p = Ir and Xτ X/n = Ir ,
min
A,xt
t=1
where X = (x1 , · · · , xn ).
The cumulative RMSE over the 150 windows. Red dotted – Bai &
Ng (2002), Green – benchmark, Black – our method.
– p.37
Example 3. IBM stock intra-day prices in 2006
251 trading days, tick by tick prices collected in 9:30 — 16:00
In total 2,786,649 observations (74MB)
For each of 251 trading days, construct the pdf curve of
one-minute log-return using the log-returns in 390 one-minute
intervals: kernel density estimation with h = 0.000025
Treating the 251 pdfs as a high-dimensional time series, apply
the proposed procedure.
The white-noise test rejects H0 : r = 1, but cannot reject
H0 : r = 2.
– p.38
4e−04
4000
0
−4e−04
0e+00
4e−04
−4e−04
0e+00
4e−04
day 5
day 6
0
4000
0
4000
0
0e+00
4e−04
4000
day 4
Density
bandwidth = 2.5e−05
Density
bandwidth = 2.5e−05
−4e−04
0e+00
4e−04
−4e−04
0e+00
4e−04
day 7
day 8
day 9
4000
0e+00
4e−04
0
0
−4e−04
4000
bandwidth = 2.5e−05
Density
bandwidth = 2.5e−05
Density
bandwidth = 2.5e−05
0 4000
−4e−04
0e+00
4e−04
−4e−04
0e+00
4e−04
day 10
day 11
day 12
0e+00
4e−04
bandwidth = 2.5e−05
0
4000
0
−4e−04
4000
bandwidth = 2.5e−05
Density
bandwidth = 2.5e−05
Density
bandwidth = 2.5e−05
0 4000
Density
Density
0
0e+00
bandwidth = 2.5e−05
−4e−04
Density
day 3
4000
Density
−4e−04
Density
day 2
0 4000
Density
day 1
−4e−04
0e+00
4e−04
bandwidth = 2.5e−05
−4e−04
0e+00
4e−04
bandwidth = 2.5e−05
– p.39
4
6
8
10
8e+08
12
2
4
6
8
10
Index
BHK: 2 to 12
New: 2 to 12
0
0e+00
4e+06
eigen−values
1000
12
8e+06
Index
3000
2
eigen−values
4e+08
0e+00
5000
eigen−values
15000
New: 1 to 12
0
eigen−values
BHK: 1 to 12
2
4
6
Index
8
10
2
4
6
Index
8
10
– p.40
−0.10
0.00
1st Eigen−Function (New)
eigen−funtion
0.00
−0.10
eigen−funtion
1st Eigen−Function (BHK)
0
100 200 300 400 500
0
100 200 300 400 500
Index
2nd Eigen−Function (BHK)
2nd Eigen−Function (New)
0.05
−0.15
−0.05
eigen−funtion
−0.05
−0.15
eigen−funtion
0.05
Index
0
100 200 300 400 500
Index
0
100 200 300 400 500
Index
– p.41
40 45 50 55 60 65 70
Time series plots of xt1 and xt2
0
50
100
150
200
250
150
200
250
150
200
250– p.42
−10
−5
0
5
10
Day
0
50
100
−15
−5
0
5
10
15
Day
0
50
100
Day
ACF of (xt1 , xt2 )
0.6
0.2
−0.2
0.2
−0.2
ACF
0.6
1.0
Series 1 & Series 2
1.0
Series 1
5
10
15
20
5
10
Series 2 & Series 1
Series 2
15
20
15
20
0.6
0.2
−0.2
0.2
0.6
1.0
Lag
−0.2
ACF
0
Lag
1.0
0
−20
−15
−10
Lag
−5
0
0
5
10
Lag
– p.43
PACF of (xt1 , xt2 )
0.4
0.0
0.2
0.4
0.2
0.0
Partial ACF
0.6
Series 1 & Series 2
0.6
Series 1
5
10
15
5
10
Series 2 & Series 1
Series 2
15
20
15
20
0.4
0.2
0.0
0.2
0.4
0.6
Lag
0.6
Lag
0.0
Partial ACF
20
−20
−15
−10
Lag
−5
5
10
Lag
– p.44
Fitting time series xt = (xt1 , xt2 )′
Since there is little cross correlation between the two component
series, we fit them separately.
For {xt1 }, AIC selected ARMA(1,1) with AIC=4556.76:
xt+1,1 = 0.985xt1 + εt+1,1 − 0.787εt,1 .
For {xt2 }, AIC selected ARMA(1,1) with AIC=4323.1:
xt+1,2 = 0.982xt2 + εt+1,2 − 0.885εt,2 .
Allowing nonstationarity — ARIMA(1,1,1):
xt+1,1 −xt1 = 0.062(xt1 −xt−1,1 )+εt+1,1 −0.847εt,1 ,
(AIC = 4537.13)
xt+1,2 −xt2 = 0.046(xt2 −xt−1,2 )+εt+1,2 −0.889εt,2 ,
(AIC = 4306.08)
– p.45
ACF of the residuals from the fitted ARMA(1,1) models
Series residuals(arima(xi[, 1], order = c(1, 0, 1)))
0.6
0.2
0.4
ACF
0.4
0.2
0.0
0.0
ACF
0.6
0.8
0.8
1.0
1.0
Series residuals(arima(xi[, 2], order = c(1, 0, 1)))
0
5
10
15
Lag
20
0
5
10
15
20
Lag
– p.46