Factor Models for Multiple Time Series

Qiwei Yao
Department of Statistics, London School of Economics
[email protected]

Joint work with
Neil Bathia, University of Melbourne
Clifford Lam, London School of Economics
Jiazhu Pan, University of Strathclyde
Flavio Ziegelmann, Federal University of Rio Grande do Sul
– p.1

Econometric factor models: a brief survey
Statistical factor models: identification
Estimation
  • expanding the white noise space: non-stationary factors
  • eigenanalysis: stationary cases
Asymptotic properties (stationary cases only in this talk)
  • fixed p: fast convergence rates for zero eigenvalues
  • p → ∞: convergence rates independent of p
Illustration with real data sets
  • temperature data
  • implied volatility surfaces
  • densities of intraday returns
– p.2

Econometric modelling: represent a p × 1 time series yt as
    yt = ft + ξt,
where both ft and ξt are unobservable, and
• ft: driven by r common factors, with r << p,
• ξt: idiosyncratic components.
Basic idea: the dynamic structure of each component of yt is driven by the r common factors plus one or a few idiosyncratic components.
Practical motivation: asset pricing models, yield curves, portfolio risk management, derivative pricing, macroeconomic behaviour and forecasting, consumer theory, etc.
– p.3

Sargent & Sims (1977) and Geweke (1977): dynamic factor models
Chamberlain & Rothschild (1983): approximate and static factor models
Forni, Hallin, Lippi & Reichlin (2002 – ): generalized dynamic factor models (GDFM) — combining the two approaches:
    yit = bi1(L) u1t + · · · + bir(L) urt + ξit,   i = 1, 2, · · · ,  t = 0, ±1, · · · ,
where
• ukt ∼ WN(0, 1), k = 1, · · · , r, are the common (dynamic) factors, uncorrelated with each other,
• ξit are stationary in t and form the idiosyncratic noise, and {ukt} and {ξit} are uncorrelated.
Only the yit are observable.
– p.4

Let ξpt = (ξ1t, · · · , ξpt)τ and ypt = (y1t, · · · , ypt)τ.
Assumption: as p → ∞, it holds almost surely on [−π, π] that all the eigenvalues of the spectral density matrices of ξpt are uniformly bounded, while only the r largest eigenvalues of the spectral density matrices of (ypt − ξpt) diverge to ∞.
Intuition: the r common factors affect the dynamics of most component series, while each idiosyncratic noise affects the dynamics of only a few component series.
Characterisation result: as p → ∞, it holds almost surely on [−π, π] that the r largest eigenvalues of the spectral density matrices of ypt diverge to ∞, while the (r + 1)-th largest eigenvalue is uniformly bounded.
The model is asymptotically identifiable when the number of time series (i.e. the number of cross-sectional variables) p → ∞!
– p.5

• Estimation for the GDFM when r is given — dynamic principal component analysis (Brillinger 1981):
  i.  obtain an estimator Σ̂(θ) of the spectral density matrix of yt, θ ∈ [−π, π];
  ii. find the eigenvalues and eigenvectors of Σ̂(θ);
  iii. project yt onto the space spanned by the r eigenvectors corresponding to the r largest eigenvalues: the projection is defined as the mean-square limit of a Fourier sequence, and each component of the projection is a sum of r uncorrelated MA processes.
• Determining r: only identifiable when p → ∞! ‘There is no way a slowly diverging sequence can be told from an eventually bounded sequence’ (Forni et al. 2000).
– p.6
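A minimal sketch (in Python/numpy) of steps i–ii of the dynamic PCA above: a smoothed-periodogram estimate of the spectral density matrix of yt, followed by an eigenanalysis at each Fourier frequency. The function name, the simple moving-average smoother and its bandwidth are my own illustrative choices, not part of the talk; the returned eigenvalues are the quantities whose divergence (as p grows) drives the identification argument on p.5.

import numpy as np

def spectral_eigenvalues(y, bandwidth=5):
    """Smoothed-periodogram estimate of the spectral density matrix of a
    p-variate series y (n x p), eigendecomposed at each Fourier frequency."""
    n, p = y.shape
    yc = y - y.mean(axis=0)
    d = np.fft.fft(yc, axis=0)                              # n x p DFT
    # raw periodogram matrices I(theta_j) = d_j d_j^* / (2 pi n)
    I = np.einsum('ja,jb->jab', d, d.conj()) / (2 * np.pi * n)
    S = np.zeros_like(I)
    for j in range(n):                                      # moving-average smoothing
        idx = np.arange(j - bandwidth, j + bandwidth + 1) % n
        S[j] = I[idx].mean(axis=0)
    # eigenvalues of the estimated spectral density matrix at each frequency,
    # sorted in decreasing order
    return np.array([np.linalg.eigvalsh(S[j])[::-1] for j in range(n)])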
Let {yt} be a p × 1 time series defined by
    yt = A xt + εt,
where
• xt: r × 1 unobservable factors, r (< p) unknown;
• A: p × r unknown constant factor loading matrix;
• {εt}: vector WN(µε, Σε);
• no linear combination of xt is WN.
Lack of identification: (A, xt) may be replaced by (AH, H⁻¹ xt) for any invertible H. Therefore we assume Aτ A = Ir. But the factor loading space M(A) is uniquely defined.
The model is not new: Peña & Box (1987).
– p.7

What is new?
• No distributional assumptions on εt. More significantly, correlation between εt and xt+k is allowed.
• The factors xt, and therefore also yt, may be nonstationary, and not necessarily driven by unit roots.
• A new estimation method for stationary cases: an eigenanalysis, applicable when p >> n.
Key: estimate A, or more precisely M(A). With an estimator Â available, a natural estimator for the factors is x̂t = Âτ yt, and the associated residuals are ε̂t = (Ip − Â Âτ) yt. By modelling the lower-dimensional x̂t, we obtain the dynamic model for yt:
    ŷt = Â x̂t.
– p.8

Reconciling with the econometric models
‘Common factors’ & ‘idiosyncratic noise’: conceptually appealing, but only identifiable when p → ∞.
Goal: identify those components of xt each of which affects most (or only a few) components of yt.
Put A = (a1, · · · , ar) and xt = (xt1, · · · , xtr)τ. Then
    yt = a1 xt1 + · · · + ar xtr + εt.
Hence the number of non-zero coefficients of aj is the number of components of yt which are affected by the factor xtj.
To remove the correlation among the components of xt, apply PCA to xt, i.e. replace (A, xt) by (AΓ, Γτ xt), where Γ is the r × r orthogonal matrix defined by Var(xt) = Γ D Γτ. When the eigenvalues of Var(xt) are all distinct, this representation is unique.
– p.9

Lemma 1. Let A1 z1 = A2 z2, where, for i = 1, 2, Ai is a p × r matrix with Aiτ Ai = Ir, and zi = (zi1, · · · , zir)τ is an r × 1 random vector with uncorrelated components and Var(zi1) > · · · > Var(zir). Suppose furthermore that M(A1) = M(A2). Then z1j = ±z2j for 1 ≤ j ≤ r.
In practice we use the PCA-ed factors x̂t. The number of non-zero elements of the j-th column of Â is then the number of components of yt whose dynamics depend on the j-th factor x̂tj.
– p.10
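A short numpy sketch of the PCA rotation used on p.9–10: given estimated loadings Â (orthonormal columns) and factors x̂t, replace (Â, x̂t) by (ÂΓ, Γτ x̂t), where Var(x̂t) = Γ D Γτ, so that the rotated factors are uncorrelated and ordered by variance. The function name is my own.

import numpy as np

def pca_rotate_factors(A_hat, x_hat):
    """Replace (A_hat, x_t) by (A_hat Gamma, Gamma' x_t), where
    Var(x_t) = Gamma D Gamma', so the rotated factors are uncorrelated
    and ordered by decreasing variance.  A_hat: p x r, x_hat: n x r."""
    cov_x = np.cov(x_hat, rowvar=False)
    d, gamma = np.linalg.eigh(cov_x)            # eigenvalues in ascending order
    gamma = gamma[:, np.argsort(d)[::-1]]       # reorder by decreasing variance
    return A_hat @ gamma, x_hat @ gamma         # rotated loadings and factors

By Lemma 1, the rotated factors are unique up to sign provided the eigenvalues of Var(xt) are distinct.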
Nonstationary factors
C1. εt ∼ WN(µε, Σε), and cτ xt is not white noise for any constant c ∈ Rr. Furthermore Aτ A = Ir.
Let B = (b1, · · · , bp−r) be a p × (p − r) matrix such that (A, B) is a p × p orthogonal matrix, i.e. Bτ A = 0 and Bτ B = Ip−r. Since yt = A xt + εt,
    Bτ yt = Bτ εt,
i.e. {Bτ yt, t = 0, ±1, · · · } is WN. Therefore
    Corr(biτ yt, bjτ yt−k) = 0   for all 1 ≤ i, j ≤ p − r and k ≥ 1.
– p.11

Search for mutually orthogonal directions b1, b2, · · · one by one such that the projection of yt on each of these directions is white noise. Stop the search when no such direction remains, and take p − k as the estimated value of r, where k is the number of directions obtained in the search.
See Pan and Yao (2008) for further details and some (preliminary) asymptotic results.
– p.12

Stationary models
C2. xt is stationary, and Cov(xt, εt+k) = 0 for any k ≥ 0.
Put Σy(k) = Cov(yt+k, yt), Σx(k) = Cov(xt+k, xt), Σxε(k) = Cov(xt+k, εt). By yt = A xt + εt,
    Σy(k) = A Σx(k) Aτ + A Σxε(k),   k ≥ 1.
For a prescribed integer k0 ≥ 1, define
    M = Σ_{k=1}^{k0} Σy(k) Σy(k)τ.
Then M B = 0, i.e. the columns of B are the eigenvectors of M corresponding to zero eigenvalues.
– p.13

Hence under C1 and C2, the factor loading space M(A) is spanned by the eigenvectors of M corresponding to its non-zero eigenvalues, and the number of non-zero eigenvalues is r.
Let M̂ = Σ_{k=1}^{k0} Σ̂y(k) Σ̂y(k)τ, where Σ̂y(k) denotes the sample covariance matrix of yt at lag k. Then
• r̂: the number of non-zero eigenvalues of M̂;
• Â: its columns are the r̂ orthonormal eigenvectors of M̂ corresponding to its r̂ largest eigenvalues.
(A code sketch of this estimator is given after p.16 below.)
– p.14

Bootstrap test for r
Note that r = r0 iff the (r0 + 1)-th largest eigenvalue of M is 0 and the r0-th largest eigenvalue is non-zero. Consider testing H0: λr0+1 = 0. We reject H0 if λ̂r0+1 > lα.
Bootstrap to determine lα:
1. Compute ŷt with r̂ = r0. Let ε̂t = yt − ŷt.
2. Let yt* = ŷt + εt*, where the εt* are drawn independently (with replacement) from {ε̂1, · · · , ε̂n}.
3. Form M* in the same manner as M̂ with {yt} replaced by {yt*}, and compute the (r0 + 1)-th largest eigenvalue λ*r0+1 of M*.
The conditional distribution L(λ*r0+1 | {yt}) is taken as the distribution of λ̂r0+1 under H0.
– p.15

Asymptotics I: n → ∞ and p fixed
(i) yt is strictly stationary with E‖yt‖^{4+δ} < ∞ for some δ > 0.
(ii) yt is α-mixing with Σ_j α(j)^{δ/(2+δ)} < ∞.
(iii) M has r non-zero eigenvalues λ1 > · · · > λr > 0.
Then under conditions C1 and C2, the following assertions hold:
(i) λ̂j − λj = OP(n^{−1/2}) for 1 ≤ j ≤ r;
(ii) λ̂r+k = OP(n^{−1}) for 1 ≤ k ≤ p − r;
(iii) D{M(Â), M(A)} = OP(n^{−1/2}) provided r̂ = r a.s., where
    D{M(Â), M(A)} = 1 − (1/r) tr(A Aτ Â Âτ).
– p.16
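A minimal numpy sketch of the eigenanalysis estimator of p.13–14: form M̂ from the sample lag-k autocovariance matrices and take the eigenvectors of its largest eigenvalues as the estimated loadings. Here r is supplied by the user (in the talk it is determined, e.g., by the bootstrap test on p.15); the function name is my own.

import numpy as np

def estimate_factor_space(y, k0, r):
    """Eigenanalysis estimator sketched on p.13-14.
    y: n x p data matrix, k0: number of lags, r: number of factors.
    Returns A_hat (p x r, orthonormal columns), the estimated factors
    x_hat_t = A_hat' y_t, and all eigenvalues of M_hat (decreasing)."""
    n, p = y.shape
    yc = y - y.mean(axis=0)
    M = np.zeros((p, p))
    for k in range(1, k0 + 1):
        S_k = yc[k:].T @ yc[:-k] / n          # sample Cov(y_{t+k}, y_t)
        M += S_k @ S_k.T                      # M_hat = sum_k S_k S_k'
    eigvals, eigvecs = np.linalg.eigh(M)      # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    A_hat = eigvecs[:, :r]
    x_hat = yc @ A_hat                        # n x r estimated factor series
    return A_hat, x_hat, eigvals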
Numerical illustration: λ1 = 1.884, λ2 = λ3 = λ4 = 0 (p = 4, r = 1); 10,000 simulation replications.
– p.17

[Figure: histograms of √n (λ̂1 − λ1) for n = 10, 20, 50, 100, 200, 500, 1000, 2000.]
– p.18

[Figure: histograms of n λ̂2 for n = 10, 20, 50, 100, 200, 500, 1000, 2000.]
– p.19

Asymptotics II: n → ∞, p → ∞ and r fixed
Recall the model yt = A xt + εt, where A is p × r.
1. Assumptions on the strength of the factors:
(i) A = (a1, · · · , ar), ‖ai‖² ≍ p^{1−δ}, i = 1, · · · , r, 0 ≤ δ ≤ 1.
(ii) For k = 0, 1, · · · , k0, Σx(k) ≡ Cov(xt+k, xt) is of full rank, and Σxε(k) ≡ Cov(xt+k, εt) = O(1) elementwise.
We call the factors strong if δ = 0, and weak if δ > 0.
The standardisation ‘Aτ A = Ir’ together with (i) and (ii) implies
    ‖Σx(k)‖ ≍ p^{1−δ} ≍ ‖Σx(k)‖min,   ‖Σxε(k)‖ = O(p^{1−δ/2}),
where a ≍ b means a = O(b) and b = O(a), ‖A‖² = λmax(A Aτ) and ‖A‖²min = min{λ(A Aτ) : λ(A Aτ) > 0}.
– p.20

2. For k = 0, 1, · · · , k0, ‖Σxε(k)‖ = o(p^{1−δ}), and it holds elementwise that
    Σ̃x(k) − Σx(k) = OP(n^{−lx}),  Σ̃ε(k) − Σε(k) = OP(n^{−lε}),  Σ̃xε(k) − Σxε(k) = OP(n^{−lxε}) = Σ̃εx(k) − Σεx(k),
for some constants 0 < lx, lxε, lε ≤ 1/2, where Σ̃ denotes the sample version of Σ.
3. M has r distinct non-zero eigenvalues.
Then under conditions C1 and C2,
    ‖Â − A‖ = OP(hn) = OP(n^{−lx} + p^{δ/2} n^{−lxε} + p^{δ} n^{−lε}),
provided hn = o(1).
Remark. When all the factors are strong (i.e. δ = 0), the convergence rate hn is independent of the dimension p.
– p.21

Our asymptotic theory also shows:
1. The factor-model-based estimator for Σy,
    Σ̂y = Â Σ̂x Âτ + Σ̂ε,   where Σ̂x = Âτ (Σ̃y − Σ̂ε) Â,
cannot improve over the sample covariance estimator Σ̃y. But the convergence rate for ‖Σ̂y⁻¹ − Σy⁻¹‖ is independent of p when all the factors are strong.
2. When the factors are of different levels of strength, a two-step (or multi-step) estimation procedure is preferable: estimate and remove the stronger factors first.
– p.22

Simulation with r = 1 and δ = 0 (only one strong factor):
xt = 0.9 xt−1 + N(0, 4), εtj ∼ iid N(0, 4), and the i-th element of A is 2 cos(2πi/p). (A toy code replication of this design is sketched after p.25 below.)

n = 200      ‖Â − A‖      ‖Σ̃y⁻¹ − Σy⁻¹‖    ‖Σ̂y⁻¹ − Σy⁻¹‖
p = 20       .022(.005)    .24(.03)          .009(.002)
p = 180      .023(.004)    79.8(29.8)        .007(.001)
p = 400      .022(.004)    –                 .007(.001)
p = 1000     .023(.004)    –                 .007(.001)

n = 200      ‖Σ̃y − Σy‖      ‖Σ̂y − Σy‖
p = 20       218(165)        218(165)
p = 180      1962(1500)      1963(1500)
p = 400      4102(3472)      4103(3471)
p = 1000     10797(6820)     10800(6818)
– p.23

Illustration with real data sets
Example 1. Monthly temperatures from 7 cities in Eastern China, January 1954 — December 1986: n = 396, p = 7, r̂ = 4.
Example 2. Daily implied volatility surfaces for IBM, Microsoft and Dell call options in 2006: n = 100, p = 130, r̂ = 1.
Example 3. Daily densities of one-minute returns of the IBM stock price in 2006: n = 251, p = “∞”, r̂ = 2.
– p.24

Time plots of the monthly temperatures in 1959–1968 of Nanjing, Dongtai, Huoshan, Hefei, Shanghai, Anqing and Hangzhou.
[Figure: the seven temperature series, together with a map of the city locations (longitude 116–122, latitude 30–32.5).]
– p.25
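A toy replication of the one-strong-factor simulation design on p.23, reusing estimate_factor_space from the sketch after p.16. The burn-in length, random seed, lag number k0 and the use of the distance D of p.16 (rather than ‖Â − A‖) are my own implementation choices.

import numpy as np

def simulate_one_factor(n=200, p=20, seed=0):
    """One strong factor (r = 1, delta = 0), following the design on p.23:
    x_t = 0.9 x_{t-1} + N(0, 4), loadings a_i = 2 cos(2 pi i / p),
    noise eps_tj ~ iid N(0, 4).  Burn-in length and seed are my choices."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n + 100)
    for t in range(1, n + 100):
        x[t] = 0.9 * x[t - 1] + rng.normal(0.0, 2.0)
    x = x[100:]
    a = 2 * np.cos(2 * np.pi * np.arange(1, p + 1) / p)
    y = np.outer(x, a) + rng.normal(0.0, 2.0, size=(n, p))
    return y, (a / np.linalg.norm(a)).reshape(-1, 1)        # data and normalised loading

def space_distance(A, A_hat):
    """D{M(A_hat), M(A)} = 1 - tr(A A' A_hat A_hat') / r, as on p.16."""
    return 1.0 - np.trace(A @ A.T @ A_hat @ A_hat.T) / A.shape[1]

y, A_true = simulate_one_factor(n=200, p=400)
A_hat, x_hat, eigvals = estimate_factor_space(y, k0=5, r=1)
print(space_distance(A_true, A_hat))      # small: the rate does not deteriorate with p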
With k0 = 12 and α = 1%, the fitted model is yt = Â x̂t + et, r̂ = 4, et ∼ WN(µ̂e, Σ̂e), with
    µ̂e = (3.41, 2.32, 4.39, 4.30, 3.40, 4.91, 4.77)τ.
[The estimated 7 × 4 loading matrix Â and the 7 × 7 covariance matrix Σ̂e are also displayed.]
The x̂t are the PCA-ed factors: the first PC accounts for 99% of the total variation of the 4 factors, and for 97.6% of that of the original 7 series.
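The slide does not spell out how the 99% and 97.6% shares are computed; one plausible reading, stated here purely as an assumption, is the share of total variation captured by the first (largest-variance) factor:

import numpy as np

def variation_shares(x_hat, y):
    """Assumed definition of the shares quoted on p.26: the proportion of the
    total variation of the PCA-ed factors, and of the original series,
    attributable to the first factor."""
    lam = np.sort(np.linalg.eigvalsh(np.cov(x_hat, rowvar=False)))[::-1]
    share_factors = lam[0] / lam.sum()                          # cf. the "99%" figure
    share_series = lam[0] / np.trace(np.cov(y, rowvar=False))   # cf. the "97.6%" figure
    return share_factors, share_series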
– p.26

[Figure: time plots of the 4 estimated factors.]
– p.27

[Figure: sample auto- and cross-correlograms of the 4 estimated factors.]
– p.28

[Figure: sample auto- and cross-correlograms of the 3 residual series (i.e. B̂τ yt).]
– p.29

Since the first two factors are dominated by periodic components, we remove these components before fitting.
[Figure: the estimated annual periodic pattern over the 12 months.]
– p.30

In the fitted factor model yt = Â x̂t + et, the AICC selected a VAR(1) for the factor process:
    xt − αt = ϕ̂0 + Φ̂1 (xt−1 − αt−1) + ut,
where αtτ = (pt1, pt2, 0, 0) is the periodic component and
    ϕ̂0 = (.07, −.02, −.11, .10)τ.
[The estimated 4 × 4 matrices Φ̂1 and Σ̂u are also displayed.] (A least-squares sketch of such a VAR(1) fit is given after p.32 below.)
– p.31

• The temperature dynamics in the 7 cities may be modelled in terms of 4 common factors.
• The annual periodic fluctuations may be explained by a single common factor.
• After removing the periodic components, the dynamics of the 4 common factors may be represented by a VAR(1) model.
– p.32
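A least-squares sketch of the VAR(1) fit used for the factor process on p.31. Removing the periodic component by subtracting month-of-year means is my own simplification (the talk estimates the periodic component αt within the model and selects the order by AICC); the function names are mine.

import numpy as np

def remove_monthly_period(x, period=12):
    """Crudely remove an annual periodic component from an n x r factor series
    by subtracting month-of-year means (a simplification; see the lead-in)."""
    x = np.array(x, dtype=float)
    for m in range(period):
        idx = np.arange(m, x.shape[0], period)
        x[idx] -= x[idx].mean(axis=0)
    return x

def fit_var1(x):
    """Least-squares fit of x_t = phi0 + Phi1 x_{t-1} + u_t for an n x r series;
    returns (phi0, Phi1, Sigma_u), the quantities reported on p.31."""
    z = np.hstack([np.ones((x.shape[0] - 1, 1)), x[:-1]])    # regressors [1, x_{t-1}]
    coef, *_ = np.linalg.lstsq(z, x[1:], rcond=None)         # (r + 1) x r
    resid = x[1:] - z @ coef
    return coef[0], coef[1:].T, np.cov(resid, rowvar=False)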
Example 2. Implied volatility surfaces of IBM, Microsoft and Dell stocks in 2006 (i.e. 251 trading days).
Source of data: OptionMetrics at WRDS.
Observations: for t = 1, · · · , 251, implied volatilities wt(ui, vj) computed from call options at
• times to maturity of 30, 60, 91, 122, 152, 182, 273, 365, 547 and 730 calendar days, denoted by u1, · · · , u10, and
• deltas of 0.2, 0.25, 0.3, 0.35, 0.4, · · · , 0.8, denoted by v1, · · · , v13.
In total: p = 10 × 13 = 130 time series, each of length 251.
Take differences w̌t(ui, vj) = wt(ui, vj) − wt−1(ui, vj), and vectorise the matrix w̌t(ui, vj) into a 130 × 1 vector process yt.
– p.33

[Figure: average daily implied volatility surfaces over the 251 days.]
– p.34

Fit a factor model on each of the rolling windows of length 100 days: yi, yi+1, · · · , yi+99, i = 1, · · · , 150. The estimated number of factors for all 3 stocks across the different windows is always r̂ = 1.
Based on an AR model fitted to the estimated factor process, we predict the next value xi+100, denoted by x̌i+100. This leads to the one-step-ahead prediction for yi+100:
    y̌i+100 = Â x̌i+100.
Put
    RMSEi = ‖y̌i+100 − yi+100‖ / √p,   i = 1, · · · , 150.
– p.35

[Figure: average of the ordered eigenvalues of M̂ over the 150 rolling windows. The 3 panels on the left show the 10 largest eigenvalues; the 3 panels on the right show the 2nd–11th largest eigenvalues.]
– p.36

Benchmark prediction for yi+100: the previous value yi+99.
Prediction based on Bai & Ng (2002) — factor modelling based on least squares: (Â, x̂t) is the solution of
    min_{A, xt} Σ_{t=1}^{n} ‖yt − A xt‖²,   subject to Aτ A / p = Ir and Xτ X / n = Ir,
where X = (x1, · · · , xn).
[Figure: cumulative RMSE over the 150 windows. Red dotted — Bai & Ng (2002); green — benchmark; black — our method.]
– p.37

Example 3. IBM stock intraday prices in 2006
251 trading days, tick-by-tick prices collected between 9:30 and 16:00; in total 2,786,649 observations (74MB).
For each of the 251 trading days, construct the pdf curve of the one-minute log-returns using the log-returns in the 390 one-minute intervals: kernel density estimation with h = 0.000025.
Treating the 251 pdfs as a high-dimensional time series, apply the proposed procedure. The white-noise test rejects H0: r = 1, but cannot reject H0: r = 2.
– p.38
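A sketch of how the daily density curves of p.38 can be constructed: a Gaussian kernel density estimate of each day's one-minute log-returns, evaluated on a common grid, with the 251 resulting curves stacked into a high-dimensional series. The bandwidth 2.5e−5 is from the slide; the grid and the variable names are my own.

import numpy as np

def daily_return_density(returns, grid, h=2.5e-5):
    """Gaussian kernel density estimate of one day's one-minute log-returns,
    evaluated on a fixed grid (bandwidth h as quoted on p.38)."""
    u = (grid[:, None] - returns[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(returns) * h * np.sqrt(2 * np.pi))

grid = np.linspace(-5e-4, 5e-4, 201)        # evaluation grid (my choice)
# returns_by_day: a list of 251 arrays of ~390 one-minute log-returns (not provided here)
# Y = np.vstack([daily_return_density(r, grid) for r in returns_by_day])
# Y (251 x 201) is then treated as a high-dimensional time series of density curves.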
[Figure: estimated density curves of the one-minute returns for days 1–12 (kernel density estimates, bandwidth 2.5e−05).]
– p.39

[Figure: the estimated eigenvalues, plotted for indices 1–12 and 2–12, for the BHK method and the new method.]
– p.40

[Figure: the first and second estimated eigenfunctions, BHK method and new method.]
– p.41

[Figure: time series plots of xt1 and xt2 over the 251 days.]
– p.42

[Figure: auto- and cross-correlograms (ACF) of (xt1, xt2).]
– p.43

[Figure: partial auto- and cross-correlograms (PACF) of (xt1, xt2).]
– p.44

Fitting the time series xt = (xt1, xt2)τ
Since there is little cross-correlation between the two component series, we fit them separately.
For {xt1}, the AIC selected an ARMA(1,1) with AIC = 4556.76:
    xt+1,1 = 0.985 xt1 + εt+1,1 − 0.787 εt,1.
For {xt2}, the AIC selected an ARMA(1,1) with AIC = 4323.1:
    xt+1,2 = 0.982 xt2 + εt+1,2 − 0.885 εt,2.
Allowing for nonstationarity — ARIMA(1,1,1):
    xt+1,1 − xt1 = 0.062 (xt1 − xt−1,1) + εt+1,1 − 0.847 εt,1   (AIC = 4537.13),
    xt+1,2 − xt2 = 0.046 (xt2 − xt−1,2) + εt+1,2 − 0.889 εt,2   (AIC = 4306.08).
– p.45

[Figure: ACF of the residuals from the fitted ARMA(1,1) models for the two factor series.]
– p.46
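A sketch of the univariate ARMA(1,1) fits of p.45 using Python's statsmodels. The residual plot labels on p.46 suggest the original fits were produced with R's arima(); statsmodels here is my own substitution, and the variable names are placeholders.

from statsmodels.tsa.arima.model import ARIMA

def fit_arma11(x):
    """Fit ARMA(1,1) (i.e. ARIMA(1,0,1)) to a univariate factor series."""
    return ARIMA(x, order=(1, 0, 1)).fit()

# x1, x2: the two estimated factor series (not provided here)
# res1, res2 = fit_arma11(x1), fit_arma11(x2)
# print(res1.aic, res2.aic)                      # compare with the AIC values on p.45
# res1d = ARIMA(x1, order=(1, 1, 1)).fit()       # the ARIMA(1,1,1) alternative on p.45
# residual diagnostics, e.g. res1.resid, correspond to the ACF plots on p.46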