Time-Varying Transition Probabilities Based on
Predictive Likelihood Scores in
Markov Regime Switching Models∗
Marco Bazzi (a), Francisco Blasques (b), Siem Jan Koopman (b,c), Andre Lucas (b)
(a) University of Padova, Italy
(b) VU University Amsterdam and Tinbergen Institute, The Netherlands
(c) CREATES, University of Aarhus, Denmark
Abstract
We propose a new Markov switching model with time-varying transition probabilities. The novelty of our model is that the dynamics of the transition probabilities
evolve in an observation driven way based on the score of the predictive likelihood
function using the recently developed Generalized Autoregressive Score framework.
We show how the dynamics of the new model can be readily interpreted. We investigate the model’s performance in a controlled simulation setting and show that the
model is successful in estimating a range of different dynamic patterns for unobserved
regime switching probabilities. We also illustrate the usefulness of the new methodology in an empirical setting by studying U.S. Industrial Production. We find empirical
evidence of changes in the regime switching probabilities, with more persistence for
low volatility regimes in the later part of the sample, and more persistence for high
volatility in the former part of the sample.
Keywords: Hidden Markov Models; HMM; observation driven models; generalized
autoregressive score models.
Classification codes: AMS-62M10, AMS-62M20, JEL-C22.
∗ The authors would like to thank the participants of the “2014 Workshop on Dynamic Models driven by
the Score of Predictive Likelihoods”, La Laguna, and seminar participants at VU University Amsterdam
for useful comments and discussions. Blasques and Lucas thank the Dutch Science Foundation (NWO,
grant VICI453-09-005) for financial support.
1 Introduction
Markov regime-switching models have been widely applied in economics and finance. Since
Hamilton (1989) illustrated the model in his seminal paper by an application to U.S. real
GNP and showed how well a simple regime switching model captures the NBER business
cycle classification, the model has seen numerous other applications, ranging from switches
in the level of a time series, switches in the (autoregressive) dynamics of vector time series,
switches in volatilities, and switches in the correlation or dependence structure between
time series. The key attractive feature of Markov switching models is that the conditional
distribution of a time series depends on an underlying latent state or regime, which can
take only a finite number of different values. The discrete state evolves through time as a
Markov chain and we can summarize its statistical properties by a transition probability
matrix.
It has long been noted that the assumption of a constant transition probability
matrix for a Markov switching model is too restrictive for many empirical settings. Therefore, Diebold et al. (1994) and Filardo (1994) extended the basic Markov switching model
to allow for the transition probabilities to vary over time using observable covariates, such
as strictly exogenous explanatory variables or lagged values of the dependent variable.
Though such an approach is often very useful, it is not always clear what variables or
which functional specification we should use for describing the dynamics in the transition
probabilities.
Our main contribution in this paper is to propose a new, pure time series approach to
model the dynamics of the transition probabilities in Markov switching models. We do
so in an observation driven way, i.e., we let the transition probabilities vary as functions
of particular transforms of the lagged observations. The alternative would be to let the
transition probabilities vary in a parameter driven way; see Cox (1981) for a discussion of the differences
between the two approaches. The main advantage of an observation driven model is the
availability of a closed form of the likelihood using a standard prediction error decomposition. Given the computational ease of a standard regime switching model, we opt for the
observation driven approach in order to preserve the appeal of Markov switching models
from a computational perspective.
By opting for an observation driven approach, we are directly faced with the challenge
to specify a suitable functional form that links past observations to future transition probabilities. We do so by applying the Generalized Autoregressive Score (GAS) model of Creal et al.
(2011,2013) and Harvey (2013). In a GAS framework, the dynamics of a time-varying parameter are driven by the score of the conditional observation density at a particular point
in time. GAS models encompass many well-known observation driven time series models, including the ARCH model of Engle (1982), the generalized ARCH (GARCH) model
of Bollerslev (1986), the exponential GARCH (EGARCH) model of Nelson (1991), the
autoregressive conditional duration (ACD) model of Engle and Russell (1998), the multiplicative error model (MEM) of Engle (2002), the Beta-t-GARCH model of Harvey (2013),
and many related models. In addition, the recent statistics literature has seen many new
successful applications of GAS based time series models. For example, Creal et al. (2011)
and Lucas et al. (2014) study dynamic volatilities and correlations under fat-tails and possible skewness. Harvey and Luati (2014) introduce new models for dynamic changes in
levels under fat tails. Creal et al. (2014b) investigate GAS based observation driven mixed
measurement dynamic factor models. And Oh and Patton (2013) and De Lira Salvatierra
and Patton (2013) investigate factor copulas based on GAS dynamics. Koopman et al.
(2012) show that GAS based time series models have a similar forecasting performance
as correctly specified non-linear, non-Gaussian state space models over a range of model
specifications.
We show that the score of the predictive density in the regime switching model takes
a very intuitive form. The information on the time-varying parameter contained in the
models for each of the states is combined using the predictive probabilities of each of the
states being applicable at that moment in time. Interestingly, this directly introduces a
possible information problem with regard to the dynamics of the transition probabilities
corresponding to the states that receive hardly any current predictive probability. We solve
this issue by studying alternative scaling forms for the score corresponding to various forms
of generalized inverses of the Fisher information matrix.
We study the performance of our new model in a simulation setting and for empirical
data. In the simulation experiment, we find that the model and estimation methodology
can uncover a range of unobserved dynamics of transition probabilities. The dynamic
patterns range from structural breaks to slow and fast changing dynamic patterns of
the transition probabilities. In our empirical example, we study monthly U.S. Industrial
Production from January 1919 to October 2013. We find empirical evidence of changes
in the transition probabilities over this long time span. In general, we uncover two or
three regimes. The corresponding transition probabilities, however, are not constant over
time. In particular, the high volatility regime is both likely and persistent in the earlier
part of the sample. In the later part of the sample, the low volatility regime is much more
persistent. Such changes in dynamics are easily captured by the new model.
It is interesting to note that the new model we propose in this paper is one of the first
models that mixes parameter driven (Markov switching) dynamics with observation driven
GAS dynamics. In particular, it is interesting to see that the GAS approach can still be
applied in a context where an additional filtering step (for the unobserved discrete state
process) has to be applied before we can compute the score of the resulting conditional
density as a driver for the time-varying transition probabilities. This feature of the new
model may be interesting on its own. Similar observations using continuously valued
Gaussian states in a state space set-up were made by Creal et al. (2014a) and Delle Monache
and Petrella (2014).
The remainder of the paper is organized as follows. In Section 2, we briefly discuss the
main set-up of the Markov switching model and its residual diagnostics. In Section 3, we
introduce the new Markov switching model with dynamic transition probabilities based
on the Generalized Autoregressive Score framework of Creal et al. (2013). In Section 4,
we discuss the stationarity and ergodicity properties of the new model. In Section 5, we perform
an extensive Monte Carlo study to show the robustness of our methodology to several types
of latent dynamic processes for the time-varying transition probabilities. In Section 6 we
discuss the empirical application to U.S. Industrial Production data. We conclude in
Section 7.
2 Markov switching models
For an extensive introduction to and discussion of Markov switching models, we refer to
the textbook treatment of Frühwirth-Schnatter (2006). Here, we provide a quick overview
of the basics of Markov switching models and establish the main notation. Let {yt , t =
1, · · · , T } denote a time series of T univariate observations. We consider the time series
{yt , t = 1, · · · , T } as a subset of a stochastic process {yt }t∈Z . The probability distribution
of the stochastic process yt depends on the realizations of a hidden discrete stochastic
process zt . The stochastic process yt is directly observable, whereas zt is a latent random
variable that is observable only indirectly through the effect it has on the realizations of
yt . The hidden process {zt }t∈Z is assumed to be an irreducible and aperiodic Markov chain
with finite state space {0, · · · , K − 1}. Its stochastic properties are sufficiently described
by the (K × K) transition matrix, Π, where each element πij of Π is equal to the transition
probability from state i to state j
$$\pi_{ij} = P[z_t = j \mid z_{t-1} = i], \qquad \forall i, j \in \{0, \dots, K-1\}. \tag{1}$$
All elements of Π are nonnegative and the elements of each row sum to 1, i.e., $\pi_{ij} \ge 0$
for all $i, j \in \{0, \dots, K-1\}$ and $\sum_{j=0}^{K-1} \pi_{ij} = 1$ for all $i = 0, \dots, K-1$.
Let p( · |θ, ψ) be a parametric conditional density indexed by parameters θ ∈ Θ and
ψ ∈ Ψ. We assume that the random variables y1 , · · · , yT are conditionally independent
given z1 , · · · , zT , with densities
$$y_t \mid (z_t = i) \sim p(\,\cdot \mid \theta_i, \psi), \tag{2}$$
with regime dependent parameter θi , and regime independent parameter ψ. Starting from
the doubly stochastic process {zt , yt }t∈Z , the density of yt conditional on the information
available at time t − 1, denoted by It−1 , is
$$p(y_t \mid \psi, I_{t-1}) = \sum_{i=0}^{K-1} p(y_t \mid \theta_i, \psi)\, P(z_t = i \mid \psi, I_{t-1}), \tag{3}$$
where both ψ and θ0 , . . . , θK−1 need to be estimated.
Note that the conditional mean of yt given zt and It−1 may contain lags of yt itself.
Francq and Roussignol (1998) and Francq and Zakoïan (2001) derive the conditions for the
existence of an ergodic and stationary solution for the general class of Markov switching
ARMA models. In particular, they show that global stationarity of yt does not require the
stationarity conditions within each regime separately.
As an example, consider a continuous variable yt with conditional density
$$p(\,\cdot \mid z_t) = N\big((1 - z_t)\mu_0 + z_t \mu_1,\ \sigma^2\big), \tag{4}$$
where µ0 and µ1 are static regime-dependent means, and σ² is the common variance.
The latent process {zt }t∈Z is driven by the transition probability matrix Π
$$\Pi = \begin{pmatrix} \pi_{00} & 1 - \pi_{00} \\ 1 - \pi_{11} & \pi_{11} \end{pmatrix}, \tag{5}$$
where the transition probabilities satisfy 0 < π00 , π11 < 1. We now have θi = µi for i = 0, 1, and
ψ = (σ², π00 , π11 )′.
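As a minimal sketch of the example in (4) and (5), the following Python code simulates the latent chain and the observations. The function name `simulate_ms`, the seed, the parameter values, and the choice to draw the first state from the stationary distribution of the chain are illustrative assumptions, not part of the paper.

```python
import numpy as np

def simulate_ms(T, mu0, mu1, sigma2, pi00, pi11, seed=0):
    """Simulate (z_t, y_t) from the two-state Gaussian model in (4)-(5)."""
    rng = np.random.default_rng(seed)
    # Row i of Pi holds P[z_t = j | z_{t-1} = i]; each row sums to one.
    Pi = np.array([[pi00, 1.0 - pi00],
                   [1.0 - pi11, pi11]])
    mu = np.array([mu0, mu1])
    z = np.empty(T, dtype=int)
    # Assumption: the first state is drawn from the stationary distribution.
    p_state0 = (1.0 - pi11) / (2.0 - pi00 - pi11)
    z[0] = 0 if rng.random() < p_state0 else 1
    for t in range(1, T):
        z[t] = rng.choice(2, p=Pi[z[t - 1]])
    # Regime-dependent mean plus common-variance Gaussian noise.
    y = mu[z] + np.sqrt(sigma2) * rng.standard_normal(T)
    return z, y, Pi

z, y, Pi = simulate_ms(T=500, mu0=-1.0, mu1=1.0, sigma2=0.5,
                       pi00=0.95, pi11=0.85)
```

Only `y` would be available to the econometrician; `z` is kept here so that filtered regime probabilities can later be compared with the true states.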
In order to evaluate equation (3), we require the quantities P(zt = i|ψ, It−1 ) for all t.
We can compute these efficiently using the recursive filtering approach of Hamilton (1989).
Assuming we have an expression for the filtered probability P(zt−1 = i|ψ, It−1 ), we can
obtain the predictive probabilities P(zt = i|ψ, It−1 ) as
$$P(z_t = i \mid \psi, I_{t-1}) = \sum_{k=0}^{K-1} \pi_{ki} \cdot P(z_{t-1} = k \mid \psi, I_{t-1}). \tag{6}$$
Hence, the conditional density of yt given It−1 is given by
$$p(y_t \mid \psi, I_{t-1}) = \sum_{i=0}^{K-1} \sum_{k=0}^{K-1} p(y_t \mid \theta_i, \psi) \cdot \pi_{ki} \cdot P(z_{t-1} = k \mid \psi, I_{t-1}). \tag{7}$$
The previous formula can be rewritten more compactly in matrix notation, which also gives the explicit recursion for the
filtered probabilities. Denote by ξt−1|t−1 the K-dimensional
vector that collects the filtered probabilities P(zt−1 = i|ψ, It−1 ) at time t − 1 and by ηt the
K-dimensional vector that collects the densities p(yt |θi , ψ) at time t for i = 0, . . . , K − 1.
Then the filtered probabilities ξt|t are updated by the Hamilton recursion
$$\xi_{t|t} = \frac{(\Pi' \xi_{t-1|t-1}) \odot \eta_t}{\xi_{t-1|t-1}' \Pi\, \eta_t},$$
where ⊙ denotes the Hadamard product and the denominator is
equivalent to (7) written in matrix notation,
$$p(y_t \mid \psi, I_{t-1}) = \xi_{t-1|t-1}' \Pi\, \eta_t. \tag{8}$$
The filter needs to be started from an appropriate set of initial filtered probabilities P(z0 =
i|ψ, I0 ). Using the above Hamilton filter, the log-likelihood function can be evaluated for
any parameter vector $(\theta_0', \dots, \theta_{K-1}', \psi')'$. The likelihood can then be optimized numerically
using a Newton or quasi Newton algorithm. In order to avoid local maxima in nonlinear
time series models such as the above, it is useful to work with different starting values for
the numerical optimizations.
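The recursion in (6) to (8) can be sketched in a few lines of Python for Gaussian regimes with a common variance, as in example (4). The function name `hamilton_filter`, the demo data, and the initial probabilities are assumptions made for illustration only.

```python
import numpy as np

def hamilton_filter(y, mu, sigma2, Pi, xi0):
    """Return the log-likelihood and the filtered probabilities xi_{t|t}."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    xi = np.asarray(xi0, dtype=float)           # xi_{0|0}, initial probabilities
    filtered = np.empty((y.size, mu.size))
    loglik = 0.0
    for t, yt in enumerate(y):
        pred = Pi.T @ xi                         # predictive probabilities, eq. (6)
        eta = np.exp(-0.5 * (yt - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)
        joint = pred * eta                       # Hadamard product in the numerator
        lik = joint.sum()                        # p(y_t | psi, I_{t-1}), eq. (8)
        xi = joint / lik                         # filtered probabilities xi_{t|t}
        filtered[t] = xi
        loglik += np.log(lik)
    return loglik, filtered

# Hypothetical inputs: five observations and the stationary distribution as xi_{0|0}.
y_demo = np.array([-1.2, -0.8, 1.1, 0.9, -1.0])
Pi_demo = np.array([[0.95, 0.05], [0.15, 0.85]])
loglik, xi_filtered = hamilton_filter(y_demo, [-1.0, 1.0], 0.5, Pi_demo, [0.75, 0.25])
```

In an estimation routine, the negative of `loglik` would be handed to a numerical optimizer and the optimization repeated from several starting values, as suggested above.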
Once the whole sample {y1 , · · · , yT } is observed, we are often also interested in making
inference about the regime probabilities conditioned on all information available at time T ,
i.e., P(zt = i|ψ, IT ). These probabilities are known as the smoothed regime probabilities.
We can compute them efficiently using the algorithm of Kim (1994).
Diagnostic checking in Markov regime switching models is always complicated. This is
due to the fact that the true residuals are unobservable because of the latent variable zt .
The typical approach followed in the literature is to use generalized residuals instead. The
generalized residual was introduced by Gourieroux et al. (1987) in the context of latent
variable models. It has been used in the context of Markov regime-switching models by
Turner et al. (1989), Gray (1996), Maheu and McCurdy (2000), and Kim et al. (2004).
Given the filtered regime probabilities P(zt = i|ψ, It−1 ), for i = 0, · · · , K − 1, let µi
and σi2 denote the conditional mean and the conditional variance of yt in regime i. The
standardized generalized residuals are now defined as
$$e_t = \sum_{i=0}^{K-1} \frac{y_t - \mu_i}{\sigma_i}\, P(z_t = i \mid \psi, I_{t-1}). \tag{9}$$
Smith (2008) considered another type of residual for Markov-switching models based on
the transformation proposed by Rosenblatt (1952),
$$\tilde{e}_t = \Phi^{-1}\!\left( \sum_{i=0}^{K-1} P(z_t = i \mid \psi, I_{t-1})\, \Phi\big(\sigma_i^{-1}(y_t - \mu_i)\big) \right), \tag{10}$$
where Φ denotes the cumulative distribution function of a standard normal with inverse
Φ−1 . If yt is actually generated by the distribution implied by the Markov switching model,
then the Rosenblatt residuals ẽt are standard-normally distributed. As shown by Smith
(2008) in an extensive Monte Carlo study, Ljung-Box tests based on the Rosenblatt transformation have good finite-sample properties as a diagnostic device for Markov switching
models.
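Both residual definitions can be computed directly from the probabilities P(zt = i|ψ, It−1 ). The sketch below assumes Gaussian regimes; the function name `ms_residuals`, the array layout, and the clipping safeguard before applying the inverse normal CDF are illustrative assumptions.

```python
from statistics import NormalDist
import numpy as np

def ms_residuals(y, pred_prob, mu, sigma):
    """Generalized residuals (9) and Rosenblatt residuals (10).

    pred_prob[t, i] holds P(z_t = i | psi, I_{t-1}).
    """
    y = np.asarray(y, dtype=float)[:, None]                  # shape (T, 1)
    pred_prob = np.asarray(pred_prob, dtype=float)           # shape (T, K)
    z = (y - np.asarray(mu, dtype=float)) / np.asarray(sigma, dtype=float)
    e = (pred_prob * z).sum(axis=1)                          # eq. (9)
    std = NormalDist()
    u = (pred_prob * np.vectorize(std.cdf)(z)).sum(axis=1)   # mixture CDF at y_t
    u = np.clip(u, 1e-12, 1.0 - 1e-12)                       # numerical safeguard only
    e_tilde = np.array([std.inv_cdf(v) for v in u])          # eq. (10)
    return e, e_tilde

# Hypothetical single observation that is known to come from regime 0.
e, e_tilde = ms_residuals([0.5], [[1.0, 0.0]], mu=[0.0, 1.0], sigma=[1.0, 2.0])
```

When one regime has probability one, both residuals reduce to the usual standardized residual of that regime, which gives a quick sanity check on an implementation.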
3 The MS-GAS model
Until now we considered the transition matrix Π as constant over time. Diebold et al.
(1994) and Filardo (1994) propose to model a dynamic matrix Πt for the transition probabilities, where the elements of Πt are functions of past values of the dependent variable yt
or other exogenous variables. The Hamilton filter and Kim smoother can easily be generalized to handle the case of time-varying Πt . A key challenge is to formulate an appropriate
functional specification linking the lagged values of the dependent variable to future transition probabilities. In this paper we follow the Generalized Autoregressive Score (GAS)
model of Creal et al. (2013) to specify the dynamics of Πt . In this section, we discuss the
precise model formulation and show that it has a highly intuitive interpretation, combining
in an efficient way the likelihood information in each of the separate regimes p( · |θi , ψ). In
Section 4 we then discuss the stationarity and ergodicity properties of the new model.
3.1 The Generalized Autoregressive Score model
Up to now, the static parameter vector ψ contained parameters describing the transition
probabilities as well as other parameters capturing the shape of the conditional distributions p(yt |ψ, It−1 ). With a slight abuse of notation, we single out the dynamic parameters ft that we use to capture the dynamic transition probabilities, and use ψ ∗ to
gather all remaining static parameters in the model, as well as some new static parameters needed to describe the dynamics of ft . For example, in the two-state example of
Section 2 we may choose ft to have two elements, namely logit(π00,t ) and logit(π11,t ), with
logit(π00,t ) = log(π00,t ) − log(1 − π00,t ), and log( · ) denoting the natural logarithm. From
now, we write the conditional observation density as p(yt |ft , ψ ∗ , It−1 ).
In the GAS framework of Creal et al. (2013), we model dynamic parameters by exploiting the information contained in the score of the conditional observation density
p(yt |ft , ψ ∗ , It−1 ) with respect to ft ; see also Harvey (2013). Our main departure in this
paper from the regular GAS model framework is that the conditional observation density
itself is a mixture over the states of an unobserved dynamic process zt . Therefore, the shape of our conditional observation density as given by equation (3) is somewhat more involved than
usual.
The GAS mechanism for updating the time-varying parameter ft is given by
$$f_{t+1} = \omega + A s_t + B f_t, \qquad s_t = S_t \cdot \nabla_t, \qquad \nabla_t = \frac{\partial}{\partial f_t} \log p(y_t \mid f_t, \psi^*, I_{t-1}), \tag{11}$$
where ω is a vector of constants, A and B are matrices, while st equals the derivative of the
log conditional observation density log p(yt |ft , ψ ∗ , It−1 ) with respect to ft , i.e. the ‘score’
∇t , scaled by the matrix St . The GAS mechanism thus takes a steepest ascent or Newton
type step in ft , using the log conditional density at time t as its criterion function. An
interesting choice for St , as recognized by Creal et al. (2013), is given by considering the
square root of the inverse Fisher information matrix to account for the curvature
of ∇t as a function of ft . Under correct specification, this results in GAS steps st that have
unit variance.
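As a worked illustration of the mechanism in (11), consider the textbook case yt | ft ∼ N(0, ft ) with a scalar time-varying variance ft . This example is a standard one from the GAS literature and is not part of the regime switching model of this paper; the symbol $\mathcal{I}_{t-1}$ for the Fisher information is notation assumed here. With inverse information scaling the recursion reduces to GARCH(1,1), and with square root inverse information scaling the scaled score has unit variance.

```latex
\begin{align*}
\log p(y_t \mid f_t) &= -\tfrac{1}{2}\log(2\pi f_t) - \frac{y_t^2}{2 f_t}, \\
\nabla_t &= \frac{y_t^2 - f_t}{2 f_t^2}, \qquad
\mathcal{I}_{t-1} = E_{t-1}\big[\nabla_t^2\big] = \frac{1}{2 f_t^2}, \\
S_t = \mathcal{I}_{t-1}^{-1}: \quad s_t &= y_t^2 - f_t, \qquad
f_{t+1} = \omega + A\, y_t^2 + (B - A)\, f_t, \\
S_t = \mathcal{I}_{t-1}^{-1/2}: \quad s_t &= \frac{y_t^2 - f_t}{\sqrt{2}\, f_t}, \qquad
\operatorname{Var}_{t-1}(s_t) = 1 .
\end{align*}
```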
3.2 GAS dynamics for transition probabilities: 2 states
To facilitate the discussion of the model, we first consider the two-state Markov regime
switching model, K = 2. To ensure that the transition probabilities πij,t , i, j = 0, 1 vary
between 0 and 1, we use the logit transformation fij,t = logit(πij,t ) and set ft = (f00,t , f11,t )0 .
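To keep the model parsimonious, we use diagonal specifications for the matrices A and B
in (11),
$$\begin{pmatrix} f_{00,t+1} \\ f_{11,t+1} \end{pmatrix} = \begin{pmatrix} \omega_{00} \\ \omega_{11} \end{pmatrix} + \begin{pmatrix} B_{00} & 0 \\ 0 & B_{11} \end{pmatrix} \begin{pmatrix} f_{00,t} \\ f_{11,t} \end{pmatrix} + \begin{pmatrix} A_{00} & 0 \\ 0 & A_{11} \end{pmatrix} \mathcal{I}_{t-1}^{-0.5} \nabla_t, \tag{12}$$
where $\mathcal{I}_{t-1}$ is the 2 × 2 Fisher information matrix, and ∇t is the 2 × 1 gradient vector of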
the log conditional observation density with respect to ft . The assumption of diagonal A
and B matrices can, of course, easily be relaxed.
For the conditional density in (7), we obtain
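$$\nabla_t = \frac{p(y_t \mid \theta_0, \psi^*) - p(y_t \mid \theta_1, \psi^*)}{p(y_t \mid \psi^*, I_{t-1})}\; g(f_t, \psi^*, I_{t-1}), \tag{13}$$
$$g(f_t, \psi^*, I_{t-1}) = \begin{pmatrix} P[z_{t-1} = 0 \mid \psi^*, I_{t-1}] \cdot \pi_{00,t}(1 - \pi_{00,t}) \\ -P[z_{t-1} = 1 \mid \psi^*, I_{t-1}] \cdot \pi_{11,t}(1 - \pi_{11,t}) \end{pmatrix}. \tag{14}$$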
This expression has a highly intuitive form. The first factor in (13) is the difference in
the likelihood of yt given zt = 0 versus zt = 1. The difference is scaled by the total
likelihood of the observation given all the static parameters. If the likelihood of yt given
zt = 0 is relatively large compared to that for zt = 1, we expect f00,t to rise and f11,t to
decrease. This is precisely what happens. The magnitudes of the steps are determined by
the conditional probabilities of being in regime zt−1 = 0 or zt−1 = 1, respectively, at time
t − 1. The remaining factors πii,t (1 − πii,t ) for i = 0, 1 are due to the logit parameterization.
In particular, we note that if we are almost sure of being in regime zt−1 = 0 at time t − 1,
i.e., P[zt−1 = 0|ψ ∗ , It−1 ] ≈ 1, then we take a large step for f00,t , and almost no step for
f11,t . Obviously, if we are certain of being in regime zt−1 = 0, we can only learn about
π00,t by observing yt . The converse holds if we are almost sure of being in regime zt−1 = 1
at time t − 1, in which case we can only learn about f11,t . The weighting by the filtered
probabilities in the vector g(ft , ψ ∗ , It−1 ) in (13) automatically takes care of this.
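The pieces above can be combined into one filtering pass. The sketch below implements the score in (13) and (14) and the diagonal recursion in (12) for the Gaussian example (4), but it uses unit scaling (St equal to the identity) instead of the Fisher information scaling discussed in the text. That simplification, the function names, and all numerical values are assumptions made for illustration.

```python
import numpy as np

def norm_pdf(y, mu, sigma2):
    return np.exp(-0.5 * (y - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

def logistic(f):
    return 1.0 / (1.0 + np.exp(-f))

def score(f, eta, xi_prev):
    """Score (13): likelihood contrast times the vector g in (14)."""
    pi00, pi11 = logistic(f)
    Pi = np.array([[pi00, 1.0 - pi00], [1.0 - pi11, pi11]])
    p = xi_prev @ Pi @ eta                       # p(y_t | psi*, I_{t-1}), eq. (8)
    g = np.array([xi_prev[0] * pi00 * (1.0 - pi00),
                  -xi_prev[1] * pi11 * (1.0 - pi11)])
    return (eta[0] - eta[1]) / p * g

def ms_gas_filter(y, mu, sigma2, omega, A, B, f1, xi0):
    """Filter f_t and the regime probabilities; unit scaling is an assumption."""
    mu = np.asarray(mu, dtype=float)
    f = np.asarray(f1, dtype=float)
    xi = np.asarray(xi0, dtype=float)
    f_path, loglik = np.empty((len(y), 2)), 0.0
    for t, yt in enumerate(y):
        f_path[t] = f
        pi00, pi11 = logistic(f)
        Pi = np.array([[pi00, 1.0 - pi00], [1.0 - pi11, pi11]])
        eta = norm_pdf(yt, mu, sigma2)
        grad = score(f, eta, xi)                 # uses xi_{t-1|t-1} and f_t
        joint = (Pi.T @ xi) * eta
        loglik += np.log(joint.sum())
        xi = joint / joint.sum()                 # Hamilton update with Pi_t
        f = omega + A * grad + B * f             # diagonal A and B stored as vectors
    return loglik, logistic(f_path)              # columns: pi_{00,t}, pi_{11,t}

# Hypothetical parameter values and data.
y_demo = np.array([-1.2, -0.8, 1.1, 0.9, -1.0])
omega, A, B = np.array([0.2, 0.1]), np.array([0.5, 0.5]), np.array([0.9, 0.9])
loglik, pi_path = ms_gas_filter(y_demo, [-1.0, 1.0], 0.5, omega, A, B,
                                f1=omega / (1.0 - B), xi0=[0.5, 0.5])
```

Checking the analytic score against a finite-difference derivative of the log predictive density is a cheap way to catch sign errors in g.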
Apart from the singularity of the information about either f00,t or f11,t when P[zt−1 =
0|ψ ∗ , It−1 ] moves close to either 1 or 0, the conditional Fisher information matrix based on
(13) is singular by design. This follows directly by noting that the vector g(ft , ψ ∗ , It−1 ) on
the right-hand side of (13) is It−1 -measurable. Therefore, we scale the score by a square
root Moore-Penrose pseudo-inverse¹ of the conditional Fisher information matrix, resulting
in the scaled score
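$$s_t = \frac{\dfrac{p(y_t \mid \theta_0, \psi^*) - p(y_t \mid \theta_1, \psi^*)}{p(y_t \mid \psi^*, I_{t-1})}}{\sqrt{\displaystyle\int_{-\infty}^{\infty} \frac{\big[p(y_t \mid \theta_0, \psi^*) - p(y_t \mid \theta_1, \psi^*)\big]^2}{p(y_t \mid \psi^*, I_{t-1})}\, dy_t}} \cdot \frac{g(f_t, \psi^*, I_{t-1})}{\big\| g(f_t, \psi^*, I_{t-1}) \big\|}, \tag{15}$$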
where the integral has no closed form in general and needs to be computed numerically,
for example using Gauss-Hermite quadrature methods.
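One way to evaluate the integral is to write the expectation under the predictive mixture density as a weighted sum of Gaussian expectations and to apply Gauss-Hermite quadrature to each. The sketch below does this for the Gaussian example (4); the function names, the number of nodes, and the numerical values are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def norm_pdf(y, mu, sigma2):
    return np.exp(-0.5 * (y - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

def contrast_second_moment(w0, mu, sigma2, n_nodes=80):
    """E[((p0 - p1) / p)^2] under the predictive mixture p = w0*p0 + (1 - w0)*p1."""
    x, w = hermgauss(n_nodes)                    # nodes and weights for exp(-x^2)
    total = 0.0
    for weight, m in zip((w0, 1.0 - w0), mu):
        y = m + np.sqrt(2.0 * sigma2) * x        # change of variables for N(m, sigma2)
        p0, p1 = norm_pdf(y, mu[0], sigma2), norm_pdf(y, mu[1], sigma2)
        h = ((p0 - p1) / (w0 * p0 + (1.0 - w0) * p1)) ** 2
        total += weight * (w @ h) / np.sqrt(np.pi)
    return total

def scaled_score(yt, xi_prev, pi00, pi11, mu, sigma2):
    """Standardized likelihood contrast times the unit vector g / ||g||."""
    w0 = xi_prev[0] * pi00 + xi_prev[1] * (1.0 - pi11)   # P(z_t = 0 | I_{t-1})
    p0, p1 = norm_pdf(yt, mu[0], sigma2), norm_pdf(yt, mu[1], sigma2)
    contrast = (p0 - p1) / (w0 * p0 + (1.0 - w0) * p1)
    g = np.array([xi_prev[0] * pi00 * (1.0 - pi00),
                  -xi_prev[1] * pi11 * (1.0 - pi11)])
    scale = np.sqrt(contrast_second_moment(w0, mu, sigma2))
    return contrast / scale * g / np.linalg.norm(g)

# Hypothetical inputs.
s_t = scaled_score(-0.7, np.array([0.8, 0.2]), 0.9, 0.8, (-1.0, 1.0), 0.5)
```

The quadrature is applied once per time point because the mixture weight changes with the filtered probabilities and the current transition probabilities.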
¹ If $x \in \mathbb{R}^n$ is a vector, then the Moore-Penrose pseudo-inverse of $xx'$ is given by $\|x\|^{-4} xx'$, and
its square root by $\|x\|^{-3} xx'$, as $\|x\|^{-3} xx' \, \|x\|^{-3} xx' = \|x\|^{-4} xx'$. As g(ft , ψ ∗ , It−1 ) is It−1 -measurable,
scaling the score by the square root Moore-Penrose pseudo-inverse of the conditional Fisher information
matrix yields an expression proportional to $\|g\|^{-3}\, g g' g = \|g\|^{-1} g$, where $g = g(f_t, \psi^*, I_{t-1})$.

The approach using the analytic expression for the Moore-Penrose pseudo-inverse can
be generalized to accommodate more than 2 regimes. Typically, however, the expressions
become more involved and both analytically and computationally cumbersome. As an
alternative to the analytic approach in (15) we can also work with a numerical pseudoinverse, such as a Tikhonov regularized inverse
$$\mathcal{I}_{t-1}^{*1/2} = \big( \lambda I + (1 - \lambda)\, \mathcal{I}_{t-1} \big)^{-1/2}, \tag{16}$$
with I the unit matrix and 0 < λ < 1 a fixed scalar. For λ → 0 we have a pure square
root inverse information scaling, while for λ → 1 we obtain unit matrix scaling. In our
applications below, we estimate λ jointly with ψ ∗ using maximum likelihood. We get some
further flexibility by introducing a diagonal matrix Λ with λ1 and λ2 on the diagonal and
$$\mathcal{I}_{t-1}^{**1/2} = \big( \Lambda + (I - \Lambda)^{1/2}\, \mathcal{I}_{t-1}\, (I - \Lambda)^{1/2} \big)^{-1}. \tag{17}$$
We estimate λ1 and λ2 by maximum likelihood again, with 0 < λ1 , λ2 < 1.
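A regularized scaling matrix of the form (16) can be computed with a symmetric eigendecomposition, which is well defined even when the Fisher information itself is singular. The function name `tikhonov_scaling` and the rank-one information matrix used in the example are assumptions for illustration.

```python
import numpy as np

def tikhonov_scaling(info, lam):
    """Return (lam * I + (1 - lam) * info)^(-1/2) for a symmetric PSD matrix info."""
    info = np.asarray(info, dtype=float)
    reg = lam * np.eye(info.shape[0]) + (1.0 - lam) * info
    vals, vecs = np.linalg.eigh(reg)             # reg is positive definite for lam > 0
    # V diag(vals^(-1/2)) V': the symmetric inverse square root of reg.
    return (vecs / np.sqrt(vals)) @ vecs.T

# Hypothetical singular information matrix of rank one, proportional to g g'.
g = np.array([0.3, -0.1])
S = tikhonov_scaling(np.outer(g, g), lam=0.2)
```

With `lam` equal to one the function returns the identity, which corresponds to the unit scaling limit mentioned above.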
3.3 GAS dynamics for transition probabilities: K states
The two-regime model is easily generalized to K regimes. To force all transition probabilities to be non-negative and sum to one (row-wise), we use the multinomial logit specification. We set
$$\pi_{ij,t} = \frac{\exp(f_{ij,t})}{1 + \sum_{l=0}^{K-2} \exp(f_{il,t})}, \qquad \forall i = 0, \dots, K-1, \; j = 0, \dots, K-2,$$
$$\pi_{i,K-1,t} = 1 - \sum_{j=0}^{K-2} \pi_{ij,t} = \frac{1}{1 + \sum_{l=0}^{K-2} \exp(f_{il,t})}, \qquad \forall i = 0, \dots, K-1.$$
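The multinomial logit map can be sketched as a row-wise softmax in which the last state of each row serves as the reference category with its exponent fixed at zero. The function name `transition_matrix` and the input layout (a K × (K − 1) array of fij,t ) are assumptions.

```python
import numpy as np

def transition_matrix(f):
    """Map a K x (K-1) array of f_{ij,t} to a K x K row-stochastic matrix Pi_t."""
    f = np.asarray(f, dtype=float)
    # Append a zero column for the reference state K-1 of every row.
    full = np.hstack([f, np.zeros((f.shape[0], 1))])
    full -= full.max(axis=1, keepdims=True)      # numerical stabilisation only
    expf = np.exp(full)
    return expf / expf.sum(axis=1, keepdims=True)

# Hypothetical K = 3 example: all f equal to zero gives uniform rows.
Pi_t = transition_matrix(np.zeros((3, 2)))
```

Subtracting the row maximum does not change the probabilities; it only prevents overflow when some fij,t is large.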
The GAS parameter vector ft now has K(K − 1) free elements. The GAS updating
mechanism in (12) can easily be adapted as follows,
$$f_{t+1} = \omega + B f_t + A\, \mathcal{I}_{t-1}^{-0.5} \nabla_t, \tag{18}$$
where ω is a K(K − 1) vector and the K(K − 1) × K(K − 1) A and B matrices can be
assumed either diagonal or full. Due to the multinomial logit specification for ft , the score,
∇t , for the conditional density for yt in (8) and the Fisher Information matrix, It−1 , are
given by
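$$\nabla_t = J' \nabla_t^{\Pi}, \qquad \mathcal{I}_{t-1} = E\big[ J' \nabla_t^{\Pi} \nabla_t^{\Pi\prime} J \big],$$
where $\nabla_t^{\Pi}$ is the score with respect to the un-transformed parameter vector vec(Πt ) and J
is the Jacobian matrix of all first-order partial derivatives of vec(Πt ) with respect to ft ,
$$J = \frac{\partial\, \mathrm{vec}(\Pi_t)}{\partial f_t'} = \begin{pmatrix} \dfrac{\partial \pi_{00,t}}{\partial f_{00,t}} & \cdots & \dfrac{\partial \pi_{00,t}}{\partial f_{(K-1)(K-2),t}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial \pi_{(K-1)(K-2),t}}{\partial f_{00,t}} & \cdots & \dfrac{\partial \pi_{(K-1)(K-2),t}}{\partial f_{(K-1)(K-2),t}} \end{pmatrix}.$$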
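For the generic element ∂πij,t /∂frc,t the following expression holds:
$$\frac{\partial \pi_{ij,t}}{\partial f_{rc,t}} = \begin{cases} \dfrac{\exp(f_{rc,t}) \big( 1 + \sum_{l} \exp(f_{rl,t}) \big) - [\exp(f_{rc,t})]^2}{\big( 1 + \sum_{l} \exp(f_{rl,t}) \big)^2} & \text{for } i = r \wedge j = c, \\[2ex] -\dfrac{\exp(f_{rj,t}) \exp(f_{rc,t})}{\big( 1 + \sum_{l} \exp(f_{rl,t}) \big)^2} & \text{for } i = r \wedge j \neq c, \\[2ex] 0 & \text{otherwise,} \end{cases}$$
where the sums run over the K − 1 free elements of row r.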
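In terms of the probabilities themselves, the non-zero derivatives are πij,t (1 − πij,t ) on the diagonal and −πij,t πic,t off the diagonal, so the block of J belonging to one row of Πt has a simple closed form. The sketch below computes that block; the function names and the example values are assumptions.

```python
import numpy as np

def row_probs(f_row):
    """Free probabilities pi_{i0}, ..., pi_{i,K-2} of one row of Pi_t."""
    expf = np.exp(np.asarray(f_row, dtype=float))
    return expf / (1.0 + expf.sum())

def row_jacobian(f_row):
    """Matrix with entries d pi_{ij} / d f_{ic} for j, c = 0, ..., K-2."""
    pi = row_probs(f_row)
    # Diagonal: pi_j (1 - pi_j); off-diagonal: -pi_j pi_c.
    return np.diag(pi) - np.outer(pi, pi)

f_row = np.array([0.4, -1.2, 2.0])               # hypothetical row with K = 4 states
J_row = row_jacobian(f_row)
```

Because elements of ft in different rows do not interact, the full Jacobian is block diagonal with one such block per row of Πt .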
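The generic element in the score $\nabla_t^{\Pi}$ is given by
$$\frac{\partial \log p(y_t \mid \psi^*, I_{t-1})}{\partial \pi_{ij,t}} = \frac{1}{p(y_t \mid \psi^*, I_{t-1})}\; \xi_{t-1|t-1}'\, \frac{\partial \Pi_t}{\partial \pi_{ij,t}}\, \eta_t. \tag{19}$$

4 Stationarity and ergodicity of the GAS process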
To characterize the dynamic properties of our Markov switching model with GAS dynamics
for the transition probabilities, we build on Blasques et al. (2012). Blasques et al. analyze GAS processes using the stationarity and ergodicity (SE) conditions formulated by
Straumann and Mikosch (2006) for general stochastic recurrence equations. A stochastic
recurrence equation for ft takes the form
$$f_{t+1} = \varphi_t(f_t; \psi^*), \qquad \forall t \in \mathbb{Z},$$
where ϕt is a random function. This clearly embeds the GAS model in equation (11) if we
set
ϕt (ft ; ψ ∗ ) = ω + A st (ft ; ψ ∗ ) + B ft .
The following lemma states sufficient conditions for the process {ϕt (ψ ∗ , f1 )}t∈N initialized at f1 ∈ R to converge almost surely and exponentially fast to a limit stationary process
{f˜t (ψ ∗ )}t∈Z that has nf bounded moments.
Lemma 1. Let {yt }t∈Z be SE and for every ψ ∗ ∈ Ψ∗ assume that there exists a non-random
f1 ∈ R such that
(i) $E \log^+ |s(y_t, f_t, \psi^*)| < \infty$;
(ii) $E \ln \sup_f \left| A\, \dfrac{\partial s(y_t, f_t; \psi^*)}{\partial f} + B \right| < 0$.
Then {ϕt (ψ ∗ , f1 )}t∈N converges exponentially fast and almost surely to an SE process {f˜t (ψ ∗ )}t∈Z
for every ψ ∗ ∈ Ψ∗ , i.e., $|f_t(\psi^*, f_1) - \tilde{f}_t(\psi^*)| \xrightarrow{e.a.s.} 0$ as t → ∞ ∀ψ ∗ ∈ Ψ∗ . If also
(iii) $E |s(y_t, f_t, \psi^*)|^{n_f} < \infty$ and
(iv) $\sup_{(y,f)} \left| A\, \dfrac{\partial s(y, f, \psi^*)}{\partial f} + B \right| < 1$,
then $\sup_t E|f_t(\psi^*, f_1)|^{n_f} < \infty$ and $E|\tilde{f}_t(\psi^*)|^{n_f} < \infty$ ∀ψ ∗ ∈ Ψ∗ .
Proof. See Proposition 3 in Blasques et al. (2012).
Under the regularity conditions offered by Lemma 1(ii), we can derive all combinations
of A and B for which we can ensure that {ft } is asymptotically SE. We call this the SE
region. In addition, we can use the more restrictive counterpart of (ii), i.e., condition (iv)
to derive regions where we can ensure the existence of moments.
For the sake of simplicity, we consider the model introduced in Example 1 but where only
π00,t is assumed to be time varying, so that the corresponding GAS process ft is univariate and A and
B are scalars. The score is scaled with the square root of the Fisher information plus the
extra parameter k. The marginal density for yt is rewritten as a function of ft directly,
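$$\begin{aligned} f(y_t \mid \psi^*, I_{t-1}) = {} & \frac{e^{f_t}}{1 + e^{f_t}}\, P(S_{t-1} = 0 \mid \psi^*, I_{t-1}) \big[ f(y_t \mid z_t = 0, \psi^*, I_{t-1}) - f(y_t \mid z_t = 1, \psi^*, I_{t-1}) \big] \\ & + \pi_{11} \big( P(S_{t-1} = 0 \mid \psi^*, I_{t-1}) - 1 \big) \big[ f(y_t \mid z_t = 0, \psi^*, I_{t-1}) - f(y_t \mid z_t = 1, \psi^*, I_{t-1}) \big] \\ & + \big( 1 - P(S_{t-1} = 0 \mid \psi^*, I_{t-1}) \big) f(y_t \mid z_t = 0, \psi^*, I_{t-1}) + P(S_{t-1} = 0 \mid \psi^*, I_{t-1})\, f(y_t \mid z_t = 1, \psi^*, I_{t-1}). \end{aligned}$$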
The stability condition for ft is given by
$$E_{y_t} \sup_f \left| \frac{\partial s(y_t, f, \psi^*)}{\partial f} \right| \le \frac{1 - |B|}{|A|},$$
where the notation s(yt , f, ψ ∗ ) indicates that, in the model considered in Example 1, the score does
not depend on past values of the dependent variable yt . Since the expectation has no closed form, we
evaluate it numerically for different values of k and P(St−1 = 0|ψ ∗ , It−1 ). We keep
µ0 = −1, µ1 = 1, σ 2 = 0.5 and π11 = 0.85 fixed. Figure 1 shows the SE
region in (A, B) when P(St−1 = 0|ψ ∗ , It−1 ) = 0.05, 0.5, 1 and k = 0, 0.5, 1. Since the score
∇t becomes very small in absolute value when P(St−1 = 0|ψ ∗ , It−1 ) is close to 0, the SE
region in (A, B) is quite large and the value of k does not play an important role. When
P(St−1 = 0|ψ ∗ , It−1 ) is equal to 0.5 or 1, the SE region in (A, B) is small and the role of
k becomes clear. In both cases, when we rely fully on the Fisher information (k = 0), we
obtain the largest SE region, which becomes smaller as k increases. However, even in the
extreme situation where k = 1, the SE region is not degenerate. In the multivariate case, if k = 0,
the Fisher information matrix does not admit an inverse and the resulting SE region
is degenerate.
[Figure 1: three panels, labelled St−1 = 0.05, St−1 = 0.5, and St−1 = 0.95, each with A on the horizontal axis (−60 to 60) and B on the vertical axis (−1 to 1).]
Figure 1: Stability region of parameters A and B for different values of P(St−1 = 0|ψ ∗ , It−1 )
and k. (black line k = 0, red line k = 0.5, green line k = 1 )
Table 1: Simulation patterns for π00,t and π11,t

    Model        π00,t                      π11,t
1.  Constant     0.95                       0.85
2.  Break        0.95 − 0.9 · 1{t<T/2}      0.05 + 0.9 · 1{t<T/2}
3.  Slow Sine    0.5 + 0.45 cos(4πt/T)      0.5 − 0.45 cos(4πt/T)
4.  Sine         0.5 + 0.45 cos(8πt/T)      0.5 − 0.45 cos(8πt/T)
5.  Fast Sine    0.5 + 0.45 cos(20πt/T)     0.5 − 0.45 cos(20πt/T)

5 Monte Carlo Study
To investigate the performance of the GAS model for time-varying transition probabilities, we consider a Monte Carlo study for the two regime model from equation (4), with
common variance σ 2 = 0.5 and means µ0 = −1 and µ1 = 1. We consider 5 different forms
of time-variation for the transition probabilities π00,t and π11,t , which we call our Data
Generating Processes (DGPs). The patterns are summarized in Table 1. They range
from a constant set of transition probabilities, via an incidental structural break halfway
through the sample, to slowly and rapidly changing continuous patterns for the transition probabilities. In
this way, we can investigate the robustness of the GAS Markov switching models under a
variety of circumstances.
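The five patterns in Table 1 and the corresponding data generating process can be sketched as follows. The function names, the seed, and the choice of regime 0 as the state preceding the first observation are assumptions; the means and variance are those stated above.

```python
import numpy as np

def transition_patterns(T):
    """Return {name: (pi00_t, pi11_t)} for t = 1, ..., T, following Table 1."""
    t = np.arange(1, T + 1)
    early = (t < T / 2).astype(float)            # indicator 1{t < T/2}
    patterns = {
        "Constant": (np.full(T, 0.95), np.full(T, 0.85)),
        "Break": (0.95 - 0.9 * early, 0.05 + 0.9 * early),
    }
    for name, freq in [("Slow Sine", 4), ("Sine", 8), ("Fast Sine", 20)]:
        c = 0.45 * np.cos(freq * np.pi * t / T)
        patterns[name] = (0.5 + c, 0.5 - c)
    return patterns

def simulate_tv(pi00, pi11, mu=(-1.0, 1.0), sigma2=0.5, seed=0):
    """Simulate (z_t, y_t) from model (4) with time-varying stay probabilities."""
    rng = np.random.default_rng(seed)
    T = pi00.size
    z = np.empty(T, dtype=int)
    z_prev = 0                                   # assumption: chain starts in regime 0
    for t in range(T):
        stay = pi00[t] if z_prev == 0 else pi11[t]
        z[t] = z_prev if rng.random() < stay else 1 - z_prev
        z_prev = z[t]
    y = np.asarray(mu)[z] + np.sqrt(sigma2) * rng.standard_normal(T)
    return z, y

patterns = transition_patterns(250)
z_sim, y_sim = simulate_tv(*patterns["Sine"])
```

All patterns keep the transition probabilities between 0.05 and 0.95, so both regimes are visited in every design.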
We consider three different sample sizes: T = 250, 500, 1000 and both types of Tikhonov
regularizations introduced in equations (16) and (17). For each data generating process and
each sample size, we compare the transition probability estimates for a model with constant
transition probabilities and a model with GAS dynamics for the transition probabilities.
We report the averages over 100 simulations of the Mean (over the sample) Squared
Error (MSE) and Mean Absolute Error (MAE) in Table 2. Bold entries indicate the
minimum MSE or MAE, row-wise.
The first thing to note in Table 2 is that the MS-GAS model is successful in filtering
out the time-varying transition probabilities. Except for the DGP with constant transition
probabilities, the MSE and MAE of the MS-Constant filtered probabilities π̂00,t and π̂11,t
are always higher than their MS-GAS filtered counterparts. Using the more flexible form
of the regularized Tikhonov inverse $\mathcal{I}^{**}_{t-1}$ typically has a slight advantage in terms of MSE
and MAE compared to the use of $\mathcal{I}^{*}_{t-1}$. The difference, however, is small compared to the
improvement vis-à-vis the model with constant transition probabilities (Const).
For increasing sample sizes, both the MSE and MAE generally decrease for the MS-GAS models, whereas this is not the case for the MS-Constant model. This suggests that
the MS-GAS filter is better at tracking the dynamics of the transition probabilities if these
occur at a slower pace. Note that the sinusoid patterns have the same number of swings
over the entire sample. Therefore, the change in the transition
probabilities per unit of time is smaller for larger T. The overall inaccuracy of the model
with constant probabilities (Const), however, remains rather constant in the sample size.
The improvement compared to the constant model is substantial: in terms of MSE, the
MS-GAS model is, for example, better by a factor of 5 to 10 for the sine and slow sine
patterns.
As shown in the bottom part of Table 2, the MS-GAS is also better in general than the
constant model for filtering out the smoothed unobserved state process zt . This holds both
in terms of MSE and MAE over a range of sample sizes. The differences, however, are less
pronounced than for the transition probabilities π00,t and π11,t themselves. This is due to
the fact that estimation of the state zt is harder, and the potential errors are much larger,
Table 2: Simulation results: MSE and MAE for π00,t , π11,t , and smoothed zt

                            MSE                           MAE
DGP         T       Const   GAS It∗   GAS It∗∗    Const   GAS It∗   GAS It∗∗

Results for π00,t
Constant    250     0.017   0.077     0.083       0.017   0.103     0.108
Constant    500     0.000   0.026     0.022       0.013   0.054     0.049
Constant    1000    0.000   0.011     0.013       0.008   0.034     0.036
Break       250     0.447   0.244     0.251       0.495   0.312     0.316
Break       500     0.449   0.161     0.172       0.497   0.235     0.243
Break       1000    0.451   0.199     0.217       0.499   0.260     0.275
Slow Sine   250     0.175   0.058     0.052       0.344   0.195     0.181
Slow Sine   500     0.203   0.041     0.030       0.368   0.156     0.136
Slow Sine   1000    0.198   0.024     0.021       0.363   0.123     0.113
Sine        250     0.175   0.074     0.069       0.342   0.223     0.210
Sine        500     0.187   0.059     0.052       0.354   0.191     0.178
Sine        1000    0.197   0.035     0.031       0.362   0.148     0.136
Fast Sine   250     0.162   0.110     0.108       0.331   0.272     0.265
Fast Sine   500     0.172   0.074     0.065       0.339   0.226     0.206
Fast Sine   1000    0.193   0.052     0.044       0.358   0.188     0.167

Results for π11,t
Constant    250     0.005   0.091     0.129       0.053   0.209     0.252
Constant    500     0.002   0.030     0.039       0.032   0.106     0.136
Constant    1000    0.001   0.018     0.037       0.022   0.083     0.123
Break       250     0.446   0.180     0.173       0.493   0.240     0.235
Break       500     0.449   0.126     0.138       0.496   0.185     0.195
Break       1000    0.451   0.199     0.217       0.499   0.212     0.224
Slow Sine   250     0.186   0.059     0.055       0.352   0.193     0.184
Slow Sine   500     0.208   0.037     0.032       0.372   0.151     0.141
Slow Sine   1000    0.201   0.025     0.023       0.365   0.121     0.144
Sine        250     0.182   0.081     0.077       0.348   0.229     0.222
Sine        500     0.194   0.063     0.052       0.360   0.197     0.173
Sine        1000    0.200   0.036     0.031       0.364   0.149     0.135
Fast Sine   250     0.155   0.103     0.104       0.326   0.264     0.261
Fast Sine   500     0.175   0.080     0.073       0.342   0.233     0.221
Fast Sine   1000    0.195   0.053     0.047       0.360   0.188     0.173

Results for zt
Constant    250     0.000   0.000     0.000       0.000   0.037     0.042
Constant    500     0.056   0.028     0.024       0.072   0.048     0.043
Constant    1000    0.016   0.028     0.028       0.033   0.046     0.047
Break       250     0.067   0.062     0.061       0.069   0.070     0.071
Break       500     0.069   0.051     0.036       0.070   0.050     0.051
Break       1000    0.060   0.049     0.032       0.068   0.049     0.048
Slow Sine   250     0.067   0.052     0.052       0.111   0.099     0.098
Slow Sine   500     0.089   0.049     0.046       0.130   0.093     0.090
Slow Sine   1000    0.089   0.046     0.045       0.132   0.088     0.087
Sine        250     0.063   0.055     0.053       0.109   0.100     0.097
Sine        500     0.085   0.067     0.066       0.131   0.113     0.111
Sine        1000    0.076   0.048     0.048       0.120   0.093     0.092
Fast Sine   250     0.055   0.055     0.058       0.104   0.103     0.104
Fast Sine   500     0.057   0.051     0.050       0.106   0.099     0.097
Fast Sine   1000    0.073   0.050     0.049       0.121   0.098     0.096
[Figure 2 panels: "Original series" (left) and "Transformed series" (right), both plotted against Time, 1920 to 2000.]
Figure 2: Deseasonalized U.S. Industrial Production index (left) and its log differences
(right, scaled by 100)
because zt = 0 or zt = 1 by construction. All results, however, point to the conclusion that
the MS-GAS model can result in substantial increases in model fit compared to a model
with constant probabilities, even if the GAS dynamics are not the actual DGP dynamics
(as was the case for the structural break and sinusoid patterns above).
6
Empirical application: U.S. Industrial Production
Given the usefulness of the MS-GAS model in a controlled environment, we now turn to
an application of the model to empirical data. We consider the deseasonalized monthly
time series of the U.S. Industrial Production index. We obtain the data from the Federal
Reserve Bank of St. Louis (FRED) website. The sample spans the period January 1919
to October 2013, amounting to T = 1137 observations. We define yt as the monthly log
differences of the original series and scale the series by 100 to denote percentages. Figure
2 presents the original index as well as its log differences.
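The transformation from index levels to percentage log growth rates can be sketched as below. The function name and the four index values are hypothetical; the actual series would be the deseasonalized index downloaded from FRED.

```python
import numpy as np

def log_growth_percent(index_levels):
    """Monthly log differences of an index, scaled by 100 to denote percentages."""
    levels = np.asarray(index_levels, dtype=float)
    return 100.0 * np.diff(np.log(levels))

# Hypothetical index values, used only to show the mechanics.
levels = np.array([100.0, 101.0, 99.0, 102.0])
y = log_growth_percent(levels)
print(y)
```

Differencing drops one observation, so a series of n index levels yields n - 1 growth rates.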
Following the approach of Doornik (2013) for post-war U.S. GDP data, we model the
monthly Industrial Production log growth rates by a Markov switching mean-variance
component model with three regimes for the mean and two for the variance. The mean
is modeled as a constant plus 3 autoregressive lag terms, which appears adequate for the
current series. The choice of three regimes for the mean intends to distinguish between
recessions, normal times, and growth phases of the economy. The two variance regimes
allow for a low and high volatility phase, respectively.
The total of six regimes produces a model that is rich in parameters. In order to have
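some parsimony, we follow Doornik (2013) and impose the restriction $\Pi_t = \Pi^{\mu} \otimes \Pi^{\sigma}_t$ with
\[
\Pi^{\mu} =
\begin{pmatrix}
\pi^{\mu}_{00} & \pi^{\mu}_{01} & 1 - \pi^{\mu}_{00} - \pi^{\mu}_{01} \\
\pi^{\mu}_{10} & \pi^{\mu}_{11} & 1 - \pi^{\mu}_{10} - \pi^{\mu}_{11} \\
\pi^{\mu}_{20} & \pi^{\mu}_{21} & 1 - \pi^{\mu}_{20} - \pi^{\mu}_{21}
\end{pmatrix},
\qquad
\Pi^{\sigma}_t =
\begin{pmatrix}
\pi^{\sigma}_{00,t} & 1 - \pi^{\sigma}_{00,t} \\
1 - \pi^{\sigma}_{11,t} & \pi^{\sigma}_{11,t}
\end{pmatrix},
\]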
and ⊗ denoting the Kronecker product. Denote by $\{z^{\mu}_t\}_{t \in \mathbb{Z}}$ and $\{z^{\sigma}_t\}_{t \in \mathbb{Z}}$ the hidden
processes that determine the mean and the variance for the density of $y_t$, respectively.
Given $(z^{\mu}_4, z^{\sigma}_4), \ldots, (z^{\mu}_T, z^{\sigma}_T)$, the random variables $y_4, \ldots, y_T$ are no longer conditionally
independent, as the conditional mean $\mu_{m,t}$ in their densities
\[
y_t \mid (z^{\mu}_t = m, z^{\sigma}_t = s, I_{t-1}) \sim N(\mu_{m,t}, \sigma^2_s), \qquad m = 0, 1, 2, \quad s = 0, 1,
\]
depends on the first three lags of the dependent variable, i.e.,
\[
\mu_{m,t} = \gamma_{0,m} + \gamma_{1,m} y_{t-1} + \gamma_{2,m} y_{t-2} + \gamma_{3,m} y_{t-3}. \tag{20}
\]
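The Kronecker restriction on the transition matrix introduced above can be made concrete with a small sketch. All probabilities below are hypothetical values chosen for illustration, and the helper name `variance_transition` is not from the paper.

```python
import numpy as np

def variance_transition(pi00, pi11):
    """2 x 2 transition matrix of the variance regimes (rows sum to one)."""
    return np.array([[pi00, 1.0 - pi00],
                     [1.0 - pi11, pi11]])

# Hypothetical transition probabilities for the three mean regimes.
Pi_mu = np.array([[0.90, 0.05, 0.05],
                  [0.02, 0.85, 0.13],
                  [0.10, 0.55, 0.35]])
Pi_sigma = variance_transition(0.98, 0.95)

# Joint 6 x 6 matrix; state (m, s) sits at index 2 * m + s.
Pi = np.kron(Pi_mu, Pi_sigma)

print(Pi.shape)        # six joint regimes
print(Pi.sum(axis=1))  # every row is a probability distribution
```

An unrestricted 6 x 6 transition matrix has 6 x 5 = 30 free probabilities, whereas the product form has 6 for the mean regimes plus 2 for the variance regimes, which is the parsimony the restriction is meant to deliver.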
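We can write the conditional density for $y_t$ in (8) as
\[
p(y_t \mid \psi^*, I_{t-1}) =
\begin{pmatrix}
P[z^{\mu}_{t-1} = 0, z^{\sigma}_{t-1} = 0 \mid \psi^*, I_{t-1}] \\
P[z^{\mu}_{t-1} = 0, z^{\sigma}_{t-1} = 1 \mid \psi^*, I_{t-1}] \\
\vdots \\
P[z^{\mu}_{t-1} = 2, z^{\sigma}_{t-1} = 1 \mid \psi^*, I_{t-1}]
\end{pmatrix}'
(\Pi^{\mu} \otimes \Pi^{\sigma}_t)
\begin{pmatrix}
p(y_t; \mu_{0,t}, \sigma^2_0, I_{t-1}) \\
p(y_t; \mu_{0,t}, \sigma^2_1, I_{t-1}) \\
\vdots \\
p(y_t; \mu_{2,t}, \sigma^2_1, I_{t-1})
\end{pmatrix}
= \xi^{*\prime}_{t-1|t-1} (\Pi^{\mu} \otimes \Pi^{\sigma}_t) \, \eta^*_t. \tag{21}
\]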
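A minimal numerical sketch of the predictive density in (21) is given below. The regime-specific intercepts, autoregressive coefficients, and variances are the MS-Constant estimates reported in Table 4 and in the text; the transition probabilities, the filtered probabilities `xi_prev`, and the lagged observations are hypothetical, and the function names are illustrative.

```python
import numpy as np

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def regime_means(gamma, y_lags):
    """Conditional means as in (20); each row of gamma holds
    (gamma_0, gamma_1, gamma_2, gamma_3) for one mean regime,
    and y_lags = (y_{t-1}, y_{t-2}, y_{t-3})."""
    return gamma[:, 0] + gamma[:, 1:] @ np.asarray(y_lags, dtype=float)

def predictive_density(y_t, xi_prev, Pi_mu, Pi_sigma, means, variances):
    """xi' (Pi_mu kron Pi_sigma) eta with states ordered (0,0), (0,1), ..., (2,1)."""
    Pi = np.kron(Pi_mu, Pi_sigma)
    eta = np.array([normal_pdf(y_t, m, v) for m in means for v in variances])
    return xi_prev @ Pi @ eta

# Reported MS-Constant estimates (rows: mean regimes i = 0, 1, 2).
gamma = np.array([[0.076, 0.316, 0.212, 0.105],
                  [-0.212, 1.121, -0.569, 0.076],
                  [0.846, -0.609, -0.395, 0.039]])
variances = np.array([0.336, 5.578])

# Hypothetical transition and filtered probabilities.
Pi_mu = np.array([[0.90, 0.05, 0.05],
                  [0.02, 0.85, 0.13],
                  [0.10, 0.55, 0.35]])
Pi_sigma = np.array([[0.98, 0.02],
                     [0.05, 0.95]])
xi_prev = np.full(6, 1.0 / 6.0)

means = regime_means(gamma, y_lags=(0.5, 0.2, -0.1))
p = predictive_density(0.3, xi_prev, Pi_mu, Pi_sigma, means, variances)
print(p)
```

The predictive density is a six-component normal mixture whose weights are the one-step-ahead regime probabilities, so it integrates to one for any admissible choice of the inputs.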
The model is estimated by numerically maximizing the likelihood function. We consider two
benchmark models. In the first model, the transition probabilities for the variance regimes
are constant over time (MS-Constant). In the second model (MS-TVP), we allow the
transition probabilities for the variance regimes to vary over time as a function of the lagged
dependent variable. Using the logit transformation to ensure that the probabilities lie
in the unit interval, the specification for the transformed parameter $f^{\sigma}_{jj,t}$, $j = 0, 1$, is
\[
f^{\sigma}_{jj,t} = \omega^*_j + A^*_j y_{t-1}, \qquad
\pi^{\sigma}_{jj,t} = \frac{\exp(f^{\sigma}_{jj,t})}{1 + \exp(f^{\sigma}_{jj,t})}, \tag{22}
\]
where the evolution over time for $\pi^{\sigma}_{jj,t}$ depends only on the lagged $y_{t-1}$. We compare both
models to the MS-GAS model introduced in this paper. We note that the current parameterization is different from the general K-regime Markov switching model introduced in
Section 3. This causes slightly different expressions for the score and scaling matrix. After
defining $f^{\sigma}_t = (f^{\sigma}_{00,t}, f^{\sigma}_{11,t})'$, we use diagonal specifications for the matrices A and B in (11)
and the GAS updating mechanism given by (12). The 2 × 1 score $\nabla_t$ for the conditional
density for $y_t$ in (21) is given by
\[
\nabla_t =
\begin{pmatrix}
\pi^{\sigma}_{00,t} (1 - \pi^{\sigma}_{00,t}) & 0 \\
0 & \pi^{\sigma}_{11,t} (1 - \pi^{\sigma}_{11,t})
\end{pmatrix}
\frac{\partial \log p(y_t \mid \psi^*, I_{t-1})}{\partial (\pi^{\sigma}_{00,t}, \pi^{\sigma}_{11,t})'}, \tag{23}
\]
where, similarly to (19), the generic element of the score with respect to the untransformed
parameter vector $(\pi_{00,t}, \pi_{11,t})'$ is given by
\[
\frac{\partial \log p(y_t \mid \psi^*, I_{t-1})}{\partial \pi_{jj,t}}
= \frac{1}{p(y_t \mid \psi^*, I_{t-1})} \,
\xi^{*\prime}_{t-1|t-1} \left( \Pi^{\mu} \otimes \frac{\partial \Pi^{\sigma}_t}{\partial \pi_{jj,t}} \right) \eta^*_t,
\]
for $j = 0, 1$ and where $\xi^*_{t-1|t-1}$ and $\eta^*_t$ are defined in (21). To compute the Fisher Information,
given by $E[\nabla_t \nabla_t']$, we have to evaluate 4 numerical integrals for every time t and trial
parameter value $\psi^*$. Table 3 provides summary statistics of the model fit and diagnostic
tests based on the generalized and Rosenblatt residuals introduced in Section 2. Allowing
the transition probabilities of the variance regimes to vary over time yields a substantial
Table 3: Model fit and diagnostics
JB indicates the p-value for the Jarque-Bera normality test. LB and LBsq denote the p-values for
Ljung-Box statistics for the residuals and squared residuals, respectively. All models are estimated
with an AR(3) plus intercept component for the mean.

                         MS-Constant   MS-TVP    MS-GAS
LogLik                   -1642.3       -1635.6   -1624.2
AICc                     3329.5        3320.2    3305.9
N par                    22            24        28
Generalized residuals
LB(6)                    0.771         0.857     0.811
LBsq(6)                  0.556         0.738     0.626
JB                       0.065         0.100     0.023
Rosenblatt residuals
LB(6)                    0.408         0.551     0.584
LBsq(6)                  0.648         0.675     0.472
JB                       0.011         0.031     0.002
increase in the log-likelihood. The MS-TVP model shows an increase of almost 7 points by
adding 2 parameters compared to the MS-Constant model. The MS-GAS model shows a
larger increase of about 18 points at the cost of 6 additional parameters. If we consider the
AICc criterion, the MS-GAS model appears preferable, followed by the MS-TVP model.
The corrected Akaike Information Criterion (AICc) adds a stricter finite sample penalty to
the regular AIC of Akaike (1973, 1974). The AICc was first introduced by Hurvich and Tsai
(1991) and further supported by theoretical and simulation based evidence by McQuarrie
and Tsai (1998) and Burnham and Anderson (2002). The Ljung-Box tests indicate that the
dynamics of the model are well specified. The normality of the generalized and Rosenblatt
residuals, however, seems to be amenable to further model improvements, for example,
using non-normal regime dependent distributions.
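As a check on the model comparison, the AICc can be recomputed from the log-likelihoods and parameter counts in Table 3. The sketch below uses the standard small-sample correction AICc = AIC + 2k(k + 1)/(n - k - 1) and assumes n = 1137, the sample size stated earlier; the effective estimation sample may be slightly smaller because of the three autoregressive lags.

```python
def aicc(log_lik, n_params, n_obs):
    """Corrected AIC: the usual AIC plus a finite-sample penalty term."""
    aic = 2.0 * n_params - 2.0 * log_lik
    return aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)

n_obs = 1137  # assumption: the T reported in the text

# (log-likelihood, number of parameters) as reported in Table 3.
models = {
    "MS-Constant": (-1642.3, 22),
    "MS-TVP": (-1635.6, 24),
    "MS-GAS": (-1624.2, 28),
}

results = {name: aicc(ll, k, n_obs) for name, (ll, k) in models.items()}
for name, value in results.items():
    print(f"{name}: {value:.1f}")
```

Under this assumption the recomputed values agree with the reported AICc entries to within about 0.1, and the ranking (MS-GAS, then MS-TVP, then MS-Constant) is unchanged.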
Table 4 shows the parameter estimates. We note that the differences in the parameters
γk,i describing the conditional means are small across models with and without time-varying transition probabilities. For all models, we see that regime i = 1 corresponds to
the negative growth regime, while i = 2 corresponds to the boom. We also note that the
Table 4: Parameter estimates
$\gamma_{0,i}$ for i = 0, 1, 2 are the intercepts in the conditional mean shown in (20) under the three different mean
regimes, while $\gamma_{1,i}, \ldots, \gamma_{3,i}$ are the 3 autoregressive parameters. $\pi^{\mu}_{ij}$ and $\pi^{\sigma}_{jj}$ are the transition probabilities
in the transition matrix for the mean regimes and for the variance regimes, respectively. The transition
matrix for the mean regimes is assumed to be constant over time for all models, while the transition
matrix of the variance regimes is assumed constant for the MS-Constant model. For the MS-TVP model,
the transition probabilities are modeled according to (22) and $\pi^{\sigma}_{jj} = \mathrm{logit}^{-1}(\omega^*_{jj})$. For the MS-GAS
model, they are modeled according to (12) and (23) and $\pi^{\sigma}_{jj}$ is the unconditional mean of the transformed
stationary process $f_{jj,t}$, i.e. $\mathrm{logit}^{-1}(\omega_{jj}/(1 - B_{jj}))$. For the MS-TVP model, $A_{jj}$ corresponds to $A^*_{jj}$ in
(22) while for the MS-GAS model $A_{jj}$ and $B_{jj}$ are given in (12). $\lambda_j$, j = 1, 2, are the parameters in the
Tikhonov regularized inverse in (17) of the conditional Fisher information matrix. All models are estimated
by numerically maximizing the log-likelihood. Standard errors are reported between parentheses.

               MS-Constant                   MS-TVP                        MS-GAS
         i,j=0    i,j=1    i,j=2      i,j=0    i,j=1    i,j=2      i,j=0    i,j=1    i,j=2
γ0,i     0.076    -0.212   0.846      0.070    -0.297   0.820      0.084    -0.172   0.822
         (0.033)  (0.176)  (0.166)    (0.036)  (0.142)  (0.170)    (0.038)  (0.129)  (0.178)
γ1,i     0.316    1.121    -0.609     0.321    1.075    -0.645     0.286    1.117    -0.631
         (0.030)  (0.072)  (0.134)    (0.054)  (0.085)  (0.123)    (0.050)  (0.063)  (0.121)
γ2,i     0.212    -0.569   -0.395     0.206    -0.490   -0.399     0.237    -0.580   -0.489
         (0.022)  (0.057)  (0.123)    (0.045)  (0.118)  (0.146)    (0.042)  (0.069)  (0.079)
γ3,i     0.105    0.076    0.039      0.108    0.017    0.038      0.109    0.072    0.133
         (0.030)  (0.020)  (0.118)    (0.041)  (0.080)  (0.138)    (0.034)  (0.073)  (0.069)
µ
πi0
0.909
(0.021)
0.016
(0.025)
0.111
(0.177)
0.858
(0.125)
σj2
0.336
(0.022)
σ
πjj
0.980
(0.005)
µ
πi1
Ajj
0.578
(0.120)
0.055
(0.027)
0.911
(0.031)
0.010
(0.011)
0.071
(0.072)
0.904
(0.065)
5.578
(0.567)
0.347
(0.026)
0.947
(0.014)
0.901
(0.041)
0.029
(0.022)
0.152
(0.085)
0.822
(0.080)
5.698
(0.672)
0.311
(0.027)
5.483
(0.646)
0.986
(0.008)
0.935
(0.022)
0.875
(0.104)
0.743
(0.287)
1.608
(0.429)
-0.034
(0.088)
1.410
(0.914)
0.999
(0.002)
0.999
(0.027)
1.275
(0.198)
0.998
(0.002)
0.911
(0.431)
Bjj
λj
0.574
(0.136)
0.062
(0.051)
0.606
(0.152)
0.059
(0.069)
extreme booms are very short lived, whereas normal growth regimes (i = 0) are much
more persistent. The negative growth regimes provide an intermediate case in terms of
persistence.
Looking at the transition probabilities for the mean, again the results are similar across
the different models, albeit that the MS-TVP model exhibits a larger persistence in that
$\pi^{\mu}_{00}$ and $\pi^{\mu}_{11}$ are larger. For all models, the extreme growth regime is not persistent, with
transition probability $\pi^{\mu}_{22} = 1 - \pi^{\mu}_{20} - \pi^{\mu}_{21} \approx 0.35$.
For the variance regimes, the models distinguish between a low (0.336) and a high
(5.578) variance regime. The magnitudes of these variances are again comparable across the
different models, MS-Constant, MS-TVP and MS-GAS. In the model with static transition
probabilities (MS-Constant), both variance regimes are highly persistent, with probabilities
$\pi^{\sigma}_{00}$ and $\pi^{\sigma}_{11}$ both close to 1.
The MS-TVP model is not able to capture the dynamics in the transition probabilities. Indeed,
the parameter $A^*_1$ is not significant (estimate −0.034 with standard error 0.088). This is also reflected
in the plot of the smoothed MS-TVP probabilities in the lower panels of Figure 3. The
smoothed probability estimate of the low volatility regime is highly erratic, whereas that
for the high volatility regime hardly moves away from the unconditional estimate for MS-Constant.
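A short sketch of the logit specification in (22) shows why such a small coefficient implies almost no movement in the transition probability. The slope is the reported estimate of −0.034; the intercept is a hypothetical value chosen so that the probability equals 0.95 when the lagged growth rate is zero, and the grid of lagged values is illustrative.

```python
import numpy as np

def logistic(x):
    """Inverse logit, mapping the real line to the unit interval."""
    return 1.0 / (1.0 + np.exp(-x))

def tvp_probability(y_lag, omega, a):
    """Transition probability under (22): logistic(omega + a * y_{t-1})."""
    return logistic(omega + a * np.asarray(y_lag, dtype=float))

a_hat = -0.034                # reported estimate of A*_1
omega = np.log(0.95 / 0.05)   # hypothetical intercept (probability 0.95 at y = 0)
y_lag = np.array([-5.0, 0.0, 5.0])  # illustrative monthly growth rates in percent

probs = tvp_probability(y_lag, omega, a_hat)
print(probs)
```

Even a swing of ten percentage points in the lagged growth rate moves the probability by less than two percentage points under these assumptions, which is consistent with a nearly flat path around the constant-probability estimate.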
The filtered probabilities for the MS-GAS model show an entirely different pattern.
Both the low and high volatility transition probabilities evolve gradually over time. In
particular, the persistence of the low volatility regime appears to have gone up over time,
with values around 0.7 in the early part of the sample, and values close to 1 in the second
half of the sample. The converse holds for the high volatility regime. The persistence probability
$\pi^{\sigma}_{11}$ is close to 1 up to the 1940s. After that, the probability decreases substantially
to values around 0.5, and slowly rises towards the end of the sample again. The pattern for
the filtered probabilities is entirely in line with the empirical pattern in the data in Figure
2. In the earlier part of the sample, high volatility levels are predominant. Towards the
middle of the sample, large volatilities are incidental and short-lived, whereas towards the
end of the sample during the years of the financial crisis, U.S. debt ceiling crisis, and the
European sovereign debt crisis, higher volatility levels appear to cluster again somewhat
more.
The empirical patterns are also corroborated by the parameter estimates in Table 4. In
particular, the parameters Bj are both close to 1, suggesting that the dynamic transition
probabilities evolve gradually over time. Given the high values of λj , the model appears to
[Figure 3 panels: "Smoothed Recession Mean Regime" (with NBER recession phases) and "Smoothed High Variance Regime" in the top row; "Filtered Low Variance probability" and "Filtered High Variance probability" in the bottom row. Each panel plots the Constant, TVP, and GAS models against Time.]
Figure 3: Smoothed regime probabilities and transition probabilities for the variance
regimes
favor almost unit scaling. Both Aj parameters have the correct sign and result in parameter
changes that increase the local fit in terms of the likelihood.
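The claim that a positive A moves the parameters in a direction that improves the local fit can be illustrated with a simplified sketch. It keeps only the two variance regimes (so the mean-regime matrix is dropped), uses unit scaling, and assumes the standard GAS(1,1) recursion f_{t+1} = ω + A s_t + B f_t for the updating equation referred to as (12). The variances are the reported 0.336 and 5.578; the observation, the filtered probabilities, and the values of ω, A, and B are hypothetical.

```python
import numpy as np

def logistic(f):
    return 1.0 / (1.0 + np.exp(-f))

def normal_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def transition_matrix(f):
    """Variance-regime transition matrix implied by f = (f00, f11)."""
    pi00, pi11 = logistic(f)
    return np.array([[pi00, 1.0 - pi00], [1.0 - pi11, pi11]])

def log_pred_density(f, xi_prev, eta):
    return np.log(xi_prev @ transition_matrix(f) @ eta)

def score(f, xi_prev, eta):
    """Score with respect to f: logit chain rule times the derivative in pi."""
    pi = logistic(f)
    p = xi_prev @ transition_matrix(f) @ eta
    d_pi00 = np.array([[1.0, -1.0], [0.0, 0.0]])  # derivative of the matrix in pi00
    d_pi11 = np.array([[0.0, 0.0], [-1.0, 1.0]])  # derivative of the matrix in pi11
    grad_pi = np.array([xi_prev @ d_pi00 @ eta, xi_prev @ d_pi11 @ eta]) / p
    return pi * (1.0 - pi) * grad_pi

def gas_update(f, s, omega, A, B):
    """Assumed GAS(1,1) recursion with diagonal A and B stored as vectors."""
    return omega + A * s + B * f

variances = np.array([0.336, 5.578])   # reported low and high variance
y_t = 4.0                              # hypothetical large observation
eta = normal_pdf(y_t, 0.0, variances)  # regime densities with zero mean (simplification)
xi_prev = np.array([0.8, 0.2])         # hypothetical filtered probabilities
f = np.log(np.array([0.9, 0.9]) / 0.1) # both persistence probabilities equal 0.9

s = score(f, xi_prev, eta)
B = np.array([0.99, 0.99])             # hypothetical, close to one
A = np.array([0.5, 0.5])               # hypothetical, positive
omega = (1.0 - B) * f                  # makes f the long-run level of the recursion
f_next = gas_update(f, s, omega, A, B)

print(s)
print(logistic(f), logistic(f_next))
```

After a large observation the score lowers the persistence of the low volatility regime and raises that of the high volatility regime, and the updated parameters give a higher predictive log density for the same observation.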
Finally, we plot the smoothed estimates of zt in the top panels of Figure 3. The figure
also shows the NBER business cycle classifications, with gray areas indicating recessions.
We see that all models result in higher smoothed recession probabilities in the NBER
classified periods. The models with time-varying transition probabilities for
the variance regimes (MS-TVP and MS-GAS) typically do a better job than the static
model (MS-Constant). We also see the smoothed probability of the high variance regime.
Most of the high variance regime is located in the first half of the sample. A second clear
high variance episode is during the financial crisis, with the intermediate period being low
volatility. We also note that some, but definitely not all NBER recessions correspond to
a period of high volatility. This confirms our current set-up with separate regimes for the
(conditional) means and for the variances.
7
Conclusion
We introduced a new methodology for time-varying transition probabilities in Markov
switching models. We used the score of the predictive likelihood and the Generalized
Autoregressive Score (GAS) framework of Creal et al. (2013) to drive the dynamics of the
transition probabilities over time. The corresponding dynamics could be easily interpreted
and included all the information embedded in the conditional (on the unobserved state)
observation densities, as well as in the filtered probabilities of which (unobserved) state
was applicable at any moment in time. We discussed several regularized or pseudo inverses
to remedy the singularity in the Fisher information matrix and found these to work well
in simulated and empirical samples.
Through an extensive Monte Carlo study, we showed that the new MS-GAS model adequately estimates the dynamic pattern in transition probabilities, even if the GAS dynamics
themselves are mis-specified. Both for deterministic structural breaks and deterministic
sinusoid patterns, the MS-GAS model yields a large improvement in model fit compared to
a model with constant transition probabilities.
In our real data application, we showed that the model is also useful for analyzing
actual empirical data. Using log differences of the deseasonalized monthly Industrial Production index, we found that the MS-GAS model outperformed both the Markov
switching model with constant probabilities, and the Markov switching model where the
transition probabilities were standard logistic functions dependent on the past value of
the dependent variable. In particular, the patterns filtered by our MS-GAS model were
easily interpretable, with higher (lower) persistence for high (low) volatility regimes in the
beginning of the sample compared to the later part of the sample. We conclude that the
MS-GAS model can provide a useful benchmark in settings where transition probabilities
in a regime switching model may be time-varying.
References
Akaike, H., 1973. Maximum likelihood identification of Gaussian autoregressive moving
average models. Biometrika 60 (2), 255–265.
Akaike, H., 1974. A new look at the statistical model identification. Automatic Control,
IEEE Transactions on 19 (6), 716–723.
Blasques, F., Koopman, S. J., Lucas, A., 2012. Stationarity and ergodicity of univariate
generalized autoregressive score processes. Tech. rep., Tinbergen Institute.
Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of
Econometrics 31 (3), 307–327.
Burnham, K. P., Anderson, D. R., 2002. Model selection and multi-model inference: a
practical information-theoretic approach. Springer.
Cox, D. R., 1981. Statistical analysis of time series: some recent developments. Scandinavian Journal of Statistics 8, 93–115.
Creal, D., Koopman, S., Lucas, A., 2014a. Combining state-space and gas dynamics. Working Paper, VU University Amsterdam.
Creal, D., Koopman, S. J., Lucas, A., 2011. A Dynamic Multivariate Heavy-Tailed Model
for Time-Varying Volatilities and Correlations. Journal of Business & Economic Statistics 29 (4), 552–563.
Creal, D., Koopman, S. J., Lucas, A., 2013. Generalized autoregressive score models with
applications. Journal of Applied Econometrics 28 (5), 777–795.
Creal, D., Koopman, S. J., Lucas, A., Schwaab, B., 2014b. Observation driven mixed-measurement dynamic factor models. Review of Economics and Statistics, forthcoming.
De Lira Salvatierra, I., Patton, A. J., 2013. Dynamic copula models and high frequency
data. Duke University Discussion Paper.
Delle Monache, D., Petrella, I., 2014. A score driven approach for gaussian state-space
models with time-varying parameter. Working Paper, Imperial College London.
Diebold, F., Lee, J., Weinbach, G., 1994. Regime Switching with Time-Varying Transition Probabilities. In: Hargreaves, C. (Ed.), Nonstationary Time Series Analysis and
Cointegration. Oxford University Press, pp. 283–302.
Doornik, J., 2013. A Markov-switching model with component structure for US GNP.
Economics Letters 118 (2), 265–268.
Engle, R., 2002. New Frontiers for ARCH models. Journal of Applied Econometrics 17 (5),
425–446.
Engle, R. F., 1982. Autoregressive conditional heteroscedasticity with estimates of the
variance of United Kingdom inflation. Econometrica 50 (4), 987–1007.
Engle, R. F., Russell, J. R., 1998. Autoregressive Conditional Duration: A New Model for
irregularly Spaced Transaction Data. Econometrica 66 (5), 1127–1162.
Filardo, A. J., 1994. Business-cycle phases and their transitional dynamics. Journal of
Business & Economic Statistics 12 (3), 299–308.
Francq, C., Roussignol, M., 1998. Ergodicity of autoregressive processes with Markov-switching and consistency of the maximum-likelihood estimator. Statistics: A Journal
of Theoretical and Applied Statistics 32 (2), 151–173.
Francq, C., Zakoïan, J.-M., 2001. Stationarity of multivariate Markov-switching ARMA models. Journal of Econometrics 102 (2), 339–364.
Frühwirth-Schnatter, S., 2006. Finite Mixture and Markov Switching Models. Springer.
Gourieroux, C., Monfort, A., Renault, E., Trognon, A., 1987. Generalised residuals. Journal
of Econometrics 34 (1), 5–32.
Gray, S. F., 1996. Modeling the conditional distribution of interest rates as a regime-switching process. Journal of Financial Economics 42 (1), 27–62.
Hamilton, J., 1989. A New Approach to the Economic Analysis of Nonstationary Time
Series and the Business Cycle. Econometrica 57 (2), 357–384.
Harvey, A. C., 2013. Dynamic Models for Volatility and Heavy Tails: With Applications
to Financial and Economic Time Series. Econometric Series Monographs. Cambridge
University Press.
Harvey, A. C., Luati, A., 2014. Filtering with heavy tails. Journal of the American Statistical Association, forthcoming.
Hurvich, C. M., Tsai, C.-L., 1991. Bias of the corrected AIC criterion for underfitted
regression and time series models. Biometrika 78 (3), 499–509.
Kim, C., 1994. Dynamic linear models with Markov-switching. Journal of Econometrics
60 (1), 1–22.
Kim, C.-J., Morley, J. C., Nelson, C. R., 2004. Is there a positive relationship between
stock market volatility and the equity premium? Journal of Money, Credit and Banking
36, 339–360.
Koopman, S. J., Lucas, A., Scharth, M., 2012. Predicting time-varying parameters with
parameter-driven and observation-driven models. Tinbergen Institute Discussion Papers
12-020/4.
Lucas, A., Schwaab, B., Zhang, X., 2014. Measuring credit risk in a large banking system: econometric modeling and empirics. Journal of Business and Economic Statistics,
forthcoming.
Maheu, J. M., McCurdy, T. H., 2000. Identifying bull and bear markets in stock returns.
Journal of Business & Economic Statistics 18 (1), 100–112.
McQuarrie, A. D., Tsai, C.-L., 1998. Regression and time series model selection. Vol. 43.
World Scientific.
Nelson, D. B., 1991. Conditional Heteroskedasticity in Asset Returns: A New Approach.
Econometrica 59 (2), 347–370.
Oh, D. H., Patton, A. J., 2013. Time-varying systemic risk: Evidence from a dynamic
copula model of cds spreads. Duke University Discussion Paper.
Rosenblatt, M., 1952. Remarks on a multivariate transformation. The Annals of Mathematical Statistics 23 (3), 470–472.
Smith, D. R., 2008. Evaluating Specification Tests for Markov-Switching Time-Series Models. Journal of Time Series Analysis 29 (4), 629–652.
Straumann, D., Mikosch, T., 2006. Quasi-maximum-likelihood estimation in conditionally
heteroscedastic time series: a stochastic recurrence equations approach. The Annals of
Statistics 34 (5), 2449–2495.
Turner, C. M., Startz, R., Nelson, C. R., 1989. A Markov model of heteroskedasticity, risk,
and learning in the stock market. Journal of Financial Economics 25 (1), 3–22.