
MULTIVARIATE GAUSSIAN HIDDEN MARKOV MODELS
WITH AN UNKNOWN NUMBER OF REGIMES
Luigi Spezia
Biomathematics & Statistics Scotland
Macaulay Institute
Craigiebuckler, Aberdeen, AB15 8QH, UK
(e-mail: [email protected])
ABSTRACT. Hidden Markov models (HMMs) are generalizations of mixture models, obtained by
adding a latent, or hidden, Markov chain which drives the observed process. Multivariate Gaussian
HMMs with an unknown number of regimes are considered here in the Bayesian setting. Efficient
reversible jump Markov chain Monte Carlo algorithms for estimating both the dimension and the unknown parameters of the model are presented, along with applications to real-world phenomena.
1 INTRODUCTION
We apply Multivariate Gaussian Hidden Markov Models (MVGHMMs) to weakly dependent
non-linear time series to understand the switching of both the observed levels of a multivariate time series and the covariance matrices, by assuming that the levels and the strength of
the interdependencies between the variables can change over time. We also reconstruct the
sequence of the hidden regimes, which represents the evolution of the switching in time of
the means and covariance matrices of the observed phenomena. The reversible jump Markov
chain Monte Carlo (RJMCMC) algorithm we propose for MVGHMMs, instead of exploiting
the conjugacy of the prior distributions to implement Gibbs sampling, updates the parameters
by random walk Metropolis-Hastings moves, because the Gibbs sampler is less able to traverse the posterior surface and to escape local modes, as pointed out by Celeux et al. (2000)
for univariate mixture models. All sweeps of our MCMC algorithms are characterized by the
generation of the parameters with the “absence of completion”, as in Cappé et al. (2003), i.e.
without generating the sequence of the hidden Markov chain.
2 MULTIVARIATE GAUSSIAN HIDDEN MARKOV MODELS
Let {Xt } be a discrete-time, first-order, homogeneous Markov chain on a finite space SX =
{1, . . . , m}; the transition matrix is Γ = [γi,j], where γi,j = P(Xt = j | Xt−1 = i), for any i, j ∈ SX
and for any t = 2, . . . , T , with 0 < γi, j < 1. Let also {Yt } be a sequence of variables taking
values in R p . Only process {Yt } is observable, while {Xt } is latent, i.e. hidden underneath
{Yt }, and we assume that {Yt }, given {Xt }, is a sequence of conditionally independent random
variables, which are multivariate here, i.e. Yt = (Yt,1 , . . . ,Yt,p )0 , for any t = 1, . . . , T , whose
conditional distributions depend on {Xt } only through the contemporary Xt ’s and they are
assumed to be Multivariate Gaussian of order p. Hence, the stochastic process ({Xt } ; {Yt })
can be named MVGHMM and represented by the following equality:
(Yt | Xt = i) = µi + (Et | Xt = i),

where the m processes {Et | Xt = i}, i = 1, . . . , m, are multivariate Gaussian white noise processes of order p, with (Et | Xt = i) ∼ N(p)(0(p); Σi), so that (Yt | Xt = i) ∼ N(p)(µi; Σi), for any regime, or state, i ∈ SX and for any t = 1, . . . , T (N(p)(·; ·) stands for the Multivariate Gaussian, or Normal, distribution of order p and 0(p) for the p-dimensional vector of zeros).
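For concreteness, the generative model just defined can be sketched in Python with NumPy (a hypothetical two-regime, bivariate example; the transition matrix, means, and covariances below are invented purely for illustration):

```python
import numpy as np

# Illustrative values only (m = 2 regimes, p = 2 variables)
Gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8]])           # transition matrix [gamma_{i,j}]
mu = np.array([[0.0, 0.0],
               [3.0, 5.0]])              # regime-specific mean vectors mu_i
Sigma = np.array([np.eye(2),
                  0.5 * np.eye(2)])      # regime-specific covariances Sigma_i

def simulate_mvghmm(T, Gamma, mu, Sigma, rng):
    """Draw (x, y): X_t is a first-order Markov chain on {0, ..., m-1},
    and, given X_t = i, Y_t ~ N_p(mu_i, Sigma_i) independently over t."""
    m = Gamma.shape[0]
    x = np.empty(T, dtype=int)
    x[0] = rng.integers(m)                       # arbitrary initial regime
    for t in range(1, T):
        x[t] = rng.choice(m, p=Gamma[x[t - 1]])  # one step of the chain
    y = np.array([rng.multivariate_normal(mu[i], Sigma[i]) for i in x])
    return x, y

x, y = simulate_mvghmm(200, Gamma, mu, Sigma, np.random.default_rng(0))
```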
The unknown parameters of our MVGHMM are collected in the vector (µ, Σ, Ω, m)′, where
µ is the vector of the m mean vectors µi (µ = (µ1, . . . , µm)′, with µi = (µi,1, . . . , µi,p)′, for any
i = 1, . . . , m) and Σ is the vector of the m covariance matrices Σi (Σ = (Σ1, . . . , Σm)′, with Σi =
[σi,j,k], for any i = 1, . . . , m and for any j, k = 1, . . . , p, i.e. σi,j,k = COV[(Yt,j | Xt = i); (Yt,k | Xt = i)],
for any t = 1, . . . , T and for any i = 1, . . . , m). The matrix Ω = [ωi,j] contains positive quantities which are used to obtain the transition matrix Γ = [γi,j], through the equality
γi,j = ωi,j / ∑_{j=1}^{m} ωi,j ,

with ωi,j > 0, for any i, j ∈ SX (Cappé et al. (2003)). This transformation of the transition
probabilities makes the random walk Metropolis-Hastings moves of the MCMC algorithm
more efficient.
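The reparameterization amounts to a simple row normalization; a minimal sketch (the weight values below are arbitrary):

```python
import numpy as np

def omega_to_gamma(Omega):
    """Map positive weights omega_{i,j} to transition probabilities
    gamma_{i,j} = omega_{i,j} / sum_j omega_{i,j} (row normalization)."""
    Omega = np.asarray(Omega, dtype=float)
    return Omega / Omega.sum(axis=1, keepdims=True)

Gamma = omega_to_gamma([[4.0, 1.0],
                        [1.0, 9.0]])   # arbitrary illustrative weights
```

The gain in efficiency comes from the fact that the ωi,j are only constrained to be positive, so a random walk on log ωi,j is unconstrained, whereas a walk on the γi,j themselves would have to respect the simplex constraint on each row.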
Given that our inferential tool is Bayesian, we introduce the following independent and
relabelling-invariant prior distributions: for any i, j = 1, . . . , m, any µi is Multivariate Normal,
any Σi is Inverted Wishart, any ωi, j is Gamma, and m is Discrete Uniform between 1 and the
maximum number of hidden regimes we admit a-priori. The sequence of the observations
is denoted by yT = (y1 , . . . , yT )0 , with yt = (yt,1 , . . . , yt,p ), for any t = 1, . . . , T ; so, the joint
density of all variables of the model is
p(m, µ, Σ, Ω, yT ) = p(yT | µ, Σ, Ω, m) p(µ | m) p(Σ | m) p(Ω | m) p(m).
3 MARKOV CHAIN MONTE CARLO ALGORITHMS
The vector of unknown parameters (µ, Σ, Ω, m)′ can be estimated in two consecutive steps
through two different MCMC algorithms: the first yields the number of regimes, when it is
a random variable, while the second yields the estimates of the parameters of the multivariate Normals and of the transition matrix, when the number of regimes is
fixed.
In both MCMC algorithms, the simulations from the posterior density are characterized by Metropolis-Hastings moves with no completion, that is, without the generation of
the sequence of the hidden regimes. We update the parameters by
random walk Metropolis-Hastings moves, instead of Gibbs Sampling, because the former is
more able than the latter to traverse the posterior surface and to escape local modes. In any
sweep, all parameters are updated with no completion, in order to improve the precision of
the simulation and to accelerate the convergence of the algorithm, by eliminating unnecessary
randomness from the procedure and reducing the dimension of the parameter space.
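Updating with no completion relies on evaluating the marginal likelihood p(yT | µ, Σ, Ω, m) directly, which the standard scaled forward recursion for HMMs provides. A sketch in Python (the initial distribution `delta` is an assumption of this sketch, as the paper does not specify it; the numerical values are invented):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(y, delta, Gamma, mu, Sigma):
    """Marginal log-likelihood of an MVGHMM via the scaled forward
    recursion; the hidden chain is integrated out, never simulated."""
    T, m = len(y), len(delta)
    # emission densities: b[t, i] = N_p(y_t; mu_i, Sigma_i)
    b = np.column_stack([multivariate_normal.pdf(y, mu[i], Sigma[i])
                         for i in range(m)])
    alpha = delta * b[0]
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for t in range(1, T):
        alpha = (alpha @ Gamma) * b[t]     # one forward step
        c = alpha.sum()
        ll += np.log(c)                    # accumulate the scaling factors
        alpha = alpha / c
    return ll

# toy two-regime example with a uniform initial distribution
mu = np.array([[0.0, 0.0], [3.0, 3.0]])
Sigma = np.array([np.eye(2), np.eye(2)])
Gamma = np.array([[0.9, 0.1], [0.2, 0.8]])
y = np.array([[0.1, -0.2], [2.9, 3.1], [3.2, 2.8]])
ll = log_likelihood(y, np.array([0.5, 0.5]), Gamma, mu, Sigma)
```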
Moreover, the posterior density is left unconstrained, because no artificial identifiability
constraint is selected a priori, and all m! subspaces on which the posterior density is defined
can be visited by the algorithm, about the same number of times, because we conclude any
sweep by permuting the generated values according to the random selection of one of the
m! possible orderings of the regimes (Frühwirth-Schnatter, 2001). This device, called “random permutation sampling”, allows us to improve both the convergence and the mixing of
the algorithm, but, when the number of regimes is fixed, it prevents the direct computation
of the estimates of the parameters. So, a further step is necessary: we have to post-process
the MCMC sample, following the guidelines of Marin et al. (2005). First, we look for the
posterior mode, that is, the values generated in the sweep that maximizes the posterior
density. Then, for any sweep, given all possible permutations of the regimes, we compute the
Euclidean norm between any permuted vector of parameters and the posterior mode; we select that special permutation which minimizes the distance between the reordered vector and
the posterior mode; we apply the selected permutation to the generated values of the sweep
and place them in the reordered MCMC sample. Finally, we can compute the estimates of the
parameters by taking the ergodic averages of the reordered values.
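The relabelling step can be written compactly; a sketch, assuming the regime-specific parameters of each sweep are stacked into an (m × d) array (the names `draws` and `pivot` are hypothetical, and the values are toy numbers):

```python
import numpy as np
from itertools import permutations

def relabel(draws, pivot):
    """For each sweep, apply the permutation of the m regime labels that
    minimizes the Euclidean norm to the posterior-mode draw `pivot`.
    draws: (N, m, d) array -- N sweeps, m regimes, d parameters each."""
    m = draws.shape[1]
    out = np.empty_like(draws)
    for n, draw in enumerate(draws):
        best = min(permutations(range(m)),
                   key=lambda p: np.linalg.norm(draw[list(p)] - pivot))
        out[n] = draw[list(best)]
    return out

# tiny example: two regimes, one parameter each
pivot = np.array([[0.0], [5.0]])            # posterior-mode sweep
draws = np.array([[[5.1], [0.1]],           # label-switched sweep
                  [[0.2], [4.9]]])          # already aligned sweep
fixed = relabel(draws, pivot)
```

Ergodic averages over the reordered array then estimate each regime's parameters.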
On the other hand, when the number of regimes is a random variable and we implement
an RJMCMC algorithm, no post-processing is necessary, because the posterior probabilities
of the different hidden regimes are not influenced by the ordering of the labels of the regimes.
In the RJMCMC algorithm the dimension of the model is changed by split-and-combine and
birth-and-death moves; in the split-and-combine moves moment matching is not required, and
the birth-and-death moves are not limited to the empty components.
4 APPLICATIONS
The methodology can be illustrated through three different examples of non-linear multivariate time series, which are all benchmarks in the theory of HMMs: the duration and the waiting
time of Old Faithful Geyser eruptions (Härdle (1990)); the mortality in London during winter
1958 (Pole et al. (1994)); the dynamics of the United States Real Gross Domestic Product
and Unemployment Rate (Kim and Nelson (1999)). Here the geyser example is summarized; see
Spezia (2008) for details on all three applications.
The geyser data set contains 274 observations of two variables: Yt,1 is the duration (in
minutes) of the eruption of the Old Faithful geyser in Yellowstone National Park (Figure 1a)
and Yt,2 is the waiting time (in minutes) to the next blow out (Figure 1b).
First, we want to estimate the number of hidden regimes, so we ran the RJMCMC algorithm for 500,000 iterations after a burn-in of 100,000 iterations. The algorithm visited seven
regimes and the most probable model is that with two hidden regimes. Then the MCMC algorithm was run with fixed dimension (m = 2) for 200,000 iterations, discarding the first
100,000, and the parameters (µ1, µ2, Σ1, Σ2, Ω, Γ)′ were estimated.
Finally, the hidden sequence of regimes was reconstructed (Figure 1c); the two regimes can
represent two unobserved geological states whose switching activity produces long and short
eruptions after long and short waiting times. The regime at each time point is the maximizer of the corresponding
smoothed probabilities: after having estimated the parameters, we can compute the smoothed
probabilities of the regimes (Kim (1993)), that is, the probabilities of any regime, at any time,
given all observations and the parameters.
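As an illustration, the smoothed probabilities can be computed by a scaled forward-backward pass (a generic HMM recursion whose output coincides with the smoother of Kim (1993), which organizes the backward step differently; all numbers below are invented):

```python
import numpy as np
from scipy.stats import multivariate_normal

def smoothed_probs(y, delta, Gamma, mu, Sigma):
    """P(X_t = i | y_1, ..., y_T, parameters) for every t and i,
    via scaled forward (alpha) and backward (beta) recursions."""
    T, m = len(y), len(delta)
    b = np.column_stack([multivariate_normal.pdf(y, mu[i], Sigma[i])
                         for i in range(m)])
    alpha = np.empty((T, m))
    beta = np.empty((T, m))
    alpha[0] = delta * b[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ Gamma) * b[t]
        alpha[t] /= alpha[t].sum()
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = Gamma @ (b[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()          # rescale; rows renormalized below
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

# toy example with two well-separated regimes
mu = np.array([[-5.0, -5.0], [5.0, 5.0]])
Sigma = np.array([np.eye(2), np.eye(2)])
Gamma = np.array([[0.9, 0.1], [0.1, 0.9]])
y = np.array([[-5.0, -5.0], [5.0, 5.0], [5.0, 5.0]])
probs = smoothed_probs(y, np.array([0.5, 0.5]), Gamma, mu, Sigma)
regimes = probs.argmax(axis=1)            # MAP regime at each time
```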
In Spezia et al. (2009) MVGHMMs have been extended by adding a monthly periodic
component in order to reconstruct the missing values affecting the series of the concentrations of three forms of inorganic nitrogen, recorded by the Scottish Environment Protection Agency in
56 Scottish rivers near their mouths. The analysis of the River Lochy is shown in brief. The time
series contains 288 observations of three variables (on the log-scale): Yt,1 is the Ammonia
(NH4) concentration (Figure 2a), Yt,2 is the Nitrite (NO2) concentration (Figure 2b), and Yt,3
is the Nitrate (NO3) concentration (Figure 2c).
The Periodic MVGHMM is
(Yt |Xt = i) = µi + βt + (Et |Xt = i) ,
where βt is the multivariate monthly component, i.e. βt = (βt,1, βt,2, βt,3)′, with

βt,j = ∑_{k=1}^{6} [ β1,k,j cos(πkt/6) + β2,k,j sin(πkt/6) ],

for any j = 1, 2, 3. Periodic coefficients are gathered in the vector β, i.e. β = (β1,1, β2,1, . . . , β1,6, β2,6)′,
with βh,k = (βh,k,1, βh,k,2, βh,k,3)′, for any h = 1, 2 and k = 1, . . . , 6. A priori we have
p(β) = ∏ p(βh,k )
h,k
and any βh,k is Multivariate Normal.
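A minimal sketch of how βt,j is evaluated (the coefficient values are placeholders; since t counts months, the component has period 12):

```python
import numpy as np

def periodic_component(t, beta1, beta2):
    """beta_{t,j} = sum_{k=1}^{6} [beta_{1,k,j} cos(pi*k*t/6)
                                   + beta_{2,k,j} sin(pi*k*t/6)].
    beta1, beta2: (6, p) arrays of cosine and sine coefficients."""
    k = np.arange(1, 7)[:, None]          # harmonics k = 1, ..., 6
    ang = np.pi * k * t / 6.0
    return (beta1 * np.cos(ang) + beta2 * np.sin(ang)).sum(axis=0)

rng = np.random.default_rng(1)
b1 = rng.normal(size=(6, 3))              # placeholder beta_{1,k,j}
b2 = rng.normal(size=(6, 3))              # placeholder beta_{2,k,j}
bt = periodic_component(3, b1, b2)        # beta_t for month t = 3 (p = 3)
```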
First, we want to estimate the number of hidden regimes, so we ran the RJMCMC algorithm for 200,000 iterations after a burn-in of 100,000 iterations. The algorithm visited seven
regimes and the most probable model is that with five hidden regimes. Then the MCMC
algorithm was run with fixed dimension (m = 5) for 200,000 iterations, discarding the
first 100,000, and the parameters (µ1, . . . , µ5, Σ1, . . . , Σ5, β, Ω, y∗)′ were estimated, where
y∗ is the set of missing values occurring within the series. Finally, the hidden sequence has
been reconstructed (Figure 2d) by means of the algorithm of Kim (1993) and the regimes can
represent five unobserved levels of the river pollution.
Figure 1. Series of the duration of the eruption of Old Faithful Geyser (a), the waiting time to the next
blow out (b), and the estimated sequence of the hidden regimes (c)
Figure 2. Series of NH4 (a), NO2 (b), NO3 (c), and the estimated sequence of the hidden regimes (d)
REFERENCES
CAPPÉ, O., ROBERT, C.P., RYDÉN, T. (2003): Reversible jump, birth-and-death and more general
continuous time Markov chain Monte Carlo samplers. Journal of the Royal Statistical Society Series B,
65, 679–700.
CELEUX, G., HURN, M., ROBERT, C.P. (2000): Computational and Inferential Difficulties With
Mixture Posterior Distributions. Journal of the American Statistical Association, 95, 957–970.
FRÜHWIRTH-SCHNATTER, S. (2001): Markov Chain Monte Carlo Estimation of Classical and Dynamic Switching and Mixture Models. Journal of the American Statistical Association, 96, 194–
209.
HÄRDLE, W. (1990): Smoothing Techniques: With Implementation in S. Springer, New York.
KIM, C.-J. (1993): Dynamic Linear Models with Markov-Switching. Journal of Econometrics, 60, 1–
22.
KIM, C.-J., NELSON, C.R. (1999): State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications. The MIT Press, Cambridge.
MARIN, J.M., MENGERSEN, K.L., ROBERT, C.P. (2005): Bayesian Modelling and Inference on Mixtures of Distributions. In: D. Dey and C.R. Rao (Eds.), Handbook of Statistics 25. Elsevier Science,
Amsterdam, 459–507.
POLE, A., WEST M., HARRISON, J. (1994): Applied Bayesian Forecasting and Time Series Analysis.
Chapman & Hall, New York.
SPEZIA, L. (2008): Bayesian analysis of multivariate Gaussian hidden Markov models with an unknown number of regimes. Submitted.
SPEZIA, L., FUTTER, M.N., BREWER, M.J. (2009): Periodic multivariate normal hidden Markov
models for the analysis of water quality time series. In preparation.