MULTIVARIATE GAUSSIAN HIDDEN MARKOV MODELS WITH AN UNKNOWN NUMBER OF REGIMES

Luigi Spezia
Biomathematics & Statistics Scotland, Macaulay Institute, Craigiebuckler, Aberdeen, AB15 8QH, UK (e-mail: [email protected])

ABSTRACT. Hidden Markov models (HMMs) are generalizations of mixture models, obtained by adding a latent, or hidden, Markov chain which drives the observed process. Multivariate Gaussian HMMs with an unknown number of regimes are considered here in the Bayesian setting, and efficient reversible jump Markov chain Monte Carlo algorithms for estimating both the dimension and the unknown parameters of the model are presented, along with applications to real-world phenomena.

1 INTRODUCTION

We apply Multivariate Gaussian Hidden Markov Models (MVGHMMs) to weakly dependent non-linear time series to model the switching of both the observed levels of a multivariate time series and the covariance matrices, assuming that the levels and the strength of the interdependencies between the variables can change over time. We also reconstruct the sequence of the hidden regimes, which represents the evolution in time of the switching of the means and covariance matrices of the observed phenomena. The reversible jump Markov chain Monte Carlo (RJMCMC) algorithm we propose for MVGHMMs, instead of exploiting the conjugacy of the prior distributions to implement Gibbs sampling, updates the parameters by random walk Metropolis-Hastings moves, because the Gibbs sampler is less able to traverse the posterior surface and to escape local modes, as pointed out by Celeux et al. (2000) for univariate mixture models. All sweeps of our MCMC algorithms are characterized by the generation of the parameters with the "absence of completion", as in Cappé et al. (2003), i.e. without generating the sequence of the hidden Markov chain.

2 MULTIVARIATE GAUSSIAN HIDDEN MARKOV MODELS

Let {Xt} be a discrete-time, first-order, homogeneous Markov chain on a finite space SX = {1, . . .
, m}; the transition matrix is Γ = [γi,j], where γi,j = P(Xt = j | Xt−1 = i), for any i, j ∈ SX and for any t = 2, . . . , T, with 0 < γi,j < 1. Let also {Yt} be a sequence of variables taking values in R^p. Only the process {Yt} is observable, while {Xt} is latent, i.e. hidden underneath {Yt}. We assume that {Yt}, given {Xt}, is a sequence of conditionally independent random variables, which are multivariate here, i.e. Yt = (Yt,1, . . . , Yt,p)′, for any t = 1, . . . , T, whose conditional distributions depend on {Xt} only through the contemporary Xt and are assumed to be multivariate Gaussian of order p. Hence, the stochastic process ({Xt}; {Yt}) can be named MVGHMM and represented by the following equality:

(Yt | Xt = i) = µi + (Et | Xt = i),

where the m processes {Et | Xt = i}, i = 1, . . . , m, are multivariate Gaussian white noise processes of order p, with (Et | Xt = i) ∼ N(p)(0(p); Σi), so that (Yt | Xt = i) ∼ N(p)(µi; Σi), for any regime, or state, i ∈ SX and for any t = 1, . . . , T (N(p)(·; ·) stands for the multivariate Gaussian, or Normal, distribution of order p and 0(p) for the p-dimensional vector of zeros). The unknown parameters of our MVGHMM are allocated in the vector (µ, Σ, Ω, m)′, where µ is the vector of the m mean vectors µi (µ = (µ1, . . . , µm)′, with µi = (µi,1, . . . , µi,p)′, for any i = 1, . . . , m) and Σ is the vector of the m covariance matrices Σi (Σ = (Σ1, . . . , Σm)′, with Σi = [σi,j,k], for any i = 1, . . . , m and for any j, k = 1, . . . , p, i.e. σi,j,k = COV[(Yt,j | Xt = i); (Yt,k | Xt = i)], for any t = 1, . . . , T and for any i = 1, . . . , m). The matrix Ω = [ωi,j] contains positive quantities which are used to obtain the transition matrix Γ = [γi,j], through the equality

γi,j = ωi,j / (ωi,1 + · · · + ωi,m),

with ωi,j > 0, for any i, j ∈ SX (Cappé et al., 2003).
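As an illustration of this parameterisation, and of the "no completion" evaluation of the likelihood used throughout the paper, the following Python sketch (NumPy assumed; all function names are ours, not from the paper) row-normalises the positive weights Ω into Γ and sums the hidden chain out of the likelihood by the standard scaled forward recursion, instead of simulating it. The choice of the stationary distribution of Γ as the initial law is an assumption of the sketch.

```python
import numpy as np

def transition_matrix(Omega):
    """Row-normalise the positive weights Omega to give the stochastic matrix Gamma."""
    Omega = np.asarray(Omega, dtype=float)
    return Omega / Omega.sum(axis=1, keepdims=True)

def mvn_pdf(x, mu, Sigma):
    """Density of a p-variate Gaussian N_(p)(mu; Sigma) at x."""
    p = len(mu)
    diff = np.asarray(x) - np.asarray(mu)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** p * np.linalg.det(Sigma))

def mvghmm_loglik(y, mu, Sigma, Omega):
    """Marginal log-likelihood of an MVGHMM, computed 'with no completion':
    the hidden chain is summed out by the scaled forward recursion rather
    than being simulated."""
    Gamma = transition_matrix(Omega)
    m, T = Gamma.shape[0], len(y)
    # emission densities p(y_t | X_t = i) for every t and i
    dens = np.array([[mvn_pdf(y[t], mu[i], Sigma[i]) for i in range(m)]
                     for t in range(T)])
    # initial law: stationary distribution of Gamma (an assumption of this sketch)
    w, v = np.linalg.eig(Gamma.T)
    delta = np.abs(np.real(v[:, np.argmax(np.real(w))]))
    delta = delta / delta.sum()
    alpha = delta * dens[0]
    c = alpha.sum()                      # scaling constant, avoids underflow
    loglik, alpha = np.log(c), alpha / c
    for t in range(1, T):
        alpha = (alpha @ Gamma) * dens[t]
        c = alpha.sum()
        loglik += np.log(c)
        alpha = alpha / c
    return loglik
```

Such a routine is all a random walk Metropolis-Hastings sweep needs, since each proposal is accepted or rejected on the basis of this marginal likelihood times the priors.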
This transformation of the transition probabilities makes the random walk Metropolis-Hastings moves of the MCMC algorithm more efficient. Given that our inferential tool is Bayesian, we introduce the following independent and relabelling-invariant prior distributions: for any i, j = 1, . . . , m, each µi is multivariate Normal, each Σi is Inverted Wishart, each ωi,j is Gamma, and m is Discrete Uniform between 1 and the maximum number of hidden regimes we admit a priori. The sequence of the observations is denoted by yT = (y1, . . . , yT)′, with yt = (yt,1, . . . , yt,p), for any t = 1, . . . , T; so the joint density of all variables of the model is

p(m, µ, Σ, Ω, yT) = p(yT | µ, Σ, Ω, m) p(µ | m) p(Σ | m) p(Ω | m) p(m).

3 MARKOV CHAIN MONTE CARLO ALGORITHMS

The vector of unknown parameters (µ, Σ, Ω, m)′ can be estimated in two consecutive steps through two different MCMC algorithms: by the first we obtain the number of regimes, when it is a random variable, while by the second we obtain the estimates of the parameters of the multivariate Normals and of the transition matrix, when the number of regimes is fixed. In both MCMC algorithms, the simulations from the posterior density are characterized by Metropolis-Hastings moves with no completion, that is, the sequence of the hidden regimes is never simulated. We update the parameters by random walk Metropolis-Hastings moves, instead of Gibbs sampling, because the former are more able than the latter to traverse the posterior surface and to escape local modes. In any sweep, all parameters are updated with no completion, in order to improve the precision of the simulation and to accelerate the convergence of the algorithm, by eliminating unnecessary randomness from the procedure and reducing the dimension of the parameter space. Moreover, the posterior density is left unconstrained, because no artificial identifiability constraint is selected a priori, and all m!
subspaces on which the posterior density is defined can be visited by the algorithm, about the same number of times, because we conclude any sweep by permuting the generated values according to the random selection of one of the m! possible orderings of the regimes (Frühwirth-Schnatter, 2001). This device, called "random permutation sampling", allows us to improve both the convergence and the mixing of the algorithm, but, when the number of regimes is fixed, it prevents the direct computation of the estimates of the parameters. So a further step is necessary: we have to post-process the MCMC sample, following the guidelines of Marin et al. (2005). First, we look for the posterior mode, that is, the values obtained in the special sweep which maximizes the posterior density. Then, for any sweep, given all possible permutations of the regimes, we compute the Euclidean norm between each permuted vector of parameters and the posterior mode; we select the permutation which minimizes the distance between the reordered vector and the posterior mode; we apply the selected permutation to the generated values of the sweep and place them in the reordered MCMC sample. Finally, we can compute the estimates of the parameters by taking the ergodic averages of the reordered values. On the other hand, when the number of regimes is a random variable and we implement an RJMCMC algorithm, no post-processing is necessary, because the posterior probabilities of the different numbers of hidden regimes are not influenced by the ordering of the labels of the regimes. In the RJMCMC algorithm the dimension of the model is changed by split-and-merge and birth-and-death moves; in the split and merge moves the moment matching is not required, and the birth and death moves are not limited to the empty components.
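The post-processing recipe above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: the draws of each sweep are arranged as one flattened parameter vector per regime, and the function name is ours, not from Marin et al. (2005).

```python
import numpy as np
from itertools import permutations

def relabel(draws, log_post):
    """Reorder a randomly permuted MCMC sample (after Marin et al., 2005).

    draws:    array (n_sweeps, m, d), one d-dimensional parameter vector
              per regime and per sweep;
    log_post: array (n_sweeps,), log posterior density of each sweep.

    Returns the reordered sample and its ergodic averages."""
    n, m, d = draws.shape
    pivot = draws[np.argmax(log_post)]        # the posterior-mode sweep
    perms = [list(p) for p in permutations(range(m))]
    reordered = np.empty_like(draws)
    for s in range(n):
        # the permutation whose Euclidean distance to the mode is smallest
        best = min(perms, key=lambda p: np.linalg.norm(draws[s][p] - pivot))
        reordered[s] = draws[s][best]
    return reordered, reordered.mean(axis=0)
```

Since all m! permutations are scanned at every sweep, this brute-force version is only practical for a small number of regimes, which is the typical situation in the applications below.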
4 APPLICATIONS

The methodology can be illustrated through three different examples of non-linear multivariate time series, which are all benchmarks in the theory of HMMs: the duration and the waiting time of Old Faithful Geyser eruptions (Härdle, 1990); the mortality in London during winter 1958 (Pole et al., 1994); the dynamics of the United States Real Gross Domestic Product and Unemployment Rate (Kim and Nelson, 1999). Here the geyser example is summarized; see Spezia (2008) for details on all three applications. The geyser data set contains 274 observations of two variables: Yt,1 is the duration (in minutes) of the eruption of the Old Faithful geyser in Yellowstone National Park (Figure 1a) and Yt,2 is the waiting time (in minutes) to the next blow-out (Figure 1b). First, we want to estimate the number of hidden regimes, so we ran the RJMCMC algorithm for 500,000 iterations after a burn-in of 100,000 iterations. The algorithm visited seven regimes and the most probable model is that with two hidden regimes. Then the MCMC algorithm was run with fixed dimension (m = 2) for 200,000 iterations, discarding the first 100,000, and the parameters (µ1, µ2, Σ1, Σ2, Ω, Γ)′ were estimated. Finally, the hidden sequence of regimes was reconstructed (Figure 1c); the regimes can represent two unobserved geological states whose switching activity produces long and short eruptions after long and short waiting times. Each regime is the maximizer of the corresponding smoothed probabilities: after having estimated the parameters, we can compute the smoothed probabilities of the regimes (Kim, 1993), that is, the probabilities of any regime, at any time, given all the observations and the parameters. In Spezia et al.
(2009) MVGHMMs have been extended by adding a monthly periodic component in order to reconstruct the missing values affecting the series of the concentrations of three inorganic nitrogens, recorded by the Scottish Environment Protection Agency in 56 Scottish rivers near their mouths. The analysis of the River Lochy is shown in brief. The time series contains 288 observations of three variables (on the log-scale): Yt,1 is the Ammonia (NH4) concentration (Figure 2a), Yt,2 is the Nitrite (NO2) concentration (Figure 2b), and Yt,3 is the Nitrate (NO3) concentration (Figure 2c). The Periodic MVGHMM is

(Yt | Xt = i) = µi + βt + (Et | Xt = i),

where βt is the multivariate monthly component, i.e. βt = (βt,1, βt,2, βt,3), with

βt,j = ∑_{k=1}^{6} [β1,k,j cos(πkt/6) + β2,k,j sin(πkt/6)],

for any j = 1, 2, 3. The periodic coefficients are gathered in the vector β, i.e. β = (β1,1, β2,1, . . . , β1,6, β2,6)′, with βh,k = (βh,k,1, βh,k,2, βh,k,3), for any h = 1, 2 and k = 1, . . . , 6. A priori we have p(β) = ∏_{h,k} p(βh,k) and each βh,k is multivariate Normal. First, we want to estimate the number of hidden regimes, so we ran the RJMCMC algorithm for 200,000 iterations after a burn-in of 100,000 iterations. The algorithm visited seven regimes and the most probable model is that with five hidden regimes. Then the MCMC algorithm was run with fixed dimension (m = 5) for 200,000 iterations, discarding the first 100,000, and the parameters (µ1, . . . , µ5, Σ1, . . . , Σ5, β, Ω, y∗)′ were estimated, where y∗ is the set of the missing values occurring within the series. Finally, the hidden sequence was reconstructed (Figure 2d) by means of the algorithm of Kim (1993) and the regimes can represent five unobserved levels of the river pollution.

Figure 1.
Series of the duration of the eruption of Old Faithful Geyser (a), the waiting time to the next blow-out (b), and the estimated sequence of the hidden regimes (c).

Figure 2. Series of NH4 (a), NO2 (b), NO3 (c), and the estimated sequence of the hidden regimes (d).

REFERENCES

CAPPÉ, O., ROBERT, C.P., RYDÉN, T. (2003): Reversible jump, birth-and-death and more general continuous time Markov chain Monte Carlo samplers. Journal of the Royal Statistical Society Series B, 65, 679–700.
CELEUX, G., HURN, M., ROBERT, C.P. (2000): Computational and Inferential Difficulties with Mixture Posterior Distributions. Journal of the American Statistical Association, 95, 957–970.
FRÜHWIRTH-SCHNATTER, S. (2001): Markov Chain Monte Carlo Estimation of Classical and Dynamic Switching and Mixture Models. Journal of the American Statistical Association, 96, 194–209.
HÄRDLE, W. (1990): Smoothing Techniques: With Implementation in S. Springer, New York.
KIM, C.-J. (1993): Dynamic Linear Models with Markov-Switching. Journal of Econometrics, 60, 1–22.
KIM, C.-J., NELSON, C.R. (1999): State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications. The MIT Press, Cambridge.
MARIN, J.-M., MENGERSEN, K.L., ROBERT, C.P. (2005): Bayesian Modelling and Inference on Mixtures of Distributions. In: D. Dey and C.R. Rao (Eds.), Handbook of Statistics 25. Elsevier Science, Amsterdam, 459–507.
POLE, A., WEST, M., HARRISON, J. (1994): Applied Bayesian Forecasting and Time Series Analysis. Chapman & Hall, New York.
SPEZIA, L. (2008): Bayesian analysis of multivariate Gaussian hidden Markov models with an unknown number of regimes. Submitted.
SPEZIA, L., FUTTER, M.N., BREWER, M.J. (2009): Periodic multivariate normal hidden Markov models for the analysis of water quality time series. In preparation.