Antonietta Mira

Efficiency of finite state space Monte Carlo Markov chains

2000/1

UNIVERSITÀ DELL'INSUBRIA
FACOLTÀ DI ECONOMIA
http://eco.uninsubria.it

These working papers collect the work of the Faculty of Economics of the University of Insubria. The publication of work by other authors can be proposed by a member of the Faculty, provided that the paper has been presented in public. The name of the proposer is reported in a footnote. The views expressed in the working papers reflect the opinions of the authors only, and not necessarily those of the Faculty of Economics of the University of Insubria.

© Copyright Antonietta Mira
Printed in Italy in October 2000
Università degli Studi dell'Insubria
Via Ravasi 2, 21100 Varese, Italy

All rights reserved. No part of this paper may be reproduced in any form without permission of the Author.


Efficiency of finite state space Monte Carlo Markov chains*

Antonietta Mira†

10th October 2000

* This work has been supported by EU TMR network ERB-FMRX-CT96-0095 on "Computational and Statistical methods for the analysis of spatial data" and by the F.A.R. 2000 of the University of Insubria.
† Università degli Studi dell'Insubria, Facoltà di Economia, Via Ravasi 2, 21100 Varese, Italy. Email: [email protected]

Abstract

The class of finite state space Markov chains stationary with respect to a common pre-specified distribution is considered. An easy to check partial ordering is defined on this class. The ordering provides a sufficient condition for the dominating Markov chain to be more efficient. Efficiency is measured by the asymptotic variance of the estimator of the integral of a specific function with respect to the stationary distribution of the chains. A class of transformations that, when applied to a transition matrix, preserves its stationary distribution and improves its efficiency is defined and studied.

Keywords: Markov chain Monte Carlo, Efficiency ordering, Stationarity preserving and efficiency increasing transfers.

1 Introduction

Consider a distribution of interest $\pi(x)$, $x \in \mathcal{X}$, possibly known only up to a normalizing constant. Assume that $\mathcal{X}$ is a finite state space with $N$ elements. In order to gather information about $\pi$ we construct a Harris recurrent Markov chain with transition matrix $P$ (we will identify Markov chains with the corresponding transition matrices) that has $\pi$ as its unique stationary distribution: $\pi = \pi P$. In particular, following the Markov chain Monte Carlo (MCMC) literature, we estimate $E_\pi[f(X)] = \mu$ with the sample average along a realized path of a chain of length $n$:
$$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} f(X_i).$$
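As a concrete illustration of this setup (a toy sketch, not part of the original paper; the target $\pi$, the function $f$ and the Metropolis-Hastings construction below are only illustrative choices), the ergodic average estimator can be computed as follows:

```python
# A minimal sketch: simulate a finite state space Markov chain with a
# prescribed stationary distribution pi and form the ergodic average
# estimate of E_pi[f(X)].  All names and values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.2, 0.3, 0.5])          # target distribution on X = {0, 1, 2}
f = np.array([1.0, 2.0, 4.0])           # function of interest, f(x)

def metropolis_matrix(pi):
    """A pi-stationary transition matrix: Metropolis with uniform proposal."""
    N = len(pi)
    P = np.zeros((N, N))
    for x in range(N):
        for y in range(N):
            if y != x:
                P[x, y] = (1.0 / (N - 1)) * min(1.0, pi[y] / pi[x])
        P[x, x] = 1.0 - P[x].sum()      # remaining mass stays at x
    return P

P = metropolis_matrix(pi)
assert np.allclose(pi @ P, pi)          # stationarity: pi = pi P

# Simulate a path of length n started from pi and compute mu_hat_n.
n = 50_000
x = rng.choice(len(pi), p=pi)
total = 0.0
for _ in range(n):
    x = rng.choice(len(pi), p=P[x])
    total += f[x]
mu_hat = total / n
print(mu_hat, pi @ f)                   # estimate vs. exact E_pi[f(X)]
```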
Let $L^2(\pi)$ denote the space of functions that have finite variance under $\pi$ and assume that all the functions of interest belong to this space. Let $1$ denote the $1 \times N$ vector of ones and, likewise, $0$ denote the $1 \times N$ vector of zeros. Define the inner product on $L^2(\pi)$ to be
$$(f, g)_\pi = f \Pi g' = \sum_{x \in \mathcal{X}} f(x) g(x) \pi(x),$$
where $\Pi$ is the $N \times N$ diagonal matrix with $\pi_i$ as the $i$-th element on the diagonal and $g'$ is the transpose of $g$.

A measure of the efficiency of $\hat{\mu}_n$ is $v(f, P)$, the limit, as $n$ tends to infinity, of $n \sigma^2_n = n\,\mathrm{Var}_\pi[\hat{\mu}_n]$. It can be shown (Peskun, 1973) that
$$v(f, P) = f \Pi [2 \Lambda_P^{-1} - I - A] f' = (f, [2 \Lambda_P^{-1} - I - A] f)_\pi, \qquad (1)$$
where $I$ is the $N \times N$ identity matrix, $A = 1'\pi$ is the matrix whose rows all equal $\pi$, and $\Lambda_P = I - (P - A)$ is called the Laplacian of $P$. Notice that, in the formula for the asymptotic variance, the only quantity that involves the transition matrix (and thus the only quantity on which we can intervene to improve the performance of our estimates) is the inverse Laplacian.

Given a distribution of interest $\pi$ there is often more than one transition matrix that has $\pi$ as its unique stationary distribution. The primary criterion used to choose among them is, as everywhere else in statistics, the asymptotic variance of the (asymptotically unbiased) estimators. We thus give the following definitions.

Definition 1. If $P$ and $Q$ are Markov chains with stationary distribution $\pi$, then $P$ is at least as efficient as $Q$ for a particular function $f$, $P \succeq_f Q$, if $v(f, P) \le v(f, Q)$.

Definition 2. If $P$ and $Q$ are Markov chains with stationary distribution $\pi$, then $P$ is at least as uniformly efficient as $Q$, $P \succeq_E Q$, if
$$v(f, P) \le v(f, Q) \qquad \forall f \in L^2(\pi). \qquad (2)$$

The Peskun ordering (Peskun, 1973; Tierney, 1995) is a sufficient condition for (2) to hold and the covariance ordering (Mira and Geyer, 1998) is necessary and sufficient. Uniform efficiency has already been studied quite extensively: Peskun (1973), Tierney (1995), Mira and Geyer (1998), Frigessi et al. (1992). In this paper we focus on relative efficiency and try to answer the following question: if we have a specific function $f$ in mind and we are only interested in its expected value with respect to $\pi$, which Markov chain should we use for our simulation study? Intuitively, a transition matrix chosen with a specific function to estimate in mind will perform better than a generic one chosen to minimize the asymptotic variance over all possible functions with finite asymptotic variance under $\pi$.

2 Sufficient conditions for efficiency

Assume that the function of interest $f$ is monotone. This is not a restrictive hypothesis since we can always rearrange the state space $\mathcal{X}$, and everything else accordingly, to make it monotone.

Define the summation matrix $T$ to be the $N \times N$ upper triangular matrix with zeros below the main diagonal and ones elsewhere. Define the south-west sub-matrix of a matrix $M$ to be the sub-matrix of $M$ obtained by deleting the first row and the last column of $M$.

Definition 3. $P$ dominates $Q$ in the south-west ordering (with respect to some reordering of the state space), $P \succeq_{SW} Q$, if all the elements of the south-west sub-matrix of $T \Pi (P - Q) T$ are non-negative.

Theorem 1. Let $P$ and $Q$ be two Markov chains having $\pi$ as their unique stationary distribution. If $\Lambda_P^{-1} \succeq_{SW} \Lambda_Q^{-1}$ then $P \succeq_f Q$.

Proof. It is sufficient to show that, for the given function, $f \Pi (\Lambda_P^{-1} - \Lambda_Q^{-1}) f' \le 0$. Consider the identity
$$f \Pi (\Lambda_P^{-1} - \Lambda_Q^{-1}) f' = f T^{-1} \, T \Pi (\Lambda_P^{-1} - \Lambda_Q^{-1}) T \, T^{-1} f'$$
and note that:
- the last column of $T \Pi (\Lambda_P^{-1} - \Lambda_Q^{-1}) T$ equals $T \Pi (\Lambda_P^{-1} - \Lambda_Q^{-1}) 1' = 0'$;
- the first row of $T \Pi (\Lambda_P^{-1} - \Lambda_Q^{-1}) T$ equals $1 \Pi (\Lambda_P^{-1} - \Lambda_Q^{-1}) T = \pi (\Lambda_P^{-1} - \Lambda_Q^{-1}) T = 0$;
- $f$ monotone is equivalent to the first $(N-1)$ elements of $T^{-1} f'$ and the last $(N-1)$ elements of $f T^{-1}$ having opposite signs.
It follows that $\Lambda_P^{-1} \succeq_{SW} \Lambda_Q^{-1}$ implies $P \succeq_f Q$.
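As a small numerical illustration (a sketch only; the helper functions and the particular $\pi$, $f$, $P$ and $Q$ below are arbitrary choices, not taken from the paper), equation (1) and the south-west check of Theorem 1 can be coded directly:

```python
# Sketch: asymptotic variance of equation (1) and the south-west check.
import numpy as np

def asymptotic_variance(f, P, pi):
    """v(f, P) = f Pi (2 Lambda_P^{-1} - I - A) f', as in equation (1)."""
    N = len(pi)
    I = np.eye(N)
    A = np.outer(np.ones(N), pi)        # A = 1' pi: every row equals pi
    Pi = np.diag(pi)                    # Pi = diag(pi)
    L_inv = np.linalg.inv(I - (P - A))  # inverse Laplacian of P
    return f @ Pi @ (2 * L_inv - I - A) @ f

def southwest_dominates(Mp, Mq, pi):
    """South-west sub-matrix of T Pi (Mp - Mq) T non-negative (Definition 3)."""
    N = len(pi)
    T = np.triu(np.ones((N, N)))        # summation matrix
    B = T @ np.diag(pi) @ (Mp - Mq) @ T
    return np.all(B[1:, :-1] >= -1e-12) # drop first row and last column

pi = np.array([0.2, 0.3, 0.5])
f  = np.array([1.0, 2.0, 4.0])          # monotone (increasing) function
I  = np.eye(3)
A  = np.outer(np.ones(3), pi)
P  = A                                  # independence sampling from pi
Q  = 0.5 * I + 0.5 * A                  # a "lazy" version of the same chain

Lp_inv = np.linalg.inv(I - (P - A))
Lq_inv = np.linalg.inv(I - (Q - A))
if southwest_dominates(Lp_inv, Lq_inv, pi):          # hypothesis of Theorem 1
    assert asymptotic_variance(f, P, pi) <= asymptotic_variance(f, Q, pi)
```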
Compared with the Peskun ordering, the south-west ordering requires checking fewer conditions, namely $(N-1)(N-1)$ instead of $N(N-1)$. This is an important issue when the state space is large. Both orderings are partial, that is, they do not allow a ranking of all transition matrices having a specified stationary distribution.

A possible limitation of Theorem 1 is that it requires working with inverse Laplacians, and computing matrix inverses can be computationally intensive (especially on large state spaces). A way to work with the transition matrices directly is to note that $I + (P - A)$ provides a first order approximation to the inverse Laplacian since (Kemeny and Snell, 1969)
$$\Lambda_P^{-1} = I + \sum_{i=1}^{\infty} (P - A)^i$$
and $\lim_{i \to \infty} (P - A)^i$ equals, by construction, the $N \times N$ zero matrix. Thus a first order approximation to $v(f, P)$ is given by
$$v(f, P) \approx v(f, A) + 2 f \Pi (P - A) f',$$
where $v(f, A)$ is the theoretical independence sampling variance of $\hat{\mu}_n$ and $f \Pi (P - A) f'$ can be interpreted as a first-order covariance if $\pi$ is the distribution of the initial state of the Markov chain. This justifies the attempt to find $P$ such that $f \Pi (P - A) f'$ is large (in absolute value) and negative. This is nothing more than an attempt to induce negative correlation into the Markov chain Monte Carlo sampler. We thus give the following definition.

Definition 4. If $P$ and $Q$ are Markov chains with stationary distribution $\pi$, then $P$ is at least as first order efficient as $Q$ for a particular function $f$, $P \succeq_{1f} Q$, if $f \Pi (P - Q) f' \le 0$.

A reasoning similar to the one in the proof of Theorem 1 shows that, if $P \succeq_{SW} Q$, then $P \succeq_{1f} Q$.

3 Stationarity preserving transfers

Denote by S.P.T. a stationarity preserving (and efficiency increasing) transfer performed on a transition matrix $P$ (with stationary distribution $\pi$) of the following kind: given integers $1 \le i < j \le N$ and $1 \le k < l \le N$ and a quantity $0 < h \le 1$, increase $p_{i,l}$ and $p_{j,k}$ by $h$ and $h \pi_i / \pi_j$ respectively, and decrease $p_{i,k}$ and $p_{j,l}$ by $h$ and $h \pi_i / \pi_j$ respectively. Thus a S.P.T. is completely defined by giving four indexes and the amount by which to increase/decrease the corresponding entries of the matrix on which the transfer is performed. We will thus identify a S.P.T. by $P(i, j, k, l, h)$.

If $P$ is derived from $Q$ via a finite sequence of stationarity preserving and efficiency increasing transfers then $P \succeq_{SW} Q$. We can thus increase the efficiency of a transition matrix (provided the first order approximation to the asymptotic variance is good), while preserving its stationary distribution, via a sequence of S.P.T. Indeed, we can keep transferring probability mass around as long as there exist $i < j$ such that $p_{i,k} > 0$ and $p_{j,l} > 0$ for some $k < l$. Of course, every time we move some probability mass around, we need to check that the resulting matrix remains a proper (all entries between 0 and 1) irreducible transition matrix. Notice that knowing $\pi$ only up to a normalizing multiplicative constant does not limit this theory, since only ratios of values of $\pi$ are needed to define a S.P.T.

It is interesting to study the limiting transition matrix obtained by applying a sequence of S.P.T. The resulting matrix has at most one non-zero element along the main diagonal and it presents a path of positive entries connecting the north-east to the south-west corner of the matrix. Which specific pattern is optimal depends on $\pi$.
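The following sketch applies a single transfer of the kind just described (the starting matrix, the indices and the value of $h$ are illustrative choices, not taken from the paper) and checks that stationarity is preserved and that the first order term $f \Pi (P_{\mathrm{new}} - P) f'$ is non-positive for a monotone $f$:

```python
# Sketch of one stationarity preserving transfer P(i, j, k, l, h).
import numpy as np

def spt(P, pi, i, j, k, l, h):
    """Raise p_{i,l} and p_{j,k}, lower p_{i,k} and p_{j,l} (i < j, k < l)."""
    assert i < j and k < l and h > 0
    Pnew = P.copy()
    Pnew[i, l] += h
    Pnew[j, k] += h * pi[i] / pi[j]     # only the ratio pi_i / pi_j is needed
    Pnew[i, k] -= h
    Pnew[j, l] -= h * pi[i] / pi[j]
    # the move is legal only if all entries stay in [0, 1]
    assert np.all(Pnew >= -1e-12) and np.all(Pnew <= 1 + 1e-12)
    return Pnew

pi = np.array([0.2, 0.3, 0.5])
f  = np.array([1.0, 2.0, 4.0])                      # monotone function
P  = np.outer(np.ones(3), pi)                       # independence chain
Pi = np.diag(pi)

Pnew = spt(P, pi, i=0, j=2, k=0, l=2, h=0.2)        # one transfer
assert np.allclose(pi @ Pnew, pi)                   # stationarity preserved
assert f @ Pi @ (Pnew - P) @ f <= 1e-12             # first order improvement
```

The transfer moves probability mass toward the north-east and south-west corners, which is exactly the negative-correlation effect discussed above.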
As an example of how the optimal pattern depends on $\pi$, consider the case $N = 2$ and assume that $\pi$ is properly normalized. Consider first the case where $\pi_1 = 1/2$. In this setting a transition matrix $P$ has the proper stationary distribution if and only if it is doubly stochastic. For example, a Metropolis-Hastings algorithm with proposal matrix $K$ given by $k_{12} = p$ and $k_{21} = q$, with $0 \le p, q \le 1$, has transition matrix $P$ with $p_{12} = \min\{q, p\}$ and $p_{21} = \min\{q, p\}$. How do we optimally choose the proposal distribution, that is, the values of $p$ and $q$? The first-order-optimal transition matrix has $p_{11} = 0$ and $p_{21} = 1$, and the path of positive entries follows the north-east to south-west diagonal. This is nothing but a Metropolis-Hastings algorithm where the proposal distribution always proposes to jump to the other state (notice that this proposal is always accepted). The structure of the first-order-optimal transition matrix is the same for any uniform stationary distribution, that is, we always have a pattern of ones on the north-east to south-west diagonal. The resulting first-order-optimal transition matrix is periodic since it has an eigenvalue equal to $-1$. This is a problem for convergence in total variation distance to stationarity, but it is not an issue if we are interested in the asymptotic variance of the ergodic average estimates. If $\pi_1 < 1/2$ then, to obtain the first-order-optimal transition matrix, we have to set $p_{11} = 0$ and $p_{21} = \pi_1/(1 - \pi_1)$. Finally, if $\pi_1 > 1/2$, then we have to set $p_{22} = 0$ and $p_{12} = (1 - \pi_1)/\pi_1$ for first-order optimality.

4 Conclusions

Two partial orderings are defined on the space of irreducible transition matrices having a common stationary distribution $\pi$. The south-west ordering involves the inverse Laplacians of the transition matrices and provides a sufficient condition for the dominating Markov chain to produce MCMC estimates of $E_\pi f$ with a smaller asymptotic variance. The second ordering is defined on the transition matrices themselves and provides a sufficient condition for the dominating Markov chain to be first order more efficient. Following the economic literature on inequality comparisons, and in particular the principle of Pigou-Dalton transfers (Dardanoni, 1993), we define stationarity preserving and efficiency increasing transfers of probability mass within a transition matrix. The optimal transition matrix that results after a sequence of such transfers resembles the matrix that we would observe when studying the distribution of two negatively correlated characters. The literature on these topics might help to extend the ideas explored in the present paper from finite state spaces to general state spaces.

Acknowledgments

I would like to thank Gareth Roberts and Peter Green for illuminating discussions and Valentino Dardanoni for bringing to my attention the paper that inspired the present work (Dardanoni, 1993).

References

Dardanoni, V. (1993), Measuring social mobility, Journal of Economic Theory, 61, 372-394.

Frigessi, A., Hwang, C. and Younes, L. (1992), Optimal spectral structure of reversible stochastic matrices, Monte Carlo methods and the simulation of Markov random fields, The Annals of Applied Probability, 2, 610-628.

Kemeny, J. G. and Snell, J. L. (1969), Finite Markov Chains, Van Nostrand, Princeton.

Mira, A. and Geyer, C. (1998), Ordering Monte Carlo Markov chains, Tech. Rep. 632, University of Minnesota.

Peskun, P. H. (1973), Optimum Monte Carlo sampling using Markov chains, Biometrika, 60, 607-612.

Tierney, L. (1995), A note on Metropolis-Hastings kernels for general state spaces, Tech. Rep. 606, University of Minnesota.