2.15 MCMC in a Nutshell

MCMC in a Nutshell – 1
• MCMC = “Markov Chain Monte Carlo”
• Very general method for (approximate) simulation of complex high-dimensional probability distributions
• Main idea:
– Invent a Markov chain (MC) which converges to the probability
distribution π of interest
– Simulate the MC. If
      X0 , X1 , X2 , X3 , …
  is a simulation of the chain and if g(·) is a statistic of interest, then
      ( g(X1) + g(X2) + ⋯ + g(Xn) ) / n  →  Eπ[ g ]   as n → ∞
– If g() is the indicator of event A, then the limit is Pr[A]
• Assessing variability of g() is also possible, but more difficult since
the Xi are not independent.
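As a concrete illustration of the ergodic average above, here is a toy chain on two states whose stationary distribution is (2/3, 1/3); the transition matrix and function names below are hypothetical, chosen only for this sketch:

```python
import random

# Hypothetical two-state chain; this transition matrix has
# stationary distribution pi = (2/3, 1/3).
P = [[0.9, 0.1],
     [0.2, 0.8]]

def ergodic_average(n_steps, seed=0):
    """Average g(X1), ..., g(Xn) with g = indicator of state 1."""
    rng = random.Random(seed)
    x, hits = 0, 0
    for _ in range(n_steps):
        # jump according to row x of P
        x = 0 if rng.random() < P[x][0] else 1
        hits += (x == 1)
    return hits / n_steps

estimate = ergodic_average(200_000)
# By the ergodic theorem, the estimate converges to Pr_pi[state 1] = 1/3
```

With 200,000 correlated steps the estimate lands close to 1/3, even though no single Xi is drawn from the stationary distribution.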
MCMC in a Nutshell – 2
• Ω = finite set (usually astronomically large)
• π(x), x ∈ Ω, probability distribution of interest
• Must be able to quickly compute π(x) up to a normalizer. It is not
necessary to know the normalizer.
• Examples:
– Thematic raster maps
• Ω = set of thematic maps on a regular grid (for a 1024×1024 grid, #Ω ≈ k^1,000,000, where k = number of themes)
• π(x) = Z^(–1) e^(–H(x)), x ∈ Ω
• H(x) = Gibbs potential (specific parametric forms involving
neighboring grid cells are typically used for H(x) )
• Z = Σx e^(–H(x)) = normalizer (called the partition function in Statistical Mechanics)
– Linear extensions of partially ordered sets
• S = partially ordered set
• Ω = set of rankings of S consistent with the partial order (called linear extensions)
• π = uniform distribution, π(x) = Z^(–1) where Z = #Ω
• R(a, i) ⊆ Ω is the set of linear extensions which assign rank i to a given a ∈ S
• i ↦ Prπ[ R(a, i) ] is called the fuzzy rank or rank-frequency distribution of a ∈ S
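The linear-extension example can be made concrete by brute force on a small poset (the poset and helper names below are hypothetical; brute force only works when S is tiny, whereas MCMC targets the astronomically large case):

```python
from itertools import permutations

# Hypothetical poset on S = {a, b, c, d} with relations a < b, a < c, b < d
S = ['a', 'b', 'c', 'd']
relations = [('a', 'b'), ('a', 'c'), ('b', 'd')]

def linear_extensions():
    """Enumerate Omega: total orders of S consistent with the partial order."""
    exts = []
    for perm in permutations(S):
        pos = {s: i for i, s in enumerate(perm)}
        if all(pos[u] < pos[v] for u, v in relations):
            exts.append(perm)
    return exts

Omega = linear_extensions()
Z = len(Omega)   # the normalizer: number of linear extensions (#Omega)

def fuzzy_rank(a):
    """The map i -> Pr[R(a, i)] under the uniform distribution on Omega."""
    counts = [0] * len(S)
    for ext in Omega:
        counts[ext.index(a)] += 1
    return [c / Z for c in counts]
```

Here Z = 3 and fuzzy_rank('a') = [1.0, 0.0, 0.0, 0.0], since a must precede b, c, and (through b) d in every linear extension.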
MCMC in a Nutshell – 3
Markov Chain Preliminaries - a
• Ω = finite configuration (‘state’) space
• p(x,y) , x,y ∈ Ω, a row-stochastic transition matrix on Ω
• Current state jumps around from configuration to configuration
in Ω:
      ⋯ → x → y → ⋯
• Given that x is the current configuration, the next configuration y
is selected according to row x of the transition matrix, p(x, ·), and
is independent of the previous history
• When is there a unique limiting distribution π(x), x ∈ Ω ?
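The jump rule just described (select the next configuration from row x of p) can be sketched directly; function names are my own:

```python
import random

def step(p_row, rng):
    """Select the next configuration according to one row of the transition matrix."""
    u, acc = rng.random(), 0.0
    for y, prob in enumerate(p_row):
        acc += prob
        if u < acc:
            return y
    return len(p_row) - 1  # guard against floating-point round-off

def simulate(P, x0, n_steps, seed=0):
    """Generate X0, X1, ..., Xn; each step depends only on the current state."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        path.append(step(P[path[-1]], rng))
    return path
```

For example, `simulate([[0.5, 0.5], [0.5, 0.5]], 0, 100)` returns a path of 101 states, each chosen by a fair coin from the previous one.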
MCMC in a Nutshell – 4
Markov Chain Preliminaries - b
Conditions for π(x), x ∈ Ω, to be the unique limiting distribution:
a) π should be a stationary distribution for p, i.e.,
      Σx π(x) p(x,y) = π(y)   for all y ∈ Ω
b) p should be irreducible, i.e., it is possible to reach each
configuration from every other configuration
c) p should be aperiodic, i.e., for each x ∈ Ω,
      gcd{ n : x is reachable from x in n steps } = 1
Usually, condition (a) is replaced by the stronger requirement that π
satisfy detailed balance with respect to p, i.e.,
      π(x) p(x,y) = π(y) p(y,x)   for all x, y ∈ Ω
In MCMC, this gives a simple algebraic requirement for determining
p in terms of π.
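Both conditions on π can be checked numerically for a small chain. The sketch below (hypothetical helper names) also illustrates why detailed balance is the stronger condition: summing π(x)p(x,y) = π(y)p(y,x) over x recovers stationarity, because each row of p sums to one.

```python
def is_stationary(pi, P, tol=1e-9):
    """Condition (a): sum_x pi(x) p(x,y) == pi(y) for every y."""
    n = len(pi)
    return all(abs(sum(pi[x] * P[x][y] for x in range(n)) - pi[y]) < tol
               for y in range(n))

def detailed_balance(pi, P, tol=1e-9):
    """pi(x) p(x,y) == pi(y) p(y,x) for every pair x, y."""
    n = len(pi)
    return all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) < tol
               for x in range(n) for y in range(n))

# This P is in detailed balance with pi, hence pi is stationary for P.
pi = [2/3, 1/3]
P = [[0.9, 0.1],
     [0.2, 0.8]]
```

Checking both properties on the example above confirms them; swapping in a matrix whose stationary distribution differs from pi makes `is_stationary` fail.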
MCMC in a Nutshell – 5
Implementation of MCMC
• Algorithm:
– With x as current configuration, propose a tentative new configuration y
– Either accept y as the next configuration or continue with x as the next configuration.
This is not rejection sampling; either y or x becomes the next configuration.
• Proposal Rule: Given x, propose y according to a row-stochastic
matrix q(x,y)
• Acceptance Rule: Accept y with probability α(x,y)
• Formal Definition of MCMC Transition Matrix p:
      p(x,y) = q(x,y) α(x,y)                    if y ≠ x
      p(x,y) = 1 − Σ{y′ : y′ ≠ x} p(x,y′)       if y = x
• Two Questions:
– How to choose the acceptance probability to get the desired distribution π(x) ?
– Where does the proposal transition matrix q(x,y) come from ? The amazing
thing about MCMC is that almost any q will work (not necessarily efficiently);
the acceptance probability α adjusts q to give the desired π(x)
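The accept-or-stay step above can be written generically, with the proposal rule and acceptance probability passed in as functions (a minimal sketch with hypothetical names; how to choose α is the subject of the next slide):

```python
import random

def mcmc_step(x, propose, alpha, rng):
    """One MCMC transition: draw y ~ q(x, .), accept with probability alpha(x, y).
    On rejection the chain stays at x: x itself is the next configuration,
    so this is not rejection sampling."""
    y = propose(x, rng)
    if rng.random() < alpha(x, y):
        return y
    return x
```

For example, with alpha(x, y) always equal to 1 every proposal is accepted and the chain simply follows q.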
MCMC in a Nutshell – 6
Hastings-Metropolis Rule for Acceptance Probability α
• Choose the acceptance probability to force detailed balance:
      π(x) p(x,y) = π(y) p(y,x)   for all x, y ∈ Ω
• Trivially holds when x = y ; otherwise, p(x,y) = q(x,y) α(x,y), so we
need
      π(x) q(x,y) α(x,y) = π(y) q(y,x) α(y,x)
• The strategy is to make the acceptance probability as large as
possible. So take the α on one side of the equation to be unity and solve
for the α on the other side of the equation. This gives the
Hastings-Metropolis Rule:
      α(x,y) = π(y) q(y,x) / ( π(x) q(x,y) )
(Interpret the right-hand side to be unity if it is greater than one)
• Special Case: If q(x,y) is symmetric, the Metropolis Rule applies:
      α(x,y) = π(y) / π(x)
(again interpreted as unity if greater than one)
MCMC in a Nutshell – 7
Proposal Transition Matrix q(x,y)
• About the only requirement of q is that it be irreducible, i.e., it is
possible to go from any configuration to any other configuration in a
finite sequence of steps of positive q-probability.
• The standard way of obtaining q is via a neighborhood system on
configuration space Ω: for each x ∈ Ω there is a “neighborhood” of
x, Nx ⊆ Ω, such that:
– ‘Deleted’ neighborhoods: x ∉ Nx
– Symmetry: y ∈ Nx if and only if x ∈ Ny
• Given x as the current configuration, the proposal y is a random
(equal-probability) selection from Nx . Thus,
      q(x,y) = 1 / #Nx    if y ∈ Nx
      q(x,y) = 0          otherwise
• q(x,y) is symmetric if and only if all neighborhoods have the same size
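A standard concrete instance (my own sketch, not from the slides) is single-cell change on binary maps: Nx is the set of configurations differing from x in exactly one cell, so every neighborhood has the same size, both neighborhood axioms hold, and q is symmetric:

```python
import random

def propose(x, rng):
    """Uniform selection from Nx: flip one randomly chosen cell of the binary map x."""
    y = list(x)
    i = rng.randrange(len(y))
    y[i] = 1 - y[i]     # y differs from x in exactly one cell, so y != x
    return tuple(y)

# Deleted neighborhoods: x is never proposed from itself.
# Symmetry: flipping the same cell maps y back to x, so y in Nx iff x in Ny.
```

Since #Nx equals the number of cells for every x, this q is symmetric and the Metropolis rule from the previous slide applies directly.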