Markov chain Monte Carlo and Reversible Markov chains
STK2130, Monday February 18

Problem
We want to evaluate (estimate), for some discrete random variable X,
θ = E[h(X)] = Σ_{j=0}^{N} h(j) πj.
The parameter θ may for instance be an approximation to some high-dimensional integral
θ0 = ∫ ··· ∫ h(x1, ..., xm) f(x1, ..., xm) dx1 ··· dxm,
and N may be a very large number. We assume, however, that the probabilities πj are known.
(One good) solution: the Hastings-Metropolis algorithm

Idea: MCMC
MCMC = Markov chain Monte Carlo.
Construct an irreducible and ergodic Markov chain X0, X1, X2, ... with stationary distribution lim_{n→∞} P(Xn = j) = πj.
Estimate θ by
θ̂ = (1/n) Σ_{i=1}^{n} h(Xi).
It is advisable to run the chain for an initial number of steps (a burn-in period), so that the process is close to the stationary distribution before one starts accumulating the average θ̂.
How could such a Markov chain be constructed?
Let Yn+1 be a candidate for the new state Xn+1, chosen from a "proposal" distribution
qij = P(Yn+1 = j | Xn = i).
Given Xn = i and Yn+1 = j, we accept the candidate value and let Xn+1 = j with probability
αij = min(1, πj qji / (πi qij)),
and otherwise reject the candidate, so that Xn+1 = Xn = i.
Then Xn has stationary distribution πj = lim_{n→∞} P(Xn = j) if the qij are transition probabilities of an irreducible MC.
Note: this implies that the chain is aperiodic and recurrent, since rejections give Pii > 0 (hence aperiodicity), and an irreducible chain possessing a stationary distribution is positive recurrent.
Why does this work?
With this construction the MC satisfies a "detailed balance equation"
πj Pji = πi Pij,
where Pij = P(Xn+1 = j | Xn = i). This obviously holds for j = i. For j ≠ i note
πi Pij = πi qij αij = πi qij min(1, πj qji / (πi qij)) = min(πi qij, πj qji)
       = πj qji min(1, πi qij / (πj qji)) = πj qji αji = πj Pji.
Summing both sides of the detailed balance equation over i gives
Σ_i πi Pij = Σ_i πj Pji = πj Σ_i Pji = πj,
determining the stationary distribution πj (along with Σ_j πj = 1).

Reversible Markov chains
With P(Xn = j) = πj the detailed balance equation can be interpreted as
P(Xn = i, Xn+1 = j) = P(Xn+1 = j | Xn = i) P(Xn = i) = πi Pij
                    = πj Pji = P(Xn+1 = i | Xn = j) P(Xn = j) = P(Xn = j, Xn+1 = i).
This has the consequence that it is impossible to tell whether the chain is run forwards or backwards, i.e. the chain is reversible.
For some MCs the detailed balance equations can simplify the derivation of the stationary distribution (a bit), for instance for the random walks in Example 4.35.
A worked toy example
We want to simulate from a distribution on the states 0, 1, 2, 3 and 4 with
π = (π0, π1, ..., π4) = (0.3, 0.2, 0.1, 0.1, 0.3).
With the uniform proposal distribution qij = P(Yn+1 = j | Xn = i) = 0.2 we get for instance
P01 = q01 min(1, π1 q10 / (π0 q01)) = (1/5) · (0.2/0.3) = 2/15.
The full transition matrix becomes

        | 8/15  2/15  1/15  1/15  3/15 |
        | 2/10  4/10  1/10  1/10  2/10 |
  P =   | 2/10  2/10  2/10  2/10  2/10 |
        | 2/10  2/10  2/10  2/10  2/10 |
        | 3/15  2/15  1/15  1/15  8/15 |
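A quick numerical check of this matrix (a minimal sketch; π and P are simply typed in from the slide) confirms that the rows sum to one, that π is stationary, and that detailed balance holds:

pi <- c(0.3, 0.2, 0.1, 0.1, 0.3)
P  <- rbind(c(8, 2, 1, 1, 3) / 15,
            c(2, 4, 1, 1, 2) / 10,
            c(2, 2, 2, 2, 2) / 10,
            c(2, 2, 2, 2, 2) / 10,
            c(3, 2, 1, 1, 8) / 15)
rowSums(P)                      # all equal to 1: P is a transition matrix
as.vector(pi %*% P)             # equals pi: pi is stationary
M <- pi * P                     # M[i,j] = pi_i * P_ij
all(abs(M - t(M)) < 1e-12)      # TRUE: detailed balance pi_i P_ij = pi_j P_ji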
A simulation
[Figure: a sample path of the chain (states 0-4, time 0-80) and the autocorrelation function of the series X (lags 0-25).]
The ACF panel shows the autocorrelation corr(Xn, Xn+k), which tends to 0 quickly.
Another proposal distribution
Let the matrix of the qij equal

        | 0.5  0.5   0    0    0  |
        | 1/3  1/3  1/3   0    0  |
  Q =   |  0   1/3  1/3  1/3   0  |
        |  0    0   1/3  1/3  1/3 |
        |  0    0    0   0.5  0.5 |

To calculate the transition matrix P for the chain Xn it is only necessary to calculate αij for the pairs with qij > 0.
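As a sketch (with π from the toy example), P can be assembled from Q by computing αij only where qij > 0 and putting the rejection mass on the diagonal:

pi <- c(0.3, 0.2, 0.1, 0.1, 0.3)
Q  <- rbind(c(1/2, 1/2,   0,   0,   0),
            c(1/3, 1/3, 1/3,   0,   0),
            c(  0, 1/3, 1/3, 1/3,   0),
            c(  0,   0, 1/3, 1/3, 1/3),
            c(  0,   0,   0, 1/2, 1/2))
P <- matrix(0, 5, 5)
for (i in 1:5) for (j in 1:5)
  if (i != j && Q[i, j] > 0)        # alpha_ij is needed only when q_ij > 0
    P[i, j] <- Q[i, j] * min(1, pi[j] * Q[j, i] / (pi[i] * Q[i, j]))
diag(P) <- 1 - rowSums(P)           # self-proposals plus all rejected moves stay at i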
A simulation with the new prop. distr.
[Figure: a sample path of the chain (states 0-4, time 0-80) and the ACF of the series X (lags 0-40).]
There are larger dependencies in this chain, and longer runs will be needed to calculate θ = E[h(X)] precisely.

Choice of qij
In the preceding example the uniform qij = 0.2 gives smaller dependencies between Xn and Xn+k than the second proposal distribution. This shows that there is a need to look for "good" qij. It is, however, not always the case that a uniform distribution is best: when the number of states is large, a proposal involving just some neighbours can often be preferred.

No need to calculate P
When the number of states N + 1 is large it is inconvenient to calculate the transition probabilities Pij. But there is really no need to do so. The sampling can always be carried out in two steps:
(a) Given Xn = i, sample Yn+1 = j from qij.
(b) Calculate αij and sample u ~ U[0, 1].
    If u ≤ αij then Xn+1 = j;
    otherwise Xn+1 = Xn = i.
For instance, my R-script for simulating with the second proposal distribution was written without calculating P:

# px = the target probabilities (pi_0, ..., pi_4); q = the proposal matrix Q
X <- 0
for (i in 1:10000){
  y <- sample(0:4, 1, prob = q[X[i] + 1, ])             # propose Y_{n+1} from q(X_n, .)
  alpha <- min(1, px[y + 1] * q[y + 1, X[i] + 1] /
                  (px[X[i] + 1] * q[X[i] + 1, y + 1]))  # alpha_ij = min(1, pi_j q_ji / (pi_i q_ij))
  u <- runif(1)
  if (u <= alpha) X[i + 1] <- y else X[i + 1] <- X[i]   # accept, otherwise stay
}

Estimation of expectations
Initially we stated that a purpose of MCMC is to evaluate expectations θ = E[h(X)] = Σ_{j=0}^{N} h(j) πj. For the toy example we can immediately calculate
θ1 = E[X] = Σ_{j=0}^{4} j πj = 1.9,
θ2 = E[e^X] = Σ_{j=0}^{4} e^j πj ≈ 19.97.
[Figure: running means estimating E[X] (left) and E[exp(X)] (right) for n up to 10000, for the uniform q and the neighbour q, with the true values marked.]
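A minimal sketch of how such estimates and running means could be computed from the simulated chain (assuming the vector X produced by the script above):

burnin <- 1000
mean(X[-(1:burnin)])                 # estimate of E[X], true value 1.9
mean(exp(X[-(1:burnin)]))            # estimate of E[exp(X)], true value about 19.97
running <- cumsum(X) / seq_along(X)  # running mean of X, as plotted against n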
Partially determined πj
In many problems it is only possible to specify the πj up to a constant; that is, we know πj = bj/C with the bj known and the normalizing constant C unknown. Hastings-Metropolis is constructed to handle this, because the constant cancels out in
αij = min(1, πj qji / (πi qij)) = min(1, bj qji / (bi qij)).

Partially determined πj, contd.
Such partially determined πj occur, for instance, often in Bayesian statistics (STK4021, H2013), which involves calculating conditional probabilities using Bayes' rule:
P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / Σ_y P(X = x | Y = y) P(Y = y).
Here we may know P(X = x | Y = y) and P(Y = y), but there may be a large number of possible y, so the normalizing constant
C = Σ_y P(X = x | Y = y) P(Y = y)
can be hard to evaluate.
A more complicated example
Suppose we want to simulate from a (slightly silly) distribution
πi = (1/C) f(i),   where f(i) = exp(−β|i| + |i| log(α) + γ sin(2πi)),
for whole numbers i ∈ Z, for certain values α = 2.222, β = 1.111 and γ = 7.777.
[Figure: the function f and its components exp(−beta * abs(ival)), exp(abs(ival) * log(alpha)) and exp(gamm * sin(pi * ival)), plotted for ival from −30 to 30.]
We cannot sample candidates uniformly over Z; instead we simply choose neighbours of Xn = i, so the candidate values are
Yn+1 = i + 1 with probability 0.5,  Yn+1 = i − 1 with probability 0.5.
Acceptance probabilities:
αij = min(1, f(j)/f(i)) for j = i − 1 and j = i + 1.
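A minimal sketch of this sampler in R (parameter values from the slide; note that f is used unnormalized, so the constant C is never needed):

alpha <- 2.222; beta <- 1.111; gamm <- 7.777
f <- function(i) exp(-beta * abs(i) + abs(i) * log(alpha) + gamm * sin(2 * pi * i))
X <- 0
for (n in 1:10000) {
  y <- X[n] + sample(c(-1, 1), 1)       # propose one of the two neighbours
  a <- min(1, f(y) / f(X[n]))           # q is symmetric, so alpha_ij = min(1, f(j)/f(i))
  X[n + 1] <- if (runif(1) <= a) y else X[n]
}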
Results for the "funny" function
[Figure: a sample path of the Markov chain over 10000 steps, a normalised histogram of the visited states, and running means of X, X^2 and exp(X) against time.]
Example 4.39 in Ross
Let S = the set of all permutations (x1, ..., xn) of 1, 2, ..., n that satisfy Σ_{j=1}^{n} j·xj > α for some α. We want to sample uniformly from S, maybe in order to
• estimate the average of Σ_{j=1}^{n} j·xj over S, or
• estimate the number of elements K in S.
Define a neighbour of s = (x1, x2, ..., xn) as a permutation where only two elements have changed place. Let N(s) = the number of neighbours of s in S.
Sample candidates uniformly over the neighbours. Since the uniform distribution over S equals π(s) = 1/K, we get acceptance probabilities
αs,t = min(1, N(s)/N(t)).
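A hedged sketch of this sampler for small n (the sizes and helper names are our own choices; N(s) is found by brute force over all pairwise swaps, and we assume every visited state has at least one neighbour in S):

n <- 5; alph <- 45                                   # toy sizes (our choice)
swaps <- combn(n, 2)                                 # all pairs of positions to swap
inS   <- function(s) sum(seq_len(n) * s) > alph      # the condition defining S
neighbours <- function(s) {                          # neighbours of s that lie in S
  res <- list()
  for (k in seq_len(ncol(swaps))) {
    cand <- s
    cand[swaps[, k]] <- cand[rev(swaps[, k])]        # swap two elements
    if (inS(cand)) res[[length(res) + 1]] <- cand
  }
  res
}
s <- 1:n                                             # identity permutation, maximizes sum j*x_j, so in S
vals <- numeric(10000)
for (m in 1:10000) {
  nb   <- neighbours(s)
  prop <- nb[[sample(length(nb), 1)]]                # uniform proposal over the N(s) neighbours
  a    <- min(1, length(nb) / length(neighbours(prop)))  # alpha_{s,t} = min(1, N(s)/N(t))
  if (runif(1) <= a) s <- prop
  vals[m] <- sum(seq_len(n) * s)
}
mean(vals)                                           # estimates the average of sum_j j*x_j over S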
Time reversed Markov chains
Let ..., X−n, X−n+1, ..., X−1, X0, X1, ..., Xn, Xn+1, ... be a Markov chain on times Z with a stationary distribution πj = P(Xn = j) for all n.
If we reverse the chain, Yn = X−n, then Yn is also a MC with stationary distribution πj = P(Yn = j).
Thus we can work out the transition probabilities Qij = P(Yn+1 = j | Yn = i) as
Qij = P(Xn = j | Xn+1 = i) = P(Xn = j, Xn+1 = i) / P(Xn+1 = i)
    = P(Xn = j) P(Xn+1 = i | Xn = j) / P(Xn+1 = i) = πj Pji / πi,
or πi Qij = πj Pji.

Time reversible MCs again
Thus we see that if Qij = Pij we obtain
πi Pij = πj Pji,
i.e. the detailed balance equation, or the condition for being a time-reversible Markov chain.
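A sketch checking this numerically for the toy chain from earlier (π and P as on that slide): the reversed chain has Qij = πj Pji / πi, and here Q = P, so the chain is time reversible:

pi <- c(0.3, 0.2, 0.1, 0.1, 0.3)
P  <- rbind(c(8, 2, 1, 1, 3) / 15,
            c(2, 4, 1, 1, 2) / 10,
            c(2, 2, 2, 2, 2) / 10,
            c(2, 2, 2, 2, 2) / 10,
            c(3, 2, 1, 1, 8) / 15)
Q <- t(P * pi) / pi             # Q[i,j] = pi_j * P[j,i] / pi_i
all.equal(Q, P)                 # TRUE: the toy chain is time reversible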