
MCMC Output &
Metropolis-Hastings Algorithm
Part I
P548: Bayesian Stats with Psych Applications
Instructor: John Miyamoto
01/18/2017: Lecture 03-2
Outline
• Metropolis-Hastings (M-H) algorithm – one of the main tools
for approximating a posterior distribution by means of
Markov chain Monte Carlo (MCMC).
• M-H draws samples from the posterior distribution.
With enough samples, you have a good approximation
to the posterior distribution.
♦ This is the central step in computing a Bayesian analysis.
♦ Today's lecture is just a quick overview; we will look at the details in a later lecture.
Outline
• Assignment 3 focuses on R-code for computing the posterior
of a binomial parameter by means of the M-H algorithm.
♦ The code in Assignment 3 is almost entirely due to Kruschke, but JM has made a few modifications and added annotations.
♦ In actual research, you will almost never execute the M-H algorithm within R. Instead, you will send the problem of sampling from the posterior over to JAGS or Stan. Nevertheless, it is useful to see the details of M-H on a simple example, and this is what you do in Assignment 3 (a minimal R sketch follows below).
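For orientation, here is a minimal R sketch of the kind of computation Assignment 3 performs: M-H sampling of a binomial rate parameter. This is not Kruschke's code; the data (14 successes in 20 trials), the Beta(1, 1) prior, and the proposal sd are hypothetical choices for illustration.

```r
## Minimal Metropolis-Hastings sampler for a binomial rate parameter theta.
## Hypothetical data: z successes in N trials; prior is Beta(a, b).
set.seed(548)

z <- 14; N <- 20          # hypothetical data: 14 successes in 20 trials
a <- 1;  b <- 1           # Beta(1, 1) prior (uniform on [0, 1])

## Unnormalized posterior: likelihood x prior; 0 outside [0, 1]
post <- function(theta) {
  if (theta < 0 || theta > 1) return(0)
  theta^z * (1 - theta)^(N - z) * dbeta(theta, a, b)
}

K <- 50000                # number of MCMC samples
theta <- numeric(K)
theta[1] <- 0.5           # arbitrary starting value

for (k in 1:(K - 1)) {
  proposal <- rnorm(1, mean = theta[k], sd = 0.2)   # symmetric normal proposal
  R <- post(proposal) / post(theta[k])              # posterior odds
  if (runif(1) < R) theta[k + 1] <- proposal        # accept the candidate
  else              theta[k + 1] <- theta[k]        # reject: keep the old value
}

## The sample approximates the true posterior, which here is Beta(z + a, N - z + b)
c(mcmc = mean(theta), exact = (z + a) / (N + a + b))   # posterior means agree closely
```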
Three Strategies of Bayesian Statistical Inference
(Flowchart.) Define the class of statistical models; reality is assumed to lie within this class of models:
• Define prior distributions
• Define likelihoods conditional on parameters
Combine these with the data, then compute the posterior by one of three routes:
• Compute the posterior from conjugate priors (if possible)
• Compute the posterior with grid approximation (if practically possible)
• Compute an approximate posterior by an MCMC algorithm (if possible)
General Strategy of Bayesian Statistical Inference
(Same flowchart as the previous slide, with a “This lecture” marker: today's topic is the MCMC route, computing an approximate posterior by an MCMC algorithm.)
MCMC Algorithm Samples from the Posterior Distribution
Validity of the MCMC Approximation
Theorem: Under very general mathematical conditions, as the sample size K gets very large, the distribution of the MCMC sample converges to the true posterior probability distribution.
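The theorem is about MCMC chains; the small R illustration below only shows the last part of the claim, that a large enough sample pins down a distribution. It uses direct draws from a hypothetical Beta(15, 7) posterior rather than an MCMC chain, so it illustrates the idea, not the theorem itself.

```r
## Illustration (not a proof): a large sample pins down a distribution.
## Direct draws from a hypothetical Beta(15, 7) posterior, not an MCMC chain.
set.seed(1)
for (K in c(100, 100000)) {
  draws <- rbeta(K, 15, 7)
  cat("K =", K, "  2.5%, 50%, 97.5% quantiles:",
      round(quantile(draws, c(.025, .5, .975)), 3), "\n")
}
cat("Exact quantiles:", round(qbeta(c(.025, .5, .975), 15, 7), 3), "\n")
```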
Reminder About Bayes Rule
Before computing a Bayesian analysis, the researcher knows:
• θ = (θ1, θ2, ..., θn) is a vector of parameters for a statistical model.
o E.g., in a oneway ANOVA with 3 groups, θ1 = mean 1, θ2 = mean 2, θ3 = mean 3, and θ4 = the common variance of the 3 populations.
• P(θ) = the prior probability distribution over the vector θ.
• P(D | θ) = the likelihood of the data D given any particular vector θ of parameters.
(Red annotations: P(θ) and P(D | θ) are known for each specific θ; P(D) is unknown, so the posterior P(θ | D) is unknown for the entire distribution.)
• Bayes Rule:  P(θ | D) = P(D | θ) × P(θ) / P(D)
Bayes Rule
Before computing a Bayesian analysis, the researcher knows:
• θ = (θ1, θ2, ..., θn) is a vector of parameters for a statistical model.
o E.g., in a oneway ANOVA with 3 groups, θ1 = mean 1, θ2 = mean 2, θ3 = mean 3, and θ4 = the common variance of the 3 populations.
• P(θ) = the prior probability distribution over the vector θ.
• P(D | θ) = the likelihood of the data D given any particular vector θ
of parameters.
• Bayes Rule:  P(θ | D) = P(D | θ) × P(θ) / P(D)
Why Is Bayes Rule Hard to Apply in Practice?
• Fact #1: P(D | θ) is easy to compute for individual cases.
• Fact #2: P(θ) is easy to compute for individual cases.
• The Metropolis-Hastings algorithm uses Facts #1 and #2 to compute an approximation to P(θ | D), where
P(θ | D) = P(D | θ) × P(θ) / P(D)
• The hard part is the denominator, P(D) = ∫ P(D | θ) P(θ) dθ, which usually cannot be computed. M-H sidesteps it: the algorithm only uses ratios of posterior densities, and P(D) cancels out of every ratio.
Reminder: Each Sample from the Posterior
Depends only on the Immediately Preceding Step
BIG PICTURE: Metropolis-Hastings Algorithm
• At the k-th step, you have a current vector of parameter values, called Iteration k. This is your current sample.
• A “proposal function” F proposes a random new vector, Proposal k, based only on the values in Iteration k.
• A “rejection rule” decides whether the proposal is acceptable or not.
♦ If it is acceptable:  Iteration k + 1 = Proposal k
♦ If it is rejected:  Iteration k + 1 = Iteration k
• Repeat the process at the next step.
Metropolis-Hastings (M-H) Algorithm
The “Proposal” Density
• Notation: Let θk = (θk1, θk2, ..., θkn) be a vector of specific values for θ1, θ2, ..., θn that make up the k-th sample.
• Choose a "proposal" density F(θ | θk) where, for each θk = (θk1, θk2, ..., θkn), F(θ | θk) is a probability distribution over the θ ∈ Ω = the set of all parameter vectors.
Example: F(θ | θk) might be defined by:
θ1 ~ N(θk1, σ = 2), θ2 ~ N(θk2, σ = 2), ..., θn ~ N(θkn, σ = 2).
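A proposal density of this form is easy to write in code. Below is a minimal R sketch of the normal-jump proposal in the example above; the function name propose and the test vector are illustrative, and sd = 2 is taken from the example.

```r
## Draw one proposal vector from F(theta | theta_k):
## each component jumps from its current value by a Normal(0, sd) step.
propose <- function(theta_k, sd = 2) {
  rnorm(length(theta_k), mean = theta_k, sd = sd)
}

theta_k <- c(0.1, -1.3, 2.0)   # hypothetical current sample with n = 3 components
propose(theta_k)               # a candidate vector centered at theta_k
```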
MH Algorithm for Case Where Proposal Function is Symmetric
Step 1: Assume that θk is the current value of θ:
θk = (θk1, θk2, ..., θkn)
Step 2: Draw a candidate θc from F(θ | θk ), i.e. θc ~ F(θ | θk )
Step 3: Compute the posterior odds:
R = P(θc | D) / P(θk | D)
Step 4: If R ≥ 1, set θk+1 = θc.
If R < 1, draw u ~ Uniform(0, 1).
▪ If R ≥ u, set θk+1 = θc.
▪ If R < u, set θk+1 = θk.
Step 5: Set k = k+1, and return to Step 2.
Continue this process until you have
a very large sample of θ.
Now we have finished choosing the next sample, θk+1.
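As a concrete rendering of Steps 2–5 (symmetric proposal case), here is a minimal R sketch of one M-H update. It assumes only that you can evaluate something proportional to P(θ | D); the function names mh_step and post and the standard-normal target are hypothetical illustrations, not Kruschke's code.

```r
## One Metropolis-Hastings update with a symmetric normal proposal.
## 'post' must return a value proportional to P(theta | D); proportionality
## is enough because the unknown constant P(D) cancels in the ratio R.
mh_step <- function(theta_k, post, sd = 2) {
  theta_c <- rnorm(length(theta_k), mean = theta_k, sd = sd)  # Step 2: candidate
  R <- post(theta_c) / post(theta_k)                          # Step 3: posterior odds
  if (R >= 1 || runif(1) <= R) theta_c else theta_k           # Step 4: accept or keep
}

## Step 5: repeat until you have a very large sample.
## Hypothetical example: one parameter whose "posterior" is standard normal.
post  <- function(theta) exp(-theta^2 / 2)
chain <- numeric(10000)
for (k in 1:(length(chain) - 1)) chain[k + 1] <- mh_step(chain[k], post)
mean(chain); sd(chain)    # should be near 0 and 1 for a large chain
```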
Closer Look at Steps 3 & 4
Step 3: Compute the posterior odds:
R = P(θc | D) / P(θk | D),  where θc = the “candidate” sample and θk = the previously accepted k-th sample.
Step 4: If R ≥ 1.0, set θk+1 = θc.
If R < 1.0, draw random u ~ Uniform(0, 1).
▪ If R ≥ u, set θk+1 = θc.
▪ If R < u, set θk+1 = θk.
♦ If P(θc | D) ≥ P(θk | D), then R ≥ 1.0, so it is certain that θk+1 = θc.
♦ If P(θc | D) < P(θk | D), then R is the probability that θk+1 = θc (see the small check below).
♦ Conclusion: The MCMC chain tends to jump towards high-probability regions of the posterior, but it can also jump to low-probability regions.
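To check that Step 4 accepts a candidate with probability min(1, R) when R < 1, here is a small R simulation; the value R = 0.3 is an arbitrary illustration.

```r
## Step 4 as code: accept when R >= 1, otherwise accept with probability R.
set.seed(2)
R <- 0.3                                   # hypothetical posterior odds < 1
accepted <- replicate(100000, R >= runif(1))
mean(accepted)                             # close to 0.3 = min(1, R)
```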
Go to Handout on R-code for Metropolis-Hastings - END
Wednesday, January 18, 2017: The Lecture Ended Here