MCMC Output & the Metropolis-Hastings Algorithm, Part I
P548: Bayesian Stats with Psych Applications
Instructor: John Miyamoto
01/18/2017: Lecture 03-2

Note: This PowerPoint presentation may contain macros that I wrote to help me create the slides. The macros aren't needed to view the slides; you can disable or delete the macros without any change to the presentation.

Outline
• The Metropolis-Hastings (M-H) algorithm is one of the main tools for approximating a posterior distribution by means of Markov chain Monte Carlo (MCMC).
• M-H draws samples from the posterior distribution. With enough samples, you have a good approximation to the posterior distribution.
♦ This is the central step in computing a Bayesian analysis.
♦ Today's lecture is just a quick overview; we will look at the details in a later lecture.
• Assignment 3 focuses on R code for computing the posterior of a binomial parameter by means of the M-H algorithm.
♦ The code in Assignment 3 is almost entirely due to Kruschke, but JM has made a few modifications and added annotations.
♦ In actual research, you will almost never execute the M-H algorithm within R. Instead, you will send the problem of sampling from the posterior over to JAGS or Stan. Nevertheless, it is useful to see the details of M-H on a simple example, and this is what you do in Assignment 3.

General Strategy of Bayesian Statistical Inference
• Define the class of statistical models (reality is assumed to lie within this class of models).
• Define the prior distributions, and define the likelihoods of the data conditional on the parameters.
• Given the data, compute the posterior in one of three ways:
♦ Compute the posterior from conjugate priors (if possible).
♦ Compute the posterior with a grid approximation (if practically possible).
♦ Compute an approximate posterior by an MCMC algorithm (if possible).
• This lecture focuses on the third strategy: computing an approximate posterior by MCMC.

MCMC Algorithm Samples from the Posterior Distribution
[Figure: an MCMC algorithm draws a big number of samples from the posterior distribution; the distribution of the samples approximates the posterior.]

Validity of the MCMC Approximation
Theorem: Under very general mathematical conditions, as the sample size K gets very large, the distribution of the samples converges to the true posterior probability distribution.

Reminder About Bayes Rule
Before computing a Bayesian analysis, the researcher knows:
• θ = (θ1, θ2, ..., θn), a vector of parameters for a statistical model.
o E.g., in a one-way ANOVA with 3 groups, θ1 = mean 1, θ2 = mean 2, θ3 = mean 3, and θ4 = the common variance of each of the 3 populations.
• P(θ) = the prior probability distribution over the vector θ; it is known for each specific θ.
• P(D | θ) = the likelihood of the data D given any particular vector θ of parameters; it is known for each specific θ.
• Bayes Rule:
    P(θ | D) = P(D | θ) P(θ) / P(D)
The numerator P(D | θ) P(θ) is known for each specific θ, but the normalizing constant P(D) is unknown, so the posterior P(θ | D) is unknown for the entire distribution.

Why Is Bayes Rule Hard to Apply in Practice?
• Fact #1: P(D | θ) is easy to compute for individual cases.
• Fact #2: P(θ) is easy to compute for individual cases.
• What is hard is P(D), which requires integrating P(D | θ) P(θ) over the entire parameter space.
• The Metropolis-Hastings algorithm uses Facts #1 and #2 to compute an approximation to P(θ | D), where
    P(θ | D) = P(D | θ) P(θ) / P(D),
without ever having to compute P(D).
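A minimal R sketch of Facts #1 and #2 for a single binomial parameter, in the spirit of Assignment 3. The specific numbers (z = 14 successes in N = 20 trials) and the Beta(1, 1) prior are illustrative assumptions, not values from the slides:

# Data: z successes in N Bernoulli trials (illustrative values).
z <- 14
N <- 20

# Fact #2: the prior P(theta) is easy to evaluate for any specific theta.
prior <- function(theta) dbeta(theta, 1, 1)

# Fact #1: the likelihood P(D | theta) is easy to evaluate for any specific theta.
likelihood <- function(theta) dbinom(z, N, theta)

# The numerator of Bayes rule is therefore cheap to compute pointwise ...
unnorm_post <- function(theta) likelihood(theta) * prior(theta)
unnorm_post(0.5)
unnorm_post(0.7)

# ... but the denominator P(D) requires integrating over the whole parameter space.
p_D <- integrate(unnorm_post, lower = 0, upper = 1)$value

In this one-parameter example the integral is easy (and with a conjugate Beta prior it is even available in closed form), but with many parameters the corresponding integral over the whole parameter space becomes intractable. That intractability is what motivates grid approximation in low dimensions and MCMC more generally.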
Reminder: Each Sample from the Posterior Depends Only on the Immediately Preceding Step
Each step of the Metropolis-Hastings algorithm depends only on the sample produced at the immediately preceding step; this is what makes the sequence of samples a Markov chain.

BIG PICTURE: The Metropolis-Hastings Algorithm
• At the k-th step, you have a current vector of parameter values, θk. This is your current sample.
• A "proposal function" F proposes a random new vector based only on the values in iteration k.
• A "rejection rule" decides whether the proposal is acceptable or not.
♦ If it is accepted: iteration k + 1 = the proposal from iteration k.
♦ If it is rejected: iteration k + 1 = iteration k.
• Repeat the process at the next step.

Metropolis-Hastings (M-H) Algorithm: The "Proposal" Density
• Notation: Let θk = (θk1, θk2, ..., θkn) be the vector of specific values of θ1, θ2, ..., θn that makes up the k-th sample.
• Choose a "proposal" density F(θ | θk) such that, for each θk = (θk1, θk2, ..., θkn), F(θ | θk) is a probability distribution over θ ∈ Ω, where Ω = the set of all parameter vectors.
• Example: F(θ | θk) might be defined by θ1 ~ N(θk1, σ = 2), θ2 ~ N(θk2, σ = 2), ..., θn ~ N(θkn, σ = 2).

M-H Algorithm for the Case Where the Proposal Function Is Symmetric
Step 1: Assume that θk = (θk1, θk2, ..., θkn) is the current value of θ.
Step 2: Draw a candidate θc from F(θ | θk), i.e., θc ~ F(θ | θk).
Step 3: Compute the posterior odds:
    R = P(θc | D) / P(θk | D)
Step 4: If R ≥ 1, set θk+1 = θc. If R < 1, draw u ~ Uniform(0, 1).
▪ If R ≥ u, set θk+1 = θc.
▪ If R < u, set θk+1 = θk.
Now we have finished choosing θk+1 = (θk+1,1, θk+1,2, ..., θk+1,n).
Step 5: Set k = k + 1 and return to Step 2. Continue this process until you have a very large sample of θ values.

Closer Look at Steps 3 & 4
Step 3: Compute the posterior odds, where θc = the "candidate" sample and θk = the previously accepted k-th sample:
    R = P(θc | D) / P(θk | D)
Note that R can be computed even though P(θ | D) itself is unknown, because the unknown normalizing constant P(D) cancels from the ratio:
    R = [P(D | θc) P(θc)] / [P(D | θk) P(θk)]
Step 4: If R ≥ 1.0, set θk+1 = θc. If R < 1.0, draw a random u ~ Uniform(0, 1).
▪ If R ≥ u, set θk+1 = θc.
▪ If R < u, set θk+1 = θk.
♦ If P(θc | D) ≥ P(θk | D), then R ≥ 1.0, so it is certain that θk+1 = θc.
♦ If P(θc | D) < P(θk | D), then R is the probability that θk+1 = θc.
♦ Conclusion: The MCMC chain tends to jump towards high-probability regions of the posterior, but it can also jump to low-probability regions.
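The steps above can be written out as a short R script. The following is a minimal sketch for a single binomial parameter, in the spirit of Assignment 3 but not Kruschke's actual code; the data (z = 14 successes in N = 20 trials), the Beta(1, 1) prior, the starting value, the proposal sd, and the chain length are all illustrative assumptions:

# Minimal Metropolis-Hastings sampler for a binomial success probability theta.
z <- 14   # observed successes (illustrative)
N <- 20   # number of trials (illustrative)

# Unnormalized posterior: likelihood * prior, the numerator of Bayes rule.
# Candidates outside [0, 1] get probability 0 and so are always rejected.
unnorm_post <- function(theta) {
  if (theta < 0 || theta > 1) return(0)
  dbinom(z, N, theta) * dbeta(theta, 1, 1)
}

n_iter <- 50000
chain <- numeric(n_iter)
chain[1] <- 0.5                                    # Step 1: current value of theta

for (k in 1:(n_iter - 1)) {
  theta_k <- chain[k]
  theta_c <- rnorm(1, mean = theta_k, sd = 0.2)    # Step 2: symmetric normal proposal
  R <- unnorm_post(theta_c) / unnorm_post(theta_k) # Step 3: posterior odds (P(D) cancels)
  if (R >= 1 || runif(1) <= R) {                   # Step 4: accept with probability min(R, 1)
    chain[k + 1] <- theta_c
  } else {
    chain[k + 1] <- theta_k
  }
}                                                  # Step 5: the loop moves on to k + 1

# With a Beta(1, 1) prior the true posterior is Beta(z + 1, N - z + 1), so the
# histogram of the chain can be checked against the exact answer.
hist(chain, breaks = 50, freq = FALSE)
curve(dbeta(x, z + 1, N - z + 1), add = TRUE, lwd = 2)

In practice (not covered on these slides) the early part of the chain is usually discarded as "burn-in", and the proposal sd is tuned so that a moderate fraction of proposals is accepted.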
Return to the Slide Showing the Metropolis-Hastings Algorithm
(Repeats the M-H steps for the case of a symmetric proposal function, as shown above.)

Go to Handout on R-code for Metropolis-Hastings - END

Wednesday, January 18, 2017: The Lecture Ended Here