Bayesian Analysis of Agricultural Experiments: When Everything Is Random

Shizhong Xu
Department of Botany and Plant Sciences
University of California, Riverside, CA 92521
[email protected]

Outline
- Introduction to Bayesian statistics
- Markov chain Monte Carlo algorithm
- Assessing Markov chain convergence
- Summary statistics (post-MCMC analysis)
- An example (linear regression)
- Software for Bayesian analysis

Bayesian Inference and the Frequentist Method
- Bayesian method: parameters are random variables. Frequentist method: parameters are constants.
- Bayesian method: conditional probability. Frequentist method: maximum likelihood (the classical method).
- Bayesian method: introduced in 1764. Frequentist method: introduced in 1922.
- Bayesians: people who do Bayesian analysis. Frequentists: people who do maximum likelihood analysis.

Thomas Bayes (1702?-1761)
- British mathematician
- Bayes' theorem (1764)

Ronald A. Fisher (1890-1962)
- Statistician, evolutionary biologist, geneticist, and eugenicist
- Maximum likelihood method (1912, 1922)

Bayesian Inference and Maximum Likelihood
- Bayes, Thomas. 1764. An essay towards solving a problem in the doctrine of chances. Communicated by Mr. Price (Bayes' friend) in 1763 in a letter to John Canton. Philosophical Transactions of the Royal Society of London 53: 269-271.
- Fisher, Ronald A. 1922. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society, A, 222: 309-368.

Bayes' Theorem (Conditional Probability)

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)} = \frac{P(A)\,P(B \mid A)}{\sum_{A} P(A)\,P(B \mid A)}$$

- $P(A \mid B)$ is the conditional probability of A given B
- $P(B \mid A)$ is the conditional probability of B given A
- $P(A)$ is the prior probability of A (before B is observed)
- $P(B)$ is the marginal probability of B (acts as a normalizing constant)

Bayesian Inference

$$P(\theta \mid y) = \frac{P(\theta)\,P(y \mid \theta)}{P(y)} = \frac{P(\theta)\,P(y \mid \theta)}{\int P(\theta)\,P(y \mid \theta)\,d\theta} \propto P(\theta)\,P(y \mid \theta)$$

- $P(\theta \mid y)$ is the conditional density of $\theta$ given $y$ (the Bayesian inference)
- $P(y \mid \theta)$ is the conditional density of $y$ given $\theta$ (the likelihood)
- $P(\theta)$ is the prior density of $\theta$ (the prior distribution)
- $P(y)$ is the marginal density of $y$ (acts as a normalizing constant)
- Commonly used notation: $P(\theta \mid y) \propto P(y \mid \theta)\,P(\theta) = L(y \mid \theta)\,\pi(\theta)$

Multivariate Bayesian Inference
- $\theta = \{\theta_1, \ldots, \theta_m\}$, where $\theta_k$ is the $k$th element of $\theta$ and $\theta_{-k} = \{\theta_1, \ldots, \theta_{k-1}, \theta_{k+1}, \ldots, \theta_m\}$
- Posterior joint distribution: $P(\theta_1, \ldots, \theta_m \mid y) \propto P(\theta_1, \ldots, \theta_m)\,P(y \mid \theta_1, \ldots, \theta_m)$
- Posterior marginal distribution: $P(\theta_k \mid y) \propto \int \cdots \int P(\theta)\,P(y \mid \theta)\,d\theta_{-k}$
- High-dimensional multiple integration is involved; a closed form of $P(\theta_k \mid y)$ exists only in a few of the simplest cases.
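To make the normalizing constant concrete, here is a minimal numerical sketch in Python (not part of the original slides) that evaluates the posterior of a binomial success probability on a grid: the prior is multiplied by the likelihood and then rescaled so the result integrates to one. The Beta(2, 2) prior and the data (7 successes in 10 trials) are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Grid of candidate parameter values
theta = np.linspace(0.001, 0.999, 999)

# Prior P(theta): Beta(2, 2), an assumed mildly informative prior
prior = stats.beta.pdf(theta, 2, 2)

# Likelihood P(y | theta): 7 successes in 10 Bernoulli trials (assumed data)
likelihood = stats.binom.pmf(7, 10, theta)

# Unnormalized posterior, then normalize so it integrates to one
unnorm = prior * likelihood
posterior = unnorm / np.trapz(unnorm, theta)   # P(y) approximated by numerical integration

# Posterior mean and an equal-tail 95% interval read off the grid
post_mean = np.trapz(theta * posterior, theta)
cdf = np.cumsum(posterior) * (theta[1] - theta[0])
lower, upper = theta[np.searchsorted(cdf, 0.025)], theta[np.searchsorted(cdf, 0.975)]
print(post_mean, (lower, upper))
```

In one dimension this brute-force integration is enough; the MCMC machinery discussed later replaces it when θ has many elements and the integral over θ₋ₖ cannot be evaluated directly.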
Advantages of Bayesian Analysis
- A natural way of combining prior information with data.
- Inferences are based on the data and are exact, without reliance on asymptotic approximation.
- It provides interpretable answers, such as "the true parameter has a probability of 0.95 of falling in a 95% credible interval."
- It provides a convenient setting for arbitrarily complicated parametric models, e.g., hierarchical models and missing data problems.

Disadvantages of Bayesian Analysis
- It requires skill to translate subjective prior beliefs into a mathematically formulated prior.
- If you do not proceed with caution, you can generate misleading results.
- It can produce posterior distributions that are heavily influenced by the priors.
- From a practical point of view, it may be difficult to convince experts who do not accept the validity of the chosen prior.
- It often comes with a high computational cost, especially in models with a large number of parameters.
- It is often implemented via MCMC, which has problems of its own.

Prior Distributions
- Objective priors versus subjective priors
- Noninformative priors (vague, flat, diffuse)
- Proper priors (integrate to one, so the posterior always exists) versus improper priors (do not integrate to a finite value, so the posterior may not exist)
- Informative priors (can shift or even dominate the likelihood; based on a previous study, past experience, or expert opinion)
- Conjugate priors (the prior and posterior distributions belong to the same family)
- Jeffreys' prior (locally uniform)

Posterior Distribution
- Posterior mean, mode, percentiles
- Posterior standard deviation
- Equal-tail credible interval
- Highest posterior density (HPD) interval

Credible Interval and Confidence Interval
- Credible interval: $\Pr(\theta < a \mid y) = \Pr(\theta > b \mid y) = 0.025$, which leads to $\Pr(a < \theta < b \mid y) = 0.95$. Given the data, $\theta$ falls between $a$ and $b$ with 95% probability.
- Confidence interval: if the experiment were repeated an infinite number of times, 95% of the resulting intervals would cover the true value. This interpretation is purely hypothetical, because the experiment is never going to be replicated, and if it were, the number of replications would never be infinite.

Credible Intervals (Bayesian Analysis): equal-tail interval; highest posterior density interval (figure slide)
Confidence Interval (ML Analysis) (figure slide)

Outline
Introduction to Bayesian statistics · Markov chain Monte Carlo algorithm · Assessing Markov chain convergence · Summary statistics (post-MCMC analysis) · An example (linear regression) · Software for Bayesian analysis

Markov Chain Monte Carlo

$$P(\theta_k \mid y) \propto \int \cdots \int P(\theta_1, \ldots, \theta_m)\,P(y \mid \theta_1, \ldots, \theta_m)\,d\theta_{-k} \qquad \text{(posterior marginal distribution)}$$

$$P(\theta_k \mid y) \approx \frac{1}{M}\sum_{i=1}^{M} g\!\left(\theta_k^{(i)} \mid y\right) \qquad \text{(approximated via Monte Carlo simulation)}$$

- Multivariate integration is avoided.
- Monte Carlo simulation draws one or a few variables at a time.
- Only random number generators for simple distributions are needed.
- M must be sufficiently large to reduce the Monte Carlo error.

MCMC (Simple with Computer Simulation)
- Simple, so you and I can do it
- Increased computer power
- Popular as a result (Bayes vs. ML)
- Can be dangerous and produce misleading results
- More of an art than a science

Sampling from the Posterior Conditional Distribution
- The target distribution is the posterior marginal distribution.
- Sample each parameter from its posterior conditional distribution.
- When the Markov chain is sufficiently long, it reaches the stationary distribution.
- The stationary distribution is the posterior joint distribution.
- The product of MCMC is a posterior sample of all parameters.

Sampling from the Posterior Conditional Distribution (each conditional uses the most recently updated values of the other parameters)

$$\theta_1^{(t+1)} \sim P(\theta_1 \mid \theta_2^{(t)}, \ldots, \theta_m^{(t)}, y) \propto P(y \mid \theta_1, \theta_2^{(t)}, \ldots, \theta_m^{(t)})\,P(\theta_1, \theta_2^{(t)}, \ldots, \theta_m^{(t)})$$

$$\theta_2^{(t+1)} \sim P(\theta_2 \mid \theta_1^{(t+1)}, \theta_3^{(t)}, \ldots, \theta_m^{(t)}, y) \propto P(y \mid \theta_2, \theta_1^{(t+1)}, \theta_3^{(t)}, \ldots, \theta_m^{(t)})\,P(\theta_2, \theta_1^{(t+1)}, \theta_3^{(t)}, \ldots, \theta_m^{(t)})$$

$$\vdots$$

$$\theta_m^{(t+1)} \sim P(\theta_m \mid \theta_1^{(t+1)}, \ldots, \theta_{m-1}^{(t+1)}, y) \propto P(y \mid \theta_m, \theta_1^{(t+1)}, \ldots, \theta_{m-1}^{(t+1)})\,P(\theta_m, \theta_1^{(t+1)}, \ldots, \theta_{m-1}^{(t+1)})$$
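As a concrete instance of drawing each parameter from its full conditional, the Python sketch below (not from the slides) runs a Gibbs sampler for a bivariate normal target with correlation 0.8, standing in for a joint posterior whose full conditionals are known; both conditionals are univariate normals, which is what makes direct sampling possible here. The target distribution and the value 0.8 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.8          # assumed correlation of the bivariate normal target
T = 5000           # chain length
theta = np.zeros((T, 2))

for t in range(1, T):
    # theta1 | theta2 ~ N(rho * theta2, 1 - rho^2)
    theta[t, 0] = rng.normal(rho * theta[t - 1, 1], np.sqrt(1 - rho**2))
    # theta2 | theta1 ~ N(rho * theta1, 1 - rho^2), using the value just updated
    theta[t, 1] = rng.normal(rho * theta[t, 0], np.sqrt(1 - rho**2))

burnin = 1000
sample = theta[burnin:]                    # posterior sample after burn-in
print(sample.mean(axis=0), np.corrcoef(sample.T)[0, 1])
```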
Gibbs Sampler, Metropolis Sampler and Metropolis-Hastings Sampler
- Gibbs sampler: directly sample each variable from its fully conditional posterior distribution
- Metropolis sampler: random walk (symmetric proposal distribution)
- Metropolis-Hastings sampler: general method (asymmetric proposal distribution)

Metropolis-Hastings and MCMC
- Gibbs sampling is the special case of MCMC in which the proposal equals the full conditional posterior, so every proposal is accepted.
- Acceptance-rejection:

$$\text{Metropolis sampler: } r = \min\left\{\frac{p(\theta^{\mathrm{new}} \mid y)}{p(\theta^{(t)} \mid y)},\; 1\right\}$$

$$\text{Metropolis-Hastings sampler: } r = \min\left\{\frac{p(\theta^{\mathrm{new}} \mid y)\,q(\theta^{(t)} \mid \theta^{\mathrm{new}})}{p(\theta^{(t)} \mid y)\,q(\theta^{\mathrm{new}} \mid \theta^{(t)})},\; 1\right\}$$

- $\theta^{(t+1)} = \theta^{\mathrm{new}}$ if accepted (with probability $r$); otherwise $\theta^{(t+1)} = \theta^{(t)}$ (with probability $1 - r$).

Markov Chain

$$\theta^{(0)} \rightarrow \theta^{(1)} \rightarrow \theta^{(2)} \rightarrow \cdots \rightarrow \theta^{(T)}, \qquad \theta^{(t)} = \left(\theta_1^{(t)}, \theta_2^{(t)}, \ldots, \theta_m^{(t)}\right)^{T}$$

- The chain moves according to the transition kernel $P(\theta^{(t+1)} \mid \theta^{(t)}, y)$.
- When $T \to \infty$, the chain reaches its stationary distribution, which is the joint posterior: $\theta^{(T+1)} \sim P(\theta \mid y)$.

Posterior Sample (Final Product of MCMC)

Table 1. Posterior sample with T observations
Iteration   θ1        θ2        θ3        θ4
0           θ1^(0)    θ2^(0)    θ3^(0)    θ4^(0)
1           θ1^(1)    θ2^(1)    θ3^(1)    θ4^(1)
2           θ1^(2)    θ2^(2)    θ3^(2)    θ4^(2)
...         ...       ...       ...       ...
T           θ1^(T)    θ2^(T)    θ3^(T)    θ4^(T)

Outline
Introduction to Bayesian statistics · Markov chain Monte Carlo algorithm · Assessing Markov chain convergence · Summary statistics (post-MCMC analysis) · An example (linear regression) · Software for Bayesian analysis

Assessing Convergence of a Markov Chain
- Convergence to the stationary distribution (there is no conclusive test and no guarantee).
- Some parameters converge quickly and some converge slowly, but all parameters must converge to the stationary distribution.
- After convergence, collect observations to form a posterior sample.

Assessing Convergence of a Markov Chain
- Burn-in period: the number of iterations discarded before convergence
- Autocorrelation: measures the dependency among observations of the posterior sample
- Thinning (trimming) rate: keep one observation in every k-th iteration to reduce serial correlation

Visual Analysis of Trace Plots (figure slides)
- All parameters have to converge (perfect case)
- Some parameters do not mix well
- Burn-in period
- Poor mixing (needs a high thinning rate)
- Nonconvergence
- Autocorrelation

Statistical Diagnostic Tests for Convergence
- Gelman-Rubin R test (convergence; multiple chains required)
- Geweke z-test (convergence; single chain)
- Autocorrelation (dependency)
- Effective sample size (dependency)

Gelman-Rubin Diagnostic Test

$$\hat{R}_c = \frac{\hat{d}+3}{\hat{d}+1}\cdot\frac{\hat{V}}{W} = \frac{\hat{d}+3}{\hat{d}+1}\left(\frac{n-1}{n} + \frac{M+1}{nM}\cdot\frac{B}{W}\right), \qquad \hat{d} = \frac{2\hat{V}^2}{\widehat{\operatorname{Var}}(\hat{V})}$$

$$\widehat{\operatorname{Var}}(\hat{V}) = \left(\frac{n-1}{n}\right)^{2}\frac{1}{M}\widehat{\operatorname{Var}}(s_m^2) + \left(\frac{M+1}{nM}\right)^{2}\frac{2}{M-1}B^2 + 2\,\frac{(M+1)(n-1)}{nM^2}\cdot\frac{n}{M}\left[\widehat{\operatorname{cov}}\!\left(s_m^2, \bar{\theta}_{m\cdot}^2\right) - 2\bar{\theta}_{\cdot\cdot}\,\widehat{\operatorname{cov}}\!\left(s_m^2, \bar{\theta}_{m\cdot}\right)\right]$$

Here $n$ is the length of each chain, $M$ is the number of chains, $W$ is the average within-chain variance (the mean of the $s_m^2$), and $B$ is the between-chain variance.

Unconverged Multiple Chains / Converged Multiple Chains (figure slides)

Geweke Diagnostics

$$\bar{\theta}_1 = \frac{1}{n_1}\sum_{t=1}^{n_1}\theta^{(t)}, \qquad \bar{\theta}_2 = \frac{1}{n_2}\sum_{t=n_a}^{n}\theta^{(t)}, \qquad Z_n = \frac{\bar{\theta}_1 - \bar{\theta}_2}{\sqrt{\hat{s}_1(0)/n_1 + \hat{s}_2(0)/n_2}}$$

Geweke's Diagnostics (figure slide)

Autocorrelations

$$\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}, \quad |h| < n, \qquad \hat{\gamma}(h) = \frac{1}{n-h}\sum_{t=1}^{n-h}\left(\theta^{(t+h)} - \bar{\theta}\right)\left(\theta^{(t)} - \bar{\theta}\right), \quad 0 \le h < n$$

Autocorrelation (figure slide)

Effective Sample Size

$$\mathrm{ESS} = \frac{n}{\tau} = \frac{n}{1 + 2\sum_{k=1}^{\infty}\rho_k(\theta)}$$

Effective Sample Sizes
Parameter   ESS      Correlation Time   Efficiency   Sample Size
beta0       106.2     9.4161            0.1062       1000
beta1       106.1     9.4232            0.1061       1000
sigma1       31.5    31.7240            0.0315       1000
sigma2      116.4     8.5938            0.1164       1000
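The autocorrelation and effective-sample-size formulas can be computed directly from a stored chain. The Python sketch below is illustrative rather than the diagnostic routine behind the table above: it truncates the infinite sum over lags at the first negative autocorrelation, a common but assumed cutoff rule, and the AR(1) chain at the end is made-up test data.

```python
import numpy as np

def autocorr(chain, max_lag=200):
    """Sample autocorrelations rho(h) = gamma(h) / gamma(0)."""
    chain = np.asarray(chain, dtype=float)
    n = len(chain)
    centered = chain - chain.mean()
    gamma0 = np.mean(centered**2)
    rhos = []
    for h in range(1, min(max_lag, n - 1)):
        gamma_h = np.sum(centered[h:] * centered[:-h]) / (n - h)
        rhos.append(gamma_h / gamma0)
    return np.array(rhos)

def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum of positive-lag autocorrelations)."""
    rhos = autocorr(chain)
    # Truncate the sum at the first negative autocorrelation (assumed cutoff rule)
    cut = np.argmax(rhos < 0) if np.any(rhos < 0) else len(rhos)
    tau = 1 + 2 * np.sum(rhos[:cut])
    return len(chain) / tau

# Example: an AR(1) chain with lag-one autocorrelation 0.9
rng = np.random.default_rng(0)
x = np.zeros(10000)
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.normal()
print(effective_sample_size(x))   # roughly 10000 * (1 - 0.9) / (1 + 0.9), about 530
```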
Outline
Introduction to Bayesian statistics · Markov chain Monte Carlo algorithm · Assessing Markov chain convergence · Summary statistics (post-MCMC analysis) · An example (linear regression) · Software for Bayesian analysis

Posterior Sample: Final Product of MCMC (every tenth iteration after a burn-in of 1,000 is shown)

Iteration   beta0     beta1     beta2    beta3     s2         LogPrior   LogLike    LogPost   delta1
1001        -0.6518    0.1508   1.4204   -0.8199   0.000149   71.4095    -55.5855   15.8239    0.00565
1011        -0.5059    0.089    1.3165   -0.8546   0.000111   69.8796    -55.3764   14.5032    0.00809
1021        -0.3935   -0.2032   1.3247   -0.6918   0.000192   72.8501    -57.4954   15.3547    0.00949
1031        -0.4967   -0.0446   1.2305   -0.7546   0.000086   73.5804    -56.1259   17.4545    0.00645
1041        -0.4055    0.0753   1.1728   -0.8286   0.000125   77.5513    -56.1312   21.42      0.00776
1051        -0.5366    0.1417   1.2627   -0.5335   0.000118   76.9519    -55.8484   21.1035    0.00631
1061        -0.5082    0.1666   1.2622   -0.5695   0.000073   81.6694    -55.9616   25.7079    0.00602
1071        -0.5667    0.3094   1.4219   -1.1682   0.000107   81.155     -55.7688   25.3862    0.00648
1081        -0.5644    0.2282   1.4198   -0.9836   0.000047   85.579     -55.1402   30.4388    0.000245
1091        -0.8113    0.5319   1.5171   -0.9582   0.00005    86.1065    -57.2681   28.8384   -0.00434
1101        -0.8727    0.7558   1.738    -1.2919   0.000033   87.6737    -59.9277   27.746    -0.00714
1111        -0.7814    0.3138   1.8088   -1.3229   0.000033   87.5795    -58.6224   28.9572   -0.00752
1121        -0.789     0.4903   1.6471   -1.1649   0.000053   85.5295    -56.8125   28.717    -0.00947
1131        -0.5641    0.00334  1.513    -1.0373   0.000064   85.5708    -56.9185   28.6522   -0.0126
1141        -0.4443    0.2236   1.3024   -1.1663   0.000043   87.6287    -56.7919   30.8367   -0.0116
1151        -0.3215    0.1636   1.0776   -0.6622   0.000035   91.6861    -57.9028   33.7833   -0.00632
1161        -0.3652   -0.1745   1.0796   -0.3665   0.000018   90.3015    -56.2956   34.0059   -0.00631
1171        -0.4751   -0.1332   1.0835   -0.2229   0.000069   87.3886    -56.5182   30.8704   -0.00631
1181        -0.4606    0.1071   1.1      -0.6125   0.000039   88.117     -55.5613   32.5557   -0.00008
1191        -0.653     0.3555   1.3507   -0.8123   0.000095   80.608     -55.689    24.919     0.01
1201        -0.5943    0.3791   1.3608   -0.967    0.000064   77.0309    -55.6955   21.3354    0.0143
1211        -0.4866    0.2016   1.2843   -0.897    0.000141   76.9601    -55.3034   21.6567    0.0109
1221        -0.4528    0.4167   1.2743   -1.3167   0.000206   74.7238    -58.1844   16.5394    0.00998
1231        -0.5015    0.453    1.2511   -1.2055   0.000187   72.6304    -57.4055   15.2248    0.00552

Marginal Posterior Distributions (figure slide)

Summary Statistics (Post-MCMC Analysis)
- Posterior means, modes, medians
- Posterior quantiles (percentiles)
- Posterior standard deviations
- Equal-tail credible intervals
- Highest posterior density intervals

Summary Statistics (Post-MCMC Analysis): Posterior Summaries

Parameter   N       Mean      Std Dev    25%        50%       75%
LogBUN      10000    1.7610   0.6593      1.3173     1.7686    2.2109
HGB         10000   -0.1279   0.0727     -0.1767    -0.1287   -0.0789
Platelet    10000   -0.2179   0.5169     -0.5659    -0.2360    0.1272
Age         10000   -0.0130   0.0199     -0.0264    -0.0131    0.000492
LogWBC      10000    0.3150   0.7451     -0.1718     0.3321    0.8253
Frac        10000    0.3766   0.4152      0.0881     0.3615    0.6471
LogPBM      10000    0.3792   0.4909      0.0405     0.3766    0.7023
Protein     10000    0.0102   0.0267     -0.00745    0.0106    0.0283
SCalc       10000    0.1248   0.1062      0.0545     0.1273    0.1985

Summary Statistics (Post-MCMC Analysis): Posterior Intervals

Parameter   Alpha   Equal-Tail Interval      HPD Interval
LogBUN      0.050   ( 0.4418,  3.0477)       ( 0.4107,  2.9958)
HGB         0.050   (-0.2718,  0.0150)       (-0.2801,  0.00599)
Platelet    0.050   (-1.1952,  0.8296)       (-1.1871,  0.8341)
Age         0.050   (-0.0514,  0.0259)       (-0.0519,  0.0251)
LogWBC      0.050   (-1.2058,  1.7228)       (-1.1783,  1.7483)
Frac        0.050   (-0.3995,  1.2316)       (-0.4273,  1.2021)
LogPBM      0.050   (-0.5652,  1.3671)       (-0.5939,  1.3241)
Protein     0.050   (-0.0437,  0.0611)       (-0.0405,  0.0637)
SCalc       0.050   (-0.0935,  0.3264)       (-0.0846,  0.3322)
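Given a stored posterior sample such as the ones summarized above, the equal-tail and HPD intervals are straightforward to compute. The Python sketch below is a generic illustration (not the software routine that produced the tables); the HPD function uses the shortest-interval-among-sorted-draws approach, which assumes a roughly unimodal posterior, and the Gamma-distributed draws are made-up data chosen so that the two intervals differ visibly.

```python
import numpy as np

def equal_tail_interval(draws, alpha=0.05):
    """Equal-tail credible interval: alpha/2 probability in each tail."""
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2])

def hpd_interval(draws, alpha=0.05):
    """Highest posterior density interval: shortest interval containing 1-alpha of the draws."""
    sorted_draws = np.sort(draws)
    n = len(sorted_draws)
    m = int(np.floor((1 - alpha) * n))          # number of draws spanned by the interval
    widths = sorted_draws[m:] - sorted_draws[:n - m]
    k = np.argmin(widths)                       # start of the shortest such interval
    return np.array([sorted_draws[k], sorted_draws[k + m]])

# Example with a skewed posterior sample, where the two intervals differ
rng = np.random.default_rng(0)
draws = rng.gamma(shape=2.0, scale=1.0, size=10000)
print(equal_tail_interval(draws))
print(hpd_interval(draws))
```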
Outline
Introduction to Bayesian statistics · Markov chain Monte Carlo algorithm · Assessing Markov chain convergence · Summary statistics (post-MCMC analysis) · An example (linear regression) · Software for Bayesian analysis

Regression Analysis

$$y_j = \beta_0 + X_{j1}\beta_1 + X_{j2}\beta_2 + \varepsilon_j, \quad j = 1, \ldots, n \ (\text{n is the sample size}), \qquad \varepsilon_j \sim N(0, \sigma^2)$$

- Parameters: $\theta = \{\beta_0, \beta_1, \beta_2, \sigma^2\}$
- Data: $y = \{y_j\}_{j=1}^{n}$
- Likelihood: $L(\theta) = \prod_{j=1}^{n} N\!\left(y_j \mid \beta_0 + X_{j1}\beta_1 + X_{j2}\beta_2,\; \sigma^2\right)$
- New notation: $p(x) = N(x \mid \mu, \nu)$ is the same as $x \sim N(\mu, \nu)$, where

$$N(x \mid \mu, \nu) = \frac{1}{\sqrt{2\pi\nu}}\exp\left[-\frac{1}{2\nu}(x - \mu)^2\right]$$

Prior Distribution

$$\pi(\theta) = \pi(\beta_0)\,\pi(\beta_1)\,\pi(\beta_2)\,\pi(\sigma^2)$$
$$\pi(\beta_0) = U(\beta_0 \mid -\infty, +\infty) \approx N(\beta_0 \mid 0, 10^6)$$
$$\pi(\beta_1) = N(\beta_1 \mid 0, \sigma_1^2), \qquad \pi(\beta_2) = N(\beta_2 \mid 0, \sigma_2^2)$$
$$\pi(\sigma^2) = \mathrm{Inv}\text{-}\chi^2(\sigma^2 \mid \tau, \omega)$$

Hierarchical model:
$$\xi(\sigma_1^2) = \mathrm{Inv}\text{-}\chi^2(\sigma_1^2 \mid \tau, \omega), \qquad \xi(\sigma_2^2) = \mathrm{Inv}\text{-}\chi^2(\sigma_2^2 \mid \tau, \omega)$$

Conditional Posterior Distributions

$$p(\theta_1, \theta_2, \ldots, \theta_m \mid y) \propto L(\theta_1, \theta_2, \ldots, \theta_m)\,\pi(\theta_1, \theta_2, \ldots, \theta_m)$$
$$p(\theta_1 \mid \theta_2^{(0)}, \ldots, \theta_m^{(0)}, y) \propto L(\theta_1, \theta_2^{(0)}, \ldots, \theta_m^{(0)})\,\pi(\theta_1, \theta_2^{(0)}, \ldots, \theta_m^{(0)}) = \text{a known distribution}$$

Conditional Posterior Distributions (regression example)

$$p(\beta_0 \mid \beta_1^{(0)}, \beta_2^{(0)}, \sigma^{2(0)}, y) = N(\beta_0 \mid \mu_0, \nu_0)$$
$$p(\beta_1 \mid \beta_0^{(0)}, \beta_2^{(0)}, \sigma^{2(0)}, y) = N(\beta_1 \mid \mu_1, \nu_1)$$
$$p(\beta_2 \mid \beta_0^{(0)}, \beta_1^{(0)}, \sigma^{2(0)}, y) = N(\beta_2 \mid \mu_2, \nu_2)$$
$$p(\sigma^2 \mid \beta_0^{(0)}, \beta_1^{(0)}, \beta_2^{(0)}, y) = \mathrm{Inv}\text{-}\chi^2(\sigma^2 \mid \tau + n,\; \omega + SS), \quad SS = \sum_{j=1}^{n}\left(y_j - \beta_0^{(0)} - X_{j1}\beta_1^{(0)} - X_{j2}\beta_2^{(0)}\right)^2$$
$$p(\sigma_1^2 \mid \beta_0^{(0)}, \beta_1^{(0)}, \beta_2^{(0)}, \sigma^{2(0)}, y) = \mathrm{Inv}\text{-}\chi^2\!\left(\sigma_1^2 \mid \tau + 1,\; \omega + (\beta_1^{(0)})^2\right)$$
$$p(\sigma_2^2 \mid \beta_0^{(0)}, \beta_1^{(0)}, \beta_2^{(0)}, \sigma^{2(0)}, y) = \mathrm{Inv}\text{-}\chi^2\!\left(\sigma_2^2 \mid \tau + 1,\; \omega + (\beta_2^{(0)})^2\right)$$

Posterior Sample (Final Product of MCMC)

Table 1. Posterior sample with T observations
Iteration   β0       β1       β2       σ²       σ1²       σ2²
0           β0^(0)   β1^(0)   β2^(0)   σ²^(0)   σ1²^(0)   σ2²^(0)
1           β0^(1)   β1^(1)   β2^(1)   σ²^(1)   σ1²^(1)   σ2²^(1)
2           β0^(2)   β1^(2)   β2^(2)   σ²^(2)   σ1²^(2)   σ2²^(2)
...         ...      ...      ...      ...      ...       ...
T           β0^(T)   β1^(T)   β2^(T)   σ²^(T)   σ1²^(T)   σ2²^(T)
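A Gibbs sampler for this regression model can be written in a few dozen lines. The Python sketch below is not the SAS demonstration used in the talk: the simulated data set, the hyperparameter values (tau = omega = 1e-6), the prior variance 1e6 for the intercept, and the chain settings are assumptions, and the normal full-conditional means and variances are the standard conjugate results implied by the likelihood and priors above. Inv-chi-square(df, S) is interpreted here as S divided by a chi-square(df) draw, matching the (tau + n, omega + SS) form of the slides.

```python
import numpy as np

rng = np.random.default_rng(42)

# ----- simulate an illustrative data set (assumed, not from the slides) -----
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)
ones = np.ones(n)

# ----- hyperparameters (assumed vague values) -----
tau, omega = 1e-6, 1e-6        # Inv-chi-square prior degrees of freedom and scale sum
v_beta0 = 1e6                  # flat-like prior variance for the intercept

def draw_inv_chi2(df, scale_sum):
    """Draw from Inv-chi-square(df, scale_sum), i.e. scale_sum / chi-square(df)."""
    return scale_sum / rng.chisquare(df)

def draw_coef(x, resid, sigma2, prior_var):
    """Normal full conditional for one regression coefficient (standard conjugate result)."""
    prec = np.sum(x * x) / sigma2 + 1.0 / prior_var
    mean = np.sum(x * resid) / sigma2 / prec
    return rng.normal(mean, np.sqrt(1.0 / prec))

# ----- initial values and storage -----
b0, b1, b2, sigma2, sigma2_1, sigma2_2 = 0.0, 0.0, 0.0, 1.0, 1.0, 1.0
T = 5000
draws = np.zeros((T, 6))

for t in range(T):
    b0 = draw_coef(ones, y - x1 * b1 - x2 * b2, sigma2, v_beta0)
    b1 = draw_coef(x1, y - b0 - x2 * b2, sigma2, sigma2_1)
    b2 = draw_coef(x2, y - b0 - x1 * b1, sigma2, sigma2_2)
    resid = y - b0 - x1 * b1 - x2 * b2
    sigma2 = draw_inv_chi2(tau + n, omega + np.sum(resid ** 2))     # residual variance
    sigma2_1 = draw_inv_chi2(tau + 1, omega + b1 ** 2)              # prior variance of beta1
    sigma2_2 = draw_inv_chi2(tau + 1, omega + b2 ** 2)              # prior variance of beta2
    draws[t] = [b0, b1, b2, sigma2, sigma2_1, sigma2_2]

burnin = 1000
print(draws[burnin:, :4].mean(axis=0))   # posterior means of beta0, beta1, beta2, sigma^2
```

The stored `draws` array plays the role of the posterior sample in Table 1, and the trace plots, autocorrelations, and summary statistics discussed earlier can all be computed from it.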
Outline
Introduction to Bayesian statistics · Markov chain Monte Carlo algorithm · Assessing Markov chain convergence · Summary statistics (post-MCMC analysis) · An example (linear regression) · Software for Bayesian analysis

Software for Bayesian Analysis
- WinBUGS (Bayesian inference Using Gibbs Sampling, Windows version). Website: http://www.mrc-bsu.cam.ac.uk/bugs/
- SAS (Statistical Analysis System). Website: http://www.sas.com/

Bayesian Analysis Using SAS
- PROC GENMOD (generalized linear models)
- PROC LIFEREG (accelerated failure time models)
- PROC PHREG (piecewise constant baseline hazard model)
- PROC MIXED (mixed model analysis)
- PROC QTL (quantitative trait loci)
- PROC MCMC (general Bayesian analysis via MCMC)

Demonstration of PROC MCMC
- Seed data
- Damage data (will not be presented)

The Seed Data (Crowder 1977)

Formatted Raw Data
Plate   Breed   Host       r    n
1       O.a75   Bean       10   39
2       O.a75   Bean       23   62
3       O.a75   Bean       23   81
4       O.a75   Bean       26   51
5       O.a75   Bean       17   39
6       O.a75   Cucumber    5    6
7       O.a75   Cucumber   53   74
8       O.a75   Cucumber   55   72
9       O.a75   Cucumber   32   51
10      O.a75   Cucumber   46   79
11      O.a75   Cucumber   10   13
12      O.a73   Bean        8   16
13      O.a73   Bean       10   30
14      O.a73   Bean        8   28
15      O.a73   Bean       23   45
16      O.a73   Bean        0    4
17      O.a73   Cucumber    3   12
18      O.a73   Cucumber   22   41
19      O.a73   Cucumber   15   30
20      O.a73   Cucumber   32   51
21      O.a73   Cucumber    3    7

Formatted Pretreated Data
Plate   Breed   Host   r    n    p
1       0       0      10   39   0.2564
2       0       0      23   62   0.3710
3       0       0      23   81   0.2840
4       0       0      26   51   0.5098
5       0       0      17   39   0.4359
6       0       1       5    6   0.8333
7       0       1      53   74   0.7162
8       0       1      55   72   0.7639
9       0       1      32   51   0.6275
10      0       1      46   79   0.5823
11      0       1      10   13   0.7692
12      1       0       8   16   0.5000
13      1       0      10   30   0.3333
14      1       0       8   28   0.2857
15      1       0      23   45   0.5111
16      1       0       0    4   0.0000
17      1       1       3   12   0.2500
18      1       1      22   41   0.5366
19      1       1      15   30   0.5000
20      1       1      32   51   0.6275
21      1       1       3    7   0.4286

Logistic Model

$$r_j \sim \mathrm{Binomial}(n_j, p_j)$$
$$\mu_j = \beta_0 + \mathrm{breed}_j\,\beta_1 + \mathrm{host}_j\,\beta_2 + \mathrm{breed}_j \times \mathrm{host}_j\,\beta_3 + \delta_j$$
$$p_j = \mathrm{logistic}(\mu_j) = \frac{\exp(\mu_j)}{1 + \exp(\mu_j)}, \qquad \mu_j = \mathrm{logit}(p_j) = \log\frac{p_j}{1 - p_j}$$
$$\delta_j \sim \mathrm{Normal}(0, \sigma^2), \quad \sigma^2 > 0 \ \text{(overdispersion)}$$

Parameters, Missing Values and Data
- Parameters: $\theta = \{\beta_0, \beta_1, \beta_2, \beta_3, \sigma^2\}$
- Missing values: $\delta = \{\delta_j\}$
- Data: $d = \{r_j, n_j, \mathrm{breed}_j, \mathrm{host}_j\}$

Prior and Posterior Distributions
Priors:
$$\beta_0, \ldots, \beta_3 \sim \mathrm{Normal}(0, 10^6), \qquad \delta_j \sim \mathrm{Normal}(0, \sigma^2), \qquad \sigma^2 \sim \mathrm{Inv}\text{-}\chi^2(10^{-6}, 10^{-6})$$
Posteriors:
$$\beta_0, \ldots, \beta_3 \sim \text{unknown (Metropolis sampler)}, \qquad \delta_j \sim \text{unknown (Metropolis sampler)}$$
$$\sigma^2 \sim \mathrm{Inv}\text{-}\chi^2\!\left(10^{-6} + n,\; 10^{-6} + \sum\nolimits_{j=1}^{n}\delta_j^2\right)$$
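Because the β's and δ_j's have no closed-form conditionals in this model, one common approach is to update them with random-walk Metropolis steps while σ² keeps its inverse-chi-square draw (a Metropolis-within-Gibbs scheme). The Python sketch below is illustrative only and is not the PROC MCMC code demonstrated in the talk; the proposal step sizes, burn-in, and thinning settings are assumptions. The data arrays are taken from the pretreated table above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Seed data (Crowder 1977), from the pretreated table above
breed = np.array([0]*11 + [1]*10)
host  = np.array([0]*5 + [1]*6 + [0]*5 + [1]*5)
r = np.array([10,23,23,26,17,5,53,55,32,46,10,8,10,8,23,0,3,22,15,32,3])
n = np.array([39,62,81,51,39,6,74,72,51,79,13,16,30,28,45,4,12,41,30,51,7])
J = len(r)

def log_lik(beta, delta):
    """Binomial log likelihood with plate-level random effects delta."""
    mu = beta[0] + beta[1]*breed + beta[2]*host + beta[3]*breed*host + delta
    p = 1.0 / (1.0 + np.exp(-mu))
    return np.sum(stats.binom.logpmf(r, n, p))

beta, delta, sigma2 = np.zeros(4), np.zeros(J), 1.0
step_beta, step_delta = 0.2, 0.5     # assumed random-walk proposal standard deviations
T, keep = 20000, []

for t in range(T):
    # Metropolis update of each fixed effect (vague Normal(0, 1e6) prior)
    for k in range(4):
        prop = beta.copy()
        prop[k] += rng.normal(0, step_beta)
        log_r = (log_lik(prop, delta) - prop[k]**2 / (2 * 1e6)) - \
                (log_lik(beta, delta) - beta[k]**2 / (2 * 1e6))
        if np.log(rng.uniform()) < log_r:
            beta = prop
    # Metropolis update of each plate effect delta_j (prior Normal(0, sigma2))
    for j in range(J):
        prop = delta.copy()
        prop[j] += rng.normal(0, step_delta)
        log_r = (log_lik(beta, prop) - prop[j]**2 / (2 * sigma2)) - \
                (log_lik(beta, delta) - delta[j]**2 / (2 * sigma2))
        if np.log(rng.uniform()) < log_r:
            delta = prop
    # Gibbs update of sigma2 from Inv-chi-square(1e-6 + J, 1e-6 + sum(delta^2))
    sigma2 = (1e-6 + np.sum(delta**2)) / rng.chisquare(1e-6 + J)
    if t >= 2000 and t % 10 == 0:    # burn-in 2000 and thinning rate 10 (assumed settings)
        keep.append(np.concatenate([beta, [sigma2]]))

post = np.array(keep)
print(post.mean(axis=0))   # posterior means of beta0..beta3 and sigma^2
```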
Demonstration!
SAS Code and Demonstration
Thank You!

The Damage Data (Milliken and Johnson 1992)
Plant   Variety   Damage
1       A         3.90
2       A         4.05
3       A         4.25
4       B         3.60
5       B         4.20
6       B         4.05
7       B         3.85
8       C         4.15
9       C         4.60
10      C         4.15
11      C         4.40
12      D         3.35
13      D         3.80

Model, Prior and Posterior

$$y = X\beta + Z\gamma + \varepsilon, \qquad p(\varepsilon) = \mathrm{Normal}(\varepsilon \mid 0, I\sigma^2)$$

Here $y$ is the $13 \times 1$ vector of damage scores, $X$ is a $13 \times 1$ column of ones so that $\beta$ is the overall mean, $Z$ is the $13 \times 4$ incidence matrix linking plants to varieties (three plants of variety A, four of B, four of C, and two of D), $\gamma = (\gamma_1, \gamma_2, \gamma_3, \gamma_4)^T$ contains the variety effects, and $\varepsilon$ is the $13 \times 1$ residual vector.

Model, Prior and Posterior
Priors:
$$\pi(\beta) = \mathrm{Normal}(\beta \mid 0, 10^6), \qquad \pi(\gamma) = \mathrm{Normal}(\gamma \mid 0, I\sigma_A^2)$$
$$\pi(\sigma_A^2) = \mathrm{Inv}\text{-}\chi^2(\sigma_A^2 \mid 10^{-6}, 10^{-6}), \qquad \pi(\sigma^2) = \mathrm{Inv}\text{-}\chi^2(\sigma^2 \mid 10^{-6}, 10^{-6})$$
Posteriors (with $V = ZZ^T\sigma_A^2 + I\sigma^2$):
$$p(\beta \mid \ldots) = \mathrm{Normal}\!\left[\beta \mid (X^TV^{-1}X)^{-1}(X^TV^{-1}y),\; (X^TV^{-1}X)^{-1}\right]$$
$$p(\gamma \mid \ldots) = \mathrm{Normal}\!\left[\gamma \mid \sigma_A^2 Z^TV^{-1}(y - X\beta),\; \sigma_A^2\!\left(I - Z^TV^{-1}Z\sigma_A^2\right)\right]$$
$$p(\sigma_A^2 \mid \ldots) = \mathrm{Inv}\text{-}\chi^2\!\left(\sigma_A^2 \mid 10^{-6} + 4,\; 10^{-6} + \sum\nolimits_{k=1}^{4}\gamma_k^2\right)$$
$$p(\sigma^2 \mid \ldots) = \mathrm{Inv}\text{-}\chi^2\!\left[\sigma^2 \mid 10^{-6} + 13,\; 10^{-6} + \sum\nolimits_{j=1}^{13}(y_j - X_j\beta - Z_j\gamma)^2\right]$$
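The four conditional distributions above can be turned directly into a Gibbs sampler. The Python sketch below is an illustrative implementation rather than the talk's SAS code: the 0-3 coding of varieties, the starting values, and the chain length are assumptions, and V is taken as ZZ'σ_A² + Iσ², the marginal covariance of y given β.

```python
import numpy as np

rng = np.random.default_rng(3)

# Damage data (Milliken and Johnson 1992), from the table above
y = np.array([3.90,4.05,4.25,3.60,4.20,4.05,3.85,4.15,4.60,4.15,4.40,3.35,3.80])
variety = np.array([0,0,0,1,1,1,1,2,2,2,2,3,3])       # varieties A, B, C, D coded 0-3
n, q = len(y), 4
X = np.ones((n, 1))                                     # overall mean only
Z = np.zeros((n, q)); Z[np.arange(n), variety] = 1.0    # variety incidence matrix

beta, gamma, s2A, s2 = np.zeros(1), np.zeros(q), 1.0, 1.0
T, draws = 10000, []

for t in range(T):
    V = s2A * Z @ Z.T + s2 * np.eye(n)                  # marginal covariance of y given beta
    Vinv = np.linalg.inv(V)
    # beta | ... ~ Normal((X'V^-1 X)^-1 X'V^-1 y, (X'V^-1 X)^-1)
    cb = np.linalg.inv(X.T @ Vinv @ X)
    beta = rng.multivariate_normal((cb @ X.T @ Vinv @ y).ravel(), cb)
    # gamma | ... ~ Normal(s2A Z'V^-1 (y - X beta), s2A (I - Z'V^-1 Z s2A))
    resid0 = y - X @ beta
    mg = s2A * Z.T @ Vinv @ resid0
    cg = s2A * (np.eye(q) - Z.T @ Vinv @ Z * s2A)
    cg = (cg + cg.T) / 2.0                              # guard against numerical asymmetry
    gamma = rng.multivariate_normal(mg, cg)
    # variance components from their scaled inverse chi-square conditionals
    s2A = (1e-6 + np.sum(gamma**2)) / rng.chisquare(1e-6 + q)
    resid = y - X @ beta - Z @ gamma
    s2 = (1e-6 + np.sum(resid**2)) / rng.chisquare(1e-6 + n)
    draws.append([beta[0], s2A, s2])

post = np.array(draws)[2000:]                            # discard burn-in
print(post.mean(axis=0))                                 # posterior means of beta, sigma_A^2, sigma^2
```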