Bayesian Analysis of Agricultural Experiments:
When Everything Is Random
Shizhong Xu
Department of Botany and Plant Sciences
University of California
Riverside, CA 92521
[email protected]
Outline
- Introduction to Bayesian statistics
- Markov chain Monte Carlo algorithm
- Assessing Markov chain convergence
- Summary statistics (post MCMC analysis)
- An example (linear regression)
- Software for Bayesian analysis
Bayesian Inference and Frequentist Method
- Bayesian method: Parameters are random variables
- Frequentist method: Parameters are constants
- Bayesian method: Conditional probability
- Frequentist method: Maximum likelihood (the classical method)
- Bayesian method: Invented in 1764
- Frequentist method: Invented in 1922
- Bayesians: People who do Bayesian analysis
- Frequentists: People who do maximum likelihood analysis
Thomas Bayes (1702?-1761)
British mathematician
Bayes' Theorem (1764)

Ronald A. Fisher (1890-1962)
Statistician, evolutionary biologist, geneticist, and eugenicist
Maximum likelihood method (1912, 1922)
Bayesian Inference and Maximum Likelihood
- Bayes, Thomas. 1764. An essay towards solving a problem in the doctrine of chances. Bayes's essay in the original notation, communicated by Mr. Price (Bayes' friend) in 1763 in a letter to John Canton. Philosophical Transactions of the Royal Society of London 53: 269-271.
- Fisher, Ronald A. 1922. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, Series A, 222: 309-368.
Bayes’ Theorem (Conditional Probability)
$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)} = \frac{P(A)\,P(B \mid A)}{\sum_A P(A)\,P(B \mid A)}$$

- $P(A \mid B)$ is the conditional probability of A given B
- $P(B \mid A)$ is the conditional probability of B given A
- $P(A)$ is the prior probability of A (before B is observed)
- $P(B)$ is the marginal probability of B (acts as a normalizing constant)
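As a quick numerical illustration of the theorem (the numbers below are hypothetical, not from the slides), a minimal Python sketch for a two-state event A:

```python
# Minimal numerical illustration of Bayes' theorem (hypothetical numbers).
# A is a binary event (e.g., a plant carries a trait); B is an observed test result.
p_A = 0.10             # prior probability P(A)
p_B_given_A = 0.90     # P(B | A)
p_B_given_notA = 0.20  # P(B | not A)

# Marginal probability of B: sum of P(A)P(B|A) over both states of A.
p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_notA

# Posterior probability of A given B.
p_A_given_B = p_A * p_B_given_A / p_B
print(p_A_given_B)  # 0.09 / 0.27 = 0.333
```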
Bayesian Inference
$$P(\theta \mid y) = \frac{P(\theta)\,P(y \mid \theta)}{P(y)} = \frac{P(\theta)\,P(y \mid \theta)}{\int P(\theta)\,P(y \mid \theta)\,d\theta} \propto P(\theta)\,P(y \mid \theta)$$

- $P(\theta \mid y)$ is the conditional density of $\theta$ given $y$ (Bayesian inference)
- $P(y \mid \theta)$ is the conditional density of $y$ given $\theta$ (the likelihood)
- $P(\theta)$ is the prior density of $\theta$ (the prior distribution)
- $P(y)$ is the marginal density of $y$ (acts as a normalizing constant)

$$P(\theta \mid y) \propto P(y \mid \theta)\,P(\theta) = L(y \mid \theta)\,\pi(\theta) \quad \text{(commonly used notation)}$$
Multivariate Bayesian Inference
$$\theta = \{\theta_1, \ldots, \theta_m\}, \qquad \theta_k = \text{the } k\text{th element of } \theta, \qquad \theta_{-k} = \{\theta_1, \ldots, \theta_{k-1}, \theta_{k+1}, \ldots, \theta_m\}$$

$$P(\theta_1, \ldots, \theta_m \mid y) \propto P(\theta_1, \ldots, \theta_m)\,P(y \mid \theta_1, \ldots, \theta_m) \quad \text{(posterior joint distribution)}$$

$$P(\theta_k \mid y) \propto \int \!\cdots\! \int P(\theta_1, \ldots, \theta_m)\,P(y \mid \theta_1, \ldots, \theta_m)\,d\theta_{-k} \quad \text{(posterior marginal distribution)}$$

$$P(\theta_k \mid y) \propto \int \!\cdots\! \int P(\theta)\,P(y \mid \theta)\,d\theta_{-k} \quad \text{(Bayesian inference)}$$

High-dimensional multiple integration is involved; a closed form of $P(\theta_k \mid y)$ exists only in a few of the simplest cases.
Advantages of Bayesian Analysis
- A natural way of combining prior information with data
- Inferences are based on the observed data and are exact, without reliance on asymptotic approximation
- It provides interpretable answers, such as "the true parameter has a probability of 0.95 of falling in a 95% credible interval"
- It provides a convenient setting for arbitrarily complicated parametric models, e.g., hierarchical models and missing-data problems
Disadvantages of Bayesian Analysis
- It requires skill to translate subjective prior beliefs into a mathematically formulated prior; without caution, you can generate misleading results
- It can produce posterior distributions that are heavily influenced by the priors; from a practical point of view, it may be difficult to convince experts who do not accept the validity of the chosen prior
- It often comes with a high computational cost, especially in models with a large number of parameters
- It is often implemented via MCMC, which brings its own difficulties (convergence, mixing, tuning)
Prior Distributions
- Objective priors versus subjective priors
- Noninformative priors (vague, flat, diffuse)
- Proper priors (integrate to one; the posterior always exists) versus improper priors (do not integrate to one; the posterior may not exist)
- Informative priors (can change or dominate the likelihood; based on a previous study, past experience, or expert opinion)
- Conjugate priors (the prior and posterior distributions are from the same family)
- Jeffreys' prior (invariant under reparameterization; locally uniform)
Posterior Distribution
- Posterior mean, mode, and percentiles
- Posterior standard deviation
- Equal-tail credible interval
- Highest posterior density (HPD) interval
Credible Interval and Confidence Interval
- Credible interval: choose $a$ and $b$ such that $\Pr(\theta < a \mid y) = \Pr(\theta > b \mid y) = 0.025$, which gives $\Pr(a < \theta < b \mid y) = 0.95$. Given the data, $\theta$ falls between $a$ and $b$ with 95% probability.
- Confidence interval: if the experiment were repeated an infinite number of times, 95% of the resulting intervals would cover the true value. The interpretation is purely hypothetical because the experiment is never actually replicated, and if it were, the number of replications would never be infinite.
Credible Intervals (Bayesian Analysis)
[Figures: equal-tail interval and highest posterior density interval]

Confidence Interval (ML Analysis)
[Figure: confidence interval from a maximum likelihood analysis]
Markov Chain Monte Carlo
$$P(\theta_k \mid y) \propto \int \!\cdots\! \int P(\theta_1, \ldots, \theta_m)\,P(y \mid \theta_1, \ldots, \theta_m)\,d\theta_{-k} \quad \text{(posterior marginal distribution)}$$

$$P(\theta_k \mid y) \approx \frac{1}{M}\sum_{i=1}^{M} g\!\left(\theta_k^{(i)} \mid y\right) \quad \text{(approximated via Monte Carlo simulation)}$$

- Multivariate integration is avoided
- Monte Carlo simulation of one or a few variables at a time
- Random number generators from simple distributions are sufficient
- M must be sufficiently large to reduce the Monte Carlo error
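The role of M can be seen with plain Monte Carlo sampling before any Markov chain enters the picture. The Python sketch below assumes a conjugate Beta posterior for a binomial proportion (chosen only because the exact answer is known) and shows the Monte Carlo error of the posterior-mean estimate shrinking as M grows:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical binomial data: r successes out of n trials, with a Beta(1, 1) prior.
r, n = 26, 51
a_post, b_post = 1 + r, 1 + n - r            # conjugate Beta posterior

exact_mean = a_post / (a_post + b_post)
for M in (100, 10_000, 1_000_000):
    draws = rng.beta(a_post, b_post, size=M)  # Monte Carlo sample from the posterior
    # Monte Carlo error of the posterior-mean estimate shrinks as M grows.
    print(M, draws.mean(), abs(draws.mean() - exact_mean))
```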
MCMC (Simple with Computer Simulation)
- Simple, so you and I can do it
- Made practical by increased computer power
- Popular as a result (Bayes vs. ML)
- Can be dangerous and give misleading results
- More of an art than a science
Sampling from Posterior Conditional Distribution
- The target distribution is the posterior marginal distribution
- Sample from the posterior conditional distributions
- When the Markov chain is sufficiently long, it reaches the stationary distribution
- The stationary distribution is the posterior joint distribution
- The product of MCMC is a posterior sample of all parameters
Sampling from Posterior Conditional Distribution
$$\theta_1^{(t+1)} \sim P(\theta_1 \mid \theta_2^{(t)}, \ldots, \theta_m^{(t)}, y) \propto P(y \mid \theta_1, \theta_2^{(t)}, \ldots, \theta_m^{(t)})\,P(\theta_1, \theta_2^{(t)}, \ldots, \theta_m^{(t)})$$
$$\theta_2^{(t+1)} \sim P(\theta_2 \mid \theta_1^{(t)}, \ldots, \theta_m^{(t)}, y) \propto P(y \mid \theta_2, \theta_1^{(t)}, \ldots, \theta_m^{(t)})\,P(\theta_2, \theta_1^{(t)}, \ldots, \theta_m^{(t)})$$
$$\vdots$$
$$\theta_m^{(t+1)} \sim P(\theta_m \mid \theta_1^{(t)}, \ldots, \theta_{m-1}^{(t)}, y) \propto P(y \mid \theta_m, \theta_1^{(t)}, \ldots, \theta_{m-1}^{(t)})\,P(\theta_m, \theta_1^{(t)}, \ldots, \theta_{m-1}^{(t)})$$
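In code, one cycle of this scheme is just a loop over the parameters, each drawn conditional on the current values of the others. The Python skeleton below is only a sketch; the functions in `draws` are hypothetical placeholders for whatever full-conditional or Metropolis draw applies to each parameter:

```python
# Skeleton of one MCMC scan: update each parameter in turn, conditional on
# the current values of all the others (the draw functions are hypothetical).
def one_scan(theta, y, draws):
    """theta: dict of current parameter values.
    draws: dict of functions, draws[k](theta, y) -> new value of parameter k."""
    for k, draw_k in draws.items():
        theta[k] = draw_k(theta, y)   # uses the most recent values of the others
    return theta

# A chain is simply repeated scans:
# for t in range(T):
#     theta = one_scan(theta, y, draws)
#     chain.append(dict(theta))
```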
Gibbs Sampler, Metropolis Sampler, and Metropolis-Hastings Sampler
- Gibbs sampler: directly sample each variable from its full conditional posterior distribution
- Metropolis sampler: random walk (symmetric proposal distribution)
- Metropolis-Hastings sampler: the general method (asymmetric proposal distribution)
- Metropolis-Hastings is the general MCMC framework
- Gibbs sampling is the special case of Metropolis-Hastings in which the proposal is the full conditional posterior, so every proposal is accepted
Acceptance-Rejection
Metropolis sampler:
$$r = \min\left\{\frac{p(\theta^{\text{new}} \mid y)}{p(\theta^{(t)} \mid y)},\; 1\right\}$$

Metropolis-Hastings sampler:
$$r = \min\left\{\frac{p(\theta^{\text{new}} \mid y)\,q(\theta^{(t)} \mid \theta^{\text{new}})}{p(\theta^{(t)} \mid y)\,q(\theta^{\text{new}} \mid \theta^{(t)})},\; 1\right\}$$

$\theta^{(t+1)} = \theta^{\text{new}}$ if accepted (with probability $r$); otherwise $\theta^{(t+1)} = \theta^{(t)}$ (with probability $1 - r$).
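A minimal random-walk Metropolis sampler in Python implementing the acceptance rule above for a generic one-dimensional parameter with a user-supplied unnormalized log posterior. This is an illustrative sketch, not the SAS implementation used later; the Gibbs sampler is the special case in which the proposal is the full conditional and r = 1.

```python
import numpy as np

def metropolis(log_post, theta0, n_iter=10_000, step=0.5, seed=0):
    """Random-walk Metropolis: symmetric Normal proposal, so the acceptance
    ratio reduces to p(theta_new | y) / p(theta_t | y)."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter)
    theta, lp = theta0, log_post(theta0)
    for t in range(n_iter):
        prop = theta + step * rng.normal()        # symmetric proposal
        lp_prop = log_post(prop)
        # Accept with probability r = min(exp(lp_prop - lp), 1).
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop             # accept the move
        chain[t] = theta                          # otherwise keep the old value
    return chain

# Example: unnormalized log posterior of a N(2, 1) target (hypothetical).
chain = metropolis(lambda x: -0.5 * (x - 2.0) ** 2, theta0=0.0)
print(chain[2000:].mean(), chain[2000:].std())    # roughly 2 and 1 after burn-in
```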
Markov Chain
Transition kernel: $P(\theta^{(t+1)} \mid \theta^{(t)}, y)$

$$\theta^{(0)} \to \theta^{(1)} \to \theta^{(2)} \to \theta^{(3)} \to \cdots \to \theta^{(T)}$$

$$\begin{bmatrix}\theta_1^{(0)}\\ \theta_2^{(0)}\\ \vdots\\ \theta_m^{(0)}\end{bmatrix} \to
\begin{bmatrix}\theta_1^{(1)}\\ \theta_2^{(1)}\\ \vdots\\ \theta_m^{(1)}\end{bmatrix} \to
\begin{bmatrix}\theta_1^{(2)}\\ \theta_2^{(2)}\\ \vdots\\ \theta_m^{(2)}\end{bmatrix} \to \cdots \to
\begin{bmatrix}\theta_1^{(T)}\\ \theta_2^{(T)}\\ \vdots\\ \theta_m^{(T)}\end{bmatrix}$$

When $T \to \infty$, $\theta^{(T+1)} \sim P(\theta^{(T+1)} \mid \theta^{(T)}, y) = P(\theta \mid y)$; the chain has reached its stationary distribution, the joint posterior.
Posterior Sample (Final Product of MCMC)
Table 1. Posterior sample with T observations

Iteration   θ1        θ2        θ3        θ4
0           θ1^(0)    θ2^(0)    θ3^(0)    θ4^(0)
1           θ1^(1)    θ2^(1)    θ3^(1)    θ4^(1)
2           θ1^(2)    θ2^(2)    θ3^(2)    θ4^(2)
3           θ1^(3)    θ2^(3)    θ3^(3)    θ4^(3)
4           θ1^(4)    θ2^(4)    θ3^(4)    θ4^(4)
...         ...       ...       ...       ...
T           θ1^(T)    θ2^(T)    θ3^(T)    θ4^(T)
Assessing Convergence of Markov Chain
- Convergence to the stationary distribution (there is no conclusive test and no guarantee)
- Some parameters converge quickly and some converge slowly, but all parameters must converge to the stationary distribution
- After convergence, collect observations to form a posterior sample
Assessing Convergence of Markov Chain
- Burn-in period: the number of iterations discarded before convergence
- Autocorrelation: measures the dependency among observations in the posterior sample
- Thinning (trimming) rate: keep one observation in every k iterations to reduce serial correlation
Visual Analysis of Trace Plots
[Figures: trace plots illustrating the perfect case in which all parameters converge; parameters that do not mix well; the burn-in period; poor mixing (needing a high thinning rate); nonconvergence; and autocorrelation]
Statistical Diagnostic Tests for Convergence
- Gelman-Rubin R test (convergence; multiple chains required)
- Geweke z-test (convergence; single chain)
- Autocorrelation (dependency)
- Effective sample size (dependency)
Gelman-Rubin Diagnostic Test
$$\hat{R}_c = \frac{\hat{d}+3}{\hat{d}+1}\cdot\frac{\hat{V}}{W}
= \frac{\hat{d}+3}{\hat{d}+1}\left(\frac{n-1}{n} + \frac{M+1}{nM}\cdot\frac{B}{W}\right),
\qquad \hat{V} = \frac{n-1}{n}W + \frac{M+1}{nM}B,
\qquad \hat{d} = \frac{2\hat{V}^2}{\widehat{\mathrm{Var}}(\hat{V})}$$

$$\widehat{\mathrm{Var}}(\hat{V}) = \left(\frac{n-1}{n}\right)^{2}\frac{1}{M}\widehat{\mathrm{Var}}(s_m^2)
+ \left(\frac{M+1}{nM}\right)^{2}\frac{2}{M-1}B^{2}
+ 2\,\frac{(M+1)(n-1)}{nM^{2}}\cdot\frac{n}{M}
\left[\widehat{\mathrm{cov}}\!\left(s_m^2,\,(\bar{\theta}_{m\cdot})^{2}\right)
- 2\bar{\theta}_{\cdot\cdot}\,\widehat{\mathrm{cov}}\!\left(s_m^2,\,\bar{\theta}_{m\cdot}\right)\right]$$

where $M$ is the number of chains, $n$ is the length of each chain, $s_m^2$ and $\bar{\theta}_{m\cdot}$ are the variance and mean of chain $m$, $W$ is the mean within-chain variance, and $B$ is the between-chain variance.
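A simplified Python sketch of the diagnostic: it computes W, B, and V-hat as defined above but omits the degrees-of-freedom correction (d-hat + 3)/(d-hat + 1), so it returns only the basic potential scale reduction factor; values near 1 indicate convergence.

```python
import numpy as np

def psrf(chains):
    """Basic potential scale reduction factor sqrt(V_hat / W) for one parameter.
    chains has shape (M, n): M parallel chains of length n. The degrees-of-freedom
    correction (d_hat + 3)/(d_hat + 1) from the slide formula is omitted here."""
    M, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    B = n * chain_means.var(ddof=1)              # between-chain variance
    V_hat = (n - 1) / n * W + (M + 1) / (n * M) * B
    return np.sqrt(V_hat / W)

# Hypothetical check: chains drawn from the same target should give a value near 1.
rng = np.random.default_rng(0)
print(psrf(rng.normal(size=(4, 2000))))
```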
[Figures: trace plots of unconverged multiple chains versus converged multiple chains]
Geweke Diagnostics
$$\bar{\theta}_1 = \frac{1}{n_1}\sum_{t=1}^{n_1}\theta^{t}, \qquad
\bar{\theta}_2 = \frac{1}{n_2}\sum_{t=n_a}^{n}\theta^{t}$$

$$Z_n = \frac{\bar{\theta}_1 - \bar{\theta}_2}{\sqrt{\dfrac{\hat{s}_1(0)}{n_1} + \dfrac{\hat{s}_2(0)}{n_2}}}$$

where $\bar{\theta}_1$ is the mean of an early segment of the chain (the first $n_1$ draws), $\bar{\theta}_2$ is the mean of a late segment (the last $n_2$ draws, starting at iteration $n_a$), and $\hat{s}_1(0)$ and $\hat{s}_2(0)$ are spectral density estimates of the two segment variances at frequency zero.
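A rough Python sketch of the Geweke z-score. For simplicity it replaces the spectral density estimates with plain sample variances, which ignores within-segment autocorrelation (packaged implementations such as the one in SAS use a spectral estimate); the 10%/50% segment split is the conventional default.

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """Geweke z-score comparing the mean of an early segment (first 10% of the
    chain) with the mean of a late segment (last 50%). Spectral density
    estimates are replaced here by plain sample variances."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x1 = x[: int(first * n)]                # early segment
    x2 = x[int((1.0 - last) * n):]          # late segment
    se2 = x1.var(ddof=1) / len(x1) + x2.var(ddof=1) / len(x2)
    return (x1.mean() - x2.mean()) / np.sqrt(se2)

# Hypothetical use: for a stationary chain |z| should usually stay below about 2.
rng = np.random.default_rng(0)
print(geweke_z(rng.normal(size=5000)))
```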
[Figure: Geweke's diagnostics]
Autocorrelations
$$\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}, \qquad |h| < n$$

$$\hat{\gamma}(h) = \frac{1}{n-h}\sum_{t=1}^{n-h}\left(\theta_i^{t+h} - \bar{\theta}_i\right)\left(\theta_i^{t} - \bar{\theta}_i\right), \qquad 0 \le h < n$$
[Figure: autocorrelation plots]
Effective Sample Size
$$\mathrm{ESS} = \frac{n}{\tau} = \frac{n}{1 + 2\sum_{k=1}^{\infty}\rho_k(\theta)}$$
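A Python sketch of both quantities: the lag-h autocorrelation exactly as defined above, and the ESS with the infinite sum truncated at the first non-positive autocorrelation (a simple cutoff; packaged diagnostics use more careful truncation rules). The AR(1) chain in the example is hypothetical.

```python
import numpy as np

def autocorr(chain, h):
    """Lag-h sample autocorrelation rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    x = np.asarray(chain, dtype=float)
    xbar, n = x.mean(), len(x)
    gamma_h = np.sum((x[h:] - xbar) * (x[: n - h] - xbar)) / (n - h)
    gamma_0 = np.sum((x - xbar) ** 2) / n
    return gamma_h / gamma_0

def ess(chain, max_lag=200):
    """Effective sample size n / (1 + 2 * sum of positive-lag autocorrelations),
    truncating the sum at the first non-positive autocorrelation."""
    n = len(chain)
    s = 0.0
    for h in range(1, min(max_lag, n - 1)):
        rho = autocorr(chain, h)
        if rho <= 0:
            break
        s += rho
    return n / (1.0 + 2.0 * s)

# Hypothetical AR(1) chain: strong autocorrelation gives an ESS well below n.
rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(1, 5000):
    x[t] = 0.9 * x[t - 1] + rng.normal()
print(ess(x))   # roughly 5000 * (1 - 0.9) / (1 + 0.9), i.e., around 260
```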
Effective Sample Sizes

Parameter   ESS     Correlation Time   Efficiency   Sample Size
beta0       106.2   9.4161             0.1062       1000
beta1       106.1   9.4232             0.1061       1000
sigma1      31.5    31.7240            0.0315       1000
sigma2      116.4   8.5938             0.1164       1000
Posterior Sample: Final Product of MCMC
Iteration   beta0     beta1     beta2     beta3     s2        LogPrior   LogLike    LogPost   delta1
1001        -0.6518   0.1508    1.4204    -0.8199   0.000149  71.4095    -55.5855   15.8239   0.00565
1011        -0.5059   0.089     1.3165    -0.8546   0.000111  69.8796    -55.3764   14.5032   0.00809
1021        -0.3935   -0.2032   1.3247    -0.6918   0.000192  72.8501    -57.4954   15.3547   0.00949
1031        -0.4967   -0.0446   1.2305    -0.7546   0.000086  73.5804    -56.1259   17.4545   0.00645
1041        -0.4055   0.0753    1.1728    -0.8286   0.000125  77.5513    -56.1312   21.42     0.00776
1051        -0.5366   0.1417    1.2627    -0.5335   0.000118  76.9519    -55.8484   21.1035   0.00631
1061        -0.5082   0.1666    1.2622    -0.5695   0.000073  81.6694    -55.9616   25.7079   0.00602
1071        -0.5667   0.3094    1.4219    -1.1682   0.000107  81.155     -55.7688   25.3862   0.00648
1081        -0.5644   0.2282    1.4198    -0.9836   0.000047  85.579     -55.1402   30.4388   0.000245
1091        -0.8113   0.5319    1.5171    -0.9582   0.00005   86.1065    -57.2681   28.8384   -0.00434
1101        -0.8727   0.7558    1.738     -1.2919   0.000033  87.6737    -59.9277   27.746    -0.00714
1111        -0.7814   0.3138    1.8088    -1.3229   0.000033  87.5795    -58.6224   28.9572   -0.00752
1121        -0.789    0.4903    1.6471    -1.1649   0.000053  85.5295    -56.8125   28.717    -0.00947
1131        -0.5641   0.00334   1.513     -1.0373   0.000064  85.5708    -56.9185   28.6522   -0.0126
1141        -0.4443   0.2236    1.3024    -1.1663   0.000043  87.6287    -56.7919   30.8367   -0.0116
1151        -0.3215   0.1636    1.0776    -0.6622   0.000035  91.6861    -57.9028   33.7833   -0.00632
1161        -0.3652   -0.1745   1.0796    -0.3665   0.000018  90.3015    -56.2956   34.0059   -0.00631
1171        -0.4751   -0.1332   1.0835    -0.2229   0.000069  87.3886    -56.5182   30.8704   -0.00631
1181        -0.4606   0.1071    1.1       -0.6125   0.000039  88.117     -55.5613   32.5557   -0.00008
1191        -0.653    0.3555    1.3507    -0.8123   0.000095  80.608     -55.689    24.919    0.01
1201        -0.5943   0.3791    1.3608    -0.967    0.000064  77.0309    -55.6955   21.3354   0.0143
1211        -0.4866   0.2016    1.2843    -0.897    0.000141  76.9601    -55.3034   21.6567   0.0109
1221        -0.4528   0.4167    1.2743    -1.3167   0.000206  74.7238    -58.1844   16.5394   0.00998
1231        -0.5015   0.453     1.2511    -1.2055   0.000187  72.6304    -57.4055   15.2248   0.00552
[Figure: marginal posterior distributions]
Summary Statistics (Post MCMC Analysis)
- Posterior means, modes, and medians
- Posterior quantiles (percentiles)
- Posterior standard deviations
- Equal-tail credible intervals
- Highest posterior density intervals
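The Python sketch below computes these summaries from a single column of posterior draws; the HPD interval uses the shortest-interval estimator over the sorted sample. The Normal draws in the example are hypothetical stand-ins for a real MCMC output column.

```python
import numpy as np

def posterior_summary(sample, alpha=0.05):
    """Summaries of a one-parameter posterior sample: mean, sd, percentiles,
    equal-tail credible interval, and highest posterior density (HPD) interval."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    # Equal-tail interval: alpha/2 probability in each tail.
    eti = (np.quantile(x, alpha / 2), np.quantile(x, 1 - alpha / 2))
    # HPD interval: the shortest interval containing (1 - alpha) of the draws.
    k = int(np.floor((1 - alpha) * n))
    widths = x[k:] - x[: n - k]
    j = int(np.argmin(widths))
    hpd = (x[j], x[j + k])
    return {
        "mean": x.mean(),
        "sd": x.std(ddof=1),
        "percentiles(25,50,75)": np.percentile(x, [25, 50, 75]),
        "equal_tail": eti,
        "hpd": hpd,
    }

# Hypothetical posterior sample (in practice, one column of the MCMC output table).
rng = np.random.default_rng(0)
print(posterior_summary(rng.normal(1.76, 0.66, size=10_000)))
```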
Summary Statistics (Post MCMC Analysis)
Posterior Summaries
Parameter   N       Mean      Standard Deviation   25%        50%       75%
LogBUN      10000   1.7610    0.6593               1.3173     1.7686    2.2109
HGB         10000   -0.1279   0.0727               -0.1767    -0.1287   -0.0789
Platelet    10000   -0.2179   0.5169               -0.5659    -0.2360   0.1272
Age         10000   -0.0130   0.0199               -0.0264    -0.0131   0.000492
LogWBC      10000   0.3150    0.7451               -0.1718    0.3321    0.8253
Frac        10000   0.3766    0.4152               0.0881     0.3615    0.6471
LogPBM      10000   0.3792    0.4909               0.0405     0.3766    0.7023
Protein     10000   0.0102    0.0267               -0.00745   0.0106    0.0283
SCalc       10000   0.1248    0.1062               0.0545     0.1273    0.1985
Summary Statistics (Post MCMC Analysis)
Posterior Intervals
Parameter   Alpha   Equal-Tail Interval      HPD Interval
LogBUN      0.050   (0.4418, 3.0477)         (0.4107, 2.9958)
HGB         0.050   (-0.2718, 0.0150)        (-0.2801, 0.00599)
Platelet    0.050   (-1.1952, 0.8296)        (-1.1871, 0.8341)
Age         0.050   (-0.0514, 0.0259)        (-0.0519, 0.0251)
LogWBC      0.050   (-1.2058, 1.7228)        (-1.1783, 1.7483)
Frac        0.050   (-0.3995, 1.2316)        (-0.4273, 1.2021)
LogPBM      0.050   (-0.5652, 1.3671)        (-0.5939, 1.3241)
Protein     0.050   (-0.0437, 0.0611)        (-0.0405, 0.0637)
SCalc       0.050   (-0.0935, 0.3264)        (-0.0846, 0.3322)
Regression Analysis
$$y_j = \beta_0 + X_{j1}\beta_1 + X_{j2}\beta_2 + \varepsilon_j, \qquad j = 1, \ldots, n \ (\text{where } n \text{ is the sample size}), \qquad \varepsilon_j \sim N(0, \sigma^2)$$

Parameters: $\theta = \{\beta_0, \beta_1, \beta_2, \sigma^2\}$. Data: $y = \{y_j\}_{j=1}^{n}$.

Likelihood:
$$L(\theta) = \prod_{j=1}^{n} N\!\left(y_j \mid \beta_0 + X_{j1}\beta_1 + X_{j2}\beta_2,\; \sigma^2\right)$$

New notation: $p(x) = N(x \mid \mu, \nu)$ is the same as $x \sim N(\mu, \nu)$, where
$$N(x \mid \mu, \nu) = \frac{1}{\sqrt{2\pi\nu}}\exp\left[-\frac{1}{2\nu}(x - \mu)^2\right]$$
Prior Distribution
$$\pi(\theta) = \pi(\beta_0)\,\pi(\beta_1)\,\pi(\beta_2)\,\pi(\sigma^2)$$
$$\pi(\beta_0) = U(\beta_0 \mid -\infty, +\infty) \approx N(\beta_0 \mid 0, 10^6)$$
$$\pi(\beta_1) = N(\beta_1 \mid 0, \sigma_1^2), \qquad \pi(\beta_2) = N(\beta_2 \mid 0, \sigma_2^2)$$
$$\pi(\sigma^2) = \text{Inv-}\chi^2(\sigma^2 \mid \tau, \omega)$$

Hierarchical model (priors on the prior variances):
$$\xi(\sigma_1^2) = \text{Inv-}\chi^2(\sigma_1^2 \mid \tau, \omega), \qquad \xi(\sigma_2^2) = \text{Inv-}\chi^2(\sigma_2^2 \mid \tau, \omega)$$
Conditional Posterior Distributions
$$p(\theta_1, \theta_2, \ldots, \theta_m \mid y) \propto L(\theta_1, \theta_2, \ldots, \theta_m)\,\pi(\theta_1, \theta_2, \ldots, \theta_m)$$
$$p(\theta_1, \theta_2^{(0)}, \ldots, \theta_m^{(0)} \mid y) \propto L(\theta_1, \theta_2^{(0)}, \ldots, \theta_m^{(0)})\,\pi(\theta_1, \theta_2^{(0)}, \ldots, \theta_m^{(0)})$$
$$p(\theta_1 \mid \theta_2^{(0)}, \ldots, \theta_m^{(0)}, y) = \text{a known distribution}$$

Conditional Posterior Distribution
$$p(\beta_0 \mid \beta_1^{(0)}, \beta_2^{(0)}, \sigma^{2(0)}, y) = N(\beta_0 \mid \mu_0, \nu_0)$$
$$p(\beta_1 \mid \beta_0^{(0)}, \beta_2^{(0)}, \sigma^{2(0)}, y) = N(\beta_1 \mid \mu_1, \nu_1)$$
$$p(\beta_2 \mid \beta_0^{(0)}, \beta_1^{(0)}, \sigma^{2(0)}, y) = N(\beta_2 \mid \mu_2, \nu_2)$$
$$p(\sigma^2 \mid \beta_0^{(0)}, \beta_1^{(0)}, \beta_2^{(0)}, y) = \text{Inv-}\chi^2(\sigma^2 \mid \tau + n,\; \omega + SS)$$
$$p(\sigma_1^2 \mid \beta_0^{(0)}, \beta_1^{(0)}, \beta_2^{(0)}, \sigma^{2(0)}, y) = \text{Inv-}\chi^2\!\left(\sigma_1^2 \mid \tau + 1,\; \omega + (\beta_1^{(0)})^2\right)$$
$$p(\sigma_2^2 \mid \beta_0^{(0)}, \beta_1^{(0)}, \beta_2^{(0)}, \sigma^{2(0)}, y) = \text{Inv-}\chi^2\!\left(\sigma_2^2 \mid \tau + 1,\; \omega + (\beta_2^{(0)})^2\right)$$

where $SS$ is the residual sum of squares given the current values of $\beta_0$, $\beta_1$, and $\beta_2$.
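A Gibbs sampler for this regression model can be sketched directly from the conditional distributions above. In the Python sketch below, the full-conditional means and variances for the regression coefficients are the standard Normal-Normal updates, and Inv-chi2(df, S) is drawn as S / chi-square(df), which is an assumption about the parameterization intended on the slides; the simulated data at the end are hypothetical.

```python
import numpy as np

def gibbs_regression(y, X1, X2, n_iter=5000, tau=1e-6, omega=1e-6, seed=0):
    """Gibbs sampler for y_j = b0 + X1_j*b1 + X2_j*b2 + e_j, cycling through the
    conditional distributions listed above. Inv-chi2(df, S) is drawn as
    S / chisq(df) (assumed parameterization)."""
    rng = np.random.default_rng(seed)
    y, X1, X2 = (np.asarray(a, float) for a in (y, X1, X2))
    n = len(y)
    b0 = b1 = b2 = 0.0
    s2 = s2_1 = s2_2 = 1.0                        # sigma^2, sigma_1^2, sigma_2^2
    out = np.empty((n_iter, 6))
    for t in range(n_iter):
        # b0 | ... ~ N(mu0, v0), prior N(0, 1e6)
        r = y - X1 * b1 - X2 * b2
        v0 = 1.0 / (n / s2 + 1.0 / 1e6)
        b0 = rng.normal(v0 * r.sum() / s2, np.sqrt(v0))
        # b1 | ... ~ N(mu1, v1), prior N(0, sigma_1^2)
        r = y - b0 - X2 * b2
        v1 = 1.0 / (X1 @ X1 / s2 + 1.0 / s2_1)
        b1 = rng.normal(v1 * (X1 @ r) / s2, np.sqrt(v1))
        # b2 | ... ~ N(mu2, v2), prior N(0, sigma_2^2)
        r = y - b0 - X1 * b1
        v2 = 1.0 / (X2 @ X2 / s2 + 1.0 / s2_2)
        b2 = rng.normal(v2 * (X2 @ r) / s2, np.sqrt(v2))
        # sigma^2 | ... ~ Inv-chi2(tau + n, omega + SS)
        ss = np.sum((y - b0 - X1 * b1 - X2 * b2) ** 2)
        s2 = (omega + ss) / rng.chisquare(tau + n)
        # sigma_1^2, sigma_2^2 | ... ~ Inv-chi2(tau + 1, omega + b_k^2)
        s2_1 = (omega + b1 ** 2) / rng.chisquare(tau + 1)
        s2_2 = (omega + b2 ** 2) / rng.chisquare(tau + 1)
        out[t] = (b0, b1, b2, s2, s2_1, s2_2)
    return out

# Hypothetical simulated data for a quick sanity check of the sampler.
rng = np.random.default_rng(1)
X1, X2 = rng.normal(size=100), rng.normal(size=100)
y = 1.0 + 2.0 * X1 - 1.0 * X2 + rng.normal(scale=0.5, size=100)
post = gibbs_regression(y, X1, X2)
print(post[1000:, :4].mean(axis=0))   # posterior means of b0, b1, b2, sigma^2
```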
Posterior Sample (Final Product of MCMC)
Table 1. Posterior sample with T observations

Iteration   β0        β1        β2        σ²        σ1²        σ2²
0           β0^(0)    β1^(0)    β2^(0)    σ²^(0)    σ1²^(0)    σ2²^(0)
1           β0^(1)    β1^(1)    β2^(1)    σ²^(1)    σ1²^(1)    σ2²^(1)
2           β0^(2)    β1^(2)    β2^(2)    σ²^(2)    σ1²^(2)    σ2²^(2)
3           β0^(3)    β1^(3)    β2^(3)    σ²^(3)    σ1²^(3)    σ2²^(3)
4           β0^(4)    β1^(4)    β2^(4)    σ²^(4)    σ1²^(4)    σ2²^(4)
...         ...       ...       ...       ...       ...        ...
T           β0^(T)    β1^(T)    β2^(T)    σ²^(T)    σ1²^(T)    σ2²^(T)
Software for Bayesian Analysis
- WinBUGS (Bayesian inference Using Gibbs Sampling for Windows). Website: http://www.mrc-bsu.cam.ac.uk/bugs/
- SAS (Statistical Analysis System). Website: http://www.sas.com/
Bayesian Analysis Using SAS
- PROC GENMOD (generalized linear models)
- PROC LIFEREG (accelerated failure time models)
- PROC PHREG (piecewise constant baseline hazard model)
- PROC MIXED (mixed model analysis)
- PROC QTL (quantitative trait loci)
- PROC MCMC (general Bayesian analysis via MCMC)
Demonstration of PROC MCMC
- Seed data
- Damage data (will not be presented)
The Seed Data (Crowder 1977)
Formatted Raw Data
Plate   Breed   Host       r    n
1       O.a75   Bean       10   39
2       O.a75   Bean       23   62
3       O.a75   Bean       23   81
4       O.a75   Bean       26   51
5       O.a75   Bean       17   39
6       O.a75   Cucumber   5    6
7       O.a75   Cucumber   53   74
8       O.a75   Cucumber   55   72
9       O.a75   Cucumber   32   51
10      O.a75   Cucumber   46   79
11      O.a75   Cucumber   10   13
12      O.a73   Bean       8    16
13      O.a73   Bean       10   30
14      O.a73   Bean       8    28
15      O.a73   Bean       23   45
16      O.a73   Bean       0    4
17      O.a73   Cucumber   3    12
18      O.a73   Cucumber   22   41
19      O.a73   Cucumber   15   30
20      O.a73   Cucumber   32   51
21      O.a73   Cucumber   3    7
Formatted Pretreated Data
Plate   Breed   Host   r    n    p
1       0       0      10   39   0.2564
2       0       0      23   62   0.371
3       0       0      23   81   0.284
4       0       0      26   51   0.5098
5       0       0      17   39   0.4359
6       0       1      5    6    0.8333
7       0       1      53   74   0.7162
8       0       1      55   72   0.7639
9       0       1      32   51   0.6275
10      0       1      46   79   0.5823
11      0       1      10   13   0.7692
12      1       0      8    16   0.5
13      1       0      10   30   0.3333
14      1       0      8    28   0.2857
15      1       0      23   45   0.5111
16      1       0      0    4    0
17      1       1      3    12   0.25
18      1       1      22   41   0.5366
19      1       1      15   30   0.5
20      1       1      32   51   0.6275
21      1       1      3    7    0.4286
Logistic Model
$$r_j \sim \text{Binomial}(n_j, p_j)$$
$$\mu_j = \beta_0 + \text{breed}_j\,\beta_1 + \text{host}_j\,\beta_2 + (\text{breed}_j \times \text{host}_j)\,\beta_3 + \delta_j$$
$$p_j = \text{logistic}(\mu_j) = \frac{\exp(\mu_j)}{1 + \exp(\mu_j)}, \qquad \mu_j = \text{logit}(p_j) = \log\frac{p_j}{1 - p_j}$$
$$\delta_j \sim \text{Normal}(0, \sigma^2), \qquad \text{overdispersion: } \sigma^2 > 0$$
Parameters, Missing Values and Data
Parameters: $\theta = \{\beta_0, \beta_1, \beta_2, \beta_3, \sigma^2\}$
Missing values: $\delta = \{\delta_j\}$
Data: $d = \{r_j, n_j, \text{breed}_j, \text{host}_j\}$
Prior and Posterior Distributions
Prior:
$$\beta_0, \ldots, \beta_3 \sim \text{Normal}(0, 10^6), \qquad \delta_j \sim \text{Normal}(0, \sigma^2), \qquad \sigma^2 \sim \text{Inv-}\chi^2(10^{-6}, 10^{-6})$$

Posterior:
$$\beta_0, \ldots, \beta_3 \sim \text{unknown form (Metropolis sampler)}$$
$$\delta_j \sim \text{unknown form (Metropolis sampler)}$$
$$\sigma^2 \sim \text{Inv-}\chi^2\!\left(10^{-6} + n,\; 10^{-6} + \sum\nolimits_{j=1}^{n}\delta_j^2\right)$$
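A Metropolis-within-Gibbs sketch of this sampler in Python, using the seed data as coded in the pretreated table above. The random-walk step size, the Inv-chi2 parameterization (drawn as scale / chi-square(df)), and the burn-in length are assumptions; this is an illustration of the scheme, not the SAS PROC MCMC program shown in the demonstration.

```python
import numpy as np

def seed_sampler(r, n, breed, host, n_iter=10_000, step=0.3, seed=0):
    """Metropolis-within-Gibbs sketch for the overdispersed logistic model above.
    Random-walk Metropolis updates for beta0-beta3 and each delta_j; Gibbs update
    for sigma^2 from Inv-chi2(1e-6 + N, 1e-6 + sum(delta^2)), drawn here as
    scale / chisq(df) (assumed parameterization)."""
    rng = np.random.default_rng(seed)
    r, n = np.asarray(r, float), np.asarray(n, float)
    breed, host = np.asarray(breed, float), np.asarray(host, float)
    N = len(r)
    X = np.column_stack([np.ones(N), breed, host, breed * host])
    beta, delta, s2 = np.zeros(4), np.zeros(N), 1.0

    def loglik(mu):
        # Binomial log-likelihood contributions with p_j = logistic(mu_j).
        p = np.clip(1.0 / (1.0 + np.exp(-mu)), 1e-12, 1 - 1e-12)
        return r * np.log(p) + (n - r) * np.log1p(-p)

    keep = np.empty((n_iter, 5))                   # beta0..beta3, sigma^2
    for t in range(n_iter):
        # beta_k: random-walk Metropolis with an essentially flat N(0, 1e6) prior.
        for k in range(4):
            prop = beta.copy()
            prop[k] += step * rng.normal()
            log_acc = (loglik(X @ prop + delta).sum() - loglik(X @ beta + delta).sum()
                       - (prop[k] ** 2 - beta[k] ** 2) / (2 * 1e6))
            if np.log(rng.uniform()) < log_acc:
                beta = prop
        # delta_j: random-walk Metropolis with N(0, sigma^2) prior, one plate at a time.
        fixed = X @ beta
        for j in range(N):
            d_prop = delta.copy()
            d_prop[j] += step * rng.normal()
            log_acc = (loglik(fixed + d_prop)[j] - loglik(fixed + delta)[j]
                       - (d_prop[j] ** 2 - delta[j] ** 2) / (2 * s2))
            if np.log(rng.uniform()) < log_acc:
                delta = d_prop
        # sigma^2: Gibbs draw from its inverse chi-square full conditional.
        s2 = (1e-6 + np.sum(delta ** 2)) / rng.chisquare(1e-6 + N)
        keep[t] = (*beta, s2)
    return keep

# Seed data with breed and host coded 0/1 as in the pretreated table above.
r = [10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23, 0, 3, 22, 15, 32, 3]
n = [39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45, 4, 12, 41, 30, 51, 7]
breed = [0] * 11 + [1] * 10
host = [0] * 5 + [1] * 6 + [0] * 5 + [1] * 5
post = seed_sampler(r, n, breed, host)
print(post[2000:].mean(axis=0))   # posterior means of beta0..beta3, sigma^2 after burn-in
```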
Demonstration!
SAS Code and Demonstration
Thank You!
The Damage Data (Milliken and Johnson 1992)
Plant   Variety   Damage
1       A         3.90
2       A         4.05
3       A         4.25
4       B         3.60
5       B         4.20
6       B         4.05
7       B         3.85
8       C         4.15
9       C         4.60
10      C         4.15
11      C         4.40
12      D         3.35
13      D         3.80
Model, Prior and Posterior
$$y = X\beta + Z\gamma + \varepsilon, \qquad p(\varepsilon) = \text{Normal}(\varepsilon \mid 0, I\sigma^2)$$

$$\begin{bmatrix}3.90\\4.05\\4.25\\3.60\\4.20\\4.05\\3.85\\4.15\\4.60\\4.15\\4.40\\3.35\\3.80\end{bmatrix}
=
\begin{bmatrix}1\\1\\1\\1\\1\\1\\1\\1\\1\\1\\1\\1\\1\end{bmatrix}\beta
+
\begin{bmatrix}
1&0&0&0\\1&0&0&0\\1&0&0&0\\
0&1&0&0\\0&1&0&0\\0&1&0&0\\0&1&0&0\\
0&0&1&0\\0&0&1&0\\0&0&1&0\\0&0&1&0\\
0&0&0&1\\0&0&0&1
\end{bmatrix}
\begin{bmatrix}\gamma_1\\\gamma_2\\\gamma_3\\\gamma_4\end{bmatrix}
+
\begin{bmatrix}\varepsilon_1\\\varepsilon_2\\\varepsilon_3\\\varepsilon_4\\\varepsilon_5\\\varepsilon_6\\\varepsilon_7\\\varepsilon_8\\\varepsilon_9\\\varepsilon_{10}\\\varepsilon_{11}\\\varepsilon_{12}\\\varepsilon_{13}\end{bmatrix}$$
Model, Prior and Posterior
Prior:
$$\pi(\beta) = \text{Normal}(\beta \mid 0, 10^6)$$
$$\pi(\gamma) = \text{Normal}(\gamma \mid 0, I\sigma_A^2)$$
$$\pi(\sigma_A^2) = \text{Inv-}\chi^2(\sigma_A^2 \mid 10^{-6}, 10^{-6}), \qquad \pi(\sigma^2) = \text{Inv-}\chi^2(\sigma^2 \mid 10^{-6}, 10^{-6})$$

Posterior (with $V = Z Z^{T}\sigma_A^2 + I\sigma^2$, the marginal covariance of $y$):
$$p(\beta \mid \cdots) = \text{Normal}\!\left[\beta \mid (X^{T}V^{-1}X)^{-1}(X^{T}V^{-1}y),\; (X^{T}V^{-1}X)^{-1}\right]$$
$$p(\gamma \mid \cdots) = \text{Normal}\!\left[\gamma \mid \sigma_A^2 Z^{T}V^{-1}(y - X\beta),\; \sigma_A^2\!\left(I - Z^{T}V^{-1}Z\sigma_A^2\right)\right]$$
$$p(\sigma_A^2 \mid \cdots) = \text{Inv-}\chi^2\!\left(\sigma_A^2 \mid 10^{-6} + 4,\; 10^{-6} + \sum\nolimits_{k=1}^{4}\gamma_k^2\right)$$
$$p(\sigma^2 \mid \cdots) = \text{Inv-}\chi^2\!\left[\sigma^2 \mid 10^{-6} + 13,\; 10^{-6} + \sum\nolimits_{j=1}^{13}(y_j - X_j\beta - Z_j\gamma)^2\right]$$
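A Python sketch that samples directly from the conditional distributions listed above, with $V = ZZ^{T}\sigma_A^2 + I\sigma^2$ and Inv-chi2(df, S) drawn as S / chi-square(df) (assumed parameterization). It is an illustration only, not the PROC MCMC program.

```python
import numpy as np

def damage_sampler(y, Z, n_iter=10_000, seed=0):
    """Sampler following the conditional distributions listed above, with
    V = Z Z' sigma_A^2 + I sigma^2 and Inv-chi2(df, S) drawn as S / chisq(df)
    (assumed parameterization). Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, float)
    n, q = Z.shape                          # 13 plants, 4 varieties
    x = np.ones(n)                          # design column for the intercept beta
    s2, s2A = 1.0, 1.0
    gamma = np.zeros(q)
    keep = np.empty((n_iter, 1 + q + 2))    # beta, gamma1..gamma4, sigma_A^2, sigma^2
    for t in range(n_iter):
        V = s2A * Z @ Z.T + s2 * np.eye(n)
        Vinv = np.linalg.inv(V)
        # beta | ... ~ Normal((X'V^-1 X)^-1 X'V^-1 y, (X'V^-1 X)^-1)
        xv = Vinv @ x
        xvx = x @ xv
        beta = rng.normal((xv @ y) / xvx, np.sqrt(1.0 / xvx))
        # gamma | ... ~ Normal(s2A Z'V^-1 (y - X beta), s2A (I - Z'V^-1 Z s2A))
        resid = y - x * beta
        mean_g = s2A * Z.T @ Vinv @ resid
        cov_g = s2A * (np.eye(q) - s2A * Z.T @ Vinv @ Z)
        cov_g = (cov_g + cov_g.T) / 2       # guard against numerical asymmetry
        gamma = rng.multivariate_normal(mean_g, cov_g)
        # Variance components from their inverse chi-square conditionals.
        s2A = (1e-6 + np.sum(gamma ** 2)) / rng.chisquare(1e-6 + q)
        e = y - x * beta - Z @ gamma
        s2 = (1e-6 + np.sum(e ** 2)) / rng.chisquare(1e-6 + n)
        keep[t] = (beta, *gamma, s2A, s2)
    return keep

# Damage data and the variety incidence matrix Z (3 A, 4 B, 4 C, 2 D plants).
y = [3.90, 4.05, 4.25, 3.60, 4.20, 4.05, 3.85, 4.15, 4.60, 4.15, 4.40, 3.35, 3.80]
Z = np.zeros((13, 4))
for j, v in enumerate([0] * 3 + [1] * 4 + [2] * 4 + [3] * 2):
    Z[j, v] = 1.0
post = damage_sampler(y, Z)
print(post[2000:].mean(axis=0))             # posterior means after burn-in
```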