Bayesian methods II

Bayesian analysis
grid method (recap)
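Model and data
[Figure: abundance (10^3) against year]
Data: $N_{1981}, CV_{1981}, N_{1988}, CV_{1988}, N_{1998}, CV_{1998}$
Model:
$$\hat{N}_{1973} = N_{1973}, \qquad \hat{N}_{t+1} = \hat{N}_t (1 + r)$$
Negative log-likelihood:
$$-\ln L = \frac{(\ln N_{1981} - \ln \hat{N}_{1981})^2}{2\sigma_{1981}^2} + \frac{(\ln N_{1988} - \ln \hat{N}_{1988})^2}{2\sigma_{1988}^2} + \frac{(\ln N_{1998} - \ln \hat{N}_{1998})^2}{2\sigma_{1998}^2}$$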
Grid method for posterior
Posterior probability of individual
pairs of r and N1973 values
Value in each cell is hypothesis Hi of each value of r and N1973
$$P(H_i \mid \text{data}) = \frac{L(H_i \mid \text{data}) \times \text{Prior}(H_i)}{\sum_j L(H_j \mid \text{data}) \times \text{Prior}(H_j)}$$
The denominator is the sum over all cells.
• In each cell calculate likelihood×prior for each
hypothesis Hi (each pair of r and N1973 values).
• Then divide each cell by the sum of the
likelihood×prior in all the cells
• The result is the posterior probability for each cell
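A minimal Python sketch of the three steps above, under stated assumptions: the survey values in DATA are placeholders (not the numbers in the spreadsheet), sigma_t is taken to equal CV_t, the prior is uniform over the grid, and the grid spacing and the helper names nll and grid_posterior are hypothetical.

```python
import math

# PLACEHOLDER data, year: (abundance estimate, CV). Illustrative values only,
# not the survey numbers used in the spreadsheet.
DATA = {1981: (450.0, 0.40), 1988: (560.0, 0.45), 1998: (2300.0, 0.35)}

def nll(r, n1973):
    """Negative log-likelihood, assuming sigma_t = CV_t (an assumption)."""
    total = 0.0
    for year, (n_obs, cv) in DATA.items():
        n_hat = n1973 * (1.0 + r) ** (year - 1973)  # exponential growth model
        total += (math.log(n_obs) - math.log(n_hat)) ** 2 / (2.0 * cv ** 2)
    return total

def grid_posterior(r_values, n_values, prior=lambda r, n1973: 1.0):
    """Likelihood x prior in every cell, divided by the sum over all cells."""
    cells = {(r, n): math.exp(-nll(r, n)) * prior(r, n)
             for r in r_values for n in n_values}
    total = sum(cells.values())
    return {pair: value / total for pair, value in cells.items()}

r_values = [-0.10 + 0.01 * i for i in range(31)]  # r from -0.10 to 0.20
n_values = [50.0 * j for j in range(1, 41)]       # N1973 from 50 to 2000
posterior = grid_posterior(r_values, n_values)
best_r, best_n = max(posterior, key=posterior.get)
print(f"highest-posterior cell: r = {best_r:.2f}, N1973 = {best_n:.0f}")
```

Because every cell is divided by the same total, the posterior values sum to 1 whatever constant factors were dropped from the likelihood.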
20 Antarctic blue grid.xlsx, sheet “many cells”
Integration not maximization
[Figure: grid of cells over r (-0.10 to 0.20) and N1973 (up to 2000); each value of r has its own integration column across the N1973 values]
Maximum likelihood: for each value of r, search for N1973 with the best NLL (stars)
Bayesian: for each value of r, integrate (“add up”) cells across values of N1973
Where the green/yellow area is very narrow, Bayesian integration will have smaller
summed probability compared to the maximum value used in a likelihood profile
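The contrast between the two approaches can be sketched in Python as follows. This is an illustration only: the DATA values are placeholders, sigma_t = CV_t is assumed, the prior is uniform, and profile_and_marginal is a hypothetical helper name.

```python
import math

# PLACEHOLDER data, year: (abundance estimate, CV). Illustrative values only.
DATA = {1981: (450.0, 0.40), 1988: (560.0, 0.45), 1998: (2300.0, 0.35)}

def nll(r, n1973):
    """Negative log-likelihood, assuming sigma_t = CV_t (an assumption)."""
    total = 0.0
    for year, (n_obs, cv) in DATA.items():
        n_hat = n1973 * (1.0 + r) ** (year - 1973)
        total += (math.log(n_obs) - math.log(n_hat)) ** 2 / (2.0 * cv ** 2)
    return total

def profile_and_marginal(r_values, n_values):
    """For each r: best cell (maximization) and summed cells (integration)."""
    like = {r: [math.exp(-nll(r, n)) for n in n_values] for r in r_values}
    best = {r: max(column) for r, column in like.items()}
    summed = {r: sum(column) for r, column in like.items()}
    top, total = max(best.values()), sum(summed.values())
    profile = {r: value / top for r, value in best.items()}       # scaled likelihood
    marginal = {r: value / total for r, value in summed.items()}  # posterior for r
    return profile, marginal

r_values = [-0.10 + 0.01 * i for i in range(31)]   # r from -0.10 to 0.20
n_values = [10.0 * j for j in range(1, 201)]       # N1973 from 10 to 2000
profile, marginal = profile_and_marginal(r_values, n_values)
for r in r_values[::5]:
    print(f"r = {r:+.2f}  profile = {profile[r]:.3f}  marginal = {marginal[r]:.4f}")
```

The profile keeps only the single best cell in each column, while the marginal adds up the whole column, so a column with a tall but narrow peak counts for less in the marginal.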
Normal prior on r = $N[0.062, 0.029^2]$
[Figure: two panels, “No prior” and “Normal prior”, over r from -0.10 to 0.20]
Punt et al. (2010) looked
at actual increase rates in
depleted whale
populations and found a
mean of 6.2% and SD of
2.9%
Multiply the likelihood by
a prior for r that is
normal with mean 0.062
and SD 0.029
Dropping constants,
$$-\ln \text{Prior} = \frac{(r - 0.062)^2}{2 \times 0.029^2}$$
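A worked version of the "dropping constants" step, written for a general normal prior with mean $\mu$ and standard deviation $\sigma$ (symbols introduced here for illustration):

$$-\ln \text{Prior}(r) = -\ln\left[\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(r-\mu)^2}{2\sigma^2}\right)\right] = \ln\left(\sigma\sqrt{2\pi}\right) + \frac{(r-\mu)^2}{2\sigma^2}$$

The first term does not depend on r, so it adds the same amount to every cell and cancels when the cells are normalized. With $\mu = 0.062$ and $\sigma = 0.029$, a value of r = 0.104 incurs a penalty of $0.042^2 / (2 \times 0.029^2) = 0.001764 / 0.001682 \approx 1.05$ negative log units.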
Punt AE & Allison C (2010) Appendix 2. Revised outcomes from the Bayesian meta-analysis, Annex D: Report of the sub-committee on the
revised management procedure. Journal of Cetacean Research and Management (Suppl. 2) 11:129-130
Integration not maximization
[Figure: three panels plotted against the value of r (-0.10 to 0.20)]
• No prior (scaled likelihood): MLE estimate 0.104, 95% confidence interval 0.038-0.170
• Bayesian, uniform prior U[-0.1, 0.2] (posterior probability): median 0.086, 95% credible interval 0.022-0.155
• Informative prior N(0.062, 0.029^2) (posterior probability): median 0.072, 95% credible interval 0.029-0.115
Effect of different priors
[Figure: posterior probability against value of r (-0.1 to 0.2) for three priors: N(0.062, 0.029^2), U[-0.1, 0.118], and U[-0.1, 0.2]]
20 Antarctic blue grid.xlsx, sheet “compare all priors”
SIR method
Problem with grid method
• You don’t know how fine to make the grid steps
• You really want steps to be continuous
• Instead of systematic sampling, the SIR method
randomly samples (r, N1973) pairs from the grid region
• Good guesses (draws) with high likelihood×prior are
kept and bad draws are discarded
• Stop when enough draws have been saved that the
posterior is smooth (for example 1000 or 5000 draws)
21 Antarctic blue SIR.xlsx, sheet “Normal prior”
SIR: sampling-importance resampling
(simplest and least efficient version)
• Find the maximum value of likelihood×prior, Y
• Randomly sample pairs of r and N1973
• For each pair, calculate X = likelihood×prior
• Accept pair with probability X/Y, otherwise reject
• Note that X/Y = exp(NLL(Y) - NLL(X)), where NLL is the negative log of likelihood×prior; this form is often
easier to work with
• Accepted pairs are draws from the posterior
• Repeat until you have sufficient accepted pairs
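The steps above can be sketched in Python as follows. This is a sketch under stated assumptions: DATA holds placeholder values, sigma_t = CV_t, pairs are sampled uniformly from r in [-0.10, 0.20] and N1973 in [1, 2000], the prior on r is N(0.062, 0.029^2), and Y is approximated by a coarse grid search. The function names are hypothetical.

```python
import math
import random

# PLACEHOLDER data, year: (abundance estimate, CV). Illustrative values only.
DATA = {1981: (450.0, 0.40), 1988: (560.0, 0.45), 1998: (2300.0, 0.35)}

def nll_total(r, n1973):
    """-ln(likelihood x prior): normal prior on r plus the data terms."""
    total = (r - 0.062) ** 2 / (2.0 * 0.029 ** 2)
    for year, (n_obs, cv) in DATA.items():
        n_hat = n1973 * (1.0 + r) ** (year - 1973)
        total += (math.log(n_obs) - math.log(n_hat)) ** 2 / (2.0 * cv ** 2)
    return total

def sir(n_draws, nll_best, rng):
    """Keep each random pair with probability X/Y = exp(NLL(Y) - NLL(X))."""
    accepted = []
    for _ in range(n_draws):
        r = rng.uniform(-0.10, 0.20)
        n1973 = rng.uniform(1.0, 2000.0)
        ratio = math.exp(nll_best - nll_total(r, n1973))
        if rng.random() < ratio:
            accepted.append((r, n1973))
    return accepted

# Y approximated on a coarse grid (assumption). A draw that happens to beat
# the grid optimum has ratio > 1 and is simply always accepted.
nll_best = min(nll_total(-0.10 + 0.002 * i, 5.0 * j)
               for i in range(151) for j in range(1, 401))
rng = random.Random(1)  # fixed seed so the run is repeatable
draws = sir(20000, nll_best, rng)
print(f"{len(draws)} of 20000 draws accepted")
```

Working with the difference of negative log values avoids computing very small likelihoods directly.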
SIR: accepted, rejected
[Figure: sampled pairs plotted as value of r (-0.10 to 0.20) against value of N1973 (0 to 2000), marked as accepted or rejected]
Advantage of discrete samples
• Each draw that is saved is a sample from the
posterior distribution
• We can take these pairs of (r, N1973) and project the
model into the future for each pair
• This gives us future predictions for the joint values of
the parameters
• Takes into account correlations between parameter
values
20,000 samples, 296 accepted
• r = 0.072, 95% interval = 0.027-0.112
– Grid method 0.072, 0.029-0.115
• N1973 = 320, 95% interval = 145-689
• LOTS of rejected function calls (waste)
• Tricks to increase acceptance rates
– Accept with probability X/Z, where Z is smaller than the
maximum (Y): more draws are accepted, though some draws will
be duplicated in the posterior (next slides)
– Sample parameter values from an importance function,
compare likelihood ratios, then account for the importance
function (not covered)
Increase acceptance rate with threshold
• Choose threshold Z where Z < maximum likelihood Y
• Randomly sample pairs of r and N1973
• For each pair, calculate X = likelihood × prior
• If X ≤ Z, save one copy of the pair with probability X/Z
• If X > Z, save multiple copies of the pair
– e.g. if X/Z = 4.6 then save 5 copies with probability 0.6 or 4
copies with probability 0.4
• Rule of thumb: stop when no single pair makes up more than 0.1% of all
saved draws
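The copy rule and the stopping check can be sketched in Python as below. The helper names n_copies and largest_share are hypothetical, and the example pairs in the usage note are made up. A single rule covers both cases: take the whole part of X/Z and add one more copy with probability equal to the fractional part.

```python
import math
import random
from collections import Counter

def n_copies(ratio, rng):
    """Number of copies to save for a pair with ratio = X/Z.

    ratio = 0.3 gives 1 copy with probability 0.3, otherwise 0 copies.
    ratio = 4.6 gives 5 copies with probability 0.6, otherwise 4 copies.
    """
    whole = math.floor(ratio)
    extra = 1 if rng.random() < ratio - whole else 0
    return whole + extra

def largest_share(saved):
    """Fraction of all saved draws taken up by the most duplicated pair."""
    counts = Counter(saved)
    return max(counts.values()) / len(saved)

rng = random.Random(1)  # fixed seed so the run is repeatable
copies = [n_copies(4.6, rng) for _ in range(20000)]
print("mean copies for X/Z = 4.6:", sum(copies) / len(copies))
```

For the rule of thumb, sampling would continue while largest_share(saved) is above 0.001.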
Accepted multiple times, accepted once, rejected
[Figure: sampled pairs plotted as value of r (-0.10 to 0.20) against value of N1973 (0 to 2000), marked as accepted multiple times, accepted once, or rejected]
MCMC method
Markov chain Monte Carlo
Markov chain Monte Carlo (MCMC)
(general idea)
• Start somewhere
• Randomly jump somewhere else
• If you found a better place, go there
• If you found a worse place, go there with some
probability
• There are formal proofs that this works
MCMC algorithm
• Start anywhere with values for r1, N1973,1, X1 =
likelihood×prior
• Jump function: add random numbers to r1 and
N1973,1, to get a candidate draw: r*, N1973*, and X* =
likelihood×prior
• Calculate X*/X1 which equals exp(NLL(X1) – NLL(X*))
• If random number U[0,1] is < X*/X1 then r2 = r*,
N1973,2 = N1973*, X2 = X* [accept draw, new values]
• If random number U[0,1] is ≥ X*/X1 then r2 = r1,
N1973,2 = N1973,1, X2 = X1 [reject draw, keep previous
values]
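The algorithm above can be sketched in Python as follows. Assumptions, all illustrative: DATA holds placeholder values, sigma_t = CV_t, the prior on r is N(0.062, 0.029^2), the jump function adds normal random numbers with hypothetical step sizes, and candidates outside r in [-0.10, 0.20] or N1973 in (0, 2000] are treated as having zero prior and rejected.

```python
import math
import random

# PLACEHOLDER data, year: (abundance estimate, CV). Illustrative values only.
DATA = {1981: (450.0, 0.40), 1988: (560.0, 0.45), 1998: (2300.0, 0.35)}

def nll_total(r, n1973):
    """-ln(likelihood x prior): normal prior on r plus the data terms."""
    total = (r - 0.062) ** 2 / (2.0 * 0.029 ** 2)
    for year, (n_obs, cv) in DATA.items():
        n_hat = n1973 * (1.0 + r) ** (year - 1973)
        total += (math.log(n_obs) - math.log(n_hat)) ** 2 / (2.0 * cv ** 2)
    return total

def mcmc(n_steps, r0, n0, jump_r, jump_n, rng):
    """Return the chain of (r, N1973) values and the number of accepted jumps."""
    r, n = r0, n0
    current = nll_total(r, n)
    chain, accepted = [], 0
    for _ in range(n_steps):
        r_new = r + rng.gauss(0.0, jump_r)   # symmetric jump function
        n_new = n + rng.gauss(0.0, jump_n)
        if -0.10 <= r_new <= 0.20 and 0.0 < n_new <= 2000.0:
            candidate = nll_total(r_new, n_new)
            # X*/X1 = exp(NLL(X1) - NLL(X*)), capped at 1 to avoid overflow
            if rng.random() < math.exp(min(0.0, current - candidate)):
                r, n, current = r_new, n_new, candidate
                accepted += 1
        chain.append((r, n))  # on rejection the previous values are repeated
    return chain, accepted

rng = random.Random(1)  # fixed seed so the run is repeatable
chain, accepted = mcmc(10000, 0.0, 1000.0, 0.02, 100.0, rng)
print(f"{accepted} of {len(chain)} candidate draws accepted")
```

The jump sizes control the acceptance rate: smaller jumps are accepted more often but move through the posterior more slowly.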
21 Antarctic blue MCMC.xlsx
MCMC algorithm
• Successive points wander around the posterior
• If you start far away, it will take some time to get near
to the highest likelihood
• Therefore, discard the first 20% of accepted draws (burn-in period)
• Thin the chain by retaining only one in every n
accepted draws
• Convergence is judged to be attained when there is no autocorrelation in the
thinned chain (there are other tests for convergence)
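Burn-in, thinning, and a simple autocorrelation check can be sketched in Python as below. The chain here is a stand-in (an artificial autocorrelated series), not output from the whale model, and the function names, the thinning interval of 50, and the lag-1 statistic are illustrative choices.

```python
import random

def burn_and_thin(chain, burn_fraction=0.2, thin=10):
    """Drop the first burn_fraction of the chain, then keep every thin-th value."""
    start = int(len(chain) * burn_fraction)
    return chain[start::thin]

def lag1_autocorrelation(values):
    """Correlation between each value and the one that follows it."""
    mean = sum(values) / len(values)
    num = sum((a - mean) * (b - mean) for a, b in zip(values, values[1:]))
    den = sum((v - mean) ** 2 for v in values)
    return num / den

# Stand-in chain: each value depends strongly on the previous one,
# as successive MCMC draws do.
rng = random.Random(1)  # fixed seed so the run is repeatable
chain, x = [], 0.0
for _ in range(10000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    chain.append(x)

kept = burn_and_thin(chain, burn_fraction=0.2, thin=50)
print("lag-1 autocorrelation, full chain:", round(lag1_autocorrelation(chain), 3))
print("lag-1 autocorrelation, thinned:   ", round(lag1_autocorrelation(kept), 3))
```

If the thinned chain still shows clear autocorrelation, a larger thinning interval or a longer chain is one thing to try.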
21 Antarctic blue MCMC.xlsx, sheet “Normal prior”
[Figure: trace for r, trace for N1973, and r vs. N1973, shown for draws 1-500 and for draws 2,000-10,000 (first 2000 discarded); r axis from -0.10 to 0.20, N1973 axis from 0 to 2000]
10,000 samples, 2669 accepted
• r = 0.074, 95% interval = 0.032-0.118
– Grid method 0.072, 0.029-0.115
• N1973 = 302, 95% interval = 130-673
• Options to adjust: increase the length of the chain, change the jump size,
change the thinning rate, change the burn-in period,
etc.
[Figure: accepted and rejected draws, value of r (-0.10 to 0.20) against value of N1973 (0 to 2000), for MCMC (10000 samples) and SIR (20000 samples)]
• MCMC does not explore space with low likelihood
• Therefore many more draws are accepted
21 Accepted rejected comparison.xlsx
What to do with accepted draws
• Histogram of r values = posterior for r
• Histogram of N1973 values = posterior for N1973
• Proportion of r values < 0 is the probability that the
population is declining (2 out of 8000 => P = 0.00025)
• Can run the population model for each accepted draw of r
and N1973 and calculate 95% credible intervals for
past and future years
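These summaries can be sketched in Python as follows. The draws listed are hypothetical values made up for illustration, not output from the spreadsheets, and the helper names and the linear-interpolation quantile are choices made for this sketch. The projection uses the exponential growth model N(t+1) = N(t)(1 + r) starting from 1973.

```python
import math

def quantile(values, q):
    """Quantile by linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def prob_decline(draws):
    """Proportion of accepted draws with r < 0."""
    return sum(1 for r, _ in draws if r < 0) / len(draws)

def project(r, n1973, last_year):
    """Abundance for every year from 1973 to last_year for one draw."""
    return {year: n1973 * (1.0 + r) ** (year - 1973)
            for year in range(1973, last_year + 1)}

def interval_by_year(draws, last_year):
    """95% credible interval of abundance in each year across all draws."""
    paths = [project(r, n1973, last_year) for r, n1973 in draws]
    return {year: (quantile([p[year] for p in paths], 0.025),
                   quantile([p[year] for p in paths], 0.975))
            for year in paths[0]}

# HYPOTHETICAL accepted draws of (r, N1973), for illustration only
draws = [(-0.01, 700.0), (0.04, 450.0), (0.06, 350.0), (0.07, 320.0),
         (0.08, 300.0), (0.09, 250.0), (0.10, 220.0), (0.11, 180.0)]

r_values = [r for r, _ in draws]
print("median r:", quantile(r_values, 0.5))
print("P(declining):", prob_decline(draws))
print("95% interval in 2010:", interval_by_year(draws, 2010)[2010])
```

Projecting each (r, N1973) pair together, rather than combining separate summaries of r and N1973, is what carries the correlation between the two parameters into the intervals.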
Bayesian methods summary
• Many different algorithms: grid method, SIR
method, MCMC method (Gibbs samplers) etc.
• All involve priors, likelihoods, and posteriors
• Natural interpretation of probability
• Allow use of other information
• Posterior draws can be used for prediction