Ch. 8 Markov Chain Monte Carlo

A Solomon Kurz
5/3/2017
8.3. Easy HMC
Let's get our data.
# detach(package:brms)
library(rethinking)
data(rugged)
d <- rugged
d$log_gdp <- log(d$rgdppc_2000)
dd <- d[complete.cases(d$rgdppc_2000), ]
Closing rethinking and opening brms.
rm(rugged)
detach(package:rethinking)
library(brms)
The brms version of m8.1:
b8.1 <- brm(data = dd, family = "gaussian",
            log_gdp ~ 1 + rugged + cont_africa + rugged:cont_africa,
            prior = c(set_prior("normal(0, 100)", class = "Intercept"),
                      set_prior("normal(0, 10)", class = "b"),
                      set_prior("uniform(0, 10)", class = "sigma")),
            chains = 4, iter = 2000, warmup = 1000, cores = 4)
## Warning: It appears as if you have specified an upper bounded prior on a parameter that has no natural upper bound.
## If this is really what you want, please specify argument 'ub' of 'set_prior' appropriately.
## Warning occurred for prior
## sigma ~ uniform(0, 10)
## Compiling the C++ model
## Start sampling
print(b8.1)
##  Family: gaussian(identity)
## Formula: log_gdp ~ 1 + rugged + cont_africa + rugged:cont_africa
##    Data: dd (Number of observations: 170)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##    WAIC: Not computed
##
## Population-Level Effects:
##                    Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept              9.22      0.14     8.95     9.50       2842    1
## rugged                -0.20      0.08    -0.36    -0.06       2674    1
## cont_africa           -1.95      0.23    -2.40    -1.50       2718    1
## rugged:cont_africa     0.40      0.13     0.14     0.66       2509    1
##
## Family Specific Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.95      0.05     0.86     1.06       4000    1
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Using a uniform prior for sigma works, but it returns a warning. Use priors with hard
boundaries judiciously.
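If you really do want the hard boundary, the warning suggests declaring it explicitly. Here's a minimal sketch of that, assuming set_prior()'s ub argument behaves the way the warning message describes; b8.1b is just a hypothetical name for the refit.

```r
# Hypothetical refit: declaring the upper bound explicitly should
# silence the bounded-prior warning
b8.1b <- brm(data = dd, family = "gaussian",
             log_gdp ~ 1 + rugged + cont_africa + rugged:cont_africa,
             prior = c(set_prior("normal(0, 100)", class = "Intercept"),
                       set_prior("normal(0, 10)", class = "b"),
                       set_prior("uniform(0, 10)", class = "sigma", ub = 10)),
             chains = 4, iter = 2000, warmup = 1000, cores = 4)
```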
8.3.2. Estimation.
Here we're just switching out our uniform prior for a half-Cauchy. We're also anticipating
the parallelization bit in section 8.3.3.
b8.1stan <- brm(data = dd, family = "gaussian",
                log_gdp ~ 1 + rugged + cont_africa + rugged:cont_africa,
                prior = c(set_prior("normal(0, 100)", class = "Intercept"),
                          set_prior("normal(0, 10)", class = "b"),
                          set_prior("cauchy(0, 2)", class = "sigma")),
                chains = 4, iter = 2000, warmup = 1000, cores = 4)
## Compiling the C++ model
## Start sampling
print(b8.1stan)
##  Family: gaussian(identity)
## Formula: log_gdp ~ 1 + rugged + cont_africa + rugged:cont_africa
##    Data: dd (Number of observations: 170)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##    WAIC: Not computed
##
## Population-Level Effects:
##                    Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept              9.23      0.14     8.95     9.50       3090    1
## rugged                -0.20      0.08    -0.36    -0.05       2662    1
## cont_africa           -1.95      0.23    -2.40    -1.51       2345    1
## rugged:cont_africa     0.40      0.13     0.13     0.66       2243    1
##
## Family Specific Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.95      0.05     0.85     1.06       4000    1
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
8.3.4. Visualization.
post <- posterior_samples(b8.1stan)
str(post)
## 'data.frame':    4000 obs. of  6 variables:
##  $ b_Intercept         : num  9.44 9.42 9.03 9.11 9.05 ...
##  $ b_rugged            : num  -0.3697 -0.3012 -0.0983 -0.1429 -0.1261 ...
##  $ b_cont_africa       : num  -2.01 -2.44 -1.85 -1.67 -1.91 ...
##  $ b_rugged:cont_africa: num  0.492 0.685 0.357 0.256 0.42 ...
##  $ sigma               : num  1.002 0.916 0.969 0.926 0.912 ...
##  $ lp__                : num  -76.7 -76.6 -74.8 -74.3 -74.9 ...
pairs(post[, 1:5])
pairs(b8.1stan, pars = parnames(b8.1stan)[1:5])
8.3.5. Using the samples
These don't quite do what McElreath's show() does, but they get you most of the way there.
For details, check the documentation for the loo package.
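The same functions will also compare multiple fits. For example, assuming the b8.1 fit from above is still in memory, passing both models should return each information criterion along with their difference:

```r
# Compare the uniform-sigma and half-Cauchy-sigma fits
WAIC(b8.1, b8.1stan)
LOO(b8.1, b8.1stan)
```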
WAIC(b8.1stan)
##    WAIC    SE
##  469.41 14.89
LOO(b8.1stan)
##  LOOIC    SE
## 469.57 14.92
8.3.6. Checking the chain.
By default, plot in brms only shows information after warmup.
plot(b8.1stan)
Also, check out launch_shiny(b8.1stan).
Here's the brms way to do what McElreath displays in his "Overthinking: Raw Stan model
code."
stancode(b8.1stan)
## // generated with brms 1.6.1
## functions {
## }
## data {
##   int<lower=1> N;  // total number of observations
##   vector[N] Y;  // response variable
##   int<lower=1> K;  // number of population-level effects
##   matrix[N, K] X;  // population-level design matrix
##   int prior_only;  // should the likelihood be ignored?
## }
## transformed data {
##   int Kc;
##   matrix[N, K - 1] Xc;  // centered version of X
##   vector[K - 1] means_X;  // column means of X before centering
##   Kc = K - 1;  // the intercept is removed from the design matrix
##   for (i in 2:K) {
##     means_X[i - 1] = mean(X[, i]);
##     Xc[, i - 1] = X[, i] - means_X[i - 1];
##   }
## }
## parameters {
##   vector[Kc] b;  // population-level effects
##   real temp_Intercept;  // temporary intercept
##   real<lower=0> sigma;  // residual SD
## }
## transformed parameters {
## }
## model {
##   vector[N] mu;
##   mu = Xc * b + temp_Intercept;
##   // prior specifications
##   b ~ normal(0, 10);
##   temp_Intercept ~ normal(0, 100);
##   sigma ~ cauchy(0, 2);
##   // likelihood contribution
##   if (!prior_only) {
##     Y ~ normal(mu, sigma);
##   }
## }
## generated quantities {
##   real b_Intercept;  // population-level intercept
##   b_Intercept = temp_Intercept - dot_product(means_X, b);
## }
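Relatedly, if you'd like to see the data brms passed to Stan along with that code, there appears to be a standata() counterpart (an assumption on my part; check the brms documentation):

```r
# Inspect the list of data objects (N, Y, K, X, ...) handed to Stan
str(standata(b8.1stan))
```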
8.4. Care and feeding of your Markov chain
8.4.3. Taming a wild chain.
dfy <- data.frame(y = c(-1, 1))
Inits <- list(Intercept = 0,
sigma = 1)
InitsList <- list(Inits, Inits)
b8.2 <- brm(data = dfy, family = "gaussian",
            y ~ 1,
            prior = c(set_prior("uniform(-100000000, 100000000)", class = "Intercept"),
                      set_prior("uniform(0, 100000000)", class = "sigma")),
            inits = InitsList,
            chains = 2, iter = 4000, warmup = 1000, cores = 2)
## Compiling the C++ model
## Start sampling
plot(b8.2)
print(b8.2)
##  Family: gaussian(identity)
## Formula: y ~ 1
##    Data: dfy (Number of observations: 2)
## Samples: 2 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup samples = 6000
##    WAIC: Not computed
##
## Population-Level Effects:
##           Estimate Est.Error  l-95% CI u-95% CI Eff.Sample Rhat
## Intercept -8028526  23232143 -82260702  8226872         11  1.1
##
## Family Specific Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma 12231194  24685650   285.57 90403794          3 1.18
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
What a mess! This is one of those situations in which it's handy that Stan shoots out all
those warning messages.
Let's follow McElreath and use weakly-regularizing priors.
b8.3 <- brm(data = dfy, family = "gaussian",
y ~ 1,
prior = c(set_prior("normal(1, 10)", class = "Intercept"),
set_prior("cauchy(0, 1)", class = "sigma")),
inits = InitsList,
chains = 2, iter = 4000, warmup = 1000, cores = 2)
## Compiling the C++ model
## Start sampling
plot(b8.3)
print(b8.3)
##  Family: gaussian(identity)
## Formula: y ~ 1
##    Data: dfy (Number of observations: 2)
## Samples: 2 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup samples = 6000
##    WAIC: Not computed
##
## Population-Level Effects:
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -0.01      1.83    -3.74     3.61        900    1
##
## Family Specific Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     2.04      1.98      0.6     7.09       1174    1
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Still rough warnings, but much better overall.
8.4.4. Non-identifiable parameters.
Simulating the data and putting it in a data frame.
set.seed(815)
dfy <- data.frame(y = rnorm(100, mean = 0, sd = 1))
I'm not quite sure how to specify McElreath's models m8.4 and m8.5 in brms. I'm not
concerned, though. It's unclear when such a model would be called for outside of a
pedagogical context.
Anyway, here was my attempt at m8.4:
Inits <- list(intercept = 0,
intercept = 0,
sigma = 1)
InitsList <- list(Inits, Inits)
b8.4 <- brm(data = dfy, family = "gaussian",
            y ~ 0 + intercept + intercept,
            prior = c(set_prior("uniform(-100000000, 100000000)", class = "b"),
                      set_prior("cauchy(0, 1)", class = "sigma")),
            inits = InitsList,
            chains = 2, iter = 4000, warmup = 1000, cores = 2)
## Warning: It appears as if you have specified a lower bounded prior on a parameter that has no natural lower bound.
## If this is really what you want, please specify argument 'lb' of 'set_prior' appropriately.
## Warning occurred for prior
## b ~ uniform(-100000000, 100000000)
## Warning: It appears as if you have specified an upper bounded prior on a parameter that has no natural upper bound.
## If this is really what you want, please specify argument 'ub' of 'set_prior' appropriately.
## Warning occurred for prior
## b ~ uniform(-100000000, 100000000)
## Compiling the C++ model
## Start sampling
plot(b8.4)
print(b8.4)
##  Family: gaussian(identity)
## Formula: y ~ 0 + intercept + intercept
##    Data: dfy (Number of observations: 100)
## Samples: 2 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup samples = 6000
##    WAIC: Not computed
##
## Population-Level Effects:
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## intercept    -0.06       0.1    -0.25     0.13       5086    1
##
## Family Specific Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.99      0.07     0.86     1.14       5443    1
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
See? It didn't quite work right. So it goes.
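If you'd rather reproduce the pathology on purpose, one speculative workaround is to give the data two constant columns, so the formula really does contain two redundant intercept-like predictors. Here's a sketch, with hypothetical column names a1 and a2, and b8.4b as a made-up fit name:

```r
# Two copies of a constant predictor make their sum non-identifiable,
# mimicking McElreath's m8.4; a1, a2, and b8.4b are hypothetical names
dfy$a1 <- 1
dfy$a2 <- 1

b8.4b <- brm(data = dfy, family = "gaussian",
             y ~ 0 + a1 + a2,
             prior = c(set_prior("uniform(-100000000, 100000000)", class = "b"),
                       set_prior("cauchy(0, 1)", class = "sigma")),
             chains = 2, iter = 4000, warmup = 1000, cores = 2)
```

If this works as intended, the two coefficients should wander with huge posterior SDs while their sum stays near 0.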
Anyway, remove your objects.
rm(d, dd, b8.1, b8.1stan, dfy, Inits, InitsList, b8.2, b8.3, b8.4)
Note. The analyses in this document were done with:
• R 3.4
• RStudio 1.0.143
• rmarkdown 1.4
• rethinking 1.59
• brms 1.6.1
• ggplot2 2.2.1
References
McElreath, R. (2016). Statistical rethinking: A Bayesian course with examples in R and Stan.
Chapman & Hall/CRC Press.