Bayesian Analysis of Longitudinal Studies

Emmanuel Lesaffre
Department of Biostatistics, Erasmus Medical Center
Interuniversity Institute for Biostatistics and statistical Bioinformatics
Catholic University Leuven & University Hasselt
[email protected]

(58 RBRAS Campina Grande, 22 to 26 July 2013)
Contents

0.1 Preface
1 General introduction
  1.1 Notation
  1.2 Probability in a Bayesian context
  1.3 Bayes theorem
  1.4 Bayesian concepts
  1.5 Bayesian versus frequentist approach
  1.6 Markov chain Monte Carlo sampling
  1.7 Bayesian software
2 Hierarchical models
  2.1 Gaussian hierarchical models
  2.2 Mixed models
  2.3 Missing data
  2.4 Some reflections
3 Model evaluation techniques
  3.1 Introduction
  3.2 Information theoretic measures for model selection
  3.3 Model checking procedures
4 Longitudinal studies
  4.1 List of (future) types of models
  4.2 Mixed models with non-Gaussian random effects
    4.2.1 A linear mixed model with a non-Gaussian random effect
    4.2.2 A generalized linear mixed model with a mixture distribution for REs
  4.3 Changepoint models
    4.3.1 Introduction
    4.3.2 The HIV RCT
    4.3.3 The Prostate Cancer Study
    4.3.4 Overall conclusions
  4.4 Modeling growth
    4.4.1 Introduction
    4.4.2 The Predict Study
    4.4.3 The Jimma Infant Survival Study
  4.5 Smooth longitudinal curves
    4.5.1 Introduction
    4.5.2 Smoothing approach: background
    4.5.3 Two longitudinal smoothing exercises
  4.6 Multivariate longitudinal curves
    4.6.1 Introduction
    4.6.2 Intraocular pressure study: two longitudinal Gaussian responses
    4.6.3 Signal-Tandmobiel® study: two longitudinal ordinal responses
  4.7 Joint modeling of the repeated measures and the failure time
  4.8 Longitudinal profiles as covariates
  4.9 An autoregressive + random effects model
  4.10 Repeated measures within repeated measures
0.1 Preface
What can be expected from this short course?
• First, a brief refresher of Bayesian concepts
• A brief introduction to Bayesian hierarchical models
• Bayesian mixed models
• Model evaluation techniques
• Bayesian longitudinal studies (in progress)
Some publicity
Chapter 1
General introduction
◃ Notation
◃ Probability in a Bayesian context
◃ Bayes theorem and concepts
◃ Bayesian versus frequentist approach
◃ Markov chain Monte Carlo sampling
◃ Bayesian software
1.1 Notation
• Random variable and realization: y
• Sample: y = {y1, . . . , yn}
• Parameter vector: θ
• Density or distribution: p(y | θ), likelihood: L(θ | y)
1.2 Probability in a Bayesian context
• Frequentist probability = long run frequency definition
• Bayesian probability = expression of Our/Your uncertainty of the parameter value
1.3 Bayes theorem
• Sample of i.i.d. observations: y = {y_1, . . . , y_n}
• Joint distribution of the sample: p(y | θ) = ∏_{i=1}^n p(y_i | θ) = likelihood L(θ | y)
• Prior distribution: p(θ)
• Split up: p(y, θ) = p(y | θ) p(θ) = p(θ | y) p(y)
⇒ Bayes theorem:

p(θ | y) = L(θ | y) p(θ) / p(y) = L(θ | y) p(θ) / ∫ L(θ | y) p(θ) dθ
Prior, likelihood & posterior:
[Figure: prior, likelihood and posterior densities of θ; the posterior is proportional to likelihood × prior]
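To make the theorem concrete, here is a minimal R sketch computing the posterior on a grid as (normalized) likelihood × prior; the Beta prior and Binomial data are hypothetical stand-ins, not from the course:

theta <- seq(0.001, 0.999, length = 500)     # grid for theta
prior <- dbeta(theta, 2, 10)                 # hypothetical Beta(2,10) prior
lik   <- dbinom(7, size = 50, prob = theta)  # hypothetical data: 7 successes in 50
post  <- prior * lik                         # posterior proportional to lik x prior
post  <- post / (sum(post) * (theta[2] - theta[1]))  # normalize via the grid approximation of the integral
matplot(theta, cbind(prior, post), type = "l")       # compare prior and posterior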
1.4 Bayesian concepts
• Prior distribution
• Posterior distribution
• Posterior summary measures
• Posterior predictive distribution
• The Bayes factor
Prior distribution
• Non-informative/vague/diffuse/ . . .: several general principles, but in complex
models difficult to decide
• Subjective: still rare in Bayesian literature
• Conjugate: analytical results, but limited applicability
• Conditional conjugate, semi-conjugate: extends the applicability and combines
nicely with Gibbs sampler
• Proper and improper prior: main question is how to show that posterior is proper
Posterior distribution
Multivariate θ = (θ_1, θ_2):
• Joint posterior: p(θ | y)
• Conditional posterior: p(θ_1 | θ_2, y)
• Marginal posterior:
◃ p(θ_1 | y) = ∫ p(θ_1 | θ_2, y) p(θ_2 | y) dθ_2
◃ Bayesian paradigm: inference about the parameters of interest θ_1 by averaging over the uncertainty of the nuisance parameters θ_2
Posterior summary measures
θ univariate:
• Posterior mean: θ̄ = ∫ θ p(θ | y) dθ
• Posterior median θ_M: 0.5 = ∫_{−∞}^{θ_M} p(θ | y) dθ
• Posterior mode: θ̂_M = arg max_θ p(θ | y)
• Posterior variance: σ̄² = ∫ (θ − θ̄)² p(θ | y) dθ
• Posterior SD: σ̄

θ multivariate: straightforward generalizations, except for the posterior median
θ univariate:
• [a, b] = 100(1 − α)% credible interval: if P (a ≤ θ ≤ b | y) = 1 − α
• 100(1 − α)% equal tail credible interval: invariant to monotone transformations
• 100(1 − α)% highest posterior density (HPD) interval: shortest CI with size
100(1 − α)%
θ multivariate:
• HPD region of content 100(1 − α)%
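These summary measures and both types of intervals are easily computed from MCMC output; a short R sketch on a stand-in posterior sample (the Gamma draws are purely illustrative), with HPDinterval() from the coda package used later in the course:

library(coda)
theta <- rgamma(10000, shape = 2, rate = 4)   # stand-in posterior sample
c(mean = mean(theta), median = median(theta), sd = sd(theta))
quantile(theta, probs = c(0.025, 0.975))      # 95% equal tail credible interval
HPDinterval(mcmc(theta), prob = 0.95)         # 95% HPD interval (shortest)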
Posterior predictive distribution
• PPD: p(ỹ | y) = ∫ p(ỹ | θ) p(θ | y) dθ
• PPD expresses what we know about the distribution of future observations
• Also useful for model checking
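Sampling from the PPD is immediate once posterior draws are available: draw θ^k from p(θ | y), then ỹ^k from p(ỹ | θ^k). A sketch for a normal model, where mu and sigma are assumed (hypothetical) vectors of posterior draws:

ytilde <- rnorm(length(mu), mean = mu, sd = sigma)  # one future observation per posterior draw
hist(ytilde)                                        # the PPD of a future observation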
1.5 Bayesian versus frequentist approach
• Frequentist approach:
◦ θ fixed and inference based on sampling variability
◦ Many tests are based on asymptotic arguments
◦ Maximization is key tool
◦ Testing and estimation closely linked
• Bayesian approach:
◦ Condition on observed data (data fixed), uncertainty about θ (θ stochastic)
◦ No asymptotic arguments are needed, all inference depends on posterior
◦ Integration is key tool
◦ Testing (Bayes factor) and estimation (posterior distribution) are separate tools
• But: frequentist and Bayesian approach can give the same numerical output
1.6 Markov chain Monte Carlo sampling
MCMC algorithm saved the Bayesian paradigm: it is the reason why you are here
• The Gibbs sampler
• The Metropolis(-Hastings) sampler
• Checking and accelerating convergence
The Gibbs sampler
Starting position: θ^0 = (θ_1^0, . . . , θ_d^0)^T

Iteration (k + 1):
1. Sample θ_1^(k+1) from p(θ_1 | θ_2^k, . . . , θ_(d−1)^k, θ_d^k, y)
2. Sample θ_2^(k+1) from p(θ_2 | θ_1^(k+1), θ_3^k, . . . , θ_d^k, y)
...
d. Sample θ_d^(k+1) from p(θ_d | θ_1^(k+1), . . . , θ_(d−1)^(k+1), y)

p(θ_j | θ_1, . . . , θ_(j−1), θ_(j+1), . . . , θ_d, y): the full conditional distribution
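A minimal R sketch of one Gibbs run for d = 2: the conjugate normal model y_i ∼ N(µ, σ²) with a flat prior for µ and p(σ²) ∝ 1/σ², so both full conditionals are available in closed form (the data are simulated stand-ins):

set.seed(1)
y <- rnorm(50, mean = 10, sd = 2)          # hypothetical data
n <- length(y); K <- 5000
mu <- numeric(K); sigma2 <- numeric(K)
mu[1] <- mean(y); sigma2[1] <- var(y)      # starting position theta^0
for (k in 2:K) {
  # full conditional of mu given sigma2, y: N(ybar, sigma2/n)
  mu[k] <- rnorm(1, mean = mean(y), sd = sqrt(sigma2[k - 1] / n))
  # full conditional of sigma2 given mu, y: Inv-Gamma(n/2, sum((y - mu)^2)/2)
  sigma2[k] <- 1 / rgamma(1, shape = n / 2, rate = sum((y - mu[k])^2) / 2)
}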
Result of Gibbs sampling:
• Chain of vectors: θ^k = (θ_1^k, . . . , θ_d^k)^T, k = 1, 2, . . .
◦ Consists of dependent elements
◦ Markov property: p(θ^(k+1) | θ^k, θ^(k−1), . . . , y) = p(θ^(k+1) | θ^k, y)
• Chain depends on starting value + initial portion/burn-in part is discarded
• Under mild conditions: sample from the posterior distribution
⇒ From k0 on: summary measures calculated from the chain consistently estimate
the true posterior measures
The Metropolis-Hastings algorithm
Chain is at θ^k ⇒ then sample value θ^(k+1) as follows:
1. Sample a candidate θ̃ from the (asymmetric) proposal density q(θ̃ | θ), with θ = θ^k
2. The next value θ^(k+1) will be equal to:
• θ̃ with probability α(θ^k, θ̃) (accept proposal),
• θ^k otherwise (reject proposal),
with

α(θ^k, θ̃) = min( r = [p(θ̃ | y) q(θ^k | θ̃)] / [p(θ^k | y) q(θ̃ | θ^k)], 1 )
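A minimal R sketch of a random-walk Metropolis sampler; the proposal here is symmetric, so the q-ratio in α cancels and r reduces to a ratio of posterior densities. The target is a stand-in log-posterior, not one of the course examples:

log_post <- function(theta) dnorm(theta, mean = 0, sd = 1, log = TRUE)  # stand-in target
K <- 5000; theta <- numeric(K); accept <- 0
for (k in 1:(K - 1)) {
  cand <- rnorm(1, mean = theta[k], sd = 0.5)    # proposal q(. | theta^k)
  logr <- log_post(cand) - log_post(theta[k])    # log r (q-ratio cancels)
  if (log(runif(1)) < logr) { theta[k + 1] <- cand; accept <- accept + 1 }  # accept proposal
  else theta[k + 1] <- theta[k]                  # reject: chain stays put
}
accept / (K - 1)                                 # acceptance rate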
Result of MH sampling:
• Markov chain of vectors: θ^k = (θ_1^k, . . . , θ_d^k)^T, k = 1, 2, . . .
• About the same properties as with the Gibbs sampler

Gibbs sampler = special case of the MH algorithm
Output MCMC sampler

Trace plot from Gibbs sampler:
[Figure: trace plots of (a) β_1 and (b) σ² versus iteration (0-1500)]
Trace plot from MH sampler:
[Figure: trace plots of (a) µ and (b) σ² versus iteration (600-1400)]
Accepted moves = blue color, rejected moves = red color
Checking convergence of the Markov chain
Popular tools for checking convergence:
◃ Trace plot: visual inspection of mixing and stationarity of Markov chain
◃ Autocorrelation plot: mixing of Markov chain
◃ Cross-correlation plot: when mixing is low, check for identifiability problems
◃ Geweke diagnostic + dynamic version: single chain diagnostic, compares mean of
early with late part of chain
◃ Heidelberger-Welch: single chain diagnostic + assesses accuracy of posterior mean
(Monte Carlo error)
◃ Brooks-Gelman-Rubin diagnostic: multiple chains diagnostic, compares mixing of
the chains
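Most of these diagnostics are implemented in the coda package (CODA is used throughout this course); a sketch, assuming ch1 and ch2 are matrices of posterior draws from two chains (hypothetical objects):

library(coda)
mc <- mcmc.list(mcmc(ch1), mcmc(ch2))  # two chains of draws
traceplot(mc)       # mixing and stationarity
autocorr.plot(mc)   # mixing
crosscorr.plot(mc)  # identifiability problems when mixing is low
geweke.diag(mc)     # compares mean of early with late part of each chain
heidel.diag(mc)     # stationarity + accuracy of the posterior mean
gelman.diag(mc)     # (Brooks-)Gelman-Rubin, compares mixing of the chains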
It is impossible to prove convergence in practice:
◦ Example Geyer (1992)
◦ AZT RCT example
[Figure: WinBUGS trace plot of beta2[4] over 20,000 iterations]
Accelerating convergence of the Markov chain
• Acceleration approaches: aim is to lower the autocorrelation
◦ Choosing better starting values
◦ Transforming the variables: centering + standardizing variables
◦ Thinning: useful to reduce storage, but does not accelerate
◦ Blocking: sample blocks of parameters, may imply considerable improvement in
convergence rate
◦ Suppress purely random behavior of MCMC sampler: overrelaxation
◦ Reparameterization of the parameters: e.g. centering covariates, in general
joint reparameterization is needed
◦ Data augmentation: use of auxiliary variables, may improve writing of MCMC
sampler considerably
• More sophisticated MCMC algorithms may be required
1.7 Bayesian software
• WinBUGS is still the standard software, with OpenBUGS as successor
• Link of R to WinBUGS/OpenBUGS: R2WinBUGS, CODA, BOA, . . .
• Also JAGS, R2jags, rjags
• Examples will be analyzed using the above software and show that:
◃ Programming languages are almost the same
◃ R-objects that R2WinBUGS, R2jags, rjags deliver are different
◃ Performance and speed can differ considerably between the software
• INLA: promising new approach based on smart Laplace approximations (no
illustrations here)
Chapter 2
Hierarchical models
• Bayesian hierarchical models (BHMs): for hierarchical/clustered data
• Examples:
◦ Measurements taken repeatedly over time on the same subject
◦ Data with a spatial hierarchy: surfaces on teeth and teeth in a mouth
◦ Multi-center clinical data with patients within centers
◦ Cluster randomized trials where centers are randomized to interventions
◦ Meta-analyses
• BHM: a classical mixed effects model + prior on all parameters
• As in classical frequentist world: random and fixed effects
Here:
• Introduction to BHM via Gaussian hierarchical model
• Full Bayesian versus Empirical Bayesian approach
• Bayesian Linear, Generalized Linear and Nonlinear Mixed Model
• Choice of non-informative priors
2.1 Gaussian hierarchical models
Introduction
• Gaussian hierarchical model: hierarchical model whereby the distribution at each
level is Gaussian
• Here variance component model/random effects model: no covariates involved
• Bayesian linear model:
◦ All parameters are given a prior distribution
◦ Fundamentals by Lindley and Smith
• Illustrations: dietary IBBENS study
Example dietary study: Monitoring dietary behavior in Belgium
• IBBENS study: dietary survey in Belgium (Den Hond et al. 1994)
• Of interest: compare the dietary intake in different geographical areas in Belgium,
but especially in Flanders
• Performed in eight subsidiaries of one bank situated in seven Dutch-speaking cities
in the north and one French-speaking city in the south of Belgium.
• The food habits of 371 (66%) male and 192 female healthy employees with
average age 38.3 years were examined by a three-day food record with an
additional interview.
• Cholesterol intake has roughly a Gaussian distribution, but as shown below this
assumption is not crucial for a large sample. The observed mean of cholesterol
intake is 328 mg/day with a standard deviation equal to 120.3 mg/day.
Boxplot cholesterol intake according to subsidiaries:
[Figure: boxplots of cholesterol intake (mg/day) by subsidiary (1-8)]
The Gaussian hierarchical model
• Two-level Bayesian Gaussian hierarchical model:

Level 1: y_ij | θ_i, σ² ∼ N(θ_i, σ²)  (j = 1, . . . , m_i; i = 1, . . . , n)
Level 2: θ_i | µ, σ²_θ ∼ N(µ, σ²_θ)  (i = 1, . . . , n)
Priors:  σ² ∼ p(σ²) and (µ, σ²_θ) ∼ p(µ, σ²_θ)

◃ Hierarchical independence
◃ Hyperparameters: often p(µ, σ²_θ) = p(µ) p(σ²_θ)
◃ Alternative model formulation: θ_i = µ + α_i, with α_i ∼ N(0, σ²_θ)
◃ Joint posterior:

p(θ, σ², µ, σ²_θ | y) ∝ ∏_{i=1}^n ∏_{j=1}^{m_i} N(y_ij | θ_i, σ²) × ∏_{i=1}^n N(θ_i | µ, σ²_θ) × p(σ²) p(µ, σ²_θ)
Estimating the parameters
Some insightful results (with fixed σ²):
• The joint posterior depends on the data only via ȳ_i (i = 1, . . . , n)
• p(θ_i | µ, σ²_θ, σ², y) = N(θ̄_i, σ̄²_θi) with

θ̄_i = [ (1/σ²_θ) µ + (1/(σ²/m_i)) ȳ_i ] / [ 1/σ²_θ + 1/(σ²/m_i) ]  and  σ̄²_θi = 1 / [ 1/σ²_θ + 1/(σ²/m_i) ]

• θ̄_i = B_i µ + (1 − B_i) ȳ_i, with shrinkage factor B_i = σ² / (σ² + m_i σ²_θ)
• Inverse relationship of the shrinkage factor with the intra-class correlation:

ICC = σ²_θ / (σ²_θ + σ²)
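A numerical R sketch of these formulas; the inputs below are illustrative values roughly on the scale of the IBBENS example, not the actual posterior estimates:

sigma2       <- 120^2    # level-1 (within-subsidiary) variance, illustrative
sigma2.theta <- 18^2     # level-2 (between-subsidiary) variance, illustrative
m.i <- 70; mu <- 328; ybar.i <- 340
B.i     <- sigma2 / (sigma2 + m.i * sigma2.theta)   # shrinkage factor B_i
theta.i <- B.i * mu + (1 - B.i) * ybar.i            # posterior mean of theta_i
ICC     <- sigma2.theta / (sigma2.theta + sigma2)   # intra-class correlation
c(B = B.i, theta = theta.i, ICC = ICC)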
• For a flat prior for µ and conditional on σ²_θ and σ²:

p(µ | σ²_θ, σ², y) = N(µ̄, σ̄²_µ)

with

µ̄ = [ ∑_{i=1}^n ȳ_i / (σ²/m_i + σ²_θ) ] / [ ∑_{i=1}^n 1 / (σ²/m_i + σ²_θ) ]  and  σ̄²_µ = 1 / [ ∑_{i=1}^n 1 / (σ²/m_i + σ²_θ) ]
Example dietary study: Comparison between subsidiaries

Aim: compare average cholesterol intake between subsidiaries with WinBUGS
• Priors: µ ∼ N(0, 10⁶), σ² ∼ IG(10⁻³, 10⁻³) and σ_θ ∼ U(0, 100)
◦ 3 chains each of size 10,000, 3 × 5,000 burn-in
◦ BGR diagnostic in WinBUGS: almost immediate convergence
◦ Export to CODA: OK except for Geweke test (numerical difficulties)
• Posterior summary measures on the last 5,000 iterations
◦ µ̄ = 328.3, σ̄_µ = 9.44
◦ σ_M = 119.5
◦ σ_θ,M = 18.26
◦ θ̄_i: see table (not much variation)
◦ B_i ∈ [0.33, 0.45] (≈ uniform shrinkage)
DAG:
[WinBUGS doodle: nodes mu (sigma.mu, tau.mu) and tau.i (sigma.i) → mu_i → y_ij, with plates for(j IN 1:m_i) and for(i IN 1:n)]
FB estimation of θ_i with WinBUGS:

Sub  m_i  Mean (SE)      SD     FBW (SD)      FBS (SD)      EML (SE)      EREML (SE)
1    82   301.5 (10.2)   92.1   311.8 (12.6)  312.0 (12.0)  313.8 (10.3)  312.2 (11.3)
2    51   324.7 (17.1)  122.1   326.4 (12.7)  326.6 (12.7)  327.0 (11.7)  326.7 (12.3)
3    71   342.3 (13.6)  114.5   336.8 (11.7)  336.6 (11.7)  335.6 (10.7)  336.4 (11.6)
4    71   332.5 (13.5)  113.9   330.8 (11.6)  330.9 (11.5)  330.6 (10.7)  330.8 (11.6)
5    62   351.5 (19.0)  150.0   341.2 (12.8)  341.3 (12.6)  339.5 (11.1)  340.9 (11.8)
6    69   292.8 (12.8)  106.4   307.3 (14.2)  307.8 (13.5)  310.7 (10.8)  308.4 (11.6)
7    74   337.7 (14.1)  121.3   334.1 (11.4)  334.2 (11.2)  333.4 (10.6)  333.9 (11.5)
8    83   347.1 (14.5)  132.2   340.0 (11.4)  340.0 (11.4)  338.8 (10.2)  340.0 (11.2)

Program: chapter 9 dietary study chol.odc
Posterior predictive distributions
Prediction of the distribution of cholesterol intake of a future bank employee given
observed intakes y.
Two cases:
1. New bank employee works in one of the eight subsidiaries
2. New bank employee works in a new subsidiary from the same bank
Example dietary study: PPDs

Case 1:

p(ỹ_ij | y) = ∫∫∫∫ p(ỹ_ij | θ_i) p(θ_i | µ, σ²_θ, σ², y) p(µ, σ²_θ, σ² | y) dθ_i dµ dσ²_θ dσ²

with
p(ỹ_ij | θ_i) = N(ỹ_ij | θ_i, σ²)
p(θ_i | µ, σ²_θ, σ², y) = N(θ_i | θ̄_i, σ̄²_θi)
p(µ, σ²_θ, σ² | y)
Case 2:

p(ỹ | y) = ∫∫∫∫ p(ỹ | θ̃) p(θ̃ | µ, σ²_θ, σ², y) p(µ, σ²_θ, σ² | y) dθ̃ dµ dσ²_θ dσ²

with
p(ỹ | θ̃) = N(ỹ | θ̃, σ²)
p(θ̃ | µ, σ²_θ, σ², y) = N(θ̃ | µ, σ²_θ)
p(µ, σ²_θ, σ² | y)
Example dietary study: PPDs with WinBUGS
Extra WinBUGS commands:
#prediction new observations of subsidiary 1
for( j in 1 : nc[1] )
predict1[j] ~ dnorm(mu.sub[1],tau.i)
mpredict1 <- mean(predict1[])
#prediction new observations of a new subsidiary
mu.new ~ dnorm(mu.m,tau.mu)
for (j in 1:nnew) predict.new[j] ~ dnorm(mu.new,tau.i)
mpredict.new <- mean(predict.new[])
2.2 Mixed models
Introduction
• Bayesian linear mixed model (BLMM): extension of Bayesian Gaussian hierarchical
model
• Bayesian generalized linear mixed models (BGLMM): extension of BLMM
• Bayesian nonlinear mixed model: extension of BGLMM
The linear mixed model (LMM)
LMM: y_ij = response for the jth observation on the ith subject (i = 1, . . . , n)

y_ij = x_ij^T β + z_ij^T b_i + ε_ij
y_i = X_i β + Z_i b_i + ε_i

◃ y_i = (y_i1, . . . , y_im_i)^T: m_i × 1 vector of responses
◃ X_i = (x_i1^T, . . . , x_im_i^T)^T: m_i × (d + 1) design matrix
◃ β = (β_0, β_1, . . . , β_d)^T: (d + 1) × 1 vector of fixed effects
◃ Z_i = (z_i1^T, . . . , z_im_i^T)^T: m_i × q design matrix of random effects
◃ b_i: q × 1 vector of random effects (i = 1, . . . , n)
◃ ε_i = (ε_i1, . . . , ε_im_i)^T: m_i × 1 vector of measurement errors

Distributional assumptions:
◃ b_i ∼ N_q(0, G), G: q × q covariance matrix
◃ G with (j, k)th element σ_bj,bk (j ≠ k) and σ²_bj (j = k)
◃ ε_i ∼ N_mi(0, R_i), R_i: m_i × m_i covariance matrix, often R_i = σ² I_mi
◃ b_i statistically independent of ε_i (i = 1, . . . , n)

Implications:
y_i | b_i ∼ N_mi(X_i β, R_i)
y_i ∼ N_mi(X_i β, Z_i G Z_i^T + R_i)
• LMM popular for analyzing longitudinal studies with irregular time points +
Gaussian response
• Covariates: time-independent or time-dependent
• Random intercept model: b0i
• Random intercept + slope model: b0i + b1itij
• Bayesian linear mixed model (BLMM): all parameters are given a prior distribution
• Bayesian linear mixed model:

Level 1: y_ij | β, b_i, σ² ∼ N(x_ij^T β + z_ij^T b_i, σ²)  (j = 1, . . . , m_i; i = 1, . . . , n)
Level 2: b_i | G ∼ N_q(0, G)  (i = 1, . . . , n)
Priors:  σ² ∼ p(σ²), β ∼ p(β) and G ∼ p(G)

◃ Joint posterior:

p(β, G, σ², b_1, . . . , b_n | y_1, . . . , y_n) ∝ ∏_{i=1}^n ∏_{j=1}^{m_i} p(y_ij | b_i, σ², β) × ∏_{i=1}^n p(b_i | G) × p(β) p(G) p(σ²)
Example dietary study: Comparison between subsidiaries
Aim: compare average cholesterol intake between subsidiaries correcting for age and
gender
• Model:
yij = β0 + β1 ageij + β2 genderij + b0i + εij
◃ b0i ∼ N(0, σb20 )
◃ εij ∼ N(0, σ 2)
◃ Priors: vague for all parameters
Results:
• WinBUGS program: chapter 9 dietary study chol age gender.odc
◦ Three chains of 10,000 iterations each, 3 × 5,000 burn-in
◦ Rapid convergence
• Posterior summary measures on the last 3 × 5,000 iterations
◦ β̄_1 (SD) = −0.69 (0.57), β̄_2 (SD) = −62.67 (10.67)
◦ With covariates: σ_M = 116.3, σ_b0,M = 14.16
◦ Without covariates: σ_M = 119.5, σ_b0,M = 18.30
◦ With covariates: ICC = 0.015
◦ Without covariates: ICC = 0.022
Example toenail RCT: Fitting a BLMM
Aim: compare itraconazol with lamisil on unaffected nail length
◦ Double-blinded multi-centric RCT (36 centers) sportsmen and elderly people
treated for toenail dermatophyte onychomycosis
◦ Two oral medications: itraconazol (treat=0) or lamisil (treat=1)
◦ Twelve weeks of treatment
◦ Evaluations at 0, 1, 2, 3, 6, 9 and 12 months
◦ Response: unaffected nail length big toenail
◦ Patients: subgroup of 298 patients
Individual profiles:
20
15
10
5
0
5
10
15
20
Length unaffected toenail
Lamisil
0
Length unaffected toenail
Itraconazol
0
2
6
9
Months since baseline
Bayesian Longitudinal Analyses
12
0
2
6
9
12
Months since baseline
51
Model specification:
• Model:
y_ij = β_0 + β_1 t_ij + β_2 t_ij × treat_i + b_0i + b_1i t_ij + ε_ij
◃ b_i = (b_0i, b_1i)^T ∼ N_2(0, G)
◃ ε_ij ∼ N(0, σ²)
• Priors
◦ Vague normal for fixed effects
◦ G ∼ IW(D, 2) with D = diag(0.1, 0.1)
DAG:
[WinBUGS doodle: beta_0, beta_1, beta_2, tau.b (sigma.b0, sigma.b1, corr.b0b1) and tau.eps (sigma.eps) → mu_ij, b_i, eps_ij → y_ij, with covariates t_ij and treat_i and plates for(j IN 1:m_i) and for(i IN 1:n)]
Results:
• WinBUGS program: chapter 9 toenail LMM.odc
◦ Three chains of 10,000 iterations each, 3 × 5,000 burn-in
◦ Rapid convergence
• Posterior summary measures on the last 3 × 5,000 iterations
◦ β̄_1 (SD) = 0.58 (0.043), β̄_2 (SD) = −0.057 (0.058)
◦ σ_M = 1.78
◦ σ_b0,M = 2.71
◦ σ_b1,M = 0.48
◦ corr(b_0, b_1)_M = −0.39
• Frequentist analysis with SAS® MIXED: MLEs close to the Bayesian estimates
The generalized linear mixed model

Two-level hierarchy, i.e. m_i subjects in n clusters with responses y_ij

Generalized linear model (GLIM):

p(y_ij | θ_ij; φ_ij) = exp{ [y_ij θ_ij − b(θ_ij)] / a(φ_ij) + c(y_ij; φ_ij) }

◃ a(φ_ij) > 0 a known scale function
◃ θ_ij unknown canonical parameters
◃ g(µ_ij) = x_ij^T β with µ_ij = E(y_ij | θ_ij)
◃ Problem: the model ignores the clustering in the data
Bayesian generalized linear mixed model (BGLMM):

Level 1: p(y_ij | θ_ij, φ_ij) = exp{ [y_ij θ_ij − b(θ_ij)] / a(φ_ij) + c(y_ij; φ_ij) }  (j = 1, . . . , m_i; i = 1, . . . , n)
Level 2: g(µ_ij) = x_ij^T β + z_ij^T b_i with i.i.d. b_i ∼ N_q(0, G)  (i = 1, . . . , n)
Priors:  β ∼ p(β), G ∼ p(G)
with G a q × q covariance matrix

Example: Logistic-normal-binomial model
◦ logit(π_ij) = x_ij^T β + b_0i
◦ E[logit(π_ij)] = x_ij^T β
◦ Regression coefficients have a subject-specific interpretation
Example toenail RCT: A Bayesian logistic RI model
Aim: compare evolution of onycholysis over time between two treatments
Response: yij : 0 = none or mild, 1 = moderate or severe
◃ Model:
logit(π_ij) = β_0 + β_1 t_ij + β_2 t_ij × treat_i + b_0i
◃ b_0i ∼ N(0, σ²_b0)
◃ Priors:
◦ Independent vague normal for the regression coefficients
◦ σ_b0 ∼ U(0, 100)
Mean profiles:
[Figure: % onycholysis (0-40) versus months since baseline (0-12), Itraconazol vs. Lamisil]
Results:
• WinBUGS program: chapter 9 toenail RI BGLMM.odc
◦ Three chains of 10,000 iterations each, 3 × 5,000 burn-in
◦ Rapid convergence
• Posterior summary measures on the last 3 × 5,000 iterations
◦ β̄_0 (SD) = −1.74 (0.34)
◦ β̄_1 (SD) = −0.41 (0.045)
◦ β̄_2 (SD) = −0.17 (0.069)
◦ σ²_b0,M = 17.44
◦ ICC = σ²_b0 / (σ²_b0 + π²/3) = 0.84
• Frequentist analysis with SAS® GLIMMIX: similar results,
see program chapter 9 toenail binary GLIMMIX and MCMC.sas
Example toenail RCT: A Bayesian logistic RI+RS model
Aim: compare evolution of onycholysis over time between two treatments
Response: yij : 0 = none or mild, 1 = moderate or severe
• Model:
logit(πij ) = β0 + β1 tij + β2 tij × treati + b0i + b1i tij
◃ Similar distributional assumptions and priors as before
WinBUGS program: chapter 9 toenail RI+RS BGLMM.odc
Numerical difficulties due to overflow: use of min and max functions
SAS MCMC program: chapter 9 toenail RI+RS BGLMM.sas
No computational problems but much slower convergence
Other BGLMMs
Extension of BGLMM with residual term:
g(µij ) = xTij β + z Tij bi + εij , (j = 1, . . . , mi; i = 1, . . . , n)
Albert’s model (1988):
• Parameters have a population averaged interpretation
• Model allows only level-1 covariates
Further extensions: see later
Nonlinear mixed models
Bayesian nonlinear mixed model (BNLMM):
Level 1: yij | ϕi, xij , σ 2 ∼ N[f (ϕi, xij ), σ 2] (j = 1, . . . , mi; i = 1, . . . , n)
Level 2: ϕi | W i, Z i, β ∼ N(W iβ, Z iGZ Ti ) (i = 1, . . . , n)
Priors:
β ∼ p(β), σ 2 ∼ p(σ 2) and G ∼ p(G)
with G a q × q-covariance matrix
with
◃ ϕi = W iβ + Z ibi
◃ W i includes X i
◃ f a general function of several variables, more general than the link function g
Example arteriosclerosis study: Reperfusion models
• Aim: examine the cellular components of the immune system and their effects on
arteriogenesis
• Study set up: genetically modified mice undergo a surgical procedure that blocked
one of the main leg arteries
• Two groups: C57BL/6 mice and MHC class II-/- mice
• Response: yij = ischemic/non-ischemic perfusion ratio (IPR) (comparison of
perfusion in blocked/unblocked leg)
• Profiles: measured at days 0, 3, 7, 14, 21, 28
Individual profiles:
[Figure: ischemic/non-ischemic perfusion ratio (0.0-1.5) versus days (0-28), C57BL/6 WT and MHC class II−/− mice]
Model specification:

Model:
y_ij = ϕ_1i {1 − exp[−exp(ϕ_2i t_j)] + ε_ij}
◃ 1st parameter: ϕ_1i = β_1 (grp 1), ϕ_1i = β_2 (grp 2) (ultimate IPR)
◃ 2nd parameter: ϕ_2i = β_3 + b_i (grp 1), ϕ_2i = β_4 + b_i (grp 2) (rate to attain final IPR)
◃ Interest in: δ = β_3 − β_4
◃ In matrix notation (shown for a group 1 mouse):

ϕ_i = W_i β + (0, b_i)^T ≡ [ 1 0 0 0 ; 0 0 1 0 ] (β_1, β_2, β_3, β_4)^T + (0, b_i)^T
Further model assumptions:
• bi ∼ N(0, σb2), εij ∼ N(0, σ 2)
• Priors
◦ Independent vague normal for fixed effects
◦ σ, σb: U(0,100)
Results:
• WinBUGS program: chapter 9 arterio study.odc
◦ Three chains of 10,000 iterations each, 3 × 5,000 burn-in
◦ Rapid convergence
• Posterior summary measures on the last 3 × 5,000 iterations
δ̄ (SD) = 0.70 (0.37) (95% equal tail CI: [−0.002, 1.47])
⇒ No clear evidence of a difference between the two types of mice
Individual + fitted median profiles:
[Figure: ischemic/non-ischemic perfusion ratio versus days, individual profiles with fitted median curves for C57BL/6 WT and MHC class II−/−]
Estimation of the REs and PPDs in mixed models

Prediction/estimation:
◃ Bayesian approach: estimating random effects = estimating fixed effects
◃ Prediction/estimation of individual curves in longitudinal models:
λ_β^T β̂ + λ_b^T b̂_i
with β̂ and b̂_i the posterior means

PPD:
◃ See the Gaussian hierarchical model
◃ Estimation via MCMC
Software:
◃ WinBUGS (CODA command) + post-processing with R
◃ R2WinBUGS, R2jags, rjags
Example toenail RCT: Exploring the random effects

Histograms of estimated RI and RS:
[Figure: histograms of the estimated random intercepts (−4 to 10) and random slopes (−1.5 to 1.5)]
Predicting individual evolutions with WinBUGS:
• Predicted evolution for the ith subject: x_ij^T β̂ + z_ij^T b̂_i
• Extra WinBUGS commands:
For an existing subject (id[iobs]):
newresp[iobs] ~ dnorm(mean[iobs], tau.eps)
For a new subject: sample the random effect from its distribution
• Computation:
◃ Stats table WinBUGS + R, CODA + R
◃ More practical: R2WinBUGS + R, R2jags + R, rjags + R
◃ Alternatively: generate the predictive observations in R
• Prediction of a missing response (NA): automatically done
Predicted profile for Itraconazol patient 3:
[Figure: predicted length of unaffected toenail (0-20 mm) versus month (0-12)]
Choice of the level-2 variance prior
• Bayesian linear regression: NI prior for σ² = Jeffreys prior p(σ²) ∝ 1/σ²
• Bayesian Gaussian hierarchical model: NI prior for σ²_θ = p(σ²_θ) ∝ 1/σ²_θ ??
◦ Theoretical + intuitive results: improper posterior
◦ Note: not the Jeffreys prior for σ² in the hierarchical model
• Possible solution: IG(ε, ε) with ε small (WinBUGS)?
◦ Not: see application
• Solutions:
◦ U(0,c) prior on σθ
◦ Parameter expanded model suggestion by Gelman (2006)
Example dietary study*: NI prior for level-2 variance
• Modified dietary study: chapter 9 dietary study chol2.odc
◃ Prior 1: σ²_θ ∼ IG(10⁻³, 10⁻³)
◃ Prior 2: σ_θ ∼ U(0, 100)
◃ Prior 3: suggestion of Gelman (2006) gives similar results as with U(0,100)
• Results:
◦ Posterior distribution of σθ : clear impact of IG-prior
◦ Trace plot of µ: regularly stuck with IG-prior
Effect of choosing the NI prior: posterior distribution of σ_θ
[Figure: posterior densities of σ_θ under the IG(0.001, 0.001) prior and under the flat prior]
Effect of choosing the NI prior: trace plot of µ
[Figure: WinBUGS trace plots of mu.m, iterations 50,001-60,000]
Top: flat prior for σ_θ, bottom: IG(0.001, 0.001) for σ²_θ
Choice of the level-2 variance prior
Prior for level-2 covariance matrix?
• Classical choice: Inverse Wishart
• Simulation study of Natarajan and Kass (2000): problematic
• Solution:
◦ In general: not clear
◦ 2 dimensions: uniform prior on both σ's and on the correlation
◦ IW(D, 2) with small diagonal elements ≈ uniform priors, but slower convergence with uniform priors
Assessing and accelerating convergence
Checking convergence Markov chain of a Bayesian hierarchical model:
• Similar to checking convergence in any other model
• But, large number of parameters ⇒ make a selection
Accelerating convergence Markov chain of a Bayesian hierarchical model:
• Use tricks seen before (centering, standardizing, overrelaxation, etc.)
• Specific tricks for a hierarchical model:
◦ hierarchical centering
◦ (reparameterization by sweeping)
◦ (parameter expansion)
Hierarchical centering:
Uncentered Gaussian hierarchical model:
y_ij = µ + α_i + ε_ij,  α_i ∼ N(0, σ²_α)
Centered Gaussian hierarchical model:
y_ij = θ_i + ε_ij,  θ_i ∼ N(µ, σ²_α)
• For m_i = m: hierarchical centering implies faster mixing when σ²_α > σ²/m
• Similar results for multilevel BLMM and BGLMM
Example dietary study: Improving convergence
In all cases: 10,000 iterations, 5,000 burn-in
Hierarchical centering:
◦ σ²_α > σ²/70
◦ Improved mixing
◦ µ: MC error uncentered = 0.32, MC error centered = 0.23
2.3 Missing data
• All longitudinal models are likelihood-based and thus:
◃ are MAR robust
◃ if model is correctly specified
• In WinBUGS/OpenBUGS, R2WinBUGS, etc., missing responses (NA) are automatically filled in using the estimated model (MAR imputation)
• In Chapter 4: example of a specific MNAR model
2.4 Some reflections
Bayesian solutions were always close to the classical results.
So why do a Bayesian analysis?
• Perhaps you are a Bayesian fundamentalist; if not . . .
• Uncertainty of the parameter estimates is an immediate by-product of the Bayesian approach. And . . . the Bayesian 95% CI is often close to the classical 95% CI.
• Distributional assumptions are relatively easy to relax, both when using WinBUGS and when the full conditionals have to be programmed
Chapter 3
Model evaluation techniques
Toolbox for the statistical modeler
3.1 Introduction
• Bayesian procedures for model building and model criticism ⇒ select an
appropriate model
• Model selection from a few good candidate models
• Bayesian model building and criticism = similar to frequentist model building and criticism, except for
◦ Bayesian model: combination of likelihood and prior ⇒ 2 choices
◦ Bayesian inference: based on MCMC techniques while frequentist approaches
on asymptotic inference
• Explorative tools that check and improve the fitted model using WinBUGS in
combination with R:
◦ Criteria to select models: DIC, MSPE
◦ Methods to detect outliers: PPO, CPO
◦ Bayesian sensitivity analysis: vary the assumptions
◦ Bayesian goodness-of-fit tests: posterior-predictive checks
3.2 Information theoretic measures for model selection
Most popular frequentist information criteria: AIC and BIC
• Both adjust for model complexity via the effective degrees of freedom ρ
• Akaike's information criterion (AIC), developed by Akaike (1974):

AIC = −2 log L(θ̂(y) | y) + 2ρ

• Bayesian Information Criterion (BIC), suggested by Schwarz (1978):

BIC = −2 log L(θ̂(y) | y) + ρ log(n)

• Deviance Information Criterion (DIC), suggested by Spiegelhalter et al. (2002):
DIC = generalization of AIC to Bayesian models
The effective degrees of freedom
• Effective degrees of freedom ρ, apart from variance parameters:
◦ Only fixed effects: ρ = p (p = number of fixed effects)
◦ Fixed + random effects: p ≤ ρ ≤ p + q (q = number of random effects)
• Related to conditional and marginal likelihood
◦ Predictive ability of conditional model: focus on current clusters (current
random effects)
◦ Predictive ability of marginal model: focus on future clusters (random effects
distribution)
Deviance Information Criterion & pD

Key question: how to define the effective number of parameters in a Bayesian context?
In analogy to the definition of ρ in AIC, effective degrees of freedom = pD:

pD = D̄(θ) − D(θ̄)

· Bayesian deviance: D(θ) = −2 log p(y | θ) + 2 log f(y)
· D̄(θ) = posterior mean of D(θ)
· D(θ̄) = D(θ) evaluated at the posterior mean θ̄
· f(y): typically the saturated density, but not used in WinBUGS
◃ pD = ρ for a normal likelihood with a flat prior for µ and fixed variance
◃ pD quite different from ρ in many other applications
• DIC as a Bayesian model selection criterion:

DIC = D(θ̄) + 2 pD = D̄(θ) + pD

• Both DIC and pD can be calculated from an MCMC run:
◦ θ^1, . . . , θ^K = converged Markov chain
◦ D̄(θ) ≈ (1/K) ∑_{k=1}^K D(θ^k)
◦ D(θ̄) ≈ D((1/K) ∑_{k=1}^K θ^k)
• Practical rule for choosing a model: as with AIC/BIC
• DIC is subject to sampling variability
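In R this computation is a few lines, assuming theta is a K × d matrix of converged draws and dev() is a (hypothetical) helper that returns the deviance −2 log p(y | θ) for one parameter vector:

Dbar <- mean(apply(theta, 1, dev))   # posterior mean of the deviance
Dhat <- dev(colMeans(theta))         # deviance evaluated at the posterior mean
pD   <- Dbar - Dhat                  # effective number of parameters
DIC  <- Dhat + 2 * pD                # equivalently Dbar + pD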
DIC and pD – Use and pitfalls
In book several examples to illustrate use and pitfalls when using pD and DIC:
◃ Use of WinBUGS to compute pD and DIC
◃ Difference between conditional and marginal DIC
◃ Negative pD:
◦ The DIC and pD calculation assumes a log-concave likelihood
◦ If this assumption is not fulfilled (e.g. mixture distributions), pD can be negative; this caused a lot of confusion among WinBUGS users
◃ DIC and pD are overoptimistic (using data twice)
◃ rjags uses DIC corrected for overoptimism
◃ Except for WinBUGS, computation of DIC could be quite variable
Example growth curve study: Model selection using pD and DIC
• Well-known data set from Potthoff & Roy (1964)
• Dental growth measurements of a distance (mm)
• 11 girls and 16 boys at ages (years) 8, 10, 12, and 14
• Variables are gender (1=female, 0=male) and age
• Gaussian linear mixed models were fit to the longitudinal profiles
• WinBUGS: chapter 10 Potthoff-Roy growthcurves.odc
• Choice evaluated with pD and DIC
Individual profiles:
[Figure: distance (20-30 mm) versus age (8-14 years), separate panels for girls and boys]
Models:

Model M1:
y_ij = β_0 + β_1 age_j + β_2 gender_i + β_3 gender_i × age_j + b_0i + b_1i age_j + ε_ij
◃ y_ij = distance measurement
◃ b_i = (b_0i, b_1i)^T random intercept and slope with distribution N(0, G)

G = [ σ²_b0         ρ σ_b0 σ_b1 ]
    [ ρ σ_b0 σ_b1   σ²_b1       ]

ε_ij ∼ N(0, σ_0²) for boys
ε_ij ∼ N(0, σ_1²) for girls
◃ The total number of parameters for model M1 = 63:
4 fixed effects, 54 random effects, 3 + 2 variances (RE + ME)
Alternative models:
◃ Model M2: model M1, but assuming ρ = 0
◃ Model M3: model M2, but assuming σ0 = σ1
◃ Model M4: model M1, but assuming σ0 = σ1
◃ Model M5: model M1, but bi, εij ∼ t3-(scaled) distributions
◃ Model M6: model M1, but εij ∼ t3-(scaled) distribution
◃ Model M7: model M1, but bi ∼ t3-(scaled) distribution
◃ Nested model comparisons:
(1) M1, M2, M3, (2) M1, M2, M4, (3) M5, M6, M7
pD and DIC of the models:

Model    Dbar      Dhat      pD       DIC
M1       343.443   308.887   34.556   377.999
M2       344.670   312.216   32.454   377.124
M3       376.519   347.129   29.390   405.909
M4       374.065   342.789   31.276   405.341
M5       328.201   290.650   37.552   365.753
M6       343.834   309.506   34.327   378.161
M7       326.542   288.046   38.047   364.949
Example growth curve study: Conditional and marginal DIC

(Conditional) model M4:

y_ij | b_i ∼ N(µ_ij, σ²)  (j = 1, . . . , 4; i = 1, . . . , 27)

with
µ_ij = β_0 + β_1 age_j + β_2 gender_i + β_3 gender_i × age_j + b_0i + b_1i age_j

◃ Deviance: D_C(µ, σ²) ≡ D_C(β, b, σ²) = −2 ∑_i ∑_j log N(y_ij | µ_ij, σ²)
◃ Interest (focus): the current 27 b_i's
◃ DIC = conditional DIC
◃ WinBUGS: pD = 31.282 and DIC = 405.444
Marginal(ized) model M4:

y_i ∼ N_4(X_i β, Z G Z^T + R)  (i = 1, . . . , 27)

with y_i = (y_i1, y_i2, y_i3, y_i4)^T,

      [ 1   8   gender_i    8 × gender_i ]       [ 1   8 ]
X_i = [ 1  10   gender_i   10 × gender_i ],  Z = [ 1  10 ],  R = σ² I_4
      [ 1  12   gender_i   12 × gender_i ]       [ 1  12 ]
      [ 1  14   gender_i   14 × gender_i ]       [ 1  14 ]

◃ Deviance: D_M(β, σ², G) = −2 ∑_i log N_4(y_i | X_i β, Z G Z^T + R)
◃ Interest (focus): future b_i's
◃ DIC = marginal DIC
◃ WinBUGS: pD = 7.072 and DIC = 442.572
• Frequentist analysis of M4: maximization of marginal likelihood
◦ SAS procedure MIXED: p = 8 and AIC = 443.8 (close to DIC for marginalized
model)
◦ See program chapter 10 Potthoff-Roy growthcurves.sas
• Many prefer to express performance of model on marginalized likelihood, but
needs extra programming (integration) outside WinBUGS
Model selection based on other predictive loss functions

• Assume ỹ_i ∼ PPD (i = 1, . . . , n): MSPE = (1/n) ∑_{i=1}^n (y_i − ỹ_i)²
• Computation of the MSPE based on a converged Markov chain (θ^k)_k:

MSPE = (1/K) ∑_{k=1}^K MSPE_k with MSPE_k = (1/n) ∑_{i=1}^n (y_i − ỹ_i^k)²

• WinBUGS commands to obtain predictive values:
y[i] ~ dnorm(mu[i],tauy); ytilde[i] ~ dnorm(mu[i],tauy)
• MSPE does not compensate for model complexity
• For a longitudinal example, see later and Chapter 4
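A sketch of this computation in R, assuming ytilde is a K × n matrix of PPD draws ỹ_i^k and y the observed n-vector (hypothetical objects):

mspe.k <- rowMeans(sweep(ytilde, 2, y, "-")^2)  # MSPE_k = (1/n) sum_i (y_i - ytilde_i^k)^2
MSPE   <- mean(mspe.k)                          # average over the K iterations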
3.3 Model checking procedures
Introduction
The selected model is not necessarily sensible, nor does it guarantee a good fit to the data
Statistical model evaluation is needed:
1. Checking that inference from the chosen model is reasonable
2. Verifying that the model can reproduce the data
3. Sensitivity analyses by varying certain aspects of the model
Bayesian model evaluation:
◃ As in the frequentist approach
◃ Also the prior needs attention
◃ Primarily based on sampling techniques
1. Verifying that the model is reasonable:
Critical inspection of the posterior output making use of background knowledge
◃ Is the prior in conflict with the data?
◃ Do we learn from the data? Or are there identifiability issues?
2. Adequate prediction of the data at hand:
Critical inspection of the posterior output making use of background knowledge
◃ Detection of outliers and influential observations
◃ Posterior predictive checks
3. Bayesian sensitivity analysis:
Perturb prior and likelihood and check how much this affects inference
◃ Varying the distributional assumptions
◃ Varying the systematic part
◃ Model expansion
Predictive approaches to outlier detection

• Posterior predictive ordinate (PPO_i): the PPD evaluated at y_i

PPO_i = p(y_i | y) = ∫ p(y_i | θ) p(θ | y) dθ

• Low value of PPO_i: the ith observation lies in the tail area of the density
• Estimate of PPO_i: P̂PO_i = p̂(y_i | y) = (1/K) ∑_{k=1}^K p(y_i | θ^k)
• Problem: makes use of y twice (estimation + validation)
• Conditional predictive ordinate (CPO_i): the PPD based on y_(i) evaluated at y_i

CPO_i = p(y_i | y_(i)) = ∫ p(y_i | θ) p(θ | y_(i)) dθ

• Computation of CPO_i (making use of hierarchical independence):

1 / p(y_i | y_(i)) = ∫ [p(y_(i) | θ) p(θ) / p(y)] dθ = ∫ [1 / p(y_i | θ)] p(θ | y) dθ = E_{θ|y}[1 / p(y_i | θ)]

• (Harmonic) estimate of CPO_i: ĈPO_i = [ (1/K) ∑_k 1 / p(y_i | θ^k) ]^{−1}
• Estimation in WinBUGS: monitor 1 / p(y_i | θ^k) + export to R
• Problem: the harmonic estimate is unreliable with seriously outlying observations
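The harmonic estimate in R, assuming lik is a K × n matrix with lik[k, i] = p(y_i | θ^k), e.g. a monitored node exported from WinBUGS (hypothetical object):

icpo <- colMeans(1 / lik)   # Monte Carlo estimate of E[1 / p(y_i | theta) | y]
cpo  <- 1 / icpo            # harmonic estimate of CPO_i
ppo  <- colMeans(lik)       # estimate of PPO_i, for comparison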
Example growth curve study: PPO, CPO to detect outliers
• Model M1, R program using R2WinBUGS: model M1 diagnostics.R
• Compute PPOi, inverse of CPOi in WinBUGS, add:
# PPO, iCPO
ppo[(i-1)*4+j] <- pow(tau[(i-1)*4+j],0.5)*
exp(-0.5*tau[(i-1)*4+j]*res2[(i-1)*4+j])
icpo[(i-1)*4+j] <- 1/ppo[(i-1)*4+j]
• To compute CPOi, add in R (on means of icpo (over iterations)):
cpo[(i-1)*4+j] <- 1/icpo[(i-1)*4+j]
PPO and CPO:
[Figure: PPO (left) and CPO (right), both on 0.0-1.2, versus observation index (0-100)]
Inverse of CPO:
[Figure: ICPO (0-2000) versus observation index (0-100)]
3. Sensitivity analysis
Sensitivity analysis: check how stable conclusions are when deviating from original
statistical model
◃ Perturbation of likelihood or prior
◃ Likelihood:
◦ Vary distribution of measurement error/random effects
◦ Vary link function
◦ Slightly perturbing the responses or covariate values
◦ Effect of deleting single or sets of subjects
◃ How to perform?
◦ Replay Bayesian analysis with changed settings
◦ Re-use the Markov chain by sampling from this chain
Example Arteriosclerosis study: Detection of influential observations
Model:
y_ij = ϕ_1i {1 − exp[−exp(ϕ_2i t_j)] + ε_ij}
◃ 1st parameter: ϕ_1i = β_1 (grp 1), ϕ_1i = β_2 (grp 2) (ultimate IPR)
◃ 2nd parameter: ϕ_2i = β_3 + b_i (grp 1), ϕ_2i = β_4 + b_i (grp 2) (rate to attain final IPR)
Influence analysis: focus on δ = β3 − β4
◦ R program using R2WinBUGS: chapter 10 arterior study-influence.R
◦ Subject-diagnostics: impact of (whole) subjects (here: 20)
◦ Observation-diagnostics: impact of single observations (120-1=119)
• Original data set: no/all observations/subjects are influential
• Distorted data set:
◦ 2nd measurement of 4th subject (obs 20) +1: 0.76 → 1.76
◦ Diagnostic plots indicate the influential subject/observation
• Comparison standardized impact of each subject and observation on δ by
◦ Importance sampling (shows influential subject/observation)
◦ SIR algorithm with replacement (shows influential subject/observation)
◦ SIR algorithm without replacement (does not show influential
subject/observation)
Plots:
[Figure: standardized difference MU1 − MU2 after deleting (a) each subject (index 1-20) and (b) each observation (index 1-120)]
Posterior predictive check (PPC)
Global measures of goodness-of-fit: frequentist case
◦ H0: p(y | θ) is distributed according to model M0
◦ Sample y = {y1, . . . , yn}
◦ θ ∈ Θ0 estimated from the data
◦ Goodness-of-fit test (GOF) statistic T (y) with large value = poor fit
◃ GOF test (special case: Θ0 ≡ θ 0):
◦ Determine sampling distribution of T (y) under H0
◦ Compute: pC (y, θ0) = P (T (ỹ) ≥ T (y) | Θ0, H0)
◦ ỹ = {ỹ1, . . . , ỹn} a random sample taken from M0
◦ pC small ⇒ H0 rejected
Global measures of goodness-of-fit: Bayesian case

Predictive approach to GOF testing, the PPC:
Contrast T(y) with T(ỹ) and evaluate its extremeness
More formal definition depends on the choice of GOF statistic
• Test statistic T(y):

p_T = P(T(ỹ) ≥ T(y) | y, H_0) = ∫ p_C(y, θ̃) p(θ̃ | y) dθ̃

• Discrepancy measure D(ỹ, θ̃) (GOF test depends on nuisance parameters):

p_D = P(D(ỹ, θ̃) ≥ D(y, θ̃) | y, H_0) = ∫∫ I[D(ỹ, θ̃) ≥ D(y, θ̃)] p(ỹ | θ̃) p(θ̃ | y) dỹ dθ̃
Computational aspects of the PPC

Computation of the PPP-value for D(y, θ) / T(y):
1. Let θ^1, . . . , θ^K be a converged Markov chain from p(θ | y)
2. Compute D(y, θ^k) (k = 1, . . . , K) (for T(y) only once)
3. Sample replicated data ỹ^k from p(y | θ^k) (each of size n)
4. Compute D(ỹ^k, θ^k) (k = 1, . . . , K)
5. Estimate p_D by p̂_D = (1/K) ∑_{k=1}^K I[D(ỹ^k, θ^k) ≥ D(y, θ^k)]
◃ When p_D < 0.05/0.10: "bad" fit of the model to the data
◃ Graphical checks:
◦ T(y): histogram of T(ỹ^k) (k = 1, . . . , K) with the observed T(y)
◦ D(y, θ): X-Y plot of D(ỹ^k, θ^k) versus D(y, θ^k) + 45° line
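Steps 4-5 and the graphical check are direct in R, assuming the vectors D.obs (with D.obs[k] = D(y, θ^k)) and D.rep (with D.rep[k] = D(ỹ^k, θ^k)) were computed along the chain (hypothetical objects):

ppp <- mean(D.rep >= D.obs)        # estimated PPP-value
plot(D.obs, D.rep); abline(0, 1)   # X-Y plot with the 45-degree line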
Example growth curve study, model M1: three PPCs

• R program using R2WinBUGS: model M1 diagnostics.R
• Three PPCs were considered:
◃ PPC1: sum of squared distances of y_ij to µ_ij:

D_SS(y, θ) = ∑_i ∑_j (y_ij − µ_ij)²

◃ PPC2 & PPC3: skewness and kurtosis of the standardized residuals:

D_skew(y, θ) = (1/n) ∑_{i,j} [(y_ij − µ_ij)/σ]³ and D_kurt(y, θ) = (1/n) ∑_{i,j} [(y_ij − µ_ij)/σ]⁴ − 3

with θ = (µ, σ)^T
PPCs:
[Figure: three posterior predictive checks; replicated versus observed SS (P_fit = 0.51), skewness (P_skewness = 0.56) and kurtosis (P_kurtosis = 0.25)]
Example growth curve study: model M1 versus M5
• MSPE: 3.74 (M1), 4.18 (M5)
• PPCs:
◃ PPC1: 0.51 (M1), 0.53 (M5)
◃ PPC2: 0.56 (M1), 0.57 (M5)
◃ PPC3: 0.25 (M1), 0.57 (M5)
Sensitivity of prior distribution
• Gelman (1996): for a reasonably sized study with a not-too-strong prior, the sensitivity of the conclusions to the choice of prior is likely to be minimal
• But checking the posterior under varying priors is a necessity
Model expansion
◦ Enlarging the model with extra parameters
◦ Embedding the current model into a larger class of models
Examples of model expansion:
◃ Embedding the distribution of the response into a general class of distributions
◃ Adding polynomial and/or interaction terms to the systematic part
◃ Introducing splines into the model
◃ Relaxing the link function
◃ ...
But WinBUGS/OpenBUGS, etc. are still too slow for quick exploratory Bayesian checks . . .