Bayesian Analysis of Longitudinal Studies

Emmanuel Lesaffre
Department of Biostatistics, Erasmus Medical Center
Interuniversity Institute for Biostatistics and statistical Bioinformatics
Catholic University Leuven & University Hasselt
[email protected]  [email protected]

(58 RBRAS Campina Grande — 22 to 26 July 2013)

Contents

0.1 Preface
1 General introduction
  1.1 Notation
  1.2 Probability in a Bayesian context
  1.3 Bayes theorem
  1.4 Bayesian concepts
  1.5 Bayesian versus frequentist approach
  1.6 Markov chain Monte Carlo sampling
  1.7 Bayesian software
2 Hierarchical models
  2.1 Gaussian hierarchical models
  2.2 Mixed models
  2.3 Missing data
  2.4 Some reflections
3 Model evaluation techniques
  3.1 Introduction
  3.2 Information theoretic measures for model selection
  3.3 Model checking procedures
4 Longitudinal studies
  4.1 List of (future) types of models
  4.2 Mixed models with non-Gaussian random effects
    4.2.1 A linear mixed model with a non-Gaussian random effect
    4.2.2 A generalized linear mixed model with a mixture distribution for the REs
  4.3 Changepoint models
    4.3.1 Introduction
    4.3.2 The HIV RCT
    4.3.3 The Prostate Cancer Study
    4.3.4 Overall conclusions
  4.4 Modeling growth
    4.4.1 Introduction
    4.4.2 The Predict Study
    4.4.3 The Jimma Infant Survival Study
  4.5 Smooth longitudinal curves
    4.5.1 Introduction
    4.5.2 Smoothing approach: background
    4.5.3 Two longitudinal smoothing exercises
  4.6 Multivariate longitudinal curves
    4.6.1 Introduction
    4.6.2 Intraocular pressure study: two longitudinal Gaussian responses
    4.6.3 Signal-Tandmobiel® study: two longitudinal ordinal responses
  4.7 Joint modeling of the repeated measures and the failure time
  4.8 Longitudinal profiles as covariates
  4.9 An autoregressive + random effects model
  4.10 Repeated measures within repeated measures

0.1 Preface

What can be expected from this short course?
• First a brief refresher of Bayesian concepts
• A brief introduction to Bayesian hierarchical models
• Bayesian mixed models
• Model evaluation techniques
• Bayesian longitudinal studies (in progress)

Some publicity

Chapter 1
General introduction

◃ Notation
◃ Probability in a Bayesian context
◃ Bayes theorem and concepts
◃ Bayesian versus frequentist approach
◃ Markov chain Monte Carlo sampling
◃ Bayesian software

1.1 Notation

• Random variable and realization: y
• Sample: y = {y1, . . . , yn}
• Parameter vector: θ
• Density or distribution: p(y | θ), likelihood: L(θ | y)

1.2 Probability in a Bayesian context

• Frequentist probability = long-run frequency definition
• Bayesian probability = expression of our/your uncertainty about the parameter value

1.3 Bayes theorem

• Sample of i.i.d. observations y = {y1, . . . , yn}
• Joint distribution of the sample = p(y | θ) = ∏_{i=1}^{n} p(yi | θ) = likelihood L(θ | y)
• Prior distribution: p(θ)
• Split up: p(y, θ) = p(y | θ) p(θ) = p(θ | y) p(y)

⇒ Bayes theorem:

  p(θ | y) = L(θ | y) p(θ) / p(y) = L(θ | y) p(θ) / ∫ L(θ | y) p(θ) dθ

[Figure: prior, likelihood and posterior densities of θ; the posterior is proportional to likelihood × prior]

1.4 Bayesian concepts

• Prior distribution
• Posterior distribution
• Posterior summary measures
• Posterior predictive distribution
• The Bayes factor

Prior distribution

• Non-informative/vague/diffuse/. . .: several general principles, but in complex models difficult to decide
• Subjective: still rare in the Bayesian literature
• Conjugate: analytical results, but limited applicability
• Conditional conjugate, semi-conjugate: extends the applicability and combines nicely with the Gibbs sampler
• Proper and improper prior: main question is how to show that the posterior is proper

Posterior distribution

Multivariate θ = {θ1, θ2}

• Joint posterior: p(θ | y)
• Conditional posterior: p(θ1 | θ2, y)
• Marginal posterior:
  ◃ p(θ1 | y) = ∫ p(θ1 | θ2, y) p(θ2 | y) dθ2
  ◃ Bayesian paradigm: inference about the parameters of interest θ1 by averaging over the uncertainty of the nuisance parameters θ2

Posterior summary measures (an R sketch follows at the end of this section)

θ univariate:
• Posterior mean: θ̄ = ∫ θ p(θ | y) dθ
• Posterior median θM: 0.5 = ∫_{−∞}^{θM} p(θ | y) dθ
• Posterior mode: θ̂M = arg maxθ p(θ | y)
• Posterior variance: σ̄² = ∫ (θ − θ̄)² p(θ | y) dθ
• Posterior SD: σ̄

θ multivariate: straightforward generalizations, except for the posterior median

θ univariate:
• [a, b] = 100(1 − α)% credible interval (CI): if P(a ≤ θ ≤ b | y) = 1 − α
• 100(1 − α)% equal-tail credible interval: invariant to monotone transformations
• 100(1 − α)% highest posterior density (HPD) interval: shortest CI with size 100(1 − α)%

θ multivariate:
• HPD region of content 100(1 − α)%

Posterior predictive distribution

• PPD: p(ỹ | y) = ∫ p(ỹ | θ) p(θ | y) dθ
• The PPD expresses what we know about the distribution of future observations
• Also useful for model checking
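Once posterior draws are available (analytically or via MCMC), all of the summary measures above reduce to sample statistics on the draws. Below is a minimal R sketch; the draws come from a toy Beta posterior standing in for MCMC output, and the coda package (see Section 1.7) supplies the HPD interval.

    ## A minimal sketch: posterior summaries from draws of theta.
    ## Assumption: 'theta' mimics posterior draws; here a toy Beta posterior.
    library(coda)

    set.seed(1)
    theta <- rbeta(10000, shape1 = 8, shape2 = 15)   # e.g. posterior of a proportion

    post.mean   <- mean(theta)
    post.median <- median(theta)
    post.sd     <- sd(theta)
    equal.tail  <- quantile(theta, c(0.025, 0.975))          # 95% equal-tail CI
    hpd         <- HPDinterval(as.mcmc(theta), prob = 0.95)  # 95% HPD interval

    round(c(mean = post.mean, median = post.median, sd = post.sd), 3)
    equal.tail; hpd

The same few lines apply verbatim to a column of a WinBUGS/JAGS chain after burn-in.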
1.5 Bayesian versus frequentist approach

• Frequentist approach:
  ◦ θ fixed and inference based on sampling variability
  ◦ Many tests are based on asymptotic arguments
  ◦ Maximization is the key tool
  ◦ Testing and estimation closely linked
• Bayesian approach:
  ◦ Condition on the observed data (data fixed), uncertainty about θ (θ stochastic)
  ◦ No asymptotic arguments are needed; all inference depends on the posterior
  ◦ Integration is the key tool
  ◦ Testing (Bayes factor) and estimation (posterior distribution) are separate tools
• But: the frequentist and Bayesian approaches can give the same numerical output

1.6 Markov chain Monte Carlo sampling

The MCMC algorithm saved the Bayesian paradigm: it is the reason why you are here

• The Gibbs sampler
• The Metropolis(–Hastings) sampler
• Checking and accelerating convergence

The Gibbs sampler (a toy R implementation follows below)

Starting position θ⁰ = (θ1⁰, . . . , θd⁰)ᵀ

Iteration (k + 1):
1. Sample θ1^(k+1) from p(θ1 | θ2^k, . . . , θ(d−1)^k, θd^k, y)
2. Sample θ2^(k+1) from p(θ2 | θ1^(k+1), θ3^k, . . . , θd^k, y)
...
d. Sample θd^(k+1) from p(θd | θ1^(k+1), . . . , θ(d−1)^(k+1), y)

p(θj | θ1, . . . , θ(j−1), θ(j+1), . . . , θd, y): full conditional distribution

Result of Gibbs sampling:

• Chain of vectors: θ^k = (θ1^k, . . . , θd^k)ᵀ, k = 1, 2, . . .
  ◦ Consists of dependent elements
  ◦ Markov property: p(θ^(k+1) | θ^k, θ^(k−1), . . . , y) = p(θ^(k+1) | θ^k, y)
• The chain depends on the starting value + the initial portion/burn-in part is discarded
• Under mild conditions: a sample from the posterior distribution
⇒ From k0 on: summary measures calculated from the chain consistently estimate the true posterior measures

The Metropolis–Hastings algorithm

The chain is at θ^k ⇒ then sample the value θ^(k+1) as follows:

1. Sample a candidate θ̃ from the (asymmetric) proposal density q(θ̃ | θ), with θ = θ^k
2. The next value θ^(k+1) will be equal to:
   • θ̃ with probability α(θ^k, θ̃) (accept proposal),
   • θ^k otherwise (reject proposal),
with

  α(θ^k, θ̃) = min(r, 1),  r = [p(θ̃ | y) q(θ^k | θ̃)] / [p(θ^k | y) q(θ̃ | θ^k)]

Result of MH sampling:

• Markov chain of vectors: θ^k = (θ1^k, . . . , θd^k)ᵀ, k = 1, 2, . . .
• About the same properties as with the Gibbs sampler

Gibbs sampler = special case of the MH algorithm
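A minimal R sketch of the Gibbs scheme above, for a toy bivariate normal target where both full conditionals are known in closed form: θ1 | θ2 ∼ N(ρθ2, 1 − ρ²) and vice versa. The target and all values are illustrative assumptions, not from the course data.

    ## Gibbs sampler for (theta1, theta2) ~ N2(0, [[1, rho], [rho, 1]])
    set.seed(123)
    K     <- 5000              # chain length
    rho   <- 0.8
    theta <- matrix(NA, K, 2)
    theta[1, ] <- c(-3, 3)     # deliberately bad starting value

    for (k in 2:K) {
      # step 1: sample theta1 from its full conditional given the current theta2
      theta[k, 1] <- rnorm(1, rho * theta[k - 1, 2], sqrt(1 - rho^2))
      # step 2: sample theta2 from its full conditional given the NEW theta1
      theta[k, 2] <- rnorm(1, rho * theta[k, 1],     sqrt(1 - rho^2))
    }

    burnin <- 1000
    colMeans(theta[-(1:burnin), ])  # consistent estimates of the posterior means

Plotting either column against the iteration number gives exactly the kind of trace plot shown on the next slides.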
Output of the MCMC sampler

Trace plot from the Gibbs sampler:

[Figure: trace plots of (a) β1 and (b) σ² over 1,500 Gibbs iterations]

Trace plot from the MH sampler:

[Figure: trace plots of (a) µ and (b) σ² over MH iterations; accepted moves in blue, rejected moves in red]

Checking convergence of the Markov chain

Popular tools for checking convergence (illustrated in the coda sketch at the end of this section):

◃ Trace plot: visual inspection of mixing and stationarity of the Markov chain
◃ Autocorrelation plot: mixing of the Markov chain
◃ Cross-correlation plot: when mixing is low, check for identifiability problems
◃ Geweke diagnostic + dynamic version: single-chain diagnostic, compares the mean of an early part of the chain with that of a late part
◃ Heidelberger–Welch: single-chain diagnostic + assesses the accuracy of the posterior mean (Monte Carlo error)
◃ Brooks–Gelman–Rubin diagnostic: multiple-chains diagnostic, compares the mixing of the chains

It is impossible to prove convergence in practice:
◦ Example of Geyer (1992)
◦ AZT RCT example

[Figure: WinBUGS trace plot of beta2[4] over 20,000 iterations (AZT RCT example)]

Accelerating convergence of the Markov chain

• Acceleration approaches: the aim is to lower the autocorrelation
  ◦ Choosing better starting values
  ◦ Transforming the variables: centering + standardizing variables
  ◦ Thinning: useful to reduce storage, but does not accelerate
  ◦ Blocking: sample blocks of parameters, may imply a considerable improvement in convergence rate
  ◦ Suppress the purely random behavior of the MCMC sampler: overrelaxation
  ◦ Reparameterization of the parameters: e.g. centering covariates; in general a joint reparameterization is needed
  ◦ Data augmentation: use of auxiliary variables, may simplify writing the MCMC sampler considerably
• More sophisticated MCMC algorithms may be required
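The convergence diagnostics listed above are all available in the coda package. A minimal sketch, assuming two chains of posterior draws are available as numeric vectors; here simulated AR(1) series stand in for real chains.

    library(coda)

    set.seed(7)
    ar1 <- function(n, rho) { x <- numeric(n)
                              for (k in 2:n) x[k] <- rho * x[k - 1] + rnorm(1)
                              x }
    chain1 <- as.mcmc(ar1(5000, 0.6))
    chain2 <- as.mcmc(ar1(5000, 0.6))
    chains <- mcmc.list(chain1, chain2)

    autocorr.plot(chain1)   # mixing of the Markov chain
    geweke.diag(chain1)     # single chain: early vs late mean (|z| > 2 is suspect)
    heidel.diag(chain1)     # stationarity + accuracy of the posterior mean
    gelman.diag(chains)     # BGR diagnostic: scale reduction factor close to 1

In practice the chains would come from coda.samples() (rjags) or from the CODA export of WinBUGS rather than from a toy simulator.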
1.7 Bayesian software

• WinBUGS is still the standard software, with OpenBUGS as its successor
• Links of R to WinBUGS/OpenBUGS: R2WinBUGS, CODA, BOA, . . .
• Also JAGS, R2jags, rjags
• The examples will be analyzed with the above software and show that:
  ◃ The programming languages are almost the same
  ◃ The R objects that R2WinBUGS, R2jags and rjags deliver are different
  ◃ Performance and speed can differ considerably between the packages
• INLA: promising new approach based on smart Laplace approximations (no illustrations here)

Chapter 2
Hierarchical models

• Bayesian hierarchical models (BHMs): for hierarchical/clustered data
• Examples:
  ◦ Measurements taken repeatedly over time on the same subject
  ◦ Data with a spatial hierarchy: surfaces on teeth and teeth in a mouth
  ◦ Multi-center clinical data with patients within centers
  ◦ Cluster-randomized trials where centers are randomized to interventions
  ◦ Meta-analyses
• BHM: a classical mixed effects model + priors on all parameters
• As in the classical frequentist world: random and fixed effects

Here:
• Introduction to the BHM via the Gaussian hierarchical model
• Full Bayesian versus Empirical Bayesian approach
• Bayesian Linear, Generalized Linear and Nonlinear Mixed Model
• Choice of non-informative priors

2.1 Gaussian hierarchical models

Introduction

• Gaussian hierarchical model: hierarchical model whereby the distribution at each level is Gaussian
• Here a variance component model/random effects model: no covariates involved
• Bayesian linear model:
  ◦ All parameters are given a prior distribution
  ◦ Fundamentals by Lindley and Smith
• Illustrations: dietary IBBENS study

Example dietary study: Monitoring dietary behavior in Belgium

• IBBENS study: dietary survey in Belgium (Den Hond et al. 1994)
• Of interest: compare the dietary intake in different geographical areas in Belgium, but especially in Flanders
• Performed in eight subsidiaries of one bank situated in seven Dutch-speaking cities in the north and one French-speaking city in the south of Belgium
• The food habits of 371 (66%) male and 192 female healthy employees with average age 38.3 years were examined by a three-day food record with an additional interview
• Cholesterol intake has roughly a Gaussian distribution, but as shown below this assumption is not crucial for a large sample. The observed mean cholesterol intake is 328 mg/day with a standard deviation of 120.3 mg/day

Boxplot of cholesterol intake according to subsidiary:

[Figure: boxplots of cholesterol intake (mg/day) for subsidiaries 1–8]

The Gaussian hierarchical model (a JAGS sketch of this model follows below)

• Two-level Bayesian Gaussian hierarchical model:

  Level 1: yij | θi, σ² ∼ N(θi, σ²)  (j = 1, . . . , mi; i = 1, . . . , n)
  Level 2: θi | µ, σθ² ∼ N(µ, σθ²)  (i = 1, . . . , n)
  Priors:  σ² ∼ p(σ²) and (µ, σθ²) ∼ p(µ, σθ²)

  ◃ Hierarchical independence
  ◃ Hyperparameters: often p(µ, σθ²) = p(µ) p(σθ²)
  ◃ An alternative model formulation is θi = µ + αi, with αi ∼ N(0, σθ²)
  ◃ Joint posterior:

    p(θ, σ², µ, σθ² | y) ∝ ∏_{i=1}^{n} ∏_{j=1}^{mi} N(yij | θi, σ²) ∏_{i=1}^{n} N(θi | µ, σθ²) p(σ²) p(µ, σθ²)
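A minimal rjags sketch of the two-level model above, run on simulated data (the IBBENS data are not reproduced here); the priors mimic the ones used for the dietary analysis below, and all data values are illustrative.

    library(rjags)

    model.str <- "
    model {
      for (k in 1:N) {
        y[k] ~ dnorm(theta[sub[k]], tau)     # level 1, tau = 1/sigma^2
      }
      for (i in 1:n) {
        theta[i] ~ dnorm(mu, tau.theta)      # level 2
      }
      mu ~ dnorm(0, 1.0E-6)
      tau <- pow(sigma, -2)
      sigma ~ dunif(0, 100)
      tau.theta <- pow(sigma.theta, -2)
      sigma.theta ~ dunif(0, 100)
    }"

    set.seed(42)
    n <- 8; m <- 70
    sub <- rep(1:n, each = m)
    theta.true <- rnorm(n, 330, 18)
    y <- rnorm(n * m, theta.true[sub], 120)

    jm <- jags.model(textConnection(model.str),
                     data = list(y = y, sub = sub, N = length(y), n = n),
                     n.chains = 3)
    update(jm, 5000)                         # burn-in
    fit <- coda.samples(jm, c("mu", "sigma", "sigma.theta", "theta"),
                        n.iter = 10000)
    summary(fit)$statistics[c("mu", "sigma", "sigma.theta"), ]

The same model file runs essentially unchanged in WinBUGS/OpenBUGS.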
Estimating the parameters

Some insightful results (with σ² fixed):

• The joint posterior depends on the data only via ȳi (i = 1, . . . , n)
• p(θi | µ, σθ², σ², y) = N(θ̄i, σ̄²θi) with

  θ̄i = [(1/σθ²) µ + (mi/σ²) ȳi] / (1/σθ² + mi/σ²)  and  σ̄²θi = 1 / (1/σθ² + mi/σ²) = σθ² σ² / (σ² + mi σθ²)

• θ̄i = Bi µ + (1 − Bi) ȳi, with shrinkage factor (see the R sketch below)

  Bi = (1/σθ²) / (1/σθ² + mi/σ²) = σ² / (σ² + mi σθ²)

• Inverse relationship of the shrinkage factor with the intra-class correlation:

  ICC = σθ² / (σθ² + σ²)

• For a flat prior for µ and conditional on σθ² and σ²: p(µ | σθ², σ², y) = N(µ̄, σ̄²µ) with

  µ̄ = ∑_{i=1}^{n} [ȳi / (σ²/mi + σθ²)] / ∑_{i=1}^{n} [1 / (σ²/mi + σθ²)]  and  σ̄²µ = 1 / ∑_{i=1}^{n} [1 / (σ²/mi + σθ²)]
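A minimal R sketch of the shrinkage results above, with illustrative values roughly matching the dietary study output reported on the next slides (σ ≈ 119.5, σθ ≈ 18.3, µ ≈ 328.3); it approximately reproduces the reported shrinkage range Bi ∈ [0.33, 0.45].

    sigma       <- 119.5    # level-1 SD (within subsidiary)
    sigma.theta <- 18.3     # level-2 SD (between subsidiaries)
    mu          <- 328.3    # overall mean
    mi   <- c(82, 51, 71, 71, 62, 69, 74, 83)   # subsidiary sizes
    ybar <- c(301.5, 324.7, 342.3, 332.5, 351.5, 292.8, 337.7, 347.1)

    B       <- sigma^2 / (sigma^2 + mi * sigma.theta^2)    # shrinkage factors
    theta.i <- B * mu + (1 - B) * ybar                     # shrunken means
    icc     <- sigma.theta^2 / (sigma.theta^2 + sigma^2)   # intra-class corr.

    round(cbind(mi, ybar, B, theta.i), 2); round(icc, 3)

Note how the subsidiary with the smallest mi (number 2) is shrunk most strongly towards µ.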
Example dietary study: Comparison between subsidiaries

Aim: compare the average cholesterol intake between subsidiaries with WinBUGS

• Priors: µ ∼ N(0, 10⁶), σ² ∼ IG(10⁻³, 10⁻³) and σθ ∼ U(0, 100)
  ◦ 3 chains each of size 10,000, 3 × 5,000 burn-in
  ◦ BGR diagnostic in WinBUGS: almost immediate convergence
  ◦ Export to CODA: OK, except for the Geweke test (numerical difficulties)
• Posterior summary measures on the last 5,000 iterations
  ◦ µ̄ = 328.3, σ̄µ = 9.44
  ◦ σM = 119.5
  ◦ σθ,M = 18.26
  ◦ θ̄i: see table (not much variation)
  ◦ Bi ∈ [0.33, 0.45] (≈ uniform shrinkage)

DAG:

[Figure: WinBUGS doodle of the Gaussian hierarchical model, with nodes mu, tau.mu (sigma.mu), mu_i, tau.i (sigma.i) and y_ij, and plates for(j IN 1:m_i), for(i IN 1:n)]

FB estimation of θi with WinBUGS:

  Sub  mi   Mean (SE)      SD      FBW (SD)       FBS (SD)       EML (SE)       EREML (SE)
  1    82   301.5 (10.2)    92.1   311.8 (12.6)   312.0 (12.0)   313.8 (10.3)   312.2 (11.3)
  2    51   324.7 (17.1)   122.1   326.4 (12.7)   326.6 (12.7)   327.0 (11.7)   326.7 (12.3)
  3    71   342.3 (13.6)   114.5   336.8 (11.7)   336.6 (11.7)   335.6 (10.7)   336.4 (11.6)
  4    71   332.5 (13.5)   113.9   330.8 (11.6)   330.9 (11.5)   330.6 (10.7)   330.8 (11.6)
  5    62   351.5 (19.0)   150.0   341.2 (12.8)   341.3 (12.6)   339.5 (11.1)   340.9 (11.8)
  6    69   292.8 (12.8)   106.4   307.3 (14.2)   307.8 (13.5)   310.7 (10.8)   308.4 (11.6)
  7    74   337.7 (14.1)   121.3   334.1 (11.4)   334.2 (11.2)   333.4 (10.6)   333.9 (11.5)
  8    83   347.1 (14.5)   132.2   340.0 (11.4)   340.0 (11.4)   338.8 (10.2)   340.0 (11.2)

WinBUGS program: chapter 9 dietary study chol.odc

Posterior predictive distributions

Prediction of the distribution of the cholesterol intake of a future bank employee given the observed intakes y. Two cases:

1. The new bank employee works in one of the eight subsidiaries
2. The new bank employee works in a new subsidiary of the same bank

Example dietary study: PPDs

Case 1:

  p(ỹij | y) = ∫∫∫∫ p(ỹij | θi) p(θi | µ, σθ², σ², y) p(µ, σθ², σ² | y) dθi dµ dσθ² dσ²

with
  p(ỹij | θi) = N(ỹij | θi, σ²)
  p(θi | µ, σθ², σ², y) = N(θi | θ̄i, σ̄²θi)
  p(µ, σθ², σ² | y)

Case 2:

  p(ỹ | y) = ∫∫∫∫ p(ỹ | θ̃) p(θ̃ | µ, σθ², σ², y) p(µ, σθ², σ² | y) dθ̃ dµ dσθ² dσ²

with
  p(ỹ | θ̃) = N(ỹ | θ̃, σ²)
  p(θ̃ | µ, σθ², σ², y) = N(θ̃ | µ, σθ²)
  p(µ, σθ², σ² | y)

Example dietary study: PPDs with WinBUGS

Extra WinBUGS commands:

  #prediction of new observations of subsidiary 1
  for (j in 1:nc[1]) { predict1[j] ~ dnorm(mu.sub[1], tau.i) }
  mpredict1 <- mean(predict1[])
  #prediction of new observations of a new subsidiary
  mu.new ~ dnorm(mu.m, tau.mu)
  for (j in 1:nnew) { predict.new[j] ~ dnorm(mu.new, tau.i) }
  mpredict.new <- mean(predict.new[])

2.2 Mixed models

Introduction

• Bayesian linear mixed model (BLMM): extension of the Bayesian Gaussian hierarchical model
• Bayesian generalized linear mixed model (BGLMM): extension of the BLMM
• Bayesian nonlinear mixed model: extension of the BGLMM

The linear mixed model (LMM)

LMM: yij = response for the jth observation on the ith subject (i = 1, . . . , n)

  yij = xᵀij β + zᵀij bi + εij
  y i = X i β + Z i bi + ε i

◃ y i = (yi1, . . . , yimi)ᵀ: mi × 1 vector of responses
◃ X i = (xi1, . . . , ximi)ᵀ: mi × (d + 1) design matrix
◃ β = (β0, β1, . . . , βd)ᵀ: (d + 1) × 1 vector of fixed effects
◃ Z i = (zi1, . . . , zimi)ᵀ: mi × q design matrix of random effects
◃ bi: q × 1 vector of random effects (i = 1, . . . , n)
◃ ε i = (εi1, . . . , εimi)ᵀ: mi × 1 vector of measurement errors

Distributional assumptions:

◃ bi ∼ Nq(0, G), G: q × q covariance matrix
◃ G with (j, k)th element σbj,bk (j ≠ k) and σ²bj (j = k)
◃ ε i ∼ Nmi(0, R i), R i: mi × mi covariance matrix, often R i = σ² Imi
◃ bi statistically independent of ε i (i = 1, . . . , n)

Implications:

  y i | bi ∼ Nmi(X i β + Z i bi, R i)
  y i ∼ Nmi(X i β, Z i G Z iᵀ + R i)

• The LMM is popular for analyzing longitudinal studies with irregular time points + a Gaussian response
• Covariates: time-independent or time-dependent
• Random intercept model: b0i
• Random intercept + slope model: b0i + b1i tij
• Bayesian linear mixed model (BLMM): all parameters get a prior distribution

• Bayesian linear mixed model (an rjags sketch follows below):

  Level 1: yij | β, bi, σ² ∼ N(xᵀij β + zᵀij bi, σ²)  (j = 1, . . . , mi; i = 1, . . . , n)
  Level 2: bi | G ∼ Nq(0, G)  (i = 1, . . . , n)
  Priors:  σ² ∼ p(σ²), β ∼ p(β) and G ∼ p(G)

  ◃ Joint posterior:

    p(β, G, σ², b1, . . . , bn | y 1, . . . , y n) ∝ ∏_{i=1}^{n} ∏_{j=1}^{mi} p(yij | bi, σ², β) ∏_{i=1}^{n} p(bi | G) p(β) p(G) p(σ²)
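A minimal rjags sketch of a BLMM with random intercept and slope, fit to simulated longitudinal data; the names, dimensions and effect sizes are illustrative, and the IW(D, 2) prior on G mirrors the one used in the toenail example below.

    library(rjags)

    model.str <- "
    model {
      for (k in 1:N) {
        mu[k] <- beta[1] + beta[2] * t[k] + b[id[k], 1] + b[id[k], 2] * t[k]
        y[k] ~ dnorm(mu[k], tau.eps)
      }
      for (i in 1:n) {
        b[i, 1:2] ~ dmnorm(zero, Omega)   # b_i ~ N2(0, G), Omega = G^{-1}
      }
      beta[1] ~ dnorm(0, 1.0E-6)
      beta[2] ~ dnorm(0, 1.0E-6)
      Omega ~ dwish(D, 2)                 # i.e. G ~ IW(D, 2)
      G <- inverse(Omega)
      tau.eps <- pow(sigma.eps, -2)
      sigma.eps ~ dunif(0, 100)
    }"

    set.seed(1)
    n <- 50; t <- rep(0:6, n); id <- rep(1:n, each = 7)
    b <- cbind(rnorm(n, 0, 2), rnorm(n, 0, 0.4))
    y <- 5 + 0.6 * t + b[id, 1] + b[id, 2] * t + rnorm(length(t), 0, 1.5)

    dat <- list(y = y, t = t, id = id, N = length(y), n = n,
                zero = c(0, 0), D = diag(0.1, 2))
    jm  <- jags.model(textConnection(model.str), data = dat, n.chains = 3)
    update(jm, 5000)
    fit <- coda.samples(jm, c("beta", "sigma.eps", "G"), n.iter = 10000)
    summary(fit)$statistics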
Example dietary study: Comparison between subsidiaries

Aim: compare the average cholesterol intake between subsidiaries, correcting for age and gender

• Model: yij = β0 + β1 ageij + β2 genderij + b0i + εij
  ◃ b0i ∼ N(0, σ²b0)
  ◃ εij ∼ N(0, σ²)
  ◃ Priors: vague for all parameters

Results:

• WinBUGS program: chapter 9 dietary study chol age gender.odc
  ◦ Three chains of 10,000 iterations each, 3 × 5,000 burn-in
  ◦ Rapid convergence
• Posterior summary measures on the last 3 × 5,000 iterations
  ◦ β̄1 (SD) = −0.69 (0.57), β̄2 (SD) = −62.67 (10.67)
  ◦ With covariates: σM = 116.3, σb0,M = 14.16
  ◦ Without covariates: σM = 119.5, σb0,M = 18.30
  ◦ With covariates: ICC = 0.015
  ◦ Without covariates: ICC = 0.022

Example toenail RCT: Fitting a BLMM

Aim: compare itraconazol with lamisil on unaffected nail length

◦ Double-blinded multi-centric RCT (36 centers): sportsmen and elderly people treated for toenail dermatophyte onychomycosis
◦ Two oral medications: itraconazol (treat = 0) or lamisil (treat = 1)
◦ Twelve weeks of treatment
◦ Evaluations at 0, 1, 2, 3, 6, 9 and 12 months
◦ Response: unaffected nail length of the big toenail
◦ Patients: subgroup of 298 patients

Individual profiles:

[Figure: individual profiles of unaffected toenail length (0–20 mm) versus months since baseline (0–12), separately for itraconazol and lamisil]

Model specification:

• Model: yij = β0 + β1 tij + β2 tij × treati + b0i + b1i tij + εij
  ◃ bi = (b0i, b1i)ᵀ ∼ N2(0, G)
  ◃ εij ∼ N(0, σ²)
• Priors
  ◦ Vague normal for the fixed effects
  ◦ G ∼ IW(D, 2) with D = diag(0.1, 0.1)

DAG:

[Figure: WinBUGS doodle of the BLMM, with nodes beta_0, beta_1, beta_2, tau.b (sigma.b0, sigma.b1, corr.b0b1), tau.eps (sigma.eps), mu_ij, b_i, eps_ij and y_ij, covariates t_ij and treat_i, and plates for(j IN 1:m_i), for(i IN 1:n)]

Results:

• WinBUGS program: chapter 9 toenail LMM.odc
  ◦ Three chains of 10,000 iterations each, 3 × 5,000 burn-in
  ◦ Rapid convergence
• Posterior summary measures on the last 3 × 5,000 iterations
  ◦ β̄1 (SD) = 0.58 (0.043), β̄2 (SD) = −0.057 (0.058)
  ◦ σM = 1.78
  ◦ σb0,M = 2.71
  ◦ σb1,M = 0.48
  ◦ corr(b0, b1)M = −0.39
• Frequentist analysis with SAS® MIXED: MLEs close to the Bayesian estimates

The generalized linear mixed model

Two-level hierarchy, i.e. mi subjects in n clusters with responses yij

Generalized linear model (GLIM):

  p(yij | θij; ϕij) = exp[(yij θij − b(θij)) / a(ϕij) + c(yij; ϕij)]

◃ a(ϕij) > 0 a known scale function
◃ θij unknown canonical parameters
◃ g(µij) = xᵀij β with µij = E(yij | θij)
◃ Problem: the model ignores the clustering in the data

Bayesian generalized linear mixed model (BGLMM):

  Level 1: yij | θij, ϕij ∼ exp[(yij θij − b(θij)) / a(ϕij) + c(yij; ϕij)]  (j = 1, . . . , mi; i = 1, . . . , n)
  Level 2: g(µij) = xᵀij β + zᵀij bi with i.i.d. bi ∼ Nq(0, G)  (i = 1, . . . , n)
  Priors:  β ∼ p(β), G ∼ p(G) with G a q × q covariance matrix

Example: Logistic-normal-binomial model (an rjags sketch follows below)

◦ logit(πij) = xᵀij β + b0i
◦ E[logit(πij)] = xᵀij β
◦ The regression coefficients have a subject-specific interpretation
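A minimal rjags sketch of the logistic-normal-binomial (random intercept) model on simulated binary longitudinal data; all names, visit times and effect sizes are illustrative and only loosely inspired by the toenail example that follows.

    library(rjags)

    model.str <- "
    model {
      for (k in 1:N) {
        logit(pi[k]) <- beta[1] + beta[2] * t[k] +
                        beta[3] * t[k] * treat[k] + b0[id[k]]
        y[k] ~ dbern(pi[k])
      }
      for (i in 1:n) { b0[i] ~ dnorm(0, tau.b0) }
      for (r in 1:3) { beta[r] ~ dnorm(0, 1.0E-4) }
      tau.b0 <- pow(sigma.b0, -2)
      sigma.b0 ~ dunif(0, 100)
    }"

    set.seed(2)
    n <- 100; t <- rep(c(0, 1, 2, 3, 6, 9, 12), n); id <- rep(1:n, each = 7)
    treat <- rep(rbinom(n, 1, 0.5), each = 7)
    b0 <- rnorm(n, 0, 2)
    y  <- rbinom(length(t), 1, plogis(-1.5 - 0.3 * t - 0.1 * t * treat + b0[id]))

    dat <- list(y = y, t = t, treat = treat, id = id, N = length(y), n = n)
    jm  <- jags.model(textConnection(model.str), data = dat, n.chains = 3)
    update(jm, 5000)
    fit <- coda.samples(jm, c("beta", "sigma.b0"), n.iter = 10000)
    summary(fit)$statistics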
Example toenail RCT: A Bayesian logistic RI model

Aim: compare the evolution of onycholysis over time between the two treatments

Response: yij: 0 = none or mild, 1 = moderate or severe

◃ Model: logit(πij) = β0 + β1 tij + β2 tij × treati + b0i
◃ b0i ∼ N(0, σ²b0)
◃ Priors:
  ◦ Independent vague normal for the regression coefficients
  ◦ σb0 ∼ U(0, 100)

Mean profiles:

[Figure: % onycholysis (0–40) versus months since baseline (0–12) for itraconazol and lamisil]

Results:

• WinBUGS program: chapter 9 toenail RI BGLMM.odc
  ◦ Three chains of 10,000 iterations each, 3 × 5,000 burn-in
  ◦ Rapid convergence
• Posterior summary measures on the last 3 × 5,000 iterations
  ◦ β̄0 (SD) = −1.74 (0.34)
  ◦ β̄1 (SD) = −0.41 (0.045)
  ◦ β̄2 (SD) = −0.17 (0.069)
  ◦ σ²b0,M = 17.44
  ◦ ICC = σ²b0 / (σ²b0 + π²/3) = 0.84
• Frequentist analysis with SAS® GLIMMIX: similar results, see program chapter 9 toenail binary GLIMMIX and MCMC.sas

Example toenail RCT: A Bayesian logistic RI+RS model

Aim: compare the evolution of onycholysis over time between the two treatments

Response: yij: 0 = none or mild, 1 = moderate or severe

• Model: logit(πij) = β0 + β1 tij + β2 tij × treati + b0i + b1i tij
  ◃ Similar distributional assumptions and priors as before

WinBUGS program: chapter 9 toenail RI+RS BGLMM.odc
Numerical difficulties due to overflow: use of the min and max functions
SAS MCMC program: chapter 9 toenail RI+RS BGLMM.sas
No computational problems, but much slower convergence

Other BGLMMs

Extension of the BGLMM with a residual term:

  g(µij) = xᵀij β + zᵀij bi + εij  (j = 1, . . . , mi; i = 1, . . . , n)

Albert's model (1988):
• Parameters have a population-averaged interpretation
• The model allows only level-1 covariates

Further extensions: see later

Nonlinear mixed models

Bayesian nonlinear mixed model (BNLMM), sketched in rjags below:

  Level 1: yij | ϕi, xij, σ² ∼ N[f(ϕi, xij), σ²]  (j = 1, . . . , mi; i = 1, . . . , n)
  Level 2: ϕi | W i, Z i, β ∼ N(W i β, Z i G Z iᵀ)  (i = 1, . . . , n)
  Priors:  β ∼ p(β), σ² ∼ p(σ²) and G ∼ p(G) with G a q × q covariance matrix

with
◃ ϕi = W i β + Z i bi
◃ W i includes X i
◃ f a general function of several variables, more general than the link function g
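A minimal rjags sketch of a BNLMM with an asymptotic growth curve f(ϕ, t) = ϕ1 {1 − exp[−exp(ϕ2) t]} and a subject-specific rate parameter; the curve and all values are illustrative assumptions, only loosely inspired by the arteriosclerosis example that follows.

    library(rjags)

    model.str <- "
    model {
      for (k in 1:N) {
        f[k] <- phi1 * (1 - exp(-exp(phi2[id[k]]) * t[k]))
        y[k] ~ dnorm(f[k], tau)
      }
      for (i in 1:n) { phi2[i] ~ dnorm(beta2, tau.b) }  # random rate parameter
      phi1  ~ dnorm(0, 1.0E-4)                          # asymptote (fixed effect)
      beta2 ~ dnorm(0, 1.0E-4)
      tau <- pow(sigma, -2)
      sigma ~ dunif(0, 100)
      tau.b <- pow(sigma.b, -2)
      sigma.b ~ dunif(0, 100)
    }"

    set.seed(3)
    n <- 20; t <- rep(c(0, 3, 7, 14, 21, 28), n); id <- rep(1:n, each = 6)
    rate <- exp(rnorm(n, -2, 0.3))
    y <- 1.1 * (1 - exp(-rate[id] * t)) + rnorm(length(t), 0, 0.1)

    jm  <- jags.model(textConnection(model.str),
                      data = list(y = y, t = t, id = id, N = length(y), n = n),
                      n.chains = 3)
    update(jm, 5000)
    fit <- coda.samples(jm, c("phi1", "beta2", "sigma", "sigma.b"), n.iter = 10000)
    summary(fit)$statistics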
Example arteriosclerosis study: Reperfusion models

• Aim: examine the cellular components of the immune system and their effects on arteriogenesis
• Study setup: genetically modified mice undergo a surgical procedure that blocked one of the main leg arteries
• Two groups: C57BL/6 mice and MHC class II−/− mice
• Response: yij = ischemic/non-ischemic perfusion ratio (IPR) (comparison of the perfusion in the blocked/unblocked leg)
• Profiles: measured at days 0, 3, 7, 14, 21, 28

Individual profiles:

[Figure: individual IPR profiles (0–1.5) versus days (0–28) for C57BL/6 WT and MHC class II−/− mice]

Model specification:

Model: yij = ϕ1i {1 − exp[−exp(ϕ2i) tj]} + εij

◃ 1st parameter: ϕ1i = β1 (group 1), ϕ1i = β2 (group 2) (ultimate IPR)
◃ 2nd parameter: ϕ2i = β3 + bi (group 1), ϕ2i = β4 + bi (group 2) (rate at which the final IPR is attained)
◃ Interest in: δ = β3 − β4
◃ In matrix notation (shown for a group-1 mouse):

  ϕi = W i β + bi ≡ ( 1 0 0 0 ) (β1, β2, β3, β4)ᵀ + ( 0  )
                    ( 0 0 1 0 )                     ( bi )

Further model assumptions:

• bi ∼ N(0, σ²b), εij ∼ N(0, σ²)
• Priors
  ◦ Independent vague normal for the fixed effects
  ◦ σ, σb: U(0, 100)

Results:

• WinBUGS program: chapter 9 arterio study.odc
  ◦ Three chains of 10,000 iterations each, 3 × 5,000 burn-in
  ◦ Rapid convergence
• Posterior summary measures on the last 3 × 5,000 iterations

  δ̄ (SD) = 0.70 (0.37) (95% equal-tail CI: [−0.002, 1.47])

⇒ No clear evidence of a difference between the two types of mice

Individual + fitted median profiles:

[Figure: individual IPR profiles with fitted median curves for C57BL/6 WT and MHC class II−/− mice]

Estimation of the REs and PPDs in mixed models

Prediction/estimation:
◃ Bayesian approach: estimating random effects = estimating fixed effects
◃ Prediction/estimation of individual curves in longitudinal models: λᵀβ̂ and λᵀb̂i, with β̂ and b̂i the posterior means

PPD:
◃ See the Gaussian hierarchical model
◃ Estimation via MCMC

Software:
◃ WinBUGS (CODA command) + post-processing with R
◃ R2WinBUGS, R2jags, rjags

Example toenail RCT: Exploring the random effects

Histograms of the estimated RI and RS:

[Figure: histograms of the estimated random intercepts (−4 to 10) and random slopes (−1.5 to 1.5)]

Predicting individual evolutions with WinBUGS:

• Predicted evolution for the ith subject: xᵀij β̂ + zᵀij b̂i
• Extra WinBUGS commands:
  For an existing subject (id[iobs]):

    newresp[iobs] ~ dnorm(mean[iobs], tau.eps)

  For a new subject: sample the random effect from its distribution
• Computation:
  ◃ Stats table WinBUGS + R, CODA + R
  ◃ More practical: R2WinBUGS + R, R2jags + R, rjags + R
  ◃ Alternatively: generate the predictive observations in R
• Prediction of a missing response NA: automatically done

Predicted profile for itraconazol patient 3:

[Figure: observed and predicted unaffected nail length (0–20 mm) versus month (0–12) for patient 3]
Choice of the level-2 variance prior

• Bayesian linear regression: NI prior for σ² = Jeffreys prior p(σ²) ∝ 1/σ²
• Bayesian Gaussian hierarchical model: NI prior for σθ² = p(σθ²) ∝ 1/σθ²??
  ◦ Theoretical + intuitive results: improper posterior
  ◦ Note: not the Jeffreys prior for σ² in the hierarchical model
• Possible solution: IG(ε, ε) with ε small (WinBUGS)?
  ◦ No: see the application
• Solutions:
  ◦ U(0, c) prior on σθ
  ◦ Parameter-expanded model, as suggested by Gelman (2006)

Example dietary study*: NI prior for the level-2 variance

• Modified dietary study: chapter 9 dietary study chol2.odc
  ◃ Prior 1: σθ² ∼ IG(10⁻³, 10⁻³)
  ◃ Prior 2: σθ ∼ U(0, 100)
  ◃ Prior 3: the suggestion of Gelman (2006), which gives similar results as U(0, 100)
• Results:
  ◦ Posterior distribution of σθ: clear impact of the IG prior
  ◦ Trace plot of µ: regularly stuck with the IG prior

Effect of choosing the NI prior: posterior distribution of σθ

[Figure: posterior densities of σθ under the IG(0.001, 0.001) prior and under a flat prior]

Effect of choosing the NI prior: trace plot of µ

[Figure: trace plots of mu.m over iterations 50,001–60,000; top: flat prior for σθ, bottom: IG(0.001, 0.001) for σθ²]

Choice of the level-2 variance prior

Prior for the level-2 covariance matrix?

• Classical choice: inverse Wishart
• Simulation study of Natarajan and Kass (2000): problematic
• Solution:
  ◦ In general: not clear
  ◦ In 2 dimensions: uniform priors on both σ's and on the correlation, or IW(D, 2) with small diagonal elements
  ◦ IW(D, 2) with small diagonal elements ≈ uniform priors, but slower convergence with the uniform priors

Assessing and accelerating convergence

Checking convergence of the Markov chain of a Bayesian hierarchical model:
• Similar to checking convergence in any other model
• But: large number of parameters ⇒ make a selection

Accelerating convergence of the Markov chain of a Bayesian hierarchical model:
• Use the tricks seen before (centering, standardizing, overrelaxation, etc.)
• Specific tricks for a hierarchical model:
  ◦ hierarchical centering
  ◦ (reparameterization by sweeping)
  ◦ (parameter expansion)

Hierarchical centering (a JAGS sketch contrasting the two parameterizations follows below):

Uncentered Gaussian hierarchical model:

  yij = µ + αi + εij,  αi ∼ N(0, σα²)

Centered Gaussian hierarchical model:

  yij = θi + εij,  θi ∼ N(µ, σα²)

• For mi = m: hierarchical centering implies faster mixing when σα² > σ²/m
• Similar results for the multilevel BLMM and BGLMM

Example dietary study: Improving convergence

In all cases: 10,000 iterations, 5,000 burn-in

Hierarchical centering:
◦ σα² > σ²/70
◦ Improved mixing
◦ µ: MC error uncentered = 0.32, MC error centered = 0.23
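A minimal sketch of hierarchical centering in the BUGS/JAGS language: the same Gaussian hierarchical model written in its uncentered and centered forms. Both parameterizations define the same posterior; only the mixing of the sampler differs. Variable names and priors are illustrative.

    uncentered <- "
    model {
      for (k in 1:N) { y[k] ~ dnorm(mu + alpha[sub[k]], tau) }
      for (i in 1:n) { alpha[i] ~ dnorm(0, tau.alpha) }
      mu ~ dnorm(0, 1.0E-6)
      tau ~ dgamma(0.001, 0.001)
      tau.alpha <- pow(sigma.alpha, -2)
      sigma.alpha ~ dunif(0, 100)
    }"

    centered <- "
    model {
      for (k in 1:N) { y[k] ~ dnorm(theta[sub[k]], tau) }
      for (i in 1:n) { theta[i] ~ dnorm(mu, tau.alpha) }  # mu moved into level 2
      mu ~ dnorm(0, 1.0E-6)
      tau ~ dgamma(0.001, 0.001)
      tau.alpha <- pow(sigma.alpha, -2)
      sigma.alpha ~ dunif(0, 100)
    }"

    ## Fit both with rjags as in the earlier sketches and compare, e.g., the
    ## autocorrelation and MC error of mu (coda: autocorr.diag(), summary()).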
2.3 Missing data

• All the longitudinal models are likelihood-based and thus:
  ◃ are MAR robust
  ◃ if the model is correctly specified
• In WinBUGS/OpenBUGS, R2WinBUGS, etc. missing responses (NA) are automatically filled in using the estimated model (MAR imputation)
• In Chapter 4: an example of a specific MNAR model

2.4 Some reflections

The Bayesian solutions were always close to the classical results. So why do a Bayesian analysis?

• Perhaps you are a Bayesian fundamentalist; if not . . .
• The uncertainty of the parameter estimates is an immediate by-product of the Bayesian approach. And . . . the Bayesian 95% CI is often close to the classical 95% CI.
• Distributional assumptions are relatively easy to relax: both when using WinBUGS and when the full conditionals have to be programmed

Chapter 3
Model evaluation techniques

Toolbox for the statistical modeler

3.1 Introduction

• Bayesian procedures for model building and model criticism ⇒ select an appropriate model
• Model selection from a few good candidate models
• Bayesian model building and criticism = similar to frequentist model building and criticism, except that
  ◦ a Bayesian model = combination of likelihood and prior ⇒ 2 choices
  ◦ Bayesian inference is based on MCMC techniques, while frequentist approaches rely on asymptotic inference

• Explorative tools that check and improve the fitted model using WinBUGS in combination with R:
  ◦ Criteria to select models: DIC, MSPE
  ◦ Methods to detect outliers: PPO, CPO
  ◦ Bayesian sensitivity analysis: vary the assumptions
  ◦ Bayesian goodness-of-fit tests: posterior predictive checks

3.2 Information theoretic measures for model selection

Most popular frequentist information criteria: AIC and BIC

• Both adjust for model complexity: effective degrees of freedom ρ
• Akaike's information criterion (AIC), developed by Akaike (1974):

  AIC = −2 log L(θ̂(y) | y) + 2ρ

• Bayesian Information Criterion (BIC), suggested by Schwarz (1978):

  BIC = −2 log L(θ̂(y) | y) + ρ log(n)

• Deviance Information Criterion (DIC), suggested by Spiegelhalter et al. (2002):
  DIC = generalization of AIC to Bayesian models

The effective degrees of freedom

• Effective degrees of freedom ρ, apart from the variance parameters:
  ◦ Only fixed effects: ρ = p (p = number of fixed effects)
  ◦ Fixed + random effects: p ≤ ρ ≤ p + q (q = number of random effects)
• Related to the conditional and marginal likelihood
  ◦ Predictive ability of the conditional model: focus on the current clusters (current random effects)
  ◦ Predictive ability of the marginal model: focus on future clusters (random effects distribution)

Deviance Information Criterion & pD

Key question: how to define the effective number of parameters in a Bayesian context?

In analogy to the definition of ρ in AIC, effective degrees of freedom = pD:

  pD = D̄(θ) − D(θ̄)

· Bayesian deviance: D(θ) = −2 log p(y | θ) + 2 log f(y)
· D̄(θ) = posterior mean of D(θ)
· D(θ̄) = D(θ) evaluated at the posterior mean
· f(y) is typically the saturated density, but it is not used in WinBUGS

◃ pD = ρ for a normal likelihood with a flat prior for µ and fixed variance
◃ pD quite different from ρ in many other applications

• DIC as a Bayesian model selection criterion:

  DIC = D(θ̄) + 2pD = D̄(θ) + pD

• Both DIC and pD can be calculated from an MCMC run (see the sketch below):
  ◦ θ¹, . . . , θᴷ = converged Markov chain
  ◦ D̄(θ) ≈ (1/K) ∑_{k=1}^{K} D(θᵏ)
  ◦ D(θ̄) ≈ D((1/K) ∑_{k=1}^{K} θᵏ)
• Practical rule for choosing a model: as with AIC/BIC
• DIC is subject to sampling variability
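A minimal R sketch of the pD/DIC computation above for a Gaussian model y ∼ N(µ, σ²); the draws of (µ, σ) stand in for a converged chain (in practice they would come from coda.samples() or a CODA export).

    set.seed(4)
    y <- rnorm(100, 7, 1.5)

    ## stand-in posterior draws of (mu, sigma) under vague priors
    K     <- 2000
    mu    <- rnorm(K, mean(y), sd(y) / sqrt(length(y)))
    sigma <- sqrt((length(y) - 1) * var(y) / rchisq(K, length(y) - 1))

    dev <- function(mu, sigma) -2 * sum(dnorm(y, mu, sigma, log = TRUE))

    D.k   <- mapply(dev, mu, sigma)       # D(theta^k), k = 1..K
    D.bar <- mean(D.k)                    # posterior mean of the deviance
    D.hat <- dev(mean(mu), mean(sigma))   # deviance at the posterior mean
    pD    <- D.bar - D.hat
    DIC   <- D.hat + 2 * pD               # = D.bar + pD
    c(Dbar = D.bar, Dhat = D.hat, pD = pD, DIC = DIC)

The saturated term 2 log f(y) is omitted here, as in WinBUGS, since it cancels in model comparisons.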
DIC and pD – Use and pitfalls

The book contains several examples illustrating the use of, and the pitfalls when using, pD and DIC:

◃ Use of WinBUGS to compute pD and DIC
◃ Difference between the conditional and the marginal DIC
◃ Negative pD:
  ◦ The DIC and pD calculations assume a log-concave likelihood
  ◦ If the assumptions are not fulfilled (mixture distributions), pD can be negative; this caused a lot of confusion among WinBUGS users
◃ DIC and pD are overoptimistic (using the data twice)
◃ rjags uses a DIC corrected for overoptimism
◃ Except for WinBUGS, the computation of DIC can be quite variable

Example growth curve study: Model selection using pD and DIC

• Well-known data set from Potthoff & Roy (1964)
• Dental growth measurements of a distance (mm)
• 11 girls and 16 boys at ages (years) 8, 10, 12, and 14
• Variables are gender (1 = female, 0 = male) and age
• Gaussian linear mixed models were fit to the longitudinal profiles
• WinBUGS: chapter 10 Potthoff-Roy growthcurves.odc
• Choice evaluated with pD and DIC

Individual profiles:

[Figure: individual distance profiles (mm, 20–30) versus age (8–14 years) for boys and girls]

Models:

Model M1: yij = β0 + β1 agej + β2 genderi + β3 genderi × agej + b0i + b1i agej + εij

◃ yij = distance measurement
◃ bi = (b0i, b1i)ᵀ random intercept and slope with distribution N(0, G),

  G = ( σ²b0        ρ σb0 σb1 )
      ( ρ σb0 σb1   σ²b1      )

◃ εij ∼ N(0, σ0²) for boys, εij ∼ N(0, σ1²) for girls
◃ The total number of parameters for model M1 = 63:
  4 fixed effects, 54 random effects, 3 + 2 variances (RE + ME)

Alternative models:

◃ Model M2: model M1, but assuming ρ = 0
◃ Model M3: model M2, but assuming σ0 = σ1
◃ Model M4: model M1, but assuming σ0 = σ1
◃ Model M5: model M1, but bi, εij ∼ t3-(scaled) distributions
◃ Model M6: model M1, but εij ∼ t3-(scaled) distribution
◃ Model M7: model M1, but bi ∼ t3-(scaled) distribution
◃ Nested model comparisons: (1) M1, M2, M3; (2) M1, M2, M4; (3) M5, M6, M7

pD and DIC of the models:

  Model   Dbar      Dhat      pD       DIC
  M1      343.443   308.887   34.556   377.999
  M2      344.670   312.216   32.454   377.124
  M3      376.519   347.129   29.390   405.909
  M4      374.065   342.789   31.276   405.341
  M5      328.201   290.650   37.552   365.753
  M6      343.834   309.506   34.327   378.161
  M7      326.542   288.046   38.047   364.949

Example growth curve study: Conditional and marginal DIC

(Conditional) model M4: yij | bi ∼ N(µij, σ²)  (j = 1, . . . , 4; i = 1, . . . , 27)

with µij = β0 + β1 agej + β2 genderi + β3 genderi × agej + b0i + b1i agej

◃ Deviance: DC(µ, σ²) ≡ DC(β, b, σ²) = −2 ∑_i ∑_j log N(yij | µij, σ²)
◃ Interest (focus): the current 27 bi's
◃ DIC = conditional DIC
◃ WinBUGS: pD = 31.282 and DIC = 405.444

Marginal(ized) M4:

  y i ∼ N4(X i β, Z G Zᵀ + R)  (i = 1, . . . , 27)

with y i = (yi1, yi2, yi3, yi4)ᵀ,

  X i = ( 1   8   genderi   8 × genderi  )      Z = ( 1   8  )      R = σ² I4
        ( 1  10   genderi  10 × genderi  )          ( 1  10  )
        ( 1  12   genderi  12 × genderi  )          ( 1  12  )
        ( 1  14   genderi  14 × genderi  )          ( 1  14  )

◃ Deviance: DM(β, σ², G) = −2 ∑_i log N4(y i | X i β, Z G Zᵀ + R) (an R sketch follows below)
◃ Interest (focus): future bi's
◃ DIC = marginal DIC
◃ WinBUGS: pD = 7.072 and DIC = 442.572
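A minimal R sketch of the marginal deviance DM for one draw (β, σ², G), evaluated with the mvtnorm package; Y (the 27 × 4 response matrix) and gender (the 27-vector of gender codes) are hypothetical names, and the data are not reproduced here.

    library(mvtnorm)

    marginal.deviance <- function(Y, gender, beta, sigma2, G) {
      age <- c(8, 10, 12, 14)
      Z   <- cbind(1, age)                          # RI + RS design
      V   <- Z %*% G %*% t(Z) + sigma2 * diag(4)    # Z G Z' + R
      ll  <- 0
      for (i in seq_len(nrow(Y))) {
        Xi <- cbind(1, age, gender[i], gender[i] * age)
        ll <- ll + dmvnorm(Y[i, ], mean = drop(Xi %*% beta),
                           sigma = V, log = TRUE)
      }
      -2 * ll
    }

    ## Averaging marginal.deviance() over the MCMC draws gives Dbar for the
    ## marginalized model; evaluating it at the posterior means gives Dhat.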
• Frequentist analysis of M4: maximization of the marginal likelihood
  ◦ SAS procedure MIXED: p = 8 and AIC = 443.8 (close to the DIC of the marginalized model)
  ◦ See program chapter 10 Potthoff-Roy growthcurves.sas
• Many prefer to express the performance of the model with the marginalized likelihood, but this needs extra programming (integration) outside WinBUGS

Model selection based on other predictive loss functions

• Assume ỹi ∼ PPD (i = 1, . . . , n):

  MSPE = (1/n) ∑_{i=1}^{n} (yi − ỹi)²

• Computation of the MSPE based on a converged Markov chain (θᵏ)k (see the sketch below):

  MSPE = (1/K) ∑_{k=1}^{K} MSPEk  with  MSPEk = (1/n) ∑_{i=1}^{n} (yi − ỹiᵏ)²

• WinBUGS commands to obtain the predictive values:

  y[i] ~ dnorm(mu[i],tauy); ytilde[i] ~ dnorm(mu[i],tauy)

• The MSPE does not compensate for model complexity
• For a longitudinal example, see later and Chapter 4
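A minimal R sketch of the MSPE computation: ytilde is assumed to be a K × n matrix of monitored posterior predictive replicates (e.g. the ytilde[i] node above, exported from WinBUGS/JAGS), and y the observed n-vector; the toy data are illustrative.

    mspe <- function(y, ytilde) {
      mspe.k <- rowMeans(sweep(ytilde, 2, y)^2)   # MSPE_k for each draw k
      mean(mspe.k)                                # average over the chain
    }

    ## toy illustration with simulated replicates
    set.seed(5)
    y      <- rnorm(50, 10, 2)
    ytilde <- matrix(rnorm(2000 * 50, 10, 2), nrow = 2000)
    mspe(y, ytilde)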
3. Bayesian sensitivity analysis: Perturb the prior and the likelihood and check how much this affects the inference

◃ Varying the distributional assumptions
◃ Varying the systematic part
◃ Model expansion

Predictive approaches to outlier detection

• Posterior predictive ordinate (PPOi): PPD evaluated at yi

  PPOi = p(yi | y) = ∫ p(yi | θ) p(θ | y) dθ

• Low value of PPOi: the ith observation lies in a tail area of the density
• Estimate of PPOi: P̂POi = (1/K) ∑_{k=1}^{K} p(yi | θᵏ)
• Problem: makes use of y twice (estimation + validation)

• Conditional predictive ordinate (CPOi): PPD based on y(i) evaluated at yi

  CPOi = p(yi | y(i)) = ∫ p(yi | θ) p(θ | y(i)) dθ

• Computation of CPOi (making use of hierarchical independence):

  1/CPOi = 1/p(yi | y(i)) = ∫ [p(y(i) | θ) p(θ) / p(y)] dθ = ∫ [1/p(yi | θ)] p(θ | y) dθ = E_{θ|y}[1/p(yi | θ)]

• (Harmonic) estimate of CPOi (see the R sketch at the end of this subsection):

  ĈPOi = [(1/K) ∑_{k=1}^{K} 1/p(yi | θᵏ)]⁻¹

• Estimation of CPOi with WinBUGS: monitor 1/p(yi | θᵏ) + export to R
• Problem: the harmonic estimate is unreliable with seriously outlying observations

Example growth curve study: PPO, CPO to detect outliers

• Model M1, R program using R2WinBUGS: model M1 diagnostics.R
• Compute PPOi and the inverse of CPOi in WinBUGS, add:

  # PPO, iCPO
  ppo[(i-1)*4+j] <- pow(tau[(i-1)*4+j],0.5)*
        exp(-0.5*tau[(i-1)*4+j]*res2[(i-1)*4+j])
  icpo[(i-1)*4+j] <- 1/ppo[(i-1)*4+j]

  (the normal density of yij, up to the constant (2π)^{-1/2}, which is irrelevant for ranking)

• To compute CPOi, add in R (on the means of icpo over the iterations):

  cpo[(i-1)*4+j] <- 1/icpo[(i-1)*4+j]

PPO and CPO:

[Figure: PPO and CPO values (0–1.2) versus observation index (1–108)]

Inverse of CPO:

[Figure: ICPO values (0–2000) versus observation index (1–108)]
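A minimal R sketch of the harmonic-mean CPO estimate above; the toy Gaussian model and draws are illustrative stand-ins for monitored values of 1/p(yi | θᵏ) exported from WinBUGS/JAGS.

    set.seed(6)
    y     <- rnorm(30); y[30] <- 6          # toy data with one outlier
    theta <- cbind(mu    = rnorm(2000, 0, 0.2),
                   sigma = 1 + abs(rnorm(2000, 0, 0.1)))

    ## 1/p(y_i | theta^k) for each draw k (rows) and observation i (columns)
    icpo <- 1 / sapply(y, function(yi) dnorm(yi, theta[, "mu"], theta[, "sigma"]))

    cpo <- 1 / colMeans(icpo)               # harmonic-mean estimate of CPO_i
    ppo <- colMeans(1 / icpo)               # PPO_i = mean of p(y_i | theta^k)
    which.min(cpo)                          # the smallest CPO flags the outlier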
3. Sensitivity analysis

Sensitivity analysis: check how stable the conclusions are when deviating from the original statistical model

◃ Perturbation of the likelihood or the prior
◃ Likelihood:
  ◦ Vary the distribution of the measurement error/random effects
  ◦ Vary the link function
  ◦ Slightly perturb the responses or covariate values
  ◦ Effect of deleting single subjects or sets of subjects
◃ How to perform?
  ◦ Replay the Bayesian analysis with the changed settings
  ◦ Re-use the Markov chain by sampling from this chain

Example arteriosclerosis study: Detection of influential observations

Model: yij = ϕ1i {1 − exp[−exp(ϕ2i) tj]} + εij

◃ 1st parameter: ϕ1i = β1 (group 1), ϕ1i = β2 (group 2) (ultimate IPR)
◃ 2nd parameter: ϕ2i = β3 + bi (group 1), ϕ2i = β4 + bi (group 2) (rate at which the final IPR is attained)

Influence analysis: focus on δ = β3 − β4

◦ R program using R2WinBUGS: chapter 10 arterior study-influence.R
◦ Subject diagnostics: impact of (whole) subjects (here: 20)
◦ Observation diagnostics: impact of single observations (120 − 1 = 119)

• Original data set: no observations/subjects are influential
• Distorted data set:
  ◦ 2nd measurement of the 4th subject (obs 20) + 1: 0.76 → 1.76
  ◦ The diagnostic plots indicate the influential subject/observation
• Comparison of the standardized impact of each subject and each observation on δ by
  ◦ Importance sampling (shows the influential subject/observation)
  ◦ SIR algorithm with replacement (shows the influential subject/observation)
  ◦ SIR algorithm without replacement (does not show the influential subject/observation)

Plots:

[Figure: standardized differences (a) per subject (index 1–20) and (b) per observation (index 1–120)]

Posterior predictive check (PPC)

Global measures of goodness-of-fit: frequentist case

◦ H0: p(y | θ) is distributed according to model M0
◦ Sample y = {y1, . . . , yn}
◦ θ ∈ Θ0 estimated from the data
◦ Goodness-of-fit (GOF) test statistic T(y), with a large value = poor fit

◃ GOF test (special case: Θ0 ≡ θ0):
  ◦ Determine the sampling distribution of T(y) under H0
  ◦ Compute: pC(y, θ0) = P(T(ỹ) ≥ T(y) | θ0, H0)
  ◦ ỹ = {ỹ1, . . . , ỹn} a random sample taken from M0
  ◦ pC small ⇒ H0 rejected

Global measures of goodness-of-fit: Bayesian case

Predictive approach to GOF testing

PPC: contrast T(y) with T(ỹ) and evaluate its extremeness

More formal definition, depending on the choice of GOF statistic:

• Test statistic T(y):

  pT = P(T(ỹ) ≥ T(y) | y, H0) = ∫ pC(y, θ̃) p(θ̃ | y) dθ̃

• Discrepancy measure D(ỹ, θ̃) (GOF test depends on nuisance parameters):

  pD = P(D(ỹ, θ̃) ≥ D(y, θ̃) | y, H0) = ∫∫ I[D(ỹ, θ̃) ≥ D(y, θ̃)] p(ỹ | θ̃) p(θ̃ | y) dỹ dθ̃

Computational aspects of the PPC

Computation of the PPP-value for D(y, θ)/T(y) (an R sketch follows below):

1. Let θ¹, . . . , θᴷ be a converged Markov chain from p(θ | y)
2. Compute D(y, θᵏ) (k = 1, . . . , K) (for T(y) only once)
3. Sample replicated data ỹᵏ from p(y | θᵏ) (each of size n)
4. Compute D(ỹᵏ, θᵏ) (k = 1, . . . , K)
5. Estimate pD by p̄D = (1/K) ∑_{k=1}^{K} I[D(ỹᵏ, θᵏ) ≥ D(y, θᵏ)]

◃ When p̄D < 0.05/0.10: "bad" fit of the model to the data
◃ Graphical checks:
  ◦ T(y): histogram of T(ỹᵏ) (k = 1, . . . , K) with the observed T(y)
  ◦ D(y, θ): X–Y plot of D(ỹᵏ, θᵏ) versus D(y, θᵏ) + 45° line
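A minimal R sketch of steps 1–5 above for a Gaussian model, using the sum-of-squares discrepancy D(y, θ) = ∑(yi − µ)²; the draws of (µ, σ) stand in for a converged chain.

    set.seed(8)
    y <- rnorm(100, 5, 2)
    K <- 2000
    mu    <- rnorm(K, mean(y), sd(y) / sqrt(length(y)))
    sigma <- sqrt((length(y) - 1) * var(y) / rchisq(K, length(y) - 1))

    D.obs <- D.rep <- numeric(K)
    for (k in 1:K) {
      yrep     <- rnorm(length(y), mu[k], sigma[k])   # step 3: replicated data
      D.obs[k] <- sum((y    - mu[k])^2)               # step 2: D(y,  theta^k)
      D.rep[k] <- sum((yrep - mu[k])^2)               # step 4: D(y~, theta^k)
    }
    ppp <- mean(D.rep >= D.obs)                       # step 5: PPP-value
    ppp
    plot(D.obs, D.rep); abline(0, 1)                  # graphical check + 45° line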
Example growth curve study, model M1: three PPCs

• R program using R2WinBUGS: model M1 diagnostics.R
• Three PPCs were considered:

◃ PPC1: sum of squared distances of yij to µij:

  DSS(y, θ) = ∑_{i,j} (yij − µij)²

◃ PPC2 & PPC3: skewness and kurtosis of the standardized residuals:

  Dskew(y, θ) = (1/n) ∑_{i,j} ((yij − µij)/σ)³  and  Dkurt(y, θ) = (1/n) ∑_{i,j} ((yij − µij)/σ)⁴ − 3

with θ = (µ, σ)ᵀ and n the total number of measurements

PPCs:

[Figure: posterior predictive checks for model M1 — replicated versus observed SS (P_fit = 0.51), skewness (P_skewness = 0.56) and kurtosis (P_kurtosis = 0.25)]

Example growth curve study: model M1 versus M5

• MSPE: 3.74 (M1), 4.18 (M5)
• PPCs:
  ◃ PPC1: 0.51 (M1), 0.53 (M5)
  ◃ PPC2: 0.56 (M1), 0.57 (M5)
  ◃ PPC3: 0.25 (M1), 0.57 (M5)

Sensitivity of the prior distribution

• Gelman (1996): for a reasonably sized study with a not too strong prior, the sensitivity of the conclusions to the choice of prior is likely to be minimal
• But checking the posterior under varying priors remains a necessity

Model expansion

◦ Enlarging the model with extra parameters
◦ Embedding the current model into a larger class of models

Examples of model expansion:

◃ Embedding the distribution of the response into a general class of distributions
◃ Adding polynomial and/or interaction terms to the systematic part
◃ Introducing splines into the model
◃ Relaxing the link function
◃ . . .

But WinBUGS/OpenBUGS, etc. are still too slow for quick exploratory Bayesian checks . . .