Bayesian Biostatistics

Emmanuel Lesaffre
Interuniversity Institute for Biostatistics and statistical Bioinformatics
KU Leuven & University Hasselt
[email protected]

Departamento de Ciências Exatas — University of Piracicaba
8 to 12 December 2014

Contents

Part I: Basic concepts in Bayesian methods
0.1 Preface
0.2 Time schedule of the course
0.3 Aims of the course
1 Modes of statistical inference
  1.1 The frequentist approach: a critical reflection
    1.1.1 The classical statistical approach
    1.1.2 The P-value as a measure of evidence
    1.1.3 The confidence interval as a measure of evidence
  1.2 Statistical inference based on the likelihood function
    1.2.1 The likelihood function
    1.2.2 The likelihood principles
  1.3 The Bayesian approach: some basic ideas
    1.3.1 Introduction
    1.3.2 Bayes theorem – Discrete version for simple events
  1.4 Outlook
2 Bayes theorem: computing the posterior distribution
  2.1 Introduction
  2.2 Bayes theorem – The binary version
  2.3 Probability in a Bayesian context
  2.4 Bayes theorem – The categorical version
  2.5 Bayes theorem – The continuous version
  2.6 The binomial case
  2.7 The Gaussian case
  2.8 The Poisson case
  2.9 The prior and posterior of a derived parameter
  2.10 Bayesian versus likelihood approach
  2.11 Bayesian versus frequentist approach
  2.12 The different modes of the Bayesian approach
  2.13 An historical note on the Bayesian approach
3 Introduction to Bayesian inference
  3.1 Introduction
  3.2 Summarizing the posterior with probabilities
  3.3 Posterior summary measures
    3.3.1 Posterior mode, mean, median, variance and SD
    3.3.2 Credible/credibility interval
  3.4 Predictive distributions
    3.4.1 Introduction
    3.4.2 Posterior predictive distribution: general case
  3.5 Exchangeability
  3.6 A normal approximation to the posterior
  3.7 Numerical techniques to determine the posterior
    3.7.1 Numerical integration
    3.7.2 Sampling from the posterior distribution
    3.7.3 Choice of posterior summary measures
  3.8 Bayesian hypothesis testing
    3.8.1 The Bayes factor
  3.9 Medical literature on Bayesian methods in RCTs
4 More than one parameter
  4.1 Introduction
  4.2 Joint versus marginal posterior inference
  4.3 The normal distribution with μ and σ² unknown
    4.3.1 No prior knowledge on μ and σ² is available
    4.3.2 An historical study is available
    4.3.3 Expert knowledge is available
  4.4 Multivariate distributions
  4.5 Frequentist properties of Bayesian inference
  4.6 The Method of Composition
  4.7 Bayesian linear regression models
    4.7.1 The frequentist approach to linear regression
    4.7.2 A noninformative Bayesian linear regression model
    4.7.3 Posterior summary measures for the linear regression model
    4.7.4 Sampling from the posterior distribution
  4.8 Bayesian generalized linear models
    4.8.1 More complex regression models
5 Choosing the prior distribution
  5.1 Introduction
  5.2 The sequential use of Bayes theorem
  5.3 Conjugate prior distributions
    5.3.1 Conjugate priors for univariate data distributions
    5.3.2 Conjugate prior for normal distribution – mean and variance unknown
    5.3.3 Multivariate data distributions
    5.3.4 Conditional conjugate and semi-conjugate priors
    5.3.5 Hyperpriors
  5.4 Noninformative prior distributions
    5.4.1 Introduction
    5.4.2 Expressing ignorance
    5.4.3 General principles to choose noninformative priors
    5.4.4 Improper prior distributions
    5.4.5 Weak/vague priors
  5.5 Informative prior distributions
    5.5.1 Introduction
    5.5.2 Data-based prior distributions
    5.5.3 Elicitation of prior knowledge
    5.5.4 Archetypal prior distributions
  5.6 Prior distributions for regression models
    5.6.1 Normal linear regression
    5.6.2 Generalized linear models
  5.7 Modeling priors
  5.8 Other regression models
6 Markov chain Monte Carlo sampling
  6.1 Introduction
  6.2 The Gibbs sampler
    6.2.1 The bivariate Gibbs sampler
    6.2.2 The general Gibbs sampler
    6.2.3 Remarks*
    6.2.4 Review of Gibbs sampling approaches
  6.3 The Metropolis(-Hastings) algorithm
    6.3.1 The Metropolis algorithm
    6.3.2 The Metropolis-Hastings algorithm
    6.3.3 Remarks*
7 Assessing and improving convergence of the Markov chain
  7.1 Introduction
  7.2 Assessing convergence of a Markov chain
    7.2.1 Definition of convergence for a Markov chain
    7.2.2 Checking convergence of the Markov chain
    7.2.3 Graphical approaches to assess convergence
    7.2.4 Graphical approaches to assess convergence
    7.2.5 Formal approaches to assess convergence
    7.2.6 Computing the Monte Carlo standard error
    7.2.7 Practical experience with the formal diagnostic procedures
  7.3 Accelerating convergence
  7.4 Practical guidelines for assessing and accelerating convergence
  7.5 Data augmentation

Part II: Bayesian tools for statistical modeling
8 Software
  8.1 WinBUGS and related software
    8.1.1 A first analysis
    8.1.2 Information on samplers
    8.1.3 Assessing and accelerating convergence
    8.1.4 Vector and matrix manipulations
    8.1.5 Working in batch mode
    8.1.6 Troubleshooting
    8.1.7 Directed acyclic graphs
    8.1.8 Add-on modules: GeoBUGS and PKBUGS
    8.1.9 Related software
  8.2 Bayesian analysis using SAS
  8.3 Additional Bayesian software and comparisons
    8.3.1 Additional Bayesian software
    8.3.2 Comparison of Bayesian software
9 Hierarchical models
  9.1 The Poisson-gamma hierarchical model
    9.1.1 Introduction
    9.1.2 Model specification
    9.1.3 Posterior distributions
    9.1.4 Estimating the parameters
    9.1.5 Posterior predictive distributions
  9.2 Full versus Empirical Bayesian approach
  9.3 Gaussian hierarchical models
    9.3.1 Introduction
    9.3.2 The Gaussian hierarchical model
    9.3.3 Estimating the parameters
  9.4 Mixed models
    9.4.1 Introduction
    9.4.2 The linear mixed model
    9.4.3 The Bayesian generalized linear mixed model (BGLMM)
10 Model building and assessment
  10.1 Introduction
  10.2 Measures for model selection
    10.2.1 The Bayes factor
    10.2.2 Information theoretic measures for model selection
  10.3 Model checking procedures
    10.3.1 Introduction

Part III: More advanced Bayesian modeling
11 Advanced modeling with Bayesian methods
  11.1 Example 1: Longitudinal profiles as covariates
  11.2 Example 2: relating caries on deciduous teeth with caries on permanent teeth

Part IV: Exercises
Part V: Medical papers

Are you a Bayesian?

Part I: Basic concepts in Bayesian methods

0.1 Preface

A significant result on a small trial
• Small-sized study with an unexpectedly positive result about a new medication to treat patients with oral cancer
• First reaction (certainly of the drug company): "great!"
• Past: none of the medications had such a large effect, and the new medication is not much different from the standard treatment

But ...

• Second reaction (if one is honest): be cautious
• Then you are a Bayesian (statistician)

Mouthwash trial
• Trial that tests whether daily use of a new mouthwash before tooth brushing reduces plaque compared to using tap water only
• Result: the new mouthwash reduces plaque by 25%, with 95% CI = [10%, 40%]
• From previous trials on similar products: overall reduction in plaque lies between 5% and 15%
• Experts: plaque reduction from a mouthwash does not exceed 30%
• What to conclude?
• Classical frequentist analysis: 25% reduction + 95% CI
• This conclusion ignores what is known from the past on similar products
• Likely conclusion in practice: the truth must lie somewhere in-between 5% and 25%

The Bayesian approach mimics our natural way of life, where learning is done by combining past and present experience.

Incomplete information
• Some studies do not have all the data required to tackle the research question
• Example: determining the prevalence of a disease from a fallible diagnostic test
• Expert knowledge can fill in the gaps
• The Bayesian approach is then the only way to tackle the problem!

Most of the material is obtained from ...

0.2 Time schedule of the course

• Day 1
  Morning: Chapters 1 and 2
  Afternoon:
  ◦ Chapter 3, until Section 3.6
  ◦ Brief practical introduction to R & RStudio
• Day 2
  Morning: refreshing day 1 + Chapter 3, from Section 3.7
  Afternoon:
  ◦ Discussion of clinical papers
  ◦ Computer exercises in R
• Day 3
  Morning: Chapters 4 + 6 + computer exercises in R
  Afternoon: Chapter 8 via an interactive WinBUGS computer session
• Day 4
  Morning: selection of topics in Chapters 5 and 7 + WinBUGS exercises
  Afternoon: Chapter 9 + WinBUGS exercises
• Day 5
  Morning: Chapter 10 + (R2)WinBUGS computer exercises
  Afternoon: exercises and advanced Bayesian modeling + wrap-up

0.3 Aims of the course

• Understand the Bayesian paradigm
• Understand the use of Bayesian methodology in medical/biological papers
• Be able to build up a (not too complex) WinBUGS program
• Be able to build up a (not too complex) R2WinBUGS program

Chapter 1: Modes of statistical inference

1.1 The frequentist approach: a critical reflection

• Review of the 'classical' approach to statistical inference
Aims of this chapter:
• Reflect on the 'classical approach' to statistical inference
• Look at a precursor of Bayesian inference: the likelihood approach
• A first encounter with the Bayesian approach

1.1.1 The classical statistical approach

• The classical approach is a mix of two schools: Fisher's, and Neyman & Pearson's
• Here: based on the P-value, significance level, power and confidence interval
• Example: an RCT

Example I.1: Toenail RCT
• Randomized, double-blind, parallel-group, multi-center study (Debacker et al., 1996)
• Two treatments (A: Lamisil and B: Itraconazol) on 2 × 189 patients
• 12 weeks of treatment and 48 weeks of follow-up (FU)
• Significance level α = 0.05; sample size chosen to ensure β ≤ 0.20
• Primary endpoint = negative mycology (negative microscopy & negative culture)
• Here: unaffected nail length at week 48 on the big toenail
• 163 patients treated with A and 171 treated with B

• A: mean μ1 & B: mean μ2
• H0: Δ = μ1 − μ2 = 0
• Completion of study: Δ̂ = 1.38 with t_obs = 2.19, in the 0.05 rejection region
• Neyman–Pearson: reject that A and B are equally effective
• Fisher: two-sided P = 0.030 ⇒ strong evidence against H0
• Wrong statement: "the result is significant at a two-sided α of 0.030" — this gives the P-value an 'a priori' status

1.1.2 The P-value as a measure of evidence

Use and misuse of the P-value:
• The P-value is not the probability that H0 is (not) true
• The P-value depends on fictive data (Example I.2)
• The P-value depends on the sample space (Examples I.3 and I.4)
• The P-value is not an absolute measure
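The reported numbers for Example I.1 can be checked from the summary statistics alone. The course's exercises are in R, but the arithmetic is the same in any language; here is a minimal Python sketch. Two assumptions of the illustration: the standard error is backed out as Δ̂/t_obs (it is not on the slide), and the t-distribution with 163 + 171 − 2 = 332 degrees of freedom is replaced by a normal approximation.

```python
import math

# Sketch for Example I.1 (toenail RCT), from the reported summaries only.
delta_hat = 1.38           # estimated difference in unaffected nail length
t_obs = 2.19               # reported test statistic
se = delta_hat / t_obs     # implied standard error (~0.63) -- an assumption

# Two-sided P-value, normal approximation to t with 332 df:
# P(|Z| > t_obs) = erfc(t_obs / sqrt(2))
p_two_sided = math.erfc(t_obs / math.sqrt(2.0))

# 95% confidence interval: delta_hat +/- 1.96 * se
lo, hi = delta_hat - 1.96 * se, delta_hat + 1.96 * se

print(f"P = {p_two_sided:.3f}")          # ~0.029 (slide: t-based P = 0.030)
print(f"95% CI = [{lo:.2f}, {hi:.2f}]")  # [0.14, 2.62]
```

The same implied standard error also reproduces the 95% CI [0.14, 2.62] quoted later in Example I.6.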
• The P-value does not take all evidence into account (Example I.5)

The P-value is not the probability that H0 is (not) true
• The P-value is often interpreted in a wrong manner
• P-value = probability that the observed or a more extreme result occurs under H0
• Probability has a long-run frequency definition
• P-value ≠ p(H0 | y); p(H0 | y) is a Bayesian probability

The P-value depends on fictive data
• P-value = probability that the observed or a more extreme result occurs under H0
  ⇒ the P-value is based not only on the observed result but also on fictive (never observed) data
  ⇒ the P-value is a 'surprise index'
• Example I.2

Example I.2: Graphical representation of the P-value
• P-value of the RCT (Example I.1)

The P-value depends on the sample space
• P-value = probability that the observed or a more extreme result occurs if H0 is true
  ⇒ the calculation of the P-value depends on all possible samples (under H0)
• The possible samples are similar in some characteristics to the observed sample (e.g. same sample size)
• Examples I.3 and I.4

Example I.3: Accounting for interim analyses in an RCT
Two identical RCTs except for the number of analyses:
• RCT 1: 4 interim analyses + final analysis
  Correction for multiple testing (group sequential trial: Pocock's rule)
  Global α = 0.05, nominal significance level = 0.016
• RCT 2: 1 final analysis
  Global α = 0.05, nominal significance level = 0.05
• If both trials run until the end and P = 0.02 in both, then RCT 1: NO significance, RCT 2: significance

Example I.4: Kaldor et al.'s case-control study
• Case-control study (Kaldor et al., 1990) examining the impact of chemotherapy on leukaemia in Hodgkin's survivors
• 149 cases (leukaemia) and 411 controls
• Question: does chemotherapy induce an excess risk of developing solid tumors, leukaemia and/or lymphomas?

  Treatment   Controls   Cases
  No Chemo       160        11
  Chemo          251       138
  Total          411       149

• Pearson χ²(1) test: P = 7.8959 × 10⁻¹³
• Fisher's exact test: P = 1.487 × 10⁻¹⁴
• Odds ratio = 7.9971 with 95% confidence interval [4.19, 15.25]
• Reason for the difference between the two tests: the two sample spaces are different
  ◦ Pearson χ²(1) test: conditions on n
  ◦ Fisher's exact test: conditions on the marginal totals

The P-value is not an absolute measure
• A small P-value does not necessarily indicate a large difference between treatments, a strong association, etc.
• The interpretation of a small P-value differs between a small and a large study

The P-value does not take all evidence into account
• Studies are analyzed in isolation, with no reference to historical data
• Why not incorporate past information in the current study?

Example I.5: Merseyside registry results
• Subsequent registry study in the UK
• Preliminary results of the Merseyside registry: P = 0.67
• Conclusion: no excess effect of chemotherapy (?)
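The odds ratio, its 95% confidence interval, and the Pearson χ² P-value of Example I.4 can all be reproduced from the 2×2 table. A Python sketch (the course uses R; Fisher's exact test needs the hypergeometric tail and is omitted here), assuming the usual Woolf log-scale interval for the odds ratio:

```python
import math

# Example I.4: Kaldor et al.'s 2x2 table
#                controls  cases
#   no chemo        160      11
#   chemo           251     138
a, b = 160, 11    # no chemo: controls, cases
c, d = 251, 138   # chemo:    controls, cases

# Odds ratio and 95% CI on the log scale (Woolf method)
or_ = (d * a) / (b * c)
se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo = math.exp(math.log(or_) - 1.96 * se_log)
hi = math.exp(math.log(or_) + 1.96 * se_log)

# Pearson chi-square statistic (1 df) and its P-value; for 1 df,
# P = erfc(sqrt(X2 / 2))
n = a + b + c + d
row1, row2 = a + b, c + d
col1, col2 = a + c, b + d
observed = [a, b, c, d]
expected = [row1*col1/n, row1*col2/n, row2*col1/n, row2*col2/n]
x2 = sum((o - e)**2 / e for o, e in zip(observed, expected))
p = math.erfc(math.sqrt(x2 / 2))

print(f"OR = {or_:.4f}, 95% CI = [{lo:.2f}, {hi:.2f}]")  # 7.9971, [4.19, 15.25]
print(f"chi-square = {x2:.2f}, P = {p:.4e}")             # P ~ 7.9e-13
```

This matches the slide's OR = 7.9971 with CI [4.19, 15.25] and the Pearson P-value of order 10⁻¹³.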
  Treatment   Controls   Cases
  No Chemo    3          0
  Chemo       3          2
  Total       6          2

1.1.3 The confidence interval as a measure of evidence
• 95% confidence interval: expression of uncertainty about the parameter of interest
• Technical definition: in 95 out of 100 studies the true parameter is enclosed
• In each individual study the confidence interval either includes or does not include the true value

Example I.6: 95% confidence interval toenail RCT
• 95% confidence interval for Δ = [0.14, 2.62]
• Interpretation: most likely (with 0.95 probability) Δ lies between 0.14 and 2.62 = a Bayesian interpretation
• The practical interpretation has a Bayesian nature

1.2 Statistical inference based on the likelihood function
1.2.1 The likelihood function
• Likelihood was introduced by Fisher in 1922
• Likelihood function = plausibility of the observed data as a function of the parameters of the stochastic model
• Inference based purely on the likelihood function has not been developed into a full-blown statistical approach
• Inference based on the likelihood function is QUITE different from inference based on the P-value
• Considered here as a precursor to the Bayesian approach
• In the likelihood approach, one conditions on the observed data

Example I.7: A surgery experiment
• New but rather complicated surgical technique
• Surgeon operates on n = 12 patients with s = 9 successes
• Notation:
  ◦ Result of the i-th operation: success y_i = 1, failure y_i = 0
  ◦ Total experiment: n operations with s successes
  ◦ Sample {y_1, . . . , y_n} ≡ y
  ◦ Probability of success p(y_i = 1) = θ

Binomial distribution
• Expresses the probability of s successes out of n experiments:
  f_θ(s) = C(n, s) θ^s (1 − θ)^(n−s), with s = Σ_i y_i
• θ fixed & function of s: f_θ(s) is a discrete distribution with Σ_{s=0}^{n} f_θ(s) = 1
• s fixed & function of θ: ⇒ binomial likelihood function L(θ|s)

Example I.7 – Determining the MLE
◦ The maximum likelihood estimate (MLE) θ̂ maximizes L(θ|s)
◦ Maximizing L(θ|s) is equivalent to maximizing log L(θ|s) ≡ ℓ(θ|s)
• To determine the MLE, the first derivative of the log-likelihood is needed:
  ℓ(θ|s) = c + s ln θ + (n − s) ln(1 − θ)
  dℓ(θ|s)/dθ = s/θ − (n − s)/(1 − θ) = 0 ⇒ θ̂ = s/n
• For s = 9 and n = 12 ⇒ θ̂ = 0.75

1.2.2 The likelihood principles
Two likelihood principles (LP):
• LP 1: All evidence about an unknown quantity θ, obtained from an experiment, is contained in the likelihood function of θ for the given data
  ⇒ Standardized likelihood, interval of evidence
• LP 2: Two likelihood functions for θ contain the same information about θ if they are proportional to each other

Inference on the likelihood function: Example I.7 (continued)
• Maximal evidence for θ = 0.75
• Standardized likelihood: L_S(θ|s) ≡ L(θ|s)/L(θ̂|s)
• L_S(0.5|s) = 0.21 = test for hypothesis H0 without involving fictive data
• Likelihood ratio L(0.5|s)/L(0.75|s) = relative evidence for the 2 hypotheses θ = 0.5 & θ = 0.75 (0.21 ⇒ ??)
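The MLE derivation above can be verified numerically. A minimal sketch: maximize the log-likelihood over a grid and compare with the closed-form θ̂ = s/n:

```python
# Grid check of the MLE in Example I.7: s = 9 successes in n = 12 operations.
import math

n, s = 12, 9

def loglik(theta):
    # log-likelihood up to a constant: s*ln(theta) + (n-s)*ln(1-theta)
    return s * math.log(theta) + (n - s) * math.log(1 - theta)

theta_hat = s / n                                # closed form from dl/dtheta = 0
best = max((i / 1000 for i in range(1, 1000)), key=loglik)
print(theta_hat, best)  # 0.75 0.75
```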
• Interval of (≥ 1/2 maximal) evidence

Likelihood principle 2
LP 2: Two likelihood functions for θ contain the same information about θ if they are proportional to each other
• LP 2 = relative likelihood principle
⇒ When the likelihood is proportional under two experimental conditions, the information about θ must be the same!

Example I.8: Another surgery experiment
Surgeon 1 (Example I.7): operates on n = 12 patients, observes s = 9 successes (and 3 failures)
Surgeon 2: operates on patients until k = 3 failures are observed (n = s + k); suppose again s = 9
Surgeon 1: s = Σ_i y_i has a binomial distribution
  ⇒ binomial likelihood L1(θ|s) = C(n, s) θ^s (1 − θ)^(n−s)
Surgeon 2: s has a negative binomial (Pascal) distribution
  ⇒ negative binomial likelihood L2(θ|s) = C(s+k−1, s) θ^s (1 − θ)^k

Inference on the likelihood function: the two likelihoods are proportional ⇒ LP 2: the same evidence about θ

Frequentist inference:
• H0: θ = 0.5 & HA: θ > 0.5
• Surgeon 1: P-value = p[s ≥ 9 | θ = 0.5] = Σ_{s=9}^{12} C(12, s) 0.5^s (1 − 0.5)^(12−s) = 0.0730
• Surgeon 2: P-value = p[s ≥ 9 | θ = 0.5] = Σ_{s=9}^{∞} C(2+s, s) 0.5^s (1 − 0.5)^3 = 0.0337
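The two one-sided P-values in Example I.8 can be checked by exact tail sums. Our sums give 0.0730 for the binomial design and about 0.033 for the negative binomial design (the slides quote 0.0337; the small discrepancy may be rounding), which is the point of the example: the same data, different stopping rules, different P-values, proportional likelihoods:

```python
# P-values of Example I.8 under H0: theta = 0.5 with s = 9 observed successes:
# binomial sampling (surgeon 1, n fixed at 12) vs negative binomial sampling
# (surgeon 2, sampling until k = 3 failures).
from math import comb

n, k = 12, 3
p1 = sum(comb(n, s) * 0.5**n for s in range(9, n + 1))             # binomial tail
p2 = 1 - sum(comb(s + k - 1, s) * 0.5**(s + k) for s in range(9))  # neg. binomial tail
print(round(p1, 4), round(p2, 4))
```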
• Frequentist conclusion ≠ likelihood conclusion
• LP 2: the 2 experiments give us the same information about θ
• Conclusion: design aspects (the stopping rule) are important in the frequentist context; when the likelihoods are proportional, it does not matter in the likelihood approach how the data were obtained

1.3 The Bayesian approach: some basic ideas
• Bayesian methodology = topic of the course
• Statistical inference through a different type of "glasses"

1.3.1 Introduction
• Examples I.7 and I.8: combining information from a similar historical surgical technique could be used in the evaluation of the current technique = a Bayesian exercise
• Planning a phase III study: comparison of a new ⇔ old treatment for treating breast cancer
  ◦ Background information is incorporated when writing the protocol
  ◦ Background information is not incorporated in the statistical analysis
  ◦ Suppose a small-scale study with an unexpectedly positive result (P < 0.01) ⇒ reaction???
• Medical device trials:
  ◦ The effect of a medical device is better understood than that of a drug
  ◦ It is very difficult to motivate surgeons to use concurrent controls to compare the new device with the control device
  ◦ Can we capitalize on information from the past to evaluate the performance of the new device, using only a single-arm trial??
Example I.9: Examples of Bayesian reasoning in daily life
• Tourist example: prior view on Belgians + visit to Belgium (data) ⇒ posterior view on Belgians
• Marketing example: launch of a new energy drink on the market
• Medical example: patients treated for CVA with a thrombolytic agent suffer from SBAs; historical studies (20% – prior), pilot study (10% – data) ⇒ posterior

Central idea of the Bayesian approach:
Combine the likelihood (data) with Your prior knowledge (prior probability) to update the information on the parameter, resulting in a revised probability for the parameter (posterior probability)

1.3.2 Bayes theorem – Discrete version for simple events
• B (diseased) & A (positive diagnostic test); B^C (not diseased) & A^C (negative diagnostic test)
• p(A, B) = p(A) · p(B | A) = p(B) · p(A | B)
• Bayes theorem = theorem on inverse probability:
  p(B | A) = p(A | B) · p(B) / p(A)
• Bayes theorem – version II:
  p(B | A) = p(A | B) · p(B) / [p(A | B) · p(B) + p(A | B^C) · p(B^C)]

Example I.10: Sensitivity, specificity, prevalence and predictive values
• Folin-Wu blood test: screening test for diabetes (Boston City Hospital)
• B = "diseased", A = "positive diagnostic test"

  Test    Diabetic   Non-diabetic   Total
  +       56         49             105
  -       14         461            475
  Total   70         510            580

• Characteristics of a diagnostic test:
  ◦ Sensitivity (Se) = p(A | B): Se = 56/70 = 0.80
  ◦ Specificity (Sp) = p(A^C | B^C): Sp = 461/510 = 0.90
  ◦ Positive predictive value (pred+) = p(B | A)
  ◦ Negative predictive value (pred−) = p(B^C | A^C)
  ◦ Prevalence (prev) = p(B): prev = 70/580 = 0.12
• pred+ is calculated from Se, Sp and prev using Bayes theorem:
  p(D+ | T+) = p(T+ | D+) · p(D+) / [p(T+ | D+) · p(D+) + p(T+ | D−) · p(D−)]
• In terms of Se, Sp and prev:
  pred+ = Se · prev / [Se · prev + (1 − Sp) · (1 − prev)]
• For p(B) = 0.03 ⇒ pred+ = 0.20 & pred− = 0.99
• For p(B) = 0.30 ⇒ pred+ = ?? & pred− = ??
• Individual prediction: combine prior knowledge (the prevalence of diabetes in the population) with the result of the Folin-Wu blood test on a patient to arrive at a revised opinion for that patient
• Folin-Wu blood test: prior (prevalence) = 0.10 & positive test ⇒ posterior = 0.47
• Obvious, but important: it is not possible to find the probability of having the disease from test results without specifying the disease's prevalence

1.4 Outlook
Example I.12: Toenail RCT – A Bayesian analysis
Re-analysis of the toenail data using WinBUGS (the most popular Bayesian software)
• Figure 1.4 (a): the AUC on the positive x-axis represents our posterior belief that Δ is positive (= 0.98)
• Figure 1.4 (b): the AUC on the interval [1, ∞) = our posterior belief that μ1/μ2 > 1
• Figure 1.4 (c): incorporation of a skeptical prior that Δ is positive (a priori around −0.5) (= 0.95)
• Figure 1.4 (d): incorporation of information on the variance parameters (σ2²/σ1² varies around 2)
• Valid question: what can a Bayesian analysis do more than a classical frequentist analysis?
• Bayes theorem will be further developed in the next chapter ⇒ such that it becomes useful in statistical practice
• Reanalyze examples such as those seen in this chapter
• Six additional chapters are needed to develop useful Bayesian tools
• But it is worth the effort!

True stories ...
A Bayesian and a frequentist were sentenced to death. When the judge asked what their final wishes were, the Bayesian replied that he wished to teach the frequentist the ultimate lesson.
The judge granted his request and then repeated the question to the frequentist. He replied that he wished to get the lesson again and again and again . . .

A Bayesian is walking in the fields, expecting to meet horses. However, suddenly he runs into a donkey. He looks at the animal and continues on his path, concluding that he saw a mule.

Take home messages
• The P-value does not bring the message as it is perceived by many, i.e. it is not the probability that H0 is true or false
• The confidence interval is generally accepted as a better tool for inference, but we interpret it in a Bayesian way
• There are other ways of statistical inference
• Pure likelihood inference = Bayesian inference without a prior
• Bayesian inference builds on likelihood inference by adding what is known about the problem

Chapter 2 Bayes theorem: computing the posterior distribution
Aims:
◦ Derive the general expression of Bayes theorem
◦ Exemplify the computations

2.1 Introduction
In this chapter:
• Bayes theorem for binary outcomes, counts and continuous outcomes
• Derivation of the posterior distribution: for the binomial, normal and Poisson case
• A variety of examples

2.2 Bayes theorem – The binary version
◦ D+ ≡ θ = 1 and D− ≡ θ = 0 (diabetes)
◦ T+ ≡ y = 1 and T− ≡ y = 0 (Folin-Wu test)
p(θ = 1 | y = 1) = p(y = 1 | θ = 1) · p(θ = 1) / [p(y = 1 | θ = 1) · p(θ = 1) + p(y = 1 | θ = 0) · p(θ = 0)]
◦ p(θ = 1), p(θ = 0): prior probabilities
◦ p(y = 1 | θ = 1): likelihood
◦ p(θ = 1 | y = 1): posterior probability
⇒ Now the parameter also has a probability

2.3 Bayes theorem – Shorthand notation
Probability in a Bayesian context: Bayesian probability = expression of Our/Your uncertainty about the parameter value
• Coin tossing: the truth
is there, but unknown to us
• Diabetes: from population to individual patient
• Probability can have two meanings: a limiting proportion (objective) or a personal belief (subjective)

Bayes theorem, shorthand notation:
  p(θ | y) = p(y | θ) p(θ) / p(y)
where θ can stand for θ = 0 or θ = 1

Other examples of Bayesian probabilities (subjective probability varies with the individual, over time, etc.):
• Tour de France
• FIFA World Cup 2014
• Global warming
• ...

Subjective probability rules
Let A1, A2, . . . , AK be mutually exclusive events with total event S. A subjective probability p should be coherent:
• p(Ak) ≥ 0 (k = 1, . . . , K)
• p(S) = 1
• p(A^C) = 1 − p(A)
• With B1, B2, . . . , BL another set of mutually exclusive events:
  p(Ai | Bj) = p(Ai, Bj) / p(Bj)

2.4 Bayes theorem – The categorical version
• A subject can belong to K > 2 classes: θ1, θ2, . . . , θK
• y takes L different values y1, . . . , yL, or is continuous
⇒ Bayes theorem for a categorical parameter:
  p(θk | y) = p(y | θk) p(θk) / Σ_{k=1}^{K} p(y | θk) p(θk)

2.5 Bayes theorem – The continuous version
• 1-dimensional continuous parameter θ
• i.i.d. sample y = y1, . . . , yn
• Joint distribution of the sample = p(y|θ) = Π_{i=1}^{n} p(yi|θ) = likelihood L(θ|y)
• Prior density function p(θ)
• Split up: p(y, θ) = p(y|θ)p(θ) = p(θ|y)p(y)
⇒ Bayes theorem for a continuous parameter:
  p(θ|y) = L(θ|y)p(θ)/p(y) = L(θ|y)p(θ) / ∫ L(θ|y)p(θ)dθ
• Shorter: p(θ|y) ∝ L(θ|y)p(θ)
• ∫ L(θ|y)p(θ)dθ = averaged likelihood ⇒ the posterior distribution involves integration

2.6 The binomial case
Example II.1: Stroke study – Monitoring safety
• rt-PA: thrombolytic for ischemic stroke
• Historical studies ECASS 1 and ECASS 2: complication SICH
• θ is now a random variable and is described by a probability distribution.
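The continuous version of Bayes theorem can be sketched numerically: evaluate likelihood × prior on a grid, divide by the averaged likelihood, and check against the conjugate answer derived later for the stroke data (Beta(9, 93) prior, y = 10 SICHs in n = 50 ⇒ Beta(19, 133) posterior):

```python
# Grid sketch of p(theta|y) = L(theta|y)p(theta) / ∫ L(theta|y)p(theta)dtheta
# for the stroke data; the grid answer should match the conjugate
# Beta(9 + 10, 93 + 40) = Beta(19, 133) posterior (mean 19/152).
y, n = 10, 50
a0, b0 = 9, 93
m = 100_000
thetas = [(i + 0.5) / m for i in range(m)]

# prior kernel x binomial likelihood kernel (constants cancel after normalization)
kernel = [t**(a0 - 1) * (1 - t)**(b0 - 1) * t**y * (1 - t)**(n - y) for t in thetas]
denom = sum(kernel) / m                 # midpoint-rule averaged likelihood (up to constants)
post = [k / denom for k in kernel]      # normalized posterior density on the grid

post_mean = sum(t * p for t, p in zip(thetas, post)) / m
print(round(post_mean, 4))              # conjugate answer: 19/152 ≈ 0.125
```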
All because we express our uncertainty about the parameter
• In the Bayesian approach (as in the likelihood approach), one only looks at the observed data; no fictive data are involved
• One says: in the Bayesian approach, one conditions on the observed data. In other words, the data are fixed and p(y) is a constant!

Example II.1 (continued):
• ECASS 3 study: patients with ischemic stroke (Tx between 3 & 4.5 hours)
• DSMB: monitor SICH in ECASS 3
• Fictive situation:
  ◦ First interim analysis ECASS 3: 50 rt-PA patients with 10 SICHs
  ◦ Historical data ECASS 2: 100 rt-PA patients with 8 SICHs
• Estimate the risk of SICH in ECASS 3 ⇒ construct a Bayesian stopping rule

Comparison of 3 approaches
• Frequentist
• Likelihood
• Bayesian – different prior distributions:
  ◦ Prior information is available: from the ECASS 2 study
  ◦ Experts express their opinion: subjective prior
  ◦ No prior information is available: non-informative prior
• Exemplify the mechanics of calculating the posterior distribution using Bayes theorem

Notation
• SICH incidence: θ
• i.i.d. Bernoulli random variables y1, . . . , yn; SICH: yi = 1, otherwise yi = 0
• y = Σ_{i=1}^{n} yi has a Bin(n, θ) distribution: p(y|θ) = C(n, y) θ^y (1 − θ)^(n−y)

Frequentist approach
• MLE θ̂ = y/n = 10/50 = 0.20
• Test the hypothesis θ = 0.08 with a binomial test (8% = value of the ECASS 2 study)
• Classical 95% confidence interval = [0.089, 0.31]

Likelihood inference
• MLE θ̂ = 0.20
• No hypothesis test is performed
• 0.95 interval of evidence = [0.09, 0.36]

Bayesian approach: prior obtained from the ECASS 2 study
• ECASS 2 likelihood: L(θ|y0) = C(n0, y0) θ^(y0) (1 − θ)^(n0−y0) (y0 = 8 & n0 = 100)
Four steps:
1. Specifying the (ECASS 2) prior distribution
2. Constructing the posterior distribution
3.
Characteristics of the posterior distribution
4. Equivalence of prior information and extra data

1. Specifying the (ECASS 2) prior distribution
• The ECASS 2 likelihood expresses the prior belief about θ but is not (yet) a prior distribution
• As a function of θ, L(θ|y0) is not a density (AUC ≠ 1)
• How to standardize? Numerically or analytically?
• Kernel of the binomial likelihood, θ^(y0) (1 − θ)^(n0−y0), ∝ a beta density Beta(α0, β0):
  p(θ) = [1/B(α0, β0)] θ^(α0−1) (1 − θ)^(β0−1)
  with B(α, β) = Γ(α)Γ(β)/Γ(α+β), Γ(·) the gamma function
• α0 (= 9) ≡ y0 + 1 & β0 (= 93 = 100 − 8 + 1) ≡ n0 − y0 + 1
[Figure: some beta densities for various (α, β)]

2. Constructing the posterior distribution
• Bayes theorem needs:
  ◦ Prior p(θ) (ECASS 2 study)
  ◦ Likelihood L(θ|y) (ECASS 3 interim analysis), y = 10 & n = 50
  ◦ Averaged likelihood ∫ L(θ|y)p(θ)dθ
• Numerator of Bayes theorem:
  L(θ|y) p(θ) = C(n, y) [1/B(α0, β0)] θ^(α0+y−1) (1 − θ)^(β0+n−y−1)
• Denominator of Bayes theorem = averaged likelihood:
  p(y) = ∫ L(θ|y) p(θ) dθ = C(n, y) B(α0+y, β0+n−y)/B(α0, β0)
• Posterior distribution:
  p(θ|y) = [1/B(α, β)] θ^(α−1) (1 − θ)^(β−1), with α = α0 + y and β = β0 + n − y
⇒ Posterior distribution = Beta(α, β)
[Figure: prior, likelihood & posterior]

3.
Characteristics of the posterior distribution
• Posterior = compromise between prior & likelihood
• Posterior mode: θ̄M = [n0/(n0+n)] θ̂0 + [n/(n0+n)] θ̂
• Shrinkage: θ̂0 ≤ θ̄M ≤ θ̂ when y0/n0 ≤ y/n
• Posterior here more peaked than prior & likelihood, but not so in general
• Posterior = beta distribution = prior (conjugacy)
• The likelihood dominates the prior for large sample sizes
• NOTE: the posterior mode θ̄M = MLE of the combined ECASS 2 data & ECASS 3 interim data (!!!), i.e.
  θ̄M = (y0 + y)/(n0 + n)
[Figure: prior, likelihood and posterior, illustrating mode and shrinkage]

4. Equivalence of prior information and extra data
• A Beta(α0, β0) prior ≡ a binomial experiment with (α0 − 1) successes in (α0 + β0 − 2) experiments
⇒ Prior ≈ extra data added to the observed data set: (α0 − 1) successes and (β0 − 1) failures
• Example of dominance of the likelihood: likelihood based on 5,000 subjects with 800 "successes"
[Figures: prior vs. posterior, illustrating dominance of the likelihood]

Bayesian approach: using a subjective prior
• Suppose the DSMB neurologists 'believe' that the SICH incidence is probably more than 5% but most likely not more than 20%
• If the prior belief = the ECASS 2 prior density ⇒ the posterior inference is the same
• The neurologists could also combine their qualitative prior belief with the ECASS 2 data to construct a prior distribution ⇒ adjust the ECASS 2 prior
[Figure: ECASS 2 vs. subjective prior and the corresponding posteriors]

Bayesian approach: no prior information is available
• Suppose no prior information is available
• Need: a prior distribution that expresses ignorance = a noninformative (NI) prior

2.7 The Gaussian case
Example II.2: Dietary study – Monitoring dietary behavior in Belgium
• IBBENS study: dietary survey in Belgium
• Of interest: intake of cholesterol
• Monitoring dietary
behavior in Belgium: the IBBENS-2 study
• Assume σ is known

(Stroke study, NI prior, continued:)
• For the stroke study: NI prior p(θ) = I_[0,1] = flat prior on [0,1]
• Uniform prior on [0,1] = Beta(1,1)
[Figure: flat prior, likelihood and posterior]

Bayesian approach: prior obtained from the IBBENS study
1. Specifying the (IBBENS) prior distribution
2. Constructing the posterior distribution
3. Characteristics of the posterior distribution
4. Equivalence of prior information and extra data

1. Specifying the (IBBENS) prior distribution
• Histogram of the dietary cholesterol intake of 563 bank employees ≈ normal
• y ∼ N(μ, σ²):
  f(y) = [1/(√(2π) σ)] exp[−(y − μ)²/2σ²]
• Sample y1, . . . , yn ⇒ likelihood:
  L(μ|y) ∝ exp[−(1/2σ²) Σ_{i=1}^{n} (yi − μ)²] ∝ exp[−½ ((μ − ȳ)/(σ/√n))²]
[Figure: histogram and likelihood]

IBBENS study:
• Likelihood ∝ N(μ0, σ0²), with μ0 ≡ ȳ0 = 328 and σ0 = σ/√n0 = 120.3/√563 = 5.072
• Denote the IBBENS sample of size n0 as y0 ≡ {y0,1, . . . , y0,n0} with mean ȳ0
• IBBENS prior distribution (σ is known):
  p(μ) = [1/(√(2π) σ0)] exp[−½ ((μ − μ0)/σ0)²], with μ0 ≡ ȳ0

2. Constructing the posterior distribution
• IBBENS-2 study: sample y with n = 50, ȳ = 318 mg/day & s = 119.5 mg/day
  95% confidence interval = [284.3, 351.9] mg/day ⇒ wide
• Combine the IBBENS prior distribution with the IBBENS-2 normal likelihood:
  ◦ IBBENS-2 likelihood: L(μ | y)
  ◦ IBBENS prior density: p(μ) = N(μ | μ0, σ0²)
• Posterior distribution ∝ p(μ)L(μ|y):
  p(μ|y) ∝ exp[−½ ((μ − μ0)/σ0)² − ½ ((μ − ȳ)/(σ/√n))²]
• Integration constant to obtain a density? Recognize a standard distribution: the exponent is a quadratic function of μ
• Posterior distribution: p(μ|y) = N(μ̄, σ̄²), with
  μ̄ = (μ0/σ0² + nȳ/σ²) / (1/σ0² + n/σ²) and σ̄² = 1/(1/σ0² + n/σ²)
• Here: μ̄ = 327.2 and σ̄ = 4.79.
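The precision-weighted update above can be sketched in a few lines. As an assumption, σ is taken here as the IBBENS-2 sd of 119.5 mg/day; the slides' exact inputs and rounding may differ slightly (we get a posterior sd near 4.9 rather than 4.79):

```python
# Precision-weighted normal update for the dietary example (sigma treated as known).
mu0, sigma0 = 328.0, 5.072     # IBBENS prior N(mu0, sigma0^2)
ybar, n, sigma = 318.0, 50, 119.5

w0 = 1 / sigma0**2             # prior precision
w1 = n / sigma**2              # sample precision
mu_post = (w0 * mu0 + w1 * ybar) / (w0 + w1)
sd_post = (1 / (w0 + w1)) ** 0.5
print(round(mu_post, 1), round(sd_post, 2))  # ≈ 327.2 and ≈ 4.9
```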
[Figure: IBBENS-2 prior, likelihood and posterior]

3. Characteristics of the posterior distribution
• Posterior distribution: compromise between prior and likelihood
• Posterior mean: weighted average of the prior mean and the sample mean:
  μ̄ = [w0/(w0+w1)] μ0 + [w1/(w0+w1)] ȳ
  with w0 = 1/σ0² = prior precision and w1 = 1/(σ²/n) = sample precision
• The posterior precision = 1/posterior variance:
  1/σ̄² = w0 + w1
• The posterior is always more peaked than the prior and the likelihood
• When n → ∞ or σ0 → ∞: p(μ|y) = N(ȳ, σ²/n) ⇒ as the sample size increases, the likelihood dominates the prior
• Posterior = normal = prior ⇒ conjugacy

4. Equivalence of prior information and extra data
• Prior variance σ0² = σ² (unit-information prior) ⇒ σ̄² = σ²/(n + 1)
  ⇒ prior information = adding one extra observation to the sample
• In general: σ0² = σ²/n0, with n0 arbitrary ⇒
  μ̄ = [n0/(n0+n)] μ0 + [n/(n0+n)] ȳ and σ̄² = σ²/(n0 + n)

Discounted priors:
• Discounted IBBENS prior: increase the IBBENS prior variance from 25 to 100
• Discounted IBBENS prior + shift: increase μ0 from 328 to 340
[Figures: discounted prior; discounted prior + shift]

Bayesian approach: no prior information is available
• Non-informative prior: σ0² → ∞ ⇒ posterior: N(ȳ, σ²/n)
[Figure: non-informative prior, likelihood and posterior]

2.8 The Poisson case
Example II.6: Describing caries experience in Flanders
The Signal-Tandmobiel (STM) study:
• Longitudinal oral health study in Flanders
• Poisson(θ): p(y|θ) = θ^y e^(−θ) / y!
• 4468 children (7% of the children born in 1989)
• Annual examinations from 1996 to 2001
• Caries experience measured by the dmft-index (min = 0, max = 20)
[Figure: distribution of the dmft-index (proportion vs. dmft-index)]

• Take y ≡ {y1, . . . , yn} independent counts ⇒ Poisson distribution
• Mean and variance = θ
• Poisson likelihood:
  L(θ|y) ≡ Π_{i=1}^{n} p(yi|θ) = [Π_{i=1}^{n} (θ^(yi)/yi!)] e^(−nθ)

Frequentist and likelihood calculations
• MLE of θ: θ̂ = ȳ = 2.24
• Likelihood-based 95% confidence interval for θ: [2.1984, 2.2875]

Bayesian approach: prior distribution based on historical data
1. Specifying the prior distribution
2. Constructing the posterior distribution
3. Characteristics of the posterior distribution
4. Equivalence of prior information and extra data

1. Specifying the prior distribution
• Information from the literature:
  ◦ Average dmft-index 4.1 (Liège, 1983) & 1.39 (Gent, 1994)
  ◦ Oral hygiene has improved considerably in Flanders
  ◦ Average dmft-index bounded above by 10
• Candidate for the prior: Gamma(α0, β0)
  p(θ) = [β0^(α0)/Γ(α0)] θ^(α0−1) e^(−β0 θ)
  ◦ α0 = shape parameter & β0 = inverse scale parameter
  ◦ E(θ) = α0/β0 & var(θ) = α0/β0²
• STM study: α0 = 3 & β0 = 1
[Figure: some gamma densities]

2.
Constructing the posterior distribution
• Posterior:
  p(θ|y) ∝ e^(−nθ) [Π_{i=1}^{n} (θ^(yi)/yi!)] [β0^(α0)/Γ(α0)] θ^(α0−1) e^(−β0 θ)
        ∝ θ^((Σ yi + α0) − 1) e^(−(n+β0)θ)
• Recognize the kernel of a Gamma(Σ yi + α0, n + β0) distribution
  ⇒ p(θ|y) = [β̄^(ᾱ)/Γ(ᾱ)] θ^(ᾱ−1) e^(−β̄ θ)
  with ᾱ = Σ yi + α0 = 9758 + 3 = 9761 and β̄ = n + β0 = 4351 + 1 = 4352
⇒ STM study: the effect of the prior is minimal

3. Characteristics of the posterior distribution
• The posterior is a compromise between prior and likelihood
• The posterior mode demonstrates shrinkage
• For the STM study the posterior is more peaked than prior and likelihood, but not so in general
• The prior is dominated by the likelihood for a large sample size
• Posterior = gamma = prior ⇒ conjugacy

4. Equivalence of prior information and extra data
• Prior = equivalent to an experiment of size β0 with counts summing to α0 − 1
• STM study: the prior corresponds to an experiment of size 1 with count equal to 2

Bayesian approach: no prior information is available
• Gamma with α0 ≈ 1 and β0 ≈ 0 = non-informative prior

2.9 The prior and posterior of a derived parameter
• If p(θ) is the prior of θ, what is the corresponding prior of h(θ) = ψ? The same question holds for the posterior density
  Example: θ = odds ratio, ψ = log(odds ratio)
• Why do we wish to know this?
  ◦ Prior: the prior information on θ and on ψ should be the same
  ◦ Posterior: allows reformulating the conclusion on a different scale
• Solution: apply the transformation rule p(h⁻¹(ψ)) |dh⁻¹(ψ)/dψ|
• Note: the parameter is a random variable!

Posterior and transformed posterior: with α = 19 and β = 133.
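The gamma-Poisson update for the STM study above can be sketched in a few lines, confirming that the posterior mean sits essentially on the MLE ("effect of the prior is minimal"):

```python
# Gamma-Poisson update for the STM study: Gamma(3, 1) prior, n = 4351 children,
# dmft counts summing to 9758 -> Gamma(9761, 4352) posterior.
alpha0, beta0 = 3, 1
n, sum_y = 4351, 9758
alpha_bar, beta_bar = alpha0 + sum_y, beta0 + n
post_mean = alpha_bar / beta_bar
mle = sum_y / n
print(alpha_bar, beta_bar, round(post_mean, 3), round(mle, 3))
```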
Example II.4: Stroke study – Posterior distribution of log(θ)
• The probability of 'success' is often modeled on the log scale (or logit scale)
• Posterior distribution of ψ = log(θ):
  p(ψ|y) = [1/B(α, β)] (exp ψ)^α (1 − exp ψ)^(β−1)
[Figure: posterior density of θ and of ψ]

2.10 Bayesian versus likelihood approach
• The Bayesian approach satisfies the 1st likelihood principle, in that inference does not depend on never-observed results
• The Bayesian approach satisfies the 2nd likelihood principle:
  p2(θ | y) = L2(θ | y)p(θ)/∫L2(θ | y)p(θ)dθ = cL1(θ | y)p(θ)/∫cL1(θ | y)p(θ)dθ = p1(θ | y)
• In the Bayesian approach the parameter is stochastic ⇒ a transformation h(θ) has a different effect in the Bayesian and likelihood approaches

2.11 Bayesian versus frequentist approach
• Frequentist approach:
  ◦ θ fixed and data are stochastic
  ◦ Many tests are based on asymptotic arguments
  ◦ Maximization is the key tool
  ◦ Does depend on stopping rules
• Bayesian approach:
  ◦ Condition on the observed data (data fixed), uncertainty about θ (θ stochastic)
  ◦ No asymptotic arguments are needed; all inference depends on the posterior
  ◦ Integration is the key tool
  ◦ Does not depend on stopping rules
• The frequentist and Bayesian approach can give the same numerical output (with different interpretation), but may give quite different inference

2.12 The different modes of the Bayesian approach
• Subjectivity ⇔ objectivity
• Subjective (proper) Bayesian ⇔ objective (reference) Bayesian
• Frequentist ideas in Bayesian approaches (MCMC)
• Empirical Bayesian
• Decision-theoretic (full) Bayesian: use of a utility function
• "46656 varieties of Bayesians" (Good)
• Pragmatic Bayesian = Bayesian??
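The transformation rule in Example II.4 can be verified numerically: if θ ∼ Beta(19, 133), the density of ψ = log(θ) given above should still integrate to 1, and E[e^ψ] should recover the Beta mean α/(α+β):

```python
# Numerical check of the transformed posterior p(psi|y) for psi = log(theta),
# theta ~ Beta(19, 133): p(psi) = (e^psi)^alpha (1 - e^psi)^(beta-1) / B(alpha, beta).
import math

alpha, beta = 19, 133
logB = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)

def dens_psi(psi):
    # transformation rule: p_theta(e^psi) * |d theta/d psi| = p_theta(e^psi) * e^psi
    return math.exp(alpha * psi + (beta - 1) * math.log1p(-math.exp(psi)) - logB)

lo, hi, m = -12.0, -1e-8, 60000       # psi = log(theta) lives on (-inf, 0)
h = (hi - lo) / m
total = 0.0                            # should integrate to 1
mean_theta = 0.0                       # E[e^psi] should equal alpha/(alpha+beta)
for i in range(m + 1):
    psi = lo + i * h
    w = 0.5 if i in (0, m) else 1.0    # trapezoidal weights
    d = dens_psi(psi)
    total += w * d * h
    mean_theta += w * math.exp(psi) * d * h
print(round(total, 4), round(mean_theta, 4))  # ≈ 1.0 and ≈ 0.125
```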
2.13 An historical note on the Bayesian approach
• Thomas Bayes was probably born in 1701 and died in 1761
• He was a Presbyterian minister, studied logic and theology at Edinburgh University, and had strong mathematical interests
• Bayes theorem was submitted posthumously by his friend Richard Price in 1763 and was entitled "An Essay towards solving a Problem in the Doctrine of Chances"
• Up to 1950 Bayes theorem was called the theorem of inverse probability
• The fundament of Bayesian theory was developed by Pierre-Simon Laplace (1749-1827); Laplace first assumed an indifference prior, later he relaxed this assumption
• Much opposition: e.g. Poisson, Fisher, Neyman and Pearson, etc.
• Fisher was a strong opponent of Bayesian theory; because of his influence ⇒ a dramatically negative effect
• Fisher was opposed to the use of a flat prior and to the fact that conclusions change when putting a flat prior on h(θ) rather than on θ
• Some connection between Fisher and the (inductive) Bayesian approach, but much difference with the N&P approach

Proponents of the Bayesian approach:
• de Finetti: exchangeability
• Jeffreys: noninformative prior, Bayes factor
• Savage: theory of subjective and personal probability and statistics
• Lindley: Gaussian hierarchical models
• Geman & Geman: Gibbs sampling
• Gelfand & Smith: introduction of Gibbs sampling into statistics
• Spiegelhalter: (Win)BUGS

Take home messages
• Bayes theorem = model for learning

Recommendation
The theory that would not die.
How Bayes' rule cracked the Enigma code, hunted down Russian submarines & emerged triumphant from two centuries of controversy — McGrayne (2011)

Take home messages (continued)
• Probability in a Bayesian context — data: classical; parameters: expressing what we believe/know
• Bayesian approach = likelihood approach + prior
• Inference:
  ◦ Bayesian: based on the parameter space (posterior distribution); conditions on the observed data, parameter stochastic
  ◦ Classical: based on the sample space (set of possible outcomes); looks at all possible outcomes, parameter fixed
• The prior can come from: historical data or subjective belief
• The prior is equivalent to extra data
• A noninformative prior can mimic classical results
• Posterior = compromise between prior and likelihood
• For large samples, the likelihood dominates the prior

Chapter 3 Introduction to Bayesian inference
Aims:
◦ Introduction to basic concepts in Bayesian inference
◦ Introduction to simple sampling algorithms
◦ Illustrating that sampling can be a useful alternative to analytical/other numerical techniques to determine the posterior
◦ Illustrating that Bayesian testing can be quite different from frequentist testing

• The Bayesian approach was obstructed by many throughout history . . . but survived because of a computational trick . . .
(MCMC)

3.1 Introduction
More specifically we look at:
• Exploration of the posterior distribution:
  ◦ Summary statistics for location and variability
  ◦ Interval estimation
  ◦ Predictive distribution
  ◦ Normal approximation of the posterior
• Simple sampling procedures
• Bayesian hypothesis tests

3.2 Summarizing the posterior with probabilities
• Direct exploration of the posterior: P(a < θ < b | y) for different a and b

Example III.1: Stroke study – SICH incidence
• θ = probability of SICH due to rt-PA at the first ECASS 3 interim analysis
• p(θ|y) = Beta(19, 133) distribution
• P(a < θ < b | y):
  ◦ a = 0.2, b = 1.0: P(0.2 < θ < 1 | y) = 0.0062
  ◦ a = 0.0, b = 0.08: P(0 < θ < 0.08 | y) = 0.033

3.3 Posterior summary measures
3.3.1 Posterior mode, mean, median, variance and SD
• We now summarize the posterior distribution with some simple measures, similar to what is done when summarizing collected data
• The measures are computed on a (population) distribution
• Posterior mode: θM where the posterior distribution attains its maximum
• Posterior mean: mean of the posterior distribution, i.e. θ̄ = ∫ θ p(θ|y)dθ
• Posterior median: median of the posterior distribution, i.e. 0.5 = ∫_{−∞}^{θMed} p(θ|y)dθ
• Posterior variance: variance of the posterior distribution, i.e. σ̄² = ∫ (θ − θ̄)² p(θ|y)dθ
• Posterior standard deviation: square root of the posterior variance, i.e.
σ̄
• Note:
  ◦ Only the posterior median of h(θ) can be obtained from the posterior median of θ
  ◦ Only the posterior mode does not require integration
  ◦ Posterior mode = MLE with a flat prior

Example III.2: Stroke study – Posterior summary measures
• Posterior at the 1st interim ECASS 3 analysis: Beta(α, β) with α = 19 & β = 133
• Posterior mode: maximize (α − 1) ln θ + (β − 1) ln(1 − θ) w.r.t. θ
  ⇒ θM = (α − 1)/(α + β − 2) = 18/150 = 0.12
• Posterior mean: integrate θ · [1/B(α, β)] θ^(α−1)(1 − θ)^(β−1) over θ
  ⇒ θ̄ = B(α + 1, β)/B(α, β) = α/(α + β) = 19/152 = 0.125
• Posterior median: solve 0.5 = ∫_{0}^{θMed} [1/B(α, β)] θ^(α−1)(1 − θ)^(β−1) dθ for θMed
  ⇒ θMed = 0.122 (R function qbeta)
• Posterior variance: calculate also ∫ (θ − θ̄)² [1/B(α, β)] θ^(α−1)(1 − θ)^(β−1) dθ
  ⇒ σ̄² = αβ/[(α + β)²(α + β + 1)] = 0.0267²
[Figure: graphical representation of the posterior summary measures]
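The closed-form summaries of Example III.2 can be checked in a few lines, together with the 95% equal-tail interval used in the next section. As a sketch, the quantiles are found by bisecting a numerically integrated CDF, a stdlib stand-in for R's qbeta:

```python
# Summaries of the Beta(19, 133) posterior, plus its 95% equal-tail interval.
import math

alpha, beta = 19, 133
mode = (alpha - 1) / (alpha + beta - 2)                        # 18/150 = 0.12
mean = alpha / (alpha + beta)                                  # 19/152 = 0.125
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
sd = math.sqrt(var)                                            # ≈ 0.0267

logB = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)

def pdf(t):
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return math.exp((alpha - 1) * math.log(t) + (beta - 1) * math.log1p(-t) - logB)

def cdf(x, m=2000):
    # trapezoidal integration of the density on [0, x]
    h = x / m
    return h * (0.5 * (pdf(0.0) + pdf(x)) + sum(pdf(i * h) for i in range(1, m)))

def quantile(p):
    lo, hi = 0.0, 1.0
    for _ in range(50):                  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

ci = (quantile(0.025), quantile(0.975))
print(round(mode, 3), round(mean, 3), round(sd, 4))
print(round(ci[0], 3), round(ci[1], 3))  # roughly [0.077, 0.18] as on the slides
```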
for a 95% CI is [μ − 1.96 σ, μ + 1.96 σ] • 95% equal tail CI (R function qbeta) = [0.077, 0.18] (see figure) • Equal 95% tail CI = 95% HPD interval • 95% HPD interval = [0.075, 0.18] (see figure) • Results IBBENS-2 study: • Computations HPD interval: use R-function optimize IBBENS prior distribution ⇒ 95% CI = [317.8, 336.6] mg/dl N(328; 10,000) prior ⇒ 95% CI = [285.6, 351.0] mg/dl Classical (frequentist) 95% confidence interval = [284.9, 351.1] mg/dl Bayesian Biostatistics - Piracicaba 2014 136 Bayesian Biostatistics - Piracicaba 2014 137 HPD interval: "$W $%)ZBA?A'&' θ θ !$%!G;' $%G; ")?A$W Equal tail credible interval: ψ HPD interval is not-invariant to (monotone) transformations Bayesian Biostatistics - Piracicaba 2014 3.4 138 Predictive distributions Bayesian Biostatistics - Piracicaba 2014 3.4.1 139 Introduction • Predictive distribution = distribution of a future observation y after having observed the sample {y1, . . . , yn} • Two assumptions: Future observations are independent of current observations given θ Future observations have the same distribution (p(y | θ)) as the current observations • We look at three cases: (a) binomial, (b) Gaussian and (c) Poisson • We start with a binomial example Bayesian Biostatistics - Piracicaba 2014 140 Bayesian Biostatistics - Piracicaba 2014 141 Example III.7: Stroke study – Predicting SICH incidence in interim analysis Example III.7: Known incidence rate • MLE of θ (incidence SICH) = 8/100 = 0.08 for (fictive) ECASS 2 study • Before 1st interim analysis but given the pilot data: Assume now that 8% is the true incidence Obtain an idea of the number of (future) rt-PA treated patients who will suffer from SICH in sample of size m = 50 Predictive distribution: Bin(50, 0.08) 95% predictive set: {0, 1, . . . , 7} ≈ 94% of the future counts • Distribution of y (given the pilot data)? 
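Under the assumption that θ = 0.08 is known, the predictive calculation above can be reproduced directly; a sketch with SciPy:

```python
from scipy.stats import binom

# Predictive distribution of the number of SICH events among
# m = 50 future rt-PA patients, assuming the incidence 0.08 is known
pred = binom(n=50, p=0.08)

mass_0_to_7 = pred.cdf(7)        # P(0 <= ytilde <= 7), about 94-95%
p_10_or_more = 1 - pred.cdf(9)   # how extreme is the observed 10 out of 50?

print(mass_0_to_7, p_10_or_more)
```

The small value of `p_10_or_more` is what makes the observed count of 10 look extreme under the plug-in value 0.08.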
⇒ the observed result of 10 SICH patients out of 50 is extreme
• But 8% is likely not the true value

Binomial predictive distribution — Example III.7: Unknown incidence rate
• Uncertainty about θ expressed by the posterior (ECASS 2 prior) = Beta(α, β)-distribution with α = 9 and β = 93
• Distribution (likelihood) of the number of SICH events among m = 50 future patients, given θ: Bin(m, θ); the plug-in Bin(m, θ̂) ignores the uncertainty in θ
• Distribution of the future events without knowing θ: Bin(m, θ) weighted by the posterior distribution = posterior predictive distribution (PPD)
• PPD = beta-binomial distribution BB(m, α, β):
p(ỹ | y) = ∫₀¹ (m choose ỹ) θ^{ỹ}(1 − θ)^{m−ỹ} · [θ^{α−1}(1 − θ)^{β−1}/B(α, β)] dθ = (m choose ỹ) B(ỹ + α, m − ỹ + β)/B(α, β)

Result: binomial and beta-binomial predictive distribution
• BB(m, α, β) shows more variability than Bin(m, θ̂)
• The 94.4% PPS is {0, 1, . . . , 9} ⇒ 10 SICH patients out of 50 is less extreme

3.4.2 Posterior predictive distribution: general case
• Central idea: take the posterior uncertainty into account
• Three cases:
◦ All mass (AUC ≈ 1) of p(θ | y) at θM ⇒ distribution of ỹ: p(ỹ | θM)
◦ All mass at θ1, . . . , θK ⇒ distribution of ỹ: Σ_{k=1}^{K} p(ỹ | θk) p(θk | y)
◦ General case: posterior predictive distribution (PPD) ⇒ distribution of ỹ: p(ỹ | y) = ∫ p(ỹ | θ) p(θ | y) dθ
• The PPD expresses what we know about the distribution of the (future) ỹs

Example III.7: Second interim analysis
• The posterior after the 1st interim analysis can be used:
◦ To compute the PPD of the number of SICH events in the subsequent m treated rt-PA patients, to be evaluated at the 2nd interim analysis
◦ As prior, to be combined with the results of the 2nd interim analysis
◦ This is an example of the sequential use of Bayes theorem

Example III.6: SAP study – 95% normal range (frequentist approach)
• Serum alkaline phosphatase (alp) was measured on a prospective set of 250 'healthy' patients by Topal et al (2003)
• Purpose: determine the 95% normal range
• yi = 100/√alp_i (i = 1, . . . , 250) ≈ normal distribution N(μ, σ²); recall: σ² is treated as known
• μ known: 95% normal range for y: [μ − 1.96 σ, μ + 1.96 σ], with μ = ȳ = 7.11 (σ = 1.4) ⇒ 95% normal range for alp = [104.45, 508.95]
• μ unknown: 95% normal range for y: [ȳ − 1.96 σ √(1 + 1/n), ȳ + 1.96 σ √(1 + 1/n)] ⇒ 95% normal range for alp = [104.33, 510.18]

Example III.6: Bayesian approach
• PPD: ỹ | y ∼ N(μ̄, σ̄² + σ²)
• 95% normal range for y: [μ̄ − 1.96 √(σ̄² + σ²), μ̄ + 1.96 √(σ̄² + σ²)]
• Prior variance σ0² large ⇒ σ̄² = σ²/n ⇒ Bayesian 95% normal range = frequentist 95% normal range ⇒ 95% normal range for alp = [104.33, 510.18]
• Same numerical results, BUT the way of dealing with the uncertainty about the true parameter value is different

Example III.8: Caries study – PPD for caries experience
• Poisson likelihood + gamma prior ⇒ gamma posterior
• PPD = negative binomial distribution NB(α, β):
p(ỹ | y) = [Γ(α + ỹ)/(Γ(α) ỹ!)] · [β/(β + 1)]^{α} · [1/(β + 1)]^{ỹ}
• (Figure: observed dmft distribution versus the NB posterior predictive distribution.)

3.5 More applications of PPDs

See later:
• Determining when to stop a trial (see the medical literature further on)
• Given the past responses, predicting the future responses of a patient
• Imputing missing data
• Model checking (Chapter 10)

Exchangeability
• Exchangeable:
◦ The order in {y1, y2, . . . , yn} is not important
◦ yi can be exchanged for yj without changing the problem/solution
◦ Extension of independence
• Exchangeability = key idea in (Bayesian) statistical inference
• Exchangeability of patients, trials, . . .

3.6 A normal approximation to the posterior

• When the sample is large and the statistical model is relatively simple (not many parameters), the posterior often looks like a normal distribution
• This property is typically used in the analysis of clinical trials
• In the book we show that the normal approximation can work well even for small sample sizes (Example III.10)
• This is a nice property, but not crucial in the Bayesian world, since all we need is the posterior

3.7 Numerical techniques to determine the posterior

• To compute the posterior, the product of likelihood and prior needs to be divided by the integral of that product
• In addition, to compute the posterior mean, median and SD we also need to compute integrals
• Two general ways to obtain the posterior and the posterior summary measures:
◦ Numerical integration
◦ Sampling from the posterior

3.7.1 Numerical integration

Example III.11: Caries study – Posterior distribution for a lognormal prior
• Take the first 10 children from the STM study; assume the dmft-index is Poisson(θ), with θ = mean dmft-index
• With prior Gamma(3, 1) ⇒ posterior Gamma(29, 11)
• Replace the gamma prior by a lognormal prior; the posterior then satisfies
p(θ | y) ∝ θ^{Σᵢ yᵢ − 1} exp[−nθ − (log(θ) − μ0)²/(2σ0²)], (θ > 0)
• Posterior moments cannot be evaluated analytically & the AUC is not known
• Calculation of the AUC using
the mid-point approach: the mid-point rule provides the AUC, after which the posterior and its summary measures follow.

3.7.2 Sampling from the posterior distribution

Monte Carlo integration
• Monte Carlo integration: replace the integral by a Monte Carlo sample {θ1, . . . , θK}; it shows the usefulness of the sampling idea
• Approximate p(θ | y) by the sample histogram
• Approximate the posterior mean by θ̄ = ∫ θ p(θ | y) dθ ≈ (1/K) Σ_{k=1}^{K} θk, for K large
• Classical 95% confidence interval to indicate the precision of the posterior mean: [θ̂ − 1.96 s_θ/√K, θ̂ + 1.96 s_θ/√K], with s_θ/√K = the Monte Carlo error
• Also 95% credible intervals can be computed (approximated)

Example III.12: Stroke study – Sampling from the posterior distribution
• Posterior for θ = probability of SICH with rt-PA = Beta(19, 133) (Example II.1)
• 5,000 sampled values of θ from the Beta(19, 133)-distribution
• Sample summary measures ≈ true summary measures
• 95% equal tail CI for θ: [0.0782, 0.182]; approximate 95% HPD interval for θ: [0.0741, 0.179]
• Posterior of log(θ): one extra line in the R program; 95% equal tail CI for log(θ): [−2.56, −1.70]
• (Figure: Monte Carlo error for K = 50; black box = true posterior mean, red box = sampled posterior mean.)

General purpose sampling algorithms
• Many algorithms are available to sample from standard distributions
• Dedicated procedures/general purpose algorithms are needed for non-standard distributions
• An important example: the accept-reject (AR) algorithm, used by e.g.
WinBUGS:
◦ Generate samples from an instrumental (envelope) distribution
◦ Then reject certain generated values, so that the retained values form a sample from the posterior
◦ Only the product of likelihood and prior is needed

Accept-reject algorithm – 1
• Assumption: p(θ | y) < A q(θ) for all θ, with q = the envelope (proposal) distribution and A = the envelope constant
• Sampling in two stages:
◦ Stage 1: θ and u are drawn independently from q(θ) and U(0, 1)
◦ Stage 2: accept θ when u ≤ p(θ | y)/[A q(θ)], reject θ otherwise ⇒ the accepted values are a sample from the target p(θ | y)

Accept-reject algorithm – 2
• Properties of the AR algorithm:
◦ Produces a sample from the posterior
◦ Only needs p(y | θ) p(θ)
◦ Probability of acceptance = 1/A

Example III.13: Caries study – Sampling from the posterior with a lognormal prior
• Data from Example III.11; prior: lognormal distribution with μ0 = log(2) & σ0 = 0.5
• The lognormal prior density is maximized for log(θ) = μ0, so one can take A q(θ) ∝ θ^{Σᵢ yᵢ − 1} e^{−nθ} ∝ Gamma(Σᵢ yᵢ, n)-density
• 1000 θ-values sampled: 840 accepted
• Sampled posterior mean (median) = 2.50 (2.48); posterior variance = 0.21; 95% equal tail CI = [1.66, 3.44]

Adaptive Rejection Sampling algorithms – 1
• The Adaptive Rejection Sampling (ARS) algorithm builds the envelope function/density and the squeezing function/density in an adaptive manner
• On the log scale, envelope and squeezing density are piecewise linear functions, with knots at the sampled grid points
• Two special cases:
◦ Tangent method of ARS
◦ Derivative-free method of ARS

Adaptive Rejection Sampling algorithms – 3
• Properties of the ARS algorithms:
◦ The envelope function can be made arbitrarily close to the target
◦ 5 to 10 grid points typically determine the envelope density
◦ The squeezing density avoids (many) function evaluations
• Derivative-free ARS is implemented in WinBUGS
• ARMS algorithm: combination with the Metropolis algorithm for non-log-concave distributions; implemented in the Bayesian SAS procedures

3.7.3 Choice of posterior summary measures

• In practice: posterior summary measures are computed with sampling techniques
• The choice is driven by the available software:
◦ Mean, median and SD: because provided by WinBUGS
◦ Mode: almost never reported, but useful for comparison with the frequentist solution
◦ Equal tail CI (WinBUGS) & HPD interval (CODA and BOA)

3.8 Bayesian hypothesis testing

Example III.14: Cross-over study – Use of CIs in Bayesian hypothesis testing
• 30 patients with systolic hypertension in a cross-over study

Two Bayesian tools for testing H0: θ = θ0:
• Based on a credible interval:
◦ Direct use of the credible interval: is θ0 contained in the 100(1 − α)% CI? If not, then reject H0 in a Bayesian way!
◦ Popular when many parameters need to be evaluated, e.g.
in regression models
◦ Contour probability pB: P[p(θ | y) > p(θ0 | y)] ≡ 1 − pB
• Bayes factor: the change of the prior odds for H0 due to the data (see below)

Example III.14 (continued):
◦ Period 1: randomization to A or B
◦ Washout period
◦ Period 2: switch medication (A to B and B to A)
• θ = P(A better than B) & H0: θ = θ0 (= 0.5)
• Result: 21 patients better off with A than with B
• Testing:
◦ Frequentist: two-sided binomial test (P = 0.043)
◦ Bayesian: U(0, 1) prior + Bin(21 | 30, θ) likelihood ⇒ Beta(22, 10) posterior (pB = 0.023)
• (Figure: graphical representation of pB.)

Predictive distribution of the power and the necessary sample size
• Classical power and sample size calculations use formulas to compute the power for a given sample size, and the necessary sample size for a chosen power
• However, there is often much uncertainty involved in the calculations:
◦ Comparison of means: only a rough idea of the common SD is available, and there is no agreement on the aimed difference in efficacy
◦ Comparison of proportions: only a rough idea of the control rate is available, and there is no agreement on the aimed difference in efficacy
• Typically one tries a number of possible values of the required parameters and inspects the variation in the calculated power/sample size
• Alternatively: use a Bayesian approach with priors on the parameters
• Here 2 examples: comparison of means & comparison of proportions

Comparison of means
• The formulas correspond to the Z-test comparing 2 means, with:
◦ Δ = assumed difference in means
◦ σ = assumed common SD in the two populations
◦ zα/2 = (negative) threshold corresponding to the 2-sided α-level
◦ zβ = (positive) threshold corresponding to power = 1 − β
• Explicit formulas for power given sample size & sample size given power:
◦ Power given sample size: power = Φ(√(nΔ²/(2σ²)) + zα/2)
◦ Sample size given power: n = 2σ²(zβ − zα/2)²/Δ²
• Example 6.5 (Spiegelhalter et al., 2004):
◦ σ = 1, Δ = 0.5, α = 0.05 ⇒ zα/2 = −1.96 (the value Δ = 0.5 is consistent with n = 63 below and with the prior centered at 0.5)
◦ Sample size given power: power = 0.80 ⇒ zβ = 0.84 ⇒ n = 63 (sample size in each treatment group)
◦ Priors: Δ ∼ N(0.5, 0.1) & σ ∼ N(1, 0.3) restricted to [0, ∞)
◦ Predictive distributions of the power and of n obtained with WinBUGS

Pure Bayesian ways to determine the necessary sample size
• The above approach combines a Bayesian technique for taking the prior uncertainty of the parameter values into account with the classical approach of determining the sample size
• Alternative, fully Bayesian approaches for sample size calculation have been suggested, based on the size of the posterior CI
• Note that the calculations require a design prior, i.e. a prior used to express the uncertainty of the parameters at the design stage.
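The classical formulas above can be checked numerically, and the predictive-power idea sketched by putting priors on (Δ, σ); the reading of the prior's second parameter as an SD is an assumption on my part:

```python
import math
from scipy.stats import norm, truncnorm

# Two-sample Z-test formulas from the slides (per-group sample size)
def power(n, delta, sigma, alpha=0.05):
    z_a = norm.ppf(alpha / 2)                       # negative threshold
    return norm.cdf(math.sqrt(n * delta**2 / (2 * sigma**2)) + z_a)

def sample_size(power_target, delta, sigma, alpha=0.05):
    z_a, z_b = norm.ppf(alpha / 2), norm.ppf(power_target)
    return 2 * sigma**2 * (z_b - z_a)**2 / delta**2

print(sample_size(0.80, delta=0.5, sigma=1.0))      # about 63 per group
print(power(63, delta=0.5, sigma=1.0))              # about 0.80

# Predictive distribution of the power: priors on (delta, sigma)
delta = norm(0.5, 0.1).rvs(10_000, random_state=1)
sigma = truncnorm(-1 / 0.3, float("inf"), loc=1, scale=0.3).rvs(10_000, random_state=2)
pred_power = norm.cdf((63 * delta**2 / (2 * sigma**2)) ** 0.5 + norm.ppf(0.025))
```

The spread of `pred_power` shows how uncertain the claimed 80% power really is once the design uncertainty about Δ and σ is acknowledged.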
This design prior does not need to be the same as the prior used in the analysis.

3.8.1 The Bayes factor

• Posterior probability for H0:
p(H0 | y) = p(y | H0) p(H0) / [p(y | H0) p(H0) + p(y | Ha) p(Ha)]
• Bayes factor BF(y):
p(H0 | y)/[1 − p(H0 | y)] = BF(y) × p(H0)/[1 − p(H0)], with BF(y) = p(y | H0)/p(y | Ha)
• The Bayes factor is the factor that transforms the prior odds for H0 into the posterior odds after having observed the data
• Central in Bayesian inference, but not to all Bayesians

Bayes factor – Jeffreys classification
• Jeffreys classification for favoring H0 and against Ha:
◦ Decisive (BF(y) > 100)
◦ Very strong (32 < BF(y) ≤ 100)
◦ Strong (10 < BF(y) ≤ 32)
◦ Substantial (3.2 < BF(y) ≤ 10)
◦ 'Not worth more than a bare mention' (1 < BF(y) ≤ 3.2)
• Jeffreys classification for favoring Ha and against H0: replace the criteria by their reciprocals (decisive: BF(y) < 1/100, . . .)

3.9 Medical literature on Bayesian methods in RCTs

The next three papers use the summary measures and techniques introduced above.

RCT example 1 (Shah PL et al., Lancet): decision making based on posterior probabilities.
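For the cross-over example, both the Bayes factor and the contour probability can be computed directly; a sketch (the uniform-prior marginal likelihood is ∫ Bin(21 | 30, θ) dθ = 1/31):

```python
from scipy.stats import binom, beta
from scipy.integrate import quad

y, n, theta0 = 21, 30, 0.5

# Bayes factor of H0: theta = 0.5 against Ha: theta ~ U(0, 1)
m0 = binom.pmf(y, n, theta0)                       # p(y | H0)
m1 = quad(lambda t: binom.pmf(y, n, t), 0, 1)[0]   # p(y | Ha) = 1/31
bf = m0 / m1                                       # about 0.41: evidence against H0

# Contour probability p_B for the Beta(22, 10) posterior: posterior
# mass of the region where the density lies below p(theta0 | y)
post = beta(22, 10)
draws = post.rvs(200_000, random_state=3)
p_B = (post.pdf(draws) <= post.pdf(theta0)).mean() # slides report 0.023
```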
RCT example 2 (Buzdar AU et al., J of Clin Onc): what is the likely result at the end?
• Phase II neoadjuvant HER2/neu-positive breast cancer trial; patients randomized to chemotherapy with and without trastuzumab
• Primary endpoint: pathological complete response (pCR) of the tumor
• Accrual was slower than expected; at the 1st DSMB interim analysis the control arm and the experimental arm were compared on complete response
• The DSMB assessed the results when data for assessing pCR were available, with the question: what is the likely result at the end?
• Bayesian predictive calculation: the probability to obtain a classical statistically significant result at the end, based on posterior predictive probabilities
• The DSMB advised to stop the trial

RCT example 3 (Cannon et al., Am Heart J): generation of fictive future studies given the results obtained so far, summarized in a decision table.

Take home messages
• The posterior distribution contains all information for statistical inference
• To characterize the posterior, we can use:
◦ posterior mode, mean, median, variance, SD
◦ credible intervals: equal-tail, HPD
• PPD = the Bayesian tool to predict the future
• Bayesian hypothesis testing can be based on: credible intervals, the contour probability, the Bayes factor
• Integration is key in Bayesian inference; sampling is a useful tool to replace integration
• No need to rely on large samples for Bayesian inference

Chapter 4: More than one parameter

4.1 Introduction

Aims:
• Moving towards practical applications
• Illustrating that computations quickly become involved
• Illustrating that frequentist results can be obtained with Bayesian procedures
• Illustrating a multivariate (independent) sampling algorithm

• Most statistical models involve more than one parameter to estimate. Examples:
◦ Normal distribution: mean μ and variance σ²
◦ Linear regression: regression coefficients β0, β1, . . . , βd and residual variance σ²
◦ Logistic regression: regression coefficients β0, β1, . . . , βd
◦ Multinomial distribution: class probabilities θ1, θ2, . . . , θd with Σ_{j=1}^{d} θj = 1
• This requires a prior for all parameters (together): it expresses our beliefs about the model parameters
• Aim: derive the posterior for all parameters and their summary measures
• In most cases analytical solutions for the posterior will no longer be possible; in Chapter 6 we will see that Markov chain Monte Carlo methods are needed for this
• Here, we look at a simple multivariate sampling approach: the Method of Composition

4.2 Joint versus marginal posterior inference

• Bayes theorem: p(θ | y) = L(θ | y) p(θ) / ∫ L(θ | y) p(θ) dθ
◦ Hence the same expression as before, but now θ = (θ1, θ2, . . . , θd)ᵀ
◦ The prior p(θ) is now multivariate, but often a prior is given for each parameter separately
◦ The posterior p(θ | y) is also multivariate, but we usually look only at the marginal posteriors p(θj | y) (j = 1, . . . , d)
• We also need, for each parameter: posterior mean, median (and sometimes mode), and credible intervals

4.3 The normal distribution with μ and σ² unknown

• Illustration on the normal distribution with both μ and σ² unknown
• Application: determining the 95% normal range of alp (continuation of Example III.6)
• We look at three cases (priors):
◦ No prior knowledge is available
◦ A previous study is available
◦ Expert knowledge is available
• Analytical calculations are possible in 2 of the 3 cases
• But first, a brief theoretical introduction

Acknowledging that μ and σ² are unknown
• Sample y1, . . .
, yn of independent observations from N(μ, σ²)
• Joint likelihood of (μ, σ²) given y:
L(μ, σ² | y) = [1/(2πσ²)^{n/2}] exp[−(1/2σ²) Σ_{i=1}^{n} (yi − μ)²]
• The posterior is again the product of likelihood and prior, divided by the denominator, which involves an integral

4.3.1 No prior knowledge on μ and σ² is available

• Noninformative joint prior p(μ, σ²) ∝ σ⁻² (μ and σ² a priori independent)
• Posterior distribution:
p(μ, σ² | y) ∝ (1/σ^{n+2}) exp{−(1/2σ²)[(n − 1)s² + n(ȳ − μ)²]}

Justification of the prior distribution
• Most often prior information on several parameters arrives for each parameter separately and independently ⇒ p(μ, σ²) = p(μ) × p(σ²)
• Here we do not have prior information on μ nor on σ² ⇒ choose flat priors
• Motivation:
◦ If one is totally ignorant of a location parameter, it could take any value on the real line with equal prior probability.
◦ If one is totally ignorant about the scale of a parameter, it is as likely to lie in the interval 1–10 as in the interval 10–100. This implies a flat prior on the log scale.
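The change-of-variables step behind this scale prior, written out:

```latex
p(\log\sigma) = c
\;\Rightarrow\;
p(\sigma) = c\left|\frac{d\log\sigma}{d\sigma}\right| = \frac{c}{\sigma}
\;\Rightarrow\;
p(\sigma^2) = p(\sigma)\left|\frac{d\sigma}{d\sigma^2}\right|
            = \frac{c}{\sigma}\cdot\frac{1}{2\sigma}
            \propto \sigma^{-2}.
```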
• The flat prior p(log(σ)) = c is equivalent to the chosen prior p(σ²) ∝ σ⁻²

Marginal posterior distributions
• The marginal posterior distributions p(μ | y) and p(σ² | y) are what is needed in practice
• Calculation of a marginal posterior distribution involves integration:
p(μ | y) = ∫ p(μ, σ² | y) dσ² = ∫ p(μ | σ², y) p(σ² | y) dσ²
• The marginal posterior is a weighted sum of conditional posteriors, with weights = the uncertainty on the other parameter(s)

Conditional & marginal posterior distributions for the normal case
• Conditional posterior for μ: p(μ | σ², y) = N(ȳ, σ²/n)
• Marginal posterior for μ: (μ − ȳ)/(s/√n) ∼ t(n − 1) (μ is the random variable) ⇒ p(μ | y) = t_{n−1}(ȳ, s²/n)
• Marginal posterior for σ²: (n − 1)s²/σ² ∼ χ²(n − 1) (σ² is the random variable) ⇒ p(σ² | y) ≡ Inv-χ²(n − 1, s²) (scaled inverse chi-squared distribution) = special case of IG(α, β) with α = (n − 1)/2, β = (n − 1)s²/2
• (Figures: some t-densities; some inverse-gamma densities.)

Joint posterior distribution
• Joint posterior = multiplication of the marginal with the conditional posterior:
p(μ, σ² | y) = p(μ | σ², y) p(σ² | y) = N(ȳ, σ²/n) × Inv-χ²(n − 1, s²)
• Normal-scaled-inverse chi-square distribution = N-Inv-χ²(ȳ, n, n − 1, s²)
• ⇒ A posteriori, μ and σ² are dependent

Posterior summary measures and PPD
• For μ:
◦ Posterior mean = mode = median = ȳ
◦ Posterior variance = (s²/n) · (n − 1)/(n − 3) (the variance of the t_{n−1} marginal)
◦ 95% equal tail credible interval = HPD interval: [ȳ − t(0.025; n − 1) s/√n, ȳ + t(0.025; n − 1) s/√n]
• For σ²:
◦ Posterior mean, mode, median, variance and 95% equal tail CI are all analytically available
◦ The 95% HPD interval is computed iteratively
• PPD: t_{n−1}(ȳ, s²(1 + 1/n))-distribution

Implications of these results — frequentist versus Bayesian inference:
• The numerical results are the same
• The inference is based on different principles

Example IV.1: SAP study – Noninformative prior
• Joint posterior distribution = N-Inv-χ² (NI prior + likelihood, see before)
• Marginal posterior distributions (red curves) for μ and σ², for y = 100/√alp
• PPD for ỹ = t249(7.11, 1.37)-distribution
• 95% normal range for alp = [104.1, 513.2], slightly wider than before (the normal range of Example III.6 is too narrow)

4.3.2 An historical study is available

• The posterior of the historical data can be used as prior for the likelihood of the current data
• Prior = N-Inv-χ²(μ0, κ0, ν0, σ0²)-distribution (from the historical data)
• Posterior = N-Inv-χ²(μ̄, κ̄, ν̄, σ̄²)-distribution (combining the data with the N-Inv-χ² prior) ⇒ N-Inv-χ² is a conjugate prior
• Posterior mean in-between the prior mean & the sample mean: again shrinkage of the posterior mean towards the prior mean
• Posterior variance = weighted average of the prior variance, the sample variance and the distance between prior and sample mean ⇒ the posterior variance is not necessarily smaller than the prior variance!
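Returning to the noninformative case (Example IV.1), the Student-t normal range can be reproduced numerically; a sketch in which s = 1.366 is back-calculated from the reported intervals (the slides round it to 1.4):

```python
from scipy.stats import t

n, ybar, s = 250, 7.11, 1.366   # s back-calculated; slides round it to 1.4

# PPD under the NI prior: t_{n-1}(ybar, s^2 (1 + 1/n))
scale = s * (1 + 1 / n) ** 0.5
y_lo, y_hi = t.ppf([0.025, 0.975], df=n - 1, loc=ybar, scale=scale)

# back-transform y = 100/sqrt(alp)  =>  alp = (100/y)^2
alp_range = (100 / y_hi) ** 2, (100 / y_lo) ** 2
print(alp_range)   # about (104.1, 513.2), slightly wider than Example III.6
```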
• Similar results for the posterior measures and the PPD as in the first case

Example IV.2: SAP study – Conjugate prior
• Prior based on a retrospective study (Topal et al., 2003) of 65 'healthy' subjects: mean (SD) for y = 100/√alp = 5.25 (1.66)
◦ Conjugate prior = N-Inv-χ²(5.25, 65, 64, 2.76)
◦ Note: mean (SD) of the prospective data: 7.11 (1.4) — quite different
• Posterior = N-Inv-χ²(6.72, 315, 314, 2.61)
• Marginal posteriors (red curves = marginal posteriors from the informative prior, i.e. the historical data):
◦ Posterior precision = prior precision + sample precision
◦ Posterior variance < prior variance and > sample variance
◦ Posterior variance with the informative prior > variance with the NI prior
◦ The prior information did not lower the posterior uncertainty; reason: conflict between the likelihood and the prior
• (Figures: histograms of the retrospective data (informative prior) and the prospective data (likelihood).)

4.3.3 Expert knowledge is available

• Expert knowledge available on each parameter separately ⇒ joint prior N(μ0, σ0²) × Inv-χ²(ν0, τ0²), which is not conjugate
• The posterior cannot be derived analytically, but numerical/sampling techniques are available

What now?
• Computational problem: the 'simplest problem' in classical statistics is already complicated
• An ad hoc solution is still possible, but not satisfactory
• There is the need for another approach

4.4 Multivariate distributions

Distributions with a multivariate response:
• Multivariate normal distribution: generalization of the normal distribution
• Multivariate Student's t-distribution: generalization of the location-scale t-distribution
• Multinomial distribution: generalization of the binomial distribution
Multivariate prior distributions:
• N-Inv-χ²-distribution: prior for N(μ, σ²)
• Dirichlet distribution: generalization of the beta distribution
• (Inverse-)Wishart distribution: generalization of the (inverse-)gamma prior, for covariance matrices (see the mixed model chapter)

Example IV.3: Young adult study – Smoking and alcohol drinking
• Study examining life style among young adults:

                   Smoking
  Alcohol          No     Yes
  No-Mild          180    41
  Moderate-Heavy   216    64
  Total            396    105

• Of interest: the association between smoking & alcohol consumption

Likelihood part:
• 2×2 contingency table = multinomial model Mult(n, θ)
• y = {y11, y12, y21, y22} and n = Σ_{i,j} yij
• θ = {θ11, θ12, θ21, θ22 = 1 − θ11 − θ12 − θ21} and Σ_{i,j} θij = 1
• Mult(n, θ) = [n!/(y11! y12! y21! y22!)] θ11^{y11} θ12^{y12} θ21^{y21} θ22^{y22}

Dirichlet prior:
• Conjugate prior to the multinomial distribution = Dirichlet prior Dir(α):
θ ∼ [1/B(α)] Π_{i,j} θij^{αij − 1}
◦ α = {α11, α12, α21, α22}
◦ B(α) = Π_{i,j} Γ(αij) / Γ(Σ_{i,j} αij)
⇒ Posterior distribution = Dir(α + y)
• Note:
◦ Dirichlet distribution = extension of the beta distribution to higher dimensions
◦ Marginal distributions of a Dirichlet distribution = beta distributions

Measuring association — analysis of the contingency table:
• Association between smoking and alcohol consumption: ψ = (θ11 θ22)/(θ12 θ21)
• p(ψ | y) is needed, but difficult to derive analytically ⇒ replace the analytical calculations by a sampling procedure
• Prior distribution: Dir(1, 1, 1, 1) ⇒ posterior distribution: Dir(180+1, 41+1, 216+1, 64+1)
• Sample of 10,000 generated values of the θ parameters
• 95% equal tail CI for ψ: [0.839, 2.014]
• Equal to the classically obtained estimate

4.5 Frequentist properties of Bayesian inference

• It is not of prime interest to a Bayesian to know the sampling properties of estimators
• However, it is important that the Bayesian approach most often gives the right answer
• What is known?
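The sampling analysis of the contingency table above can be sketched with NumPy (Dir(181, 42, 217, 65) posterior; the reported 95% CI for ψ is [0.839, 2.014]):

```python
import numpy as np

rng = np.random.default_rng(7)

# Posterior Dir(alpha + y), with Dir(1,1,1,1) prior and y = (180, 41, 216, 64)
theta = rng.dirichlet([181, 42, 217, 65], size=10_000)

# cross-ratio psi = (theta11 * theta22) / (theta12 * theta21)
psi = theta[:, 0] * theta[:, 3] / (theta[:, 1] * theta[:, 2])
ci = np.quantile(psi, [0.025, 0.975])
print(ci)   # close to the reported [0.839, 2.014]
```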
◦ Theory: the posterior is normal for a large sample (Bayesian central limit theorem)
◦ Simulations: the Bayesian approach may offer alternative interval estimators with better coverage than classical frequentist approaches
• (Figure: sampled posterior distributions of the θ parameters and of ψ, Example IV.3.)

4.6 The Method of Composition

• A method to yield a random sample from a multivariate distribution
• Stagewise approach, based on the factorization of the joint distribution into a marginal and several conditionals:
p(θ1, . . . , θd | y) = p(θd | y) p(θd−1 | θd, y) · · · p(θ1 | θd−1, . . . , θ2, y)
• Sampling approach:
◦ Sample θ̃d from p(θd | y)
◦ Sample θ̃d−1 from p(θd−1 | θ̃d, y)
◦ . . .
◦ Sample θ̃1 from p(θ1 | θ̃d−1, . . . , θ̃2, y)

Sampling from the posterior when y ∼ N(μ, σ²), both parameters unknown
• Sample first σ²; then, given the sampled value σ̃², sample μ from p(μ | σ̃², y)
• Output for case 1 (no prior knowledge on μ and σ²): sampled posterior distributions of μ and σ (figures)

4.7 Bayesian linear regression models

• Example of a classical multiple linear regression analysis
• Non-informative Bayesian multiple linear regression analysis = non-informative prior for all parameters + the classical linear regression model
• Analytical results are available and the Method of Composition can be applied

4.7.1 The frequentist approach to linear regression

• Classical regression model: y = Xβ + ε, with
◦ y = n × 1 vector of independent responses
◦ X = n × (d + 1) design matrix
◦ β = (d + 1) × 1 vector of regression parameters
◦ ε = n × 1 vector of random errors ∼ N(0, σ² I)
• Likelihood:
L(β, σ² | y, X) = [1/(2πσ²)^{n/2}] exp[−(1/2σ²)(y − Xβ)ᵀ(y − Xβ)]
• MLE = LSE of β: β̂ = (XᵀX)⁻¹Xᵀy
• Residual sum of squares: S = (y − Xβ̂)ᵀ(y − Xβ̂); mean residual sum of squares: s² = S/(n − d − 1)

Example IV.7: Osteoporosis study – a frequentist linear regression analysis
• Cross-sectional study (Boonen et al., 1996): 245 healthy elderly women in a geriatric hospital
• Aim: find determinants of osteoporosis; average age 75 yrs (range 70–90 yrs)
• Marker for osteoporosis = tbbmc (in kg), measured for 234 women
• Simple linear regression model: regressing tbbmc on bmi
• Classical frequentist regression analysis:
◦ β̂0 = 0.813 (0.12)
◦ β̂1 = 0.0404 (0.0043)
◦ s = 0.29, with n − d − 1 = 232
◦ corr(β̂0, β̂1) = −0.99
• (Figure: scatterplot of tbbmc versus bmi with the fitted regression line.)

4.7.2 A noninformative Bayesian linear regression model

• Bayesian linear regression model = prior information on the regression parameters & residual variance + the normal regression likelihood
• Noninformative prior for (β, σ²): p(β, σ²) ∝ σ⁻²
• Notation: the design matrix X is omitted
• Posterior distributions:
◦ p(β, σ² | y) = N_{d+1}(β | β̂, σ²(XᵀX)⁻¹) × Inv-χ²(σ² | n − d − 1, s²)
◦ p(β | σ², y) = N_{d+1}(β | β̂, σ²(XᵀX)⁻¹)
◦ p(σ² | y) = Inv-χ²(σ² | n − d − 1, s²)
◦ p(β | y) = T_{n−d−1}(β | β̂, s²(XᵀX)⁻¹)

4.7.3 Posterior summary measures for the linear regression model

• Posterior summary measures of (a) the regression parameters β and (b) the residual variability parameter σ²
• Multivariate posterior summary measures for β:
◦ Posterior mean (mode) of β = β̂ (MLE = LSE)
◦ 100(1 − α)% HPD region
◦ Contour probability for H0: β = β0
• Univariate posterior summary measures:
◦ Marginal posterior mean (mode, median) of βj = MLE (LSE) β̂j; 95% HPD interval for βj
◦ Marginal posterior mode and mean of σ²; 95% HPD interval for σ²
• Posterior predictive distribution of ỹ at covariate value x̃: a t-distribution

4.7.4 Sampling from the posterior distribution

• Most of the above posteriors can be sampled via standard sampling algorithms
• What about p(β | y) = a multivariate t-distribution? How to sample from it?
◦ Directly from the t-distribution (R function rmvt in the mvtnorm package)
◦ Via the Method of Composition, in two steps:
- Sample σ̃² from p(σ² | y) = scaled inverse chi-squared distribution
- Sample β̃ from p(β | σ̃², y) = multivariate normal distribution

Example IV.8: Osteoporosis study – Sampling with the Method of Composition
• Sample σ̃² from p(σ² | y) = Inv-χ²(σ² | n − d − 1, s²)
• Sample β̃ from p(β | σ̃², y) = N_{d+1}(β | β̂, σ̃²(XᵀX)⁻¹)
• Sampled mean regression vector = (0.816, 0.0403)
• 95% equal tail CIs: β0: [0.594, 1.040] & β1: [0.0317, 0.0486]
• Contour probability for H0: β = 0 is < 0.001
• Marginal posterior of (β0, β1) has a ridge (r(β0, β1) = −0.99)

PPD: distribution of a future observation at bmi = 30
• Sample the future observation ỹ from N(μ̃30, σ̃30²), with
μ̃30 = β̃ᵀ(1, 30)ᵀ and σ̃30² = σ̃²[1 + (1, 30)(XᵀX)⁻¹(1, 30)ᵀ]
• Sampled mean and standard deviation = 2.033 and 0.282

4.8 Bayesian generalized linear models

• Generalized linear model (GLIM): extension of the linear regression model to a wide class of regression models
• Examples:
◦ Normal linear regression model: normal distribution for a continuous response, with σ² assumed known
◦ Poisson regression model: Poisson distribution for a count response, with log(mean) = linear function of covariates
◦ Logistic regression model: Bernoulli distribution for a binary response, with logit of the probability = linear function of covariates

4.8.1 More complex regression models

• The multiparameter models considered here are limited: a Weibull distribution for alp? Censored/truncated data? Cox regression?

Take home messages
• Any practical application involves more than one parameter, hence Bayesian inference is immediately multivariate, even with univariate data
• A multivariate prior is needed and a multivariate posterior is obtained, but the marginal posterior is the basis for practical inference
• More complex regression models: postponed to the MCMC techniques
• Nuisance parameters:
  ◦ Bayesian inference: average out the nuisance parameter
  ◦ Classical inference: profile out (maximize over) the nuisance parameter
• Multivariate independent sampling can be done if the marginals can be computed
• Frequentist properties of Bayesian estimators (with NI priors) are often good

Chapter 5 Choosing the prior distribution

5.1 Introduction

Aims:
• Review the different principles that lead to a prior distribution
• Critically review the impact of the subjectivity of prior information

Incorporating prior knowledge:
• A unique feature of the Bayesian approach
• But it might introduce subjectivity
• Useful in clinical trials to reduce the sample size

In this chapter we review different kinds of priors: conjugate, noninformative, informative

5.2 The sequential use of Bayes theorem

• Posterior of the kth experiment = prior for the (k + 1)th experiment (sequential surgeries)
• In this way, the Bayesian approach can mimic the human learning process
• Meaning of 'prior' in prior distribution:
  ◦ prior knowledge should be specified independently of the collected data
  ◦ in RCTs: fix the prior distribution in advance

5.3 Conjugate prior distributions

In this section:
• Conjugate priors for univariate & multivariate data distributions
• Conditional conjugate and semi-conjugate distributions
• Hyperpriors

5.3.1 Conjugate priors for univariate data distributions

• In previous chapters, examples were given whereby the combination of the prior with the likelihood gives a posterior of the same type as the prior.
• This property is called conjugacy.
• For an important class of distributions (those belonging to the exponential family) there is a recipe to produce the conjugate prior

Table: conjugate priors for univariate discrete data distributions

  Exponential family member       Parameter   Conjugate prior
  Bernoulli  Bern(θ)              θ           Beta(α0, β0)
  Binomial  Bin(n, θ)             θ           Beta(α0, β0)
  Negative binomial  NB(k, θ)     θ           Beta(α0, β0)
  Poisson  Poisson(λ)             λ           Gamma(α0, β0)

Table: conjugate priors for univariate continuous data distributions

  Exponential family member           Parameter   Conjugate prior
  Normal, variance fixed  N(μ, σ²)    μ           N(μ0, σ0²)
  Normal, mean fixed  N(μ, σ²)        σ²          IG(α0, β0) or Inv-χ²(ν0, τ0²)
  Normal  N(μ, σ²)                    μ, σ²       NIG(μ0, κ0, a0, b0) = N-Inv-χ²(μ0, κ0, ν0, τ0²)
  Exponential  Exp(λ)                 λ           Gamma(α0, β0)

Recipe to choose conjugate priors

p(y | θ) ∈ exponential family:

  p(y | θ) = b(y) exp[ c(θ)ᵀ t(y) + d(θ) ]

◦ d(θ), b(y) = scalar functions; c(θ) = (c1(θ), . . . , cd(θ))ᵀ (canonical parameter)
◦ t(y) = d-dimensional sufficient statistic for θ
◦ Examples: binomial distribution, Poisson distribution, normal distribution, etc.

For a random sample y = {y1, . . . , yn} of i.i.d. elements:

  p(y | θ) = b(y) exp[ c(θ)ᵀ t(y) + n d(θ) ]

with b(y) = ∏ᵢ₌₁ⁿ b(yᵢ) and t(y) = Σᵢ₌₁ⁿ t(yᵢ)
Recipe to choose conjugate priors (continued)

For the exponential family, the class of prior distributions closed under sampling is

  p(θ | α, β) = k(α, β) exp[ c(θ)ᵀ α + β d(θ) ]

◦ α = (α1, . . . , αd)ᵀ and β are hyperparameters
◦ normalizing constant: k(α, β) = 1 / ∫ exp[ c(θ)ᵀ α + β d(θ) ] dθ

Proof of closure:

  p(θ | y) ∝ p(y | θ) p(θ) = exp[ c(θ)ᵀ t(y) + n d(θ) ] exp[ c(θ)ᵀ α + β d(θ) ]
           = exp[ c(θ)ᵀ α* + β* d(θ) ],  with α* = α + t(y), β* = β + n

• The rule above gives the natural conjugate family
• Enlarging the class of priors by adding extra parameters gives the conjugate family of priors, again closed under sampling (O'Hagan & Forster, 2004)
• The conjugate prior has the same functional form as the likelihood, obtained by replacing the data (t(y) and n) by parameters (α and β)
• A conjugate prior is model-dependent, in fact likelihood-dependent

Practical advantages when using conjugate priors

A (natural) conjugate prior distribution for the exponential family is convenient from several viewpoints:
• mathematical
• numerical
• interpretational (convenience prior): the likelihood of historical data can easily be turned into a conjugate prior

Example V.2: Dietary study – Normal versus t-prior

• Example II.2: the IBBENS-2 normal likelihood was combined with a N(328,100) (conjugate) prior distribution
• Replace the normal prior by a t30(328, 100) prior ⇒ the posterior is practically unchanged, but 3 elegant features of the normal prior are lost:
  ◦ the posterior cannot be determined analytically
  ◦ the posterior is not of the same class as the prior
  ◦ posterior summary measures are no longer obvious functions of the prior and the sample summary measures

• The natural conjugate distribution is equivalent to a fictitious experiment
• For a natural conjugate prior, the posterior mean = weighted combination of the prior mean and the sample estimate

5.3.2 Conjugate prior for the normal distribution – mean and variance unknown

• Mean known and variance unknown: for σ² unknown and μ known, the natural conjugate of N(μ, σ²) is the inverse gamma (IG); equivalently, the scaled inverse-χ² distribution (Inv-χ²)
• μ and σ² both unknown: N(μ, σ²) ∈ two-parameter exponential family
  ◦ conjugate = product of a normal prior with an inverse gamma prior
  ◦ notation: NIG(μ0, κ0, a0, b0)

5.3.3 Multivariate data distributions

Priors for two popular multivariate models:
• Multinomial model
• Multivariate normal model

Table: conjugate priors for multivariate data distributions

  Exponential family member            Parameter   Conjugate prior
  Multinomial  Mult(n, θ)              θ           Dirichlet(α0)
  Normal, covariance fixed  N(μ, Σ)    μ           N(μ0, Σ0)
  Normal, mean fixed  N(μ, Σ)          Σ           IW(Λ0, ν0)
  Normal  N(μ, Σ)                      μ, Σ        NIW(μ0, κ0, ν0, Λ0)

Multinomial model

Mult(n, θ): p(y | θ) = [n! / (y1! y2! · · · yk!)] ∏ⱼ₌₁ᵏ θⱼ^yⱼ ∈ exponential family

Natural conjugate: Dirichlet(α0) distribution

  p(θ | α0) = [Γ(Σⱼ₌₁ᵏ α0ⱼ) / ∏ⱼ₌₁ᵏ Γ(α0ⱼ)] ∏ⱼ₌₁ᵏ θⱼ^(α0ⱼ − 1)
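A tiny numeric sketch of conjugate updating for both models tabulated above, Beta + binomial and Dirichlet + multinomial, with hypothetical counts; it also checks the weighted-combination property of the posterior mean:

```python
import numpy as np

# Beta prior + binomial data: closure gives alpha* = alpha0 + y, beta* = beta0 + n - y
alpha0, beta0 = 4.0, 6.0            # hypothetical prior, prior mean 0.4
y_succ, n = 30, 50                  # hypothetical data
a_post, b_post = alpha0 + y_succ, beta0 + n - y_succ

# Posterior mean = weighted combination of prior mean and sample proportion
w = (alpha0 + beta0) / (alpha0 + beta0 + n)
post_mean = a_post / (a_post + b_post)

# Dirichlet prior + multinomial counts: posterior = Dirichlet(alpha0 + y)
alpha0_vec = np.ones(3)             # Dirichlet(1,1,1): uniform on the simplex
counts = np.array([22, 46, 12])     # hypothetical multinomial data
alpha_post = alpha0_vec + counts
dir_post_mean = alpha_post / alpha_post.sum()
```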
Multivariate normal model

The p-dimensional multivariate normal distribution:

  p(y1, . . . , yn | μ, Σ) = (2π)^(−np/2) |Σ|^(−n/2) exp[ −½ Σᵢ₌₁ⁿ (yᵢ − μ)ᵀ Σ⁻¹ (yᵢ − μ) ]

Conjugates:
• Σ known and μ unknown: N(μ0, Σ0) for μ
• Σ unknown and μ known: inverse Wishart distribution IW(Λ0, ν0) for Σ
• Σ unknown and μ unknown: normal-inverse Wishart distribution NIW(μ0, κ0, ν0, Λ0) for (μ, Σ)

Properties of the Dirichlet distribution:
• Posterior distribution = Dirichlet(α0 + y)
• Beta distribution = special case of the Dirichlet with k = 2
• Marginal distributions of the Dirichlet are beta distributions
• Dirichlet(1, 1, . . . , 1) = extension of the classical uniform prior Beta(1, 1)

5.3.4 Conditional conjugate and semi-conjugate priors

Example: θ = (μ, σ²) for y ∼ N(μ, σ²)
• Conditional conjugate for μ: N(μ0, σ0²)
• Conditional conjugate for σ²: IG(α, β)
• Semi-conjugate prior = product of the conditional conjugates
• Often conjugate priors cannot be used in WinBUGS, but semi-conjugates are popular

5.3.5 Hyperpriors

Conjugate priors are restrictive for presenting prior knowledge
⇒ give the parameters of the conjugate prior themselves a prior

Example:
• Prior: θ ∼ Beta(1, 1)
• Instead: θ ∼ Beta(α, β) with α ∼ Gamma(1, 3), β ∼ Gamma(2, 4)
  ◦ α, β = hyperparameters
  ◦ Gamma(1, 3) × Gamma(2, 4) = hyperprior/hierarchical prior
• Aim: more flexibility in the prior distribution (and useful for Gibbs sampling)

5.4 Noninformative prior distributions

5.4.1 Introduction

• Sometimes/often researchers cannot or do not wish to make use of prior knowledge ⇒ the prior should reflect this absence of knowledge
• A prior that expresses no knowledge = (initially) called a noninformative (NI) prior
• Central question: what prior reflects absence of knowledge? A flat prior?
• A huge amount of research has gone into finding the best NI prior
• Other terms for NI: non-subjective, objective, default, reference, weak, diffuse, flat, conventional, minimally informative, etc.
• Challenge: make sure that the posterior is a proper distribution!

5.4.2 Expressing ignorance

• Equal prior probabilities = principle of insufficient reason, principle of indifference, Bayes-Laplace postulate
• Unfortunately . . . a flat prior cannot express ignorance
• Ignorance on the σ-scale is different from ignorance on the σ²-scale

[Figure: a prior flat on the σ-scale is not flat on the σ²-scale, and vice versa]

5.4.3 General principles to choose noninformative priors

A lot of research has been spent on the specification of NI priors; the most popular are Jeffreys priors:

• Ignorance cannot be expressed mathematically
• The result of a Bayesian analysis depends on the choice of scale for the flat prior: p(θ) ∝ c or p(h(θ)) ≡ p(ψ) ∝ c
• To preserve conclusions when changing scale, Jeffreys suggested a rule to construct priors based on the invariance principle (conclusions do not change when changing scale)
• Jeffreys rule suggests a scale on which to take the flat prior
• A Jeffreys rule also exists for more than one parameter (Jeffreys multi-parameter rule)

Examples of Jeffreys priors

• Binomial model: p(θ) ∝ θ^(−1/2)(1 − θ)^(−1/2) ⇔ flat prior on ψ(θ) ∝ arcsin √θ
• Poisson model: p(λ) ∝ λ^(−1/2) ⇔ flat prior on ψ(λ) = √λ

5.4.4 Improper prior distributions

• Many NI priors are improper (= the area under the curve is infinite)
• An improper prior is technically no problem as long as the posterior is proper
• Example: normal likelihood (μ unknown, σ² known) + flat prior on μ
• Normal model with σ² fixed: p(μ) ∝ c

  p(μ | y) = p(y | μ) p(μ) / ∫ p(y | μ) p(μ) dμ = p(y | μ) c / ∫ p(y | μ) c dμ
           = [1/(√(2π) σ/√n)] exp[ −(n/2) ((μ − ȳ)/σ)² ]
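Returning to the binomial Jeffreys prior listed above: Jeffreys rule takes p(θ) ∝ √I(θ), and for a Bernoulli trial I(θ) = 1/(θ(1 − θ)), which matches the Beta(1/2, 1/2) kernel. A quick numerical check (requires scipy):

```python
import numpy as np
from scipy import stats

theta = np.linspace(0.01, 0.99, 99)
fisher = 1.0 / (theta * (1.0 - theta))     # Fisher information of one Bernoulli trial
jeffreys = np.sqrt(fisher)                 # Jeffreys rule: p(theta) proportional to sqrt(I)
beta_half = stats.beta.pdf(theta, 0.5, 0.5)

ratio = jeffreys / beta_half               # constant (= pi), so the shapes coincide
```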
• Normal model with μ fixed: p(σ²) ∝ σ⁻² ⇔ flat prior on log(σ)
• Normal model with μ and σ² unknown: p(μ, σ²) ∝ σ⁻², which reproduces some classical frequentist results!
• Complex models: difficult to know when an improper prior yields a proper posterior (e.g. the variance of the level-2 observations in a Gaussian hierarchical model)
• Interpretation of improper priors?

5.4.5 Weak/vague priors

Locally uniform prior

• For practical purposes it is sufficient that the prior is locally uniform, also called vague or weak
• Locally uniform: the prior is ≈ constant on the interval outside which the likelihood is ≈ zero

[Figure: likelihood for μ with a locally uniform prior]

• Examples for the N(μ, σ²) likelihood:
  ◦ μ: N(0, σ0²) prior with σ0 large
  ◦ σ²: IG(ε, ε) prior with ε small ≈ Jeffreys prior

[Figure: density of log(σ) for σ² (= 1/τ²) ∼ IG(ε, ε)]

Vague priors in software:
• WinBUGS allows only (proper) vague priors (Jeffreys priors are not allowed)
  ◦ mu ∼ dnorm(0.0,1.0E-6): normal prior with precision 10⁻⁶, i.e. variance = 10⁶
  ◦ tau2 ∼ dgamma(0.001,0.001): gamma prior with shape = rate = 10⁻³ for the precision, i.e. an inverse gamma prior for the variance
• SAS allows improper priors (Jeffreys priors are allowed)

5.5 Informative prior distributions

5.5.1 Introduction

• In basically all research some prior knowledge is available
• In this section:
  ◦ formalize the use of historical data as prior information via the power prior
  ◦ review the use of clinical priors: prior distributions based on either historical data or expert knowledge
  ◦ priors based on formal rules expressing prior skepticism and optimism
• The set of priors representing prior knowledge = subjective or informative priors
• But first, two success stories of how the Bayesian approach helped to find:
  ◦ a crashed plane
  ◦ a lost fisherman on the Atlantic Ocean

Locating a lost plane

Statisticians helped locate an Air France plane in 2011 that had been missing for two years, using Bayesian methods
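Back to the IG(ε, ε) figure above: the implied distribution of log(σ) is easy to approximate by simulation. A sketch, using ε = 0.1 (larger than the WinBUGS-style 0.001) purely to avoid numerical underflow in the illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 0.1   # illustrative; WinBUGS examples typically use 0.001
# sigma^2 ~ IG(eps, eps)  <=>  precision tau = 1/sigma^2 ~ Gamma(shape=eps, rate=eps)
tau = rng.gamma(shape=eps, scale=1.0 / eps, size=200_000)
log_sigma = -0.5 * np.log(tau)

# The draws are enormously dispersed: the prior is vague, but its density on the
# log(sigma) scale is far from flat, unlike the Jeffreys prior p(sigma^2) ∝ 1/sigma^2
```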
• June 2009: Air France flight 447 went missing flying from Rio de Janeiro, Brazil, to Paris, France
• Debris from the Airbus A330 was found floating on the surface of the Atlantic five days later
• After a number of days the debris would have moved with the ocean current, hence finding the black box was not easy
• Existing software (used by the US Coast Guard) did not help
• Senior analyst at Metron, Colleen Keller, relied on Bayesian methods to locate the black box in 2011

[Photo: members of the Brazilian frigate Constituicao recovering debris in June 2009]
[Photo: 2009 infrared satellite image showing weather conditions off the Brazilian coast and the plane search area]

Finding a lost fisherman on the Atlantic Ocean

New York Times (30 September 2014): ". . . if not for statisticians, a Long Island fisherman might have died in the Atlantic Ocean after falling off his boat early one morning last summer

The man owes his life to a once obscure field known as Bayesian statistics - a set of mathematical rules for using new data to continuously update beliefs or existing knowledge

It is proving especially useful in approaching complex problems, including searches like the one the Coast Guard used in 2013 to find the missing fisherman, John Aldridge

But the current debate is about how scientists turn data into knowledge, evidence and predictions. Concern has been growing in recent years that some fields are not doing a very good job at this sort of inference.
In 2012, for example, a team at the biotech company Amgen announced that they'd analyzed 53 cancer studies and found it could not replicate 47 of them

The Coast Guard has been using Bayesian analysis since the 1970s. The approach lends itself well to problems like searches, which involve a single incident and many different kinds of relevant data, said Lawrence Stone, a statistician for Metron, a scientific consulting firm in Reston, Va., that works with the Coast Guard"

[Photo: debris from the Air France crash laid out for investigation in 2009]
[Photo: the Coast Guard, guided by the statistical method of Thomas Bayes, was able to find the missing fisherman John Aldridge]

5.5.2 Data-based prior distributions

• In previous chapters:
  ◦ combined historical data with current data assuming identical conditions
  ◦ discounted the importance of the prior data by increasing the variance
• Generalized by the power prior (Ibrahim and Chen):
  ◦ likelihood of the historical data: L(θ | y0), based on y0 = {y01, . . . , y0n0}
  ◦ initial prior for the historical data: p0(θ | c0)
  ◦ power prior distribution:

    p(θ | y0, a0) ∝ L(θ | y0)^a0 p0(θ | c0),  0 ≤ a0 ≤ 1

  ◦ a0 = 0: historical data not accounted for; a0 = 1: historical data fully accounted for

5.5.3 Elicitation of prior knowledge

• Elicitation of prior knowledge: turn (qualitative) information from 'experts' into probabilistic language
• Challenges:
  ◦ most experts have no statistical background
  ◦ what to ask to construct the prior distribution:
    - prior mode, median, mean and prior 95% CI?
    - description of the prior: quartiles, mean, SD?

Example V.5: Stroke study – Prior for the 1st interim analysis from experts

Prior knowledge on θ (incidence of SICH), elicitation based on:
◦ most likely value for θ and a prior equal-tail 95% CI
◦ prior belief pk on each of K intervals Ik ≡ [θk−1, θk) covering [0, 1]
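For a binomial parameter with a Beta initial prior, the power prior above is again a Beta kernel, with the historical counts multiplied by a0. A small sketch with hypothetical counts:

```python
def power_prior_beta(alpha0, beta0, y0, n0, a0):
    """Power prior p(theta | y0, a0) proportional to L(theta | y0)^a0 * Beta(alpha0, beta0):
    a Beta(alpha0 + a0*y0, beta0 + a0*(n0 - y0)) distribution."""
    return alpha0 + a0 * y0, beta0 + a0 * (n0 - y0)

# Hypothetical historical data: y0 = 12 events out of n0 = 40, Beta(1, 1) initial prior
full = power_prior_beta(1, 1, 12, 40, 1.0)    # a0 = 1: historical data fully counted
none = power_prior_beta(1, 1, 12, 40, 0.0)    # a0 = 0: historical data ignored
half = power_prior_beta(1, 1, 12, 40, 0.5)    # a0 = 0.5: historical data discounted by half
```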
  ◦ some probability statements are easier to elicit than others

[Figure: elicited prior distribution for θ]

Elicitation of prior knowledge – some remarks

• Community and consensus prior: obtained from a community of experts
• Difficulty in eliciting prior information on more than 1 parameter jointly
• Lack of Bayesian papers based on genuine prior information

Identifiability issues

• An overspecified model is non-identifiable
• For an unidentified parameter given an NI prior, the posterior is also NI
• The Bayesian approach can make parameters estimable, so that the model becomes identifiable
• In the next example, not all parameters can be estimated without extra (prior) information

Example V.6: Cysticercosis study – Estimating prevalence without a gold standard

Experiment:
• 868 pigs tested in Zambia with the Ag-ELISA diagnostic test
• 496 pigs showed a positive test
• Aim: estimate the prevalence π of cysticercosis among pigs in Zambia

Table of results (α = sensitivity, β = specificity):

  Test    Disease + (true)   Disease − (true)   Observed
  +       π α                (1 − π)(1 − β)     n⁺ = 496
  −       π(1 − α)           (1 − π) β          n⁻ = 372
  Total   π                  (1 − π)            n = 868

• Only the collapsed table (n⁺, n⁻) is available
• If estimates of the sensitivity α and specificity β are available, then

    π̂ = (p⁺ + β̂ − 1) / (α̂ + β̂ − 1)

  ◦ p⁺ = n⁺/n = proportion of subjects with a positive test
  ◦ α̂ and β̂ = estimated sensitivity and specificity
• Since α and β vary geographically, expert knowledge is needed

Prior and posterior:

• A prior distribution on π (p(π)), α (p(α)) and β (p(β)) is needed:
  (a) uniform priors for π, α and β (no prior information)
  (b) Beta(21, 12) prior for α and Beta(32, 4) prior for β (historical data)
• Posterior distribution:

    p(π, α, β | n⁺, n⁻) ∝ [πα + (1 − π)(1 − β)]^(n⁺) [π(1 − α) + (1 − π)β]^(n⁻) p(π) p(α) p(β)

• WinBUGS was used

[Figure: posterior of π under priors (a) and (b)]
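Plugging illustrative point estimates for the sensitivity and specificity into the prevalence formula (here the means of the Beta(21, 12) and Beta(32, 4) priors, used purely as hypothetical plug-in values):

```python
n_pos, n = 496, 868
p_pos = n_pos / n                   # proportion of test-positive pigs

alpha_hat = 21 / 33                 # mean of Beta(21, 12): plug-in sensitivity
beta_hat = 32 / 36                  # mean of Beta(32, 4): plug-in specificity

pi_hat = (p_pos + beta_hat - 1) / (alpha_hat + beta_hat - 1)
# A plug-in estimate only; the full Bayesian analysis propagates the
# uncertainty in alpha and beta through to the posterior of pi
```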
5.5.4 Archetypal prior distributions

• The use of prior information in phase III RCTs is problematic, except for medical device trials (FDA guidance document)
  ⇒ pleas for objective priors in RCTs
• There is a role for subjective priors in interim analyses:
  ◦ skeptical prior
  ◦ enthusiastic prior

Example V.7: Skeptical priors in a phase III RCT

Tan et al. (2003):
• Phase III RCT for treating patients with hepatocellular carcinoma
• Standard treatment: surgical resection
• Experimental treatment: surgery + adjuvant radioactive iodine (adjuvant therapy)
• Planning: recruit 120 patients
• Frequentist interim analyses for efficacy were planned
• First interim analysis (30 patients): experimental treatment better (P = 0.01 < 0.029 = P-value of the stopping rule)
• But the scientific community was skeptical about the adjuvant therapy
  ⇒ a new multicentric trial (300 patients) was set up

Questionnaire, prior to the start of the subsequent trial:
• Pretrial opinions of the 14 clinical investigators were elicited
• The prior distribution of each investigator was constructed by eliciting the prior belief in the treatment effect (adjuvant versus standard) on a grid of intervals
• Average of all priors = community prior
• Average of the priors of the 5 most skeptical investigators = skeptical prior

To exemplify the use of the skeptical prior:
• Combine the skeptical prior with the interim analysis results of the previous trial
  ⇒ 1-sided contour probability (in the 1st interim analysis) = 0.49
  ⇒ the first trial would not have been stopped for efficacy

[Figures: priors of the investigators; skeptical priors]

Example V.8+9: A formal skeptical/enthusiastic prior

Formal subjective priors (Spiegelhalter et al., 1994) in the normal case:
• Useful in the context of monitoring clinical trials in a Bayesian manner
• θ = true effect of treatment (A versus B)
• Skeptical normal prior: choose the mean and variance of p(θ) to reflect skepticism
• Enthusiastic normal prior: choose the mean and variance of p(θ) to reflect enthusiasm
• See figure on the next page & book

[Figure: skeptical and enthusiastic normal priors for θ]

5.6 Prior distributions for regression models

5.6.1 Normal linear regression

Normal linear regression model:

  yi = xiᵀβ + εi  (i = 1, . . . , n),  i.e. y = Xβ + ε

Priors:
• Non-informative priors:
  ◦ popular NI prior: p(β, σ²) ∝ σ⁻² (Jeffreys multi-parameter rule)
  ◦ WinBUGS: product of independent N(0, σ0²) priors (σ0² large) + IG(ε, ε) (ε small)
  ◦ in SAS: the Jeffreys (improper) prior can be chosen
• Conjugate priors:
  ◦ conjugate NIG prior = N(β0, σ²Σ0) × IG(a0, b0) (or Inv-χ²(ν0, τ0²))
  ◦ conjugate priors are based on fictive historical data
  ◦ data augmentation priors & conditional mean priors
  ◦ not implemented in classical software, but the fictive data can be explicitly added and then standard software can be used
• Historical/expert priors:
  ◦ prior knowledge on the regression coefficients must be given jointly
  ◦ elicitation process via distributions at covariate values
  ◦ most popular: express the prior based on historical data

5.6.2 Generalized linear models

• In practice the choice of NI priors is much the same as for linear models
• But a too large prior variance may not be best for sampling, e.g. in a logistic regression model

5.7 Modeling priors

Modeling prior: adapts characteristics of the statistical model

• Multicollinearity: an appropriate prior avoids inflation of β̂
• Numerical (separation) problems: an appropriate prior avoids inflation of β̂
• Constraints on parameters: the constraint can be put in the prior
• Variable selection: the prior can direct the variable search

Multicollinearity

Multicollinearity: |XᵀX| ≈ 0 ⇒ regression coefficients and standard errors inflated

Ridge regression:
• Minimize (y* − Xβ)ᵀ(y* − Xβ) + λβᵀβ, with λ ≥ 0 & y* = y − ȳ1n
• Estimate: β̂R(λ) = (XᵀX + λI)⁻¹Xᵀy
• = posterior mode of a Bayesian normal linear regression analysis with a normal ridge prior N(0, τ²I) for β, where τ² = σ²/λ with σ² and λ fixed
• Can easily be extended to BGLIM

Numerical (separation) problems

• Separation problems in binary regression models: complete separation and quasi-complete separation
• Solution: take a weakly informative prior on the regression coefficients

[Figure: complete and quasi-complete separation]

Constraints on parameters

Signal-Tandmobiel study:
• θk = probability of CE among Flemish children in school year k (k = 1, . . . , 6)
• Constraint on the parameters: θ1 ≤ θ2 ≤ · · · ≤ θ6
• Solutions:
  ◦ prior on θ = (θ1, . . . , θ6)ᵀ that maps all θ's violating the constraint to zero
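A quick numerical sketch of the ridge estimator above with a near-collinear design (simulated data); for λ > 0 the coefficients are shrunk relative to OLS, as the Bayesian reading (posterior mode under a N(0, τ²I) prior) suggests:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, lam = 100, 3, 5.0
X = rng.standard_normal((n, p))
X[:, 2] = X[:, 1] + 0.01 * rng.standard_normal(n)   # two nearly collinear columns
y = X @ np.array([1.0, 2.0, 0.0]) + rng.standard_normal(n)
y_star = y - y.mean()                               # y* = y - ybar 1

beta_ols = np.linalg.solve(X.T @ X, X.T @ y_star)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y_star)
# Ridge = posterior mode under beta ~ N(0, tau^2 I) with tau^2 = sigma^2 / lam
```

Shrinkage: the ridge solution always has a smaller norm than the OLS solution for λ > 0.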
  ◦ neglect the values that are not allowed in the posterior (useful when sampling)

5.8 Other modeling priors

• LASSO prior (see Bayesian variable selection)
• . . .

Other regression models

• A great variety of models exists
• Not considered here: conditional logistic regression model, Cox proportional hazards model, generalized linear mixed effects models, . . .

Take home messages

• Noninformative priors do not exist, strictly speaking
  ◦ in practice vague priors (e.g. locally uniform) are ok
  ◦ an important class of NI priors: Jeffreys priors
  ◦ be careful with improper priors; they might imply an improper posterior
• Often the prior is dominated by the likelihood (data)
• Prior in RCTs: fix the prior before the trial
• Conjugate priors: convenient mathematically, computationally and from an interpretational viewpoint
• Conditional conjugate priors: heavily used in Gibbs sampling
• Hyperpriors: extend the range of conjugate priors; also important in Gibbs sampling
• Informative priors:
  ◦ can be based on historical data & expert knowledge (but only useful when reflecting the viewpoint of a community of experts)
  ◦ are useful in clinical trials to reduce the sample size

Chapter 6 Markov chain Monte Carlo sampling

6.1 Introduction

• Solving the posterior distribution analytically is often not feasible, due to the difficulty of determining the integration constant
• Computing the integral via numerical integration is a practical alternative only if few parameters are involved
⇒ a new computational approach is needed: sampling is the way to go!

Aims:
• Introduce the sampling approaches that revolutionized the Bayesian approach, the Markov chain Monte Carlo (MCMC) methods:
  1. Gibbs sampler
  2. Metropolis(-Hastings) algorithm
• MCMC approaches have revolutionized Bayesian methods!

Intermezzo: joint, marginal and conditional probability

Two (discrete) random variables X and Y:
• Joint probability of X and Y: the probability that X = x and Y = y happen together
• Marginal probability of X: the probability that X = x happens
• Marginal probability of Y: the probability that Y = y happens
• Conditional probability of X given Y = y: the probability that X = x happens if Y = y
• Conditional probability of Y given X = x: the probability that Y = y happens if X = x

IBBENS study: 563 (of whom 556 analyzed here) bank employees in 8 subsidiaries of a Belgian bank participated in a dietary study

[Figure: scatterplot of weight (kg) versus length (cm)]

IBBENS study: frequency table of weight (kg) by length (cm)

  Weight \ Length  -150  150-160  160-170  170-180  180-190  190-200  200-  Total
  -50                2      12        4        0        0        0      0     18
  50-60              1      25       50       14        0        0      0     90
  60-70              0      12       54       52       13        1      0    132
  70-80              0       5       42       72       34        0      0    153
  80-90              0       0       12       58       32        2      1    105
  90-100             0       0        0       20       18        3      0     41
  100-110            0       0        1        2        7        1      0     11
  110-120            0       0        0        2        2        1      0      5
  120-               0       0        0        0        1        0      0      1
  Total              3      54      163      220      107        8      1    556
IBBENS study: joint probability

• Joint probability table: each cell of the frequency table divided by 556, e.g. p(weight 60-70, length 160-170) = 54/556

IBBENS study: marginal probabilities

• Marginal probability of a weight class = row total/556, e.g. p(weight 50-60) = 90/556
• Marginal probability of a length class = column total/556, e.g. p(length 170-180) = 220/556

IBBENS study: conditional probabilities

• Conditional probability of a weight class given length 150-160 = cell/column total, e.g. p(weight 50-60 | length 150-160) = 25/54; the column sums to 54/54 = 1
• Conditional probability of a length class given weight 50-60 = cell/row total: 1/90, 25/90, 50/90, 14/90, 0/90, 0/90, 0/90; the row sums to 90/90 = 1

Intermezzo: joint, marginal and conditional density

Two (continuous) random variables X and Y:
• Joint density of X and Y: f(x, y)
• Marginal density of X: f(x)
• Marginal density of Y: f(y)
• Conditional density of X given Y = y: f(x | y)
• Conditional density of Y given X = x: f(y | x)
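In the discrete case these operations are just row and column normalizations of the frequency table; a sketch using the IBBENS counts:

```python
import numpy as np

# IBBENS weight-by-length frequency table (rows: weight classes, cols: length classes)
counts = np.array([
    [2, 12,  4,  0,  0, 0, 0],
    [1, 25, 50, 14,  0, 0, 0],
    [0, 12, 54, 52, 13, 1, 0],
    [0,  5, 42, 72, 34, 0, 0],
    [0,  0, 12, 58, 32, 2, 1],
    [0,  0,  0, 20, 18, 3, 0],
    [0,  0,  1,  2,  7, 1, 0],
    [0,  0,  0,  2,  2, 1, 0],
    [0,  0,  0,  0,  1, 0, 0],
])
joint = counts / counts.sum()                  # joint probabilities
marg_weight = joint.sum(axis=1)                # marginal of weight (row sums)
marg_length = joint.sum(axis=0)                # marginal of length (column sums)
# conditional of weight given length 150-160 (second column)
cond_weight_given_len = counts[:, 1] / counts[:, 1].sum()
```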
Intermezzo: Joint, marginal and conditional density

IBBENS study:
[Figure: joint density of LENGTH and WEIGHT]
[Figure: marginal densities of LENGTH and of WEIGHT]
[Figures: conditional density of WEIGHT GIVEN LENGTH and of LENGTH GIVEN WEIGHT]

6.2 The Gibbs sampler

• Gibbs sampler: introduced by Geman and Geman (1984) in the context of image processing, for the estimation of the parameters of the Gibbs distribution
• Gelfand and Smith (1990) introduced Gibbs sampling to tackle complex estimation problems in a Bayesian manner

6.2.1 The bivariate Gibbs sampler

Method of Composition:
• p(θ1, θ2 | y) is completely determined by:
  ◦ the marginal p(θ2 | y)
  ◦ the conditional p(θ1 | θ2, y)
• This split-up yields a simple way to sample from the joint distribution

Gibbs sampling:
• p(θ1, θ2 | y) is also completely determined by:
  ◦ the conditional p(θ2 | θ1, y)
  ◦ the conditional p(θ1 | θ2, y)
• This property yields another simple way to sample from the joint distribution:
  ◦ Take starting values θ1^0 and θ2^0 (only one is needed)
  ◦ Given θ1^k and θ2^k at iteration k, generate the (k+1)-th value according to the iterative scheme:
    1. Sample θ1^(k+1) from p(θ1 | θ2^k, y)
    2. Sample θ2^(k+1) from p(θ2 | θ1^(k+1), y)

Result of Gibbs sampling:
• Chain of vectors: θ^k = (θ1^k, θ2^k)^T, k = 1, 2, ...

Example VI.1: SAP study – Gibbs sampling the posterior with NI priors
• Example IV.5: sampling from the posterior distribution of the normal likelihood based on 250 alp measurements of 'healthy' patients, with an NI prior for both parameters
• Now using the Gibbs sampler, based on y = 100/√alp

Properties of the chain:
◦ Consists of dependent elements
◦ Markov property: p(θ^(k+1) | θ^k, θ^(k−1), ..., y) = p(θ^(k+1) | θ^k, y)
• The chain depends on its starting value ⇒ the initial portion (burn-in part) must be discarded
• Under mild conditions: sample from the posterior distribution = target distribution
⇒ From k0 on: summary measures calculated from the chain consistently estimate the true posterior measures
The Gibbs sampler is called a Markov chain Monte Carlo (MCMC) method

• Determine the two conditional distributions:
  1. p(μ | σ², y): N(ȳ, σ²/n)
  2. p(σ² | μ, y): Inv−χ²(n, s²_μ), with s²_μ = (1/n) Σ_{i=1}^{n} (yi − μ)²
• Iterative procedure: at iteration (k+1)
  1. Sample μ^(k+1) from N(ȳ, (σ²)^k/n)
  2. Sample (σ²)^(k+1) from Inv−χ²(n, s²_{μ^(k+1)})

Gibbs sampling path and sample from the joint posterior:
[Figures: zigzag sampling path in the (μ, σ²)-plane and the resulting sample from the joint posterior]
◦ Zigzag pattern in the (μ, σ²)-plane
◦ 1 complete step = 2 substeps: sampling from the conditional density of μ given σ², then from the conditional density of σ² given μ (blue = genuine element)
◦ Burn-in = 500, total chain = 1,500

Posterior distributions:
[Figure: marginal posterior distributions; solid line = true posterior distribution]

Example VI.2: Sampling from a discrete × continuous distribution
• Joint distribution: f(x, y) ∝ (n choose x) y^(x+α−1) (1 − y)^(n−x+β−1)
  ◦ x a discrete random variable taking values in {0, 1, ..., n}
  ◦ y a random variable on the unit interval
  ◦ α, β > 0 parameters
• Question: f(x)?
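The answer comes out of the Gibbs machinery itself: reading the two full conditionals off the joint kernel gives x | y ∼ Bin(n, y) and y | x ∼ Beta(x + α, n − x + β), and the marginal f(x) is then beta-binomial. A minimal Python sketch (the values n = 10, α = 2, β = 3 are illustrative choices, not from the slides):

```python
from math import comb, exp, lgamma, log

import numpy as np

rng = np.random.default_rng(1)

def betabinom_pmf(x, n, a, b):
    # Analytic marginal f(x): beta-binomial pmf C(n,x) B(x+a, n-x+b) / B(a,b).
    logp = (log(comb(n, x))
            + lgamma(x + a) + lgamma(n - x + b) - lgamma(n + a + b)
            - (lgamma(a) + lgamma(b) - lgamma(a + b)))
    return exp(logp)

def gibbs_xy(n=10, a=2.0, b=3.0, n_iter=20000, burn_in=1000):
    """Gibbs sampler for f(x, y) proportional to C(n,x) y^(x+a-1) (1-y)^(n-x+b-1):
    alternately draw x | y ~ Bin(n, y) and y | x ~ Beta(x + a, n - x + b)."""
    x, y = 0, 0.5
    xs = np.empty(n_iter, dtype=int)
    for k in range(n_iter):
        x = rng.binomial(n, y)
        y = rng.beta(x + a, n - x + b)
        xs[k] = x
    return xs[burn_in:]

xs = gibbs_xy()
emp = np.bincount(xs, minlength=11) / len(xs)   # empirical marginal of x
```

The empirical frequencies of x from the chain approximate the beta-binomial marginal, which is what the solid line on the next slide represents.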
Marginal distribution:
[Figure: sampled marginal distribution of x; solid line = true marginal distribution; burn-in = 500, total chain = 1,500]

Example VI.3: SAP study – Gibbs sampling the posterior with I priors
• Example VI.1, now with independent informative priors (semi-conjugate prior):
  ◦ μ ∼ N(μ0, σ0²)
  ◦ σ² ∼ Inv−χ²(ν0, τ0²)
• Posterior:
  p(μ, σ² | y) ∝ (1/σ0) exp[−(μ − μ0)²/(2σ0²)] × (σ²)^(−(ν0/2+1)) exp(−ν0τ0²/(2σ²)) × Π_{i=1}^{n} (1/σ) exp[−(yi − μ)²/(2σ²)]
              ∝ exp[−(1/(2σ²)) Σ_{i=1}^{n} (yi − μ)²] exp[−(μ − μ0)²/(2σ0²)] (σ²)^(−((n+ν0)/2+1)) exp(−ν0τ0²/(2σ²))

Conditional distributions:
1. p(μ | σ², y): N(μ̂, σ̂²), with μ̂ = (μ0/σ0² + nȳ/σ²)/(1/σ0² + n/σ²) and σ̂² = 1/(1/σ0² + n/σ²)
2. p(σ² | μ, y): Inv−χ²(ν0 + n, [Σ_{i=1}^{n} (yi − μ)² + ν0τ0²]/(ν0 + n))
• Iterative procedure: at iteration (k+1)
  1. Sample μ^(k+1) from the normal above, evaluated at (σ²)^k
  2. Sample (σ²)^(k+1) from Inv−χ²(ν0 + n, [Σ_{i=1}^{n} (yi − μ^(k+1))² + ν0τ0²]/(ν0 + n))

Trace plots:
[Figure: trace plots of μ and σ²]

6.2.2 The general Gibbs sampler

Starting position: θ^0 = (θ1^0, ..., θd^0)^T
Multivariate version of the Gibbs sampler, iteration (k+1):
1. Sample θ1^(k+1) from p(θ1 | θ2^k, ..., θ_(d−1)^k, θd^k, y)
2. Sample θ2^(k+1) from p(θ2 | θ1^(k+1), θ3^k, ..., θd^k, y)
...
d. Sample θd^(k+1) from p(θd | θ1^(k+1), ..., θ_(d−1)^(k+1), y)

• Full conditional distributions: p(θj | θ1, ..., θ_(j−1), θ_(j+1), ..., θd, y); also called: full conditionals
• Under mild regularity conditions: θ^k, θ^(k+1), ...
ultimately are observations from the posterior distribution
• With the help of advanced sampling algorithms (AR, ARS, ARMS, etc.), sampling the full conditionals can be done based on the product prior × likelihood

Example VI.4: British coal mining disasters data

British coal mining disasters data set: number of severe accidents in British coal mines from 1851 to 1962
Decrease in the frequency of disasters from year 40 (+ 1850) onwards?

Statistical model:
• Likelihood: Poisson process with a change point at k:
  ◦ yi ∼ Poisson(θ) for i = 1, ..., k
  ◦ yi ∼ Poisson(λ) for i = k + 1, ..., n (n = 112)
• Priors:
  ◦ θ: Gamma(a1, b1) (a1 constant, b1 parameter)
  ◦ λ: Gamma(a2, b2) (a2 constant, b2 parameter)
  ◦ k: p(k) = 1/n
  ◦ b1: Gamma(c1, d1) (c1, d1 constants)
  ◦ b2: Gamma(c2, d2) (c2, d2 constants)

Full conditionals:
p(θ | y, λ, b1, b2, k) = Gamma(a1 + Σ_{i=1}^{k} yi, k + b1)
p(λ | y, θ, b1, b2, k) = Gamma(a2 + Σ_{i=k+1}^{n} yi, n − k + b2)
p(b1 | y, θ, λ, b2, k) = Gamma(a1 + c1, θ + d1)
p(b2 | y, θ, λ, b1, k) = Gamma(a2 + c2, λ + d2)
p(k | y, θ, λ, b1, b2) = π(y | k, θ, λ) / Σ_{j=1}^{n} π(y | j, θ, λ), with π(y | k, θ, λ) = exp[k(λ − θ)] (θ/λ)^(Σ_{i=1}^{k} yi)

Results (with a1 = a2 = 0.5, c1 = c2 = 0, d1 = d2 = 1):
◦ Posterior mode of k: 1891
◦ Posterior mean of θ/λ: 3.42, with 95% CI = [2.48, 4.59]

Note:
• In most published analyses of this data set, b1 and b2 are given inverse gamma priors; the full conditionals are then also inverse gamma
• The results are almost the same ⇒ our analysis is a sensitivity analysis of the analyses seen in the literature

Example VI.5: Osteoporosis study – Using the Gibbs sampler

Bayesian linear regression model with NI priors:
• Regression model: tbbmci = β0 + β1 bmii + εi (i = 1, ..., n = 234)
• Priors: p(β0, β1, σ²) ∝ σ⁻²
• Notation: y = (tbbmc1, ...
, tbbmc234)^T, x = (bmi1, ..., bmi234)^T

Note (continued from Example VI.4):
• Despite the classical full conditionals, the WinBUGS/OpenBUGS samplers for θ and λ are not standard gamma samplers but rather a slice sampler (see Exercise 8.10)

Full conditionals:
p(σ² | β0, β1, y) = Inv−χ²(n, s²_β), with s²_β = (1/n) Σ (yi − β0 − β1 xi)²
p(β0 | σ², β1, y) = N(r_β1, σ²/n), with r_β1 = (1/n) Σ (yi − β1 xi)
p(β1 | σ², β0, y) = N(r_β0, σ²/xᵀx), with r_β0 = Σ (yi − β0) xi / xᵀx

Comparison with the Method of Composition:

                        2.5%    25%     50%     75%     97.5%   Mean    SD
Method of Composition
  β0                    0.57    0.74    0.81    0.89    1.05    0.81    0.12
  β1                    0.032   0.038   0.040   0.043   0.049   0.040   0.004
  σ²                    0.069   0.078   0.083   0.088   0.100   0.083   0.008
Gibbs sampler
  β0                    0.67    0.77    0.84    0.91    1.10    0.77    0.11
  β1                    0.030   0.036   0.040   0.042   0.046   0.039   0.0041
  σ²                    0.069   0.077   0.083   0.088   0.099   0.083   0.0077

◦ Method of Composition = 1,000 independently sampled values
◦ Gibbs sampler: burn-in = 500, total chain = 1,500

Index plot from the Method of Composition and trace plot from the Gibbs sampler:
[Figures]

Trace versus index plot: the comparison of the index plot with the trace plot shows:
• σ²: index plot and trace plot similar ⇒ (almost) independent sampling
• β1: trace plot shows slow mixing ⇒ quite dependent sampling
⇒ Method of Composition and Gibbs sampling: similar posterior measures for σ², less similar posterior measures for β1

6.2.3 Remarks*
• The full conditionals determine the joint distribution
• Generate the joint distribution from the full conditionals

Autocorrelation:
Autocorrelation of lag 1: correlation of β1^k with β1^(k−1) (k = 1, ...)
Autocorrelation of lag 2: correlation of β1^k with β1^(k−2) (k = 1, ...)
Autocorrelation of lag m: correlation of β1^k with β1^(k−m) (k = 1, ...)

High autocorrelation:
⇒ the burn-in part is larger: it takes longer to forget the initial position
⇒ the remaining part needs to be longer to obtain stable posterior measures

Remarks* (continued):
• Transition kernel

6.2.4 Review of Gibbs sampling approaches

Sampling the full conditionals is done via different algorithms, depending on:
• the shape of the full conditional (classical versus general-purpose algorithm)
• the preference of the software developer:
  ◦ SAS procedures GENMOD, LIFEREG and PHREG: ARMS algorithm
  ◦ WinBUGS: variety of samplers

Several versions of the basic Gibbs sampler:
• Deterministic- or systematic-scan Gibbs sampler: the d dimensions are visited in a fixed order
• Block Gibbs sampler: the d dimensions are split up into m blocks of parameters, and the Gibbs sampler is applied to the blocks

The block Gibbs sampler:
• Normal linear regression: p(σ² | β0, β1, y) and p(β0, β1 | σ², y)
• May speed up convergence considerably, at the expense of more computational time at each iteration
• WinBUGS: blocking option on
• SAS procedure MCMC: allows the user to specify the blocks

6.3 The Metropolis(-Hastings) algorithm

The Metropolis-Hastings (MH) algorithm is a general Markov chain Monte Carlo technique to sample from the posterior distribution; it does not require full conditionals.

6.3.1 The Metropolis algorithm

Sketch of the algorithm:
• New positions are proposed by a proposal density q
• A proposed position is accepted:
  ◦ if the proposed location has a higher posterior probability: with probability 1
  ◦ otherwise: with a probability proportional to the ratio of the posterior probabilities
• Special case: the Metropolis algorithm, proposed by Metropolis in 1953
• General case: the Metropolis-Hastings algorithm, proposed by Hastings in 1970
• Became popular only after
introduction of Gelfand & Smith's paper (1990)
• Otherwise, the proposed position is rejected
• Further generalization: the Reversible Jump MCMC algorithm by Green (1995)
• The algorithm again satisfies the Markov property ⇒ MCMC algorithm
• Similarity with the AR (accept-reject) algorithm

The MH algorithm only requires the product of the prior and the likelihood to sample from the posterior.

Metropolis algorithm:
The chain is at θ^k ⇒ the Metropolis algorithm samples the value θ^(k+1) as follows:
1. Sample a candidate θ̃ from the symmetric proposal density q(θ̃ | θ^k), with θ̃ ≠ θ^k
2. The next value θ^(k+1) will be equal to:
   • θ̃ (accept proposal), with probability α(θ^k, θ̃)
   • θ^k otherwise (reject proposal),
with
   α(θ^k, θ̃) = min(r, 1), r = p(θ̃ | y)/p(θ^k | y)
the probability of a move.

Example VI.7: SAP study – Metropolis algorithm for the NI prior case

Settings as in Example VI.1, now applying the Metropolis algorithm:
• Proposal density: N(θ^k, Σ), with θ^k = (μ^k, (σ²)^k)^T and Σ = diag(0.03, 0.03)

MH sampling:
[Figures: accepted moves (blue) and rejected moves (red) in the (μ, σ²)-plane]
◦ Jumps to any location in the (μ, σ²)-plane
◦ Burn-in = 500, total chain = 1,500
◦ Acceptance rate = 40%

Trace plots and marginal posterior distributions:
[Figures]

Second choice of proposal density:
• Proposal density: N(θ^k, Σ), with θ^k = (μ^k, (σ²)^k)^T and Σ = diag(0.001, 0.001)

Accepted + rejected positions:
[Figures: variance-proposal panels in the (μ, σ²)-plane]
◦ Acceptance rate = 84%
◦ Poor
approximation of the true distribution

Problem: what should the acceptance rate be for a good Metropolis algorithm?
From theoretical work + simulations:
• Acceptance rate: ≈ 45% for d = 1 and ≈ 24% for d > 1

6.3.2 The Metropolis-Hastings algorithm

Metropolis-Hastings algorithm:
The chain is at θ^k ⇒ the Metropolis-Hastings algorithm samples the value θ^(k+1) as follows:
1. Sample a candidate θ̃ from the (asymmetric) proposal density q(θ̃ | θ^k), with θ̃ ≠ θ^k
2. The next value θ^(k+1) will be equal to:
   • θ̃ (accept proposal), with probability α(θ^k, θ̃)
   • θ^k otherwise (reject proposal),
with
   α(θ^k, θ̃) = min(r, 1), r = [p(θ̃ | y) q(θ^k | θ̃)] / [p(θ^k | y) q(θ̃ | θ^k)]
the probability of a move.

• Reversibility condition: the probability of a move from θ to θ̃ equals that of a move from θ̃ to θ
• Reversible chain: a chain satisfying the reversibility condition
• Example of an asymmetric proposal density: q(θ̃ | θ^k) ≡ q(θ̃) (Independent MH algorithm)
• WinBUGS makes use of a univariate MH algorithm to sample from some non-standard full conditionals

Example VI.8: Sampling a t-distribution using the Independent MH algorithm

Target distribution: the t3(3, 2²)-distribution
(a) Independent MH algorithm with proposal density N(3, 4²)
(b) Independent MH algorithm with proposal density N(3, 2²)
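Setting (a) of Example VI.8 fits in a few lines. The target t3(3, 2²) and the N(3, 4²) proposal are from the slides; the seed and chain length are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(7)

def log_t(theta, mu=3.0, sigma=2.0, nu=3.0):
    # Unnormalized log density of the t_nu(mu, sigma^2) target.
    z = (theta - mu) / sigma
    return -(nu + 1) / 2 * np.log1p(z * z / nu)

def log_q(theta, m=3.0, s=4.0):
    # Log density (up to a constant) of the fixed N(3, 4^2) proposal.
    return -0.5 * ((theta - m) / s) ** 2

def independent_mh(n_iter=20000, burn_in=1000):
    """Independent MH: candidates are drawn from q regardless of the current
    state; the acceptance ratio r therefore involves q at both states."""
    theta = 3.0
    chain = np.empty(n_iter)
    for k in range(n_iter):
        cand = rng.normal(3.0, 4.0)
        log_r = (log_t(cand) + log_q(theta)) - (log_t(theta) + log_q(cand))
        if np.log(rng.uniform()) < log_r:
            theta = cand                  # accept the move
        chain[k] = theta                  # otherwise theta^k is repeated
    return chain[burn_in:]

chain = independent_mh()
```

Since the proposal never depends on θ^k, only the ratio r distinguishes this from independent sampling; repeated values in the chain mark rejected proposals.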
6.3.3 Remarks*
• The Gibbs sampler is a special case of the Metropolis-Hastings algorithm, but the Gibbs sampler is still treated differently
• The transition kernel of the MH algorithm
• The reversibility condition
• Difference with the AR algorithm

6.5 Choice of the sampler

The choice of the sampler depends on a variety of considerations.

Example VI.9: Caries study – MCMC approaches for logistic regression

Subset of n = 500 children of the Signal-Tandmobiel study at the 1st examination.
Research questions:
◦ Do girls have a different risk for developing caries experience (CE) than boys (gender) in the first year of primary school?
◦ Is there an east-west gradient (x-coordinate) in CE?
Bayesian model: logistic regression + N(0, 100²) priors for the regression coefficients
No standard full conditionals; three algorithms:
◦ Self-written R program: evaluate the full conditionals on a grid + ICDF method
◦ WinBUGS program: multivariate MH algorithm (blocking mode on)
◦ SAS procedure MCMC: Random-Walk MH algorithm

Program   Parameter   Mode/Mean   SD       Median    MCSE
MLE       Intercept   -0.5900     0.2800
          gender      -0.0379     0.1810
          x-coord      0.0052     0.0017
R         Intercept   -0.5880     0.2840   -0.5860   0.0104
          gender      -0.0516     0.1850   -0.0578   0.0071
          x-coord      0.0052     0.0017    0.0052   6.621E-5
WinBUGS   Intercept   -0.5800     0.2810   -0.5730   0.0094
          gender      -0.0379     0.1770   -0.0324   0.0060
          x-coord      0.0052     0.0018    0.0053   5.901E-5
SAS       Intercept   -0.6530     0.2600   -0.6450   0.0317
          gender      -0.0319     0.1950   -0.0443   0.0208
          x-coord      0.0055     0.0016    0.0055   0.00016

Conclusions:
• Posterior means/medians of the three samplers are close (to the MLE)
• The precision with which the posterior mean was determined (high precision = low MCSE) differs considerably
⇒ Samplers may have quite a different efficiency
• The clinical conclusion was the same

Take home messages:
• The two MCMC approaches allow fitting basically any proposed model
• The choice between Gibbs sampling and the Metropolis-Hastings approach depends on computational and practical considerations
• There is no free lunch: computation time can be MUCH longer than with
likelihood approaches

Chapter 7: Assessing and improving convergence of the Markov chain

7.1 Introduction

• MCMC sampling is powerful, but comes with a cost: dependent sampling + checking convergence is not easy:
  ◦ Convergence theorems do not tell us when convergence will occur
  ◦ In this chapter: graphical + formal diagnostics to assess convergence
• Acceleration techniques to speed up the MCMC sampling procedure
• Data augmentation as a Bayesian generalization of the EM algorithm
Questions: Are we getting the right answer? Can we get the answer quicker?

7.2 Assessing convergence of a Markov chain

7.2.1 Definition of convergence for a Markov chain

Loose definition: with an increasing number of iterations (k → ∞), the distribution of θ^k, p_k(θ), converges to the target distribution p(θ | y).
In practice, convergence means:
• The histogram of θ^k remains the same along the chain
• The summary statistics of θ^k remain the same along the chain

Approaches:
• Theoretical research: establish conditions that ensure convergence; in general these theoretical results cannot be used in practice
• Two types of practical procedures to check convergence:
  ◦ Checking stationarity: from which iteration (k0) on is the chain sampling from the posterior distribution (assessing the burn-in part of the Markov chain)?
  ◦ Checking accuracy: verify that the posterior summary measures are computed with the desired accuracy

7.2.2 Checking convergence of the Markov chain

Here we look at:
• Convergence diagnostics that are implemented in WinBUGS, CODA and BOA
• Graphical and formal tests
• Illustrations using the osteoporosis study (chain of size 1,500)
• Most practical convergence diagnostics (graphical and formal) appeal to the stationarity property of a converged chain
• Often done: base the summary measures on the converged part of the chain

7.2.3 Graphical approaches to assess convergence

Trace plot:
• A simple exploration of the trace plot gives a good impression of the convergence
• Univariate trace plots (each parameter separately) or multivariate trace plots (evaluating parameters jointly) are useful
• Multivariate trace plot: monitor the (log-)likelihood or the log of the posterior density:
  ◦ WinBUGS: automatically created variable deviance
  ◦ Bayesian SAS procedures: default variable LogPost
• Thick pen test: under stationarity, the trace plot appears as a horizontal strip
• The trace plot evaluates the mixing rate of the chain

Univariate trace plots on the osteoporosis regression analysis:
[Figure]

Multivariate trace plot on the osteoporosis regression
analysis:
[Figure]

Autocorrelation plot:
• Autocorrelation of lag m (ρm): corr(θ^k, θ^(k+m)) (for k = 1, ...), estimated by:
  ◦ the Pearson correlation
  ◦ a time-series approach
• Autocorrelation function (ACF): the function m → ρm (m = 0, 1, ...)
• Autocorrelation plot: graphical representation of the ACF; it indicates:
  ◦ the mixing rate
  ◦ the minimum number of iterations for the chain to 'forget' its starting position
• The autocorrelation plot is NOT a convergence diagnostic: except for the burn-in phase, the ACF plot will not change anymore upon further sampling

Autocorrelation plot on the osteoporosis regression analysis:
[Figure]

Running-mean plot:
• Running mean (ergodic mean) θ̄^k: the mean of the sampled values up to iteration k
  ◦ Checks stationarity of p_k(θ) for k > k0 via the mean
  ◦ Stabilizes with increasing k in case of stationarity
  ◦ The initial variability of the running-mean plot is always relatively high

Running-mean plot on the osteoporosis regression analysis:
[Figure]

Q-Q plot:
• Q-Q plot: 1st half of the chain (X-axis) versus 2nd half of the chain (Y-axis)
• Stationarity: the distribution of the chain is stable irrespective of the part considered
• Non-stationarity: the Q-Q graph deviates from the bisecting line

Q-Q plot on the osteoporosis regression analysis:
[Figure]

Cross-correlation plot:
• Cross-correlation of θ1 with θ2: the correlation between θ1^k and θ2^k (k = 1, ...
, n)
• Cross-correlation plot: scatterplot of θ1^k versus θ2^k
• Useful in case of convergence problems, to indicate which model parameters are strongly related and thus whether the model is overspecified

Cross-correlation plot on the osteoporosis regression analysis:
[Figure]

7.2.5 Formal approaches to assess convergence

Here:
• Four formal convergence diagnostics:
  ◦ Geweke diagnostic: single chain
  ◦ Heidelberger-Welch diagnostic: single chain
  ◦ Raftery-Lewis diagnostic: single chain
  ◦ Brooks-Gelman-Rubin diagnostic: multiple chains
• Illustrations based on the osteoporosis study (chain of size 1,500)

Geweke diagnostic:
◦ Acts on a single chain (θ^k)_k (k = 1, ..., n)
◦ Checks only stationarity ⇒ looks for k0
◦ Compares the means of the (10%) early ⇔ (50%) late part of the chain using a t-like test
• In addition, there is a dynamic version of the test:
  ◦ Repeatedly cut off x% of the chain from the beginning
  ◦ Each time, compute the Geweke test
• The test cannot be done in WinBUGS, but should be done in R by accessing the sampled values of the chain produced by WinBUGS

Geweke diagnostic on the osteoporosis regression analysis:
◦ β1: the majority of the Z-values are outside [−1.96, 1.96] ⇒ non-stationarity
◦ log(posterior): non-stationary
◦ σ²: all values except the first inside [−1.96, 1.96]
[Figures: dynamic Geweke Z-scores for (a) β1 and (b) σ²]
• Dynamic version of the Geweke diagnostic (with K = 20), results:
• Early part = 20%: Geweke diagnostic for β1: Z = 0.057, and for σ²: Z = 2.00
• With the standard settings of 10% (1st part) & 50% (2nd part): numerical problems
• R program geweke.diag (CODA)
• Based on a chain of 1,500 iterations
[Figure: (c) log posterior]

BGR diagnostic:
◦ Version 1: the Gelman and Rubin ANOVA diagnostic
◦ Version 2: the Brooks-Gelman-Rubin interval diagnostic
◦ General term: Brooks-Gelman-Rubin (BGR) diagnostic
◦ Global and dynamic diagnostic
◦ Implementations in WinBUGS, CODA and BOA
For both versions:
◦ Take M widely dispersed starting points θm^0 (m = 1, ..., M)
◦ M parallel chains (θm^k)_k (m = 1, ..., M) are run for 2n iterations
◦ The first n iterations are discarded and regarded as burn-in

Trace plot of multiple chains on the osteoporosis regression analysis:
[Figure]

Version 1 (ANOVA idea):
Within chains: chain mean θ̄m = (1/n) Σ_{k=1}^{n} θm^k (m = 1, ..., M)
Across chains: overall mean θ̄ = (1/M) Σ_{m=1}^{M} θ̄m
◦ W = (1/M) Σ_{m=1}^{M} s²m, with s²m = (1/(n−1)) Σ_{k=1}^{n} (θm^k − θ̄m)²
◦ B = (n/(M−1)) Σ_{m=1}^{M} (θ̄m − θ̄)²
Two estimates for var(θ^k | y):
◦ Under stationarity: W and V̂ ≡ v̂ar(θ^k | y) = ((n−1)/n) W + (1/n) B are both good estimates
◦ Under non-stationarity: W is too small and V̂ too large

Version 1:
Convergence diagnostic: the estimated potential scale reduction factor (PSRF) R̂ = V̂/W, corrected for sampling variability:
R̂c = ((d + 3)/(d + 1)) R̂, with d = 2V̂²/v̂ar(V̂)
A 97.5% upper confidence bound of R̂c is added.
In practice: continue sampling until R̂c < 1.1 or 1.2
Monitor V̂ and W: both must stabilize
When convergence: take the summary measures from the 2nd half of the total chain

Version 1 (CODA/BOA): dynamic/graphical version
• The M chains are divided into batches of length b
• V̂^(1/2)(s), W^(1/2)(s) and R̂c(s) are calculated based upon the 2nd half of the (cumulative) chains of length 2sb, for s = 1, ..., [n/b]

Version 2: interval-based R̂I
Version 1 assumes Gaussian behavior of the sampled parameters: not true for σ²
Within chains: empirical 100(1−α)% interval: the [100α/2%, 100(1−α/2)%] interval of the last n iterations of each of the M chains of length 2n
Across chains: empirical equal-tail 100(1−α)% interval obtained from the last nM iterations of all chains
R̂I = (length of the total-chain interval V̂I) / (average length of the within-chain intervals W̄I)
• A 97.5% upper pointwise confidence bound is added

BGR diagnostic on the osteoporosis regression analysis:
• Based on a chain of 1,500 iterations
• WinBUGS: dynamic version of R̂I, with α = 0.20
• 8 overdispersed starting values from a classical linear regression analysis
• The chain is divided into cumulative subchains based on iterations 1-100, 1-200, 1-300, etc.
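The ANOVA idea of version 1 can be sketched as follows (Python; the degrees-of-freedom correction R̂c and its confidence bound are omitted, and the common square-root convention for the PSRF is used):

```python
import numpy as np

rng = np.random.default_rng(3)

def gelman_rubin(chains):
    """ANOVA-type BGR diagnostic (version 1) for M chains of length n.

    W = mean within-chain variance, B = n * variance of the chain means,
    V = (n-1)/n * W + B/n, and PSRF = sqrt(V / W).
    """
    chains = np.asarray(chains)                 # shape (M, n)
    M, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()
    B = n * chains.mean(axis=1).var(ddof=1)
    V = (n - 1) / n * W + B / n
    return np.sqrt(V / W)

# Stationary chains give a PSRF near 1; chains stuck around different
# values (here shifted by 0, 1, 2, 3) give a PSRF well above 1.
ok = rng.normal(0.0, 1.0, size=(4, 1000))
bad = ok + np.arange(4)[:, None]
```

This mirrors what `gelman.diag` in CODA reports, up to the df correction.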
For each part, compute on the 2nd half of the subchain:
(a) R̂I (red), (b) V̂I/max(V̂I) (green), (c) W̄I/max(W̄I) (blue)
Three curves are produced, which should be monitored.
When convergence: take the summary measures from the 2nd half of the total chain.

Results:
• First version (CODA):
  ◦ β1: R̂c = 1.06, with 97.5% upper bound equal to 1.12
  ◦ σ²: R̂c = 1.00, with 97.5% upper bound equal to 1.00
• Second version (WinBUGS):
  ◦ β1: R̂I dropped quickly below 1.1 with increasing iterations, and varied around 1.03 from iteration 600 onwards
  ◦ σ²: R̂I was quickly around 1.003, and the curves stabilized almost immediately

Version 1 and Version 2 on the osteoporosis regression analysis:
[Figures]

Conclusion about convergence of the osteoporosis regression analysis:
• A chain of length 1,500 is inadequate for achieving convergence of the regression parameters
• Convergence was attained with 15,000 iterations
• For accurately estimating certain quantiles, 15,000 iterations were not enough
• With 8 chains, the posterior distribution was rather well estimated after 1,500 iterations each

7.2.6 Computing the Monte Carlo standard error

Aim: estimate the Monte Carlo standard error (MCSE) of the posterior mean θ̄
• MCMC algorithms produce dependent random variables
• s/√n (with s the posterior SD and n the length of the chain) cannot be used
• Two approaches:
  ◦ Time-series approach (SAS MCMC/CODA/BOA)
  ◦ Method of batch means (WinBUGS/CODA/BOA)
• The chain must have converged!
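The method of batch means can be sketched directly; the AR(1) series below is a hypothetical stand-in for converged but dependent MCMC output:

```python
import numpy as np

rng = np.random.default_rng(5)

def mcse_batch_means(chain, n_batches=30):
    """Batch-means MCSE of the posterior mean: split the converged chain
    into batches long enough to be nearly independent, and use the spread
    of the batch means."""
    chain = np.asarray(chain)
    b = len(chain) // n_batches
    means = chain[:b * n_batches].reshape(n_batches, b).mean(axis=1)
    return means.std(ddof=1) / np.sqrt(n_batches)

# AR(1) chain with lag-1 autocorrelation 0.9.
rho, n = 0.9, 50000
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.normal()

mcse = mcse_batch_means(x)
naive = x.std(ddof=1) / np.sqrt(n)      # s / sqrt(n): invalid for dependent draws
ess = (x.std(ddof=1) / mcse) ** 2       # implied effective sample size
```

The batch-means MCSE is several times larger than the naive s/√n here, and the implied effective sample size is correspondingly much smaller than the chain length.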
• Related to this: effective sample size = the size the chain would have if it had been generated by independent sampling

MCSE on the osteoporosis regression analysis:
• Chain: total size of 40,000 iterations, with 20,000 burn-in iterations
• MCSE for β1:
  ◦ WinBUGS: MCSE = 2.163E-4 (batch means)
  ◦ SAS: MCSE = 6.270E-4 (time series)
  ◦ CODA: MCSE = 1.788E-4 (time series)
  ◦ CODA: MCSE = – (batch means), computational difficulties
  ◦ ESS: 312 (CODA), 22.9 (SAS)
• Extremely slow convergence with SAS PROC MCMC: stationarity only after 500,000 burn-in iterations + 500,000 further iterations

7.2.7 Practical experience with the formal diagnostic procedures

• It is impossible to prove convergence in practice:
  ◦ example of Geyer (1992)
  ◦ AZT RCT example
• Geweke diagnostic:
  ◦ popular, but the dynamic version is needed
  ◦ computational problems on some occasions
• Gelman and Rubin diagnostics:
  ◦ animated discussion on whether to use multiple chains to assess convergence
  ◦ the BGR diagnostic should be part of each test for convergence
  ◦ problem: how to pick overdispersed and yet realistic starting values

7.3 Accelerating convergence

Aim of the acceleration approaches: lower autocorrelations
• Review of approaches:
  ◦ Choosing better starting values
  ◦ Transforming the variables: centering + standardisation
  ◦ Blocking: processing several parameters at the same time
  ◦ Algorithms that suppress the purely random behavior of the MCMC sample: over-relaxation
  ◦ Reparameterization of the parameters: write your model in a different manner

Choosing better starting values:
• Bad starting values can be observed in the trace plot
• Remedy: take other starting values
• More sophisticated MCMC algorithms may be required

• Thinning: look at
part of the chain only; thinning is not an acceleration technique

Effect of acceleration on the osteoporosis regression analysis:
[Figures: chains for β1 and σ²: (a) original, (b) blocking, (c) over-relaxation, (d) centered bmi; also thinning (10)]

7.4 Practical guidelines for assessing and accelerating convergence

Practical guidelines:
• Start with a pilot run to detect quickly some trivial problems; multiple (3-5) chains often highlight problems faster
• Start with a more formal assessment of convergence only when the trace plots show good mixing
• When convergence is slow, use acceleration tricks
• When convergence fails despite the acceleration tricks: let the sampler run longer / apply thinning / change software / write your own sampler

7.5 Data augmentation

Data augmentation (DA): augment the observed data (y) with fictive data (z)
• Aim: the DA technique may simplify the estimation procedure considerably
  ◦ Classical DA technique: the EM algorithm
  ◦ The fictive data are often called missing data
• The DA technique consists of 2 parts:
  ◦ Part 1. Assume that the missing data z (auxiliary variables) are available, giving the complete(d) data w = {y, z}
  ◦ Part 2. Take into account that part of the (complete) vector w, namely z, is not available
write your own sampler
 Ensure that the MCerror is at most 5% of the posterior standard deviation
 Many parameters: choose a random selection of relevant parameters to monitor

• Bayesian DA technique (Tanner & Wong, 1987): Bayesian extension of the EM algorithm, replacing maximization with sampling

Bayesian data augmentation technique:
• Sampling from p(θ | y, z) may be easier than from p(θ | y)
• Replace the E-step by: sampling the missing data z from p(z | θ, y)
• Replace the M-step by: sampling θ from p(θ | y, z)
⇒ DA approach: repeatedly sampling from p(θ | y, z) and from p(z | θ, y)
⇒ Block Gibbs sampler

Applications:
• Genuine missing data
• Censored/misclassified observations
• Mixture models
• Hierarchical models

Example VII.6 – Genetic linkage model

If two factors are linked with a recombination fraction π, the intercrosses Ab/aB × Ab/aB (repulsion) result in the following probabilities (θ = π2):
(a) for AB: 0.5 + θ/4
(b) for Ab: (1 − θ)/4
(c) for aB: (1 − θ)/4
(d) for ab: θ/4
n subjects classified into (AB, Ab, aB, ab) with frequencies y = (y1, y2, y3, y4):
y ∼ Mult(n, (0.5 + θ/4, (1 − θ)/4, (1 − θ)/4, θ/4))
Flat prior for θ + the above multinomial likelihood ⇒ posterior
p(θ | y) ∝ (2 + θ)^y1 (1 − θ)^(y2+y3) θ^y4
The posterior is a unimodal function of θ but has no standard expression

Likelihood + prior — imagine that the 1st cell was split up into 2 cells with:
◦ Probabilities 1/2 and θ/4
◦ Corresponding frequencies (y1 − z) and z
◦ (Completed) vector of frequencies: w = (125 − z, z, 18, 20, 34)
Given z: p(θ | w) = p(θ | y, z) ∝ θ^(z+y4) (1 − θ)^(y2+y3) (completed posterior)
Given θ: p(z | θ, y) = Bin(y1, θ/(2 + θ))
These 2 posteriors = full conditionals of the posterior obtained from p(θ | y)

Popular data set introduced by Rao (1973): 197 animals with y = (125, 18, 20, 34)
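The course implements this model in WinBUGS; purely as an illustrative sketch (not course material), the two full conditionals of Example VII.6 can be alternated in a few lines of Python:

```python
import numpy as np

# Rao (1973) genetic linkage data: 197 animals in cells (AB, Ab, aB, ab)
y1, y2, y3, y4 = 125, 18, 20, 34

def gibbs_linkage(n_iter=5000, theta=0.5, seed=1):
    """DA Gibbs sampler: alternate z | theta, y and theta | y, z."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_iter)
    for t in range(n_iter):
        # z | theta, y ~ Bin(y1, theta/(2+theta)): split of the first cell
        z = rng.binomial(y1, theta / (2.0 + theta))
        # theta | y, z ~ Beta(z + y4 + 1, y2 + y3 + 1) under a flat prior
        theta = rng.beta(z + y4 + 1, y2 + y3 + 1)
        draws[t] = theta
    return draws

theta_draws = gibbs_linkage()
```

After a short burn-in the chain should settle around a posterior mean for θ of roughly 0.62, in line with published analyses of this data set.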
Example VII.6 (continued) – WinBUGS analysis:
◦ Likelihood: Mult(n, (0.5, θ/4, (1 − θ)/4, (1 − θ)/4, θ/4))
◦ Based on the frequencies (125 − z, z, 18, 20, 34)
◦ Priors: flat for θ and discrete uniform for z on {0, 1, . . . , 125}
Output: [Figure: posterior of θ — 100 iterations, burn-in = 10]

Example VII.7 – Cysticercosis study – Estimating prevalence

Example V.6: no gold standard available
◦ Prior information needed to estimate the prevalence (π), SE (α) and SP (β)
◦ Analysis was done with WinBUGS 1.4.3
◦ Here we show how data augmentation can simplify the MCMC procedure

        Disease + (true)      Disease − (true)             Observed total
Test +  πα         z1         (1 − π)(1 − β)  y − z1       y = 496
Test −  π(1 − α)   z2         (1 − π)β        (n − y) − z2 n − y = 372
Total   π                     1 − π                        n = 868

Likelihood + prior:
◦ Observed data: complex; completed data: standard
◦ Priors: π ∼ Beta(νπ, ηπ), α ∼ Beta(να, ηα), β ∼ Beta(νβ, ηβ)
◦ Likelihood: multinomial with the probabilities in the table
Posterior on the completed data:
π^(z1+z2+νπ−1) (1 − π)^(n−z1−z2+ηπ−1) α^(z1+να−1) (1 − α)^(z2+ηα−1) β^((n−y)−z2+νβ−1) (1 − β)^(y−z1+ηβ−1)
Marginal posterior for the prevalence: Beta(z1 + z2 + νπ, n − z1 − z2 + ηπ)

MCMC program:
◦ Self-written R-program
◦ 100,000 iterations
◦ Priors: π ∼ Beta(1, 1), α ∼ Beta(21, 12), β ∼ Beta(32, 4)

• The models based on (π, α, β) and on (1 − π, 1 − α, 1 − β) are the same
• Only the definitions of ‘diseased’ and ‘positive test’ are interchanged
• If no strong priors are given: the Markov chain will swap between values in [0, 0.5] and [0.5, 1]
• Solution: restrict π, α and β to [0.5, 1]
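The course uses a self-written R program here; as an illustration only, a Gibbs sampler for this completed-data posterior can be sketched in Python, alternating between the latent true-disease counts (z1, z2) and (π, α, β) with the informative priors Beta(1,1), Beta(21,12) and Beta(32,4):

```python
import numpy as np

# Cysticercosis data: y test-positives out of n pigs, no gold standard
y, n = 496, 868
# Priors: pi ~ Beta(1,1), alpha (SE) ~ Beta(21,12), beta (SP) ~ Beta(32,4)
a_pi, b_pi, a_se, b_se, a_sp, b_sp = 1, 1, 21, 12, 32, 4

def gibbs_prevalence(n_iter=5000, seed=2):
    rng = np.random.default_rng(seed)
    pi, se, sp = 0.5, 0.8, 0.8          # starting values (arbitrary)
    out = np.empty((n_iter, 3))
    for t in range(n_iter):
        # z1: true-diseased among the y test-positives
        p1 = pi * se / (pi * se + (1 - pi) * (1 - sp))
        z1 = rng.binomial(y, p1)
        # z2: true-diseased among the n - y test-negatives
        p2 = pi * (1 - se) / (pi * (1 - se) + (1 - pi) * sp)
        z2 = rng.binomial(n - y, p2)
        # conjugate beta updates on the completed data
        pi = rng.beta(z1 + z2 + a_pi, n - z1 - z2 + b_pi)
        se = rng.beta(z1 + a_se, z2 + b_se)
        sp = rng.beta((n - y) - z2 + a_sp, y - z1 + b_sp)
        out[t] = pi, se, sp
    return out

samples = gibbs_prevalence()
```

With these informative priors the chain should stay in the identified mode (α and β above 0.5), so the restriction to [0.5, 1] mentioned above is not strictly needed in this sketch.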
Identifiability issue:
Output: [Figure: trace plots of π, α and β — 10,000 burn-in]

Full conditionals:
z1 | y, π, α, β ∼ Bin(y, πα/(πα + (1 − π)(1 − β)))
z2 | y, π, α, β ∼ Bin(n − y, π(1 − α)/(π(1 − α) + (1 − π)β))
α | y, z1, z2, να, ηα ∼ Beta(z1 + να, z2 + ηα)
β | y, z1, z2, νβ, ηβ ∼ Beta((n − y) − z2 + νβ, y − z1 + ηβ)
π | y, z1, z2, νπ, ηπ ∼ Beta(z1 + z2 + νπ, n − z1 − z2 + ηπ)

Example VII.8 – Caries study – Analysis of interval-censored data

Signal-Tandmobiel study: distribution of the emergence of tooth 22 in 500 children
[Figure: tooth numbering system]
The true distribution of the emergence times is not known (annual examinations):
◦ Left-censored: emergence before the 1st examination
◦ Right-censored: emergence after the last examination
◦ Interval-censored: in-between 2 examinations
⇒ Data: [Li, Ri] (i = 1, . . . , n)
DA approach: act as if the true emergence times yi ∼ N(μ, σ2) (i = 1, . . . , n) were known
Two types of full conditionals:
◦ Model parameters | yi (i = 1, . . . , n): classical distributions
◦ yi | [Li, Ri]: truncated normal distributions
Priors: vague for μ and σ2
MCMC programs:
◦ Self-written R-program
◦ WB-program making use of the DA technique (use I(Li, Ri))
Output: [Figure: histogram of the sampled emergence times]

Example VII.9 – MCMC approaches for a probit regression problem

Example VI.9 handled MCMC approaches for a logistic regression model; here probit regression: P(yi = 1 | xi) = Φ(xi^T β)
The Gibbs sampler is a logical choice
Discretization idea:
◦ zi = xi^T β + εi, (i = 1, . . .
, n)
◦ εi ∼ N(0, 1)
◦ yi = 0 (1) for zi ≤ (>) 0
◦ P(yi = 1 | xi) = P(zi > 0) = 1 − Φ(−xi^T β) = Φ(xi^T β)
10,000 iterations, 2,000 burn-in
Results: [Figure: observed data with mean prediction (•) and [2.5%, 97.5%] interval]

Full conditionals:
β given z: conditional on z = (z1, z2, . . . , zn), with X the design matrix of covariates:
◦ β ∼ N(βLS, (X^T X)^−1), with βLS = (X^T X)^−1 X^T z
⇒ Bayesian regression analysis with σ2 = 1 ⇒ sampling from a multivariate Gaussian distribution
z given β: the full conditional of zi given β and yi is
◦ yi = 0: proportional to φ(z; xi^T β, 1) I(z ≤ 0)
◦ yi = 1: proportional to φ(z; xi^T β, 1) I(z > 0)
⇒ Sampling from truncated normals

Take home messages
• MCMC algorithms come with a cost:
◦ the estimation process often takes (much) longer than classical ML
◦ it is more difficult to check whether correct estimates are obtained
• Testing convergence:
◦ strictly speaking, we are never sure
◦ a handful of tests (graphical + formal) are used
◦ when many models need to be fit, trace plots are often the only source for testing

Chapter 8 Software

Aims:
Introduction to the use of WinBUGS (and related software)

• WinBUGS + MCMC techniques implied a revolution in the use of Bayesian methods
• WinBUGS is the standard software for many Bayesians
• SAS has developed Bayesian software, but it will not be discussed here
• Many R programs for Bayesian methods have been developed, both related to WinBUGS/JAGS/...
and stand-alone
Review of other Bayesian software

8.1 WinBUGS and related software

• WinBUGS: Windows version of Bayesian inference Using Gibbs Sampling (BUGS)
◦ Start: 1989 at the MRC Biostatistics Unit in Cambridge, with BUGS
◦ Final version = 1.4.3
• OpenBUGS = open-source version of WinBUGS
• Both packages are freely available
• Additional and related software: CODA, BOA, etc.

In this chapter:
• WinBUGS, OpenBUGS + related software
• Other sources:
◦ WinBUGS manual, WinBUGS Examples manuals I and II
◦ SAS-STAT manual
◦ Books: Lawson, Browne and Vidal Rodeiro (2003) & Ntzoufras (2009)

8.1.1 A first analysis

WinBUGS program on the osteoporosis simple linear regression analysis

Input to WinBUGS: .odc file
◦ Text file with the program, the data and the initial values for an MCMC run
◦ Compound document: contains various types of information (text, tables, formulae, plots, graphs, etc.)

model{
  for (i in 1:N){
    tbbmc[i] ~ dnorm(mu[i], tau)
    mu[i] <- beta0 + beta1*bmi[i]
  }
  sigma2 <- 1/tau
  sigma <- sqrt(sigma2)
  beta0 ~ dnorm(0, 1.0E-6)
  beta1 ~ dnorm(0, 1.0E-6)
  tau ~ dgamma(1.0E-3, 1.0E-3)
}

To access the .odc file:
◦ Start WinBUGS
◦ Click on File
◦ Click on Open...
data
list(tbbmc = c(. . .), bmi = c(. . .), N = . . .)

initial values
list(beta0 = . . ., beta1 = . . ., tau = . . .)

See file chapter 8 osteoporosis.odc

WinBUGS programming language:
◦ Similar to, but not identical with, R

Basic components of a WinBUGS program

Model:
• Components of the model:
◦ Blue part: likelihood
◦ Red part: priors
◦ Green part: transformations for monitoring
• WinBUGS expresses variability in terms of the precision (= 1/variance)
• The BUGS language is declarative: the order of the statements does not matter
• WinBUGS does not have if-then-else commands; they are replaced by the step and equals functions
• WinBUGS does not allow improper priors for most of its analyses

Data:
List statement; different formats (rectangular, list, etc.) for providing data are possible

Initial values:
List statement: initial values for the intercept, slope and residual precision
No need to specify initial values for all parameters of the model
For derived parameters, an initial value must not be specified

Obtain posterior samples

To obtain posterior samples, one requires 3 WinBUGS tools:
(a) Specification Tool
(b) Sample Monitor Tool
(c) Update Tool

Specification Tool
• check model: checks the syntax of the program → message in the left bottom corner of the screen/error message
• load data: → message data loaded in the left bottom corner of the screen/error message
• compile: → message in the left bottom corner of the screen/error message
• load inits: button switches off/button gen inits

[Figure: screen dump of the WinBUGS program]

Sample Monitor Tool
Choose Samples...:
◦ Fill in the node names
◦ To end: type in * and click on set
The term node is used because of the connection with a Directed Acyclic Graph (DAG)
Dynamic trace plot: click on trace

Update Tool
Choose the number of MCMC iterations:
◦ Click on Model in the menu bar and choose Update...
◦ Specify the number of iterations
◦ To start sampling, click on update

Obtain posterior samples

Extra via the Update Tool:
 Extra iterations: click again on update and specify the extra number of iterations
 To apply thinning (10%) at generation of the samples: fill in 10 next to thin
Extra via the Sample Monitor Tool:
 To look at the densities: click on density
 To obtain the ACF: click on auto corr
 To read off the posterior summary measures: click on stats
 To apply thinning (10%) when showing the posterior summary: fill in 10 next to thin

WinBUGS output on the osteoporosis simple linear regression analysis
[Figure: dynamic trace plots, posterior densities and summary statistics]

8.1.2 Information on samplers

Information on the sampler:
• Blocking: check Blocking options..., Node Info..., Node Tool: methods
• Standard settings of the samplers: check Update options..., Updater options window
• Example: settings of the Metropolis sampler
◦ 4,000 iterations needed to tune the sampler
◦ Acceptance rate between 20% and 40%
◦ Effect of reducing the tuning phase to 750 iterations: next page

Acceptance rate on the Ames study
[Figure — top: tuning over 4,000 iterations: convergence; bottom: tuning over 750 iterations: no convergence]

8.1.3 Assessing and accelerating convergence

Checking convergence of the Markov chain:
• Within WinBUGS: only the BGR test is available
 Choose starting values in an overdispersed manner
 The number of chains must be specified before compiling
 Choose bgr diag in the Sample Monitor Tool

BGR diagnostic on the osteoporosis regression analysis
[Figure: BGR diagnostic plots]

Checking convergence using CODA/BOA:
• Export to S+/R: connect with CODA/BOA

(Simple) tricks to improve convergence (hopefully):
• Center + standardize covariates
◦ mu[i] <- beta0+beta1*(bmi[i]-mean(bmi[]))/sd(bmi[])
◦ Vector: indicated by “[]”
• Check the effect of standardization: cross-correlation plot
• Over-relaxation: switch on over relax in the Update Tool

Click on coda in the Sample Monitor Tool: 1 index file + x files with the sampled values
Index file:
beta0 1 1500
beta1 1501 3000
sigma2 3001 4500
Process the files with CODA/BOA
See file chapter 8 osteoporosis BGR-CODA.R

Cross-correlation on the osteoporosis multiple linear regression analysis
[Figure: cross-correlation plots of the regression coefficients]

8.1.4 Vector and matrix manipulations

Example: regress tbbmc on age, weight, length, bmi and strength
• WinBUGS file: compound document
• See file chapter 8 osteomultipleregression.odc
 Data collected in matrix x
 Regression coefficients in vector beta
 Scalar product: mu[i] <- inprod(beta[],x[i,])
 Precision matrix: beta[1:6] ~ dmnorm(mu.beta[], prec.beta[,])
 Data section in two parts:
 ◦ First part: list statement
 ◦ Second part: rectangular part ended by END
 ◦ Click on both parts + each time: load data

WinBUGS program on the osteoporosis multiple linear regression analysis

model{
  for (i in 1:N){
    tbbmc[i] ~ dnorm(mu[i], tau)
    x[i,1] <- 1
    x[i,2] <- age[i]
    x[i,3] <- weight[i]
    x[i,4] <- length[i]
    x[i,5] <- bmi[i]
    x[i,6] <- strength[i]
    mu[i] <- inprod(beta[], x[i,])
  }
  for (r in 1:6){ mu.beta[r] <- 0.0 }
  c <- 1.0E-6
  for (r in 1:6){ for (s in 1:6){
    prec.beta[r,s] <- equals(r,s)*c
  }}
  beta[1:6] ~ dmnorm(mu.beta[], prec.beta[,])
  tau ~ dgamma(1.0E-3, 1.0E-3)
}

data
list(N = . . .)
age[] length[] weight[] bmi[] strength[] tbbmc[]
. . .
END

Vector and matrix manipulations

Ridge regression:
c <- 5
for (r in 1:6) { for (s in 1:6){
  prec.beta[r,s] <- equals(r,s)*c }}

Zellner’s g-inverse:
c <- 1/N
for (r in 1:6) { mu.beta[r] <- 0.0 }
for (r in 1:6) { for (s in 1:6){
  prec.beta[r,s] <- inprod(x[,r],x[,s])*tau*c }}

Cross-correlation on the osteoporosis multiple linear regression analysis

8.1.5
• R2WinBUGS: interface between R and WinBUGS
[Figure: cross-correlation plots under the vague normal prior and under the ridge normal prior]

Working in batch mode
• WinBUGS scripts
• Running WinBUGS in batch mode:
• Example:
_______________________________________________________________
osteo.sim <- bugs(data, inits, parameters, "osteo.model.txt",
    n.chains=8, n.iter=1500,
    bugs.directory="c:/Program Files/WinBUGS14/",
    working.directory=NULL, clearWD=FALSE);
print(osteo.sim)
plot(osteo.sim)
_______________________________________________________________

Output R2WinBUGS program on the osteoporosis multiple linear regression analysis
[Figure: R2WinBUGS summary plot of the posterior inference]

8.1.6 Troubleshooting

• Errors may occur at:
◦ syntax checking
◦ data loading
◦ compilation
◦ providing initial values
◦ running the sampler
• TRAP error
• Common error messages: see the section Tips and Troubleshooting of the WinBUGS manual
8.1.7 Directed acyclic graphs

• A graphical model: pictorial representation of a statistical model
◦ Node: random variable
◦ Line: dependence between random variables
• Directed Acyclic Graph (DAG) (Bayesian network):
◦ Arrow added to a line: expresses the dependence of the child on the father
◦ No cycles permitted

Local independence assumption (directed local Markov property/conditional independence assumption):
◦ Each node v is independent of its non-descendants given its parents

Likelihood of a DAG: p(V) = ∏_{v ∈ V} p(v | parents[v])
Full conditionals of a DAG: p(v | V \ v) ∝ p(v | parents[v]) ∏_{w: v ∈ parents[w]} p(w | parents[w])

Doodle option in WinBUGS:
Visualizes a DAG + the conditional independencies
• Three types of nodes:
◦ Constants (fixed by design)
◦ Stochastic nodes: variables that are given a distribution (observed = data or unobserved = parameters)
◦ Logical nodes: logical functions of other nodes
• Two types of directed lines:
◦ Single: stochastic relationship
◦ Double: logical relationship
• Plates

Doodle option in WinBUGS on the osteoporosis simple linear regression analysis
[Figure: doodle (DAG) of the simple linear regression model]

8.1.8 Add-on modules: GeoBUGS and PKBUGS

• GeoBUGS
◦ Interface for producing maps of the output from disease mapping and other spatial models
◦ Creates and manipulates adjacency matrices
• PKBugs
◦ Complex population pharmacokinetic/pharmacodynamic (PK/PD) models

8.1.9 Related software

OpenBUGS
 Started in 2004 in Helsinki
 Based on the BUGS language
 Current version: 3.2.1 (http://www.openbugs.info/w/)
 Larger class of sampling algorithms
 Improved blocking algorithms
 New functions and distributions added to OpenBUGS
 Allows censoring C(lower, upper) and truncation T(lower, upper)
 More details on samplers by default
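The DAG factorization p(V) = ∏ p(v | parents[v]) implies that the full conditional of a node depends only on its parents, its children and the children's other parents (its Markov blanket). A small Python sketch (the node names mimic the osteoporosis regression and are chosen here for illustration):

```python
# Each node maps to its list of parents; this toy DAG mimics the
# osteoporosis model: beta0, beta1 and tau are parents of tbbmc.
parents = {
    "beta0": [], "beta1": [], "tau": [],
    "tbbmc": ["beta0", "beta1", "tau"],
}

def markov_blanket(v, parents):
    """Nodes appearing in the full conditional p(v | V \\ v):
    parents of v, children of v, and the children's other parents."""
    children = [w for w, ps in parents.items() if v in ps]
    blanket = set(parents[v]) | set(children)
    for w in children:
        blanket |= set(parents[w])   # co-parents of each child
    blanket.discard(v)
    return blanket

blanket = markov_blanket("beta0", parents)
```

Here the full conditional of beta0 involves tbbmc (its child) together with beta1 and tau (co-parents), which is exactly the product over p(v | parents[v]) and p(w | parents[w]) on the slide.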
Bayesian Biostatistics - Piracicaba 2014 468 Bayesian Biostatistics - Piracicaba 2014 469 JAGS Interface with WinBUGS, OpenBUGS and JAGS • Platform independent and written in C++ • R2WinBUGS • Current version: 3.1.0 (http://www-fis.iarc.fr/ martyn/software/jags/allows) • BRugs and R2OpenBUGS: interface to OpenBUGS • New (matrix) functions: • Interface between WinBUGS and SAS, STATA, MATLab ◦ mexp(): matrix exponential ◦ %*%: matrix multiplication ◦ sort(): for sorting elements • R2jags: interface between JAGS and R • Allows censoring and truncation Bayesian Biostatistics - Piracicaba 2014 8.2 470 Bayesian analysis using SAS Bayesian Biostatistics - Piracicaba 2014 8.3 471 Additional Bayesian software and comparisons SAS is a versatile package of programs: provides tools for ◦ Setting up data bases ◦ General data handling ◦ Statistical programming ◦ Statistical analyses SAS has a broad range of statistical procedures ◦ Most procedures support frequentist analyses ◦ Version 9.2: Bayesian options in GENMOD, LIFEREG, PHREG, MI, MIXED ◦ Procedure MCMC version 9.2 experimental, from version 9.3 genuine procedure ◦ From version 9.3 random statement in PROC MCMC Bayesian Biostatistics - Piracicaba 2014 472 Bayesian Biostatistics - Piracicaba 2014 473 8.3.1 Additional Bayesian software 8.3.2 Comparison of Bayesian software Additional programs (based on MCMC): Mainly in R and Matlab Comparisons are difficult in general ◦ R: check CRAN website (http://cran.r-project.org/) ◦ Website of the International Society for Bayesian Analysis (http://bayesian.org/) Only comparisons on particular models can be done ◦ MLwiN (http://www.bristol.ac.uk/cmm/software/mlwin/) INLA: No sampling involved ◦ Integrated Nested Laplace Approximations ◦ For Gaussian models STAN: ◦ Based on Hamiltonian sampling ◦ Faster for complex models, slower for “simple” models Bayesian Biostatistics - Piracicaba 2014 474 Bayesian Biostatistics - Piracicaba 2014 475 Chapter 9 Hierarchical models Part II Aims: 
Bayesian tools for statistical modeling Introduction of one of the most important models in (bio)statistics, but now with a Bayesian flavor Comparing frequentist with Bayesian solutions Comparison Empirical Bayes estimate with Bayesian estimate Bayesian Biostatistics - Piracicaba 2014 476 Bayesian Biostatistics - Piracicaba 2014 477 • Bayesian hierarchical models (BHMs): for hierarchical/clustered data In this chapter: • Examples: • Introduction to BHM via Poisson-gamma model and Gaussian hierarchical model ◦ Measurements taken repeatedly over time on the same subject ◦ Data with a spatial hierarchy: surfaces on teeth and teeth in a mouth ◦ Multi-center clinical data with patients within centers ◦ Cluster randomized trials where centers are randomized to interventions ◦ Meta-analyses • Full Bayesian versus Empirical Bayesian approach • Bayesian Generalized Linear Mixed Model (BGLMM) • Choice of non-informative priors • BHM: a classical mixed effects model + prior on all parameters • Examples analyzed with WinBUGS 1.4.3, OpenBUGS 3.2.1 (and Bayesian SAS procedures) • As in classical frequentist world: random and fixed effects Bayesian Biostatistics - Piracicaba 2014 9.1 478 The Poisson-gamma hierarchical model Bayesian Biostatistics - Piracicaba 2014 9.1.1 479 Introduction • Simple Bayesian hierarchical model: Poisson-gamma model • Example: Lip cancer study Bayesian Biostatistics - Piracicaba 2014 480 Bayesian Biostatistics - Piracicaba 2014 481 Example IX.1: Lip cancer study – Description Mortality from lip cancer among males in the former German Democratic Republic including Berlin (GDR) To visually pinpoint regions with an increased risk for mortality (or morbidity): Display SMRs a (geographical) map: disease map 1989: 2,342 deaths from lip cancer among males in 195 regions of GDR • Problem: Observed # of deaths: yi (i = 1, . . . , n = 195) Expected # of deaths: ei (i = 1, . . . 
, n = 195) SMRi is an unreliable estimate of θi for a relatively sparsely populated region SMRi = yi/ei: standardized mortality rate (morbidity rate) More stable estimate provided by BHM SMRi: estimate of true relative risk (RR) = θi in the ith region ◦ θi = 1: risk equal to the overall risk (GDR) for dying from lip cancer ◦ θi > 1: increased risk ◦ θi ≤ 1: decreased risk Bayesian Biostatistics - Piracicaba 2014 482 Disease map of GDR: Bayesian Biostatistics - Piracicaba 2014 9.1.2 483 Model specification • Lip cancer mortality data are clustered: male subjects clustered in regions of GDR • What assumptions can we make on the regions? • What is reasonable to assume about yi and θi? Bayesian Biostatistics - Piracicaba 2014 484 Bayesian Biostatistics - Piracicaba 2014 485 Example IX.2: Lip cancer study – Basic assumptions Example IX.2: Lip cancer study – Basic assumptions • Assumptions on 1st level: • Assumptions on 1st level: ◦ yi ∼ Poisson(θi ei) + independent (i = 1, . . . , n) ◦ yi ∼ Poisson(θi ei) + independent (i = 1, . . . , n) • Three possible assumptions about the θis: • Three possible assumptions about the θis: A1: n regions are unique: MLE of θi = SMRi with variance θi/ei A1: n regions are unique: MLE of θi = SMRi with variance θi/ei A2: {y1, y2, . . . , yn} a simple random sample of GDR: θ1 = . . . = θn = θ = 1 A2: {y1, y2, . . . , yn} a simple random sample of GDR: θ1 = . . . = θn = θ = 1 A3: n regions (θi) are related: θi ∼ p(θ | ·) (i = 1, . . . , n) (prior of θi) A3: n regions (θi) are related: θi ∼ p(θ | ·) (i = 1, . . . 
, n) (prior of θi) • Choice between assumptions A1 and A3: subjective arguments • Choice between assumptions A1 and A3: subjective arguments • Assumption A2: test statistically/simulation exercise (next page) • Assumption A2: test statistically/simulation exercise (next page) ⇒ Choose A3 ⇒ Choose A3 Bayesian Biostatistics - Piracicaba 2014 486 Check assumption A2: Bayesian Biostatistics - Piracicaba 2014 487 Model specification Assumption A3: p(θ | ψ) ◦ θ = {θ1, . . . , θn} is exchangeable ◦ Exchangeability of θi ⇒ borrowing strength (from the other regions) What common distribution p(θ | ψ) to take? ◦ Subjects within regions are exchangeable but not between regions ◦ Conjugate for the Poisson distribution: θi ∼ Gamma(α, β) (i = 1, . . . , n) >(@ Bayesian Biostatistics - Piracicaba 2014 ◦ A priori mean: α/β, variance: α/β 2 To establish BHM: (α, β) ∼ p(α, β) #>(@ 488 Bayesian Biostatistics - Piracicaba 2014 489 9.1.3 Posterior distributions DWE • With y = {y1, . . . , yn}, joint posterior: Bayesian Poisson-gamma model: Level 1: yi | θi ∼ Poisson(θi ei) T T T p(α, β, θ | y) ∝ Level 2: θi | α, β ∼ Gamma(α, β) n p(θi | α, β) p(α, β) i=1 • Because of hierarchical independence of yi p(α, β, θ | y) ∝ Prior: (α, β) ∼ p(α, β) p(yi | θi, α, β) i=1 for (i = 1, . . . , n) n n (θiei)yi i=1 yi ! exp (−θiei) n β α α−1 −βθi θ e p(α, β) Γ(α) i i=1 • Assume: p(α, β) = p(α)p(β) • With p(α) = λα exp(−λαα) & p(β) = λβ exp(−λβ β) (λα = λβ = 0.1) = Gamma(1,0.1) Bayesian Biostatistics - Piracicaba 2014 490 Bayesian Biostatistics - Piracicaba 2014 491 ⇒ Full conditionals Full conditionals: p(θi | θ (i), α, β, y) ∝ θiyi+α−1 exp [−(ei + β)θi] (i = 1, . . . , n) (β n θi)α−1 exp(−λαα) p(α | θ, β, y) ∝ Γ(α)!n " θi + λβ )β p(β | θ, α, y) ∝ β nα exp −( with θ = (θ1, . . . 
, θn)T θ (i) = θ without θi Bayesian Biostatistics - Piracicaba 2014 492 Bayesian Biostatistics - Piracicaba 2014 493 Poisson-gamma model: DAG 9.1.4 Estimating the parameters Some results: 0 • p(θi | α, β, yi) = Gamma(α + yi, β + ei) • Posterior mean of θi given α, β: [0.rs rs E(θi | α, β, yi) = • Shrinkage given α, β: #rs rs yi α α + yi =Bi +(1 − Bi) β + ei β ei with Bi = β/(β + ei): shrinkage factor *'/1 Bayesian Biostatistics - Piracicaba 2014 α + yi β + ei • Also shrinkage for: p(θi | y) = 494 p(θi | α, β, yi)p(α, β | y) dα dβ Bayesian Biostatistics - Piracicaba 2014 495 Example IX.3: Lip cancer study – A WinBUGS analysis • Marginal posterior distribution of α and β: WinBUGS program: see chapter 9 lip cancer PG.odc α y i n Γ(yi + α) ei β p(α, β) p(α, β | y) ∝ Γ(α)yi! β + ei β + ei i=1 Three chains of size 10,000 iterations, burn-in 5,000 iterations Convergence checks: (a) WinBUGS: BGR - OK (b) CODA: Dynamic Geweke no convergence, but rest OK Posterior summary measures from WinBUGS: Bayesian Biostatistics - Piracicaba 2014 496 QRGH PHDQ VG 0&HUURU PHGLDQ VWDUW VDPSOH DOSKD $3 3 33 EHWD $ 33 3 3 PWKHWD $ 33)2 $ 3 VGWKHWD $3 $ $ $ YDUWKHWD $ 3 3$ Bayesian Biostatistics - Piracicaba 2014 3 497 WinBUGS program: Random effects output: CODA: HPD intervals of parameters PRGHO ^ IRULLQQ ^ 3RLVVRQOLNHOLKRRGIRUREVHUYHGFRXQWV REVHUYH>L@aGSRLVODPEGD>L@ODPEGD>L@WKHWD>L@H[SHFW>L@ VPU>L@REVHUYH>L@H[SHFW>L@ WKHWD>L@aGJDPPDDOSKDEHWD SUHGLFW>L@aGSRLVODPEGD>L@ ` Caterpillar plot .001r1s rs rs rs Estimate of random effects: r s rs rs r3s rs First 30 θi r$s 'LVWRIIXWXUHREVHUYHGFRXQWVIRUH[SHFWHGFRXQW WKHWDQHZaGJDPPDDOSKDEHWDODPEGDQHZWKHWDQHZ SUHGLFWQHZaGSRLVODPEGDQHZ rs rs rs rs 95% equal tail CIs r s rs rs r3s rs 3ULRUGLVWULEXWLRQVIRUSRSXODWLRQSDUDPHWHUV DOSKDaGH[SEHWDaGH[S r$s rs rs rs rs r s 3RSXODWLRQPHDQDQGSRSXODWLRQYDULDQFH PWKHWDDOSKDEHWDYDUWKHWDDOSKDSRZEHWD VGWKHWDVTUWYDUWKHWD ` rs rs r3s rs r$s rs Bayesian Biostatistics - Piracicaba 2014 498 Shrinkage effect: Bayesian 
Biostatistics - Piracicaba 2014 499 Original and fitted map of GDR: α β >!*. θ >(@ ![0. Conclusion? Bayesian Biostatistics - Piracicaba 2014 500 Bayesian Biostatistics - Piracicaba 2014 501 9.1.5 Posterior predictive distributions Example IX.5: Lip cancer study – Posterior predictive distributions Case 1: Prediction of future counts from a Poisson-gamma model given observed counts y. Two cases: p( yi | y) = p( yi | θi) p(θi | α, β, y) p(α, β | y) dθi dα dβ α 1. Future count of males dying from lip cancer in one of the 195 regions β θi with p( yi | θi) = Poisson( yi | θi ei) p(θi | α, β, y) = Gamma(θi | α + yi, β + ei) p(α, β | y) (above) 2. Future count of males dying from lip cancer in new regions not considered before • Sampling with WinBUGS/SAS • WinBUGS: predict[i] ~ dpois(lambda[i]) • SAS: preddist outpred=lout; Bayesian Biostatistics - Piracicaba 2014 Case 2: 502 9.2 p( y | y) = α β θ 503 Full versus Empirical Bayesian approach p(θ | α, β) p(α, β | y) dθ dα dβ p( y | θ) • Full Bayesian (FB) analysis: ordinary Bayesian approach with = Poisson( p( y | θ) y | θ e) p(θ | α, β, y) = Gamma(θ | α, β) • Empirical Bayesian (EB) analysis: classical frequentist approach to estimate (functions of) the ‘random effects’ (level-2 observations) in a hierarchical context p(α, β | y) (above) • EB approach on lip cancer mortality data: • Sampling with WinBUGS ◦ MMLE of α and β: maximum of p(α, β | y) when p(α) ∝ c, p(β) ∝ c ◦ MMLE of α: αEB , MMLE of β: β EB • WinBUGS: theta.new ~ dgamma(alpha,beta) lambda.new <- theta.new*100 predict.new ~ dpois(lambda.new) Bayesian Biostatistics - Piracicaba 2014 Bayesian Biostatistics - Piracicaba 2014 • Summary measures for θi: p(θi | αEB , β EB , yi) = Gamma(αEB + yi, β EB + ei) • Posterior summary measures = Empirical Bayes-estimates because hyperparameters are estimated from the empirical result 504 Bayesian Biostatistics - Piracicaba 2014 505 Example IX.6 – Lip cancer study – Empirical Bayes analysis • All regions: ◦ EB: αEB = 5.66, β 
EB = 4.81 ⇒ mean of θis = 1.18 ◦ FB: αF B = 5.84, β F B = 4.91 ⇒ mean of θis = 1.19 Bayesians’ criticism on EB: hyperparameters are fixed • Restricted to 30 regions: comparison of estimates + 95% CIs of θis • Classical approach: Sampling variability of αEB and yi, β EB involves bootstrapping • But is complex Bayesian Biostatistics - Piracicaba 2014 506 Bayesian Biostatistics - Piracicaba 2014 9.3 Gaussian hierarchical models θ< Comparison 95% EB and FB: 507 @! FB=blue, EB=red Bayesian Biostatistics - Piracicaba 2014 508 Bayesian Biostatistics - Piracicaba 2014 509 9.3.1 Introduction 9.3.2 • Gaussian hierarchical model: hierarchical model whereby the distribution at each level is Gaussian • Here variance component model/random effects model: no covariates involved • Bayesian (hierarchical) linear model: The Gaussian hierarchical model • Two-level Bayesian Gaussian hierarchical model: Level 1: Level 2: Priors: yij | θi, σ 2 ∼ N(θi, σ 2) (j = 1, . . . , mi; i = 1, . . . , n) θi | μ, σθ2 ∼ N(μ, σθ2) (i = 1, . . . 
, n) σ 2 ∼ p(σ 2) and (μ, σθ2) ∼ p(μ, σθ2) Hierarchical independence ◦ All parameters are given a prior distribution ◦ Fundamentals by Lindley and Smith Hyperparameters: often p(μ, σθ2) = p(μ)p(σθ2) Alternative model formulation is θi = μ + αi, with αi ∼ N(0, σθ2) • Illustrations: dietary IBBENS study Joint posterior: p(θ, σ 2, μ, σθ2 | y) ∝ Bayesian Biostatistics - Piracicaba 2014 9.3.3 510 Estimating the parameters i=1 j=1 N(yij | θi , σ 2 ) n i=1 N(θi | μ, σθ2) p(σ 2) p(μ, σθ2) Bayesian Biostatistics - Piracicaba 2014 511 Example IX.7 – Dietary study – Comparison between subsidiaries Some analytical results are available (when σ 2 is fixed): Aim: compare average cholesterol intake between subsidiaries with WinBUGS • Priors: μ ∼ N(0, 106), σ 2 ∼ IG(10−3, 10−3) and σθ ∼ U(0, 100) • Similar shrinkage as with the simple normal model: N(μ, σ 2) (with fixed σ 2) ◦ 3 chains each of size 10,000, 3 × 5,000 burn-in ◦ BGR diagnostic in WinBUGS: almost immediate convergence ◦ Export to CODA: OK except for Geweke test (numerical difficulties) • Much shrinkage with low intra-class correlation ICC = n m i σθ2 σθ2 + σ2 • Posterior summary measures on the last 5,000 iterations ◦ μ = 328.3, σμ = 9.44 ◦ σ M = 119.5 ◦ σθM = 18.26 ◦ θi: see table (not much variation) ◦ Bi ∈ [0.33, 0.45] (≈ uniform shrinkage) Bayesian Biostatistics - Piracicaba 2014 512 Bayesian Biostatistics - Piracicaba 2014 513 DAG: FB estimation of θi with WinBUGS: Sub mi Mean (SE) !# !## # #z # SD FBW (SD) FBS (SD) EML (SE) EREML (SE) 1 82 301.5 (10.2) 92.1 311.8 (12.6) 312.0 (12.0) 313.8 (10.3) 312.2 (11.3) 2 51 324.7 (17.1) 122.1 326.4 (12.7) 326.6 (12.7) 327.0 (11.7) 326.7 (12.3) 3 71 342.3 (13.6) 114.5 336.8 (11.7) 336.6 (11.7) 335.6 (10.7) 336.4 (11.6) 4 71 332.5 (13.5) 113.9 330.8 (11.6) 330.9 (11.5) 330.6 (10.7) 330.8 (11.6) 5 62 351.5 (19.0) 150.0 341.2 (12.8) 341.3 (12.6) 339.5 (11.1) 340.9 (11.8) 6 69 292.8 (12.8) 106.4 307.3 (14.2) 307.8 (13.5) 310.7 (10.8) 308.4 (11.6) 7 74 337.7 (14.1) 121.3 334.1 
(11.4) 334.2 (11.2) 333.4 (10.6) 333.9 (11.5)
8 83 347.1 (14.5) 132.2 340.0 (11.4) 340.0 (11.4) 338.8 (10.2) 340.0 (11.2)
See file chapter 9 dietary study chol.odc

Clinical trial applications

• Hierarchical models have been used in a variety of clinical trial/medical applications:
 Meta-analyses
 Inclusion of historical controls to reduce the sample size
 Dealing with multiplicity

Meta-analyses

• A Bayesian meta-analysis differs from a classical meta-analysis in that all parameters are given (a priori) uncertainty
• There are advantages in using a Bayesian meta-analysis. One is that no asymptotics need to be considered. This is illustrated in the next meta-analysis.

Inclusion of historical controls - RCT example
(Example: D.R. Holmes et al., JAMA)

Beta-blocker meta-analysis of 22 clinical trials

A meta-analysis of 22 RCTs on beta-blockers for reducing mortality after acute myocardial infarction was performed by Yusuf et al. (1985)
Originally a classical meta-analysis was performed; below is a WinBUGS program for a Bayesian meta-analysis
dc (dt) = number of deaths out of nc (nt) patients treated with the control (beta-blocker) treatment

WinBUGS program:
# Bayesian hierarchical model
model{
  for (i in 1:Num) {
    dc[i] ~ dbin(pc[i], nc[i]);   dt[i] ~ dbin(pt[i], nt[i])
    logit(pc[i]) <- nu[i];        logit(pt[i]) <- nu[i] + delta[i]
    nu[i] ~ dnorm(0.0, 1.0E-5);   delta[i] ~ dnorm(mu, itau)
  }
  mu ~ dnorm(0.0, 1.0E-6);  itau ~ dgamma(0.001, 0.001);  delta.new ~ dnorm(mu, itau)
  tau <- 1 / sqrt(itau)
}

Multiplicity issues

• Hierarchical models have been suggested to correct for multiplicity in a Bayesian manner
• For example:
◦ Testing the treatment effect in many subgroups
◦ Idea: the treatment effects in the subgroups have a common distribution with a mean to be estimated
◦ The type I error is not directly controlled, but because of the shrinkage effect some protection against overoptimistic interpretations is obtained
• In other cases, protection is sought by changing settings guided by simulation studies

9.4 Mixed models

9.4.1 Introduction

• Bayesian linear mixed model (BLMM): extension of the Bayesian Gaussian hierarchical model
• Bayesian generalized linear mixed model (BGLMM): extension of the BLMM
• Bayesian nonlinear mixed model: extension of the BGLMM

9.4.2 The linear mixed model

Linear mixed model (LMM): yij = response for the jth observation on the ith subject

yij = xijᵀ β + zijᵀ bi + εij,   in vector notation:   yi = Xi β + Zi bi + εi

◦ yi = (yi1, . . . , yimi)ᵀ: mi × 1 vector of responses
◦ Xi = (xi1, . . . , ximi)ᵀ: mi × (d + 1) design matrix
◦ β = (β0, β1, . . . , βd)ᵀ: (d + 1) × 1 vector of fixed effects
◦ Zi = (zi1, . . . , zimi)ᵀ: mi × q design matrix of the ith q × 1 vector of random effects bi
◦ εi = (εi1, . . .
, εimi)ᵀ: mi × 1 vector of measurement errors

Distributional assumptions:
◦ bi ∼ Nq(0, G), with G a q × q covariance matrix with (j, k)th element σbj,bk (j ≠ k) and σ²bj (j = k)
◦ εi ∼ Nmi(0, Ri), with Ri an mi × mi covariance matrix, often Ri = σ²Imi
◦ bi statistically independent of εi (i = 1, . . . , n)

Implications:
◦ yi | bi ∼ Nmi(Xi β + Zi bi, Ri)
◦ yi ∼ Nmi(Xi β, Zi G Ziᵀ + Ri)

• LMM popular for analyzing longitudinal studies with irregular time points + a Gaussian response
• Covariates: time-independent or time-dependent
• Random intercept model: b0i
• Random intercept + slope model: b0i + b1i tij

• Bayesian linear mixed model (BLMM): all parameters are given a prior distribution

[Figure: random intercept model and random intercept + slope model]

Example IX.11 – Dietary study – Comparison between subsidiaries

Aim: compare average cholesterol intake between subsidiaries correcting for age and gender

• Model:
  yij = β0 + β1 ageij + β2 genderij + b0i + εij
  b0i ∼ N(0, σ²b0),  εij ∼ N(0, σ²)

• Bayesian linear mixed model:
  Level 1: yij | β, bi, σ² ∼ N(xijᵀ β + zijᵀ bi, σ²)  (j = 1, . . . , mi; i = 1, . . . , n)
  Level 2: bi | G ∼ Nq(0, G)  (i = 1, . . . , n)
  Priors:  σ² ∼ p(σ²), β ∼ p(β) and G ∼ p(G)

Joint posterior:

p(β, G, σ², b1, . . . , bn | y1, . . .
, yn) ∝ ∏_{i=1}^{n} ∏_{j=1}^{mi} p(yij | bi, σ², β) · ∏_{i=1}^{n} p(bi | G) · p(β) p(G) p(σ²)

Priors: vague for all parameters

Results:

• WinBUGS program: chapter 9 dietary study chol age gender.odc
◦ Three chains of 10,000 iterations each, 3 × 5,000 burn-in
◦ Rapid convergence

• Posterior summary measures on the last 3 × 5,000 iterations
◦ β̄1 (SD) = −0.69 (0.57), β̄2 (SD) = −62.67 (10.67)
◦ With covariates: σᴹ = 116.3, σb0ᴹ = 14.16
◦ Without covariates: σᴹ = 119.5, σb0ᴹ = 18.30
◦ With covariates: ICC = 0.015

Example IX.12 – Toenail RCT – Fitting a BLMM

Aim: compare itraconazol with lamisil on unaffected nail length
◦ Double-blinded multi-centric RCT (36 centers): sportsmen and elderly people treated for toenail dermatophyte onychomycosis
◦ Two oral medications: itraconazol (treat = 0) or lamisil (treat = 1)
◦ Twelve weeks of treatment
◦ Evaluations at 0, 1, 2, 3, 6, 9 and 12 months
◦ Response: unaffected nail length of the big toenail
◦ Patients: subgroup of 298 patients

[Figure: individual profiles per treatment arm (itraconazol, lamisil)]

Model specification:

• Model:
  yij = β0 + β1 tij + β2 tij × treati + b0i + b1i tij + εij
  bi ∼ N2(0, G),  εij ∼ N(0, σ²)

• Priors
◦ Vague normal for the fixed effects
◦ G ∼ IW(D, 2) with D = diag(0.1, 0.1)

[DAG omitted]

Results:

• WinBUGS program: chapter 9 toenail LMM.odc
◦ Three chains of 10,000 iterations each, 3 × 5,000 burn-in
◦ Rapid convergence

• Posterior summary measures on the last 3 × 5,000 iterations
◦ β̄1 (SD) = 0.58 (0.043), β̄2 (SD) = −0.057 (0.058)
◦ σᴹ = 1.78
◦ σb0ᴹ = 2.71
◦ σb1ᴹ = 0.48
◦ corr(b0, b1)ᴹ = −0.39

• Frequentist analysis with SAS MIXED: MLEs close to the Bayesian estimates

9.4.3 The Bayesian generalized linear mixed model (BGLMM)

Generalized linear model (GLIM): n subjects with responses yi having an exponential-family distribution
◦ Examples: binary responses, Poisson counts, gamma responses, Gaussian responses, etc.
◦ When covariates are involved: logistic/probit regression, Poisson regression, etc.

Generalized linear mixed model (GLMM): two-level hierarchy, i.e. mi subjects in n clusters, with responses yij having an exponential-family distribution
◦ Examples: binary responses, Poisson counts, gamma responses, Gaussian responses, etc.
◦ Correlation of measurements in the same cluster is often modelled using random effects
◦ When covariates are involved: logistic/probit mixed effects regression, Poisson mixed effects regression, etc.
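The interpretation gap between a GLIM fitted to clustered data and a GLMM can be made concrete with a small numerical sketch (a sketch only; all parameter values below are hypothetical): in a random-intercept logistic model, averaging the subject-specific success probabilities over the random-intercept distribution attenuates the slope, so the population-averaged log-odds ratio is smaller in magnitude than the subject-specific coefficient β1.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical subject-specific (conditional) model:
#   logit P(yij = 1 | bi) = beta0 + beta1 * x + bi,  bi ~ N(0, sigma_b^2)
beta0, beta1, sigma_b = -1.0, 1.0, 2.0

def marginal_prob(x, n_grid=4001):
    # Average the conditional probability over the random-intercept
    # distribution by simple grid integration over +/- 6 SD.
    lo = -6.0 * sigma_b
    h = 12.0 * sigma_b / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lo + k * h
        w = math.exp(-0.5 * (b / sigma_b) ** 2) / (sigma_b * math.sqrt(2.0 * math.pi))
        total += expit(beta0 + beta1 * x + b) * w * h
    return total

p0, p1 = marginal_prob(0.0), marginal_prob(1.0)
marginal_logOR = math.log(p1 / (1.0 - p1)) - math.log(p0 / (1.0 - p0))
# The population-averaged slope is attenuated towards 0 relative to beta1
print(marginal_logOR, beta1)
```

The attenuation grows with σb; this is exactly why GLMM coefficients have a subject-specific and GLIM coefficients a population-averaged interpretation.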
◦ GLIM: clustering in the data (if present) is ignored; the regression coefficients have a population-averaged interpretation
◦ GLMM: the regression coefficients have a subject-specific interpretation

Bayesian generalized linear mixed model (BGLMM): GLMM with all parameters given a prior distribution

Example IX.13 – Lip cancer study – Poisson-lognormal model

aff = percentage of the population engaged in agriculture, forestry and fisheries

Aim: verify how much of the heterogeneity of SMRi can be explained by aff_i

• Model:
  yi | μi ∼ Poisson(μi), with μi = θi ei
  log(μi/ei) = β0 + β1 aff_i + b0i
  b0i ∼ N(0, σ²b0),  aff_i centered

• Priors:
◦ Independent vague normal priors for the regression coefficients
◦ σb0 ∼ U(0, 100)

Results:

• WinBUGS program: chapter 9 lip cancer PLNT with AFF.odc
◦ Three chains of 20,000 iterations each, 3 × 10,000 burn-in
◦ Rapid convergence

• Posterior summary measures on the last 3 × 5,000 iterations
◦ β̄1 (SD) = 2.23 (0.33), 95% equal-tail CI [1.55, 2.93]
◦ With aff: σb0ᴹ = 0.38
◦ Without aff: σb0ᴹ = 0.44
◦ aff explains 25% of the variability of the random intercept

• Analysis also possible with the SAS procedure MCMC

Example IX.14 – Toenail RCT – A Bayesian logistic RI model

Aim: compare the evolution of onycholysis over time between two treatments

Response: yij: 0 = none or mild, 1 = moderate or severe

Model:
  logit(πij) = β0 + β1 tij + β2 tij × treati + b0i
  b0i ∼ N(0, σ²b0)

Priors:
◦ Independent vague normal priors for the regression coefficients
◦ σb0 ∼ U(0, 100)

[Figure: observed mean profiles per treatment arm]
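For such a random-intercept logistic model the intra-class correlation is defined on the latent logit scale, where the residual has the standard-logistic variance π²/3. A minimal sketch, using the posterior summary σ²b0 = 17.44 reported in the results for this model:

```python
import math

# ICC on the latent (logit) scale of a random-intercept logistic model:
#   ICC = sigma_b0^2 / (sigma_b0^2 + pi^2 / 3),
# where pi^2/3 is the variance of the standard logistic distribution.
sigma_b0_sq = 17.44  # posterior summary reported in the slides
icc = sigma_b0_sq / (sigma_b0_sq + math.pi ** 2 / 3.0)
print(round(icc, 2))  # → 0.84
```

The large ICC reflects the very strong within-patient correlation of the binary onycholysis outcome.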
Results:

• WinBUGS program: chapter 9 toenail RI BGLMM.odc
◦ Three chains of 10,000 iterations each, 3 × 5,000 burn-in
◦ Rapid convergence

• Posterior summary measures on the last 3 × 5,000 iterations
◦ β̄0 (SD) = −1.74 (0.34)
◦ β̄1 (SD) = −0.41 (0.045)
◦ β̄2 (SD) = −0.17 (0.069)
◦ σ²b0ᴹ = 17.44
◦ ICC = σ²b0 / (σ²b0 + π²/3) = 0.84

• Frequentist analysis with SAS GLIMMIX: similar results, see program chapter 9 toenail binary GLIMMIX and MCMC.sas

Example IX.15 – Toenail RCT – A Bayesian logistic RI+RS model

Aim: compare the evolution of onycholysis over time between two treatments

Response: yij: 0 = none or mild, 1 = moderate or severe

• Model:
  logit(πij) = β0 + β1 tij + β2 tij × treati + b0i + b1i tij

◦ Similar distributional assumptions and priors as before
◦ WinBUGS program: chapter 9 toenail RI+RS BGLMM.odc
◦ Numerical difficulties due to overflow: use of the min and max functions

Other BGLMMs

Extension of the BGLMM with a residual term:

g(μij) = xijᵀ β + zijᵀ bi + εij  (j = 1, . . . , mi; i = 1, . . . , n)

WinBUGS program: chapter 9 lip cancer PLNT with AFF.odc

Further extensions:
◦ Replace the Gaussian assumption by a t-distribution
◦ t-distribution with the degrees of freedom as a parameter

Other extensions: write down the full conditionals

Some further extensions

In practice more models are needed:
• Non-linear mixed effects models (Section 9.4.4)
• Ordinal logistic random effects models
• Longitudinal models with a multivariate outcome
• Frailty models
• ...
See WinBUGS Manuals I and II, and the literature

9.4.6 Estimation of the random effects and PPDs

Advantages of Bayesian modelling:
• Much easier to deviate from normality assumptions
• Much easier to fit more complex models

Prediction/estimation:

• Bayesian approach: estimating random effects = estimating fixed effects
• Prediction/estimation of individual curves in longitudinal models: λᵀβ̂ + λᵀb̂i, with β̂ and b̂i posterior means, medians or modes
• PPD: see the Poisson-gamma model, but replace θi by bi
• Estimation via MCMC

Example IX.19 – Toenail RCT – Exploring the random effects

Software:
◦ WinBUGS (CODA command) + post-processing with R
◦ R2WinBUGS

[Figure: histograms of the estimated random intercepts and random slopes]

Predicting individual evolutions with WinBUGS:

• Predicted evolution for the ith subject: xijᵀβ + zijᵀbi
• For an existing subject (id[iobs]): newresp[iobs] ~ dnorm(mean[iobs], tau.eps)
• For a new subject: sample the random effect from its distribution
• Prediction of a missing response (NA): automatically done
• Extra WinBUGS commands
• Computation: Stats table WinBUGS + R, CODA + R, R2WinBUGS

[Figure: predicted profiles; predicted profile for itraconazol patient 3]

9.4.7 Choice of the level-2 variance prior

• Bayesian linear regression: NI prior for σ² = Jeffreys prior p(σ²) ∝ 1/σ²

• Bayesian Gaussian hierarchical model: NI prior for σθ²: p(σθ²) ∝ 1/σθ²??

Example IX.20 – Dietary study* – NI prior for the level-2 variance

• Modified dietary study: chapter 9 dietary study chol2.odc

Prior 1: σθ² ∼ IG(10⁻³, 10⁻³)
Prior 2: σθ ∼ U(0, 100)

Prior 3: suggestion of Gelman (2006); gives similar results as with U(0, 100)

◦ Theoretical + intuitive results: improper posterior
◦ Note: this is the Jeffreys prior from another model, not the Jeffreys prior of the hierarchical model

• Possible solution: IG(ε, ε) with ε small (WinBUGS)?
◦ No: see the application

• Results:
◦ Posterior distribution of σθ: clear impact of the IG-prior
◦ Trace plot of μ: regularly stuck with the IG-prior

• Solutions:
◦ U(0, c) prior on σθ
◦ Parameter-expanded model, suggestion of Gelman (2006)

[Figure: effect of the NI prior choice on the posterior distribution of σθ² (flat prior vs IG prior) and on the trace plot of μ]

Choice of the level-2 variance prior: prior for the level-2 covariance matrix?

• Classical choice: inverse Wishart
• Simulation study of Natarajan and Kass (2000): problematic
• Solution:
◦ In general: not clear
◦ 2 dimensions: uniform prior on both σ's and on the correlation
◦ IW(D, 2) with small diagonal elements ≈ uniform priors, but slower convergence with uniform priors

9.4.8 Propriety of the posterior

• Impact of improper (NI) priors on the posterior: theoretical results available
◦ Important for SAS, since it allows improper priors
• Proper NI priors can also have too much impact
◦ Important for WinBUGS
• Improper priors can yield proper full conditionals

9.4.9 Assessing and accelerating convergence

Checking convergence of the Markov chain of a Bayesian hierarchical model:
• Similar to checking convergence in any other model
• But, large number of parameters ⇒ make a selection

Hierarchical centering:

Uncentered Gaussian hierarchical model:

yij = μ + αi + εij,  αi ∼ N(0, σα²)

Centered Gaussian hierarchical model:

yij = θi + εij,  θi ∼
N(μ, σα²)

Accelerating convergence of the Markov chain of a Bayesian hierarchical model:
• Use tricks seen before (centering, standardizing, overrelaxation, etc.)
• For mi = m: hierarchical centering implies faster mixing when σα² > σ²/m
• Similar results for multilevel BLMMs and BGLMMs
• Specific tricks for a hierarchical model:
◦ hierarchical centering
◦ (reparameterization by sweeping)
◦ (parameter expansion)

Take home messages

• Frequentist solutions are often close to Bayesian solutions, when the frequentist approach can be applied
• There are specific problems that one has to take care of when fitting Bayesian mixed models
• Hierarchical models assume exchangeability of the level-2 observations, which might be a strong assumption. When satisfied, it allows for better prediction making use of the ‘borrowing-strength’ principle

Chapter 10 Model building and assessment

Aims:
◦ Look at Bayesian criteria for model choice
◦ Look at Bayesian techniques that evaluate the chosen model

10.1 Introduction

• Bayesian procedures for model building and model criticism ⇒ select an appropriate model
• In this chapter: model selection from a few good candidate models
• Chapter 11: model selection from a large set of candidate models
• Bayesian model building and criticism are similar to their frequentist counterparts, except that
◦ a Bayesian model is the combination of a likelihood and a prior ⇒ 2 choices
◦ Bayesian inference is based on MCMC techniques, while frequentist approaches rely on asymptotic inference

• In this chapter: explorative tools that check and improve the fitted model:
◦ Model selection: Deviance Information Criterion (DIC)
◦ Model criticism: Posterior Predictive Check (PPC)
◦ For other criteria, see the book

10.2 Measures for model selection
10.2.1 The Bayes factor

Use of Bayesian criteria for choosing the most appropriate model:
• Briefly: the Bayes factor
• Extensively: the DIC

• In Chapter 3 we have seen that the Bayes factor can be used to test statistical hypotheses in a Bayesian manner
• The Bayes factor can also be used for model selection. In the book: an illustration choosing between the binomial, Poisson and negative binomial distribution for the dmft-index
• Computing the Bayes factor is often quite involved, and care must be exercised when it is combined with non-informative priors
• The pseudo-Bayes factor: a variation of the Bayes factor that is practically feasible
• Bayes factors must be computed outside WinBUGS

10.2.2 Information theoretic measures for model selection

Most popular frequentist information criteria: AIC and BIC

• Both adjust for model complexity: effective degrees of freedom (≤ number of model parameters)

• Akaike's information criterion (AIC), developed by Akaike (1974):

  AIC = −2 log L(θ̂(y) | y) + 2p

• Bayesian Information Criterion (BIC), suggested by Schwarz (1978):

  BIC = −2 log L(θ̂(y) | y) + p log(n)

• Deviance Information Criterion (DIC), suggested by Spiegelhalter et al. (2002): DIC = a generalization of the AIC to Bayesian models

Practical aspects of AIC, BIC, DIC

• AIC, BIC and DIC adhere to Occam's Razor principle
• Practical rule for choosing a model:
◦ Choose the model with the smallest AIC/BIC/DIC
◦ Difference of ≥ 5: substantive evidence
◦ Difference of ≥ 10: strong evidence

Number of parameters ⇔ effective degrees of freedom

Example X.5+6: Theoretical example – Conditional/marginal likelihood of an LMM

What is the effective degrees of freedom ρ of a statistical model?
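Before answering, a quick numeric illustration of the AIC/BIC formulas above (the two model fits are hypothetical):

```python
import math

def aic(loglik, p):
    # AIC = -2 log L(theta_hat | y) + 2 p
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    # BIC = -2 log L(theta_hat | y) + p log(n)
    return -2.0 * loglik + p * math.log(n)

# Hypothetical fits on n = 100 observations: model B spends 2 extra
# parameters for a gain of only 1.5 in maximized log-likelihood.
aic_A, bic_A = aic(-150.0, 4), bic(-150.0, 4, 100)
aic_B, bic_B = aic(-148.5, 6), bic(-148.5, 6, 100)
print(aic_A, aic_B)   # model A wins on AIC ...
print(bic_A, bic_B)   # ... and more clearly on BIC, which penalizes harder
```

Because the BIC penalty p log(n) exceeds the AIC penalty 2p once n > e² ≈ 7.4, BIC favors the smaller model more strongly.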
Statistical model of Example IX.7:

• Two-level Bayesian Gaussian hierarchical model:
  Level 1: yij | θi, σ² ∼ N(θi, σ²)  (j = 1, . . . , mi; i = 1, . . . , 8)
  Level 2: θi | μ, σθ² ∼ N(μ, σθ²)  (i = 1, . . . , 8)
  Priors:  σ² ∼ IG(10⁻³, 10⁻³), σθ ∼ U(0, 100), μ ∼ N(0, 10⁶)

• Conditional likelihood (θi random, with θi | μ, σθ² ∼ N(μ, σθ²)):

  L_C(θ, σ² | y) = ∏_{i=1}^{8} ∏_{j=1}^{mi} N(yij | θi, σ²)

• Marginal likelihood (averaged over the random effects):

  L_M(μ, σθ², σ² | y) = ∫ . . . ∫ ∏_{i=1}^{8} ∏_{j=1}^{mi} N(yij | θi, σ²) N(θi | μ, σθ²) dθ1 . . . dθ8

• For a mixed model, the number of parameters is:
  Let p = # fixed effects, v = # variance parameters, q = # random effects
◦ Marginal model (marginalized over the random effects): p + v
◦ Conditional model (random effects in the model): p + v + q

• # free parameters (effective degrees of freedom):
◦ Marginal model: ρ = p + v
◦ Conditional model: p + v ≤ ρ ≤ p + q + 1
  with
  ◦ p + v = degrees of freedom when the random effects are integrated out
  ◦ p + q + 1 = degrees of freedom when the clusters are treated as fixed effects
◦ For the model above: marginal ρ = 3; conditional: 1 + 1 ≤ ρ ≤ 8 + 1

Bayesian definition of effective degrees of freedom

• Bayesian effective degrees of freedom: pD = D̄(θ) − D(θ̄)
◦ Bayesian deviance: D(θ) = −2 log p(y | θ) + 2 log f(y)
◦ D̄(θ) = posterior mean of D(θ)
◦ D(θ̄) = D(θ) evaluated at the posterior mean

Deviance information criterion

• DIC as a Bayesian model selection criterion: DIC = D(θ̄) + 2 pD = D̄(θ) + pD

• Both DIC and pD can be calculated from an MCMC run:
◦ θ¹, . . .
, θᴷ = a converged Markov chain
◦ D̄(θ) ≈ (1/K) Σ_{k=1}^{K} D(θᵏ)
◦ D(θ̄) ≈ D((1/K) Σ_{k=1}^{K} θᵏ)
◦ f(y): typically the saturated density, but not used in WinBUGS

• Motivation: the more freedom the model has, the more D(θ̄) will deviate from D̄(θ)
• pD = ρ (the classical degrees of freedom) for a normal likelihood with a flat prior for μ and a fixed variance
• DIC & pD are subject to sampling variability
• Practical rule for choosing a model: as with AIC/BIC

Focus of a statistical analysis

• The focus of the research determines the degrees of freedom: population ⇔ cluster
• Prediction under the conditional and the marginal likelihood:
◦ Predictive ability of the conditional model: focus on the current clusters (current random effects)
◦ Predictive ability of the marginal model: focus on future clusters (random-effects distribution)
• Example:
◦ Multi-center RCT comparing A with B
◦ Population focus: interest in the overall treatment effect = β
◦ Cluster focus: interest in the treatment effect in center k = β + bk

DIC and pD – Use and pitfalls

Examples to illustrate the use and pitfalls of pD and DIC:
• Use of WinBUGS to compute pD and DIC (Example X.7)
• ρ = pD for a linear regression analysis (Example X.7)
• Difference between the conditional and the marginal DIC (Example X.9)
• Impact of the variability of the random effects on the DIC (Example X.10)
• Impact of the scale of the response on the DIC (Example X.10)
• Negative pD (Example X.11)

Example X.8: Growth curve study – Model selection using pD and DIC

• Well-known data set from Potthoff & Roy (1964)
• Dental growth measurements of a distance (mm)
• 11 girls and 16 boys at ages (years) 8, 10, 12, and 14
• Variables are gender (1 = female, 0 = male) and age
• Gaussian linear mixed models were fit to the longitudinal profiles
• WinBUGS: chapter 10 Potthoff-Roy growthcurves.odc

[Figure: individual growth profiles]
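The D̄, D(θ̄), pD bookkeeping can be sketched on a toy normal-mean model (a sketch mimicking the definitions above, not WinBUGS's internal computation; data and chain are simulated here). With one free parameter and a flat prior, pD should come out close to 1:

```python
import math
import random

random.seed(1)
y = [1.2, 0.8, 1.9, 0.3, 1.1, 1.6, 0.7, 1.4]   # toy data, sigma assumed known
n, sigma = len(y), 1.0
ybar = sum(y) / n

def deviance(mu):
    # D(mu) = -2 log p(y | mu); the f(y) term is dropped, as in WinBUGS
    return sum((yi - mu) ** 2 / sigma ** 2 + math.log(2.0 * math.pi * sigma ** 2)
               for yi in y)

# Under a flat prior the posterior of mu is N(ybar, sigma^2/n);
# draw a "converged chain" directly from it.
chain = [random.gauss(ybar, sigma / math.sqrt(n)) for _ in range(20000)]

Dbar = sum(deviance(mu) for mu in chain) / len(chain)   # posterior mean of D
Dhat = deviance(sum(chain) / len(chain))                # D at the posterior mean
pD = Dbar - Dhat
DIC = Dhat + 2.0 * pD                                   # equivalently Dbar + pD
print(pD, DIC)
```

Replacing the flat prior by an informative one shrinks the posterior spread of μ and hence pD, illustrating why pD measures the *effective* number of parameters.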
• Choice evaluated with pD and DIC

Models:

Model M1:
  yij = β0 + β1 agej + β2 genderi + β3 genderi × agej + b0i + b1i agej + εij
  yij = distance measurement
  bi = (b0i, b1i)ᵀ: random intercept and slope with distribution N((0, 0)ᵀ, G)

  G = [ σ²b0        ρ σb0 σb1 ]
      [ ρ σb0 σb1   σ²b1      ]

  εij ∼ N(0, σ0²) for boys, εij ∼ N(0, σ1²) for girls

The total number of parameters for model M1 = 63:
4 fixed effects, 54 random effects, 3 + 2 variances (RE + ME)

Alternative models:

Model M2: model M1, but assuming ρ = 0
Model M3: model M2, but assuming σ0 = σ1
Model M4: model M1, but assuming σ0 = σ1
Model M5: model M1, but bi, εij ∼ t3-(scaled) distributions
Model M6: model M1, but εij ∼ t3-(scaled) distribution
Model M7: model M1, but bi ∼ t3-(scaled) distribution

Nested model comparisons: (1) M1, M2, M3; (2) M1, M2, M4; (3) M5, M6, M7

pD and DIC of the models:

Model  Dbar     Dhat     pD      DIC
M1     343.443  308.887  34.556  377.999
M2     344.670  312.216  32.454  377.124
M3     376.519  347.129  29.390  405.909
M4     374.065  342.789  31.276  405.341
M5     328.201  290.650  37.552  365.753
M6     343.834  309.506  34.327  378.161
M7     326.542  288.046  38.047  364.949

Initial and final model:

       model M1                         model M5
Node   Mean    SD     2.5%    97.5%     Mean    SD     2.5%    97.5%
β0     16.290  1.183  14.010  18.680    17.040  0.987  15.050  18.970
β1     0.789   0.103  0.589   0.989     0.694   0.086  0.526   0.867
β2     1.078   1.453  -1.844  3.883     0.698   1.235  -1.716  3.154
β3     -0.309  0.126  -0.549  -0.058    -0.243  0.106  -0.454  -0.035
σb0    1.786   0.734  0.385   3.381     1.312   0.541  0.341   2.502
σb1    0.139   0.065  0.021   0.280     0.095   0.052  0.007   0.209
ρ      -0.143  0.500  -0.829  0.931     -0.001  0.503  -0.792  0.942
σ0     1.674   0.183  1.362   2.074     1.032   0.154  0.764   1.365
σ1     0.726   0.111  0.540   0.976     0.546   0.101  0.382   0.783

Example X.9: Growth curve study – Conditional and marginal DIC
(Conditional) model M4:

  yij | bi ∼ N(μij, σ²)  (j = 1, . . . , 4; i = 1, . . . , 27)
  μij = β0 + β1 agej + β2 genderi + β3 genderi × agej + b0i + b1i agej
  bi ∼ N((0, 0)ᵀ, G)

Deviance: D_C(μ, σ²) ≡ D_C(β, b, σ²) = −2 Σi Σj log N(yij | μij, σ²)

Interest (focus): the current 27 bi's ⇒ DIC = conditional DIC
WinBUGS: pD = 31.282 and DIC = 405.444

Marginal(ized) M4:

  yi ∼ N4(Xi β, Z G Zᵀ + R)  (i = 1, . . . , 27), with yi = (yi1, yi2, yi3, yi4)ᵀ

       [ 1  8   genderi  8 × genderi  ]        [ 1  8  ]
  Xi = [ 1  10  genderi  10 × genderi ],  Z =  [ 1  10 ],  R = σ²I4
       [ 1  12  genderi  12 × genderi ]        [ 1  12 ]
       [ 1  14  genderi  14 × genderi ]        [ 1  14 ]

Deviance: D_M(β, σ², G) = −2 Σi log N4(yi | Xi β, Z G Zᵀ + R)

Interest (focus): future bi's ⇒ DIC = marginal DIC
WinBUGS: pD = 7.072 and DIC = 442.572

Frequentist analysis of M4: maximization of the marginal likelihood
◦ SAS procedure MIXED: p = 8 and AIC = 443.8
◦ chapter 10 Potthoff-Roy growthcurves.sas

DIC and pD – Additional remarks

Some results on pD and DIC:
• pD and DIC depend on the parameterization of the model
• Several versions of DIC & pD exist:
◦ DIC in R2WinBUGS (classical definition) is overoptimistic (uses the data twice)
◦ R2jags: pD = var(deviance) / 2
◦ rjags: classical DIC & DIC corrected for overoptimism (Plummer, 2008)
◦ Except for (R2)WinBUGS, the computation of the DIC can be quite variable

10.3 Model checking procedures

10.3.1 Introduction

The selected model is not necessarily sensible, nor does it guarantee a good fit to the data

Statistical model evaluation is needed:
1. Checking that inference from the chosen model is reasonable
2. Verifying that the model can reproduce the data
3.
Sensitivity analyses by varying certain aspects of the model

Bayesian model evaluation:
◦ As in the frequentist approach
◦ The prior also needs attention
◦ Primarily based on sampling techniques

Practical procedure

Two procedures are suggested for Bayesian model checking:
• Sensitivity analysis: varying the statistical model & the priors
• Posterior predictive checks: generate replicated data from the assumed model, and compare a discrepancy measure based on the replicated and the observed data

10.3.4 Posterior predictive check

Take Example III.8 (PPD for caries experience)

[Figure: observed dmft-distribution versus the PPD]

The observed dmft-distribution is quite different from the PPD

(i) Idea behind the posterior predictive check

Classical approach:
◦ H0: model M0 holds for the data, i.e. y ∼ p(y | θ)
◦ Sample y = {y1, . . . , yn} & θ estimated from the data
◦ Goodness-of-fit (GOF) test statistic T(y), with a large value = poor fit
◦ Use the (large-)sample distribution of the statistic under H0 to see whether the observed value of the statistic is extreme

But: in the Bayesian approach we condition on the observed data ⇒ a different route needs to be taken

Now take a summary measure from the observed distribution and see how extreme it is compared to the replicated summary measures sampled under the assumed distribution

Take, e.g.:
◦ a GOF statistic T(y), or
◦ a discrepancy measure D(y, θ) (a GOF test that depends on nuisance parameters)

Then:
◦ Generate a replicated sample ỹ = {ỹ1, . . .
, ỹn} from p(y | θ)
◦ Compare the observed GOF statistic/discrepancy measure with that based on the replicated data, taking the uncertainty about θ into account
◦ This gives the proportion of times the discrepancy measure of the observed data is more extreme than that of the replicated data = Bayesian P-value

(ii) Computation of the PPC

Computation of the PPP-value for D(y, θ) or T(y):

1. Let θ¹, . . . , θᴷ be a converged Markov chain from p(θ | y)
2. Compute D(y, θᵏ)  (k = 1, . . . , K)  (for T(y): only once)
3. Sample replicated data ỹᵏ from p(y | θᵏ)  (each of size n)
4. Compute D(ỹᵏ, θᵏ)  (k = 1, . . . , K)
5. Estimate pD by p̂D = (1/K) Σ_{k=1}^{K} I[ D(ỹᵏ, θᵏ) ≥ D(y, θᵏ) ]

When p̂D < 0.05/0.10 or p̂D > 0.90/0.95: "bad" fit of the model to the data

Example X.19: Osteoporosis study – Checking distributional assumptions

Checking the normality of tbbmc:

• (R2)WinBUGS: six PPCs in chapter 10 osteo study-PPC.R

• Skewness and kurtosis using T(y) (fixing mean and variance):

  T_skew(y) = (1/n) Σ_{i=1}^{n} ((yi − ȳ)/s)³   and   T_kurt(y) = (1/n) Σ_{i=1}^{n} ((yi − ȳ)/s)⁴ − 3

• Skewness and kurtosis using D(y, θ), with θ = (μ, σ)ᵀ:

  D_skew(y, θ) = (1/n) Σ_{i=1}^{n} ((yi − μ)/σ)³   and   D_kurt(y, θ) = (1/n) Σ_{i=1}^{n} ((yi − μ)/σ)⁴ − 3

Graphical checks:
◦ T(y): histogram of T(ỹᵏ) (k = 1, . . .
, K) with the observed T(y)
◦ D(y, θ): X-Y plot of D(ỹᵏ, θᵏ) versus D(y, θᵏ) + the 45° line

PPP-values:

• p_Dskew = 0.27 & p_Dkurt = 0.26
• p_Tskew = 0.13 & p_Tkurt = 0.055
⇒ D_skew, D_kurt are more conservative than T_skew, T_kurt

[Figure: PPP-plots (histogram + scatterplot)]

Example X.19 (continued) – Checking distributional assumptions

Checking the normality of ri = (yi − ŷi)/σ of the regression of tbbmc on bmi:

• (R2)WinBUGS: six PPCs in chapter 10 osteo study2-PPC.R

• Kolmogorov-Smirnov discrepancy measure (y(i) = the ith ordered value of y):

  D_KS(y, θ) = max_{i∈{1,...,n}} max[ Φ(y(i) | μ, σ) − (i − 1)/n, i/n − Φ(y(i) | μ, σ) ]

• Gap discrepancy measure:

  D_gap(y, θ) = max_{i∈{1,...,(n−1)}} (y(i+1) − y(i))

• None of the PPCs showed a deviation from normality

Potthoff & Roy growth curve study: Model M1 evaluated with PPCs

• R program using R2WinBUGS: LMM R2WB Potthoff Roy PPC model M1.R

• Four PPCs were considered:

PPC 1: sum of squared distances of the data (yij or ỹij) to the local mean value; two versions (m = 4):
◦ Version 1: T_SS(y) = (1/(n×m)) Σ_{i=1}^{n} Σ_{j=1}^{m} (yij − ȳi.)²
◦ Version 2: D_SS(y, θ) = (1/(n×m)) Σ_{i=1}^{n} Σ_{j=1}^{m} (yij − μij)²
◦ PPP 1 = 0.68

PPC 2 & PPC 3: skewness and kurtosis of the residuals (θ = (μ, σ)ᵀ):
◦ Skewness: D_skew(y, θ) = (1/(n×m)) Σ_{i=1}^{n} Σ_{j=1}^{m} ((yij − μij)/σ)³
◦ Kurtosis: D_kurt(y, θ) = (1/(n×m)) Σ_{i=1}^{n} Σ_{j=1}^{m} ((yij − μij)/σ)⁴ − 3

PPC 4: proportion of standardized residuals greater than 2 in absolute value:
◦ D_0.05(y, θ) = (1/(n×m)) Σ_{i=1}^{n} Σ_{j=1}^{m} I(|yij − μij|/σ > 2)

◦ PPP 2 = 0.51, PPP 3 = 0.56, PPP 4 = 0.25

Take home messages

• The Bayesian approach offers the possibility to fit complex models
• But model checking cannot and should not be avoided
• Certain aspects of model checking may take a (very) long time to complete
• The combination of WinBUGS/OpenBUGS with R is important
• R software should be sought to do the model checking more quickly

Part III More advanced Bayesian modeling

Chapter 11 Advanced modeling with Bayesian methods

• MCMC techniques allow us to fit models to complex data
• This is for many statisticians the motivation to make use of the Bayesian approach
• In this final chapter we give 2 examples of more advanced Bayesian modeling:
◦ Example 1: predicting a continuous response based on longitudinal profiles
◦ Example 2: a complex dental model relating the presence of caries on deciduous teeth to the presence of caries on adjacent permanent teeth

11.1 Example 1: Longitudinal profiles as covariates
Joint modeling: predicting BMR

• Aim: Predict BMR from anthropometric measurements
• Program commands are in BMR prediction.R

Bayesian Biostatistics - Piracicaba 2014

• Assessing the relation between anthropometric measurements and BMR is useful:
  ◦ Determination of BMR is time consuming; prediction from anthropometric measurements is quicker
  ◦ Tool for epidemiological studies to verify whether certain groups of schoolchildren have an unusual BMR
• Prediction in a longitudinal context:
  ◦ Based on measurements taken cross-sectionally
  ◦ Based on longitudinal measurements: joint modeling (here)

• Alternative approaches:
  ◦ Classical regression using all anthropometric measurements at all ages: multicollinearity + problems with missing values
  ◦ Ridge regression using all anthropometric measurements at all ages: problems with missing values
  ◦ Two-stage approach: sampling error of intercepts and slopes is not taken into account
  ◦ Joint modeling: takes missing values and sampling error into account
• The Bayesian approach takes sampling error into account automatically, but is computer intensive

Analysis

• The following longitudinal models were assumed:
  Weight:   yij = β01 + β11 ageij + β21 sexi + b01i + b11i ageij + ε1ij
  Height:   yij = β02 + β12 ageij + β22 sexi + b02i + b12i ageij + ε2ij
  log(LBM): yij = β03 + β13 ageij + β23 sexi + b03i + b13i ageij + ε3ij
  with random intercepts + random slopes independent between models and of the measurement error:
  ◦ b1i ∼ N(0, Σb1), b2i ∼ N(0, Σb2), b3i ∼ N(0, Σb3)
  ◦ ε1ij ∼ N(0, σ²ε1), ε2ij ∼ N(0, σ²ε2), ε3ij ∼ N(0, σ²ε3)
• The following main model for BMR was assumed:
  yi = β0 + β1 b01i + β2 b11i + β3 b02i + β4 b12i + β5 b03i + β6 b13i + β7 agei1 + β8 sexi + εi

Results: Longitudinal models

Param      Mean     SD    2.5%     25%     50%     75%   97.5%  Rhat  n.eff
Weight
β01      -19.17   1.80  -22.65  -20.40  -19.19  -17.98  -15.63  1.00    790
β11        4.80   0.15    4.51    4.69    4.80    4.90    5.09  1.00    750
β21       -3.99   1.04   -6.06   -4.68   -4.01   -3.28   -1.96  1.00   3000
Height
β02       69.10   2.09   65.26   67.68   69.02   70.41   73.52  1.00   3000
β12        6.49   0.17    6.15    6.39    6.50    6.61    6.80  1.00   3000
β22       -4.10   1.16   -6.42   -4.86   -4.08   -3.33   -1.83  1.00   3000
log(LBM)
β03        1.99   0.18    1.66    1.86    1.98    2.12    2.37  1.02    130
β13        0.12   0.01    0.11    0.12    0.12    0.12    0.13  1.00   2200
β23       -0.23   0.25   -0.78   -0.40   -0.23   -0.06    0.26  1.04     72

• 3 chains, 1,500,000 iterations with 500,000 burn-in and thinning = 1,000

Results: Main model for BMR

Param      Mean      SD     2.5%     25%     50%     75%   97.5%  Rhat  n.eff
β0        59.80   97.21  -130.51   -6.10   60.28  124.82  251.50  1.00   3000
β1        17.26    8.55     0.10   11.45   17.43   22.85   34.06  1.00   1300
β2       375.31   82.15   215.49  319.77  373.65  431.70  534.91  1.00   2200
β3        10.15    7.41    -4.95    5.31   10.12   15.14   24.41  1.00   2800
β4       233.95   83.18    61.67  179.57  235.55  290.52  394.70  1.00   2800
β5       119.40  102.76   -92.40   52.80  123.35  186.43  316.20  1.01    230
β6         1.49   99.83  -198.05  -67.33    4.06   69.44  188.71  1.01    410
β7       457.78   10.87   436.60  450.20  457.90  465.10  479.20  1.00   2400
β8       198.36   86.51    34.79  138.75  199.50  257.35  369.60  1.00   1200

• Averaged relative error: 11%

Results: DIC and pD

Model        Dbar       Dhat       pD       DIC
bmr6     1645.500   1637.260    8.233  1653.730
height   2298.300   2056.320  241.984  2540.290
weight   2592.730   2367.990  224.733  2817.460
lnLBM   -2885.930  -3133.980  248.054 -2637.870
total    3650.600   2927.590  723.004  4373.600

• Dbar = posterior mean of −2 log L; Dhat = −2 log L at the posterior mean of the stochastic nodes
• DIC and pD are given for each model: the 3 longitudinal models and the main model

Results: Random effects structure — presence of correlation

Further modeling:
• Looking for the correct scale in the longitudinal models and the main model
• Possible alternative: model the 4-dimensional response jointly over time
• Involve 23 other anthropometric measurements in the prediction
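The DIC quantities in the table above satisfy pD = Dbar − Dhat and DIC = Dbar + pD. A quick consistency check (a Python sketch, values copied from the table; small discrepancies are table rounding):

```python
# Check the DIC identities pD = Dbar - Dhat and DIC = Dbar + pD
# using the values reported in the DIC table above.
models = {
    # name: (Dbar, Dhat, pD, DIC)
    "bmr6":   (1645.500, 1637.260, 8.233, 1653.730),
    "height": (2298.300, 2056.320, 241.984, 2540.290),
    "weight": (2592.730, 2367.990, 224.733, 2817.460),
    "lnLBM":  (-2885.930, -3133.980, 248.054, -2637.870),
    "total":  (3650.600, 2927.590, 723.004, 4373.600),
}
for name, (dbar, dhat, pd_, dic) in models.items():
    # tolerance absorbs the rounding of the reported values
    assert abs((dbar - dhat) - pd_) < 0.02, name
    assert abs((dbar + pd_) - dic) < 0.02, name
```

Note that the total pD (723) is essentially the sum of the component pD values, as expected when the models share no parameters.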
11.2 Example 2: relating caries on deciduous teeth with caries on permanent teeth

• See the 2012 presentation and the JASA paper

Part IV: Exercises

For the exercises you need the following software:
• Rstudio + R
• WinBUGS
• OpenBUGS

Pre-course exercises
Software: Rstudio + R

Run the following R programs making use of RStudio, to get used to R and to check your software:
• dmft-score and poisson distribution.r
• gamma distributions.r

Exercises Chapter 1
Software: Rstudio + R

Exercise 1.1 Plot the binomial likelihood and the log-likelihood of the surgery example. Use chapter 1 lik and llik binomial example.R for this. What happens when x = 90 and n = 120, or when x = 0 and n = 12?

Exercise 1.2 Compute the positive predictive value (pred+) of the Folin-Wu blood test for prevalences that range from 0.05 to 0.50. Plot pred+ as a function of the prevalence.

Exercises Chapter 2
Software: Rstudio + R

Exercise 2.1 In the (fictive) ECASS 2 study, 8 out of the 100 rt-PA-treated stroke patients suffered from SICH. Take a vague prior for θ = the probability of suffering from SICH with rt-PA. Derive the posterior distribution with R. Plot the posterior distribution together with the prior and the likelihood. What is the posterior probability that the incidence of SICH is lower than 0.10? Then adapt program chapter 2 binomial lik + prior + post.R to take the ECASS 2 study data as prior, so that now the ECASS 2 data make up the likelihood.

Exercise 2.2 Plot the Poisson likelihood and log-likelihood based on the first 100 subjects from the dmft data set (dmft.txt).
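For Exercise 2.2, the Poisson log-likelihood can be evaluated on a grid of θ values; a Python sketch with hypothetical dmft counts (the real counts are in dmft.txt, which is not reproduced here):

```python
from math import log, lgamma

counts = [2, 0, 1, 4, 3, 0, 0, 2, 1, 5]  # hypothetical dmft-indexes

def poisson_loglik(theta, data):
    # log L(theta) = sum_i [ y_i * log(theta) - theta - log(y_i!) ]
    return sum(y * log(theta) - theta - lgamma(y + 1) for y in data)

# Evaluate on a grid of theta values; the maximum of the Poisson
# likelihood sits at the sample mean of the counts
grid = [0.1 * k for k in range(1, 101)]
best = max(grid, key=lambda t: poisson_loglik(t, counts))
mean = sum(counts) / len(counts)  # sample mean = MLE = 1.8 here
```

Plotting `poisson_loglik` over `grid` reproduces the shape asked for in the exercise.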
Use program chapter 2 Poisson likelihood dmft.R.

Exercise 2.3 Suppose that in the surgical example of Chapter 1 the 9 successes out of 12 surgeries were obtained as follows: SSF SSSF SSF SSSSS. Start with a uniform prior on the probability of success, θ, and show how the posterior changes when the results of the surgeries become available. Then use a Beta(0.5,0.5) prior. How do the results change? Finally, take a skeptical prior for the probability of success. Use programs chapter 2 binomial lik + prior + post2.R and chapter 2 binomial lik + prior + post3.R.

Exercise 2.4 Suppose that the prior distribution for the proportion of subjects with an elevated serum cholesterol (> 200 mg/dl), say θ, in a population is centered around 0.10, with 95% prior uncertainty (roughly) given by the interval [0.035, 0.17]. In a study based on 200 randomly selected subjects in that population, 30% suffered from hypercholesteremia. Establish the beta prior that best fits the specified prior information and combine it with the data to arrive at the posterior distribution of θ. Represent the solutions graphically. What do you conclude? Use program chapter 2 binomial lik + prior + post cholesterol.R.

Exercise 2.5 Take the first ten subjects in the dmft.txt file. Assume a Poisson distribution for the dmft-index and combine it first with a Gamma(3,1) prior and then with a Gamma(30,10) prior for θ (the mean of the Poisson distribution). Establish the posterior(s) with R. Then take all dmft-indexes and combine them with the two gamma priors. Repeat the computations. Represent your solutions graphically. What do you conclude? Use program chapter 2 poisson + gamma dmft.R.

Exercises Chapter 3
Software: Rstudio + R

Exercise 3.1 Verify with R the posterior summary measures reported in Examples III.2 and III.5. Try first to write the program yourself.
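The conjugate updating behind Exercises 2.1 and 2.3 is: Beta(α0, β0) prior + x successes in n trials gives a Beta(α0 + x, β0 + n − x) posterior. A minimal Python sketch with the ECASS 2 numbers from Exercise 2.1 (the helper name is mine, not the course program's):

```python
def beta_update(alpha0, beta0, x, n):
    """Beta prior + binomial data (x successes out of n) -> Beta posterior."""
    return alpha0 + x, beta0 + n - x

# Exercise 2.1: uniform prior Beta(1,1), 8 SICH cases out of 100 patients
a, b = beta_update(1, 1, 8, 100)      # -> Beta(9, 93)
post_mean = a / (a + b)               # about 0.088
post_mode = (a - 1) / (a + b - 2)     # equals the MLE 8/100 here
# P(theta < 0.10) would then be read off the Beta(9, 93) CDF
# (pbeta(0.10, 9, 93) in R, scipy.stats.beta.cdf(0.10, 9, 93) in Python).
```

For the sequential version in Exercise 2.3, `beta_update` is simply applied after each surgery, with the current posterior serving as the next prior.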
But, if you prefer, you can always look at the website of the book.

Exercise 3.2 Take the first ten subjects in the dmft.txt file, take a Gamma(3,1) prior and compute the PPD of a future count in a sample of size 100 with R. What is the probability that a child will have caries? Use program chapter 3 negative binomial distribution.R.

Exercise 3.3 Take the sequence of successes and failures reported in Exercise 2.3. Compute the PPD for the next 30 results. See program chapter 3 exercise 3-3.R.

Exercise 3.4 In Exercise 2.1, determine the probability that in the first interim analysis there will be more than 5 patients with SICH.

Exercise 3.5 The GUSTO-1 study is a mega-sized randomized controlled clinical trial comparing two thrombolytics: streptokinase (SK) and recombinant tissue plasminogen activator (rt-PA) for the treatment of patients with an acute myocardial infarction. The outcome of the study is 30-day mortality, a binary indicator of whether or not the treated patient had died by 30 days. The study recruited 41,021 acute infarct patients from 15 countries and 1,081 hospitals in the period December 1990 – February 1993. The basic analysis was reported in Gusto (1993) and found a statistically significantly lower 30-day mortality rate for rt-PA compared with SK. Brophy and Joseph (1995) reanalyzed the GUSTO-1 trial results from a Bayesian point of view, leaving out the subset of patients who were treated with both rt-PA and SK. As prior information the data from two previous studies were used: GISSI-2 and ISIS-3. In the table on the next page the data from the GUSTO-1 study and the two historical studies are given. The authors compared the results of streptokinase and rt-PA using the difference of the two observed proportions, equal to the absolute
risk reduction (ar), for death, nonfatal stroke and the combined endpoint of death or nonfatal stroke. Use the asymptotic normality of ar in the calculations below.

Trial    Drug    N of Patients   N of Deaths   N of Nonfatal Strokes   N of Combined Deaths or Strokes
GUSTO    SK          20,173          1,473            101                       1,574
GUSTO    rt-PA       10,343            652             62                         714
GISSI-2  SK          10,396            929             56                         985
GISSI-2  rt-PA       10,372            993             74                       1,067
ISIS-3   SK          13,780          1,455             75                       1,530
ISIS-3   rt-PA       13,746          1,418             95                       1,513

Questions:
1. Determine the normal prior for ar based on the data from (a) the GISSI-2 study, (b) the ISIS-3 study and (c) the combined data from the GISSI-2 and ISIS-3 studies.
2. Determine the posterior for ar for the GUSTO-1 study based on the above priors and a noninformative normal prior.
3. Determine the posterior belief in the better performance of streptokinase or rt-PA based on the above derived posterior distributions.
4. Compare the Bayesian analyses to the classical frequentist analyses of comparing the proportions in the two treatment arms. What do you conclude?
5. Illustrate your results graphically.
Hint: Adapt the program chapter 3 Gusto.R.

Exercise 3.6 Sample from the PPD of Example III.6, making use of the analytical results and the obtained posterior summary measures for a noninformative normal prior for μ.

Exercise 3.7 In Exercise 2.4, derive analytically the posterior distribution of logit(θ), if you dare. Or . . . compute the posterior distribution of logit(θ) via sampling and compute the descriptive statistics of this posterior distribution.

Exercise 3.8 Apply the R program chapter 3 Bayesian P-value cross-over.R to compute the contour probability of Example III.14, but with x = 17 and n = 30.

Exercise 3.9 Berry (Nature Reviews/Drug Discovery, 2006, 5, 27–36) describes an application of Bayesian predictive probabilities in monitoring RCTs.
The problem is as follows: at a first DSMB interim analysis, 4 of the 16 control patients treated for breast cancer showed a complete response, while with the experimental treatment 12 out of 18 patients showed a complete response. It was planned to treat 164 patients in total. Using a Bayesian predictive calculation, it was estimated that the probability of obtaining a statistically significant result at the end of the trial was close to 95%. This was the reason for the DSMB to advise stopping the trial. Apply the R program chapter 3 predictive significance.R to compute this probability. In a second step, apply the program with different numbers.

Exercises Chapter 4
Software: Rstudio + R, WinBUGS

Exercise 4.1 Run the R program chapter 4 regression analysis osteoporosis.R, which is based on the Method of Composition.

Exercise 4.2 Sample from a t3(μ, σ²)-distribution with μ = 5 and σ = 2 using the Method of Composition (extract the relevant part from the previous program).

Exercise 4.3 Sample from a mixture of three gamma distributions with:
◦ Mixture weights: π1 = 0.15, π2 = 0.35 and π3 = 0.55
◦ Parameters of the gamma distributions: α1 = 2.5, β1 = 1.75, α2 = 3.8, β2 = 0.67 and α3 = 6, β3 = 0.3

Exercises Chapter 5
Software: Rstudio + R

Exercise 5.1 Establish a prior for the prevalence of HIV patients in your country. Make use of the internet (or any source that you would like to use) to establish your prior. Combine this prior information with the results of a (hypothetical) survey based on 5,000 randomly chosen individuals from your population, of whom 10% are HIV-positive.

Exercise 5.2 Theoretical question: show that the Jeffreys prior for the mean parameter λ of a Poisson likelihood is given by λ^(−1/2).

Exercise 5.4 Example VI.4: What types of priors have been used?
Exercise 5.3 In chapter 5 sceptical and enthusiastic prior.R we apply a skeptical and an enthusiastic prior to interim results comparing the responder rates of two cancer treatments. Run the program, interpret the results and apply the program to the final results.

Exercises Chapter 6
Software: Rstudio + R

Exercise 6.1 Example VI.1: With chapter 6 gibbs sampling normal distribution.R, determine posterior summary measures from the sampled histograms.

Exercise 6.2 Example VI.2: Use the R program chapter 6 beta-binomial Gibbs.R to sample from the distribution f(x).

Exercises Chapter 7
Software: Rstudio + R, WinBUGS/OpenBUGS

Exercise 7.1 The file chapter 7 osteoporosis convergence.R contains the commands to perform the diagnostic plots and tests based on three CODA output files with names osteop1.txt, osteop2.txt and osteop3.txt, and one administrative file with name osteoind.txt. These files were obtained from a WinBUGS run. Run this R program and evaluate the output.

Exercise 7.2 In chapter 7 caries.odc a Bayesian logistic regression is performed. Use and adapt the R program of the previous exercise to evaluate the convergence of the model parameters (hint: produce three chains, as for the osteoporosis study). Compare the convergence of the model parameters when the regressors are centered and standardized. Perform a Bayesian probit regression model.

Exercise 7.4 Fit a Bayesian Poisson model on the dmft-counts of the first 20 children in 'dmft.txt'. Then fit a negative binomial model to the counts. Check the convergence of your sampler, and compute posterior summary statistics. This exercise requires that you build up a WinBUGS program from scratch (hint: use an existing program as a start).

Exercise 7.5 Run the WinBUGS Mice example from Volume I. This is an example of a Bayesian Weibull regression analysis.
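The Gibbs sampler exercised in Chapter 6 above alternates draws from the full conditionals. A minimal Python sketch for a standard bivariate normal with correlation ρ, where both conditionals are known normals (a toy model, not the course's beta-binomial example):

```python
import random

random.seed(1)
rho = 0.8
x, y = 0.0, 0.0          # arbitrary starting values
xs = []
# Full conditionals of a standard bivariate normal with correlation rho:
#   x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x
sd = (1 - rho ** 2) ** 0.5
for i in range(20000):
    x = random.gauss(rho * y, sd)
    y = random.gauss(rho * x, sd)
    if i >= 2000:        # discard burn-in
        xs.append(x)

mean_x = sum(xs) / len(xs)
var_x = sum((v - mean_x) ** 2 for v in xs) / len(xs)
# Marginally x ~ N(0, 1), so mean_x should be near 0 and var_x near 1
```

Because successive draws are autocorrelated (here with lag-1 correlation ρ² = 0.64), the effective sample size is far below 18,000 — exactly the issue the Chapter 7 convergence diagnostics address.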
Exercise 7.3 Apply some of the acceleration techniques to the program of Exercise 7.2.

Exercise 7.6 Use the Mice example as a guiding example to solve the following problem. The data are taken from the book of Duchateau and Janssen (The Frailty Model, Springer) and consist of the time from the last giving birth to the time of insemination of a cow. It is of interest to find predictors of the insemination time. The data can be found in WinBUGS format in insemination.odc. The following variables are studied for their predictive effect:
◦ PARITY: the number of times that the cow gave birth
◦ Dichotomized version of PARITY: FIRST = 1 if the cow gave birth only once, FIRST = 0 when the cow gave birth multiple times
◦ UREUM: ureum percentage in the milk
◦ PROTEIN: protein percentage in the milk
The cows belong to herds (clusters). Some cows are killed before the next insemination and are then censored at the time of death. If a cow is not inseminated after 300 days, it is censored at that time. The data are restricted to the first 15 clusters. The time and cluster information are captured in the following variables:
◦ TIME: time from birth to insemination (or censoring)
◦ STATUS: = 1 for events, = 0 for censored
◦ CLUSTERID: cluster identifier
The questions/tasks are: model the time to insemination using a Weibull distribution, taking censoring into account but ignoring the clustering in the data. Use a vague prior for the parameters. Then:
◦ Check the convergence of the sampler and the precision of your estimates.
◦ Compute posterior summary statistics for each parameter. What is the increase in risk of insemination for cows with multiple births (hazard ratio)?
◦ Evaluate the sensitivity of the results to the choice of priors.
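In the Weibull models of Exercises 7.5–7.6, a censored observation contributes the survivor function S(t) to the likelihood, while an event contributes the density f(t). A Python sketch of this censored Weibull log-likelihood (the times, status values and parameters below are hypothetical, not the insemination data):

```python
from math import log

def weibull_loglik(shape, scale, times, status):
    # Weibull survivor function: S(t) = exp(-(t/scale)^shape)
    # Event (status 1):    contributes log f(t) = log h(t) + log S(t)
    # Censored (status 0): contributes log S(t) only
    ll = 0.0
    for t, d in zip(times, status):
        log_surv = -((t / scale) ** shape)
        if d == 1:
            ll += log(shape / scale) + (shape - 1) * log(t / scale) + log_surv
        else:
            ll += log_surv
    return ll

# Hypothetical data: days to insemination, status 1 = event, 0 = censored
times = [40.0, 75.0, 120.0, 300.0, 15.0, 210.0]
status = [1, 1, 1, 0, 1, 0]
ll = weibull_loglik(1.3, 100.0, times, status)
```

This is the quantity that WinBUGS builds implicitly when `TIME` is given a `dweib` distribution with the censoring construct; maximizing or sampling over (shape, scale) then proceeds from `ll`.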
Exercises Chapter 8
Software: WinBUGS/OpenBUGS

Exercise 8.1 In the analysis of the osteoporosis data (file chapter 8 osteoporosis.odc), switch off the blocking mode and compare the solution to the one obtained when the blocking mode option is turned on.

Exercise 8.2 Adapt the above WinBUGS program (blocking option switched off) to analyze the osteoporosis data and:
◦ perform a simple linear regression analysis with bmi centered
◦ add the quadratic term bmi²
◦ explore how convergence can be improved
◦ give 4 starting values, apply the BGR diagnostics, export the chains

Exercise 8.3 Run WinBUGS with chapter 8 osteomultipleregression.odc. Suppose that we wish to predict tbbmc for a new individual of 65 years old, with length = 165 cm, weight = 70 kg (fill in BMI yourself) and strength variable equal to 96.25. Perform the prediction and give the 95% CI.

Exercise 8.4 Vary in Exercise 8.3 the distribution of the error term. Check in the WinBUGS manual which distributions are available.

Exercise 8.5 Program Example II.1 in WinBUGS.

Exercise 8.6 Simulate prior distributions with WinBUGS. Take informative and noninformative priors, and explore the posterior distribution in R.

Exercise 8.7 In WinBUGS the command y ∼ dnorm(mu,isigma)I(a,b) means that the random variable y is interval-censored in the interval [a,b]. WinBUGS does not have a command to specify that the random variable is truncated in that interval. In OpenBUGS the censoring command is C(a,b), while the truncation command is T(a,b). Look into chapter 8 openbugscenstrunc.odc and try out the various models to see the difference between censoring and truncation.

Exercise 8.8 Example VI.4: In chapter 6 coalmine.odc a WinBUGS program is given to analyze the coalmine disaster data.
Particular priors have been chosen to simplify sampling. Derive the posterior of θ − λ and θ/λ. Take other priors to perform what is called a sensitivity analysis.

Exercise 8.9 In chapter 5 prevalence Ag Elisa test Cysticercosis.odc you find the program used in Example V.6. Run the program and remove the constraint on the prevalence. What happens?

Exercise 8.10 Check the samplers in the program chapter 6 coalmine.odc. Note the change in samplers in the 3 programs in chapter 6 coalmine samplers.odc.

Exercises Chapter 9
Software: Rstudio + R, WinBUGS/OpenBUGS, R2WinBUGS

Exercise 9.1 A meta-analysis of 22 clinical trials on beta-blockers for reducing mortality after acute myocardial infarction was performed by Yusuf et al. ('Beta blockade during and after myocardial infarction: an overview of the randomized trials. Prog Cardiovasc Dis. 1985;27:335–371'). The authors performed a classical analysis. Here the purpose is to perform a Bayesian meta-analysis. In the WinBUGS file chapter 9 beta-blocker.odc two Bayesian meta-analyses are performed. Tasks:
◦ Use the program in Model 1 (and Data 1, Inits 1/1) to evaluate the overall mean effect of the beta-blockers and the individual effects of each of the beta-blockers. Evaluate also the heterogeneity in the effect of the beta-blockers. Rank the studies according to their baseline risk.
◦ Suppose a new study with 2 × 100 patients is planned. What is the predictive distribution of the number of deaths in each arm? For this use Data 2 and Inits 1/2.

Exercise 9.4 In chapter 9 dietary study-R2WB.R an R2WinBUGS analysis is performed on the IBBENS data. Run this program and perform some diagnostic tests in R to check the convergence of the chain.

Exercise 9.5 Use chapter 9 dietary study chol2.odc to examine the dependence of the posterior of σθ on the choice of ε in the inverse gamma 'noninformative' prior IG(ε, ε).
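The pooling behind these meta-analysis exercises weights each study's log odds ratio by its precision (inverse variance); under normal approximations and a flat prior, the fixed-effect pooled posterior mean is the precision-weighted average. A toy Python sketch with hypothetical 2×2 counts (not the beta-blocker data, and a fixed-effect simplification of the hierarchical Model 1):

```python
from math import log

def log_or_and_var(a, b, c, d):
    # a/b = events/non-events on treatment, c/d on control;
    # Woolf variance of the log odds ratio
    return log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

# Hypothetical studies: (treated events, treated non-events,
#                        control events, control non-events)
studies = [(15, 85, 25, 75), (8, 92, 14, 86), (30, 170, 40, 160)]
ests = [log_or_and_var(*s) for s in studies]

# Precision-weighted (inverse-variance) pooled estimate
weights = [1 / v for _, v in ests]
pooled = sum(w * m for (m, _), w in zip(ests, weights)) / sum(weights)
pooled_var = 1 / sum(weights)
# pooled log OR < 0 here: fewer events on treatment in each toy study
```

The shrinkage the exercise asks you to illustrate arises when this pooling is done hierarchically: each study estimate is pulled toward the overall mean, more strongly the less precise it is.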
(Exercise 9.1, continued) Use the program in Model 2 (and Data 2, Inits 2/2) to determine the predictive distribution of the number of deaths in each arm of the new study (with 2 × 100 patients recruited). What is the difference with the previous analysis? Why is there a difference? Rank the studies according to their baseline risk. Illustrate the shrinkage effect of the estimated treatment effects compared to the observed treatment effects.

Exercise 9.2 Perform a Bayesian meta-analysis using the WinBUGS program in chapter 9 sillero meta-analysis.odc. Perform a sensitivity analysis by varying the prior distributions and the normal distribution of the log odds ratio. Run the program also in OpenBUGS.

Exercise 9.3 In chapter 9 toenail RI BGLMM.odc a logistic random-intercept model is given to relate the presence of moderate to severe onycholysis to time and treatment. Run the program and evaluate the convergence of the chain. Vary also the assumed distribution of the random intercept.

Exercise 9.6 Example IX.12: Check the impact of choosing an inverse Wishart prior for the covariance matrix of the random intercept and slope on the estimation of the model parameters, by varying the parameters of the prior. Compare this solution with the solution obtained from taking a product of three uniform priors: on the standard deviation of the random intercept, on that of the random slope, and on their correlation, respectively. Use chapter 9 toenail LMM.odc.

Exercise 9.7 Medical device trials are nowadays often designed and analyzed in a Bayesian way. The Bayesian approach allows the inclusion of prior information from previous studies. It is important to reflect on the way subjective information is included in a new trial. A too strong prior may influence the current results too heavily and sometimes render the planning of a new study obsolete.
A suggested approach is to incorporate the past trials together with the current trial into a hierarchical model. In chapter 9 prior MACE.odc fictive data are given on 6 historical studies of the effect of a cardiovascular device in terms of a 30-day MACE rate (a composite score of major cardiovascular events); see Pennello and Thompson (J Biopharm Stat, 2008, 18:1, 81–115). For the new device the claim is that the probability of 30-day MACE, p1, is less than 0.249. The program evaluates what the prior probability of this claim is based on the historical studies. Run this program and check also, together with the paper (pages 84–87), how this prior claim can be adjusted.

Exercises Chapter 10
Software: Rstudio + R, WinBUGS/OpenBUGS

Exercise 10.1 Replay the growth curve analyses in the WinBUGS program chapter 10 Pothoff-Roy growthcurves.odc and evaluate with DIC and pD which models perform well.

Exercise 10.2 In chapter 10 Poisson models on dmft scores.odc several models are fit to the dmft-index of 276 children of the Signal-Tandmobiel study. Evaluate the best performing model. Confirm that the Poisson-gamma model has a different DIC and pD from the negative binomial model. Perform a chi-squared type of posterior-predictive check.

Exercise 10.3 Example X.19: In chapter 10 osteo study-PPC.R the response tbbmc is checked for normality with 6 PPCs. The program uses R2WinBUGS. Run the program and check the results. Perform also the gap test. Apply this approach to test the normality of the residuals when regressing tbbmc on age and bmi.

Exercise 10.4 In chapter 10 caries with residuals.odc a logistic regression model is fitted to the CE data of the Signal-Tandmobiel study of Example VI.9 with some additional covariates, i.e. gender of the child (1 = girl), age of the child, whether or not the child takes sweets to school, and whether or not the child frequently brushes his or her teeth.
Apply the program and use the option Compare to produce various plots based on εi. Are there outliers? Do the plots indicate a special pattern in the residuals? In the same program a Poisson model is also fitted to the dmft-indices. Apply a PPC with the χ²-discrepancy measure to check the Poisson assumption.

Part V: Medical papers

Bayesian methodology is slowly being introduced into clinical trial and epidemiological research. Below we provide some references to medical/epidemiological papers where Bayesian methodology has been used.

Methodological papers highlighting the usefulness of Bayesian methods in medical research are:

Lewis, R.J. and Wears, R.L. An introduction to the Bayesian analysis of clinical trials. Annals of Emergency Medicine, 1993, 22, 8, 1328–1336

Diamond, G.A. and Kaul, S. Prior convictions. Bayesian approaches to the analysis and interpretation of clinical megatrials. JACC, 2004, 43, 11, 1929–1939

Gill, C.J., Sabin, L. and Schmid, C.H. Why clinicians are natural Bayesians. BMJ, 2005, 330, 1080–1083

Berry, D.A. Bayesian clinical trials. Nature Reviews/Drug Discovery, 2006, 5, 27–36

Bayesian methods in clinical trials

Monitoring: Cuffe, M.S., Califf, R.M., Adams, K.F. et al. Short-term intravenous milrinone for acute exacerbation of chronic heart failure: A randomized controlled trial. JAMA, 2002, 287, 12, 1541–1547
Evaluation of safety during the study conduct in a Bayesian manner.

Monitoring: The GRIT Study Group. A randomised trial of timed delivery for the compromised preterm fetus: short term outcomes and Bayesian interpretation. BJOG: An International Journal of Obstetrics and Gynaecology, 2003, 110, 27–32
Evaluation of safety during the study conduct in a Bayesian manner.

Planning and analysis: Holmes, D.R., Teirstein, P., Satler, L. et al.
Sirolimus-eluting stents vs vascular brachytherapy for in-stent restenosis within bare-metal stents. JAMA, 2006, 295, 11, 1264–1273
Bayesian methods were used for the trial design and the formal primary analysis, using a Bayesian hierarchical model approach to decrease the sample size.

Planning and analysis: Stone, G.W., Martin, J.L., de Boer, M-J. et al. Effect of supersaturated oxygen delivery on infarct size after percutaneous coronary intervention in acute myocardial infarction. Circulation, 2009, Sept 15, DOI: 10.1161/CIRCINTERVENTIONS.108.840066
Bayesian methods were used for the trial design and the formal primary analysis, using a Bayesian hierarchical model approach to decrease the sample size.

Sensitivity analysis: Jafar, T.H., Islam, M., Bux, R. et al. Cost-effectiveness of community-based strategies for blood-pressure control in a low-income developing country: Findings from a cluster-randomized, factorial-controlled trial. Circulation, 2011, 124, 1615–1625
Bayesian methods were used to vary parameters when computing the cost-effectiveness.

Planning and analysis: Shah, P.L., Slebos, D-J., Cardosa, P.F.G. et al. Bronchoscopic lung-volume reduction with Exhale airway stents for emphysema (EASE trial): randomised sham-controlled, multicenter trial. The Lancet, 2011, 378, 997–1005
A Bayesian adaptive design was used to set up the study; the primary efficacy and safety endpoints were analysed using a Bayesian approach.

Planning and analysis: Wilber, D.J., Pappone, C., Neuzil, P. et al. Comparison of antiarrhythmic drug therapy and radiofrequency catheter ablation in patients with paroxysmal atrial fibrillation. JAMA, 2010, 303, 4, 333–340
A Bayesian design was used to set up the study (allowing for stopping for futility); the primary efficacy endpoint was analysed using a Bayesian approach. In addition, frequentist methods were applied for confirmation.
Planning and analysis: Cannon, C.P., Dansky, H.M., Davidson, M. et al. Design of the DEFINE trial: Determining the efficacy and tolerability of CETP INhibition with AnacEtrapib. Am Heart J, 2009, 158, 4, 513.e3–519.e3
Cannon, C.P., Shah, S., Hayes, M. et al. Safety of anacetrapib in patients with or at high risk for coronary heart disease. NEJM, 2010, 363, 25, 2406–2415
A preplanned Bayesian analysis was specified to evaluate the sparse CV events in the light of previous studies. The other analyses have a frequentist nature.

Guidance document: The FDA guidance document for the planning, conduct and analysis of medical device trials is also a good introductory text for the use of Bayesian analysis in clinical trials in general (google 'FDA medical devices guidance').

Bayesian methods in epidemiology

Two relatively simple papers on the application of hierarchical models in epidemiology:

Multilevel model: Macnab, Y.C., Qiu, Z., Gustafson, P. et al. Hierarchical Bayes analysis of multilevel health services data: A Canadian neonatal mortality study. Health Services & Outcomes Research Methodology, 2004, 5, 5–26

Spatial model: Zhu, L., Gorman, D.M. and Horel, S. Hierarchical Bayesian spatial models for alcohol availability, drug "hot spots" and violent crime. International Journal of Health Geographics, 2006, 5:54, DOI: 10.1186/1476-072X-5-54