Bayesian versus Frequentist statistical analyses:
what’s the difference?
Jeff Presneill
Intensive Care Unit
Royal Brisbane and Women’s Hospital
18th Annual Meeting on Clinical Trials in Intensive Care
ANZICS Clinical Trials Group
Noosa 8th March 2016
Acknowledgements
• Dobson AJ, Barnett AG. An introduction to generalized linear models.
3rd ed. CRC Press, 2008.
• Gelman A, et al. Bayesian Data Analysis. 3rd ed. CRC Press, 2014.
• Ghahramani S. Fundamentals of probability. 2nd ed. Prentice-Hall, 2000.
• Tijms H. Understanding probability. 3rd ed. Cambridge University Press, 2012.
• Berry DA. Bayesian clinical trials. Nature Reviews Drug Discovery 2006;5:27-36.
• https://commons.wikimedia.org
• Amanda Martin, Administration Manager and PA to the Director, ANZIC-RC
This presentation contains:
• Formulae
and references to:
• Mathematics
• Statistics
• Religion
• History
Any resemblance of any image in this presentation to any world class researcher in the CTG is purely a chance event (you decide the probability!)
Frequentist (= Classical) Interpretation of probability
• Current standard statistical approach
– Bayesian use rising due to advanced computer simulation options
• Experiment = one of an infinite sequence of possible
repetitions of the same experiment, each producing
statistically independent results
• Statistical hypothesis testing (true or false)
• Confidence intervals
• Unknown parameters have fixed but unknown values
Bayes
• Reverend Thomas Bayes (c. 1702–1761)
• Fellow of the Royal Society in 1742
• Only two known publications, only one
mathematical
• Probability assigned to a hypothesis,
whereas under frequentist inference, a
hypothesis is typically tested without being
assigned a probability.
• Theorem published 1763
Icon representing Bayesian statistics. By Mikhail Ryazanov, https://commons.wikimedia.org/w/index.php?curid=17116987
Bayesian vs Frequentist
• Bayes' Theorem = Bayes Rule = Conditional probability
• Combining new evidence with prior belief
• Contrast with frequentist inference, which relies only on the evidence
as a whole, with no reference to prior beliefs
• "Bayesian updating"
– Bayes' rule can be applied iteratively
– Observe some evidence, resulting posterior probability can then be
treated as a prior probability, and a new posterior probability computed
from new evidence
– Evidence viewed all at once, or over time
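The updating loop above can be sketched in a few lines. This is a minimal illustration with two hypothetical hypotheses, H1 and H2, and invented likelihood values; it shows that updating piece by piece gives the same posterior as processing all the evidence at once.

```python
def update(prior, likelihood):
    """One application of Bayes' rule over a set of hypotheses."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

prior = [0.5, 0.5]            # P(H1), P(H2) before any evidence
evidence = [
    [0.8, 0.3],               # P(E1 | H1), P(E1 | H2) -- invented values
    [0.7, 0.4],               # P(E2 | H1), P(E2 | H2)
]

# Over time: each posterior is treated as the prior for the next update
posterior = prior
for lik in evidence:
    posterior = update(posterior, lik)

# All at once: multiply the likelihoods, then update a single time
batch = update(prior, [0.8 * 0.7, 0.3 * 0.4])

print(posterior)
print(batch)    # mathematically identical: order of updating does not matter
```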
Rise of Bayesian statistics
• From the 1980s, dramatic growth in research and applications
of Bayesian methods
• Markov chain Monte Carlo methods (MCMC), which
removed many of the computational problems
• Most undergraduate teaching is still based on frequentist
statistics
• Bayesian methods are widely accepted and used
Do Bayesian
and Frequentist methods
agree?
• Posterior is a compromise between data and prior
information
• Under a standard noninformative prior distribution,
Bayesian estimates and standard errors coincide with
classical frequentist regression results
Fundamental concept of Bayes' Theorem
• Calculation of P(B | A) in terms of P(A | B):

P(B | A) = P(A | B) P(B) / P(A)

• For a partition of events B_1, …, B_n:

P(B_i | A) = P(A | B_i) P(B_i) / Σ_{j=1}^{n} P(A | B_j) P(B_j)

• Subtle use of conditional probabilities
• The above theorem is due to Laplace
• Posterior probability of B_i after the occurrence of A
• Posterior is proportional to prior × likelihood
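A small numeric check of the partition form of the theorem, with illustrative (invented) prior and likelihood values, confirming that the posterior is just prior × likelihood renormalised:

```python
prior = {"B1": 0.2, "B2": 0.3, "B3": 0.5}    # P(Bi): a partition of events
lik = {"B1": 0.9, "B2": 0.5, "B3": 0.1}      # P(A | Bi) -- invented values

p_A = sum(lik[b] * prior[b] for b in prior)  # law of total probability
posterior = {b: lik[b] * prior[b] / p_A for b in prior}

print(p_A)        # ~0.38
print(posterior)  # each entry proportional to prior x likelihood
```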
Pierre-Simon Laplace (1749–1827)
• Introduced a general version of the theorem
• Used it to approach problems in celestial mechanics,
medical statistics, reliability, and jurisprudence
Posterior probability = likelihood × prior probability / probability of the data:

p(θ | data) = p(data | θ) p(θ) / p(data),   θ ∈ Θ (parameter space)
Bayes' rule appears in different forms #1

Posterior probability = likelihood × prior probability / probability of the data:

p(θ_i | data) = p(data | θ_i) p(θ_i) / Σ_{j=1}^{n} p(data | θ_j) p(θ_j),   θ ∈ Θ (parameter space)
Bayes rule – discrete parameter

p(θ_i | data) = p(data | θ_i) p(θ_i) / Σ_{j=1}^{n} p(data | θ_j) p(θ_j)

(posterior = likelihood × prior / probability of the data; the sum runs over all possible parameter values in Θ)
Bayes rule – continuous parameter

p(θ | data) = p(data | θ) p(θ) / ∫_Θ p(data | θ) p(θ) dθ

(posterior = likelihood × prior / probability of the data; integrate over all possible values if θ is continuous)
Bayes rule – continuous parameter
• Markov chain Monte Carlo (MCMC) methods = algorithms for sampling from a probability distribution
• Key step: computing large models that require integration over hundreds or even thousands of unknown parameters

p(θ | data) = p(data | θ) p(θ) / ∫_Θ p(data | θ) p(θ) dθ

(posterior = likelihood × prior / probability of the data; integrate over all possible values if θ is continuous)
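For a one-dimensional θ the normalising integral can still be done by brute force on a grid; MCMC becomes necessary only when the parameter space is high-dimensional. A sketch using the deck's binomial data (6 successes in 10 trials) with a flat prior:

```python
from math import comb

n, y = 10, 6                              # binomial data: 6 successes in 10 trials
grid = [i / 1000 for i in range(1001)]    # theta values spanning [0, 1]
d = 1 / 1000                              # grid spacing

def prior(theta):
    return 1.0                            # flat prior density on [0, 1]

def likelihood(theta):
    return comb(n, y) * theta**y * (1 - theta)**(n - y)

# Denominator: integral of likelihood x prior over the parameter space
evidence = sum(likelihood(t) * prior(t) * d for t in grid)
posterior = [likelihood(t) * prior(t) / evidence for t in grid]

print(evidence)                            # close to the exact value 1/11
print(sum(p * d for p in posterior))       # posterior density integrates to 1
```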
EXAMPLE: Bayesian calculations discrete probability
Estimating a probability from binomial data
• See the
– prior probability
– posterior probability
– likelihood
• Bayesian “Prior” must be specified = expert’s belief about
all possible values of the parameter θ
• Prior specified before seeing data, can take any form
• Data updates Prior information
Steering Committee =
S. Finfer (Chair), …,
R. Bellomo, …,
J. Cooper,…,
J. Myburgh…
“Prior” knowledge available = 50% believe
John, Simon and James are CTG Saints
The CTG Saints Study*
• Very Low Budget
• Can only afford total sample size of 10 subjects
• How to proceed to analysis
• Frequentist?
• Bayesian?
* Completely unregisterable at any legitimate trials database!
Frequentist approach – Binomial random variable
• 10 randomly selected CTG attendees
• Six report a belief in Saint Rinaldo
Bayesian calculations discrete probability
“The CTG Saint Study”
• Goal is inference about proportion of ANZICS-CTG
attendees who believe in selected “CTG Saints”
• Random sample of 10 CTG attendees
• Data (y) = number out of 10 who believe in a particular “CTG Saint”
• θ = probability that an individual believes
• H0 = CTG authors are NOT saints (θ ≤ 0.5)
• H1 = CTG authors ARE saints (θ > 0.5)
Bayesian calculations discrete probability
“The CTG Saint Study”
• Model = Binomial random variable
• Probability of observing a particular set of data:

Likelihood ≡ p(y | θ) = C(10, y) θ^y (1 − θ)^(10−y)

• For θ = 0, 0.05, 0.1, …, 0.95, 1
• To find the posterior probability, use Bayes' rule:

p(θ | data) = p(data | θ) p(θ) / p(data)
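The grid calculation on this slide can be sketched directly. This assumes a uniform prior over the 21 grid values for illustration; the observed data are y = 6 believers out of n = 10.

```python
from math import comb

n, y = 10, 6                                  # 6 of 10 CTG attendees believe
thetas = [i / 20 for i in range(21)]          # grid: 0, 0.05, ..., 0.95, 1
prior = [1 / len(thetas)] * len(thetas)       # uniform prior over the grid

lik = [comb(n, y) * t**y * (1 - t)**(n - y) for t in thetas]
unnorm = [l * p for l, p in zip(lik, prior)]
post = [u / sum(unnorm) for u in unnorm]      # Bayes' rule: normalise

best = thetas[post.index(max(post))]
print(best)    # 0.6, the sample proportion
```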
Bayesian calculations discrete probability
“The CTG Saint Study”
• Prior Distribution p(θ)
– Prior probability of being a CTG Saint = 50%: P(H0) = P(H1) = 0.5
– Spread uniformly across the closed intervals θ = [0.0, 0.5] and [0.55, 1.0]
• Data to update Prior information
– Random sample from CTG conference, n = 10
Bayesian calculations discrete probability
“The CTG Saint Study” – Prior probability = 0.5
[Figure: prior, likelihood and posterior p(θ) on the grid θ = 0.0–1.0; prior probability 0.5, observed proportion 0.6]
BUT, what if “Prior” = only 10% believe
John, Simon and James are CTG Saints
Bayesian calculations discrete probability
“The CTG Saint Study” – Prior probability = 0.1
[Figure: prior, likelihood and posterior p(θ) on the grid θ = 0.0–1.0; prior probability 0.1, observed proportion 0.6]
Bayesian calculations discrete probability “The CTG Saint Study”
– Sceptical Informative Prior probability = 0.1
[Figure: beta(20.63, 196.4) prior, Binomial(10, 6) data, beta(26.63, 200.4) posterior; density curves for prior, likelihood and posterior over θ]
Using R for Bayesian Statistics
http://a-little-book-of-r-for-bayesian-statistics.readthedocs.org/en/latest/src/bayesianstats.html
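The beta parameters in the figure follow the standard conjugate beta-binomial update: a beta(a, b) prior combined with y successes in n trials gives a beta(a + y, b + n − y) posterior. A sketch reproducing the sceptical-prior numbers:

```python
def beta_update(a, b, y, n):
    """Posterior beta parameters after y successes in n trials."""
    return a + y, b + (n - y)

a0, b0 = 20.63, 196.4        # sceptical informative prior from the figure
y, n = 6, 10                 # CTG data: 6 of 10 believe

a1, b1 = beta_update(a0, b0, y, n)
print((a1, b1))              # (26.63, 200.4), matching the figure

# The strong prior dominates only 10 observations:
print(a0 / (a0 + b0))        # prior mean, about 0.10
print(a1 / (a1 + b1))        # posterior mean, about 0.12
```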
Bayesian calculations discrete probability “The CTG Saint Study”
– Enthusiastic Informative Prior probability = 0.9
[Figure: beta(196.4, 20.63) prior, Binomial(10, 6) data, beta(202.4, 24.63) posterior; density curves for prior, likelihood and posterior over θ]
Bayesian calculations discrete probability “The CTG Saint Study”
– Conservative Informative Prior probability = 0.5
[Figure: beta(9.2, 9.2) prior, Binomial(10, 6) data, beta(15.2, 13.2) posterior; density curves for prior, likelihood and posterior over θ]
Bayesian calculations discrete probability “The CTG Saint Study”
– Uninformative Uniform Prior probability [0, 1]
[Figure: beta(1, 1) prior, Binomial(10, 6) data, beta(7, 5) posterior; density curves for prior, likelihood and posterior over θ]
• Uninformative prior: likelihood = posterior
• Bayesian = Frequentist
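The arithmetic behind the uniform-prior figure in brief: beta(1, 1) plus 6 successes in 10 trials gives beta(7, 5), whose mode equals the frequentist point estimate y/n, which is the sense in which "Bayesian = Frequentist" here.

```python
a, b = 1, 1                  # beta(1, 1) = uniform prior on [0, 1]
y, n = 6, 10                 # observed data

a_post, b_post = a + y, b + (n - y)            # conjugate update -> beta(7, 5)
mode = (a_post - 1) / (a_post + b_post - 2)    # posterior mode

print((a_post, b_post))      # (7, 5)
print(mode)                  # 0.6 = y/n, the maximum-likelihood estimate
```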
General features of Bayesian inference
• Posterior distribution centered at a point of compromise
between prior information and data
– Weighted proportional to precision
• The influence of the data rises with increasing sample size
= the prior has less influence
• Posterior variance is usually less than prior variance
Bayesian credible intervals
vs Frequentist confidence intervals
• Central 95% probability interval of posterior distribution
= “credible interval” = “posterior interval” for p
• A slightly different summary is also encountered:
the “highest posterior density” interval
• Credible intervals sometimes have similar properties to
confidence intervals
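A central 95% credible interval is simply the 2.5th and 97.5th percentiles of the posterior. A sketch by simulation from the beta(7, 5) posterior of the worked example (sample size and seed are arbitrary choices):

```python
import random

random.seed(1)
a, b = 7, 5                                  # beta(7, 5) posterior
n_draws = 100_000
samples = sorted(random.betavariate(a, b) for _ in range(n_draws))

lo = samples[int(0.025 * n_draws)]           # 2.5th percentile
hi = samples[int(0.975 * n_draws)]           # 97.5th percentile
print((lo, hi))                              # central 95% credible interval
```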
Fundamental concept of Bayes' Theorem
• Calculation of P(B | A) in terms of P(A | B):

P(B_i | A) = P(A | B_i) P(B_i) / Σ_{j=1}^{n} P(A | B_j) P(B_j)

• Posterior probability of B_i after the occurrence of A
• State a hypothesis = prior probability distribution
• Collect and summarize relevant data
• The revised distribution of the unknown parameter (which is a random variable) is the posterior probability distribution
• This allows direct statements about the probability that a hypothesis is true
Bayesian analyses can be sequential
• New information incorporated into the analysis as soon as it is available
= continuous updating
• Uses the posterior obtained after the previous trial as the new prior
• Legitimate intermediate conclusions
based on partial results from an ongoing experiment
• Modify the future course of the experiment in light of these conclusions
• Stop the trial not at a fixed size, but when adding more patients will not
appreciably change the conclusions
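A sketch of this sequential scheme with invented batch data: each batch's posterior becomes the prior for the next batch, and the final result matches analysing all the patients at once.

```python
def update(a, b, y, n):
    """Beta-binomial update: posterior after y successes in n trials."""
    return a + y, b + (n - y)

a, b = 1, 1                                   # uniform starting prior
batches = [(3, 5), (2, 5), (7, 10), (6, 10)]  # (successes, batch size), invented

for y, n in batches:
    mean_before = a / (a + b)
    a, b = update(a, b, y, n)                 # posterior becomes the new prior
    print(abs(a / (a + b) - mean_before))     # shift in posterior mean per batch

print((a, b))   # (19, 13): same as analysing all 30 patients at once
```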
Bayes rule – continuous parameter
• Markov chain Monte Carlo (MCMC) methods = algorithms for sampling from a probability distribution
• Key step: computing large models that require integration over hundreds or even thousands of unknown parameters

p(θ | data) = p(data | θ) p(θ) / ∫_Θ p(data | θ) p(θ) dθ

(posterior = likelihood × prior / probability of the data; integrate over all possible values if θ is continuous)
• Convergence of the Metropolis–Hastings algorithm
• MCMC attempts to approximate the blue distribution with the orange distribution
[Figure] By Chdrappi, using R (FOSS statistical software), CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=25674906
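A minimal Metropolis sampler (the symmetric-proposal special case of Metropolis–Hastings, so the Hastings correction cancels), targeting the beta(7, 5) posterior of the worked example known only up to its normalising constant; step size, chain length and burn-in are arbitrary choices for illustration.

```python
import random

random.seed(0)

def unnorm_target(theta):
    """Unnormalised beta(7, 5) density: theta^6 * (1 - theta)^4."""
    if not 0.0 < theta < 1.0:
        return 0.0
    return theta**6 * (1 - theta)**4

theta = 0.5                      # starting value of the chain
samples = []
for _ in range(50_000):
    proposal = theta + random.gauss(0.0, 0.1)        # symmetric random walk
    ratio = unnorm_target(proposal) / unnorm_target(theta)
    if random.random() < ratio:                      # accept with prob min(1, ratio)
        theta = proposal
    samples.append(theta)

kept = samples[10_000:]                              # discard burn-in
est_mean = sum(kept) / len(kept)
print(est_mean)   # should be close to the exact posterior mean 7/12
```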
The three steps of Bayesian data analysis
1. Setting up a full probability model
– Joint probability distribution for all quantities
– Consistent with scientific knowledge
2. Condition on the observed data
– Conditional probabilities of quantities of interest,
given observed data
3. Evaluation of model fit
– Fit of model to data
– Sensitivity of results to modelling assumptions
Fundamental concepts
Frequentist vs Bayesian
• Population parameters – different ways of expressing uncertainty
– Frequentist = fixed unknown constant
– Bayesian = random variable, subject to change as additional data
arise. Assign probability distributions before seeing data = prior
distributions
• The revised distribution of unknown parameter (which is a random
variable) is the posterior probability distribution.
• Direct statements about the probability that a hypothesis is true
• Frequentist = probability of obtaining a result at least as extreme
as observed, assuming hypothesis true = tail probability = p value
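This contrast can be computed for the study data (y = 6 of n = 10, hypotheses H0: θ ≤ 0.5 vs H1: θ > 0.5; a uniform prior is assumed for the Bayesian side, giving the beta(7, 5) posterior):

```python
from math import comb

n, y = 10, 6

# Frequentist: one-sided tail probability P(Y >= 6 | theta = 0.5)
p_value = sum(comb(n, k) for k in range(y, n + 1)) / 2**n

# Bayesian: P(theta > 0.5 | y) under the beta(7, 5) posterior,
# evaluated by a midpoint Riemann sum (no external libraries needed)
steps = 100_000
total = above = 0.0
for i in range(steps):
    t = (i + 0.5) / steps
    density = t**6 * (1 - t)**4        # unnormalised beta(7, 5) density
    total += density
    if t > 0.5:
        above += density

print(p_value)          # 0.376953125
print(above / total)    # about 0.726: direct probability that H1 is true
```

The p-value answers a question about hypothetical repeated data; the posterior probability answers the direct question about θ.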
Bayesian calculations discrete probability
“The CTG Saint Study” – Prior probability = 0.1
[Figure: prior, likelihood and posterior p(θ) on the grid θ = 0.0–1.0; prior probability 0.1, observed proportion 0.6]
Parameters θ_i assigned a prior distribution

p(θ | data) = p(data | θ) p(θ) / ∫_Θ p(data | θ) p(θ) dθ

Integrate over all possible values of θ_i using an MCMC approach