Introduction to Bayes: Bayes' theorem

Rev. Thomas Bayes
• Studied at the University of Edinburgh in 1719
• Elected Fellow of the Royal Society in 1742
• Famous for “An Essay towards solving a Problem in the Doctrine of Chances”
  ▶ Published posthumously in 1763

Bayes' Theorem
• He proposed a theorem that now bears his name:

    p(B|A) = p(A|B) p(B) / [ p(A|B) p(B) + p(A|B^C) p(B^C) ]

• A and B are two events
• Common example:
  ▶ A is the event: positive test
  ▶ B is the event: have disease
  ▶ B^C is the event: disease free
  ▶ You know p(A|B)
  ▶ The patient cares about p(B|A)

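A quick numerical illustration (the numbers are invented for this example, not taken from the slides): suppose the test has sensitivity p(A|B) = 0.95, false-positive rate p(A|B^C) = 0.05, and the disease has prevalence p(B) = 0.01. Then

    p(B|A) = (0.95 × 0.01) / (0.95 × 0.01 + 0.05 × 0.99) = 0.0095 / 0.059 ≈ 0.16

so even after a positive test, the patient most likely does not have the disease.
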
Bayes' Theorem
• We would rewrite his theorem as

    p(x|y) = p(y|x) p(x) / p(y) = p(y|x) p(x) / Σ_x p(y|x) p(x)

• If x and y are continuous we have

    p(x|y) = p(y|x) p(x) / p(y) = p(y|x) p(x) / ∫ p(y|x) p(x) dx

• Bayes' rule is all about conditional probability
• We want to find p(x|y) from p(y|x)

Bayes' Theorem: Statistics
• When using Bayes' rule in statistics:

    f(θ|y) = f(y|θ) f(θ) / f(y) = f(y|θ) f(θ) / ∫ f(y|θ) f(θ) dθ

• Often see this expressed as:

    f(θ|y) ∝ f(y|θ) f(θ)
    Posterior ∝ Likelihood × Prior

  ▶ f(y|θ): Data model (or likelihood)
  ▶ f(θ): Prior distribution
  ▶ f(θ|y): Posterior distribution

What is Bayesian statistics?
• We start with prior beliefs about quantities of interest
  ▶ Before the data are collected
  ▶ These beliefs are often ‘vague’
  ▶ Expressed as a pmf/pdf
• Use the data to update our beliefs
  ▶ Obtain the posterior distribution
• The posterior distribution contains all of our knowledge about the quantity of interest
  ▶ In the form of a pmf/pdf
• ‘Bayesian yardstick’: degree of knowledge about anything unknown can be described through probability.

Likelihood
• We need to define the information the data contain about the quantity of interest
  ▶ Specify a likelihood or data model
  ▶ Commonly known as a statistical model
• We have seen these already
  ▶ These are the pmf/pdfs we looked at earlier
    – We might assume our data are normally distributed
    – We might assume our data are binomially distributed
  ▶ These describe the uncertainty of the data given parameter values
  ▶ Talk more about these soon

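For completeness (a standard fact, implicit in the slide): for independent observations y = (y_1, ..., y_n), the likelihood is the product of the individual pmf/pdfs,

    f(y|θ) = ∏_{i=1}^{n} f(y_i|θ)
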
The posterior distribution
• The posterior distribution
  ▶ Contains all our knowledge about θ given the data
  ▶ Inference is based on this distribution
• Use it to find point estimates for θ:
  ▶ Mean or median of the posterior distribution
• Use it to find interval estimates for θ:
  ▶ Quantiles of the posterior distribution

In theory
• Once we have a statistical model and a prior distribution:
  ▶ Find the posterior distribution
  ▶ Use the posterior to make inference about θ

In practice
• In practice we are unable to calculate the posterior directly
  ▶ It is too difficult to find the required integrals
• We can simulate from the posterior distribution instead
  ▶ Markov chain Monte Carlo (MCMC) can generate samples from it
  ▶ Use these samples to summarize the distribution (see the sketch after this list)
    – Visualize the distribution
    – Find point estimates
    – Find interval estimates
    – ...

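A minimal R sketch of how such samples are summarized once we have them; the draws below come from rbeta purely as a stand-in for real MCMC output:

    draws <- rbeta(10000, 10, 7)          # stand-in for MCMC draws of a parameter
    mean(draws)                           # point estimate: posterior mean
    median(draws)                         # point estimate: posterior median
    quantile(draws, c(0.025, 0.975))      # 95% interval estimate
    hist(draws, freq = FALSE)             # visualize the posterior
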
Summary of Bayesian statistics
• Investment required to understand the basic concepts
  ▶ Easier to extend to realistic and complex models
    – e.g. hierarchical models (outside the scope of this course)
• Not constrained by ‘recipe book’ style procedures
  ▶ Easier to make a mistake:
    – Software: ‘WARNING: MCMC can be dangerous’
• MCMC can largely be automated (e.g. JAGS)
  ▶ It can take a long time to sample from the posterior
  ▶ MCMC can struggle (particularly as models get more complex)
• Need to specify priors
  ▶ Natural way to incorporate additional information
  ▶ Offers flexibility in modeling
  ▶ No longer have to choose an estimator
  ▶ Can make probability statements about a parameter

Example: binomial data
• I want each of you to answer the following question:
  ▶ Does U of O need to do more to help students refrain from academic misconduct?
• On a piece of paper write:
  1. Answer (yes or no)
  2. Teaching load (0, 1, 2.5, etc.)

Example: binomial data
• Now we have data
• Assume the workshop is a representative sample of U of O staff
• Assume the number answering yes is binomially distributed (written out below)
  ▶ n = 15 trials
  ▶ Each with probability π of answering yes
  ▶ Observed y = ? saying yes
• Goal is to estimate π
  ▶ Find the posterior distribution p(π|y)

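Written out, the assumed data model is the binomial pmf:

    p(y|π) = (n choose y) π^y (1 − π)^(n−y),   y = 0, 1, ..., n
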
Knowledge as pdf
• When using Bayesian inference we represent our knowledge using pdfs
  ▶ Go over various choices of prior distribution
    – Show the corresponding posterior when y = 9
• Then, we will fit the model in JAGS

Priors for probabilities
• We need a pdf for the parameter π to describe our prior belief
  ▶ Probabilities must be between 0 and 1
• We can use a beta distribution: Be(α, β) (see the plotting sketch after this list)
• Two ‘vague’ priors often used are:
  ▶ Be(0.5, 0.5)
  ▶ Be(1, 1)
• Also look at:
  ▶ Be(1, 9)
  ▶ Be(5, 5)
  ▶ Be(9, 1)

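A minimal R sketch that draws prior densities like those on the next slides (the plotting choices are mine, not from the slides):

    # Densities of the two 'vague' beta priors for pi on (0, 1)
    curve(dbeta(x, 1, 1), from = 0, to = 1, ylim = c(0, 4),
          xlab = expression(pi), ylab = "Density")   # Be(1,1): flat
    curve(dbeta(x, 0.5, 0.5), add = TRUE, lty = 2)   # Be(0.5,0.5): U-shaped
    legend("top", c("Be(1,1)", "Be(0.5,0.5)"), lty = 1:2)
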
Priors: vague
[Figure: densities of the Be(1,1) and Be(0.5,0.5) priors for π on (0, 1); y-axis: Density]

Priors: informative
[Figure: densities of the Be(1,9), Be(9,1), and Be(5,5) priors for π on (0, 1); y-axis: Density]

Priors and posteriors: vague
[Figure: the Be(1,1) and Be(0.5,0.5) priors for π together with the corresponding posteriors for y = 9; y-axis: Density]

Priors and posteriors: informative
[Figure: the Be(1,9), Be(9,1), and Be(5,5) priors for π together with the corresponding posteriors for y = 9; y-axis: Density]

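The beta prior is conjugate to the binomial likelihood, so the posteriors plotted above are available in closed form: a Be(α, β) prior with y successes in n trials gives a Be(α + y, β + n − y) posterior. A minimal R sketch of the update for y = 9, n = 15 under the Be(1,1) prior:

    alpha <- 1; beta <- 1                        # Be(1,1) prior
    n <- 15; y <- 9                              # workshop data
    a_post <- alpha + y                          # posterior is Be(10, 7)
    b_post <- beta + n - y
    a_post / (a_post + b_post)                   # posterior mean of pi (about 0.588)
    qbeta(c(0.025, 0.975), a_post, b_post)       # 95% credible interval
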
Example: binomial data
• Fit the model using our data in JAGS (a minimal sketch follows)

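The slides do not reproduce the JAGS code, so here is a plausible minimal sketch using rjags; the model text and variable names are my assumptions, not the course's exact code:

    library(rjags)
    model_string <- "
    model {
      y ~ dbin(pi, n)      # binomial likelihood
      pi ~ dbeta(1, 1)     # vague Be(1,1) prior
    }"
    m <- jags.model(textConnection(model_string),
                    data = list(y = 9, n = 15), n.chains = 3)
    samp <- coda.samples(m, variable.names = "pi", n.iter = 10000)
    summary(samp)
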
Example: two sample binomial data
• Consider two populations:
  ▶ University staff/students that teach
  ▶ University staff/students that do not teach
• Is there a difference between these two populations?
• Minimal changes needed to the JAGS code (see the sketch below)

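Continuing the previous sketch, a hedged two-sample version; the counts below are placeholders, since the slides leave the data unspecified:

    model_string2 <- "
    model {
      y1 ~ dbin(pi1, n1)     # teaching group
      y2 ~ dbin(pi2, n2)     # non-teaching group
      pi1 ~ dbeta(1, 1)
      pi2 ~ dbeta(1, 1)
      delta <- pi1 - pi2     # difference between the populations
    }"
    m2 <- jags.model(textConnection(model_string2),
                     data = list(y1 = 5, n1 = 8, y2 = 4, n2 = 7), n.chains = 3)
    samp2 <- coda.samples(m2, variable.names = "delta", n.iter = 10000)
    summary(samp2)
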
Example: normal data
• Consider paired data
  ▶ Weight change (in lbs) for 29 young female anorexia patients undertaking cognitive behavioural treatment¹
    – File anorex1.txt or use the data below

    ycbt = c(1.7, 0.7, -0.1, -0.7, -3.5, 14.9, 3.5, 17.1, -7.6,
             1.6, 11.7, 6.1, 1.1, -4, 20.9, -9.1, 2.1, -1.4, 1.4,
             -0.3, -3.7, -0.8, 2.4, 12.6, 1.9, 3.9, 0.1, 15.4, -0.7)

¹ The full data include another treatment and a control. Hopefully we will explore those data later.

Example: normal data
• Statistical model
  ▶ We assume each observation is normally distributed
    – Unknown mean µ
    – Unknown standard deviation σ (or precision τ = 1/σ²)
  ▶ We need priors for µ and σ (or τ)

Priors
• Prior for µ:
  ▶ We often use a normal prior with a large variance for µ
    – Lets the data speak
• The variance/precision is trickier
  ▶ The precision must be positive
  ▶ It is common to use a gamma prior for τ
  ▶ A well motivated alternative is a half-t prior for σ
    – Weakly informative prior
    – Regularizes the model
  ▶ E.g. unlikely to see weight changes of hundreds of lbs
    – Unlikely to see σ exceed 50
    – Use a half-t(0, 25², 3)
    – See next slide

Half-t priors
[Figure: density of the half-t(0, 25², 3) prior for σ over 0 to 100; y-axis: Density]

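A minimal R sketch of a plot like the one above; the half-t(0, s², ν) density on σ ≥ 0 is obtained by folding a t distribution with ν degrees of freedom, scaled by s, at zero:

    # Density of a half-t(0, 25^2, 3) prior: a t_3 density scaled by 25,
    # folded at zero (hence the factor of 2)
    half_t <- function(x, scale = 25, df = 3) 2 / scale * dt(x / scale, df)
    curve(half_t(x), from = 0, to = 100,
          xlab = expression(sigma), ylab = "Density")
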
Half-t prior
• Advantages:
  ▶ Most of the mass between 0 and 50
  ▶ Heavy tails (we have used 3 degrees of freedom)
    – There is still considerable mass above 50 if we are wrong
• Disadvantage?
  ▶ Requires some knowledge about the data

Fitting the model
• In JAGS (a sketch follows)
• Trick:
  ▶ JAGS uses precisions, not standard deviations

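A plausible sketch of this model in rjags (not necessarily the course's exact code); note that dnorm and dt in JAGS are parameterized by precision, and the half-t prior arises from truncating a t distribution at zero:

    model_string3 <- "
    model {
      for (i in 1:N) {
        y[i] ~ dnorm(mu, tau)               # JAGS: dnorm(mean, precision)
      }
      mu ~ dnorm(0, 1.0E-4)                 # vague: sd 100
      sigma ~ dt(0, pow(25, -2), 3) T(0,)   # half-t(0, 25^2, 3) via truncation
      tau <- pow(sigma, -2)                 # the trick: sd -> precision
    }"
    m3 <- jags.model(textConnection(model_string3),
                     data = list(y = ycbt, N = length(ycbt)), n.chains = 3)
    samp3 <- coda.samples(m3, variable.names = c("mu", "sigma"), n.iter = 10000)
    summary(samp3)
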
Example: two sample normal data
• Anorexia data: increase in body weight
  ▶ Treatment (modeled previously)
  ▶ Control
• Assume the data are normally distributed
  ▶ Each group has a different mean
  ▶ Each group has the same variance
    – Straightforward to allow the variance to differ between samples
• Data in anorex2.txt (a sketch of the two-sample model follows)

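A hedged sketch of the two-sample version; the format of anorex2.txt is not shown, so the y and group vectors below are assumptions about how the data would be arranged:

    model_string4 <- "
    model {
      for (i in 1:N) {
        y[i] ~ dnorm(mu[group[i]], tau)     # group-specific mean, common precision
      }
      mu[1] ~ dnorm(0, 1.0E-4)              # treatment mean
      mu[2] ~ dnorm(0, 1.0E-4)              # control mean
      sigma ~ dt(0, pow(25, -2), 3) T(0,)   # shared half-t prior on sigma
      tau <- pow(sigma, -2)
      diff <- mu[1] - mu[2]                 # difference in means
    }"
    # y: all weight changes; group: 1 = treatment, 2 = control (placeholders)
    m4 <- jags.model(textConnection(model_string4),
                     data = list(y = y, group = group, N = length(y)), n.chains = 3)
    samp4 <- coda.samples(m4, variable.names = c("mu", "diff"), n.iter = 10000)
    summary(samp4)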