Introduction to Bayesian Statistics

Machine Learning and Data Mining
Philipp Singer
CC image courtesy of user mattbuck007 on Flickr
Conditional Probability
● Probability of event A given that B is true
● P(cough|cold) > P(cough)
● Fundamental in probability theory
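The definition behind these bullets, in the usual notation. The joint probability P(A ∩ B) does not appear on the slide and is assumed here:

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0
```

Read this way, P(cough|cold) > P(cough) says that restricting attention to people with a cold raises the share of people who cough.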
Before we start with Bayes ...
● Another perspective on conditional probability
● Conditional probability via growing trimmed trees
● https://www.youtube.com/watch?v=Zxm4Xxvzohk
Bayes Theorem
● P(A|B) is the conditional probability of observing A given that B is true
● P(B|A) is the conditional probability of observing B given that A is true
● P(A) and P(B) are the probabilities of A and B without conditioning on each other
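In the notation of the bullets above, Bayes' theorem is usually written as follows (assuming P(B) > 0):

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```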
Visualize Bayes Theorem
[Figure labels: Some event, All possible outcomes]
Source: https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/
[Figure labels: People having cancer, All people in study]
[Figure labels: People where screening test is positive, All people in study]
[Figure labels: People having positive screening test and cancer]
● Given the test is positive, what is the probability that said person has cancer?
● Given that someone has cancer, what is the probability that said person had a positive test?
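The two questions can be answered by counting regions. The sketch below does that; every count in it is hypothetical and chosen only for illustration, since the slides give no numbers for the study.

```python
# Hypothetical study counts, chosen only for illustration.
n_people = 1000      # all people in the study
n_cancer = 10        # people having cancer
n_positive = 100     # people whose screening test is positive
n_both = 8           # people with a positive test and cancer

p_cancer = n_cancer / n_people
p_positive = n_positive / n_people
p_both = n_both / n_people

# Conditional probabilities read directly off the overlapping regions.
p_cancer_given_positive = p_both / p_positive
p_positive_given_cancer = p_both / p_cancer

# Bayes' theorem links the two directions.
via_bayes = p_positive_given_cancer * p_cancer / p_positive

print(f"P(cancer | positive) = {p_cancer_given_positive:.2f}")
print(f"P(positive | cancer) = {p_positive_given_cancer:.2f}")
```

With these made-up counts the two conditional probabilities differ by a factor of ten, which is why the direction of conditioning matters.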
Example: Fake coin
● Two coins
  – One fair
  – One unfair
● What is the probability of having the fair coin after flipping Heads?
CC image courtesy of user pagedooley on Flickr
Update of beliefs
● Allows new evidence to update beliefs
● Prior can also be posterior of previous update
Example: Fake coin
● Belief update
● What is the probability of the coin being fair after we have already seen one Heads?
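A minimal sketch of the belief update for the coin example. Two assumptions are not in the slides: each coin is equally likely to be picked, and the unfair coin always lands Heads. The function `update` is a hypothetical helper introduced for illustration.

```python
def update(prior_fair, p_heads_fair=0.5, p_heads_unfair=1.0):
    """Posterior probability of holding the fair coin after one more Heads.

    p_heads_unfair=1.0 is an assumption (a two-headed coin); the slides
    do not state how the unfair coin behaves.
    """
    evidence = prior_fair * p_heads_fair + (1 - prior_fair) * p_heads_unfair
    return prior_fair * p_heads_fair / evidence

belief = 0.5  # assumed: each coin is equally likely to be picked
for flip in range(1, 3):
    # The posterior of the previous flip becomes the prior of the next one.
    belief = update(belief)
    print(f"P(fair | {flip} Heads) = {belief:.3f}")
```

Under these assumptions the belief in the fair coin drops to 1/3 after the first Heads and to 1/5 after the second.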
Bayesian Inference
Source: https://xkcd.com/1132/
● Statistical inference of parameters
[Figure labels: Additional knowledge, Parameters, Data]
Coin flip example
● Flip a coin several times
● Is it fair?
● Let's use Bayesian inference
Binomial model
● Probability p of flipping heads
● Flipping tails: 1-p
● Binomial model
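For reference, the binomial likelihood of observing k heads in n flips, with p as defined above (n and k are the symbols the later coin example uses):

```latex
P(k \mid n, p) = \binom{n}{k}\, p^{k} (1-p)^{n-k}
```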
Prior
● Prior belief about parameter(s)
● Conjugate prior
  – Posterior is in the same distribution family as the prior
  – Beta distribution conjugate to binomial
● Beta prior
Beta distribution
● Continuous probability distribution
● Interval [0,1]
● Two shape parameters: α and β
  – If >= 1, interpret as pseudo counts
  – α would refer to flipping heads
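To get a feel for how the shape parameters act as pseudo counts, here is a small sketch of the Beta density using only the standard library. The parameter pairs in the loop are illustrative choices, and `beta_pdf` is a hypothetical helper name.

```python
import math

def beta_pdf(x, alpha, beta):
    """Density of Beta(alpha, beta) at x, for 0 < x < 1."""
    # Log of the normalizing Beta function B(alpha, beta).
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_density = (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x) - log_norm
    return math.exp(log_density)

# Illustrative parameter pairs: flat, mildly peaked, strongly peaked, skewed.
for alpha, beta in [(1, 1), (2, 2), (10, 10), (50, 10)]:
    mean = alpha / (alpha + beta)
    print(f"Beta({alpha},{beta}): mean={mean:.3f}, pdf(0.5)={beta_pdf(0.5, alpha, beta):.3f}")
```

Larger pseudo counts concentrate the density around α / (α + β), which is how a stronger prior belief shows up.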
Posterior
● Posterior is also a Beta distribution
● For the exact derivation: http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf
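The standard Beta-binomial result, stated with k heads in n flips (symbols assumed here to match the later coin example):

```latex
p \sim \mathrm{Beta}(\alpha, \beta), \quad k \mid p \sim \mathrm{Binomial}(n, p)
\;\Longrightarrow\;
p \mid k \sim \mathrm{Beta}(\alpha + k,\; \beta + n - k)
```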
Posterior
● Assume
  – Binomial p = 0.4
  – Uniform Beta prior: α=1 and β=1
  – 200 random variates from binomial distribution (Heads=80)
  – Update posterior
Posterior
● Assume
  – Binomial p = 0.4
  – Biased Beta prior: α=50 and β=10
  – 200 random variates from binomial distribution (Heads=80)
  – Update posterior
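Both updates can be sketched in a few lines, since the conjugate update only adds observed counts to the pseudo counts. `update_beta` is a hypothetical helper name; the priors and the data (80 Heads in 200 flips) are the ones assumed on the slides.

```python
def update_beta(alpha, beta, heads, tails):
    """Conjugate update: add observed counts to the pseudo counts."""
    return alpha + heads, beta + tails

heads, tails = 80, 120  # 200 flips with 80 Heads, as on the slides

for name, (alpha, beta) in {"uniform": (1, 1), "biased": (50, 10)}.items():
    a_post, b_post = update_beta(alpha, beta, heads, tails)
    post_mean = a_post / (a_post + b_post)
    print(f"{name}: Beta({a_post},{b_post}), posterior mean = {post_mean:.3f}")
```

The uniform prior gives Beta(81,121), while the biased prior gives Beta(130,130): 200 flips are not enough to pull the biased prior all the way to the data.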
Posterior
● Convex combination of prior and data
● The stronger our prior belief, the more data we need to overrule the prior
● The less prior belief we have, the quicker the data overrules the prior
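One way to see the convex combination is through the posterior mean of the Beta-binomial model. The weight w is a symbol introduced here for illustration:

```latex
\mathbb{E}[p \mid k]
  = \frac{\alpha + k}{\alpha + \beta + n}
  = \underbrace{\frac{\alpha + \beta}{\alpha + \beta + n}}_{w} \cdot \frac{\alpha}{\alpha + \beta}
  + \underbrace{\frac{n}{\alpha + \beta + n}}_{1 - w} \cdot \frac{k}{n}
```

The prior mean α / (α + β) gets weight w and the observed frequency k / n gets weight 1 - w, so large pseudo counts keep w close to 1 until n grows.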
So is the coin fair?
● Examine posterior
  – 95% posterior density interval
  – ROPE [1]: Region of practical equivalence for null hypothesis
  – Fair coin: [0.45,0.55]
● 95% HDI: (0.33, 0.47)
● Cannot reject null: the HDI overlaps the ROPE
● More samples → we can
[1] Kruschke, John. Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press, 2014.
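A rough sketch of how such an interval could be obtained, assuming the posterior is Beta(81,121) (uniform prior, 80 Heads in 200 flips). It uses a simple grid approximation rather than any particular library routine; `hdi_grid` and `beta_log_pdf` are hypothetical helper names.

```python
import math

def beta_log_pdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm

def hdi_grid(a, b, mass=0.95, steps=10_000):
    """Approximate highest density interval of a unimodal Beta(a, b) on a grid."""
    width = 1.0 / steps
    xs = [(i + 0.5) * width for i in range(steps)]  # cell midpoints
    probs = [math.exp(beta_log_pdf(x, a, b)) * width for x in xs]
    # Add cells from most to least dense until the requested mass is covered.
    order = sorted(range(steps), key=lambda i: probs[i], reverse=True)
    covered, chosen = 0.0, []
    for i in order:
        chosen.append(i)
        covered += probs[i]
        if covered >= mass:
            break
    return xs[min(chosen)], xs[max(chosen)]

low, high = hdi_grid(81, 121)  # posterior under the uniform prior
rope = (0.45, 0.55)            # ROPE for a fair coin, from the slide
overlaps = low <= rope[1] and high >= rope[0]
print(f"95% HDI ~ ({low:.2f}, {high:.2f}), overlaps ROPE: {overlaps}")
```

The grid approach is chosen for transparency; a quantile-based routine from a statistics library would be the more usual tool.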
Bayesian Model Comparison
Evidence
● Parameters marginalized out
● Average of likelihood weighted by prior
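Written out, with θ for the parameters, D for the data and M for the model (symbols assumed here), the evidence is:

```latex
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta
```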
Bayesian Model Comparison
● Bayes factors [1]
● Ratio of marginal likelihoods
● Interpretation table by Kass & Raftery [1]
● >100 → decisive evidence against M2
[1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors." Journal of the American Statistical Association 90.430 (1995): 773-795.
So is the coin fair?
● Null hypothesis: p = 0.5
● Alternative hypothesis
  – Anything is possible
  – Beta(1,1)
● Bayes factor
So is the coin fair?
● n = 200
● k = 80
● Bayes factor
● (Decent) preference for alt. hypothesis
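A sketch of how this Bayes factor could be computed, assuming the null fixes p = 0.5, the alternative puts a Beta(1,1) prior on p, and the ratio is taken as alternative over null. The helper names are hypothetical.

```python
import math

def log_beta_fn(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_evidence_point(k, n, p):
    """Log marginal likelihood when p is fixed (the null hypothesis)."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)

def log_evidence_beta(k, n, a, b):
    """Log marginal likelihood of the binomial with a Beta(a, b) prior on p."""
    return math.log(math.comb(n, k)) + log_beta_fn(a + k, b + n - k) - log_beta_fn(a, b)

n, k = 200, 80
log_bf = log_evidence_beta(k, n, 1, 1) - log_evidence_point(k, n, 0.5)
bayes_factor = math.exp(log_bf)  # alternative over null
print(f"Bayes factor (alternative vs. null) = {bayes_factor:.2f}")
```

Working in log space avoids underflow in the binomial terms. The result lands in the single digits, in line with the moderate preference noted above.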
Other priors
● Priors can encode hypotheses (theories)
● Biased hypothesis: Beta(101,11)
● Haldane prior: Beta(0.001, 0.001)
  – u-shaped
  – high probability on p=1 or (1-p)=1
Frequentist approach
● So is the coin fair?
● Binomial test with null p=0.5
  – one-tailed
  – p-value: 0.0028
● Chi² test
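The one-tailed binomial test can be sketched with the standard library alone, assuming the same data as before (80 Heads in 200 flips) and the lower tail as the direction of interest. `binom_test_lower` is a hypothetical helper, not a library function.

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k Heads in n flips."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_test_lower(k, n, p=0.5):
    """One-tailed p-value: probability of k or fewer Heads under the null."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

p_value = binom_test_lower(80, 200)
print(f"one-tailed p-value = {p_value:.4f}")
```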
Posterior prediction
● Posterior mean
● If data large → converges to MLE
● MAP: Maximum a posteriori
  – Bayesian estimator
  – uses mode
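A small comparison of the three point estimates for the Beta-binomial model. The sample sizes are hypothetical and chosen only to show the convergence; the Beta(50,10) prior is the biased prior used earlier.

```python
def estimates(alpha, beta, heads, n):
    """Point estimates of p from a Beta(alpha, beta) prior and binomial data."""
    a_post, b_post = alpha + heads, beta + n - heads
    return {
        "posterior_mean": a_post / (a_post + b_post),
        "map": (a_post - 1) / (a_post + b_post - 2),  # mode, valid for a_post, b_post > 1
        "mle": heads / n,
    }

small = estimates(50, 10, 8, 20)        # hypothetical small sample
large = estimates(50, 10, 8000, 20000)  # hypothetical large sample, same ratio
print(small)
print(large)
```

With 20 flips the prior dominates the posterior mean and the MAP; with 20,000 flips both sit close to the MLE of 0.4.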
Bayesian prediction
● Posterior predictive distribution
● Distribution of unobserved observations conditioned on observed data (train, test)
[Figure labels: Frequentist, MLE]
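In symbols, with x̃ for an unobserved observation, θ for the parameters and D for the observed data (notation assumed here), the posterior predictive distribution averages over the posterior. For the Beta-binomial coin model the probability of the next flip being Heads reduces to the posterior mean:

```latex
P(\tilde{x} \mid D) = \int P(\tilde{x} \mid \theta)\, P(\theta \mid D)\, d\theta,
\qquad
P(\tilde{x} = \text{Heads} \mid k, n) = \frac{\alpha + k}{\alpha + \beta + n}
```

A frequentist plug-in prediction would instead evaluate P(x̃ | θ) at the single MLE value of θ.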
Alternative Bayesian Inference
● Often marginal likelihood not easy to evaluate
  – No analytical solution
  – Numerical integration expensive
● Alternatives
  – Monte Carlo integration
    ● Markov Chain Monte Carlo (MCMC)
    ● Gibbs sampling
    ● Metropolis-Hastings algorithm
  – Laplace approximation
  – Variational Bayes
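To make the MCMC idea concrete, here is a minimal random-walk Metropolis sampler (the symmetric-proposal special case of Metropolis-Hastings) for the coin posterior. The step size, burn-in length and seed are arbitrary assumptions, and the coin posterior is used only because its exact answer, Beta(81,121), is known for comparison.

```python
import math
import random

def log_posterior(p, heads=80, tails=120, alpha=1.0, beta=1.0):
    """Unnormalized log posterior of p: binomial likelihood times Beta prior."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return (heads + alpha - 1) * math.log(p) + (tails + beta - 1) * math.log(1 - p)

def metropolis(n_samples, step=0.05, start=0.5, seed=0):
    rng = random.Random(seed)
    current, current_lp = start, log_posterior(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)  # symmetric random-walk proposal
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

samples = metropolis(20_000)[2_000:]  # drop an assumed burn-in of 2,000 draws
mcmc_mean = sum(samples) / len(samples)
print(f"MCMC posterior mean ~ {mcmc_mean:.3f} (exact: {81 / 202:.3f})")
```

Only the unnormalized posterior is needed, which is the point: the marginal likelihood never has to be evaluated.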
Bayesian (Machine) Learning
Bayesian Models
● Example: Markov Chain Model
  – Dirichlet prior, Categorical Likelihood
● Bayesian networks
● Topic models (LDA)
● Hierarchical Bayesian models
Generalized Linear Model
● Multiple linear regression
● Logistic regression
● Bayesian ANOVA
Bayesian Statistical Tests
● Alternatives to frequentist approaches
● Bayesian correlation
● Bayesian t-test
Questions?
Philipp Singer
Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf