Introduction to Bayesian Statistics
Machine Learning and Data Mining
Philipp Singer
CC image courtesy of user mattbuck007 on Flickr

Conditional Probability

Conditional Probability
● Probability of event A given that B is true
● P(cough|cold) > P(cough)
● Fundamental in probability theory

Before we start with Bayes ...
● Another perspective on conditional probability
● Conditional probability via growing trimmed trees
● https://www.youtube.com/watch?v=Zxm4Xxvzohk

Bayes Theorem

Bayes Theorem
● P(A|B) = P(B|A) · P(A) / P(B)
● P(A|B) is the conditional probability of observing A given that B is true
● P(B|A) is the conditional probability of observing B given that A is true
● P(A) and P(B) are the probabilities of A and B without conditioning on each other

Visualize Bayes Theorem
(Venn-diagram sequence; source: https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/)
● Some event among all possible outcomes
● People having cancer among all people in the study
● People where the screening test is positive among all people in the study
● People having a positive screening test and cancer
● Given the test is positive, what is the probability that said person has cancer?
● Given that someone has cancer, what is the probability that said person had a positive test?

Example: Fake coin
● Two coins
  – One fair
  – One unfair
● What is the probability of having the fair coin after flipping Heads?
(Worked calculation shown as figures on the slides.)
CC image courtesy of user pagedooley on Flickr

Update of beliefs
● Allows new evidence to update beliefs
● Prior can also be the posterior of a previous update

Example: Fake coin
● Belief update
● What is the probability of seeing a fair coin after we have already seen one Heads?

Bayesian Inference
(Comic: https://xkcd.com/1132/)

Bayesian Inference
● Statistical inference of parameters θ from data D
● P(θ|D) = P(D|θ) · P(θ) / P(D), where the prior P(θ) encodes additional knowledge

Coin flip example
● Flip a coin several times
● Is it fair?
● Let's use Bayesian inference

Binomial model
● Probability p of flipping heads
● Flipping tails: 1 − p
● Binomial model: P(k heads in n flips | p) = C(n, k) p^k (1 − p)^(n−k)

Prior
● Prior belief about parameter(s)
● Conjugate prior
  – Posterior is of the same distribution family as the prior
  – Beta distribution is conjugate to the binomial
● Beta prior

Beta distribution
● Continuous probability distribution
● Interval [0, 1]
● Two shape parameters: α and β
  – If ≥ 1, interpret as pseudo counts
  – α would refer to flipping heads
(Slides show Beta density plots for several (α, β) settings.)

Posterior
● Posterior is also a Beta distribution: Beta(α + k, β + n − k) for k heads in n flips
● For the exact derivation: http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf

Posterior
● Assume
  – Binomial p = 0.4
  – Uniform Beta prior: α = 1 and β = 1
  – 200 random variates from the binomial distribution (Heads = 80)
  – Update posterior

Posterior
● Assume
  – Binomial p = 0.4
  – Biased Beta prior: α = 50 and β = 10
  – 200 random variates from the binomial distribution (Heads = 80)
  – Update posterior

Posterior
● Convex combination of prior and data
● The stronger our prior belief, the more data we need to overrule the prior
● The less prior belief we have, the quicker the data overrules the prior
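The conjugate Beta-Binomial update described on the "Posterior" slides can be reproduced in a few lines. Below is a minimal sketch assuming NumPy/SciPy; the variable names and the use of an equal-tailed 95% credible interval (rather than the HDI shown later in the deck) are illustrative choices, not taken from the slides.

```python
# Beta-Binomial posterior update for the coin-flip example:
# 200 flips, 80 heads, compared under the two priors from the slides.
from scipy import stats

n, k = 200, 80                                 # flips and observed heads (true p = 0.4)
priors = {"uniform Beta(1,1)": (1, 1),         # uninformative prior
          "biased Beta(50,10)": (50, 10)}      # strong prior belief in heads

for name, (alpha, beta) in priors.items():
    # Conjugacy: Beta(alpha, beta) prior + k heads in n flips
    # gives a Beta(alpha + k, beta + n - k) posterior.
    posterior = stats.beta(alpha + k, beta + n - k)
    lo, hi = posterior.interval(0.95)          # equal-tailed 95% credible interval
    print(f"{name}: posterior mean = {posterior.mean():.3f}, "
          f"95% interval = ({lo:.3f}, {hi:.3f})")
```

Running the sketch illustrates the point made on the summary slide: with the uniform prior the posterior mean sits near the empirical 80/200 = 0.40, while the biased Beta(50,10) prior (prior mean ≈ 0.83) pulls the estimate up to about 0.5, so more data would be needed to overrule it.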
So is the coin fair?
● Examine posterior
  – 95% posterior density interval
  – ROPE [1]: region of practical equivalence for the null hypothesis
  – Fair coin: [0.45, 0.55]
● 95% HDI: (0.33, 0.47)
● Cannot reject the null
● With more samples we could
[1] Kruschke, John. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press, 2014.

Bayesian Model Comparison
● Evidence (marginal likelihood): P(D|M) = ∫ P(D|θ, M) P(θ|M) dθ
● Parameters marginalized out
● Average of the likelihood weighted by the prior

Bayesian Model Comparison
● Bayes factors [1]: K = P(D|M1) / P(D|M2)
● Ratio of marginal likelihoods
● Interpretation table by Kass & Raftery [1]
● K > 100 → decisive evidence against M2
[1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors." Journal of the American Statistical Association 90.430 (1995): 773-795.

So is the coin fair?
● Null hypothesis: the coin is fair, p = 0.5
● Alternative hypothesis
  – Anything is possible
  – Beta(1,1) prior on p
● Bayes factor

So is the coin fair?
● n = 200
● k = 80
● Bayes factor
● Decent preference for the alternative hypothesis

Other priors
● Prior can encode theories/hypotheses
● Biased hypothesis: Beta(101, 11)
● Haldane prior: Beta(0.001, 0.001)
  – U-shaped
  – High probability on p = 1 or (1 − p) = 1

Frequentist approach
● So is the coin fair?
● Binomial test with null p = 0.5
  – one-tailed
  – p-value ≈ 0.0028
● Chi² test
(A code sketch of the interval, Bayes factor, and binomial test computations follows the closing slide.)

Posterior prediction
● Posterior mean: (α + k) / (α + β + n)
● If the data are large → converges to the MLE k/n
● MAP: maximum a posteriori
  – Bayesian estimator
  – uses the mode of the posterior

Bayesian prediction
● Posterior predictive distribution
● Distribution of unobserved (test) observations conditioned on the observed (training) data
● Frequentist counterpart: prediction with the MLE plugged in

Alternative Bayesian Inference
● Often the marginal likelihood is not easy to evaluate
  – No analytical solution
  – Numerical integration expensive
● Alternatives
  – Monte Carlo integration
    ● Markov Chain Monte Carlo (MCMC)
    ● Gibbs sampling
    ● Metropolis-Hastings algorithm
  – Laplace approximation
  – Variational Bayes

Bayesian (Machine) Learning

Bayesian Models
● Example: Markov chain model
  – Dirichlet prior, categorical likelihood
● Bayesian networks
● Topic models (LDA)
● Hierarchical Bayesian models

Generalized Linear Model
● Multiple linear regression
● Logistic regression
● Bayesian ANOVA

Bayesian Statistical Tests
● Alternatives to frequentist approaches
● Bayesian correlation
● Bayesian t-test

Questions?
Philipp Singer
[email protected]
Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf
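The "So is the coin fair?" analysis (credible interval vs. ROPE, Bayes factor against the point null, and the frequentist binomial test) can also be sketched in code. This is an illustrative appendix assuming SciPy, not material from the deck; the equal-tailed interval stands in for the HDI, which is nearly identical here because the posterior is close to symmetric.

```python
# Is the coin fair? n = 200 flips, k = 80 heads, as in the slides.
import numpy as np
from scipy import stats
from scipy.special import betaln

n, k = 200, 80

# Posterior under the uniform Beta(1,1) prior and its 95% interval.
posterior = stats.beta(1 + k, 1 + n - k)
interval = posterior.interval(0.95)            # ~ (0.33, 0.47)
rope = (0.45, 0.55)                            # region of practical equivalence for "fair"
print("95% interval:", np.round(interval, 3), "ROPE:", rope)

# Bayes factor of the Beta(1,1) alternative against the point null p = 0.5.
# The binomial coefficient cancels, leaving the Beta function and 0.5^n.
log_marginal_alt = betaln(1 + k, 1 + n - k) - betaln(1, 1)
log_marginal_null = n * np.log(0.5)
bayes_factor = np.exp(log_marginal_alt - log_marginal_null)
print("Bayes factor (alternative vs. null):", round(bayes_factor, 2))   # roughly 5

# Frequentist one-tailed binomial test of p = 0.5.
p_value = stats.binomtest(k, n, 0.5, alternative="less").pvalue
print("one-tailed binomial test p-value:", round(p_value, 4))           # ~ 0.0028
```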