Bayesian Updating: Continuous Priors
18.05 Spring 2014

[Title figure: a probability density curve on [0, 1]; vertical axis runs from 0 to 7.]

Continuous range of hypotheses

Example. Bernoulli with unknown probability of success p. We can hypothesize that p takes any value in [0, 1]. Model: a 'bent coin' with probability p of heads.

Example. Waiting time X ∼ exp(λ) with unknown λ. We can hypothesize that λ takes any value greater than 0.

Example. A normal random variable with unknown µ and σ. We can hypothesize that (µ, σ) lies anywhere in (−∞, ∞) × [0, ∞).

Example of Bayesian updating so far

There are three types of coins with probabilities 0.25, 0.5, 0.75 of heads. Assume the numbers of each type are in the ratio 1 : 2 : 1. We pick a coin at random, toss it twice, and get TT. Compute the posterior probability that the coin has probability 0.25 of heads.

Notation with lots of hypotheses I.

Now there are 5 types of coins with probabilities 0.1, 0.3, 0.5, 0.7, 0.9 of heads. Assume the numbers of each type are in the ratio 1 : 2 : 3 : 2 : 1 (so fairer coins are more common). Again we pick a coin at random, toss it twice, and get TT. Construct the Bayesian update table for the posterior probabilities of each type of coin.

hypothesis H                     C_0.1    C_0.3    C_0.5    C_0.7    C_0.9    Total
prior P(H)                       1/9      2/9      3/9      2/9      1/9      1
likelihood P(data|H)             (0.9)^2  (0.7)^2  (0.5)^2  (0.3)^2  (0.1)^2
Bayes numerator P(data|H)P(H)    0.090    0.109    0.083    0.020    0.001    P(data) = 0.303
posterior P(H|data)              0.297    0.359    0.275    0.066    0.004    1

Notation with lots of hypotheses II.

What about 9 coins with probabilities 0.1, 0.2, 0.3, . . . , 0.9? Assume fairer coins are more common, with the number of coins of probability θ of heads proportional to θ(1 − θ). Again the data is TT. We can do this!

Table with 9 hypotheses

hypothesis H   prior P(H)     likelihood P(data|H)   Bayes numerator P(data|H)P(H)   posterior P(H|data)
C_0.1          k(0.1 · 0.9)   (0.9)^2                0.0442                          0.1483
C_0.2          k(0.2 · 0.8)   (0.8)^2                0.0621                          0.2083
C_0.3          k(0.3 · 0.7)   (0.7)^2                0.0624                          0.2093
C_0.4          k(0.4 · 0.6)   (0.6)^2                0.0524                          0.1757
C_0.5          k(0.5 · 0.5)   (0.5)^2                0.0379                          0.1271
C_0.6          k(0.6 · 0.4)   (0.4)^2                0.0233                          0.0781
C_0.7          k(0.7 · 0.3)   (0.3)^2                0.0115                          0.0384
C_0.8          k(0.8 · 0.2)   (0.2)^2                0.0039                          0.0130
C_0.9          k(0.9 · 0.1)   (0.1)^2                0.0005                          0.0018
Total          1                                     P(data) = 0.298                 1

k = 0.606 was computed so that the total prior probability is 1.

Notation with lots of hypotheses III.

What about 99 coins with probabilities 0.01, 0.02, 0.03, . . . , 0.99? Assume fairer coins are more common, with the number of coins of probability θ of heads proportional to θ(1 − θ). Again the data is TT. We could do this . . .
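Before wading through the 99-row table on the next slide, note that the whole update fits in a few lines of R (the slides mention that R was used for these computations; the code and variable names below are our own sketch):

```r
# 99 hypotheses: theta = probability of heads, in 0.01, 0.02, ..., 0.99
theta <- seq(0.01, 0.99, by = 0.01)

# Prior proportional to theta * (1 - theta); k makes the priors sum to 1
k <- 1 / sum(theta * (1 - theta))   # approximately 0.06
prior <- k * theta * (1 - theta)

# Likelihood of the data TT given theta
likelihood <- (1 - theta)^2

# Bayes numerator, total probability of the data, and posterior
bayes_numerator <- likelihood * prior
p_data <- sum(bayes_numerator)      # approximately 0.300
posterior <- bayes_numerator / p_data
```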
Table with 99 coins

The slide lists all 99 rows; an excerpt is shown here. The constant k = 0.06 was computed so that the total prior probability is 1.

hypothesis H   prior P(H)          likelihood P(data|H)   Bayes numerator P(data|H)P(H)   posterior P(H|data)
C_0.01         k(0.01)(1 − 0.01)   (1 − 0.01)^2           k(0.01)(1 − 0.01)^3             0.0019
C_0.02         k(0.02)(1 − 0.02)   (1 − 0.02)^2           k(0.02)(1 − 0.02)^3             0.0038
C_0.03         k(0.03)(1 − 0.03)   (1 − 0.03)^2           k(0.03)(1 − 0.03)^3             0.0055
...            ...                 ...                    ...                             ...
C_0.25         k(0.25)(1 − 0.25)   (1 − 0.25)^2           k(0.25)(1 − 0.25)^3             0.0211
...            ...                 ...                    ...                             ...
C_0.99         k(0.99)(1 − 0.99)   (1 − 0.99)^2           k(0.99)(1 − 0.99)^3             0.0000
Total          1                                          P(data) = 0.300                 1
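As a consistency check on the table (a short derivation of our own, not on the slides): the posterior column is proportional to θ(1 − θ)^3, and one line of calculus locates its peak exactly where the table's largest entry sits, θ = 0.25.

```latex
% Maximize theta * (1 - theta)^3 over [0, 1] by setting the derivative to zero.
\[
\frac{d}{d\theta}\,\theta(1-\theta)^3
  = (1-\theta)^3 - 3\theta(1-\theta)^2
  = (1-\theta)^2(1-4\theta) = 0
  \quad\Longrightarrow\quad \theta = \tfrac14,
\]
% matching the maximal posterior entry 0.0211 in the row for C_0.25.
```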
Maybe there's a better way

Use some symbolic notation! Let θ be the probability of heads: θ = 0.01, 0.02, . . . , 0.99. Use θ also to stand for the hypothesis that the coin is of the type with probability of heads θ.

We are given a formula for the prior: p(θ) = kθ(1 − θ). The likelihood is P(data|θ) = P(TT|θ) = (1 − θ)^2. Our 99-row table becomes:

hypothesis H   prior P(H)   likelihood P(data|H)   Bayes numerator P(data|H)P(H)   posterior P(H|data)
θ              kθ(1 − θ)    (1 − θ)^2              kθ(1 − θ)^3                     0.2 · θ(1 − θ)^3
Total          1                                   P(data) = 0.300                 1

(We used R to compute k ≈ 0.06 so that the total prior probability is 1. Then we used it again to compute P(data) ≈ 0.300 and the posterior coefficient k/P(data) ≈ 0.2.)

Notation: big and little letters

1. (Big letters) Events A, probability function P(A).
2. (Little letters) Values x, pmf p(x) or pdf f(x). 'X = x' is an event, so P(X = x) = p(x).

Bayesian updating:

3. (Big letters) For hypotheses H and data D: P(H), P(D), P(H|D), P(D|H).
4. (Little letters) For hypothesis values θ and data values x:

   p(θ)     or  f(θ) dθ
   p(x)     or  f(x) dx
   p(θ|x)   or  f(θ|x) dθ
   p(x|θ)   or  f(x|θ) dx

Example. See the coin example in the reading.

Review of pdf and probability

X is a random variable with pdf f(x). The pdf f(x) is a density with units probability/(units of x).

[Figure: two graphs of the density f(x). Left: the shaded area between c and d is P(c ≤ X ≤ d). Right: a strip of width dx at x has area f(x) dx.]

P(c ≤ X ≤ d) = ∫_c^d f(x) dx.

The probability that X lies in an infinitesimal range dx around x is f(x) dx.

Example of continuous hypotheses

Example. Suppose we have a coin with unknown probability of heads θ. We can hypothesize that θ takes any value in [0, 1]. Since θ is continuous we need a prior pdf f(θ), e.g. f(θ) = kθ(1 − θ). We use f(θ) dθ to work with probabilities instead of densities. For example, the prior probability that θ lies in an infinitesimal range dθ around 0.5 is f(0.5) dθ. To avoid cumbersome language we will simply say 'the hypothesis θ has prior probability f(θ) dθ.'

Law of total probability for continuous distributions

Discrete set of hypotheses H_1, H_2, . . . , H_n; data D:

   P(D) = Σ_{i=1}^{n} P(D|H_i) P(H_i).

In little letters, with hypotheses θ_1, θ_2, . . . , θ_n and data x:

   p(x) = Σ_{i=1}^{n} p(x|θ_i) p(θ_i).

Continuous range of hypotheses θ on [a, b]; discrete data x:

   p(x) = ∫_a^b p(x|θ) f(θ) dθ.

This is also called the prior predictive probability of the outcome x.

Board question: total probability

1. A coin has unknown probability of heads θ with prior pdf f(θ) = 3θ^2. Find the probability of throwing tails on the first toss.
2. Describe a setup with success and failure that this models.
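For part 1, the continuous law of total probability gives P(tails) = ∫_0^1 (1 − θ) · 3θ^2 dθ. A quick numerical check in R (our own sketch, not part of the slides):

```r
# P(tails) = integral over [0, 1] of P(tails | theta) * f(theta) dtheta,
# with prior density f(theta) = 3 * theta^2 and P(tails | theta) = 1 - theta
p_tails <- integrate(function(theta) (1 - theta) * 3 * theta^2, 0, 1)
p_tails$value   # 0.25, matching the exact value 3 * (1/3 - 1/4) = 1/4
```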
Bayes' theorem for continuous distributions

θ: continuous parameter with pdf f(θ) and range [a, b]; x: random discrete data; likelihood: p(x|θ).

Bayes' theorem:

   f(θ|x) dθ = p(x|θ) f(θ) dθ / p(x) = p(x|θ) f(θ) dθ / ∫_a^b p(x|θ) f(θ) dθ.

Not everyone uses dθ (but they should):

   f(θ|x) = p(x|θ) f(θ) / p(x) = p(x|θ) f(θ) / ∫_a^b p(x|θ) f(θ) dθ.

Concept question

Suppose X ∼ Bernoulli(θ) where the value of θ is unknown. If we use Bayesian methods to make probabilistic statements about θ, which of the following is true?

1. The random variable is discrete, the space of hypotheses is discrete.
2. The random variable is discrete, the space of hypotheses is continuous.
3. The random variable is continuous, the space of hypotheses is discrete.
4. The random variable is continuous, the space of hypotheses is continuous.

Bayesian update tables: continuous priors

X ∼ Bernoulli(θ) with θ unknown. Continuous hypotheses θ in [0, 1]; data x; prior pdf f(θ); likelihood p(x|θ).

hypothesis   prior     likelihood   Bayes numerator   posterior
θ            f(θ) dθ   p(x|θ)       p(x|θ) f(θ) dθ    p(x|θ) f(θ) dθ / p(x)
Total        1                      p(x) = ∫_0^1 p(x|θ) f(θ) dθ        1

Note: p(x) is the prior predictive probability of x.

Board question

A 'bent' coin has unknown probability θ of heads. Prior: f(θ) = 2θ on [0, 1]. Data: toss once and get heads.

1. Find the posterior pdf given this new data.
2. Suppose you toss again and get tails. Update your posterior from problem 1 using this data.
3. On one set of axes, graph the prior and the posteriors from problems 1 and 2.

Board question

Same scenario: a bent coin ∼ Bernoulli(θ). Flat prior: f(θ) = 1 on [0, 1]. Data: toss 27 times and get 15 heads and 12 tails.

1. Use this data to find the posterior pdf. Give the integral for the normalizing factor, but do not compute it out. Call its value T and give the posterior pdf in terms of T.

Beta distribution

Beta(a, b) has density

   f(θ) = [(a + b − 1)! / ((a − 1)! (b − 1)!)] · θ^(a−1) (1 − θ)^(b−1)

http://mathlets.org/mathlets/beta-distribution/

Observation: the coefficient is a normalizing factor, so if

   f(θ) = c θ^(a−1) (1 − θ)^(b−1)

is a pdf, then c = (a + b − 1)! / ((a − 1)! (b − 1)!) and f(θ) is the pdf of a Beta(a, b) distribution.
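This connects back to the previous board question: with a flat prior and data of 15 heads and 12 tails, the posterior is proportional to θ^15 (1 − θ)^12, i.e. it is the Beta(16, 13) pdf, so the normalizing factor T is the Beta function B(16, 13). A short R check (our own sketch, not part of the deck):

```r
# Normalizing factor T for the posterior from a flat prior, 15 heads, 12 tails:
# T = integral over [0, 1] of theta^15 * (1 - theta)^12 dtheta
T_numeric <- integrate(function(theta) theta^15 * (1 - theta)^12, 0, 1)$value
T_numeric      # approximately 2.05e-09
beta(16, 13)   # base R's Beta function B(16, 13) = 15! * 12! / 28!; same value
# The posterior theta^15 * (1 - theta)^12 / T is then dbeta(theta, 16, 13)
```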
MIT OpenCourseWare
https://ocw.mit.edu

18.05 Introduction to Probability and Statistics
Spring 2014

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms