Bayesian Updating: Continuous Priors, Discrete Data
18.05 Spring 2014
[Figure: plot of a density over θ ∈ [0, 1], with horizontal ticks at 0.1, 0.3, 0.5, 0.7, 0.9, 1.0 and vertical axis from 0 to 7.]
Continuous range of hypotheses
Example. Bernoulli with unknown probability of success p.
Can hypothesize that p takes any value in [0, 1].
Model: ‘bent coin’ with probability p of heads.
Example. Waiting time X ∼ exp(λ) with unknown λ.
Can hypothesize that λ takes any value greater than 0.
Example. A normal random variable with unknown µ and σ. We can
hypothesize that (µ, σ) lies anywhere in (−∞, ∞) × [0, ∞).
Example of Bayesian updating so far
Three types of coins with probabilities 0.25, 0.5, 0.75 of heads.
Assume the numbers of each type are in the ratio 1 to 2 to 1.
Assume we pick a coin at random, toss it twice and get TT.
Compute the posterior probability the coin has probability 0.25 of
heads.
Notation with lots of hypotheses I.
Now there are 5 types of coins with probabilities 0.1, 0.3, 0.5,
0.7, 0.9 of heads.
Assume the numbers of each type are in the ratio 1:2:3:2:1 (so
fairer coins are more common).
Again we pick a coin at random, toss it twice and get TT.
Construct the Bayesian update table for the posterior probabilities of
each type of coin.
hypothesis   prior    likelihood   Bayes numerator   posterior
H            P(H)     P(data|H)    P(data|H)P(H)     P(H|data)
C0.1         1/9      (0.9)²       0.090             0.297
C0.3         2/9      (0.7)²       0.109             0.359
C0.5         3/9      (0.5)²       0.083             0.275
C0.7         2/9      (0.3)²       0.020             0.066
C0.9         1/9      (0.1)²       0.001             0.004
Total        1                     P(data) = 0.303   1
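As a check, the whole update can be run in R; a minimal sketch (the variable names are our own):

    # five coin types, prior ratio 1:2:3:2:1, data = TT
    theta      <- c(0.1, 0.3, 0.5, 0.7, 0.9)   # probability of heads for each type
    prior      <- c(1, 2, 3, 2, 1) / 9         # prior P(H)
    likelihood <- (1 - theta)^2                # P(TT | H)
    numerator  <- likelihood * prior           # Bayes numerator
    p_data     <- sum(numerator)               # P(data) = 0.303
    posterior  <- numerator / p_data           # P(H | data)
    round(rbind(prior, likelihood, numerator, posterior), 3)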
Notation with lots of hypotheses II.
What about 9 coins with probabilities 0.1, 0.2, 0.3, . . . , 0.9?
Assume fairer coins are more common, with the number of coins of
probability θ of heads proportional to θ(1 − θ).
Again the data is TT.
We can do this!
Table with 9 hypotheses

hypothesis   prior          likelihood   Bayes numerator   posterior
H            P(H)           P(data|H)    P(data|H)P(H)     P(H|data)
C0.1         k(0.1 · 0.9)   (0.9)²       0.0442            0.1483
C0.2         k(0.2 · 0.8)   (0.8)²       0.0621            0.2083
C0.3         k(0.3 · 0.7)   (0.7)²       0.0624            0.2093
C0.4         k(0.4 · 0.6)   (0.6)²       0.0524            0.1757
C0.5         k(0.5 · 0.5)   (0.5)²       0.0379            0.1271
C0.6         k(0.6 · 0.4)   (0.4)²       0.0233            0.0781
C0.7         k(0.7 · 0.3)   (0.3)²       0.0115            0.0384
C0.8         k(0.8 · 0.2)   (0.2)²       0.0039            0.0130
C0.9         k(0.9 · 0.1)   (0.1)²       0.0005            0.0018
Total        1                           P(data) = 0.298   1
k = 0.606 was computed so that the total prior probability is 1.
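A short R check of these constants (a sketch):

    theta <- seq(0.1, 0.9, by = 0.1)
    k <- 1 / sum(theta * (1 - theta))             # k = 0.606
    sum(k * theta * (1 - theta) * (1 - theta)^2)  # P(data) = 0.298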
Notation with lots of hypotheses III.
What about 99 coins with probabilities 0.01, 0.02, 0.03, . . . , 0.99?
Assume fairer coins are more common, with the number of coins of
probability θ of heads proportional to θ(1 − θ).
Again the data is TT.
We could do this . . .
Table with 99 coins
The full table has 99 rows, one for each hypothesis C0.01, C0.02, . . . , C0.99. Every row follows the same pattern, so only the first few are shown.

hypothesis   prior (k = 0.06)     likelihood    Bayes numerator       posterior
H            P(H)                 P(data|H)     P(data|H)P(H)         P(H|data)
C0.01        k(0.01)(1 − 0.01)    (1 − 0.01)²   k(0.01)(1 − 0.01)²    0.0019
C0.02        k(0.02)(1 − 0.02)    (1 − 0.02)²   k(0.02)(1 − 0.02)²    0.0038
C0.03        k(0.03)(1 − 0.03)    (1 − 0.03)²   k(0.03)(1 − 0.03)²    0.0055
⋮            ⋮                    ⋮             ⋮                     ⋮
Total        1                                  P(data) = 0.300       1
Maybe there’s a better way
Use some symbolic notation!
Let θ be the probability of heads: θ = 0.01, 0.02, . . . , 0.99.
Use θ also to stand for the hypothesis that the coin has probability
θ of heads.
We are given a formula for the prior: p(θ) = kθ(1 − θ).
The likelihood P(data|θ) = P(TT|θ) = (1 − θ)².
Our 99-row table becomes:

hypothesis   prior       likelihood   Bayes numerator   posterior
H            P(H)        P(data|H)    P(data|H)P(H)     P(H|data)
θ            kθ(1 − θ)   (1 − θ)²     kθ(1 − θ)³        0.2 · θ(1 − θ)³
Total        1                        P(data) = 0.300   1
(We used R to compute k = 0.060 so that the total prior probability is 1,
then used it again to compute P(data) = 0.300; the posterior's constant is
k/P(data) = 0.2.)
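The R computation just mentioned is only a few lines; a sketch:

    theta  <- seq(0.01, 0.99, by = 0.01)
    k      <- 1 / sum(theta * (1 - theta))      # k = 0.060: total prior probability is 1
    p_data <- sum(k * theta * (1 - theta)^3)    # P(data) = 0.300
    k / p_data                                  # 0.2, the posterior's constant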
Notation: big and little letters
1. (Big letters) Event A, probability function P(A).
2. (Little letters) Value x, pmf p(x) or pdf f (x).
‘X = x’ is an event:
P(X = x) = p(x).
Bayesian updating
3. (Big letters) For hypotheses H and data D:
P(H), P(D), P(H|D), P(D|H).
4. (Little letters) Hypothesis values θ and data values x:
   discrete:     p(θ)      p(x)      p(θ|x)      p(x|θ)
   continuous:   f(θ) dθ   f(x) dx   f(θ|x) dθ   f(x|θ) dx
Example. Coin example in reading
Review of pdf and probability
X random variable with pdf f (x).
f (x) is a density with units: probability/units of x.
[Figure: two graphs of a pdf f(x): in the first, the area between c and d is shaded and labeled P(c ≤ X ≤ d); in the second, a thin strip of width dx has area (probability) f(x) dx.]

P(c ≤ X ≤ d) = ∫_c^d f(x) dx.
The probability that X is in an infinitesimal range dx around x is
f(x) dx.
Example of continuous hypotheses
Example. Suppose that we have a coin with probability of heads θ,
where θ is unknown. We can hypothesize that θ takes any value in
[0, 1].
Since θ is continuous we need a prior pdf f (θ), e.g.
f (θ) = kθ(1 − θ).
Use f(θ) dθ to work with probabilities instead of densities. For
example, the prior probability that θ is in an infinitesimal range dθ
around 0.5 is f(0.5) dθ.
To avoid cumbersome language we will simply say
‘The hypothesis θ has prior probability f (θ) dθ.’
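For this prior, k is pinned down by requiring ∫₀¹ f(θ) dθ = 1; a quick R sketch:

    integrand <- function(theta) theta * (1 - theta)
    k <- 1 / integrate(integrand, 0, 1)$value   # k = 6, so f(theta) = 6*theta*(1 - theta)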
Law of total probability for continuous distributions
Discrete set of hypotheses H₁, H₂, . . . , Hₙ; data D:

P(D) = Σᵢ₌₁ⁿ P(D|Hᵢ)P(Hᵢ).

In little letters: hypotheses θ₁, θ₂, . . . , θₙ; data x:

p(x) = Σᵢ₌₁ⁿ p(x|θᵢ)p(θᵢ).

Continuous range of hypotheses θ on [a, b]; discrete data x:

p(x) = ∫_a^b p(x|θ) f(θ) dθ
Also called prior predictive probability of the outcome x.
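As a concrete sketch, take the prior f(θ) = 6θ(1 − θ) from the previous slide and data x = tails, so p(x|θ) = 1 − θ:

    f_prior <- function(theta) 6 * theta * (1 - theta)      # prior pdf on [0, 1]
    lik     <- function(theta) 1 - theta                    # p(tails | theta)
    integrate(function(t) lik(t) * f_prior(t), 0, 1)$value  # p(tails) = 0.5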
Board question: total probability
1. A coin has unknown probability of heads θ with prior pdf
f(θ) = 3θ². Find the probability of throwing tails on the first toss.
2. Describe a setup with success and failure that this models.
Bayes’ theorem for continuous distributions
θ: continuous parameter with pdf f(θ) and range [a, b];
x: random discrete data;
likelihood: p(x|θ).

Bayes’ Theorem.

f(θ|x) dθ = p(x|θ)f(θ) dθ / p(x) = p(x|θ)f(θ) dθ / ∫_a^b p(x|θ)f(θ) dθ.

Not everyone uses dθ (but they should):

f(θ|x) = p(x|θ)f(θ) / p(x) = p(x|θ)f(θ) / ∫_a^b p(x|θ)f(θ) dθ.
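Continuing the sketch above (prior 6θ(1 − θ), one observed tail), the posterior pdf can be built numerically exactly as the theorem reads:

    f_prior <- function(theta) 6 * theta * (1 - theta)
    lik     <- function(theta) 1 - theta                          # p(tails | theta)
    p_x     <- integrate(function(t) lik(t) * f_prior(t), 0, 1)$value
    f_post  <- function(theta) lik(theta) * f_prior(theta) / p_x
    f_post(0.5)                                                   # 1.5, matching dbeta(0.5, 2, 3)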
Concept question
Suppose X ∼ Bernoulli(θ) where the value of θ is unknown. If we use
Bayesian methods to make probabilistic statements about θ then
which of the following is true?
1. The random variable is discrete, the space of hypotheses is
discrete.
2. The random variable is discrete, the space of hypotheses is
continuous.
3. The random variable is continuous, the space of hypotheses is
discrete.
4. The random variable is continuous, the space of hypotheses is
continuous.
Bayesian update tables: continuous priors
X ∼ Bernoulli(θ). Unknown θ
Continuous hypotheses θ in [0,1].
Data x.
Prior pdf f (θ).
Likelihood p(x|θ).
hypothesis   prior     likelihood   Bayes numerator   posterior
θ            f(θ) dθ   p(x|θ)       p(x|θ)f(θ) dθ     p(x|θ)f(θ) dθ / p(x)
Total        1                      p(x) = ∫₀¹ p(x|θ)f(θ) dθ     1
Note p(x) = the prior predictive probability of x.
Board question
‘Bent’ coin: unknown probability θ of heads.
Prior: f (θ) = 2θ on [0, 1].
Data: toss and get heads.
1. Find the posterior pdf given this new data.
2. Suppose you toss again and get tails. Update your posterior from
problem 1 using this data.
3. On one set of axes graph the prior and the posteriors from
problems 1 and 2.
Board Question
Same scenario: bent coin ∼ Bernoulli(θ).
Flat prior: f (θ) = 1 on [0, 1]
Data: toss 27 times and get 15 heads and 12 tails.
1. Use this data to find the posterior pdf.
Give the integral for the normalizing factor, but do not compute it
out. Call its value T and give the posterior pdf in terms of T .
Beta distribution
Beta(a, b) has density
f(θ) = [(a + b − 1)! / ((a − 1)!(b − 1)!)] θ^(a−1) (1 − θ)^(b−1)
http://mathlets.org/mathlets/beta-distribution/
Observation: The coefficient is a normalizing factor, so if

f(θ) = c θ^(a−1) (1 − θ)^(b−1)

is a pdf, then

c = (a + b − 1)! / ((a − 1)!(b − 1)!)

and f(θ) is the pdf of a Beta(a, b) distribution.
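In R the Beta(a, b) pdf is dbeta(theta, a, b); a quick sketch checking the factorial coefficient for integer a, b:

    a <- 2; b <- 4
    c_coef <- factorial(a + b - 1) / (factorial(a - 1) * factorial(b - 1))
    theta  <- 0.3
    c_coef * theta^(a - 1) * (1 - theta)^(b - 1)   # 2.058
    dbeta(theta, a, b)                             # same: 2.058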
MIT OpenCourseWare
https://ocw.mit.edu
18.05 Introduction to Probability and Statistics
Spring 2014
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms .