Priors and Posteriors (instructor slides)

Computing Posterior Probabilities
Vassilis Athitsos
CSE 4308/5360: Artificial Intelligence I
University of Texas at Arlington
Overview of Candy Bag Example
As described in Russell and Norvig, Chapter 20 of the 2nd edition:
• Five kinds of bags of candies:
  – 10% are h1: 100% cherry candies
  – 20% are h2: 75% cherry candies + 25% lime candies
  – 40% are h3: 50% cherry candies + 50% lime candies
  – 20% are h4: 25% cherry candies + 75% lime candies
  – 10% are h5: 100% lime candies
• Each bag has an infinite number of candies.
  – This way, the ratio of candy types inside a bag does not change as we pick candies out of the bag.
• We have a bag, and we are picking candies out of it.
• Based on the types of candies we are picking, we want to figure out what type of bag we have.
Hypotheses and Prior Probabilities
• Five kinds of bags of candies:
  – 10% are h1: 100% cherry candies
  – 20% are h2: 75% cherry candies + 25% lime candies
  – 40% are h3: 50% cherry candies + 50% lime candies
  – 20% are h4: 25% cherry candies + 75% lime candies
  – 10% are h5: 100% lime candies
• Each hi is called a hypothesis.
• The initial probability that is given for each hypothesis is called the prior probability for that hypothesis.
  – It is called prior because it is the probability we have before we have made any observations.
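As a quick aid for checking the arithmetic on the following slides, here is a minimal Python sketch of this setup; the variable names (priors, p_cherry) are our own, not from the slides or the textbook:

    # Prior probability P(hi) of each hypothesis (bag type).
    priors = {"h1": 0.1, "h2": 0.2, "h3": 0.4, "h4": 0.2, "h5": 0.0 + 0.1}

    # P(Q = C | hi): probability that a candy drawn from a bag of type hi is cherry.
    p_cherry = {"h1": 1.0, "h2": 0.75, "h3": 0.5, "h4": 0.25, "h5": 0.0}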
Observations and Posteriors
• Out of our bag, we pick T candies, whose types are:
Q1, Q2, …, QT.
– Each Qj is equal to either C (cherry) or L (lime).
– These Qj’s are called the observations.
• Based on our observations, we want to answer two
types of questions:
• What is P(hi | Q1, …, Qt)?
– Probability of hypothesis i after t observations.
– This is called the posterior probability of hi.
• What is P(Qt+1 = C | Q1, …, Qt)?
– Similarly, what is P(Qt+1 = L | Q1, …, Qt)?
– Probability of observation t+1 after t observations.
Simplifying notation
• Define:
– Pt(hi) = P(hi | Q1, …, Qt)
– Pt(Qt+1 = C) = P(Qt+1 = C | Q1, …, Qt)
• Special case: t = 0 (no observations):
– P0(hi) = P(hi)
• P0(hi) is the prior probability of hi
– P0(Q1 = C) = P(Q1 = C)
• P0(Q1 = C) is the probability that the first observation is
equal to C.
Questions We Want to Answer, Revisited
Using the simplified notation of the previous
slide:
• What is Pt(hi)?
– Posterior probability of hypothesis i after t
observations.
• What is Pt(Qt+1 = C)?
– Similarly, what is Pt(Qt+1 = L)?
– Probability of observation t+1 after t
observations.
Computing P0(Qt)
• As an example, consider P0(Q1 = C).
• What does P0(Q1 = C) mean?
• It is the probability that the first candy we pick out of our bag is a cherry candy.
• P0(Q1 = C) = P(Q1 = C | h1) * P(h1) +
               P(Q1 = C | h2) * P(h2) +
               P(Q1 = C | h3) * P(h3) +
               P(Q1 = C | h4) * P(h4) +
               P(Q1 = C | h5) * P(h5)
             = 1 * 0.1 + 0.75 * 0.2 + 0.5 * 0.4 + 0.25 * 0.2 + 0 * 0.1
             = 0.5
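This is one application of the law of total probability over the five hypotheses; a sketch reusing the hypothetical priors and p_cherry dictionaries defined earlier:

    # P0(Q1 = C) = sum over i of P(Q1 = C | hi) * P(hi)
    p0_q1_cherry = sum(p_cherry[h] * priors[h] for h in priors)
    print(p0_q1_cherry)  # 0.5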
Computing P1(hi)
• As an example, consider P1(h1).
• What does P1(h1) mean?
• P1(h1) = P(h1 | Q1 = C)
• P1(h1) is the probability that our bag is of type h1, if the first candy we pick out of our bag is a cherry candy.
Computing P1(hi)
• As an example, consider P1(h1).
• P1(h1) = P(h1 | Q1 = C)
• P(h1 | Q1 = C) = P(Q1 = C | h1) * P(h1) / P(Q1 = C)
                 = (1 * 0.1) / 0.5 = 0.2
Computing P1(hi)
• Consider P1(h2).
• P1(h2) = P(h2 | Q1 = C)
• P(h2 | Q1 = C) = P(Q1 = C | h2) * P(h2) / P(Q1 = C)
                 = (0.75 * 0.2) / 0.5 = 0.3
Computing P1(hi)
• Consider P1(h3).
• P1(h3) = P(h3 | Q1 = C)
• P(h3 | Q1 = C) = P(Q1 = C | h3) * P(h3) / P(Q1 = C)
                 = (0.5 * 0.4) / 0.5 = 0.4
Computing P1(hi)
• Consider P1(h4).
• P1(h4) = P(h4 | Q1 = C)
• P(h4 | Q1 = C) = P(Q1 = C | h4) * P(h4) / P(Q1 = C)
                 = (0.25 * 0.2) / 0.5 = 0.1
Computing P1(hi)
• Consider P1(h5).
• P1(h5) = P(h5 | Q1 = C)
• P(h5 | Q1 = C) = P(Q1 = C | h5) * P(h5) / P(Q1 = C)
                 = (0 * 0.1) / 0.5 = 0
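All five posteriors follow the same Bayes-rule pattern, so they can be computed in one dictionary comprehension; a sketch continuing the hypothetical variables from above:

    # P1(hi) = P(Q1 = C | hi) * P(hi) / P0(Q1 = C), for every hypothesis hi.
    p1 = {h: p_cherry[h] * priors[h] / p0_q1_cherry for h in priors}
    print(p1)  # ≈ {'h1': 0.2, 'h2': 0.3, 'h3': 0.4, 'h4': 0.1, 'h5': 0.0} (up to float rounding)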
Updated Probabilities, after Q1 = C
  h1 (100% cherry candies):          P0(h1) = 0.1   P1(h1) = 0.2
  h2 (75% cherry + 25% lime):        P0(h2) = 0.2   P1(h2) = 0.3
  h3 (50% cherry + 50% lime):        P0(h3) = 0.4   P1(h3) = 0.4
  h4 (25% cherry + 75% lime):        P0(h4) = 0.2   P1(h4) = 0.1
  h5 (100% lime candies):            P0(h5) = 0.1   P1(h5) = 0.0
• Probabilities have changed for each bag, now that we have picked one candy and we have seen that it is a cherry candy.
Computing P1(Qt)
• Now, consider 𝑃1 (𝑄2 = 𝐶).
• What does 𝑃1 (𝑄2 = 𝐶) mean?
• It is the probability that the second candy we pick out
of our bag is a cherry candy, given our knowledge of
the type of the first candy.
– To continue with our previous example, we assume that the
first candy was cherry, so 𝑄1 = 𝐶.
• P1(Q2 = C) = P(Q2 = C | Q1 = C)
Computing P1(Qt)
• P1(Q2 = C) = P(Q2 = C | Q1 = C)
             = P(Q2 = C | h1, Q1 = C) * P(h1 | Q1 = C) +
               P(Q2 = C | h2, Q1 = C) * P(h2 | Q1 = C) +
               P(Q2 = C | h3, Q1 = C) * P(h3 | Q1 = C) +
               P(Q2 = C | h4, Q1 = C) * P(h4 | Q1 = C) +
               P(Q2 = C | h5, Q1 = C) * P(h5 | Q1 = C)
• NOTE: Q2 is conditionally independent of Q1 given hi.
  – If we know the type of bag, what we have already picked does not change our expectation of what we will pick next.
• Therefore, we can simplify P(Q2 = C | hi, Q1 = C) as P(Q2 = C | hi).
Computing P1(Qt)
• Simplifying each P(Q2 = C | hi, Q1 = C) to P(Q2 = C | hi):
  P1(Q2 = C) = P(Q2 = C | Q1 = C)
             = P(Q2 = C | h1) * P(h1 | Q1 = C) +
               P(Q2 = C | h2) * P(h2 | Q1 = C) +
               P(Q2 = C | h3) * P(h3 | Q1 = C) +
               P(Q2 = C | h4) * P(h4 | Q1 = C) +
               P(Q2 = C | h5) * P(h5 | Q1 = C)
• We can now plug in numbers everywhere:
  – We know P(Q2 = C | hi).
  – We computed P(hi | Q1 = C) in previous slides, and we called it P1(hi).
Computing P1(Qt)
• P1(Q2 = C) = P(Q2 = C | Q1 = C)
             = 1 * 0.2 + 0.75 * 0.3 + 0.5 * 0.4 + 0.25 * 0.1 + 0 * 0
             = 0.65
• Notice the difference caused by the knowledge that Q1 = C:
  – P0(Q1 = C) = 0.5
  – P1(Q2 = C) = 0.65
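The prediction is the same total-probability sum as P0(Q1 = C), but weighted by the posteriors P1(hi) instead of the priors; a sketch continuing the hypothetical variables above:

    # P1(Q2 = C) = sum over i of P(Q2 = C | hi) * P1(hi)
    p1_q2_cherry = sum(p_cherry[h] * p1[h] for h in p1)
    print(p1_q2_cherry)  # ≈ 0.65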
Computing P2(hi)
• Now, consider P2(h1).
• What does P2(h1) mean?
• We have defined P2(h1) as the probability of h1 after the first two observations.
• In our example, the first two observations were both of type cherry. Therefore:
  P2(h1) = P(h1 | Q1 = C, Q2 = C) = ???
A Special Case of Bayes Rule
• The normal version of Bayes rule states that:
  P(A | B) = P(B | A) * P(A) / P(B)
• From the basic formula, we can derive a special case of Bayes rule, that we can apply if we also know some other fact F:
  P(A | B, F) = P(B | A, F) * P(A | F) / P(B | F)
• Here, we want to compute P(h1 | Q1 = C, Q2 = C).
• We will apply the special case of Bayes rule, with:
  – h1 as A.
  – "Q2 = C" as B.
  – "Q1 = C" as F.
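For completeness (this step is not spelled out on the slide), the special case follows from the definition of conditional probability and the chain rule:

    P(A | B, F) = P(A, B, F) / P(B, F)
                = [P(B | A, F) * P(A | F) * P(F)] / [P(B | F) * P(F)]
                = P(B | A, F) * P(A | F) / P(B | F)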
Computing P2(hi)
• P2(h1) = P(h1 | Q2 = C, Q1 = C)
         = P(Q2 = C | h1, Q1 = C) * P(h1 | Q1 = C) / P(Q2 = C | Q1 = C)
• These are all quantities we have computed before:
  – P(Q2 = C | h1, Q1 = C) = P(Q2 = C | h1) = 1.
  – P(h1 | Q1 = C) = P1(h1) = 0.2.
  – P(Q2 = C | Q1 = C) = P1(Q2 = C) = 0.65.
• Therefore:
  P2(h1) = P(Q2 = C | h1) * P1(h1) / P1(Q2 = C) = (1 * 0.2) / 0.65 ≈ 0.3077
Computing P2(hi)
• P2(h2) = P(h2 | Q2 = C, Q1 = C)
         = P(Q2 = C | h2, Q1 = C) * P(h2 | Q1 = C) / P(Q2 = C | Q1 = C)
         = P(Q2 = C | h2) * P1(h2) / P1(Q2 = C)
         = (0.75 * 0.3) / 0.65 ≈ 0.3462
Computing P2(hi)
• P2(h3) = P(h3 | Q2 = C, Q1 = C)
         = P(Q2 = C | h3, Q1 = C) * P(h3 | Q1 = C) / P(Q2 = C | Q1 = C)
         = P(Q2 = C | h3) * P1(h3) / P1(Q2 = C)
         = (0.5 * 0.4) / 0.65 ≈ 0.3077
Computing P2(hi)
• P2(h4) = P(h4 | Q2 = C, Q1 = C)
         = P(Q2 = C | h4, Q1 = C) * P(h4 | Q1 = C) / P(Q2 = C | Q1 = C)
         = P(Q2 = C | h4) * P1(h4) / P1(Q2 = C)
         = (0.25 * 0.1) / 0.65 ≈ 0.0385
Computing P2(hi)
• P2(h5) = P(h5 | Q2 = C, Q1 = C)
         = P(Q2 = C | h5, Q1 = C) * P(h5 | Q1 = C) / P(Q2 = C | Q1 = C)
         = P(Q2 = C | h5) * P1(h5) / P1(Q2 = C)
         = (0 * 0) / 0.65 = 0
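The second update has exactly the same shape as the first, with P1 playing the role of the prior; a sketch continuing the hypothetical variables from before:

    # P2(hi) = P(Q2 = C | hi) * P1(hi) / P1(Q2 = C), for every hypothesis hi.
    p2 = {h: p_cherry[h] * p1[h] / p1_q2_cherry for h in p1}
    print(p2)  # ≈ {'h1': 0.3077, 'h2': 0.3462, 'h3': 0.3077, 'h4': 0.0385, 'h5': 0.0}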
Updated Probabilities
• Probabilities of bags:
  – Before any observations.
  – After one observation (assuming the first candy is of type cherry).
  – After two observations (assuming both candies are of type cherry).

  h1 (100% cherry candies):        P0(h1) = 0.1   P1(h1) = 0.2   P2(h1) ≈ 0.3077
  h2 (75% cherry + 25% lime):      P0(h2) = 0.2   P1(h2) = 0.3   P2(h2) ≈ 0.3462
  h3 (50% cherry + 50% lime):      P0(h3) = 0.4   P1(h3) = 0.4   P2(h3) ≈ 0.3077
  h4 (25% cherry + 75% lime):      P0(h4) = 0.2   P1(h4) = 0.1   P2(h4) ≈ 0.0385
  h5 (100% lime candies):          P0(h5) = 0.1   P1(h5) = 0.0   P2(h5) = 0.0
Computing Pt(hi)
• Let t be an integer between 1 and T.
• We have defined Pt(hi) = P(hi | Q1, …, Qt).
• To compute Pt(hi), we will again use the special case of Bayes rule from before:
  P(A | B, F) = P(B | A, F) * P(A | F) / P(B | F)
• We will apply this formula, using:
  – hi as A.
  – Qt as B.
  – "Q1, Q2, …, Qt-1" as F.
Computing Pt(hi)
• Let t be an integer between 1 and T.
• Pt(hi) = P(hi | Q1, …, Qt)
         = P(Qt | hi, Q1, …, Qt-1) * P(hi | Q1, …, Qt-1) / P(Qt | Q1, …, Qt-1)
• Since Qt is conditionally independent of Q1, …, Qt-1 given hi, this simplifies to:
  Pt(hi) = P(Qt | hi) * Pt-1(hi) / Pt-1(Qt)
Computing Pt(Qt+1)
• Pt(Qt+1) = P(Qt+1 | Q1, …, Qt)
           = Σi=1..5 ( P(Qt+1 | hi) * P(hi | Q1, …, Qt) )
  => Pt(Qt+1) = Σi=1..5 ( P(Qt+1 | hi) * Pt(hi) )
Computing Pt(hi) (continued)
• The formula
  Pt(hi) = P(Qt | hi) * Pt-1(hi) / Pt-1(Qt)
  is recursive, as it requires knowing Pt-1(hi).
• The base case is P0(hi) = P(hi).
• To compute Pt(hi) we also need Pt-1(Qt). We show how to compute that next.
Computing Pt(hi) and Pt(Qt+1)
• Base case: t = 0.
  – P0(hi) = P(hi), where P(hi) is known.
  – P0(Q1) = Σi=1..5 ( P(Q1 | hi) * P(hi) ), where P(Q1 | hi) is known.
• To compute Pt(hi) and Pt(Qt+1), for j = 1, …, t:
  – Compute Pj(hi) = P(Qj | hi) * Pj-1(hi) / Pj-1(Qj)
    • P(Qj | hi) is known; Pj-1(hi) and Pj-1(Qj) were computed at the previous round.
  – Compute Pj(Qj+1) = Σi=1..5 ( P(Qj+1 | hi) * Pj(hi) )
    • P(Qj+1 | hi) is known; Pj(hi) was computed at the previous line.
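Putting the recursion together, here is a self-contained Python sketch of the full update loop for an arbitrary observation sequence; the function and variable names are our own, and the probabilities match the candy bag example:

    # Recursive computation of Pt(hi) and Pt(Qt+1) for the candy bag example.
    priors = {"h1": 0.1, "h2": 0.2, "h3": 0.4, "h4": 0.2, "h5": 0.1}
    likelihood = {                       # P(Q = q | hi), for q in {"C", "L"}
        "h1": {"C": 1.0,  "L": 0.0},
        "h2": {"C": 0.75, "L": 0.25},
        "h3": {"C": 0.5,  "L": 0.5},
        "h4": {"C": 0.25, "L": 0.75},
        "h5": {"C": 0.0,  "L": 1.0},
    }

    def update(posterior, q):
        """One round: from P_{j-1}(hi) and observation Qj = q, compute Pj(hi)."""
        # P_{j-1}(Qj = q) = sum over i of P(Qj = q | hi) * P_{j-1}(hi)
        evidence = sum(likelihood[h][q] * posterior[h] for h in posterior)
        # Pj(hi) = P(Qj = q | hi) * P_{j-1}(hi) / P_{j-1}(Qj = q)
        return {h: likelihood[h][q] * posterior[h] / evidence for h in posterior}

    posterior = dict(priors)             # base case: P0(hi) = P(hi)
    for q in ["C", "C"]:                 # observations Q1 = C, Q2 = C
        posterior = update(posterior, q)
    print(posterior)                     # ≈ h1: 0.3077, h2: 0.3462, h3: 0.3077, h4: 0.0385, h5: 0.0

    # Predictive probability of the next observation, Pt(Qt+1 = C); here P2(Q3 = C).
    print(sum(likelihood[h]["C"] * posterior[h] for h in posterior))  # ≈ 0.7308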