Computing Posterior Probabilities
Vassilis Athitsos
CSE 4308/5360: Artificial Intelligence I
University of Texas at Arlington

Overview of Candy Bag Example

As described in Russell and Norvig (Chapter 20 of the 2nd edition):
• Five kinds of bags of candies:
  – 10% are h1: 100% cherry candies
  – 20% are h2: 75% cherry candies + 25% lime candies
  – 40% are h3: 50% cherry candies + 50% lime candies
  – 20% are h4: 25% cherry candies + 75% lime candies
  – 10% are h5: 100% lime candies
• Each bag has an infinite number of candies.
  – This way, the ratio of candy types inside a bag does not change as we pick candies out of the bag.
• We have a bag, and we are picking candies out of it.
• Based on the types of candies we are picking, we want to figure out what type of bag we have.

Hypotheses and Prior Probabilities

• Each hi (one of the five bag types listed above) is called a hypothesis.
• The initial probability given for each hypothesis is called the prior probability of that hypothesis.
  – It is called "prior" because it is the probability we have before we have made any observations.

Observations and Posteriors

• Out of our bag, we pick T candies, whose types are Q1, Q2, …, QT.
  – Each Qj is equal to either C (cherry) or L (lime).
  – These Qj's are called the observations.
• Based on our observations, we want to answer two types of questions:
• What is P(hi | Q1, …, Qt)?
  – The probability of hypothesis hi after t observations.
  – This is called the posterior probability of hi.
• What is P(Qt+1 = C | Q1, …, Qt)? Similarly, what is P(Qt+1 = L | Q1, …, Qt)?
  – The probability of observation t+1, given the first t observations.

Simplifying Notation

• Define:
  – Pt(hi) = P(hi | Q1, …, Qt)
  – Pt(Qt+1 = C) = P(Qt+1 = C | Q1, …, Qt)
• Special case: t = 0 (no observations):
  – P0(hi) = P(hi), the prior probability of hi.
  – P0(Q1 = C) = P(Q1 = C), the probability that the first observation is equal to C.

Questions We Want to Answer, Revisited

Using the simplified notation of the previous slide:
• What is Pt(hi)?
  – The posterior probability of hypothesis hi after t observations.
• What is Pt(Qt+1 = C)? Similarly, what is Pt(Qt+1 = L)?
  – The probability of observation t+1, given the first t observations.

Computing P0(Qt)

• As an example, consider P0(Q1 = C).
• What does P0(Q1 = C) mean? It is the probability that the first candy we pick out of our bag is a cherry candy.
• By the law of total probability over the five hypotheses:
  P0(Q1 = C) = P(Q1 = C | h1) * P(h1) + P(Q1 = C | h2) * P(h2) + P(Q1 = C | h3) * P(h3)
             + P(Q1 = C | h4) * P(h4) + P(Q1 = C | h5) * P(h5)
             = 1 * 0.1 + 0.75 * 0.2 + 0.5 * 0.4 + 0.25 * 0.2 + 0 * 0.1 = 0.5
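The total-probability computation above can be checked with a short Python sketch. This example is not part of the original slides; the variable names priors and p_cherry_given_h are illustrative stand-ins for P(hi) and P(Qj = C | hi).

# Priors P(h1), ..., P(h5) and likelihoods P(Q = C | h1), ..., P(Q = C | h5)
# for the candy-bag example (names are illustrative, not from the slides).
priors = [0.1, 0.2, 0.4, 0.2, 0.1]
p_cherry_given_h = [1.0, 0.75, 0.5, 0.25, 0.0]

# P0(Q1 = C) by the law of total probability.
p0_q1_cherry = sum(lik * prior for lik, prior in zip(p_cherry_given_h, priors))
print(round(p0_q1_cherry, 4))  # 0.5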
Computing P1(hi)

• As an example, consider P1(h1). What does P1(h1) mean?
• P1(h1) = P(h1 | Q1 = C): the probability that our bag is of type h1, if the first candy we pick out of our bag is a cherry candy.
• By Bayes rule:
  P1(h1) = P(h1 | Q1 = C) = P(Q1 = C | h1) * P(h1) / P(Q1 = C) = (1 * 0.1) / 0.5 = 0.2
• The same computation gives the posterior of each of the other hypotheses:
  P1(h2) = P(Q1 = C | h2) * P(h2) / P(Q1 = C) = (0.75 * 0.2) / 0.5 = 0.3
  P1(h3) = P(Q1 = C | h3) * P(h3) / P(Q1 = C) = (0.5 * 0.4) / 0.5 = 0.4
  P1(h4) = P(Q1 = C | h4) * P(h4) / P(Q1 = C) = (0.25 * 0.2) / 0.5 = 0.1
  P1(h5) = P(Q1 = C | h5) * P(h5) / P(Q1 = C) = (0 * 0.1) / 0.5 = 0

Updated Probabilities, after Q1 = C

  P0(h1) = 0.1    P1(h1) = 0.2    h1: 100% cherry candies
  P0(h2) = 0.2    P1(h2) = 0.3    h2: 75% cherry candies + 25% lime candies
  P0(h3) = 0.4    P1(h3) = 0.4    h3: 50% cherry candies + 50% lime candies
  P0(h4) = 0.2    P1(h4) = 0.1    h4: 25% cherry candies + 75% lime candies
  P0(h5) = 0.1    P1(h5) = 0.0    h5: 100% lime candies

• The probabilities have changed for each bag, now that we have picked one candy and seen that it is a cherry candy.

Computing P1(Qt)

• Now, consider P1(Q2 = C). What does P1(Q2 = C) mean?
• It is the probability that the second candy we pick out of our bag is a cherry candy, given our knowledge of the type of the first candy.
  – To continue with our previous example, we assume that the first candy was cherry, so Q1 = C.
• P1(Q2 = C) = P(Q2 = C | Q1 = C)
  = P(Q2 = C | h1, Q1 = C) * P(h1 | Q1 = C) +
    P(Q2 = C | h2, Q1 = C) * P(h2 | Q1 = C) +
    P(Q2 = C | h3, Q1 = C) * P(h3 | Q1 = C) +
    P(Q2 = C | h4, Q1 = C) * P(h4 | Q1 = C) +
    P(Q2 = C | h5, Q1 = C) * P(h5 | Q1 = C)
• NOTE: Q2 is conditionally independent of Q1 given hi.
  – If we know the type of bag, what we have already picked does not change our expectation of what we will pick next.
  – Therefore, we can simplify P(Q2 = C | hi, Q1 = C) to P(Q2 = C | hi).
• We can now plug in numbers everywhere:
  – We know P(Q2 = C | hi).
  – We computed P(hi | Q1 = C) in previous slides, and we called it P1(hi).
• P1(Q2 = C) = 1 * 0.2 + 0.75 * 0.3 + 0.5 * 0.4 + 0.25 * 0.1 + 0 * 0 = 0.65
• Notice the difference caused by the knowledge that Q1 = C:
  – P0(Q1 = C) = 0.5
  – P1(Q2 = C) = 0.65
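The Bayes-rule update and the prediction for the second candy can be reproduced with the following sketch. Again, this is illustrative and not from the slides; priors and p_cherry_given_h stand for P(hi) and P(Qj = C | hi).

priors = [0.1, 0.2, 0.4, 0.2, 0.1]              # P(hi)
p_cherry_given_h = [1.0, 0.75, 0.5, 0.25, 0.0]  # P(Qj = C | hi)

# P0(Q1 = C), as computed above.
p0_q1_cherry = sum(lik * prior for lik, prior in zip(p_cherry_given_h, priors))

# P1(hi) = P(Q1 = C | hi) * P(hi) / P0(Q1 = C), by Bayes rule.
p1 = [lik * prior / p0_q1_cherry for lik, prior in zip(p_cherry_given_h, priors)]
print([round(x, 4) for x in p1])  # [0.2, 0.3, 0.4, 0.1, 0.0]

# P1(Q2 = C) = sum over i of P(Q2 = C | hi) * P1(hi).
p1_q2_cherry = sum(lik * post for lik, post in zip(p_cherry_given_h, p1))
print(round(p1_q2_cherry, 4))     # 0.65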
Computing P2(hi)

• Now, consider P2(h1). What does P2(h1) mean?
• We have defined P2(h1) as the probability of h1 after the first two observations.
• In our example, the first two observations were both of type cherry. Therefore:
  P2(h1) = P(h1 | Q1 = C, Q2 = C) = ???

A Special Case of Bayes Rule

• The normal version of Bayes rule states that:
  P(A | B) = P(B | A) * P(A) / P(B)
• From the basic formula, we can derive a special case of Bayes rule, which we can apply if we also know some other fact F:
  P(A | B, F) = P(B | A, F) * P(A | F) / P(B | F)
• Here, we want to compute P(h1 | Q1 = C, Q2 = C).
• We will apply the special case of Bayes rule, with:
  – h1 as A.
  – "Q2 = C" as B.
  – "Q1 = C" as F.

Computing P2(hi)

• P2(h1) = P(h1 | Q2 = C, Q1 = C)
         = P(Q2 = C | h1, Q1 = C) * P(h1 | Q1 = C) / P(Q2 = C | Q1 = C)
• These are all quantities we have computed before:
  – P(Q2 = C | h1, Q1 = C) = P(Q2 = C | h1) = 1.
  – P(h1 | Q1 = C) = P1(h1) = 0.2.
  – P(Q2 = C | Q1 = C) = P1(Q2 = C) = 0.65.
• Therefore:
  P2(h1) = P(Q2 = C | h1) * P1(h1) / P1(Q2 = C) = (1 * 0.2) / 0.65 ≈ 0.3077
• The same computation gives the other posteriors:
  P2(h2) = P(Q2 = C | h2) * P1(h2) / P1(Q2 = C) = (0.75 * 0.3) / 0.65 ≈ 0.3462
  P2(h3) = P(Q2 = C | h3) * P1(h3) / P1(Q2 = C) = (0.5 * 0.4) / 0.65 ≈ 0.3077
  P2(h4) = P(Q2 = C | h4) * P1(h4) / P1(Q2 = C) = (0.25 * 0.1) / 0.65 ≈ 0.0385
  P2(h5) = P(Q2 = C | h5) * P1(h5) / P1(Q2 = C) = (0 * 0) / 0.65 = 0

Updated Probabilities

• Probabilities of bags:
  – Before any observations.
  – After one observation (assuming the first candy is of type cherry).
  – After two observations (assuming both candies are of type cherry).

  P0(h1) = 0.1    P1(h1) = 0.2    P2(h1) = 0.3077    h1: 100% cherry candies
  P0(h2) = 0.2    P1(h2) = 0.3    P2(h2) = 0.3462    h2: 75% cherry + 25% lime
  P0(h3) = 0.4    P1(h3) = 0.4    P2(h3) = 0.3077    h3: 50% cherry + 50% lime
  P0(h4) = 0.2    P1(h4) = 0.1    P2(h4) = 0.0385    h4: 25% cherry + 75% lime
  P0(h5) = 0.1    P1(h5) = 0.0    P2(h5) = 0.0       h5: 100% lime candies

Computing Pt(hi)

• Let t be an integer between 1 and T.
• We have defined Pt(hi) = P(hi | Q1, …, Qt).
• To compute Pt(hi), we again use the special case of Bayes rule:
  P(A | B, F) = P(B | A, F) * P(A | F) / P(B | F)
  with hi as A, Qt as B, and "Q1, Q2, …, Qt-1" as F.
• Pt(hi) = P(hi | Q1, …, Qt)
         = P(Qt | hi, Q1, …, Qt-1) * P(hi | Q1, …, Qt-1) / P(Qt | Q1, …, Qt-1)
         = P(Qt | hi) * Pt-1(hi) / Pt-1(Qt)

Computing Pt(Qt+1)

• Pt(Qt+1) = P(Qt+1 | Q1, …, Qt)
           = Σi=1..5 P(Qt+1 | hi) * P(hi | Q1, …, Qt)
           = Σi=1..5 P(Qt+1 | hi) * Pt(hi)

Computing Pt(hi) (continued)

• The formula Pt(hi) = P(Qt | hi) * Pt-1(hi) / Pt-1(Qt) is recursive, as it requires knowing Pt-1(hi).
• The base case is P0(hi) = P(hi).
• To compute Pt(hi) we also need Pt-1(Qt), which is given by the formula on the previous slide (with t-1 in place of t).
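As a quick check of the values above, the recursive formula reproduces P2(hi) from P1(hi) and P1(Q2 = C). This sketch is illustrative only; the variable names are not from the slides.

p1 = [0.2, 0.3, 0.4, 0.1, 0.0]                  # P1(hi), computed earlier
p_cherry_given_h = [1.0, 0.75, 0.5, 0.25, 0.0]  # P(Q2 = C | hi)
p1_q2_cherry = 0.65                             # P1(Q2 = C), computed earlier

# P2(hi) = P(Q2 = C | hi) * P1(hi) / P1(Q2 = C)
p2 = [lik * post / p1_q2_cherry for lik, post in zip(p_cherry_given_h, p1)]
print([round(x, 4) for x in p2])  # [0.3077, 0.3462, 0.3077, 0.0385, 0.0]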
Computing Pt(hi) and Pt(Qt+1)

• Base case: t = 0.
  – P0(hi) = P(hi), where P(hi) is known.
  – P0(Q1) = Σi=1..5 P(Q1 | hi) * P(hi), where P(Q1 | hi) is known.
• To compute Pt(hi) and Pt(Qt+1), for j = 1, …, t:
  – Compute Pj(hi) = P(Qj | hi) * Pj-1(hi) / Pj-1(Qj)
    (P(Qj | hi) is known; Pj-1(hi) and Pj-1(Qj) were computed at the previous round.)
  – Compute Pj(Qj+1) = Σi=1..5 P(Qj+1 | hi) * Pj(hi)
    (P(Qj+1 | hi) is known; Pj(hi) was computed at the previous line.)
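The whole procedure can be put together in a short function. The sketch below is one possible implementation of the loop above, not code from the course; the names posterior_updates, priors, likelihoods, and observations are illustrative choices.

# Sequential computation of Pj(hi) and of the predictive probabilities,
# following the loop above (illustrative sketch; names are not from the slides).
def posterior_updates(priors, likelihoods, observations):
    """Return the posteriors P_j(hi) after each observation (j = 0, ..., t)
    and the predictive probability P_{j-1}(Q_j) of each observed value."""
    posteriors = list(priors)            # P0(hi) = P(hi)
    history = [list(posteriors)]
    predictions = []
    for q in observations:
        # P_{j-1}(Q_j) = sum over i of P(Q_j | hi) * P_{j-1}(hi)
        evidence = sum(likelihoods[i][q] * posteriors[i] for i in range(len(priors)))
        predictions.append(evidence)
        # P_j(hi) = P(Q_j | hi) * P_{j-1}(hi) / P_{j-1}(Q_j)
        posteriors = [likelihoods[i][q] * posteriors[i] / evidence for i in range(len(priors))]
        history.append(list(posteriors))
    return history, predictions

priors = [0.1, 0.2, 0.4, 0.2, 0.1]
likelihoods = [{'C': 1.0, 'L': 0.0}, {'C': 0.75, 'L': 0.25}, {'C': 0.5, 'L': 0.5},
               {'C': 0.25, 'L': 0.75}, {'C': 0.0, 'L': 1.0}]

history, predictions = posterior_updates(priors, likelihoods, ['C', 'C'])
print([round(x, 4) for x in history[2]])   # P2(hi): [0.3077, 0.3462, 0.3077, 0.0385, 0.0]
print([round(x, 4) for x in predictions])  # [P0(Q1 = C), P1(Q2 = C)] = [0.5, 0.65]

# Prediction for the next (third) candy: P2(Q3 = C) = sum over i of P(Q3 = C | hi) * P2(hi)
print(round(sum(likelihoods[i]['C'] * history[-1][i] for i in range(5)), 4))  # ≈ 0.7308

Note that the evidence computed inside the loop is Pj-1(Qj) for the value actually observed; it is the same quantity that the slide's loop computes as Pj(Qj+1) on the previous round, just evaluated when the observation arrives.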