From the definition in (3.1), we get a formula for computing the joint probability of two events, or more generally of a finite number of events.

Proposition 3.3. Let $E_1, \dots, E_N \in \mathcal{F}$ be such that $P(E_1 \cdots E_{N-1}) > 0$. Then
\[
P(E_1 \cdots E_N) = P(E_1)\,P(E_2 \mid E_1)\,P(E_3 \mid E_1 E_2) \cdots P(E_N \mid E_1 \cdots E_{N-1}). \tag{3.2}
\]
The proof can be done either by induction or simply by using formula (3.1) to replace the conditional probabilities on the right-hand side of (3.2) with ratios of joint and marginal probabilities, which cancel each other.

Example 3.4. Let us consider again three consecutive tosses of a coin. The sample space is shown in Example 2.1. We are now interested in the events:
E = "the total number of heads is two" = {HHT, HTH, THH},
F = "the first toss results in a head" = {HHH, HHT, HTH, HTT},
G = "the second toss results in a head" = {HHH, HHT, THH, THT}.
In particular, we want to know the probability of the event F in three scenarios: we have no information on the outcome of the experiment, we know that the event E has occurred, and we know that the event G has occurred, respectively. Since we are in an equal-likelihood model, the (unconditional) probability of an event is computed by means of the counting rules:
\[
P(F) = \tfrac{4}{8} = 50\%.
\]
In order to compute the conditional probabilities, two methods are available: restricting the sample space to the event being conditioned on and applying the counting rules there, or using the formula in (3.1). Both give the same results, namely
\[
P(F \mid E) = \tfrac{2}{3} \approx 66.7\%, \qquad P(F \mid G) = \tfrac{2}{4} = 50\%.
\]
Note that, while conditioning on the event E does affect the probability of the event F, conditioning on the event G does not.

The fact that the conditional probability of an event given another event can equal its unconditional probability leads to the notion of independence of two events.

Definition 3.5. Let $E, F \in \mathcal{F}$ be such that $P(F) > 0$. The event E is said to be independent of the event F if $P(E \mid F) = P(E)$.

So, for instance, the event F is independent of the event G in Example 3.4.
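The numbers in Example 3.4, and the independence of F and G, can be checked by brute-force enumeration. The following Python sketch is an illustration added here (not part of the original notes); the event names mirror those of the example, and the last line anticipates the product criterion of Definition 3.6 below.

```python
from itertools import product
from fractions import Fraction

# The 8 equally likely outcomes of three coin tosses.
omega = [''.join(t) for t in product("HT", repeat=3)]

def prob(event):
    """P(event) in the equal-likelihood model: favourable over total outcomes."""
    return Fraction(len(event), len(omega))

def cond_prob(event, given):
    """P(event | given), computed by restricting the sample space to `given`."""
    return Fraction(len(event & given), len(given))

E = {w for w in omega if w.count("H") == 2}   # two heads in total
F = {w for w in omega if w[0] == "H"}         # first toss is a head
G = {w for w in omega if w[1] == "H"}         # second toss is a head

print(prob(F))                              # 1/2
print(cond_prob(F, E))                      # 2/3  -> conditioning on E changes P(F)
print(cond_prob(F, G))                      # 1/2  -> conditioning on G does not
print(prob(F & G) == prob(F) * prob(G))     # True: F and G are independent
```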
However, this definition of independence requires strictly positive probability for the event being conditioned on. From the formula in (3.2), we get an equivalent definition of independence for a pair of events which, unlike Definition 3.5, does not require either of the events to have strictly positive probability and is symmetric in the two events.

Definition 3.6. Given $E, F \in \mathcal{F}$, the events E and F are independent if and only if
\[
P(EF) = P(E)P(F). \tag{3.3}
\]

Remark 3.7. Given two independent events $E, F \in \mathcal{F}$, all the following are pairs of independent events: $E^c$ and $F$, $E$ and $F^c$, $E^c$ and $F^c$. Try to prove it as an exercise using the law of partitions, the properties of probability measures, and Definition 3.5.

When we consider a collection of more than two events, the suitable notion of independence may differ from the most intuitive one, namely requiring independence of all pairs in the collection. Indeed, we would like a set of independent events to be such that any two events obtained by performing set operations on the events of the given collection are also independent.

Example 3.8 (Rolling dice). We recall Example 2.14, where the random experiment consists in rolling two balanced dice, one black (denoted B) and one gray (denoted G). We consider the following three events:
E = "B comes up even", with probability $P(E) = \tfrac{18}{36} = 0.5$,
F = "G comes up even", with probability $P(F) = \tfrac{18}{36} = 0.5$,
G = "the sum of the dice is even", with probability $P(G) = \tfrac{18}{36} = 0.5$.
Computing the joint probabilities of pairs of events, we get in all cases the same result:
\[
P(EF) = P(EG) = P(FG) = \tfrac{9}{36} = 25\%.
\]
The same result is also obtained for the joint probability of all three events, since in fact $E \cap F \cap G = E \cap G = E \cap F = F \cap G$; indeed $G = (E \cap F) \cup (E^c \cap F^c)$, as the sum can only be even when both dice come up even or neither of them does. So:
\[
P(EFG) = 25\% \neq 12.5\% = P(EF)P(G).
\]
This means that $E \cap F$ and G are not independent events. We finally come to the appropriate definition for a collection of events.

Definition 3.9. Let $E_1, \dots, E_N \in \mathcal{F}$. Then $(E_n)_{n=1}^N$ is a collection of (mutually) independent events if each sub-collection $(E_{k_n})_{n=1}^r$, where $r \in \mathbb{N}$, $2 \le r \le N$, satisfies
\[
P(E_{k_1} \cdots E_{k_r}) = P(E_{k_1}) \cdots P(E_{k_r}).
\]

Remark 3.10. Let $E_1, \dots, E_N \in \mathcal{F}$. We have:
$(E_n)_{n=1}^N$ mutually independent $\Rightarrow$ $(E_n)_{n=1}^N$ pairwise independent;
$(E_n)_{n=1}^N$ pairwise independent $\not\Rightarrow$ $(E_n)_{n=1}^N$ mutually independent.
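The dice of Example 3.8 give a concrete instance of the second implication in Remark 3.10 failing. A short enumeration in Python (added here as an illustration, not part of the original notes) confirms that the three events are pairwise but not mutually independent.

```python
from itertools import product
from fractions import Fraction

# The 36 equally likely outcomes (black die, gray die).
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

E = {w for w in omega if w[0] % 2 == 0}             # black die even
F = {w for w in omega if w[1] % 2 == 0}             # gray die even
G = {w for w in omega if (w[0] + w[1]) % 2 == 0}    # sum of the dice even

# Pairwise independence holds ...
print(prob(E & F) == prob(E) * prob(F))     # True
print(prob(E & G) == prob(E) * prob(G))     # True
print(prob(F & G) == prob(F) * prob(G))     # True
# ... but mutual independence fails:
print(prob(E & F & G), prob(E) * prob(F) * prob(G))   # 1/4 versus 1/8
```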
We now come back to the properties of conditional probabilities, and we see an application of them to a famous example, the gambler's ruin.

Proposition 3.11 (Law of total probability). Let $(E_n)_n$ form a partition of $\Omega$. Then, for any event $F \in \mathcal{F}$,
\[
P(F) = \sum_n P(E_n)P(F \mid E_n). \tag{3.4}
\]
In particular, given any other event $E \in \mathcal{F}$,
\[
P(F) = P(E)P(F \mid E) + P(E^c)P(F \mid E^c).
\]

Proposition 3.12. Let $F \in \mathcal{F}$ be such that $P(F) > 0$. Then the map $P_F : \mathcal{F} \to \mathbb{R}$ defined on the collection of events as
\[
P_F(E) = P(E \mid F) \qquad \forall E \in \mathcal{F}, \tag{3.5}
\]
is a probability measure on $(\Omega, \mathcal{F})$.

Example 3.13 (Gambler's ruin). Two players, A and B, gamble according to the following rules: a coin is tossed and, if the result is a head, then B pays one unit of money to A, while if the result is a tail, then A pays one unit of money to B; the game is repeated until one of the two players runs out of money. The winner of the game is the player that has all the money when the game ends. Assume that the two players have a total amount of $N$ units of money to play with, and denote by $p$ the probability that the coin comes up heads in any of the independent tosses. We want to compute the probability that A wins the game, that is, that B gets ruined, if the initial distribution of money consists of $i$ units for A and $N - i$ units for B. We denote E = "A wins" and $P_i(E) = P(E \mid \text{"A starts with } i \text{ units of money"})$, for $i \in \{1, \dots, N\}$. Note that $P_i$ is a well-defined probability measure, according to Proposition 3.12. We want to compute $P_i(E)$ for any $i \in \{1, \dots, N\}$; also note that $P_0(E) = 0$ and $P_N(E) = 1$. In order to do that, we proceed one step (toss of the coin) at a time. Denoting H = "the result of the first toss is a head", and using the law of total probability, we have
\[
P_i(E) = P_i(H)P_i(E \mid H) + P_i(H^c)P_i(E \mid H^c) = p\,P_{i+1}(E) + (1-p)\,P_{i-1}(E).
\]
We denote $p_i := P_i(E)$ for all $i \in \{1, \dots, N\}$ and we rewrite the equation above so as to obtain a recursive formula for the difference $p_{i+1} - p_i$:
\[
p_{i+1} - p_i = \frac{1-p}{p}\,(p_i - p_{i-1}).
\]
Replacing the difference on the right-hand side by its recursive expression, iteratively down to $p_1 - p_0 = p_1$, we get
\[
p_{i+1} - p_i = \left(\frac{1-p}{p}\right)^i p_1.
\]
Taking the sum of these differences, we get
\[
p_i - p_1 = \sum_{k=1}^{i-1}(p_{k+1} - p_k) = \sum_{k=1}^{i-1}\left(\frac{1-p}{p}\right)^k p_1
\quad \Longrightarrow \quad
p_i = \sum_{k=0}^{i-1}\left(\frac{1-p}{p}\right)^k p_1.
\]
Hence
\[
p_i =
\begin{cases}
\dfrac{1 - \left(\frac{1-p}{p}\right)^i}{1 - \frac{1-p}{p}}\,p_1, & p \neq \tfrac{1}{2},\\[2ex]
i\,p_1, & p = \tfrac{1}{2}.
\end{cases}
\]
Since $p_N = 1$, we obtain $p_1$, and consequently $p_i$, in terms of the initial parameters:
\[
p_1 =
\begin{cases}
\dfrac{1 - \frac{1-p}{p}}{1 - \left(\frac{1-p}{p}\right)^N}, & p \neq \tfrac{1}{2},\\[2ex]
\dfrac{1}{N}, & p = \tfrac{1}{2},
\end{cases}
\quad \Longrightarrow \quad
p_i =
\begin{cases}
\dfrac{1 - \left(\frac{1-p}{p}\right)^i}{1 - \left(\frac{1-p}{p}\right)^N}, & p \neq \tfrac{1}{2},\\[2ex]
\dfrac{i}{N}, & p = \tfrac{1}{2}.
\end{cases}
\]
Then, denote by
\[
q_i = P(\text{"B wins"} \mid \text{"A starts with } i \text{ units of money"}), \qquad i \in \{1, \dots, N\},
\]
the probability that, given the same initial distribution of money, the game ends and B wins. Following the same iterative procedure, we find
\[
q_i =
\begin{cases}
\dfrac{1 - \left(\frac{p}{1-p}\right)^{N-i}}{1 - \left(\frac{p}{1-p}\right)^N}, & p \neq \tfrac{1}{2},\\[2ex]
\dfrac{N-i}{N}, & p = \tfrac{1}{2}.
\end{cases}
\]
For $p = \tfrac{1}{2}$, we get
\[
p_i + q_i = \frac{i}{N} + \frac{N-i}{N} = 1.
\]
For $p \neq \tfrac{1}{2}$, we get
\[
p_i + q_i = \frac{p^N - p^{N-i}(1-p)^i}{p^N - (1-p)^N} + \frac{p^{N-i}(1-p)^i - (1-p)^N}{p^N - (1-p)^N} = 1.
\]
This proves that the scenario in which the game goes on forever and no player ever runs out of money has probability zero: with probability 1, one of the two players will run out of money in finite time.

Let us give a numerical example. Assume $N = 15$, A starts with $i = 5$ units of money and B with $N - i = 10$ units. If the coin is balanced, that is, if $p = 0.5$, then the probability that A wins (in a finite time) is $p_5 = 5/15 = 1/3 \approx 33.3\%$. If the coin is slightly biased in A's favour, with $p = 0.6$, then $p_5 \approx 87\%$: the probability that A wins increases considerably.
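As a sanity check on the closed-form expression for $p_i$, one can compare it with a direct simulation of the game. The Python sketch below is an illustration added here (the helper names win_prob and simulate are my own); as in the derivation, $p$ is taken to be the probability that A gains one unit on each toss.

```python
import random

def win_prob(i, N, p):
    """Closed-form ruin probability from Example 3.13: chance that A, starting
    with i of N units, ends up with everything; p = chance A gains a unit per toss."""
    if p == 0.5:
        return i / N
    r = (1 - p) / p
    return (1 - r**i) / (1 - r**N)

def simulate(i, N, p, trials=100_000):
    """Monte Carlo estimate of the same probability by playing the game out."""
    wins = 0
    for _ in range(trials):
        x = i                       # A's current fortune
        while 0 < x < N:            # stop when one player is ruined
            x += 1 if random.random() < p else -1
        wins += (x == N)
    return wins / trials

print(win_prob(5, 15, 0.5), simulate(5, 15, 0.5))   # about 0.333
print(win_prob(5, 15, 0.6), simulate(5, 15, 0.6))   # about 0.870
```

The simulated frequencies should agree with the formula up to Monte Carlo error of order a few tenths of a percent for this number of trials.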
A very useful application of conditional probabilities and their properties is the so-called Bayes's rule (named after Thomas Bayes), widely used in statistics and helpful for computing conditional probabilities when the definition in (3.1) cannot be used directly with the available data.

Proposition 3.14. Let $(E_n)_n$ be an (at most countable) partition of $\Omega$. Then, for any $F \in \mathcal{F}$,
\[
P(E_i \mid F) = \frac{P(F \mid E_i)P(E_i)}{\sum_n P(E_n)P(F \mid E_n)}, \qquad i = 1, 2, \dots. \tag{3.6}
\]
Formula (3.6) tells us how to revise the probability of any of the events in the partition after some information is given. The object to be computed is just a conditional probability, but when we know neither the probability of the event whose occurrence is announced nor the joint probabilities, we can use the reversed conditional probabilities.

Proof. By (3.1), (3.2) and (3.4), we have
\[
P(E_i \mid F) = \frac{P(E_i F)}{P(F)} = \frac{P(F \mid E_i)P(E_i)}{\sum_n P(E_n)P(F \mid E_n)}.
\]

The following famous example is a straightforward application of Bayes's rule.

Example 3.15 (Monty Hall). In a TV show, the contestant faces three closed doors, one of which hides a valuable prize, say a car, while each of the other two hides a sheep. The contestant chooses a door and, at the end of the game, wins the car if it is behind the chosen door, and gets nothing otherwise. Assume the contestant chooses door number 1, and the game-show host decides to help by opening door number 2, which shows a sheep, and by giving the contestant the possibility to change their initial choice if they want. Should the contestant switch to door number 3, or keep door number 1? In order to answer, we have to compute the probability that the car is behind door number 1, or equivalently behind door number 3, since the two probabilities are complementary, given that the host opened door number 2. Denote by $E_i$ the event "the car is behind door number $i$", for $i = 1, 2, 3$, and by $F$ the event "the game host opens door number 2". Without any information, we have $P(E_i) = 1/3$ for all $i = 1, 2, 3$. We also know the probability that the host opens door number 2 given that the prize is behind door number $i$, for each $i = 1, 2, 3$:
\[
P(F \mid E_1) = \tfrac{1}{2}, \qquad P(F \mid E_2) = 0, \qquad P(F \mid E_3) = 1.
\]
Using Bayes's rule, we can compute
\[
P(E_3 \mid F) = \frac{P(F \mid E_3)P(E_3)}{\sum_{i=1}^3 P(E_i)P(F \mid E_i)}
= \frac{1 \cdot \tfrac{1}{3}}{\tfrac{1}{2} \cdot \tfrac{1}{3} + 0 \cdot \tfrac{1}{3} + 1 \cdot \tfrac{1}{3}} = \frac{2}{3},
\]
and consequently $P(E_1 \mid F) = \tfrac{1}{3}$. Thus, the contestant has a higher probability of winning the car if they switch to door number 3.

Chapter 4

Discrete random variables

Sometimes we are not interested in the outcome of a random experiment itself, but in some numerical function of it. In this case, the value we are interested in depends on the outcome of the experiment, which is not known in advance, so it is random.

Definition 4.1. A random variable is a real-valued function on the sample space:
\[
X : \Omega \to \mathbb{R}, \qquad \omega \mapsto X(\omega) \quad \forall \omega \in \Omega.
\]

Example 4.2. Recalling the experiment of three consecutive tosses of a coin, assume we are interested in the number of heads obtained. This is a random variable, which we denote by $X$. Now let us rewrite the following events in terms of the random variable $X$:
"the total number of heads is two" $= \{(HHT), (HTH), (THH)\} = \{\omega \in \Omega : X(\omega) = 2\} \equiv \{X = 2\}$,
"the total number of heads is odd" $= \{(HHH), (HTT), (THT), (TTH)\} = \{\omega \in \Omega : X(\omega) \in A\} \equiv \{X \in A\}$,
where $A = \{n \in \mathbb{N} : n \text{ odd}\}$.
In general, given a random variable $X$ and a set $A \subset \mathbb{R}$, we use the notation $\{X \in A\} \equiv \{\omega \in \Omega : X(\omega) \in A\}$.

Random variables that count something are called discrete, because they can only take values in a discrete set.

Definition 4.3. A random variable $X$ is a discrete random variable if there exists a countable set $K \subset \mathbb{R}$ such that $P(X \in K) = 1$. Let $X$ be a discrete random variable; its probability mass function (PMF), or density function, is the real map $p_X$ defined by
\[
p_X : \mathbb{R} \to \mathbb{R}, \qquad p_X(x) = P(X = x) \quad \forall x \in \mathbb{R}.
\]
The probability mass function of a discrete random variable is usually represented visually with a histogram, where the horizontal axis contains the possible values of the random variable and the vertical axis the corresponding probabilities.
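To make Definition 4.3 concrete, the PMF of the random variable $X$ of Example 4.2 (the number of heads in three tosses of a fair coin) can be tabulated by enumeration, together with a crude text version of the histogram mentioned above. This Python sketch is an illustration added here, not part of the original notes.

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# Sample space of three tosses and the random variable X = "number of heads".
omega = [''.join(t) for t in product("HT", repeat=3)]
counts = Counter(w.count("H") for w in omega)

# Probability mass function p_X(x) = P(X = x) in the equal-likelihood model.
pmf = {x: Fraction(c, len(omega)) for x, c in sorted(counts.items())}
print(pmf)                     # values 1/8, 3/8, 3/8, 1/8 for x = 0, 1, 2, 3

# A crude text histogram: one '#' per 1/8 of probability mass.
for x, p in pmf.items():
    print(x, "#" * int(p * 8))
```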