From the definition in (3.1), we get a formula for computing the joint
probability of two events, or more generally of a finite number of events.
Proposition 3.3. Let $E_1, \dots, E_N \in \mathcal{F}$ be such that $P(E_1 \cdots E_{N-1}) > 0$. Then
\[
P(E_1 \cdots E_N) = P(E_1)\,P(E_2 | E_1)\,P(E_3 | E_1 E_2) \cdots P(E_N | E_1 \cdots E_{N-1}). \tag{3.2}
\]
The proof can be done either by induction or simply by using the formula (3.1) to replace the conditional probabilities on the right-hand side of (3.2) with ratios of joint and marginal probabilities, which cancel each other out.
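To make this concrete, here is a small Python sketch (an addition to the notes, using an arbitrary urn example that is not part of the text) that checks the chain rule (3.2) for three draws without replacement.
\begin{verbatim}
# Added illustration: the chain rule (3.2) for sampling without replacement.
# Urn with 3 red and 2 blue balls; E_k = "the k-th draw is red".
from itertools import permutations
from fractions import Fraction

balls = ["R", "R", "R", "B", "B"]
omega = list(permutations(range(5), 3))   # ordered draws of 3 distinct balls

def red(w, k):
    return balls[w[k]] == "R"

# Left-hand side of (3.2): P(E1 E2 E3) by direct counting.
lhs = Fraction(sum(all(red(w, k) for k in range(3)) for w in omega), len(omega))
# Right-hand side: P(E1) P(E2|E1) P(E3|E1 E2) = 3/5 * 2/4 * 1/3.
rhs = Fraction(3, 5) * Fraction(2, 4) * Fraction(1, 3)
assert lhs == rhs == Fraction(1, 10)
\end{verbatim}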
Example 3.4. Let us consider again three consecutive tosses of a coin. The
sample space is shown in Example 2.1. We are now interested in the events:
E = “the total number of heads is two” = $\{HHT, HTH, THH\}$,
F = “the first toss results in a head” = $\{HHH, HHT, HTH, HTT\}$,
G = “the second toss results in a head” = $\{HHH, HHT, THH, THT\}$.
In particular, we want to know the probability of the event F in three scenarios: we have no information on the outcome of the experiment; we know that the event E has occurred; we know that the event G has occurred. Since we are in an equal-likelihood model, the (unconditional) probability of an event is computed by means of the counting rules: $P(F) = \frac{4}{8} = 50\%$. In order to compute the conditional probabilities, two methods are available: restricting the sample space to the event being conditioned on and using the counting rules on it, or using the formula in (3.1). Both ways give the same result, that is
\[
P(F | E) = \frac{2}{3} \approx 66.7\%, \qquad P(F | G) = \frac{2}{4} = 50\%.
\]
Note that, while conditioning on the event E does affect the probability of the event F, conditioning on the event G does not.
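As a small illustration of the two methods just mentioned (an addition, not part of the original notes), the following Python sketch computes $P(F|E)$ both by restricting the sample space to E and by the ratio formula (3.1), and also confirms that $P(F|G) = P(F)$.
\begin{verbatim}
# Conditional probabilities on the three-coin-toss space of Example 3.4.
from itertools import product
from fractions import Fraction

omega = list(product("HT", repeat=3))          # 8 equally likely outcomes
E = {w for w in omega if w.count("H") == 2}    # exactly two heads
F = {w for w in omega if w[0] == "H"}          # first toss is a head
G = {w for w in omega if w[1] == "H"}          # second toss is a head

# Method 1: restrict the sample space to E and count.
p1 = Fraction(len(F & E), len(E))
# Method 2: ratio of joint and marginal probabilities, as in (3.1).
p2 = Fraction(len(F & E), len(omega)) / Fraction(len(E), len(omega))
assert p1 == p2 == Fraction(2, 3)

# Conditioning on G does not change the probability of F.
assert Fraction(len(F & G), len(G)) == Fraction(len(F), len(omega))
\end{verbatim}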
The fact that the conditional probability of an event given another event
is equal to the unconditional probability leads to the notion of independence
of two events.
Definition 3.5. Let $E, F \in \mathcal{F}$ be such that $P(F) > 0$. The event E is said to be independent of the event F if $P(E|F) = P(E)$.
So, for instance, the event F is independent of the event G in Example 3.4.
However, this definition of independence requires strictly positive probability
for the event being conditioned on.
From the formula in (3.2), we get an equivalent definition of independence for a couple of events which, unlike Definition 3.5, does not require any of the events to have strictly positive probability and is symmetric in the two events.
Definition 3.6. Given $E, F \in \mathcal{F}$, the events E and F are independent if and only if
\[
P(EF) = P(E)P(F). \tag{3.3}
\]
Remark 3.7. Given two independent events $E, F \in \mathcal{F}$, each of the following is also a pair of independent events: $(E^c, F)$, $(E, F^c)$, $(E^c, F^c)$.
Try to prove it as an exercise using the law of partitions, the properties
of probability measures, and Definition 3.5.
When we consider a collection of more than two events, the suitable notion of independence may differ from the most intuitive one, that is, assuming independence of all pairs in the collection. Indeed, we would like a collection of independent events to be such that any two events obtained by performing set operations on the events of the given collection are also independent.
Example 3.8 (Rolling dice). We recall Example 2.14, where the random
experiment consists in rolling two balanced dice, one black (denoted B) and
one gray (denoted G). We consider the following three events:
E = “B comes up even”, with probability $P(E) = \frac{18}{36} = 0.5$,
F = “G comes up even”, with probability $P(F) = \frac{18}{36} = 0.5$,
G = “the sum of the dice is even”, with probability $P(G) = \frac{18}{36} = 0.5$.
Computing the joint probabilities of pairs of events, we get in all cases the same result:
\[
P(EF) = P(EG) = P(FG) = \frac{9}{36} = 25\%.
\]
The same result is also obtained for the joint probability of all three events, since in fact $E \cap F \cap G = E \cap G = E \cap F = F \cap G$, because $G = (E \cap F) \cup (E^c \cap F^c)$: the sum can only be even when both dice come up even or neither of them does. So
\[
P(EFG) = 25\% \neq 12.5\% = P(EF)P(G).
\]
This means that $E \cap F$ and G are not independent events.
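A brute-force verification of this example (an addition to the notes) takes only a few lines of Python: every pair of events satisfies the product rule (3.3), while the triple intersection does not factor.
\begin{verbatim}
# Pairwise vs. mutual independence for the two-dice events of Example 3.8.
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))   # (black, gray), 36 outcomes
P = lambda A: Fraction(len(A), len(omega))

E = {w for w in omega if w[0] % 2 == 0}        # black die even
F = {w for w in omega if w[1] % 2 == 0}        # gray die even
G = {w for w in omega if sum(w) % 2 == 0}      # sum even

# Every pair satisfies the product rule (3.3) ...
assert P(E & F) == P(E) * P(F)
assert P(E & G) == P(E) * P(G)
assert P(F & G) == P(F) * P(G)
# ... but the triple intersection does not factor.
assert P(E & F & G) == Fraction(1, 4) != P(E) * P(F) * P(G)
\end{verbatim}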
We finally come to the appropriate definition for a collection of events.
Definition 3.9. Let $E_1, \dots, E_N \in \mathcal{F}$. Then $(E_n)_{n=1}^N$ is a collection of (mutually) independent events if each sub-collection $(E_{k_n})_{n=1}^r$, where $r \in \mathbb{N}$, $2 \le r \le N$, satisfies
\[
P(E_{k_1} \cdots E_{k_r}) = P(E_{k_1}) \cdots P(E_{k_r}).
\]
Remark 3.10. Let $E_1, \dots, E_N \in \mathcal{F}$. We have:
\[
(E_n)_{n=1}^N \text{ mutually independent} \;\Longrightarrow\; (E_n)_{n=1}^N \text{ pairwise independent};
\]
\[
(E_n)_{n=1}^N \text{ pairwise independent} \;\not\Longrightarrow\; (E_n)_{n=1}^N \text{ mutually independent}.
\]
We now come back to the properties of conditional probabilities and we
see an application of them to a famous example, the gambler’s ruin.
Proposition 3.11 (Law of total probability). Let $(E_n)_n$ form a partition of $\Omega$. Then, for any event $F \in \mathcal{F}$,
\[
P(F) = \sum_n P(E_n)\,P(F|E_n). \tag{3.4}
\]
In particular, given any other event $E \in \mathcal{F}$,
\[
P(F) = P(E)P(F|E) + P(E^c)P(F|E^c).
\]
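As a quick numerical check of (3.4) (an addition to the notes, with an arbitrarily chosen event F), the sketch below partitions the two-dice space of Example 3.8 by the value shown on the black die.
\begin{verbatim}
# Law of total probability (3.4) on the two-dice space.
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))   # (black, gray)
P = lambda A: Fraction(len(A), len(omega))

F = {w for w in omega if sum(w) >= 10}         # "the sum is at least 10"
partition = [{w for w in omega if w[0] == b} for b in range(1, 7)]  # E_b = "black shows b"

total = sum(P(E) * (P(F & E) / P(E)) for E in partition)   # sum_n P(E_n) P(F|E_n)
assert total == P(F) == Fraction(6, 36)
\end{verbatim}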
Proposition 3.12. Let $F \in \mathcal{F}$ be such that $P(F) > 0$. Then the map $P_F : \mathcal{F} \to \mathbb{R}$ defined on the collection of events as
\[
P_F(E) = P(E|F) \quad \text{for all } E \in \mathcal{F}, \tag{3.5}
\]
is a probability measure on $(\Omega, \mathcal{F})$.
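On a finite sample space, Proposition 3.12 can be checked by brute force. The sketch below (an addition to the notes, with an arbitrary choice of F) verifies normalisation and additivity of $P_F$ on the three-coin-toss space.
\begin{verbatim}
# P_F(.) = P(.|F) is a probability measure, for F = "the first toss is a head".
from itertools import product, combinations
from fractions import Fraction

omega = list(product("HT", repeat=3))
P = lambda A: Fraction(len(A), len(omega))

F = frozenset(w for w in omega if w[0] == "H")
P_F = lambda E: P(E & F) / P(F)                # definition (3.5)

assert P_F(frozenset(omega)) == 1              # P_F(Omega) = 1
singletons = [frozenset([w]) for w in omega]
assert sum(P_F(s) for s in singletons) == 1
for A, B in combinations(singletons, 2):       # additivity on disjoint events
    assert P_F(A | B) == P_F(A) + P_F(B)
\end{verbatim}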
Example 3.13 (Gambler’s ruin). Two players, A and B, gamble according
to the following rules: a coin is tossed and, if the result is a head then B pays one unit of money to A, while if the result is a tail then A pays one unit of money to B; the game is repeated until one of the two players runs
out of money. The winner of the game is the player that has all the money
when the game ends. Assume that the two players have a total amount of
N units of money to play with. Denote by p the probability that the coin
ends up with a head in any of the independent tosses. We want to compute
the probability that A wins the game, that is, B gets ruined, if the initial distribution of money consists of i units for A and $N - i$ units for B.
We denote E = “A wins” and
\[
P_i(E) = P(E \,|\, \text{“A starts with $i$ units of money”}), \quad \text{for } i \in \{0, 1, \dots, N\}.
\]
Note that $P_i$ is a well-defined probability measure, according to Proposition 3.12. We want to compute $P_i(E)$ for any $i \in \{0, 1, \dots, N\}$. Also note that $P_0(E) = 0$ and $P_N(E) = 1$. In order to do that, we proceed one step (toss of the coin) at a time. Denoting H = “the result of the first toss is a head”, and using the law of total probability, we have
\[
P_i(E) = P_i(H)\,P_i(E|H) + P_i(H^c)\,P_i(E|H^c) = p\,P_{i+1}(E) + (1-p)\,P_{i-1}(E), \quad \text{for } i \in \{1, \dots, N-1\}.
\]
We denote $p_i := P_i(E)$ for all $i \in \{0, 1, \dots, N\}$ and we rewrite the equation above so as to obtain a recursive formula for the difference $p_{i+1} - p_i$:
\[
p_{i+1} - p_i = \frac{1-p}{p}\,(p_i - p_{i-1}).
\]
Replacing the difference on the right-hand side by its recursive expression, iteratively up to $p_1 - p_0 = p_1$, we get
\[
p_{i+1} - p_i = \left(\frac{1-p}{p}\right)^{\! i} p_1.
\]
Taking the sum of these differences we get
\[
p_i - p_1 = \sum_{k=1}^{i-1} (p_{k+1} - p_k) = \sum_{k=1}^{i-1} \left(\frac{1-p}{p}\right)^{\! k} p_1
\quad \Longrightarrow \quad
p_i = \sum_{k=0}^{i-1} \left(\frac{1-p}{p}\right)^{\! k} p_1.
\]
Hence
\[
p_i =
\begin{cases}
\dfrac{1 - \left(\frac{1-p}{p}\right)^{i}}{1 - \frac{1-p}{p}}\, p_1, & p \neq \tfrac{1}{2}, \\[2ex]
i\, p_1, & p = \tfrac{1}{2}.
\end{cases}
\]
Since $p_N = 1$, we obtain $p_1$, and consequently $p_i$, in terms of the initial parameters:
\[
p_1 =
\begin{cases}
\dfrac{1 - \frac{1-p}{p}}{1 - \left(\frac{1-p}{p}\right)^{N}}, & p \neq \tfrac{1}{2}, \\[2ex]
\dfrac{1}{N}, & p = \tfrac{1}{2},
\end{cases}
\qquad \Longrightarrow \qquad
p_i =
\begin{cases}
\dfrac{1 - \left(\frac{1-p}{p}\right)^{i}}{1 - \left(\frac{1-p}{p}\right)^{N}}, & p \neq \tfrac{1}{2}, \\[2ex]
\dfrac{i}{N}, & p = \tfrac{1}{2}.
\end{cases}
\]
Then, denote by $q_i$ the probability that, given the same initial distribution of money, the game ends and B wins:
\[
q_i = P(\text{“B wins”} \,|\, \text{“A starts with $i$ units of money”}), \quad \text{for } i \in \{0, 1, \dots, N\}.
\]
Following the same iterative procedure, we find
\[
q_i =
\begin{cases}
\dfrac{1 - \left(\frac{p}{1-p}\right)^{N-i}}{1 - \left(\frac{p}{1-p}\right)^{N}}, & p \neq \tfrac{1}{2}, \\[2ex]
\dfrac{N-i}{N}, & p = \tfrac{1}{2}.
\end{cases}
\]
For $p = \tfrac{1}{2}$, we get $p_i + q_i = \frac{i}{N} + \frac{N-i}{N} = 1$.
For $p \neq \tfrac{1}{2}$, we get
\[
p_i + q_i = \frac{p^N - p^{N-i}(1-p)^{i}}{p^N - (1-p)^N} + \frac{(1-p)^N - p^{N-i}(1-p)^{i}}{(1-p)^N - p^N}
= \frac{p^N - (1-p)^N}{p^N - (1-p)^N} = 1.
\]
This proves that the scenario where the game goes on forever and no player
ever runs out of money is impossible: with probability 1, one of the two
players will run out of money in a finite time.
Let's give a numerical example. Assume N = 15 and A starts with i = 5 units of money and B with $N - i = 10$ units. If the coin is balanced, that is if p = 0.5, then the probability that A wins (in a finite time) is $p_5 = 5/15 = 1/3 \approx 33.3\%$. If the coin is slightly biased, with the probability of getting a head being p = 0.6, then $p_5 \approx 87\%$, which means the probability that A wins increases considerably.
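The closed-form expression for $p_i$ can also be cross-checked by simulation. The sketch below (an addition, not part of the original notes) replays the game many times for the numerical example above.
\begin{verbatim}
# Monte Carlo check of the gambler's-ruin formula, with N = 15, i = 5,
# and a head meaning that A gains one unit.
import random

def p_exact(i, N, p):
    if p == 0.5:
        return i / N
    r = (1 - p) / p
    return (1 - r**i) / (1 - r**N)

def simulate(i, N, p, runs, seed=0):
    rng = random.Random(seed)
    wins = 0
    for _ in range(runs):
        capital = i
        while 0 < capital < N:
            capital += 1 if rng.random() < p else -1
        wins += capital == N
    return wins / runs

for p in (0.5, 0.6):
    print(p, p_exact(5, 15, p), simulate(5, 15, p, runs=20_000))
# Expected output: roughly 0.333 for p = 0.5 and roughly 0.870 for p = 0.6.
\end{verbatim}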
A very useful application of conditional probabilities and related properties is the so-called Bayes's rule (named after Thomas Bayes), widely used in Statistics and useful for the computation of conditional probabilities when the definition in (3.1) cannot be directly applied with the available data.
Proposition 3.14. Let $(E_n)_n$ be an (at most countable) partition of $\Omega$. Then, for any $F \in \mathcal{F}$,
\[
P(E_i | F) = \frac{P(F|E_i)\,P(E_i)}{\sum_n P(E_n)\,P(F|E_n)}, \qquad i = 1, 2, \dots. \tag{3.6}
\]
The formula (3.6) tells us how to revise the probability of any of the events in the partition after some information is given. The object to be computed is just a conditional probability, but when we know neither the probability of the event whose occurrence is announced nor the joint probabilities, we can work instead with the reversed conditional probabilities.
Proof. By (3.1), (3.2) and (3.4), we have
\[
P(E_i | F) = \frac{P(E_i F)}{P(F)} = \frac{P(F|E_i)\,P(E_i)}{\sum_n P(E_n)\,P(F|E_n)}.
\]
The following famous example is a straightforward application of Bayes's rule.
Example 3.15 (Monty Hall). In a TV show, the contestant faces three
closed doors, one of which hides a valuable prize, say a car, and each of the
other two hides a sheep. The contestant chooses a door and, at the end of the game, if the car is behind the chosen door then they win it; otherwise they get nothing. Assume the contestant chooses the door number 1, and
the game-show host decides to help by opening the door number 2, which
shows a sheep, and giving the contestant the possibility to change their initial
choice if they want. Should the contestant switch to the door number 3, or
should they keep the door number 1?
In order to answer, we have to compute the probability that the car is behind
the door number 1, or analogously the door number 3 since the two probabilities are complementary, given that the host opened the door number 2. Denote by $E_i$ the event “the car is behind the door number i”, for i = 1, 2, 3, and by F the event “the game host opens the door number 2”. Without any information, we have $P(E_i) = 1/3$ for all i = 1, 2, 3. We also know the probability that the host opens the door number 2 given that the prize is behind the door number i for each i = 1, 2, 3, that is:
\[
P(F|E_1) = \frac{1}{2}, \qquad P(F|E_2) = 0, \qquad P(F|E_3) = 1.
\]
Using Bayes's rule, we can compute
\[
P(E_3|F) = \frac{P(F|E_3)\,P(E_3)}{\sum_{i=1}^{3} P(E_i)\,P(F|E_i)}
= \frac{1 \cdot \frac{1}{3}}{\frac{1}{2} \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} + 1 \cdot \frac{1}{3}} = \frac{2}{3},
\]
and consequently $P(E_1|F) = \frac{1}{3}$. Thus, the contestant has a higher probability of winning the car if they switch to the door number 3.
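The same conclusion can be reached by simulation. The Python sketch below (an addition, not from the notes) assumes the car is placed uniformly at random, the contestant always picks door 1, and the host opens a sheep door, chosen at random when two are available; the switching strategy then wins about two thirds of the time, matching the computation above.
\begin{verbatim}
# Monte Carlo check of the Monty Hall result.
import random

def play(switch, rng):
    car = rng.randint(1, 3)
    choice = 1
    # The host opens a door that is neither the contestant's choice nor the car.
    opened = rng.choice([d for d in (2, 3) if d != car])
    if switch:
        choice = next(d for d in (1, 2, 3) if d not in (choice, opened))
    return choice == car

rng = random.Random(0)
n = 100_000
print("switch:", sum(play(True, rng) for _ in range(n)) / n)   # about 2/3
print("stay:  ", sum(play(False, rng) for _ in range(n)) / n)  # about 1/3
\end{verbatim}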
Chapter 4
Discrete random variables
Sometimes we are not interested in the outcome of a random experiment
but in some numerical function of it. In this case, the value we are interested
in depends on the outcome of the experiment, which is not known in advance, so it is itself random.
Definition 4.1. A random variable is a real-valued function on the sample space:
\[
X : \Omega \to \mathbb{R}, \qquad \omega \mapsto X(\omega) \quad \text{for all } \omega \in \Omega.
\]
Example 4.2. Recalling the experiment of three consecutive tosses of a coin, assume we are interested in the number of heads obtained. This is a random variable, which we denote by X. Now let us rewrite the following events in terms of the random variable X:
\[
\text{“the total number of heads is two”} = \{(HHT), (HTH), (THH)\} = \{\omega \in \Omega : X(\omega) = 2\} \equiv \{X = 2\},
\]
\[
\text{“the total number of heads is odd”} = \{(HHH), (HTT), (THT), (TTH)\} = \{\omega \in \Omega : X(\omega) \in A\} \equiv \{X \in A\},
\]
where $A = \{n \in \mathbb{N} : n \text{ odd}\}$.
In general, given a random variable X and a set $A \subset \mathbb{R}$, we use the notation
\[
\{X \in A\} \equiv \{\omega \in \Omega : X(\omega) \in A\}.
\]
Random variables that count something are called discrete, because they can only take values in a discrete set.
Definition 4.3. A random variable X is a discrete random variable if there exists a countable set $K \subset \mathbb{R}$ such that $P(X \in K) = 1$.
Let X be a discrete random variable; its probability mass function (PMF), or density function, is the real map $p_X$ defined by
\[
p_X : \mathbb{R} \to \mathbb{R}, \qquad p_X(x) = P(X = x) \quad \text{for all } x \in \mathbb{R}.
\]
The probability mass function of a discrete random variable is usually
visually represented with a histogram, where the horizontal axis contains the
possible values of the random variable and the vertical axis the corresponding
probabilities.
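As a closing illustration (an addition to the notes), the sketch below computes the PMF of X = “number of heads in three tosses of a fair coin” by enumerating the sample space and prints a crude text histogram.
\begin{verbatim}
# PMF of the number of heads in three tosses of a fair coin.
from itertools import product
from fractions import Fraction
from collections import Counter

omega = list(product("HT", repeat=3))          # 8 equally likely outcomes
X = {w: w.count("H") for w in omega}           # the random variable as a map from omega to R

counts = Counter(X[w] for w in omega)
for x in sorted(counts):                       # one '#' per outcome mapped to x
    print(f"p_X({x}) = {Fraction(counts[x], len(omega))}  {'#' * counts[x]}")
# p_X(0) = 1/8, p_X(1) = 3/8, p_X(2) = 3/8, p_X(3) = 1/8
\end{verbatim}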