Discrete Probability - Tulane University Computer Science

Discrete Probability
CMPS/MATH 2170: Discrete Mathematics
Applications of Probability in Computer Science
• Average-case complexity
• Randomized algorithms
• Combinatorics
• Cryptography
• Information theory
• Machine learning
•…
Sample Space
• Experiment: a procedure that yields one of a given set of possible outcomes
− Ex: flip a coin, roll two dice, draw five cards from a deck, etc.
• Sample space Ω: the set of possible outcomes
− We focus on countable sample space: Ω is finite or countably infinite
− In many applications, Ω is uncountable (e.g., a subset of ℝ)
• Event: a subset of the sample space
− Probability is assigned to events
− For an event A ⊆ Ω, its probability is denoted by P(A)
− Describes beliefs about the likelihood of outcomes
Discrete uniform law
• Assume Ω consists of n equally likely outcomes
• For an event A ⊆ Ω, P(A) = |A| / n
− Each outcome has probability 1/n
(Diagram: event A inside sample space Ω.)
• Ex. 1: An urn contains four blue balls and five red balls. What is the probability
that a ball chosen at random from the urn is blue? 4/9
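The urn in Ex. 1 can be checked mechanically under the uniform law. A minimal sketch (the variable names are my own, not from the slides):

```python
# Discrete uniform law: model the urn as 9 equally likely outcomes and
# compute P(A) = |A| / n for the event A = {ball is blue}.
from fractions import Fraction

urn = ["blue"] * 4 + ["red"] * 5          # sample space: 9 equally likely balls
blue = [b for b in urn if b == "blue"]    # the event A

p_blue = Fraction(len(blue), len(urn))    # P(A) = |A| / n
print(p_blue)                             # 4/9
```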
Discrete Probability
• Sample space Ω: the set of possible outcomes
− We focus on countable Ω
• Events: subsets of the sample space Ω
• Discrete Probability Law
− A function P: 𝒫(Ω) → [0, 1] that assigns probability to events such that:
• 0 ≤ P({s}) ≤ 1 for all s ∈ Ω (Nonnegativity)
• P(A) = Σ_{s∈A} P({s}) for all A ⊆ Ω (Additivity)
• P(Ω) = Σ_{s∈Ω} P({s}) = 1 (Normalization)
• Discrete uniform probability law: if |Ω| = n, then P(A) = |A| / n for all A ⊆ Ω
Examples
• Ex. 2: consider rolling a pair of 6-sided fair dice
− Ω = {(i, j) : i, j = 1, 2, 3, 4, 5, 6}, each outcome has the same probability of 1/36
− P(the sum of the rolls is even) = 18/36 = 1/2
− P(the first roll is larger than the second) = 15/36 = 5/12
• Ex. 3: consider rolling a 6-sided biased (loaded) die
− Assume P(3) = 2/7 and P(1) = P(2) = P(4) = P(5) = P(6) = 1/7
− E = {1, 3, 5}, P(E) = 1/7 + 2/7 + 1/7 = 4/7
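Both examples can be verified by enumeration. A sketch (helper names are my own):

```python
# Ex. 2: enumerate the 36 equally likely outcomes of two fair dice.
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))
p_even_sum = Fraction(sum(1 for i, j in rolls if (i + j) % 2 == 0), 36)
p_first_larger = Fraction(sum(1 for i, j in rolls if i > j), 36)

# Ex. 3: loaded die with P(3) = 2/7 and 1/7 for every other face.
pmf = {k: Fraction(1, 7) for k in range(1, 7)}
pmf[3] = Fraction(2, 7)
p_E = sum(pmf[k] for k in {1, 3, 5})      # P(E) for E = {1, 3, 5}
```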
Properties of Probability Laws
• Consider a probability law, and let A, B, and C be events
− If A ⊆ B, then P(A) ≤ P(B)
− P(Ā) = 1 − P(A)
− P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
− P(A ∪ B) = P(A) + P(B) if A and B are disjoint, i.e., A ∩ B = ∅
• Ex. 4: What is the probability that a positive integer selected at random from the
set of positive integers not exceeding 100 is divisible by either 2 or 5?
50/100 + 20/100 − 10/100 = 0.6
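Ex. 4 can be checked two ways: by counting the union directly, and by the inclusion–exclusion identity P(A ∪ B) = P(A) + P(B) − P(A ∩ B). A sketch with my own names:

```python
# Count multiples of 2 and of 5 among 1..100 and compare both computations.
from fractions import Fraction

n = 100
div2 = {k for k in range(1, n + 1) if k % 2 == 0}   # 50 numbers
div5 = {k for k in range(1, n + 1) if k % 5 == 0}   # 20 numbers

p_direct = Fraction(len(div2 | div5), n)                        # count the union
p_incl_excl = Fraction(len(div2) + len(div5) - len(div2 & div5), n)
```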
Conditional Probability
• Conditional probability provides us with a way to reason about the outcome of an
experiment, based on partial information
• Ex. 1: roll a six-sided fair die. Suppose we are told that the outcome is even. What
is the probability that the outcome is 6? |A ∩ B| / |B| = 1/3
• Let A and B be two events (of a given sample space) where P(B) > 0. The
conditional probability of A given B is defined as
P(A | B) = P(A ∩ B) / P(B)
• Given an event B with P(B) > 0, the conditional probabilities P(A | B) form a
legitimate probability law.
Conditional Probability
• Ex. 2: Flip a fair coin three successive times. Assume all the outcomes are equally
likely. Find P(A | B) where
A = {more heads than tails come up}, B = {1st toss is a head}.
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
B = {HHH, HHT, HTH, HTT}, A ∩ B = {HHH, HHT, HTH}
P(A | B) = (3/8) / (4/8) = 3/4
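Ex. 2 is small enough to brute-force over the eight equally likely outcomes; for a uniform sample space, P(A | B) reduces to |A ∩ B| / |B|. A sketch:

```python
# Enumerate three coin tosses and condition on the first being a head.
from fractions import Fraction
from itertools import product

omega = ["".join(s) for s in product("HT", repeat=3)]
B = [s for s in omega if s[0] == "H"]                      # 1st toss is a head
A_and_B = [s for s in B if s.count("H") > s.count("T")]    # also more heads than tails

p_A_given_B = Fraction(len(A_and_B), len(B))               # |A ∩ B| / |B|
```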
Independence
• We say that event A is independent of event B if P(A | B) = P(A)
• Two events A and B are independent if and only if P(A ∩ B) = P(A) P(B)
• Ex. 3: Consider an experiment involving two successive rolls of a 4-sided die in which
all 16 possible outcomes are equally likely and have probability 1/16. Are the following
pairs of events independent?
(a) A = {1st roll is 2}, B = {2nd roll is 3}: Yes
(b) A = {1st roll is 1}, B = {sum of two rolls is 5}: Yes
(c) A = {1st roll is 4}, B = {sum of two rolls is 4}: No
• We say that the events A₁, A₂, …, Aₙ are (mutually) independent if and only if
P(⋂_{i∈S} A_i) = ∏_{i∈S} P(A_i), for every subset S of {1, 2, …, n}
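The three pairs in Ex. 3 can be tested directly against the product criterion P(A ∩ B) = P(A) P(B). A sketch, with events encoded as predicates on an outcome (my own helper names):

```python
# Enumerate the 16 equally likely outcomes of two rolls of a 4-sided die
# and test independence via P(A ∩ B) == P(A) * P(B).
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 5), repeat=2))

def prob(event):
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

def independent(A, B):
    return prob(lambda r: A(r) and B(r)) == prob(A) * prob(B)

a = independent(lambda r: r[0] == 2, lambda r: r[1] == 3)          # (a)
b = independent(lambda r: r[0] == 1, lambda r: r[0] + r[1] == 5)   # (b)
c = independent(lambda r: r[0] == 4, lambda r: r[0] + r[1] == 4)   # (c)
```

In (c) the events are disjoint (a first roll of 4 makes a sum of 4 impossible), so P(A ∩ B) = 0 ≠ P(A) P(B); disjoint events with positive probability are never independent.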
Bernoulli Trials
• Bernoulli Trial: an experiment with two possible outcomes
− E.g., flipping a coin results in two possible outcomes: head (H) and tail (T)
• Independent Bernoulli Trials: a sequence of Bernoulli trials that are mutually independent
• Ex. 4: Consider an experiment involving five independent tosses of a biased coin, in which the
probability of heads is p.
− What is the probability of the sequence HHHTT?
• A_i = {i-th toss is a head}
• P(A₁ ∩ A₂ ∩ A₃ ∩ Ā₄ ∩ Ā₅) = P(A₁) P(A₂) P(A₃) P(Ā₄) P(Ā₅) = p³(1 − p)²
− What is the probability that exactly three heads come up?
• P(exactly three heads come up) = C(5, 3) p³(1 − p)²
Bernoulli Trials
• Theorem: Consider an experiment involving n independent Bernoulli trials,
with probability of success p and probability of failure q = 1 − p. The
probability of exactly k successes is
C(n, k) p^k q^(n−k)
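The theorem can be sanity-checked by brute force: summing the probabilities of all length-n success/failure sequences with exactly k successes must reproduce the closed form. A sketch (n, k, and p = 0.3 are arbitrary choices of mine):

```python
# Compare C(n, k) p^k q^(n-k) against exhaustive enumeration of sequences.
from itertools import product
from math import comb, isclose

n, k, p = 5, 3, 0.3
q = 1 - p

formula = comb(n, k) * p**k * q**(n - k)
brute = sum(
    p**seq.count(1) * q**seq.count(0)      # each sequence is independent trials
    for seq in product((0, 1), repeat=n)
    if seq.count(1) == k
)
```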
Random Variables
• A random variable is a real-valued function of the experimental outcome.
• Ex. 1: In an experiment involving rolling a die twice, the following are examples of random variables:
− The sum of the two rolls
− The number of sixes in the two rolls
• Ex. 2: Consider an experiment involving three independent tosses of a fair coin.
− Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
− X(s) = the number of heads that appear for s ∈ Ω. Then
X(HHH) = 3, X(HHT) = X(HTH) = X(THH) = 2,
X(HTT) = X(THT) = X(TTH) = 1, X(TTT) = 0
− The event {X = x} consists of all outcomes that give rise to a value of X equal to x
− P(X = 2) = P({s ∈ Ω : X(s) = 2}) = P({HHT, HTH, THH}) = 3/8
− P(X < 2) = P({HTT, THT, TTH, TTT}) = 4/8 = 1/2
Random Variables
• A random variable is a real-valued function of the outcome of the experiment.
• A function of a random variable defines another random variable.
• We can associate with each random variable certain “averages” of interest, such as
the expected value and the variance.
• A discrete random variable is a real-valued function of the outcome of the
experiment that can take a finite or countably infinite number of values
Probability Mass Functions
• Consider a discrete random variable X. If x is any possible value of X, the
probability mass of x, denoted p_X(x), is the probability of the event {X = x}:
p_X(x) = P(X = x) = P({s ∈ Ω : X(s) = x})
− p_X is called the probability mass function (PMF) of X
• Ex. 3: let the experiment consist of two independent tosses of a fair coin, and let X
be the number of heads obtained. The PMF of X is
p_X(x) = 1/4 if x = 0 or x = 2; 1/2 if x = 1; 0 otherwise.
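The PMF in Ex. 3 can be built directly from the sample space by collecting the probability of every outcome that maps to each value of X. A sketch:

```python
# Compute p_X for X = number of heads in two fair tosses.
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))      # 4 equally likely outcomes
pmf = {}
for s in outcomes:
    x = s.count("H")                          # value X(s)
    pmf[x] = pmf.get(x, 0) + Fraction(1, 4)   # accumulate P({s}) into p_X(x)
```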
Probability Mass Functions
• Properties of PMF
− Σ_x p_X(x) = 1
− P(X ∈ S) = Σ_{x∈S} p_X(x)
• Ex. 4: Let X be the number of heads obtained in two independent tosses of a fair
coin. The probability of at least one head is
P(X > 0) = Σ_{x=1}^{2} p_X(x) = 1/2 + 1/4 = 3/4
Expected Value
• The expected value (also called the expectation or the mean) of a random variable
X on the sample space Ω is equal to
E[X] = Σ_{s∈Ω} X(s) P({s}) = Σ_x x p_X(x)
• Ex. 1: Let X be the number that comes up when a fair die is rolled.
− E[X] = Σ_{i=1}^{6} i × 1/6 = 21/6 = 7/2
− E[X²] = Σ_{i=1}^{6} i² × 1/6 = 91/6
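Ex. 1 as a computation over the PMF, including E[X²], which the variance slide reuses later. A sketch with my own names:

```python
# E[X] = sum of x * p_X(x) for a fair six-sided die, and likewise E[X^2].
from fractions import Fraction

pmf = {i: Fraction(1, 6) for i in range(1, 7)}
E_X = sum(x * p for x, p in pmf.items())        # 21/6 = 7/2
E_X2 = sum(x**2 * p for x, p in pmf.items())    # 91/6
```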
Expected Value
• Ex. 2: Consider two independent coin tosses, each with a 3/4 probability of a
head, and let X be the number of heads obtained. Find the PMF of X and E(X).
p_X(x) = (1/4)² if x = 0; 2 · (1/4) · (3/4) if x = 1; (3/4)² if x = 2.
E[X] = 0 · (1/4)² + 1 · 2 · (1/4) · (3/4) + 2 · (3/4)² = 12/8 = 3/2
Linearity of Expectations
• If X_i, i = 1, 2, …, n are random variables on Ω, and a and b are real numbers, then
− E[X₁ + X₂ + ⋯ + Xₙ] = E[X₁] + E[X₂] + ⋯ + E[Xₙ]
− E[aX + b] = aE[X] + b
• Ex. 3: Consider an experiment involving rolling a pair of six-sided fair dice. What is the
expected value of the sum of the numbers that appear?
− E[X] = 7
• Ex. 4: Consider an experiment involving n independent Bernoulli trials, with probability
of success p. What is the expected value of the number of successes in n trials?
− E[X] = Σ_{k=0}^{n} k P(X = k) = np
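Linearity gives E[X] = np in Ex. 4 without evaluating any binomial sums: write X = X₁ + ⋯ + Xₙ where each indicator Xᵢ has E[Xᵢ] = p. A sketch comparing that shortcut with the direct definition (n and p are arbitrary choices of mine):

```python
# Binomial mean two ways: linearity of expectation vs. the defining sum.
from math import comb, isclose

n, p = 10, 0.3

by_linearity = n * p                        # E[X1] + ... + E[Xn] = np
by_definition = sum(                        # sum of k * P(X = k)
    k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)
)
```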
Variance
• The variance of a random variable X on the sample space Ω is equal to
V(X) = Σ_{s∈Ω} (X(s) − E[X])² P({s}) = E[(X − E[X])²]
− The variance provides a measure of dispersion of X around its mean
− Another measure of dispersion is the standard deviation of X:
σ_X = √V(X)
Variance
• Theorem: V(X) = E[X²] − (E[X])²
• Ex. 5: Consider the experiment of tossing a coin, which comes up a head with
probability p and a tail with probability 1 − p. Define the random variable X:
X = 1 if a head, 0 if a tail, so p_X(x) = p if x = 1; 1 − p if x = 0
E[X] = 1 · p + 0 · (1 − p) = p
E[X²] = 1² · p + 0² · (1 − p) = p
V(X) = E[X²] − (E[X])² = p − p²
• Ex. 6: What is the variance of the random variable X, where X is the number that
comes up when a fair die is rolled? V(X) = E[X²] − (E[X])² = 91/6 − (7/2)² = 35/12
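Ex. 6 checked both ways: the definition E[(X − E[X])²] and the shortcut E[X²] − (E[X])² must agree. A sketch with my own names:

```python
# Variance of a fair die roll via the definition and via the theorem.
from fractions import Fraction

pmf = {i: Fraction(1, 6) for i in range(1, 7)}
E_X = sum(x * p for x, p in pmf.items())                        # 7/2

var_def = sum((x - E_X) ** 2 * p for x, p in pmf.items())       # E[(X - E[X])^2]
var_shortcut = sum(x**2 * p for x, p in pmf.items()) - E_X**2   # E[X^2] - (E[X])^2
```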
Total Probability
Total Probability Theorem
Let A₁, …, Aₙ be disjoint events that form a partition of the sample space (each
possible outcome is included in exactly one of the events A₁, …, Aₙ) and assume that
P(A_i) > 0, for all i. Then, for any event B, we have
P(B) = P(A₁ ∩ B) + ⋯ + P(Aₙ ∩ B)
     = P(A₁) P(B|A₁) + ⋯ + P(Aₙ) P(B|Aₙ)
(Diagram: the sample space Ω partitioned into A₁, A₂, A₃, with event B overlapping each.)
Total Probability
• Radar Detection. If an aircraft is present in a certain area, a radar detects it and
generates an alarm signal with probability 0.99. If an aircraft is not present, the
radar generates a (false) alarm with probability 0.10. We assume that an aircraft is
present with probability 0.05.
A = {an aircraft is present}, B = {the radar generates an alarm}
P(A) = 0.05, P(B|A) = 0.99, P(B|Ā) = 0.10
P(B) = P(A ∩ B) + P(Ā ∩ B)
     = P(A) P(B|A) + P(Ā) P(B|Ā)
     = 0.05 · 0.99 + (1 − 0.05) · 0.10
     = 0.1445
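The radar computation, with the two-event partition {A, Ā} spelled out. A sketch (variable names are my own):

```python
# Total probability over the partition {aircraft present, aircraft absent}.
p_A = 0.05             # P(A): aircraft present
p_B_given_A = 0.99     # P(B|A): detection probability
p_B_given_notA = 0.10  # P(B|Ā): false-alarm probability

p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_notA
```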
Bayes’ Theorem
Bayes’ Rule
Let A₁, …, Aₙ be disjoint events that form a partition of the sample space, and
assume that P(A_i) > 0, for all i. Then, for any event B such that P(B) > 0, we have
P(A_i|B) = P(A_i ∩ B) / P(B)
         = P(A_i) P(B|A_i) / (P(A₁) P(B|A₁) + ⋯ + P(Aₙ) P(B|Aₙ))
Bayes’ Theorem
• Radar Detection. If an aircraft is present in a certain area, a radar detects it and
generates an alarm signal with probability 0.99. If an aircraft is not present, the
radar generates a (false) alarm with probability 0.10. We assume that an aircraft is
present with probability 0.05.
A = {an aircraft is present}, B = {the radar generates an alarm}
P(A) = 0.05, P(B|A) = 0.99, P(B|Ā) = 0.10
P(A|B) = P(A ∩ B) / P(B)
       = P(A) P(B|A) / (P(A) P(B|A) + P(Ā) P(B|Ā))
       = (0.05 · 0.99) / 0.1445
       ≈ 0.34256
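Bayes' rule on the same numbers shows why the posterior is striking: even with a 99% detection rate, an alarm implies an aircraft only about a third of the time, because aircraft are rare (P(A) = 0.05). A sketch:

```python
# Posterior P(A|B) = P(A) P(B|A) / P(B), with P(B) from total probability.
p_A = 0.05
p_B_given_A = 0.99
p_B_given_notA = 0.10

p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_notA   # denominator, 0.1445
p_A_given_B = p_A * p_B_given_A / p_B                  # about 0.34256
```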