Introduction to Probability
Introductory Example
When you toss a fair coin, the two outcomes, heads and tails,
are equally likely. Therefore, if we performed a very large
number of such coin tosses, we would expect each outcome to
occur about 50% of the time. This number, 0.5, is called the
probability of each outcome.
Generally speaking, if a specified outcome of an
experiment has probability $p$, we mean that if we
perform the same experiment $n$ times, the
outcome will happen about $pn$ times, if $n$ is large.
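The frequency interpretation above can be tried out with a small simulation. This is only a sketch: the function name `relative_frequency_of_heads` and the fixed seed are choices made for this illustration, and Python's pseudo-random generator stands in for a real coin.

```python
import random

def relative_frequency_of_heads(n, seed=0):
    """Toss a simulated fair coin n times and return the fraction of heads."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    heads = sum(1 for _ in range(n) if rng.random() < 0.5)
    return heads / n

# The fraction of heads should settle near p = 0.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

For small n the fraction can be far from 0.5; the statement is only about large n.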
Terminology
A process that randomly yields one of several possible outcomes is
called an experiment. The set $S$ of all possible outcomes is called the
sample space.
Example: the experiment could be rolling a fair, six-sided die.
There are six different outcomes, which are the integers from
1 to 6:
$S = \{1, 2, 3, 4, 5, 6\}$
(Since the die is "fair", each of these outcomes is equally
likely, hence each has the probability $p = \frac{1}{6}$.)
The Concept of an "Event" and its Common Misunderstandings
When you hear the word event, you may think that it is just a synonym
for an outcome of an experiment. This is not the case. By an event, we
mean a set of (alternative) outcomes. An example will clarify the
meaning of this.
Let us consider the experiment of rolling a fair die once and consider
the following event:
$E = \{2, 4, 6\}$
$E$ is the event that the die was rolled (once!), and the outcome was 2,
4 or 6. It is NOT the event where someone rolled the die three times
and got a 2, a 4 and a 6.
$E$ could be named "you rolled an even number".
The exact definition of an event
Given an experiment with sample space $S$, a
subset $E \subseteq S$ of the sample space is called an
event.
The empty set $\emptyset$ is an event, called the impossible
event.
The entire sample space $S$ is an event as well,
called the certain event.
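As a small illustration of these definitions, events can be modelled as Python sets. The helper names `is_event` and `occurs` are made up for this sketch.

```python
S = {1, 2, 3, 4, 5, 6}          # sample space of one roll of a die
E = {2, 4, 6}                   # the event "you rolled an even number"
impossible = set()              # the impossible event (empty set)
certain = set(S)                # the certain event (whole sample space)

def is_event(candidate, sample_space):
    """An event is any subset of the sample space."""
    return candidate <= sample_space

def occurs(event, outcome):
    """An event occurs when the single observed outcome is one of its elements."""
    return outcome in event

print(is_event(E, S), occurs(E, 4), occurs(E, 5))
```

Note that `occurs` takes one outcome of one roll, which matches the point that an event is a set of alternative outcomes, not a sequence of rolls.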
The probability of an event
If all outcomes of an experiment with finite sample space
$S$ are equally likely, then the probability of an event $E \subseteq S$ is
$$P(E) = \frac{|E|}{|S|}.$$
For example, on a previous page, we considered the event
$E = \{2, 4, 6\}$. Since there are 3 outcomes in the event, and 6
outcomes total, $P(E) = \frac{3}{6} = \frac{1}{2} = 50\%$.
The probability of the impossible event is zero. The probability
of the certain event is 1. All other probabilities are real
numbers between 0 and 1.
This definition of probability is called Laplace Probability. It is
only meaningful when all outcomes are equally likely.
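The definition translates almost directly into code. The following is a minimal sketch, assuming events and sample spaces are given as Python sets; `laplace_probability` is a name chosen here, and the function cannot check whether the outcomes really are equally likely.

```python
from fractions import Fraction

def laplace_probability(event, sample_space):
    """Return |E| / |S|; meaningful only if all outcomes are equally likely."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

S = {1, 2, 3, 4, 5, 6}
print(laplace_probability({2, 4, 6}, S))   # 1/2
```

Using `Fraction` keeps the result exact, so 3/6 is reported as 1/2 rather than as a rounded decimal.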
An Application to Programming (1)
You are writing a C program that requires two random, distinct numbers a, b of type
int.
Your program divides by a-b at some point, so it is necessary that the two random
numbers are not equal. You figure that the probability that two randomly generated
ints are equal is too small to bother with, so you don't check for equality.
What is the probability that your program crashes due to this division by zero,
assuming that an int in C is a signed 16-bit value in the range
$[-32767, 32767]$?
Solution: there are $2^{16} - 1 = 65535$ possible int values in this range. Thus, the
(Laplace) probability that a randomly chosen b equals a is
$$p = \frac{1}{65535} \approx 0.0015\%.$$
(On a two's-complement machine, a 16-bit int can also take the value $-32768$, which
gives $2^{16} = 65536$ values; the result is practically the same.)
This is small, but if your program is very popular, and/or makes this random choice
frequently, crash reports will be common. For example, if there are 10000 users who
use your program daily, and if a random choice of a,b is made 100 times per day
during typical usage, and if each crash generates an automatic bug report email to
you, then you will get on average about 15 such emails per day.
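The arithmetic behind this estimate can be checked in a few lines. The constants are the figures from the text; the variable names are chosen for this sketch.

```python
from fractions import Fraction

INT_VALUES = 65535            # number of int values assumed in the text
USERS = 10_000                # users per day
CHOICES_PER_USER = 100        # random choices of a, b per user per day

p_equal = Fraction(1, INT_VALUES)                       # P(b == a)
expected_crashes = USERS * CHOICES_PER_USER * p_equal   # average crashes per day

print(f"P(b == a) = {float(p_equal):.6%}")
print(f"expected crash reports per day = {float(expected_crashes):.2f}")
```

The expected number is simply the number of random choices per day (one million) times the probability of a crash per choice.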
When Laplace Probability Doesn't Apply
Consider the following experiment: you play the lottery. You only care
about whether you win the jackpot or not, so you model this
experiment with the following sample space:
$S = \{\text{win jackpot}, \text{don't win jackpot}\}$
The event you hope for is the singleton event $E = \{\text{win jackpot}\}$.
The Laplace probability of that event is $P(E) = \frac{1}{2} = 50\%$.
Therefore, you would win the jackpot about every second time you play.
This absurd conclusion is due to our ignoring the assumption that
makes Laplace probability meaningful. The two outcomes are not
equally likely, in fact, far from it.
A subtler example of outcomes that are not equally likely
Let us consider the experiment where two fair dice are rolled, one
after the other, and the sum of their numbers is then taken. Then the
possible outcomes are the integers from 2 to 12. We might therefore
model this situation with the sample space
$S = \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}$
The Laplace probability of each outcome would then be $\frac{1}{11}$.
Again, this is incorrect, because Laplace probability does not apply to
this situation. This is because the individual outcomes are not equally
likely.
Consider that a sum of 2 can only be obtained as 1 + 1. A sum of 3
can be obtained as 1 + 2 or as 2 + 1. A sum of 4 can be obtained as
1 + 3, 2 + 2 or 3 + 1. It is more likely to roll a sum of 4 than a sum of
3, which is more likely than rolling a sum of 2.
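A quick simulation makes the unequal likelihoods visible. This is a sketch only: `simulate_sums` is a name chosen here, and a seeded pseudo-random generator stands in for real dice.

```python
import random
from collections import Counter

def simulate_sums(trials, seed=0):
    """Roll two simulated fair dice `trials` times and count each sum."""
    rng = random.Random(seed)
    return Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(trials))

TRIALS = 100_000
counts = simulate_sums(TRIALS)
for total in (2, 3, 4):
    print(total, counts[total] / TRIALS)   # relative frequency of each sum
```

If all eleven sums were equally likely, each relative frequency would be close to 1/11, which is about 0.09; the simulated values for 2, 3 and 4 differ clearly from one another.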
Finding an appropriate sample space to use Laplace Probability
To fix the problem, we model the experiment by using a different
sample space. Let us ignore the step of adding the two numbers for
now and let us focus on the individual numbers on the dice. Then,
each outcome is an ordered list of two numbers, each an integer from
1 to 6. Then,
$$
\begin{aligned}
S = \{\, & (1,1), (1,2), (1,3), (1,4), (1,5), (1,6), \\
         & (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), \\
         & (3,1), (3,2), (3,3), (3,4), (3,5), (3,6), \\
         & (4,1), (4,2), (4,3), (4,4), (4,5), (4,6), \\
         & (5,1), (5,2), (5,3), (5,4), (5,5), (5,6), \\
         & (6,1), (6,2), (6,3), (6,4), (6,5), (6,6) \,\}
\end{aligned}
$$
(You should recognize this set as $A \times A$ where $A = \{1, 2, 3, 4, 5, 6\}$.)
Now, all outcomes are equally likely with probability $\frac{1}{36}$.
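The sample space of ordered pairs can be generated rather than typed out. A minimal sketch, using `itertools.product` for the Cartesian product; the names `A`, `S` and `p_outcome` are chosen for this illustration.

```python
from fractions import Fraction
from itertools import product

A = range(1, 7)                       # the faces of one die
S = list(product(A, A))               # all ordered pairs (first die, second die)

p_outcome = Fraction(1, len(S))       # Laplace probability of a single pair
print(len(S), p_outcome)              # 36 1/36
```

Because the pairs are ordered, (1, 2) and (2, 1) are counted as two different outcomes.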
We are now in a position to compute the probabilities for given sums
of the two numbers.
A sum of 2 corresponds to the event $E = \{(1,1)\}$. Therefore, its
probability is $\frac{1}{36}$.
A sum of 3 corresponds to the event $E = \{(1,2), (2,1)\}$. Therefore, its
probability is $\frac{2}{36} = \frac{1}{18}$.
A sum of 4 corresponds to the event $E = \{(1,3), (2,2), (3,1)\}$.
Therefore, its probability is $\frac{3}{36} = \frac{1}{12}$.
A sum of 5 corresponds to the event $E = \{(1,4), (2,3), (3,2), (4,1)\}$.
Therefore, its probability is $\frac{4}{36} = \frac{1}{9}$.
As an exercise, you should ask yourself which sum of the two dice is
the most likely, and which sums are least likely.
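One way to check your answer to the exercise is to let a program count the outcomes in each event. The sketch below assumes the sample space of ordered pairs described above; `p_sum` is a name chosen here.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # all 36 ordered pairs

def p_sum(total):
    """Laplace probability that the two dice add up to `total`."""
    event = [pair for pair in S if sum(pair) == total]
    return Fraction(len(event), len(S))

for total in range(2, 13):
    print(total, p_sum(total))
```

Each probability is obtained exactly as in the worked cases: count the pairs in the event and divide by 36.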
Probabilities Involving Combinations
Example 1: in the German national lottery, 6 numbers are randomly drawn from
49 possible numbers. The player attempts to predict these 6 numbers (not their
order). What is the probability of predicting the 6 numbers correctly?
Solution: there are $C(49,6)$ ways of selecting 6 numbers from 49 without regard to
order. The probability that one such combination will turn out to be the winning
combination is
$$p = \frac{1}{C(49,6)} = \frac{1}{13{,}983{,}816}.$$
Example 2: what is the probability of predicting exactly 4 numbers correctly, and 2
incorrectly, in this same lottery?
Solution: given one set of 6 winning numbers and 43 losing numbers, there are $C(6,4)$
ways of having picked 4 out of the 6 winning numbers, and $C(43,2)$ ways of having
picked 2 out of the 43 losing numbers. By the multiplication principle, there are
$C(6,4) \cdot C(43,2)$ different predictions that have 4 winning and 2 losing numbers.
Therefore, the probability of having made such a prediction is
$$p = \frac{C(6,4) \cdot C(43,2)}{C(49,6)} = \frac{13{,}545}{13{,}983{,}816} \approx 0.00097.$$
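Both lottery examples follow the same counting pattern, which can be sketched as one function. `p_exactly` and its parameter names are chosen for this illustration; `math.comb` computes the number of combinations $C(n,k)$.

```python
from fractions import Fraction
from math import comb

def p_exactly(k, drawn=6, pool=49):
    """Probability that exactly k of the `drawn` predicted numbers are winners."""
    losers = pool - drawn                                  # 43 losing numbers
    favourable = comb(drawn, k) * comb(losers, drawn - k)  # multiplication principle
    return Fraction(favourable, comb(pool, drawn))

print(comb(49, 6))                 # 13983816
print(float(p_exactly(4)))         # about 0.00097
```

With k = 6 the function reduces to Example 1, since there is only one way to pick all 6 winning numbers and no losing ones.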
Unions, Intersections of Events
If $S$ is a finite sample space, and $E_1, E_2$ are subsets of $S$,
then we may consider the events $E_1 \cup E_2$
and $E_1 \cap E_2$.
$E_1 \cup E_2$ is the event that $E_1$ or $E_2$ happens (or
both).
$E_1 \cap E_2$ is the event that both $E_1$ and $E_2$ happen.
Probabilities of Unions of Events
We already learned that if $S$ is a finite set, and $E_1, E_2$ are
subsets of $S$, then $|E_1 \cup E_2| = |E_1| + |E_2| - |E_1 \cap E_2|$.
By dividing both sides of this equation by $|S|$, we obtain
$$\frac{|E_1 \cup E_2|}{|S|} = \frac{|E_1|}{|S|} + \frac{|E_2|}{|S|} - \frac{|E_1 \cap E_2|}{|S|}.$$
Each one of these quotients is a Laplace probability. We
have thus obtained the formula for the probability of a
union of two events:
$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$$
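The union formula can be verified on a concrete case. In this sketch the two events on a single die roll are chosen purely for illustration, and `P` is a small helper for the Laplace probability.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
E1 = {2, 4, 6}        # "even number"
E2 = {4, 5, 6}        # "number greater than 3" (chosen for illustration)

def P(event):
    """Laplace probability of an event in the sample space S."""
    return Fraction(len(event), len(S))

lhs = P(E1 | E2)                        # probability of the union
rhs = P(E1) + P(E2) - P(E1 & E2)        # right-hand side of the formula
print(lhs, rhs)                         # 2/3 2/3
```

Without the subtraction, the outcomes 4 and 6, which lie in both events, would be counted twice.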
Mutually Exclusive Events
Two events $E_1, E_2$ are called mutually exclusive if they cannot
happen at the same time, i.e. when their intersection is
the impossible event, i.e. when they are disjoint as sets:
$$E_1 \cap E_2 = \emptyset$$
For example, $E_1$ might be that you roll an even number with a
die, and $E_2$ that you roll an odd number.
Since the probability of the impossible event is zero, the
formula for the probability of the union becomes:
$$P(E_1 \cup E_2) = P(E_1) + P(E_2)$$
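The even/odd example can be checked directly with sets. A minimal sketch; the helper names `P` and `mutually_exclusive` are chosen here.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
E1 = {2, 4, 6}        # you roll an even number
E2 = {1, 3, 5}        # you roll an odd number

def P(event):
    """Laplace probability of an event in the sample space S."""
    return Fraction(len(event), len(S))

def mutually_exclusive(e1, e2):
    """Two events are mutually exclusive when they share no outcome."""
    return e1.isdisjoint(e2)

print(mutually_exclusive(E1, E2), P(E1 | E2), P(E1) + P(E2))
```

Here the union happens to be the certain event, so both sides equal 1.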
Application
What is the probability of predicting at least 5 numbers
correctly in the German lottery?
Predicting at least 5 means predicting 5 or 6. By applying what
we learned previously,
$$P(\text{exactly five correct}) = \frac{C(6,5) \cdot C(43,1)}{C(49,6)}$$
$$P(\text{exactly six correct}) = \frac{1}{C(49,6)}.$$
These two events are mutually exclusive. Therefore, the
probability that one or the other happens is the sum of the
probabilities:
$$P(\text{at least five correct}) = \frac{C(6,5) \cdot C(43,1) + 1}{C(49,6)}.$$
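Evaluating this expression numerically is a short computation. The variable names below are chosen for this sketch.

```python
from fractions import Fraction
from math import comb

total = comb(49, 6)                                   # all possible predictions
p_five = Fraction(comb(6, 5) * comb(43, 1), total)    # exactly five correct
p_six = Fraction(1, total)                            # exactly six correct

# The two events are mutually exclusive, so their probabilities add.
p_at_least_five = p_five + p_six
print(p_at_least_five, float(p_at_least_five))
```

The numerator works out to 6 · 43 + 1 = 259 favourable predictions out of 13,983,816.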
Complements of Events
If $S$ is a finite sample space, and $E$ is an event, then $E$ and
$\overline{E}$ are disjoint. Here, $\overline{E}$ means the complement of $E$ with
$S$ being the universal set. Therefore,
$P(E \cup \overline{E}) = P(E) + P(\overline{E})$.
Since the union of a subset and its complement is the
universal set, $E \cup \overline{E} = S$, and $P(S) = 1$,
$$P(E) + P(\overline{E}) = 1$$
or
$$P(\overline{E}) = 1 - P(E).$$
Application
Sometimes, an event contains almost all outcomes. It is then more practical
to evaluate its probability as 1 minus the probability of the complementary
event.
Example: what is the probability of having at least one correct prediction in
the German lottery?
Solution: at least one means 1, 2, 3, 4, 5 or 6 numbers correct. We would have
to compute 6 probabilities and add them. It is easier to compute
$$P(\text{none correct}) = \frac{C(43,6)}{C(49,6)}$$
and then subtract that from 1:
$$P(\text{at least one correct}) = 1 - \frac{C(43,6)}{C(49,6)}.$$
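The following sketch computes the probability both ways, via the complement and via the six-term sum, to confirm that they agree. `p_exactly` is a helper name chosen here.

```python
from fractions import Fraction
from math import comb

total = comb(49, 6)

def p_exactly(k):
    """Probability of predicting exactly k of the 6 winning numbers."""
    return Fraction(comb(6, k) * comb(43, 6 - k), total)

p_none = p_exactly(0)                               # C(43,6) / C(49,6)
via_complement = 1 - p_none                         # one probability to compute
via_sum = sum(p_exactly(k) for k in range(1, 7))    # six probabilities to add

print(float(via_complement))
```

The result is a little over 56%, so a prediction with at least one correct number is more likely than not.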
An Application to Programming (2)
Going back to a previous example, you reconsider your approach to choosing two
random numbers that are required to be unequal. After assigning a random value to a,
you keep assigning a random value to b until a and b are not equal (C-style
pseudocode, where random_int() stands for any function that returns a uniformly
random int):
a = random_int(); do { b = random_int(); } while (a == b);
This will guarantee that a and b are different, but now you're worried that this loop
might run too long, so you wish to know the probability that the loop is executed 10
times or more. Common sense indicates that this probability is vanishingly small.
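The retry loop can be sketched in runnable form as follows. This is an illustration in Python rather than C; `pick_distinct_pair` is a name chosen here, and `random.Random.randint` stands in for the unspecified random number source.

```python
import random

INT_MIN, INT_MAX = -32767, 32767   # int range assumed in the text

def pick_distinct_pair(rng):
    """Return (a, b, iterations) with a != b, redrawing b until it differs from a."""
    a = rng.randint(INT_MIN, INT_MAX)
    iterations = 0
    while True:
        b = rng.randint(INT_MIN, INT_MAX)
        iterations += 1
        if b != a:
            return a, b, iterations

a, b, iterations = pick_distinct_pair(random.Random(0))
print(a, b, iterations)
```

Returning the iteration count makes it possible to observe how rarely the loop body runs more than once.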
Our sample space $S$ here consists of all sequences of b values that this program goes
through before the loop terminates. This space is infinite, so technically, the theory of
Laplace probability we have developed so far does not apply to it.
However, we understand, based on Laplace probability, that if we confine our
perspective to a single random choice for b, the probability that b equals a is
$$P(b = a) = \frac{1}{65535}.$$
The probability for the complementary event is
$$P(b \neq a) = \frac{65534}{65535}.$$
An Application to Programming (3)
Each choice of b is independent of every other choice of b, assuming a perfect
random number generator. (There is an exact, technical definition of statistical
independence, but it is beyond the scope of our class, and not required here. It
suffices to understand that each random choice of b does not influence the range of
values, or the probability of specific values, of the other random choices of b.)
For independent events, there is a product rule: the probability of all of them
occurring in succession is the product of their individual probabilities.
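The product rule can be checked on a small case where Laplace probability still applies. The sketch below uses two rolls of a fair die instead of the choices of b; this example and its names are chosen for illustration only.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))      # (first roll, second roll)

first_is_six = {s for s in S if s[0] == 6}    # event about the first roll only
second_is_six = {s for s in S if s[1] == 6}   # event about the second roll only

def P(event):
    """Laplace probability of an event in the sample space S."""
    return Fraction(len(event), len(S))

both = P(first_is_six & second_is_six)        # counted directly
by_product = P(first_is_six) * P(second_is_six)
print(both, by_product)                       # 1/36 1/36
```

Counting the outcomes directly and multiplying the individual probabilities give the same value, which is what the product rule states for independent events.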
Therefore, the probability $p_2$ of our loop terminating after two iterations is the
probability of getting first $b = a$, then $b \neq a$:
$$p_2 = \frac{1}{65535} \cdot \frac{65534}{65535} = \frac{65534}{65535^2}.$$
The probability $p_3$ of our loop terminating after three iterations is the probability of
getting $b = a$ twice, then $b \neq a$:
$$p_3 = \frac{1}{65535} \cdot \frac{1}{65535} \cdot \frac{65534}{65535} = \frac{65534}{65535^3}.$$
An Application to Programming (4)
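We can now see that the probability $p_n$ of our loop terminating after exactly $n$
iterations (that is, $b = a$ on the first $n - 1$ choices, then $b \neq a$) is
$$p_n = \left(\frac{1}{65535}\right)^{n-1} \cdot \frac{65534}{65535} = \frac{65534}{65535^n}.$$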
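The formula for the termination probabilities can be written as a small function and checked with exact fractions. `N` and `p` are names chosen for this sketch.

```python
from fractions import Fraction

N = 65535                                # number of int values assumed in the text

def p(n):
    """Probability that the loop terminates after exactly n iterations."""
    # b == a on the first n - 1 choices, then b != a on the n-th choice
    return Fraction(1, N) ** (n - 1) * Fraction(N - 1, N)

print(p(1), p(2))
partial = sum(p(n) for n in range(1, 6))
print(float(1 - partial))                # probability of more than 5 iterations
```

The probabilities for n = 1, 2, 3, ... add up to 1, so whatever the first few terms leave over is the probability that the loop runs longer.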
Let's find the probability $P_{10}$ that the loop repeats 10 times or more. Since
repeating 10 times or more is the union of the mutually exclusive events of
repeating exactly 10 times, exactly 11 times, exactly 12 times, and so on, $P_{10}$
is the infinite sum
$$P_{10} = \sum_{n=10}^{\infty} p_n = \sum_{n=10}^{\infty} \frac{65534}{65535^n} = 65534 \sum_{n=10}^{\infty} \left(\frac{1}{65535}\right)^n.$$
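We can evaluate the geometric series by using the techniques we learned in
sequences and summation:
$$\sum_{n=10}^{\infty} \left(\frac{1}{65535}\right)^n = \left(\frac{1}{65535}\right)^{10} \sum_{n=0}^{\infty} \left(\frac{1}{65535}\right)^n = \left(\frac{1}{65535}\right)^{10} \cdot \frac{1}{1 - \frac{1}{65535}}.$$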
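The closed form for the tail of the geometric series can be compared with a partial sum. This is a sketch with exact fractions; `geometric_tail` is a name chosen here.

```python
from fractions import Fraction

def geometric_tail(r, start):
    """Closed form of r**start + r**(start + 1) + ... for |r| < 1."""
    return r ** start / (1 - r)

r = Fraction(1, 65535)
closed = geometric_tail(r, 10)
partial = sum(r ** n for n in range(10, 30))   # first 20 terms of the tail

print(float(closed))
print(float(closed - partial))                 # what the partial sum still misses
```

The difference between the closed form and the partial sum is itself a geometric tail starting at n = 30, so it is smaller than the closed form by a factor of 65535 to the power 20.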
An Application to Programming (5)
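Simplifying, we get
$$P_{10} = 65534 \left(\frac{1}{65535}\right)^{10} \cdot \frac{1}{1 - \frac{1}{65535}} = 65534 \left(\frac{1}{65535}\right)^{10} \cdot \frac{65535}{65535 - 1} = \frac{1}{65535^{9}} \approx 4.5 \times 10^{-44}.$$
This agrees with a direct argument: the loop reaches a tenth iteration exactly when
the first nine choices of b all equal a.
This probability is vanishingly small indeed. If ten billion users ran your program
for the known age of the universe (14 billion years) and each instance executed
your loop one trillion times per second, there would be about $4.4 \times 10^{39}$
executions in total, and the probability of anyone hitting 10 repetitions or more
would still only be about 2 in 10,000.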
In practice, imperfections in the random number generator used could very
well cause large numbers of repetitions to occur on "human" time spans. In
programming practice, you should never assume that a "random number"
function produces truly random results. Making this assumption causes many
software systems to be less stable or secure than mathematical theory
predicts.