Chapter 2
Axiomatic Probability
2.1 Introduction to probability
Having seen how to count the possible outcomes of an experiment, or the ways in which a certain action can be performed, we now turn to the probability assigned to an event of interest related to the outcome of some random experiment.
Probability is a measure of the likelihood assigned to an event whose occurrence is not certain. In the simplest case, it is a generalisation of the notion of percentage. For instance, if we consider a finite population from which a member is randomly extracted, then the probability that he/she has a specific attribute is given by the percentage of members of the population having that attribute. However, the fact that probability simply corresponds to a percentage does not hold true in general, as the space of all possible outcomes can be infinite or dense (not discrete), or there may be more complex relations between the possible outcomes and the result we are interested in, e.g. there may be outcomes that are more likely to be realised or whose realisation depends on other factors.
Usually, we use probability to quantify how likely a certain event is to occur when a certain random experiment is performed. A random experiment is an experiment whose outcome is uncertain, i.e. not known a priori, and cannot
be predicted with certainty. Given a random experiment, we refer to an event
as a result that can either be true (or occur) or false (or not occur) when the
experiment is performed.
An intuitive definition of probability was given by the frequentist interpretation of probability, which flourished in the nineteenth century. The frequentist interpretation was based on the assumption that a random experiment can be performed under the same well-specified conditions an infinite number of times, each repetition independent of the others. So, let E denote an event of interest and n(E) denote the number of occurrences of the event E when the experiment is repeated n consecutive and independent times. Then, for n large, the probability of E, denoted by P(E), is approximately given by the long-run proportion of occurrences of E, i.e.
\[
P(E) \approx \frac{n(E)}{n} \quad (n \text{ large}), \qquad \text{or, more precisely,} \quad \frac{n(E)}{n} \xrightarrow[\,n \to \infty\,]{} P(E).
\tag{2.1}
\]
In this interpretation, probability is defined as the limiting relative frequency, which is an easy and intuitive concept. However, although this view helps in the understanding of probability, it is an insufficient basis for a rigorous and complete theory. Indeed, a critical drawback is that in general we do not know whether the relative frequency converges to some value when the number of repetitions goes to infinity. Furthermore, even when convergence holds true, how do we know that, if the experiment is performed again in a repeated series, we again get convergence? The frequentists initially addressed this issue by simply postulating the convergence of n(E)/n to a constant limit value. In effect, this convergence is experimentally observed in experiments that are easy to conduct under well-defined conditions an arbitrarily large number of times. For example, running computer simulations of tossing a fair coin (the probability of getting a head is the same as that of getting a tail), we can empirically observe the convergence of the proportion of heads to the value 1/2.
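Such a simulation takes only a few lines; the sketch below is merely an illustration (it is not part of the original text), with the numbers of tosses and the random seed chosen arbitrarily.

```python
import random

def head_frequency(n_tosses, seed=0):
    """Toss a fair coin n_tosses times and return the proportion of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The relative frequency n(E)/n gets closer to P(E) = 1/2 as n grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, head_frequency(n))
```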
However, postulating this convergence seems an unreasonable starting point, as it is not clear under which conditions it is realistic. A more systematic approach
consists in assuming a set of axioms specifying the basic self-evident properties that probability must satisfy, and then proving more complex results. For instance, the convergence of the relative frequencies to the corresponding probabilities is proved from the axioms in one of the most famous fundamental results in probability theory, the law of large numbers.
2.2 Sample spaces and events
Now, let us see how to construct a probabilistic model for a random experiment/phenomenon. The model is specified by three objects:

• a sample space, which is the set of all possible outcomes of the experiment, and is denoted by Ω;

• the collection of events to which we want to assign probabilities, that are sets of possible outcomes, hence defined as subsets of the sample space;

• a probability measure, that is a mathematical real-valued function defined on the collection of events that assigns to each event a value representing the probability that such event occurs.
A first remark should be made on the characterisation of events as subsets of Ω: although an event is always a subset of Ω, not in every model can all subsets of Ω be defined as events. In fact, in general the collection of events is a specified sub-family of the collection of all possible subsets, chosen so that it satisfies certain mathematical properties. This is a well-defined mathematical object called a sigma-algebra. However, for our purposes, we will always work in the simple case where the collection of events is defined to be the collection of all possible subsets of the sample space.
The identification of an event with a subset of the sample space means that the event occurs if the actual outcome of the experiment is an element contained in it, that is:

given E ⊆ Ω,  E occurs if the outcome is ω ∈ Ω such that ω ∈ E.
There is a special class of events, namely those containing one single possible outcome. These are called elementary, or simple, events.
Let us see a few simple examples.
Example 2.1. Suppose that we toss a coin three consecutive times and observe the resulting sequence. The sample space for this random experiment is the set of all possible (ordered) sequences where each element is either a head (H) or a tail (T):

Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

We have already seen in Example 1.5 how to compute the total number of outcomes for a series of coin tosses. This must be equal to the cardinality of the sample space, in this case |Ω| = #Ω = 2³ = 8. Suppose that we are interested in the event E = “the total number of heads is two”. We can translate the statement that defines the event E in terms of a set of possible outcomes, that is E = {HHT, HTH, THH} ⊆ Ω. Conversely, given an event defined as a subset of Ω, we can translate it into words, e.g.

F = {HHH, HTH} = “the first and the third tosses are heads”.
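For readers who wish to check such counts programmatically, the following sketch (an illustration, not part of the example) enumerates the sample space and extracts the event E:

```python
from itertools import product

# All ordered sequences of three tosses: |Omega| = 2**3 = 8.
omega = {"".join(seq) for seq in product("HT", repeat=3)}

# E = "the total number of heads is two".
E = {outcome for outcome in omega if outcome.count("H") == 2}

print(len(omega))  # 8
print(sorted(E))   # ['HHT', 'HTH', 'THH']
```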
Example 2.2. The blood type of human beings is determined by a pair of alleles, each received from one parent. We denote the three possible blood-type alleles by a, b, o. The possible pairs of alleles are called genotypes. Depending on the genotype inherited, the offspring will have an actual blood type (phenotype), following the dependencies shown in the following table:
Genotype   Blood type
aa         A
ab         AB
ao         A
bb         B
bo         B
oo         O
Suppose that a person is selected at random and his or her genotype is observed. What is the sample space for this random experiment? How is the
event “the selected person has blood type B” defined? How is the event
{aa, ao, bb, bo} translated into words?
The sample space is the set of all possible genotypes Ω = {aa, ab, ao, bb, bo, oo},
with cardinality given by the number of unordered sequences with repetitions of two alleles chosen from the three possible ones:
\[
|\Omega| = \binom{3+2-1}{2} = \frac{4!}{2!\,2!} = \frac{4 \cdot 3 \cdot 2 \cdot 1}{2 \cdot 2} = 6.
\]
Then, we have the following equivalent specifications for the events of interest:
“the selected person has blood type B” = {bo, bb},
{aa, ao, bb, bo} = “the selected person has blood type either A or B”.
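The same count and classification can be reproduced with a short sketch (again only an illustration; the mapping below simply restates the table of Example 2.2):

```python
from itertools import combinations_with_replacement

# Unordered pairs of alleles with repetition: C(3+2-1, 2) = 6 genotypes.
genotypes = {"".join(pair) for pair in combinations_with_replacement("abo", 2)}

# Genotype -> blood type, as in the table above.
blood_type = {"aa": "A", "ab": "AB", "ao": "A", "bb": "B", "bo": "B", "oo": "O"}

print(len(genotypes))                                  # 6
print({g for g in genotypes if blood_type[g] == "B"})  # {'bb', 'bo'}
```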
In both Example 2.1 and Example 2.2, the sample space was a finite discrete set, of the type

Ω = {ω_1, . . . , ω_n},   |Ω| = n,

where the elementary events are the singletons {ω_i}, i = 1, . . . , n. There are also cases where the sample space is countable but not finite, and cases where it is dense and not finite. An example of the last kind is the following.
Example 2.3. We sit on the side of a street, we start a stopwatch and we stop it when the next car passes in front of us, then we note down the elapsed time in units of hours.
The sample space in this case is Ω = [0, ∞) ⊂ ℝ. Or, if we assume it is possible that no car passes and we denote that outcome by ∞, then the sample space is Ω = [0, ∞) ∪ {∞} ⊂ ℝ ∪ {∞}. Assume that at the moment we start the watch it is 6PM. How is the event that the first car passes before 6:15PM translated in terms of a set of outcomes? And how can the event [1/12, 1/6] be put into words? We have the following equivalent specifications:

“the first car passes before 6:15PM” = [0, 1/4),
[1/12, 1/6] = “the first car passes between 6:05PM and 6:10PM (included)”.
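The unit conversion behind these identifications is elementary (times are measured in hours):

\[
15 \text{ min} = \tfrac{15}{60} \text{ h} = \tfrac{1}{4} \text{ h}, \qquad
5 \text{ min} = \tfrac{5}{60} \text{ h} = \tfrac{1}{12} \text{ h}, \qquad
10 \text{ min} = \tfrac{10}{60} \text{ h} = \tfrac{1}{6} \text{ h}.
\]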
In Example 2.3, the elementary events are the singletons of real non-negative values: {x}, where x ∈ [0, ∞) (or x ∈ [0, ∞) ∪ {∞}).
Let us now see how the set relations and operations are interpreted as
relationships among events. Given two events E, F , then:
(i) E ⊆ F if and only if “E occurs ⇒ F occurs”;
(ii) E^c occurs if and only if E does not occur;
(iii) E ∩ F occurs if and only if both E and F occur;
(iv) E ∪ F occurs if and only if either E or F or both occur.
Proof.
(i) When E occurs, the outcome is some element ω ∈ E, and if E ⊆ F then ω is also in F and so F occurs as well. Conversely, if F occurs whenever E occurs, that means that all elements of E are also contained in F.
(ii) Saying that “E^c = Ω \ E occurs” means that the outcome is ω ∈ Ω \ E, equivalently ω ∈ Ω, ω ∉ E, but that is equivalent to saying that “E does not occur”.
(iii) Simply by definition of intersection, since E ∩ F is the set of all possible outcomes that are contained both in E and in F.
(iv) By definition of union, since E ∪ F is the set of all possible outcomes that are contained either in E or in F or in both of them.
For example, recalling the sample space of the experiment in Example
2.1, we define the events
E = “the total number of heads is one” = {HTT, THT, TTH},
F = “the first and second tosses are tails” = {TTH, TTT},
G = “the total number of tails is three” = {TTT}.
Then, we have the following:
F^c = “at least one among the first two tosses is not a tail”
    = {HHH, HHT, HTH, HTT, THT, THH},
E ∪ F = “the total number of heads is either zero or one”
    = {HTT, THT, TTH, TTT},
E ∩ F = “the first and second tosses are tails, and the third is a head”
    = {TTH},
E ∪ G = “the total number of tails is either two or three”
    = {HTT, THT, TTH, TTT},
E ∩ G = “the total number of heads is one and all tosses are tails” = ∅.
Referring to the last event, we note that it is impossible to get exactly one
head and three tails out of three tosses. The event represented by the empty
set and consisting of no outcomes is called a null (or impossible) event.
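These identities are easy to verify mechanically; the following sketch (an illustration, not part of the text) redoes the computations with Python sets:

```python
E = {"HTT", "THT", "TTH"}   # exactly one head
F = {"TTH", "TTT"}          # first and second tosses are tails
G = {"TTT"}                 # three tails
omega = {"HHH", "HHT", "HTH", "HTT", "THH", "THT", "TTH", "TTT"}

print(omega - F)       # complement of F
print(E | F)           # union
print(E & F)           # intersection
print(E & G == set())  # True: E and G are mutually exclusive
```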
Definition 2.4. Two events whose intersection is a null event, hence which cannot both occur at the same time, are called mutually exclusive. A sequence of events (E_n)_{n∈ℕ} such that E_n ∩ E_m = ∅ for all n, m ∈ ℕ, n ≠ m, is called a sequence of (pairwise) mutually exclusive events.
We recall that the set operations of union and intersection obey the distributive, commutative and associative laws. We also remark that the set operations extend to countable sequences of sets, and so do the relationships between events. We recall De Morgan's laws for sets:
\[
\Bigl( \bigcup_{n \in \mathbb{N}} E_n \Bigr)^{c} = \bigcap_{n \in \mathbb{N}} E_n^{c}; \qquad
\Bigl( \bigcap_{n \in \mathbb{N}} E_n \Bigr)^{c} = \bigcup_{n \in \mathbb{N}} E_n^{c}.
\]
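As a quick sanity check (again only an illustration), the laws can be verified on any finite family of subsets of a finite set:

```python
omega = set(range(10))
family = [{1, 2, 3}, {2, 4, 6}, {3, 6, 9}]

union = set().union(*family)
intersection = set.intersection(*family)

# Complement of the union equals the intersection of the complements, and dually.
assert omega - union == set.intersection(*[omega - E for E in family])
assert omega - intersection == set().union(*[omega - E for E in family])
print("De Morgan's laws hold for this family")
```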
2.3 Kolmogorov’s axioms
Now that we are confident with the modeling of random experiments
in terms of sample spaces and events, we can see how to define a probability measure for the events by stating the properties that it must satisfy. These axioms are named after the Russian mathematician Andrey Kolmogorov (1903 - 1987) and correspond to intuitive properties that are also
in agreement with the frequentist definition of probability in (2.1).
Definition 2.5 (Kolmogorov’s Axioms). Let Ω be the sample space of a random experiment. A real-valued function P defined on the collection F of events for the given experiment is called a probability measure on (Ω, F) if it satisfies the following three conditions:

(I) P(E) ≥ 0 for all E ∈ F (non-negativity),
that is, all events have a non-negative probability;

(II) P(Ω) = 1 (certainty or unit measure),
that is, every time the experiment is performed, something happens;
(III) let (E_n)_{n∈ℕ} be a sequence of mutually exclusive events; then
\[
P\Bigl( \bigcup_{n \in \mathbb{N}} E_n \Bigr) = \sum_{n \in \mathbb{N}} P(E_n) \qquad \text{(additivity)}.
\]
An event that has probability one, like the sample space, is called a certain event, in that it occurs every time the experiment is performed since it contains all possible outcomes.¹

¹ If the sample space is well specified as the set of all and only the possible outcomes, then it is the only certain event. If the sample space were mis-specified by adding some elements that cannot be outcomes of the random experiment, then there would be other certain events different from Ω and other impossible events different from the empty set.
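As a concrete (and deliberately minimal) illustration, not taken from the text, the axioms can be realised on the finite sample space of Example 2.1 by assigning equal probability to every outcome; the check of additivity below is restricted to the finite family of elementary events:

```python
from itertools import product
from fractions import Fraction

# Assumed model: the three-toss sample space with equally likely outcomes.
omega = frozenset("".join(seq) for seq in product("HT", repeat=3))

def P(event):
    """Uniform probability measure: P(E) = |E| / |Omega|."""
    assert event <= omega, "events must be subsets of the sample space"
    return Fraction(len(event), len(omega))

print(P(set()) >= 0)                              # non-negativity (here trivially)
print(P(omega) == 1)                              # certainty axiom
elementary = [frozenset({o}) for o in omega]      # pairwise mutually exclusive events
print(sum(P(e) for e in elementary) == P(omega))  # additivity on this family
```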
From these basic conditions, a series of properties satisfied by the probability measure follows.
Proposition 2.6. Let P be a probability measure on (Ω, F); then the following properties hold true:

1. P(∅) = 0;
2. P(E^c) = 1 − P(E) for all E ∈ F;
3. E, F ∈ F, E ⊆ F ⇒ P(E) ≤ P(F) (monotonicity);
4. P(E ∪ F) = P(E) + P(F) − P(E ∩ F) for all E, F ∈ F.

Proof.
1. We can rewrite the sample space as Ω = Ω ∪ ∅. Since ∅ and Ω are
disjoint sets, by the certainty and additivity axioms it follows
1 = P(Ω) = P(Ω ∪ ∅) = P(Ω) + P(∅) = 1 + P(∅),
from which P(∅) = 0.
2. Since E and E^c are disjoint sets such that E ∪ E^c = Ω, again by the certainty and additivity axioms it follows
1 = P(Ω) = P(E) + P(E^c), from which P(E^c) = 1 − P(E).
3. We can rewrite the bigger set as the union of the smaller set and the remaining part, that is F = E ∪ (E^c ∩ F), where we note that E and E^c ∩ F are mutually exclusive. Thus, P(F) = P(E) + P(E^c ∩ F), where P(E^c ∩ F) ≥ 0 by the non-negativity axiom, from which the monotonicity property follows.
4. Given any pair of events E, F ∈ F, we can rewrite their union as the union of two mutually exclusive events: E ∪ F = E ∪ (E^c ∩ F). On the other hand, F can also be rewritten as the union of two mutually exclusive events:

F = Ω ∩ F = (E ∪ E^c) ∩ F = (E ∩ F) ∪ (E^c ∩ F).

Thus:

P(E ∪ F) = P(E) + P(E^c ∩ F) = P(E) + P(F) − P(E ∩ F).
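To see property 4 at work on a concrete case, take the events E = {HTT, THT, TTH} and F = {TTH, TTT} from Section 2.2 and assume, for the sake of illustration, that all eight outcomes of Example 2.1 are equally likely, so that every event has probability equal to its cardinality divided by 8. Then

\[
P(E \cup F) = P(E) + P(F) - P(E \cap F) = \frac{3}{8} + \frac{2}{8} - \frac{1}{8} = \frac{4}{8} = \frac{1}{2},
\]

which is indeed the probability of the four-element set E ∪ F = {HTT, THT, TTH, TTT}.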
Note that, from the non-negativity axiom and the monotonicity property, it immediately follows that the probability of any event is bounded between 0 and 1:

0 ≤ P(E) ≤ 1 for all E ∈ F.
This was also clear from the frequentist definition in (2.1).