2.2 Discrete Probability Distributions
A random variable is the numerical description of the outcome of
an experiment, e.g.
1. The result of a die roll.
2. The height of a person.
3. The number of heads when a coin is tossed k times.
A random variable
A random variable is denoted by a capital letter and can be defined
by its distribution.
It should be interpreted as the (as yet) unobserved result of a
random experiment.
A realisation of a random variable is a particular observation taken
from this distribution (e.g. the result of a single die roll, the height
of a particular person). Realisations are denoted by small letters.
Example of the distribution of a random variable
Suppose X is the result of a die roll. The distribution of X is given
by
P(X = x) = 1/6, x ∈ {1, 2, 3, 4, 5, 6}.
The support SX of random variable X is the set of possible
realisations (results).
In the case of a die roll SX = {1, 2, 3, 4, 5, 6}.
2.2.1 Definition of a discrete distribution
The support of a discrete random variable is a countable set, i.e. one
whose elements can be listed.
Commonly, discrete random variables take integer values, e.g.
1. No. of defective components.
2. No. of heads when a coin is tossed k times.
Definition of a discrete distribution
Suppose the support of the random variable X is {x1 , x2 , . . . , xk }.
The distribution of a discrete random variable X satisfies the
following two conditions.
1. P(X = x) ≥ 0, for any x.
2. \sum_{i=1}^{k} P(X = x_i) = 1.
Note that P(X = x) > 0 if and only if x ∈ SX . Otherwise,
P(X = x) = 0.
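These two conditions can be checked numerically for any tabulated distribution. A minimal sketch in Python (the fair-die pmf below is just an illustrative example, not part of the definition):

```python
# Sketch: verify that a tabulated pmf defines a valid discrete distribution.
from fractions import Fraction

# Illustrative example: the fair-die distribution P(X = x) = 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

assert all(p >= 0 for p in pmf.values())   # condition 1: P(X = x) >= 0
assert sum(pmf.values()) == 1              # condition 2: probabilities sum to 1
```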
Example 2.2.1
Anne and Bob play the following game. Anne tosses a coin and
Bob rolls a die.
If Anne tosses heads, then she wins 3 Euro from Bob. If Anne
tosses tails, then she loses X Euro to Bob, where X is the result of
the die roll.
Let Y be the winnings of Anne (if Anne loses c Euro, then her
winnings are −c).
Define the distribution of Y .
Example 2.2.1
Note that an elementary event of such an experiment (a
simultaneous coin toss and die roll) describes the results of both
the toss and the roll.
There are 12 elementary events
(1, H), (2, H), . . . , (6, H), (1, T ), (2, T ), . . . , (6, T ).
Each of these elementary events is associated with a payoff to
Anne.
In order to calculate the distribution of Anne’s winnings, we can
tabulate the elementary events, their probabilities and the
corresponding payoff Anne gets.
The probability that Anne wins y units is simply the sum of the
probabilities of the elementary events that lead to Anne winning y .
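A minimal sketch of this tabulation in Python, following the payoff rule stated above (the variable names are illustrative):

```python
# Sketch: tabulate the distribution of Anne's winnings Y (Example 2.2.1).
from fractions import Fraction
from collections import defaultdict

dist_Y = defaultdict(Fraction)
for coin in ("H", "T"):                         # fair coin
    for die in range(1, 7):                     # fair die
        prob = Fraction(1, 2) * Fraction(1, 6)  # each elementary event has probability 1/12
        y = 3 if coin == "H" else -die          # payoff rule from the example
        dist_Y[y] += prob

print(dict(dist_Y))  # P(Y = 3) = 1/2 and P(Y = -1) = ... = P(Y = -6) = 1/12
```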
Elementary Events and Random Variables
It should be noted that each elementary event corresponds to a
well defined realization of the random variable Y .
However, this is not a one-to-one relationship, as a particular
realization of Y can correspond to more than one elementary event.
Hence, an elementary event can be interpreted as a full description
of the outcome of an experiment and the realization of a random
variable as a summary of the outcome of an experiment (i.e. the
realization of the random variable contains less information).
For example, suppose a coin is tossed 10 times. An elementary event
is given by a sequence of 10 results. The random variable X might
be the number of heads in that sequence (i.e. a summary of the
sequence).
2.2.2 The expected value and variance of a random variable
Suppose a die is thrown 600 times. We expect on average 100
observations of each possible result.
Hence, we expect the average result to be
(100 × 1 + 100 × 2 + 100 × 3 + 100 × 4 + 100 × 5 + 100 × 6) / 600.
This is
1 × (1/6) + 2 × (1/6) + 3 × (1/6) + 4 × (1/6) + 5 × (1/6) + 6 × (1/6) = \sum_{x_i ∈ S_X} x_i P(X = x_i) = 3.5.
The expected value and variance of a random variable
The expected value of a discrete random variable X , denoted E (X )
or µX , is given by
µX = E(X) = \sum_{x_i ∈ S_X} x_i P(X = x_i).
The expected value of a function f of a random variable X, E[f(X)], is given by
E[f(X)] = \sum_{x_i ∈ S_X} f(x_i) P(X = x_i).
The expected value and variance of a random variable
The k-th moment of X is given by E(X^k), where
E(X^k) = \sum_{x_i ∈ S_X} x_i^k P(X = x_i).
The variance of a random variable, Var(X) = σX², is given by
σX² = E[(X − µX)²] = \sum_{x_i ∈ S_X} (x_i − µX)² P(X = x_i) = E(X²) − [E(X)]².
The standard deviation of a random variable is given by σX = √Var(X).
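As a numerical illustration of these definitions, a short Python sketch using the fair-die distribution from earlier (an assumed example, not part of the definitions):

```python
# Sketch: expected value, variance and standard deviation of a tabulated pmf.
from math import sqrt

pmf = {x: 1 / 6 for x in range(1, 7)}                   # fair die (illustrative)

mean = sum(x * p for x, p in pmf.items())               # E(X) = 3.5
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2] = 35/12
print(mean, var, sqrt(var))                             # 3.5, about 2.92, about 1.71
```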
The expected value and variance of a random variable
µX can be thought of as the theoretical (or population) mean of a
random variable.
It is a measure of the centre of a distribution (it is the centre of
mass).
σX is a measure of the dispersion of the distribution.
If a distribution is symmetric about x = x0 , then E (X ) = x0 .
The expected value and variance of a random variable
Suppose we observe a set of realisations from a distribution, then
the sample mean can be used to estimate the expected value.
The sample standard deviation can be used to estimate the
standard deviation of the random variable.
Naturally, when we observe samples there will be random
fluctuations from the theoretical mean (µX ) and standard
deviation (σX ).
Example 2.2.2
Calculate the expected value and standard deviation of Anne’s
payoff (see Example 2.2.1).
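A minimal numerical sketch, reusing the distribution of Y tabulated in Example 2.2.1 (the exact values can, of course, also be obtained by hand):

```python
# Sketch: E(Y) and the standard deviation of Anne's winnings (Example 2.2.2).
from math import sqrt

# P(Y = 3) = 1/2 and P(Y = -d) = 1/12 for d = 1, ..., 6 (from Example 2.2.1).
pmf_Y = {3: 1 / 2, **{-d: 1 / 12 for d in range(1, 7)}}

mean_Y = sum(y * p for y, p in pmf_Y.items())                 # -0.25
var_Y = sum((y - mean_Y) ** 2 * p for y, p in pmf_Y.items())  # about 12.02
print(mean_Y, sqrt(var_Y))                                    # about -0.25 and 3.47
```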
2.2.3 Standard Discrete Random Variables
1. The 0-1 Distribution with Parameter p
X has a 0-1 distribution with parameter p [we write X ∼ 0-1(p)] if
P(X = 0) = 1 − p and P(X = 1) = p.
For example, if I toss a coin once and X is the number of heads, then X ∼ 0-1(1/2).
If I roll a die once and Y is the number of sixes, then Y ∼ 0-1(1/6).
2. The binomial probability distribution with parameters n
and p
Suppose I carry out n experiments and the probability of ”success”
in each experiment is p. Assume that the results of the
experiments are independent of each other.
Such a set of trials is called a series of independent Bernoulli trials.
Let X be the number of ”successes”.
X has a binomial distribution with parameters n (the number of
experiments) and p (the probability of success in each experiment).
The binomial distribution
We write X ∼ Bin(n, p).
If Y is the number of failures, then Y ∼ Bin(n, 1 − p).
We have
P(X = x) = \binom{n}{x} p^x (1 − p)^{n−x}, for x ∈ {0, 1, 2, . . . , n},
where
\binom{n}{x} = n! / (x!(n − x)!),
where n! = 1 × 2 × . . . × n. Note that
n!/(n − x)! = n(n − 1)(n − 2) . . . (n − x + 1).
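A minimal sketch of this formula in Python, using the standard-library function math.comb for the binomial coefficient:

```python
# Sketch: the Bin(n, p) probability mass function.
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Example: number of heads in 3 tosses of a fair coin.
print([binom_pmf(x, 3, 0.5) for x in range(4)])  # [0.125, 0.375, 0.375, 0.125]
```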
The binomial distribution
\binom{n}{x} [sometimes denoted ^nC_x] is the number of ways of choosing
x objects from n (the order in which the choices are made is not
important).
Note: Since the experiments are independent, the probability of
any given sequence of results with x successes (and thus n − x
failures) is given by the product of the individual probabilities i.e.
p^x (1 − p)^{n−x}.
The binomial distribution
In order to obtain the probability of exactly x successes, we
multiply this probability by the number of possible sequences of n
results which contain x successes.
Such a sequence can be chosen by choosing the x positions (from the n
in the sequence) which result in a success. The other positions
must then correspond to failures. Hence, there are \binom{n}{x} such
sequences.
The binomial distribution
Note that
\binom{n}{x} = \binom{n}{n − x}.
Choosing the x positions for the successes is equivalent to
choosing the n − x positions for the failures. There must be the
same number of ways of choosing in both cases.
\binom{n}{1} = \binom{n}{n − 1} = n.
By definition 0! = 1. Hence,
\binom{n}{0} = \binom{n}{n} = 1.
The relation between the binomial distribution and the 0-1
distribution
Suppose X ∼ Bin(n, p). Let Xi be the number of successes in the
i-th experiment.
It follows that
P(Xi = 0) = 1 − p and P(Xi = 1) = p,
i.e. Xi ∼ 0-1(p).
The relation between the binomial distribution and the 0-1
distribution
Since the results of the experiments are independent, these Xi are
independent. Summing these Xi , we obtain the total number of
successes i.e.
X = X1 + X2 + . . . + Xn .
This result will be later used to derive the expected value and
variance of a random variable with a binomial distribution.
Example 2.2.3
The probability that a randomly chosen debtor pays back their loan is
0.9 (independently of the other debtors).
Suppose an employee of a bank has ten loans on his books. Calculate
the probability that
1. all of these debtors pay their loans back.
2. exactly two of the debtors do not pay their loans back.
3. at least two of the debtors do not pay their loans back.
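A short sketch of the computation, letting Y ∼ Bin(10, 0.1) be the number of debtors who do not pay back (names are illustrative; the binomial pmf is as defined above):

```python
# Sketch: Example 2.2.3 with Y ~ Bin(10, 0.1), the number of debtors who default.
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p_all_pay      = binom_pmf(0, 10, 0.1)                               # about 0.349
p_exactly_two  = binom_pmf(2, 10, 0.1)                               # about 0.194
p_at_least_two = 1 - binom_pmf(0, 10, 0.1) - binom_pmf(1, 10, 0.1)   # about 0.264
print(p_all_pay, p_exactly_two, p_at_least_two)
```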
3. The Poisson distribution
Consider telephone calls arriving at a call centre. Assume that
1. Arrivals occur randomly at an average rate of λ per
unit time.
2. The probability of an arrival in an interval of length k
is constant.
3. The numbers of arrivals in two non-overlapping
intervals of time are independent.
Then the number of arrivals, X , in an interval of length t has a
Poisson distribution with parameter µ = λt.
Note: µ is the expected number of arrivals in time t.
The Poisson distribution
We write X ∼ Poisson(µ), where
P(X = x) = e^{−µ} µ^x / x!, for x = 0, 1, 2, . . .
The Poisson distribution can also be used to model the distribution
of the number of individuals in a given area when the population
does not form clusters.
e.g. suppose the density of male Siberian tigers is one per 100 km².
In a 1000 km² area we expect on average 10 male tigers. Let X be
the number of male tigers in this area; then X ∼ Poisson(10).
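A minimal sketch of the Poisson pmf in Python, using only the standard library:

```python
# Sketch: the Poisson(mu) probability mass function.
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**x / factorial(x)

# Example: P(X = 10) male tigers in the 1000 km^2 area above, where mu = 10.
print(poisson_pmf(10, 10))  # about 0.125
```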
Example 2.2.4
Calls arrive at a call centre at a rate of 3 per minute. Calculate
i) the probability that in one minute there is at least
one call.
ii) the probability that in 3 minutes there are exactly 10
calls.
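A short numerical sketch, reusing the Poisson pmf above with µ = λt and λ = 3 calls per minute:

```python
# Sketch: Example 2.2.4 with an arrival rate of 3 calls per minute.
from math import exp, factorial

def poisson_pmf(x, mu):
    return exp(-mu) * mu**x / factorial(x)

p_at_least_one  = 1 - poisson_pmf(0, 3 * 1)   # mu = 3, about 0.950
p_ten_in_3_mins = poisson_pmf(10, 3 * 3)      # mu = 9, about 0.119
print(p_at_least_one, p_ten_in_3_mins)
```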
The Poisson Approximation to the Binomial
If X ∼ Bin(n, p), where n is large and p is small, then X is
approximately Poisson distributed with parameter µ = np.
Note: µ is the expected number of ”successes”.
This approximation is reasonable when n ≥ 20 and p ≤ 0.05 (or
n ≥ 50 and p ≤ 0.1).
If p ≥ 0.9, then this approximation can be used for the distribution
of the number of ”failures”, Y , Y ∼ Bin(n, 1 − p).
Y has an approximate Poisson(n(1 − p)) distribution.
Example 2.2.5
Suppose the probability that a good from a production line is
faulty equals 0.002. 1 000 goods are inspected.
Using the appropriate approximation, estimate the probability that
i) exactly 1 good is faulty.
ii) more than 1 good is faulty.
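A minimal sketch using the Poisson approximation with µ = np = 1000 × 0.002 = 2 (the exact Bin(1000, 0.002) probabilities agree to about three decimal places):

```python
# Sketch: Example 2.2.5 via the Poisson approximation, mu = n * p = 2.
from math import exp, factorial

def poisson_pmf(x, mu):
    return exp(-mu) * mu**x / factorial(x)

mu = 1000 * 0.002
p_exactly_one   = poisson_pmf(1, mu)                            # about 0.271
p_more_than_one = 1 - poisson_pmf(0, mu) - poisson_pmf(1, mu)   # about 0.594
print(p_exactly_one, p_more_than_one)
```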
4. The Geometric Distribution
Suppose that a set of independent Bernoulli trials is carried out
and the probability of a success is p.
Let X be the number of trials until the first success occurs
(including the success). X has a geometric distribution with
parameter p; we write X ∼ Geom(p).
P(X = x) = (1 − p)^{x−1} p, for x = 1, 2, 3, . . .
Note: if the first success occurs at the x-th trial, then the sequence
of results must be FF . . . FS, where F occurs x − 1 times.
Since the trials are independent, the probability of such a sequence
is simply the product of the probabilities of the individual results, i.e.
(1 − p)^{x−1} p.
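A minimal sketch of the geometric pmf (counting the trials up to and including the first success):

```python
# Sketch: the Geom(p) probability mass function.
def geom_pmf(x: int, p: float) -> float:
    """P(X = x) for X ~ Geom(p): first success on trial x."""
    return (1 - p) ** (x - 1) * p

# Example: the first head appears on the 4th toss of a fair coin.
print(geom_pmf(4, 0.5))  # 0.0625
```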
Example 2.2.6
A coin is tossed until the first heads appears. Calculate the
probability that
i) the coin is tossed exactly 4 times.
ii) the coin is tossed at least 5 times.
iii) the number of tosses is odd.
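A short numerical sketch with X ∼ Geom(1/2); part iii) is approximated by summing the series over odd x (the exact value is 2/3):

```python
# Sketch: Example 2.2.6, X = number of tosses until the first head.
def geom_pmf(x, p):
    return (1 - p) ** (x - 1) * p

p = 0.5
p_exactly_4  = geom_pmf(4, p)                                 # 0.0625
p_at_least_5 = (1 - p) ** 4                                   # no head in the first 4 tosses
p_odd        = sum(geom_pmf(x, p) for x in range(1, 200, 2))  # approximately 2/3
print(p_exactly_4, p_at_least_5, p_odd)
```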
Elementary events that are not equally likely
Note: the elementary events in this case are possible sequences, i.e.
w1 = H, w2 = TH, w3 = TTH . . . .
These elementary events are not equally likely. It can be seen that
P(w_i) = P(X = i) = 0.5^i.
5. The Hypergeometric Distribution
Suppose we choose n objects from N objects without replacement
(e.g. choosing a hand of cards, inspecting a batch of goods).
Suppose there are N1 objects of type 1 and the remaining
N2 = N − N1 objects are of type 2.
Let X be the number of objects of type 1 chosen. What is
P(X = x)?
P(X = x) is given by the number of ways of choosing x objects of
type 1 divided by the total number of ways of choosing the n
objects (the possible choices are all equally likely).
The hypergeometric distribution
In total, there are \binom{N}{n} ways of choosing the n objects.
In order to choose x objects of type 1:
a. We choose x objects from the N1 of type 1: \binom{N1}{x} ways.
b. Since in total we choose n objects, the remaining n − x objects
are chosen from the N2 objects of type 2: \binom{N2}{n−x} ways.
The hypergeometric distribution
Since for each way of choosing the objects of type 1 there is the
same number of ways of choosing the objects of type 2, we have
P(X = x) = \binom{N1}{x} \binom{N2}{n−x} / \binom{N}{n},
where
N - total number of objects.
n - number of objects chosen.
Ni - total number of objects of type i.
x - number of objects of type 1 chosen.
n − x - number of objects of type 2 chosen.
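A minimal sketch of this formula in Python, again using math.comb (the card-hand example is illustrative):

```python
# Sketch: the hypergeometric probability mass function.
from math import comb

def hypergeom_pmf(x: int, N1: int, N2: int, n: int) -> float:
    """P(X = x): x objects of type 1 when n objects are drawn without replacement."""
    return comb(N1, x) * comb(N2, n - x) / comb(N1 + N2, n)

# Example: exactly 2 aces in a 5-card hand dealt from a 52-card deck.
print(hypergeom_pmf(2, 4, 48, 5))  # about 0.040
```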
The hypergeometric distribution
Note that when
1. the total number of objects, N, is very large in
comparison to the number of objects chosen, n,
2. the numbers of objects of both types are reasonably
large,
then the hypergeometric distribution (here, the number of type 1
objects chosen) can be approximated using the binomial
distribution with parameters n and p = N1/N.
Example 2.2.7
Suppose in a batch of 100 components, 5 are faulty. An inspector
checks 10 components. Calculate the probability that
i) none of the components chosen are faulty.
ii) at least 2 of the components chosen are faulty.
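A short sketch reusing the hypergeometric pmf above, with N1 = 5 faulty, N2 = 95 good and n = 10 inspected:

```python
# Sketch: Example 2.2.7, X = number of faulty components among the 10 inspected.
from math import comb

def hypergeom_pmf(x, N1, N2, n):
    return comb(N1, x) * comb(N2, n - x) / comb(N1 + N2, n)

p_none       = hypergeom_pmf(0, 5, 95, 10)                                      # about 0.584
p_at_least_2 = 1 - hypergeom_pmf(0, 5, 95, 10) - hypergeom_pmf(1, 5, 95, 10)    # about 0.077
print(p_none, p_at_least_2)
```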
2.2.4 The expected value and variance of a sum of
independent random variables
In order to calculate the expected value and variance for the
binomial distribution, we use the following theorem regarding the
expected value and variance of a sum of random variables.
THEOREM: Suppose X = X1 + X2 + . . . + Xn .
i) E (X ) = E (X1 ) + E (X2 ) + . . . + E (Xn )
ii) If the Xi are independent, then
Var (X ) = Var (X1 ) + Var (X2 ) + . . . + Var (Xn )
2.2.4 The expected value and variance of a sum of
independent random variables
Suppose Y = aX + b, where a, b are constants. It follows that
i) E(Y) = aE(X) + b
ii) Var(Y) = a² Var(X)
Example 2.2.8
Using the theorem on the expected value and variance of a sum of
independent random variables, calculate E (X ) and Var (X ) when
X ∼ Bin(n, p).
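Writing X = X1 + X2 + . . . + Xn with independent Xi ∼ 0-1(p) gives E(X) = np and Var(X) = np(1 − p), since E(Xi) = p and Var(Xi) = p(1 − p). A minimal numerical check against the pmf directly (the parameter values n = 10, p = 0.3 are illustrative):

```python
# Sketch: check E(X) = n*p and Var(X) = n*p*(1 - p) for X ~ Bin(n, p) from the pmf.
from math import comb

n, p = 10, 0.3                                                  # illustrative values
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

mean = sum(x * q for x, q in enumerate(pmf))                    # n * p = 3.0
var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))       # n * p * (1 - p) = 2.1
print(mean, var)
```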
Important Note
This expected value and variance, together with the interpretation
of a binomial random variable as the sum of independent random
variables, will be important when we consider the Central Limit
Theorem in Section 2.4.
The expected value and variance of standard discrete
random variables
The following table gives the expected value and variance for the
standard discrete distributions. In the case of the hypergeometric
distribution, p = N1/N denotes the proportion of type 1 objects.
Distribution       E(X)    Var(X)
0-1(p)             p       p(1 − p)
Bin(n, p)          np      np(1 − p)
Poisson(λ)         λ       λ
Geometric(p)       1/p     (1 − p)/p²
Hypergeometric     np      np(1 − p)(N − n)/(N − 1)