06-21254 Mathematical Techniques for Computer Science
Autumn Semester 2016
© Achim Jung
The University of Birmingham
School of Computer Science
Dec 7/8, 2016
Handout 16
Continuous random variables
126. Random real numbers. How can we generate a random real number? Since real numbers have infinitely many digits, this
seems difficult. However, if we accept that in finite time we can anyway only read finitely many digits, then we can satisfy
the specification in a lazy way, in the sense of functional programming: We use a source of random bits and call upon it
repeatedly for the binary digits of the random real r = 0.b1 b2 b3 . . .. We do this in a lazy or demand-driven fashion: If the
program that is asking for a random real is requesting n bits then we generate n random bits and return those; if the program
is asking for more precision, then we generate some more.
In this fashion we generate random reals from the interval [0, 1] (where we think of 1 as the binary fraction 0.1111 . . .).
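To make the lazy scheme concrete, here is a small Python sketch (the names random_real and to_n_bits are ours, chosen only for illustration): a random real is represented by a function that, when asked for n bits of precision, generates just enough fair bits and returns the finite binary fraction 0.b1 . . . bn.

import random

def random_real():
    # A demand-driven random real r = 0.b1 b2 b3 ... in [0, 1].
    # The returned function reveals r to n binary digits, generating
    # new fair bits only when more precision is requested.
    bits = []                                   # bits produced so far
    def to_n_bits(n):
        while len(bits) < n:                    # lazily extend the expansion
            bits.append(random.getrandbits(1))
        return sum(b / 2 ** (k + 1) for k, b in enumerate(bits[:n]))
    return to_n_bits

r = random_real()
print(r(10))   # r to 10 binary digits
print(r(30))   # the same r, refined to 30 binary digits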
What is the probability of obtaining any one particular real in this fashion, for example, what is the probability of obtaining
1/5 = 0.001100110011 . . .? If the random bit generator we are using is fair, then the probability of obtaining this stream
of binary digits is 1/2 × 1/2 × 1/2 × 1/2 × · · ·, which equals zero. We find that our process clearly works in generating random reals,
but each real has only a zero probability of occurring!
On the other hand, if we ask what is the probability that the random real belongs to [0, 1/2], then the answer is clearly 1/2,
because it depends only on the first bit b1 whether the result belongs to [0, 1/2] or to [1/2, 1]: If b1 = 0 then we end up in the
lower half, if b1 = 1 then we end up in the upper half. (Note that the two intervals have 1/2 in common but, just like all real
numbers, this value is obtained only with probability zero.) With just a little more effort we can convince ourselves that the
probability of our random real ending up in the interval [a, b] (where 0 ≤ a ≤ b ≤ 1) is precisely b − a.
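We can also check the claim p(a ≤ U ≤ b) = b − a empirically. The following Python snippet is a rough Monte Carlo check, with 50 bits of precision standing in for "infinitely many":

import random

# Estimate p(a <= U <= b) by sampling many finite-precision random reals.
a, b, trials = 0.2, 0.7, 100_000
hits = sum(a <= random.getrandbits(50) / 2**50 <= b for _ in range(trials))
print(hits / trials)   # close to b - a = 0.5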
So we have described a random variable U where each individual outcome has probability zero, yet there are events which
have non-zero probability. You may think that this violates Kolmogorov’s third axiom about the additivity of probabilities
(Handout 14, item 111), since we can write every interval [a, b] as the disjoint union of all the singleton sets {r} where
a ≤ r ≤ b. The solution to this “puzzle” is that Kolmogorov’s axiom for disjoint sets only applies when there are no more
than countably many subsets involved, and we already know that an interval on the real line contains uncountably many
numbers.
127. Continuous random variables. The probabilities associated with the random variable U introduced above are very easy
to describe. As we have argued, it holds that p(a ≤ U ≤ b) = b − a. By generalising this formula a little bit, we can capture
many more useful random variables.
We start with a function f : R → R whose range contains no negative numbers and for which the integral

∫_{−∞}^{+∞} f dx

equals 1. Given a random variable X with values in R, we say that f is a density function for X if for every interval
[a, b] ⊆ R, it is the case that

p(a ≤ X ≤ b) = ∫_a^b f dx
We say that X is a continuous random variable if it has a density function (more precisely, if it has a continuous density
function).
We note straightaway that all continuous random variables have the same curious behaviour as the one we described at the
beginning of this handout, namely, each real number has zero probability of occurring (because ∫_r^r f dx always equals zero),
yet the probability of the outcome belonging to some interval [a, b] may be non-zero.
In contrast, the random variables we introduced in the last handout were all discrete. There, certain outcomes could occur
with positive probability, but only countably many outcomes could occur at all. The behaviour of a discrete random variable
can never be described by a density function.
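To see the definition in action, here is a small Python sketch with a made-up density f(x) = 2x on [0, 1] (and 0 elsewhere); the helper integrate is a simple midpoint rule, which is accurate enough to illustrate the two defining properties:

def integrate(f, a, b, steps=100_000):
    # Midpoint rule over [a, b]; good enough for illustration.
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

f = lambda x: 2 * x if 0 <= x <= 1 else 0.0   # a hypothetical density

print(integrate(f, 0, 1))        # total probability mass: 1.0
print(integrate(f, 0.25, 0.5))   # p(0.25 <= X <= 0.5) = 0.5^2 - 0.25^2 = 0.1875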
128. Two examples. Well, the first example we have already described above. We could call U the [0, 1]-uniformly distributed
random variable. Its density function u is defined as
u(x) = 1   if 0 ≤ x ≤ 1;
u(x) = 0   otherwise,

and indeed we have

p(a ≤ U ≤ b) = ∫_a^b u dx = x|_a^b = b − a
whenever 0 ≤ a ≤ b ≤ 1.
A more aesthetically pleasing example is the normally distributed random variable N. Its density function has the
famous "bell shape":

[Plot of the bell-shaped density function of N.]
In fact, there is a whole family of such random variables, parameterised by their expected value µ and their standard
deviation σ.^26 The formula for the density function of N(µ, σ) is

(1/√(2π)) × (1/σ) × e^(−(x−µ)² / (2σ²))

which for the standard normal distribution N(0, 1) simplifies to

(1/√(2π)) × e^(−x²/2)
This is also known as the Gaussian distribution or indeed the bell curve.
^26 Note that some authors prefer to use the variance σ² instead of the standard deviation as the second parameter.
Evaluating the integral under the bell curve is surprisingly difficult; in computational statistics packages (such as "R") it
is done by approximating the curve with a polynomial. Be that as it may, the following probabilities come up often and it
is useful to remember them:
p(−1 ≤ N ≤ 1) ≈ 2/3 (68%)
p(−2 ≤ N ≤ 2) ≈ 0.95 (95%)
p(−2.6 ≤ N ≤ 2.6) ≈ 0.99 (99%)
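In Python, one way (among several) to evaluate these probabilities is via the error function erf from the standard math module, which gives the cumulative distribution function Φ of N(0, 1); the snippet below reproduces the three rules of thumb:

from math import erf, sqrt

def Phi(x):
    # Cumulative distribution function of N(0, 1), written using erf.
    return 0.5 * (1 + erf(x / sqrt(2)))

for c in (1, 2, 2.6):
    print(c, Phi(c) - Phi(-c))   # approximately 0.683, 0.954, 0.991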
129. Expected value and standard deviation for continuous random variables. Remember that we used the formula

∑_{r ∈ range(X)} r × p(X = r)

for the expected value of a discrete random variable. This will not work for continuous random variables because p(X = r)
is always zero. We can get an approximate value for E[X] by dividing the real line into small intervals [a_n, b_n] and calculating

∑_{n=0}^{∞} (a_n + b_n)/2 × p(a_n ≤ X ≤ b_n)

As we make the intervals finer and finer, this expression converges to an integral:

E[X] = ∫_{−∞}^{+∞} x f(x) dx
For the variance we can do the same, and we get the formula

Var[X] = ∫_{−∞}^{+∞} (x − E[X])² f(x) dx

For the standard deviation we have (as before) SD[X] = √(Var[X]).
Unlike the bell curve itself, these integrals are easy to evaluate for the standard normal distribution and we get that the
expected value for a N(0, 1)-distributed random variable is indeed 0 and its standard deviation is indeed 1.
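As a plausibility check (not a proof), we can evaluate these integrals numerically for the N(0, 1) density; truncating the range to [−10, 10] loses only a negligible amount of probability mass. The helper integrate below is again a simple midpoint rule:

from math import exp, pi, sqrt

def integrate(f, a, b, steps=200_000):
    # Midpoint rule over [a, b].
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

phi = lambda x: exp(-x * x / 2) / sqrt(2 * pi)   # density of N(0, 1)

E = integrate(lambda x: x * phi(x), -10, 10)
Var = integrate(lambda x: (x - E) ** 2 * phi(x), -10, 10)
print(E, Var, sqrt(Var))   # approximately 0, 1 and 1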
130. The Central Limit Theorem. At the beginning of the handout we described the random variable U which is uniformly
distributed in the interval [0, 1]. How can we realise concretely a random variable that is normally distributed? The answer
is surprisingly simple and general:
Let X be any random variable with expected value E[X] = µ and standard deviation SD[X] = σ . We create a new random
variable Y by calling N times on X and returning the average of the random X values. As a formula:
Y = (1/N) (X1 + X2 + · · · + XN)
Then the following is true for Y :
133: Central Limit Theorem – version 1
• If N is sufficiently large then Y is approximately normally distributed.
• The expected value of Y is µ, or in other words, E[Y ] = E[X].
• The standard deviation of Y is σ/√N, or in other words, SD[Y] = SD[X]/√N.
When would we say that N is “sufficiently large”? It turns out that if X is a continuous random variable, then N = 5 is
already big enough for practical purposes, as the graphs in Figure 1 show. On the other hand, from a mathematical point of
view, we should remember that the normal distribution is an idealisation, just like the Poisson random variable.
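A quick simulation in Python makes this plausible: we average N = 5 independent draws from the uniform random variable U of this handout and look at the mean and standard deviation of the resulting averages (a histogram of ys would already show the bell shape of Figure 1). Recall that the standard deviation of a U[0, 1] variable is 1/√12 ≈ 0.289.

import random, statistics

# Average N = 5 independent U[0, 1] draws, many times over.
N, trials = 5, 100_000
ys = [sum(random.random() for _ in range(N)) / N for _ in range(trials)]

print(statistics.mean(ys))    # ~ 0.5, which is E[X]
print(statistics.stdev(ys))   # ~ 0.289 / sqrt(5) ~ 0.129, which is SD[X]/sqrt(N)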
We saw in the last handout, items 121 and 122, how expected value and standard deviation change when we transform a
random variable. We can apply this to the variable Y to get the following version of the Central Limit Theorem:
134: Central Limit Theorem – version 2
Let X be a random variable with expected value µ and standard deviation σ, and let Y be the variable (1/N)(X1 + X2 + · · · + XN).
Then the variable

Z = (Y − µ) / (σ/√N) = (Y − µ) × √N / σ

is approximately N(0, 1) distributed.
Figure 1: Illustration of the Central Limit Theorem for N = 1, 2, 3, 4 (taken from the Wikipedia page on the Central Limit
Theorem).
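The standardisation in version 2 is easy to test in the same kind of simulation: with µ = 0.5 and σ = 1/√12 for the uniform variable, the values of Z should have mean roughly 0, standard deviation roughly 1, and should land in [−2, 2] about 95% of the time, in line with the rules of thumb above. A rough Python check:

import random, statistics
from math import sqrt

N, trials = 5, 100_000
mu, sigma = 0.5, 1 / sqrt(12)          # E[X] and SD[X] for U[0, 1]

zs = [(sum(random.random() for _ in range(N)) / N - mu) * sqrt(N) / sigma
      for _ in range(trials)]

print(statistics.mean(zs), statistics.stdev(zs))   # roughly 0 and 1
print(sum(-2 <= z <= 2 for z in zs) / trials)      # roughly 0.95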
131. An application. The Central Limit Theorem is the starting point for statistics where we try to make statements about
“populations” by sampling. As an example, consider the following situation: We are in the run-up to a referendum and we
would like to make a prediction on its outcome. We already suspect that it is a tight race and that the outcome will be very
close to a 50–50 split.
We have the random variable X which is realised by selecting a voter at random and asking for his or her voting intention.
If the answer is “yes”, we return 1, otherwise we return zero. So X is a Bernoulli random variable with the chance p ≈ 0.5
of returning 1. From the last handout we know that the expected value of this variable is µ = p and its standard deviation
is σ = √(pq) ≈ 0.5.
Polling N voters and computing the proportion of “yes” voters is the same as considering X for each voter and computing the
average Y of the results X1, X2, . . ., XN. The Central Limit Theorem tells us that the expression Z = (Y − µ) / (σ/√N) is approximately
N(0, 1) distributed. This means that if we select the N voters independently and randomly, then we can expect Z to be
somewhere around zero. From the rules of thumb mentioned above we know that in 95% of all such polls, we will obtain
a value for Z that is between −2 and 2. We use this fact to get an estimate for µ from the polling average Y : We are 95%
certain that each of the following inequalities holds:
−2 ≤ Z ≤ 2
−2 ≤ (Y − µ) / (σ/√N) ≤ 2
−2 σ/√N ≤ Y − µ ≤ 2 σ/√N
−Y − 2 σ/√N ≤ −µ ≤ −Y + 2 σ/√N
Y − 2 σ/√N ≤ µ ≤ Y + 2 σ/√N
Since the standard deviation σ of the variable X cannot be more than 1/2, we can conservatively replace σ with this concrete
value and obtain that µ lies between Y − 1/√N and Y + 1/√N (to repeat: with 95% certainty). This allows us to report our
polling result with an appropriate margin of error of ±1/√N.
Assuming that the race will be tight, we can now choose N in such a way that the margin of error is just ±1%: From
1/√N = 0.01 we get √N = 100 and from this N = 10,000. In other words, we need to poll 10,000 (randomly selected) voters
to make a prediction that is likely to be accurate to within one percent.
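The bookkeeping of this example is easy to package up; the following Python sketch (with the hypothetical helpers margin_of_error and sample_size) uses the conservative bound σ ≤ 1/2 from above:

from math import sqrt, ceil

def margin_of_error(n):
    # 95% margin of error for a poll of n voters, using sigma <= 1/2.
    return 1 / sqrt(n)

def sample_size(margin):
    # Smallest n whose 95% margin of error is at most the given margin.
    return ceil(1 / margin ** 2)

print(margin_of_error(10_000))   # 0.01, i.e. +-1%
print(sample_size(0.01))         # 10000 voters for a +-1% margin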
132. Practical advice. In the exam I expect you to
• know that the probabilities associated with continuous random variables are computed as integrals (but I will not ask
you to evaluate an integral);
• be able to apply the Central Limit Theorem in order to estimate a margin of error on a sample average.