13 Continuous random variables

13.1 Introduction
We continue with the study of random variables, but now consider the case
where the range RX of a random variable X is either the real line R, or an
interval (or collection of intervals) contained in R. Examples of continuous
random variables are
• the total rainfall (in cm) in Sheffield tomorrow;
• your current blood pressure level (in millimetres of mercury);
• the time in seconds for a runner to complete the London Marathon.
(In practice, measurements of these quantities would actually be discrete, but
it’s usually more convenient (and harmless) to treat them as continuous.)
13.2 Why make the distinction between continuous and discrete random variables?
Let’s try to write down a probability mass function for a continuous random
variable. Consider again the spinning roulette wheel example from Section 1:

Example 47. A (European) roulette wheel is spun. Consider the angle between the horizontal axis (viewing the wheel from above) and a line from the centre of the wheel that bisects the zero on the wheel. Assuming that any angle is equally likely, (i) what is the probability that this angle is π/2? (ii) What is the probability that this angle is between π/2 and π?
Denote the random angle by X. This is a continuous random variable with
range RX = [0, 2π]. We assume all possible values of X to be ‘equally likely’,
so that P(X = a) = P(X = b) for any a, b ∈ [0, 2π]. If we write P(X = x) = k for some constant value k, what value would k be? Now, P{X ∈ [0, 2π]} = 1, so

1 = P{X ∈ [0, 2π]} = ∑_{x∈R_X} P(X = x) = ∑_{x∈R_X} k.
But there are (uncountably) infinitely many different x in the set RX , so how
can we sum k an infinite number of times and get 1? (And how would we
even write down a sum of uncountably many values?) Clearly, this isn’t going
to work. In fact, we’ve already dealt with this problem earlier in the course,
when we considered the problem of a randomly generated angle, and defined
probability as a measure. We will repeat the discussion here and extend it
to consider continuous random variables in general.
13.3 Probability measures for continuous random variables
Recall that in Section 7.2 we said that a probability mass function for a
discrete random variable X could be thought of as defining a probability
measure mX on RX , such that for a set of interest A we have P (X ∈ A) =
mX (A). We consider how to do something similar in a continuous setting.
For the example above, we want our range to be RX = [0, 2π]. We don’t
want one part of the circle to be favoured over any other, so for two intervals
of the same length [a, a + w] and [b, b + w] (with a, a + w, b, b + w ∈ RX ) we
would like the probabilities that X is in each of them to be the same.
We can achieve this by considering the measure m_X defined by

m_X([a, b]) = (b − a)/(2π).    (6)
(This is the Lebesgue measure, divided by 2π).
Assume that this measure defines the distribution of X, so that P (X ∈ A) =
mX (A). Then, as we wanted, any two intervals of the same width as above
will be equally likely:
P(X ∈ [a, a + w]) = P(X ∈ [b, b + w]) = w/(2π).
However, any single value has zero probability:
P(X = a) = m_X([a, a]) = (a − a)/(2π) = 0.
We can now answer the questions in Example 47 without difficulty. We have
P(X = π/2) = 0 and

P(π/2 ≤ X ≤ π) = m_X([π/2, π]) = (π − π/2)/(2π) = 0.25.
One way to think of the measure defined in (6) is as an area under a curve.
If we draw the ‘curve’ y = 1/(2π) for x ∈ [0, 2π], and y = 0 otherwise,
then mX ([a, b]) is the area under the curve between x = a and x = b. An
illustration is given in Figure 10. Again, as the area between x = a and x = a
is zero, this reinforces the point that P (X = a) = 0.
The ‘area under the curve’ interpretation suggests that we can construct other
valid probability measures, by drawing different curves. We can consider any
function f (x) with f (x) ≥ 0 for all x, such that the total area under the
curve is 1. Another example is given in Figure 11, for a random variable X
with range RX = [0, 10], using the curve
f(x) = 3x(10 − x)/500,

for x ∈ [0, 10], and f(x) = 0 otherwise. This measure will give more probability to X lying in the interval [4, 6] than to the interval [0, 2], say, even though the two intervals have the same width.

Figure 10: P(X ∈ [π/2, π]) is given by the area under the curve y = 1/(2π) between x = π/2 and x = π.
Based on this, we make the following definition.
Definition 27. A probability density function (p.d.f. for short) f_X is a function such that both f_X(x) ≥ 0 for all x, and

∫_{−∞}^{∞} f_X(t) dt = 1.

A random variable X with p.d.f. f_X has the property that

P(a ≤ X ≤ b) = ∫_a^b f_X(t) dt.
Figure 11: P(X ∈ [2, 4]) is given by the area under the curve y = 3x(10 − x)/500 between x = 2 and x = 4.

We have exactly the same definition as in Section 7.3 for the cumulative distribution function:

F_X(x) := P(X ≤ x),

but for a continuous random variable, this is calculated as an area under a curve instead of a summation (see (5) on page 36 in Section 7.3):

F_X(x) = ∫_{−∞}^{x} f_X(t) dt,
where fX (x) is the p.d.f. of X. So, whereas a discrete random variable has
a probability mass function, a continuous random variable has a probability
density function.
From the definitions, it follows that
(d/dx) F_X(x) = f_X(x).
In summary, for the distribution of a continuous random variable, we use
‘area under a curve’ as a probability measure, and the curve that we choose
for a particular random variable X is called the probability density function
of X. Note that the condition
∫_{−∞}^{∞} f_X(t) dt = 1
applies because this integral represents P (−∞ < X < ∞). Note also that
P(X = a) = P(a ≤ X ≤ a) = ∫_a^a f_X(x) dx = 0,
and that
P (X < a) = P (X ≤ a) − P (X = a) = P (X ≤ a).
Example 48. Let X be a random variable with p.d.f. given by f_X(x) = ke^{−x}
for x ∈ [0, 2], and 0 otherwise.
1. Find the value of k.
2. Derive the cumulative distribution function.
3. Calculate the probability that X ≤ 0.5.
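One way to check the answers numerically is with R (the value k = 1/(1 − e^{−2}) follows from requiring ∫_0^2 ke^{−x} dx = 1):

> k <- 1/(1 - exp(-2))
> k
[1] 1.156518
> integrate(function(x) k * exp(-x), 0, 2)$value   # total probability; should be 1
[1] 1
> k * (1 - exp(-0.5))                              # P(X <= 0.5) = F_X(0.5)
[1] 0.4550542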
14 Expectation and variance of continuous random variables
Definition 28. For a continuous random variable X, the expectation of g(X),
for some function g defined on RX , is defined as
E{g(X)} := ∫_{−∞}^{∞} g(x) f_X(x) dx.
Setting g(X) = X gives
E{X} = ∫_{−∞}^{∞} x f_X(x) dx,
and we again use the notation
µX := E(X),
with µX referred to as the mean of X.
Variance has the same definition as before, but it is now calculated using integration:

Var(X) := E{(X − µ_X)²} = ∫_{−∞}^{∞} (x − µ_X)² f_X(x) dx.
Note: for a continuous random variable X, the identity

Var(X) = E(X²) − {E(X)}²

still holds, so, as with discrete random variables, we can calculate Var(X) by calculating E(X) and E(X²).
All the properties of expectation and variance that we met in Section 8 still
hold:
E(aX + b) = aE(X) + b,
E(X + Y ) = E(X) + E(Y ),
Var(aX + b) = a² Var(X),
and, if X and Y are independent,
E(XY ) = E(X)E(Y ),
Var(X + Y ) = Var(X) + Var(Y ).
Example 49. Calculate the expectation and variance of the random variable
defined in Example 48.
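The same integrals can be checked numerically in R, again with k = 1/(1 − e^{−2}) from Example 48:

> k <- 1/(1 - exp(-2))
> EX <- integrate(function(x) x * k * exp(-x), 0, 2)$value     # E(X), approximately 0.687
> EX2 <- integrate(function(x) x^2 * k * exp(-x), 0, 2)$value  # E(X^2), approximately 0.748
> EX2 - EX^2                                                   # Var(X), approximately 0.276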
15 Standard continuous probability distributions

15.1 The exponential distribution

We consider three standard probability distributions for continuous random variables: the exponential distribution, the uniform distribution, and the normal distribution.

The exponential distribution is used to represent a ‘time to an event’. Examples of ‘experiments’ that we might describe using an exponential random variable are
• a patient with heart disease is given a drug, and we observe the time
until the patient’s next heart attack;
• a new car is bought and we observe how many miles the car is driven
before it has its first breakdown.
Definition 29. If a random variable X has an exponential distribution,
with rate parameter λ, then its probability density function is given by
f_X(x) = λe^{−λx},
for x ≥ 0, and 0 otherwise. We write
X ∼ Exp(rate = λ), or just X ∼ Exp(λ),
to mean “X has an exponential distribution with rate parameter λ”.
Theorem 21. (Cumulative distribution function of an exponential random variable)

If X ∼ Exp(λ), then

F_X(x) = 1 − e^{−λx} for x ≥ 0, and F_X(x) = 0 for x < 0.
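This follows by integrating the p.d.f.: for x ≥ 0,

F_X(x) = ∫_0^x λe^{−λt} dt = [−e^{−λt}]_0^x = 1 − e^{−λx}.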
We can see that lim_{x→∞} F_X(x) = 1 (so that “F_X(∞) = 1”), so that, as required of a p.d.f.,

∫_{−∞}^{∞} f_X(x) dx = ∫_0^∞ λe^{−λx} dx = 1.
We plot both the p.d.f. and c.d.f. of an Exp(2) random variable in Figure 12.

Figure 12: The p.d.f. (left plot) and c.d.f. (right plot) of an exponential random variable X with rate parameter λ = 2.
Theorem 22. (Expectation and variance of an exponential random variable)
If X ∼ Exp(λ), then

E(X) = 1/λ,
Var(X) = 1/λ².

Theorem 23. (The ‘lack of memory’ property of an exponential random variable)

If X ∼ Exp(λ), then

P(X > x + a | X > a) = P(X > x).
In other words, exponential random variables have the interesting property
that they ‘forget’ how ‘old’ they are. If the lifetime of some object has an
exponential distribution, and the object survives from time 0 to time a, it
will ‘carry on’ as if it was starting at time 0.
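This is quick to verify, using P(X > x) = 1 − F_X(x) = e^{−λx}: for x, a ≥ 0,

P(X > x + a | X > a) = P(X > x + a)/P(X > a) = e^{−λ(x+a)}/e^{−λa} = e^{−λx} = P(X > x).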
Example 50. A computer is left running continuously until it first develops
a fault. The time until the fault, X, is to be modelled with an exponential
distribution. The expected time until the first fault is 100 days.
1. If X ∼ Exp(λ), determine the value of λ. What is the standard deviation
of X?
2. What is the probability that the computer develops a fault within the first 100 days?
3. If the computer is still working after 100 days, what is the probability
that it will still be working after 150 days?
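For those with R available, the answers can be checked numerically using pexp, the exponential c.d.f. in R (here λ = 1/100, since E(X) = 1/λ = 100 days):

> pexp(100, rate = 0.01)      # P(X <= 100)
[1] 0.6321206
> 1 - pexp(50, rate = 0.01)   # P(X > 50) = P(X > 150 | X > 100), by lack of memory
[1] 0.6065307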
Example 51. Suppose the number of earthquakes N_t in an interval [0, t] has a Poisson(φt) distribution, for any value of t. Recall that if X ∼ Poisson(λ),

p_X(x) = P(X = x) = e^{−λ} λ^x / x!.

Let T be the time until the first earthquake. What is the cumulative distribution function of T? What is the distribution of T?
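A hint: for any t ≥ 0, the events {T > t} and {N_t = 0} are the same event, so F_T(t) = P(T ≤ t) = 1 − P(N_t = 0).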
15.2 The uniform distribution
The uniform distribution is used to describe a random variable that is constrained to lie in some interval [a, b], but has the same probability of lying in
any interval contained within [a, b] of a fixed width. The uniform distribution
is an important concept in probability theory, but it is less useful for modelling uncertainty in the real world; it is not often plausible in real situations
that all intervals of the same width are equally likely.
Definition 30. If a random variable X has a uniform distribution over
the interval [a, b], then its probability density function is given by
f_X(x) = 1/(b − a),
for x ∈ [a, b], and 0 otherwise. We write
X ∼ U [a, b],
to mean “X has a uniform distribution over the interval [a, b].”
Theorem 24. (Cumulative distribution function of a uniform random variable)
If X ∼ U [a, b], then for x ∈ [a, b]
F_X(x) = (x − a)/(b − a).
Plotting the c.d.f. between x = a and x = b will give a straight line, joining
the points (a, 0) and (b, 1). We plot the p.d.f. and c.d.f. in Figure 13.
Figure 13: The p.d.f. (left plot) and c.d.f. (right plot) of a uniform random variable X over the interval [10, 20].
Theorem 25. (Expectation and variance of a uniform random variable)
If X ∼ U[a, b], then

E(X) = (a + b)/2,
Var(X) = (b − a)²/12.
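The expectation is a short integral (the variance follows similarly, via E(X²) = (a² + ab + b²)/3):

E(X) = ∫_a^b x/(b − a) dx = (b² − a²)/(2(b − a)) = (a + b)/2.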
Example 52. Let X ∼ U [−1, 1]. Calculate E(X), Var(X) and P (X ≤
−0.5|X ≤ 0).
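The conditional probability can be checked numerically in R using punif, the uniform c.d.f.:

> punif(-0.5, -1, 1) / punif(0, -1, 1)   # P(X <= -0.5)/P(X <= 0)
[1] 0.5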
15.3 The standard normal distribution
The normal distribution is a very important distribution in both probability and statistics. Before studying it, we first introduce the Gaussian integral:
∫_{−∞}^{∞} e^{−x²} dx = √π.    (7)
(A proof is given in Applebaum (2008), though you will need to understand
changes of variables within double integration).
We first define the standard normal distribution, before considering the more
general case.
Definition 31. If a random variable Z has a standard normal distribution, then its probability density function is given by

f_Z(z) = (1/√(2π)) exp(−z²/2).
We write
Z ∼ N (0, 1),
to mean “Z has the standard normal distribution.”
We can use (7) to confirm that this is a valid p.d.f. Starting with the Gaussian integral, we make the substitution x = z/√2, so that dx/dz = 1/√2. (7) immediately gives

∫_{−∞}^{∞} (1/√π) e^{−x²} dx = 1,

and the substitution then gives

∫_{−∞}^{∞} (1/√(2π)) e^{−z²/2} dz = 1.
The p.d.f. is plotted in Figure 14 (left plot). Note the distinctive ‘bell-shaped’
curve and the symmetry about z = 0.
15.3.1 The cumulative distribution function of a standard normal random variable
The c.d.f. is

F_Z(z) = P(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) exp(−t²/2) dt.
However, we can’t evaluate this integral analytically, and have to use numerical methods. There are various statistical tables available that give the
value of FZ (z) for different z, but these have now largely been superseded by
modern computing packages, and we will see how to calculate FZ (z) using R
in Section 15.4.3. The c.d.f. is plotted below (right plot).
The notation Φ is commonly used to represent the c.d.f., and φ to represent the p.d.f.:

Φ(z) := P(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) exp(−t²/2) dt,

φ(z) := (1/√(2π)) exp(−z²/2).

Then

(d/dz) Φ(z) = φ(z).
Theorem 26. (Relationship between Φ(z) and Φ(−z).)
Φ(−z) = 1 − Φ(z).
Figure 14: The p.d.f. (left plot) and c.d.f. (right plot) of a standard normal random variable. Note the ‘bell shape’ of the p.d.f.; the p.d.f. is sometimes referred to as the ‘bell-shaped curve’.
This can be seen in Figure 15.
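The relationship in Theorem 26 is also easy to check numerically in R, where pnorm computes Φ (see Section 15.4.3):

> pnorm(-1.5)
[1] 0.0668072
> 1 - pnorm(1.5)
[1] 0.0668072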
We denote the quantile function by Φ^{−1}. If we want z such that P(Z ≤ z) = α, then we write

Φ(z) = α,  i.e.  z = Φ^{−1}(α).
Theorem 27. (Expectation and variance of a standard normal random variable)
If Z ∼ N (0, 1), then
E(Z) = 0
Var(Z) = 1.
Figure 15: As f_Z is symmetric about z = 0, the area under the curve between −∞ and −z is the same as the area under the curve between z and ∞.
15.4 The normal distribution: the general case
The standard normal distribution is the member of the family of normal distributions with mean 0 and variance 1; in general, normal random variables can have any values for the mean and variance (though variances cannot be negative, of course).
Normal distributions are used very widely in many situations, for example:
• many physical characteristics of humans and other animals, for example
the distribution of heights of females in a particular age group, can be
well represented with a normal distribution;
• scientists often assume that ‘measurement errors’ are normally distributed;
• normal distributions are commonly used in finance to model changes in
stock prices (though not always sensibly!).
Some idea of why the normal distribution is so important will be given later in
the course in the section on the Central Limit Theorem.
Definition 32. If a random variable X has a normal distribution with mean µ and variance σ², then its probability density function is given by

f_X(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)).
We write

X ∼ N(µ, σ²),

to mean “X has a normal distribution with mean µ and variance σ².”

Immediately, we can see that by setting µ = 0 and σ² = 1 in Definition 32, we get the standard normal p.d.f. in Definition 31.
Theorem 28. (Definition of a general normal random variable via transformation of a standard normal random variable)

Let Z ∼ N(0, 1), and define X = µ + σZ. Then E(X) = µ, Var(X) = σ², and

X ∼ N(µ, σ²).
15.4.1 Summary
It’s worth stating again the relationship between a standard normal random
variable Z and a ‘general’ normal random variable X.
• Given Z ∼ N(0, 1), we can obtain X ∼ N(µ, σ²) by transforming Z: X = µ + σZ.

• Given X ∼ N(µ, σ²), we can obtain Z ∼ N(0, 1) by transforming X:

Z = (X − µ)/σ,

and we refer to transforming X to get a standard N(0, 1) random variable as standardising X.
Traditionally, we would calculate the c.d.f. of X via standardising and using
the Φ(z) function:
P(X ≤ x) = P((X − µ)/σ ≤ (x − µ)/σ) = Φ((x − µ)/σ),
where Φ(z) is given in statistical tables for various values of z. As discussed
before, statistical tables have become largely obsolete given computer packages such as R, although the technique of standardising is still computationally useful.
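We can check in R that standardising gives the same result as using the general normal c.d.f. directly (here X ∼ N(1, 4), so µ = 1 and σ = 2):

> pnorm(-1, 1, 2)     # P(X <= -1) directly
[1] 0.1586553
> pnorm((-1 - 1)/2)   # Phi((x - mu)/sigma)
[1] 0.1586553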
15.4.2 Visualising the mean and variance
By plotting the density function, we can see the effect of changing the value of µ and σ², and so interpret these parameters more easily. Starting with the mean, we see in Figure 16 that if X ∼ N(µ, σ²), then the maximum of the p.d.f. is at x = µ. If we change µ whilst leaving σ² unchanged (Figure 16, top plot), the p.d.f. ‘shifts’ along the x-axis, but the shape of the p.d.f. is unchanged.
The variance parameter σ² determines how ‘spread out’ the p.d.f. is. If we increase σ², whilst leaving µ unchanged (Figure 16, bottom plot), the peak of the p.d.f. is in the same place, but we get a flatter curve. This is to be expected, remembering that random variables with larger variances are more likely to be further away from their expectations.
Figure 16: Top plot: p.d.f.s for the N(0, 1) and N(2, 1) distributions. Bottom plot: p.d.f.s for the N(0, 1) and N(0, 4) distributions.
15.4.3 The normal distribution in R
R will calculate the p.d.f., c.d.f. and quantile functions, and will also generate
normal random variables. Note that in R, we specify the standard
deviation rather than the variance.
• Calculate the pdf: dnorm(x,mu,sigma)
Example: calculate fX (2) when X ∼ N (1, 4).
> dnorm(2,1,2)
[1] 0.1760327
• Calculate the cdf: pnorm(x,mu,sigma)
Example: calculate FX (−1) = P (X ≤ −1) when X ∼ N (1, 4).
> pnorm(-1,1,2)
[1] 0.1586553
• Invert the c.d.f. to find the α quantile: qnorm(alpha,mu,sigma)
Example: if X ∼ N (0, 1), what value of x satisfies the equation FX (x) =
P (X ≤ x) = 0.95?
> qnorm(0.95,0,1)
[1] 1.644854
Check:
> pnorm(1.644854,0,1)
[1] 0.95
• Generate m random observations from a normal distribution: rnorm(m,mu,sigma)
Example: generate 3 random observations from the N (15, 25) distribution.
> rnorm(3,15,5)
[1] 6.985971 20.671469 11.637691
Example 53. If X ∼ N (3, 4), what is the 25th percentile of the distribution
of X? Use Φ−1 (0.75) = 0.67.
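As a numerical check in R (remembering that R takes the standard deviation σ = 2, not the variance):

> qnorm(0.25, 3, 2)
[1] 1.65102

which is close to the value 3 − 2 × 0.67 = 1.66 obtained with the rounded quantile 0.67.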
Example 54. (from Ross, 2010).
An expert witness in a paternity suit testifies that the length, in days, of
human gestation is approximately normally distributed, with mean 270 days
and standard deviation 10 days. The defendant has proved that he was out
of the country during a period between 290 days before the birth of the child
and 240 days before the birth of the child, so if he is the father, the gestation
period must have either exceeded 290 days, or been shorter than 240 days.
How likely is this?
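One way to compute this probability in R, for checking your answer:

> pnorm(240, 270, 10) + (1 - pnorm(290, 270, 10))
[1] 0.02410003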
15.4.4 The two-σ rule
For a standard normal random variable Z,
P (−1.96 ≤ Z ≤ 1.96) = 0.95.
Since E(Z) = 0 and Var(Z) = 1, the probability of Z being within two standard deviations of its mean value is approximately 0.95 (i.e. P(−2 ≤ Z ≤ 2) = 0.9545 to 4 d.p.). We illustrate this in Figure 17.
If we now consider any normal random variable X ∼ N (µ, σ 2 ), the probability
that it will lie within a distance of two standard deviations from its mean is
approximately 0.95. This is straightforward to verify:
P(|X − µ| ≤ 1.96σ) = P(µ − 1.96σ ≤ X ≤ µ + 1.96σ)
= P(−1.96 ≤ (X − µ)/σ ≤ 1.96)
= P(−1.96 ≤ Z ≤ 1.96)
= 0.95,
(with P (|X − µ| ≤ 2σ) = 0.9545 to 4 d.p.).
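Both figures are easy to reproduce in R:

> pnorm(1.96) - pnorm(-1.96)
[1] 0.9500042
> pnorm(2) - pnorm(-2)
[1] 0.9544997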
Figure 17: The p.d.f. of a N(0, 1) random variable (mean 0, standard deviation 1). There is a 95% probability that a normal random variable will lie within 1.96 standard deviations of its mean.
In Statistics, there is a convention of using 0.05 as a threshold for a ‘small’
probability (more of this in Semester 2), though the choice of 0.05 is arbitrary.
However, the two-σ rule is an easy-to-remember fact about normal random variables, and can be a useful yardstick in various situations.