
Chapter 5
Continuous random variables
While discrete random variables usually involve the counting of something, continuous random variables typically involve a measurement of something. We have seen that discrete random variables have an (at most) countable range, and that the probabilities assigned to all individual possible values of a discrete random variable $X$ determine any probability depending on $X$, i.e.
$$P(X \in A) = \sum_{x \in A} P(X = x) = \sum_{x \in A} p_X(x) \qquad \forall A \subseteq \mathbb{R}.$$
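As a quick illustration of this recap formula, here is a minimal Python sketch computing $P(X \in A)$ as a sum of PMF values; the fair-die PMF is a hypothetical example, not taken from the text.

```python
# Minimal sketch: P(X in A) as a sum of PMF values.
# The PMF of a fair six-sided die is a hypothetical example.
p_X = {x: 1 / 6 for x in range(1, 7)}  # p_X(x) for x = 1, ..., 6

A = {2, 4, 6}                          # event "X is even"
prob = sum(p_X[x] for x in A)          # sum over x in A of p_X(x)
print(prob)                            # 0.5
```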
In contrast, continuous random variables take values from a continuum. This
intuitively means that probabilities depending on such random variables are
determined by the probabilities assigned to non-degenerate intervals.
Definition 5.1. A random variable $X$ is called a continuous random variable if $P(X = x) = 0$ for all $x \in \mathbb{R}$.
Cumulative distribution functions are defined as in the discrete case, except that in the continuous case they cannot be computed in terms of a sum of probabilities assigned to the individual possible values. We now recall the universal definition of a CDF, valid for any random variable, show its properties, and see that continuity of a random variable is equivalent to the continuity of its CDF.
Definition 5.2. Let $X$ be a random variable. The cumulative distribution function (CDF) of $X$ is the map $F_X : \mathbb{R} \to \mathbb{R}$ defined by $F_X(a) = P(X \le a)$ for all $a \in \mathbb{R}$.
Proposition 5.3. Let $X$ be a random variable. Then $F_X$ satisfies the following properties:
(i) $F_X$ is non-decreasing;
(ii) $F_X$ is right-continuous;
(iii) $F_X(-\infty) \equiv \lim_{a \to -\infty} F_X(a) = 0$;
(iv) $F_X(\infty) \equiv \lim_{a \to \infty} F_X(a) = 1$.
Moreover, if $F : \mathbb{R} \to \mathbb{R}$ is a function satisfying properties (i)-(iv), then $F$ is the CDF of some random variable.
Proof.
(i) Let $x_1, x_2 \in \mathbb{R}$, $x_1 \le x_2$. Then $\{X \le x_1\} \subseteq \{X \le x_2\}$, hence, by monotonicity of a probability measure, $P(X \le x_1) \le P(X \le x_2)$, that is $F_X(x_1) \le F_X(x_2)$.
(ii) Let $x \in \mathbb{R}$ and let $(x_n)_{n \in \mathbb{N}}$ be a decreasing sequence of real numbers converging to $x$, that is $x_n \ge x_m$ for all $n \le m$ and $x_n \to x$ as $n \to \infty$. Denote $A_n = \{X \le x_n\}$. Then we have a decreasing sequence of events, $A_1 \supseteq A_2 \supseteq \dots$. Note that, for any decreasing sequence of events $(A_n)_{n \in \mathbb{N}}$, the following holds:
$$P\left(\bigcap_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} P(A_n). \tag{5.1}$$
Then, since $\bigcap_{n=1}^{\infty} A_n = \{X \le x\}$, we have
$$\lim_{n \to \infty} F_X(x_n) = \lim_{n \to \infty} P(A_n) = P\left(\bigcap_{n=1}^{\infty} A_n\right) = P(X \le x) = F_X(x).$$
Since $x \in \mathbb{R}$ and the right-converging sequence $(x_n)_{n \in \mathbb{N}}$ were chosen arbitrarily, this proves the right-continuity of $F_X$.
(iii) Let $(x_n)_{n \in \mathbb{N}}$ be a decreasing sequence of real numbers, that is $x_n \ge x_m$ for all $n \le m$, such that $x_n \to -\infty$ as $n \to \infty$. As before, denote $A_n = \{X \le x_n\}$. Then we have a decreasing sequence of events with empty intersection, i.e. $A_1 \supseteq A_2 \supseteq \dots$, $\bigcap_{n=1}^{\infty} A_n = \emptyset$. Thus:
$$\lim_{n \to \infty} F_X(x_n) = \lim_{n \to \infty} P(A_n) = P\left(\bigcap_{n=1}^{\infty} A_n\right) = P(\emptyset) = 0.$$
(iv) Let $(x_n)_{n \in \mathbb{N}}$ be an increasing sequence of real numbers, that is $x_n \le x_m$ for all $n \le m$, such that $x_n \to \infty$ as $n \to \infty$. Denoting $A_n = \{X \le x_n\}$, we have an increasing sequence of events whose union is the whole sample space, i.e. $A_1 \subseteq A_2 \subseteq \dots$, $\bigcup_{n=1}^{\infty} A_n = \Omega$. Note that, for any increasing sequence of events $(A_n)_{n \in \mathbb{N}}$, the following holds:
$$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} P(A_n). \tag{5.2}$$
Then, we have
$$\lim_{n \to \infty} F_X(x_n) = \lim_{n \to \infty} P(A_n) = P\left(\bigcup_{n=1}^{\infty} A_n\right) = P(\Omega) = 1.$$
We omit the proof of the converse statement.
The properties in Proposition 5.3 characterize cumulative distribution functions, in the sense that they are necessary and sufficient conditions for a real map to be the CDF of some random variable, regardless of whether that variable is discrete or continuous.
Remark 5.4. Note that, while being continuous from the right, a CDF $F_X$ is not left-continuous in general. In particular, the left limit of $F_X$ at $a$ is
$$F_X(a^-) \equiv \lim_{t \nearrow a} F_X(t) = P(X < a). \tag{5.3}$$
Cumulative distribution functions can be used to compute probabilities
for a random variable, thanks to the following relationships.
Proposition 5.5. For all $a, b \in \mathbb{R}$, $a < b$, the following hold:
1. $P(a < X \le b) = F_X(b) - F_X(a)$;
2. $P(a \le X \le b) = F_X(b) - F_X(a^-)$;
3. $P(a < X < b) = F_X(b^-) - F_X(a)$;
4. $P(a \le X < b) = F_X(b^-) - F_X(a^-)$.
Proof. The equations in 1.-4. can all be proven by the same reasoning: we rewrite the event whose probability is to be computed as a difference of two sets of the kind $\{X \le x\}$ or $\{X < x\}$, then we use the additivity of probability, the definition of a CDF and equation (5.3). We show it explicitly for 1. We have
$$\{a < X \le b\} = \{X \le b\} \setminus \{X \le a\},$$
hence
$$F_X(b) = P(X \le b) = P(X \le a) + P(a < X \le b) = F_X(a) + P(a < X \le b),$$
from which the claim follows.
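To make the role of the left limits concrete, here is a minimal Python sketch of Proposition 5.5 for a CDF with jumps; the random variable (number of heads in two fair coin tosses) is a hypothetical example chosen so that $F_X(x^-) \ne F_X(x)$ at the jump points.

```python
# Minimal sketch of Proposition 5.5 for a step-function CDF.
# X = number of heads in two fair coin tosses (a hypothetical example);
# its CDF has jumps of 1/4, 1/2, 1/4 at the points 0, 1, 2.
def F(x):
    """CDF of X."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.25
    if x < 2:
        return 0.75
    return 1.0

def F_left(x, eps=1e-9):
    """Numerical left limit F(x-), cf. equation (5.3)."""
    return F(x - eps)

a, b = 0, 1
print(F(b) - F(a))            # P(a < X <= b)  = 0.50
print(F(b) - F_left(a))       # P(a <= X <= b) = 0.75
print(F_left(b) - F(a))       # P(a < X < b)   = 0.00
print(F_left(b) - F_left(a))  # P(a <= X < b)  = 0.25
```

For a continuous random variable all four values would coincide, as discussed after Proposition 5.6 below.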
Looking at the CDF, one can also see whether a random variable is continuous or not.
Proposition 5.6. A random variable is continuous if and only if its CDF is
a continuous function.
Proof. First, observe that by property (ii) in Proposition 5.3, a CDF is continuous if and only if it is left-continuous. Now, given a random variable $X$, note that for all $a \in \mathbb{R}$ we have $\{X = a\} = \{X \le a\} \setminus \{X < a\}$, and hence
$$P(X = a) = F_X(a) - F_X(a^-) \qquad \forall a \in \mathbb{R}. \tag{5.4}$$
Therefore, $P(X = a) = 0$ if and only if $F_X$ is left-continuous at $a$. Since this is true for all $a \in \mathbb{R}$, the proof is complete.
Equation (5.4) shows that the probability that a random variable takes a specific value is given by the jump of its CDF at that point. Note that, if $X$ is a continuous random variable, then the probability of any interval of the real line does not depend on the inclusion of the end points of the interval; that is, the right-hand sides of 1.-4. all become equal to $F_X(b) - F_X(a)$.
We would like to have an analogue of the PMF for discrete random variables that describes the distribution of a continuous random variable.
Definition 5.7. Let $X$ be a continuous random variable. If there exists a non-negative function $f_X : \mathbb{R} \to \mathbb{R}$ such that for all $a, b \in \mathbb{R}$, $a \le b$, it holds that
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx, \tag{5.5}$$
then $f_X$ is called a probability density function (PDF) of $X$.
Remark 5.8.
• If $f_X$ is a PDF of $X$ and $g : \mathbb{R} \to \mathbb{R}$ is a real map that is pointwise equal to $f_X$ outside at most a finite number of points, i.e.
$$|\{x \in \mathbb{R} : g(x) \ne f_X(x)\}| < \infty,$$
then $g$ is also a PDF of $X$.
• Not all continuous random variables have a PDF.
• If a random variable $X$ has a PDF, then $X$ has infinitely many PDFs.
As remarked, not all continuous random variables have density functions. However, it is possible to recognize the random variables that have a density by looking at the smoothness of the corresponding cumulative distribution functions.
Proposition 5.9. A continuous random variable $X$ has a PDF if and only if there exists a non-negative function $f : \mathbb{R} \to \mathbb{R}$ such that
$$F_X(a) = \int_{-\infty}^{a} f(x)\,dx \qquad \forall a \in \mathbb{R}. \tag{5.6}$$
In this case, $f$ is a PDF of $X$ and, for all $x \in \mathbb{R}$ where $f$ is continuous,
$$f(x) = F_X'(x).$$
Figure 5.1: Examples of plots of PDFs and related applications. (a) The CDF at a point (in blue) as the area under the PDF from $-\infty$ to that point. (b) The probability of an interval (in orange) as the area under the PDF between the end points.
Proposition 5.9 says that a random variable has a PDF if and only if its CDF is absolutely continuous. An absolutely continuous function is differentiable almost everywhere, that is, everywhere outside a set of Lebesgue measure zero¹. In this case, the density is (almost everywhere) equal to the derivative of the CDF, that is $f = F'$ almost everywhere. In particular, as claimed in Proposition 5.9, the equality holds at all points of continuity of $f$.
There is also a practical procedure for (possibly) finding a PDF of a given random variable $X$:
1. check for continuity of $F_X$; if it holds, proceed with the next step;
2. check at which points $F_X'$ exists;
3. if $F_X'$ exists and is continuous outside a finite or countable subset of the real line, that is if $\mathbb{R} \setminus A$ is at most countable, where $A := \{x \in \mathbb{R} : F_X'(x) \text{ exists and is continuous}\}$, then define $f_X(x) = F_X'(x)$ for all $x \in A$.
This procedure assumes slightly stronger hypotheses than Proposition 5.9, but it is of practical relevance, and these hypotheses are satisfied by most examples, in particular by all the common ones; a numerical sketch of the procedure is given below.
¹A set $A \subseteq \mathbb{R}$ has Lebesgue measure zero, i.e. is a null set, if it can be covered by a countable union of intervals of arbitrarily small total length. In particular, all countable sets are null sets.
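The following minimal Python sketch illustrates the procedure above on a hypothetical CDF, namely the exponential CDF (5.12) introduced later in this chapter, with rate $\lambda = 2$: away from the single point $x = 0$ where $F_X'$ may fail to exist, a numerical derivative of $F_X$ recovers the density.

```python
# Minimal sketch of the PDF-finding procedure. The CDF below is the
# exponential CDF (5.12) with a hypothetical rate lam = 2; F_X' exists
# and is continuous everywhere except possibly at x = 0.
import numpy as np

lam = 2.0
F = lambda x: np.where(x >= 0, 1 - np.exp(-lam * x), 0.0)  # a continuous CDF

x = np.linspace(0.1, 3.0, 5)   # points of A, away from the bad point x = 0
h = 1e-6
f_numeric = (F(x + h) - F(x - h)) / (2 * h)  # central difference for F_X'
f_exact = lam * np.exp(-lam * x)             # known density lam * e^{-lam x}

print(np.max(np.abs(f_numeric - f_exact)))   # small (~1e-9): f_X = F_X' on A
```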
Remark 5.10. Let $f$ be a PDF; then
$$\int_{-\infty}^{\infty} f(x)\,dx = 1. \tag{5.7}$$
Conversely, if $f : \mathbb{R} \to \mathbb{R}$ is a non-negative function satisfying (5.7), then $f$ is a PDF.
Following Remark 5.10, we can construct a PDF starting from any non-negative real map that has a finite non-zero integral on $\mathbb{R}$. Indeed, let $g : \mathbb{R} \to \mathbb{R}$ be such that $g \ge 0$ and $\int_{\mathbb{R}} g(x)\,dx = c \in \mathbb{R} \setminus \{0\}$; then we can define a function $f : \mathbb{R} \to \mathbb{R}$ by $f(x) = \frac{1}{c}\,g(x)$ for all $x \in \mathbb{R}$, and $f$ is then a PDF.
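A minimal numerical sketch of this normalization trick; the map $g$ below is a hypothetical choice, and any non-negative map with finite non-zero integral would do.

```python
# Minimal sketch: rescale a non-negative g with finite non-zero integral
# into a PDF. g is a hypothetical example.
import numpy as np
from scipy.integrate import quad

g = lambda x: np.exp(-x**2) * (x**2 + 1)   # non-negative, integrable on R

c, _ = quad(g, -np.inf, np.inf)            # c = integral of g over R
f = lambda x: g(x) / c                     # f = (1/c) g is a PDF

total, _ = quad(f, -np.inf, np.inf)
print(total)                               # ~1.0, as required by (5.7)
```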
Once we have a PDF of a random variable, we are able to compute any
probability depending on that random variable.
Proposition 5.11. Let $X$ be a continuous random variable having a PDF $f$. Then:
$$P(X \in A) = \int_A f(x)\,dx \qquad \forall A \subseteq \mathbb{R}. \tag{5.8}$$
The proof of this result requires knowledge of measure theory, which is beyond the scope of this course.
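For instance (a minimal sketch, assuming the standard normal density as a hypothetical $f$), the probability of a union of two disjoint intervals is just the sum of the integrals of $f$ over each piece:

```python
# Minimal sketch of (5.8) for A = [-1, 0] union [2, inf), assuming a
# standard normal density f as a hypothetical example.
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # hypothetical PDF

p1, _ = quad(f, -1, 0)         # integral of f over [-1, 0]
p2, _ = quad(f, 2, np.inf)     # integral of f over [2, inf)
print(p1 + p2)                 # P(X in A) ~ 0.3413 + 0.0228 = 0.3641
```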
We are now going to see the most common examples of continuous distributions.
5.1 Most common continuous distributions
Uniform distribution
Random variables having constant PDFs on their ranges are called uniform random variables. The most obvious example is the random number generator, which selects at random a number from the interval $[0, 1]$. Whenever a random selection of a point from a set is involved, we have already seen in Section 2.4.1 that the most appropriate probability model is the equal-likelihood model. Applied to the random number generator, it gives
$$P(E) = \frac{|E|}{|\Omega|} = \frac{|E|}{|1 - 0|} = |E| \qquad \forall E \subseteq \Omega = [0, 1].$$
If we denote by $X$ the output number, we have $X(x) = x$ for all $x \in \Omega$. Thus, for all $x \in [0, 1]$, we have $\{X = x\} = \{x\}$, and so we easily see that $P(X = x) = |\{x\}| = 0$, which means that $X$ is a continuous random variable. We now give the formal definition and derive the corresponding PDF.
Definition 5.12. A continuous random variable $X$ is called a uniform random variable if there exist $a, b \in \mathbb{R}$, $a < b$, such that the PDF of $X$, $f_X$, is constant over $[a, b]$ (or $(a, b)$ or $[a, b)$ or $(a, b]$) and null elsewhere. In this case we say that $X$ is uniformly distributed on $[a, b]$ and we write $X \sim \mathcal{U}(a, b)$.
Let $X \sim \mathcal{U}(a, b)$; then the formula for its PDF follows directly from Definition 5.12, that is:
$$f_X(x) = \begin{cases} \dfrac{1}{b - a}, & x \in [a, b], \\[4pt] 0, & \text{otherwise.} \end{cases} \tag{5.9}$$
Indeed, we know from Definition 5.12 that there exists a constant $c \in \mathbb{R}$ such that
$$f_X(x) = \begin{cases} c, & x \in [a, b], \\ 0, & \text{otherwise,} \end{cases}$$
but we also know from (5.7) that a PDF must integrate to one, that is
$$1 = \int_{-\infty}^{\infty} f_X(x)\,dx = \int_a^b c\,dx = c(b - a) \quad \Longrightarrow \quad c = \frac{1}{b - a}.$$
Then, the formula for the CDF of $X$ follows immediately from (5.9) and (5.6):
$$F_X(x) = \begin{cases} 0, & x \le a, \\[2pt] \dfrac{x - a}{b - a}, & x \in (a, b), \\[4pt] 1, & x \ge b. \end{cases} \tag{5.10}$$
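As a sanity check, the following minimal sketch compares (5.9) and (5.10) with scipy.stats.uniform; note that scipy parametrizes the uniform distribution by loc $= a$ and scale $= b - a$, and the interval $[2, 5]$ is a hypothetical choice.

```python
# Minimal sketch checking (5.9)-(5.10) against scipy.stats.uniform.
from scipy import stats

a, b = 2.0, 5.0                      # hypothetical interval
X = stats.uniform(loc=a, scale=b - a)

x = 3.5
print(X.pdf(x), 1 / (b - a))         # both 1/3: constant density on [a, b]
print(X.cdf(x), (x - a) / (b - a))   # both 0.5: CDF formula (5.10)
```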
Exponential distribution
Exponential random variables are the continuous analogue of geometric random variables, which were presented in the discrete setting (Section 4.1). They are also related to Poisson random variables, in that they describe the time elapsed between consecutive occurrences of a certain event.
Definition 5.13. A continuous random variable $X$ is called an exponential random variable if there exists $\lambda \in \mathbb{R}$, $\lambda > 0$, such that
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0, \\ 0, & \text{otherwise.} \end{cases} \tag{5.11}$$
In this case, we say that $X$ is exponentially distributed with parameter $\lambda$ and we write $X \sim \mathcal{E}(\lambda)$.
Again, it is easy to deduce from (5.6) and the definition (5.11) that the CDF of an exponential random variable $X$ with parameter $\lambda$ is
$$F_X(x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0, \\ 0, & \text{otherwise.} \end{cases} \tag{5.12}$$
Moreover, we have an explicit formula for the tail probabilities, which will turn out to be useful in applications:
$$P(X > a) = e^{-\lambda a} \qquad \forall a > 0. \tag{5.13}$$
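A minimal check of (5.12)-(5.13) with scipy.stats.expon, which parametrizes the exponential distribution by scale $= 1/\lambda$; the rate $\lambda = 0.5$ is a hypothetical choice.

```python
# Minimal sketch checking the tail formula (5.13) with scipy.stats.expon.
import numpy as np
from scipy import stats

lam = 0.5                # hypothetical rate
X = stats.expon(scale=1 / lam)

a = 3.0
print(X.sf(a))           # survival function P(X > a)
print(np.exp(-lam * a))  # e^{-lam a}: same value, ~0.2231
```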
We have seen in Section 4.1 that geometric random variables have the interesting lack-of-memory property and in particular that they are the only
positive integer-valued random variables with that feature. As exponential
random variables are the continuous analogue of geometric random variables,
it is not surprising to see that they are the only positive continuous random
variables possessing the continuous analogue of that property.
Proposition 5.14. A positive continuous random variable $X$ has the lack-of-memory property, i.e.
$$P(X > s + t \mid X > s) = P(X > t) \qquad \forall s, t > 0, \tag{5.14}$$
if and only if $X$ is exponentially distributed.
Proof. ($\Leftarrow$) Let $X \sim \mathcal{E}(\lambda)$. Then:
$$P(X > s + t \mid X > s) = \frac{P(X > s + t,\ X > s)}{P(X > s)} = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-\lambda(s + t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t).$$
($\Rightarrow$) Let $X$ be a continuous random variable having the lack-of-memory property. Define a function $G : \mathbb{R}_+ \to \mathbb{R}$ by setting $G(t) = P(X > t)$ for all $t > 0$. Then $G$ is a monotonically decreasing function satisfying
$$G(t + s) = G(s)G(t) \qquad \text{for all } s, t > 0.$$
Let $n, m \in \mathbb{N}$; we can prove by induction that
$$G\left(\frac{m}{n}\right) = G\left(\frac{1}{n}\right)^m. \tag{5.15}$$
• Fix $n \in \mathbb{N}$. For $m = 1$ we have an identity.
• Suppose now that (5.15) holds up to a certain integer $m > 1$; then
$$G(m + 1) = G(m)G(1) = G(1)^m G(1) = G(1)^{m+1}.$$
Analogously, if (5.15) holds for integers $n, m > 1$, then
$$G\left(\frac{m + 1}{n}\right) = G\left(\frac{m}{n}\right)G\left(\frac{1}{n}\right) = G\left(\frac{1}{n}\right)^{m+1}.$$
From this property it also follows that $0 < G(1) < 1$. Summing up, for all $r \in \mathbb{Q}$, $r > 0$, we have $G(r) = G(1)^r$; moreover, $G(0) = 1$. This, together with the other properties above, implies that $G$ is an exponential function:
$$G(r) = e^{-\lambda r}, \qquad \text{where } \lambda = -\ln(G(1)).$$
Since $G$ is monotonically decreasing, we also have $G(x) = e^{-\lambda x}$ for all $x \in \mathbb{R}_+$. Therefore, we can conclude that $P(X > x) = e^{-\lambda x}$ for all $x \in \mathbb{R}_+$, and hence the thesis.
Normal (or Gaussian) distribution
Initially discovered by Abraham De Moivre in 1733 as an approximation of coin-tossing probabilities, and later named after Carl Friedrich Gauss, who used it in 1809 to predict the location of astronomical bodies, the normal (also called Gaussian) distribution is the most popular continuous distribution and arises in many real-world applications. The appellation 'normal' comes from the fact that, since its first uses, many random variables were found to follow this same distribution or to be well approximated by it.
Definition 5.15. A continuous random variable $X$ is called a normal random variable if there exist $\mu, \sigma \in \mathbb{R}$, $\sigma > 0$, such that
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \qquad x \in \mathbb{R}. \tag{5.16}$$
In this case, we say that $X$ is normally distributed with parameters $\mu$, $\sigma^2$ and we write $X \sim \mathcal{N}(\mu, \sigma^2)$.
You can prove as an exercise that the function defined in (5.16) is a PDF.
Also note that, by its definition, $f_X$ is symmetric about $\mu$.
Proposition 5.16. If $X \sim \mathcal{N}(\mu, \sigma^2)$ and $Y = aX + b$ with $a \ne 0$, then $Y \sim \mathcal{N}(a\mu + b, a^2\sigma^2)$.
You can prove this as a simple analysis exercise.
Corollary 5.17. Let $X \sim \mathcal{N}(\mu, \sigma^2)$. Then:
$$Z := \frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1). \tag{5.17}$$
The random variable defined in (5.17) is a standardized random variable
that turns out to be of practical importance.
Definition 5.18. A normally distributed random variable with parameters 0, 1 is called a standard normal random variable. The CDF of a standard normal random variable is usually denoted by $\Phi$, i.e.
$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}\, dt, \qquad z \in \mathbb{R}. \tag{5.18}$$
Unfortunately, there is no simple closed-form formula for the CDF of a normal random variable, and consequently for any probability depending on it. However, one can express normal probabilities in terms of $\Phi$ and let numerical software run the computation.
Proposition 5.19. Let $X \sim \mathcal{N}(\mu, \sigma^2)$. Then
$$P(a < X < b) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right) \qquad \forall a, b \in \mathbb{R},\ a \le b. \tag{5.19}$$
Proof. We have
$$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right).$$
Remark 5.20. Let $X \sim \mathcal{N}(\mu, \sigma^2)$. Then $F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$ for all $x \in \mathbb{R}$.
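A minimal sketch of Proposition 5.19 and Remark 5.20 in code, evaluating both sides with scipy.stats.norm; the parameters $\mu = 10$, $\sigma = 2$ and the interval $(9, 13)$ are hypothetical choices.

```python
# Minimal sketch of (5.19): express P(a < X < b) through Phi.
from scipy.stats import norm

mu, sigma = 10.0, 2.0                            # hypothetical parameters
a, b = 9.0, 13.0

Phi = norm.cdf                                   # standard normal CDF (5.18)
lhs = norm(loc=mu, scale=sigma).cdf(b) - norm(loc=mu, scale=sigma).cdf(a)
rhs = Phi((b - mu) / sigma) - Phi((a - mu) / sigma)
print(lhs, rhs)                                  # identical, ~0.6247
```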