Measure Theory and Probability - Department of Statistics, CUHK

Chapter 2
Measure Theory and Probability
2.1 Introduction
In advanced probability courses, a random variable (r.v.) is defined rigorously
through measure theory, the branch of mathematics that generalizes the concept of length. A mathematically rigorous description of probability relies on the probability space (Ω , F , P), where
• Ω is the sample space,
• F is a σ -algebra of subsets of Ω , and
• P is the probability measure.
In this chapter we motivate the need for measure theory in probability.
First, we review some basic definitions in mathematics about the size of a set
such as finite, countable and uncountable.
Definition 2.1. For any positive integer n, let Jn = {1, 2, · · · , n} and J = Z+ , the set
of all positive integers. For two sets X and Y , denote X ∼ Y if there exists a bijective
function between X and Y , or equivalently, elements in X and Y can be put into
one-one correspondence. Then, for any set A, we say:
i) A is finite if A ∼ Jn for some n (the empty set is also considered to be finite).
ii) A is infinite if A is not finite.
iii) A is countable if A ∼ J or A ∼ Jn for some n.
iv) A is uncountable if A is neither finite nor countable.
Example 2.1. If A = {9, 5, 2, 7}, then we can define a function f : A → J4 such that
f (9) = 1, f (5) = 2, f (2) = 3, f (7) = 4. Since f is a bijective function, A is finite.
Example 2.2. If A is the set of even positive integers {2, 4, · · · }, then we can define
a bijective function f : A → J such as f (x) = x/2. (It is straightforward to check
that f is a bijective function; see Exercise 2.4). Thus, A is countable. However, A
is not finite since the number of elements in A is greater than n for any integer n.
Therefore, no bijective function f : A → Jn can be defined for any fixed n.
Example 2.3. The set of integers A = {· · · , −3, −2, −1, 0, 1, 2, 3, · · · } is countable,
although it seems to be twice as large as J. In this case, a bijective function f : A → J
can be defined, e.g., f (x) = (2x + 1)1{x≥0} + (−2x)1{x<0} . Note that f need not be
unique. (Exercise 2.4).
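The bijection of Example 2.3 can be checked on a finite window of integers. The following is a minimal Python sketch; the helper name `f_inverse` and the window size are illustrative choices, not part of the text.

```python
def f(x: int) -> int:
    """Map an integer to a positive integer as in Example 2.3."""
    return 2 * x + 1 if x >= 0 else -2 * x


def f_inverse(k: int) -> int:
    """Inverse map: odd k comes from x >= 0, even k from x < 0."""
    return (k - 1) // 2 if k % 2 == 1 else -(k // 2)


# On the window -50..50 the images are exactly 1, 2, ..., 101 with no repeats.
window = range(-50, 51)
images = sorted(f(x) for x in window)
```

A finite check of this kind does not replace the proof asked for in Exercise 2.4; it only illustrates how non-negative integers go to odd numbers and negative integers go to even numbers.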
Example 2.4. A sequence {xi }i=1,2,3,... is clearly countable (e.g. f (xi ) = i). A double
array {xi j }i=1,2,3··· , j=1,2,3··· is also countable. The way of counting is achieved by
the following diagonal argument:
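x11  x12  x13  x14  ···
x21  x22  x23  x24  ···
x31  x32  x33  x34  ···
x41  x42  x43  x44  ···
 ⋮    ⋮    ⋮    ⋮   ···
The elements are counted along successive anti-diagonals: x11 ; then x12 , x21 ; then x13 , x22 , x31 ; and so on. Every xi j is reached after finitely many steps.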
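The diagonal counting scheme can be sketched as a generator of index pairs. This is a minimal Python illustration; the function name `diagonal_pairs` and the order of traversal within each anti-diagonal are choices made here, and any fixed order would work equally well.

```python
from itertools import count, islice


def diagonal_pairs():
    """Yield index pairs (i, j) with i, j >= 1, one anti-diagonal at a time."""
    for s in count(2):          # s = i + j is constant on each anti-diagonal
        for i in range(1, s):
            yield (i, s - i)


# The first ten pairs cover the anti-diagonals i + j = 2, 3, 4, 5.
first_ten = list(islice(diagonal_pairs(), 10))
```

The position of a pair in this enumeration plays the role of the bijection onto J: each pair (i, j) appears exactly once, on the anti-diagonal i + j.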
Example 2.5. The set of rational numbers Q = {p/q : p, q ∈ Z, q ≠ 0} is countable: after enumerating Z as in Example 2.3, Q can
be regarded as a subset of a double array {x pq } p,q∈J . Thus the result of Example 2.4 can be applied directly.
Example 2.6. Let A be the set of infinite binary sequences, i.e., A is the set
of elements of the form 0.101100 · · · and 0.011001100 · · · . The set
A is uncountable. To see this, suppose on the contrary that A is countable. Then
there exists a bijective function f : A → J, so that one can enumerate the elements
of A as x1 , x2 , . . . , xn , . . .. We obtain a contradiction by constructing a sequence
y = 0.a1 a2 a3 · · · , where ai = 0 or 1, which belongs to A yet does not appear in the enumeration. The
construction of y is as follows: ai = 1 − (the i-th digit of xi ). By construction, for any
integer n, y ≠ xn since their n-th digits are different. Thus y is not in the list x1 , x2 , . . ., contradicting the assumption that
the list contains every infinite binary sequence.
The same argument can be generalized to show that real numbers are uncountable
(Exercise 2.6).
2.2 Probability Space and Random Variable
In this section, we first give the precise definition of Probability Space and Random Variable. Then we explain these concepts in detail in the subsections.
Definition 2.2. (Probability space) A probability space is a triple (Ω , F , P), where
• Ω , the Sample Space, is an arbitrary set describing all possible outcomes.
• F , the σ -field, is a family of subsets of Ω . These subsets are the “measurable
sets”, to which we can assign probabilities.
• P, the Probability Measure, is a function P : F → [0, 1] satisfying P(Ω ) = 1
and $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$ for pairwise disjoint Ai ∈ F .
Note that the domain of the function P is not the elements of Ω , but subsets
of elements of Ω . Precisely speaking, if ω ∈ Ω , then P({ω}) is well defined if
{ω} ∈ F , but P(ω) is undefined.
Definition 2.3. (Random variable) In a probability space (Ω , F , P), X : Ω → R is
a real-valued random variable if for all x ∈ R, $X^{-1}((-\infty, x]) \in \mathcal{F}$.
To be precise, a random variable is a function from the sample space to the real
line. The requirement $X^{-1}((-\infty, x]) \in \mathcal{F}$ means that for any x ∈ R, the corresponding
subset {ω ∈ Ω : X(ω) ≤ x} of Ω is measurable.
In the following we elaborate the elements in the triple (Ω , F , P) one by one.
2.2.1 Sample Space
The Sample Space Ω is an arbitrary set which may be finite, countable or uncountable. Its elements are usually denoted by ω. Generally, Ω consists of all possible
outcomes ω of an experiment or observation. A subset of Ω (e.g. {ω}, {ω1 , . . . , ωn })
is called an event and an element ω of Ω is called a sample point.
Example 2.7. a) When you randomly point your finger at a ruler of length 1 unit,
the sample space is
Ω = {The finger points to x | x ∈ [0, 1]}.
b) When a die is thrown once, the sample space is
Ω = {The face of the die is i | i = 1, 2, 3, 4, 5, 6}.
c) For describing the results of n tosses of a coin, the sample space is
Ω = {(ω1 , ω2 , · · · , ωn ) : ωi = H or T },
which is a set of size 2^n . Each element in Ω is a sequence of length n.
d) For the position of a particle, the sample space is
Ω = R^3 ,
the three-dimensional Euclidean space.
e) For the price of a stock, the sample space is an abstract space describing the
whole operation of the company, which is difficult to quantify.
2.2.2 Sigma-Field
Basically, a σ -field is a family of subsets of Ω (i.e. a family of events) to which we want to
assign probabilities. When Ω is finite, the family of all possible subsets of Ω is
the power set 2^Ω , which is also finite. If the probability P({ω}) is defined for each
sample point ω ∈ Ω , then the probability of every event (every element of 2^Ω ) can be
defined according to the finite additivity rule P(A ∪ B) = P(A) + P(B) for disjoint
A and B.
However, if Ω is infinite, and in particular uncountable, then it becomes problematic to assign
probabilities to all events (see Section 2.3). Therefore, mathematicians resort to identifying a class of sets which is rich enough to be useful for practical
purposes. It is natural to think of a class of sets that is closed under the usual set operations, including unions, intersections and complements. Such a class of
sets is known as a σ -field or σ -algebra.
Definition 2.4. (σ -Field) A class F of subsets of Ω is called a σ -field if
1. ∅, Ω ∈ F ;
2. A ∈ F ⇒ A^c ∈ F ; and
3. A1 , A2 , · · · ∈ F ⇒ $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.
Remark 2.1. Note that the definition of a σ -field implies closure under countable
intersections, i.e., A1 , A2 , · · · ∈ F ⇒ $\bigcap_{i=1}^{\infty} A_i \in \mathcal{F}$ (Exercise 2.7).
Example 2.8. The largest σ -field in Ω is the power set 2^Ω , consisting of all the
subsets of Ω ; the smallest σ -field is {∅, Ω }.
Example 2.9. Let Ω = {1, 2, 3, 4, 5, 6}. Then it is straightforward to verify from the definition that F1 = {∅, {1, 2, 3}, {4, 5, 6}, Ω } and F2 = {∅, {1, 3, 5}, {2, 4, 6}, Ω } are
σ -fields. Note that if one is interested in “small/large” or “odd/even”, then F1 and
F2 , respectively, are enough for assigning probabilities to possible events. However,
F3 = {∅, {1, 2}, {4, 6}, Ω } is not a σ -field since the set {1, 2, 4, 6} = {1, 2} ∪ {4, 6}
does not belong to F3 .
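For a finite Ω, the three conditions of Definition 2.4 can be checked mechanically. The sketch below is a minimal Python illustration; the function name `is_sigma_field` is hypothetical, and it relies on the fact that for a finite family closure under pairwise unions already gives closure under countable unions.

```python
from itertools import combinations


def is_sigma_field(omega, family):
    """Check Definition 2.4 for a family of subsets of a finite omega."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in family}
    # Condition 1: the empty set and omega belong to the family.
    if frozenset() not in sets or omega not in sets:
        return False
    # Condition 2: closed under complements.
    if any(omega - a not in sets for a in sets):
        return False
    # Condition 3: closed under unions (pairwise suffices for a finite family).
    return all(a | b in sets for a, b in combinations(sets, 2))


omega = {1, 2, 3, 4, 5, 6}
F1 = [set(), {1, 2, 3}, {4, 5, 6}, omega]
F2 = [set(), {1, 3, 5}, {2, 4, 6}, omega]
F3 = [set(), {1, 2}, {4, 6}, omega]
```

With these inputs, `is_sigma_field(omega, F1)` and `is_sigma_field(omega, F2)` return `True`, while `is_sigma_field(omega, F3)` returns `False`.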
Example 2.10. Let F consist of the countable and the co-countable sets in Ω (A is co-countable if A^c is countable). It can be checked that F is a σ -field. If Ω =
[0, 1], then A = [0, 0.5] has the property that A and A^c are both uncountable. Thus
A ∉ F , which shows that a σ -field may not contain all the subsets of Ω .
Furthermore, we may write A = ∪x∈[0,0.5] {x} as the (uncountable) union of the
singletons {x}, x ∈ A. Since each {x} ∈ F and A ∉ F , we see that a σ -field may not be closed
under the formation of uncountable unions.
Definition 2.5. (σ -field generated by a class of sets)
Let A be a class of subsets of Ω . The σ -field generated by A , denoted by σ (A ),
is the smallest σ -Field that contains A .
Example 2.11. Let Ω = {1, 2, 3, 4, 5, 6} and A = {{1}, {1, 3}}. Then it is straightforward to verify from the definition that
σ (A ) = {∅, {1}, {1, 3}, {3}, {2, 4, 5, 6}, {2, 3, 4, 5, 6}, {1, 2, 4, 5, 6}, Ω }.
When Ω is finite, one can always construct the σ -field explicitly by union and complement operations. However, when Ω and A are not finite, one does not in general have
explicit expressions for the elements of σ (A ).
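For a finite Ω, the construction of σ(A) by repeated unions and complements can be sketched as a closure loop. This is a minimal Python illustration; the function name `generate_sigma_field` is hypothetical and the loop is not meant to be efficient.

```python
def generate_sigma_field(omega, generators):
    """Smallest family containing the generators that is closed under
    complements and unions (finite omega only)."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = {omega - a for a in sets} | {a | b for a in sets for b in sets}
        if new <= sets:          # nothing new was produced: closure reached
            return sets
        sets |= new


# Example 2.11: omega = {1, ..., 6} and generating class {{1}, {1, 3}}.
sigma = generate_sigma_field(range(1, 7), [{1}, {1, 3}])
```

The result contains eight sets, including {3}, which arises as the complement of {1} ∪ {2, 4, 5, 6}.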
Example 2.12. (Borel σ -field)
On the real line R, there is a special σ -field that is used frequently in probability
theory. Let C be the collection of all finite open intervals on R, i.e., {(l, u)| − ∞ <
l < u < ∞}. Then
B := BR = σ (C )
is called the Borel σ -field, whose elements are called Borel sets. If our sample space
is R and we want to be able to measure the probability of any combinations of
intervals, then B is the smallest σ -field that we have to consider. Particularly, the
Borel σ -field on [a, b] is denoted by B[a,b] := {[a, b] ∩ B : B ∈ B}.
2.2.3 Probability Measure
Given a sample space Ω and the associated σ -field F , we want to measure the
probability of the events in F . The pair (Ω , F ) is called a measurable space,
on which we can define a probability measure. A set function is a real-valued function
defined on some class of subsets of Ω . A set function ν on a σ -field F of subsets of Ω is a
measure if it satisfies the following conditions.
Definition 2.6. (Measure) Let (Ω , F ) be a measurable space. A set function ν defined on F is called a measure if it has the following properties.
1. 0 ≤ ν(A) ≤ ∞ for any A ∈ F ;
2. ν(∅) = 0; and
3. (Countable additivity) If Ai ∈ F for i = 1, 2, · · · , and the Ai ’s are disjoint, i.e.
Ai ∩ A j = ∅ for any i ≠ j, then
\[ \nu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \nu(A_i). \]
If a measure ν satisfies all the three conditions in Definition 2.6 and ν(Ω ) = 1,
then ν is called a probability measure. We usually denote a probability measure by
P instead of ν. In this case, (Ω , F , P) is called a probability space. The support of
P is defined to be the minimal set A in F such that P(A) = 1.
The following measures are very important in probability and statistics.
Example 2.13. (Counting Measure)
Let (Ω , F ) be a measurable space. The counting measure is a set function N(A) on
F that gives the number of events that occur in A ∈ F (N(A) = ∞ if A contains
infinitely many events).
For example, let Ω = [0, 60] be a time interval of 60 minutes during which customers
arrive randomly. Then the counting measure N is a set function N : B[0,60] →
{0, 1, 2, . . .} such that N(I) = number of customers arriving in the interval I. Suppose that four customers arrive at times 21, 23, 39 and 58, respectively. Then N((0, 20)) =
0, N((0, 22)) = 1, N((22, 40)) = 2, N((0, 60)) = 4, etc.
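The customer-arrival example can be reproduced with a few lines of Python. This is a minimal sketch restricted to open intervals (a, b); the function name `counting_measure` is an illustrative choice.

```python
arrivals = [21, 23, 39, 58]   # arrival times from the example, in minutes


def counting_measure(a, b, points=arrivals):
    """N((a, b)): number of arrival times in the open interval (a, b)."""
    return sum(1 for t in points if a < t < b)
```

For instance, `counting_measure(22, 40)` counts the arrivals at times 23 and 39. Additivity over disjoint intervals can be seen from `counting_measure(0, 22) + counting_measure(22, 60)`, which equals `counting_measure(0, 60)` because no customer arrives exactly at time 22.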
Example 2.14. (Lebesgue Measure)
Lebesgue measure is the measure λ on (R, B), where B is the Borel σ -field. It
satisfies λ ([a, b]) = b − a for every finite interval [a, b], where −∞ < a < b < ∞. In
this special case, λ (·) is also known as length function l(·). If we restrict λ to the
measurable space ([0, 1], B[0,1] ), then λ is a probability measure. Another name for
the probability space ([0, 1], B[0,1] , λ ) is the uniform distribution on [0,1].
Example 2.15. (Lebesgue-Stieltjes Measure). Let F : R → R be a right-continuous,
non-decreasing function (e.g., any probability distribution function). The Lebesgue-Stieltjes measure on R associated with F is the measure on (R, B), denoted by λF ,
such that for −∞ < a < b < ∞,
λF ((a, b]) = F(b) − F(a) .
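The defining identity λF((a, b]) = F(b) − F(a) is easy to evaluate once F is fixed. The sketch below is a minimal Python illustration; the exponential distribution function `F_exp` is an assumed example of F, not one taken from the text.

```python
import math


def F_exp(x, rate=1.0):
    """Distribution function of an exponential law (illustrative choice of F)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def ls_measure(F, a, b):
    """Lebesgue-Stieltjes measure of the half-open interval (a, b], a < b."""
    return F(b) - F(a)
```

Taking F(x) = x recovers the length b − a, so Lebesgue measure is the special case of the identity function. Half-open intervals make additivity transparent: the measures of (0, 1] and (1, 2] add up to the measure of (0, 2].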
In fact, Lebesgue measure is originally defined on a σ -field M that is larger
than B. The detailed construction of Lebesgue measure is outside the scope of this
book; we only outline the steps and refer the reader to standard textbooks on measure
theory.
Construction of Lebesgue Measure
1. Define the Lebesgue outer measure for any set A ⊂ R by
\[ m^*(A) = \inf\left\{ \sum_{n=1}^{\infty} l(I_n) : A \subseteq \bigcup_{n=1}^{\infty} I_n,\ I_n \text{ are intervals} \right\}, \]
which is the infimum of the total length over all countable interval covers of A. However, m∗ is not really a
measure as it is not countably additive.
2. Define the class M of Lebesgue-measurable sets as follows: the set E is in
M if for every set A ⊆ R, we have
m∗ (A) = m∗ (A ∩ E) + m∗ (A ∩ E^c ) .
In this case we say that E is Lebesgue-measurable, or E ∈ M .
3. Verify that the class of sets M is a σ -field.
4. Verify that if we restrict m∗ (·) to sets in M , then m∗ (·) satisfies the definition of a measure (Definition 2.6).
5. Lebesgue measure λ is then defined as the outer measure m∗ (·) restricted to the σ -field M . The measure space is denoted by (R, M , λ ).
2.3 Measurability of Sets
This section considers some examples to motivate the need for using σ -field and
measure theory to quantify the size of a set. We focus on the probability space
([0, 1], B[0,1] , P), where P = λ is the Lebesgue measure. In this case, the probability
of an event E ∈ B[0, 1] is equivalent to the “length” of the set E.
2.3.1 Singleton
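The set consisting of a single element, such as
\[ E_1 := \left\{ \tfrac{1}{3} \right\}, \]
is known as a singleton. Obviously, the length, or the measure, of E1 is 0. Formally,
we use the following argument: for any sufficiently small ε > 0, $P(E_1) \le P\left(\left[\tfrac{1}{3} - \tfrac{\varepsilon}{2}, \tfrac{1}{3} + \tfrac{\varepsilon}{2}\right]\right) = \varepsilon$. Since ε
is arbitrary, P(E1 ) = 0.
In general, a set with “zero length” is known as a null set.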
Definition 2.7. (Null set) A null set A ⊆ R is a set that can be covered by a sequence of intervals of arbitrarily small total length, i.e. given any ε > 0 we can find
a sequence {In : n ≥ 1} of intervals such that $A \subseteq \bigcup_{n=1}^{\infty} I_n$ and $\sum_{n=1}^{\infty} l(I_n) < \varepsilon$.
Remark 2.2. The empty set is null, but a null set need not be empty (e.g. E1 ).
Remark 2.3. Nullity is a property with respect to measure. Therefore, to be precise,
E1 is a Lebesgue-Null set, or simply λ -Null set. For example, let λ[a,b] (·) be the
Lebesgue measure restricted on [a, b] defined by λ[a,b] (B) = λ (B ∩ [a, b]) for any
B ∈ B. Then (2, 3) is a λ[0,1] -Null set but not a λ[0,5] -Null set.
2.3.2 Open Intervals
Strictly speaking, Lebesgue measure in Example 2.14 is specified on closed intervals.
However, whether the boundary is closed or open is irrelevant. To find the measure
of the open interval
\[ E_2 := \left(0, \tfrac{1}{3}\right), \]
we use the additivity property of measure,
\[ P\left(\left[0, \tfrac{1}{3}\right]\right) = P(\{0\}) + P(E_2) + P\left(\left\{\tfrac{1}{3}\right\}\right). \]
Since P is known on the closed interval and on the singletons, we have $P(E_2) = \tfrac{1}{3} - 0 - 0 = \tfrac{1}{3}$.
2.3.3 Rational Numbers
We have seen that a singleton, and hence any finite union of singletons, has Lebesgue measure 0.
What if we have infinitely many singletons, such as
E3 := [0, 1] ∩ Q ,
the set of all rational numbers in the interval [0, 1]?
Surprisingly, the set E3 is a Lebesgue null set although it contains infinitely many
elements. To show that P(E3 ) = 0, from Definition 2.7, we need to construct a cover
of E3 with arbitrarily small total length. Since E3 is a countable set, it can be arranged
in the form E3 = {x1 , x2 , · · · }. Fix ε > 0 and construct a cover of E3 by
the following sequence of intervals.
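\[
\begin{aligned}
I_1 &= \left(x_1 - \frac{\varepsilon}{8},\ x_1 + \frac{\varepsilon}{8}\right) \\
I_2 &= \left(x_2 - \frac{\varepsilon}{16},\ x_2 + \frac{\varepsilon}{16}\right) \\
I_3 &= \left(x_3 - \frac{\varepsilon}{32},\ x_3 + \frac{\varepsilon}{32}\right) \\
&\ \ \vdots \\
I_n &= \left(x_n - \frac{\varepsilon}{2^{n+2}},\ x_n + \frac{\varepsilon}{2^{n+2}}\right)
\end{aligned}
\]
Observe that $l(I_n) = \frac{\varepsilon}{2} \cdot \frac{1}{2^n}$. Since $\sum_{n=1}^{\infty} \frac{1}{2^n} = 1$, we have $\sum_{n=1}^{\infty} l(I_n) = \frac{\varepsilon}{2} < \varepsilon$ as
needed.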
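The total length of the cover can be checked with exact rational arithmetic. The following is a minimal Python sketch; the function name `cover_lengths`, the value of ε and the number of terms are illustrative choices.

```python
from fractions import Fraction


def cover_lengths(eps, n_terms):
    """Lengths l(I_n) = eps / 2**(n + 1) of the first n_terms covering intervals."""
    return [Fraction(eps) / 2 ** (n + 1) for n in range(1, n_terms + 1)]


eps = Fraction(1, 10)
partial = sum(cover_lengths(eps, 30))   # total length of I_1, ..., I_30
```

Every partial sum stays strictly below ε/2, and the gap to ε/2 after N terms is exactly ε/2^(N+1), which is the geometric-series argument used above.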
Although the elements of E3 are dense in [0, 1], the facts
that P([0, 1]) = 1 and P(E3 ) = 0 show that, in the sense of measure, there are far more irrational numbers in [0, 1] than
rational numbers.
2.3.4 Cantor Set
Not only countable sets but also uncountable sets can be null. The Cantor set is a
null uncountable set, which can be obtained by the following construction:
Define the interval
C0 = [0, 1].
Remove the segment (1/3, 2/3), and let C1 be the union of the remaining segments,
\[ C_1 = \left[0, \tfrac{1}{3}\right] \cup \left[\tfrac{2}{3}, 1\right]. \]
Remove the middle thirds of these intervals, and let C2 be the union of the remaining intervals,
\[ C_2 = \left[0, \tfrac{1}{9}\right] \cup \left[\tfrac{2}{9}, \tfrac{3}{9}\right] \cup \left[\tfrac{6}{9}, \tfrac{7}{9}\right] \cup \left[\tfrac{8}{9}, 1\right]. \]
Continuing in this way, we obtain a sequence of sets Cn such that C1 ⊃ C2 ⊃ C3 ⊃ · · ·
and Cn is the union of 2^n intervals, each of length 3^{−n} . The limit
\[ E_4 = \bigcap_{n=1}^{\infty} C_n = \bigcap_{m=1}^{\infty} \bigcap_{k=0}^{3^{m-1}-1} \left( \left[0, \frac{3k+1}{3^m}\right] \cup \left[\frac{3k+2}{3^m}, 1\right] \right) \]
is defined as the Cantor set.
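The first few stages Cn can be generated exactly with rational endpoints. This is a minimal Python sketch of the middle-third removal; the function names `cantor_step` and `cantor_level` are illustrative.

```python
from fractions import Fraction


def cantor_step(intervals):
    """Remove the open middle third of each closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out += [(a, a + third), (b - third, b)]
    return out


def cantor_level(n):
    """Return C_n as a list of (left, right) endpoints, starting from C_0 = [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = cantor_step(intervals)
    return intervals


C2 = cantor_level(2)
total_length = sum(b - a for a, b in cantor_level(5))   # equals (2/3)**5
```

Each step doubles the number of intervals and divides their length by three, so the total length of Cn is (2/3)^n, which tends to 0 as n grows.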
The Cantor set E4 has uncountably many elements. To see this, note that any x ∈
[0, 1] can be expressed in ternary form: $x = \sum_{k=1}^{\infty} \frac{a_k}{3^k} = 0.a_1 a_2 \cdots$ with ak = 0, 1, 2.
Note that x ∈ E4 if and only if x has a ternary expansion in which every ak equals 0 or 2. The uncountability of E4 follows
from the same argument as in Example 2.6.
Next we show that E4 is a P-null set. Given any ε > 0, choose n so large that
$\left(\tfrac{2}{3}\right)^n < \varepsilon$. Since E4 ⊆ Cn and Cn is a union of 2^n intervals of length 3^{−n} , we have
P(E4 ) ≤ P(Cn ) = (2/3)^n < ε. Since ε is arbitrary, P(E4 ) = 0.
2.3.5 Non-measurable Set
The previous examples involve sets that can be measured by union or intersection
operations of intervals. In other words, those sets are Borel measurable. However,
the variety of subsets in [0, 1] can be far more complicated. In this section we define a
subset of [0, 1], E5 , which is not measurable. Therefore, we cannot assign probability
to such an event.
First we introduce the axiom of choice, which is a fundamental set-theoretical
axiom for the construction of E5 .
Axiom 2.3.1 (The axiom of choice) Suppose that A = {Aα : α ∈ Γ } is a nonempty collection, indexed by some set Γ , of non-empty disjoint subsets of Ω . There
then exists a set E ⊂ Ω which contains precisely one element from each of the sets
Aα .
Example 2.16. Let Γ = {1, 2, . . . , n} and Aα = [α − 0.1, α + 0.1]. Then it is easy
to see that we can take α − 0.05 ∈ Aα for α = 1, . . . , n to form the set E =
{0.95, 1.95, . . . , n − 0.05}. However, if Γ is an uncountable set, say R, then we cannot enumerate the elements in Γ , and hence it is hard to imagine how to obtain
precisely one element from each Aα .
To construct E5 , first we partition [0, 1] by the following grouping: we say x ∼ y,
or x and y are in the same group, say Aα , if y − x ∈ Q. Hence [0, 1] is partitioned into
disjoint equivalence classes {Aα , α ∈ Γ }, for some Γ , such that [0, 1] = ∪α∈Γ Aα .
By construction, for each α, any two elements x, y ∈ Aα differ by a rational. On
the other hand, elements of different classes always differ by an irrational. Note that
each Aα is countable, since Q is, but there are uncountably many different classes,
since [0, 1] is uncountable.
Now use the axiom of choice to construct a new set E5 ⊂ [0, 1] which contains
exactly one member aα from each of the Aα . Now enumerate the rationals in [−1, 1]
as a sequence {qn } (we can do this as the rationals are countable). Define a sequence of
translates of E5 by Sn = E5 + qn .
Note the following properties from the construction:
1. There are countably many Sn , and each Sn contains uncountably many elements.
2. If x, y ∈ E5 and x ≠ y, then x − y is an irrational number.
3. If E5 is Lebesgue-measurable, then so is each Sn , and λ (Sn ) = λ (E5 ) for all n.
4. The sets in the sequence {Sn } are disjoint:
• Suppose that z ∈ Sm ∩ Sn for some m ≠ n. Then we can write aα + qm = z =
aβ + qn for some aα , aβ ∈ E5 , and their difference aα − aβ = qn − qm ∈ Q is nonzero because qm ≠ qn .
This contradicts Property 2, and thus no such z exists.
Now, we have $[0, 1] \subset \bigcup_{n=1}^{\infty} S_n \subset [-1, 2]$ and λ (Sn ) = λ (E5 ) for all n. By countable additivity and monotonicity of λ this implies:
\[ 1 = \lambda([0, 1]) \le \lambda\left(\bigcup_{n=1}^{\infty} S_n\right) = \sum_{n=1}^{\infty} \lambda(S_n) = \lambda(E_5) + \lambda(E_5) + \cdots \le \lambda([-1, 2]) = 3. \]
This is clearly impossible since the sum λ (E5 ) + λ (E5 ) + · · · is either 0 or ∞. Hence,
we have proved by contradiction that E5 is not measurable. In other words, we cannot
assign a length (or probability) to E5 by the Lebesgue measure.
This example shows the importance of specifying the probability space (Ω , F , P):
since there are complicated sets that cannot be measured by our common-sense notion of length based on
intervals, we cannot assign probabilities to every set in the measurable space (R, 2^R ). Thus, it
is necessary to restrict our consideration to a collection of sets that are “nice” enough
for practical purposes, which is (R, B), or (Ω , F ) in general. The key idea is that
F is generated by countable operations on measurable sets, so we are always
able to assign probabilities to the elements of F .
2.4 Random Variables
In elementary courses on probability, a random variable is defined simply as a variable X that follows some probability distribution such as the normal distribution.
Here we give a more formal definition.
Definition 2.8. If F is a σ -field on Ω , then a function X : Ω → R is said to be
F -measurable if the inverse image X −1 (B) ∈ F for every Borel set B in R, i.e.
B ∈ B. If (Ω , F , P) is a probability space, then such a function X is called a
random variable (r.v.).
Remark 2.4. Note that the random variable X can be defined on a measurable space,
without specifying a probability measure. For notational simplicity, we use {X ∈ B}
to represent the set X^{−1}(B) = {ω ∈ Ω | X(ω) ∈ B}. If B = (−∞, a], then X^{−1}(B) is
written {X ≤ a}.
Remark 2.5. Intuitively, the requirement X −1 (B) ∈ F means that for any “reasonable” values that X may take (i.e., B), one can find the associated event (X −1 (B) ⊂
Ω ) contained in F . Hence we can assign probability to the event {X ∈ B} once a
probability measure defined on F is available.
Example 2.17. Some examples of random variables in each of the sample space Ω
in Example 2.7 are as follows:
a) Suppose that you are pointing your finger randomly onto some position of a
one-meter ruler. We have Ω = {The finger points to x| x ∈ [0, 1]} and denote the
elements in Ω by ωx = {The finger points to x}. Then the r.v. nearest integer is
given by X1 : Ω → {0, 1} with X1 (ωx ) = [x]. The r.v. the value shown on the
ruler is given by X2 : Ω → [0, 1] with X2 (ωx ) = x. Although it seems that the
domain and image of X2 are essentially the same, we should always think of Ω
being what happens in the physical world and X being the value that quantifies
some aspects of the physical world.
b) When a die is thrown once, Ω = {The face of the die is i | i = 1, 2, 3, 4, 5, 6}. If
the elements in Ω are denoted by ωi = {The face of the die is i}, then the r.v. It
is Big is given by X1 : Ω → {0, 1} with X1 (ωi ) = 1{i≥4} . The r.v. value of the
die is X2 : Ω → {1, 2, . . . , 6} with X2 (ωi ) = i.
c) For n tosses of a coin, Ω = {(ω1 , ω2 , · · · , ωn ) : ωi = H or T }. The r.v. number of
heads is a function X : Ω → {0, 1, . . . , n}. It is cumbersome to write down an
explicit formula for X. However, in practice we are only interested in the values
that X takes, thus it is not important to be explicit about how X maps from Ω
to {0, 1, . . . , n}.
d) For the position of a particle with Ω = R^3 , we may still define the r.v. the position of the particle by X : Ω → R^3 . Again, although X is an identity function,
we should think of Ω as describing the physical situation that the particle is lying
somewhere and X as how we use our coordinate system to state its location. In
fact, X is called a random vector as it takes values in R^3 instead of R.
e) If Ω is an abstract space describing the whole operation of the company, the r.v.
the price of a stock of the company is given by X : Ω → R+ . It is impossible to
write down a formula for X. However, as mentioned in c), it is not important to
know the explicit formula of X.
Remark 2.6. We require that X^{−1}(B) ∈ F (i.e., {ω : X(ω) ∈ B} ∈ F ) for X to be
a r.v. Intuitively, we should be able to assign probability to the event where the
random variable takes proper values. For example, consider Ω = {1, 2, . . . , 6}, F1 =
2^Ω , F2 = {∅, {1, 2, 3}, {4, 5, 6}, Ω }, and let X1 and X2 be functions from Ω to R such that
X1 (ω) = 1{ω≥4} and X2 (ω) = ω. Then, X1 is a r.v. on both (Ω , F1 ) and (Ω , F2 ).
However, X2 is a r.v. on (Ω , F1 ) but not on (Ω , F2 ), since X2^{−1}({4}) = {4} ∉ F2 .
Hence, one needs to associate Ω with an appropriate σ -field to make the definition of r.v. complete. In general, the following definition provides a simple and
natural way to construct such a σ -field.
Definition 2.9. (σ -field generated by a random variable.) The σ -field generated
by a r.v. X : Ω → R is defined as
σ (X) = {X −1 (B)|B ∈ B} .
It is the smallest σ -field such that X is Borel measurable.
Example 2.18. Consider Ω = {1, 2, . . . , 6}, X1 (ω) = 1{ω≥4} , X2 (ω) = ω and X3 (ω) =
1{ω is an even integer} . The σ -fields generated by X1 , X2 and X3 are respectively
F1 = {∅, {1, 2, 3}, {4, 5, 6}, Ω }, F2 = 2^Ω and F3 = {∅, {1, 3, 5}, {2, 4, 6}, Ω }.
On the other hand, if Ω = R, X1 (ω) = 1{ω≥4} , X2 (ω) = ω and X3 (ω) =
1{ω is an even integer} , then the σ -fields generated by X1 , X2 and X3 are respectively
F1 = {∅, (−∞, 4), [4, ∞), R}, F2 = B and F3 = {∅, R\{2k : k ∈ Z}, {2k : k ∈
Z}, R}.
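On a finite sample space, σ(X) consists of all unions of the level sets {X = v}, so it can be enumerated directly. The sketch below is a minimal Python illustration of Definition 2.9 and of the measurability check in Remark 2.6; the helper names `sigma_of` and `is_measurable` and the variable `small_large` are hypothetical.

```python
from itertools import combinations


def sigma_of(omega, X):
    """sigma(X) on a finite omega: all unions of the level sets {X = v}."""
    levels = {}
    for w in omega:
        levels.setdefault(X(w), set()).add(w)
    atoms = [frozenset(s) for s in levels.values()]
    return {frozenset().union(*combo)
            for r in range(len(atoms) + 1)
            for combo in combinations(atoms, r)}


def is_measurable(omega, X, family):
    """X is measurable with respect to `family` iff sigma(X) is contained in it."""
    return sigma_of(omega, X) <= {frozenset(s) for s in family}


omega = range(1, 7)
X1 = lambda w: int(w >= 4)      # indicator of {4, 5, 6}
X2 = lambda w: w                # identity
small_large = [set(), {1, 2, 3}, {4, 5, 6}, set(omega)]
```

Here `sigma_of(omega, X1)` reproduces the four-element σ-field {∅, {1, 2, 3}, {4, 5, 6}, Ω}, while `sigma_of(omega, X2)` is the full power set with 64 elements, so X2 is not measurable with respect to `small_large`.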
Recall that the sample space Ω is regarded as what happens in the physical world
and is seldom taken into account in probabilistic calculations. Instead, the probabilistic calculations are mainly based on the induced probability measure:
Definition 2.10. (Induced Probability Measure) Every random variable X : Ω →
R induces a probability measure PX on R defined by PX (B) = P{X ∈ B} for each
Borel set B in R. We call PX the distribution of X. The function FX : R → [0, 1]
defined by FX (x) = P{X ≤ x} is called the distribution function of X.
Remark 2.7. On the probability space (Ω , F , P), we use P(ω : X(ω) ∈ B) to measure the probability of an event B ∈ B. It is equivalent to using PX (B) on the probability space (R, B, PX ). This explains why we do not need to be explicit about the
sample space Ω and how the random variable X maps from Ω to R in Example
2.17.
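The passage from (Ω, F, P) to the induced measure PX can be made concrete for the coin-tossing example. The following minimal Python sketch assumes fair, independent tosses and n = 3; both are illustrative assumptions, as are the variable names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

n = 3
omega = list(product("HT", repeat=n))            # sample space with 2**n outcomes
P = {w: Fraction(1, 2 ** n) for w in omega}      # assumed: fair, independent tosses


def X(w):
    """Number of heads in the outcome w."""
    return w.count("H")


# Induced measure: P_X({k}) = P{X = k}, accumulated outcome by outcome.
P_X = Counter()
for w in omega:
    P_X[X(w)] += P[w]


def F_X(x):
    """Distribution function F_X(x) = P{X <= x}."""
    return sum(p for k, p in P_X.items() if k <= x)
```

Once `P_X` is known, probabilities such as P{X ≤ 1} can be computed from it alone, without referring back to the eight outcomes in `omega`.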
2.5 Lebesgue Integral
A measure describes the length of a set. An integral gives the area under a curve
over a set, which is a generalization of measure. In probability theory, measure is
related to probability and integral is related to taking expectation. In this section, we
discuss how the Lebesgue measure can be used to define integrals that generalize the
Riemann integral. First we recall the definition of the Riemann integral.
Definition 2.11. (Step Function.) A step function is a function f : [a, b] → R such
that f (x) = ai if xi−1 ≤ x < xi , where a = x0 < x1 < · · · < xn = b.
The Riemann integral uses rectangles to approximate the area under a curve. For
a step function, the Riemann integral is given by $\int_a^b f(x)\,dx = \sum_{i=1}^{n} a_i \Delta x_i$, where
∆ xi = xi − xi−1 is the length of the subinterval [xi−1 , xi ]. For an arbitrary function f ,
the Riemann integral of f is defined as follows.
Definition 2.12. (Riemann Integral) A finite set P = {a0 , a1 , . . . , an } satisfying
a = a0 < a1 < a2 < · · · < an = b is said to be a partition of [a, b]. Let ∆ ai = ai − ai−1 ,
\[ M_i = \sup_{a_{i-1} \le x \le a_i} f(x) \quad\text{and}\quad m_i = \inf_{a_{i-1} \le x \le a_i} f(x). \]
Then the upper and lower Riemann sums corresponding to the partition P are defined
by
\[ U_R(P, f) = \sum_{i=1}^{n} M_i \Delta a_i \quad\text{and}\quad L_R(P, f) = \sum_{i=1}^{n} m_i \Delta a_i. \]
If the supremum of LR (P, f ) and the infimum of UR (P, f ) (taken over all possible
partitions P) are equal, then the common value is called the Riemann integral of
the function f , denoted by $\int_a^b f(x)\,dx$.
However, there exist some functions that are not Riemann integrable.
Example 2.19. (Non-Riemann Integrable function) Let f (x) = 1{x∈Q[0,1] } , where
Q[0,1] is the set of rational numbers in [0, 1]. Note that any interval (ai−1 , ai ) contains both rational and irrational numbers. It follows that Mi = 1 and mi = 0 for
any i and any partition P of [0, 1] (see Definition 2.12). Thus, infP UR (P, f ) = 1 ≠ 0 =
supP LR (P, f ) and the Riemann integral of f does not exist.
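For contrast with the non-integrable example, the upper and lower Riemann sums of a well-behaved function can be computed exactly. The sketch below is a minimal Python illustration for a nondecreasing f on a uniform partition, where the supremum and infimum on each subinterval are attained at the right and left endpoints; the function name `riemann_sums` and the choice f(x) = x² on [0, 1] are assumptions made for illustration.

```python
from fractions import Fraction


def riemann_sums(f, a, b, n):
    """Upper and lower Riemann sums on a uniform partition of [a, b].

    Assumes f is nondecreasing, so M_i = f(a_i) and m_i = f(a_{i-1}).
    """
    pts = [a + (b - a) * Fraction(i, n) for i in range(n + 1)]
    upper = sum(f(pts[i]) * (pts[i] - pts[i - 1]) for i in range(1, n + 1))
    lower = sum(f(pts[i - 1]) * (pts[i] - pts[i - 1]) for i in range(1, n + 1))
    return upper, lower


upper, lower = riemann_sums(lambda x: x * x, 0, 1, 100)
```

For this f the gap between the two sums is exactly (f(1) − f(0))/n, so it shrinks to 0 as the partition is refined and both sums squeeze the value 1/3. For the indicator of the rationals the gap stays equal to 1 for every partition, which is why no such computation can succeed there.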
The problem with Riemann integration is that it decomposes the integration domain
(x-axis) into small parts, but the function may vary a lot along the y-axis within each part. Lebesgue
integration looks at integration in the opposite way: it decomposes the range
(y-axis) into small parts.
Definition 2.13. (Simple function) A non-negative function f : R → R is a simple function if the range of f is a finite set of distinct non-negative real numbers
{a1 , . . . , an } such that for all i = 1, . . . , n, the set
Ai = f^{−1}({ai }) = {x : f (x) = ai }
is measurable.
Note that the Ai s are pairwise disjoint and ∪ni=1 Ai = R.
While Riemann integration approximates the area under a function by step functions,
Lebesgue integration performs the approximation by simple functions.
Definition 2.14. (Lebesgue Integral) Consider a partition of the range of f by
P = {a0 , a1 , . . . , an } such that −∞ < a0 < a1 < a2 < · · · < an < ∞. Let Ai = [ai−1 , ai ).
Then the partition P gives the upper and lower Lebesgue sums
\[ U_\lambda(P, f) = \sum_{i=1}^{n} a_i \lambda(f^{-1}(A_i)) \quad\text{and}\quad L_\lambda(P, f) = \sum_{i=1}^{n} a_{i-1} \lambda(f^{-1}(A_i)). \]
If the supremum of Lλ (P, f ) and the infimum of Uλ (P, f ) (taken over all possible
partitions P) are equal, then the common value is called the Lebesgue integral of
the function f , denoted by $\int f\,d\lambda$.
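The upper and lower Lebesgue sums can be computed by hand when the preimages f⁻¹(Ai) are known. The sketch below is a minimal Python illustration for f(x) = x² with the domain taken to be [0, 1], an assumption made here so that all preimages have finite measure. The range is cut at the levels ai = i/n, and f⁻¹([ai−1, ai)) = [√ai−1, √ai) has Lebesgue measure √ai − √ai−1. The single point x = 1, where f = 1, has measure zero and is ignored.

```python
import math


def lebesgue_sums(n):
    """Upper and lower Lebesgue sums for f(x) = x**2 on [0, 1], levels a_i = i/n."""
    upper = lower = 0.0
    for i in range(1, n + 1):
        a_prev, a_i = (i - 1) / n, i / n
        # Measure of the preimage f^{-1}([a_prev, a_i)) = [sqrt(a_prev), sqrt(a_i)).
        measure = math.sqrt(a_i) - math.sqrt(a_prev)
        upper += a_i * measure
        lower += a_prev * measure
    return upper, lower


upper, lower = lebesgue_sums(1000)
```

Because the preimages partition [0, 1), their measures add up to 1, so the gap between the two sums is 1/n regardless of how f behaves. Both sums bracket 1/3, in agreement with the Riemann integral of the same function.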
Remark 2.8. When the partition P becomes finer, the ai and ai−1 in the Lebesgue integral converge to each other while Mi and mi in the Riemann integral may not. Thus
Lebesgue integration is a better device for defining an integral. In fact, it can be
shown that if f is Riemann integrable, then f is Lebesgue integrable and the two integrals are the same (Exercise 2.20). The converse is not true, as the following example
shows.
Example 2.20. Recall that the function f (x) = 1{x∈Q[0,1] } in Example 2.19 is not
Riemann integrable. From the definition of the Lebesgue integral,
\[ \int f\,d\lambda = 1 \times \lambda(\mathbb{Q}_{[0,1]}) + 0 \times \lambda(\mathbb{R} \setminus \mathbb{Q}_{[0,1]}) = 0, \]
since λ (Q[0,1] ) = 0. Thus f is Lebesgue integrable.
Remark 2.9. A real function f : R → R is said to be Lebesgue measurable if
f^{−1}(I) ∈ M for each interval I ⊂ R .    (2.1)
From Definition 2.14, in order to define the Lebesgue integral $\int f\,d\lambda$, f has to be a
Lebesgue measurable function. For example, f (x) = 1{x∈E5 } is not Lebesgue integrable (E5 is the non-measurable set defined in Section 2.3.5). On the other hand,
a real function f : R → R is said to be Borel measurable if f^{−1}(I) ∈ B for each
interval I ⊂ R. Since B ⊂ M , the Lebesgue integral can always be defined for Borel
measurable functions, provided that the integral is finite.
2.6 Lebesgue integration and Expectation
In the previous section we assumed that the integrands in the Lebesgue integral are
Lebesgue measurable real functions (functions that satisfy (2.1)) . Although a r.v.
X : Ω → R, defined on (Ω , F , P), is not mapping from R to R, the property that
X −1 (B) ∈ F for every B ∈ B is somewhat analogous to (2.1). Thus, the expectation
of a random variable can be formulated rigorously using Lebesgue Integral.
Consider first a discrete random variable X : Ω → R taking discrete values
x1 , x2 , . . .. The integral of X under the probability measure P is defined as
\[ \int_\Omega X\,dP = \sum_{i=1}^{\infty} x_i P\{X = x_i\}, \]
where P is the probability measure on Ω . Note that we partition Ω into disjoint
subsets (i.e. the sets {X = xi }), and then on each subset we multiply the value of the function
by the size of that subset (i.e. form the product xi P{X = xi }). The integral is then the
sum of all such products.
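For an arbitrary random variable X, we can approximate X by two discrete
random variables
\[ \overline{X}_{\mathcal{P}}(\omega) = \sum_{i=1}^{n} x_i \mathbf{1}_{\{\omega \in A_i\}} \quad \text{and} \quad \underline{X}_{\mathcal{P}}(\omega) = \sum_{i=1}^{n} x_{i-1} \mathbf{1}_{\{\omega \in A_i\}}, \]
where $A_i = X^{-1}([x_{i-1}, x_i))$ and $\mathcal{P} = \{x_0, x_1, \ldots, x_n\}$ is a partition on the range of X. The
integral of X, denoted by
\[ \int_\Omega X(\omega) \, P(d\omega) \quad \text{or simply} \quad \int_\Omega X \, dP, \]
is then obtained in the same way as in Definition 2.14. To be specific, let
\[ U_P(\mathcal{P}, X) = \sum_{i=1}^{n} x_i \, P(A_i) \quad \text{and} \quad L_P(\mathcal{P}, X) = \sum_{i=1}^{n} x_{i-1} \, P(A_i). \]
Then $\int_\Omega X \, dP$ is defined to be $\sup_{\mathcal{P}} L_P(\mathcal{P}, X)$ or $\inf_{\mathcal{P}} U_P(\mathcal{P}, X)$, if the latter two quantities exist and are equal.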
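As a rough numerical illustration of the upper and lower sums, the sketch below takes the assumed example X(ω) = ω² on [0, 1] with Lebesgue measure (this choice is ours, not the text's). The range [0, 1] is cut into n equal cells, so that A_i = [√x_{i−1}, √x_i) and P(A_i) = √x_i − √x_{i−1}; the single point ω = 1 is left out of every A_i, which does not matter because it is a null set.

```python
import math


def upper_lower_sums(n):
    """Lower and upper sums for X(w) = w**2 on [0, 1] with Lebesgue measure.

    The range of X is partitioned into n equal cells [x_{i-1}, x_i).
    """
    xs = [i / n for i in range(n + 1)]                     # partition points x_0, ..., x_n
    lower = upper = 0.0
    for i in range(1, n + 1):
        p_ai = math.sqrt(xs[i]) - math.sqrt(xs[i - 1])     # P(A_i)
        upper += xs[i] * p_ai                              # x_i     * P(A_i)
        lower += xs[i - 1] * p_ai                          # x_{i-1} * P(A_i)
    return lower, upper


for n in (10, 100, 1000):
    lo, up = upper_lower_sums(n)
    print(n, lo, up)    # both sums squeeze the value 1/3 as n grows
```

Since the cells have equal width 1/n and the P(A_i) sum to 1, the gap between the two sums is exactly 1/n, so refining the partition forces them to a common value.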
Specifically, for any F ∈ F, the integral of X over F is defined by
\[ \int_F X \, dP = \int_\Omega X \mathbf{1}_F \, dP. \]
We have defined the integral of a r.v. with respect to a probability measure P. In
particular, if the integral is finite, we call it the expectation.
Definition 2.15. (R.V. in $L^1$ and Expectation) A random variable X : Ω → R is
said to be integrable (or in $L^1$) if
\[ \int_\Omega |X| \, dP < \infty. \]
If $X \in L^1$, then $E(X) = \int_\Omega X \, dP$ exists and is called the expectation of X. In general, if h : R → R is a Borel measurable function and $\int_\Omega |h(X)| \, dP < \infty$, then
\[ E(h(X)) = \int_\Omega h(X) \, dP \]
is called the expectation of h(X).
Remark 2.10. Recall from Definition 2.10 that a random variable X induces a probability measure $P_X$ on (R, B). Thus the expectation of h(X) can be equivalently
expressed as either
\[ E(h(X)) = \int_\Omega h(X) \, dP \quad \text{or} \quad \int_{\mathbb{R}} h(x) \, dP_X(x). \]
Moreover, for any B ∈ B, we have
\[ \int_{X^{-1}(B)} h(X) \, dP = \int_B h(x) \, dP_X(x). \]
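On a finite sample space both identities in Remark 2.10 reduce to regrouping a finite sum, which the following sketch checks with exact fractions. The space, the weights, the r.v. X, the function h and the set B are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite probability space with non-uniform weights.
prob = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
X = {"a": -1, "b": 1, "c": 2, "d": -1}          # the r.v. X as a lookup table


def h(x):
    """A Borel function, here h(x) = x^2."""
    return x * x


# Induced measure P_X({x}) = P(X^{-1}({x})) on the real line.
law = defaultdict(Fraction)
for w, p in prob.items():
    law[X[w]] += p

B = {-1, 2}                                      # a (finite) Borel set

over_omega = sum(h(X[w]) * p for w, p in prob.items())             # integral over Omega w.r.t. P
over_line = sum(h(x) * q for x, q in law.items())                  # integral over R w.r.t. P_X
over_preimage = sum(h(X[w]) * p for w, p in prob.items() if X[w] in B)  # over X^{-1}(B)
over_B = sum(h(x) * q for x, q in law.items() if x in B)           # over B w.r.t. P_X

print(over_omega, over_line)      # the two expressions for E(h(X)) agree
print(over_preimage, over_B)      # and so do the restricted integrals
```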
Definition 2.16. (Square Integrable Function, $L^2$ and Variance) A random
variable X : Ω → R is called square integrable (or in $L^2$) if
\[ \int_\Omega |X|^2 \, dP < \infty. \]
Then the variance of X can be defined by
\[ \mathrm{Var}(X) = \int_\Omega (X - E(X))^2 \, dP. \]
A statement about a r.v. X is said to hold almost surely (a.s.) if X(ω) satisfies the statement for all ω ∈ A where A is some event of probability 1 (i.e.,
P(A) = 1). For example, let X : [0, 1] → R be a r.v. on $([0, 1], \mathcal{B}_{[0,1]}, \lambda_{[0,1]})$ such
that $X(\omega) = \mathbf{1}_{\mathbb{Q}}(\omega)$. We can say that X = 0 a.s., since X(ω) = 0 for ω ∈ [0, 1]\Q
and $\lambda_{[0,1]}([0, 1] \setminus \mathbb{Q}) = 1$. Note that A need not be Ω, but it can be Ω\N for some
null set N. This motivates the term “almost surely”. Finally we mention three useful
results.
Theorem 2.1. (Fatou's Lemma) If $X_n \ge 0$ a.s., then
\[ \int \liminf_{n\to\infty} X_n \, dP \le \liminf_{n\to\infty} \int X_n \, dP. \]
Theorem 2.2. (Dominated Convergence Theorem (DCT)) If $\lim_{n\to\infty} X_n = X$ a.s.
and $|X_n| \le Y$ where $\int Y \, dP < \infty$, then
\[ \lim_{n\to\infty} \int X_n \, dP = \int X \, dP = \int \lim_{n\to\infty} X_n \, dP. \]
Theorem 2.3. (Monotone Convergence Theorem (MCT)) If $X_n \ge 0$ and $X_n \uparrow X$
a.s., then
\[ \lim_{n\to\infty} \int X_n \, dP = \int X \, dP = \int \lim_{n\to\infty} X_n \, dP. \]
Example 2.21. For a strict inequality in Fatou's lemma, consider the example
$X_n(\omega) = n \times \mathbf{1}_{\{\omega < 1/n\}}$ on $([0, 1], \mathcal{B}_{[0,1]}, \lambda_{[0,1]})$. For each fixed ω in (0, 1], $X_n(\omega) = 0$
for all n ≥ 1/ω. That is, $\{\omega : \lim_{n\to\infty} X_n(\omega) = 0\} = (0, 1]$, which implies that $X_n \to 0$ a.s.
(since P((0, 1]) = 1). Hence, $\int \lim_{n\to\infty} X_n \, dP = \int 0 \, d\lambda_{[0,1]} = 0$. However, for every
integer n ≥ 1, $\int X_n \, dP = n \lambda_{[0,1]}\left(\left[0, \tfrac{1}{n}\right)\right) = 1$. As a result, we have
\[ 0 = \int \lim_{n\to\infty} X_n \, dP < \lim_{n\to\infty} \int X_n \, dP = 1. \]
The basic intuition behind this example is that the a.s. limit ignores the magnitude of $X_n$ on the set {ω < 1/n}, which becomes null in the limit while still contributing significantly
to the integral through the part $n \times \mathbf{1}_{\{\omega < 1/n\}}$. The conditions $|X_n| \le Y$ and
$\int Y \, dP < \infty$ in DCT and $X_n \uparrow X$ a.s. in MCT rule out the possibility of such an
example, and thus the limit and integral can be exchanged.
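The two halves of Example 2.21 can be checked with exact arithmetic. In the sketch below the test point ω = 3/100 and the helper names are arbitrary choices for illustration. It shows that X_n(ω) drops to 0 once n ≥ 1/ω, while the integral of X_n stays equal to 1 for every n.

```python
from fractions import Fraction


def X(n, w):
    """X_n(w) = n * 1{w < 1/n} on [0, 1]."""
    return n if w < Fraction(1, n) else 0


def integral_X(n):
    """Integrate the two-valued function X_n exactly: value times measure of each level set."""
    measure_where_n = Fraction(1, n)          # lambda([0, 1/n))
    measure_where_0 = 1 - measure_where_n     # lambda([1/n, 1])
    return n * measure_where_n + 0 * measure_where_0


w = Fraction(3, 100)                          # an arbitrary fixed point in (0, 1]
tail = [X(n, w) for n in range(30, 40)]       # values around the threshold n = 1/w
integrals = [integral_X(n) for n in (1, 10, 1000)]

print(tail)        # [30, 31, 32, 33, 0, 0, 0, 0, 0, 0]
print(integrals)   # every entry equals 1
```

The mass n sits on a set of measure 1/n: pointwise it eventually disappears at every ω > 0, yet the product n × (1/n) never changes.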
Example 2.22. For any non-negative r.v. X on (Ω, F), there exists a sequence of
discrete random variables $X_n$ on (Ω, F) such that $X_n \uparrow X$ a.s. For example, we can
take
\[ X_n = \sum_{k=0}^{2^{2n}} \frac{k}{2^n} \cdot \mathbf{1}_{X^{-1}\left(\left[\frac{k}{2^n}, \frac{k+1}{2^n}\right)\right)}. \]
It is easy to verify that for any ω ∈ Ω, $X_n(\omega) \uparrow X(\omega)$. Also, from the MCT, we have
$E(X_n) = \int X_n \, dP \to \int X \, dP = E(X)$.
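The dyadic construction of Example 2.22 acts on each value x = X(ω) separately, so it can be sketched as a function of x and n. The sample values below are hypothetical; the sketch only illustrates that the approximations are non-decreasing in n and approach x from below.

```python
import math


def dyadic_approx(x, n):
    """Value of X_n at a point where X equals x >= 0.

    Returns k / 2**n when x lies in [k / 2**n, (k + 1) / 2**n) for some
    k in {0, ..., 2**(2n)}, and 0 when x is beyond the range covered at level n.
    """
    k = math.floor(x * 2**n)
    return k / 2**n if k <= 2 ** (2 * n) else 0.0


# Hypothetical sample values of a non-negative r.v.
values = [0.0, 0.3, 1.0, 2.718281828, 17.25]
for x in values:
    approx = [dyadic_approx(x, n) for n in range(1, 12)]
    print(x, approx[:6])
```

For a large value such as 17.25 the first few approximations are 0, because the level-n grid only reaches up to about 2^n; once the grid covers x, the error is below 2^{-n}.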
Example 2.23. Given $f_n(x) = \dfrac{\sqrt{x}}{1 + n x^3}$, DCT can be used to find
\[ \lim_{n\to\infty} \int_1^\infty f_n(x) \, dx. \]
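To see this, set $X_n : \mathbb{R}^+ \to \mathbb{R}^+$ such that $X_n(\omega) = f_n(\omega)\mathbf{1}_{\{\omega \ge 1\}}$ and $M : \mathbb{R}^+ \to \mathbb{R}^+$
such that $M(\omega) = \frac{1}{\omega^{2.5}} \mathbf{1}_{\{\omega \ge 1\}}$. Note that $X_n(\omega) \le \frac{\sqrt{\omega}}{\omega^3} \mathbf{1}_{\{\omega \ge 1\}} = M(\omega)$ for all n ≥ 1.
Also, M satisfies $E(M) := \int_{\mathbb{R}^+} M(\omega) \, d\lambda(\omega) = \frac{1}{1.5} < \infty$ if we take λ as the
measure. Moreover, $X_n(\omega) \to 0$ for every ω as n → ∞. Therefore, applying DCT gives
\[ \lim_{n\to\infty} \int_1^\infty f_n(x) \, dx = \lim_{n\to\infty} E(X_n) = E\left(\lim_{n\to\infty} X_n\right) = E(0) = 0. \]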
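The conclusion of Example 2.23 can be sanity-checked numerically. The sketch below is an illustration rather than part of the argument: the substitution x = 1/t turns the integral over [1, ∞) into the integral of √t / (t³ + n) over (0, 1], which is then approximated by a midpoint rule. The function name and step count are arbitrary choices.

```python
def integral_fn(n, steps=20_000):
    """Approximate the integral of f_n(x) = sqrt(x) / (1 + n x^3) over [1, inf).

    After the substitution x = 1/t the integrand becomes sqrt(t) / (t^3 + n)
    on (0, 1], which is bounded and suitable for the midpoint rule.
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h                 # midpoint of the i-th subinterval
        total += t ** 0.5 / (t ** 3 + n)
    return total * h


for n in (1, 10, 100, 1000):
    # Since f_n(x) <= 1 / (n x^2.5) on [1, inf), each value is at most 2 / (3 n).
    print(n, integral_fn(n), 2 / (3 * n))
```

The printed values shrink roughly like 1/n, consistent with the limit 0 obtained from the DCT.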
2.7 Exercises
Exercise 2.4 Show that the functions f in Examples 2.2 and 2.3 are bijective. In each
case, construct another example of a bijective function f.
Exercise 2.5 Show that the set of complex numbers with rational real and imaginary
parts, {a + bi : a, b ∈ Q}, is countable.
Exercise 2.6 Using a similar argument as in Example 2.6, show that the set of real
numbers is uncountable.
Exercise 2.7 Verify that a σ-field is closed under countable intersections.
Exercise 2.8 Let −∞ < a < b < ∞. Show that the following sets belong to the Borel
σ-field B:
1. singleton: {x}, where x ∈ R.
2. half closed interval: [a, b) or (a, b].
3. closed interval: [a, b].
4. (−∞, b) and (a, ∞).
Exercise 2.9 Explain whether the following collections F are σ-fields or not.
a) Suppose Ω = {1, 2, 3, 4, 5, 6} and F = {{1, 2}, {3, 4, 5}, {6}}.
b) Suppose Ω and F are defined as in part a). Let F∗ = {A : A is the union of
some sets in F} (e.g., A = {1, 2} ∪ {3, 4, 5} = {1, 2, 3, 4, 5} ∈ F∗). Is F∗ a
σ-field?
Exercise 2.10 Consider Example 2.17b). Let F = {∅, {1, 2, 3}, {4, 5, 6}, Ω}. Show
that X1 is a r.v. in the measurable space (Ω, F) but X2 is not.
Exercise 2.11 What is the difference between a singleton and an element?
Exercise 2.12 Suppose that the sample space Ω is a finite set. Suppose we want to
define a σ-field and probability measure associated with Ω. Explain why the notation
σ(Ω) is meaningless. Show that the smallest σ-field containing all singletons of Ω
is $2^\Omega$. Compare this set with σ({Ω}).
Exercise 2.13 Show that the σ-field generated by a class of sets A can be expressed
as
\[ \sigma(\mathcal{A}) = \bigcap_\alpha \mathcal{F}_\alpha, \]
where $\{\mathcal{F}_\alpha\}$ are all the σ-fields (possibly uncountably many) that contain A.
Exercise 2.14 The function f : R → R is called a Borel measurable function if
f −1 (B) of any Borel set B is also a Borel set. Show that any continuous function
f : R → R is a Borel measurable function.
Exercise 2.15 Show that all continuous functions are Borel measurable.
Exercise 2.16 Show that B[a,b] is a σ -field for any −∞ < a < b < ∞.
Exercise 2.17 Verify that σ(X) defined in Definition 2.9 satisfies the definition of a
σ-field.
Exercise 2.18 Show that {1, 2, 3} is a Lebesgue-null set using Definition 2.7.
Exercise 2.19 Give two examples of simple functions. Give two examples of step
functions. Is a step function always a simple function? Is a simple function always a step
function?
Exercise 2.20 Show that if a function f is Riemann integrable, then it is Lebesgue
integrable, and the values of the two integrals are the same.
Exercise 2.21 The Doob-Dynkin theorem states the following.
Doob-Dynkin Theorem
Let X be a random variable. Then each σ(X)-measurable random variable Y
can be written as Y = f(X) for some Borel function f : R → R.
Instead of proving the theorem we consider one illustrative example: Let Ω =
{1, 2, 3, 4} and X : Ω → R be a random variable such that $X(i) = i^2$ for i = 1, 2, 3, 4.
Clearly σ(X) is just the power set P(Ω) of Ω (i.e. the set of all subsets of Ω). Let
the r.v. Y : Ω → R be defined by
\[ Y(i) = \begin{cases} 0 & \text{if } i = 1, 2, \\ 1 & \text{if } i = 3, 4. \end{cases} \]
1. Find σ (Y ).
2. Show that Y is σ (X) measurable.
3. Construct a Borel function f : R → R such that Y = f (X).
Exercise 2.22 Let An = {q1 , . . . , qn } and fn (x) = 1{x∈An } , where qi ∈ R. Find the
Riemann integral and Lebesgue integral of fn on R.
Exercise 2.23
1. Show that if $X \in L^2$ then $X \in L^1$.
2. If η : Ω → [0, ∞) is in $L^2$ and nonnegative, then
\[ E(\eta^2) = 2 \int_0^\infty t \, P(\eta > t) \, dt. \]
Exercise 2.24 Investigate the convergence, as n → ∞, of
\[ \int_a^\infty \frac{n^2 x e^{-n^2 x^2}}{1 + x^2} \, dx, \]
for a > 0 and a = 0.