Probability Theory Notes
Dept. of AMS, Fangzheng Xie
March 17, 2017
Contents
1 Probability Measure Theory
  1.1 Collections of Sets
  1.2 Probability Measure
    1.2.1 Probability Space
    1.2.2 Uniform Probability on (0, 1]
  1.3 Extension of Probability Measure
    1.3.1 Extension Theorem
    1.3.2 Completeness
  1.4 Denumerable Probabilities
    1.4.1 Operations On Sets
    1.4.2 Some Basic Estimation
    1.4.3 Independence
  1.5 Borel-Cantelli Lemmas
  1.6 Kolmogorov's Zero-One Law
  1.7 Rényi-Lamperti Lemma
  1.8 Probability Measures and Distribution Functions on (R, R)

2 Random Variables
  2.1 Borel Sets in R^k
  2.2 Measurable Functions and Mappings
  2.3 σ-fields generated by families of mappings
  2.4 General Measure
  2.5 Transformation of Probability Measure
  2.6 Independence for Random Variables
    2.6.1 Independence and Tail σ-Fields
    2.6.2 Kolmogorov's Zero-One Law for Random Variables

3 Integration
  3.1 Construction of Integration
    3.1.1 Integration for Nonnegative Simple Functions
    3.1.2 Integration for Nonnegative Measurable Functions
    3.1.3 Integration for General Measurable Functions
  3.2 Change of Variable
  3.3 Integration to the Limit
  3.4 Indefinite Integrals and Densities
  3.5 Modes of Convergence and Lp spaces
  3.6 Vague Convergence
  3.7 Continuation
  3.8 Convergence in Distribution
  3.9 Uniform Integrability and Convergence of Moments

4 Transition Probabilities
  4.1 Product Measurable Space
  4.2 Transition Probabilities
  4.3 Composition Distribution
  4.4 Mixture Probabilities
  4.5 Product Distributions

5 Law of Large Numbers
  5.1 Simple Limit Theorems
  5.2 Weak Law of Large Numbers
  5.3 Random Series
  5.4 Strong Law of Large Numbers
  5.5 Rate of Convergence
  5.6 Applications of Law of Large Numbers

6 Characteristic Functions
  6.1 General Properties and Convolutions
  6.2 Characterization Property
  6.3 Convergence Theorem
  6.4 Simple Applications
  6.5 Bochner's Theorem and Polya's Theorem

7 Central Limit Theorems
  7.1 Liapunov's Central Limit Theorem
  7.2 Lindeberg-Feller Central Limit Theorem

8 Conditional Expectation
  8.1 Definition and Examples
  8.2 Basic Properties

9 Martingales
  9.1 Definition and Examples
  9.2 Martingale Convergence Theorems
  9.3 Radon-Nikodym Theorem
  9.4 Applications of Martingale Convergence Theorem
  9.5 Optional Stopping Theorems

10 Conditional Probability Distributions
  10.1 Definition
  10.2 Basic Properties and Examples
Chapter 1
Probability Measure Theory
1.1 Collections of Sets
First we fix some notation: let Ω be a nonempty set, whose elements are denoted by ω.
DEFINITION 1.1.1 (Field, σ-Field and Monotone Class) Let Ω be a nonempty set and F be a nonempty collection of subsets of Ω.

• F is called a field if it is closed under complementation and finite unions. Namely, E ∈ F implies E^c ∈ F, and {Ei}_{i=1}^n ⊂ F implies ⋃_{i=1}^n Ei ∈ F.

• F is called a monotone class, abbreviated M.C., if it is closed under unions of increasing sequences of sets and intersections of decreasing sequences of sets. Namely, {En}_{n=1}^∞ ⊂ F with En ⊂ En+1 for all n ∈ N+ implies ⋃_{n=1}^∞ En ∈ F, and {En}_{n=1}^∞ ⊂ F with En ⊃ En+1 for all n ∈ N+ implies ⋂_{n=1}^∞ En ∈ F.

• F is called a σ-field if it is closed under complementation and countable unions. Namely, E ∈ F implies E^c ∈ F, and {En}_{n=1}^∞ ⊂ F implies ⋃_{n=1}^∞ En ∈ F.
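On a finite Ω these closure properties can be checked mechanically, and on a finite Ω a field is automatically a σ-field and a M.C., since every monotone sequence of sets is eventually constant. The following Python sketch is only an illustration and is not part of the notes; the sample collections and the helper name is_field are assumptions made for the example.

```python
def is_field(omega, F):
    """Check that F is a field of subsets of omega: nonempty, closed under
    complementation and binary (hence finite) unions."""
    omega = frozenset(omega)
    F = {frozenset(A) for A in F}
    if not F:
        return False
    closed_complement = all(omega - A in F for A in F)
    closed_union = all(A | B in F for A in F for B in F)
    return closed_complement and closed_union

# A field on a finite Omega; being finite, it is also a sigma-field and a M.C.
Omega = {1, 2, 3, 4}
print(is_field(Omega, [set(), {1}, {2, 3, 4}, {1, 2, 3, 4}]))  # True
print(is_field(Omega, [set(), {1}, {1, 2, 3, 4}]))             # False: complement of {1} is missing
```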
THEOREM 1.1.1 A field is a σ-field if and only if it is a M.C.
Proof. By definition, a σ-field is automatically a M.C., and therefore the "only if" part is trivial. Hence we need to show that a field F that is also a M.C. is a σ-field.
First we show that F is closed under countable unions. Assume that {En}_{n=1}^∞ ⊂ F. Define

Fn = ⋃_{i=1}^n Ei.

Then we claim that

⋃_{n=1}^∞ En = ⋃_{n=1}^∞ Fn,  with Fn ⊂ Fn+1 for all n ∈ N+.

The second assertion is trivial since Fn+1 includes all the Ei from i = 1 to n+1 and in particular contains Fn. For the first assertion: given ω ∈ ⋃_{n=1}^∞ Fn, there exists an n such that ω ∈ Fn. By the definition of Fn, there exists an i0 ≤ n such that ω ∈ E_{i0}. Therefore ω ∈ ⋃_{n=1}^∞ En, showing that ⋃_{n=1}^∞ Fn ⊂ ⋃_{n=1}^∞ En. Conversely, if ω ∈ ⋃_{n=1}^∞ En, then there exists an n such that ω ∈ En ⊂ Fn, indicating ω ∈ ⋃_{n=1}^∞ Fn. This shows that ⋃_{n=1}^∞ En ⊂ ⋃_{n=1}^∞ Fn. Hence the claim is proved.

Now {Fn}_{n=1}^∞ is an increasing sequence of sets in F (each Fn ∈ F because F is a field and hence closed under finite unions), and F is a M.C., so by definition the union is in F. Hence

⋃_{n=1}^∞ En = ⋃_{n=1}^∞ Fn ∈ F.

The second step is to show that F is closed under complementation, which is trivial because F is a field and, by the definition of a field, it is closed under complementation.

Therefore F being both a M.C. and a field implies that F is a σ-field, which is the "if" part.
This theorem will be used repeatedly in what follows.
DEFINITION 1.1.2 Given a nonempty set Ω, the σ-field {∅, Ω} is called the trivial σ-field of Ω, and P(Ω), the collection of all subsets of Ω, is called the total σ-field.
PROPOSITION 1.1.1 If Ω is a nonempty set and {Fα | α ∈ A} is a collection of σ-fields or a collection of M.C.'s, then the intersection

⋂_{α∈A} Fα

is also a σ-field or a M.C., respectively.
Proof. First we prove the case of σ-fields. Assume that {Fα | α ∈ A} is a collection of σ-fields. Given E ∈ ⋂_{α∈A} Fα, for all α we have E ∈ Fα, and by the definition of σ-field the complement E^c ∈ Fα. Hence E^c ∈ ⋂_{α∈A} Fα, which gives closure under complementation. Next, given a sequence of sets {En}_{n=1}^∞ ⊂ ⋂_{α∈A} Fα, for all α the union ⋃_{n=1}^∞ En ∈ Fα by the definition of σ-field. Since α is arbitrary, ⋃_{n=1}^∞ En ∈ ⋂_{α∈A} Fα. Therefore ⋂_{α∈A} Fα is a σ-field.

Now consider the case of M.C.'s. If {En}_{n=1}^∞ ⊂ ⋂_{α∈A} Fα is an increasing sequence of sets (En ⊂ En+1 for all n), then for all α we have {En}_{n=1}^∞ ⊂ Fα, and by the definition of M.C. the union ⋃_{n=1}^∞ En ∈ Fα for all α. Namely, ⋃_{n=1}^∞ En ∈ ⋂_{α∈A} Fα. Similarly, if {En}_{n=1}^∞ ⊂ ⋂_{α∈A} Fα is decreasing, then the intersection ⋂_{n=1}^∞ En is in ⋂_{α∈A} Fα. Therefore ⋂_{α∈A} Fα is a M.C.
DEFINITION 1.1.3 Given a collection C of subsets of Ω, the σ-field generated by C, denoted by σ(C), is the smallest σ-field containing C as a subcollection. Namely, σ(C) = ⋂_{C⊂F} F, where the intersection ranges over all σ-fields F containing C as a subcollection. The M.C. generated by C, denoted by MC(C), is the smallest M.C. containing C as a subcollection. Namely, MC(C) = ⋂_{C⊂G} G, where the intersection ranges over all M.C.'s G containing C as a subcollection.
THEOREM 1.1.2 (Monotone Class Theorem) Assume that F0 is a field. Then the σ-field
generated by F0 is exactly the M.C. generated by F0 . Namely, σ(F0 ) = M C(F0 ).
To prove this theorem we need the following lemmas:
LEMMA 1.1.1 Assume that F0 is a field and G = M C(F0 ). Define
C1 = {E ∈ G : E ∩ F ∈ G, for all F ∈ F0 }
C2 = {E ∈ G : E ∩ F ∈ G, for all F ∈ G}
D = {E ∈ G : E c ∈ G}
Then C1 , C2 , D are all M.C.’s.
Proof. We prove the lemma by the following steps:

• C1 and C2 are M.C.'s. To see this, assume that {En}_{n=1}^∞ ⊂ C1 (respectively C2) increases, namely En ⊂ En+1 for all n. Then for all F ∈ F0 (respectively F ∈ G), the sequence {En ∩ F}_{n=1}^∞ also increases. By the definition of C1 (respectively C2), En ∩ F ∈ G for all n. Since G is a M.C.,

(⋃_{n=1}^∞ En) ∩ F = ⋃_{n=1}^∞ (En ∩ F) ∈ G,

indicating that ⋃_{n=1}^∞ En ∈ C1 (respectively C2) by definition.

Similarly, {En}_{n=1}^∞ ⊂ C1 (respectively C2) being decreasing implies that {En ∩ F}_{n=1}^∞ is decreasing for all F ∈ F0 (respectively F ∈ G). By the definition of C1 (respectively C2), En ∩ F ∈ G for all n. Since G is a M.C.,

(⋂_{n=1}^∞ En) ∩ F = ⋂_{n=1}^∞ (En ∩ F) ∈ G,

indicating that ⋂_{n=1}^∞ En ∈ C1 (respectively C2) by definition. Therefore, C1 and C2 are M.C.'s.

• D is a M.C. If {En}_{n=1}^∞ ⊂ D increases (En ⊂ En+1 for all n), then ⋃_{n=1}^∞ En ∈ G since D ⊂ G and G is a M.C., and by the de Morgan law

(⋃_{n=1}^∞ En)^c = ⋂_{n=1}^∞ En^c,

where En^c ∈ G by the definition of D. Now that G is a M.C. and {En^c}_{n=1}^∞ is decreasing (En^c ⊃ En+1^c for all n), we have

(⋃_{n=1}^∞ En)^c = ⋂_{n=1}^∞ En^c ∈ G,

and by the definition of D, ⋃_{n=1}^∞ En ∈ D.

Similarly, if {En}_{n=1}^∞ ⊂ D decreases (En ⊃ En+1 for all n), then ⋂_{n=1}^∞ En ∈ G, and by the de Morgan law

(⋂_{n=1}^∞ En)^c = ⋃_{n=1}^∞ En^c,

where En^c ∈ G by the definition of D. Now that G is a M.C. and {En^c}_{n=1}^∞ is increasing (En^c ⊂ En+1^c for all n), we have

(⋂_{n=1}^∞ En)^c = ⋃_{n=1}^∞ En^c ∈ G,

and by the definition of D, ⋂_{n=1}^∞ En ∈ D.
LEMMA 1.1.2 Assume that F0 is a field and G = M C(F0 ). Define
C1 = {E ∈ G : E ∩ F ∈ G, for all F ∈ F0 }
C2 = {E ∈ G : E ∩ F ∈ G, for all F ∈ G}
D = {E ∈ G : E c ∈ G}
then we have C2 = C1 = D = G.
Proof. We show the lemma by the following steps:
• C1 = G. By the definition of C1 , it is a subcollection of G and therefore we only need to
show G ⊂ C1 . By the lemma above C1 is a M.C., and so we just need to show F0 ⊂ C1 . For
all E ∈ F0 , for all F ∈ F0 we have E ∩ F ∈ F0 ⊂ G since F0 is a field, and by the definition
of C1 we have E ∈ C1 . This indicates that F0 ⊂ C1 .
• C2 = G. By the definition of C2, we only need to show that G ⊂ C2. Now that C2 is a M.C., it suffices to show that C2 contains F0 as a subcollection. By the definition of C1 and the fact that C1 = G, for all E ∈ G and all F ∈ F0 we have E ∩ F ∈ G. But F ∈ F0 implies F ∈ G, and for all E ∈ G, E ∩ F ∈ G. Then by the definition of C2 we have F ∈ C2. Therefore F0 ⊂ C2, as desired.
• D = G. Similarly since D itself is a subcollection of G, we only need to show that G ⊂ D
and since D is a M.C., then we just need to verify that F0 ⊂ D. But this holds, since for
all F ∈ F0 by the definition of field F c ∈ F0 ⊂ G, indicating that F ∈ D.
Now we can prove theorem 1.1.2.
Proof. Let G = MC(F0). By Lemma 1.1.2 we have

G = {E ∈ G : E^c ∈ G} = {E ∈ G : E ∩ F ∈ G for all F ∈ G}.

Therefore, if E ∈ G, then E^c ∈ G, and E, F ∈ G implies E ∩ F ∈ G. Thus for all E, F ∈ G we have E^c, F^c ∈ G, hence E^c ∩ F^c ∈ G, and E ∪ F = (E^c ∩ F^c)^c ∈ G. Hence G is closed under complementation and finite unions, so G is a field. But it is also a M.C., therefore it is a σ-field. By the definition of σ(F0) we have σ(F0) ⊂ G. On the other hand, every σ-field containing F0 is also a M.C. containing F0, thus

G = MC(F0) = ⋂_{F0⊂G'} G' ⊂ ⋂_{F0⊂F} F = σ(F0),

where G' ranges over all M.C.'s containing F0 and F ranges over all σ-fields containing F0. Hence σ(F0) = MC(F0).
DEFINITION 1.1.4 (π-system, λ-system) Let Ω be a nonempty set.

• Let P be a nonempty collection of subsets of Ω. Then P is called a π-system if it is closed under finite intersections. Namely, E, F ∈ P implies E ∩ F ∈ P.

• Let L be a nonempty collection of subsets of Ω. Then L is called a λ-system if it contains Ω and is closed under proper differences and increasing countable unions. Namely, E, F ∈ L with E ⊃ F implies E\F ∈ L, and {En}_{n=1}^∞ ⊂ L with En ⊂ En+1 for all n implies ⋃_{n=1}^∞ En ∈ L.
REMARK 1.1.1 A π-system is a weaker notion than a field, and a λ-system is a stronger notion than a M.C.
LEMMA 1.1.3 If F is a collection of subsets of a nonempty set Ω, and F is both a π-system
and a λ-system, then F is a σ-field.
Proof. Since F is a λ-system, Ω ∈ F, and F being closed under proper differences implies that F is closed under complementation: if E ∈ F, then E^c = Ω\E is also in F. Since F is also a π-system, F is therefore a field. Next, if {En}_{n=1}^∞ ⊂ F with En ⊃ En+1 for all n, then

⋂_{n=1}^∞ En = (⋃_{n=1}^∞ En^c)^c

is also in F, since En^c ∈ F with En^c ⊂ En+1^c (so that ⋃_{n=1}^∞ En^c ∈ F by the λ-system property) and F is closed under complementation. Hence F is a field that is closed under increasing countable unions and decreasing countable intersections, i.e., a field and a M.C., and by Theorem 1.1.1 it is a σ-field.
THEOREM 1.1.3 (Dynkin’s π − λ theorem) If P is a π-system and L is a λ-system, then
P ⊂ L implies P ⊂ σ(P) ⊂ L.
Proof. To prove the theorem, it suffices to show that σ(P ) ⊂ l(P), where l(P) is the λ-system
generated by P. Namely, l(P) is the intersection of all λ-systems that include P as a subcollection. But to see this, we only need to show that l(P) is a σ-field. It suffices to show that l(P)
is a π-system, since itself is a λ-system and being both a λ-system and a π-system means that
it is a σ-field. Thus, to prove the theorem, it suffices to show that l(P) is a π-system, namely,
to show that l(P) is closed under binary intersection.
Define
G = {A ∈ l(P) : A ∩ B ∈ l(P), for all B ∈ l(P)}
Then if we can show that G = l(P), then we know that l(P) is closed under binary intersection.
By the definition of G, G ⊂ l(P), so we only need to show the reverse inclusion G ⊃ l(P). But to see this, we only
need to show that G is a λ-system and includes P as a subcollection.
(I) First we check that P ⊂ G. Namely we want to show that for all A ∈ P, for all B ∈ l(P),
their intersection A ∩ B ∈ l(P). If we define
H = {A ∈ l(P) : A ∩ B ∈ l(P), for all B ∈ P}
and show that H ⊃ l(P), then we can show that P ⊂ G. To see this, it suffices to show
that H is a λ-system and contains P as a subcollection.
(1) We want to show P ⊂ H. If A ∈ P, then for all B ∈ P we have A ∩ B ∈ P because P itself is a π-system. But P ⊂ l(P), therefore A ∈ H, indicating that H ⊃ P.
(2) We want to show H is a λ-system. Namely, we want to show that
(a) Ω ∈ H. This is because for all B ∈ P, Ω ∩ B = B ∈ P ⊂ l(P) and Ω ∈ l(P) by the
definition of λ-system. Hence Ω ∈ H.
(b) E, F ∈ H with E ⊃ F implies E\F ∈ H. Since E, F ∈ H, for all B ∈ P we have E ∩ B, F ∩ B ∈ l(P) by the definition of H. But

(E\F) ∩ B = (E ∩ B)\(F ∩ B),

and E ∩ B ⊃ F ∩ B, with both sets in the λ-system l(P). Therefore the proper difference (E ∩ B)\(F ∩ B) is in l(P), and E\F ∈ H by the definition of H.

(c) If {En}_{n=1}^∞ ⊂ H and En ⊂ En+1 for all n, then ⋃_{n=1}^∞ En ∈ H. For all B ∈ P, by the definition of H, En ∩ B ∈ l(P) with (En ∩ B) ⊂ (En+1 ∩ B). Since l(P) is a λ-system,

(⋃_{n=1}^∞ En) ∩ B = ⋃_{n=1}^∞ (En ∩ B) ∈ l(P).

Therefore ⋃_{n=1}^∞ En ∈ H.
(II) Secondly we need to show that G is a λ-system. Namely, we need to show
(1) Ω ∈ G. First l(P) is a λ-system so Ω ∈ l(P). Then Ω ∩ B = B ∈ l(P) for all B ∈ l(P)
and this implies that Ω ∈ G.
(2) E, F ∈ G with E ⊃ F implies E\F ∈ G. For all B ∈ l(P), by the definition of G we
have E ∩ B, F ∩ B ∈ l(P) with (E ∩ B) ⊃ (F ∩ B). But l(P) itself is closed under
proper difference and
(E\F ) ∩ B = (E ∩ B)\(F ∩ B) ∈ l(P)
and this shows that E\F ∈ G by the definition of G.
(3) {En}_{n=1}^∞ ⊂ G with En ⊂ En+1 for all n implies ⋃_{n=1}^∞ En ∈ G. By the definition of G, for all n and all B ∈ l(P) we have En ∩ B ∈ l(P) with En ∩ B ⊂ En+1 ∩ B. Since l(P) is a λ-system,

(⋃_{n=1}^∞ En) ∩ B = ⋃_{n=1}^∞ (En ∩ B) ∈ l(P).

Hence, by the definition of G, the union ⋃_{n=1}^∞ En ∈ G.

Therefore G is a λ-system containing P, so G ⊃ l(P) and hence G = l(P). This shows that l(P) is closed under binary intersection, hence a π-system, hence a σ-field, and σ(P) ⊂ l(P) ⊂ L follows.
EXAMPLE 1.1.1 Give an example of the following: Let Ω be a nonempty set and {Fi}_{i=1}^∞ a countable sequence of σ-fields on Ω with Fi ⊂ Fi+1, but the union ⋃_{i=1}^∞ Fi is not a σ-field.

Solution. Take Ω = Z+ and

Fi = σ({1}, {2}, ..., {i}).

It is obvious that Fi ⊂ Fi+1 for all i. We show that E = {2k : k ∈ Z+} is not an element of ⋃_{i=1}^∞ Fi. Assume it is; then there exists an i such that E ∈ Fi. Write

Fi = σ({1}, ..., {i}) = {A : A ⊂ {1, 2, ..., i}} ∪ {Z+\A : A ⊂ {1, 2, ..., i}}.

Since E is an infinite set, there must exist A ⊂ {1, 2, ..., i} such that E = Z+\A. But this is impossible because Z+\A contains all the odd integers greater than i whereas E does not contain any odd integer. Therefore E ∉ ⋃_{i=1}^∞ Fi.

Nevertheless, we have {2k} ∈ ⋃_{i=1}^∞ Fi for every k because {2k} ∈ σ({1}, ..., {2k}) = F_{2k}. But the union E = ⋃_{k=1}^∞ {2k} is not in ⋃_{i=1}^∞ Fi, and this shows that ⋃_{i=1}^∞ Fi is not a σ-field.
EXAMPLE 1.1.2 Let A = {Λi}_{i=1}^∞ be a sequence of disjoint sets with Ω = ⋃_{i=1}^∞ Λi and F = σ(A). Prove that each member A ∈ F can be written as an at most countable union of the elements in A.

Proof. Define

G = {A ∈ F : there exists an index set I ⊂ N+ such that A = ⋃_{i∈I} Λi}.

Then it suffices to show that F ⊂ G. To see this, we only need to show:
(I) A ⊂ G. This is trivial because for all Λj ∈ A we may take the index set I = {j} and obtain Λj = ⋃_{i∈I} Λi ∈ G.

(II) G is a σ-field. This contains two parts:

(1) E ∈ G implies E^c ∈ G. Write E = ⋃_{i∈I} Λi; then, since the Λi are disjoint and cover Ω,

E^c = (⋃_{i∈I} Λi)^c = ⋃_{i∈Z+\I} Λi ∈ G,

and E^c is in G because we can take the corresponding index set to be Z+\I.

(2) {En}_{n=1}^∞ ⊂ G implies ⋃_{n=1}^∞ En ∈ G. Write En = ⋃_{i∈In} Λi for all n; then

⋃_{n=1}^∞ En = ⋃_{n=1}^∞ ⋃_{i∈In} Λi = ⋃_{i∈I} Λi,  where I = ⋃_{n=1}^∞ In.

Therefore ⋃_{n=1}^∞ En ∈ G because the corresponding index set is I.
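For a finite partition the same phenomenon can be made concrete: the generated σ-field consists exactly of the unions of blocks. The following Python sketch is an illustration only (the three-block partition and the helper name are assumptions, not part of the notes).

```python
from itertools import combinations

def sigma_field_from_partition(blocks):
    """Return the sigma-field generated by a finite partition:
    all unions of blocks, including the empty union."""
    members = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            members.add(frozenset().union(*combo))
    return members

# Partition of Omega = {1,...,6} into three blocks Lambda_1, Lambda_2, Lambda_3.
blocks = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})]
F = sigma_field_from_partition(blocks)
print(len(F))                      # 2**3 = 8 members
print(frozenset({1, 2, 3}) in F)   # True: union of the first two blocks
print(frozenset({1, 3}) in F)      # False: not a union of blocks
```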
EXAMPLE 1.1.3 (2.5) The field f(A) generated by a collection A of subsets of Ω is defined as the intersection of all fields in Ω containing A.

(a) Show that f(A) is indeed a field, that A ⊂ f(A), and that f(A) is minimal in the sense that if G is a field and A ⊂ G, then f(A) ⊂ G.

(b) Show that for nonempty A, f(A) is the collection of sets of the form ⋃_{i=1}^m ⋂_{j=1}^{ni} Aij, where for each i and j either Aij ∈ A or Aij^c ∈ A, and where the m sets ⋂_{j=1}^{ni} Aij, 1 ≤ i ≤ m, are disjoint. The sets in f(A) can thus be explicitly presented, which is not in general true of the sets in σ(A).
Proof. (a) By definition, the field generated by A is given by

f(A) = ⋂_{F⊃A} F,

where the intersection runs over all fields F including A. The intersection of an arbitrary collection of fields is again a field (by the same argument as in Proposition 1.1.1), so f(A) is indeed a field, and A ⊂ f(A) since every field in the intersection contains A. Therefore, if G is a field including A as a subcollection, then it appears in the intersection above and thus

f(A) = ⋂_{F⊃A} F ⊂ G.
(b) For convenience, denote the collection of sets

H = {⋃_{i=1}^m ⋂_{j=1}^{ni} Aij : either Aij ∈ A or Aij^c ∈ A, and the sets ⋂_{j=1}^{ni} Aij, 1 ≤ i ≤ m, are disjoint}.

Then to see that H is exactly f(A), it suffices to show:
(I) H ⊂ f(A). Let ⋃_{i=1}^m ⋂_{j=1}^{ni} Aij be any element of H. First, if Aij ∈ A, then Aij ∈ f(A), and if Aij^c ∈ A, then Aij = (Aij^c)^c ∈ f(A). Since a field is closed under finite intersections, ⋂_{j=1}^{ni} Aij ∈ f(A) for each i. Furthermore, a field is closed under finite unions, and therefore ⋃_{i=1}^m ⋂_{j=1}^{ni} Aij ∈ f(A).
(II) H ⊃ f(A). Namely, we need to show:

(1) A ⊂ H. For each A ∈ A just take m = n1 = 1 and A11 = A; then A11 ∈ A, and the collection {⋂_{j=1}^{n1} A1j} is trivially disjoint because it contains only one set. Therefore A ⊂ H.
(2) H is a field. This contains two parts.

(i) H is closed under binary intersection. Let

A = ⋃_{i=1}^m ⋂_{j=1}^{ni} Aij,  B = ⋃_{r=1}^k ⋂_{s=1}^{lr} Brs

be any two elements of H. Denote Ai = ⋂_{j=1}^{ni} Aij and Br = ⋂_{s=1}^{lr} Brs; then

A ∩ B = (⋃_{i=1}^m Ai) ∩ (⋃_{r=1}^k Br) = ⋃_{i=1}^m ⋃_{r=1}^k (Ai ∩ Br).

To see that {Ai ∩ Br}_{i,r} are finitely many disjoint sets, note that for (i, r) ≠ (i', r') we have

(Ai ∩ Br) ∩ (A_{i'} ∩ B_{r'}) = (Ai ∩ A_{i'}) ∩ (Br ∩ B_{r'}) = ∅,

since either Ai ∩ A_{i'} = ∅ (when i ≠ i') or Br ∩ B_{r'} = ∅ (when r ≠ r'). Each Ai ∩ Br is again a finite intersection of sets Aij, Brs of the required form, and therefore A ∩ B = ⋃_{i,r} (Ai ∩ Br) ∈ H, indicating that H is closed under binary intersection.

(ii) H is closed under complementation. Let ⋃_{i=1}^m ⋂_{j=1}^{ni} Aij be any element of H. Then

(⋃_{i=1}^m ⋂_{j=1}^{ni} Aij)^c = ⋂_{i=1}^m ⋃_{j=1}^{ni} Aij^c.

Now that we have shown that H is closed under binary (hence finite) intersections, it suffices to show that ⋃_{j=1}^{ni} Aij^c ∈ H for each i. Write

Cij = Aij^c \ ⋃_{k=1}^{j−1} Aik^c = ⋂_{k=1}^{j−1} (Aij^c ∩ Aik).

Then ⋃_{j=1}^{ni} Aij^c = ⋃_{j=1}^{ni} Cij and the sets {Cij}_{j=1}^{ni} are mutually disjoint. Therefore ⋃_{j=1}^{ni} Aij^c ∈ H, and H is closed under complementation.
EXAMPLE 1.1.4 (2.6)

(a) Let F consist of the finite and the cofinite sets (A being cofinite if A^c is finite) in a set Ω. Show that if A consists of all singletons in Ω, then f(A) is the field F.
(b) Show that f (A) ⊂ σ(A), that f (A) = σ(A) if A is finite, and that σ(f (A)) = σ(A).
(c) Show that if A is countable, then f (A) is countable.
(d) Show for fields F1 and F2 that f (F1 ∪ F2 ) consists of all finite disjoint unions of sets A1 ∩ A2
with Ai ∈ Fi .
Proof. (a) To see this, we only need to verify:

(I) F is indeed a field containing A. The second part is trivial because every singleton is finite and thus in F. To see that F is closed under complementation: if A is in F, then A is either finite or cofinite, indicating that A^c is either cofinite or finite, and hence A^c ∈ F. Next we show that F is closed under finite unions. If A, B ∈ F, then:

(i) If A, B are finite, then A ∪ B is also finite and A ∪ B ∈ F.

(ii) If A, B are cofinite, then A ∪ B = (A^c ∩ B^c)^c, where A^c, B^c are finite, so A^c ∩ B^c is finite and therefore A ∪ B is cofinite. Namely, A ∪ B ∈ F.

(iii) If A is cofinite and B is finite: if A is infinite, then (A ∪ B)^c = A^c ∩ B^c ⊂ A^c is finite, meaning that A ∪ B is cofinite; if A is also finite, we are back in case (i).

Hence A ∪ B ∈ F in every case.

(II) F ⊂ f(A). Let G be an arbitrary field containing A.

(i) If A ∈ F is a finite set, then A = ⋃_{ω∈A} {ω}, which is a finite union of singletons. But G is a field containing A ⊃ {{ω} : ω ∈ A}, and thus A ∈ G.

(ii) If A ∈ F is a cofinite set, then A^c = ⋃_{ω∉A} {ω} is a finite set and is a finite union of the sets in A, which are all singletons. But G ⊃ A being a field means that A^c ∈ G, and therefore A = (A^c)^c ∈ G.

Thus F ⊂ G. Namely, F ⊂ ⋂_{G⊃A} G = f(A).
(b) (i) If F is a σ-field containing A, then it is also a field containing A. Namely, we have

Θ := {G : A ⊂ G, G is a σ-field} ⊂ ∆ := {G : A ⊂ G, G is a field}.

By definition we have

f(A) = ⋂_{G∈∆} G ⊂ ⋂_{G∈Θ} G = σ(A).
(ii) When A is finite, we only need to show that f(A) is a σ-field; namely, by the monotone class theorem, that f(A) is closed under unions of increasing sequences of sets. Let {En}_{n=1}^∞ ⊂ f(A) with En ⊂ En+1. Now we use Exercise 2.5 to show that f(A) is finite. By Exercise 2.5(b), every element of f(A) has the form ⋃_{i=1}^m ⋂_{j=1}^{ni} Aij, where Aij ∈ A or Aij^c ∈ A. Since A is finite, Ã := {A : A ∈ A or A^c ∈ A} is finite and the resulting combinations are finite. Therefore {En}_{n=1}^∞ takes only finitely many values, meaning that there exists an index k such that En = Ek for all n ≥ k. Hence the union ⋃_{n=1}^∞ En = Ek ∈ f(A).

(iii) Now that f(A) ⊂ σ(A), we have σ(f(A)) ⊂ σ(A). Conversely, f(A) ⊃ A and therefore σ(f(A)) ⊃ σ(A). Hence σ(f(A)) = σ(A).
(c) If A is countable, then by Exercise 2.5(b) we have

f(A) = {⋃_{i=1}^m ⋂_{j=1}^{ni} Aij : Aij ∈ Ã, and the sets ⋂_{j=1}^{ni} Aij, 1 ≤ i ≤ m, are disjoint},

where Ã := {A : A ∈ A or A^c ∈ A}. Then A being countable means that Ã is countable, and hence

 := {⋂_{j=1}^n Aj : n ∈ N+, Aj ∈ Ã}

is a countable collection. Therefore,

f(A) = {⋃_{i=1}^m Ãi : Ãi ∈ Â, Ãi ∩ Ãj = ∅ for i ≠ j}

is a countable collection.
(d) Write

C := {⋃_{j=1}^n (A1j ∩ A2j) : Aij ∈ Fi for i = 1, 2, and (A1j ∩ A2j) ∩ (A1k ∩ A2k) = ∅ for j ≠ k}.

Then it suffices to show:

(I) C ⊃ f(F1 ∪ F2). Namely, we want to show:

(i) C ⊃ F1 ∪ F2. This is trivial because Ω ∈ F1, F2, and every set A1 ∈ F1 can be written as A1 ∩ Ω ∈ C, while every A2 ∈ F2 can be written as Ω ∩ A2 ∈ C. Hence F1 ∪ F2 ⊂ C.

(ii) C is a field. Namely, we need to show:
1. C is closed under finite intersection. If A = ⋃_{j=1}^n (A1j ∩ A2j), B = ⋃_{k=1}^m (B1k ∩ B2k) ∈ C, then

A ∩ B = ⋃_{j=1}^n ⋃_{k=1}^m (A1j ∩ B1k) ∩ (A2j ∩ B2k),

where Aij ∩ Bik ∈ Fi because Fi is a field for each i = 1, 2, and for (j, k) ≠ (j', k') the disjointness of the two decompositions gives

(A1j ∩ B1k) ∩ (A2j ∩ B2k) ∩ (A1j' ∩ B1k') ∩ (A2j' ∩ B2k') = (A1j ∩ A2j) ∩ (A1j' ∩ A2j') ∩ (B1k ∩ B2k) ∩ (B1k' ∩ B2k') = ∅.

Therefore A ∩ B ∈ C.

2. C is closed under complementation. If A = ⋃_{j=1}^n (A1j ∩ A2j) ∈ C, then

A^c = ⋂_{j=1}^n (A1j ∩ A2j)^c.

For each j,

(A1j ∩ A2j)^c = A1j^c ∪ A2j^c = (A1j^c ∩ Ω) + (A1j ∩ A2j^c),

a finite disjoint union of sets of the form A1 ∩ A2 with A1 ∈ F1 and A2 ∈ F2, so (A1j ∩ A2j)^c ∈ C. Then by the closure of C under finite intersections we conclude that A^c ∈ C.

(II) C ⊂ f(F1 ∪ F2). If F is any field containing F1 ∪ F2 and A = ⋃_{j=1}^n (A1j ∩ A2j) ∈ C is any element of C, then each A1j ∩ A2j ∈ F, since A1j, A2j ∈ F1 ∪ F2 ⊂ F and F is closed under finite intersections. Therefore, F being a field, the union ⋃_{j=1}^n (A1j ∩ A2j) ∈ F. This indicates that C ⊂ F and hence C ⊂ f(F1 ∪ F2).
1.2 Probability Measure

1.2.1 Probability Space
DEFINITION 1.2.1 (Measurable Space) Given a nonempty set Ω and a σ-field F on Ω,
the tuple (Ω, F) is called a measurable space.
DEFINITION 1.2.2 (Probability Measure) Given a measurable space (Ω, F), a probability measure P : F → R is a map satisfying

(i) For all E ∈ F, we have P(E) ≥ 0;

(ii) If {Ei}_{i=1}^∞ ⊂ F is a sequence of disjoint sets, namely Ei ∩ Ej = ∅ for i ≠ j, then

P(∑_{i=1}^∞ Ei) = ∑_{i=1}^∞ P(Ei),

where the notation ∑_{i=1}^∞ Ei is used for the union of a sequence of disjoint sets {Ei}_{i=1}^∞. This is known as countable additivity.

(iii) P(Ω) = 1.

The tuple (Ω, F, P) is called a probability space.
PROPOSITION 1.2.1 Given a probability space (Ω, F, P), we have

(iv) P(E) ≤ 1 for all E ∈ F.

(v) P(∅) = 0.

(vi) P(E^c) = 1 − P(E) for all E ∈ F.

(vii) P(E ∪ F) = P(E) + P(F) − P(E ∩ F) for all E, F ∈ F.

(viii) Monotonicity: E, F ∈ F with E ⊂ F implies P(E) ≤ P(F).
Proof. (v) Take Ei = ∅ for all i; then

P(∅) = P(∑_{i=1}^∞ Ei) = ∑_{i=1}^∞ P(Ei) = ∑_{i=1}^∞ P(∅).

If P(∅) > 0, then the equality above implies that P(∅) = +∞, contradicting the definition of the probability measure P. Therefore P(∅) = 0.

(vi) Take E1 = E, E2 = E^c, Ei = ∅ for all i ≥ 3 and use countable additivity:

1 = P(Ω) = P(∑_{i=1}^∞ Ei) = P(E) + P(E^c) + ∑_{i=3}^∞ P(∅) = P(E) + P(E^c),

and we obtain P(E^c) = 1 − P(E) for all E ∈ F.

(iv) Use (vi):

P(E) = 1 − P(E^c) ≤ 1, since P(E^c) ≥ 0.

(vii) First consider the case E ∩ F = ∅. Use countable additivity with E1 = E, E2 = F, Ei = ∅ for all i ≥ 3:

P(E ∪ F) = P(∑_{i=1}^∞ Ei) = P(E) + P(F) + ∑_{i=3}^∞ P(Ei) = P(E) + P(F).

This case is known as finite additivity. Now consider the general case:

P(E ∪ F) = P(E ∪ (F\E)) = P(E) + P(F\E) = P(E) + P(F\(E ∩ F)),

where P(F) = P((F\(E ∩ F)) ∪ (E ∩ F)) = P(F\(E ∩ F)) + P(E ∩ F). It follows that

P(E ∪ F) = P(E) + P(F\(E ∩ F)) = P(E) + P(F) − P(E ∩ F).

(viii) If E, F ∈ F with E ⊂ F, then by finite additivity

P(F) = P(E ∪ (F\E)) = P(E) + P(F\E) ≥ P(E).
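For a finite probability space these identities can also be checked numerically. The following Python sketch is an illustration only; the fair-die masses and the events E, F are toy assumptions, not part of the notes.

```python
# Toy probability space: Omega = {1,...,6} with equal masses (a fair die).
p = {w: 1 / 6 for w in range(1, 7)}

def P(A):
    """P(A) as the sum of point masses, as in a discrete probability space."""
    return sum(p[w] for w in A)

E = {1, 2, 3}   # "at most three"
F = {2, 4, 6}   # "even"

# (vii) inclusion-exclusion and (viii) monotonicity
assert abs(P(E | F) - (P(E) + P(F) - P(E & F))) < 1e-12
assert P({2, 4}) <= P(F)   # {2, 4} is a subset of F
print(P(E | F))            # 0.8333... = 5/6
```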
DEFINITION 1.2.3 (A Probability Measure on a Field) Given a field F, a probability measure on the field is a function P : F → R satisfying

(i) For all E ∈ F, we have P(E) ≥ 0;

(ii) If {Ei}_{i=1}^∞ ⊂ F is a sequence of disjoint sets, namely Ei ∩ Ej = ∅ for i ≠ j, and ∑_{i=1}^∞ Ei ∈ F, then

P(∑_{i=1}^∞ Ei) = ∑_{i=1}^∞ P(Ei);

(iii) P(Ω) = 1.
DEFINITION 1.2.4 (A Preprobability Measure on a Field) Given a field F, a preprobability measure on the field is a function P : F → R satisfying

(i) For all E ∈ F, we have P(E) ≥ 0;

(ii) If {Ei}_{i=1}^n ⊂ F is a finite collection of disjoint sets, namely Ei ∩ Ej = ∅ for i ≠ j, then

P(∑_{i=1}^n Ei) = ∑_{i=1}^n P(Ei);

(iii) P(Ω) = 1.
LEMMA 1.2.1 Suppose F is a field and P is a preprobability measure on F. Then the following are equivalent:

(i) If {En}_{n=1}^∞ ⊂ F with En ⊃ En+1 for all n and ⋂_{n=1}^∞ En = ∅, then {P(En)}_{n=1}^∞ decreases to 0.

(ii) If {En}_{n=1}^∞ ⊂ F with En ⊃ En+1 for all n and ⋂_{n=1}^∞ En = E ∈ F, then {P(En)}_{n=1}^∞ decreases to P(E). This is known as Monotone Sequential Continuity from Above (MSCA).

(iii) If {En}_{n=1}^∞ ⊂ F with En ⊂ En+1 for all n and ⋃_{n=1}^∞ En = E ∈ F, then {P(En)}_{n=1}^∞ increases to P(E). This is known as Monotone Sequential Continuity from Below (MSCB).

(iv) If {En}_{n=1}^∞ ⊂ F are disjoint and ∑_{n=1}^∞ En = E ∈ F, then

P(∑_{n=1}^∞ En) = ∑_{n=1}^∞ P(En).

(v) P is a probability measure on F.
Proof.

• (i) and (ii) are equivalent. (ii) implies (i) because (i) is the special case of (ii) with E = ∅. To see that (i) implies (ii), consider Fn = En\E for all n; then

⋂_{n=1}^∞ Fn = ⋂_{n=1}^∞ (En ∩ E^c) = E ∩ E^c = ∅.

Therefore P(Fn) = P(En\E) = P(En) − P(E) decreases to 0, and P(En) decreases to P(E).

• (ii) and (iii) are equivalent. If (ii) is true, assume that {En}_{n=1}^∞ satisfies En ⊂ En+1 and ⋃_{n=1}^∞ En = E ∈ F. Then En^c ⊃ En+1^c and ⋂_{n=1}^∞ En^c = E^c ∈ F. Use (ii): P(En^c) decreases to P(E^c), indicating that 1 − P(En) decreases to 1 − P(E), and P(En) increases to P(E).

Conversely, if (iii) is true, assume that {En}_{n=1}^∞ satisfies En ⊃ En+1 and ⋂_{n=1}^∞ En = E ∈ F. Then En^c ⊂ En+1^c and ⋃_{n=1}^∞ En^c = E^c ∈ F. Use (iii): P(En^c) increases to P(E^c), indicating that 1 − P(En) increases to 1 − P(E), and P(En) decreases to P(E).

• (iii) implies (iv). Let Fn = ∑_{k=1}^n Ek; then Fn ⊂ Fn+1 and ⋃_{n=1}^∞ Fn = ∑_{n=1}^∞ En = E. Use (iii): P(Fn) increases to P(E), where

P(Fn) = P(∑_{k=1}^n Ek) = ∑_{k=1}^n P(Ek).

Therefore

lim_{n→∞} P(Fn) = ∑_{k=1}^∞ P(Ek) = P(E) = P(∑_{n=1}^∞ En).

• (iv) implies (iii). Let F1 = E1 and Fn+1 = En+1 − En for all n; then {Fn}_{n=1}^∞ is a sequence of disjoint sets with ∑_{n=1}^∞ Fn = ⋃_{n=1}^∞ En. Use (iv):

P(⋃_{n=1}^∞ En) = P(∑_{n=1}^∞ Fn) = ∑_{n=1}^∞ P(Fn) = P(E1) + lim_{n→∞} ∑_{k=1}^n (P(Ek+1) − P(Ek)) = lim_{n→∞} P(En).

• (iv) and (v) are equivalent. This is the definition of a probability measure on a field.
DEFINITION 1.2.5 (Traces) Let F be a σ-field on Ω and Ω0 ⊂ Ω. Then

F0 := F ∩ Ω0 := {F ∩ Ω0 : F ∈ F}

is a σ-field on Ω0, called the trace of F on Ω0.

It is easy to check that F ∩ Ω0 is a σ-field. If E ∈ F ∩ Ω0, then there exists F ∈ F such that E = F ∩ Ω0, and Ω0\E = F^c ∩ Ω0 ∈ F ∩ Ω0. If {En}_{n=1}^∞ ⊂ F ∩ Ω0, then for all n, En = Fn ∩ Ω0 for some Fn ∈ F, and

⋃_{n=1}^∞ En = ⋃_{n=1}^∞ (Fn ∩ Ω0) = (⋃_{n=1}^∞ Fn) ∩ Ω0 ∈ F ∩ Ω0.

Therefore, F ∩ Ω0 is a σ-field on Ω0.
PROPOSITION 1.2.2 Let F be a σ-field on Ω generated by a collection A of subsets of Ω,
and Ω0 ⊂ Ω. Then σ(A ∩ Ω0 ) = σ(A) ∩ Ω0 , where A ∩ Ω0 := {A ∩ Ω0 : A ∈ A}.
Proof. We proceed by the following steps.
(I) σ(A ∩ Ω0 ) ⊂ σ(A) ∩ Ω0 . To see this, it suffices to show that
(1) A ∩ Ω0 ⊂ σ(A) ∩ Ω0 . This can be seen easily because for all A ∩ Ω0 ∈ A ∩ Ω0 ,
A ∈ A ⊂ σ(A) and therefore A ∩ Ω0 ∈ σ(A) ∩ Ω0 .
(2) σ(A) ∩ Ω0 is a σ-field. This can be seen by the remark above.
(II) σ(A∩Ω0 ) ⊃ σ(A)∩Ω0 . Namely, for all E ∈ σ(A), we need to show that E ∩Ω0 ∈ σ(A∩Ω0 ).
Using set terminology, it suffices to show that
G := {E ∈ σ(A) : E ∩ Ω0 ∈ σ(A ∩ Ω0 )} ⊃ σ(A)
Then it suffices to show
(1) G is a σ-field Namely, we need to verify the following
(a) E ∈ G implies E^c ∈ G. By definition, E ∈ G means E ∈ σ(A) and E ∩ Ω0 ∈ σ(A ∩ Ω0). The complement E^c ∈ σ(A) since σ(A) is a σ-field, and

E^c ∩ Ω0 = (Ω ∩ E^c) ∩ Ω0 = Ω0\(E ∩ Ω0).

Since σ(A ∩ Ω0) is a σ-field on Ω0, E^c ∩ Ω0 = Ω0\(E ∩ Ω0) ∈ σ(A ∩ Ω0), and we obtain E^c ∈ G by the definition of G.

(b) {En}_{n=1}^∞ ⊂ G implies ⋃_{n=1}^∞ En ∈ G. Since En ∈ G for all n, we have En ∈ σ(A) and En ∩ Ω0 ∈ σ(A ∩ Ω0). Now ⋃_{n=1}^∞ En ∈ σ(A) since σ(A) is a σ-field, and

(⋃_{n=1}^∞ En) ∩ Ω0 = ⋃_{n=1}^∞ (En ∩ Ω0) ∈ σ(A ∩ Ω0),

since for all n the set En ∩ Ω0 is in σ(A ∩ Ω0) and σ(A ∩ Ω0) is itself a σ-field.
(2) G includes A as a subcollection. For all E ∈ A, we have naturally E ∈ σ(A), and
E ∩ Ω0 ∈ A ∩ Ω0 indicates that E ∩ Ω0 ∈ A ∩ Ω0 ⊂ σ(A ∩ Ω0 ). Therefore, A ⊂ G by
the definition of G.
EXAMPLE 1.2.1 (2.13)
(a) Let F consist of the finite and the cofinite sets (A being cofinite if A^c is finite) in an infinite Ω, and define P on F by taking P(A) to be 0 or 1 as A is finite or cofinite (note that P is not well defined if Ω is finite). Show that P is finitely additive.

(b) Show that this P is not countably additive if Ω is countably infinite.

(c) Show that this P is countably additive if Ω is uncountable.

(d) Now let F be the σ-field consisting of the countable and the cocountable sets in an uncountable Ω, and define P on F by taking P(A) to be 0 or 1 as A is countable or cocountable (note that P is not well defined if Ω is countable). Show that P is countably additive.
Proof. (a) Let A, B ∈ F be disjoint sets.

(I) If A, B are both finite, then A + B is also finite and we have

P(A + B) = 0 = 0 + 0 = P(A) + P(B).

(II) If A is finite and B is cofinite, then (A + B)^c ⊂ B^c is finite and therefore A + B is cofinite. Then

P(A + B) = 1 = 0 + 1 = P(A) + P(B).

(III) If A, B are both cofinite, we show that this is impossible, namely A ∩ B ≠ ∅. Suppose that A ∩ B = ∅; then A ⊂ B^c, which is a finite set, indicating that A is finite. But A being cofinite means that A^c is finite, and since Ω is an infinite set, A = Ω\A^c is infinite, contradicting A being finite.

(b) Let Ω = N+ with En = {n} for all n ∈ N+. Then every En is a finite set with P(En) = 0, and the En are mutually disjoint. Nevertheless, we have

∑_{n=1}^∞ En = N+,

which is a cofinite set. Therefore

P(∑_{n=1}^∞ En) = 1 ≠ 0 = ∑_{n=1}^∞ P(En).
(c) Assume that {En}_{n=1}^∞ is a countable sequence of disjoint sets in F with ∑_{n=1}^∞ En ∈ F. In (a) we saw that when two sets A, B are disjoint, they cannot both be cofinite. Then for all k, the sets ⋃_{n≠k} En and Ek are disjoint, and therefore they cannot both be cofinite. Note that Ω is uncountable.

(i) If all En's are finite, then ⋃_{n=1}^∞ En is at most countable. But ⋃_{n=1}^∞ En ∈ F, so it is either finite or cofinite, and since Ω is uncountable, a cofinite set is itself uncountable. Therefore ⋃_{n=1}^∞ En is finite, indicating that En = ∅ for all n > k for some k. Thus

∑_{n=1}^∞ P(En) = ∑_{n=1}^k P(En) = 0 = P(∑_{n=1}^∞ En).

(ii) If there exists a set Ek that is cofinite, then ⋃_{n≠k} En is not cofinite; in fact it is finite, since it is the difference (∑_{n=1}^∞ En)\Ek and is still in F. Therefore only finitely many of the En with n ≠ k are nonempty, say En = ∅ for all n > m for some m. Then, since ∑_{n=1}^∞ En ⊃ Ek is cofinite,

∑_{n=1}^∞ P(En) = ∑_{n=1}^m P(En) = ∑_{n=1, n≠k}^m P(En) + P(Ek) = 1 = P(∑_{n=1}^∞ En).

To sum up, P is countably additive if Ω is uncountable.
(d) (i) First we show that if A, B are disjoint sets in F, then either A, B are both countable or exactly one of them is cocountable. In fact, if both A and B are cocountable, then A ⊂ B^c and B^c is countable, yielding that A is countable. But Ω is uncountable, and A being cocountable means that A^c is countable, so Ω\A^c = A is uncountable. This contradicts A being both countable and uncountable.

(ii) Let {En}_{n=1}^∞ be a countable sequence of disjoint sets in F.

(1) If all En's are countable, then ∑_{n=1}^∞ En is also countable and

P(∑_{n=1}^∞ En) = 0 = ∑_{n=1}^∞ P(En).

(2) If there exists a k such that Ek is cocountable, then, since ⋃_{n≠k} En and Ek are disjoint, ⋃_{n≠k} En is countable. Therefore every set En with n ≠ k is countable. On the other hand, Ek being cocountable yields that ∑_{n=1}^∞ En is cocountable, because (∑_{n=1}^∞ En)^c ⊂ Ek^c is countable. Hence

∑_{n=1}^∞ P(En) = ∑_{n=1, n≠k}^∞ P(En) + P(Ek) = P(Ek) = 1 = P(∑_{n=1}^∞ En).

To sum up, P is countably additive on F.
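Part (b) of this example is easy to see numerically: the singletons each get mass 0 while their union, being cofinite, gets mass 1. The following Python sketch is an illustration only; the encoding of finite/cofinite sets and the helper names P and union are assumptions made for the example.

```python
# Sets in the finite/cofinite field F on an infinite Omega are encoded as
# ("finite", S): the set S itself, or ("cofinite", S): the complement of S.
def P(A):
    kind, _ = A
    return 0 if kind == "finite" else 1

def union(A, B):
    """Union of two disjoint sets of F in this encoding."""
    (ka, Sa), (kb, Sb) = A, B
    if ka == kb == "finite":
        return ("finite", Sa | Sb)
    if ka == "finite":                 # finite + cofinite is cofinite
        return ("cofinite", Sb - Sa)
    if kb == "finite":
        return ("cofinite", Sa - Sb)
    raise ValueError("two disjoint sets cannot both be cofinite (part (a)(III))")

A = ("finite", {1, 2})
B = ("cofinite", {1, 2, 3})            # complement of {1,2,3}; disjoint from A
assert P(union(A, B)) == P(A) + P(B) == 1   # finite additivity, case (II) of part (a)

# Part (b): each singleton {n} is finite with P = 0, yet their union is N_+ (cofinite).
print(sum(P(("finite", {n})) for n in range(1, 100)), "vs", P(("cofinite", set())))  # 0 vs 1
```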
EXAMPLE 1.2.2 (Homework #2)
1. Let Ω0 be a nonempty proper subset of a set Ω.
Let Ω1 := Ω − Ω0 .
(a) Prove that if G0 is a σ-field on Ω0 (that is, a σ-field of subsets of Ω0 ) and G1 is a σ-field
on Ω1 , then
G := {G0 + G1 : G0 ∈ G0 , G1 ∈ G1 }
is a σ-field on Ω. (Here ”+” denotes disjoint union.)
For the remaining parts, let A be a collection of subsets of Ω that generates a σ-field F
on Ω, let
A0 := A ∩ Ω0 = {A ∩ Ω0 : A ∈ A}
be the trace of A on Ω0 , and let
F0 := F ∩ Ω0 = {F ∩ Ω0 : F ∈ F}
be the trace of F on Ω0 . Similarly, let A1 and F1 be the corresponding traces on Ω1 .
(b) Prove that F0 is a σ-field.
(c) Show that A0 ⊂ F0 .
(d) Show that any σ-field on Ω0 that contains A0 must contain F0 .
Hint: Let G0 be a σ-field on Ω0 that contains A0 and let G1 be the σ-field on Ω1
generated by A1 . Use part (a).
(e) Conclude that F0 is the σ-field generated by A0 .
Proof. (a) In order to show that G is a σ-field, it suffices to prove:

(I) G is closed under complementation. Namely, G ∈ G implies G^c ∈ G. If G ∈ G, then there exist G0 ∈ G0 and G1 ∈ G1 such that G = G0 + G1. Then

G^c = (G0 + G1)^c = G0^c ∩ G1^c = ((Ω0 + Ω1)\G0) ∩ ((Ω0 + Ω1)\G1) = (Ω1 + (Ω0\G0)) ∩ (Ω0 + (Ω1\G1)) = (Ω0\G0) ∪ (Ω1\G1).

Now that G0 and G1 are σ-fields on Ω0 and Ω1 respectively, Ω0\G0 ∈ G0 and Ω1\G1 ∈ G1. Therefore

G^c = (Ω0\G0) ∪ (Ω1\G1) = (Ω0\G0) + (Ω1\G1) ∈ G

by the definition of G. Note that (Ω0\G0) ∩ (Ω1\G1) = ∅ holds automatically since these are subsets of the disjoint sets Ω0 and Ω1.

(II) G is closed under unions of countable sequences of sets. Let {Gn}_{n=1}^∞ ⊂ G be a countable sequence of sets in G. Then for all n there exist Gn^(0) ∈ G0 and Gn^(1) ∈ G1 such that Gn = Gn^(0) + Gn^(1). Then

⋃_{n=1}^∞ Gn = ⋃_{n=1}^∞ (Gn^(0) ∪ Gn^(1)) = (⋃_{n=1}^∞ Gn^(0)) ∪ (⋃_{n=1}^∞ Gn^(1)),

since ω lies in the left-hand side exactly when ω ∈ Gm^(0) or ω ∈ Gm^(1) for some m, which happens exactly when ω lies in one of the two unions on the right-hand side. Since both G0 and G1 are σ-fields with {Gn^(0)}_{n=1}^∞ ⊂ G0 and {Gn^(1)}_{n=1}^∞ ⊂ G1, the unions satisfy ⋃_{n=1}^∞ Gn^(0) ∈ G0 and ⋃_{n=1}^∞ Gn^(1) ∈ G1. The union of these two unions can be written as a disjoint sum because they are subsets of the disjoint sets Ω0 and Ω1. Hence

⋃_{n=1}^∞ Gn = (⋃_{n=1}^∞ Gn^(0)) + (⋃_{n=1}^∞ Gn^(1)) ∈ G

by the definition of G, which completes the proof of closure under unions of countable sequences of sets.
(b) To prove that F0 is a σ-field, it suffices to prove:

(I) F0 is closed under complementation. If E ∈ F0, then there exists F ∈ F such that E = F ∩ Ω0. Then Ω0\E = Ω0\(F ∩ Ω0) = F^c ∩ Ω0. Since F is a σ-field, F^c ∈ F and therefore Ω0\E = F^c ∩ Ω0 ∈ F0 by the definition of F0.

(II) F0 is closed under unions of countable sequences of sets. Let {En}_{n=1}^∞ be a countable sequence of sets in F0. Then for all n there exists Fn ∈ F such that En = Fn ∩ Ω0. Therefore

⋃_{n=1}^∞ En = ⋃_{n=1}^∞ (Fn ∩ Ω0) = (⋃_{n=1}^∞ Fn) ∩ Ω0.

Since F is a σ-field and Fn ∈ F for all n, ⋃_{n=1}^∞ Fn ∈ F. Hence

⋃_{n=1}^∞ En = (⋃_{n=1}^∞ Fn) ∩ Ω0 ∈ F0

by the definition of F0.
(c) For all A0 ∈ A0 there exists A ∈ A such that A0 = A ∩ Ω0. Then A ∈ A ⊂ σ(A) = F, and this means that A0 = A ∩ Ω0 ∈ F ∩ Ω0 = F0. Therefore A0 ⊂ F0.
(d) Let G0 be any σ-field on Ω0 containing A0 and let G1 be the σ-field on Ω1 generated by A1 := A ∩ Ω1 = {A ∩ Ω1 : A ∈ A}, where Ω1 = Ω − Ω0. Let

G = {G0 + G1 : G0 ∈ G0, G1 ∈ G1}

Then by (a) we know that G is a σ-field. Moreover, for all A ∈ A,
A = (A ∩ Ω0 ) + (A ∩ Ω1 ) and A ∩ Ω0 ∈ A0 ⊂ G0 , A ∩ Ω1 ∈ A1 ⊂ G1 . Hence
A = (A ∩ Ω0 ) + (A ∩ Ω1 ) ∈ G
by the definition of G. This means that A ⊂ G and therefore, F = σ(A) ⊂ G.
Now for all F0 ∈ F0 , there exists F ∈ F such that F0 = F ∩ Ω0 . Let F1 = F ∩ Ω1 , then
by F ⊂ G we know that F ∈ G. Namely, there exists G0 ∈ G0 , G1 ∈ G1 such that
F0 + F1 = F = G0 + G1
Now we show that F0 = G0 .
(I) F0 ⊂ G0 . For all ω ∈ F0 we have ω ∈ F = G0 + G1 , namely, either ω ∈ G0 or
ω ∈ G1 . But G0 ⊂ Ω0 and G1 ⊂ Ωc0 . If ω ∈ G1 then ω ∈ G1 ⊂ Ωc0 , contradicting
with ω ∈ F0 ⊂ Ω0 . Therefore ω ∈ G0 and F0 ⊂ G0 .
(II) G0 ⊂ F0 . For all ω ∈ G0 , if ω ∈ F1 then ω ∈ F1 ⊂ Ω1 = Ωc0 . But ω ∈ G0 ⊂ Ω0 , and
this is a contradiction. Therefore ω ∈ F0 and G0 ⊂ F0 .
Hence F0 = G0 ∈ G0 . Since F0 is any element in F0 , then F0 ⊂ G0 .
(e) By (b) and (c) we know that F0 is a σ-field containing A0 . Namely F0 ⊃ σ(A0 ). By
(d) we know that F0 ⊂ σ(A0 ), since σ(A0 ) itself is a σ-field containing A0 . Therefore
F0 = σ(A0 ).
EXAMPLE 1.2.3 (Discrete Probability) Let Ω be a finite or countably infinite set and let F be the total σ-field on Ω. For ω ∈ Ω, assign probability masses P({ω}) = p(ω) with nonnegative values such that ∑_{ω∈Ω} p(ω) = 1. Then for all A ⊂ Ω, define P(A) = ∑_{ω∈A} p(ω). Then P is a probability measure on (Ω, F).
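A concrete instance of this construction, as an illustrative Python sketch (the geometric masses p(k) = 2^{-k} on Ω = Z+ are an assumed toy example, not part of the notes; a finite truncation is used only to display the sums):

```python
# Discrete probability on Omega = Z_+ with masses p(k) = 2**(-k); the masses sum to 1.
def p(k):
    return 2.0 ** (-k)

def P(A):
    """P(A) = sum of p(omega) over omega in A; A is any finite iterable of positive integers."""
    return sum(p(k) for k in A)

evens = range(2, 200, 2)   # truncation of {2, 4, 6, ...}; the neglected tail is below 2**-199
odds = range(1, 200, 2)
print(P(evens))            # ~ 1/3
print(P(odds))             # ~ 2/3
print(P(evens) + P(odds))  # ~ 1 = P(Omega): additivity over the disjoint split of Z_+
```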
1.2.2 Uniform Probability on (0, 1]

In this part, our goal is to construct a uniform probability space on the unit interval Ω = (0, 1]. The uniform probability measure on (0, 1] is known as the Lebesgue measure.
PROPOSITION 1.2.3 Define B0 to be the collection of all disjoint finite unions of left-open
right-closed subintervals of Ω. Then B0 is a field.
Proof. We may write

B0 = {∑_{i=1}^n Ii : Ii = (ai, bi], Ii ∩ Ij = ∅ for i ≠ j}.
To see that B0 is a field, we need to verify:

(I) B0 is closed under complementation. Let ∑_{i=1}^n Ii ∈ B0. We may rearrange the order of the intervals Ii = (ai, bi] so that

a1 ≤ b1 ≤ a2 ≤ b2 ≤ ... ≤ an ≤ bn.

Therefore

(∑_{i=1}^n Ii)^c = (0, a1] + ∑_{i=1}^{n−1} (bi, ai+1] + (bn, 1] ∈ B0.

(II) B0 is closed under finite intersection. Assume that ∑_{i=1}^n Ii, ∑_{j=1}^m Jj ∈ B0. Then we have

(∑_{i=1}^n Ii) ∩ (∑_{j=1}^m Jj) = ∑_{i=1}^n ∑_{j=1}^m Ii ∩ Jj,

where each Ii ∩ Jj is either empty or a left-open right-closed interval. The sets {Ii ∩ Jj}_{i,j} are finitely many disjoint sets because i ≠ i' or j ≠ j' yields (Ii ∩ Jj) ∩ (Ii' ∩ Jj') = (Ii ∩ Ii') ∩ (Jj ∩ Jj') = ∅. Therefore

(∑_{i=1}^n Ii) ∩ (∑_{j=1}^m Jj) = ∑_{i=1}^n ∑_{j=1}^m Ii ∩ Jj ∈ B0.
PROPOSITION 1.2.4 Let T be the collection of all open sets in (0, 1], namely, the Euclidean
topology on (0, 1]. Then σ(B0 ) = σ(T ).
Proof. It suffices to show that B0 ⊂ σ(T) and T ⊂ σ(B0).

(I) T ⊂ σ(B0). If G is an open set in (0, 1], then there exists a countable sequence of disjoint open intervals {Ii}_{i≥1} such that G = ⋃_i Ii. Write Ii = (ai, bi) and denote |Ii| = bi − ai. For each i, let

εik = min{1/k, (bi − ai)/2},  Iik = (ai, bi − εik],  k ∈ N+.

Then for each i, k we have Iik ∈ B0 and

G = ⋃_{i≥1} Ii = ⋃_i ⋃_{k=1}^∞ Iik ∈ σ(B0),

because σ(B0) is a σ-field containing B0 and each Iik is in B0.

(II) B0 ⊂ σ(T). For all ∑_{i=1}^n Ii ∈ B0, write

∑_{i=1}^n Ii = ∑_{i=1}^n (ai, bi] = ∑_{i=1}^n ⋂_{k=1}^∞ (ai, bi + 1/k),

where for all k ∈ N+ the set (ai, bi + 1/k) (intersected with (0, 1]) is an open set in T. Since σ(T) is a σ-field, we have

∑_{i=1}^n Ii = ∑_{i=1}^n ⋂_{k=1}^∞ (ai, bi + 1/k) ∈ σ(T),

because the set operations involved are a finite union of countable intersections.
PROPOSITION 1.2.5 For all ∑_{i=1}^n (ai, bi] ∈ B0, define

P(∑_{i=1}^n (ai, bi]) = ∑_{i=1}^n (bi − ai).

Then P is a preprobability measure on the field B0. Furthermore, P is a probability measure on the field B0.
Proof. First we need to verify:

(I) P is well-defined. If ∑_{i=1}^n (ai, bi] = ∑_{j=1}^m (cj, dj], and we use Ii, Jj to denote (ai, bi] and (cj, dj] respectively, then we have

∑_{i=1}^n Ii = ∑_{j=1}^m Jj = ∑_{i=1}^n ∑_{j=1}^m Ii ∩ Jj,

and the length of an interval does not change if it is computed as the sum of the lengths of finitely many subintervals that partition it. Therefore

∑_{i=1}^n |Ii| = ∑_{j=1}^m |Jj| = ∑_{i=1}^n ∑_{j=1}^m |Ii ∩ Jj|.

(II) P is nonnegative. This is trivial because the length of an interval is nonnegative, and therefore the sum of the lengths is nonnegative.

(III) P(Ω) = 1. This is also trivial because P((0, 1]) = 1 − 0 = 1.

(IV) P is finitely additive. Let ∑_{i=1}^n Ii, ∑_{j=1}^m Jj ∈ B0 be disjoint; then Ii ∩ Jj = ∅ for all i, j and

P(∑_{i=1}^n Ii + ∑_{j=1}^m Jj) = ∑_{i=1}^n |Ii| + ∑_{j=1}^m |Jj| = P(∑_{i=1}^n Ii) + P(∑_{j=1}^m Jj).

Therefore P is a preprobability measure on the field B0.
Next we need to show that P is a probability measure on the field B0. According to Lemma 1.2.1, it suffices to show that if {En}_{n=1}^∞ ⊂ B0 with En ⊃ En+1 for all n and ⋂_{n=1}^∞ En = ∅, then P(En) decreases to 0. Write

En = ∑_{i=1}^{N(n)} (ain, bin].

For all ε > 0, let

Fn(ε) = ∑_{i=1}^{N(n)} [ain + δin, bin],  En*(ε) = ∑_{i=1}^{N(n)} (ain + 2δin, bin],

where

δin = min{ε/(N(n)2^{n+1}), (bin − ain)/4} ≤ ε/(N(n)2^{n+1}).

Then we have En*(ε) ⊂ Fn(ε) ⊂ En, where Fn(ε) is a closed set. Therefore

P(En\En*(ε)) = ∑_{i=1}^{N(n)} 2δin ≤ ε/2^n.

And we have

∅ ⊂ ⋂_{n=1}^∞ En*(ε) ⊂ ⋂_{n=1}^∞ Fn(ε) ⊂ ⋂_{n=1}^∞ En = ∅.

Now write

∅ = ⋂_{n=1}^∞ Fn(ε) = ⋂_{n=1}^∞ (⋂_{k=1}^n Fk(ε))

and use Cantor's intersection theorem (the sets ⋂_{k=1}^n Fk(ε) are closed, bounded, and decreasing): there exists m such that for all n ≥ m we have ⋂_{k=1}^n Fk(ε) = ∅. This shows that

∅ ⊂ ⋂_{n=1}^m En*(ε) ⊂ ⋂_{n=1}^m Fn(ε) = ∅,

implying that P(⋂_{n=1}^m En*(ε)) = 0. Therefore, for all n ≥ m, we have

P(En) ≤ P(Em) = P(⋂_{k=1}^m Ek) = P(⋂_{k=1}^m Ek) − P(⋂_{k=1}^m Ek*(ε)) ≤ P(⋃_{k=1}^m (Ek\Ek*(ε))) ≤ ∑_{k=1}^m P(Ek\Ek*(ε)) ≤ ∑_{k=1}^m ε/2^k < ε.

Together with the nonnegativity of the probability measure, we have P(En) → 0 as n → ∞.
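On explicit elements of B0 the pre-measure of Proposition 1.2.5 is simply a sum of interval lengths. The following Python sketch is an illustration only; the intervals and the helper name P are assumptions made for the example.

```python
def P(intervals):
    """P(sum of (a_i, b_i]) = sum of lengths; intervals is a list of disjoint (a, b] pairs."""
    return sum(b - a for a, b in intervals)

A = [(0.0, 0.25), (0.5, 0.75)]   # (0, 1/4] + (1/2, 3/4]
B = [(0.25, 0.5)]                # (1/4, 1/2], disjoint from A
print(P(A))             # 0.5
print(P(A + B))         # 0.75 = P(A) + P(B): finite additivity
print(P([(0.0, 1.0)]))  # 1.0 = P(Omega)
```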
1.3 Extension of Probability Measure

1.3.1 Extension Theorem
THEOREM 1.3.1 (Extension Theorem) Given a probability measure P on a field F0 , let
F := σ(F0 ). Then there exists a unique extension of P to a probability measure on F.
The proof of the existence part of the extension theorem is constructive by the following
steps:
(1) Define outer measure P ∗ on 2Ω .
(2) List Properties of P ∗ on 2Ω .
(3) Show that P ∗ = P on F0 .
(4) Define M by M = {A : P ∗ (A ∩ E) + P ∗ (Ac ∩ E) = P ∗ (E), for all E ⊂ Ω}.
(5) Reduce M to M = {A : P*(A ∩ E) + P*(A^c ∩ E) ≤ P*(E), for all E ⊂ Ω} by subadditivity.
(6) Show F0 ⊂ M.
(7) Show P ∗ is a probability measure on the σ-field M ⊃ σ(F0 ) by showing
(a) M is a field
(b) P ∗ is a probability measure on M
(c) P ∗ is countably additive on M
(d) M is closed under countable disjoint union
PROPOSITION 1.3.1 (Steps 1, 2) For all E ⊂ Ω, define the outer measure

P*(E) = inf{∑_{n=1}^∞ P(An) : E ⊂ ⋃_{n=1}^∞ An, An ∈ F0}.

Then:

(i) P*(∅) = 0.

(ii) P* is nonnegative.

(iii) P* is monotone.

(iv) P* is countably subadditive.

Proof. (i) is trivial because a sequence of empty sets covers the empty set, so 0 is an admissible value of the sum being minimized, and the infimum of a set of nonnegative numbers is also nonnegative; hence P*(∅) = 0. (ii) is trivial because P* is the infimum of a set of nonnegative numbers and is therefore nonnegative. So now we prove (iii) and (iv).
(iii) If E ⊂ F, then F ⊂ ⋃_{n=1}^∞ An implies E ⊂ ⋃_{n=1}^∞ An, and thus

{∑_{n=1}^∞ P(An) : F ⊂ ⋃_{n=1}^∞ An, An ∈ F0} ⊂ {∑_{n=1}^∞ P(An) : E ⊂ ⋃_{n=1}^∞ An, An ∈ F0}.

The infimum does not decrease when taken over a subset. Therefore P*(F) ≥ P*(E).

(iv) For all n and all ε > 0, there exists {Ank}_{k=1}^∞ ⊂ F0 such that En ⊂ ⋃_{k=1}^∞ Ank and

∑_{k=1}^∞ P(Ank) ≤ P*(En) + ε/2^n.

Meanwhile we have ⋃_{n=1}^∞ En ⊂ ⋃_{n=1}^∞ ⋃_{k=1}^∞ Ank, and by the definition of the infimum

P*(⋃_{n=1}^∞ En) ≤ ∑_{n=1}^∞ ∑_{k=1}^∞ P(Ank) ≤ ∑_{n=1}^∞ (P*(En) + ε/2^n) ≤ ∑_{n=1}^∞ P*(En) + ε.

Since ε > 0 is arbitrary, P*(⋃_{n=1}^∞ En) ≤ ∑_{n=1}^∞ P*(En).
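The infimum-over-covers definition can be made tangible on a finite toy example, where finite covers suffice and can be enumerated by brute force. The following Python sketch is an illustration only; the field F0, the masses P0, and the helper name outer are assumptions made for the example, not part of the construction in the notes.

```python
from itertools import chain, combinations

# Toy setup: Omega = {1,...,4}; F0 is the field generated by the partition {1,2}, {3,4}.
Omega = frozenset({1, 2, 3, 4})
F0 = [frozenset(), frozenset({1, 2}), frozenset({3, 4}), Omega]
P0 = {frozenset(): 0.0, frozenset({1, 2}): 0.3, frozenset({3, 4}): 0.7, Omega: 1.0}

def outer(E):
    """P*(E): infimum of sum of P0 over covers of E by sets of F0 (finite covers suffice here)."""
    E = frozenset(E)
    best = float("inf")
    for cover in chain.from_iterable(combinations(F0, r) for r in range(1, len(F0) + 1)):
        if E <= frozenset().union(*cover):
            best = min(best, sum(P0[A] for A in cover))
    return best

print(outer({1}))      # 0.3: the cheapest admissible cover is {1,2}
print(outer({1, 3}))   # 1.0: needs both blocks (or Omega itself)
print(outer(set()))    # 0.0
```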
PROPOSITION 1.3.2 (Step 3) P* = P on F0. Namely, P is the restriction of P* to the field F0.

Proof. Let A ∈ F0. Take A1 = A, An = ∅ for all n ≥ 2; then

P*(A) ≤ ∑_{n=1}^∞ P(An) = P(A).

For the reverse inequality, if {An}_{n=1}^∞ ⊂ F0 with A ⊂ ⋃_{n=1}^∞ An, then, since F0 is a field, An ∩ A ∈ F0, and by the countable subadditivity of P we have

P(A) = P(A ∩ ⋃_{n=1}^∞ An) = P(⋃_{n=1}^∞ (A ∩ An)) ≤ ∑_{n=1}^∞ P(An ∩ A) ≤ ∑_{n=1}^∞ P(An),

where the last inequality holds by monotonicity. Then take the infimum over all such covers on the right-hand side and we get P(A) ≤ P*(A).
PROPOSITION 1.3.3 (Steps 4, 5, 6) F0 is included in

M = {A : P*(A ∩ E) + P*(A^c ∩ E) = P*(E), for all E ⊂ Ω}.

Proof. By the subadditivity of the outer measure,

M = {A : P*(A ∩ E) + P*(A^c ∩ E) ≤ P*(E), for all E ⊂ Ω}.

Now let A ∈ F0 and let E be any subset of Ω. By the definition of the infimum, for all ε > 0 there exists a cover {An}_{n=1}^∞ ⊂ F0 with E ⊂ ⋃_{n=1}^∞ An such that

∑_{n=1}^∞ P(An) ≤ P*(E) + ε.

Now write

P*(E ∩ A) + P*(E ∩ A^c) ≤ ∑_{n=1}^∞ P(An ∩ A) + ∑_{n=1}^∞ P(An ∩ A^c) = ∑_{n=1}^∞ P(An) ≤ P*(E) + ε,

where the first inequality holds because F0 is a field and A, An ∈ F0 implies that A ∩ An, A^c ∩ An ∈ F0 for all n, with A ∩ E ⊂ ⋃_{n=1}^∞ (A ∩ An) and A^c ∩ E ⊂ ⋃_{n=1}^∞ (A^c ∩ An). Since

P*(E ∩ A) + P*(E ∩ A^c) ≤ P*(E) + ε

for all ε > 0, this implies that

P*(E ∩ A) + P*(E ∩ A^c) ≤ P*(E).

Therefore we have A ∈ M.
PROPOSITION 1.3.4 (Step 7) (Ω, M, P*) is a probability space, where

M = {A : P*(A ∩ E) + P*(A^c ∩ E) ≤ P*(E), for all E ⊂ Ω}.
Proof. We complete the proof of this proposition by the following steps:
(a) M is a field. It suffices to verify:

(I) M is closed under complementation. This is easy because if A ∈ M, then

P*(A^c ∩ E) + P*((A^c)^c ∩ E) = P*(A ∩ E) + P*(A^c ∩ E) ≤ P*(E),

which implies that A^c ∈ M.

(II) M is closed under finite union. Let A, B ∈ M. For all E ⊂ Ω,

P*(E) ≥ P*(E ∩ A) + P*(E ∩ A^c)
      ≥ P*(E ∩ A ∩ B) + P*(E ∩ A ∩ B^c) + P*(E ∩ A^c ∩ B) + P*(E ∩ A^c ∩ B^c)
      ≥ P*(E ∩ (A ∪ B)^c) + P*(E ∩ ((A ∩ B) ∪ (A ∩ B^c) ∪ (A^c ∩ B)))
      = P*(E ∩ (A ∪ B)^c) + P*(E ∩ (A ∪ B)),

where the last inequality uses the subadditivity of P* and the fact that A^c ∩ B^c = (A ∪ B)^c. Therefore A ∪ B ∈ M.

(b) P* is a preprobability measure on M. It suffices to show:

(I) P* is nonnegative. This is trivial because P* is the infimum of a set of nonnegative numbers and thus is nonnegative.

(II) P*(Ω) = 1. This is trivial because Ω ∈ F0 and P*(Ω) = P(Ω) = 1.

(III) P* is finitely additive. Let A, B ∈ M be disjoint. Take E = A + B; then

P*(A + B) = P*((A + B) ∩ A) + P*((A + B) ∩ A^c) = P*(A) + P*(B).

(c) P* is countably additive on M. Let {An}_{n=1}^∞ ⊂ M be a countable sequence of disjoint sets. Then for all N, using finite additivity on M and the monotonicity of P*,

∑_{n=1}^N P*(An) = P*(∑_{n=1}^N An) ≤ P*(∑_{n=1}^∞ An).

Letting N → ∞ gives

∑_{n=1}^∞ P*(An) ≤ P*(∑_{n=1}^∞ An).

The reverse inequality holds naturally since P* is countably subadditive.

(d) M is closed under countable disjoint union. Let {An}_{n=1}^∞ ⊂ M be a countable sequence of disjoint sets. Since M is a field, ∑_{n=1}^N An ∈ M for all N, which means

P*(E) ≥ P*((∑_{n=1}^N An) ∩ E) + P*((∑_{n=1}^N An)^c ∩ E) ≥ P*((∑_{n=1}^N An) ∩ E) + P*((∑_{n=1}^∞ An)^c ∩ E).

Here we need to show that

P*((∑_{n=1}^N An) ∩ E) = ∑_{n=1}^N P*(An ∩ E).

In fact we only need to prove the case N = 2; the rest is induction. Since A1, A2 ∈ M are disjoint,

P*((A1 + A2) ∩ E) = P*(A1 ∩ ((A1 + A2) ∩ E)) + P*(A1^c ∩ ((A1 + A2) ∩ E)) = P*(A1 ∩ E) + P*(A2 ∩ E),

which finishes the case N = 2. Therefore we now have

P*(E) ≥ ∑_{n=1}^N P*(E ∩ An) + P*((∑_{n=1}^∞ An)^c ∩ E).

Take N → ∞ and use the countable subadditivity of P* (note that (∑_{n=1}^∞ An) ∩ E = ⋃_{n=1}^∞ (An ∩ E)); we have

P*(E) ≥ ∑_{n=1}^∞ P*(E ∩ An) + P*((∑_{n=1}^∞ An)^c ∩ E) ≥ P*((∑_{n=1}^∞ An) ∩ E) + P*((∑_{n=1}^∞ An)^c ∩ E).

Hence ∑_{n=1}^∞ An ∈ M.
And the uniqueness of the extension is given by the following.

PROPOSITION 1.3.5 (Uniqueness) If P and P′ are probability measures on the σ-field F = σ(F0) such that P = P′ on F0, then P = P′ on F.

Proof. Define

G = {A ∈ F : P(A) = P′(A)}.

Then we only need to show that G ⊃ F = σ(F0). Now that F0 is a field, by the monotone class theorem it suffices to show that G is a M.C. containing F0, because MC(F0) = σ(F0).

(I) G is a M.C. Let {An}_{n=1}^∞ ⊂ G increase to A ∈ F and {Bn}_{n=1}^∞ ⊂ G decrease to B ∈ F. Then using MSCB and MSCA we have

P(A) = lim P(An) = lim P′(An) = P′(A),  P(B) = lim P(Bn) = lim P′(Bn) = P′(B),

implying that A, B ∈ G.

(II) G contains F0. If A ∈ F0, then A ∈ σ(F0) = F is trivial, and P = P′ on F0 means that P(A) = P′(A).
1.3.2 Completeness

DEFINITION 1.3.1 (Complete Probability Space) A probability space (Ω, F, P) is complete if E ∈ F with P(E) = 0 and F ⊂ E imply F ∈ F.
Here is an equivalent description of the complete probability space.
THEOREM 1.3.2 (Complete Extension) Assume that (Ω, F, P) is a probability space. Let

P*(A) = inf{∑_{n=1}^∞ P(An) : A ⊂ ⋃_{n=1}^∞ An, An ∈ F}

and

M = {A ⊂ Ω : P*(A ∩ E) + P*(A^c ∩ E) = P*(E) for all E ⊂ Ω}.

Then (Ω, M, P*) is a complete probability space and is an extension of (Ω, F, P).
Proof. Following the proof of the extension theorem, we know that (Ω, M, P*) is an extension of (Ω, F, P). Now we need to show that it is complete, namely, a complete extension.

If B ∈ M with P*(B) = 0 and A ⊂ B, then for all E ⊂ Ω we have

P*(A ∩ E) + P*(A^c ∩ E) ≤ P*(B ∩ E) + P*(A^c ∩ E) ≤ P*(B ∩ E) + P*(E) ≤ P*(B) + P*(E) = P*(E).

Therefore A ∈ M.
THEOREM 1.3.3 (Minimal Complete Extension, Ex 3.10(b)) Assume that (Ω, F, P) is a probability space. Let

F+ := {A ⊂ Ω : there exist A′, B ∈ F such that A∆A′ ⊂ B with P(B) = 0}

and define

P+(A) = P(A′).

Then:
Then
• F + is a σ-field.
• P + is well-defined on F + .
• (Ω, F + , P + ) is a complete extension of (Ω, F, P ).
• If (Ω, F1, P1) is a complete extension of (Ω, F, P), then F+ ⊂ F1 and P1 = P+ on F+.
And the probability space (Ω, F + , P + ) is called the completion of probability space (Ω, F, P ).
Proof.
• F + is a σ-field. Namely, we want to show
(I) F+ is closed under complementation. Let A ∈ F+. Then there exist A′, B ∈ F such that A∆A′ ⊂ B with P(B) = 0. Now

A^c∆A′^c = (A^c − A′^c) ∪ (A′^c − A^c) = (A′ − A) ∪ (A − A′) = A∆A′ ⊂ B,

where A′^c, B ∈ F. Therefore A^c ∈ F+.

(II) F+ is closed under unions of countable sequences of sets. Let {An}_{n=1}^∞ ⊂ F+. Then for all n there exist A′n, Bn ∈ F such that An∆A′n ⊂ Bn with P(Bn) = 0. Let A′ = ⋃_{n=1}^∞ A′n and B = ⋃_{n=1}^∞ Bn. Then

0 ≤ P(B) ≤ ∑_{n=1}^∞ P(Bn) = 0,

namely P(B) = 0. Now let A = ⋃_{n=1}^∞ An; then

A∆A′ = (⋃_{n=1}^∞ An − ⋃_{n=1}^∞ A′n) ∪ (⋃_{n=1}^∞ A′n − ⋃_{n=1}^∞ An) ⊂ ⋃_{n=1}^∞ (An − A′n) ∪ ⋃_{n=1}^∞ (A′n − An) = ⋃_{n=1}^∞ (An∆A′n) ⊂ ⋃_{n=1}^∞ Bn = B,

where A′, B ∈ F. Therefore by definition A = ⋃_{n=1}^∞ An ∈ F+.

• P+ is well-defined on F+. Let A ∈ F+, and suppose there exist A′, A″, B′, B″ ∈ F with P(B′) = P(B″) = 0 such that A∆A′ ⊂ B′ and A∆A″ ⊂ B″. Then we need to check that P(A′) = P(A″). Now we have

A′ = ((A′ − A) ∪ A) − (A − A′) ⊂ (A′ − A) ∪ A ⊂ B′ ∪ A,
A = ((A − A′) ∪ A′) − (A′ − A) ⊂ (A − A′) ∪ A′ ⊂ B′ ∪ A′,
A = ((A − A″) ∪ A″) − (A″ − A) ⊂ (A − A″) ∪ A″ ⊂ B″ ∪ A″,
A″ = ((A″ − A) ∪ A) − (A − A″) ⊂ (A″ − A) ∪ A ⊂ B″ ∪ A.

The first and the third relations imply

A′ ⊂ B′ ∪ A ⊂ B′ ∪ B″ ∪ A″,

and similarly the second and the fourth imply

A″ ⊂ B″ ∪ A ⊂ B″ ∪ B′ ∪ A′.

Therefore, by subadditivity and monotonicity,

P(A′) ≤ P(B′) + P(B″) + P(A″) = P(A″),  P(A″) ≤ P(B″) + P(B′) + P(A′) = P(A′),

and we now obtain P(A′) = P(A″).
• (Ω, F + , P + ) is a complete extension of (Ω, F, P ). It suffices to show
(I) F ⊂ F + . If A ∈ F, then take A0 = A, B = ∅ and we have A0 ∆A = ∅ ⊂ B with
P (B) = 0 and thus F ∈ F + , implying that F ⊂ F+ .
(II) P = P + on F. Now that P + is well-defined, then for A ∈ F, we can take A0 = A, B = ∅
with P (B) = 0. Hence
P + (A) = P (A0 ) = P (A)
(III) (Ω, F + , P + ) is complete. Let B ∈ F + with P + (B) = 0, and A be any subset of B.
Then we know there exists B 0 , C ∈ F such that B∆B 0 ⊂ C with P (C) = 0. Now
A ∆ B′ = (A − B′) ∪ (B′ − A) ⊂ (B − B′) ∪ B′ ⊂ C ∪ B′
where 0 6 P (C ∪ B 0 ) 6 P (C) + P (B 0 ) = 0. Therefore A ∈ F + . This means that
(Ω, F + , P + ) is complete.
• If (Ω, F1, P1) is a complete extension of (Ω, F, P), then we have F+ ⊂ F1 and P1 = P+ on F+.
(I) F + ⊂ F1 . Let A ∈ F + . Then there exists A0 , B ∈ F such that A∆A0 ⊂ B with
P (B) = 0. Then we have A0 , B ∈ F1 . Now that F1 is complete, then A−A0 , A0 −A ⊂ B
means that A − A0 , A0 − A ∈ F1 . Since F1 is a field, then
A = ((A − A′) ∪ A′) \ (A′ − A) ∈ F1
and therefore F + ⊂ F1 .
(II) P1 = P + on F + . Define
G := {A ∈ F + : P1 (A) = P + (A)}
Then it suffices to show that G ⊃ F + . Namely, it suffices to show that (Ω, G, P + ) is a
complete extension of (Ω, F, P ).
(1) G is a σ-field. Namely, we need to show
(i) G is a field. To see this, we only need to check
* G is closed under complementation. Suppose A ∈ G; then A ∈ F+ and so is A^c. And we have P1(A) = P+(A). Then
P + (Ac ) = 1 − P + (A) = 1 − P1 (A) = P1 (Ac )
By the definition of G we have Ac ∈ G.
* G is closed under finite disjoint union. Suppose A, B ∈ G are disjoint.
Then we have A, B ∈ F + and F + is a σ-field, which have been checked earlier,
and thus A + B ∈ F + . Besides we have P1 (A) = P + (A), P1 (B) = P + (B).
Then
P1 (A + B) = P1 (A) + P1 (B) = P + (A) + P + (B) = P + (A + B)
By the definition of G we have A + B ∈ G.
(ii) G is a M.C..
* Let {An}_{n=1}^∞ ⊂ G be a sequence of increasing sets, namely An ⊂ A_{n+1}. Then the union is in F+ since every An is in F+. And by the monotone sequential continuity from below, we have
P1(⋃_{n=1}^∞ An) = lim_{n→∞} P1(An) = lim_{n→∞} P+(An) = P+(⋃_{n=1}^∞ An)
* Let {An}_{n=1}^∞ ⊂ G be a sequence of decreasing sets, namely An ⊃ A_{n+1}. Then the intersection is in F+ since every An is in F+. And by the monotone sequential continuity from above, we have
P1(⋂_{n=1}^∞ An) = lim_{n→∞} P1(An) = lim_{n→∞} P+(An) = P+(⋂_{n=1}^∞ An)
Therefore G is a M.C..
(2) G ⊃ F. This is trivial because every set in F is in F + , and P1 , P + are both
extension from (Ω, F, P ) and they must agree on F.
(3) (Ω, G, P + ) is a complete probability space. Now we have G being a σ-field on
Ω, and that P + restricted on G is also a probability measure. Then we need to check
the completeness of G. Let A ∈ G be a set with P + (A) = 0. Then by definition
A ∈ F + and there exists A0 , B ∈ F with A∆A0 ⊂ B, P (B) = 0. Moreover we have
P + (A) = P (A0 ) = 0. Now if C ⊂ A, then
C∆A0 = (C − A0 ) + (A0 − C) ⊂ B ∪ A0 ∈ F
Since both B and A0 are of probability 0, then B ∪ A0 is also of probability 0.
This means that C ∈ F + and P + (C) = P + (A0 ) = 0. But (Ω, F1 , P1 ) is a complete
probability space, and we now have F + ⊂ F1 and G ⊂ F + as well. Then A, C ∈ F +
implies that A, C ∈ F1 with P1 (C) 6 P1 (A) = P + (A) = 0. Therefore P1 (C) =
P + (C) = 0. By the definition of G we have C ∈ G. Hence (Ω, G, P + ) is a complete
probability space.
EXAMPLE 1.3.1 (2.2.18 in Chung) Let (Ω, F, P ) be a probability space with (Ω, F + , P + )
being its completion. Prove that
F + = {F ∪ N : F ∈ F, N ∈ N } = {F \N : F ∈ F, N ∈ N }
where N = {A ⊂ Ω : A ⊂ B for some B ∈ F with P(B) = 0}.
Proof. Define
G1 := {F ∪ N : F ∈ F, N ∈ N } G2 := {F − N : F ∈ F, N ∈ N }
Then it suffices to show F + ⊃ G1 = G2 ⊃ F + .
(I) G1 ⊂ G2 . Suppose F ∪ N ∈ G1 . Then by definition there exists a set B ∈ F with P (B) = 0,
such that N ⊂ B. Write
F ∪ N = (F ∪ B) − ((B − F ) − N ),
F ∪ B ∈ F,
((B − F ) − N ) ⊂ B
Then we have ((B − F ) − N ) ∈ N . Therefore by definition F ∪ N ∈ G2 .
(II) G2 ⊂ G1 . Suppose F − N ∈ G2 . Then there exists B ∈ F such that P (B) = 0 and N ⊂ B.
Then write
F − N = (F − B) ∪ ((B − N) ∩ F),   F − B ∈ F,   ((B − N) ∩ F) ⊂ B
Hence (B − N) ∩ F ∈ N. Therefore F − N ∈ G1.
(III) F+ ⊃ G1. Let F ∪ N ∈ G1. Then by definition there exists B ∈ F with P(B) = 0 such that N ⊂ B, and F ∈ F. Take A′ = F ∪ B; then A′, B ∈ F and
(F ∪ N) ∆ A′ = (F ∪ N) ∆ (F ∪ B) = (F ∪ B) − (F ∪ N) = (B − F) ∩ N^c ⊂ B
Therefore F ∪ N ∈ F+ by definition.
(IV) F + ⊂ G2 . Suppose A ∈ F + , then there exists A0 , B such that P (B) = 0 and A∆A0 ⊂ B.
Now write
A ∪ A′ = (A − A′) + (A′ − A) + (A ∩ A′)
and set B′ := B − (A ∪ A′), so that A′ ∪ B = (A ∪ A′) + B′ since A − A′ ⊂ B. Then
A = (A ∩ A′) + (A − A′) = (A′ ∪ B) − (B′ + (A′ − A)),
where A′ ∪ B ∈ F and B′ + (A′ − A) ⊂ B. Therefore by definition B′ + (A′ − A) ∈ N and A ∈ G2.
EXAMPLE 1.3.2 (3.2) Let P be a probability measure on a field F0. For every subset A of Ω let P* be the outer measure given by P, and denote by P also the extension of the probability measure from F0 to F = σ(F0).
(a) Show that
P*(A) = inf{P(B) : A ⊂ B, B ∈ F}    (3.9)
and
P_*(A) = sup{P(C) : C ⊂ A, C ∈ F}    (3.10)
and the infimum and supremum are always achieved.
(b) Show that A is P ∗ measurable(Namely, for all E ⊂ Ω, we have P ∗ (E) = P ∗ (E ∩ A) +
P ∗ (E ∩ Ac )) if and only if P ∗ (A) = P∗ (A).
(c) The outer and inner measures associated with a probability measure P on a σ-field are usually defined by (3.9) and (3.10). Show that (3.9) and (3.10) are the same as (3.1) and (3.2), where
P*(A) = inf{ Σ_{n=1}^∞ P(An) : A ⊂ ⋃_{n=1}^∞ An, An ∈ F0 }    (3.1)
and
P_*(A) = 1 − P*(A^c)    (3.2)
Proof. (a) Define
µ∗ (A) := inf{P (B) : A ⊂ B, B ∈ F},
µ∗ (A) := sup{P (C) : C ⊂ A, C ∈ F}
and we want to show:
(I) μ*(A) ≤ P*(A). Let {An}_{n=1}^∞ ⊂ F0 be any cover of A (that is, ⋃_{n=1}^∞ An ⊃ A, where ⋃_{n=1}^∞ An ∈ F since An ∈ F0 and F = σ(F0)). Then by monotonicity and countable subadditivity of the probability measure, we have
μ*(A) ≤ P(⋃_{n=1}^∞ An) ≤ Σ_{n=1}^∞ P(An)
Taking the infimum over all such covers on the right-hand side, we have μ*(A) ≤ P*(A).
(II) µ∗ (A) > P ∗ (A). For any set B ∈ F with B ⊃ A, by the monotonicity of outer measure
we have
P (B) = P ∗ (B) > P ∗ (A)
Taking the infimum to the left-hand side, and we have µ∗ (A) > P ∗ (A).
(III) μ_*(A) = P_*(A). By definition we have
P_*(A) = 1 − P*(A^c)
       = 1 − μ*(A^c)
       = 1 − inf{P(B) : B ⊃ A^c, B ∈ F}
       = 1 + sup{−P(B) : B ⊃ A^c, B ∈ F}
       = sup{1 − P(B) : B ⊃ A^c, B ∈ F}
       = sup{P(B^c) : B^c ⊂ A, B^c ∈ F}
       = sup{P(C) : C ⊂ A, C ∈ F} = μ_*(A)
(IV) The infimum defining μ* can be attained. By the definition of the infimum, for all n there exists Bn ∈ F, Bn ⊃ A, such that
P(Bn) − 1/n ≤ P*(A) ≤ P(Bn)
Now take B = ⋂_{n=1}^∞ Bn; then B ⊂ Bn for all n and A ⊂ B. By the monotonicity of the probability measure we have
P(B) − 1/n ≤ P(Bn) − 1/n ≤ P*(A) ≤ P(B),   for all n
Then take n → ∞ and we have P(B) = P*(A).
(V) The supremum defining μ_* can be attained. By the definition of the supremum, for all n there exists Cn ∈ F, Cn ⊂ A, such that
P(Cn) + 1/n ≥ P_*(A) ≥ P(Cn)
Now take C = ⋃_{n=1}^∞ Cn; then C ⊃ Cn for all n and C ⊂ A. By the monotonicity of the probability measure we have
P(C) + 1/n ≥ P(Cn) + 1/n ≥ P_*(A) ≥ P(C),   for all n
Then take n → ∞ and we have P(C) = P_*(A).
(b) It suffices to show
(I) P ∗ -measurable implies P ∗ (A) = P∗ (A). Just take E = Ω and we have
P ∗ (Ω) = 1 = P ∗ (A) + P ∗ (Ac )
Therefore P ∗ (A) = 1 − P ∗ (Ac ) = P∗ (A).
(II) P ∗ (A) = P∗ (A) implies P ∗ -measurable. Now we can use the conclusion of (a).
There exists A1 , A2 , B ∈ F with A1 ⊂ A ⊂ A2 and B ⊃ E for any set E ⊂ Ω, such
that
P (A1 ) = P∗ (A) = P ∗ (A) = P (A2 ), P ∗ (E) = P (B)
Now write
P*(A ∩ E) + P*(A^c ∩ E) ≤ P*(A2 ∩ E) + P*(A1^c ∩ E)
                         ≤ P*(A2 ∩ B) + P*(A1^c ∩ B)
                         = P(A2 ∩ B) + P(A1^c ∩ B)
                         = P((A2 ∩ B) ∪ (A1^c ∩ B)) + P((A2 ∩ B) ∩ (A1^c ∩ B))
                         ≤ P(B) + P(A2 ∩ A1^c ∩ B)
                         ≤ P(B) + P(A2 − A1) = P(B) = P*(E)
Therefore A is P ∗ -measurable.
(c) Define
Q*(A) := inf{ Σ_{n=1}^∞ P(An) : A ⊂ ⋃_{n=1}^∞ An, An ∈ F },   Q_*(A) := 1 − Q*(A^c)
Then it suffices to show:
(I) Q*(A) ≤ P*(A). Note that
{ Σ_{n=1}^∞ P(An) : A ⊂ ⋃_{n=1}^∞ An, An ∈ F0 } ⊂ { Σ_{n=1}^∞ P(An) : A ⊂ ⋃_{n=1}^∞ An, An ∈ F }
since F0 ⊂ F; taking infima then yields Q*(A) ≤ P*(A).
(II) Q*(A) ≥ P*(A). Let {An}_{n=1}^∞ be any cover of A with An ∈ F. Then by monotonicity and countable subadditivity of the outer measure, we have
P*(A) ≤ P*(⋃_{n=1}^∞ An) ≤ Σ_{n=1}^∞ P*(An) = Σ_{n=1}^∞ P(An)
Then take the infimum over all such covers on the right-hand side and we have P*(A) ≤ Q*(A).
(III) Q_*(A) = P_*(A). Since we have Q*(A) = P*(A), Q_*(A) = 1 − Q*(A^c) and P_*(A) = 1 − P*(A^c), it follows automatically that Q_*(A) = P_*(A).
EXAMPLE 1.3.3 (3.10(c)) Show that A ∈ F + if and only if P ∗ (A) = P∗ (A), where P ∗ and
P∗ are defined by (3.9) and (3.10), and P + (A) = P∗ (A) = P ∗ (A) in this case. Thus the complete
extension constructed is exactly the completion.
Proof. Let M = {A ⊂ Ω : P ∗ (A) = P∗ (A)}. Then by the result of 3.2(b) (Ω, M, P ∗ ) is complete
and therefore F + ⊂ M. Hence it suffices to show that M ⊂ F + .
Use the result in 3.2(a) if A ∈ M then there exists A1 , A2 ∈ F with A1 ⊂ A ⊂ A2 with
P (A1 ) = P∗ (A) = P ∗ (A) = P (A2 ). Take A0 = A1 , then
A∆A0 = (A − A1 ) ∪ (A1 − A) = A − A1 ⊂ A2 − A1
Note that P (A2 ) = P (A1 ), A2 ⊃ A1 and A1 , A2 ∈ F, then P (A2 − A1 ) = 0. Hence by the
definition of F + we have A ∈ F + .
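As a small sanity check of the completion construction (a concrete example added here, not from the original notes), take a three-point space whose σ-field cannot separate two null outcomes:

$$
\Omega = \{1,2,3\}, \qquad \mathcal{F} = \{\varnothing, \{1\}, \{2,3\}, \Omega\}, \qquad P(\{1\}) = 1,\; P(\{2,3\}) = 0.
$$

The $P$-null sets in $\mathcal{F}$ are $\varnothing$ and $\{2,3\}$, so every subset of $\{2,3\}$ has symmetric difference with a set of $\mathcal{F}$ contained in a null set. Hence $\mathcal{F}^{+} = 2^{\Omega}$, with $P^{+}(\{2\}) = P^{+}(\{3\}) = 0$ and $P^{+}(\{1,2\}) = P^{+}(\{1,3\}) = 1$.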
1.4 Denumerable Probabilities
1.4.1 Operations On Sets
DEFINITION 1.4.1 Given a collection of sets {Aθ : θ ∈ Θ}, the supremum and infimum of the collection are given by
sup_{θ∈Θ} Aθ = ⋃_{θ∈Θ} Aθ,   inf_{θ∈Θ} Aθ = ⋂_{θ∈Θ} Aθ
PROPOSITION 1.4.1 Given a collection of sets {Aθ : θ ∈ Θ}, we have
I_{sup_θ Aθ} = sup_θ I_{Aθ},   I_{inf_θ Aθ} = inf_θ I_{Aθ}
Proof.
• I_{sup Aθ}(ω) = 1 if and only if ω ∈ sup_θ Aθ if and only if there exists θ such that ω ∈ Aθ if and only if there exists θ such that I_{Aθ}(ω) = 1 if and only if sup_θ I_{Aθ}(ω) = 1.
• I_{sup Aθ}(ω) = 0 if and only if ω ∉ sup_θ Aθ if and only if for all θ we have ω ∉ Aθ if and only if for all θ we have I_{Aθ}(ω) = 0 if and only if sup_θ I_{Aθ}(ω) = 0.
• I_{inf Aθ}(ω) = 1 if and only if ω ∈ inf_θ Aθ if and only if for all θ we have ω ∈ Aθ if and only if for all θ we have I_{Aθ}(ω) = 1 if and only if inf_θ I_{Aθ}(ω) = 1.
• I_{inf Aθ}(ω) = 0 if and only if ω ∉ inf_θ Aθ if and only if there exists θ such that ω ∉ Aθ if and only if there exists θ such that I_{Aθ}(ω) = 0 if and only if inf_θ I_{Aθ}(ω) = 0.
PROPOSITION 1.4.2 Given a countable collection of sets {An : n ∈ N+}, we have
I_{⋂_n An} = Π_{n∈N+} I_{An}
Proof.
• I_{⋂_n An}(ω) = 1 if and only if ω ∈ ⋂_n An if and only if for all n we have ω ∈ An if and only if for all n we have I_{An}(ω) = 1 if and only if Π_{n∈N+} I_{An}(ω) = 1.
• I_{⋂_n An}(ω) = 0 if and only if ω ∉ ⋂_n An if and only if there exists n such that ω ∉ An if and only if there exists n such that I_{An}(ω) = 0 if and only if Π_{n∈N+} I_{An}(ω) = 0.
DEFINITION 1.4.2 The superior limit of a countable collection of sets {An}_{n=1}^∞ is defined to be
lim sup_n An := ⋂_{n=1}^∞ ⋃_{i=n}^∞ Ai
and similarly, the inferior limit of {An}_{n=1}^∞ is defined to be
lim inf_n An := ⋃_{n=1}^∞ ⋂_{i=n}^∞ Ai
The superior and inferior limits of a sequence of sets {An}_{n=1}^∞ are often denoted by {An i.o.} and {An a.a.} respectively, abbreviations for "infinitely often" and "almost always".
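For a quick illustration of these two limits (an added example, not from the original notes), take the alternating sequence

$$A_n = [0,1] \text{ for even } n, \qquad A_n = [1,2] \text{ for odd } n.$$

Every point of $[0,2]$ belongs to infinitely many $A_n$, while only the point $1$ belongs to all but finitely many of them, so

$$\limsup_n A_n = [0,2], \qquad \liminf_n A_n = \{1\}.$$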
PROPOSITION 1.4.3 Given a countable collection of sets {An}_{n=1}^∞, we have
I_{lim sup_n An} = lim sup_n I_{An}
Proof.
• I_{lim sup_n An}(ω) = 1 if and only if ω ∈ lim sup_n An if and only if for all n there exists m ≥ n such that ω ∈ Am if and only if for all n there exists m ≥ n such that I_{Am}(ω) = 1 if and only if lim sup_{n→∞} I_{An}(ω) = 1.
• I_{lim sup_n An}(ω) = 0 if and only if ω ∉ lim sup_n An if and only if there exists n such that for all m ≥ n we have ω ∉ Am if and only if there exists n such that for all m ≥ n we have I_{Am}(ω) = 0 if and only if lim sup_{n→∞} I_{An}(ω) = 0.
PROPOSITION 1.4.4 Given a countable collection of sets {An}_{n=1}^∞, we have
I_{lim inf_n An} = lim inf_n I_{An}
Proof.
• I_{lim inf_n An}(ω) = 1 if and only if ω ∈ lim inf_n An if and only if there exists n such that for all m ≥ n we have ω ∈ Am if and only if there exists n such that for all m ≥ n we have I_{Am}(ω) = 1 if and only if lim inf_{n→∞} I_{An}(ω) = 1.
• I_{lim inf_n An}(ω) = 0 if and only if ω ∉ lim inf_n An if and only if for all n there exists m ≥ n such that ω ∉ Am if and only if for all n there exists m ≥ n such that I_{Am}(ω) = 0 if and only if lim inf_{n→∞} I_{An}(ω) = 0.
DEFINITION 1.4.3 We say that a countable sequence of sets {An }∞
n=1 has a limit A and
write An → A, if lim inf n An = lim supn An = A.
PROPOSITION 1.4.5 Let {An}_{n=1}^∞ be a countable sequence of sets with limit A. Then we have
lim_n I_{An} = I_{lim_n An}
Proof. This follows immediately from
lim sup_n I_{An} = I_{lim sup_n An} = I_{lim_n An} = I_{lim inf_n An} = lim inf_n I_{An}
Therefore we have
lim_n I_{An} = lim inf_n I_{An} = lim sup_n I_{An} = I_{lim_n An}
The indicator function has the following properties.
PROPOSITION 1.4.6
• I_{A^c} = 1 − I_A.
• I_{A−B} = I_A − I_B if A ⊃ B.
• I_{A∆B} = |I_A − I_B|.
• I_{Σ_n An} = Σ_n I_{An} if {An}_{n=1}^k are disjoint.
Proof.
• IAc (ω) = 1 implies that ω ∈ Ac , namely, ω ∈
/ A and IA (ω) = 0. Therefore 1−IA (ω) =
1. Conversely, IAc (ω) = 0 implies that ω ∈
/ Ac , namely, ω ∈ A and IA (ω) = 1. Therefore
1 − IA (ω) = 1.
• IA−B (ω) = 1 implies that ω ∈ A − B, namely, ω ∈ A, ω ∈
/ B. Therefore IA (ω) − IB (ω) =
1 − 0 = 1. Conversely IA−B (ω) = 0 implies that x ∈
/ A − B, namely, x ∈
/ A, x ∈
/ B or
x ∈ B, x ∈ A. Therefore IA (ω) − IB (ω) = 0 − 0 = 0 or IA (ω) − IB (ω) = 1 − 1 = 0.
• IA∆B (ω) = 1 implies that ω ∈ (A − B) ∪ (B − A), namely, x ∈ A − B or x ∈ B − A.
Then either x ∈ A, x ∈
/ B or x ∈ B, x ∈
/ A. Therefore, |IA (ω) − IB (ω)| = |1 − 0| = 1 or
|IA (ω) − IB (ω)| = |0 − 1| = 1. Conversely IA∆B (ω) = 0 implies that ω ∈
/ (A − B) ∪ (B − A),
namely, x ∈ A ∩ B or x ∈ Ac ∩ B c . Then either x ∈ A, x ∈
/ B or x ∈ Ac , x ∈ B c . Therefore,
|IA (ω) − IB (ω)| = |1 − 1| = 0 or |IA (ω) − IB (ω)| = |0 − 0| = 0.
• IPn An (ω) = 1 implies that there exists k such that ω ∈ Ak and ω ∈
/ An for all n 6= k.
P
P
Namely, we have IAk (ω) = 1, IAn (ω) = 0. Therefore n IAn (ω) = IAk (ω) + n6=k IAn (ω) =
1. Conversely IPn An (ω) = 0 implies that for all n we have ω ∈
/ Ak . Namely, we have
P
IAn (ω) = 0 for all n. Therefore n IAn (ω) = 0.
PROPOSITION 1.4.7 Let {An}_{n=1}^∞ be a sequence of sets.
(a) If An ⊂ A_{n+1} for all n, then lim_n An = ⋃_{n=1}^∞ An.
(b) If An ⊃ A_{n+1} for all n, then lim_n An = ⋂_{n=1}^∞ An.
Proof. (a) By definition we have
lim sup_n An = ⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak = ⋂_{n=1}^∞ ⋃_{k=1}^∞ Ak = ⋃_{n=1}^∞ An
since An ⊂ A_{n+1} and hence ⋃_{k=n}^∞ Ak = ⋃_{k=1}^∞ Ak for all n. And also
lim inf_n An = ⋃_{n=1}^∞ ⋂_{k=n}^∞ Ak = ⋃_{n=1}^∞ An
because An ⊂ A_{n+1} and hence ⋂_{k=n}^∞ Ak = An. Therefore we have
lim_n An = lim sup_n An = lim inf_n An = ⋃_{n=1}^∞ An
(b) By definition we have
lim sup_n An = ⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak = ⋂_{n=1}^∞ An
since An ⊃ A_{n+1} and hence ⋃_{k=n}^∞ Ak = An. And also
lim inf_n An = ⋃_{n=1}^∞ ⋂_{k=n}^∞ Ak = ⋃_{n=1}^∞ ⋂_{k=1}^∞ Ak = ⋂_{n=1}^∞ An
because An ⊃ A_{n+1} and hence ⋂_{k=n}^∞ Ak = ⋂_{k=1}^∞ Ak for all n. Therefore we have
lim_n An = lim sup_n An = lim inf_n An = ⋂_{n=1}^∞ An
1.4.2 Some Basic Estimation
Let (Ω, F, P ) be a probability space.
DEFINITION 1.4.4 (Conditional Probability) Given A, B ∈ F with P(A) > 0, the conditional probability of B given A is defined to be
P(B|A) = P(A ∩ B) / P(A)
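For instance (a small numerical example added here, not from the original notes), on a fair six-sided die let $A = \{2,4,6\}$ and $B = \{2\}$:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{1/6}{1/2} = \frac{1}{3}.$$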
PROPOSITION 1.4.8 Let {An}_{n=1}^∞ be a sequence of events in F.
• If P(An) = 0 for all n, then P(⋃_{n=1}^∞ An) = 0.
• If P(An) = 1 for all n, then P(⋂_{n=1}^∞ An) = 1.
The first assertion is a direct corollary of countable subadditivity, and the second follows by applying the first to the complements A_n^c.
PROPOSITION 1.4.9 Let {An}_{n=1}^∞ be a sequence of events in F. Then we have
P(inf_n An) ≤ inf_n P(An) ≤ sup_n P(An) ≤ P(sup_n An)
Proof. By monotonicity we have P(inf_n An) = P(⋂_{n=1}^∞ An) ≤ P(Am) for all m; taking the infimum of the right-hand side over all m gives P(inf_n An) ≤ inf_n P(An).
Then inf_n P(An) ≤ sup_n P(An) holds because the infimum of a set of numbers is always less than or equal to its supremum. Finally, for all m, by monotonicity we have P(Am) ≤ P(sup_n An) = P(⋃_n An); taking the supremum of the left-hand side over all m we obtain sup_n P(An) ≤ P(sup_n An).
Hence we get
P(inf_n An) ≤ inf_n P(An) ≤ sup_n P(An) ≤ P(sup_n An)
LEMMA 1.4.1 (Mini-Fatou Lemma) Let {An }∞
n=1 be a sequence of events in F. Then
• P (lim inf n An ) 6 lim inf n P (An ) 6 lim supn P (An ) 6 P (lim supn An )
• If An → A, n → ∞, then P (An ) → P (A), n → ∞.
Proof.
• By definition we have
P(lim inf_n An) = P(⋃_{n=1}^∞ ⋂_{k=n}^∞ Ak) = lim_{n→∞} P(⋂_{k=n}^∞ Ak)
where the last equality holds by the monotone sequential continuity from below. Using Proposition 1.4.9 we have
P(lim inf_n An) ≤ lim_{n→∞} P(inf_{k≥n} Ak) ≤ lim_{n→∞} inf_{k≥n} P(Ak) = lim inf_{n→∞} P(An)
Naturally the inferior limit is less than or equal to the superior limit, and hence
P(lim inf_n An) ≤ lim inf_{n→∞} P(An) ≤ lim sup_{n→∞} P(An) = lim_{n→∞} sup_{k≥n} P(Ak)
Again using Proposition 1.4.9 yields
lim_{n→∞} sup_{k≥n} P(Ak) ≤ lim_{n→∞} P(sup_{k≥n} Ak) = lim_{n→∞} P(⋃_{k≥n} Ak)
Now the sequence of sets {⋃_{k≥n} Ak}_{n=1}^∞ decreases as n grows, so applying the monotone sequential continuity from above yields
lim_{n→∞} P(⋃_{k≥n} Ak) = P(⋂_{n=1}^∞ ⋃_{k≥n} Ak) = P(lim sup_n An)
Finally we have
P(lim inf_n An) ≤ lim inf_n P(An) ≤ lim sup_n P(An) ≤ P(lim sup_n An)
• Suppose we have An → A, namely lim inf_n An = lim sup_n An = A. Then using the result above yields
P(lim inf_n An) = lim inf_n P(An) = lim sup_n P(An) = P(lim sup_n An)
Therefore we have P(An) → P(A), n → ∞.
1.4.3 Independence
DEFINITION 1.4.5 (Independence for Events) Suppose that {Aθ : θ ∈ Θ} is a collection
of events in F, then they are independent if for all distinct θ1 , · · · , θn ∈ Θ we have
P (Aθ1 ∩ · · · ∩ Aθn ) = P (Aθ1 ) · · · P (Aθn )
DEFINITION 1.4.6 (Independence for Classes) Suppose that {Aθ : θ ∈ Θ} is a collection
of classes of events in F. Then they are independent if for all distinct θ1 , · · · , θn ∈ Θ and for all
Ai ∈ Aθi , i = 1, 2, · · · , n, we have
P (A1 ∩ · · · ∩ An ) = P (A1 ) · · · P (An )
The following theorem is pretty trivial
THEOREM 1.4.1 (Subclasses Theorem) Suppose {Bθ : θ ∈ Θ} is a collection of classes
of events in F being independent, and Aθ ⊂ Bθ for all θ ∈ Θ, then {Aθ : θ ∈ Θ} are also
independent.
In certain cases we do not need to verify the product rule for all subsets of all the classes; it suffices to check whether the generators are independent. This is the content of the following theorem:
THEOREM 1.4.2 If {Aθ : θ ∈ Θ} is a collection of independent classes of events in F and each Aθ is a π-system, then {σ(Aθ) : θ ∈ Θ} is a collection of independent σ-fields.
Proof. The steps of the proof is given:
(I) If A1 , {A2 }, · · · , {An } are independent classes with A1 being a π-system, then
σ(A1 ), {A2 }, · · · , {An } are independent classes.
Consider
G = {A ∈ σ(A1 ) : P (A ∩ A2 ∩ · · · ∩ An ) = P (A)P (A2 ) · · · P (An )}
Then it suffices to show that G ⊃ σ(A1 ). Now that we have A1 being a π-system, then if
we can show that G is a λ-system, then using Dynkins’ π − λ theorem, G will contain σ(A1 )
as a subcollection. Therefore it suffices to show that G is a λ-system.
(a) Ω ∈ G. Since σ(A1 ) is a σ-field, then we have Ω ∈ σ(A1 ). Furthermore we have
P (Ω ∩ A2 ∩ · · · ∩ An ) = P (A2 ∩ · · · ∩ An ) = P (A2 ) · · · P (An ) = P (Ω)P (A2 ) · · · P (An )
Therefore Ω ∈ G by definition.
(b) E, F ∈ G with E ⊂ F implies F − E ∈ G. Now we have E, F ∈ σ(A1) and
P(E ∩ A2 ∩ · · · ∩ An) = P(E)P(A2) · · · P(An)
P(F ∩ A2 ∩ · · · ∩ An) = P(F)P(A2) · · · P(An)
Since σ(A1) is a σ-field we know that F − E ∈ σ(A1). Moreover we have
P((F − E) ∩ A2 ∩ · · · ∩ An) = P((F ∩ A2 ∩ · · · ∩ An) − (E ∩ A2 ∩ · · · ∩ An))
= (P(F) − P(E))P(A2) · · · P(An)
= P(F − E)P(A2) · · · P(An)
Therefore we get the result F − E ∈ G.
(c) {Ek}_{k=1}^∞ ⊂ G with Ek ⊂ E_{k+1} for all k implies ⋃_{k=1}^∞ Ek ∈ G. Now we have Ek ∈ σ(A1) for each k and therefore ⋃_{k=1}^∞ Ek ∈ σ(A1). Next, for each k we have
P(Ek ∩ A2 ∩ · · · ∩ An) = P(Ek)P(A2) · · · P(An)
Therefore using the monotone sequential continuity from below we have
P((⋃_{k=1}^∞ Ek) ∩ A2 ∩ · · · ∩ An) = P(⋃_{k=1}^∞ (Ek ∩ A2 ∩ · · · ∩ An))
= lim_{k→∞} P(Ek ∩ A2 ∩ · · · ∩ An) = lim_{k→∞} P(Ek)P(A2) · · · P(An)
= P(⋃_{k=1}^∞ Ek) P(A2) · · · P(An)
where the second equality holds because {Ek ∩ A2 ∩ · · · ∩ An}_{k=1}^∞ is also a sequence of increasing sets and thus the monotone sequential continuity from below applies. Therefore we obtain that ⋃_{k=1}^∞ Ek ∈ G.
(II) If A1 , · · · , An are independent π-system, then we have σ(A1 ), A2 , · · · , An are independent.
Now we know that for all Ai ∈ Ai , i = 2, 3, · · · , n, the collection A1 , {A2 }, · · · , {An } are
independent by the subclasses theorem, then by (I) we know that σ(A1 ), {A2 }, · · · , {An }
are independent. Namely, for all A1 ∈ A1 , A2 , · · · , An we have
P (A1 ∩ · · · ∩ An ) = P (A1 ) · · · P (An )
Since A2 , · · · , An are selected arbitrarily, then we know that σ(A1 ), A2 , · · · , An are independent.
(III) If A1 , · · · , An are independent π-system, then we have σ(A1 ), σ(A2 ), · · · , σ(An ) are
independent.
By (II) we know that σ(A1 ), A2 , · · · , An are independent. They are also all π-systems
since σ(A1 ) being a σ-field means that it is also a π-system. Therefore using (II) again
we obtain σ(A1 ), σ(A2 ), A3 , · · · , An are independent and so on. Hence finally we have
σ(A1 ), · · · , σ(An ) are independent.
(IV) If Aθ, θ ∈ Θ are independent π-systems, then σ(Aθ), θ ∈ Θ are independent.
Now we know that for any finite θ1, · · · , θn ∈ Θ the collection Aθ1, · · · , Aθn are independent. Then by (III) σ(Aθ1), · · · , σ(Aθn) are independent; namely, for any Ai ∈ σ(Aθi) we have
P(A1 ∩ · · · ∩ An) = P(A1) · · · P(An)
Since the finite subcollection was arbitrary, by the definition of independence of classes the whole collection σ(Aθ), θ ∈ Θ is independent.
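As a minimal illustration of how this theorem is typically used (an added example, not from the original notes): to check that two σ-fields generated by single events are independent, it suffices to check a single product identity, because $\{A\}$ and $\{B\}$ are π-systems:

$$P(A \cap B) = P(A)P(B) \;\Longrightarrow\; \sigma(\{A\}) = \{\varnothing, A, A^{c}, \Omega\} \text{ and } \sigma(\{B\}) \text{ are independent},$$

so, for example, $P(A^{c} \cap B) = P(A^{c})P(B)$ comes for free.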
THEOREM 1.4.3 (Splitting Theorem / Analysis of Independence (ANOI)) Let (Ω, A, P) be a probability space. Let {Ai : i ∈ I} be a collection of sub-σ-fields of A, and let I = Σ_{j∈J} Ij be a decomposition of I into disjoint index sets Ij, j ∈ J. Set
A_{Ij} = σ(⋃_{i∈Ij} Ai),   j ∈ J
Then Ai : i ∈ I are independent if and only if we have both
(i) (Independence within groups) For each j ∈ J we have Ai : i ∈ Ij are independent
classes.
(ii) (Independence between groups) The σ-fields AIj : j ∈ J are independent.
Proof. (I) "If part". Suppose (i) and (ii) hold; we show that Ai : i ∈ I are independent. Take arbitrary finite Ai's with indices ij1, · · · , ijn(j) ∈ Ij for j = 1, 2, · · · , m. Namely, divide the Ai's into
corresponding groups of Ij according to j. Then for all Aijk ∈ Aijk , i = 1, 2, · · · , n(j), j =
1, 2, · · · , m by (i) we have
P (Aij1 ∩ · · · ∩ Aijn(j) ) = P (Aij1 ) · · · P (Aijn(j) )
And by (ii) we know that Aij1 ∩ · · · ∩ Aijn(j) , j = 1, 2, · · · , m are independent since they are
in corresponding AIj , j = 1, 2, · · · , m where Aij ’s are all independent σ-fields and closed
under intersection. Therefore putting Aijk ’s together and we obtain
P ((Aij1 ∩ · · · ∩ Aijn(j) ) ∩ · · · ∩ (Aim1 ∩ · · · ∩ Aimn(m) ))
=P ((Aij1 ∩ · · · ∩ Aijn(j) ) ∩ · · · ∩ (Aim1 ∩ · · · ∩ Aimn(m) )
=P ((Aij1 ) · · · P (Aijn(j) ) · · · P (Aim1 ) · · · P (Aimn(m) )
Hence Ai : i ∈ I are independent because for all finite Ai ’s we have Ai ’s are independent.
(II) "Only if part". Assume that Ai : i ∈ I are independent σ-fields. By definition, for each j ∈ J,
for all i1 , · · · , in ∈ Ij , for all Aik ∈ Aik , k = 1, 2, · · · , n we have
P (Ai1 ∩ · · · ∩ Ain ) = P (Ai1 ) · · · P (Ain )
implying that Ai : i ∈ Ij are independent, and this holds for an arbitrary j ∈ J. Then this
shows (i). Therefore it suffices to show (ii). Define
(
)
\
Cj :=
Ah : Ah ∈ Ah , H ⊂ Ij , H is finite
h∈H
for all j ∈ J.
(a) σ(Cj ) = AIj for all j ∈ J. We can see that Ai ⊂ Cj for all i ∈ Ij because for all
Ai ∈ Ai we have {i} is a finite index in Ij . Namely, we have
[
Ai ⊂ Cj
i∈Ij
We also know that Cj ⊂ σ
S
Si∈Ij
from distinct Ah , and Ah ⊂
Ai
i∈Ij
because Cj consists of finite intersection of Ah
Ai . Therefore the following inclusion relationship
holds


[
Ai ⊂ Cj ⊂ σ 
i∈Ij
[
Ai  = AIj
i∈Ij
for all j ∈ J. Namely we have σ(Cj ) = AIj for all j ∈ J.
(b) For each j ∈ J the class Cj is a π-system. Namely, Cj is closed under finite interT
section. Take H1 , H2 to be any finite subsets of the index Ij . Namely, take h∈H1 Ah
T
and h∈H2 Ah to be any two elements in Cj . Write H = H1 ∪ H2 . Then H is also a
finite subset of the index set Ij and we have
!
!
\
\
\
Ah ∩
Ah =
Ah ∈ Cj
h∈H1
h∈H2
h∈H
Hence Cj is closed under finite intersection. Namely, Cj is a π-system for each j ∈ J.
(c) The classes Cj : j ∈ J are independent. Now we know that Ai : i ∈ I are indepenT
dent. Take any Cj1 , · · · , Cjn be finite collection of Cj ’s and take any hk ∈Hk Ahk ∈ Cjk
for k = 1, 2, · · · , n. Then we have
!
!!
n
n
\
\
Y
Y
Y
P
Ah1 ∩ · · · ∩
Ahn
=
P (Ahk ) =
P
h1 ∈H1
hn ∈Hn
k=1 hk ∈Hk
Therefore the independence of Cj for all j ∈ J is yielded.
k=1
!
\
hk ∈Hk
Ahk
Therefore we know that Cj ’s are independent and they are π-systems as well as generators
of AIj . Hence by theorem 1.4.2 we know AIj : j ∈ J are independent.
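As a concrete use of the splitting theorem (an added illustration, not from the original notes): if $A_1, A_2, A_3, \dots$ are independent events and $\mathcal{A}_i = \sigma(\{A_i\})$, then splitting the index set into the blocks $I_1 = \{1, 2\}$ and $I_2 = \{3, 4, \dots\}$ gives that

$$\sigma(A_1, A_2) \quad \text{and} \quad \sigma(A_3, A_4, \dots)$$

are independent σ-fields. This is exactly the grouping step used below in the proof of Kolmogorov's zero-one law.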
1.5 Borel-Cantelli Lemmas
THEOREM 1.5.1 (First Borel-Cantelli Lemma) Let (Ω, F, P) be a probability space and {An}_{n=1}^∞ be a sequence of events in F. Then Σ_{n=1}^∞ P(An) < ∞ implies that P(An i.o.) = 0.
Proof. Now that Σ_{n=1}^∞ P(An) < ∞, we know lim_{N→∞} Σ_{n=N}^∞ P(An) = 0. Therefore by monotonicity
P(An i.o.) = P(⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak) ≤ P(⋃_{k=n}^∞ Ak),   for all n ∈ N+
On the other hand by countable subadditivity
P(An i.o.) ≤ P(⋃_{k=n}^∞ Ak) ≤ Σ_{k=n}^∞ P(Ak),   for all n ∈ N+
Therefore take n → ∞ and we obtain
0 ≤ P(An i.o.) ≤ lim_{n→∞} Σ_{k=n}^∞ P(Ak) = 0
Therefore P (An i.o. ) = 0.
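For a quick worked instance of the first lemma (added here, not from the original notes), take any events with

$$P(A_n) = \frac{1}{n^2}, \qquad \sum_{n=1}^{\infty} P(A_n) = \frac{\pi^2}{6} < \infty,$$

so $P(A_n \text{ i.o.}) = 0$: with probability one only finitely many of the $A_n$ occur, and no independence assumption is needed.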
THEOREM 1.5.2 (Second Borel-Cantelli Lemma) Let (Ω, F, P ) be a probability space.
If {An}_{n=1}^∞ are independent events in F and Σ_{n=1}^∞ P(An) = ∞, then P(An i.o.) = 1.
Proof. Because {⋂_{k=n}^∞ Ak^c}_{n=1}^∞ is an increasing sequence of sets, by the monotone sequential continuity from below we have
P(lim sup_n An) = 1 − P(⋃_{n=1}^∞ ⋂_{k=n}^∞ Ak^c) = 1 − lim_{n→∞} P(⋂_{k=n}^∞ Ak^c)
Also {⋂_{k=n}^{n+p} Ak^c}_{p=1}^∞ is a decreasing sequence, so by the monotone continuity from above we have
P(lim sup_n An) = 1 − lim_{n→∞} P(⋂_{k=n}^∞ Ak^c) = 1 − lim_{n→∞} lim_{p→∞} P(⋂_{k=n}^{n+p} Ak^c)
Note that we have the inequality 1 − x ≤ exp(−x) for x ∈ [0, 1]; then by applying independence we have
P(⋂_{k=n}^{n+p} Ak^c) = Π_{k=n}^{n+p} (1 − P(Ak)) ≤ exp(− Σ_{k=n}^{n+p} P(Ak))
Letting p → ∞ yields
lim_{p→∞} P(⋂_{k=n}^{n+p} Ak^c) ≤ lim_{p→∞} exp(− Σ_{k=n}^{n+p} P(Ak)) = 0
Therefore we have
P(lim sup_n An) = 1 − lim_{n→∞} lim_{p→∞} P(⋂_{k=n}^{n+p} Ak^c) = 1 − 0 = 1
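As a standard illustration of the independence requirement (added example, not from the original notes): let $A_n$ be independent with

$$P(A_n) = \frac{1}{n}, \qquad \sum_{n=1}^{\infty} P(A_n) = \infty,$$

so $P(A_n \text{ i.o.}) = 1$ by the second lemma even though $P(A_n) \to 0$. Without independence the conclusion may fail: taking every $A_n$ equal to one fixed event $A$ with $P(A) = 1/2$ also gives a divergent sum, yet $P(A_n \text{ i.o.}) = P(A) = 1/2$.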
EXAMPLE 1.5.1 (Example 4.1(Billingsley)) Consider the function dn (ω) defined on the
unit interval by the binary expansion
ω = Σ_{n=1}^∞ dn(ω)/2^n,   for all ω ∈ (0, 1], dn(ω) ∈ {0, 1} for all n
And let ln(ω) be the length of the run of 0's starting at dn(ω): ln(ω) = k if dn(ω) = · · · = d_{n+k−1}(ω) = 0 and d_{n+k}(ω) = 1. Then we have
P{ω : ln(ω) ≥ r} = 1/2^r
Then by the mini-Fatou lemma we have
P{ln(ω) ≥ rn i.o.} ≥ lim sup_{n→∞} P{ln(ω) ≥ rn} = lim sup_{n→∞} 1/2^{rn}
Now if we take rn ≡ 1 then the lower bound for P {ln (ω) > r i.o. } is 1/2, which is not so good
because P {ln (ω) > 1 i.o. } = 1.
Now we will find the critical value of rn. Assume that the rn are not restricted to integers. Then we know that
P(An) := P{ln(ω) ≥ rn} ∈ [1/2^{rn+1}, 1/2^{rn}]
Therefore Σ_{n=1}^∞ P(An) = ∞ if and only if Σ_{n=1}^∞ 2^{−rn} = ∞. Now consider the critical value
rn = (1 + ε) log2 n,   for all n
where ε is an arbitrarily small number. Then we see that Σ_{n=1}^∞ 2^{−rn} = Σ_{n=1}^∞ n^{−1−ε} converges for all ε > 0. Therefore we can apply the first Borel-Cantelli lemma and obtain
P(A(ε)) := P{An i.o.} := P{ln(ω) ≥ rn i.o.} = 0
Then we obtain
{ lim sup_n ln(ω)/log2 n > 1 } = ⋃_{k=1}^∞ { lim sup_n ln(ω)/log2 n ≥ 1 + 1/k } ⊂ ⋃_{k=1}^∞ { ln(ω) ≥ (1 + 1/k) log2 n i.o. } = ⋃_{k=1}^∞ A(1/k)
Since every A(1/k) is of probability 0, by subadditivity the event on the left is of probability 0. Namely,
P{ lim sup_n ln(ω)/log2 n > 1 } = 0
Namely we have
P{ lim sup_n ln(ω)/log2 n ≤ 1 } = 1
Now we restricted rn to integers and define Bn = {ln (ω) > rn } for all n. Then Bn ∈ Fn :=
σ({{di (ω) = 0} : n 6 i < n + rn }). But we can see Fn : n = 1, 2, · · · are not independent. Then
define Ak = Bnk where nk is defined by induction:
n1 = 1
nk+1 = nk + rnk
Then we claim that {Ak i.o. } ⊂ {Bn i.o. }. If ω ∈ Ak infinitely often, then for all k, there
exists K > k such that ω ∈ AK = BnK . Therefore for all n, there exists k such that nk > n,
and there exists nK > K > k > nk such that ω ∈ BnK . This shows that ω ∈ Bn infinitely often.
Now we are going to derive the condition such that {Ak i.o. } is of probability 1.
Since Ak ∈ Fnk where Fnk : k = 1, 2, · · · are independent σ-fields, then by the splitting
theorem we know that Ak : k = 1, 2, · · · are independent events. Therefore in order to apply
P
the second Borel-Cantelli lemma, it suffices to find a condition such that ∞
n=1 P (Ak ) = ∞.
P∞ −rn
Namely, we want find the condition such that k=1 2 k = ∞.
Write
Σ_{k=1}^∞ P(Ak) = Σ_{k=1}^∞ 2^{−r_{nk}}
               = Σ_{k=1}^∞ (1/(n_{k+1} − n_k)) Σ_{n=n_k}^{n_{k+1}−1} 2^{−r_{nk}}
               = Σ_{k=1}^∞ (1/r_{nk}) Σ_{n=n_k}^{n_{k+1}−1} 2^{−r_{nk}}
               = Σ_{n=1}^∞ 2^{−ρ_n} ρ_n^{−1}
where we write ρ_n = r_{nk} if n_k ≤ n < n_{k+1}. Now if we add the assumption that rn is a nondecreasing sequence, then r_{nk} ≤ rn whenever n_k ≤ n < n_{k+1}, that is, ρ_n ≤ rn. Then we know that
Σ_{k=1}^∞ P(Ak) ≥ Σ_{n=1}^∞ 2^{−rn} (1/rn)
Set rn = log2 n; then we have
Σ_{k=1}^∞ P(Ak) ≥ Σ_{n=1}^∞ 2^{−rn} (1/rn) = Σ_{n=1}^∞ 1/(n log2 n)
The right-hand side series diverges because the improper integral ∫_1^∞ dx/(x log2 x) diverges. Therefore we obtain that when rn = log2 n the series Σ_{k=1}^∞ P(Ak) diverges. By applying the second Borel-Cantelli lemma we see that
1 ≥ P{Bn i.o.} ≥ P{An i.o.} = 1
The second inequality holds by monotonicity. Therefore we see that
P{Bn i.o.} = 1,  and hence  P{ lim sup_n ln(ω)/log2 n ≥ 1 } = 1
Combining this with the fact obtained from the first Borel-Cantelli lemma,
P{ lim sup_n ln(ω)/log2 n ≤ 1 } = 1,
we obtain that
P{ lim sup_n ln(ω)/log2 n = 1 } = 1
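The growth rate of the run lengths can also be observed numerically. The following Python sketch (an illustration added to these notes; the helper names are ad hoc) samples the binary digits dn of a uniform ω ∈ (0,1] as i.i.d. fair bits and tracks the ratio ln/log2 n over a tail of indices, which typically stays close to 1:

    import math
    import random

    def run_length(bits, n):
        """Length of the run of zeros starting at position n (1-indexed)."""
        k = 0
        while bits[n - 1 + k] == 0:
            k += 1
        return k

    random.seed(0)
    N = 200_000
    # i.i.d. fair bits model d_1, d_2, ...; a final 1 guarantees every run terminates.
    bits = [random.randint(0, 1) for _ in range(N + 64)] + [1]

    best = 0.0
    for n in range(1000, N + 1):
        best = max(best, run_length(bits, n) / math.log2(n))
    print("max of l_n / log2(n) over 1000 <= n <= N:", round(best, 3))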
EXAMPLE 1.5.2 Let {An}_{n=1}^∞ be a sequence of independent events; then
P(⋂_{n=1}^∞ An) = Π_{n=1}^∞ P(An)
Proof. By the monotone sequential continuity from above,
P(⋂_{n=1}^∞ An) = lim_{m→∞} P(⋂_{n=1}^m An)
because {⋂_{n=1}^m An}_{m=1}^∞ is a decreasing sequence of sets. Then
P(⋂_{n=1}^∞ An) = lim_{m→∞} P(⋂_{n=1}^m An) = lim_{m→∞} Π_{n=1}^m P(An) = Π_{n=1}^∞ P(An)
because the An's are independent.
EXAMPLE 1.5.3 (Billingsley 4.4) Recall that mini-Fatou's lemma implies that if {An}_{n=1}^∞ is a sequence of events, then
P(lim inf_n An) ≤ lim inf_n P(An) ≤ lim sup_n P(An) ≤ P(lim sup_n An)
Find a sequence of events {An}_{n=1}^∞ such that all three inequalities hold strictly.
Solution. The first example is constructed on ((0, 1], B, λ). Define
A_{2k+1} = (0, 3/4],   A_{2k+2} = (3/4, 1],   for all k = 0, 1, 2, · · ·
Then we see that lim sup_n An = (0, 1], lim inf_n An = ∅ and thus we have
P(lim inf_n An) = 0 < 1/4 = lim inf_n P(An) < 3/4 = lim sup_n P(An) < 1 = P(lim sup_n An)
The second example is related to the RUN LENGTHS example. Define rn = 3/2 + 1/2(−1)n .
Namely, rn = 1, 2, 1, 2, 1, 2, · · · . Then let
An : {ω ∈ Ω : ln (ω) > rn } n ∈ N+
Then we have
lim sup An = {ln (ω) > rn i.o. }
lim inf An = {ln (ω) > rn a.a. }
n
n
To see that lim supn An is of probability 1, it suffices to show that the complement is of probability
0. Then
c
lim sup An
= lim inf
n
n
Now consider
T∞
k=n {lk (ω)
Acn
∞ \
∞
[
= {ln (ω) < rn a.a.} =
{lk (ω) < rk }
n=1 k=n
< rk }. Then ω ∈
T∞
k=n {lk (ω)
< rk } if and only if for all l 6 n we have
lk (ω) < rk . But
(
P
∞
\
)
lk (ω) < rk
6
k=n
Therefore
P {Acn
a.a. } 6
P∞
n=1
P(
T∞
k=n
1 1
1
··· = 0
n
n+2
n+4
2 2
2
Ack ) = 0. Namely we obtain that P (lim supn An ) = 1.
To see that lim inf n An is of probability 0, we need to write
lim inf An =
n
Now we show that
∞ \
∞
[
{lk (ω) > rk }
n=1 k=n
T∞
k=n {lk (ω) > rk } is of probability 0. If ω ∈
T∞
k=n
Ak then we know that for
all k > n, lk (ω) > rk . Namely, starting from n we can obtain that all digits dk , dk+1 , dk+2 , · · ·
are all 0. Therefore
P
∞
\
!
{lk (ω) > rk }
=0
k=n
P∞
T
yielding that P {An a.a. } 6 n=1 P ( ∞
k=n Ak ) = 0. To sum up, we obtain that
1
1
P lim inf An = 0 < lim inf P (An ) = < lim sup P (An ) = < P lim sup An = 1
n
n
4
2
n
n
EXAMPLE 1.5.4 (Billingsley Problem 4.11)
(a) If {An}_{n=1}^∞ are independent events, then
P(⋃_{n=1}^∞ An) = 1 − Π_{n=1}^∞ (1 − P(An))
Prove this fact and from it derive the second Borel-Cantelli lemma by the well-known relation between infinite series and products.
(d) Show that P(lim sup_n An) = 1 if and only if Σ_{n=1}^∞ P(An ∩ A) diverges for all A with positive probability.
(e) If {An}_{n=1}^∞ are independent and P(An) < 1 for all n, then P{An i.o.} = 1 if and only if P(⋃_{n=1}^∞ An) = 1.
Proof.
(a) Since A1 , · · · , An , · · · are independent, then Ac1 , · · · , Acn , · · · are independent because if we
set {An } for all n, then they are independent π-systems and therefore σ({A1 }), · · · , σ({An }), · · ·
are independent. But Acn ∈ σ({An }), and hence {Acn }∞
n=1 are independent. Then by Ex1.5.2 we
have
P
∞
[
!
An
=1−P
∞
\
!
Acn
=1−
P (Acn )
=1−
∞
Y
(1 − P (An ))
n=1
n=1
n=1
n=1
∞
Y
Now reconsider the Borel Cantelli lemma. Compute
!
!
∞
∞ \
∞
∞
Y
[
\
c
c
(1 − P (Ak ))
P (An i.o. ) = 1 − P
Ak = 1 − lim P
Ak = 1 − lim
n=1 k=n
n→∞
k=n
n→∞
k=n
where the second equality holds due to the monotone sequential continuity from below. Now if
for all n there exists k > n such that P (Ak ) = 1, which does not contradict with the fact that
Q∞
P∞
c
k=n (1 − P (Ak )) = 0 for all n.
n=1 P (An ) = ∞, then we know that P (Ak ) = 0 and therefore
Namely P (An i.o. ) = 1. Now if there exists n0 such that for all k > n0 we have P (Ak ) < 1,
then we know that 1 − P (Ak ) > 0. Now that we know
∞
X
P (Ak ) = ∞,
k=n
∞
X
(− log(1 − P (Ak )) = ∞
k=n
for all n because both infinite sums diverge simultaneously by the comparison criterion. Then
Q
Q∞
we have ∞
k=n0 (1 − P (Ak )) = 0, implying that limn→∞
k=n (1 − P (Ak )) = 0.
To sum up we have P (An i.o. ) = 1.
(d) First assume that P (lim supn An ) = 1. Then consider E = A ∩ lim supn An which is also of
positive probability because A = A ∩ lim supn An + A ∩ (lim supn An )c and (lim supn An )c is of
probability 0. Now if assume that
P∞
n=1
P (An ∩ A) converges, namely
P∞
n=1
P (An ∩ E) < ∞,
then we know that
0 = lim
∞
X
n→∞
∞
[
P (Ak ∩ E) > lim P
n→∞
k=n
!
(Ak ∩ E)
∞ [
∞
\
=P
!
(Ak ∩ E)
>0
n=1 k=n
k=n
The last equality holds because of the monotone sequential continuity from above. Namely
we have P (lim supk (Ak ∩ E)) = 0. But E ⊂ lim supk Ak , which implies that P (lim supk (Ak ∩
E)) = P (lim supk Ak ) = 0, contradict with P (lim supn An ) = 1. Therefore we conclude that
P∞
n=1 P (An ∩ A) diverges for all A with positive probability.
P
Secondly consider the reverse direction. Assume that ∞
n=1 P (An ∩ A) diverges for all A with
positive probability. Now if we assume that P (lim supn An ) < 1, namely P (lim inf n Acn ) > 0,
then we have
P lim inf
n
Therefore there exists n0 such that P (
∞
X
P (An ∩ A) =
n=n0
Acn
T∞
∞
X
= lim P
n
∞
\
!
Ack
>0
k=n
T
c
Ack ) > 0. Then take A = ∞
k=n0 Ak and write
!c !
∞
X
[
An −
=
P (∅) = 0
Ak
k=n0
P
n=n0
And this contradicts with the fact that
n=n0
k=n0
P∞
n=1
P (An ∩ A) = ∞. Therefore we conclude that
P (lim supn An ) = 1.
(e) First assume that P {An i.o. } = 1. Then observe that
!
∞ [
∞
\
1 6 P {An i.o. } = P
Ak 6 P
n=1 k=n
∞
[
!
Ak
61
k=1
S
An ) = 1.
Therefore we yield that P ( ∞
Sn=1
∞
Secondly assume that P ( n=1 An ) = 1. Applying de Morgan’s Law and independence yields
T∞
Q
c
P ( n=1 Acn ) = ∞
n=1 P (An ) = 0. Then
!
∞
\
P {An i.o. } = 1 − P {Acn a.a. } = 1 − lim P
Ack
n→∞
k=n
where the last equality holds because of the monotone sequential continuity from below. Furthermore we have
P {An i.o. } = 1 −
P {Acn
a.a. } = 1 − lim P
n→∞
Q∞
∞
\
k=n
!
Ack
= 1 − lim
n→∞
∞
Y
(1 − P (Ak ))
k=n
It suffices to show that limn→∞ k=n (1 − P (Ak )) = 0. If not, then there exists n0 such that
Q∞
Qn0 −1
k=n0 (1−P (Ak )) > 0. Now that we know (1−P (Ak )) > 0 for each k, then
n=1 (1−P (Ak )) > 0
holds. Hence we obtain
∞
Y
(1 − P (An )) =
nY
0 −1
∞
Y
n=1
n=n0
n=1
contradicts with the fact that
(1 − P (An )) ·
Q∞
n=1
(1 − P (An )) =
∞
Y
(1 − P (An )) > 0
n=1
P (Acn ) = 0. Therefore we conclude that P {An i.o. } = 1.
EXAMPLE 1.5.5 (Billingsley Problem 4.2)
(a) Prove that
lim sup An ∩ lim sup Bn ⊃ lim sup(An ∩ Bn )
n
n
n
lim sup An ∪ lim sup Bn = lim sup(An ∪ Bn )
n
n
n
lim inf An ∩ lim inf Bn = lim inf (An ∩ Bn )
n
n
n
lim inf An ∪ lim inf Bn ⊂ lim inf (An ∪ Bn )
n
n
n
Show by example that the two inclusions can be strict.
(c) Show that
lim sup An − lim inf An = lim sup(An − An+1 ) = lim sup(An+1 − An )
n
n
n
n
(d) Show that An → A and Bn → B together imply that An ∪Bn → A∪B and An ∩Bn → A∩B.
Proof.
(a) (1) By definition we can see that if ω ∈ lim supn (An ∩ Bn ) then for all n there exists m > n
such that ω ∈ Am ∩ Bm . Therefore for all n there exists m > n such that ω ∈ Am
and ω ∈ Bm . By definition again we obtain that ω ∈ lim supn An and ω ∈ lim supn Bn .
Therefore ω ∈ (lim supn An ) ∩ (lim inf n Bn ). Hence we obtain that lim supn (An ∩ Bn ) ⊂
(lim supn An ) ∩ (lim supn Bn ).
(2) It suffices to show:
(i) ω ∈ lim supn An or ω ∈ lim supn Bn implies that ω ∈ lim supn (An ∩ Bn ). If ω ∈
lim supn An or ω ∈ lim supn Bn , then for all n there exists k, m > n such that ω ∈ Ak ⊂
Ak ∪ Bk , ω ∈ Bm ⊂ Am ∪ Bm . By definition we have ω ∈ lim supn (An ∪ Bn ).
(ii) ω ∈ lim supn (An ∩ Bn ) implies that ω ∈ lim supn An or ω ∈ lim supn Bn . If ω ∈
lim supn (An ∪Bn ) then by definition for all n there exists m > n such that ω ∈ Am ∪Bm .
Namely, for all n there exists m > n such that either ω ∈ Am or ω ∈ Bm . Therefore by
definition again we have ω ∈ lim supn An or ω ∈ lim supn Bn .
(3) By (2) we know that
lim inf (An ∩ Bn ) =
n
=
=
lim sup(Acn
n
lim sup Acn
n
c
∪
∪
c
lim sup Acn
n
Bnc )
lim sup Bnc
n
∩
c
lim sup Bnc
n
c
= lim inf An ∩ lim inf Bn
n
n
(4) By (1) we know that
c c
c
c
lim inf An ∪ lim inf Bn = lim sup An ∪ lim sup Bn
n
n
n
n
c
c
c
=
lim sup An ∩ lim sup Bn
n
n
c
⊂ lim sup(Acn ∪ Bnc )
n
= lim inf (An ∩ Bn )
n
The example with strict inclusions is provided by
1
1
1
1
A2k+1 = 0,
, A2k+2 =
, 1 , B2k+2 = 0,
, B2k+1 =
,1 ,
2
2
2
2
for all k ∈ Z+
Therefore we obtain that lim supn An = lim supn Bn = (0, 1] with all An ∩ Bn = ∅, namely,
lim supn (An ∩ Bn ) = ∅. Hence we obtain the strict inclusion for (1) with (0, 1] strict
including ∅. Similarly lim inf n An = lim inf n Bn = ∅ and lim inf n (An ∪ Bn ) = (0, 1] and the
strict inclusion for (4) also holds.
(c) It suffices to show that
(i) lim supn An − lim inf n An ⊂ lim supn (An − An+1 ). If ω ∈ lim supn An − lim inf n An , then
ω ∈ lim supn An and ω ∈ lim supn Acn . By definition for all n there exists k, m > n such
that ω ∈ Ak , Acm . Then for all n there exists k > n and for this k there exists m > k such
that ω ∈ Ak − Am . Namely for all n we have
ω∈
∞ [
∞
[
(Ak − Am )
k=n m=k
But Ak − Am ⊂ (Ak − Ak+1 ) ∪ (Ak+1 − Ak+2 ) ∪ · · · ∪ (Am−1 − Am ). Therefore we have
∞ [
∞
[
(Ak − Am ) ⊂
k=n m=k
∞
[
(Ak − Ak+1 )
k=n
Therefore we obtain lim supn An − lim inf n An ⊂ lim supn (An − An+1 ).
(ii) lim supn (An − An+1 ) ⊂ lim supn An − lim inf n An . Now if ω ∈ lim supn (An − An+1 ) then
for all n there exists k > n such that ω ∈ Ak − Ak+1 . But naturally we have for all n the
following inclusion
∞
[
(Ak − Ak+1 ) ⊂
∞ [
∞
[
(Ak − Am )
k=n m=n
k=n
Therefore for all n there exists k, m > n such that ω ∈ Ak , ω ∈ Acm . Hence ω ∈ lim supn An ∩
lim supn Acn = lim supn An − lim inf n An .
(iii) lim supn An − lim inf n An ⊂ lim supn (An+1 − An ). If ω ∈ lim supn An − lim inf n An , then
ω ∈ lim supn An and ω ∈ lim supn Acn . By definition for all n there exists k, m > n such
that ω ∈ Ak , Acm . Then for all n there exists m > n and for this m there exists k > m such
that ω ∈ Ak − Am . Namely for all n we have
∞ [
∞
[
(Ak − Am )
ω∈
m=n k=m
But Ak − Am ⊂ (Am+1 − Am ) ∪ (Am+2 − Am+1 ) ∪ · · · ∪ (Ak − Ak−1 ). Therefore we have
∞ [
∞
[
(Ak − Am ) ⊂
m=n k=m
∞
[
(Ak+1 − Ak )
k=n
Therefore we obtain lim supn An − lim inf n An ⊂ lim supn (An+1 − An ).
(iv) lim supn (An+1 − An ) ⊂ lim supn An − lim inf n An . Now if ω ∈ lim supn (An+1 − An ) then
for all n there exists k > n such that ω ∈ Ak+1 − Ak . But naturally we have for all n the
following inclusion
∞
[
(Ak+1 − Ak ) ⊂
∞
∞ [
[
(Ak − Am )
k=n m=n
k=n
Therefore for all n there exists k, m > n such that ω ∈ Ak , ω ∈ Acm . Hence ω ∈ lim supn An ∩
lim supn Acn = lim supn An − lim inf n An .
(d) If An → A and Bn → B, then lim supn An = lim inf n An = A, lim supn Bn = lim inf n Bn = B
and using (a) yields
A ∪ B = lim inf An ∪ lim inf Bn ⊂ lim inf (An ∪ Bn )
n
n
n
A ∪ B = lim sup An ∪ lim sup Bn = lim sup(An ∪ Bn )
n
n
n
Therefore we obtain lim supn (An ∪ Bn ) ⊂ lim inf n (An ∪ Bn ) ⊂ lim supn (An ∪ Bn ) and yields that
limn (An ∪ Bn ) = lim supn (An ∪ Bn ) = lim inf n (An ∪ Bn ) = A ∪ B. Similarly
A ∩ B = lim sup An ∩ lim sup Bn ⊃ lim sup(An ∩ Bn )
n
n
n
A ∩ B = lim inf An ∩ lim inf Bn = lim inf (An ∩ Bn )
n
n
n
Therefore we obtain lim inf n (An ∩ Bn ) ⊃ lim supn (An ∩ Bn ) ⊃ lim inf n (An ∩ Bn ) and yields
limn (An ∩ Bn ) = lim inf n (An ∩ Bn ) = lim supn (An ∩ Bn ) = A ∩ B.
EXAMPLE 1.5.6 (Billingsley Problem 4.5)
(a) Show that limn P (lim inf k An ∩ Ack ) = 0.
Put A∗ = lim supn An and A∗ = lim inf n An .
(b) Show that P(An − A*) → 0 and P(A_* − An) → 0.
(c) Show that An → A implies that P(An ∆ A) → 0.
(d) Suppose that An converges to A in the weaker sense that P (A∆A∗) = P (A∆A∗ ) = 0. Show
that P (A∆An ) → 0.
Proof.
(a) It suffices to show that lim supn lim inf k (An ∩ Ack ) = ∅. Now if we assume that there exists
ω ∈ lim supn lim inf k (An ∩ Ack ), then for all n, there exists k > n, there exists m0 and for all
l > m0 we have ω ∈ Ak ∩ Acl . Now if we take n = m0 , then there exists k > m0 and for all
l > m0 we have ω ∈ Ak − Al . But this means that ω ∈ Ak − Ak = ∅ because k > m0 which
contradicts with ∅ containing no elements. Therefore lim supn lim inf k (An ∩ Ack ) is empty. Then
applying mini-Fatou’s lemma yields
Ack )
0 6 lim inf P lim inf (An ∩
n
k
6 lim sup P lim inf (An ∩ Ack )
k
n
c
6 P lim sup lim inf (An ∩ Ak ) = 0
k
n
Therefore limn P (lim inf k (An ∩ Ack )) = 0.
(b) Write
∗
lim P (An − A ) = lim P An ∩ lim inf
n→∞
n→∞
k
Ack
= lim P lim inf (An ∩
n→∞
k
Ack )
=0
which is a direct result of (a). Similarly
c
c c
c
c c
lim P (A∗ − An ) = lim P lim inf (An ∩ (Ak ) ) = lim P lim inf (An ∩ (Ak ) ) = 0
n→∞
n→∞
n→∞
k
k
(c) Using the results in (b) yields
0 6 P (An ∆A) 6 P (An − A) + P (A − An ) = P (An − A∗ ) + P (A∗ − An ) → 0,
where A∗ = A∗ = A holds because An → A. Therefore limn→∞ P (An ∆A) = 0.
n→∞
(d) Observe that
An ∆A = (An − A) + (A − An ) ⊂ ((An − A∗ ) ∪ (A∗ − A)) ∪ ((A − A∗ ) ∪ (A∗ − An ))
and we also have P (A∗ − An ) → 0, P (An − A∗ ) = 0 and A∗ − A ⊂ A∆A∗ , A − A∗ ⊂ A∆A∗ .
Therefore P (A∗ − A) = P (A − A∗ ) = 0 and
0 6 P (An ∆A) 6 P (An − A∗ ) + P (A∗ − A) + P (A − A∗ ) + P (A∗ − An ) → 0, n → ∞
Therefore we obtain that P (An ∆A) → 0, n → ∞.
EXAMPLE 1.5.7 (Chung Problem 4.2.20) Let {En}_{n=1}^∞ be arbitrary events satisfying
(i) lim_{n→∞} P(En) = 0,   (ii) Σ_{n=1}^∞ P(En ∩ E_{n+1}^c) < ∞;
then P(lim sup_n En) = 0.
Proof. Directly applying the first Borel-Cantelli lemma yields P{En ∩ E_{n+1}^c i.o.} = 0. But observe that
lim sup_n En − lim inf_n En = lim sup_n (En ∩ E_{n+1}^c) = {En ∩ E_{n+1}^c i.o.},   lim inf_n En ⊂ lim sup_n En
Therefore
P(lim sup_n En) = P(lim sup_n (En ∩ E_{n+1}^c)) + P(lim inf_n En) = P(lim inf_n En)
But mini-Fatou's lemma implies that
0 ≤ P(lim inf_n En) ≤ lim inf_n P(En) = lim_{n→∞} P(En) = 0
Therefore we derive that
P(lim sup_n En) = P(lim inf_n En) = 0
1.6 Kolmogorov's Zero-One Law
DEFINITION 1.6.1 (Tail σ-fields) Let (Ω, A) be a measurable space with {An}_{n=1}^∞ being a sequence of measurable sets. The σ-field
T = ⋂_{n=1}^∞ σ⟨Ap : p ≥ n⟩
is called the tail σ-field associated with the An's. If (Ω, A, P) is a probability space, then events in T are called tail events.
PROPOSITION 1.6.1 Let {An}_{n=1}^∞ be a sequence of events in (Ω, F, P). Then lim sup_n An and lim inf_n An are tail events in F.
Proof. For all n we have
lim sup_m Am = ⋂_{m=n}^∞ ⋃_{k=m}^∞ Ak ∈ σ⟨Ap : p ≥ n⟩
lim inf_m Am = ⋃_{m=n}^∞ ⋂_{k=m}^∞ Ak ∈ σ⟨Ap : p ≥ n⟩
But lim sup_m Am and lim inf_m Am do not depend on n. Then we can take the intersection over all n; namely,
lim sup_m Am, lim inf_m Am ∈ T = ⋂_{n=1}^∞ σ⟨Ap : p ≥ n⟩
EXAMPLE 1.6.1 Consider the case where (Ω, F, P) = ((0, 1], B, λ) and ln is the RUN LENGTHS function. Put {An}_{n=1}^∞ = {{dn(ω) = 0}}_{n=1}^∞. Then for all n0 we have
{ln(ω) ≥ rn i.o.} = lim sup_n {ln(ω) ≥ rn}
= ⋂_{n=n0}^∞ ⋃_{k=n}^∞ {lk(ω) ≥ rk}
= ⋂_{n=n0}^∞ ⋃_{k=n}^∞ {dk(ω) = d_{k+1}(ω) = · · · = d_{k+rk−1}(ω) = 0}
= ⋂_{n=n0}^∞ ⋃_{k=n}^∞ ⋂_{j=k}^{k+rk−1} Aj ∈ σ⟨Ap : p ≥ n0⟩
Therefore we can take the intersection over all n0, which yields that
{ln(ω) ≥ rn i.o.} ∈ T = ⋂_{n0=1}^∞ σ⟨Ap : p ≥ n0⟩
is a tail event.
THEOREM 1.6.1 (Kolmogorov's Zero-One Law) Suppose {An}_{n=1}^∞ is a sequence of independent events in (Ω, F, P). Then for each A ∈ T = ⋂_{n=1}^∞ σ⟨Ap : p ≥ n⟩ we have either P(A) = 0 or P(A) = 1.
Proof. Firstly we claim that σ⟨Ap : p ≥ n⟩ = σ(⋃_{p≥n} Ap). Define Ap = σ({Ap}). It suffices to show
(I) σ(⋃_{p≥n} Ap) ⊂ σ⟨Ap : p ≥ n⟩. Firstly we have Ap ∈ σ⟨Ap : p ≥ n⟩ for all p ≥ n. Then we obtain Ap = σ({Ap}) ⊂ σ⟨Ap : p ≥ n⟩, and taking the union over all p ≥ n yields ⋃_{p≥n} Ap ⊂ σ⟨Ap : p ≥ n⟩. Hence we obtain σ(⋃_{p≥n} Ap) ⊂ σ⟨Ap : p ≥ n⟩.
(II) σ(⋃_{p≥n} Ap) ⊃ σ⟨Ap : p ≥ n⟩. Now σ⟨Ap : p ≥ n⟩ is generated by {Ap : p ≥ n}. By the definition of Ap, for all p ≥ n we have Ap ∈ Ap and therefore {Ap : p ≥ n} ⊂ ⋃_{p≥n} Ap. This yields {Ap : p ≥ n} ⊂ σ(⋃_{p≥n} Ap). Therefore σ⟨Ap : p ≥ n⟩ ⊂ σ(⋃_{p≥n} Ap).
Now we know that {An }∞
n=1 are independent events, then we know that
{A1 }, {A2 }, · · · , {An }, · · ·
are singletons and thus are independent π-systems. Then the σ-fields generated by them An :=
σ({An }), n = 1, 2, · · · are independent. Therefore applying the splitting theorem yields that
S
for all n the classes A1 , A2 , · · · , An−1 , σhAp : p > ni = σ( p>n Ap ) are independent. Then the
T
subclasses theorem implies that for all n we have the classes A1 , · · · , An−1 , T = ∞
n=1 σhAp : p >
ni are independent. Since n can be arbitrary, then by the definition of independence of classes
we have A1, A2, · · · , An, · · · and T are independent. Again applying the splitting theorem yields that T and the class σ(⋃_{p=1}^∞ Ap) = σ⟨Ap : p ≥ 1⟩ are independent. Then by the subclasses theorem we know that T and T are independent, because T ⊂ σ⟨Ap : p ≥ 1⟩. Therefore T is independent of itself.
P (A) = P (A ∩ A) = P (A)P (A)
yielding that either P (A) = 0 or P (A) = 1.
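For a quick consequence (an added remark, not from the original notes): for independent events the set $\{A_n \text{ i.o.}\} = \limsup_n A_n$ is a tail event by Proposition 1.6.1, so the zero-one law says its probability is $0$ or $1$; the two Borel-Cantelli lemmas then identify which case occurs:

$$P(A_n \text{ i.o.}) = \begin{cases} 0, & \sum_{n=1}^{\infty} P(A_n) < \infty, \\[2pt] 1, & \sum_{n=1}^{\infty} P(A_n) = \infty. \end{cases}$$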
1.7 Rényi-Lamperti Lemma
LEMMA 1.7.1 (Rényi-Lamperti Lemma) Suppose (Ω, F, P) is a probability space with {An}_{n=1}^∞ being a sequence of events satisfying
(i) Σ_{n=1}^∞ P(An) = ∞   (ii) lim inf_n [ Σ_{j=1}^n Σ_{k=1}^n P(Aj ∩ Ak) ] / [ Σ_{i=1}^n P(Ai) ]² = c ≥ 1
Then we have P(lim sup_n An) ≥ 2 − c.
Proof. For all n = 1, 2, · · · , ∞ set Nn =
Pn
k=1 IAk ,
lim sup An = {N∞ = ∞} =
n
∞
\
then we know that
{N∞ > k} =
k=1
64
∞ [
∞
\
{Nn > k}
k=1 n=1
For any k, select n to be large enough such that ENn > k and write
P {Nn > k} = P {Nn − ENn > k − ENn }
= P {Nn − ENn > −(ENn − k)}
> P {(ENn − k) > Nn − ENn > −(ENn − k)}
= P {|Nn − ENn | 6 (ENn − k)}
>1−
V ar(Nn )
(ENn − k)2
Then by monotone sequential continuity from above and below we obtain
P lim sup An = lim lim P {Nk > k}
n→∞ k→∞
n
V ar(Nn )
> lim sup lim sup 1 −
(ENn − k)2
n→∞
k→∞
ENn2 − (ENn )2
= lim sup 1 − lim inf
n→∞
(ENn − k)2
k→∞
P
Now that we have ENn = nk=1 P (Ak ) → ∞ as n → ∞, then we know that for fixed k, the
following holds
ENn2 − (ENn )2
P lim sup An > lim sup 1 − lim inf
n→∞
(ENn )2
n
k→∞
ENn2
= lim sup 2 − lim inf
n→∞ (ENn )2
k→∞
ENn2
= 2 − lim inf
n→∞ (ENn )2
Pn Pn
P (Aj ∩ Ak )
j=1
Pnk=1
= 2 − lim inf
n→∞
( i=1 P (Ai ))2
=2−c
Now we can generalize the second Borel-Cantelli lemma to a weaker version.
THEOREM 1.7.1 Suppose that in a probability space (Ω, F, P) there is a sequence of events {An}_{n=1}^∞ which are pairwise independent. If Σ_{n=1}^∞ P(An) = ∞, then P(lim sup_n An) = 1.
Proof. Since {An}_{n=1}^∞ are pairwise independent, we have P(Aj ∩ Ak) = P(Ak) if j = k and P(Aj ∩ Ak) = P(Aj)P(Ak) if j ≠ k. Therefore
Σ_{j,k≤n} P(Aj ∩ Ak) = Σ_{k≤n} P(Ak) + Σ_{j,k≤n, j≠k} P(Aj)P(Ak) ≤ Σ_{k≤n} P(Ak) + (Σ_{i≤n} P(Ai))²,
and since Σ_{i≤n} P(Ai) → ∞ while the ratio is always at least 1 (because P(Ak) ≥ P(Ak)²), this yields
lim inf_n [ Σ_{j,k≤n} P(Aj ∩ Ak) ] / [ Σ_{i≤n} P(Ai) ]² = c = 1
And this through the Rényi-Lamperti lemma yields P(lim sup_n An) ≥ 2 − c = 1. Therefore we obtain P(lim sup_n An) = 1.
EXAMPLE 1.7.1 We return to the RUN LENGTHS example. Write An = {ln (ω) > rn }
P
P∞ −rn
and now assume that ∞
= ∞. Then we want to estimate
n=1 P (An ) =
n=1 2
P
j,k6n P (Aj ∩ Ak )
P
( i6n P (Ai ))2
Consider the case where j 6= k. Without loss of generality we may assume that j < k. If
k > j + rj then we see that Aj and Ak are independent and P (Aj ∩ Ak ) = P (Aj )P (Ak ). Now if
k < j + rj . Then
P (Aj ∩ Ak ) = P {dj (ω) = dj+1 (ω) = · · · = dmax{j+rj ,k+rk }−1 (ω) = 0}
= 2− max{j+rj ,k+rk }+j
= 2− max{rj ,k−j+rk }
= 2− max{rj −rk ,k−j}−rk
= 2−rk 2− max{rj −rk ,k−j}
= 2−rk 2−(k−j)
= P (Ak )2−(k−j)
Then we can compute
X
X
P (Ai ) + 2
P (Aj ∩ Ak ) =
i6n
j,k6n
X
P (Aj ∩ Ak )
j,k6n,j<k


=
X
P (Ai ) + 2 
X
i6n
X
P (Aj ∩ Ak ) +
X X
P (Aj ∩ Ak )
j6n k>j+rj
j6n j<k<j+rj

6
6
X
P (Ai ) + 2 

X
X
P (Ak )
i6n
j6n j<k<j+rj
X
X
P (Ai ) + 2
i6n
63
X
X
P (Ak )
k6n
P (Ak ) +
k6n
X
j<k6n
1
2k−j
1
2k−j
+
X
P (Aj )P (Ak )
16j<k6n
!
+
X
P (Aj )P (Ak )
16j<k6n
P (Aj )P (Ak )
j6=k
!2
63
X
k6n
P (Ak ) +
X
P (Aj )
j6n
Therefore we obtain that
P
P
3
j,k6n P (Aj ∩ Ak )
j,k6n P (Aj ∩ Ak )
P
P
P
6
lim
sup
6
lim
+1=1
1 6 lim inf
n→∞
n
( i6n P (Ai ))2
( i6n P (Ai ))2
n
k6n P (Ak )
Namely we obtain that
P
c = lim inf
n
P (Aj ∩ Ak )
=1
2
i6n P (Ai ))
j,k6n
(
P
yielding that P (lim supn An ) = 1 by the Rényi-Lamberti lemma. Therefore we obtain that
P∞ −rn
= ∞ implies P {An i.o. } = 1. But the first Borel Cantelli lemma also indicates that
n=1 2
P
P∞ −rn
= ∞. Hence we yields that
P {ln (ω) > rn i.o. } = 1 > 0 implies ∞
n=1 P (An ) =
n=1 2
P∞ −rn
= ∞ if and only if P {ln (ω) > rn i.o. } = 1.
n=1 2
Concerning the upper-bound for the limit superior of An , we have a better result.
LEMMA 1.7.2 (Kochen-Stone Lemma) Suppose (Ω, F, P) is a probability space with {An}_{n=1}^∞ being a sequence of events satisfying
(i) Σ_{n=1}^∞ P(An) = ∞   (ii) lim inf_n [ Σ_{j=1}^n Σ_{k=1}^n P(Aj ∩ Ak) ] / [ Σ_{i=1}^n P(Ai) ]² = c ≥ 1
Then we have P(lim sup_n An) ≥ 1/c.
Proof. For all n = 1, 2, · · · , ∞ set Nn =
Pn
k=1 IAk ,
then we know that for all ε > 0, for all k we
can find a n such that εENn > k. Then we have
lim sup An = {N∞ = ∞} =
n
∞
[
{Nn > εENn }
n=1
Taking f = INn >εENn and g = Nn and applying the Cauchy-Schwarz inequality (Ef g)2 6
Ef 2 Eg 2 yields
E[IN2 n >εENn ]ENn2 = P {Nn > εENn }ENn2
> (E[Nn INn >εENn ])2
= (E[Nn − Nn INn <εENn ])2
> (E[Nn ] − εENn P {Nn < εENn })2
> (1 − ε)2 (ENn )2
This means that we have P {Nn > εENn } > (1 − ε)2 (ENn )2 /(ENn2 ). Then by mini-Fatou’s
lemma for all ε > 0 we obtain
P lim sup An = P lim sup{Nn > εENn }
n
n
ENn2
2
n→∞ (ENn )
P
( i6n P (Ai ))2
> (1 − ε)2 lim sup P
n→∞
j,k6n P (Aj ∩ Ak )
1
> (1 − ε)2
c
Then we can let ε → 0 to get that P (lim supn An ) > 1/c.
> (1 − ε)2 lim sup
EXAMPLE 1.7.2 Show that the bound in the Kochen-Stone lemma is sharp in the following sense: for each 1 ≤ c ≤ ∞ there exists a probability space and a sequence of events {An}_{n=1}^∞ such that
(i) Σ_{n=1}^∞ P(An) = ∞   (ii) lim inf_n [ Σ_{j=1}^n Σ_{k=1}^n P(Aj ∩ Ak) ] / [ Σ_{i=1}^n P(Ai) ]² = c ≥ 1
holds and P(lim sup_n An) = 1/c.
Solution. Take Ω = (0, 1], F = B to be the Borel σ-field and P = λ be the Lebesgue measure.
(I) 1 < c 6 ∞. Now Set An = (0, 1/c + 1/(n0 + n)) for all any sufficient large n0 such that
1/c + 1/n 6 1 for all n > n0 . Then we see that lim supn An = (0, 1/c] and P (lim supn An ) =
P
P
1/c. We also see that n P (An ) = n (1/c + 1/(n + n0 )) = ∞. Now it suffices to check
that (ii) holds. Write
X
P (Aj ∩ Ak ) =
16j,k6n
=
n
X
i=1
n
X
P (Ai ) + 2
P (Ai ) + 2
i=1
=
=
P (Aj ∩ Ak )
j<k
n
X
(k − 1)P (Ak )
k=2
n X
1
i=1
X
1
+
c n0 + i
+2
n
X
(k − 1)
k=2
1
1
+
c n0 + k
n
n
X
n0 + 1
n X 1
n(n − 1)
+
+
−2
+n−1
c
n
+
i
c
n
+
k
0
0
i=1
k=2
n
n2 X 2n0 + 1
=
−
+n−1
c
n
0+k
k=2
And
n
X
P (Ak ) =
k=1
n X
1
k=1
1
+
c n0 + k
n
n X 1
= +
c k=1 n0 + k
n2
c
−
Therefore we obtain
P
lim
n→∞
P (Aj ∩ Ak )
P
= lim
n→∞
( k6n P (Ak ))2
j,k6n
= lim
n→∞
1
c
−
Pn
2n0 +1
k=2 n0 +k
n
c
+
Pn
1
c
Pn
+n−1
2
1
k=1 n0 +k
2n0 +1
k=2 n2 (n0 +k)
+
Pn
+
1
k=1 n(n0 +k)
n−1
n2
2
= lim = c
n→∞
Therefore we that the limit is exactly the limit inferior which is c. Namely condition (ii)
holds.
(II) c = 1. Now set An = (0, 1] for all n, then we see that (i)holds automatically because
P (An ) = 1 > 0 is a constant. And (ii) holds because any intersection is exactly (0, 1] itself
and the ratio
P
P (Aj ∩ Ak )
=1
2
i6n P (Ai ))
j,k6n
(
P
for all n. Also we have P (lim supn An ) = P (limn An ) = P ((0, 1]) = 1.
EXAMPLE 1.7.3 Let ln be the RUN LENGTHS function on ((0, 1], B, λ).
(a) Show that for each k > 1
P {ln > Ln + LLn + · · · + L(k−1) n + (1 + ε)L(k) n i.o. } = 0 or 1 acc. as ε > 0 or ε 6 0
here L := log2 and L(k) is the kth iterate of L.
(b) Show that with probability one the limit points of the random sequence
ln − Ln
LLn n>3
form precisely the interval [−∞, 1].
Proof.
(a) Set r(n) = Ln + LLn + · · · + L(k−1) n + (1 + ε)L(k) n. Then we have
Z ∞
Z ∞
dx
−r(x)
2
dx =
x(Lx)(LLx) · · · (L(k−2) x)(L(k−1) x)1+ε
0
Z0 ∞
dLx
=
(Lx)(LLx) · · · (L(k−2) x)(L(k−1) x)1+ε
Z0 ∞
dLLx
=
(LLx) · · · (L(k−2) x)(L(k−1) x)1+ε
0
= ············
Z ∞
dL(k−1) x
=
(L(k−1) x)1+ε
Z0 ∞
du
=
u1+ε
0
where we set u = L(k−1) x. Now if ε > 0 then the improper integral converges and hence the
P
−r(n)
corresponding series ∞
converges and we obtain P {ln (ω) > rn i.o. } = 0 by the first
n=1 2
P
−r(n)
Borel-Cantelli lemma. Now if ε 6 0 then the integral diverges and ∞
= ∞ diverges.
n=1 2
Therefore by Ex1.7.1 we obtain that P {ln > r(n) i.o. } = 1.
(b)
1.8 Probability Measures and Distribution Functions on (R, R)
DEFINITION 1.8.1 Let (Ω, F, P ) = (R, R, P ) where R is the σ-field generated by all open
intervals in R and P is a probability measure on R. Then the distribution function of P is the
function FP : R → [0, 1] defined by
FP (t) = P ((−∞, t])
EXAMPLE 1.8.1 Given a point x ∈ R, define the probability measure on (R, R) to be
P (A) = δx (A) := IA (x)
Now we can check that P is a probability measure.
• P(A) ≥ 0. P(A) = IA(x) ≥ 0 holds naturally.
• P(Ω) = 1. P(R) = I_R(x) = 1 because x is a point in R.
• P(Σ_n An) = Σ_n P(An). Assume that {An}_{n=1}^∞ is a sequence of disjoint sets in R. If x ∉ Σ_n An then for all n we have x ∉ An, which yields P(An) = I_{An}(x) = 0 for every n, and also P(Σ_n An) = I_{Σ_n An}(x) = 0, so
P(Σ_{n=1}^∞ An) = 0 = Σ_{n=1}^∞ P(An)
If x ∈ Σ_n An then there exists a k such that x ∈ Ak, and for all n ≠ k we have x ∉ An because {An}_{n=1}^∞ are disjoint. Therefore we have
P(Σ_{n=1}^∞ An) = I_{Σ_n An}(x) = 1 = I_{Ak}(x) + Σ_{n≠k} I_{An}(x) = P(Ak) + Σ_{n≠k} P(An) = Σ_{n=1}^∞ P(An)
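In particular (an added one-line computation, not in the original notes), the distribution function of the point mass $\delta_x$ is the unit step at $x$:

$$F_{\delta_x}(t) = \delta_x((-\infty, t]) = I_{[x,\infty)}(t) = \begin{cases} 0, & t < x, \\ 1, & t \ge x, \end{cases}$$

which is nondecreasing, right-continuous and normalized, as Proposition 1.8.1 below requires.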
Concerning the distribution function, we have the following properties
PROPOSITION 1.8.1 Let (Ω, F, P ) = (R, R, P ) and FP be the distribution function of P .
Then we have
(1) FP is nondecreasing. That is, x < y implies F (x) 6 F (y).
(2) FP is right continuous.
(3) FP is normalized in the sense that
FP(−∞) = lim_{t→−∞} FP(t) = 0,   FP(+∞) = lim_{t→+∞} FP(t) = 1
Proof.
(1) If x < y then we have (−∞, x] ⊂ (−∞, y] and this yields FP (x) = P ((−∞, x]) 6 P ((−∞, y]) =
FP (y) by monotonicity.
(2) For all x consider any sequence tn ↓ x. We claim that
(−∞, x] = ⋂_{n=1}^∞ (−∞, tn]
Firstly we note that {(−∞, tn ]}∞
n=1 is a sequence of decreasing sets. Then we see that every set
T
T∞
(−∞, tn ] contains (−∞, x] and thus (−∞, x] ⊂ ∞
n=1 (−∞, tn ]. For all y ∈
n=1 (−∞, tn ] we
have y ∈ (−∞, tn ] for all n. Now if y ∈
/ (−∞, x] then y > x and therefore there exists some
tn such that y > tn > x because tn ↓ x. Therefore y ∈
/ (−∞, tn ] and this contradicts with
y ∈ (−∞, tn ] for all n. Therefore y ∈ (−∞, x] and the reverse inclusion is proved.
Therefore by the monotone sequential continuity from above we can see that
FP(x) = P(⋂_{n=1}^∞ (−∞, tn]) = lim_{n→∞} P((−∞, tn]) = lim_{n→∞} FP(tn)
And this shows the right-continuity of FP .
(3) Consider a sequence tn ↓ −∞ as n → ∞. Then we know that
∅ = ⋂_{n=1}^∞ (−∞, tn]
and by the monotone sequential continuity from above we have
FP(−∞) = lim_{n→∞} FP(tn) = lim_{n→∞} P((−∞, tn]) = P(⋂_{n=1}^∞ (−∞, tn]) = P(∅) = 0
Similarly consider a sequence tn ↑ ∞ as n → ∞. Then we know that
R = ⋃_{n=1}^∞ (−∞, tn]
and by the monotone sequential continuity from below we have
FP(+∞) = lim_{n→∞} FP(tn) = lim_{n→∞} P((−∞, tn]) = P(⋃_{n=1}^∞ (−∞, tn]) = P(R) = 1
DEFINITION 1.8.2 (Distribution Function) A function F : R → [0, 1] which is increasing, right-continuous and normalized is called a distribution function on R.
Now we are going to introduce the inverse probability transformation
DEFINITION 1.8.3 (Inverse Probability Transformation) Let F be a distribution function. Define the inverse probability transformation of F to be
\[
F^{\sim}(t) = \inf\{x \in R : t \le F(x)\}, \qquad \text{for all } t \in (0, 1),
\]
and define
\[
F^{\sim}(0) = \sup\{x \in R : F(x) = 0\} \in [-\infty, +\infty), \qquad F^{\sim}(1) = \inf\{x \in R : F(x) = 1\} \in (-\infty, +\infty].
\]
The following lemma shows that the supremum and infimum in the definition can actually be attained.
LEMMA 1.8.1 Let F be a distribution function and F^∼ be its inverse probability transformation. Then for all t ∈ (0, 1) we have
\[
F(F^{\sim}(t)) \ge t, \qquad F(F^{\sim}(0)) \ge 0, \qquad F(F^{\sim}(1)) = 1.
\]
Proof. By the definition of the infimum, for all t ∈ (0, 1) and for all n there exists x_n such that x_n − 1/n < F^∼(t) ≤ x_n with F(x_n) ≥ t. Then 0 ≤ x_n − F^∼(t) < 1/n, namely x_n ↓ F^∼(t) as n → ∞, so by the right-continuity of F we know that F(x_n) → F(F^∼(t)). Since F(x_n) ≥ t for all n, letting n → ∞ yields F(F^∼(t)) ≥ t.
Now F(F^∼(0)) ≥ 0 holds naturally because F(F^∼(0)) = P((−∞, F^∼(0)]) is a probability and therefore must be nonnegative.
Finally, by the definition of the infimum, for all n there exists x_n such that 0 ≤ x_n − F^∼(1) ≤ 1/n with F(x_n) = 1. But this shows that x_n ↓ F^∼(1), and therefore by the right-continuity of F we obtain F(F^∼(1)) = lim_{n→∞} F(x_n) = 1.
There is another useful lemma concerning monotone functions.
LEMMA 1.8.2 Let F : R → R be a monotone increasing function. Then for any interval I ⊂ R, the preimage F^{-1}(I) is also an interval.
Proof. Suppose I ⊂ R is an interval and t, s ∈ F^{-1}(I). Without loss of generality we can assume that t ≤ s. Then there exist x, y ∈ I such that F(t) = x, F(s) = y. Since F is nondecreasing, we have x ≤ y. Then for all α ∈ (0, 1) we have
\[
x = F(t) \le F(\alpha t + (1 - \alpha)s) \le F(s) = y
\]
by the nondecreasingness of F, because t ≤ αt + (1 − α)s ≤ s. This shows that F(αt + (1 − α)s) ∈ I; namely, αt + (1 − α)s ∈ F^{-1}(I). Therefore F^{-1}(I) is an interval as well.
PROPOSITION 1.8.2 Let F be a distribution function and F^∼ be its inverse probability transformation. Then
(a) F^∼ is nondecreasing.
(b) F^∼ is left continuous.
Proof.
(a) First we show that F^∼ is nondecreasing on (0, 1). Assume that 0 < t_1 < t_2 < 1. Then we have
\[
\{x \in R : t_1 \le F(x)\} \supset \{x \in R : t_2 \le F(x)\}
\]
by the nondecreasingness of F, and therefore taking infima yields
\[
F^{\sim}(t_1) = \inf\{x \in R : t_1 \le F(x)\} \le \inf\{x \in R : t_2 \le F(x)\} = F^{\sim}(t_2).
\]
Now we show that F^∼ is nondecreasing on [0, 1]. For the right endpoint, F^∼(1) = inf{x ∈ R : F(x) = 1} and {x : F(x) = 1} ⊂ {x : F(x) ≥ t} for every t ∈ (0, 1), so taking infima yields F^∼(t) ≤ F^∼(1). For the left endpoint it suffices to show that F^∼(0) ≤ F^∼(t) for all t ∈ (0, 1). Consider any t ∈ (0, 1); then F(F^∼(t)) ≥ t > 0. By Lemma 1.8.2 the set I := {x : F(x) = 0} is an interval and F^∼(0) = sup I. Since F is nondecreasing, I is a down-set (if x ∈ I and y < x then y ∈ I), so F(F^∼(t)) > 0 implies F^∼(t) ∉ I and hence F^∼(t) ≥ sup I = F^∼(0).
(b) F^∼ is left continuous. Suppose t ∈ (0, 1] and t_n ↑ t as n → ∞. Define x = lim_{n→∞} F^∼(t_n); this limit exists since {F^∼(t_n)}_{n=1}^∞ is a nondecreasing sequence bounded above by F^∼(t). It suffices to show:
(i) x ≤ F^∼(t). This is because F^∼(t_n) ≤ F^∼(t) for all n, and letting n → ∞ yields x ≤ F^∼(t).
(ii) x ≥ F^∼(t). For all ε > 0 there exists some N such that for all n ≥ N we have t ≤ t_n + ε, and then we obtain
\[
t \le t_n + \varepsilon \le F(F^{\sim}(t_n)) + \varepsilon \le F(x) + \varepsilon,
\]
because F^∼(t_n) ↑ x as n → ∞ and F is nondecreasing. Thus t ≤ F(x) + ε for all ε > 0, and this yields t ≤ F(x). By the definition of the infimum we have F^∼(t) ≤ x.
From now on we restrict F^∼ to be a function from (0, 1) to R. Concerning the inverse probability transformation, the following switching relation is useful:
LEMMA 1.8.3 (Switching Relation) Let F be a distribution function and F^∼ be its inverse probability transformation. Then for t ∈ (0, 1) and x ∈ R, t ≤ F(x) if and only if F^∼(t) ≤ x.
Proof. It suffices to show that:
(I) t ≤ F(x) implies F^∼(t) ≤ x. By definition F^∼(t) is the infimum of the set {u : F(u) ≥ t}. But F(x) ≥ t, so x ∈ {u : F(u) ≥ t}, and taking the infimum yields F^∼(t) ≤ x.
(II) F^∼(t) ≤ x implies t ≤ F(x). By Lemma 1.8.1 we have t ≤ F(F^∼(t)), and the nondecreasingness of F together with F^∼(t) ≤ x gives F(F^∼(t)) ≤ F(x). Therefore we have
\[
t \le F(F^{\sim}(t)) \le F(x).
\]
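The switching relation is easy to verify by brute force for a finite discrete distribution. The sketch below is our own (the atoms, masses and helper names are not from the notes, and NumPy is assumed to be available); it builds F and F^∼ for a three-point distribution and checks t ≤ F(x) ⟺ F^∼(t) ≤ x on a grid.

```python
# Brute-force check of the switching relation for a three-point distribution.
import numpy as np

atoms = np.array([-1.0, 0.5, 2.0])       # support points
probs = np.array([0.25, 0.5, 0.25])      # their masses

def F(x):
    # F(x) = P((-inf, x]) = sum of masses of atoms <= x
    return probs[atoms <= x].sum()

def F_inv(t):
    # F~(t) = inf{x : t <= F(x)} for 0 < t < 1
    cum = np.cumsum(probs)
    return atoms[np.searchsorted(cum, t)]

for t in np.linspace(0.05, 0.95, 19):
    for x in np.linspace(-2.0, 3.0, 11):
        assert (t <= F(x)) == (F_inv(t) <= x)
print("switching relation holds on the grid")
```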
Now we can give a rigorous proof of the following theorem, which is the main result of this section:
THEOREM 1.8.1 Let F be a distribution function. Then there is a unique probability measure P on (R, R) such that
\[
P((-\infty, x]) = F(x), \qquad -\infty < x < \infty.
\]
Proof. It suffices to show that:
(I) Uniqueness: If P and Q are two probability measures with P((−∞, x]) = F(x) = Q((−∞, x]) for all x ∈ R, then P = Q on R. This can be proved using the extension theorem. Consider the field
\[
F_0 = \Big\{ \sum_{k=1}^{n} (a_k, b_k] : a_k, b_k \in R,\ k = 1, 2, \cdots, n \Big\},
\]
where we require the intervals (a_k, b_k] to be mutually disjoint. We claim that P and Q agree on F_0. First it is easy to check that
\[
P((a_k, b_k]) = F(b_k) - F(a_k) = Q((-\infty, b_k]) - Q((-\infty, a_k]) = Q((a_k, b_k]).
\]
Then by the finite additivity of P and Q we can derive that P = Q on F_0. By the extension theorem the extension of this probability measure from F_0 to R = σ(F_0) is unique, and P, Q are both such extensions. Therefore P = Q on R.
(II) Existence of P with P((−∞, x]) = F(x) for all x ∈ R. We still define F_0 to be the field of sets of the form
\[
F_0 = \Big\{ \sum_{k=1}^{n} (a_k, b_k] : a_k, b_k \in R,\ k = 1, 2, \cdots, n \Big\}.
\]
Then we know that R = σ(F_0). Now define
\[
G = \{B \in R : (F^{\sim})^{-1}(B) \in B\},
\]
where B denotes the Borel σ-field on (0, 1). We claim that G = R. In fact it suffices to show that G ⊃ R. Namely we only need to show:
(a) G is a σ-field.
(i) Ω ∈ G. We have (F^∼)^{-1}(R) = (0, 1) ∈ B, so R = Ω ∈ G by the definition of G.
(ii) G is closed under complementation. If A ∈ G then by definition A ∈ R, and A^c ∈ R since R is a σ-field. We also know that (F^∼)^{-1}(A) ∈ B. Then
\[
(F^{\sim})^{-1}(A^c) = (F^{\sim})^{-1}(R - A) = (F^{\sim})^{-1}(R) - (F^{\sim})^{-1}(A) = (0, 1) - (F^{\sim})^{-1}(A) = ((F^{\sim})^{-1}(A))^c \in B
\]
also holds. Therefore A^c ∈ G.
(iii) G is closed under countable union. If {A_n}_{n=1}^∞ ⊂ G is a countable sequence of sets in G, then every A_n is in R and (F^∼)^{-1}(A_n) ∈ B. Therefore ⋃_{n=1}^∞ A_n ∈ R because R itself is a σ-field. Also
\[
(F^{\sim})^{-1}\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \bigcup_{n=1}^{\infty} (F^{\sim})^{-1}(A_n) \in B
\]
because every (F^∼)^{-1}(A_n) is in B and B is a σ-field. Then by the definition of G we have ⋃_{n=1}^∞ A_n ∈ G.
(b) G contains all intervals. We know that F^∼ is a monotone function, so if I ⊂ R is an interval, then (F^∼)^{-1}(I) is also an interval by Lemma 1.8.2. But intervals in (0, 1) are in B, which is the Borel σ-field. Then I ∈ G by the definition of G.
Then we know that G is a σ-field containing all intervals. But R is generated by all intervals, therefore we conclude that G ⊃ R.
Next construct the probability measure to be
\[
P(B) = \lambda((F^{\sim})^{-1}(B)),
\]
where λ is Lebesgue measure on (0, 1). We need to check that P is indeed a probability measure on R. Namely:
(a) P(B) ≥ 0. This derives from the property that the Lebesgue measure is nonnegative.
(b) P(R) = 1. This is because P(R) = λ((F^∼)^{-1}(R)) = λ((0, 1)) = 1.
(c) P is countably additive. If {A_n}_{n=1}^∞ is a countable sequence of disjoint sets in R, then we have
\[
P\Big(\sum_{n=1}^{\infty} A_n\Big) = \lambda\Big((F^{\sim})^{-1}\Big(\sum_{n=1}^{\infty} A_n\Big)\Big) = \lambda\Big(\sum_{n=1}^{\infty} (F^{\sim})^{-1}(A_n)\Big) = \sum_{n=1}^{\infty} \lambda((F^{\sim})^{-1}(A_n)) = \sum_{n=1}^{\infty} P(A_n).
\]
Therefore P is indeed a probability measure on R.
Finally we need to check that for all x ∈ R we have F(x) = P((−∞, x]). Now
\[
P((-\infty, x]) = \lambda((F^{\sim})^{-1}((-\infty, x])) = \lambda(\{t : F^{\sim}(t) \le x\}) = \lambda(\{t : t \le F(x)\}),
\]
where the last equality holds because of the switching relation. Then we see that
\[
P((-\infty, x]) = \lambda(\{t : t \le F(x)\}) = \lambda((0, F(x)]) = F(x).
\]
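The existence half of this proof is exactly the inverse transform sampling recipe: push Lebesgue measure on (0, 1) forward through F^∼. The hedged numerical sketch below is our own example (it uses the exponential distribution, whose F^∼ has a closed form, and the helper names are not from the notes).

```python
# Sketch of Theorem 1.8.1: samples F~(U) with U uniform on (0,1) should have
# empirical P((-inf, x]) close to F(x).
import numpy as np

rng = np.random.default_rng(0)

def F(x):
    # exponential(1) distribution function
    return 1.0 - np.exp(-x) if x >= 0 else 0.0

def F_inv(t):
    # F~(t) = inf{x : t <= F(x)} = -log(1 - t) for the exponential(1) law
    return -np.log(1.0 - t)

u = rng.uniform(size=200_000)        # "lambda": uniform measure on (0, 1)
samples = F_inv(u)                   # distributed according to P = lambda o (F~)^{-1}

for x in [0.5, 1.0, 2.0]:
    print(x, (samples <= x).mean(), F(x))   # empirical P((-inf, x]) vs F(x)
```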
EXAMPLE 1.8.2 (Billingsley Problem 14.3(b)) Show that the inverse probability transformation F^∼ satisfies F(F^∼(t)−) ≤ t ≤ F(F^∼(t)) and that, if F is continuous on some set E ⊂ R, then F(F^∼(t)) = t for all t ∈ (F^∼)^{-1}(E).
Proof. First we show that F(F^∼(t)−) ≤ t ≤ F(F^∼(t)). The second inequality follows from Lemma 1.8.1, so it suffices to prove F(F^∼(t)−) ≤ t. Take any sequence x_n ↑ F^∼(t) with x_n < F^∼(t); then the left limit can be written as
\[
\lim_{n\to\infty} F(x_n) = F(F^{\sim}(t)-).
\]
By Lemma 1.8.2 the set F^{-1}([t, 1]) = {u : F(u) ≥ t} is an interval, and F^∼(t) = inf{x : t ≤ F(x)} is its left endpoint. Then x_n < F^∼(t) implies that x_n is not in this interval, namely F(x_n) < t. Letting n → ∞ yields lim_{n→∞} F(x_n) = F(F^∼(t)−) ≤ t.
Now suppose in addition that F is continuous on E ⊂ R and that t ∈ (F^∼)^{-1}(E), so that F^∼(t) ∈ E. Since F is continuous at F^∼(t), the left limit equals the value of the function, F(F^∼(t)−) = F(F^∼(t)), and together with F(F^∼(t)−) ≤ t ≤ F(F^∼(t)) this yields F(F^∼(t)−) = t = F(F^∼(t)).
EXAMPLE 1.8.3 Let F_n, n ≥ 1, and F be distribution functions on R. Suppose F_n → F weakly, in the sense that
\[
F_n(x) \to F(x)
\]
for all x at which F is continuous. Show that F_n^∼(t) → F^∼(t) for all but at most countably many t, in particular for almost all t with respect to Lebesgue measure.
Proof. We proceed by the following steps. Define A_n to be the set of points where F_n is discontinuous, for each n ∈ N, and A to be the set of points where F is discontinuous. Then every A_n is at most countable and A is at most countable, since F_n, F are nondecreasing functions. Hence E := R − (A ∪ ⋃_{n=1}^∞ A_n) is a set on which F_n and F are continuous with F_n(x) → F(x), and R − E is at most countable. Consider the set B = F(E). Then by the relation
\[
F(R - E) \supset F(R) - F(E) = (0, 1) - B
\]
we know that (0, 1) − B is at most countable, because R − E is at most countable.
EXAMPLE 1.8.4 Consider the probability space (R, R, P) and the distribution function F of P. Then P({x}) = F(x) − F(x−) for all x ∈ R.
Proof. Let {x_n}_{n=1}^∞ be a sequence such that x_n ↑ x with x_n < x. Then ⋃_{n=1}^∞ (−∞, x_n] = (−∞, x), and by the monotone sequential continuity from below we obtain
\[
P(\{x\}) = P((-\infty, x]) - P((-\infty, x)) = F(x) - P\Big(\bigcup_{n=1}^{\infty} (-\infty, x_n]\Big) = F(x) - \lim_{n\to\infty} P((-\infty, x_n]) = F(x) - \lim_{n\to\infty} F(x_n) = F(x) - F(x-).
\]
Chapter 2
Random Variables
2.1  Borel Sets in R^k
DEFINITION 2.1.1 Define R^k to be the σ-field generated by all bounded rectangles in R^k, and call the elements of R^k the Borel sets in R^k.
Here are some basic results concerning the Borel sets in R^k; the proofs are omitted.
PROPOSITION 2.1.1 Let R^k be the collection of all Borel sets in R^k. Then
\[
R^k = \sigma\langle\text{bounded rectangles}\rangle = \sigma\langle\text{open sets}\rangle = \sigma\langle\text{closed sets}\rangle = \sigma\langle\text{bounded rectangles with vertices having rational coordinates}\rangle.
\]
PROPOSITION 2.1.2 If A ∈ R^k and x ∈ R^k then x + A ∈ R^k.
PROPOSITION 2.1.3 Let B denote the Borel σ-field on (0, 1] and R be the collection of all Borel sets on R. Then we have B = R ∩ (0, 1]; namely, B is the trace of R on (0, 1].
PROPOSITION 2.1.4 Let B be the Borel σ-field on (0, 1]. Then we have
\[
R = \Big\{ A = \sum_{k \in Z} (k + B_k) : B_k \in B,\ k \in Z \Big\},
\]
and the B_k's are uniquely determined by A: B_k = (A − k) ∩ (0, 1].
2.2  Measurable Functions and Mappings
DEFINITION 2.2.1 Let (Ω, F) and (Ω′, F′) be measurable spaces. A mapping X : Ω → Ω′ is called an F/F′-measurable mapping if for all A′ ∈ F′ the pull back of A′ under X, namely X^{-1}(A′), is in F. If (Ω′, F′) = (R, R) and (Ω, F) = (R^k, R^k) then X is called a Borel function.
To be more specific, we consider the following situation:
DEFINITION 2.2.2 Let (Ω, F, P) be a probability space, (Ω′, F′) be a measurable space and X : Ω → Ω′ be a measurable mapping. If (Ω′, F′) = (R, R) then X is called a random variable; if (Ω′, F′) = (R^k, R^k) then X is called a random vector.
PROPOSITION 2.2.1 Let X : Ω → Ω′ be an F/F′-measurable mapping. Then {X^{-1}(A′) : A′ ∈ F′} is a σ-field. This σ-field, denoted by σ⟨X⟩, is called the σ-field generated by X. The proposition can be proved simply by using the fact that the pull back operation preserves all set operations.
DEFINITION 2.2.3 Assume X : (Ω, F) → (Ω′, F′) is a mapping. The class X^{-1}G′ := {X^{-1}A′ : A′ ∈ G′} is defined for all G′ ⊂ F′.
LEMMA 2.2.1 Let (Ω, F) and (Ω′, F′) be two measurable spaces and X : Ω → Ω′. If F′ = σ(G′) for some G′ ⊂ F′, then σ⟨X⟩ = σ(X^{-1}G′).
Proof. It suffices to show:
(I) σ⟨X⟩ ⊂ σ(X^{-1}G′). By definition σ⟨X⟩ = X^{-1}F′, so we need to show that X^{-1}F′ ⊂ σ(X^{-1}G′); namely, that for all A′ ∈ F′ the pull back X^{-1}A′ ∈ σ(X^{-1}G′). For this we use the "good sets" argument. Define
\[
G := \{A' \in F' : X^{-1}(A') \in \sigma(X^{-1}G')\}.
\]
If we can show that G ⊃ F′ then we obtain X^{-1}F′ ⊂ σ(X^{-1}G′). Since F′ = σ(G′), it suffices to verify:
(i) G ⊃ G′. By definition A′ ∈ G′ implies that A′ ∈ σ(G′) = F′ and X^{-1}A′ ∈ X^{-1}G′ ⊂ σ(X^{-1}G′). This shows that A′ ∈ G by definition. Hence G′ ⊂ G.
(ii) G is a σ-field. To see this, we need to verify:
(a) G is nonempty. G′ is nonempty because it is the generator of a σ-field, and therefore G ⊃ G′ is also nonempty.
(b) G is closed under complementation. If A′ ∈ G then A′ ∈ F′ and X^{-1}(A′) ∈ σ(X^{-1}G′). But both F′ and σ(X^{-1}G′) are σ-fields, therefore A′^c ∈ F′ and X^{-1}(A′^c) = (X^{-1}A′)^c ∈ σ(X^{-1}G′). By the definition of G we know that A′^c ∈ G.
(c) G is closed under countable union. If {A′_n}_{n=1}^∞ is a sequence of sets in G, then for all n we have A′_n ∈ F′ and X^{-1}(A′_n) ∈ σ(X^{-1}G′). This yields ⋃_{n=1}^∞ A′_n ∈ F′ and
\[
X^{-1}\Big(\bigcup_{n=1}^{\infty} A'_n\Big) = \bigcup_{n=1}^{\infty} X^{-1}(A'_n) \in \sigma(X^{-1}G').
\]
Then again by the definition of G we obtain ⋃_{n=1}^∞ A′_n ∈ G.
(II) σ(X^{-1}G′) ⊂ σ⟨X⟩. To see this, we only need to show that X^{-1}G′ ⊂ σ⟨X⟩. But this is true since G′ ⊂ F′ and
\[
X^{-1}G' \subset X^{-1}F' = \sigma\langle X\rangle.
\]
Therefore σ(X^{-1}G′) ⊂ σ⟨X⟩.
THEOREM 2.2.1 (Generators are enough) Let X : (Ω, F) → (Ω0 , F 0 ) be a mapping. If
F 0 = σ(G 0 ) and for all A0 ∈ G 0 the pull back X −1 (A0 ) ∈ F then X is F/F 0 -measurable.
Proof. By lemma 2.2.1 we know that σhXi = σ(X −1 G 0 ). Now we know that every A0 ∈ G 0
satisfies that X −1 A0 ∈ F, then this shows that {X −1 A0 : A0 ∈ G 0 } = X −1 G 0 ⊂ F. Therefore
σ(X −1 G 0 ) = σhXi ⊂ F. And this yields that X is F/F 0 measurable.
THEOREM 2.2.2 (Composition Rule) Suppose that X : (Ω, F) → (Ω0 , F 0 ) and X 0 :
(Ω0 , F 0 ) → (Ω00 , F 00 ) are measurable, then so is the composition X 0 ◦ X : (Ω, F) → (Ω00 , F 00 ).
Proof. Assume that A00 ∈ F 00 , then (X 0 ◦ X)−1 (A00 ) = {x ∈ Ω : X 0 ◦ X(x) ∈ A00 } = {x ∈ Ω :
X(x) ∈ X 0−1 (A00 )} = {x ∈ Ω : x ∈ X −1 (X 0−1 (A00 ))} = X −1 ◦ X 0−1 (A00 ). Now we know that X 0
is measurable, therefore X 0−1 (A00 ) ∈ F 0 ; But X is measurable, therefore X −1 (X 0−1 (A00 )) ∈ F.
This shows that (X 0 ◦ X)−1 (A00 ) ∈ F and therefore X 0 ◦ X is measurable.
EXAMPLE 2.2.1 X : (Ω, F) → (R^k, R^k) is a random vector if and only if X = (X_1, · · · , X_k), where every X_j, j = 1, 2, · · · , k, is a random variable. To see this we only need to check the generators, namely all south-west regions of the form (−∞, x_1] × · · · × (−∞, x_k] ⊂ R^k. Then
\[
X^{-1}((-\infty, x_1] \times \cdots \times (-\infty, x_k]) = \{\omega : X_j(\omega) \le x_j,\ 1 \le j \le k\} = \bigcap_{j=1}^{k} \{X_j \le x_j\} = \bigcap_{j=1}^{k} X_j^{-1}((-\infty, x_j]).
\]
Now if X is a random vector, then for fixed j take x_i = +∞ for all i ≠ j, so that the intersection of the pull backs reduces to
\[
\bigcap_{i=1}^{k} X_i^{-1}((-\infty, x_i]) = X_j^{-1}((-\infty, x_j]),
\]
which is measurable in F. Therefore by the generators-are-enough theorem X_j is a random variable. Conversely, if all X_j's are measurable then so is the intersection of all pull backs X_j^{-1}((−∞, x_j]), and again by the generators-are-enough theorem X is a random vector.
DEFINITION 2.2.4 Define R̄ to be the extended real line containing ±∞; namely, R̄ = R ∪ {±∞}. Now define the corresponding Borel σ-field to be R̄ = R ∪ {C ∪ {−∞} : C ∈ R} ∪ {C ∪ {+∞} : C ∈ R} ∪ {C ∪ {−∞, +∞} : C ∈ R}. Measurable functions X : (Ω, F) → (R̄, R̄) are also called random variables.
THEOREM 2.2.3 (Continuity is enough for Borel functions) If X : R^i → R^k is continuous, then it is measurable.
Proof. Since X is continuous, if G is an open set then X^{-1}(G) is an open set. Note that R^k is generated by all open sets, and X^{-1}(G) is an open set in R^i and hence a Borel set in R^i. Therefore, using the generators-are-enough theorem, X is measurable.
THEOREM 2.2.4 (Composite Function Theorem) If Xj , j = 1, 2, · · · , k are random
variables on a common measurable space (Ω, F) and g : Rk → R is a Borel function, then
Y = g(X1 , · · · , Xk ) is a random variable.
Proof. Now we know that X := (X1 , · · · , Xk ) is a random vector due to example 2.2.1. In
addition we also know that g is measurable. Therefore by the composition rule we know that
g ◦ X = g(X1 , · · · , Xk ) is measurable g ◦ X : (Ω, F) → (R, R). Therefore g ◦ X is a random
variable.
PROPOSITION 2.2.2 (Closure Theorem)
(i) If X, Y are random variables then so are cY for any c ∈ R, X + Y, max(X, Y), min(X, Y), X^+, X^−, |X|, XY, and X/Y if Y(ω) ≠ 0 for all ω ∈ Ω.
(ii) Let {X_n}_{n=1}^∞ be a sequence of random variables on (Ω, F). Then
1. sup_n X_n, inf_n X_n, lim sup_n X_n, lim inf_n X_n are random variables.
2. If lim_{n→∞} X_n exists everywhere, then it is a random variable.
3. The convergence set C := {ω : lim_{n→∞} X_n(ω) exists} is in F.
4. If X is any random variable, then {ω : X_n(ω) → X(ω), n → ∞} is in F.
Proof.
(i) We only need to check that the pull backs of generators are measurable. Consider any western region (−∞, x] ∈ R.
• If c > 0 then {cY ∈ (−∞, x]} = {Y ∈ (−∞, x/c]}, and by definition the pull back of (−∞, x/c] under Y is measurable; therefore cY is measurable. If c < 0 then {cY ∈ (−∞, x]} = {Y ∈ [x/c, +∞)} is measurable because Y is a random variable. If c = 0 then cY = 0 for all ω ∈ Ω, and the pull back of any set under cY is either ∅ or Ω, both of which are measurable.
• Consider the binary function f(x, y) = x + y from R^2 to R. Then f is continuous and therefore a Borel function. But (X, Y) is a random vector, therefore by the composite function theorem f(X, Y) = X + Y is a random variable.
• Consider the binary functions g(x, y) = max(x, y) and h(x, y) = min(x, y). Since g(x, y) = ½(x + y + |x − y|) and h(x, y) = ½(x + y − |x − y|), both are continuous. Therefore similarly g(X, Y) = max(X, Y) and h(X, Y) = min(X, Y) are random variables.
• Consider the function l(x) = |x|. Then l is continuous, and applying the composite function theorem again yields that |X| = l(X) is a random variable.
• X^+ = max(X, 0) and X^− = max(−X, 0), and the maximum of two random variables is a random variable. Therefore X^+, X^− are random variables.
• Consider the binary function p(x, y) = x/y on {y ≠ 0}. Then p is continuous there, and applying the composite function theorem yields that p(X, Y) = X/Y is a random variable provided that Y(ω) ≠ 0 for all ω ∈ Ω.
(ii)
• {sup_n X_n ≤ x} = ⋂_{n=1}^∞ {X_n ≤ x} lies in F even for x = ∞ and x = −∞, and so sup_n X_n is a random variable. Then inf_n X_n = −sup_n(−X_n), where the −X_n are all random variables, so applying the previous result yields that inf_n X_n is a random variable. We conclude that lim sup_n X_n = inf_n sup_{k≥n} X_k and lim inf_n X_n = sup_n inf_{k≥n} X_k are random variables, because for all n the supremum and infimum sup_{k≥n} X_k, inf_{k≥n} X_k are random variables.
• If lim_n X_n(ω) exists for all ω ∈ Ω then lim_n X_n = lim sup_n X_n = lim inf_n X_n, and therefore lim_n X_n is a random variable by the previous result.
• The convergence set C can be written as
\[
C = \{\omega : \limsup_n X_n(\omega) = \liminf_n X_n(\omega)\} = \{\omega : \limsup_n X_n - \liminf_n X_n = 0\}.
\]
Now lim sup_n X_n − lim inf_n X_n is a random variable, so the pull back of {0} is measurable, indicating that C is in F.
• The set can be written as
\[
\{\omega : X_n(\omega) \to X(\omega)\} = \{\omega : \limsup_n X_n - X = 0\} \cap \{\omega : \liminf_n X_n - X = 0\}.
\]
Therefore it is the intersection of two pull backs of {0}, under lim sup_n X_n − X and lim inf_n X_n − X, where these two are random variables, and thus it is measurable.
DEFINITION 2.2.5 (Simple Random Variable) A random variable X : (Ω, F) → (R, R) is said to be a simple random variable if there exist finitely many disjoint measurable sets {A_k}_{k=1}^n with Ω = ∑_{k=1}^n A_k and constants c_1, · · · , c_n such that X = ∑_{k=1}^n c_k I_{A_k}.
THEOREM 2.2.5 (Structure Theorem) Let X : (Ω, F) → (R, R) be a nonnegative mapping. Then X is a random variable if and only if X is the limit of a sequence of simple random variables {X_n}_{n=1}^∞ with the property 0 ≤ X_n ↑ X as n → ∞.
Proof. For all n ∈ N define the following simple random variable:
\[
X_n(\omega) =
\begin{cases}
-n, & X(\omega) \in [-\infty, -n), \\
n, & X(\omega) \in [n, \infty], \\
-\dfrac{k-1}{2^n}, & X(\omega) \in \Big[-\dfrac{k}{2^n}, -\dfrac{k-1}{2^n}\Big),\ k = 1, 2, \cdots, n2^n, \\
\dfrac{k-1}{2^n}, & X(\omega) \in \Big[\dfrac{k-1}{2^n}, \dfrac{k}{2^n}\Big),\ k = 1, 2, \cdots, n2^n.
\end{cases}
\]
Then we assert that X_n(ω) → X(ω) for all ω.
• If X(ω) = +∞ then for all n we have X(ω) ∈ [n, ∞], and this yields X_n(ω) = n → ∞ = X(ω) as n → ∞ by the definition of X_n.
• If X(ω) = −∞ then for all n we have X(ω) ∈ [−∞, −n), and this yields X_n(ω) = −n → −∞ = X(ω) as n → ∞.
• If X(ω) ≥ 0 is finite, then there exists some N such that X(ω) < N. For all n ≥ N there exists some k ∈ {1, 2, · · · , n2^n} such that X(ω) ∈ [(k − 1)/2^n, k/2^n). Then X_n(ω) = (k − 1)/2^n, and computing the discrepancy between X_n and X yields
\[
|X_n(\omega) - X(\omega)| \le \frac{1}{2^n} \qquad \text{for all } n \ge N.
\]
Letting n → ∞ yields X_n(ω) → X(ω).
• If X(ω) < 0 is finite, then there exists some N such that X(ω) ≥ −N. For all n ≥ N there exists some k ∈ {1, 2, · · · , n2^n} such that X(ω) ∈ [−k/2^n, −(k − 1)/2^n). Then X_n(ω) = −(k − 1)/2^n, and computing the discrepancy between X_n and X yields
\[
|X_n(\omega) - X(\omega)| \le \frac{1}{2^n} \qquad \text{for all } n \ge N.
\]
Letting n → ∞ yields X_n(ω) → X(ω).
Next we claim that X_n increases where X(ω) ≥ 0 and X_n decreases where X(ω) < 0.
• If X(ω) = ∞ then X_n(ω) = n for all n, and X_n(ω) = n < n + 1 = X_{n+1}(ω).
• If X(ω) = −∞ then X_n(ω) = −n for all n, and X_n(ω) = −n > −(n + 1) = X_{n+1}(ω).
• If X(ω) ≥ 0 is finite and n > X(ω), then X(ω) ∈ [(k − 1)/2^n, k/2^n) for some k, so X_n(ω) = (k − 1)/2^n. Then either X_{n+1}(ω) = (2k − 2)/2^{n+1} or X_{n+1}(ω) = (2k − 1)/2^{n+1}, according as X(ω) ∈ [(2k − 2)/2^{n+1}, (2k − 1)/2^{n+1}) or X(ω) ∈ [(2k − 1)/2^{n+1}, 2k/2^{n+1}). In both cases X_n(ω) ≤ X_{n+1}(ω).
• If X(ω) < 0 is finite and n > −X(ω), then X(ω) ∈ [−k/2^n, −(k − 1)/2^n) for some k, so X_n(ω) = −(k − 1)/2^n. Then either X_{n+1}(ω) = −(2k − 2)/2^{n+1} or X_{n+1}(ω) = −(2k − 1)/2^{n+1}, according as X(ω) ∈ [−(2k − 1)/2^{n+1}, −(2k − 2)/2^{n+1}) or X(ω) ∈ [−2k/2^{n+1}, −(2k − 1)/2^{n+1}). In both cases X_n(ω) ≥ X_{n+1}(ω).
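The dyadic construction in this proof can be tried out directly. The sketch below uses our own helper names (not from the notes) and treats only the nonnegative branch of the definition: it computes X_n = min(⌊2^n X⌋/2^n, n) and checks the claimed monotonicity 0 ≤ X_n ≤ X_{n+1} ≤ X.

```python
# Dyadic truncation X_n = min(floor(2^n X)/2^n, n) for nonnegative X:
# each X_n takes finitely many values, X_n <= X_{n+1}, and X_n increases to X.
import numpy as np

def dyadic(x, n):
    x = np.asarray(x, dtype=float)
    return np.minimum(np.floor(x * 2**n) / 2**n, n)

x = np.array([0.0, 0.3, 1.7, 5.25, 100.0])
prev = np.full_like(x, -np.inf)
for n in range(1, 12):
    xn = dyadic(x, n)
    assert np.all(prev <= xn) and np.all(xn <= x)   # increasing and below X
    prev = xn
print(xn)   # within 2^-11 of x for the values below the cap n = 11
```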
THEOREM 2.2.6 (Scissors and Paste) Let (Ω, A) and (Ψ, B) be two measurable spaces. Let {A_n}_{n=1}^∞ be a countable covering of Ω by A-sets; namely, Ω = ⋃_{n=1}^∞ A_n with each A_n ∈ A. For each n ≥ 1 let A ∩ A_n be the trace σ-field of A on A_n. Let X : Ω → Ψ. Then X is A/B-measurable if and only if for each n the restriction X|_{A_n} of X to the domain A_n is measurable between A ∩ A_n and B.
Proof. It suffices to show that:
(I) If X is A/B-measurable then for all n the restriction X|_{A_n} is A ∩ A_n/B-measurable. For all n and all B ∈ B the pull back is (X|_{A_n})^{-1}(B) = X^{-1}(B) ∩ A_n. Since X is measurable, X^{-1}(B) ∈ A, and therefore (X|_{A_n})^{-1}(B) = X^{-1}(B) ∩ A_n ∈ A ∩ A_n, which shows that X|_{A_n} is measurable.
(II) If for all n the restriction X|_{A_n} is A ∩ A_n/B-measurable then X is A/B-measurable. If B ∈ B and for all n we have (X|_{A_n})^{-1}(B) ∈ A ∩ A_n, then
\[
X^{-1}(B) = \bigcup_{n=1}^{\infty} (X|_{A_n})^{-1}(B).
\]
Since (X|_{A_n})^{-1}(B) ∈ A ∩ A_n ⊂ A for all n, the union ⋃_{n=1}^∞ (X|_{A_n})^{-1}(B) ∈ A. This shows that X^{-1}(B) ∈ A, and therefore X is A/B-measurable.
2.3  σ-fields generated by families of mappings
DEFINITION 2.3.1 Let Ω be a set, (Ω_i, A_i), i ∈ I, be a collection of measurable spaces and X_i : Ω → Ω_i be corresponding mappings. Then the σ-field generated by the collection of mappings {X_i : i ∈ I}, denoted by σ⟨X_i : i ∈ I⟩, is defined to be the smallest σ-field on Ω such that every X_i, i ∈ I, is measurable. Namely, we define σ⟨X_i : i ∈ I⟩ = σ(⋃_{i∈I} X_i^{-1}A_i).
PROPOSITION 2.3.1 Let Ω be a set, (Ω_i, A_i), i ∈ I, be a collection of measurable spaces and X_i : Ω → Ω_i be corresponding mappings. Let A_i = σ(G_i) where G_i ⊂ A_i for all i ∈ I. Then
\[
\sigma\langle X_i : i \in I\rangle = \sigma\Big(\bigcup_{i\in I} X_i^{-1}G_i\Big).
\]
Proof. By definition we have
\[
\sigma\langle X_i : i \in I\rangle = \sigma\Big(\bigcup_{i\in I} X_i^{-1}A_i\Big) = \sigma\Big(\bigcup_{i\in I} X_i^{-1}\sigma(G_i)\Big) = \sigma\Big(\bigcup_{i\in I} \sigma(X_i^{-1}G_i)\Big),
\]
where the last equality holds because of Lemma 2.2.1. But we also know that
\[
\sigma\Big(\bigcup_{i\in I} \sigma(C_i)\Big) = \sigma\Big(\bigcup_{i\in I} C_i\Big)
\]
for arbitrary classes C_i of subsets. Therefore we obtain
\[
\sigma\langle X_i : i \in I\rangle = \sigma\Big(\bigcup_{i\in I} \sigma(X_i^{-1}G_i)\Big) = \sigma\Big(\bigcup_{i\in I} X_i^{-1}G_i\Big).
\]
EXAMPLE 2.3.1 Consider the case where Ω = R^2, Ω_1 = Ω_2 = R and X_1, X_2 are the coordinate projections:
\[
X_1((\omega_1, \omega_2)) = \omega_1, \qquad X_2((\omega_1, \omega_2)) = \omega_2.
\]
Therefore we know that
\[
\sigma\langle X_1, X_2\rangle = \sigma\big( X_1^{-1}((a, b]) \cup X_2^{-1}((c, d]) : a, b, c, d \in R \big) = \sigma\big( (u, v] \times (r, s] : u, v, r, s \in R \big).
\]
THEOREM 2.3.1 (Factorization Theorem) Suppose (Ω, F) and (Ψ, B) are two measurable spaces and X : Ω → Ψ is F/B-measurable. Consider a function F : Ω → R. Then F is
σ(X)/R-measurable if and only if there exists a random variable G : (Ψ, B) → (R, R) such that
F = G ◦ X.
Proof.
(I) First we prove the sufficiency. Assume that there exists such a G with F = G ◦ X. Then G : (Ψ, B) → (R, R) is a random variable and X : (Ω, σ⟨X⟩) → (Ψ, B) is σ⟨X⟩/B-measurable. Therefore by the composite function theorem F is σ⟨X⟩/R-measurable.
(II) Now we prove the necessity. To do this, we follow these steps:
• Assume that F is a simple random variable. Write F(ω) = ∑_{i=1}^k c_i I_{A_i}(ω), where A_1, · · · , A_k ∈ σ⟨X⟩ and Ω = ⋃_{i=1}^k A_i. Note that σ⟨X⟩ = X^{-1}B = {X^{-1}(B) : B ∈ B}, so for all i there exists B_i ∈ B such that A_i = X^{-1}(B_i). Now define G(ψ) = ∑_{i=1}^k c_i I_{B_i}(ψ) for all ψ ∈ Ψ. We claim that F = G ◦ X. First, G is B/R-measurable because it is a simple random variable mapping from Ψ to R. Moreover ω ∈ A_i if and only if X(ω) ∈ B_i. Therefore plugging ω into G ◦ X yields
\[
G(X(\omega)) = \sum_{i=1}^{k} c_i I_{B_i}(X(\omega)) = \sum_{i=1}^{k} c_i I_{A_i}(\omega) = F(\omega).
\]
• Consider a general random variable F. By the structure theorem there exists a sequence of simple random variables F_n(ω) such that F_n(ω) → F(ω) as n → ∞ for all ω ∈ Ω. By the previous step there exists a random variable G_n : Ψ → R such that F_n = G_n ◦ X. Now define G(ψ) = lim sup_{n→∞} G_n(ψ) for all ψ ∈ Ψ (the limit superior is used so that G is defined everywhere). Then by the closure theorem G is a random variable, and for all ω ∈ Ω
\[
G(X(\omega)) = \lim_{n\to\infty} G_n(X(\omega)) = \lim_{n\to\infty} F_n(\omega) = F(\omega),
\]
since the limit exists at every point of the form X(ω). Therefore F = G ◦ X.
EXAMPLE 2.3.2 Suppose that X_1, · · · , X_n are random variables (Ω, F) → R. Then X = (X_1, · · · , X_n) is a random vector. Moreover we know that
\[
\sigma\langle X\rangle = \sigma\big( X^{-1}((a_1, b_1] \times \cdots \times (a_n, b_n]) \big) = \sigma\Big( \bigcup_{i=1}^{n} X_i^{-1}((a_i, b_i]) \Big) = \sigma\langle X_1, \cdots, X_n\rangle.
\]
Then, by the factorization theorem, F : Ω → R is σ⟨X_1, · · · , X_n⟩/R-measurable if and only if there exists an R^n/R-measurable function G : R^n → R such that F = G(X_1, · · · , X_n).
2.4  General Measure
DEFINITION 2.4.1 (General Measure) A set function µ on a σ-field F in Ω is a measure if
1. 0 ≤ µ(A) ≤ ∞ for all A ∈ F.
2. µ(∅) = 0.
3. µ is countably additive on F.
DEFINITION 2.4.2 (Finiteness and σ-Finiteness) A measure µ is said to be finite or infinite according as µ(Ω) < ∞ or µ(Ω) = ∞. If A ⊂ F, µ is said to be σ-finite on A if Ω = ⋃_{k=1}^∞ A_k for some sequence {A_k}_{k=1}^∞ ⊂ A with µ(A_k) < ∞; we say µ is σ-finite if µ is σ-finite on F.
DEFINITION 2.4.3 If (Ω, F) is a measurable space with a measure µ, then the triple (Ω, F, µ) is called a measure space. We say that A ∈ F is a support of µ, or that µ is concentrated on A, if µ(A^c) = 0.
DEFINITION 2.4.4 A set function µ on a field F_0 is said to be a premeasure if it satisfies
1. 0 ≤ µ(A) ≤ ∞ for all A ∈ F_0.
2. µ(∅) = 0.
3. µ is finitely additive on F_0.
DEFINITION 2.4.5 A set function µ on a field F_0 is said to be a measure if it satisfies
1. 0 ≤ µ(A) ≤ ∞ for all A ∈ F_0.
2. µ(∅) = 0.
3. If {A_n}_{n=1}^∞ ⊂ F_0 is a countable sequence of disjoint sets with ∑_{n=1}^∞ A_n ∈ F_0, then
\[
\mu\Big(\sum_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} \mu(A_n).
\]
PROPOSITION 2.4.1 Let (Ω, F, µ) be a measure space. Then
• Finite Additivity. If {A_k}_{k=1}^n ⊂ F are disjoint sets then
\[
\mu\Big(\sum_{k=1}^{n} A_k\Big) = \sum_{k=1}^{n} \mu(A_k).
\]
• Monotonicity. If A ⊂ B, A, B ∈ F, then µ(A) ≤ µ(B).
• Inclusion-Exclusion Formula. If A, B ∈ F then µ(A ∪ B) + µ(A ∩ B) = µ(A) + µ(B), provided that A, B are of finite measure.
• Countable Subadditivity. If {A_n}_{n=1}^∞ ⊂ F is a sequence of sets then
\[
\mu\Big(\bigcup_{n=1}^{\infty} A_n\Big) \le \sum_{n=1}^{\infty} \mu(A_n).
\]
Proof.
Finite Additivity. Consider the sequence of sets {A_1, · · · , A_n, ∅, ∅, · · · }. Then applying countable additivity yields
\[
\mu\Big(\sum_{k=1}^{n} A_k\Big) = \mu\Big(\sum_{k=1}^{\infty} A_k\Big) = \sum_{k=1}^{\infty} \mu(A_k) = \sum_{k=1}^{n} \mu(A_k).
\]
Monotonicity. We know that B = A + (B − A). Applying finite additivity yields µ(B) = µ(A) + µ(B − A) ≥ µ(A).
Inclusion-Exclusion Formula. We know that A ∪ B = (A − B) + (B − A) + (A ∩ B), and also A = (A − B) + (A ∩ B), B = (B − A) + (A ∩ B), with µ(A), µ(B) < ∞. Therefore we obtain
\[
\mu(A \cup B) = \mu(A - B) + \mu(B - A) + \mu(A \cap B),
\]
and also, by finite additivity,
\[
\mu(A - B) + \mu(A \cap B) = \mu(A), \qquad \mu(B - A) + \mu(A \cap B) = \mu(B).
\]
Combining the three identities we see that
\[
\mu(A \cup B) + \mu(A \cap B) = \mu(A) + \mu(B).
\]
Countable Subadditivity. Consider a sequence of sets {A_n}_{n=1}^∞ ⊂ F. Define B_1 = A_1 and B_{n+1} = A_{n+1} − ⋃_{j=1}^{n} A_j. Then the sequence {B_n}_{n=1}^∞ is disjoint and ⋃_{n=1}^∞ A_n = ⋃_{n=1}^∞ B_n. Applying countable additivity yields
\[
\mu\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \mu\Big(\sum_{n=1}^{\infty} B_n\Big) = \sum_{n=1}^{\infty} \mu(B_n) \le \sum_{n=1}^{\infty} \mu(A_n),
\]
where the last inequality holds since B_n ⊂ A_n and monotonicity applies.
PROPOSITION 2.4.2 Suppose F is a field and µ is a measure on F. Then we have
(i) If {E_n}_{n=1}^∞ ⊂ F with E_n ⊃ E_{n+1} for all n and ⋂_{n=1}^∞ E_n = E ∈ F, then {µ(E_n)}_{n=1}^∞ decreases to µ(E), provided that there exists some N such that µ(E_N) < ∞.
(ii) If {E_n}_{n=1}^∞ ⊂ F with E_n ⊂ E_{n+1} for all n and ⋃_{n=1}^∞ E_n = E ∈ F, then {µ(E_n)}_{n=1}^∞ increases to µ(E).
Proof.
(i) Assume that µ(E_N) < ∞ and E_n ⊃ E_{n+1} for all n. Then we know that
\[
E_N = \bigcap_{n=1}^{\infty} E_n + \sum_{n=N}^{\infty} (E_n - E_{n+1}).
\]
Using countable additivity, and noting that µ(E_n − E_{n+1}) = µ(E_n) − µ(E_{n+1}) for n ≥ N since µ(E_n) ≤ µ(E_N) < ∞, we obtain
\[
\mu(E_N) = \mu(E) + \sum_{n=N}^{\infty} \big(\mu(E_n) - \mu(E_{n+1})\big) = \mu(E) + \mu(E_N) - \lim_{n\to\infty} \mu(E_n).
\]
Since µ(E_N) < ∞, subtracting µ(E_N) from both sides yields lim_{n→∞} µ(E_n) = µ(E).
(ii) Now we have the relation
\[
E_1 + \sum_{n=1}^{\infty} (E_{n+1} - E_n) = \bigcup_{n=1}^{\infty} E_n = E.
\]
Using countable additivity and finite additivity of the partial sums yields
\[
\mu(E) = \mu(E_1) + \sum_{n=1}^{\infty} \mu(E_{n+1} - E_n) = \lim_{n\to\infty} \mu(E_n).
\]
EXAMPLE 2.4.1 (Discrete Measure Spaces) Let Ω be an at most countably infinite set and F = 2^Ω be the total σ-field. Assign every ω ∈ Ω a point mass m(ω) and define the measure to be µ(A) = ∑_{ω∈A} m(ω). Then µ is finite if and only if
\[
\sum_{\omega\in\Omega} m(\omega) < \infty.
\]
Now we consider σ-finiteness. If each point mass m(ω) is finite, then {{ω} : ω ∈ Ω} is a countable cover of Ω and µ is finite on each {ω}; therefore µ is σ-finite. Conversely, if µ is σ-finite then there exists {A_j}_{j=1}^∞ with Ω = ⋃_{j=1}^∞ A_j and µ(A_j) < ∞. Then for all ω ∈ Ω there exists some j with ω ∈ A_j, and m(ω) ≤ µ(A_j) < ∞ by monotonicity. Hence we conclude that µ is σ-finite if and only if each point mass is finite.
THEOREM 2.4.1 If µ is a σ-finite measure on a measure space (Ω, F, µ), then F does not
contain an uncountable collection of disjoint sets of positive measure.
EXAMPLE 2.4.2 Consider the case (R, R, µ). A singleton {x} ∈ R is said to be an atom if µ({x}) > 0. Assume that µ is σ-finite on (R, R). Prove that the number of atoms is at most countable.
Proof. Let {A_n}_{n=1}^∞ be a cover of R with µ(A_n) < ∞ and ⋃_{n=1}^∞ A_n = Ω. Suppose that I = {x ∈ R : µ({x}) > 0} is uncountable. Write I_n = I ∩ A_n, so that I = ⋃_{n=1}^∞ I_n. Therefore there exists some n_0 such that I_{n_0} is uncountable, for otherwise I would be a countable union of countable sets, contradicting the fact that I is uncountable. Thus we have an uncountable set I_{n_0} with µ({x}) > 0 for all x ∈ I_{n_0}. Define J_k = {x ∈ I_{n_0} : µ({x}) > 1/k} for all k ∈ N. Then I_{n_0} = ⋃_{k=1}^∞ J_k, and similarly there exists some k_0 with J_{k_0} uncountable. Now for all N ∈ N there exists some L_N ⊂ J_{k_0} such that the number of elements in L_N is exactly N. Then we see that
\[
\mu(A_{n_0}) \ge \mu(L_N) = \sum_{x\in L_N} \mu(\{x\}) > \frac{N}{k_0}.
\]
Letting N → ∞ yields µ(A_{n_0}) = ∞, and this contradicts µ(A_n) < ∞ for all n. Therefore the number of atoms is countable.
THEOREM 2.4.2 (Egoroff's Theorem) Let (Ω, F, µ) be a measure space and {f_n}_{n=1}^∞, f be finite-valued measurable functions from (Ω, F) to (R, R) with f_n(ω) → f(ω) for all ω ∈ Ω as n → ∞. Consider A ∈ F with finite measure µ(A) < ∞. Then for all ε > 0 there exists B ⊂ A with µ(B) < ε such that f_n converges to f uniformly on A − B.
Proof. Define the sets
\[
B_n^{(k)} = \bigcup_{i=n}^{\infty} \Big\{\omega \in A : |f_i(\omega) - f(\omega)| \ge \frac{1}{k}\Big\}.
\]
First we claim that for any fixed k the sequence {B_n^{(k)}}_{n=1}^∞ decreases to ∅. It is obvious that the B_n^{(k)} are decreasing in n. Moreover, if ω ∈ ⋂_{n=1}^∞ B_n^{(k)}, then for all n there exists some i ≥ n such that |f_i(ω) − f(ω)| ≥ 1/k, and this contradicts the fact that f_i(ω) → f(ω) as i → ∞. Hence ⋂_{n=1}^∞ B_n^{(k)} = ∅. Now since µ(A) < ∞, by the monotone sequential continuity from above, for all ε > 0 and for all k there exists some n_k such that µ(B_{n_k}^{(k)}) < ε/2^k. Define B = ⋃_{k=1}^∞ B_{n_k}^{(k)}. Then subadditivity yields
\[
\mu(B) = \mu\Big(\bigcup_{k=1}^{\infty} B_{n_k}^{(k)}\Big) \le \sum_{k=1}^{\infty} \mu(B_{n_k}^{(k)}) < \varepsilon.
\]
Now we claim that f_n converges to f uniformly on A − B. Write
\[
A - B = A \cap \bigcap_{k=1}^{\infty} (B_{n_k}^{(k)})^c = \bigcap_{k=1}^{\infty} (A - B_{n_k}^{(k)}).
\]
By definition, ω ∈ A − B means that for all k ≥ 1 we have ω ∈ A − B_{n_k}^{(k)}. By the relation
\[
A - B_{n_k}^{(k)} = \bigcap_{i=n_k}^{\infty} \Big\{\omega \in A : |f_i(\omega) - f(\omega)| < \frac{1}{k}\Big\},
\]
this yields that |f_i(ω) − f(ω)| < 1/k for all i ≥ n_k. Hence for all ε′ > 0 there exists some k such that 1/k < ε′, and then for all i ≥ n_k
\[
|f_i(\omega) - f(\omega)| < \frac{1}{k} < \varepsilon'
\]
for every ω ∈ A − B simultaneously. Therefore f_n converges to f uniformly on A − B.
THEOREM 2.4.3 If µ is a σ-finite measure on a measurable space (Ω, F), then F does not contain an uncountable collection of disjoint sets of positive measure.
Proof.
(a) The Probability Step. Suppose that µ is a probability measure, and let {B_θ : θ ∈ Θ} ⊂ F be a collection of mutually disjoint sets with µ(B_θ) > 0 for every θ ∈ Θ. Write
\[
\Theta = \bigcup_{k=1}^{\infty} \Big\{\theta : \mu(B_\theta) > \frac{1}{k}\Big\} := \bigcup_{k=1}^{\infty} \Theta_k.
\]
Now we claim that each Θ_k is finite. In fact, if Θ_k were infinite then for all n we could pick θ_1, · · · , θ_n ∈ Θ_k such that
\[
1 \ge \mu\Big(\sum_{i=1}^{n} B_{\theta_i}\Big) = \sum_{i=1}^{n} \mu(B_{\theta_i}) > \frac{n}{k},
\]
and letting n → ∞ yields a contradiction. Therefore Θ_k is finite for all k, so Θ = ⋃_{k=1}^∞ Θ_k is countable.
(b) The Scaling Step for Finite Measure. If µ : F → R is a nonzero finite measure, then P(B) = µ(B)/µ(Ω) is a probability measure. Therefore, by the previous step, the index set of the disjoint collection {B_θ} with P(B_θ) = µ(B_θ)/µ(Ω) > 0 is countable. But this is exactly the same index set as
\[
\Theta = \{\theta : \mu(B_\theta) > 0\}.
\]
Therefore Θ is countable.
(c) The Scissors and Paste Step for σ-Finite Measure. Assume that µ is σ-finite. Then there exists a sequence of disjoint sets {A_k}_{k=1}^∞ ⊂ F with µ(A_k) < ∞ such that Ω = ∑_{k=1}^∞ A_k, and µ|_{A_k} is a finite measure on (A_k, F ∩ A_k) for each k. Consider for each k the index set
\[
\Theta_k := \{\theta : \mu(B_\theta \cap A_k) > 0\}.
\]
By the previous step each Θ_k is countable. Now consider the relation
\[
\mu(B_\theta) = \mu\Big(\sum_{k=1}^{\infty} A_k \cap B_\theta\Big) = \sum_{k=1}^{\infty} \mu(A_k \cap B_\theta).
\]
Therefore θ ∈ Θ if and only if µ(B_θ) > 0, if and only if µ(A_k ∩ B_θ) > 0 for some k, if and only if θ ∈ Θ_k for some k, if and only if θ ∈ ⋃_{k=1}^∞ Θ_k. Namely Θ = ⋃_{k=1}^∞ Θ_k, and this shows that Θ is countable.
THEOREM 2.4.4 (Uniqueness of Extension) Let F be a field and let µ, µ′ be measures on σ(F). Suppose that µ = µ′ on F and both of them are σ-finite on F. Then µ = µ′ on σ(F).
Proof.
(a) The Probability Step. This step has been done in the extension theorem in the chapter on probability measure theory.
(b) The Scaling Step for Finite Measure. If µ = µ′ on F and µ(Ω) = µ′(Ω) = 1/C ∈ (0, ∞), then Cµ and Cµ′ are probability measures. Using (a), the extensions of Cµ and Cµ′ from F to σ(F), say Q and Q′, are equal on σ(F). But Cµ on σ(F) is itself an extension of Cµ on F, and likewise for Cµ′. Therefore Cµ = Q = Q′ = Cµ′ on σ(F) by the uniqueness of the extension of a probability measure, and hence µ = µ′ on σ(F).
(c) The Scissors and Paste Step for σ-Finite Measure. Let Ω = ∑_{n=1}^∞ A_n where each A_n ∈ F and µ, µ′ are finite on (A_n, F ∩ A_n) (note that this need not be a measurable space, since F ∩ A_n is only a field). Then for all B ∈ σ(F) write
\[
\mu(B) = \mu\Big(\sum_{n=1}^{\infty} B \cap A_n\Big) = \sum_{n=1}^{\infty} \mu(B \cap A_n) = \sum_{n=1}^{\infty} \mu'(B \cap A_n) = \mu'\Big(\sum_{n=1}^{\infty} B \cap A_n\Big) = \mu'(B),
\]
where the equality µ(B ∩ A_n) = µ′(B ∩ A_n) holds because B ∩ A_n ∈ σ(F ∩ A_n) = σ(F) ∩ A_n, µ, µ′ are finite on F ∩ A_n, and the previous step yields µ = µ′ on σ(F ∩ A_n). Therefore µ = µ′ on σ(F).
2.5  Transformation of Probability Measure
DEFINITION 2.5.1 Suppose that (Ω, F, µ) is a measure space and X : (Ω, F) → (Ψ, B) is measurable. Then the measure on (Ψ, B) given by
\[
\mu X^{-1}(B) := \mu(X^{-1}(B)), \qquad B \in B,
\]
is called the measure induced from µ by the measurable mapping X.
It is easy to see that µX^{-1} inherits from µ each of the following properties: being a measure, a finite measure, a probability measure. Note that σ-finiteness is not inherited; here is an example.
EXAMPLE 2.5.1 Let Ω = N, F be the total σ-field and µ be the counting measure. Let the target space be (Ψ, B) = ({0}, {∅, {0}}) and X(ω) = 0 for all ω ∈ Ω. Then we see that
\[
\mu X^{-1}(\Psi) = \mu(\Omega) = \infty,
\]
and any countable partition of Ψ can only consist of copies of ∅ together with {0}, with µX^{-1}({0}) = ∞. Therefore µX^{-1} is not σ-finite.
DEFINITION 2.5.2 (Distribution and Distribution Function) Let (Ω, F, P ) be a probability space and X be a random variable. The probability measure induced from P by the
random variable X, which is given by
P X −1 (B) = P {X ∈ B} for all B ∈ R
is called the distribution of the random variable X. Then the distribution P X −1 is a probability
measure on (R, R) and we can define the distribution function of P X −1 . The distribution
function of the random variable X is the distribution function of the distribution of X and can
be given by FX (x) = P {X ∈ (−∞, x]}.
It is natural to consider the following proposition:
PROPOSITION 2.5.1 To each random variable X : (Ω, F, P ) → (R, R) there exists a
probability measure on the space (R, R), namely, its distribution. Conversely, given a probability
measure P on (R, R) there exists a probability space (Ω, F, Q) and a random variable X such
that P = QX −1 .
Proof. The first part is trivial as the definition of distribution. For the second part, take
(Ω, F, P ) = (R, R, Q) and X(ω) = ω. Then we see that
QX −1 (B) = Q{X ∈ B} = Q{ω : X(ω) ∈ B} = Q(B) = P (B)
And the probability space (R, R, P ) together with the random variable X satisfies the desired
result.
2.6  Independence for Random Variables
2.6.1  Independence and Tail σ-Fields
DEFINITION 2.6.1 (Tail σ-Fields and Tail Random Variables) Let {X_n}_{n=1}^∞ be a sequence of random variables on some probability space (Ω, F, P). Put
\[
T := \bigcap_{n=1}^{\infty} \sigma\langle X_p : p > n\rangle.
\]
Then T is called the tail σ-field associated with the X_n's; members of T are called tail events, and T-measurable random variables are called tail random variables.
EXAMPLE 2.6.1 Set S_n = ∑_{i=1}^n X_i where {X_n}_{n=1}^∞ is a sequence of random variables on some probability space. Then
1. lim sup_n S_n is not a tail random variable in general: different values of X_1 produce different values of lim sup_n S_n.
2. lim sup_n S_n/n is a tail random variable: in fact lim sup_n S_n/n = lim sup_p (S_{n+p} − S_n)/(n + p) is σ⟨X_p : p > n⟩-measurable for every n. Therefore it is T-measurable and is a tail random variable.
3. ⋃_{n=1}^∞ X_n^{-1}({0}) is not a tail event in general: the value of X_1 matters.
4. lim sup_n X_n^{-1}({0}) is a tail event: for every n, lim sup_m X_m^{-1}({0}) = lim sup_{p>n} X_p^{-1}({0}), which lies in σ⟨X_p : p > n⟩.
Now we consider the tail σ-field associated with a sequence of events:
PROPOSITION 2.6.1 If {A_n}_{n=1}^∞ is a sequence of events, then
\[
\bigcap_{n=1}^{\infty} \sigma\langle A_p : p > n\rangle = \bigcap_{n=1}^{\infty} \sigma\langle I_{A_p} : p > n\rangle.
\]
Proof. We know that σ⟨I_{A_p}⟩ = {∅, A_p, A_p^c, Ω} = σ({A_p}). Therefore
\[
\sigma\langle I_{A_p} : p > n\rangle = \sigma\Big(\bigcup_{p>n} \sigma(\{A_p\})\Big) = \sigma\Big(\bigcup_{p>n} \{A_p\}\Big) = \sigma\langle A_p : p > n\rangle,
\]
and intersecting over all n yields the desired equality.
Now we define the independence of a collection of random variables.
DEFINITION 2.6.2 Let {X_i : i ∈ I} be a collection of random variables. Then the X_i's are independent if the σ-fields generated by them, namely the classes {σ⟨X_i⟩ : i ∈ I}, are independent.
2.6.2  Kolmogorov's Zero-One Law for Random Variables
THEOREM 2.6.1 (Kolmogorov's Zero-One Law) Let {X_n}_{n=1}^∞ be a sequence of independent random variables. Then all tail events for this sequence have probability either 0 or 1, and all tail random variables are almost surely constant.
Proof.
(I) First we show that the tail events have probability either 0 or 1. Since X_n, n = 1, 2, · · · , is a sequence of independent random variables, σ⟨X_n⟩, n ∈ N, is a sequence of independent σ-fields by definition. Therefore by the splitting theorem the classes
\[
\sigma\langle X_1\rangle, \sigma\langle X_2\rangle, \cdots, \sigma\langle X_n\rangle, \ \sigma\Big(\bigcup_{p\ge n+1} \sigma\langle X_p\rangle\Big)
\]
are independent for all n, and σ(⋃_{p≥n+1} σ⟨X_p⟩) = σ⟨X_p : p ≥ n + 1⟩. Therefore
\[
\sigma\langle X_1\rangle, \sigma\langle X_2\rangle, \cdots, \sigma\langle X_n\rangle, \ \sigma\langle X_p : p \ge n + 1\rangle
\]
are independent for all n. Using the subclasses theorem, since T ⊂ σ⟨X_p : p ≥ n + 1⟩,
\[
\sigma\langle X_1\rangle, \sigma\langle X_2\rangle, \cdots, \sigma\langle X_n\rangle, \ T
\]
are independent for all n. Therefore by the definition of independence among classes,
\[
T, \sigma\langle X_1\rangle, \sigma\langle X_2\rangle, \cdots, \sigma\langle X_n\rangle, \cdots
\]
are independent. Again using the splitting theorem,
\[
T \quad\text{and}\quad \sigma\Big(\bigcup_{n\ge 1} \sigma\langle X_n\rangle\Big) = \sigma\langle X_n : n \ge 1\rangle
\]
are independent. By the subclasses theorem once more, since T = ⋂_{n=1}^∞ σ⟨X_p : p > n⟩ ⊂ σ⟨X_n : n ≥ 1⟩, we obtain that T and T are independent. Hence every tail event A satisfies P(A) = P(A ∩ A) = P(A)^2, so P(A) is either 0 or 1.
(II) Then we show that each tail random variable is almost surely constant. Assume that X is a tail random variable. Then for all c ∈ R the event {X ∈ (−∞, c]} is a tail event, so P{X ≤ c} = 0 or 1. But F_X(c) = P{X ≤ c} is a distribution function and must be nondecreasing and normalized. Therefore, with c_0 := inf{c : F_X(c) = 1}, we have X = c_0 almost surely.
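As a hedged numerical illustration of the zero-one law (our own simulation, not from the notes): for independent fair coin flips, lim sup_n S_n/n is a tail random variable and hence almost surely constant; by the strong law of large numbers that constant is 1/2, and independent simulation runs of S_n/n all cluster around it.

```python
# Several independent runs of S_n / n for fair 0/1 coin flips at large n;
# each run lands near the single almost-sure constant 1/2.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
for run in range(5):
    x = rng.integers(0, 2, size=n)   # X_1, ..., X_n
    print(run, x.mean())             # S_n / n
```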
Chapter 3
Integration
3.1  Construction of Integration
3.1.1  Integration for Nonnegative Simple Functions
DEFINITION 3.1.1 (Integration for Nonnegative Simple Functions) Suppose that X is a nonnegative simple function on a measure space (Ω, F, µ). Write X = ∑_{i=1}^n c_i I_{A_i} where the A_i ∈ F are disjoint and Ω = ∑_{i=1}^n A_i. The integration of X is defined as
\[
\int X\,d\mu := \sum_{i=1}^{n} c_i \mu(A_i).
\]
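For a finite measure space the definition can be evaluated directly. The sketch below uses our own helper names (not from the notes) and computes ∫X dµ = ∑_i c_i µ(A_i) for two nonnegative simple functions on a three-point space.

```python
# Integral of a nonnegative simple function X = sum_i c_i I_{A_i}.
def integral_simple(values, partition, mu):
    """values[i] is c_i, partition[i] is the set A_i, mu maps points to masses."""
    return sum(c * sum(mu[w] for w in A) for c, A in zip(values, partition))

mu = {"a": 0.5, "b": 1.5, "c": 2.0}                # a measure on Omega = {a, b, c}
X_vals, X_parts = [3.0, 1.0], [{"a", "b"}, {"c"}]   # X = 3 on {a,b}, 1 on {c}
Y_vals, Y_parts = [2.0, 5.0], [{"a"}, {"b", "c"}]   # Y = 2 on {a}, 5 on {b,c}

print(integral_simple(X_vals, X_parts, mu))   # 3*(0.5+1.5) + 1*2.0 = 8.0
print(integral_simple(Y_vals, Y_parts, mu))   # 2*0.5 + 5*(1.5+2.0) = 18.5
```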
The integration of simple functions has the following properties
PROPOSITION 3.1.1 Suppose S_+ is the collection of nonnegative simple functions on (Ω, F, µ). Consider the integration on S_+. Then
(0) The integration is well-defined.
(1) The integration is nonnegative.
(2) The integration is linear in the sense that ∫cX dµ = c∫X dµ and ∫(X + Y)dµ = ∫X dµ + ∫Y dµ, where we require that c is finite and nonnegative.
(3) The integration is monotone: if X ≤ Y then ∫X dµ ≤ ∫Y dµ.
Proof.
(0) Suppose X = ∑_j c_j I_{A_j} = ∑_i b_i I_{B_i} where Ω = ∑_j A_j = ∑_i B_i. Then
\[
\sum_j c_j \mu(A_j) = \sum_j c_j \sum_i \mu(A_j \cap B_i) = \sum_{i,j:\,A_j\cap B_i \ne \emptyset} c_j \mu(A_j \cap B_i) = \sum_{i,j:\,A_j\cap B_i \ne \emptyset} b_i \mu(A_j \cap B_i) = \sum_i b_i \sum_j \mu(A_j \cap B_i) = \sum_i b_i \mu(B_i),
\]
because c_j = b_i whenever A_j ∩ B_i ≠ ∅ (both equal the common value of X there). Therefore the integration for simple functions is well-defined.
(1) Since X = ∑_j c_j I_{A_j} where c_j ≥ 0, it follows that ∫X dµ = ∑_j c_j µ(A_j) ≥ 0.
(2) If c is finite and nonnegative, then X = ∑_j c_j I_{A_j} implies cX = ∑_j c c_j I_{A_j}, and therefore
\[
\int cX\,d\mu = \sum_j c c_j \mu(A_j) = c \sum_j c_j \mu(A_j) = c \int X\,d\mu.
\]
For the additivity part, if X = ∑_i x_i I_{A_i} and Y = ∑_j y_j I_{B_j}, then X = ∑_{i,j} x_i I_{A_i∩B_j} and Y = ∑_{i,j} y_j I_{A_i∩B_j}. Therefore
\[
\int (X + Y)\,d\mu = \sum_{i,j} (x_i + y_j)\mu(A_i \cap B_j) = \sum_{i,j} x_i \mu(A_i \cap B_j) + \sum_{i,j} y_j \mu(A_i \cap B_j) = \sum_i x_i \mu(A_i) + \sum_j y_j \mu(B_j) = \int X\,d\mu + \int Y\,d\mu.
\]
(3) If X = ∑_i x_i I_{A_i} and Y = ∑_j y_j I_{B_j}, rearrange the collection {A_i ∩ B_j : A_i ∩ B_j ≠ ∅} into (E_k)_{k=1}^m, and for all ω ∈ E_k denote X(ω) = u_k, Y(ω) = v_k; then X = ∑_k u_k I_{E_k} and Y = ∑_k v_k I_{E_k}. Since X ≤ Y it follows that u_k ≤ v_k for all k, and this shows that
\[
\int X\,d\mu = \sum_k u_k \mu(E_k) \le \sum_k v_k \mu(E_k) = \int Y\,d\mu.
\]
3.1.2  Integration for Nonnegative Measurable Functions
DEFINITION 3.1.2 Let X be a nonnegative measurable function. According to the structure theorem there exists a sequence of increasing simple functions {X_n}_{n=1}^∞ such that X_n ↑ X as n → ∞. Define the integration of X to be
\[
\int X\,d\mu := \lim_{n\to\infty} \int X_n\,d\mu.
\]
Note that the limit exists in [0, ∞] because of the monotonicity of the integration over increasing simple functions, and the following proposition shows that the integration is well-defined.
PROPOSITION 3.1.2 Suppose G_+ is the collection of nonnegative measurable functions on (Ω, F, µ). Consider the integration on G_+. Then
(0) The integration is well-defined.
(1) The integration is nonnegative.
(2) The integration is linear in the sense that ∫cX dµ = c∫X dµ and ∫(X + Y)dµ = ∫X dµ + ∫Y dµ, where we require that c is nonnegative.
(3) The integration is monotone: if X ≤ Y then ∫X dµ ≤ ∫Y dµ.
Proof.
(0) Suppose that X_n ↑ X and Y_n ↑ X as n → ∞ with X_n, Y_n simple functions. It suffices to show that lim_{n→∞} ∫X_n dµ ≥ lim_{m→∞} ∫Y_m dµ; equivalently, that lim_{n→∞} ∫X_n dµ ≥ ∫Y_m dµ for every fixed m. Write Y_m = ∑_j y_j I_{A_j}. For ε ∈ (0, 1) and for each j consider the sets
\[
A_{jn} := \{\omega \in A_j : X_n(\omega) \ge (1 - \varepsilon)y_j\}.
\]
Then A_{jn} ⊂ A_{j,n+1} and ⋃_{n=1}^∞ A_{jn} ⊂ A_j. On the other hand, if ω ∈ A_j then X_n(ω) ↑ X(ω) ≥ Y_m(ω) = y_j, and therefore for sufficiently large n we have X_n(ω) ≥ (1 − ε)y_j; namely ω ∈ ⋃_{n=1}^∞ A_{jn}. This shows that ⋃_{n=1}^∞ A_{jn} = A_j, and by the monotone sequential continuity from below µ(A_{jn}) → µ(A_j) as n → ∞. Therefore
\[
\int X_n\,d\mu \ge \int \Big(\sum_j (1 - \varepsilon)y_j I_{A_{jn}} + 0 \cdot I_{A_j - A_{jn}}\Big)d\mu = (1 - \varepsilon)\sum_j y_j \mu(A_{jn}).
\]
Now letting n → ∞ yields
\[
\lim_{n\to\infty} \int X_n\,d\mu \ge (1 - \varepsilon)\sum_j y_j \lim_{n\to\infty}\mu(A_{jn}) = (1 - \varepsilon)\sum_j y_j \mu(A_j).
\]
Since ε can be arbitrarily small, we conclude that lim_{n→∞} ∫X_n dµ ≥ ∑_j y_j µ(A_j) = ∫Y_m dµ.
(1) Nonnegativity is obvious since ∫X dµ is the limit of a sequence of increasing nonnegative real numbers.
(2) Suppose c is finite and nonnegative. There is a sequence of simple functions X_n ↑ X as n → ∞, and then cX_n ↑ cX. This shows that
\[
\int cX\,d\mu = \lim_{n\to\infty}\int cX_n\,d\mu = c\lim_{n\to\infty}\int X_n\,d\mu = c\int X\,d\mu.
\]
If X, Y are nonnegative measurable functions, then there exist sequences of simple functions X_n ↑ X, Y_n ↑ Y as n → ∞ with ∫X_n dµ → ∫X dµ and ∫Y_n dµ → ∫Y dµ. But X_n + Y_n ↑ X + Y as n → ∞, therefore
\[
\int (X + Y)\,d\mu = \lim_{n\to\infty}\int (X_n + Y_n)\,d\mu = \lim_{n\to\infty}\Big(\int X_n\,d\mu + \int Y_n\,d\mu\Big) = \int X\,d\mu + \int Y\,d\mu.
\]
(3) Suppose X_n ↑ X and Y_n ↑ Y as n → ∞ with X_n, Y_n simple functions. Define Z_n = max(X_n, Y_n). Then lim_{n→∞} Z_n = max(X, Y) = Y, because the max function is continuous and X ≤ Y, and the Z_n are simple and increasing. Therefore
\[
\int X\,d\mu = \lim_{n\to\infty}\int X_n\,d\mu \le \lim_{n\to\infty}\int Z_n\,d\mu = \int Y\,d\mu.
\]
Moreover we have the following properties
PROPOSITION 3.1.3 Suppose G_+ is the collection of nonnegative measurable functions on (Ω, F, µ). Consider the integration on G_+. Then
(1) ∫X dµ < ∞ implies that µ{X = ∞} = 0.
(2) ∫X dµ = 0 if and only if µ{X > 0} = 0.
(3) µ{X ≠ Y} = 0 implies that ∫X dµ = ∫Y dµ.
Proof.
(1) Assume that µ({X = ∞}) > 0. Define the simple functions X_n = nI_{\{X=∞\}}; then X_n ↑ ∞ · I_{\{X=∞\}} ≤ X as n → ∞. Therefore by the definition of the integration for nonnegative functions
\[
\int X\,d\mu \ge \int \infty\cdot I_{\{X=\infty\}}\,d\mu = \lim_{n\to\infty}\int nI_{\{X=\infty\}}\,d\mu = \mu(\{X=\infty\})\lim_{n\to\infty} n = \infty,
\]
contradicting ∫X dµ < ∞. Therefore µ({X = ∞}) = 0.
(2) If ∫X dµ = 0 we show that µ({X > 0}) = 0. Assume µ({X > 0}) > 0. Since {X > 1/k} ↑ {X > 0} as k → ∞, we have µ({X > 1/k}) ↑ µ({X > 0}) > 0, so there exists some k with µ({X > 1/k}) > 0. Hence by monotonicity
\[
\int X\,d\mu \ge \int \frac{1}{k} I_{\{X>1/k\}}\,d\mu = \frac{1}{k}\mu(\{X > 1/k\}) > 0,
\]
contradicting ∫X dµ = 0. Therefore µ({X > 0}) = 0.
Conversely, if µ({X > 0}) = 0, then
\[
\int X\,d\mu = \int (XI_{\{X>0\}} + XI_{\{X=0\}})\,d\mu = \int XI_{\{X>0\}}\,d\mu \le \int \infty\cdot I_{\{X>0\}}\,d\mu = \infty\cdot\mu(\{X>0\}) = 0.
\]
Together with the nonnegativity of ∫X dµ this gives ∫X dµ = 0.
(3) Suppose µ({X ≠ Y}) = 0. Then µ({|X − Y| > 0}) = 0, and by the previous result ∫|X − Y| dµ = 0. Therefore
\[
\int X\,d\mu \le \int (Y + |X - Y|)\,d\mu = \int Y\,d\mu, \qquad \int Y\,d\mu \le \int (X + |Y - X|)\,d\mu = \int X\,d\mu,
\]
and it follows that ∫X dµ = ∫Y dµ.
Now we are going to introduce one of the most fundamental theorems concerning integration to the limit.
THEOREM 3.1.1 (Levy's Monotone Convergence Theorem) Suppose (X_n)_{n=1}^∞ is a sequence of nonnegative measurable functions with X_n ↑ X as n → ∞. Then ∫X_n dµ ↑ ∫X dµ as n → ∞.
Proof. First we observe that X_n ≤ X for all n yields lim_{n→∞} ∫X_n dµ ≤ ∫X dµ; note that the limit exists because the sequence (∫X_n dµ)_{n=1}^∞ increases, by the monotonicity of the integration. Therefore it suffices to show that ∫X dµ ≤ lim_{n→∞} ∫X_n dµ.
For each n define a sequence of simple random variables (X_{n,m})_{m=1}^∞ that increases to X_n as m → ∞:
\[
X_{n,m}(\omega) =
\begin{cases}
m, & X_n(\omega) \ge m, \\
\dfrac{k-1}{2^m}, & X_n(\omega) \in \Big[\dfrac{k-1}{2^m}, \dfrac{k}{2^m}\Big),\ k = 1, 2, \cdots, m2^m.
\end{cases}
\]
Now we claim that X_{n,n} → X as n → ∞. First, by definition X_{n,n} ≤ X_n, therefore lim sup_n X_{n,n} ≤ lim sup_n X_n = X. Conversely, if X_n(ω) ≥ n then X_{n,n}(ω) = n; otherwise X_n(ω) ∈ [(k − 1)/2^n, k/2^n) for some k, which implies X_{n,n}(ω) = (k − 1)/2^n > X_n(ω) − 1/2^n. Therefore in both cases
\[
X_{n,n}(\omega) \ge \min\Big(n,\ X_n(\omega) - \frac{1}{2^n}\Big).
\]
Taking the limit inferior on both sides and using the fact that min is continuous yields
\[
\liminf_{n\to\infty} X_{n,n} \ge \liminf_{n\to\infty} \min\Big(n,\ X_n(\omega) - \frac{1}{2^n}\Big) = \min\Big(\liminf_n n,\ \liminf_n \Big(X_n(\omega) - \frac{1}{2^n}\Big)\Big) = \min(\infty, X(\omega)) = X(\omega).
\]
Therefore X_{n,n} → X as n → ∞.
Define X̃_n = max_{1≤k≤n} X_{k,n}. Then
\[
\tilde{X}_n = \max_{1\le k\le n} X_{k,n} \le \max_{1\le k\le n} X_{k,n+1} \le \max_{1\le k\le n+1} X_{k,n+1} = \tilde{X}_{n+1},
\]
because X_{k,m} ↑ X_k as m → ∞, and
\[
\tilde{X}_n = \max_{1\le k\le n} X_{k,n} \le \max_{1\le k\le n} X_k = X_n.
\]
We also know that X_{n,n} ≤ max_{1≤k≤n} X_{k,n} = X̃_n ≤ X_n. Using the previous claim and letting n → ∞ yields X̃_n ↑ X as n → ∞. But the X̃_n are simple since each X_{n,m} is simple, so by the definition of the integration and monotonicity we obtain
\[
\int X\,d\mu = \lim_{n\to\infty}\int \tilde{X}_n\,d\mu \le \lim_{n\to\infty}\int X_n\,d\mu.
\]
Therefore we complete the proof.
Xn dµ. Therefore we complete the proof.
Integration for General Measurable Functions
DEFINITION 3.1.3 Consider the classes of functions
Z
Z
Q+ := X : X+ dµ < ∞
Q− := X : X− dµ < ∞
We say that X is quasi-integrable in the positive or negative sense if X ∈ Q+ or X ∈ Q−
respectively. We say that X is quasi-integrable if X ∈ Q+ ∪ Q− and X is integrable if X ∈
Q+ ∩ Q− . We say X ∈ L if X is integrable. If X is quasi-integrable then the integral of X is
defined to be
Z
Z
Xdµ =
+
X dµ −
Z
X − dµ
There are some more general properties concerning general measurable functions:
PROPOSITION 3.1.4 Consider the class of functions Q := Q_+ ∪ Q_− and the integration over functions in Q.
(1) ∫X dµ ∈ [−∞, ∞]. Moreover, ∫X dµ > −∞ implies µ{X = −∞} = 0 and ∫X dµ < ∞ implies µ{X = +∞} = 0. If X = Y a.e. µ then X ∈ Q_± if and only if Y ∈ Q_±, in which case ∫X dµ = ∫Y dµ.
(2) The integration is linear in the following sense:
(a) ∫cX dµ = c∫X dµ provided that c is finite, in which case cX is also quasi-integrable.
(b) ∫(X + Y)dµ = ∫X dµ + ∫Y dµ provided that X + Y is defined and X, Y ∈ Q_±, in which case X + Y ∈ Q_±.
Proof.
R
R
(1) It is trivial that Xdµ takes values on the extended real line. If Xdµ > −∞ then we know
R
that X − dµ < ∞. Then by our previous result µ({X − = ∞}) = µ({X = −∞}). Similarly if
R
R
Xdµ < ∞ then X + dµ < ∞, implying that µ({X = ∞}) = µ({X + = ∞}) = 0.
If X = Y a.e.µ, then we know that µ({X 6= Y }) = 0. Therefore µ({X + 6= Y + }) = µ({X − 6=
Y − }) = 0. By our previous result we know that
Z
Z
Z
Z
+
+
−
X dµ = Y dµ,
X dµ = Y − dµ
Therefore we know that X ∈ Q± if and only if Y ∈ Q± and in which case
Z
Z
Z
Z
Z
Z
+
−
+
−
Xdµ = X dµ − X dµ = Y dµ − Y dµ = Y dµ
R
R
(2a) Suppose c is finite, then if c > 0 we know that (cX)± dµ = c X ± dµ and therefore cX is
R
R
quasi-integrable; And if c < 0 we know that (cX)± dµ = −c X ∓ dµ and therefore cX is also
quasi-integrable. Then we can see that
Z
Z
Z
Z
Z
Z
+
−
+
−
cXdµ = (cX) dµ − (cX) dµ = c X dµ − c X dµ = c Xdµ, c > 0
Z
Z
Z
Z
Z
Z
+
−
−
+
cXdµ = (cX) dµ − (cX) dµ = −c X dµ + c X dµ = c Xdµ, c < 0
(2b) First we show that X + Y ∈ Q± provided that X, Y ∈ Q± . Now consider the maximum
inequality
max(X + Y, 0) 6 max(X, 0) + max(Y, 0)
Then we obtain that (X + Y )± 6 X ± + Y ± . Therefore if X, Y ∈ Q± then by the monotonicity
of integration for nonnegative random variables X + Y ∈ Q± .
R
R
R
Now we show that (X + Y )dµ = Xdµ + Y dµ. Since
X + Y = (X + Y )+ − (X + Y )− = X + + Y + − X − − Y −
Since adding a nonnegative number on a equality where both sides are nonnegative will not
change the equality, we have
(X + Y )+ + X − + Y − = (X + Y )− + X + + Y +
Using the linearity of integration on nonnegative random variables yields
Z
Z
Z
Z
Z
Z
+
−
−
−
+
(X + Y ) dµ + X dµ + Y dµ = (X + Y ) dµ + X dµ + Y + dµ
R
R
R
If X, Y, X + Y ∈ Q+ then subtracting (X + Y )+ dµ, X + dµ + Y + dµ yields that
Z
Z
Z
Z
Z
Z
+
−
+
−
+
(∗) − (X + Y ) dµ + (X + Y ) dµ = − X dµ + X dµ − Y dµ + Y − dµ
R
R
R
implying that (X + Y )dµ = Xdµ + Y dµ. Similarly if X, Y, X + Y ∈ Q− we have the same
result by multiplying (−1) on both sides of (*).
EXAMPLE 3.1.1 Suppose Ω is countable, say Ω = N, and let F be its total σ-field. Let µ({m}) = µ_m ∈ [0, ∞] for each m ∈ Ω. Then every function X : Ω → R is a random variable. Denote a function X : Ω → R by (x_m)_{m∈Ω}.
(0) Integration for indicators. Suppose that A ∈ F is a subset of Ω. Then the integral is
\[
\int I_A\,d\mu = \mu(A) = \sum_{m\in A} \mu_m.
\]
(1) Integration for simple random variables. Suppose that X is a simple random variable, namely X = ∑_{j=1}^n c_j I_{A_j} where ∑_{j=1}^n A_j = Ω, and assume the c_j are distinct. Then the integral is
\[
\int X\,d\mu = \sum_{j=1}^{n} c_j \mu(A_j) = \sum_{j=1}^{n} c_j \sum_{m\in A_j} \mu_m = \sum_{j=1}^{n} c_j \sum_{m\in\Omega} \mu_m I_{(m\in A_j)} = \sum_{m\in\Omega} \mu_m \sum_{j=1}^{n} c_j I_{(m\in A_j)}.
\]
By the definition of X we know that x_m = X(m) = ∑_{j=1}^n c_j I_{A_j}(m) = ∑_{j=1}^n c_j I_{(m∈A_j)}. Therefore we obtain
\[
\int X\,d\mu = \sum_{m\in\Omega} \mu_m \sum_{j=1}^{n} c_j I_{(m\in A_j)} = \sum_{m\in\Omega} \mu_m x_m.
\]
(2) Integration for nonnegative random variables. Suppose that X is a nonnegative random variable on (Ω, F). Write X(m) = x_m for all m ∈ Ω. Consider the sequence of simple random variables X_n = XI_{\{1,2,\cdots,n\}}; namely, X_n(m) = x_m when m = 1, 2, · · · , n and X_n(m) = 0 when m > n. The X_n are indeed simple, and X_n ↑ X as n → ∞. Then by the monotone convergence theorem we have
\[
\int X\,d\mu = \lim_{n\to\infty}\int X_n\,d\mu = \lim_{n\to\infty}\sum_{m=1}^{n} x_m\mu_m = \sum_{m=1}^{\infty} x_m\mu_m.
\]
(3) Integration for general random variables. Suppose X is a random variable on (Ω, F). Then ∫X^+ dµ = ∑_{m=1}^∞ x_m^+ µ_m and ∫X^− dµ = ∑_{m=1}^∞ x_m^− µ_m.
If X ∈ Q_± then ∑_{m=1}^∞ x_m^± µ_m converges; then, whether or not the other series ∑_{m=1}^∞ x_m^∓ µ_m converges, the value ∫X dµ = ∑_m x_m^+ µ_m − ∑_m x_m^− µ_m is unambiguous and does not depend on the order of summation.
If X ∈ L then both ∑_{m=1}^∞ x_m^+ µ_m and ∑_{m=1}^∞ x_m^− µ_m converge, so the series is absolutely convergent: ∑_{m=1}^∞ |x_m|µ_m < ∞.
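The positive/negative-part decomposition in (3) can be written out directly for the countable case. The sketch below is our own (hypothetical helper names); it integrates a sign-alternating sequence against the masses µ_m = 2^{−m}, where the answer should be ∑_{m≥1} (−1)^m m 2^{−m} = −2/9.

```python
# Integrate a general X on Omega = N against masses mu_m by splitting into
# positive and negative parts; X is integrable exactly when sum |x_m| mu_m < inf.
def integrate_discrete(x, mu):
    pos = sum(xm * m for xm, m in zip(x, mu) if xm > 0)   # int X+ dmu
    neg = sum(-xm * m for xm, m in zip(x, mu) if xm < 0)  # int X- dmu
    if pos == float("inf") and neg == float("inf"):
        raise ValueError("X is not quasi-integrable")
    return pos - neg

mu = [2.0**(-m) for m in range(1, 40)]
x = [(-1.0)**m * m for m in range(1, 40)]    # alternating signs
print(integrate_discrete(x, mu))             # approximately -2/9
```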
3.2  Change of Variable
In this section we consider the relationship between integration on (Ω, σ⟨X⟩, µ) and on the target space (Ψ, B, µX^{-1}) for some mapping X : Ω → Ψ. We know that µX^{-1} is defined on the target space (Ψ, B), so (Ψ, B, µX^{-1}) is a measure space.
THEOREM 3.2.1 Let X : Ω → (Ψ, B) be some mapping and consider the σ-field σ⟨X⟩. Suppose (Ω, σ⟨X⟩) is equipped with some measure µ and (Ψ, B) with µX^{-1}. Let G : (Ψ, B) → (R, R) be measurable and Y = G ◦ X. Then G ∈ Q_± with respect to the measure µX^{-1} if and only if Y = G ◦ X ∈ Q_± with respect to µ, in which case
\[
\int G\,d(\mu X^{-1}) = \int Y\,d\mu.
\]
Proof.
0 Indicator case. Suppose G is an indicator IB for some B ∈ B. Then Y = IX −1 (B) . This
R
indicates that both IB and IX −1 (B) ∈ Q− . We know that GdµX −1 = µX −1 (B) < ∞. But
R
this is exactly µX −1 (B) = µ(X −1 (B)) < ∞. Therefore we know that GdµX −1 = µX −1 (B) =
R
µ(X −1 (B)) = Y dµ and this shows that IB ∈ Q+ if and only if IX −1 (B) = Y ∈ Q+ .
(1) Nonnegative simple random variable case. Suppose that G is a simple random variable.
P
P
Write G = j cj IBj where we assume that cj ’s are distinct and j Bj = Ω, Bj ∈ B. Then we
104
P
know that Y = G ◦ X = j cj IX −1 (Bj ) and by linearity
Z
Z X
X Z
X
−1
−1
cj IBj dµX =
cj IBj µX −1 =
cj µ(X −1 (Bj ))
GdµX =
j
=
j
Z X
j
Z
cj IX −1 (Bj ) dµ =
Y dµ
j
Therefore G ∈ Q± if and only if Y ∈ Q± because the integrals are the same.
(2) Nonnegative random variable case. Suppose G is a nonnegative random variable. Then
there exists a sequence of increasing nonnegative simple random variables (Gn )∞
n=1 such that
Gn ↑ G as n → ∞. This means that Gn ◦ X ↑ G ◦ X = Y as n → ∞. Applying monotone
convergence theorem yields
Z
GdµX
−1
Z
= lim
n→∞
Gn dµX −1
Z
= lim
Gn ◦ Xdµ
n→∞
Z
= G ◦ Xdµ
Z
= Y dµ
(3) General random variable case. Now if G is a random variable, then Y ± = G± ◦ X.
Therefore we know that
Z
Z
+
Z
−
Z
G+ dµX −1
Z
G− dµX −1
G ◦ Xdµ =
Y dµ =
Z
+
Y dµ =
−
G ◦ Xdµ =
by our previous result. Therefore
Z
Z
Z
Z
Z
Z
+
−
+
−1
−
−1
Y dµ = Y dµ − Y dµ = G dµX − G dµX = GdµX −1
3.3
Integration to the Limit
Throughout this section we always define random variables on some measure space(Ω, F, µ).
THEOREM 3.3.1 (Fatou’s Lemma) If (Xn )∞
n=1 is a sequence of random variables, then
Z
Z
lim inf Xn dµ 6 lim inf Xn dµ
n→∞
n→∞
105
Proof. Let Yn = inf k>n Xk and Y = lim inf n→∞ Xn . Then we know that Yn ↑ Y as n → ∞.
R
R
R
By the monotone convergence theorem we know that Yn dµ → Y dµ = lim inf n→∞ Xn dµ
as n → ∞. But by the definition of Yn we also know that Yn > Xn for all n. Therefore the
R
R
monotonicity yields Yn dµ > Xn dµ for all n. By taking limit inferior on both sides we obtain
that
Z
lim inf
n→∞
Z
Xn dµ > lim inf
n→∞
Z
Yn dµ = lim
n→∞
Z
Yn dµ =
lim inf Xn dµ
n→∞
∞
∞
THEOREM 3.3.2 (Pratt’s Theorem) Suppose (Xn )∞
n=1 , (Ln )n=1 , (Un )n=1 , L, U are random
variables with:
(1) Ln → L, Xn → X, Un → U as n → ∞;
(2) Ln 6 Xn 6 Un for all n;
R
R
R
R
(3) Ln , Un , L, U are integrable and Ln dµ → Ldµ and Un dµ → U dµ as n → ∞.
R
R
Then Xn , X are integrable and Xn dµ → Xdµ as n → ∞.
Proof. It suffices to show that:
(I) Xn and X are integrable. Since Ln 6 Xn 6 Un for all n, then we know that Xn+ 6 Un+
and Xn− 6 L−
n for all n. But according to the condition of Pratt’s Theorem, Ln , Un are
R
R
integrable for all n. Therefore Un+ dµ < ∞ and L−
n dµ < ∞. By the monotonicity of
integration for nonnegative random variables, we obtain that
Z
Z
Z
Z
+
+
−
Xn dµ 6 Un dµ < ∞,
Xn dµ 6 L−
n dµ < ∞
Therefore Xn is integrable for all n.
Similarly we know that L 6 X 6 U by letting n → ∞ in the inequality Ln 6 Xn 6 Un
since Ln → L, Xn → X, Un → U as n → ∞. Then X + 6 U + and X − 6 L− yields that
Z
Z
Z
Z
+
+
−
X dµ 6 U dµ < ∞,
X dµ 6 L− dµ < ∞
since L, U are integrable. Therefore X is integrable.
R
R
(II) Xdµ > lim supn→∞ Xn dµ. Consider the nonnegative sequence (Un −Xn )∞
n=1 . Applying
the Fatou’s lemma yields
Z
Z
Z
Z
Z
U dµ − Xdµ = (U − X)dµ = lim inf (Un − Xn )dµ 6 lim inf (Un − Xn )dµ
n→∞
106
n→∞
By the property of limit inferior we know that
Z
Z
Z
Z
Z
lim inf (Un − Xn )dµ = U dµ + lim inf − Xn dµ = U dµ − lim sup Xn dµ
n→∞
n→∞
Since X is integrable, namely,
R
n→∞
Xdµ is finite, then we obtain that
Z
Z
Xdµ > lim sup Xn dµ
n→∞
(III)
R
Xdµ 6 lim inf n→∞
R
Xn dµ. Consider the nonnegative sequence (Xn − Ln )∞
n=1 . Applying
Fatou’s lemma yields
Z
Z
Z
Z
Z
Xdµ − Ldµ = (X − L)dµ = lim inf (Xn − Ln )dµ 6 lim inf (Xn − Ln )dµ
n→∞
n→∞
Similarly we have
Z
Z
Z
Z
Z
Xdµ − Ldµ 6 lim inf (Xn − Ln )dµ = lim inf Xn dµ − Ldµ
n→∞
Since L is integrable, namely,
R
n→∞
Ldµ is finite, then we obtain that
Z
Z
Xdµ 6 lim inf Xn dµ
n→∞
(IV)
R
Xn dµ →
R
Xdµ. By (II) and (III) we know that
Z
Z
Z
Z
Xdµ 6 lim inf Xn dµ 6 lim sup Xn dµ 6 Xdµ
n→∞
Therefore
R
Xdµ = limn→∞
R
n→∞
Xn dµ.
THEOREM 3.3.3 (Dominated Convergence Theorem) If (Xn )∞
n=1 is a sequence of random variables with supn |Xn | being integrable and Xn → X as n → ∞, then X is integrable
R
R
and Xn dµ → Xdµ as n → ∞.
Proof. By the inequality −|Xn | 6 Xn 6 |Xn | we know that
− sup |Xk | = inf (−|Xk |) 6 |Xn | 6 sup |Xk |
k
k
k
for all n. Then define Ln = − supk |Xk |, Un = supk |Xk | for all n. By the condition of Dominated
Convergence Theorem we know that both Ln , Un are integrable. But supk |Xk | is irrelevant of n,
therefore Ln → L, Un → U as n → ∞ where L = − supk |Xk |, U = supk |Xk |. This shows that
U, L are also integrable. Namely we know:
(1) Ln 6 Xn 6 Un for all n;
107
(2) Ln → L, Un → U, Xn → X as n → ∞;
R
R
R
R
(3) Ln dµ = Ldµ, Un dµ = U dµ.
Therefore applying the Pratt’s theorem yields that X is integrable and
R
Xn dµ →
THEOREM 3.3.4 (First Borel-Cantelli Lemma for Random Variables) If
P
∞, then ∞
n=1 Xn converges almost everywhere and is integrable with
!
Z X
∞
∞ Z
X
Xn dµ =
Xn dµ
n=1
Proof. First we show that
P∞
n=1
m by linearity we have
Z X
m
R
Xdµ.
P∞ R
n=1
|Xn |dµ <
n=1
Xn converges almost everywhere and is integrable. For all finite
!
|Xn | dµ =
n=1
m Z
X
|Xn |dµ 6
n=1
∞ Z
X
|Xn |dµ < ∞
n=1
P∞
P
∞
is
an
increasing
sequence
with
the
limit
being
|X
|)
Since the sequence ( m
n
m=1
n=1 |Xn |,
n=1
then by Monotone Convergence Theorem(MCT) we can let m → ∞ and obtain
!
!
Z X
Z X
∞ Z
∞
m
X
|Xn |dµ < ∞
|Xn | dµ = lim
|Xn | dµ =
m→∞
n=1
n=1
n=1
P
This shows that the random variable n=1 |Xn | is integrable and thus µ( ∞
n=1 |Xn | = ∞) = 0.
P∞
Namely n=1 Xn absolutely converges almost everywhere.
P∞
P
Since we have limm→∞ m
n=1 Xn and a dominating function of this sequence of
n=1 Xn =
P∞
random variables can be found:
m
m
∞
X
X
X
Xn 6
|Xn | 6
|Xn |
n=1
n=1
n=1
where
P∞
n=1
|Xn | is integrable by our previous result. Then applying Dominated Convergence
Theorem yields
∞ Z
X
n=1
Xn dµ = lim
m→∞
m Z
X
n=1
Xn dµ = lim
m→∞
Z X
m
n=1
Z
Xn dµ =
∞
X
!
Xn dµ
n=1
EXAMPLE 3.3.1 Take Ω = N, F = 2Ω and µ to be the counting measure. Consider a
∞
sequence of nonnegative sequences ((xnm )∞
m=1 )n=1 with xnm ↑ xm as n → ∞ for all fixed m.
Then any sequence is measurable. Now we assume that (xnm )∞
m=1 is integrable, namely,
Z
∞
X
xn dµ =
xnm < ∞
m=1
108
Applying the monotone convergence theorem yields that
lim
n→∞
∞
X
m=1
xnm =
∞
X
m=1
∞
X
lim xmn =
n→∞
xm
m=1
EXAMPLE 3.3.2 Now consider a sequence of measures (µm )∞
m=1 defined on a measurable
P∞
space (Ω, F, µ). Consider the set function µ(·) := m=1 µ(·). Then it is easy to check that µ is
nonnegative, takes value 0 when the argument is ∅, and satisfies the countable additivity since
nonnegative convergent double series can interchange the order of summation. Therefore µ is a
measure on (Ω, F). Now consider the integration with respect to µ. We claim that
Z
∞ Z
X
Xdµ =
Xdµm
m=1
We prove it by:
(0) Indicator case. Suppose X is an indicator IA , A ∈ F, then by definition
Z
∞
∞ Z
X
X
Xdµ = µ(A) =
µm (A) =
Xdµm
m=1
m=1
(1) Nonnegative simple random variable case. Suppose X is a simple random variable,
P
P
then write X = j cj IAj , Ω j Aj , Aj ∈ F. Then by definition
Z
X
Xdµ =
cj µ(Aj )
j
=
=
=
X
j
∞
X
cj
∞
X
µm (Aj )
m=1
X
m=1 j
∞ Z
X
cj µm (Aj )
Xdµm
m=1
(2) Nonnegatvie random variable case. Suppose X is a nonnegative random variable, then
the structure theorem yields a sequence of nonnegative simple random variables (Xn )∞
n=1
with Xn ↑ X. Then
Z
Z
Xdµ = lim
Xn dµ
∞ Z
X
= lim
Xn dµm
n→∞
n→∞
Because
R
Xn dµm ↑
R
m=1
Xdµm for every fixed m as n → ∞ due to the monotone conver-
gence theorem, then applying the result in our previous example yields that the sum can
109
interchange with the limit, namely
Z
Z
∞ Z
∞
∞ Z
X
X
X
Xdµ = lim
Xn dµm =
lim
Xn dµm =
Xdµm
n→∞
m=1
m=1
n→∞
m=1
(3) General random variable case. Suppose X is quasi-integrable. Namely either
R
∞ or X − dµ < ∞. Then we know that
Z
Z
∞ Z
∞ Z
X
X
+
+
−
X dµ =
X dµm ,
X dµ =
X − dµm
m=1
R
X + dµ <
m=1
One of the two series converges, and since both are of positive terms, then the sum of the
series is the series of the sums. Namely we know that
Z
Z
Z
+
Xdµ = X dµ − X − dµ
∞ Z
∞ Z
X
X
+
X − dµm
X dµm −
=
=
=
m=1
∞ Z
X
m=1
∞ Z
X
m=1
(X + − X − )dµm
Xdµm
m=1
3.4
Indefinite Integrals and Densities
DEFINITION 3.4.1 (Indefinite Integrals) Let X be a quasi-integrable random variable
on some measure space (Ω, F, µ). The set function ρ, defined on F by
Z
Z
Xdµ := XIA dµ
ρX (A) :=
A
is called the indefinite integral of X with respect to µ.
PROPOSITION 3.4.1 (Properties of Indefinite Integrals) Suppose that ρX is the indefinite integral of X with respect to some measure µ on a measurable space (Ω, F, µ). Then
(1) Decomposition. ρX = ρX + − ρX − is the difference of two nonnegative indefinite integrals(namely, ρX + , ρX − ), at least one of which is bounded.
(2) σ-smoothness.
P
P
(2a) σ-additivity. If (An )∞
n=1 are disjoint sets in F, then ρ(
n An ) =
n ρ(An ).
(2b) Monotone Sequential Continuity from Below.
ρ(An ) → ρ(A) as n → ∞.
110
If An ↑ A as n → ∞, then
(2c) Monotone Sequential Continuity from Above. If An ↓ A as n → ∞ with ρ(AN )
being finite for some N , then ρ(An ) → ρ(A).
(3) Absolute Continuity.
(3a) µ(A) = 0 implies that ρ(A) = 0.
(3b) X ∈ L implies that limη→0 sup{|ρX (A)| : A ∈ F, µ(A) 6 η} = 0.
Proof. We may assume that X ∈ Q− without loss of generality because if X ∈ Q+ then −X ∈ Q−
R
and the results can be applied to −X. Then we know that X − dP < ∞.
R
R
(1) Decomposition. By the definition of integration we have ρX (A) XIA dP = (XIA )+ dP −
R
R
R
(XIA )− dP = (X + IA )dP − (X − IA )dP = ρX + (A) − ρX − (A).
(2b) Monotone Sequential Continuity from Below. Since we have An ↑ A then X + IAn ↑
X + IA and X − IAn ↑ X − IA as n → ∞. By Monotone Convergence Theorem(MCT) we have
ρX + (An ) ↑ ρX + (A) and ρX − (An ) ↑ ρX − (A). By monotonicity we have ρX − (An ) 6
R
rhoX − (A) 6 X − dµ < ∞. Then we can write
ρX (An ) = ρX + (An ) − ρX − (An ) → ρX + (A) − ρX − (A) = ρX (A),
n→∞
(2c) Monotone Sequential Continuity from Above. An ↓ A implies that AN −An ↑ AN −A
for some N with ρ(AN ) being finite and n > N . Then by 2b we know that ρ(AN − An ) →
ρ(AN − A) as n → ∞. Note that ρ(AN ) is finite, implying that ρX + (An ) 6 ρX + (AN ) <
∞, ρX − (An ) 6 ρX − (AN ) < ∞ for all n > N and ρX + (A) 6 ρX + (AN ) < ∞, ρX − (A) 6
ρX − (AN ) < ∞. This means that ρ(A), ρ(An ) are finite for all n > N . Similarly we have
ρ(AN − A), ρ(AN − An ) are finite for all n > N . Since we have
ρ(AN ) = ρ(A) + ρ(AN − A) = ρ(An ) + ρ(AN − An ) for all n ∈ N
Then we obtain that ρ(An ) → ρ(A) as n → ∞.
P
P∞
(2a) σ-additivity. Now we know that m
n=1 An ↑
n=1 An , then by 2b we know that
!
!
m
m
∞
X
X
X
ρ(An ) = ρ
An → ρ
An
n=1
n=1
n=1
P
Therefore we obtain that n=1 ρ(An ) = ρ( ∞
n=1 An ).
R
(3a) This can be seen by the fact that ρ(A) = XIA dµ = 0 where X can be any random
because IPm
=
n=1 An
P∞
Pm
n=1 IAn .
variables provided that µ(A) = 0.
111
(3b) For all c ∈ R+ we can write
Z
Z
Z
|ρ(A)| 6 |X|IA dµ = |X|IA∩{|X|6c} dµ + |X|IA∩{|X|>c} dµ
Z
Z
6 cIA dµ + |X|I{|X|>c} dµ
Z
= cµ(A) + |X|I{|X|>c} dµ
Therefore for all c > 0 we obtain that
Z
lim sup |ρ(A)| 6 lim cη +
η→0 µ(A)6η
Z
|X|I{|X|>c} dµ =
η→0
|X|I{|X|>c} dµ
R
|X|{|X|>c} dµ → 0 as c → ∞, which completes the proof because
R
limη→0 supµ(A)6η |ρ(A)| is irrelevant of c. Since we have X being integrable, namely, |X|dµ <
Now we show that
∞, then we see that |X| dominates |X|I|X|>c and therefore we can apply Dominated Convergence Theorem(DCT) and obtain
Z
Z
Z
lim
|X|I{|X|>c} dµ =
lim |X|I{|X|>c} dµ = |X|I{|X|=∞} dµ = 0
c→∞
c→∞
because |X| is integrable and this yields that µ({|X| = ∞}) = 0.
DEFINITION 3.4.2 Suppose (Ω, F) is a measurable space with µ, ν being to measures. Then
we say that ν is absolutely continuous with respect to µ, if µ(A) = 0 implies ν(A) = 0.
DEFINITION 3.4.3 (Density) In a measure space (Ω, F, µ), suppose D is a nonnegative
random variable. Then the indefinite integral of D with respect to µ, denoted by ρX (·), is also
a measure on (Ω, F). In this setting we say that D is the density of ρ with respect to µ.
The following theorem gives out a sufficient condition for two densities to be identical almost
everywhere.
THEOREM 3.4.1 Suppose ρ1 (A) =
R
A
X1 dµ and ρ2 (A) =
R
A
X2 dµ, where X1 , X2 are quasi-
integrable random variables. Then ρ1 (A) = ρ2 (A) for all A ∈ F implies that X1 = X2 almost
everywhere with respect to µ, provided that at least one of ρ1 , ρ2 and µ is σ-finite.
Proof.
(I)At least one of X1 , X2 is integrable. Then by the fact that ρ1 (A) 6 ρ2 (A) we know
that both X1 and X2 are quasi-integrable by taking A = Ω. Therefore for any A ∈ F we have
(X2 − X1 )IA be quasi-integrable. Then
Z
(X2 − X1 )IA dP = ρ2 (A) − ρ1 (A) = 0
112
Taking A = {X2 < X1 } and A = {X2 > X2 } respectively yields
Z
Z
− (X2 − X1 )I{X2 >X1 } dP =
(X1 − X2 )I{X2 −X1 >0} dP = 0
Z
Z
(X1 − X2 )I{X2 <X1 } dP =
(X1 − X2 )I{X1 −X2 >0} dP = 0
And this shows that µ({(X2 − X1 )I{X2 −X1 >0} > 0) = µ({(X1 − X2 )I{X1 −X1 >0} > 0) = 0, namely,
µ({X1 > X2 }) = µ({X2 > X1 }) = 0. Hence we obtain that X1 = X2 , a.e.µ.
(II)One of ρ1 , ρ2 is σ-finite. Without loss of generality we may assume that ρ1 is σ-finite.
P
Then we see that there exists a partition of Ω = ∞
n=1 An such that ρ1 (An ) < ∞. This means
that X1 IAn is integrable for all n. Therefore by (I) we know that µ({X1 IAn 6= X2 IAn }) = 0 for
P
all n. Since Xi = ∞
n=1 Xi IAn for i = 1, 2, it follows that X1 = X2 , a.e.µ.
P
(III)µ is σ-finite. Suppose we have a partition of Ω = ∞
j=1 Aj with µ(Aj ) < ∞. Now for each
S
j consider Ajn = {|X1 | 6 n} for all n ∈ N. Then we see that {|X| = ∞} ∪ ∞
n=1 Ajn . For all
n and j we have that X1 IAjn = X2 IAjn , a.e.µ by (I) because X1 IAjn is bounded by n and takes
positive values on Ajn which is of finite measure, and thus is integrable. Then we see that
X1 I
P
j
Ajn
=
∞
X
j=1
X1 IAjn =
∞
X
X2 IAjn = X2 IPj Ajn , a.e.µ
j=1
for all n. Since Ajn ↑ Aj ∩ {|X|1 < ∞}, it follows that IPj Ajn ↑ I{|X|< ∞} and we have
X1 I|X1 |<∞ = lim X1 IPj Ajn = lim X2 IPj Ajn = X2 I|X1 |<∞ , a.e.µ
n→∞
n→∞
Similarly by switching the role of X1 , X2 we obtain that
X1 I|X2 |<∞ = X2 I|X2 |<∞ , a.e.µ
If we denote D = {|X1 | < ∞} ∪ {|X2 | < ∞} then we obtain that X1 ID = X2 ID , a.e.µ. Now
denote B1 = {X1 = ∞, X2 = −∞}, B2 = {X1 = −∞, X2 = ∞}, then it suffices to show that
µ(B1 ) = µ(B2 ) = 0. This can be seen by way of contradiction because if µ(B1 ) > 0 then
Z
ρ1 (B1 ) = X1 IB1 dµ = (∞) · µ(B1 ) = ∞
Z
ρ2 (B1 ) = X2 IB1 dµ = −(∞) · µ(B1 ) = −∞
contradicting ρ1 (B1 ) = ρ2 (B1 ). Similarly we know that µ(B2 ) = 0. Now we have shown that
X1 |D∪B1 ∪B2 = X2 |D∪B1 ∪B2 , a.e.µ. But X1 = X2 on the remaining set and therefore we show that
X1 = X2 , a.e.µ.
For two densities to be identical almost everywhere, it suffices to check certain generators:
113
PROPOSITION 3.4.2 (Enough Generator is Enough)
(1) If ρ1 (A) 6 ρ2 (A) for all A in a field F0 generating F, and if ρ1 , ρ2 are σ-finite over F0 , then
X1 6 X2 a.e.µ.
(2) If ρ1 (A) = ρ2 (A) for all A in a π-system P generating F, and if ρ1 , ρ2 are σ-finite over P,
then X1 = X2 a.e.µ.
Proof.
(1)
(2) Denote ρ1± (A) = ρX1± (A) and ρ2± (A) = ρX2± (A). Then we see that ρ1± and ρ2± are σ-finite
measures on P where P is a π-system. Now the condition ρ1 (A) = ρ2 (A) implies that
ρX1+ (A) − ρX1− (A) = ρX2+ (A) − ρX2− (A)
Then we have that
ρX1+ +X2− (A) = ρX1+ (A) + ρX2− (A) = ρX2+ (A) + ρX1− (A) = ρX2+ +X1− (A)
This means that as measures ρX1+ +X2− and ρX2+ +X1− are σ-finite because ρi are σ-finite for i = 1, 2.
Now by the uniqueness of extension of σ-finite measures we now that
ρX1+ +X2− (A) = ρX1+ (A) + ρX2− (A) = ρX2+ (A) + ρX1− (A) = ρX2+ +X1− (A)
for all A ∈ F. Hence by the previous theorem we know that X1+ + X2− = X2+ + X1− , a.e.µ. Now
consider the set E := {X1+ + X2− = X2+ + X1− } and we know that µ(E c ) = 0.
• If µ({X1+ = X2− = ∞}) > 0. Since X1 , X2 are quasi-integrable, it follows that
R
∞, X2+ dµ < ∞ and µ(X1− = ∞) = µ(X2+ = ∞) = 0. This yields that
R
X1− dµ <
R
X2− dµ <
µ({X1+ = X2− = ∞, X1− < ∞, X2+ < ∞}) > 0
contradicting with the fact that µ(E c ) = 0.
• If µ({X2+ = X1− = ∞}) > 0. Since X1 , X2 are quasi-integrable, it follows that
R
∞, X1+ dµ < ∞ and µ(X2− = ∞) = µ(X1+ = ∞) = 0. This yields that
µ({X2+ = X1− = ∞, X2− < ∞, X1+ < ∞}) > 0
contradicting with the fact that µ(E c ) = 0.
Therefore we have shown that µ({X1+ = X2− = ∞}) = µ({X2+ = X1− = ∞}) = 0. Namely except
a set of measure 0 we can subtract X2− , X1− from both sides of the equality X1+ + X2− = X2+ + X1−
and obtain that
X1 = X1+ − X1− = X2+ − X2− = X2 , a.e.µ
114
Now we consider the case where D is a density of ρ with respect to µ. Then ρ is also a
measure on the underlying measurable space. The integration with respect to ρ can be derived
through integration with respect to µ.
THEOREM 3.4.2 (Slap in the Density) Suppose ρ(·) =
R
·
Ddµ where D is the density
of ρ. Then X is quasi-integrable with respect to ρ if and only if XD is quasi-integrable with
respect to µ, in which case the integrations have relation
Z
Z
Xdρ = X(ω)D(ω)dµ
Proof. The proof of the theorem goes through the cases from simple random variables to general
random variables.
• X is a simple random variable. Suppose X =
Z
Xdρ =
n
X
cj ρ(Aj ) =
j=1
n
X
Pn
j=1 cj IAj .
Z
cj
DIAj dµ =
Z X
n
j=1
Then
Z
cj IAj Ddµ =
XDdµ
j=1
• X is a nonnegative random variable.
Suppose we have (Xn )∞
n=1 is a sequence of
increasing simple random variables with Xn → X, n → ∞. Then by definition
Z
Z
Z
Xdρ = lim
Xn dρ = lim
Xn Ddµ
n→∞
n→∞
Since Xn ↑ X as n → ∞, it follows that Xn D ↑ XD as n → ∞ and we can apply the
Monotone Convergence Theorem(MCT) and obtain
Z
Z
Z
Z
Xdρ = lim
Xn Ddµ =
lim Xn Ddµ = XDdµ
n→∞
n→∞
• X is a general random variable. We see that
Z
Z
Z
±
±
X dρ = X Ddµ = (XD)± dµ
Therefore
Z
Z
Xdρ =
+
X dµ −
Z
−
Z
X dµ =
+
(XD) dµ −
Z
−
(XD) dµ =
Z
XDdµ
The following theorem describes the case when a sequence of probability measures converge
in a strong sense.
115
THEOREM 3.4.3 (Scheffés Theorem) Suppose (Ω, F, µ) is a measurable space and (Pn )∞
n=1
is a sequence of probability measures on (Ω, F) with densities Dn with respect to µ. If Dn →
D, a.e.µ, then
lim sup{|Pn (A) − P (A)| : A ∈ F} = 0
n→∞
One says Pn → P strongly in this case.
Proof. For all A ∈ F consider the following inequality
Z
Z
Z
Z
|Pn (A) − P (A)| = Dn IA dµ − DIA dµ 6 |Dn IA − DIA |dµ 6 |Dn − D|dµ
Now we know that
• 0 6 |Dn − D| 6 Dn + D,
• |Dn − D| → 0 and Dn + D → 2D as n → ∞,
R
R
R
• (Dn + D)dµ = Dn dµ + Ddµ = 2 for all n,
R
then by Pratt’s Theorem we know that |Dn − D|dµ → 0 as n → ∞. Therefore
Z
0 6 lim sup{|Pn (A) − P (A)| : A ∈ F} 6 lim
|Dn − D|dµ = 0
n→∞
n→∞
EXAMPLE 3.4.1 Consider the measure space (R, R, λ) where the σ-field is the Borel σ-field
and the underlying measure is Lebesgue measure. Consider the densities Dn (ω) =
1
ω n−1 (1−
B(n,n)
ω)n−1 I(0,1) (ω), ω ∈ Ω for all n ∈ N. Then we can see that Pn (A) = ρDn (A) are probability
measures because Pn (Ω) = 1 for all n. Set X(ω) = ωI(0,1] to be the identical random variable
over (0, 1). Then Calculation of expectation and variance yields
Z
Z
1
En X = XdPn =
ωDn (ω)dλ =
2
(0,1]
1
Varn (X) = EX 2 − (EX)2 =
4(2n + 1)
in which case the “slap in the density” rule is used and we use subscript n to denote the
corresponding probability measure. Now we can find the density of the random variable (X −
p
En X)/ Varn (X) with respect to Pn . Consider the sequence of densities (δn )∞
n=1 where δn (ω) =
p
p
Dn (En X + Varn (X)ω) · Varn (X). Then we see that
p
p
δn (ω) = Dn (En X + Varn (X)ω) · Varn (X)
n−1 n−1
1
1
1
ω
1
ω
1
ω
√
=
+ √
− √
+ √
I(0,1)
B(n, n) 2 2n + 1 2 2 2n + 1
2 2 2n + 1
2 2 2n + 1
n−1
(2n − 1)!
1
1
ω2
√
=
−
I(−√2n+1,√2n+1) (ω)
(n − 1)!(n − 1)! 2 2n + 1 4 4(2n + 1)
116
Recall the famous Stirling’s formula:
√
lim
n→∞
2πn
n!
n n
e
=1
Then we can see that
p
2n−1 2n−2
2π(2n − 1)
2n − 1
e
1
p
√
δn (ω) =(1 + o(1)) p
e
n−1
2 2n + 1
2π(n − 1) 2π(n − 1)
n−1
ω2
1
1
−
I(−√2n+1,√2n+1) (ω)
22n−2
2n + 1
2n−1 n−1
1 1 2n − 1
ω2
=(1 + o(1)) √
1−
I(−√2n+1,√2n+1) (ω)
2n + 1
2π e 2n − 2
2n−1 n−1
1 1
1
ω 2 /2
I(−√2n+1,√2n+1) (ω)
=(1 + o(1)) √
1+
1−
2n − 2
n + 1/2
2π e
1
ω2
→ √ exp −
, n→∞
2
2π
Therefore if we denote the density D =
√1
2π
exp(−ω 2 /2) then we see that ρDn → ρD strongly by
Scheffé’s Theorem.
3.5
Modes of Convergence and Lp spaces
Throughout this section our underlying measure space will be a probability space (Ω, F, P ) and
we will introduce some useful modes of convergence. And two random variables are essentially
equal if they are identical with probability one. In addition, the notation for expectation EX is
R
exactly the same definition of XdP provided that random variable X is quasi-integrable with
respect to the given probability measure. We will use both notation from now on.
DEFINITION 3.5.1 (Almost Sure Convergence) Let (Xn )∞
n=1 be a sequence of random
variables and X be a random variable. We say that Xn converge to X almost surely, denoted
by Xn → X, a.s., if P {Xn → X} = 1.
THEOREM 3.5.1 Suppose (Xn )∞
n=1 and X are random variables. Then Xn → X, a.s. if and
only if for all ε > 0 we have P {|Xn − X| > ε, i.o. } = 0.
DEFINITION 3.5.2 (Convergence in Probability) Let (Xn )∞
n=1 be a sequence of random
variables and X be a random variable. Then we say that Xn converge to X in probability,
P
denoted by Xn → X, if for all ε > 0 we have P {|Xn − X| > ε} → 0 as n → ∞.
PROPOSITION 3.5.1 Let (Xn )∞
n=1 be a sequence of random variables and X be a random
P
P
variable. If Xn → X and Xn → Y , then X = Y, a.s..
117
PROPOSITION 3.5.2 Let (Xn )∞
n=1 be a sequence of random variables and X be a random
P
variable. If Xn → X, a.s., then Xn → X.
Conversely we have the following famous theorem:
THEOREM 3.5.2 (Riesz Theorem) Let (Xn )∞
n=1 be a sequence of random variables and
P
X be a random variable. If Xn → X then there exists a subsequence of (Xn )∞
n=1 converging to
X almost surely.
THEOREM 3.5.3 Let (Xn )∞
n=1 be a sequence of random variables and X be a random variP
∞
able. Then Xn → X if and only if for any subsequence (Xnk )∞
k=1 of (Xn )n=1 there exists a further
subsequence of (Xnk )∞
k=1 converging to X almost surely.
DEFINITION 3.5.3 Suppose that S is a set and (xn )∞
n=1 is a sequence in S. We say that
xn → x, n → ∞ in a given mode is metrizable, if there exists a metric d on the set S such that
xn → x in that mode if and only if d(xn , x) → 0.
THEOREM 3.5.4 In a metric space (S, d) we have xn → x, n → ∞ if and only if every
sequence (nk )∞
k=1 have a further subsequence converging to x.
Hence we conclude that:
PROPOSITION 3.5.3 Almost sure convergence is not metrizable.
DEFINITION 3.5.4 (Lp -Norm and Lp Space) Suppose p ∈ (0, ∞) and X is a random
variable. Define the Lp -norm of X by
kXkp := (E|X|p )1/p
And the Lp space denoted by Lp := Lp (P ) is defined to be the collection random variables with
kXkp < ∞.
DEFINITION 3.5.5 (Essential Supremum) Given a random variable X > 0 with the
distribution function FX (x), define its essential supremum
ess sup X := inf{x ∈ R : FX (x) = 1}
If the set on the right is empty, we set ess sup X = +∞; otherwise we in fact have ess sup X =
min{x : FX (x) = 1} by the properties of FX .
DEFINITION 3.5.6 (L∞ -Norm and L∞ Space) Given a random variable X, define the
L∞ -norm of X to be ess sup |X|. Define L∞ space L∞ := L∞ (P ) to be the collection of random
variables with finite essential supremum.
The following lemma is useful for constructing Lp (P ) as a linear space when p ∈ (0, 1).
118
LEMMA 3.5.1 (Chung’s Exercise 3.12) Suppose p > 0 and X, Y ∈ Lp . Then E(|X +
Y |p ) 6 2p (E|X|p + E|Y |p ). If p > 1 then the factor 2p may be replaced by 2p−1 . If 0 6 p 6 1, it
may be replaced by 1.
LEMMA 3.5.2 (Hölder’s Inequality) Let 1 < p < ∞ with 1/p + 1/q = 1 and X, Y be
random variables. Then
|EXY | 6 kXkp kY kq
And the equality holds if and only if |X|p and |Y |q are linearly dependent on L1 .
THEOREM 3.5.5 (Minkowski’s Inequality) Suppose X, Y are random variable and 1 6
p < ∞. Then
kX + Y kp 6 kXkp + kY kq
THEOREM 3.5.6 If 0 < p 6 ∞, then Lp is a vector space.
THEOREM 3.5.7 If 1 6 p 6 ∞, then Lp is a normed linear space, provided we identify
essentially equal random variables. With this definition, Lp is thus a metric space with metric
ρp (X, Y ) = kX − Y kp .
THEOREM 3.5.8 (Jensen’s Inequality) Suppose ϕ is a convex function and X, ϕ(X) are
both integrable. Then
E[ϕ(X)] > ϕ(E[X])
The following example, consisting of 7 parts, clearly specifies many properties in Lp spaces,
especially when p < 1.
EXAMPLE 3.5.1 (Homework Problem # 7) Assume that random variable X is defined
on some underlying probability space (Ω, F, P ) and kXkp be the Lp -norm.
(a) Prove that kXkp increases with p ∈ (0, +∞].
(b) Under what conditions does it happen that r < s 6 ∞ and kXkr = kXks < +∞?
(c) Prove that the Lp decrease with p ∈ (0, ∞]. Under what conditions do Lr and Ls with
r < s contain exactly the same random variables?
(d) Let S = {p : 0 < p < ∞ and kXkp < ∞}. Show that S is of the form S = (0, p0 ) or
S = (0, p0 ] for some 0 6 p0 6 ∞.
(e) Prove that log(kXkpp ) is convex in p ∈interior(S) and that kXkp is continuous in p ∈ S.
(f ) Show that kXk∞ = limp↑∞ ↑ kXkp .
119
(g) Assume that S 6= ∅ and prove that
lim kXkp = exp(E log |X|)
p↓0
with the understanding exp(−∞) = 0.
Proof.
Claim 1: If A ∈ F is of probability 0, then for all random variable Y we have
Z
Y IA dP = 0
Proof of Claim 1: Consider Yn = nIA , then we see that Yn ↑ (∞)IA as n → ∞. Therefore by
the definition of integration we have
Z
Z
(∞)IA dP = lim
Yn dP = lim n · P (A) = 0
n→∞
and by the linearity of integration we have
n→∞
R
R
(−∞)IA dP = − (∞)IA dP = 0. Now applying
the monotonicity of integration yields that
Z
Z
Z
0 = (−∞)IA dP 6 Y IA dP 6 (∞)IA dP = 0
Therefore we obtain that
R
Y IA dP = 0. End of Proof of Claim 1
Claim 2: Suppose ϕ(x) : I → R is a smooth and strict convex function, namely, ϕ00 (x) > 0
for all x ∈ I ◦ where I is an interval, Y is a integrable random variable taking values on I with
R
Y dP ∈ I and ϕ(Y ) is quasi-integrable. Then
Z
Z
Y dP
ϕ(Y )dP > ϕ
and the equality holds if and only if Y is constant almost surely. This is a special case of Jensen’s
inequality.
Proof of Claim 2: Take x0 =
R
Y dµ. Define function f : I → R by
f (x) = ϕ(x) − ϕ0 (x0 )(x − x0 ) − ϕ(x0 )
Then we know that f 0 (x) = ϕ0 (x) − ϕ0 (x0 ) and f 00 (x) = ϕ00 (x) > 0 because ϕ is strictly convex.
This shows that x0 is the unique minimum point of f (x) over R+ . Therefore f (x) > f (x0 ) = 0
and f (x) > 0 for all x 6= x0 . Namely we have
ϕ(x) > ϕ0 (x0 )(x − x0 ) + ϕ(x0 )
for all x ∈ R. Then we have
ϕ(Y ) > ϕ0 (x0 )(Y − x0 ) + ϕ(x0 )
120
Since Y and ϕ(Y ) are integrable, by the monotonicity and linearity of integration we have
Z
Z
Z
0
0
(∗)
ϕ(Y )dP > ϕ (x0 ) Y dP − ϕ (x0 )x0 + ϕ(x0 ) = ϕ(x0 ) = ϕ
Y dµ
R
R
Now we show that ϕ( Y dP ) = ϕ(Y )dP if and only if Y is constant almost surely. First if Y
is almost surely constant, say, P {Y = c} = 1, then we have
Z
Z
Y dP = (Y I{Y =c} + Y I{Y 6=c} )dP
Z
Z
= Y I{Y =c} dP + Y I{Y 6=c} dP
Z
= Y I{Y =c} dP = cP {Y = c} = c
because P {Y 6= c} = 1 − P {Y = c} = 0 and it follows that
R
Y I{Y 6=c} dP = 0 by claim 1. Now
we know that {Y = c} ⊂ {ϕ(Y ) = ϕ(c)} because if Y = c then ϕ(Y ) = ϕ(c). Therefore we
know that 1 = P {Y = c} 6 P {ϕ(Y ) = ϕ(c)} 6 1. Therefore P {ϕ(Y ) = ϕ(c)} = 1 and this
implies that
Z
Z
ϕ(Y )dP =
Z
=
(ϕ(Y )I{ϕ(Y )=ϕ(c)} + ϕ(Y )I{ϕ(Y )6=ϕ(c)} )dP
Z
ϕ(Y )I{ϕ(Y )=ϕ(c)} dP + ϕ(Y )I{ϕ(Y )6=ϕ(c)} dP
Z
=
ϕ(Y )I{ϕ(Y )=ϕ(c)} dP = ϕ(c)P {ϕ(Y ) = ϕ(c)} = ϕ(c)
because P {ϕ(Y ) 6= ϕ(c)} = 1 − P {ϕ(Y ) = ϕ(c)} = 0 and it follows that
R
ϕ(Y )I{ϕ(Y )6=ϕ(c)} dP =
0 by claim 1. Therefore we have
Z
ϕ
Y dP
Z
= ϕ(c) =
ϕ(Y )dP
R
R
Secondly if we have equality holds, namely, ϕ( Y dP ) = ϕ(Y )dP , then we know that (*) holds
with equality, namely, we have
Z
Z
ϕ(Y )dP = (ϕ0 (x0 )Y − ϕ0 (x0 )x0 + ϕ(x0 ))dP
This shows that
R
f (Y )dP = 0 by linearity. Since f (Y ) is nonnegative, then we know that
P {f (Y ) > 0} = 0 and P {f (Y ) = 0} = 1. Therefore we obtain that f (Y ) = 0, a.e.. But f (x) > 0
for all x 6= x0 , namely, f (Y ) = 0 if and only if Y = x0 . Hence P {f (Y ) = 0} = P {Y = x0 } = 1.
Now we obtain that Y = x0 , a.s.. End of Proof of Claim 2
Claim 3: Suppose ϕ(x) : I → R is a smooth and strict concave function, namely, ϕ00 (x) < 0
for all x ∈ I where I is an interval, Y is a integrable random variable taking values on I with
121
R
Y dP ∈ I and ϕ(Y ) is quasi integrable, then
Z
Z
ϕ(Y )dP 6 ϕ
Y dP
and the equality holds if and only if Y is constant almost surely.
Proof of Claim 3: Since ϕ is smooth and strictly concave, then −ϕ is smooth and strictly
convex. Therefore by claim 2 we have
Z
Z
(−ϕ(Y ))dP > −ϕ
Y dP
Z
Z
implying that
ϕ(Y )dP 6 ϕ
Y dP
End of Proof of Claim 3
Claim 4: Consider the function
ψ(y) =
ay − 1
y
, y ∈ (0, r)
where a > 0. Then ψ(y) decreases as y decreases.
Proof of Claim 4: To see this, we know that
ay (y log a − 1) + 1
ψ (y) =
y2
0
Define
ω(y) = ay (y log a − 1) + 1,
y>0
Then we see that ω 0 (y) = yay (log a)2 > 0 , namely, ω(y) > ω(0) = 0. Hence we have ω(y) > 0
for all y > 0. Since ψ 0 (y) = ω(y)/y 2 then we obtain that ψ 0 (y) > 0 for y ∈ (0, r). Therefore we
obtain that ψ(y) decreases as y decreases. End of Proof of Claim 4
(a) Suppose we have 0 < p < q < ∞, then we show that kXkp 6 kXkq .
• Case 1: kXkq = ∞. We know that every nonnegative extended real number is less or
equal to ∞. Therefore kXkp 6 kXkq = ∞.
R
• Case 2: kXkp = ∞. This shows that kXkp dP = kXkpp = ∞, namely, we have
Z
Z
Z
p
p
p
kXkp = |X| dP = |X| I{|X|p 61} dP + |X|p I{|X|p >1} dP
because nonnegative random variables are always quasi-integrable and the linearity can be
applied. Note that
Z
Z
|X| I{|X|p 61} dP 6 I{|X|p 61} dP = P {|X|p 6 1} 6 1
R
R
by monotonicity. This shows that |X|p I{|X|p 61} dP is finite and |X|p I{|X|p <1} dP = ∞.
06
p
Again by monotonicity we have
Z
Z
Z
p
p q/p
∞ = |X| I{|X|p >1} dP 6 (|X| ) I{|X|p >1} dP 6 |X|q dP = kXkqq
because q/p > 1. Therefore we obtain that kXkp = kXkq = ∞ in this case.
122
• Case 3: kXkp < ∞ and kXkq < ∞. This means that both |X|p and |X|q are integrable.
Take r = q/p which is strictly greater than 1, then we know that ϕ(x) = xr is strictly convex
on [0, +∞). Since |X|p and |X|q = (|X|p )r are integrable, then by Jensen’s inequality we
have
Z
q
|X| dP =
Z
p r
(|X| ) dP >
Z
p
r
|X| dP
And this implies that
Z
1/(rp) Z
r 1/(rp) Z
1/p
q
p
p
kXkq =
|X| dP
>
|X| dP
=
|X| dP
= kXkp
Now we consider the case where q = ∞ and we still have r < ∞. We may assume that
kXk∞ < ∞, else we will have kXkp 6 kXk∞ = ∞ naturally for any p < ∞. By definition we
have kXk∞ = min{x : F|X| (x) = 1}, implying that F|X| (kXk∞ ) = P {|X| 6 kXk∞ } = 1. Denote
E = {|X| 6 kXk∞ }. Then we have P (E) = 1, P (E c ) = P {|X| > kXk∞ } = 0. Therefore
Z
Z
p
|X| dP = (|X|IE + |X|IE c )dP
Z
Z
p
= |X| IE dP + |X|p IE c dP
Z
Z
p
= |X| IE dP 6 kXkp∞ IE dP = kXkp∞ P (E) = kXkp∞
where the third equality holds because of claim 1. Therefore we obtain that kXkp 6 kXk∞ for
all p < ∞.
(b) We show that kXkr = kXks < ∞ where r < s if and only if |X| is almost surely constant.
• Assume that kXk is almost surely constant, say, |X| = C, a.s. for some finite C. Take
E = {|X| = C}, then P (E c ) = 0.
If s < ∞ we have
Z
Z
Z
Z
r
r
r
r
|X| dP = (|X| IE + |X| IE c )dP = |X| IE dP + 0 = |C|r IE dP = C r P (E) = C r
Z
Z
Z
Z
s
s
s
s
|X| dP = (|X| IE + |X| IE c )dP = |X| IE dP + 0 = |C|s IE dP = C s P (E) = C s
where the 0 is obtained by claim 1. Therefore we have kXkr = kXks = C.
If s = ∞ then we still have r < ∞ and kXkr = C. Since P {|X| = C} = 1 then we know
that F|X| (C) = 1 and for all x < C we have F|X| (x) = P {|X| 6 x} 6 P {|X| 6= C} = 0.
Namely F|X| (x) = 0 for all x < C. Therefore by definition kXk∞ = min{x : F|X| (x) = 1} =
C = kXkr .
123
• Assume that kXkr = kXks < ∞ where r < s.
If s < ∞. Then we know that kXksr = kXkss . Write s/r = t and we know that ϕ(x) = xt
is strictly convex on [0, +∞). Note that
Z
s/r Z
t Z
Z
s
r
r
r t
kXkr =
|X| dP
=
|X| dP = (|X| ) dP = |X|s dP = kXkss
This means that the equality in Jensen’s inequality holds for Y = |X|r . Therefore by claim
2 we know that |X|r is constant almost surely, say, P {|X|r = C} = 1. Therefore we obtain
that P {|X| = C 1/r } = 1, namely, |X| = C 1/r , a.s..
If s = ∞. Then we still have r < ∞ and there exists some v < s = ∞, v > r. By (a) we
know that kXkr 6 kXkv 6 kXks and this yields that kXkr = kXkv with r < s. By our
previous result we know that |X| is almost surely constant.
(c) By (a) we know that kXkp 6 kXkq if p < q 6 ∞. Therefore if X ∈ Lq then we know that
kXkq < ∞. Therefore kXkp 6 kXkq < ∞ and this implies that X ∈ Lp . Hence Lq ⊂ Lp if
p < q, namely, Lp decrease with p ∈ (0, ∞].
Now we claim that Lr = Ls where r < s if and only if for any sequence of disjoint measurable
sets (An )∞
n=1 ∈ F there exists some N such that P (An ) = 0 for all n > N .
For the “Only if” part: We prove it by contradiction. Suppose that there exists a sequence of
disjoint measurable sets (An )∞
n=1 such that for all N there exists n > N with P (An ) > 0. Then
∞
we can construct a subsequence (nk )∞
k=1 of (n)n=1 such that P (Ank ) > 0 by the following steps:
• For N = 1 there exists n1 > 1 such that P (An1 ) > 0;
• For N = n1 + 1 there exist n2 > n1 + 1 such that P (An2 ) > 0;
• For N = nk + 1 there exists nk+1 > nk + 1 such that P (Ank+1 ) > 0 for all k ∈ N.
∞
Hence by induction we construct a subsequence (nk )∞
k=1 of (n)n=1 with P (Ank ) > 0 and n1 <
n2 < · · · < nk < nk+1 < · · · , indicating that nk → ∞ as k → ∞. But we know that (Ank )∞
k=1
are disjoint, it follows that
1 = P (Ω) > P
∞
X
!
Ank
k=1
=
∞
X
P (Ank )
k=1
Therefore we have limk→∞ P (Ank ) = 0. Now we can construct a further subsequence (nkl )∞
l=1 of
(nk )∞
k=1 such that 0 < P (Ankl ) <
6 −2
l
π2
by the following steps:
• For C/12 there exists nk1 > 1 such that 0 < P (Ank1 ) < (6/π 2 );
• For C/22 there exist nk2 > nk1 + 1 such that 0 < P (Ank2 ) < (6/π 2 )/22 ;
124
• For C/(l + 1)2 there exists nkl+1 > nkl + 1 such that 0 < P (Ankl+1 ) < (6/π 2 )/(l + 1)2 for all
l ∈ N.
2
2
∞
Hence by induction we construct a subsequence (nkl )∞
l=1 of (nk )k=1 with 0 < P (Ankl ) < (6/π )/l
and nk1 < nk2 < · · · < nkl < nkl+1 < · · · , indicating that nkl → ∞ as l → ∞. For our convenience
denote Bl = Ankl for all l to avoid triple subscript. Then we have (Bl )∞
l=1 being a sequence of
disjoint measurable sets with P (Bl ) ∈ (0, (6/π 2 )/l2 ). Now construct a measurable function
X(ω) =
∞
X
l1/s IBl (ω),
ω∈Ω
l=1
Then we claim that X ∈ Lr but X ∈
/ Ls . In fact we can find a sequence of simple random
Pm 1/s
variables Xm (ω) = l=1 l lBl (ω) that increases to X as m → ∞. It follows that (|Xm |r )∞
m=1
is a sequence of simple random variables that increases to |X|r as m → ∞. Similarly we have
s
that (|Xm |s )∞
m=1 is a sequence of simple random variables that increases to |X| as m → ∞. By
the definition of integration
Z
Z
m
∞
X
6 lr/s X 6 1
r
r
|X| dP = lim
|Xm | dP = lim
=
<∞
2 l2
2 l 2−r/s
m→∞
m→∞
π
π
l=1
l=1
Z
Z
m
∞
s/s
X
X
6 l
6 1
s
s
|Xm | dP = lim
|X| dP = lim
=
=∞
2
2
m→∞
m→∞
π l
π2 l
l=1
l=1
because 2 − r/s > 1. This shows that X ∈ Lr but X ∈
/ Ls and contradicts with Lr = Ls where
r < s.
For the “If” part: Suppose we have X ∈ Lr . Consider the sequence of measurable sets
P
An := {ω : |X(ω)| ∈ [n − 1, n)} for each n ∈ N. Then we know that ∞
n=1 An = Ω where
Ai ∩ Aj = ∅ if i 6= j. Now by our assumption there exists N ∈ N such that P (An ) = 0 for all
n > N . Therefore
P {|X| 6 N } = P
N
X
!
An
=
n=1
N
X
n=1
P (An ) =
∞
X
P (An ) = P (Ω) = 1
n=1
This means that F|X| (N ) = 1 and therefore kXk∞ 6 N < ∞, implying that kXk ∈ L∞ and
hence Lr ⊂ L∞ . But by our previous result L∞ ⊂ Lr . Therefore we obtain that Lr = L∞ ⊂ Ls .
Again our previous result yields Lr ⊃ Ls and hence Lr = Ls .
(d)
• Case 1: S = ∅. This could be the case. Consider the case where Ω = (0, 1], F = B is
the Borel σ-field and P = λ is the Lebesgue measure. Take X(ω) = exp(1/ω). Since for all
p > 0 we have
|X(ω)|p
=∞
ω→0+
1/ω
lim
125
R1
|X(ω)|p dω = ∞ for all p > 0. By the relation between
R
Riemann integral and Lebesgue integral we know that |X(ω)|p dP = ∞, yielding that
then the Riemann integral
0
kXkp = ∞ for all p > 0. Therefore S = ∅ in this case and p0 = 0.
• Case 2: S 6= ∅. First we show that S is an interval. For all p, q ∈ S we have kXkq <
∞, kXkp < ∞. Therefore by (a) for all λ ∈ (0, 1) we have kXk(1−λ)p+λq < ∞ because
(1 − λ)p + λq 6 q. By the definition of S we have (1 − λ)p + λq ∈ S for all λ, namely, S is
convex. Therefore S is an interval.
Secondly we show that inf S = 0. In fact by definition we know that S ⊂ R+ , therefore
inf S > inf R+ = 0. Since S is non-empty, then there exists some q ∈ S such that kXkq <
∞. By (a) we know that kXkp 6 kXkq < ∞ for all p < q. Therefore by definition
(0, q) ⊂ S, yielding that inf(0, q) = 0 > inf S. Hence we obtain that inf S = 0.
Therefore we know that S is an interval with inf S = 0. But 0 ∈
/ S by its definition. Take
p0 = sup S and we have that either S = (0, p0 ) or S = (0, p0 ] where 0 < p0 6 ∞.
(e) We may assume that S is non-empty and also assume that X is not 0, a.s., otherwise kXkp
will be 0 for all p ∈ S. Thus by assumption kXkp 6= 0 for all p ∈ S by (b). For all p, q ∈ S, p 6= q
and for all λ ∈ (0, 1), define s = 1/λ, r = 1/(1 − λ). Then 1/r + 1/s = 1 and by Hölder’s
inequality we have
(1−λ)p+λq
kXk(1−λ)p+λq
Z
=
|X|(1−λ)p+λq dP
Z
|X|(1−λ)p · |X|λq dP
1/s
1/r Z
Z
λqs
(1−λ)pr
|X| dP
|X|
dP
·
6
=
Z
=
1/s
1/r Z
q
·
|X| dP
|X| dP
p
Taking logarithm on both sides yields
(1−λ)p+λq
log(kXk(1−λ)p+λq )
1
6 log
r
Z
p
|X| dP
1
+ log
s
Z
q
|X| dP
= (1 − λ) log(kXkpp ) + λ log(kXkqq )
Now suppose λ = 0 or 1, then we see that
(1−λ)p+λq
λ=0:
log(kXk(1−λ)p+λq ) = log kXkpp = (1 − λ) log kXkp + λ log kXkq
λ=1:
log(kXk(1−λ)p+λq ) = log kXkqq = (1 − λ) log kXkp + λ log kXkq
(1−λ)p+λq
Therefore we know that log(kXkpp ) is a convex function with respect to p ∈ S ⊃ S ◦ .
126
Denote g(p) = log(kXkpp ). Then we know that g is convex and hence is continuous in the
interior of S. Therefore kXkp = exp( p1 g(p)) is continuous for p ∈ S because both exp(·) and
g(·)/(·) are continuous function for p ∈ S ◦ . It suffices to show that log kXkpp is continuous in
p = p0 :
n
For any sequence xn ↓ 0 we consider kXkpp00 −x
−xn . Then we see that
Z
p0 −xn
kXkp0 −xn = |X|p0 −xn dP
Z
Z
p0 −xn
= |X|
I{|X|>1} dP + |X|p0 −xn I{|X|61} dP
• p0 > 1. Then for sufficiently large n we have p0 − xn > 1 and |X|p0 −xn I{|X|>1} increases to
|X|p0 I{|X|>1} as n increases and by Monotone Convergence Theorem we have
Z
Z
p0 −xn
lim
|X|
I{|X|>1} dP = |X|p0 I{|X|>1} dP
n→∞
And by Bounded Convergence Theorem where P (Ω) = 1 < ∞ and |X|p0 −xn I|X|61 6 1 we
have
Z
lim
n→∞
p0 −xn
|X|
Z
I{|X|61} dP =
|X|p0 I{|X|>1} dP
• p0 6 1. It suffices to show that
Z
Z
p0 −xn
I{|X|>1} dP → |X|p0 I{|X|>1} dP
|X|
as n → ∞
because for the second part of the integral we can still apply the Bounded Convergence
Theorem with |X|p0 −xn 6 1. But we have each |X|p0 −xn I{|X|>1} decreases to |X|p0 I{|X|>1} as
n → ∞ and each |X|p0 −xn are nonnegative and integrable for each n because kXkp0 −xn < ∞
since p0 − xn ∈ S. Therefore we can apply the decreasing version of Monotone Convergence
Theorem(On the Slide Page 162) and obtain
Z
Z
p0 −xn
lim
|X|
I{|X|>1} dP = |X|p0 I{|X|>1} dP
n→∞
n
p0
p
To sum up we now have that kXkpp00 −x
−xn → kXkp0 as n → ∞, namely kXkp is continuous at
p = p0 (actually left continuous). Therefore kXkp = exp( p1 log kXkpp ) = exp( p1 g(p)) is continuous
at p = p0 .
(f ) We know that kXkp increases as p increases. Then the limit limp↑∞ kXkp exists. But by
(a) we have that kXkp 6 kXk∞ for all p. Therefore we obtain that limp↑∞ kXkp 6 kXk∞ . It
suffices to show that limp↑∞ kXkp > kXk∞ .
• If kXk∞ = ∞, we show that limp↑∞ kXkp = ∞. Since kXk∞ = ∞ means that for all
x > 0 we have F|X| (x) = P {|X| 6 x} < 1, namely, P {|X| > x} > 0, then for all n ∈ N by
127
monotonicity we have
Z
1/p Z
1/p
p
p
p
kXkp =
|X| dP
=
(|X| I{|X|6n} + |X| I{|X|>n} )dP
Z
>
1/p
p
|X| I{|X|>n} dP
Z
1/p
p
n I{|X|>n} dP
>
= n(P {|X| > n})1/p
Since P {|X| > n} > 0 then letting p → ∞ yields limp↑∞ n(P {|X| > n})1/p = n. Therefore
we obtain that limp↑∞ kXkp > n for all n ∈ N and this yields that limp↑∞ kXkp = ∞.
• If kXk∞ < ∞, we show that limp↑∞ kXkp > kXk∞ . By definition kXk∞ = inf{x :
F|X| (x) = 1}, and therefore for all ε > 0 we have F|X| (kXk∞ −ε) = P {|X| 6 kXk∞ −ε} < 1,
namely P {|X| > kXk∞ − ε} > 0. Denote E(ε) = P {|X| 6 kXk∞ − ε}. Then by
monotonicity we have
Z
p
1/p
|X| dP
kXkp =
Z
p
|X| IE(ε) dP
>
Z
p
p
1/p
=
(|X| IE(ε)c + |X| IE(ε) )dP
1/p
Z
>
p
1/p
(kXk∞ − ε) IE(ε) dP
= (kXk∞ − ε)P (E(ε))1/p
Since P (E(ε)) > 0 then letting p → ∞ yields limp↑∞ (kXk∞ − ε)P (E(ε))1/p = kXk∞ − ε.
Therefore we obtain that limp↑∞ kXkp > kXk∞ − ε for all ε > 0 and this yields that
limp↑∞ kXkp > kXk∞ because ε can be arbitrarily small.
(g) First we claim that log |X| is quasi-integrable provided that S 6= ∅. There exists p > 0 with
R
kXkp < ∞, namely, |X|p dP < ∞. Since we have
pxp−1
xp
= lim −1 = lim pxp = ∞
x→∞ log x
x→∞ x
x→∞
lim
by L’Hospital’s rule, then there exists some K > 1 such that for all x > K we have xp / log x > 1,
namely, log x 6 xp if x > K. Hence
Z
Z
+
(log |X|) dP = log |X| · I{|X|>1} dP
Z
Z
= log |X| · I{|X|∈[1,K]} dP + log |X| · I{|X|>K} dP
Z
Z
6 log K · I{|X|∈[1,K]} dP + log |X| · I{|X|>K} dP
Z
6 log KP {|X| ∈ [1, K]} + |X|p I{|X|>K} dP
Z
6 log K + |X|p dP = log K + kXkpp < ∞
128
This shows that log |X| ∈ Q+ and therefore log |X| is quasi-integrable. Also we know that kXkp
decreases as p decreases, therefore limp↓0 kXkp exists and can be substituted by limn→∞ kXk1/n .
Now it suffices to show:
• limp↓0 kXkp > exp(E log |X|). For all p ∈ S we have
Z
Z
Z
1
1
p
p
log kXkp = log |X| dP >
log |X| dP = log |X|dP = E log |X|
p
p
because ϕ(x) := log x is strictly concave, log |X| is quasi-integrable, |X|p is integrable and
hence claim 3 can be applied. Therefore we have
Z
kXkp = exp(log kXkp ) > exp
log |X|dP = exp(E log |X|)
This shows that limp↓0 kXkp > exp(E log |X|).
• limp↓0 kXkp 6 exp(E log |X|). Now it suffices to show that limn→∞ kXk1/n 6 exp(E log |X|).
Let h(x) = n(x1/n − 1) − log x for x > 0. Then we know that h0 (x) = (x1/n − 1)/x and
the minimum point is attained when h0 (x) = 0, namely, x = 1. Therefore h(x) > h(1) = 0.
This yields a useful inequality log x 6 n(x1/n − 1). Note that this inequality still holds with
x = 0 and x = +∞, in which cases we have log(0) = −∞ < −n for all n and +∞ = +∞.
Then by plugging x = kXk1/n we obtain
Z Z
|X|1/n − 1
1/n
|X| dP − 1 =
dP
(∗∗)
log kXk1/n 6 n
1/n
by monotonicity and linearity of integration. Let Yn = n(|X|1/n − 1) and we have Yn →
log |X| as n → ∞ by L’Hospital’s rule. Note that this limit also holds when |X| takes
value 0, in which case we have −n → −∞ as n → ∞. Since S is non-empty then there
exists r ∈ S such that kXkr < ∞. By Claim 4 we know that (Yn )∞
n=n0 decreases as n
increases for some fixed n0 with 1/n0 ∈ S. Then the sequence (Yn0 − Yn )∞
n=n0 is a sequence
of nonnegative increasing random variables with (Yn0 − Yn ) → (Yn0 − log |X|) as n → ∞.
Now applying Monotone Convergence Theorem(MCT) yields
Z
Z
Z
Z
Yn0 dP − lim
Yn dP = lim (Yn0 − Yn )dP =
lim (Yn0 − Yn )dP
n→∞
n→∞
n→∞
Z
Z
Z
= (Yn0 − log |X|)dP = Yn0 dP − log |X|dP
R
R
Yn0 dP is finite, then we obtain that limn→∞ Yn dP = log |X|dP . By our previous
R
result we have log kXk1/n 6 Yn dP , and hence letting n → ∞ in (∗∗) yields
Z
Z
lim log kXk1/n 6 lim
Yn dP = log |X|dP
Since
R
n→∞
n→∞
129
Note that the function exp(·) is continuous, therefore
Z
lim kXk1/n = exp lim log kXk1/n 6 exp
log |X|dP = exp(E log |X|)
n→∞
n→∞
DEFINITION 3.5.7 If 0 < p < ∞, Xn ∈ Lp for all n and X ∈ Lp , then we say Xn converges
Lp
to X in Lp , denoted by Xn → X, if kXn − Xkp → 0 as n → ∞.
PROPOSITION 3.5.4 Lp limits are essentially unique for 0 < p 6 ∞.
THEOREM 3.5.9 Convergence in Lp implies convergence in probability for 0 < p 6 ∞.
Now we introduce a generalization of the Dominated Convergence Theorem when the measure
is a probability measure.
1
THEOREM 3.5.10 Suppose (Xn )∞
n=1 , X, Y are random variables with |Xn | 6 Y, Y ∈ L and
P
Xn → X. Then X is integrable and EXn → EX.
Conversely the conclusion of the theorem is correct with an additional condition.
p
THEOREM 3.5.11 Suppose (Xn )∞
n=1 , X, Y are random variables with |Xn | 6 Y, Y ∈ L and
P
Lp
Xn → X. Then Xn → X.
EXAMPLE 3.5.2 There are examples such that neither convergence in Lp nor convergence
a.s. implies the other one. This is known as the “Skyscrapers” and “Cycling around (0, 1].
There is another thing worth noticing, which is the completeness of modes of convergence.
THEOREM 3.5.12 (Cauchy Criterion for Convergence a.s.) The sequence (Xn )∞
n=1
converges a.s. if and only if for every ε > 0 we have
lim P {|Xn − Xn0 | > ε for some n0 > n > m} = 0
m→∞
THEOREM 3.5.13 (Cauchy Criterion for Convergence in Probability) The sequence
(Xn )∞
n=1 converges in probability if and only if for every ε > 0 we have
lim P {|Xn − Xn0 | > ε} = 0
n,n0 →∞
THEOREM 3.5.14 (Cauchy Criterion for Convergence in Lp ) The sequence (Xn )∞
n=1
converges in Lp for 0 < p 6 ∞ if and only if
lim E|Xn − Xn0 |p = 0
n,n0 →∞
130
3.6
Vague Convergence
DEFINITION 3.6.1 Let (µn )∞
n=1 , µ be finite measures on (R, R). Then µn is said to converge
v
vaguely to µ, denoted by µn → µ or µn ⇒ µ, if there exists a dense set D ⊂ R and for all
a, b ∈ D, a < b we have µn (a, b] → µ(a, b].
Note that µn converging to µ vaguely by no means implies µn (A) → µ(A) for all A ∈ R.
Here are two examples:
EXAMPLE 3.6.1 Suppose Xn = cn with cn → 0 and cn < 0, i.o. , cn > 0, i.o. . Then we
claim that the distribution of Xn , denoted by L(Xn ), converge vaguely to δ0 , which is the unit
mass distribution on 0. This can be seen by taking the dense set D = R − {0}. Then for all
a, b ∈ D with a < b, either 0 ∈ (a, b) or 0 ∈
/ [a, b]. For the first case we see that for sufficiently
large n the value cn ∈ (a, b) because cn → 0 and this yields that L(Xn )(a, b] → 1, n → ∞. For
the second case we see that either a > 0 or b < 0, implying that there is only finitely many
terms of (cn )∞
n=1 in the interval (a, b]. Therefore we see that in this case L(Xn )(a, b] = 0, n → ∞.
However, we can see that if A = (0, b] for some b > 0, then L(Xn )(A) does not converge to
δ0 (A) = 0. In fact L(Xn )(A) even does not converge since cn ∈ (0, b), i.o. and cn ∈
/ (0, b), i.o. ,
meaning that L(Xn )(A) = 1, i.o. and L(Xn )(A) = 0, i.o. .
For our convenience the following weaker notion of probability measure is introduced
DEFINITION 3.6.2 (Subprobability Measure) A finite measure µ on (R, R) is said to
be a subprobability measure, if it satisfies µ(R) 6 1.
The following example shows that even though every µn is a probability measure, the vague
limit could still be only a subprobability measure rather than a probability measure. Heuristically, this can be regard as the “escape” of measure or mass.
EXAMPLE 3.6.2 Let Xn = cn for all n with cn → ∞ as n → ∞. Then we see that
L(Xn ) = δn , which are probability measure. Now we claim that the vague limit of the sequence
of measure L(Xn ) is the zero measure. Take the dense set to be the whole real line itself. For all
a, b ∈ R with a < b, we can see that for sufficiently large n the measure L(Xn )(a, b] = 0. This
is because cn can always shifted to the right side of b when we let n to be large enough. Then
we see that L(Xn )(a, b] → 0 as n → ∞ and this shows that the vague limit of L(Xn ) is the zero
measure.
In fact the only case is that the total mass of the vague limit measure can never be greater
than the limit inferior of the total mass of the sequence of measures. Namely:
131
v
PROPOSITION 3.6.1 Suppose (µn )∞
n=1 , µ are finite measures on (R, R) with µn → µ. Then
µ(R) 6 lim inf n→∞ µn (R).
Proof. By definition there exists a dense set D ⊂ R such that µn (a, b] → µ(a, b] for all a, b ∈
D, a < b. Then by monotonicity
lim inf µn (R) > lim inf µn (a, b] = lim µn (a, b] = µ(a, b],
n→∞
n→∞
n→∞
for all a, b ∈ D, a < b
∞
Now take two sequences (ak )∞
k=1 , (bk )k=1 ⊂ D such that ak ↓ −∞ and bk ↑ +∞ as k → ∞(This
can be yielded since D is dense in R). Then by monotone sequential continuity from below we
see that
lim inf µn (R) > lim µ(ak , bk ] = µ(R)
n→∞
k→∞
From now on we assume that all measures discussed in this section are subprobability measures.
Recall that Scheffé’s theorem says that when a sequence of measures and the target measure
have densities and the densities of the sequence of measures converge to the density of the
candidate measure a.s., then we conclude that the sequence of measures converge strongly to the
candidate measure. It is not possible to use linear combination of point masses to approximate
any measure with continuous density strongly. But in terms of vague convergence, we have the
following example:
EXAMPLE 3.6.3 Suppose µn =
1
n
Pn
k=1 δk/n
v
and µ = λ|(0,1] . We show that µn → µ.
Let Fn be the distribution function of µn . Then for each x ∈ (0, 1] we have
n
n
1X
1X
bnxc
Fn (x) =
I(−∞,x] (k/n) =
I(−∞,nx] (k) =
→ x,
n k=1
n k=1
n
n→∞
This shows that Fn (x) → F (x) for all x ∈ R where we use F to denote the distribution function
of λ|(0,1] because outside (0, 1] each Fn and F are identical. Therefore by subtraction we conclude
that µn (a, b] = Fn (b) − Fn (a) → F (b) − F (a) = λ|(0,1] (a, b] for all a, b ∈ R. This shows that the
vague limit of (µn )∞
n=1 is λ|(0,1] .
Note that µn (Q) = 1 always holds but µ(Q) = 0 since the Lebesgue measure for any countable set is 0. This coincide with the fact that vague convergence does not necessarily implies
convergence on any measurable set.
DEFINITION 3.6.3 (Continuity Point and Continuity Interval) A point x ∈ R is said
to be a continuity point of a measure µ if µ({x}) = 0. An interval is said to be a continuous
interval of µ if the end points are continuous points µ.
132
As a notation of convention, µ(a, b] = 0 when a > b.
THEOREM 3.6.1 Let (µn )∞
n=1 and µ be subprobability measures. The following propositions
are equivalent:
(i) For every finite interval (a, b) and ε > 0, there exists an n0 (a, b, ε) such that if n > n0 then
µ(a + ε, b − ε) − ε 6 µn (a, b) 6 µ(a − ε, b + ε) + ε
(ii) For every continuity interval (a, b] of µ, we have
µn (a, b] → µ(a, b]
v
(iii) µn → µ.
Proof. Routinely the equivalence can be shown by:
• (i) implies (ii)By (i) we know that for every fixed ε there exists some n0 such that
µ(a + ε, b − ε) − ε 6 µn (a, b) 6 µ(a − ε, b + ε) + ε
when n > n0 . Then by letting n → ∞ we obtain
µ(a + ε, b − ε) − ε 6 lim inf µn (a, b) 6 lim sup µn (a, b) 6 µ(a − ε, b + ε) + ε
n→∞
n→∞
Now by monotone sequential continuity we can take ε going discretely down to 0 and get
µ(a, b) 6 lim inf µn (a, b) 6 lim sup µn (a, b) 6 µ[a, b]
n→∞
n→∞
Now suppose (a, b] is a continuity interval of µ, then we see that µn (a, b) → µ(a, b] because
µ{a} = 0. It suffices to show that µn {b} → 0 as n → ∞.
We need the fact that a finite measure on (R, R) has at most countably infinite many
singletons with positive measure(Recall a relevant result concerning the σ-finite measure in
section 2.4). This means that the complement of continuity points of µ is at most countable.
Therefore the set of all continuity points of µ is dense in R. Denote the set of all continuity
points of µ by C. To show that µn {b} → 0, it suffices to show that for all x ∈ C we have
µn {c} → 0. Now for all a, b ∈ C, a < x < b we have
lim sup µn {x} 6 lim sup µn (a, b) 6 µ[a, b] = µ(a, b)
n→∞
n→∞
by our previous result and the fact that µ{a}, µ{b} = 0. Then by the dense property of C
we can select sequences ak ↑ x, bk ↓ x, k → ∞. Then we see that
lim sup µn {x} 6 lim sup µn (ak , bk ) 6 µ(ak , bk )
n→∞
n→∞
133
And by applying the monotone sequential continuity from above we see that µ{x} =
limk→∞ µ(ak , bk ) = 0 and therefore lim supn→∞ µn {x} 6 µ{x} = 0 by letting k → ø in
the previous inequality. This yields that µn {x} → 0 and this part of the proof is done.
• (ii) implies (iii)As mentioned above, the set of all continuity points of µ, denoted by C,
is a dense set in R. Therefore we see that µn (a, b] → µ(a, b] for all a, b ∈ C, a < b implies
v
that µn → µ.
v
• (iii) implies (i)Since µn → µ then there exists some dense set D such that µn (a, b] →
µ(a, b] whenever a, b ∈ D, a < b. Then for all a, b ∈ D, a < b and for all ε there exists
some n0 = n0 (a, b, ε) such that |µn (a, b] − µ(a, b]| < ε. Then there exists a1 , a2 , b1 , b2 ∈ D
with a − ε < a1 < a < a2 < a + ε, b − ε < b1 < b < b2 < b + ε. Then there exists
N = N (a, a1 , a2 , b, b1 , b2 , ε) such that
|µ(a1 , b2 ] − µn (a1 , b2 ]| < ε,
|µ(a2 , b1 ] − µn (a2 , b1 ]| < ε
whenever n > N . Then together monotonicity we have
µ(a + ε, b − ε) − ε 6 µ(a2 , b1 ] − ε 6 µn (a2 , b1 ] 6 µn (a1 , b2 ] 6 µ(a1 , b2 ] + ε 6 µ(a − ε, b + ε) + ε
whenever n > N . But note that ai , bj , i = 1, 2 are selected according to a, b, ε and therefore
N is selected according to a, b, ε. Namely N = N (a, bε) and we obtain the conclusion of
(i).
Actually we have proved the following:
v
PROPOSITION 3.6.2 If a sequence of measure µn → µ then for all continuity point x of µ
we have µn {x} → 0, n → ∞.
v
PROPOSITION 3.6.3 If a sequence of measure µn → µ then we have
lim sup µn [a, b] 6 µ[a, b]
n→∞
Proof. By the first part of the proof of previous theorem we know that
µ(a, b) 6 lim inf µn (a, b) 6 lim sup µn (a, b) 6 µ[a, b]
n→∞
n→∞
Then for all sufficiently large k ∈ N we have that
1
1
1
1
6 µ a − ,b +
lim sup µn [a, b] 6 lim sup µn a − , b +
k
k
k
k
n→∞
n→∞
134
Now by monotone sequential continuity from above we know that
1
1
= µ[a, b]
lim sup µn [a, b] 6 lim µ a − , b +
k→∞
k
k
n→∞
v
v
PROPOSITION 3.6.4 If µn → µ and µn → ν where all relevant measures are subprobability
measures, then µ = ν.
Proof. Since the real limit of a sequence of real numbers is unique, then we consider the sets
Cµ , Cν of all continuity points of both µ and ν. Since Cµc , Cνc are at most countable, then
C = Cµ ∩Cν is dense. Therefore for all a, b ∈ C, a < b we know that µ(a, b] = ν(a, b]. We conclude
that the distribution functions Fµ of µ and Fν of ν coincide on C. Recall that distribution
functions are right continuous. Then for all x ∈ R there exists a sequence (xn )∞
n=1 ⊂ C with
xn ↓ x. Then
Fµ (x) = lim Fµ (xn ) = lim Fν (xn ) = Fν (x)
n→∞
n→∞
Therefore we know that Fµ = Fν . This means that two finite measures coincide on the π-system,
the class of all finite intervals in R. Therefore they must be identical.
If all measures concerned are probability measures, then we define
DEFINITION 3.6.4 (Weak Convergence) If (µn )∞
n=1 , µ are probability measures with
v
w
µn → µ, then we say that µn converge weakly to µ and denote it by µn → µ.
In terms of weak convergence, we have a stronger analogue of previous theorem for vague
convergence.
w
THEOREM 3.6.2 Probability measures µn → µ if and only if for all δ > 0, ε > 0 there exists
n0 = n0 (δ, ε) such that for all n > n0 then
µ(a + δ, b − δ) − ε 6 µn (a, b) 6 µ(a − δ, b + δ) + ε
for all a, b ∈ R, a < b.
COROLLARY 3.6.3 A sequence of measures (µn )∞
n=1 converge weakly to µ if and only if the
distribution functions of µn denoted by Fn , converge to the distribution function F of µ on the
those points where F is continuous.
THEOREM 3.6.4 (Helly’s Selection Principle) Given a sequence of subprobability mea∞
∞
sures (µn )∞
n=1 , there exists a subsequence (nk )k=1 such that (µnk )k=1 converges vaguely to a
subprobability measure.
135
THEOREM 3.6.5 A sequence of subprobability measures (µn )∞
n=1 converge weakly to a subprobability measure µ if and only if every subsequence (µnk )∞
k=1 has a further subsequence
(µnkl )∞
l=1 converge to µ vaguely.
THEOREM 3.6.6 If every vaguely convergent subsequence of a sequence of subprobability
v
measures (µn )∞
n=1 converges to same µ, then µn → µ.
3.7
Continuation
DEFINITION 3.7.1 Let [a, b] ⊂ R. Define
• C[a,b] to be the class of all continuous functions on R and takes value 0 outside [a, b];
• CK to be the class of all compactly supported continuous functions on R;
• C0 to be the class of all continuous functions on R such that f (x) → 0 as x → ∞ and
x → −∞;
• CB to be the collection of all bounded continuous functions on R.
Clearly we have that
C[a,b] ⊂ CK ⊂ C0 ⊂ CB
DEFINITION 3.7.2 (D-Valued Step Function) Given an interval [a, b] ⊂ R̄ and a set
D ⊂ R, a function f is called a D-valued step function, if there exists (aj )lj=0 , (cj )lj=1 ⊂ D such
that a = a0 < a1 < · · · < al = b and f (x) = cj for x ∈ (aj−1 , aj ).
Intuitively, a D-valued step function is a step function taking values in D with endpoints of
sub-intervals lies in D.
LEMMA 3.7.1 Suppose [a, b] ⊂ R and A is dense in R.
• If f ∈ C[a,b] there exists a sequence of A-valued step functions (fn )∞
n=1 on (a, b) such that
kfn − f k → 0 as n → ∞.
• If f ∈ C0 there exists a sequence of A-valued step functions (fn )∞
n=1 on R such that kfn −
f k → 0 as n → ∞.
THEOREM 3.7.1 Suppose (µn )∞
n=1 and µ are subprobability measures. Then:
v
• µn → µ if for all f ∈ CK we have
Z
Z
f dµn → f dµ as n → ∞
136
v
• µn → µ only if for all f ∈ C0 we have
Z
Z
f dµn → f dµ as n → ∞
THEOREM 3.7.2 Let (µn )∞
n=1 and µ be probability measures. Then µn converges weakly to
µ implies that
Z
Z
f dµn →
f dµ
for all f ∈ CB .
DEFINITION 3.7.3 (Tightness) Let (µn )∞
n=1 be a sequence of probability measures. Then
µn is called tight, if
lim sup(µn ((−∞, b]) + µn ([b, +∞, ))) → 0
b→∞ n≥1
DEFINITION 3.7.4 A sequence of random variables (Xn )∞
n=1 is said to be bounded in probability, if the distribution (L(Xn ))∞
n=1 forms a sequence of tight probability measures.
THEOREM 3.7.3 Suppose we have a sequence of probability measure (µn )∞
n=1 converges
vaguely to a subprobability measure µ. Then µ is a probability measure if and only if (µn )∞
n=1
is tight.
3.8
Convergence in Distribution
DEFINITION 3.8.1 Suppose (Xn )∞
n=1 is a sequence of random variables and X is some
L
random variables. We say that Xn converges to X in distribution, denoted by Xn → X, if
v
L(Xn ) → L(X)
LEMMA 3.8.1 Suppose (Xn )∞
n=1 is a sequence of random variables converging to X in probability and f : R → R is a continuous function. Then f (Xn ) converges to f (X) in probability.
THEOREM 3.8.1 If a sequence of random variables (Xn )∞
n=1 converges to X in probability,
L
then Xn → 0 as n → ∞.
L
L
THEOREM 3.8.2 (Slutsky’s Theorem) Suppose random variables Xn → X and Yn → 0,
then
L
(a) Xn + Yn → X,
L
(b) Xn Yn → 0.
137
3.9
Uniform Integrability and Convergence of Moments
THEOREM 3.9.1
(a) If Xn → X a.s., then for all r ∈ (0, +∞] we have kXkr ≤ lim inf n→∞ kXn kr .
Lr
(b) If Xn → X for some r ∈ (0, +∞], then limn→∞ kXn kr = kXkr < ∞.
L1
(c) If Xn → X then EXn → EX as n → ∞.
L
THEOREM 3.9.2 Suppose Xn → X and supn E|X|p ≤ ∞
(a) For all r < p we have kXn kr → kXkr < ∞.
(b) If r < p is an integer, then EXnr → EX r .
DEFINITION 3.9.1 A family of random variables {Xt : t ∈ T } is uniformly integrable if
lim sup E(|Xt |I{|Xt |>A} ) = 0
A→∞ t∈T
THEOREM 3.9.3 The family {Xt : t ∈ T } is uniformly integrable if and only if supt E|Xt | <
∞ and
lim sup{E(|Xt |IB ) : t ∈ T, P (B) ≤ ε} = 0
ε→0
P
THEOREM 3.9.4 Let p ∈ (0, ∞), Xn ∈ Lp and Xn → X. Then the following statements are
equivalent:
(i) {|Xn |r } is uniformly integrable;
Lp
(ii) Xn → X;
(iii) E|Xn |r → E|X|r < ∞
THEOREM 3.9.5 A sequence of random variables (Xn )∞
n=1 is uniformly integrable, if one of
the following statements holds:
• L1 domination: There exists Y ∈ L1 such that |Xn | ≤ Y for all n;
• Identical distribution: (Xn )∞
n=1 are identically distributed with finite mean;
• Lp boundedness: There exists some p > 1 such that supn E|Xn |p < ∞.
Next theorem concerns the so-called “Method of Moments”, which is a sufficient condition
for convergence in distribution.
r
r
THEOREM 3.9.6 Suppose (Xn )∞
n=1 and X have EXn , EX < ∞ for all r ∈ N. If for all
L
r ∈ N we have EXnr → EX r as n → ∞, then Xn → X.
138
Chapter 4
Transition Probabilities
4.1
Product Measurable Space
DEFINITION 4.1.1 Let (Ω1 , A1 ) and (Ω2 , A2 ) be two measurable spaces. Let πi : Ω1 ×Ω2 →
Ωi be the ith projection, i = 1, 2. Set
C : = π1−1 (A1 ) ∪ π2−1 (A2 ),
the class of measurable cylinders
R : = {A1 × A2 : A1 ∈ A1 , A2 ∈ A2 }, the class of measurable rectangles
(
)
X
U :=
Rj : J finite, Rj ∈ R for each j
j∈J
PROPOSITION 4.1.1 C is closed under complementation, R is a π-system(closded under
finite intersections), and U is the field generated by C and by R.
Proof.
(1) C is closed under complementation.
• Suppose C ∈ π1−1 (A1 ), then there exists some A1 ∈ A1 such that C = π1−1 (A1 ). Then we
see that C = {(ω1 , ω2 ) : ω1 ∈ A1 }. Complementation of this is C c = {(ω1 , ω2 ) : ω1 ∈ Ac1 } =
π1−1 (Ac1 ) ∈ π1−1 (A1 ).
• Suppose C ∈ π2−1 (A2 ), then there exists some A2 ∈ A2 such that C = π2−1 (A2 ). Then we
see that C = {(ω1 , ω2 ) : ω2 ∈ A2 }. Complementation of this is C c = {(ω1 , ω2 ) : ω2 ∈ Ac2 } =
π2−1 (Ac2 ) ∈ π2−1 (A2 ).
Hence we conclude that C c ∈ C = π1−1 (A1 ) ∪ π2−1 (A2 ) and C is closed under complementation.
(2) R is a π-system. Suppose A, B ∈ R, then there exists some A1 , B1 ∈ A1 and A2 , B2 ∈ B2
139
such that A = A1 × A2 , B = B1 × B2 . Then
A ∩ B = {(ω1 , ω2 ) : ω1 ∈ A1 , ω2 ∈ A2 , ω1 ∈ B1 , ω2 ∈ B2 }
= {(ω1 , ω2 ) : ω1 ∈ A1 ∩ A2 , ω2 ∈ B1 ∩ B2 }
= (A1 ∩ B1 ) × (A2 ∩ B2 ) ∈ R
because A1 , A2 are σ-fields and closded under intersection, then A1 ∩ B1 ∈ A1 and A2 ∩ B2 ∈ A2 .
(3) U is the field generated by C and R. Denote the field generated by a collection of sets
F by f (F). It suffices to show
• f (C) ⊂ f (R). Write
C = {A1 × Ω2 : A1 ∈ A1 } ∪ {Ω1 × A2 : A2 ∈ A2 }
Then it is clear that C ⊂ R, because Ω1 ∈ A1 , Ω2 ∈ A2 . Therefore f (C) ⊂ f (R).
• f (R) ⊂ U. Since clearly we have that R ⊂ U, then it suffices to show that U is a field.
– Nonemptyness. It is clear that U is nonempty because Ω = Ω1 × Ω2 ∈ R and R ⊂ U
P
because each element R ∈ R can be written as R = 1i=1 Ri , Ri = R.
– Closed under binary intersection. Next we show that U is closed under binary
intersection. Suppose
U1 =
n
X
Ri
U2 =
i=1
Then
U1 ∩ U2 =
n
X
m
X
Ri , Sj ∈ R
Sj
j=1
!
Ri
∩
i=1
n
X
!
Sj
j=1
=
n X
n
X
(Ri ∩ Sj )
i=1 j=1
because Ri ’s are disjoint sets and so are Sj ’s and it follows from the distributive law of
intersection over union. Note that we have already shown that R is a π-system, which
means that Ri ∩ Sj ∈ R, therefore U1 ∩ U2 ∈ U.
– Closed under complementation. We prove it by induction on the number of elements in J. If J is singleton, namely, U ∈ U is an element in R, then U = A1 × A2 for
some Ai ∈ Ai , i = 1, 2. Then
U c = (A1 × A2 )c = A1 × Ac2 + Ac1 × Ω2 ∈ U
this is because (A1 × Ac2 ) ∩ (Ac1 × Ω2 ) = ∅ and that if (ω1 , ω2 ) ∈
/ A1 × A2 , then
either ω1 ∈ Ac1 , leading to (ω1 , ω2 ) ∈ Ac1 × Ω2 , or ω2 ∈
/ A2 with ω1 ∈ A1 , leading to
(ω1 , ω2 ) ∈ A1 × Ac2 , and that A1 × Ac2 , Ac1 × Ω2 ∈ R. Here we obtain a useful property
(∗): if R ∈ R then there exists A, B ∈ R such that Rc = A + B and thus Rc ∈ U.
140
Now suppose for |J| = n(NOTATION here |J| means the cardinality of J) we
P
have U = j∈J Rj satisfies U c ∈ U. This means that
!c
n
X
Ri ∈ R
i=1
Then suppose |J| = n + 1. Write
U=
n+1
X
Rj
Rj ∈ R j = 1, 2, · · · , n + 1
j=1
Then
Uc =
Rn+1 +
n
X
!c
Rj
c
∩
= Rn+1
j=1
n
X
!c
Rj
∈U
j=1
c
∈ U. We have already shown that U is
because Rn+1 ∈ R have property (∗): Rn+1
closed under binary intersection. Therefore for |J| = n + 1 we also have U c ∈ U and
induction gives U being closed under complementation.
• U ⊂ f (C). Suppose U ∈ U. Then we can write
n
X
U=
(Aj × Bj ) Aj ∈ A1 , Bj ∈ A2
j = 1, 2, · · · , n
j=1
We have that Aj × Bj = (Aj × Ω2 ) ∩ (Ω1 × Bj ) = π1−1 (Aj ) ∩ π2−1 (Bj ) ∈ f (C) because f (C)
is closed under binary intersection and π1−1 (Aj ), π2−1 (Bj ) ∈ C. Therefore U ⊂ f (C).
DEFINITION 4.1.2 (Product Measurable Space) Suppose (ωi , Ai ) for i = 1, 2 are measurable spaces. Define Ω = Ω1 , Ω2 and the product σ-field
A1 ⊗ A2 = σ(C) = σ(R)
Then we say that (Ω, A1 ⊗ A2 ) is the product measurable space of (Ω1 , A1 ) and (Ω2 , A2 ).
DEFINITION 4.1.3 (Section of Sets and Mappings) Let Ω1 , Ω2 be two spaces and
Ω = Ω1 ×Ω2 . Let X : Ω → Ψ be a mapping. The section of X/A at ω1 is defined to be the function
Xω1 : Ω2 → Ψ/the set Aω1 ⊂ Ω2 given by Xω1 (ω2 ) = X(ω1 , ω2 )/Aω1 = {ω2 : (ω1 , ω2 ) ∈ A}.
PROPOSITION 4.1.2 (IA )ω1 = IAω1 for A ⊂ Ω and ω1 ∈ Ω1 ; (X −1 (B))ω1 = Xω−1
(B) for
1
B ⊂ Ψ and ω1 ∈ Ω1 .
Proof.
141
First we show that (IA )ω1 = IAω1 . Fix ω1 ∈ Ω1 , for all ω2 ∈ Ω2 we have by definition
(
(
1 (ω1 , ω2 ) ∈ A
1 ω2 ∈ Aω1
(IA )ω1 (ω2 ) = IA (ω1 , ω2 ) =
=
0 (ω1 , ω2 ) ∈
/A
0 ω2 ∈
/ Aω1
(
1 ω2 ∈ Aω1
IAω1 (ω2 ) =
0 ω2 ∈
/ Aω1
because for fixed ω1 , (ω1 , ω2 ) ∈ A if and only if ω2 ∈ Aω1 .
Next we show that (X −1 (B))ω1 = Xω−1
(B) for all B ∈ Ψ.
1
(B). Suppose ω2 ∈ (X −1 (B))ω1 . Then (ω1 , ω2 ) ∈ X −1 (B). This means
• (X −1 (B))ω1 = Xω−1
1
(B).
that Xω1 (ω2 ) = X(ω1 , ω2 ) ∈ B. Therefore ω2 ∈ Xω−1
1
• Xω−1
(B) ⊂ (X −1 (B))ω1 . Suppose ω2 ∈ Xω−1
(B), then by definition we have Xω1 (ω2 ) =
1
1
X(ω1 , ω2 ) ∈ B. Therefore we have (ω1 , ω2 ) ∈ X −1 (B) and by definition of section of a set
ω2 ∈ (X −1 (B))ω1 .
PROPOSITION 4.1.3 Fix ω1 ∈ Ω1 and let iω1 : Ω2 → Ω1 × Ω2 = Ω be the injection mapping
defined by
iω1 (ω2 ) = (ω1 , ω2 )
Suppose X : Ω = Ω1 × Ω2 → Ψ. Then
Aω1 = i−1
ω1 (A) Xω1 = X ◦ iω1
Proof.
(1) We first show that Aω1 = i−1
ω1 (A).
Suppose ω2 ∈ Aω1 , then by definition we have
(ω1 , ω2 ) ∈ A. By the definiton of injection mapping we have that iω1 (ω2 ) = (ω1 , ω2 ) ∈ A and
hence ω2 ∈ i−1
ω1 (A).
(2) We next show that Xω1 = X ◦ iω1 . Suppose ω2 ∈ Ω2 . Then by definition
iω1 (ω2 ) = (ω1 , ω2 )
Xω1 (ω2 ) = X(ω1 , ω2 ) = X(iω1 (ω2 )) = (X ◦ iω1 )(ω2 )
Therefore Xω1 = X ◦ iω1 .
LEMMA 4.1.1 (Sectioning preserves measurability) Let (Ψ, B) be a measurable space,
and let X : Ω → Ψ be measurable. Then Xω1 : Ω2 → Ψ is measurable for each ω1 ∈ Ω1 and
Aω1 ∈ A2 for each A ∈ A and ω1 ∈ Ω1 .
142
Proof. Consider the generators R of product σ-field. Suppose A = A1 × A2 ∈ R. Then i−1
ω1 (A) =
A2 or ∅ according to ω1 ∈ A1 or not. Then we see that the pull-backs of measurable rectangles
are measurable, and therefore iω1 is measurable. Hence Xω1 = X ◦ iω1 is measurable by the
composition rule.
Now suppose A ∈ A. Then Aω1 = iω1 (A) yields that Aω1 is measurable because iω1 is
measurable.
The reverse of the lemma does not hold. Namely, one can have a set in Ω1 × Ω2 having each
sections measurable but not measurable itself.
EXAMPLE 4.1.1 Let Ψ be an uncountable set, and let B be the σ-field in Ψ generated by
the singletons. (B consists of the countable and co-countable subsets of Ψ. ) Take (Ω1 , A1 ) =
(Ψ, B) = (Ω2 , A2 ). Consider the diagonal ∆ := {(ψ, ψ) : ψ ∈ Ψ} of A = B ⊗ B. One has ∆ ∈
/ A.
Proof. First we need a result from Billingsley’s book problem 2.9 and the next following
lemma.
Problem 2.9 If B ∈ σ(A) then there exists a countable subclass AB ⊂ A such that
B ∈ σ(AB ).
Proof. We use the “good sets” argument. Define
G = {B ∈ σ(A) : there exists some countable subclass AB ⊂ A such that B ∈ σ(AB )}
We are to show that σ(A) = G such that every set in σ(A) has the property of sets in G. In fact
it suffices to show that G is a σ-field containing A.
• G contains A. Suppose A ∈ A. Then clearly A ∈ σ(A). And we know that {A} itself is
a countable subclass of A and A ∈ σ({A}). Therefore A ∈ G by definition.
• G is nonempty. This is clear since A ⊂ G and A is nonempty.
• G is closed under countable union. Suppose (Gn )∞
n=1 is a sequence of sets in G. Then
for each Gn there exists some countable subclass An ⊂ A such that Gn ∈ σ(An ). Then
!
!
∞
∞
∞
∞
[
[
[
[
Gn ∈
σ(An ) ⊂ σ
σ(An ) = σ
An
n=1
n=1
n=1
n=1
S∞
The last equality holds because for all n we have σ(An ) ⊂ σ( n=1 An ) and for all n we have
S
S
An ⊂ ∞
n=1 σ(An ). Since An ’s are countable subclass of A, then
n An is also a countable
S
subclass of A. Also σ(A) is closed under countable union. Therefore n Gn ∈ σ(A). By
S
definition of G we have that n Gn ∈ G.
143
• G is closed under complementation. Suppose G ∈ G, then G ∈ σ(A) and therefore
Gc ∈ σ(A); also there exists some AG ⊂ A being a countable subclass such that G ∈ σ(AG ).
Then we can take the countable subclass for Gc to be AGc = AG because Gc ∈ σ(AG ).
Therefore by definition we have Gc ∈ G.
Lemma 1 Suppose A, B are collection of subsets of Ω1 , Ω2 , then σ(A) ⊗ σ(B) =
σ(A × Ω2 ∪ Ω1 × B).
Proof. By the definiton of product σ-field we have
σ(A) ⊗ σ(B) = σ(σ(A) × Ω2 ∪ Ω1 × σ(B))
It is clear that
A × Ω2 ∪ Ω1 × B ⊂ σ(A) × Ω2 ∪ Ω1 × σ(B)
So it suffices to show that
σ(A) × Ω2 ∪ Ω1 × σ(B) ⊂ σ(A × Ω2 ∪ Ω1 × B)
Since we have
σ(A × Ω2 ) ∪ σ(Ω1 × B) ⊂ σ(A × Ω2 ∪ Ω1 × B)
then in fact it suffices to show that
σ(A) × Ω2 ⊂ σ(A × Ω2 )
Ω1 × σ(B) ⊂ σ(Ω1 × B)
To see this, we use the “good sets” arguments. Define
G1 = {A ∈ σ(A) : A × Ω2 ∈ σ(A × Ω2 )}
G2 = {B ∈ σ(B) : Ω1 × B ∈ σ(Ω1 × B)}
Then we have
• A ⊂ G1 , B ⊂ G2 . Since generators are nonempty, then this shows that G1 , G2 are also
nonempty. Since A ⊂ σ(A) and A × Ω2 ⊂ σ(A × Ω2 ) we have A ⊂ G1 ; Similarly we
have B ⊂ σ(B) and Ω1 × B ⊂ σ(Ω1 × B), then B ⊂ G2 .
• G1 , G2 are closed under complementation. If A ∈ G1 , B ∈ G2 then
A ∈ σ(A), A × Ω2 ∈ σ(A × Ω2 ) implies Ac ∈ σ(A), Ac × Ω2 = (A × Ω2 )c ∈ σ(A × Ω2 )
B ∈ σ(B), Ω1 × B ∈ σ(Ω1 × B) implies B c ∈ σ(B), Ω1 × B c = (Ω1 × B)c ∈ σ(Ω1 × B)
By definition of G1 , G2 we know that Ac ∈ G1 , B c ∈ G2 .
144
• G1 , G2 are closed under countable union. If (An )n≥1 ⊂ G1 , (Bn )n≥1 ⊂ G2 , then
(An )n≥1 ⊂ σ(A),
implies
∞
[
An ∈ σ(A),
n=1
(An × Ω2 )n≥1 ∈ σ(A × Ω2 )
!
!
∞
∞
[
[
An × Ω =
An × Ω ∈ σ(A × Ω2 )
n=1
n=1
and
(Bn )n≥1 ⊂ σ(B),
implies
∞
[
Bn ∈ σ(B),
n=1
(Ω1 × Bn )n≥1 ∈ σ(Ω1 × B)
!
!
∞
∞
[
[
Ω1 ×
An =
Ω1 × Bn ∈ σ(Ω1 × B)
n=1
n=1
Therefore by the definition of G1 , G2 we know that
∞
[
An ∈ G1 ,
n=1
∞
[
Bn ∈ G2
n=1
Above three terms imply that G1 is a σ-field containing A and G2 is a σ-field containing B. Hence
σ(A) ⊂ G1 and σ(B) ⊂ G2 . Therefore we know for all A × Ω2 ∈ σ(A) × Ω2 , Ω1 × B ∈ Ω1 × σ(B)
we have
A × Ω2 ∈ σ(A × Ω2 ) implies σ(A) × Ω2 ⊂ σ(A × Ω2 )
Ω1 × B ∈ σ(Ω1 × B) implies Ω1 × σ(B) ⊂ σ(Ω1 × B)
NOTATION: E is a collection of subsets of Ψ and S is a set, then define E × S :=
{E × S : E ∈ E}. And if F, G are collection of subsets of Ψ then define F × G =
{F × G : F ∈ F, G ∈ G}(Do not confuse this notation with the product σ-field
A ⊗ B!!!).
The proofs will be given in the appendix. Now it suffices to show that:
• Suppose E ∈ A, then there exists some countable set S(E) such that E ∈ σ(AE )
where AE := {{x} × Ψ : x ∈ S(E)} ∪ {Ψ × {x} : x ∈ S(E)}. S × E is defined in the same
way. Denote the collection of singletons generating B by D, namely, B = σ(D). By using
the lemma 1 above we know that
A = B ⊗ B = σ(D) ⊗ σ(D) = σ(D × Ψ ∪ Ψ × D)
Now we use Billingsley’s Problem 2.9: there exists a countable subclass ÃE ⊂ D × Ψ ∪
Ψ × D such that E ∈ σ(ÃE ). Note that D is the class of all singletons, then
ÃE ⊂ D × Ψ ∪ Ψ × D = {{x} × Ψ : x ∈ Ψ} ∪ {Ψ × {y} : y ∈ Ψ}
145
means that there exists subsets X(E), Y (E) of Ψ such that
ÃE = {{x} × Ψ : x ∈ X(E)} ∪ {Ψ × {y} : y ∈ Y (E)}
Note that ÃE is a countable class, and if X(E), Y (E) are uncountable then we must have
ÃE uncountable because the classes {{x} × Ψ : x ∈ X(E)}, {Ψ × {y} : y ∈ Y (E)} are
subclasses of ÃE and they are indexed by X(E), Y (E). Therefore S(E) := X(E) × Y (E)
is countable. Now define
AE = {{x} × Ψ : x ∈ S(E)} ∪ {Ψ × {y} : y ∈ S(E)}
Clearly we have ÃE ⊂ AE because X(E), Y (E) ⊂ S(E). And it is still a countable subclass
of A because {{x} × Ψ : x ∈ S(E)} and {Ψ × {y} : y ∈ S(E)} are indexed by S(E), which
is countable. Therefore we now have
E ∈ σ(ÃE ) ⊂ σ(AE ) ⊂ A
• For E ∈ A, define P(E) = {{x} : x ∈ S(E)} ∪ {S(E)c }. Then E ∈ σ(P(E) × P(E)).
Since we have E ∈ σ(AE ) now where
AE = {{x} × Ψ : x ∈ S(E)} ∪ {Ψ × {y} : y ∈ S(E)}
It suffices to show that AE ⊂ σ(P(E) × P(E)). Suppose {x} × Ψ ∈ AE with x ∈ S(E),
then we see that

{x} × Ψ = {x} × S(E)c ∪

[
{y} = ({x} × S(E)c ) ∪
y∈S(E)
[
({x} × {y})
y∈S(E)
Note that the union on the right-hand side is a countable union, and the element sets {x} ⊂
S(E), S(E)c , {y} ⊂ S(E) are taken from P(E), then right-hand side being in σ(P(E) ×
P(E)) follows from the closure under countable union of σ(P(E) × P(E)).
• ∆∈
/ A. We prove it by contradiction. Suppose ∆ ∈ A. Then there exists some countable
set S(∆) ⊂ Ψ such that
∆ ∈ σ(P(∆) × P(∆)) where P(∆) = {{x} : x ∈ S(∆)} ∪ {S(∆)c }
Note that P(∆) is a collection of disjoint sets: {x} ∈ S(∆) and S(∆)c . Then P(∆) × P(∆)
is also a collection of disjoint sets. By Problem 2.9 on Chung’s book we know that
σ(P(∆) × P(∆)) consists of countable unions of element sets in P(∆) × P(∆). Namely,
∆ is a countable union of disjoint sets in P(∆) × P(∆). Now we show the contradiction.
146
Since Ψ is uncountable and σ(∆) is countable, then S(∆)c is uncountable, in particular
non-empty. For our convenience denote S(E) = (xn )n≥1 and define
Λ = S(E)c × S(E)c
Li = {xi } × S(E)c
Rj = S(E)c × {xj } Xij = {xi } × {xj }
Pick ψ0 ∈ S(E)c and we know (ψ0 , ψ0 ) ∈
/ Xij for all i, j ≥ 1 but (ψ0 , ψ0 ) ∈ ∆. Therefore
we must have that at least one of Λ, Li , Rj intersects with ∆. Recall that ∆ is a countable
union of sets in Λ, Li , Rj , Xij .
– Case 1: If Λ ∩ ∆ 6= ∅ then Λ ⊂ ∆. But S(E)c is uncountable. Hence there exists
x, y ∈ S(E)c , x 6= y and this means (x, y) ∈ Λ but (x, y) ∈
/ ∆. Therefore Λ ∩ ∆ = ∅;
– Case 2: If for some i ≥ 1 we have Li ∩ ∆ 6= ∅ then Li ⊂ ∆. But this is not possible
because by definiton of Li for all (xi , y) ∈ Li we have xi 6= y. Hence (xi , y) ∈
/ ∆, and
Li ∩ ∆ = ∅;
– Case 3: If for some j ≥ 1 we have Rj ∩ ∆ 6= ∅ then Rj ⊂ ∆. But this is not possible
because by definiton of Rj for all (y, xj ) ∈ Rj we have xj 6= y. Hence (y, xj ) ∈
/ ∆, and
Rj ∩ ∆ = ∅;
We have exhausted every possibilities and therefore this contradicts with the result that at
least one of Λ, Li , Rj intersects with ∆. Hence we must have ∆ ∈ A.
4.2
Transition Probabilities
DEFINITION 4.2.1 A mapping T : Ω1 → A2 → [0, 1] is called a transition probability from
(Ω1 , A1 ) to (Ω2 , A2 )((Ω1 ) to (Ω2 ) for short) if
(1) T (Ω1 , ·) is a probability measure on (Ω2 , A2 ) for each ω1 ∈ Ω1
(2) T (·, A2 ) is A1 -measurable for each A2 ∈ A2
PROPOSITION 4.2.1 Suppose mapping T : Ω1 × A2 → [0, 1] with T (ω1 ; ·) being a probability measure on (Ω2 , A2 ) for each ω1 ∈ Ω1 . If T (·, A2 ) is A1 measurable for each A2 in some
π-system generating A2 , then T (·, A2 ) is A1 measurable for each A2 ∈ A2 (Use π − λ theorem. )
Proof. Suppose A2 = σ(P) where P is a π-system and T (·, A2 ) is A1 measurable for all A2 ∈ P.
We use the good sets argument. Define
G = {A2 ∈ A2 : T (·, A2 ) is A1 measurable }
We first show that
147
• G ⊃ P. Since we have assumption T (·, A2 ) is A1 -measurable for all A2 ∈ P and P ⊂ A2 ,
it follows from the definition that P ⊂ G.
• G is a λ-system. . It suffices to show that
(1) Ω2 ∈ G. Since for each ω1 ∈ Ω1 we have that T (ω1 , A2 ) is a probability measure on A2 ,
then we know that T (·, Ω2 ) ≡ 1. This means that T (·, Ω2 ) is a constant function and
therefore is measurable. Also we have that Ω2 ∈ A2 is trivial, hence we obtain that
Ω2 ∈ G by definition.
(2) G is closed under complementation. Suppose A2 ∈ G. Then by definition A2 ∈ A2
and this means Ac2 ∈ A2 . It also follows from definiton that T (·, A2 ) is A1 -measurable.
Recall that T (ω1 , A2 ) is a probability measure on A2 for fixed ω1 , then we have
T (ω1 , Ac2 ) = 1 − T (ω1 , A2 ) for all ω1 ∈ Ω1 , which is to say, T (·, Ac2 ) = 1 − T (·, A2 ).
Since T (·, A2 ) is A1 -measurable, therefore T (·, Ac2 ) = 1 − T (·, A2 ) is also measurable.
By definition of G we obtain that Ac2 ∈ G.
(3) G is closed under countable disjoint union. Suppose (En )n≥1 ⊂ G is a sqeuence
of disjoint sets. Then clearly we have Enc ∈ A2 for all n and therefore
!c
∞
∞
\
X
Enc ∈ A2
En =
n=1
n=1
Since for fixed ω1 we know that T (ω1 , ·) is a probability measure on A2 , then for all
ω1 ∈ Ω1 we have
T
ω1 ,
∞
X
n=1
!
En
=
∞
X
T (ω1 , En )
implies T
·,
n=1
n=1
Define
X(ω1 ) := T
∞
X
ω1 ,
∞
X
!
En
n=1
XN (ω1 ) =
N
X
!
En
=
∞
X
T (·, En )
n=1
T (ω1 , En )
n=1
Then we know that (XN )N ≥1 converges to X. Since each Tn (·, En ) is measurable from
N
X
the definition of G, then XN =
Tn (·, En ) is also measurable. Hence the measurability
n=1
of X = lim XN follows from the closure theorem for measurable functions.
N →∞
Now we know that G is a λ-system containing a π-system P, it follows from the π − λ Theorem
that σ(P) = A2 ⊂ G. Therefore by definition of G we know that for all A2 ∈ A2 we have T (·, A2 )
is A1 - measurable.
148
4.3
Composition Distribution
LEMMA 4.3.1 Let (Tω1 )ω1 ∈Ω1 be a transition probability from (Ω1 , A1 ) to (Ω2 , A2 ). Show
that Tω1 ((A1 × A2 )ω1 ) = IA1 (ω1 )Tω1 (A2 ).
Proof. Since we have that
(
(A1 × A2 )ω1 = {ω2 : (ω1 , ω2 ) ∈ (A1 × A2 ) with ω1 fixed} =
A2
if ω1 ∈ A1
∅
if ω1 ∈
/ A1
Therefore
(
Tω1 ((A1 × A2 )ω1 ) =
Tω1 (A2 )
if ω1 ∈ A1
Tω1 (∅)
if ω1 ∈
/ A1
Since we have that Tω1 is a transition probability, then we know that Tω1 (∅) = 0. Therefore we
can write
(
Tω1 ((A1 × A2 )ω1 ) =
Tω1 (A2 )
if ω1 ∈ A1
Tω1 (∅)
if ω1 ∈
/ A1
(
=
Tω1 (A2 )
if ω1 ∈ A1
0
if ω1 ∈
/ A1
= IA1 (ω1 )Tω1 (A2 )
PROPOSITION 4.3.1 Let (Tω1 )ω1 ∈Ω1 be a transition probability from (Ω1 , A1 ) to (Ω2 , A2 )
and A = A1 ⊗ A2 . Let G = {A ∈ A : ω1 7→ Tω1 (Aω1 ) is A1 measurable}. Show that G contains
the π-system of measurable rectangles and G is a λ-system.
Proof.
(1) G contains all measurable rectanles.
From the result of the problem above we knon
that if A1 × A2 is a measurable rectangle, then clearly A1 × A2 ∈ σ(A1 × A2 ) = A, and we have
Tω1 ((A1 × A2 )ω1 ) = IA1 (ω1 )Tω1 (A2 ) = IA1 (ω1 )T (ω1 , A2 )
Since T is a transition probability, then for each fixed A2 ∈ A2 the function T (·, A2 ) is A1 measurable. And A1 ∈ A1 is also measurable, which implies that IA1 is A1 -measurable. Therefore
Tω1 ((A1 × A2 )ω1 ) = IA1 (ω1 )T (ω1 , A2 ) as a function of ω1 being A1 -measurable follows from the
closure theorem for measurable functions. Hence by definition of G we know that A1 ×A2 ∈ G and
this implies that G contains all measurable rectangles. Note that the collection of all measurable
functions given in Problem 1 is a π-system.
(2) G is a λ-system. It suffices to show that
• Ω ∈ G. Write
Tω1 (Ωω1 ) = Tω1 (Ω2 ) = 1
149
because for all fixed ω1 Tω1 (·) is a probability measure. Therefore Tω1 (Ωω1 ) as a function of
ω1 is a constant function and hence is A1 -measurable. Also it is clear that Ω ∈ A. Therefore
Ω ∈ G by definition of G.
• G is closed under complementation. Suppose A ∈ G. Then A ∈ A implies that Ac ∈ A.
Recall the definition of section
Aω1 = {ω2 : (ω1 , ω2 ) ∈ A for fixed ω1 }
Then
(Ac )ω1 = {ω2 : (ω1 , ω2 ) ∈ Ac for fixed ω1 }
= {ω2 : (ω1 , ω2 ) ∈ A for fixed ω1 }c = (Aω1 )c
Therefore we may write
Tω1 ((Ac )ω1 ) = Tω1 ((Aω1 )c ) = 1 − Tω1 (Aω1 )
Since A ∈ G then Tω1 (Aω1 ) as a function of ω1 is A1 -measurable. Therefore Tω1 ((Ac )ω1 )
as a function of ω1 is also A1 -measurable due to the closure theorem. Hence Ac ∈ G by
definition.
• G is closed under countable disjoint union(
P
). Suppose (En )n≥1 ⊂ G is a sequence
∞
X
of disjoint sets in G. Then En ∈ A for all n and this implies that
En ∈ A. Write
n=1
∞
X
!
(
En
n=1
=
ω2 : (ω1 , ω2 ) ∈
∞
X
)
En for fixed ω1
n=1
ω1
Now we know that (ω1 , ω2 ) ∈
∞
X
En if and only if there exists a unique n0 such that
n=1
(ω1 , ω2 ) ∈ En0 . This is equivalent to ω2 ∈ (En0 )ω1 since we are fixing ω1 . And this is
equivalent to that there exists a unique n0 such that ω2 ∈ (En0 )ω1 . And this is equivalent
∞
X
to ω2 ∈
(En )ω1 . Therefore
n=1
∞
X
n=1
!
(
En
=
ω2 : (ω1 , ω2 ) ∈
∞
X
)
En for fixed ω1
n=1
ω1
(
=
ω2 : ω2 ∈
)
∞
X
(En )ω1 for fixed ω1
n=1
=
∞
X
(En )ω1
n=1
Recall that Tω1 is a transition probability. Then by countable additivity we have
! !
!
∞
∞
∞
X
X
X
Tω1
En
= Tω1
(En )ω1 =
Tω1 ((En )ω1 )
n=1
n=1
ω1
150
n=1
Define
X(ω1 ) = Tω1
∞
X
! !
En
n=1
XN (ω1 ) =
N
X
Tω1 ((En )ω1 )
n=1
ω1
Clearly we have XN → X as N → ∞. Since each Tω1 ((En )ω1 ) as a function of ω1 is A1 N
X
measurable because En ∈ G, then XN =
Tω1 ((En )ω1 ) is A1 -measurable. Therefore the
n=1
limit function X of (XN ) is also A1 -measurable by the closure theorem.
Now we can conclude from the π − λ theorem that G = σ(R). Namely, we have
PROPOSITION 4.3.2 Let (Tω1 )ω1 ∈Ω1 be a transition probability from (Ω1 , A1 ) to (Ω2 , A2 )
and A = A1 ⊗ A2 . Then for all A ∈ A the mapping ω1 7→ Tω1 (Aω1 ) is measurable with respect
to A1 .
THEOREM 4.3.1 Suppose M is a probability measure on (Ω1 , A1 ) and T is a transition
probability from (Ω1 , A2 ) to (Ω2 , A2 ). Then there exists a set function M T : A = A1 ⊗ A2 →
[0, 1], defined by
Z
M T (A) =
Tω1 (Aω1 )M (dω1 )
such that M T is a probability measure on the product measurable space (Ω1 × Ω2 , A1 ⊗ A2 ), and
it is named the composition of M and T . It is the unique probability measure on the product
space such that
Z
M T (A1 × A2 ) =
Tω1 (A2 )M (dω1 )
A1
for all A1 ∈ A1 , A2 ∈ A2 .
Proof. First we check that M T is a probability measure. Clearly M T is nonnegative because
T cannot take negative valeus. And
Z
Z
Z
M T (Ω) = Tω1 (Ωω1 )M (dω2 ) = T (ω1 , Ω2 )M (dω2 ) = M (dω2 ) = 1
Now suppose (An )n≥1 is a sequence of sets in A. Then
! Z
! !
Z
∞
∞
X
X
MT
An = Tω1
An
M (dω2 ) = Tω1
n=1
n=1
=
Z X
∞
∞
X
!
(An )ω1
M (dω2 )
n=1
ω1
Tω1 ((An )ω1 ) M (dω2 ) =
n=1
∞ Z
X
n=1
Tω1 ((An )ω1 ) M (dω2 ) =
∞
X
M T (An )
n=1
Where we have used Monotone Convergence Theorem, countable additivity of T (ω1 , ·) and that
sectioning commutes with sum over disjoint sets. Therefore M T is a probability measure on
151
(Ω1 × Ω2 , A1 ⊗ A2 ). For the uniqueness, we know that two probability measures taking same
values over a generating π-system must be the same, and we already know that M T has been
uniquely determined over the π-system R, which generates the product σ-field. Therefore the
uniquness is guaranteed.
Now we introduce the notation
Z
hP, Xi = XdP
Z
hM, T Xi = hM,
Xω1 (ω2 )Tω1 (dω2 )i
THEOREM 4.3.2 (The Law of Iterated Integral) Suppose X : Ω → R is a nonnegative random variable on the product space (Ω1 × Ω2 , A = A1 × A2 ), and M is a probability measure
on (Ω1 , A1 ), T is a transition probability from Ω1 to Ω2 .
Z
T X(ω1 ) := Xω1 (ω2 )Tω1 (dω2 ) is A1 -measurable and
Z Z
Z
X(ω)M T (dω) =
Then the map
Xω1 (ω2 )Tω1 (dω2 ) M (dω1 )
Proof.Z Let G denote the collection of A measurable nonnegative random variables X such that
ω1 7→
Xω1 (ω2 )Tω1 (dω2 ) is A1 measurable and hM T, Xi = hM, T Xi. Then it suffices to show:
Ω2
• IA ∈ G for every A ∈ A. It suffices to show that
– T IA (ω1 ) is A1 -measurable for all A ∈ A. Suppose A ∈ A. By definition we have
Z
Z
T IA (ω1 ) =
(IA )ω1 Tω1 (dω2 ) =
IAω1 Tω1 (dω2 ) = Tω1 (Aω1 )
Ω2
Ω2
Now we know that F = {A ∈ A : ω1 7→ Tω1 (Aω1 ) is A1 measurable} is a collection that
itself is a λ-system as well as containing the π-system R of all measurable rectangles.
Therefore by the π − λ theorem we know that F ⊃ σ(R) = A. Then it is clear that for
all A ∈ A we have that T IA (ω1 ) = Tω1 (Aω1 ) is A1 -measurable by the definition of F.
Z
Z
–
IA (ω)M T (dω) =
T IA (ω1 )M (dω1 ). By definition of M T (A) we have that
Ω
Ω1
Z
Z
IA (ω)M T (dω) = M T (A) =
Ω
Z
T IA (ω1 )M (dω1 ) = hM, T IA i
Tω1 (Aω1 )M (dω1 ) =
Ω1
Ω1
Therefore we have shown that
Z
T IA (ω1 ) =
(IA )ω1 (ω2 )Tω1 (dω2 ) is A1 measurable for all A ∈ A
Ω2
And that hM T, IA i = hM, T IA i. Therefore IA ∈ G by definition of G for all A ∈ A.
152
• G includes all nonnegative simple random variables. It suffices to show that for all
A, B ∈ A and for all a, b ≥ 0 we have aIA + bIB ∈ G. And the induction can complete
the rest. Write A1 = A\(A ∩ B), B1 = B\(A ∩ B), C = A ∩ B. Then aIA + bIB =
aIA1 + (a + b)IC + bIB1 . Next we know that A1 , B1 , C are disjoint, then
Z
T (aIA + bIB )(ω1 ) = (aIA1 + bIB1 + (a + b)IC )ω1 (ω2 )Tω1 (dω2 )
Z
Z
Z
= a (IA1 )ω1 Tω1 (dω2 ) + b (IB1 )ω1 Tω1 (dω2 ) + (a + b) (IC )ω1 Tω1 (dω2 )
= aTω1 ((A1 )ω1 ) + bTω1 ((B1 )ω1 ) + (a + b)Tω1 ((C)ω1 )
Therefore by closure theorem we know that T (aIA + bIB ) is A1 -measurable. Next write
Z
Z
Z
hM T, aIA + bIB i = (aIA + bIB )M T (dω) = a IA M T (dω) + b IB M T (dω)
= ahM T, IA i + bhM T, IB i = ahM, T IA i + bhM, T IB i
Z
= (aT IA (ω2 ) + bT IB (ω2 ))M (dω2 ) = hM, aIA + bIB i
• G includes all nonnegative random variables. Suppose X is a nonnegative random
variables. Then there exists a sequence of simple random variables (Xn )n≥1 that increases
to X. Clearly one has that ((Xn )ω1 )n≥1 increases to Xω1 . Then by Monotone Convergence
Theorem one has
T X(ω2 ) =
Z Z
lim Xn (ω1 )Tω1 (dω2 ) =
lim (Xn )ω1 (ω2 )Tω1 (dω2 )
n→∞
Z
= lim (Xn )ω1 (ω2 )Tω1 (dω2 ) = lim T Xn
n→∞
n→∞
n→∞
By the closure theorem we know that T X is A1 -measurable. And we know that Xn+1 ≥
Xn , which implies that (Xn+1 )ω1 ≥ (Xn )ω1 . Then by monotonicity of integral we have
T Xn+1 ≥ T Xn . Hence we obtain that (T Xn )n≥1 increases to T X. Then again by Monotone
Convergence Theorem we have
Z
hM T, Xi = hM T, lim Xn i =
Z
lim Xn (ω)M T (dω) = lim
Xn (ω)M T (dω)
n→∞
Z
= lim hM T, Xn i = lim hM, T Xn i = lim (T Xn )(ω1 )M (dω1 )
n→∞
n→∞
n→∞
Z
Z
=
lim T Xn (ω1 )M (dω1 ) = T X(ω1 )M (dω1 ) = hM, T Xi
n→∞
n→∞
n→∞
Therefore we conclude that X ∈ G.
153
THEOREM 4.3.3 (Generalized Law of Iterated Integral) Let (Ω1 , A1 , M ) be a probability space, (Ω2 , A2 , Tω1 ) be a probability space with transition probability T . Let M T be the
composition probability of M and T . Suppose X is quasi-integrable with respect to M T . Define
T X : Ω1 → [−∞, ∞] by
(T X)(ω1 ) =
Z


Xω1 Tω1 (dω2 )
if Xω1 ∈ Q(Tω1 )

0
if Xω1 ∈
/ Q(Tω1 )
Then T X is quasi-integrable with respect to M and hM T, Xi = hM, T Xi.
Proof. Set G = {T X + < ∞} ∪ {T X − < ∞} and note that T X = (T X + )IG − (T X − )IG . Then
we know that T X is A1 -measurable. Next we need to show that T X is M -quasi-integrable.
• If X ∈ Q− (M T ). Then hM T, X − i = hM, T X − i < ∞ by the law of iterated integral.
Therefore T X ∈ Q− (M ). Therefore M {T X − < ∞} = 1 and M (G) = 1.
• If X ∈ Q+ (M T ). Then hM T, X + i = hM, T X + i < ∞ by the law of iterated integral.
Therefore T X ∈ Q+ (M ). Therefore M {T X + < ∞} = 1 and M (G) = 1.
Therefore we know that M (G) = 1 and that T X ∈ Q(M ). Hence we have
hM T, Xi = hM T, X + i − hM T, X − i By X ∈ Q(M T )
= hM, T X + i − hM, T X − i By the law of iterated integral
= hM, (T X + )IG i − hM, (T X − )IG i Since G is of probability 1
= hM, (T X + )IG − (T X − )IG i Since hM, T X + IG i < ∞ or hM, T X − IG i < ∞
= hM, T Xi
4.4
Mixture Probabilities
DEFINITION 4.4.1 Let M be a probability measure on (Ω1 , A1 ) and K be a transition
probability from (Ω1 , A1 ) to (Ω2 , A2 ). The probability measure (M K)π2−1 , induced by the the
projection π2 : (ω1 , ω2 ) 7→ ω2 , is called the marginal of M K on (Ω2 , A2 ). We use (M K)2 to
denote (M K)π2−1 . Here (M K)2 is called the M -mixture of the probabilities (Kω1 )ω1 ∈Ω1 and M
is called the mixing distribution, K the kernel of the mixture.
PROPOSITION 4.4.1 Let M be a probability measure on (Ω1 , A1 ) and K be a transition
probability from (Ω1 , A1 ) to (Ω2 , A2 ). Then for all A2 ∈ A2 we have
Z
(M K)2 (A2 ) = K(ω1 , A2 )M (dω1 )
154
Proof. Write
Z
(M K)2 (A2 ) =
IA2 (ω2 )(M Kπ2−1 )
Z
(IA2 ◦ π2 )(ω1 , ω2 )(M K) = hM K, IA2 ◦ π2 i
=
Ω2
Ω
by the change of variables formula. Then
(M K)2 (A2 ) = hM K, IA2 ◦ π2 i = hM, K(IA2 ◦ π2 )i
Z Z
=
(IA2 ◦ π2 )ω1 Kω1 (dω2 ) M (dω1 )
Ω1
Ω2
Z Z
=
IA2 (ω2 )Kω1 (dω2 ) M (dω1 )
Ω1
Ω2
Z
Z
Kω1 (A2 )M (dω1 ) =
K(ω1 , A2 )M (dω1 )
=
Ω1
Ω1
PROPOSITION 4.4.2 Let M be a probability measure on (Ω1 , A1 ) and K be a transition
probability from (Ω1 , A1 ) to (Ω2 , A2 ). Suppose X ∈ Q(M K)2 . Then
Z Z
h(M K)2 , Xi =
X(ω2 )Kω1 (dω2 ) M (dω1 )
Ω1
Ω2
proof. By the change of variables one has
Z
Z
h(M K)2 , Xi = hM K, X ◦ π2 i =
Ω1
(X(ω2 ))ω1 Kω1 (dω2 ) M (dω1 )
Ω2
Since (X(ω2 ))ω1 = X(ω2 ), it follows that
Z Z
Z
h(M K)2 , Xi =
(X(ω2 ))ω1 Kω1 (dω2 ) M (dω1 ) =
Ω1
4.5
Ω2
Z
X(ω2 )Kω1 (dω2 ) M (dω1 )
Ω1
Ω2
Product Distributions
THEOREM 4.5.1 (Little Fubini’s Theorem) Let (Ωi , Ai , Pi ) be probability spaces with
i = 1, 2. Then there eixsts a unique probability measure P := P1 × P2 , called the product
probability measure of P1 and P2 , on (Ω1 × Ω2 , A1 ⊗ A2 ), such that
P (A1 × A2 ) = P1 (A1 ) × P2 (A2 )
for all A1 × A2 ∈ R
Moreover, if X ∈ Q(P ), then
Z Z
Z
Z
Xω2 P1 (dω1 ) P2 (dω2 ) = XdP =
Ω2
Ω1
Ω1
155
Z
Ω2
Xω1 P2 (dω2 ) P1 (dω1 )
Proof. Define transition probabilities T1 from Ω1 to Ω2 and T2 from Ω2 to Ω1 by
T1 (ω1 , A2 ) = P2 (A2 ) for all ω1 ∈ Ω1 , A2 ∈ A2
T2 (ω2 , A1 ) = P1 (A1 ) for all ω2 ∈ Ω2 , A1 ∈ A1
Then
Z
P1 T1 (A1 × A2 ) =
T1 (ω1 , A2 )M (dω1 ) = P1 (A1 )P2 (A2 )
A1
Z
T2 (ω2 , A1 )M (dω2 ) = P2 T2 (A1 × A2 )
=
A2
Therefore we see that P = P1 T1 = P2 T2 . Uniqueness follows from that P is unique in the
π-system R generating the product σ-field A1 ⊗ A2 .
Now consider the iterated integral. From the proof of the generalized law of iterated integral
we know that T1 X + < ∞ or T1 X + < ∞ with probability 1, namely, Xω1 (ω2 ) is quasi-integrable
with respect to T1 for ω1 ∈ Ω1 a.s.. Similarly we have that Xω2 (ω1 ) is quasi-integrable with
respect to T2 for ω2 ∈ Ω2 a.s.. Therefore, if we use the notation T X defined in the generalized
law of iterated integral, we have
Z
Xω2 (ω1 )P1 (dω1 ) = T2 X
Z
a.s. and
Ω1
Xω1 (ω2 )P2 (dω2 ) = T1 X
a.s.
Ω2
Therefore we can apply the generalized law of iterated integral
Z
Z Z
XdP = hP1 T1 , Xi = hP1 , T1 Xi =
Xω1 P2 (dω2 ) P1 (dω1 )
Ω1
Ω2
Z
Z Z
XdP = hP2 T2 , Xi = hP2 , T2 Xi =
Xω2 P1 (dω1 ) P2 (dω2 )
Ω2
Ω1
THEOREM 4.5.2 Suppose X, Y are indepdendent random variables on (Ω, F, P ). Then
P (X, Y )−1 = P X −1 × P Y −1
Proof. First we consider measurable rectangles in R̄2 with Borel σ-fields in R̄2 . Suppose A, B
are two measurable sets in R̄. Then
P (X, Y )−1 (A × B) = P (X ∈ A, Y ∈ B) = P (X ∈ A)P (Y ∈ B) = (P X −1 )(P Y −1 )
Since P (X, Y )−1 = (P X −1 )(P Y −1 ) on the π-system generating the Borel σ-fields, which is the
collection of all measurable rectangles, it follows that P (X, Y )−1 = (P X −1 )(P Y −1 ).
The next theorem is one of the most fundamental theorem in probability theory other than
in general measure theory. It connects independence with expectation.
156
THEOREM 4.5.3 Let X, Y be two independent random variables. Then
(a) If X, Y ≥ 0 then E(XY ) = (EX)(EY ).
(b) If X, Y are integrable then XY is integrable and E(XY ) = (EX)(EY ).
Proof.
(a) Set f (x, y) = xy. Since X, Y ≥ 0 then XY ≥ 0 and therefore XY ∈ Q(P ). Use the change
of variables we have
Z
Z
f ◦ (X, Y )dP =
E(XY ) =
−1
f d(P (X, Y ) ) =
f d(P X −1 × P Y −1 )
R̄2
R̄2
R̄2
Z
By Little Fubini’s theorem we have
Z
Z Z
−1
−1
−1
E(XY ) =
f d(P X × P Y ) =
xyP X (dx) P Y −1 (dy)
2
R̄
R̄
R̄Z
Z
−1
−1
=
xP X (dx)
yP Y (dy) = (EX)(EY )
R̄
R̄
(b) Since E|X| < ∞, E|Y | < ∞, then E|XY | = E|X||Y | = (E|X|)(E|Y |) by (a). Therefore XY
is integrable. Therefore we can apply the computation above again and obtain the result.
157
Chapter 5
Law of Large Numbers
5.1
Simple Limit Theorems
THEOREM 5.1.1 Let (An )∞
n=1 be a sequence of pairwise independent events with common
probability p. Define
Nn =
n
X
IAk
p̂n =
k=1
Nn
n
r
Then p̂n converges to p in L and in probability.
Proof. First we prove that p̂n converges to p in L2 . In fact we can compute
!
n
n
1X
1X
E p̂n = E
Ik =
P (Ak ) = p
n k=1
n k=1
And this leads to
1
p(1 − p) → 0
n
is pairwise independent and the sum of variances formula
kp̂n − pk2 = Var(p̂n ) =
as n → ∞ since the sequence (Ak )∞
k=1
can apply. Therefore p̂n converges to p in L2 . This implies that p̂n converges to p in probability.
And we have that
n
p̂n =
1X
Ik ≤ 1
n k=1
and the constant random variable Y = 1 ∈ Lr for all r > 0. This means that p̂n is Lr dominated
by Y . Therefore we know that p̂n converges to p in Lr for all r > 0.
THEOREM 5.1.2 ((L2 LLN for uncorrelated r.v.’s with bdd var)) If (Xn )∞
n=1 is a
sequence of uncorrelated random variables with EXn2 < ∞ for each n and
M := sup VarXn < ∞
n≥1
158
Then
Sn − ESn L2
→0
n
Pn
where Sn = k=1 Xk . It implies convergence in probability.
Proof. We need to compute
n
X
Sn − ESn 2
M
= 1 Var(Sn ) = 1
→0
Var(Xk ) ≤
2
2
n
n
n k=1
n
2
as n → ∞. Therefore we know that
Sn − ESn L2
→0
n
and as a consequence, converges in probability.
THEOREM 5.1.3 (Exercise 5.1.1) For any sequence of random variables (Xn )∞
n=1 , if Xn
converges to 0 in L2 then
Sn − ESn L2
→0
n
and converges in probability.
Proof. To see this, we only need to compute
n
X
Sn − ESn 1
1
= kSn − ESn k2 = (Xn − EXn )
n
n
n k=1
2
2
n
n
1X
1X
≤
kXn − EXn k2 ≤
(kXn k2 + |EXn |)
n k=1
n k=1
Since we know that EXn2 → 0 as n → ∞ and EXn2 ≥ |EXn |2 , then |EXn | → 0 as n → ∞.
Therefore kXn k2 + |EXn | → 0 as n → ∞. This yields that
n
X
Sn − ESn ≤ 1
(kXn k2 + |EXn |) → 0
n
n k=1
2
as n → 0. And as a consequence the convergence in probability also holds.
Concerning the strong law of large numbers, the following lemma is quite useful
LEMMA 5.1.1 (Theorem 4.2.2) A sequence of random variables (Zn )∞
n=1 converges to 0
a.s. if and only if for all ε > 0 we have P {|Zn | > ε i.o. } = 0.
159
Proof. We first establish the relation between the two objects
{Zn 6→ 0} = {There exists ε > 0, for all n ≥ 1 there exists k ≥ n such that |Zk | ≥ ε}
∞ [
∞
∞ \
∞ [
∞
[\
[
=
{|Zk | ≥ ε} =
{|Zk | ≥ 1/m}
=
ε>0 n=1 k=n
∞
[
m=1 n=1 k=n
{|Zn | ≥ ε i.o. }
m=1
Suppose we have that Zn → 0 a.s., then we know that P {Zn 6→ 0} = 0. For all ε > 0 there
exists some m0 ≥ 1 with 1/m < ε and
{|Zn | ≥ ε i.o. } ⊂ {|Zn | ≥ 1/m0 i.o. } ⊂
∞
[
{|Zn | ≥ 1/m i.o. } = {Zn 6→ 0}
m=1
And this shows that P {|Zn | ≥ ε i.o. } = 0.
Conversely suppose we have that P {|Zn | ≥ ε i.o. } = 0 for all ε > 0. Then for all integer
m ≥ 1 we have that P {|Zn | ≥ 1/m i.o. } = 0. Therefore
!
∞
∞
X
[
P {|Zn | ≥ 1/m i.o. } = 0
P {Zn 6→ 0} = P
{|Zn | ≥ 1/m i.o. } ≤
m=1
m=1
This means that Zn → 0 a.s..
THEOREM 5.1.4 (SLLN for uncorrelated r.v.’s with bdd vars) For any sequence of
uncorrelated random variables (Xn )∞
n=1 , if
M := sup VarXn < ∞
n≥1
Then
Sn − ESn a.s.
→0
n
Proof. Without loss of generality we may assume that ESn = 0. If ESn 6= 0 then we may
consider Sn − ESn with E(Sn − ESn ) = 0.
We seek to find a subsequence (Snk )∞
k=1 with Snk /nk converges to 0 a.s.. By previous lemma
we know that (Snk − ESnk )/nk → 0 a.s. if and only if
P {|Snk − ESnk | ≥ nk ε i.o. } = 0
for all ε > 0. It suffices to show that
∞
X
P {|Snk − ESnk | ≥ nk ε} < ∞
k=1
160
by the first Borel-Cantelli lemma. By Chebyshev’s inequality it suffices to let that
∞
X
P {|Snk
k=1
∞
X
Var(Snk )
− ESnk | ≥ nk ε} ≤
<∞
2n 2
ε
k
k=1
Namely, we need
∞
X
Var(Snk )
<∞
2
n
k
k=1
(∗)
In order to let Sn /n converges to 0 a.s., we need to consider an upper bound for
Dk =
max
nk ≤j<nk+1
|Sj − Snk |
In fact
nk+1 −1
Dk2
≤
2
max
nk ≤j<nk+1
(Sj − Snk ) ≤
X
(Sj − Snk )2
j=nk
And this yields that
nk+1 −1
EDk2
=
X
E(Sj − Snk )2
j=nk
nk+1 −1
=
X
(Var(Sj ) − Var(Snk )2 )
j=nk
nk+1 −1
≤
X
(Var(Snk+1 ) − Var(Snk ))
j=nk
= (nk+1 − nk )(Var(Snk+1 ) − Var(Snk ))
Now we know that for each n between nk and nk+1
|Sn |
1
1
≤ (|Snk | + Dk ) ≤ (|Snk + Dk |)
n
n
nk
For Sn /n converges to 0 a.s., it suffices to let Dk /nk converges to 0 a.s.. Again by the previous
lemma, the first Borel-Cantelli lemma and the Markov’s inequality we see for all ε > 0
P {|Dk | ≥ nk ε i.o. } = 0
it suffices to let
∞
X
P {|Dk | ≥ nk ε} ≤
k=1
∞
X
ED2
k
2 2
n
ε
k
k=1
<
∞ X
nk+1 − nk
n2k ε2
k=1
(Var(Snk+1 ) − Var(Snk )) < ∞
Therefore it suffices to let
∞ X
nk+1 − nk
k=1
n2k
(Var(Snk+1 ) − Var(Snk )) < ∞
161
Now that we know
nk+1 −1
Var(Snk+1 ) − Var(Snk ) =
X
Var(Xj ) ≤ (nk+1 − nk )M
j=nk
Then it suffices to let
2
∞ ∞ X
X
nk+1 − nk
nk+1 − nk
(∗∗)
M <∞
(Var(Snk+1 ) − Var(Snk )) ≤
n2k
nk
k=1
k=1
Take nk = k 2 and then (∗) and (∗∗) hold, yielding that Dk /nk converges to 0 a.s. and Snk /nk
converges to 0 a.s.. Hence
|Sn |
1
≤ (|Snk | + Dk )
n
nk
shows that Sn /n converges to 0 a.s.. because as n → ∞ we know that k → ∞ since nk = k 2 ≤
n < nk+1 = (k + 1)2 .
THEOREM 5.1.5 (SLLN for uncorrelated r.v.’s) For any sequence of uncorrelated ranθ
dom variables (Xn )∞
n=1 , if Var(Sn ) = O(n ), 0 ≤ θ < 1/2 then
Sn − ESn a.s.
→0
n
Proof. We first assert that
Var(Sn )
1
=
n→∞
n1+θ
1+θ
In fact by the mean-value theorem we have
lim
(n + 1)1+θ − n1+θ = (1 + θ)(n + δ(n))θ
0 ≤ δ(n) ≤ 1
Then we see that
Var(Sn+1 ) − Var(Sn )
Var(Xn+1 )
1
=
lim
=
n→∞ (n + 1)1+θ − n1+θ
n→∞ (1 + θ)(n + δ(n))θ
1+θ
lim
By the Stolz theorem we know that
lim
n→∞
Var(Sn )
1
=
1+θ
n
1+θ
Take nk = k 2 . Without loss of generality we may assume that ESn = 0. Then
∞
X
P {|Snk | ≥ nk ε} ≤
k=1
Since
lim
k→∞
Var(Snk )
n2k ε2
∞
X
Var(Snk )
<∞
2 2
n
ε
k
k=1
Var(Sk2 ) 2−2θ
Var(Sk2 )
k
=
lim
= constant
k→∞
k→∞ k 2+2θ ε2
k 4 ε2
(k 2−2θ ) = lim
162
Then by the comparison criterion we obtain the desired result. By previous lemma and the first
Borel-Cantelli lemma we know that
Sk2 a.s.
→0
k2
Furthermore consider
Dk =
max
k2 ≤j<(k+1)2
|Sj − Sk2 |
Then

EDk2 = E
max
k2 ≤j<(k+1)2

(k+1)2 −1
X
(Sj − Sk2 )2 ≤ E 
(Sj − Sk2 )2 
j=k2
≤ E ((k + 1)2 − k 2 )(S(k+1)2 − Sk2 )2
= ((k + 1)2 − k 2 )E(S(k+1)2 − Sk2 )2


(k+1)2
X
= ((k + 1)2 − k 2 )Var 
Xj 
j=k2
(k+1)2
2
2
= ((k + 1) − k )
X
Var (Xj )
j=k2
which is yielded by the uncorrelatedness of (Xn )∞
n=1 . Note that for all ε > 0 we have
∞
X
2
P {|Dk | ≥ k ε} ≤
k=1
∞
X
ED2
k=1
k
k 4 ε2
2
=
(k+1)
∞
X
2k + 1 X
k=1
k 4 ε2
Var(Xj )
j=k2
(k+1)2
∞
X
(2k + 1)2 X Var(Xj )
=
k 4 ε2
2k + 1
2
k=1
j=k
2
(k+1)
∞
X
(2k + 1)2 X Var(Xj )/k 2θ
=
k 4−2θ ε2
2k + 1
2
k=1
j=k
and that
(k+1)2
lim
k→∞
X Var(Xj )/k 2θ
=1
2k + 1
2
j=k
Then we see that


2
2 (k+1)
2θ
2
X
(2k
+
1)
Var(X
)/k
(2k
+
1)
j
2−2θ
k
lim  4−2θ 2
= lim
k 2−2θ = constant
4−2θ ε2
k→∞
k→∞
k
ε
2k
+
1
k
2
j=k
Therefore by comparison criterion we know that
∞
X
P {|Dk | ≥ k 2 ε2 } < ∞
k=1
163
And by previous lemma and the first Borel-Cantelli lemma we know that
Dk a.s.
→0
k2
Now for all n there exists some k such that k 2 ≤ n < (k + 1)2 . Then
1
|Sn |
≤ 2 (|Sk2 | + Dk )
n
k
Now letting n → ∞ yields that k → ∞ and Sn /n converges to 0 a.s..
∞
THEOREM 5.1.6 For any sequence of uncorrelated random variables (Xn )∞
n=1 if (bn )n=1
increases and there exists a sequence (nk )∞
k=1 such that
∞
X
Var(Snk )
<∞
b2nk
k=1
and
∞
X
1
(nk+1 − nk )(Var(Snk+1 ) − Var(Snk ))
2
b
n
k
k=1
Then we have
1
a.s.
(Sn − ESn ) → 0
bn
Proof. Without loss of generality we may assume that ESn = 0. Consider
Dk =
max
nk ≤j<nk+1
|Sj − Snk |
Then
nk+1 −1
EDk2 = E
max
nk ≤j<nk+1
(Sj − Snk )2 ≤ E
X
!
(Sj − Snk )2
j=nk
nk+1 −1
=
X
E(Sj − Snk )2 ≤ (nk+1 − nk )E(Snk+1 − Snk )2
j=nk
nk+1
= (nk+1 − nk )Var(Snk+1 − Snk ) = (nk+1 − nk )
X
Var(Xj )
j=nk
= (nk+1 − nk )(Var(Snk+1 ) − Var(Snk ))
Now by the previous lemma and first Borel-Cantelli lemma we know
∞
X
P {|Snk | ≥ bnk ε} ≤
k=1
∞
X
Var(Snk )
<∞
2 ε2
b
n
k
k=1
And this implies that Snk /bnk converges to 0 a.s.. Similarly we have
∞
X
∞
∞
X
X
EDk2
1
P {|Dk | ≥ bnk ε} ≤
≤
(nk+1 − nk )(Var(Snk+1 ) − Var(Snk )) < ∞
2
2
2
b ε
b ε2
k=1
k=1 nk
k=1 nk
164
And the first Borel-Cantelli lemma and previous lemma implies that
Dk a.s.
→0
bnk
Therefore for all n there exists some k such that nk ≤ n < nk+1 . And note that
|Sn |
1
≤
(|Snk | + Dk )
bn
bnk
then letting n → ∞ yields k → ∞ and Sn /n converges to 0 a.s..
THEOREM 5.1.7 For any sequence of uncorrelated random variables (Xn )∞
n=1 , if Var(Sn ) =
O(nθ ), 0 ≤ θ < 1/2 then
Sn − ESn a.s.
→0
bn
if bn = O(n3/4+θ/2+δ ) for some δ > 0.
Proof. Due to the mean-value theorem we know that (1 + θ)Var(Sn ) = O(n1+θ ). Therefore if we
take nk = k 2 then
1
Var(Snk ) 1+4δ
n1+θ
1 1+4δ
k
k 1+4δ =
lim
k
= lim
k
= constant
3/2+θ+2δ
2
1+4δ
n→∞
n→∞
bnk
1+θk
(1 + θ)nk
This shows the first condition of the previous theorem holds by the comparison criterion because
∞
X
k=1
yielding that
1
k 1+4δ
< ∞,
δ>0
∞
X
Var(Snk )
<∞
2
b
n
k
k=1
For the second theorem compute
lim
k→∞
1
(nk+1 − nk )
b2nk
nk+1 −1
X
!
Var(Xj ) k 1+4δ
j=nk
nk+1 −1
X Var(Xj )/k 2θ
k 2θ
2
= lim
(n
−
n
)
k+1
k
k→∞
b2nk
nk+1 − nk
j=nk
2θ
k
2
= lim
(nk+1 − nk ) k 1+4δ
k→∞
b2nk
1
2
= lim
(2k + 1) k 1+4δ = constant
k→∞
k 3+4δ
!
k 1+4δ
Therefore the second condition holds due to comparison criterion similarly.
a.s.
Now by previous theorem we know that Sn /bn → 0.
165
5.2
Weak Law of Large Numbers
THEOREM 5.2.1 (5.2.2 WLLN in pairwise i.i.d. r.v.’s with E|X| < ∞) Let X, (Xn )∞
n=1
be identically distributed and pairwise independent random variables with E|X| < ∞. Then
Sn P
→ EX
n
P
P
Proof. For each n define Yn = Xn I{|X|<n} , ϕn (x) = nk=1 I{k≤x} and ϕ(x) = ∞
k=1 I{k≤x} . Clearly
we have that ϕn (x) → ϕ(x) for each x ∈ R and ϕn (x) ≤ ϕ(x) ≤ x. Now we are to prove:
P
• nk=1 (EYk )2 = o(n2 ). Since all Xn , X are identically distributed, we have that
EYn = EXn I{|Xn |<n} = EXI{|X|<n}
Since for all ω ∈ Ω we have that XI{|X|<n} (ω) → X(ω)a s n → ∞, by the Dominated
Convergence Theorem we know that
EYn = EXI{|X|≤n} → EX
as n → ∞
Now by Cesaro’s theorem we know that
n
1
1 X
(EYn )2 = lim
lim 2
n→∞ n
n→∞ n
k=1
P
Therefore we know that nk=1 (EYn )2 = o(n2 ).
P
• nk=1 EYk2 = o(n2 ). First compute
n
X
k=1
EYk2
=
n
X
EXk2 I{|Xk |<k}
=
=E
X2
n
X
!
=0
EX 2 I{|X|<k}
!k=1
k=1
n
X
n
1X
(EYn )2
n k=1
I{|X|<k}
= E(|X|2 (n − ϕn (|X|)))
k=1
Now consider the following inequality
sup x(n − ϕn (x)) ≤
x≥0
max
≤ n2
n(n−ϕn (x))
In order to attain the goal we need a second truncation
√
E(|X|2 (n − ϕn (|X|))I{|X|≤√n} ) ≤ E( n|X|(n − ϕn (|X|))I{|X|≤n2 } )
√
≤ E(n n|X|) = µn3/2
And
E(|X|2 (n − ϕn (|X|))I{|X|>√n} ) ≤ E(|X|(n − ϕn (|X|))|X|I{|X|>√n} )
≤ n2 E(|X|I{|X|>√n} )
166
This shows that
because |X|I|X|>√n
E(|X|2 (n − ϕn (|X|)))
lim
=0
n→∞
n2
≤ |X| and by Dominated Convergence Theorem
E(|X|I{|X|>√n} ) → 0 as n → ∞
Therefore we have obtained
Pn
2
E(X 2 (n − ϕn (|X|)))
k=1 EYk
lim
=
lim
=0
n→∞
n→∞
n2
n2
P
P
P
• nk=1 Var(Yk ) = o(n2 ). Since nk=1 EYk2 = o(n2 ) and nk=1 (EYk )2 = o(n2 ), then we have
that
n
X
Var(Yk ) ≤
k=1
n
X
2
(EYk ) +
k=1
n
X
EYk2 = o(n2 )
k=1
Pn
Therefore we have shown that k=1 Var(Yk ) = o(n2 ).
P
P
• nk=1 (Yk − EYk )/n → 0 as n → ∞. By the Chebyshev’s inequality for all ε > 0 we have
( n
)
!
n
n
X
X
1
1 X
P (Yk − EYk ) ≥ nε ≤ 2 2 Var
(Yk ) = 2 2
Var(Yk ) → 0
n
ε
n
ε
k=1
k=1
k=1
Pn
as n → ∞. Therefore we know that k=1 (Yk − EYk )/n → 0 in probability.
P
P
P
P
• nk=1 (Xk −EXk )/n → 0 as n → ∞. Denote Sn = nk=1 (Xk −EXk ), Tn = nk=1 (Yk −EYk ).
First note that we have EXk −EYk = EXI{|X|≥k} and by Dominated Convergence Theorem
it converges to 0 as k → ∞ because |XI{|X|≥k} ≤ |X|. Now we have that
n
n
Sn − Tn
1X
1X
=
(Xk − Yk ) −
(EXk − EYk )
n
n k=1
n k=1
Now by the first Borel Cantelli lemma we have that
∞
X
P {Xn 6= Yn } =
n=1
=
∞
X
n=1
∞
X
P {|Xn | ≥ n} =
EI{|X|≥n} = E
n=1
∞
X
P {|X| ≥ n}
n=1
∞
X
!
I{n≤|X|}
n=1
= Eϕ(|X|) ≤ E|X|
which implies that P {Xn 6= Yn i.o. } = 0. Therefore there exists some n0 such that for all
n ≥ n0 we have Xn = Yn a.s.. Therefore Xn − Yn → 0 as n → ∞ and by Cesaro’s Theorem
we know that
n
1X
a.s.
(Xk − Yk ) → 0
lim
n→∞ n
k=1
167
Therefore
n
n
1X
Sn − Tn
1X
P
=
(Xk − Yk ) −
(EXk − EYk ) → 0
n
n k=1
n k=1
P
a.s.
P
Since Tn /n → 0 and Sn /n − Tn /n → 0, it follows that Sn /n → 0. Namely, we have proven
that
Pn
k=1 (Xk
− EXk )
n
5.3
P
→0
Random Series
LEMMA 5.3.1 (Cauchy Criterion for a.s. Convergence) A sequence of random variables
(Xn )∞
n=1 converges a.s. if and only if for all ε > 0
lim lim P
max |Xn − XN | ≥ ε = 0
N →∞ m→∞
N ≤n≤m
Proof. By Cauchy criterion for real sequence we can write Xn does not converge if and only if
there exists some ε0 , such that for all N ∈ N there exists n, p ≥ N such that |Xn − Xp | ≥ ε0 .
Namely,
{ω : Xn (ω) does not converge} =
=
=
∞ [ \
ε>0 N =1
∞
[ \
max |Xn − Xp | ≥ ε
n,m≥N
∞ [
ε>0 N =1 m=N
∞
∞ [
∞ \
[
∞ [
∞ \
N =1 m=N
increases and thus
P
max |Xn − Xp | ≥ ε
N ≤n,p≤m
k=1 N =1 m=N
As k increases the set
1
max |Xn − Xp | ≥
N ≤n,p≤m
k
1
max |Xn − Xp | ≥
N ≤n,p≤m
k
(Xn )∞
n=1
converges a.s. if and only if for all ε we have
!
!
∞ [
∞ ∞ \
[
max |Xn − Xp | ≥ ε
= lim P
max |Xn − Xp | ≥ ε
N =1 m=N
N ≤n,p≤m
N →∞
m=N
N ≤n,p≤m
= lim lim P
N →∞ m→∞
168
max |Xn − Xp | ≥ ε
N ≤n,p≤m
=0
Now we have the following containing relation
max |Xn − Xp | > ε ⊂
max (|Xn − XN | − |XN − Xp |) > ε
N ≤n,p≤m
N ≤n,p≤m
=
max |Xn − XN | + max |XN − Xp | > ε
N ≤n≤m
N ≤p≤m
ε
ε
⊂
max |Xn − XN | >
∪ max |XN − Xp | >
N ≤n≤m
N ≤p≤m
2
2
ε
=
max |Xn − XN | >
N ≤n≤m
2
ε
⊂
max |Xn − Xp | ≥
N ≤n,p≤m
2
Then we know that for all ε > 0,
max |Xn − Xp | ≥ ε
lim lim P
N →∞ m→∞
N ≤n,p≤m
=0
if and only if for all ε > 0,
max |Xn − XN | ≥ ε
lim lim P
N →∞ m→∞
N ≤n≤m
=0
LEMMA 5.3.2 (Kolmogorov’s Inequality) Let X1 , · · · , Xn be independent random vari2
2
for m = 1, · · · , n. Then for every ε > 0
= σm
ables with mean 0 and finite variances EXm
Pn
2
σm
P max |Sm | > ε ≤ m=1
1≤m≤n
ε2
Proof. Let C = {max1≤m≤n |Sm | ≥ ε}. Define random variable τ : Ω → {1, 2, · · · , n} to be
(
min{m : |Sm (ω)| ≥ ε} , ω ∈ C
τ (ω) =
n
, ω ∈ Cc
Then we see that for k = 1, 2, · · · , n − 1:
{τ = k} = {|S1 | < ε, |S2 | < ε, · · · , |Sk−1 | < ε, |Sk | ≥ ε}
And for k = n:
{τ = n} = {|S1 | < ε, |S2 | < ε, · · · , |Sn−1 | < ε}
Then
Sτ =
n
X
I{τ =k} Sk ∈ σ(X1 , · · · , Xn )
k=1
is also a random variable.
169
• First we claim that C ⊂ {|Sτ | ≥ ε}. Suppose ω ∈ C. If τ (ω) = n then we know that
|S1 (ω)| < ε, · · · , |Sn−1 (ω)| < ε and |Sn (ω)| = |Sτ (ω) (ω)| ≥ ε is given by ω ∈ C. If τ (ω) < n
then we know that there exists some k < n with τ (ω) = k and |Sk (ω)| = |Sτ (ω) (ω)| ≥ ε.
Therefore we have shown that C ⊂ {|Sτ | ≥ ε}.
• Second we claim that E(Sτ2 ) ≤ E(Sn2 ). We need to compute
E(Sn2 ) = E(Sτ + (Sn − Sτ ))2 = E(Sτ2 ) + E(Sn − Sτ )2 + 2E(Sτ (Sn − Sτ ))
≥ E(Sτ )2 + 2E(Sτ (Sn − Sτ ))
Since
E(Sτ (Sn − Sτ )) =
=
n
X
k=1
n
X
E(I{τ =k} Sk (Sn − Sτ )) =
n
X
E(I{τ =k} Sk (Sn − Sk ))
k=1
E(I{τ =k} Sk )E(Sn − Sk ) = 0
k=1
This is because
I{τ =k} Sk ∈ σ(X1 , X2 , · · · , Xk )
Sn − Sk =
n
X
Xi ∈ σ(Xk+1 , · · · , Xn )
i=k+1
and X1 , · · · , Xn are independent. By the splitting theorem we know that I{τ =k} Sk and
Sn − Sk are independent. Therefore we have shown that E(Sn2 ) ≥ E(Sτ2 ).
• By the Chebyshev’s inequality we have
E(Sn2 )
Var(Sn )
E(Sτ )2
≤
=
=
P (C) ≤ P {|Sτ | ≥ ε} ≤
2
2
ε
ε
ε2
Pn
m=1
ε2
2
σm
LEMMA 5.3.3 Let (Xn )∞
n=1 be a sequence of independent random variables with zero means.
P∞
Then k=1 Var(Xk ) ≤ ∞ implies
(∞
)
X
P
Xk < ∞ = 1
k=1
Proof. By the Kolmogorov’s inequality we can compute
(
) Pp
n
X
Var(Xk )
P max Xk ≥ ε ≤ k=m+1 2
m≤n≤p σ
k=m+1
Then
)
Pp
P∞
n
X
k=m+1 Var(Xk )
k=m+1 Var(Xk )
=
max Xk ≥ ε ≤ lim
p→∞
m≤n≤p ε2
ε2
(
lim P
p→∞
k=m+1
170
And this implies that
n
)
P∞
X
k=m+1 Var(Xk )
lim lim P max Xk ≥ ε ≤ lim
=0
m→∞ p→∞
m→∞
m≤n≤p ε2
k=m+1
P
By the Cauchy criterion we know that ∞
k=1 Xk converges a.s..
(
LEMMA 5.3.4 Let (Xn )∞
n=1 be a sequence of independent random variables with mean 0 and
uniformly bounded: supn≥1 |Xn | ≤ c. Then for all ε > 0 we have
(ε + c)2
1−
≤ P max |Sk | ≥ ε
1≤k≤n
Var(Sn )
Proof. Let C = {max1≤k≤m |Sk | ≥ ε} and σk = Var(Xk ). First we show
P
• E(Sτ2 ) = E(s2τ ) where sk = ki=1 Var(Xi ). First define S02 = s20 = 0 and compute
!
τ
X
2
E(Sτ2 − s2τ ) = E
(Sk2 − s2k ) − (Sk−1
− s2k−1 )
k=1
n
X
=E
!
2
I{k≤τ } ((Sk2 − s2k ) − (Sk−1
− s2k−1 ))
k=1
=
n
X
E I{k−1<τ } (Xk2 + 2Xk Sk−1 − σk2 )
k=1
=
=
n
X
E(I{k−1<τ } (Xk2 − σk2 )) + E(I{k−1<τ } 2Xk Sk−1 )
k=1
n
X
EI{k−1<τ } E(Xk2 − σk2 ) + 2EXk E(I{k−1<τ } Sk−1 )
k=1
=0
This is because {k − 1 < τ } = {|S1 | < ε, · · · , |Sk−1 | < ε} ∈ σ(X1 , · · · , Xk−1 ) whereas
Xk ∈ σ(Xk ) and Sk−1 ∈ σ(X1 , · · · , Xk−1 ). Then by the independence the expectation of
the product is the product of the expectations. Therefore ESτ2 = Es2τ .
• s2n P (C c ) ≤ E(s2τ ). Since
E(s2τ ) =
n
X
s2k P ({τ = k}) ≥ s2n P ({τ = n}) ≥ s2n P (C c )
k=1
c
This is because if ω ∈ C then τ (ω) = n and C c ⊂ {τ = n}.
• ESτ2 ≤ (c + ε)2 . This is because
E(Sτ2 ) = E(Sτ −1 + Xτ ) ≤ E(|Sτ −1 | + |Xτ |)2 ≤ (ε + c)2
where |Sτ −1 | < ε for all ω ∈ Ω.
171
Since now we have
s2n (1 − P (C)) = s2n P (C c ) ≤ E(s2τ ) = E(Sτ2 ) ≤ (c + ε)2
Then by rearranging terms we have
P (C) ≥ 1 −
(c + ε)2
(c + ε)2
(c + ε)2
P
=
1
−
=
1
−
n
s2n
Var(Sn )
k=1 Var(Xk )
A generalization of the previous lemma with a weaker lower bound is given:
LEMMA 5.3.5 Let (Xn )∞
n=1 be independent random variables with supn≥1 |Xn | ≤ c. Then
1 (c + ε)2
−
≤ P max |Sk | ≥ ε
1≤k≤n
2 Var(Sn )
Proof. Suppose Xn0 is independent and identically distributed with Xn and independent with Xk
for all k(The existence of this is due to Infinite product space theorem and we shall keep this
0
0 ∞
later). Then we know that (σ(Xn ))∞
n=1 and (σ(Xn ))n=1 are independent. Let Zn = (Xn − Xn )/2.
Then by the splitting theorem we know that (σ(Xn , Xn0 ))∞
n=1 are independent and therefore
(Zn )∞
n=1 are zero mean independent random variables with supn |Zn | ≤ c. By our previous
lemma we have
k
)
X (ε + c)2
2(ε + c)2
P
P
max Zi ≥ ε ≥ 1 − n
=1− n
1≤k≤n k=1 Var(Zk )
k=1 Var(Xk )
(
P
i=1
On the other hand we have
!
(
) (
)
k
k
k
X
X
X
0
max Zi ≥ ε ⊂ max Xi + Xi ≥ 2ε
1≤k≤n 1≤k≤n i=1
i=1
i=1
(
)
k
X
⊂ max |Sk | ≥ ε ∪ max Xi0 ≥ ε
1≤k≤n
1≤k≤n i=1
Therefore
)
k
X
max Zi ≥ ε
1≤k≤n i=1
(
≤ P max |Sk | ≥ ε + P max
2(c + ε)2
≤P
1 − Pn
i=1 Var(Xk )
(
)
k
X
Xi0 ≥ ε
1≤k≤n 1≤k≤n
i=1
= 2P
This shows that
max |Sk | ≥ ε
1≤k≤n
P
max |Sk | ≥ ε
1≤k≤n
172
≥
1 (c + ε)2
−
2 Var(Sn )
The next theorem is the core theorem of this section. This theorem is also used to prove the
strong law of large numbers, which will be discussed in the next section.
THEOREM 5.3.1 (Kolmogorov’s Three Series Theorem) Let (Xn )∞
n=1 be a sequence
of independent random variables and τc (x) = I{|x|≤c} (x)x be the truncation function of x with
P
truncation level c. Then the series ∞
n=1 Xn converges almost surely if( and only if) there exists(
for all) c > 0, the following three serieses converge:
(i)
∞
X
P {|Xn | > c} < ∞
(ii)
n=1
∞
X
E(τc (Xn )) < ∞
(iii)
n=1
∞
X
Var(τc (Xn )) < ∞
n=1
Proof.
Sufficiency: Suppose the three serieses converge. Then we know that (iii) holds and by one of
the previous lemmas we have
∞
X
(τc (Xn ) − E(τc (Xn ))) < ∞ a.s.
n=1
Since we have (ii) holds, then
∞
X
n=1
τc (Xn ) =
∞
X
(τc (Xn ) − E(τc (Xn ))) +
n=1
∞
X
E(τc (Xn )) < ∞ a.s.
n=1
Also (i) holds and by the first Borel-Cantelli lemma we know that P {|Xn | > c i.o. } = 0. Then
we know that there exists some N such that for all n > N we have |Xn | ≤ c. Namely, we know
that series
∞
X
τc (Xn ) =
n=N +1
This yields that
∞
X
Xn < ∞ a.s.
n=N +1
∞
X
Xn < ∞ a.s.
n=1
Necessity: Suppose we have that c > 0 where c can be arbitrary, and
P∞
n=1
Xn converges a.s..
Then Xn → 0 a.s., and this yields that there exists some N and for all n > N we have |Xn | ≤ c
a.s.. Namely, for all n > N we have τc (Xn ) = Xn a.s.. Then we know that P {|Xn | ≤ c a.a. } = 1
and P {|Xn | > c i.o. } = 0. Hence by the second Borel-Cantelli lemma we know that
∞
X
P {|Xn | > c} < ∞
n=1
which is (i).
Now by the Cauchy criterion we know that for all ε:
n
(
)
X
lim lim P max Xk ≥ ε = 0
m→∞ p→∞
m≤n≤p k=m+1
173
P∞
Var(Xn ) = ∞. By one of our previous lemma this means that
(
)
n
X
1
(c + ε)2
1
lim lim P max Xk ≥ ε ≥ lim lim
− Pp
=
m→∞ p→∞
m→∞
p→∞
m≤n≤p 2
2
k=m+1 Var(Xk )
k=m+1
Assume that
n=1
P∞
Var(Xn ) < ∞, which is (iii).
P
Now by one of our previous lemma we know that (iii) implies that ∞
k=1 (τc (Xn )−E(τc (Xn ))) <
And this yields a contradiction. Therefore we know that
n=1
∞ a.s.. Also we know that (i) holds implies that P {|Xn | > c i.o. } = 0 by the first Borel-Cantelli
lemma. Namely, there exists some N , such that for all n > N we have Xn = τc (Xn ). Therefore
P∞
P∞
P∞
X
<
∞
a.s.
implies
that
τ
(X
)
<
∞
a.s..
Hence
n
c
n
n=1
n=1
n=1 E(τc (Xn )) < ∞, which is
(ii).
Here is an example of application of the Kolmogorov’s three series theorem
EXAMPLE 5.3.1 (Random Signs Problem) Let (εk )∞
k=1 be a sequence of i.i.d. random
variables with P {εk = 1} = P {εk = −1} = 1/2. And (cn )∞
n=1 is a sequence of deterministic real
P∞
numbers. Find a sufficient and necessary condition for k=1 (ck εk ) converges a.s..
P∞ 2
P
Solution. We claim that ∞
k=1 ck converges. Take
k=1 (ck εk ) converges a.s. if and only if
P∞
c = 1, then k=1 (ck εk ) converges a.s., by the three series theorem, if and only if all three series
(i)
∞
X
P {|cn εn | > 1}
(ii)
∞
X
E(τ1 (cn εn ))
(iii)
Var(τ1 (cn εn ))
n=1
n=1
n=1
∞
X
converge. Now we know that τ1 (cn εn ) = cn εn if |cn | ≤ 1 and τ1 (cn εn ) = 0 if |cn | > 1. This means
that
∞
X
P {|cn εn | ≥ 1} =
n=1
∞
X
n=1
P {|cn | > 1} =
n=1
n=1
∞
X
∞
X
X
1
n:|cn |>1
Eτ1 (cn εn ) = 0
Var(τ1 (cn εn )) =
∞
X
(E(τ12 (cn εn )) − (Eτ1 (cn εn ))2 ) =
n=1
X
c2n
n:|cn |≤1
Here we can see that series(ii) vanishes automatically and we only need to consider (i) and (iii).
P
2
Since ∞
n=1 cn converges implies that there is only finitely many terms with |cn | > 1 and
P
series(i) is the sum of finitely many 1, and that n:|cn |≤1 c2n < ∞ because there exists some N
P
2
such that for all n > N we have |cn | ≤ 1, and ∞
n=N +1 cn converges. But for n ≤ N there are
only finitely many terms, and therefore series(iii) converges.
P
Conversely, if (i),(ii),(iii) converges, then we know that n:|cn |>1 1 < ∞, implying that |cn | ≤ 1
P
for all but finitely many n, and n:|cn |≤1 c2n < ∞, implying that there exists some N , such that
P
P∞ 2
P∞
2
2
n=N +1 cn ≤
n:|cn |≤1 cn < ∞. Therefore
n=1 cn < ∞.
174
P
Now we have shown that ∞
n=1 (cn εn ) converges a.s. to a finite limit random variable if and
P∞ 2
only if n=1 cn < ∞. For example, the random harmonic series, defined by cn = 1/n, converges
a.s..
5.4
Strong Law of Large Numbers
LEMMA 5.4.1 (Generalized Cesaro’s Theorem) Suppose xn → x and bn ↑ ∞ as n → ∞,
then with b0 = 0,
n
1 X
(bk − bk−1 )xk → x as n → ∞
bn k=1
Proof. Since $x_n \to x$, for all $\varepsilon > 0$ there exists some $N = N(\varepsilon)$ such that for all $n > N$ we have $|x_n - x| < \varepsilon$. Also, $(x_k - x)_{k=1}^{N}$ is bounded by some constant $M$. Then we can compute
$$\left|\frac{1}{b_n}\sum_{k=1}^{n}(b_k - b_{k-1})x_k - x\right| = \left|\frac{1}{b_n}\sum_{k=1}^{n}(b_k - b_{k-1})(x_k - x)\right|
\le \frac{1}{b_n}\left|\sum_{k=1}^{N}(b_k - b_{k-1})(x_k - x)\right| + \frac{1}{b_n}\left|\sum_{k=N+1}^{n}(b_k - b_{k-1})(x_k - x)\right|$$
$$\le \frac{1}{b_n}\sum_{k=1}^{N}(b_k - b_{k-1})M + \frac{1}{b_n}\sum_{k=N+1}^{n}(b_k - b_{k-1})\varepsilon = \frac{b_N M}{b_n} + \frac{\varepsilon(b_n - b_N)}{b_n}.$$
Therefore
$$\limsup_{n\to\infty}\left|\frac{1}{b_n}\sum_{k=1}^{n}(b_k - b_{k-1})x_k - x\right| \le \lim_{n\to\infty}\frac{b_N M + \varepsilon(b_n - b_N)}{b_n} = \varepsilon.$$
Since $\varepsilon$ can be arbitrarily small, we conclude that
$$\lim_{n\to\infty}\frac{1}{b_n}\sum_{k=1}^{n}(b_k - b_{k-1})x_k = x.$$
LEMMA 5.4.2 (Kronecker's Lemma) Let $(b_n)_{n=1}^{\infty}$ and $(x_n)_{n=1}^{\infty}$ be sequences of real numbers with $b_n \uparrow \infty$ as $n \to \infty$. Suppose $\sum_{n=1}^{\infty}(x_n/b_n)$ converges to a finite limit. Then $S_n/b_n \to 0$, where $S_n = \sum_{k=1}^{n} x_k$.

Proof. We prove it by summation by parts. Let $T_n = \sum_{k=1}^{n}(x_k/b_k)$, with $b_0 = T_0 = 0$. Then $x_n = b_n(T_n - T_{n-1})$ and
$$S_n = \sum_{k=1}^{n}(T_k - T_{k-1})b_k = b_nT_n - b_nT_{n-1} + b_{n-1}T_{n-1} - b_{n-1}T_{n-2} + \cdots + b_1T_1 - b_1T_0 + b_0T_0 = b_nT_n - \sum_{k=1}^{n}(b_k - b_{k-1})T_{k-1}.$$
This yields that
$$\frac{S_n}{b_n} = T_n - \frac{1}{b_n}\sum_{k=1}^{n}(b_k - b_{k-1})T_{k-1}.$$
Since $T_n$ converges to a finite limit, by the generalized Cesaro's theorem we know that $S_n/b_n \to 0$.
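As a quick numerical sketch (not part of the original proof; the particular sequences are illustrative assumptions), take $x_n = (-1)^n$ and $b_n = n$: the series $\sum x_n/b_n$ converges by the alternating series test, so Kronecker's lemma predicts $S_n/b_n \to 0$.

```python
# Sketch of Kronecker's lemma: x_n = (-1)^n, b_n = n.  sum x_n/b_n converges,
# so S_n / b_n should tend to 0.
import numpy as np

n = 10**6
k = np.arange(1, n + 1)
x = (-1.0) ** k
ratio = np.cumsum(x) / k                     # S_n / b_n with b_n = n
print(ratio[[99, 9_999, 999_999]])           # values close to 0
```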
THEOREM 5.4.1 Let $(X_n)_{n=1}^{\infty}$ be a sequence of independent random variables with mean 0 and $\sum_{n=1}^{\infty}\mathrm{Var}(X_n/b_n) < \infty$, where $(b_n)_{n=1}^{\infty}$ increases to infinity. Then $S_n/b_n \to 0$ a.s.

Proof. Since $E(X_n/b_n) = 0$ and $\sum_{n=1}^{\infty}\mathrm{Var}(X_n/b_n) < \infty$, we know that $\sum_{n=1}^{\infty}(X_n/b_n)$ converges a.s. This means that there exists some event $C$ with $P(C) = 1$ such that for all $\omega \in C$ the series $\sum_{n=1}^{\infty}(X_n(\omega)/b_n)$ converges. Then by Kronecker's lemma we know that $S_n(\omega)/b_n \to 0$ for all $\omega \in C$. This means that $C \subset \{S_n/b_n \to 0\}$, and by monotonicity we have $P\{S_n/b_n \to 0\} = 1$.
The following theorem, which uses the sum of tail probabilities to control the expectation, is of particular interest:

THEOREM 5.4.2 (Chung's Theorem 3.2.1) Suppose $X$ is a random variable. Then
$$\sum_{n=1}^{\infty} P\{|X| \ge n\} \le E|X| \le 1 + \sum_{n=1}^{\infty} P\{|X| \ge n\}.$$

Proof. Denote $\Lambda_n = \{n \le |X| < n+1\}$. Then $\Omega = \bigcup_{n=0}^{\infty}\Lambda_n$, a disjoint union, and we can write
$$E|X| = E\left(\sum_{n=0}^{\infty}|X|I_{\Lambda_n}\right) = \sum_{n=0}^{\infty}E(|X|I_{\Lambda_n})$$
by the Monotone Convergence Theorem. By monotonicity we have
$$nP(\Lambda_n) \le E(|X|I_{\Lambda_n}) \le (n+1)P(\Lambda_n).$$
Therefore we have
$$\sum_{n=0}^{\infty} nP(\Lambda_n) \le E|X| \le \sum_{n=0}^{\infty}(n+1)P(\Lambda_n) = 1 + \sum_{n=0}^{\infty} nP(\Lambda_n).$$
It suffices to show that $\sum_{n=0}^{\infty} nP(\Lambda_n) = \sum_{n=1}^{\infty} P\{|X| \ge n\}$. Write
$$\sum_{n=0}^{\infty} nP(\Lambda_n) = \sum_{n=1}^{\infty}\sum_{k=1}^{\infty} I_{\{k\le n\}}P(\Lambda_n) = \sum_{k=1}^{\infty}\sum_{n=1}^{\infty} I_{\{k\le n\}}P(\Lambda_n) = \sum_{k=1}^{\infty}\sum_{n=k}^{\infty} P(\Lambda_n) = \sum_{k=1}^{\infty} P\{|X| \ge k\},$$
because exchanging the order of summation in a double series of nonnegative terms is legitimate regardless of whether the series converges or diverges, and $\sum_{n=k}^{\infty} P(\Lambda_n) = P\{|X| \ge k\}$ by countable additivity. Thus the proof is completed.
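A small Monte Carlo sketch of the two-sided bound (not part of the original notes; the choice of an Exp(1) variable and the truncation of the tail sum at $n = 60$ are illustrative assumptions):

```python
# Sketch of Chung's inequality  sum_n P(|X| >= n) <= E|X| <= 1 + sum_n P(|X| >= n),
# checked on an Exp(1) sample (tail probabilities beyond n = 60 are negligible here).
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=10**6)               # |X| for X ~ Exp(1)

tail_sum = sum(np.mean(x >= n) for n in range(1, 60))    # empirical sum of tail probabilities
mean_abs = x.mean()                                      # empirical E|X|
print(tail_sum, mean_abs, 1 + tail_sum)                  # tail_sum <= E|X| <= 1 + tail_sum
```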
For the purpose of applying Kolmogorov's three-series theorem, we also need to control the truncated variances using the following lemma:

LEMMA 5.4.3 Suppose $p \in (0,2)$ and $E|X|^p < \infty$. Then
$$\sum_{n=1}^{\infty}\mathrm{Var}\left(\tau_1\left(\frac{X}{n^{1/p}}\right)\right) \le 1 + \frac{p}{2-p}E|X|^p.$$

Proof. Write
$$\sum_{n=1}^{\infty}\mathrm{Var}\left(\tau_1\left(\frac{X}{n^{1/p}}\right)\right) \le \sum_{n=1}^{\infty}E\left(\tau_1\left(\frac{X}{n^{1/p}}\right)\right)^2 = \sum_{n=1}^{\infty}E\left(\frac{X^2}{n^{2/p}}I_{\{|X|^p\le n\}}\right) = E\left(X^2\sum_{n\ge|X|^p}\frac{1}{n^{2/p}}\right),$$
where we have used the Monotone Convergence Theorem to exchange the expectation and the infinite summation. Note that
$$\sum_{n\ge|X|^p}\frac{1}{n^{2/p}} \le \frac{1}{(|X|^p)^{2/p}} + \int_{|X|^p}^{\infty}\frac{dt}{t^{2/p}} = \frac{1}{|X|^2} + \frac{p}{2-p}\cdot\frac{1}{|X|^{2-p}}.$$
Hence we have
$$\sum_{n=1}^{\infty}\mathrm{Var}\left(\tau_1\left(\frac{X}{n^{1/p}}\right)\right) \le E\left(X^2\left(\frac{1}{|X|^2} + \frac{p}{2-p}|X|^{p-2}\right)\right) = 1 + \frac{p}{2-p}E|X|^p.$$

Before we come to the KMZ strong law of large numbers, we first discuss the classical strong law of large numbers, because it is particularly tough compared to the other results.
THEOREM 5.4.3 (Classical Strong Law of Large Numbers) Suppose $X, (X_n)_{n=1}^{\infty}$ form an i.i.d. sequence of random variables with $E|X| < \infty$. Let $S_n = \sum_{k=1}^{n}X_k$. Then
$$\frac{S_n - nEX}{n} \to 0 \quad \text{a.s.}$$

Proof. Since $E|X| < \infty$, by our previous result the series
$$\sum_{n=1}^{\infty}P\{|X_n| > n\} \le \sum_{n=1}^{\infty}P\{|X_n| \ge n\} = \sum_{n=1}^{\infty}P\{|X| \ge n\} < \infty,$$
and the first Borel-Cantelli lemma gives $P\{|X_n| > n \text{ i.o.}\} = 0$. Namely, with probability 1 there exists some $N$ such that $|X_n| \le n$ for all $n > N$, which is exactly $\tau_1(X_n/n) = X_n/n$ eventually a.s. Also the Dominated Convergence Theorem gives that
$$E(\tau_n(X_n)) = E(\tau_n(X)) \to EX$$
as $n \to \infty$, because $\tau_n(X) \to X$ and $|\tau_n(X)| \le |X|$. Note that by taking $p = 1$ in the upper bound for the sum of truncated variances we have
$$\sum_{n=1}^{\infty}\mathrm{Var}\left(\tau_1\left(\frac{X_n}{n}\right)\right) = \sum_{n=1}^{\infty}\mathrm{Var}\left(\tau_1\left(\frac{X}{n}\right)\right) \le 1 + E|X| < \infty,$$
which shows that $\sum_{n=1}^{\infty}(\tau_1(X_n/n) - E(\tau_1(X_n/n)))$ converges a.s. This means that $\sum_{n=1}^{\infty}(X_n/n - E(\tau_1(X_n/n)))$ converges a.s. Then by Kronecker's lemma we know that
$$\frac{\sum_{k=1}^{n}(X_k - E\tau_k(X))}{n} \to 0 \quad \text{a.s.}$$
Since $E\tau_k(X) \to EX$ as $k \to \infty$, by Cesaro's theorem we have
$$\frac{S_n - nEX}{n} = \frac{S_n - \sum_{k=1}^{n}E\tau_k(X)}{n} + \frac{\sum_{k=1}^{n}(E\tau_k(X) - EX)}{n} \to 0 \quad \text{a.s.}$$
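A one-path simulation sketch of the theorem (not part of the original notes; using centered Exp(1) variables is an illustrative assumption):

```python
# Sketch of the classical SLLN: for i.i.d. Exp(1) - 1 variables (mean 0),
# S_n / n drifts to 0 along a single sample path.
import numpy as np

rng = np.random.default_rng(2)
n = 10**6
x = rng.exponential(1.0, size=n) - 1.0       # i.i.d., EX = 0, E|X| < infinity
running_mean = np.cumsum(x) / np.arange(1, n + 1)
for m in (10**2, 10**4, 10**6):
    print(m, running_mean[m - 1])
```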
THEOREM 5.4.4 (KMZ Strong Law of Large Numbers) Let $0 < p < 2$ and let $X, (X_n)_{n=1}^{\infty}$ be a sequence of i.i.d. random variables.
• If $E|X|^p < \infty$ and $p \ge 1$, then $\dfrac{S_n - nEX}{n^{1/p}} \to 0$ a.s.;
• If $E|X|^p < \infty$ and $p < 1$, then for any finite value $a$ we have $\dfrac{S_n - na}{n^{1/p}} \to 0$ a.s.;
• If $E|X|^p = \infty$, then $\limsup_{n\to\infty}\dfrac{|S_n - na|}{n^{1/p}} = \infty$ a.s. for any finite value $a$.

Proof. For the case $E|X|^p < \infty$ it suffices to consider $p < 1$ and $p > 1$, because we have already established the classical strong law of large numbers, which is the case $p = 1$.

• $E|X|^p < \infty$ and $p > 1$. Without loss of generality we may assume that $EX = 0$. Chung's theorem 3.2.1 suggests that
$$\sum_{n=1}^{\infty}P\left\{\left|\frac{X_n}{n^{1/p}}\right| > 1\right\} \le \sum_{n=1}^{\infty}P\{|X_n|^p \ge n\} = \sum_{n=1}^{\infty}P\{|X|^p \ge n\} \le E|X|^p < \infty.$$
Also Lemma 5.4.3 suggests that
$$\sum_{n=1}^{\infty}\mathrm{Var}\left(\tau_1\left(\frac{X_n}{n^{1/p}}\right)\right) = \sum_{n=1}^{\infty}\mathrm{Var}\left(\tau_1\left(\frac{X}{n^{1/p}}\right)\right) < \infty.$$
Now we only need to show that $\sum_{n=1}^{\infty}\left|E\tau_1\left(\frac{X_n}{n^{1/p}}\right)\right| < \infty$. Since $EX = 0$,
$$\left|E\tau_1\left(\frac{X_n}{n^{1/p}}\right)\right| = \left|E\left(\frac{X}{n^{1/p}}I_{\{|X|^p>n\}}\right)\right|.$$
This yields that
$$\sum_{n=1}^{\infty}\left|E\tau_1\left(\frac{X_n}{n^{1/p}}\right)\right| \le \sum_{n=1}^{\infty}E\left(\frac{|X|}{n^{1/p}}I_{\{n<|X|^p\}}\right) = E\left(|X|\sum_{n<|X|^p}\frac{1}{n^{1/p}}\right)$$
by the Monotone Convergence Theorem. Then we have
$$\sum_{n=1}^{\infty}\left|E\tau_1\left(\frac{X_n}{n^{1/p}}\right)\right| \le E\left(|X|\int_0^{|X|^p}\frac{dt}{t^{1/p}}\right) = \frac{p}{p-1}E|X|^p < \infty.$$
Now by Kolmogorov's three-series theorem we have that $\sum_{n=1}^{\infty}\dfrac{X_n}{n^{1/p}}$ converges a.s. Therefore by Kronecker's lemma we know that $\dfrac{S_n}{n^{1/p}} \to 0$ a.s.
• $E|X|^p < \infty$ and $p < 1$. Consider the sequence $Y_n = X_n - a$. Then $Y, (Y_n)_{n=1}^{\infty}$ is a sequence of i.i.d. random variables, and $E|Y|^p \le E|X|^p + |a|^p < \infty$ since $p < 1$. Chung's theorem 3.2.1 suggests that
$$\sum_{n=1}^{\infty}P\left\{\left|\frac{Y_n}{n^{1/p}}\right| > 1\right\} \le \sum_{n=1}^{\infty}P\{|Y_n|^p \ge n\} = \sum_{n=1}^{\infty}P\{|Y|^p \ge n\} \le E|Y|^p < \infty.$$
Also Lemma 5.4.3 suggests that
$$\sum_{n=1}^{\infty}\mathrm{Var}\left(\tau_1\left(\frac{Y_n}{n^{1/p}}\right)\right) = \sum_{n=1}^{\infty}\mathrm{Var}\left(\tau_1\left(\frac{Y}{n^{1/p}}\right)\right) < \infty.$$
Now we only need to show that $\sum_{n=1}^{\infty}\left|E\tau_1\left(\frac{Y_n}{n^{1/p}}\right)\right| < \infty$. Write
$$\sum_{n=1}^{\infty}\left|E\tau_1\left(\frac{Y_n}{n^{1/p}}\right)\right| \le \sum_{n=1}^{\infty}E\left(\frac{|Y|}{n^{1/p}}I_{\{n\ge|Y|^p\}}\right) = E\left(|Y|\sum_{n\ge|Y|^p}\frac{1}{n^{1/p}}\right)$$
by the Monotone Convergence Theorem. Then
$$\sum_{n=1}^{\infty}\left|E\tau_1\left(\frac{Y_n}{n^{1/p}}\right)\right| \le E\left(|Y|\left(\frac{1}{|Y|} + \int_{|Y|^p}^{\infty}\frac{dt}{t^{1/p}}\right)\right) = 1 + \frac{p}{1-p}E|Y|^p < \infty.$$
Therefore by Kolmogorov's three-series theorem we have shown that $\sum_{n=1}^{\infty}\dfrac{Y_n}{n^{1/p}}$ converges a.s. Then by Kronecker's lemma,
$$\frac{\sum_{k=1}^{n}Y_k}{n^{1/p}} = \frac{S_n - na}{n^{1/p}} \to 0 \quad \text{a.s.}$$
• $E|X|^p = \infty$. Take $Y_n = X_n - a$ for any finite $a$. We see that $|x - a|^p/|x|^p \to 1$ as $|x| \to \infty$, so there exist constants $A, B > 0$ such that $A|x|^p \le |x-a|^p \le B|x|^p$ for sufficiently large $|x|$, say $|x| \ge M$. Then we know that
$$E|X - a|^p = E|X - a|^pI_{\{|X|\ge M\}} + E|X - a|^pI_{\{|X|<M\}} \ge E|X - a|^pI_{\{|X|\ge M\}} \ge AE|X|^pI_{\{|X|\ge M\}}.$$
Note that $E|X|^p = E|X|^pI_{\{|X|<M\}} + E|X|^pI_{\{|X|\ge M\}} = \infty$, showing that $E|X|^pI_{\{|X|\ge M\}} = \infty$. Therefore $E|X - a|^p \ge AE|X|^pI_{\{|X|\ge M\}} = \infty$; namely, $E|Y|^p = \infty$. Since $\sum_{k=1}^{n}Y_k = S_n - na$, write
$$\frac{|Y_n|}{n^{1/p}} \le \frac{|S_n - na|}{n^{1/p}} + \frac{|S_{n-1} - (n-1)a|}{(n-1)^{1/p}}\left(\frac{n-1}{n}\right)^{1/p}.$$
Then we know that
$$\left\{\limsup_{n\to\infty}\frac{|S_n - na|}{n^{1/p}} < \infty\right\} \subset \left\{\limsup_{n\to\infty}\frac{|Y_n|}{n^{1/p}} < \infty\right\} \subset \bigcup_{c=1}^{\infty}\left\{\frac{|Y_n|}{n^{1/p}} \le c \text{ for all but finitely many } n\right\}.$$
Note that
$$\sum_{n=1}^{\infty}P\{|Y_n/c|^p > n\} = \sum_{n=1}^{\infty}EI_{\{|Y/c|^p>n\}} = E\left(\sum_{n=1}^{\infty}I_{\{|Y/c|^p>n\}}\right) \ge E\left((|Y|/c)^p\right) - 1 = \infty.$$
Then by the second Borel-Cantelli lemma we know that $P\{|Y_n/c|^p > n \text{ i.o.}\} = 1$, so that $P\{|Y_n|/n^{1/p} \le c \text{ for all but finitely many } n\} = 0$ for every $c$. Then we see that
$$P\left\{\limsup_{n\to\infty}\frac{|S_n - na|}{n^{1/p}} < \infty\right\} \le \sum_{c=1}^{\infty}P\left\{\frac{|Y_n|}{n^{1/p}} \le c \text{ for all but finitely many } n\right\} = 0.$$
Therefore we know that $\limsup_{n\to\infty}|S_n - na|/n^{1/p} = \infty$ a.s.
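The following is a rough numerical sketch of the $p < 1$ case (not part of the original notes; the standard Cauchy distribution, $p = 1/2$, and the sample size are illustrative assumptions): a standard Cauchy variable has $E|X|^p < \infty$ for every $p < 1$, so the theorem predicts $S_n/n^2 \to 0$ a.s. with $a = 0$.

```python
# Sketch of the KMZ normalization with p = 1/2: for i.i.d. standard Cauchy
# variables, E|X|^{1/2} < infinity, so S_n / n^{1/p} = S_n / n^2 -> 0 a.s.
import numpy as np

rng = np.random.default_rng(3)
n = 10**6
x = rng.standard_cauchy(size=n)
k = np.arange(1, n + 1)
normalized = np.cumsum(x) / k**2             # S_n / n^{1/p} with p = 1/2
for m in (10**3, 10**5, 10**6):
    print(m, normalized[m - 1])
```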
5.5 Rate of Convergence

THEOREM 5.5.1 (Rate of Convergence) Let $X, (X_n)_{n=1}^{\infty}$ be a sequence of i.i.d. random variables. A necessary and sufficient condition on the distribution of $X$ for
$$\frac{S_n}{n/\log n} \to 0 \quad \text{a.s.}$$
is that $EX = 0$ and $E|X|(\log|X|)^+ < \infty$.

The proof of the theorem is complicated, and we need the following lemmas.
LEMMA 5.5.1 If $n \ge 3$ and $|X| > \dfrac{n}{\log n}$, then $n < 2|X|\log|X|$.

Proof. Define the function $\varphi(x) = \dfrac{x}{2\log x}$ for $x > 1$. Then
$$\varphi'(x) = \frac{\log x - 1}{2(\log x)^2},$$
which is negative for $1 < x < e$ and positive for $x > e$, so $\varphi$ attains its minimum over $(1,\infty)$ at $x = e$. This means that $\varphi(x) \ge \varphi(e) = e/2 > 1$ for all $x > 1$, and hence $x^2/(2\log x) \ge x$ for all $x > 1$.

Therefore if $n/\log n < |X|$ then we have
$$e \le \frac{3}{\log 3} \le \frac{n}{\log n} < |X|,$$
because $n/\log n$ is monotonically increasing for $n \ge 3$. Therefore we know that $|X| > e > 1$ and
$$2\varphi(n) = \frac{n}{\log n} < |X| \le |X|\varphi(|X|) = \frac{|X|^2}{2\log|X|} = \frac{|X|^2}{\log|X|^2} = 2\varphi(|X|^2).$$
Recall that $\varphi'(x) > 0$ for $x > e$, which means that $\varphi$ strictly increases on $(e,\infty)$. Now we have $|X|^2 > |X| > e$, $n \ge 3 > e$, and $2\varphi(n) < 2\varphi(|X|^2)$; therefore we must have $n < |X|^2$ by strict monotonicity. Furthermore we need another upper bound for $n$ which is more delicate than $|X|^2$: since $n < |X|\log n$ and $n < |X|^2$ implies $\log n \le \log|X|^2$,
$$n < |X|\log n \le |X|\log|X|^2 = 2|X|\log|X|$$
holds for $n \ge 3$.
LEMMA 5.5.2 If $n \ge 3$, $|X| \ge 3$ and $|X| \le \dfrac{n}{\log n}$, then $n \ge |X|\log|X|$.

Proof. If $|X| \ge 3$ then we know that $|X|/\log|X| < |X|$, and this means that
$$2\varphi(|X|) = \frac{|X|}{\log|X|} < |X| \le \frac{n}{\log n} = 2\varphi(n).$$
Since $n \ge 3$, $|X| \ge 3$, and the function $\varphi$ defined in Lemma 5.5.1 strictly increases for $x \ge 3$, we must have $|X| < n$ by strict monotonicity. Similarly a more delicate bound follows: since $|X| \le n/\log n$ and $\log|X| < \log n$,
$$n \ge |X|\log n \ge |X|\log|X|.$$
LEMMA 5.5.3 Define $a_n = \dfrac{n}{\log n}$. If $E|X|(\log|X|)^+ < \infty$ and $EX = 0$, then
$$\frac{\log n}{n}\sum_{k=1}^{n}E(\tau_{a_k}(X)) \to 0 \quad \text{as } n \to \infty.$$

Proof.

(i) First we show that $(\log n)E(\tau_{a_n}(X)) \to 0$ as $n \to \infty$. Define $\Lambda_n = \{2|X|\log|X| \ge n\}$ for $n \ge 3$. Since $EX = 0$ we know that
$$|E(\tau_{a_n}(X))\log n| = \left|\int_{\{|X|\le a_n\}}X\log n\,dP\right| = \left|\int_{\{|X|>a_n\}}X\log n\,dP\right| \le \int_{\{|X|>a_n\}}|X|\log n\,dP.$$
By Lemma 5.5.1 we know that $\{|X| > a_n\} \subset \{n < 2|X|\log|X|\} = \Lambda_n$ for $n \ge 3$. Moreover $n \le 2|X|\log|X|$ over $\Lambda_n$, and this yields
$$\log n \le \log(2|X|\log|X|) = \log 2 + \log|X| + \log\log|X| \le \log 2 + 2\log|X| \quad \text{on } \Lambda_n.$$
Note that for $n \ge 3$ we have $a_n \ge 1$, and $|X| > a_n \ge 1$ on $\{|X| > a_n\}$; then
$$|E(\tau_{a_n}(X))\log n| \le \int_{\{|X|>a_n\}}|X|\log n\,dP \le \int_{\Lambda_n}|X|\log n\,dP \le \int_{\Lambda_n}|X|(\log 2 + 2\log|X|)\,dP = \int_{\Lambda_n}|X|(\log 2 + 2(\log|X|)^+)\,dP.$$
Now $EX = 0$ implies $E|X| < \infty$, and also $E|X|(\log|X|)^+ < \infty$; therefore $|X|(\log 2 + 2(\log|X|)^+)$ is integrable. Since $(\Lambda_n)_{n=3}^{\infty}$ decreases as $n \to \infty$,
$$\lim_{n\to\infty}P(\Lambda_n) = P\left(\bigcap_{n=3}^{\infty}\Lambda_n\right) = P\left(\bigcap_{n=3}^{\infty}\{2|X|\log|X| \ge n\}\right) = P\{|X|\log|X| = \infty\} = 0,$$
because $E|X|(\log|X|)^+ < \infty$. Then by Exercise 3.2.2 in Chung's book we know
$$\lim_{n\to\infty}\int_{\Lambda_n}|X|(\log 2 + 2(\log|X|)^+)\,dP = 0,$$
and this yields that $E(\tau_{a_n}(X))\log n \to 0$ as $n \to \infty$.

(ii) Next we show that $\lim_{n\to\infty}\dfrac{E(\tau_{a_n}(X))}{a_n - a_{n-1}} = 0$. In fact we have
$$\lim_{n\to\infty}\frac{E(\tau_{a_n}(X))}{a_n - a_{n-1}} = \lim_{n\to\infty}E(\tau_{a_n}(X))\log n\cdot\frac{1}{(a_n - a_{n-1})\log n} = 0,$$
because by the mean value theorem we have
$$a_n - a_{n-1} = \frac{\log\theta - 1}{(\log\theta)^2} \quad \text{for some } \theta \in [n-1,n],$$
and therefore
$$\lim_{n\to\infty}(a_n - a_{n-1})\log n = \lim_{n\to\infty}\frac{(\log\theta - 1)\log n}{(\log\theta)^2} = 1,$$
because $\log\theta \sim \log n \sim \log(n-1)$ as $n \to \infty$.

(iii) Finally we show that $\dfrac{\log n}{n}\sum_{k=1}^{n}E(\tau_{a_k}(X)) \to 0$ as $n \to \infty$. Since
$$\lim_{n\to\infty}\frac{E(\tau_{a_n}(X))}{a_n - a_{n-1}} = 0,$$
the generalized Cesaro's theorem gives
$$\frac{\log n}{n}\sum_{k=1}^{n}E(\tau_{a_k}(X)) = \frac{C}{a_n} + \frac{1}{a_n}\sum_{m=4}^{n}(a_m - a_{m-1})\frac{E(\tau_{a_m}(X))}{a_m - a_{m-1}} \to 0$$
as $n \to \infty$, where $C$ is a constant collecting the first three terms.
Now we are ready to prove the theorem.

Proof of the Rate of Convergence Theorem.

• Sufficiency: Suppose now we have $E|X|(\log|X|)^+ < \infty$ as well as $EX = 0$. We are to show that $\dfrac{S_n}{n/\log n}$ converges to 0 a.s.

(1) First consider the series $\sum_{n=3}^{\infty}P\left\{|X_n| > \dfrac{n}{\log n}\right\}$. We claim that it converges. Compute
$$\sum_{n=3}^{\infty}P\left\{|X_n| > \frac{n}{\log n}\right\} = \sum_{n=3}^{\infty}P\left\{|X| > \frac{n}{\log n}\right\} = \sum_{n=3}^{\infty}E(I_{\{|X|>n/\log n\}}) = E\left(\sum_{n=3}^{\infty}I_{\{|X|>n/\log n\}}\right) = E\left(\sum_{n=3}^{\infty}I_{\{|X|>n/\log n\}}I_{\{|X|>1\}}\right) = E\left(\sum_{n\ge 3:\,\frac{n}{\log n}<|X|}I_{\{|X|>1\}}\right),$$
where we have used the identical distribution of $X$ and $X_n$ in the first equality, the monotone convergence theorem in the third equality (the partial sums $\left(\sum_{n=3}^{k}I_{\{|X|>n/\log n\}}\right)_{k\ge3}$ form an increasing sequence of random variables), and the fact that $|X| > n/\log n$ with $n \ge 3$ forces $|X| > 1$. By the first lemma in this section (Lemma 5.5.1) we have
$$\sum_{n=3}^{\infty}P\left\{|X_n| > \frac{n}{\log n}\right\} = E\left(\sum_{n\ge 3:\,\frac{n}{\log n}<|X|}I_{\{|X|>1\}}\right) \le E\left(\sum_{n=1}^{\lfloor 2|X|\log|X|\rfloor}I_{\{|X|>1\}}\right) \le 2E(|X|\log|X|\,I_{\{|X|>1\}}) = 2E|X|(\log|X|)^+ < \infty.$$
Then by the first Borel-Cantelli lemma we know that
$$P\left\{|X_n| > \frac{n}{\log n}\ \text{i.o.}\right\} = 0.$$
This means that, with probability 1, there exists some $N$ such that for all $n > N$ we have $|X_n| \le \dfrac{n}{\log n}$. Namely, for all $n > N$ we know
$$\tau_1\left(\frac{X_n}{n/\log n}\right) = \frac{X_n}{n/\log n},$$
where $\tau_c$ is the truncation function at level $c$ defined in the notes.

(2) Next we consider the series $\sum_{n=3}^{\infty}\mathrm{Var}\left(\tau_1\left(\dfrac{X_n}{n/\log n}\right)\right)$. We claim that it converges.
Write
(∗)
!
∞
X
I
I
{|X|≤n/ log n}
{|X|≤n/ log n}
≤
E X2
= E X2
2
(n/
log
n)
(n/
log n)2
n=3
n=3
n=3
!
!
∞
∞
X
X
I
I
{|X|≤n/
log
n}
{|X|≤n/
log
n}
=E X 2
I
+ E X2
I
2 {|X|≤3}
2 {|X|>3}
(n/
log
n)
(n/
log
n)
n=3
n=3
!
∞
∞
2
X
X
I
(log n)
{|X|≤n/
log
n}
≤9
+ E X2
I
2
2 {|X|>3}
n
(n/
log
n)
n=1
n=3
∞
X
Var τ1
Xn
n/ log n
∞
X
where we have used the identical distribution property of X and Xn in the first inequality as well as formula Var(Y ) ≤ EY 2 , and monotone convergence theorem in the
second equality because the summands in the series are non-negative. Now compute
the second part by the second lemma in this section
!
∞
X
I
{|X|≤n/
log
n}
I{|X|>3}
E X2
(n/ log n)2
n=3


X
I{|X|>3}

≤E X 2
(n/ log n)2
n≥|X| log |X|
Z
(log(|X| log |X|))2
(log t)2 dt
2
≤E X
+
I{|X|>3}
|X|2 (log |X|)2
t2
|X| log |X|
(log(X log |X|))2 (log(|X| log |X|))2 + 2 log(|X| log |X|) + 2
=E
+
|X| I{|X|>3}
(log |X|)2
log |X|
Note that we have log(|X| log |X|) = log |X|+log log |X| ≤ log |X|+log |X| = 2 log |X|,
then
X2
E
∞
X
I{|X|≤n/ log n}
(n/ log n)2
!
I{|X|>3}
n=3
(log(X log |X|))2 (log(|X| log |X|))2 + 2 log(|X| log |X|) + 2
|X| I{|X|>3}
+
≤E
(log |X|)2
log |X|
4(log |X|)2 4(log |X|)2 + 4 log |X| + 2
≤E
|X| I{|X|>3}
+
(log |X|)2
log |X|
2|X|
≤E 4 + 4|X|(log |X| + 1) +
I{|X|≥3}
log |X|
2|X|(log |X|)2
I{|X|>3}
≤E 4 + 8|X| log |X| +
log |X|
=4 + 10E(|X|(log |X|)I{|X|>3} ) ≤ 4 + 10E|X|(log |X|)+ < ∞
where we have repeatedly used log(|X| log |X|) ≤ 2 log |X| and log |X| > 1 for |X| ≥ 3.
Therefore by (∗) we know that the sums of truncated variances converges because the
first part
∞
X
(log n)2
n2
n=3
∞ X
<∞
Xn
Xn
(3) Consider
τ1
− Eτ1
. We claim that it converges a.s..
n/ log n
n/ log n
n=1
This is simply an conclusion of lemma 3 on the lecture notes(Three Series Theorem
part). Because the variance does not change due to centering with expectation, and
the summands are zero mean independent random variables since τ1 function is Borel
measurable. Now we have
X
∞
∞
X
Xn
Xn
Xn
Var τ1
Var τ1
=
− Eτ1
<∞
n/
log
n
n/
log
n
n/
log
n
n=1
n=1
by (2), then the second lemma implies that the series
∞ X
Xn
Xn
τ1
− Eτ1
<∞
n/ log n
n/ log n
n=1
almost surely.
(4) Now we know that
∞ X
τ1
n=1
Xn
n/ log n
− Eτ1
Xn
n/ log n
<∞
a.s., which is to say there exists some P -null set N , such that for all ω ∈ Ω\N we have
∞ X
Xn (ω)
Xn
τ1
− Eτ1
<∞
n/ log n
n/ log n
n=1
then by Kronecker’s lemma we know that
n
X
(τ
k
log k
(Xk (ω)) − Eτ
k
log k
(Xk ))
k=1
n/ log n
→0
over ω ∈ Ω\N . Also we know from (1) that there exists some n0 = n(ω) such that for
all n > n0 we have τan (Xn (ω)) = Xn (ω). Therefore
n
1 X
(Xk (ω) − Eτ k (Xk )) → 0
In :=
log k
an k=1
as n → ∞ because as n → ∞ the first n0 (ω) terms does not change the value of the
limit since an ↑ ∞. Now we need the result from the second lemma:
n
1 X
Jn :=
E(τak (Xk )) → 0 as n → ∞
an k=1
Therefore we obtain that
n
1 X
Sn (ω)
→0
(Xk (ω)) = In + Jn =
an k=1
n/ log n
over ω ∈ Ω\N where N is a P -null set.
• Necessity:
(1) First we show that EX = 0. If E|X| = ∞ then we know that lim supn |Sn |/n = ∞
by the KMZ strong law of large numbers. Define an = n/ log n. Then
lim sup
n→∞
|Sn |
|Sn |
= lim sup
log n = ∞
an
n
n→∞
because log n → ∞ as n → ∞. This contradicts with Sn /an → 0 a.s.. Hence we
have E|X| < ∞. Now assume that EX 6= 0. Without loss of generality we may let
EX > 0. Then by the KMZ strong law of large numbers we have Sn /n → EX a.s..
But log n → ∞, and this means that
Sn
Sn
=
log n → ∞ a.s.
an
n
This is because the first term Sn /n converges to EX > 0 a.s.. This again contradicts
with Sn /an → 0 a.s.. Therefore EX = 0.
(2) Next we show that E|X|(log |X|)+ < ∞. We prove it by contradiction. Assume
that E|X|(log |X|)+ = ∞. Since
Xn |Sn | |Sn−1 | an−1
≤
+
an an
an−1
an
This means that if lim supn |Sn |/an < ∞ then we have lim supn |Xn |/an < ∞, which is
that there exists some integer c ≥ 1 such that |Xn |/an ≤ c for all but finitely many n.
Therefore
[
∞
|Sn |
|Xn |
lim sup
<∞ ⊂
lim inf
≤c
n
an
a
n→∞
n
c=1
For any c compute
X
∞
∞
X
|Xn |
>c ≥
P {|X|/c > an } = E
P
an
n=3
n=3
=E
∞
X
∞
X
!
I{an <|X|/c} I{|X|≥3c}
n=3
!
I{an <|X|/c}∩{|X|≥3c}
n=3
Denote Y = |X|/c. Since we are considering n ≥ 3 then by the second lemma we have
that n < Y log Y together with Y ≥ 3 implies that Y > an and Y > 3(Use the second
lemma reversely). Then this can be written in terms of events
(∗∗)
{n < Y log Y } ∩ {Y ≥ 3} ⊂ {an < |X|/c} ∩ {|X| ≥ 3c} for n ≥ 3
and write it in terms of indicators
(∗ ∗ ∗)
I{n<bY log Y c}∩{Y ≥3} ≤ I{n<Y log Y }∩{Y ≥3} ≤ I{an <|X|/c}∩{|X|≥3c}
for n ≥ 3
Hence by (∗ ∗ ∗) we have that
∞
X
|Xn |
P
>c
an
n=3
≥E
∞
X
!
I{an <|X|/c}∩{|X|≥3c}

bY log Y c−1
≥E
X

I{Y ≥3}  − 2
n=1
n=3
≥E((Y log Y − 2)I{Y ≥3} ) − 2 ≥ E((Y log Y )I{Y ≥3} ) − 4
1
log c
= E(|X| log |X|I{|X|≥3c} ) −
E|X|I{|X|≥3c} − 4
c
c
1
1
log c
= E(|X| log |X|I{|X|>1} ) − E(|X| log |X|I{1<|X|<3c} ) −
E|X|I{|X|≥3c} − 4
c
c
c
∞
X
|Xn |
Now we are to show that the series
P
> c diverges. In fact we have
an
n=3
1
1
P := E(|X| log |X|I{|X|>1} ) = E|X|(log |X|)+ = ∞
c
c
1
1
Q := E(|X| log |X|I{1<|X|<3c} ) ≤ E(3c log 3c) < ∞
c
c
log c
log c
R :=
E|X|I{|X|≥3c} ≤
E|X| < ∞
c
c
and above three terms are all nonnegative. Then we know that
∞
X
|Xn |
P
>c ≥P −Q−R−4=∞
a
n
n=3
∞
X
|Xn |
> c diverges. Now we know that ({|Xn > can }) ∈
and hence the series
P
an
n=3
∞
σ(Xn ) and (Xn )∞
n=1 are i.i.d., then ({Xn > can })n=1 is a sequence of independent events.
By the second Borel-Cantelli lemma we know that P {|Xn | > can i.o. } = 1. Namely,
we have
|Xn |
P lim inf
≤c
=0
n
an
yielding that
X
∞
|Sn |
|Xn |
P lim sup
P lim inf
<∞ ≤
≤c
=0
n
an
a
n→∞
n
c=1
And this shows that $\dfrac{S_n}{a_n}$ diverges a.s., contradicting $\dfrac{S_n}{a_n} \to 0$ a.s. Hence we have $E|X|(\log|X|)^+ < \infty$.

5.6 Applications of Law of Large Numbers
DEFINITION 5.6.1 (Empirical Distribution Function) Suppose $X, (X_n)_{n=1}^{\infty}$ is a sequence of i.i.d. random variables. Then for all $n$ and $(t,\omega) \in \mathbb{R}\times\Omega$ define the empirical distribution function
$$F_n(t,\omega) = \frac{1}{n}\sum_{k=1}^{n}I_{(-\infty,t]}(X_k(\omega)).$$
LEMMA 5.6.1 Suppose $F$ is a distribution function. Let $S = \{s_1,\cdots,s_k\}$ be a finite subset of $\mathbb{R}$. Define the partition $\pi_S = \{(-\infty,s_1),[s_1,s_2),\cdots,[s_k,+\infty)\}$ and denote $s_0 = -\infty$, $s_{k+1} = \infty$. Define
$$w_{\pi_S} = \sup_{1\le j\le k+1}|F(s_j-) - F(s_{j-1})|.$$
Then there exists a sequence of finite sets $(S_m)_{m=1}^{\infty}$ such that $W_m = w_{\pi_{S_m}} \to 0$ as $m \to \infty$.

Proof. Since $F$ is a distribution function, $F(x-)$ is an increasing function, and since $F$ is right-continuous, $F(x-)$ is left-continuous. Now for any integer $m$ write
$$[0,1] = \left[0,\frac{1}{m}\right] + \sum_{r=1}^{m-1}\left(\frac{r}{m},\frac{r+1}{m}\right] =: \sum_{r=1}^{m}I_r.$$
Now look at $J_r = F^{-1}(I_r)$. Since $F$ is monotonically increasing, the sets $F^{-1}(I_r)$ are intervals. Let $S_m$ be the set of all endpoints of the intervals $J_r$ except for $\pm\infty$; then $S_m$ is a finite set. Write $S_m = \{s_1,s_2,\cdots,s_{m-1}\}$. Since $s_r$ is the supremum of $J_r$, there is a sequence $x_n \uparrow s_r$ with $(x_n)_{n=1}^{\infty} \subset J_r$. Then $(r-1)/m < F(x_n) \le r/m$, and letting $n \to \infty$ gives, by the left continuity of $F(x-)$, the useful inequality
$$\frac{r-1}{m} \le F(s_r-) \le \frac{r}{m}.$$
Similarly we consider the infimum of $J_r$. Since $s_{r-1}$ is the infimum of $J_r$, there is a sequence $x_n \downarrow s_{r-1}$ with $(x_n)_{n=1}^{\infty} \subset J_r$. Then $(r-1)/m \le F(x_n) \le r/m$, and letting $n \to \infty$ gives, by the right continuity of $F$, the useful inequality
$$\frac{r-1}{m} \le F(s_{r-1}) \le \frac{r}{m}.$$
Then
$$w_{\pi_{S_m}} = \max_{1\le j\le m}|F(s_j-) - F(s_{j-1})| \le \max_{1\le j\le m}\left|\frac{j}{m} - \frac{j-1}{m}\right| = \frac{1}{m} \to 0$$
as $m \to \infty$.
THEOREM 5.6.1 (Glivenko-Cantelli Theorem) Suppose $(X_n)_{n=1}^{\infty}$ is a sequence of i.i.d. random variables with common distribution function $F$. Let
$$F_n(t,\omega) := \frac{1}{n}\sum_{k=1}^{n}I_{(-\infty,t]}(X_k(\omega)).$$
Then with probability one we have
$$D_n(\omega) = \sup_{t\in\mathbb{R}}|F_n(t,\omega) - F(t)| \to 0 \quad \text{as } n \to \infty.$$

Proof. Suppose $t \in [a,b)$. First write
$$F_n(t) - F(t) \le F_n(b-) - F(a) = F_n(b-) - F(b-) + F(b-) - F(a) \le |F_n(b-) - F(b-)| + |F(b-) - F(a)|,$$
$$F_n(t) - F(t) \ge F_n(a) - F(b-) = F_n(a) - F(a) + F(a) - F(b-) \ge -|F_n(a) - F(a)| - |F(b-) - F(a)|.$$
This implies that if $\pi_S$ is the partition generated by a finite set $S$, then
$$\sup_{t\in\mathbb{R}}|F_n(t) - F(t)| \le \max_{s_j\in S}\max\big(|F_n(s_j) - F(s_j)|, |F_n(s_j-) - F(s_j-)|\big) + \max_{s_j\in S}|F(s_j-) - F(s_{j-1})|.$$
Now we know by the previous lemma that there exists a sequence of finite sets $(S_m)_{m=1}^{\infty}$ such that $W_m \to 0$ as $m \to \infty$. Take $R_N = \bigcup_{m=1}^{N}S_m$; then each $R_N$ is a finite set and $R_\infty = \bigcup_N R_N$ is countable. For all $s \in R_\infty$ we have
$$F_n(s) = \frac{1}{n}\sum_{i=1}^{n}I_{(-\infty,s]}(X_i) \to EI_{(-\infty,s]}(X) = F(s) \quad \text{a.s.}, \qquad F_n(s-) = \frac{1}{n}\sum_{i=1}^{n}I_{(-\infty,s)}(X_i) \to EI_{(-\infty,s)}(X) = F(s-) \quad \text{a.s.},$$
by the strong law of large numbers, and we can associate with $s$ a null set $N(s)$ outside of which the convergence holds. Therefore the first part
$$\max_{s_j\in R_N}\max\big(|F_n(s_j) - F(s_j)|, |F_n(s_j-) - F(s_j-)|\big) \to 0 \quad \text{as } n \to \infty$$
outside the null set $\bigcup_{s\in R_\infty}N(s)$. Now consider the second part. We know that $R_N$ contains all the points of every $S_m$ with $m \le N$, and $F$ is an increasing function, so inserting additional points can only decrease the maximal gap of the partition: if a point $s'$ is inserted between $s_{j-1}$ and $s_j$, then by the triangle decomposition
$$F(s_j-) - F(s_{j-1}) = (F(s_j-) - F(s')) + (F(s') - F(s'-)) + (F(s'-) - F(s_{j-1})) \ge \max\big(F(s_j-) - F(s'),\ F(s'-) - F(s_{j-1})\big).$$
Therefore
$$w_{\pi_{R_N}} = \max_{s_j\in R_N}|F(s_j-) - F(s_{j-1})| \le \max_{s_j\in S_N}|F(s_j-) - F(s_{j-1})| = w_{\pi_{S_N}}.$$
Therefore we obtain that
$$D_n = \sup_{t}|F_n(t) - F(t)| \le \max_{s_j\in R_N}\max\big(|F_n(s_j) - F(s_j)|, |F_n(s_j-) - F(s_j-)|\big) + w_{\pi_{R_N}},$$
and since $w_{\pi_{R_N}} \le w_{\pi_{S_N}} \to 0$ as $N \to \infty$, letting first $n \to \infty$ and then $N \to \infty$ gives $D_n \to 0$ outside the null set $\bigcup_{s\in R_\infty}N(s)$. Therefore the convergence is almost sure.
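The following simulation sketch (not part of the original notes; the standard normal distribution and the availability of SciPy are illustrative assumptions) shows the sup-distance $D_n$ shrinking along a single sample path, as the theorem predicts.

```python
# Sketch of Glivenko-Cantelli: D_n = sup_t |F_n(t) - F(t)| for growing n,
# one sample path, X ~ N(0,1).  For a continuous F the supremum is attained
# at the order statistics, comparing F_n from the left and from the right.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
for n in (10**2, 10**3, 10**4, 10**5):
    x = np.sort(rng.normal(size=n))
    F = stats.norm.cdf(x)
    Fn_right = np.arange(1, n + 1) / n       # F_n at the order statistics
    Fn_left = np.arange(0, n) / n            # F_n just below the order statistics
    D_n = max(np.abs(Fn_right - F).max(), np.abs(F - Fn_left).max())
    print(n, D_n)
```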
THEOREM 5.6.2 (Weierstrass Approximation Theorem) Let $f$ be a continuous function on $[0,1]$ and define the Bernstein polynomials $(p_n)_{n\ge1}$ as follows:
$$p_n(\lambda) = \sum_{k=0}^{n}\binom{n}{k}\lambda^k(1-\lambda)^{n-k}f\left(\frac{k}{n}\right).$$
Then $p_n$ converges to $f$ uniformly on $[0,1]$.

Proof. Let $X, (X_n)_{n\ge1}$ be i.i.d. random variables such that
$$P(X_n = 1) = \lambda, \qquad P(X_n = 0) = 1 - \lambda,$$
and let $S_n = \sum_{i=1}^{n}X_i$. Then
$$p_n(\lambda) = \sum_{k=0}^{n}P(S_n = k)f\left(\frac{k}{n}\right) = \sum_{k=0}^{n}f\left(\frac{k}{n}\right)P\left(\frac{S_n}{n} = \frac{k}{n}\right) = E_\lambda f\left(\frac{S_n}{n}\right).$$

(1) By the strong law of large numbers we know that $S_n/n \to \lambda$ a.s. Then
$$\lim_{n\to\infty}E_\lambda f\left(\frac{S_n}{n}\right) = \lim_{n\to\infty}\int f\left(\frac{S_n}{n}\right)dP = \int\lim_{n\to\infty}f\left(\frac{S_n}{n}\right)dP = \int f(\lambda)\,dP = f(\lambda),$$
because $f$ is continuous on $[0,1]$ and thus bounded, so we can exchange the limit with the integral sign using the Dominated Convergence Theorem, and continuity of $f$ gives $f(S_n/n) \to f(\lambda)$ a.s.

(2) Alternatively, by the weak law of large numbers we know that $S_n/n \to \lambda$ in probability. Since $f$ is uniformly continuous on $[0,1]$ we have $f(S_n/n) \to f(\lambda)$ in probability. Since $(f(S_n/n))_{n\ge1}$ is uniformly bounded, hence $L^1$-bounded, we know that $f(S_n/n) \to f(\lambda)$ in $L^1$. Therefore
$$E_\lambda f\left(\frac{S_n}{n}\right) \to E(f(\lambda)) = f(\lambda) \quad \text{as } n \to \infty.$$

Either argument gives pointwise convergence; for uniform convergence we argue as follows. Due to the uniform continuity of $f$, for all $\varepsilon > 0$ there exists some $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon$ whenever $|x - y| < \delta$. Define $M = \max_{0\le x\le 1}|f(x)|$. Now compute
$$|p_n(\lambda) - f(\lambda)| = \left|E_\lambda f\left(\frac{S_n}{n}\right) - f(\lambda)\right| \le E_\lambda\left|f\left(\frac{S_n}{n}\right) - f(\lambda)\right| = \int_{\{|S_n/n-\lambda|\ge\delta\}}\left|f\left(\frac{S_n}{n}\right) - f(\lambda)\right|dP + \int_{\{|S_n/n-\lambda|<\delta\}}\left|f\left(\frac{S_n}{n}\right) - f(\lambda)\right|dP$$
$$\le 2MP\{|S_n/n - \lambda| \ge \delta\} + \varepsilon \le \frac{2Mn\lambda(1-\lambda)}{n^2\delta^2} + \varepsilon \quad \text{(by Chebyshev's inequality)} \le \frac{M}{2n\delta^2} + \varepsilon.$$
Choose $n(\varepsilon)$ such that $M/(2n\delta^2) < \varepsilon$; this is possible because $\delta$ depends only on $\varepsilon$, so the bound is uniform in $\lambda$. Then for all $n \ge n(\varepsilon)$ and all $\lambda\in[0,1]$ we obtain
$$|p_n(\lambda) - f(\lambda)| \le 2\varepsilon.$$
Therefore $p_n(\lambda)$ converges to $f(\lambda)$ uniformly over $[0,1]$.
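A short numerical sketch of the Bernstein construction (not part of the original notes; the target function $f(x) = |x - 1/2|$ and the use of SciPy's binomial pmf are illustrative assumptions):

```python
# Sketch of Bernstein polynomial approximation: p_n(lam) = E_lam f(S_n / n)
# with S_n ~ Binomial(n, lam).  The sup-norm error over a grid should decrease in n.
import numpy as np
from scipy.stats import binom

def bernstein(f, n, lam):
    k = np.arange(n + 1)
    weights = binom.pmf(k, n, lam)           # P(S_n = k)
    return np.sum(weights * f(k / n))

f = lambda x: np.abs(x - 0.5)                # continuous but not smooth
grid = np.linspace(0.0, 1.0, 201)
for n in (10, 100, 1000):
    err = max(abs(bernstein(f, n, lam) - f(lam)) for lam in grid)
    print(n, err)
```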
Here we prove a fundamental lemma connecting tail probabilities and expectation using the Lebesgue integral:

LEMMA 5.6.2 Suppose $X$ is an integrable random variable on a probability space $(\Omega,\mathcal{F},P)$. Then
$$E|X| = \int_0^{\infty}P(|X| > t)\,dt.$$

Proof. Without loss of generality we may assume that $X \ge 0$. Now for all $n \ge 1$, consider the product space $((0,n)\times\Omega, \mathcal{B}_{(0,n)}\otimes\mathcal{F}, (\lambda/n)\times P)$. Then we have
$$\int_0^{n}P(X > t)\,dt = n\int_{(0,n)}\int_{\Omega}I_{\{X(\omega)>t\}}\,dP\,\frac{\lambda}{n}(dt) = n\int_{\Omega}\int_{(0,n)}I_{\{X(\omega)>t\}}\,\frac{\lambda}{n}(dt)\,dP$$
due to Fubini's theorem. Notice that for $t \in (0,n)$ we have $X(\omega) > t$ if and only if $0 < t < \min(X(\omega),n)$. Therefore
$$\int_0^{n}P(X > t)\,dt = \int_{\Omega}\int_{(0,\min(X(\omega),n))}\lambda(dt)\,dP = \int_{\Omega}\min(X(\omega),n)\,dP.$$
Since $\min(X(\omega),n)$ increases to $X(\omega)$ as $n \to \infty$ (recall $X < \infty$ a.s. because $EX < \infty$), the Monotone Convergence Theorem gives
$$EX = \lim_{n\to\infty}\int_{\Omega}\min(X(\omega),n)\,dP = \lim_{n\to\infty}\int_0^{n}P(X > t)\,dt = \int_0^{\infty}P(X > t)\,dt.$$
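A quick numerical sketch of this identity (not part of the original notes; the Exp(1) distribution and the truncation of the tail integral at $t = 20$ are illustrative assumptions, both sides being close to 1 in that case):

```python
# Sketch of E|X| = integral_0^inf P(|X| > t) dt for |X| ~ Exp(1).
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(1.0, size=10**5)
t = np.linspace(0.0, 20.0, 401)
tail = np.array([np.mean(x > s) for s in t])   # empirical P(|X| > t)
print(np.trapz(tail, t), x.mean())             # both close to E|X| = 1
```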
Here is a sufficient condition for the (weak) law of large numbers:

THEOREM 5.6.3 Suppose $(X_n)_{n\ge1}$ is a sequence of i.i.d. random variables, identically distributed as $X$, and let $S_n = \sum_{j=1}^{n}X_j$. If $tP(|X| > t) \to 0$ as $t \to \infty$, then there exists a sequence $(\mu_n)_{n\ge1}$ such that $S_n/n - \mu_n \to 0$ in probability, and one can take
$$\mu_n = \int_{\{|X|\le n\}}X\,dP.$$

Proof. Define $Y_{ni} = \tau_n(X_i)$, set $T_n = \sum_{i=1}^{n}Y_{ni}$, and let $Y_n = \tau_n(X)$, where $X$ is identically distributed as the $X_i$'s. Then
$$\frac{S_n}{n} - \mu_n = \frac{S_n - T_n}{n} + \left(\frac{T_n}{n} - \mu_n\right).$$
If the two terms on the right-hand side converge to 0 in probability, then the proof is completed. Hence it suffices to show:

• $(S_n - T_n)/n \to 0$ in probability as $n \to \infty$. Compute
$$P\left\{\frac{|S_n - T_n|}{n} > \varepsilon\right\} \le P\left\{\sum_{i=1}^{n}|X_i - \tau_n(X_i)| > 0\right\} = P\left(\bigcup_{i=1}^{n}\{X_i \ne \tau_n(X_i)\}\right) \le \sum_{i=1}^{n}P(X_i \ne \tau_n(X_i)) = \sum_{i=1}^{n}P(|X_i| > n) = nP(|X| > n) \to 0$$
as $n \to \infty$ by the assumption.

• $T_n/n - \mu_n \to 0$ in probability as $n \to \infty$. Notice that $EY_{ni} = EY_n = \mu_n$; then by Chebyshev's inequality
$$P\left\{\left|\frac{T_n}{n} - \mu_n\right| \ge \varepsilon\right\} \le \frac{\mathrm{Var}(T_n)}{n^2\varepsilon^2} \le \frac{\mathrm{Var}(Y_n)}{n\varepsilon^2} \le \frac{EY_n^2}{n\varepsilon^2} = \frac{1}{n\varepsilon^2}\int Y_n^2\,dP.$$
Now we apply the previous lemma to the integrable random variable $Y_n^2$:
$$\int Y_n^2\,dP = \int_0^{\infty}P(Y_n^2 > t)\,dt = \int_0^{\infty}P\big(X^2I_{\{|X|\le n\}} > t\big)\,dt = \int_0^{\infty}2uP\big(|X|I_{\{|X|\le n\}} > u\big)\,du = 2\int_0^{n}uP\big(|X|I_{\{|X|\le n\}} > u\big)\,du \le 2\int_0^{n}tP(|X| > t)\,dt.$$
Since $tP(|X| > t) \to 0$ as $t \to \infty$, L'Hospital's rule gives $\frac{1}{n}\int_0^{n}tP(|X| > t)\,dt \to 0$, and therefore
$$P\left\{\left|\frac{T_n}{n} - \mu_n\right| \ge \varepsilon\right\} \le \frac{2}{n\varepsilon^2}\int_0^{n}tP(|X| > t)\,dt \to 0$$
as $n \to \infty$.
Chapter 6

Characteristic Functions

6.1 General Properties and Convolutions

DEFINITION 6.1.1 (Complex-Valued Random Variables and Expectation) A function $Z : (\Omega,\mathcal{F},P) \to \mathbb{C}$ is a complex-valued random variable if $X = \mathrm{Re}(Z)$ and $Y = \mathrm{Im}(Z)$ are random variables. If $X, Y$ are integrable, namely $E|X| < \infty$ and $E|Y| < \infty$, then the expectation of $Z$ is defined by $EZ = EX + iEY$.

DEFINITION 6.1.2 (Characteristic Functions) If $X$ is a random variable with law $\mu$ and distribution function $F$, then the characteristic function $f$ of $X$ (or of $\mu$, or of $F$) is the function from $\mathbb{R}$ to $\mathbb{C}$ given by
$$f(t) = Ee^{itX} = \int_{\mathbb{R}}e^{itx}\mu(dx) = \int_{\mathbb{R}}e^{itx}\,dF(x).$$
LEMMA 6.1.1 (Modulus Inequality) Suppose $Z$ is a complex-valued random variable with $\mathrm{Re}(Z)$ and $\mathrm{Im}(Z)$ integrable. Then $|EZ| \le E|Z|$.

Proof. First note that if $c = a + ib$ and $Z = X + iY$, then
$$E(cZ) = E\big((a+ib)(X+iY)\big) = E\big((aX - bY) + i(bX + aY)\big) = aEX - bEY + i(bEX + aEY) = (a+ib)(EX + iEY) = cEZ.$$
Namely, scaling by a complex number can be exchanged with $E$. Next write
$$EZ = |EZ|(\cos\theta + i\sin\theta) = |EZ|e^{i\theta}.$$
By definition we know that $\mathrm{Re}(EZ) = \mathrm{Re}\big(E(\mathrm{Re}(Z)) + iE(\mathrm{Im}(Z))\big) = E(\mathrm{Re}(Z))$; namely, taking the real part can be exchanged with $E$. Then
$$|EZ| = \mathrm{Re}(e^{-i\theta}EZ) = \mathrm{Re}(E(e^{-i\theta}Z)) = E(\mathrm{Re}(e^{-i\theta}Z)) \le E|e^{-i\theta}Z| = E|Z|.$$
PROPOSITION 6.1.1 Suppose $f$ is the characteristic function of $X$.
• $|f(t)| \le f(0) = 1$.
• $f$ is uniformly continuous over $\mathbb{R}$.
• $f_{aX+b}(t) = f_X(at)e^{itb}$ and $f_{-X}(t) = f_X(-t) = \overline{f_X(t)}$.
• Let $M$ be a probability measure on $(\Omega_1,\mathcal{A}_1)$ and $K$ be a transition probability from $\Omega_1$ to $(\mathbb{R},\mathcal{R})$. Then
$$\int_{\mathbb{R}}e^{itx}(MK)(dx) = \int_{\Omega_1}\left(\int_{\mathbb{R}}e^{itx}K_{\omega_1}(dx)\right)M(d\omega_1).$$
• The characteristic function of a finite sum of independent random variables is the product of the characteristic functions of those random variables.

Proof. Denote by $\mu$ the distribution of $X$.
• Write
$$|f(t)| = \left|\int_{\mathbb{R}}e^{itx}\mu(dx)\right| \le \int_{\mathbb{R}}|e^{itx}|\mu(dx) \le 1 = f(0),$$
because $|e^{itx}| \le 1$ and $f(0) = E(e^{i0X}) = 1$.
• Write
$$|f(t+h) - f(t)| = \left|\int_{\mathbb{R}}e^{i(t+h)x}\mu(dx) - \int_{\mathbb{R}}e^{itx}\mu(dx)\right| = \left|\int_{\mathbb{R}}e^{itx}(e^{ihx}-1)\mu(dx)\right| \le \int_{\mathbb{R}}|e^{itx}||e^{ihx}-1|\mu(dx) \le \int_{\mathbb{R}}|e^{ihx}-1|\mu(dx).$$
Since $|e^{ihx}-1| \le 2$, by the Dominated Convergence Theorem the right-hand side goes to 0 as $h \to 0$, uniformly in $t \in \mathbb{R}$. Therefore uniform continuity follows.
• Write
$$f_{aX+b}(t) = Ee^{i(aX+b)t} = e^{ibt}Ee^{i(at)X} = f_X(at)e^{ibt}$$
and
$$f_{-X}(t) = Ee^{i(-X)t} = Ee^{i(-t)X} = f_X(-t) = E\cos tX - iE\sin tX = \overline{f_X(t)}.$$
• This is a direct result of integration with respect to a mixture distribution.
• It suffices to show the case of the sum of two independent random variables $X, Y$. Since $e^{itx}$ is a Borel function from $\mathbb{R}$ to $\mathbb{C}$ (regarding $\mathbb{C}$ as a measurable space identified with $\mathbb{R}^2$), we know that $e^{itX}$ and $e^{itY}$ are independent and therefore
$$f_{X+Y}(t) = Ee^{it(X+Y)} = E(e^{itX}e^{itY}) = (Ee^{itX})(Ee^{itY}) = f_X(t)f_Y(t).$$
DEFINITION 6.1.3 Suppose $X_1, X_2$ are independent with laws $\mu_1,\mu_2$ and distribution functions $F_1, F_2$. Then the law (resp. the distribution function) of $X_1 + X_2$ is called the convolution of $\mu_1$ and $\mu_2$ (resp. of $F_1$ and $F_2$), denoted by $\mu = \mu_1 * \mu_2$ (resp. $F_1 * F_2$).

THEOREM 6.1.1 (Convolution Formula) Suppose $X_1, X_2$ are independent with laws $\mu_1,\mu_2$ and distribution functions $F_1, F_2$. Then
$$F(x) = \int_{\mathbb{R}}F_2(x - x_1)\,dF_1(x_1), \qquad \mu(B) = \int_{\mathbb{R}}\mu_2(B - x_1)\mu_1(dx_1)$$
for all $B \in \mathcal{R}$, where $B - x_1$ is the translated set $B - x_1 = \{y - x_1 : y \in B\}$.

Proof. First consider the distribution function. Define $g : \mathbb{R}^2 \to \mathbb{R}$, $(x_1,x_2) \mapsto x_1 + x_2$. Clearly $g$ is a Borel function. Set $\tilde\mu$ to be the probability distribution induced by $(X_1,X_2)$, namely the joint distribution. From the result on product measures we know that $\tilde\mu = \mu_1\times\mu_2$. Then
$$F(x) = P\{X_1 + X_2 \le x\} = (\tilde\mu g^{-1})(-\infty,x] = ((\mu_1\times\mu_2)g^{-1})(-\infty,x] = (\mu_1\times\mu_2)(g^{-1}(-\infty,x]) = \int\mu_2\big((g^{-1}(-\infty,x])_{x_1}\big)\mu_1(dx_1)$$
$$= \int\mu_2(\{x_2 : x_1 + x_2 \le x\})\mu_1(dx_1) = \int\mu_2(-\infty,x-x_1]\,\mu_1(dx_1) = \int F_2(x - x_1)\,dF_1(x_1).$$
Now we compute the convolution of the measures. For $B \in \mathcal{R}$ write
$$\mu(B) = P\{X_1 + X_2 \in B\} = (\tilde\mu g^{-1})(B) = (\mu_1\times\mu_2)(g^{-1}(B)) = \int\mu_2\big((g^{-1}B)_{x_1}\big)\mu_1(dx_1) = \int\mu_2(\{x_2 : x_1 + x_2 \in B\})\mu_1(dx_1)$$
$$= \int\mu_2(\{x_2 : x_2 \in B - x_1\})\mu_1(dx_1) = \int\mu_2(B - x_1)\mu_1(dx_1).$$
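The convolution formula is easy to check numerically. The following sketch (not part of the original notes; the choice of a three-point uniform law for $X_1$ and an Exp(1) law for $X_2$ is an illustrative assumption) compares $\int F_2(x-x_1)\,dF_1(x_1)$ with a Monte Carlo estimate of $P(X_1+X_2\le x)$.

```python
# Sketch of F(x) = integral F_2(x - x_1) dF_1(x_1) with X1 uniform on {0,1,2}
# (so dF_1 is a finite sum over atoms) and X2 ~ Exp(1).
import numpy as np

rng = np.random.default_rng(6)
atoms, probs = np.array([0.0, 1.0, 2.0]), np.array([1/3, 1/3, 1/3])
F2 = lambda t: np.where(t >= 0, 1.0 - np.exp(-np.maximum(t, 0.0)), 0.0)  # Exp(1) cdf

x = 1.5
conv_cdf = np.sum(probs * F2(x - atoms))     # convolution formula

x1 = rng.choice(atoms, p=probs, size=10**6)
x2 = rng.exponential(1.0, size=10**6)
print(conv_cdf, np.mean(x1 + x2 <= x))       # the two values should agree
```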
THEOREM 6.1.2 If µ1 and µ2 are two distributions and µ2 has a density with respect to ν2 ,
where ν2 is defined on a measurable space with linear structure, and dµ2 = D2 dν2 . Suppose ν2
is translation-invariant, then the convolution measure, µ1 ∗ µ2 , also has a density.
Proof. By the convolution formula we know that
Z
Z Z
F (x) = (F1 ∗ F2 )(x) =
F2 (x − x1 )dF (x1 ) =
D2 (x2 )ν2 (dx2 ) dF (x1 )
R
R
(−∞,x−x1 ]
Z Z
=
D2 (y − x1 )ν2 (d(y − x1 )) dF (x1 )
R
(−∞,x]
Z Z
=
D2 (y − x1 )ν2 (dy) dF (x1 )
R
(−∞,x]
because ν2 is translation-invariant. Now we can apply the Fubini theorem(General Measure
Version) and obtain that
Z Z
Z
F (x) =
D2 (y − x1 )ν2 (dy) dF (x1 ) =
R
(−∞,x]
Z
(−∞,x]
D2 (y − x1 )dF (x1 ) ν2 (dy)
R
Z
D2 (y − x1 )dF (x1 ) is a density of F .
This means that D =
R
Now let us consider the smoothing property of convolution with respect to a nice distribution, in particular the normal distribution, which has a smooth density. Denote
$$n_\delta(x) = \frac{1}{\sqrt{2\pi\delta^2}}\exp\left(-\frac{x^2}{2\delta^2}\right), \qquad x\in\mathbb{R}.$$
And for any bounded Borel function $f$ (which can be extremely irregular in terms of smoothness), define
$$f_\delta(x) = (f * n_\delta)(x) = \int_{\mathbb{R}}f(x-y)n_\delta(y)\lambda(dy) = \int_{\mathbb{R}}n_\delta(x-y)f(y)\lambda(dy).$$

THEOREM 6.1.3 Define $f_\delta$ for each bounded Borel function $f$ and $\delta > 0$. Then $f_\delta \in C_B^\infty$, the collection of functions with bounded derivatives of all orders. Furthermore, if $f \in C_u$, the collection of all bounded uniformly continuous functions, then $f_\delta \to f$ uniformly as $\delta \downarrow 0$.
Proof. First we show that fδ ∈ CB∞ . Clearly nδ ∈ CB∞ and we know that
k
k
d
x
1
dxk nδ (x) = − δ Hk δ nδ (x)
where Hk is the kth Hermite polynomial. Note that polynomials can be eventually dominated
by exponential functions, then there exist some constant c(k, δ) such that
2
k
x x
1
Hk
= c(k, δ)n2δ (x)/nδ (x)
≤ c(k, δ) exp
−
δ
δ 4δ 2
Therefore the absolute value of kth derivative of nδ (x) can be dominated by c(k, δ)n2δ . Now we
know nδ has bounded derivatives of all orders, then the Dominated Convergence Theorem yields
the exchange of differentiations and integration
k
Z
Z
d
dk
≤ c(k, δ)kf k
f
(x)
=
n
(x
−
y)f
(y)λ(dy)
≤
c(k,
δ)
n
(x
−
y)f
(y)λ(dy)
δ
δ
2δ
dxk
dxk
R
R
since the integral of n2δ (x − y) over R is 1. Therefore fδ ∈ CB∞ .
Next we show that fδ converges to f uniformly as δ ↓ 0 provided that f is uniformly continuous. First note that for a fixed η > 0 we have
Z
lim
nδ (y)dy = lim(2Φ(η/δ) − 1) = 0
δ↓0
|y|>η
δ↓0
where Φ is the distribution function of the standard normal distribution. Now for any fixed
η > 0 we have
Z
lim sup |f (x) − fδ (x)| ≤ lim sup
δ↓0
δ↓0
|f (x) − f (x − y)|nδ (y)λ(dy)
R
Z
Z
≤ lim sup
|f (x) − f (x − y)|nδ (y)λ(dy) + 2kf k lim sup
δ↓0
Z
nδ (y)dy
≤ sup |f (x) − f (x − y)| lim sup
δ↓0
{|y|<η}
|y|<η
δ↓0
nδ (y)dy
{|y|≥η}
|y|≤δ
≤ sup |f (x) − f (x − y)|
|y|<η
Now let η ↓ 0 and we obtain that sup|y|<η |f (x) − f (x − y)| ↓ 0 uniformly in x because f is
uniform continuous.
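The smoothing effect of $f_\delta = f * n_\delta$ is easy to visualize numerically. The following sketch (not part of the original notes; the indicator $f = 1_{[0,1]}$, the integration grid, and the values of $\delta$ are illustrative assumptions) convolves a discontinuous bounded Borel function with a normal density: the result is smooth and tracks $f$ away from the jumps as $\delta \downarrow 0$.

```python
# Sketch of f_delta = f * n_delta for f = indicator of [0,1], evaluated by
# numerical quadrature of the convolution integral.
import numpy as np

def f(x):                                    # bounded Borel function with jumps
    return ((x >= 0) & (x <= 1)).astype(float)

def f_delta(x, delta, n_grid=40001):
    y = np.linspace(-10.0, 10.0, n_grid)     # integration grid
    dens = np.exp(-y**2 / (2 * delta**2)) / np.sqrt(2 * np.pi * delta**2)
    return np.trapz(f(x - y) * dens, y)

for delta in (0.5, 0.1, 0.02):
    print(delta, f_delta(0.5, delta), f_delta(2.0, delta))   # ~1 inside, ~0 far outside
```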
THEOREM 6.1.4 If µn and µ are subprobability measures with
Z
Z
f dµn → f dµ
for all f ∈ CB∞ , then µn converges to µ vaguely.
Proof. Now we need to pickup f ∈ CK as a test function to apply the convergence using integral
of f . Clearly we know that f is bounded, Borel, and uniformly continuous since it is continuous
over a compact set and vanishes outside. Therefore fδ ∈ CB∞ and fδ converges to f uniformly as
δ ↓ 0. Now compute
Z
Z
Z
Z
Z
Z
Z
f dµn − f dµ ≤ f dµn − fδ dµn + fδ dµn − fδ dµ + fδ dµ − f dµ
Z
Z
≤ 2kfδ − f k + fδ dµn − fδ dµ → 0
because fδZ ∈ CB∞ and
Z our assumption can be applied to the second difference of integrals.
Therefore f dµn → f dµ for f ∈ CK and µn converges weakly to µ.
6.2
Characterization Property
The first thing we need to consider is the one-to-one correspondence between the characteristic
functions and the distributions of random variables. We can derive the formula of µ in terms
of f , in principle, given by the so-called inversion formula. But before this, we need some
preliminaries.
LEMMA 6.2.1 (Dirichlet Integrals) The following four equalities hold
Z ∞
Z ∞
sin x
π
1 − cos x
π
dx =
dx
=
x
2
x2
2
Z0 ∞
Z0 ∞
1 − cos αx
π
sin αx
π
dx = sgn(α)
dx = |α|
2
x
2
x
2
0
0
Simultaneously we also need an upper bound for indefinite Dirichlet integral:
LEMMA 6.2.2 For all y ≥ 0 we have
Z
0 ≤ sgn(α)
0
y
sin αx
dx ≤
x
π
Z
0
sin x
dx
x
Proof. We may assume α 6= 0, otherwise the result is trivial. Without loss of generality we may
assume that α > 0. Write
y
Z αy
sin αx
sin x
I(y) =
dαx =
dx
αx
x
0
0
0
Taking derivative with respect to y yields that I 0 nπ
= 0 for all n ∈ N+ . This means that I
α
Z
sin αx
dx =
x
Z
y
takes local maximum values at y = nπ/α. Note that when αy = nπ we have
n Z kπ
nπ Z nπ sin x
X
sin x
=
dx =
dx
I
α
x
x
(k−1)π
0
k=1
which is a finite sum of n integrals with alternating signs and decreasing absolute values.
nπ
Therefore
I
decreases as n increases. Hence I(y) takes maximum value at π/α, which
α
Z π
sin x
is
dx.
x
0
THEOREM 6.2.1 (Inversion Formula) Suppose f is the characteristic function of a distribution µ, then for a < b we have
1
1
1
µ(a, b) + µ{a} + µ{b} = lim
T →∞ 2π
2
2
Z
T
−T
e−ita − e−itb
f (t)dt
it
Proof. First compute
Z T −ita
Z T −ita
Z
Z T Z it(x−a)
e
− e−itb
e
− e−itb
e
− eit(x−b)
itx
f (t)dt =
e µ(dx)dt =
µ(dx)dt
2πit
2πit
2πit
−T
−T
R
−T R
Note that
it(x−a)
−ita
Z b
it(x−b) −itb e
e
|b − a|
−
e
−
e
1
itx
= |e | ≤
|e−itu |du ≤
2πit
2πit
2π a
2π
Therefore
Z
T
−T
Z it(x−a)
Z
e
− eit(x−b) µ(dx)dt ≤
2πit
R
T
−T
Z
R
|b − a|
|b − a|T
µ(dx)dt =
<∞
2π
π
And we can apply the Fubini theorem and yield
Z T −ita
Z T Z it(x−a)
Z Z T it(x−a)
e
− e−itb
e
− eit(x−b)
e
− eit(x−b)
f (t)dt =
µ(dx)dt =
dtµ(dx)
2πit
2πit
2πit
−T
−T R
R −T
Set
Z
T
I(T, x, a, b) =
−T
eit(x−a) − eit(x−b)
dt
2πit
Then by using the property cos(−x) = cos x, sin(−x) = − sin x and the symmetric property we
obtain symplification of I(T, x, a, b):
Z T
Z T
1
sin(t(x − a))
sin(t(x − b))
I(T, x, a, b) =
dt +
dt
π
t
t
0
0
Now we can compute the limit for I as T → ∞. By using the results from Dirichlet integrals
we have


− 1/2 − (−1/2) = 0,






1/2,


lim I(T, x, a, b) = 1/2 − (−1/2) = 1,
T →∞




1/2,





1/2 − 1/2 = 0,
x<a
x=a
a<x<b
x=b
x>b
and an upper bound for I(T, x, a, b) is provided by the previous lemma:
Z
2 π sin x |I(T, x, a, b)| ≤ dx
π 0
x
Therefore by the Dominated Convergence Theorem we obtain that
Z
lim
I(T, x, a, b)µ(dx)
T →∞ R
Z
=
lim I(T, x, a, b)µ(dx)
R T →∞
Z
Z
Z
Z
Z
=
+
+
+
+
lim I(T, x, a, b)µ(dx)
(−∞,a)
{a}
(a,b)
{b}
(b,+∞)
T →∞
1
1
=µ(a, b) + µ{a} + µ{b}
2
2
This inversion formula provides a way of computing probability measures on R of any intervals
of the form (a, b], and hence uniquely determine the probability measure µ on R. Here is an
explanation. It suffices to use the inversion formula to compute the measure of a singleton.
Suppose {x} is the singleton of interest. Since atoms on the real line are at most countable, then
the continuity points are dense in R. Therefore there exists sequence (an )n≥1 and (bn )n≥1 such
that an ↑ x, bn ↓ x and an , bn are continuity points of µ. Therefore by the monotone sequential
continuity from above we obtain
1
1
1
µ{x} = lim (µ(an , bn ) + µ{an } + µ{bn }) = lim
n→∞
n→∞ 2π
2
2
Z
T
−T
e−itan − e−itbn
f (t)dt
it
Besides, there is another way of computing the measure of a singleton using the characteristic function:

THEOREM 6.2.2 Suppose $f$ is the characteristic function of a distribution $\mu$. Then
$$\mu\{x_0\} = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}e^{-itx_0}f(t)\,dt.$$

Proof. First compute
$$\int_{-T}^{T}|e^{-itx_0}f(t)|\,dt \le \int_{-T}^{T}\int_{\mathbb{R}}|e^{-itx_0}||e^{itx}|\mu(dx)\,dt \le 2T < \infty.$$
Then by Fubini's theorem we can exchange the order of the iterated integral:
$$\frac{1}{2T}\int_{-T}^{T}e^{-itx_0}f(t)\,dt = \frac{1}{2T}\int_{-T}^{T}\int_{\mathbb{R}}e^{-itx_0}e^{itx}\mu(dx)\,dt = \int_{\mathbb{R}}\frac{1}{2T}\int_{-T}^{T}e^{it(x-x_0)}\,dt\,\mu(dx) = \int_{\mathbb{R}}\frac{\sin T(x-x_0)}{T(x-x_0)}\mu(dx) = \int_{\mathbb{R}\setminus\{x_0\}}\frac{\sin T(x-x_0)}{T(x-x_0)}\mu(dx) + \mu\{x_0\},$$
where the integrand is interpreted as 1 at $x = x_0$. Note that
$$\left|\frac{\sin T(x-x_0)}{T(x-x_0)}\right| \le 1 \quad \text{for all } x, \qquad \frac{\sin T(x-x_0)}{T(x-x_0)} \to 0 \quad \text{as } T\to\infty \text{ for every } x \ne x_0.$$
Therefore by the Dominated Convergence Theorem we know that
$$\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}e^{-itx_0}f(t)\,dt = \int_{\mathbb{R}\setminus\{x_0\}}\lim_{T\to\infty}\frac{\sin T(x-x_0)}{T(x-x_0)}\mu(dx) + \mu\{x_0\} = \mu\{x_0\}.$$
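The time-average in Theorem 6.2.2 can be evaluated numerically. The following sketch (not part of the original notes; the mixture $\mu = \tfrac12\delta_0 + \tfrac12 N(0,1)$, whose characteristic function is $\tfrac12 + \tfrac12 e^{-t^2/2}$, and the quadrature grid are illustrative assumptions) recovers the atom mass $1/2$ at $x_0 = 0$ and the value $0$ at a non-atom.

```python
# Sketch of mu{x0} = lim_T (1/2T) int_{-T}^{T} e^{-i t x0} f(t) dt for the mixture
# mu = 0.5*delta_0 + 0.5*N(0,1).
import numpy as np

def f(t):                                    # characteristic function of the mixture
    return 0.5 + 0.5 * np.exp(-t**2 / 2)

def atom_mass(x0, T, n_grid=200001):
    t = np.linspace(-T, T, n_grid)
    return np.real(np.trapz(np.exp(-1j * t * x0) * f(t), t) / (2 * T))

for T in (10.0, 100.0, 1000.0):
    print(T, atom_mass(0.0, T), atom_mass(1.0, T))   # -> 0.5 and -> 0
```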
THEOREM 6.2.3 Suppose f is the characteristic function of a distribution µ, then
Z T
X
1
2
(µ{x}) = lim
|f (t)|2 dt
T →∞ 2T −T
x∈R
Proof. Let X be distributed according to µ and Y be i.i.d. with X(The existence of such an Y
is due to the Kolmogorov’s extension theorem, which is beyond the scope of this course). Define
g(x, y) = x − y. Then
Z
−1
P {X − Y = 0} = (µ × µ)(g {0}) = (µ(g −1 {0})x )µ(dx)
Z
X
= (µ{x})µ(dx) =
(µ{x})2
x∈R
Now compute
fX−Y (t) = fX (t)f−Y (t) = fX (t)fX (−t) = fX (t)fX (t) = |f (t)|2
And apply the previous theorem, then
Z T
Z T
X
1
1
2
−it0
(µ{x}) = P {X − Y = 0} = lim
e fX−Y (t)dt = lim
|f (t)|2 dt
T →∞ 2T −T
T →∞ 2T −T
x∈R
THEOREM 6.2.4 (Uniqueness) If two probability distributions have the same characteristic function, then they are the same measure.
Proof. First we claim that µ = ν if and only if for all f ∈ Ck we have
R
f dµ =
R
f dν.
Necessity part is trivial, we only need to consider the sufficiency part. This is due to the
uniqueness of the weak limit of a sequence of measures. For all ε > 0, choose A such that
µ[−A, A] > 1 − ε, ν[−A, A] > 1 − ε and K ⊂ [−A, A]. Now we select a test function f ∈ CK
and let gε be the periodic extension of f over [−A, A]. Since f ∈ CK then we know that
f (−A) = f (A) = 0 and therefore periodic extension g is continuous over the real line. By the
Stone-Weierstrass theorem there exists N = N (ε) and a function of the form
N
X
pε (x) =
an einx
n=−N
Z
such that supx∈R |gε (x) − p(x)| < ε. First note that
Z
pε dµ =
pε dν because they are linear
combinations of characteristic function evaluated at n. Then
Z
Z
f dµ − f dν Z
Z
Z
Z
Z
≤ (f − gε )dµ + (gε − pε )dµ + pε (dµ − dν) + (f − gε )dν + (gε − pε )dν ≤2 sup |gε − pε | + sup |gε |µ([−A, A]c ) + sup |gε |ν([−A, A]c ) = 2ε + 2kf kε
x∈R
x∈R
x∈R
Z
because kf k = kgk due to periodic extension. Now by letting ε ↓ 0 we obtain that
Z
f dν. Therefore by the claim above we obtain that µ = ν.
f dµ =
THEOREM 6.2.5 (Fourier Inverse Transform Formula) Suppose F is a distribution
function and f is the characteristic function of F . If f is integrable with respect to the Lebesgue
measure on the real line, then F is continuously differentiable with continuous density
Z
1
0
F (x) =
e−itx f (t)dt
2π R
Proof. First we compute the following
e
ith
t
Z
− 1 = (cos th − 1) + i sin th = −h
Z
sin hudu + ih
0
And therefore
ith
|e
t
Z
cos hudu =
0
t
iheihu du
0
t
Z
|iheihu |du ≤ th
− 1| ≤
0
Therefore we can compute by the inversion formula
1
1
µ(x − h, x) + µ{x − h} + µ{x}
2
2
1
1
= (F (x − h) − F ((x − h)−)) + (F (x) − F (x − h)) − (F (x) − F (x−))
2
2
1
1
= (F (x) − F (x − h)) + (F (x−) − F ((x − h)−))
2
2
Z T ith
Z ∞ ith
e − 1 −itx
1
e − 1 −itx
1
= lim
e f (t)dt =
e f (t)dt
T →∞ 2π −T
it
2π −∞
it
ith
e − 1
≤ h and f is Lebesgue-integrable over R, and |e−itx | ≤ 1. Therefore with
where it F (x) − F (x − h) ≥ 0, F (x−) − F ((x − h)−) ≥ 0 we have
Z Z ∞
Z ∞
1 ∞ eith − 1 −itx
F (x) − F (x − h) ≤
e f (t) dt ≤
h|f (t)|dt = h
|f (t)|dt → 0
π −∞ it
−∞
−∞
as h ↓ 0. Therefore F (x − h) ↑ F (x) as h ↓ 0, which means that F is left-continuous and hence
is continuous. Namely, F (x−) − F ((x − h)−) = F (x) − F (x − h). Therefore we can compute
the left-derivative
Z ith
F (x) − F (x − h)
1
e − 1 −itx
lim
= lim
e f (t)dt
h→0
h→0 2T R
h
ith
Z
1
eith − 1
=
e−itx f (t) lim
dt
h→0
2T R
ith
Z
1
=
e−itx f (t)dt
2T R
because
eith − 1
= lim
lim
h→0
h→0
ith
sin th
1 − cos th
+i
th
th
=1
The right-derivative can be derived similarly. Therefore the result is proved.
The following corollary unveiled the motivation of the name Fourier inversion transform:
COROLLARY 6.2.6 If µ is a probability distribution on the real line with density p(x) with
respect to the Lebesgue measure λ over the real line. Denote the characteristic function of µ to
be f . Then we have
Z
f (t) =
Z
itx
e p(x)dx p(x) =
R
e−itx f (t)dt
R
The proofs of the next theorem is beyond the scope of this course, which make use of theorems
and techniques from harmonic analysis. Nevertheless the results are of great interest in the
application of the characteristic function as well as in the harmonic analysis.
THEOREM 6.2.7 Suppose F is a distribution function and f is the characteristic function
of F . If |f |2 is integrable with respect to the Lebesgue measure on the real line, then F is
absolutely continuous.
Now we introduce one of the most well-known theorem in mathematical analysis in our
probabilistic context.
THEOREM 6.2.8 (Riemann-Lebesgue Lemma) Suppose F is an absolutely continuous
distribution function, then lim f (t) = 0 where f is the characteristic function of F .
t→±∞
Proof. Since F is absolutely continuous, then there exists a function g ≥ 0 such that
Z
g(y)λ(dy)
F (x) =
(−∞,x]
and g is integrable. The proof starts from step functions and passes to the limit.
P
• Step 1: Step Function Case. Suppose g(x) = j cj IAj (x), then integrability implies
that λ(Aj ) < ∞ for all j. Notice that Aj ’s here are intervals, then
Z
X Z
itx
f (t) =
g(x)e λ(dx) =
cj
eitx λ(dx)
R
j
Aj
Notice that here Lebesgue integral is the same as the Riemann integral, and Aj ’s are finite
R
intervals, then Aj eitx dx → 0 as t → ±∞ can be yielded by direct computation. Therefore
f (t) → 0 as t → ±∞.
• Step 2: General Density Case. Let ε > 0 be any small positive real. Let S be a simple
function
S(x) =
m
X
ck IAk (x)
k=1
such that S(x) ≤ f (x) and
Z
ε
2
R
Such a simple function can be obtained by the definition of integral of a non-negative mea|g(x) − S(x)|dx <
surable function, since the integral of a non-negative measurable function is the increasing
limit of the integrals of an increasing sequence of simple functions. Due to integrability, by
assuming that all ck ’s are nonzero, we know that λ(Ak ) < ∞ for all k = 1, · · · , m. Then
by the property of Lebesgue measure and outer measure on the real line, there exists some
open set Gk such that λ(Gk − Ak ) < ε/4(mCk ) and Gk ⊃ Ak . Since Gk is open, then
there exist some Ok ⊂ Gk such that λ(Gk − Ok ) < ε/(4mCk ), where Ok can be written
P k
(aik , bik ), since Gk must be a countable disjoint union of open intervals, and
as Ok = ni=1
we know that Ok can be taken as the sufficiently large principle part of these intervals.
Therefore
λ(Ok ∆Ak ) = λ(Ok \Ak ) + λ(Ak \Ok ) ≤ λ(Gk − Ak ) + λ(Ak ∩ (Gk − Ok ))
ε
≤ λ(Gk − Ak ) + λ(Gk − Ok ) <
2mCk
Pm
Now define rk (x) = ck IOk (x) and r(x) = k=1 rk (x). Then r is a step function, and by
computation
Z ∞
Z ∞
Z ∞
Z ∞
itx
itx
itx
itx
|f (t)| = g(x)e dx ≤ (g − S)e dx + (S − r)e dx + re dx
−∞
−∞
Z ∞
Z −∞
Z ∞ −∞
∞
≤
|g − s|dx +
|s − r|dx + reitx dx
−∞
−∞
−∞
Z ∞
Z ∞
m
X
ε
itx
< +
ck
|IAk − IOk |dx + re dx
2 k=1
−∞
−∞
Z
Z
m
∞ itx ε ε ∞
ε X
itx
λ(Ak ∆Ok ) + re dx < + + r(x)e dx
= +
2 k=1
2 2
−∞
−∞
Then we can let t → ±∞ and obtain
lim sup |f (t)| ≤ ε
t→±∞
because r(x) is a step function, and the result of step 1 can be applied. But ε can be
arbitrarily small, and this shows that limt→±∞ |f (t)| = 0.
6.3
Convergence Theorem
The first result is not surprising.
THEOREM 6.3.1 Let (µn )n≥1 , µ∞ be probability measures on (R, R) with characteristic
functions (fn )n≥1 , f∞ . If µn converges weakly to µ∞ , then fn converges to f∞ uniformly in every
finite interval, and the family {fn , f∞ }n≥1 is equicontinuous.
Proof. First we show that (fn ) are equicontinuous. We know that (µn ) are tight. Then for all
ε > 0 there exists some A(ε) such that supn µn {x : |x| > A(ε)} < ε/2. Now take δ = ε/(2A(ε)).
Then for all h with |h| < δ we have
Z ihx
e − 1
µn (dx)
|fn (t + h) − fn (t)| ≤ it Z
Z
x|h|µ(dx) +
=
µn (dx)
{|x|>A}
{|x|≤A}
≤ A|h| + sup µn {|x| > A} < A
n≥1
ε
ε
+ =ε
2A 2
This shows the equicontinuity. Next we show that fn → f pointwise. This is because {cos tx, sin tx} ⊂
CB and therefore
Z
itx
fn (t) =
Z
Z
e µn (dx) =
ZR
→
cos txµn (dx) + i sin txµn (dx)
Z R
eitx µ(dx) = f (t)
cos txµ(dx) + i sin txdµ =
R
RZ
R
R
Proving that fn converges to f over any compact interval [a, b] is somehow more delicate and
makes use of the Heine-Borel theorem. First for all ε there exists some δ(ε), such that for all
|h| ≤ δ(ε) the following holds
ε
9
S
for all n = 1, 2, · · · , ∞ and for all t ∈ [a, b] due to equicontinuity. Now we see that t∈[a,b] (t −
|fn (t + h) − fn (t)| <
δ(ε), t + δ(ε)) ⊃ [a, b] and the Heine-Borel theorem yields that there exists t1 (ε), · · · , tm(ε) (ε)
S
such that m(ε)
i=1 B(ti (ε), δ(ε)) ⊃ [a, b]. Now for ε > 0 there exists N (ε) such that |fn (ti (ε)) −
f∞ (ti (ε))| < ε/9 for all n > N (ε) and for all i = 1, 2, · · · , m(ε). Therefore for all n > N (ε)
and for all t ∈ [a, b] there exists some ti (ε) such that t + h ∈ B(ti (ε), δ(ε)), and hence for all
n > N (ε) we have
|fn (t + h) − f∞ (t + h)|
≤|fn (t + h) − fn (ti (ε))| + |fn (ti (ε)) − f∞ (ti (ε))| + |f∞ (t + h) − f∞ (ti (ε))|
ε
ε
= + |fn (ti (ε)) − f∞ (ti (ε))| +
9
9
ε ε ε
ε
= + + =
9 9 9
3
Then compute
|fn (t) − f (t)| ≤|fn (t) − fn (t + h)| + |fn (t + h) − f∞ (t + h)| + |f∞ (t + h) − f∞ (t)|
ε ε ε
< + + <ε
9 3 9
for all n > N (ε), and N is irrelevant of t. Therefore the convergence is uniform over [a, b].
The following lemma is useful, which provides a way of estimating probability measure in
terms of characteristic function.
LEMMA 6.3.1 Suppose µ is a probability measure on R with characteristic function f (t).
Then For all A > 0 we have
Z
1/A
µ[−2A, 2A] ≥ A f (t)dt − 1
−1/A
Proof. We have shown in theorem 6.2.2 that
Z
Z
Z
Z
sin T x sin T x
1 T
+
f (t)dt = µ(dx) ≤
T x µ(dx)
2T −T
T
x
{|x|≤2A}
{|x|>2A}
R
1
1
1
c
≤ µ[−2A, 2A] +
µ([−2A, 2A] ) = µ[−2A, 2A] 1 −
+
2AT
2AT
2AT
By letting T = 1/A we have
Z
1
A 1/A
1
µ[−2A, 2A] + ≥ f (t)dt
2
2
2 −1/A
Therefore
Z
1/A
f (t)dt − 1
µ[−2A, 2A] ≥ A −1/A
LEMMA 6.3.2 Let (µn )n≥1 be probability measures on R with characteristic functions (fn )n≥1 .
If fn → f∞ pointwise in a neighborhood of 0, f∞ is continuous at 0, and µn converges vaguely
to µ∞ , then µ∞ is a probability measure.
Proof. First we know that the sets of all continuity points of µ∞ is dense in R. Next suppose
the neighborhood where fn converges to f pointwise is (−δ, δ). Now for all ε there exists some
continuity points −2A, 2A of µ such that (−1/A, 1/A) ⊂ (−δ, δ) and that
Z
Z
Z
1/A
1/A
1 1/A
1
(f (t) − f (0))dt ≤
|f (t) − f (0)|dt
f (t)dt − 2 ≤ A −1/A
A −1/A
A −1/A
≤2
|f (t) − f (0)| < ε
sup
t∈(−1/A,1/A)
because f is continuous at 0. Then we obtain that
Z
1/A
A
f (t)dt > 2 − ε
−1/A
Now compute
Z
1/A
fn (t)dt − 1
µ(R) ≥ µ[−2A, 2A] = lim µn [−2A, 2A] ≥ lim inf A n→∞
n→∞
−1/A
Z
Z 1/A
1/A
|f (t) − fn (t)|dt − 1
f (t)dt − lim A
≥ A
n→∞
−1/A
−1/A
Z
Z 1/A
1/A
f (t)dt − A
lim |f (t) − fn (t)|dt − 1
= A
−1/A
−1/A n→∞
Z
1/A
f (t)dt − 1 > (2 − ε) − 1 = 1 − ε
= A
−1/A
where the exchange of limit and integration of the second term is due to Dominated Convergence
Theorem since |fn − f | ≤ 2 and fn → f within (−1/A, 1/A). Now ε can be arbitrary and hence
µ(R) = 1.
The next theorem, which is essential in proving central limit theorems in next chapter, is called
Levy’s Continuity Theorem, which describes convergence distribution in terms of convergence
in characteristic functions.
THEOREM 6.3.2 (Levy’s Continuity Theorem) Suppose (µn )n≥1 are probability measures on R and fn are corresponding characteristic functions. If fn converges pointwise to f∞
for some function f∞ , and f∞ is continuous at 0, then µn converges weakly to µ∞ for some
probability measure µ∞ and f∞ is the characteristic function of µ∞ .
Proof. Let (µnk )k≥1 be any subsequence of (µn )n≥1 that converges vaguely. Suppose (µnk ) converges to µ. Then we know that fnk converges to some function f∞ and f∞ is continuous at
0, then the above lemma yields that µ is a probability measure. Set f to be the characteristic
function of µ. Then the first theorem in this section gives that fnk converges pointwise to f .
But (fnk )k≥1 is a subsequence of (fn )n≥1 and fn converges to f∞ pointwise. Therefore f = f∞ .
This means that µ is uniquely determined by f∞ . Therefore any vaguely convergent subsequence
(µnk )k≥1 of (µn )n≥1 converges vaguely to µ determined by f∞ , the same probability measure.
Therefore µn converges to the probability measure µ∞ = µ.
Next we consider the convergence for subprobability measures.
DEFINITION 6.3.1 Suppose µ is a subprobability measure on R. The integrated characteristic function of µ is given by
Z
g(u) =
u
eiux − 1
µ(dx)
ix
Z
f (t)dt =
0
R
The subprobability-measure version of convergence theorem is stated without proof.
THEOREM 6.3.3 Suppose (µn )n≥1 are subprobability measures on R and gn are corresponding integrated characteristic functions. Then µn converges vaguely if and only if gn converges
pointwise.
6.4
Simple Applications
The first topic in this section is the moment generating property of characteristic function.
LEMMA 6.4.1 For all x ∈ R we have
n
|x|n+1 2|x|n
ix X (ix)k ,
e −
≤ min
k!
(n
+
1)!
n!
k=0
Proof. First by integration by parts we have
Z x
Z x
i
xn+1
n is
+
(x − s)n+1 eis ds
(x − s) e ds =
n
+
1
n
+
1
0
0
When n = 0 it gives
(ix) i2
e =1+
+
1!
1!
And by induction we obtain that
ix
ix
e =
(x − s)eis ds
0
n
X
(ix)k
k=0
x
Z
in+1
+
k!
n!
x
Z
(x − s)n eis ds
0
Since the first formula in this proof gives by integration by parts
Z x
Z
xn
i x
n−1 is
(x − s) e ds =
+
(x − s)n eis ds
n
n 0
0
Z x
and note that xn = n
(x − s)n−1 ds, then we can substitute the integral remainder by the
integral with (x − s)n :
0
ix
e =
n
X
(ix)k
k=0
in
+
k!
(n − 1)!
Z
x
(x − s)n−1 (eis − 1)ds
0
Now we estimate the two integral remainders:
n+1 Z x
Z
Z
i
1 x
1 x
n is
n is
(x − s) e ds ≤
(x − s)n ds
n!
(x − s) e ds ≤ n!
n!
0
0
0
because when x ≥ 0 we have 0 ≤ s ≤ x and |(x − s)n eis | = (x − s)n ; when x < 0 we have
x ≤ s ≤ 0 and |(x − s)n eis | = (s − x)n , which gives
Z x
Z x
Z x
|x|n+1
n
n is
(x − s)n eis ds ≤
(x
−
s)
ds
=
|(x
−
s)
e
|ds
=
n+1
0
Z0 0
Z0 x
Z 0
|x|n+1
n is
n
(x − s)n eis ds ≤
|(x
−
s)
e
|ds
=
(s
−
x)
ds
=
n+1
0
x
x
if x ≥ 0
if x < 0
And similarly we can estimate the integral remainder with (x − s)n−1 (eis − 1), where the complex
factor is bounded by 2 in modulus. Therefore the two estimate gives
n+1 Z x
Z x
n+1
i
2|x|n
in
|x|
n
is
n−1
is
≤
≤
(x
−
s)
e
ds
(x
−
s)
(e
−
1)ds
n!
(n + 1)! (n − 1)!
n!
0
0
and the result follows from the above estimations.
THEOREM 6.4.1 (Taylor Expansion) Suppose X is a random variable with characteristic
function f . If E|X|n < ∞, then
n
X
|tX|n+1 2|tX|n
ik
k
EX ≤ E min
,
f (t) −
k!
(n + 1)!
n!
k=0
Proof. By using the result from previous lemma we obtain that
n
n
k
X
X
E(i(tX))
ik
EX k = Eei(tX) −
f (t) −
k!
k!
k=0
k=0
n
|tX|n+1 2|tX|n
i(tX) X (i(tX))k −
≤ E e
,
≤ E min
k! (n + 1)!
n!
k=0
A more useful form of the Taylor expansion is written in terms of o(tn ) if E|X|n < ∞:
f (t) =
n
X
(it)k
k=0
k!
EX k + o(|t|n )
Like the Taylor expansion in elementary calculus, the Taylor expansion here is also highly related
to the derivatives.
THEOREM 6.4.2 (Moment-Generating Property) Suppose X is a random variable with
characteristic function f . If E|X|n < ∞, then for all k = 0, 1, 2, · · · , n we have
f (k) (t) = ik E(eitX X k )
Proof. First look at the first derivative. By the definition of characteristic function we have
ihX
f (t + h) − f (t)
− 1 − ihX
o(hX)
itX
itX e
itX
− E(iXe ) = E e
=E e X
h
h
hX
Since E|X| < ∞, namely, hX is finite almost surely(P {|X| = ∞} = 0). Therefore hX → 0 a.s.
as h → 0. Notice that |o(hX)| ≤ |hX| when |h| is small, and we know that |eitX X| ≤ |X| is
integrable. Therefore we can apply Dominated Convergence Theorem and obtain
f (t + h) − f (t)
o(hX)
0
itX
itX
itX
f (t) − E(iXe ) = lim
− E(iXe ) = lim E e X
=0
h→0
h→0
h
hX
d k
i E(X k eitX ) = ik+1 E(X k+1 eitX ). Note that
dt
ihX
− 1 − ihX
E(X k ei(t+h)X ) − E(X k eitX )
k+1 itX
k itX e
− iE(X e ) = E X e
h
h
Now by induction it suffices to show that
Similarly, Dominated Convergence Theorem can be applied where the integrand tends to 0 as
h → 0. Since E|X|k < ∞. Therefore we complete the induction.
The following result from the complex analysis is stated without proof:
LEMMA 6.4.2 If a sequence of complex number (cn )n≥1 has a limit c, then
cn n
lim 1 +
= ec
n→∞
n
And the next lemma gives a special case of the previous lemma
LEMMA 6.4.3 Let z1 , · · · , zn and w1 , · · · , wn be complex numbers with moduli no greater
than 1. Then
n
n
n
Y
X
Y
wi ≤
|zi − wi |
zi −
i=1
i=1
i=1
Proof. First notice that
n
Y
zi −
i=1
n
Y
wi = (zn − wn )(z1 · · · zn−1 ) + (z1 · · · zn−1 − w1 · · · wn−1 )wn
i=1
= (zn − wn )
n−1
Y
zi + wn
i=1
n−1
Y
zi −
i=1
n−1
Y
!
wi
i=1
Then clearly we have
n
n
n−1
Y
n−1
Y
Y
Y
z
−
w
≤
|z
−
w
|
+
z
−
w
i
i
n
n
i
i
i=1
i=1
i=1
i=1
And the proof can be completed by induction.
We can see that by using characteristic functions the weak law of large numbers is straightforward:
211
THEOREM 6.4.3 (Weak Law of Large Numbers) Let X, (Xn )n≥1 be i.i.d. with EX
n
1X
being finite, then
Xi converges to EX in probability.
n i=1
Proof. First compute the characteristic function for the constant random variable µ = EX:
n
1X
itEX
itµ
Xi by fn and X by f . Then
Ee
= e . Denote the characteristic function of X̄n =
n i=1
i(t/n)Sn
fn (t) = E(e
)=E
n
Y
!
i(t/n)Xn
e
i=1
=
n
Y
E(e
i(t/n)Xn
i=1
t
)=f
n
n
Now do the Taylor expansion on f :
n n
itµ
t
t
1
n
fn (t) = f (t/n) = 1 +
+o
itµ + no
= 1+
n
n
n
n
Since no(t/n) → 0 as n → ∞, then one obtains that µ + no(t/n) → itµ as n → ∞, and
therefore fn (t) → eitµ for all t ∈ R. Therefore by the Levy’s continuity theorem we conclude
L
that X̄n → EX = µ, which is the same as convergence in probability.
Here is the classical central limit theorem. We will proceed this part in next chapter with
several kinds of generalization.
THEOREM 6.4.4 (Classical Central Limit Theorem) Suppose X, (Xn )n≥1 be i.i.d. with
EX and EX 2 being finite, let σ 2 = Var(X) Then
Pn
√
(Xi − EX) L
n i=1
→ N (0, 1)
σ
Proof. Without loss of generality we assume that EX = 0 and σ = 1, otherwise we can consider
√
the normalized Xn ’s. Denote the characteristic function of nX̄n by fn and X by f . Then write
!
n
n
Y
Y
√
√
√
t
i(t/ n)Xn
i(t/ n)Xn
n
i(t/ n)Sn
√
)=E
e
=
E(e
)=f
fn (t) = E(e
n
i=1
i=1
Now do the Taylor expansion on f :
2 n 2
2 n
√
t2
t
1
t
t
n
fn (t) = f (t/ n) = 1 −
+o
= 1+
− + no
2n
n
n
2
n
Since no(t2 /n) → 0 as n → ∞, then one obtains that −t2 /2 + no(t2 /n) → −t2 /2 as n → ∞, and
2
therefore fn (t) → e−t /2 for all t ∈ R. Therefore by the Levy’s continuity theorem we conclude
√
L
that nX̄n → N (0, 1).
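The classical CLT can be checked by simulation. The following sketch (not part of the original notes; centered Exp(1) summands, the block size, the number of replications, and the availability of SciPy are illustrative assumptions) compares empirical probabilities of the normalized sums with standard normal values.

```python
# Sketch of the classical CLT: sqrt(n) * mean of i.i.d. Exp(1) - 1 variables
# (mean 0, variance 1) is approximately N(0,1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n, reps = 500, 10**4
x = rng.exponential(1.0, size=(reps, n)) - 1.0
z = x.sum(axis=1) / np.sqrt(n)               # sqrt(n) * sample mean, sigma = 1

for c in (-1.0, 0.0, 1.0):
    print(c, np.mean(z <= c), norm.cdf(c))   # empirical vs. normal probabilities
```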
Here is another non-trivial application of Lévy's continuity theorem.

THEOREM 6.4.5 (Characterization of Normality) Let X, Y be i.i.d. random variables with mean 0 and variance 1. If X + Y and X − Y are independent, then the distribution is normal.

Proof. Let (X_n)_{n≥1} be a sequence of i.i.d. random variables with the same distribution as X, and set S_n = Σ_{k=1}^n X_k. Since 2X = (X + Y) + (X − Y) is a sum of the independent variables X + Y and X − Y, since X + Y has the same distribution as X_1 + X_2 and X − Y the same distribution as X_4 − X_3, and since X_1 + X_2 and X_4 − X_3 are independent, 2X has the same distribution as S_2 + X_4 − X_3 = (X_1 + X_2) + (X_4 − X_3). Then the characteristic function f of X satisfies f(2t) = f^3(t) f(−t). Define ρ(t) = f(t)/f(−t); then

ρ(2t) = f(2t)/f(−2t) = f^3(t)f(−t) / ( f^3(−t)f(t) ) = ( f(t)/f(−t) )^2 = ρ^2(t).

By induction one obtains

ρ(t) = ρ(t/2^n)^{2^n} = ( 1 + ( f(t/2^n) − f(−t/2^n) ) / f(−t/2^n) )^{2^n}.

Since f(±t/2^n) → f(0) = 1, and since EX = 0 gives f(s) = 1 + o(s) as s → 0, so that 2^n ( f(t/2^n) − f(−t/2^n) ) → 0, Lemma 6.4.2 yields ρ(t) → 1 as n → ∞. But ρ does not depend on n, so ρ(t) = 1 for all t. Therefore f(t) = f(−t), and hence we can write f(2t) = f^3(t)f(−t) = f^4(t). By induction one concludes that

f(t) = f(t/2^n)^{4^n}.

This implies that the distribution of X is the same as the distribution of S_{4^n}/2^n. Since Var(S_{4^n}) = 4^n, the standard deviation is 2^n, and ES_{4^n} = 0, the classical central limit theorem gives S_{4^n}/2^n →_L N(0, 1). But X does not depend on n, therefore we must have X ∼ N(0, 1).
Now we give a brief introduction to lattice distributions.

DEFINITION 6.4.1 An integer lattice distribution is any probability measure μ with all its mass on Z. The span is defined as the greatest common divisor of the differences of the points with positive probability.

The following properties are elementary.

PROPOSITION 6.4.1 Let X be an integer lattice distributed random variable with distribution μ.

• The characteristic function of X is f(t) = Σ_{j∈Z} P{X = j} e^{ijt}.
• f is 2π-periodic.
• The atomic inversion formula holds: P{X = j} = (1/2π) ∫_{−π}^{π} e^{−ijt} f(t) dt.
• Suppose S_n is a random walk, namely X, X_1, ⋯, X_n are i.i.d. with characteristic function f and S_n = X_1 + ⋯ + X_n. Then P{S_n = j} = (1/2π) ∫_{−π}^{π} e^{−ijt} f^n(t) dt.

Proof. The first two are trivial. For the third, the atomic inversion formula of Section 6.3 gives

P{X = j} = lim_{T→∞} (1/2T) ∫_{−T}^{T} e^{−ijt} f(t) dt.

Now for any large T choose n with 2nπ ≤ T < 2(n+1)π. Then

| (1/2T) ∫_{−T}^{T} e^{−ijt} f(t) dt − (1/2T) ∫_{−2nπ}^{2nπ} e^{−ijt} f(t) dt | ≤ (1/2T) ∫_{[−2(n+1)π,−2nπ)∪(2nπ,2(n+1)π]} |e^{−ijt} f(t)| dt ≤ (1/2T)·4π → 0

as T → ∞, because n → ∞ when T → ∞. Moreover, by periodicity,

(1/2T) ∫_{−2nπ}^{2nπ} e^{−ijt} f(t) dt = (4nπ/2T) · (1/2π) ∫_{−π}^{π} e^{−ijt} f(t) dt → (1/2π) ∫_{−π}^{π} e^{−ijt} f(t) dt

as n → ∞, and this completes the proof of the third item. The fourth is a corollary of the third, since S_n itself is integer lattice distributed and the characteristic function of S_n is f^n.
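The inversion formula for random walks is easy to check numerically. Below is a small sketch (assuming NumPy and SciPy are available; the simple symmetric walk and the values n = 10, j = 2 are arbitrary choices) comparing the formula of Proposition 6.4.1 with the exact binomial probability.

```python
import numpy as np
from scipy.integrate import quad
from math import comb, pi

# Simple symmetric random walk: P{X = 1} = P{X = -1} = 1/2, so f(t) = cos(t).
n, j = 10, 2   # compute P{S_10 = 2}

# Atomic inversion: P{S_n = j} = (1/2pi) * int_{-pi}^{pi} e^{-ijt} f(t)^n dt.
# The imaginary part integrates to 0 by symmetry, so only cos(jt) * cos(t)^n is needed.
val, _ = quad(lambda t: np.cos(j * t) * np.cos(t) ** n, -pi, pi)
print(val / (2 * pi))                  # ~ 0.205078

# Exact value: S_10 = 2 requires 6 up-steps and 4 down-steps.
print(comb(10, 6) / 2 ** 10)           # 0.205078125
```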
6.5 Bochner's Theorem and Polya's Theorem

DEFINITION 6.5.1 (Positive Definite Function) Let f : R → C be a complex-valued function of a real argument. f is positive definite if f(−x) = \overline{f(x)} and, for any finite set of t_j ∈ R, j = 1, 2, ⋯, n, the matrix

(f(t_j − t_k))_{n×n} =
[ f(t_1 − t_1)  f(t_1 − t_2)  ⋯  f(t_1 − t_n) ]
[ f(t_2 − t_1)  f(t_2 − t_2)  ⋯  f(t_2 − t_n) ]
[      ⋮              ⋮        ⋱       ⋮       ]
[ f(t_n − t_1)  f(t_n − t_2)  ⋯  f(t_n − t_n) ]

is positive definite, namely z^* (f(t_j − t_k))_{n×n} z ≥ 0 for every complex vector z = (z_1, ⋯, z_n)^T ∈ C^n. Here A^* := \bar{A}^T is the conjugate transpose of A.

EXAMPLE 6.5.1 If (Z_1, ⋯, Z_n) is a complex-valued random vector with E|Z_j|^2 < ∞ for j = 1, ⋯, n, then its uncentered covariance matrix is defined by Σ = ( E(Z_j \bar{Z}_k) )_{n×n}. By the Cauchy-Schwarz inequality Σ has finite entries. Suppose z = (z_1, ⋯, z_n)^T is a complex vector. Then

z^* Σ z = Σ_{j=1}^n Σ_{k=1}^n \bar{z}_j E(Z_j \bar{Z}_k) z_k = E( Σ_{j=1}^n \bar{z}_j Z_j · \overline{ Σ_{k=1}^n \bar{z}_k Z_k } ) = E | Σ_{j=1}^n \bar{z}_j Z_j |^2 ≥ 0.

This means that Σ is positive definite in the above sense. Now consider a complex-valued stochastic process (Z_t)_{t∈R} with E|Z_t|^2 < ∞ for all t ∈ R. It is called a second order stationary process if there exists an uncentered covariance function r : R → C such that for all s, t we have

E(Z_s \bar{Z}_t) = r(s − t).

Then we know by the example above that r is a positive definite function. For a specific example of a second order stationary process, consider (e^{itX})_{t∈R} for a fixed random variable X. Then the uncentered covariance function is

E( e^{isX} \overline{e^{itX}} ) = E( e^{i(s−t)X} ) = f(s − t),

which satisfies the condition of second order stationarity. Therefore the characteristic function f is a positive definite function.
THEOREM 6.5.1 Suppose f is positive definite. Then for all t ∈ R we have f(−t) = \overline{f(t)} and |f(t)| ≤ f(0). If f is continuous at t = 0, then it is uniformly continuous on R. In this case we also have, for every continuous complex-valued function ζ on R and every T > 0,

∫_0^T ∫_0^T f(s − t) ζ(s) \overline{ζ(t)} ds dt ≥ 0.

Proof. For the integral inequality, consider a sequence of grids (s_i^{(n)}, t_j^{(n)} : 0 ≤ i, j ≤ n) with

s_0^{(n)} = 0 ≤ s_1^{(n)} ≤ ⋯ ≤ s_n^{(n)} = T,   t_0^{(n)} = 0 ≤ t_1^{(n)} ≤ ⋯ ≤ t_n^{(n)} = T,

such that ||π_n|| := max_{1≤i,j≤n} (s_i^{(n)} − s_{i−1}^{(n)})(t_j^{(n)} − t_{j−1}^{(n)}) → 0 as n → ∞. Then the Riemann integral can be expressed as a limit:

∫_0^T ∫_0^T f(s − t) ζ(s) \overline{ζ(t)} ds dt = lim_{n→∞} Σ_{i,j=1}^n f(s_i^{(n)} − t_j^{(n)}) ζ(s_i^{(n)}) \overline{ζ(t_j^{(n)})} (s_i^{(n)} − s_{i−1}^{(n)})(t_j^{(n)} − t_{j−1}^{(n)})
= lim_{n→∞} Σ_{i,j=1}^n f(s_i^{(n)} − t_j^{(n)}) [ ζ(s_i^{(n)})(s_i^{(n)} − s_{i−1}^{(n)}) ] \overline{ [ ζ(t_j^{(n)})(t_j^{(n)} − t_{j−1}^{(n)}) ] } ≥ 0

by the positive definiteness of f. Now take n = 1 and z_1 = 1; then f(0) ≥ 0. Take n = 2, t_1 = t, t_2 = 0, z_1 = 1, z_2 = 1; then 2f(0) + f(−t) + f(t) ≥ 0. Take n = 2, t_1 = t, t_2 = 0, z_1 = 1, z_2 = i; then 2f(0) + if(−t) − if(t) ≥ 0. Therefore f(t) + f(−t) is real and f(t) − f(−t) is purely imaginary, which implies

f(t) + f(−t) = \overline{f(t)} + \overline{f(−t)},   f(t) − f(−t) = −\overline{f(t)} + \overline{f(−t)},

and hence f(−t) = \overline{f(t)}. In the case n = 2 the determinant is f^2(0) − |f(t)|^2 ≥ 0, which is one of the properties of a positive semidefinite matrix; therefore |f(t)| ≤ f(0). Now if f(0) = 0, then f(t) ≡ 0 and the result is trivial, so we may assume f(0) > 0, and without loss of generality f(0) = 1, since one can always consider the positive definite function f(t)/f(0).

When n = 3, t_1 = 0, t_2 = t, t_3 = t + h, the determinant is also non-negative, which is

det [ f(0) f(t) f(t+h) ; f(−t) f(0) f(h) ; f(−t−h) f(−h) f(0) ] = 1 − |f(t)|^2 − |f(h)|^2 − |f(t+h)|^2 + 2Re( f(t) f(h) \overline{f(t+h)} ) ≥ 0.

Now compute

|f(t) − f(t+h)|^2 = |f(t)|^2 + |f(t+h)|^2 − 2Re( f(t) \overline{f(t+h)} )
≤ 1 − |f(h)|^2 + 2Re( f(t) \overline{f(t+h)} (f(h) − 1) )
≤ 1 − |f(h)|^2 + 2|f(t)||f(t+h)||1 − f(h)|
≤ 1 − |f(h)|^2 + 2|1 − f(h)|
= (1 − |f(h)|)(1 + |f(h)|) + 2|1 − f(h)|
≤ 2|1 − f(h)| + 2|1 − f(h)| = 4|1 − f(h)|.

Now if f is continuous at 0, then for all ε > 0 there exists some δ such that whenever |h| < δ we have |1 − f(h)| < ε^2/4, and therefore for all t

|f(t) − f(t+h)|^2 ≤ 4|1 − f(h)| < ε^2,   implying   |f(t) − f(t+h)| < ε,

which completes the proof of the uniform continuity of f over the real line.
Now we introduce Bochner's theorem, which characterizes the characteristic functions.

THEOREM 6.5.2 (Bochner's Theorem) f is a positive definite function with f(0) = 1 and continuous at 0 if and only if f is a characteristic function.

Proof. The sufficiency part is easy; just calculate the quadratic form

Σ_{j=1}^n Σ_{k=1}^n f(t_j − t_k) z_j \bar{z}_k = Σ_{j=1}^n Σ_{k=1}^n E( e^{it_jX − it_kX} ) z_j \bar{z}_k = E( Σ_{j=1}^n Σ_{k=1}^n (z_j e^{it_jX}) \overline{(z_k e^{it_kX})} ) = E | Σ_{j=1}^n z_j e^{it_jX} |^2 ≥ 0,

where X is the random variable associated with the probability distribution μ whose characteristic function is f. Therefore f is positive definite. It remains to show the necessity part, which requires a more technical construction.

Take ζ(t) = e^{−itx}; then by Theorem 6.5.1 we obtain

p_T(x) := (1/2πT) ∫_0^T ∫_0^T f(s − t) e^{−i(s−t)x} ds dt ≥ 0.

Further computation (substituting u = s − t) gives

p_T(x) = (1/2πT) ∫_0^T ∫_{−t}^{T−t} f(u) e^{−iux} du dt = (1/2πT) ∫_0^T ∫_{−T}^{T} f(u) e^{−iux} I_{(−t, T−t)}(u) du dt.

Since |f| ≤ f(0) = 1 because f is positive definite, and |e^{−iux}| = 1, Fubini's theorem gives

p_T(x) = (1/2πT) ∫_{−T}^{T} f(u) e^{−iux} ( ∫_0^T I_{(−t, T−t)}(u) dt ) du = (1/2πT) ∫_{−T}^{T} f(u) e^{−iux} ( ∫_0^T I_{(−u, T−u)}(t) dt ) du.

When u ≤ 0 we have 0 ≤ −u < t < T ≤ T − u, so the inner integral is over (−u, T) and equals T + u = T − |u|; when u > 0 similarly −u < 0 < t < T − u < T, and the inner integral equals T − u = T − |u|. Therefore

p_T(x) = (1/2πT) ∫_{−T}^{T} f(u) e^{−iux} (T − |u|) du = (1/2π) ∫_{−∞}^{∞} f_T(u) e^{−iux} du,

where f_T(t) = f(t) ( 1 − |t|/T ) I_{{|t| ≤ T}}. Now we show that p_T is a density and that f_T(t) = ∫_{−∞}^{∞} e^{itx} p_T(x) dx. Observe that

(1/α) ∫_0^α ∫_{−β}^{β} e^{−itx} dx dβ = 2(1 − cos αt)/(αt^2).

It follows by Fubini's theorem that

(1/α) ∫_0^α ∫_{−β}^{β} p_T(x) dx dβ = (1/2π)(1/α) ∫_0^α ∫_{−β}^{β} ∫_{−∞}^{∞} f_T(t) e^{−itx} dt dx dβ
= (1/2π) ∫_{−∞}^{∞} f_T(t) ( (1/α) ∫_0^α ∫_{−β}^{β} e^{−itx} dx dβ ) dt
= (1/π) ∫_{−∞}^{∞} f_T(t) (1 − cos αt)/(αt^2) dt
= (1/π) ∫_{−∞}^{∞} f_T(t/α) (1 − cos t)/t^2 dt.

Now apply the Dominated Convergence Theorem as α → ∞ and obtain

lim_{α→∞} (1/α) ∫_0^α ∫_{−β}^{β} p_T(x) dx dβ = (1/π) ∫_{−∞}^{∞} lim_{α→∞} f_T(t/α) (1 − cos t)/t^2 dt = (1/π) ∫_{−∞}^{∞} (1 − cos t)/t^2 dt = 1,

where the last equality is due to the Dirichlet integral and the fact that (1 − cos t)t^{−2} is even. Namely,

lim_{α→∞} (1/α) ∫_0^α ∫_{−β}^{β} p_T(x) dx dβ = 1,

and by L'Hospital's rule we know that

∫_R p_T(x) dx = lim_{β→∞} ∫_{−β}^{β} p_T(x) dx = 1.

This shows that p_T is indeed a density, and that p_T is the inverse Fourier transform of f_T. Now, to show that f_T is the Fourier transform of p_T, we first observe the similar equality

(1/α) ∫_0^α ∫_{−β}^{β} e^{itx} e^{−iux} dx dβ = 2(1 − cos α(u − t))/(α(u − t)^2),

and then use Fubini's theorem to compute

(1/α) ∫_0^α ∫_{−β}^{β} p_T(x) e^{itx} dx dβ = (1/2π)(1/α) ∫_0^α ∫_{−β}^{β} ∫_{−∞}^{∞} f_T(u) e^{−iux} du e^{itx} dx dβ
= (1/2π) ∫_{−∞}^{∞} f_T(u) ( (1/α) ∫_0^α ∫_{−β}^{β} e^{−iux} e^{itx} dx dβ ) du
= (1/π) ∫_{−∞}^{∞} f_T(u) (1 − cos α(u − t))/(α(u − t)^2) du
= (1/π) ∫_{−∞}^{∞} f_T( z/α + t ) (1 − cos z)/z^2 dz.

Similarly, by the Dominated Convergence Theorem we can let α → ∞ inside the integral and get

lim_{α→∞} (1/α) ∫_0^α ∫_{−β}^{β} p_T(x) e^{itx} dx dβ = (1/π) ∫_{−∞}^{∞} lim_{α→∞} f_T( z/α + t ) (1 − cos z)/z^2 dz = f_T(t).

Therefore

lim_{α→∞} (1/α) ∫_0^α ∫_{−β}^{β} p_T(x) e^{itx} dx dβ = f_T(t),

and by L'Hospital's rule

∫_R p_T(x) e^{itx} dx = lim_{β→∞} ∫_{−β}^{β} p_T(x) e^{itx} dx = f_T(t).

This shows that f_T is the characteristic function of the probability measure with density p_T. Now let T → ∞; then f_T(t) → f(t), and Lévy's continuity theorem yields that f, as a limit of characteristic functions, is itself a characteristic function, because it is continuous at 0 and f(0) = 1.
The following theorem of Polya gives a sufficient condition for a function to be a characteristic function. The proof is omitted.

THEOREM 6.5.3 (Polya's Theorem) If f : R → C satisfies f(0) = 1, f(t) ≥ 0 for all t, f(t) = f(−t), and f is continuous, convex and decreasing on [0, +∞), then f is a characteristic function.

THEOREM 6.5.4 (Stable Distributions) For each α in the range (0, 2], f_α(t) = exp(−|t|^α) is a characteristic function. The corresponding distributions are called stable distributions.

Proof. If α = 2 then f_2(t) = e^{−t^2} is the characteristic function of the normal N(0, 2), and therefore we may assume that α ∈ (0, 2). Consider the density function

p(x) = α / (2|x|^{α+1})   if |x| > 1   (and p(x) = 0 otherwise).

Then the corresponding characteristic function f satisfies

1 − f(t) = ∫_{−∞}^{∞} (1 − e^{itx}) p(x) dx = α ∫_1^{∞} (1 − cos tx)/x^{α+1} dx = α ∫_1^{∞} (1 − cos |t|x)/x^{α+1} dx = α|t|^α ∫_{|t|}^{∞} (1 − cos u)/u^{α+1} du
= α|t|^α ( ∫_0^{∞} (1 − cos u)/u^{α+1} du − ∫_0^{|t|} (1 − cos u)/u^{α+1} du )

by the change of variable u = |t|x. Now consider sufficiently small t: since

lim_{u→0} (1 − cos u)/(u^2/2) = 1,   it follows that   lim_{|t|→0} ( ∫_0^{|t|} (1 − cos u)/u^{α+1} du ) / ( |t|^{2−α} / (2(2 − α)) ) = 1,

because (1/2) ∫_0^{|t|} u^{1−α} du = |t|^{2−α}/(2(2 − α)). Now denote c_α = α ∫_0^{∞} (1 − cos u)/u^{α+1} du; then

1 − f(t) = c_α |t|^α − O(t^2),   implying   f(t) = 1 − c_α |t|^α + O(t^2).

Now let (X_n)_{n≥1} be a sequence of i.i.d. random variables with density p(x); then the characteristic function of

Y_n := (c_α n)^{−1/α} Σ_{i=1}^n X_i

is

f_{Y_n}(t) = f( t/(c_α n)^{1/α} )^n = ( 1 − |t|^α/n + O(t^2/n^{2/α}) )^n = ( 1 + ( −|t|^α + n·O(t^2/n^{2/α}) )/n )^n → e^{−|t|^α}

as n → ∞, because α < 2 implies n·O(t^2/n^{2/α}) → 0 as n → ∞. Therefore e^{−|t|^α} is the limit of the sequence of characteristic functions f_{Y_n}(t); it is continuous at t = 0 and takes the value 1 at 0, so Lévy's continuity theorem can be applied to obtain the desired result.
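For α = 1 the inversion can be checked directly: e^{−|t|} is the characteristic function of the standard Cauchy distribution. Here is a rough numerical sketch (assuming NumPy and SciPy are available; the quadrature over an infinite range is only approximate, and the chosen evaluation points are arbitrary) comparing the Fourier inversion of exp(−|t|^α) with the Cauchy density 1/(π(1 + x^2)).

```python
import numpy as np
from scipy.integrate import quad

# For alpha = 1, exp(-|t|) should invert to the standard Cauchy density.
alpha = 1.0

def density_from_cf(x, alpha):
    # p(x) = (1/pi) * int_0^inf exp(-t^alpha) * cos(t*x) dt, since the cf is real and even
    val, _ = quad(lambda t: np.exp(-t ** alpha) * np.cos(t * x), 0, np.inf, limit=200)
    return val / np.pi

for x in (0.0, 0.5, 1.0, 2.0):
    print(x, density_from_cf(x, alpha), 1 / (np.pi * (1 + x ** 2)))
```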
THEOREM 6.5.5 If f is a characteristic function, then so is e^{λ(f−1)} for each λ ≥ 0.

Proof. Suppose f is the characteristic function of a probability measure μ on the real line. Then the characteristic function of the probability measure (1 − λ/n)δ_0 + (λ/n)μ, for n ≥ λ, is

∫_R e^{itx} ( (1 − λ/n)δ_0 + (λ/n)μ )(dx) = (1 − λ/n) ∫_R e^{itx} δ_0(dx) + (λ/n) ∫_R e^{itx} μ(dx) = 1 − λ/n + (λ/n) f(t).

To see why the linearity of the integral with respect to the sum of two measures holds, one can let (u_n)_{n≥1}, (v_n)_{n≥1} be two sequences of simple functions on the real line increasing to cos tx and sin tx respectively, and let μ_1, μ_2 be any two measures on R. Clearly, linearity of the integral with respect to the sum of measures holds for simple functions. Then |u_n| ≤ 1, |v_n| ≤ 1, and the Dominated Convergence Theorem gives

∫_R e^{itx} (μ_1 + μ_2)(dx) = ∫_R lim_{n→∞} (u_n(x) + iv_n(x)) (μ_1 + μ_2)(dx)
= lim_{n→∞} ∫_R (u_n(x) + iv_n(x)) (μ_1 + μ_2)(dx)
= lim_{n→∞} ∫_R (u_n(x) + iv_n(x)) μ_1(dx) + lim_{n→∞} ∫_R (u_n(x) + iv_n(x)) μ_2(dx)
= ∫_R lim_{n→∞} (u_n(x) + iv_n(x)) μ_1(dx) + ∫_R lim_{n→∞} (u_n(x) + iv_n(x)) μ_2(dx)
= ∫_R e^{itx} μ_1(dx) + ∫_R e^{itx} μ_2(dx).

Now let (X_n)_{n≥1} be a sequence of i.i.d. random variables with distribution (1 − λ/n)δ_0 + (λ/n)μ; then the characteristic function of Σ_{i=1}^n X_i is given by

( 1 − λ/n + (λ/n) f(t) )^n = ( 1 + λ(f(t) − 1)/n )^n → e^{λ(f(t)−1)}

by Lemma 6.4.2. Since e^{λ(f−1)} takes the value 1 at t = 0 and is continuous at 0, Lévy's continuity theorem shows that it is a characteristic function.
Chapter 7

Central Limit Theorems

7.1 Liapunov's Central Limit Theorem

DEFINITION 7.1.1 (Standard Array) A standard array is a double array of random variables ((X_{nj})_{j=1}^{k_n})_{n≥1} such that k_n → ∞ as n → ∞, EX_{nj} = 0, (X_{nj})_{j=1}^{k_n} are independent for each n, and Var(S_n) = 1 where S_n = Σ_{j=1}^{k_n} X_{nj}.

Intuitively, one can list a double array of random variables in the following display:

X_{11}  X_{12}  ⋯  X_{1k_1}
X_{21}  X_{22}  ⋯  X_{2k_2}
  ⋮       ⋮            ⋮
X_{n1}  X_{n2}  ⋯  X_{nk_n}
  ⋮       ⋮            ⋮

In this chapter we will use the following notation:

S_n = Σ_{j=1}^{k_n} X_{nj},   σ_{nj}^2 = Var(X_{nj}),   s_n^2 = Σ_{j=1}^{k_n} σ_{nj}^2,   γ_{nj} = E|X_{nj}|^3,   Γ_n = Σ_{j=1}^{k_n} γ_{nj}.

Central limit theorems rely heavily on the following complex analysis lemma.

LEMMA 7.1.1 Suppose (θ_{nj}) is a double array of complex numbers, where j = 1, ⋯, k_n and n ∈ N_+. If the following three conditions hold:

(i) max_{1≤j≤k_n} |θ_{nj}| → 0 as n → ∞;
(ii) Σ_{j=1}^{k_n} |θ_{nj}| ≤ M < ∞, where M does not depend on n;
(iii) Σ_{j=1}^{k_n} θ_{nj} → θ, where θ is a finite complex number;

then ∏_{j=1}^{k_n} (1 + θ_{nj}) → e^θ.
Proof. Since max_j |θ_{nj}| → 0, for sufficiently large n we know that |θ_{nj}| ≤ 1/2 for all j, and we only consider such sufficiently large n below.

We use log(1 + θ_{nj}) to denote the complex logarithm with argument in (−π, π]. When the argument lies in (−π, π), the complex logarithm is an analytic function and we have the Taylor series

log(1 + θ_{nj}) = Σ_{m=1}^{∞} (−1)^{m+1} θ_{nj}^m / m.

Therefore

| log(1 + θ_{nj}) − θ_{nj} | = | Σ_{m=2}^{∞} (−1)^{m+1} θ_{nj}^m / m | ≤ Σ_{m=2}^{∞} |θ_{nj}|^m / m ≤ |θ_{nj}|^2 Σ_{m=2}^{∞} |θ_{nj}|^{m−2} / 2 ≤ |θ_{nj}|^2 Σ_{m=2}^{∞} (1/2)^{m−2} / 2 ≤ |θ_{nj}|^2.

The argument of 1 + θ_{nj} cannot equal π: that would make 1 + θ_{nj} a negative real number, hence θ_{nj} a real number with θ_{nj} ≤ −1, which is impossible since |θ_{nj}| ≤ 1/2 forces 1 + θ_{nj} > 0 for real θ_{nj}. Therefore for all θ_{nj} with |θ_{nj}| ≤ 1/2 we always have | log(1 + θ_{nj}) − θ_{nj} | ≤ |θ_{nj}|^2. Hence

| Σ_{j=1}^{k_n} log(1 + θ_{nj}) − Σ_{j=1}^{k_n} θ_{nj} | ≤ Σ_{j=1}^{k_n} |θ_{nj}|^2 ≤ max_{1≤j≤k_n} |θ_{nj}| Σ_{j=1}^{k_n} |θ_{nj}| ≤ M max_{1≤j≤k_n} |θ_{nj}| → 0

by conditions (i) and (ii). But (iii) means that Σ_{j=1}^{k_n} θ_{nj} → θ, and therefore Σ_{j=1}^{k_n} log(1 + θ_{nj}) → θ, which is equivalent to ∏_{j=1}^{k_n} (1 + θ_{nj}) → e^θ.
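A quick numerical illustration of the lemma (assuming NumPy is available; the value of z is an arbitrary choice) in the simplest setting k_n = n and θ_{nj} = z/n for a fixed complex z, where conditions (i)-(iii) hold with θ = z:

```python
import numpy as np

# prod_j (1 + theta_{nj}) = (1 + z/n)^n should approach e^z as n grows.
z = 0.3 - 1.2j
for n in (10, 100, 1000, 100000):
    prod = np.prod(np.full(n, 1 + z / n))
    print(n, prod, "target:", np.exp(z))
```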
THEOREM 7.1.1 (Liapunov's Central Limit Theorem) If a standard array (X_{nj}) satisfies

Γ_n = Σ_{j=1}^{k_n} E|X_{nj}|^3 → 0   as n → ∞,

then

S_n = Σ_{j=1}^{k_n} X_{nj} →_L N(0, 1).

Proof. Let f_{nj} be the characteristic function of X_{nj}. Since the third moments are assumed to exist, we can do a Taylor expansion up to the third moment:

f_{nj}(t) = 1 − (1/2) σ_{nj}^2 t^2 + Λ_{nj} γ_{nj} |t|^3,

where |Λ_{nj}| ≤ 1/6 since the denominator in the remainder term is 3! = 6. Now let θ_{nj} = f_{nj}(t) − 1. We claim that the conditions of the complex analysis lemma hold:

(i) max_{1≤j≤k_n} |θ_{nj}| → 0. This is because Γ_n → 0 implies max_{1≤j≤k_n} γ_{nj} → 0, and hence Liapunov's inequality implies that the second moments satisfy max_{1≤j≤k_n} σ_{nj} → 0. Therefore

max_{1≤j≤k_n} |θ_{nj}| ≤ (t^2/2) max_{1≤j≤k_n} σ_{nj}^2 + |t|^3 max_{1≤j≤k_n} |Λ_{nj}| γ_{nj} ≤ (t^2/2) max_{1≤j≤k_n} σ_{nj}^2 + (|t|^3/6) max_{1≤j≤k_n} γ_{nj} → 0.

(ii) Σ_{j=1}^{k_n} |θ_{nj}| is bounded. This is because

Σ_{j=1}^{k_n} |θ_{nj}| ≤ (t^2/2) Σ_{j=1}^{k_n} σ_{nj}^2 + (1/6) Σ_{j=1}^{k_n} Λ_{nj}-free bound |t|^3 γ_{nj} ≤ t^2/2 + (|t|^3/6) Γ_n,

and the right-hand side is bounded due to the third-moment condition Γ_n → 0, since a convergent sequence is always bounded (and Σ_j σ_{nj}^2 = Var(S_n) = 1).

(iii) Σ_{j=1}^{k_n} θ_{nj} converges to −t^2/2. This is because

Σ_{j=1}^{k_n} θ_{nj} = −t^2/2 + |t|^3 Σ_{j=1}^{k_n} Λ_{nj} γ_{nj},   implying   | Σ_{j=1}^{k_n} θ_{nj} + t^2/2 | ≤ (|t|^3/6) Σ_{j=1}^{k_n} γ_{nj} = (|t|^3/6) Γ_n → 0.

Therefore we know that

∏_{j=1}^{k_n} f_{nj}(t) → e^{−t^2/2},

and ∏_{j=1}^{k_n} f_{nj}(t) is the characteristic function of S_n. Hence S_n →_L N(0, 1), since e^{−t^2/2} is the characteristic function of N(0, 1).
THEOREM 7.1.2 Suppose (X_n)_{n≥1} is a sequence of independent mean-zero random variables, and let s_n^2 = Σ_{j=1}^n Var(X_j), Γ_n = Σ_{j=1}^n E|X_j|^3. If Γ_n/s_n^3 → 0 as n → ∞, then

S_n/s_n = (1/s_n) Σ_{j=1}^n X_j →_L N(0, 1).

Proof. Let Y_{nj} = X_j/s_n. Then (Y_{nj})_{1≤j≤n} forms a standard array, and the third-moment condition holds:

Σ_{j=1}^n E|Y_{nj}|^3 = (1/s_n^3) Σ_{j=1}^n E|X_j|^3 = Γ_n/s_n^3 → 0.

Now we can apply Liapunov's theorem and obtain the asymptotic normality.
THEOREM 7.1.3 Let (X_{nj}) be a non-centered standard array (namely, the row sums have variance 1 and the rows are independent, but the means need not be zero). If there exists a double array of real numbers (M_{nj}) such that |X_{nj}| ≤ M_{nj} a.s. and max_{1≤j≤k_n} M_{nj} → 0 as n → ∞, then

S_n − ES_n = Σ_{j=1}^{k_n} (X_{nj} − EX_{nj}) →_L N(0, 1).

Proof. Since (Y_{nj}) = (X_{nj} − EX_{nj}) is a standard array, the third-moment condition holds:

Γ_n = Σ_{j=1}^{k_n} E|X_{nj} − EX_{nj}|^3 ≤ 2 max_{1≤j≤k_n} M_{nj} Σ_{j=1}^{k_n} E|X_{nj} − EX_{nj}|^2 ≤ 2 max_{1≤j≤k_n} M_{nj} Σ_{j=1}^{k_n} Var(X_{nj}) = 2 max_{1≤j≤k_n} M_{nj} → 0,

so Liapunov's central limit theorem can be applied to obtain asymptotic normality.
7.2 Lindeberg-Feller Central Limit Theorem

DEFINITION 7.2.1 (Lindeberg's Quantity and Lindeberg's Condition) Given a double array of random variables (X_{nj}), j = 1, ⋯, k_n, n ∈ N_+, Lindeberg's quantity is defined to be

Λ_n(η) = Σ_{j=1}^{k_n} E( X_{nj}^2 ; |X_{nj}| > η ).

One says that Lindeberg's condition holds if for all η > 0 we have Λ_n(η) → 0 as n → ∞.

DEFINITION 7.2.2 (Holospoudicity) A double array of random variables is said to be holospoudic if max_{1≤j≤k_n} P(|X_{nj}| > η) → 0 as n → ∞ for all η > 0.

LEMMA 7.2.1 Let (X_{nj}) be any double array of random variables, and consider the following statements:

(a) For all j and all η > 0 we have lim_{n→∞} P(|X_{nj}| > η) = 0;
(b) For all η > 0 we have lim_{n→∞} max_{1≤j≤k_n} P(|X_{nj}| > η) = 0;
(c) For all η > 0 we have lim_{n→∞} P( max_{1≤j≤k_n} |X_{nj}| > η ) = 0;
(d) For all η > 0 we have lim_{n→∞} Σ_{j=1}^{k_n} P(|X_{nj}| > η) = 0.

Then (d) implies (c), (c) implies (b), and (b) implies (a).
The proof of this lemma will be included in the necessity part of the proof of the Lindeberg-Feller central limit theorem.

EXAMPLE 7.2.1 The implications from (d) to (a) are strict. If (X_{nj}) are independent within each row, then (d) holds if and only if (c) holds.

Solution. We take the probability space to be ((0, 1], B, λ).

(i) (a) does not imply (b): take k_n = n and X_{nj} = I_{(0, j/n]}. Then P(|X_{nj}| > ε) = j/n → 0 as n → ∞ for each fixed j, while max_{1≤j≤n} P(|X_{nj}| > 1/2) = n/n = 1, which does not tend to 0.

(ii) (b) does not imply (c): take k_n = n and X_{nj} = I_{((j−1)/n, j/n]}. Then max_{1≤j≤n} P(|X_{nj}| > ε) = 1/n → 0, while P( max_{1≤j≤n} |X_{nj}| > 1/2 ) = P(1 ≥ 1/2) = 1, which does not tend to 0.

(iii) (c) does not imply (d): take k_n = n and X_{nj} = I_{(0, 1/n]} for all j. Then P( max_{1≤j≤n} |X_{nj}| > ε ) = 1/n → 0, while Σ_{j=1}^n P(|X_{nj}| > 1/2) = Σ_{j=1}^n 1/n = 1, which does not tend to 0.

Now we assume independence within each row and (c). First notice the inequality x ≤ −log(1 − x) for x < 1, and let x_{nj} = P(|X_{nj}| > ε). We know that (b) holds, and therefore for sufficiently large n we have x_{nj} < 1 for all j. Therefore

Σ_{j=1}^{k_n} P(|X_{nj}| > ε) = Σ_{j=1}^{k_n} x_{nj} ≤ − Σ_{j=1}^{k_n} log(1 − x_{nj}) = − log ∏_{j=1}^{k_n} P(|X_{nj}| ≤ ε)
= − log P( |X_{n1}| ≤ ε, ⋯, |X_{nk_n}| ≤ ε ) = − log ( 1 − P( ∪_{j=1}^{k_n} {|X_{nj}| > ε} ) ) = − log ( 1 − P( max_{1≤j≤k_n} |X_{nj}| > ε ) ) → 0,

and the proof is completed.
THEOREM 7.2.1 A double array of random variables is holospoudic if and only if for all t ∈ R we have

lim_{n→∞} max_{1≤j≤k_n} |f_{nj}(t) − 1| = 0.

Proof. Assume that the double array is holospoudic. By the Taylor expansion we have |e^{itx} − 1| ≤ |tx|, and therefore

|f_{nj}(t) − 1| ≤ E|e^{itX_{nj}} − 1| = E( |e^{itX_{nj}} − 1| ; |X_{nj}| > η ) + E( |e^{itX_{nj}} − 1| ; |X_{nj}| ≤ η ) ≤ 2P(|X_{nj}| > η) + E( |tX_{nj}| ; |X_{nj}| ≤ η ) ≤ 2P(|X_{nj}| > η) + |t|η.

Hence

max_{1≤j≤k_n} |f_{nj}(t) − 1| ≤ |t|η + 2 max_{1≤j≤k_n} P(|X_{nj}| > η).

Letting first n → ∞ and then η ↓ 0, we obtain

limsup_{n→∞} max_{1≤j≤k_n} |f_{nj}(t) − 1| ≤ |t|η + lim_{n→∞} 2 max_{1≤j≤k_n} P(|X_{nj}| > η) = |t|η → 0,

where holospoudicity is used. Conversely, assume that max_j |f_{nj}(t) − 1| → 0 as n → ∞. Let μ_{nj} be the distribution of X_{nj}. By the lemma of Section 6.3, which uses the integral of the characteristic function to estimate tail probabilities, we obtain

P(|X_{nj}| > ε) = 1 − μ_{nj}([−ε, ε]) ≤ 2 − | (ε/2) ∫_{−2/ε}^{2/ε} f_{nj}(t) dt | = (ε/2) ∫_{−2/ε}^{2/ε} dt − | (ε/2) ∫_{−2/ε}^{2/ε} f_{nj}(t) dt |
≤ | (ε/2) ∫_{−2/ε}^{2/ε} (1 − f_{nj}(t)) dt | ≤ (ε/2) ∫_{−2/ε}^{2/ε} |1 − f_{nj}(t)| dt ≤ (ε/2) ∫_{−2/ε}^{2/ε} max_{1≤j≤k_n} |1 − f_{nj}(t)| dt.

Therefore

max_{1≤j≤k_n} P(|X_{nj}| > ε) ≤ (ε/2) ∫_{−2/ε}^{2/ε} max_{1≤j≤k_n} |1 − f_{nj}(t)| dt.

Since |1 − f_{nj}(t)| ≤ 2, the Bounded Convergence Theorem gives

limsup_{n→∞} max_{1≤j≤k_n} P(|X_{nj}| > ε) ≤ (ε/2) ∫_{−2/ε}^{2/ε} lim_{n→∞} max_{1≤j≤k_n} |1 − f_{nj}(t)| dt = 0.

Therefore (X_{nj}) is holospoudic.
THEOREM 7.2.2 (Lindeberg-Feller Central Limit Theorem) Let (X_{nj}) be a standard array. Then Lindeberg's condition holds if and only if:

(1) (X_{nj}) is holospoudic;
(2) S_n = Σ_{j=1}^{k_n} X_{nj} →_L N(0, 1).
Proof.

Necessity: First we assume that Lindeberg's condition holds.

(I) (X_{nj}) is holospoudic. If Lindeberg's condition holds, then for each η > 0

Σ_{j=1}^{k_n} P(|X_{nj}| > η) = Σ_{j=1}^{k_n} E( 1 ; |X_{nj}| > η ) ≤ (1/η^2) Σ_{j=1}^{k_n} E( X_{nj}^2 ; |X_{nj}| > η ) = Λ_n(η)/η^2.

Since η > 0 and Λ_n(η) → 0 for all η > 0 by Lindeberg's condition, one obtains Σ_{j=1}^{k_n} P(|X_{nj}| > η) → 0 as n → ∞. Then by sub-additivity one has

Σ_{j=1}^{k_n} P(|X_{nj}| > η) ≥ P( ∪_{j=1}^{k_n} {|X_{nj}| > η} ) ≥ P( max_{1≤j≤k_n} |X_{nj}| > η ).

To see why the last inequality holds, let ω satisfy max_j |X_{nj}(ω)| > η. Then there exists some j ≤ k_n such that |X_{nj}(ω)| > η, which shows that ω ∈ ∪_{j=1}^{k_n} {|X_{nj}| > η}, and the inequality follows from the monotonicity of P. Moreover, for all j = 1, ⋯, k_n we have

{|X_{nj}| > η} ⊂ { max_{1≤j≤k_n} |X_{nj}| > η },

since max_j |X_{nj}| ≥ |X_{nj}| for all j = 1, ⋯, k_n, and hence

max_{1≤j≤k_n} P(|X_{nj}| > η) ≤ P( max_{1≤j≤k_n} |X_{nj}| > η ).

Combining the two chains of inequalities with the first result, Σ_{j=1}^{k_n} P(|X_{nj}| > η) → 0, we obtain

Σ_{j=1}^{k_n} P(|X_{nj}| > η) ≥ P( ∪_{j=1}^{k_n} {|X_{nj}| > η} ) ≥ P( max_{1≤j≤k_n} |X_{nj}| > η ) ≥ max_{1≤j≤k_n} P(|X_{nj}| > η) → 0.

Therefore the double array (X_{nj}) is holospoudic. Notice that the chain of inequalities above also gives a proof of the lemma at the beginning of this section.
(II) S_n →_L N(0, 1). First we estimate the characteristic function of X_{nj}, and denote it by f_{nj}(t). By the Taylor expansion of the complex exponential e^{itx} we can write

e^{itx} = 1 + itx + R_1(tx),   where |R_1(tx)| ≤ (tx)^2/2,
e^{itx} = 1 + itx − (tx)^2/2 + R_2(tx),   where |R_2(tx)| ≤ |tx|^3/6.

Then we compute f_{nj} as follows: for all η > 0 we have

f_{nj}(t) = E(e^{itX_{nj}}) = E( e^{itX_{nj}} ; |X_{nj}| ≤ η ) + E( e^{itX_{nj}} ; |X_{nj}| > η )
= 1 − (t^2/2) E( X_{nj}^2 ; |X_{nj}| ≤ η ) + E( R_2(tX_{nj}) ; |X_{nj}| ≤ η ) + E( R_1(tX_{nj}) ; |X_{nj}| > η ),

because E( itX_{nj} ; |X_{nj}| ≤ η ) + E( itX_{nj} ; |X_{nj}| > η ) = itEX_{nj} = 0. Now define

θ_{nj}(t) = − (t^2/2) E( X_{nj}^2 ; |X_{nj}| ≤ η ) + E( R_2(tX_{nj}) ; |X_{nj}| ≤ η ) + E( R_1(tX_{nj}) ; |X_{nj}| > η ),

so that f_{nj}(t) = 1 + θ_{nj}(t). Here is an observation:

|θ_{nj}(t)| ≤ (t^2/2) E( X_{nj}^2 ; |X_{nj}| ≤ η ) + | E( R_2(tX_{nj}) ; |X_{nj}| ≤ η ) | + | E( R_1(tX_{nj}) ; |X_{nj}| > η ) |
≤ (t^2/2) E( X_{nj}^2 ; |X_{nj}| ≤ η ) + E( |R_2(tX_{nj})| ; |X_{nj}| ≤ η ) + E( |R_1(tX_{nj})| ; |X_{nj}| > η )
≤ (t^2/2) E( X_{nj}^2 ; |X_{nj}| ≤ η ) + (|t|^3/6) E( |X_{nj}|^3 ; |X_{nj}| ≤ η ) + (t^2/2) E( X_{nj}^2 ; |X_{nj}| > η ).   (∗)

We claim that the conditions in the complex analysis lemma hold:

(i) For fixed t we have max_{1≤j≤k_n} |θ_{nj}(t)| → 0 as n → ∞. By the observation (∗) above we can compute

|θ_{nj}(t)| ≤ (t^2/2) E( X_{nj}^2 ; |X_{nj}| ≤ η ) + (|t|^3/6) E( |X_{nj}|^3 ; |X_{nj}| ≤ η ) + (t^2/2) E( X_{nj}^2 ; |X_{nj}| > η )
≤ (tη)^2/2 + |tη|^3/6 + (t^2/2) E( X_{nj}^2 ; |X_{nj}| > η ) ≤ (tη)^2/2 + |tη|^3/6 + (t^2/2) Σ_{j=1}^{k_n} E( X_{nj}^2 ; |X_{nj}| > η ) = (tη)^2/2 + |tη|^3/6 + (t^2/2) Λ_n(η).

Therefore max_{1≤j≤k_n} |θ_{nj}(t)| ≤ (tη)^2/2 + |tη|^3/6 + (t^2/2) Λ_n(η) for all η > 0, and hence

limsup_{n→∞} max_{1≤j≤k_n} |θ_{nj}(t)| ≤ (tη)^2/2 + |tη|^3/6 + (t^2/2) lim_{n→∞} Λ_n(η) = (tη)^2/2 + |tη|^3/6

by Lindeberg's condition. But η can be arbitrarily small, therefore we can let η ↓ 0 and obtain limsup_{n→∞} max_{1≤j≤k_n} |θ_{nj}(t)| = 0. Therefore (i) holds.

(ii) For fixed t we have Σ_{j=1}^{k_n} |θ_{nj}(t)| ≤ M < ∞ where M does not depend on n. We can follow the observation (∗) and compute

Σ_{j=1}^{k_n} |θ_{nj}(t)| ≤ (t^2/2) Σ_{j=1}^{k_n} E( X_{nj}^2 ; |X_{nj}| ≤ η ) + (|t|^3/6) Σ_{j=1}^{k_n} E( |X_{nj}|^3 ; |X_{nj}| ≤ η ) + (t^2/2) Σ_{j=1}^{k_n} E( X_{nj}^2 ; |X_{nj}| > η )
≤ (t^2/2) Σ_{j=1}^{k_n} EX_{nj}^2 + (|t|^3/6) Σ_{j=1}^{k_n} E( η X_{nj}^2 ; |X_{nj}| ≤ η ) + (t^2/2) Λ_n(η)
≤ t^2/2 + η|t|^3/6 + (t^2/2) Λ_n(η).

If we fix η = 1, then Λ_n(1) → 0 as n → ∞, and therefore (Λ_n(1))_{n≥1} is bounded; namely, there exists some C such that Λ_n(1) ≤ C for all n. Also t^2, |t|^3 do not depend on n, therefore we can take M = t^2/2 + |t|^3/6 + (t^2/2) C and obtain

Σ_{j=1}^{k_n} |θ_{nj}(t)| ≤ t^2/2 + |t|^3/6 + (t^2/2) Λ_n(1) ≤ M,

where M does not depend on n. Therefore (ii) holds.

(iii) For fixed t we have Σ_{j=1}^{k_n} θ_{nj}(t) → −t^2/2 as n → ∞. Compute, using Σ_{j=1}^{k_n} EX_{nj}^2 = 1,

| Σ_{j=1}^{k_n} θ_{nj}(t) + t^2/2 | ≤ (t^2/2) Σ_{j=1}^{k_n} E( X_{nj}^2 ; |X_{nj}| > η ) + Σ_{j=1}^{k_n} E( |R_2(tX_{nj})| ; |X_{nj}| ≤ η ) + Σ_{j=1}^{k_n} E( |R_1(tX_{nj})| ; |X_{nj}| > η )
≤ (t^2/2) Σ_{j=1}^{k_n} E( X_{nj}^2 ; |X_{nj}| > η ) + Σ_{j=1}^{k_n} E( |tX_{nj}|^3/6 ; |X_{nj}| ≤ η ) + Σ_{j=1}^{k_n} E( (tX_{nj})^2/2 ; |X_{nj}| > η )
= t^2 Λ_n(η) + (|t|^3/6) Σ_{j=1}^{k_n} E( η X_{nj}^2 ; |X_{nj}| ≤ η ) ≤ t^2 Λ_n(η) + η|t|^3/6.

This inequality holds for all η > 0, therefore we can first let n → ∞ and obtain

limsup_{n→∞} | Σ_{j=1}^{k_n} θ_{nj}(t) + t^2/2 | ≤ t^2 lim_{n→∞} Λ_n(η) + η|t|^3/6 = η|t|^3/6.

Then we can let η ↓ 0, and this yields limsup_{n→∞} | Σ_{j=1}^{k_n} θ_{nj}(t) + t^2/2 | = 0. Therefore (iii) holds.

Therefore we can apply the complex analysis lemma to obtain that

f_{S_n}(t) = f_{Σ_j X_{nj}}(t) = ∏_{j=1}^{k_n} f_{nj}(t) = ∏_{j=1}^{k_n} (1 + θ_{nj}(t)) → exp(−t^2/2)

for every fixed t, where exp(−t^2/2) is the characteristic function of N(0, 1).
Sufficiency: Asymptotic normality and holospoudicity are equivalent to

lim_{n→∞} ∏_{j=1}^{k_n} f_{nj}(t) = e^{−t^2/2}   and   lim_{n→∞} max_{1≤j≤k_n} |f_{nj}(t) − 1| = 0.

First we show that lim_{n→∞} Σ_{j=1}^{k_n} E(1 − cos tX_{nj}) = t^2/2. The first condition above is equivalent to

lim_{n→∞} Σ_{j=1}^{k_n} log f_{nj}(t) = −t^2/2.

Let θ_{nj}(t) = f_{nj}(t) − 1. Since holospoudicity holds (it is equivalent to the second condition), for sufficiently large n we have max_{1≤j≤k_n} |θ_{nj}(t)| ≤ 1/2; from now on we only consider such sufficiently large n. We use the method developed in the proof of the complex analysis lemma: log(1 + θ_{nj}) denotes the complex logarithm with argument in (−π, π]. When the argument is in (−π, π), the complex logarithm is an analytic function and we have the Taylor series

log(1 + θ_{nj}(t)) = Σ_{m=1}^{∞} (−1)^{m+1} θ_{nj}(t)^m / m.

Therefore

| log(1 + θ_{nj}(t)) − θ_{nj}(t) | ≤ Σ_{m=2}^{∞} |θ_{nj}(t)|^m / m ≤ |θ_{nj}(t)|^2 Σ_{m=2}^{∞} |θ_{nj}(t)|^{m−2} / 2 ≤ |θ_{nj}(t)|^2 Σ_{m=2}^{∞} (1/2)^{m−2} / 2 ≤ |θ_{nj}(t)|^2.

The argument of 1 + θ_{nj}(t) cannot equal π, since that would make 1 + θ_{nj}(t) a negative real number, whereas |θ_{nj}(t)| ≤ 1/2 forces 1 + θ_{nj}(t) > 0 for real θ_{nj}(t). Therefore we have | log(1 + θ_{nj}(t)) − θ_{nj}(t) | ≤ |θ_{nj}(t)|^2, and hence

| Σ_{j=1}^{k_n} log(1 + θ_{nj}(t)) − Σ_{j=1}^{k_n} θ_{nj}(t) | ≤ Σ_{j=1}^{k_n} |θ_{nj}(t)|^2 ≤ max_{1≤j≤k_n} |θ_{nj}(t)| Σ_{j=1}^{k_n} |θ_{nj}(t)|.

By the Taylor expansion of e^{itx} we have e^{itx} = 1 + itx + R(tx) where |R(tx)| ≤ |tx|^2/2. Let μ_{nj} be the distribution of X_{nj}. Then write

Σ_{j=1}^{k_n} |θ_{nj}| = Σ_{j=1}^{k_n} | ∫_R (e^{itx} − 1) dμ_{nj} | = Σ_{j=1}^{k_n} | ∫_R (itx + R(tx)) dμ_{nj} | = Σ_{j=1}^{k_n} | ∫_R R(tx) dμ_{nj} | ≤ (t^2/2) Σ_{j=1}^{k_n} E(X_{nj}^2) = t^2/2,

because (X_{nj}) is a standard array: the variance of each row sum is 1 and EX_{nj} = 0. Therefore

| Σ_{j=1}^{k_n} log(1 + θ_{nj}(t)) − Σ_{j=1}^{k_n} θ_{nj}(t) | ≤ max_{1≤j≤k_n} |θ_{nj}(t)| Σ_{j=1}^{k_n} |θ_{nj}(t)| ≤ (t^2/2) max_{1≤j≤k_n} |f_{nj}(t) − 1| → 0.

Since Σ_{j=1}^{k_n} log(1 + θ_{nj}(t)) → −t^2/2, it follows that Σ_{j=1}^{k_n} θ_{nj}(t) → −t^2/2. Hence taking real parts yields

lim_{n→∞} Σ_{j=1}^{k_n} E(1 − cos tX_{nj}) = t^2/2.

This is the claim we made at the beginning of the proof of the sufficiency part. Next we compute

limsup_{n→∞} | t^2/2 − Σ_{j=1}^{k_n} E( 1 − cos tX_{nj} ; |X_{nj}| ≤ η ) |
= limsup_{n→∞} | Σ_{j=1}^{k_n} E(1 − cos tX_{nj}) + o(1) − Σ_{j=1}^{k_n} E( 1 − cos tX_{nj} ; |X_{nj}| ≤ η ) |
= limsup_{n→∞} | Σ_{j=1}^{k_n} E( 1 − cos tX_{nj} ; |X_{nj}| > η ) |
≤ 2 limsup_{n→∞} Σ_{j=1}^{k_n} P(|X_{nj}| > η) ≤ limsup_{n→∞} (2/η^2) Σ_{j=1}^{k_n} EX_{nj}^2 = 2/η^2,

where o(1) vanishes in the limit n → ∞ and the last inequality is Chebyshev's inequality. Since 1 − cos tx ≤ (tx)^2/2 for all t, x, we have

0 ≤ limsup_{n→∞} ( t^2/2 − Σ_{j=1}^{k_n} E( t^2 X_{nj}^2 / 2 ; |X_{nj}| ≤ η ) ) ≤ limsup_{n→∞} ( t^2/2 − Σ_{j=1}^{k_n} E( 1 − cos tX_{nj} ; |X_{nj}| ≤ η ) ),

and therefore

0 ≤ limsup_{n→∞} ( t^2/2 − Σ_{j=1}^{k_n} E( t^2 X_{nj}^2 / 2 ; |X_{nj}| ≤ η ) ) ≤ 2/η^2.

Hence

limsup_{n→∞} ( 1 − Σ_{j=1}^{k_n} E( X_{nj}^2 ; |X_{nj}| ≤ η ) ) ≤ 4/(η^2 t^2).

But by the definition of Lindeberg's quantity,

Λ_n(η) = Σ_{j=1}^{k_n} E( X_{nj}^2 ; |X_{nj}| > η ) = Σ_{j=1}^{k_n} ( EX_{nj}^2 − E( X_{nj}^2 ; |X_{nj}| ≤ η ) ) = 1 − Σ_{j=1}^{k_n} E( X_{nj}^2 ; |X_{nj}| ≤ η ).

Therefore

limsup_{n→∞} Λ_n(η) ≤ 4/(η^2 t^2).

Now let t → ∞; we obtain Lindeberg's condition Λ_n(η) → 0 as n → ∞ for all η > 0.
COROLLARY 7.2.3 For a standard array (X_{nj}), Lindeberg's condition holds if and only if S_n →_L N(0, 1) and max_{1≤j≤k_n} EX_{nj}^2 → 0 as n → ∞.

Proof. The proof is easy: the second condition implies holospoudicity by Chebyshev's inequality, and Lindeberg's condition implies the second condition due to the observation

max_{1≤j≤k_n} EX_{nj}^2 ≤ max_{1≤j≤k_n} E( X_{nj}^2 ; |X_{nj}| ≤ η ) + max_{1≤j≤k_n} E( X_{nj}^2 ; |X_{nj}| > η ) ≤ η^2 + Λ_n(η).

We can first let n → ∞ to get lim_{n→∞} max_{1≤j≤k_n} EX_{nj}^2 ≤ η^2 and then let η ↓ 0.
Now we reprove the classical central limit theorem as a corollary of the Lindeberg-Feller central limit theorem.

THEOREM 7.2.4 (Classical Central Limit Theorem) Suppose (X_n)_{n≥1} are i.i.d. mean-zero random variables with EX_n^2 < ∞. Let s_n^2 = Σ_{j=1}^n EX_j^2. Then

S_n/s_n = (1/s_n) Σ_{j=1}^n X_j →_L N(0, 1).

Proof. Let Y_{nj} = X_j/s_n; then (Y_{nj}) is a standard array and Lindeberg's quantity is

Λ_n(η) = Σ_{j=1}^n E( Y_{nj}^2 ; |Y_{nj}| > η ) = (1/s_n^2) Σ_{j=1}^n E( X_j^2 ; |X_j| > ηs_n ) = (1/s_n^2) n E( X^2 ; |X| > ηs_n ) = (1/EX^2) E( X^2 ; |X| > η √(nEX^2) ),

where X denotes a generic copy of X_1. Notice that P( |X| > η √(nEX^2) ) → 0 as n → ∞ because EX^2 is fixed, η > 0 and n → ∞; since EX^2 < ∞, dominated convergence gives E( X^2 ; |X| > η √(nEX^2) ) → 0. Therefore Lindeberg's condition is met and we have asymptotic normality.
Another generalized (2 + δ)-moment version of Liapunov's central limit theorem is also a special case of the Lindeberg condition:

THEOREM 7.2.5 (Generalized Liapunov's Theorem) For a standard array (X_{nj}), if for some δ > 0 we have

Σ_{j=1}^{k_n} E|X_{nj}|^{2+δ} → 0   as n → ∞,

then

S_n = Σ_{j=1}^{k_n} X_{nj} →_L N(0, 1).

Proof. First compute Lindeberg's quantity:

Λ_n(η) = Σ_{j=1}^{k_n} E( X_{nj}^2 ; |X_{nj}| > η ) ≤ Σ_{j=1}^{k_n} E( |X_{nj}|^{2+δ}/η^δ ; |X_{nj}| > η ) ≤ (1/η^δ) Σ_{j=1}^{k_n} E|X_{nj}|^{2+δ} → 0

as n → ∞ because η > 0. Therefore Lindeberg's condition holds and we have asymptotic normality for the standard array.
The following theorem of Feller illustrates some cases in which asymptotic normality holds without holospoudicity. The theorem is stated without proof.

THEOREM 7.2.6 (Feller's Theorem) Let (X_{nj}) be a standard array. Suppose there exist two sequences n_k, m_k and some constant ρ > 0 such that:

• (∗) n_1 < n_2 < ⋯ < n_k < ⋯ and n_k increases to ∞;
• (∗∗) 1 ≤ m_k ≤ n_k;
• (∗∗∗) Var( X_{m_k}/s_{n_k} ) → ρ as k → ∞.

If S_n = Σ_{j=1}^{k_n} X_{nj} →_L N(0, 1), then one must have X_{m_k}/(s_{n_k} √ρ) →_L N(0, 1).

EXAMPLE 7.2.2 Suppose we have independent Bernoulli random variables Y_k ∼ Bern(p_k), k ≥ 1. Put X_k = Y_k − EY_k = Y_k − p_k and consider s_n^2 = Σ_{j=1}^n p_j(1 − p_j). Then S_n/s_n →_L N(0, 1) if and only if s_n^2 → ∞, where S_n = Σ_{j=1}^n X_j.

Proof. It suffices to show the following two statements:

• s_n^2 → ∞ implies asymptotic normality. Let Z_{nj} = X_j/s_n; then (Z_{nj}) is a standard array. Since s_n^2 → ∞, for each η > 0 there exists some N such that s_n η > 1 for all n ≥ N; since |X_j| ≤ 1, this means E( X_j^2 ; |X_j| > ηs_n ) = 0 for sufficiently large n, and therefore Λ_n(η) = (1/s_n^2) Σ_{j=1}^n E( X_j^2 ; |X_j| > ηs_n ) = 0 for sufficiently large n. Asymptotic normality then follows from the Lindeberg-Feller theorem.

• s_n^2 converging to a finite limit s^2 implies no asymptotic normality. Let m_k = 1 and n_k = k; then Var( X_1/s_k ) = p_1 q_1 / s_k^2 → p_1 q_1 / s^2 =: ρ > 0, and X_1/(s_k √ρ) = (s/s_k) X_1/√(p_1 q_1) → X_1/√(p_1 q_1), which is not normally distributed. Hence by Feller's theorem S_n/s_n cannot be asymptotically normal.
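The dichotomy of Example 7.2.2 is easy to see in simulation. Below is a sketch (assuming NumPy is available; the choices p_k = 1/2 versus p_k = 1/k^2, the sample size and the test point z = −0.8 are arbitrary): in the first case S_n/s_n matches the normal distribution function, in the second case s_n^2 stays bounded and the normalized sums pile up well below the normal value.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
n, reps = 2_000, 20_000
Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))

for label, p in (("p_k = 1/2   ", np.full(n, 0.5)),
                 ("p_k = 1/k^2 ", 1.0 / np.arange(1, n + 1) ** 2)):
    Y = rng.random((reps, n)) < p           # independent Bernoulli(p_k) draws, one row per replication
    s_n = np.sqrt(np.sum(p * (1 - p)))
    Z = (Y - p).sum(axis=1) / s_n           # S_n / s_n
    print(label, "s_n^2 =", round(float(s_n) ** 2, 2),
          " P(Z <= -0.8) =", round(float(np.mean(Z <= -0.8)), 3),
          " Phi(-0.8) =", round(Phi(-0.8), 3))
```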
Chapter 8

Conditional Expectation

8.1 Definition and Examples

DEFINITION 8.1.1 Let X be a P-quasi-integrable random variable on an underlying probability space (Ω, A, P) and let B be a sub-σ-field of A. A random variable E(X|B) is called a conditional expectation of X given B if it satisfies:

(i) E(X|B) is B-measurable;
(ii) E(X|B) satisfies the averaging property: for all B ∈ B we have E(E(X|B); B) = E(X; B).

One of the most fundamental problems is the existence and uniqueness of the conditional expectation.

THEOREM 8.1.1 (Existence and Uniqueness) Let X be a P-quasi-integrable random variable on an underlying probability space (Ω, A, P) and let B be a sub-σ-field of A. Then E(X|B) exists and is almost surely unique.

Before we proceed to the formal proof of the theorem, we first consider a simple but motivational case, in which B is generated by an at most countable partition of Ω into sets of A.

PROPOSITION 8.1.1 Let X be a P-quasi-integrable random variable on an underlying probability space (Ω, A, P), and let C ⊂ A be an at most countable partition of Ω. Let B = σ(C). Then E(X|B) exists and is almost surely unique.
Proof. Since B is the σ-field generated by the at most countable partition C ⊂ A, B consists of at most countable unions of sets in C. Write C = {C_i}_{i≥1}. We observe that a function X : Ω → R is B-measurable if and only if it is constant on each cell C_i ∈ C. Indeed, if X is not constant on some C_i, there exist two distinct values y_1, y_2 and points ω_1, ω_2 ∈ C_i with X(ω_1) = y_1 and X(ω_2) = y_2; then C_i ∩ X^{−1}({y_1}) is a nonempty proper subset of C_i (since ω_2 ∉ X^{−1}({y_1})), contradicting the fact that X^{−1}({y_1}) ∈ B must be a union of cells of C. The sufficiency of being constant on each cell is clear, since then {X ∈ B} is a union of cells, hence belongs to B, for every Borel set B.

• Uniqueness: Suppose Y, Z are two conditional expectations E(X|B). They are both B-measurable, hence constant on each cell, and it suffices to show that if P(C_i) > 0 then Y(ω) = Z(ω) for ω ∈ C_i. Suppose Y(ω) = y_i, Z(ω) = z_i for ω ∈ C_i; then by the averaging property of conditional expectation

P(C_i) y_i = E(Y; C_i) = E(X; C_i) = E(Z; C_i) = P(C_i) z_i.

Therefore y_i = z_i since P(C_i) > 0, and hence Y and Z agree on C_i with the same constant.

• Existence: For all ω ∈ Ω there exists a unique C_i ∈ C such that ω ∈ C_i. Define

Z(ω) = P(C_i)^{−1} E(X; C_i)   if ω ∈ C_i and P(C_i) > 0,
Z(ω) = 0                        if ω ∈ C_i and P(C_i) = 0.

Now we check that Z serves as a conditional expectation. First, Z is B-measurable because it is constant on each cell C_i ∈ C; applying the computation below to X^+ and X^− separately also shows that Z is quasi-integrable. Finally, let B ∈ B be of the form B = ∪_{j∈J} C_j, where J is some countable index set. Then by the σ-additivity of the indefinite integral we can compute

E(Z; B) = E( Z; ∪_{j∈J} C_j ) = Σ_{j∈J} E(Z; C_j) = Σ_{j∈J: P(C_j)>0} E(Z; C_j) = Σ_{j∈J: P(C_j)>0} P(C_j)^{−1} E(X; C_j) P(C_j) = Σ_{j∈J} E(X; C_j) = E( X; ∪_{j∈J} C_j ) = E(X; B).

And the averaging property is satisfied. Therefore Z serves as a conditional expectation.
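The cell-average formula of Proposition 8.1.1 is concrete enough to compute directly. Here is a small sketch on a finite sample space (assuming NumPy is available; the probabilities, the values of X, and the partition are arbitrary illustrative choices), including a check of the averaging property.

```python
import numpy as np

# Omega = {0,...,9} with probabilities p, X an arbitrary random variable,
# and C a partition of Omega into three cells.  E(X | sigma(C)) is constant on
# each cell and equals E(X; C_i)/P(C_i) there.
p = np.array([0.05, 0.10, 0.05, 0.10, 0.20, 0.05, 0.15, 0.10, 0.10, 0.10])
X = np.array([3.0, -1.0, 2.0, 0.5, 4.0, 1.0, -2.0, 0.0, 5.0, 2.5])
cells = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5, 6, 7, 8, 9])]

cond_exp = np.empty_like(X)
for C in cells:
    cond_exp[C] = np.sum(X[C] * p[C]) / np.sum(p[C])     # local average on the cell

# Averaging property: E(E(X|B); B) = E(X; B) for B a union of cells.
B = np.concatenate([cells[0], cells[2]])
print(np.sum(cond_exp[B] * p[B]), np.sum(X[B] * p[B]))   # the two numbers agree
```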
Intuitively, the conditional expectation is the local average of X over the "smallest" measurable sets in B. Before moving to the rigorous proof of Theorem 8.1.1, we look at a particular example which is obtained completely by guessing but is also instructive.

EXAMPLE 8.1.1

(1) If B = {∅, Ω}, then E(X|B) = EX a.s.

(2) Let Ω = [−1, 1] and let A be the Borel σ-field over [−1, 1]. Let f be a density on (Ω, A) with respect to the Lebesgue measure with 0 < f < ∞. Consider B = σ⟨T⟩ where T : Ω → [0, 1] is defined by T(ω) = |ω|, and let X ≥ 0 be a non-negative random variable. Then

E(X|B)(ω) = X(ω) f(ω)/(f(ω) + f(−ω)) + X(−ω) f(−ω)/(f(ω) + f(−ω)).

We first claim that σ⟨T⟩ = {A ∈ B_{[−1,1]} : x ∈ A implies −x ∈ A} = {A ∈ B_{[−1,1]} : A = −A}.

(i) Let B be any Borel set in [0, 1] and consider T^{−1}(B) ∈ σ⟨T⟩. If x ∈ T^{−1}(B), then |x| ∈ B and also |−x| = |x| ∈ B, hence {x, −x} ⊂ T^{−1}(B). Therefore T^{−1}(B) is symmetric, and σ⟨T⟩ ⊂ {A ∈ B_{[−1,1]} : A = −A}.

(ii) Conversely, if A ∈ {A ∈ B_{[−1,1]} : A = −A}, let B = {|x| : x ∈ A}, so B = T(A). Then T^{−1}(B) = T^{−1}(T(A)) = A because A is symmetric. It remains to verify that B is a Borel set. But

B = {|x| : x ∈ A} = {|x| : x ∈ A ∩ [0, 1]} ∪ {|x| : x ∈ A ∩ [−1, 0)} = {x : x ∈ A ∩ [0, 1]} ∪ {x : −x ∈ A ∩ [−1, 0)} = {x : x ∈ A ∩ [0, 1]} ∪ {x : x ∈ A ∩ (0, 1]} = A ∩ [0, 1],

because A is symmetric (A = −A). Therefore B is Borel in [0, 1] because A is Borel in [−1, 1].

Now we check that

Z(ω) = X(ω) f(ω)/(f(ω) + f(−ω)) + X(−ω) f(−ω)/(f(ω) + f(−ω))

indeed serves as a conditional expectation. Since Z is even, it is a function of T(ω), because once |ω| is determined, so is Z(ω); therefore Z is σ⟨T⟩-measurable. Clearly Z is quasi-integrable because it is non-negative. Now suppose A ∈ σ⟨T⟩, namely A = −A; then

E(Z; A) = E( X(ω) f(ω)/(f(ω)+f(−ω)) + X(−ω) f(−ω)/(f(ω)+f(−ω)) ; A )
= ∫_A X(ω) f^2(ω)/(f(ω)+f(−ω)) dλ + ∫_{−A} X(ω) f(ω)f(−ω)/(f(ω)+f(−ω)) dλ
= ∫_A X(ω) f^2(ω)/(f(ω)+f(−ω)) dλ + ∫_A X(ω) f(ω)f(−ω)/(f(ω)+f(−ω)) dλ
= ∫_A X(ω) f(ω) dλ = E(X; A).

Therefore Z serves as E(X|B).
LEMMA 8.1.1 Suppose X, Y are two random variables on a probability space (Ω, A, P), and for all A ∈ A we have E(X; A) ≤ E(Y; A). Then X ≤ Y a.s.

Proof. Let B_t = {X > t > Y}; then B_t ∈ A for all t ∈ R. We claim that E(X; B_t) > tP(B_t) > E(Y; B_t) provided that P(B_t) > 0. Indeed, if, say, E(X; B_t) = tP(B_t), then E(X − t; B_t) = 0 with X − t > 0 on B_t, and therefore P({X − t > 0} ∩ B_t) = P(B_t) = 0, contradicting P(B_t) > 0; the same argument applies to E(t − Y; B_t). Hence, if P(B_t) > 0, then combining the claim with the hypothesis we have

E(X; B_t) > tP(B_t) > E(Y; B_t) ≥ E(X; B_t),

which is not possible, since we would obtain tP(B_t) < tP(B_t). Therefore P(B_t) = 0, and hence

{X > Y} = ∪_{t∈Q} {X > t > Y} = ∪_{t∈Q} B_t

has probability 0 by sub-additivity; namely, X ≤ Y a.s.
Now we can prove Theorem 8.1.1.

Proof of Theorem 8.1.1.

Uniqueness: Consider the probability space (Ω, B, P). If two B-measurable random variables Y, Z on this probability space satisfy ∫_B Y dP = ∫_B Z dP for all B ∈ B, then Y = Z a.s. This is a result we have already proved in Chapter 3 (Indefinite Integrals and Densities), where the underlying measure need not be a probability measure but only σ-finite. Therefore two conditional expectations must be a.s. identical.

Existence: To construct a conditional expectation, one follows the usual build-up procedure, but starting from square-integrable random variables.

• Step 1: Suppose X ∈ L^2(Ω, A, P). Then L^2(Ω, B, P) is a sub-Hilbert space of L^2(Ω, A, P), and therefore there exists an orthogonal projection Z ∈ L^2(Ω, B, P) of X onto the subspace, such that

E( Y(X − Z) ) = ∫_Ω Y(X − Z) dP = 0

for all Y ∈ L^2(Ω, B, P). Now for all B ∈ B we can let Y = I_B, and hence we obtain E(X − Z; B) = E(X; B) − E(Z; B) = 0. Also Z is B-measurable and integrable, and therefore Z serves as a conditional expectation E(X|B).

• Step 2: Suppose X ≥ 0 is a non-negative random variable. Let X_n = min(X, n); then E(X_n|B) exists because X_n ∈ L^2(Ω, A, P), and is a.s. unique because uniqueness is already proved. Since for all B ∈ B we have the averaging property

E( E(X_n|B); B ) = E(X_n; B) ≤ E(X_{n+1}; B) = E( E(X_{n+1}|B); B ),

the previous lemma shows that E(X_n|B) increases a.s. By modifying the values on a set of probability 0 we may assume that E(X_n|B) increases for every sample point in Ω. Now we claim that Z = lim_{n→∞} E(X_n|B) serves as a conditional expectation. Clearly Z is non-negative, hence quasi-integrable. Also Z is B-measurable by the closure theorem, since E(X_n|B) is B-measurable for all n. It suffices to check the averaging property: for all B ∈ B, by the Monotone Convergence Theorem,

E(Z; B) = E( lim_{n→∞} E(X_n|B); B ) = E( lim_{n→∞} E(X_n|B) I_B ) = lim_{n→∞} E( E(X_n|B) I_B ) = lim_{n→∞} E(X_n I_B) = E( lim_{n→∞} X_n I_B ) = E(X I_B) = E(X; B).

And therefore Z serves as E(X|B).

• Step 3: Suppose X is a general quasi-integrable random variable. By Step 2 the existence of E(X^+|B) and E(X^−|B) is guaranteed. Without loss of generality we may assume that E(X^−) < ∞. Therefore E(E(X^−|B)) = EX^− < ∞, implying P( E(X^−|B) = ∞ ) = 0. Therefore Z = E(X^+|B) − E(X^−|B) is well-defined a.s., and we show that Z serves as E(X|B). Clearly Z is B-measurable, and E(X^−|B) is integrable, implying that Z is quasi-integrable (Z ∈ Q^−). It suffices to check the averaging property: for all B ∈ B,

E(Z; B) = E( E(X^+|B); B ) − E( E(X^−|B); B ) = E(X^+; B) − E(X^−; B) = E(X; B).

Hence Z serves as E(X|B).
8.2 Basic Properties

THEOREM 8.2.1 (Operator Properties E1 & E2) Suppose (Ω, A, P) is a probability space with B being a sub-σ-field of A, and X, Y are random variables.

(E1) E(1|B) = 1 a.s.
(E2) Linearity: E(cX|B) = cE(X|B), and E(X + Y|B) = E(X|B) + E(Y|B) if X, Y ∈ Q^±.

Proof.

(E1) Clearly 1 is quasi-integrable and B-measurable, and for all B ∈ B we have E(1; B) = P(B) = E(1; B). Hence E(1|B) = 1 a.s.

(E2) Clearly if E(X|B) is quasi-integrable then cE(X|B) is also quasi-integrable for all finite c, and by the closure theorem cE(X|B) is B-measurable. Now if B ∈ B then

E( cE(X|B); B ) = cE( E(X|B); B ) = cE(X; B) = E(cX; B).

Therefore E(cX|B) = cE(X|B) a.s. Next, if X, Y ∈ Q^± then X + Y ∈ Q^±, and E(E(X|B)) = EX, E(E(Y|B)) = EY imply that E(X|B), E(Y|B) are also in Q^±. This shows that E(X|B) + E(Y|B) is quasi-integrable, and it is B-measurable according to the closure theorem. It suffices to show the averaging property: for all B ∈ B, by the linearity of the integral,

E( E(X|B) + E(Y|B); B ) = E( E(X|B); B ) + E( E(Y|B); B ) = E(X; B) + E(Y; B) = E(X + Y; B).

Therefore E(X + Y|B) = E(X|B) + E(Y|B) a.s.
THEOREM 8.2.2 (Smoothing Properties of Conditional Expectation) Suppose (Ω, A, P) is a probability space with B being a sub-σ-field of A, and X is a quasi-integrable random variable.

(S1) X ∈ Q^± if and only if E(X|B) ∈ Q^±, and X ∈ L^1 if and only if E(X|B) ∈ L^1.
(S2) If X is B-measurable and Y is quasi-integrable (with XY quasi-integrable), then E(XY|B) = XE(Y|B); in particular, E(X|B) = X a.s.
(S3) If B_1 ⊂ B_2 are two nested sub-σ-fields, then the nested property holds:

E( E(X|B_1) | B_2 ) = E( E(X|B_2) | B_1 ) = E(X|B_1)   a.s.

Proof.

(S1) Suppose X ∈ Q^±; then EX^± < ∞ correspondingly, and therefore E(E(X^±|B)) = EX^± < ∞, implying E(X^±|B) ∈ L^1. Since E(X^+|B) − E(X^−|B) serves as E(X|B) (one of the two terms is finite with probability 1 by quasi-integrability), we know that E(X|B) ∈ Q^±. Conversely, if E(X|B) ∈ Q^−, then E( E(X|B)^− ) < ∞ and the subtraction is legitimate, hence

−∞ < E( E(X|B)^+ − E(X|B)^− ) = E( E(X|B) ) = EX,   implying that X ∈ Q^−;

and if E(X|B) ∈ Q^+, then E( E(X|B)^+ ) < ∞ and the subtraction is again legitimate, hence

+∞ > E( E(X|B)^+ − E(X|B)^− ) = E( E(X|B) ) = EX,   implying that X ∈ Q^+.

(S2) If X is an indicator then the result holds trivially, since this is the averaging property. If X is a simple function, say X = Σ_j c_j I_{B_j} with B_j ∈ B, then for all B ∈ B we have

E( E(XY|B); B ) = E(XY; B) = Σ_j c_j E( I_{B_j} Y; B ) = Σ_j c_j E( Y; B_j ∩ B ),
E( XE(Y|B); B ) = Σ_j c_j E( I_{B_j} E(Y|B); B ) = Σ_j c_j E( E(Y|B); B_j ∩ B ) = Σ_j c_j E( Y; B_j ∩ B ),

and therefore E(XY|B) = XE(Y|B) a.s. Now if X, Y are nonnegative, let (X_n)_{n≥1} be a sequence of nonnegative simple B-measurable functions increasing to X; then X_nY increases to XY. Therefore by the Monotone Convergence Theorem we can compute for all B ∈ B:

E( lim_{n→∞} E(X_nY|B); B ) = lim_{n→∞} E( E(X_nY|B); B ) = lim_{n→∞} E(X_nY; B) = E( lim_{n→∞} X_nY; B ) = E(XY; B).

Therefore E(XY|B) = lim_{n→∞} E(X_nY|B) a.s. But X_n is simple, and therefore

lim_{n→∞} E(X_nY|B) = lim_{n→∞} X_n E(Y|B) = E(Y|B) lim_{n→∞} X_n = XE(Y|B).

Finally, if X, Y are quasi-integrable and XY is quasi-integrable, write

XY = (X^+ − X^−)(Y^+ − Y^−) = (X^+Y^+) + (−X^−Y^+) + (−X^+Y^−) + (X^−Y^−).

Notice that the four products have pairwise disjoint supports. Hence if XY ∈ Q^−, then (−X^−Y^+) and (−X^+Y^−) must be in Q^−, and automatically X^+Y^+ and X^−Y^− are in Q^− because they are non-negative. Similarly, if XY ∈ Q^+ then all four products are in Q^+. Also notice that quasi-integrability of each product ensures quasi-integrability of its conditional expectation. Hence the linearity of conditional expectation (E2) can be applied:

E(XY|B) = E(X^+Y^+|B) + E(−X^−Y^+|B) + E(−X^+Y^−|B) + E(X^−Y^−|B)
= E(X^+Y^+|B) − E(X^−Y^+|B) − E(X^+Y^−|B) + E(X^−Y^−|B)
= X^+E(Y^+|B) − X^−E(Y^+|B) − X^+E(Y^−|B) + X^−E(Y^−|B)
= (X^+ − X^−)( E(Y^+|B) − E(Y^−|B) ) = XE(Y|B).

(S3) Since B_1 ⊂ B_2, E(X|B_1) is also B_2-measurable; therefore E( E(X|B_1) | B_2 ) = E(X|B_1) by (S2). To show that E( E(X|B_2) | B_1 ) = E(X|B_1), first note that E( E(X|B_2) | B_1 ) is clearly quasi-integrable and B_1-measurable; next we compute, for all B_1 ∈ B_1 ⊂ B_2,

E( E( E(X|B_2) | B_1 ); B_1 ) = E( E(X|B_2); B_1 ) = E(X; B_1).

Therefore the averaging property holds and E( E(X|B_2) | B_1 ) = E(X|B_1) a.s.
THEOREM 8.2.3 (Operator Property E3) Suppose (Ω, A, P) is a probability space with B being a sub-σ-field of A, and X, Y are random variables.

(a) Monotonicity: If X ≤ Y, then E(X|B) ≤ E(Y|B) a.s. In particular, if X ≥ 0 then E(X|B) ≥ 0 a.s.
(b) Modulus inequalities: (E(X|B))^+ ≤ E(X^+|B), (E(X|B))^− ≤ E(X^−|B), and |E(X|B)| ≤ E(|X| |B).
(c) If X < Y a.s., and at least one of X, Y is integrable, then E(X|B) < E(Y|B) a.s.

Proof.

(a) Monotonicity follows from Lemma 8.1.1, applied on the probability space (Ω, B, P) to the B-measurable random variables E(X|B) and E(Y|B), whose indefinite integrals over sets of B agree with those of X and Y.

(b) First X ≤ X^+, and monotonicity gives E(X|B) ≤ E(X^+|B); if A ≤ B with B ≥ 0, then A^+ ≤ B, which implies (E(X|B))^+ ≤ E(X^+|B). Similarly −X ≤ X^−, so −E(X|B) = E(−X|B) ≤ E(X^−|B), and the same argument yields (E(X|B))^− = (−E(X|B))^+ ≤ E(X^−|B). Therefore

|E(X|B)| = (E(X|B))^+ + (E(X|B))^− ≤ E(X^+|B) + E(X^−|B) = E(X^+ + X^−|B) = E(|X| |B).

(c) Suppose Z ≥ 0; then we observe the following equality:

0 = E( E(Z|B); E(Z|B) = 0 ) = E( Z; E(Z|B) = 0 ).

Denote A = {E(Z|B) = 0} and B = {Z = 0}; then P(A\B) = 0. This is because if P(A) = 0 then P(A\B) ≤ P(A) = 0, and if P(A) > 0 then E(Z; A) = 0 implies ZI_A = 0 a.s., and therefore P({Z ≠ 0} ∩ A) = P(A\B) = 0. Now let Z = Y − X. Then Z > 0 a.s., and hence P(B) = 0, which implies P(A) = 0 because P(A) = P(A\B) + P(A∩B) ≤ P(A\B) + P(B) = 0. Hence E(Z|B) = E(Y|B) − E(X|B) = 0 with probability 0, namely E(Y|B) − E(X|B) > 0 a.s., which is equivalent to E(X|B) < E(Y|B) a.s.
THEOREM 8.2.4 (Monotone Convergence Theorem, E4) Let (X_n)_{n≥1} be an increasing sequence of non-negative random variables on a probability space (Ω, A, P) with a sub-σ-field B. Then

lim_{n→∞} E(X_n|B) = E( lim_{n→∞} X_n | B )   a.s.

Proof. Let X_n increase to X. Lemma 8.1.1 shows that E(X_n|B) also increases a.s. Clearly lim_{n→∞} E(X_n|B) is quasi-integrable (non-negative, in fact), and it is B-measurable by the closure theorem. Then for all B ∈ B, by the Monotone Convergence Theorem for integrals,

E( lim_{n→∞} E(X_n|B); B ) = lim_{n→∞} E( E(X_n|B); B ) = lim_{n→∞} E(X_n; B) = E( lim_{n→∞} X_n; B ) = E(X; B).

Therefore the averaging property holds and E(X_n|B) ↑ E(X|B) as n → ∞ a.s.
THEOREM 8.2.5 (Fatou's Lemma, E5) Let (X_n)_{n≥1} be a sequence of non-negative random variables on a probability space (Ω, A, P) with a sub-σ-field B. Then

E( liminf_{n→∞} X_n | B ) ≤ liminf_{n→∞} E(X_n|B)   a.s.

Proof. Let Y_n = inf_{k≥n} X_k and Y = liminf_{n→∞} X_n; then Y_n ↑ Y as n → ∞. By the Monotone Convergence Theorem (E4) we know that

E(Y_n|B) ↑ E(Y|B) = E( liminf_{n→∞} X_n | B )

as n → ∞. But by the definition of Y_n we also know that Y_n ≤ X_n for all n. Therefore monotonicity yields E(Y_n|B) ≤ E(X_n|B) a.s. for all n. Taking the limit inferior on both sides, we obtain

E( liminf_{n→∞} X_n | B ) = lim_{n→∞} E(Y_n|B) = liminf_{n→∞} E(Y_n|B) ≤ liminf_{n→∞} E(X_n|B).
THEOREM 8.2.6 (Dominated Convergence Theorem, E6) Let (X_n)_{n≥1} be a sequence of random variables on a probability space (Ω, A, P) with a sub-σ-field B. Suppose there exist random variables X, Y such that X_n → X a.s. and |X_n| ≤ Y with Y integrable. Then

lim_{n→∞} E(X_n|B) = E(X|B)   a.s.

Proof. First −Y ≤ X_n ≤ Y, and (X_n + Y)_{n≥1}, (Y − X_n)_{n≥1} are two sequences of non-negative random variables. Then by Fatou's lemma (E5) we have

E(Y − X|B) = E( liminf_{n→∞} (Y − X_n) | B ) ≤ liminf_{n→∞} E(Y − X_n|B) = E(Y|B) − limsup_{n→∞} E(X_n|B)   a.s.,
E(Y + X|B) = E( liminf_{n→∞} (Y + X_n) | B ) ≤ liminf_{n→∞} E(Y + X_n|B) = E(Y|B) + liminf_{n→∞} E(X_n|B)   a.s.

Therefore, subtracting the integrable E(Y|B) from both sides,

limsup_{n→∞} E(X_n|B) ≤ E(X|B)   and   liminf_{n→∞} E(X_n|B) ≥ E(X|B),

and therefore E(X_n|B) → E(X|B) a.s. as n → ∞.
THEOREM 8.2.7 (Operator Property E7) If X and B are independent, then E(X|B) = EX a.s.

Proof. Clearly EX is a constant random variable, quasi-integrable and B-measurable. Now for all B ∈ B compute

E(EX; B) = P(B)·EX = E(XI_B) = E(X; B) = E( E(X|B); B ),

where the second equality holds because I_B and X are independent. Therefore the averaging property holds and E(X|B) = EX a.s.

DEFINITION 8.2.1 Let X be P-quasi-integrable on a probability space (Ω, A, P), and let T : Ω → T be a measurable mapping between the σ-fields A and C. The conditional expectation of X given T is a random variable E(X|T) from (T, C) to (R, R) such that

∫_{{T ∈ C}} X dP = ∫_C E(X|T) d(PT^{−1})   for all C ∈ C.

REMARK 8.2.1 If B_T = σ⟨T⟩, then E(X|B_T) = E(X|T) ∘ T. In fact, E(X|T) ∘ T is σ⟨T⟩-measurable, and by the change of variable formula we have

∫_{{T ∈ C}} X dP = ∫_C E(X|T) d(PT^{−1}) = ∫_{{T ∈ C}} E(X|T) ∘ T dP

for all C ∈ C. But as C runs over C, {T ∈ C} runs over σ⟨T⟩, and therefore we conclude that for all B ∈ σ⟨T⟩ we have E(X; B) = E( E(X|T) ∘ T; B ). Therefore the averaging property holds and E(X|B_T) = E(X|T) ∘ T a.s.

REMARK 8.2.2 If T : (Ω, A) → (Ω, B) is the identity map, then E(X|B_T) = E(X|T).
Chapter 9

Martingales

9.1 Definition and Examples

DEFINITION 9.1.1 (Partial Order) A partial order ⪯ on a set S is a relation having the properties:

(i) Reflexivity: x ⪯ x for all x ∈ S;
(ii) Antisymmetry: x ⪯ y and y ⪯ x imply x = y;
(iii) Transitivity: if x ⪯ y and y ⪯ z, then x ⪯ z.

A partial order is called a total order or linear order if for all x, y ∈ S, either x ⪯ y or y ⪯ x.

DEFINITION 9.1.2 (Martingale, Submartingale and Supermartingale) Let T be a partially ordered set, and let (F_t)_{t∈T} be an increasing family of sub-σ-fields of the underlying probability space (Ω, A, P), in the sense that t_1 ⪯ t_2 implies F_{t_1} ⊂ F_{t_2}. Here (F_t)_{t∈T} is called a filtration. A family of integrable random variables (X_t)_{t∈T} is said to be adapted to the filtration (F_t)_{t∈T} if X_t is F_t-measurable for all t ∈ T. Given a filtration (F_t)_{t∈T} and a family of integrable random variables adapted to it, we have the following definitions:

(i) If X_s = E(X_t|F_s) a.s. for all s ⪯ t, then (X_t)_{t∈T} is called a martingale;
(ii) If X_s ≤ E(X_t|F_s) a.s. for all s ⪯ t, then (X_t)_{t∈T} is called a submartingale;
(iii) If X_s ≥ E(X_t|F_s) a.s. for all s ⪯ t, then (X_t)_{t∈T} is called a supermartingale.

PROPOSITION 9.1.1 Suppose (X_n)_{n≥1} is a sequence of integrable random variables on an underlying probability space (Ω, F, P). Then (X_n) is a martingale with respect to a filtration (F_n)_{n≥1} if X_n ∈ F_n and E(X_{n+1}|F_n) = X_n a.s. for all n.

EXAMPLE 9.1.1 (Sums of Independent Random Variables) Let (Y_n)_{n≥1} be independent random variables with mean 0. Define X_n = Σ_{j=1}^n Y_j, and take the filtration F_n = σ(X_1, ⋯, X_n) = σ(Y_1, ⋯, Y_n). Then (X_n)_{n≥1} is a martingale with respect to (F_n)_{n≥1}.

Proof. First, all X_n are integrable. Second, each X_n is F_n-measurable because X_n ∈ σ⟨X_1, ⋯, X_n⟩ = F_n. Finally we check

E(X_{n+1}|F_n) = E( Σ_{i=1}^n Y_i + Y_{n+1} | F_n ) = E( Σ_{i=1}^n Y_i | F_n ) + E(Y_{n+1}|F_n) = Σ_{i=1}^n Y_i + EY_{n+1} = X_n,

because Y_{n+1} is independent of F_n and X_n ∈ F_n.
EXAMPLE 9.1.2 (General Sums) Let (F_n)_{n≥0} be a filtration and let ξ_n ∈ F_n for n ≥ 1 with E|ξ_n| < ∞. Let η_n ∈ F_n be such that η_n is bounded by b_n for all n ≥ 0. Then

X_n = Σ_{k=1}^n η_{k−1} ( ξ_k − E(ξ_k|F_{k−1}) )

is a martingale.

Proof. First, the η_n are bounded and the ξ_k are integrable, so X_n is integrable. Second, η_{k−1} ∈ F_{k−1} ⊂ F_n, ξ_1, ⋯, ξ_n ∈ F_n and E(ξ_k|F_{k−1}) ∈ F_{k−1} ⊂ F_n, so X_n ∈ F_n. Finally we check

E(X_{n+1}|F_n) = E( η_n ( ξ_{n+1} − E(ξ_{n+1}|F_n) ) + Σ_{k=1}^n η_{k−1} ( ξ_k − E(ξ_k|F_{k−1}) ) | F_n )
= E( η_n ( ξ_{n+1} − E(ξ_{n+1}|F_n) ) | F_n ) + E( Σ_{k=1}^n η_{k−1} ( ξ_k − E(ξ_k|F_{k−1}) ) | F_n )
= η_n E(ξ_{n+1}|F_n) − η_n E(ξ_{n+1}|F_n) + Σ_{k=1}^n η_{k−1} ( ξ_k − E(ξ_k|F_{k−1}) ) = X_n,

because E( E(ξ_{n+1}|F_n) | F_n ) = E(ξ_{n+1}|F_n), E( E(ξ_k|F_{k−1}) | F_n ) = E(ξ_k|F_{k−1}) for k ≤ n, and the rest follows by linearity together with the F_n-measurability of the sum.

EXAMPLE 9.1.3 (Variance of a Sum) Let (ξ_n)_{n≥1} be i.i.d. with Eξ = 0 and Var(ξ) = σ^2 < ∞. Let

S_n = Σ_{k=1}^n ξ_k,   S_0 = 0,   F_n = σ⟨ξ_1, ⋯, ξ_n⟩.

Define X_n = S_n^2 − nσ^2. Then (X_n)_{n≥1} is a martingale.
Proof. First we check integrability:

E|X_n| = E|S_n^2 − nσ^2| ≤ E( Σ_{k=1}^n ξ_k )^2 + nσ^2 = E( Σ_{k=1}^n ξ_k^2 + 2 Σ_{i<j} ξ_i ξ_j ) + nσ^2 = Σ_{k=1}^n E(ξ_k^2) + 2 Σ_{i<j} Eξ_i Eξ_j + nσ^2 = 2nσ^2 < ∞.

Next we know that S_n ∈ F_n, and therefore X_n ∈ F_n. Finally we check

E(X_{n+1}|F_n) = E( S_{n+1}^2 − (n+1)σ^2 | F_n ) = E( S_{n+1}^2 | F_n ) − (n+1)σ^2 = E( (S_n + ξ_{n+1})^2 | F_n ) − (n+1)σ^2
= S_n^2 + 2S_n E(ξ_{n+1}|F_n) + E(ξ_{n+1}^2|F_n) − (n+1)σ^2 = S_n^2 + σ^2 − (n+1)σ^2 = S_n^2 − nσ^2 = X_n.
EXAMPLE 9.1.4 (Right-Eigenvectors for Markov Chains) Let (Y_n)_{n≥0} be a discrete-time Markov chain with finite state space F = {1, 2, ⋯, k} and transition probability matrix P. Let F_n = σ⟨Y_0, ⋯, Y_n⟩. Let β be a (nonzero) eigenvalue of P and f a corresponding right eigenvector, so that

Σ_{y∈F} P(x, y) f(y) = β f(x)   for all x ∈ F.

Define X_n = β^{−n} f(Y_n) for n ≥ 0. Then (X_n)_{n≥0} is a martingale.

Proof. First, f takes only finitely many values, so X_n is integrable. Second, since Y_n ∈ F_n and f takes only finitely many values, X_n is also F_n-measurable. Finally we check

E(X_{n+1}|F_n) = β^{−(n+1)} E( f(Y_{n+1}) | F_n ) = β^{−(n+1)} Σ_{x=1}^k P(Y_n, x) f(x) = β^{−(n+1)} β f(Y_n) = β^{−n} f(Y_n) = X_n.
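Example 9.1.4 can be checked by simulation: since (X_n) is a martingale, E[X_n] = E[X_0] = f(Y_0) for every n. Below is a sketch in Python (assuming NumPy is available; the 3-state birth-death transition matrix, the starting state and the path counts are arbitrary choices, and a birth-death chain is used only because its eigenvalues are guaranteed to be real).

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.5, 0.5]])
eigvals, eigvecs = np.linalg.eig(P)            # right eigenvectors: P @ f = beta * f
idx = np.argsort(-eigvals.real)[1]             # a non-unit (real) eigenvalue
beta, f = eigvals[idx].real, eigvecs[:, idx].real

rng = np.random.default_rng(0)
paths, steps, y0 = 200_000, 5, 0
Y = np.full(paths, y0)
for n in range(1, steps + 1):
    # advance every path one Markov step by inverting the cumulative row probabilities
    U = rng.random(paths)
    cum = np.cumsum(P[Y], axis=1)
    Y = (U[:, None] < cum).argmax(axis=1)
    # E[beta^{-n} f(Y_n)] should stay near f(Y_0) = X_0 for every n
    print(n, np.mean(f[Y]) / beta ** n, "should be close to", f[y0])
```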
EXAMPLE 9.1.5 (Moment-Generating Function Martingale) Let (ξ_n)_{n≥1} be i.i.d. with distribution function F and moment-generating function φ(λ) = Ee^{λξ_1}, where φ is finite over some non-degenerate interval. Let

S_n = Σ_{k=1}^n ξ_k,   S_0 = 0,   F_n = σ⟨ξ_1, ⋯, ξ_n⟩.

Define X_n = (φ(λ))^{−n} exp(λS_n) for all n ≥ 1 (for a fixed λ at which φ is finite); then (X_n)_{n≥1} is a martingale.

Proof. Compute

E|X_n| = (1/φ(λ)^n) E( exp(λS_n) ) = (1/φ(λ)^n) ∏_{i=1}^n E( exp(λξ_i) ) = 1 < ∞.

And clearly X_n is F_n-measurable because S_n ∈ F_n. Now it suffices to check

E(X_{n+1}|F_n) = (1/φ(λ)^{n+1}) E( exp(λS_{n+1}) | F_n ) = ( e^{λS_n}/φ(λ)^{n+1} ) E( exp(λξ_{n+1}) | F_n ) = ( e^{λS_n}/φ(λ)^{n+1} ) E( exp(λξ_{n+1}) ) = ( e^{λS_n}/φ(λ)^{n+1} ) φ(λ) = X_n.

EXAMPLE 9.1.6 (Doob's Martingale Process) Let Y be an integrable random variable, (F_n)_{n≥1} a filtration, and X_n = E(Y|F_n). Then X_n ∈ F_n, and (X_n)_{n≥1} is a martingale.

Proof. First E|X_n| = E|E(Y|F_n)| ≤ E( E(|Y| |F_n) ) = E|Y| < ∞, implying the integrability of X_n. Second, X_n is clearly F_n-measurable by the definition of conditional expectation. Finally, check

E(X_{n+1}|F_n) = E( E(Y|F_{n+1}) | F_n ) = E(Y|F_n) = X_n

by the nested property of conditional expectation.
REMARK 9.1.1 We will show later that any uniformly integrable martingale must be a
Doob’s martingale process.
9.2
Martingale Convergence Theorems
Two main theorems in this section are the Almost Sure Convergence Theorem for L1 bounded
martingales and L1 Convergence Theorem for uniformly integrable random variables. First we
need to deal with the first theorem, and some preliminaries are needed.
LEMMA 9.2.1 If (Xt ) is a submartingale with respect to a filtration (Ft ), then EXt increases
with t; if (Xt ) is a martingale, then EXt is constant in t.
Proof. It suffices to show the first claim. Let s ≤ t. Then E(Xt |Fs ) ≥ Xs , and therefore
EXt = E(E(Xt |Fs )) ≥ E(Xs ).
DEFINITION 9.2.1 (Upcrossing) Suppose (Xn )n≥1 is a discrete-time stochastic process.
Let −∞ < a < b < ∞. One says that X is attempting an upcrossing of a, b at time j if there is an
index i ≤ j such that Xi ≤ a and Xk < b for i ≤ k ≤ j. X completes an upcrossing at time j if
there exists i ≤ j such that Xi ≤ a, Xk < b for i ≤ k < j and Xj ≥ b. We use Ua,b to denote the
number of completed upcrossings, namely, Ua,b = #{j ≥ 1 : X completes an upcrossing at j}.
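The definition is easy to operationalize: scanning a finite path once, we are "attempting" an upcrossing after the path has dipped to a or below, and we complete one the first time it then reaches b or above. The helper below is a small sketch (not taken from the notes) that counts the completed upcrossings Ua,b (X1 , · · · , Xn ) of a numeric path.

```python
def count_upcrossings(path, a, b):
    """Number of completed upcrossings of [a, b] by the finite path X_1, ..., X_n."""
    assert a < b
    upcrossings = 0
    attempting = False            # True once the path has been <= a and not yet reached b
    for x in path:
        if not attempting:
            if x <= a:
                attempting = True
        else:
            if x >= b:            # upcrossing completed
                upcrossings += 1
                attempting = False
    return upcrossings

# Example: the path dips below 0 and then rises above 1 twice.
print(count_upcrossings([0.5, -0.2, 0.3, 1.2, 0.8, -0.1, 1.5], a=0.0, b=1.0))  # -> 2
```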
LEMMA 9.2.2 (Skipping Theorem) Let (Xk )_{k=1}^{n} be a submartingale with respect to (Fk )_{k=1}^{n}. Let (εm )_{m=2}^{n} be {0, 1}-valued random variables with εm ∈ F_{m−1} for 2 ≤ m ≤ n. Put Dm = Xm − X_{m−1} for 2 ≤ m ≤ n, and X̃n = X1 + Σ_{m=2}^{n} εm Dm . Then EX̃n ≤ EXn .

Proof. Direct computation is enough:

EX̃n = E( X1 + Σ_{m=2}^{n} εm Dm ) = EX1 + Σ_{m=2}^{n} E(εm Dm ) = EX1 + Σ_{m=2}^{n} E(E(εm Dm |F_{m−1}))
= EX1 + Σ_{m=2}^{n} E(εm E(Xm − X_{m−1}|F_{m−1})) = EX1 + Σ_{m=2}^{n} E(εm (E(Xm |F_{m−1}) − X_{m−1}))
≤ EX1 + Σ_{m=2}^{n} E(E(Xm |F_{m−1}) − X_{m−1}) = EXn ,

where the last inequality holds because 0 ≤ εm ≤ 1 and E(Xm |F_{m−1}) − X_{m−1} ≥ 0 a.s., and the final equality is a telescoping sum since E(E(Xm |F_{m−1})) = EXm .
LEMMA 9.2.3 If (Xk )nk=1 is a submartingale with respect to (Fk )nk=1 , then so is ((Xk −a)+ )nk=1
for any finite constant a.
Proof. Without loss of generality we may assume that a = 0 because (Xk −a)k is a submartingale
if and only if (Xk )k is a submartingale. Now if (Xk ) is a submartingale then for j ≤ k:
Xj ≤ E(Xk |Fj ) ≤ E(Xk+ |Fj ) implying that Xj+ ≤ E(Xk+ |Fj )
because E(Xk+ |Fj ) ≥ 0 a.s. by the monotonicity.
LEMMA 9.2.4 (Doob's Upcrossing Lemma) Let (Xk )_{k=1}^{n} be a submartingale with respect to (Fk )_{k=1}^{n}. Let −∞ < a < b < ∞ and Ua,b = Ua,b (X1 , · · · , Xn ) be the number of upcrossings of the interval [a, b] by the process X1 , · · · , Xn . Then

EUa,b ≤ ( E(Xn − a)+ − E(X1 − a)+ )/(b − a) ≤ E(Xn − a)+ /(b − a) ≤ ( EXn+ + a− )/(b − a).

Proof. The last two inequalities are trivial, so it suffices to show the first inequality.

• Case 1: Xm ≥ 0 for all m and a = 0. In this case (Xn − a)+ = Xn and (X1 − a)+ = X1 . Define

Am = ∪_{i=1}^{m−1} ( {Xi ≤ a} ∩ ∩_{j=i}^{m−1} {Xj < b} ),  εm = I_{Am},  X̃n = X1 + Σ_{m=2}^{n} εm (Xm − X_{m−1}).

Then clearly Am ∈ F_{m−1} and therefore εm ∈ F_{m−1}; moreover εm = 1 if and only if X is attempting an upcrossing of a, b at time m − 1. By the Skipping Theorem, EX̃n ≤ EXn . The sum in X̃n collects exactly the increments of X during upcrossing attempts: each completed upcrossing contributes at least b − a, while a terminal incomplete upcrossing, if one exists, starts at a value ≤ a = 0 and ends at Xn ≥ 0, so it contributes at least 0. Therefore X̃n ≥ X1 + (b − a)Ua,b , hence

EXn ≥ EX̃n ≥ EX1 + (b − a)EUa,b , implying EUa,b ≤ (EXn − EX1 )/(b − a).

• Case 2: General case. Let Xm* = (Xm − a)+ . By the previous lemma (Xm*)_{m=1}^{n} is a nonnegative submartingale with respect to (Fm )_{m=1}^{n}, so Case 1 applied to the interval [0, b − a] gives

EU_{0,b−a}(X1*, · · · , Xn*) ≤ ( EXn* − EX1* )/(b − a).

But U_{0,b−a}(X1*, · · · , Xn*) = U_{a,b}(X1 , · · · , Xn ), because Xm* = 0 if and only if Xm ≤ a, and Xm* ≥ b − a if and only if Xm ≥ b. Hence

EUa,b (X1 , · · · , Xn ) ≤ ( E(Xn − a)+ − E(X1 − a)+ )/(b − a).
Now we are in a position to prove the following theorem:
THEOREM 9.2.1 (Almost Sure Convergence Theorem for Submartingales) Let (Xn )n≥1 be a submartingale with respect to (Fn )n≥1 . Suppose

lim_{n→∞} EXn+ < ∞.

Then there exists an integrable random variable X∞ such that Xn → X∞ with probability 1, and E|Xn | stays bounded.

Proof. For −∞ < a < b < ∞ put Ua,b = lim_{n→∞} Ua,b (X1 , · · · , Xn ). Such a limit exists because Ua,b (X1 , · · · , Xn ) increases with n. By the Monotone Convergence Theorem and Doob's upcrossing lemma we obtain

EUa,b = E lim_{n→∞} Ua,b (X1 , · · · , Xn ) = lim_{n→∞} EUa,b (X1 , · · · , Xn )
≤ lim_{n→∞} ( EXn+ + a− )/(b − a) = a−/(b − a) + (1/(b − a)) lim_{n→∞} EXn+ < ∞,

where the finiteness is due to the assumption. Therefore Ua,b = ∞ with probability 0. Now consider the relation

{ω : Xn (ω) does not converge} = {ω : lim inf_{n→∞} Xn (ω) < lim sup_{n→∞} Xn (ω)}
= {ω : there exist a, b ∈ Q such that lim inf_{n→∞} Xn (ω) < a < b < lim sup_{n→∞} Xn (ω)}
= ∪_{a,b∈Q} {ω : lim inf_{n→∞} Xn (ω) ≤ a < b ≤ lim sup_{n→∞} Xn (ω)} = ∪_{a,b∈Q} {ω : Ua,b (ω) = ∞},

because lim inf Xn (ω) ≤ a < b ≤ lim sup Xn (ω) if and only if Xn ≥ b and Xn ≤ a infinitely often, which is equivalent to Ua,b = ∞. But the union in the last equality is a countable union of null sets, and therefore Xn converges to some random variable, say X∞ , a.s.

Now we show that X∞ is integrable. By Fatou's lemma, and using |Xn | = 2Xn+ − Xn ,

E|X∞ | = E lim inf_{n→∞} |Xn | ≤ lim inf_{n→∞} E|Xn | ≤ lim sup_{n→∞} E|Xn | = lim sup_{n→∞} E(2Xn+ − Xn )
≤ lim sup_{n→∞} (2EXn+ − EX1 ) = 2 lim_{n→∞} EXn+ − EX1 < ∞,

where the second-to-last inequality holds because EXn ≥ EX1 by the submartingale property. Therefore X∞ is integrable. We also obtain that the limit superior of E|Xn | is finite, and hence E|Xn | stays bounded.
Since L1-boundedness of a submartingale implies that EXn+ stays bounded, and EXn+ increases with n so that it then converges to a finite limit, we have the following useful form.

THEOREM 9.2.2 Let (Xn )n≥1 be a submartingale with respect to (Fn )n≥1 . Suppose E|Xn | is bounded. Then there exists an integrable random variable X∞ such that Xn → X∞ with probability 1.
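A classical illustration of Theorem 9.2.2: in a Pólya urn the fraction of (say) red balls is a martingale taking values in [0, 1], hence L1-bounded, so it converges almost surely. The following simulation sketch is only an illustration; the initial urn composition and horizon are arbitrary choices, and the fact that the limit is Beta-distributed is not used here.

```python
import numpy as np

rng = np.random.default_rng(2)

def polya_fraction_path(n_draws, red=1, black=1):
    """Path of the bounded martingale X_n = fraction of red balls after n Polya-urn draws."""
    fractions = []
    for _ in range(n_draws):
        if rng.random() < red / (red + black):
            red += 1            # drew red: return it plus one extra red ball
        else:
            black += 1          # drew black: return it plus one extra black ball
        fractions.append(red / (red + black))
    return fractions

path = polya_fraction_path(5000)
# Along a single path the fraction settles down (a.s. convergence),
# even though different runs settle at different limits.
print([round(path[k], 4) for k in (10, 100, 1000, 4999)])
```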
THEOREM 9.2.3 (L1 Convergence Theorem for Submartingales) Let (Xn )n≥1 be a submartingale with respect to (Fn )n≥1 . Then (Xn )n≥1 converges in L1 to some integrable random variable X∞ if and only if (Xn )n≥1 is uniformly integrable.
Proof. Suppose first that (Xn )n≥1 is uniformly integrable, which implies that E|Xn | stays bounded. Therefore by the Almost Sure Convergence Theorem, Xn converges a.s. to an integrable random variable X∞ , and in particular Xn converges to X∞ in probability. Since (Xn )n≥1 is uniformly integrable, Xn converges to X∞ in L1 . The converse is the standard fact that a sequence of integrable random variables converging in L1 is uniformly integrable.
In the next section we will need to deal with martingales over a partially ordered index set rather than the discrete-integer index set, and we will still need a convergence theorem. Therefore a slight generalization is necessary.
DEFINITION 9.2.2 (Topological Net) A partially ordered set T is said to be directed by ≤ if for each pair s, t ∈ T there exists a common upper bound u ∈ T such that s ≤ u and t ≤ u. Now suppose T is directed, (S, d) is a complete metric space, and (xt )t∈T is a collection of points in S indexed by T . One says limt xt = x for some x ∈ S if for all ε > 0 there exists some tε ∈ T such that d(xt , x) ≤ ε for all t ≥ tε . One calls (xt )t∈T a (topological) net.
LEMMA 9.2.5 Let (xt )t∈T be a topological net with xt in a complete metric space (S, d)
and T is directed. Then (xt ) is a Cauchy sequence in the sense that for all ε > 0 there exists
some t ∈ T such that d(xu , xv ) < ε whenever u, v ≥ t, if every sequence (xtn )n≥1 with tn ↑ in n
converges in S.
Proof. We prove it by contradiction. Suppose (xt )t∈T is not Cauchy in the stated sense; then there exists some ε > 0 such that for all t ∈ T there exist u, v ≥ t with d(xu , xv ) ≥ ε. By the triangle inequality,

d(xu , xt ) + d(xt , xv ) ≥ d(xu , xv ) ≥ ε.

This shows that for all t ∈ T there exists some u ≥ t with d(xu , xt ) ≥ ε/2, for otherwise both d(xu , xt ) and d(xv , xt ) would be strictly less than ε/2, contradicting the above inequality. Start with any t1 ∈ T ; then there exists t2 ≥ t1 with d(xt2 , xt1 ) ≥ ε/2, and inductively, given tn ∈ T there exists tn+1 ≥ tn with d(xtn+1 , xtn ) ≥ ε/2. The sequence (xtn )n≥1 is not Cauchy in S, and hence does not converge, contradicting the hypothesis.
LEMMA 9.2.6 Let (xt )t∈T be a topological net with xt in a complete metric space (S, d) and
T is directed. Then (xt ) converges in S if every sequence (xtn )n≥1 with tn ↑ in n converges in S.
Proof. We already know that if every sequence (xtn )n≥1 with tn ↑ in n converges in S then (xt )t∈T is Cauchy, so it suffices to show that a Cauchy net converges.

By the Cauchy property we can extract a Cauchy sequence in S as follows. For n = 1 there exists some t1 such that d(xu , xv ) < 1 whenever u, v ≥ t1 . Inductively, suppose tn is such that d(xu , xv ) < 1/n whenever u, v ≥ tn ; there exists τn+1 such that d(xu , xv ) < 1/(n + 1) whenever u, v ≥ τn+1 , and by directedness we can find tn+1 ≥ τn+1 , tn , so that d(xu , xv ) < 1/(n + 1) whenever u, v ≥ tn+1 . By induction we have constructed a sequence (xtn )n≥1 with t1 ≤ t2 ≤ · · · ≤ tn ≤ tn+1 ≤ · · · and d(xu , xv ) < 1/n whenever u, v ≥ tn . In particular d(xtn , xtq ) < 1/n whenever q ≥ n, so (xtn )n≥1 ⊂ S is Cauchy and converges to a candidate limit x ∈ S by the completeness of S.

Now we show that this candidate x is indeed the limit of the net (xt )t∈T . For all ε > 0 there exists some n such that d(xtn , x) < ε/2 and 1/n < ε/2. Take tε = tn . By the triangle inequality, for all t ≥ tn = tε ,

d(xt , x) ≤ d(xt , xtn ) + d(xtn , x) ≤ 1/n + ε/2 < ε.

Therefore xt converges to x.
THEOREM 9.2.4 (Generalized L1 Convergence Theorem) Let (Xt )t∈T be a uniformly
integrable submartingale with T being directed. Then (Xt ) converges in L1 .
Proof. Take the target space S to be L1 (Ω, F, P ) — the integrable random variables on the underlying probability space, identified when a.s. equal — with the L1 metric d(X, Y ) = E|X − Y |, which is complete. Suppose (Xtn )n≥1 is any sequence from (Xt )t∈T with (tn )n≥1 increasing. Then (Xtn )n≥1 is a uniformly integrable submartingale, so by the L1 Convergence Theorem Xtn converges in L1 , namely there is some integrable random variable X∞ with d(Xtn , X∞ ) → 0 as n → ∞. Thus every increasing sequence (Xtn )n≥1 converges in S, and by the previous lemma the net (Xt )t∈T converges in L1 .
9.3 Radon-Nikodym Theorem
In this section we will prove the well-known Radon-Nikodym Theorem as an application of martingale theory. First we need to establish the martingales involved.
LEMMA 9.3.1 Let P and Q be two probability measures on a measurable space (Ω, A) with P ≪ Q. Let P be the collection of all partitions π of Ω into finitely many A-measurable sets, partially ordered by refinement: π1 ≤ π2 if every cell of π1 is a union of cells of π2 . Define Fπ := σ(π) and

Dπ (ω) := Σ_{A∈π} (P (A)/Q(A)) IA (ω)

for all π ∈ P and ω ∈ Ω, with the convention 0/0 = 0. Then (Dπ )π∈P is a martingale on (Ω, A, Q) with respect to the filtration (Fπ )π∈P .
Proof. First we observe that for any π ∈ P and B ∈ Fπ ,

∫_B Dπ dQ = ∫ Dπ IB dQ = Σ_{A∈π} (P (A)/Q(A)) ∫ I_{A∩B} dQ = Σ_{A∈π} (P (A)/Q(A)) Q(A ∩ B)
= Σ_{A∈π: A⊂B} (P (A)/Q(A)) Q(A) = P (B),

because B ∈ Fπ is a disjoint union of cells of π, so A ∩ B ∈ {A, ∅} for A ∈ π; cells with Q(A) = 0 contribute nothing on either side, by the convention 0/0 = 0 together with P ≪ Q.

It is clear that (Fπ )π∈P is a filtration: if π1 ≤ π2 , then every set in Fπ1 is a finite union of cells of π2 and is therefore in Fπ2 , so Fπ1 ⊂ Fπ2 . Also (Dπ )π∈P is adapted, because Dπ is a linear combination of indicators of cells of π, and the closure theorem ensures the measurability of Dπ with respect to Fπ . We need to show that E(Dπ |Fσ ) = Dσ whenever σ ≤ π. Dσ is Fσ -measurable (hence Fπ -measurable since Fσ ⊂ Fπ ), and integrability holds because Dσ takes finitely many values. It suffices to check the averaging property. For any A ∈ Fσ we also have A ∈ Fπ , so by the observation made at the beginning of the proof

∫_A Dπ dQ = P (A) = ∫_A Dσ dQ.

Therefore the averaging property holds and E(Dπ |Fσ ) = Dσ .
LEMMA 9.3.2 Assume in addition condition (2) of Theorem 9.3.1 below: for all ε > 0 there exists some η > 0 such that P (A) < ε whenever Q(A) < η. Then the martingale (Dπ )π∈P defined in the previous lemma is uniformly integrable with respect to Q.

Proof. We need to show

lim_{c→∞} sup_{π∈P} ∫_{{|Dπ |≥c}} |Dπ | dQ = 0.

Since Dπ ≥ 0 and {Dπ ≥ c} ∈ Fπ , the observation in the previous lemma gives, for all π ∈ P,

∫_{{|Dπ |≥c}} |Dπ | dQ = ∫_{{Dπ ≥c}} Dπ dQ = P {Dπ ≥ c}.

Also, by Markov's inequality, for all π ∈ P,

Q{Dπ ≥ c} ≤ (1/c) ∫ Dπ dQ = (1/c) P (Ω) = 1/c.

Given ε > 0, choose η as in condition (2) and let c > 1/η; then Q{Dπ ≥ c} ≤ 1/c < η for all π ∈ P, and therefore

sup_{π∈P} ∫_{{|Dπ |≥c}} |Dπ | dQ = sup_{π∈P} P {Dπ ≥ c} ≤ ε.

Therefore by the definition of the limit,

lim_{c→∞} sup_{π∈P} ∫_{{|Dπ |≥c}} |Dπ | dQ = 0,

which is the uniform integrability of (Dπ )π∈P .
THEOREM 9.3.1 (Little Radon-Nikodym Theorem) Let P and Q be two probability measures on a measurable space (Ω, A). Then the following are equivalent:

(1) P ≪ Q.

(2) lim_{η→0+} sup_{A: Q(A)<η} P (A) = 0.

(3) There exists X ≥ 0 such that P (A) = ∫_A X dQ for all A ∈ A.

Moreover, the random variable X in (3) is unique up to Q-almost-sure equivalence.
Proof. That (3) implies (2) is already proved in Chapter 3, Section 3.4 (Indefinite Integrals and Densities); similarly, (2) implies (1) is also proved in that section. We first prove that (1) implies (2). Suppose (2) does not hold; then there exists some ε0 > 0 such that for every n there exists some An with Q(An ) < 1/2^n but P (An ) ≥ ε0 . Notice that

Σ_{n=1}^{∞} Q(An ) ≤ Σ_{n=1}^{∞} 1/2^n = 1 < ∞.

Then by the first Borel-Cantelli lemma, Q(lim sup_n An ) = 0. But by the mini-Fatou lemma

P (lim sup_{n→∞} An ) ≥ lim sup_{n→∞} P (An ) ≥ ε0 > 0,

and this contradicts (1). Therefore (2) holds.

The main part of the proof is to show that (2) implies (3). The martingale (Dπ )π∈P defined in the first lemma is uniformly Q-integrable by the previous lemma. Notice that P is a directed set, because any two partitions π1 , π2 have a common upper bound, namely the partition consisting of all intersections of a cell of π1 with a cell of π2 . Then by the Generalized L1 Convergence Theorem for martingales there exists some Q-integrable D such that Dπ converges to D in L1 , that is,

lim_π ∫ |Dπ − D| dQ = 0.

By the modulus inequality ∫_A |Dπ − D| dQ ≥ | ∫_A Dπ dQ − ∫_A D dQ |, so we easily obtain

lim_π ∫_A Dπ dQ = ∫_A D dQ for all A ∈ A.

Now let A ∈ A be arbitrary and put π = {A, A^c} ∈ P. For all σ ≥ π we know that A ∈ Fσ , and the computation in Lemma 9.3.1 gives

P (A) = ∫_A Dπ dQ = ∫_A Dσ dQ.

Hence |P (A) − ∫_A Dσ dQ| = 0 for all σ ≥ π, implying that

P (A) = lim_σ ∫_A Dσ dQ = ∫_A D dQ,

so D serves as the X in (3).
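The construction of Dπ is concrete enough to compute. On (0, 1] with Q = Lebesgue measure and P given by a density, refining a dyadic partition makes Dπ — a piecewise-constant ratio of P-mass to Q-mass per cell — approach dP/dQ. The following is a small sketch under those assumptions; the density 2x and the equal-cell partitions are illustrative choices, not part of the notes.

```python
import numpy as np

# P has density f(x) = 2x on (0, 1] with respect to Q = Lebesgue measure.
def P_of_interval(lo, hi):
    return hi**2 - lo**2            # integral of 2x over (lo, hi]

def D_pi(x, n_cells):
    """D_pi(x) for the partition of (0,1] into n_cells equal cells: P(cell)/Q(cell) on each cell."""
    edges = np.linspace(0.0, 1.0, n_cells + 1)
    k = min(np.searchsorted(edges, x, side="left"), n_cells) - 1
    lo, hi = edges[k], edges[k + 1]
    return P_of_interval(lo, hi) / (hi - lo)

x = 0.3                             # true density dP/dQ there is 2*x = 0.6
for n_cells in (2, 8, 64, 1024):    # finer partitions = "later" terms of the martingale
    print(n_cells, round(float(D_pi(x, n_cells)), 6))
```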
DEFINITION 9.3.1 (Essential Supremum) Let (Ω, A, P ) be a probability space and I be an arbitrary index set.

(a) Let {Ai : i ∈ I} be a family of events. An event A ∈ A is said to be the essential supremum of {Ai : i ∈ I} with respect to P , denoted by A = ess supP {Ai : i ∈ I}, or simply ess supi Ai , if

(i) Ai ⊂ A a.s. in the sense that P (Ai \A) = 0 for all i ∈ I;

(ii) if B ∈ A satisfies P (Ai \B) = 0 for all i ∈ I, then A ⊂ B a.s. in the same sense.

(b) Let {Xi : i ∈ I} be a family of random variables. A random variable X is said to be the essential supremum of {Xi : i ∈ I} with respect to P , denoted by X = ess supP {Xi : i ∈ I}, or simply ess supi Xi , if

(i) Xi ≤ X a.s. for all i ∈ I;

(ii) if a random variable Y satisfies Xi ≤ Y a.s. for all i ∈ I, then X ≤ Y a.s.
PROPOSITION 9.3.1 Let (Ω, A, P ) be a probability space and I be an arbitrary index set.
(a) Let {Ai : i ∈ I} be a family of events. Then ess supi Ai exists and is P -a.s. unique, and
can be taken as a countable union of Ai ’s.
(b) Let {Xi : i ∈ I} be a family of random variables. Then ess supi Xi exists and is P -a.s. unique, and can be taken as a countable supremum of Xi 's.
Proof.
(a) Uniqueness is due to the second condition of the essential supremum: if A, B are two essential suprema, then A ⊂ B and B ⊂ A a.s., namely A = B a.s. It suffices to show existence. Define C = { ∪_{h∈H} Ah : H ⊂ I, H countable }, the collection of all countable unions of the sets Ai , i ∈ I. Put M = sup{P (C) : C ∈ C}. Then there exists a sequence Cn ∈ C such that P (Cn ) → M . Let A = ∪_{n=1}^{∞} Cn . We claim that A serves as the essential supremum of {Ai : i ∈ I}. First notice A ∈ C, because it is a countable union of countable unions of the Ai 's. Next

M ≥ P (A) ≥ P (Cn ) → M as n → ∞,

which gives P (A) = M . Then we check the two conditions.

For all i ∈ I we know Ai ∪ A ∈ C, because it is still a countable union, and this implies P (Ai ∪ A) ≤ M = P (A); therefore P (Ai \A) + P (A) = P (Ai ∪ A) ≤ P (A), hence P (Ai \A) = 0, namely Ai ⊂ A a.s.

Finally, if B is any measurable set with Ai ⊂ B a.s. for all i ∈ I, write A = ∪_{h∈H} Ah for some countable index set H ⊂ I; then

P (A\B) = P ( ∪_{h∈H} (Ah \B) ) ≤ Σ_{h∈H} P (Ah \B) = 0

since P (Ah \B) = 0 for all h ∈ H ⊂ I. The construction also exhibits A as a countable union of sets Ah with h ∈ H ⊂ I, H countable.
(b) If X, Y are two essential suprema for a collection of random variables, then by definition X ≤ Y a.s. and Y ≤ X a.s., and therefore X = Y a.s. This proves uniqueness. We prove existence in two steps.

• Step 1: Suppose Xi ∈ [0, 1] for all i ∈ I. Let D = {sup_{h∈H} Xh : H ⊂ I, H countable}. Then D is a family of [0, 1]-valued random variables, and we can let M = sup{ED : D ∈ D}, so there is a sequence (Dn )n≥1 ⊂ D with EDn → M as n → ∞. Put X = sup_n Dn . Notice that X is a countable supremum of countable suprema, hence also in D, and by monotonicity

M ≥ EX ≥ EDn → M as n → ∞,

so EX = M . We claim that X serves as the essential supremum; we check the definition. First, sup{Xi , X} ∈ D for each i ∈ I, because it is again a countable supremum, and therefore

M = EX ≤ E(sup{Xi , X}) ≤ M ≤ 1,

so EX = E(sup{Xi , X}). Consider the event B = {Xi > X}; then

M = E(sup{Xi , X}) = E(sup{Xi , X}; B) + E(sup{Xi , X}; B^c ) = E(Xi ; B) + E(X; B^c ),
M = EX = E(X; B) + E(X; B^c ).

Therefore E(X; B) = E(Xi ; B); but Xi > X on B, and therefore P (B) = 0, namely Xi ≤ X a.s.

Next, if Xi ≤ Y a.s. for all i ∈ I for some random variable Y , then writing X = sup_{h∈H} Xh with H countable, for each h ∈ H there exists a set Nh with P (Nh ) = 0 such that Xh ≤ Y off Nh . Take N = ∪_{h∈H} Nh ; countable subadditivity gives P (N ) = 0, and off N we have Xh ≤ Y for every h ∈ H. Therefore X = sup_{h∈H} Xh ≤ Y off N , i.e. X ≤ Y a.s.

• Step 2: General {Xi : i ∈ I}. Consider Yi = π^{−1} arctan(Xi ) + 1/2, so Yi ∈ [0, 1] for all i and Step 1 applies: the family (Yi )i∈I has an essential supremum, say Y , with Yi ≤ Y a.s. for all i, and if Z ≥ Yi a.s. for all i then Z ≥ Y a.s. We know Y ∈ [0, 1] because 1 is an upper bound for all Yi . Now let X = tan(π(Y − 1/2)). We claim X serves as ess supi Xi . First, X ≥ Xi a.s. for all i, because Y ≥ Yi a.s. and the transformation is strictly increasing. Next, suppose W ≥ Xi a.s. for all i; then monotonicity gives π^{−1} arctan(W ) + 1/2 ≥ Yi a.s. for all i, so the essential supremum property of Y gives π^{−1} arctan(W ) + 1/2 ≥ Y a.s., and hence W ≥ X a.s.
EXAMPLE 9.3.1 Let (Ω, A, P ) = ((0, 1], B, λ) and T ⊂ (0, 1]. For t ∈ T , set At = {t}. Then
∅ serves as ess supt At with respect to P = λ.
DEFINITION 9.3.2 (Signed Measure) Let (Ω, A) be a measurable space. A set function
µ : A → R is said to be a signed measure, if there exists two measures µ1 , µ2 on (Ω, A) such
that µ = µ1 − µ2 , where at least one of µ1 , µ2 is finite.
THEOREM 9.3.2 (Monotone Sequential Continuity) Let (Ω, A) be a measurable space and µ a signed measure.

(1) If (An )n≥1 is an increasing sequence of measurable sets, then µ( ∪_{n=1}^{∞} An ) = lim_{n→∞} µ(An ).

(2) If (An )n≥1 is a decreasing sequence of measurable sets with µ(AN ) finite for some N , then µ( ∩_{n=1}^{∞} An ) = lim_{n→∞} µ(An ).
Proof. First we prove (1). Write µ = µ1 − µ2 with at least one of µ1 , µ2 finite. By the monotone sequential continuity from below for measures,

µ( ∪_{n=1}^{∞} An ) = µ1 ( ∪_{n=1}^{∞} An ) − µ2 ( ∪_{n=1}^{∞} An ) = lim_{n→∞} µ1 (An ) − lim_{n→∞} µ2 (An )
= lim_{n→∞} (µ1 (An ) − µ2 (An )) = lim_{n→∞} µ(An ).

Second, µ(AN ) being finite implies that µ1 (AN ) and µ2 (AN ) are finite. By monotone sequential continuity from above for measures,

µ( ∩_{n=1}^{∞} An ) = µ1 ( ∩_{n=1}^{∞} An ) − µ2 ( ∩_{n=1}^{∞} An ) = lim_{n→∞} µ1 (An ) − lim_{n→∞} µ2 (An )
= lim_{n→∞} (µ1 (An ) − µ2 (An )) = lim_{n→∞} µ(An ).
DEFINITION 9.3.3 Let (Ω, A) be a measurable space with µ a signed measure. A set A+ ∈ A is said to be a positive set if for all measurable A ⊂ A+ we have µ(A) ≥ 0; a set A− ∈ A is said to be a negative set if for all measurable A ⊂ A− we have µ(A) ≤ 0. If A+ + A− = Ω, then (A+ , A− ) is called a Hahn-Decomposition of µ.
THEOREM 9.3.3 (Hahn-Decomposition Theorem) Let (Ω, A) be a measurable space
with µ being a signed measure. Then there exists positive set A+ and negative set A− such that
Ω = A+ +A− , namely, Hahn-Decomposition exists. Moreover, Hahn-Decomposition is essentially
unique in the sense that if (B + , B − ) is also a Hahn-Decomposition, then µ(A± ∆B ± ) = 0.
Proof. Without loss of generality we may assume that µ does not take the value +∞ (otherwise work with −µ). Define α = sup{µ(A) : A ∈ A} ≥ µ(∅) = 0 (finiteness of α will follow below). There exists a sequence of measurable sets (An ) such that µ(An ) → α.

• Step 1: Define for each n

Bn = { ∩_{k=1}^{n} A′k : A′k ∈ {Ak , Ak^c }, k = 1, 2, · · · , n }.

Then Bn is a partition of Ω into measurable cells. We show that Bn+1 is a finer partition than Bn , in the sense that each cell Bni ∈ Bn is a finite disjoint union of cells of Bn+1 . In fact, if Bni ∈ Bn then Bni = ∩_{k=1}^{n} A′k for some choice A′k ∈ {Ak , Ak^c }, and

Bni = ( A_{n+1} ∩ ∩_{k=1}^{n} A′k ) + ( A_{n+1}^c ∩ ∩_{k=1}^{n} A′k ),

where the two terms on the right-hand side are cells of Bn+1 . This shows that the partitions become finer as n increases.

• Step 2: Define for each n

Cn = Σ_{Bni ∈Bn : µ(Bni )>0} Bni (a disjoint union).

We show that µ(An ) ≤ µ(Cn ) ≤ µ( ∪_{k=n}^{∞} Ck ). The first inequality holds because An is itself a disjoint union of cells of Bn (fix A′n = An and take the union over all possible A′k 's for k = 1, · · · , n − 1), so µ(An ) is a sum of cell measures which is at most the sum over all positive-measure cells, namely µ(Cn ). Next, for each m ≥ n the sets Cn , · · · , Cm+1 are disjoint unions of cells of Bm+1 (the partitions refine). Therefore ∪_{k=n}^{m+1} Ck − ∪_{k=n}^{m} Ck is a disjoint union of cells of Bm+1 , each of positive measure because they are subsets of Cm+1 (the union may be empty). Therefore µ( ∪_{k=n}^{m+1} Ck − ∪_{k=n}^{m} Ck ) ≥ 0 and hence

µ( ∪_{k=n}^{m+1} Ck ) = µ( ∪_{k=n}^{m} Ck ) + µ( ∪_{k=n}^{m+1} Ck − ∪_{k=n}^{m} Ck ) ≥ µ( ∪_{k=n}^{m} Ck ),

where finite additivity is inherited from the finite additivity of measures. By induction one obtains µ(Cn ) ≤ µ( ∪_{k=n}^{m} Ck ) for all m ≥ n. The sequence ( ∪_{k=n}^{m} Ck )_{m≥n} increases with m, and hence by Monotone Sequential Continuity from below

µ(Cn ) ≤ lim_{m→∞} µ( ∪_{k=n}^{m} Ck ) = µ( ∪_{k=n}^{∞} Ck ).

• Step 3: Define A+ = lim sup_{n→∞} Cn = ∩_{n≥1} ∪_{k=n}^{∞} Ck . We show that A+ serves as a positive set. The sets ∪_{k=n}^{∞} Ck decrease in n and have finite µ-measure for large n (their measure is ≥ µ(Cn ) > −∞ by Step 2, and < +∞ since µ never takes +∞), so by Monotone Sequential Continuity from above

µ(A+ ) = lim_{n→∞} µ( ∪_{k=n}^{∞} Ck ) ≥ lim sup_{n→∞} µ(Cn ) ≥ lim_{n→∞} µ(An ) = α,

and the reverse inequality µ(A+ ) ≤ α holds by the definition of α as the supremum; hence µ(A+ ) = α (in particular α is finite). Now if A ⊂ A+ with µ(A) < 0, then µ(A+ − A) = µ(A+ ) − µ(A) > µ(A+ ) = α, contradicting α being the supremum; so A+ is a positive set. Let A− = (A+ )^c ; if A ⊂ A− with µ(A) > 0, then µ(A+ + A) = µ(A+ ) + µ(A) > α, also a contradiction, so A− is a negative set.

• Step 4: We show that (A+ , A− ) is essentially unique. If (B + , B − ) is also a Hahn-Decomposition, then µ(B + ∩ A− ) = 0, because B + ∩ A− is a subset of both a positive set and a negative set and hence has both nonnegative and nonpositive measure. Similarly µ(B − ∩ A+ ) = 0. Therefore

µ(A+ \B + ) = µ(A+ ∩ B − ) = µ(B − \A− ) = 0 and µ(A− \B − ) = µ(A− ∩ B + ) = µ(B + \A+ ) = 0,

so µ(A± ∆B ± ) = 0.
DEFINITION 9.3.4 (Mutually Singular Measures) Let (Ω, A) be a measurable space. Two measures µ1 , µ2 are said to be mutually singular, denoted by µ1 ⊥ µ2 , if there exists a set A such that µ1 (A) = µ2 (A^c ) = 0.
DEFINITION 9.3.5 (Jordan-Decomposition) Let (Ω, A) be a measurable space with µ
being a signed measure. Two measures (µ+ , µ− ) are called a Jordan-Decomposition of µ if
µ = µ+ − µ− with µ+ ⊥ µ− .
THEOREM 9.3.4 (Jordan-Decomposition Theorem) Let (Ω, A) be a measurable space with µ a signed measure. There exists a unique Jordan-Decomposition µ = µ+ − µ− .

Proof. First we prove existence. Let (A+ , A− ) be a Hahn-Decomposition of µ and define µ+ (A) = µ(A ∩ A+ ), µ− (A) = −µ(A ∩ A− ). We check that µ± are measures: countable additivity is inherited from the countable additivity of the signed measure µ (which in turn is inherited from its two defining measures, since subtraction preserves countable additivity when at least one of the series converges); both set functions are nonnegative by the definition of the Hahn-Decomposition; and both assign measure 0 to the empty set. Next we show that this is indeed a Jordan-Decomposition. Clearly µ(A) = µ(A ∩ A+ ) + µ(A ∩ A− ) = µ+ (A) − µ− (A). Moreover µ+ (A− ) = µ(A− ∩ A+ ) = µ(∅) = 0 and µ− (A+ ) = −µ(A+ ∩ A− ) = −µ(∅) = 0, so the two measures are mutually singular.

Next we prove uniqueness. Suppose µ = µ1 − µ2 is a Jordan-Decomposition, so µ1 ⊥ µ2 ; then there exists B with µ1 (B) = µ2 (B^c ) = 0. Let B + = B^c and B − = B. For all A ∈ A,

µ1 (A) = µ1 (A ∩ B + ) + µ1 (A ∩ B − ) = µ1 (A ∩ B + ),  µ2 (A) = µ2 (A ∩ B + ) + µ2 (A ∩ B − ) = µ2 (A ∩ B − ),

because µ1 (A ∩ B − ) ≤ µ1 (B − ) = 0 and µ2 (A ∩ B + ) ≤ µ2 (B + ) = 0; in particular µ ≥ 0 on subsets of B + and µ ≤ 0 on subsets of B − , so Ω = B + + B − is a Hahn-Decomposition. If A ⊂ B + then

µ(A) = µ1 (A) − µ2 (A) = µ1 (A) and µ(A) = µ(A ∩ B + ),

so µ1 (A) = µ(A ∩ B + ) for A ⊂ B + . For general A, µ1 (A) = µ1 (A ∩ B + ) = µ((A ∩ B + ) ∩ B + ) = µ(A ∩ B + ). Similarly µ2 (A) = −µ(A ∩ B − ) for all A ∈ A. Since the Hahn-Decomposition is essentially unique (every measurable subset of A± ∆B ± is contained in both a positive and a negative set and is therefore µ-null), we conclude µ1 = µ+ and µ2 = µ− . This proves uniqueness.
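On a finite set every signed measure is just a vector of point masses, and the Hahn and Jordan decompositions can be read off directly: collect the points of nonnegative mass into A+ and the rest into A− . A minimal sketch under that finite-space assumption (the particular masses are arbitrary):

```python
# A signed measure on the finite space {0,1,2,3,4}, given by its point masses.
mu = {0: 1.5, 1: -0.4, 2: 0.0, 3: 2.1, 4: -3.0}

# Hahn decomposition: a positive set and its complement, a negative set.
A_plus = {x for x, m in mu.items() if m >= 0}
A_minus = set(mu) - A_plus

# Jordan decomposition: mu+ carries the mass on A+, mu- the negated mass on A-; mu = mu+ - mu-.
mu_plus = {x: (m if x in A_plus else 0.0) for x, m in mu.items()}
mu_minus = {x: (-m if x in A_minus else 0.0) for x, m in mu.items()}

assert all(abs(mu[x] - (mu_plus[x] - mu_minus[x])) < 1e-12 for x in mu)
print("A+ =", sorted(A_plus), " A- =", sorted(A_minus))
print("total variation |mu|(Omega) =", sum(mu_plus.values()) + sum(mu_minus.values()))
```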
The following theorem is one of the most famous results in measure theory.
THEOREM 9.3.5 (Radon-Nikodym Theorem) Let µ be a signed measure on (Ω, A) and ν a σ-finite measure on (Ω, A). Then µ ≪ ν if and only if there exists a ν-quasi-integrable A-measurable function X such that µ(A) = ∫_A X dν for all A ∈ A. X is called the Radon-Nikodym derivative and is denoted by dµ/dν.
Proof. The proof consists of the following steps; we start with the necessity part and end with the sufficiency part.

• Step 1: Assume that µ and ν are finite measures. If µ(Ω) = 0 take X = 0. Otherwise µ/µ(Ω) and ν/ν(Ω) are probability measures, and the absolute continuity still holds. Hence by the Little Radon-Nikodym theorem there exists some measurable D ≥ 0 with

µ(A)/µ(Ω) = ∫_A D d(ν/ν(Ω)) = (1/ν(Ω)) ∫_A D dν.

Then we can let X = D µ(Ω)/ν(Ω), and therefore µ(A) = ∫_A X dν.
• Step 2: Assume that µ is a σ-finite measure and ν is a finite measure. Then there exists a partition Ω = Σ_n Ωn with µ(Ωn ) < ∞ for all n. For fixed n, consider the measure space (Ωn , An , µn ), where An is the trace of A on Ωn and µn is the restriction of µ to An ; let νn be the restriction of ν to An . Clearly An ⊂ A because each Ωn is A-measurable, and µn ≪ νn is inherited from µ ≪ ν. By Step 1 there exists some Xn ≥ 0 on (Ωn , An ) such that µn (An ) = ∫_{An } Xn dνn for all An ∈ An . Now take X = Σ_{n=1}^{∞} Xn I_{Ωn }; each Xn is defined on Ωn and the Ωn 's are disjoint, so X is well-defined. We show that X serves as the Radon-Nikodym derivative. For all A ∈ A, define An = A ∩ Ωn ; then An ∈ An and

µ(A) = µ( Σ_{n=1}^{∞} An ) = Σ_{n=1}^{∞} µn (An ) = Σ_{n=1}^{∞} ∫_{An } Xn dνn = Σ_{n=1}^{∞} ∫_A Xn I_{Ωn } dν = ∫_A Σ_{n=1}^{∞} Xn I_{Ωn } dν = ∫_A X dν,

where we have used the Monotone Convergence Theorem since Xn I_{Ωn } ≥ 0 for all n.
• Step 3: Assume that µ is a measure, not necessarily σ-finite, and ν is a finite measure. Put C = {A ∈ A : µ(A) < ∞} and F = ess supν {C : C ∈ C}. Then µ is σ-finite over F , because F can be taken as a countable union of sets in C, and such sets have finite µ-measure; also F is A-measurable. Therefore by Step 2, applied on (F, A_F ) where A_F ⊂ A is the trace of A on F , there exists a Radon-Nikodym derivative W ≥ 0 (extended by 0 off F ) such that

µ(F ∩ A) = ∫_A W I_F dν for all A ∈ A.

Now consider any measurable set B ⊂ F ^c . If ν(B) = 0 then µ(B) = 0 by the assumed absolute continuity. If ν(B) > 0, we claim that µ(B) = ∞: if not, then B ∈ C, and therefore B ⊂ F ν-a.s. by the definition of the essential supremum, giving

0 < ν(B) = ν(B ∩ F ^c ) = ν(B\F ) = 0,

a contradiction. Hence µ(B) = ∫_B ∞ dν for every measurable B ⊂ F ^c (when ν(B) > 0 both sides are ∞, and when ν(B) = 0 both sides are 0), so

µ(A ∩ F ^c ) = ∫_{A∩F ^c} ∞ dν = ∫_A ∞ I_{F ^c } dν.

Now take X = W I_F + ∞ I_{F ^c }. Then for all A ∈ A,

µ(A) = µ(A ∩ F ) + µ(A ∩ F ^c ) = ∫_A W I_F dν + ∫_A ∞ I_{F ^c } dν = ∫_A X dν.
• Step 4: Assume that ν is a σ-finite measure and µ is a measure. Then there exists a partition Ω = Σ_n Ωn with ν(Ωn ) < ∞ for all n. For fixed n, consider the measure space (Ωn , An , νn ), where An is the trace of A on Ωn and νn is the restriction of ν to An ; let µn be the restriction of µ to An . Again An ⊂ A and µn ≪ νn is inherited from µ ≪ ν. Since νn is finite on (Ωn , An ) and, by Step 3, we do not care whether µn is σ-finite, there exists some Xn ≥ 0 on (Ωn , An ) such that µn (An ) = ∫_{An } Xn dνn for all An ∈ An . Take X = Σ_{n=1}^{∞} Xn I_{Ωn }, which is well-defined since the Ωn 's are disjoint. Exactly as in Step 2, for all A ∈ A with An = A ∩ Ωn ,

µ(A) = Σ_{n=1}^{∞} µn (An ) = Σ_{n=1}^{∞} ∫_{An } Xn dνn = ∫_A Σ_{n=1}^{∞} Xn I_{Ωn } dν = ∫_A X dν,

where we have used the Monotone Convergence Theorem since Xn I_{Ωn } ≥ 0 for all n.
• Step 5: Assume that ν is a σ-finite measure and µ is a signed measure with µ ≪ ν. By the Jordan-Decomposition theorem there exist two measures µ± with µ = µ+ − µ− , and by the Hahn-Decomposition there are sets A+ , A− with µ+ (A− ) = µ− (A+ ) = 0, which gives µ+ (A) = µ+ (A ∩ A+ ) and µ− (A) = µ− (A ∩ A− ). Both µ+ and µ− inherit absolute continuity with respect to ν. Then by Step 4 there exist functions X± ≥ 0 such that µ± (A) = ∫_A X± dν for all A ∈ A. Now write

µ(A) = µ+ (A ∩ A+ ) − µ− (A ∩ A− ) = ∫_{A∩A+ } X+ dν − ∫_{A∩A− } X− dν
= ∫_A X+ I_{A+ } dν − ∫_A X− I_{A− } dν = ∫_A ( X+ I_{A+ } − X− I_{A− } ) dν.

Notice that X+ I_{A+ } − X− I_{A− } is well-defined because A+ and A− are disjoint. Let X = X+ I_{A+ } − X− I_{A− }; then X serves as a Radon-Nikodym derivative (and is ν-quasi-integrable since at least one of µ± is finite).

• Step 6: Sufficiency. Suppose µ(A) = ∫_A X dν for all A ∈ A and ν(A) = 0; then µ(A) = ∫_A X dν = 0. Therefore µ ≪ ν.
THEOREM 9.3.6 Let µ be a signed measure on (Ω, A) and ν a σ-finite measure on (Ω, A). Suppose µ is absolutely continuous with respect to ν and X = dµ/dν. Then µ is a measure if and only if X ≥ 0 ν-a.e.; µ is a finite signed measure if and only if X is ν-integrable; µ is σ-finite if and only if |X| < ∞ a.e. with respect to ν.

Proof. First, X ≥ 0 a.e. implies µ(A) = ∫_A X dν ≥ 0 for all A. Conversely, if µ(A) ≥ 0 for all A, let A = {X < 0}; then µ(A) = ∫_{{X<0}} X dν ≥ 0, but X is strictly negative on this event, and hence ν{X < 0} = 0.

Next, µ is a finite signed measure if and only if both µ+ (Ω) and µ− (Ω) are finite, which, by the decomposition X = X+ I_{A+ } − X− I_{A− } from the previous proof, is equivalent to ∫ |X| dν < ∞, i.e. to X being ν-integrable.

Finally, suppose |X| < ∞ ν-a.e. Since ν is σ-finite, write Ω = Σ_m Em with ν(Em ) < ∞, and let Ωn = {n − 1 ≤ |X| < n} for n ≥ 1. The sets Ωn ∩ Em cover Ω up to a ν-null (hence µ-null) set, because if ω lies in no Ωn then |X(ω)| = ∞, and ∫_{Ωn ∩Em } |X| dν ≤ n ν(Em ) < ∞, so |µ(Ωn ∩ Em )| < ∞. This shows the σ-finiteness of µ. Conversely, if µ is σ-finite, there exists a countable partition Ω = Σ_{n=1}^{∞} Ωn with |µ(Ωn )| < ∞; since at least one of µ± is a finite measure, this forces ∫_{Ωn } |X| dν < ∞, so X is integrable on Ωn and therefore ν({|X| = ∞} ∩ Ωn ) = 0. Hence

ν({|X| = ∞}) = Σ_{n=1}^{∞} ν({|X| = ∞} ∩ Ωn ) = 0.
As a corollary of the Radon-Nikodym theorem, let us prove the existence and uniqueness of the conditional expectation:

THEOREM 9.3.7 Let (Ω, A, P ) be a probability space and X a quasi-integrable random variable. If B ⊂ A is a sub-σ-field, then E(X|B) exists and is almost surely unique.

Proof. Since X is quasi-integrable, define µ(A) = ∫_A X dP for A ∈ A; this is a well-defined signed measure by quasi-integrability, and clearly µ ≪ P . Restricting both µ and P to the measurable space (Ω, B), we still have µ ≪ P , and therefore there exists a B-measurable function dµ/dP such that for all B ∈ B

µ(B) = ∫_B (dµ/dP ) dP = ∫_B X dP.

Such a Radon-Nikodym derivative is quasi-integrable because X is quasi-integrable (take B = Ω). Therefore dµ/dP serves as E(X|B), and its almost sure uniqueness is the uniqueness part of the Radon-Nikodym theorem.
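On a finite probability space the Radon-Nikodym description of E(X|B) reduces to averaging X over the cells of the partition generating B: dµ/dP is constant on each cell and equals µ(cell)/P (cell). A sketch under that finite assumption (the outcomes, probabilities and sub-σ-field below are made up for illustration):

```python
# Finite probability space: outcomes 0..5 with probabilities p, a random variable X,
# and B generated by the partition {0,1}, {2,3,4}, {5}.
p = [0.1, 0.2, 0.15, 0.25, 0.1, 0.2]
X = [3.0, -1.0, 4.0, 0.5, 2.0, 7.0]
partition = [[0, 1], [2, 3, 4], [5]]

# mu(A) = int_A X dP; the B-measurable density dmu/dP is mu(cell)/P(cell) on each cell.
cond_exp = [0.0] * len(p)
for cell in partition:
    P_cell = sum(p[i] for i in cell)
    mu_cell = sum(X[i] * p[i] for i in cell)
    for i in cell:
        cond_exp[i] = mu_cell / P_cell

# Averaging property: E(E(X|B); A) = E(X; A) for every cell A generating B.
for cell in partition:
    lhs = sum(cond_exp[i] * p[i] for i in cell)
    rhs = sum(X[i] * p[i] for i in cell)
    assert abs(lhs - rhs) < 1e-12
print("E(X|B) =", [round(v, 4) for v in cond_exp])
```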
9.4 Applications of Martingale Convergence Theorem
LEMMA 9.4.1 Suppose (cn )n≥1 ⊂ R and lim_{n→∞} e^{i cn t} exists for all t in a set of strictly positive Lebesgue measure. Then (cn )n≥1 converges to a finite limit.

Proof. The proof consists of two steps. First define A = { t : lim_{n→∞} e^{i cn t} exists }. Then there must be some finite interval [a, b] such that λ(A ∩ [a, b]) > 0.
• Step 1: Show that (cn ) cannot have a subsequence converging to ±∞. Without loss of generality, suppose (c_{n_k}) is a subsequence of (cn ) with c_{n_k} → +∞. Define

g(t) = lim_{k→∞} e^{i t c_{n_k}} for t ∈ A, and g(t) = 1 for t ∉ A,

so |g(t)| = 1 for all t. The function I_{A∩[a,b]}(t) g̅(t) is bounded and integrable, so by the Riemann-Lebesgue lemma

lim_{k→∞} ∫_{[a,b]} I_A (t) g̅(t) e^{i c_{n_k} t} λ(dt) = 0.

On the other hand, by the Dominated Convergence Theorem,

lim_{k→∞} ∫_{[a,b]} I_A (t) g̅(t) e^{i c_{n_k} t} λ(dt) = ∫_{[a,b]∩A} g̅(t) g(t) λ(dt) = ∫_{[a,b]∩A} |g(t)|^2 λ(dt) = λ([a, b] ∩ A) > 0,

a contradiction. Hence no subsequence of (cn )n≥1 converges to +∞; the same argument applies as c_{n_k} → −∞, so no subsequence converges to −∞. In particular (cn )n≥1 is bounded.

• Step 2: Show that any two convergent subsequences of (cn )n≥1 converge to the same limit. If not, there exist subsequences (c′n )n≥1 , (c″n )n≥1 with c′n → c′, c″n → c″ and c′ ≠ c″. Since lim_n e^{i cn t} exists for t ∈ A, both subsequential limits agree with it, so

e^{i t c′} = e^{i t c″} for all t ∈ A.

Therefore t(c′ − c″) ∈ 2πZ for all t ∈ A, i.e.

A ⊂ { t : t = 2nπ/(c′ − c″), n ∈ Z },

a countable set of Lebesgue measure 0, contradicting λ(A) > 0. Hence all convergent subsequences of (cn )n≥1 have the same limit. Since (cn ) is bounded by Step 1, taking subsequences converging to lim inf_n cn and to lim sup_n cn shows that these are equal, and hence cn converges to a finite limit.
THEOREM 9.4.1 Suppose (Xn )n≥1 is a sequence of independent random variables on some underlying probability space (Ω, F, P ). Let Sn = Σ_{j=1}^{n} Xj . Then Sn converges a.s. if and only if it converges in distribution.
Proof. Since convergence a.s. implies convergence in distribution, it suffices to show that for independent sums convergence in distribution implies convergence a.s.

Characteristic functions are continuous and take value 1 at the origin. Denote by fj the characteristic function of Xj , by ϕ the characteristic function of the distributional limit of Sn , and by ϕn the characteristic function of Sn . Then there exists t0 > 0 such that |ϕ(t)| > 1/2 on T := [−t0 , t0 ]. Fix t ∈ T . By Lévy's continuity theorem ϕn (t) → ϕ(t), so for sufficiently large n, say n ≥ n0 (which may depend on t), we have |ϕn (t)| > 1/4; in particular ϕn (t) ≠ 0 for n ≥ n0 . For fixed t ∈ T and n ≥ n0 define

Zn (t) := exp(itSn )/ϕn (t).

Now we show:

• (Zn )n≥n0 is a uniformly bounded (complex-valued) martingale for each fixed t. Boundedness is immediate:

|Zn | = |e^{itSn }|/|ϕn (t)| ≤ 1/(1/4) = 4

for all n ≥ n0 and ω ∈ Ω, uniformly in the choice of t ∈ T as well; therefore Zn is bounded and in L1 . Take the filtration Fn = σ⟨X1 , · · · , Xn ⟩ for n ≥ n0 ; adaptedness is clear. Since ϕ_{n+1}(t) = ϕn (t) f_{n+1}(t) by independence, we compute

E(Zn+1 |Fn ) = E( e^{itSn } e^{itX_{n+1}}/ϕ_{n+1}(t) | Fn ) = ( e^{itSn }/ϕ_{n+1}(t) ) E(e^{itX_{n+1}}|Fn )
= ( e^{itSn }/ϕ_{n+1}(t) ) f_{n+1}(t) = e^{itSn }/ϕn (t) = Zn .

Hence (Zn )n≥n0 is a uniformly bounded martingale; applying the martingale convergence theorem to its real and imaginary parts, there exists Z ∈ L1 such that Zn → Z a.s.

• There exists a P -null set N such that, for every ω ∈ N ^c , exp(itSn (ω)) converges as n → ∞ for a.e. t ∈ T . Define Ωt = {ω : Zn (ω) converges}; we just proved that P (Ωt ) = 1 for every t ∈ T . Since Zn → Z a.s. and ϕn (t) → ϕ(t) ≠ 0, the sequence e^{itSn (ω)} = Zn (ω) ϕn (t) converges for all ω ∈ Ωt . Now define the product space

(T × Ω, B ⊗ F, Q × P ),  Q = λ/λ(T ).

Then exp(itSn (ω)) is a measurable function of (t, ω). Define

C := { (t, ω) ∈ T × Ω : lim_{n→∞} exp(itSn (ω)) exists }.

We claim:

– Claim 1: (Q × P )(C) = 1. Indeed, by Fubini's theorem,

(Q × P )(C) = ∫_{t∈T } P (Ct ) Q(dt) ≥ ∫_{t∈T } P (Ωt ) Q(dt) = ∫_{t∈T } dQ = 1,

where Ct denotes the t-section of C.

– Claim 2: There exists a P -null set N such that Q(Cω ) = 1 for all ω ∈ N ^c , where Cω is the ω-section of C. This is because

1 = (Q × P )(C) = ∫_{ω∈Ω} Q(Cω ) P (dω) ≤ ∫_Ω P (dω) = 1,

so Q(Cω ) = 1 for P -a.e. ω.

Therefore there exists a P -null set N such that for every ω ∈ N ^c the sequence exp(itSn (ω)) converges for Q-a.e. t ∈ T , hence for t in a set of strictly positive Lebesgue measure since Q(T ) = 1. By the previous lemma, Sn (ω) converges to a finite limit for every ω ∈ N ^c . Since N is a P -null set, Sn converges a.s.
LEMMA 9.4.2 Suppose (Ω, F, P ) is a probability space and (Fn )0≤n<∞ is a filtration in F. Let Y be integrable. Then Xn := E(Y |Fn ) is a martingale with respect to (Fn )0≤n<∞ — the Doob's process of Section 9.1. Let Yn = Xn and Gn = Fn for 0 ≤ n < ∞, and let Y∞ := Y and G∞ := F. Then the following hold:

• (Yn )0≤n≤∞ is a martingale and (|Yn |)0≤n≤∞ is a submartingale, in each instance with respect to the filtration (Gn )0≤n≤∞ .

• (Xn )0≤n<∞ is uniformly integrable.

• There exists X∞ ∈ L1 (Ω, F, P ) such that Xn → X∞ both a.s. and in L1 as n → ∞.

Proof. First we show that (Yn )0≤n≤∞ is a martingale. For finite indices this is the Doob's process property, and for the index ∞ we have E(Y∞ |Gn ) = E(Y |Fn ) = Xn = Yn . For the submartingale claim, the modulus inequality for conditional expectations gives, for 0 ≤ n < ∞,

|Yn | = |E(Yn+1 |Gn )| ≤ E(|Yn+1 | | Gn ) and |Yn | = |E(Y∞ |Gn )| ≤ E(|Y∞ | | Gn ),

so (|Yn |)0≤n≤∞ is a submartingale.

If (Xn )n≥0 is uniformly integrable, then the L1 convergence theorem for martingales yields a limit random variable X∞ such that Xn → X∞ both a.s. and in L1 ; so it suffices to show the uniform integrability. First, by Markov's inequality,

P (|Xn | ≥ c) ≤ E|Xn |/c ≤ E|Y |/c,

since E|Xn | = E|E(Y |Fn )| ≤ E(E(|Y | | Fn )) = E|Y |. Hence

sup_n P (|Xn | ≥ c) ≤ E|Y |/c, implying lim_{c→∞} sup_n P (|Xn | ≥ c) = 0.

Now define µ(A) = E(|Y |; A); then µ ≪ P , and by the equivalence of (1) and (2) in the Little Radon-Nikodym theorem,

lim_{η→0} sup{ E(|Y |; A) : A ∈ F, P (A) < η } = 0.

So for all ε > 0 there exists some η > 0 such that E(|Y |; A) < ε whenever P (A) < η, and then there exists some c = c(ε) such that P (|Xn | ≥ c) < η for all n, which implies E(|Y |; |Xn | ≥ c) < ε for all n. Therefore

sup_{n≥0} E(|Y |; |Xn | ≥ c) ≤ ε, and hence lim_{c→∞} sup_{n≥0} E(|Y |; |Xn | ≥ c) = 0.

Observe that {|Xn | ≥ c} ∈ Fn , so by the definition of conditional expectation and the submartingale property of (|Yn |),

E(|Xn |; |Xn | ≥ c) ≤ E( E(|Y | | Fn ); |Xn | ≥ c ) = E(|Y |; |Xn | ≥ c).

Therefore

lim_{c→∞} sup_{n≥0} E(|Xn |; |Xn | ≥ c) ≤ lim_{c→∞} sup_{n≥0} E(|Y |; |Xn | ≥ c) = 0,

hence (Xn )n≥0 is uniformly integrable.
REMARK 9.4.1 This lemma shows that a Doob’s process is uniformly integrable.
LEMMA 9.4.3 Let (Ω, F, P ) be a probability space and (Fn )0≤n<∞ is a filtration in F,
where Xn := E(Y |Fn ) is a Doob’s process for some integrable Y with respect to (Fn )0≤n<∞ .
S
Let F∞ = σ( 0≤n<∞ Fn ). Suppose X∞ is the almost sure limit of the Doob’s process (Xn )n≥0
by the previous lemma, then X∞ = E(Y |F∞ ) and Xn = E(X∞ |Fn ) for 0 ≤ n < ∞.
Proof. First we show that X∞ = E(Y |F∞ ). Each Xn is F∞ -measurable for n < ∞, and therefore X∞ ∈ F∞ by the closure theorem, being an almost sure limit; also X∞ ∈ L1 because Xn converges to X∞ in L1 . It suffices to show that E(X∞ ; A) = E(Y ; A) for all A ∈ F∞ . For any A ∈ A := ∪_{0≤n<∞} Fn there exists some m with A ∈ Fm . Since Xn → X∞ in L1 ,

|E(Xn ; A) − E(X∞ ; A)| ≤ E(|Xn − X∞ |; A) ≤ E|Xn − X∞ | → 0 as n → ∞,

so E(Xn ; A) → E(X∞ ; A). Therefore

E(X∞ ; A) = lim_{n→∞} E(Xn ; A) = lim_{n→∞} E( E(Y |Fn ); A ) = E(Y ; A),

because for n ≥ m we have A ∈ Fn , and for such n, E(E(Y |Fn ); A) = E(Y ; A) by the averaging property of conditional expectation. Hence the two indefinite integrals E(Y ; ·) and E(X∞ ; ·) agree on the π-system A, which generates σ(A) = F∞ ; both are finite indefinite integrals since Y and X∞ are integrable, so they agree on all of F∞ .

Now we show that Xn = E(X∞ |Fn ). Since E(Y |F∞ ) = X∞ , the nested property gives

Xn = E(Y |Fn ) = E( E(Y |F∞ )|Fn ) = E(X∞ |Fn ).
Here is the main theorem concerning general uniformly integrable martingale and Doob’s
process:
THEOREM 9.4.2 Let (Xn )0≤n<∞ be a uniformly integrable martingale with respect to a
filtration (Fn )0≤n<∞ on a probability space (Ω, F, P ). Then there exists X∞ ∈ L1 (Ω, F, P ) such
that Xn → X∞ both a.s. and in L1 as n → ∞; Moreover Xn = E(X∞ |Fn ) for 0 ≤ n < ∞.
Namely, a martingale is uniformly integrable if and only if it is a Doob’s process.
Proof. The candidate limit X∞ is provided by the L1 convergence theorem for martingales, and X∞ ∈ L1 . We show that E(X∞ |Fn ) = Xn . Clearly Xn ∈ Fn and Xn is integrable. For all A ∈ Fn ,

E(X∞ ; A) = lim_{m→∞} E(Xm ; A) = lim_{m→∞} E( E(Xm |Fn ); A ) = lim_{m→∞} E(Xn ; A) = E(Xn ; A),

because the first equality follows from L1 convergence, for m ≥ n we have A ∈ Fn so E(Xm ; A) = E(E(Xm |Fn ); A), and the martingale property gives E(Xm |Fn ) = Xn for m ≥ n. Therefore Xn = E(X∞ |Fn ) for all n.
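Theorem 9.4.2 is easy to visualize when Y depends on finitely many fair coin flips: Xn = E(Y |Fn ) can be computed exactly by averaging Y over all completions of the first n flips, the resulting martingale is bounded (hence uniformly integrable), and it converges to Y . A self-contained sketch — the particular functional Y and N = 10 flips are illustrative choices:

```python
from itertools import product
import random

random.seed(3)
N = 10
flips = [random.randint(0, 1) for _ in range(N)]        # one realized omega

def Y(bits):
    """An arbitrary integrable functional of the N flips: squared number of heads."""
    return sum(bits) ** 2

def doob_X(n):
    """X_n = E(Y | F_n): average Y over all completions of the first n observed flips."""
    prefix = flips[:n]
    completions = list(product([0, 1], repeat=N - n))
    return sum(Y(prefix + list(tail)) for tail in completions) / len(completions)

path = [doob_X(n) for n in range(N + 1)]
print("X_0 (= EY) :", path[0])
print("path       :", [round(x, 2) for x in path])
print("Y(omega)   :", Y(flips), " = X_N:", path[N])
```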
LEMMA 9.4.4 Let (Ω, F, P ) be a probability space and (Ft )t∈T an arbitrary collection of sub-σ-fields of F. If Z ∈ L1 , then the family Zt := E(Z|Ft ), t ∈ T , is uniformly integrable.

Proof. Since |E(Z|Ft )| ≤ E(|Z| | Ft ), we may assume without loss of generality that Z ≥ 0. It suffices to show that

sup_{t∈T } E(Zt ; Zt > A) = sup_{t∈T } E( E(Z|Ft ); E(Z|Ft ) > A ) = sup_{t∈T } E( Z; E(Z|Ft ) > A ) → 0

as A → ∞, where the second equality holds because {E(Z|Ft ) > A} ∈ Ft . Notice that by Markov's inequality

sup_{t∈T } P ( E(Z|Ft ) > A ) ≤ sup_{t∈T } E(E(Z|Ft ))/A = EZ/A,

which goes to 0 as A → ∞ uniformly in t ∈ T . Moreover E(Z; ·) ≪ P (·) as measures, so for every ε > 0 there exists some η = η(ε) > 0 such that E(Z; B) ≤ ε whenever P (B) ≤ η, and then there exists some A = A(η(ε)) such that P (E(Z|Ft ) > A) ≤ η for all t ∈ T ; by the absolute continuity this yields E(Z; E(Z|Ft ) > A) ≤ ε for all t ∈ T , namely

sup_{t∈T } E( Z; E(Z|Ft ) > A ) ≤ ε.

This gives the uniform integrability of (Zt )t∈T .
THEOREM 9.4.3 (Backward Martingale Convergence Theorem) Let (Xn )n≤0 be a
martingale with respect to (Fn )n≤0 . Then there exists an integrable random variable X−∞ such
that Xn converges to X−∞ both in L1 and a.s. as n → −∞.
Proof. Fix −∞ < a < b < ∞ and let Un = Ua,b (X−n , · · · , X0 ). Since (X−n , X−n+1 , · · · , X0 ) forms a martingale and hence also a submartingale, Doob's upcrossing inequality implies

EUn ≤ E(X0 − a)+/(b − a).

Put Ua,b = lim_{n→∞} Un ; the limit exists because Ua,b (X−n , · · · , X0 ) increases with n. By the Monotone Convergence Theorem,

EUa,b = E lim_{n→∞} Un = lim_{n→∞} EUn ≤ E(X0 − a)+/(b − a) < ∞,

where the finiteness is due to E(X0 − a)+ < ∞. Therefore Ua,b = ∞ with probability 0. Now consider the relation

{ω : X−n (ω) does not converge} = {ω : lim inf_{n→∞} X−n (ω) < lim sup_{n→∞} X−n (ω)}
= {ω : there exist a, b ∈ Q such that lim inf_{n→∞} X−n (ω) < a < b < lim sup_{n→∞} X−n (ω)}
= ∪_{a,b∈Q} {ω : lim inf_{n→∞} X−n (ω) ≤ a < b ≤ lim sup_{n→∞} X−n (ω)} = ∪_{a,b∈Q} {ω : Ua,b (ω) = ∞},

because lim inf X−n (ω) ≤ a < b ≤ lim sup X−n (ω) if and only if X−n ≥ b and X−n ≤ a infinitely often, which is equivalent to Ua,b = ∞. The union in the last equality is a countable union of null sets, and therefore X−n converges a.s. to some random variable, say X−∞ . To show that the convergence also holds in L1 , it suffices to show that (X−n )n≥0 is uniformly integrable; but X−n = E(X0 |F−n ) by the martingale property, so this is exactly the result of the previous lemma. Finally, the almost sure limit of a uniformly integrable sequence of random variables is integrable.
As an application of the backward martingale convergence theorem, let us prove:

THEOREM 9.4.4 (L1 Law of Large Numbers) Let (ξn )n≥1 be a sequence of i.i.d. random variables with E|ξ1 | < ∞. Let Sn = Σ_{k=1}^{n} ξk . Then Sn /n converges to µ := Eξ1 in L1 .
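The L1 statement can be probed numerically by estimating E|Sn /n − µ| for increasing n with Monte Carlo; the estimate should shrink toward 0. The following sketch is only an illustration (exponential steps and sample sizes are arbitrary choices), not a substitute for the proof.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = 2.0                                    # mean of an Exponential with scale 2
reps = 1000                                 # Monte Carlo replications per n

for n in (10, 100, 1000, 10000):
    errs = np.empty(reps)
    for r in range(reps):
        sample_mean = rng.exponential(scale=mu, size=n).mean()   # S_n / n
        errs[r] = abs(sample_mean - mu)
    print(f"n={n:6d}  E|S_n/n - mu| ~ {errs.mean():.4f}")        # decreases with n
```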
THEOREM 9.4.5 Let Z ∈ L1 (Ω, F, P ), where (Ω, F, P ) is a probability space, let (Fm )m≤0 be a filtration, and let F−∞ = ∩_{m≤0} Fm . Define Zn = E(Z|Fn ) for n ≤ 0. Then E(Z|Fn ) → E(Z|F−∞ ) a.s. and in L1 as n → −∞.

Proof. First, (Zn )n≤0 is a martingale, because by the nested property

E(Zn+1 |Fn ) = E( E(Z|Fn+1 )|Fn ) = E(Z|Fn ) = Zn

for n ≤ −1. By the backward martingale convergence theorem there exists some Z−∞ such that Z−n → Z−∞ a.s. and in L1 as n → ∞. We show that Z−∞ = E(Z|F−∞ ). Integrability of Z−∞ is clear from the L1 convergence. Next we check measurability: for each k,

Z−∞ = lim_{n→∞} Z−n = lim_{n→∞, n≥k} Z−n ,

which shows that Z−∞ is F−k -measurable for every k; hence for any Borel set B ⊂ R we have Z−∞^{−1}(B) ∈ F−k for all k, and therefore Z−∞^{−1}(B) ∈ F−∞ . This gives the F−∞ -measurability. Finally we check the averaging property. For all A ∈ F−∞ ,

|E(Z−n ; A) − E(Z−∞ ; A)| ≤ E(|Z−n − Z−∞ |; A) ≤ E|Z−n − Z−∞ | → 0,

so convergence in L1 implies convergence of E(·; A). Therefore

E(Z−∞ ; A) = lim_{n→∞} E(Z−n ; A) = lim_{n→∞} E( E(Z|F−n ); A ) = E(Z; A) = E( E(Z|F−∞ ); A ),

where the third equality is the averaging property of E(Z|F−n ) applied to A ∈ F−∞ ⊂ F−n , and the last equality is the averaging property of E(Z|F−∞ ). This means that Z−∞ serves as E(Z|F−∞ ).
COROLLARY 9.4.6 Let (Xn )n≤0 be a martingale with respect to (Fn )n≤0 . Then X−n → E(X0 |F−∞ ) a.s. and in L1 as n → ∞, where F−∞ = ∩_{m=0}^{∞} F−m .

Proof. By the martingale property, Xn = E(X0 |Fn ) for n ≤ 0. The previous theorem then gives X−n → E(X0 |F−∞ ) a.s. and in L1 as n → ∞.
9.5 Optional Stopping Theorems
DEFINITION 9.5.1 (Stopping Time) Let (Ω, F, P ) be a probability space with a filtration
(Fn )0≤n≤∞ . An F-random variable T : Ω → N ∪ {∞} is said to be a stopping time if for all n
we have {T = n} ∈ Fn .
REMARK 9.5.1 Equivalent statement for {T = n} ∈ Fn can be {T ≤ n} ∈ Fn or {T >
n} ∈ Fn for all n.
EXAMPLE 9.5.1
1) Deterministic times are stopping times.
2) A finite sum of stopping times is a stopping time. In fact, if S, T are two stopping times then

{S + T = n} = Σ_{k=0}^{n} {S = k} ∩ {T = n − k};

since {S = k} ∈ Fk ⊂ Fn and {T = n − k} ∈ F_{n−k} ⊂ Fn , we get {S + T = n} ∈ Fn .
3) If S, T are two stopping times, then min(S, T ) is a stopping time. This is because

{min(S, T ) ≤ n} = {S ≤ n} ∪ {T ≤ n} ∈ Fn .

4) If S, T are two stopping times, then max(S, T ) is a stopping time. This is because

{max(S, T ) ≤ n} = {S ≤ n, T ≤ n} = {S ≤ n} ∩ {T ≤ n} ∈ Fn .
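First hitting times are the canonical stopping times: whether {T = n} has occurred is determined by the first n increments alone. The small sketch below (names, seed and horizon are illustrative choices) builds two hitting times for a simple random walk and exhibits the min, max and sum from the example on a simulated path.

```python
import random

random.seed(5)
steps = [random.choice([-1, 1]) for _ in range(200)]    # increments of a simple random walk

def hitting_time(level, increments):
    """First n with S_n >= level: a stopping time, decided from the first n increments only."""
    s = 0
    for n, x in enumerate(increments, start=1):
        s += x
        if s >= level:
            return n
    return float("inf")                                  # never hit within the horizon

S_time = hitting_time(3, steps)
T_time = hitting_time(5, steps)
print("S =", S_time, " T =", T_time)
print("min(S,T) =", min(S_time, T_time),
      " max(S,T) =", max(S_time, T_time),
      " S+T =", S_time + T_time)
```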
LEMMA 9.5.1 Let (Xn )n≥0 be a submartingale (resp. martingale) and T a stopping time. Then for all n ≥ k we have

E(Xn ; T = k) ≥ E(Xk ; T = k) for a submartingale,  E(Xn ; T = k) = E(Xk ; T = k) for a martingale.

Proof. Suppose n ≥ k. Since {T = k} ∈ Fk , if (Xn )n≥0 is a submartingale then

E(Xn ; T = k) = E( E(Xn I_{{T =k}}|Fk ) ) = E( I_{{T =k}} E(Xn |Fk ) ) = E( E(Xn |Fk ); T = k ) ≥ E(Xk ; T = k).

If (Xn )n≥0 is a martingale, the same computation gives equality.
LEMMA 9.5.2 Let (Xn )n≥0 be a submartingale (resp. martingale) and T a stopping time. Then for all n ≥ 1,

EX0 ≤ E(X_{min(T,n)}) ≤ EXn for a submartingale,  EX0 = E(X_{min(T,n)}) = EXn for a martingale.
Proof. Write

E(X_{min(T,n)}) = E(X_{min(T,n)}; T ≥ n) + Σ_{k=0}^{n−1} E(X_{min(T,n)}; T = k) = E(Xn ; T ≥ n) + Σ_{k=0}^{n−1} E(Xk ; T = k).

If (Xn )n≥0 is a martingale, then by the previous lemma

E(X_{min(T,n)}) = E(Xn ; T ≥ n) + Σ_{k=0}^{n−1} E(Xk ; T = k) = E(Xn ; T ≥ n) + Σ_{k=0}^{n−1} E(Xn ; T = k) = EXn ,

and since a martingale has constant mean, EXn = EX0 . It remains to treat the submartingale case. First, still by the previous lemma,

E(X_{min(T,n)}) = E(Xn ; T ≥ n) + Σ_{k=0}^{n−1} E(Xk ; T = k) ≤ E(Xn ; T ≥ n) + Σ_{k=0}^{n−1} E(Xn ; T = k) = EXn .

Now we show that EX0 ≤ E(X_{min(T,n)}). Recall that X̃n := Σ_{k=1}^{n} (Xk − E(Xk |F_{k−1})), with X̃0 = 0, is a martingale. Then by the martingale case just proved,

0 = EX̃0 = E(X̃_{min(T,n)}) = E( Σ_{k=1}^{min(T,n)} (Xk − E(Xk |F_{k−1})) )
≤ E( Σ_{k=1}^{min(T,n)} (Xk − X_{k−1}) ) = E(X_{min(T,n)} − X0 ) = E(X_{min(T,n)}) − EX0 ,

because E(Xk |F_{k−1}) ≥ X_{k−1} a.s. for a submartingale and the second sum telescopes. This implies EX0 ≤ E(X_{min(T,n)}).
LEMMA 9.5.3 If (Xn )n≥1 is a sequence of random variables dominated by an integrable random variable, then (Xn )n≥1 is uniformly integrable.

Proof. Suppose |Xn | ≤ Y ∈ L1 a.s. Then for all c > 0,

sup_{n≥1} E(|Xn |; |Xn | > c) ≤ sup_{n≥1} E(Y ; |Xn | > c) ≤ sup_{n≥1} E(Y ; Y > c) = E(Y ; Y > c).

As c → ∞, E(Y ; Y > c) → E(Y ; Y = ∞) = 0, and therefore

lim_{c→∞} sup_{n≥1} E(|Xn |; |Xn | > c) = 0.
THEOREM 9.5.1 (Optional Stopping Theorem 1 — Domination Version) Let (Xn )n≥0 be a martingale and T a stopping time. If T < ∞ a.s. and

E( sup_{n≥0} |X_{min(T,n)}| ) < ∞,

then EXT = EX0 .

Proof. Since T < ∞ a.s., XT is a.s. defined and min(T, n) → T a.s. as n → ∞, so X_{min(T,n)} converges to XT a.s. By the second lemma of this section, EX0 = E(X_{min(T,n)}) for every n, and by the Dominated Convergence Theorem

EX0 = lim_{n→∞} E(X_{min(T,n)}) = E(XT ).
COROLLARY 9.5.2 Let (Xn )n≥0 be a martingale and T a stopping time. If ET < ∞ and there exists some finite M such that E(|Xn − X_{n−1}| | F_{n−1}) ≤ M a.s. on the event {T ≥ n} for all n ≥ 1, then EXT = EX0 .

Proof. Define Z0 = |X0 | and Zk = |Xk − X_{k−1}| for k ≥ 1, and set W = Σ_{k=0}^{T} Zk . Then

|X_{min(T,n)}| = | X0 + Σ_{k=1}^{min(T,n)} (Xk − X_{k−1}) | ≤ |X0 | + Σ_{k=1}^{min(T,n)} |Xk − X_{k−1}| = Σ_{k=0}^{min(T,n)} Zk ≤ Σ_{k=0}^{T} Zk = W.

We know T < ∞ a.s. because ET < ∞, so by the Optional Stopping Theorem 1 it suffices to show that W ∈ L1 . Since all terms are nonnegative, the interchanges of sums below are justified:

EW = Σ_{n=0}^{∞} E(W ; T = n) = Σ_{n=0}^{∞} E( Σ_{k=0}^{n} Zk ; T = n ) = Σ_{n=0}^{∞} Σ_{k=0}^{n} E(Zk ; T = n)
= Σ_{k=0}^{∞} Σ_{n=k}^{∞} E(Zk ; T = n) = Σ_{k=0}^{∞} E(Zk ; T ≥ k) = E|X0 | + Σ_{k=1}^{∞} E(Zk ; T ≥ k).

For k ≥ 1, notice that {T ≥ k} = ({T ≤ k − 1})^c ∈ F_{k−1}, therefore

E(Zk ; T ≥ k) = E( E(Zk I_{{T ≥k}}|F_{k−1}) ) = E( I_{{T ≥k}} E(Zk |F_{k−1}) ) = E( E(|Xk − X_{k−1}| | F_{k−1}); T ≥ k ) ≤ M P (T ≥ k).

Hence

EW ≤ E|X0 | + M Σ_{k=1}^{∞} P (T ≥ k) = E|X0 | + M · ET < ∞,

where the last equality uses the tail-sum formula for the expectation of a nonnegative integer-valued random variable (Theorem 5.4.2), and E|X0 | < ∞ since X0 is integrable.
THEOREM 9.5.3 (Optional Stopping Theorem 2) Let (Xn )n≥0 be a martingale and T
be a stopping time. If T < ∞ a.s., E|XT | < ∞ and E(Xn ; T > n) → 0 as n → ∞, then
EXT = EX0 .
Proof. Compute

EXT = E(XT ; T > n) + E(XT ; T ≤ n) = E(XT ; T > n) + E(X_{min(T,n)}; T ≤ n)
= E(XT ; T > n) + E(X_{min(T,n)}) − E(X_{min(T,n)}; T > n)
= E(XT ; T > n) + EX0 − E(Xn ; T > n),

where E(X_{min(T,n)}) = EX0 by one of the lemmas in this section, and X_{min(T,n)} = Xn on {T > n}. Since XT ∈ L1 and T < ∞ a.s., E(XT ; T > n) → 0 as n → ∞, and E(Xn ; T > n) → 0 is one of the assumptions. Letting n → ∞ gives EXT = EX0 .
COROLLARY 9.5.4 Let (Xn )n≥0 be a martingale and T a stopping time. If T < ∞ a.s. and there exists an M < ∞ such that E(X_{min(T,n)}^2 ) ≤ M for all n, then EXT = EX0 .

Proof. One way to prove this is to show that L2-boundedness implies uniform integrability and hence suffices for L1-domination; here we instead verify the three conditions of Optional Stopping Theorem 2. First, T < ∞ a.s. is assumed. Next we show that E|XT | < ∞. Since E(X_{min(T,n)}^2 ) ≤ M for all n,

E(XT^2 ; T ≤ n) = E(X_{min(T,n)}^2 ; T ≤ n) ≤ E(X_{min(T,n)}^2 ) ≤ M

for all n, and I_{{T ≤n}} increases to 1 a.s. because P (T = ∞) = 0. Therefore, by the Monotone Convergence Theorem and Jensen's inequality,

(E|XT |)^2 ≤ E(XT^2 ) = E lim_{n→∞} XT^2 I_{{T ≤n}} = lim_{n→∞} E(XT^2 ; T ≤ n) ≤ M,

hence E|XT | ≤ √M < ∞. Finally we show that E(Xn ; T > n) → 0 as n → ∞. By the Cauchy-Schwarz inequality,

|E(Xn ; T > n)| = |E(X_{min(T,n)}; T > n)| ≤ ( EX_{min(T,n)}^2 )^{1/2} ( E I_{{T >n}} )^{1/2} ≤ √M √(P (T > n)) → 0,

because {T > n} ↓ {T = ∞} and P (T = ∞) = 0, by Monotone Sequential Continuity from above. Therefore Optional Stopping Theorem 2 applies and EXT = EX0 .
Here is an application of the optional stopping theorems:
THEOREM 9.5.5 (Wald's Equation) Let (ξn )n≥1 be a sequence of i.i.d. random variables with mean µ and variance σ2 < ∞. Let Sn = Σ_{k=1}^{n} ξk and Xn := Sn − nµ. If T is a stopping time for the natural filtration Fn = σ⟨ξ1 , · · · , ξn ⟩ with ET < ∞, then E|XT | < ∞ and EXT = EST − µET = 0; in particular EST = µET .

Proof. Clearly (Xn )n≥1 is a martingale. To apply an optional stopping theorem — say the second one — we need T < ∞ a.s., E|XT | < ∞ and E(Xn ; T > n) → 0 as n → ∞. Here T < ∞ a.s. is an immediate consequence of ET < ∞. To show that E|XT | < ∞, put Yk = ξk − µ and compute (the interchanges of sums are justified because all terms are nonnegative)

E|XT | ≤ E( Σ_{k=1}^{T} |Yk | ) = Σ_{n=1}^{∞} E( Σ_{k=1}^{n} |Yk |; T = n ) = Σ_{n=1}^{∞} Σ_{k=1}^{n} E(|Yk |; T = n)
= Σ_{k=1}^{∞} Σ_{n=k}^{∞} E(|Yk |; T = n) = Σ_{k=1}^{∞} E(|Yk |; T ≥ k) = Σ_{k=1}^{∞} E|Yk | P (T ≥ k),

where the last equality holds because σ⟨Yk ⟩ is independent of σ⟨ξ1 , · · · , ξk−1 ⟩ = F_{k−1} and {T ≥ k} = ({T ≤ k − 1})^c ∈ F_{k−1}. Further computation shows

E|XT | ≤ Σ_{k=1}^{∞} E|Yk | P (T ≥ k) = E|ξ1 − µ| Σ_{k=1}^{∞} P (T ≥ k) ≤ E|ξ1 − µ| · ET < ∞.

Finally we show that E(Xn ; T > n) → 0 as n → ∞. By the Cauchy-Schwarz inequality,

E(Xn ; T > n)^2 ≤ EXn^2 · P (T > n) = Var(Sn ) P (T > n) = nσ2 P (T > n) ≤ σ2 E(T ; T > n),

using n P (T > n) ≤ E(T ; T > n). Since ET < ∞, E(T ; T > n) → 0 as n → ∞. Therefore E(Xn ; T > n) → 0, Optional Stopping Theorem 2 applies to the martingale (Xn )n≥1 , and EXT = EX0 = 0. But

EXT = E(ST − T µ) = EST − µET ,

which proves the conclusion.
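Wald's equation EST = µET is easy to test by simulation for a concrete integrable stopping time, e.g. the first exit time of the walk from an interval, capped at a finite horizon so every path stops. A Monte Carlo sketch with made-up parameters (drift, barrier, horizon and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
mu = 0.2                       # step mean
n_paths, horizon = 20000, 400

ST, T = np.empty(n_paths), np.empty(n_paths)
for p in range(n_paths):
    s, t = 0.0, 0
    while t < horizon:
        s += rng.normal(mu, 1.0)
        t += 1
        if abs(s) >= 5.0:      # stop when the walk leaves (-5, 5), or at the horizon
            break
    ST[p], T[p] = s, t

print("E S_T    ~", ST.mean())
print("mu * E T ~", mu * T.mean())   # Wald: the two should agree up to Monte Carlo error
```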
EXAMPLE 9.5.2 (Fundamental Identity of Wald) Let (ξn )n≥1 be a sequence of i.i.d. random variables with P (ξ1 = 0) < 1, and suppose that 1 ≤ ϕ(θ) := Ee^{θξ1 } < ∞ for a given fixed θ ∈ R, where ϕ is the moment-generating function. For n ≥ 1 set Sn := Σ_{k=1}^{n} ξk and Fn := σ⟨ξ1 , · · · , ξn ⟩; let S0 = 0 and F0 be the trivial σ-field. Finally, fix values a > 0 and b > 0 and set

T := inf{n : Sn ≤ −a or Sn ≥ b}.

(a) Show that T has at worst geometrically thick tails and that ET < ∞.

(b) Show that Xn := ϕ(θ)^{−n} e^{θSn }, n ≥ 0, defines a martingale with respect to (Fn )n≥0 . Using an optional stopping result, prove the fundamental identity of Wald:

E( ϕ(θ)^{−T } e^{θST } ) = 1.

(c) For this part, suppose further that θ ≠ 0 and ϕ(θ) = 1. For simplicity, suppose also that P {ST ≤ −a} > 0 and P {ST ≥ b} > 0. With

Ea := E(e^{θST } | ST ≤ −a),  Eb := E(e^{θST } | ST ≥ b),

show that

P (ST ≥ b) = (1 − Ea )/(Eb − Ea ).
Proof.
(a) First we know that P (|ξ1 | = 0) < 1, then P (|ξ1 | > 0) > 0, and hence
!
∞ [
1
1
P (|ξ1 | > 0) = P
|ξ1 | >
= lim P |ξ1 | >
r→∞
r
r
r=1
Therefore there exists some r such that P (|ξ1 | > 1/r) ≥ η > 0 for some η > 0. This yields that
max(P (ξ1 > 1/r), P (ξ1 < −1/r)) ≥
η
2
Let c = a + b. Pick N such that N ≥ cr. This means that
max(P (ξ1 ≥ c/N ), P (ξ1 ≤ −c/N )) ≥ max(P (ξ1 > 1/r), P (ξ1 < −1/r)) ≥
η
2
We know that
{|SN | ≥ c} =
( N
X
i=1
)
ξi ≥ c
∪
( N
X
)
ξi ≤ −c
i=1
⊃
( N
X
i=1
277
) ( N
)
X
ξi ≥ c ,
ξi ≤ −c
i=1
And notice that
( N
X
)
ξi ≥ c
i=1
⊃
N
\
{ξi ≥ c/N }
( N
X
i=1
)
ξi ≤ −c
⊃
i=1
N
\
{ξi ≤ −c/N }
i=1
Therefore by taking δ = (η/2)N we can compute
P (|SN | ≥ c) ≥ max P
N
\
!
N
\
{ξi ≥ c/N } , P
i=1
!!
{ξi ≤ −c/N }
i=1
!
N
N
N
Y
Y
Y
η η N
= max
=
(ξi ≥ c/N ),
=δ
P (ξi ≤ −c/N ) ≥
2
2
i=1
i=1
i=1
Next define S'_j = S_{jN} − S_{(j−1)N} for j = 1, 2, · · · , k. Then T > kN implies that |S'_j| < c for all j = 1, 2, · · · , k (draw a picture: both S_{(j−1)N} and S_{jN} then lie in (−a, b), so their difference is less than a + b = c in absolute value). Also the S'_j are independent. Therefore
P(T > kN) ≤ P\left( \bigcap_{j=1}^{k} \{|S'_j| < c\} \right) = \prod_{j=1}^{k} P(|S'_j| < c) = P(|S'_1| < c)^k ≤ (1 − δ)^k
because S'_1 = S_N and P(|S'_1| < c) = 1 − P(|S_N| ≥ c) ≤ 1 − δ. Notice that for each integer n ≥ 0 there exist unique non-negative integers k and r with 0 ≤ r < N such that n = kN + r. Therefore
P(T > n) ≤ P(T > kN) ≤ (1 − δ)^k = (1 − δ)^{⌊n/N⌋}
This is the precise mathematical interpretation of the statement: T has at worst geometrically
thick tails. Next by the second lemma in section 5.4, we have
ET ≤ 1 + \sum_{n=0}^{∞} P(T > n) ≤ 1 + \sum_{n=0}^{∞} (1 − δ)^{⌊n/N⌋} = 1 + N \sum_{k=0}^{∞} (1 − δ)^k = 1 + \frac{N}{δ} < ∞
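The geometric tail bound can also be seen empirically. The Python sketch below uses hypothetical choices (symmetric ±1 steps, a = b = 5) and estimates P(T > n) by Monte Carlo; the estimates decay roughly geometrically in n, as the bound P(T > n) ≤ (1 − δ)^{⌊n/N⌋} predicts.

import numpy as np

rng = np.random.default_rng(1)
a, b, n_paths, horizon = 5, 5, 50_000, 200             # hypothetical choices

steps = rng.choice((-1, 1), size=(n_paths, horizon))    # xi_k = +-1, each with probability 1/2
S = np.cumsum(steps, axis=1)                            # S_1, ..., S_horizon along each path
exited = (S <= -a) | (S >= b)
# T = first n at which the walk leaves (-a, b); set to horizon + 1 if it stays inside
T = np.where(exited.any(axis=1), exited.argmax(axis=1) + 1, horizon + 1)

for n in (25, 50, 75, 100):
    print(n, (T > n).mean())                            # empirical P(T > n)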
(b) The process (X_n)_{n≥0} is the usual moment-generating-function martingale: since ξ_n is independent of F_{n−1},
E(X_n | F_{n−1}) = ϕ(θ)^{−n} e^{θS_{n−1}} E e^{θξ_n} = ϕ(θ)^{−(n−1)} e^{θS_{n−1}} = X_{n−1}
We first show that T is a stopping time. Since T = n if and only if S_n ≤ −a or S_n ≥ b while S_j ∈ (−a, b) for j < n, we know that {T = n} ∈ σ⟨S_1, · · · , S_{n−1}, S_n⟩ = F_n. This means that T is a stopping time. Also T < ∞ a.s. because ET < ∞. Then in order to apply the corollary of optional stopping theorem 1, it suffices to show that
E(|X_n − X_{n−1}| \,|\, F_{n−1}) I_{\{T ≥ n\}} ≤ M
for some constant M and all n. Compute
X_n − X_{n−1} = \frac{e^{θS_n}}{ϕ^n(θ)} − \frac{e^{θS_{n−1}}}{ϕ^{n−1}(θ)} = \frac{X_{n−1}}{ϕ(θ)} (e^{θξ_n} − ϕ(θ))
Consider sample points ω with T(ω) ≥ n. Then S_j(ω) ∈ (−a, b) for j ≤ n − 1, so |S_{n−1}(ω)| ≤ c = a + b, and since ϕ(θ) ≥ 1 this gives X_{n−1}(ω) ≤ exp(|θ|c). Because X_{n−1} is F_{n−1}-measurable and nonnegative, {T ≥ n} ∈ F_{n−1}, and ξ_n is independent of F_{n−1},
E(|X_n − X_{n−1}| \,|\, F_{n−1}) I_{\{T ≥ n\}} = \frac{X_{n−1}}{ϕ(θ)} E|e^{θξ_n} − ϕ(θ)| \, I_{\{T ≥ n\}} ≤ \frac{X_{n−1}}{ϕ(θ)} \cdot 2ϕ(θ) \, I_{\{T ≥ n\}} ≤ 2 \exp(|θ|c)
where we used E|e^{θξ_n} − ϕ(θ)| ≤ Ee^{θξ_n} + ϕ(θ) = 2ϕ(θ). Therefore we can apply the optional stopping theorem and obtain EX_T = EX_0 = 1, that is, E(ϕ(θ)^{−T} e^{θS_T}) = 1.
(c) Since P(S_T ≤ −a) > 0 and P(S_T ≥ b) > 0, the conditional probability measures P(·|S_T ≤ −a) and P(·|S_T ≥ b) can be defined as
P(A|S_T ≤ −a) = \frac{P(A ∩ \{S_T ≤ −a\})}{P(S_T ≤ −a)},   P(A|S_T ≥ b) = \frac{P(A ∩ \{S_T ≥ b\})}{P(S_T ≥ b)}
and conditional expectation given either event is the integral with respect to the corresponding conditional probability measure. Define p = P(S_T ≥ b) = 1 − P(S_T ≤ −a). Since ϕ(θ) = 1, the fundamental identity of Wald reads Ee^{θS_T} = 1, and one can compute
1 = Ee^{θS_T} = \int_{\{S_T ≤ −a\}} e^{θS_T} \, dP + \int_{\{S_T ≥ b\}} e^{θS_T} \, dP
= (1 − p) \int e^{θS_T} \, dP(·|S_T ≤ −a) + p \int e^{θS_T} \, dP(·|S_T ≥ b)
= (1 − p)E_a + pE_b
This gives that
p = P(S_T ≥ b) = \frac{1 − E_a}{E_b − E_a}
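As a concrete check (not in the notes), take the nearest-neighbour walk P(ξ_1 = 1) = p, P(ξ_1 = −1) = 1 − p with p ≠ 1/2, and θ = log((1 − p)/p), so that ϕ(θ) = 1 and θ ≠ 0. Because the walk moves in steps of ±1 it hits −a or b exactly (no overshoot), so E_a = e^{−θa} and E_b = e^{θb}, and the formula above reduces to the classical gambler's-ruin probability. The Python sketch below compares it with a Monte Carlo estimate; the parameter values are hypothetical.

import numpy as np

rng = np.random.default_rng(2)
p, a, b, n_paths = 0.45, 4, 6, 20_000             # hypothetical parameters
theta = np.log((1 - p) / p)                        # chosen so that phi(theta) = 1

E_a, E_b = np.exp(-theta * a), np.exp(theta * b)   # no overshoot: S_T is exactly -a or b
predicted = (1 - E_a) / (E_b - E_a)                # P(S_T >= b) from the identity above

hits_b = 0
for _ in range(n_paths):
    S = 0
    while -a < S < b:
        S += 1 if rng.random() < p else -1
    hits_b += (S >= b)

print(predicted, hits_b / n_paths)                 # the two values should be close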
The next three results are simple applications of martingales and optional stopping theorems.
THEOREM 9.5.6 (Maximal Inequality for Submartingales) Suppose (X_n)_{n≥0} is a submartingale with X_n ≥ 0 for all n. Then for all ε > 0 and all n ≥ 0, we have
P\left( \max_{0≤k≤n} X_k ≥ ε \right) ≤ \frac{EX_n}{ε}
Proof. Let
T = inf\{k ≥ 0 : X_k ≥ ε\}
Then T is a stopping time because T = k if and only if X_j < ε for j = 0, · · · , k − 1 and X_k ≥ ε, therefore {T = k} ∈ F_k. By the second lemma in the optional stopping theorem section, we know that
EX_0 ≤ EX_{\min(T,n)} ≤ EX_n
Therefore
EX_n ≥ EX_{\min(T,n)} ≥ E(X_{\min(T,n)}; T ≤ n) = E(X_T; T ≤ n) ≥ εP(T ≤ n) = εP\left( \max_{0≤k≤n} X_k ≥ ε \right)
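A quick numerical illustration (not part of the notes): for a mean-zero random walk S_k, the process |S_k| is a nonnegative submartingale, so the inequality applies to it directly. The Python sketch below estimates both sides; the Gaussian step distribution and the values of n and ε are hypothetical.

import numpy as np

rng = np.random.default_rng(3)
n, eps, n_paths = 50, 8.0, 100_000                  # hypothetical parameters

S = np.cumsum(rng.standard_normal((n_paths, n)), axis=1)
X = np.abs(S)                                       # |S_k| is a nonnegative submartingale

lhs = (X.max(axis=1) >= eps).mean()                 # estimate of P(max_k X_k >= eps)
rhs = X[:, -1].mean() / eps                         # estimate of E X_n / eps
print(lhs, rhs)                                     # lhs should not exceed rhs (up to noise)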
THEOREM 9.5.7 (Kolmogorov's Inequality) Suppose (ξ_n)_{n≥1} are i.i.d. random variables with mean 0 and variance σ^2 < ∞. Let S_n = \sum_{j=1}^{n} ξ_j. Then for all ε > 0 and all n ≥ 1, we have
P\left( \max_{1≤k≤n} |S_k| ≥ ε \right) ≤ \frac{nσ^2}{ε^2}
Proof. First, (S_n)_{n≥1} is a martingale with respect to the natural filtration, and
E(S_{n+1}^2 | F_n) = E(S_n^2 + 2S_n ξ_{n+1} + ξ_{n+1}^2 | F_n) = S_n^2 + 2S_n E(ξ_{n+1} | F_n) + E(ξ_{n+1}^2 | F_n)
= S_n^2 + 2S_n Eξ_{n+1} + Eξ_{n+1}^2 ≥ S_n^2
This means that (S_n^2)_{n≥1} is a submartingale. By applying the maximal inequality we have
P\left( \max_{1≤k≤n} |S_k| ≥ ε \right) = P\left( \max_{1≤k≤n} S_k^2 ≥ ε^2 \right) ≤ \frac{ES_n^2}{ε^2} = \frac{Var(S_n)}{ε^2} = \frac{nσ^2}{ε^2}
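The same kind of check works for Kolmogorov's inequality (again, not part of the original text): compare the empirical frequency of max_{1≤k≤n} |S_k| ≥ ε with nσ²/ε². The step distribution and constants below are hypothetical.

import numpy as np

rng = np.random.default_rng(4)
n, eps, sigma, n_paths = 100, 25.0, 2.0, 100_000         # hypothetical parameters

steps = rng.normal(0.0, sigma, size=(n_paths, n))         # i.i.d., mean 0, variance sigma^2
S = np.cumsum(steps, axis=1)

lhs = (np.abs(S).max(axis=1) >= eps).mean()               # estimate of P(max_k |S_k| >= eps)
rhs = n * sigma**2 / eps**2                               # the Kolmogorov bound
print(lhs, rhs)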
LEMMA 9.5.4 (Jensen’s Inequality for Conditional Expectation) In a probability
space (Ω, A, P ), let X be a random variable and F be a sub-σ-field. If ϕ is convex and
E|X|, E|ϕ(X)| < ∞, then ϕ(E(X|F)) ≤ E(ϕ(X)|F).
Proof. Since ϕ is convex, by the convex set separation theorem there exists, for each fixed point x_0, a supporting hyperplane f(x; x_0) = A(x_0)(x − x_0) + ϕ(x_0) such that ϕ(x) ≥ f(x; x_0) for all x and ϕ(x_0) = f(x_0; x_0); the slope A(x_0) may be taken to be the right derivative of ϕ at x_0, which is a measurable function of x_0. Since E|X| < ∞ and E|ϕ(X)| < ∞, we have |X| < ∞ a.s. and |ϕ(X)| < ∞ a.s., so E(X|F) is finite a.s. and we only need to consider sample points where both X and E(X|F) are finite. Then
ϕ(X) ≥ f(X; E(X|F)) = ϕ(E(X|F)) + A(E(X|F))(X − E(X|F))
Now by the monotonicity of conditional expectation, and because ϕ(E(X|F)) and A(E(X|F)) are F-measurable while E(X − E(X|F) | F) = 0, we have
E(ϕ(X)|F) ≥ E(f(X; E(X|F)) | F) = ϕ(E(X|F))
And this inequality holds a.s. because we only discard a null set on which X or E(X|F) is infinite.
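For intuition, here is a tiny discrete illustration of the lemma (hypothetical, not from the notes): X uniform on {−2, −1, 1, 2}, F = σ⟨sgn(X)⟩ and ϕ(x) = x². Conditioning on the sign, E(X|F) = ±3/2, so ϕ(E(X|F)) = 9/4, while E(X²|F) = 5/2, consistent with the inequality. The Python sketch estimates both sides.

import numpy as np

rng = np.random.default_rng(5)
X = rng.choice([-2.0, -1.0, 1.0, 2.0], size=200_000)     # X uniform on {-2, -1, 1, 2}
sign = np.sign(X)                                        # F = sigma<sgn(X)>

for s in (-1.0, 1.0):
    cond = X[sign == s]
    lhs = np.mean(cond) ** 2            # phi(E(X|F)) on {sgn(X) = s}, with phi(x) = x^2
    rhs = np.mean(cond ** 2)            # E(phi(X)|F) on the same event
    print(s, lhs, rhs)                  # lhs <= rhs, as the lemma predicts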
THEOREM 9.5.8 Let (X_n)_{n≥0} be a martingale such that E|X_n|^α < ∞ for all n ≥ 0, where α > 1 is a fixed real number. Then
E\left( \max_{0≤k≤n} |X_k| \right) ≤ \frac{α}{α − 1} \|X_n\|_α
Proof. First we show that (|X_n|^α)_{n≥0} is a submartingale. By Jensen's inequality for conditional expectation,
E(|X_{n+1}|^α | F_n) ≥ (E(|X_{n+1}| \,|\, F_n))^α ≥ |X_n|^α
because (|X_n|)_{n≥0} is a submartingale (the absolute value of a martingale is a submartingale: E(|X_{n+1}| \,|\, F_n) ≥ E(X_{n+1} | F_n) = X_n and E(|X_{n+1}| \,|\, F_n) ≥ E(−X_{n+1} | F_n) = −X_n, hence E(|X_{n+1}| \,|\, F_n) ≥ |X_n|). Next, using the maximal inequality for the nonnegative submartingale (|X_k|^α)_{0≤k≤n}, we can compute
E\left( \max_{0≤k≤n} |X_k| \right) = \int_0^{∞} P\left( \max_{0≤k≤n} |X_k| ≥ t \right) dt = \left\{ \int_0^{\|X_n\|_α} + \int_{\|X_n\|_α}^{∞} \right\} P\left( \max_{0≤k≤n} |X_k| ≥ t \right) dt
≤ \int_0^{\|X_n\|_α} dt + \int_{\|X_n\|_α}^{∞} P\left( \max_{0≤k≤n} |X_k|^α ≥ t^α \right) dt
≤ \|X_n\|_α + \int_{\|X_n\|_α}^{∞} \frac{\|X_n\|_α^α}{t^α} \, dt = \|X_n\|_α + \frac{\|X_n\|_α^α}{α − 1} \|X_n\|_α^{1−α} = \frac{α}{α − 1} \|X_n\|_α
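A numerical sanity check (hypothetical parameters, not from the notes): a mean-zero Gaussian random walk is a martingale, so one can compare E max_{0≤k≤n} |X_k| with (α/(α − 1))‖X_n‖_α, say for α = 2.

import numpy as np

rng = np.random.default_rng(6)
n, alpha, n_paths = 100, 2.0, 100_000                      # hypothetical parameters

X = np.cumsum(rng.standard_normal((n_paths, n)), axis=1)   # a martingale path X_1, ..., X_n

lhs = np.abs(X).max(axis=1).mean()                         # estimate of E max_k |X_k|
norm_alpha = (np.abs(X[:, -1]) ** alpha).mean() ** (1 / alpha)   # estimate of ||X_n||_alpha
rhs = alpha / (alpha - 1) * norm_alpha
print(lhs, rhs)                                            # lhs should not exceed rhs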
Chapter 10
Conditional Probability Distributions
10.1 Definition
DEFINITION 10.1.1 (Conditional Probability) Suppose T is a measurable mapping
T : (Ω, A) → (T , C), then the conditional probability of an event A ∈ A given T is defined to
be E(IA |T ), denoted by P (A|T ) or P (A|T = ·).
Namely, P(A|T = ·) is any one of the P T^{−1}-a.s. equivalent C-measurable functions such that
P(A ∩ \{T ∈ C\}) = \int_C P(A|T = t) \, P T^{−1}(dt)
for all C ∈ C by the definition of conditional expectation.
Now we have just defined P(A|T = t) by holding A fixed and considering the resulting function
of t. What happens if we hold t fixed and vary A? Do we get a bona fide probability on (Ω, A)?
To find out, choose and fix a P (A|T ) for each A ∈ A. According to the properties of conditional
expectations, these functions will satisfy
1. 0 ≤ P (A|T = ·) ≤ 1 a.s., P (Ω|T = ·) = 1 a.s. with respect to P T −1 .
2. P (A1 + A2 |T = ·) = P (A1 |T = ·) + P (A2 |T = ·) a.s. by the linearity of conditional
expectation.
3. P(A_n|T = ·) ↑ P(A|T = ·) a.s. for every sequence (A_n)_{n≥1} increasing to A, by the Monotone Convergence Theorem for conditional expectation.
However, the null set qualifications are present, and it is in general impossible to get rid of all
of them simultaneously.
DEFINITION 10.1.2 (Conditional Probability Distribution) Suppose T is a measurable mapping T : (Ω, A) → (T , C). A binary function P (·|T = ∗) on A×T is called a conditional
probability distribution on (Ω, A) given T , if
1 For each t ∈ T , P (·|T = t) is a probability measure on (Ω, A).
2 For each A ∈ A, P (A|T = ∗) is a conditional probability of A given T .
DEFINITION 10.1.3 (Conditional Probability Distribution of Measurable Functions) A conditional probability distribution of a measurable function Z : (Ω, A) → (Ψ, B)
given a measurable function T : (Ω, A) → (T , C) is a binary function P (Z ∈ ·|T = ∗) on B × T
such that
1 For each t ∈ T , P (Z ∈ ·|T = t) is a probability measure on (Ψ, B).
2 For each B ∈ B, P (Z ∈ B|T = ∗) is a conditional probability of {Z ∈ B} given T .
REMARK 10.1.1 For any measurable function X on a probability space, we shall write
LX = P X −1 as the law of X. Thus, if F is measurable, then LF (X) (B) = P (F (X) ∈ B) =
P (X ∈ F −1 (B)) = LX (F −1 B) = LX F −1 (B) for any measurable B in the target space. This
means LF (X) = LX F −1 . If P (Z ∈ ·|T = ∗) is a conditional probability distribution of Z given
T , it is clear that P (Z ∈ F −1 (·)|T = ∗) is a conditional probability distribution of F (Z) given
T. We shall write L_{Z|T=t} for P(Z ∈ ·|T = t). Further one can see that L_{F(Z)|T=t} = L_{Z|T=t} F^{−1},
because for any measurable B, P(F(Z) ∈ B | T = t) = P(Z ∈ F^{−1}(B) | T = t) = L_{Z|T=t} F^{−1}(B).
REMARK 10.1.2 If P (·|T = ∗) is a conditional probability distribution on (Ω, A), or even
only on (Ω, Z −1 (B)), then for each t ∈ T ,
LZ|T =t = P (·|T = t)Z −1
Namely, LZ|T =t is the probability measure on (Ψ, B) induced from the probability measure
P (·|T = t) on (Ω, Z −1 (B)) by the measurable function Z.
10.2 Basic Properties and Examples
In this section most results will be stated without proofs.
PROPOSITION 10.2.1 Conditional probability distributions do not always exist.
However, the positive result holds in a very general setting:
DEFINITION 10.2.1 (Standard Borel Space) A measurable space (Ψ, B) is called a
standard Borel space, if there exists a Borel subset C of the real line with trace Borel σ-field C,
and a bijection ϕ : C → Ψ such that ϕ is C/B-measurable and ϕ−1 is B/C-measurable.
PROPOSITION 10.2.2 A conditional probability distribution for Z : (Ω, A) → (Ψ, B) given T exists if (Ψ, B) is a standard Borel space; in particular, it exists if Ψ is a Borel subset of a complete separable metric space and B is the trace of the Borel σ-field on Ψ.
PROPOSITION 10.2.3 (Uniqueness of Conditional Probability Distribution) A conditional probability distribution for Z : (Ω, A) → (Ψ, B) given T is unique up to L_T-a.s. equivalence, provided that B is countably generated.
PROPOSITION 10.2.4 (Density Case) Suppose L_{T,Z} has density f_{T,Z} with respect to a product measure µ × ν on (T × Ψ, C ⊗ B) with µ, ν being σ-finite. Put f_T(t) = \int f_{T,Z}(t, z) \, ν(dz) for all t ∈ T.
1 f_T(t) serves as a density for L_T with respect to µ.
2 Without loss of generality, suppose 0 ≤ f_{T,Z} < ∞ and 0 ≤ f_T < ∞ everywhere. Set
f_{Z|T}(z|t) = \frac{f_{T,Z}(t, z)}{f_T(t)}
with the convention 0/0 = 0. Then a conditional probability distribution for Z given T exists, and has density f_{Z|T}(·|t) with respect to ν for each t ∈ T.
EXAMPLE 10.2.1 Put Ω = [−1, 1], let A be the Borel σ-field on it, and P(·) = \int_· g \, dλ, where λ is the Lebesgue measure on [−1, 1] and g is a probability density on [−1, 1]. Define T : Ω → [0, 1] by T(ω) = |ω| and Z(ω) = sgn(ω), with Z(0) = 1. Then L_{T,Z} has density
f_{T,Z}(t, z) = g(zt)
with respect to λ × ν, where ν is the counting measure on {−1, 1}. Then
f_{Z|T}(1|t) = \frac{g(t)}{g(t) + g(−t)},   f_{Z|T}(−1|t) = \frac{g(−t)}{g(t) + g(−t)}
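For a concrete instance of this example (not in the original text), take a hypothetical g, say a normal density with mean 0.3 and standard deviation 0.4 restricted to [−1, 1] and renormalized. The Python sketch below estimates P(Z = 1 | T ≈ t) by binning simulated values of T and compares it with g(t)/(g(t) + g(−t)).

import numpy as np

rng = np.random.default_rng(7)
m, sd = 0.3, 0.4                                    # hypothetical parameters of g

def g(x):                                           # unnormalized version of g on [-1, 1];
    return np.exp(-0.5 * ((x - m) / sd) ** 2)       # the normalizing constant cancels in the ratio

# rejection-sample omega from the density proportional to g on [-1, 1]
omega = np.empty(0)
while omega.size < 200_000:
    cand = rng.uniform(-1.0, 1.0, 500_000)
    keep = rng.uniform(0.0, 1.0, cand.size) < g(cand)   # valid because g <= 1 here
    omega = np.concatenate([omega, cand[keep]])

T, Z = np.abs(omega), np.where(omega >= 0, 1, -1)   # Z(0) = 1 by the convention above

t = 0.5
in_bin = np.abs(T - t) < 0.02                       # condition on T falling near t
print((Z[in_bin] == 1).mean(), g(t) / (g(t) + g(-t)))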