An introduction to probability theory
Christel Geiss and Stefan Geiss
Department of Mathematics and Statistics
University of Jyväskylä (Finland)
These lecture notes are combined from [9] and [10]
and are still work in progress.
January 19, 2016
Contents

1 Probability spaces
  1.1 What is a σ-algebra?
  1.2 Measures
  1.3 First look at independence
  1.4 Conditional probabilities

2 Construction of probability spaces
  2.1 Examples of distributions on N
    2.1.1 Binomial distribution with parameter 0 < p < 1
    2.1.2 Poisson distribution with parameter λ > 0
    2.1.3 Geometric distribution with parameter 0 < p < 1
  2.2 Carathéodory's extension theorem
  2.3 The Borel σ-algebra
  2.4 Distributions on R
    2.4.1 Lebesgue measure on (R, B(R))
    2.4.2 Gaussian distribution on R with mean m ∈ R and variance σ² > 0
    2.4.3 Exponential distribution on R with parameter λ > 0
  2.5 The trace σ-algebra
  2.6 Product spaces
  2.7 The Lebesgue measure on Borel sets of Rd
  2.8 A first limit theorem: Poisson's Theorem

3 Random variables
  3.1 Random variables
  3.2 Measurable maps

4 Independence
  4.1 The basic concept
  4.2 Zero-One laws

5 Integration
  5.1 Definition of the Lebesgue-integral
  5.2 Theorem about monotone convergence
  5.3 Basic properties of the Lebesgue-integral
  5.4 Lemma of Fatou and Lebesgue's theorem
  5.5 Examples: discrete and continuous
    5.5.1 Countable measure spaces
    5.5.2 The Riemann integral
    5.5.3 The Lebesgue-Stieltjes integral
  5.6 Change of variable formula
  5.7 Fubini's Theorem

6 Some inequalities
  6.1 Distributional inequalities
  6.2 Convexity
  6.3 Hölder- and Minkowski inequality
  6.4 The variance of a random variable

7 Radon-Nikodym theorem
  7.1 Signed measures
  7.2 Theorem of Radon-Nikodym
  7.3 Conditional expectation
  7.4 A representation of conditional expectations

8 Convergence
  8.1 Almost sure convergence
  8.2 Convergence in probability
  8.3 Convergence in mean
  8.4 Summary

9 Uniform integrability

10 Limit theorems
  10.1 Sums of independent random variables
  10.2 The strong law of large numbers
  10.3 An ergodic Theorem of Birkhoff and Chinčin
  10.4 The law of iterated logarithm
  10.5 An application to insurance

11 Characteristic functions
  11.1 Complex numbers
  11.2 Definition and basic properties
  11.3 Convolutions
  11.4 Some important properties
  11.5 Examples
  11.6 Independent random variables
  11.7 Moments of measures
  11.8 Weak convergence
  11.9 Central limit theorem

12 Appendix
  12.1 Topology
  12.2 Set theory
  12.3 Measure and integration theory
    12.3.1 π-λ-Theorem
    12.3.2 Outer measures and the proof of Carathéodory's Theorem
    12.3.3 Monotone class theorem
    12.3.4 A set which is not a Lebesgue set
  12.4 Add-ons for convergence in probability
  12.5 Add-ons for inequalities
Introduction
Probability theory can be understood as a mathematical model for the intuitive notion of uncertainty. Parts of probability theory belong to pure
mathematics, other parts to applied mathematics. Starting with the applied
side, without probability theory all the stochastic models in Physics, Biology, and Economics would either not have been developed or would not be
rigorous. In order to be rigorous a solid mathematical foundation is needed
using various tools, like for example set theory, functional analysis, complex
analysis and the theory of special functions. Also, probability is used in many branches of pure mathematics itself, even in branches where one does not expect it, like convex geometry.
The modern period of probability theory is connected with names like S.N. Bernstein (1880-1968), E. Borel (1871-1956), and A.N. Kolmogorov (1903-1987). In particular, in 1933 A.N. Kolmogorov published his modern approach to probability theory, including the notion of a measurable space and
a probability space. This book will start from this notion, to continue with
random variables and basic parts of integration theory, to study the various
types of convergence and to finish with the powerful theory of characteristic
functions (known from real analysis as the Fourier transform). Having these tools, limit theorems will be studied, like the fundamental Law of Large
Numbers or the Central Limit Theorem.
This book follows an axiomatic approach and is intended for students of mathematics, but also for other students who need more mathematical background for their further studies. We do not assume knowledge of measure and integration theory; instead we introduce the necessary parts. So the approach we follow may seem more difficult in the beginning, but once one has a solid basis, many things will be easier and more transparent later.
Historical information about mathematicians can be found in the MacTutor
History of Mathematics Archive under
www-history.mcs.st-andrews.ac.uk/history/index.html
and is also used throughout this book.
Let us start with some introductory examples.
Example 1. You stand at a bus-stop knowing that every 30 minutes a bus
comes. But you do not know when the last bus came. What is the intuitively
expected time you have to wait for the next bus? Or you have a light bulb
and would like to know how many hours you can expect the light bulb will
work? How to treat the above two questions by mathematical tools?
Example 2. Surely many lectures about probability start with rolling a
die: you have six possible outcomes {1, 2, 3, 4, 5, 6}. If you want to roll a
certain number, let's say 3, then, assuming the die is fair (whatever this means), it is intuitively clear that the chance of actually rolling a 3 is 1 in 6. How to formalize this?
Example 3. Now the situation is a bit more complicated: Somebody has
two dice and transmits to you only the sum of the two dice. The set of all
outcomes of the experiment is the set
Ω := {(1, 1), ..., (1, 6), ..., (6, 1), ..., (6, 6)}
but what you see is only the result of the map f : Ω → R defined as
f ((a, b)) := a + b,
that means you see f (ω) but not ω = (a, b). This is an easy example of a
random variable f , which we introduce later.
Example 4. In Example 3 we know the set of states ω ∈ Ω explicitly. But
this is not always possible nor necessary: Say, that we would like to measure
the temperature outside our home. We can do this by an electronic thermometer which consists of a sensor outside and a display, including some
electronics, inside. The number we get from the system might not be correct
because of several reasons: the electronics is influenced by the inside temperature and the voltage of the power-supply. Changes of these parameters have
to be compensated by the electronics, but this can, probably, not be done in
a perfect way. Hence, we might not have a systematic error, but some kind
of a random error which appears as a positive or negative deviation from the
exact value. It is impossible to describe all these sources of randomness or
uncertainty explicitly.
We denote the exact temperature by T and the displayed temperature by S,
so that the difference T − S is influenced by the above sources of uncertainty.
If we measured simultaneously, using thermometers of the same type, we would get values S1, S2, ... with corresponding differences
D1 := T − S1 ,
D2 := T − S2 ,
D3 := T − S3 , ...
Intuitively, we get random numbers D1 , D2 , ... having a certain distribution.
How to develop an exact mathematical theory out of this?
Firstly, we take an abstract set Ω. Each element ω ∈ Ω will stand for a specific
configuration of our sources influencing the measured value. Secondly, we
take a function
f : Ω → R
which gives for all ω ∈ Ω the difference f(ω) = T − S(ω). From properties of this function we would like to get useful information about our thermometer and, in particular, about the correctness of the displayed values.
To put Examples 3 and 4 on a solid ground we go ahead with the following
questions:
Step 1: How to model the randomness of ω, or how likely an ω is? We do
this by introducing the probability spaces in Chapters 1 and 2.
Step 2: What mathematical properties of f do we need to transport the randomness from ω to f(ω)? This leads to the introduction of the random variables in Chapter 3.
Step 3: What are properties of f which might be important to know in practice? For example the mean value and the variance, denoted by

Ef   and   E(f − Ef)².

If the first expression is zero, then in Example 4 the calibration of the thermometer is right; if the second one is small, the displayed values are very likely close to the real temperature. To define these quantities one needs the integration theory developed in Chapter 5.
Step 4: Is it possible to describe the distribution of the values that f may
take? Or before, what do we mean by a distribution? Some basic distributions are discussed in Sections 2.1, 2.4 and 2.7.
Step 5: What is a good method to estimate Ef? We can take a sequence of independent (take this intuitively for the moment) random variables f1, f2, ..., having the same distribution as f, and expect that

(1/n) ∑_{i=1}^n fi(ω)   and   Ef
are close to each other. This leads us to the Strong Law of Large Numbers
discussed in Chapter 10.
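To get a first numerical impression of Step 5, the following small sketch (not part of the original notes; it assumes Python with NumPy, and all names and parameters are chosen freely for illustration) simulates independent rolls of a fair die and prints the running average, which approaches the mean value 3.5:

import numpy as np

rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=100_000)   # f_1(omega), f_2(omega), ... for a fair die

# running averages (1/n) * (f_1 + ... + f_n)
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

for n in (10, 100, 10_000, 100_000):
    print(n, running_mean[n - 1])          # approaches Ef = 3.5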
Notation. Given a set Ω and subsets A, B ⊆ Ω, then the following notation is used:

set of all subsets of Ω:   2Ω = {A : A ⊆ Ω}
intersection:              A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B}
union:                     A ∪ B = {ω ∈ Ω : ω ∈ A or (or both) ω ∈ B}
set-theoretical minus:     A\B = {ω ∈ Ω : ω ∈ A and ω ∉ B}
complement:                Ac = {ω ∈ Ω : ω ∉ A}
symmetric difference:      A∆B = (A ∪ B) \ (A ∩ B)
empty set:                 ∅ = set without any element
real numbers:              R
natural numbers:           N = {1, 2, 3, ...}
natural numbers with 0:    N0 = {0, 1, 2, ...}
rational numbers:          Q
indicator-function:        1IA(x) = 1 if x ∈ A, 0 if x ∉ A

Given real numbers α, β, we use α ∧ β := min{α, β}.
Chapter 1
Probability spaces
In this chapter we introduce the probability space, the fundamental notion
of probability theory. A probability space (Ω, F, P) consists of three components.
(1) The elementary events or states ω which are collected in a non-empty
set Ω.
Example 1.0.1. (a) If we roll a die, then all possible outcomes are the numbers between 1 and 6. That means
Ω = {1, 2, 3, 4, 5, 6}.
(b) If we flip a coin, then we have either ”heads” or ”tails” on top, that
means
Ω = {H, T }.
If we have two coins, then
Ω = {(H, H), (H, T ), (T, H), (T, T )}
is the set of all possible outcomes.
(c) The choice Ω = [0, ∞) could model the lifetime of a bulb or, in Financial
Mathematics, the range of possible share prices of a particular company.
In the examples above the structure of Ω directly reflects the problem we
wish to model. However, later there will be situations where the particular
structure of the set Ω is not important; we just need the existence of an appropriate one.
(2) A σ-algebra F, which is the system of observable subsets or events
A ⊆ Ω. The interpretation is that one can usually not decide whether a
system is in the particular state ω ∈ Ω, but one can decide whether ω ∈ A
or ω ∉ A.
(3) A measure P, which gives a probability to all A ∈ F. This probability
is a number P(A) ∈ [0, 1] that describes how likely it is that the event A
occurs.
For the formal mathematical approach we proceed in two steps: in the first
step we define the σ-algebras F, here we do not need any measure. In the
second step we introduce the measures.
1.1 What is a σ-algebra?
The σ-algebra is a basic tool in measure and probability theory. It is the system of sets on which the measures are defined. Without this notion it would be impossible to consider the fundamental Lebesgue measure on the real line or Gaussian measures, without which many parts of mathematics cannot live.
Definition 1.1.1. [σ-algebra, algebra, measurable space] Let Ω
be a non-empty set. A system F of subsets A ⊆ Ω is called σ-algebra on Ω
if
(1) ∅, Ω ∈ F,
(2) A ∈ F implies that Ac := Ω\A ∈ F,
(3) A1, A2, ... ∈ F implies that ⋃_{i=1}^∞ Ai ∈ F.
The pair (Ω, F), where F is a σ-algebra on Ω, is called measurable space.
If one replaces (3) by
(3′) A, B ∈ F implies that A ∪ B ∈ F,
then F is called an algebra.
In probability theory and statistics the sets A ∈ F are called events. One
says that an event A occurs if ω ∈ A and that it does not occur if ω ∉ A.
By taking A1 = A and A2 = A3 = · · · = B we see that every σ-algebra is an
algebra. Sometimes, the terms σ-field and field are used instead of σ-algebra
and algebra, respectively.
There are at least two reasons to introduce the notion of a σ-algebra: Firstly,
usually we cannot decide whether a particular ω ∈ Ω in our system occurs
or not. To relax this we ask whether for a subset A ⊆ Ω either ω ∈ A or
ω 6∈ A, i.e. whether A is observable. Secondly, we would like to assign to an
observable A a number P(A) ∈ [0, 1] that models how likely it is considered
that ω ∈ A. To define the map P in a reasonable way the observable events have to form a σ-algebra. One should also note that the properties of a σ-algebra fit the intuition of being observable: for example, if ω ∈ A can be decided, then ω ∈ Ac, i.e. ω ∉ A, can be decided as well.
We consider some first examples.
Example 1.1.2 (The smallest and biggest σ-algebra). For any σ-algebra F on Ω one has
{∅, Ω} ⊆ F ⊆ 2Ω,
where one quickly checks that 2Ω and {Ω, ∅} are σ-algebras as well. Consequently, {Ω, ∅} and 2Ω are the smallest and largest σ-algebras on Ω. Something 'in between' can be obtained by taking a set A ⊆ Ω that is neither empty nor Ω itself: then
F = {Ω, ∅, A, Ac}
is a σ-algebra not equal to {Ω, ∅} and 2Ω.
Some more concrete examples are the following:
Example 1.1.3 (Rolling Dice). (1) Assume first the model for one die,
i.e. Ω := {1, ..., 6} and F := 2Ω . The event ”the die shows an even
number” is described by
A = {2, 4, 6}.
(2) Assume now the model with two dice, i.e. Ω := {(a, b) : a, b = 1, ..., 6}. We want to describe the event "the sum of the two dice equals four", i.e.
A = {(1, 3), (2, 2), (3, 1)}.
There are at least two options for an appropriate σ-algebra: either the largest one, F1 := 2Ω, or a smaller natural one, namely
F2 := {A ⊆ Ω : A is the empty set or a union of A2, ..., A12},
where Am := {(k, l) : k + l = m}. The σ-algebra F2 would fit the following model: one player throws two dice but transmits to the other player only the sum of the outcomes, not the values of the particular dice.
Example 1.1.4 (Heads and Tails). (1) Assume a model for two coins,
i.e. Ω := {(H, H), (H, T ), (T, H), (T, T )} and F := 2Ω . ”Exactly one of
two coins shows heads” is modelled via
A = {(H, T ), (T, H)}.
(2) Assume now that we agree about a game with infinitely many trials. As
set of elementary events we simply take the set of all possible outcomes,
i.e. Ω := {(a1, a2, a3, ...) : ak ∈ {T, H}}. A game would be that player 1 wins if at some point there are for the first time 10 heads more than tails, and player 2 wins if at some point there are for the first time 10 tails more than heads.
If Ω = {ω1 , ..., ωn }, then any algebra F on Ω is automatically a σ-algebra.
However, in general this is not the case as shown by the next example:
Example 1.1.5. [algebra, which is not a σ-algebra] Let G be the
system of subsets A ⊆ R such that A can be written as
A = (a1 , b1 ] ∪ (a2 , b2 ] ∪ · · · ∪ (an , bn ]
or
A = (a1 , b1 ] ∪ (a2 , b2 ] ∪ · · · ∪ (an , ∞)
where −∞ ≤ a1 ≤ b1 ≤ · · · ≤ an ≤ bn < ∞ with the convention that
(a, a] = ∅. Then G is an algebra, but not a σ-algebra.
1.2 Measures
Now we introduce the measures we are going to use:
Definition 1.2.1. [probability space, measure space] Let (Ω, F) be
a measurable space.
(1) A map P : F → [0, 1] is called probability measure if P(Ω) = 1 and for all A1, A2, ... ∈ F with Ai ∩ Aj = ∅ for i ≠ j one has

P(⋃_{i=1}^∞ Ai) = ∑_{i=1}^∞ P(Ai).   (1.1)

The triplet (Ω, F, P) is called probability space.

(2) A map µ : F → [0, ∞] (where [0, ∞] := [0, ∞) ∪ {∞}) is called measure if µ(∅) = 0 and for all A1, A2, ... ∈ F with Ai ∩ Aj = ∅ for i ≠ j one has

µ(⋃_{i=1}^∞ Ai) = ∑_{i=1}^∞ µ(Ai).

The triplet (Ω, F, µ) is called measure space.

(3) A measure space (Ω, F, µ) or a measure µ is called σ-finite provided that there are Ωk ⊆ Ω, k = 1, 2, ..., such that
(a) Ωk ∈ F for all k = 1, 2, ...,
(b) Ωi ∩ Ωj = ∅ for i ≠ j,
(c) Ω = ⋃_{k=1}^∞ Ωk,
(d) µ(Ωk) < ∞.
The measure space (Ω, F, µ) or the measure µ are called finite if µ(Ω) < ∞.
Remark 1.2.2. Of course, any probability measure is a finite measure: we only need to check that P(∅) = 0, which follows from P(∅) = ∑_{i=1}^∞ P(∅) (note that ∅ ∩ ∅ = ∅), and that P(Ω) = 1 < ∞.
Example 1.2.3. (1) We assume the model of a die, i.e. Ω = {1, ..., 6} and F = 2Ω. Assuming that all outcomes for rolling a die are equally likely leads to
P({ω}) := 1/6.
Then, for example,
P({2, 4, 6}) = 1/2.
(2) If we assume to have two coins, i.e.
Ω = {(T, T), (H, T), (T, H), (H, H)}
and F = 2Ω, then the intuitive assumption 'fair' leads to
P({ω}) := 1/4.
That means, for example, the probability that exactly one of two coins shows heads is
P({(H, T), (T, H)}) = 1/2.
Example 1.2.4. [Dirac and counting measure] 1 Let Ω be an arbitrary non-empty set and F = 2Ω.
(1) Dirac measure: For a fixed x0 ∈ Ω we let
δ_{x0}(A) := 1 if x0 ∈ A, and 0 if x0 ∉ A.
(2) A trivial (but sometimes important) example is the counting measure: Define
µ(A) := cardinality of A.
This measure is σ-finite if and only if Ω is countable.
(3) Let B0 ⊆ Ω with card(B0) > 1, and let
µ(B) := 1 if B0 ∩ B ≠ ∅, and 0 otherwise.
One notes that µ is not a measure.
1 Paul Adrien Maurice Dirac, 08/08/1902 (Bristol, England) - 20/10/1984 (Tallahassee, Florida, USA), Nobel Prize in Physics 1933.
Let us now discuss a typical example in which the σ–algebra F is not the set
of all subsets of Ω.
Example 1.2.5. Assume that there are n communication channels between
the points A and B. Each of the channels has a communication rate of ρ > 0
(say ρ bits per second), which yields the communication rate ρk in case k channels are working. Each of the channels fails with probability p, so that
we have a random communication rate R ∈ {0, ρ, ..., nρ}. What is the right
model for this? We use
Ω := {ω = (ε1, ..., εn) : εi ∈ {0, 1}}
with the interpretation: εi = 0 if channel i is failing, εi = 1 if channel i is
working. F consists of all unions of
Ak := {ω ∈ Ω : ε1 + · · · + εn = k} .
Hence Ak consists of all ω such that the communication rate is ρk. The
system F is the system of observable sets of events since one can only observe
how many channels are failing, but not which channel fails. The measure P
is given by
P(Ak) := \binom{n}{k} p^{n−k} (1 − p)^k,   0 < p < 1.
Note that P describes the binomial distribution with parameter p on
{0, ..., n} if we identify Ak with the natural number k.
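As a small numerical illustration of this model (not part of the original notes; it assumes plain Python 3, and the values n = 8 and p = 0.2 are chosen freely), one can compute the probabilities P(Ak) and check that they sum to one:

from math import comb

n, p = 8, 0.2      # 8 channels, each failing with probability 0.2

# P(A_k) = C(n, k) * p^(n-k) * (1-p)^k : exactly k channels are working
P = {k: comb(n, k) * p ** (n - k) * (1 - p) ** k for k in range(n + 1)}

assert abs(sum(P.values()) - 1.0) < 1e-12  # the weights sum to one
for k in range(n + 1):
    print(k, round(P[k], 5))               # distribution of the communication rate k*rho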
We continue with some basic properties of a probability measure.
Proposition 1.2.6. Let (Ω, F, P) be a probability space. Then the following assertions are true:
(1) If A1, ..., An ∈ F such that Ai ∩ Aj = ∅ if i ≠ j, then P(⋃_{i=1}^n Ai) = ∑_{i=1}^n P(Ai).
(2) If A, B ∈ F, then P(A\B) = P(A) − P(A ∩ B).
(3) If B ∈ F, then P(Bc) = 1 − P(B).
(4) If A1, A2, ... ∈ F, then P(⋃_{i=1}^∞ Ai) ≤ ∑_{i=1}^∞ P(Ai).
(5) Continuity from below: If A1, A2, ... ∈ F such that A1 ⊆ A2 ⊆ A3 ⊆ ···, then
lim_{n→∞} P(An) = P(⋃_{n=1}^∞ An).
(6) Continuity from above: If A1, A2, ... ∈ F such that A1 ⊇ A2 ⊇ A3 ⊇ ···, then
lim_{n→∞} P(An) = P(⋂_{n=1}^∞ An).
Proof. (1) We let An+1 = An+2 = ··· = ∅, so that

P(⋃_{i=1}^n Ai) = P(⋃_{i=1}^∞ Ai) = ∑_{i=1}^∞ P(Ai) = ∑_{i=1}^n P(Ai),

because of P(∅) = 0.
(2) Since (A ∩ B) ∩ (A\B) = ∅, we get that
P(A ∩ B) + P(A\B) = P((A ∩ B) ∪ (A\B)) = P(A).
(3) We apply (2) to A = Ω and observe that Ω\B = Bc by definition and Ω ∩ B = B.
(4) Put B1 := A1 and Bi := A1c ∩ A2c ∩ ··· ∩ A_{i−1}c ∩ Ai for i = 2, 3, .... Obviously, P(Bi) ≤ P(Ai) for all i. Since the Bi's are disjoint and ⋃_{i=1}^∞ Ai = ⋃_{i=1}^∞ Bi it follows

P(⋃_{i=1}^∞ Ai) = P(⋃_{i=1}^∞ Bi) = ∑_{i=1}^∞ P(Bi) ≤ ∑_{i=1}^∞ P(Ai).

(5) We define B1 := A1, B2 := A2\A1, B3 := A3\A2, B4 := A4\A3, ... and get that

⋃_{n=1}^∞ Bn = ⋃_{n=1}^∞ An   and   Bi ∩ Bj = ∅

for i ≠ j. Consequently,

P(⋃_{n=1}^∞ An) = P(⋃_{n=1}^∞ Bn) = ∑_{n=1}^∞ P(Bn) = lim_{N→∞} ∑_{n=1}^N P(Bn) = lim_{N→∞} P(AN)

since ⋃_{n=1}^N Bn = AN. (6) is an exercise.
The sequence an = (−1)^n, n = 1, 2, ..., does not converge, i.e. the limit of (an)_{n=1}^∞ does not exist. But the limit superior and the limit inferior of a given sequence of real numbers always exist.
Definition 1.2.7. [lim inf_n an and lim sup_n an] For a1, a2, ... ∈ R we let

lim inf_n an := lim_n inf_{k≥n} ak   and   lim sup_n an := lim_n sup_{k≥n} ak.
Remark 1.2.8. (1) The value lim inf_n an is the infimum of all c such that there is a subsequence n1 < n2 < n3 < ··· such that lim_k a_{n_k} = c.
(2) The value lim sup_n an is the supremum of all c such that there is a subsequence n1 < n2 < n3 < ··· such that lim_k a_{n_k} = c.
(3) By definition one has that
−∞ ≤ lim inf_n an ≤ lim sup_n an ≤ ∞.
Moreover, if lim inf_n an = lim sup_n an = a ∈ R, then lim_n an = a.
(4) For example, taking an = (−1)^n gives
lim inf_n an = −1   and   lim sup_n an = 1.
As we will see below, also for a sequence of sets one can define a limit superior
and a limit inferior.
Definition 1.2.9. [lim inf_n An and lim sup_n An] Let (Ω, F) be a measurable space and A1, A2, ... ∈ F. Then

lim inf_n An := ⋃_{n=1}^∞ ⋂_{k=n}^∞ Ak   and   lim sup_n An := ⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak.

The definition above says that ω ∈ lim inf_n An if and only if all events An, except a finite number of them, occur, and that ω ∈ lim sup_n An if and only if infinitely many of the events An occur:

lim inf_n An = {ω ∈ Ω : ω ∈ An for all n except finitely many},
lim sup_n An = {ω ∈ Ω : ω ∈ An infinitely often}.
Proposition 1.2.10. [Lemma of Fatou2] Let (Ω, F, P) be a probability space and A1, A2, ... ∈ F. Then

P(lim inf_n An) ≤ lim inf_n P(An) ≤ lim sup_n P(An) ≤ P(lim sup_n An).
Proof. Letting Bn := ⋂_{k=n}^∞ Ak we have Bn ⊆ Ak for all k ≥ n, hence P(Bn) ≤ inf_{k≥n} P(Ak). Since B1 ⊆ B2 ⊆ ··· and ⋃_{n=1}^∞ Bn = lim inf_n An, continuity from below (Proposition 1.2.6) gives P(lim inf_n An) = lim_n P(Bn) ≤ lim inf_n P(An). The inequality lim sup_n P(An) ≤ P(lim sup_n An) follows in the same way using Cn := ⋃_{k=n}^∞ Ak ⊇ Ak and continuity from above; the middle inequality is obvious.
Remark 1.2.11. If lim inf_n An = lim sup_n An = A, then the Lemma of Fatou gives that

lim inf_n P(An) = lim sup_n P(An) = lim_n P(An) = P(A).

Examples for such systems are obtained by assuming A1 ⊆ A2 ⊆ A3 ⊆ ··· or A1 ⊇ A2 ⊇ A3 ⊇ ···.
2
Pierre Joseph Louis Fatou, 28/02/1878-10/08/1929, French mathematician (dynamical systems, Mandelbrot-set).
1.3 A first look at independence and the Lemma of Borel-Cantelli
Now we turn to the fundamental notion of independence.
Definition 1.3.1. [independence of events] Let (Ω, F, P) be a probability space. The events (Ai)_{i∈I} ⊆ F, where I is an arbitrary non-empty index set, are called independent, provided that for all n ≥ 2 and distinct i1, ..., in ∈ I one has that
P(A_{i1} ∩ A_{i2} ∩ ··· ∩ A_{in}) = P(A_{i1}) P(A_{i2}) ··· P(A_{in}).
Given A1, ..., An ∈ F, one can easily see that only demanding
P(A1 ∩ A2 ∩ ··· ∩ An) = P(A1) P(A2) ··· P(An)
would not yield an appropriate notion for the independence of A1, ..., An: for example, taking A and B with
P(A ∩ B) ≠ P(A)P(B)
and C = ∅ gives
P(A ∩ B ∩ C) = P(A)P(B)P(C),
which is surely not what we had in mind.
Now we continue with the fundamental Lemma of Borel-Cantelli.
Proposition 1.3.2. [Lemma of Borel-Cantelli3] Let (Ω, F, P) be a probability space and A1, A2, ... ∈ F. Then one has the following:
(1) If ∑_{n=1}^∞ P(An) < ∞, then P(lim sup_{n→∞} An) = 0.
(2) If A1, A2, ... are assumed to be independent and ∑_{n=1}^∞ P(An) = ∞, then P(lim sup_{n→∞} An) = 1.
3 Francesco Paolo Cantelli, 20/12/1875-21/07/1966, Italian mathematician.
Proof. (1) It holds by definition lim sup_{n→∞} An = ⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak. By

⋃_{k=n+1}^∞ Ak ⊆ ⋃_{k=n}^∞ Ak

and the continuity of P from above (see Proposition 1.2.6) we get that

P(lim sup_{n→∞} An) = P(⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak)
 = lim_{n→∞} P(⋃_{k=n}^∞ Ak)
 ≤ lim_{n→∞} ∑_{k=n}^∞ P(Ak) = 0,

where the last inequality follows again from Proposition 1.2.6.
(2) It holds that

(lim sup_n An)^c = lim inf_n Anc = ⋃_{n=1}^∞ ⋂_{k=n}^∞ Akc.

So, we would need to show

P(⋃_{n=1}^∞ ⋂_{k=n}^∞ Akc) = 0.

Letting Bn := ⋂_{k=n}^∞ Akc we get that B1 ⊆ B2 ⊆ B3 ⊆ ···, so that

P(⋃_{n=1}^∞ ⋂_{k=n}^∞ Akc) = lim_n P(Bn),

so that it suffices to show

P(Bn) = P(⋂_{k=n}^∞ Akc) = 0.

Since the independence of A1, A2, ... implies the independence of A1c, A2c, ..., we finally get (setting pn := P(An)) that

P(⋂_{k=n}^∞ Akc) = lim_{N→∞, N≥n} P(⋂_{k=n}^N Akc)
 = lim_{N→∞, N≥n} ∏_{k=n}^N P(Akc)
 = lim_{N→∞, N≥n} ∏_{k=n}^N (1 − pk)
 ≤ lim_{N→∞, N≥n} ∏_{k=n}^N e^{−pk}
 = lim_{N→∞, N≥n} e^{−∑_{k=n}^N pk}
 = e^{−∑_{k=n}^∞ pk} = e^{−∞} = 0,

where we have used that 1 − x ≤ e^{−x}.
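The dichotomy expressed by the Lemma of Borel-Cantelli can also be illustrated by simulation. The following sketch (not part of the original notes; it assumes Python with NumPy, and the probabilities 1/n² and 1/n are just convenient test cases) generates independent events An and records, for each simulated path, the index of the last An that occurs among the first N events; in the summable case the events typically stop occurring early, in the non-summable case they keep occurring up to the end of the simulation:

import numpy as np

rng = np.random.default_rng(0)
N, paths = 5000, 10      # events A_1, ..., A_N and independent sample paths

for label, p in [("sum P(A_n) < infinity (P(A_n) = 1/n^2)", 1.0 / np.arange(1, N + 1) ** 2),
                 ("sum P(A_n) = infinity (P(A_n) = 1/n)  ", 1.0 / np.arange(1, N + 1))]:
    occurs = rng.random((paths, N)) < p                 # independent events A_n
    last = [int(np.nonzero(row)[0].max()) + 1 if row.any() else 0 for row in occurs]
    print(label, "-> index of last occurring event per path:", last)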
There is a counterpart of the Lemma of Borel and Cantelli. Its proof,
which is similar to the previous one, we leave as an exercise.
Lemma 1.3.3. Let (Ω, F, P) be a probability space and let A1, A2, ... ∈ F be independent events of positive probability. Then one has that
(1) P(lim inf_{n→∞} An) = 1 if and only if ∑_{n=1}^∞ log(1/P(An)) < ∞.
(2) P(lim inf_{n→∞} An) = 0 if and only if ∑_{n=1}^∞ log(1/P(An)) = ∞.

1.4 A first look at conditional probabilities and Bayes' rule
Independence can be also expressed through conditional probabilities. Let
us first define them:
Definition 1.4.1. [conditional probability] Let (Ω, F, P) be a probability space, A ∈ F with P(A) > 0. Then
P(B|A) := P(B ∩ A)/P(A),   for B ∈ F,
is called conditional probability of B given A.
It is now obvious that A and B are independent if and only if P(B|A) = P(B).
Conditional probabilities occupy an important place in statistics through Bayes' formula. Before we formulate this formula in Proposition 1.4.3 we
consider A, B ∈ F, with 0 < P(B) < 1 and P(A) > 0. Then
A = (A ∩ B) ∪ (A ∩ B c ),
where (A ∩ B) ∩ (A ∩ B c ) = ∅, and therefore,
P(A) = P(A ∩ B) + P(A ∩ B c )
= P(A|B)P(B) + P(A|B c )P(B c ).
This implies

P(B|A) = P(B ∩ A)/P(A) = P(A|B)P(B)/P(A)
       = P(A|B)P(B) / [P(A|B)P(B) + P(A|Bc)P(Bc)].
Example 1.4.2. A laboratory blood test is 95% effective in detecting a
certain disease when it is, in fact, present. However, the test also yields a
”false positive” result for 1% of the healthy persons tested. If 0.5% of the
population actually has the disease, what is the probability a person has the
disease given his test result is positive? We set
B := ”the person has the disease”,
A := ”the test result is positive”.
Hence we have
P(A|B) = P(”a positive test result”|”person has the disease”) = 0.95,
P(A|B c ) = 0.01,
P(B) = 0.005.
Applying the above formula we get

P(B|A) = (0.95 × 0.005) / (0.95 × 0.005 + 0.01 × 0.995) ≈ 0.323.

That means only about 32% of the persons whose test results are positive actually have the disease.
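As a quick sanity check of this computation (not part of the original notes; plain Python, with variable names chosen freely), one can evaluate Bayes' formula directly:

# Bayes' rule for the blood test: B = "has the disease", A = "test is positive"
p_A_given_B = 0.95        # detection probability
p_A_given_Bc = 0.01       # false positive probability
p_B = 0.005               # fraction of the population having the disease

p_B_given_A = p_A_given_B * p_B / (p_A_given_B * p_B + p_A_given_Bc * (1 - p_B))
print(round(p_B_given_A, 3))   # 0.323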
Proposition 1.4.3. [Bayes' formula4] Assume A, Bj ∈ F, with Ω = ⋃_{j=1}^n Bj, where Bi ∩ Bj = ∅ for i ≠ j and P(A) > 0, P(Bj) > 0 for j = 1, ..., n. Then

P(Bj|A) = P(A|Bj)P(Bj) / ∑_{k=1}^n P(A|Bk)P(Bk).
The proof is an exercise. An event Bj is also called hypothesis, the probabilities P(Bj ) the prior probabilities (or a priori probabilities), and the
probabilities P(Bj |A) the posterior probabilities (or a posteriori probabilities) of Bj .
4
Thomas Bayes, 1702-17/04/1761, English mathematician.
Chapter 2
Construction of probability spaces
Before we discuss some basic examples of distributions we give an example
from insurance to motivate some of them. This example is treated in Section
10.5 in more detail. For the sake of motivation we already make use of random variables; the reader should understand this notion in an intuitive way for the
moment. Let (Ω, F, P) be a probability space. We want to model the claim
number process (Nt )t≥0 , where Nt is the number of claims that happened
until time t. For this purpose we define
T0 := 0 and Tn := ∆1 + · · · + ∆n
for n ≥ 1, where Tn models the arrival time of the n-th claim. The corresponding claim number process N = (Nt )t≥0 , Nt : Ω → R, which is a
counting process, is given by N0 := 0 and
Nt := n if Tn ≤ t < Tn+1 .
We want to have a process that is - in a sense - without memory. For this reason one assumes that the interarrival times (∆i)_{i=1}^∞ are independent random variables having an exponential distribution with parameter λ > 0 as
introduced in Section 2.4.3 below. If one continues and wants to deduce
the distribution of Nt , then it turns out that it is distributed according to a
Poisson distribution with parameter λt as explained in Section 2.1.2. Finally,
there is a discrete time approximation of the Poisson distribution given by
the binomial distribution in Section 2.8 that provides an approximation by
a random walk with step-sizes zero and one.
2.1 Examples of distributions on N

2.1.1 Binomial distribution with parameter 0 < p < 1
(1) Ω := {0, 1, ..., n}.
(2) F := 2Ω (system of all subsets of Ω).
(3) P(B) = Bin_{n,p}(B) := ∑_{k=0}^n \binom{n}{k} p^k (1 − p)^{n−k} δk(B), where δk is the Dirac measure introduced in Example 1.2.4.
Interpretation: Coin-tossing with one coin, such that one has heads with probability p and tails with probability 1 − p. Then Bin_{n,p}({k}) equals the probability that within n trials one has k times heads.
2.1.2 Poisson distribution with parameter λ > 0
(1) Ω := {0, 1, 2, 3, ...} =: N0.
(2) F := 2Ω (system of all subsets of Ω).
(3) P(B) = Pois_λ(B) := ∑_{k=0}^∞ e^{−λ} (λ^k/k!) δk(B).
The Poisson distribution1 is used, for example, to model stochastic processes
with a continuous time parameter and jumps: the probability that the process
jumps k times between the time-points s and t with 0 ≤ s < t < ∞ is equal
to Poisλ(t−s) ({k}).
2.1.3 Geometric distribution with parameter 0 < p < 1
(1) Ω := N0.
(2) F := 2Ω (system of all subsets of Ω).
(3) P(B) = Geom_p(B) := ∑_{k=0}^∞ (1 − p)^k p δk(B).
Interpretation: The probability that an electric light bulb breaks down is p ∈ (0, 1). The bulb does not have a "memory", that means the break down is independent of the time the bulb has already been switched on. So we get the following model: at day 0 the probability of breaking down is p. If the bulb survives day 0, it breaks down with probability p at day 1, so that the total probability of a break down at day 1 is (1 − p)p. If we continue in this way we get that breaking down at day k has the probability (1 − p)^k p.
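A small numerical sketch (not part of the original notes; plain Python, with freely chosen parameters and a truncation of the infinite sums) evaluates the weights of the three distributions introduced above and checks that they sum to one:

from math import comb, exp, factorial

n, p, lam = 10, 0.3, 2.0

binomial = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
poisson = [exp(-lam) * lam ** k / factorial(k) for k in range(60)]       # truncated sum
geometric = [(1 - p) ** k * p for k in range(60)]                        # truncated sum

for name, weights in [("binomial", binomial), ("Poisson", poisson), ("geometric", geometric)]:
    print(name, round(sum(weights), 10))    # each sum is (numerically) one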
2.2 Carathéodory's extension theorem
Although the definition of a measure is not difficult, to prove existence and
uniqueness of measures may sometimes be difficult. The problem lies in the
fact that, in general, the σ-algebras are not constructed explicitly, one only
knows their existence. To overcome this difficulty, one usually exploits
1
Siméon Denis Poisson, 21/06/1781 (Pithiviers, France) - 25/04/1840 (Sceaux, France).
Proposition 2.2.1. [Carathéodory's extension theorem2] Let Ω ≠ ∅ and G be an algebra such that F = σ(G). Assume a map µ0 : G → [0, ∞] such that
(1) µ0(Ωn) < ∞, n = 1, 2, ..., for some partition Ω = ⋃_{n=1}^∞ Ωn with Ωn ∈ G,
(2) µ0(⋃_{n=1}^∞ An) = ∑_{n=1}^∞ µ0(An) for pair-wise disjoint An ∈ G such that ⋃_{n=1}^∞ An ∈ G.
Then there exists a unique measure µ : F → [0, ∞] such that
µ(A) = µ0(A) for all A ∈ G.
It is clear that the measure µ in the theorem above is automatically σ-finite.
A proof of Proposition 2.2.1 can be found in Section 12.3.2 below. If one is
only interested in the uniqueness of measures one can also use the following
approach as a replacement of Carathéodory’s extension theorem:
Definition 2.2.2. [π-system] A system G of subsets A ⊆ Ω is called π-system, provided that
A ∩ B ∈ G for all A, B ∈ G.
Any algebra is a π-system but a π-system is not an algebra in general, take
for example the π-system {(a, b) : −∞ < a < b < ∞} ∪ {∅}.
Proposition 2.2.3. Let (Ω, F) be a measurable space with F = σ(P), where
P is a π-system containing Ω. Assume two finite measures µ and ν on F
such that
µ(A) = ν(A) for all A ∈ P.
Then µ(B) = ν(B) for all B ∈ F.
The proof can be found in Section 12.3.1.
2.3 The Borel σ-algebra
Later we introduce the fundamental Lebesgue measure λ on the real line, which simply measures the length of an interval, i.e. we want that
λ((a, b)) = b − a
2
Constantin Carathéodory, 13/09/1873 (Berlin, Germany) - 02/02/1950 (Munich, Germany).
for −∞ < a < b < ∞. So we have a set of elementary events, Ω = R, and a candidate for a natural measure, the Lebesgue measure. The question arises which sets can be measured by this type of measure. In particular, can we take the system of all subsets, F = 2Ω? Surprisingly it turns out that we cannot use this σ-algebra, as it is too big: it can be shown that the Lebesgue measure cannot be constructed so that it measures all subsets in an appropriate way. This leads us to a fundamental problem in measure theory, to construct a reasonable σ-algebra for the Lebesgue measure. If one is modest, the minimum one can ask for is the smallest σ-algebra that contains all open intervals. To get this idea working we use Proposition 2.3.2 below, which is based on the fundamental
Proposition 2.3.1. [intersection of σ-algebras is a σ-algebra] Let Ω be an arbitrary non-empty set and let Fj, j ∈ J, J ≠ ∅, be a family of σ-algebras on Ω, where J is an arbitrary index set. Then

F := ⋂_{j∈J} Fj

is a σ-algebra as well.
Proof. The proof is very easy, but typical and fundamental. First we notice that ∅, Ω ∈ Fj for all j ∈ J, so that ∅, Ω ∈ ⋂_{j∈J} Fj. Now let A, A1, A2, ... ∈ ⋂_{j∈J} Fj. Hence A, A1, A2, ... ∈ Fj for all j ∈ J, so that (Fj are σ-algebras!)

Ac = Ω\A ∈ Fj   and   ⋃_{i=1}^∞ Ai ∈ Fj

for all j ∈ J. Consequently,

Ac ∈ ⋂_{j∈J} Fj   and   ⋃_{i=1}^∞ Ai ∈ ⋂_{j∈J} Fj.
Proposition 2.3.2. [smallest σ-algebra containing a set-system]
Let Ω ≠ ∅ and G be a non-empty system of subsets of Ω and

σ(G) := ⋂_{F σ-algebra, G⊆F} F.
Then one has the following:
(1) The system σ(G) is a σ-algebra containing all sets from G.
(2) If F is any σ-algebra on Ω containing all G ∈ G, then σ(G) ⊆ F.
Proof. We let
J := {C : C is a σ-algebra on Ω such that G ⊆ C}.
One has J ≠ ∅, because G ⊆ 2Ω and 2Ω is a σ-algebra. Hence
σ(G) := ⋂_{C∈J} C
yields a σ-algebra according to Proposition 2.3.1 such that (by construction) G ⊆ σ(G). It remains to show that σ(G) is the smallest σ-algebra containing G. Assume another σ-algebra F with G ⊆ F. By definition of J we have that F ∈ J so that
σ(G) = ⋂_{C∈J} C ⊆ F.
The construction is very elegant but has, as already mentioned, the slight
disadvantage that one cannot construct all elements of σ(G) explicitly. Let
us now turn to one of the most important examples, the Borel σ-algebra on
R. To do this we need the notion of open and closed sets.
Definition 2.3.3. [open and closed sets]
(1) Given x, y ∈ Rd, the Euclidean distance is given by
d(x, y) = |x − y| := (∑_{i=1}^d |xi − yi|²)^{1/2}.
For x ∈ Rd and ε > 0 we define the open ball with radius ε > 0 centered at x ∈ Rd by
Uε(x) := {y ∈ Rd : |x − y| < ε}.
(2) A subset A ⊆ Rd is called open if for each x ∈ A there is an ε > 0 such that Uε(x) ⊆ A.
(3) A subset B ⊆ Rd is called closed if A := Rd\B is open.
Definition 2.3.4. [Borel σ-algebra on Rd] 3 The smallest σ-algebra on Rd that contains all open sets is called Borel σ-algebra and denoted by B(Rd).
3
Félix Edouard Justin Émile Borel, 07/01/1871-03/02/1956, French mathematician.
Remark 2.3.5. In the same way one can introduce Borel σ-algebras on metric spaces: Given a metric space M with metric d, a set A ⊆ M is open provided that for all x ∈ A there is an ε > 0 such that {y ∈ M : d(x, y) < ε} ⊆ A. A set B ⊆ M is closed if the complement M\B is open. The Borel σ-algebra B(M) is the smallest σ-algebra that contains all open (closed) subsets of M.
There are various ways to define the Borel-σ-algebra. Below we illustrate
some options in the case of R, i.e. d = 1.
Proposition 2.3.6. [Generation of the Borel σ-algebra on R] We let
G0 be the system of all open subsets of R,
G1 be the system of all closed subsets of R,
G2 be the system of all intervals (−∞, b], b ∈ R,
G3 be the system of all intervals (−∞, b), b ∈ R,
G4 be the system of all intervals (a, b], −∞ < a < b < ∞,
G5 be the system of all intervals (a, b), −∞ < a < b < ∞.
Then σ(G0) = σ(G1) = σ(G2) = σ(G3) = σ(G4) = σ(G5).
Proof of Proposition 2.3.6. We only show that
σ(G0) = σ(G1) = σ(G3) = σ(G5).
Because of G3 ⊆ G0 one has
σ(G3) ⊆ σ(G0).
Moreover, for −∞ < a < b < ∞ one has that
(a, b) = ⋃_{n=1}^∞ [(−∞, b)\(−∞, a + 1/n)] ∈ σ(G3),
so that G5 ⊆ σ(G3) and
σ(G5) ⊆ σ(G3).
Now let us assume a bounded non-empty open set A ⊆ R. For all x ∈ A there is a maximal εx > 0 such that
(x − εx, x + εx) ⊆ A.
Hence
A = ⋃_{x∈A∩Q} (x − εx, x + εx),
which proves G0 ⊆ σ(G5) and
σ(G0) ⊆ σ(G5).
Finally, A ∈ G0 implies Ac ∈ G1 ⊆ σ(G1) and A ∈ σ(G1). Hence G0 ⊆ σ(G1) and
σ(G0) ⊆ σ(G1).
The remaining inclusion σ(G1) ⊆ σ(G0) can be shown in the same way.
2.4 Distributions on R

2.4.1 Lebesgue measure on (R, B(R))
We will construct the Lebesgue measure on R using Carathéodory’s extension theorem. For this purpose we let
(1) Ω := R,
(2) F := B(R),
(3) As generating algebra G for B(R) we take the algebra from Example
1.1.5: Let G be the system of subsets A ⊆ R such that A can be written
as
A = (a1 , b1 ] ∪ (a2 , b2 ] ∪ · · · ∪ (an , bn ]
or
A = (a1 , b1 ] ∪ (a2 , b2 ] ∪ · · · ∪ (an , ∞)
where −∞ ≤ a1 ≤ b1 ≤ · · · ≤ an ≤ bn < ∞. For such a set A we let
λ0(A) := ∑_{i=1}^n (bi − ai).
Proposition 2.4.1. The system G is an algebra. The map λ0 : G → [0, ∞]
is correctly defined and satisfies the assumptions of Carathéodory’s extension
theorem Proposition 2.2.1.
Proof. After a standard reduction (check!) we have to show the following: given −∞ < a < b < ∞ and pair-wise disjoint intervals (an, bn] with

(a, b] = ⋃_{n=1}^∞ (an, bn]

we have that b − a = ∑_{n=1}^∞ (bn − an). Let ε ∈ (0, b − a) and observe that

[a + ε, b] ⊆ ⋃_{n=1}^∞ (an, bn + ε/2^n).

Hence we have an open covering of a compact set and by the Heine-Borel theorem there exists a finite sub-cover:

[a + ε, b] ⊆ ⋃_{n∈I(ε)} [(an, bn] ∪ (bn, bn + ε/2^n)]

for some finite set I(ε). The total length of the intervals (bn, bn + ε/2^n) is at most ε > 0, so that

b − a − ε ≤ ∑_{n∈I(ε)} (bn − an) + ε ≤ ∑_{n=1}^∞ (bn − an) + ε.

Letting ε ↓ 0 we arrive at

b − a ≤ ∑_{n=1}^∞ (bn − an).

Since b − a ≥ ∑_{n=1}^N (bn − an) for all N ≥ 1, the opposite inequality is obvious.
Definition 2.4.2. [Lebesgue4 measure] The unique extension λ of λ0 to
B(R) according to Proposition 2.2.1 is called Lebesgue measure.
Accordingly, λ is the unique σ-finite measure on B(R) such that
λ((a, b]) = b − a for all − ∞ < a < b < ∞.
2.4.2 Gaussian distribution on R with mean m ∈ R and variance σ² > 0
(1) Ω := R.
(2) F := B(R) Borel σ-algebra.
(3) We take the algebra G considered in Example 1.1.5 and define

P0(A) := ∑_{i=1}^n ∫_{ai}^{bi} (1/√(2πσ²)) e^{−(x−m)²/(2σ²)} dx

for A := (a1, b1] ∪ (a2, b2] ∪ ··· ∪ (an, bn], where we consider the Riemann integral 5 on the right-hand side. One can show (we do not do this here, but compare with Proposition 5.7.6 below) that P0 satisfies the assumptions of Proposition 2.2.1, so that we can extend P0 to a probability measure N_{m,σ²} on B(R).
The measure N_{m,σ²} is called Gaussian distribution 6 (normal distribution) with mean m and variance σ². Given A ∈ B(R) we write

N_{m,σ²}(A) = ∫_A p_{m,σ²}(x) dx   with   p_{m,σ²}(x) := (1/√(2πσ²)) e^{−(x−m)²/(2σ²)}.

The function p_{m,σ²}(x) is called Gaussian density.
4
Henri Léon Lebesgue, 28/06/1875-26/07/1941, French mathematician (generalized
the Riemann integral by the Lebesgue integral; continuation of work of Emile Borel and
Camille Jordan).
5
Georg Friedrich Bernhard Riemann, 17/09/1826 (Germany) - 20/07/1866 (Italy),
Ph.D. thesis under Gauss.
6
Johann Carl Friedrich Gauss, 30/04/1777 (Brunswick, Germany) - 23/02/1855
(Göttingen, Hannover, Germany).
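As a numerical illustration of the Gaussian density (not part of the original notes; it assumes Python with NumPy, and the grid and interval below are chosen arbitrarily), one can approximate N_{m,σ²}((a, b]) by a Riemann sum and check that the total mass is close to one:

import numpy as np

def gaussian_density(x, m=0.0, sigma2=1.0):
    return np.exp(-(x - m) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

x = np.linspace(-10.0, 10.0, 200_001)   # fine grid; the density is negligible outside
dx = x[1] - x[0]

print(np.sum(gaussian_density(x)) * dx)                       # total mass, close to 1
print(np.sum(gaussian_density(x[(x > -1) & (x <= 1)])) * dx)  # N_{0,1}((-1, 1]), about 0.6827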
2.4.3 Exponential distribution on R with parameter λ > 0
(1) Ω := R.
(2) F := B(R) Borel σ-algebra.
(3) For A and G as in Subsection 2.4.2 we define, via the Riemann integral,

P0(A) := ∑_{i=1}^n ∫_{ai}^{bi} pλ(x) dx   with   pλ(x) := 1I_{[0,∞)}(x) λ e^{−λx}.

Again, P0 satisfies the assumptions of Proposition 2.2.1, so that we can extend P0 to the exponential distribution Exp_λ with parameter λ and density pλ(x) on B(R).
Given A ∈ B(R) we write

Exp_λ(A) = ∫_A pλ(x) dx.
The exponential distribution can be considered as a continuous time version of the geometric distribution. In particular, we see that the distribution does not have a memory in the sense that for a, b ≥ 0 we have

Exp_λ([a + b, ∞)|[a, ∞)) = Exp_λ([b, ∞)),

with the conditional probability on the left-hand side. In words: the probability of a realization larger than or equal to a + b, given that one already has a value larger than or equal to a, is the same as that of a realization larger than or equal to b. Indeed, it holds that

Exp_λ([a + b, ∞)|[a, ∞)) = Exp_λ([a + b, ∞) ∩ [a, ∞)) / Exp_λ([a, ∞))
 = (λ ∫_{a+b}^∞ e^{−λx} dx) / (λ ∫_a^∞ e^{−λx} dx)
 = e^{−λ(a+b)} / e^{−λa}
 = Exp_λ([b, ∞)).
Example 2.4.3. Suppose that the amount of time one spends in a post office is exponentially distributed with λ = 1/10.
(a) What is the probability that a customer will spend more than 15 minutes?
(b) What is the probability that a customer will spend more than 15 minutes from the beginning in the post office, given that the customer has already spent at least 10 minutes?
The answer for (a) is Exp_λ([15, ∞)) = e^{−15/10} ≈ 0.223. For (b) we get
Exp_λ([15, ∞)|[10, ∞)) = Exp_λ([5, ∞)) = e^{−5/10} ≈ 0.607.
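The numbers of Example 2.4.3, as well as the lack of memory, can be checked with a few lines of Python (not part of the original notes; the function name tail is chosen freely):

from math import exp

lam = 1 / 10

def tail(t):
    # Exp_lambda([t, infinity)) = e^(-lambda * t) for t >= 0
    return exp(-lam * t)

print(tail(15))              # (a): about 0.223
print(tail(15) / tail(10))   # (b): conditional probability, about 0.607
print(tail(5))               # equals the previous line: lack of memory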
2.5 The trace σ-algebra
Let (Ω, F) be a measurable space. Sometimes one needs a σ-algebra on a
subset A ⊆ Ω. This can be constructed as follows.
Lemma 2.5.1. If Ω ≠ ∅, F is a σ-algebra on Ω and A ∈ 2Ω, then
FA := {F ∩ A : F ∈ F}
is a σ-algebra on A.
Proof. Choosing F = ∅ and F = Ω implies ∅ ∈ FA and A ∈ FA, respectively. Any element G ∈ FA can be written as G = F ∩ A for some F ∈ F. That A \ G ∈ FA follows from Fc ∈ F and
A \ G = A \ (F ∩ A) = Fc ∩ A.
Assume that for k = 1, 2, ... the Gk are given by Gk = Fk ∩ A with Fk ∈ F. Then

⋃_{k=1}^∞ Gk = ⋃_{k=1}^∞ (Fk ∩ A) = (⋃_{k=1}^∞ Fk) ∩ A ∈ FA

since ⋃_{k=1}^∞ Fk ∈ F.
Definition 2.5.2. The σ-algebra FA on A constructed in Lemma 2.5.1 is
called the trace of F on A.
Lemma 2.5.3. If Ω ≠ ∅, G ⊆ 2Ω and A ∈ 2Ω then
σ(G)A = σ(GA )
where GA := {G ∩ A : G ∈ G}, and σ(GA ) denotes the smallest σ-algebra on
A which contains GA .
Proof. (a) The inclusion σ(GA ) ⊆ σ(G)A : We have that G ⊆ σ(G) so that
GA ⊆ σ(G)A . Finally Lemma 2.5.1 implies that
σ(GA ) ⊆ σ(G)A .
(b) The inclusion σ(GA ) ⊇ σ(G)A : We use the principle of good sets and
define
G0 := {G ∈ σ(G) : G ∩ A ∈ σ(GA )}
so that
G ⊆ G0 ⊆ σ(G).
If G0 would be a σ-algebra, then we would have G0 = σ(G) and G∩A ∈ σ(GA )
would hold for any G ∈ σ(G), i.e. σ(GA ) ⊇ σ(G)A . So it remains to show
that G0 is a σ-algebra:
(1) Since A, ∅ ∈ σ(GA ) we conclude from the definition of G0 that Ω, ∅ ∈ G0 .
(2) Assume G ∈ G0 . Then G ∩ A ∈ σ(GA ), and since σ(GA ) is a σ-algebra on
A, also Gc ∩ A = A \ (G ∩ A) ∈ σ(GA ). Hence Gc ∈ G0 .
(3) Similarly as in the proof of Lemma 2.5.1 one derives from G1, G2, ... ∈ G0 that ⋃_{n=1}^∞ Gn ∈ G0.
2.6 Product spaces
As an application of Carathéodory’s extension theorem we construct product
spaces of σ-finite measure spaces.
Proposition 2.6.1 (Product measure). Assume that (Ωi , Fi , µi ), i =
1, 2, ..., d are σ-finite measure spaces, let
Ω = Ω1 × · · · × Ωd := {(ω1 , . . . , ωd ) : ω1 ∈ Ω1 , . . . , ωd ∈ Ωd },
and define the σ-algebra F on Ω by
F = ⊗_{i=1}^d Fi := σ(B1 × ··· × Bd : Bi ∈ Fi, i = 1, ..., d).
Then there exists a unique measure µ = ⊗_{i=1}^d µi on F (which is automatically
σ-finite) such that
µ(B1 × · · · × Bd ) = µ1 (B1 ) · · · µd (Bd ).
Definition 2.6.2. [product of measure spaces] The space (Ω, F, µ) is
called product space and denoted by ⊗_{i=1}^d (Ωi, Fi, µi).
Proof of Proposition 2.6.1. By choosing partitions (Ω_{i,l})_{l=1}^∞ in Ω1, ..., Ωd, where the corresponding measures are finite, and considering products Ω_{1,l1} × ··· × Ω_{d,ld}, one can reduce the situation to finite measures. So we assume that µ1, ..., µd are finite. We define the algebra G generated by all finite unions of 'cuboids'
B1 × ··· × Bd
with B1 ∈ F1, ..., Bd ∈ Fd. We let

µ0(⋃_{i=1}^n (B1^i × ··· × Bd^i)) := ∑_{i=1}^n µ1(B1^i) ··· µd(Bd^i),

where the union of cuboids is assumed to be disjoint. Now we have to verify the following: If

B1 × ··· × Bd = ⋃_{i=1}^∞ (B1^i × ··· × Bd^i),
where the union is disjoint, then

µ1(B1) ··· µd(Bd) = ∑_{i=1}^∞ µ1(B1^i) ··· µd(Bd^i).

If one of the sets B1, ..., Bd has measure zero, then both sides vanish and there is nothing to prove. So let us assume that B1, ..., Bd have positive measure. Because

B1 × ··· × Bd ⊇ ⋃_{i=1}^N (B1^i × ··· × Bd^i)

for all N ≥ 1 we (easily) get that

µ1(B1) ··· µd(Bd) ≥ ∑_{i=1}^N µ1(B1^i) ··· µd(Bd^i)

and therefore,

µ1(B1) ··· µd(Bd) ≥ ∑_{i=1}^∞ µ1(B1^i) ··· µd(Bd^i)

by N → ∞. The more difficult part consists in proving the converse inequality

µ1(B1) ··· µd(Bd) ≤ ∑_{i=1}^∞ µ1(B1^i) ··· µd(Bd^i).

We proceed by induction over d. The case d = 1 follows from the σ-additivity of the measure µ1. Assume for d ≥ 2 that the statement is verified for d − 1. We fix ωd ∈ Ωd so that our assumption applied to the section at ωd implies that

µ1(B1) ··· µ_{d−1}(B_{d−1}) 1I_{Bd}(ωd) = ∑_{i=1}^∞ µ1(B1^i) ··· µ_{d−1}(B_{d−1}^i) 1I_{Bd^i}(ωd).

In other words, for

αi := µ1(B1^i) ··· µ_{d−1}(B_{d−1}^i) / [µ1(B1) ··· µ_{d−1}(B_{d−1})] ≥ 0

we have that

1I_{Bd} = ∑_{i=1}^∞ αi 1I_{Bd^i}

and we have to check that

µd(Bd) ≤ ∑_{i=1}^∞ αi µd(Bd^i).

Let ε ∈ (0, 1) and

An^ε := {ωd ∈ Ωd : ∑_{i=1}^n αi 1I_{Bd^i}(ωd) > (1 − ε) 1I_{Bd}(ωd)}.

Then A1^ε ⊆ A2^ε ⊆ A3^ε ⊆ ··· with ⋃_{n=1}^∞ An^ε = Bd. The monotonicity of the measure implies that

lim_{n→∞} µd(An^ε) = µd(Bd)

and

∑_{i=1}^∞ αi µd(Bd^i) ≥ µd(Bd)(1 − ε)

for all ε ∈ (0, 1). Letting ε → 0 yields the assertion.
The case of infinite products requires more work. Here the interested reader
is referred, for example, to [13] where a special case is considered.
2.7 The Lebesgue measure on Borel sets of Rd
The Borel σ-algebras B(Rd) and B(R) are connected in the natural and expected way:
Proposition 2.7.1. B(Rd ) = B(R) ⊗ · · · ⊗ B(R).
The proof is left as an exercise. For the d-dimensional Lebesgue measure we
obtain
Proposition 2.7.2. There is a unique measure λd on B(Rd) such that

λd((a1, b1] × ··· × (ad, bd]) = ∏_{i=1}^d (bi − ai)

for all −∞ < ai < bi < ∞. The measure is σ-finite and it holds that
λd = ⊗_{i=1}^d λ,
where λ is the one-dimensional Lebesgue measure.
The measure λd is called d-dimensional Lebesgue measure and we obtain the σ-finite measure space
(Rd, B(Rd), λd) = ⊗_{i=1}^d (R, B(R), λ).
In order to obtain the Lebesgue measure on a set Q ⊆ Rd with Q ∈ B(Rd )
and λd (Q) > 0 we proceed by restricting the Lebesgue measure λd to the
trace σ-algebra:
Definition 2.7.3. [Uniform distribution] Given Q ∈ B(Rd) with λd(Q) > 0 and the trace σ-algebra
BQ = {F ∩ Q : F ∈ B(Rd)} = {B ⊆ Q : B ∈ B(Rd)},
we set
λd_Q(B) := λd(B) for B ∈ BQ.
The probability measure
UQ(B) := λd_Q(B) / λd(Q) for B ∈ BQ
is called uniform distribution on Q.
Important cases for Q are the closed intervals [a, b]. Furthermore, for −∞ <
a < b < ∞ we have that B(a,b] = B((a, b]) := σ({(c, d] : a ≤ c < d ≤ b}).
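A Monte Carlo sketch (not part of the original notes; it assumes Python with NumPy) illustrates the uniform distribution UQ for Q = [0,1]²: the relative frequency of sample points falling into the quarter disc B = {(x, y) ∈ Q : x² + y² ≤ 1} approximates UQ(B) = π/4:

import numpy as np

rng = np.random.default_rng(1)
points = rng.random((1_000_000, 2))            # samples from U_Q with Q = [0,1]^2
in_B = (points ** 2).sum(axis=1) <= 1.0        # B = quarter disc of radius 1 inside Q

print(in_B.mean())    # relative frequency, close to U_Q(B) = pi/4
print(np.pi / 4)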
2.8 A first limit theorem: Poisson's Theorem
For large n and small p the Poisson distribution provides a good approximation for the binomial distribution in the sense of Proposition 2.8.1 below.
This proposition describes the weak convergence investigated later in Section
11.8:
Proposition 2.8.1. [Poisson’s Theorem] Let λ > 0, pn ∈ (0, 1), n =
1, 2, ..., and assume that npn → λ as n → ∞. Then, for all k = 0, 1, . . . ,
Binn,pn ({k}) → Poisλ ({k}), n → ∞.
Consequently,
Binn,pn ((−∞, x]) → Poisλ ((−∞, x]), n → ∞,
for all x ∈ R.
Proof. Fix an integer k ≥ 0. Then

Bin_{n,pn}({k}) = \binom{n}{k} pn^k (1 − pn)^{n−k}
 = [n(n − 1) ··· (n − k + 1)/k!] pn^k (1 − pn)^{n−k}
 = (1/k!) [n(n − 1) ··· (n − k + 1)/n^k] (n pn)^k (1 − pn)^{n−k}.

Of course, lim_{n→∞} (n pn)^k = λ^k and lim_{n→∞} n(n − 1) ··· (n − k + 1)/n^k = 1. So we have to show that lim_{n→∞} (1 − pn)^{n−k} = e^{−λ}. By n pn → λ we get that there exists a sequence εn such that

n pn = λ + εn   with   lim_{n→∞} εn = 0.

Choose ε0 ∈ (0, λ) and n0 ≥ 1 such that |εn| ≤ ε0 for all n ≥ n0. Then

(1 − (λ + ε0)/n)^{n−k} ≤ (1 − (λ + εn)/n)^{n−k} ≤ (1 − (λ − ε0)/n)^{n−k}.

Using l'Hospital's rule we get

lim_{n→∞} ln (1 − (λ + ε0)/n)^{n−k} = lim_{n→∞} (n − k) ln(1 − (λ + ε0)/n)
 = lim_{n→∞} ln(1 − (λ + ε0)/n) / [1/(n − k)]
 = lim_{n→∞} [(1 − (λ + ε0)/n)^{−1} (λ + ε0)/n²] / [−1/(n − k)²]
 = −(λ + ε0).

Hence

e^{−(λ+ε0)} = lim_{n→∞} (1 − (λ + ε0)/n)^{n−k} ≤ lim_{n→∞} (1 − (λ + εn)/n)^{n−k}.

In the same way we get that

lim_{n→∞} (1 − (λ + εn)/n)^{n−k} ≤ e^{−(λ−ε0)}.

Finally, since we can choose ε0 > 0 arbitrarily small,

lim_{n→∞} (1 − pn)^{n−k} = lim_{n→∞} (1 − (λ + εn)/n)^{n−k} = e^{−λ}.
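Poisson's theorem can also be observed numerically. The following sketch (not part of the original notes; plain Python, with λ = 2 chosen arbitrarily) compares the binomial weights Bin_{n,λ/n}({k}) with the Poisson weights Pois_λ({k}) for growing n:

from math import comb, exp, factorial

lam, ks = 2.0, range(5)

def binomial_weight(n, k):
    p = lam / n                                   # p_n = lambda / n, so n * p_n = lambda
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

for n in (10, 100, 10_000):
    print(n, [round(binomial_weight(n, k), 5) for k in ks])
print("Poisson", [round(exp(-lam) * lam ** k / factorial(k), 5) for k in ks])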
Chapter 3
Random variables
Given a probability space (Ω, F, P), in many stochastic models functions
f : Ω → R which describe certain random phenomena are considered and
one is interested in the computation of expressions like
P ({ω ∈ Ω : f (ω) ∈ (a, b)}) ,
where a < b.
This leads us to the condition
{ω ∈ Ω : f (ω) ∈ (a, b)} ∈ F
and hence to random variables we will introduce now.
3.1 Random variables
We start with the most simple random variables.
Definition 3.1.1. [(measurable) step-function] Let (Ω, F) be a measurable space. A function f : Ω → R is called measurable step-function
or step-function, provided that there are α1 , ..., αn ∈ R and A1 , ..., An ∈ F
such that f can be written as
f(ω) = ∑_{i=1}^n αi 1I_{Ai}(ω),
where
1I_{Ai}(ω) := 1 if ω ∈ Ai, and 0 if ω ∉ Ai.
Some particular examples for step-functions are
1IΩ = 1,   1I∅ = 0,   1IA + 1I_{Ac} = 1,   1I_{A∩B} = 1IA 1IB,   1I_{A∪B} = 1IA + 1IB − 1I_{A∩B}.
The definition above concerns only functions which take finitely many values,
which will be too restrictive in future. So we wish to extend this definition.
Definition 3.1.2. [random variables] Let (Ω, F) be a measurable space.
A map f : Ω → R is called random variable provided that there is a
sequence (fn)_{n=1}^∞ of measurable step-functions fn : Ω → R such that
f(ω) = lim_{n→∞} fn(ω) for all ω ∈ Ω.
Does our definition give what we would like to have? Yes, as we see from
Proposition 3.1.3. Let (Ω, F) be a measurable space and let f : Ω → R be
a function. Then the following conditions are equivalent:
(1) f is a random variable.
(2) For all −∞ < a < b < ∞ one has that
f −1 ((a, b)) := {ω ∈ Ω : a < f (ω) < b} ∈ F.
Proof. (1) =⇒ (2) Assume that
f(ω) = lim_{n→∞} fn(ω)
where fn : Ω → R are measurable step-functions. For a measurable step-function one has that
fn^{−1}((a, b)) ∈ F,
so that

f^{−1}((a, b)) = {ω ∈ Ω : a < lim_n fn(ω) < b}
 = ⋃_{m=1}^∞ ⋃_{N=1}^∞ ⋂_{n=N}^∞ {ω ∈ Ω : a + 1/m < fn(ω) < b − 1/m} ∈ F.

(2) =⇒ (1) First we observe that we also have that

f^{−1}([a, b)) = {ω ∈ Ω : a ≤ f(ω) < b} = ⋂_{m=1}^∞ {ω ∈ Ω : a − 1/m < f(ω) < b} ∈ F,

so that we can use the step-functions

fn(ω) := ∑_{k=−4^n}^{4^n − 1} (k/2^n) 1I_{{k/2^n ≤ f < (k+1)/2^n}}(ω).
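The dyadic step-functions fn used in this proof can be written down explicitly. The following sketch (not part of the original notes; it assumes Python with NumPy and uses an arbitrary bounded function f) implements fn and shows that the approximation error tends to zero:

import numpy as np

def f_n(values, n):
    # f_n = sum_{k=-4^n}^{4^n - 1} (k / 2^n) * 1_{ k/2^n <= f < (k+1)/2^n }
    k = np.floor(values * 2.0 ** n)
    inside = (k >= -(4 ** n)) & (k <= 4 ** n - 1)
    return np.where(inside, k / 2.0 ** n, 0.0)

omega = np.linspace(0.0, 1.0, 1000)
f = np.sin(10.0 * omega)                     # some function with values in [-1, 1]

for n in (1, 3, 6, 10):
    print(n, np.max(np.abs(f_n(f, n) - f)))  # the maximal error tends to zero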
Sometimes the following proposition, which is closely connected to Proposition 3.1.3, is useful.
Proposition 3.1.4. Assume a measurable space (Ω, F) and a sequence of
random variables fn : Ω → R such that f (ω) := limn fn (ω) exists for all
ω ∈ Ω. Then f : Ω → R is a random variable.
The proof is an exercise.
Proposition 3.1.5. [properties of random variables] Let (Ω, F) be a
measurable space and f, g : Ω → R random variables and α, β ∈ R. Then the
following is true:
(1) (αf + βg)(ω) := αf (ω) + βg(ω) is a random variable.
(2) (f g)(ω) := f (ω)g(ω) is a random-variable.
(3) If g(ω) ≠ 0 for all ω ∈ Ω, then (f/g)(ω) := f(ω)/g(ω) is a random variable.
(4) |f | is a random variable.
Proof. (2) We find measurable step-functions fn, gn : Ω → R such that
f(ω) = lim_{n→∞} fn(ω) and g(ω) = lim_{n→∞} gn(ω).
Hence
(f g)(ω) = lim_{n→∞} fn(ω)gn(ω).
Finally, we remark that fn(ω)gn(ω) is a measurable step-function. In fact, assuming that

fn(ω) = ∑_{i=1}^k αi 1I_{Ai}(ω)   and   gn(ω) = ∑_{j=1}^l βj 1I_{Bj}(ω),

yields

(fn gn)(ω) = ∑_{i=1}^k ∑_{j=1}^l αi βj 1I_{Ai}(ω) 1I_{Bj}(ω) = ∑_{i=1}^k ∑_{j=1}^l αi βj 1I_{Ai∩Bj}(ω)

and we again obtain a step-function, since Ai ∩ Bj ∈ F. Items (1), (3), and (4) are an exercise.
3.2 Measurable maps
Now we extend the notion of random variables to the notion of measurable
maps, which is necessary in many considerations and even more natural.
Definition 3.2.1. [measurable map] Let (Ω, F) and (M, Σ) be measurable
spaces. A map f : Ω → M is called (F, Σ)-measurable, provided that
f −1 (B) = {ω ∈ Ω : f (ω) ∈ B} ∈ F
for all B ∈ Σ.
Proposition 3.2.2 (Image measures). Let (Ω1 , F1 ) and (Ω2 , F2 ) be measurable spaces, f : Ω1 → Ω2 be a measurable map, and µ1 be a measure on F1 .
Define
µ2 (B2 ) := µ1 (f −1 (B2 )) for B2 ∈ F2 .
Then one has the following:
(i) (Ω2 , F2 , µ2 ) is a measure space with µ1 (Ω1 ) = µ2 (Ω2 ).
(ii) There are examples that µ1 is σ-finite, but µ2 is not.
The connection to the random variables is given by
Proposition 3.2.3. Let (Ω, F) be a measurable space and f : Ω → R. Then
the following assertions are equivalent:
(1) The map f is a random variable.
(2) The map f is (F, B(R))-measurable.
For the proof we need
Lemma 3.2.4. Let (Ω, F) and (M, Σ) be measurable spaces and let f : Ω →
M . Assume that Σ0 ⊆ Σ is a system of subsets such that σ(Σ0 ) = Σ. If
f −1 (B) ∈ F
for all B ∈ Σ0 ,
f −1 (B) ∈ F
for all B ∈ Σ.
then
Proof. Define
A := {B ⊆ M : f^{−1}(B) ∈ F}.
By assumption, Σ0 ⊆ A. We show that A is a σ–algebra.
(1) f −1 (M ) = Ω ∈ F implies that M ∈ A.
(2) If B ∈ A, then

f^{−1}(Bc) = {ω : f(ω) ∈ Bc} = {ω : f(ω) ∉ B} = Ω \ {ω : f(ω) ∈ B} = f^{−1}(B)c ∈ F.

(3) If B1, B2, ... ∈ A, then

f^{−1}(⋃_{i=1}^∞ Bi) = ⋃_{i=1}^∞ f^{−1}(Bi) ∈ F.

By definition of Σ = σ(Σ0) this implies that Σ ⊆ A, which implies our lemma.
Proof of Proposition 3.2.3. (2) =⇒ (1) follows from (a, b) ∈ B(R) for a < b
which implies that f −1 ((a, b)) ∈ F.
(1) =⇒ (2) is a consequence of Lemma 3.2.4 since B(R) = σ((a, b) : −∞ <
a < b < ∞).
Example 3.2.5. If f : R → R is continuous, then f is (B(R), B(R))-measurable.
Proof. Since f is continuous we know that f −1 ((a, b)) is open for all −∞ <
a < b < ∞, so that f −1 ((a, b)) ∈ B(R). Since the open intervals generate
B(R) we can apply Lemma 3.2.4.
Now we state some general properties of measurable maps.
Proposition 3.2.6. Let (Ω1 , F1 ), (Ω2 , F2 ), (Ω3 , F3 ) be measurable spaces.
Assume that f : Ω1 → Ω2 is (F1 , F2 )-measurable and that g : Ω2 → Ω3 is
(F2 , F3 )-measurable. Then the following is satisfied:
(1) g ◦ f : Ω1 → Ω3 defined by
(g ◦ f )(ω1 ) := g(f (ω1 ))
is (F1 , F3 )-measurable.
(2) Assume that P1 is a probability measure on F1 and define
P2 (B2 ) := P1 ({ω1 ∈ Ω1 : f (ω1 ) ∈ B2 }) .
Then P2 is a probability measure on F2 .
The proof is an exercise.
Example 3.2.7. We want to simulate the flipping of an (unfair) coin by the
random number generator: the random number generator of the computer
gives us a number which has (a discrete) uniform distribution on [0, 1]. So
we take the probability space ([0, 1], B([0, 1]), λ) and define for p ∈ (0, 1) the
random variable
f (ω) := 1I[0,p) (ω).
Then it holds
P2 ({1}) := P1 ({ω ∈ Ω : f (ω) = 1}) = λ([0, p)) = p,
P2 ({0}) := P1 ({ω1 ∈ Ω1 : f (ω1 ) = 0}) = λ([p, 1]) = 1 − p.
Assume the random number generator gives out the number x. If we write a program such that "output" = "heads" in case x ∈ [0, p) and "output" = "tails" in case x ∈ [p, 1], then "output" simulates the flipping of an (unfair) coin, or in other words, "output" has binomial distribution Bin_{1,p}.
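The following is a minimal simulation sketch (not part of the notes) in Python with NumPy; the generator, the seed and the value p = 0.3 are arbitrary choices standing in for the "random number generator" of Example 3.2.7.

import numpy as np

rng = np.random.default_rng(0)      # stands in for the random number generator
p = 0.3
x = rng.random(100_000)             # (approximately) uniform samples on [0, 1)
output = np.where(x < p, "heads", "tails")
print((output == "heads").mean())   # close to p = 0.3, i.e. the law Bin_{1,p}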
Definition 3.2.8. [law of a random variable] Let (Ω, F, P) be a probability space and f : Ω → R be a random variable. Then
Pf (B) := P ({ω ∈ Ω : f (ω) ∈ B})
is called the law or image measure of the random variable f .
The law of a random variable is completely characterized by its distribution
function which we introduce now.
Definition 3.2.9. [distribution-function] Given a random variable f :
Ω → R on a probability space (Ω, F, P), the function
Ff (x) := P({ω ∈ Ω : f (ω) ≤ x})
is called distribution function of f .
Proposition 3.2.10. [Properties of distribution-functions]
The distribution-function Ff : R → [0, 1] is a right-continuous non-decreasing
function such that
lim_{x→−∞} F_f(x) = 0 and lim_{x→∞} F_f(x) = 1.
Proof. (i) Ff is non-decreasing: given x1 < x2 one has that
{ω ∈ Ω : f (ω) ≤ x1 } ⊆ {ω ∈ Ω : f (ω) ≤ x2 }
and
Ff (x1 ) = P({ω ∈ Ω : f (ω) ≤ x1 }) ≤ P({ω ∈ Ω : f (ω) ≤ x2 }) = Ff (x2 ).
(ii) F_f is right-continuous: let x ∈ R and x_n ↓ x. Then
F_f(x) = P({ω ∈ Ω : f(ω) ≤ x}) = P(⋂_{n=1}^∞ {ω ∈ Ω : f(ω) ≤ x_n}) = lim_n P({ω ∈ Ω : f(ω) ≤ x_n}) = lim_n F_f(x_n).
(iii) The properties limx→−∞ Ff (x) = 0 and limx→∞ Ff (x) = 1 are an exercise.
Proposition 3.2.11. Assume that P1 and P2 are probability measures on
B(R) and F1 and F2 are the corresponding distribution functions. Then the
following assertions are equivalent:
(1) P1 = P2 .
(2) F1 (x) = P1 ((−∞, x]) = P2 ((−∞, x]) = F2 (x) for all x ∈ R.
Proof. (1) ⇒ (2) is of course trivial. We consider (2) ⇒ (1): For sets of type
A := (a, b]
one can show that
F1 (b) − F1 (a) = P1 (A) = P2 (A) = F2 (b) − F2 (a).
Now one can apply Proposition 12.3.3.
Summary: Let (Ω, F) be a measurable space and f : Ω → R be a function. Then the following relations hold true:

f^{-1}(A) ∈ F for all A ∈ G, where G is one of the systems given in Proposition 2.3.6 or any other system such that σ(G) = B(R)

⇕  (Lemma 3.2.4)

f is (F, B(R))-measurable: f^{-1}(A) ∈ F for all A ∈ B(R)

⇕  (Proposition 3.2.3)

f is a random variable, i.e. there exist measurable step functions (f_n)_{n=1}^∞, i.e. f_n = Σ_{k=1}^{N_n} a_k^n 1I_{A_k^n} with a_k^n ∈ R and A_k^n ∈ F, such that f_n(ω) → f(ω) for all ω ∈ Ω as n → ∞

⇕  (Proposition 3.1.3)

f^{-1}((a, b)) ∈ F for all −∞ < a < b < ∞
Chapter 4
Independence
4.1
The basic concept
Let us first start with the notion of a family of independent random variables.
Definition 4.1.1. [independence of a family of random variables]
Let (Ω, F, P) be a probability space and fi : Ω → R, i ∈ I, be random
variables where I is a non-empty index-set. The family (fi )i∈I is called
independent provided that for all distinct i1 , ..., in ∈ I, n = 1, 2, ..., and
all B1 , ..., Bn ∈ B(R) one has that
P (fi1 ∈ B1 , ..., fin ∈ Bn ) = P (fi1 ∈ B1 ) · · · P (fin ∈ Bn ) .
In case the index set I is finite, for example I = {1, ..., n}, the definition above is equivalent to
Definition 4.1.2. [independence of a finite family of random variables] Let (Ω, F, P) be a probability space and fi : Ω → R, i = 1, . . . , n,
random variables. The random variables f1 , . . . , fn are called independent
provided that for all B1 , ..., Bn ∈ B(R) one has that
P (f1 ∈ B1 , ..., fn ∈ Bn ) = P (f1 ∈ B1 ) · · · P (fn ∈ Bn ) .
The connection between the independence of random variables and of events
is obvious:
Proposition 4.1.3. Let (Ω, F, P) be a probability space and fi : Ω → R,
i ∈ I, be random variables where I is a non-empty index-set. Then the
following assertions are equivalent.
(1) The family (fi )i∈I is independent.
(2) For all families (Bi )i∈I of Borel sets Bi ∈ B(R) one has that the events
({ω ∈ Ω : fi (ω) ∈ Bi })i∈I are independent.
Sometimes we need to group independent random variables. In this respect
the following proposition turns out to be useful. For the following we say
that g : Rn → R is Borel-measurable (or a Borel function) provided that g
is (B(Rn ), B(R))-measurable.
Proposition 4.1.4. [Grouping of independent random variables]
Let fk : Ω → R, k = 1, 2, 3, ... be independent random variables. Assume
Borel functions gi : Rni → R for i = 1, 2, ... and ni ∈ {1, 2, ...}. Then the random variables g1 (f1 , ..., fn1 ), g2 (fn1 +1 , ..., fn1 +n2 ), g3 (fn1 +n2 +1 , ..., fn1 +n2 +n3 ),
... are independent.
The proof is an exercise.
Proposition 4.1.5. [independence and product of laws] Assume that
(Ω, F, P) is a probability space and that f, g : Ω → R are random variables
with laws Pf and Pg and distribution-functions Ff and Fg , respectively. Then
the following assertions are equivalent:
(1) f and g are independent.
(2) P ((f, g) ∈ B) = (Pf ⊗ Pg )(B) for all B ∈ B(R2 ).
(3) P(f ≤ x, g ≤ y) = Ff (x)Fg (y) for all x, y ∈ R.
The proof is an exercise.
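The following is a minimal empirical sketch (not part of the notes) of assertion (3) in Python with NumPy; the normal samples, the sample size and the cut-offs x, y are arbitrary illustration choices.

import numpy as np

rng = np.random.default_rng(1)
n = 200_000
f = rng.normal(size=n)
g = rng.normal(size=n)              # independent of f
h = f + 0.5 * rng.normal(size=n)    # NOT independent of f

x, y = 0.5, -0.3
print(np.mean((f <= x) & (g <= y)), np.mean(f <= x) * np.mean(g <= y))   # approximately equal
print(np.mean((f <= x) & (h <= y)), np.mean(f <= x) * np.mean(h <= y))   # visibly different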
Remark 4.1.6. Assume that there are Riemann-integrable functions pf , pg :
R → [0, ∞) such that
∫_R p_f(x) dx = ∫_R p_g(x) dx = 1,
F_f(x) = ∫_{−∞}^x p_f(y) dy and F_g(x) = ∫_{−∞}^x p_g(y) dy
for all x ∈ R (one says that the distribution-functions F_f and F_g are absolutely continuous with densities p_f and p_g, respectively). Then the independence of f and g is also equivalent to the representation
F_{(f,g)}(x, y) = ∫_{−∞}^x ( ∫_{−∞}^y p_f(u) p_g(v) dv ) du.
In other words: the distribution-function of the random vector (f, g) has a
density which is the product of the densities of f and g.
One often works with sequences of independent random variables f_1, f_2, ... : Ω → R having prescribed distributions. The first mathematical question is whether such independent random variables exist at all; a second one is whether one can construct such sequences. The existence question is easily overlooked. The positive answer is
Proposition 4.1.7. [Realization of independent random variables] Let I = {1, 2, ..., n} or I = N. Given probability measures Pk on B(R),
k ∈ I, there exists a probability space (Ω, F, P) and independent random
variables (π_k)_{k∈I}, π_k : Ω → R, such that the law of π_k is P_k, that means
P(πk ∈ B) = Pk (B)
for all B ∈ B(R).
Proof. (i) Assume that I = {1, 2, ..., n} and take Ω := R^n, F := B(R^n), and P := ⊗_{k=1}^n P_k. As random variables we choose the projections
π_k : Ω → R with π_k(x) := x_k if x = (x_1, . . . , x_n).
The projections πk are measurable since
πk−1 (Bk ) = R × · · · × R × Bk × R × · · · × R ∈ B(Rn )
for all Bk ∈ B(R). Finally,
P(π_1 ∈ B_1, . . . , π_n ∈ B_n) = P(B_1 × · · · × B_n) = (⊗_{k=1}^n P_k)(B_1 × · · · × B_n) = ∏_{k=1}^n P_k(B_k)
and, by fixing k and letting B_l = R for l ≠ k, we have
P(π_k ∈ B_k) = P(π_1 ∈ R, . . . , π_{k−1} ∈ R, π_k ∈ B_k, π_{k+1} ∈ R, . . . , π_n ∈ R)
            = P_1(R) · · · P_{k−1}(R) P_k(B_k) P_{k+1}(R) · · · P_n(R)
            = P_k(B_k),
which proves our statement for a finite index set I.
(ii) Now let us turn to the case I = N. As measurable space we choose
(Ω, F) = (R^N, B(R^N)) with
R^N := {x = (x_k)_{k=1}^∞ : x_k ∈ R}
and B(R^N) being the smallest σ-algebra containing all cylinder sets
R × · · · × R × B_k × R × R × · · ·
with B_k ∈ B(R). We again use the projections (π_k)_{k=1}^∞ defined by π_k(x) := x_k. Then the smallest σ-algebra such that all these projections are random variables is B(R^N), so that we have
B(R^N) = σ(π_n^{-1}(B) : n ∈ N, B ∈ B(R)).
Using exactly the same argument as in (i) (the reader should check this) the
proof is complete if we find a probability measure P on F such that
P(B1 × B2 × · · · × Bn × R × R · · · ) = P1 (B1 ) · · · Pn (Bn ).
This measure we will denote by ⊗_{k=1}^∞ P_k.
We use Carathéodory’s theorem to construct this infinite product measure.
As algebra we take
A = {finite disjoint unions of B1 × · · · × Bn × R × R × · · · :
where Bk = (ak , bk ] or (ak , ∞) and − ∞ ≤ ak ≤ bk < ∞}.
It is clear that A is an algebra and that σ(A) = B(R^N). Define the 'pre-measure' P_0 on A by
P_0(B_1 × · · · × B_n × R × R × · · ·) = ∏_{k=1}^n P_k(B_k).
We skip the formal check that P0 is well-defined. But we indicate a proof
that P_0 is σ-additive on A. Here we have to show that
P_0(⋃_{l=1}^∞ A_l) = Σ_{l=1}^∞ P_0(A_l) for A_1, A_2, ... ∈ A with ⋃_{l=1}^∞ A_l ∈ A
whenever A1 , A2 , ... are pair-wise disjoint. It follows from the properties of
finite product measures and the monotonicity of P_0 that
Σ_{l=1}^L P_0(A_l) ≤ P_0(⋃_{l=1}^L A_l) ≤ P_0(⋃_{l=1}^∞ A_l).
By L ↑ ∞ this implies
Σ_{l=1}^∞ P_0(A_l) ≤ P_0(⋃_{l=1}^∞ A_l).
Assume that the above inequality is strict, i.e. there is some δ > 0 such that
P_0(⋃_{l=1}^∞ A_l) ≥ Σ_{l=1}^∞ P_0(A_l) + δ.
Defining
B_L := (⋃_{l=1}^∞ A_l) \ (A_1 ∪ · · · ∪ A_L)
one gets B_1 ⊇ B_2 ⊇ · · · and ⋂_{l=1}^∞ B_l = ∅. With this notation the above assumption translates into
P_0(B_L) ≥ δ for all L = 1, 2, ...
By induction one can define a sequence of sets Cn such that
• Cn ⊆ Bn ,
• C1 ⊇ C2 ⊇ · · · ,
• Cn is a finite union of sets of type [a1 , b1 ] × · · · × [aK , bK ] × R × R × · · · .
To see that such a choice of (C_n)_{n=1}^∞ is possible take for example C_1 ⊆ B_1 with P_0(B_1 \ C_1) ≤ δ/8. From B_2 ⊆ B_1 and P(B_2 ∩ C_1) = P(B_2) − P(B_2 \ C_1) we conclude
P(B_2 ∩ C_1) ≥ δ − δ/8.
We continue choosing C_2 ⊆ B_2 ∩ C_1 with P_0((B_2 ∩ C_1) \ C_2) ≤ δ/16. This implies
P(B_3 ∩ C_2) ≥ δ − δ/8 − δ/16
etc. Then the sequence π_1(C_l), l ∈ N, consists of compact non-empty subsets of R (or it is R itself) and therefore ⋂_{l=1}^∞ π_1(C_l) is also a compact non-empty set. We choose y_1 ∈ D_1 := ⋂_{l=1}^∞ π_1(C_l) and consider the set
C_n(y_1) := {x ∈ C_n : x_1 = y_1},
called the y_1-section of C_n. The sets {(x_k)_{k=2}^∞ : x ∈ C_n(y_1)} are of the same structure as the C_n, therefore we get
⋂_{l=1}^∞ π_2(C_l(y_1)) ≠ ∅.
We choose y2 from this intersection and repeat the argument: We define
Cn (y1 , y2 ) := {x ∈ Cn (y1 ) : x2 = y2 } and consider the intersection of
π3 (Cn (y1 , y2 )), n ∈ N, from which we choose y3 . Hence, since the resulting
sequence (y_k)_{k=1}^∞ is an element of each C_n, we get
∅ = ⋂_{l=1}^∞ B_l ⊇ ⋂_{l=1}^∞ C_l ≠ ∅,
which is a contradiction.
4.2
Zero-One laws
It is known that Σ_{n=1}^∞ 1/n = ∞ but that Σ_{n=1}^∞ (−1)^n (1/n) does converge. What is going to happen if we take a random sequence of signs?
Definition 4.2.1. For p ∈ (0, 1) we denote by ε_1^{(p)}, ε_2^{(p)}, ... : Ω → R a sequence of independent random variables such that
P(ε_n^{(p)} = −1) = p and P(ε_n^{(p)} = 1) = 1 − p.
If p = 1/2 we write ε_n = ε_n^{(1/2)} and call this sequence a sequence of Bernoulli 1 random variables.
Now we are interested in
P(Σ_{n=1}^∞ ε_n/n converges).
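The following is a minimal simulation sketch (not part of the notes) of this random signed harmonic series in Python with NumPy; the seed, the truncation level and the number of paths are arbitrary illustration choices. By Corollary 4.2.8 below the probability of convergence is 0 or 1; the simulation suggests that each path settles down.

import numpy as np

rng = np.random.default_rng(2)
N = 100_000
for _ in range(3):
    eps = rng.choice([-1.0, 1.0], size=N)
    partial_sums = np.cumsum(eps / np.arange(1, N + 1))
    print(partial_sums[-3:])   # the last partial sums barely move any more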
Remark 4.2.2. Let ξ1 , ξ2 , ... : Ω → R be random variables over (Ω, F, P).
Then
A := {ω ∈ Ω : Σ_{n=1}^∞ ξ_n(ω) converges} ∈ F.
This is not difficult to verify because ω ∈ A if and only if
ω ∈ ⋂_{N=1,2,...} ⋃_{n_0=1,2,...} ⋂_{m>n≥n_0} {ω ∈ Ω : |Σ_{k=n+1}^m ξ_k(ω)| < 1/N}.
What is a typical property of the set A = {ω ∈ Ω : Σ_{n=1}^∞ ξ_n(ω) converges}? The condition does not depend on the first realizations ξ_1(ω), ..., ξ_N(ω), since the convergence of Σ_{n=1}^∞ ξ_n(ω) and of Σ_{n=N+1}^∞ ξ_n(ω) are equivalent, so that
A = {ω ∈ Ω : Σ_{n=N+1}^∞ ξ_n(ω) converges}.
We shall formulate this in a more abstract way. For this we need
Definition 4.2.3. Let (Ω, F, P) be a probability space and ξα : Ω → R be
a family of random variables. Then σ(ξα : α ∈ I) is the smallest σ-algebra
which contains all sets of form
{ω ∈ Ω : ξα (ω) ∈ B}
where α ∈ I and B ∈ B(R).
Exercise 4.2.4. Show that σ(ξα : α ∈ I) is the smallest σ-algebra on Ω such
that all random variables ξα : Ω → R are measurable.
1
Jacob Bernoulli, 27/12/1654 (Basel, Switzerland)- 16/08/1705 (Basel, Switzerland),
Swiss mathematician.
Example 4.2.5. For random variables ξ_1, ξ_2, ... : Ω → R and
A = {ω ∈ Ω : Σ_{n=1}^∞ ξ_n(ω) converges}
one has that A ∈ ⋂_{N=1}^∞ σ(ξ_N, ξ_{N+1}, ξ_{N+2}, ...).
The above example leads us straight to the famous Zero-One law of Kolmogorov 2 :
Proposition 4.2.6 (Zero-One law of Kolmogorov). Assume independent random variables ξ_1, ξ_2, ... : Ω → R on some probability space (Ω, F, P) and let
F_N^∞ := σ(ξ_N, ξ_{N+1}, ...) and F^∞ := ⋂_{N=1}^∞ F_N^∞.
Then P(A) ∈ {0, 1} for all A ∈ F^∞.
For the proof the following lemma is needed:
Lemma 4.2.7. Let A ⊆ F be an algebra and let σ(A) = F. Then, for all
ε > 0 and B ∈ F, there is an A ∈ A such that
P(A∆B) < ε.
Proof of Proposition 4.2.6. The idea of the proof is to show that P(A) =
P(A)^2. Define the algebra
A := ⋃_{n=1}^∞ σ(ξ_1, ..., ξ_n).
We have that F ∞ ⊆ σ(A). Hence Lemma 4.2.7 implies that for A ∈ F ∞
there are An ∈ σ(ξ1 , ..., ξNn ) such that
P(A∆An ) →n→∞ 0.
We get also that
P(An ∩ A) →n→∞ P(A) and P(An ) →n→∞ P(A).
The first relation can be seen as follows: since
P(An ∩ A) + P(An ∆A) = P(An ∪ A) ≥ P(A)
we get that
lim inf_n P(A_n ∩ A) ≥ P(A) ≥ lim sup_n P(A_n ∩ A).
2
Andrey Nikolaevich Kolmogorov 25/04/1903 (Tambov, Russia) - 20/10/1987 (Moscow, Russia), one of the founders of modern probability theory, Wolf prize 1980.
The second relation can also be checked easily. But now we get, by independence, that
P(A) = lim_n P(A ∩ A_n) = lim_n P(A)P(A_n) = P(A)^2,
so that P(A) ∈ {0, 1}.
Corollary 4.2.8. Let ξ1 , ξ2 , ... : Ω → R be independent random variables
over (Ω, F, P). Then
P(Σ_{n=1}^∞ ξ_n converges) ∈ {0, 1}.
We also obtain an early version of the law of the iterated logarithm (LIL).
But for this we need the central limit theorem (CLT) for the Bernoulli
random variables, which we will not prove here.
Proposition 4.2.9 (CLT). Let ε1 , ε2 , ... : Ω → R be independent Bernoulli
random variables and c ∈ R. Then one has that
lim_{n→∞} P((ε_1 + · · · + ε_n)/√n ≤ c) = ∫_{−∞}^c e^{−x²/2} dx/√(2π).
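The following is a minimal empirical sketch (not part of the notes) of this central limit statement in Python with NumPy; the sample size, number of trials, seed and cut-off c are arbitrary illustration choices.

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
n, trials, c = 1_000, 10_000, 1.0
eps = rng.integers(0, 2, size=(trials, n), dtype=np.int8) * 2 - 1   # Bernoulli signs +/-1
lhs = np.mean(eps.sum(axis=1, dtype=np.int64) / sqrt(n) <= c)
rhs = 0.5 * (1.0 + erf(c / sqrt(2.0)))   # = integral_{-inf}^{c} e^{-x^2/2} dx / sqrt(2 pi)
print(lhs, rhs)                          # both close to 0.841...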
As a second application of Kolmogorov’s Zero-One law we get
Corollary 4.2.10. Let ε1 , ε2 , ... : Ω → R be independent Bernoulli random
variables. Then one has that
P(lim sup_n f_n/√n = ∞, lim inf_n f_n/√n = −∞) = 1,
where f_n := ε_1 + · · · + ε_n.
Proof. The assertion is equivalent to
P(lim sup_n f_n/√n = ∞) = 1 and P(lim inf_n f_n/√n = −∞) = 1.
By symmetry it is sufficient to prove the first equality only. Letting
A_c := {lim sup_n f_n/√n ≥ c}
for c ≥ 0 we get that
P(lim sup_n f_n/√n = ∞) = P(⋂_{m=1}^∞ A_m) = lim_m P(A_m)
because of A_m ⊇ A_{m+1}. Hence we have to prove that P(A_m) = 1. Since
lim sup_n (ε_1(ω) + · · · + ε_n(ω))/√n = lim sup_{n→∞, n≥N} [(ε_1(ω) + · · · + ε_{N−1}(ω))/√n + (ε_N(ω) + · · · + ε_n(ω))/√n]
                                      = lim sup_{n→∞, n≥N} (ε_N(ω) + · · · + ε_n(ω))/√n
because of
lim_{n→∞, n≥N} (ε_1(ω) + · · · + ε_{N−1}(ω))/√n = 0,
where we assume that N ≥ 2, we get that
A_m ∈ ⋂_{N=1}^∞ σ(ε_N, ε_{N+1}, ...).
Consequently, in order to prove P(A_m) = 1 we are in a position to apply Kolmogorov's Zero-One law (Proposition 4.2.6) and the only thing to verify is
P(A_m) > 0.
To get this, we first observe that
A_m = {lim sup_n f_n/√n ≥ m} ⊇ ⋂_{n=1}^∞ ⋃_{k=n}^∞ {f_k/√k ≥ m},
since ω ∈ ⋂_{n=1}^∞ ⋃_{k=n}^∞ {f_k/√k ≥ m} implies that for all n = 1, 2, ... there is a k ≥ n such that f_k(ω)/√k ≥ m, which gives a subsequence (k_l)_{l=1}^∞ with f_{k_l}(ω)/√(k_l) ≥ m and finally lim sup_n f_n(ω)/√n ≥ m. Applying Fatou 3 's lemma we derive from that
P(A_m) ≥ P(⋂_{n=1}^∞ ⋃_{k=n}^∞ {f_k/√k ≥ m}) ≥ lim sup_n P(f_n/√n ≥ m).
Since the central limit theorem (Proposition 4.2.9) gives
lim_n P(f_n/√n ≥ m) = ∫_m^∞ e^{−x²/2} dx/√(2π) > 0,
we are done.
3
Pierre Joseph Louis Fatou, 28/02/1878 (Lorient, France) - 10/08/1929 (Pornichet,
France), French mathematician, dynamical systems, Mandelbrot set.
Now let us turn to a ruin problem. Let p, q ∈ (0, 1), p + q = 1, and f_n := Σ_{i=1}^n ε_i^{(p)}, where n = 1, 2, ... and the random variables ε_i^{(p)} were defined in Definition 4.2.1. Consider the event
A := {ω : #{n : f_n(ω) = 0} = ∞}.
In words, A is the event that the path of the process (f_n)_{n=1}^∞ reaches 0 infinitely often. We would like to have a Zero-One law for this event. However, we are not able to apply Kolmogorov's Zero-One law (Proposition 4.2.6). But what is the typical property of A? We can rearrange finitely many elements of the sum Σ_{i=1}^n ε_i^{(p)} and we will get the same event since, from some N on, this rearrangement does not influence the sum anymore.
To give a formal definition of this property we need
Definition 4.2.11. A map π : N → N is called finite permutation if
(i) the map π is a bijection,
(ii) there is some N ∈ N such that π(n) = n for all n ≥ N .
Moreover, we recall that B(RN ) is the smallest σ-algebra on RN which contains all cylinder sets Z of form
Z := {(ξ1 , ξ2 , ...) : an < ξn < bn , n = 1, 2, ...}
for some −∞ < an < bn < ∞. Now the symmetry property, needed for the
next Zero-One law, looks as follows:
Definition 4.2.12. Let (ξ_n)_{n=1}^∞, ξ_n : Ω → R, be a sequence of independent random variables over a probability space (Ω, F, P). Let B ∈ B(R^N) and
A := {ω ∈ Ω : (ξ_1(ω), ξ_2(ω), ...) ∈ B}.
The set A is called symmetric if for all finite permutations π : N → N one has that
A = {ω ∈ Ω : (ξ_{π(1)}(ω), ξ_{π(2)}(ω), ...) ∈ B}.
A typical set B which serves as basis for the set A is given by
Example 4.2.13. We let B ∈ B(R^N) be the set of all sequences (ξ_n)_{n=1}^∞, ξ_n ∈ {−1, 1}, such that
#{n : ξ_1 + · · · + ξ_n = 0} = ∞.
The next Zero-One law works for identically distributed random variables.
Recall that a sequence of random variables ξ1 , ξ2 , ... : Ω → R over some
probability space (Ω, F, P) is identically distributed provided that
P(ξk > λ) = P(ξl > λ)
for all k, l ∈ N and λ ∈ R.
Proposition 4.2.14 (Zero-One law of Hewitt and Savage). 4
Assume a sequence of independent and identically distributed random variables
ξ1 , ξ2 , ... : Ω → R over some probability space (Ω, F, P). If the event A ∈ F
is symmetric, then P(A) ∈ {0, 1}.
Proof. (a) We are going to use the permutations
π_n(k) := n + k for 1 ≤ k ≤ n, π_n(k) := k − n for n + 1 ≤ k ≤ 2n, and π_n(k) := k for k > 2n.
Now we approximate the set A. By Lemma 4.2.7 we find Bn ∈ B(Rn ) such
that
P (An ∆A) →n 0 for An := {ω : (ξ1 (ω), . . . , ξn (ω)) ∈ Bn } .
(b) Our goal is to show that
P (An ) → P(A),
P (πn (An )) → P(A),
P (An ∩ πn (An )) → P(A),
as n → ∞, where
π_n(A_n) := {ω : (ξ_{π_n(k)}(ω))_{k=1}^n ∈ B_n} = {ω : (ξ_{n+1}(ω), . . . , ξ_{2n}(ω)) ∈ B_n}.
(c) Assuming for a moment that (b) is proved we derive
P (An ∩ πn (An )) = P(An )P(πn (An ))
since An is a condition on ξ1 , ..., ξn , πn (An ) is a condition on ξn+1 , ..., ξ2n , and
ξ1 , ..., ξ2n are independent. By n → ∞ this equality turns into
P(A) = P(A)P(A)
so that P(A) ∈ {0, 1}.
4
Leonard Jimmie Savage 20/11/1917 (Detroit, USA) - 1/11/1971 (New Haven, USA),
American mathematician and statistician.
(d) Now we prove (b). The convergence
P(An ) → P(A)
is obvious since P(An ∆A) → 0. This implies
P(πn (An )) → P(A)
since P(π_n(A_n)) = P(A_n), which follows from the fact that
(ξ_{n+1}, ..., ξ_{2n}, ξ_1, ..., ξ_n) and (ξ_1, ..., ξ_n, ξ_{n+1}, ..., ξ_{2n})
have the same law in R^{2n} as a consequence of the fact that we have an independent and identically distributed sequence of random variables. Finally
P(A∆An ) = P (πn (A∆An ))
= P (πn (A)∆πn (An ))
= P (A∆πn (An ))
where the first equality follows because the random variables (ξ1 , ξ2 , ...) have
the same distribution and the last equality is a consequence of the symmetry
of A. Hence
P(A∆A_n) →_n 0 and P(A∆π_n(A_n)) →_n 0,
which implies
P(A∆(A_n ∩ π_n(A_n))) →_n 0 and P(A_n ∩ π_n(A_n)) →_n P(A).
As an application we consider
Proposition 4.2.15. Let ε_1, ε_2, ... : Ω → R be independent Bernoulli random variables and f_n = Σ_{i=1}^n ε_i, n = 1, 2, ... Then
P(#{n : f_n = 0} = ∞) = 1.
Proof. Consider the sets
A+ := {ω ∈ Ω : # {n : fn (ω) = 0} < ∞} ∩ {ω ∈ Ω : lim inf fn (ω) > 0} ,
A := {ω ∈ Ω : # {n : fn (ω) = 0} = ∞} ,
A− := {ω ∈ Ω : # {n : fn (ω) = 0} < ∞} ∩ {ω ∈ Ω : lim sup fn (ω) < 0} .
Since the random walk is symmetric we have
P(A+ ) = P(A− ).
Moreover, A+ , A, and A− are symmetric, so that
P(A+ ), P(A), P(A− ) ∈ {0, 1}
by the Zero-One law of Hewitt-Savage. As the only solution to that we
obtain
P(A) = 1 and P(A+ ) = P(A− ) = 0.
Other examples for which the Zero-One law of Kolmogorov or Hewitt
and Savage might be applied are given in
Example 4.2.16. Let us illustrate by some additional very simple examples
the difference of the assumptions for the Zero-One law of Kolmogorov and
the Zero-One law of Hewitt-Savage. Assume a sequence of independent
and identically distributed random variables ξ_1, ξ_2, ... : Ω → R over some probability space (Ω, F, P) and that F^∞ = ⋂_{n=1}^∞ σ(ξ_n, ξ_{n+1}, . . .).
(a) Given B ∈ B (R) we have that {lim supn ξn ∈ B} ∈ F ∞ . Here we do
not need the same distribution of the random variables ξ1 , ξ2 , ...
(b) The set A := {ω ∈ Ω : ξ_n(ω) = 0 for all n = 1, 2, . . .} does not belong to F^∞ but is symmetric, because
A = {ω ∈ Ω : (ξ_n(ω))_{n=1}^∞ ∈ B} = {ω ∈ Ω : (ξ_{π(n)}(ω))_{n=1}^∞ ∈ B}
for B = {(0, 0, 0, . . . )} ∈ B(R^N).
(c) The set A := {ω ∈ Ω : Σ_{n=1}^∞ |ξ_n(ω)| exists and Σ_{n=1}^∞ ξ_n(ω) < 1} does not belong to F^∞ but is symmetric, because
A = {ω ∈ Ω : (ξ_n(ω))_{n=1}^∞ ∈ B} = {ω ∈ Ω : (ξ_{π(n)}(ω))_{n=1}^∞ ∈ B}
if B is the set of all summable sequences with sum strictly less than one.
We finish by the non-symmetric random walk. As preparation we need Stirling 5 's formula
n! = √(2πn) (n/e)^n e^{θ/(12n)}
for n = 1, 2, ... and some θ ∈ (0, 1) depending on n. This gives
(2n)!/(n!)^2 = [√(2π·2n) (2n/e)^{2n} e^{θ_1/(12·2n)}] / [(√(2πn))^2 (n/e)^{2n} e^{2θ_2/(12n)}] ∼ 4^n/√(πn).
5
James Stirling May 1692 (Garden, Scotland) - 5/12/1770 (Edinburgh, Scotland), Scottish mathematician whose most important work Methodus Differentialis in 1730 is a treatise on infinite series, summation, interpolation and quadrature.
Proposition 4.2.17. Let p ≠ 1/2 and f_n = Σ_{i=1}^n ε_i^{(p)}, n = 1, 2, ..., where the random variables ε_1^{(p)}, ε_2^{(p)}, ... : Ω → R are given by Definition 4.2.1. Then
P(#{n : f_n = 0} = ∞) = 0.
Proof. Letting B_n := {f_{2n} = 0} we obtain
P(B_n) = ((2n)!/(n!)^2) (pq)^n ∼ (4pq)^n/√(πn)
by Stirling's formula. Since p ≠ q gives 4pq < 1, we have
Σ_{n=1}^∞ P(B_n) < ∞,
so that the Lemma of Borel-Cantelli implies that
P(ω ∈ B_n infinitely often) = 0.
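The following is a minimal simulation sketch (not part of the notes), in Python with NumPy, contrasting Propositions 4.2.15 and 4.2.17: it counts the visits of the random walk f_n to zero for p = 1/2 and for p ≠ 1/2. The seed, time horizon and the value p = 0.6 are arbitrary illustration choices.

import numpy as np

rng = np.random.default_rng(4)
for p in (0.5, 0.6):
    steps = np.where(rng.random(1_000_000) < p, -1, 1)   # P(step = -1) = p, as in Definition 4.2.1
    walk = np.cumsum(steps)
    print(p, np.count_nonzero(walk == 0))   # keeps growing for p = 1/2, stabilises for p = 0.6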
Chapter 5
Integration
Given a measure space (Ω, F, µ) and a Borel-measurable map f : Ω → R,
we define the Lebesgue-integral
∫_Ω f dµ = ∫_Ω f(ω) dµ(ω)
and investigate its basic properties.
5.1
Definition of the Lebesgue-integral
The definition of the integral is done within three steps.
Definition 5.1.1. [step one, f is a non-negative step-function] Given a measure space (Ω, F, µ) and a Borel-measurable g : Ω → R with representation
g = Σ_{i=1}^n α_i 1I_{A_i}     (5.1)
where α_i ≥ 0, A_i ∈ F, and Ω = ⋃_{i=1}^n A_i with A_i ∩ A_j = ∅ for i ≠ j, we let
∫_Ω g dµ = ∫_Ω g(ω) dµ(ω) := Σ_{i=1}^n α_i µ(A_i) ∈ [0, ∞].
Note that in this definition the case ∫_Ω g dµ = ∞ is allowed. We have to check that ∫_Ω g dµ is well-defined, since it might be that different representations of g give different integrals ∫_Ω g dµ. However, this is not the case:
Lemma 5.1.2. Assume two representations
g = Σ_{i=1}^n α_i 1I_{A_i} = Σ_{j=1}^m β_j 1I_{B_j}
of form (5.1). Then one has that Σ_{i=1}^n α_i µ(A_i) = Σ_{j=1}^m β_j µ(B_j).
Proof. By taking all possible intersections of the sets Ai and Bj we find a
system of sets C1 , ..., CN ∈ F such that
(a) C_j ∩ C_k = ∅ if j ≠ k,
(b) ⋃_{j=1}^N C_j = Ω,
(c) for all A_i there is a set I_i ⊆ {1, ..., N} such that A_i = ⋃_{k∈I_i} C_k,
(d) for all B_j there is a set J_j ⊆ {1, ..., N} such that B_j = ⋃_{k∈J_j} C_k.
Now we get that
Σ_{i: k∈I_i} α_i = Σ_{j: k∈J_j} β_j
if C_k ≠ ∅, and
Σ_{i=1}^n α_i µ(A_i) = Σ_{i=1}^n Σ_{k∈I_i} α_i µ(C_k) = Σ_{k=1}^N (Σ_{i: k∈I_i} α_i) µ(C_k)
                    = Σ_{k=1}^N (Σ_{j: k∈J_j} β_j) µ(C_k) = Σ_{j=1}^m β_j µ(B_j).
j:k∈Jj
Definition 5.1.3. [step two, f is non-negative] Given a measure space (Ω, F, µ) and a Borel-measurable map f : Ω → R with f(ω) ≥ 0 for all ω ∈ Ω. Then
∫_Ω f dµ = ∫_Ω f(ω) dµ(ω) := sup { ∫_Ω g dµ : 0 ≤ g(ω) ≤ f(ω), g is a measurable step-function }.
In the last step we will define the Lebesgue-integral for a general Borel-measurable map. To this end we decompose f : Ω → R into its positive and
negative part
f (ω) = f + (ω) − f − (ω)
with
f + (ω) := max {f (ω), 0} ≥ 0 and f − (ω) := max {−f (ω), 0} ≥ 0.
Definition 5.1.4. [step three, f is general] Let (Ω, F, µ) be a measure
space and f : Ω → R be a Borel-measurable map.
(1) If ∫_Ω f^+ dµ < ∞ or ∫_Ω f^− dµ < ∞, then we say that the Lebesgue-integral ∫_Ω f dµ exists and set
∫_Ω f dµ := ∫_Ω f^+ dµ − ∫_Ω f^− dµ ∈ [−∞, ∞].
(2) The map f is called integrable provided that
∫_Ω f^+ dµ < ∞ and ∫_Ω f^− dµ < ∞.
(3) If f is integrable and A ∈ F, then
∫_A f dµ = ∫_A f(ω) dµ(ω) := ∫_Ω f(ω) 1I_A(ω) dµ(ω).
In case that µ = P is a probability measure, ∫_Ω f dP is called expectation or expected value of the random variable f, and is also denoted by Ef or E_P f.
Remark 5.1.5. It follows from the very definition, and is left as an exercise, that if f, g : Ω → R are Borel-measurable with µ({ω ∈ Ω : f(ω) ≠ g(ω)}) = 0 and if ∫ f dµ ∈ R ∪ {−∞, ∞} exists, then ∫ g dµ exists and one has that ∫ g dµ = ∫ f dµ.
5.2
Theorem about monotone convergence
The first basic theorem about integration is Theorem 5.2.2 below about
monotone convergence. It is used in two different directions: firstly we need
the theorem later to derive basic properties of the Lebesgue integral, secondly
it opens the way to compute various concrete examples. We start with an
elementary version:
Lemma 5.2.1. Let (Ω, F, µ) be a measure space and let f_n, f : Ω → R be non-negative Borel-measurable maps such that the f_n are step-functions and 0 ≤ f_n(ω) ↑ f(ω) for all ω ∈ Ω as n → ∞. Then
∫_Ω f dµ = lim_{n→∞} ∫_Ω f_n dµ.
Proof. (a) First we assume that f = 1I_A for some A ∈ F. Let ε ∈ (0, 1) and
B_n^ε := {ω ∈ A : 1 − ε ≤ f_n(ω)}.
Then
(1 − ε) 1I_{B_n^ε}(ω) ≤ f_n(ω) ≤ 1I_A(ω).
Since B_n^ε ⊆ B_{n+1}^ε and ⋃_{n=1}^∞ B_n^ε = A we get, by the continuity of µ from below, that lim_n µ(B_n^ε) = µ(A), so that
(1 − ε) µ(A) ≤ lim_n ∫_Ω f_n dµ.
Since this is true for all ε > 0 we get
∫_Ω f dµ = µ(A) ≤ lim_n ∫_Ω f_n dµ ≤ ∫_Ω f dµ.
(b) From (a) one can immediately deduce the case that f is a step-function.
(c) Now let us assume that f is general. Let
f_n^0(ω) := Σ_{k=0}^{4^n − 1} (k/2^n) 1I_{{k/2^n ≤ f < (k+1)/2^n}}(ω),
so that 0 ≤ f_n^0(ω) ↑ f(ω) for all ω ∈ Ω. By the definition of the Lebesgue-integral there exists a sequence 0 ≤ g_n(ω) ≤ f(ω) of measurable step-functions such that ∫_Ω g_n dµ ↑ ∫_Ω f dµ. Hence
h_n := max{f_n^0, g_1, . . . , g_n}
is a measurable step-function with 0 ≤ g_n(ω) ≤ h_n(ω) ↑ f(ω), so that
∫ g_n dµ ≤ ∫ h_n dµ ≤ ∫ f dµ and lim_{n→∞} ∫ g_n dµ = lim_{n→∞} ∫ h_n dµ = ∫ f dµ.
Consider
d_{k,n} := f_k ∧ h_n.
Clearly, d_{k,n} ↑ f_k as n → ∞ and d_{k,n} ↑ h_n as k → ∞. It may happen that lim_k lim_n ∫ d_{k,n} dµ = ∞. Therefore we use the transformation with arctan,
z_{k,n} := arctan(∫ d_{k,n} dµ),
so that 0 ≤ z_{k,n} ≤ π/2. Since (z_{k,n})_{k=1}^∞ is increasing for fixed n and (z_{k,n})_{n=1}^∞ is increasing for fixed k, one quickly checks that
lim_k lim_n z_{k,n} = lim_n lim_k z_{k,n}.
Hence it also holds lim_k lim_n ∫ d_{k,n} dµ = lim_n lim_k ∫ d_{k,n} dµ. Step (a) implies ∫ h_n dµ = lim_k ∫ d_{k,n} dµ and lim_n ∫ d_{k,n} dµ = ∫ f_k dµ. Collecting all these results we get
∫ f dµ = lim_n ∫ h_n dµ = lim_n lim_k ∫ d_{k,n} dµ = lim_k lim_n ∫ d_{k,n} dµ = lim_k ∫ f_k dµ.
Theorem 5.2.2. [monotone convergence] Let (Ω, F, µ) be a measure
space and f, f1 , f2 , ... : Ω → R be Borel-measurable.
(1) If 0 ≤ f_n(ω) ↑ f(ω) a.e., then lim_n ∫ f_n dµ = ∫ f dµ.
(2) If 0 ≥ f_n(ω) ↓ f(ω) a.e., then lim_n ∫ f_n dµ = ∫ f dµ.
Proof. (a) First suppose
0 ≤ fn (ω) ↑ f (ω)
for all ω ∈ Ω.
For each f_n take a sequence of step functions (f_{n,k})_{k≥1} such that 0 ≤ f_{n,k} ↑ f_n as k → ∞. Setting
h_N := max_{1≤n≤N, 1≤k≤N} f_{n,k},
we get hN −1 ≤ hN ≤ max1≤n≤N fn = fN . Define h := limN →∞ hN . For
1 ≤ n ≤ N it holds that
fn,N ≤ hN ≤ fN
so that, by N → ∞,
fn ≤ h ≤ f,
and therefore
f = lim_{n→∞} f_n ≤ h ≤ f.
Since h_N is a step function for each N and h_N ↑ f, we have by Lemma 5.2.1 that lim_{N→∞} ∫ h_N dµ = ∫ f dµ and therefore, since h_N ≤ f_N,
∫ f dµ ≤ lim_{N→∞} ∫ f_N dµ.
On the other hand, f_n ≤ f_{n+1} ≤ f implies ∫ f_n dµ ≤ ∫ f dµ and hence
lim_{n→∞} ∫ f_n dµ ≤ ∫ f dµ.
(b) Now let 0 ≤ f_n(ω) ↑ f(ω) a.e. By definition, this means that
0 ≤ f_n(ω) ↑ f(ω) for all ω ∈ Ω \ A,
where µ(A) = 0. Hence 0 ≤ f_n(ω)1I_{A^c}(ω) ↑ f(ω)1I_{A^c}(ω) for all ω, and step (a) implies that
lim_n ∫ f_n 1I_{A^c} dµ = ∫ f 1I_{A^c} dµ.
Using Remark 5.1.5 we get ∫ f_n 1I_{A^c} dµ = ∫ f_n dµ and ∫ f 1I_{A^c} dµ = ∫ f dµ, so that we are done.
(c) Assertion (2) follows from (1) since 0 ≥ fn ↓ f implies 0 ≤ −fn ↑ −f.
5.3
Basic properties of the Lebesgue-integral
We say that a property P(ω), depending on ω, holds µ-almost everywhere or almost everywhere (a.e.) if {ω ∈ Ω : P(ω) holds} ∈ F and
µ({ω ∈ Ω : P(ω) does not hold}) = 0.
Let us start with some first properties of the expected value.
Proposition 5.3.1. [properties of the expectation] For a measure space (Ω, F, µ) and Borel-measurable f, g : Ω → R such that ∫ f dµ and ∫ g dµ exist, the following assertions hold:
(1) If f ≤ g a.e., then ∫ f dµ ≤ ∫ g dµ.
(2) The map f is integrable if and only if |f| is integrable. In this case one has
|∫_Ω f dµ| ≤ ∫_Ω |f| dµ.
(3) If ∫ f^+ dµ + ∫ g^+ dµ < ∞ or ∫ f^− dµ + ∫ g^− dµ < ∞, then ∫ (f + g)^+ dµ < ∞ or ∫ (f + g)^− dµ < ∞ and ∫ (f + g) dµ = ∫ f dµ + ∫ g dµ.
(4) If a ∈ R, then ∫ (af) dµ exists and ∫ (af) dµ = a ∫ f dµ.
(5) If f and g are integrable and a, b ∈ R, then af + bg is integrable.
(6) If f ≥ 0 a.e. and ∫_Ω f dµ = 0, then f = 0 a.e.
Proof. (a) We assume f(ω) ≥ 0 and g(ω) ≥ 0 for all ω ∈ Ω. Using the approximations f_n^0 and g_n^0 from the proof of Lemma 5.2.1 we see that
0 ≤ f_n^0 ≤ f, 0 ≤ g_n^0 ≤ g, lim_n f_n^0 = f, lim_n g_n^0 = g.
Hence lim_n (f_n^0 + g_n^0) = f + g as well, and ∫ (f_n^0 + g_n^0) dµ = ∫ f_n^0 dµ + ∫ g_n^0 dµ (because we have step-functions), and Lemma 5.2.1 gives therefore that
∫ (f + g) dµ = ∫ f dµ + ∫ g dµ.
(b) Proof of (3): We have that
(f + g)^+ + f^− + g^− = f^+ + g^+ + (f + g)^−
and, by (a),
∫ (f + g)^+ dµ + ∫ f^− dµ + ∫ g^− dµ = ∫ f^+ dµ + ∫ g^+ dµ + ∫ (f + g)^− dµ.
Under the assumptions of (3) we can rearrange this equation to
∫ (f + g)^+ dµ − ∫ (f + g)^− dµ = ∫ f^+ dµ − ∫ f^− dµ + ∫ g^+ dµ − ∫ g^− dµ
and are done (note that ∫ (f + g)^± dµ ≤ ∫ f^± dµ + ∫ g^± dµ because of (f + g)^± ≤ f^± + g^±).
(c) Proof of (2): Here we note that |f| = f^+ + f^−, which implies by (a) that
∫ |f| dµ = ∫ f^+ dµ + ∫ f^− dµ.
The required inequality reads as |∫ f^+ dµ − ∫ f^− dµ| ≤ ∫ f^+ dµ + ∫ f^− dµ.
(d) Proof of (1): According to Remark 5.1.5 we only have to check the case f(ω) ≤ g(ω) for all ω ∈ Ω, by setting f and g to zero when f(ω) > g(ω). Moreover, ∫ g^− dµ = ∞ gives ∫ g dµ = −∞ and there is nothing to prove. So let us assume that ∫ g^− dµ < ∞ and define h(ω) := g(ω) − f(ω) ≥ 0, so that f + h = g and ∫ f^− dµ < ∞ as well. By (3) we deduce that ∫ g dµ = ∫ f dµ + ∫ h dµ ≥ ∫ f dµ.
(e) Proof of (6): Again by Remark 5.1.5 we can assume that f(ω) ≥ 0 for all ω ∈ Ω. Assume that µ(f > 0) > 0. By considering {f > 0} = ⋃_{n=1}^∞ {f ≥ 1/n} and using the continuity of µ from below we get µ(f > 0) = lim_n µ(f ≥ 1/n) and some n_0 ≥ 1 with µ(f ≥ 1/n_0) > 0. Now
0 ≤ (1/n_0) 1I_{{f ≥ 1/n_0}} ≤ f
and therefore ∫ f dµ ≥ (1/n_0) µ(f ≥ 1/n_0) > 0, which is a contradiction.
(f) Proof of (5): Since (af + bg)^+ ≤ |a||f| + |b||g| and (af + bg)^− ≤ |a||f| + |b||g| we get that af + bg is integrable.
The proof of (4) is left as an exercise.
5.4
Lemma of Fatou and Lebesgue’s theorem
First we extend the Theorem 5.2.2 about monotone convergence:
Theorem 5.4.1. Let (Ω, F, µ) be a measure space and g, f, f_1, f_2, ... : Ω → R be Borel-measurable. If
(1) g(ω) ≤ f_n(ω) ↑ f(ω) a.e. and ∫ g^− dµ < ∞, or
(2) g(ω) ≥ f_n(ω) ↓ f(ω) a.e. and ∫ g^+ dµ < ∞,
then lim_{n→∞} ∫ f_n dµ = ∫ f dµ.
Proof. We only consider (1). Let h_n := f_n − g and h := f − g. Then
0 ≤ h_n(ω) ↑ h(ω) a.e.
Theorem 5.2.2 implies that lim_n ∫ h_n dµ = ∫ h dµ. Since f_n^− and f^− are integrable, Proposition 5.3.1 (1) implies that ∫ h_n dµ = ∫ f_n dµ − ∫ g dµ and ∫ h dµ = ∫ f dµ − ∫ g dµ, so that we are done.
Definition 5.4.2. [extended random variable] Let (Ω, F) be a measurable space. A function f : Ω → R ∪ {−∞, ∞} is called extended random
variable if and only if
(1) f −1 (B) ∈ F for all B ∈ B(R),
(2) f −1 ({+∞}) ∈ F and f −1 ({−∞}) ∈ F.
Given an extended map f , the positive and negative part are defined as before
with the convention that +∞ is a positive number and −∞ is a negative
number. In the following it is convenient to have the Lebesgue integral for
extended Borel-measurable maps. For this we do not need to redo our steps towards the Lebesgue-integral; we use the following definition instead:
Definition 5.4.3. Let (Ω, F, µ) be a measure space and f : Ω → R ∪ {+∞, −∞} be an extended Borel-measurable map.
(1) If f(ω) ∈ [0, ∞] for all ω ∈ Ω, then
∫ f dµ := ∫ 1I_{{f < ∞}} f dµ if µ(f = ∞) = 0, and ∫ f dµ := +∞ if µ(f = ∞) > 0.
(2) If ∫ f^+ dµ < ∞ or ∫ f^− dµ < ∞, then
∫ f dµ := ∫ f^+ dµ − ∫ f^− dµ.
In Definition 1.2.7 we defined lim sup_{n→∞} a_n for a sequence (a_n)_{n=1}^∞ ⊂ R. Naturally, one can extend this definition to a sequence (f_n)_{n=1}^∞ of measurable maps. In particular, we obtain
Lemma 5.4.4. Given a measurable space (Ω, F), for any sequence of Borel-measurable maps f_n : Ω → R one has that the point-wise defined expressions lim inf_{n→∞} f_n and lim sup_{n→∞} f_n are extended Borel-measurable maps.
Proposition 5.4.5. [Lemma of Fatou] Let (Ω, F, µ) be a measure space
and g, f1 , f2 , ... : Ω → R be Borel-measurable.
(1) If g ≤ f_n a.e. and ∫ g^− dµ < ∞, then ∫ (lim inf_{n→∞} f_n)^− dµ < ∞ and
∫ lim inf_{n→∞} f_n dµ ≤ lim inf_{n→∞} ∫ f_n dµ.
(2) If f_n ≤ g a.e. and ∫ g^+ dµ < ∞, then ∫ (lim sup_{n→∞} f_n)^+ dµ < ∞ and
lim sup_{n→∞} ∫ f_n dµ ≤ ∫ lim sup_{n→∞} f_n dµ.
Proof. We only need to prove (1) as (2) can be deduced from (1) by passing from f_n to −f_n. We let Z_k := inf_{n≥k} f_n, so that Z_k ↑ lim inf_n f_n, g ≤ Z_k a.e. and g ≤ lim inf_n f_n a.e. This explains ∫ (lim inf_{n→∞} f_n)^− dµ ≤ ∫ g^− dµ < ∞. Furthermore, applying Theorem 5.4.1 gives that
∫ lim inf_n f_n dµ = lim_n ∫ Z_n dµ = lim_n ∫ inf_{k≥n} f_k dµ ≤ lim_n inf_{k≥n} ∫ f_k dµ = lim inf_n ∫ f_n dµ.
Proposition 5.4.6. [Lebesgue's Theorem, dominated convergence] Let (Ω, F, µ) be a measure space and g, f, f_1, f_2, ... : Ω → R be Borel-measurable with |f_n| ≤ |g| a.e. Assume that g is integrable and that f = lim_{n→∞} f_n a.e. Then f is integrable and one has that
∫ f dµ = lim_n ∫ f_n dµ.
Proof. Applying Fatou's Lemma gives
∫ f dµ = ∫ lim inf_{n→∞} f_n dµ ≤ lim inf_{n→∞} ∫ f_n dµ ≤ lim sup_{n→∞} ∫ f_n dµ ≤ ∫ lim sup_{n→∞} f_n dµ = ∫ f dµ.

5.5
Examples: discrete and continuous
One feature of the Lebesgue-integral is that it provides a unified approach to different mathematical objects, in our case to sums and to the Riemann-integral.
5.5.1
Countable measure spaces
Let Ω be finite or countably infinite, F = 2^Ω, assume p_ω ∈ [0, ∞) for ω ∈ Ω, and define the measure µ by
µ(A) := Σ_{ω∈A} p_ω.
Any function f : Ω → R is Borel-measurable, and for a non-negative f we have
∫ f dµ = Σ_{ω∈Ω} f(ω) p_ω.
The Lebesgue-integral of the map f : Ω → R exists if and only if
Σ_{{ω : f(ω)≥0}} f(ω) p_ω < ∞ or Σ_{{ω : f(ω)≤0}} f(ω) p_ω > −∞.
If the Lebesgue-integral exists, then it computes to
∫ f dµ = Σ_{{ω : f(ω)≥0}} f(ω) p_ω + Σ_{{ω : f(ω)≤0}} f(ω) p_ω.
The map f is integrable if and only if Σ_{ω∈Ω} |f(ω)| p_ω < ∞.
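On a finite Ω the Lebesgue-integral is therefore nothing but a weighted sum. The following is a minimal sketch (not part of the notes) in Python with NumPy; the weights p_ω and the values of f are arbitrary illustration choices.

import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])        # weights p_omega of the measure mu
f = np.array([2.0, -1.0, 0.5, 3.0])       # an arbitrary function f on Omega = {0, 1, 2, 3}
integrable = np.isfinite(np.sum(np.abs(f) * p))
print(integrable, np.sum(f * p))          # the Lebesgue-integral of f with respect to mu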
5.5.2
The Riemann integral
For the following let −∞ < a < b < ∞. We call P = (t_n)_{n=0}^N a partition of [a, b] provided that
a = t_0 < · · · < t_N = b.
For a bounded function f : [a, b] → R we define the upper and lower sums by
U_P(f) := Σ_{i=1}^N [sup_{t∈[t_{i−1}, t_i)} f(t)] (t_i − t_{i−1}),
L_P(f) := Σ_{i=1}^N [inf_{t∈[t_{i−1}, t_i)} f(t)] (t_i − t_{i−1}),
and the corresponding upper and lower integrals
U(f) := inf_P U_P(f), L(f) := sup_P L_P(f).
The function f is called Riemann-integrable provided that U(f) = L(f). In this case we let
R-∫_a^b f(t) dt := U(f) = L(f)
be the Riemann-integral.
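The following is a minimal numerical sketch (not part of the notes) of the upper and lower sums in Python with NumPy; the function f(t) = t² on [0, 1] and the equidistant partitions are arbitrary illustration choices.

import numpy as np

f = lambda t: t**2
a, b = 0.0, 1.0
for N in (10, 100, 1000):
    t = np.linspace(a, b, N + 1)
    # f is increasing on [0, 1], so the sup/inf over [t_{i-1}, t_i) are f(t_i) and f(t_{i-1})
    upper = np.sum(f(t[1:]) * np.diff(t))
    lower = np.sum(f(t[:-1]) * np.diff(t))
    print(N, lower, upper)    # both approach U(f) = L(f) = 1/3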
Proposition 5.5.1 (Lebesgue-criterion for Riemann-integrability). Let f : [a, b] → R be a bounded function. Then the following assertions are equivalent:
(1) The function f is Riemann-integrable.
(2) There exists a Borel-set N ∈ B(R) with N ⊆ [a, b] and λ(N) = 0 such that f is continuous in all x ∈ [a, b] \ N.
In this case one has that R-∫_a^b f(t) dt = ∫_{[a,b]} f dλ.
Proof. (1) ⇒ (2) We obtain a refining sequence of nets
a = t_0^N < t_1^N < · · · < t_{n_N}^N = b
such that one gets step-functions over [t_{i−1}^N, t_i^N), which are set to f(b) in the point b, such that
1. L_1 ≤ L_2 ≤ · · · ≤ f ≤ · · · ≤ U_2 ≤ U_1,
2. lim_N ∫_{[a,b]} (U_N − L_N) dλ = 0.
Therefore there is a Borel null set N ∈ B(R) with N ⊆ [a, b] such that
lim_N U_N(t) = lim_N L_N(t) = f(t) for t ∈ [a, b] \ N.
Therefore f 1I_{N^c} is Borel-measurable.
(2) =⇒ (1)
Let us consider two examples indicating the difference between the Riemann-integral and the Lebesgue-integral.
Example 5.5.2. We give the standard example of a function that is Lebesgue-integrable but not Riemann-integrable. Let
f(x) := 1 for x ∈ [0, 1] irrational and f(x) := 0 for x ∈ [0, 1] rational.
Then f is not Riemann-integrable but Lebesgue-integrable with Ef = 1 if we use the probability space ([0, 1], B([0, 1]), λ).
Example 5.5.3. The expression
lim_{t→∞} ∫_0^t (sin x)/x dx = π/2
is defined as a limit in the Riemann sense although
∫_0^∞ ((sin x)/x)^+ dx = ∞ and ∫_0^∞ ((sin x)/x)^− dx = ∞.
The point of this example is that the Riemann-integral takes more information into account, here a cancellation property, than the rather abstract Lebesgue-integral.
5.5.3
The Lebesgue-Stieltjes integral
Assume an increasing and right-continuous function F : R → R. Consider the unique σ-finite measure µ on B(R) with
µ((a, b]) := F(b) − F(a).
The measure µ is called Lebesgue-Stieltjes 1 measure and the corresponding integral
∫_R f(t) dF(t) := ∫_R f(t) dµ(t)
the Lebesgue-Stieltjes integral.
5.6
Change of variable formula
We want to prove a change of variable formula for the integrals ∫_Ω g dµ. In many cases it is only by this formula that one can explicitly compute expected values.
Proposition 5.6.1. [Change of variables] Let (Ω, F, µ) be a measure space, (E, E) be a measurable space, f : Ω → E be a measurable map, and ϕ : E → R be Borel-measurable. Assume that µ_f is the image measure of µ with respect to f, that means
µ_f(B) = µ({ω ∈ Ω : f(ω) ∈ B}) = µ(f^{-1}(B)) for all B ∈ E.
Then, for g := ϕ ∘ f : Ω → R,
∫_B ϕ dµ_f = ∫_{f^{-1}(B)} g dµ
for all B ∈ E in the sense that if one integral exists, the other exists as well, and their values are equal.
Proof. By replacing ϕ by ϕ^+ and ϕ^− we can restrict ourselves to non-negative ϕ : E → R. Moreover, letting φ̃(x) := 1I_B(x)ϕ(x) we have
φ̃(f(ω)) = 1I_{f^{-1}(B)}(ω) ϕ(f(ω)),
so that it is sufficient to consider the case B = E. Hence we have to show that
∫_E ϕ(x) dµ_f(x) = ∫_Ω ϕ(f(ω)) dµ(ω).
1
Thomas Jan Stieltjes 29/12/1856 (Zwolle, Overijssel, The Netherlands) - 31/12/1894
(Toulouse, France); analysis, number theory.
Assume now a sequence of measurable step-functions 0 ≤ ϕ_n(x) ↑ ϕ(x) for all x ∈ E, so that ϕ_n(f(ω)) ↑ ϕ(f(ω)) for all ω ∈ Ω as well. If we can show that
∫_E ϕ_n dµ_f = ∫_Ω ϕ_n(f) dµ,
then we are done. By linearity it is enough to check ϕ_n = 1I_B for some B ∈ E. But now we get
∫_E ϕ_n dµ_f = µ_f(B) = µ(f^{-1}(B)) = ∫_Ω 1I_{f^{-1}(B)} dµ = ∫_Ω 1I_B(f) dµ = ∫_Ω ϕ_n(f) dµ.
Let us give an example for the change of variable formula.
Definition 5.6.2. [Moments] Assume that n ∈ {1, 2, ....}.
(1) For a Borel-measurable map f : Ω → R the integral ∫ |f|^n dµ is called n-th absolute moment of f. If ∫ f^n dµ exists, then ∫ f^n dµ is called n-th moment of f.
(2) For a measure µ on (R, B(R)) the integral ∫_R |x|^n dµ(x) is called n-th absolute moment of µ. If ∫_R x^n dµ(x) exists, then ∫_R x^n dµ(x) is called n-th moment of µ.
Corollary 5.6.3. Let (Ω, F, µ) be a measure space and f : Ω → R be Borel-measurable with image law µ_f. Then, for all n = 1, 2, ...,
∫ |f|^n dµ = ∫_R |x|^n dµ_f(x) and ∫ f^n dµ = ∫_R x^n dµ_f(x),
where the latter equality has to be understood as follows: if one side exists, then the other exists as well and they coincide.
5.7
Fubini’s Theorem
In this section we consider iterated integrals, as they appear very often in
applications, and show in Fubini’s 2 Theorem that integrals with respect
2
Guido Fubini, 19/01/1879 (Venice, Italy) - 06/06/1943 (New York, USA).
to product measures can be written as iterated integrals and that one can
change the order of integration in these iterated integrals. In many cases
this provides an appropriate tool for the computation of integrals. Before we
start with Fubini’s Theorem we need some preparations. First we recall the
notion of a vector space.
Theorem 5.7.1. [Fubini's Theorem for non-negative functions] Let (Ω_i, F_i, µ_i), i = 1, 2, be σ-finite measure spaces and f : Ω_1 × Ω_2 → R be a non-negative F_1 ⊗ F_2-measurable function such that
∫_{Ω_1×Ω_2} f(ω_1, ω_2) d(µ_1 ⊗ µ_2)(ω_1, ω_2) < ∞.     (5.2)
Then one has the following:
(1) The functions ω_1 → f(ω_1, ω_2^0) and ω_2 → f(ω_1^0, ω_2) are F_1-measurable and F_2-measurable, respectively, for all ω_i^0 ∈ Ω_i.
(2) The functions
ω_1 → ∫_{Ω_2} f(ω_1, ω_2) dµ_2(ω_2) and ω_2 → ∫_{Ω_1} f(ω_1, ω_2) dµ_1(ω_1)
are extended F_1-measurable and F_2-measurable maps, respectively.
(3) One has that
∫_{Ω_1×Ω_2} f(ω_1, ω_2) d(µ_1 ⊗ µ_2) = ∫_{Ω_1} ( ∫_{Ω_2} f(ω_1, ω_2) dµ_2(ω_2) ) dµ_1(ω_1) = ∫_{Ω_2} ( ∫_{Ω_1} f(ω_1, ω_2) dµ_1(ω_1) ) dµ_2(ω_2).
Remark 5.7.2. It should be noted that item (3) together with Formula (5.2) automatically implies that
µ_2({ω_2 : ∫_{Ω_1} f(ω_1, ω_2) dµ_1(ω_1) = ∞}) = 0
and
µ_1({ω_1 : ∫_{Ω_2} f(ω_1, ω_2) dµ_2(ω_2) = ∞}) = 0.
Proof of Theorem 5.7.1. (a) First we remark it is sufficient to prove the assertions for
fN (ω1 , ω2 ) := min {f (ω1 , ω2 ), N }
which is bounded. The statements (1), (2), and (3) can be obtained via
N → ∞ if we use Proposition 3.1.4 to get the necessary measurabilities
(which also works for our extended random variables) and the monotone
convergence formulated in Theorem 5.2.2 to get to values of the integrals.
Hence we can assume for the following that supω1 ,ω2 f (ω1 , ω2 ) < ∞.
(b) We want to apply the Monotone Class Theorem, Proposition 12.3.6. Let H be the class of bounded F_1 ⊗ F_2-measurable functions f : Ω_1 × Ω_2 → R such that
1. the functions ω_1 → f(ω_1, ω_2^0) and ω_2 → f(ω_1^0, ω_2) are F_1-measurable and F_2-measurable, respectively, for all ω_i^0 ∈ Ω_i,
2. the functions
ω_1 → ∫_{Ω_2} f(ω_1, ω_2) dµ_2(ω_2) and ω_2 → ∫_{Ω_1} f(ω_1, ω_2) dµ_1(ω_1)
are F_1-measurable and F_2-measurable, respectively,
3. one has that
∫_{Ω_1×Ω_2} f(ω_1, ω_2) d(µ_1 ⊗ µ_2) = ∫_{Ω_1} ( ∫_{Ω_2} f(ω_1, ω_2) dµ_2(ω_2) ) dµ_1(ω_1) = ∫_{Ω_2} ( ∫_{Ω_1} f(ω_1, ω_2) dµ_1(ω_1) ) dµ_2(ω_2).
Again, using Proposition 3.1.4 and Theorem 5.2.2 we see that H satisfies the assumptions (1), (2), and (3) of Proposition 12.3.6. As π-system I we take the system of all F = A × B with A ∈ F_1 and B ∈ F_2. Letting f(ω_1, ω_2) = 1I_A(ω_1)1I_B(ω_2) we can easily check that f ∈ H. For instance, property 3 follows from
∫_{Ω_1×Ω_2} f(ω_1, ω_2) d(µ_1 ⊗ µ_2) = (µ_1 ⊗ µ_2)(A × B) = µ_1(A)µ_2(B)
and, for example,
∫_{Ω_1} ( ∫_{Ω_2} f(ω_1, ω_2) dµ_2(ω_2) ) dµ_1(ω_1) = ∫_{Ω_1} 1I_A(ω_1)µ_2(B) dµ_1(ω_1) = µ_1(A)µ_2(B).
Applying the Monotone Class Theorem, Proposition 12.3.6, gives that H contains all bounded functions f : Ω_1 × Ω_2 → R measurable with respect to F_1 ⊗ F_2. Hence we are done.
Now we state Fubini’s Theorem for general random variables f : Ω1 × Ω2 →
R.
Theorem 5.7.3. [Fubini's Theorem] Let (Ω_i, F_i, µ_i), i = 1, 2, be σ-finite measure spaces and f : Ω_1 × Ω_2 → R be an F_1 ⊗ F_2-measurable function such that
∫_{Ω_1×Ω_2} |f(ω_1, ω_2)| d(µ_1 ⊗ µ_2)(ω_1, ω_2) < ∞.     (5.3)
Then the following holds:
(1) The functions ω_1 → f(ω_1, ω_2^0) and ω_2 → f(ω_1^0, ω_2) are F_1-measurable and F_2-measurable, respectively, for all ω_i^0 ∈ Ω_i.
(2) There are M_i ∈ F_i with µ_i(M_i^c) = 0 such that the integrals
∫_{Ω_1} f(ω_1, ω_2^0) dµ_1(ω_1) and ∫_{Ω_2} f(ω_1^0, ω_2) dµ_2(ω_2)
exist and are finite for all ω_i^0 ∈ M_i, and the functions
ω_1 → 1I_{M_1}(ω_1) ∫_{Ω_2} f(ω_1, ω_2) dµ_2(ω_2), ω_2 → 1I_{M_2}(ω_2) ∫_{Ω_1} f(ω_1, ω_2) dµ_1(ω_1)
are F_1-measurable and F_2-measurable maps, respectively.
(3) One has that
∫_{Ω_1×Ω_2} f(ω_1, ω_2) d(µ_1 ⊗ µ_2) = ∫_{Ω_1} 1I_{M_1}(ω_1) ( ∫_{Ω_2} f(ω_1, ω_2) dµ_2(ω_2) ) dµ_1(ω_1)
                                    = ∫_{Ω_2} 1I_{M_2}(ω_2) ( ∫_{Ω_1} f(ω_1, ω_2) dµ_1(ω_1) ) dµ_2(ω_2).
Remark 5.7.4. Our understanding is that, writing for example an expression like
1I_{M_2}(ω_2) ∫_{Ω_1} f(ω_1, ω_2) dµ_1(ω_1),
we only consider and compute the integral for ω_2 ∈ M_2.
Proof of Theorem 5.7.3. The proposition follows by decomposing f = f + −f −
and applying Theorem 5.7.1.
In the following example we show how to compute the integral
∫_{−∞}^∞ e^{−x²} dx
by Fubini's Theorem.
Example 5.7.5. Let f : R × R → R be a non-negative continuous function. For any N ∈ N Fubini's Theorem gives that
∫_{−N}^N ( ∫_{−N}^N f(x, y) dλ(y) ) dλ(x) = ∫_{[−N,N]×[−N,N]} f(x, y) d(λ ⊗ λ)(x, y),
where λ is the Lebesgue measure. Letting f(x, y) := e^{−(x²+y²)}, the above yields that
∫_{−N}^N ( ∫_{−N}^N e^{−x²} e^{−y²} dλ(y) ) dλ(x) = ∫_{[−N,N]×[−N,N]} e^{−(x²+y²)} d(λ ⊗ λ)(x, y).
For the left-hand side we get
lim_{N→∞} ∫_{−N}^N ( ∫_{−N}^N e^{−x²} e^{−y²} dλ(y) ) dλ(x) = lim_{N→∞} ∫_{−N}^N e^{−x²} ( ∫_{−N}^N e^{−y²} dλ(y) ) dλ(x)
 = ( lim_{N→∞} ∫_{−N}^N e^{−x²} dλ(x) )² = ( ∫_{−∞}^∞ e^{−x²} dλ(x) )².
For the right-hand side we get
lim_{N→∞} ∫_{[−N,N]×[−N,N]} e^{−(x²+y²)} d(λ ⊗ λ)(x, y) = lim_{R→∞} ∫_{x²+y²≤R²} e^{−(x²+y²)} d(λ ⊗ λ)(x, y)
 = lim_{R→∞} ∫_0^R ∫_0^{2π} e^{−r²} r dφ dr = π lim_{R→∞} (1 − e^{−R²}) = π,
where we have used polar coordinates. Comparing both sides gives
∫_{−∞}^∞ e^{−x²} dλ(x) = √π.
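The following is a minimal numerical sketch (not part of the notes) of the factorisation used above, in Python with NumPy; the grid and the cut-off ±10 are arbitrary illustration choices.

import numpy as np

x = np.linspace(-10.0, 10.0, 20_001)
dx = x[1] - x[0]
one_dim = np.sum(np.exp(-x**2)) * dx   # approximates integral e^{-x^2} dx ~ sqrt(pi)
# By Fubini, the double integral of e^{-(x^2+y^2)} over the plane is the square of one_dim:
print(one_dim**2, np.pi)               # both ~ 3.14159...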
As a corollary we show that the definition of the Gaussian measure in Section
2.4.2 was “correct”.
Proposition 5.7.6. For σ > 0 and m ∈ R let
p_{m,σ²}(x) := (1/√(2πσ²)) e^{−(x−m)²/(2σ²)}.
Then ∫_R p_{m,σ²}(x) dx = 1,
∫_R x p_{m,σ²}(x) dx = m, and ∫_R (x − m)² p_{m,σ²}(x) dx = σ².     (5.4)
In other words: if a random variable f : Ω → R has as law the normal distribution N_{m,σ²}, then
Ef = m and E(f − Ef)² = σ².     (5.5)
Proof. By the change of variable x → m + σx it is sufficient to show the statements for m = 0 and σ = 1. Firstly, by putting x = z/√2 one gets
1 = (1/√π) ∫_{−∞}^∞ e^{−x²} dx = (1/√(2π)) ∫_{−∞}^∞ e^{−z²/2} dz,
where we have used Example 5.7.5, so that ∫_R p_{0,1}(x) dx = 1. Secondly,
∫_R x p_{0,1}(x) dx = 0
follows from the symmetry of the density p_{0,1}(x) = p_{0,1}(−x). Finally, by (x exp(−x²/2))' = exp(−x²/2) − x² exp(−x²/2) one can also compute that
(1/√(2π)) ∫_{−∞}^∞ x² e^{−x²/2} dx = (1/√(2π)) ∫_{−∞}^∞ e^{−x²/2} dx = 1.
We close this section with a "counterexample" to Fubini's Theorem.
Example 5.7.7. Let Ω = [−1, 1] × [−1, 1] and µ be the uniform distribution on [−1, 1] (see Section 2.4.1). The function
f(x, y) := xy/(x² + y²)²
for (x, y) ≠ (0, 0) and f(0, 0) := 0 is not integrable on Ω, even though the iterated integrals exist and are equal. In fact
∫_{−1}^1 f(x, y) dµ(x) = 0 and ∫_{−1}^1 f(x, y) dµ(y) = 0,
so that
∫_{−1}^1 ( ∫_{−1}^1 f(x, y) dµ(x) ) dµ(y) = ∫_{−1}^1 ( ∫_{−1}^1 f(x, y) dµ(y) ) dµ(x) = 0.
On the other hand, using polar coordinates we get
∫_{[−1,1]×[−1,1]} |f(x, y)| d(µ × µ)(x, y) ≥ (1/4) ∫_0^1 ∫_0^{2π} (|sin φ cos φ|/r) dφ dr = (1/2) ∫_0^1 (1/r) dr = ∞.
The inequality holds because on the right-hand side we integrate only over the area {(x, y) : x² + y² ≤ 1}, which is a subset of [−1, 1] × [−1, 1], and
∫_0^{2π} |sin φ cos φ| dφ = 4 ∫_0^{π/2} sin φ cos φ dφ = 2
follows by a symmetry argument.
Let us finish with an
Example 5.7.8. Assume that (Ω, F, µ) is σ-finite and let f : Ω → R be non-negative and Borel-measurable. Then, by Fubini's theorem,
∫_Ω f dµ = ∫_Ω ∫_{[0,∞)} 1I_{{t∈[0,∞) : t < f(ω)}} dλ(t) dµ(ω)
         = ∫_{[0,∞)} ∫_Ω 1I_{{t∈[0,∞) : t < f(ω)}} dµ(ω) dλ(t)
         = ∫_{[0,∞)} µ(f > t) dλ(t).
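The following is a minimal numerical sketch (not part of the notes) of this "layer cake" identity in Python with NumPy, using the empirical measure of exponential samples; the distribution, seed and grid are arbitrary illustration choices.

import numpy as np

rng = np.random.default_rng(5)
f = rng.exponential(scale=2.0, size=50_000)          # non-negative samples; mu = empirical measure
t = np.linspace(0.0, 40.0, 1_001)
tail = np.array([(f > s).mean() for s in t])          # mu(f > t)
print(f.mean(), np.trapz(tail, t))                    # both ~ 2.0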
Chapter 6
Some inequalities
In this chapter we prove some basic inequalities.
6.1
Distributional inequalities
Proposition 6.1.1. [Chebyshev's inequality] 1 Let f be a non-negative integrable random variable defined on a probability space (Ω, F, P). Then, for all λ > 0,
P({ω : f(ω) ≥ λ}) ≤ Ef/λ.
Proof. We simply have
λ P({ω : f(ω) ≥ λ}) = λ E 1I_{{f ≥ λ}} ≤ E f 1I_{{f ≥ λ}} ≤ Ef.
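The following is a minimal empirical sketch (not part of the notes) of Chebyshev's inequality in Python with NumPy; the exponential distribution, seed, sample size and thresholds are arbitrary illustration choices.

import numpy as np

rng = np.random.default_rng(6)
f = rng.exponential(scale=1.0, size=500_000)   # non-negative, integrable
for lam in (1.0, 2.0, 5.0):
    print(lam, (f >= lam).mean(), f.mean() / lam)   # the left value stays below the right one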
6.2
Convexity
Definition 6.2.1. [convexity] A function g : R → R is convex if and
only if
g(px + (1 − p)y) ≤ pg(x) + (1 − p)g(y)
for all 0 ≤ p ≤ 1 and all x, y ∈ R. A function g is concave if −g is convex.
Every convex function g : R → R is continuous (check!) and hence (B(R), B(R))-measurable.
Proposition 6.2.2. [Jensen’s inequality] 2 If g : R → R is convex and
f : Ω → R a random variable with E|f | < ∞, then
g(Ef ) ≤ Eg(f )
1
Pafnuty Lvovich Chebyshev, 16/05/1821 (Okatovo, Russia) - 08/12/1894 (St Petersburg, Russia)
2
Johan Ludwig William Valdemar Jensen, 08/05/1859 (Nakskov, Denmark)- 05/
03/1925 (Copenhagen, Denmark).
where the expected value on the right-hand side might be infinity.
Proof. Let x0 = Ef . Since g is convex we find a “supporting line” in x0 , that
means a, b ∈ R such that
ax0 + b = g(x0 ) and ax + b ≤ g(x)
for all x ∈ R. It follows af (ω) + b ≤ g(f (ω)) for all ω ∈ Ω and
g(Ef ) = aEf + b = E(af + b) ≤ Eg(f ).
Example 6.2.3. (1) The function g(x) := |x| is convex so that, for any
integrable f ,
|Ef | ≤ E|f |.
(2) For 1 ≤ p < ∞ the function g(x) := |x|p is convex, so that Jensen’s
inequality applied to |f | gives that
(E|f |)p ≤ E|f |p .
6.3
Hölder- and Minkowski inequality
For the second case in the example above there is another way we can go. It
uses the famous Hölder-inequality.
Proposition 6.3.1. [Hölder's inequality] 3 Assume a probability space (Ω, F, P) and random variables f, g : Ω → R. If 1 < p, q < ∞ with 1/p + 1/q = 1, then
E|fg| ≤ (E|f|^p)^{1/p} (E|g|^q)^{1/q}.
Proof. We can assume that E|f|^p > 0 and E|g|^q > 0. For example, assuming E|f|^p = 0 would imply |f|^p = 0 a.s. according to Proposition ??, so that fg = 0 a.s. and E|fg| = 0. Hence we may set
f̃ := f/(E|f|^p)^{1/p} and g̃ := g/(E|g|^q)^{1/q}.
We notice that
x^a y^b ≤ ax + by
3
Otto Ludwig Hölder, 22/12/1859 (Stuttgart, Germany) - 29/08/1937 (Leipzig, Germany).
for x, y ≥ 0 and positive a, b with a + b = 1, which follows from the concavity of the logarithm (we can assume for a moment that x, y > 0):
ln(ax + by) ≥ a ln x + b ln y = ln x^a + ln y^b = ln x^a y^b.
Setting x := |f̃|^p, y := |g̃|^q, a := 1/p, and b := 1/q, we get
|f̃ g̃| = x^a y^b ≤ ax + by = (1/p)|f̃|^p + (1/q)|g̃|^q
and
E|f̃ g̃| ≤ (1/p) E|f̃|^p + (1/q) E|g̃|^q = 1/p + 1/q = 1.
On the other hand,
E|f̃ g̃| = E|fg| / ((E|f|^p)^{1/p} (E|g|^q)^{1/q}),
so that we are done.
Corollary 6.3.2. For 0 < p < q < ∞ one has that (E|f|^p)^{1/p} ≤ (E|f|^q)^{1/q}.
The proof is an exercise.
Corollary 6.3.3. [Hölder's inequality for sequences] Let (a_n)_{n=1}^∞ and (b_n)_{n=1}^∞ be sequences of real numbers. If 1 < p, q < ∞ with 1/p + 1/q = 1, then
Σ_{n=1}^∞ |a_n b_n| ≤ (Σ_{n=1}^∞ |a_n|^p)^{1/p} (Σ_{n=1}^∞ |b_n|^q)^{1/q}.
Proof. It is sufficient to prove the inequality for finite sequences (a_n)_{n=1}^N and (b_n)_{n=1}^N, since by letting N → ∞ we get the desired inequality for infinite sequences. Let Ω = {1, ..., N}, F := 2^Ω, and P({k}) := 1/N. Defining f, g : Ω → R by f(k) := a_k and g(k) := b_k we get
(1/N) Σ_{n=1}^N |a_n b_n| ≤ ((1/N) Σ_{n=1}^N |a_n|^p)^{1/p} ((1/N) Σ_{n=1}^N |b_n|^q)^{1/q}
from Proposition 6.3.1. Multiplying by N gives the finite case, and letting N → ∞ gives our assertion.
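The following is a minimal numerical sketch (not part of the notes) of Corollary 6.3.3 in Python with NumPy; the choice p = 3, q = 3/2, the seed and the sequences are arbitrary illustration choices.

import numpy as np

rng = np.random.default_rng(7)
a, b = rng.normal(size=1_000), rng.normal(size=1_000)
p, q = 3.0, 1.5                                   # 1/p + 1/q = 1
lhs = np.sum(np.abs(a * b))
rhs = np.sum(np.abs(a)**p)**(1/p) * np.sum(np.abs(b)**q)**(1/q)
print(lhs <= rhs, lhs, rhs)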
Proposition 6.3.4. [Minkowski inequality] 4 Assume a probability space (Ω, F, P), random variables f, g : Ω → R, and 0 < p < ∞. Then
(E|f + g|^p)^{1/p} ≤ c_p [(E|f|^p)^{1/p} + (E|g|^p)^{1/p}],     (6.1)
where c_p = 1 for 1 ≤ p < ∞ and c_p = 2^{1/p − 1} for 0 < p < 1.
4
Hermann Minkowski, 22/06/1864 (Alexotas, Russian Empire; now Kaunas, Lithuania)
- 12/01/1909 (Göttingen, Germany).
Proof. For p = 1 the inequality follows from |f + g| ≤ |f| + |g|. So assume that 1 < p < ∞. The convexity of x → |x|^p gives that
|(a + b)/2|^p ≤ (|a|^p + |b|^p)/2
and (a + b)^p ≤ 2^{p−1}(a^p + b^p) for a, b ≥ 0. Consequently, |f + g|^p ≤ (|f| + |g|)^p ≤ 2^{p−1}(|f|^p + |g|^p) and
E|f + g|^p ≤ 2^{p−1}(E|f|^p + E|g|^p).
Assuming now that (E|f|^p)^{1/p} + (E|g|^p)^{1/p} < ∞, otherwise there is nothing to prove, we get that E|f + g|^p < ∞ as well by the above considerations. Taking 1 < q < ∞ with 1/p + 1/q = 1, we continue with
E|f + g|^p = E|f + g||f + g|^{p−1}
           ≤ E(|f| + |g|)|f + g|^{p−1}
           = E|f||f + g|^{p−1} + E|g||f + g|^{p−1}
           ≤ (E|f|^p)^{1/p} (E|f + g|^{(p−1)q})^{1/q} + (E|g|^p)^{1/p} (E|f + g|^{(p−1)q})^{1/q},
where we have used Hölder's inequality. Since (p − 1)q = p, (6.1) follows by dividing the above inequality by (E|f + g|^p)^{1/q} and taking into account 1 − 1/q = 1/p.
It remains to prove the case 0 < p < 1. Here we get |a + b|^p ≤ |a|^p + |b|^p for all a, b ∈ R, so that
(∫_Ω |f(ω) + g(ω)|^p dP(ω))^{1/p} ≤ (∫_Ω |f(ω)|^p dP(ω) + ∫_Ω |g(ω)|^p dP(ω))^{1/p}
 ≤ 2^{1/p − 1} [(∫_Ω |f(ω)|^p dP(ω))^{1/p} + (∫_Ω |g(ω)|^p dP(ω))^{1/p}],
where, for 1 ≤ q = 1/p < ∞, we have used |a + b|^q ≤ 2^{q−1}(|a|^q + |b|^q).
We close with a simple deviation inequality for f.
Corollary 6.3.5. Let f be a random variable defined on a probability space (Ω, F, P) such that Ef² < ∞. Then one has, for all λ > 0,
P(|f − Ef| ≥ λ) ≤ E(f − Ef)²/λ² ≤ Ef²/λ².
Proof. From Corollary 6.3.2 we get that E|f| < ∞ so that Ef exists. Applying Proposition 6.1.1 to |f − Ef|² gives that
P({|f − Ef| ≥ λ}) = P({|f − Ef|² ≥ λ²}) ≤ E|f − Ef|²/λ².
Finally, we use that E(f − Ef)² = Ef² − (Ef)² ≤ Ef².
6.4
The variance of a random variable
In this section we restrict ourselves to probability spaces. Besides the expected value, the variance is often of interest.
Definition 6.4.1. [variance] Let (Ω, F, P) be a probability space and f : Ω → R be an integrable random variable. Then
var(f) = σ_f² = E[f − Ef]² ∈ [0, ∞]
is called variance.
Using the properties of the expectation we may now infer properties of the variance:
Corollary 6.4.2.
(1) If f is integrable and α, c ∈ R, then
var(αf − c) = α2 var(f).
(2) If Ef 2 < ∞ then var(f) = Ef 2 − (Ef)2 < ∞.
Proof. (1) follows from
var(αf − c) = E[(αf − c) − E(αf − c)]² = E[αf − αEf]² = α² var(f).
(2) First we remark that E|f| ≤ (Ef²)^{1/2}, as we shall see later by Hölder's inequality (Corollary 6.3.2); that means that any square integrable random variable is integrable. Then we simply get that
var(f) = E[f − Ef]² = Ef² − 2E(fEf) + (Ef)² = Ef² − 2(Ef)² + (Ef)² = Ef² − (Ef)².
Finally, we state a useful formula for independent random variables.
Proposition 6.4.3. If f and g are independent and E|f | < ∞ and E|g| < ∞,
then E|f g| < ∞ and
Ef g = Ef Eg.
The proof is an exercise. Concerning the variance of the sum of independent
random variables we get the fundamental
Proposition 6.4.4. Let f1 , ..., fn be independent random variables with finite
second moment. Then one has that
var(f1 + · · · + fn ) = var(f1 ) + · · · + var(fn ).
Proof. The formula follows from
var(f_1 + · · · + f_n) = E((f_1 + · · · + f_n) − E(f_1 + · · · + f_n))²
 = E(Σ_{i=1}^n (f_i − Ef_i))²
 = E Σ_{i,j=1}^n (f_i − Ef_i)(f_j − Ef_j)
 = Σ_{i,j=1}^n E((f_i − Ef_i)(f_j − Ef_j))
 = Σ_{i=1}^n E(f_i − Ef_i)² + Σ_{i≠j} E((f_i − Ef_i)(f_j − Ef_j))
 = Σ_{i=1}^n var(f_i) + Σ_{i≠j} E(f_i − Ef_i) E(f_j − Ef_j)
 = Σ_{i=1}^n var(f_i),
because E(f_i − Ef_i) = Ef_i − Ef_i = 0.
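The following is a minimal empirical sketch (not part of the notes) of Proposition 6.4.4 in Python with NumPy; the exponential distribution, seed and sample sizes are arbitrary illustration choices.

import numpy as np

rng = np.random.default_rng(8)
samples = rng.exponential(scale=1.0, size=(5, 1_000_000))   # five independent rows, each var = 1
total = samples.sum(axis=0)
print(total.var(), samples.var(axis=1).sum())                # both ~ 5.0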
Chapter 7
The theorem of
Radon-Nikodym and
conditional expectations
So far, given a measurable space (Ω, F), we considered one measure on this space. However, in many situations the consideration of different measures and their relation to each other is of interest. The central concept we consider is absolute continuity: assuming two measures µ and ν on (Ω, F), we say that ν is absolutely continuous with respect to µ if and only if µ(A) = 0 implies that ν(A) = 0, and we write ν ≪ µ. At first glance this concept seems to be rather weak as it only concerns null-sets; however, the Radon-Nikodym Theorem (see Theorem 7.2.1 below) demonstrates that this notion is strong and useful. The Radon-Nikodym Theorem makes it possible to relate different measures to each other by densities. An application is the construction of conditional expectations, one of the fundamental concepts in stochastic process theory and filtering.
7.1 Signed measures
Definition 7.1.1 (Signed measures). Let (Ω, F) be a measurable space.
(i) A map ν : F → R is called a (finite) signed measure if and only if
ν = ν + − ν −,
where ν + and ν − are finite measures on F.
(ii) Assume that µ, ν are measures on the measurable space (Ω, F), where
µ is σ-finite and non-negative and ν is a signed measure. Then ν is
absolutely continuous with respect to µ if and only if
µ(A) = 0 implies ν(A) = 0.
We shall write ν ≪ µ.
Example 7.1.2. Let (Ω, F, P) be a probability space, L : Ω → R be an integrable random variable, and
ν(A) := ∫_A L dP.
Then ν is a signed measure and ν ≪ P.
Proof. We let L⁺ := max{L, 0} and L⁻ := max{−L, 0} so that L = L⁺ − L⁻. Define
ν^±(A) := ∫_Ω 1I_A L^± dP.
Now we check that ν^± are finite measures. First we have that
ν^±(Ω) = ∫_Ω 1I_Ω L^± dP ≤ ∫_Ω |L| dP < ∞.
Then assume (A_n)_{n=1}^∞ ⊆ F to be disjoint sets such that A = ∪_{n=1}^∞ A_n. Then
ν⁺(∪_{n=1}^∞ A_n) = ∫_Ω 1I_{∪_{n=1}^∞ A_n} L⁺ dP
 = ∫_Ω (Σ_{n=1}^∞ 1I_{A_n}(ω)) L⁺ dP
 = ∫_Ω lim_{N→∞} (Σ_{n=1}^N 1I_{A_n}) L⁺ dP
 = lim_{N→∞} ∫_Ω (Σ_{n=1}^N 1I_{A_n}) L⁺ dP
 = Σ_{n=1}^∞ ν⁺(A_n),
where we have used Lebesgue's dominated convergence theorem. The same can be done for L⁻.
Proposition 7.1.3 (Hahn-decomposition). Let ν be a signed measure on
(Ω, F). Then there exists a measurable partition Ω = Ω− ∪ Ω+ such that
(i) ν(A) ≥ 0 for all A ∈ F with A ⊆ Ω+ ,
(ii) ν(A) ≤ 0 for all A ∈ F with A ⊆ Ω− .
Given another partition Ω = Ω̃− ∪ Ω̃+ with these properties, one has that
ν(Ω+ ∆Ω̃+ ) = ν(Ω− ∆Ω̃− ) = 0.
Proof. Let κ := sup_{B∈F} ν(B) ≤ ν⁺(Ω) + ν⁻(Ω) < ∞, where ν⁺ and ν⁻ are taken from Definition 7.1.1. We find a sequence (Ω_n⁺)_{n=1}^∞ ⊆ F such that ν(Ω_n⁺) ≥ κ − 1/2ⁿ, so that lim_n ν(Ω_n⁺) = κ, and let
Ω⁺ := ∩_{n=1}^∞ ∪_{k=n}^∞ Ω_k⁺.
Then ν(A) ≤ 2^{−k} for all measurable A ⊆ (Ω_k⁺)^c. Consequently, by the Lemma of Fatou (Proposition 1.2.10),
ν(Ω⁺) = ν⁺(∩_{n=1}^∞ ∪_{k=n}^∞ Ω_k⁺) − ν⁻(∩_{n=1}^∞ ∪_{k=n}^∞ Ω_k⁺)
 ≥ lim sup_{n→∞} ν⁺(Ω_n⁺) − lim_{n→∞} ν⁻(∪_{k=n}^∞ Ω_k⁺)
 ≥ κ − lim_{n→∞} Σ_{k=n}^∞ 1/2^k
 = κ,
and hence ν(Ω⁺) = κ. Assuming a measurable subset A ⊆ Ω⁺ with ν(A) < 0 would yield a contradiction because in this case
κ = ν(Ω⁺) < ν(Ω⁺ \ A) ≤ κ.
Set Ω⁻ := Ω \ Ω⁺ and assume a measurable A ⊆ Ω⁻ with ν(A) > 0. This would again yield a contradiction because then
κ = ν(Ω⁺) < ν(Ω⁺ ∪ A) ≤ κ.
The last arguments also show that any exchange between Ω⁺ and Ω̃⁺ which concerns a subset A of non-zero measure yields a contradiction. This proves the uniqueness.
From the Hahn¹-decomposition we derive the Jordan-decomposition:
Definition 7.1.4 (Jordan-decomposition). Given a signed measure ν on (Ω, F) and Ω^± from Proposition 7.1.3, we call ν = ν⁺ − ν⁻ with
ν⁺(A) := ν(A ∩ Ω⁺) and ν⁻(A) := −ν(A ∩ Ω⁻)
the Jordan-decomposition of ν.
¹ Hans Hahn, 27/09/1879 (Vienna, Austria) - 24/07/1934 (Vienna, Austria).
7.2 Theorem of Radon-Nikodym
Theorem 7.2.1 (Radon-Nikodym). Let (Ω, F, µ) be a σ-finite measure space and ν be a signed measure with ν ≪ µ. Then there exists a measurable L : Ω → R such that ∫_Ω |L| dµ < ∞ and
ν(A) = ∫_A L dµ for all A ∈ F.    (7.1)
The measurable map L is unique in the following sense: if L and L′ are random variables satisfying (7.1), then µ(L ≠ L′) = 0.
The Radon-Nikodym theorem was proved by Radon² in 1913 in the case of Rⁿ. The extension to the general case was done by Nikodym³ in 1930.

² Johann Radon, 16/12/1887 (Tetschen, Bohemia; now Decin, Czech Republic) - 25/05/1956 (Vienna, Austria).
³ Otton Marcin Nikodym, 13/08/1887 (Zablotow, Galicia, Austria-Hungary; now Ukraine) - 04/05/1974 (Utica, USA).
Definition 7.2.2. The measurable map L is called Radon-Nikodym derivative and we write
L = dν/dµ.
We should keep in mind the rule
ν(A) = ∫_Ω 1I_A dν = ∫_Ω 1I_A L dµ,
so that 'dν = L dµ'.
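To make the rule 'dν = L dµ' concrete, here is a minimal sketch on a finite sample space (assuming NumPy; the weight vectors and the test set A are hypothetical choices). If µ({ω}) > 0 whenever ν({ω}) ≠ 0, then ν ≪ µ and the Radon-Nikodym derivative is simply the pointwise ratio of the weights:

import numpy as np

# Omega = {0, ..., 5}; mu and nu given by point masses, nu vanishes where mu does
mu = np.array([0.1, 0.2, 0.3, 0.2, 0.2, 0.0])
nu = np.array([0.05, 0.1, 0.6, 0.1, 0.15, 0.0])

L = np.divide(nu, mu, out=np.zeros_like(nu), where=mu > 0)  # dnu/dmu

# nu(A) equals the integral of L over A with respect to mu:
A = np.array([True, False, True, True, False, False])
print(nu[A].sum(), "=", (L * mu)[A].sum())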
Proof of Theorem 7.2.1. It is sufficient to prove the statement for finite measures µ, so that we can even assume that µ = P is a probability measure. In the general case we partition Ω = ∪_{n=1}^∞ Ω_n with µ(Ω_n) < ∞, get L_n : Ω_n → R, define L := Σ_n 1I_{Ω_n} L_n, and observe that
∫_Ω |L| dµ = Σ_{n=1}^∞ ∫_{Ω_n} |L_n| dµ = Σ_{n=1}^∞ (ν⁺(Ω_n) + ν⁻(Ω_n)) = ν⁺(Ω) + ν⁻(Ω) < ∞,
where ν = ν⁺ − ν⁻ is the Jordan decomposition.
So let us assume that µ = P. By the Jordan-decomposition we have ν = ν⁺ − ν⁻ with ν⁺ ≪ P and ν⁻ ≪ P because, for example, P(A) = 0 implies P(A ∩ Ω⁺) = 0 and therefore 0 = ν(A ∩ Ω⁺) = ν⁺(A). Hence we can consider ν⁺ and ν⁻ separately, i.e. we may assume that ν is a non-negative finite measure. We start our proof by defining
M := { L ∈ L₁(Ω, F, P) : L ≥ 0, ∫_A L dP ≤ ν(A) for all A ∈ F }.
Because 0 ∈ M the set M is non-empty. Moreover, L₁, L₂ ∈ M implies
L₁ ∨ L₂ ≤ L₁ + L₂ ∈ L₁(Ω, F, P)
and, with Ω₁ := {L₁ ≥ L₂} and Ω₂ := {L₂ > L₁}, the estimate
∫_A L₁ ∨ L₂ dP = ∫_{A∩Ω₁} L₁ dP + ∫_{A∩Ω₂} L₂ dP ≤ ν(A ∩ Ω₁) + ν(A ∩ Ω₂) = ν(A).
Consequently, L₁ ∨ L₂ ∈ M. Observe that
κ := sup_{L∈M} ∫_Ω L dP ≤ ν(Ω) < ∞
and find L_n ∈ M such that
sup_n ∫_Ω L_n dP = κ.
Letting K_n := max{L₁, ..., L_n}, we obtain K_n ∈ M, 0 ≤ K₁ ≤ K₂ ≤ ···, and
lim_n ∫_Ω K_n dP = κ.
Define the extended random variable
K(ω) := lim_n K_n(ω) ∈ [0, ∞].
From monotone convergence we get that
∫_Ω K dP = κ < ∞
and may redefine K on a null set such that K : Ω → R, K ∈ M, and
∫_A K dP ≤ ν(A) for A ∈ F.
Assume that there is an A₀ ∈ F with
∫_{A₀} K dP < ν(A₀).
Then P(A₀) > 0 by the absolute continuity and there is an ε > 0 such that for
ν′(A) := ν(A) − (1 + ε) ∫_A K dP
one has ν′(A₀) > 0, and therefore the Hahn-decomposition with respect to ν′ yields a partition Ω = Ω⁺ ∪ Ω⁻ with ν′(Ω⁺) > 0. Because ν′ ≪ P as well, we obtain that P(Ω⁺) > 0. Let
0 ≤ K′ := K(1 + εχ_{Ω⁺}) ∈ L₁(Ω, F, P).
If we can show that K′ ∈ M, then we would get a contradiction because
κ = ∫_Ω K dP < ∫_Ω K′ dP ≤ κ.
To check that K′ ∈ M, we let A ∈ F and get that
∫_A K′ dP = ∫_{A∩Ω⁺} K′ dP + ∫_{A∩(Ω⁺)^c} K′ dP
 = (1 + ε) ∫_{A∩Ω⁺} K dP + ∫_{A∩(Ω⁺)^c} K dP
 = [(1 + ε) ∫_{A∩Ω⁺} K dP − ν(A ∩ Ω⁺)] + ∫_{A∩(Ω⁺)^c} K dP + ν(A ∩ Ω⁺)
 = −ν′(A ∩ Ω⁺) + ∫_{A∩(Ω⁺)^c} K dP + ν(A ∩ Ω⁺)
 ≤ ∫_{A∩(Ω⁺)^c} K dP + ν(A ∩ Ω⁺)
 ≤ ν(A ∩ (Ω⁺)^c) + ν(A ∩ Ω⁺)
 = ν(A).
7.3 Conditional expectation
Given a probability space (Ω, F, P) and a sub-σ-algebra G ⊆ F, a random variable f : Ω → R that is G-measurable is automatically F-measurable. The converse is not true in general. Roughly speaking, one can say that an F-measurable random variable is more complex, or contains more information, than a G-measurable random variable. There is a canonical way to project an F-measurable random variable down to a G-measurable random variable, called the conditional expectation. The notion of the conditional expectation plays a central role in the theory of stochastic processes and its applications. In particular, the notion of a martingale is derived from it; martingales can be used, for example, to describe fair games, and they are the basis for solving PDEs by Monte-Carlo methods.
Let us explain the conditional expectation in an easy - but typical - situation as an averaging procedure. Assume that the σ-algebra G is obtained by all possible combinations of the elements of a partition
Ω = ∪_{i=1}^n Ω_i,
where the Ω_i are pair-wise disjoint and of positive measure. Given a random variable f : Ω → R such that ∫_Ω |f| dP < ∞ we average f over the partition
and obtain a new random variable
g(ω) := (1/P(Ω_i)) ∫_{Ω_i} f dP if ω ∈ Ω_i.
Because g is constant on the Ω_i's it is measurable with respect to G. Moreover, one can easily see that one cannot distinguish between f and g if one integrates over sets from G, i.e.
∫_B f dP = ∫_B g dP for all B ∈ G.
The random variable g will be called the conditional expectation of f given G. The following proposition guarantees the existence of a conditional expectation in the general case, i.e. when G is not of the simple form above.
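The averaging procedure above is easy to reproduce numerically. The following sketch (assuming NumPy; the choice of f, the number of cells, and the test set B are arbitrary illustrative choices) builds g by averaging f over each cell of a partition of [0, 1] and checks that f and g cannot be distinguished by integrating over sets from G:

import numpy as np

rng = np.random.default_rng(2)
N = 100_000
omega = rng.uniform(0.0, 1.0, size=N)          # Monte-Carlo sample of Omega = [0, 1]
f = np.sin(2 * np.pi * omega) + omega**2       # an integrable random variable

n = 4                                          # partition [0, 1) into n equal cells
cell = np.minimum((omega * n).astype(int), n - 1)
g = np.zeros_like(f)
for i in range(n):
    idx = cell == i
    g[idx] = f[idx].mean()                     # average of f over Omega_i

B = cell <= 1                                  # a set from G: union of the first two cells
print(np.mean(f * B), "~", np.mean(g * B))     # same integral over B, up to sampling error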
Proposition 7.3.1 (Conditional expectation). Let (Ω, F, P) be a probability space, G ⊆ F be a sub-σ-algebra and f ∈ L₁(Ω, F, P).
(i) There exists a g ∈ L₁(Ω, G, P) such that
∫_B f dP = ∫_B g dP for all B ∈ G.
(ii) If g and g′ are as in (i), then P(g ≠ g′) = 0.
Proof. Define
ν(A) := ∫_A f dP for A ∈ G,
so that ν is a signed measure on G. Applying the Theorem of Radon-Nikodym gives a g ∈ L₁(Ω, G, P) such that
ν(A) = ∫_A g dP, and hence ∫_A g dP = ν(A) = ∫_A f dP.
Assume now another g′ ∈ L₁(Ω, G, P) with
∫_A g dP = ∫_A g′ dP
for all A ∈ G and assume that P(g ≠ g′) > 0. Then we find a set A ∈ G with P(A) > 0 and real numbers α < β such that
g(ω) ≤ α < β ≤ g′(ω) for ω ∈ A, or g′(ω) ≤ α < β ≤ g(ω) for ω ∈ A.
Consequently (for example in the first case),
∫_A g dP ≤ αP(A) < βP(A) ≤ ∫_A g′ dP,
which is a contradiction.
The main point of the theorem above is that g is G-measurable. This leads
to the following definition:
Definition 7.3.2 (Conditional expectation). The G-measurable and
integrable random variable g from Proposition 7.3.1 is called conditional expectation of f given G and is denoted by
g = E[f | G].
One has to keep in mind that the conditional expectation is only unique up
to null sets from G. We continue with some basic properties of conditional
expectations.
Proposition 7.3.3. Let f, g ∈ L₁(Ω, F, P) and let H ⊆ G ⊆ F be sub-σ-algebras of F. Then the following holds true:
(i) Linearity: If λ, µ ∈ R, then E[λf + µg | G] = λE[f | G] + µE[g | G] a.s.
(ii) Monotonicity: If f ≤ g a.s., then E[f | G] ≤ E[g | G] a.s.
(iii) Positivity: If g ≥ 0 a.s., then E[g | G] ≥ 0 a.s.
(iv) Convexity: One has that |E[f | G]| ≤ E[|f| | G] a.s.
(v) Projection property: If f is G-measurable, then E[f | G] = f a.s.
(vi) Tower property: E[E[f | G] | H] = E[E[f | H] | G] = E[f | H] a.s.
(vii) If h : Ω → R is G-measurable and fh ∈ L₁(Ω, F, P), then E[hf | G] = hE[f | G] a.s.
(viii) If G = {∅, Ω}, then E[f | G] = Ef a.s.
(ix) If for all B ∈ B(R) and all A ∈ G one has that P({f ∈ B} ∩ A) = P(f ∈ B)P(A), i.e. f is independent from G, then E[f | G] = Ef a.s.
(x) Monotone convergence: assume f ≥ 0 a.s. and random variables 0 ≤ h_n ↑ f a.s. Then
lim_n E[h_n | G] = E[f | G] a.s.
Proof. (i), (ix) and (x) are exercises.
(ii) Set A := {ω ∈ Ω : E[f|G](ω) > E[g|G](ω)}. The relation f ≤ g a.s. implies E[1I_A(f − g)] ≤ 0. Noticing that A ∈ G, by the definition of the conditional expectation we conclude
0 ≥ E1I_A f − E1I_A g = E1I_A E[f|G] − E1I_A E[g|G] = E1I_A (E[f|G] − E[g|G]).
Therefore P(A) = 0.
(iii) Apply (ii) to 0 = f ≤ g.
(iv) The inequality f ≤ |f| gives E[f|G] ≤ E[|f||G] a.s. and −f ≤ |f| gives −E[f|G] ≤ E[|f||G] a.s., so that we are done.
(v) follows directly from the definition.
(vi) Since E[f|H] is H-measurable and hence G-measurable, item (v) implies that E[E[f|H]|G] = E[f|H] a.s., so that one equality is shown. For the other equality we have to show that
∫_A E[E[f|G]|H] dP = ∫_A f dP
for A ∈ H. Letting h := E[f|G], this follows from
∫_A E[E[f|G]|H] dP = ∫_A E[h|H] dP = ∫_A h dP = ∫_A E[f|G] dP = ∫_A f dP
since A ∈ H ⊆ G.
(vii) Assume first that h = Σ_{n=1}^N α_n 1I_{A_n}, where ∪_{n=1}^N A_n = Ω is a disjoint partition with A_n ∈ G. For A ∈ G we get that
∫_A hf dP = Σ_{n=1}^N α_n ∫_A 1I_{A_n} f dP
 = Σ_{n=1}^N α_n ∫_{A∩A_n} f dP
 = Σ_{n=1}^N α_n ∫_{A∩A_n} E[f|G] dP
 = ∫_A (Σ_{n=1}^N α_n 1I_{A_n}) E[f|G] dP
 = ∫_A hE[f|G] dP.
Hence E[hf|G] = hE[f|G] a.s. For the general case we can assume that f, h ≥ 0 since we can decompose f = f⁺ − f⁻ and h = h⁺ − h⁻ with f⁺ := max{f, 0} and f⁻ := max{−f, 0} (and in the same way we proceed
with h). We find step-functions 0 ≤ h_n ≤ h such that h_n(ω) ↑ h(ω). Then, by our first step, we get that
h_n E[f|G] = E[h_n f|G] a.s.
Letting n → ∞, the left-hand side converges to hE[f|G], and the right-hand side converges to E[hf|G] as a consequence of the monotone convergence given in (x).
(viii) Clearly, we have that
∫_∅ f dP = 0 = ∫_∅ (Ef) dP and ∫_Ω f dP = Ef = ∫_Ω (Ef) dP,
so that (viii) follows.
7.4 A representation of conditional expectations
Let (Ω, F, P) be a probability space and f : Ω → R be an integrable random variable. Let G ⊆ F be a sub-σ-algebra obtained from a measurable map g : Ω → E, where (E, E) is a measurable space, i.e. G = σ(g). We can think of g as an information operator. In the following we want to give a proper sense to the intuitive equality
∫_{{g∈B}} f dP = ∫_B E(f | g = y) dPg(y),
where Pg is the image measure of P with respect to g, and which represents our conditional expectation on the left-hand side by computing averages over the sets {g = y}. The problem lies in the fact that the sets {g = y} are in general of Pg-measure zero. We start with a factorization lemma, which is of independent interest.
Lemma 7.4.1 (Factorization Lemma, [2, p. 62]). Assume Ω ≠ ∅, let (E, E) be a measurable space, let g : Ω → E and F : Ω → R be maps, and let σ(g) = σ({g⁻¹(B) : B ∈ E}). Then the following assertions are equivalent:
(i) The map F : (Ω, σ(g)) → (R, B(R)) is measurable.
(ii) There exists a measurable h : (E, E) → (R, B(R)) such that F = h ∘ g.
Lemma 7.4.2. Let f : Ω → R be an integrable random variable and g : Ω →
E be measurable. Consider a version F of E[f |σ(g)] that is σ(g) measurable
and a factorization F = h ◦ g according to Lemma 7.4.1. If F 0 is another
version with a corresponding factorization F 0 = h0 ◦ g, then
Pg (h = h0 ) = 1,
where Pg is the image measure of P with respect to g.
Proof. For B ∈ E we obtain by a change of variables that
E_{Pg} [1I_B (h − h′)] = ∫_B [h(y) − h′(y)] dPg(y) = ∫_Ω 1I_B(g(ω))[h(g(ω)) − h′(g(ω))] dP(ω) = 0
because 1I_B(g) is σ(g)-measurable. Testing with B = {y ∈ E : h(y) > h′(y)} and B = {y ∈ E : h(y) < h′(y)} leads to h = h′ Pg-a.s.
Definition 7.4.3. We let
E(f |g = y) := h(y) and P(A|g = y) := E(χA |g = y).
Lemma 7.4.4. For a set B ∈ E it holds that
∫_{{g∈B}} f dP = ∫_B E(f | g = y) dPg(y).
In particular, for f = χ_A we get that
P(A ∩ {g ∈ B}) = ∫_B E(1I_A | g = y) dPg(y).
Proof. Because {g ∈ B} ∈ σ(g) and by the definition of E(f | g = y) we have that
∫_{{g∈B}} f dP = ∫_{{g∈B}} E(f | σ(g)) dP = ∫_{{g∈B}} h ∘ g dP = ∫_B h(y) dPg(y) = ∫_B E(f | g = y) dPg(y).
With respect to densities we obtain the following representation:
Proposition 7.4.5. Let f : Ω → R and g : Ω → R^d be random variables whose joint distribution has a non-negative density h_{(f,g)} ∈ L₁(R^{1+d}, λ^{1+d}), i.e. for A ∈ B(R) and B ∈ B(R^d) one has that
P({ω ∈ Ω : (f(ω), g(ω)) ∈ A × B}) = ∫_{A×B} h_{(f,g)}(x, y) dλ^{d+1}(x, y).
Then the following assertions hold:
(i) f and g have densities h_f and h_g, respectively.
(ii) If
h_{f|g}(x|y) := h_{(f,g)}(x, y)/h_g(y) if h_g(y) ≠ 0, and h_{f|g}(x|y) := 0 otherwise,
then P(f ∈ A | g = y) = ∫_A h_{f|g}(x|y) dλ(x) for all A ∈ B(R) and Pg-a.a. y ∈ R^d.
Proof. (i) The densities can be computed by integrating over the remaining coordinate, i.e.
h_f(x) := ∫_{R^d} h_{(f,g)}(x, y) dλ^d(y) and h_g(y) := ∫_R h_{(f,g)}(x, y) dλ(x).
(ii) First we observe that
∫_B ∫_A h_{f|g}(x|y) dλ(x) dPg(y) = ∫_B ∫_A h_{f|g}(x|y) dλ(x) h_g(y) dλ^d(y)
 = ∫_B ∫_A h_{f|g}(x|y) dλ(x) 1I_{{h_g(y)>0}} h_g(y) dλ^d(y)
 = ∫_B ∫_A h_{(f,g)}(x, y) dλ(x) 1I_{{h_g(y)>0}} dλ^d(y)
 = ∫_B ∫_A h_{(f,g)}(x, y) dλ(x) dλ^d(y)
 = P({f ∈ A} ∩ {g ∈ B}).
On the other hand, we have that
P({f ∈ A} ∩ {g ∈ B}) = ∫_B P(f ∈ A | g = y) dPg(y),
so that
∫_B ∫_A h_{f|g}(x|y) dλ(x) dPg(y) = ∫_B P(f ∈ A | g = y) dPg(y)
for all B ∈ B(R^d), which implies the assertion.
Another form of a representation of a conditional expectation is the following:
Proposition 7.4.6 (cf. [6, Appendix]). Let f, g : Ω → R be random variables and G ⊆ F a sub-σ-algebra, where we assume that f is independent from G and g is G-measurable. If ϕ is a Borel function on R² such that E|ϕ(f, g)| < ∞, then
E[ϕ(f, g) | G] = (Eϕ(f, y))_{y=g} a.s.
Proof. We have to check that
∫_G ϕ(f, g) dP = ∫_G (Eϕ(f, y))_{y=g} dP
for all G ∈ G. To verify this equation, we start with a change of variables to get, for G ∈ G, that
E[ϕ(f, g)1I_G] = ∫_{R³} ϕ(x, y)z dP_{(f,g,1I_G)}(x, y, z).
Since g is G-measurable and f is independent from G, we get that f and (g, 1I_G) are independent, which means P_{(f,g,1I_G)} = P_f ⊗ P_{(g,1I_G)}. By Fubini's Theorem and another change of variables,
∫_{R³} ϕ(x, y)z dP_{(f,g,1I_G)}(x, y, z) = ∫_{R²} [ ∫_R ϕ(x, y) dP_f(x) ] z dP_{(g,1I_G)}(y, z)
 = ∫_{R²} [Eϕ(f, y)] z dP_{(g,1I_G)}(y, z)
 = E[(Eϕ(f, y))_{y=g} 1I_G],
which implies the assertion.
Chapter 8
What does it mean that random variables converge?
Assume that we perform some experiment several times and get, each time, a measurement denoted by f₁, f₂, f₃, ... To get the true quantity (whatever this means) we naturally consider
S_n = (1/n)(f₁ + ··· + f_n)
for large n and hope that S_n converges to this true value as n → ∞. To make this precise we have at least to clarify in which sense the convergence takes place. This will take us to the almost sure convergence and to the convergence in probability.
8.1 Almost sure convergence
Definition 8.1.1. Let f_n, f : Ω → R be random variables where (Ω, F, P) is a probability space. We say that f_n converges almost surely to f if
P({ω ∈ Ω : |f_n(ω) − f(ω)| →_n 0}) = 1.
We write f_n →_{a.s.} f.
Remark 8.1.2. (i) To formulate the above definition we need {ω ∈ Ω : |f_n(ω) − f(ω)| →_n 0} ∈ F. This follows from
{ω ∈ Ω : |f_n(ω) − f(ω)| →_n 0}
 = {ω ∈ Ω : ∀ m ≥ 1 ∃ k ≥ 1 s.t. ∀ n ≥ k |f_n(ω) − f(ω)| < 1/m}
 = ∩_{m=1}^∞ ∪_{k=1}^∞ ∩_{n=k}^∞ {ω : |f_n(ω) − f(ω)| < 1/m} ∈ F.
(ii) The above definition depends on the measure P. In general one does not have that
P({ω ∈ Ω : |f_n(ω) − f(ω)| →_n 0}) = 1 if and only if Q({ω ∈ Ω : |f_n(ω) − f(ω)| →_n 0}) = 1
if Q is another measure on F.
(iii) Only few properties of f_n are transferred to f by almost sure convergence. Take, for example, Ω = [0, 1], F = B([0, 1]), and λ to be the Lebesgue measure (λ([a, b]) = b − a). Let f_n be
f_n(ω) := n²2^{n+1}ω for ω ∈ [0, 1/(2n)), f_n(ω) := n2^{n+1} − n²2^{n+1}ω for ω ∈ [1/(2n), 1/n), and f_n(ω) := 0 for ω ∈ [1/n, 1].
The function f_n : [0, 1] → R is continuous so that f_n is a random variable. Moreover lim_n f_n(ω) = 0 for all ω ∈ [0, 1]. On the other side
∫_0^1 f_n(ω) dλ(ω) = ∫_0^1 f_n(t) dt = 2^{n−1} →_n ∞.
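A brief numerical sketch of this example (assuming NumPy and the piecewise-linear form written above; the grid resolution and the test point ω = 0.3 are arbitrary): each f_n is a spike of height n·2ⁿ supported on [0, 1/n], so pointwise values tend to 0 while the integrals blow up.

import numpy as np

def f_n(omega, n):
    # spike of height n * 2**n at omega = 1/(2n), zero outside [0, 1/n]
    up = n**2 * 2**(n + 1) * omega
    down = n * 2**(n + 1) - n**2 * 2**(n + 1) * omega
    return np.where(omega < 1 / (2 * n), up, np.where(omega < 1 / n, down, 0.0))

omega = np.linspace(0.0, 1.0, 2_000_001)
for n in (2, 5, 10):
    vals = f_n(omega, n)
    # pointwise value at 0.3 is eventually 0; the mean over [0,1] approximates the integral 2**(n-1)
    print(n, f_n(np.array([0.3]), n)[0], vals.mean())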
A useful characterization of the almost sure convergence is given by
Proposition 8.1.3. Let (Ω, F, P) be a probability space and fn , f : Ω → R
be random variables. Then the following assertions are equivalent.
(i) f_n →_{a.s.} f.
(ii) lim_n P({ω ∈ Ω : sup_{k≥n} |f_k(ω) − f(ω)| > ε}) = 0 for all ε > 0.
Proof. For ε > 0 and n ≥ 1 define
A_n^ε := {ω ∈ Ω : sup_{k≥n} |f_k(ω) − f(ω)| > ε} = ∪_{k=n}^∞ {ω ∈ Ω : |f_k(ω) − f(ω)| > ε} ∈ F,
so that, since A_1^ε ⊇ A_2^ε ⊇ ...,
lim_n P(A_n^ε) = P(∩_{n=1}^∞ A_n^ε)
and
∩_{n=1}^∞ A_n^ε = {ω ∈ Ω : ∀ n = 1, 2, ... sup_{k≥n} |f_k(ω) − f(ω)| > ε}.
(i) ⇒ (ii) Let Ω₀ := {ω : lim_n f_n(ω) = f(ω)} ∈ F. Hence for all ω ∈ Ω₀ there exists n(ω) ≥ 1 with sup_{k≥n(ω)} |f_k(ω) − f(ω)| ≤ ε, so that
Ω₀ ⊆ (∩_{n=1}^∞ A_n^ε)^c and ∩_{n=1}^∞ A_n^ε ⊆ Ω₀^c.
Hence
0 = P(∩_{n=1}^∞ A_n^ε) = lim_n P(A_n^ε).
(ii) ⇒ (i) We have
P(∪_{n=1}^∞ (A_n^ε)^c) = 1 and P(∩_{N=1}^∞ ∪_{n=1}^∞ (A_n^{1/N})^c) = 1.
Finally,
ω ∈ ∩_{N=1}^∞ ∪_{n=1}^∞ (A_n^{1/N})^c
if and only if for all N = 1, 2, ... there exists n = 1, 2, ... such that
sup_{k≥n} |f_k(ω) − f(ω)| ≤ 1/N.
An analogous statement describes Cauchy¹ sequences in the almost sure sense:
Proposition 8.1.4. Let f₁, f₂, ... : Ω → R be a sequence of random variables. Then the following conditions are equivalent:
(i) P({ω ∈ Ω : (f_n(ω))_{n=1}^∞ is a Cauchy sequence}) = 1.
(ii) For all ε > 0 one has that lim_{n→∞} P(sup_{k,l≥n} |f_k − f_l| ≥ ε) = 0.
(iii) For all ε > 0 one has that lim_{n→∞} P(sup_{k≥n} |f_k − f_n| ≥ ε) = 0.
¹ Augustin Louis Cauchy, 21/08/1789 (Paris, France) - 23/05/1857 (Sceaux, France), study of real and complex analysis, theory of permutation groups.
Proof. (ii) ⇐⇒ (iii) follows from
sup_{k≥n} |f_k − f_n| ≤ sup_{k,l≥n} |f_k − f_l| ≤ sup_{k≥n} |f_k − f_n| + sup_{l≥n} |f_n − f_l| = 2 sup_{k≥n} |f_k − f_n|.
(i) ⇐⇒ (ii) Let
A := {ω ∈ Ω : (f_n(ω))_{n=1}^∞ is a Cauchy sequence}
 = ∩_{N=1,2,...} ∪_{n=1,2,...} ∩_{k>l≥n} {ω ∈ Ω : |f_k(ω) − f_l(ω)| ≤ 1/N}.
Consequently, we have that P(A) = 1 if and only if
P(∪_{n=1,2,...} ∩_{k>l≥n} {ω ∈ Ω : |f_k(ω) − f_l(ω)| ≤ 1/N}) = 1
for all N = 1, 2, ..., if and only if
lim_{n→∞} P(∩_{k>l≥n} {ω ∈ Ω : |f_k(ω) − f_l(ω)| ≤ 1/N}) = 1
for all N = 1, 2, ..., if and only if
lim_{n→∞} P(∪_{k>l≥n} {ω ∈ Ω : |f_k(ω) − f_l(ω)| > 1/N}) = 0
for all N = 1, 2, .... We can finish the proof by remarking that
∪_{k>l≥n} {ω ∈ Ω : |f_k(ω) − f_l(ω)| > 1/N} = {sup_{k>l≥n} |f_k(ω) − f_l(ω)| > 1/N}.
A first serious example of the almost sure convergence is the following form of the strong law of large numbers.
Proposition 8.1.5 (Strong law of large numbers). Let (f_n)_{n=1}^∞ be a sequence of independent random variables with Ef_k = 0, k = 1, 2, ..., and c := sup_n Ef_n⁴ < ∞. Then
(f₁ + ··· + f_n)/n →_{a.s.} 0.
Proof. Let S_n := Σ_{k=1}^n f_k. It holds
ES_n⁴ = E(Σ_{k=1}^n f_k)⁴ = E Σ_{i,j,k,l=1}^n f_i f_j f_k f_l = Σ_{k=1}^n Ef_k⁴ + 3 Σ_{k,l=1, k≠l}^n Ef_k² Ef_l²,
because for distinct {i, j, k, l} it holds
Ef_i f_j³ = Ef_i f_j² f_k = Ef_i f_j f_k f_l = 0
by independence. For example, Ef_i f_j³ = Ef_i Ef_j³ = 0 · Ef_j³ = 0, where one gets that f_j³ is integrable by E|f_j|³ ≤ (E|f_j|⁴)^{3/4} ≤ c^{3/4}. Moreover, by Jensen's inequality,
(Ef_k²)² ≤ Ef_k⁴ ≤ c.
Hence Ef_k² Ef_l² ≤ c for k ≠ l. Consequently,
ES_n⁴ ≤ nc + 3n(n − 1)c ≤ 3cn²,
and
E Σ_{n=1}^∞ S_n⁴/n⁴ = Σ_{n=1}^∞ ES_n⁴/n⁴ ≤ Σ_{n=1}^∞ 3c/n² < ∞.
This implies that S_n⁴/n⁴ →_{a.s.} 0 and therefore S_n/n →_{a.s.} 0.
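A quick simulation of Proposition 8.1.5 (assuming NumPy; the uniform distribution on [−1, 1] is an arbitrary choice satisfying Ef_k = 0 and sup_n Ef_n⁴ ≤ 1, and the sample sizes are illustrative):

import numpy as np

rng = np.random.default_rng(3)
xi = rng.uniform(-1.0, 1.0, size=1_000_000)   # independent, mean zero, bounded fourth moments
s = np.cumsum(xi)
for n in (10, 1_000, 100_000, 1_000_000):
    print(n, s[n - 1] / n)                    # S_n / n tends to 0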
8.2 Convergence in probability
Although we saw in Remark 8.1.2(iii) that a.s. convergence may be a weak notion, there are examples where this notion turns out to be too strong.
Example 8.2.1. Let Ω = [0, 1], F = B([0, 1]), and let λ be the Lebesgue measure on F. Define
f₁(ω) = χ_{[0,1/2)}(ω), f₂(ω) = χ_{[1/2,1)}(ω),
f₃(ω) = χ_{[0,1/4)}(ω), f₄(ω) = χ_{[1/4,1/2)}(ω), ..., f₆(ω) = χ_{[3/4,1)}(ω),
f₇(ω) = χ_{[0,1/8)}(ω), ...
We have the feeling that lim_n f_n(ω) = 0, but in what sense? We do not have a.s. convergence since #{n : f_n(ω) = 1} = ∞ for all ω ∈ [0, 1).
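The following sketch (assuming NumPy; the indexing of the blocks and the test point ω = 0.3 are my own bookkeeping, chosen to reproduce the enumeration above) makes both claims visible: every fixed ω is hit by infinitely many indicator intervals, while the lengths of the intervals tend to 0.

import numpy as np

def interval(n):
    """Endpoints of the indicator interval of f_n (n >= 1) in the enumeration above."""
    k = int(np.floor(np.log2(n + 1)))   # block number; intervals in block k have length 2**-k
    j = n + 1 - 2**k                     # position inside the block
    return j / 2**k, (j + 1) / 2**k

omega = 0.3
hits = [n for n in range(1, 200) if interval(n)[0] <= omega < interval(n)[1]]
print(hits)                                          # f_n(omega) = 1 again and again: no a.s. convergence
print([interval(n)[1] - interval(n)[0] for n in (1, 3, 7, 15)])   # lambda(f_n != 0) = 1/2, 1/4, 1/8, ...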
One solution consists in the concept of convergence in probability introduced below in Definition 8.2.4. A direct way to describe that a sequence of random variables f_n : Ω → R converges to a random variable f : Ω → R in probability consists in
lim_n P(|f_n − f| > ε) = 0 for all ε > 0.
By the following Ky Fan metric we can express this convergence in terms of a metric, which opens the way to measure the distance between two random variables defined on the same probability space. To shorten the notation it will be useful to use
L₀(Ω, F, P) := {f : Ω → R random variable}.
Definition 8.2.2 (Ky Fan²-metric). For f, g ∈ L₀(Ω, F, P) we let
d(f, g) := inf{ε > 0 : P(|f − g| > ε) ≤ ε}.
Lemma 8.2.3. For f_n, f ∈ L₀(Ω, F, P) we have that d(f_n, f) → 0 as n → ∞ if and only if
lim_n P(|f_n − f| > ε) = 0 for all ε > 0.
Proof. The proof is an exercise.
Definition 8.2.4. Let f_n, f ∈ L₀(Ω, F, P).
(i) The sequence (f_n)_{n=1}^∞ converges to f in probability if lim_n d(f_n, f) = 0. We write f_n →_P f.
(ii) The sequence (f_n)_{n=1}^∞ is a Cauchy sequence in probability³ provided that for all ε > 0 there exists n(ε) ≥ 1 such that for all k, l ≥ n(ε) one has that d(f_k, f_l) ≤ ε.
The idea behind this is that (L0 (Ω, F, P), d) becomes a pseudo metric space:
Lemma 8.2.5. For f, g, h ∈ L0 (Ω, F, P) one has
(i) d(f, g) = 0 if and only if P(f = g) = 1,
(ii) d(f, h) ≤ d(f, g) + d(g, h),
(iii) d(f, g) = d(g, f ).
Proof. The proof is an exercise.
Basic properties of the convergence in probability are given by
² Ky Fan, 19/12/1914 (Hangzhou, China) - 22/03/2010 (Santa Barbara, USA).
³ or fundamental in probability
Proposition 8.2.6. Let f_n, g_n, f, g ∈ L₀(Ω, F, P).
(i) If f_n →_P f, then (f_n)_{n=1}^∞ is a Cauchy sequence in probability.
(ii) Uniqueness of the limit: If f_n →_P f and f_n →_P g, then P(f = g) = 1.
(iii) Algebraic operations: If f_n →_P f and g_n →_P g and λ, µ ∈ R, then
λf_n + µg_n →_P λf + µg and f_n g_n →_P fg.
(iv) Completeness: If (f_n)_{n=1}^∞ is a Cauchy sequence in probability, then there exist a subsequence 1 ≤ n₁ < n₂ < ··· and h ∈ L₀(Ω, F, P) with
f_n →_P h and f_{n_k} →_{a.s.} h.
Proof. We only prove part (iv); the remaining parts are exercises. We fix a sequence ε₁ > ε₂ > ε₃ > ··· > 0 such that
Σ_{j=1}^∞ ε_j < ∞
and find 1 ≤ n₁ < n₂ < ... such that
P({ω ∈ Ω : |f_k(ω) − f_l(ω)| > ε_j}) < ε_j
for k, l ≥ n_j. Taking the sequence (f_{n_j})_{j=1}^∞ we get that
P(ω ∈ Ω : |f_{n_{j+1}}(ω) − f_{n_j}(ω)| > ε_j) < ε_j
and
Σ_{j=1}^∞ P(ω ∈ Ω : |f_{n_{j+1}}(ω) − f_{n_j}(ω)| > ε_j) < ∞.
Applying the Borel-Cantelli Lemma implies
P(ω ∈ Ω : |f_{n_{j+1}}(ω) − f_{n_j}(ω)| > ε_j infinitely often) = 0.
Hence
P({ω ∈ Ω : Σ_{j=1}^∞ |f_{n_{j+1}}(ω) − f_{n_j}(ω)| < ∞}) = 1.
We set
h(ω) := f_{n₁}(ω) + Σ_{j=1}^∞ (f_{n_{j+1}}(ω) − f_{n_j}(ω)) if Σ_{j=1}^∞ |f_{n_{j+1}}(ω) − f_{n_j}(ω)| < ∞, and h(ω) := 0 else,
and get that f_{n_j} →_{a.s.} h. Finally, we have to check that f_n →_P h. For ε > 0 one gets
P(|f_n − h| > ε) ≤ P(|f_n − f_{n_j}| + |f_{n_j} − h| > ε) ≤ P(|f_n − f_{n_j}| > ε/2) + P(|f_{n_j} − h| > ε/2).
We can conclude by lim_{j→∞} P(|f_{n_j} − h| > ε/2) = 0 and
P(|f_n − f_{n_j}| > ε/2) ≤ η
whenever n, n_j ≥ n(η, ε) ≥ 1 and η > 0 is arbitrary.
Example (Example 8.2.1 continued). We have that f_n →_λ 0. In fact,
lim_n λ({ω ∈ [0, 1] : |f_n(ω)| > ε}) ≤ lim_n λ({ω ∈ [0, 1] : f_n(ω) ≠ 0}) = 0
since
λ({ω ∈ [0, 1] : f_n(ω) ≠ 0}) = 1/2 for n = 1, 2; = 1/4 for n = 3, 4, 5, 6; = 1/8 for n = 7, ...; and so on.
Now we can clarify the relation between the almost sure convergence and the convergence in probability.
Proposition 8.2.7. For f_n, f ∈ L₀(Ω, F, P) one has:
(i) If f_n →_{a.s.} f, then f_n →_P f.
(ii) If f_n →_P f, then there exists 1 ≤ n₁ < n₂ < n₃ < ··· such that f_{n_k} →_{a.s.} f as k → ∞.
(iii) One has f_n →_P f if and only if for all sub-sequences 1 ≤ n₁ < n₂ < n₃ < ... there is a sub-sequence 1 ≤ n_{k₁} < n_{k₂} < n_{k₃} < ··· such that f_{n_{k_l}} →_{a.s.} f as l → ∞.
Proof. (i) The first assertion follows from Proposition 8.1.3 and
P(|f_n − f| > ε) ≤ P(sup_{k≥n} |f_k − f| > ε).
(ii) The sequence (f_n)_{n=1}^∞ is a Cauchy sequence in probability, so that by Proposition 8.2.6 there is a subsequence (n_k)_{k=1}^∞ and a random variable h such that f_{n_k} → h a.s. But this implies f_{n_k} → h and f_{n_k} → f in probability, so that f = h a.s.
(iii) If f_n → f in probability, then f_{n_k} → f in probability and part (ii) implies the existence of one more sub-sequence (n_{k_l})_{l=1}^∞ such that f_{n_{k_l}} → f a.s. Now assume that f_n does not converge to f in probability. We find an ε > 0 and a subsequence (n_k)_{k=1}^∞ such that P(|f_{n_k} − f| > ε) > ε. However, by assumption there is a subsequence (n_{k_l})_{l=1}^∞ such that f_{n_{k_l}} → f a.s. as l → ∞, and hence also in probability. But this implies that P(|f_{n_{k_l}} − f| > ε) → 0 as l → ∞, which is a contradiction.
Example (Example 8.2.1 continued). What is a possible sub-sequence? One can take
f₁ = χ_{[0,1/2)}, f₃ = χ_{[0,1/4)}, f₇ = χ_{[0,1/8)}, ....
Proposition 8.2.8 (Continuous mapping theorem). Let f_n, f ∈ L₀(Ω, F, P) and let ϕ : R → R be continuous. If f_n →_P f, then ϕ(f_n) →_P ϕ(f).
Proof. The assertion follows directly from Proposition 8.2.7 (iii).
We conclude with a topological aspect by formulating the convergence in probability in terms of metric spaces.
Definition 8.2.9. (i) For f, g ∈ L0 (Ω, F, P) we define the equivalence relation f ∼ g by P(f = g) = 1.
(ii) L0 (Ω, F, P) is the space of all equivalence classes from L0 (Ω, F, P).
(iii) For f, g ∈ L0 (Ω, F, P) and λ ∈ R we introduce the linear operations
λ[f ] := [λf ] and [f ] + [g] := [f + g].
The final result is
Proposition 8.2.10. The space [L₀(Ω, F, P), d̂], equipped with the induced Ky Fan metric, is a complete linear metric space.
Proposition 8.2.11 (Weak law of large numbers). Let (f_n)_{n=1}^∞ be a sequence of random variables such that Ef_n² < ∞, cov(f_k, f_l) = 0 for k ≠ l, Ef_k = m, and
lim_n (1/n²)[var(f₁) + ··· + var(f_n)] = 0.
Then
(f₁ + ··· + f_n)/n →_P m as n → ∞.
Proof. By Chebyshev's inequality (Corollary 6.3.5) we have that
P(|(f₁ + ··· + f_n − nm)/n| > ε) ≤ E|f₁ + ··· + f_n − nm|²/(n²ε²) = E(Σ_{k=1}^n (f_k − m))²/(n²ε²) = [var(f₁) + ··· + var(f_n)]/(n²ε²) → 0
as n → ∞, where we used that the covariances vanish.
8.3 Convergence in mean
Definition 8.3.1. Let f_n, f ∈ L₀(Ω, F, P).
(i) For p ∈ (0, ∞) we say that f_n converges to f with respect to the p-th mean (f_n →_{Lp} f) provided that
lim_{n→∞} ∫_Ω |f_n − f|^p dP = 0.
(ii) For p = ∞ we say that f_n converges to f uniformly (f_n →_{L∞} f) provided that
lim_{n→∞} ess sup |f_n(ω) − f(ω)| = 0,
where
ess sup_Ω |g| := inf{ sup_{ω∈Ω\N} |g(ω)| : N ∈ F, P(N) = 0 }.
Since fn and f are random variables and x 7→ |x|p is a continuous function,
ω 7→ |fn (ω)−f (ω)|p is a non-negative random variable and we may integrate.
Let us summarize some basic properties of this type of convergence.
Proposition 8.3.2. Let 0 < p < q < ∞ and f_n, f, g_n, g ∈ L₀(Ω, F, P).
(i) If f_n →_{Lp} f, then f_n →_P f.
(ii) If f_n →_{Lp} f and g_n →_{Lp} g, then f_n + g_n →_{Lp} f + g.
(iii) If f_n →_{Lq} f, then f_n →_{Lp} f.
(iv) If f_n →_P f and E sup_n |f_n|^p < ∞, then f ∈ L_p(Ω, F, P) and
lim_n E|f_n − f|^p = 0.
Proof. (i) Let ε > 0. Then
P({ω : |f_n(ω) − f(ω)| > ε}) = P({ω : |f_n(ω) − f(ω)|^p > ε^p}) ≤ (1/ε^p) ∫_Ω |f_n − f|^p dP →_n 0.
(ii) follows directly from the Minkowski inequality, Proposition 6.3.4.
(iii) is an application of Hölder's inequality, Proposition 6.3.1: let r := q/p ∈ (1, ∞) and 1 = 1/r + 1/s. Then
∫_Ω |f_n − f|^p dP = ∫_Ω |f_n − f|^p · 1 dP ≤ (∫_Ω (|f_n − f|^p)^r dP)^{1/r} (∫_Ω 1^s dP)^{1/s} = (∫_Ω |f_n − f|^q dP)^{p/q}.
(iv) Assume first that the claim fails to be true. Then we find a sub-sequence n₁ < n₂ < ··· and a δ > 0 such that
∫_Ω |f_{n_k} − f|^p dP ≥ δ > 0.
But there is also one more sub-sequence (n_{k_l})_{l=1}^∞ such that lim_l f_{n_{k_l}} = f a.s. Hence, to get the contradiction, it suffices to prove the original claim under the stronger assumption that f_n → f a.s. By Lebesgue's theorem of dominated convergence we get that
E|f|^p = lim_n E|f_n|^p < ∞.
Hence
E sup_n |f_n − f|^p ≤ c_p E(sup_n |f_n|^p + |f|^p) < ∞
and, again by dominated convergence,
0 = E lim_n |f_n − f|^p = lim_n E|f_n − f|^p.
In our setting there is a problem with property (i) of a norm: ‖f‖_{Lp} = 0 only yields f = 0 a.s. For this reason we have to work with the equivalence classes introduced in Definition 8.2.9.
Definition 8.3.3. Letting p ∈ (0, ∞] and
‖f‖_{Lp} := (∫_Ω |f|^p dP)^{1/p} for p ∈ (0, ∞) and ‖f‖_{L∞} := ess sup_Ω |f| for p = ∞,
we define
L_p(Ω, F, P) := {f : Ω → R random variable, ‖f‖_{Lp} < ∞},
L̂_p(Ω, F, P) := {f̂ : f ∈ L_p(Ω, F, P)} with ‖f̂‖_{Lp} := ‖f‖_{Lp}.
Proposition 8.3.4. The space (L̂_p(Ω, F, P), ‖·‖_{Lp}) is a Banach space if p ∈ [1, ∞] and a quasi-Banach space if p ∈ (0, 1).
Proof. (i) ‖f̂‖_{Lp} = 0 if and only if f = 0 a.s. if and only if f̂ = 0.
(ii) is clear and (iii) is a consequence of the Minkowski inequality.
(iv) Assume a Cauchy sequence (f̂_n)_{n=1}^∞ ⊆ L̂_p. Then (f_n)_{n=1}^∞ is a Cauchy sequence in probability since
P({ω : |f_n(ω) − f_m(ω)| > λ}) ≤ (1/λ^p) ‖f̂_n − f̂_m‖_{Lp}^p ≤ ε
for n, m ≥ n(λ, ε) ≥ 1. Hence there is a limit f : Ω → R such that f_n →_P f. It remains to show that f_n →_{Lp} f as well. We pick a sub-sequence (f_{n_k})_{k=1}^∞ such that lim_k f_{n_k} = f P-a.s. Applying the lemma of Fatou gives
∫_Ω |f_m − f|^p dP ≤ lim inf_k ∫_Ω |f_m − f_{n_k}|^p dP ≤ ε
for m ≥ m(ε) ≥ 1.
8.4 Summary
L₀(Ω, F, P): complete metric space with a translation invariant metric.
L_p(Ω, F, P) for 0 < p ≤ ∞: quasi-Banach space.
L_p(Ω, F, P) for 1 ≤ p ≤ ∞: Banach space.
The relations between the modes of convergence can be summarized as follows:
- If f_n → f P-a.s., then f_n → f in probability.
- If f_n → f in probability, then there exists a subsequence (n_k)_{k=1}^∞ with f_{n_k} → f P-a.s. as k → ∞.
- If f_n → f in L_q and 0 < p < q < ∞, then f_n → f in L_p; and if f_n → f in L_p, then f_n → f in probability.
- If f_n → f in probability and E sup_n |f_n|^p < ∞, then f_n → f in L_p.
Chapter 9
Uniform integrability
Now we want to understand the condition E sup_n |f_n|^p < ∞ for p = 1 from Proposition 8.3.2 (iv) better. We recall that ∫_Ω |f| dP < ∞ implies that
lim_{c→∞} ∫_{{|f|≥c}} |f| dP = 0.
In fact, the latter condition is (of course) equivalent to ∫_Ω |f| dP < ∞. This leads us to the following definition of uniform integrability.
Definition 9.0.1. Let (Ω, F, P) be a probability space and f_i : Ω → R, i ∈ I, be a family of random variables. Then the family (f_i)_{i∈I} is called uniformly integrable provided that for all ε > 0 there is a constant c > 0 such that
sup_{i∈I} ∫_{{|f_i|≥c}} |f_i| dP ≤ ε,
or equivalently,
lim_{c↑∞} sup_{i∈I} ∫_{{|f_i|≥c}} |f_i| dP = 0.
Example 9.0.2. Let (Ω, F, P) = ([0, 1], B([0, 1]), λ) and f_n(t) := nχ_{[0,1/n]}(t), n ∈ I = {1, 2, 3, ...}. This family is not uniformly integrable because for any c > 0 we have that
sup_n ∫_{{|f_n|≥c}} |f_n(t)| dt = 1.
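A small check of this example (plain Python; the tail integral is evaluated in closed form, since for n ≥ c the set {|f_n| ≥ c} is all of [0, 1/n] and the integral equals 1, while for n < c it is empty):

def tail_integral(n, c):
    # integral of |f_n| over {|f_n| >= c} for f_n = n * chi_[0, 1/n]
    return 1.0 if n >= c else 0.0

for c in (10.0, 100.0, 1000.0):
    print(c, max(tail_integral(n, c) for n in range(1, 10_000)))   # the supremum stays 1 for every c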
The name uniform integrability suggests that the expected values of a uniformly integrable family of random variables are uniformly bounded. In fact, we have
Lemma 9.0.3. Let f_i : Ω → R, i ∈ I, be a uniformly integrable family. Then
sup_{i∈I} E|f_i| < ∞.
Proof. We choose an ε > 0 and find a c > 0 such that
sup_{i∈I} ∫_{{|f_i|≥c}} |f_i| dP ≤ ε.
Then, for all i ∈ I,
E|f_i| = ∫_{{|f_i|≥c}} |f_i| dP + ∫_{{|f_i|<c}} |f_i| dP ≤ ε + ∫_{{|f_i|<c}} c dP ≤ ε + c.
An important sufficient criterion is
Lemma 9.0.4. Let G : [0, ∞) → [0, ∞) be non-negative and increasing such that
lim_{t→∞} G(t)/t = ∞
and let (f_i)_{i∈I} be a family of random variables f_i : Ω → R such that
sup_{i∈I} EG(|f_i|) < ∞.
Then (f_i)_{i∈I} is uniformly integrable.
Proof. We let ε > 0 and M := sup_{i∈I} EG(|f_i|), and find a c > 0 such that
t ≤ (ε/M) G(t) for t ≥ c.
Then
∫_{{|f_i|≥c}} |f_i| dP ≤ (ε/M) ∫_{{|f_i|≥c}} G(|f_i|) dP ≤ ε.
Example 9.0.5. (i) Examples for functions in Lemma 9.0.4 are G(t) := tp
with 1 < p < ∞ and G(t) := t log(1 + t). Hence supi∈I E|fi |p < ∞ or
supi∈I E[|fi | log(1+|fi |)] < ∞ imply that (fi )i∈I is uniformly integrable.
(ii) One cannot take the function G(t) = t as we have in Example 9.0.2
that E|fn | = 1.
The main statement of this section is
Proposition 9.0.6. Assume random variables f_n, f : Ω → R such that f_n →_P f, E|f_n| < ∞ and E|f| < ∞. Then the following assertions are equivalent:
(i) f_n → f in L₁.
(ii) (f_n)_{n=1}^∞ is uniformly integrable.
(iii) ‖f_n‖_{L₁} →_n ‖f‖_{L₁}.
The main application of the equivalence above is
Corollary 9.0.7. Assume random variables f_n, f : Ω → R such that f_n →_P f and (f_n)_{n=1}^∞ is uniformly integrable. Then E|f| < ∞ and
lim_{n→∞} |Ef_n − Ef| ≤ lim_{n→∞} E|f_n − f| = 0.
Proof. According to Proposition 8.2.6 there is a sub-sequence 1 ≤ n₁ < n₂ < ··· such that f_{n_k} → f a.s. as k → ∞. Applying the Lemma of Fatou, Proposition 5.4.5, we get that
E|f| ≤ lim inf_k E|f_{n_k}| < ∞.
Hence we can apply Proposition 9.0.6 and use the implication (ii) ⇒ (i).
For the proof of Proposition 9.0.6 we need some lemmata:
Lemma 9.0.8. The conditions f_n →_P f, |f_n| ≤ g and |f| ≤ g for some g with Eg < ∞ imply that
lim_n ∫_Ω f_n dP = ∫_Ω f dP.
Proof. Assume that the conclusion is not true. Then there is an ε > 0 and a sub-sequence n₁ < n₂ < n₃ < ··· such that
|∫_Ω f_{n_k} dP − ∫_Ω f dP| ≥ ε.
But we can find one more sub-sequence (n_{k_l})_{l=1}^∞ such that
f_{n_{k_l}} → f a.s. as l → ∞.
Applying dominated convergence yields in this case that
lim_l ∫_Ω f_{n_{k_l}} dP = ∫_Ω f dP,
which is a contradiction.
Lemma 9.0.9. For f ∈ L₁(Ω, F, P) and P(A_n) →_n 0 one has
lim_n ∫_{A_n} f dP = 0.
Proof. We apply Lemma 9.0.8 to f̃_n := fχ_{A_n} and get
|f̃_n| ≤ |f| =: g and f̃_n →_P 0,
because
P(|f̃_n| > c) ≤ P(A_n) →_n 0 for every c > 0.
Proof of Proposition 9.0.6. (i) ⇒ (ii). We have that
∫_{{|f_n|≥c}} |f_n| dP ≤ ∫_{{|f_n|≥c}} |f_n − f| dP + ∫_{{|f_n|≥c}} |f| dP
 ≤ ‖f_n − f‖_{L₁} + ∫_{{|f_n−f|≥c/2}} |f| dP + ∫_{{|f|≥c/2}} |f| dP
and
lim_n ‖f_n − f‖_{L₁} = 0 and lim_{c↑∞} ∫_{{|f|≥c/2}} |f| dP = 0.
For the middle term we use Lemma 9.0.9 to deduce that
∫_{{|f_n|≥c}} |f_n| dP ≤ ε
for c ≥ c(ε) and n ≥ n(c(ε), ε).
(ii) ⇒ (iii). By Lemma 9.0.8 we get that
∫_Ω (|f_n| ∧ c) dP → ∫_Ω (|f| ∧ c) dP,
because (|f_n| ∧ c) →_P (|f| ∧ c) as x ↦ |x| ∧ c is a continuous function, so that the continuous mapping theorem, Proposition 8.2.8, applies. But then
|E|f_n| − E|f|| ≤ |E(|f_n| ∧ c) − E(|f| ∧ c)| + |E|f_n| − E(|f_n| ∧ c)| + |E|f| − E(|f| ∧ c)|
 ≤ |E(|f_n| ∧ c) − E(|f| ∧ c)| + ∫_{{|f_n|≥c}} |f_n| dP + ∫_{{|f|≥c}} |f| dP.
(iii) ⇒ (i). We have that
||f_n − f| − |f_n| + |f|| ≤ 2|f| and |f_n − f| − |f_n| + |f| →_P 0.
Hence, by Lemma 9.0.8,
E[|f_n − f| − |f_n| + |f|] →_n 0
and E|f_n − f| →_n 0 because E|f_n| →_n E|f|.
Chapter 10
Limit theorems
In this chapter we present the basic limit theorems, the Strong Law of Large
Numbers and the Law of Iterated Logarithm. Whereas the first one is expected, the second one is surprising.
10.1 Sums of independent random variables
If ε₁, ε₂, ... are independent Bernoulli random variables, that means P(ε_n = 1) = P(ε_n = −1) = 1/2, then we know from Section 4.2 that
P(Σ_{n=1}^∞ ε_n/n converges) ∈ {0, 1}.
But do we get probability zero or one? For this there is a beautiful complete answer: it consists of the Three-Series-Theorem of Kolmogorov, which we will deduce from the Two-Series-Theorem of Kolmogorov.
Proposition 10.1.1 (Two-Series-Theorem of Kolmogorov). Assume a probability space (Ω, F, P) and random variables ξ₁, ξ₂, ... : Ω → R which are assumed to be independent and such that Eξ_n² < ∞.
(i) If Σ_{n=1}^∞ Eξ_n converges and Σ_{n=1}^∞ E(ξ_n − Eξ_n)² < ∞, then
P(Σ_{n=1}^∞ ξ_n converges) = 1.
(ii) Let |ξ_n(ω)| ≤ d for all n = 1, 2, ... and all ω ∈ Ω, where d > 0 is some constant. Then the following assertions are equivalent:
(a) P(Σ_{n=1}^∞ ξ_n converges) = 1.
(b) Σ_{n=1}^∞ Eξ_n converges and Σ_{n=1}^∞ E(ξ_n − Eξ_n)² < ∞.
To prove the Two-Series-Theorem we need a deviation inequality due to Kolmogorov. Before we state and prove this inequality, let us motivate it. Assume a random variable f : Ω → R with Ef² < ∞. Then, by Chebyshev's¹ inequality,
ε² P(|f| ≥ ε) = ε² P(|f|² ≥ ε²) ≤ ∫_Ω |f|² dP
for ε > 0, so that
P(|f| ≥ ε) ≤ E|f|²/ε².
Assuming independent random variables ξ₁, ξ₂, ... : Ω → R and f_n := ξ₁ + ··· + ξ_n, this gives that
P(|f_n| ≥ ε) ≤ E|f_n|²/ε².
Now one can enlarge the left-hand side by replacing |f_n| by sup_{k=1,...,n} |f_k|, so that we get a maximal inequality.

¹ Pafnuty Lvovich Chebyshev, 16/05/1821 (Okatovo, Russia) - 08/12/1894 (St Petersburg, Russia), number theory, mechanics, famous for the orthogonal polynomials he investigated.
Lemma 10.1.2 (Inequality of Kolmogorov). Let ξ₁, ξ₂, ... : Ω → R be independent with Eξ_n² < ∞ and Eξ_n = 0. Then, for f_k := ξ₁ + ··· + ξ_k and ε > 0,
P(max_{1≤k≤n} |f_k| ≥ ε) ≤ (1/ε²) ∫_{{max_{1≤k≤n} |f_k| ≥ ε}} f_n² dP ≤ Ef_n²/ε².
Proof. Let A_k := {|f₁| < ε, ..., |f_{k−1}| < ε, |f_k| ≥ ε}. Then
ε² P(max_{1≤k≤n} |f_k| ≥ ε) = ε² Σ_{k=1}^n P(A_k) ≤ Σ_{k=1}^n ∫_{A_k} f_k² dP
 ≤ Σ_{k=1}^n [ ∫_{A_k} f_k² dP + 2 ∫_{A_k} f_k (Σ_{i=k+1}^n ξ_i) dP + ∫_{A_k} (Σ_{i=k+1}^n ξ_i)² dP ]
 = Σ_{k=1}^n ∫_{A_k} f_n² dP
 ≤ ∫_Ω f_n² dP,
where the second inequality holds because the cross term vanishes (f_k 1I_{A_k} and ξ_{k+1} + ··· + ξ_n are independent and the latter has expectation zero) and the last added term is non-negative.
Example 10.1.3. For Bernoulli variables (ε_n)_{n=1}^∞ it follows from Lemma 10.1.2 that
P(max_{1≤k≤n} |ε₁ + ··· + ε_k| ≥ ε) ≤ Ef_n²/ε² = n/ε².
Letting ε = θ√n, θ > 0, this gives
P(max_{1≤k≤n} |ε₁ + ··· + ε_k| ≥ θ√n) ≤ 1/θ².
The left-hand side describes the probability that the random walk exceeds −θ√n or θ√n up to step n (not only at step n).
Now we need the converse of the above inequality:
Lemma 10.1.4 (Converse inequality of Kolmogorov). Let ξ₁, ξ₂, ... : Ω → R be a sequence of independent random variables such that
(i) Eξ_n = 0 for n = 1, 2, ...,
(ii) there exists a constant c > 0 such that |ξ_n(ω)| ≤ c for all ω ∈ Ω and n = 1, 2, ....
If f_n := ξ₁ + ··· + ξ_n, ε > 0, n ∈ {1, 2, ...}, and Ef_n² > 0, then
P(max_{1≤k≤n} |f_k| ≥ ε) ≥ 1 − (c + ε)²/Ef_n².
Proof of Proposition 10.1.1. (i) We let η_n := ξ_n − Eξ_n so that Eη_n = 0 and
Σ_{n=1}^∞ Eη_n² = Σ_{n=1}^∞ E(ξ_n − Eξ_n)² < ∞.
Since Σ_{n=1}^∞ Eξ_n converges, the convergence of Σ_{n=1}^∞ η_n(ω) implies the convergence of Σ_{n=1}^∞ ξ_n(ω) = Σ_{n=1}^∞ (η_n(ω) + Eξ_n). Hence it is sufficient to show that
P(Σ_{n=1}^∞ η_n converges) = 1.
Let g_n := η₁ + ··· + η_n. Applying Proposition 8.1.4 we see that it is enough to prove
lim_n P(sup_{k≥n} |g_k − g_n| ≥ ε) = 0
for all ε > 0. But this follows from
P(sup_{k≥n} |g_k − g_n| ≥ ε) = lim_{N→∞} P(sup_{n≤k≤n+N} |g_k − g_n| ≥ ε)
 ≤ lim_{N→∞} E(g_{n+N} − g_n)²/ε²
 = lim_{N→∞} Σ_{k=1}^N Eη_{n+k}²/ε²
 ≤ Σ_{l=n+1}^∞ Eη_l²/ε²,
where we have used the Kolmogorov inequality Lemma 10.1.2. Obviously,
the last term converges to zero as n → ∞.
(ii) Because of step (i) we only have to prove that (a) ⇒ (b). We use a symmetrization argument and consider a new sequence ξ₁′, ξ₂′, ... : Ω′ → R of independent random variables on (Ω′, F′, P′) having the same distribution as the original sequence ξ₁, ξ₂, ..., that means
P(ξ_n ≤ λ) = P′(ξ_n′ ≤ λ)
for all n = 1, 2, ... and all λ ∈ R. We also may assume that |ξ_n′(ω′)| ≤ d for all ω′ ∈ Ω′ and n = 1, 2, .... Taking the product space (M, Σ, µ) = (Ω, F, P) × (Ω′, F′, P′) we may consider ξ_n, ξ_n′ : M → R with the convention that ξ_n(ω, ω′) = ξ_n(ω) and ξ_n′(ω, ω′) = ξ_n′(ω′). Now we let
η_n(ω, ω′) := ξ_n(ω, ω′) − ξ_n′(ω, ω′)
and get
E_µ η_n = Eξ_n − Eξ_n′ = 0, |η_n(ω, ω′)| ≤ |ξ_n(ω, ω′)| + |ξ_n′(ω, ω′)| ≤ 2d,
and
µ({(ω, ω′) ∈ M : Σ_{n=1}^∞ η_n(ω, ω′) converges}) = (P × P′)({(ω, ω′) ∈ Ω × Ω′ : Σ_{n=1}^∞ (ξ_n(ω) − ξ_n′(ω′)) converges}) = 1.
Letting g_n := η₁ + ··· + η_n and ε > 0, Proposition 8.1.4 implies that there is an n ∈ {1, 2, ...} such that
µ(sup_{k≥n} |g_k − g_n| ≥ ε) < 1/2.
Exploiting Lemma 10.1.4 gives, for N ≥ 1, that
1 − (2d + ε)²/Σ_{k=n+1}^{n+N} Eη_k² = 1 − (2d + ε)²/E(g_{n+N} − g_n)² ≤ µ(max_{k=n,...,n+N} |g_k − g_n| ≥ ε) < 1/2.
But from this it follows that
Σ_{k=n+1}^{n+N} Eη_k² < 2(2d + ε)² < ∞
for all N, so that
Σ_{k=1}^∞ Eη_k² < ∞.
This gives that
Σ_{n=1}^∞ E(ξ_n − Eξ_n)² = (1/2) Σ_{n=1}^∞ E([ξ_n − Eξ_n] − [ξ_n′ − Eξ_n′])² = (1/2) Σ_{n=1}^∞ Eη_n² < ∞.
It remains to show that Σ_{n=1}^∞ Eξ_n exists. Since Σ_{n=1}^∞ ξ_n converges almost surely, and Σ_{n=1}^∞ (ξ_n − Eξ_n) converges almost surely because of step (i) and Σ_{n=1}^∞ E(ξ_n − Eξ_n)² < ∞ proved right now, we have that Σ_{n=1}^∞ Eξ_n converges as well.
As we saw, the Two-Series-Theorem only gives an equivalence in the case that sup_{n,ω} |ξ_n(ω)| < ∞. For the general case we have an equivalence as well, the Three-Series-Theorem, which one can quickly deduce from the Two-Series-Theorem. For its formulation we introduce, for a random variable f : Ω → R and some constant c > 0, the truncated random variable
f^c(ω) := f(ω) if |f(ω)| ≤ c, f^c(ω) := c if f(ω) > c, and f^c(ω) := −c if f(ω) < −c.
Corollary 10.1.5 (Three-Series-Theorem of Kolmogorov). Assume a probability space (Ω, F, P) and independent random variables ξ₁, ξ₂, ... : Ω → R. Then the following conditions are equivalent:
(i) P(Σ_{n=1}^∞ ξ_n converges) = 1.
(ii) For all constants c > 0 the following three conditions are satisfied:
(a) Σ_{n=1}^∞ Eξ_n^c converges,
(b) Σ_{n=1}^∞ E(ξ_n^c − Eξ_n^c)² < ∞,
(c) Σ_{n=1}^∞ P(|ξ_n| ≥ c) < ∞.
(iii) There exists one constant c > 0 such that conditions (a), (b), and (c) of item (ii) are satisfied.
Proof. (iii) ⇒ (i). The Two-Series-Theorem implies that
P(Σ_{n=1}^∞ ξ_n^c converges) = 1.
Consider η_n := ξ_n − ξ_n^c and B_n := {ω ∈ Ω : η_n(ω) ≠ 0}. Then
Σ_{n=1}^∞ P(B_n) = Σ_{n=1}^∞ P(|ξ_n| > c) < ∞.
The lemma of Borel-Cantelli implies that
P({ω ∈ Ω : #{n : ω ∈ B_n} = ∞}) = 0.
This implies that P(Σ_{n=1}^∞ η_n converges) = 1, so that
P(Σ_{n=1}^∞ ξ_n converges) = P(Σ_{n=1}^∞ [η_n + ξ_n^c] converges) = 1.
(ii) ⇒ (iii) is trivial.
(i) ⇒ (ii) The almost sure convergence of Σ_{n=1}^∞ ξ_n(ω) implies that lim_n |ξ_n(ω)| = 0 a.s., so that
P(lim sup_n {ω ∈ Ω : |ξ_n(ω)| ≥ c}) = 0.
The Lemma of Borel-Cantelli (note that the random variables ξ_n are independent) gives that
Σ_{n=1}^∞ P(|ξ_n| ≥ c) < ∞,
so that we obtain condition (c). Next, the almost sure convergence of Σ_{n=1}^∞ ξ_n(ω) implies the almost sure convergence of Σ_{n=1}^∞ ξ_n^c(ω), since ξ_n(ω) = ξ_n^c(ω) for n ≥ n(ω) for almost all ω ∈ Ω. Hence we can apply the Two-Series-Theorem, Proposition 10.1.1, to obtain items (a) and (b).
Now we consider an example.
Example 10.1.6. Let ε₁, ε₂, ... : Ω → R be independent Bernoulli random variables, α₁, α₂, ... ∈ R and β₁, β₂, ... ∈ R. Then
P(Σ_{n=1}^∞ [α_n + β_n ε_n] converges) = 1    (10.1)
if and only if Σ_{n=1}^∞ α_n converges and Σ_{n=1}^∞ β_n² < ∞.
Proof. (a) Assuming the conditions on (α_n)_{n=1}^∞ and (β_n)_{n=1}^∞ we get, for ξ_n := α_n + β_n ε_n, that
Σ_{n=1}^∞ Eξ_n = Σ_{n=1}^∞ α_n and Σ_{n=1}^∞ E[ξ_n − Eξ_n]² = Σ_{n=1}^∞ β_n² < ∞.
The Two-Series-Theorem gives the almost sure convergence of Σ_{n=1}^∞ [α_n + β_n ε_n].
(b) Assuming (10.1), we define η_n(ω) := −ε_n(ω) and obtain independent Bernoulli random variables. Since (10.1) is an assumption on the distribution of (ε₁, ε₂, ...), which is the same as that of (η₁, η₂, ...), we get (10.1) for (η_n)_{n=1}^∞ as well. Hence there are Ω_ε, Ω_η ∈ F of measure one such that
Σ_{n=1}^∞ [α_n + β_n ε_n(ω)] and Σ_{n=1}^∞ [α_n + β_n η_n(ω′)]
converge for ω ∈ Ω_ε and ω′ ∈ Ω_η. Since Ω_ε ∩ Ω_η is of measure one, there exists at least one ω₀ ∈ Ω_ε ∩ Ω_η such that
Σ_{n=1}^∞ [α_n + β_n ε_n(ω₀)] and Σ_{n=1}^∞ [α_n + β_n η_n(ω₀)]
converge. Taking the sum, we get that
Σ_{n=1}^∞ [α_n + β_n ε_n(ω₀)] + Σ_{n=1}^∞ [α_n + β_n η_n(ω₀)] = Σ_{n=1}^∞ [α_n + β_n ε_n(ω₀)] + Σ_{n=1}^∞ [α_n + β_n(−ε_n(ω₀))] = 2 Σ_{n=1}^∞ α_n
converges. From that we can deduce in turn that
P(Σ_{n=1}^∞ β_n ε_n converges) = 1.
Picking again an ω₀′ ∈ Ω such that Σ_{n=1}^∞ β_n ε_n(ω₀′) converges, we get that sup_n |β_n| = sup_n |β_n||ε_n(ω₀′)| < ∞. Hence, by Proposition 10.1.1,
Σ_{n=1}^∞ β_n² = Σ_{n=1}^∞ E|β_n ε_n − Eβ_n ε_n|² < ∞.
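A short simulation contrasting the two cases of Example 10.1.6 with α_n = 0 (assuming NumPy; the coefficient choices β_n = 1/n and β_n = 1/√n and the checkpoints are illustrative). For β_n = 1/n one has Σ β_n² < ∞ and the partial sums settle down; for β_n = 1/√n the series of squares diverges and the partial sums keep fluctuating (their variance grows like log n):

import numpy as np

rng = np.random.default_rng(4)
N = 1_000_000
eps = rng.choice([-1.0, 1.0], size=N)          # Bernoulli signs

beta_summable = 1.0 / np.arange(1, N + 1)          # sum of beta_n^2 finite
beta_divergent = 1.0 / np.sqrt(np.arange(1, N + 1))  # sum of beta_n^2 infinite

s1 = np.cumsum(beta_summable * eps)
s2 = np.cumsum(beta_divergent * eps)
for n in (10_000, 100_000, 1_000_000):
    print(n, s1[n - 1], s2[n - 1])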
The following example demonstrates the basic usage of the Three-Series-Theorem: it shows that sometimes it is not the size of the large values of the random variables that matters, but only their probability.
Example 10.1.7. Assume independent random variables ξ₁, ξ₂, ... : Ω → R such that
P(ξ_n = α_n) = P(ξ_n = −α_n) = p_n ∈ (0, 1/2]
with Σ_{n=1}^∞ p_n < ∞, P(ξ_n = 0) = 1 − 2p_n, and α_n ≥ 0. Then one has that
P(Σ_{n=1}^∞ ξ_n converges) = 1.
Proof. For c = 1 we apply the Three-Series-Theorem. By symmetry we have that
Σ_{n=1}^∞ Eξ_n¹ = Σ_{n=1}^∞ 0 = 0,
so that item (a) follows. Moreover,
Σ_{n=1}^∞ E(ξ_n¹ − Eξ_n¹)² = Σ_{n=1}^∞ E(ξ_n¹)² ≤ 2 Σ_{n=1}^∞ p_n < ∞
and
Σ_{n=1}^∞ P(|ξ_n| ≥ 1) ≤ 2 Σ_{n=1}^∞ p_n < ∞,
so that items (b) and (c) follow as well.
10.2 The strong law of large numbers
There are several strong laws of large numbers. We mainly prove a version
which goes back to Kolmogorov.
Proposition 10.2.1 (Kolmogorov²). Let f₁, f₂, ... ∈ L₀(Ω, F, P) such that
(i) f₁, f₂, ... are independent,
(ii) f₁, f₂, ... have the same distribution, and
(iii) E|f₁| < ∞.
Then one has lim_{n→∞} (1/n)(f₁(ω) + ··· + f_n(ω)) = Ef₁ a.s.
For the proof we need some preparations.
Lemma 10.2.2 (Töplitz³). Let a_n ≥ 0, b_n := a₁ + a₂ + ··· + a_n > 0, and lim_n b_n = ∞. Assume (x_n)_{n=1}^∞ ⊆ R such that lim_n x_n = x. Then
(1/b_n) Σ_{i=1}^n a_i x_i →_n x.
Proof. The proof is an exercise.
² Andrey Nikolaevich Kolmogorov, 25/04/1903 (Tambov, Russia) - 20/10/1987 (Moscow, Russia).
³ Otto Töplitz, 01/08/1881 (Wroclaw, Poland) - 15/02/1940 (Jerusalem, Israel).
Lemma 10.2.3 (Kronecker⁴). Let 0 < b₁ ≤ b₂ ≤ b₃ ≤ ... with b_n → ∞ and (x_n)_{n=1}^∞ ⊆ R so that Σ_{n=1}^∞ x_n is convergent. Then
(1/b_n) Σ_{j=1}^n b_j x_j →_n 0.
Proof. We let b₀ = 0, S₀ = 0 and S_n := x₁ + ··· + x_n, and set x := lim_n S_n. Then
(1/b_n) Σ_{j=1}^n b_j x_j = (1/b_n) Σ_{j=1}^n b_j (S_j − S_{j−1})
 = (1/b_n) ( b_n S_n − b₀S₀ − Σ_{j=1}^n S_{j−1}(b_j − b_{j−1}) )
 = S_n − (1/b_n) Σ_{j=1}^n S_{j−1}(b_j − b_{j−1}) →_n x − x = 0,
where the Töplitz Lemma 10.2.2 with a_j := b_j − b_{j−1} is applied to the last term.
Lemma 10.2.4. Let f : Ω → R be a random variable. Then
Σ_{n=1}^∞ P(|f| ≥ n) ≤ ∫_Ω |f| dP ≤ 1 + Σ_{n=1}^∞ P(|f| ≥ n).
Proof. We simply have that
Σ_{n=1}^∞ P(|f| ≥ n) = Σ_{n=1}^∞ Σ_{k≥n} P(|f| ∈ [k, k + 1)) = Σ_{k=1}^∞ k P(|f| ∈ [k, k + 1)) ≤ E|f|.
The other inequality is proved in the same way.
Next we deduce a variant of the Strong Law of Large Numbers:
Lemma 10.2.5. Let ξ₁, ξ₂, ... : Ω → R be a sequence of independent random variables and β_n > 0 such that
(i) β_n ↑ ∞,
(ii) Eξ_n = 0,
(iii) Σ_{n=1}^∞ Eξ_n²/β_n² < ∞.
Then one has that P((1/β_n)(ξ₁ + ··· + ξ_n) →_n 0) = 1. Consequently, for independent η₁, η₂, ... : Ω → R such that
Σ_{n=1}^∞ var(η_n)/n² < ∞ and m = Eη₁ = Eη₂ = ···,
one has that P((1/n)(η₁ + ··· + η_n) →_n m) = 1.
⁴ Leopold Kronecker, 07/12/1823 (Legnica, Poland) - 29/12/1891 (Berlin, Germany).
Proof. From the Two-Series-Theorem we know that
P(Σ_{n=1}^∞ ξ_n/β_n converges) = 1.
Hence Lemma 10.2.3 gives that
1 = P(lim_n (1/β_n) Σ_{i=1}^n β_i (ξ_i/β_i) = 0) = P(lim_n (1/β_n) Σ_{i=1}^n ξ_i = 0).
Proof of Proposition 10.2.1. We can assume that Ef₁ = 0. The idea is to truncate the random variables and to apply Lemma 10.2.5. We let
f̃_n(ω) := f_n(ω) if |f_n(ω)| < n, and f̃_n(ω) := 0 if |f_n(ω)| ≥ n,
and g_n(ω) := f̃_n(ω) − Ef̃_n.
(i) Now we compute
Σ_{n=1}^∞ (1/n²) Eg_n² ≤ Σ_{n=1}^∞ (1/n²) Ef̃_n²
 = Σ_{n=1}^∞ (1/n²) E[χ_{{|f_n|<n}} f_n²]
 = Σ_{n=1}^∞ (1/n²) Σ_{k=1}^n E[χ_{{k−1≤|f₁|<k}} f₁²]
 = Σ_{k=1}^∞ (Σ_{n=k}^∞ 1/n²) E[χ_{{k−1≤|f₁|<k}} f₁²]
 ≤ 2 Σ_{k=1}^∞ (1/k) E[χ_{{k−1≤|f₁|<k}} f₁²]
 ≤ 2 Σ_{k=1}^∞ E[χ_{{k−1≤|f₁|<k}} |f₁|]
 = 2E|f₁| < ∞,
where we have used that
Σ_{n=k}^∞ 1/n² ≤ 1/k² + ∫_k^∞ dx/x² = 1/k² + 1/k ≤ 2/k.
(ii) Applying Lemma 10.2.5 gives that (1/n)(g₁ + ··· + g_n) →_{a.s.} 0.
(iii) To replace g by f̃ we need to show that (1/n) Σ_{k=1}^n Ef̃_k →_n 0. According to the Töplitz Lemma it is sufficient to prove that Ef̃_n →_n 0. But this follows from
Ef̃_n = ∫_{{|f_n|<n}} f_n dP = ∫_{{|f₁|<n}} f₁ dP →_n ∫_Ω f₁ dP = 0
because of dominated convergence. So we get (1/n) Σ_{k=1}^n Ef̃_k →_n 0 and hence (1/n)(f̃₁ + ··· + f̃_n) →_{a.s.} 0.
(iv) To replace f̃ by f we use Borel-Cantelli. We observe
E|f₁| < ∞ ⟺ Σ_{n=1}^∞ P(|f₁| ≥ n) < ∞ ⟺ Σ_{n=1}^∞ P(|f_n| ≥ n) < ∞ ⟺ P({ω : #{n : |f_n(ω)| ≥ n} < ∞}) = 1
and let Ω₀ := {ω : #{n : |f_n(ω)| ≥ n} < ∞}. Hence, for all ω ∈ Ω₀ we find some n(ω) such that for n ≥ n(ω) one has f_n(ω) = f̃_n(ω) and
lim_{n→∞} (1/n)(f₁(ω) + ··· + f_n(ω)) = lim_{n→∞} (1/n)(f̃₁(ω) + ··· + f̃_n(ω)).
This completes the proof.
Now we consider a converse to the Strong Law of Large Numbers.
Proposition 10.2.6. Assume that (Ω, F, P) is a probability space and that f₁, f₂, ... : Ω → R are independent random variables having the same distribution. If there is a constant c ∈ R such that
(1/n)(f₁ + ··· + f_n) →_n c P-a.s.,
then E|f₁| < ∞ and Ef₁ = c.
The proof is an exercise.
As an application we deduce a result of Borel. Let t ∈ [0, 1) and write
t = Σ_{n=1}^∞ t_n/2ⁿ, t_n ∈ {0, 1},
where #{n : t_n = 0} = ∞. Hence
{t : t₁ = x₁, ..., t_n = x_n} = {t : x₁/2 + ··· + x_n/2ⁿ ≤ t < x₁/2 + ··· + x_n/2ⁿ + 1/2ⁿ}.
Let Ω = [0, 1), F = B([0, 1)), and let λ be the Lebesgue measure.
Proposition 10.2.7 (Borel). Given t ∈ [0, 1) we let
Z_n(t) := #{1 ≤ k ≤ n : t_k = 1}.
Then
λ({t ∈ [0, 1) : Z_n(t)/n → 1/2}) = 1.
Proof. Letting f_n(t) := t_n we simply have (1/n)Z_n(t) = (1/n)(f₁(t) + ··· + f_n(t)). The random variables f₁, f₂, ... : Ω → R satisfy
λ(f₁ = θ₁, ..., f_N = θ_N) = 1/2^N = λ(f₁ = θ₁) ··· λ(f_N = θ_N)
for all θ₁, ..., θ_N ∈ {0, 1} and are therefore independent, as one can replace the conditions {f_k = θ_k} by {f_k ∈ B_k} for B_k ∈ B(R). Hence we can apply the Strong Law of Large Numbers and are done.
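Borel's result is easy to illustrate numerically (assuming NumPy; the number of sample points and of digits are arbitrary, and because of floating-point precision only the first fifty or so binary digits are meaningful):

import numpy as np

rng = np.random.default_rng(5)
t = rng.random(10)                 # ten lambda-distributed points of [0, 1)
n = 50                             # inspect the first 50 binary digits

digits = np.zeros((10, n), dtype=int)
x = t.copy()
for k in range(n):                 # extract t_k, the k-th binary digit
    x = 2 * x
    digits[:, k] = (x >= 1).astype(int)
    x -= digits[:, k]

print(digits.mean(axis=1))         # Z_n(t)/n, close to 1/2 for each sampled t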
10.3 An ergodic Theorem of Birkhoff and Chinčin
Extensions of the Strong Law of Large Numbers can be obtained by ergodic
theory.
Definition 10.3.1. Let (Ω, F, P) be a probability space.
(1) A measurable map T : Ω → Ω is called measure preserving provided that
P(T −1 (A)) = P(A) for all A ∈ F.
(2) A measure preserving map T : Ω → Ω is called ergodic provided that,
for A ∈ F, the condition
T −1 (A) = A implies P(A) ∈ {0, 1}.
Now we get the following Ergodic Theorem.
Proposition 10.3.2 (Birkhoff & Chinčin). Let (Ω, F, P) be a probability space, T : Ω → Ω be ergodic, and f : Ω → R be a random variable such that E|f| < ∞. Then one has that
lim_n (1/n) Σ_{k=0}^{n−1} f(T^k ω) = Ef a.s.
Why does the Ergodic Theorem of Birkhoff and Chinčin extend the Strong Law of Large Numbers? Let
(M, Σ, µ) := ⊗_{n=1}^∞ (Ω, F, P)
and define the shift T : M → M by
T(ω₁, ω₂, ω₃, ...) := (ω₂, ω₃, ...).
The shift T is ergodic:
(a) T is measure-preserving: This can be checked on the π-system of cylinder-sets with a finite-dimensional basis.
(b) Assume now that T⁻¹(A) = A for some A ∈ Σ. By iteration we get that T^{−k}(A) = A for all k = 1, 2, .... In other words,
A ∈ ∩_{k=1}^∞ σ(P_k, P_{k+1}, ...),
where the P_k are the coordinate functionals P_k : M → Ω. By the 0-1-law of Kolmogorov we get that µ(A) ∈ {0, 1}.
Finally, we remark that the family (f(T^k))_{k=0}^∞ forms an iid sequence of random variables having the distribution of f we were starting from.
Proof of Proposition 10.3.2 (see [12]). By normalization we can assume that Ef = 0. Let
η′ := lim inf_n (1/n) Σ_{k=0}^{n−1} f(T^k) ≤ lim sup_n (1/n) Σ_{k=0}^{n−1} f(T^k) =: η.
The idea is to show that 0 ≤ η′ ≤ η ≤ 0 a.s. By the symmetry of the problem (replace f by −f) it is sufficient to show that η ≤ 0 a.s. Let ε > 0 and
A_ε := {η > ε},
f* := (f − ε)χ_{A_ε},
S_n* := f* + ··· + f*(T^{n−1}),
M_n* := max{0, S₁*, ..., S_n*}.
From Lemma 10.3.3 it follows that
E[f* χ_{{M_n* > 0}}] ≥ 0.
Moreover,
{M_n* > 0} ↑_n {sup_k S_k* > 0} = {sup_k S_k*/k > 0} = {sup_k S_k/k > ε} ∩ A_ε = A_ε
because sup_k S_k/k ≥ η. By Lebesgue's dominated convergence theorem,
0 ≤ E[f* χ_{{M_n* > 0}}] →_n E[f* χ_{A_ε}] = E[f χ_{A_ε}] − εP(A_ε).
If J is the σ-algebra of T-invariant sets, we get that A_ε ∈ J and
E[f χ_{A_ε}] = E(χ_{A_ε} E(f|J)) = E(χ_{A_ε} · 0) = 0,
as all sets from J have measure 0 or 1 and therefore E(f|J) = Ef = 0 a.s. Consequently, εP(A_ε) ≤ 0 for all ε > 0, so that P(A_ε) = 0 and we are done.
In the proof we used the following maximal ergodic theorem:
Lemma 10.3.3. Let T : Ω → Ω be a measure preserving map and f : Ω → R be an integrable random variable. Let
S_n := f + ··· + f(T^{n−1}) and M_n := max{0, S₁, ..., S_n}.
Then E(fχ_{{M_n>0}}) ≥ 0.
10.4 The law of iterated logarithm
The Law of Iterated Logarithm gives the precise asymptotics of a random walk:
Proposition 10.4.1 (Law of iterated logarithm). Let ξ₁, ξ₂, ... : Ω → R be a sequence of independent and identically distributed random variables such that Eξ_n = 0 and Eξ_n² = σ² ∈ (0, ∞). Then, for f_n := ξ₁ + ··· + ξ_n, one has that
P( lim sup_{n→∞, n≥3} f_n/ψ(n) = 1, lim inf_{n→∞, n≥3} f_n/ψ(n) = −1 ) = 1
with ψ(n) := √(2σ²n log log n).
Remark 10.4.2. (i) The conditions
lim sup_{n→∞, n≥3} f_n/ψ(n) = 1 a.s. and lim inf_{n→∞, n≥3} f_n/ψ(n) = −1 a.s.
are equivalent since one may consider the random variables (−ξ_n)_{n=1}^∞, which satisfy the assumptions of the (LIL) as well.
(ii) The statement can be reformulated in terms of the two conditions
P(#{n ≥ 3 : f_n ≥ (1 − ε)ψ(n)} = ∞) = 1,
P(#{n ≥ 3 : f_n ≥ (1 + ε)ψ(n)} = ∞) = 0
for all ε ∈ (0, 1).
(iii) Chinčin proved the (LIL) in 1924 in the case that |ξ_n(ω)| ≤ c. Later on, Kolmogorov extended the law to other random variables in 1929. Finally, the above version was proved by Wiener⁵ and Hartman in 1941.

⁵ Norbert Wiener, 26/11/1894 (Columbia, USA) - 18/03/1964 (Stockholm, Sweden), worked on Brownian motion, from where he progressed to harmonic analysis, and won the Bôcher prize from his studies on Tauberian theorems.
We will give the idea of the proof of the (LIL) in the case that ξ_n = g_n ∼ N(0, 1). Let us start with an estimate for the tail of the distribution of g_n.
Lemma 10.4.3. For g ∼ N(0, σ²) with σ > 0 one has that
lim_{λ→∞, λ>0} P(g > λ) / [ (σ/(√(2π)λ)) e^{−λ²/(2σ²)} ] = 1.
Proof. By the change of variables y = σx we get that
P(g > λ) / [ (σ/(√(2π)λ)) e^{−λ²/(2σ²)} ] = [ ∫_λ^∞ e^{−y²/(2σ²)} dy/(√(2π)σ) ] / [ (σ/(√(2π)λ)) e^{−λ²/(2σ²)} ]
 = [ ∫_{λ/σ}^∞ e^{−x²/2} dx/√(2π) ] / [ (σ/(√(2π)λ)) e^{−λ²/(2σ²)} ],
and after renaming λ/σ as λ this reduces to
lim_{λ→∞, λ>0} [ ∫_λ^∞ e^{−x²/2} dx/√(2π) ] / [ (1/(√(2π)λ)) e^{−λ²/2} ] = 1,
where we apply the rule of L'Hôpital⁶.

⁶ Guillaume François Antoine Marquis de L'Hôpital, 1661 (Paris, France) - 2/2/1704 (Paris, France), French mathematician who wrote the first textbook on calculus, which consisted of the lectures of his teacher Johann Bernoulli.
We need the following maximal inequality:
Lemma 10.4.4. Let ξ₁, ..., ξ_n : Ω → R be independent random variables which are symmetric, that means
P(ξ_k ≤ λ) = P(−ξ_k ≤ λ)
for all k = 1, ..., n and λ ∈ R. Then one has that
P(max_{k=1,...,n} f_k > ε) ≤ 2P(f_n > ε),
where f_k := ξ₁ + ··· + ξ_k and ε > 0.
Proof. Let 1 ≤ k ≤ n. Because of P(f_n − f_k > 0) = P(f_n − f_k < 0) and
1 = P(f_n − f_k > 0) + P(f_n − f_k < 0) + P(f_n − f_k = 0),
we get that P(f_n − f_k ≥ 0) ≥ 1/2. Now, again as in the proof of Kolmogorov's maximal inequality, let B₁ := {f₁ > ε} and
B_k := {f₁ ≤ ε, ..., f_{k−1} ≤ ε, f_k > ε}
for k = 2, ..., n. Then we get that (with the convention 0/0 := 1)
P(max_{k=1,...,n} f_k > ε) = Σ_{k=1}^n P(B_k)
 = Σ_{k=1}^n P(B_k) P(f_n ≥ f_k)/P(f_n ≥ f_k)
 = Σ_{k=1}^n P(B_k ∩ {f_n ≥ f_k})/P(f_n ≥ f_k),
where we used the independence of B_k and f_n − f_k = ξ_{k+1} + ··· + ξ_n if k < n. Using that
1/P(f_n ≥ f_k) ≤ 2,
we end up with
P(max_{k=1,...,n} f_k > ε) ≤ 2 Σ_{k=1}^n P(B_k ∩ {f_n ≥ f_k}) ≤ 2P(f_n > ε).
Proof of Proposition 10.4.1 for ξ_n = g_n. By symmetry we only need to show that
P( lim sup_{n→∞, n≥3} f_n/ψ(n) ≤ 1 ) = 1  and  P( lim sup_{n→∞, n≥3} f_n/ψ(n) ≥ 1 ) = 1.
This is equivalent to: for all ε ∈ (0, 1) one has that
P( ω ∈ Ω : ∃n_0 ≥ 3 ∀n ≥ n_0 : f_n(ω) ≤ (1 + ε)ψ(n) ) = 1        (10.2)
and
P( ω ∈ Ω : #{n ≥ 3 : f_n(ω) ≥ (1 − ε)ψ(n)} = ∞ ) = 1.           (10.3)
Let n_k := (1 + ε)^k for k ≥ k_0 such that n_{k_0} ≥ 3.
Equation (10.2): Let ε > 0 and define
Ak := {ω ∈ Ω : ∃n ∈ (nk , nk+1 ] fn (ω) > (1 + ε)ψ(n)}
for k ≥ k_0. Then P(lim sup_k A_k) = 0 would imply (10.2). According to the
Lemma of Borel-Cantelli it is sufficient to show that Σ_{k=k_0}^∞ P(A_k) < ∞.
This follows from
P(A_k) ≤ P( ∃n ∈ [1, n_{k+1}] : f_n > (1 + ε)ψ(n_k) )
       ≤ 2P( f_{[n_{k+1}]} > (1 + ε)ψ(n_k) )
       ≤ 2c · σ(f_{[n_{k+1}]})/(√(2π)(1 + ε)ψ(n_k)) · e^{−((1+ε)ψ(n_k))²/(2σ(f_{[n_{k+1}]})²)},
where we have used the maximal inequality from Lemma 10.4.4 and the
estimate from Lemma 10.4.3 and where σ(f_n)² is the variance of f_n. Finally,
by σ(f_n)² = n we get (after some computation) that
Σ_{k=k_0}^∞ P(A_k) ≤ Σ_{k=k_0}^∞ c_0 k^{−(1+ε)} < ∞.
Equation (10.3). Applying (10.2) to (−f_n)_{n=1}^∞ and ε = 1 gives that
P( ω ∈ Ω : ∃n_0 ≥ 3 ∀n ≥ n_0 : −f_n(ω) ≤ 2ψ(n) ) = 1.           (10.4)
We set Y_k := f_{[n_k]} − f_{[n_{k−1}]} for k > k_0 and ε ∈ (0, 1). Assume that we can show that
Y_k > (1 − ε)ψ(n_k) + 2ψ(n_{k−1})                                  (10.5)
happens infinitely often with probability one, which means by definition that
f_{[n_k]} − f_{[n_{k−1}]} > (1 − ε)ψ(n_k) + 2ψ(n_{k−1})
happens infinitely often with probability one. Together with (10.4) this would imply that
f_{[n_k]} > (1 − ε)ψ(n_k)
happens infinitely often with probability one. Hence we have to show (10.5).
For this one can prove that
P( Y_k > (1 − ε)ψ(n_k) + 2ψ(n_{k−1}) ) ≥ c_2/(k log k),
so that
Σ_{k=k_0+1}^∞ P( Y_k > (1 − ε)ψ(n_k) + 2ψ(n_{k−1}) ) = ∞.
An application of the Lemma of Borel-Cantelli implies (10.5).
10.5 An application to insurance
Let (Ω, F, P) be a probability space, λ > 0 and ∆1 , ∆2 , ... : Ω → [0, ∞)
be independent random variables having an exponential distribution with
parameter λ > 0, i.e.
P(∆_k ∈ B) = λ ∫_0^∞ χ_B(t) e^{−λt} dt
for a Borel set B ∈ B(R). We can assume that the random variables ∆_k do
not attain the value zero. Then we define the jump times
T_0 := 0  and  T_n := ∆_1 + · · · + ∆_n
for n ≥ 1. We know that E∆_k = 1/λ.
Definition 10.5.1 (Poisson process). The stochastic process N = (Nt )t≥0 ,
Nt : Ω → R, with N0 := 0 and
Nt := n if
Tn ≤ t < Tn+1
is called Poisson Process with intensity λ.
Assume independent non-negative random variables L_1, L_2, ... : Ω → R having
the same distribution such that the random variables of both families
(∆_k)_{k≥1} and (L_k)_{k≥1} are independent, and some κ ∈ R, and define
X_t := κt − Σ_{k=1}^{N_t} L_k,
where an empty sum is treated as zero. The process (X_t)_{t≥0} belongs to the
family of Lévy processes, that means the following properties hold:
Theorem 10.5.2. (i) For all 0 ≤ t1 < · · · < tn ≤ s < t the increment
Xt − Xs is independent from the family (Xt1 , ..., Xtn ).
(ii) For all 0 ≤ s < t < ∞ the random variables Xt − Xs and Xt−s have
the same distribution.
(iii) All trajectories t ↦ X_t(ω) are right-continuous and have left-hand
limits.
Let C, κ > 0 and assume that the L_k take only non-negative values. Then
the process
R_t := C + κt − Σ_{k=1}^{N_t} L_k = C + X_t
can be interpreted as a risk process in insurance, where
(1) C > 0 is the starting capital,
(2) κ > 0 is the rate for charging the premium ,
(3) L1 is the claim size distribution.
The event
R := {ω ∈ Ω : ∃t ≥ 0 Rt (ω) < 0}
is called ruin.
Theorem 10.5.3. If E|Lk | < ∞ and κ < λEL1 or if E|Lk |2 < ∞ and
κ = λEL1 , then P(R) = 1.
Proof. We get that
{ω ∈ Ω : ∃t ≥ 0 : R_t(ω) < 0}
  = {ω ∈ Ω : ∃n ≥ 1 : R_{T_n}(ω) < 0}
  = {ω ∈ Ω : ∃n ≥ 1 : C + κT_n(ω) − Σ_{k=1}^n L_k(ω) < 0}
  = {ω ∈ Ω : ∃n ≥ 1 : C + Σ_{k=1}^n (κ∆_k(ω) − L_k(ω)) < 0}
  = {ω ∈ Ω : ∃n ≥ 1 : Σ_{k=1}^n ξ_k(ω) > C}
with ξ_k(ω) := L_k(ω) − κ∆_k(ω). If E|L_k| < ∞ and κ < λEL_1, then
m := Eξ_1 = EL_1 − κ/λ > 0.
By the strong law of large numbers we get that
lim_{n→∞} (1/n) Σ_{k=1}^n ξ_k = m > 0 a.s.,
so that lim_{n→∞} Σ_{k=1}^n ξ_k = ∞ a.s. and P(R) = 1. If κ = λEL_1 and E|L_k|² < ∞
we obtain m = 0. Under this condition we can apply the Law of Iterated
Logarithm to obtain that
P( lim sup_n Σ_{k=1}^n ξ_k/ψ(n) = 1  and  lim inf_n Σ_{k=1}^n ξ_k/ψ(n) = −1 ) = 1,
which also implies that lim sup_{n→∞} Σ_{k=1}^n ξ_k = ∞ a.s. and therefore P(R) = 1.
The only condition which remains possible for an insurance company not to go
bankrupt is
κ > λEL_1,
which is called the Net Profit Condition.
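To make the role of the net profit condition concrete, here is a small Monte Carlo sketch (my own illustration, not part of the notes) that simulates the risk process R_t = C + κt − Σ_{k≤N_t} L_k on a finite horizon and estimates the probability of ruin; the horizon, the exponential claim distribution and all parameter values are illustrative assumptions.

```python
import numpy as np

def ruin_probability(C, kappa, lam, claim_mean, horizon=1000.0, n_paths=2000, seed=0):
    """Monte Carlo estimate of P(ruin before `horizon`) for
    R_t = C + kappa*t - sum_{k <= N_t} L_k, claims L_k ~ Exp with mean claim_mean."""
    rng = np.random.default_rng(seed)
    ruined = 0
    for _ in range(n_paths):
        t, reserve = 0.0, C
        while t <= horizon:
            dt = rng.exponential(1.0 / lam)          # Delta_k, so E Delta_k = 1/lam
            t += dt
            if t > horizon:
                break
            reserve += kappa * dt                    # premiums earned since the last claim
            reserve -= rng.exponential(claim_mean)   # claim L_k at the jump time
            if reserve < 0:                          # ruin can only occur at a jump time
                ruined += 1
                break
    return ruined / n_paths

# Here lam * E L_1 = 2.0: kappa = 1.5 violates the net profit condition, kappa = 3.0 satisfies it.
for kappa in (1.5, 3.0):
    print(kappa, ruin_probability(C=10.0, kappa=kappa, lam=2.0, claim_mean=1.0))
```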
Chapter 11
Characteristic functions
Example 1. We assume a source which sends out particles to one side with a
uniformly distributed (random) angle ranging from zero to 180 degrees.
There is a wall at a distance of 1 meter. The function f(θ) gives the position at which
the particle hits the wall, where θ = π/2 gives position 0. The function
f is random because it depends on the random angle θ. Consider a second
wall parallel to the first one, but at a distance of 2 meters. Now we think
in two different ways: firstly, the particle hits the first wall and is then sent
on to the second wall. Secondly, we do the experiment in one
step and wait until the particle hits the second wall. Is it possible that both
experiments give the same distribution? The answer is yes. Knowing this we
can analyze the distribution in an abstract manner: namely, we obtain that
the distributions of
f_1 + f_2  and  2f
are the same, where f_1 and f_2 are independent copies of f. A distribution with
this property is called 1-stable; this particular one is the Cauchy¹ distribution.
Stable distributions are used (for example) in stochastic modelling and in
probabilistic methods in functional analysis. We shall prove the
existence of stable distributions with the help of characteristic functions.
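The 1-stability just described can also be checked numerically. The following sketch (my own illustration, not from the notes; sample size and quantile grid are arbitrary choices) compares a few quantiles of f_1 + f_2 for two independent standard Cauchy samples with those of 2f.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
f1 = rng.standard_cauchy(n)
f2 = rng.standard_cauchy(n)
f = rng.standard_cauchy(n)

# For a 1-stable law the quantiles of f1 + f2 and of 2*f should agree (up to noise).
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(f1 + f2, qs))
print(np.quantile(2 * f, qs))
```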
We close with a fundamental example concerning the convergence in distribution: the Central Limit Theorem (CLT). For this we need
Definition 11.0.1. Let (Ω, F, P) be a probability space. A sequence of
independent random variables f_n : Ω → R is called independent and identically
distributed (i.i.d.) provided that the random variables f_n have the same law,
that means
P(f_n ≤ λ) = P(f_k ≤ λ)
for all n, k = 1, 2, ... and all λ ∈ R.
1
Augustin Louis Cauchy, 21 Aug 1789 (Paris) - 23 May 1857 (Sceaux), real and complex
analysis.
Let (Ω, F, P) be a probability space and (f_n)_{n=1}^∞ be a sequence of i.i.d. random
variables with Ef_1 = 0 and Ef_1² = σ². By the law of large numbers we know
(f_1 + · · · + f_n)/n →_P 0.
Hence the law of the limit is the Dirac-measure δ_0. Is there a right scaling
factor c(n) such that
(f_1 + · · · + f_n)/c(n) → g,
where g is a non-degenerate random variable in the sense that P_g ≠ δ_0? And
in which sense does the convergence take place? The answer is the following

Proposition 11.0.2 (Central Limit Theorem). Let (f_n)_{n=1}^∞ be a sequence
of i.i.d. random variables with Ef_1 = 0 and Ef_1² = σ² > 0. Then
P( (f_1 + · · · + f_n)/(σ√n) ≤ x ) → (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du
for all x ∈ R as n → ∞, that means that
(f_1 + · · · + f_n)/(σ√n) →_d g
for any g with P(g ≤ x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du.
11.1 Complex numbers
We identify
C ≅ R² = {(x, y) : x, y ∈ R}
and write
C ∋ z = x + iy ≅ (x, y) ∈ R²,
where x = Re(z) is the real part of z and y = Im(z) is the imaginary part
of z. The complex numbers extend the real numbers via the embedding
R → C, x ↦ (x, 0) = x + i · 0. We recall some definitions:
Addition. For z1 = (x1 , y1 ) and z2 = (x2 , y2 ) we let
z1 + z2 := (x1 + x2 , y1 + y2 ) = (x1 + x2 ) + i(y1 + y2 ).
Multiplication. For z1 = (x1 , y1 ) and z2 = (x2 , y2 ) we let
z1 z2 := (x1 x2 − y1 y2 , x1 y2 + x2 y1 ) = (x1 x2 − y1 y2 ) + i(x1 y2 + x2 y1 ).
Remark 11.1.1.
(i) If we interpret i2 = −1, we get this formally by
(x1 + iy1 )(x2 + iy2 ) = x1 x2 + i2 y1 y2 + i(x1 y2 + x2 y1 ).
(ii) If z1 = (x1 , 0), then z1 z2 = (x1 x2 , x1 y2 ) = x1 (x2 , y2 ) and, in the same
way, if z2 = (x2 , 0), then z1 z2 = x2 (x1 , y1 ).
Length of a complex number: If z = (x, y), then |z| = √(x² + y²).

Conjugate complex number: If z = (x, y), then z̄ := (x, −y) = x − iy.
We have that z z̄ = x² + y² = |z|².

Polar coordinates: These coordinates are given as (r, ϕ) where r ≥ 0 and
ϕ ∈ [0, 2π) are determined by
x = r cos ϕ,  y = r sin ϕ.
Note that r = |z| and that ϕ is not unique whenever r = 0.
Now we recall the notion of the complex exponential function.
Definition 11.1.2. For z ∈ C we let
e^z := Σ_{n=0}^∞ z^n/n!,
where the convergence is considered with respect to the euclidean metric in R².
Proposition 11.1.3. (i) For all z_1, z_2 ∈ C one has e^{z_1+z_2} = e^{z_1} e^{z_2}.
(ii) Euler’s formula²: One has e^{ix} = cos x + i sin x for x ∈ R.

Proof. (i) is an exercise. (ii) follows from
e^{ix} = (x⁰/0! − x²/2! + x⁴/4! − ...) + i (x/1! − x³/3! + x⁵/5! − ...) = cos x + i sin x.
Complex valued random variables.
Definition 11.1.4. Let (Ω, F) be a measurable space.
(i) A map f : Ω → C is called measurable provided that f : Ω → R2 is
measurable, i.e. for f = (f1 , f2 ) the maps f1 , f2 : Ω → R are measurable.
(ii) A random variable f : Ω → C is called integrable provided that
∫_Ω |f(ω)| dP(ω) < ∞.
In this case we let
∫_Ω f(ω) dP(ω) := ∫_Ω Re(f(ω)) dP(ω) + i ∫_Ω Im(f(ω)) dP(ω).

² Leonhard Euler, 15/04/1707 (Basel, Switzerland) - 18/09/1783 (St Petersburg, Russia), Swiss mathematician.
11.2 Definition and basic properties of characteristic functions
The concept of characteristic functions is one of the most important tools in
probability theory, and it also gives a link to harmonic analysis. The idea
is to describe properties of random variables f : Ω → Rd or of measures on
B(Rd ) by a Fourier transform.
Definition 11.2.1. Given d ∈ {1, 2, . . .}, we let M₁⁺(R^d) be the set of all
probability measures µ on B(R^d).

The notation ‘1’ stands for µ(R^d) = 1 and ‘+’ for µ(A) ≥ 0.
Definition 11.2.2. (i) Let µ ∈ M₁⁺(R^d). Then
µ̂(x) := ∫_{R^d} e^{i⟨x,y⟩} dµ(y),   x ∈ R^d,
is called Fourier transform of µ.
(ii) Let f : Ω → R^d be a random variable. Then
ϕ_f(x) := E e^{i⟨x,f⟩} = ∫_Ω e^{i⟨x,f(ω)⟩} dP(ω)
is called characteristic function of f.
Remark 11.2.3. (i) µ̂ and ϕ_f exist, since |e^{i⟨x,y⟩}| = |e^{i⟨x,f⟩}| = 1 and
y ↦ e^{i⟨x,y⟩} is continuous, so that y ↦ e^{i⟨x,y⟩} and ω ↦ e^{i⟨x,f(ω)⟩} are measurable.
(ii) If µ_f is the law of f : Ω → R^d, then ϕ_f(x) = µ̂_f(x) for x ∈ R^d. In
fact, this follows from the change of variable formula where we get, for ψ(y) = e^{i⟨x,y⟩},
∫_Ω e^{i⟨x,f(ω)⟩} dP(ω) = ∫_{R^d} ψ(y) dµ_f(y) = ∫_{R^d} e^{i⟨x,y⟩} dµ_f(y).
Example 11.2.4. (a) Let a ∈ R^d and δ_a be the Dirac-measure with
δ_a(B) = 1 if a ∈ B and δ_a(B) = 0 if a ∉ B.
Then
δ̂_a(x) = ∫_{R^d} e^{i⟨x,y⟩} dδ_a(y) = e^{i⟨x,a⟩}.

(b) Let a_1, . . . , a_n ∈ R^d, 0 ≤ θ_i ≤ 1, Σ_{i=1}^n θ_i = 1 and µ = Σ_{i=1}^n θ_i δ_{a_i}. Then
µ̂(x) = Σ_{i=1}^n θ_i e^{i⟨x,a_i⟩} = Σ_{i=1}^n θ_i (cos⟨x, a_i⟩ + i sin⟨x, a_i⟩),
which is a trigonometric polynomial.

(c) Binomial distribution: Let 0 < p < 1, d = 1, n ∈ {1, 2, . . .} and
µ({k}) := (n choose k) p^{n−k} (1 − p)^k   for k = 0, . . . , n.
Then
µ̂(x) = ∫_R e^{ixy} dµ(y) = Σ_{k=0}^n (n choose k) p^{n−k} (1 − p)^k e^{ixk}
     = Σ_{k=0}^n (n choose k) p^{n−k} ((1 − p)e^{ix})^k = (p + (1 − p)e^{ix})^n.
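As a quick numerical illustration of (c) — my own sketch, not part of the notes — one can compare the empirical characteristic function (1/N) Σ_j e^{ixX_j} of a sample from µ with the closed form (p + (1 − p)e^{ix})^n; with the convention above, µ is the law of a Binomial(n, 1 − p) random variable, and all parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, N = 10, 0.3, 100_000
# mu({k}) = C(n,k) p^{n-k} (1-p)^k is the law of a Binomial(n, 1-p) random variable
sample = rng.binomial(n, 1 - p, size=N)

for x in (0.5, 1.0, 2.0):
    empirical = np.mean(np.exp(1j * x * sample))     # empirical characteristic function at x
    exact = (p + (1 - p) * np.exp(1j * x)) ** n      # closed form from Example 11.2.4(c)
    print(x, empirical, exact)
```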
Proposition 11.2.5. Let µ ∈ M₁⁺(R^d). Then the following is true:
(i) The function µ̂ : R^d → C is uniformly continuous, that means that
for all ε > 0 there exists δ > 0 such that |µ̂(x) − µ̂(y)| ≤ ε whenever
|x − y| = (Σ_{i=1}^d |x_i − y_i|²)^{1/2} ≤ δ.
(ii) For all x ∈ R^d one has |µ̂(x)| ≤ µ̂(0) = 1.
(iii) The function µ̂ is positive semi-definite, that means that for all
x_1, . . . , x_n ∈ R^d and λ_1, . . . , λ_n ∈ C it follows that
Σ_{k,l=1}^n λ_k λ̄_l µ̂(x_k − x_l) ≥ 0.
Proof. (ii) This part follows from
|∫_{R^d} e^{i⟨x,y⟩} dµ(y)| = |∫_{R^d} cos⟨x,y⟩ dµ(y) + i ∫_{R^d} sin⟨x,y⟩ dµ(y)|
  = [ (∫_{R^d} cos⟨x,y⟩ dµ(y))² + (∫_{R^d} sin⟨x,y⟩ dµ(y))² ]^{1/2}
  ≤ [ ∫_{R^d} |cos⟨x,y⟩|² dµ(y) + ∫_{R^d} |sin⟨x,y⟩|² dµ(y) ]^{1/2} = 1,
where we have used Hölder’s inequality.
(iii) Here we have that
Σ_{k,l=1}^n λ_k λ̄_l µ̂(x_k − x_l) = Σ_{k,l=1}^n λ_k λ̄_l ∫_{R^d} e^{i⟨x_k−x_l,y⟩} dµ(y)
  = Σ_{k,l=1}^n λ_k λ̄_l ∫_{R^d} e^{i⟨x_k,y⟩} e^{−i⟨x_l,y⟩} dµ(y).
Since the complex conjugate of e^{iα} equals cos α − i sin α = cos(−α) + i sin(−α) = e^{−iα},
we can continue to
Σ_{k,l=1}^n λ_k λ̄_l µ̂(x_k − x_l) = ∫_{R^d} [ Σ_{k=1}^n λ_k e^{i⟨x_k,y⟩} ][ Σ_{l=1}^n λ̄_l e^{−i⟨x_l,y⟩} ] dµ(y)
  = ∫_{R^d} | Σ_{k=1}^n λ_k e^{i⟨x_k,y⟩} |² dµ(y) ≥ 0.
(i) Let ε > 0. Choose a ball of radius R > 0 such that µ(R^d \ B_R(0)) < ε/3, where
B_R(0) := { x ∈ R^d : (Σ_{i=1}^d |x_i|²)^{1/2} ≤ R }.
Take δ := ε/(3R). Then, since |e^{iα} − e^{iβ}| ≤ |α − β| and by Hölder’s inequality, if |x_1 − x_2| ≤ δ,
|µ̂(x_1) − µ̂(x_2)| ≤ ∫_{B_R(0)} |e^{i⟨x_1,y⟩} − e^{i⟨x_2,y⟩}| dµ(y) + ∫_{R^d\B_R(0)} |e^{i⟨x_1,y⟩} − e^{i⟨x_2,y⟩}| dµ(y)
  ≤ ∫_{B_R(0)} |⟨x_1 − x_2, y⟩| dµ(y) + ∫_{R^d\B_R(0)} 2 dµ(y)
  ≤ |x_1 − x_2| ∫_{B_R(0)} |y| dµ(y) + 2ε/3
  ≤ (ε/(3R)) R + 2ε/3 = ε.
Now we connect our Fourier transform to the Fourier transform for functions
ψ : Rd → C.
Definition 11.2.6. For a Borel-measurable ψ : R^d → C such that
∫_{R^d} |ψ(y)| dλ(y) < ∞
we let
ψ̂(x) := ∫_{R^d} e^{i⟨x,y⟩} ψ(y) dλ(y)
be the Fourier transform of ψ, where x ∈ R^d and λ is the Lebesgue measure on R^d.
Before we give the connection to our previous definition of a Fourier transform
we denote the collection of finite signed measures on (Ω, F) (see Definition
7.1.1) by M(Ω, F). In case of M(Rd , B(Rd )) we simply write M(Rd ).
Definition 11.2.7. For µ ∈ M(R^d) we let
µ̂(x) := ∫_{R^d} e^{i⟨x,y⟩} dµ⁺(y) − ∫_{R^d} e^{i⟨x,y⟩} dµ⁻(y).
The expression does not depend on the decomposition µ = µ⁺ − µ⁻. Let us
now connect the different definitions.
Proposition 11.2.8. Let ψ : R^d → R be Borel-measurable such that
∫_{R^d} |ψ(x)| dλ(x) < ∞. Define
µ(B) := ∫_{R^d} χ_B(x) ψ(x) dλ(x).
Then one has the following:
(i) µ ∈ M(R^d) with µ = µ⁺ − µ⁻, where
µ^±(B) := ∫_{R^d} χ_B(x) ψ^±(x) dλ(x),
ψ⁺ := max{ψ, 0} and ψ⁻ := max{−ψ, 0}.
(ii) ψ̂(x) = µ̂(x) for all x ∈ R^d.

Proof. Assertion (i) is obvious, (ii) follows from
µ̂(x) = ∫_{R^d} e^{i⟨x,y⟩} dµ⁺(y) − ∫_{R^d} e^{i⟨x,y⟩} dµ⁻(y)
     = ∫_{R^d} e^{i⟨x,y⟩} ψ⁺(y) dλ(y) − ∫_{R^d} e^{i⟨x,y⟩} ψ⁻(y) dλ(y)
     = ∫_{R^d} e^{i⟨x,y⟩} ψ(y) dλ(y).
11.3 Convolutions
Often one has the problem to compute the distribution of f1 + · · · + fn
where f1 , ..., fn : Ω → Rd are random variables. We already considered
some problems in this direction in the SLLN and LIL. Now we introduce a
technique especially designed for this purpose: the convolution.
Definition 11.3.1 (Convolution of measures). Let µ_1, . . . , µ_n ∈ M₁⁺(R^d).
Then µ_1 ∗ · · · ∗ µ_n ∈ M₁⁺(R^d) is the law of f : Ω → R^d defined by
(i) (Ω, F, P) := (R^d × · · · × R^d, B(R^d) ⊗ · · · ⊗ B(R^d), µ_1 × · · · × µ_n),
(ii) f(x_1, . . . , x_n) := x_1 + · · · + x_n,
that is
(µ_1 ∗ · · · ∗ µ_n)(B) = (µ_1 × · · · × µ_n)({(x_1, . . . , x_n) : x_1 + · · · + x_n ∈ B}).
The measure µ_1 ∗ · · · ∗ µ_n is called convolution of µ_1, . . . , µ_n.

Remark 11.3.2. One has R^d × · · · × R^d = R^{nd} and B(R^d) ⊗ · · · ⊗ B(R^d) = B(R^{nd}).
Example 11.3.3. Let a ∈ R^d and
δ_a(B) := 1 if a ∈ B and δ_a(B) := 0 if a ∉ B
be the Dirac-measure δ_a ∈ M₁⁺(R^d). Then δ_{a_1} ∗ · · · ∗ δ_{a_n} = δ_{a_1+···+a_n}.

Proof. By definition,
(δ_{a_1} ∗ · · · ∗ δ_{a_n})(B) = (δ_{a_1} × · · · × δ_{a_n})({(x_1, . . . , x_n) : x_1 + · · · + x_n ∈ B})
  = δ_{(a_1,...,a_n)}({(x_1, . . . , x_n) : x_1 + · · · + x_n ∈ B})
  = 1 if a_1 + · · · + a_n ∈ B and 0 if a_1 + · · · + a_n ∉ B.
Example 11.3.4. δ_0 ∗ µ = µ ∗ δ_0 = µ for all µ ∈ M₁⁺(R^d), that means that
δ_0 is a unit with respect to the convolution.

Proof. By Fubini’s theorem
(δ_0 ∗ µ)(B) = (δ_0 × µ)({(x_1, x_2) : x_1 + x_2 ∈ B})
  = ∫_{R^d} ( ∫_{R^d} χ_B(x_1 + x_2) dδ_0(x_1) ) dµ(x_2)
  = ∫_{R^d} χ_B(x_2) dµ(x_2) = µ(B).
For the other direction one can use Proposition 11.3.5 below.
Proposition 11.3.5. For µ_1, µ_2, µ_3 ∈ M₁⁺(R^d) one has
(i) µ_1 ∗ µ_2 = µ_2 ∗ µ_1,
(ii) µ_1 ∗ (µ_2 ∗ µ_3) = (µ_1 ∗ µ_2) ∗ µ_3 = µ_1 ∗ µ_2 ∗ µ_3.

Proof. Let f_1, f_2, f_3 : Ω → R^d be independent random variables such that
law(f_j) = µ_j. Now (i) follows from the fact that f_1 + f_2 and f_2 + f_1 have
the same distribution, and (ii) follows from f_1 + (f_2 + f_3) = (f_1 + f_2) + f_3 = f_1 + f_2 + f_3.
As before, we consider the convolution for functions as well. What is a
good candidate for this? Assume ψ_1, ψ_2 : R → [0, ∞) to be continuous
and such that ∫_R ψ_1(x) dx = ∫_R ψ_2(x) dx = 1 and define µ_j(B) := ∫_B ψ_j(x) dx,
B ∈ B(R), for j = 1, 2, so that µ_1, µ_2 ∈ M₁⁺(R). Now let us formally compute
(µ_1 ∗ µ_2)(B) = (µ_1 × µ_2)({(x_1, x_2) : x_1 + x_2 ∈ B})
  = ∫_R ( ∫_R χ_B(x_1 + x_2) ψ_1(x_1) dλ(x_1) ) ψ_2(x_2) dλ(x_2)
  = ∫_R ( ∫_R χ_B(x) ψ_1(x − x_2) dλ(x) ) ψ_2(x_2) dλ(x_2)
  = ∫_R χ_B(x) ( ∫_R ψ_1(x − x_2) ψ_2(x_2) dλ(x_2) ) dλ(x)
by Fubini’s theorem. Hence (ψ_1 ∗ ψ_2)(x) := ∫_R ψ_1(x − y) ψ_2(y) dλ(y) seems to
be a good candidate.
Definition 11.3.6 (Convolutions of functions). (i) Let ψ_1, ψ_2 : R^d → R
be Borel-functions such that ψ_1(x) ≥ 0 and ψ_2(x) ≥ 0 for all x ∈ R^d,
∫_{R^d} ψ_1(x) dλ_d(x) < ∞ and ∫_{R^d} ψ_2(x) dλ_d(x) < ∞. Then
(ψ_1 ∗ ψ_2)(x) := ∫_{R^d} ψ_1(x − y) ψ_2(y) dλ_d(y).
The function ψ_1 ∗ ψ_2 : R^d → R is called convolution of ψ_1 and ψ_2.
(ii) For Borel-functions ψ_1, ψ_2 : R^d → R such that ∫_{R^d} |ψ_1(x)| dλ_d(x) < ∞
and ∫_{R^d} |ψ_2(x)| dλ_d(x) < ∞ we let
(ψ_1 ∗ ψ_2)(x) = (ψ_1⁺ ∗ ψ_2⁺)(x) − (ψ_1⁺ ∗ ψ_2⁻)(x) − (ψ_1⁻ ∗ ψ_2⁺)(x) + (ψ_1⁻ ∗ ψ_2⁻)(x)
if all terms on the right-hand side are finite.
We need to justify the definition above:
Proposition 11.3.7. Let ψ_1, ψ_2 : R^d → R be as in Definition 11.3.6(i). Then
one has the following:
(i) ψ_1 ∗ ψ_2 : R^d → [0, ∞] is an extended measurable function.
(ii) λ_d({x ∈ R^d : (ψ_1 ∗ ψ_2)(x) = ∞}) = 0.
(iii) ∫_{R^d} (ψ_1 ∗ ψ_2)(x) dλ_d(x) = ∫_{R^d} ψ_1(x) dλ_d(x) ∫_{R^d} ψ_2(x) dλ_d(x).

Proof. Consider the map g : R^d × R^d → R defined by g(x, y) := ψ_1(x − y)ψ_2(y).
We get a non-negative measurable function g and can apply Fubini’s theorem. We observe that
∫_{R^d} ∫_{R^d} g(x, y) dλ_d(x) dλ_d(y) = ∫_{R^d} ( ∫_{R^d} ψ_1(x − y) ψ_2(y) dλ_d(x) ) dλ_d(y)
  = ∫_{R^d} ( ∫_{R^d} ψ_1(x − y) dλ_d(x) ) ψ_2(y) dλ_d(y)
  = ∫_{R^d} ψ_2(y) dλ_d(y) ∫_{R^d} ψ_1(x) dλ_d(x) < ∞,
which implies (iii). Moreover, (i) and (ii) follow as a byproduct from Fubini’s theorem.
Remark 11.3.8. The Lebesgue measure of those x ∈ Rd for which we cannot
define (ψ1 ∗ψ2 )(x) in Definition 11.3.6(ii) is zero so that we (can) agree about
(ψ1 ∗ ψ2 )(x) = 0 for those x.
Now we connect the two convolutions to each other.
Proposition 11.3.9. Let µ_1, µ_2 ∈ M₁⁺(R^d) with
µ_i(B) := ∫_{R^d} χ_B(x) p_i(x) dλ_d(x),   i = 1, 2,
for Borel-measurable and non-negative p_1, p_2 : R^d → R such that
∫_{R^d} p_1(x) dλ_d(x) = ∫_{R^d} p_2(x) dλ_d(x) = 1.
Then
(µ_1 ∗ µ_2)(B) = ∫_{R^d} (p_1 ∗ p_2)(x) χ_B(x) dλ_d(x).
Exercise 11.3.10. Compute µ1 ∗ µ2 , where µi is the uniform distribution on
[ai , ai + 1] ⊆ R for i = 1, 2.
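For Exercise 11.3.10 one can at least visualise the answer numerically: the convolution of the two uniform densities is a triangular density on [a_1 + a_2, a_1 + a_2 + 2]. The sketch below is my own illustration (the values of a_1, a_2 and the grid size are arbitrary); it approximates p_1 ∗ p_2 by a Riemann sum and compares it with a Monte Carlo histogram of X_1 + X_2.

```python
import numpy as np

a1, a2 = 0.0, 1.0
h = 0.001
x = np.arange(-1.0, 5.0, h)
p1 = ((x >= a1) & (x <= a1 + 1)).astype(float)   # uniform density on [a1, a1+1]
p2 = ((x >= a2) & (x <= a2 + 1)).astype(float)   # uniform density on [a2, a2+1]

conv = np.convolve(p1, p2, mode="same") * h      # Riemann-sum approximation of p1 * p2

# Monte Carlo check: X1 + X2 with X_i uniform on [a_i, a_i + 1]
rng = np.random.default_rng(0)
s = rng.uniform(a1, a1 + 1, 100_000) + rng.uniform(a2, a2 + 1, 100_000)
hist, _ = np.histogram(s, bins=60, range=(a1 + a2, a1 + a2 + 2), density=True)
print(conv.max(), hist.max())   # both should be close to 1, the peak of the triangle
```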
11.4 Some important properties
Proposition 11.4.1. For µ, ν ∈ M₁⁺(R^d) one has the following:
(i) µ̂ + ν̂ = (µ + ν)^, where (µ + ν)(B) := µ(B) + ν(B).
(ii) (µ ∗ ν)^ = µ̂ ν̂.
(iii) If A = (a_{ij})_{i,j=1}^d : R^d → R^d is a linear transformation, then
(A(µ))^(x) = µ̂(A^T x),
where A(µ)(B) := µ(x ∈ R^d : Ax ∈ B).
(iv) If S_a : R^d → R^d is the shift operator S_a x := x + a, a ∈ R^d, then
(S_a(µ))^ = δ̂_a µ̂.
Proof. (i) is clear, (ii) follows from
(µ ∗ ν)^(x) = ∫_{R^d} ∫_{R^d} e^{i⟨x,y+z⟩} dµ(y) dν(z)
  = ∫_{R^d} e^{i⟨x,y⟩} dµ(y) ∫_{R^d} e^{i⟨x,z⟩} dν(z) = µ̂(x) ν̂(x).
(iii) is an exercise and
(S_a(µ))^(x) = ∫_{R^d} e^{i⟨x,y+a⟩} dµ(y) = e^{i⟨x,a⟩} µ̂(x)
implies (iv).
Definition 11.4.2. A system Π of subsets A ⊆ Ω is called π-system, provided
that A ∩ B ∈ Π for all A, B ∈ Π.

Proposition 11.4.3. Let (Ω, F, P_1) and (Ω, F, P_2) be probability spaces such
that P_1(A) = P_2(A) for all A ∈ Π, where Π is a π-system which generates F.
Then P_1 = P_2.

Examples 11.4.4. (a) If (Ω_1, F_1) and (Ω_2, F_2) are measurable spaces, then
Π := {A_1 × A_2 : A_1 ∈ F_1, A_2 ∈ F_2}
is a π-system.
(b) Assume a metric space M with the Borel-σ-algebra generated by the
open sets. Then the system of open sets of a metric space is a π-system
that generates the Borel σ-algebra. Another π-system that generates
the Borel σ-algebra is the system of closed sets.
Proposition 11.4.5. Assume that f_1, . . . , f_n : Ω → R^d are independent
random variables and that µ_1, . . . , µ_n ∈ M₁⁺(R^d) are the laws of f_1, . . . , f_n,
that means that
P(f_k ∈ B) = µ_k(B)  for all B ∈ B(R^d).
Then µ_1 ∗ · · · ∗ µ_n is the law of S := f_1 + · · · + f_n.
Proof. Define the product space ×_{k=1}^n (R^d, B(R^d), µ_k) =: (M, Σ, Q) and consider
the random vector F : Ω → R^{nd} given by F(ω) := (f_1(ω), . . . , f_n(ω)). Let µ
be the law of F. Then, by independence,
µ(B_1 × · · · × B_n) = P(f_1 ∈ B_1, . . . , f_n ∈ B_n) = Π_{k=1}^n P(f_k ∈ B_k)
  = µ_1(B_1) · · · µ_n(B_n) = Q(B_1 × · · · × B_n).
Since the system Π := {B_1 × · · · × B_n : B_k ∈ B(R^d)} is a generating π-system,
we get that µ = Q. Consequently,
P(S ∈ B) = µ({(x_1, . . . , x_n) : x_1 + · · · + x_n ∈ B})
  = Q({(x_1, . . . , x_n) : x_1 + · · · + x_n ∈ B}) = (µ_1 ∗ · · · ∗ µ_n)(B).
A hopefully motivating example.

Example 11.4.6. Let f_1, f_2, . . . : Ω → R be independent random variables
having the same distribution. Let
S_n := (1/√n)(f_1 + · · · + f_n).
We are interested in the convergence of S_n and compute their characteristic
functions. Here we get that
ϕ_{S_n}(t) = ϕ_{f_1/√n + ··· + f_n/√n}(t) = (µ_{f_1/√n} ∗ · · · ∗ µ_{f_n/√n})^(t)
  = ( µ̂_{f_1/√n}(t) )^n = ( E e^{i(t/√n) f_1} )^n = ( ϕ_{f_1}(t/√n) )^n,
where µ_{f_k/√n} is the law of f_k/√n. Now we ask:
(Q1) Under what conditions does (ϕ_{f_1}(t/√n))^n converge to a function ϕ? If
yes, is the limit ϕ a characteristic function, i.e. does there exist a
probability measure µ such that µ̂ = ϕ?
(Q2) And finally, if there is a measure µ, what is its connection to the distributions of S_n?
The second question will be answered now.

Proposition 11.4.7 (Uniqueness). Let µ, ν ∈ M₁⁺(R^d). Then µ = ν if and
only if µ̂ = ν̂.

Proof. It is clear that if µ = ν, then µ̂ = ν̂. Let us show the other direction
and assume f ∈ L_1(R^d; C). By Fubini’s theorem we get
∫_{R^d} µ̂(y) f(y) dλ_d(y) = ∫_{R^d} ∫_{R^d} e^{i⟨x,y⟩} f(y) dµ(x) dλ_d(y)
  = ∫_{R^d} ( ∫_{R^d} e^{i⟨x,y⟩} f(y) dλ_d(y) ) dµ(x) = ∫_{R^d} f̂(x) dµ(x).
Hence µ̂ = ν̂ implies that ∫_{R^d} f̂(x) dµ(x) = ∫_{R^d} f̂(x) dν(x) for all f ∈
L_1(R^d; C). Now we need a little more background and interrupt the proof.
Definition 11.4.8. Let
C_0(R^d; C) := { g : R^d → C continuous with lim_{|x|→∞} |g(x)| = 0 }
and
‖g‖_{C_0} := sup_{x∈R^d} |g(x)|.
Proposition 11.4.9 (Riemann & Lebesgue). For all f ∈ L_1(R^d; C) one has
f̂ ∈ C_0(R^d; C).

Proof. First of all we remark that by decomposing f into the positive and
negative parts of the real and imaginary part (so that we have four parts)
and applying Proposition 11.2.5 we get that f̂ is continuous. Next we recall that
|f̂(x) − ĝ(x)| = |∫_{R^d} [f(y) − g(y)] e^{i⟨x,y⟩} dλ_d(y)| ≤ ∫_{R^d} |f(y) − g(y)| dλ_d(y) = ‖f − g‖_{L_1}.
Assume that E ⊆ L_1(R^d; C) is a dense subset such that f̂_0 ∈ C_0(R^d; C) for
f_0 ∈ E. Letting f ∈ L_1(R^d; C) we find f_n ∈ E such that lim_n ‖f_n − f‖_{L_1} = 0,
so that lim_n sup_{x∈R^d} |f̂_n(x) − f̂(x)| = 0. Given ε > 0 we find an n ≥ 1 such
that ‖f̂_n − f̂‖_{C_0} < ε, so that
lim sup_{|x|→∞} |f̂(x)| = lim sup_{|x|→∞} |f̂(x) − f̂_n(x)| ≤ ε.
Since this holds for all ε > 0, we are done. What is the set E? We
can take all linear combinations of indicator functions g(x_1, . . . , x_d) =
χ_{(a_1,b_1)}(x_1) · · · χ_{(a_d,b_d)}(x_d) for −∞ < a_k < b_k < ∞. In this case we obtain
that ĝ(x_1, . . . , x_d) = χ̂_{(a_1,b_1)}(x_1) · · · χ̂_{(a_d,b_d)}(x_d) and
χ̂_{(a_k,b_k)}(x_k) = ∫_{a_k}^{b_k} e^{ix_k y_k} dy_k = (1/(ix_k))(e^{ix_k b_k} − e^{ix_k a_k}) → 0  as |x_k| → ∞.
The second result we need is

Proposition 11.4.10 (Stone & Weierstrass). Assume that A ⊆ C_0(R^d; C)
satisfies the following properties:
(i) A is a linear space.
(ii) g_1, g_2 ∈ A implies g_1 g_2 ∈ A.
(iii) g ∈ A implies ḡ ∈ A.
(iv) For all x_0 ∈ R^d there is a g ∈ A such that g(x_0) ≠ 0.
(v) For all x_0 ≠ x_1 there is a g ∈ A such that g(x_0) ≠ g(x_1).
Then A is dense, that means that for all g ∈ C_0(R^d; C) there exist g_n ∈ A
such that lim_n sup_x |g_n(x) − g(x)| = 0.

We leave this without proof.

Corollary 11.4.11. One has that
A := { f̂ : R^d → C : f ∈ L_1(R^d; C) } ⊆ C_0(R^d; C)
is dense.
Proof. Proposition 11.4.9 implies A ⊆ C_0(R^d; C). Now we have to check that
the conditions of Proposition 11.4.10 are satisfied.
(i) is clear.
(ii) f̂_1 f̂_2 = (f_1 ∗ f_2)^ and ‖f_1 ∗ f_2‖_{L_1} ≤ ‖f_1‖_{L_1} ‖f_2‖_{L_1}.
(iii) Here we have that the complex conjugate of f̂(x) satisfies
conj(f̂(x)) = ∫_{R^d} e^{−i⟨x,y⟩} f̄(y) dλ_d(y) = (f̄)^(−x).
(iv) Let x_0 ∈ R^d, V := {x_0 + y : |y| ≤ 1}, f(x) := χ_V(x) e^{−i⟨x_0,x⟩} ∈ L_1(R^d; C). Then
f̂(x_0) = ∫_{R^d} e^{i⟨x_0,x⟩} χ_V(x) e^{−i⟨x_0,x⟩} dλ_d(x) = λ_d(V) > 0.
(v) Let x_0 ≠ x_1 and z_0 := (x_0 − x_1)/|x_0 − x_1|². Then e^{i⟨x_0,z_0⟩} ≠ e^{i⟨x_1,z_0⟩}
because of e^{i⟨x_0−x_1,z_0⟩} = e^i ≠ 1, and e^{i⟨x_0,z⟩} ≠ e^{i⟨x_1,z⟩} for all z ∈ W, where W
is a small neighborhood of z_0. Let f(x) := χ_W(x)(e^{−i⟨x_0,x⟩} − e^{−i⟨x_1,x⟩}). Then
f̂(x_0) − f̂(x_1) = ∫_W (e^{i⟨x_0,y⟩} − e^{i⟨x_1,y⟩})(e^{−i⟨x_0,y⟩} − e^{−i⟨x_1,y⟩}) dλ_d(y)
  = ∫_W |e^{i⟨x_0,y⟩} − e^{i⟨x_1,y⟩}|² dλ_d(y) > 0.
Now we finish the proof of the uniqueness theorem Proposition 11.4.7. We got that
∫_{R^d} f̂(x) dµ(x) = ∫_{R^d} f̂(x) dν(x)
for f ∈ L_1(R^d; C) and want to show that µ = ν. Assuming g ∈ C_0(R^d; C)
and f_n ∈ L_1(R^d; C) such that ‖g − f̂_n‖_{C_0} →_n 0, we get that
| ∫_{R^d} g(x) dµ(x) − ∫_{R^d} g(x) dν(x) |
  = | ∫_{R^d} [g(x) − f̂_n(x)] dµ(x) − ∫_{R^d} [g(x) − f̂_n(x)] dν(x) | ≤ 2‖g − f̂_n‖_{C_0} →_n 0.
Let us define now the system
Π := {(a_1, b_1] × · · · × (a_d, b_d] : −∞ < a_k ≤ b_k < ∞, k = 1, . . . , d}.
For large n we find continuous functions
g_k^{(n)}(x) := 1 for x ∈ [a_k + 1/n, b_k],  g_k^{(n)}(x) := 0 for x ≤ a_k or x ≥ b_k + 1/n,  linear in between,
so that
lim_{n→∞} Π_{k=1}^d g_k^{(n)}(x_k) = χ_{(a_1,b_1]×···×(a_d,b_d]}(x).
Since g^n(x) := Π_{k=1}^d g_k^{(n)}(x_k) ∈ C_0(R^d; C) we get by dominated convergence that
µ( (a_1, b_1] × · · · × (a_d, b_d] ) = ∫_{R^d} lim_{n→∞} g^n(x) dµ(x) = lim_{n→∞} ∫_{R^d} g^n(x) dµ(x)
  = lim_{n→∞} ∫_{R^d} g^n(x) dν(x) = ν( (a_1, b_1] × · · · × (a_d, b_d] ).
Since Π is a π-system which generates B(R^d), we are done.
Next, we state the important

Proposition 11.4.12 (Bochner & Khintchine). Assume that ϕ : R^d → C is
continuous with ϕ(0) = 1. Then the following assertions are equivalent:
(i) ϕ is the Fourier transform of some µ ∈ M₁⁺(R^d).
(ii) ϕ is positive semi-definite, i.e. for all n = 1, 2, . . ., all x_1, . . . , x_n ∈ R^d
and all λ_1, . . . , λ_n ∈ C,
Σ_{k,l=1}^n λ_k λ̄_l ϕ(x_k − x_l) ≥ 0.
Next we have an explicit inversion formula.

Proposition 11.4.13. (i) Let µ ∈ M₁⁺(R) and let F(b) = µ((−∞, b]) be
its distribution function. Then
F(b) − F(a) = lim_{c→∞} (1/(2π)) ∫_{−c}^{c} ((e^{−iya} − e^{−iyb})/(iy)) µ̂(y) dλ(y),
if a < b and a and b are points of continuity of F.
(ii) If µ ∈ M₁⁺(R) and ∫_R |µ̂(x)| dλ(x) < ∞, then µ has a continuous density
f : R → [0, ∞), i.e.
µ(B) = ∫_B f(x) dλ(x).
Moreover,
f(x) = (1/(2π)) ∫_R e^{−ixy} µ̂(y) dλ(y).

Proposition 11.4.14. Let µ ∈ M₁⁺(R). Then µ(B) = µ(−B) for all B ∈ B(R),
if and only if µ̂(x) ∈ R for all x ∈ R.
The proof is an exercise.
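The inversion formula of Proposition 11.4.13(ii) can be tried out numerically. The sketch below is my own illustration (the truncation level and grid are ad-hoc choices): it recovers the standard Gaussian density from its Fourier transform e^{−y²/2} by approximating f(x) = (1/(2π)) ∫ e^{−ixy} µ̂(y) dy with a Riemann sum.

```python
import numpy as np

def density_from_cf(cf, xs, y_max=40.0, n_grid=4001):
    """Approximate f(x) = (1/(2*pi)) * integral of e^{-i x y} cf(y) dy by a Riemann sum."""
    ys = np.linspace(-y_max, y_max, n_grid)
    dy = ys[1] - ys[0]
    vals = [np.real(np.sum(np.exp(-1j * x * ys) * cf(ys)) * dy) / (2 * np.pi) for x in xs]
    return np.array(vals)

xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
approx = density_from_cf(lambda y: np.exp(-y**2 / 2), xs)   # cf of the standard Gaussian
exact = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)
print(approx)
print(exact)
```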
The Bochner & Khintchine assumption of positive semi-definiteness is sometimes
difficult to check. There is an easier sufficient condition:

Proposition 11.4.15 (Polya). Let ϕ : R → [0, ∞) be
(i) continuous,
(ii) even (i.e. ϕ(x) = ϕ(−x)),
(iii) convex on [0, ∞),
(iv) and assume that ϕ(0) = 1 and lim_{x→∞} ϕ(x) = 0.
Then there exists some µ ∈ M₁⁺(R) such that µ̂ = ϕ.
11.5 Examples
Normal distribution on R. Recall that
γ(B) := ∫_B e^{−x²/2} dλ_1(x)/√(2π)
is the standard normal distribution on R.

Lemma 11.5.1. One has that
γ̂(x) = e^{−x²/2}.
Proof. We will not give all details. By definition
γ̂(x) = ∫_R e^{ixy} e^{−y²/2} dλ_1(y)/√(2π) = ∫_R cos(xy) e^{−y²/2} dλ_1(y)/√(2π).
But then
γ̂'(x) = − ∫_R sin(xy) y e^{−y²/2} dλ_1(y)/√(2π) = −x ∫_R cos(xy) e^{−y²/2} dλ_1(y)/√(2π) = −x γ̂(x)
by partial integration. Letting y ≥ 0 and E(y) := γ̂(√(2y)) gives
E'(y) = (√2/(2√y)) γ̂'(√(2y)) = (1/√(2y)) γ̂'(√(2y)) = −(√(2y)/√(2y)) γ̂(√(2y)) = −E(y)
for y > 0. Since E(0) = 1, we get E(y) = e^{−y} and γ̂(x) = E(x²/2) = e^{−x²/2}.
Now we shift and stretch the normal distribution. Let σ > 0 and m ∈ R. Then
γ_{m,σ²}(B) := ∫_B e^{−(x−m)²/(2σ²)} dx/√(2πσ²).

Proposition 11.5.2. (i) γ_{m,σ²} ∈ M₁⁺(R).
(ii) ∫_R x dγ_{m,σ²}(x) = m is the mean.
(iii) ∫_R (x − m)² dγ_{m,σ²}(x) = σ² is the variance.
(iv) γ̂_{m,σ²}(x) = e^{imx} e^{−σ²x²/2}.
Proof. (i) follows from
∫_R e^{−(x−m)²/(2σ²)} dx/√(2πσ²) = ∫_R e^{−y²/2} dy/√(2π) = 1,
where we have used y := (x − m)/σ and σ dy = dx, and the last equality was shown
in the basic course.
(ii) is a consequence of
∫_R x dγ_{m,σ²}(x) = ∫_R x e^{−(x−m)²/(2σ²)} dx/√(2πσ²) = ∫_R (σy + m) e^{−y²/2} dy/√(2π) = m
for y := (x − m)/σ.
(iii) is an exercise.
(iv) Here we get
γ̂_{m,σ²}(x) = ∫_R e^{ixy} e^{−(y−m)²/(2σ²)} dy/√(2πσ²) = ∫_R e^{ix(σz+m)} e^{−z²/2} dz/√(2π)
  = e^{ixm} γ̂(xσ) = e^{imx} e^{−σ²x²/2}.
Definition 11.5.3. A measure µ ∈ M₁⁺(R) is a Gaussian measure, provided
that µ = δ_m for some m ∈ R or µ = γ_{m,σ²} for some σ² > 0 and m ∈ R.
Normal distribution on R^d. There are various ways to introduce this distribution.

Lemma 11.5.4. One has that ∫_{R^d} e^{−⟨x,x⟩/2} dx/(√(2π))^d = 1.
Proof. By Fubini’s theorem we get that
∫_{R^d} e^{−⟨x,x⟩/2} dx/(√(2π))^d = ∫_R · · · ∫_R e^{−x_1²/2} · · · e^{−x_d²/2} dx_1/√(2π) · · · dx_d/√(2π)
  = ( ∫_R e^{−x²/2} dx/√(2π) )^d = 1.

Definition 11.5.5. The measure γ = γ^{(d)} ∈ M₁⁺(R^d) given by
γ(B) := ∫_B e^{−⟨x,x⟩/2} dx/(√(2π))^d
is called standard Gaussian measure on R^d.
is called standard Gaussian measure on Rd .
Definition 11.5.6. A matrix R = (rij )di,j=1 is called positive semi-definite,
provided that
d
X
hRx, xi =
rkl xk xl ≥ 0
k,l=1
for all x = (x1 , . . . , xd ) ∈ Rd . A matrix R = (rij )di,j=1 is called symmetric,
provided that rkl = rlk for all k, l = 1, . . . , d.
Proposition 11.5.7. Let µ ∈ M₁⁺(R^d). Then the following assertions are equivalent:
(i) There exist a matrix A = (α_{kl})_{k,l=1}^d and a vector m ∈ R^d such that µ is
the image measure of γ^{(d)} with respect to the map x ↦ Ax + m, i.e.
µ(B) = γ^{(d)}( x ∈ R^d : Ax + m ∈ B ).
(ii) There exist a positive semi-definite and symmetric matrix R = (r_{kl})_{k,l=1}^d
and a vector m' ∈ R^d such that
µ̂(x) = e^{i⟨x,m'⟩ − ⟨Rx,x⟩/2}.
(iii) For all b = (b_1, . . . , b_d) ∈ R^d the law of ϕ_b : R^d → R, ϕ_b(x) := ⟨x, b⟩,
with respect to (R^d, B(R^d), µ) is a Gaussian measure on the real line R.
In particular, we have that m, R and m' are unique and that
(a) m = m' and R = AA^T,
(b) ∫_{R^d} x_k dµ(x) = m_k and
(c) ∫_{R^d} (x_k − m_k)(x_l − m_l) dµ(x) = r_{kl}.

Definition 11.5.8. The above measure µ is called Gaussian measure on R^d
with mean m = (m_k)_{k=1}^d and covariance R = (r_{kl})_{k,l=1}^d.
Proof of Proposition 11.5.7. (i) ⇒ (ii) follows from
µ̂(x) = ∫_{R^d} e^{i⟨x,y⟩} dµ(y) = ∫_{R^d} e^{i⟨x,m+Ay⟩} dγ^{(d)}(y)
  = e^{i⟨x,m⟩} ∫_{R^d} e^{i⟨A^T x,y⟩} dγ^{(d)}(y) = e^{i⟨x,m⟩} γ̂^{(d)}(A^T x)
  = e^{i⟨x,m⟩} e^{−⟨A^T x, A^T x⟩/2} = e^{i⟨x,m⟩} e^{−⟨AA^T x, x⟩/2},
so that AA^T = R and m = m', where we have used that
γ̂^{(d)}(x) = ∫_{R^d} e^{i⟨x,y⟩} dγ^{(d)}(y) = ∫_R e^{ix_1y_1} dγ^{(1)}(y_1) · · · ∫_R e^{ix_dy_d} dγ^{(1)}(y_d)
  = e^{−x_1²/2} · · · e^{−x_d²/2} = e^{−⟨x,x⟩/2}.
(ii) ⇒ (iii): We compute the Fourier transform of law(ϕ_b):
(law(ϕ_b))^(t) = ∫_R e^{its} d law(ϕ_b)(s) = ∫_{R^d} e^{itϕ_b(y)} dµ(y) = ∫_{R^d} e^{it⟨b,y⟩} dµ(y) = µ̂(tb)
  = e^{i⟨tb,m'⟩ − ⟨Rtb,tb⟩/2} = e^{it⟨b,m'⟩ − t²⟨Rb,b⟩/2}.
But this is the Fourier transform of a Gaussian measure on R.
(iii) ⇒ (i): For all b ∈ R^d there are m_b ∈ R and σ_b ≥ 0 such that
∫_{R^d} e^{it⟨b,y⟩} dµ(y) = e^{im_b t − σ_b²t²/2}.
Now we compute m_b and σ_b. From Proposition 11.5.2 we know that
∫_{R^d} ⟨b,x⟩ dµ(x) = m_b and ∫_{R^d} (⟨b,x⟩ − m_b)² dµ(x) = σ_b². The first equation
implies, with m'' := ( ∫_{R^d} ⟨e_k, x⟩ dµ(x) )_{k=1}^d, that ⟨b, m''⟩ = m_b. Knowing this,
we can rewrite the second equation as
∫_{R^d} ⟨b, x − m''⟩² dµ(x) = σ_b².
Defining
r'_{kl} := ∫_{R^d} ⟨e_k, x − m''⟩ ⟨e_l, x − m''⟩ dµ(x),
we get that R' = (r'_{kl})_{k,l=1}^d is symmetric and that
⟨R'b, b⟩ = Σ_{k,l=1}^d r'_{kl} b_k b_l = ∫_{R^d} ( Σ_{k=1}^d b_k ⟨e_k, x − m''⟩ )² dµ(x) = ∫_{R^d} ⟨b, x − m''⟩² dµ(x) = σ_b².
Consequently,
∫_{R^d} e^{i⟨b,y⟩} dµ(y) = e^{i⟨b,m''⟩ − ⟨R'b,b⟩/2},
where R' is positive semi-definite because of ⟨R'b, b⟩ = σ_b² ≥ 0. To get (i) we
use algebra: there exists a matrix A such that R' = AA^T. Hence
∫_{R^d} e^{i⟨b,y⟩} dµ(y) = e^{i⟨b,m''⟩ − ⟨A^T b, A^T b⟩/2} = (law(ϕ))^(b),
where ϕ(x) = m'' + Ax and (R^d, B(R^d), γ^{(d)}) is taken as probability space.
Finally, we check (a), (b) and (c). Using (ii) ⇒ (iii) gives that ∫_{R^d} ⟨x,b⟩ dµ(x) = ⟨b, m'⟩
and ∫_{R^d} ⟨x − m', b⟩² dµ(x) = ⟨Rb, b⟩, so that ⟨Rb, b⟩ = ⟨R'b, b⟩ for all b ∈ R^d.
Since R and R' are symmetric, we may deduce
⟨Rb_1, b_2⟩ = (1/4)[⟨R(b_1 + b_2), b_1 + b_2⟩ − ⟨R(b_1 − b_2), b_1 − b_2⟩]
  = (1/4)[⟨R'(b_1 + b_2), b_1 + b_2⟩ − ⟨R'(b_1 − b_2), b_1 − b_2⟩] = ⟨R'b_1, b_2⟩
and R = R', which proves (c). Using (i), we get
∫_{R^d} x dµ(x) = ∫_{R^d} (m + Ax) dγ^{(d)}(x) = m,
so that (b) is proved and m = m'. Finally, R = AA^T follows now from (i) ⇒ (ii).
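The equivalence (i) ⇔ (ii) is also the standard recipe for sampling a Gaussian measure on R^d: factor the covariance as R = AA^T (for instance by a Cholesky decomposition) and push standard Gaussian vectors through x ↦ m + Ax. A minimal sketch of my own, with an arbitrary positive definite R and mean m:

```python
import numpy as np

rng = np.random.default_rng(0)
m = np.array([1.0, -2.0])
R = np.array([[2.0, 0.8],
              [0.8, 1.0]])          # positive definite covariance (illustrative assumption)

A = np.linalg.cholesky(R)           # A @ A.T == R
z = rng.standard_normal((100_000, 2))
x = m + z @ A.T                     # samples with law N(m, R), i.e. the image of gamma^(2) under z -> m + A z

print(x.mean(axis=0))               # should be close to m
print(np.cov(x, rowvar=False))      # should be close to R
```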
Definition 11.5.9. A Gaussian measure µ ∈ M₁⁺(R^d) with mean m and
covariance R is called degenerate if rank(R) < d. The Gaussian measure is
called non-degenerate if rank(R) = d.
Examples. (a) γ^{(d)} is non-degenerate since
R is the d × d identity matrix (ones on the diagonal, zeros elsewhere),
which has rank d.
(b) Let d = 2 and µ = γ^{(1)} ⊗ δ_{x_{2,0}}, i.e.
µ(B) := ∫_R e^{−x_1²/2} χ_B(x_1, x_{2,0}) dx_1/√(2π).
Let us compute the mean:
∫_{R²} x_1 dµ(x) = 0  and  ∫_{R²} x_2 dµ(x) = x_{2,0}.
Moreover,
∫_{R²} (x_1 − 0)² dµ(x) = 1,
∫_{R²} (x_2 − x_{2,0})² dµ(x) = ∫_R (x_2 − x_{2,0})² dδ_{x_{2,0}} = 0,
and
∫_{R²} (x_1 − 0)(x_2 − x_{2,0}) dµ(x) = 0.
Consequently, R = ( 1 0 ; 0 0 ) and rank(R) = 1.
In the case of non-degenerate measures we have the

Proposition 11.5.10. Assume that µ ∈ M₁⁺(R^d) is a non-degenerate Gaussian
measure with covariance R and mean m. Then one has that
µ(B) = ∫_B e^{−⟨R^{−1}(x−m), x−m⟩/2} dx/((2π)^{d/2} |det R|^{1/2}).
We will not prove this. The proof is a computation.
Cauchy distribution on R. We start with an experiment. Take a point
source which sends small particles to a wall. The distance between the point
source and the wall is α > 0. The angle ϕ ∈ (−π/2, π/2) is distributed uniformly,
that means that it has the uniform distribution on (−π/2, π/2).
Problem: What is the probability that a particle hits a set B? Here we get,
for 0 ≤ ϕ < π/2, µ_α([0, α tan ϕ]) = ϕ/π and, for 0 ≤ x < ∞,
µ_α([0, x]) = arctan(x/α)/π = (1/π) ∫_0^{x/α} dξ/(1 + ξ²)
  = (1/(πα)) ∫_0^x dη/(1 + (η/α)²) = (α/π) ∫_0^x dη/(α² + η²).

Definition 11.5.11. For α > 0 the distribution
dµ_α(x) := (1/π) α/(α² + x²) dx ∈ M₁⁺(R)
is called Cauchy distribution with parameter α > 0.

Proposition 11.5.12. One has that
µ̂_α(x) = e^{−α|x|}.
Proof. We prove the statement for α = 1. (The rest can be done by a change
of variables.) Since µ̂_1(−x) = µ̂_1(x) we can restrict our proof to x ≥ 0. We
consider the meromorphic function f : C → C, f(z) := e^{ixz}/(1 + z²), which has
its poles at the z ∈ C with 1 + z² = 0, that means z_1 = i and z_2 = −i.
From complex analysis it is known that
lim_{z→i} (z − i)f(z) = (1/(2πi)) ( ∫_{−R}^R f(z) dz + ∫_{S_R} f(z) dz ),
where S_R denotes the upper half circle in the complex plane from R to −R. Since
∫_{S_R} e^{ixz}/(1 + z²) dz → 0  as R → ∞
and
lim_{z→i} (z − i)f(z) = lim_{z→i} e^{ixz}/(z + i) = e^{−x}/(2i),
we obtain that
(1/(2i)) e^{−x} = (1/(2πi)) lim_{R→∞} ∫_{−R}^R e^{ixz}/(1 + z²) dz = (1/(2i)) µ̂_1(x).

11.6 Independent random variables
We recall two facts from the basic course.

Proposition 11.6.1. If f, g : Ω → R are independent such that E|f| < ∞
and E|g| < ∞, then E|fg| < ∞ and Efg = Ef Eg.

Proposition 11.6.2. Let (Ω, F, P) = ×_{i=1}^n (Ω_i, F_i, P_i) and let g_i : Ω_i → R be
random variables. If f_i(ω_1, . . . , ω_n) := g_i(ω_i), then f_1, . . . , f_n are independent
and law(f_i) = law(g_i).

Proof. Letting B_1, . . . , B_n ∈ B(R) we get that
P(f_1 ∈ B_1, . . . , f_n ∈ B_n) = P( ×_{i=1}^n {g_i ∈ B_i} ) = Π_{i=1}^n P_i(g_i ∈ B_i) = Π_{i=1}^n P(f_i ∈ B_i).
Proposition 11.6.3. Let f, g : Ω → R be random variables. Then the
following assertions are equivalent:
(i) f and g are independent.
(ii) If h : Ω → R² is defined by h(ω) := (f(ω), g(ω)), then ϕ_h(x, y) = ϕ_f(x)ϕ_g(y).

Proof. (i) ⇒ (ii): By Proposition 11.6.1,
ϕ_h(x, y) = ∫_Ω e^{i(xf(ω)+yg(ω))} dP(ω) = ∫_Ω e^{ixf(ω)} dP(ω) ∫_Ω e^{iyg(ω)} dP(ω) = ϕ_f(x)ϕ_g(y).
(ii) ⇒ (i): Define H : Ω × Ω → R² by H(ω_1, ω_2) := (f(ω_1), g(ω_2)). Then the
coordinates are independent and ϕ_H(x, y) = ϕ_h(x, y) = ϕ_f(x)ϕ_g(y), so that
the laws of H and h are the same. But this implies that
P(f ∈ B_1, g ∈ B_2) = P(h ∈ B_1 × B_2) = (P ⊗ P)(H ∈ B_1 × B_2) = P(f ∈ B_1) P(g ∈ B_2).
We consider an application of this:
Definition 11.6.4. Two random variables f, g : Ω → R with Ef 2 +Eg 2 < ∞
are called uncorrelated, provided that E(f − Ef )(g − Eg) = 0.
Remark 11.6.5. If f and g are independent and if Ef 2 + Eg 2 < ∞, then
they are uncorrelated. In fact, we have (by Proposition 11.6.1) that
E(f − Ef )(g − Eg) = [E(f − Ef )] [E(g − Eg)] = 0.
Proposition 11.6.6. Let f, g : Ω → R be random variables such that (f, g) :
Ω → R2 is a Gaussian random variable. Then the following assertions are
equivalent.
(i) f and g are uncorrelated.
(ii) f and g are independent.
Proof. We only have to check that (i) implies (ii). We know from Proposition
11.5.7 that for x = (x_1, x_2) one has
ϕ_{(f,g)}(x_1, x_2) = e^{i(x_1 Ef + x_2 Eg) − ⟨Rx,x⟩/2},
with
R = ( E(f − Ef)²            E(f − Ef)(g − Eg)
      E(f − Ef)(g − Eg)     E(g − Eg)²        ) = ( var(f)  0
                                                    0       var(g) ).
Consequently,
ϕ_{(f,g)}(x_1, x_2) = e^{ix_1 Ef − x_1² var(f)/2} e^{ix_2 Eg − x_2² var(g)/2} = ϕ_f(x_1) ϕ_g(x_2).
Applying Proposition 11.6.3 we get the independence of f and g.

Warning: We need that the joint distribution of f and g (in other words
law(f, g) ∈ M₁⁺(R²)) is Gaussian.
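The warning is not an empty one. A standard counterexample (my own illustration, not from the notes): let f be standard Gaussian and g := εf with ε a random sign independent of f. Then f and g are each Gaussian and uncorrelated, but not independent; the pair (f, g) is not jointly Gaussian. A small numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
f = rng.standard_normal(n)
eps = rng.choice([-1.0, 1.0], size=n)   # random sign, independent of f
g = eps * f                             # g is again standard Gaussian

print(np.corrcoef(f, g)[0, 1])          # close to 0: f and g are uncorrelated
# Dependence shows up, e.g., through |f| and |g|: they coincide, so their covariance is Var|f| > 0.
print(np.cov(np.abs(f), np.abs(g))[0, 1])
```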
11.7 Moments of measures
There are different types of moments of a measure µ ∈ M₁⁺(R^d). Given
integers l_1, . . . , l_d ≥ 0 and 0 < p < ∞ we have, for example, that
∫_{R^d} x_1^{l_1} · · · x_d^{l_d} dµ(x_1, . . . , x_d) = moment of order (l_1, . . . , l_d),
∫_{R^d} |x_1^{l_1} · · · x_d^{l_d}| dµ(x_1, . . . , x_d) = absolute moment of order (l_1, . . . , l_d),
∫_{R^d} | x − ∫_{R^d} x dµ(x) |^p dµ(x) = centred absolute p-th moment.
We are interested in the first type and show that one can use the Fourier
transform to compute these moments:
Proposition 11.7.1. Let µ ∈ M₁⁺(R^d) and assume integers k_1, . . . , k_d ≥ 0
such that for all integers 0 ≤ l_j ≤ k_j one has that
∫_{R^d} |x_1^{l_1} · · · x_d^{l_d}| dµ(x_1, . . . , x_d) < ∞.
Then one has the following:
(i) ∂^{l_1+···+l_d} µ̂ / ∂x_1^{l_1} · · · ∂x_d^{l_d} ∈ C_b(R^d).
(ii) ∂^{l_1+···+l_d} µ̂(x) / ∂x_1^{l_1} · · · ∂x_d^{l_d} = i^{l_1+···+l_d} ∫_{R^d} e^{i⟨x,y⟩} y_1^{l_1} · · · y_d^{l_d} dµ(y).
(iii) ∂^{l_1+···+l_d} µ̂(0) / ∂x_1^{l_1} · · · ∂x_d^{l_d} = i^{l_1+···+l_d} ∫_{R^d} y_1^{l_1} · · · y_d^{l_d} dµ(y).
(iv) The partial derivatives of µ̂ are uniformly continuous.
Example 11.7.2. (a) Binomial distribution: 0 < p < 1, d = 1, n ∈ {1, 2, . . .},
µ({k}) := (n choose k) p^{n−k}(1 − p)^k,  k = 0, . . . , n.
Then
µ̂(x) = (p + (1 − p)e^{ix})^n,   µ̂'(x) = n(p + (1 − p)e^{ix})^{n−1}(1 − p)ie^{ix},
µ̂'(0) = n(1 − p)i,   µ̂'(0)/i = n(1 − p).
(b) Gaussian measure γ_{0,σ²} ∈ M₁⁺(R): We have that γ̂_{0,σ²}(x) = e^{−σ²x²/2} and get
γ̂'_{0,σ²}(x) = −xσ² e^{−σ²x²/2},   γ̂'_{0,σ²}(0) = 0,
γ̂''_{0,σ²}(x) = (x²σ⁴ − σ²) e^{−σ²x²/2},   γ̂''_{0,σ²}(0) = −σ².
(c) Cauchy distribution µ_α ∈ M₁⁺(R) with α > 0: Recall that
dµ_α(x) = (1/π) α/(α² + x²) dx  and  µ̂_α(x) = e^{−α|x|}.

Proposition 11.7.3. For all α > 0 one has
∫_R |x|^k dµ_α(x) = ∞  for k = 1, 2, . . .
Proof. A first variant of the proof is to note that µ̂_α is not differentiable at zero:
lim_{x↓0} (e^{−α|x|} − 1)/x = −α ≠ α = lim_{x↑0} (e^{−α|x|} − 1)/x.
Or we can use, for k ≥ 1,
∫_R (|x|^k/(α² + x²)) dx = 2 ∫_0^∞ x^k/(α² + x²) dx ≥ (2/(α² + 1)) ∫_1^∞ x^{k−2} dx = ∞.
Proof of Proposition 11.7.1. We only prove the case l_1 = 1, l_2 = · · · = l_d = 0
(the rest follows by induction). Fix x_2, . . . , x_d ∈ R and define f(x_1, y) :=
e^{i⟨(x_1,...,x_d),y⟩}. Then
(∂/∂x_1) ∫_{R^d} e^{i⟨x,y⟩} dµ(y) = ∫_{R^d} (∂/∂x_1) e^{i⟨x,y⟩} dµ(y) = i ∫_{R^d} e^{i⟨x,y⟩} y_1 dµ(y),
where the first equality has to be justified. Now we define dυ_+(y) :=
χ_{{y_1≥0}} y_1 dµ(y) and dυ_−(y) := −χ_{{y_1<0}} y_1 dµ(y) and obtain bounded measures,
so that
x ↦ ∫_{R^d} e^{i⟨x,y⟩} dυ_±(y)
are uniformly continuous and bounded, and (iv) follows.
For the equality we have to justify, we need

Lemma 11.7.4. Let f : R × Ω → C be such that
(i) ∂f/∂x(·, ω) is continuous for all ω ∈ Ω,
(ii) ∂f/∂x(x, ·) and f(x, ·) are random variables,
(iii) there exists a g : Ω → R, g(ω) ≥ 0, such that |∂f/∂x(x, ω)| ≤ g(ω) for all
ω ∈ Ω, x ∈ R, and Eg < ∞,
(iv) ∫_Ω |f(x, ω)| dP(ω) < ∞ for all x ∈ R.
Then
(∂/∂x) ∫_Ω f(x, ω) dP(ω) = ∫_Ω (∂f/∂x)(x, ω) dP(ω).
11.8 Weak convergence
In the beginning of the lecture we considered the following types of convergence:
almost sure convergence, convergence in probability and L_p-convergence. Up to
now we needed that the underlying probability spaces are the same. This will be
relaxed by the weak convergence, where we only consider the convergence of the laws.

Proposition 11.8.1. Let µ_n, µ ∈ M₁⁺(R^d). Then the following assertions are equivalent:
(i) For all continuous and bounded functions ϕ : R^d → R one has that
∫ ϕ(x) dµ_n(x) →_n ∫ ϕ(x) dµ(x).
(ii) For all closed sets A ∈ B(R^d) one has lim sup_n µ_n(A) ≤ µ(A).
(iii) For all open sets B ∈ B(R^d) one has lim inf_n µ_n(B) ≥ µ(B).
(iv) If d = 1 and if F_n(x) := µ_n((−∞, x]) and F(x) := µ((−∞, x]), then
F_n(x) →_n F(x) for all points x ∈ R of continuity of F.
(v) µ̂_n(x) →_n µ̂(x) for all x ∈ R^d.
Definition 11.8.2. (i) For µ_n, µ ∈ M₁⁺(R^d) we say that µ_n converges
weakly to µ (µ_n ⇒ µ or µ_n →_w µ) provided that the conditions of
Proposition 11.8.1 are satisfied.
(ii) Let f_n : Ω_n → R^d and f : Ω → R^d be random variables over probability
spaces (Ω_n, F_n, P_n) and (Ω, F, P). Then f_n converges to f weakly or in
distribution (f_n →_d f) provided that the corresponding laws µ_n(B) =
P_n(f_n ∈ B) and µ(B) = P(f ∈ B) are converging weakly.
What is the connection to our earlier types of convergence?

Proposition 11.8.3. For f_n, f : Ω → R^d one has that if f_n converges to f
in probability, then f_n converges to f in distribution.

Proof. Letting ϕ : R^d → R be continuous and bounded, we need to show that
Eϕ(f_n) = ∫_{R^d} ϕ(x) dµ_n(x) →_n ∫_{R^d} ϕ(x) dµ(x) = Eϕ(f).
But this follows from (defining ‖ϕ‖_∞ := sup_x |ϕ(x)|)
|Eϕ(f_n) − Eϕ(f)| ≤ ∫_{{|f_n−f|>ε}} 2‖ϕ‖_∞ dP + ∫_{{|f_n−f|≤ε, |f|>N}} 2‖ϕ‖_∞ dP
                    + ∫_{{|f_n−f|≤ε, |f|≤N}} |ϕ(f_n) − ϕ(f)| dP
  ≤ 2‖ϕ‖_∞ ( P(|f_n − f| > ε) + P(|f| > N) ) + sup_{|x−y|≤ε, |y|≤N} |ϕ(x) − ϕ(y)|
  ≤ 3δ
for N = N(δ), ε = ε(N(δ), δ) and n ≥ n(ε).
11.9 Central limit theorem
Proposition 11.9.1 (Central limit theorem). Let f_1, f_2, . . . : Ω → R be a
sequence of independent random variables which have the same distribution
such that 0 < E(f_k − Ef_k)² = σ² < ∞ and Ef_k = m. Then
P( (1/√(kσ²)) ((f_1 − m) + · · · + (f_k − m)) ≤ x ) →_k (1/√(2π)) ∫_{−∞}^x e^{−ξ²/2} dξ.
Proof. Let f'_n := (f_n − m)/σ. Then we get that Ef'_n = (1/σ)(Ef_n − m) = 0 and
E(f'_n)² = (1/σ²)E(f_n − m)² = 1. We have to show that
P( (1/√n)(f'_1 + · · · + f'_n) ≤ x ) →_n (1/√(2π)) ∫_{−∞}^x e^{−ξ²/2} dξ
or, for S_n := f'_1 + · · · + f'_n, that (1/√n)S_n ⇒ g ∼ N(0, 1). By Proposition 11.8.1
this is equivalent to
ϕ_{S_n/√n}(t) →_n ϕ_g(t) = e^{−t²/2}
for all t ∈ R. Now
ϕ_{S_n/√n}(t) = ϕ_{(f'_1+···+f'_n)/√n}(t) = ϕ_{f'_1/√n}(t) · · · ϕ_{f'_n/√n}(t) = ϕ(t/√n)^n,
if ϕ(t) = ϕ_{f'_1}(t). Since E(f'_1)² < ∞, Proposition 11.7.1 implies that ϕ'' ∈ C_b(R) and
ϕ(t) = ϕ(0) + tϕ'(0) + (t²/2)ϕ''(0) + o(t²) = ϕ(0) + tiEf'_1 + (t²/2)i²E(f'_1)² + o(t²) = 1 − t²/2 + o(t²)
for t ∈ R with lim_{t→0} o(t²)/t² = 0. So it remains to show that
(1 − t²/(2n) + o(t²/n))^n →_n e^{−t²/2}.
Letting θ = t²/2, this is done by
(1 − θ/n + o(θ/n))^n →_n e^{−θ}.
Fix θ ≥ 0, let ε ∈ (0, 1), and choose n(ε, θ) ∈ N such that |o(θ/n)| ≤ ε θ/n for n ≥ n(ε, θ). Then
(1 − θ(1 + ε)/n)^n ≤ (1 − θ/n + o(θ/n))^n ≤ (1 − θ(1 − ε)/n)^n
for n ≥ n(ε, θ) such that θ(1 + ε)/n < 1. Because
lim_n (1 − θ(1 + ε)/n)^n = e^{−θ(1+ε)}  and  lim_n (1 − θ(1 − ε)/n)^n = e^{−θ(1−ε)},
it follows that
e^{−θ(1+ε)} ≤ lim inf_n (1 − θ/n + o(θ/n))^n ≤ lim sup_n (1 − θ/n + o(θ/n))^n ≤ e^{−θ(1−ε)}.
Since this is true for all ε > 0 we end up with
e^{−θ} ≤ lim_n (1 − θ/n + o(θ/n))^n ≤ e^{−θ},
which finishes the proof.
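A quick simulation of the statement — my own sketch, with an arbitrary non-Gaussian distribution for the summands and arbitrary sample sizes — compares P((f_1 + · · · + f_n)/(σ√n) ≤ x) with the standard normal distribution function:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
n, n_paths = 400, 50_000
# f_k uniform on [-1, 1]: mean m = 0, variance sigma^2 = 1/3
samples = rng.uniform(-1.0, 1.0, size=(n_paths, n))
sigma = sqrt(1.0 / 3.0)
normalized = samples.sum(axis=1) / (sigma * sqrt(n))

for x in (-1.0, 0.0, 1.0, 2.0):
    empirical = np.mean(normalized <= x)            # empirical distribution function
    gaussian = 0.5 * (1 + erf(x / sqrt(2)))         # standard normal cdf at x
    print(x, empirical, gaussian)
```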
Proposition 11.9.2 (Poisson). Let f_{n,1}, . . . , f_{n,n} : Ω → R be independent
random variables such that P(f_{n,k} = 1) = p_{nk} and P(f_{n,k} = 0) = q_{nk} with
p_{nk} + q_{nk} = 1. Assume that
max_{1≤k≤n} p_{nk} →_n 0  and  Σ_{k=1}^n p_{nk} →_n λ > 0.
Then, for S_n := f_{n,1} + · · · + f_{n,n}, the laws µ_n := law(S_n) converge weakly to
the Poisson distribution
π_λ(B) = Σ_{k=0}^∞ (λ^k/k!) e^{−λ} δ_k(B).
Proof. For θ = e^{it} − 1 we get that
ϕ_{S_n}(t) = Π_{k=1}^n ϕ_{f_{n,k}}(t) = Π_{k=1}^n (p_{nk} e^{it} + q_{nk}) = Π_{k=1}^n (1 + p_{nk}(e^{it} − 1)) = Π_{k=1}^n (1 + p_{nk}θ)
  = 1 + θ Σ_{k=1}^n p_{nk} + θ² Σ_{1≤k_1<k_2≤n} p_{nk_1}p_{nk_2} + · · · + θ^l Σ_{1≤k_1<···<k_l≤n} p_{nk_1} · · · p_{nk_l} + · · · + θ^n p_{n1} · · · p_{nn}
  = 1 + Σ_{l=1}^n ((θλ)^l/l!) b_{ln}
with
b_{ln} := (l!/λ^l) Σ_{1≤k_1<···<k_l≤n} p_{nk_1} · · · p_{nk_l}.
We do not prove Poisson’s Theorem, we just show that lim_n b_{ln} = 1 for fixed l.
This can be easily seen by
(l!/λ^l) Σ_{1≤k_1<···<k_l≤n} p_{nk_1} · · · p_{nk_l}
  = ( (p_{n1} + · · · + p_{nn})/λ )^l · ( Σ_{1≤k_1,...,k_l≤n, all indices distinct} p_{nk_1} · · · p_{nk_l} ) / ( Σ_{1≤k_1,...,k_l≤n} p_{nk_1} · · · p_{nk_l} ).
The first factor converges to 1 as n → ∞. The second one we write as
1 − ( Σ_{1≤k_1,...,k_l≤n, not all indices distinct} p_{nk_1} · · · p_{nk_l} ) / ( Σ_{1≤k_1,...,k_l≤n} p_{nk_1} · · · p_{nk_l} )
and can bound the numerator of the subtracted term by
Σ_{not all indices distinct} p_{nk_1} · · · p_{nk_l} ≤ (p_{n1} + · · · + p_{nn})^{l−1} max_k p_{nk}
  + (p_{n1} + · · · + p_{nn})^{l−2} max_k p_{nk}² + · · · + (p_{n1} + · · · + p_{nn}) max_k p_{nk}^{l−1}.
This should motivate the convergence
ϕ_{S_n}(t) →_n e^{λθ} = e^{λ(e^{it}−1)} = π̂_λ(t).

Example 11.9.3. p_{n,1} = · · · = p_{n,n} := λ/n for n > λ.
Now we consider a limit theorem which is an extension of the central limit theorem.

Proposition 11.9.4. Assume that f_{n1}, f_{n2}, ..., f_{nn} are independent random
variables such that
Ef_{nk} = 0  and  Σ_{k=1}^n Ef_{nk}² = 1
for all n = 1, 2, ... Let µ_{nk} := law(f_{nk}) and assume that the Lindeberg condition
Σ_{k=1}^n ∫_{|x|>ε} x² dµ_{nk}(x) →_n 0
is satisfied for all ε > 0. Then one has that
S_n →_d N(0, 1),
where S_n := f_{n1} + · · · + f_{nn}.
Example 11.9.5. Let us check that the Lindeberg condition is satisfied in
the case that we consider the weak limit of
(1/√n)(f_1 + · · · + f_n)
where
(1) Ef_k = 0,
(2) Ef_k² = 1,
(3) f_1², f_2², f_3², ... is a uniformly integrable family of random variables.
In the above notation we can write (in distribution) that f_{nk} =_d f_k/√n, so that
Σ_{k=1}^n ∫_{|x|>ε} x² dµ_{nk}(x) = Σ_{k=1}^n ∫_{{|f_k|/√n > ε}} (f_k²/n) dP = (1/n) Σ_{k=1}^n ∫_{{|f_k| > √n ε}} f_k² dP
  ≤ sup_{1≤k≤n} ∫_{{|f_k| > √n ε}} f_k² dP ≤ sup_{k≥1} ∫_{{|f_k| > √n ε}} f_k² dP → 0
as n → ∞. Examples where the family (f_k²)_{k=1}^∞ is uniformly integrable are
(a) f_1, f_2, ... are identically distributed,
(b) sup_k E|f_k|^p < ∞ for some 2 < p < ∞.
Chapter 12
Appendix
12.1 Topology
Definition 12.1.1. Let M 6= ∅. The pair (M, d) is called metric space, if
d : M × M → [0, ∞) satisfies
(i) d(x, y) = 0 if and only if x = y (reflexivity),
(ii) d(x, y) = d(y, x) (symmetry),
(iii) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).
We conclude with the Lorentz-spaces Lp . Before we do this we recall the
notion of a Banach space.
Definition 12.1.2. Let X be a linear space and k · k : X → [0, ∞). Then
[X, k · k] is called Banach space, provided that
(i) kxk = 0 if and only if x = 0,
(ii) kλxk = |λ|kxk for all λ ∈ R (λ ∈ C) and x ∈ X,
(iii) kx + yk ≤ kxk + kyk and
(iv) if (xn )∞
n=1 ⊆ X is a Cauchy sequence, i.e. for all ε > 0 there exists
n(ε) ≥ 1 with kxm − xn k ≤ ε whenever m, n ≥ n(ε), then there is an
x ∈ X such that
lim kxn − xk = 0.
n
Remark 12.1.3. Properties (i), (ii), and (iii) say that k · k is a norm.
Example 12.1.4. For X = R^n and ‖x‖ = ‖(x_1, ..., x_n)‖ := (x_1² + · · · + x_n²)^{1/2}
we obtain a norm on R^n.
Remark 12.1.5. One can define Lp (Ω, F, P) for 0 < p < 1 in the same way.
We get
(a) ‖x + y‖_{L_p} ≤ c_p (‖x‖_{L_p} + ‖y‖_{L_p}) (quasi-triangle inequality) and
(b) (Lp , d) is a complete metric space with d(fˆ, ĝ) := kfˆ − ĝkpLp .
12.2 Set theory
Definition 12.2.1. Let M 6= ∅ be an arbitrary set. We say that a relation
x ∼ y is an equivalence class relation on M , provided that
(i) x ∼ x for all x ∈ M (reflexivity),
(ii) if x ∼ y, then y ∼ x (symmetry),
(iii) if x ∼ y and y ∼ z, then x ∼ z (transitivity).
Lemma 12.2.2. Let M ≠ ∅ be equipped with an equivalence class relation x ∼ y. Then
M = ⋃_{i∈I} M_i,
with
(i) M_i ≠ ∅,
(ii) M_i ∩ M_j = ∅ if i ≠ j,
(iii) x, y ∈ M belong to the same set M_i if and only if x ∼ y.
Definition 12.2.3. The sets Mi are called equivalence classes, an element
xi ∈ Mi is called representative. We also use the notation Mi = [xi ] if
xi ∈ Mi .
The Axiom of choice is part of the set theory according to Zermelo-Fraenkel.
It has been formulated by Ernst Zermelo in 1904.
Axiom 12.2.4. [Axiom of choice] Let I be a non-empty set and (Mα )α∈I
be a system of non-empty sets Mα . Then there is a function ϕ on I such that
ϕ : α → mα ∈ Mα .
In other words, one can form a set by choosing of each set Mα a representative
mα .
12.3 Measure and integration theory

12.3.1 π-λ-Theorem
Definition 12.3.1. [λ-system] Given Ω 6= ∅, a system L of subsets of Ω is
called λ-system if
(1) Ω ∈ L,
(2) A, B ∈ L and A ⊆ B imply B\A ∈ L,
(3) A_1, A_2, · · · ∈ L and A_n ⊆ A_{n+1}, n = 1, 2, . . ., imply ⋃_{n=1}^∞ A_n ∈ L.
Definition 12.3.2. [π-system] Given Ω 6= ∅, a system P of subsets of Ω is
called π-system, provided that
A∩B ∈P
for all A, B ∈ P.
Any algebra is a π-system but a π-system is not an algebra in general, take
for example the π-system {(a, b) : −∞ < a < b < ∞} ∪ {∅}.
Proposition 12.3.3. [π-λ-Theorem] If P is a π-system and L is a λ-system, then P ⊆ L implies that σ(P) ⊆ L.
Proof of Proposition 2.2.3. Define the system
L := {A ∈ F : µ(A) = ν(A)} ⊇ P.
If we can proof that L is a λ-system, then Proposition 12.3.3 implies that F =
σ(P) ⊆ L and the statement follows. Property (1) of Definition 12.3.1 follows
by the definition of L, property (2) by the finite additivity, and property (3)
from the monotonicity of the measures µ and ν from below.
12.3.2 Outer measures and the proof of Carathéodory’s Theorem
We start with the notion of an outer measure:
Definition 12.3.4. Let Ω ≠ ∅. A map µ : 2^Ω → [0, ∞] is called outer
measure provided that
(1) µ(∅) = 0,
(2) µ(A) ≤ µ(B) for all A ⊆ B,
(3) µ(⋃_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ µ(A_i) for all A_i ⊆ Ω.
Moreover, we let
F^µ := {B ⊆ Ω : µ(A) = µ(A ∩ B) + µ(A ∩ B^c) for all A ⊆ Ω}
be the system of µ-measurable sets.
The system F µ can be understood as system of good separators B with
respect to µ. The point of the construction is
Proposition 12.3.5. The system F µ forms a σ-algebra and µ is a measure
on F µ .
Proof. Before we start we remark that we also have that
!
n
n
[
X
µ
Ai ≤
µ(Ai )
i=1
i=1
by setting ∅ = An+1 = An+2 = · · · .
(a) Because µ(A ∩ ∅) + µ(A ∩ Ω) = µ(A) we have that ∅, Ω ∈ F µ .
(b) By the symmetry of the definition of F µ with respect to B we also have
that B ∈ F µ if and only if B c ∈ F µ .
(c) Now we assume that B, B 0 ∈ F µ and prove that B ∩ B 0 ∈ F µ : This
follows from
µ(A) =
=
≥
=
µ(A ∩ B) + µ(A ∩ B c )
µ(A ∩ B ∩ B 0 ) + µ(A ∩ B ∩ (B 0 )c ) + µ(A ∩ B c )
µ(A ∩ B ∩ B 0 ) + µ([A ∩ B ∩ (B 0 )c ] ∪ [A ∩ B c ])
µ(A ∩ B ∩ B 0 ) + µ(A ∩ (B ∩ B 0 )c ).
Therefore, F µ is an algebra.
(d) Now let us assume that B1 , B2 , ... ∈ F µ . By using that F µ is an algebra
we may assume that the Bn are pair-wise disjoint while proving that
!!
!c !
∞
∞
[
[
µ(A) ≥ µ A ∩
Bn
+µ A∩
Bn
.
n=1
n=1
By the fact that F µ is an algebra we know that
!!
n
[
µ(A) = µ A ∩
Bi
+µ A∩
n
[
i=1
=
n
X
µ(A ∩ Bi ) + µ A ∩
i=1
n
[
!c !
Bi
i=1
!c !
Bi
i=1
where the second equality follows from a successive application of the definition of F µ . This implies that
!c !
∞
∞
X
[
µ(A) ≥
µ(A ∩ Bi ) + µ A ∩
Bi
i=1
i=1
∞
[
≥ µ A∩
!!
∞
[
+µ A∩
Bi
i=1
!c !
Bi
i=1
≥ µ(A)
so that we are done.
(e) Finally, let us prove that µ is a measure on F µ . Assuming pair-wise
disjoint B1 , B2 , ... ∈ F µ we have that
!
∞
∞
X
[
µ
µ(Bi )
Bi ≤
i=1
i=1
by the definition of the outer measure. To check the converse inequality, we
observe that the definition of F µ inductively implies that
!
!
∞
∞
[
[
µ
= µ(B1 ) + µ
Bi
Bi
i=1
i=2
= µ(B1 ) + µ(B2 ) + µ
∞
[
!
Bi
i=3
= µ(B1 ) + · · · + µ(Bn ) + µ
!
∞
[
Bi
i=n+1
so that
µ
∞
[
!
Bi
≥
∞
X
i=1
µ(Bi ).
i=1
Now we prove Carathéodory’s Theorem:
Proof. We define
µ∗0 (A) := inf
(∞
X
i=1
µ0 (Ai ) : A ⊆
∞
[
i=1
)
Ai , Ai ∈ G
.
(a) We check the properties (1-3) for µ∗0 being an outer measure: (1) Because
µ0 (∅) = 0, we have µ∗0 (∅) = 0. (2) Any covering of B is a covering of A, so
that µ∗0 (A) ≤ µ∗0 (B). (3) Let ε > 0 and find coverings
Ai ⊆
∞
[
Aij
with
Aij
∈G
∞
X
and
j=1
j=1
Then
S∞
i=1
Ai ⊆
∞
[
µ∗0
µ0 (Aij ) ≤
S∞
i,j=1
!
Aij and
∞
X
≤
Ai
i=1
ε
+ µ∗0 (Ai ).
2i
µ0 (Aij )
≤
∞ h
X
ε
2i
i=1
i,j=1
+
µ∗0 (Ai )
i
=ε+
∞
X
µ∗0 (Ai ).
i=1
Letting ε ↓ 0, the subadditivity follows.
∗
∗
(b) One has G ⊆ F µ0 so that σ(G) ⊆ F µ0 : Let B ∈ G and A ⊆ Ω. We have
to show that
µ∗0 (A) = µ∗0 (A ∩ B) + µ∗0 (A ∩ B c ).
Because µ∗0 (A) ≤ µ∗0 (A ∩ B) + µ∗0 (A ∩ B c ) follows from the property that µ∗0 is
an outer measure, it remains to check that µ∗0 (A) ≥ µ∗0 (A ∩ B) + µ∗0 (A ∩ B c ).
For ε > 0 choose a G-cover
A⊆
∞
[
Ai
∞
X
with
i=1
µ0 (Ai ) ≤ µ∗0 (A) + ε.
i=1
For A ∩ B and A ∩ B c we obtain the G-covers
∞
[
A∩B ⊆
c
(Ai ∩ B) and A ∩ B ⊆
i=1
∞
[
(Ai ∩ B c ),
i=1
and that
µ∗0 (A
∩ B) ≤
∞
X
µ0 (Ai ∩ B) and
µ∗0 (A
c
∩B )≤
i=1
∞
X
µ0 (Ai ∩ B c ).
i=1
This gives that
µ∗0 (A
∩ B) +
µ∗0 (A
∞
X
∩B ) ≤
[µ0 (Ai ∩ B) + µ0 (Ai ∩ B c )]
c
=
i=1
∞
X
µ0 (Ai )
i=1
≤ µ∗0 (A) + ε.
Again, letting ε ↓ 0 yields the assertion.
(c) One has µ0 = µ∗0 on G: Let A ∈ G. Then A is a G-cover for A so that
µ∗0 (A)S≤ µ0 (A). Now we check µ0 (A) ≤ µ∗0 (A), i.e. that for any G-cover
A⊆ ∞
i=1 Ai one has
∞
X
µ0 (A) ≤
µ0 (Ai ).
i=1
This follows from
µ0 (A) = µ0
∞
[
!
(A ∩ Ai )
i=1
=
∞
X
µ0 (A ∩ Ai ) ≤
i=1
∞
X
µ0 (Ai ) .
i=1
(d) It remains to show the uniqueness. For the case µ0 (Ω) < ∞ this follows
immediately from Proposition 2.2.3. The general case can be checked as
follows: We consider the trace σ-algebras of σ(G) on Ωn , and get that µ and
ν coincide on these trace σ-algebras. This implies that µ and ν coincide
globally.
12.3.3 Monotone class theorem
Now we state the Monotone Class Theorem. It is a powerful tool by which,
for example, measurability assertions can be proved.
Proposition 12.3.6. [Monotone Class Theorem] Let H be a class of
bounded functions from Ω into R satisfying the following conditions:
(1) H is a vector space over R where the natural point-wise operations ”+”
and ” · ” are used.
(2) 1IΩ ∈ H.
(3) If fn ∈ H, fn ≥ 0, and fn ↑ f , where f is bounded on Ω, then f ∈ H.
Then one has the following: if H contains the indicator function of every
set from some π-system I of subsets of Ω, then H contains every bounded
σ(I)-measurable function on Ω.
Proof.
12.3.4 A set which is not a Lebesgue set
In this section we shall construct a set which is a subset of (0, 1] but not an
element of
B((0, 1]) := {B = A ∩ (0, 1] : A ∈ B(R)} .
Before we start we need
Given x, y ∈ (0, 1] and A ⊆ (0, 1], we also need the addition modulo one
x+y
if x + y ∈ (0, 1]
x ⊕ y :=
x + y − 1 otherwise
and
A ⊕ x := {a ⊕ x : a ∈ A}.
Now define
L := {A ∈ B((0, 1]) :
A ⊕ x ∈ B((0, 1]) and λ(A ⊕ x) = λ(A) for all x ∈ (0, 1]} (12.1)
where λ is the Lebesgue measure on (0, 1].
Lemma 12.3.1. The system L from (12.1) is a λ-system.
Proof. The property (1) is clear since Ω ⊕ x = Ω. To check (2) let A, B ∈ L
and A ⊆ B, so that
λ(A ⊕ x) = λ(A)
and
λ(B ⊕ x) = λ(B).
We have to show that B \ A ∈ L. By the definition of ⊕ it is easy to see that
A ⊆ B implies A ⊕ x ⊆ B ⊕ x and
(B ⊕ x) \ (A ⊕ x) = (B \ A) ⊕ x,
and therefore, (B \ A) ⊕ x ∈ B((0, 1]). Since λ is a probability measure it
follows
λ(B \ A) =
=
=
=
λ(B) − λ(A)
λ(B ⊕ x) − λ(A ⊕ x)
λ((B ⊕ x) \ (A ⊕ x))
λ((B \ A) ⊕ x)
and B\A ∈ L. Property (3) is left as an exercise.
Finally, we need the axiom of choice.
Proposition 12.3.2. There exists a subset H ⊆ (0, 1] which does not belong
to B((0, 1]).
Proof. We take the system L from (12.1). If (a, b] ⊆ [0, 1], then (a, b] ∈ L.
Since
P := {(a, b] : 0 ≤ a < b ≤ 1}
is a π-system which generates B((0, 1]) it follows by the π-λ-Theorem (Proposition 12.3.3) that
B((0, 1]) ⊆ L.
Let us define the equivalence relation
x∼y
if and only if
x⊕r =y
for some rational
r ∈ (0, 1].
Let $H \subseteq (0,1]$ consist of exactly one representative point from each
equivalence class (such a set exists under the assumption of the axiom of
choice). Then $H \oplus r_1$ and $H \oplus r_2$ are disjoint for $r_1 \neq r_2$: if they were
not disjoint, then there would exist $h_1 \oplus r_1 \in (H \oplus r_1)$ and $h_2 \oplus r_2 \in (H \oplus r_2)$
with $h_1 \oplus r_1 = h_2 \oplus r_2$. But this implies $h_1 \sim h_2$ and hence $h_1 = h_2$ and
$r_1 = r_2$. So it follows that $(0,1]$ is the countable union of the disjoint sets
\[ (0,1] = \bigcup_{r \in (0,1] \text{ rational}} (H \oplus r). \]
If we assume that $H \in \mathcal{B}((0,1])$, then $\mathcal{B}((0,1]) \subseteq \mathcal{L}$ implies $H \oplus r \in \mathcal{B}((0,1])$
and
\[ \lambda((0,1]) = \lambda\Bigl( \bigcup_{r \in (0,1] \text{ rational}} (H \oplus r) \Bigr) = \sum_{r \in (0,1] \text{ rational}} \lambda(H \oplus r). \]
By $\mathcal{B}((0,1]) \subseteq \mathcal{L}$ we have $\lambda(H \oplus r) = \lambda(H) = a \geq 0$ for all rational numbers
$r \in (0,1]$. Consequently,
\[ 1 = \lambda((0,1]) = \sum_{r \in (0,1] \text{ rational}} \lambda(H \oplus r) = a + a + \cdots. \]
So the right-hand side is either $0$ (if $a = 0$) or $\infty$ (if $a > 0$). This leads
to a contradiction, so $H \notin \mathcal{B}((0,1])$.
12.4 Add-ons for convergence in probability
There are alternatives to the metric of Ky Fan. We describe one of them in
this section.
Definition 12.4.1. For $f, g \in L_0(\Omega, \mathcal{F}, P)$ we let
\[ d_I(f, g) := \int_{\Omega} \frac{|f - g|}{1 + |f - g|} \, dP. \]
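As a quick illustration of the definition: if $f = 1\mathrm{I}_A$ for some $A \in \mathcal{F}$ and $g = 0$, then $|f - g| = 1\mathrm{I}_A$ and
\[ d_I(1\mathrm{I}_A, 0) = \int_{\Omega} \frac{1\mathrm{I}_A}{1 + 1\mathrm{I}_A} \, dP = \frac{1}{2} P(A), \]
so $d_I$ is small exactly when the event on which the two functions differ has small probability.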
Proposition 12.4.2. For f, g, h ∈ L0 (Ω, F, P) one has that
(i) dI (f, g) = 0 if and only if P(f = g) = 1,
(ii) dI (f, g) = dI (g, f ),
(iii) dI (f, h) ≤ dI (f, g) + dI (g, h).
Proof. (i) We have that
\[ d_I(f,g) = 0 \iff \int_{\Omega} \frac{|f-g|}{1+|f-g|} \, dP = 0 \iff P\left( \frac{|f-g|}{1+|f-g|} = 0 \right) = 1, \]
so that $d_I(f,g) = 0$ if and only if $P(f = g) = 1$. Assertion (ii) follows by
definition, and (iii) we can verify, using the triangle inequality and the fact that
$t \mapsto t/(1+t)$ is non-decreasing on $[0, \infty)$, by
\begin{align*}
d_I(f,h) &= \int_{\Omega} \frac{|f-h|}{1+|f-h|} \, dP \\
&\leq \int_{\Omega} \frac{|f-g| + |g-h|}{1 + |f-g| + |g-h|} \, dP \\
&\leq \int_{\Omega} \frac{|f-g|}{1+|f-g|} \, dP + \int_{\Omega} \frac{|g-h|}{1+|g-h|} \, dP \\
&= d_I(f,g) + d_I(g,h).
\end{align*}
Proposition 12.4.3. For $f_n, f \in L_0(\Omega, \mathcal{F}, P)$ the following holds true:
(i) We have $d_I(f_n, f) \to 0$ as $n \to \infty$ if and only if $f_n \stackrel{P}{\to} f$.
(ii) The sequence $(f_n)_{n=1}^{\infty}$ is a Cauchy sequence with respect to $d_I$, i.e. for all
$\varepsilon > 0$ there is an $n \geq 1$ such that for $k, l \geq n$ one has $d_I(f_k, f_l) < \varepsilon$,
if and only if $(f_n)_{n=1}^{\infty}$ is a Cauchy sequence in probability.
Proof. (i) The condition $d_I(f_n, f) \to 0$ implies
\[ \int_{\Omega} \frac{|f_n - f|}{1 + |f_n - f|} \, dP \longrightarrow 0, \]
so that by Ĉebyšev's inequality, for $\lambda > 0$,
\[ \lambda \, P\left( \frac{|f_n - f|}{1 + |f_n - f|} > \lambda \right) \leq \int_{\Omega} \frac{|f_n - f|}{1 + |f_n - f|} \, dP \longrightarrow 0. \]
Given $\varepsilon > 0$ we find $\lambda(\varepsilon) > 0$ such that if $|x| > \varepsilon$, then $\frac{|x|}{1+|x|} > \lambda(\varepsilon)$. Hence
\[ P(|f_n - f| > \varepsilon) \leq P\left( \frac{|f_n - f|}{1 + |f_n - f|} > \lambda(\varepsilon) \right) \leq \frac{1}{\lambda(\varepsilon)} \int_{\Omega} \frac{|f_n - f|}{1 + |f_n - f|} \, dP \]
and $P(|f_n - f| > \varepsilon) \to 0$ as $n \to \infty$.
Conversely, for all $\varepsilon > 0$ we have that
\begin{align*}
\int_{\Omega} \frac{|f_n - f|}{1 + |f_n - f|} \, dP
&= \int_{\{|f_n - f| > \varepsilon\}} \frac{|f_n - f|}{1 + |f_n - f|} \, dP + \int_{\{|f_n - f| \leq \varepsilon\}} \frac{|f_n - f|}{1 + |f_n - f|} \, dP \\
&\leq P(|f_n - f| > \varepsilon) + \frac{\varepsilon}{1 + \varepsilon},
\end{align*}
since the function $\frac{x}{1+x} = 1 - \frac{1}{1+x}$ is monotone for $x \geq 0$. Given $\theta > 0$, we take
$\varepsilon > 0$ such that $\frac{\varepsilon}{1+\varepsilon} \leq \frac{\theta}{2}$ and then $n_0 \geq 1$ such that $P(|f_n - f| > \varepsilon) \leq \frac{\theta}{2}$ for
$n \geq n_0$. Hence $d_I(f_n, f) \leq \theta$ for $n \geq n_0$.
(ii) Assume that $(f_n)_{n=1}^{\infty}$ is a Cauchy sequence with respect to $d_I$. For $\lambda > 0$
we have that
\[ \lambda \, P\left( \frac{|f_k - f_l|}{1 + |f_k - f_l|} > \lambda \right) \leq d_I(f_k, f_l) \leq \eta \]
for $k, l \geq n(\eta) \geq 1$. For $\lambda := \frac{\varepsilon}{1+\varepsilon}$ with $\varepsilon > 0$ this gives that
\[ \frac{\varepsilon}{1+\varepsilon} \, P(|f_k - f_l| > \varepsilon) \leq \eta \]
for $k, l \geq n(\eta) \geq 1$. Choosing $\eta = \varepsilon^2$ we end up with
\[ P(|f_k - f_l| > \varepsilon) \leq \varepsilon (1 + \varepsilon) \]
for $k, l \geq n(\varepsilon^2) \geq 1$. Hence $(f_n)_{n=1}^{\infty}$ is a Cauchy sequence in probability.
Now assume that $(f_n)_{n=1}^{\infty}$ is a Cauchy sequence in probability. Then for all
$\varepsilon > 0$ there exists $n(\varepsilon) \geq 1$ such that for all $k, l \geq n(\varepsilon)$ one has $P(|f_k - f_l| > \varepsilon) \leq \varepsilon$.
Consequently,
\[ d_I(f_k, f_l) = \int_{\Omega} \frac{|f_k - f_l|}{1 + |f_k - f_l|} \, dP \leq P(|f_k - f_l| > \varepsilon) + \frac{\varepsilon}{1+\varepsilon} \leq \varepsilon + \frac{\varepsilon}{1+\varepsilon} \leq 2\varepsilon \]
for $k, l \geq n(\varepsilon) \geq 1$, so that $(f_n)_{n=1}^{\infty}$ is a Cauchy sequence with respect to $d_I$ as well.
Proposition 12.4.4. For the Lévy-Prokhorov metric $d_{LP}$ and the Ky Fan metric $d_{KF}$ we have that
\[ d_{LP}(\mathrm{law}(f), \mathrm{law}(g)) \leq d_{KF}(f, g). \]
Proof.
12.5 Add-ons for inequalities
Remark 12.5.1. There are other inequalities, called Lévy¹-Octaviani inequalities: assuming
independent random variables $\xi_1, \xi_2, \ldots : \Omega \to \mathbb{R}$ with partial sums $f_k := \xi_1 + \cdots + \xi_k$
and $\varepsilon > 0$, one has
(i) $P\left( \max_{1 \leq k \leq n} |f_k| > \varepsilon \right) \leq 3 \max_{1 \leq k \leq n} P\left( |f_k| > \frac{\varepsilon}{3} \right)$,
(ii) $P\left( \max_{1 \leq k \leq n} |f_k| > \varepsilon \right) \leq 2 P(|f_n| > \varepsilon)$ if the sequence $(\xi_n)_{n=1}^{\infty}$ is additionally
symmetric, which means that for all signs $\theta_1, \ldots, \theta_n \in \{-1, 1\}$ the
distributions of the vectors $(\xi_1, \ldots, \xi_n)$ and $(\theta_1 \xi_1, \ldots, \theta_n \xi_n)$ are the same.

¹ Paul Pierre Lévy, 15/09/1886 (Paris, France) - 15/12/1971 (Paris, France), greatly influenced
probability theory and also worked in functional analysis and partial differential equations.
Index
λ-system, 177
lim inf_n A_n, 16
lim inf_n a_n, 15
lim sup_n A_n, 16
lim sup_n a_n, 15
µ-measurable sets, 178
π-systems and uniqueness of measures, 23
π-system, 23, 150, 177
π-λ-Theorem, 177
σ-algebra, 10
σ-finite, 12

absolute continuity, 87
algebra, 10
axiom of choice, 176
Bayes' formula, 20
binomial distribution, 21
Borel σ-algebra, 25
Carathéodory's extension theorem, 23
Cauchy sequence in probability, 106
central limit theorem, 140
change of variables, 70
characteristic function, 142
Chebyshev's inequality, 79
closed set, 25
complex numbers, 140
conditional probability, 19
continuous mapping theorem, 109
convergence
    almost surely, 101
    in mean, 110
    in probability, 106
convexity, 79
convolution
    of a measure, 146
    of functions, 148
counting measure, 13
Dirac measure, 13
distribution
    Cauchy, 162
    normal on R, 156
    normal on R^d, 158
distribution-function, 42
dominated convergence, 67
equivalence class, 176
equivalence class relation, 176
event, 10
existence of sets, which are not Borel, 182
expectation of a random variable, 60
expected value, 60
exponential distribution on R, 29
extended random variable, 66
finite permutation, 54
Fourier transform, 142
Fourier transform of a function, 145
Fubini's Theorem, 72, 73
Gaussian distribution on R, 28
geometric distribution, 22
Hahn-decomposition, 88
Hölder's inequality, 80
i.i.d. sequence, 139
independence of a family of random variables, 45
independence of a finite family of random variables, 45
independence of a sequence of events, 17
inequality
    Kolmogorov, 120
    Lévy-Octaviani, 185
Jensen's inequality, 79
law of a random variable, 42
Lebesgue measure, 27, 28
Lebesgue's Theorem, 67
lemma of Borel-Cantelli, 17
lemma of Fatou, 16
lemma of Fatou for random variables, 66
Lindeberg condition, 172
map
    ergodic, 130
    measure preserving, 130
measurable map, 39
measurable space, 10
measurable step-function, 37
measure, 12
    Jordan-decomposition, 89
    image measure, 40
    Lebesgue measure, 33
    outer measure, 178
    product measure, 31
    signed measure, 87
measure space, 12
Minkowski's inequality, 81
moments
    computation using Fourier transform, 166
moments, absolute moments, 71
monotone class theorem, 181
monotone convergence, 62
open set, 25
Poisson distribution, 22
Poisson's Theorem, 34
positive semi-definite
    function, 143
    matrix, 158
probability measure, 12
probability space, 12
product measure, 31
product of measure spaces, 31
Radon-Nikodym derivative, 90
random variable, 38
    Bernoulli, 50
    identically distributed, 55
random variables
    uncorrelated, 164
representative, 176
sequence, which is fundamental in probability, 106
space
    Lp(Ω, F, P), 111
    Lp(Ω, F, P), 111
    L0(Ω, F, P), 109
    L0(Ω, F, P), 109
    metric space, 175
step-function, 37
Stirling's formula, 57
strong law of large numbers, 104, 126
symmetric set, 54
theorem
    central limit theorem, 52, 169
    ergodic, 130
    law of iterated logarithm, 52, 132
    of Bochner and Chinčin, 155
    of Poisson, 170
    of Polya, 156
    of Riemann and Lebesgue, 152
    of Stone and Weierstrass, 153
    three-series-theorem, 123
    two-series-theorem, 119
uniform distribution, 27, 34
uniform integrability, 115
uniqueness theorem for Fourier transforms, 152
variance, 84
weak law of large numbers, 109
Zero-One law
    Hewitt and Savage, 55
    Kolmogorov, 51
Bibliography
[1] H. Bauer. Probability Theory. Walter de Gruyter, 1996.
[2] H. Bauer. Measure and Integration Theory. Walter de Gruyter, 2001.
[3] P. Billingsley. Probability and Measure. Wiley, 1995.
[4] L. Breiman. Probability. Addison-Wesley, 1968.
[5] R. M. Dudley. Real Analysis and Probability. Cambridge, 2002.
[6] D. Lamberton and B. Lapeyre. Introduction to Stochastic Calculus Applied to Finance.
[7] J. Jacod. Probability Essentials.
[8] W. Feller. An Introduction to Probability Theory and Its Applications, Volumes I and II. Wiley, 1968 and 1966.
[9] C. Geiss and S. Geiss. An Introduction to Probability Theory. Lecture Notes 60, University of Jyväskylä, Department of Mathematics and Statistics, 2009.
[10] S. Geiss. An Introduction to Probability Theory II. Lecture Notes 61, University of Jyväskylä, Department of Mathematics and Statistics, 2009.
[11] A. Klenke. Probability Theory. Springer.
[12] A. Shirjaev. Probability. Springer, 1996.
[13] D. Williams. Probability with Martingales. Cambridge, 1991.