Applied Probability - Institute for Advanced Studies (IHS)

Applied Probability
© Leopold Sögner
Department of Economics and Finance
Institute for Advanced Studies
Stumpergasse 56
1060 Wien
Tel: +43-1-59991 182
[email protected]
http://www.ihs.ac.at/~soegner
January, 2014
Course Outline (1)
Learning Objectives:
• Parts of the statistics course were dedicated to probability theory.
Some measure theory, the law of large numbers and central limit
theorems have already been covered in that course.
• The applied probability course covers further concepts of probability
theory and its applications.
• The main concepts are discussed in detail during the lectures. In
addition, students have to work through the textbooks and solve
problems to improve their understanding and to acquire the skills to
apply these tools to related problems.
Course Outline (2)
Literature:
• Durrett, Rick (2010), Probability: Theory and Examples, Cambridge
Series in Statistical and Probabilistic Mathematics, 4th edition,
Cambridge 2010.
Supplementary Literature:
• Billingsley, Patrick (2012), Probability and Measure: Anniversary Edition (Wiley
Series in Probability and Statistics)
• Klenke, Achim (2008), Probability Theory: A Comprehensive Course, Springer,
Berlin Heidelberg.
Course Outline (3)
• Measure Theory & the Integral, Durrett Chapter 1, Klenke, Chapter
1,4
– Repetition of some concepts taught in the statistics course
– Probability Spaces
– Random variables
– Integration and Fubini’s Theorem
– Expected Value
– Modes of Convergence
Expected time: 4 units
Course Outline (4)
• Martingales, Durrett Chapter 5
– The concepts of conditional probability and conditional expectation
– Radon Nikodym theorem
– Doob’s Martingale convergence theorem
Expected time: 4 units
Course Outline (5)
• Markov Chains, Durrett Chapter 6
– Markov property and Markov chains
– Recurrence and transience
– Stationary measures
– Asymptotic behavior
Expected time: 6 units
Course Outline (6)
• Ergodic Theorems, Durrett Chapter 7
– Definitions
– Birkhoff’s ergodic theorem
– Stationary measures
Expected time: 2 units
Course Outline (7)
• Brownian Motion, Durrett Chapter 8
– Definitions and construction
– Markov property and Brownian motion
– Stopping and hitting times
– Martingales and Brownian motion
– Donsker’s Theorem
Expected time: 4 units
Course Outline (8)
• Winter Term 2014
– Time schedule - Google calendar
– Practice session will be organized by Alexander Satdarov.
Course Outline (9)
Some more comments on homework and grading:
• Mid term (40%),
• Final test (40%),
• Homework and class-room participation (20%).
• Mid term, final test & retake: tba
Outline - Measure Theory
• Classes of sets, σ-algebra, Borel σ-algebra.
• Set functions, measure, probability measure.
• Measure extension theorem, the Lebesgue measure, product measure.
• Klenke Chapter 1
Classes of Sets (1)
• Ω ≠ ∅ is a nonempty set.
• A ⊂ 2^Ω, where 2^Ω stands for the set of all subsets of Ω.
• Ω is called the set of elementary events. Its elements are denoted by ω.
• A stands for the system of observable events. Elements of A are the
sets A1, A2, . . . . Each event Ai is a subset of Ω containing elementary events ω.
• A may satisfy certain properties. The properties we consider are:
Classes of Sets (2)
• Definition: A class of sets A is called (see Klenke, Definition 1.1)
– Closed under intersections (or π-system, ∩-closed) if
A ∩ B ∈ A whenever A, B ∈ A.
– Closed under countable intersections (or σ-∩-closed) if
⋂_{n=1}^{∞} A_n ∈ A for any choice of countably many sets A1, A2, · · · ∈ A.
– Closed under unions (or ∪-closed) if A ∪ B ∈ A whenever
A, B ∈ A.
– Closed under countable unions (or σ-∪-closed) if
⋃_{n=1}^{∞} A_n ∈ A for any choice of countably many sets A1, A2, · · · ∈ A.
– Closed under differences (or \-closed) if A \ B ∈ A whenever
A, B ∈ A.
– Closed under complements if Ac := Ω \ A ∈ A for any set
A ∈ A.
Classes of Sets (3)
• Definition: σ-algebra/σ-field (see Klenke, Definition 1.2)
A class of sets A ⊂ 2^Ω is called a σ-algebra if it fulfills the following
three conditions:
– Ω∈A
– A is closed under complements.
– A is closed under countable unions.
• Remark & outlook: Our goal is to define probabilities on σ-algebras.
The events considered in probability theory are elements of A, i.e. for any
A ∈ A the probability P(A) is defined.
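For a finite Ω the three conditions can be checked mechanically: closure under countable unions then reduces to closure under pairwise unions. A small Python sketch (illustrative only, not part of the course materials; the function name and set encoding are ad hoc):

```python
def is_sigma_algebra(omega, family):
    """Check the sigma-algebra conditions on a finite ground set omega.

    On a finite Omega closure under countable unions reduces to closure
    under pairwise unions; sets are encoded as frozensets.
    """
    fam = {frozenset(a) for a in family}
    if frozenset(omega) not in fam:
        return False                          # Omega must belong to A
    if any(frozenset(omega) - a not in fam for a in fam):
        return False                          # closed under complements
    if any(a | b not in fam for a in fam for b in fam):
        return False                          # closed under (pairwise) unions
    return True

omega = {1, 2, 3, 4}
trivial = [set(), omega]                      # the smallest sigma-algebra
coarse = [set(), {1, 2}, {3, 4}, omega]       # generated by a partition
broken = [set(), {1}, {2, 3, 4}, {2}, {1, 3, 4}, omega]
```

Here `trivial` and `coarse` pass all three conditions, while `broken` fails because the union {1} ∪ {2} = {1, 2} is missing from the class.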
Classes of Sets (4)
Some properties/interdependences:
• Theorem (see Klenke, Theorem 1.3): If A is closed under
complements, then (i) A is ∩-closed is equivalent (⇔) to A is
∪-closed and (ii) A is σ-∩-closed ⇔ A is σ-∪-closed.
• Theorem (see Klenke, Theorem 1.4): Suppose that A is \-closed.
Then: (i) A is ∩-closed; (ii) if in addition A is σ-∪-closed, then A
is σ-∩-closed; (iii) any countable (resp. finite) union of sets in A can be
expressed as a countable (resp. finite) disjoint union of sets in A.
• ⊎ stands for disjoint union in the textbook of Klenke.
Classes of Sets (5)
• Definition: Algebra (see Klenke, Definition 1.6)
A class of sets A ⊂ 2^Ω is called an algebra if it fulfills the following
three conditions:
– Ω∈A
– A is \-closed.
– A is ∪-closed.
• Remark: Note that σ − ∪-closed was a property of a σ-algebra.
Classes of Sets (6)
• Definition: Ring (see Klenke, Definition 1.7)
A class of sets A ⊂ 2^Ω is called a ring if it fulfills the following three
conditions:
– ∅ ∈ A
– A is \-closed.
– A is ∪-closed.
A ring is called a σ-ring if A is σ-∪-closed.
• Remark: Note that σ − ∪-closed was a property of a σ-algebra. A
ring A containing Ω is an algebra.
Classes of Sets (7)
• Definition: Semiring (see Klenke, Definition 1.8)
A class of sets A ⊂ 2^Ω is called a semiring if it fulfills the following
three conditions:
– ∅∈A
– For any two sets A, B ∈ A the difference set B \ A is a finite
union of mutually disjoint sets in A.
– A is ∩-closed.
Classes of Sets (8)
• Definition: λ-system (see Klenke, Definition 1.10)
A class of sets A ⊂ 2^Ω is called a (Dynkin) λ-system if it fulfills the
following three conditions:
– Ω∈A
– For any two sets A, B ∈ A the difference set B \ A is in A.
– ⊎_{n=1}^{∞} A_n ∈ A for any choice of countably many pairwise disjoint
sets A1, A2, · · · ∈ A.
• Remark: π-system: A class of sets A ⊂ 2^Ω is called a π-system if it
is closed under the formation of finite intersections, i.e. A, B ∈ A implies
A ∩ B ∈ A. See Definition 1.1.
Classes of Sets (9)
Some properties/interdependences:
• Theorem, (see Klenke, Theorem 1.7): A class of sets A ⊂ 2^Ω is an
algebra if and only if the following three properties hold:
– Ω∈A
– A is closed under complements.
– A is closed under intersections.
• To see the differences in the definitions, see e.g. Klenke, Example
1.11.
Classes of Sets (10)
Some properties/interdependences:
• Theorem: For a class of sets A ⊂ 2^Ω containing ∅ the following
statements are equivalent:
– (i) If A, B ∈ A and A ∩ B = ∅, then A ∪ B ∈ A; (ii) if A, B ∈ A
and A ⊂ B, then B \ A ∈ A; and (iii) if A, B ∈ A, then A ∩ B ∈ A.
– (i) A is \-closed and (ii) A is ∪-closed (i.e. A is a ring).
– (i) If A, B ∈ A then the symmetric difference A∆B ∈ A and (ii)
A, B ∈ A implies A ∩ B ∈ A.
• Remark: (A, ∆, ∩) is a ring in the algebraic sense (with ∆ as addition and ∩ as multiplication).
Classes of Sets (13)
• Theorem, Relations between classes of sets (see Klenke, Theorem
1.12):
– Every σ-algebra is a λ-system, an algebra and a σ-ring.
– Every σ-ring is a ring.
– Every ring is a semiring.
– Every algebra is a ring. An algebra on a finite Ω is a σ-algebra.
• See e.g. Klenke, Figure 1.1.
Classes of Sets (14)
• Theorem, Intersection of classes of sets (see Klenke, Theorem
1.15): Let I be an arbitrary index set, and assume that Ai is a
σ-algebra for every i ∈ I. Then the intersection

A_I := {A ⊂ Ω : A ∈ A_i for every i ∈ I} = ⋂_{i∈I} A_i

is a σ-algebra. (The analogous statement holds for rings, σ-rings,
algebras and λ-systems, but not for semirings.)
– By this theorem the intersection of σ-fields is a σ-field.
– It can also be used to construct the smallest σ-field.
Classes of Sets (15)
• Theorem, Generated σ-algebra (see Klenke, Theorem 1.16): Let
E ⊂ 2^Ω. Then there exists a smallest σ-algebra σ(E) with E ⊂ σ(E):

σ(E) := ⋂_{A ⊂ 2^Ω a σ-algebra, E ⊂ A} A.

σ(E) is called the σ-algebra generated by E. E is called a generator of
σ(E). δ(E) is the λ-system generated by E.
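For a finite Ω the generated σ-algebra can be computed directly by closing E under complements and pairwise unions until a fixed point is reached (on a finite Ω an algebra is automatically a σ-algebra, as noted in Theorem 1.12). A hypothetical Python sketch, not from the course materials:

```python
def generate_sigma_algebra(omega, generator):
    """sigma(E) on a finite Omega: iterate closure under complements and
    pairwise unions until a fixed point is reached."""
    omega = frozenset(omega)
    fam = {frozenset(e) for e in generator} | {frozenset(), omega}
    while True:
        new = set(fam)
        new |= {omega - a for a in fam}            # complements
        new |= {a | b for a in fam for b in fam}   # pairwise unions
        if new == fam:                             # fixed point: done
            return fam
        fam = new

# sigma({{1}}) on {1, 2, 3} consists of the sets {}, {1}, {2, 3}, {1, 2, 3}
sig = generate_sigma_algebra({1, 2, 3}, [{1}])
```

The monotonicity property E1 ⊂ E2 ⇒ σ(E1) ⊂ σ(E2) can be observed by enlarging the generator.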
Classes of Sets (16)
• We observe that:
– E ⊂ σ(E).
– If E1 ⊂ E2 then σ(E1) ⊂ σ(E2).
– A is a σ-algebra if and only if σ(A) = A.
– δ(E) ⊂ σ(E).
– Given that D is a λ-system: D is a π-system is equivalent to D being
a σ-algebra (Theorem 1.18 in Klenke).
– If E is a π-system, then δ(E) = σ(E) (Theorem 1.19 in Klenke).
Classes of Sets (17)
• For the rest of the course we shall mainly consider real-valued
random variables, i.e. random variables with values in R^n.
• We restrict ourselves to σ-algebras generated by topologies. For the real
numbers we shall consider half-open intervals with rational
endpoints.
Classes of Sets (18)
• Definition: Topology (see Klenke, Definition 1.20) Let Ω ≠ ∅ be an
arbitrary set. A class of sets τ ⊂ 2^Ω is called a topology on Ω if it
has the following three properties:
– ∅, Ω ∈ τ
– A ∩ B ∈ τ for any two sets A, B ∈ τ.
– ⋃_{A∈F} A ∈ τ for any F ⊂ τ.
The pair (Ω, τ ) is called a topological space. The sets A ∈ τ are
called open sets, and the sets A ⊂ Ω with Ac ∈ τ are called closed.
Classes of Sets (19)
• Topologies are closed under finite intersections, while a σ-algebra is
closed under countable intersections.
• Topologies are closed under arbitrary unions, while a σ-algebra is
closed under countable unions.
Classes of Sets (20)
• Definition: Metric d (distance function): For any elements
x, y, z ∈ Ω, (i) d(x, y) ≥ 0 where d(x, y) = 0 if and only if x = y,
(ii) d(x, y) = d(y, x) (symmetry) and (iii) d(x, z) ≤ d(x, y) + d(y, z)
(subadditivity / triangle inequality).
• Assume that d exists on Ω. Then we define the open ball with
radius r centered at x ∈ Ω by
Br (x) = {y ∈ Ω : d(x, y) < r}.
• The class of open sets is the topology:

τ = { ⋃_{(x,r)∈F} B_r(x) : F ⊂ Ω × (0, ∞) }.
Classes of Sets (21)
• Definition: Borel σ-algebra (see Klenke, Definition 1.21) Let (Ω, τ)
be a topological space. The σ-algebra

B(Ω) := B(Ω, τ) := σ(τ)

that is generated by the open sets A ∈ τ is called the Borel σ-algebra
on Ω. The elements A ∈ B(Ω) are called Borel sets or Borel
measurable sets.
Classes of Sets (22)
• Some remarks:
– We are interested in B(R^n). R^n is equipped with the Euclidean
distance d(x, y) = ‖x − y‖₂ = (Σ_{i=1}^{n} (x_i − y_i)²)^{1/2}.
– There are subsets of R^n that are not Borel sets, e.g. Vitali sets;
see the literature (e.g. Durrett, Appendix, or Billingsley, end of Chapter 2).
– If C ⊂ R^n is a closed set, then C^c ∈ τ is also in B(R^n), so that
C is also a Borel set. Therefore every singleton is contained in
B(R^n), i.e. {x} ∈ B(R^n) for any x ∈ R^n.
– B(R^n) is not a topology. To see this, consider a set V ⊂ R^n with
V ∉ B(R^n) (by the above argument we know that such subsets
exist). V = ⋃_{x∈V} {x}. If B(R^n) were a topology, it would be
closed under arbitrary unions; since {x} ∈ B(R^n) we would get
V = ⋃_{x∈V} {x} ∈ B(R^n), a contradiction.
Classes of Sets (23)
• Since the class of open sets that generates the Borel σ-field is quite
big, we raise the question whether this could also be done with a
smaller class of sets.
• Define: E1 = {A ⊂ R^n : A open}, E2 = {A ⊂ R^n : A closed},
E3 = {A ⊂ R^n : A compact}, E4 = {B_r(x) : x ∈ Q^n, r ∈ Q^+},
E5 = {(a, b) : a, b ∈ Q^n, a < b}, E6 = {[a, b) : a, b ∈ Q^n, a < b},
E7 = {(a, b] : a, b ∈ Q^n, a < b}, E8 = {[a, b] : a, b ∈ Q^n, a < b},
E9 = {(−∞, b) : b ∈ Q^n}, E10 = {(−∞, b] : b ∈ Q^n},
E11 = {(a, ∞) : a ∈ Q^n}, E12 = {[a, ∞) : a ∈ Q^n}.
Classes of Sets (24)
• Theorem, (see Klenke, Theorem 1.23): The Borel σ-field B(R^n) is
generated by any of the classes E1 − E12. That is to say,
σ(Ei) = B(R^n) for i = 1, . . . , 12.
• Remark, (see Klenke, Remark 1.24): The classes of sets
E1 − E3 and E5 − E12 are π-systems. Hence the Borel σ-algebra equals the
generated λ-system, i.e. B(R^n) = δ(Ei) holds for
i = 1, 2, 3, 5, . . . , 12 (see also Dynkin’s π-λ theorem, Theorem
1.19). E4, . . . , E12 are countable.
Set Functions (1)
• Definition: (see Klenke, Definition 1.27) Let A ⊂ 2^Ω and let
µ : A → [0, ∞] be a set function. We say that µ is:
– monotone if µ(A) ≤ µ(B) for any two sets A, B ∈ A with
A ⊂ B.
– additive if µ(⊎_{i=1}^{n} A_i) = Σ_{i=1}^{n} µ(A_i) for any choice of finitely many
mutually disjoint sets A1, . . . , An ∈ A with ⊎_{i=1}^{n} A_i ∈ A.
– σ-additive if µ(⊎_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} µ(A_i) for any choice of countably
many mutually disjoint sets A1, A2, · · · ∈ A with ⊎_{i=1}^{∞} A_i ∈ A.
– subadditive if for any choice of finitely many sets
A, A1, . . . , An ∈ A with A ⊂ ⋃_{i=1}^{n} A_i we have µ(A) ≤ Σ_{i=1}^{n} µ(A_i).
– σ-subadditive if for any choice of countably many sets
A, A1, A2, · · · ∈ A with A ⊂ ⋃_{i=1}^{∞} A_i we have µ(A) ≤ Σ_{i=1}^{∞} µ(A_i).
Set Functions (2)
• Definition: (see Klenke, Definition 1.28) Let A be a semiring and
let µ : A → [0, ∞] be a set function with µ(∅) = 0. µ is called a:
– content if µ is additive,
– premeasure if µ is σ-additive,
– measure if µ is a premeasure and A is a σ-algebra,
– probability measure if µ is a measure and µ(Ω) = 1.
Set Functions (3)
• Definition: (see Klenke, Definition 1.29) Let A be a semiring. A
content µ on A is called:
– finite if µ(A) < ∞ for every A ∈ A and
– σ-finite if there exists a sequence of sets Ω1, Ω2, · · · ∈ A such
that Ω = ⋃_{n=1}^{∞} Ω_n and such that µ(Ω_n) < ∞ for all n ∈ N.
• Discuss the examples in Klenke, pages 12 and 13.
Set Functions (4)
• Theorem: Properties of a content (see Klenke, Lemma 1.31) Let
A be a semiring and let µ be a content on A. The following
statements hold:
– If A is a ring, then µ(A ∪ B) = µ(A) + µ(B) − µ(A ∩ B) for any
two sets A, B ∈ A.
– If A is a ring, then µ(B) = µ(A) + µ(B \ A) for any two sets
A, B ∈ A with A ⊂ B. In particular, µ is monotone.
– If µ is σ-additive, then µ is also σ-subadditive.
– If A is a ring, then Σ_{n=1}^{∞} µ(A_n) ≤ µ(⋃_{n=1}^{∞} A_n) for any choice of
countably many mutually disjoint sets A1, A2, · · · ∈ A with
⋃_{n=1}^{∞} A_n ∈ A.
Set Functions (5)
• Theorem: Inclusion-exclusion formula (see Klenke, Theorem 1.33)
Let A be a ring and let µ be a content on A. Let n ∈ N and
A1, . . . , An ∈ A. Then the following inclusion and exclusion formulas
hold:
– µ(A_1 ∪ · · · ∪ A_n) = Σ_{k=1}^{n} (−1)^{k−1} Σ_{{i_1,...,i_k} ⊂ {1,...,n}} µ(A_{i_1} ∩ · · · ∩ A_{i_k}).
– µ(A_1 ∩ · · · ∩ A_n) = Σ_{k=1}^{n} (−1)^{k−1} Σ_{{i_1,...,i_k} ⊂ {1,...,n}} µ(A_{i_1} ∪ · · · ∪ A_{i_k}).
• The inner summation is over all subsets of {1, . . . , n} with k elements.
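The first formula can be verified numerically for the counting measure µ(A) = |A| on finite sets. A sketch (the helper names are ad hoc):

```python
from itertools import combinations

def mu(a):
    """Counting measure on finite sets: mu(A) = |A|."""
    return len(a)

def union_by_inclusion_exclusion(sets):
    """Evaluate mu(A_1 ∪ ... ∪ A_n) via the inclusion-exclusion formula."""
    n = len(sets)
    total = 0
    for k in range(1, n + 1):
        sign = (-1) ** (k - 1)
        for idx in combinations(range(n), k):   # subsets with k elements
            inter = set(sets[idx[0]])
            for i in idx[1:]:
                inter &= sets[i]
            total += sign * mu(inter)
    return total

sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
```

For these three sets the alternating sum 9 − 5 + 1 = 5 matches µ of the actual union {1, 2, 3, 4, 5}.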
Set Functions (6)
• Definition: (see Klenke, Definition 1.34) Let A1, A2, . . . be sets. We
write
– A_n ↑ A and say that (A_n)_{n∈N} increases to A if A1 ⊂ A2 ⊂ . . .
and ⋃_{n=1}^{∞} A_n = A, and
– A_n ↓ A and say that (A_n)_{n∈N} decreases to A if A1 ⊃ A2 ⊃ . . .
and ⋂_{n=1}^{∞} A_n = A.
Set Functions (7)
• Definition: Limits (see Klenke, Definition 1.13) Let A1, A2, . . . be
subsets of Ω. The sets
– lim inf_{n→∞} A_n := ⋃_{n=1}^{∞} ⋂_{m=n}^{∞} A_m, and
– lim sup_{n→∞} A_n := ⋂_{n=1}^{∞} ⋃_{m=n}^{∞} A_m
are called limes inferior and limes superior of the sequence (A_n).
• lim inf_{n→∞} A_n can be written as
lim inf_{n→∞} A_n = {ω ∈ Ω : #{n ∈ N : ω ∉ A_n} < ∞}.
• lim sup_{n→∞} A_n can be written as
lim sup_{n→∞} A_n = {ω ∈ Ω : #{n ∈ N : ω ∈ A_n} = ∞}.
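These two characterizations can be made concrete in code. A sketch (illustrative only) under the simplifying assumption that the sequence of sets is eventually periodic with a known period, so that checking one full tail period decides both limits:

```python
def tail_membership(member, horizon, period):
    """Decide lim inf / lim sup membership for a sequence of sets A_n given
    by a predicate member(omega, n), assuming the sequence is eventually
    periodic with the given period: one tail period decides both limits."""
    def in_liminf(omega):
        # omega lies in all A_n from some index on
        return all(member(omega, n) for n in range(horizon, horizon + period))
    def in_limsup(omega):
        # omega lies in infinitely many A_n
        return any(member(omega, n) for n in range(horizon, horizon + period))
    return in_liminf, in_limsup

# A_n = {0, 1} for even n and {1} for odd n: lim inf = {1}, lim sup = {0, 1}
member = lambda omega, n: omega == 1 or (omega == 0 and n % 2 == 0)
in_liminf, in_limsup = tail_membership(member, horizon=100, period=2)
```

The example confirms the intuition: 1 belongs to all but finitely many A_n (lim inf), while 0 belongs to infinitely many but not eventually all of them (lim sup only).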
Set Functions (8)
• Definition: Continuity of contents (see Klenke, Definition 1.35) Let
µ be a content on the ring A:
– µ is called lower semicontinuous if (for n → ∞)
µ(An) → µ(A) for any A ∈ A and any sequence (An)n∈N in A
with An ↑ A.
– µ is called upper semicontinuous if (for n → ∞)
µ(A_n) → µ(A) for any A ∈ A and any sequence (A_n)_{n∈N} in A with
µ(A_n) < ∞ for some (and then eventually all) n ∈ N and A_n ↓ A.
– µ is called ∅-continuous if upper semicontinuity holds for A = ∅.
Set Functions (9)
• Theorem: Continuity and premeasure (see Klenke, Theorem 1.36)
Let µ be a content on the ring A. Consider the five properties:
– (i) µ is σ-additive and hence a premeasure.
– (ii) µ is σ-subadditive.
– (iii) µ is lower semicontinuous.
– (iv) µ is ∅-continuous.
– (v) µ is upper semicontinuous.
• Then the following implications hold:
(i) ⇔ (ii) ⇔ (iii) ⇒ (iv) ⇔ (v). If µ is finite then we also have
(iii) ⇐ (iv).
• Discuss: what does this result imply for probability measures in a
σ-field?
Set Functions (10)
• Definition: Measurable sets, measure space (see Klenke, Definition
1.38)
– A pair (Ω, A) consisting of a nonempty set Ω and a σ-algebra
A ⊂ 2^Ω is called a measurable space. The sets A ∈ A are called
measurable sets. If Ω is at most countably infinite and if
A = 2Ω, then the measurable space (Ω, 2Ω) is called discrete.
– A triple (Ω, A, µ) is called a measure space if (Ω, A) is a
measurable space and µ is a measure on A.
– If in addition µ(Ω) = 1 then (Ω, A, µ) is called a probability
space. The sets A ∈ A are called events.
– The set of all finite measures on (Ω, A) is denoted by
Mf (Ω) = Mf (Ω, A). The subset of probability measures is
denoted by M1(Ω) = M1(Ω, A). Mσ (Ω) = Mσ (Ω, A) stands for
the set of σ-finite measures.
Measure Extension Theorem (1)
• We already defined and considered:
– Classes of sets: ring, semiring, σ-algebra.
– Borel σ-algebra.
– Definitions of measure, content, premeasure.
– Measure space, probability space, measurable set.
• The goal is now to construct measures on σ-algebras. To do this
we construct measures on a semiring. By the extension theorem we
obtain a measure on the whole σ-algebra.
Measure Extension Theorem (2)
• Example: Lebesgue measure. Let n ∈ N and let

A = {(a, b] : a, b ∈ R^n, a < b}

be the semiring of half-open rectangles (a, b] ⊂ R^n. The
n-dimensional volume of such a rectangle is

µ((a, b]) = ∏_{i=1}^{n} (b_i − a_i).

• Can we extend µ((a, b]) to a uniquely determined measure on the
Borel σ-algebra B(R^n) = σ(A)?
• The resulting measure is called Lebesgue measure λ (or λn) on
(Rn, B(Rn)).
Measure Extension Theorem (3)
• Theorem: Carathéodory measure extension theorem (see Klenke,
Theorem 1.41) Let A ⊂ 2^Ω be a ring and let µ be a σ-finite premeasure on
A. There exists a unique measure µ̃ on σ(A) such that µ̃(A) = µ(A)
for all A ∈ A. Furthermore, µ̃ is σ-finite.
• Theorem: Extension theorem (see Klenke, Theorem 1.53) Let
A ⊂ 2^Ω be a semiring and let µ : A → [0, ∞] be an additive,
σ-subadditive and σ-finite set function with µ(∅) = 0. Then there
exists a unique σ-finite measure µ̃ : σ(A) → [0, ∞] such that
µ̃(A) = µ(A) for all A ∈ A.
Measure Extension Theorem (4)
• Theorem: Lebesgue measure (see Klenke, Theorem 1.55)
There exists a uniquely determined measure λ^n on (R^n, B(R^n)) with
the property that

λ^n((a, b]) = ∏_{i=1}^{n} (b_i − a_i)

for all a, b ∈ R^n, a < b. λ^n is called the Lebesgue measure (or
Lebesgue-Borel measure) on (R^n, B(R^n)).
• Definition: Lebesgue-Stieltjes measure. Consider a monotone
increasing and right-continuous function F : R → R. The measure defined by
µ_F((a, b]) = F(b) − F(a) on (R, B(R)) is called the
Lebesgue-Stieltjes measure with distribution function F (see
Examples 1.56 and 1.57).
Measure Extension Theorem (5)
• If F(x) = x, then µ_F is equal to the Lebesgue measure.
• Let f : R → [0, ∞) be continuous and let F(x) = ∫_0^x f(t) dt for all x.
Then µ_F is the extension of the premeasure with density f.
• Let x1, x2, · · · ∈ R and α_n > 0 for all n ∈ N such that
Σ_{n=1}^{∞} α_n < ∞. Then F = Σ_{n=1}^{∞} α_n 1_{[x_n, ∞)} is the distribution function
of the finite measure µ = Σ_{n=1}^{∞} α_n δ_{x_n}.
• If lim_{x→∞}(F(x) − F(−x)) = 1, then µ_F is a probability measure.
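A small numerical sketch of these points: a distribution function mixing a continuous part with an atom, with µ_F((a, b]) = F(b) − F(a). The particular F below is a hypothetical example chosen for illustration:

```python
def F(x):
    """A monotone, right-continuous distribution function: a uniform part
    on [0, 1] with weight 1/2 plus an atom of mass 1/2 at x = 0.5
    (hypothetical example for illustration)."""
    continuous = 0.5 * min(max(x, 0.0), 1.0)
    atom = 0.5 if x >= 0.5 else 0.0        # right-continuous jump at 0.5
    return continuous + atom

def mu_F(a, b):
    """Lebesgue-Stieltjes measure of the half-open interval (a, b]."""
    return F(b) - F(a)

# F(x) - F(-x) tends to 1, so mu_F is a probability measure; the interval
# (0.4, 0.5] contains the atom at 0.5, while (0.5, 0.6] does not.
```

Evaluating mu_F on intervals around 0.5 shows how the half-open convention (a, b] picks up the jump exactly when b reaches the atom.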
Measure Extension Theorem (6)
• Definition: Distribution function (see Klenke, Definition 1.59)
– A right-continuous monotone increasing function F : R → [0, 1] with
F(−∞) := lim_{x→−∞} F(x) = 0 and F(∞) := lim_{x→∞} F(x) = 1 is
called a probability distribution function.
Measure Extension Theorem (7)
• Theorem: Finite products of measures (see Klenke, Theorem 1.61)
– Let n ∈ N and let µ1, . . . , µn be finite measures or
Lebesgue-Stieltjes measures on (R, B(R)). Then there exists a
unique σ-finite measure µ on (R^n, B(R^n)) such that

µ((a, b]) = ∏_{i=1}^{n} µ_i((a_i, b_i])

for all a, b ∈ R^n, a < b. µ = ⊗_{i=1}^{n} µ_i is called the product
measure.
Measure Extension Theorem (8)
• Definition: Null set (see Klenke, Definition 1.68)
– A set A is called a µ-null set if µ(A) = 0. By N_µ we denote the
class of all subsets of µ-null sets.
– Let E(ω) be a property that a point ω ∈ Ω either has or does not have.
We say that E holds µ-almost everywhere (a.e.), or for almost all ω, if
there exists a null set N such that E(ω) holds for every
ω ∈ Ω \ N. If A ∈ A and if there exists a null set N such that
E(ω) holds for every ω ∈ A \ N, then E holds almost everywhere
on A.
– If µ = P is a probability measure, then we say that E holds
almost surely (on Ω, on A).
– Let A, B ∈ A and assume that there is a null set N such that
A∆B ⊂ N. Then A = B mod µ.
Measure Extension Theorem (9)
• Definition: Complete measure space (see Klenke, Definition 1.69)
– A measure space (Ω, A, µ) is complete if Nµ ⊂ A.
– If a measure space is not complete, it can be made complete by
adding all subsets of null sets. For more details see Klenke,
pages 33-34, and further literature.
Measurable Maps (1)
• Structure preserving maps (homomorphisms):
– Continuous maps for topological spaces.
– Measurable maps for measurable spaces.
• Definition: Measurable maps (see Klenke, Definition 1.76)
– A map X : Ω → Ω′ is called A − A′ measurable if
X^{−1}(A′) := {X^{−1}(A′) : A′ ∈ A′} ⊂ A, i.e.
X^{−1}(A′) ∈ A for any A′ ∈ A′.
If X is measurable we write X : (Ω, A) → (Ω′, A′).
– If Ω′ = R and A′ = B(R) is the Borel σ-algebra on R, i.e. X :
(Ω, A) → (R, B(R)), then X is called an A-measurable real map.
Measurable Maps (2)
• Examples:
– The identity map id : Ω → Ω is A − A measurable.
– If A = 2^Ω or A′ = {∅, Ω}, then any map X : Ω → Ω′ is A − A′
measurable.
– The indicator function 1_A : Ω → {0, 1} is A − 2^{{0,1}} measurable if
and only if A ∈ A.
Measurable Maps (3)
• Theorem: Generated σ-algebra (see Klenke, Theorem 1.78)
– Let (Ω′, A′) be a measurable space and let Ω be a nonempty set.
Let X : Ω → Ω′ be a map. The preimage

X^{−1}(A′) := {X^{−1}(A′) : A′ ∈ A′}

is the smallest σ-algebra with respect to which X is measurable.
σ(X) := X^{−1}(A′) is called the σ-algebra on Ω that is generated
by X.
• Theorem: Measurability of continuous maps (see Klenke, Theorem
1.88)
– Let Ω and Ω′ be topological spaces and let f : Ω → Ω′ be a
continuous map. Then f is B(Ω) − B(Ω′) measurable.
Measurable Maps (4)
• Definition: Simple function (see Klenke, Definition 1.93)
Let (Ω, A) be a measurable space. A map f : Ω → R is called a
simple function if there is an n ∈ N and mutually disjoint
measurable sets A1, . . . , An ∈ A as well as numbers
α1, . . . , αn ∈ R such that

f = Σ_{i=1}^{n} α_i 1_{A_i}.
Measurable Maps (5)
• Definition: Simple functions II (see Klenke, Definition 1.95)
Assume that f, f1, f2, . . . are maps f : Ω → R̄ such that
f1(ω) ≤ f2(ω) ≤ . . . and lim_{n→∞} f_n(ω) = f(ω) for any ω ∈ Ω.
Then (f_n)_{n∈N} increases pointwise to f, notation f_n ↑ f.
Analogously, f_n ↓ f stands for pointwise decrease.
Measurable Maps (6)
• Theorem: Simple function (see Klenke, Theorem 1.96) Let (Ω, A)
be a measurable space and f : Ω → [0, ∞] be measurable. Then the
following statements hold:
– There exists a sequence (fn)n∈N of nonnegative simple functions
such that fn ↑ f .
– There are measurable sets A1, A2, · · · ∈ A and numbers
α1, α2, · · · ≥ 0 such that

f = Σ_{n=1}^{∞} α_n 1_{A_n}.

• For more details on measurable maps see the textbook.
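The standard construction behind the first statement uses dyadic levels, f_n = min(⌊2^n f⌋ / 2^n, n). A sketch (the cap at n handles unbounded f; the names are ad hoc):

```python
import math

def simple_approx(f, n):
    """n-th dyadic simple-function approximation of a nonnegative f:
    f_n = min(floor(2^n f) / 2^n, n).  Each f_n takes finitely many
    values, f_n <= f, and f_n increases pointwise to f."""
    def fn(omega):
        value = f(omega)
        return min(math.floor(value * 2 ** n) / 2 ** n, n)
    return fn

f = lambda x: x * x          # a nonnegative measurable function on R
f3 = simple_approx(f, 3)     # resolution 1/8, capped at 3
```

For instance, f3(1.3) rounds f(1.3) = 1.69 down to the dyadic level 13/8, and refining n only increases the approximation.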
Measurable Maps (7)
• Measurable maps transport measures from one space to another.
• Definition: Image measure/push-forward measure (see Klenke,
Definition 1.98)
– Let (Ω, A) and (Ω′, A′) be measurable spaces and let µ be a
measure on (Ω, A). Let X : (Ω, A) → (Ω′, A′) be measurable. The
image measure of µ under X is the measure µ′ := µ ◦ X^{−1} on
(Ω′, A′) that is defined by:

µ ◦ X^{−1} : A′ → [0, ∞], A′ ↦ µ(X^{−1}(A′)).
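For a discrete measure the definition is a direct computation. A sketch with a hypothetical dictionary encoding of point masses:

```python
from collections import Counter

def image_measure(mu, X):
    """Push a discrete measure mu ({point: mass}) forward through the map
    X: (mu o X^{-1})({y}) = mu(X^{-1}({y}))."""
    out = Counter()
    for omega, mass in mu.items():
        out[X(omega)] += mass      # mass of omega lands on X(omega)
    return dict(out)

# Fair die pushed through the parity map: mass 1/2 on each of 0 and 1
die = {k: 1 / 6 for k in range(1, 7)}
parity = image_measure(die, lambda k: k % 2)
```

The total mass is preserved, and the image measure lives on the range of X.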
Measurable Maps (8)
• Theorem: Density transformation formula (see Klenke, Theorem 1.101)
– Let µ be a measure on R^n that has a continuous (or piecewise continuous) density
f : R^n → [0, ∞) such that

µ((−∞, x]) = ∫_{−∞}^{x_1} · · · ∫_{−∞}^{x_n} f(t_1, . . . , t_n) dt_n . . . dt_1

for all x ∈ R^n. Let A ⊂ R^n be any open or closed set in R^n with
µ(R^n \ A) = 0. Further, let B ⊂ R^n be any open or closed set. Assume that
φ : A → B is a continuously differentiable bijection with derivative φ′. Then the
image measure µ ◦ φ^{−1} has the density

f_φ(x) = f(φ^{−1}(x)) / |det(φ′(φ^{−1}(x)))| if x ∈ B, f_φ(x) = 0 if x ∉ B.
Random Variables (1)
• We consider a probability space (Ω, A, P). The sets A ∈ A are called events.
• Definition: Random Variable (see Klenke, Definition 1.102) Let
(Ω′, A′) be a measurable space and let X : Ω → Ω′ be measurable.
– X is called a random variable with values in (Ω′, A′). If
(Ω′, A′) = (R, B(R)), then X is called a real random variable.
– For A′ ∈ A′ we write {X ∈ A′} := X^{−1}(A′) and
P(X ∈ A′) := P(X^{−1}(A′)). In addition, {X ≥ 0} := X^{−1}([0, ∞))
and {X ≤ b} := X^{−1}((−∞, b]), etc.
Random Variables (2)
• Definition: Distributions (see Klenke, Definition 1.103) Consider a
random variable X.
– The probability measure P_X := P ◦ X^{−1} is called the distribution
of X.
– For a real-valued random variable X, the map
F_X : x ↦ P(X ≤ x) is called the distribution function of X.
We write X ∼ µ (or X ∼ F) if µ = P_X and say X has
distribution µ.
– A family (X_i)_{i∈I} of random variables is called identically
distributed if P_{X_i} = P_{X_j} for all i, j ∈ I. We write X =_D Y
if P_X = P_Y.
Random Variables (3)
• Theorem: Distributions vs. random variables (see Klenke, Theorem
1.104)
– For any distribution function F , there exists a random variable X
with FX = F .
• By this theorem it is sufficient to consider a model for the
distribution function F; the corresponding random variable need not
be modeled explicitly. We know by this theorem that an X with
this distribution function has to exist.
Outline - Independence
• Independent events
• Borel Cantelli lemma
• Independent random variables
• Klenke Chapter 2
Independence of Events (1)
• Definition: Independence of events (see Klenke, Definition 2.3)
– Let I be an arbitrary index set and let (A_i)_{i∈I} be an arbitrary
family of events. The family (A_i)_{i∈I} is called independent if for
any finite subset J ⊂ I the product formula holds:

P(⋂_{j∈J} A_j) = ∏_{j∈J} P(A_j).
• Discuss Examples 2.1 and 2.2 in the textbook.
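The product formula over all finite subfamilies can be checked exhaustively on a finite probability space. A sketch for two fair dice with uniform measure (the event names A, B, D are hypothetical examples):

```python
from itertools import combinations, product
from fractions import Fraction

# Uniform probability measure on Omega = {1,...,6}^2 (two fair dice)
omega = list(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def independent(events):
    """Check P(A_{j1} ∩ ... ∩ A_{jk}) = Π P(A_{ji}) for every subfamily
    of the given finite family of events."""
    for k in range(2, len(events) + 1):
        for J in combinations(events, k):
            lhs = P(lambda w, J=J: all(a(w) for a in J))
            rhs = Fraction(1)
            for a in J:
                rhs *= P(a)
            if lhs != rhs:
                return False
    return True

A = lambda w: w[0] % 2 == 0      # first die shows an even number
B = lambda w: w[1] <= 2          # second die shows at most 2
D = lambda w: w[0] <= 3          # first die shows at most 3
```

A and B concern different dice and are independent; A and D both constrain the first die (P(A ∩ D) = 1/6 ≠ 1/4 = P(A)P(D)), so the product formula fails.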
Independence of Events (2)
• Now we roll a die infinitely often. What is the probability that the
face shows 6 infinitely often? It should be one.
• Now we play roulette. What is the probability of {0 infinitely often}? It
should also be one.
• Otherwise there would have to be a last 6 or a last zero.
• Remember: A_* := lim inf_{n→∞} A_n can be written as
lim inf_{n→∞} A_n = {ω ∈ Ω : #{n ∈ N : ω ∉ A_n} < ∞}.
• A^* := lim sup_{n→∞} A_n can be written as
lim sup_{n→∞} A_n = {ω ∈ Ω : #{n ∈ N : ω ∈ A_n} = ∞}.
Independence of Events (3)
• Theorem, Borel-Cantelli lemma (see Klenke, Theorem 2.7): Let
A1, A2, . . . be events and define A^* := lim sup_{n→∞} A_n. Then:
– If Σ_{n=1}^{∞} P(A_n) < ∞, then P(A^*) = 0. (Here P can be an arbitrary
measure on (Ω, A).)
– If (A_n)_{n∈N} is independent and Σ_{n=1}^{∞} P(A_n) = ∞, then P(A^*) = 1.
• The Borel-Cantelli lemma belongs to the so-called 0-1 laws.
• Discuss Examples 2.8 to 2.10. Example 2.9 demonstrates why
independence is important in the second Borel-Cantelli lemma.
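The die example can be illustrated by simulation (a Monte Carlo sketch, not a proof): with A_n = {the n-th roll shows a six} the events are independent and Σ P(A_n) = Σ 1/6 diverges, so by the second Borel-Cantelli lemma sixes occur infinitely often almost surely.

```python
import random

def count_sixes(n_rolls, seed=0):
    """Roll a fair die n_rolls times and count how often a six shows."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)

# The count keeps growing with the horizon, consistent with P(A*) = 1:
# no finite horizon contains a "last six".
hits_small = count_sixes(6_000)
hits_large = count_sixes(60_000)
```

With high probability the counts stay near n/6, and extending the horizon only produces more occurrences, which is the heuristic content of "a six infinitely often".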
Independence of Events (4)
• Definition: Independence of classes of events (see Klenke, Definition
2.11)
– Let I be an arbitrary index set and let E_i ⊂ A for all i ∈ I. The
family (E_i)_{i∈I} is called independent if for any finite subset J ⊂ I
and any choice of E_j ∈ E_j, j ∈ J, we have

P(⋂_{j∈J} E_j) = ∏_{j∈J} P(E_j).
Independence of Events (5)
• Theorem, Independence of classes (see Klenke, Theorem 2.13):
– Let I be finite and for any i ∈ I let E_i ⊂ A with Ω ∈ E_i. Then
(E_i)_{i∈I} is independent if and only if P(⋂_{j∈J} E_j) = ∏_{j∈J} P(E_j)
holds for any finite subset J ⊂ I and any choice of
E_j ∈ E_j, j ∈ J.
– (E_i)_{i∈I} is independent if and only if (E_j)_{j∈J} is independent for all
finite J ⊂ I.
– If E_i ∪ {∅} is ∩-closed for all i ∈ I, then (E_i)_{i∈I} is independent if and only if
(σ(E_i))_{i∈I} is independent.
– Let K be an arbitrary set and let (I_k)_{k∈K} be mutually disjoint
subsets of I. If (E_i)_{i∈I} is independent, then (⋃_{i∈I_k} E_i)_{k∈K} is also
independent.
Independent Random Variables (1)
• We now consider an arbitrary index set I. For each i ∈ I we consider
the measurable space (Ω_i, A_i) and the random variable
X_i : (Ω, A) → (Ω_i, A_i) with generated σ-field σ(X_i) = X_i^{−1}(A_i).
• Definition, Independent random variables (see Klenke, Definition
2.14): The family (X_i)_{i∈I} of random variables is called independent
if the family (σ(X_i))_{i∈I} of σ-algebras is independent.
Independent Random Variables (2)
• Definition, Joint distribution (see Klenke, Definition 2.20):
– For any i ∈ I let X_i be a real random variable. For any finite
subset J ⊂ I let

F_J(x) := F_{(X_j)_{j∈J}}(x) : R^J → [0, 1],
x ↦ P(X_j ≤ x_j for all j ∈ J) = P(⋂_{j∈J} X_j^{−1}((−∞, x_j])).

F_J is called the joint distribution function of (X_j)_{j∈J}. The
probability measure P_{(X_j)_{j∈J}} on R^J is called the joint distribution
of (X_j)_{j∈J}.
– Remark: We consider the probability space (Ω, A, P). With
A := ⋂_{j∈J} X_j^{−1}((−∞, x_j]) we have A ∈ A, and P is applied to A.
Independent Random Variables (3)
• Theorem, Joint distribution (see Klenke, Theorem 2.21):
– A family (X_i)_{i∈I} of real random variables is independent if and only
if, for every finite J ⊂ I and every x = (x_j)_{j∈J} ∈ R^J,

F_J(x) = ∏_{j∈J} F_{X_j}(x_j).
Independent Random Variables (4)
• Theorem, Joint density (see Klenke, Corollary 2.22):
– In addition (to 2.21), assume that any F_J has a continuous density
f_J(x) = f_{(X_j)_{j∈J}}(x), i.e. there exists a continuous map
f_J : R^J → [0, ∞) such that

F_J(x) = ∫_{−∞}^{x_{j_1}} · · · ∫_{−∞}^{x_{j_n}} f_J(t_1, . . . , t_n) dt_n . . . dt_1

for all x ∈ R^J, where J = {j_1, . . . , j_n}. In this case the family
(X_i)_{i∈I} of real random variables is independent if and only if for
any finite J ⊂ I

f_J(x) = ∏_{j∈J} f_{X_j}(x_j).
Outline - The Integral
• Construction of integrals with respect to a measure µ.
• Properties of the integral.
• Monotone convergence, the Lemma of Fatou and the St. Petersburg
game.
• Riemann vs. Lebesgue integral.
• Klenke Chapter 4.
Construction of the Integral (1)
• We consider a measure space (Ω, A, µ).
• The goal of this section is to construct an integral with respect to a
measure µ.
• We already observed that measurable functions can be approximated
by (a sequence of increasing) simple functions (see Definition 1.93
and Theorem 1.96). Hence simple functions play an important role in
the construction of the integral.
75
Construction of the Integral (2)
Applied Probability
• Let E be the vector space of simple functions (see Definition 1.93) on
(Ω, A) and
E+ = {f ∈ E : f ≥ 0}
the cone of nonnegative simple functions.
• If
f = ∑_{i=1}^m αi 1_{Ai}
for some m ∈ N, where α1, . . . , αm ∈ [0, ∞) and A1, . . . , Am ∈ A
are mutually disjoint sets, then the above representation of f is called
a normal representation of f .
76
Construction of the Integral (3)
Applied Probability
• Theorem, Normal representation (see Klenke, Lemma 4.1):
– If f = ∑_{i=1}^m αi 1_{Ai} and f = ∑_{j=1}^n βj 1_{Bj} are two normal
representations of f ∈ E+, then
∑_{i=1}^m αi µ(Ai) = ∑_{j=1}^n βj µ(Bj ).
• Remark: In the next step we construct the integral. By this theorem
the value of the integral does not depend on the normal
representation we use.
77
Construction of the Integral (4)
Applied Probability
• Definition, (see Klenke, Definition 4.2):
– Define the map I : E+ → [0, ∞] by
I(f ) = ∑_{i=1}^m αi µ(Ai) ,
where f has the normal representation f = ∑_{i=1}^m αi 1_{Ai}.
• Conventions for infinity: 0 · ∞ = ∞ · 0 = 0, x · ∞ = ∞ · x = ∞ for
0 < x < ∞, ∞ · ∞ = ∞.
78
Construction of the Integral (5)
Applied Probability
• Theorem, Properties of I(f ) (see Klenke, Lemma 4.3): The map I
is positive, linear and monotone increasing: Let f, g ∈ E+ and α ≥ 0.
Then the following statements hold:
– I(αf ) = αI(f ).
– I(f + g) = I(f ) + I(g).
– If f ≤ g then I(f ) ≤ I(g).
79
Construction of the Integral (6)
Applied Probability
• Definition, Integral (see Klenke, Definition 4.4):
– If f : Ω → [0, ∞] is measurable, then we define the integral of f
with respect to µ by
∫ f dµ := sup{I(g) : g ∈ E+, g ≤ f }.
• If µ is the Lebesgue measure λ we get the Lebesgue integral. If µ is
the counting measure then the integral becomes a sum.
• The integral is an extension of the map I on E+ to the set of
(nonnegative) measurable functions.
• Note that f ≤ g can be required pointwise, i.e. f (ω) ≤ g(ω) for all ω ∈ Ω, or
almost everywhere (almost surely), i.e. f (ω) ≤ g(ω) for all ω ∈ Ω \ N where
N is a µ-null set.
80
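The sup over simple functions can be made concrete. As a small numerical sketch (not from the slides; the function name is ours), take f (x) = x² on [0, 1] with µ = λ: the dyadic simple functions g_n = ⌊2ⁿ f⌋/2ⁿ lie below f, their level sets are intervals whose Lebesgue measure is exact, and I(g_n) increases to ∫ f dλ = 1/3.

```python
import math

def lower_simple_integral(n):
    """I(g_n) for the simple function g_n = floor(2^n f)/2^n approximating
    f(x) = x^2 on [0, 1] from below; each level set {g_n = k/2^n} is the
    interval [sqrt(k/2^n), sqrt((k+1)/2^n)), so its Lebesgue measure is exact."""
    total = 0.0
    for k in range(2 ** n):                 # levels k/2^n, k = 0, ..., 2^n - 1
        lo = math.sqrt(k / 2 ** n)
        hi = math.sqrt((k + 1) / 2 ** n)
        total += (k / 2 ** n) * (hi - lo)   # alpha_k * lambda(A_k)
    return total

for n in (2, 4, 8, 12):
    print(n, lower_simple_integral(n))     # increases towards 1/3
```

Since g_n ≤ f ≤ g_n + 2⁻ⁿ, the error after n refinements is at most 2⁻ⁿ.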
Construction of the Integral (7)
Applied Probability
• Theorem, Properties of the integral (see Klenke, Theorem 4.6): Let
f, g, f1, f2, . . . be measurable maps Ω → [0, ∞]. Then:
– Monotonicity: If f ≤ g, then ∫ f dµ ≤ ∫ g dµ.
– Monotone convergence: If fn ↑ f , then the integrals also converge:
∫ fn dµ ↑ ∫ f dµ.
– Linearity: If α, β ∈ [0, ∞], then ∫ (αf + βg) dµ = α ∫ f dµ + β ∫ g dµ.
81
Construction of the Integral (8)
Applied Probability
• Until now we have considered nonnegative measurable functions. Now we
extend the integral to general measurable f .
• First, f = f ⁺ − f ⁻, where f ⁺ = max{0, f } and f ⁻ = − min{0, f }.
• Since f ⁺, f ⁻ ≤ |f |, if ∫ |f | dµ < ∞, then ∫ f ⁺ dµ < ∞ and
∫ f ⁻ dµ < ∞.
82
Construction of the Integral (9)
Applied Probability
• Definition, Integral of measurable functions (see Klenke, Definition
4.7): A measurable function f : Ω → R̄ is called µ-integrable if
∫ |f | dµ < ∞. We write
– L1(µ) := L1(Ω, A, µ) = {f : Ω → R̄ : f measurable and ∫ |f | dµ < ∞}.
– For f ∈ L1(µ) we define the integral of f with respect to µ by
∫ f (ω) dµ(ω) := ∫ f dµ := ∫ f ⁺ dµ − ∫ f ⁻ dµ.
– If we only have ∫ f ⁺ dµ < ∞ or ∫ f ⁻ dµ < ∞, the values −∞ and ∞
are possible.
– ∫_A f dµ := ∫ 1_A f dµ for A ∈ A.
83
Construction of the Integral (10)
Applied Probability
• Theorem, Properties of the integral (see Klenke, Theorem 4.8): Let
f : Ω → [0, ∞] be a measurable map.
– We have f = 0 almost everywhere if and only if ∫ f dµ = 0.
– If ∫ f dµ < ∞ then f < ∞ almost everywhere.
84
Construction of the Integral (11)
Applied Probability
• Theorem, Properties of the integral (see Klenke, Theorem 4.9): Let
f, g ∈ L1(µ).
– Monotonicity: If f ≤ g almost everywhere then ∫ f dµ ≤ ∫ g dµ. If
f = g almost everywhere then ∫ f dµ = ∫ g dµ.
– Triangle inequality: | ∫ f dµ| ≤ ∫ |f | dµ.
– Linearity: If α, β ∈ R, then αf + βg ∈ L1(µ) and
∫ (αf + βg) dµ = α ∫ f dµ + β ∫ g dµ. This equation also holds if at
most one of the integrals is infinite.
85
Construction of the Integral (12)
Applied Probability
• Theorem, Image measure, change of variable (see Klenke, Theorem
4.10):
– Let (Ω, A) and (Ω′, A′) be measurable spaces and µ be a
measure on (Ω, A). X : (Ω, A) → (Ω′, A′) is measurable.
µ′ = µ ◦ X −1 is the image measure of µ under X (see image
measure/push-forward measure, Klenke, Definition 1.98).
Assume that f : Ω′ → R̄ is µ′-integrable. Then f ◦ X ∈ L1(µ),
∫_Ω (f ◦ X) dµ(ω) = ∫_{Ω′} f (ω′) d(µ ◦ X −1)(ω′) = ∫_{Ω′} f (ω′) dµ′(ω′)
and
∫_{X −1(A′)} (f ◦ X)(ω) dµ(ω) = ∫_{A′} f (ω′) d(µ ◦ X −1)(ω′) = ∫_{A′} f (ω′) dµ′(ω′).
86
Construction of the Integral (13)
Applied Probability
Ad change of variable formula:
• If X is a random variable on (Ω, A, P), then PX = P ◦ X −1 and
∫_Ω f ◦ X(ω) dP(ω) = ∫_Ω f (X(ω)) dP(ω) = ∫_{Ω′} f (ω′) d(P ◦ X −1)(ω′) = ∫_{Ω′} f (ω′) dPX (ω′).
• Suppose that (Ω′, A′) = (R, B(R)), X is a real valued function φ and f (x) = x is
the identity; then ∫_Ω φ(ω) µ(dω) = ∫_R x (µ ◦ φ−1)(dx).
• Next, let f = sin, (Ω, A) = (Ω′, A′) = (R, B(R)), X : λ ↦ 2λ and µ be the
Lebesgue measure λ. Then x = 2λ describes X, so X −1 corresponds to λ = x/2. Let
A = [0, π]; then A′ = [0, 2π]. Moreover, µ′ = λ ◦ X −1 has density 1/2 and
∫_0^π sin(2λ) dλ = ∫_0^{2π} sin(x) · (1/2) dx.
• Also the density transformation formula of Theorem 1.101 follows from the above
theorem. Theorem 4.15 in Klenke is a further version of this result.
87
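The sine example above can be checked numerically. This is a small sketch (not part of the slides; the helper name is ours) comparing both sides of ∫_0^π sin(2λ) dλ = ∫_0^{2π} sin(x)·(1/2) dx with plain midpoint Riemann sums.

```python
import math

def midpoint_integral(f, a, b, n=10000):
    """Midpoint-rule approximation of the Riemann integral of f on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Left side: integral over A = [0, pi] of sin(2*lambda) d(lambda).
lhs = midpoint_integral(lambda lam: math.sin(2 * lam), 0.0, math.pi)
# Right side: integral over A' = [0, 2*pi] of sin(x) against mu' = (1/2) dx.
rhs = midpoint_integral(lambda x: math.sin(x) * 0.5, 0.0, 2 * math.pi)
print(lhs, rhs)
```

Both sides agree (and both are in fact 0 by symmetry of the sine over a full period).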
Construction of the Integral (14)
Applied Probability
• Let (Ω, A) be a discrete measurable space and let µ = ∑_{ω∈Ω} αω δω for
certain numbers αω ≥ 0. A map f is integrable if
∑_{ω∈Ω} |f (ω)| αω < ∞. In this case
∫ f dµ = ∑_{ω∈Ω} f (ω) αω .
• Example: ω0, ω1, . . . , ω36 are the events when we consider the roulette
wheel. If the wheel is fair then pi = 1/37, i.e. αω = 1/37. f (ω) is the
gain/loss given some strategy. Assume that we bet 1 EURO on zero.
Then f = 35 if ω0 realizes, while f = −1 with the other ωi. Then
∑_{ω∈Ω} f (ω) αω = 35 · 1/37 − 36 · 1/37 = (35 − 36)/37 = −1/37.
88
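The roulette example is a sum over a discrete measure and can be computed directly. A minimal sketch (variable names are ours):

```python
# Expected gain when betting 1 EUR on zero with a fair 37-slot wheel:
# mu = sum of alpha_omega * delta_omega with alpha_omega = 1/37.
outcomes = list(range(37))                         # omega_0, ..., omega_36
alpha = {omega: 1 / 37 for omega in outcomes}      # fair-wheel weights
gain = {omega: 35 if omega == 0 else -1 for omega in outcomes}

# Integral with respect to the discrete measure = weighted sum.
expected_gain = sum(gain[omega] * alpha[omega] for omega in outcomes)
print(expected_gain)                               # -1/37, about -0.027
```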
Construction of the Integral (15)
Applied Probability
• Definition, Lebesgue integral (see Klenke, Definition 4.12):
– Let λ be the Lebesgue measure on Rn and let f : Rn → R be
measurable with respect to B ∗(Rn)-B(R) (see Klenke, page 33)
and λ-integrable. Then we call
∫ f dλ
the Lebesgue integral of f . If A ∈ B ∗(Rn) and f is measurable
then we write
∫_A f dλ = ∫ 1_A f dλ.
89
Construction of the Integral (16)
Applied Probability
• Definition, Lebesgue density (see Klenke, Definition 4.13):
– Let µ be a measure on (Ω, A) and let f : Ω → [0, ∞) be a
measurable map. Define ν(A) = ∫ 1_A f dµ for A ∈ A. Then ν has
the density f with respect to µ.
– If µ = λ, then ν has a density with respect to the Lebesgue
measure.
• Prominent examples of densities with respect to the Lebesgue
measure are the normal, the student-t, the gamma and the exponential
density.
90
Construction of the Integral (17)
Applied Probability
• Definition, (see Klenke, Definition 4.16):
– For a measurable f : Ω → R̄ define
||f ||p := ( ∫ |f |^p dµ )^{1/p}
if p ∈ [1, ∞) and
||f ||∞ := inf{K ≥ 0 : µ({|f | > K}) = 0}.
– For p ∈ [1, ∞] we define the vector space
Lp(µ) := {f : Ω → R̄ : f measurable, ||f ||p < ∞}.
91
Construction of the Integral (18)
Applied Probability
• Theorem, (see Klenke, Theorem 4.17):
• The map ||.||1 is a seminorm on the vector space L1(µ). That is to
say, for all f, g ∈ L1(µ) and α ∈ R we observe
– ||αf ||1 = |α| ||f ||1,
– ||f + g||1 ≤ ||f ||1 + ||g||1,
– ||f ||1 ≥ 0 for all f and ||f ||1 = 0 if f = 0 almost everywhere.
• Since ||f ||1 = 0 does not imply that f = 0 for all ω ∈ Ω, we only
obtain a seminorm.
• In addition it can be shown that Lp(µ) ⊂ Lp′(µ) for
1 ≤ p′ ≤ p ≤ ∞ if µ is a finite measure (Klenke, Theorem 4.19).
92
Integral and Limits (1)
Applied Probability
• We now investigate the question whether the limit and the integral
can be interchanged.
• Two criteria are the monotone convergence theorem and the Lemma
of Fatou.
• The St. Petersburg game is a very prominent example where the limit
of the integral is not the integral of the limit. We shall meet the St.
Petersburg game again when we talk about martingales (fair games)
and martingale convergence theorems.
• Remark: The St. Petersburg game has been investigated by Jakob
Bernoulli, Ars conjectandi (1713, post mortem).
93
Integral and Limits (2)
Applied Probability
• Theorem, Monotone convergence [Beppo Levi] (see Klenke,
Theorem 4.20):
Let f1, f2, · · · ∈ L1(µ) and let f : Ω → R̄ be measurable. Assume
that fn ↑ f almost everywhere for n → ∞. Then
lim_{n→∞} ∫ fn dµ = ∫ f dµ ,
where both sides can equal +∞.
94
Integral and Limits (3)
Applied Probability
• Theorem, Fatou’s Lemma (see Klenke, Theorem 4.21):
Let f ∈ L1(µ) and let f1, f2, . . . be measurable with fn ≥ f
almost everywhere for all n ∈ N. Then
∫ lim inf_{n→∞} fn dµ ≤ lim inf_{n→∞} ∫ fn dµ .
• By this lemma it also follows that
lim sup_{n→∞} ∫ fn dµ ≤ ∫ lim sup_{n→∞} fn dµ
(given that there is an integrable majorant g, i.e. fn ≤ g) and
∫ lim inf_{n→∞} fn dµ ≤ lim inf_{n→∞} ∫ fn dµ ≤ lim sup_{n→∞} ∫ fn dµ ≤ ∫ lim sup_{n→∞} fn dµ .
95
Integral and Limits (4)
Applied Probability
• St. Petersburg game
– Consider a gamble, e.g. roulette. To keep it simple we only
consider bets on black or red. When we bet on red the
probability to win is p = 18/37 < 1/2.
– Suppose that this game is played again and again. Then we have a
probability space (Ω, A, P) with Ω = {−1, 1}^N, A = (2^{{−1,1}})^{⊗N}
and P = ((1 − p)δ_black + pδ_red)^{⊗N} = ((1 − p)δ_{−1} + pδ_1)^{⊗N}.
Dn : Ω → {−1, 1}, ω ↦ ωn is the n-th result of the game.
96
Integral and Limits (5)
Applied Probability
• St. Petersburg game
– The player plays a so-called doubling strategy: In more detail,
H1 = 1 is the amount invested in the first round. In round i the
player bets Hi = 2^{i−1}. If red realizes in round i, he wins Hi. The player
stops playing after the first win. In more formal terms: Hn = 0
for all n ≥ 2 if D1 = 1; Hn = 0 if there is some Di = 1,
i = 1, . . . , n − 1; and Hn = 2^{n−1} else, i.e. if Di = −1 for all
i = 1, . . . , n − 1. Note that Hn depends on D1, . . . , Dn−1 only,
therefore it is σ(D1, . . . , Dn−1)-measurable.
– The cumulated gain is
Sn = ∑_{i=1}^n Hi Di .
97
Integral and Limits (7)
Applied Probability
• St. Petersburg game
– The probability of no win until the n-th game is
P(D1 = −1 ∩ · · · ∩ Dn = −1) = (1 − p)^n. Therefore,
P(Sn = 1 − 2^n) = (1 − p)^n and P(Sn = 1) = 1 − (1 − p)^n. Hence
∫ Sn dP = (1 − p)^n (1 − 2^n) + (1 − (1 − p)^n) · 1 = 1 − (2(1 − p))^n ≤ 0
for p ≤ 1/2. Taking limits yields −∞ for p < 1/2 and 0 for
p = 1/2.
– Hence, lim ∫ Sn dP ≤ 0.
98
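The expectation above can be recomputed from the two-point distribution of Sn and compared with the closed form 1 − (2(1 − p))^n. A minimal sketch (function name is ours):

```python
p = 18 / 37   # probability of red on a single-zero wheel

def expected_gain(n, p):
    """E(S_n) from the two-point distribution of the doubling strategy:
    S_n = 1 - 2^n with probability (1-p)^n, and S_n = 1 otherwise."""
    q = (1 - p) ** n
    return q * (1 - 2 ** n) + (1 - q) * 1

for n in (1, 5, 10, 20):
    # Compare with the closed form 1 - (2*(1-p))^n; both are <= 0 for p <= 1/2.
    print(n, expected_gain(n, p), 1 - (2 * (1 - p)) ** n)
```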
Integral and Limits (8)
Applied Probability
• St. Petersburg game
– The limit S can be −∞ or 1. We have
P(S = −∞) = lim_{n→∞} (1 − p)^n = 0, while S = 1 with probability 1.
(Let An be the event where Sn = 1. Since ∑_n P(An^c) = ∑_n (1 − p)^n < ∞,
the Borel-Cantelli lemma yields that lim sup An^c has probability zero;
hence, almost surely, Sn = 1 for all but finitely many n.) Then
∫ S dP = ∫ lim Sn dP = (−∞) · 0 + 1 · 1 = 1.
– Hence lim_{n→∞} ∫ Sn dP < ∫ S dP.
99
Integral and Limits (9)
Applied Probability
• St. Petersburg game
– In the Lemma of Fatou an integrable minorant has been
assumed (see f ∈ L1(µ) in Theorem 4.21). In the St. Petersburg
game there is no integrable minorant S̃ for (Sn) (i.e. Sn ≥ S̃ for
all n ∈ N).
– Define S̃ := inf{Sn : n ∈ N}. Then
P(S̃ = 1 − 2^{n−1}) = p(1 − p)^{n−1} and
∫ S̃ dµ = ∑_{n=1}^∞ p(1 − p)^{n−1} (1 − 2^{n−1}) = −∞ for p ≤ 1/2.
100
Lebesgue vs. Riemann Integral (1)
Applied Probability
• What is the difference between the Lebesgue and the Riemann
integral?
• Note that we defined the Lebesgue integral of f with respect to
µ = λ by
∫ f dλ := sup{I(g) : g ∈ E+, g ≤ f }.
• Let J = [a, b] be an interval in R and λ the Lebesgue measure on J.
• Consider a sequence (tn)n∈N of partitions tn = (tni)i=0,...,n with
tn0 = a < tn1 < · · · < tnn = b that becomes finer with increasing n,
i.e. max_i (tni − tn,i−1) → 0 for n → ∞.
101
Lebesgue vs. Riemann Integral (2)
Applied Probability
• For any f : J → R and any n ∈ N define the lower and the upper
Riemann sum:
Ltn(f ) := ∑_{i=1}^n inf f ([tn,i−1, tni)) (tni − tn,i−1)
and
Unt(f ) := ∑_{i=1}^n sup f ([tn,i−1, tni)) (tni − tn,i−1) .
102
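The lower and upper sums can be illustrated numerically. A minimal sketch (names are ours), restricted to monotone increasing f so that inf and sup on each cell sit at the cell endpoints; for f (x) = x² on [0, 1] both sums converge to 1/3.

```python
def riemann_sums(f, a, b, n):
    """Lower and upper Riemann sums on an equidistant partition, assuming f
    is monotone increasing so inf/sup per cell are the left/right endpoints."""
    h = (b - a) / n
    lower = sum(f(a + i * h) * h for i in range(n))        # left endpoints
    upper = sum(f(a + (i + 1) * h) * h for i in range(n))  # right endpoints
    return lower, upper

lo, up = riemann_sums(lambda x: x * x, 0.0, 1.0, 100000)
print(lo, up)   # both approach 1/3; the gap is (f(b) - f(a))/n
```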
Lebesgue vs. Riemann Integral (3)
Applied Probability
• Definition: Riemann integrability.
– f is Riemann integrable if there exists a t such that the limits of
the lower and the upper Riemann sums are equal and finite. (In
this case the limits do not depend on the choice of t.) We write
∫_a^b f (x) dx = lim_{n→∞} Ltn(f ) = lim_{n→∞} Unt(f ) .
103
Lebesgue vs. Riemann Integral (4)
Applied Probability
• Theorem, Riemann and Lebesgue integral (see Klenke, Theorem
4.23):
– Let f : J → R be Riemann integrable on J = [a, b]. Then f is
Lebesgue integrable on J with integral
∫_a^b f (x) dx = ∫_J f dλ .
• It can be shown that a bounded function is Riemann integrable if and
only if it is continuous almost everywhere, i.e. the set of its points of
discontinuity has Lebesgue measure zero (see e.g. Billingsley, 1986;
Heuser, 1993, Chapter 17, Chapters 83-84 in the second book).
104
Lebesgue vs. Riemann Integral (5)
Applied Probability
• Example, A function which is Lebesgue integrable but not Riemann
integrable (see Klenke, Example 4.24):
– Let f : [0, 1] → R where x ↦ 1_{x∈Q}. Here Ln(f ) = 0 and
Un(f ) = 1, hence this function is not Riemann integrable. The
Lebesgue integral is ∫_{[0,1]} 1_{x∈Q} dλ = 0 since Q ∩ [0, 1] has measure
zero.
105
Lebesgue vs. Riemann Integral (6)
Applied Probability
• Example, An improper Riemann integrable function which is not
Lebesgue integrable (see Klenke, Example 4.25):
– An improper integral is defined by means of
∫_0^∞ f (x) dx = lim_{n→∞} ∫_0^n f (x) dx. It can be shown that
∫_0^∞ sin(x)/(1 + x) dx exists, while ∫_{[0,∞)} |f | dλ = ∞, so f is
not Lebesgue integrable.
106
Lebesgue vs. Riemann Integral (7)
Applied Probability
• Theorem, Properties of the integral (see Klenke, Theorem 4.26):
– Let f : Ω → R be measurable and f ≥ 0 almost everywhere. Then
∑_{n=1}^∞ µ({f ≥ n}) ≤ ∫ f dµ ≤ ∑_{n=0}^∞ µ({f > n})
and
∫ f dµ = ∫_0^∞ µ({f ≥ t}) dt .
107
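These "layer cake" bounds are easy to check on a finite probability space. A small sketch (the concrete space and function are our illustrative choices): Ω = {0, . . . , 9} with the uniform measure and f (ω) = 0.7 ω, so ∫ f dµ = 3.15.

```python
# Nonnegative f on a finite probability space with the uniform measure.
omega = range(10)
mu = {w: 1 / 10 for w in omega}
f = {w: 0.7 * w for w in omega}          # values 0.0, 0.7, ..., 6.3

integral = sum(f[w] * mu[w] for w in omega)
# Lower bound: sum over n >= 1 of mu({f >= n}); tails vanish past max(f).
lower = sum(sum(mu[w] for w in omega if f[w] >= n) for n in range(1, 100))
# Upper bound: sum over n >= 0 of mu({f > n}).
upper = sum(sum(mu[w] for w in omega if f[w] > n) for n in range(0, 100))
print(lower, integral, upper)            # sandwich around 3.15
```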
Outline - Expected Value, LLN, Inequalities
Applied Probability
• Expected value by using the concept of the integral.
• The Cauchy-Schwarz and the Markov inequality.
• The weak and the strong law of large numbers.
• Klenke, Chapter 5
108
Moments (1)
Applied Probability
• Definition, (see Klenke, Definition 5.1): Consider a probability space
(Ω, A, P). Let X be a real valued random variable.
– If X ∈ L1(P), then X is called integrable and we call
E(X) = ∫ X dP
the expectation or mean of X. If E(X) = 0 then X is called
centered.
– If n ∈ N and X ∈ Ln(P), then the quantities
mk := E(X^k) , Mk := E(|X|^k) for any k = 1, . . . , n
are called the kth moments and kth absolute moments of X.
109
Moments (2)
Applied Probability
• Definition, (see Klenke, Definition 5.1): Consider a probability space
(Ω, A, P). Let X be a real valued random variable.
– If X ∈ L2(P), then X is called square integrable and
V(X) = E(X^2) − E(X)^2
is the variance of X. The number σ := √V(X) is called the
standard deviation of X. (In the textbook sometimes V(X) = ∞
if E(X^2) = ∞ is used.)
– If X, Y ∈ L2(P), then we define the covariance of X and Y by
Cov(X, Y ) = E ((X − E(X))(Y − E(Y ))) .
X and Y are called uncorrelated if Cov(X, Y ) = 0 and
correlated otherwise.
110
Moments (3)
Applied Probability
• Theorem, Rules for expectations (see Klenke, Theorem 5.3): Let
X, Y, Xn, Zn, n ∈ N be real integrable random variables on
(Ω, A, P).
– If PX = PY , then E(X) = E(Y ).
– Linearity: Let c ∈ R. Then cX ∈ L1(P) and X + Y ∈ L1(P) as
well as E(cX) = cE(X) and E(X + Y ) = E(X) + E(Y ).
– If X ≥ 0 almost surely then E(X) = 0 if and only if X = 0
almost surely.
– Monotonicity: If X ≤ Y almost surely, then E(X) ≤ E(Y ) with
equality if and only if X = Y almost surely.
111
Moments (4)
Applied Probability
• Theorem, Rules for expectations (see Klenke, Theorem 5.3): Let
X, Y, Xn, Zn, n ∈ N be real integrable random variables on (Ω, A, P).
– Triangle inequality: |E(X)| ≤ E(|X|).
– If Xn ≥ 0 almost surely for all n ∈ N, then
E(∑_{n=1}^∞ Xn) = ∑_{n=1}^∞ E(Xn).
– If Zn ↑ Z for some Z, then E(Z) = limn→∞ E(Zn) ∈ (−∞, ∞].
112
Moments (5)
Applied Probability
• Theorem, Independent vs. Uncorrelated (see Klenke, Theorem 5.4):
– Let X, Y ∈ L1(P) be independent. Then XY ∈ L1(P),
E(XY ) = E(X)E(Y ) and Cov(X, Y ) = 0, i.e. X and Y are
uncorrelated.
113
Moments (6)
Applied Probability
• Theorem, Wald’s identity (see Klenke, Theorem 5.5):
– Let T, X1, X2, . . . be independent real random variables in L1(P).
Let P(T ∈ N0) = 1 and assume that X1, X2, . . . are identically
distributed. Define
ST := ∑_{i=1}^T Xi .
Then ST ∈ L1(P) and E(ST ) = E(T )E(X1).
114
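Wald's identity E(ST ) = E(T )E(X1) can be illustrated by simulation. A seeded sketch (the particular distributions — T ~ Binomial(3, 1/2) and Xi ~ Bernoulli(0.3) — are our illustrative choices, not from the slides):

```python
import random

random.seed(1)

def simulate_ST(p_T, p_X):
    """One draw of S_T with T ~ Binomial(3, p_T), so T takes values 0..3,
    and X_i ~ Bernoulli(p_X), all independent."""
    T = sum(random.random() < p_T for _ in range(3))
    return sum(random.random() < p_X for _ in range(T))

n = 200000
avg = sum(simulate_ST(0.5, 0.3) for _ in range(n)) / n
print(avg)   # should be close to E(T)*E(X1) = 1.5 * 0.3 = 0.45
```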
Moments (7)
Applied Probability
• Theorem, Properties of the Variance (see Klenke, Theorem 5.6):
Let X ∈ L2(P). Then:
– V(X) = E ((X − E(X))2) ≥ 0.
– V(X) = 0 if and only if X = E (X) almost surely.
– The map f : R → R, x ↦ E((X − x)^2) is minimal at
x0 = E(X) with f (E(X)) = V(X).
115
Moments (8)
Applied Probability
• Theorem, Covariance (see Klenke, Theorem 5.7):
The map Cov : L2(P) × L2(P) → R is a positive semidefinite symmetric
bilinear form, and Cov(X, Y ) = 0 if Y is almost surely constant. The detailed
version of this concise statement is: Let X1, . . . , Xm, Y1, . . . , Yn ∈ L2(P) and
α1, . . . , αm, β1, . . . , βn ∈ R as well as d, e ∈ R. Then
Cov( d + ∑_{i=1}^m αi Xi , e + ∑_{j=1}^n βj Yj ) = ∑_{i,j} αi βj Cov(Xi, Yj ) .
In particular V(αX) = α^2 V(X) for α ∈ R, and the Bienaymé formula holds,
V( ∑_{i=1}^m Xi ) = ∑_{i=1}^m V(Xi) + ∑_{i,j=1; i≠j}^m Cov(Xi, Xj ) .
For uncorrelated X1, . . . , Xm we have V( ∑_{i=1}^m Xi ) = ∑_{i=1}^m V(Xi).
116
Moments (9)
Applied Probability
• Theorem, Cauchy-Schwarz inequality (see Klenke, Theorem 5.9):
If X, Y ∈ L2(P) then
(Cov(X, Y ))^2 ≤ V(X)V(Y ) .
Equality holds if and only if there are a, b, c ∈ R with
|a| + |b| + |c| > 0 and such that aX + bY + c = 0 almost surely.
117
Moments (10)
Applied Probability
• Theorem, Blackwell-Girshick (see Klenke, Theorem 5.10):
– Let T, X1, X2, . . . be independent real random variables in L2(P).
Let P(T ∈ N0) = 1 and assume that X1, X2, . . . are identically
distributed. Define
ST := ∑_{i=1}^T Xi .
Then ST ∈ L2(P) and V(ST ) = V(T )E(X1)^2 + E(T )V(X1).
118
The Weak Law of Large Numbers (1)
Applied Probability
• Theorem, Markov inequality, Chebyshev inequality (see Klenke,
Theorem 5.11):
– Let X be a real random variable and let f : [0, ∞) → [0, ∞) be
monotone increasing. Then for any ε with f (ε) > 0, the Markov
inequality holds:
P(|X| ≥ ε) ≤ E(f (|X|)) / f (ε) .
In the special case f (x) = x^2 we get P(|X| ≥ ε) ≤ E(X^2)/ε^2. In
particular, if X ∈ L2(P) the Chebyshev inequality holds:
P(|X − E(X)| ≥ ε) ≤ V(X)/ε^2 .
119
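The Chebyshev bound can be compared with an exact tail probability. A small sketch (our illustrative choice): for X uniform on [0, 1] we have V(X) = 1/12 and P(|X − 1/2| ≥ ε) = 1 − 2ε for 0 < ε < 1/2, so the exact tail always sits below the bound V(X)/ε².

```python
# X uniform on [0, 1]: E(X) = 1/2, V(X) = 1/12, and the exact tail is
# P(|X - 1/2| >= eps) = 1 - 2*eps for 0 < eps < 1/2.
var = 1 / 12
for eps in (0.1, 0.2, 0.3, 0.4):
    exact_tail = 1 - 2 * eps
    chebyshev_bound = var / eps ** 2
    print(eps, exact_tail, chebyshev_bound)   # bound dominates the tail
```

Note that for small ε the bound exceeds 1 and is trivially true; Chebyshev is crude but fully general.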
The Weak Law of Large Numbers (2)
Applied Probability
• Definition, Law of Large Numbers (see Klenke, Definition 5.12): Let
(Xn)n∈N be a sequence of real random variables in L1(P) and
let S̃n := ∑_{i=1}^n (Xi − E(Xi)).
– We say that (Xn)n∈N fulfills the weak law of large numbers if
lim_{n→∞} P( (1/n)|S̃n| > ε ) = 0 for any ε > 0.
– We say that (Xn)n∈N fulfills the strong law of large numbers if
P( lim sup_{n→∞} (1/n)|S̃n| = 0 ) = 1 .
120
The Weak Law of Large Numbers (3)
Applied Probability
• Theorem, Weak Law of Large Numbers (see Klenke, Theorem 5.14):
– Let X1, X2, . . . be uncorrelated random variables in L2(P) with
V := sup_{n∈N} V(Xn) < ∞. Then (Xn)n∈N fulfills the weak law of
large numbers. More precisely, for any ε > 0 we have
P( (1/n)|S̃n| ≥ ε ) ≤ V / (n ε^2) .
121
The Strong Law of Large Numbers (1)
Applied Probability
• Theorem, Strong Law of Large Numbers (see Klenke, Theorem 5.16):
– Let X1, X2, · · · ∈ L2(P) be pairwise independent (that is Xi and
Xj are independent for all i, j ∈ N with i ≠ j) and identically
distributed. Then (Xn)n∈N fulfills the strong law of large numbers.
122
The Strong Law of Large Numbers (2)
Applied Probability
• Theorem, Etemadi’s Strong Law of Large Numbers (see Klenke,
Theorem 5.17):
– Let X1, X2, · · · ∈ L1(P) be pairwise independent (that is Xi and
Xj are independent for all i, j ∈ N with i ≠ j) and identically
distributed. Then (Xn)n∈N fulfills the strong law of large numbers.
123
The Strong Law of Large Numbers (3)
Applied Probability
• Example, Monte Carlo Integration (see Klenke, Example 5.21):
– Let f : [0, 1] → R be a function. We want to determine the value
of the integral I = ∫_0^1 f (x) dx numerically.
– Generate (pseudo) random numbers X1, . . . , Xn, uniformly
distributed on [0, 1].
– Î_n := (1/n) ∑_{i=1}^n f (Xi) is an estimate of I.
– Given that f ∈ L1([0, 1]), the strong law of large numbers yields
Î_n → I almost surely.
124
The Strong Law of Large Numbers (4)
Applied Probability
• Example, Monte Carlo Integration (see Klenke, Example 5.21):
– A priori we do not know how fast Î_n converges to I. If
f ∈ L2([0, 1]) then V1 = ∫ f ^2(x) dx − I^2 can be obtained. The
Chebyshev inequality yields
P( |Î_n − I| > ε n^{−1/2} ) ≤ V1 / ε^2 .
I.e. the error is of the order n^{−1/2}.
– In the literature different methods to reduce the variance of Î_n are
available. One important example is importance sampling. See
e.g. Robert and Casella (1999).
125
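Monte Carlo integration is straightforward to sketch. A seeded toy example (our choice of integrand): estimate I = ∫_0^1 x² dx = 1/3.

```python
import random

random.seed(42)

def mc_integral(f, n):
    """Monte Carlo estimate of the integral of f over [0, 1] from n pseudo
    random uniforms; by Chebyshev the error is of order n**-0.5."""
    return sum(f(random.random()) for _ in range(n)) / n

estimate = mc_integral(lambda x: x * x, 100000)
print(estimate)   # close to 1/3
```

Here V1 = ∫ x⁴ dx − (1/3)² = 4/45, so the standard error at n = 100000 is about 0.001.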
The Strong Law of Large Numbers (5)
Applied Probability
• Definition, Empirical Distribution Function (see Klenke, Definition
5.22):
– Let X1, X2, . . . be real random variables. The map
Fn : R → [0, 1], x ↦ (1/n) ∑_{i=1}^n 1_{(−∞,x]}(Xi), is called the
empirical distribution function of X1, . . . , Xn.
126
The Strong Law of Large Numbers (6)
Applied Probability
• Theorem, Glivenko-Cantelli (see Klenke, Theorem 5.23):
– Let X1, X2, . . . be iid real random variables with distribution
function F and let Fn, n ∈ N, be the empirical distribution
functions. Then
lim_{n→∞} sup_{x∈R} |Fn(x) − F (x)| = 0 almost surely.
127
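The Glivenko-Cantelli convergence can be watched numerically. A seeded sketch (function name is ours) for iid Uniform(0, 1) draws, where F (x) = x and the supremum of |Fn − F | is attained at the order statistics:

```python
import random

random.seed(0)

def sup_distance(n):
    """sup_x |F_n(x) - F(x)| for n iid Uniform(0,1) draws; since F(x) = x and
    F_n jumps at the order statistics, checking both jump sides suffices."""
    xs = sorted(random.random() for _ in range(n))
    return max(max(abs((i + 1) / n - x), abs(i / n - x))
               for i, x in enumerate(xs))

for n in (100, 1000, 10000):
    print(n, sup_distance(n))   # shrinks roughly like n**-0.5
```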
Outline - Convergence Theorems
Applied Probability
• Almost sure convergence.
• Convergence in probability (convergence in measure).
• Mean convergence (L1 convergence).
• Uniform integrability.
• Klenke, Chapter 6.
128
Almost Sure and Measure Convergence (1)
Applied Probability
• The triple (Ω, A, µ) is a σ-finite measure space.
• (E, d) is a separable metric space with Borel σ-algebra B(E).
Separable means that there exists a countable dense set; see e.g.
Munkres (2000).
• f1, f2, · · · : Ω → E are measurable with respect to A-B(E).
129
Almost Sure and Measure Convergence (2)
Applied Probability
• Definition, Almost sure convergence, convergence in probability (see Klenke,
Definition 6.2): We say that (fn)n∈N converges to f
– in µ-measure, symbolically fn → f in measure, if
µ({d(f, fn) > ε} ∩ A) → 0 for n → ∞
for all ε > 0 and all A ∈ A with µ(A) < ∞, and
– µ-almost everywhere (a.e.), symbolically fn → f a.e., if there exists a µ-null
set N ∈ A such that
d(f (ω), fn(ω)) → 0 for n → ∞
for any ω ∈ Ω \ N .
– If µ is a probability measure, then convergence in µ-measure is called
convergence in probability. If (fn) converges almost everywhere then we say
that it converges almost surely (a.s.).
130
Almost Sure and Measure Convergence (3)
Applied Probability
• Almost sure convergence implies convergence in probability.
• Convergence in probability does not imply almost sure convergence:
– Let (Xn)n∈N be an independent family of real valued random
variables, where Xn is Bernoulli distributed with pn = 1/n. With X = 0,
P({d(X, Xn) > ε}) = P(|X − Xn| > ε) = 1/n, hence Xn → 0 in probability.
Let An be the event where Xn = 1; then lim sup_{n→∞} An = A∗
corresponds to {Xn = 1 infinitely often}. Since
∑_{n=1}^∞ P(|X − Xn| > ε) = ∑_{n=1}^∞ 1/n = ∞, the second Borel-Cantelli
lemma implies that lim sup_{n→∞} Xn = 1 almost surely.
131
Almost Sure and Measure Convergence (4)
Applied Probability
• Definition, Mean Convergence (see Klenke, Definition 6.8):
– Let E = R and f, f1, f2, · · · ∈ L1(µ). We say that (fn)n∈N
converges in mean (L1 convergence) to f , symbolically
fn → f in L1,
if ||fn − f ||1 → 0.
• Note that ||fn − f ||1 = ∫ |fn − f | dµ → 0 for n → ∞. Lp convergence
means that ||fn − f ||p = ( ∫ |fn − f |^p dµ )^{1/p} → 0 for n → ∞ (this
comes later in more detail).
• L1 convergence implies convergence in measure but not vice versa.
132
Almost Sure and Measure Convergence (5)
Applied Probability
• Theorem, Fast Convergence (see Klenke, Theorem 6.12): Let
(E, d) be a separable metric space. In order for the sequence (fn)n∈N
of measurable maps Ω → E to converge almost everywhere, it is
sufficient that one of the following conditions holds:
– E = R and there is a p ∈ [1, ∞) with fn ∈ Lp(µ) for all n ∈ N
and there is an f ∈ Lp(µ) with ∑_{n=1}^∞ ||fn − f ||p < ∞.
– There is a measurable f with ∑_{n=1}^∞ µ(A ∩ {d(f, fn) > ε}) < ∞
for all ε > 0 and for all A ∈ A with µ(A) < ∞.
– E is complete and there is a summable sequence (εn)n∈N such
that ∑_{n=1}^∞ µ(A ∩ {d(fn, fn+1) > εn}) < ∞ for all
A ∈ A with µ(A) < ∞.
133
Almost Sure and Measure Convergence (6)
Applied Probability
• Corollary, Subsequence and convergence (see Klenke, Corollary
6.13): Let (E, d) be a separable metric space. Let f, f1, f2, . . . be
measurable maps Ω → E. Then the following statements are
equivalent:
– fn → f in measure as n → ∞.
– For any subsequence of (fn)n∈N there exists a sub-subsequence
that converges to f almost everywhere.
134
Convergence in Distribution (1)
Applied Probability
• Convergence in Distribution, see Klenke, Definition 13.17.
• Definition, Convergence in Distribution (see Karr, 1993, Def. 5.5):
The sequence (Xn) converges to X in distribution if
lim_{n→∞} FXn(t) = FX (t)
for all t at which FX is continuous. This is denoted by Xn →d X
or Xn ⇒ X.
135
Uniform Integrability (1)
Applied Probability
• If f is integrable then f 1_{|f |>α} goes to zero almost everywhere as
α → ∞. Therefore lim_{α→∞} ∫_{|f |≥α} |f | dµ = 0.
• Uniform means that if we consider a sequence (fn) ∈ L1(µ) the
integrability holds uniformly in n, i.e. lim_{α→∞} sup_n ∫_{|fn|≥α} |fn| dµ = 0
(see Billingsley, 1986, page 220).
• Example: ∫ fn dµ = na. Klenke uses the following definition:
• Definition, uniformly integrable (see Klenke, Definition 6.16):
– A family F ⊂ L1(µ) is called uniformly integrable if
inf_{0≤g∈L1(µ)} sup_{f ∈F} ∫ (|f | − g)⁺ dµ = 0 .
136
Uniform Integrability (2)
Applied Probability
• Theorem (see Klenke, Theorem 6.17): The family F ⊂ L1(µ) is
uniformly integrable if and only if
inf_{0≤g̃∈L1(µ)} sup_{f ∈F} ∫_{|f |>g̃} |f | dµ = 0 .
If µ(Ω) < ∞ then uniform integrability is equivalent to the following
two conditions:
– inf_{a∈[0,∞)} sup_{f ∈F} ∫ (|f | − a)⁺ dµ = 0 and
– inf_{a∈[0,∞)} sup_{f ∈F} ∫_{|f |>a} |f | dµ = 0.
137
Uniform Integrability (3)
Applied Probability
• Theorem (see Klenke, Theorem 6.25): Let {fn : n ∈ N} ⊂ L1(µ).
The following statements are equivalent:
– There is an f ∈ L1(µ) with fn → f in L1.
– (fn)n∈N is an L1(µ)-Cauchy sequence, that is ||fn − fm||1 → 0 for
m, n → ∞.
– (fn)n∈N is uniformly integrable and there is a measurable map f
such that fn → f in measure as n → ∞.
The limits in the first and the third point coincide.
138
Uniform Integrability (4)
Applied Probability
• In Chapter 6 we additionally find:
– Lebesgue’s dominated convergence theorem.
– Interchanging the integral and differentiation.
139
Outline - Convergence Theorems
Applied Probability
• Lp spaces and Lp convergence.
• Jensen’s inequality, Hölder’s inequality, Minkowski’s inequality.
• The Fischer-Riesz theorem (Lp(µ) is a Banach space).
• Hilbert spaces.
• Lebesgue’s decomposition theorem, absolute continuity.
• The Radon-Nikodym theorem.
• Klenke, Chapter 7.
140
Lp Spaces (1)
Applied Probability
• We consider a σ-finite measure space (Ω, A, µ). For f : Ω → R̄ we
define
||f ||p := ( ∫ |f |^p dµ )^{1/p} for p ∈ [1, ∞)
and
||f ||∞ := inf{K ≥ 0 : µ(|f | > K) = 0}.
• Spaces of functions where these terms are finite are the Lp spaces:
Lp(Ω, A, µ) = Lp(µ) = {f : Ω → R̄ measurable and ||f ||p < ∞}.
141
Lp Spaces (2)
Applied Probability
• Note that ||f ||p is only a seminorm; we observed this for L1 in
Klenke (2008)[Theorem 4.17]. The goal is to adapt the space such
that we obtain a norm.
• We have ||f − g||p = 0 if and only if f = g µ-almost everywhere,
while for a norm ||f − g||p = 0 has to imply f = g.
• Hence we define N = {h : h measurable and h = 0 µ-a.e.}. For
any p ∈ [1, ∞], N is a linear subspace of Lp.
• To obtain a norm from the seminorm we build the factor space.
142
Lp Spaces (3)
Applied Probability
• Definition, Factor space (see Klenke, Definition 7.1):
– For any p ∈ [1, ∞] define
Lp(Ω, A, µ) = Lp(µ) := Lp/N = {f¯ := f + N : f ∈ Lp}.
For f¯ ∈ Lp(µ) define ||f¯||p = ||f ||p for any f ∈ f¯. Also let
∫ f¯ dµ = ∫ f dµ if this expression is defined for f .
• I.e. with f¯ we define equivalence classes: f ∼ g if f, g ∈ f¯.
143
Lp Spaces (4)
Applied Probability
• Definition, Lp convergence (see Klenke, Definition 7.2):
– Let p ∈ [1, ∞] and f1, f2, · · · ∈ Lp(µ). If ||fn − f ||p → 0 for n → ∞
then we say that (fn)n∈N converges to f in Lp(µ) and write fn → f in Lp.
144
Lp Spaces (5)
Applied Probability
• Theorem, (see Klenke, Theorem 7.3): Let p ∈ [1, ∞] and
f1, f2, · · · ∈ Lp(µ). Then the following statements are equivalent:
– There is an f ∈ Lp(µ) with fn → f in Lp.
– (fn)n∈N is a Cauchy sequence in Lp(µ). (I.e. for every ε > 0 there
is a positive integer N such that ||fm − fn||p ≤ ε for all m, n ≥ N.)
If p < ∞ then these two statements are equivalent to
– (|fn|^p)n∈N is uniformly integrable and there exists a measurable f
with fn converging to f in measure. The limits in the first and this
point coincide.
145
Inequalities and the Fischer-Riesz T. (1)
Applied Probability
• Theorem, Jensen’s inequality (see Klenke, Theorem 7.9):
– Let I ⊂ R be an interval and let X be an I-valued random
variable with E(|X|) < ∞. If φ is convex, then E(φ(X)⁻) < ∞
and
E(φ(X)) ≥ φ (E(X)) .
• φ has to be convex on an interval containing the range of X.
• Extension to Rn see Klenke, Theorem 7.11.
• Example: Consider a random variable X where E(X^2) < ∞. Then
Jensen’s inequality yields E(X^2) ≥ (E(X))^2. Hence
V(X) = E(X^2) − (E(X))^2 ≥ 0.
146
Inequalities and the Fischer-Riesz T. (2)
Applied Probability
• Theorem, Hölder’s inequality (see Klenke, Theorem 7.16):
– Let p, q ∈ [1, ∞] with 1/p + 1/q = 1 and f ∈ Lp(µ), g ∈ Lq (µ). Then
f g ∈ L1(µ) and
||f g||1 ≤ ||f ||p ||g||q .
147
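Hölder's inequality is easy to check on a finite space with the counting measure, where the integrals become sums. A small sketch (the vectors and the conjugate pair p = 3, q = 3/2 are our illustrative choices):

```python
# Hoelder on a 4-point space with counting measure: ||f g||_1 <= ||f||_p ||g||_q.
f = [1.0, -2.0, 3.0, 0.5]
g = [2.0, 1.0, -1.0, 4.0]
p, q = 3.0, 1.5                     # conjugate exponents: 1/3 + 2/3 = 1

norm_fg = sum(abs(a * b) for a, b in zip(f, g))           # ||f g||_1
norm_f = sum(abs(a) ** p for a in f) ** (1 / p)           # ||f||_p
norm_g = sum(abs(b) ** q for b in g) ** (1 / q)           # ||g||_q
print(norm_fg, norm_f * norm_g)    # left side is bounded by the right side
```

For p = q = 2 this reduces to the Cauchy-Schwarz inequality.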
Inequalities and the Fischer-Riesz T. (3)
Applied Probability
• Theorem, Minkowski’s inequality (see Klenke, Theorem 7.16):
– For p ∈ [1, ∞] and f, g ∈ Lp(µ)
||f + g||p ≤ ||f ||p + ||g||p .
148
Inequalities and the Fischer-Riesz T. (4)
Applied Probability
• Theorem, Fischer-Riesz (see Klenke, Theorem 7.18):
– (Lp(µ), ||.||p) is a Banach space for every p ∈ [1, ∞].
• A Banach space B is a vector space equipped with a norm ||.||. With
respect to that norm this space is complete (for every Cauchy
sequence (fn)_{n=1}^∞ in B, there exists an element f ∈ B such that
lim_{n→∞} ||fn − f || = 0).
• By the Minkowski inequality we observe that the triangle inequality is
satisfied in (Lp(µ), ||.||p). Therefore ||.||p is a norm. By Klenke
(2008)[Theorem 7.3] the space (Lp(µ), ||.||p) is complete.
149
Hilbert Spaces (1)
Applied Probability
• Definition, Inner product (see Klenke, Definition 7.19): Let V be a
real vector space. A map h., .i : V × V → R is called an inner
product if
– (linearity) hx, αy + zi = αhx, yi + hx, zi for all x, y, z ∈ V and
α ∈ R.
– (symmetry) hx, yi = hy, xi for all x, y ∈ V .
– (positive definiteness) hx, xi > 0 for all x ∈ V \ {0} .
• If only the first two properties hold and hx, xi ≥ 0 for all x, then
h., .i is called a positive semidefinite symmetric bilinear form, or a
semi-inner product.
• If h., .i is an inner product, then (V, h., .i) is called a (real) Hilbert
space if the norm defined by ||x|| := hx, xi^{1/2} is complete, that is, if
(V, ||.||) is a Banach space.
150
Hilbert Spaces (2)
Applied Probability
• Definition, (see Klenke, Definition 7.20):
– For f, g ∈ L2(µ) define
hf, gi := ∫ f g dµ .
– For f¯, ḡ ∈ L2(µ) define
hf¯, ḡi := hf, gi
where f ∈ f¯ and g ∈ ḡ.
• Theorem, (see Klenke, Theorem 7.21):
– h., .i is an inner product on L2(µ) and a semi-inner product on
L2(µ). In addition ||f ||2 = hf, f i1/2.
151
Hilbert Spaces (3)
Applied Probability
• Theorem, (see Klenke, Theorem 7.22):
– The space (L2(µ), h., .i) is a Hilbert space.
152
Hilbert Spaces (4)
Applied Probability
• Definition, Orthogonal Complement (see Klenke, Definition 7.24):
– Let V be a real vector space with inner product ⟨·, ·⟩. If W ⊂ V ,
then the orthogonal complement of W is the following linear
subspace of V :
W⊥ := {v ∈ V : ⟨v, w⟩ = 0 for all w ∈ W } .
153
Hilbert Spaces (5)
Applied Probability
• Theorem, Orthogonal Decomposition (see Klenke, Theorem 7.22):
– Let (V, ⟨·, ·⟩) be a Hilbert space and let W ⊂ V be a closed linear
subspace. For any x ∈ V , there is a unique representation
x = w + w⊥ where w ∈ W and w⊥ ∈ W⊥.
154
Hilbert Spaces (6)
Applied Probability
• Consider inf w∈W ||x − ŵ||, the distance of x from W.
• It can be shown that this infimum is attained at ŵ with ŵ ∈ W
and x − ŵ ∈ W⊥ (see e.g. Ruud, 2000, Section 2.6.2); or:
• Theorem, Projection Theorem (see e.g. Brockwell and Davis, 2006,
Theorem 2.3.1): If W is a closed subspace of the Hilbert space V
and x ∈ V , then
– there is a unique element ŵ ∈ W such that
||x − ŵ|| = inf w∈W ||x − w||;
– ŵ ∈ W and ||x − ŵ|| = inf w∈W ||x − w|| if and only if ŵ ∈ W
and (x − ŵ) ∈ W⊥.
• We shall observe that ŵ is given by the conditional expectation.
155
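The projection theorem can be made concrete in a finite-dimensional Hilbert space. The following Python sketch (not part of the slides; the vectors are chosen purely for illustration) projects x onto the line W = span{w} in R³ with the Euclidean inner product, using ŵ = (⟨x, w⟩/⟨w, w⟩)w, and checks that the residual x − ŵ lies in W⊥:

```python
# Hypothetical illustration of the projection theorem in the
# Hilbert space R^3 with the Euclidean inner product.

def inner(u, v):
    """Euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_line(x, w):
    """Orthogonal projection of x onto the subspace W = span{w}."""
    c = inner(x, w) / inner(w, w)
    return [c * wi for wi in w]

x = [3.0, 1.0, 2.0]
w = [1.0, 0.0, 1.0]
w_hat = project_onto_line(x, w)
residual = [xi - pi for xi, pi in zip(x, w_hat)]

# The residual x - w_hat lies in the orthogonal complement W_perp:
print(abs(inner(residual, w)) < 1e-12)  # True
```

The same projection idea underlies the interpretation of the conditional expectation as an orthogonal projection later in this chapter.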
The Radon-Nikodym Theorem (1)
Applied Probability
• Definition, (see Klenke, Definition 7.30): Let µ and ν be two
measures on (Ω, A).
– ν is called absolutely continuous with respect to µ
(symbolically ν ≪ µ) if
ν(A) = 0 for all A ∈ A with µ(A) = 0 .
The measures ν and µ are called equivalent if ν ≪ µ and
µ ≪ ν.
– µ is called singular to ν (µ ⊥ ν) if there exists an A ∈ A such
that µ(A) = 0 and ν(Ω \ A) = 0.
156
The Radon-Nikodym Theorem (2)
Applied Probability
• Theorem, Lebesgue’s decomposition theorem (see Klenke, Theorem
7.33):
– Let µ and ν be σ-finite measures on (Ω, A). Then ν can be
uniquely decomposed into an absolutely continuous part νa and a
singular part νs (with respect to µ):
ν = νa + νs where νa ≪ µ and νs ⊥ µ .
νa has a density dνa/dµ with respect to µ; dνa/dµ is
A-measurable and finite µ-almost everywhere.
157
The Radon-Nikodym Theorem (3)
Applied Probability
• Theorem, Radon-Nikodym theorem (see Klenke, Corollary 7.34):
– Let µ and ν be σ-finite measures on (Ω, A). Then
ν has a density w.r.t. µ ⇔ ν ≪ µ .
In this case dν/dµ is A-measurable and finite µ-almost everywhere.
The term dν/dµ is called the Radon-Nikodym derivative of ν with
respect to µ.
158
Outline - Martingales
Applied Probability
• Conditional Expectation.
• Martingales.
• Discrete Stochastic Integrals and No-Arbitrage.
• Optional Sampling Theorem.
• The Martingale Convergence Theorem.
• Klenke, Chapters 8 to 11.
159
Conditional Expectation (1)
Applied Probability
• Definition, Conditional Probability (see Klenke, Definition 8.2):
– Let (Ω, A, P) be a probability space and A ∈ A. We define the
conditional probability given A for any B ∈ A by
P(B|A) = P(B ∩ A)/P(A) if P(A) > 0 , and P(B|A) = 0 else.
• The specification for the case P(A) = 0 is arbitrary but of no
importance.
160
Conditional Expectation (2)
Applied Probability
• Theorem, (see Klenke, Theorem 8.4):
– If P(A) > 0, then P(B|A) is a probability measure on (Ω, A).
• Theorem, (see Klenke, Theorem 8.5): Let A, B ∈ A with
P(A), P(B) > 0. Then
– A, B are independent ⇔ P(B|A) = P(B) ⇔ P(A|B) = P(A).
161
Conditional Expectation (3)
Applied Probability
• Theorem, Summation formula/law of total probability (see Klenke,
Theorem 8.6):
– Let I be a countable set and let (Bi)i∈I be pairwise disjoint sets
with P(⊎i∈I Bi) = 1. Then for any A ∈ A,
P(A) = Σ_{i∈I} P(A|Bi)P(Bi) .
162
Conditional Expectation (4)
Applied Probability
• Theorem, Bayes’ formula (see Klenke, Theorem 8.7):
– Let I be a countable set and let (Bi)i∈I be pairwise disjoint sets
with P(⊎i∈I Bi) = 1. Then for any A ∈ A with P(A) > 0 and any
k ∈ I,
P(Bk|A) = P(A ∩ Bk)/P(A) = P(A|Bk)P(Bk) / Σ_{i∈I} P(A|Bi)P(Bi) .
163
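The law of total probability and Bayes' formula can be checked numerically; the following Python sketch (the two-cell partition and all probabilities are hypothetical, chosen only for illustration) computes P(A) and the posterior probabilities P(Bk|A):

```python
# Hypothetical two-cell example of Bayes' formula with a
# partition B1, B2 of Omega (numbers chosen for illustration).
P_B = [0.3, 0.7]          # P(B1), P(B2): prior probabilities
P_A_given_B = [0.9, 0.2]  # P(A|B1), P(A|B2)

# Law of total probability: P(A) = sum_i P(A|B_i) P(B_i)
P_A = sum(pa * pb for pa, pb in zip(P_A_given_B, P_B))

# Bayes' formula: P(B_k|A) = P(A|B_k) P(B_k) / P(A)
posterior = [pa * pb / P_A for pa, pb in zip(P_A_given_B, P_B)]

print(round(P_A, 4))  # 0.41
print([round(p, 4) for p in posterior])  # the posteriors sum to 1
```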
Conditional Expectation (5)
Applied Probability
• Definition, (see Klenke, Definition 8.9):
– Let X ∈ L1(P) and A ∈ A. Then we define
E(X|A) := ∫ X(ω) P[dω|A] = E(1A X)/P(A) if P(A) > 0 ,
and E(X|A) := 0 else.
164
Conditional Expectation (6)
Applied Probability
• P(B|A) = E(1B |A) for all B ∈ A.
• Consider a countable set I and pairwise disjoint sets (Bi)i∈I with
⊎i∈I Bi = Ω.
• Define F = σ(Bi, i ∈ I).
• For X ∈ L1(P) we define the map E(X|F) : Ω → R by
E(X|F)(ω) = E(X|Bi) for ω ∈ Bi .
165
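For a σ-algebra generated by a finite partition, E(X|F) can be computed directly from the definition above. A Python sketch (the six-point space, the uniform measure and the values of X are hypothetical, chosen only for illustration):

```python
# Hypothetical finite example: Omega = {0,...,5} with uniform P,
# partition B1 = {0,1,2}, B2 = {3,4,5}, F = sigma(B1, B2).
# E(X|F)(omega) = E(X|B_i) = E(1_{B_i} X) / P(B_i) for omega in B_i.
omega = list(range(6))
P = {w: 1 / 6 for w in omega}
X = {0: 1.0, 1: 2.0, 2: 3.0, 3: 10.0, 4: 20.0, 5: 30.0}
partition = [{0, 1, 2}, {3, 4, 5}]

def cond_exp(X, P, partition):
    """E(X|F) as the map omega -> E(X|B_i) on each cell B_i."""
    out = {}
    for B in partition:
        pB = sum(P[w] for w in B)                    # P(B_i)
        eXB = sum(X[w] * P[w] for w in B) / pB       # E(X|B_i)
        for w in B:
            out[w] = eXB
    return out

Y = cond_exp(X, P, partition)
print(round(Y[0], 6), round(Y[3], 6))  # 2.0 20.0
# Defining property: E(X 1_B) = E(Y 1_B) for each cell B.
```

The resulting Y is constant on each cell, i.e. F-measurable, matching the two defining properties of the conditional expectation.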
Conditional Expectation (7)
Applied Probability
• Theorem, (see Klenke, Theorem 8.10): The map E(X|F) has the
following properties:
– E(X|F) is F-measurable.
– E(X|F) ∈ L1(P) and for any A ∈ F we have
∫_A E(X|F) dP = ∫_A X dP .
166
Conditional Expectation (8)
Applied Probability
• F ⊂ A is a sub-σ-algebra and X ∈ L1(Ω, A, P).
• Definition, Conditional Expectation (see Klenke, Definition 8.11): A
random variable Y is called a conditional expectation of X given
F, symbolically E(X|F) := Y , if
– Y is F-measurable.
– For any A ∈ F we have E(X1A) = E(Y 1A).
– For B ∈ A, P(B|F) := E(1B |F) is called a conditional
probability of B given the σ-algebra F.
167
Conditional Expectation (9)
Applied Probability
• Theorem, Conditional Expectation (see Klenke, Theorem 8.12):
– E(X|F) exists and is unique (up to equality almost surely).
• Existence follows from the Radon-Nikodym theorem.
168
Conditional Expectation (10)
Applied Probability
• We write/define E(X|Y ) := E(X|σ(Y )).
• Theorem, Properties of the Conditional Expectation (see Klenke,
Theorem 8.14): Let G ⊂ F ⊂ A be σ-algebras and let
X, Y ∈ L1(Ω, A, P). Then:
– (Linearity) E(λX + Y |F) = λE(X|F) + E(Y |F).
– (Monotonicity) If X ≥ Y a.s. then E(X|F) ≥ E(Y |F).
– If E(|XY |) < ∞ and Y is measurable with respect to F, then
E(Y X|F) = Y E(X|F) and E(Y |F) = E(Y |Y ) = Y .
– (Tower Property) E(E(X|F)|G) = E(E(X|G)|F) = E(X|G).
– (Triangle inequality) E(|X| |F) ≥ |E(X|F)|.
169
Conditional Expectation (11)
Applied Probability
• Theorem, Properties of the Conditional Expectation (see Klenke,
Theorem 8.14, continued): Let G ⊂ F ⊂ A be σ-algebras and let
X ∈ L1(Ω, A, P). Then:
– (Independence) If σ(X) and F are independent, then
E(X|F) = E(X).
– If P(A) ∈ {0, 1} for any A ∈ F, then E(X|F) = E(X).
– (Dominated convergence) Assume Y ∈ L1(P), Y ≥ 0 and
(Xn)n∈N is a sequence of random variables with |Xn| ≤ Y for
n ∈ N and such that Xn → X a.s. Then
limn→∞ E(Xn|F) = E(X|F) a.s. and in L1(P) .
170
Conditional Expectation (12)
Applied Probability
• Theorem, Conditional Expectation and Projection (see Klenke,
Corollary 8.16):
– Let F ⊂ A be a σ-algebra and let X be a random variable with
E(X^2) < ∞. Then E(X|F) is the orthogonal projection of X on
L2(Ω, F, P). That is, for any F-measurable Y with E(Y^2) < ∞,
E((X − Y )^2) ≥ E((X − E(X|F))^2)
with equality if and only if E(X|F) = Y .
171
Processes, Filtrations (1)
Applied Probability
• In the following (E, τ ) is a Polish space (a separable, completely
metrizable topological space; see e.g. Klenke, 2008, p. 184) with
Borel σ-algebra E. (Ω, F, P) stands for a probability space, I for an
index set.
• Definition, Stochastic Process (see Klenke, Definition 9.1):
– Let I ⊂ R. A family of random variables X = (Xt, t ∈ I) on
(Ω, F, P) with values in (E, E) is called a stochastic process
with index set I and range E.
• In most cases the ’time notation’ instead of the more general index
set notation is used.
172
Processes, Filtrations (2)
Applied Probability
• Examples:
– Let I = N0 and let (Yn, n ∈ N) be a family of iid Rademacher
random variables (with p = 1/2) on a probability space (Ω, F, P),
i.e. P(Yn = 1) = P(Yn = −1) = 1/2. Let E = Z (with the discrete
topology) and
Xt = Σ_{n=1}^{t} Yn
for all t ∈ N0. (Xt, t ∈ N0) is called the symmetric random walk
in Z. For random walks see e.g. Durrett (2010)[Chapter 4].
– Brownian motion, here I = R+, see e.g. Klenke
(2008)[Chapter 21] or Durrett (2010)[Chapter 8].
– Poisson process, see e.g. Klenke (2008)[Chapter 3].
– Random graphs, see e.g. Durrett (2007).
173
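The symmetric random walk is easy to simulate; the following Python sketch (sample sizes and the seed are arbitrary choices for illustration) checks the moments E(Xt) = 0 and Var(Xt) = t by Monte Carlo:

```python
import random

# Simulate the symmetric random walk X_t = sum_{n=1}^t Y_n with iid
# Rademacher steps Y_n, P(Y_n = 1) = P(Y_n = -1) = 1/2.
def random_walk(T, rng):
    X = [0]  # X_0 = 0
    for _ in range(T):
        X.append(X[-1] + rng.choice((-1, 1)))
    return X

rng = random.Random(42)  # fixed seed for reproducibility
paths = [random_walk(50, rng) for _ in range(5000)]

# E(X_t) = 0 for every t (the walk is a martingale), and
# Var(X_t) = t; the sample averages should be close.
mean_X50 = sum(p[50] for p in paths) / len(paths)
var_X50 = sum(p[50] ** 2 for p in paths) / len(paths)
print(abs(mean_X50) < 1.0, abs(var_X50 - 50) < 5.0)
```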
Processes, Filtrations (3)
Applied Probability
• Definition, (see Klenke, Definition 9.6):
– If X is a random variable or a stochastic process, we write
L[X] = PX for the distribution of X. If G ⊂ F is a σ-algebra,
then we write L[X|G] for the regular conditional distribution of X
given G.
174
Processes, Filtrations (4)
Applied Probability
• Definition, (see Klenke, Definition 9.7): An E-valued stochastic
process X = (Xt)t∈I is called
– real valued if E = R,
– a process with independent increments if X is real valued and
for all n ∈ N and all t0, t1, . . . , tn ∈ I with t0 < t1 < · · · < tn we
have that (Xti − Xti−1)i=1,...,n is independent,
– a Gaussian process if X is real valued and for all n ∈ N and all
t1, . . . , tn ∈ I, (Xt1, . . . , Xtn) is n-dimensionally normally
distributed, and
– integrable (square integrable) if X is real valued and
E(|Xt|) < ∞ (E(Xt^2) < ∞) for all t ∈ I.
175
Processes, Filtrations (5)
Applied Probability
• Definition, (see Klenke, Definition 9.7): Assume that I ⊂ R is closed
under addition. An E-valued stochastic process X = (Xt)t∈I is called
– stationary if
L[(Xs+t)t∈I ] = L[(Xt)t∈I ]
for all s ∈ I, and
– a process with stationary increments if X is real valued and
L[Xs+t+r − Xt+r ] = L[Xs+r − Xr ]
for all r, s, t ∈ I. If 0 ∈ I, then it is enough to consider r = 0.
176
Processes, Filtrations (6)
Applied Probability
• Remark: In econometrics often a weaker form of stationarity is used.
• Definition, Weak Stationarity (see e.g. Brockwell and Davis, 2006,
Definition 1.3.2): The time series (Xt)t∈Z is said to be weakly
stationary (covariance stationary, stationary in the wide
sense, second order stationary) if
– E(Xt^2) < ∞ for all t ∈ Z,
– E(Xt) = m for all t ∈ Z,
– E ((Xr − m)(Xs − m)) = E ((Xr+t − m)(Xs+t − m)) for all
r, s, t ∈ Z.
177
Processes, Filtrations (7)
Applied Probability
• In the following definition the index set I should be partially ordered.
• Definition, Filtration (see Klenke, Definition 9.9): Let
F = (Ft, t ∈ I) be a family of sub-σ-algebras with Ft ⊂ F for all
t ∈ I. F is called a filtration if Fs ⊂ Ft for all s, t ∈ I with s ≤ t.
• Definition, Adapted (see Klenke, Definition 9.10): A stochastic
process X is called adapted to the filtration F if Xt is
Ft-measurable for all t ∈ I. If Ft = σ(Xs, s ≤ t) for all t ∈ I, then
we denote by F = σ(X) the filtration generated by X.
178
Processes, Filtrations (8)
Applied Probability
• Definition, Predictable (see Klenke, Definition 9.12): A stochastic
process X = (Xn, n ∈ N0) is called predictable with respect to
the filtration F = (Fn, n ∈ N0) if X0 is constant and if for every
n ∈ N, Xn is Fn−1-measurable.
• For measure theoretic details (adapted, augmented filtration) see e.g.
Karatzas and Shreve (1991).
179
Processes, Filtrations (9)
Applied Probability
• Definition, Stopping Time (see Klenke, Definition 9.15):
– A random variable τ with values in I ∪ {∞} is called a stopping
time with respect to F if for any t ∈ I
{τ ≤ t} ∈ Ft.
• Theorem, Stopping Time (see Klenke, Theorem 9.16):
– Let I be countable. τ is a stopping time if and only if
{τ = t} ∈ Ft for all t ∈ I.
180
Processes, Filtrations (10)
Applied Probability
• Theorem, Stopping Time (see Klenke, Theorem 9.18): Let σ and τ
be stopping times. Then:
– σ ∨ τ and σ ∧ τ are stopping times.
– If σ, τ ≥ 0, then σ + τ is also a stopping time.
– If s ≥ 0, then τ + s is a stopping time. However, in general, τ − s
is not.
181
Processes, Filtrations (11)
Applied Probability
• Definition, σ-algebra of the τ -past (see Klenke, Definition 9.19):
– Let τ be a stopping time. Then
Fτ := {A ∈ F : A ∩ {τ ≤ t} ∈ Ft, for any t ∈ I}
is called the σ-algebra of the τ -past.
182
Processes, Filtrations (12)
Applied Probability
• Theorem, (see Klenke, Lemma 9.21):
– If σ and τ are stopping times with σ ≤ τ , then Fσ ⊂ Fτ .
• Definition, (see Klenke, Definition 9.22):
– If τ < ∞ is a stopping time, then we define Xτ(ω) := X_{τ(ω)}(ω).
183
Martingales (1)
Applied Probability
• Definition, Martingales (see Klenke, Definition 9.24): Let (Ω, F, P)
be a probability space, I ⊂ R, and let F be a filtration. Let
X = (Xt)t∈I be a real-valued, adapted stochastic process with
E(|Xt|) < ∞ for all t ∈ I. X is called (with respect to F) a
– martingale if E(Xt|Fs) = Xs for all s, t ∈ I with t > s,
– submartingale if E(Xt|Fs) ≥ Xs for all s, t ∈ I with t > s,
– supermartingale if E(Xt|Fs) ≤ Xs for all s, t ∈ I with t > s.
• Consider the map t 7→ E(Xt). For a martingale this map is constant,
for a submartingale monotone increasing, while for a supermartingale
it is monotone decreasing.
• If not otherwise stated Ft = σ(Xs, s ≤ t).
184
Martingales (2)
Applied Probability
• Theorem, Martingales - properties (see Klenke, Theorem 9.32): Let
(Ω, F, P) be a probability space, I ⊂ R, and let F be a filtration. Let
X = (Xt)t∈I be a real-valued, adapted stochastic process with
E(|Xt|) < ∞ for all t ∈ I.
– X is a supermartingale if and only if (−X) is a submartingale.
– Let X and Y be martingales and let a, b ∈ R. Then (aX + bY ) is
a martingale.
– Let X and Y be supermartingales and let a, b ≥ 0. Then
(aX + bY ) is a supermartingale.
– Let X and Y be supermartingales. Then
Z := X ∧ Y = (min(Xt, Yt))t∈I is a supermartingale.
– If (Xt)t∈I is a supermartingale and E(XT ) ≥ E(X0) for some
T ∈ N0, then (Xt)t∈I is a martingale. If there exists a sequence
TN → ∞ with E(XTN ) ≥ E(X0), then (Xt)t∈I is a martingale.
185
Martingales (3)
Applied Probability
• Theorem, (see Klenke, Theorem 9.33): Let X = (Xt)t∈I be a
martingale and let φ : R → R be a convex function.
– If E(φ(Xt)+) < ∞ for all t ∈ I, then (φ(Xt))t∈I is a
submartingale.
– If t∗ := sup(I) ∈ I, then E(φ(Xt∗ )+) < ∞ implies
E(φ(Xt)+) < ∞.
– In particular, if p ≥ 1 and E(|Xt|p) < ∞ for all t ∈ I, then
(|Xt|p)t∈I is a submartingale.
186
Discrete Stochastic Integral (1)
Applied Probability
• Definition, Discrete Stochastic Integral (see Klenke, Definition
9.37):
– Let (Xn)n∈N0 be an F-adapted real process and let (Hn)n∈N be a
real-valued and F-predictable process. The discrete stochastic
integral of H with respect to X is the stochastic process H · X
defined by
(H · X)n := Σ_{m=1}^{n} Hm(Xm − Xm−1) for n ∈ N0 .
If X is a martingale, then H · X is also called the martingale
transform of X.
• Note that (H · X) is F-adapted by construction.
187
Discrete Stochastic Integral (2)
Applied Probability
• Theorem, Stability Theorem (see Klenke, Theorem 9.39): Let
(Xn)n∈N0 be an F-adapted real process with E(|X0|) < ∞.
– X is a martingale if and only if for any locally bounded predictable
process H (i.e. each Hn is bounded), the stochastic integral
(H · X) is a martingale.
– X is a submartingale (supermartingale) if and only if (H · X) is a
submartingale (supermartingale) for any locally bounded
predictable process H ≥ 0.
188
Discrete Stochastic Integral (3)
Applied Probability
• Example: St. Petersburg game (Klenke, Example 9.40)
– I = N0; D1, D2, . . . are iid Rademacher random variables (with
p = 1/2), i.e. P(Di = 1) = P(Di = −1) = 1/2 for all i ∈ N.
– D = (Di)i∈N and F = σ(D).
– Di is the result of a bet that gives a gain or loss of one Euro for
every Euro we put at stake.
– Hn is the Euro amount we bet in gambling round n.
– The gambling strategy has to be predictable, hence
Hn = Fn(D1, . . . , Dn−1). We already had
Hn = 2^{n−1} 1_{(D1=···=Dn−1=−1)}. H1 = 1, which is measurable
with respect to the trivial σ-field F0 = {∅, Ω}.
189
Discrete Stochastic Integral (4)
Applied Probability
• Example: St. Petersburg game
– Define Xn = Σ_{i=1}^{n} Di. (Xn) is a martingale.
– Let H1 = 1 and Hn = 2^{n−1} 1_{(D1=···=Dn−1=−1)}. Then
Sn = Σ_{i=1}^{n} Hi(Xi − Xi−1) = Σ_{i=1}^{n} HiDi = (H · X)n
is the gain process.
– (Sn), or S in textbook notation, is a martingale.
– We obtain E(Sn) = 0 for all n ∈ N.
– Note that we already know that Sn → 1 almost surely. This issue
will be discussed when we investigate martingale convergence. In
this example n ∈ N0.
190
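The doubling strategy of the St. Petersburg game can be simulated directly; a Python sketch (the horizon, seed and number of paths are arbitrary choices for illustration) of the gain process Sn = (H · X)n:

```python
import random

# St. Petersburg doubling strategy: bet H_n = 2^(n-1) as long as
# every previous toss lost, and stop betting after the first win.
# The gain process S_n = sum_i H_i D_i is a martingale with S_n -> 1 a.s.
def gain_path(n, rng):
    S, stake, lost_all = 0, 1, True
    for _ in range(n):
        D = rng.choice((-1, 1))       # Rademacher toss D_i
        H = stake if lost_all else 0  # H_n = 2^(n-1) 1{D_1=...=D_{n-1}=-1}
        S += H * D
        if H and D == 1:
            lost_all = False          # first win: stop betting
        stake *= 2
    return S

rng = random.Random(7)
finals = [gain_path(40, rng) for _ in range(2000)]
# After 40 rounds a win has occurred except with probability 2^-40,
# so essentially every simulated path ends with terminal gain 1,
# even though E(S_n) = 0 for every fixed n.
print(all(s == 1 for s in finals))
```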
Martingales and Option Pricing (1)
Applied Probability
• In the following we consider the Cox et al. (1979) model:
– I = {0, 1, . . . , T }
– A risky asset with binary price process (St)t∈I .
– A risk free asset, ≈ bond, paying a fixed interest rate r ≥ 0. The
price of the risk free asset in period t is S_t^b = (1 + r)^t.
– We want to price a European call option, which is a derivative
security with a payoff structure (ST − K)+ = max(0, ST − K). T
is called expiry date or maturity, while K is called strike price.
– With European options the option cannot be exercised before the
expiry date T .
191
Martingales and Option Pricing (2)
Applied Probability
• By the martingale transform Y = (H · X) we have transformed the
martingale X into a further martingale.
• Let Y0 = 0. When the process X is fixed, which martingales Y can
be obtained by means of some H = H(Y )?
• Not all martingales Y can be obtained (see e.g. Klenke, 2008,
Example 9.41).
• However, a martingale Y can be represented as a stochastic integral
if the increments Xn − Xn−1 can take only two values.
192
Martingales and Option Pricing (3)
Applied Probability
• Definition, Binary Model (see Klenke, Definition 9.42):
– A stochastic process X0, . . . , XT is called binary splitting or a
binary model if there exist random variables D1, . . . , DT with
values in {−1, 1} and functions fn : Rn−1 × {−1, 1} → R for
n = 1, . . . , T as well as x0 ∈ R such that X0 = x0 and
Xn = fn(X1, . . . , Xn−1, Dn)
for any n = 1, . . . , T . By F = σ(X) we denote the filtration
generated by X.
• Xn depends only on the past Xi, but not on the full information
arising from D1, . . . , Dn.
193
Martingales and Option Pricing (4)
Applied Probability
• Theorem, Representation Theorem (see Klenke, Theorem 9.43):
– Let X be a binary model and let VT be an FT -measurable random
variable. Then there exists a bounded predictable process H and a
v0 ∈ R with VT = v0 + (H · X)T .
194
Martingales and Option Pricing (5)
Applied Probability
• The predictable process H = (H_t^b, H_t^s)^⊤, Ht ∈ R^2, is called a
trading strategy. H_t^b and H_t^s are the number of bonds and shares
held by the investor in period 0 ≤ t ≤ T .
• The value of the portfolio at time t is
Vt = H_t^b S_t^b + H_t^s St ,
the discounted value is
Ṽt = (1/S_t^b) (H_t^b S_t^b + H_t^s St) .
195
Martingales and Option Pricing (6)
Applied Probability
• Definition, Self Financing Trading Strategy (see Lamberton and
Lapeyre, 2008):
– A trading strategy is called self financing if
H_t^b S_t^b + H_t^s St = H_{t+1}^b S_t^b + H_{t+1}^s St
holds for any t ∈ {0, 1, . . . , T − 1}. (Ht is predictable, i.e.
measurable with respect to F_{t−1}.)
• For t = 0, V0 (= H_0^b S_0^b + H_0^s S0) = H_1^b S_0^b + H_1^s S0 has
to hold. V0 can be some random variable or some constant v0; this
depends on F0. In Klenke, H0 has to be constant based on his
definition of predictability, while the textbook of Lamberton and
Lapeyre (2008) only requires measurability with respect to F0.
196
Martingales and Option Pricing (7)
Applied Probability
• Theorem, (see Lamberton and Lapeyre, 2008, Proposition 1.1.2):
The following statements are equivalent:
– The strategy H is self financing.
– For any t ∈ {1, . . . , T },
Vt(H) = V0(H) + Σ_{i=1}^{t} (H_i^b ΔS_i^b + H_i^s ΔSi) ,
where ΔSt = St − St−1.
– For any t ∈ {1, . . . , T },
Ṽt(H) = V0(H) + Σ_{i=1}^{t} (H_i^b ΔS̃_i^b + H_i^s ΔS̃i) ,
where ΔS̃t = S̃t − S̃t−1 = St/S_t^b − St−1/S_{t−1}^b.
197
Martingales and Option Pricing (8)
Applied Probability
• Theorem, (see Lamberton and Lapeyre, 2008, Proposition 1.1.3):
– For any predictable process H^s and for any F0-measurable variable
V0, there exists a unique predictable process H^b such that the
strategy H = (H^b, H^s) is self financing and the initial value is V0.
198
Martingales and Option Pricing (9)
Applied Probability
• Definition, Admissible Strategy (see Lamberton and Lapeyre, 2008,
Definition 1.1.4):
– A strategy H is admissible if it is self financing and if Vt(H) ≥ 0
for any t ∈ {0, 1, . . . , T }.
• Definition, Arbitrage Strategy (see Lamberton and Lapeyre, 2008,
Definition 1.1.5):
– An arbitrage strategy is an admissible strategy with zero initial
value (i.e. V0(H) = 0) and non-zero final value (i.e. VT (H) > 0).
• For different definitions/forms of arbitrage see Mas-Colell et al.
(1995) and Werner and Ross (2000).
199
Martingales and Option Pricing (10)
Applied Probability
• Definition, Viable Market (see Lamberton and Lapeyre, 2008,
Definition 1.1.5):
– A market is viable (arbitrage free) if there is no arbitrage
opportunity.
• Theorem, Fundamental Theorem of Asset Pricing (see Lamberton
and Lapeyre, 2008, Theorem 1.1.6):
– The market is viable if and only if there exists a probability
measure P∗ equivalent to P such that the discounted prices of
assets are martingales.
200
Martingales and Option Pricing (11)
Applied Probability
• Remark:
– For different definitions/forms of arbitrage (see Mas-Colell et al.,
1995; Werner and Ross, 2000).
– The discrete time version of the fundamental theorem of asset
prices goes back to Harrison and Kreps (1979) and Harrison and
Pliska (1981).
– The continuous time analog of this theorem has been derived by
Delbaen and Schachermayer (1994).
– An easier way to look at this theorem is provided in e.g. Filipović
(2009)[Chapter 4].
201
Martingales and Option Pricing (12)
Applied Probability
• Definition, Attainable Claim (see Lamberton and Lapeyre, 2008,
Definition 1.3.1):
– A contingent claim h is attainable if there exists an admissible
strategy H worth h at time T .
• Example: h = (ST − K)+. H is a linear combination of bonds and
shares such that the payoff is (ST − K)+.
• Definition, Complete Market (see Lamberton and Lapeyre, 2008,
Definition 1.3.3):
– The market is complete if every contingent claim is attainable.
202
Martingales and Option Pricing (13)
Applied Probability
• Theorem, (see Lamberton and Lapeyre, 2008, Theorem 1.3.4):
– A viable market is complete if and only if there exists a unique
probability measure P∗ equivalent to P, under which discounted
prices are martingales.
203
Martingales and Option Pricing (14)
Applied Probability
• Definition, Cox et al. (1979) Model (see Klenke, Definition 9.44):
– Consider an economy with a risky asset S and a risk free asset S^b.
– Let T ∈ N, a ∈ (−1, 0) and b ∈ (0, 1) as well as p ∈ (0, 1).
Further, let D1, . . . , DT be iid Rademacher random variables
with P(Dt = 1) = 1 − P(Dt = −1) = p. We let the initial price
of the risky asset be S0 = s0 > 0 and, for t = 1, . . . , T , define
St = (1 + b)St−1 if Dt = +1 , and St = (1 + a)St−1 if Dt = −1 .
– F0 = {∅, Ω}, F = 2^Ω.
204
Martingales and Option Pricing (15)
Applied Probability
• Definition, Cox et al. (1979) Model (see Klenke, Definition 9.44):
– The bond prices S_t^b are described by the deterministic function
S_t^b = (1 + r)^t, where r ≥ 0 with a < r < b is a fixed interest
rate.
– A European call option with payoff profile
h(ST) = max(0, ST − K) = (ST − K)^+ is written on the risky
asset described by S. K is called the strike price, T the
expiration date. π(VT) is the arbitrage free value of this financial
derivative.
205
Martingales and Option Pricing (16)
Applied Probability
• Let us start with T = 1. In this case the asset price is either
S1 = (1 + b)s0 or S1 = (1 + a)s0.
• The value process is VT(D1 = 1) = V1(D1 = 1) = ((1 + b)s0 − K)^+
with probability p and V1(D1 = −1) = ((1 + a)s0 − K)^+ with
probability 1 − p.
• From the above results we know that S̃ (or S′ in Klenke’s notation)
has to follow a martingale. I.e. we have to transform p such that S̃
follows a martingale, i.e.
s0 = (1/(1 + r)) (p*(1 + b)s0 + (1 − p*)(1 + a)s0) .
206
Martingales and Option Pricing (17)
Applied Probability
• Some algebra yields p* = (r − a)/(b − a) ∈ (0, 1), since a < r < b.
Hence P* ∼ P. The equivalent process S′ takes the value (1 + b)s0
with probability p* and (1 + a)s0 with probability 1 − p*; the
discounted price process is then a martingale.
• To obtain the value of the option we calculate the expected value of
VT under the equivalent measure P*. This yields
π(VT) = E_{p*}(VT)
= (1/(1 + r)) (p* ((1 + b)s0 − K)^+ + (1 − p*)((1 + a)s0 − K)^+) .
207
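The one-period case can be checked numerically; a Python sketch (the parameters s0, a, b, r, K are hypothetical, chosen only for illustration):

```python
# One-period Cox-Ross-Rubinstein pricing (hypothetical parameters).
s0, a, b, r, K = 100.0, -0.1, 0.2, 0.05, 100.0

# Risk-neutral probability p* = (r - a)/(b - a), chosen so that the
# discounted price is a martingale:
# s0 = (p*(1+b)s0 + (1-p*)(1+a)s0) / (1+r)
p_star = (r - a) / (b - a)

# Arbitrage-free call price: discounted payoff expectation under p*
up = max((1 + b) * s0 - K, 0.0)
down = max((1 + a) * s0 - K, 0.0)
price = (p_star * up + (1 - p_star) * down) / (1 + r)

print(round(p_star, 4))  # 0.5
print(round(price, 4))
# Martingale check: the discounted expected price equals s0.
disc = (p_star * (1 + b) * s0 + (1 - p_star) * (1 + a) * s0) / (1 + r)
print(abs(disc - s0) < 1e-9)  # True
```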
Martingales and Option Pricing (18)
Applied Probability
• For T ∈ N, Cox et al. (1979) derived:
• The price of a European call option is given by
π(VT) = E_{p*}(VT)
= s0/(1 + r)^T · Σ_{i=A}^{T} (T choose i) (p*)^i (1 − p*)^{T−i} (1 + b)^i (1 + a)^{T−i}
− K/(1 + r)^T · Σ_{i=A}^{T} (T choose i) (p*)^i (1 − p*)^{T−i} ,
where A := min{i ∈ N0 : (1 + b)^i (1 + a)^{T−i} s0 > K}.
208
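The T-period formula can be implemented in a few lines; a Python sketch (parameters are hypothetical; equivalently one sums the discounted payoff over all T + 1 terminal states, the max(·, 0) handles the threshold A implicitly):

```python
from math import comb

def crr_call_price(s0, a, b, r, K, T):
    """European call price in the Cox-Ross-Rubinstein binomial model."""
    p = (r - a) / (b - a)  # risk-neutral probability p*
    price = 0.0
    for i in range(T + 1):
        # Terminal price after i up-moves and T - i down-moves:
        ST = (1 + b) ** i * (1 + a) ** (T - i) * s0
        payoff = max(ST - K, 0.0)  # zero below the threshold A
        price += comb(T, i) * p ** i * (1 - p) ** (T - i) * payoff
    return price / (1 + r) ** T

# Sanity check against the one-period case worked out above:
one = crr_call_price(100.0, -0.1, 0.2, 0.05, 100.0, 1)
print(abs(one - 0.5 * 20.0 / 1.05) < 1e-12)  # True
print(round(crr_call_price(100.0, -0.1, 0.2, 0.05, 100.0, 3), 4))
```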
Martingales and Efficient Markets
Applied Probability
• Remark:
– In the above example martingales have been used to price
derivatives.
– On the martingale property of asset prices see e.g. Lucas (1978)
and Duffie (2001).
– Regarding efficient capital market literature see e.g. LeRoy (1989)
or Campbell et al. (1997) and the literature cited there.
209
Optional Sampling Theorems
Applied Probability
• Motivation:
– If X is a martingale, then the martingale transform (H · X)
provides us with a further martingale.
– Does this also hold for a stopped process?
– In less mathematical terms, by the stability theorem we observe
that a fair game (= martingale) cannot be transformed into an
unfair game by some gambling strategy H.
– The optional sampling theorems investigate this issue for
processes stopped at random times.
210
Doob Decomposition (1)
Applied Probability
• Let X = (Xn)n∈N0 be an adapted process with E(|Xn|) < ∞ for all
n ∈ N0.
• We try to decompose X into a martingale and a predictable process,
i.e. for n ∈ N0,
Mn := X0 + Σ_{k=1}^{n} (Xk − E(Xk|Fk−1))
and
An := Σ_{k=1}^{n} (E(Xk|Fk−1) − Xk−1) .
• Xn = Mn + An. M is a martingale, A is predictable with A0 = 0.
211
Doob Decomposition (2)
Applied Probability
• Theorem, Doob decomposition (see Klenke, Theorem 10.1):
– Let X = (Xn)n∈N0 be an adapted process with E(|Xn|) < ∞ for
all n ∈ N0. Then there exists a unique decomposition
X = M + A, where A is predictable with A0 = 0 and M is a
martingale. This representation of X is called the Doob
decomposition. X is a submartingale if and only if A is
monotone increasing.
212
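For the squared symmetric random walk Xn = Wn^2 the Doob decomposition is explicit: since E(Wk^2 | Fk−1) = Wk−1^2 + 1, the predictable part is An = n and the martingale part is Mn = Wn^2 − n. A Python sketch checking this pathwise (seed and horizon are arbitrary choices):

```python
import random

# Doob decomposition of X_n = W_n^2 for a symmetric random walk W.
# E(W_k^2 | F_{k-1}) = W_{k-1}^2 + 1 gives A_n = n (predictable,
# increasing, A_0 = 0) and the martingale part M_n = W_n^2 - n.
rng = random.Random(1)
W = [0]
for _ in range(100):
    W.append(W[-1] + rng.choice((-1, 1)))

X = [w * w for w in W]                 # the submartingale X_n = W_n^2
A = list(range(len(W)))                # A_n = n
M = [x - n for n, x in enumerate(X)]   # M_n = X_n - A_n

# X_n = M_n + A_n holds pathwise, and A is monotone increasing,
# consistent with X being a submartingale:
print(all(X[n] == M[n] + A[n] for n in range(len(W))))  # True
```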
Doob Decomposition (3)
Applied Probability
• Definition, Square variation process (see Klenke, Definition 10.3):
– Let X = (Xn)n∈I be a square integrable F martingale. The unique
predictable process A for which (Xn2 − An)n∈I becomes a
martingale is called the square variation process of X and is
denoted by (hXin)n∈I = A.
213
Doob Decomposition (4)
Applied Probability
• Theorem, Square variation process (see Klenke, Theorem 10.4):
– Let X = (Xn)n∈I be a square integrable F martingale. Then for
n ∈ N0,
⟨X⟩n = Σ_{i=1}^{n} E((Xi − Xi−1)^2|Fi−1)
and
E(⟨X⟩n) = V(Xn − X0) .
214
Doob Decomposition (5)
Applied Probability
• Discuss Examples 10.2, 10.6 and 10.7.
• The square variation (quadratic variation) is an important
concept/property of processes when continuous time martingales are
investigated (see e.g. Klenke, 2008, Chapter 21).
215
Optional Sampling and Stopping (1)
Applied Probability
• Theorem, (see Klenke, Lemma 10.10):
– Let I ⊂ R be countable, let X = (Xt)t∈I be a martingale, let
T ∈ I and let τ be a stopping time with τ ≤ T . Then
Xτ = E(XT |Fτ ) and E(Xτ ) = E(XT ) .
• Note that this theorem requires that τ is bounded by some T ∈ I.
216
Optional Sampling and Stopping (2)
Applied Probability
• Theorem, Optional Sampling Theorem (see Klenke, Theorem
10.11): Let X = (Xn)n∈N0 be a supermartingale and let σ ≤ τ be
stopping times.
– Assume there exists a T ∈ N with τ ≤ T . Then
Xσ ≥ E(Xτ |Fσ ) ,
and, in particular E (Xσ ) ≥ E (Xτ ). If X is a martingale, then
equality holds in each case.
217
Optional Sampling and Stopping (3)
Applied Probability
• Theorem, Optional Sampling Theorem (see Klenke, Theorem
10.11): Let X = (Xn)n∈N0 be a supermartingale and let σ ≤ τ be
stopping times.
– If X is nonnegative and if τ < ∞ almost surely, then we have
E(Xτ ) ≤ E(X0) < ∞, E(Xσ ) ≤ E(X0) < ∞, and
Xσ ≥ E(Xτ |Fσ ).
– Assume that more generally X is only adapted and integrable.
Then X is a martingale if and only if E(Xτ ) = E(X0) for any
bounded stopping time τ .
218
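The identity E(Xτ) = E(X0) for a martingale and a bounded stopping time can be checked by simulation; a Python sketch (assumed setup: the symmetric random walk with τ = first hitting time of ±3, capped at T = 50, and an arbitrary seed):

```python
import random

# Optional stopping check: for the martingale X (symmetric random
# walk) and the bounded stopping time
#   tau = min(first n with |X_n| = 3, 50),
# the theorem gives E(X_tau) = E(X_0) = 0.
def stopped_value(rng, level=3, T=50):
    x = 0
    for n in range(1, T + 1):
        x += rng.choice((-1, 1))
        if abs(x) == level:   # tau is a stopping time: the decision
            return x          # uses only the path up to time n
    return x                  # tau is capped at T, hence tau <= T

rng = random.Random(123)
N = 20000
mean = sum(stopped_value(rng) for _ in range(N)) / N
print(abs(mean) < 0.1)  # Monte Carlo estimate of E(X_tau), close to 0
```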
Optional Sampling and Stopping (4)
Applied Probability
• Definition, Stopped Process (see Klenke, Definition 10.13):
– Let I ⊂ R be countable, let (Xt)t∈I be adapted and let τ be a
stopping time. We define the stopped process X^τ by
X_t^τ := X_{τ∧t}
for any t ∈ I. Further F^τ is the filtration (F_t^τ ) = (F_{t∧τ}).
• F_{t∧τ} could be σ(X_{t∧τ}).
• X_t^τ is adapted to F^τ and F.
219
Optional Sampling and Stopping (5)
Applied Probability
• Theorem, Optional Stopping (see Klenke, Theorem 10.15):
– Let X = (Xn)n∈N0 be a (sub-, super-) martingale with respect to
F and let τ be a stopping time. Then X τ is a (sub-, super-)
martingale with respect to F and Fτ .
• Discuss the examples 10.16, 10.17 and 10.19.
220
Optional Sampling and Stopping (6)
Applied Probability
• Until now we have considered bounded stopping times. To obtain an
optional sampling result with unbounded stopping times, stronger
assumptions on the stochastic process X become necessary ⇒ uniform
integrability.
• Theorem, (see Klenke, Lemma 10.20):
– Let X = (Xn)n∈N0 be a uniformly integrable martingale. Then the
family (Xτ : τ is a finite stopping time) is uniformly integrable.
• Note that bounded means τ ≤ T for some T, i.e. P(τ ≤ T ) = 1,
while finite means τ < ∞ almost surely, i.e. P(τ < ∞) = 1.
221
Optional Sampling and Stopping (7)
Applied Probability
• Theorem, Optional Sampling and Uniform Integrability (see Klenke,
Theorem 10.21):
– Let X = (Xn)n∈N0 be a uniformly integrable martingale
(respectively supermartingale) and let σ ≤ τ be finite stopping
times. Then E(|Xτ |) < ∞ and
Xσ = E (Xτ |Fσ )
respectively Xσ ≥ E (Xτ |Fσ ).
222
Optional Sampling and Stopping (8)
Applied Probability
• Theorem, (see Klenke, Corollary 10.22):
– Let X = (Xn)n∈N0 be a uniformly integrable martingale
(respectively supermartingale) and let τ1 ≤ τ2 ≤ . . . be finite
stopping times. Then (Xτn )n∈N is a martingale (respectively
supermartingale).
223
Martingale Convergence
Applied Probability
• Motivation:
– In the former section we observed that we obtain a martingale
from a martingale when the martingale transform (H · X) is
applied or by optional stopping.
– I.e. we cannot transform a fair game into an unfair game.
– Now we investigate this question when t → ∞.
224
Doob’s Inequality (1)
Applied Probability
• Let I ⊂ N0 and let X = (Xn)n∈I be a stochastic process. For n ∈ N
we define Xn* = sup{Xk : k ≤ n} and |X|n* = sup{|Xk| : k ≤ n}.
• Theorem, (see Klenke, Lemma 11.1):
– Let X be a submartingale. Then for all λ > 0,
λ P(Xn* ≥ λ) ≤ E(Xn 1_{Xn* ≥ λ}) ≤ E(|Xn| 1_{Xn* ≥ λ}) .
225
Doob’s Inequality (2)
Applied Probability
• Theorem, Doob’s Lp-inequality (see Klenke, Theorem 11.2): Let X
be a martingale or a positive submartingale.
– For any p ≥ 1 and λ > 0,
λpP (|X|∗n ≤ λ) ≤ E (|Xn|p) .
– For any p > 1
p p
p
≥ λ) ≤
 E (|Xn | ) .
p−1

p
E (|Xn| ) ≤
E ((|X|∗n)p




226
Martingale Convergence Theorems (1)
Applied Probability
• Motivation and notation - upcrossing inequality:
– F = (Fn)n∈N0, F∞ = σ(∪_{n∈N0} Fn). (Xn)n∈N0 is real valued and
adapted to F.
– a, b ∈ R with a < b.
– An upcrossing occurs when X crosses the interval [a, b] from
below.
227
Martingale Convergence Theorems (2)
Applied Probability
• Motivation and notation - upcrossing inequality:
– In more detail: set σ0 := 0 and, for k ∈ N,
τk := inf{n ≥ σk−1 : Xn ≤ a} and σk := inf{n ≥ τk : Xn ≥ b} ,
with τk = ∞ if σk−1 = ∞, and σk = ∞ if τk = ∞.
– X has its kth upcrossing over [a, b] between τk and σk if
σk < ∞.
– For n ∈ N, we define the number of upcrossings over [a, b] up to
time n by
U_n^{a,b} := sup{k ∈ N0 : σk ≤ n} .
228
Martingale Convergence Theorems (3)
Applied Probability
• Theorem, Upcrossing inequality (see Klenke, Lemma 11.3):
– Let (Xn)n∈N0 be a submartingale. Then
E(U_n^{a,b}) ≤ (E((Xn − a)^+) − E((X0 − a)^+)) / (b − a) .
229
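The counting of upcrossings via the stopping times τk and σk can be traced on a concrete path; a Python sketch (the path and the interval [a, b] = [0, 2] are hypothetical, chosen only for illustration):

```python
# Count the upcrossings U_n^{a,b} of a path over [a, b]: the k-th
# upcrossing is completed when the path, having been <= a since the
# previous one (time tau_k), next reaches >= b (time sigma_k).
def upcrossings(path, a, b):
    count, below = 0, False
    for x in path:
        if x <= a:
            below = True           # tau_k has occurred: X_n <= a
        elif x >= b and below:
            count += 1             # sigma_k has occurred: X_n >= b
            below = False
    return count

# A small deterministic path with [a, b] = [0, 2]: it falls to 0 or
# below and then climbs back to 2 three separate times.
path = [1, 0, -1, 1, 2, 3, 0, 2, 1, -1, 0, 2]
print(upcrossings(path, a=0, b=2))  # 3
```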
Martingale Convergence Theorems (4)
Applied Probability
• Theorem, Martingale convergence theorem (see Klenke, Theorem
11.4):
– Let (Xn)n∈N0 be a submartingale with
sup{E(Xn^+) : n ≥ 0} < ∞. Then there exists an F∞-measurable
random variable X∞ with E(|X∞|) < ∞ and Xn → X∞ almost
surely as n → ∞.
230
Martingale Convergence Theorems (5)
Applied Probability
• Theorem, (see Klenke, Corollary 11.5):
– Let (Xn)n∈N0 be a nonnegative supermartingale. Then there is an
F∞-measurable random variable X∞ ≥ 0 with E(X∞) ≤ E(X0)
and Xn → X∞ almost surely as n → ∞.
231
Martingale Convergence Theorems (6)
Applied Probability
• Example: St. Petersburg game
– Let Sn be the account balance in the St. Petersburg game
(Example 9.40). Then S is a martingale.
– Sn ≤ 1 almost surely for any n, hence sup{E (Sn+) : n ≥ 0} ≤ 1 < ∞.
– Therefore the requirements of the martingale convergence theorem
are fulfilled. By this, (Sn) converges to a finite random variable
almost surely. In the St. Petersburg game (Sn) converges to 1
almost surely.
– Since E(Sn) = 0 for all n ∈ N0 we do not obtain L1 convergence.
We also know that Sn is integrable but not uniformly integrable.
232
Martingale Convergence Theorems (7)
Applied Probability
• Theorem, Convergence theorem for uniformly integrable martingales
(see Klenke, Corollary 11.7): Let (Xn)n∈N0 be a uniformly integrable
F− (sub-, super-) martingale. Then there exists an F∞-measurable
integrable random variable X∞ with Xn → X∞ almost surely and
in L1 as n → ∞. Furthermore
– Xn = E (X∞|Fn) for all n ∈ N0 if X is a martingale.
– Xn ≤ E (X∞|Fn) for all n ∈ N0 if X is a submartingale.
– Xn ≥ E (X∞|Fn) for all n ∈ N0 if X is a supermartingale.
233
Martingale Convergence Theorems (8)
Applied Probability
• Theorem, Lp convergence theorem for martingales (see Klenke,
Corollary 11.7):
Let p > 1 and let (Xn)n∈N0 be an Lp-bounded martingale. Then
there exists an F∞-measurable integrable random variable X∞
with E (|X∞|p) < ∞ and Xn → X∞ almost surely and in Lp as
n → ∞. In particular, (|Xn|p)n∈N0 is uniformly integrable.
234
Outline - Markov Chains
Applied Probability
• Markov Chains - definitions.
• Discrete Markov Chains.
• Recurrence and transience.
• Random Walks
• Invariant distributions, periodicity and convergence
• Markov chain Monte Carlo methods
• Klenke, Chapters 17 and 18.
235
Definitions and Construction (1)
Applied Probability
• E is a Polish space with Borel σ-algebra B(E), I ⊂ R is some index
set, (Xt)t∈I is an E-valued stochastic process. F = (Ft)t∈I = σ(X)
if not otherwise stated.
• Definition, Markov Property (see Klenke, Definition 17.1):
– We say that X has the Markov Property if, for every A ∈ B(E)
and all s, t ∈ I, with s ≤ t:
P(Xt ∈ A|Fs) = P(Xt ∈ A|Xs) .
• If E is a countable space, then X has the Markov property if and only
if for all n ∈ N, all s1 < · · · < sn < t and all i1, . . . , in, i ∈ E with
P(Xs1 = i1, . . . , Xsn = in) > 0 we have
P(Xt = i|Xs1 = i1, . . . , Xsn = in) = P(Xt = i|Xsn = in) .
236
Definitions and Construction (2)
Applied Probability
• Definition, (see Klenke, Definition 17.3): Let I ⊂ [0, ∞) be closed
under addition and assume 0 ∈ I. A stochastic process X is called
time-homogeneous Markov Process with distributions (Px)x∈E on
the space (Ω, A) if:
– For every x ∈ E, X is a stochastic process on the probability
space (Ω, A, Px) with Px(X0 = x) = 1.
– The map κ : E × B(E)⊗I → [0, 1], (x, B) 7→ Px(X ∈ B) is a
stochastic kernel.
237
Definitions and Construction (3)
Applied Probability
• Definition, (see Klenke, Definition 17.3): Let I ⊂ [0, ∞) be closed
under addition and assume 0 ∈ I. A stochastic process X is called
time-homogeneous Markov Process with distributions (Px)x∈E on
the space (Ω, A) if:
– X has the time homogeneous Markov property: For every
A ∈ B(E) every x ∈ E and all s, t ∈ I we have
Px(Xt+s ∈ A|Fs) = κt(Xs, A), Px almost surely. Here for every
t ∈ I the transition kernel κt : E × B(E) → [0, 1] is the
stochastic kernel defined for x ∈ E and A ∈ B(E) by
κt(x, A) := κ(x, {y ∈ E^I : y(t) ∈ A}) = Px(Xt ∈ A) .
The family (κt(x, A)), t ∈ I, x ∈ E, A ∈ B(E), is also called the
family of transition probabilities of X.
238
Definitions and Construction (4)
Applied Probability
• Definition, (see Klenke, Definition 17.3): Let I ⊂ [0, ∞) be closed
under addition and assume 0 ∈ I. A stochastic process X is called
time-homogeneous Markov Process with distributions (Px)x∈E on
the space (Ω, A).
– We write Ex for the expectation with respect to Px, L(X) = Px
and Lx(X|F) = Px(X ∈ .|F) for a (regular) conditional
distribution of X given F.
– If E is countable, then X is called discrete Markov process.
– In the special case I = N0, X is called Markov chain. In this
case κn is called the family of n-step transition probabilities.
• We shall observe that the existence of the transition kernels κt
results in the existence of the kernel κ.
• Discuss the examples 17.5 and 17.7.
239
Definitions and Construction (5)
Applied Probability
• Definition, Transition kernel (see Klenke, Definition 8.24): Let
(Ω1, A1) and (Ω2, A2) be measurable spaces. A map
κ : Ω1 × A2 → [0, ∞] is called a (σ-) finite transition kernel from
Ω1 to Ω2 if
– ω1 7→ κ(ω1, A2) is A1-measurable for any A2 ∈ A2.
– A2 ↦ κ(ω1, A2) is a (σ-) finite measure on (Ω2, A2) for any
ω1 ∈ Ω1.
If the measure κ(ω1, ·) is a probability measure for all ω1 ∈ Ω1,
then κ is called a stochastic kernel or Markov kernel.
240
Definitions and Construction (6)
Applied Probability
• The next theorem constructs a Markov process for a more general Markov
semigroup of stochastic kernels. (I.e. we consider a set of kernels K, with elements
κt, and a binary operation - in our case κs ∗ κt - for which the semigroup axioms
hold (in particular ∗ is associative)).
• Theorem, (see Klenke, Theorem 17.8):
– Let I ⊂ [0, ∞) be closed under addition and let (κt)t∈I be a Markov semigroup
of stochastic kernels from E to E . Then there is a measurable space (Ω, A) and
a Markov process ((Xt)t∈I , (Px)x∈E ) on the space (Ω, A) with transition probabilities
Px(Xt ∈ A) = κt(x, A)
for all x ∈ E , A ∈ B(E) and t ∈ I . Conversely, for every Markov process X
the above equation defines a semigroup of stochastic kernels. By this equation
the finite dimensional distributions of X are uniquely determined.
241
Definitions and Construction (7)
Applied Probability
• Theorem, (see Klenke, Theorem 17.9):
– A stochastic process is a Markov process if and only if there exists
a stochastic kernel κ : E × B(E)⊗I → [0, 1] such that for every
bounded B(E)⊗I − B(E)-measurable function f : E I → R and
for every s ≥ 0 and x ∈ E we have
Ex(f ((Xt+s)t∈I )|Fs) = EXs (f (X)) := ∫_{E^I} κ(Xs, dy)f (y) .
242
Definitions and Construction (8)
Applied Probability
• Theorem, (see Klenke, Corollary 17.10):
– A stochastic process (Xn)n∈N0 is a Markov process if and only if
Lx((Xn+k )n∈N0 |Fk ) = LXk ((Xn)n∈N0 )
for every k ∈ N0.
243
Definitions and Construction (9)
Applied Probability
• Theorem, (see Klenke, Theorem 17.11):
– Let I = N0. If (Xn)n∈N0 is a stochastic process with distributions (Px, x ∈ E),
then the Markov property in Definition 17.3(iii) is implied by the existence of a
stochastic kernel κ1 : E × B(E) → [0, 1] with the property that for every
A ∈ B(E), every x ∈ E and every s ∈ I , we have
P(x)(Xs+1 ∈ A|Fs) = κ1(Xs, A) .
In this case the n-step transition kernel κn can be computed inductively by
κn = κn−1 ∗ κ1 = ∫_E κn−1(· , dx)κ1(x, ·) .
In particular the family (κn)n∈N is a Markov semigroup and the distribution of X
is uniquely determined by κ1.
244
Definitions and Construction (10)
Applied Probability
• Definition, Strong Markov Property (see Klenke, Definition 17.12):
– Let I ⊂ [0, ∞) be closed under addition. A Markov process
(Xt)t∈I with distributions (P(x), x ∈ E) has the strong Markov
property if, for every a.s. finite stopping time τ , every bounded
B(E)⊗I − B(E)-measurable function f : E I → R and for every
x ∈ E we have
Ex(f ((Xτ +s)s∈I )|Fτ ) = EXτ (f (X)) := ∫_{E^I} κ(Xτ , dy)f (y) .
245
Definitions and Construction (11)
Applied Probability
• Theorem, (see Klenke, Theorem 17.14):
– If I ⊂ [0, ∞) is countable and closed under addition, then every
Markov process (Xn)n∈I with distributions (P(x), x ∈ E) has the
strong Markov property.
246
Definitions and Construction (12)
Applied Probability
• Theorem, Reflection Principle (see Klenke, Theorem 17.15):
– Let Y1, Y2, . . . be iid real random variables with symmetric
distribution L(Y1) = L(−Y1). Define X0 := 0 and Xn := Y1 + · · · + Yn
for n ∈ N. Then for every n ∈ N0 and a > 0
P ( max_{m≤n} Xm ≥ a ) ≤ 2P (Xn ≥ a) − P (Xn = a) .
If we have P (Y1 ∈ {−1, 0, 1}) = 1, then for a ∈ N equality
holds.
247
Discrete Markov Chains (1)
Applied Probability
• In the following E is countable and I = N0.
• X = (Xn)n∈N0 on E is a discrete Markov chain or Markov chain with
discrete state space.
• If X is discrete then (Px)x∈E is described by the transition matrix
P = (p(x, y))x,y∈E = (Px[X1 = y])x,y∈E .
• The n-step transition probabilities are p(n)(x, y) = Px[Xn = y].
248
Discrete Markov Chains (2)
Applied Probability
• The n-step transition probabilities can be obtained by the n-fold
matrix product
p(n)(x, y) = pn(x, y) ,
where pn(x, y) = Σ_{z∈E} pn−1(x, z)p(z, y) and p0 = I is the identity
matrix.
• Induction results in the Chapman-Kolmogorov equation: For all
m, n ∈ N0 and x, y ∈ E we have
p(m+n)(x, y) = Σ_{z∈E} p(m)(x, z)p(n)(z, y) .
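The Chapman-Kolmogorov equation can be verified numerically via matrix products. A minimal pure-Python sketch (not from the slides; the 3-state matrix is a hypothetical example) checks that p^(m+n) = p^(m) p^(n):

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(P, n):
    """n-fold matrix product P^n; P^0 is the identity matrix."""
    size = len(P)
    R = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        R = matmul(R, P)
    return R

# a hypothetical 3-state transition matrix
P = [[0.5, 0.5, 0.0],
     [0.5, 0.25, 0.25],
     [0.0, 0.5, 0.5]]

# Chapman-Kolmogorov: p^(m+n)(x, y) = sum_z p^(m)(x, z) p^(n)(z, y)
lhs = matpow(P, 5)
rhs = matmul(matpow(P, 2), matpow(P, 3))
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(3) for j in range(3))
```

Each power of a stochastic matrix is again stochastic, so the rows of `matpow(P, n)` sum to one up to rounding.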
249
Discrete Markov Chains (3)
Applied Probability
• Definition, Stochastic matrix (see Klenke, Definition 17.16):
– A matrix (p(x, y))x,y∈E with nonnegative entries and with
Σ_{y∈E} p(x, y) = 1
for all x ∈ E is called a stochastic matrix.
250
Discrete Markov Chains (4)
Applied Probability
• Remark: Stochastic matrix
– A stochastic matrix is a stochastic kernel from E to E.
– By Theorem 17.8 there exists a unique discrete Markov chain with
these transition probabilities.
– Example: (Rn(x), x ∈ E, n ∈ N0) is an independent family of
random variables with values in E and distributions
P(Rn(x) = y) = p(x, y) for all x, y ∈ E and n ∈ N0. We did not
require that (Rn(x), x ∈ E) are independent. We also did not
require that the Rn have the same distribution. Only the
one-dimensional marginal distributions are determined.
– With x ∈ E we define X0^x := x and Xn^x := Rn(X^x_{n−1}) for n ∈ N.
– Notation: Px := L(X^x) is the distribution of X^x. This is a
probability measure on the space of sequences (E^{N0}, B(E)^{⊗N0}).
251
Discrete Markov Chains (5)
Applied Probability
• Theorem, (see Klenke, Theorem 17.17):
– With respect to the distribution (Px)x∈E the canonical process X
on (E N0 , B(E)⊗N0 ) is a Markov chain with transition matrix P.
– In particular, for any stochastic matrix p, there corresponds a
unique discrete Markov chain X with transition probabilities p.
252
Discrete Markov Chains (6)
Applied Probability
• Example, Random Walk (see Klenke, Example 17.18):
– Let E = Z and assume that p(x, y) = p(0, y − x) for all x, y ∈ Z.
– Here p is translation invariant.
– A discrete Markov chain X with this transition probability matrix
p is a random walk on Z.
– Xn has the same distribution as X0 + Z1 + · · · + Zn where (Zn)
are iid with P(Zn = x) = p(0, x).
253
Discrete Markov Chains (7)
Applied Probability
• Example, Simulation (see Klenke, Example 17.19):
– Consider a finite state space E = {1, . . . , k}.
– We want to simulate X with transition matrix P.
– (Un)n∈N is a sequence of iid uniform random numbers on [0, 1).
– Define r(i, 0) = 0, r(i, j) = p(i, 1) + · · · + p(i, j) for i, j ∈ E.
Define Rn by
Rn(i) = j ⇔ Un ∈ [r(i, j − 1), r(i, j))
– Then P(Rn(i) = j) = r(i, j) − r(i, j − 1) = p(i, j).
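The update Rn(i) = j ⇔ Un ∈ [r(i, j−1), r(i, j)) is an inverse-CDF step over the cumulative row sums. A minimal sketch (not from the slides; the function name and the example row are my own):

```python
def next_state(row, u):
    """Return j (1-based) with u in [r(i, j-1), r(i, j)), where
    r(i, j) = p(i,1) + ... + p(i,j) is the cumulative row sum."""
    cum = 0.0
    for j, p in enumerate(row, start=1):
        cum += p
        if u < cum:
            return j
    return len(row)  # guard against rounding of the row sum

# hypothetical row (p(i,1), p(i,2), p(i,3)) = (0.5, 0.25, 0.25):
# thresholds r(i, .) are 0, 0.5, 0.75, 1
print(next_state([0.5, 0.25, 0.25], 0.60))  # 2
```

Iterating this step with fresh uniforms Un (e.g. `random.random()`) produces a path of the chain started in any state.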
254
Discrete Processes in Continuous Time (1)
Applied Probability
• E is countable.
• (Xt)t∈[0,∞) is a process on E with transition matrices
pt(x, y) := Px(Xt = y) for x, y ∈ E and t ≥ 0.
255
Discrete Processes in Continuous Time (2)
Applied Probability
• For x ≠ y the process X jumps from x to y with rate q(x, y) if the
following limit exists:
q(x, y) = lim_{t↓0} (1/t) Px(Xt = y) .
• Assume that these limits exist and that Σ_{y≠x} q(x, y) < ∞.
• We define q(x, x) := −Σ_{y≠x} q(x, y).
• With this convention
lim_{t↓0} (1/t) (Px(Xt = y) − 1_{(x=y)}) = q(x, y)
for all x, y ∈ E.
256
Discrete Processes in Continuous Time (3)
Applied Probability
• Definition, Generator (see Klenke, Definition 17.23):
– If q(x, y) = lim_{t↓0} (1/t) Px(Xt = y) for x ≠ y,
q(x, x) = −Σ_{y≠x} q(x, y) and
lim_{t↓0} (1/t) (Px(Xt = y) − 1_{(x=y)}) = q(x, y) hold, then
Q = (q(x, y))x,y∈E is called the Q-matrix of X or the generator
of the semigroup (pt)t≥0.
257
Discrete Processes in Continuous Time (4)
Applied Probability
• Theorem, Generator (see Klenke, Theorem 17.25):
– Let Q be an E × E matrix such that q(x, y) ≥ 0 for all x, y ∈ E
with x ≠ y and q(x, x) = −Σ_{y≠x} q(x, y). Assume that
λ := sup_{x∈E} |q(x, x)| < ∞.
Then Q is the Q-matrix of a unique Markov process X.
258
Discrete Processes in Continuous Time (5)
Applied Probability
• Example, Poisson Process (see Klenke, Example 17.24):
– A Poisson process with intensity α has the Q-matrix
q(x, y) = α(1(y=x+1) − 1(y=x)).
• We meet continuous time Markov chains with discrete state space
e.g. in credit risk modeling (see e.g. Schönbucher, 2003,
Chapter 8.2).
259
Poisson Process (1)
Applied Probability
• Example, Geiger counter: the number of clicks in a time interval
I = (a, b] should be a random variable. The number of clicks should
be
– Independent for disjoint intervals.
– Homogeneous: I.e. when the interval is shifted by some constant
c ∈ R then the distribution remains the same.
– The number of clicks should have finite expectation.
– At one point of time there should be only one click.
260
Poisson Process (2)
Applied Probability
• In more formal terms: I = {(a, b] : a, b ∈ [0, ∞], a ≤ b},
`((a, b]) = b − a. I ∈ I and NI ∈ N0, Nt = N(0,t] ... the number of
jumps until t. (NI , I ∈ I) is a family of random variables with:
P1 NI∪J = NI + NJ if I ∩ J = ∅ and I ∪ J ∈ I.
P2 The distribution of NI only depends on the length of the interval
I. I.e. PNI = PNJ if `(I) = `(J).
P3 If J ⊂ I with I ∩ J = ∅ for all I, J ∈ J with I 6= J, then
(NJ , J ∈ J ) is an independent family.
261
Poisson Process (3)
Applied Probability
• In more formal terms: I = {(a, b] : a, b ∈ [0, ∞], a ≤ b},
`((a, b]) = b − a. I ∈ I and NI ∈ N0, Nt = N(0,t] ... the number of
jumps until t. (NI , I ∈ I) is a family of random variables with:
P4 For any I ∈ I we have E(NI ) < ∞.
P5 lim supε↓0 ε−1P(Nε ≥ 2) = 0.
• Let λ := lim supε↓0 ε−1P(Nε ≥ 2) (= 0 by P5). Then:
262
Poisson Process (4)
Applied Probability


P (double click in (0, 1])
= lim_{n→∞} P ( ∪_{k=0}^{2^n−1} {N_{(k2^{−n},(k+1)2^{−n}]} ≥ 2} )
= 1 − lim_{n→∞} P ( ∩_{k=0}^{2^n−1} {N_{(k2^{−n},(k+1)2^{−n}]} ≤ 1} )
= 1 − lim_{n→∞} ∏_{k=0}^{2^n−1} P (N_{(k2^{−n},(k+1)2^{−n}]} ≤ 1)   (by P3)
= 1 − lim_{n→∞} (1 − P (N_{(0,2^{−n}]} ≥ 2))^{2^n}   (by P2)
= 1 − lim_{n→∞} (1 − 2^{−n} [2^n P (N_{(0,2^{−n}]} ≥ 2)])^{2^n}
≤ 1 − e^{−λ} .
263
Poisson Process (5)
Applied Probability
• Definition, Poisson Process (see Klenke, Definition 5.33): A family
(Nt, t ≥ 0) of N0-valued random variables is called a Poisson
process with intensity α ≥ 0 if N0 = 0 and if:
– For any n ∈ N and any choice of n + 1 numbers
0 = t0 < t1 < · · · < tn the family (Nti − Nti−1 , i = 1, . . . , n) is
independent.
– For t > s ≥ 0 the difference Nt − Ns is Poisson distributed with
parameter α(t − s), that is
P(Nt − Ns = k) = e^{−α(t−s)} (α(t − s))^k / k!
for all k ∈ N0.
264
Poisson Process (6)
Applied Probability
• Theorem, (see Klenke, Theorem 5.34):
– If (NI , I ∈ I) has the properties P1 to P5, then (N(0,t], t ≥ 0) is a
Poisson process with intensity α := E(N(0,1]). If (Nt) is a Poisson
process, then (Nt − Ns, (s, t] ∈ I) has the properties P1 to P5.
265
Poisson Process (7)
Applied Probability
• Example - Geiger counter
– The waiting time between clicks is the time during which the process
does not jump. Hence, P(N(s,s+t] = 0) = P(Ns+t − Ns = 0) = e−αt.
– The waiting times are independent, since Nt − Ns and Ns − Nr
are independent for any t > s > r.
– The Poisson process is a process with independent and stationary
increments.
266
Poisson Process (8)
Applied Probability
• Ad existence:
– Let W1, W2, . . . be an independent family of exponentially
distributed random variables with parameter α > 0, i.e.
P(Wn > x) = e−αx.
– Define Tn = Σ_{k=1}^{n} Wk and interpret Wn as the waiting time
between jump n − 1 and jump n. Tn is the time where the process
jumps from n − 1 to n.
– Let Nt = Σ_{n≥1} 1_{(Tn≤t)}.
– Hence, {Nt = k} = {Tk ≤ t < Tk+1}.
– Since {Nt ≥ k} ∈ Ft, the jump times Tk are stopping times.
– The process (Nt) is right-continuous and non-decreasing.
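This construction can be implemented directly with the standard library's exponential sampler; the sketch below (not from the slides; the function name is my own) draws N_t by summing Exp(α) waiting times until they exceed t:

```python
import random

def poisson_count(alpha, t, rng):
    """Construct N_t = sum_n 1{T_n <= t} from iid Exp(alpha)
    waiting times W_1, W_2, ... with T_n = W_1 + ... + W_n."""
    n, T = 0, 0.0
    while True:
        T += rng.expovariate(alpha)  # next waiting time W_{n+1}
        if T > t:
            return n
        n += 1

rng = random.Random(1)
draws = [poisson_count(2.0, 1.0, rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # should be close to alpha * t = 2.0
```

By Theorem 5.35 the resulting counts are Poisson(αt) distributed, which the empirical mean above reflects.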
267
Poisson Process (9)
Applied Probability
• Theorem, (see Klenke, Theorem 5.35):
– Given the above construction of (Nt), the family (Nt, t ≥ 0) is a
Poisson process with intensity α.
• We meet the Poisson process e.g. in continuous time option pricing
models (see e.g. Lamberton and Lapeyre, 2008, Chapter 7).
268
Rating Model (1)
Applied Probability
• We meet continuous time Markov chains with discrete state space
e.g. in credit risk modeling (see e.g. Schönbucher, 2003,
Chapter 8.2).
• Standard & Poor’s rating transition matrix (rows: current rating,
columns: rating after one year; entries in percent),
(p(x, y))x,y∈E = P(0,1] =

        AAA     AA      A     BBB     BB      B    CCC      D
AAA   89.10   9.63   0.78   0.19   0.30   0.00   0.00   0.00
AA     0.86  90.10   7.47   0.99   0.29   0.29   0.00   0.00
A      0.09   2.91  88.94   6.49   1.01   0.45   0.00   0.09
BBB    0.06   0.43   6.56  84.27   6.44   1.60   0.18   0.45
BB     0.04   0.22   0.79   7.19  77.64  10.43   1.27   2.41
B      0.00   0.19   0.31   0.66   5.17  82.46   4.35   6.85
CCC    0.00   0.00   1.16   1.16   2.03   7.54  64.93  23.19
D      0.00   0.00   0.00   0.00   0.00   0.00   0.00 100.00
270
Rating Model (2)
Applied Probability
• After we take the matrix logarithm (see e.g. Horn and Johnson,
1990), we obtain the generator matrix Q:

        AAA       AA        A      BBB       BB        B      CCC        D
AAA  −0.1159   0.1075   0.0042   0.0013   0.0034  −0.0004   0.0000   0.0000
AA    0.0096  −0.1061   0.0832   0.0081   0.0026   0.0029  −0.0001  −0.0002
A     0.0008   0.0324  −0.1214   0.0746   0.0090   0.0040  −0.0003   0.0006
BBB   0.0006   0.0036   0.0756  −0.1775   0.0790   0.0140   0.0014   0.0033
BB    0.0004   0.0022   0.0058   0.0885  −0.2612   0.1295   0.0138   0.0208
B     0.0000   0.0021   0.0027   0.0047   0.0640  −0.1998   0.0590   0.0673
CCC   0.0000  −0.0004   0.0144   0.0136   0.0245   0.1013  −0.4353   0.2820
D     0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
271
Rating Model (3)
Applied Probability
• Note that P(0,1] = exp(1 · Q), where exp stands for the matrix
exponential.
• To obtain P(0,s], s ∈ R+, calculate P(0,s] = exp(s · Q).
• The off-diagonal elements qij , i, j = 1, . . . , N , i 6= j, are
non-negative.
• The diagonal elements satisfy qii = −Σ_{l≠i} qil .
272
Rating Model (4)
Applied Probability
• Q(t) and therefore P(t,T ] can also depend on time t. E.g. consider
P(t,t+∆] ≈ IN + ∆Q(t).
• If the generator matrices commute, i.e. Q(s)Q(t) = Q(t)Q(s), then
P(t,T ] = exp( ∫_t^T Q(s)ds ).
• If Q(t) = Q for all t, then P(t,T ] = exp((T − t)Q).
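The relation P(0,1] = exp(Q) can be sketched with a truncated power series exp(Q) = Σ_k Q^k / k! (in practice one would use e.g. scipy.linalg.expm and logm). The two-state generator below is a hypothetical example, not the rating data:

```python
import math

def mat_exp(Q, terms=40):
    """Truncated power series exp(Q) = sum_k Q^k / k! for a small matrix."""
    n = len(Q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # Q^0 / 0!
    for k in range(1, terms):
        term = [[sum(term[i][l] * Q[l][j] for l in range(n)) / k
                 for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# hypothetical two-state generator: leave each state with rate 1
Q = [[-1.0, 1.0], [1.0, -1.0]]
P = mat_exp(Q)   # P_(0,1] = exp(1 * Q)
# closed form for this generator: p(0,0) = (1 + e^(-2)) / 2
assert abs(P[0][0] - (1 + math.exp(-2)) / 2) < 1e-9
# rows of the resulting transition matrix sum to one
assert all(abs(sum(row) - 1.0) < 1e-9 for row in P)
```

Since each row of Q sums to zero, each row of exp(tQ) sums to one, so exp(tQ) is a stochastic matrix for every t ≥ 0.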
273
Rating Model (5)
Applied Probability
• The continuous time Markov chain with generator Q can be viewed
as a collection of Poisson processes, one for each rating class.
• I.e. for every rating class k ∈ E, a Poisson process Nk (t) triggers a
transition away from this class k with intensity −qkk .
• Whenever a jump occurs, a random variable V indicates the new
state of the chain. The intensity to move from class k to class j is qkj .
• The conditional probability to move from class k to class j when a
jump takes place is −qkj /qkk .
274
Rating Model (6)
Applied Probability
• The rating process R(t) can be written as
dR(t) = (V − R(t))dNR(t)(t) .
• For the estimation of transition intensities see Schönbucher
(2003)[Chapter 8.3].
275
Recurrence and Transience (1)
Applied Probability
• We consider X = (Xn)n∈N0 , a chain on a countable space E with
transition matrix P.
• In the following we meet some properties of discrete chains. These
properties will be useful when we investigate the limit behavior of
chains.
276
Recurrence and Transience (2)
Applied Probability
• Definition, (see Klenke, Theorem 17.28):
– For any x ∈ E, let τx := τx1 := inf{n > 0 : Xn = x} and
τxk := inf{n > τxk−1 : Xn = x}
for k ∈ N, k ≥ 2. τxk is the kth entrance time of X for x. For
x, y ∈ E let
F (x, y) := Px(τy1 < ∞) = Px(there is an n ≥ 1 with Xn = y)
be the probability of ever going from x to y. In particular, F (x, x)
is the probability of returning to x after time 0.
277
Recurrence and Transience (3)
Applied Probability
• Theorem, (see Klenke, Theorem 17.29):
– For all x, y ∈ E and k ∈ N we have
Px(τyk < ∞) = F (x, y)F (y, y)k−1 .
278
Recurrence and Transience (4)
Applied Probability
• Definition, (see Klenke, Definition 17.30): A state x ∈ E is called
– recurrent if F (x, x) = 1.
– positive recurrent if Ex(τx1) < ∞.
– null recurrent if x is recurrent but not positive recurrent.
– transient if F (x, x) < 1.
– absorbing if p(x, x) = 1.
• The Markov chain X is called (positive/null) recurrent if every state
x ∈ E is (positive/null) recurrent and is called transient if every
state is transient.
• Note that ’absorbing’ ⇒ ’positive recurrent’ ⇒ ’recurrent’.
279
Recurrence and Transience (5)
Applied Probability
• Definition, (see Klenke, Definition 17.33):
– Denote by N (y) = Σ_{n=0}^{∞} 1_{Xn=y} the total number of visits of
X to y and by
G(x, y) = Ex(N (y)) = Σ_{n=0}^{∞} pn(x, y)
the Green function of X.
the Green function of X.
280
Recurrence and Transience (6)
Applied Probability
• Theorem, (see Klenke, Theorem 17.34):
– For x, y ∈ E we have (with the convention 1/0 = ∞)
G(x, y) = F (x, y) / (1 − F (y, y)) if x 6= y, and
G(x, y) = 1 / (1 − F (y, y)) if x = y.
In addition,
G(x, y) = F (x, y)G(y, y) + 1x=y .
– A state x ∈ E is recurrent if and only if G(x, x) = ∞.
281
Recurrence and Transience (7)
Applied Probability
• Theorem, (see Klenke, Theorem 17.35):
– If x is recurrent and F (x, y) > 0, then y is also recurrent and
F (x, y) = F (y, x) = 1.
282
Recurrence and Transience (8)
Applied Probability
• Definition, Irreducible (see Klenke, Definition 17.36): A discrete
Markov chain is called
– irreducible if F (x, y) > 0 for all x, y ∈ E or equivalently
G(x, y) > 0.
– weakly irreducible if F (x, y) + F (y, x) > 0 for all x, y ∈ E.
283
Recurrence and Transience (9)
Applied Probability
• Theorem, (see Klenke, Theorem 17.35):
– An irreducible discrete Markov chain is either recurrent or
transient. If |E| ≥ 2 then there is no absorbing state.
284
Invariant Distributions (1)
Applied Probability
• We consider a discrete space E and (Xn)n∈N0 is a Markov chain.
• Is there a distribution L(Xn) which stays the same for all n?
• If such an invariant distribution exists the next chapter provides
conditions when convergence to this invariant distribution takes place.
285
Invariant Distributions (2)
Applied Probability
• Definition, (see Klenke, Definition 17.42):
– If µ is a measure on E and f : E → R is a map, then we write
µP({x}) = Σ_{y∈E} µ({y})p(y, x) and Pf (x) = Σ_{y∈E} p(x, y)f (y)
if the sums converge.
286
Invariant Distributions (3)
Applied Probability
• Definition, (see Klenke, Definition 17.43):
– A σ-finite measure µ on E is called an invariant measure if
µP = µ.
A probability measure that is an invariant measure is called an
invariant distribution. Denote by I the set of invariant
distributions.
– A function f : E → R is called subharmonic if Pf exists and if
f ≤ Pf . It is called superharmonic if f ≥ Pf and harmonic if
f = Pf .
• Remark: In terms of linear algebra an invariant measure is a left
eigenvector of P corresponding to the eigenvalue 1. A harmonic
function is a right eigenvector corresponding to the eigenvalue 1.
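The left-eigenvector characterization is easy to verify numerically. A minimal sketch (not from the slides; the two-state matrix is a hypothetical example) checks the fixed-point equation πP = π:

```python
def step_left(mu, P):
    """One application of the transition matrix from the left:
    (mu P)({x}) = sum_y mu({y}) p(y, x)."""
    n = len(P)
    return [sum(mu[y] * P[y][x] for y in range(n)) for x in range(n)]

# hypothetical two-state chain
P = [[0.9, 0.1],
     [0.5, 0.5]]
# solving pi P = pi with pi_0 + pi_1 = 1 gives pi = (5/6, 1/6)
pi = [5/6, 1/6]
assert all(abs(a - b) < 1e-12 for a, b in zip(step_left(pi, P), pi))
```

For a general 2-state chain with p(0,1) = a and p(1,0) = b, the same computation gives π = (b/(a+b), a/(a+b)).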
287
Invariant Distributions (4)
Applied Probability
• Theorem, (see Klenke, Theorem 17.46):
– If X is transient, then an invariant distribution does not exist.
288
Invariant Distributions (5)
Applied Probability
• Theorem, (see Klenke, Theorem 17.47)
– Let x be a recurrent state and let τx1 = inf{n ≥ 1 : Xn = x}.
Then an invariant measure µx is defined by
µx({y}) = Ex( Σ_{n=0}^{τx1 −1} 1_{Xn=y} ) = Σ_{n=0}^{∞} Px(Xn = y; τx1 > n) .
289
Invariant Distributions (6)
Applied Probability
• Theorem, (see Klenke, Corollary 17.48):
– If X is positive recurrent, then
π := µx / Ex(τx1)
is an invariant distribution for any x ∈ E.
• Klenke provides a citation for: If X is irreducible and recurrent, then
an invariant measure of X is unique up to a multiplicative factor.
• If X is transient there can be more than one invariant measure; see
e.g. Remark 17.50(ii).
290
Invariant Distributions (7)
Applied Probability
• Theorem, (see Klenke, Theorem 17.51)
– Let X be irreducible. X is positive recurrent if and only if the set
of invariant distributions I 6= ∅. In this case, I = {π} with
π({x}) := 1 / Ex(τx1) > 0
for all x ∈ E.
• Discuss Example 17.52, and Examples 1.7.3, 1.7.8 and 1.7.10 in
Norris (1998).
291
Convergence of Chains (1)
Applied Probability
• We consider a Markov chain X with invariant distribution π.
• When does the distribution of Xn (PXn or L(Xn)) converge to
π as n → ∞?
• We shall observe that it is necessary and sufficient that the state
space cannot be decomposed into subspaces that the chain does not
leave or that are visited by the chain periodically (see e.g. weather
model vs. learning model).
• The first property is called irreducibility and the second aperiodicity
(vs. reducible and periodic).
292
Convergence of Chains (2)
Applied Probability
• We consider a positive recurrent Markov chain X on the countable
space E with transition matrix P started at some arbitrary
µ ∈ M(E).
• When does the distribution of Xn converge to π, i.e. µPn → π as
n → ∞?
• First π has to be unique (up to a factor). π is the unique left
eigenvector of P with eigenvalue 1. For uniqueness irreducibility is
sufficient by Theorem 17.49.
• To obtain µPn → π, contraction properties of P are necessary: 1 is
the largest eigenvalue of P, and the stochastic matrix is sufficiently
contractive if the eigenvalue 1 has multiplicity one and there are no
further (possibly complex valued) eigenvalues with modulus one.
293
Periodicity of Markov Chains (1)
Applied Probability
• For the last property, irreducibility is not sufficient. E.g. with
E = {0, . . . , N − 1} consider the Markov chain with transition matrix
p(x, y) = 1(y=x+1 (mod N )); for N = 3,

      ( 0 1 0 )
P  =  ( 0 0 1 )
      ( 1 0 0 )

Every point is visited periodically after N steps.
• For N = 3 the eigenvalues are 1 and −0.5000 ± 0.8660ı. For N = 2
the eigenvalues are 1 and −1. In general the eigenvalues are the N
roots of unity e2πık/N , k = 0, 1, . . . , N − 1. The uniform distribution
is the invariant distribution but limn→∞ δxPn does not exist for any
x ∈ E.
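The periodic behavior of this cyclic chain can be checked directly: P^3 is the identity, so δxP^n cycles with period 3 instead of converging. A minimal sketch (not from the slides):

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# cyclic shift on E = {0, 1, 2}: p(x, y) = 1 iff y = x + 1 (mod 3)
P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

P3 = matmul(matmul(P, P), P)
assert P3 == I          # period 3: the chain returns after exactly 3 steps
assert matmul(P, P) != I
# hence delta_x P^n cycles with period 3 and lim delta_x P^n does not exist
```

Since P^3 = I, the eigenvalues of P are cube roots of unity, all of modulus one, which is exactly the failure of contractivity discussed above.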
294
Periodicity of Markov Chains (2)
Applied Probability
• Notation: For m, n ∈ N, we write m|n if m is a divisor of n, i.e.
n/m ∈ N.
• If M ⊂ N, then gcd(M ) is the greatest common divisor of all
n ∈ M.
295
Periodicity of Markov Chains (3)
Applied Probability
• Definition, periodic, aperiodic (see Klenke, Definition 18.1):
– For x, y ∈ E define
N (x, y) = {n ∈ N0 : pn(x, y) > 0} .
For any x ∈ E, dx := gcd(N (x, x)) is called the period of the
state x.
– If dx = dy for all x, y ∈ E, then d = dx is called the period of X.
– If dx = 1 for all x ∈ E, then X is called aperiodic.
296
Periodicity of Markov Chains (4)
Applied Probability
• Theorem, (see Klenke, Lemma 18.2):
– For any x ∈ E there exists an nx ∈ N with pn·dx (x, x) > 0 for all
n ≥ nx.
297
Periodicity of Markov Chains (5)
Applied Probability
• Theorem, (see Klenke, Lemma 18.3): Let X be irreducible. Then
the following statements hold:
– d = dx = dy for all x, y ∈ E.
– For all x, y ∈ E there exist nx,y ∈ N and Lx,y ∈ {0, . . . , d − 1}
such that nd + Lx,y ∈ N (x, y) for all n ≥ nx,y . Lx,y is uniquely
determined and we have Lx,y + Ly,z = Lx,z (mod d) for all
x, y, z ∈ E.
298
Periodicity of Markov Chains (6)
Applied Probability
• Theorem, (see Klenke, Theorem 18.4): Let X be irreducible with
period d. Then there exists a disjoint decomposition of the state
space
E=
d
]
i=1
Ei
with the property
p(x, y) > 0 and x ∈ Ei ⇒ y ∈ Ei+1 (mod d) .
The decomposition is unique up to cyclic permutations.
299
Convergence Theorem (1)
Applied Probability
• Definition, Total variation norm (see Klenke, page 173):
– Consider two probability measures P and Q. Then
‖P − Q‖TV = sup{ ∫ f d(P − Q) : f ∈ L∞ with ‖f‖∞ = 1 }
is called the total variation norm.
• It can be shown that if kPn − QkT V → 0 then Pn converges to Q in
distribution when n → ∞.
300
Convergence Theorem (2)
Applied Probability
• Theorem, Convergence Theorem of Markov chains (see Klenke,
Theorem 18.18): Let X be an irreducible, positive recurrent Markov
chain on E with invariant distribution π. Then the following are
equivalent:
– X is aperiodic.
– For every x ∈ E we have ‖Lx(Xn) − π‖TV → 0 as n → ∞.
– ‖Lx(Xn) − π‖TV → 0 as n → ∞ holds for some x ∈ E.
– For every µ ∈ M1(E) we have ‖µpn − π‖TV → 0 as n → ∞.
301
Convergence Theorem (3)
Applied Probability
• In Chapter 18 a countable state space E was considered. The
results obtained here can be extended to more general E, e.g.
E = Rn.
• Convergence theorems can also be obtained for these more general
E. See e.g. Robert and Casella (1999), Meyn and Tweedie (2009) or
Durrett (2010)[Chapter 6.8].
302
Speed of Convergence (1)
Applied Probability
• How fast does PXn converge to π?
• Without going into details suppose that |E| = N . Consider the
eigenvalues of P, sorted according to their modulus,
λ[1] = 1 ≥ |λ[2]| ≥ · · · ≥ |λ[N ]|. If P is irreducible and aperiodic,
then |λ[2]| < 1 and
‖µPn − π‖TV ≤ C|λ[2]|n .
• In Examples 18.14 and 18.15 Klenke obtains and investigates the
speed of convergence.
303
Speed of Convergence (2)
Applied Probability
• The speed of convergence is an important issue when Markov chain
Monte Carlo methods are applied. The speed is related to the
question of how many draws are necessary to obtain samples from
the (approximate) posterior. In most applications the speed of
convergence cannot be derived analytically. Therefore, so called
convergence diagnostics are applied in Bayesian statistics and
Bayesian econometrics (Gelman and Rubin, 1992; Brooks and
Gelman, 1998; Geweke, 1992; Cowles and Carlin, 1996; Chib and
Ergashev, 2009).
304
Markov Chains and Linear Algebra (1)
Applied Probability
• We now assume that E is finite with |E| = n. Then P is an n × n
matrix and a probability vector p is an n × 1 column vector.
• The states are (S1, . . . , Sn) = (E1, . . . , En).
305
Markov Chains and Linear Algebra (2)
Applied Probability
• Example: Weather model, see (see Luenberger, 1979, page 225):
– The states (S1, S2, S3) are sunny, cloudy, rainy.
– The stochastic matrix is:

         S      C      R
    S   0.5    0.5    0
P = C   0.5    0.25   0.25
    R   0      0.5    0.5
306
Markov Chains and Linear Algebra (3)
Applied Probability
• Example: Estes Learning model, see (see Luenberger, 1979, page
226):
– Two states (S1, S2) := (L, N ), something has been learned or not.
Here it is assumed that nothing is forgotten after it has been
learned. The probability to learn is α.
– The stochastic matrix is:

         L      N
P = L    1      0
    N    α    1 − α
307
Markov Chains and Linear Algebra (4)
Applied Probability
• Example, Gambler’s Ruin problem (see Luenberger, 1979, Chapters 2
and 8):
– We consider two players: A for guest, B for the house.
– p is the probability that A wins a coin from player B. q = 1 − p is
the probability that B wins one coin from A.
– The initial holdings are a, b ∈ N.
– A player wins overall if he obtains all coins.
– What is the probability that A wins?
308
Markov Chains and Linear Algebra (5)
Applied Probability
• Example: Suppose that a = b = 2. Then

         0   1   2   3   4
     0   1   0   0   0   0
     1   q   0   p   0   0
P =  2   0   q   0   p   0
     3   0   0   q   0   p
     4   0   0   0   0   1
309
Markov Chains and Linear Algebra (6)
Applied Probability
• Proposition, (see Luenberger, 1979, page 230):
– Corresponding to a stochastic matrix P the value λ0 = 1 = λ[1] is
an eigenvalue. No other eigenvalue of P has absolute value greater
than 1.
• Definition, Regular Chain (see Luenberger, 1979, page 230):
– A Markov chain is called regular if Pm > 0 for some m ∈ N.
310
Markov Chains and Linear Algebra (7)
Applied Probability
• Proposition, (see Luenberger, 1979, page 230): Let P be the
transition matrix of a regular Markov chain. Then:
– There is a unique probability vector p > 0 such that p>P = p>.
– For any initial state i (corresponding to an initial probability vector
equal to the ith coordinate vector ei) the limit vector
π⊤ = lim_{m→∞} e⊤_i P^m
exists and is independent of i. Furthermore, π = p.
– limm→∞ Pm = P̄, where P̄ is the n × n matrix, each of whose
rows is equal to p>.
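The limit lim_{m→∞} P^m = P̄ can be checked numerically; the sketch below uses an assumed 2 × 2 weather-type matrix (the weather model itself is not reproduced on these slides) and plain matrix powers.

```python
# A numerical check of the proposition (a sketch): powers of a regular
# stochastic matrix converge to P-bar, whose identical rows give the
# invariant probability vector p.  The 2x2 matrix below is an assumed
# example, not the weather model from the slides.

def mat_mult(A, B):
    return [[sum(A[i][r] * B[r][j] for r in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [[0.8, 0.2],
     [0.4, 0.6]]

Pm = P
for _ in range(50):            # P^m for large m approximates P-bar
    Pm = mat_mult(Pm, P)

p = Pm[0]                      # every row of P-bar equals p^T
print([round(v, 4) for v in p])      # -> [0.6667, 0.3333]
print([round(v, 4) for v in Pm[1]])  # -> [0.6667, 0.3333], independent of the start
```

Since the second eigenvalue of this P is 0.4, convergence is geometric and 50 iterations are more than enough.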
Markov Chains and Linear Algebra (8)
• Example: Weather model
– A chain is regular if Pm > 0 for some m ∈ N.
– For the weather model P2 > 0.
– Using Matlab (or some other software package) we can compute the
left eigenvectors of P. Based on these we obtain the invariant
probability vector p.
– In addition we observe that P^m for sufficiently large m yields a
matrix close to P̄, i.e. a matrix containing in the rows
i = 1, . . . , n the vector p⊤.
Markov Chains and Linear Algebra (9)
• By ordering the states we can obtain the transition matrix P in
blocked (canonical) form:

        (  P_r   0_{r×(n−r)}  )
  P =
        (  R     Q            ) .
• The r × r matrix Pr collects the closed/absorbing states. R is an
(n − r) × r matrix representing the transition probabilities from the
transient states to states within the closed class. The
(n − r) × (n − r) substochastic matrix Q contains the transition
probabilities within the transient states.
• M = (I(n−r) − Q)−1 is called the fundamental matrix of the
Markov chain.
Markov Chains and Linear Algebra (10)
• Proposition, (see Luenberger, 1979, page 240):
– The matrix M = (I(n−r) − Q)−1 exists and is positive.
Markov Chains and Linear Algebra (11)
• Proposition, (see Luenberger, 1979, page 240):
– The element mij of the matrix M of a Markov chain with
transient states is equal to the mean number of times the process
is in the transient state Sj if it is initiated in a transient state Si.
Markov Chains and Linear Algebra (12)
• Proposition, (see Luenberger, 1979, page 240):
– Let 1(n−r) be a column vector of ones. In a Markov chain with
transient states, the ith component of the vector M1(n−r) is equal
to the mean number of steps before entering a closed class when
the process is initiated in transient state Si.
Markov Chains and Linear Algebra (13)
• Proposition, (see Luenberger, 1979, page 241):
– Let bij be the probability that if a Markov chain is started in
transient state Si, it will first enter a closed class by visiting state
Sj . Let B be the (n − r) × r matrix with entries bij . Then
B = MR.
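The three propositions above can be illustrated numerically for the Gambler's Ruin chain with a = b = 2 and a fair game p = q = 1/2 (assumed demo values). The sketch computes M through the Neumann series M = Σ_k Q^k (which converges because Q is substochastic and directly encodes "mean number of visits"), then the mean absorption times M·1 and the absorption probabilities B = MR.

```python
# Fundamental matrix sketch for Gambler's Ruin, a = b = 2, p = q = 1/2.
# Transient states are holdings 1, 2, 3; closed states are 0 (ruin) and 4 (win).

p = q = 0.5
Q = [[0.0, p, 0.0],
     [q, 0.0, p],
     [0.0, q, 0.0]]
R = [[q, 0.0],        # from state 1: absorbed at 0 with prob q
     [0.0, 0.0],
     [0.0, p]]        # from state 3: absorbed at 4 with prob p

def mat_mult(A, B):
    return [[sum(A[i][r] * B[r][j] for r in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

n = len(Q)
M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]   # I
Qk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
for _ in range(200):                  # truncated Neumann series M = I + Q + Q^2 + ...
    Qk = mat_mult(Qk, Q)
    M = [[M[i][j] + Qk[i][j] for j in range(n)] for i in range(n)]

steps = [sum(row) for row in M]       # M 1: mean steps before absorption
B = mat_mult(M, R)                    # absorption probabilities
print([round(s, 3) for s in steps])   # -> [3.0, 4.0, 3.0]
print([round(b, 3) for b in B[1]])    # from 2 coins: [0.5, 0.5]
```

The spectral radius of this Q is about 0.707, so the truncation error after 200 terms is negligible.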
Markov Chains and Linear Algebra (14)
• Example: Estes Learning model
– With

             L    N
        L (  1    0  )
  P =
        N (  α   1−α )

we already observe the canonical form of P. Here n = 2, r = 1,
P_r = 1, R = α and Q = 1 − α.
– Hence, M = (1 − (1 − α))^{−1} = 1/α.
– 1/α is the mean number of steps necessary to enter the closed
class ("learned").
– Here, B = MR = (1/α) · α = 1.
Markov Chains and Linear Algebra (15)
• Example: Gambler's Ruin with n = 4 coins. P in canonical form
(absorbing states 0 and 4 ordered first) is given by

             0    4    1    2    3
        0 (  1    0    0    0    0  )
        4 (  0    1    0    0    0  )
  P =   1 (  q    0    0    p    0  )
        2 (  0    0    q    0    p  )
        3 (  0    p    0    q    0  ) .
• The n coin problem can be easily implemented in Matlab.
Luenberger (1979)[page 244] presents some analytical results.
Markov Chain Monte Carlo (1)
• In the next step we want to sample a random variable Y with
distribution π.
• In Klenke, Chapter 18.3, E is a finite set. The following results also
hold for more general E.
• U1, U2, . . . are iid uniform random variables.
• The idea is to construct a Markov chain X that is distributed
(approximately) like π.
• The method of producing π distributed samples is called Markov
chain Monte Carlo method (MCMC).
Markov Chain Monte Carlo (2)
• Metropolis algorithm
– Q is a transition matrix of an arbitrary irreducible Markov chain on
E. The Metropolis matrix is

              q(x, y) min( 1, π(y)q(y, x) / (π(x)q(x, y)) )   if x ≠ y, q(x, y) > 0,
  p(x, y) =   0                                               if x ≠ y, q(x, y) = 0,
              1 − Σ_{z≠x} p(x, z)                             if x = y .
– Note that P is reversible, i.e. for all x, y ∈ E we have
π(x)p(x, y) = π(y)p(y, x) and π is invariant.
Markov Chain Monte Carlo (3)
• Proposition, (see Klenke, Theorem 18.20):
– If Q is irreducible, then the Metropolis matrix P of Q is irreducible
with unique invariant measure π. If, in addition, Q is aperiodic or
if π is not the uniform distribution on E, then P is aperiodic.
Markov Chain Monte Carlo (4)
• Metropolis algorithm
Now we simulate a chain X with distribution converging to π as
follows:
– We draw from the reference chain/proposal distribution q.
– Suppose that a transition from the present state x to state y is
proposed. Then we accept this proposal with probability

  min( 1, π(y)q(y, x) / (π(x)q(x, y)) ) .
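A minimal sketch of the Metropolis algorithm on a finite state space; the target π and the uniform proposal q below are assumed demo choices (q is symmetric, so it cancels in the acceptance ratio).

```python
import random

# Metropolis sketch on E = {0, 1, 2} with an assumed target pi and the
# uniform proposal q(x, y) = 1/2 over the two other states (symmetric,
# hence q(y, x)/q(x, y) = 1 in the acceptance probability).

pi = [0.5, 0.3, 0.2]                  # target distribution (assumed example)
E = [0, 1, 2]

def step(x, rng):
    y = rng.choice([s for s in E if s != x])      # propose from q(x, .)
    if rng.random() < min(1.0, pi[y] / pi[x]):    # accept w.p. min(1, pi(y)/pi(x))
        return y
    return x

rng = random.Random(0)
counts = [0, 0, 0]
x = 0
n_steps = 200_000
for _ in range(n_steps):
    x = step(x, rng)
    counts[x] += 1
freq = [c / n_steps for c in counts]
print([round(f, 2) for f in freq])    # close to pi = [0.5, 0.3, 0.2]
```

The empirical occupation frequencies approximate π because π is the unique invariant measure of the (irreducible, aperiodic) Metropolis chain.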
Markov Chain Monte Carlo (5)
• Example: Ising model
– A model of ferromagnetism in crystals.
– Atoms are placed at the sites of a lattice Λ = {0, . . . , N − 1}².
– Each atom i ∈ Λ has a magnetic spin x(i) ∈ {−1, 1} that either
points upwards or downwards.
– Neighboring atoms interact.
– Due to thermal fluctuations the state of the system is random and
distributed according to the Boltzmann distribution π on the state
space E = {−1, 1}^Λ. The inverse temperature β = 1/T ≥ 0 (T in
Kelvin) is a parameter of this distribution.
Markov Chain Monte Carlo (6)
• Example: Ising model
– The local energy level of a single atom i ∈ Λ is described by

  H^i(x) = (1/2) Σ_{j∈Λ: j∼i} 1_{x(i)≠x(j)} ,

where i ∼ j indicates that i and j are neighbors in Λ
(coordinate-wise mod N, periodic boundary conditions).
– Total energy (Hamiltonian function):

  H(x) = Σ_{i∈Λ} H^i(x) = Σ_{{i,j}: i∼j} 1_{x(i)≠x(j)} ,

where the second sum runs over unordered neighbor pairs.
Markov Chain Monte Carlo (7)
• Example: Ising model
– The Boltzmann distribution π is given by

  π(x) = exp(−βH(x)) / Σ_{x′∈E} exp(−βH(x′)) .

Due to the normalizing term we get a probability measure.
Markov Chain Monte Carlo (8)
• Example: Ising model
– Consider x ∈ E. Denote by x^{i,σ} the state in which the spin at site i
is changed to σ ∈ {−1, 1}, i.e. x^{i,σ}(j) = σ if j = i and
x^{i,σ}(j) = x(j) if j ≠ i.
– x^i is the state where the spin at i is reversed, i.e. x^i = x^{i,−x(i)}.
– We want to simulate the spin configurations x.
Markov Chain Monte Carlo (9)
• Example: Ising model
– Use a reference chain q(x, y), e.g. q(x, y) = 1/#Λ if y = x^i for some
i ∈ Λ and zero else. I.e. we choose a site i ∈ Λ (uniformly on Λ)
and invert the spin at that site. Q is irreducible.
– The Metropolis algorithm accepts the proposal of the reference
chain with probability 1 if π(x^i) ≥ π(x). Otherwise the proposal is
accepted with probability π(x^i)/π(x).
Markov Chain Monte Carlo (10)
• Example: Ising model
– Note that

  H(x^i) − H(x) = Σ_{j: j∼i} 1_{x(j)≠−x(i)} − Σ_{j: j∼i} 1_{x(j)≠x(i)}
                = −2 Σ_{j: j∼i} ( 1_{x(j)≠x(i)} − 1/2 ) .

– This yields log( π(x^i)/π(x) ) = 2β Σ_{j: j∼i} ( 1_{x(j)≠x(i)} − 1/2 ). This
expression only depends on the 2d neighbors of i, in our case d = 2.
I.e. the normalizing constant need not be calculated.
Markov Chain Monte Carlo (11)
• Example: Ising model
– Thereby we obtain the Metropolis transition matrix

              (1/#Λ) min( 1, exp( 2β Σ_{j: j∼i} ( 1_{x(j)≠x(i)} − 1/2 ) ) )   if y = x^i for some i ∈ Λ,
  p(x, y) =   1 − Σ_{i∈Λ} p(x, x^i)                                           if x = y,
              0                                                               else.
Markov Chain Monte Carlo (12)
• Example: Ising model
– This can be implemented as follows: Draw I1, I2, . . . ∼ U_Λ and
U1, U2, . . . iid uniform on [0, 1]. Then

  F_n(x) = x^{I_n}   if log U_n ≤ 2β Σ_{j: j∼I_n} ( 1_{x(j)≠x(I_n)} − 1/2 ),
           x         else.

– The chain (X_n)_{n∈N} is obtained via X_n = F_n(X_{n−1}) for every n ∈ N.
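The single-site update rule above can be sketched as follows (β, the lattice size N and the number of steps are assumed demo values; the uniform choice of a site corresponds to the factor 1/#Λ in the proposal).

```python
import math
import random

# Single-site Metropolis sketch for the Ising model on an N x N torus.
# beta, N and the step count are assumed demo values.

N, beta = 8, 0.4
rng = random.Random(1)
x = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(N)]

def disagreements(x, i, j):
    # number of neighbors (periodic boundaries) with opposite spin
    s = x[i][j]
    nbrs = [x[(i - 1) % N][j], x[(i + 1) % N][j],
            x[i][(j - 1) % N], x[i][(j + 1) % N]]
    return sum(1 for t in nbrs if t != s)

def metropolis_step(x):
    i, j = rng.randrange(N), rng.randrange(N)      # site chosen uniformly on Lambda
    # log pi(x^i)/pi(x) = 2*beta*sum_{j~i}(1_{x(j)!=x(i)} - 1/2); 4 neighbors give -2
    log_ratio = 2.0 * beta * (disagreements(x, i, j) - 2.0)
    if math.log(rng.random()) <= log_ratio:        # accept the spin flip
        x[i][j] = -x[i][j]

for _ in range(20_000):
    metropolis_step(x)
magnetisation = sum(sum(row) for row in x) / N ** 2
print(round(magnetisation, 2))    # lies in [-1, 1]; |m| tends to grow with beta
```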
Markov Chain Monte Carlo (13)
• An alternative to the Metropolis algorithm is the Gibbs sampler.
• If x is a state and i ∈ Λ, then define
x−i := {y ∈ E : y(j) = x(j) for j ≠ i} .
Markov Chain Monte Carlo (14)
• Definition, Gibbs sampler (see Robert and Casella
(1999)[Chapter 7]): Suppose that for some k > 1 the random
variable Y can be written as Y = (Y1, . . . , Yk), where Yi is either uni-
or multidimensional. Suppose that we can simulate from the
corresponding densities f1, . . . , fk, that is
Yi|y1, y2, . . . , yi−1, yi+1, . . . , yk ∼ fi(yi|y1, y2, . . . , yi−1, yi+1, . . . , yk)
for all i = 1, . . . , k. The associated Gibbs sampling algorithm for a
transition from Ym to Ym+1 is
Step 1: Y1 ∼ f1(y1|y2,m, . . . , yk,m)
Step 2: Y2 ∼ f2(y2|y1,m+1, y3,m, . . . , yk,m)
...
Step k: Yk ∼ fk(yk|y1,m+1, y2,m+1, . . . , yk−1,m+1)
Markov Chain Monte Carlo (15)
• Our goal is to sample X.
• Definition, Completion (see Robert and Casella, 1999, Definition
7.1.4):
Given a probability density f, a density g that satisfies
∫_Z g(x, z) dz = f (x) is called a completion of f.
Markov Chain Monte Carlo (16)
• Observe that for each fixed i, the Gibbs sampler corresponds to a
Metropolis-Hastings sampler with proposal density
q(y, y′) = δ_{(y1,...,yi−1,yi+1,...,yk)}(y′1, . . . , y′_{i−1}, y′_{i+1}, . . . , y′k) ×
fi(y′_i|y1, y2, . . . , yi−1, yi+1, . . . , yk). Writing "rest" for
(y1, . . . , yi−1, yi+1, . . . , yk), the acceptance probability is

  [ g(y′)/g(y) ] · [ fi(yi|rest)/fi(y′_i|rest) ]
    = [ fi(y′_i|rest)/fi(yi|rest) ] · [ fi(yi|rest)/fi(y′_i|rest) ] = 1 ,

since y and y′ agree in all coordinates except the ith, so
g(y′)/g(y) = fi(y′_i|rest)/fi(yi|rest).
• The Gibbs sampler is equivalent to a Metropolis-Hastings algorithm with
acceptance probability one. For more details see Robert and Casella (1999).
Markov Chain Monte Carlo (17)
• Example: Ising model
– Here we have x−i = {x^{i,−1}, x^{i,+1}}. For i ∈ Λ and σ ∈ {−1, 1}:

  π(x^{i,σ}|x−i) = π(x^{i,σ}) / π({x^{i,−1}, x^{i,+1}})
               = exp(−βH(x^{i,σ})) / ( exp(−βH(x^{i,−1})) + exp(−βH(x^{i,+1})) )
               = ( 1 + exp( β(H(x^{i,σ}) − H(x^{i,−σ})) ) )^{−1}
               = ( 1 + exp( 2β Σ_{j: j∼i} ( 1_{x(j)≠σ} − 1/2 ) ) )^{−1} .
Markov Chain Monte Carlo (18)
• Example: Ising model
– For the Ising model we thus get a Markov chain (Xn)_{n∈N0} with
values in E = {−1, 1}^Λ and transition matrix

  p(x, y) =  (1/#Λ) ( 1 + exp( 2β Σ_{j: j∼i} ( 1_{x(j)≠σ} − 1/2 ) ) )^{−1}   if y = x^{i,σ} for some i ∈ Λ, σ ∈ {−1, 1},
             0                                                               otherwise.
Markov Chain Monte Carlo (19)
• Until now a convergent Markov chain has been constructed to
simulate from a distribution π.
• The distribution we want to simulate from can also be the
distribution of some parameters θ ∈ Θ, where Θ stands for some
parameter space.
• This fact is extensively used in Bayesian econometrics and statistics,
where MCMC methods are used to simulate the posterior distribution
of the unknown parameter θ.
• Since the normalizing constant need not be derived to implement the
Metropolis-Hastings algorithm or the Gibbs sampler, these methods
can be implemented in a straightforward way.
Markov Chain Monte Carlo (20)
• Example: Linear Regression model
– Consider the linear stochastic model yn = β⊤xn + εn, where yn ∈ R,
xn ∈ R^k, β ∈ R^k and xn,1 = 1 for all n ∈ N. εn is iid normal with
mean zero and variance σ².
– Then Θ = R^k × R_+.
– Suppose that N observations are available. The distribution of
Y^N|X^N is described by

  π(Y1, . . . , YN|X^N, θ) = Π_{n=1}^N π(Yn|Xn, θ)
                          = (2πσ²)^{−N/2} exp( −(1/(2σ²)) Σ_{n=1}^N (Yn − β⊤Xn)² ) .

π(.|X^N, θ) evaluated at the data y^N, x^N is called the likelihood.
Markov Chain Monte Carlo (21)
• Example: Linear Regression model
– In a Bayesian analysis priors on θ have to be assumed. For example,
we choose the so-called natural conjugate priors. For our model
this is a normal distribution for β and a gamma distribution for 1/σ²
or an inverse gamma distribution for σ². In particular
π(θ) = π(β, 1/σ²) = πN(β|b0, B̃0σ²) πΓ(1/σ²|v0, S0)
or
π(θ) = π(β, σ²) = πN(β|b0, B̃0σ²) πΓ⁻¹(σ²|v0, S0) .
Markov Chain Monte Carlo (22)
• Example: Linear Regression model
– The inverse gamma prior density is

  πΓ⁻¹(σ²|v0, S0) = (S0^{v0}/Γ(v0)) (1/σ²)^{v0+1} exp(−S0/σ²) .

– If y is gamma distributed (with density
f (y; a, b) = (b^a/Γ(a)) y^{a−1} exp(−by), E(y) = a/b, V(y) = a/b²),
then 1/y follows an inverse gamma distribution (with density
f (y; a, b) = (b^a/Γ(a)) (1/y)^{a+1} exp(−b/y), E(y) = b/(a − 1),
V(y) = b²/((a − 1)²(a − 2))) (see e.g. Frühwirth-Schnatter,
2006, p. 434).
– By the Bayes theorem:
π(θ|y^N, x^N) ∝ π(y1, . . . , yN|x^N, θ) π(β, 1/σ²)
             = π(y1, . . . , yN|x^N, θ) πN(β|b0, B̃0σ²) πΓ⁻¹(σ²|v0, S0) .
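The gamma/inverse-gamma relationship can be checked by simulation; the shape and rate a = 3, b = 2 below are assumed demo values.

```python
import random

# Sanity check (a sketch): if y ~ Gamma(a, rate b), then E(y) = a/b and
# E(1/y) = b/(a - 1), the mean of the inverse gamma distribution.

a, b = 3.0, 2.0
rng = random.Random(0)
n = 200_000
# random.gammavariate takes (shape, scale); the scale is 1/rate
samples = [rng.gammavariate(a, 1.0 / b) for _ in range(n)]
mean_y = sum(samples) / n                      # close to a/b = 1.5
mean_inv = sum(1.0 / y for y in samples) / n   # close to b/(a-1) = 1.0
print(round(mean_y, 2), round(mean_inv, 2))
```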
Markov Chain Monte Carlo (23)
• Example: Linear Regression model
– By some algebra it can be shown that (see e.g. Cameron and
Trivedi, 2005; Frühwirth-Schnatter, 2006)

  π(θ|y^N, x^N) ∝ (1/σ²)^{vN+1} exp(−sN/σ²)
                  × (σ²)^{−k/2} exp( −(1/(2σ²)) (β − βN)⊤ B̃N⁻¹ (β − βN) ) ,

where βN = B̃N(B̃0⁻¹b0 + X⊤Y) = B̃N(B̃0⁻¹b0 + X⊤X β̂OLS),
β̂OLS = (X⊤X)⁻¹X⊤Y, X is an N × k matrix and Y is of dimension
N × 1, B̃N = (B̃0⁻¹ + X⊤X)⁻¹, BN = σ² B̃N,
vN = v0 + N/2 and sN = s0 + (1/2)( Y⊤Y + b0⊤B̃0⁻¹b0 − βN⊤B̃N⁻¹βN ).
– The posterior π(β, σ²|y^N, x^N) has normal-gamma form. It
factorizes into a normal density with parameters βN and BN and an
(inverse) gamma density with shape parameter vN and scale
parameter sN.
Markov Chain Monte Carlo (24)
• Example: Linear Regression model
– Suppose that σ² is fixed and the normal prior πN(β|b0, B0) is
used for β. Then π(β|y^N, x^N, σ²) is a normal density with mean
βN = BN( B0⁻¹b0 + (1/σ²) X⊤Y ) and variance parameter
BN = ( B0⁻¹ + (1/σ²) X⊤X )⁻¹.
– When β is fixed and an inverse gamma prior is applied to σ², then
π(σ²|y^N, x^N, β)
follows an inverse gamma distribution with shape parameter
vN = v0 + N/2 and scale parameter sN = s0 + (1/2) Σ_{n=1}^N (yn − β⊤xn)².
Markov Chain Monte Carlo (25)
• Example: Linear Regression model
– We observe that with σ² fixed the conditional distribution of β is a
normal distribution with mean parameter βN and variance BN.
– With β fixed, we observe that σ² follows an inverse gamma
distribution with parameters vN and sN.
Markov Chain Monte Carlo (26)
• Example: Linear Regression model
– In addition, consider an update of β; then z stands for the former
β and σ². In more detail we construct a chain (Zm)_{m∈N}, where
Zm = (βm, σ²_m). Zm is called the mth draw of the sampler.
– This fact can be used in a computationally efficient way. By
drawing from π(β|y^N, x^N, σ²_{m−1}) we get (βm, σ²_{m−1}). Then we
draw σ²_m from an inverse gamma distribution with parameters vN
and sN.
Markov Chain Monte Carlo (27)
• Example: Linear Regression model
– Hence we get the chain (Zm) as follows. Given Zm−1, draw
Step 1: βm from πN(β|y^N, x^N, σ²_{m−1})
Step 2: σ²_m from πΓ⁻¹(σ²|y^N, x^N, βm)
– The convergence period of this chain is called the burn-in phase;
after the burn-in phase we obtain (approximate) draws from
the posterior.
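The two steps above can be sketched for the special case k = 1 (scalar β, no intercept); the simulated data, priors and chain length below are assumed demo choices, not values from the slides.

```python
import random

# Two-step Gibbs sketch for y_n = beta * x_n + eps_n, eps_n ~ N(0, sigma^2),
# with priors beta ~ N(b0, B0) and sigma^2 ~ InvGamma(v0, s0).
# All numeric choices are assumed demo values.

rng = random.Random(42)
beta_true, sigma_true, N = 2.0, 1.0, 200
x = [rng.uniform(-1, 1) for _ in range(N)]
y = [beta_true * xi + rng.gauss(0, sigma_true) for xi in x]

b0, B0, v0, s0 = 0.0, 100.0, 2.0, 1.0          # weak priors (assumed)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

beta, sig2 = 0.0, 1.0
draws_beta, draws_sig2 = [], []
for m in range(4000):
    # Step 1: beta | y, sigma^2 ~ N(bN, BN)
    BN = 1.0 / (1.0 / B0 + sxx / sig2)
    bN = BN * (b0 / B0 + sxy / sig2)
    beta = rng.gauss(bN, BN ** 0.5)
    # Step 2: sigma^2 | y, beta ~ InvGamma(v0 + N/2, s0 + 0.5 * SSR)
    ssr = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
    sig2 = 1.0 / rng.gammavariate(v0 + N / 2.0, 1.0 / (s0 + 0.5 * ssr))
    if m >= 1000:                                # discard the burn-in phase
        draws_beta.append(beta)
        draws_sig2.append(sig2)

print(round(sum(draws_beta) / len(draws_beta), 1))   # posterior mean, near beta_true
print(round(sum(draws_sig2) / len(draws_sig2), 1))   # posterior mean, near sigma_true^2
```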
Markov Chain Monte Carlo (28)
• Alternatively the Metropolis-Hastings algorithm can be used.
– Here q(x, y) is a proposal density. A proposed transition from x to
y is accepted with probability

  min( 1, π(y)q(y, x) / (π(x)q(x, y)) ) ;

otherwise the chain stays at x.
– To implement the Metropolis-Hastings algorithm we simply have to
calculate min( 1, π(y)q(y, x)/(π(x)q(x, y)) ). The new y is accepted
if some iid uniform variable Um is smaller than this term.
Markov Chain Monte Carlo (29)
• Example: Linear Regression model
– Metropolis-Hastings update of β: given Xm−1 = (βm−1, σ²_{m−1}),
draw β^new from some proposal density q(βm−1, ·). Then

  ρ(βm−1, β^new) = min( 1, [ π(y^N|x^N, β^new, σ²_{m−1}) π(β^new|σ²_{m−1}) q(β^new, βm−1) ]
                         / [ π(y^N|x^N, βm−1, σ²_{m−1}) π(βm−1|σ²_{m−1}) q(βm−1, β^new) ] ) .

– Accept β^new if Um ≤ ρ(βm−1, β^new).
– Note that also here the normalizing constant need not be
calculated.
Markov Chain Monte Carlo (30)
• Example: Linear Regression model
– Metropolis-Hastings update of σ²: given (βm, σ²_{m−1}), draw σ^{2,new}
from some proposal density q(σ²_{m−1}, ·). Then

  ρ(σ²_{m−1}, σ^{2,new}) = min( 1, [ π(y^N|x^N, βm, σ^{2,new}) π(βm|σ^{2,new}) π(σ^{2,new}) q(σ^{2,new}, σ²_{m−1}) ]
                              / [ π(y^N|x^N, βm, σ²_{m−1}) π(βm|σ²_{m−1}) π(σ²_{m−1}) q(σ²_{m−1}, σ^{2,new}) ] ) .

– Accept σ^{2,new} if Um ≤ ρ(σ²_{m−1}, σ^{2,new}).
– Note that conjugate priors are not necessary with the
Metropolis-Hastings algorithm.
Markov Chain Monte Carlo (31)
• Remark:
– In Klenke, Chapters 17 and 18 we investigated a countable state
space. The Gibbs sampler and the Metropolis-Hastings algorithm
also work on more general state spaces. See Robert and Casella
(1999) and Meyn and Tweedie (2009).
– For the Metropolis-Hastings algorithm see Tierney (1998).
– Bayesian methods can also be applied in many models where the
likelihood is not available in closed form, e.g. latent variable
models, hierarchical models, mixture models (see e.g.
Frühwirth-Schnatter, 2006).
Outline - Brownian Motion
• Continuous versions and Hölder continuity.
• Definitions and properties.
• Convergence of probability measures
• Donsker’s Theorem
• Klenke, Chapter 21.
Continuous Versions (1)
• Independent normally distributed increments (see Klenke, Example
14.45):
– I = [0, ∞) and Ωi = R for i ∈ [0, ∞), B = B(R). Ω = R^{[0,∞)},
A = B^{⊗[0,∞)} and let Xt be the coordinate map for t ∈ [0, ∞).
Then X = (Xt)t≥0 is the canonical process on (Ω, A).
– Construct a probability measure P on (Ω, A) such that X has
independent, stationary and normally distributed increments:
(Xti − Xti−1)_{i=1,...,n}
are independent for all 0 = t0 < t1 < · · · < tn, and
P_{Xt−Xs} = N_{(0,t−s)} for all t > s.
Continuous Versions (2)
• Independent normally distributed increments (see Klenke, Example
14.45):
– Define stochastic kernels κt(x, dy) := δx ∗ N_{(0,t)}(dy) for t ∈ [0, ∞),
where N_{(0,0)} = δ0. Here the Chapman-Kolmogorov equation holds:
κs ∗ κt(x, dy) = δx ∗ (N_{(0,s)} ∗ N_{(0,t)})(dy) = δx ∗ N_{(0,s+t)}(dy) = κs+t(x, dy) .
– For more details on probability measures on product spaces see
e.g. Klenke (2008, Chapter 14).
Continuous Versions (3)
• Independent normally distributed increments (see Klenke, Example
14.45):
– P is the unique probability measure on Ω according to Corollary
14.44.
– With X we have almost constructed a Brownian motion; what is
missing is to investigate whether the paths (i.e. the maps t 7→ Xt)
are almost surely continuous. A priori the paths of a canonical
process need not be continuous, since every map [0, ∞) → R is
possible. The next step is to show that the discontinuous paths can
be neglected, i.e. to pass to a continuous modification.
Continuous Versions (4)
• Definition, (see Klenke, Definition 21.1): Let X and Y be
stochastic processes on (Ω, A, P) with time set I and state space E.
X and Y are called
– modifications or versions of each other if, for any t ∈ I, we
have Xt = Yt almost surely.
– indistinguishable if there exists an N ∈ A with P(N ) = 0 such
that {Xt ≠ Yt} ⊂ N for all t ∈ I.
• Indistinguishable processes are modifications.
Continuous Versions (5)
• Definition, Hölder continuous (see Klenke, Definition 21.2):
– Let (E, d) and (E 0, d0) be metric spaces and γ ∈ (0, 1]. A map
φ : E → E 0 is called Hölder continuous of order γ at the
point r ∈ E, if there exists ε > 0 and C < ∞ such that for any
s ∈ E with d(s, r) < ε we have
d0(φ(r), φ(s)) ≤ Cd(r, s)γ .
– φ is called locally Hölder continuous of order γ if, for every
t ∈ E, there exist ε > 0 and C(t, ε) > 0 such that for all s, r ∈ E
with d(s, r) < ε and d(r, t) < ε, the above inequality holds.
– Finally, φ is called Hölder continuous of order γ if there exists
a C such that the above inequality holds for all s, r ∈ E.
Continuous Versions (6)
• Remarks: Hölder continuous
– If γ = 1 Hölder continuity is Lipschitz continuity.
– If E = R and γ > 1 every locally Hölder continuous function is
constant.
– If φ is Hölder γ-continuous at a given point t, there need not exist
an open neighborhood in which φ is continuous, i.e. φ need not be
locally Hölder γ-continuous.
Continuous Versions (7)
• Theorem, Hölder continuity - properties (see Klenke, Lemma 21.3):
Let I ⊂ R and let f : I → R be locally Hölder continuous of
order γ ∈ (0, 1]. Then the following statements hold:
– f is locally Hölder continuous of order γ′ for every γ′ ∈ (0, γ).
– If I is compact, then f is Hölder continuous.
– Let I be a bounded interval of length T > 0. Assume that there
exist an ε > 0 and a C(ε) < ∞ such that for all s, t ∈ I with
|t − s| ≤ ε we have

  |f (t) − f (s)| ≤ C(ε)|t − s|^γ .

Then f is Hölder continuous of order γ with constant
C = C(ε) ⌈T/ε⌉^{1−γ}.
Continuous Versions (8)
• Definition, Path properties (see Klenke, Definition 21.4):
– Let I ⊂ R and let X = (Xt, t ∈ I) be a stochastic process on
some probability space (Ω, A, P) with values in a metric space
(E, d). For every ω ∈ Ω we say that the map I → E, t 7→ Xt(ω)
is a path of X.
– We say that X has almost surely continuous paths, or briefly that
X is a.s. continuous, if for almost all ω ∈ Ω, the path t 7→ Xt(ω)
is continuous.
– Similarly, we define locally Hölder-γ-continuous paths, etc.
Continuous Versions (9)
• Theorem, (see Klenke, Lemma 21.5): Let X and Y be modifications
of each other. Assume that one of the following properties hold:
– I is countable.
– I ⊂ R is a (possibly unbounded) interval and X and Y are almost
surely right continuous.
• Then X and Y are indistinguishable.
Continuous Versions (10)
• Theorem, Kolmogorov-Chentsov (see Klenke, Theorem 21.6): Let
X = (Xt : t ∈ [0, ∞)) be a real valued process. Assume for every T > 0 there are
numbers α, β, C > 0 such that
E (|Xt − Xs|α ) ≤ C|t − s|1+β
for all s, t ∈ [0, T ]. Then the following statements hold:
– There is a modification X̃ = (X̃t, t ∈ [0, ∞)) of X whose paths are locally
Hölder-γ-continuous of every order γ ∈ (0, β/α).
– Let γ ∈ (0, β/α). For every ε > 0 and T < ∞ there exists a number K < ∞
that depends only on ε, T, α, β, C, γ such that

  P( |X̃t − X̃s| ≤ K|t − s|^γ for all s, t ∈ [0, T ] ) ≥ 1 − ε .
Continuous Versions (11)
• Remark: Kolmogorov-Chentsov theorem:
– The result of this theorem holds in a Polish space (E, ρ). The
proof does not rely on the assumption that the range is in R.
– If we change the time set, then the assumptions have to be
strengthened. E.g. for (Xt)_{t∈R^d} with values in E we require

  E( ρ(Xt, Xs)^α ) ≤ C ‖t − s‖₂^{d+β}

for all s, t ∈ [−T, T ]^d. Then for all γ ∈ (0, β/α) there is a locally
Hölder-γ-continuous version of X.
Construction and Path Properties (1)
• Definition, Brownian Motion (see Klenke, Definition 21.8): A real
valued stochastic process B = (Bt : t ∈ [0, ∞)) is called a
(standard) Brownian motion if
– B0 = 0,
– B has independent, stationary increments,
– Bt ∼ N_{0,t} (normal with mean zero and variance t) for all t > 0, and
– t 7→ Bt is P-almost surely continuous.
Construction and Path Properties (2)
• Theorem, Existence of Brownian Motion (see Klenke, Theorem
21.9):
There exists a probability space (Ω, A, P) and a Brownian motion
B on (Ω, A, P). The paths are almost surely locally
Hölder-γ-continuous for any γ ∈ (0, 1/2).
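The construction behind this theorem can be sketched numerically: approximate a Brownian path by partial sums of independent N(0, Δt) increments and check B0 = 0 and B1 ∼ N(0, 1). The step size and number of replications are assumed demo values.

```python
import random

# Discretised Brownian motion sketch: independent, stationary N(0, dt)
# increments; over many replications the endpoint B_1 should be N(0, 1).

rng = random.Random(7)
dt, n_steps, n_paths = 0.01, 100, 5000
endpoints = []
for _ in range(n_paths):
    B = 0.0                              # B_0 = 0
    for _ in range(n_steps):
        B += rng.gauss(0.0, dt ** 0.5)   # increment ~ N(0, dt)
    endpoints.append(B)                  # B_1, should be ~ N(0, 1)

mean = sum(endpoints) / n_paths
var = sum(b * b for b in endpoints) / n_paths
print(round(mean, 1), round(var, 1))     # approximately 0.0 and 1.0
```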
Construction and Path Properties (3)
• Remark, Gaussian process
– (Xt)t∈I is called Gaussian process if for every n ∈ N and
t1, . . . , tn ∈ I we have
(Xt1, . . . , Xtn)
has an n-dimensional normal distribution.
– X is called centered if E(Xt) = 0 for every t ∈ I.
– The map Γ(s, t) = Cov(Xs, Xt) for s, t ∈ I is called covariance
function.
Construction and Path Properties (4)
• Theorem, (see Klenke, Theorem 21.11): Let X = (Xt)t∈[0,∞) be a
stochastic process. Then the following are equivalent:
– X is a Brownian motion.
– X is a continuous centered Gaussian process with
Cov(Xs, Xt) = s ∧ t for all s, t ≥ 0.
• Theorem, Scaling property of Brownian motion (see Klenke,
Corollary 21.12):
– If B is a Brownian motion and K ≠ 0, then (K⁻¹ B_{K²t})_{t≥0} is also
a Brownian motion.
Construction and Path Properties (5)
• Definition, Brownian Bridge (see Klenke, Example 21.13):
– A process X = (Xt : t ∈ [0, 1]), with Xt := Bt − tB1 , is called
Brownian bridge.
• The covariance function of the Brownian bridge is
Γ(s, t) = s ∧ t − st. To see this, calculate
Cov(Xs, Xt) = Cov(Bs − sB1, Bt − tB1) = · · · = s ∧ t − st.
Construction and Path Properties (6)
• Theorem, time inversion (see Klenke):
– Let (Bt)t≥0 be a Brownian motion and define

  Xt = t B_{1/t} if t > 0,   X0 = 0 .

Then X is a Brownian motion.
• A Brownian motion (Wt)t≥0 started at zero and with E(Wt2) = t is
often called standard Brownian motion.
Construction and Path Properties (7)
• Theorem, Blumenthal’s 0-1 law (see Klenke, Theorem 21.15):
– Let (Bt)t≥0 be a Brownian motion and let F = (Ft)t≥0 = σ(B) be
the filtration generated by B.
– Further, let F0+ = ∩_{t>0} Ft.
– Then F0+ is the P-trivial σ-algebra.
• On the P-trivial σ-algebra see e.g. Klenke (2008)[Chapter 2.3].
Construction and Path Properties (8)
• Theorem, Paley-Wiener-Zygmund (see Klenke, Theorem 21.17):
– For every γ > 1/2, almost surely, the paths of Brownian motion
(Bt)t≥0 are not Hölder-continuous of order γ at any point.
– In particular, the paths are almost surely nowhere differentiable.
Strong Markov Property (1)
• Px is the probability measure such that B = (Bt)t≥0 is a Brownian
motion started at x ∈ R. I.e. under Px the process (Bt − x)t≥0 is a
standard Brownian motion. The simple Markov property directly
follows from the construction of the process.
• Theorem, Strong Markov property (see Klenke, Theorem 21.18):
– Brownian motion B with distributions (Px)x∈R has the strong
Markov property.
Strong Markov Property (2)
• Theorem, Reflection principle for Brownian motion (see Klenke,
Theorem 21.19):
– For every a > 0 and T > 0,

  P( sup{Bt : t ∈ [0, T ]} > a ) = 2 P(BT > a) ≤ (2√T / (√(2π) a)) e^{−a²/(2T)} .
Strong Markov Property (3)
• Theorem, Lévy’s arcsine law (see Klenke, Theorem 21.20):
– Let T > 0 and ζT := sup{t ≤ T : Bt = 0}. Then for t ∈ [0, T ]

  P(ζT ≤ t) = (2/π) arcsin( √(t/T) ) .
Feller Processes (1)
• In some applications a continuous version of a process is too
demanding, e.g. when working with the Poisson process or a
Brownian motion with jumps. Often there is a version with right
continuous paths and left side limits.
• Definition, Càdlàg (see Klenke, Definition 21.21): Let E be a Polish
space. A map f : [0, ∞) → E is called right continuous with left
limits (RCLL) or càdlàg (continue à droite, limites à gauche) if
f (t) = f (t+) := lim_{s↓t} f (s) for every t ≥ 0 and if for every t > 0
the left sided limit f (t−) = lim_{s↑t} f (s) exists and is finite.
Feller Processes (2)
• Definition, (see Klenke, Definition 21.22):
– A filtration F = (Ft)t≥0 is called right continuous if F = F+,
where Ft+ = ∩_{s>t} Fs. We say that the filtration F satisfies the
usual conditions if F is right continuous and F0 is P-complete.
• See also Karatzas and Shreve (1991): adapted, augmented
filtration.
Feller Processes (3)
• Theorem, Doob's regularisation (see Klenke, Theorem 21.24):
– Let F = (Ft)t≥0 be a filtration that satisfies the usual conditions
and let X = (Xt)t≥0 be an F-supermartingale such that
t 7→ E(Xt) is right continuous. Then there exists a modification X̃
of X with RCLL paths.
Feller Processes (4)
• Definition, Feller semigroup (see Klenke, Definition 21.26):
– A Markov semigroup (κt) on E is called Feller semigroup if
f (x) = lim κtf (x)
t→0
for all x ∈ E, f ∈ C0(E) (set of bounded continuous functions
that vanish at infinity) and κtf ∈ C0(E) for every f ∈ C0(E).
Feller Processes (5)
• Theorem, (see Klenke, Theorem 21.27):
– Let (κt)t≥0 be a Feller semigroup on the locally compact Polish
space E. Then there exists a strong Markov process (Xt)t≥0 with
RCLL paths and transition kernels (κt)t≥0.
– Such a process is called Feller process.
The Space C([0, ∞)) (1)
• C([0, ∞)) is the space of continuous functions on [0, ∞). Instead of
Ω = R^{[0,∞)} we shall work with Ω = C([0, ∞)) in the following. We
need some of these results to investigate the functional central limit
theorem.
• Let us consider functionals which depend on the whole path of a
Brownian motion. E.g., is sup{Xt : t ∈ [0, 1]} measurable?
• For general processes this need not be the case; by the continuity
of Brownian motion, measurability does hold here.
• We can consider Brownian motion as the canonical process on the
space Ω = C([0, ∞)) of continuous paths.
The Space C([0, ∞)) (2)
• Let Ω = C([0, ∞)) ⊂ R^{[0,∞)}. The evaluation map is Xt : Ω → R,
ω 7→ ω(t).
• For f, g ∈ C([0, ∞)) and n ∈ N let dn(f, g) := ‖(f − g)|_{[0,n]}‖∞ ∧ 1
and d(f, g) = Σ_{n=1}^∞ 2^{−n} dn(f, g).
The Space C([0, ∞)) (3)
• Theorem, (see Klenke, Theorem 21.30):
– d is a complete metric on Ω = C([0, ∞)) ⊂ R^{[0,∞)} that induces
the topology of uniform convergence on compact sets. The space
(Ω, d) is separable and hence Polish.
The Space C([0, ∞)) (4)
• Theorem, (see Klenke, Theorem 21.31):
– With respect to the Borel σ-algebra B(Ω, d) the canonical
projections Xt, t ∈ [0, ∞) are measurable.
– On the other hand the Xt generate B(Ω, d). Hence

  B(R)^{⊗[0,∞)}|_Ω = σ(Xt, t ∈ [0, ∞)) = B(Ω, d) .
The Space C([0, ∞)) (5)
• Definition, (see Klenke, Definition 21.33):
– Let P be the probability measure on Ω = C([0, ∞)) with respect
to which the canonical process X is a Brownian motion.
– Then P is called the Wiener measure.
– The triple (Ω, A, P) is called the Wiener space and X is called
the canonical Brownian motion or Wiener process.
Convergence of Prob. M. on C([0, ∞)) (1)
• Let X and (X n)n∈N be random variables with values in C([0, ∞))
with distributions PX and PX n .
• Definition, (see Klenke, Definition 21.35):
– We say that the finite-dimensional distributions of (X n) converge
to those of X if, for every k ∈ N and t1, . . . , tk ∈ [0, ∞), we have
  (X^n_{t1}, . . . , X^n_{tk}) ⇒ (X_{t1}, . . . , X_{tk}) as n → ∞ .

In this case, we write X^n ⇒_{fdd} X or P_{X^n} ⇒_{fdd} P_X.
Convergence of Prob. M. on C([0, ∞)) (2)
• Theorem, (see Klenke, Lemma 21.36):
– P_n ⇒_{fdd} P and P_n ⇒_{fdd} Q (as n → ∞) imply P = Q.
Convergence of Prob. M. on C([0, ∞)) (3)
• Theorem, (see Klenke, Theorem 21.37):
– Weak convergence in M(Ω, d) implies finite dimensional
distribution convergence:

  P_n ⇒ P (n → ∞) implies P_n ⇒_{fdd} P .
Convergence of Prob. M. on C([0, ∞)) (4)
• Theorem, (see Klenke, Theorem 21.38): Let (Pn)n∈N and P be
probability measures on C([0, ∞)). Then the following are equivalent:
– P_n ⇒_{fdd} P (n → ∞) and (P_n)_{n∈N} is tight.
– P_n ⇒ P weakly (n → ∞).
• On weak convergence see e.g. Klenke (2008)[Chapter 13]. In
particular, tightness is defined in Definition 13.26.
Convergence of Prob. M. on C([0, ∞)) (5)
• To derive a useful criterion for tightness, the Arzelà-Ascoli theorem
will be used. For N, δ > 0 and ω ∈ C([0, ∞)), let
V N (ω, δ) := sup{|ω(t) − ω(s)| : |t − s| ≤ δ, s, t ≤ N } .
• Theorem, Arzelà-Ascoli (see Klenke, Theorem 21.39): A set
A ⊂ C([0, ∞)) is relatively compact if and only if the following two
conditions hold:
– {ω(0) : ω ∈ A} ⊂ R is bounded.
– For every N we have lim_{δ↓0} sup_{ω∈A} V^N(ω, δ) = 0.
Convergence of Prob. M. on C([0, ∞)) (6)
• Theorem, (see Klenke, Theorem 21.40): A family (Pi : i ∈ I) of
probability measures on C([0, ∞)) is weakly relatively compact if and
only if the following two conditions hold:
– (P_i ◦ X0⁻¹, i ∈ I) is tight; that is, for every ε > 0 there is a K > 0
such that
P_i({ω : |ω(0)| > K}) ≤ ε
for all i ∈ I.
– For all η, ε > 0 and N ∈ N there is a δ > 0 such that
P_i({ω : V^N(ω, δ) > η}) ≤ ε
for all i ∈ I.
Convergence of Prob. M. on C([0, ∞)) (7)
• Corollary (see Klenke, Corollary 21.41):
  – Let (X_i : i ∈ I) and (Y_i : i ∈ I) be families of random variables
    with values in C([0, ∞)).
  – Assume that (P_{X_i} : i ∈ I) and (P_{Y_i} : i ∈ I) are tight.
  – Then (P_{X_i+Y_i} : i ∈ I) is tight.
Convergence of Prob. M. on C([0, ∞)) (8)
• Theorem, Kolmogorov’s criterion for weak relative compactness (see
  Klenke, Theorem 21.42): Let (X^i : i ∈ I) be a family of
  continuous stochastic processes. Assume that the following conditions
  are satisfied:
  – The family (P(X^i_0 ∈ ·) : i ∈ I) is tight.
  – There are numbers C, α, β > 0 such that for all s, t ∈ [0, ∞) and
    every i ∈ I we have

      E(|X^i_s − X^i_t|^α) ≤ C |s − t|^{β+1} .

  Then the family (P_{X^i} : i ∈ I) = (L(X^i) : i ∈ I) of distributions of
  X^i is weakly relatively compact in M(C([0, ∞))).
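Brownian motion itself satisfies this moment condition: its increments are Gaussian, B_s − B_t ∼ N(0, |s − t|), so the Gaussian fourth-moment formula gives

```latex
\mathbb{E}\,|B_s - B_t|^{4} \;=\; 3\,|s-t|^{2} \;=\; C\,|s-t|^{\beta+1},
\qquad \alpha = 4,\; \beta = 1,\; C = 3 .
```

Hence the criterion applies with α = 4, β = 1, C = 3.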
Donsker’s Theorem (1)
• We consider iid random variables Y_1, Y_2, . . . with E(Y_i) = 0 and
  V(Y_i) = σ² > 0.
• For t > 0 define

    S^n_t := Σ_{i=1}^{[nt]} Y_i   and   S̃^n_t := (1/√(σ²n)) Σ_{i=1}^{[nt]} Y_i ,

  where [nt] stands for the integer part of nt.
• By the central limit theorem

    L(S̃^n_t) → N_{0,t}   as n → ∞ .
Donsker’s Theorem (2)
• Given the properties of Brownian motion (B_t ∼ N_{0,t}) we observe that

    L(S̃^n_t) → L(B_t)   as n → ∞

  for any t > 0.
• By the multivariate central limit theorem we observe that

    L(S̃^n_{t_1}, . . . , S̃^n_{t_N}) → L(B_{t_1}, . . . , B_{t_N})   as n → ∞ .
Donsker’s Theorem (3)
• Define

    S̄^n_t := (1/√(σ²n)) Σ_{i=1}^{[nt]} Y_i + ((tn − [tn])/√(σ²n)) Y_{[nt]+1} ,

  i.e. S̄^n linearly interpolates the rescaled random walk, so its paths
  are continuous.
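The polygonal process can be sketched in a few lines of Python (function and variable names are illustrative); at grid points t = k/n the interpolation term vanishes, so S̄^n coincides with S̃^n there.

```python
# Sketch of the polygonal (linearly interpolated) process S-bar^n,
# whose paths lie in C([0, infinity)). Illustrative names only.
import numpy as np

def s_bar(Y, sigma, n, t):
    """S-bar^n_t = (sum_{i<=[nt]} Y_i + (tn - [tn]) * Y_{[nt]+1}) / sqrt(sigma^2 n)."""
    k = int(n * t)          # [nt], integer part of nt
    frac = n * t - k        # tn - [tn], the interpolation weight
    return (Y[:k].sum() + frac * Y[k]) / np.sqrt(sigma**2 * n)

rng = np.random.default_rng(1)
n = 1000
Y = rng.normal(size=n + 1)  # sigma = 1
# At the grid point t = 500/n the interpolation term vanishes:
print(s_bar(Y, 1.0, n, 0.5) == Y[:500].sum() / np.sqrt(n))  # True
```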
• Then for ε > 0,

    P(|S̃^n_t − S̄^n_t| > ε) ≤ ε^{−2} E((S̃^n_t − S̄^n_t)²)
                            ≤ (1/ε²) (1/(σ²n)) E(Y_1²) ≤ 1/(ε²n) → 0   as n → ∞ .
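The gap is a single rescaled summand, S̃^n_t − S̄^n_t = −((tn − [tn])/√(σ²n)) Y_{[nt]+1}, so its second moment is (tn − [tn])² / n ≤ 1/n. A small numeric illustration (parameters are arbitrary choices):

```python
# The gap S~^n_t - S-bar^n_t = -(tn - [tn]) Y_{[nt]+1} / sqrt(sigma^2 n), so
# E(gap^2) = (tn - [tn])^2 / n <= 1/n, the second-moment bound used above.
import numpy as np

rng = np.random.default_rng(2)
n, t, sigma, reps = 500, 0.3141, 1.5, 100_000

frac = n * t - int(n * t)                   # tn - [tn], lies in [0, 1)
Y_next = rng.normal(0.0, sigma, size=reps)  # the single term Y_{[nt]+1}
gap = -frac * Y_next / np.sqrt(sigma**2 * n)

print(np.mean(gap**2) <= 1.0 / n)  # True: the second moment obeys the bound
```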
Donsker’s Theorem (4)
• By Slutsky’s theorem (see e.g. Klenke, 2008, Theorem 13.18) we
  obtain convergence of the finite-dimensional distributions to the
  Wiener measure P_W , i.e.

    P_{S̄^n} ⇒_fdd P_W   as n → ∞ .
Donsker’s Theorem (5)
• Donsker’s theorem strengthens this convergence statement to weak
  convergence on C([0, ∞)).
• This theorem is also called the functional central limit theorem.
• Theorems of this kind are also called invariance principles, since
  the limiting distribution is the same for all distributions of Y_i with
  expectation zero and the same variance.
Donsker’s Theorem (6)
• Theorem, Donsker’s Theorem (see Klenke, Theorem 21.43):
  – In the sense of weak convergence on C([0, ∞)) the distributions of
    S̄^n converge to the Wiener measure,

      L(S̄^n) → P_W   as n → ∞ .

  – In particular S̄^n_t ⇒ B_t, where ⇒ stands for convergence in
    distribution.
• This theorem is the basis for deriving limit distributions of partial
  sums in econometrics (see e.g. Davidson, 1994; White, 2001).
Donsker’s Theorem (7)
• From the continuous mapping theorem (see e.g. Klenke, 2008,
  Theorem 13.25) it follows:
• Theorem (see e.g. Durrett, 2010, Theorem 8.6.6):
  – If φ : C([0, 1]) → R is continuous almost everywhere (with respect
    to the Wiener measure), then

      φ(S̄^n) ⇒ φ(B) .
Donsker’s Theorem (8)
• Example (see Durrett, 2010, Example 8.6.1):
  – Let φ(x) = x(1). Then φ : C([0, 1]) → R is continuous and the
    above theorem gives the central limit theorem.
• Example (see Durrett, 2010, Example 8.6.5):
  – Let φ(ω) = ∫_{[0,1]} ω(t)^k dt with k ∈ N. φ(·) is continuous.
    Then (with σ² = 1)

      n^{−1−(k/2)} Σ_{m=1}^n (S_m)^k = n^{−1−(k/2)} Σ_{m=1}^n (Σ_{i=1}^m Y_i)^k ⇒ ∫_0^1 B_t^k dt .
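For k = 1 the limit ∫_0^1 B_t dt is centered Gaussian with variance ∫_0^1 ∫_0^1 min(s, u) ds du = 1/3, which a Monte Carlo sketch can illustrate (sample size and Y-distribution are illustrative choices with σ² = 1):

```python
# Monte Carlo sketch of Example 8.6.5 for k = 1 (V(Y_i) = 1):
# n^(-3/2) sum_{m=1}^n S_m converges in law to int_0^1 B_t dt ~ N(0, 1/3).
import numpy as np

rng = np.random.default_rng(3)
n, reps = 2_000, 5_000

Y = rng.choice([-1.0, 1.0], size=(reps, n))  # iid, mean 0, variance 1
S = Y.cumsum(axis=1)                         # partial sums S_1, ..., S_n
stat = S.sum(axis=1) / n**1.5                # n^(-3/2) * sum_m S_m

print(stat.var())  # close to 1/3
```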
References
Billingsley, P. (1986). Probability and Measure. Wiley, Wiley series in
probability and mathematical statistics, New York, 2nd edition.
Brockwell, P. J. and Davis, R. A. (2006). Time Series: Theory and
Methods. Springer Series in Statistics. Springer, New York, 2nd
edition.
Brooks, S. P. and Gelman, A. (1998). General methods for monitoring
convergence of iterative simulations. Journal of Computational and
Graphical Statistics, 7(4):434–455.
Cameron, A. C. and Trivedi, P. K. (2005). Microeconometrics: Methods
and Applications. Cambridge University Press, New York.
Campbell, J. Y., Lo, A. W., and MacKinlay, A. C. (1997). The
Econometrics of Financial Markets. Princeton University Press,
Princeton.
Chib, S. and Ergashev, B. (2009). Analysis of multifactor affine yield
curve models. Journal of the American Statistical Association,
104(488):1324–1337.
Cowles, M. K. and Carlin, B. P. (1996). Markov chain Monte Carlo
convergence diagnostics: A comparative review. Journal of the
American Statistical Association, 91(434):883–904.
Cox, J., Ross, S., and Rubinstein, M. (1979). Option pricing: A
simplified approach. Journal of Financial Economics, 7:229–263.
Davidson, J. (1994). Stochastic Limit Theory - An Introduction for
Econometricians. Oxford University Press, New York.
Delbaen, F. and Schachermayer, W. (1994). A general version of the
fundamental theorem of asset pricing. Mathematische Annalen, 300:463–520.
Duffie, D. (2001). Dynamic Asset Pricing. Princeton University Press,
Princeton and Oxford.
Durrett, R. (2007). Random Graph Dynamics. Cambridge Series in
Statistical and Probabilistic Mathematics. Cambridge University
Press, Cambridge.
Durrett, R. (2010). Probability: Theory and Examples. 4th Edition.
Cambridge University Press, Cambridge.
Filipović, D. (2009). Term-Structure Models: A Graduate Course.
Springer, Berlin.
Frühwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching
Models. Springer Series in Statistics. Springer.
Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation
using multiple sequences. Statistical Science, 7(4):457–472.
Geweke, J. (1992). Evaluating the accuracy of sampling-based
approaches to the calculation of posterior moments. In Bernardo,
J. M., Berger, J. O., Dawid, A. P., and Smith, A. F. M., editors,
Bayesian Statistics 4, pages 169–193. Oxford University Press, Oxford.
Harrison, J. and Pliska, S. R. (1981). Martingales and stochastic
integrals in the theory of continuous trading. Stochastic Processes
and their Applications, 11(3):215–260.
Harrison, M. and Kreps, D. (1979). Martingales and arbitrage in
multiperiod security markets. Journal of Economic Theory,
20:381–408.
Heuser, H. (1993). Lehrbuch der Analysis, Teil 1. Teubner, Wiesbaden,
10th edition.
Horn, R. A. and Johnson, C. R. (1990). Matrix analysis. Cambridge
University Press, Cambridge. Corrected reprint of the 1985 original.
Karatzas, I. and Shreve, S. E. (1991). Brownian Motion and Stochastic
Calculus. Springer-Verlag, New York, 2nd edition.
Karr, A. F. (1993). Probability Theory. Springer.
Klenke, A. (2008). Probability Theory - A Comprehensive Course.
Springer.
Lamberton, D. and Lapeyre, B. (2008). Introduction to Stochastic
Calculus – Applied to Finance. Chapman & Hall, London, 2nd edition.
LeRoy, S. F. (1989). Efficient capital markets and martingales. Journal
of Economic Literature, 27(4):1583–1621.
Lucas, R. E., Jr. (1978). Asset prices in an exchange economy.
Econometrica, 46(6):1429–1445.
Luenberger, D. G. (1979). Introduction to Dynamic Systems: Theory,
Models, and Applications. John Wiley and Sons, New York.
Mas-Colell, A., Whinston, M. D., and Green, J. R. (1995).
Microeconomic Theory. Oxford University Press, New York.
Meyn, S. and Tweedie, R. L. (2009). Markov Chains and Stochastic
Stability. Cambridge University Press, New York, 2nd edition.
Munkres, J. (2000). Topology. Prentice Hall, Upper Saddle River, NJ,
2nd edition.
Norris, J. R. (1998). Markov Chains. Cambridge University Press.
Robert, C. and Casella, G. (1999). Monte Carlo Statistical Methods.
Springer, New York.
Ruud, P. A. (2000). An Introduction to Classical Econometric Theory.
Oxford University Press, New York.
Schönbucher, P. J. (2003). Credit Derivatives Pricing Models: Models,
Pricing and Implementation. Wiley Finance Series. John Wiley &
Sons.
Tierney, L. (1998). A note on Metropolis-Hastings kernels for general
state spaces. Annals of Applied Probability, 8:1–9.
LeRoy, S. F. and Werner, J. (2001). Principles of Financial Economics.
Cambridge University Press, Cambridge.
White, H. (2001). Asymptotic Theory For Econometricians. Emerald
Group Publishing, Bingley, UK, revised edition.