Advanced Probability

University of Cambridge,
Part III of the Mathematical Tripos
Michaelmas Term 2006

Grégory Miermont
CNRS & Laboratoire de Mathématique, Équipe Probabilités, Statistique et Modélisation,
Bât. 425, Université Paris-Sud, 91405 Orsay, France
Contents

1 Conditional expectation
   1.1 The discrete case
   1.2 Conditioning with respect to a σ-algebra
       1.2.1 The L2 case
       1.2.2 General case
       1.2.3 Non-negative case
   1.3 Specific properties of conditional expectation
   1.4 Computing a conditional expectation
       1.4.1 Conditional density functions
       1.4.2 The Gaussian case

2 Discrete-time martingales
   2.1 Basic notions
       2.1.1 Stochastic processes, filtrations
       2.1.2 Martingales
       2.1.3 Doob’s stopping times
   2.2 Optional stopping
   2.3 The convergence theorem
   2.4 Lp convergence, p > 1
       2.4.1 A maximal inequality
   2.5 L1 convergence
   2.6 Optional stopping in the UI case
   2.7 Backwards martingales

3 Examples of applications of discrete-time martingales
   3.1 Kolmogorov’s 0 − 1 law, law of large numbers
   3.2 Branching processes
   3.3 The Radon-Nikodym theorem
   3.4 Product martingales
       3.4.1 Example: consistency of the likelihood ratio test

4 Continuous-parameter processes
   4.1 Theoretical problems
   4.2 Finite marginal distributions, versions
   4.3 The martingale regularization theorem
   4.4 Convergence theorems for continuous-time martingales
   4.5 Kolmogorov’s continuity criterion

5 Weak convergence
   5.1 Definition and characterizations
   5.2 Convergence in distribution
   5.3 Tightness
   5.4 Lévy’s convergence theorem

6 Brownian motion
   6.1 Wiener’s theorem
   6.2 First properties
   6.3 The strong Markov property
   6.4 Martingales and Brownian motion
   6.5 Recurrence and transience properties
   6.6 The Dirichlet problem
   6.7 Donsker’s invariance principle

7 Poisson random measures and processes
   7.1 Poisson random measures
   7.2 Integrals with respect to a Poisson measure
   7.3 Poisson point processes
       7.3.1 Example: the Poisson process
       7.3.2 Example: compound Poisson processes

8 ID laws and Lévy processes
   8.1 The Lévy-Khintchine formula
   8.2 Lévy processes

9 Exercises
   9.1 Conditional expectation
   9.2 Discrete-time martingales
   9.3 Continuous-time processes
   9.4 Weak convergence
   9.5 Brownian motion
   9.6 Poisson measures, ID laws and Lévy processes
Chapter 1

Conditional expectation

1.1 The discrete case
Let (Ω, F, P) be a probability space. If A, B ∈ F are two events such that P(B) > 0, we
define the conditional probability of A given B by the formula

    P(A|B) = P(A ∩ B) / P(B).

We interpret this quantity as the probability of the event A given the fact that B is
realized. The fact that

    P(A|B) = P(B|A) P(A) / P(B)

is called Bayes’ rule. More generally, if X ∈ L1(Ω, F, P) is an integrable random variable,
we define

    E[X|B] = E[X 1B] / P(B),

the conditional expectation of X given B.
Example. Toss a fair die (probability space Ω = {1, 2, 3, 4, 5, 6} and P ({ω}) = 1/6,
for ω ∈ Ω) and let A = {the result is even}, B = {the result is less than or equal to 2}.
Then P (A|B) = 1/2, P (B|A) = 1/3. If X = ω is the result, then E[X|A] = 4, E[X|B] =
3/2.
Let (Bi, i ∈ I) be a countable collection of disjoint events such that Ω = ∪_{i∈I} Bi, and
G = σ{Bi, i ∈ I}. If X ∈ L1(Ω, F, P), we define a random variable

    X′ = Σ_{i∈I} E[X|Bi] 1Bi,

with the convention that E[X|Bi] = 0 if P(Bi) = 0.
The random variable X′ is integrable, since

    E[|X′|] = Σ_{i∈I} P(Bi) |E[X|Bi]| = Σ_{i∈I} P(Bi) |E[X 1Bi]| / P(Bi) ≤ E[|X|].
Moreover, it is straightforward to check:

1. X′ is G-measurable, and

2. for every B ∈ G, E[1B X′] = E[1B X].
Example. If X ∈ L1(Ω, F, P) and Y is a random variable with values in a countable set
E, the above construction applied to the events By = {Y = y}, y ∈ E, which partition Ω
and generate σ(Y), gives a random variable

    E[X|Y] = Σ_{y∈E} E[X|Y = y] 1{Y=y}.
Notice that the value taken by E[X|Y = y] when P (Y = y) = 0, which we have fixed
to 0, is actually irrelevant to define E[X|Y ], since a random variable is always defined up
to a set of zero measure. It is important to keep in mind that conditional expectations
are always a priori only defined up to a zero-measure set.
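To make the discrete construction concrete, here is a small Python sketch (an illustration, not part of the original notes): it builds E[X|Y] on a finite sample space by averaging X over the level sets {Y = y}. The particular Ω, X and Y below are illustrative choices.

    from collections import defaultdict
    from fractions import Fraction

    # Finite sample space: a fair die, P({w}) = 1/6 (illustrative choice).
    omega = [1, 2, 3, 4, 5, 6]
    p = {w: Fraction(1, 6) for w in omega}

    def X(w):
        return w          # the result of the throw

    def Y(w):
        return w % 2      # Y generates the sigma-algebra {even, odd}

    # E[X | Y = y] = E[X 1_{Y=y}] / P(Y = y), computed by direct summation.
    num, den = defaultdict(Fraction), defaultdict(Fraction)
    for w in omega:
        num[Y(w)] += p[w] * X(w)
        den[Y(w)] += p[w]
    cond_exp = {y: num[y] / den[y] for y in den}

    # E[X|Y](w) is the map w -> E[X | Y = Y(w)]; it is sigma(Y)-measurable
    # and satisfies E[1_B E[X|Y]] = E[1_B X] for every B in sigma(Y).
    EXY = {w: cond_exp[Y(w)] for w in omega}
    print(cond_exp)   # {1: 3, 0: 4}: odd results average to 3, even results to 4
    assert sum(p[w] * EXY[w] for w in omega) == sum(p[w] * X(w) for w in omega)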
1.2 Conditioning with respect to a σ-algebra
We are now going to define the conditional expectation given a sub-σ-algebra of our
probability space, by using the properties 1. and 2. of the previous paragraph. The
definition is due to Kolmogorov.
Theorem 1.2.1 Let G ⊂ F be a sub-σ-algebra, and X ∈ L1(Ω, F, P). Then there exists
a random variable X′ with E[|X′|] < ∞ such that the following two characteristic properties
are satisfied:
1. X′ is G-measurable;
2. for every B ∈ G, E[1B X′] = E[1B X].
Moreover, if X″ is another such random variable, then X′ = X″ a.s. We denote by
E[X|G] ∈ L1(Ω, G, P) the class of the random variable X′. It is called the conditional
expectation of X given G.
In other words, E[X|G] is the unique element of L1(Ω, G, P) such that E[1B X] =
E[1B E[X|G]] for every B ∈ G. Equivalently, an approximation argument allows one to
replace 2. in the statement by:
2’. For every bounded G-measurable random variable Z, E[ZX′] = E[ZX].
Proof of the uniqueness. Suppose X′ and X″ satisfy the two conditions of the
statement. Then B = {X′ > X″} ∈ G, and therefore

    0 = E[1B(X − X)] = E[1B(X′ − X″)],

which shows X′ ≤ X″ a.s.; the reverse inequality is obtained by symmetry.
The existence will need two intermediate steps.
1.2.1 The L2 case
We first consider L2 variables. Suppose that X ∈ L2 (Ω, F, P ) and let G ⊂ F be a
sub-σ-algebra. Notice that L2 (Ω, G, P ) is a closed vector subspace of the Hilbert space
L2 (Ω, F, P ). Therefore, there exists a unique random variable X 0 ∈ L2 (Ω, G, P ) such that
E[Z(X − X 0 )] = 0 for every Z ∈ L2 (Ω, G, P ), namely, X 0 is the orthogonal projection of
X onto L2 (Ω, G, P ). This shows the previous theorem in the case X ∈ L2 , and in fact
E[·|G] : L2 → L2 is the orthogonal projector onto L2 (Ω, G, P ), and hence is linear.
It follows from the uniqueness statement that the conditional expectation has the
following nice interpretation in the L2 case: E[X|G] is the G-measurable random variable
that best approximates X. It is useful to keep this intuitive idea even in the general L1
case, although the word “approximates” becomes more fuzzy.
Notice that X′ := E[X|G] ≥ 0 a.s. whenever X ≥ 0 since (note that {X′ < 0} ∈ G)

    E[X 1{X′<0}] = E[X′ 1{X′<0}],

and the left-hand side is non-negative while the right-hand side is non-positive, entailing
P(X′ < 0) = 0. Moreover, it holds that E[E[X|G]] = E[X], because this is the scalar
product of X with the constant function 1 ∈ L2(Ω, G, P).
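The projection interpretation can be checked numerically. The following Python sketch (an illustration, not from the notes; the probability space, partition and variable are arbitrary choices) verifies that on a finite space the conditional expectation given a partition-generated σ-algebra coincides with the weighted least-squares projection onto variables that are constant on the blocks.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 12
    p = np.full(n, 1.0 / n)                 # uniform probabilities on 12 points
    X = rng.normal(size=n)                  # an arbitrary random variable

    # G is generated by a partition into three blocks (illustrative choice).
    blocks = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]

    # Conditional expectation: average of X over each block.
    EXG = np.empty(n)
    for B in blocks:
        EXG[B] = np.dot(p[B], X[B]) / p[B].sum()

    # Orthogonal projection in L2(P): minimize E[(X - Z)^2] over Z constant on blocks,
    # via weighted least squares on the block indicators.
    design = np.column_stack([np.isin(np.arange(n), B) for B in blocks]).astype(float)
    W = np.diag(p)                          # inner product <U, V> = E[UV]
    coef = np.linalg.solve(design.T @ W @ design, design.T @ W @ X)
    proj = design @ coef

    assert np.allclose(EXG, proj)                     # projection == block averages
    assert np.isclose(np.dot(p, EXG), np.dot(p, X))   # E[E[X|G]] = E[X]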
1.2.2 General case
Now let X ≥ 0 be any non-negative random variable (not necessarily integrable). Then
X ∧ n is in L2 for every n ∈ N, and X ∧ n increases to X pointwise. Therefore, the
sequence E[X ∧ n|G] is (a.s.) increasing, because X ∧ n − X ∧ (n − 1) ≥ 0 and by positivity
and linearity of E[·|G] on L2. It therefore increases a.s. to a limit, which we denote by
E[X|G]. Notice that E[E[X ∧ n|G]] = E[X ∧ n] so that by the monotone convergence
theorem, E[E[X|G]] = E[X]. In particular, if X is integrable, then so is E[X|G].
Proof of existence in Theorem 1.2.1. Let X ∈ L1, and write X = X+ − X− (where
X+ = X ∨ 0 and X− = (−X) ∨ 0). Then X+, X− are non-negative integrable random
variables, so E[X+|G] and E[X−|G] are finite a.s. and we may define

    E[X|G] = E[X+|G] − E[X−|G].

Now, let B ∈ G. Then E[(X+ ∧ n)1B] = E[E[X+ ∧ n|G]1B] by definition. The monotone
convergence theorem allows us to pass to the limit (all the random variables being integrated
are non-negative), and we obtain E[X+ 1B] = E[E[X+|G]1B]. The same holds for X−,
and by subtracting we see that E[X|G] indeed satisfies the characteristic properties 1. and 2.
The following properties are immediate consequences of the previous theorem and its
proof.
Proposition 1.2.1 Let G ⊂ F be a σ-algebra and X, Y ∈ L1 (Ω, F, P ). Then
1. E[E[X|G]] = E[X]
2. If X is G-measurable, then E[X|G] = X.
3. If X is independent of G, then E[X|G] = E[X].
4. If a, b ∈ R then E[aX + bY |G] = aE[X|G] + bE[Y |G] (linearity).
5. If X ≥ 0 then E[X|G] ≥ 0 (positivity).
6. |E[X|G]| ≤ E[|X| |G], so that E[|E[X|G]|] ≤ E[|X|].
Important remark. Notice that all statements concerning conditional expectation are
about L1 variables, which are only defined up to a set of zero probability, and hence
are a.s. statements. This is of crucial importance and recalls the fact, encountered
before, that E[X|Y = y] can be assigned an arbitrary value whenever P(Y = y) = 0.
1.2.3 Non-negative case
In the course of proving the last theorem, we actually built an object E[X|G] as the a.s.
increasing limit of E[X ∧ n|G] for any non-negative random variable X, not necessarily
integrable. This random variable enjoys properties similar to those of the L1 case, and we
state them in a form analogous to Theorem 1.2.1.
Theorem 1.2.2 Let G ⊂ F be a sub-σ-algebra, and X ≥ 0 a non-negative random
variable. Then there exists a random variable X′ ≥ 0 such that
1. X′ is G-measurable, and
2. for every non-negative G-measurable random variable Z, E[ZX′] = E[ZX].
Moreover, if X″ is another such r.v., X′ = X″ a.s. We denote by E[X|G] the class of X′
up to a.s. equality.
Proof. Any r.v. in the class of E[X|G] = lim sup_n E[X ∧ n|G] trivially satisfies 1. It also
satisfies 2., since if Z is a non-negative G-measurable random variable, we have, by passing
to the (increasing) limit in

    E[(X ∧ n)(Z ∧ n)] = E[E[X ∧ n|G](Z ∧ n)],

that E[XZ] = E[E[X|G]Z].
Uniqueness. If X′, X″ are non-negative and satisfy the properties 1. and 2., for any
a < b ∈ Q+, by letting B = {X′ ≤ a < b ≤ X″} ∈ G, we obtain

    bP(B) ≤ E[X″ 1B] = E[X 1B] = E[X′ 1B] ≤ aP(B),

which entails P(B) = 0, so that P(X′ < X″) = 0 by taking the countable union over
a < b ∈ Q+. Similarly, P(X′ > X″) = 0.
The reader is invited to formulate and prove analogs of the properties of Proposition
1.2.1 for positive variables, and in particular, that if 0 ≤ X ≤ Y then 0 ≤ E[X|G] ≤
E[Y |G] a.s. The conditional expectation enjoys the following properties, which match
those of the classical expectation.
Proposition 1.2.2 Let G ⊂ F be a σ-algebra.

1. If (Xn, n ≥ 0) is an increasing sequence of non-negative random variables with limit
X, then (conditional monotone convergence theorem)

    E[Xn|G] ↗ E[X|G]   a.s. as n → ∞.

2. If (Xn, n ≥ 0) is a sequence of non-negative random variables, then (conditional
Fatou theorem)

    E[lim inf_n Xn|G] ≤ lim inf_n E[Xn|G]   a.s.

3. If (Xn, n ≥ 0) is a sequence of random variables a.s. converging to X, and if there
exists Y ∈ L1(Ω, F, P) such that sup_n |Xn| ≤ Y a.s., then (conditional dominated
convergence theorem)

    lim_{n→∞} E[Xn|G] = E[X|G]   a.s. and in L1.

4. If ϕ : R → (−∞, ∞] is a convex function and X ∈ L1(Ω, F, P), and either ϕ is
non-negative or ϕ(X) ∈ L1(Ω, F, P), then (conditional Jensen inequality)

    E[ϕ(X)|G] ≥ ϕ(E[X|G])   a.s.

5. If 1 ≤ p < ∞ and X ∈ Lp(Ω, F, P),

    ‖E[X|G]‖p ≤ ‖X‖p.

In particular, the linear operator X ↦ E[X|G] from Lp(Ω, F, P) to Lp(Ω, G, P) is
continuous.
Proof. 1. Let X′ be the increasing limit of E[Xn|G]. Let Z be a non-negative G-measurable
random variable; then E[ZE[Xn|G]] = E[ZXn], which by taking an increasing limit gives
E[ZX′] = E[ZX], so X′ = E[X|G].
2. We have E[inf_{k≥n} Xk|G] ≤ inf_{k≥n} E[Xk|G] for every n by monotonicity of the
conditional expectation, and the result is obtained by passing to the limit and using 1.
3. Applying 2. to the non-negative random variables Y − Xn and Y + Xn, we get that
E[Y − X|G] ≤ E[Y|G] − lim sup E[Xn|G] and that E[Y + X|G] ≤ E[Y|G] + lim inf E[Xn|G],
giving the a.s. result. The L1 result is a consequence of the dominated convergence
theorem, since |E[Xn|G]| ≤ E[|Xn| |G] ≤ E[Y|G] a.s., and the right-hand side is in L1.
4. A convex function ϕ is the upper envelope of its affine minorants, i.e.

    ϕ(x) = sup{ax + b : a, b ∈ R, ay + b ≤ ϕ(y) for all y} = sup{ax + b : a, b ∈ Q, ay + b ≤ ϕ(y) for all y}.

The result is then a consequence of the linearity and monotonicity of the conditional
expectation and the fact that Q is countable (this last fact is needed because conditional
expectation is defined only a.s.).
5. One deduces from 4. and the previous proposition that ‖E[X|G]‖p^p = E[|E[X|G]|^p] ≤
E[E[|X|^p |G]] = E[|X|^p] = ‖X‖p^p, if 1 ≤ p < ∞ and X ∈ Lp(Ω, F, P). Thus
‖E[X|G]‖p ≤ ‖X‖p.
1.3 Specific properties of conditional expectation
The “information contained in G” can be factored out of the conditional expectation:
Proposition 1.3.1 Let G ⊂ F be a σ-algebra, and let X, Y be real random variables such
that either X, Y are non-negative or X, XY ∈ L1(Ω, F, P). Then, if Y is G-measurable,
we have

    E[Y X|G] = Y E[X|G].
Proof. Let Z be a non-negative G-measurable random variable, then, if X, Y are nonnegative, E[ZY X] = E[ZY E[X|G]] since ZY is non-negative, and the result follows by
uniqueness. If X, XY are integrable, the same result follows by letting X = X + −X − , Y =
Y + − Y −.
One also has the tower property (restricting the information):
Proposition 1.3.2 Let G1 ⊂ G2 ⊂ F be σ-algebras. Then for every random variable X
which is positive or integrable,
E[E[X|G2 ]|G1 ] = E[X|G1 ].
Proof. For a positive bounded G1 -measurable Z, Z is G2 -measurable as well, so that
E[ZE[E[X|G2 ]|G1 ]] = E[ZE[X|G2 ]] = E[ZX] = E[ZE[X|G1 ]], hence the result.
Proposition 1.3.3 Let G1 , G2 be two sub-σ-algebras of F, and let X be a positive or
integrable random variable. Then, if G2 is independent of σ(X, G1 ), E[X|G1 ∨ G2 ] =
E[X|G1 ].
Proof. Let A ∈ G1 , B ∈ G2 , then
E[1A∩B E[X|G1 ∨ G2 ]] = E[1A 1B X] = E[1B E[X 1A |G2 ]] = P (B)E[X 1A ]
= P (B)E[1A E[X|G1 ]] = E[1A∩B E[X|G1 ]],
where we have used the independence property at the third and last steps. The proof is
then done by the monotone class theorem.
Proposition 1.3.4 Let X, Y be random variables and G be a sub-σ-algebra of F such
that Y is G-measurable and X is independent of G. Then for any non-negative measurable
function f,

    E[f(X, Y)|G] = ∫ P(X ∈ dx) f(x, Y),

where P(X ∈ dx) is the law of X.
Proof. For any non-negative G-measurable random variable Z, we have that X is
independent of (Y, Z), so that the law P((X, Y, Z) ∈ dx dy dz) is equal to the product
P(X ∈ dx) P((Y, Z) ∈ dy dz) of the law of X by the law of (Y, Z). Hence,

    E[Zf(X, Y)] = ∫ z f(x, y) P(X ∈ dx) P((Y, Z) ∈ dy dz)
                = ∫ P(X ∈ dx) E[Zf(x, Y)] = E[Z ∫ P(X ∈ dx) f(x, Y)],

where we used Fubini’s theorem in two places. This shows the result.
1.4 Computing a conditional expectation
We give two concrete and important examples of computation of conditional expectations.
1.4.1 Conditional density functions
Suppose X, Y have values in Rm and Rn respectively, and that the law of (X, Y) has a
density: P((X, Y) ∈ dx dy) = fX,Y(x, y) dx dy. Let fY(y) = ∫_{Rm} fX,Y(x, y) dx, y ∈ Rn, be
the density of Y. Then for every non-negative measurable h : Rm → R, g : Rn → R, we
have

    E[h(X)g(Y)] = ∫_{Rm×Rn} h(x) g(y) fX,Y(x, y) dx dy
                = ∫_{Rn} g(y) fY(y) dy ∫_{Rm} h(x) (fX,Y(x, y)/fY(y)) 1{fY(y)>0} dx
                = E[ϕ(Y)g(Y)],

so E[h(X)|Y] = ϕ(Y), where

    ϕ(y) = (1/fY(y)) ∫_{Rm} h(x) fX,Y(x, y) dx   if fY(y) > 0,

and 0 otherwise. We interpret this result by saying that

    E[h(X)|Y] = ∫_{Rm} h(x) ν(Y, dx),

where ν(y, dx) = fY(y)^{−1} fX,Y(x, y) 1{fY(y)>0} dx = fX|Y(x|y) dx. The measure ν(y, dx)
is called the conditional distribution given Y = y, and fX|Y(x|y) is the conditional density
function of X given Y = y. Notice this function of x, y is defined only up to a zero-measure set.
1.4.2 The Gaussian case
Let (X, Y) be a Gaussian vector in R2. Take X′ = aY + b with a, b such that Cov(X, Y) =
a Var Y and aE[Y] + b = E[X]. In this case, Cov(Y, X − X′) = 0, hence X − X′ is
independent of σ(Y) by properties of Gaussian vectors. Moreover, X − X′ is centered, so
for every B ∈ σ(Y), one has E[1B X] = E[1B X′], hence X′ = E[X|Y].
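As a quick numerical check of this formula (a sketch, not part of the notes; the particular bivariate Gaussian used is an arbitrary choice), one can simulate (X, Y) and compare the linear predictor aY + b with the empirical average of X over thin slices of Y values.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000

    # Illustrative bivariate Gaussian: X = 2 + Y + noise, with Y ~ N(1, 1).
    Y = rng.normal(1.0, 1.0, size=n)
    X = 2.0 + Y + rng.normal(0.0, 0.5, size=n)

    # Coefficients from the text: Cov(X, Y) = a Var(Y), a E[Y] + b = E[X].
    a = np.cov(X, Y)[0, 1] / np.var(Y)
    b = X.mean() - a * Y.mean()

    # Compare the empirical E[X | Y near y0] with the prediction a*y0 + b.
    for y0 in (-0.5, 1.0, 2.5):
        mask = np.abs(Y - y0) < 0.05
        print(f"y={y0:+.1f}  empirical E[X|Y~y]={X[mask].mean():.3f}  aY+b={a * y0 + b:.3f}")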
Chapter 2
Discrete-time martingales
Before we entirely focus on discrete-time martingales, we start with a general discussion
on stochastic processes, which includes both discrete and continuous-time processes.
2.1 Basic notions

2.1.1 Stochastic processes, filtrations
Let (Ω, F, P) be a probability space. For a measurable state space (E, E) and a subset
I ⊂ R of “times”, or “epochs”, an E-valued stochastic process indexed by I is a collection
(Xt, t ∈ I) of random variables. Most of the processes we will consider take values in R,
Rd, or C, endowed with their Borel σ-algebras.
A filtration is a collection (Ft , t ∈ I) of sub-σ-algebras of F which is increasing (s ≤
t =⇒ Fs ⊆ Ft ). Once a filtration is given, we call (Ω, F, (Ft )t∈I , P ) a filtered probability
space. A process (Xt , t ∈ I) is adapted to the filtration (Ft , t ∈ I) if Xt is Ft -measurable
for every t.
The intuitive idea is that Ft is the quantity of information available up to time t
(present). To give an informal example, if we are interested in the evolution of the stock
market, we can take Ft as the past history of the stocks prices (or only some of them) up
to time t.
We will let F∞ = ⋁_{t∈I} Ft ⊆ F be the information available at the end of time.
Example. With every process (Xt, t ∈ I), one associates its natural filtration

    FtX = σ({Xs, s ≤ t}),   t ∈ I.

Every process is adapted to its natural filtration, and F X is the smallest filtration to which
X is adapted: FtX contains all the measurable events depending on (Xs, s ≤ t).
Last, a real-valued process (Xt , t ∈ I) is said to be integrable if E[|Xt |] < ∞ for all
t ∈ I.
2.1.2 Martingales
Definition 2.1.1 Let (Ω, F, (Ft )t∈I , P ) be a filtered probability space. An R-valued adapted
integrable process (Xt , t ∈ I) is:
• a martingale if for every s ≤ t, E[Xt |Fs ] = Xs .
• a supermartingale if for every s ≤ t, E[Xt |Fs ] ≤ Xs .
• a submartingale if for every s ≤ t, E[Xt |Fs ] ≥ Xs .
Notice that a (super-, sub-)martingale remains a (super-, sub-)martingale with respect to
its natural filtration, by the tower property of conditional expectation.
2.1.3 Doob’s stopping times
Definition 2.1.2 Let (Ω, F, (Ft)t∈I, P) be a filtered probability space. A stopping time
(with respect to this space) is a random variable T : Ω → I ⊔ {∞} such that {T ≤ t} ∈ Ft
for every t ∈ I.
For example, constant times are (trivial) stopping times. If I = Z+ , the random
variable n1A + ∞1Ac is a stopping time if A ∈ Fn (with the convention 0 · ∞ = 0).
The intuitive idea behind this definition is that T is a time when a decision can be taken
(given the information we have). For example, for a meteorologist having the weather
information up to the present time, the “first day of 2006 when the temperature is above
23°C” is a stopping time, but not the “last day of 2006 when the temperature is above
23°C”.
Example. If I ⊂ Z+, the definition can be replaced by {T = n} ∈ Fn for all n ∈ I.
When I is a subset of the integers, we will denote times by the letters n, m, k rather than
t, s, r (so n ≥ 0 means n ∈ Z+). Particularly important instances of stopping times in this
case are the first entrance times. Let (Xn, n ≥ 0) be an adapted process and let A ∈ E.
The first entrance time in A is

    TA = inf{n ∈ Z+ : Xn ∈ A} ∈ Z+ ⊔ {∞}.

It is a stopping time, since

    {TA ≤ n} = ∪_{0≤m≤n} Xm^{−1}(A) ∈ Fn.

On the contrary, the last exit time before some fixed N,

    LA = sup{n ∈ {0, 1, . . . , N} : Xn ∈ A} ∈ Z+ ⊔ {∞},

is in general not a stopping time.
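For instance (a small sketch, not from the notes; the walk, the set A and the horizon are illustrative choices), the first entrance time of a simple random walk can be read off the path as it is revealed, which is exactly the stopping-time property; the last visit before a fixed horizon requires knowledge of the whole future path.

    import random

    random.seed(0)
    N = 50
    steps = [random.choice((-1, 1)) for _ in range(N)]
    path = [0]
    for s in steps:                      # X_0 = 0, X_n = X_{n-1} + step_n
        path.append(path[-1] + s)

    A = {3}                              # target set (illustrative)

    # First entrance time: decided by inspecting X_0, ..., X_n only.
    T_A = next((n for n, x in enumerate(path) if x in A), None)   # None stands for +infinity

    # Last exit time before N: needs the whole path up to N, hence not a stopping time.
    L_A = max((n for n, x in enumerate(path) if x in A), default=None)

    print("T_A =", T_A, " L_A =", L_A)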
As an immediate consequence of the definition, one gets:
Proposition 2.1.1 Let S, T, (Tn , n ∈ N) be stopping times (with respect to some filtered
probability space). Then S ∧ T, S ∨ T, inf n Tn , supn Tn , lim inf n Tn , lim supn Tn are stopping
times.
Definition 2.1.3 Let T be a stopping time with respect to some filtered probability space
(Ω, F, (Ft)t∈I, P). We define FT, the σ-algebra of events prior to time T, by

    FT = {A ∈ F∞ : A ∩ {T ≤ t} ∈ Ft for every t ∈ I}.
The reader is invited to check that this indeed defines a σ-algebra, which is interpreted
as the events that are measurable with respect to the information available at time T : “4
days before the first day (T ) in 2005 when the temperature is above 23◦ C, the temperature
was below 10◦ C” is in FT . If S, T are stopping times, one checks that
S ≤ T =⇒ FS ⊆ FT .
(2.1)
Now suppose that I is countable. If (Xt, t ∈ I) is adapted and T is a stopping time,
we let XT 1{T<∞}(ω) = XT(ω)(ω) if T(ω) < ∞, and 0 otherwise. It is a random variable,
being the composition of (ω, t) ↦ Xt(ω) and ω ↦ (ω, T(ω)), which are measurable (why?).
We also let X^T = (XT∧t, t ∈ I), and call it the process X stopped at T.
Proposition 2.1.2 Under these hypotheses,
1. XT 1{T<∞} is FT-measurable,
2. the process X^T is adapted,
3. if moreover I = Z+ and X is integrable, then X^T is integrable.
Proof. 1. Let A ∈ E. Then {XT ∈ A} ∩ {T ≤ t} = ∪_{s∈I, s≤t} {Xs ∈ A} ∩ {T = s}. Then
notice {T = s} = {T ≤ s} \ ∪_{u<s} {T ≤ u} ∈ Fs.
2. For every t ∈ I, XT∧t is FT∧t-measurable, hence Ft-measurable since T ∧ t ≤ t, by
(2.1).
3. If I = Z+ and X is integrable, E[|X_n^T|] = Σ_{m<n} E[|Xm|1{T=m}] + E[|Xn|1{T≥n}] ≤
(n + 1) sup_{0≤m≤n} E[|Xm|] < ∞.
From now on until the end of the section (except in the paragraph on backwards
martingales), we will suppose that E = R and I = Z+ (discrete-time processes).
2.2 Discrete-time martingales: optional stopping
We consider a filtered probability space (Ω, F, (Fn ), P ). All the above terminology (stopping times, adapted processes and so on) will be with respect to this space.
We first introduce the so-called ‘martingale transform’, which is sometimes called the
‘discrete stochastic integral’ with respect to a (super, sub)martingale X. We say that a
process (Cn , n ≥ 1) is previsible if Cn is Fn−1 -measurable for every n ≥ 1. A previsible
process can be interpreted as a strategy: one bets at time n only with all the accumulated
knowledge up to time n − 1.
If (Xn, n ≥ 0) is adapted and (Cn, n ≥ 1) is previsible, we define an adapted process
C · X by

    (C · X)n = Σ_{k=1}^{n} Ck (Xk − Xk−1).
We can interpret this new process as follows: if Xn is a certain amount of money at time
n and if Cn is the bet of a player at time n, then (C · X)n is the total winnings of the player
at time n.
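The martingale transform is easy to compute path by path. The following Python sketch (illustrative only; the coin-tossing martingale and the particular previsible strategy are arbitrary choices) evaluates (C · X)_n and checks empirically that its mean stays at 0, as Proposition 2.2.1 below predicts.

    import numpy as np

    rng = np.random.default_rng(2)
    n_steps, n_paths = 100, 20_000

    # X_n = sum of +/-1 fair coin tosses: a martingale started at 0.
    increments = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))
    X = np.concatenate([np.zeros((n_paths, 1)), increments.cumsum(axis=1)], axis=1)

    # A previsible strategy: C_n may depend on X_0, ..., X_{n-1} only.
    # Here: bet 1 while the current fortune is non-negative, else 0 (illustrative).
    C = (X[:, :-1] >= 0).astype(float)

    # (C . X)_n = sum_{k<=n} C_k (X_k - X_{k-1})
    transform = np.cumsum(C * increments, axis=1)

    # The transform of a martingale by a bounded previsible process is a martingale,
    # so its expectation stays equal to 0 (up to Monte Carlo error).
    print("E[(C.X)_n] for n = 10, 50, 100:", transform[:, [9, 49, 99]].mean(axis=0))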
Proposition 2.2.1 In this setting, if X is a martingale, and C is bounded, then C · X
is a martingale. If X is a supermartingale (resp. submartingale) and Cn ≥ 0 for every
n ≥ 1, then C · X is a supermartingale (resp. submartingale).
Proof. Suppose X is a martingale. Since C is bounded, the process C · X is trivially
integrable. Since Cn+1 is Fn -measurable,
E[(C · X)n+1 − (C · X)n |Fn ] = E[Cn+1 (Xn+1 − Xn )|Fn ] = Cn+1 E[Xn+1 − Xn |Fn ] = 0.
The (super-, sub-)martingale cases are similar.
Theorem 2.2.1 (Optional stopping) Let (Xn, n ≥ 0) be a martingale (resp. super-,
sub-martingale).
(i) If T is a stopping time, then X^T is also a martingale (resp. super-, sub-martingale).
(ii) If S ≤ T are bounded stopping times, then E[XT|FS] = XS (resp. E[XT|FS] ≤ XS,
E[XT|FS] ≥ XS).
(iii) If S ≤ T are bounded stopping times, then E[XT] = E[XS] (resp. E[XT] ≤ E[XS],
E[XT] ≥ E[XS]).
Proof. (i) Let Cn = 1{n≤T}; then C is a previsible, non-negative, bounded process, and it
is immediate that C · X = X^T − X0. The first result follows from Proposition 2.2.1.
(ii) If now S, T are bounded stopping times with S ≤ T, and A ∈ FS, we define
Cn = 1A 1{S<n≤T}. Then C is a non-negative bounded previsible process, since A ∩ {S <
n} = A ∩ {S ≤ n − 1} ∈ Fn−1 and {n ≤ T} = {n − 1 < T} ∈ Fn−1. Moreover, XS, XT
are integrable since S, T are bounded, and (C · X)K = 1A(XT − XS) as soon as K ≥ T
a.s. Since C · X is a martingale, E[(C · X)K] = E[(C · X)0] = 0, that is, E[1A XT] = E[1A XS].
As this holds for every A ∈ FS, it entails E[XT|FS] = XS.
(iii) Follows by taking expectations in (ii).
Notice that the last two statements are not true in general for unbounded stopping times.
For example, if (Yn, n ≥ 1) are independent random variables taking values ±1 with
probability 1/2 each, then Xn = Σ_{1≤i≤n} Yi is a martingale. If T = inf{n ≥ 0 : Xn = 1},
then it is classical that T < ∞ a.s., but of course E[XT] = 1 > 0 = E[X0]. However, for
non-negative supermartingales, Fatou’s lemma entails:
Proposition 2.2.2 Suppose X is a non-negative supermartingale. Then for any stopping
time T which is a.s. finite, we have E[XT] ≤ E[X0].
Beware that this ≤ sign should not in general be turned into a = sign, even if X is
a martingale! The very same proposition is actually true without the assumption that
P (T < ∞) = 1, by the martingale convergence theorem 2.3.1 below.
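The failure of optional stopping at the unbounded time T = inf{n : Xn = 1} is visible in simulation. The sketch below (illustrative; the horizon cap is needed because T has infinite mean) shows E[X_{T∧n}] staying at 0 for fixed n, while X_T = 1 on every path that has already stopped.

    import numpy as np

    rng = np.random.default_rng(3)
    n_paths, horizon = 50_000, 10_000

    X = np.zeros(n_paths)
    T = np.full(n_paths, -1)                      # -1 means "has not hit 1 yet"
    stopped = np.zeros(n_paths, dtype=bool)

    for n in range(1, horizon + 1):
        steps = rng.choice([-1, 1], size=n_paths)
        X[~stopped] += steps[~stopped]            # move only the unstopped paths
        just_hit = (~stopped) & (X == 1)
        T[just_hit] = n
        stopped |= just_hit

    # E[X_{T ^ n}] = 0 for every fixed n (bounded optional stopping), while
    # X_T = 1 on {T < infinity}; here most paths have stopped by the horizon.
    print("fraction stopped:", stopped.mean())
    print("E[X_{T ^ horizon}]:", X.mean())              # close to 0
    print("E[X_T | T <= horizon]:", X[stopped].mean())  # exactly 1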
2.3 Discrete-time martingales: the convergence theorem
The martingale convergence theorem is the most important result in this chapter.
Theorem 2.3.1 (Martingale convergence theorem) If X is a supermartingale which
is bounded in L1 (Ω, F, P ), i.e. such that supn E[|Xn |] < ∞, then Xn converges a.s. towards an a.s. finite limit X∞ .
An easy and important corollary is:
Corollary 2.3.1 A non-negative supermartingale converges a.s. towards an a.s. finite
limit.
Indeed, for a non-negative supermartingale, E[|Xn |] = E[Xn ] ≤ E[X0 ] < ∞.
The proof of Theorem 2.3.1 relies on an estimate of the number of upcrossings of a
supermartingale between two levels a < b. If (xn, n ≥ 0) is a real sequence, and a < b are
two real numbers, we define two integer-valued sequences Sk(x), Tk(x), k ≥ 1, recursively
as follows. Let T0(x) = 0 and for k ≥ 0, let

    Sk+1(x) = inf{n ≥ Tk(x) : xn < a},   Tk+1(x) = inf{n ≥ Sk+1(x) : xn > b},

with the usual convention inf ∅ = ∞. The number Nn([a, b], x) = sup{k ≥ 1 : Tk(x) ≤ n}
(with sup ∅ = 0) is the number of upcrossings of x between a and b before time n, which
increases as n → ∞ to the total number of upcrossings N([a, b], x) = sup{k ≥ 1 : Tk(x) < ∞}.
The key is the following simple analytic lemma:
Lemma 2.3.1 A real sequence x converges in R̄ = [−∞, ∞] if and only if N([a, b], x) < ∞
for all rationals a < b.
Proof. If there exist rationals a < b such that N([a, b], x) = ∞, then lim inf_n xn ≤ a <
b ≤ lim sup_n xn, so that x does not converge. If x does not converge, then lim inf_n xn <
lim sup_n xn, so by taking two rationals a < b in between, we get the converse statement.
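The upcrossing counts can be computed directly from the definition; here is a small Python sketch (illustrative; the function name and the test sequence are not from the notes) that follows the recursive times Sk and Tk.

    def upcrossings(x, a, b):
        """Number of upcrossings of the finite sequence x from below a to above b,
        following the recursive definition of the times S_k and T_k."""
        count, looking_for_start = 0, True
        for value in x:
            if looking_for_start:
                if value < a:              # S_{k+1}: first time below a after T_k
                    looking_for_start = False
            else:
                if value > b:              # T_{k+1}: first time above b after S_{k+1}
                    count += 1
                    looking_for_start = True
        return count

    # Example: an oscillating sequence crossing the band [0, 1] upwards twice.
    seq = [0.5, -0.2, 0.3, 1.4, 0.8, -0.1, 2.0, 1.5]
    print(upcrossings(seq, 0.0, 1.0))      # prints 2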
Theorem 2.3.2 (Doob’s upcrossing lemma) Let X be a supermartingale, and a < b
two reals. Then for every n ≥ 0,
(b − a)E[Nn ([a, b], X)] ≤ E[(Xn − a)− ].
Proof. It is immediate by induction that Sk = Sk(X), Tk = Tk(X) defined as above are
stopping times. Define a previsible process C, taking only the values 0 and 1, by

    Cn = Σ_{k≥1} 1{Sk<n≤Tk}.

It is indeed previsible, since {Sk < n ≤ Tk} = {Sk ≤ n − 1} ∩ {Tk ≤ n − 1}^c ∈ Fn−1. Now,
letting Nn = Nn([a, b], X), we have

    (C · X)n = Σ_{i=1}^{Nn} (XTi − XSi) + (Xn − XS_{Nn+1}) 1{S_{Nn+1} ≤ n}
             ≥ (b − a)Nn + (Xn − a) 1{S_{Nn+1} ≤ n} ≥ (b − a)Nn − (Xn − a)−.

Since C is a non-negative bounded previsible process, C · X is a supermartingale, so finally

    (b − a)E[Nn] − E[(Xn − a)−] ≤ E[(C · X)n] ≤ E[(C · X)0] = 0,

hence the result.
Proof of Theorem 2.3.1. Since (x + y)− ≤ |x| + |y|, we get from Theorem 2.3.2
that E[Nn] ≤ (b − a)^{−1} E[|Xn| + |a|], and since Nn increases to N = N([a, b], X) we get
by monotone convergence E[N] ≤ (b − a)^{−1} (sup_n E[|Xn|] + |a|). In particular, we get
N([a, b], X) < ∞ a.s. for every a < b ∈ Q, so

    P( ∩_{a<b∈Q} {N([a, b], X) < ∞} ) = 1.

Hence the a.s. convergence to some X∞, possibly infinite.
Now Fatou’s lemma gives E[|X∞|] ≤ lim inf_n E[|Xn|] < ∞ by hypothesis, hence
|X∞| < ∞ a.s.
Exercise. In fact, from Theorem 2.3.2 it is clear that sup_n E[Xn−] < ∞ suffices; prove
that this actually implies boundedness in L1 (provided X is a supermartingale, of course).
2.4 Doob’s inequalities and Lp convergence, p > 1

2.4.1 A maximal inequality
Proposition 2.4.1 Let X be a submartingale. Then, letting X̃n = sup_{0≤k≤n} Xk, for
every c > 0 and n ≥ 0,

    c P(X̃n ≥ c) ≤ E[Xn 1{X̃n ≥ c}] ≤ E[Xn+].

Proof. Letting T = inf{k ≥ 0 : Xk ≥ c}, we obtain by optional stopping, applied to the
bounded stopping times T ∧ n ≤ n, that

    E[Xn] ≥ E[XT∧n] = E[Xn 1{T>n}] + E[XT 1{T≤n}] ≥ E[Xn 1{T>n}] + c P(T ≤ n).

Since {T ≤ n} = {X̃n ≥ c}, the conclusion follows.
Theorem 2.4.1 (Doob’s Lp inequality) Let p > 1 and X be a martingale. Then, letting
Xn* = sup_{0≤k≤n} |Xk|, we have

    ‖Xn*‖p ≤ (p/(p − 1)) ‖Xn‖p.

Proof. Since x ↦ |x| is convex, the process (|Xn|, n ≥ 0) is a non-negative submartingale.
Applying Proposition 2.4.1 and Hölder’s inequality shows that

    E[(Xn*)^p] = ∫_0^∞ p x^{p−1} P(Xn* ≥ x) dx
               ≤ ∫_0^∞ p x^{p−2} E[|Xn| 1{Xn* ≥ x}] dx
               = p E[ |Xn| ∫_0^{Xn*} x^{p−2} dx ]
               = (p/(p − 1)) E[|Xn| (Xn*)^{p−1}] ≤ (p/(p − 1)) ‖Xn‖p ‖Xn*‖p^{p−1},

which yields the result after dividing both sides by ‖Xn*‖p^{p−1} (this quantity is finite, since
Xn* ≤ |X0| + · · · + |Xn| ∈ Lp).
Theorem 2.4.2 Let X be a martingale and p > 1. Then the following statements are
equivalent:
1. X is bounded in Lp(Ω, F, P): sup_{n≥0} ‖Xn‖p < ∞;
2. X converges a.s. and in Lp to a random variable X∞;
3. There exists some Z ∈ Lp(Ω, F, P) such that Xn = E[Z|Fn].
Proof. 1. =⇒ 2. Suppose X is bounded in Lp; then in particular it is bounded in L1,
so it converges a.s. to some finite X∞ by Theorem 2.3.1. Moreover, X∞ ∈ Lp by an easy
application of Fatou’s lemma. Next, Doob’s inequality ‖Xn*‖p ≤ C‖Xn‖p ≤ C′ < ∞
entails ‖X∞*‖p < ∞ by monotone convergence, where X∞* = sup_{n≥0} |Xn| is the monotone
limit of Xn*. Since |Xn − X∞| ≤ 2X∞* ∈ Lp, dominated convergence entails that Xn
converges to X∞ in Lp.
2. =⇒ 3. Since conditional expectation is continuous as a linear operator on Lp
spaces (Proposition 1.2.2), if Xn → X∞ in Lp we have, for fixed n and m ≥ n, Xn =
E[Xm|Fn] → E[X∞|Fn] as m → ∞, so that Xn = E[X∞|Fn] and Z = X∞ is a suitable choice.
3. =⇒ 1. This is immediate by the conditional Jensen inequality.
A martingale which has the form in 3. is said to be closed (in Lp). Notice that
in this case, X∞ = E[Z|F∞], where F∞ = ⋁_{n≥0} Fn. Indeed, ∪_{n≥0} Fn is a π-system
that generates F∞, and moreover if B ∈ FN, say, is an element of this π-system, then
E[1B E[Z|F∞]] = E[1B Z] = E[1B XN] = E[1B Xm] for every m ≥ N, and the latter converges
to E[1B X∞] as m → ∞. Since X∞ = lim sup_n Xn is F∞-measurable, this gives the result.
Therefore, for p > 1, the map Z ∈ Lp(Ω, F∞, P) ↦ (E[Z|Fn], n ≥ 0) is a bijection
between Lp(Ω, F∞, P) and the set of martingales that are bounded in Lp.
2.5 Uniform integrability and convergence in L1
The case of L1 convergence is a little different from that of Lp for p > 1, as one needs to
suppose uniform integrability rather than mere boundedness in L1. Notice that uniform
integrability follows from boundedness in Lp for some p > 1.
Theorem 2.5.1 Let X be a martingale. The following statements are equivalent:
1. (Xn , n ≥ 0) is uniformly integrable
2. Xn converges a.s. and in L1 (Ω, F, P ) to a limit X∞
3. There exists Z ∈ L1 (Ω, F, P ) so that Xn = E[Z|Fn ], n ≥ 0.
Proof. 1. =⇒ 2. Suppose X is uniformly integrable, then it is bounded in L1 so by
Theorem 2.3.1 it converges a.s. By properties of uniform integrability, it then converges
in L1 .
2. =⇒ 3. This follows the same proof as above: X∞ = Z is a suitable choice.
3. =⇒ 1. This is a straightforward consequence of the fact that, for Z ∈ L1, the family
{E[Z|G] : G a sub-σ-algebra of F}
is U.I.; see example sheet 1.
As above, we then have E[Z|F∞] = X∞, and this theorem says that there is a one-to-one correspondence between U.I. martingales and L1(Ω, F∞, P).
Exercise.
Show that if X is a U.I. supermartingale (resp. submartingale), then Xn
converges a.s. and in L1 to a limit X∞ , so that E[X∞ |Fn ] ≤ Xn (resp. ≥) for every n.
2.6 Optional stopping in the case of U.I. martingales
We give an improved version of the optional stopping theorem, in which the boundedness
condition on the stopping time is lifted, and replaced by a uniform integrability condition
on the martingale. Since U.I. martingales have a well defined limit X∞ , we unambiguously
let XT = XT 1{T <∞} + X∞ 1{T =∞} for any stopping time T .
Theorem 2.6.1 Let X be a U.I. martingale, and S, T be two stopping times with S ≤ T .
Then E[XT |FS ] = XS .
Proof. We check that XT ∈ L1: indeed, since |Xn| ≤ E[|X∞| |Fn],

    E[|XT|] = Σ_{n∈N} E[|Xn| 1{T=n}] + E[|X∞| 1{T=∞}] ≤ Σ_{n=0}^{∞} E[|X∞| 1{T=n}] + E[|X∞| 1{T=∞}] = E[|X∞|].

Next, if B ∈ FT,

    E[1B X∞] = Σ_{n∈Z+⊔{∞}} E[1B 1{T=n} X∞] = Σ_{n∈Z+⊔{∞}} E[1B 1{T=n} Xn] = E[1B XT],

since B ∩ {T = n} ∈ Fn and Xn = E[X∞|Fn], so that XT = E[X∞|FT]. Finally, E[XT|FS] =
E[E[X∞|FT]|FS] = E[X∞|FS] = XS, by the tower property.
2.7 Backwards martingales
Backwards martingales are martingales whose time-set is Z−. More precisely, given a
filtration . . . ⊆ G−2 ⊆ G−1 ⊆ G0, a process (Xn, n ≤ 0) is a backwards martingale if
E[Xn+1|Gn] = Xn for every n ≤ −1, as in the usual definition. They are somehow nicer
than forward martingales, as they are automatically U.I., since X0 ∈ L1 and E[X0|Gn] = Xn
for every n ≤ 0. Adapting Doob’s upcrossing lemma is a simple exercise: if Nm([a, b], X) is
the number of upcrossings of a backwards martingale from a to b between times −m and 0,
one has, considering the (forward) supermartingale (X−m+k, 0 ≤ k ≤ m), that

    (b − a)E[Nm([a, b], X)] ≤ E[(X0 − a)−].

As m → ∞, Nm([a, b], X) increases to the total number of upcrossings of X from a to
b, and this allows us to conclude that Xn converges a.s. as n → −∞ to a G−∞-measurable
random variable X−∞, where G−∞ = ∩_{n≤0} Gn. We proved:
Theorem 2.7.1 Let X be a backwards martingale. Then Xn converges a.s. and in L1 as
n → −∞ to the random variable X−∞ = E[X0 |G−∞ ].
Moreover, if X0 ∈ Lp for some p > 1, then X is bounded in Lp and converges in Lp
as n → −∞.
Chapter 3

Examples of applications of discrete-time martingales
3.1 Kolmogorov’s 0 − 1 law, law of large numbers
Let (Yn , n ≥ 1) be a sequence of independent random variables.
Theorem 3.1.1 (Kolmogorov’s 0 − 1 law) The tail σ-algebra G∞ = ∩_{n≥1} Gn, where
Gn = σ{Ym, m ≥ n}, is trivial: every A ∈ G∞ has probability 0 or 1.
Proof. Let Fn = σ{Y1 , . . . , Yn }, n ≥ 1. Let A ∈ G∞ . Then E[1A |Fn ] = P (A) since Fn is
independent of Gn+1 , hence of G∞ . Therefore, the martingale convergence theorem gives
E[1A |F∞ ] = 1A = P (A) a.s., since G∞ ⊂ F∞ . Hence, P (A) ∈ {0, 1}.
Suppose now that the Yi are real-valued i.i.d. random variables in L1. Let Sn = Σ_{k=1}^{n} Yk,
n ≥ 0, be the associated random walk.
Theorem 3.1.2 (LLN) A.s. as n → ∞,

    Sn/n −→ E[Y1].
Proof. Let Hn = σ{Sn, Sn+1, . . .} = σ{Sn, Yn+1, Yn+2, . . .}. We have E[Sn|Hn+1] =
Sn+1 − E[Yn+1|Sn+1]. Now, by symmetry we have E[Yn+1|Sn+1] = E[Yk|Sn+1] for every
1 ≤ k ≤ n + 1, so that it equals (n + 1)^{−1} E[Sn+1|Sn+1] = Sn+1/(n + 1). Finally,
E[Sn/n|Hn+1] = Sn+1/(n + 1), so that (S−n/(−n), n ≤ −1) is a backwards martingale with
respect to the filtration (H−n, n ≤ −1). Therefore, Sn/n converges a.s. and in L1 to a limit,
namely E[S1|H∞], which is a.s. constant by Kolmogorov’s 0 − 1 law, so it must be equal to
its mean value E[S1] = E[Y1].
3.2 Branching processes
Let µ be a probability distribution on Z+, and consider a Markov process (Zn, n ≥ 0) in
Z+ whose step transitions are determined by the following rule. Given Zn = z, take z
independent random variables Y1, . . . , Yz with law µ, and let Zn+1 have the same distribution
as Y1 + . . . + Yz. In particular, 0 is an absorbing state for this process. This can be
interpreted as follows: Zn is the number of individuals present in a population, and at each
time step, each individual dies after giving birth to a µ-distributed number of sons,
independently of the others. Notice that E[Zn+1|Fn] = E[Zn+1|Zn] = mZn, where (Fn, n ≥ 0)
is the natural filtration and m is the mean of µ, m = Σ_z z µ({z}). Therefore, supposing
m ∈ (0, ∞),
Proposition 3.2.1 The process (m−n Zn , n ≥ 0) is a non-negative martingale.
Notice that the fact that this martingale converges a.s. to a finite value immediately
implies that when m < 1, there exists some n so that Zn = 0, i.e. the population becomes
extinct in finite time. One also guesses that when m > 1, Zn should be of order m^n, so
that the population should grow explosively, at least with positive probability. It is
standard to show the following.

Exercise. Let ϕ(s) = Σ_{z∈Z+} µ({z}) s^z be the generating function of µ; we suppose
µ({1}) < 1. Show that if Z0 = 1, then the generating function of Zn is the n-fold
composition of ϕ with itself. Show that the probability q of eventual extinction of the
population satisfies ϕ(q) = q, and that q < 1 ⇐⇒ m > 1. As a hint, ϕ is a convex function
such that ϕ′(1) = m.
Notice that, still supposing Z0 = 1, the martingale (Mn = Zn /mn , n ≥ 0) cannot be
U.I. when m ≤ 1, since it converges to 0 a.s., so E[M∞ ] < E[M0 ]. This leaves open
the question whether P (M∞ > 0) > 0 in the case m > 1. We are going to address the
problem in a particular case:
Proposition 3.2.2 Suppose m > 1, Z0 = 1 and σ 2 = Var (µ) < ∞. Then the martingale
M is bounded in L2 , and hence converges a.s. and in L2 to a variable M∞ so that E[M∞ ] =
1, in particular, P (M∞ > 0) > 0.
Proof. We compute E[Zn+1²|Fn] = Zn² m² + Zn σ². This shows that E[Mn+1²] = E[Mn²] +
σ² m^{−(n+2)}, and therefore, since (m^{−n}, n ≥ 0) is summable, M is bounded in L²
(this summability is actually equivalent to m > 1).
Exercise Show that under these hypotheses, {M∞ > 0} and {limn Zn = ∞} are equal,
up to an event of vanishing probability.
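A short simulation makes the two regimes visible (a sketch, not part of the notes; the offspring distributions below are arbitrary illustrative choices): for m < 1 the martingale Mn = Zn/m^n collapses to 0 because the population dies out, while for m > 1 with finite variance it keeps mean 1 and stays positive on the survival event.

    import numpy as np

    rng = np.random.default_rng(4)

    def simulate_M(p_offspring, n_gen=25, n_paths=5_000):
        """Return M_n = Z_n / m^n for a Galton-Watson process with Z_0 = 1 and
        offspring distribution p_offspring on {0, 1, 2, ...}."""
        p = np.asarray(p_offspring, dtype=float)
        m = np.dot(np.arange(len(p)), p)                 # mean offspring number
        Z = np.ones(n_paths, dtype=np.int64)
        for _ in range(n_gen):
            # Each of the Z individuals reproduces independently with law p.
            Z = np.array([rng.choice(len(p), size=z, p=p).sum() if z > 0 else 0 for z in Z])
        return Z / m ** n_gen

    # Supercritical case, m = 1.25: E[M_n] stays near 1 and P(M_n > 0) stays positive.
    M_super = simulate_M([0.25, 0.35, 0.30, 0.10])
    print("m>1:  E[M_n] ~", M_super.mean().round(3), " P(M_n > 0) ~", (M_super > 0).mean().round(3))

    # Subcritical case, m = 0.75: extinction is certain and M_n -> 0 a.s.
    M_sub = simulate_M([0.45, 0.35, 0.20])
    print("m<1:  P(M_n > 0) ~", (M_sub > 0).mean().round(3))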
3.3 A martingale approach to the Radon-Nikodym theorem
We begin with the following general remark. Let (Ω, F, (Fn ), P ) be a filtered probability
space with F∞ = F, and let Q be a finite non-negative measure on (Ω, F). Let Pn and
Qn denote the restrictions of P and Q to the measurable space (Ω, Fn ). Suppose that for
every n, Qn has a density Mn with respect to Pn , namely Qn (dω) = Mn (ω)Pn (dω), where
Mn is an Fn -measurable non-negative function. We also sometimes let Mn = dQn /dPn .
Then it is immediate that (Mn, n ≥ 0) is a martingale with respect to the filtered space
(Ω, F, (Fn), P). Indeed, E[Mn] = Q(Ω) < ∞, and for A ∈ Fn,

    E^P[Mn+1 1A] = E^{Pn+1}[Mn+1 1A] = Qn+1(A) = Qn(A) = E^{Pn}[Mn 1A] = E^P[Mn 1A],

where E^P, E^{Pn} denote expectations with respect to the measures P, Pn. A natural
question is whether the identity Qn = Mn · Pn passes to the limit Q = M∞ · P as n → ∞,
where M∞ is the a.s. limit of the non-negative martingale M.
Proposition 3.3.1 Under these hypotheses, there exists a non-negative random variable
X := dQ/dP such that Q = X · P if and only if (Mn , n ≥ 0) is U.I.
Proof. If M is U.I., then we can pass to the limit in E[Mm 1A] = Q(A) for A ∈ Fn and
m → ∞, to obtain E[M∞ 1A] = Q(A) for every A ∈ ∪_{n≥0} Fn. Since this last set is a
π-system that generates F∞ = F, we obtain M∞ · P = Q by the uniqueness theorem for
measures.
Conversely, if Q = X · P, then for A ∈ Fn we have Q(A) = E[Mn 1A] = E[X 1A], so
that Mn = E[X|Fn], which shows that M is U.I.
The Radon-Nikodym theorem (in a particular case) is stated as follows.
Theorem 3.3.1 (Radon-Nikodym) Let (Ω, F) be a measurable space such that F is
separable, i.e. generated by a countable set of events Fk , k ≥ 1. Let P be a probability
measure on (Ω, F) and Q be a finite non-negative measure on (Ω, F). Then the following
statements are equivalent.
(i) Q is absolutely continuous with respect to P , namely
∀ A ∈ F, P (A) = 0 =⇒ Q(A) = 0.
(ii) ∀ ε > 0, ∃ δ > 0, ∀ A ∈ F, P (A) ≤ δ =⇒ Q(A) ≤ ε.
(iii) There exists a non-negative random variable X such that Q = X · P .
The separability condition on F can actually be lifted, see Williams’ book for the
proof in the general case.
Proof. That (iii) implies (i) is straightforward.
If (ii) is not satisfied, then we can find a sequence Bn of events and an ε > 0 such
that P(Bn) < 2^{−n} but Q(Bn) ≥ ε. By the Borel-Cantelli lemma, P(lim sup Bn) = 0,
while Q(lim sup Bn), as the decreasing limit of Q(∪_{k≥n} Bk) as n → ∞, must be ≥
lim sup_n Q(Bn) ≥ ε. Hence, (i) does not hold for the set A = lim sup Bn. So (i) implies
(ii).
Let us now assume (ii). Let (Fn) be the filtration such that Fn is the σ-algebra generated
by the events F1, . . . , Fn. Notice that any event of Fn is a disjoint union of non-empty
“atoms” of the form

    ∩_{1≤i≤n} Gi,

where either Gi = Fi or Gi is its complement. We let An be the set of atoms of Fn. Let

    Mn(ω) = Σ_{A∈An} (Q(A)/P(A)) 1A(ω),
with the convention 0/0 = 0. Then it is easy to check that Mn is a density for Qn
with respect to Pn, where Pn, Qn denote the restrictions to Fn as above. Indeed, if A ∈ An,

    Qn(A) = (Q(A)/P(A)) P(A) = E^{Pn}[Mn 1A]

(note that (ii) implies (i), so Q(A) = 0 whenever P(A) = 0, and the convention 0/0 = 0
causes no harm). Therefore, (Mn, n ≥ 0) is a non-negative (Fn, n ≥ 0)-martingale, and
Mn(ω) converges a.s. towards a limit M∞(ω). Moreover, the last proposition tells us that it
suffices to show that (Mn) is U.I. to conclude the proof.
But note that we have E[Mn 1{Mn≥a}] = Q(Mn ≥ a). So for ε > 0 fixed, and δ given by (ii),
we have P(Mn ≥ a) ≤ E[Mn]/a = Q(Ω)/a < δ for all n as soon as a is large enough, and
this entails E[Mn 1{Mn≥a}] = Q(Mn ≥ a) ≤ ε for every n. Hence the result.
Example. Let Ω = [0, 1) be endowed with its Borel σ-field, which is generated by {Ik,j =
[j2^{−k}, (j + 1)2^{−k}), k ≥ 0, 0 ≤ j ≤ 2^k − 1}. The intervals Ik,j, 0 ≤ j ≤ 2^k − 1, are
called the dyadic intervals of depth k; they span a σ-algebra which we call Fk. We let λ(dω)
be the Lebesgue measure on [0, 1). Let ν be a finite non-negative measure on [0, 1), and

    Mn(ω) = 2^n Σ_{j=0}^{2^n−1} 1In,j(ω) ν(In,j);

then we obtain by the previous theorem that if ν is absolutely continuous with respect
to λ, then ν = f · λ for some non-negative measurable f. We then see that a.s., if
Ik(x) = [2^{−k}⌊2^k x⌋, 2^{−k}(⌊2^k x⌋ + 1)) denotes the dyadic interval of depth k containing x,

    2^k ∫_{Ik(x)} f(x′) λ(dx′) → f(x)   as k → ∞.

This is a particular case of the Lebesgue differentiation theorem.
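The dyadic martingale Mn is easy to visualize numerically. The sketch below (illustrative only; the choice ν = f·λ with f(x) = 2x is an assumption made for the example) compares the depth-n dyadic averages with f at a few points.

    import numpy as np

    def nu(a, b):
        """nu([a, b)) for the measure with density f(x) = 2x on [0, 1)."""
        return b**2 - a**2

    def M_n(x, n):
        """Value at x of the dyadic density M_n = 2^n * nu(I_{n,j}) on I_{n,j}."""
        j = int(np.floor(2**n * x))            # index of the dyadic interval containing x
        a, b = j / 2**n, (j + 1) / 2**n
        return 2**n * nu(a, b)

    def f(x):
        return 2 * x

    for x in (0.1, 0.37, 0.9):
        approx = [M_n(x, n) for n in (2, 5, 10, 20)]
        print(f"x={x}: M_n -> {approx}  f(x)={f(x)}")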
3.4 Product martingales and likelihood ratio tests
Theorem 3.4.1 (Kakutani’s theorem) Let (Yn, n ≥ 1) be a sequence of independent
non-negative random variables with mean 1. Let Fn = σ(Y1, . . . , Yn). Then Xn =
Π_{1≤k≤n} Yk, n ≥ 0, is an (Fn, n ≥ 0)-martingale, which converges a.s. to some X∞ ≥ 0.
Letting an = E[√Yn], the following statements are equivalent:
1. X is U.I.
2. E[X∞] = 1
3. P(X∞ > 0) > 0
4. Π_n an > 0.
Proof. The fact that X is a (non-negative) martingale follows from the fact that
E[Xn+1|Fn] = Xn E[Yn+1|Fn] = Xn E[Yn+1] = Xn. For the same reason, the process

    Mn = Π_{k=1}^{n} (√Yk / ak),   n ≥ 0,

is a non-negative martingale with mean E[Mn] = 1, and E[Mn²] = Π_{k=1}^{n} ak^{−2}. Thus,
M is bounded in L² if and only if Π_n an > 0 (notice that an ∈ (0, 1], e.g. by the Cauchy-Schwarz
inequality E[1 · √Yn] ≤ √E[Yn]).
Now, with the standard notation Xn* = sup_{0≤k≤n} Xk, using Doob’s L² inequality and
the fact that Xk = Mk² Π_{1≤j≤k} aj² ≤ Mk², we get

    E[Xn*] ≤ E[(Mn*)²] ≤ 4E[Mn²],

which shows that if M is bounded in L², then X∞* is integrable, hence X is U.I. since it
is dominated by X∞*. We have thus obtained 4. =⇒ 1. =⇒ 2. =⇒ 3., where the
second implication comes from the optional stopping theorem for U.I. martingales, and
the implication 2. =⇒ 3. is trivial. On the other hand, if Π_n an = 0, since Mn converges
a.s. to some M∞ ≥ 0, then √Xn = Mn Π_{1≤k≤n} ak converges to 0, so that 3. does not hold.
So 3. =⇒ 4., hence the result.
Note that, if Yn > 0 a.s. for every n, the event {X∞ = 0} is a tail event, so that 3.
above is equivalent to P(X∞ > 0) = 1 by Kolmogorov’s 0 − 1 law.
As an example of application of this theorem, consider a σ-finite measure space
(E, E, λ) and let Ω = E^N, F = E^⊗N be the product measurable space. We let Xn(ω) =
ωn, n ≥ 1, and Fn = σ({X1, . . . , Xn}). One says that X is the canonical (E-valued)
process.
Now suppose given two families of probability measures (µn, n ≥ 1) and (νn, n ≥ 1)
that admit densities dµn = fn dλ, dνn = gn dλ with respect to λ. We suppose that
fn(x)gn(x) > 0 for every n, x. Let P = ⊗_{n≥1} µn, resp. Q = ⊗_{n≥1} νn, denote the
measures on (Ω, F) under which (Xn, n ≥ 1) is a sequence of independent random variables
with respective laws µn (resp. νn). In particular, if A = Π_{i=1}^{n} Ai × E^N is a measurable
rectangle in Fn,

    Q(A) = ∫_{E^n} Π_{i=1}^{n} (gi(xi)/fi(xi)) Π_{i=1}^{n} fi(xi) λ(dxi) = E^P[Mn 1A],

where E^P denotes expectation with respect to P, and

    Mn = Π_{i=1}^{n} gi(Xi)/fi(Xi).

Since measurable rectangles of Fn form a π-system that generates Fn, the probability Q|Fn
is absolutely continuous with respect to P|Fn, with density Mn, so that (Mn, n ≥ 1) is a
non-negative martingale with respect to the filtered space (Ω, F, (Fn, n ≥ 0), P). Kakutani’s
theorem then shows that M converges a.s. and in L1 to its limit M∞ if and only if

    Π_{n≥1} ∫_E √(fn(x)gn(x)) λ(dx) > 0  ⇐⇒  Σ_{n≥1} ∫_E (√fn(x) − √gn(x))² λ(dx) < ∞.
In this case, one has Q(A) = E P [M∞ 1A ] for every measurable rectangle of F, and Q is
absolutely continuous with respect to P with density M∞ . In the opposite case, M∞ = 0,
so Proposition 3.3.1 shows that Q and P are carried by two disjoint measurable sets.
3.4.1 Example: consistency of the likelihood ratio test
In the case where µn = µ, νn = ν for every n, we see that M∞ = 0 a.s. if (and only
if) µ ≠ ν. This is called the consistency of the likelihood ratio test in statistics. Let
us recall the background for the application of this test. Suppose we are given an i.i.d.
sample X1, X2, . . . , Xn with an unknown common distribution. Suppose one wants to test
the hypothesis (H0) that this distribution is P against the hypothesis (H1) that it is Q,
where P and Q have everywhere positive densities f, g with respect to some common
σ-finite measure λ (for example, a normal distribution and a Cauchy distribution). Letting
Mn = Π_{1≤i≤n} g(Xi)/f(Xi), we use the test 1{Mn≤1} for acceptance of H0 against H1.
Supposing H0, M∞ = 0 a.s., so the probability of rejection P(Mn > 1) converges to 0.
Similarly, supposing H1, M∞ = +∞ a.s., so the probability of rejection goes to 1.
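The consistency statement can be checked in simulation. The sketch below (illustrative, not from the notes; it assumes f is the standard normal density and g the standard Cauchy density, matching the parenthetical example above) tracks Mn on samples drawn under each hypothesis.

    import numpy as np

    rng = np.random.default_rng(5)
    n, n_paths = 500, 2_000

    def log_Mn(samples):
        """log M_n = sum_{i<=n} [log g(X_i) - log f(X_i)], f standard normal, g standard Cauchy."""
        log_f = -0.5 * samples**2 - 0.5 * np.log(2 * np.pi)
        log_g = -np.log(np.pi) - np.log1p(samples**2)
        return (log_g - log_f).cumsum(axis=1)

    # Under H0 (data from P = N(0,1)): M_n -> 0 a.s., so log M_n -> -infinity.
    under_H0 = log_Mn(rng.normal(size=(n_paths, n)))
    # Under H1 (data from Q = Cauchy): M_n -> +infinity a.s.
    under_H1 = log_Mn(rng.standard_cauchy(size=(n_paths, n)))

    # Rejection region of the test 1_{M_n <= 1} is {M_n > 1}, i.e. {log M_n > 0}.
    print("P(M_n > 1) under H0:", (under_H0[:, -1] > 0).mean())   # -> 0 as n grows
    print("P(M_n > 1) under H1:", (under_H1[:, -1] > 0).mean())   # -> 1 as n grows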
Chapter 4

Continuous-parameter stochastic processes
In this chapter, we will consider the case when processes are indexed by a real interval
I ⊂ R with non-empty interior; in many cases I will be R+. This makes the whole study
more involved, as we now explain. In all that follows, the state space E is assumed
to be a metric space, usually E = R or E = Rd endowed with the Euclidean norm.
4.1 Theoretical problems when dealing with continuous-time processes
Although the definitions of filtrations, adapted processes, stopping times, martingales, and
super- and sub-martingales are unchanged compared to the discrete case (see the beginning
of Chapter 2), the use of continuous time induces important measurability problems. Indeed,
there is no reason why an adapted process (ω, t) ↦ Xt(ω) should be a measurable map
defined on Ω × I, or even why the sample path t ↦ Xt(ω) should be measurable for fixed ω.
In particular, stopped variables like XT 1{T<∞} for a stopping time T have no reason to be
random variables.
Even worse, there are in general “very few” stopping times: for example, first entrance
times inf{t ≥ 0 : Xt ∈ A} for measurable (or even open or closed) subsets A of the state
space E need not be stopping times.
This is the reason why we add a priori requirements on the regularity of random
processes under consideration. A quite natural requirement is that they are continuous
processes, i.e. that t 7→ Xt (ω) is continuous for a.e. ω, because a continuous function is
determined by its values on a countable dense subset of I. More generally, we will consider
processes that are right-continuous and admit left limits everywhere, a.s. — such processes
are called càdlàg, and are also determined by the values they take on a countable dense
subset of I (the notation càdlàg stands for the French ‘continu à droite, limité à gauche’).
We let C(I, E), D(I, E) denote the spaces of continuous and càdlàg functions from I
to E; we consider these sets as measurable spaces by endowing them with the product
σ-algebra that makes the projections πt : X ↦ Xt measurable for every t ∈ I. Usually, we
will consider processes with values in R, or sometimes Rd for some d ≥ 1 in the chapter
on Brownian motion. The following proposition holds, of which (ii) is an analog of 1. and 2.
in Proposition 2.1.2.
in Proposition 2.1.2.
Proposition 4.1.1 Let (Ω, F, (Ft , t ∈ I), P ) be a filtered probability space, and let (Xt , t ∈
I) be an adapted process with values in E.
(i) Suppose X is continuous (i.e. (Xt (ω), t ∈ I) ∈ C(I, E) for every ω). If A is a
closed set, and inf I > −∞, then the random time
TA = inf{t ∈ I : Xt ∈ A}
is a stopping time.
(ii) Let T be a stopping time, and suppose X is càdlàg. Then XT 1{T <∞} : ω 7→
XT (ω) (ω)1{T (ω)<∞} is an FT -measurable random variable. Moreover, the stopped process
X T = (XT ∧t , t ≥ 0) is adapted.
Proof. For (i), notice that if A is closed and X is continuous, then for every t ∈ I,

    {TA ≤ t} = { inf_{s∈I∩Q, s≤t} d(Xs, A) = 0 },

where d(x, A) = inf_{y∈A} d(x, y) is the distance from x to the set A. Indeed, if Xs ∈ A
for some s ≤ t, then for qn converging to s in Q ∩ I ∩ (−∞, t], Xqn converges to Xs, so
that d(Xqn, A) converges to 0. Conversely, if there exist qn ∈ Q ∩ I ∩ (−∞, t] such that
d(Xqn, A) converges to 0, then since inf I > −∞ we can extract a subsequence and
assume qn converges to some s ∈ I ∩ (−∞, t], and this s has to satisfy d(Xs, A) = 0 by
continuity of X. Since A is closed, this implies Xs ∈ A, so that TA ≤ t.
For (ii), first note that a random variable Z is FT-measurable if Z 1{T≤t} is Ft-measurable
for every t ∈ I, by approximating Z by finite sums of the form Σ αi 1Ai with Ai ∈ FT.
Notice also that if T is a stopping time, then, if ⌈x⌉ denotes the smallest n ∈ Z+ with
n ≥ x, Tn = 2^{−n} ⌈2^n T⌉ is also a stopping time with Tn ≥ T, which decreases to T as
n → ∞ (Tn = ∞ if T = ∞). Indeed, {Tn ≤ t} = {T ≤ 2^{−n} ⌊2^n t⌋} ∈ Ft (notice that
⌈x⌉ ≤ y if and only if x ≤ ⌊y⌋, where ⌊y⌋ is the largest n ∈ Z+ with n ≤ y). Moreover, Tn
takes values in the set Dn* = {k2^{−n}, k ∈ Z+} ⊔ {∞} of dyadic numbers of level n (or ∞).
Therefore, XT 1{T<∞} 1{T≤t} = Xt 1{T=t} + XT 1{T<t}, which by the càdlàg property is
equal to

    Xt 1{T=t} + lim_{n→∞} XTn∧t 1{T<t}.

The variables Xt 1{T=t} and XTn∧t 1{T<t} are Ft-measurable, because

    XTn∧t = Σ_{d∈Dn*, d≤t} Xd 1{Tn=d} + Xt 1{t<Tn};

hence the result. For the statement on the stopped process, notice that for every t, XT∧t
is FT∧t-measurable, hence Ft-measurable.
It turns out that (i) does not hold in general for càdlàg processes, although it is a very
subtle problem to find counterexamples. See Rogers and Williams’ book, Chapters II.74
and II.75. In particular, Lemma 75.1 therein shows that TA is a stopping time if A is
compact and X is an adapted càdlàg process, whenever the filtration (Ft, t ∈ I) satisfies
the so-called “usual conditions”; see Section 4.3 for the definition of these conditions.
You may check as an exercise that the times TA for open sets A, associated with càdlàg
processes, are stopping times with respect to the filtration (Ft+, t ∈ I), where

    Ft+ = ∩_{s>t} Fs.
Somehow, the filtration (Ft+ ) foresees what will happen ‘just after’ t.
4.2 Finite marginal distributions, versions
We now discuss the notion of law of a process. If (Xt , t ∈ I) is a stochastic process, we
can consider it as a random variable with values in the set E I of maps f : I → E, where
this last space is endowed with the product σ-algebra (the smallest σ-algebra that makes
the projections f ∈ E I 7→ f (t) measurable for every t ∈ I). It is then natural to consider
the image measure µ of the probability P under the process X as the law of X. However,
this measure is not easy to manipulate, and the quantities of true interest are the
following simpler objects.
Definition 4.2.1 Let (Xt , t ∈ I) be a process. For every finite J ⊂ I, the finite marginal
distribution of X indexed by J is the law µJ of the E J -valued random variable (Xt , t ∈ J).
It is a nice fact that the finite marginal distributions {µJ : J ⊂ I, #J < ∞} uniquely
characterize the law µ of the process (Xt, t ∈ I) as defined above. Indeed, by definition, if
X and Y are processes having the same finite marginal laws, then their distributions
agree on the π-system of finite “rectangles” of the form Π_{s∈J} As × Π_{t∈I\J} E for finite
J ⊂ I and As ∈ E, which generates the product σ-algebra; hence the distributions under
consideration are equal. Notice that this uniqueness result does not imply the existence of
a process with given marginal distributions.
The problem with (finite marginal) laws of processes is that they are powerless in
dealing with properties of processes that involve more than countably many times, such
as continuity or càdlàg properties of the process. For example, if X is a continuous
process, there are (many!) non-continuous processes that have the same finite-marginal
distributions as X: the finite marginal distributions just do not ‘see’ the sample path
properties of the process. This motivates the following definition.
Definition 4.2.2 If X and X′ are two processes defined on some common probability
space (Ω, F, P), we say that X′ is a version of X if for every t, Xt = X′t a.s.
In particular, two versions X and X′ of the same process share the same finite-dimensional
distributions; however, this does not say that there exists an ω such that Xt(ω) = X′t(ω)
for every t. This becomes true if both X and X′ are a priori known to be càdlàg, for
instance.
Example. To explain these very abstract notions, suppose we want to find a process
(Xt , 0 ≤ t ≤ 1) whose finite marginal laws are Dirac masses at 0, namely
µJ ({(0, 0, . . . , 0)}) = P (Xs = 0, s ∈ J) = 1
| {z }
#J times
for every finite J ⊂ [0, 1]. Of course, the process Xt = 0, 0 ≤ t ≤ 1 satisfies this. However,
the process X′t = 1{U} (t), 0 ≤ t ≤ 1, where U is a uniform random variable on [0, 1], is a
version of X, and therefore has the same law as X. But of course, it is not continuous,
and P (X′t = 0 ∀ t ∈ [0, 1]) = 0. We thus want to consider it as a ‘bad’ version of the
zero process. This example motivates the following way of dealing with processes: when
considering a process whose finite marginal distributions are known, we first try to find
the most regular version of the process as we can before studying it.
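To make this concrete, here is a small numerical sketch of the example above (my own illustration, not part of the notes; all names are hypothetical). At any fixed finite set of times the zero process and the ‘bad’ version agree almost surely, yet the path of the latter is not the zero function.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(0.0, 1.0)                # the uniform random time of the example

def X(times):
    """The zero process evaluated at the given times."""
    return np.zeros(len(times))

def X_prime(times):
    """The 'bad' version: X'_t = 1_{t = U}."""
    return (np.asarray(times) == U).astype(float)

J = np.array([0.1, 0.25, 0.5, 0.9])      # any fixed finite set of times
print(X(J), X_prime(J))                  # identical a.s., since P(U in J) = 0

print(X_prime(np.array([U])))            # yet the path of X' is not identically 0
```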
We will discuss two ‘regularization theorems’ in this course, the martingale regularization theorem and Kolmogorov’s continuity criterion, which are instances of situations
when there exists a regular (continuous or càdlàg) version of the stochastic process under
consideration.
4.3 The martingale regularization theorem
We consider here a martingale (Xt , t ≥ 0) on some filtered probability space (Ω, F, (Ft , t ≥
0), P ). We let N be the set of events in F with probability 0,
\[
\mathcal F_{t+} = \bigcap_{s>t} \mathcal F_s , \qquad t \ge 0,
\]
and F̃t = Ft+ ∨ N .

Theorem 4.3.1 Let (Xt , t ≥ 0) be a martingale. Then there exists a càdlàg process X̃ which is a martingale with respect to the filtered probability space (Ω, F, (F̃t , t ≥ 0), P ), so that for every t ≥ 0, Xt = E[X̃t |Ft ] a.s. If F̃t = Ft for every t ≥ 0, X̃ is therefore a càdlàg version of X.
We say that (Ft , t ≥ 0) satisfies the usual conditions if F̃t = Ft for every t, that is, N ⊆ F0 and Ft+ = Ft (a filtration satisfying this last condition for every t ≥ 0 is called right-continuous; notice that (Ft+ , t ≥ 0) is right-continuous for every filtration (Ft , t ≥ 0)). As a corollary of Theorem 4.3.1, when the filtration satisfies the usual conditions, a martingale admits a càdlàg version, so there is “little to lose” in assuming that martingales are càdlàg.
Lemma 4.3.1 A function f : Q+ → R admits a left and a right (finite) limit at every t ∈ R+ if and only if, for all rationals a < b and every bounded I ⊂ Q+ , f is bounded on I and the number
\[
N([a,b], I, f) = \sup\bigl\{ n \ge 0 : \exists\, 0 \le s_1 < t_1 < \dots < s_n < t_n , \text{ all in } I, \ f(s_i) < a , \ f(t_i) > b , \ 1 \le i \le n \bigr\}
\]
of upcrossings of f from a to b along I is finite.
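The upcrossing number over a finite index set can be computed greedily; the following sketch is my own illustration (not part of the notes) of the quantity N([a, b], I, f) for f given by its values along an increasing enumeration of a finite I.

```python
def upcrossings(values, a, b):
    """Greedy count of upcrossings of [a, b] for the finite sequence `values`:
    look for a value < a, then for a later value > b, and repeat."""
    count, looking_for_low = 0, True
    for v in values:
        if looking_for_low:
            if v < a:
                looking_for_low = False
        elif v > b:
            count += 1
            looking_for_low = True
    return count

# Example: this sequence upcrosses from below 0 to above 1 exactly twice.
print(upcrossings([0.5, -1.0, 2.0, 0.2, -0.5, 1.5, 1.2], a=0.0, b=1.0))  # 2
```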
Proof of Theorem 4.3.1.
We first show that X is bounded on bounded subsets
of Q+ . Indeed, if I is such a subset and J = {a1 , . . . , ak } is a finite subset of I with a1 < . . . < ak , then Ml = Xal , 1 ≤ l ≤ k, is a martingale. Doob’s maximal inequality applied to the submartingale |M | then shows that
\[
c\,P(M_k^* > c) = c\,P\Bigl(\max_{1\le l\le k} |X_{a_l}| > c\Bigr) \le E[|X_{a_k}|] \le E[|X_K|]
\]
for any K > sup I. Therefore, taking a monotone limit over finite J ⊂ I with union I, we have
\[
c\,P\Bigl(\sup_{t\in I} |X_t| > c\Bigr) \le E[|X_K|].
\]
This shows that P (supt∈I |Xt | < ∞) = 1 by letting c → ∞.
Let I still be a bounded subset of Q+ , and let a < b ∈ Q+ . By definition, we have
N ([a, b], I, X) = supJ⊂I,finite N ([a, b], J, X). So let J ⊂ I be a finite subset of the form
{a1 , a2 , . . . , ak } as above, and again let Ml = Xal , 1 ≤ l ≤ k. Doob’s upcrossing lemma
for this martingale gives
(b − a)E[N ([a, b], J, X)] ≤ E[(Xak − a)− ] ≤ E[(XK − a)− ],
for any K ≥ sup I, because ((Xt − a)− , t ≥ 0) is a submartingale due to the convexity of
x 7→ (x − a)− . Taking the supremum over J shows that E[N ([a, b], I, X)] < ∞, hence N ([a, b], I, X) is a.s. finite, because E[|XK |] < ∞. Letting K → ∞ along integers and taking I = [0, K] ∩ Q+ , this shows that N ([a, b], I, X) is finite for every bounded subset I of Q+ and all rationals a < b, for every ω in an event Ω0 with probability 1. Therefore, we can define
\[
\widetilde X_t(\omega) = \lim_{s \in \mathbb Q_+,\ s \downarrow t} X_s(\omega) , \qquad \omega \in \Omega_0 ,
\]
and X̃t (ω) = 0 for every t, for ω ∉ Ω0 . The process X̃ thus obtained is then indeed adapted to the filtration (F̃t , t ≥ 0). It remains to show that X̃ is an (F̃t )-martingale, satisfies E[X̃t |Ft ] = Xt , and is càdlàg.
First, check that X remains an (Ft ∨ N , t ≥ 0)-martingale, because E[X|G ∨ N ] = E[X|G] in L1 (Ω, G ∨ N , P ) for any integrable X and sub-σ-algebra G ⊂ F. Thus, we may suppose that N ⊂ Ft for every t. Let s < t ∈ R+ , and let (sn , n ≥ 0) be a (strictly) decreasing sequence of rationals that converges to s, with s0 < t. Then X̃s = lim Xsn = lim E[Xt |Fsn ] by definition, for ω ∈ Ω0 . Now, the process (Mn = Xs−n , n ≤ 0) is a backwards martingale with respect to the filtration (Gn = Fs−n , n ≤ 0). The backwards martingale convergence theorem thus shows that X̃s = E[Xt |Fs+ ], and therefore Xt = E[X̃t |Ft ]. Moreover, taking a rational sequence (tn ) decreasing to t and using again the backwards martingale convergence theorem, (Xtn ) converges to X̃t in L1 , so that X̃s = E[X̃t |F̃s ] for every s ≤ t.
The only thing that remains to prove is the càdlàg property. If t ∈ R+ and if X̃s (ω) does not converge to X̃t (ω) as s ↓ t, then |X̃t − X̃s | > ε for some ε > 0 and for infinitely many s > t, so that if ω ∈ Ω0 , |X̃t − Xu | > ε/2 for an infinite number of rationals u > t, contradicting ω ∈ Ω0 . The argument for showing that X̃ has left limits is similar.
From now on, when considering martingales in continuous time, we will always take
their càdlàg version, provided the underlying filtration satisfies the usual hypotheses.
4.4 Doob’s inequalities and convergence theorems for martingales in continuous time
Considering càdlàg martingales makes it straightforward to generalize to the continuous
case the inequalities of Section 2.4, by density arguments. We leave it to the reader to prove the following theorems, which are analogous to the discrete-time case.
Proposition 4.4.1 (A.s. convergence) Let (Xt , t ≥ 0) be a càdlàg martingale which
is bounded in L1 . Then Xt converges as t → ∞ a.s. to an (a.s.) finite limit X∞ .
To prove this, notice that convergence of Xt as t → ∞ to a (possibly infinite) limit
is equivalent to the fact that the number of upcrossings of X from below a to above b
over the time interval R+ is finite for every a < b rationals. However, by the càdlàg
property, it suffices to restrict our attention to the countable time set Q+ rather than
R+ . Indeed, for each upcrossing of X from a to b between times s < t say, we can find
rationals s0 > s, t0 > t as close to s, t as wanted so that X accomplishes an upcrossing
from a to b between times s0 , t0 , and this implies that N (X, R+ , [a, b]) = N (X, Q+ , [a, b])
(possibly infinite). Then, use similar arguments as those used in the first part of the proof
of Theorem 4.3.1.
Proposition 4.4.2 (Doob’s inequalities) If (Xt , t ≥ 0) is a càdlàg martingale and
Xt∗ = sup0≤s≤t |Xs |, then for every c > 0, t ≥ 0,
cP (Xt∗ ≥ c) ≤ E[|Xt |].
Moreover, if p > 1 then
\[
\|X_t^*\|_p \le \frac{p}{p-1}\, \|X_t\|_p .
\]
To prove this, notice that Xt∗ = sups∈{t}∪([0,t]∩Q) |Xs | by the càdlàg property.
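As a quick sanity check (my own illustration, with arbitrarily chosen parameters, not part of the notes), one can compare both sides of Doob’s inequalities for a discretized Brownian motion, which is a martingale along the time grid.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, t = 20000, 200, 1.0
dt = t / n_steps

# Discretized Brownian paths: a martingale along the grid.
X = np.cumsum(rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt), axis=1)
X_star = np.max(np.abs(X), axis=1)        # running maximum of |X| up to time t
X_t = X[:, -1]

c, p = 1.0, 2.0
print(c * np.mean(X_star >= c), "<=", np.mean(np.abs(X_t)))           # maximal inequality
lhs = np.mean(X_star**p) ** (1 / p)
rhs = (p / (p - 1)) * np.mean(np.abs(X_t)**p) ** (1 / p)
print(lhs, "<=", rhs)                                                  # L^p inequality
```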
Proposition 4.4.3 (Lp convergence) (i) If X is a càdlàg martingale and p > 1 then
supt≥0 kXt kp < ∞ if and only if X converges a.s. and in Lp to its limit X∞ , and this if
and only if X is closed in Lp , i.e. there exists Z ∈ Lp so that E[Z|Ft ] = Xt for every t,
a.s. (one can then take Z = X∞ ).
(ii) If X is a càdlàg martingale then X is U.I. if and only if X converges a.s. and in L1 to its limit X∞ , and this if and only if X is closed (in L1 ).
Proposition 4.4.4 (Optional stopping) Let X be a càdlàg U.I. martingale. Then for
every stopping times S ≤ T , one has E[XT |FS ] = XS a.s.
Proof. Let Tn be the stopping time 2−n d2n T e as defined in the proof of Proposition
4.1.1. The right-continuity of paths of X shows that XTn converges to XT a.s. Moreover,
Tn takes values in the countable set Dn∗ of dyadic rationals of level n (and ∞), so that
\[
E[X_\infty \mid \mathcal F_{T_n}] = \sum_{d\in D_n^*} E\bigl[\mathbf 1_{\{T_n=d\}} X_\infty \mid \mathcal F_{T_n}\bigr] = \sum_{d\in D_n^*} \mathbf 1_{\{T_n=d\}}\, E[X_\infty \mid \mathcal F_d]
\]
(you should check this carefully). Now, since Xt converges to X∞ in L1 , Xd = E[Xt |Fd ] =
E[X∞ |Fd ] a.s., and E[X∞ |FTn ] = XTn . Passing to the limit as n → ∞ and using the
backwards martingale convergence theorem, we obtain E[X∞ |FT′ ] = XT where FT′ = ∩_{n≥1} FTn , and therefore E[X∞ |FT ] = XT by the tower property, since XT is FT -measurable.
The theorem then follows as in Theorem 2.6.1.
4.5 Kolmogorov’s continuity criterion
Theorem 4.5.1 (Kolmogorov’s continuity criterion) Let (Xt , 0 ≤ t ≤ 1) be a stochastic process with real values. Suppose there exist p > 0, c > 0, ε > 0 so that for every s, t ∈ [0, 1],
\[
E[|X_t - X_s|^p] \le c\,|t-s|^{1+\varepsilon} .
\]
Then, there exists a modification X̃ of X which is a.s. continuous (and even α-Hölder continuous for any α ∈ (0, ε/p)).
Proof. Let Dn = {k · 2−n , 0 ≤ k ≤ 2n } denote the dyadic numbers of [0, 1] with level n,
so Dn increases as n increases. Then letting α ∈ (0, ε/p), Markov’s inequality gives for 0 ≤ k < 2n ,
\[
P\bigl(|X_{k2^{-n}} - X_{(k+1)2^{-n}}| > 2^{-n\alpha}\bigr) \le 2^{np\alpha}\, E\bigl[|X_{k2^{-n}} - X_{(k+1)2^{-n}}|^p\bigr] \le c\, 2^{np\alpha}\, 2^{-n-n\varepsilon} = c\, 2^{-n}\, 2^{-(\varepsilon - p\alpha)n} .
\]
Summing over Dn we obtain
\[
P\Bigl( \sup_{0\le k < 2^n} |X_{k2^{-n}} - X_{(k+1)2^{-n}}| > 2^{-n\alpha} \Bigr) \le c\, 2^{-n(\varepsilon - p\alpha)} ,
\]
which is summable. Therefore, the Borel-Cantelli lemma shows that for a.a. ω, there
exists Nω so that if n ≥ Nω , the supremum under consideration is ≤ 2−nα . Otherwise
said, a.s.,
\[
\sup_{n\ge 0}\ \sup_{k\in\{0,\dots,2^n-1\}} \frac{|X_{k2^{-n}} - X_{(k+1)2^{-n}}|}{2^{-n\alpha}} \le M(\omega) < \infty .
\]
We claim that this implies that for every s, t ∈ D = ∪_{n≥0} Dn , |Xs − Xt | ≤ M′(ω)|t − s|^α , for some M′(ω) < ∞ a.s. Indeed, if s, t ∈ D, s < t, and if r is the least integer such that t − s > 2^{−r−1} , we can write [s, t) as a disjoint union of intervals of the form [u, u + 2^{−n} ) with u ∈ Dn and n > r, in such a way that for every n > r, at most two of these intervals have length 2^{−n} . This entails that
\[
|X_s - X_t| \le 2 \sum_{n\ge r+1} M(\omega)\, 2^{-n\alpha} \le 2(1-2^{-\alpha})^{-1} M(\omega)\, 2^{-(r+1)\alpha} \le M'(\omega)\,|t-s|^\alpha ,
\]
where M′(ω) < ∞ a.s. Therefore, the process (Xt , t ∈ D) is a.s. uniformly continuous (and even α-Hölder continuous). Since D is an everywhere dense set in [0, 1], the latter
process a.s. admits a unique continuous extension X̃ on [0, 1], which is also α-Hölder continuous (it is consistently defined by X̃t = limn Xtn , where (tn , n ≥ 0) is any D-valued sequence converging to t). On the exceptional set where (Xd , d ∈ D) is not uniformly
continuous, we let X̃t = 0, 0 ≤ t ≤ 1, so X̃ is continuous. It remains to show that X̃ is a version of X. To this end, we estimate by Fatou’s lemma
\[
E[|X_t - \widetilde X_t|^p] \le \liminf_n E[|X_t - X_{t_n}|^p] ,
\]
where (tn , n ≥ 0) is any D-valued sequence converging to t. But since E[|Xt − Xtn |p ] ≤ c|t − tn |1+ε , this converges to 0 as n → ∞. Therefore, Xt = X̃t a.s. for every t.
The nice thing about this criterion is that it depends only on a control of the two-dimensional marginal distributions of the stochastic process.
In fact, the very same proof gives the following alternative statement.
Corollary 4.5.1 Let (Xd , d ∈ D) be a stochastic process indexed by the set D of dyadic
numbers in [0, 1]. Assume that there exist c, p, ε > 0 so that for every s, t ∈ D,
E[|Xs − Xt |p ] ≤ c|s − t|1+ε ,
then almost-surely, the process (Xd , d ∈ D) has an extension (Xt , t ∈ [0, 1]) that is
continuous, and even Hölder-continuous of any index α ∈ (0, ε/p).
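For instance (a numerical sketch under my own choice of process, not from the notes), the moment bound can be checked for Brownian motion, for which E|B_t − B_s|^4 = 3|t − s|^2, i.e. p = 4, ε = 1, so the criterion yields Hölder exponents up to 1/4.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples = 200000
s, t, p = 0.3, 0.7, 4

# Increments of Brownian motion: B_t - B_s ~ N(0, t - s).
incr = rng.standard_normal(n_samples) * np.sqrt(t - s)
print(np.mean(np.abs(incr)**p))   # empirical E|B_t - B_s|^4
print(3 * abs(t - s)**2)          # the bound c|t-s|^{1+eps} with c = 3, eps = 1
```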
Chapter 5
Weak convergence
5.1 Definition and characterizations
Let (M, d) be a metric space, endowed with its Borel σ-algebra. All measures in this
chapter will be measures on such a measurable space. Let (µn , n ≥ 0) be a sequence
of probability measures on M . We say that µn converges weakly to the non-negative
measure µ if for every continuous bounded function f : M → R, one has µn (f ) → µ(f ).
Notice that in this case, µ is automatically a probability measure since µ(1) = 1, and
the definition actually still makes sense if we suppose that µn (resp. µ) are just finite
non-negative measures on M .
Examples. Let (xn , n ≥ 0) be a M -valued sequence that converges to x. Then δxn
converges weakly to δx , where δa is the Dirac mass at a. This is just saying that f (xn ) →
f (x) for continuous functions.
Let M = [0, 1] and µn = n^{−1} ∑_{0≤k≤n−1} δ_{k/n} . Then µn (f ) is the Riemann sum n^{−1} ∑_{0≤k≤n−1} f (k/n), which converges to ∫_0^1 f (x) dx if f is continuous, which shows that µn converges weakly to Lebesgue’s measure on [0, 1].
In these two cases, notice that it is not true that µn (A) converges to µ(A) for every Borel set A. This ‘pointwise convergence’ is stronger, but much more rigid
than weak convergence. For example, δxn does not converge in that sense to δx unless
xn = x eventually. See e.g. Chapter III in Stroock’s book for a discussion on the various
existing notions of convergence for measures.
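A small numerical illustration of the second example (my own sketch, not part of the notes): the measures µn integrate continuous functions like Lebesgue measure does, while µn (A) can stay far from µ(A) for a Borel set A, here the set of grid points itself, which is Lebesgue-null.

```python
import numpy as np

def mu_n(f, n):
    """Integral of f against mu_n = n^{-1} sum_{k<n} delta_{k/n}."""
    k = np.arange(n)
    return np.mean(f(k / n))

f = lambda x: np.exp(x)                  # a continuous bounded test function on [0, 1]
for n in (10, 100, 10000):
    print(n, mu_n(f, n))                 # -> integral of exp over [0,1] = e - 1 ~ 1.718

# For the Borel set A = {k/n : k, n integers} one has mu_n(A) = 1 for every n,
# while Lebesgue(A) = 0: convergence on all Borel sets fails.
```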
Theorem 5.1.1 Let (µn , n ≥ 0) be a sequence of probability distributions. The following
assertions are equivalent:
1. µn converges weakly to µ
2. For every open subset G of M , lim inf n µn (G) ≥ µ(G) (‘open sets can lose mass’)
3. For every closed subset F of M , lim supn µn (F ) ≤ µ(F ) (‘closed sets can gain mass’)
4. For every Borel subset A in M with µ(∂A) = 0, limn µn (A) = µ(A). (‘mass is lost
or gained through the boundary’)
Proof. 1. =⇒ 2. Let G be an open subset with nonempty complement Gc . The distance
function d(x, Gc ) is continuous and positive if and only if x ∈ G. Let fM = 1∧(M d(x, Gc )).
Then fM increases to 1G as M ↑ ∞. Now, µn (fM ) ≤ µn (G) and µn (fM ) converges to µ(fM ), so that lim inf n µn (G) ≥ µ(fM ) for every M , and by monotone convergence, letting M ↑ ∞ gives the result.
2. ⇐⇒ 3. is obvious by taking complementary sets.
2.,3. =⇒ 4. Let A° and Ā respectively denote the interior and the closure of A. Since µ(∂A) = µ(Ā \ A°) = 0, we obtain µ(A°) = µ(A) = µ(Ā). By 2. and 3.,
\[
\limsup_n \mu_n(\bar A) \le \mu(\bar A) = \mu(A^\circ) \le \liminf_n \mu_n(A^\circ) ,
\]
and since A° ⊂ A ⊂ Ā, this gives the result.
4. =⇒ 1. Let f : M → R+ be a continuous bounded non-negative function, then using
Fubini’s theorem,
\[
\int_M f(x)\,\mu_n(dx) = \int_M \mu_n(dx) \int_0^\infty \mathbf 1_{\{t\le f(x)\}}\, dt = \int_0^K \mu_n(\{f \ge t\})\, dt ,
\]
where K is any upper bound for f . Now {f ≥ t} := {x : f (x) ≥ t} is a closed subset
of M , whose boundary is included in {f = t}, because {f > t} is open and included in
{f ≥ t}, and their difference is {f = t}. However, there can be at most a countable set
of numbers t such that µ({f = t}) > 0, because
\[
\{t : \mu(\{f=t\}) > 0\} = \bigcup_{n\ge 1} \{t : \mu(\{f=t\}) \ge n^{-1}\} ,
\]
and the n-th set on the right-hand side has at most n elements. Therefore, for Lebesgue-almost all t, µ(∂{f ≥ t}) = 0, and therefore 4. and dominated convergence over the finite interval [0, K], where the integrated quantities are bounded by 1, show that µn (f ) converges to ∫_0^K µ({f ≥ t}) dt = µ(f ). The case of functions taking values of both signs is immediate.
As a consequence, one obtains the following important criterion for weak convergence
of measures on R. Recall that the distribution function of a non-negative finite measure
µ on R is the càdlàg function defined by Fµ (x) = µ((−∞, x]), x ∈ R.
Proposition 5.1.1 Let µn , n ≥ 0, µ be probability measures on R. Then the following
statements are equivalent:
1. µn converges weakly to µ
2. for every x ∈ R such that Fµ is continuous at x, Fµn (x) converges to Fµ (x) as
n → ∞.
Proof. The continuity of Fµ at x exactly says that µ(∂Ax ) = 0 where Ax = (−∞, x], so
1. =⇒ 2. is immediate by Theorem 5.1.1.
Conversely, let G be an open subset of R, which we write as a countable union ∪_k (ak , bk ) of disjoint open intervals. Then
\[
\mu_n(G) = \sum_k \mu_n((a_k , b_k)) , \qquad (5.1)
\]
while for every k and ak < a′ < b′ < bk ,
\[
\mu_n((a_k , b_k)) = F_{\mu_n}(b_k-) - F_{\mu_n}(a_k) \ge F_{\mu_n}(b') - F_{\mu_n}(a') .
\]
If we take a′ , b′ to be continuity points of Fµ , we then obtain lim inf n µn ((ak , bk )) ≥ Fµ (b′ ) − Fµ (a′ ). Letting a′ ↓ ak , b′ ↑ bk along continuity points of Fµ (such points always form a dense set in R) gives lim inf n µn ((ak , bk )) ≥ µ((ak , bk )). On the other hand, applying Fatou’s lemma to (5.1) yields lim inf n µn (G) ≥ ∑_k lim inf n µn ((ak , bk )), whence lim inf n µn (G) ≥ µ(G).
5.2 Convergence in distribution for random variables
If (Xn , n ≥ 0) is a sequence of random variables with values in a metric space (M, d), and
defined on possibly different probability spaces (Ωn , Fn , Pn ), we say that Xn converges
in distribution to a random variable X on (Ω, F, P ) if the law of Xn converges weakly
to that of X. Otherwise said, Xn converges in distribution to X if for every continuous
bounded function f , E[f (Xn )] converges to E[f (X)].
The two following examples are the probabilistic counterparts of the examples discussed
in the beginning of the previous section.
Examples. If (xn ) is a sequence in M that converges to x, then xn converges as n → ∞
to x in distribution, if the xn , n ≥ 0 and x are considered as random variables!
If U is a uniform random variable on [0, 1) and Un = n−1 bnU c, we see that Un has law
µn and converges in distribution to U .
In the two cases we just discussed, the variables under consideration even converge
a.s., which directly entails convergence in distribution, see the example sheets.
The notion of convergence in distribution is related to the other notions of convergence
for random variables as follows. See the Example sheet 3 for the proof.
Proposition 5.2.1 1. If (Xn , n ≥ 1) is a sequence of random variables that converges in
probability to some random variable X, then Xn converges in distribution to X.
2. If (Xn , n ≥ 1) is a sequence of random variables that converges in distribution to
some constant random variable c, then Xn converges to c in probability.
Using Proposition 5.1.1, we can discuss the following
Example: the central limit theorem. The central limit theorem says that if (Xn , n ≥
1) is a sequence of iid random variables in L2 with m = E[X1 ] and σ 2 = Var (X1 ), then
for every a < b in R,
\[
P\Bigl( a \le \frac{S_n - mn}{\sigma\sqrt n} \le b \Bigr) \ \xrightarrow[n\to\infty]{}\ \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx ,
\]
where Sn = X1 + . . . + Xn . This is exactly saying that (Sn − mn)/(σ√n) converges in distribution as n → ∞ to a Gaussian N (0, 1) random variable.
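A minimal Monte Carlo sketch of this convergence in distribution (my own illustration with arbitrary parameters, not part of the notes), comparing the empirical probability with the Gaussian limit for exponential summands.

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(3)
n, n_samples = 500, 10000
a, b = -1.0, 2.0

# iid exponential(1) variables: m = 1, sigma^2 = 1.
X = rng.exponential(1.0, size=(n_samples, n))
Z = (X.sum(axis=1) - n * 1.0) / sqrt(n)

print(np.mean((Z >= a) & (Z <= b)))   # empirical P(a <= (S_n - mn)/(sigma sqrt n) <= b)
print(Phi(b) - Phi(a))                # Gaussian limit ~ 0.8186
```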
5.3 Tightness
Definition 5.3.1 Let {µi , i ∈ I} be a family of probability measures on M . This family
is said to be tight if for every ε > 0, there exists a compact subset K ⊂ M such that
\[
\sup_{i\in I} \mu_i(M \setminus K) \le \varepsilon ,
\]
i.e. most of the mass of µi is contained in K, uniformly in i ∈ I.
Proposition 5.3.1 (Prokhorov’s theorem) Suppose that the sequence of probability
measures (µn , n ≥ 0) on M is tight. Then there exists a subsequence (µnk , k ≥ 0) along
which µn converges weakly to some limiting µ.
The proof is considerably eased when M = R, which we will suppose. For the general
case, see Billingsley’s book Convergence of Probability Measures. Notice that in particular,
if (µn , n ≥ 0) is a sequence of probability measures on a compact space, then there exists
a subsequence µnk weakly converging to some µ.
Proof. Let Fn be the distribution function of µn . Then it is easy by a diagonal extraction
argument to find an extraction (nk , k ≥ 0) and a non-decreasing function F : Q → [0, 1]
such that Fnk (r) → F (r) as k → ∞ for every rational r. The function F is extended on
R as a càdlàg non-decreasing function by the formula F (x) = limr↓x,r∈Q F (r). It is then
elementary by a monotonicity argument to show that Fnk (x) → F (x) for every x which
is a continuity point of F .
To conclude, we must check that F is the distribution function of some measure µ.
But the tightness shows that for every ε > 0, there exists A > 0 such that Fn (A) ≥ 1 − ε
and Fn (−A) ≤ ε for every n. By further choosing A so that F is continuous at A and
−A, we see that F (A) ≥ 1 − ε and F (−A) ≤ ε, whence F has limits 0 and 1 at −∞
and +∞. By a standard corollary of Caratheodory’s theorem, there exists a probability
measure µ having F as its distribution function.
Remark. The fact that Fn converges up to extraction to a function F , which need not be
a probability distribution function unless the tightness hypothesis is verified, is a particular
case of Helly’s theorem, and says that up to extraction, a family of probability laws µn
converges vaguely to a possibly defective measure µ (i.e. of mass ≤ 1), i.e. µn (f ) → µ(f )
for every continuous f with compact support. The problem that could appear is that some of the mass
of the µn ’s could ‘go to infinity’, for example δn converges vaguely to the zero measure as
n → ∞, and does not converge weakly. This phenomenon of mass ‘going away’ is exactly
what Prokhorov’s theorem prevents from happening.
In many situations, showing that a sequence of random variables Xn converges in
distribution to a limiting X with law µ is done in two steps. One first shows that the
sequence (µn , n ≥ 0) of laws of the Xn form a tight sequence. Then, one shows that the
limit of µn along any subsequence cannot be other than µ. This will be illustrated in the
next section.
5.4 Lévy’s convergence theorem
In this section, we let d be a positive integer and consider only random variables with values in the state space Rd .
Recall that the characteristic function of an Rd -valued random variable X is the function ΦX : Rd → C defined by ΦX (λ) = E[exp(i⟨λ, X⟩)]. It is a continuous function on
Rd , such that ΦX (0) = 1. Moreover, it induces an injective mapping from (distributions
of) random variables to complex-valued functions defined on Rd , in the sense that two
random variables with distinct distributions have distinct characteristic functions.
The following theorem is extremely useful in practice.
Theorem 5.4.1 (Lévy’s convergence theorem) Let (Xn , n ≥ 0) be a sequence of random variables.
(i) If Xn converges in distribution to a random variable X, then ΦXn (λ) converges to
ΦX (λ) for every λ ∈ Rd .
(ii) If ΦXn (λ) converges to Ψ(λ) for every λ ∈ Rd , where Ψ is a function which is
continuous at 0, then Ψ is a characteristic function, i.e. there exists a random variable X
such that Ψ = ΦX , and moreover, Xn converges in distribution to X.
Corollary 5.4.1 If (Xn , n ≥ 0), X are random variables in Rd , then Xn converges in
distribution to X as n → ∞ if and only if ΦXn converges to ΦX pointwise.
The proof of (i) in Lévy’s theorem is immediate since the function x 7→ exp(iλ · x) is
continuous and bounded from Rd to C. For the proof of (ii), we will need to show that
the hypotheses imply the tightness of the sequence of laws of Xn , n ≥ 0. To this end, the
following bound is very useful.
Lemma 5.4.1 Let X be a random variable with values in Rd . Then for any norm ‖ · ‖ on Rd there exists a constant C > 0 (depending on d and on the choice of the norm) such that
\[
P(\|X\| \ge K) \le C\, K^d \int_{[-K^{-1},K^{-1}]^d} \bigl(1 - \Re\,\Phi_X(u)\bigr)\, du .
\]
Proof. Let µ be the distribution of X. Using Fubini’s theorem and a simple recursion,
it is easy to check that
\[
\frac{1}{\lambda^d} \int_{[-\lambda,\lambda]^d} (1 - \Re\,\Phi_X(u))\, du = 2^d \int_{\mathbb R^d} \mu(dx)\, \Bigl( 1 - \prod_{i=1}^d \frac{\sin(\lambda x_i)}{\lambda x_i} \Bigr) .
\]
Now, the continuous function sinc : t ∈ R ↦ t^{−1} sin t is such that there exists 0 < c < 1 with |sinc t| ≤ c for every t ≥ 1, so that f : u ∈ Rd ↦ ∏_{i=1}^d sin(ui )/ui satisfies |f (u)| ≤ c as soon as ‖u‖∞ ≥ 1. Therefore, 1 − f is a non-negative continuous function which is ≥ 1 − c when ‖u‖∞ ≥ 1. Letting C = 2^d (1 − c)^{−1} entails that C(1 − f (u)) ≥ 1{‖u‖∞ ≥1} . Putting things together, one gets the result for the norm ‖·‖∞ , and the general result follows from the equivalence of norms in finite-dimensional vector spaces.
Proof of Lévy’s theorem.
Suppose ΦXn converges pointwise to a limit Ψ that is
continuous at 0. Then, |1 − <ΦXn | being bounded above by 2, fixing ε > 0, the dominated
convergence theorem shows that for any K > 0
\[
\lim_n K^d \int_{[-K^{-1},K^{-1}]^d} (1 - \Re\,\Phi_{X_n}(u))\, du = K^d \int_{[-K^{-1},K^{-1}]^d} (1 - \Re\,\Psi(u))\, du .
\]
By taking K large enough, we can make this limiting value < ε/(2Cd ), because Ψ is
continuous at 0, and it follows by the lemma that for every n large enough, P (|Xn | ≥
K) ≤ ε. Up to increasing K, this then holds for every n, showing tightness of the family
of laws of the Xn . Therefore, up to extracting a subsequence, we see from Prokhorov’s
theorem that Xn converges in distribution to a limiting X, so that ΦXn converges pointwise
to ΦX along this subsequence (by part (i)). This is possible only if ΦX = Ψ, showing
that Ψ is a characteristic function. Moreover, this shows that the law of X is the only
possible probability measure which is the weak limit of the laws of the Xn along some
subsequence, so Xn must converge to X in distribution.
More precisely, if Xn did not converge in distribution to X, we could find a continuous
bounded f , some ε > 0 and a subsequence Xnk such that for all k,
|E[f (Xnk )] − E[f (X)]| > ε
(5.2)
But since the laws of (Xnk , k ≥ 0) are tight, we could find a further subsequence along
which Xnk converges in distribution to some X 0 , which by (i) would satisfy ΦX 0 = Ψ = ΦX
and thus have same distribution as X, contradicting (5.2).
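The two parts of the theorem can be illustrated numerically (my own sketch, with hypothetical parameters, not part of the notes): empirical characteristic functions of standardized sums converge pointwise to exp(−λ²/2), which is continuous at 0; by contrast, for Xn ~ N(0, n) the characteristic functions converge to a limit that is discontinuous at 0, matching the fact that the laws are not tight and have no weak limit.

```python
import numpy as np

rng = np.random.default_rng(4)
lams = np.array([0.0, 0.5, 1.0, 2.0])

# (a) Standardized sums of centered uniforms with variance 1.
n, n_samples = 400, 20000
U = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n_samples, n))
Z = U.sum(axis=1) / np.sqrt(n)
emp_cf = np.exp(1j * np.outer(lams, Z)).mean(axis=1)
print(np.real(emp_cf))             # ~ exp(-lam^2/2), a characteristic function
print(np.exp(-lams**2 / 2))

# (b) X_n ~ N(0, n): Phi_{X_n}(lam) = exp(-n lam^2 / 2) -> 1_{lam = 0},
# a limit that is not continuous at 0 (mass escapes to infinity, no weak limit).
for m in (1, 10, 100):
    print(m, np.exp(-m * lams**2 / 2))
```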
Chapter 6
Introduction to Brownian motion
6.1 History up to Wiener’s theorem
This chapter is devoted to the construction and some properties of one of probability
theory’s most fundamental objects. Brownian motion earned its name after R. Brown,
who observed around 1827 that tiny particles of pollen in water have an extremely erratic
motion. It was observed by Physicists that this was due to a important number of random
shocks undertaken by the particles from the (much smaller) water molecules in motion
in the liquid. A. Einstein established in 1905 the first mathematical basis for Brownian
motion, by showing that it must be an isotropic Gaussian process. The first rigorous
mathematical construction of Brownian motion is due to N. Wiener in 1923, using Fourier
theory.
In order to motivate the introduction of this object, we begin with a “microscopic” depiction of Brownian motion. Suppose (Xn , n ≥ 0) is a sequence of i.i.d. Rd -valued random variables with mean 0 and covariance matrix σ²Id , where Id is the identity matrix in d dimensions, for some σ² > 0. Namely, if X1 = (X1¹ , . . . , X1^d ),
\[
E[X_1^i] = 0 , \qquad E[X_1^i X_1^j] = \sigma^2 \delta_{ij} , \qquad 1 \le i, j \le d .
\]
We interpret Xn as the spatial displacement resulting from the shocks due to water
molecules during the n-th time interval, and the fact that the covariance matrix is scalar
stands for an isotropy assumption (no direction of space is privileged).
From this, we let Sn = X1 + . . . + Xn and we embed this discrete-time process into
continuous time by letting
\[
B^{(n)}_t = n^{-1/2}\, S_{[nt]} , \qquad t \ge 0 .
\]
Let | · | be the Euclidean norm on Rd and for t > 0 and x ∈ Rd , define
\[
p_t(x) = \frac{1}{(2\pi t)^{d/2}} \exp\Bigl( -\frac{|x|^2}{2t} \Bigr),
\]
which is the density of the Gaussian distribution N (0, tId ) with mean 0 and covariance
matrix tId . By convention, the Gaussian law N (m, 0) is the Dirac mass at m.
Proposition 6.1.1 Let 0 ≤ t1 < t2 < . . . < tk . Then the finite marginal distributions of
B (n) with respect to times t1 , . . . , tk converge weakly as n → ∞. More precisely, if F is a
bounded continuous function, and letting x0 = 0, t0 = 0,
\[
E\bigl[ F(B^{(n)}_{t_1}, \dots, B^{(n)}_{t_k}) \bigr] \ \xrightarrow[n\to\infty]{}\ \int_{(\mathbb R^d)^k} F(x_1, \dots, x_k) \prod_{1\le i\le k} p_{\sigma^2 (t_i - t_{i-1})}(x_i - x_{i-1})\, dx_i .
\]
Otherwise said, (B^{(n)}_{t1} , . . . , B^{(n)}_{tk}) converges in distribution to (G1 , G2 , . . . , Gk ), which is a random vector whose law is characterized by the fact that (G1 , G2 − G1 , . . . , Gk − Gk−1 ) are independent centered Gaussian random variables with respective covariance matrices σ²(ti − ti−1 )Id .
Proof. With the notations of the theorem, we first check that (B^{(n)}_{t1} , B^{(n)}_{t2} − B^{(n)}_{t1} , . . . , B^{(n)}_{tk} − B^{(n)}_{tk−1}) is a sequence of independent random variables. Indeed, one has for 1 ≤ i ≤ k,
\[
B^{(n)}_{t_i} - B^{(n)}_{t_{i-1}} = \frac{1}{\sqrt n} \sum_{j=[n t_{i-1}]+1}^{[n t_i]} X_j ,
\]
and the independence follows by the fact that (Xj , j ≥ 0) is an i.i.d. family. Even better,
we have the identity in distribution for the i-th increment
\[
B^{(n)}_{t_i} - B^{(n)}_{t_{i-1}} \overset{d}{=} \frac{\sqrt{[n t_i] - [n t_{i-1}]}}{\sqrt n} \cdot \frac{1}{\sqrt{[n t_i] - [n t_{i-1}]}} \sum_{j=1}^{[n t_i] - [n t_{i-1}]} X_j ,
\]
and the central limit theorem shows that this converges in distribution to a Gaussian law
N (0, σ 2 (ti − ti−1 )Id ). Summing up our study, and introducing characteristic functions, we
have shown that for every ξ = (ξj , 1 ≤ j ≤ k),
"
E exp i
k
Y
!#
(n)
ξj (Btj
−
(n)
Btj−1
=
j=1
k
Y
h
i
(n)
(n)
E exp iξj (Btj − Btj−1
j=1
→
n→∞
k
Y
E[exp (iξj (Gj − Gj−1 )]
j=1
"
=
E exp i
k
Y
!#
ξi (Gj − Gj−1 )
,
j=1
where G1 , . . . , Gk is distributed as in the statement of the proposition. By Lévy’s convergence theorem we deduce that increments of B (n) between times ti converge to increments
of the sequence Gi , which is easily equivalent to the statement.
This gives the clue that B (n) should converge to a process B whose increments are
independent and Gaussian with covariances dictated by the above formula. This will be
set in a rigorous way at the end of this chapter, with Donsker’s invariance theorem.
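A minimal simulation sketch of the rescaled walk B^{(n)}_t = n^{-1/2} S_{[nt]} (my own illustration with arbitrary parameters, not part of the notes), checking the variance of the marginals and the decorrelation of increments.

```python
import numpy as np

rng = np.random.default_rng(5)

def B_n(n, times, sigma=1.0, n_paths=10000):
    """Simulate B^{(n)}_t = n^{-1/2} S_[nt] at the given times, for a walk
    with centered +-sigma steps (variance sigma^2)."""
    n_steps = int(np.ceil(n * max(times)))
    steps = sigma * rng.choice([-1.0, 1.0], size=(n_paths, n_steps))
    S = np.cumsum(steps, axis=1)
    idx = (n * np.asarray(times)).astype(int)       # [nt]
    return S[:, idx - 1] / np.sqrt(n)               # column j is B^(n)_{times[j]}

times = [0.5, 1.0]
B = B_n(n=1000, times=times)
inc = B[:, 1] - B[:, 0]
print(B[:, 0].var(), "~", 0.5)           # Var B^(n)_{0.5} ~ sigma^2 * 0.5
print(inc.var(), "~", 0.5)               # increment over (0.5, 1] ~ sigma^2 * 0.5
print(np.corrcoef(B[:, 0], inc)[0, 1])   # ~ 0: increments decorrelate
```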
Definition 6.1.1 A Rd -valued stochastic process (Bt , t ≥ 0) is called a standard Brownian
motion if it is a continuous process, that satisfies the following conditions:
(i) B0 = 0 a.s.,
(ii) for every 0 = t0 ≤ t1 ≤ t2 ≤ . . . ≤ tk , the increments (Bt1 −Bt0 , Bt2 −Bt1 , . . . , Btk −
Btk−1 ) are independent, and
(iii) for every t, s ≥ 0, the law of Bt+s − Bt is Gaussian with mean 0 and covariance
sId .
The term “standard” refers to the fact that B1 is normalized to have variance Id , and
the choice B0 = 0.
The characteristic properties (i), (ii), (iii) amount exactly to saying that the finite-dimensional marginals of a Brownian motion are given by the formula of Proposition
6.1.1. Therefore the law of the Brownian motion is uniquely determined. We now show
Wiener’s theorem that Brownian motion exists!
Theorem 6.1.1 (Wiener) There exists a Brownian motion on some probability space.
Proof. We will first prove the theorem in dimension d = 1 and construct a process
(Bt , 0 ≤ t ≤ 1) satisfying the properties of a Brownian motion.
Let D0 = {0, 1}, Dn = {k2−n , 0 ≤ k ≤ 2n } for n ≥ 1, and D = ∪_{n≥0} Dn be the set of
dyadic rational numbers in [0, 1]. On some probability space (Ω, F, P ), let (Zd , d ∈ D) be
a collection of independent random variables all having a Gaussian distribution N (0, 1)
with mean 0 and variance 1. We are first going to construct the process (Bd , d ∈ D) so
that Bd is a linear combination of the Zd′ ’s for every d.
It is a well-known and important fact that if random variables X1 , X2 , . . . are linear
combinations of independent centered Gaussian random variables, then X1 , X2 , . . . are independent if and only if they are pairwise uncorrelated, namely Cov (Xi , Xj ) = E[Xi Xj ] =
0 for every i ≠ j.
We set B0 = 0 and B1 = Z1 . Inductively, given (Bd , d ∈ Dn−1 ), we build (Bd , d ∈ Dn )
in such a way that
• (Bd , d ∈ Dn ) satisfies (i), (ii), (iii) in the definition of the Brownian motion (where
the instants under consideration are taken in Dn ).
• the random variables (Zd , d ∈ D \ Dn ) are independent of (Bd , d ∈ Dn ).
To this end, take d ∈ Dn \ Dn−1 , and let d− = d − 2−n and d+ = d + 2−n so that d− , d+
are consecutive dyadic numbers in Dn−1 . Then write
\[
B_d = \frac{B_{d^-} + B_{d^+}}{2} + \frac{Z_d}{2^{(n+1)/2}} .
\]
Then Bd − Bd− = (Bd+ − Bd− )/2 + Zd /2^{(n+1)/2} and Bd+ − Bd = (Bd+ − Bd− )/2 − Zd /2^{(n+1)/2} . Now notice that Nd := (Bd+ − Bd− )/2 and N′d := Zd /2^{(n+1)/2} are by the induction hypothesis two independent centered Gaussian random variables with variance 2^{−n−1} . From this, one deduces Cov (Nd + N′d , Nd − N′d ) = Var (Nd ) − Var (N′d ) = 0, so that the increments Bd − Bd− and Bd+ − Bd are independent with variance 2^{−n} , as should be. Moreover, these increments are independent of the increments Bd′+2^{−(n−1)} − Bd′ for d′ ∈ Dn−1 , d′ ≠ d− , and of Zd′ , d′ ∈ Dn \ Dn−1 , d′ ≠ d, so they are independent of the increments Bd″+2^{−n} − Bd″ for d″ ∈ Dn , d″ ∉ {d− , d}. This allows the induction argument to proceed one step further.
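The recursion just described is easy to implement; the following is a minimal sketch of this dyadic (midpoint) construction, my own illustration rather than part of the notes, with a basic variance check on the finest increments.

```python
import numpy as np

rng = np.random.default_rng(6)

def dyadic_brownian(levels):
    """Build (B_d, d in D_levels) on [0,1] by the midpoint recursion
    B_d = (B_{d-} + B_{d+})/2 + Z_d / 2^{(n+1)/2} for d in D_n \\ D_{n-1}."""
    B = np.zeros(2**levels + 1)               # index k corresponds to d = k / 2**levels
    B[-1] = rng.standard_normal()             # B_1 = Z_1, B_0 = 0
    for n in range(1, levels + 1):
        step = 2**(levels - n)                # spacing of D_n in the index array
        for k in range(step, 2**levels, 2 * step):   # midpoints d in D_n \ D_{n-1}
            Z = rng.standard_normal()
            B[k] = 0.5 * (B[k - step] + B[k + step]) + Z / 2**((n + 1) / 2)
    return B

B = dyadic_brownian(levels=10)                # values at the dyadic points of level 10
incs = np.diff(B)                             # increments over intervals of length 2^{-10}
print(incs.var() * 2**10)                     # ~ 1, since Var(B_{d+2^{-n}} - B_d) = 2^{-n}
```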
Thus, we have a process (Bd , d ∈ D) satisfying the properties of Brownian motion.
Let s ≤ t ∈ D, and notice that for every p > 0, since Bt − Bs has the same law as √(t − s) N ,
where N is a standard Gaussian random variable,
\[
E[|B_t - B_s|^p] = |t-s|^{p/2}\, E[|N|^p] .
\]
Since a Gaussian random variable admits moments of all orders, it follows from Corollary
4.5.1 that (Bd , d ∈ D) a.s. admits a continuous continuation (Bt , 0 ≤ t ≤ 1).
Up to modifying B on the exceptional event where such an extension does not exist,
replacing it by the 0 function for instance, we see that B can be supposed to be continuous
for every ω.
We now check that (Bt , t ∈ [0, 1]) thus constructed has the properties of Brownian
motion. Let 0 = t0 < t1 < . . . < tk , and let 0 = tn0 < tn1 < . . . < tnk be dyadic numbers
such that tni converges to ti as n → ∞. Then by continuity, (Btn1 , . . . , Btnk ) converges a.s.
to (Bt1 , . . . , Btk ) as n → ∞, while on the other hand, (B_{t^n_j} − B_{t^n_{j−1}} , 1 ≤ j ≤ k) are independent Gaussian random variables with variances (t^n_j − t^n_{j−1} , 1 ≤ j ≤ k), so it is not difficult using Lévy’s theorem to see that this converges in distribution to independent Gaussian
random variables with respective variances tj − tj−1 , which thus is the distribution of
(Btj − Btj−1 , 1 ≤ j ≤ k), as wanted.
It is now easy to construct a Brownian motion indexed by R+ : simply take independent
standard Brownian motions (B^i_t , 0 ≤ t ≤ 1), i ≥ 0, as we just constructed, and let
\[
B_t = \sum_{i=0}^{\lfloor t\rfloor - 1} B^i_1 + B^{\lfloor t\rfloor}_{t - \lfloor t\rfloor} , \qquad t \ge 0 .
\]
It is easy to check that this has the wanted properties.
Finally, it is straightforward to build a Brownian motion in Rd , by taking d independent
copies B 1 , . . . , B d of B and checking that ((Bt1 , . . . , Btd ), t ≥ 0) is a Brownian motion in
Rd .
Let ΩW = C(R+ , Rd ) be the ‘Wiener space’ of continuous functions, endowed with the
product σ-algebra W (or the Borel σ-algebra associated with the compact-open topology).
Let Xt (w) = w(t), t ≥ 0 denote the canonical process (w ∈ ΩW ).
Proposition 6.1.2 (Wiener’s measure) There exists a unique measure W0 (dw) on
(ΩW , W), such that (Xt , t ≥ 0) is a standard Brownian motion on (ΩW , W, W0 (dw)).
Proof. Let (Bt , t ≥ 0) be a standard Brownian motion defined on some probability space
(Ω, F, P ). The distribution of B, i.e. the image measure of P by the random variable
B : Ω → ΩW , is a measure W0 (dw) satisfying the conditions of the statement. Uniqueness
is obvious because such a measure is determined by the finite-dimensional marginals of
Brownian motion.
For x ∈ Rd we also let Wx (dw) be the image measure of W0 by (wt , t ≥ 0) 7→
(x + wt , t ≥ 0). A (continuous) process with law Wx (dw) is called a Brownian motion
started at x.
We let (FtB , t ≥ 0) be the natural filtration of (Bt , t ≥ 0).
Notice that Kolmogorov’s continuity lemma shows that a standard Brownian motion
is also a.s. locally Hölder continuous with any exponent α < 1/2, since for every integer p it is α-Hölder continuous for every α < 1/2 − 1/p.
6.2 First properties
The first few following basic (and fundamental) invariance properties of Brownian motion
are left as an exercise.
Proposition 6.2.1 Let B be a standard Brownian motion in Rd .
1. If U ∈ O(d) is an orthogonal matrix, then U B = (U Bt , t ≥ 0) is again a Brownian
motion. In particular, −B is a Brownian motion.
2. If λ > 0 then (λ−1/2 Bλt , t ≥ 0) is a standard Brownian motion (scaling property)
3. For every t ≥ 0, the shifted process (Bt+s − Bt , s ≥ 0) is a Brownian motion independent of FtB (simple Markov property).
We now turn to less trivial path properties of Brownian motion. We begin with
Theorem 6.2.1 (Blumenthal’s 0 − 1 law) Let B be a standard Brownian motion. The σ-algebra F^B_{0+} = ∩_{ε>0} F^B_ε is trivial, i.e. constituted of the events of probability 0 or 1.
Proof. Let 0 < t1 < t2 < . . . < tk and A ∈ F^B_{0+} . Then if F is a continuous bounded function (Rd )^k → R, we have by continuity of B and the dominated convergence theorem,
\[
E[\mathbf 1_A F(B_{t_1}, \dots, B_{t_k})] = \lim_{\varepsilon \downarrow 0} E[\mathbf 1_A F(B^{(\varepsilon)}_{t_1 - \varepsilon}, \dots, B^{(\varepsilon)}_{t_k - \varepsilon})] ,
\]
where B^{(ε)} = (Bt+ε − Bε , t ≥ 0). On the other hand, since A is F^B_ε -measurable for any ε > 0, the simple Markov property shows that this is equal to
\[
P(A) \lim_{\varepsilon \downarrow 0} E[F(B_{t_1 - \varepsilon}, \dots, B_{t_k - \varepsilon})] ,
\]
which is P (A)E[F (Bt1 , . . . , Btk )], using again dominated convergence and continuity of B and F . This entails that F^B_{0+} is independent of σ(Bs , s ≥ 0) = F^B_∞ . However, F^B_∞ contains F^B_{0+} , so that the latter σ-algebra is independent of itself, and P (A) = P (A ∩ A) = P (A)², entailing the result.
Proposition 6.2.2 (i) For d = 1 and t ≥ 0, let St = sup0≤s≤t Bs and It = inf 0≤s≤t Bs
(these are random variables because B is continuous). Then almost-surely, for every
ε > 0, one has
Sε > 0 and Iε < 0.
In particular, there exists a zero of B in any interval of the form (0, ε), ε > 0.
(ii) A.s., sup_{t≥0} Bt = − inf_{t≥0} Bt = +∞.
(iii) Let C be an open cone in Rd with non-empty interior and origin at 0, i.e. a set
of the form {tu : t > 0, u ∈ A}, where A is a non-empty open subset of the unit sphere
of Rd . If
HC = inf{t > 0 : Bt ∈ C}
is the first hitting time of C, then HC = 0 a.s.
Proof. (i) The probability that Bt > 0 is 1/2 for every t, so P (St > 0) ≥ 1/2, and therefore if tn , n ≥ 0 is any sequence decreasing to 0, P (lim supn {Btn > 0}) ≥ lim supn P (Btn >
0) = 1/2. Since the event lim supn {Btn > 0} is in F0+ , Blumenthal’s law shows that its
probability must be 1. The same is true for the infimum by considering the Brownian
motion −B.
(ii) Let S∞ = supt≥0 Bt . By scaling invariance, for every λ > 0, λS∞ = supt≥0 λBt has
same law as supt≥0 Bλ²t = S∞ . This is possible only if S∞ ∈ {0, ∞} a.s.; however, it cannot be 0 by (i).
(iii) The cone C is invariant by multiplication by a positive scalar, so that P (Bt ∈ C)
is the same as P (B1 ∈ C) for every t by the scaling invariance of Brownian motion. Now,
if C has nonempty interior, it is straightforward to check that P (B1 ∈ C) > 0, and one
concludes similarly as above. Details are left to the reader.
6.3 The strong Markov property
We now want to prove an important analog of the simple Markov property, where deterministic times are replaced by stopping times. To begin with, we extend a little the
definition of Brownian motion, by allowing it to start from a random location, and by
working with filtrations that are larger than the natural filtration of standard Brownian motions.
We say that B is a Brownian motion (started at B0 ) if (Bt − B0 , t ≥ 0) is a standard
Brownian motion which is independent of B0 . Otherwise said, it is the same as the
definition of a standard Brownian motion, except that we do not require that B0 = 0. If
we want to express this on the Wiener space with the Wiener measure, we have for every
measurable functional F : ΩW → R+ ,
E[F (Bt , t ≥ 0)] = E[F (Bt − B0 + B0 , t ≥ 0)],
and since (Bt − B0 , t ≥ 0) has law W0 , this is
\[
\int_{\mathbb R^d} P(B_0 \in dx) \int_{\Omega_W} W_0(dw)\, F(x + w(t), t \ge 0) = \int_{\mathbb R^d} P(B_0 \in dx)\, W_x(F) = E[W_{B_0}(F)] ,
\]
where as above, Wx is the image of W0 by the translation w 7→ x + w, and WB0 (F )
is the random variable ω 7→ WB0 (ω) (F ). Using Proposition 1.3.4 actually shows that
E[F (B)|B0 ] = WB0 (F ).
Let (Ft , t ≥ 0) be a filtration. We say that a Brownian motion B is an (Ft )-Brownian
motion if B is adapted to (Ft ), and if B (t) = (Bt+s − Bt , s ≥ 0) is independent of Ft for
every t ≥ 0. For instance, if (Ft ) is the natural filtration of a 2-dimensional Brownian
motion (Bt1 , Bt2 , t ≥ 0), then (Bt1 , t ≥ 0) is an (Ft )-Brownian motion. If B′ is a standard Brownian motion and X is a random variable independent of B′ , then B = (X + B′t , t ≥ 0) is a Brownian motion (started at B0 = X), and it is an (FtB ) = (σ(X) ∨ FtB′ )-Brownian motion. A Brownian motion is always an (FtB )-Brownian motion. If B is a standard
Brownian motion, then the completed filtration Ft = FtB ∨ N (N being the set of events
of probability 0) can be shown to be right-continuous, i.e. Ft+ = Ft for every t ≥ 0, and
B is an (Ft )-Brownian motion.
Let (Bt , t ≥ 0) be an (Ft )-Brownian motion in Rd and T be an (Ft )-stopping time.
We let B^{(T)}_t = B_{T+t} − B_T for every t ≥ 0 on the event {T < ∞}, and 0 otherwise. Then
Theorem 6.3.1 (Strong Markov property) Conditionally on {T < ∞}, the process
B (T ) is a standard Brownian motion, which is independent of FT . Otherwise said, conditionally given FT and {T < ∞}, the process (BT +t , t ≥ 0) is an (FT +t )-Brownian motion
started at BT .
Proof. Suppose first that T < ∞ a.s. Let A ∈ FT , and consider times t1 < t2 < . . . < tk .
We want to show that for every bounded continuous function F on (Rd )k ,
\[
E\bigl[\mathbf 1_A F(B^{(T)}_{t_1}, \dots, B^{(T)}_{t_k})\bigr] = P(A)\, E[F(B_{t_1}, \dots, B_{t_k})] . \qquad (6.1)
\]
Indeed, taking A = Ω entails that B^{(T)} is a Brownian motion, while letting A vary in FT entails the independence of (B^{(T)}_{t1} , . . . , B^{(T)}_{tk}) and FT for every t1 , . . . , tk , hence of B^{(T)} and FT .
Now, suppose first that T takes its values in a countable subset E of R+ . Then
\begin{align*}
E\bigl[\mathbf 1_A F(B^{(T)}_{t_1}, \dots, B^{(T)}_{t_k})\bigr] &= \sum_{s\in E} E\bigl[\mathbf 1_{A\cap\{T=s\}} F(B^{(s)}_{t_1}, \dots, B^{(s)}_{t_k})\bigr] \\
&= \sum_{s\in E} P(A\cap\{T=s\})\, E[F(B_{t_1}, \dots, B_{t_k})] ,
\end{align*}
where we used the simple Markov property and the fact that A ∩ {T = s} ∈ Fs by
definition. Back to the general case, we can apply this result to the stopping time Tn =
2−n d2n T e. Since Tn ≥ T , it holds that FT ⊂ FTn so that we obtain for A ∈ FT
\[
E\bigl[\mathbf 1_A F(B^{(T_n)}_{t_1}, \dots, B^{(T_n)}_{t_k})\bigr] = P(A)\, E[F(B_{t_1}, \dots, B_{t_k})] . \qquad (6.2)
\]
Now, by a.s. continuity of B, it holds that B^{(Tn)}_t converges a.s. to B^{(T)}_t as n → ∞, for every t ≥ 0. Since F is bounded, the dominated convergence theorem allows us to pass to
the limit in (6.2), obtaining (6.1).
Finally, if P (T = ∞) > 0, check that (6.1) remains true when replacing A by A∩{T <
∞}, and divide by P ({T < ∞}).
An important example of application of the strong Markov property is the so-called
reflection principle. Recall that St = sup0≤s≤t Bs .
Theorem 6.3.2 (Reflection principle) Let (Bt , t ≥ 0) be an (Ft )-Brownian motion
started at 0, and T be an (Ft )-stopping time. Then, the process
\[
\widetilde B_t = B_t\, \mathbf 1_{\{t\le T\}} + (2B_T - B_t)\, \mathbf 1_{\{t>T\}} , \qquad t \ge 0
\]
is also an (Ft )-Brownian motion started at 0.
Proof. By the strong Markov property, the processes (Bt , 0 ≤ t ≤ T ) and B (T ) are
independent. Moreover, B (T ) is a standard Brownian motion, and hence has same law
as −B (T ) . Therefore, the pair ((Bt , 0 ≤ t ≤ T ), B (T ) ) has same law as ((Bt , 0 ≤ t ≤
T ), −B (T ) ). On the other hand, the trajectory of B is a measurable function G((Bt , 0 ≤ t ≤ T ), B (T ) ) of this pair, where G(X, Y ) is the concatenation of the paths X, Y . The conclusion follows from the fact that G((Bt , 0 ≤ t ≤ T ), −B (T ) ) = B̃.
Corollary 6.3.1 (Sometimes also called the reflection principle) Let 0 < b and
a ≤ b, then for every t ≥ 0,
P (St ≥ b, Bt ≤ a) = P (Bt ≥ 2b − a).
Proof. Let Tx = inf{t ≥ 0 : Bt ≥ x} be the first entrance time of Bt in [x, ∞) for
x > 0. Then Tx is an (FtB )-stopping time for every x by (i), Proposition 4.1.1. Notice
that Tx < ∞ a.s. since S∞ = ∞ a.s., where S∞ = limt→∞ St .
Now by continuity of B, BTx = x for every x. By the reflection principle applied to T = Tb , we obtain (with the definition of B̃ given in the statement of the reflection principle)
\[
P(S_t \ge b, B_t \le a) = P(T_b \le t,\ 2b - B_t \ge 2b-a) = P(T_b \le t,\ \widetilde B_t \ge 2b-a) ,
\]
since 2b − Bt = B̃t as soon as t ≥ Tb . On the other hand, the event {B̃t ≥ 2b − a} is contained in {Tb ≤ t} since 2b − a ≥ b. Therefore, we obtain P (St ≥ b, Bt ≤ a) = P (B̃t ≥ 2b − a), and the result follows since B̃ is a Brownian motion.
Notice also that the probability under consideration is equal to P (St > b, Bt < a) =
P (Bt > 2b − a), i.e. the inequalities can be strict or not. Indeed, for the right-hand side,
this is due to the fact that the distribution of Bt is non-atomic, and for the left-hand side,
this boils down to showing that for every x,
Tx = inf{t ≥ 0 : Bt > x} , a.s.,
which is a straightforward consequence of the strong Markov property at time Tx , combined with Proposition 6.2.2.
Corollary 6.3.2 The random variable St has the same law as |Bt |, for every fixed t ≥ 0.
Moreover, for every x > 0, the random time Tx has same law as (x/B1 )2 .
Proof. As a ↑ b, the probability P (St ≥ b, Bt ≤ a) converges to P (St ≥ b, Bt ≤ b), and
this is equal to P (Bt ≥ b) by Corollary 6.3.1. Therefore,
P (St ≥ b) = P (St ≥ b, Bt ≤ b) + P (Bt ≥ b) = 2P (Bt ≥ b) = P (|Bt | ≥ b),
because {Bt ≥ b} ⊂ {St ≥ b}, and this gives the result. We leave the computation of the
distribution of Tx as an exercise.
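A Monte Carlo sketch of Corollary 6.3.2 (my own illustration with arbitrary parameters, not part of the notes): the discretized running supremum slightly underestimates S_t, but the identity P(S_t ≥ b) = P(|B_t| ≥ b) = 2P(B_t ≥ b) is clearly visible.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(7)
n_paths, n_steps, t, b = 10000, 1000, 1.0, 1.0
dt = t / n_steps

paths = np.cumsum(rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt), axis=1)
S_t = paths.max(axis=1)                  # discretized running supremum up to time t
B_t = paths[:, -1]

print(np.mean(S_t >= b))                 # ~ P(S_t >= b)  (slightly low: discrete sup)
print(np.mean(np.abs(B_t) >= b))         # ~ P(|B_t| >= b)
print(2 * (1 - 0.5 * (1 + erf(b / sqrt(2 * t)))))   # exact 2 P(B_t >= b) ~ 0.3173
```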
6.4 Some martingales associated to Brownian motion
One of the nice features of Brownian motion is that there is a tremendous number of martingales associated with it.
Proposition 6.4.1 Let (Bt , t ≥ 0) be an (Ft )-Brownian motion.
(i) If d = 1 and B0 ∈ L1 , the process (Bt , t ≥ 0) is a (Ft )-martingale.
(ii) If d = 1 and B0 ∈ L2 , the process (Bt2 − t, t ≥ 0) is a (Ft )-martingale.
(iii) In any dimension, let u = (u1 , . . . , ud ) ∈ Cd . If E[exp(⟨u, B0 ⟩)] < ∞, the process M = (exp(⟨u, Bt ⟩ − tu²/2), t ≥ 0) is also a (Ft )-martingale, where u² is a notation for ∑_{i=1}^d u_i² .
Notice that in (iii), we are dealing with C-valued processes. The conditional expectation E[X|G] of a random variable X ∈ L1 (C) is defined as E[ℜX|G] + iE[ℑX|G], and we say that an integrable process (Xt , t ≥ 0) with values in C, adapted to a filtration (Ft ), is a martingale if its real and imaginary parts are. Notice that the hypothesis on B0 in (iii) is automatically satisfied whenever u = iv with v ∈ Rd is purely imaginary.
Proof. (i) If s ≤ t, E[Bt − Bs |Fs ] = E[B^{(s)}_{t−s}] = 0, where B^{(s)}_u = Bu+s − Bs has mean 0
and is independent of Fs , by the simple Markov property. The integrability of the process
is obvious by hypothesis on B0 .
(ii) Integrability is an easy exercise using that Bt − B0 is independent of B0 . We have,
for s ≤ t, Bt2 = (Bt − Bs )2 + 2Bs (Bt − Bs ) + Bs2 . Taking conditional expectation given Fs
and using the simple Markov property gives that E[Bt² |Fs ] = (t − s) + Bs² , hence the result.
(iii) Integrability comes from the fact that E[exp(λBt )] = exp(tλ²/2) whenever B is a standard one-dimensional Brownian motion, and the fact that
\[
E[\exp(\langle u, B_t\rangle)] = E[\exp(\langle u, B_t - B_0 + B_0\rangle)] = E[\exp(\langle u, B_t - B_0\rangle)]\, E[\exp(\langle u, B_0\rangle)] < \infty .
\]
For s ≤ t, Mt = exp(⟨u, Bt − Bs ⟩ + ⟨u, Bs ⟩ − tu²/2). We use the Markov property again, and the fact that E[exp(i⟨v, Bt − Bs ⟩)] = exp(−(t − s)|v|²/2) for v ∈ Rd , which is the characteristic function of the Gaussian law with mean 0 and covariance matrix (t − s)Id .
From this, one can show that
Proposition 6.4.2 Let (Bt , t ≥ 0) be a standard Brownian motion and Tx = inf{t ≥ 0 :
Bt = x}. Then for x, y > 0, one has
\[
P(T_{-y} < T_x) = \frac{x}{x+y} , \qquad E[T_x \wedge T_{-y}] = xy .
\]
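Both identities can be checked by simulation; the following is my own sketch (not part of the notes), using a symmetric random walk with spatial step h and time step h², which is a standard discrete approximation of Brownian motion.

```python
import numpy as np

rng = np.random.default_rng(8)

def exit_stats(x, y, h=0.05, n_paths=2000):
    """Symmetric walk with steps +-1 on a grid of spacing h (time step h^2),
    run until it leaves (-y, x); x and y are assumed to be multiples of h."""
    up, low = round(x / h), round(y / h)
    hit_low, times = 0, np.empty(n_paths)
    for i in range(n_paths):
        pos, steps = 0, 0
        while -low < pos < up:
            pos += 1 if rng.random() < 0.5 else -1
            steps += 1
        hit_low += (pos == -low)
        times[i] = steps * h * h
    return hit_low / n_paths, times.mean()

x, y = 1.0, 2.0
p, m = exit_stats(x, y)
print(p, "~", x / (x + y))   # ~ 1/3
print(m, "~", x * y)         # ~ 2
```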
Proposition 6.4.3 Let (Bt , t ≥ 0) be a (Ft )-Brownian motion. Let f (t, x) : R+ ×Rd → C
be continuously differentiable in the variable t and twice continuously differentiable in x,
and suppose that f and its derivatives of all orders are bounded. Then,
\[
M_t = f(t, B_t) - f(0, B_0) - \int_0^t \Bigl( \frac{\partial}{\partial t} + \frac{1}{2}\Delta \Bigr) f(s, B_s)\, ds , \qquad t \ge 0
\]
is a (Ft )-martingale, where ∆ = ∑_{i=1}^d ∂²/∂x_i² is the Laplacian operator acting on the spatial coordinate of f .
This is the first symptom of the famous Itō formula, which says what this martingale
actually is.
Proof. Integrability is trivial from the boundedness of f , as well as adaptedness, since Mt
is a function of (Bs , 0 ≤ s ≤ t). Let s, t ≥ 0. We estimate
\[
E[M_{t+s} \mid \mathcal F_t] = M_t + E\Bigl[ f(t+s, B_{t+s}) - f(t, B_t) - \int_t^{t+s} \Bigl(\frac{\partial}{\partial t} + \frac{\Delta}{2}\Bigr) f(u, B_u)\, du \,\Big|\, \mathcal F_t \Bigr] .
\]
On the one hand, E[f (t + s, Bt+s ) − f (t, Bt )|Ft ] = E[f (t + s, Bt+s − Bt + Bt )|Ft ] − f (t, Bt ),
and since Bt+s − Bt is independent of Ft with law N (0, s), using Proposition 1.3.4, this
is equal to
\[
\int_{\mathbb R^d} f(t+s, B_t + x)\, p(s, x)\, dx - f(t, B_t) , \qquad (6.3)
\]
where p(s, x) = (2πs)−d/2 exp(−|x|2 /(2s)) is the probability density function for N (0, s).
On the other hand, if we let L = ∂/∂t + ∆/2,
\[
E\Bigl[ \int_t^{t+s} Lf(u, B_u)\, du \,\Big|\, \mathcal F_t \Bigr] = E\Bigl[ \int_0^s Lf(u+t,\ B_{t+u} - B_t + B_t)\, du \,\Big|\, \mathcal F_t \Bigr] .
\]
This expression is of the form E[F ((B^{(t)} , Bt ))|Ft ], where F is measurable and B^{(t)}_s = Bt+s − Bt , s ≥ 0, is independent of Bt by the simple Markov property, and has law W0 (dw), the Wiener measure.
W0 (dw), the Wiener measure. If (Xt , t ≥ 0) is the canonical process Xt (w) = wt , then
this last expression rewrites, by Proposition 1.3.4,
\begin{align*}
E\Bigl[ \int_t^{t+s} Lf(u, B_u)\, du \,\Big|\, \mathcal F_t \Bigr]
&= \int_{\Omega_W} W_0(dw) \int_0^s du\, Lf(u+t, X_u(w) + B_t) \\
&= \int_0^s du \int_{\Omega_W} W_0(dw)\, L\tilde f(u, X_u(w) + B_t) \\
&= \int_0^s du \int_{\mathbb R^d} dx\, p(u, x)\, L\tilde f(u, x + B_t) ,
\end{align*}
where f̃(s, x) = f (s + t, x), and we made use of Fubini’s theorem. Next, the boundedness of Lf̃ entails that this is equal to
\[
\lim_{\varepsilon \downarrow 0} \int_\varepsilon^{s} du \int_{\mathbb R^d} dx\, p(u, x)\, L\tilde f(u, x + B_t) .
\]
From the expression for L, we can split this into two parts. Using integration by parts,
\begin{align*}
\int_\varepsilon^{s} du \int_{\mathbb R^d} dx\, p(u, x)\, \frac{\partial \tilde f}{\partial t}(u, x + B_t)
&= \int_{\mathbb R^d} dx\, p(s, x)\tilde f(s, x + B_t) - \int_{\mathbb R^d} dx\, p(\varepsilon, x)\tilde f(\varepsilon, x + B_t) \\
&\quad - \int_{\mathbb R^d} dx \int_\varepsilon^{s} du\, \frac{\partial p}{\partial t}(u, x)\, f(t + u, x + B_t) .
\end{align*}
Similarly, integrating by parts twice yields
\[
\int_\varepsilon^{s} du \int_{\mathbb R^d} dx\, p(u, x)\, \frac{\Delta}{2} f(u + t, x + B_t) = \int_\varepsilon^{s} du \int_{\mathbb R^d} dx\, \frac{\Delta}{2} p(u, x)\, f(u + t, x + B_t) .
\]
Now, p(t, x) satisfies the heat equation (∂t − ∆/2)p = 0. Therefore, the integral terms
cancel each other, and it remains
\[
E\Bigl[ \int_t^{t+s} Lf(u, B_u)\, du \,\Big|\, \mathcal F_t \Bigr] = \int_{\mathbb R^d} dx\, p(s, x)\tilde f(s, x + B_t) - \lim_{\varepsilon \downarrow 0} \int_{\mathbb R^d} dx\, p(\varepsilon, x)\tilde f(\varepsilon, x + B_t) ,
\]
which by dominated convergence is exactly (6.3). This shows that E[Mt+s − Mt |Ft ] = 0.
6.5 Recurrence and transience properties of Brownian motion
From this section on, we are going to introduce a bit of extra notation. We will suppose
that the reference measurable space (Ω, F) on which (Bt , t ≥ 0) is defined is endowed with
probability measures Px , x ∈ Rd such that under Px , (Bt −x, t ≥ 0) is a standard Brownian
motion. A possibility is to choose the Wiener space and endow it with the measures Wx ,
so that the canonical process (Xt , t ≥ 0) is a Brownian motion started at x under Wx .
We let Ex be the expectation associated with Px . In the sequel, B(x, r), B(x, r) will
respectively denote the open and closed Euclidean balls with center x and radius r, in Rd
for some d ≥ 1.
Theorem 6.5.1 (i) If d = 1, Brownian motion is point-recurrent in the sense that under
P0 (or any Py , y ∈ R),
a.s., {t ≥ 0 : Bt = x} is unbounded for every x ∈ R.
(ii) If d = 2, Brownian motion is neighborhood-recurrent, in the sense that for every x ∈ R2 , under Px ,
a.s., {t ≥ 0 : |Bt | ≤ ε} is unbounded for every ε > 0.
However, points are polar, in the sense that for every x ∈ R2 ,
P0 (Hx = ∞) = 1 , where Hx = inf{t > 0 : Bt = x}
is the hitting time of x.
(iii) If d ≥ 3, Brownian motion is transient, in the sense that a.s. under P0 , |Bt | → ∞
as t → ∞.
Proof. (i) is a consequence of (ii) in Proposition 6.2.2.
For (ii), let 0 < ε < R be real numbers and f be a C ∞ function, which is bounded with
all its derivatives, and that coincides with x 7→ log |x| on Dε,R = {x ∈ R2 : ε ≤ |x| ≤ R}.
Then one can check that ∆f = 0 on the interior of Dε,R , and therefore, if we let S =
inf{t ≥ 0 : |Bt | = ε} and T = inf{t ≥ 0 : |Bt | = R}, then S, T, H = S ∧ T are stopping
times, and from Proposition 6.4.3, the stopped process (log |Bt∧H |, t ≥ 0) is a (bounded)
martingale. If ε < |x| < R, we thus obtain that Ex [log |BH |] = log |x|. Since H ≤ T < ∞
a.s. (Brownian motion is unbounded a.s.), and since |BS | = ε, |BT | = R on the event that
S < ∞, T < ∞, the left-hand side is (log ε)Px (S < T ) + (log R)Px (S > T ). Therefore,
\[
P_x(S < T) = \frac{\log R - \log|x|}{\log R - \log\varepsilon} . \qquad (6.4)
\]
Letting ε → 0 shows that the probability of hitting 0 before hitting the boundary of the
ball with radius R is 0, and therefore, letting R → ∞, the probability of hitting 0 (starting
from x ≠ 0) is 0. The announced result (for x ≠ 0) is then obtained by translation. We thus have that P0 (Hx < ∞) = 0 for every x ≠ 0. Next, we have, for every a > 0,
P0 (∃t ≥ a : Bt = 0) = P0 (∃s ≥ 0 : Bs+a − Ba + Ba = 0),
and the Markov property at time a shows that this is
\begin{align*}
P_0(\exists t \ge a : B_t = 0) &= \int_{\mathbb R^2} P_0(B_a \in dy)\, P_0(\exists s \ge 0 : B_s + y = 0) \\
&= \int_{\mathbb R^2} P_0(B_a \in dy)\, P_y(\exists s \ge 0 : B_s = 0) = 0 ,
\end{align*}
because the law of Ba under P0 is a Gaussian law that does not charge the point 0 (we
have been using the notation P (X ∈ dx) for the law of the random variable X).
On the other hand, letting R → ∞ first in (6.4), we get that the probability of
hitting the ball with center 0 and radius ε is 1 for every ε, starting from any point:
Px (∃t ≥ 0 : |Bt | ≤ ε) = 1. Thus, for every n ∈ Z+ , a similar application of the Markov property at time n gives
\[
P_x(\exists t \ge n : |B_t| \le \varepsilon) = \int_{\mathbb R^2} P_x(B_n \in dy)\, P_y(\exists t \ge 0 : |B_t| \le \varepsilon) = 1 .
\]
Hence the result.
For (iii), since the first three components of a Brownian motion in Rd form a Brownian
motion, it is clearly sufficient to treat the case d = 3. So assume d = 3. Let f be a C ∞
function with all derivatives that are bounded, and that coincides with x 7→ 1/|x| on Dε,R ,
which is defined as previously but for d = 3. Then ∆f = 0 on the interior of Dε,R , and
the same argument as above shows that for x ∈ Dε,R , defining S, T as above,
\[
P_x(S < T) = \frac{|x|^{-1} - R^{-1}}{\varepsilon^{-1} - R^{-1}} .
\]
This converges to ε/|x| as R → ∞, which is thus the probability of ever visiting B(0, ε)
when starting from x (with |x| ≥ ε). Define two sequences of stopping times Sk , Tk , k ≥ 1
by S1 = inf{t ≥ 0 : |Bt | ≤ ε}, and
\[
T_k = \inf\{t \ge S_k : |B_t| \ge 2\varepsilon\} , \qquad S_{k+1} = \inf\{t \ge T_k : |B_t| \le \varepsilon\} .
\]
If Sk is finite, we get that Tk is also finite, because Brownian motion is an a.s. unbounded
process, so {Sk < ∞} = {Tk < ∞} up to a zero-probability event. The strong Markov
property at time Tk gives
\begin{align*}
P_x(S_{k+1} < \infty \mid S_k < \infty) &= P_x(S_{k+1} < \infty \mid T_k < \infty) \\
&= P_x\bigl(\exists s \ge T_k : |B_s - B_{T_k} + B_{T_k}| \le \varepsilon \,\big|\, T_k < \infty\bigr) \\
&= \int_{\mathbb R^3} P_x(B_{T_k} \in dy \mid T_k < \infty)\, P_y(\exists s \ge 0 : |B_s| \le \varepsilon) ,
\end{align*}
where Px (BTk ∈ dy|Tk < ∞) is the law of BTk under the probability measure Px (A|Tk <
∞), A ∈ F. Since |BTk | = 2ε on the event {Tk < ∞}, we have that the last probability
is ε/|y| = 1/2. Finally, we obtain by induction that Px (Sk < ∞) ≤ Px (S1 < ∞)2−k+1 ,
and the Borel-Cantelli lemma entails that a.s., Sk = ∞ for some k. Therefore, Brownian
motion in dimension 3 a.s. eventually leaves the ball of radius ε for good, and letting
ε = n → ∞ along Z+ gives the result.
Remark. If B(x, ε) is the Euclidean ball of center x and radius ε, notice that the property
of (ii) implies the fact that {t ≥ 0 : Bt ∈ B(x, ε)} is unbounded for every x ∈ R2 and every
ε > 0, almost surely (indeed, one can cover R2 by a countable union of balls of a fixed
radius). In particular, the trajectory of a 2-dimensional Brownian motion is everywhere
dense. On the other hand, it will a.s. never hit a fixed countable family of points (except
maybe at time 0), like the points with rational coordinates!
6.6 Brownian motion and the Dirichlet problem
Let D be a connected open subset of Rd for some d ≥ 2. We will say that D is a domain.
Let ∂D be the boundary of D. We denote by ∆ the Laplacian on Rd . Suppose given
a measurable function g : ∂D → R. A solution of the Dirichlet problem with boundary
condition g on D is a function u : D̄ → R of class C²(D) ∩ C(D̄), such that
\[
\Delta u = 0 \ \text{ on } D , \qquad u|_{\partial D} = g . \qquad (6.5)
\]
A solution of the Dirichlet problem is the mathematical counterpart of the following
physical problem: given an object made of homogeneous material, such that the temperature g(y) is imposed at point y of its boundary, the solution u(x) of the Dirichlet problem
gives the temperature at the point x in the object when equilibrium is attained.
As we will see, it is possible to give a probabilistic resolution of the Dirichlet problem
with the help of Brownian motion. This is essentially due to Kakutani. We let Ex be the expectation associated with the law Px of Brownian motion in Rd started at x. In the remainder of this section, let T = inf{t ≥ 0 : Bt ∉ D} be the first exit time from D. It is a stopping time, as it is the first entrance time in the closed set Dc . We will assume that the domain D is such that Px (T < ∞) = 1 for every x, to avoid complications. Hence BT is a well-defined random variable.
In the sequel, | · | is the euclidean norm on Rd . The goal of this section is to prove the
following
Theorem 6.6.1 Suppose that g ∈ C(∂D, R) is bounded, and assume that D satisfies a
local exterior cone condition (l.e.c.c.), i.e. for every y ∈ ∂D, there exists a nonempty open convex cone C with origin at y such that C ∩ B(y, r) ⊂ Dc for some r > 0. Then the
function
u : x 7→ Ex [g(BT )]
is the unique bounded solution of the Dirichlet problem (6.5).
In particular, if D is bounded and satisfies the l.e.c.c., then u is the unique solution
of the Dirichlet problem.
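The probabilistic formula lends itself to a simple Monte Carlo sketch (my own illustration, not part of the notes; the domain, boundary data, and step size are arbitrary choices, and the exit time is only approximated by a discretized path): on the unit disc with g(y) = y1, the harmonic extension is u(x) = x1, which the simulation should reproduce up to discretization and sampling error.

```python
import numpy as np

rng = np.random.default_rng(9)

def dirichlet_mc(x, g, dt=1e-3, n_paths=2000):
    """Estimate u(x) = E_x[g(B_T)] for D = open unit disc in R^2 by running
    discretized Brownian paths from x until they leave the disc."""
    x = np.asarray(x, dtype=float)
    total = 0.0
    for _ in range(n_paths):
        pos = x.copy()
        while pos @ pos < 1.0:
            pos += np.sqrt(dt) * rng.standard_normal(2)
        total += g(pos / np.sqrt(pos @ pos))   # project the small overshoot onto the circle
    return total / n_paths

g = lambda y: y[0]                   # boundary data g(y) = y_1
x = np.array([0.3, 0.4])
print(dirichlet_mc(x, g), "~", x[0]) # harmonic extension of y_1 is u(x) = x_1 = 0.3
```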
We start with a uniqueness statement.
Proposition 6.6.1 Let g be a bounded function in C(∂D, R). Set
u(x) = Ex [g(BT )] .
If v is a bounded solution of the Dirichlet problem, then v = u.
In particular, we obtain uniqueness when D is bounded. Notice that we do not make
any assumption on the regularity of D here besides the fact that T < ∞ a.s.
Proof. Let v be a bounded solution of the Dirichlet problem. For every N ≥ 1, introduce
the reduced set DN = {x ∈ D : |x| < N and d(x, ∂D) > 1/N }. Notice it is an open set,
but which need not be connected. We let TN be the first exit time of DN . By Proposition
6.4.3, the process
\[
M_t = \tilde v_N(B_t) - \tilde v_N(B_0) - \int_0^t \frac{1}{2}\Delta \tilde v_N(B_s)\, ds , \qquad t \ge 0
\]
is a martingale, where ṽN is a C² function that coincides with v on DN and is bounded together with all its partial derivatives (this may look innocent, but the fact that such a function exists is highly non-trivial; the use of such a function could be avoided by a stopped analogue of Proposition 6.4.3). Moreover, the martingale stopped at TN is M_{t∧TN} = v(B_{t∧TN}) − v(B0 ), because ∆v = 0 inside D, and it is bounded (because v is bounded), hence uniformly integrable. By optional stopping at TN , we get that
for every x ∈ DN ,
0 = Ex [MTN ] = Ex [v(BTN )] − v(x).     (6.6)
Now, as N → ∞, BTN converges to BT a.s. by continuity of paths and the fact that
T < ∞ a.s. Since v is bounded, we can use dominated convergence as N → ∞, and get
that for every x ∈ D,
v(x) = Ex [v(BT )] = Ex [g(BT )],
hence the result.
For every x ∈ Rd and r > 0, let σx,r be the uniform probability measure on the sphere
Sx,r = {y ∈ Rd : |y − x| = r}. It is the unique probability measure on Sx,r that is invariant
under isometries of Sx,r . We say that a locally bounded measurable function h : D → R
is harmonic on D if for every x ∈ D and every r > 0 such that the closed ball B(x, r)
with center x and radius r is contained in D,
h(x) = ∫_{Sx,r} h(y) σx,r (dy).
Proposition 6.6.2 Let h be harmonic on a domain D. Then h ∈ C ∞ (D, R), and ∆h = 0
on D.
Proof. Let x ∈ D and ε > 0 be such that B(x, ε) ⊂ D. Then let ϕ : R+ → R+ be a C^∞ function, not identically zero, with compact support contained in [0, ε). We have, for 0 < r < ε,

h(x) = ∫_{S0,r} h(x + y) σ0,r (dy).
Multiplying by ϕ(r) r^{d−1} and integrating in r gives

c h(x) = ∫_{R^d} ϕ(|z|) h(x + z) dz,

where c > 0 is some constant, and where we have used the fact that

∫_{R^d} f (x) dx = C ∫_{R+} r^{d−1} dr ∫_{S0,r} f (y) σ0,r (dy)

for some C > 0. Therefore, c h(x) = ∫_{R^d} ϕ(|z − x|) h(z) dz, and by differentiation under the integral sign we easily get that h is C^∞.
Next, by translation we may suppose that 0 ∈ D and show only that ∆h(0) = 0. We may apply Taylor's formula to h, obtaining, as x → 0,

h(x) = h(0) + ⟨∇h(0), x⟩ + (1/2) Σ_{i=1}^d x_i² ∂²h/∂x_i²(0) + (1/2) Σ_{i≠j} x_i x_j ∂²h/∂x_i∂x_j(0) + o(|x|²).

Now, integration over S0,r for r small enough yields

∫_{S0,r} h(x) σ0,r (dx) = h(0) + Cr ∆h(0) + o(r²),

where Cr = (1/2) ∫_{S0,r} x_1² σ0,r (dx), as the reader may check that all the other integrals up to the second order are 0, by symmetry. Since the left-hand side is h(0), we obtain ∆h(0) = 0.
Therefore, harmonic functions are solutions of certain Dirichlet problems.
Proposition 6.6.3 Let g be a bounded measurable function on ∂D, and let T = inf{t ≥ 0 : Bt ∉ D}. Then the function h : x ∈ D ↦ Ex [g(BT )] is harmonic on D, and hence ∆h = 0 on D.
Proof. For every Borel subsets A1 , . . . , Ak of R^d and times t1 < . . . < tk , the map

x ↦ Px (Bt1 ∈ A1 , . . . , Btk ∈ Ak )

is measurable by Fubini's theorem, once one has written the explicit formula for this probability. Therefore, by the monotone class theorem, x ↦ Ex [F ] is measurable for every integrable random variable F which is measurable with respect to the product σ-algebra on C(R+ , R^d ). Moreover, h is bounded by assumption.
Now, let S = inf{t ≥ 0 : |Bt − x| ≥ r} be the first exit time of B from the ball of center x and radius r. Then by (ii) of Proposition 6.2.2, S < ∞ a.s. By the strong Markov property, B̃ = (B_{S+t} , t ≥ 0) is an (F_{S+t}) Brownian motion started at BS . Moreover, the first hitting time of ∂D for B̃ is T̃ = T − S, and B̃_{T̃} = BT , so that

Ex [g(BT )] = Ex [g(B̃_{T̃})] = ∫_{R^d} Px (BS ∈ dy) Ey [g(BT ) 1_{T <∞} ],

and we recognize ∫ Px (BS ∈ dy) h(y) in the last expression.
Since B starts from x under Px , the rotation invariance of Brownian motion shows
that BS − x has a distribution on the sphere of center 0 and radius r which is invariant
under the orthogonal group, so we conclude that the distribution of BS is the uniform
measure on the sphere of center x and radius r, and therefore that h is harmonic on D.
It remains to understand whether the function u of Theorem 6.6.1 is actually a solution
of the Dirichlet problem. Indeed, it is not the case in general that u(x) has limit g(y) as
x ∈ D, x → y, and the reason is that some points of ∂D may be ‘invisible’ to Brownian
motion. The reader can convince himself, for example, that if D = B(0, 1)\{0} is the open
ball of R2 with center 0 and radius 1, whose origin has been removed, and if g = 1{0} , then
no solution of the Dirichlet problem with boundary constraint g exists. The probabilistic
reason for that is that Brownian motion does not see the boundary point 0. This is the
reason why we have to make regularity assumptions on D in the following theorem.
Proof of Theorem 6.6.1.
It remains to prove that under the l.e.c.c., u is continuous on D, i.e. u(x) converges to
g(y) as x ∈ D converges to y ∈ ∂D. In order to do that, we need a preliminary lemma.
Recall that T is the first exit time of D for the Brownian path.
Lemma 6.6.1 Let D be a domain satisfying the l.e.c.c., and let y ∈ ∂D. Then for every
η > 0, Px (T < η) → 1 as x ∈ D → y.
Proof. Let Cy = y + C be a nonempty open convex cone with origin at y such that Cy ⊂ D^c (we leave as an exercise the case where only a neighborhood of y in this cone is contained in D^c ). It is an elementary geometrical fact that for every ε > 0 small enough, there exist δ > 0 and a nonempty open convex cone C′ with origin at 0, such that x + (C′ \ B(0, ε)) ⊆ Cy for every x ∈ B(y, δ). Now by (iii) in Proposition 6.2.2, if H^ε_{C′} = inf{t > 0 : Bt ∈ C′ \ B(0, ε)}, then P0 (H^ε_{C′} < η) ↑ P0 (H_{C′} < η) = 1 as ε ↓ 0.
Since hitting x + (C′ \ B(0, ε)) implies hitting Cy and therefore leaving D, we obtain, after translating by x, that for every η, ε′ > 0, Px (T < η) can be made ≥ 1 − ε′ for x belonging to a sufficiently small δ-neighborhood of y in D.
We can now finish the proof of Theorem 6.6.1. Let y ∈ ∂D; we want to estimate the quantity Ex [g(BT )] − g(y). For η, δ > 0, let

A_{δ,η} = { sup_{0≤t≤η} |Bt − x| ≥ δ/2 }.
This event decreases to ∅ as η ↓ 0 because B has continuous paths. Now, for any δ, η > 0,
Ex [|g(BT ) − g(y)|] = Ex [|g(BT ) − g(y)| ; {T ≤ η} ∩ Acδ,η ]
+Ex [|g(BT ) − g(y)| ; {T ≤ η} ∩ Aδ,η ]
+Ex [|g(BT ) − g(y)| ; {T ≥ η}]
Fix ε > 0. We are going to show that each of these three quantities can be made < ε/3
for x close enough to y. Since g is continuous at y, for some δ > 0, |y − z| < δ with
y, z ∈ ∂D implies |g(y) − g(z)| < ε/3. Moreover, on the event {T ≤ η} ∩ Acδ,η , we know
that |BT − x| < δ/2, and thus |BT − y| ≤ δ as soon as |x − y| ≤ δ/2. Therefore, for every
η > 0, the first quantity is less than ε/3 for x ∈ B(y, δ/2).
Next, if M is an upper bound for |g|, the second quantity is bounded by 2M P (Aδ,η ).
Hence, by now choosing η small enough, this is < ε/3.
Finally, with δ, η fixed as above, the third quantity is bounded by 2M Px (T ≥ η). By
the previous lemma, this is < ε/3 as soon as x ∈ B(y, α) ∩ D for some α > 0. Therefore,
for any x ∈ B(y, α ∧ δ/2) ∩ D, |u(x) − g(y)| < ε. This entails the result.
Corollary 6.6.1 A function u : D → R is harmonic in D if and only if it is in C²(D, R) and satisfies ∆u = 0.
Proof. Let u be of class C²(D) with zero Laplacian, and let x ∈ D. Let ε > 0 be such that
B(x, ε) ⊆ D, and notice that u|B(x,ε) is a solution of the Dirichlet problem on B(x, ε) with
boundary values u|∂B(x,ε) . Then B(x, ε) satisfies the l.e.c.c., so that u|B(x,ε) is the unique
such solution, which is also given by the harmonic function of Theorem 6.6.1. Therefore,
u is harmonic on D.
6.7 Donsker's invariance principle
The following theorem completes the description of Brownian motion as a ‘limit’ of centered random walks as depicted in the beginning of the chapter, and strengthens the convergence of finite-dimensional marginals into convergence in distribution. We endow C([0, 1], R) with the supremum norm, and recall (see the exercises on continuous-time
processes) that the product σ-algebra associated with it coincides with the Borel σ-algebra
associated with this norm. We say that a function F : C([0, 1]) → R is continuous if it is
continuous with respect to this norm.
Theorem 6.7.1 (Donsker’s invariance principle) Let (Xn , n ≥ 1) be a sequence of
R-valued integrable independent random variables with common law µ, such that
∫ x µ(dx) = 0   and   ∫ x² µ(dx) = σ² ∈ (0, ∞).
Let S0 = 0 and Sn = X1 + . . . + Xn , and define a continuous process that interpolates
linearly between values of S, namely
St = (1 − {t}) S_{[t]} + {t} S_{[t]+1} ,     t ≥ 0,
where [t] denotes the integer part of t and {t} = t − [t]. Then S^{[N]} := ((σ²N)^{−1/2} S_{Nt} , 0 ≤ t ≤ 1) converges in distribution to a standard Brownian motion between times 0 and 1, i.e. for every bounded continuous function F : C([0, 1]) → R,

E[F (S^{[N]} )] → E0 [F (B)]   as N → ∞.
Notice that this is much stronger than what Proposition 6.1.1 says. Despite the slight difference of framework between these two results (one uses a càdlàg continuous-time version of the random walk, and the other an interpolated continuous version), Donsker's invariance principle is stronger. For instance, one can infer from this theorem that the random variable (σ²N)^{−1/2} sup_{0≤n≤N} Sn converges to sup_{0≤t≤1} Bt in distribution, because f ↦ sup f is a continuous operation on C([0, 1], R). Proposition 6.1.1 would be powerless to address this issue.
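To make the continuous-mapping consequence above concrete, the following sketch (mine, not from the notes) compares the empirical distribution of the rescaled maximum (σ²N)^{−1/2} max_{0≤n≤N} Sn , for centered ±1 steps (so σ² = 1), with that of sup_{0≤t≤1} Bt , which by the reflection principle has the law of |N (0, 1)|. All numerical parameters are illustrative.

```python
import numpy as np

def rescaled_max(N, n_samples, rng):
    """Sample (sigma^2 N)^(-1/2) * max_{0<=n<=N} S_n for a +/-1 random walk
    (sigma^2 = 1 for this step law)."""
    steps = rng.choice([-1.0, 1.0], size=(n_samples, N))
    S = np.cumsum(steps, axis=1)
    running_max = np.maximum(S.max(axis=1), 0.0)   # include S_0 = 0
    return running_max / np.sqrt(N)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    walk_max = rescaled_max(N=1000, n_samples=5000, rng=rng)
    # sup_{t<=1} B_t has the law of |Z|, Z standard normal (reflection principle)
    brownian_max = np.abs(rng.standard_normal(5000))
    for q in (0.5, 0.9, 0.99):
        print("quantile", q, ":", np.quantile(walk_max, q),
              "vs", np.quantile(brownian_max, q))
```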
The proof we give here is an elegant demonstration that makes use of a coupling of the
random walk with the Brownian motion, called the Skorokhod embedding theorem. It is
however specific to dimension d = 1. Suppose we are given a Brownian motion (Bt , t ≥ 0)
on some probability space (Ω, F, P ).
Let µ+ (dx) = P (X1 ∈ dx)1{x≥0} , µ− (dy) = P (−X1 ∈ dy)1{y≥0} define two nonnegative measures. Assume that (Ω, F, P ) is a rich enough probability space so that we
can further define on it, independently of (Bt , t ≥ 0), a sequence of independent identically
distributed R2 -valued random variables ((Yn , Zn ), n ≥ 1) with distribution
P ((Yn , Zn ) ∈ dxdy) = C(x + y)µ+ (dx)µ− (dy),
where C > 0 is the appropriate normalizing constant that makes this expression a probability measure.
Next, let F0 = σ{(Yn , Zn ), n ≥ 1} and Ft = F0 ∨ F^B_t , so that (Ft , t ≥ 0)
is a filtration such that B is an (Ft )-Brownian motion. We define a sequence of random
times, by T0 = 0, T1 = inf{t ≥ 0 : Bt ∈ {Y1 , −Z1 }}, and recursively,
Tn = inf{t ≥ Tn−1 : Bt − BTn−1 ∈ {Yn , −Zn }}.
By (ii) in Proposition 6.2.2, these times are a.s. finite, and they are stopping times with
respect to the filtration (Ft ). We claim that
Lemma 6.7.1 The sequence (BTn , n ≥ 0) has the same law as (Sn , n ≥ 0). Moreover, the increments (Tn − Tn−1 , n ≥ 1) form an independent sequence of identically distributed random variables, with expectation E[T1 ] = σ².
Proof. By repeated application of the strong Markov property at the times Tn , n ≥ 1, and the fact that the (Yn , Zn ), n ≥ 1, are independent with the same distribution, we obtain that the processes (B_{t+Tn−1} − B_{Tn−1} , 0 ≤ t ≤ Tn − Tn−1 ) are independent with the same distribution. The fact that the differences BTn − BTn−1 , n ≥ 1, and Tn − Tn−1 , n ≥ 1, form sequences of independent and identically distributed random variables follows from this observation.
It therefore remains to check that BT1 has the same law as X1 and that E[T1 ] = σ². Remember from Proposition 6.4.2 that given Y1 , Z1 , the probability that BT1 = Y1 is Z1 /(Y1 + Z1 ), as follows from the optional stopping theorem. Therefore, for every non-negative measurable function f , by first conditioning on (Y1 , Z1 ), we get

E[f (BT1 )] = E[ f (Y1 ) Z1 /(Y1 + Z1 ) + f (−Z1 ) Y1 /(Y1 + Z1 ) ]
= ∫_{R+×R+} C(x + y) µ+ (dx) µ− (dy) ( f (x) y/(x + y) + f (−y) x/(x + y) )
= C′ ∫_{R+} ( f (x) µ+ (dx) + f (−x) µ− (dx) ) = C′ E[f (X1 )],

for C′ = C ∫ x µ+ (dx), which can only be = 1 (by taking f = 1). Here, we have used the fact that ∫ x µ+ (dx) = ∫ x µ− (dx), which amounts to saying that X1 is centered.
For E[T1 ], recall from Proposition 6.4.2 that E[inf{t ≥ 0 : Bt ∈ {x, −y}}] = xy, so by a similar conditioning argument as above,

E[T1 ] = ∫_{R+×R+} C(x + y) xy µ+ (dx) µ− (dy) = σ²,

where we again used that C ∫ x µ+ (dx) = 1.
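The embedding can be visualised by a simulation. In the sketch below (an illustration of mine, not part of the proof) the step law µ is uniform on {−2, −1, 1, 2}, so that µ± are supported on {1, 2}; the pair (Y1 , Z1 ) is drawn from the four possible values with probabilities proportional to C(x + y), and Brownian motion is approximated by an Euler scheme until it hits Y1 or −Z1 . The empirical law of BT1 should be close to µ, and the empirical mean of T1 close to σ² = 5/2, up to discretization bias.

```python
import numpy as np

rng = np.random.default_rng(1)

# step law mu: uniform on {-2,-1,1,2}; mu+ = mu- = (1/4) delta_1 + (1/4) delta_2
support = np.array([1.0, 2.0])
pairs = np.array([(x, y) for x in support for y in support])
probs = np.array([(x + y) * 0.25 * 0.25 for x, y in pairs])
probs /= probs.sum()          # normalization plays the role of the constant C

def embed_once(dt=1e-2):
    """Draw (Y, Z), then run an Euler Brownian path until it leaves (-Z, Y)."""
    Y, Z = pairs[rng.choice(len(pairs), p=probs)]
    b, t = 0.0, 0.0
    while -Z < b < Y:
        b += np.sqrt(dt) * rng.standard_normal()
        t += dt
    # snap to the level that was (approximately) hit
    return (Y if b >= Y else -Z), t

samples = [embed_once() for _ in range(4000)]
hits = np.array([s[0] for s in samples])
times = np.array([s[1] for s in samples])
for v in (-2.0, -1.0, 1.0, 2.0):
    print("P(B_T1 = %+.0f) ~ %.3f (target 0.25)" % (v, np.mean(hits == v)))
print("E[T1] ~ %.3f (target sigma^2 = 2.5)" % times.mean())
```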
Proof of Donsker's invariance principle. We suppose given a Brownian motion B. For N ≥ 1, define B^{(N)}_t = N^{1/2} B_{N^{−1} t} , t ≥ 0, which is a Brownian motion by scaling invariance. Perform the Skorokhod embedding construction on B^{(N)} to obtain variables T^{(N)}_n , n ≥ 0. Then let S^{(N)}_n = B^{(N)}_{T^{(N)}_n}. By Lemma 6.7.1, (S^{(N)}_n , n ≥ 0) is a random walk with the same law as (Sn , n ≥ 0). We interpolate linearly between integers to obtain a continuous process (S^{(N)}_t , t ≥ 0). Finally, let S̃^{(N)}_t = (σ²N)^{−1/2} S^{(N)}_{Nt} , t ≥ 0, and T̃^{(N)}_n = N^{−1} T^{(N)}_n .
We are going to show that the supremum norm of (Bt − S̃^{(N)}_t , 0 ≤ t ≤ 1) converges to 0 in probability.
By the law of large numbers, Tn /n converges a.s. to σ 2 as n → ∞. Thus, by a
monotonicity argument, N −1 sup0≤n≤N |Tn − σ 2 n| converges to 0 a.s. as N → ∞. As a
consequence, this supremum converges to 0 in probability, meaning that for every δ > 0,
P( sup_{0≤n≤N} |T̃^{(N)}_n − n/N| ≥ δ ) → 0   as N → ∞.
On the other hand, for every t ∈ [n/N, (n + 1)/N], there exists some u ∈ [T̃^{(N)}_n , T̃^{(N)}_{n+1}] with Bu = S̃^{(N)}_t , because S̃^{(N)}_{n/N} = B_{T̃^{(N)}_n} for every n and by the intermediate value theorem, S̃^{(N)} and B being continuous. Therefore, the event {sup_{0≤t≤1} |S̃^{(N)}_t − Bt | > ε} is contained in the union K^N ∪ L^N , where

K^N = { sup_{0≤n≤N} |T̃^{(N)}_n − n/N| > δ }

and

L^N = {∃ t ∈ [0, 1], ∃ u ∈ [t − δ − 1/N, t + δ + 1/N] : |Bt − Bu | > ε}.
We already know that P (K^N ) → 0 as N → ∞. For L^N , since B is a.s. uniformly continuous on [0, 1], by taking δ small enough and then N large enough, we can make P (L^N ) as small as wanted. Therefore, we have shown that

P( ‖S̃^{(N)} − B‖_∞ > ε ) → 0   as N → ∞.

Therefore, (S̃^{(N)}_t , 0 ≤ t ≤ 1) converges in probability for the uniform norm to (Bt , 0 ≤ t ≤ 1), which entails convergence in distribution by Proposition 5.2.1. This concludes the proof.
Chapter 7
Poisson random measures and Poisson processes
7.1 Poisson random measures
Let (E, E) be a measurable space, and let µ be a non-negative σ-finite measure on (E, E).
We denote by E∗ the set of σ-finite atomic measures on E, i.e. the set of σ-finite measures taking values in Z+ ∪ {∞} (in fact, we will only consider measures that can be put in the form Σ_{i∈I} δ_{xi} with I countable and xi ∈ E, i ∈ I). The set E∗ is endowed with the product σ-algebra E∗ = σ(XA , A ∈ E), where XA (m) = m(A) for m ∈ E∗ and A ∈ E. Otherwise said, for every A ∈ E, the mapping m ↦ m(A) from E∗ to Z+ ∪ {∞} is measurable with respect to E∗ . For λ > 0 we denote by P(λ) the Poisson distribution with parameter λ, which assigns mass e^{−λ} λ^n /n! to the integer n.
Definition 7.1.1 A Poisson measure on (E, E) with intensity µ is a random variable
M with values in E ∗ , such that if (Ak , k ≥ 1) is a sequence of disjoint sets in E, with
µ(Ak ) < ∞ for every k,
(i) the random variables M (Ak ), k ≥ 1 are independent, and
(ii) the law of M (Ak ) is P(µ(Ak )) for k ≥ 1.
Notice that properties (i) and (ii) completely characterize the law of the random
variable M . Indeed, notice that events which are either empty or of the form
{m ∈ E ∗ : m(A1 ) = i1 , . . . , m(Ak ) = ik } ,
with pairwise disjoint A1 , . . . , Ak ∈ E, µ(Aj ) < ∞, 1 ≤ j ≤ k and where (i1 , . . . , ik ) are
integers, form a π-system that generates E ∗ . If now M is a Poisson random measure with
intensity µ, on some probability space (Ω, F, P ), then
P (M (A1 ) = i1 , . . . , M (Ak ) = ik ) = ∏_{j=1}^k e^{−µ(Aj )} µ(Aj )^{ij} / ij ! .
Hence the uniqueness of the law of a random measure satisfying (i), (ii). Existence is
stated in the next
Proposition 7.1.1 For every σ-finite non-negative measure µ on (E, E), there exists a
Poisson random measure on (E, E) with intensity µ.
Proof. Suppose first that λ = µ(E) < ∞. We let N be a Poisson random variable with parameter λ, and X1 , X2 , . . . be independent random variables, independent of N , with law µ/µ(E). Finally, we let M_ω = Σ_{i=1}^{N(ω)} δ_{Xi (ω)} .
Now, if N is Poisson with parameter λ and (Yi , i ≥ 1) are independent and independent of N , with P (Yi = j) = pj , 1 ≤ j ≤ k, it holds that Σ_{i=1}^N 1_{Yi =j} , 1 ≤ j ≤ k, are independent with respective laws P(pj λ), 1 ≤ j ≤ k. It follows that M is a Poisson measure with intensity µ: for disjoint A1 , . . . , Ak in E with finite µ-measures, we let Yi = j whenever Xi ∈ Aj , defining independent random variables in {1, . . . , k} with P (Yi = j) = µ(Aj )/µ(E), so that M (Aj ), 1 ≤ j ≤ k, are independent P(λ µ(Aj )/µ(E)) = P(µ(Aj )) random variables.
In the general case, since µ is σ-finite, there is a partition of E into measurable sets
Ek , k ≥ 1 that are disjoint and have finite µ-measure. We can construct independent
Poisson measures Mk on Ek with intensity µ(· ∩ Ek ), for k ≥ 1. We claim that
M (A) = Σ_{k≥1} Mk (A ∩ Ek ) ,     A ∈ E,
defines a Poisson random measure with intensity µ. This is an easy consequence of the
property that if Z1 , Z2 , . . . are independent Poisson variables with respective parameters
λ1 , λ2 , . . ., then the sum Z1 + Z2 + . . . is Poisson with parameter λ1 + λ2 + . . . (with the
convention that P(∞) is a Dirac mass at ∞).
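The finite-mass construction in the proof translates directly into a sampler. The following sketch (my own illustration, with arbitrary numerical choices) draws a Poisson random measure on E = [0, 1]² with intensity λ times Lebesgue measure — a Poisson number of atoms, then i.i.d. uniform positions — and checks properties (i) and (ii) empirically on two disjoint sets.

```python
import numpy as np

def sample_prm(lam, rng):
    """One realisation of a Poisson random measure on [0,1]^2 with intensity
    lam * Lebesgue, following the finite-mass construction of the proof."""
    n_atoms = rng.poisson(lam)              # N ~ P(mu(E)), here mu(E) = lam
    return rng.uniform(size=(n_atoms, 2))   # atoms i.i.d. with law mu/mu(E)

rng = np.random.default_rng(2)
lam = 10.0
counts_A, counts_B = [], []
for _ in range(5000):
    atoms = sample_prm(lam, rng)
    in_A = (atoms[:, 0] <= 0.5) & (atoms[:, 1] <= 0.4)   # A = [0,.5]x[0,.4], area 0.2
    in_B = atoms[:, 0] > 0.5                              # B = (.5,1]x[0,1], area 0.5
    counts_A.append(int(in_A.sum()))
    counts_B.append(int(in_B.sum()))
counts_A, counts_B = np.array(counts_A), np.array(counts_B)
print("M(A): mean %.3f var %.3f (target %.1f)" % (counts_A.mean(), counts_A.var(), lam * 0.2))
print("M(B): mean %.3f var %.3f (target %.1f)" % (counts_B.mean(), counts_B.var(), lam * 0.5))
print("empirical covariance (should be ~0):", np.cov(counts_A, counts_B)[0, 1])
```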
From the construction, we obtain the following important property of Poisson random
measures:
Proposition 7.1.2 Let M be a Poisson random measure on E with intensity µ, and let A ∈ E be such that µ(A) < ∞. Then M (A) has law P(µ(A)), and given M (A) = k, the restriction M |A has the same law as Σ_{i=1}^k δ_{Xi} , where (X1 , X2 , . . . , Xk ) are independent with law µ(· ∩ A)/µ(A). Moreover, if A, B ∈ E are disjoint, then the restrictions M |A , M |B are independent. Last, any Poisson random measure can be written in the form M (dx) = Σ_{i∈I} δ_{xi} (dx), where I is a countable index set and the xi , i ∈ I, are random variables.
7.2 Integrals with respect to a Poisson measure
Proposition 7.2.1 Let M be a Poisson random measure on E, with intensity µ. Then for every measurable f : E → R+ , the quantity

M (f ) := ∫_E f (x) M (dx)

defines a random variable, and

E[exp(−M (f ))] = exp( − ∫_E µ(dx)(1 − exp(−f (x))) ).

Moreover, if f : E → R is measurable and in L¹(µ), then f ∈ L¹(M ) a.s., ∫_E f (x) M (dx) defines a random variable, and

E[exp(iM (f ))] = exp( ∫_E µ(dx)(exp(if (x)) − 1) ).
The first formula is sometimes called the Laplace functional formula, or Campbell
formula. Notice that by replacing f by af , differentiating the formula with respect to a
and letting a ↓ 0, one gets the first moment formula
E[M (f )] = ∫_E f (x) µ(dx) ,

whenever f ≥ 0, or f is integrable w.r.t. µ (in this case, consider first f^+ , f^− ). Similarly,

Var M (f ) = ∫_E f (x)² µ(dx)
(for this, first notice that the restrictions of M to {f ≥ 0} and {f < 0} are independent).
Proof. Let En , n ≥ 0 be a measurable partition of E into sets with finite µ-measure.
First assume that f = 1A for A ∈ E, µ(A) < ∞. Then M (A) is a random variable
by definition of M , and this extends to any A ∈ E by considering A ∩ En , n ≥ 0 and
summation. Since any measurable non-negative function is the increasing limit of finite
linear combinations of such indicator functions, we obtain that M (f ) is a random variable
as a limit of random variables. Moreover, a similar argument shows that M (f 1En ), n ≥ 0
are independent random variables.
Next, assume f ≥ 0. The number Nn of atoms of M that fall in En has law P(µ(En ))
and given Nn = k, the atoms can be supposed to be independent random variables with
law µ(· ∩ En )/µ(En ). Therefore,
E[exp(−M (f 1_{En} ))] = Σ_{k=0}^∞ e^{−µ(En )} (µ(En )^k / k!) ( ∫_{En} (µ(dx)/µ(En )) e^{−f (x)} )^k
= exp( − ∫_{En} µ(dx)(1 − exp(−f (x))) ).
From the independence of the variables M (f 1En ), we can then take products over n ≥ 0
(i.e. apply monotone convergence) and obtain the wanted formula.
From this, we obtain the first moment formula for functions f ≥ 0. If f is a measurable
function from E → R, applying the result to |f | shows that if f ∈ L1 (µ), then M (|f |) < ∞
a.s. so M (f ) is well-defined for almost every ω, and defines a random variable as it is equal
to M (f + ) − M (f − ).
The last formula of the proposition, in the case where f ∈ L¹(µ), follows by the same kind of arguments: first, we establish the formula for f 1_{En} in place of f . Then, to obtain the result, we must show that ∫_{An} µ(dx)(e^{if (x)} − 1) converges to ∫_E µ(dx)(e^{if (x)} − 1), where An = E0 ∪ · · · ∪ En . But |e^{if (x)} − 1| ≤ |f (x)|, whence the function under consideration is integrable with respect to µ, giving the result (|∫_{E\An} g(x) µ(dx)| ≤ ∫_{E\An} |g(x)| µ(dx) decreases to 0 whenever g is integrable).
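For a finite intensity, the Laplace functional (Campbell) formula is easy to test numerically. The sketch below (illustrative, not part of the notes) takes E = [0, 1], µ = λ · Lebesgue and f(x) = x², builds M(f) by summing f over the atoms, and compares the empirical value of E[exp(−M(f))] with exp(−∫ (1 − e^{−f}) dµ), the integral being approximated by a Riemann sum; the first moment formula is checked as well.

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 5.0
f = lambda x: x ** 2

def M_of_f():
    """One sample of M(f) for a Poisson random measure on [0,1] with intensity
    lam * Lebesgue: Poisson number of atoms, uniform positions."""
    atoms = rng.uniform(size=rng.poisson(lam))
    return f(atoms).sum()

samples = np.array([M_of_f() for _ in range(100000)])
empirical = np.exp(-samples).mean()

# right-hand side of the Laplace functional formula, by a Riemann sum on [0,1]
grid = np.linspace(0.0, 1.0, 10001)
rhs = np.exp(-(lam * (1.0 - np.exp(-f(grid)))).mean())

print("E[exp(-M(f))] ~", empirical, "   formula:", rhs)
# first moment formula: E[M(f)] = int f dmu = lam/3
print("E[M(f)] ~", samples.mean(), "   target:", lam / 3.0)
```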
7.3 Poisson point processes
We now show how Poisson random measures can be used to define certain stochastic
processes. Let (E, E) be a measurable space, and consider a σ-finite measure G on (E, E).
Let µ be the product measure dt ⊗ G(dx) on R+ × E, where dt is the Lebesgue measure
on (R+ , B(R+ )). Otherwise said, µ is the unique measure such that µ([0, t] × A) = tG(A)
for t ≥ 0 and A ∈ E.
A Lévy process (Xt , t ≥ 0) (with values in R) is a process with independent and stationary increments, i.e. such that for every 0 = t0 ≤ t1 ≤ . . . ≤ tk , the random variables (X_{ti} − X_{ti−1} , 1 ≤ i ≤ k) are independent, and X_{ti} − X_{ti−1} has the same law as X_{ti −ti−1} , 1 ≤ i ≤ k.
Equivalently, X is a Lévy process if and only if X (t) = (Xt+s − Xt , s ≥ 0) has same law
as X and is independent of FtX = σ(Xs , 0 ≤ s ≤ t) for every t ≥ 0 (simple Markov
property).
Proposition 7.3.1 A Poisson random measure M whose intensity µ is of the above form is called a Poisson point process. If f is a measurable G-integrable function on E, then the process

N^f_t = ∫_{[0,t]×E} f (x) M (ds, dx) ,     t ≥ 0,

is a Lévy process. Moreover, the process

M^f_t = ∫_{[0,t]×E} f (x) M (ds, dx) − t ∫_E f (x) G(dx) ,     t ≥ 0,

is a martingale with respect to the filtration Ft = σ(M ([0, s] × A), s ≤ t, A ∈ E), t ≥ 0. If moreover f ∈ L²(G), the process

(M^f_t )² − t ∫_E f (x)² G(dx),     t ≥ 0,

is an (Ft )-martingale.
Proof. For s ≤ t, we have N^f_t − N^f_s = ∫_{(s,t]×E} f (x) M (du, dx). Moreover, it is easy to
check that M (du, dx)1{u∈(s,t]} has same law as the image of M (du, dx)1{u∈(0,t−s]} under
(u, x) 7→ (s + u, x) from R+ × E to itself, and is independent of M (du, dx)1{u∈[0,s]} .
We obtain that N f has stationary and independent increments. The fact that M f is a
martingale is a straightforward consequence of the first moment formula and the simple
Markov property. The last statement comes from writing (Mtf )2 = (Mtf − Msf + Msf )2
and expanding, then using the variance formula and the simple Markov property.
7.3.1 Example: the Poisson process
Let X1 , X2 , . . . be a sequence of independent exponential random variables with parameter
θ, and define 0 = T0 ≤ T1 ≤ . . . by Tn = X1 + . . . + Xn . We let
N^θ_t = Σ_{n=1}^∞ 1_{Tn ≤ t} ,     t ≥ 0,

be the càdlàg process that counts the number of times Tn that are ≤ t. The process (N^θ_t , t ≥ 0) is called the (homogeneous) Poisson process with intensity θ. This is the so-called Markovian description of the Poisson process, which is a jump-hold Markov process.
The following alternative description makes use of Poisson random measures. We give the statement without proof; it can be found in textbooks, or makes a good exercise (first notice that with both definitions, N^θ is a process with stationary and independent increments).
Proposition 7.3.2 Let θ > 0, and let M be a Poisson random measure with intensity θ dt on R+ . Then the process

N^θ_t = M ([0, t]) ,     t ≥ 0,

is a Poisson process with intensity θ.
The set of atoms of the measure M itself is sometimes also called a Poisson (point) process with intensity θ.
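The equivalence of the two descriptions can be checked by simulation. The sketch below (mine; all parameters are illustrative) builds N^θ_1 once from exponential interarrival times and once by counting the atoms of a Poisson random measure with intensity θ dt on [0, 1], and compares empirical means and variances with the Poisson(θ) target.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, t, n_runs = 3.0, 1.0, 50000

def count_jump_hold():
    """N^theta_t from i.i.d. exponential(theta) interarrival times."""
    total, count = 0.0, 0
    while True:
        total += rng.exponential(1.0 / theta)   # numpy parametrises by the mean
        if total > t:
            return count
        count += 1

def count_random_measure():
    """N^theta_t = M([0,t]) for a Poisson random measure with intensity theta*dt:
    only the number of atoms in [0,t] matters, and it is Poisson(theta*t)."""
    return rng.poisson(theta * t)

a = np.array([count_jump_hold() for _ in range(n_runs)])
b = np.array([count_random_measure() for _ in range(n_runs)])
print("jump-hold     : mean %.3f  var %.3f" % (a.mean(), a.var()))
print("Poisson measure: mean %.3f  var %.3f" % (b.mean(), b.var()))
print("target (Poisson(theta*t)): mean = var =", theta * t)
```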
7.3.2 Example: compound Poisson processes
A compound Poisson process with intensity ν is a process of the form

N^ν_t = ∫_{[0,t]×R} x M (ds, dx) ,     t ≥ 0,

where M is a Poisson random measure with intensity dt ⊗ ν(dx) and ν is a finite measure on R. Alternatively, if we write M in the form Σ_{i∈I} δ_{(ti ,xi )} , for every t ≥ 0 we can write Xt = xi whenever t = ti and Xt = 0 otherwise. As an exercise, one can prove that this is a.s. well-defined, i.e. that a.s., for every t ≥ 0, the set {i ∈ I : ti = t} has at most one element. With this notation, we can write

N^ν_t = Σ_{0≤s≤t} Xs ,     t ≥ 0

(notice that there is a.s. a finite set of times s ∈ [0, t] such that Xs ≠ 0, so the sum is meaningful).
There is a Markov jump-hold description for these processes as well: if N^θ is a Poisson process with parameter θ = ν(R) and jump times 0 < T1 < T2 < . . ., and if Y1 , Y2 , . . . is a sequence of i.i.d. random variables, independent of N^θ and with law ν/θ, then the process

Σ_{n≥1} Yn 1_{Tn ≤ t} ,     t ≥ 0,

is a compound Poisson process with intensity ν. This comes from the following marking property of Poisson measures: suppose we have a description of a Poisson random measure M (dx) with intensity µ as Σ_{i∈I} δ_{Xi} (dx), where (Xi , i ∈ I) is a countable family of random variables. If (Yi , i ∈ I) is a family of i.i.d. random variables with law ν, and independent of M , then M′ = Σ_{i∈I} δ_{(Xi ,Yi )} is a Poisson random measure with intensity the product measure µ ⊗ ν.
We let CP(ν) be the law of N^ν_1 ; it is called the compound Poisson distribution with intensity ν. It can be written in the form

CP(ν) = Σ_{n≥0} e^{−ν(R)} ν^{∗n} / n! ,

where ν^{∗n} is the n-fold convolution of the measure ν. Recall that the convolution of two finite measures µ, ν on R is the unique measure µ ∗ ν which is characterized by

µ ∗ ν(A) = ∫∫ 1_A (x + y) µ(dx) ν(dy) ,     A ∈ B_R ,

and that if µ, ν are probability measures, then µ ∗ ν is the law of the sum of two independent random variables with respective laws µ, ν. The characteristic function of CP(ν) is given by

Φ_{CP(ν)} (u) = exp(−ν(R)(1 − Φ_{ν/ν(R)} (u))),

where Φ_{ν/ν(R)} is the characteristic function of ν/ν(R).
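This characteristic function formula can be verified numerically. In the sketch below (my own illustration) ν is θ times the standard Gaussian law, samples of N^ν_1 are produced through the jump-hold description (a Poisson(θ) number of i.i.d. jumps with law ν/θ), and the empirical characteristic function is compared with exp(−ν(R)(1 − Φ_{ν/ν(R)}(u))).

```python
import numpy as np

rng = np.random.default_rng(5)
theta = 2.0                          # nu(R); individual jumps have law nu/theta = N(0,1)
n_samples = 100000

# N^nu_1 via the jump-hold description: Poisson(theta) many N(0,1) jumps
counts = rng.poisson(theta, size=n_samples)
samples = np.array([rng.standard_normal(k).sum() for k in counts])

for u in (0.5, 1.0, 2.0):
    empirical = np.exp(1j * u * samples).mean()
    phi_jump = np.exp(-u ** 2 / 2.0)                 # characteristic function of N(0,1)
    formula = np.exp(-theta * (1.0 - phi_jump))
    print("u=%.1f  empirical %.4f%+.4fi   formula %.4f"
          % (u, empirical.real, empirical.imag, formula))
```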
Chapter 8
Infinitely divisible laws and Lévy processes
In this chapter, we consider only random variables and processes with values in R.
8.1 Infinitely divisible laws and Lévy-Khintchine formula
Definition 8.1.1 Let µ be a probability measure on (R, BR ). We say that µ is infinitely
divisible (ID) if for every n ≥ 1, there exists a probability distribution µn such that if
X1 , . . . , Xn are independent with law µn , then their sum X1 + . . . + Xn has law µ.
Otherwise said, for every n, there exists µn such that µn^{∗n} = µ, where ∗ stands for the convolution operation for measures. Yet otherwise said, the characteristic function Φ of µ is such that for every n ≥ 1, there exists another characteristic function Φn with Φn^n = Φ. We stress that it is not the existence of a function whose n-th power is Φ which is problematic, but really that this function is a characteristic function.
To start with, let us mention examples of ID laws. Constant random variables are
ID. The Gaussian N (m, σ 2 ) is the convolution of n laws N (m/n, σ 2 /n), so it is ID. The
Poisson law P(λ) is also ID, as the convolution of n laws P(λ/n). More generally, a compound Poisson law CP(ν) is ID, being the n-th convolution power of CP(ν/n).
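These convolution identities are easy to check empirically; the short sketch below (illustrative only) compares samples of P(λ) with sums of n independent P(λ/n) variables, and samples of N(m, σ²) with sums of n independent N(m/n, σ²/n) variables, through their first two moments.

```python
import numpy as np

rng = np.random.default_rng(6)
n, size = 7, 100000
lam, m, sigma2 = 3.0, 1.0, 4.0

poisson_direct = rng.poisson(lam, size=size)
poisson_as_sum = rng.poisson(lam / n, size=(size, n)).sum(axis=1)
gauss_direct = rng.normal(m, np.sqrt(sigma2), size=size)
gauss_as_sum = rng.normal(m / n, np.sqrt(sigma2 / n), size=(size, n)).sum(axis=1)

print("Poisson  mean/var:", poisson_direct.mean(), poisson_direct.var(),
      " vs ", poisson_as_sum.mean(), poisson_as_sum.var())
print("Gaussian mean/var:", gauss_direct.mean(), gauss_direct.var(),
      " vs ", gauss_as_sum.mean(), gauss_as_sum.var())
```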
It is a bit harder to see, but nevertheless true, that exponential and geometric distributions are ID. However, the uniform distribution on [0, 1], or the Bernoulli distribution with parameter p ∈ (0, 1), are not ID. Suppose indeed that an ID law µ has support contained in [−M, M] for some M > 0. Then the support of µn is contained in [−M/n, M/n], so its variance is ≤ M²/n², which shows that the variance of µ is ≤ M²/n for every n, hence µ is a Dirac mass.
The main goal of this chapter is to give a structural theorem for ID laws, the Lévy-Khintchine formula. Say that a triple (a, q, Π) is a Lévy triple if
• a ∈ R,
• q ≥ 0,
• Π is a σ-finite measure on R such that Π({0}) = 0 and ∫_R (x² ∧ 1) Π(dx) < ∞.
In particular, Π(1{|x|>ε} ) < ∞ for every ε > 0.
Theorem 8.1.1 (Lévy-Khintchine formula) Let µ be an ID law. Then there exists a unique Lévy triple (a, q, Π) such that if Φ is the characteristic function of µ, Φ(u) = e^{ψ(u)} , where ψ is the characteristic exponent given by

ψ(u) = iau − (q/2) u² + ∫_R (e^{iux} − 1 − iux 1_{|x|<1} ) Π(dx).
We recover the constant laws for q = 0 and Π = 0, the normal laws for Π = 0, and the compound Poisson laws for a = q = 0 and Π(dx) = ν(dx), a finite measure.
Lemma 8.1.1 The characteristic function Φ of an ID law never vanishes, and therefore,
the characteristic exponent ψ with ψ(0) = 0 is well-defined and unique.
Proof. If µ is ID, then Φ = Φn^n for all n, where Φn is the characteristic function of some law µn . Therefore, |Φ| = |Φn |^n , and taking logarithms, as n → ∞, we see that Φn converges pointwise to 1_{Φ≠0} . However, Φ is continuous and takes the value 1 at 0, so it is non-zero in a neighborhood of 0, and 1_{Φ≠0} equals 1 (hence is continuous) in a neighborhood of 0. By Lévy's convergence theorem, this shows that µn weakly converges to some distribution, which has no choice but to be δ0 . In particular, Φ never vanishes.
To conclude, it is a standard topology exercise that a continuous function f : R → C
that never vanishes and such that f (0) = 1 can be uniquely lifted into a continuous
function g : R → C with g(0) = 0, so that eg = f .
As a corollary, notice that Φn , the n-th ‘root’ of Φ, can itself be written in the form e^{ψn} for a unique continuous ψn satisfying ψn (0) = 0, so that ψn = ψ/n. It also entails the uniqueness of µn such that µn^{∗n} = µ.
Lemma 8.1.2 An ID law is the weak limit of compound Poisson laws.
Proof. Let Φn be the characteristic function of µn , as defined above. Then since (1 − (1 − Φn ))^n = Φ, and Φn → 1 pointwise, we obtain that −n(1 − Φn ) → ψ pointwise, taking the complex logarithm in a neighborhood of 1. In fact, this convergence even holds uniformly on compact neighborhoods of 0, a fact that we will need later on. Exponentiating gives exp(−n(1 − Φn )) → Φ. However, on the left-hand side we can recognize the characteristic function of a compound Poisson law with intensity nµn .
Proof of the Lévy-Khintchine formula. We must now prove that the limit ψ of
−n(1 − Φn ) has the form given in the statement of the theorem. First of all, we make
a technical modification of the statement, replacing the 1{|x|<1} in the statement by a
continuous function h such that 1{|x|<1} ≤ h ≤ 1{|x|≤2} . This will just modify the value
of a in the statement.
Let ηn (dx) = (1 ∧ x2 )nµn (dx), which is a sequence of measures with finite total mass.
Suppose we know that the sequence (ηn , n ≥ 1) is tight and (ηn (R), n ≥ 1) is bounded,
and let η be the limit of ηn along some subsequence nk . Then
∫_R (e^{iux} − 1) n µn (dx) = ∫_R (e^{iux} − 1) ηn (dx)/(x² ∧ 1)     (8.1)
= ∫_R ((e^{iux} − 1 − iux h(x))/(x² ∧ 1)) ηn (dx) + iu ∫_R (x h(x)/(x² ∧ 1)) ηn (dx)
= ∫_R Θ(u, x) ηn (dx) + iu an ,

where

Θ(u, x) = (e^{iux} − 1 − iux h(x))/(x² ∧ 1)   if x ≠ 0,     Θ(u, 0) = −u²/2,

and an = ∫_R (x h(x)/(x² ∧ 1)) ηn (dx). Now, for each fixed u, Θ(u, ·) is a continuous bounded function, and therefore, along the subsequence nk , ∫_R Θ(u, x) ηn (dx) converges to ∫_R Θ(u, x) η(dx). Since the left-hand side in (8.1) converges to ψ(u), this implies that a_{nk} converges to some a ∈ R. Therefore, if q = η({0}), we obtain that

ψ(u) = iua − (q/2) u² + ∫_R (e^{iux} − 1 − iux h(x)) Π(dx),

where Π(dx) = 1_{x≠0} (x² ∧ 1)^{−1} η(dx) is a measure that is σ-finite, integrates x² ∧ 1, and does not charge 0. Hence the result.
So, let us prove that (ηn , n ≥ 1) is tight and that the total masses are bounded. First, x² 1_{|x|≤1} ≤ C(1 − cos x) for some C > 0, so

ηn (|x| ≤ 1) = ∫_R x² 1_{|x|≤1} n µn (dx) ≤ C ∫_R (1 − cos x) n µn (dx),

which converges to −C Re ψ(1) as n → ∞. Second, adapting Lemma 5.4.1, since ηn 1_{|x|≥1} = n µn 1_{|x|≥1} , for some C > 0 and every K ≥ 1,

ηn (|x| ≥ K) ≤ CK ∫_{|x|≤K^{−1}} n(1 − Re Φn (x)) dx → −CK ∫_{−K^{−1}}^{K^{−1}} Re ψ(x) dx   as n → ∞,

where the limit can be taken because the convergence of the integrand is uniform on compact neighborhoods of 0, as stressed in the proof of Lemma 8.1.2. Now the limit can be made as small as wanted for K large enough, because ψ is continuous and ψ(0) = 0. This entails the result.
The uniqueness statement will be proved in the next section.
8.2 Lévy processes
In this section, all the Lévy processes under consideration start at X0 = 0. Lévy processes are closely related to ID laws: indeed, if X is a Lévy process, then the random variable X1 can be written as a sum of i.i.d. variables

X1 = Σ_{k=1}^n (X_{k/n} − X_{(k−1)/n} ),
hence is ID. In fact, (laws of) càdlàg Lévy processes are in one-to-one correspondence with
ID laws, as we show in this section. The first problem we address is that the mapping
X 7→ X1 is injective from the set of (laws of) càdlàg Lévy processes to the set of ID laws.
Proposition 8.2.1 Let µ be an ID law. Then there exists at most one càdlàg Lévy
process (Xt , t ≥ 0) such that X1 has law µ. Moreover, if µ has a Lévy triple (a, q, Π) with
associated characteristic exponent
ψ(u) = iau − (q/2) u² + ∫_R (e^{iux} − 1 − iux 1_{|x|<1} ) Π(dx),

then the law of such a process X is entirely characterized by the formula

E[exp(iuXt )] = exp(tψ(u)).
Proof. If X is as in the statement, then for n ≥ 1, X1/n must have ψ/n as characteristic
exponent by uniqueness of the characteristic exponent of the n-th root of an ID law.
From this we deduce easily that E[exp(iuXt )] = exp(tψ(u)) for every t ∈ Q+ and u ∈ R.
Since X is càdlàg we deduce the result for every t ∈ R+ by approximating t by 2^{−n} ⌈2^n t⌉.
Therefore, the one-dimensional marginal distributions of X are uniquely determined by
µ. It is then easy to check that the finite-marginal distributions of Lévy processes are in
turn determined by their one-dimensional marginal distributions, because the increments (X_{tj} − X_{tj−1} , 1 ≤ j ≤ k), for any 0 = t0 ≤ t1 ≤ . . . ≤ tk , are independent with respective laws those of X_{tj −tj−1} , 1 ≤ j ≤ k. Hence the result.
The next theorem is a kind of converse to this theorem, and gives an explicit construction of ‘the’ càdlàg Lévy process whose law at time 1 is a given ID law µ. Let (a, q, Π)
be a Lévy triple associated to an ID law µ. Consider a Poisson random measure M on
R+ × R with intensity dt ⊗ Π(dx), and let ∆t = x if M has an atom of the form (t, x), and ∆t = 0 otherwise. For any n ≥ 1, consider the martingale

Y^n_t = ∫_{[0,t]×R} 1_{n^{−1} ≤|y|<1} y M (ds, dy) − t ∫_R y Π(dy) 1_{n^{−1} ≤|y|<1} ,     t ≥ 0,

associated by Proposition 7.3.1 with the Poisson measure M (dt, dx) 1_{n^{−1} ≤|x|<1} . Notice also that this last measure always has a finite number of atoms on [0, t] × R, because Π(dx) 1_{|x|>n^{−1}} is a finite measure by assumption on Π, so that

Y^n_t = Σ_{0≤s≤t} ∆s 1_{n^{−1} ≤|∆s |<1} − t ∫_R y Π(dy) 1_{n^{−1} ≤|y|<1} ,     t ≥ 0.     (8.2)

Independently of M , let Bt be a standard Brownian motion. Finally notice that

Y^0_t = Σ_{0≤s≤t} ∆s 1_{|∆s |≥1} ,     t ≥ 0,     (8.3)

is a compound Poisson process with intensity Π(dx) 1_{|x|≥1} . We let Ft be the σ-algebra generated by {Bs , Y^0_s , Y^n_s , n ≥ 1; 0 ≤ s ≤ t}.
Theorem 8.2.1 (Lévy-Itô's theorem) Let µ be an ID law, with Lévy triple (a, q, Π), and let B, Y^0 , Y^n , n ≥ 1, denote the processes associated with this triple as explained above. Then there exists a càdlàg square-integrable (Ft )-martingale Y^∞ such that for every t ≥ 0,

E[ sup_{0≤s≤t} |Y^n_s − Y^∞_s |² ] → 0   as n → ∞.

Moreover, the process

Xt = at + √q Bt + Y^0_t + Y^∞_t ,     t ≥ 0,

is a Lévy process such that X1 has distribution µ.
This theorem, which is extremely useful in practice, is an explicit construction of any càdlàg Lévy process out of four independent ingredients: a deterministic drift, a Brownian motion, and a jump part made of a compound Poisson process and a compensated L² càdlàg martingale. The compensation by a drift in the formula defining Y^n is crucial, because the identity function is in general not in L¹(Π), so that ∫_{[0,t]×(−1,1)} x M (ds, dx) is in general ill-defined.
Proof. For every n > m > 0, the process Y^n − Y^m is a càdlàg martingale, and Doob's L² inequality gives

E[ sup_{0≤s≤t} |Y^n_s − Y^m_s |² ] ≤ 4 E[|Y^n_t − Y^m_t |²] = 4t ∫_R y² Π(dy) 1_{n^{−1} ≤|y|<m^{−1}} ≤ 4t ∫_R y² Π(dy) 1_{0<|y|<m^{−1}} ,

where we used the last statement of Proposition 7.3.1 for the second equality. Since ∫ y² Π(dy) 1_{0<|y|<1} < ∞, this can be made as small as wanted for m large enough. In particular, for every t, Y^n_t is a Cauchy sequence in L², and thus converges to a limit Y^∞_t in L². The process (Y^∞_t , t ≥ 0) then defines a martingale, as is checked by passing to the limit as n → ∞ in E[Y^n_t |Fs ] = Y^n_s . Moreover, by passing to the limit as n → ∞ in the above inequality, we obtain that sup_{0≤s≤t} |Y^∞_s − Y^m_s | converges to 0 in L² as m → ∞, for every t ≥ 0. By
extracting along a subsequence, we may assume that the convergence is almost-sure, so
that Y ∞ is the a.s. uniform limit over compacts of càdlàg processes, hence is also a càdlàg
process (in fact, admits a càdlàg version).
Therefore, the process X defined in the statement is indeed a càdlàg process, and it is
easy to show that it is a Lévy process, being a pointwise L2 limit of Lévy processes. The
last thing that remains to be proved is that X1 has law µ. But from the independence of
the components used to build X, we obtain that if X^n_t = at + √q Bt + Y^0_t + Y^n_t ,

E[exp(iuX^n_1 )] = exp( iua − (q/2) u² + ∫_R (e^{iuy} − 1) Π(dy) 1_{|y|≥1} + ∫_R (e^{iuy} − 1 − iuy) Π(dy) 1_{n^{−1} ≤|y|<1} ).
By passing to the limit as n → ∞, we obtain that X1 has the characteristic function
associated to µ.
Proof of the uniqueness in Theorem 8.1.1. Let µ be an ID law with Lévy triple
(a, q, Π), and let X be the unique càdlàg Lévy process such that X1 has law µ, given by
Proposition 8.2.1. Then Theorem 8.2.1 shows that the jumps of X between times 0 and
t, i.e. the process (∆s , 0 ≤ s ≤ t) defined by ∆s = Xs − Xs− , s ≥ 0, are the atoms of
a Poisson random measure M with intensity tΠ on R. This intensity is determined by
the law of M through the first moment formula tΠ(A) = E[M (A)] of Proposition 7.2.1.
Then, by defining Y 0 and Y n by the formulas (8.2), (8.3) and letting Y ∞ = limn Y n
along a subsequence according to which the limit is almost-sure uniformly on compacts,
we obtain that X − Y^0 − Y^∞ is a (scaled) Brownian motion with drift, with the same law as B̃ = (at + √q Bt , t ≥ 0). We can recover a as the expectation of B̃1 , and q as its variance. Finally, we see that µ uniquely determines its Lévy triple.
Chapter 9
Exercises
Warmup
The exercises of this section are designed to help remind you of basic concepts of probability theory (random variables, expectation, classical probability distributions, Borel-Cantelli lemmas). The last one is a longer exercise that contains the basic results on
uniform integrability that are needed in this course.
Exercise 9.0.1
Remind yourself what the following classical discrete distributions are : Bernoulli with parameter p ∈ [0, 1], binomial with parameters (n, p) ∈ N × [0, 1], geometric with parameter
p ∈ [0, 1], Poisson with parameter λ ≥ 0.
Do so with the following classical distributions on R: uniform on [a, b], exponential
with mean θ−1 , gamma with (positive) parameters (a, θ) (mean a/θ, variance a/θ2 ), beta
with (positive) parameters (a, b), Gaussian with mean m and variance σ 2 , Cauchy with
parameter a.
Exercise 9.0.2
Compute the distribution of 1/N², where N is a standard Gaussian N (0, 1) random variable. What is the distribution of N/N′, where N, N′ are two independent such random variables?
Exercise 9.0.3
Show that for any countable set I and any I-indexed family (Xi , i ∈ I) of non-negative random variables, sup_{i∈I} E[Xi ] ≤ E[sup_{i∈I} Xi ]. Show that these two quantities are equal if for every i, j ∈ I there exists some k ∈ I such that Xi ∨ Xj ≤ Xk .
Exercise 9.0.4
Fix α > 0, and let (Zn , n ≥ 0) be a sequence of independent random variables with values
in {0, 1}, whose laws are characterized by
P (Zn = 1) = 1/n^α = 1 − P (Zn = 0).

Show that Zn converges to 0 in L¹. Show that lim sup_n Zn is 0 a.s. if α > 1 and 1 a.s. if
α ≤ 1.
Exercise 9.0.5
Let (Xn , n ≥ 1) be a sequence of independent exponential random variables with mean
1. Show that lim supn (log n)−1 Xn = 1 a.s.
Exercise 9.0.6
Let N be a N (0, 1) random variable. Show that

P (N > x) ≤ (1/(x√(2π))) exp(−x²/2).

Show in fact that as x → ∞,

P (N > x) = (1/(x√(2π))) exp(−x²/2)(1 + o(1)).

Let (Yn , n ≥ 1) be a sequence of independent such Gaussian variables. Show that lim sup_n (2 log n)^{−1/2} Yn = 1 a.s.
Exercise 9.0.7 The basics of uniform integrability
Let (E, A, µ) be a measure space with µ(E) < ∞. If f is a measurable non-negative function, we let µ(f ) be a shorthand for ∫_E f dµ.
A family of R-valued functions (fi , i ∈ I) in L¹(E, A, µ) is said to be uniformly integrable (U.I. in short) if the following holds:

sup_{i∈I} µ(|fi | 1_{|fi |>a} ) → 0   as a → ∞.
You may think of (E, A, µ) and the fi as being a probability space and random variables.
1. Show that a U.I. family is bounded in L1 (E, A, µ). Show that the converse is not
true.
2. Show that a finite family of integrable functions is U.I.
3. Let G : R+ → R+ be a measurable function such that limx→∞ x−1 G(x) = +∞.
Show that for every C > 0, the family
{f ∈ L1 (E, A, µ) : µ(G(|f |)) ≤ C}
is U.I. Deduce that a family of measurable functions that is bounded in Lp (E, A, µ) for
some p > 1 is U.I.
4. (Harder) Show that the converse is true: if (fi , i ∈ I) is a U.I. family, then there exists a function G as in 3. so that (fi , i ∈ I) is included in a set of the form of the previous displayed expression. (Hint: consider an increasing positive sequence (an , n ≥ 0) such that sup_{i∈I} µ(|fi | 1_{|fi |≥an } ) ≤ 2^{−n} for every n.)
5. Let (fi , i ∈ I) be a family that is bounded in L1 (E, A, µ). Show that (i) and (ii)
below are equivalent :
(i) (fi , i ∈ I) is U.I.
(ii) ∀ ε > 0, ∃ δ > 0 s.t. ∀ A ∈ A, µ(A) < δ =⇒ supi∈I µ(|fi |1A ) < ε.
6. Show that if (fi , i ∈ I) and (gj , j ∈ J) are two U.I. families, then (fi +gj , i ∈ I, j ∈ J)
is also U.I.
7. Let (fn , n ≥ 0) be a sequence of L1 functions that converges in measure to a
measurable function f , i.e. for every ε > 0,
µ({|f − fn | > ε}) → 0.
n→∞
Show that (fn , n ≥ 0) converges in L¹ to f if and only if (fn , n ≥ 0) is U.I. Hint: for the necessary condition, you might find it useful to consider sets such as {|f − fn | > 1}, {ε < |f − fn | ≤ 1} and {|f − fn | ≤ ε}.
Remark. This shows that a sequence of random variables converging in probability (or
a.s.) to some other random variable has an ’upgraded’ L1 convergence if and only if it is
uniformly integrable.
9.1 Conditional expectation
Exercise 9.1.1
Let X, Y be two random variables in L1 so that
E[X|Y ] = Y   and   E[Y |X] = X.
Show that X = Y a.s. As a hint, you may want to consider quantities like E[(X −
Y )1{X>c,Y ≤c} ] + E[(X − Y )1{X≤c,Y ≤c} ].
Exercise 9.1.2
Let X, Y be two independent Bernoulli random variables with parameter p ∈ (0, 1). Let
Z = 1{X+Y =0} . Compute E[X|Z], E[Y |Z].
Exercise 9.1.3
Let X ≥ 0 be a random variable on a probability space (Ω, F, P ), and let G ⊆ F be
a sub-σ-algebra. Show that X > 0 implies that E[X|G] > 0, up to an event of zero
probability. Show that {E[X|G] > 0} is actually the smallest G-measurable event that
contains the event {X > 0}, up to zero probability events.
Exercise 9.1.4
Check that the sum Z of two independent exponential random variables X, Y with parameter θ > 0 (mean 1/θ) has the gamma distribution with parameter (2, θ), whose density with respect to Lebesgue measure is θ² x exp(−θx) 1_{x≥0} . Show that for every non-negative measurable h,

E[h(X)|Z] = (1/Z) ∫_0^Z h(u) du.
Conversely, let Z be a random variable with a Γ(2, θ) distribution, and suppose X is
a random variable whose conditional distribution given Z is uniform on [0, Z]. Namely,
for every Borel non-negative function h, E[h(X)|Z] = Z^{−1} ∫_0^Z h(x) dx a.s. Show that X
and Z − X are independent, with exponential law.
Exercise 9.1.5
Suppose given a, b > 0, and let X, Y be two random variables with values in Z+ and R+
respectively, whose distribution is characterized by the formula
P (X = n, Y ≤ t) = b ∫_0^t ((ay)^n / n!) exp(−(a + b)y) dy.

Let n ∈ Z+ and let h : R+ → R+ be a measurable function; compute E[h(Y )|X = n]. Then
compute E[Y /(X + 1)], E[1{X=n} |Y ] and E[X|Y ].
Exercise 9.1.6
Let (X, Y1 , . . . , Yn ) be a random vector with components in L². Show that the best approximation of X in the L² norm by an affine combination of the (Yi , 1 ≤ i ≤ n), say of the form λ0 + Σ_{i=1}^n λi (Yi − E[Yi ]), is given by λ0 = E[X] and any solution (λ1 , . . . , λn ) of the linear system

Cov(X, Yj ) = Σ_{i=1}^n λi Cov(Yi , Yj ) ,     1 ≤ j ≤ n.
This affine combination is called the linear regression of X with respect to (Y1 , . . . , Yn ).
If (X, Y1 , . . . , Yn ) is a Gaussian random vector, show that E[X|Y1 , . . . , Yn ] equals the
linear regression of X with respect to (Y1 , . . . , Yn ).
Exercise 9.1.7
Let X ∈ L1 (Ω, F, P ). Show that the family
{E[X|G] : G is a sub-σ-algebra of F}
is uniformly integrable.
Exercise 9.1.8 Conditional independence
Let G ⊆ F be a sub-σ-algebra. Two random variables X, Y are said to be independent
conditionally on G if for every non-negative measurable f, g,
E[f (X)g(Y )|G] = E[f (X)|G] E[g(Y )|G].
What does it mean for two random variables to be independent conditionally on {∅, Ω}? On F?
1. Show that X, Y are independent conditionally on G if and only if for every non-negative G-measurable random variable Z and all non-negative measurable functions f, g,

E[f (X)g(Y )Z] = E[f (X)Z E[g(Y )|G]],

and that this holds if and only if for every measurable non-negative g,

E[g(Y )|G ∨ σ(X)] = E[g(Y )|G].

Comment on the case G = {∅, Ω}.
2. Suppose given three random variables X, Y, Z with a positive density p(x, y, z).
Suppose X, Y are independent conditionally on σ(Z). Show that there exist measurable
positive functions r, s so that p(x, y, z) = q(z)r(x, z)s(y, z) where q is the density of Z,
and conversely.
9.2 Discrete-time martingales
Exercise 9.2.1
Let (Xn , n ≥ 0) be an integrable process with values in a countable subset E ⊂ R. Show
that X is a martingale with respect to its natural filtration if and only if for every n and
every i0 , . . . , in ∈ E,
E[Xn+1 |X0 = i0 , . . . , Xn = in ] = in .
Exercise 9.2.2
Let (Xn , n ≥ 1) be a sequence of independent random variables with respective laws given
by
P (Xn = −n²) = 1/n² ,     P (Xn = n²/(n² − 1)) = 1 − 1/n² .
Let Sn = X1 + . . . + Xn . Show that Sn /n → 1 a.s. as n → ∞, and deduce that (Sn , n ≥ 0)
is a martingale which converges to +∞.
Exercise 9.2.3
Let (Ω, F, (Fn ), P ) be a filtered probability space. Let A ∈ Fn for some n, and let m, m′ ≥ n. Show that m 1_A + m′ 1_{A^c} is a stopping time.
Show that an adapted process (Xn , n ≥ 0) with respect to some filtered probability
space is a martingale if and only if it is integrable, and for every bounded stopping time
T , E[XT ] = E[X0 ].
Exercise 9.2.4
Let X be a martingale (resp. supermartingale) on some filtered probability space, and let
T be an a.s. finite stopping time. Prove that E[XT ] = E[X0 ] (resp. E[XT ] ≤ E[X0 ]) if
either one of the following conditions holds:
1. X is bounded (∃M > 0 : ∀n ≥ 0, |Xn | ≤ M a.s.).
2. X has bounded increments (∃M > 0 : ∀n ≥ 0, |Xn+1 − Xn | ≤ M a.s.) and E[T ] <
∞.
Exercise 9.2.5
Let (Xn , n ≥ 0) be a non-negative supermartingale. Show the maximal inequality for
a > 0:
a P( max_{0≤k≤n} Xk ≥ a ) ≤ E[X0 ].
Exercise 9.2.6
Let T be an (Fn , n ≥ 0)-stopping time such that for some integer N > 0 and ε > 0,
P (T ≤ N + n|Fn ) ≥ ε ,
for every n ≥ 0.
Show that E[T ] < ∞. Hint: Find bounds for P (T > kN ).
Exercise 9.2.7
Your winnings per unit stake on game n are εn , where (εn , n ≥ 0) is a sequence of
independent random variables with
P (εn = 1) = p ,
P (εn = −1) = 1 − p = q,
where p ∈ (1/2, 1). Your stake Cn on game n must lie between 0 and Zn−1 , where Zn−1
is your fortune at time n − 1. Your object is to maximize the expected ’interest rate’
E[log(ZN /Z0 )] where N is a given integer representing the length of the game, and Z0 ,
your fortune at time 0, is a given constant. Let Fn = σ{ε1 , . . . , εn }. Show that if C is
any previsible strategy, that is if Cn is Fn−1 -measurable for all n, then log Zn − nα is a
supermartingale, where α denotes the entropy
α = p log p + q log q + log 2,
so that E[log(ZN /Z0 )] ≤ N α, but that, for a certain strategy, log Zn − nα is a martingale.
What is the best strategy?
Exercise 9.2.8 Pólya’s urn
Consider an urn that initially contains two balls, one black, one white. One picks at
random one of the balls with equal probability, checks the color, replaces the ball in the
urn and adds another ball of the same color. Then resume the procedure. After step n,
n + 2 balls are in the urn, of which Bn + 1 are black and n + 1 − Bn are white.
1. Show that ((n + 2)−1 (Bn + 1), n ≥ 0) is a martingale with respect to a certain
filtration you should indicate. Show that it converges a.s. and in Lp for all p ≥ 1 to a
[0, 1]-valued random variable X∞ .
2. Show that for every k, the process

( (Bn + 1)(Bn + 2) · · · (Bn + k) / ((n + 2)(n + 3) · · · (n + k + 1)) ,     n ≥ 1 )

is a martingale. Deduce the value of E[X∞^k ], and finally the law of X∞ .
3. Re-obtain this result by directly showing that P (Bn = k) = (n + 1)−1 for every
n ≥ 1, 1 ≤ k ≤ n. As a hint, let Yi be the indicator that the i-th picked ball is black, and
compute P (Yi = ai , 1 ≤ i ≤ n) for any (ai , 1 ≤ i ≤ n) ∈ {0, 1}n .
4. Show that for 0 < θ < 1, (Nnθ , n ≥ 0) is a martingale, where
N^θ_n = ((n + 1)! / (Bn ! (n − Bn )!)) θ^{Bn} (1 − θ)^{n−Bn} .
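For readers who like to experiment before proving, the following sketch (mine, not part of the exercise) simulates the urn and looks at the martingale (Bn + 1)/(n + 2) for large n: its histogram should look approximately uniform on [0, 1], in line with part 3.

```python
import numpy as np

def polya(n_steps, rng):
    """Run one Polya urn started with one black and one white ball;
    return B_n, the number of black balls added after n_steps draws."""
    black, total, added_black = 1, 2, 0
    for _ in range(n_steps):
        if rng.random() < black / total:   # a black ball is picked
            black += 1
            added_black += 1
        total += 1
    return added_black

rng = np.random.default_rng(7)
n_steps, n_runs = 500, 5000
values = np.array([(polya(n_steps, rng) + 1) / (n_steps + 2) for _ in range(n_runs)])
hist, _ = np.histogram(values, bins=5, range=(0.0, 1.0))
print("counts in 5 equal bins (about %d each if uniform):" % (n_runs // 5), hist)
print("mean %.3f (target 0.5), variance %.3f (target 1/12 = %.3f)"
      % (values.mean(), values.var(), 1 / 12))
```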
Exercise 9.2.9 Bayes’ urn
Let U be a uniform random variable on [0, 1], and conditionally on U , let X1 , X2 , . . . be independent Bernoulli random variables with parameter U . Let Bn = Σ_{i=1}^n Xi . Show that
for every n, (B1 , . . . , Bn ) has the same law as the sequence (B1 , . . . , Bn ) in the previous
exercise. Show that Nnθ is a conditional density function of U given B1 , . . . , Bn .
Exercise 9.2.10 Monkey typing ABRACADABRA
A monkey types a text at random on a keyboard, so that each new letter is picked
uniformly at random among the 26 letters of the roman alphabet. Let Xn be the n-th
letter of the monkey’s masterpiece, and let T be the first time when the monkey has typed
the exact word ABRACADABRA
T = inf{n ≥ 0 : (Xn−10 , Xn−9 , . . . , Xn ) = (A, B, R, A, C, A, D, A, B, R, A)}.
Show that E[T ] < ∞. The goal is to give the exact value of E[T ]. For this, suppose that
just before each time n, a player Pn comes and bets 1 gold coin (GC) that Xn will be A.
If he loses, he leaves the game, and if he wins, he earns 26GC, which he entirely plays on
Xn+1 being B. If he loses, he leaves, else he earns 26² GC which he bets on Xn+2 being R, and so on. Show that

E[T ] = 26^{11} + 26^4 + 26.
(Hint: Use exercise 9.2.4) Why is that larger than the average first time the monkey has
typed ABRACADABRI?
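The value 26^{11} + 26^4 + 26 is far too large to check by simulation, but the same betting argument applied to a 3-letter alphabet and the word ABA predicts E[T ] = 3³ + 3 = 30, which can be verified empirically. The sketch below (mine; the reduced alphabet and word are illustrative choices, not from the exercise) does exactly that.

```python
import numpy as np

def first_occurrence(word, alphabet_size, rng):
    """Number of letters typed until `word` first appears in an i.i.d. uniform
    letter stream over an alphabet of the given size."""
    word = np.array(word)
    m = len(word)
    buf, n = [], 0
    while True:
        n += 1
        buf.append(rng.integers(alphabet_size))
        if n >= m and np.array_equal(np.array(buf[-m:]), word):
            return n

rng = np.random.default_rng(8)
word = [0, 1, 0]                  # "ABA" over the alphabet {0, 1, 2}
times = [first_occurrence(word, 3, rng) for _ in range(20000)]
print("empirical E[T] = %.2f, martingale prediction 3^3 + 3 = 30" % np.mean(times))
```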
Exercise 9.2.11
Let (Xn , n ≥ 0) be a sequence of [0, 1]-valued random variables, which satisfy the following
property. First, X0 = a a.s. for some a ∈ (0, 1), and for n ≥ 0,
P( Xn+1 = (1 + Xn )/2 | Fn ) = Xn ,     P( Xn+1 = Xn /2 | Fn ) = 1 − Xn ,

where Fn = σ{Xk , 0 ≤ k ≤ n}. Here, we have denoted P (A|G) = E[1_A |G].
1. Prove that (Xn , n ≥ 0) is a martingale that converges in Lp for every p ≥ 1.
2. Check that E[(Xn+1 − Xn )2 ] = E[Xn (1 − Xn )]/4. Then determine E[X∞ (1 − X∞ )]
and deduce the law of X∞ .
Exercise 9.2.12
Let (Xn , n ≥ 0) be a martingale in L2 . Show that its increments (Xn+1 − Xn , n ≥ 0) are
pairwise orthogonal. Conclude that X is bounded in L2 if and only if
Σ_{n≥0} E[(Xn+1 − Xn )²] < ∞,
and that Xn converges in L2 in this case, without using the L2 convergence theorem for
martingales.
Exercise 9.2.13 Wald’s identity
Let (Xn , n ≥ 0) be a sequence of independent and identically distributed real integrable
random variables, which are not a.s. 0. We let Sn = X1 +. . .+Xn be the associated random
walk, and recall that (Sn − nE[X1 ], n ≥ 0) is a martingale. Let T be an (Fn )-stopping time.
1. Show that

E[|S_{T ∧n} − S_T |] ≤ Σ_{k=n+1}^∞ E[|Xk | 1_{T ≥k} ] ≤ E[|X1 |] E[T 1_{T ≥n+1} ].
Deduce that if E[T ] < ∞, then ST ∧n converges to ST in L1 . Deduce that if E[T ] < ∞,
then E[ST ] = E[X1 ]E[T ].
2. Suppose E[X1 ] = 0 and Ta = inf{n ≥ 0 : Sn > a} for some a > 0. Show that
E[Ta ] = ∞.
3. Let now a < 0 < b and Ta,b = inf{n ≥ 0 : Sn < a or Sn > b}. Assume that
E[X1 ] 6= 0. By discussing separately the cases where X1 is bounded or not, prove that
E[Ta,b ] < ∞ and that E[STa,b ] = E[X1 ]E[Ta,b ].
4. Assume that E[X1 ] = 0. Show that E[Ta,b ] < ∞. Hint: consider again separately the cases when X1 is bounded and unbounded. In the bounded case, think about how far (S_n², n ≥ 0) is from being a martingale.
Exercise 9.2.14 The gambler’s ruin
Let 0 < K < N be integers. Consider a sequence of independent random variables
(Xn , n ≥ 1) with P (Xn = 1) = p = 1 − P (Xn = −1), where p ∈ (0, 1/2) ∪ (1/2, 1). Let
Sn = X1 + . . . + Xn and define
T0 = inf{n ≥ 1 : Sn = 0} ,
TN = inf{n ≥ 1 : Sn = N }.
Show that T := T0 ∧ TN is a.s. finite (and in fact has finite expectation). Then show that,
letting q = 1 − p,
Mn = (q/p)^{Sn} ,     Nn = Sn − (p − q)n ,     n ≥ 0,

define two martingales with respect to the natural filtration of (Sn , n ≥ 1). Compute
P (T0 < TN ) and E[ST ], E[T ].
What happens to this exercise if p = 1/2?
Exercise 9.2.15 Azuma-Hoeffding inequality
1. Let Y be a random variable taking values in [−c, c] for some c > 0, and such that E[Y ] = 0. Show that for every θ ∈ R,

E[e^{θY} ] ≤ cosh(θc) ≤ exp(θ²c²/2).

As a hint, the convexity of z ↦ e^{zθ} entails that

e^{yθ} ≤ ((y + c)/(2c)) e^{cθ} + ((c − y)/(2c)) e^{−cθ} .

Also, state and prove a conditional version of this fact.
2. Let M be a martingale with M0 = 0, and such that there exists a sequence (cn , n ≥ 0) of positive real numbers such that |Mn − Mn−1 | ≤ cn for every n. Show that for x ≥ 0,

P( sup_{0≤k≤n} Mk ≥ x ) ≤ exp( − x² / (2 Σ_{k=1}^n c_k²) ).
As a hint, notice that (eθMn , n ≥ 0) is a submartingale, and optimize over θ.
Exercise 9.2.16 A discrete Girsanov theorem
Let Ω be the space of real-valued sequences (ωn , n ≥ 0) such that lim supn ωn = +∞ and
lim inf n ωn = −∞. We say that such sequences oscillate. Let Fn = σ{Xk , 0 ≤ k ≤ n}
where Xk (ω) = ωk is the k-th projection, and F = F∞ . Show that p = 1/2 is the only real
in (0, 1) such that there exists a probability measure Pp on (Ω, F) that makes (Xn , n ≥ 0)
a simple random walk with step distributions
Pp (X1 = 1) = p = 1 − Pp (X1 = −1).
Let Pp,n be the unique probability measure on (Ω, Fn ) that makes (Xk , 0 ≤ k ≤ n) a
simple random walk with these step distributions. If p ∈ (0, 1) \ {1/2}, identify the
martingale

Mn = dPp,n / dP1/2,n .
Find a finite stopping time T such that E1/2 [MT ] < 1.
Exercise 9.2.17
Let f : [0, 1] → R be a Lipschitz function, i.e. |f (x) − f (y)| ≤ K|x − y| for some K > 0
and every x, y. Let fn be the function obtained by interpolating linearly between the
values of f taken at numbers of the form k2^{−n} , 0 ≤ k ≤ 2^n , and let Mn = f_n′ .
1. Show that Mn is a martingale in some filtration.
2. Deduce that there exists an integrable function g : [0, 1] → R such that f (x) = f (0) + ∫_0^x g(y) dy for almost every 0 ≤ x ≤ 1.
Exercise 9.2.18 Doob’s decomposition of submartingales
Let (Xn , n ≥ 0) be a submartingale.
1. Show that there exists a unique martingale Mn and a unique previsible process
(An , n ≥ 0) such that A0 = 0, A is increasing and X = M + A.
2. Show that M, A are bounded in L1 if and only if X is, and that A∞ < ∞ a.s. in
this case (and even that E[A∞ ] < ∞), where A∞ is the increasing limit of An as n → ∞.
Exercise 9.2.19
Let (Xn , n ≥ 0) be a U.I. submartingale.
1. Show that if X = M + A is the Doob decomposition of X, then M is U.I.
2. Show that for every pair of stopping times S, T , with S ≤ T ,
E[XT |FS ] ≥ XS
Exercise 9.2.20 Quadratic variation
Let (Xn , n ≥ 0) be a square-integrable martingale.
1. Show that there exists a unique increasing previsible process starting at 0, which
we denote by (⟨X⟩n , n ≥ 0), so that (X_n² − ⟨X⟩n , n ≥ 0) is a martingale.
2. Let C be a bounded previsible process. Compute ⟨C · X⟩.
3. Let T be a stopping time; show that ⟨X^T ⟩ = ⟨X⟩^T .
4. (Harder) Show that ⟨X⟩_∞ < ∞ implies that Xn converges as n → ∞, up to a zero probability event. Is the converse true? Show that it is when sup_{n≥0} |Xn+1 − Xn | ≤ K
a.s. for some K > 0.
9.3 Continuous-time processes
Exercise 9.3.1 Gaussian processes
A real-valued process (Xt , t ≥ 0) is called a Gaussian process if for every t1 < t2 < . . . < tk ,
the random vector (Xt1 , . . . , Xtk ) is a Gaussian random vector. Show that the law of a
Gaussian process is uniquely characterized by the numbers E[Xt ], t ≥ 0 and Cov (Xs , Xt )
for s, t ≥ 0.
Exercise 9.3.2
Let T be an exponential random variable with parameter λ > 0. Define
Zt = 1{t ≥ T} ,   Ft = σ{Zs , 0 ≤ s ≤ t} ,   Mt = (1 − e^{λt}) 1{t < T} + 1{t ≥ T} .
Show that E[|Mt |] < ∞ for every t ≥ 0, and that E[Mt 1{T >r} ] = E[Ms 1{T >r} ] for every
r ≤ s ≤ t. Deduce that (Mt , t ≥ 0) is a càdlàg (Ft )-martingale.
Is M bounded in L1? Is it uniformly integrable? Is MT− in L1?
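Before proving the identity E[Mt 1{T>r}] = E[Ms 1{T>r}], it can be reassuring to check it numerically. A minimal Python sketch, assuming numpy; the values of λ, r, s, t below are arbitrary choices:

import numpy as np

# Numerical check of the identity E[M_t 1_{T>r}] = E[M_s 1_{T>r}] stated above,
# for one choice of lambda and r <= s <= t (an illustration, not a proof).
rng = np.random.default_rng(7)
lam, r, s, t = 2.0, 0.3, 0.7, 1.5
T = rng.exponential(1 / lam, size=10**6)

def M(u, T):
    # M_u = 1 - exp(lam*u) on {u < T}, and 1 on {u >= T}
    return np.where(u >= T, 1.0, 1.0 - np.exp(lam * u))

print("E[M_t 1_{T>r}] ~", (M(t, T) * (T > r)).mean())
print("E[M_s 1_{T>r}] ~", (M(s, T) * (T > r)).mean())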
Exercise 9.3.3 Hazard function
Let T be a random variable in (0, ∞) that admits a strictly positive continuous density
f on (0, ∞). Let F(t) = P(T ≤ t), and let
At = ∫_0^t f(s)/(1 − F(s)) ds ,   t ≥ 0,
be the hazard function of T. Show that AT has the law of an exponential random
variable with parameter 1. As a hint, consider the distribution function P (AT ≤ t), t ≥ 0
and write it in terms of the inverse function A−1 .
By letting Zt = 1{t ≥ T}, t ≥ 0 and Ft = σ{Zs , 0 ≤ s ≤ t}, prove that (Zt − AT∧t , t ≥ 0)
is a càdlàg martingale with respect to (Ft , t ≥ 0).
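As an illustration of the first claim (not a proof), one can pick a concrete density, say f(t) = 2t e^{−t²}, so that F(t) = 1 − e^{−t²} and At = t², and compare the empirical law of AT with the exponential law of parameter 1. A minimal Python sketch, assuming numpy:

import numpy as np

# Illustration: T with density f(t) = 2t exp(-t^2) on (0, infinity),
# so F(t) = 1 - exp(-t^2) and A_t = -log(1 - F(t)) = t^2.
rng = np.random.default_rng(1)
u = rng.random(100000)
T = np.sqrt(-np.log(1 - u))            # inverse-CDF sampling of T
A_T = T**2                             # A_T = -log(1 - F(T))

# Compare the empirical tail of A_T with the Exp(1) tail exp(-x).
for x in [0.5, 1.0, 2.0, 3.0]:
    print(f"x={x:3.1f}  P(A_T > x) ~ {(A_T > x).mean():.4f}   exp(-x) = {np.exp(-x):.4f}")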
The next exercises are designed to (hopefully) help those of you who want to gain a
better insight into the nature of filtrations and events related to continuous-time processes.
Exercise 9.3.4
Let C1 be the product σ-algebra on Ω = C([0, 1], R), i.e. the smallest σ-algebra that makes
the maps Xt : ω ↦ ω(t), t ∈ [0, 1], measurable.
Let C2 be the (more natural?) Borel σ-algebra on C([0, 1], R), when endowed with the
uniform norm and the associated topology.
Show that C1 = C2 .
Exercise 9.3.5
Let I be a nonempty real interval. Let Ω = RI be the set of all functions defined on I,
which is endowed with the product σ-algebra F, i.e. the smallest σ-algebra with respect
to which Xt : ω ↦ ω(t) is measurable for every t. Show that
G = ⋃_{J≺I} σ(Xs , s ∈ J)
is a σ-algebra, where J ≺ I stands for J ⊂ I and J is countable. Deduce that G = F.
Show that the set
{ω ∈ Ω : s ↦ Xs(ω) is continuous}
is not measurable with respect to F.
9.4 Weak convergence
Exercise 9.4.1
Let (Xn , n ≥ 1) be a sequence of independent random variables with uniform distribution
on [0, 1]. Let Mn = max(X1 , . . . , Xn ). Show that n(1 − Mn ) converges in distribution as
n → ∞, and determine the limit law.
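A quick simulation (assuming numpy; no substitute for the proof) shows the law of n(1 − Mn) stabilizing as n grows; comparing the empirical quantiles below with those of the limit law you obtain is a useful sanity check:

import numpy as np

# Illustration: the empirical law of n(1 - M_n) stabilizes as n grows,
# consistent with convergence in distribution (the limit is left to the exercise).
rng = np.random.default_rng(2)
trials = 20000
for n in [10, 100, 500]:
    M = rng.random((trials, n)).max(axis=1)        # M_n for each trial
    Z = n * (1 - M)
    print(f"n={n:4d}  quantiles of n(1-M_n) at 25/50/75/90%:",
          np.round(np.quantile(Z, [0.25, 0.5, 0.75, 0.9]), 3))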
Exercise 9.4.2
Let (Xn , n ∈ N ∪ {∞}) be random variables defined on some probability space (Ω, F, P ),
with values in a metric space (M, d).
1. Suppose that Xn → X∞ a.s. as n → ∞. Show that Xn converges to X∞ in
distribution.
2. Suppose that Xn converges in probability to X∞ . Show that Xn converges in
distribution to X∞ .
Hint: use the fact that (Xn , n ≥ 0) converges in probability to X∞ if and only if
for every subsequence extracted from (Xn , n ≥ 0), there exists a further subsequence
converging a.s. to X∞ .
3. Show that if Xn converges in distribution to a constant X∞ = c, then Xn converges
in probability to c.
Exercise 9.4.3
Suppose given sequences (Xn , n ≥ 0), (Yn , n ≥ 0) of real-valued random variables, and
two extra random variables X, Y , such that Xn , Yn respectively converge in distribution
to X, Y . Is it true that (Xn , Yn ) converges in distribution to (X, Y )? Show that this is
true in the following cases
1. For every n, Xn and Yn are independent, as well as X and Y .
2. Y is a.s. constant (Hint: use 3. in the previous exercise).
Exercise 9.4.4
Let m be a probability measure on R. Define, for every n ≥ 0,
mn(dx) = ∑_{k∈Z} m([k2^{−n}, (k + 1)2^{−n})) δ_{k2^{−n}}(dx),
where δz (dx) denotes the Dirac mass at z. Show that mn converges weakly to m.
Exercise 9.4.5
1. Let (Xn , n ≥ 1) be independent exponential random variables with mean 1. Define
Sn = X1 + · · · + Xn , and determine without computation the limit of P(Sn ≤ n) as n → ∞
(Hint: which theorem could be useful here?).
2. Determine, also without computation, the limit of exp(−n) ∑_{k=0}^{n} n^k/k! .
Hint: recall that the Poisson law with parameter λ > 0 is the probability distribution
on Z+ that puts mass e^{−λ} λ^n/n! on the integer n. Moreover, if X, Y are independent
random variables with Poisson laws of parameters λ and µ respectively, then X + Y has
the Poisson law with parameter λ + µ. Using this, make the formula look like the one in
question 1.
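Purely as a sanity check on whatever limit the argument gives (so certainly not "without computation"), the quantity in question 2 can be evaluated for large n; the Python sketch below works in log-space to avoid overflow:

import math

# Sanity check (not a substitute for the argument asked for in the exercise):
# evaluate exp(-n) * sum_{k=0}^{n} n^k / k! for increasing n.
for n in [10, 100, 1000, 5000]:
    log_terms = [k * math.log(n) - math.lgamma(k + 1) for k in range(n + 1)]
    m = max(log_terms)
    s = m + math.log(sum(math.exp(t - m) for t in log_terms))   # log-sum-exp
    print(n, math.exp(s - n))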
Exercise 9.4.6
Let (Yn , n ≥ 0) be a sequence of random variables such that Yn follows a Gaussian N(mn, σn²)
law, and suppose that Yn converges weakly to some Y as n → ∞. Show that there exist
m ∈ R and σ² ≥ 0 such that mn → m, σn² → σ², and that Y is Gaussian N(m, σ²).
Hint: Use characteristic functions, and first show that the variance converges.
Exercise 9.4.7
Let d ≥ 1.
1. Show that a finite family of probability measures on Rd is tight.
2. Assuming Prokhorov’s theorem for probability measures on Rd , show that if
(µn , n ≥ 0) is a sequence of non-negative measures on Rd which is tight (for every ε > 0
there is a compact K ⊂ Rd such that supn≥0 µn (Rd \ K) < ε) and such that
sup_{n≥0} µn(Rd) < ∞,
then there exists a subsequence (nk) along which µnk converges weakly to a limit µ (i.e.
µnk(f) converges to µ(f) for every bounded continuous f).
9.5 Brownian motion
Exercise 9.5.1
Recall that a Gaussian process (Xt , t ≥ 0) in Rd is a process such that for every t1 <
t2 < . . . < tk ∈ R+ , the vector (Xt1 , . . . , Xtk ) is a Gaussian random vector. Show that
the (standard) Brownian motion in Rd is the unique Gaussian process (Bt , t ≥ 0) with
E[Bt ] = 0 for every t ≥ 0 and Cov (Bs , Bt ) = (s ∧ t) Id for every s, t ≥ 0.
Exercise 9.5.2
Let B be a standard real-valued Brownian motion.
1. Show that a.s.,
lim sup_{t↓0} Bt/√t = +∞ ,   lim inf_{t↓0} Bt/√t = −∞.
2. Show that Bn/n → 0 a.s. as n → ∞. Then show that a.s., for n large enough,
sup_{t∈[n,n+1]} |Bt − Bn| ≤ √n, and conclude that Bt/t → 0 a.s. as t → ∞.
3. Show that the process B′ defined by B′t = tB1/t for t > 0 and B′0 = 0 is a standard
Brownian motion (Hint: use Exercise 9.5.1).
4. Use this to show that
lim sup_{t→∞} Bt/√t = +∞ ,   lim inf_{t→∞} Bt/√t = −∞.
Exercise 9.5.3 Around hitting times
Let (Bt , t ≥ 0) be a standard real-valued Brownian motion.
1. Let Tx = inf{t ≥ 0 : Bt = x} for x ∈ R. Prove that Tx has the same distribution as
(x/B1)², and compute its probability distribution function.
2. For x, y > 0, show that
P(T−y < Tx) = x/(x + y) ,   E[Tx ∧ T−y] = xy.
3. Show that if 0 < x < y, the random variable Ty − Tx has the same law as Ty−x , and is
independent of FTx (where (Ft , t ≥ 0) is the natural filtration of B).
Hint: the three questions are independent.
Exercise 9.5.4
Let (Bt , t ≥ 0) be a standard real-valued Brownian motion. Compute the joint distribution
of (Bt , sup0≤s≤t Bs ) for t ≥ 0.
Exercise 9.5.5
Let (Bt , t ≥ 0) be a standard Brownian motion, and let 0 ≤ a < b.
1. Compute the mean and variance of
Xn := ∑_{k=1}^{2^n} ( B_{a+k(b−a)2^{−n}} − B_{a+(k−1)(b−a)2^{−n}} )² .
2. Show that Xn converges a.s. and give its limit.
3. Deduce that a.s. there exists no interval [a, b] with a < b such that B is Hölder
continuous with exponent α > 1/2 on [a, b], i.e. sup_{a≤s<t≤b} |Bt − Bs|/|t − s|^α < ∞.
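As a numerical aside (assuming numpy), one can simulate a single Brownian path on [a, b] at a fine dyadic resolution and evaluate Xn along it for increasing n; the values stabilize, in line with the limit asked for in question 2:

import numpy as np

# Illustration: simulate one Brownian path on [a, b] at dyadic resolution N and
# evaluate X_n for several n along that same path.
rng = np.random.default_rng(3)
a, b, N = 0.0, 1.0, 16
dt = (b - a) / 2**N
fine_increments = rng.normal(0.0, np.sqrt(dt), size=2**N)   # B increments at level N

for n in [2, 6, 10, 14, 16]:
    # group the fine increments into 2^n blocks: each block sum is a level-n increment
    block = fine_increments.reshape(2**n, -1).sum(axis=1)
    print(f"n={n:2d}  X_n = {np.sum(block**2):.5f}")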
Exercise 9.5.6
Let (Bt , t ≥ 0) be a standard Brownian motion. Define G1 = sup{t ≤ 1 : Bt = 0} and
D1 = inf{t ≥ 1 : Bt = 0}.
1. Are these random variables stopping times? Show that G1 has the same distribution
as 1/D1.
2. By applying the Markov property at time 1, compute the law of D1 . Deduce that
of G1 (it is called the arcsine law).
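For intuition only (a crude discrete approximation, assuming numpy), one can approximate G1 by the last sign change of a fine random-walk approximation of B on [0, 1] and compare its empirical distribution function with the arcsine one, (2/π) arcsin(√x):

import numpy as np

# Illustration: approximate G_1 by the last sign change of a discretized Brownian
# path on [0, 1], and compare with the arcsine CDF (2/pi) arcsin(sqrt(x)).
rng = np.random.default_rng(4)
N, trials = 2000, 5000
dt = 1.0 / N
G1 = np.empty(trials)
for i in range(trials):
    B = np.concatenate(([0.0], np.cumsum(rng.normal(0, np.sqrt(dt), N))))
    sign_change = np.nonzero(B[1:] * B[:-1] <= 0)[0]   # indices of sign changes
    G1[i] = (sign_change[-1] + 1) * dt if sign_change.size else 0.0
for x in [0.1, 0.25, 0.5, 0.75, 0.9]:
    print(f"x={x:4.2f}  P(G_1 <= x) ~ {(G1 <= x).mean():.3f}"
          f"   arcsine CDF = {2/np.pi*np.arcsin(np.sqrt(x)):.3f}")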
Exercise 9.5.7
Let (Bt , t ≥ 0) be a standard Brownian motion, and let (Ft , t ≥ 0) be its natural filtration.
Determine all the polynomials f (t, x) of degree less than or equal to 3 in x such that
(f (t, Bt ), t ≥ 0) is a martingale.
Exercise 9.5.8
Let (Bt , t ≥ 0) be a standard Brownian motion in R3 . We let Rt = 1/|Bt |.
1. Show that (Rt , t ≥ 1) is bounded in L2 .
2. Show that E[Rt ] → 0 as t → ∞.
3. Show that (Rt , t ≥ 1) is a supermartingale. Deduce that |Bt | → ∞ as t → ∞, a.s.
Exercise 9.5.9 Zeros of Brownian motion
Let (Bt , t ≥ 0) be a standard real-valued Brownian motion. Let Z = {t ≥ 0 : Bt = 0} be
the set of zeros of B.
1. Show that it is closed, unbounded and has zero Lebesgue measure a.s.
2. By using the stopping times Dq = inf{t ≥ q : Bt = 0} for q ∈ Q+ , show that Z has
no isolated point a.s.
Exercise 9.5.10
Let W0(dw) denote Wiener's measure on Ω0 = {w ∈ C([0, 1]) : w(0) = 0}, and define a
new probability measure W0^{(a)} on Ω0 by
dW0^{(a)}/dW0 (w) = exp(a w(1) − a²/2).
1. Show that under W0^{(a)}, the canonical process Xt : w ↦ w(t) remains Gaussian, and
give its distribution.
2. Show that W0({f ∈ Ω0 : ‖f‖∞ < ε}) > 0 for every ε > 0, where ‖f‖∞ = sup_{0≤t≤1} |f(t)|.
3. Show that for every non-empty open set U ⊂ Ω0, one has W0(U) > 0. Hint: first
note that any such U contains the ε-neighborhood of a piecewise linear function f, for
some ε > 0.
Exercise 9.5.11 Brownian bridge
Let (Bt , 0 ≤ t ≤ 1) be a standard Brownian motion. For y ∈ R, let Z^y_t = yt + (Bt − tB1),
0 ≤ t ≤ 1, and call (Z^y_t , 0 ≤ t ≤ 1) the Brownian bridge from 0 to y. Let W0^y be the
law of (Z^y_t , 0 ≤ t ≤ 1) on C([0, 1]). Show that for any non-negative measurable function
F : C([0, 1]) → R+, setting f(y) = W0^y(F), we have
E[F(B) | B1] = f(B1) ,   a.s.
Hint: find a simple argument entailing that B1 is independent of the process (Bt −tB1 , 0 ≤
t ≤ 1).
Explain why we can interpret W0^y as the law of a Brownian motion ‘conditioned to hit
y at time 1’.
Exercise 9.5.12
Show that the Dirichlet problem on D = B(0, 1) \ {0} in Rd , with boundary conditions
g(x) = 0 for |x| = 1 and g(x) = 1 for x = 0, has no solution for d ≥ 2.
Exercise 9.5.13 Dirichlet problem in the upper-half plane
Let H = {(x, y) ∈ R2 : y > 0}. Let (Bt , t ≥ 0) be a Brownian motion started from x
under the probability measure Px , and let T = inf{t ≥ 0 : Bt ∉ H}.
1. Determine the law of BT under Px whenever x ∈ H.
2. Show that if u is a bounded continuous function on the closed half-plane H̄ = {(x, y) : y ≥ 0}
which is harmonic on H, then
u(x, y) = (1/π) ∫_R u(z, 0) y/((x − z)² + y²) dz.
9.6 Poisson measures, ID laws and Lévy processes
Exercise 9.6.1
Prove that the Poisson law with parameter λ > 0 is the weak limit of the Binomial law
with parameters (n, λ/n) as n → ∞.
A factory makes 500,000 light bulbs in a day. On average, 4 of these are defective.
Estimate the probability that on a given day, exactly 2 of the bulbs produced are
defective.
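As an illustration of the first claim (with λ = 1.5 chosen arbitrarily; only the Python standard library is used), the Binomial(n, λ/n) probabilities of small values approach the corresponding Poisson(λ) probabilities as n grows:

from math import comb, exp, factorial

# Illustration: Binomial(n, lam/n) probabilities of k = 0, 1, 2, 3 approach the
# Poisson(lam) probabilities as n grows (lam = 1.5 is an arbitrary choice).
lam = 1.5
for n in [10, 100, 10000]:
    p = lam / n
    row = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(4)]
    print(f"n={n:6d}  Binomial: {[round(v, 5) for v in row]}")
print(f"Poisson     : {[round(exp(-lam) * lam**k / factorial(k), 5) for k in range(4)]}")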
Exercise 9.6.2 The bus paradox
Why do we always feel we wait a very long time for a bus to arrive? This exercise
gives an indication of why... at least if buses arrive according to a Poisson process.
1. Suppose buses have been circulating in the city day and night forever, the counterpart
being that the drivers do not follow a timetable. Rather, the times of arrival of buses
at a given bus-stop are the atoms of a Poisson measure on R with intensity θdt, where dt
is Lebesgue measure on R. A customer arrives at a fixed time t at the bus-stop. Let S, T
be the two consecutive atoms of the Poisson measure satisfying S < t < T . Show that
the average time E[T − S] that elapses between the arrivals of the last bus before time t
and the first bus after time t is 2/θ. Explain why this is twice the average time between
consecutive buses. Can you see why this is so?
2. Suppose that buses start circulating at time 0, so that arrivals of buses at the
station are now the jump times of a Poisson process with intensity θ on R+ . If the
customer arrives at time t, show that the average elapsed time between the bus arriving
before him (at time S) and the bus arriving after him (at time T) is θ^{−1}(2 − e^{−θt})
(with the convention S = 0 if no atom has fallen in [0, t]).
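A simulation of question 1 (assuming numpy; the window of length 100 around t = 50 is an arbitrary truncation of the real line, large enough that the boundary is essentially never seen) recovers a mean straddling gap close to 2/θ:

import numpy as np

# Illustration of question 1: atoms of a Poisson measure of intensity theta on a
# window around the observation time t; the gap straddling t has mean about 2/theta.
rng = np.random.default_rng(5)
theta, t, window, trials = 1.0, 50.0, 100.0, 20000
gaps = np.empty(trials)
for i in range(trials):
    while True:
        n = rng.poisson(theta * window)
        atoms = np.sort(rng.uniform(0.0, window, size=n))
        j = np.searchsorted(atoms, t)           # index of the first atom after t
        if 0 < j < n:                           # need an atom on each side of t
            gaps[i] = atoms[j] - atoms[j - 1]   # T - S, the gap straddling t
            break
print("mean straddling gap:", round(gaps.mean(), 3), " vs  2/theta =", 2 / theta)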
Exercise 9.6.3
Prove Proposition 7.3.2.
Exercise 9.6.4
Check the marking property of Poisson random measures: if M(dx) = ∑_{i∈I} δ_{xi}(dx) is
a Poisson random measure on (E, E) with intensity µ, and if (yi , i ∈ I) are i.i.d. random
variables with law ν on some measurable space (F, F), independent of M, then
∑_{i∈I} δ_{(xi ,yi)}(dx dy) is a Poisson random measure on E × F with intensity µ ⊗ ν.
Exercise 9.6.5 Brownian motion and the Cauchy process
Let (Bt = (Bt^1, Bt^2), t ≥ 0) be a standard Brownian motion in R2 (i.e. B0 = 0). Recall
that the Cauchy law with parameter a > 0 has probability density a/(π(a² + x²)), x ∈ R.
We let
Ca = inf{t ≥ 0 : Bt^2 = −a} ,   a ≥ 0.
Prove that the process (B^1_{Ca} , a ≥ 0) is a Lévy process such that B^1_{Ca} follows the Cauchy
law with parameter a for every a > 0. Does this remind you of a previous exercise?
Index
càdlàg, 29

Blumenthal's 0 − 1 law, 47
Branching process, 23
Brownian motion, 45
  (Ft)-Brownian motion, 48
  finite marginal distributions, 45
  standard, 45

Central limit theorem, 40
Characteristic exponent, 70
Compound Poisson distribution, 68
Compound Poisson process, 67
Conditional convergence theorems, 9
Conditional density functions, 11
Conditional expectation
  discrete case, 5
  for L1 random variables, 6
  for non-negative random variables, 8
Conditional Jensen inequality, 9
Convergence in distribution, 39

Dirichlet problem, 55
Donsker's invariance principle, 59
Doob's Lp inequality, 19, 34
Doob's maximal inequality, 18
Doob's upcrossing lemma, 17

Exterior cone condition, 55

Filtration, 13
  filtered space, 13
  natural filtration, 13
Finite marginal distributions, 31
First entrance time, 14, 30
First hitting times for Brownian motion, 50, 51

Harmonic function, 56

Infinitely divisible distribution, 69
Intensity measure, 63

Kakutani's product-martingales theorem, 26
Kolmogorov's 0 − 1 law, 23
Kolmogorov's continuity criterion, 35

Laplace functional, 65
Last exit time, 14
Law of large numbers, 23
Lévy process, 66, 71
Lévy triple, 69
Lévy's convergence theorem, 41
Lévy-Itō theorem, 73
Lévy-Khintchine formula, 70
Likelihood ratio test, 28

Martingale, 14
  backwards, 21
  closed, 19
  complex-valued, 51
  convergence theorem
    almost-sure, 17, 34
    for backwards martingales, 21
    in L1, 20, 34
    in Lp, p > 1, 19, 34
  regularization, 32
  uniformly integrable, 20
Martingale transform, 15

Optional stopping
  for discrete-time martingales, 16
  for uniformly integrable martingales, 20, 34

Poisson point process, 66
Poisson random measure, 63
Prokhorov's theorem, 40

Radon-Nikodym theorem, 25
Recurrence, 53
Reflection principle, 49

Scaling property of Brownian motion, 47
Separable σ-algebra, 25
Simple Markov property
  for Brownian motion, 47
  for Lévy processes, 66
Skorokhod's embedding, 60
Stochastic process, 13
  adapted, 13
  continuous-time, 29
  discrete-time, 15
  integrable, 13
  previsible, 15
  stopped process, 15
Stopping time
  definition, 14
  measurable events before T, 15
Strong Markov property, 49
Submartingale, 14
Supermartingale, 14

Taking out ‘what is known’, 10
Tightness, 40
Tower property, 10
Transience, 53

Versions of a process, 31

Weak convergence, 37
Wiener space, 46
Wiener's measure, 46, 48
Wiener's theorem, 45