Stochastic Process and Applications
Kazufumi Ito∗
October 6, 2014
Abstract
In this monograph we cover basic probability theory and stochastic analysis and their applications in a wide range of science and engineering, including PDE theory, statistics, filtering, data assimilation, parameter estimation, stochastic optimal control, game theory, and financial mathematics. Essentially every modeled process is stochastic at least in part, and its analysis requires probability theory. Stochastic systems and processes play a fundamental role in mathematical models of phenomena in many fields of science, engineering, and economics.
1 Probability Theory
In this section we discuss the basic concepts and theory of probability and stochastic processes. The central objective of probability theory is to develop the mathematical tools to analyze random variables, stochastic processes, and random events. It provides a systematic mathematical approach for analyzing a wide class of random phenomena.
1.1 Probability Triple
We introduce the probability triple (Ω, F, P ), which is the foundation of the probability analysis.
Let Ω be a set and F a collection of subsets of Ω. A point ω ∈ Ω is a sample and A ∈ F is an event. The probability measure P assigns to each event A ∈ F a number 0 ≤ P(A) ≤ 1, the probability that event A occurs:
Definition The triple (Ω, F, P ) is the probability triple if
(1) F is a σ-algebra, i.e.,

Ω ∈ F, A ∈ F ⇒ A^c ∈ F, Fn ∈ F ⇒ ∪n Fn ∈ F.

(2) P is σ-additive: for a sequence of disjoint events {An} in F,

P(∪_{n=1}^∞ An) = Σ_{n=1}^∞ P(An),

and P(Ω) = 1, P(A^c) = 1 − P(A) for A ∈ F.

Since (∪n Fn)^c = ∩n Fn^c, the countable intersection satisfies ∩n Fn ∈ F.
∗
Department of Mathematics, North Carolina State University, Raleigh, North Carolina, USA
Since Ω = A ∪ A^c is a disjoint union,

P(A) + P(A^c) = P(Ω) = 1.
The σ-additivity of the measure is equivalent to the monotone continuity of the measure:
Theorem (Monotone Convergence) The measure P is σ-additive if and only if for every nondecreasing sequence {Ak} of events with A = ∪_{k≥1} Ak, lim_{n→∞} P(An) = P(A).
Proof: Since {Ak} is nondecreasing, A = A1 ∪ (A2 \ A1) ∪ (A3 \ A2) ∪ · · · is a disjoint union. By σ-additivity,

P(A) = P(A1) + P(A2 \ A1) + P(A3 \ A2) + · · · = P(A1) + (P(A2) − P(A1)) + (P(A3) − P(A2)) + · · · = lim_{n→∞} P(An).
Conversely, let A1, A2, · · · ∈ F be pairwise disjoint with ∪_{k=1}^∞ Ak ∈ F. Then

P(∪_{k=1}^∞ Ak) = P(∪_{k=1}^n Ak) + P(∪_{k=n+1}^∞ Ak).

Since ∪_{k=n+1}^∞ Ak ↓ ∅ as n → ∞, we have

lim_{n→∞} P(∪_{k=n+1}^∞ Ak) = 0,

and thus

P(∪_{k=1}^∞ Ak) = lim_{n→∞} Σ_{k=1}^n P(Ak) = Σ_{k=1}^∞ P(Ak).
Examples (σ-algebra)
F0 = {Ω, ∅},
F ∗ = all subsets of Ω.
Let A be a subset of Ω; the σ-algebra generated by A is

FA = {Ω, ∅, A, A^c}.

Let A, B be subsets of Ω; the σ-algebra generated by A, B is
FA,B = {Ω, ∅, A, Ac , B, B c , A ∩ B, A ∪ B, Ac ∩ B c , Ac ∪ B c , Ac ∩ B, Ac ∪ B, A ∩ B c , A ∪ B c }.
A finite set of subsets A1, A2, · · · , An of Ω which are pairwise disjoint and whose union is Ω is called a partition of Ω. It generates the σ-algebra A = {∪_{j∈J} Aj : J ⊂ {1, · · · , n}}. This σ-algebra has 2^n elements. Every finite σ-algebra is of this form; the smallest nonempty elements A1, · · · , An of such an algebra are called atoms.
Example (Countable measure) Let Ω have a countable decomposition {Dk}, i.e.,

Ω = ∪k Dk, Di ∩ Dj = ∅, i ≠ j.

Let F = F∗ and P(Dk) = αk > 0 with Σk αk = 1. For the Poisson random variable X with parameter λ > 0,

Dk = {X = k}, P(Dk) = e^{−λ} λ^k/k!.
Example (Coin Tossing) If the cardinality of Ω is finite, then naturally we let F = F∗, and P({ω}), ω ∈ Ω, defines a measure on (Ω, F), i.e., P(A) = Σ_{ω∈A} P(ω) for A ∈ F. For example, tossing a coin n times independently is formulated as

Ω = {ω = (b1, · · · , bn), bi = 0, 1}

and P(ω) = p^{Σ bi} q^{n−Σ bi}, where p is the probability that "Head" appears and q is the probability that "Tail" appears. The cardinality of Ω is 2^n in this case.
For the case of an infinite number of coin tosses, Ω is the set of binary sequences:

Ω = {ω = (b1, b2, · · · ), bi = 0, 1}.

Each number x ∈ [0, 1) has the binary expansion

x = Σ_{k=1}^∞ bk/2^k.

Thus, Ω has the cardinality of the continuum. Suppose p = q = 1/2 and all samples ω ∈ Ω have the same probability. Since the set [0, 1) is uncountable, P(ω) = 0 for each ω ∈ Ω. The sets [1/2, 1) = {"Head" appears at the first toss} and [0, 1/2) = {"Tail" appears at the first toss} should have probability 1/2. But if we use F = F∗, then P(A) = Σ_{ω∈A} P(ω) = 0. This suggests that F∗ does not lead very far: F∗ is too big. This leads to the concept of the Borel σ-algebra of a topological space. In summary, for the probability space (Ω, F, P), F must be closed with respect to countable unions, intersections, and complements, and P must be assigned to all A ∈ F and be σ-additive.
Definition For any set C of subsets of Ω, we define σ(C) as the smallest σ-algebra A which contains C. The σ-algebra σ(C) is the intersection of all σ-algebras which contain C; it is again a σ-algebra, i.e.,

A = ∩α Aα,

where {Aα} runs over all σ-algebras that contain C.
The following construction of the measure space and probability measure is essential for the
triple (Ω, F, P ) on uncountable space Ω.
Construction of probability measure If (E, O) is a topological space, where O is the set of open sets in E, then the σ-algebra B(E) generated by O is called the Borel σ-algebra of the topological space E, (E, B(E)) defines a measure space, and a set B ∈ B(E) is called a Borel set. For example, (R^n, B(R^n)) is a measure space and B(R^n) is the σ-algebra generated by the open balls in R^n.
Caratheodory Theorem Let B = σ(A), the smallest σ-algebra containing an algebra A of subsets of E, and let µ0 be a σ-additive measure on (E, A). Then there exists a unique measure µ on (E, B) which is an extension of µ0, i.e., µ(A) = µ0(A), A ∈ A.
Definition (Measurable functions) A map f from a measure space (X, A) to another measure space (Y, B) is called measurable if f^{−1}(B) = {x ∈ X : f(x) ∈ B} ∈ A for all B ∈ B. If f : (R^n, B(R^n)) → (R^m, B(R^m)) is measurable, we say f is a Borel function.
For example, for f(x) = x^2 on (R, B(R)) one has f^{−1}([1, 4]) = [1, 2] ∪ [−2, −1]. In general every continuous function f : R^n → R^m is a Borel function, since the inverse images of open sets in R^m are open in R^n.
Definition (Random Variable) A function X : Ω → R is called a random variable if it is a measurable map from (Ω, F) to (R, B(R)), i.e., X^{−1}(B) = {ω ∈ Ω : X(ω) ∈ B} ∈ F for all Borel sets B. Every random variable X defines a σ-algebra FX = {X^{−1}(B) : B ∈ B(R)}, which is called the σ-algebra generated by X.
Definition (Induced measure and Distribution) Let X be a random variable. Then we define the induced measure on (R, B(R)) by

µ(B) = P(X^{−1}(B)), B ∈ B(R),

and the distribution function by

F(x) = P(X(ω) ≤ x), x ∈ R.

Then F satisfies: x ∈ R → F(x) ∈ R+ is nondecreasing and right continuous, the left limit exists everywhere, and F(−∞) = lim_{x→−∞} F(x) = 0, F(∞) = lim_{x→∞} F(x) = 1. Such a function F is called a distribution function on R.
Example (Random Variable and Probability measure) Let Ω = R and B(R) be the Borel σ-algebra. Note that

(a, b] = ∩n (a, b + 1/n), [a, b] = ∩n (a − 1/n, b + 1/n) ∈ B(R).

Thus, B(R) coincides with the σ-algebra generated by the semi-closed intervals. Let A be the algebra of finite disjoint sums of semi-closed intervals (ai, bi] and define P0 by

P0(Σ_{k=1}^n (ak, bk]) = Σ_{k=1}^n (F(bk) − F(ak)),

where F is a distribution function on R. We obtain the measure P on (R, B(R)) by the Caratheodory Theorem and thus a random variable X(ω) = ω on (Ω, F) = (R, B(R)). That is, a random variable X is uniquely identified with its distribution function F. For example, if F(x) = x, P0 is the Lebesgue measure dx on (R, B(R)).
We now prove that P0 is σ-additive on A. By the monotone convergence theorem it suffices to prove that

P0(An) ↓ 0 whenever An ↓ ∅, An ∈ A.

Without loss of generality one can assume that An ⊂ [−N, N]. Let ε > 0. Since F is right continuous, for each An there exists a set Bn ∈ A whose closure is contained in An and

P0(An) − P0(Bn) ≤ ε 2^{−n}.

The collection of open sets {[−N, N] \ B̄n} is a covering of the compact set [−N, N] since ∩n Bn = ∅. By the Heine-Borel theorem there exists a finite subcovering:

∪_{n=1}^{n0} ([−N, N] \ B̄n) = [−N, N],

and thus ∩_{n=1}^{n0} Bn = ∅. Since An0 ⊂ Ak for k ≤ n0,

P0(An0) = P0(An0 \ ∩_{k=1}^{n0} Bk) ≤ P0(∪_{k=1}^{n0} (Ak \ Bk)) ≤ Σ_{k=1}^{n0} P0(Ak \ Bk) ≤ ε.

Since ε > 0 is arbitrary, P0(An) → 0 as n → ∞.
1.2 Exercises

Problem 1 Show that P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Problem 2 Show that ∩α Fα is a σ-algebra.
Problem 3 Let X be a random variable. Show that {X^{−1}(B) : B ∈ B(R)} is a σ-algebra.
Problem 4 Show that X is a random variable if and only if {ω : X(ω) ≤ a} ∈ F for all a ∈ R.
Problem 5 Show that a distribution function has at most countably many discontinuities.
Problem 6 Show that if f is a Borel function and X is a random variable then f(X) is a random variable.

1.3 Expectation
In this section we define the expectation of a random variable X on (Ω, F, P).

Definition (Simple random variable) A simple random variable X is defined by

X(ω) = Σ_{k=1}^n xk I_{Ak}(ω),

where {Ak} is a partition of Ω, i.e., the Ak ∈ F are disjoint and ∪k Ak = Ω. The expectation of X is given by

E[X] = Σk xk P(Ak).
Theorem (Approximation) For every random variable X(ω) ≥ 0 there exists a sequence of simple random variables {Xn} such that 0 ≤ Xn(ω) ≤ X(ω) and Xn(ω) ↑ X(ω) for all ω ∈ Ω.
Proof: For n ≥ 1, define a sequence of simple random variables by

Xn(ω) = Σ_{k=1}^{n2^n} ((k − 1)/2^n) I_{k,n}(ω) + n I_{X(ω)>n},   (1.1)

where I_{k,n} is the indicator function of the set {ω : (k − 1)/2^n < X(ω) ≤ k/2^n}. It is easy to verify that Xn(ω) is monotonically nondecreasing in n and Xn(ω) ≤ X(ω), and thus Xn(ω) → X(ω) for all ω ∈ Ω.

Definition (Expectation) For a nonnegative random variable X we define the expectation by
E[X] = lim_{n→∞} E[Xn],

where the limit exists since {E[Xn]} is a nondecreasing sequence of numbers.
Note that X = X+ − X− with X+(ω) = max(0, X(ω)), X−(ω) = max(0, −X(ω)). So we can apply the Theorem and Definition above to X+ and X− and set

E[X] = E[X+] − E[X−].

If E[X+], E[X−] < ∞, X is integrable and

E[|X|] = E[X+] + E[X−].
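As a numerical illustration of E[X] = lim E[Xn], the sketch below applies the dyadic approximation (1.1) pointwise to samples of an Exp(1) random variable, for which E[X] = 1; the helper name `dyadic_approx` is ours.

```python
import random

def dyadic_approx(x, n):
    """Simple-function approximation X_n from (1.1): the value (k-1)/2^n on the
    dyadic cell ((k-1)/2^n, k/2^n] of (0, n], and n for x > n."""
    if x > n:
        return float(n)
    return int(x * 2 ** n) / 2 ** n

random.seed(0)
xs = [random.expovariate(1.0) for _ in range(100000)]

# Sample averages of X_n for increasing n: nondecreasing, tending to E[X] = 1.
e = [sum(dyadic_approx(x, n) for x in xs) / len(xs) for n in (1, 2, 4, 8)]
assert all(a <= b + 1e-12 for a, b in zip(e, e[1:]))
assert abs(e[-1] - 1.0) < 0.05
```

The monotonicity of the sample averages reflects the pointwise monotonicity Xn(ω) ↑ X(ω) in the Theorem above.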
Corollary Let µX be the induced distribution of a random variable X, i.e.,

µX(x) = P({X(ω) ≤ x}).

Then, for a Borel function f : R → R and X ≥ 0 we have as n → ∞

E[f(Xn)] = Σ_{k=1}^{n2^n} f((k − 1)/2^n)(µX(k/2^n) − µX((k − 1)/2^n)) + f(n)(1 − µX(n)) → ∫_0^∞ f(x) dµX(x),

which is the Lebesgue-Stieltjes integral with respect to the measure dµX. Thus, in general,

E[f(X)] = ∫_Ω f(X(ω)) P(dω) = ∫_{−∞}^∞ f(x) dµX(x).
Definition (Absolutely continuous) The measure Q is absolutely continuous with respect to the measure P if

Q(A) = 0 whenever P(A) = 0

for all A ∈ F.
Theorem (Radon-Nikodym derivative) If Q is absolutely continuous with respect to the measure P, then there exists a nonnegative random variable f such that

∫_A dQ(ω) = ∫_A f(ω) dP(ω)

for A ∈ F, i.e.,

dQ/dP(ω) = f(ω) almost surely (the Radon-Nikodym derivative).

If µX is absolutely continuous with respect to the Lebesgue measure dx, then the probability density function p(x) ≥ 0 of X exists, i.e.,

dµX/dx = p(x),

and

E[f(X)] = ∫_Ω f(X(ω)) P(dω) = ∫_{−∞}^∞ f(x) p(x) dx.
Definition (Independent Random variables) Random variables X and Y are independent if

P({X(ω) ≤ x} ∩ {Y(ω) ≤ y}) = P({X(ω) ≤ x}) P({Y(ω) ≤ y})

for all x, y. Since the σ-algebra generated by the semi-closed intervals (−∞, x], x ∈ R, equals B(R), the definition is equivalent to

P(X^{−1}(B1) ∩ Y^{−1}(B2)) = P(X^{−1}(B1)) P(Y^{−1}(B2))

for all Borel sets B1, B2.
Theorem (Independent Random Variables) For independent random variables X and Y and Borel functions f, g, if f(X) and g(Y) are integrable, then the product f(X)g(Y) is integrable and

E[f(X)g(Y)] = E[f(X)] E[g(Y)].

Proof: If X and Y are simple random variables, i.e.,

X = Σk xk I_{Ak}, Y = Σj yj I_{Bj},

where {Ak} and {Bj} are partitions of (Ω, F) with Ak = {X = xk}, Bj = {Y = yj}, then

E[f(X)g(Y)] = Σ_{k,j} f(xk) g(yj) P(Ak ∩ Bj) = Σk f(xk) P(Ak) Σj g(yj) P(Bj) = E[f(X)] E[g(Y)],

since by independence

P(Ak ∩ Bj) = P(Ak) P(Bj).

In general we approximate X, Y by sequences of simple random variables via the approximation (1.1).

Theorem (Convolution) If X and Y are independent random variables with F(x) = P(X ≤ x) and G(y) = P(Y ≤ y), then

P(X + Y ≤ z) = ∫ F(z − y) dG(y).

The integral on the right-hand side is called the convolution of F and G and is denoted by (F ∗ G)(z).
Proof: Let h(x, y) = I_{x+y≤z} and let µ and ν be the probability measures with distribution functions F and G. For fixed y,

∫ h(x, y) µ(dx) = ∫ I_{(−∞,z−y]}(x) µ(dx) = F(z − y).

Thus,

P(X + Y ≤ z) = ∫∫ I_{x+y≤z} µ(dx) ν(dy) = ∫ F(z − y) ν(dy) = ∫ F(z − y) dG(y).
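The convolution formula can be checked numerically. In the sketch below X, Y ~ Exp(1), for which F ∗ G is the Gamma(2, 1) distribution function 1 − e^{−z}(1 + z); this closed form is a standard fact, not taken from the text.

```python
import math
import random

random.seed(1)
# Monte Carlo estimate of P(X + Y <= z) for independent X, Y ~ Exp(1).
N = 100000
samples = [random.expovariate(1.0) + random.expovariate(1.0) for _ in range(N)]
z = 2.0
mc = sum(s <= z for s in samples) / N

# (F * G)(z) = ∫ F(z - y) dG(y), here with dG(y) = e^{-y} dy, on a grid.
dy = 1e-3
conv = sum((1 - math.exp(-(z - y))) * math.exp(-y) * dy
           for y in (i * dy for i in range(int(z / dy))))

exact = 1 - math.exp(-z) * (1 + z)   # Gamma(2,1) distribution function
assert abs(conv - exact) < 1e-2
assert abs(mc - exact) < 1e-2
```

Both the direct simulation of X + Y and the Riemann-sum evaluation of ∫ F(z − y) dG(y) agree with the closed form.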
For a random vector X(ω) = (X1(ω), · · · , Xn(ω)) in R^n we define the mean m and the variance (covariance matrix) R by

m = E[X] = ∫_{R^n} x dµX(x),

Rij = E[(Xi − mi)(Xj − mj)] = ∫_{R^n} (xi − mi)(xj − mj) dµX(x).

The variance matrix R is symmetric and nonnegative definite since

ξ^t R ξ = E[|Σk ξk (Xk − mk)|^2] ≥ 0

for all ξ ∈ R^n.
Example (Gaussian Random Variable) For m ∈ R^n and a symmetric positive definite matrix R > 0 on R^n,

p(x) = (2π)^{−n/2} |R|^{−1/2} e^{−(x−m)^t R^{−1} (x−m)/2}

is the probability density function of a random vector X, where |R| = det(R), the determinant of R, i.e.,

µ(B) = ∫_B p(x) dx, B ∈ B(R^n),

defines the Gaussian distribution on R^n. Thus, a Gaussian random variable is completely determined by its statistics (m, R).
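For n = 2 the density can be written out explicitly, with the 2 × 2 inverse and determinant computed by hand. The sketch below (values of m and R are arbitrary choices for illustration) checks that the density integrates to 1 over a box carrying essentially all the mass.

```python
import math

m = [1.0, -2.0]
R = [[2.0, 0.6], [0.6, 1.0]]          # symmetric positive definite

def gauss_pdf(x, m, R):
    """Density (2π)^{-1} |R|^{-1/2} exp(-(x-m)^t R^{-1} (x-m) / 2) for n = 2."""
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    inv = [[R[1][1] / det, -R[0][1] / det], [-R[1][0] / det, R[0][0] / det]]
    d = [x[0] - m[0], x[1] - m[1]]
    quad = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

# µ(B) = ∫_B p(x) dx; a Riemann sum over [m1-6, m1+6] x [m2-6, m2+6] ≈ 1.
h = 0.05
total = sum(gauss_pdf((m[0] + i * h, m[1] + j * h), m, R) * h * h
            for i in range(-120, 121) for j in range(-120, 121))
assert abs(total - 1.0) < 1e-2
```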
Square integrable Random variables Define

L^2(Ω, P) = the space of square integrable random variables, E[|X|^2] < ∞.

Then L^2(Ω, P) is a vector space, i.e.,

E[|αX + βY|^2] ≤ 2(α^2 E[|X|^2] + β^2 E[|Y|^2]),

since

|E[XY]| ≤ √(E[|X|^2]) √(E[|Y|^2]).

In fact, if we define the inner product by

(X, Y)_{L^2} = E[XY],

then it is a complete inner product space, i.e., a Hilbert space.
1.4 Stochastic Process

A stochastic process is a parametrized collection of random variables {Xt}_{t∈T} defined on a probability space (Ω, F, P) and taking values in a state space S. The parameter set T is either discrete time T = {0, 1, · · · } or T = [0, ∞). The state space S is a complete metric space. That is, for each t ∈ T,

ω ∈ Ω → Xt(ω) ∈ S is a random variable.

On the other hand, for each ω ∈ Ω,

t → Xt(ω)

defines a sample path of Xt. Thus, Xt(ω) represents the value at time t ∈ T of a sample ω ∈ Ω and it may be regarded as a function of two variables:

(t, ω) ∈ T × Ω → X(t, ω) ∈ S,

and we assume that X(t, ω) is jointly measurable on (T × Ω, B(T) × F).
Let Ω be a subset of the product space S^T of functions from T to S. The σ-algebra F contains the σ-algebra B(S^T) generated by the cylinder sets of the form

{ω : Xt1 ∈ B1, · · · , Xtn ∈ Bn}   (1.2)

for all t1, · · · , tn ∈ T, n ∈ N, and Borel sets Bk in B(S). Therefore, we adopt the point of view that a stochastic process is a probability measure on the measure space (S^T, B(S^T)).
Definition (Finite dimensional distribution) The finite dimensional distributions of the stochastic process Xt are the measures µ_{t1,··· ,tn} defined on S^n by

µ_{t1,··· ,tn}(F1 × · · · × Fn) = P(Xt1 ∈ F1, · · · , Xtn ∈ Fn)

for all tk ∈ T, n ∈ N, and Borel sets Fk of S. The family of finite dimensional distributions determines the statistical properties of the process Xt. Conversely, given a family {ν_{t1,··· ,tn}, tk ∈ T, n ∈ N} of probability measures on S^n satisfying the two natural consistency conditions below, Kolmogorov's extension theorem allows us to construct a stochastic process:

Theorem (Kolmogorov's extension theorem) For all t1, · · · , tn, let ν_{t1,··· ,tn} be probability measures on S^n satisfying

ν_{tπ(1),··· ,tπ(n)}(B1 × · · · × Bn) = ν_{t1,··· ,tn}(B_{π^{−1}(1)} × · · · × B_{π^{−1}(n)})

for all permutations π on {1, · · · , n}, and

ν_{t1,··· ,tn}(B1 × · · · × Bn) = ν_{t1,··· ,tn,tn+1,···tn+m}(B1 × · · · × Bn × S × · · · × S).

Then there exists a probability space (Ω, F, P) and a stochastic process Xt on Ω such that

ν_{t1,··· ,tn}(B1 × · · · × Bn) = P(Xt1 ∈ B1, · · · , Xtn ∈ Bn)

for all tk ∈ T, n ∈ N, and all Borel sets Bk.
Example (Brownian motion) A stochastic process Bt, t ≥ 0, is called a Brownian motion if it satisfies the following conditions:
i) For all 0 ≤ t1 < · · · < tn the increments Btn − Btn−1, · · · , Bt2 − Bt1 are independent random variables.
ii) If 0 ≤ s < t, the increment Bt − Bs has the normal distribution N(0, t − s).
Based on these conditions we have

ν_{t1,··· ,tn}(F1 × · · · × Fn) = ∫_{F1×···×Fn} p(t1, x, x1) p(t2 − t1, x1, x2) · · · p(tn − tn−1, xn−1, xn) dx1 · · · dxn,

where

p(t, x, y) = (2πt)^{−n/2} e^{−|x−y|^2/(2t)}.

We will discuss the properties of the Brownian motion in a later chapter. It will be shown that t → Bt(ω) is continuous.
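Conditions i) and ii) translate directly into a simulation: a path is the cumulative sum of independent N(0, dt) increments. The sketch below (a minimal illustration, not a statement about any particular numerical scheme) checks two consequences: Var(B1) = 1 and the orthogonality of increments E[(B1 − B_{1/2}) B_{1/2}] = 0.

```python
import math
import random

random.seed(2)

def brownian_path(T=1.0, steps=128):
    """Sample path on [0, T] via independent increments B_{t+dt} - B_t ~ N(0, dt)."""
    dt = T / steps
    b, path = 0.0, [0.0]
    for _ in range(steps):
        b += random.gauss(0.0, math.sqrt(dt))
        path.append(b)
    return path

paths = [brownian_path() for _ in range(20000)]
b1 = [p[-1] for p in paths]    # B_1
bh = [p[64] for p in paths]    # B_{1/2}
var1 = sum(x * x for x in b1) / len(b1)
cov = sum((x - y) * y for x, y in zip(b1, bh)) / len(b1)
assert abs(var1 - 1.0) < 0.05   # Var(B_1) = 1 by ii)
assert abs(cov) < 0.05          # independent increments by i)
```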
Measure space (C, B(C)) Let T = [0, 1] and let C be the space of continuous functions xt on T. Let B0 be the σ-algebra generated by the open sets of C with respect to the metric

d(x, y) = max_{t∈[0,1]} |xt − yt|, x, y ∈ C.

We show that B0 coincides with the σ-algebra generated by the cylinder sets (1.2). Let B = {x ∈ C : xt0 < b} be a cylinder set. It is easy to see that B is an open set. Hence every cylinder set of the form {x : xt1 < b1, · · · , xtn < bn} ∈ B0, and thus B(C) ⊂ B0. Conversely, consider an open ball Bρ(x0) = {x : d(x, x0) < ρ}. Since each element of C is continuous,

Bρ(x0) = {y ∈ C : max_{t∈T} |yt − x0t| < ρ} = ∪m ∩_{tk} {y ∈ C : |ytk − x0tk| ≤ ρ − 1/m},

where {tk} are the rational points of [0, 1]. Therefore, B0 ⊂ B(C).

1.5 Convergence of Stochastic Process
In this section we discuss the convergence of a sequence of random variables {Xn}.
Theorem If {Xn} is a sequence of random variables, then so are

infn Xn, supn Xn, lim supn Xn, lim infn Xn.

Proof: The theorem follows from the facts:

{infn Xn < a} = ∪n {Xn < a} ∈ F,
{supn Xn > a} = ∪n {Xn > a} ∈ F,
lim infn Xn = supn (inf_{m≥n} Xm),
lim supn Xn = infn (sup_{m≥n} Xm).
From the Theorem, we see that

{ω : limn Xn(ω) exists} = {ω : lim infn Xn(ω) = lim supn Xn(ω)}

is a measurable set.
Borel-Cantelli Lemma If Σn P(An) < ∞, then P(An occurs infinitely many times) = 0.
Proof: Note that

{An occurs infinitely many times} = lim sup An = ∩_{n=1}^∞ ∪_{k≥n} Ak.

Thus,

P(An occurs infinitely many times) = lim_{n→∞} P(∪_{k≥n} Ak) ≤ lim_{n→∞} Σ_{k≥n} P(Ak) = 0.
Fatou's Lemma Let {fn} be a sequence of non-negative measurable functions on a measure space (E, B, µ). Define the function f* : E → [0, ∞] as the pointwise limit inferior

f*(x) = lim inf_{n→∞} fn(x), x ∈ E.

Then f* is measurable and

∫_E f* dµ ≤ lim inf_{n→∞} ∫_E fn dµ.
Proof: For every k define pointwise the function

gk = inf_{n≥k} fn.

Then the sequence {gk} is nondecreasing and converges pointwise to f*. Since gk ≤ fn for all n ≥ k, we have

∫_E gk dµ ≤ ∫_E fn dµ,

and hence

∫_E gk dµ ≤ inf_{n≥k} ∫_E fn dµ.

Using the monotone convergence theorem for the first equality, then the last inequality, and finally the definition of the limit inferior, it follows that

∫_E f* dµ = lim_{k→∞} ∫_E gk dµ ≤ lim_{k→∞} inf_{n≥k} ∫_E fn dµ = lim inf_{n→∞} ∫_E fn dµ.
Theorem (Lebesgue's Dominated Convergence Theorem) Let {fn} be a sequence of real-valued measurable functions on a measure space (E, B, µ). Suppose that the sequence converges pointwise to a function f a.e. and is dominated by some integrable function g in the sense that |fn(x)| ≤ g(x) a.e. Then f is integrable and

lim_{n→∞} ∫_E |fn − f| dµ = 0,

which also implies

lim_{n→∞} ∫_E fn dµ = ∫_E f dµ.

Proof: Since f is the pointwise limit of the sequence fn of measurable functions that is dominated by g, it is also measurable and dominated by g, hence it is integrable. Note that
|f − fn | ≤ |f | + |fn | ≤ 2g
for all n, and

lim sup_{n→∞} |f − fn| = 0 a.e.

Since 2g − |f − fn| ≥ 0, it follows from Fatou's lemma that

lim sup_{n→∞} ∫_E |f − fn| dµ ≤ ∫_E lim sup_{n→∞} |f − fn| dµ = 0,

which implies that

lim_{n→∞} ∫_E |f − fn| dµ = 0.
Let {Xn} be a sequence of random variables. Then we have the following notions of convergence to a random variable X:
Definition (a) almost sure (a.s.) convergence: Xn(ω) → X(ω) except on a set of P-measure zero.
(b) L^p convergence: for p ≥ 1 let L^p(Ω, P) be the Banach space of p-integrable random variables with

|X|p = E[|X|^p]^{1/p}.

Xn converges to X in L^p if |Xn − X|p → 0 as n → ∞.
(c) convergence in probability: Xn converges to X in probability if

P(|Xn − X| > ε) → 0 as n → ∞

for all ε > 0.
(d) convergence in distribution: Xn converges to X in distribution if

E[f(Xn)] → E[f(X)]

for all bounded continuous functions f.
In the following we present the relationships between the different convergence notions for a sequence of random variables {Xn}.

Theorem Convergence of Xn to X in probability implies convergence in distribution.
Proof: Note that for random variables X, Y and ε > 0,

P(Y ≤ a) = P(Y ≤ a, X ≤ a + ε) + P(Y ≤ a, X > a + ε)
≤ P(X ≤ a + ε) + P(Y − X ≤ a − X, a − X < −ε)
≤ P(X ≤ a + ε) + P(|Y − X| > ε).

To prove convergence in distribution, one must show that the sequence of distributions FXn of Xn converges to the distribution FX of X at every point where FX is continuous. Let a be such a point. For every ε > 0 we have

P(Xn ≤ a) ≤ P(X ≤ a + ε) + P(|Xn − X| > ε),
P(X ≤ a − ε) ≤ P(Xn ≤ a) + P(|Xn − X| > ε),

and thus

P(X ≤ a − ε) − P(|Xn − X| > ε) ≤ P(Xn ≤ a) ≤ P(X ≤ a + ε) + P(|Xn − X| > ε).

Taking the limit as n → ∞, we obtain

FX(a − ε) ≤ lim_{n→∞} P(Xn ≤ a) ≤ FX(a + ε).

Since FX is continuous at a,

lim_{n→∞} FXn(a) = FX(a).
The space L of random variables is a complete metric space with the metric

d(X, Y) = E[ |X − Y| / (1 + |X − Y|) ].

In fact, d(X, Y) = 0 if and only if X = Y almost surely, and since

|X + Y|/(1 + |X + Y|) ≤ |X|/(1 + |X|) + |Y|/(1 + |Y|),

d satisfies the triangle inequality.
Theorem (completeness) (L, d) is a complete metric space.
Proof: Let {Xn} be a Cauchy sequence in (L, d). Select a subsequence Xnk such that

Σ_{k=1}^∞ d(Xnk, Xnk+1) < ∞.

Then,

E[ Σk |Xnk − Xnk+1| / (1 + |Xnk − Xnk+1|) ] < ∞.

Since |x| ≤ 2|x|/(1 + |x|) for |x| ≤ 1,

Σ_{k=1}^∞ |Xnk − Xnk+1| < ∞ a.s.,

and {Xnk} converges almost surely to some X ∈ L. Moreover, d(Xnk, X) → 0 and, since {Xn} is Cauchy, d(Xn, X) → 0; thus (L, d) is complete.
Theorem Xn converges in probability to X if and only if Xn converges to X in the d-metric.
Proof: For X, Y ∈ L and ε > 0,

d(X, Y) = ∫_{|X−Y|≥ε} |X − Y|/(1 + |X − Y|) dP + ∫_{|X−Y|<ε} |X − Y|/(1 + |X − Y|) dP ≤ P(|X − Y| ≥ ε) + ε/(1 + ε).

Thus, if Xn converges in probability to X, then Xn converges to X in the d-metric. Conversely, since

d(X, Y) ≥ (ε/(1 + ε)) P(|X − Y| ≥ ε),

if Xn converges to X in the d-metric, then Xn converges in probability to X.

Theorem (a) If either Xn converges to X in L^p or converges a.s. to X, then Xn converges to X in probability.
(b) If Xn converges to X in probability, there exists a subsequence that converges a.s. to X.
Proof: Since E[|X|] ≤ (E[|X|^p])^{1/p} = |X|p by Hölder's inequality and d(X, 0) ≤ E[|X|], L^p convergence implies convergence in probability. If Xn converges to X a.s., then by the bounded convergence theorem d(Xn, X) → 0, and thus Xn converges to X in probability. Conversely, if Xn converges to X in probability, then d(Xn, X) → 0, and as shown in the proof of the Theorem (completeness) there exists a subsequence that converges a.s. to X.
In summary we have the chain of implications between the various notions of convergence:

L^s ⟹ L^r (s > r ≥ 1) ⟹ in probability ⟹ in distribution,

and almost sure convergence also implies convergence in probability.

1.6 Uniformly integrable random variables
Definition (Uniformly integrable) A sequence of random variables {Xn} is uniformly integrable if

supn ∫_{|Xn|≥c} |Xn| dP → 0 as c → ∞.
Theorem (Uniformly Integrable I) If {Xn} is uniformly integrable, then

E[lim inf Xn] ≤ lim inf E[Xn] ≤ lim sup E[Xn] ≤ E[lim sup Xn].

Theorem (Uniformly Integrable II) A sequence {Xn} of random variables converging in probability to X also converges to X in L^1 if and only if it is uniformly integrable.
Proof: Suppose {Xn} is uniformly integrable and let Yn = Xn − X; note that {Yn} is then uniformly integrable as well, since X is integrable. Since

d(Yn, 0) ≥ ∫_{|Yn|≤c} |Yn|/(1 + |Yn|) dP ≥ (1/(1 + c)) ∫_{|Yn|≤c} |Yn| dP,

we have

∫_Ω |Yn| dP ≤ (1 + c) d(Yn, 0) + ∫_{|Yn|≥c} |Yn| dP.

For arbitrary ε > 0, if we choose c such that supn ∫_{|Yn|≥c} |Yn| dP < ε, then

∫_Ω |Yn| dP ≤ (1 + c) d(Yn, 0) + ε.

Since d(Yn, 0) → 0, it follows that |Xn − X|1 = |Yn|1 → 0.
Conversely, suppose |Xn − X|1 → 0. For arbitrary ε > 0 choose n0 such that sup_{n≥n0} E[|Yn|] < ε. Then, for all c > 0,

sup_{n≥n0} ∫_{|Yn|>c} |Yn| dP < ε.

If for each 1 ≤ k ≤ n0 − 1 we choose ck such that

∫_{|Yk|>ck} |Yk| dP < ε,

and let c = max_{1≤k≤n0−1} ck, we have

supn ∫_{|Yn|>c} |Yn| dP < ε.
Lemma Let G be a nonnegative increasing function on R+ such that G(t)/t → ∞ as t → ∞. If

supn E[G(|Xn|)] < ∞,

then {Xn} is uniformly integrable.
Proof: Given a > 0, choose c such that G(t)/t > a for t > c. Then

supn ∫_{|Xn|≥c} |Xn| dP ≤ (1/a) supn ∫_{|Xn|≥c} G(|Xn|) dP ≤ (1/a) supn E[G(|Xn|)].
Theorem (Kolmogorov)
1.7 Conditional Expectation
Definition Let X be a random variable and A be a σ-algebra. The conditional expectation E[X|A] is an A-measurable random variable that satisfies

E[IA E[X|A]] = E[IA X]   (1.3)

for all A ∈ A.
Note that Q(A) = E[IA X], A ∈ A, for a nonnegative random variable X defines a measure Q on (Ω, A), and P(A) = 0 implies Q(A) = 0 (i.e., Q is absolutely continuous with respect to P). By the Radon-Nikodym theorem the conditional expectation exists as the Radon-Nikodym derivative dQ/dP = E[X|A].
Condition (1.3) is equivalent to the orthogonality condition

E[Z (X − E[X|A])] = 0 for all A-measurable random variables Z.   (1.4)
In fact, (1.3) follows from letting Z = IA. Conversely, we approximate Z by a sequence of A-measurable simple random variables

Zn = Σk zk I_{Ak}, Ak ∈ A.
Let L^2(Ω, F, P) be the space of square integrable random variables with the inner product

(X, Y)_{L^2} = E[XY].

Then L^2(Ω, F, P) is a Hilbert space. Moreover, X̂ = E[X|A] minimizes

E[|X − Z|^2] over all A-measurable square integrable random variables Z.

In fact, by (1.4),

E[|X − Z|^2] = E[|X − X̂|^2 + 2(X − X̂)(X̂ − Z) + |X̂ − Z|^2] = E[|X − X̂|^2] + E[|Z − X̂|^2].

That is, E[X|A] is the orthogonal projection of X onto the subspace of A-measurable random variables of L^2(Ω, F, P).
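On a finite probability space with A generated by a partition, the projection is simply the atom-wise average. The sketch below (the space, partition, and X are arbitrary illustrative choices) verifies the defining property (1.3) on each atom and the tower property E[E[X|A]] = E[X].

```python
# Conditional expectation on a finite space: E[X|A] averages X over each atom
# of the partition generating A, which is exactly the orthogonal projection.
omega = list(range(6))                  # uniform P on 6 points
P = {w: 1 / 6 for w in omega}
atoms = [{0, 1, 2}, {3, 4, 5}]          # partition generating A
X = {w: w ** 2 for w in omega}

def cond_exp(X, atoms, P):
    est = {}
    for A in atoms:
        pa = sum(P[w] for w in A)
        avg = sum(X[w] * P[w] for w in A) / pa   # E[X I_A] / P(A)
        for w in A:
            est[w] = avg
    return est

Xhat = cond_exp(X, atoms, P)
# Defining property (1.3): E[I_A X̂] = E[I_A X] for each atom A.
for A in atoms:
    assert abs(sum(Xhat[w] * P[w] for w in A)
               - sum(X[w] * P[w] for w in A)) < 1e-12
# Tower property: E[E[X|A]] = E[X].
assert abs(sum(Xhat[w] * P[w] for w in omega)
           - sum(X[w] * P[w] for w in omega)) < 1e-12
```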
Example (1) If {Gk} ⊂ F is a partition of Ω and G is the σ-algebra generated by {Gk}, then

E[IA|G] = P(A|G) = P(A ∩ Gk)/P(Gk) for ω ∈ Gk.

(2) If G1 and G2 are independent σ-algebras, then

P(A|G2) = P(A) for all A ∈ G1.

Conversely, if the above holds for all A ∈ G1, then G1 and G2 are independent.
(3) If X, Y are random variables, then

P(X ∈ B|Y = y) = ∫_B (p_{X,Y}(x, y)/pY(y)) dx,

where p_{X,Y} is the joint density function of (X, Y) and pY(y) is the marginal density of Y. In fact, for δ > 0,

P(X ∈ B | y − δ < Y ≤ y + δ) = P({X ∈ B} ∩ {y − δ < Y ≤ y + δ}) / P(y − δ < Y ≤ y + δ) = ∫_B ∫_{y−δ}^{y+δ} p_{X,Y}(x, s) ds dx / ∫_{y−δ}^{y+δ} pY(s) ds.
Theorem (Properties of Conditional Expectation)
(1) E[E[X|H]|A] = E[X|A] for A ⊆ H.
(2) E[X|A] = E[X] if X is independent of A.
(3) E[Z X|A] = Z E[X|A] if Z is A-measurable.
(4) |E[X|A]| ≤ E[|X| |A].
(5) (Jensen's inequality) If φ is a convex function, then

φ(E[X|A]) ≤ E[φ(X)|A].
Proof: (1) For all A ∈ A ⊆ H,

∫_A E[E[X|H]|A] dP = ∫_A E[X|H] dP = ∫_A X dP = ∫_A E[X|A] dP.
(2) For all A ∈ A we have, using the independence of X and IA,

∫_A E[X|A] dP = ∫ IA X dP = E[X] P(A) = ∫_A E[X] dP.

(3) If we let Z = IG, G ∈ A, we have

∫_A IG E[X|A] dP = ∫_{A∩G} E[X|A] dP = ∫_{A∩G} X dP = ∫_A IG X dP = ∫_A E[IG X|A] dP

for all A ∈ A, and thus

E[IG X|A] = IG E[X|A].

The general case follows by approximating Z with simple random variables.
1.8 Characteristic Functions and Weak convergence

A sequence of random variables {Xn} converges weakly, or in distribution, to X if and only if the distribution functions Fn(x) = P(Xn ≤ x) converge to F(x) = P(X ≤ x) at all x at which F(x) is continuous and {Fn} is tight, i.e., for arbitrary ε > 0 there exists C such that

∫_{|x|≥C} dFn(x) ≤ ε.   (1.5)

To see that convergence at continuity points is enough to identify the limit, observe that F is right continuous and the discontinuities of F form at most a countable set.
Theorem (Weak Convergence) A sequence of random variables {Xn} converges weakly, or in distribution, to X if and only if {Fn} is tight and converges weakly to F.
Proof: If Fn converges to F weakly, then by the bounded convergence theorem, for any bounded continuous function g,

E[g(Xn)] → E[g(X)].

Conversely, assume that F(x) is continuous at x and define

gε(y) = 1 for y ≤ x, gε(y) = (x + ε − y)/ε for y ∈ [x, x + ε], gε(y) = 0 for y ≥ x + ε.

Since gε is continuous and gε = 0 for y ≥ x + ε,

lim sup P(Xn ≤ x) ≤ lim sup E[gε(Xn)] = E[gε(X)] ≤ P(X ≤ x + ε).

Letting ε → 0+, we have lim sup P(Xn ≤ x) ≤ P(X ≤ x). Also, since gε(y + ε) = 1 for y ≤ x − ε and gε(y + ε) = 0 for y ≥ x,

lim inf P(Xn ≤ x) ≥ lim inf E[gε(Xn + ε)] = E[gε(X + ε)] ≥ P(X ≤ x − ε).

Letting ε → 0+ and using the continuity of F at x, we have lim inf P(Xn ≤ x) ≥ P(X ≤ x). Combining these, FXn(x) → FX(x) as n → ∞.
The characteristic function plays an important role in the weak convergence of random variables.
Definition For a random vector X ∈ R^n the characteristic function of X is defined by

ϕ(ξ) = E[e^{i(ξ,X)}] = ∫_{R^n} e^{i(ξ,x)} dF(x), ξ ∈ R^n,

where F is the distribution of X.
Theorem The characteristic function t ∈ R → ϕ(t) = E[e^{itX}] of a random variable X satisfies:
(1) |ϕ(t)| ≤ ϕ(0) = 1.
(2) ϕ(t) is uniformly continuous.
(3) ϕ(−t) is the complex conjugate of ϕ(t).
(4) ϕ(t) is real-valued if and only if F is symmetric.
(5) If E[|X|^n] < ∞ for some n ≥ 1, then ϕ^{(r)}(t) exists for all r ≤ n,

ϕ^{(r)}(t) = ∫_R (ix)^r e^{itx} dF(x), i^r E[X^r] = ϕ^{(r)}(0),

and

ϕ(t) = Σ_{r=0}^n ((it)^r/r!) E[X^r] + ((it)^n/n!) εn(t),

where |εn(t)| ≤ 3E[|X|^n] and εn(t) → 0 as t → 0.
(6) If ϕ^{(2n)}(0) exists and is finite, then E[X^{2n}] < ∞.
(7) If E[|X|^n] < ∞ for all n ≥ 1 and lim sup_n (E[|X|^n])^{1/n}/n = 1/(eR) < ∞, then

ϕ(t) = Σ_{n=0}^∞ ((it)^n/n!) E[X^n] for all |t| < R.
Integrating by parts gives

∫_0^x (x − s)^n e^{is} ds = x^{n+1}/(n + 1) + (i/(n + 1)) ∫_0^x (x − s)^{n+1} e^{is} ds.   (1.6)

For n = 0,

∫_0^x e^{is} ds = x + i ∫_0^x (x − s) e^{is} ds,

and thus

e^{ix} = 1 + ix − ∫_0^x (x − s) e^{is} ds.

By (1.6) with n = 1,

e^{ix} = 1 + ix − x^2/2 − (i/2) ∫_0^x (x − s)^2 e^{is} ds.

By induction,

e^{ix} = Σ_{k=0}^n (ix)^k/k! + (i^{n+1}/n!) ∫_0^x (x − s)^n e^{is} ds.

Since the remainder for n = 2 satisfies

|(1/2) ∫_0^{tX} (tX − s)^2 e^{is} ds| ≤ min(|tX|^3/6, |tX|^2),

so that its expectation is t^2 E[(|t||X|^3/6) ∧ |X|^2] = o(t^2) by dominated convergence, we have

E[e^{itX}] = 1 + it E[X] − (t^2/2) E[X^2] + o(t^2).

Theorem (Lévy continuity theorem) Let {Fn} be a sequence of probability measures with characteristic functions {ϕn}.
(1) If Fn converges weakly to F, then ϕn(t) → ϕ(t) for all t.
(2) If ϕn(t) converges pointwise to a function ϕ that is continuous at 0, then the associated sequence of distributions {Fn} is tight and converges weakly to the measure F with characteristic function ϕ.
Proof: For (1), since e^{itx} is bounded and continuous,

ϕn(t) = E[e^{itXn}] → E[e^{itX}] = ϕ(t).
For (2) note that

∫_{−c}^c (1 − e^{itx}) dt = 2c (1 − sin(cx)/(cx)),

and thus, since 1 − sin(cx)/(cx) ≥ 1 − 1/|cx| ≥ 1/2 for |x| ≥ 2/c,

(1/c) ∫_{−c}^c (1 − ϕn(t)) dt = 2 ∫ (1 − sin(cx)/(cx)) dFn(x) ≥ 2 ∫_{|x|≥2/c} (1 − 1/|cx|) dFn(x) ≥ µn(|x| ≥ 2/c).

Since ϕ(t) → 1 as t → 0, for each ε > 0 there exists c such that

(1/c) | ∫_{−c}^c (1 − ϕ(t)) dt | ≤ ε.

But since ϕn(t) → ϕ(t), by the bounded convergence theorem there exists N such that for n ≥ N

2ε ≥ (1/c) ∫_{−c}^c (1 − ϕn(t)) dt ≥ µn(|x| ≥ 2/c),

and thus the sequence {µn} of measures is tight.
1.9 Law of Large Numbers
The law of large numbers (LLN) describes the result of performing the same experiment a large
number of times. According to the law, the average of the results obtained from a large number
of trials should be close to the expected value, and will tend to become closer as more trials are
performed.
Theorem (L^2 Weak Law of Large Numbers) Let X1, X2, · · · be uncorrelated random variables with E[Xk] = µ and Var(Xk) ≤ C. If Sn = Σ_{k=1}^n Xk, then as n → ∞, Sn/n → µ in L^2 and in probability.
Proof:

E[|Sn/n − µ|^2] = Var(Sn/n) = (1/n^2) Σ_{k=1}^n Var(Xk) ≤ C/n → 0.

By Chebyshev's inequality,

P(|Sn/n − µ| > ε) ≤ (1/ε^2) E[|Sn/n − µ|^2].
Theorem (Weak Law of Large Numbers) Let X1, X2, · · · be i.i.d. random variables with E[|Xk|] < ∞ and E[Xk] = µ. Then Sn/n → µ in probability.
Proof: By the Lebesgue dominated convergence theorem,

x P(|X1| > x) ≤ E[|X1| I_{|X1|>x}] → 0 as x → ∞, and µn = E[X1 I_{|X1|<n}] → E[X1] = µ as n → ∞.

A truncation argument based on these facts shows P(|Sn/n − µn| > ε) → 0 for all ε > 0, and since µn → µ, Sn/n → µ in probability.
Theorem (Strong Law of Large Numbers) Let X1, X2, · · · be i.i.d. random variables with
E[|Xk|] < ∞. Then, Sn/n → µ a.s. as n → ∞.
Proof: Let Yk = Xk I_{|Xk|≤k} and Tn = Σ_{k=1}^n Yk. Since
Σ_k P(|Xk| > k) ≤ ∫_0^∞ P(|X1| > t) dt = E[|X1|] < ∞,
by the Borel–Cantelli lemma, P({ω : Xk(ω) ≠ Yk(ω) for infinitely many k}) = 0 and thus
Sn/n − Tn/n → 0 a.s. as n → ∞. It suffices to show that Tn/n → µ a.s. as n → ∞. Let Zk = Yk − E[Yk],
so E[Zk] = 0. Now Var(Zk) = Var(Yk) ≤ E[|Yk|^2] and
Σ_{k=1}^∞ Var(Zk)/k^2 ≤ Σ_{k=1}^∞ E[|Yk|^2]/k^2 < 4E[|X1|].
We show that Σ_{k=1}^∞ Zk/k converges a.s. Let Qn = Σ_{k=1}^n Zk/k. Then, by Kolmogorov's maximal inequality,
P(sup_{M≤n≤N} |Qn − QM| > ε) ≤ (1/ε^2) Var(QN − QM) ≤ (1/ε^2) Σ_{k=M+1}^N Var(Zk)/k^2.
Letting N → ∞ we have
P(sup_{n≥M} |Qn − QM| > ε) ≤ (1/ε^2) Σ_{k>M} Var(Zk)/k^2 → 0 as M → ∞.
Since
P(sup_{n,m≥M} |Qn − Qm| > 2ε) ≤ P(sup_{n≥M} |Qn − QM| > ε),
sup_{n,m≥M} |Qn − Qm| → 0, and thus {Qn(ω)} is a Cauchy sequence with probability one. By Kronecker's lemma
(1/n) Σ_{k=1}^n (Yk − E[Yk]) → 0 a.s., and thus Tn/n − (1/n) Σ_{k=1}^n E[Yk] → 0 a.s.
Since by the Lebesgue dominated convergence theorem E[Yk] → µ as k → ∞, it follows that Tn/n → µ.
1.10 Central Limit Theorem
The central limit theorem (CLT) states that the arithmetic mean of a sufficiently large number of
independent random variables, each with a well-defined expected value and well-defined variance,
will be approximately normally distributed.
Theorem (Central Limit) Let X1, X2, · · · be i.i.d. random variables with E[Xk] = µ
and Var(Xk) = σ^2. If Sn = Σ_k Xk, then
√n(Sn/n − µ)/σ converges in distribution to N(0, 1),
the standard normal distribution.
Proof: Note that the characteristic function ϕn of √n(Sn/n − µ)/σ is given by
ϕn(t) = (1 − t^2/(2n) + o(t^2/n))^n → e^{−t^2/2}, n → ∞.
But this limit is just the characteristic function of a standard normal distribution N (0, 1), and
the central limit theorem follows from the Lévy continuity theorem, which confirms that the
convergence of characteristic functions implies convergence in distribution.
By the law of large numbers, the sample average Sn/n converges in probability and almost
surely to the expected value µ as n → ∞. The central limit theorem describes the distribution
of the random fluctuations of the sample average of a sequence of random variables around
the mean µ. More precisely, it states that as the sample size n gets larger, the distribution of
the difference between the sample average Sn/n and its limit µ, when multiplied by the factor
√n (that is, √n(Sn/n − µ)), approximates the normal distribution N(0, σ^2). For large enough n,
the distribution of the sample average Sn/n is close to the normal distribution with mean µ and
variance σ^2/n. The usefulness of the theorem is that the distribution of √n(Sn/n − µ) approaches
the normal distribution regardless of the distribution of the individual random variables Xk.
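A minimal simulation sketch of this statement (not in the original text), using fair ±1 coin flips, for which µ = 0 and σ = 1; the sample size and number of trials are illustrative assumptions. The normalized sums √n(Sn/n − µ)/σ should have sample mean near 0 and sample variance near 1.

```python
import math
import random

random.seed(1)

def normalized_mean(n, trials):
    """Draw `trials` samples of sqrt(n)*(S_n/n - mu)/sigma for +-1 coin flips."""
    out = []
    for _ in range(trials):
        s = sum(random.choice((-1, 1)) for _ in range(n))
        out.append(s / math.sqrt(n))  # here mu = 0 and sigma = 1
    return out

z = normalized_mean(n=400, trials=2000)
mean_z = sum(z) / len(z)
var_z = sum(v * v for v in z) / len(z)
# By the CLT, z is approximately N(0, 1): mean near 0, variance near 1.
```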
1.11 Large Deviations
The large deviations theory concerns the asymptotic exponential rate of the probability
measures of tail events. Let X1, X2, · · · be a sequence of i.i.d. random variables and let Sn =
Σ_{k=1}^n Xk. In this section, we will discuss the large deviations in terms of the rate function
I(x) = lim_{n→∞} (1/n) log P(Sn/n ≥ x)
for x > µ = E[Xk]. If we let
λ(θ) = log E[e^{θXk}]
and define
λ*(x) = sup_{θ>0} {θx − λ(θ)},
the Legendre–Fenchel transformation of λ, then it will be shown that I(x) = −λ*(x).
For the existence of the limit, if θ > 0, then Chebyshev's inequality implies
e^{θnx} P(Sn/n ≥ x) ≤ E[e^{θSn}] = e^{nλ(θ)},
and thus
P(Sn/n ≥ x) ≤ e^{−n(θx−λ(θ))}.    (1.7)
We assume that there exists θ > 0 such that
E[e^{θXk}] < ∞.
If we let πn = P(Sn/n ≥ x), then
πm+n ≥ P(Sm ≥ mx, Sn+m − Sm ≥ nx) = πm πn
since Sm and Sn+m − Sm are independent. Letting In = log πn we have
Lemma If Im+n ≥ Im + In, then In/n → sup_m Im/m as n → ∞.
Proof: Since lim sup In/n ≤ sup_m Im/m, it suffices to prove that for each m, lim inf In/n ≥ Im/m. If we let
n = km + ℓ with 0 ≤ ℓ < m, then from the assumption we have In ≥ kIm + Iℓ. Dividing this by
n, we obtain
In/n ≥ (km/(km + ℓ)) Im/m + Iℓ/n.
Letting n → ∞, the desired result holds. From the Lemma and (1.7), lim_{n→∞} (1/n) log P(Sn/n ≥ x) = I(x) exists and
P(Sn/n ≥ x) ≤ e^{nI(x)}.
Since θ > 0 is arbitrary, it follows from (1.7) that I(x) ≤ −λ*(x) and
P(Sn/n ≥ x) ≤ e^{−nλ*(x)}.
Theorem I(x) = −λ*(x).
Proof: Let F be the distribution of Xk. Define
φ(θ) = ∫_{−∞}^∞ e^{θx} dF(x)
and θ+ = sup{θ : φ(θ) < ∞}, θ− = inf{θ : φ(θ) < ∞}. Since for θ ∈ (θ−, θ+)
Fθ(x) = (1/φ(θ)) ∫_{−∞}^x e^{θy} dF(y)
and
∫_{−∞}^∞ x dFθ(x) = φ'(θ)/φ(θ),  ∫_{−∞}^∞ x^2 dFθ(x) = φ''(θ)/φ(θ),
we have
(d/dθ)(φ'/φ) = φ''/φ − (φ'/φ)^2 = ∫_{−∞}^∞ x^2 dFθ(x) − (∫_{−∞}^∞ x dFθ(x))^2 > 0,
which is the variance of Fθ, and thus λ(θ) = log φ(θ) is convex. Thus, there exists θ̄ such that
x = φ'(θ̄)/φ(θ̄) and θ̄ = argmax{θx − log φ(θ)}.
Let X^θ_1, X^θ_2, · · · be i.i.d. random variables with distribution Fθ and let S^θ_n = Σ_{k=1}^n X^θ_k. By the
definition of the Radon–Nikodym derivative of the associated measures, it is immediate that
dF/dFθ = e^{−θx} φ(θ). Then,
dF^n/dF^n_θ = e^{−θx} φ(θ)^n.    (1.8)
In fact, by induction we have
F^n(z) = (F^{n−1} ∗ F)(z) = ∫_{−∞}^∞ dF^{n−1}(x) ∫_{−∞}^{z−x} dF(y)
= ∫∫ dF^{n−1}_θ(x) dFθ(y) I_{x+y≤z} e^{−θ(x+y)} φ(θ)^n
= E[I_{S^θ_{n−1}+X^θ_n ≤ z} e^{−θ(S^θ_{n−1}+X^θ_n)}] φ(θ)^n
= ∫_{−∞}^z e^{−θx} dF^n_θ(x) φ(θ)^n.
If c > x, then from (1.8) and the monotonicity
P(Sn/n ≥ x) = ∫_{nx}^∞ e^{−θy} φ(θ)^n dF^n_θ(y) ≥ ∫_{nx}^{nc} e^{−θy} φ(θ)^n dF^n_θ(y) ≥ e^{−ncθ} φ(θ)^n (F^n_θ(nc) − F^n_θ(nx)).    (1.9)
Since Fθ has mean φ'(θ)/φ(θ), if we have x < φ'(θ)/φ(θ) < c, then the weak law of large numbers implies
F^n_θ(nc) − F^n_θ(nx) → 1 as n → ∞.
Thus, we have
lim inf_{n→∞} (1/n) log P(Sn/n ≥ x) ≥ −θc + log φ(θ).
Since c > x and θ > θ̄ are arbitrary, we obtain
I(x) = lim inf_{n→∞} (1/n) log P(Sn/n ≥ x) = −xθ̄ + log φ(θ̄) = −λ*(x).
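The Legendre–Fenchel transform λ*(x) can be evaluated numerically by maximizing θx − λ(θ) over a grid. A sketch (not from the original text) for fair ±1 coin flips, where λ(θ) = log cosh θ and λ* has the closed form ½[(1+x) log(1+x) + (1−x) log(1−x)]; the choice of distribution and the grid are illustrative assumptions.

```python
import math

def lam(theta):
    """Cumulant generating function log E[exp(theta*X)] for fair +-1 flips."""
    return math.log(math.cosh(theta))

def lam_star(x, step=0.001, theta_max=10.0):
    """Legendre-Fenchel transform sup_{theta>0} {theta*x - lam(theta)} on a grid."""
    grid = [i * step for i in range(int(theta_max / step))]
    return max(t * x - lam(t) for t in grid)

def lam_star_exact(x):
    """Closed form for the fair +-1 coin: binary relative entropy."""
    return 0.5 * ((1 + x) * math.log(1 + x) + (1 - x) * math.log(1 - x))

x = 0.5
approx, exact = lam_star(x), lam_star_exact(x)
# P(S_n/n >= 0.5) decays roughly like exp(-n * lam_star(0.5)).
```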
In general we say that a large deviation principle holds for a sequence of probability measures
Pn defined on the Borel subsets of a complete separable metric space X, with a rate function
H(x), if
(1) H(x) ≥ 0 is a lower semicontinuous function on X with the property that K = {x : H(x) ≤ c} is a compact set for every c ≥ 0,
(2) for any closed set C ⊂ X,
lim sup_{n→∞} (1/n) log Pn(C) ≤ − inf_{x∈C} H(x),
(3) for any open set U ⊂ X,
lim inf_{n→∞} (1/n) log Pn(U) ≥ − inf_{x∈U} H(x).
Rate functions with property (1) are referred to as good rate functions.
Theorem Let {Pn} satisfy the LDP on X with rate H and let F(x) : X → R be a bounded continuous
function. Then
lim_{n→∞} (1/n) log ∫ e^{nF(x)} dPn = sup_x {F(x) − H(x)}.
Proof: We note that
lim_{n→∞} (1/n) log Σ_{k=1}^n e^{n a_k} = sup_k a_k.
For the upper bound, divide the range of F into a finite number of intervals of size 1/k,
denoting Cj,k = {x : (j−1)/k ≤ F(x) ≤ j/k}. Then,
∫ e^{nF(x)} dPn ≤ Σ_j ∫_{Cj,k} e^{nF(x)} dPn ≤ Σ_j e^{nj/k} Pn(Cj,k),
and thus we obtain, for any k, the bound
lim_{n→∞} (1/n) log ∫ e^{nF(x)} dPn ≤ sup_j {j/k − inf_{x∈Cj,k} H(x)} ≤ sup_x {F(x) − H(x)} + 1/k.
Letting k → ∞ we obtain the upper bound. For the lower bound, take any x0 with
H(x0) < ∞; then in a neighborhood U of x0, F(x) is bounded below by F(x0) − ε(U). By
assumption Pn(U) ≥ exp(−nH(x0) + o(n)). Since the integrand is nonnegative,
∫_X e^{nF(x)} dPn ≥ ∫_U e^{nF(x)} dPn ≥ exp(n(F(x0) − H(x0) − ε(U)) + o(n)).
Since x0 is arbitrary and U can be shrunk to x0, we obtain the lower bound.
Theorem (Puhalskii, O'Brien and Vervaat, de Acosta) Suppose that {Xn} is exponentially tight, i.e., for each a > 0 there exists a compact set Ka ⊂ S such that
lim sup_{n→∞} (1/n) log P(Xn ∉ Ka) ≤ −a.
Then there exists a subsequence {Xnk} for which the large deviation principle holds with a good
rate function.
2 Markov Chain
In this section we consider discrete time stochastic processes. Let S be the state space, e.g.,
S = Z = {integers}, S = {0, 1, · · · , N } or S = {−N, · · · , 0, · · · , N }.
Definition We say that a stochastic process {Xn}, n ≥ 0, is a Markov chain with initial
distribution π (P(X0 = i) = πi) and (one-step) transition matrix P if for each n and ik, 0 ≤ k ≤ n−1,
P(Xn+1 = j|Xn = i, Xn−1 = in−1, · · · , X0 = i0) = P(Xn+1 = j|Xn = i) = pij    (2.1)
with
Σ_{j∈S} pij = 1,  pij ≥ 0.
Thus, the distribution of Xn+1 depends only on the current state Xn and is independent of
the past.
Example Consider the discrete-time dynamics:
Xn+1 = f(Xn, wn),  f : S × R → S,
where {wn} is a sequence of independent identically distributed random variables. Then, {Xn} is a Markov chain with
pij = P(f(i, wn) = j).
We have the following properties.
Theorem Let P^n = {p^{(n)}_{ij}}. Then
P(Xn+2 = j|Xn = i) = Σ_{k∈S} pik pkj = (P^2)_{ij} = p^{(2)}_{ij},
P(Xn = j) = (πP^n)_j = Σ_i πi p^{(n)}_{ij},
and
p^{(m+n)}_{ij} = Σ_{k∈S} p^{(n)}_{ik} p^{(m)}_{kj}   (Chapman–Kolmogorov).
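The Chapman–Kolmogorov relation can be checked numerically with plain matrix multiplication; the two-state transition matrix below is a hypothetical example, not taken from the text.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(p, m):
    """m-th power of the transition matrix p (m >= 1)."""
    out = p
    for _ in range(m - 1):
        out = mat_mult(out, p)
    return out

# Hypothetical two-state chain: p_01 = 0.3, p_10 = 0.4.
P = [[0.7, 0.3],
     [0.4, 0.6]]

P2, P3, P5 = mat_pow(P, 2), mat_pow(P, 3), mat_pow(P, 5)
# Chapman-Kolmogorov: p^(2+3)_ij = sum_k p^(2)_ik p^(3)_kj.
lhs = P5
rhs = mat_mult(P2, P3)
```

Each row of every power remains a probability distribution, as required for a stochastic matrix.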
2.1 Classification of the States
In this section we analyze the asymptotic behavior of the Markov chain (2.1), addressing, e.g., the following questions:
(1) Do the limits πj = lim_{n→∞} p^{(n)}_{ij} exist, independently of the initial state i?
(2) Do the limits (π1, π2, · · · ) form a probability distribution, that is, π ≥ 0 and Σ πi = 1?
(3) Is the chain ergodic, i.e., πi > 0?
(4) Is there one and only one stationary probability distribution π such that π = πP (invariance)?
The following are the basic characterizations of Markov chains.
Definition (1) Communicate: i → j if p^{(n)}_{ij} > 0 for some n ≥ 0. i ↔ j (communicate) if i → j
and j → i.
(2) Communicating classes: i ↔ j defines an equivalence relation, i.e., i ↔ i (reflexive), i ↔ j ⇔
j ↔ i (symmetric) and i ↔ j, j ↔ k ⇒ i ↔ k (transitive). Thus, the equivalence relation i ↔ j
partitions the states into equivalence classes, i.e., the communicating classes. A communicating class
is closed if the probability of leaving the class is zero, namely, if i is in the class C but j is not, then j is not accessible from i.
(3) Irreducible: A Markov chain is said to be irreducible if its state space is a single communicating class; in other words, if it is possible to get to any state from any state.
(4) Transient, Null and Positive recurrent: Let the random variable τii be the first return time
to state i (the "hitting time"):
τii = min{n ≥ 1 : Xn = i|X0 = i}.
The number of visits Ni to state i is defined by Ni = Σ_{n=0}^∞ I_{Xn=i} and
E[Ni] = Σ_{n=0}^∞ P(Xn = i|X0 = i) = Σ_{n=0}^∞ p^{(n)}_{ii},
where I_F is the indicator function of the event F, i.e., I_F(ω) = 1 if ω ∈ F and I_F(ω) = 0
if ω ∉ F. If Σ_{n=0}^∞ p^{(n)}_{ii} = ∞, state i is recurrent (the chain returns to the state infinitely many times). If
Σ_{n=0}^∞ p^{(n)}_{ii} < ∞, state i is transient (the chain returns to the state finitely many times).
Define the probability of first return at time n,
f^{(n)}_{ii} = P(τii = n) = P(Xn = i, Xk ≠ i for 1 ≤ k < n|X0 = i),
of state i. Let fi be the probability of ever returning to state i given that the chain started in
state i, i.e.,
fi = P(τii < ∞) = Σ_{n=1}^∞ f^{(n)}_{ii}.
Then, Ni has the geometric distribution, i.e.,
P(Ni = n) = f_i^{n−1}(1 − fi)
and
E[Ni] = 1/(1 − fi).
Thus, state i is recurrent if and only if fi = 1 and state i is transient if and only if fi < 1. The
mean recurrence time of a recurrent state i is the expected return time µi:
µi = E[τii] = Σ_{n=1}^∞ n f^{(n)}_{ii}.
State i is positive recurrent (or non-null persistent) if µi is finite; otherwise, state i is null
recurrent (or null persistent).
(5) Period: State i has period d = d(i) if (i) p^{(n)}_{ii} > 0 only for values n = dm, and (ii) d is the largest
number satisfying (i); equivalently, d is the greatest common divisor of the numbers n for which
p^{(n)}_{ii} > 0. Note that even though a state has period d, it may not be possible to reach the state
in d steps. For example, suppose it is possible to return to the state in {6, 8, 10, 12, . . . } time
steps; d would be 2, even though 2 does not appear in this list. If d = 1, then the state is said
to be aperiodic: returns to state i can occur at irregular times. Otherwise (d > 1), the state is
said to be periodic with period d.
(6) Asymptotics: Let the Markov chain be irreducible and aperiodic. Then, if state i is
transient or null recurrent, p^{(n)}_{ij} → 0 as n → ∞; if all states are positive recurrent, p^{(n)}_{ij} → 1/µj as
n → ∞.
(7) Stationary Distribution: The vector π is called a stationary distribution (or invariant measure) if its entries πj are non-negative, Σ_{j∈S} πj = 1, and it satisfies
π = πP ⇔ πj = Σ_{i∈S} πi pij.
An irreducible chain has a stationary distribution if and only if all of its states are positive
recurrent. In that case, it is unique and is related to the expected return time:
πj = 1/µj.
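The relation πj = 1/µj between the stationary distribution and the mean return time can be checked by simulation. A sketch (not in the original text) for a hypothetical two-state chain; the transition probabilities are illustrative assumptions.

```python
import random

random.seed(2)

# Hypothetical two-state chain; its stationary distribution solves pi = pi P.
P = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}
# For p_01 = 0.3, p_10 = 0.4: pi = (0.4/0.7, 0.3/0.7).
pi = (0.4 / 0.7, 0.3 / 0.7)

def mean_return_time(state, steps=200000):
    """Estimate mu_j = E[tau_jj] from the average gap between visits to `state`."""
    x, returns, last, times = state, 0, 0, 0
    for n in range(1, steps + 1):
        x = 0 if random.random() < P[x][0] else 1
        if x == state:
            times += n - last
            last = n
            returns += 1
    return times / returns

mu0 = mean_return_time(0)
# Theorem: pi_0 = 1/mu_0 (here 1/mu_0 = 4/7).
```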
Further, if the chain is both irreducible and aperiodic, then for any i and j,
lim_{n→∞} p^{(n)}_{ij} = 1/µj.
Note that there is no assumption on the starting distribution; the chain converges to the stationary distribution regardless of where it begins. Such a π is called the equilibrium distribution
of the chain. If a chain has more than one closed communicating class, its stationary distributions will not be unique (consider any closed communicating class Ci in the chain; each one
will have its own unique stationary distribution πi. Extending these distributions to the overall
chain, setting all values to zero outside the communicating class, yields that the set of invariant
measures of the original chain is the set of all convex combinations of the πi's). However, if a
state j is aperiodic, then
lim_{n→∞} p^{(n)}_{jj} = 1/µj,
and for any other state i, letting fij be the probability that the chain ever visits state j if it starts
at i,
lim_{n→∞} p^{(n)}_{ij} = fij/µj.
If a state i is periodic with period d(i) > 1, then the limit
lim_{n→∞} p^{(n)}_{ii}
does not exist, although the limit
lim_{n→∞} p^{(dn+r)}_{ii}
does exist for every integer r.
Theorem 1 Let C be a communicating class. Then either all states in C are transient or all
are recurrent.
Proof: The theorem follows from
p^{(n+r+m)}_{ii} ≥ p^{(n)}_{ij} p^{(r)}_{jj} p^{(m)}_{ji}.
Theorem 2 Every recurrent class is closed.
Proof: Let C be a class which is not closed. Then there exist i ∈ C, j ∉ C and m with
P(Xm = j|X0 = i) > 0. Since we have
P({Xm = j} ∩ {Xn = i for infinitely many n}|X0 = i) = 0,
this implies that
P(Xn = i for infinitely many n|X0 = i) < 1,
so i is not recurrent, and so neither is C.
Theorem 3 Every finite closed class is recurrent.
Proof: Suppose C is closed and finite and that {Xn} starts in C. Then for some i ∈ C we have
0 < P(Xn = i for infinitely many n) = P(Xn = i for some n) P(Xn = i for infinitely many n|X0 = i)
by the strong Markov property. This shows that i is not transient, so C is recurrent.
2.2 Stationary distribution
When the limit exists, let πj denote the long run proportion of time that the chain spends in
state j:
πj = lim_{n→∞} (1/n) Σ_{m=1}^n I{Xm = j|X0 = i} for all initial states i.    (1)
Taking expected values, if πj exists then it can be computed alternatively (via the bounded
convergence theorem) by
πj = lim_{n→∞} (1/n) Σ_{m=1}^n P(Xm = j|X0 = i) = lim_{n→∞} (1/n) Σ_{m=1}^n p^{(m)}_{ij} for all initial states i (in the Cesàro sense),
or equivalently
lim_{n→∞} (1/n) Σ_{m=1}^n P^m = [π; π; π; · · · ],    (2)
the matrix each of whose rows is the row vector π = (π0, π1, π2, · · · ).
Theorem 4 If {Xn} is a positive recurrent Markov chain, then a unique stationary distribution
πj exists and is given by πj = 1/E[τjj] > 0 for all states j ∈ S. If the chain is null recurrent or
transient then the limits in (1) are all 0 and no stationary distribution exists.
Proof: First, we immediately obtain the transient case result since by definition, each fixed
state i is then only visited a finite number of times; hence the limit in (2) must be 0. Next, suppose j is
recurrent. Assume that X0 = j. Let t0 = 0, t1 = τjj, t2 = min{k > t1 : Xk = j} and in general
tn+1 = min{k > tn : Xk = j}. These tn are the consecutive times at which the chain visits
state j. If we let Yn = tn − tn−1 (the interevent times) then we revisit state j for the n-th time
at time tn = Y1 + · · · + Yn . The idea here is to break up the evolution of the Markov chain into
i.i.d. cycles where a cycle begins every time the chain visits state j. Yn is the n-th cycle-length.
By the Markov property, the chain starts over again and is independent of the past everytime it
enters state j (formally this follows by the Strong Markov Property). This means that the cycle
lengths Yn , n ≥ 1 form an i.i.d. sequence with common distribution the same as the first cycle
length τjj . In particular, E[Yn ] = E[τjj ] for all n ≥ 1. Now observe that the number of revisits
to state j is precisely n visits at time tn = Y1 + · · · + Yn , and thus the long-run proportion of
visits to state j per unit time can be computed as
πj = lim_{m→∞} (1/m) Σ_{k=1}^m I{Xk = j} = lim_{n→∞} n/(Y1 + · · · + Yn) = 1/E[τjj],
where the last equality follows from the Strong Law of Large Numbers. Thus in the positive
recurrent case, πj > 0 for all j ∈ S, whereas in the null recurrent case, πj = 0 for all j ∈ S.
Finally, if X0 = i 6= j, then we can first wait until the chain enters state j (which it will
eventually, by recurrence), and then proceed with the above proof. Uniqueness follows by the
unique representation.
Theorem 5 Suppose {Xn } is an irreducible Markov chain with transition matrix P. Then {Xn }
is positive recurrent if and only if there exists a (non-negative, summing to 1) solution, π, to the
set of linear equations π = πP , in which case π is precisely the unique stationary distribution
for the Markov chain.
Proof: Assume the chain is irreducible and positive recurrent. Then we know from Theorem 4
that π exists and is unique. On the one hand, if we multiply (on the right) each side of Equation
(2) by P, then we obtain
lim_{n→∞} (1/n) Σ_{m=1}^n P^{m+1} = lim_{n→∞} (1/n) Σ_{m=1}^n P^m + lim_{n→∞} (1/n)(P^{n+1} − P) = [π; π; · · · ],
which implies π = πP.
Conversely, assume the chain is either transient or null recurrent. From Theorem 4, we know
that then the limits in (2) are identically 0, that is,
lim_{n→∞} (1/n) Σ_{m=1}^n P^m = 0.
But if π = πP then (by multiplying both sides on the right by P) π = πP^2 and more generally
π = πP^m, m ≥ 1, and so
π = π (lim_{n→∞} (1/n) Σ_{m=1}^n P^m) = lim_{n→∞} (1/n) Σ_{m=1}^n πP^m = 0,
which implies π = 0, contradicting that π is a probability distribution. Having ruled out the
transient and null recurrent cases, we conclude that the chain must be positive recurrent. For
the uniqueness, suppose π' = π'P. Multiplying both sides of (2) (on the left) by π', we conclude
that
π' = π'[π; π; · · · ] = (Σ_{j∈S} π'_j) π = π,
since Σ_{j∈S} π'_j = 1. Hence π'_j = πj for all j ∈ S.
2.3 Stopping Time
Let {Fn , n ≥ 0} be an increasing family of σ-algebras and {Xn , n ≥ 0} be a {Fn , n ≥ 0}
adapted stochastic process.
Definition A stopping time with respect to {Fn} is a random variable τ such that {τ = n} is Fn-measurable for all n ≥ 0.
If Fn is the σ-algebra generated by {X0 , · · · , Xn }, the event {τ = n} is completely determined
by (at most) the total information known up to time n, {X0 , · · · , Xn }.
For example the hitting time
τi = min{n ≥ 0 : Xn = i}
of state i and
τA = min{n ≥ 0 : Xn ∈ A}.
of closed set A are stopping times.
Wald's equation: We now consider the very special case of stopping times when {Xn, n ≥ 1} is
an independent and identically distributed (i.i.d.) sequence with common mean E[X]. We are
interested in the sum up to time τ:
Σ_{n=1}^τ Xn.
Theorem (Wald’s Equation) If τ > 0 is a stopping time with respect to an i.i.d. sequence
{Xn , n ≥ 1} and if E[τ ] < ∞ and E[|X|] < ∞, then
E[Σ_{n=1}^τ Xn] = E[τ]E[X].
Proof: Since
Σ_{n=1}^τ Xn = Σ_{n=1}^∞ Xn I{τ > n − 1}
and Xn and I{τ > n − 1} are independent, we have
E[Σ_{n=1}^τ Xn] = E[X] Σ_{n=0}^∞ P(τ > n) = E[X]E[τ],
where the last equality is due to the "integrating the tail" method for computing expected values
of non-negative random variables.
Null recurrence of the simple symmetric random walk: Let Rn be the simple symmetric random
walk: Rn = ∆1 + · · · + ∆n with R0 = 0, where ∆n, n ≥ 1, are i.i.d. with P(∆ = ±1) = 0.5 and
E[∆] = 0. This Markov chain is recurrent but null recurrent. In fact, we show that E[τ11] = ∞. By
conditioning on the first step from i = 1,
E[τ11] = (1/2)(1 + E[τ21]) + (1/2)(1 + E[τ01]) = 1 + 0.5E[τ21] + 0.5E[τ01].
Note that by definition Rτ = 1 for τ = τ01, and
1 = Rτ = Σ_{n=1}^τ ∆n.
But from Wald's equation, assuming E[τ] < ∞, we conclude that
1 = E[Rτ] = E[∆]E[τ] = 0,
which yields the contradiction 1 = 0, and thus E[τ01] = E[τ11] = ∞.
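Wald's equation itself is easy to test numerically with a stopping time for which E[τ] < ∞. A sketch (not part of the original text; the step distribution and threshold are illustrative assumptions): steps Xn ∈ {1, 2} with E[X] = 1.5, and τ the first time the running sum reaches a target.

```python
import random

random.seed(3)

def run_once(target=30):
    """Sum i.i.d. steps in {1, 2} until the sum reaches `target`.
    Returns (stopping time tau, stopped sum S_tau)."""
    s = n = 0
    while s < target:
        s += random.choice((1, 2))
        n += 1
    return n, s

trials = 20000
taus, sums = zip(*(run_once() for _ in range(trials)))
mean_tau = sum(taus) / trials
mean_sum = sum(sums) / trials
# Wald: E[S_tau] = E[tau] * E[X] with E[X] = 1.5, so the two estimates agree.
```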
Theorem 6 Suppose i ≠ j are both recurrent. If i and j communicate and if j is positive
recurrent (E[τjj] < ∞), then i is positive recurrent (E[τii] < ∞) and also E[τij] < ∞. In
particular, all states in a recurrent communicating class are either all together positive recurrent
or all together null recurrent.
Proof: Assume that E[τjj] < ∞ and that i and j communicate. Choose the smallest n ≥ 1 such
that p^{(n)}_{ji} > 0. With X0 = j, let A = {Xk ≠ j, 1 ≤ k ≤ n; Xn = i}, so that P(A) > 0. Then
E[τjj] ≥ E[τjj|A]P(A) = (n + E[τij])P(A),
and hence E[τij] < ∞ (for otherwise E[τjj] = ∞, a contradiction). With X0 = j, let {Ym, m ≥ 1} be the i.i.d. process defined in the proof of Theorem 4. Thus the n-th revisit of the chain to
state j is at time tn = Y1 + · · · + Yn , and E[Y ] = E[τjj ] < ∞. Let
p = P (the chain visits state i before returning to state j|X0 = j),
then p ≥ P (A), where A is defined above. Every time the chain revisits state j, there is,
independent of the past, this probability p that the chain will visit state i before revisiting state
j again. Letting N denote the number of revisits the chain makes to state j until first visiting
state i, we thus see that N has a geometric distribution with ”success” probability p, and so
E[N] < ∞. N is a stopping time with respect to the process {Ym}, and
τji ≤ Σ_{m=1}^N Ym,
and so by Wald's equation
E[τji] ≤ E[N]E[Y] < ∞.
Finally, E[τii ] ≤ E[τij ] + E[τji ] < ∞.
Strong Markov Chain property: If τ is a stopping time with respect to the Markov chain, then
in fact, we get what is called the Strong Markov Property: Given the state Xτ at time τ (the
present), the future Xτ+1, Xτ+2, · · · is independent of the past X0, · · · , Xτ−1. The point is
that we can replace a deterministic time n by a stopping time τ and retain the Markov property.
It is a stronger statement than the Markov property. This property easily follows since {τ = n}
only depends on X0 , · · · , Xn , the past and the present, and not on any of the future. Given the
joint event (τ = n, Xn = i), the future Xn+1 , Xn+2 , · · · is still independent of the past:
P (Xn+1 = j|τ = n, Xn = i, · · · , X0 = i0 ) = P (Xn+1 = j|Xn = i, · · · , X0 = i0 ) = pij
2.4 Hitting Times and Absorption Probabilities
Let {Xn , n ≥ 0} be a Markov chain with transition matrix P . The hitting time of a subset A
of S is the random variable H A defined by
H A = inf{n : Xn ∈ A}
The probability starting from i that the chain ever hits A is then
h^A_i = P(H^A < ∞|X0 = i).
When A is a closed class, h^A_i is called the absorption probability. The mean time taken for the
chain to reach A, if P(H^A < ∞|X0 = i) = 1, is given by
k^A_i = E[H^A|X0 = i] = Σ_{n=0}^∞ n P(H^A = n|X0 = i).
The vector of hitting probabilities h^A = (h^A_i, i ∈ S) satisfies the linear system h = Ph:
h^A_i = 1 for i ∈ A,
h^A_i = Σ_{j∈S} pij h^A_j for i ∉ A.
In fact, if X0 = i ∈ A then H^A = 0, so h^A_i = 1. If X0 = i ∉ A, then H^A ≥ 1, so by the Markov
property
P(H^A < ∞|X1 = j, X0 = i) = P(H^A < ∞|X0 = j) = h^A_j
and
h^A_i = P(H^A < ∞|X0 = i) = Σ_{j∈S} P(H^A < ∞, X1 = j|X0 = i)
= Σ_{j∈S} P(H^A < ∞|X1 = j)P(X1 = j|X0 = i) = Σ_{j∈S} pij h^A_j.
Similarly, the probability fij that the chain ever visits state j satisfies
f = P f.
The vector of mean hitting times k^A = (k^A_i, i ∈ S) satisfies the following system of linear
equations, k = 1 + Pk:
k^A_i = 0 for i ∈ A,
k^A_i = 1 + Σ_{j∉A} pij k^A_j for i ∉ A.
In fact, if X0 = i ∈ A, then H^A = 0 so k^A_i = 0. If X0 = i ∉ A, then H^A ≥ 1, so by the Markov
property
E[H^A|X1 = j, X0 = i] = 1 + E[H^A|X0 = j]
and
k^A_i = E[H^A|X0 = i] = Σ_{j∈S} E[H^A I{X1 = j}|X0 = i]
= Σ_{j∈S} E[H^A|X1 = j, X0 = i]P(X1 = j|X0 = i) = 1 + Σ_{j∉A} pij k^A_j.
Remark: The systems of these equations may have more than one solution. In this case, the
vector of hitting probabilities hA and the vector of mean hitting times k A are the minimal
non-negative solutions of these systems.
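The linear systems above can be solved numerically. A sketch (not from the original text) for a birth–death walk on {0, · · · , N} with absorbing barriers at 0 and N, iterating h = Ph while holding the boundary values fixed; N = 10 and p = 1/2 are illustrative choices, for which the symmetric solution is h_i = 1 − i/N.

```python
def absorption_probs(N, p, iters=20000):
    """Hitting probability of state 0 (before N) for a walk on {0..N} with
    P(up) = p, computed by fixed-point iteration of h = Ph."""
    q = 1.0 - p
    h = [1.0] + [0.0] * N          # boundary: h_0 = 1, h_N = 0
    for _ in range(iters):
        new = h[:]
        for i in range(1, N):      # interior states: h_i = q h_{i-1} + p h_{i+1}
            new[i] = q * h[i - 1] + p * h[i + 1]
        h = new
    return h

h = absorption_probs(N=10, p=0.5)
# Symmetric case: h_i = 1 - i/10.
```

The iteration converges geometrically here because the interior part of P is strictly sub-stochastic once the boundary rows are frozen.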
2.5 Examples
In this section we discuss examples of Markov chains. First, consider the random walk,
i.e., the transition probability P satisfies
pi,i−1 = q,  pi,i+1 = p,  p, q > 0 and p + q = 1.
Example 1 (Simple Random Walk) The chain is irreducible and has period d = 2, with
p^{(2n+1)}_{ii} = 0 and p^{(2n)}_{ii} = ((2n)!/(n! n!)) p^n q^n ∼ (4pq)^n/√(πn)
by Stirling's formula. Thus, if p = q, then
Σ_n p^{(n)}_{ii} ∼ Σ_n 1/√(πn) = ∞
and the chain is recurrent. If p ≠ q, then r = 4pq < 1 and
Σ_n p^{(n)}_{ii} ∼ Σ_n r^n/√(πn) < ∞
and thus the chain is transient. If π is a stationary distribution, then
πi = p πi−1 + q πi+1,  i.e.  q(πi+1 − πi) = p(πi − πi−1).
Thus, for a summable solution we must have πi = πi−1 for all i, which forces πi = 0; no stationary
distribution exists, and for p = q the chain is null recurrent.
Example 2 (Absorbing end i = 0) S = {0, 1, · · · } with the absorbing state i = 0, i.e.,
p00 = 1. The chain has two subclasses C0 = {0} and C1 = {1, 2, · · · }. C0 is positive recurrent and
C1 is transient. π = (1, 0, 0, · · · ) is a stationary distribution. The absorption probability αi = fi0
satisfies
αi = p αi+1 + q αi−1 and p(αi+1 − αi) = q(αi − αi−1).
Thus,
αi = A + B(q/p)^i.
For q/p ≥ 1, since α is bounded, B = 0 and αi = A = 1. For q/p < 1, αi = (q/p)^i since α0 = 1 and
lim_{i→∞} αi = 0.
Example 3 (Absorbing ends i = 0, N) Let S = {0, 1, 2, · · · , N} with p00 = 1 and pNN = 1.
There are three subclasses C0 = {0}, C1 = {1, · · · , N − 1} and C2 = {N}. C0, C2 are positive
recurrent and C1 is transient. π = (α, 0, 0, · · · , β) with α, β ≥ 0 and α + β = 1 are stationary
distributions. The absorption probability αi = fi0 satisfies
αi = p αi+1 + q αi−1.
Using the same arguments as in Example 2,
αi = ((q/p)^i − (q/p)^N)/(1 − (q/p)^N) for p ≠ q,
αi = 1 − i/N for p = q.
Example 4 (Reflecting end i = 0) Let S = {0, 1, 2, · · · } and p01 = 1. The chain is
irreducible with period d = 2. For q/p < 1, fi1 = αi = (q/p)^{i−1}, i > 1, from Example 2. But if the
chain were recurrent, then fi1 = 1 for all i > 1. Thus, the chain is transient and p^{(n)}_{ij} → 0 as n → ∞.
Now, for q/p ≥ 1 we have fi1 = 1 for i > 1 and f11 = q + p f21 = 1, and hence the chain is
recurrent. If π is a stationary distribution,
π0 = π1 q,
π1 = π0 + π2 q,
πi = πi−1 p + πi+1 q,  i ≥ 2.
From the first two equations, p π1 = q π2. From the last equations, by induction in i we have
p πi = q πi+1. If p = q, then πi = π0 for all i, and Σ πi = 1 forces π0 = 0; hence no stationary
distribution exists and the chain is null recurrent.
Next, for q/p > 1 it follows from Σ πi = 1 that
1 = π1 (q + Σ_{k=0}^∞ (p/q)^k) = π1 (q + q/(q − p)).
Thus, π1 = (q − p)/(2q^2) and
π0 = (q − p)/(2q),  πi = π1 (p/q)^{i−1} for i ≥ 1.
Therefore, for q/p > 1 the chain is positive recurrent.
Example 5 (Reflecting ends i = 0, N) Let S = {0, 1, · · · , N} with p01 = 1 and pN,N−1 = 1.
The chain is irreducible with period d = 2. As we did in Example 4, we have the stationary
distribution
πi = (p/q)^{i−1} / Σ_{k=0}^{N−2} (p/q)^k,  1 ≤ i ≤ N − 1,
with π0 = q π1 and πN = p πN−1, and thus the chain is positive recurrent.
Example 6 (Birth-and-death chain) Consider the Markov chain with state space S =
{0, 1, 2, · · · } and transition probabilities p00 = 1 and pi,i−1 = qi, pi,i+1 = pi for i ≥ 1. As in
Example 2, C0 = {0} is positive recurrent and C1 = {1, 2, · · · } is transient. We wish to
calculate the absorption probability αi = fi0. Such a chain may serve as a model for the size of
a population, recorded each time it changes, pi being the probability that we get a birth before
a death in a population of size i. We have
αi = pi αi+1 + qi αi−1 and pi(αi+1 − αi) = qi(αi − αi−1).
Thus,
αi+1 = 1 − Σ_{k=0}^{i} (Π_{j=1}^{k} qj/pj)(1 − α1).
There are two different cases:
(i) If A = Σ_{k=0}^∞ Π_{j=1}^k qj/pj = ∞, then α1 = 1 and αi = 1 for all i ≥ 0.
(ii) If A = Σ_{k=0}^∞ Π_{j=1}^k qj/pj < ∞, then 1 − α1 = 1/A and
1 − αi+1 = (Σ_{k=0}^i Π_{j=1}^k qj/pj) / (Σ_{k=0}^∞ Π_{j=1}^k qj/pj),
so the population survives with positive probability.
2.6 Exercise
Problem 1 Show that the relation ↔ is transitive.
Problem 2 Show that for every Markov chain with countably many states,
lim_{n→∞} (1/n) Σ_{m=1}^n p^{(m)}_{ij} = fij/µj.
(Hint: p^{(m)}_{ij} = Σ_{k=1}^m f^{(k)}_{ij} p^{(m−k)}_{jj}.)
Problem 3 Consider an irreducible chain on {0, 1, · · · }. A necessary and sufficient condition
for the chain to be transient is that the system u = Pu (ui = Σ_{j∈S} pij uj) has a bounded solution
such that u is not constant.
Problem 4 Complete Example 5.
Problem 5 Consider a Markov chain with S = {0, 1, · · · } and transition probabilities
pij = pi > 0 if j = i + 1,  pij = ri ≥ 0 if j = i,  pij = qi > 0 if j = i − 1,  pij = 0 otherwise.
Let γn = Π_{k=1}^n qk/pk, n ≥ 1.
(1) Show that the chain is transient if and only if Σ γn < ∞, and recurrent if and only if Σ γn = ∞.
(2) Show that the chain is positive recurrent if and only if Σ 1/(γn pn) < ∞, and null recurrent if and only if Σ 1/(γn pn) = ∞.
Problem 6 Classify the states of a Markov chain with transition matrix
P =
[ p q 0 0 ]
[ 0 0 p q ]
[ p q 0 0 ]
[ 0 0 p q ]
where p + q = 1 and p ≥ 0, q ≥ 0.
3 Continuous time Markov Chain
A continuous-time Markov process (CTMC) is a stochastic process {Xt , t ≥ 0} that satisfies the
Markov property and takes values from a set S called the state space; it is the continuous-time
version of a Markov chain. For s > t
P (Xs = j|σ(Xt )) = P (Xs = j|Ft ),
where {Ft , t ≥ 0} is an increasing family of σ-algebras, Xt is Ft measurable and σ(Xt ) is the
σ-algebra generated by the random variable Xt . In effect, the state of the process at time s
is conditionally independent of the history of the process before time t, given the state of the
process at time t. The process is characterized by ”transition rates” qij between states, i.e., qij
(for i ≠ j) measures how quickly the transition from i to j happens. Precisely, after a tiny amount
of time h, the probability the state is now at j is given by
P (Xt+h = j|Xt = i) = qij h + o(h),
i 6= j,
where o(h) means that o(h)/h → 0 as h → 0+. Hence, over a sufficiently small interval of time,
the probability of a particular transition (between different states) is roughly proportional to
the duration of that interval. The qij are called transition rates because if we have a large
ensemble of n systems in state i, they will switch over to state j at an average rate of nqij until
n decreases appreciably.
The transition rates qij are given as the ij-th elements of the transition rate matrix Q.
As the transition rate matrix contains rates, the rate of departing from one state to arrive at
another should be positive, and the rate that the system remains in a state should be negative.
The rates for a given state should sum to zero, yielding the diagonal elements
qii = −Σ_{j≠i} qij.
With this notation, if we let
Pij(h) = P(Xh = j|X0 = i)
be the transition probability, then
lim_{h→0+} (P(h) − I)/h = Q.
The transition probability satisfies the semigroup property
P(t + s) = P(t)P(s) for t, s ≥ 0, with P(0) = I.
Thus,
P(t + h) − P(t) = (P(h) − I)P(t),  P(t − h) − P(t) = (I − P(h))P(t − h)
for t > 0, h > 0, and hence
P'(t) = lim_{τ→0} (P(t + τ) − P(t))/τ = QP(t).
Since
lim_{t→0+} (e^{Qt} − I)/t = Q,
where e^{Qt} is the matrix exponential defined by
e^{Qt} = Σ_{k=0}^∞ (t^k/k!) Q^k,
we obtain
P(t) = e^{Qt},
i.e., Q is the generator of P(t). Thus, letting pj(t) = P(Xt = j), the evolution of a continuous-time Markov process is given by the first-order differential equation
(d/dt) p(t) = p(t)Q,  p(0) = π = initial distribution.
The probability that no transition happens in some time r > 0 is
P(Xs = i for all s ∈ (t, t + r] | Xt = i) = e^{−qi r}.
That is, the probability distribution of the waiting time until the first transition is an exponential
distribution with rate parameter qi = −qii , and continuous-time Markov processes are thus
memoryless processes. Letting τn denote the time at which the n-th change of state (transition)
occurs, we see that Yn = Xτn+ , the state right after the n-th transition, defines the underlying
discrete-time Markov chain, called the embedded Markov chain. Yn keeps track, consecutively,
of the states visited right after each transition, and moves from state to state according to
the one-step transition probabilities πij = P (Yn+1 = j|Yn = i). This transition matrix {πij },
together with the waiting-time rates qi , completely determines the continuous time Markov
chain, i.e.,
qij = qi πij for all j ≠ i.
Hence,
Q = Λ(Π − I),  Λ = diag(q0, q1, · · · ).
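The decomposition Q = Λ(Π − I) gives a direct recipe for simulating a continuous-time Markov chain: hold in the current state i for an Exp(qi) time, then jump according to the embedded chain Π. A minimal sketch (the two-state chain and its rates are illustrative assumptions, not from the text):

```python
import random

def simulate_ctmc(q, pi_jump, x0, t_end, rng):
    """Simulate a CTMC with holding rates q[i] and jump matrix pi_jump.

    Returns the list of (time, state) pairs visited before t_end.
    """
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        # holding time in state x is Exp(q[x])
        t += rng.expovariate(q[x])
        if t >= t_end:
            return path
        # jump according to the embedded (jump) chain row pi_jump[x]
        x = rng.choices(range(len(q)), weights=pi_jump[x])[0]
        path.append((t, x))

rng = random.Random(0)
q = [1.0, 2.0]                      # q_i = -q_ii (assumed toy rates)
pi_jump = [[0.0, 1.0], [1.0, 0.0]]  # two states: always jump to the other
path = simulate_ctmc(q, pi_jump, 0, 10.0, rng)
```

With this deterministic jump chain the simulated states must alternate and the jump times must increase, which is easy to check on the returned path.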
Example (Poisson counting process) Let Nt, t ≥ 0, be the counting process for a Poisson process at rate λ. Then Nt forms a continuous-time Markov chain with S = {0, 1, 2, · · · } and qi,j = λ for j = i + 1, otherwise 0, i.e., πi,i+1 = 1. This process is characterized by a rate
parameter λ, also known as intensity, such that the number of events in time interval (t, t + τ ]
follows a Poisson distribution with associated parameter λτ , i.e.,
P(Nt+τ − Nt = k) = e^{−λτ} (λτ)^k / k!,  k = 0, 1, . . . ,
where k is the number of jumps during (t, t + τ]. That is,
pk(τ) = e^{−λτ} (λτ)^k / k!
satisfies
d/dt pk(t) = −λ pk(t) + λ pk−1(t)
and thus d/dt p(t) = p(t)Q. The increment Nt+h − Nt is independent of Ft and the gaps τ1, τ2, · · ·
between successive jumps are independent and identically distributed with exponential distribution;
P (τi ≥ t) = P (N (t) = 0) = e−λt , t ≥ 0.
Thus, a concrete construction of a Poisson process can be done as follows. Let {τn, n ≥ 1} be a sequence of i.i.d. random variables with exponential law of parameter λ. Set T0 = 0 and for n ≥ 1, Tn = τ1 + · · · + τn. Note that limn→∞ Tn = ∞ almost surely, because by the strong law of large numbers
lim_{n→∞} Tn/n = E[τ] = 1/λ.
Let Nt, t ≥ 0, be the arrival process associated with the arrival times Tn. That is,
Nt = ∑_{n=0}^∞ n I{Tn ≤ t < Tn+1}.   (3.1)
The characteristic function of Nt is given by
E[e^{iNt ξ}] = ∑_{n=0}^∞ e^{inξ} e^{−λt} (λt)^n / n! = e^{λt(e^{iξ} − 1)}.
Thus,
E[Nt] = λt,  Var(Nt) = λt,
and λ is the expected number of arrivals in an interval of unit length, or in other words, the arrival rate. On the other hand, the expected time until a new arrival is 1/λ. Consequently,
E[|Nt − Ns|²] = λ|t − s| + (λ|t − s|)².
The Poisson process is continuous in mean of order 2 but the sample paths of the Poisson process
are discontinuous with jumps of size 1.
Example (Sum of Poisson processes) Let {Lt , t ≥ 0} and {Mt , t ≥ 0} be two independent
Poisson processes with respective rates λ and µ. The process Nt = Lt + Mt is a Poisson process
of rate λ + µ.
Proof: Clearly, the process Nt has independent increments and N0 = 0. Then, it suffices to show
that for each 0 < s < t, the random variable Nt − Ns has a Poisson distribution of parameter
(λ + µ)(t − s).
P(Nt − Ns = n) = ∑_{k=0}^n P(Lt − Ls = k, Mt − Ms = n − k)
= ∑_{k=0}^n e^{−λ(t−s)} (λ(t−s))^k / k! · e^{−μ(t−s)} (μ(t−s))^{n−k} / (n−k)!
= e^{−(λ+μ)(t−s)} ((λ + μ)(t − s))^n / n!.
Example (Compounded Poisson process) Let {Xn, n ≥ 0} be a Markov chain with transition probability Π and define the continuous-time Markov chain Xt by
Xt = X_{Nt}.
Then,
pi,j(t) = P(Xt = j | X0 = i) = ∑_{k=0}^∞ e^{−λt} (λt)^k / k! π^{(k)}_{i,j},
or equivalently
P(t) = ∑_{k=0}^∞ e^{−λt} (λt)^k / k! Π^k = e^{λ(Π−I)t} = e^{Qt},
where Q = λ(Π − I) is the generator of Xt.
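The identity P(t) = ∑_k e^{−λt}(λt)^k/k! Π^k = e^{Qt} (often called uniformization) is easy to verify numerically for a small chain. Both series below are truncated, and the 2×2 jump chain is an assumed example:

```python
from math import exp, factorial

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_scale_add(A, B, c):
    return [[A[i][j] + c * B[i][j] for j in range(len(A))] for i in range(len(A))]

def uniformized(Pi, lam, t, terms=60):
    # P(t) = sum_k e^{-lam t} (lam t)^k / k! * Pi^k
    n = len(Pi)
    Pk = [[float(i == j) for j in range(n)] for i in range(n)]  # Pi^0 = I
    out = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        w = exp(-lam * t) * (lam * t) ** k / factorial(k)
        out = mat_scale_add(out, Pk, w)
        Pk = mat_mul(Pk, Pi)
    return out

def expm(Q, t, terms=60):
    # truncated power series for e^{Qt}
    n = len(Q)
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    out = [row[:] for row in term]
    for k in range(1, terms):
        term = mat_mul(term, [[Q[i][j] * t / k for j in range(n)] for i in range(n)])
        out = mat_scale_add(out, term, 1.0)
    return out

lam = 1.5
Pi = [[0.0, 1.0], [0.5, 0.5]]  # jump chain (assumed)
Q = [[lam * (Pi[i][j] - (i == j)) for j in range(2)] for i in range(2)]
P1 = uniformized(Pi, lam, 0.7)
P2 = expm(Q, 0.7)
```

The two matrices agree, and each row of P(t) sums to 1, as a transition matrix must.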
In general, the construction of a continuous-time Markov chain with generator Q and initial
distribution π is as follows. Consider a discrete-time Markov chain {Yn, n ≥ 0} with initial distribution π and transition matrix Π. The stochastic process {Xt, t ≥ 0} will visit successively the states Y0, Y1, Y2, · · · starting from X0 = Y0. Denote by HY0, · · · , HYn−1 the holding times in the states Y0, · · · , Yn−1. We assume the holding times HY0, · · · , HYn−1 are independent exponential random variables of parameters qY0, · · · , qYn−1, i.e., for j ∈ S
P(Hj ≥ t) = e^{−qj t},  t ≥ 0.
Let Tn = HY0 + · · · + HYn−1 and
Xt = Yn for Tn ≤ t < Tn+1.
The random time
ζ = ∑_{n=0}^∞ HYn
is called the explosion time. We say that the Markov chain Xt is not explosive if P (ζ = ∞) = 1.
Let {Xt , t ≥ 0} be an irreducible continuous-time Markov chain with generator Q. The
following statements are equivalent:
(i) The jump chain Π is positive recurrent.
(ii) Q is not explosive and has an invariant distribution π.
Moreover, under these assumptions, we have
lim_{t→∞} pij(t) = 1/(qj μj),
where μj = E[τjj | X0 = j] is the expected return time to the state j.
3.1 Explosion
When a state space S is infinite, it can happen that the process, through successive jumps, moves to states that have shorter waiting times, i.e., larger jump rates qi. The waiting time at state i has the expected value E[τi] = 1/qi.
Example (Birth process) A birth process {Xt, t ≥ 0} is a generalization of the Poisson process in which the parameter λ is allowed to depend on the current state of the process. The data for a birth process consist of birth rates qi > 0, where i ≥ 0. Then, a birth process {Xt, t ≥ 0} is a continuous-time Markov chain with state space S = {0, 1, 2, · · · } and generator Q:
qi,i = −qi,  qi,j = qi for j = i + 1,  qij = 0 otherwise.
That is, conditional on X0 = i, the holding times Hi , Hi+1 , · · · are independent exponential
random variables of parameters qi , qi+1 , · · · , respectively, and the jump chain is given by Yn =
i + n.
Concerning the explosion time, two cases are possible:
(i) If ∑_{j=0}^∞ 1/qj < ∞, then ζ < ∞ a.s.
(ii) If ∑_{j=0}^∞ 1/qj = ∞, then ζ = ∞ a.s.
In fact, if ∑_{j=0}^∞ 1/qj < ∞, then by the monotone convergence theorem
E[ζ | X0 = i] = E[∑_{n=0}^∞ τn | X0 = i] = ∑_{j=0}^∞ 1/q_{i+j} < ∞,
so that ζ < ∞ a.s. If ∑_{j=0}^∞ 1/q_{i+j} = ∞, then Π_{j=0}^∞ (1 + 1/q_{i+j}) = ∞ and, since the τn are independent,
E[e^{−∑_{n=0}^∞ τn}] = Π_{n=0}^∞ E[e^{−τn}] = Π_{n=0}^∞ (1 + 1/q_{i+n})^{−1} = 0,
so ∑_{n=0}^∞ τn = ∞ a.s.
Particular case (Simple birth process): Consider a population in which each individual gives
birth after an exponential time of parameter λ, all independently. If i individuals are present
then the first birth will occur after an exponential time of parameter iλ. Then we have i + 1
individuals and, by the memoryless property, the process begins afresh. Then the size of the
population performs a birth process with rates qi = iλ, i ≥ 1. Suppose X0 = 1. Note that ∑_{i=1}^∞ 1/(iλ) = ∞, so ζ = ∞ a.s. and there is no explosion in finite time. However, the mean population size grows exponentially: E[Xt] = e^{λt}. Indeed, let τ be the time of the first birth.
If we let μ(t) = E[Xt], then
μ(t) = E[Xt I{τ ≤ t}] + E[Xt I{τ > t}] = ∫_0^t 2λ e^{−λs} μ(t − s) ds + e^{−λt},
so, by letting r = t − s,
e^{λt} μ(t) = 1 + 2λ ∫_0^t e^{λr} μ(r) dr,
and thus μ(t) = e^{λt}.
The birth process with qi = (i + 1)² is explosive since
∑_i 1/(i + 1)² < ∞.
With bounded qi the birth process is not explosive. If qi > 0 is not bounded, then Q is no longer a bounded operator.
Theorem (Explosive) The Markov chain corresponding to the transition rate matrix Q starting from i explodes in finite time if and only if there exists a nonnegative bounded sequence
with Ui > 0 that satisfies
∑_j qij Uj ≥ σ Ui for all i,
for some σ > 0.
Theorem (Non Explosive) If for some σ > 0, there exists a nonnegative U on S that satisfies
∑_j qij Uj ≤ σ Ui for all i,
and Ui → ∞ as qi → ∞, then the chain is not explosive.
3.2 Invariant distribution
A probability distribution (or, more generally, a measure) π on the state space S is said to
be invariant for a continuous-time Markov chain {Xt , t ≥ 0} if πP (t) = π for all t ≥ 0. If
we choose an invariant distribution π as initial distribution of the Markov chain {Xt, t ≥ 0}, then the distribution of Xt is π for all t ≥ 0. If {Xt, t ≥ 0} is an irreducible and recurrent continuous-time Markov chain (that is, the associated jump matrix Π is recurrent) with generator Q, then a measure π is invariant if and only if
πQ = 0,
and there is a unique (up to multiplication by constants) solution π which is strictly positive. On the other hand, if we set αj = qj πj, then it is equivalent to say that α is invariant for the jump matrix Π. In fact, we have α(Π − I) = 0 if and only if πQ = 0.
That is, to find the stationary probability distribution vector, we must first find α such that
α(I − Π) = 0,
with α a row vector such that all elements of α are greater than 0 and ∑_{j∈S} αj = 1. From this, π may be found as
πj = αj / qj,
normalized so that ∑_j πj = 1.
A CTMC is called positive recurrent if it is irreducible and all states are positive recurrent.
We define the limiting probabilities for the CTMC as the long-run proportion of time the chain
spends in each state j ∈ S:
Pj = lim_{t→∞} (1/t) ∫_0^t I{Xs = j | X0 = i} ds, w.p.1,
which after taking expected values yields
Pj = lim_{t→∞} (1/t) ∫_0^t Pij(s) ds.
When each Pj exists and ∑_j Pj = 1, then P = (Pj, j ∈ S) (as a row vector) is called the limiting (or stationary) distribution for the Markov chain.
Proposition 1 If Xt is a positive recurrent CTMC, then the limiting probability distribution
P exists, is unique, and is given by
Pj = E[Hj] / E[τjj] = 1 / (qj E[τjj]).
In words: the long-run proportion of time the chain spends in state j equals the expected amount of time spent in state j during a cycle divided by the expected cycle length (between visits to state j). Moreover, the stronger mode of convergence holds: Pj = lim_{t→∞} Pij(t). Finally, if the chain is either null recurrent or transient, then Pj = 0 for all j ∈ S, and no limiting distribution exists.
Example (Birth-Death process) A birth-death chain is a continuous time Markov chain
with state space S = {0, 1, 2, · · · } (representing population size) and transition rates:
qi,i+1 = λi,  qi,i−1 = μi,  qi,i = −λi − μi
with μ0 = 0. Thus,
πi,i+1 = pi,  πi,i−1 = 1 − pi,  with pi = λi / (λi + μi).
The matrix Π is irreducible. Notice that
∑_n π^{(n)}_{ii} / (λi + μi)
is the expected time spent in state i. A necessary and sufficient condition for non-explosion is then
∑_{i=0}^∞ ∑_n π^{(n)}_{ii} / (λi + μi) = ∞.
On the other hand, the equation πQ = 0 satisfied by invariant measures leads to the system
μ1 π1 = λ0 π0,
λ0 π0 + μ2 π2 = (λ1 + μ1) π1,
λ_{i−1} π_{i−1} + μ_{i+1} π_{i+1} = (λi + μi) πi,  i ≥ 2.
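As a concrete special case (assumed here for illustration), the M/M/1 queue has λi = λ and μi = μ; the system πQ = 0 is then solved by the detailed-balance relation λπi = μπ_{i+1}, giving the geometric invariant distribution πi = (1 − ρ)ρ^i with ρ = λ/μ < 1. A quick numerical check:

```python
lam, mu = 1.0, 2.0
rho = lam / mu
N = 200  # truncation level for the infinite state space

# geometric invariant distribution pi_i = (1 - rho) * rho**i
pi = [(1.0 - rho) * rho**i for i in range(N)]

# detailed balance: lam * pi[i] == mu * pi[i + 1] for every i
balance_gap = max(abs(lam * pi[i] - mu * pi[i + 1]) for i in range(N - 1))
total = sum(pi)
```

The detailed-balance gap vanishes and the truncated mass sums to 1 up to the (tiny) geometric tail.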
So, πi is an equilibrium if and only if
λi πi = μ_{i+1} π_{i+1},
in which case
πi = (Π_{k=0}^{i−1} λk / Π_{k=1}^{i} μk) π0.
Hence, an invariant distribution exists if and only if
c = ∑_{i≥1} Π_{k=0}^{i−1} λk / Π_{k=1}^{i} μk < ∞,
and the invariant distribution is
π0 = 1/(1 + c),  πi = (Π_{k=0}^{i−1} λk / Π_{k=1}^{i} μk) π0.

3.3 Dynkin's formula
Let τA be the exit time from A:
τA = inf{t ≥ 0 : Xt ∉ A}.
Theorem For λ > 0 the function
Ui = E[e^{−λτA} f(X_{τA}) | X0 = i]   (3.2)
is the unique solution to
(QU)j = λ Uj, j ∈ A,  Ui = f(i), i ∉ A.
Proof: First, note that if i ∉ A, then τA = 0 and Ui = f(i). Since
d/dt (e^{−λt} P(t)) = (Q − λI) e^{−λt} P(t),
we have
e^{−λt} P(t) = I + ∫_0^t e^{−λs} P(s)(Q − λI) ds.   (3.3)
Thus
Mt = e^{−λt} f(Xt) − f(i) − ∫_0^t e^{−λs} (Q − λI) f(Xs) ds is a martingale   (3.4)
with respect to (Ω, Ft, P). In fact, for t ≥ s,
E^i(Mt − Ms | Fs) = e^{−λs} E^i(e^{−λ(t−s)} f(Xt) − f(Xs) − ∫_s^t e^{−λ(σ−s)} (Q − λI) f(Xσ) dσ | Fs)
= e^{−λs} [e^{−λ(t−s)} P(t − s) f(Xs) − f(Xs) − ∫_s^t e^{−λ(σ−s)} P(σ − s)(Q − λI) f(Xs) dσ] = 0,
where we used
E^i(f(Xt) | Fs) = P(t − s) f(Xs).
Thus, by Doob's optional sampling theorem, E[Mτ] = 0 for a stopping time τ ≥ 0 and we have
E[e^{−λτ} φ(Xτ) | X0 = i] = φ(i) + E[∫_0^τ e^{−λs} (Q − λI) φ(Xs) ds | X0 = i].   (3.5)
Suppose U satisfies the system above; letting φ = U and τ = τA,
E[e^{−λτA} U(X_{τA}) | X0 = i] − Ui = 0,
which implies (3.2) holds.
Remark (1) The equation
λUj − (QU)j = gj, j ∈ A,  Ui = f(i), i ∉ A,   (3.6)
has the unique solution of the form
Ui = E[e^{−λτA} f(X_{τA}) | X0 = i] + E[∫_0^{τA} e^{−λs} g(Xs) ds | X0 = i].
(2) If λ = 0 it is required that P(τA < ∞) = 1.
(3) If U satisfies (QU)j = −1, j ∈ A and Ui = 0 for i ∉ A, then
E[τA | X0 = j] = Uj.
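The mean-exit-time characterization can be checked on a tiny chain. Take a birth-death chain on {0, 1, 2} with absorption at state 2 (the rates below are an assumed toy example: 0 → 1 at rate 1; 1 → 0 and 1 → 2 each at rate 1), and solve the linear system (QU)j = −1 on A = {0, 1} with U2 = 0; the sign convention (QU)j = −1 is used so that Uj = E[τA | X0 = j] is nonnegative:

```python
# Q restricted to A = {0, 1}; state 2 is absorbing, with U_2 = 0.
QA = [[-1.0, 1.0],
      [1.0, -2.0]]
b = [-1.0, -1.0]  # right-hand side of (QU)_j = -1

# Solve the 2x2 system QA @ U = b by Cramer's rule
det = QA[0][0] * QA[1][1] - QA[0][1] * QA[1][0]
U0 = (b[0] * QA[1][1] - QA[0][1] * b[1]) / det
U1 = (QA[0][0] * b[1] - b[0] * QA[1][0]) / det
```

A first-step sanity check agrees: E[τ | X0 = 0] = 1 + E[τ | X0 = 1] and E[τ | X0 = 1] = 1/2 + (1/2) E[τ | X0 = 0], giving U0 = 3, U1 = 2.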
3.4 Exercises
Problem 1 Show that E[Nt] = λt and Var(Nt) = λt.
Problem 2 Show that the process defined by (3.1) is a Poisson process.
Problem 3 Construct a binary (S = {0, 1}) continuous-time Markov process.
Problem 4 Let {Lt, t ≥ 0} and {Mt, t ≥ 0} be two independent Poisson processes with respective rates λ and μ. Show that the process Xt = Lt − Mt is a continuous-time Markov chain on S = {integers} and find its generator. Let Pn(t) = P(Xt = n | X0 = 0). Show that
∑_{n=−∞}^∞ Pn(t) z^n = e^{−(λ+μ)t} e^{λzt + μz^{−1}t},  z ≠ 0,
and
E[Xt] = (λ − μ)t,  E[|Xt|²] = (λ + μ)t + (λ − μ)² t².

4 Markov Process
Let (S, B) be a measurable space. A discrete-time Markov process {Xn, n ≥ 0} is fully described by the one-step transition probability Π(x, A), defined for x ∈ S and A ∈ B, which is a probability measure on (S, B) in its second argument, with
Π(x, A) = P(X1 ∈ A | X0 = x).
The multistep transition probabilities {Π^{(n)}(x, A)} are determined by
Π^{(n+1)}(x, A) = ∫_S Π^{(n)}(y, A) Π(x, dy).
Then, they satisfy the Chapman-Kolmogorov equations:
Π^{(n+m)}(x, A) = ∫_S Π^{(n)}(y, A) Π^{(m)}(x, dy).
For a continuous-time Markov process {Xt, t ≥ 0} we use the transition probabilities p(t, x, A), defined for t ≥ 0, x ∈ S and A ∈ B by
p(t, x, A) = P(Xt ∈ A | X0 = x).
They satisfy the Chapman-Kolmogorov equations
p(t + s, x, A) = ∫_S p(s, y, A) p(t, x, dy).
Given transition probabilities, we define a consistent family of finite-dimensional distributions on (Ω, F, P) by
F_{t1,··· ,tn}(B1 × · · · × Bn) = ∫_{B1} ∫_{B2} · · · ∫_{Bn} p(t1, x, dy1) p(t2 − t1, y1, dy2) · · · p(tn − tn−1, yn−1, dyn)   (4.1)
for the cylinder set, given arbitrary 0 < t1 < · · · < tn and Bj ∈ B. It reflects the fact that the increments Xtj − Xtj−1, 1 ≤ j ≤ n, are independent random variables. Conversely, given such a consistent family of finite-dimensional distributions, by the Kolmogorov extension theorem there exists a Markov process ωt which satisfies
P(⋂_{j=1}^n {ωtj ∈ Bj}) = F_{t1,··· ,tn}(B1 × · · · × Bn).
Suppose {Yn, n ≥ 1} are i.i.d. random variables with distribution α. Let Sn = Y1 + · · · + Yn and let Nt be a Poisson process. We define the compound process Xt = S_{Nt}. Such a process inherits the independent increment property from Nt. The distribution of any increment Xt+h − Xt is that of Xh and is determined by the distribution of Sn with n Poisson distributed; in particular, with parameter λt,
E[e^{i(ξ,Xt)}] = ∑_{n=0}^∞ e^{−λt} (λt)^n / n! α̂(ξ)^n = e^{−λt} e^{λt α̂(ξ)} = e^{λt(α̂(ξ) − 1)} = e^{λt ∫_S (e^{i(ξ,x)} − 1) dα(x)},
where
E[e^{i(ξ, ∑_{k=1}^n Yk)}] = E[Π_{k=1}^n e^{i(ξ,Yk)}] = α̂(ξ)^n,  α̂(ξ) = ∫_S e^{i(ξ,x)} dα(x).
In other words, Xt has an infinitely divisible distribution with Levy measure λtα. If we let M = λα, we have
E[e^{i(ξ,Xt)}] = e^{t ∫_S (e^{i(ξ,x)} − 1) dM(x)}.   (4.2)
4.1 Infinite number of small jumps
A Poisson process cannot have an infinite number of jumps in a finite interval. But by adding an infinite number of compound Poisson processes with small jumps we can, in principle, obtain a convergent sum. That is, let {Xk(t)} be a family of mutually independent compound Poisson processes with Mk = λk αk and
Xt = ∑_k Xk(t).
If the sum exists then it is a process with independent increments. We may center these processes with suitable constants ak t and define
Xt = ∑_k (Xk(t) − ak t).
We assume
∑_k ∫_{|x|>1} dMk(x) < ∞   (4.3)
and
∑_k ∫_{|x|≤1} x² dMk(x) < ∞.   (4.4)
We decompose Mk as Mk = Mk^{(1)} + Mk^{(2)}, corresponding to jumps of sizes |x| ≤ 1 and |x| > 1. From (4.3),
M^{(2)} = ∑_k Mk^{(2)}
sums to a finite measure and the corresponding process
Xt^{(2)} = ∑_k Xk^{(2)}(t)
exists. Since
∑_k P(sup_{0≤s≤t} |Xk^{(2)}(s)| ≠ 0) ≤ ∑_k (1 − e^{−t Mk^{(2)}(R)}) ≤ ∑_k t Mk^{(2)}(R) < ∞,
it follows from the Borel-Cantelli lemma that in any finite interval the sum is almost surely a finite sum.
For the convergence of ∑_k Xk^{(1)}(t) we let ak = ∫_{|x|≤1} x dMk(x), and we have
E[|Xk^{(1)}(t) − ak t|²] = t ∫_{|x|≤1} x² dMk(x).
From (4.4) and the Kolmogorov two-series theorem,
∑_k (Xk^{(1)}(t) − ak t)
converges to Xt^{(1)}. A simple application of Doob's inequality shows that the convergence is in fact a.s. uniform on finite time intervals: define the tail
Tn(t) = ∑_{k≥n} (Xk^{(1)}(t) − ak t).
Since E[Xk^{(1)}(t) − ak t] = 0, Tn(t) is a martingale, and by Doob's martingale inequality
P(sup_{0≤s≤t} |Tn(s)| > δ) ≤ (1/δ²) ∑_{k≥n} Var(Xk^{(1)}(t) − ak t) → 0 as n → ∞.
If we now reassemble the pieces we obtain
E[e^{iξXt}] = e^{t ∫_{|x|≤1} (e^{iξx} − 1 − iξx) dM(x) + t ∫_{|x|>1} (e^{iξx} − 1) dM(x)},   (4.5)
which is the Levy-Khintchine representation of infinitely divisible distributions, except for the missing Brownian motion term.
4.2 Feller semigroup
Let B(S) be the Banach space of all essentially bounded functions f: S → R with the norm
|f|∞ = sup_{x∈S} |f(x)|.
Define a family of bounded linear operators {T(t), t ≥ 0} in L(B(S)) by
(T(t)f)(x) = ∫_S f(y) p(t, x, dy) = E^x[f(Xt)],
where
E x [f (Xt )] = E[f (Xt )|X0 = x]
The collection of {T (t), t ≥ 0} has the properties
(1) T (t) maps nonnegative function on (S, B) into nonnegative functions.
(2) |T (t)f |∞ ≤ |f |∞ for all f ∈ X and T (t)1 = 1. Thus, kT (t)k = 1.
(3) T (0) = I, T (t + s) = T (t)T (s) (semigroup property) for t, s ≥ 0.
Let C0 (S) denote the space of all real-valued continuous functions on S that vanish at
infinity, equipped with the sup-norm |f | = |f |∞ . A Feller semigroup on C0 (S) is a collection
{T (t), t ≥ 0} of positive linear operators from C0 (S) to itself such that
(1) |T (t)f | ≤ |f | for all t ≥ 0,
(2) the semigroup property: T (t + s) = T (t)T (s) for all s, t ≥ 0,
(3) limt→0+ |T(t)f − f| = 0 for every f in C0(S) (strong continuity at 0).
Thus, we let X be the subspace of B(S) such that
X = {f ∈ B(S) : lim_{t→0+} |T(t)f − f| = 0},
and the collection {T(t), t ≥ 0} forms a strongly continuous semigroup on X.
Let {Xn , n ≥ 0} be a discrete time Markov process with transition probability Π(x, A).
Define the bounded linear operator Π on X by
(Πf)(x) = ∫_S f(y) Π(x, dy) = E[f(X1) | X0 = x].
Define a continuous-time Markov process by Xt = X_{Nt}. Then,
T(t) = ∑_{n=0}^∞ e^{−λt} (λt)^n / n! Π^n = e^{λt(Π−I)} = e^{At},
where A = λ(Π − I).
In general we define the infinitesimal generator A of {T(t), t ≥ 0} by
Af = s-lim_{t→0+} (T(t)f − f)/t
with domain
dom(A) = {f ∈ X : s-lim_{t→0+} (T(t)f − f)/t exists}.
If {Xt, t ≥ 0} is a Markov process with stationary increments then we have a convolution semigroup
(T(t)f)(x) = ∫_S f(x + y) μt(dy),
where μt+s = μt ∗ μs for t, s ≥ 0 and
p(t, x, A) = ∫_S 1_A(x + y) μt(dy).
Then,
Af = s-lim_{t→0+} (μt ∗ f − f)/t.
Theorem (C0-semigroup) Let u(t) = T(t)f = E^x[f(Xt)].
(1) u(t) = T(t)f ∈ C(0, T; X) for every f ∈ X.
(2) If f ∈ dom(A), then u ∈ C¹(0, T; X) ∩ C(0, T; dom(A)) and
d/dt u(t) = Au(t) = AT(t)f.
(3) The infinitesimal generator A is closed and densely defined. For f ∈ X,
T(t)f − f = A ∫_0^t T(s)f ds.   (4.6)
(4) For λ > 0 the resolvent is given by
(λI − A)^{−1} = ∫_0^∞ e^{−λs} T(s) ds   (4.7)
with estimate
|(λI − A)^{−1}| ≤ 1/λ.   (4.8)
Proof: (1) follows from the semigroup property and the fact that for h > 0
u(t + h) − u(t) = (T (h) − I)T (t)f
and for t − h ≥ 0
u(t − h) − u(t) = T (t − h)(I − T (h))f.
Thus, u ∈ C(0, T; X) follows from the strong continuity of T(t) at t = 0.
(2)-(3) Moreover,
(u(t + h) − u(t))/h = ((T(h) − I)/h) T(t)f = T(t) (T(h)f − f)/h,
and thus T(t)f ∈ dom(A) and
lim_{h→0+} (u(t + h) − u(t))/h = AT(t)f = Au(t).
Similarly,
lim_{h→0+} (u(t − h) − u(t))/(−h) = lim_{h→0+} T(t − h) (T(h)f − f)/h = T(t)Af.
Hence, for f ∈ dom(A),
T(t)f − f = ∫_0^t AT(s)f ds = ∫_0^t T(s)Af ds = A ∫_0^t T(s)f ds.   (4.9)
If fn ∈ dom(A), fn → f and Afn → y in X, we have
T(t)f − f = ∫_0^t T(s)y ds.
Since
lim_{t→0+} (1/t) ∫_0^t T(s)y ds = y,
f ∈ dom(A) and y = Af, and hence A is closed. Since A is closed it follows from (4.9) that for f ∈ X
∫_0^t T(s)f ds ∈ dom(A)
and (4.6) holds. For f ∈ X let
fh = (1/h) ∫_0^h T(s)f ds ∈ dom(A).
Since fh → f as h → 0+ , dom(A) is dense in X.
(4) For λ > 0 define Rt ∈ L(X) by
Rt = ∫_0^t e^{−λs} T(s) ds.
Since A − λI is the infinitesimal generator of the semigroup e^{−λt} T(t), from (4.6)
(λI − A)Rt f = f − e^{−λt} T(t)f → f as t → ∞.
Since A is closed and |e^{−λt} T(t)| → 0 as t → ∞, R = lim_{t→∞} Rt satisfies
(λI − A)Rf = f.
Conversely, for f ∈ dom(A),
R(A − λI)f = ∫_0^∞ e^{−λs} T(s)(A − λI)f ds = lim_{t→∞} e^{−λt} T(t)f − f = −f.
Hence
R = ∫_0^∞ e^{−λs} T(s) ds = (λI − A)^{−1}.
Since for f ∈ X
|Rf| ≤ ∫_0^∞ |e^{−λs} T(s)f| ds ≤ ∫_0^∞ e^{−λs} |f| ds = (1/λ)|f|,
we have
|(λI − A)^{−1}| ≤ 1/λ,  λ > 0.

4.3 Infinitesimal generator
In this section we discuss examples of Markov process and the corresponding generators.
Example (Poisson process) For a Poisson process {Nt, t ≥ 0},
Af = λ(f(i + 1) − f(i)),  i ∈ S = {0, 1, · · · }.
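The generator formula can be verified numerically: (T(t)f)(i) = ∑_k e^{−λt}(λt)^k/k! f(i + k), and (T(t)f − f)/t should approach λ(f(i + 1) − f(i)) as t → 0+. A sketch with an assumed test function f(i) = i²:

```python
from math import exp, factorial

lam = 1.5
f = lambda i: i * i  # assumed test function

def Tf(t, i, terms=80):
    # (T(t)f)(i) = E[f(i + N_t)] for the rate-lam Poisson process
    return sum(exp(-lam * t) * (lam * t) ** k / factorial(k) * f(i + k)
               for k in range(terms))

i = 3
generator = lam * (f(i + 1) - f(i))  # Af(i) = lam * (f(i+1) - f(i))
t = 1e-6
quotient = (Tf(t, i) - f(i)) / t     # difference quotient at small t
```

For this quadratic f the exact quotient is λ(2i + 1) + λ²t, so it matches the generator value up to O(t).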
Example (Transport Process) For the shift (deterministic) process Xt = x + ct,
T(t)f = E[f(Xt) | X0 = x] = f(x + ct)
and
Af = c f′(x), with dom(A) = Lipschitz functions.
Example (Levy process) Consider a process that has the Levy representation
E^x[e^{iξXt}] = e^{t ∫_R (e^{iξz} − 1) dM(z)} e^{iξx}
with a finite Levy measure M(dz). Then,
Af = ∫_R (f(x + z) − f(x)) dM(z).   (4.10)
In fact, for f = e^{iξx} we have
T(t) e^{iξx} = e^{t ∫_R (e^{iξz} − 1) dM(z)} e^{iξx}
and thus
A e^{iξx} = ∫_R (e^{iξ(x+z)} − e^{iξx}) dM(z).
Since for any f we have f(x) = (1/2π) ∫_R f̂(ξ) e^{iξx} dξ by the inverse Fourier transform, (4.10) holds.
Example (Brownian Motion) A Brownian motion {Bt, t ≥ 0} is a Markov process with the transition probability
p(t, x, A) = ∫_A (1/(σ√(2πt))) e^{−|y−x|²/(2σ²t)} dy
and
E^x[e^{iξBt}] = (1/(σ√(2πt))) ∫_R e^{iξy} e^{−|y−x|²/(2σ²t)} dy = e^{−(σ²/2) t|ξ|²} e^{iξx}.
Thus,
(Af)(x) = −(1/2π) ∫_R (σ²/2) |ξ|² f̂(ξ) e^{iξx} dξ = (σ²/2) f″(x),
with dom(A) = C0²(R).
Example (Levy-Khintchine process) For the process defined by (4.5) we have
(Af)(x) = (σ²/2) f″(x) + c f′(x) + ∫_{|z|≤1} (f(x+z) − f(x) − z f′(x)) dM(z) + ∫_{|z|>1} (f(x+z) − f(x)) dM(z).
Example (Cauchy Process) A Cauchy process {Xt, t ≥ 0} is a Markov process with the transition probability
p(t, x, A) = (1/π) ∫_A t/(t² + (y − x)²) dy
and
E^x[e^{iξXt}] = (1/π) ∫_R e^{iξy} t/(t² + (y − x)²) dy = e^{−t|ξ|} e^{iξx}.
Thus,
(Af)(x) = −(1/2π) ∫_R |ξ| f̂(ξ) e^{iξx} dξ = (1/π) ∫_R (f(z) − f(x))/|x − z|² dz,
with dom(A) = C0¹(R).
In general, for the symmetric α-stable Levy process,
E^x[e^{iξXt}] = e^{−t|ξ|^α} e^{iξx}.
Example (Gamma Process) A Gamma process {Xt, t ≥ 0} is a Markov process with the transition probability
p(t, x, A) = ∫_A (1/Γ(t)) e^{−(y−x)} (y − x)^{t−1} dy
and
E^x[e^{iξXt}] = (1/Γ(t)) ∫_R e^{iξx} e^{−(1−iξ)(y−x)} (y − x)^{t−1} dy = (1 − iξ)^{−t} e^{iξx}.
Thus,
(Af)(x) = ∫_0^∞ (f(x + z) − f(x)) (e^{−z}/z) dz,
with dom(A) = C0¹(R).

4.4 Dynkin's formula
Theorem Let f be a bounded continuous function in dom(A). Then
Mt = f(Xt) − f(x) − ∫_0^t Af(Xs) ds
is a martingale with respect to (Ω, Ft, P).
Proof: For t ≥ s,
E^x[Mt − Ms | Fs] = E^x[f(Xt) − f(Xs) − ∫_s^t Af(Xσ) dσ | Fs] = T(t − s)f(Xs) − f(Xs) − ∫_s^t T(σ − s)Af(Xs) dσ = 0.
Remark For f ∈ dom(A),
e^{−λt} f(Xt) − f(x) − ∫_0^t e^{−λs} (A − λI) f(Xs) ds, for λ ∈ R,
and
f(Xt) exp(−∫_0^t (Af(Xs)/f(Xs)) ds), for uniformly positive f,
are martingales.
The characteristic operator Ac is defined by
(Ac f)(x) = lim_{U↓x} (E^x[f(X_{τU})] − f(x)) / E^x[τU],
where the sets U form a sequence of open sets Uk that decrease to the point x in the sense that
Uk+1 ⊆ Uk and ⋂_{k=1}^∞ Uk = {x},
and
τU = inf{t ≥ 0 : Xt ∉ U}
is the exit time from U for Xt. dom(Ac) denotes the set of all f for which this limit exists for all x ∈ S and all sequences {Uk}. If E^x[τU] = ∞ for all open sets U containing x, define Ac f(x) = 0. The characteristic operator is an extension of the infinitesimal generator, i.e., dom(A) ⊂ dom(Ac) and Ac f = Af for f ∈ dom(A).
Theorem (Dynkin's formula) Let f be a bounded continuous function in dom(Ac) and τ be a stopping time with E[τ] < ∞. Then,
E^x[f(Xτ)] = f(x) + E^x[∫_0^τ Ac f(Xs) ds].
4.5 Invariant measure
Let T(t)f = E^x[f(Xt)] = ∫_S f(y) p(t, x, y) dy. Then for f ∈ dom(A),
∫_S f(y) p(t, x, y) dy = f(x) + ∫_0^t ∫_S p(s, x, y) Af(y) dy ds.
The adjoint operator A* of A is defined by
∫ (Af) φ dy = ∫ f (A*φ) dy   (4.11)
for all f ∈ dom(A). Since dom(A) is dense, there exists a unique closed linear operator A* in X that satisfies (4.11). Thus, we have
∫_S (p(t, x, y) − δx(y) − A* ∫_0^t p(s, x, y) ds) f(y) dy = 0
for all f ∈ dom(A). Since dom(A) is dense in X, it follows that
p(t, x, ·) = δx + A* ∫_0^t p(s, x, ·) ds,
or, equivalently, the transition probability p satisfies the Kolmogorov forward equation
∂p/∂t = A* p(t),  p(0) = δx.   (4.12)
As we discussed in Section 3.2, if the state space S is countable, the invariant distribution π is defined by
π = πP(t),  t > 0,
or equivalently
πQ = 0,
where P (t) is the transition probability matrix and Q is the transition rate matrix of the
continuous-time Markov chain {Xt, t ≥ 0}. For the case where S is a continuum (e.g. S = R) we define the invariant measure μ (a bounded linear functional on C(S)) by
μ(A) = ∫_S p(t, x, A) dμ(x)
for all A ∈ B and t > 0, or equivalently
⟨μ, T(t)f⟩ = ⟨μ, f⟩
for all f ∈ C(S) and t > 0. One can state this as T(t)*μ = μ for all t > 0, or A*μ = 0, i.e., ⟨μ, Af⟩ = 0 for all f ∈ dom(A).
For the Ito diffusion process with
Af = (a(x)/2) f″ + b(x) f′,
dμ = φ dx satisfies
A*φ = ((a(x)/2) φ′ + (−b(x) + a′(x)/2) φ)′ = 0,
and thus
φ(x) = c e^{∫_0^x ((2b − a′)/a) dy}.

5 Martingale Process
In this section we consider a probability space (Ω, F, P) and a nondecreasing sequence {Fn, n ≥ 0} of σ-fields contained in F.
Definition A sequence of real random variables {Mn} is called a martingale with respect to the filtration {Fn, n ≥ 0} if
(1) For each n, Mn is Fn-measurable (that is, Mn is adapted to the filtration Fn),
(2) For each n, E[|Mn|] < ∞,
(3) For each n, E[Mn+1 | Fn] = Mn.
The sequence {Mn} is called a submartingale (or supermartingale) if property (3) is replaced by
E[Mn+1 | Fn] ≥ Mn (or E[Mn+1 | Fn] ≤ Mn).
Notice that the martingale property implies that E[Mn] = E[M0] for all n. On the other hand, condition (3) can also be written as
E[ΔMn | Fn−1] = 0
for all n, where ΔMn = Mn − Mn−1.
Example 1 Suppose that ξn are independent centered random variables (E[ξk] = 0, k ≥ 1). Set M0 = 0 and Mn = ξ1 + · · · + ξn. Then Mn is a martingale with respect to the σ-fields Fn = σ(ξ1, · · · , ξn), n ≥ 1.
Example 2 Suppose that {ξn, n ≥ 1} are independent random variables such that P(ξn = −1) = 1 − p, P(ξn = 1) = p, 0 < p < 1. Then Mn = ((1−p)/p)^{ξ1+···+ξn} is a martingale with respect to the sequence of σ-fields σ(ξ1, · · · , ξn), n ≥ 1. In fact,
E[Mn+1 | Fn] = E[((1−p)/p)^{ξn+1} Mn | Fn] = E[((1−p)/p)^{ξn+1}] Mn = Mn,
since E[((1−p)/p)^{ξn+1}] = ((1−p)/p) p + (p/(1−p)) (1 − p) = 1.
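The one-step identity E[((1−p)/p)^ξ] = 1 behind this example is a two-term sum and can be checked directly for any 0 < p < 1:

```python
def one_step_mean(p):
    # E[((1-p)/p) ** xi] with P(xi = 1) = p, P(xi = -1) = 1 - p
    r = (1.0 - p) / p
    return p * r + (1.0 - p) * r ** -1

checks = [one_step_mean(p) for p in (0.1, 0.25, 0.5, 0.9)]
```

Each value equals 1, confirming the martingale property for every choice of p.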
Example 3 If Mn is a martingale and ϕ is a convex function such that E[|ϕ(Mn)|] < ∞ for all n, then ϕ(Mn) is a submartingale. In fact, by Jensen's inequality for the conditional expectation we have
E[ϕ(Mn+1) | Fn] ≥ ϕ(E[Mn+1 | Fn]) = ϕ(Mn).
In particular, if {Mn } is a martingale such that E[|Mn |p ] < ∞ for all n and for some p ≥ 1,
then |Mn |p is a submartingale.
Example 4 Suppose that {Fn, n ≥ 0} is a given filtration. We say that Hn, n ≥ 1, is a predictable sequence of random variables if for each n, Hn is Fn−1-measurable. The martingale transform of a martingale Mn by a predictable sequence Hn, defined by
(H · M)n = M0 + ∑_{j=1}^n Hj ΔMj,
is again a martingale.
Example 5 (Likelihood Ratios) Let {Yn, n ≥ 0} be i.i.d. random variables with density f0, and let f0 and f1 be probability density functions. Define the sequence of likelihood ratios
Xn = (f1(Y0) f1(Y1) · · · f1(Yn)) / (f0(Y0) f0(Y1) · · · f0(Yn))
and let Fn = σ(Yk, 0 ≤ k ≤ n). Then {Xn, n ≥ 0} is a martingale, i.e.,
E[Xn+1 | Fn] = E[(f1(Yn+1)/f0(Yn+1)) Xn | Fn] = E[f1(Yn+1)/f0(Yn+1)] Xn = Xn,
where we used
E[f1(Yn+1)/f0(Yn+1)] = ∫ (f1(y)/f0(y)) f0(y) dy = 1.
Example 6 (Exponential Martingale) Suppose that {Yn, n ≥ 1} are i.i.d. random variables with distribution N(0, σ²). Set M0 = 1 and
Mn = e^{∑_{k=1}^n (Yk − σ²/2)}.
Then {Mn} is a nonnegative martingale. In fact,
E[Mn | Fn−1] = E[e^{Yn − σ²/2} Mn−1 | Fn−1] = E[e^{Yn − σ²/2}] Mn−1 = Mn−1.
Example 7 (Martingale induced by Eigenvector of Transition Matrix) Let {Yn, n ≥ 0} be a Markov chain with transition matrix P = (pij). Assume a bounded sequence f(i) ≥ 0 satisfies
f(i) = ∑_j pij f(j).
Let Xn = f(Yn) and Fn = σ(Yk, 0 ≤ k ≤ n). Then {Xn, n ≥ 0} is a martingale. In fact, E[|Xn|] < ∞ since f is bounded, and
E[Xn+1 | Fn] = E[f(Yn+1) | Fn] = E[f(Yn+1) | Yn] = ∑_j p_{Yn,j} f(j) = f(Yn) = Xn.
Example 8 (Radon-Nikodym derivatives) Let Z be a uniformly distributed random variable on [0, 1], and define the sequence of random variables by setting
Yn = k/2^n
for the unique k (depending on n and Z) that satisfies
k/2^n ≤ Z < (k + 1)/2^n.
That is, Yn determines the first n bits of the binary representation of Z. Let f be a bounded function on [0, 1] and form the finite difference quotient sequence
Xn = 2^n (f(Yn + 2^{−n}) − f(Yn)).
Then {Xn, n ≥ 0} is a martingale. In fact,
E[Xn+1 | Fn] = 2^{n+1} E[f(Yn+1 + 2^{−(n+1)}) − f(Yn+1) | Fn]
= 2^{n+1} ((1/2)(f(Yn + 2^{−(n+1)}) − f(Yn)) + (1/2)(f(Yn + 2^{−n}) − f(Yn + 2^{−(n+1)})))
= 2^n (f(Yn + 2^{−n}) − f(Yn)) = Xn,
where we used the fact that Z conditional on Fn has a uniform distribution on [Yn, Yn + 2^{−n}), and thus Yn+1 is equally likely to be Yn or Yn + 2^{−(n+1)}.
Theorem (Martingale central limit theorem) Suppose {(Mn, Fn)} is a martingale sequence and let
Vk = E[|Mk − Mk−1|² | Fk−1].
If (1/n) ∑_{k=1}^n Vk → σ² in probability and for all ε > 0
(1/n) ∑_{k=1}^n E[(Mk − Mk−1)² I{|Mk − Mk−1| > ε√n}] → 0
as n → ∞, then Mn/√n converges to N(0, σ²) in distribution.

5.1 Doob's decomposition
Theorem (Doob's Decomposition) Let (Xn, Fn) be a submartingale. Then there exists a unique Doob decomposition of Xn into a martingale (Mn, Fn) and a predictable increasing sequence (An, Fn−1) with A0 = 0:
Xn = Mn + An.
Proof: Define
Mn = X0 + ∑_{j=1}^n (Xj − E[Xj | Fj−1]),
An = ∑_{j=1}^n (E[Xj | Fj−1] − Xj−1).
It is easy to see that (Mn, An) gives the desired decomposition. For the uniqueness, if we let Xn = M′n + A′n be another decomposition, then
A′n+1 − A′n = E[A′n+1 − A′n | Fn] = E[(An+1 − An) + (Mn+1 − Mn) − (M′n+1 − M′n) | Fn] = An+1 − An.
Since A′0 = A0, this implies A′n = An for all n ≥ 1 and hence the decomposition is unique.
The Doob decomposition plays a key role in the study of square integrable martingales (Mn, Fn), i.e., those with E[Mn²] < ∞ for all n ≥ 0. Since {Mn², n ≥ 0} is a submartingale, by the theorem above there exist a martingale mn and a predictable increasing sequence (⟨M⟩n, Fn−1) such that
Mn² = mn + ⟨M⟩n.
The sequence (⟨M⟩n, Fn−1) is called the quadratic variation of {Mn} and is given by
⟨M⟩n = ∑_{j=1}^n E[(ΔMj)² | Fj−1],
where
E[(ΔMj)² | Fj−1] = E[Mj² − 2Mj Mj−1 + Mj−1² | Fj−1] = E[Mj² − Mj−1² | Fj−1] = ⟨M⟩j − ⟨M⟩j−1.
For k ≥ ℓ,
E[(Mk − Mℓ)² | Fℓ] = E[Mk² − Mℓ² | Fℓ] = E[⟨M⟩k − ⟨M⟩ℓ | Fℓ].
In particular, if M0 = 0, then E[Mk²] = E[⟨M⟩k].
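For the simple symmetric ±1 walk (Mn = ξ1 + · · · + ξn, an assumed example), E[(ΔMj)² | Fj−1] = 1, so ⟨M⟩n = n, and the identity E[Mn²] = E[⟨M⟩n] = n can be confirmed by enumerating all 2^n paths:

```python
from itertools import product

def mean_square(n):
    # E[M_n^2] for M_n = xi_1 + ... + xi_n, xi_k = +/-1 each with probability 1/2
    total = 0
    for signs in product((1, -1), repeat=n):
        total += sum(signs) ** 2
    return total / 2 ** n

values = [mean_square(n) for n in range(1, 7)]
```

The exhaustive averages equal n exactly, matching E[⟨M⟩n] = n.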
If (Xn, Fn) and (Yn, Fn) are square integrable martingales, we define
⟨X, Y⟩n = (1/4)(⟨X + Y⟩n − ⟨X − Y⟩n).
It is easy to verify that
Xn Yn − ⟨X, Y⟩n is a martingale   (5.1)
and for k ≥ ℓ
E[(Xk − Xℓ)(Yk − Yℓ) | Fℓ] = E[⟨X, Y⟩k − ⟨X, Y⟩ℓ | Fℓ].   (5.2)
Moreover, we have
⟨X, Y⟩n = ∑_{j=1}^n E[ΔXj ΔYj | Fj−1].   (5.3)
In the case Xn = ∑_{k=1}^n ξk and Yn = ∑_{k=1}^n ηk, where {ξk} and {ηk} are sequences of independent square integrable random variables with E[ξk] = E[ηk] = 0, then
⟨X, Y⟩n = ∑_{j=1}^n E[ξj ηj].
Theorem For the martingale transform
(H · M)n = M0 + ∑_{j=1}^n Hj ΔMj,
the quadratic variation is given by
⟨H · M⟩n = ∑_{j=1}^n E[(Hj ΔMj)² | Fj−1] = ∑_{j=1}^n |Hj|² E[|ΔMj|² | Fj−1] = ∑_{j=1}^n |Hj|² Δ⟨M⟩j.

5.2 Doob's Optional Sampling Theorem
Example 9 Let {ξk, k ≥ 1} be an i.i.d. sequence of Bernoulli random variables, P(ξk = 1) = p, P(ξk = −1) = 1 − p. Let Fn = σ(ξ1, · · · , ξn) and let Vn (Fn−1-measurable) be the player's stake at the n-th turn. Then the player's gain Xn is
Xn = ∑_{k=1}^n Vk ξk.
Then (Xn, Fn) is a martingale if p = 1/2. Consider the gambling strategy that doubles the stake after a loss and drops out of the game immediately after a win, i.e., the stakes are
Vn = 2^{n−1} if ξ1 = · · · = ξn−1 = −1,  Vn = 0 otherwise.
Then, if ξ1 = · · · = ξn = −1, the total loss after n turns is ∑_{i=1}^n 2^{i−1} = 2^n − 1. Thus, if ξn+1 = 1, we have
Xn+1 = Xn + Vn+1 = −(2^n − 1) + 2^n = 1.
Let τ = inf{n ≥ 1 : ξn = 1}. If p = 1/2, then the game is fair and P(τ = n) = (1/2)^n, P(τ < ∞) = 1 and E[Xτ] = 1. Therefore, for a fair game, by applying this strategy, a player can in finite time complete the game successfully, increasing his capital by one unit (E[Xτ] = 1 > X0 = 0).
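The doubling strategy can be checked path by path: for every n, the path with n − 1 losses followed by a win yields Xτ = 1, so E[Xτ] = ∑_n 1 · (1/2)^n = 1 even though the intermediate losses 2^n − 1 are unbounded. A small enumeration (the horizon N = 20 is an assumed truncation):

```python
def gain_on_path(n):
    """X_tau on the path xi_1 = ... = xi_{n-1} = -1, xi_n = +1.

    Stakes double after each loss: V_k = 2**(k-1) while still losing.
    """
    x = 0
    for k in range(1, n + 1):
        stake = 2 ** (k - 1)
        x += stake if k == n else -stake  # win on the n-th turn only
    return x

N = 20
gains = [gain_on_path(n) for n in range(1, N + 1)]
expected = sum(0.5 ** n * g for n, g in zip(range(1, N + 1), gains))
```

Every terminal gain is exactly 1, and the truncated expectation is 1 − 2^{−N}, approaching E[Xτ] = 1.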
The following basic theorem covers the typical case in which E[Xτ] = E[X0] for a stopping time τ ≥ 0.
Theorem (Optional Sampling) Let (Xn, Fn) be a martingale (or submartingale), and let τ1 ≤ τ2 be stopping times. If
E[|Xτi|] < ∞,  lim inf_{n→∞} E[|Xn| I{τi > n}] = 0,   (5.4)
then
E[Xτ2 | Fτ1] = (≥) Xτ1 and E[Xτ2] = (≥) E[Xτ1].
Proof: It suffices to prove that
∫_{A∩{τ2≥τ1}} Xτ2 dP = ∫_{A∩{τ2≥τ1}} Xτ1 dP
for every A ∈ Fτ1, or equivalently
∫_{B∩{τ2≥n}} Xτ2 dP = ∫_{B∩{τ2≥n}} Xτ1 dP   (5.5)
for B = A ∩ {τ1 = n} and all n ≥ 0 (note that Xτ1 = Xn on B). Since
∫_{B∩{τ2≥n}} Xn dP = ∫_{B∩{τ2=n}} Xn dP + ∫_{B∩{τ2>n}} E[Xn+1 | Fn] dP
= ∫_{B∩{n≤τ2≤n+1}} Xτ2 dP + ∫_{B∩{τ2>n+1}} Xn+2 dP
= · · · = ∫_{B∩{n≤τ2≤m}} Xτ2 dP + ∫_{B∩{τ2>m}} Xm dP,
and since Xm = 2Xm⁺ − |Xm|, we have, using (5.4),
∫_{B∩{τ2≥n}} Xτ2 dP = lim sup_{m→∞} (∫_{B∩{τ2≥n}} Xn dP − ∫_{B∩{τ2>m}} Xm dP)
= ∫_{B∩{τ2≥n}} Xn dP − lim inf_{m→∞} ∫_{B∩{τ2>m}} Xm dP = ∫_{B∩{τ2≥n}} Xn dP,
which implies (5.5).
Example 9 (revisited)

∫_{τ>n} |X_n| dP = (2^n − 1) P(τ > n) = (2^n − 1) 2^{−n} → 1 as n → ∞,

and condition (5.4) is violated.
The following are versions of Doob's optional sampling (optional stopping) theorem.

Corollary If for some N ≥ 0, P(τ_1 ≤ N) = P(τ_2 ≤ N) = 1, then condition (5.4) holds and thus E[X_τ] = E[X_0].

Corollary If {X_n} is uniformly integrable, then condition (5.4) holds and thus E[X_τ] = E[X_0].
Theorem Let {X_n} be a martingale (or submartingale) and τ be a stopping time with respect to F_n = σ(X_k, k ≤ n). Suppose E[τ] < ∞ and, for all n and some constant C,

E[|X_{n+1} − X_n| | F_n] ≤ C  on {τ ≥ n}, P-a.s.

Then E[|X_τ|] < ∞ and E[X_τ] = (≥) E[X_0].
Proof: Let Y_0 = |X_0| and Y_j = |X_j − X_{j−1}|, j ≥ 1. Then |X_τ| ≤ Σ_{j=0}^τ Y_j and

E[|X_τ|] ≤ E[Σ_{j=0}^τ Y_j] = Σ_{n=0}^∞ ∫_{τ=n} (Σ_{j=0}^n Y_j) dP = Σ_{n=0}^∞ Σ_{j=0}^n ∫_{τ=n} Y_j dP = Σ_{j=0}^∞ Σ_{n=j}^∞ ∫_{τ=n} Y_j dP = Σ_{j=0}^∞ ∫_{τ≥j} Y_j dP.

Since {τ ≥ j} = Ω \ {τ < j} ∈ F_{j−1},

∫_{τ≥j} Y_j dP = ∫_{τ≥j} E[Y_j | F_{j−1}] dP ≤ C P(τ ≥ j) for j ≥ 1,

and thus

E[|X_τ|] ≤ E[Σ_{j=0}^τ Y_j] ≤ E[|X_0|] + C Σ_{j=1}^∞ P(τ ≥ j) = E[|X_0|] + C E[τ] < ∞.
Moreover, if τ > n then

Σ_{j=0}^n Y_j ≤ Σ_{j=0}^τ Y_j

and thus

∫_{τ>n} |X_n| dP ≤ ∫_{τ>n} Σ_{j=0}^τ Y_j dP.

Since E[Σ_{j=0}^τ Y_j] < ∞ and {τ > n} ↓ ∅, it follows from the Lebesgue dominated convergence theorem that

lim inf_{n→∞} ∫_{τ>n} |X_n| dP ≤ lim inf_{n→∞} ∫_{τ>n} Σ_{j=0}^τ Y_j dP = 0.

Hence the theorem follows from Theorem (Optional Sampling).
Example (Wald's identities) Let {ξ_k, k ≥ 1} be i.i.d. random variables with E[|ξ_k|] < ∞ and let τ be a stopping time with respect to F_n = σ(ξ_k, k ≤ n). If E[τ] < ∞,

E[Σ_{k=1}^τ ξ_k] = E[ξ_1] E[τ].

If moreover E[|ξ_k|²] < ∞, then

E[|Σ_{k=1}^τ (ξ_k − E[ξ_1])|²] = Var(ξ_1) E[τ].

In fact,

X_n = Σ_{k=1}^n ξ_k − n E[ξ_1]

is a martingale and

E[|X_{n+1} − X_n| | F_n] = E[|ξ_{n+1} − E[ξ_1]| | F_n] = E[|ξ_{n+1} − E[ξ_1]|] ≤ 2 E[|ξ_1|] < ∞.

Thus, E[X_τ] = E[X_0] = 0 and the first identity holds.
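A Monte Carlo sketch of the first Wald identity, with an illustrative choice not taken from the text: steps with P(ξ = 1) = 0.6, P(ξ = −1) = 0.4 and the hitting time τ = inf{n : S_n = 1}, for which S_τ = 1 and Wald gives E[τ] = 1/E[ξ_1] = 5:

```python
# Monte Carlo check of E[S_tau] = E[xi_1] E[tau] (illustrative parameters).
import numpy as np

rng = np.random.default_rng(1)
p, n_paths = 0.6, 20_000          # P(xi = 1) = p, so E[xi] = 2p - 1 = 0.2
taus = []
for _ in range(n_paths):
    s, n = 0, 0
    while s < 1:                   # stop as soon as S_n hits 1 (a.s. finite: positive drift)
        s += 1 if rng.random() < p else -1
        n += 1
    taus.append(n)

mean_tau = np.mean(taus)
# S_tau = 1 by construction, so Wald predicts E[tau] = 1 / 0.2 = 5.
print(mean_tau)
```

The sample mean of τ should be close to 5, the value forced by Wald's identity.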
Example (Wald's fundamental identity) Let {ξ_k, k ≥ 1} be i.i.d. random variables and let τ be a stopping time with respect to F_n = σ(ξ_k, k ≤ n). Define S_n = Σ_{k=1}^n ξ_k, assume E[τ] < ∞ and |S_n| ≤ C on {τ > n}, P-a.s. (for example, τ = inf{n ≥ 0 : |S_n| ≥ a} for some a > 0). Let φ(t) = E[e^{ξ_1 t}] and suppose that for some t_0 ≠ 0, φ(t_0) exists and φ(t_0) ≥ 1. Then

E[e^{t_0 S_τ} φ(t_0)^{−τ}] = 1.

In fact, X_n = e^{t_0 S_n} φ(t_0)^{−n} is a martingale and

E[|X_{n+1} − X_n| | F_n] = X_n E[|e^{t_0 ξ_{n+1}} φ(t_0)^{−1} − 1| | F_n] = X_n E[|e^{t_0 ξ_{n+1}} φ(t_0)^{−1} − 1|] < ∞.

The claimed identity follows from E[X_τ] = E[X_0] = 1.
5.3 Martingale Convergence

Theorem (Doob's Maximal Inequality) Suppose that {M_n} is a submartingale. Then

P(max_{k≤n} M_k ≥ λ) ≤ (1/λ) E[M_n I{max_{k≤n} M_k ≥ λ}].

Proof: Define the stopping time τ = min{k ≥ 0 : M_k ≥ λ} ∧ n. Then, by the optional sampling theorem,

E[M_n] ≥ E[M_τ] = E[M_τ I{max_{k≤n} M_k ≥ λ}] + E[M_τ I{max_{k≤n} M_k < λ}]

≥ λ P(max_{k≤n} M_k ≥ λ) + E[M_n I{max_{k≤n} M_k < λ}].
As a consequence, if {M_n} is a martingale and p ≥ 1, applying Doob's maximal inequality to the submartingale {|M_n|^p} we obtain

P(max_{0≤n≤N} |M_n| ≥ λ) ≤ (1/λ^p) E[|M_N|^p] for p ≥ 1,  (5.6)

which is a generalization of the Chebyshev inequality.
Kolmogorov's Inequality Let {ξ_k, k ≥ 1} be i.i.d. random variables with E[ξ_k] = 0 and E[|ξ_1|²] < ∞. Since S_n = Σ_{k=1}^n ξ_k is a martingale with respect to F_n = σ(ξ_k, k ≤ n),

P(max_{k≤n} |S_k| ≥ ε) ≤ E[S_n²]/ε².
For a < b let τ_0 = 0 and

τ_1 = min{n > 0 : X_n ≤ a},  τ_2 = min{n > τ_1 : X_n ≥ b}, · · ·

τ_{2m−1} = min{n > τ_{2m−2} : X_n ≤ a},  τ_{2m} = min{n > τ_{2m−1} : X_n ≥ b}, · · ·

Let β_n(a, b) = max{m : τ_{2m} ≤ n} be the upcrossing number of [a, b] by the process {X_k, k ≥ 1}.
Theorem (The Martingale Convergence Theorem) If {Mn } is a submartingale such that
supn E[Mn+ ] < ∞, then
Mn → M a.s.,
where M is an integrable random variable.
Proof: First, since

E[M_n^+] ≤ E[|M_n|] = 2E[M_n^+] − E[M_n] ≤ 2E[M_n^+] − E[M_1],

we have sup_n E[|M_n|] < ∞. Suppose that

A = {lim sup M_n > lim inf M_n} and P(A) > 0.

Then, since

A = ∪_{a<b, a,b rational} {lim sup M_n > b > a > lim inf M_n},

for some rational numbers a, b

P({lim sup M_n > b > a > lim inf M_n}) > 0.  (5.7)

Let β_n(a, b) be the number of upcrossings of (a, b) by the sequence M_1, · · · , M_n. By the upcrossing inequality,

E[β_n(a, b)] ≤ E[(M_n − a)^+]/(b − a) ≤ (E[M_n^+] + |a|)/(b − a),

and thus

lim_{n→∞} E[β_n(a, b)] ≤ (sup_n E[M_n^+] + |a|)/(b − a) < ∞,

which contradicts assumption (5.7). Hence lim_{n→∞} M_n = M exists and by Fatou's lemma

E|M| ≤ sup_n E|M_n| < ∞.
Example 6 (revisited) Since E[|M_n|] = 1, lim_{n→∞} M_n exists almost surely. By the law of large numbers, (Σ_{k=1}^n Y_k)/n → 0 in probability, and therefore lim_{n→∞} M_n = 0, a.s.
Theorem (P. Levy) Let ξ be an integrable random variable and F_∞ = σ(∪_n F_n). Then

E[ξ|F_n] → E[ξ|F_∞] a.s. and in L¹.
Proof: Let X_n = E[ξ|F_n]. For a > 0 and b > 0,

∫_{|X_n|≥a} |X_n| dP ≤ ∫_{|X_n|≥a} E[|ξ| | F_n] dP = ∫_{|X_n|≥a} |ξ| dP

= ∫_{{|X_n|≥a}∩{|ξ|≤b}} |ξ| dP + ∫_{{|X_n|≥a}∩{|ξ|>b}} |ξ| dP

≤ b P(|X_n| ≥ a) + ∫_{|ξ|>b} |ξ| dP ≤ (b/a) E[|X_n|] + ∫_{|ξ|>b} |ξ| dP ≤ (b/a) E[|ξ|] + ∫_{|ξ|>b} |ξ| dP.

Letting a → ∞ and then b → ∞, we have

lim_{a→∞} sup_n ∫_{|X_n|≥a} |X_n| dP = 0,
i.e., {X_n} is uniformly integrable. Thus, from the Martingale Convergence theorem there exists a random variable X such that X_n = E[ξ|F_n] → X a.s. and in L¹. For the last assertion let m ≥ n and A ∈ F_n. Then

∫_A X_m dP = ∫_A X_n dP = ∫_A E[ξ|F_n] dP = ∫_A ξ dP.
Since {X_n} is uniformly integrable, E[I_A |X_m − X|] → 0 as m → ∞ and

∫_A X dP = ∫_A ξ dP

for all A ∈ F_n and thus for all A ∈ ∪_n F_n. Since E|X| < ∞ and E|ξ| < ∞, the left and right hand sides above define σ-additive measures on the algebra ∪_n F_n. By Caratheodory's theorem there exists a unique extension of these measures to F_∞ = σ(∪_n F_n). Thus

∫_A X dP = ∫_A ξ dP = ∫_A E[ξ|F_∞] dP.

Since X is F_∞-measurable, X = E[ξ|F_∞].
Corollary (Doob Martingale) {M_n} is a uniformly integrable martingale if and only if there exists an integrable random variable M such that M_n = E[M|F_n] for n ≥ 1.
Proof: Since {M_n} is uniformly integrable, sup_n E[|M_n|] < ∞ and M_n → M in L¹(Ω, P) as n → ∞. Since {M_n} is a martingale, for A ∈ F_m and n ≥ m,

∫_A E[M_n | F_m] dP = ∫_A M_m dP.

But we also have

∫_A E[M_n | F_m] dP = ∫_A M_n dP.

Hence

|∫_A (M_m − M) dP| = |∫_A (M_n − M) dP| ≤ ∫_Ω |M_n − M| dP → 0

as n → ∞, and

∫_A M_m dP = ∫_A M dP.
Corollary If (M_n, F_n) is a martingale and for some p > 1, sup_n E[|M_n|^p] < ∞, then there exists an integrable random variable M such that

M_n = E[M|F_n] and M_n → M in L^p.

Corollary If (M_n, F_n) is a square integrable martingale with ⟨M⟩_n → ∞, then

M_n/⟨M⟩_n → 0, P-a.s.
Example 8 (revisited) Assume f is Lipschitz continuous, i.e., |f(x) − f(y)| ≤ L|x − y|. Then |X_n| ≤ L. Note that F = B([0, 1]) = σ(∪_n F_n), so there is an F-measurable function g = g(x) such that X_n → g a.s. and

X_n = E[g | F_n].

Thus, for B = [0, k2^{−n}],

f(k2^{−n}) − f(0) = ∫_0^{k2^{−n}} X_n dx = ∫_0^{k2^{−n}} g dx.

Since n and k are arbitrary, we obtain

f(x) − f(0) = ∫_0^x g(s) ds,

i.e., f is absolutely continuous and (d/dx) f = g a.s.
5.4 Continuous time Martingale and Stochastic Integral
Let {X_t, t ≥ 0} be a continuous time stochastic process on a probability space (Ω, F, P) and {F_t, t ≥ 0} be a family of sub-σ-algebras with F_s ⊂ F_t for all t > s ≥ 0. A random variable τ ≥ 0 is a Markov time with respect to the filtration F_t if for all t ≥ 0 the event {τ ≤ t} is F_t-measurable, i.e., the event is completely described by the information available up to time t. For a continuous time process it is not sufficient to require that {τ = t} is F_t-measurable for all t ≥ 0. If τ_1, τ_2 are Markov times, so are τ_1 + τ_2, τ_1 ∧ τ_2 = min(τ_1, τ_2) and τ_1 ∨ τ_2 = max(τ_1, τ_2). Thus, τ ∧ t is a Markov time. For example, let F_t = σ(X_s, s ≤ t) for a continuous process X_t. The exit time from an open set A,

τ_A = inf{t : X_t ∉ A},

is a Markov time, since

{τ_A > t} = ∪_{k=1}^∞ ∩_{r∈Q, 0≤r≤t} {dist(X_r, A^c) ≥ 1/k}.
In general, if X_t is not continuous, τ_A is not necessarily a Markov time. Suppose t → X_t(ω) is continuous from the right and has limits from the left, i.e., X_t = lim_{s↓t} X_s and X_{t−} = lim_{s↑t} X_s exist for all t ≥ 0. Let

F_{t+} = ∩_{s>t} F_s.

Then F_{t+} is a σ-algebra, X_t is F_{t+}-measurable and F_{t+} ⊂ F_{s+} for t < s. Next, let F̄_{t+} be the smallest σ-algebra containing every set in F_{t+} and every set A in F with P(A) = 0, i.e., it consists of all events that are P-a.s. equivalent to events in F_{t+}. Then, for every Borel set B, the arrival time

τ_B = inf{t ≥ 0 : X_t ∈ B} if X_t ∈ B for some t ≥ 0, and τ_B = ∞ if X_t ∉ B for all t ≥ 0,

is a Markov time with respect to F̄_{t+}.
Given a filtered probability space (Ω, F, (F_t)_{t≥0}, P), a continuous-time stochastic process (X_t)_{t≥0} is a martingale (submartingale) if

(a) X_t is F_t-measurable for all t ≥ 0,
(b) E[X_t^+] < ∞,
(c) E[X_t | F_s] = (≥) X_s for all t ≥ s ≥ 0.

Both the martingale optional sampling and convergence theorems hold in continuous time, i.e.,

E[X_0] ≤ E[X_{τ∧t}] ≤ E[X_t]

for all Markov times τ. Here the inequalities hold for a submartingale and the equalities for a martingale. If P(τ < ∞) = 1, then P-a.s.

X_{τ∧t} → X_τ as t → ∞.
Theorem (Optional Sampling) Let {X_t, t ≥ 0} be a martingale (submartingale) and τ a Markov time with respect to F_t. If P(τ < ∞) = 1 and the random variables {X^+_{t∧τ}, t ≥ 0} are uniformly integrable, then E[X_0] = (≤) E[X_τ].

Corollary Let {X_t, t ≥ 0} be a martingale and τ a Markov time with respect to F_t. If P(τ < ∞) = 1 and E[sup_{t≥0} |X_t|] < ∞, then E[X_0] = E[X_τ].
We use these results to derive a number of important properties of the Brownian motion in Chapter 7.
Example (Poisson Process) If {N_t, t ≥ 0} is a Poisson process with parameter λ, then

N_t − λt,  (N_t − λt)² − λt,  e^{−θN_t + λt(1−e^{−θ})}  (5.8)

are martingales with respect to F_t = σ(N_s, s ≤ t). Let a be a positive integer and τ_a = inf{t ≥ 0 : N_t ≥ a}, starting from N_0 = 0. With the observation N_{τ_a} = a, we have

a = λ E[τ_a],  E[(λτ_a − a)²] = λ E[τ_a] = a,  E[e^{−βτ_a}] = e^{θa} = (λ/(λ+β))^a,  (5.9)

where β = −λ(1 − e^{−θ}). The last equation is the Laplace transform of τ_a and it shows that τ_a has a gamma distribution with parameters a and λ.
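A numerical sketch of (5.9) (illustrative parameters a = 5, λ = 2, not from the text), using the fact that τ_a is the sum of a independent Exp(λ) interarrival times of the Poisson process:

```python
# Monte Carlo check: tau_a ~ Gamma(a, lam), so E[tau_a] = a/lam and
# E[e^{-beta tau_a}] = (lam/(lam+beta))^a.
import numpy as np

rng = np.random.default_rng(2)
a, lam, n_paths = 5, 2.0, 20_000

# interarrival times of the Poisson process are i.i.d. exponential(lam)
tau = rng.exponential(scale=1.0 / lam, size=(n_paths, a)).sum(axis=1)

beta = 1.0
laplace = np.exp(-beta * tau).mean()
print(tau.mean(), laplace, (lam / (lam + beta)) ** a)
```

Both the sample mean (≈ a/λ = 2.5) and the empirical Laplace transform (≈ (2/3)⁵) match the formulas in (5.9).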
Example (Birth Processes) Let {X_t, t ≥ 0} be a pure birth process having birth rate λ(i) for i ≥ 0. If X_0 = 0, then

Y_t = X_t − ∫_0^t λ(X_s) ds,  V_t = e^{θX_t + (1−e^θ) ∫_0^t λ(X_s) ds}

are martingales with respect to F_t = σ(X_s, s ≤ t).
Lemma 2.2 Suppose M_t is an almost surely continuous martingale with respect to (Ω, F_t, P) and A_t is a progressively measurable (predictable) process which is almost surely continuous and of bounded variation in t. Then, under the assumption that sup_{0≤s≤t} |M_s| Var_{[0,t]} A(·, ω) is integrable,

M_t A_t − M_0 A_0 − ∫_0^t M_s dA_s

is a martingale.

Proof: The main step is to see why

E[M_t A_t − M_0 A_0 − ∫_0^t M_s dA_s] = 0.

Then the same argument, repeated conditionally, proves the martingale property. Over a partition {t_j} of [0, t],

E[M_t A_t − M_0 A_0] = lim Σ_j E[M_{t_j} A_{t_j} − M_{t_{j−1}} A_{t_{j−1}}]

= lim Σ_j E[E[M_{t_j} − M_{t_{j−1}} | F_{t_{j−1}}] A_{t_{j−1}} + M_{t_j}(A_{t_j} − A_{t_{j−1}})]

= lim Σ_j E[M_{t_j}(A_{t_j} − A_{t_{j−1}})] = E[∫_0^t M_s dA_s],
where we used the assumption and the dominated convergence theorem. For

M_t = f(X_t) − f(X_0) − ∫_0^t Af(X_s) ds,  A_t = e^{−∫_0^t [Af(X_s)/f(X_s)] ds},  (5.10)

we have that

f(X_t) e^{−∫_0^t [Af(X_s)/f(X_s)] ds}  (5.11)

is a martingale if f is uniformly positive. In fact

M_t A_t − M_0 A_0 − ∫_0^t M_s dA_s = f(X_t)A_t − f(X_0)A_0.
5.5 Exercises

Problem 1 Show (5.1)–(5.3).

Problem 2 If {ξ_k, k ≥ 1} is a sequence of independent random variables with E[ξ_k] = 1, show that X_n = Π_{k=1}^n ξ_k is a martingale with respect to F_n = σ(ξ_k, k ≤ n). Consider the case P(ξ_k = 0) = P(ξ_k = 2) = 1/2. Show that there is no integrable random variable ξ such that X_n = E[ξ|F_n].
Problem 3 Let {ξ_k} be a sequence of independent random variables with E[ξ_k] = 0 and V(ξ_k) = σ_k². Define S_n = Σ_{k=1}^n ξ_k and F_n = σ(ξ_k, k ≤ n). Show the following generalization of Wald's identities: if E[Σ_{k=1}^τ |ξ_k|] < ∞ then E[S_τ] = 0; if E[Σ_{k=1}^τ |ξ_k|²] < ∞ then E[S_τ²] = E[Σ_{k=1}^τ σ_k²].

Problem 4 Show (5.8) and (5.9).
Problem 5 Suppose {X_n} is a martingale satisfying, for some p > 1, E[|X_n|^p] < ∞. Show

(E[(max_{0≤k≤n} |X_k|)^p])^{1/p} ≤ (p/(p−1)) (E[|X_n|^p])^{1/p}.

Hint: E[|ξ|^p] = p ∫_0^∞ t^{p−1} P(|ξ| > t) dt. Now use the maximal inequality for the submartingale |X_n|.
Problem 6 Show (5.10)–(5.11).

Problem 7 Show that X_t = e^{λB_t − λ²t/2} satisfies dX_t = λX_t dB_t. Find the generator of X_t.

6 Brownian Motion and Ito's Calculus

In 1827 Robert Brown observed the complex and erratic motion of grains of pollen suspended in a liquid. It was later discovered that such irregular motion comes from the extremely large number of collisions of the suspended pollen grains with the molecules of the liquid. In the 1920s Norbert Wiener presented a mathematical model for this motion based on the theory of stochastic processes. The position of a particle at each time t ≥ 0 is a d-dimensional random vector B_t. The mathematical definition of Brownian motion is the following:
Definition (Brownian motion) A stochastic process B_t, t ≥ 0, is called a Brownian motion if it satisfies the following conditions:

i) For all 0 = t_0 ≤ t_1 < · · · < t_n the increments B_{t_n} − B_{t_{n−1}}, · · · , B_{t_1} − B_{t_0} are independent random variables.

ii) If 0 ≤ s < t, the increment B_t − B_s has the normal distribution N(0, t − s).
Theorem (Continuous Process) If X_t is a stochastic process on (Ω, F, P) satisfying

E[|X_t − X_s|^α] ≤ C|t − s|^{1+β}

for some positive constants α, β and C, then X_t, t ≥ 0, can be modified, if necessary for each t on a set of measure zero, to obtain an equivalent version that is almost surely continuous.

For the Brownian motion, from (ii) an elementary calculation yields

E|B_t − B_s|⁴ = 3|t − s|²,

so that the Theorem applies with α = 4, β = 1 and C = 3.
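The moment identity E|B_t − B_s|⁴ = 3|t − s|² is easy to check numerically (a sketch with an arbitrary choice t − s = 0.7; the increment is N(0, t − s), whose fourth moment is 3(t − s)²):

```python
# Sanity check of E|B_t - B_s|^4 = 3|t - s|^2 by sampling the increment.
import numpy as np

rng = np.random.default_rng(3)
t_minus_s = 0.7
incr = rng.normal(0.0, np.sqrt(t_minus_s), size=200_000)
m4 = np.mean(incr**4)
print(m4, 3 * t_minus_s**2)   # the two values should nearly agree
```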
Remark (1) With probability 1, Brownian paths satisfy a Hölder condition with any exponent less than 1/2. It is not hard to see that they do not satisfy a Hölder condition with exponent 1/2. The random variables (B_t − B_s)/√(t − s) have standard normal distributions for any interval [s, t] and they are independent for disjoint intervals. We can find as many disjoint intervals as we wish and therefore dominate the Hölder constant from below by the supremum of absolute values of an arbitrary number of independent Gaussians.

(2) The mapping ω → B_·(ω) ∈ C([0, 1); R) induces a probability measure P_B, called the Wiener measure, on the space of continuous functions C = C([0, 1); R) equipped with its Borel σ-field B(C), generated by open balls in C. Then we can take as canonical probability space for the Brownian motion the space (C, B(C), P_B). In this canonical space, the random variables are the evaluation maps X_t(ω) = ω(t).
First, we will show that the quadratic variation Σ_{k=1}^n |ΔB_k|², ΔB_k = B_{t_k} − B_{t_{k−1}}, converges in mean square to the length of the interval as the mesh of the subdivision tends to zero:

E[(Σ_{k=1}^n |ΔB_k|² − t)²] = Σ_{k,ℓ} E[(|ΔB_k|² − Δt_k)(|ΔB_ℓ|² − Δt_ℓ)]

= Σ_{k=1}^n E[(|ΔB_k|² − Δt_k)²] = Σ_k (3(Δt_k)² − 2(Δt_k)² + (Δt_k)²) = Σ_k 2(Δt_k)² ≤ 2t max_k |Δt_k| → 0.

On the other hand, the total variation, defined by V = sup Σ_{k=1}^n |ΔB_k| over all partitions 0 = t_0 < t_1 < · · · < t_n = t, is infinite with probability one. In fact, using the continuity of the trajectories of the Brownian motion, we have

Σ_{k=1}^n |ΔB_k|² ≤ sup_k |ΔB_k| Σ_{k=1}^n |ΔB_k| ≤ V sup_k |ΔB_k| → 0

if V < ∞, which contradicts the fact that Σ_{k=1}^n |ΔB_k|² converges in mean square to t.
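The contrast between the two variations can be seen on a sampled path (a sketch; the mesh sizes are arbitrary):

```python
# Quadratic vs. total variation of a sampled Brownian path on [0, 1]:
# the quadratic variation stays close to t = 1, while the total
# variation grows without bound (like sqrt(n)) as the mesh is refined.
import numpy as np

rng = np.random.default_rng(4)

def variations(n):
    dB = rng.normal(0.0, np.sqrt(1.0 / n), size=n)  # increments on a mesh of size 1/n
    return np.sum(dB**2), np.sum(np.abs(dB))

qv_coarse, tv_coarse = variations(1_000)
qv_fine, tv_fine = variations(100_000)
print(qv_fine, tv_coarse, tv_fine)   # qv_fine ~ 1, tv keeps growing
```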
6.1 Brownian Motion and Martingales
If {B_t, t ≥ 0} is a Brownian motion and F_t is the filtration generated by B_t, then the processes

B_t,  |B_t|² − t,  e^{λB_t − λ²t/2}

are martingales. In fact,

E[e^{λB_t − λ²t/2} | F_s] = E[e^{λ(B_t − B_s) − λ²(t−s)/2} e^{λB_s − λ²s/2} | F_s] = E[e^{λ(B_t − B_s) − λ²(t−s)/2}] e^{λB_s − λ²s/2} = e^{λB_s − λ²s/2}.
Consider the stopping time τ_a = inf{t ≥ 0 : B_t = a} for a > 0. Since the process M_t = e^{λB_t − λ²t/2} is a martingale, E[M_t] = E[M_0] = 1. By the Optional Stopping theorem we obtain E[M_{τ_a∧N}] = 1 for all N ≥ 1. Note that for λ > 0

M_{τ_a∧N} = e^{λB_{τ_a∧N} − (λ²/2)(τ_a∧N)} ≤ e^{λa}.

On the other hand,

lim_{N→∞} M_{τ_a∧N} = M_{τ_a} if τ_a < ∞,  lim_{N→∞} M_{τ_a∧N} = 0 if τ_a = ∞,

and the dominated convergence theorem implies E[I{τ_a < ∞} M_{τ_a}] = 1, that is,

E[I{τ_a < ∞} e^{−(λ²/2)τ_a}] = e^{−λa}.
Letting λ → 0+, we obtain P(τ_a < ∞) = 1 and consequently,

E[e^{−(λ²/2)τ_a}] = e^{−λa}.  (6.1)

With the change of variables λ²/2 = α, we have

E[e^{−ατ_a}] = e^{−√(2α) a}.

From this expression we can compute the distribution function of the random variable τ_a:

P(τ_a ≤ t) = ∫_0^t (a e^{−a²/(2s)} / √(2πs³)) ds.  (6.2)

On the other hand, differentiating the Laplace transform E[e^{−ατ_a}] = e^{−√(2α)a} with respect to the variable α gives

E[τ_a e^{−ατ_a}] = (a e^{−√(2α) a})/√(2α),

and letting α → 0+ we obtain E[τ_a] = ∞.
(3) One can use the martingale inequality to estimate the probability P(sup_{s≤t} |B_s| ≥ ℓ). For A > 0, by Doob's inequality

P(sup_{s≤t} e^{λB_s − λ²s/2} ≥ A) ≤ 1/A,

and thus

P(sup_{s≤t} B_s ≥ ℓ) ≤ P(sup_{s≤t} (λB_s − λ²s/2) ≥ λℓ − λ²t/2) ≤ e^{−(λℓ − λ²t/2)}.

Optimizing over λ > 0 (λ = ℓ/t) we obtain

P(sup_{s≤t} B_s ≥ ℓ) ≤ e^{−ℓ²/(2t)}

and by symmetry

P(sup_{s≤t} |B_s| ≥ ℓ) ≤ 2e^{−ℓ²/(2t)}.

The estimate is not too bad because by the reflection principle

P(sup_{s≤t} |B_s| ≥ ℓ) ≥ 2P(B_t ≥ ℓ) = √(2/(πt)) ∫_ℓ^∞ e^{−x²/(2t)} dx = √(2/π) ∫_{ℓ/√t}^∞ e^{−y²/2} dy,

and thus

lim_{t→∞} P(τ_ℓ ≤ t) = 1.

In particular, the one-dimensional Brownian motion starting from 0 will reach any level ℓ at some time.
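A Monte Carlo sketch of the maximal inequality (illustrative values t = 1, ℓ = 1; discretizing the path only lowers the supremum, so the empirical frequency should fall below the bound e^{−ℓ²/2t} ≈ 0.607, while the reflection principle puts the true probability near 0.317):

```python
# Empirical check that P(sup_{s<=t} B_s >= l) stays below e^{-l^2/(2t)}.
import numpy as np

rng = np.random.default_rng(5)
t, level, n_steps, n_paths = 1.0, 1.0, 500, 10_000
dB = rng.normal(0.0, np.sqrt(t / n_steps), size=(n_paths, n_steps))
paths = np.cumsum(dB, axis=1)
freq = np.mean(paths.max(axis=1) >= level)
bound = np.exp(-level**2 / (2 * t))
print(freq, bound)
```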
Theorem (Levy theorem) If P is a measure on (C[0, T], B) such that P(X_0 = 0) = 1 and the functions X_t and |X_t|² − t are martingales with respect to (C[0, T], B_t, P), then P is the Wiener measure.

Proof: The proof is based on the observation that a Gaussian distribution is determined by two moments. But that the distribution is Gaussian is a consequence of the fact that the paths are almost surely continuous, and not part of our assumptions. The actual proof is carried out by establishing that for each real number λ

X_λ(t) = e^{λX_t − λ²t/2}  (6.3)

is a martingale with respect to (C[0, T], B_t, P). Once this is established, it is elementary to compute

E[e^{λ(X_t − X_s)} | B_s] = e^{λ²(t−s)/2},  (6.4)

which shows that we have a Gaussian process with independent increments with two matching moments (0, t − s). The proof of (6.3) is more or less the same as proving the central limit theorem. In order to prove (6.4) we can assume without loss of generality that s = 0 and will show that

E[e^{λX_t − λ²t/2}] = 1.  (6.5)
To this end let us define successively τ_{0,ε} = 0 and

τ_{k+1,ε} = min(inf{s ≥ τ_{k,ε} : |X_s − X_{τ_{k,ε}}| ≥ ε}, t, τ_{k,ε} + ε).

Then each τ_{k,ε} is a stopping time and eventually τ_{k,ε} = t by continuity of paths. The continuity of paths also guarantees that |X_{τ_{k+1,ε}} − X_{τ_{k,ε}}| ≤ ε. We have

X_t = Σ_{k≥0} (X_{τ_{k+1,ε}} − X_{τ_{k,ε}}),  t = Σ_{k≥0} (τ_{k+1,ε} − τ_{k,ε}).

To establish (6.5) we calculate the left hand side as

lim_{n→∞} E[e^{Σ_{0≤k≤n} [λ(X_{τ_{k+1,ε}} − X_{τ_{k,ε}}) − (λ²/2)(τ_{k+1,ε} − τ_{k,ε})]}]

and show that it is equal to 1. Let us consider the σ-algebra F_k = B_{τ_{k,ε}} and let

q_k(ω) = E[e^{λ(X_{τ_{k+1,ε}} − X_{τ_{k,ε}}) − (λ²/2 + δ)(τ_{k+1,ε} − τ_{k,ε})} | F_k],

where δ = δ(ε, λ) is to be chosen later such that 0 ≤ δ(ε, λ) ≤ 1 and δ(ε, λ) → 0 as ε → 0+ for every fixed λ. If z and τ are random variables bounded by ε such that

E[z] = E[z² − τ] = 0,

then for any 0 ≤ δ ≤ 1

E[e^{λz − (λ²/2 + δ)τ}] ≤ E[1 + (λz − (λ²/2 + δ)τ) + (1/2)(λz − (λ²/2 + δ)τ)² + C_λ(|z|³ + τ³)] ≤ E[1 − δτ + C_λ ε τ] ≤ 1,

provided that δ = C_λ ε. Clearly there is a choice of δ(ε, λ) → 0 as ε → 0+ such that q_k(ω) ≤ 1 for every k and almost all ω. In particular, by induction

E[e^{Σ_{0≤k≤n} [λ(X_{τ_{k+1,ε}} − X_{τ_{k,ε}}) − (λ²/2 + δ)(τ_{k+1,ε} − τ_{k,ε})]}] ≤ 1

for every n and by Fatou's lemma

E[e^{λ(X_t − X_0) − (λ²/2 + δ)t}] ≤ 1.

Since ε > 0 is arbitrary we have proved one half of (6.5). To prove the other half, we note that X_λ(t) is a submartingale and from Doob's martingale inequality we can get a tail estimate

P(sup_{0≤s≤t} |X_s − X_0| ≥ ℓ) ≤ 2e^{−ℓ²/(2t)}.

This allows us to use the dominated convergence theorem and establish

E[e^{Σ_{0≤k≤n} [λ(X_{τ_{k+1,ε}} − X_{τ_{k,ε}}) − (λ²/2 − δ)(τ_{k+1,ε} − τ_{k,ε})]}] ≥ 1.

6.2 Random walks and Brownian Motion
Let ξ_k be a sequence of independent identically distributed random variables with mean 0 and variance 1. The partial sums S_k are defined by S_0 = 0 and S_k = ξ_1 + · · · + ξ_k for 1 ≤ k ≤ n. We define stochastic processes X_n(t), t ∈ [0, 1], by

X_n(k/n) = S_k/√n

for 0 ≤ k ≤ n, and for t ∈ [(k−1)/n, k/n]

X_n(t) = (nt − k + 1) X_n(k/n) + (k − nt) X_n((k−1)/n).

Let P_n denote the distribution of the process X_n(t, ω) on C[0, 1] and P the distribution of Brownian Motion. We want to explore the sense in which lim_{n→∞} P_n = P.
Lemma For any finite collection 0 < t_1 < · · · < t_m ≤ 1 of m sample points, the joint distribution of (X(t_1), · · · , X(t_m)) under P_n converges, as n → ∞, to the corresponding distribution under P.

Proof: We are dealing here basically with the central limit theorem for sums of independent random variables. Let us define k_n^i = [nt_i] and the increments

ξ_n^i = (S_{k_n^i} − S_{k_n^{i−1}})/√n

for i = 1, · · · , m. For each n, the ξ_n^i are m mutually independent random variables and their distributions converge as n → ∞ to Gaussians with mean 0 and variances t_i − t_{i−1}, respectively. This is of course the same distribution for these increments under Brownian Motion. The interpolation is of no consequence, because the difference between the end points is exactly some ξ_i/√n. So it does not really matter if in the definition of X_n(t) we take k_n = [nt] or k_n = [nt] + 1 or take the interpolated value. We can state this convergence in the form

lim_{n→∞} E[f(X_n(t_1), X_n(t_2), · · · , X_n(t_m))] = E[f(B_{t_1}, · · · , B_{t_m})]

for every m, any sample points (t_1, · · · , t_m) and any bounded continuous function f on R^m.
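A sketch of the lemma at the single time t = 1: S_n/√n for ±1 steps is generated here via a binomial variable (an implementation shortcut, since S_n = 2·Binomial(n, 1/2) − n), and its first two moments approach those of N(0, 1):

```python
# CLT illustration: the scaled random walk X_n(1) = S_n / sqrt(n) is
# approximately standard normal for large n.
import numpy as np

rng = np.random.default_rng(6)
n, n_paths = 2_000, 50_000
s_n = 2.0 * rng.binomial(n, 0.5, size=n_paths) - n   # sum of n fair +/-1 steps
x1 = s_n / np.sqrt(n)
print(x1.mean(), x1.var())   # approximately 0 and 1
```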
Equivalently, for a simple random walk,

p_k^{(n+1)} = (1/2) p_{k−1}^{(n)} + (1/2) p_{k+1}^{(n)},

or

(p_k^{(n+1)} − p_k^{(n)})/Δt = (1/2) (p_{k−1}^{(n)} − 2p_k^{(n)} + p_{k+1}^{(n)})/Δx²,

where Δt = Δx². Letting Δt → 0 we obtain

∂p(t, x)/∂t = (1/2) ∂²p(t, x)/∂x².
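The recursion can be iterated directly (a sketch; N = 200 steps and a delta initial mass are arbitrary choices). In lattice units Δx = Δt = 1 the variance after N steps is exactly N, matching the heat-kernel variance t:

```python
# Iterating p_k^{(n+1)} = (p_{k-1}^{(n)} + p_{k+1}^{(n)})/2 spreads a point
# mass diffusively; the support never reaches the array boundary here.
import numpy as np

N = 200
sites = np.arange(-N, N + 1)
p = np.zeros(2 * N + 1)
p[N] = 1.0                      # delta mass at the origin
for _ in range(N):
    p = 0.5 * (np.roll(p, 1) + np.roll(p, -1))   # one step of the recursion

var = np.sum(p * sites**2)
print(p.sum(), var)             # total mass 1, variance N
```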
6.3 Stochastic Integral with respect to Brownian motion

In this section we define the Ito stochastic integral. For a deterministic f(t) ∈ L²(0, T),

∫_0^t f(s) dB_s = lim_{Δt→0+} Σ_j f_j (B_{t_{j+1}} − B_{t_j})

defines the Wiener integral. Since

Σ_j (f_{j+1} B_{t_{j+1}} − f_j B_{t_j}) = Σ_j f_j (B_{t_{j+1}} − B_{t_j}) + Σ_j (f_{j+1} − f_j) B_{t_{j+1}},

we have for f ∈ BV(0, T)

∫_0^t f(s) dB_s = f(t)B_t − f(0)B_0 − ∫_0^t B_s df(s).

Given the filtration {F_t}, assume the Brownian motion is F_t-adapted.
Definition For an elementary stochastic process

f_t(ω) = Σ_{k=1}^n f_{k−1}(ω) χ_{[t_{k−1},t_k)}(t),  f_{k−1} a random variable,

we define the stochastic integral

∫_0^t f_s(ω) dB_s(ω) = Σ_k f_{k−1}(ω)(B_{t_k}(ω) − B_{t_{k−1}}(ω)).

This defines the Ito sum and is the generalization of the ordinary concept of a Riemann-Stieltjes integral.
Examples (Stochastic integral) (1) For f_{k−1} = B_{t_{k−1}} we have

Σ_k B_{t_{k−1}}(B_{t_k} − B_{t_{k−1}}) = Σ_k (1/2)(|B_{t_k}|² − |B_{t_{k−1}}|² − |B_{t_k} − B_{t_{k−1}}|²) → (1/2)(|B_t|² − |B_0|²) − (1/2)t.

(2) For f_{k−1} = (B_{t_{k−1}} + B_{t_k})/2,

Σ_k ((B_{t_{k−1}} + B_{t_k})/2)(B_{t_k} − B_{t_{k−1}}) = Σ_k (1/2)(B_{t_k}² − B_{t_{k−1}}²) = (1/2)(|B_t|² − |B_0|²).

(3) For f_{k−1} = B_{t_k},

Σ_k B_{t_k}(B_{t_k} − B_{t_{k−1}}) = Σ_k (1/2)(|B_{t_k}|² − |B_{t_{k−1}}|² + |B_{t_k} − B_{t_{k−1}}|²) → (1/2)(B_t² − B_0²) + t/2.

These examples show that, unlike for a Riemann-Stieltjes integral, the choice of the evaluation point f_{k−1} on [t_{k−1}, t_k) in the approximating sums for a continuous process f_t(ω) is essential. We use the non-anticipating one, i.e., f_{k−1} is F_{t_{k−1}}-measurable. Then we have the following properties.
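The three evaluation rules can be compared on one sampled path (a sketch with T = 1 and an arbitrary mesh); by telescoping, the right-point and left-point sums differ exactly by the quadratic variation Σ|ΔB_k|², and the endpoint-average sum equals B_T²/2 exactly:

```python
# Left-point (Ito), endpoint-average, and right-point sums for "∫ B dB".
import numpy as np

rng = np.random.default_rng(7)
n, T = 100_000, 1.0
dB = rng.normal(0.0, np.sqrt(T / n), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])      # B_{t_0}, ..., B_{t_n}

left = np.sum(B[:-1] * dB)                      # Example (1) -> (B_T^2 - T)/2
avg = np.sum(0.5 * (B[:-1] + B[1:]) * dB)       # Example (2) -> B_T^2/2
right = np.sum(B[1:] * dB)                      # Example (3) -> (B_T^2 + T)/2
qv = np.sum(dB**2)

print(right - left, qv)                         # equal by algebra
```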
Lemma (Isometry) For an F_t-adapted elementary process f_t(ω),

E[∫_0^t f_s dB_s] = 0,  E[|∫_0^t f_s dB_s|²] = ∫_0^t E[|f_s(ω)|²] ds.  (Isometry)

Proof: Let ΔB_{k−1} = B_{t_k} − B_{t_{k−1}}. Then

E[∫_0^t f_s dB_s] = Σ_k E[f_{k−1} ΔB_{k−1}] = 0

since f_{k−1} and ΔB_{k−1} are independent. Moreover,

E[|∫_0^t f_s dB_s|²] = Σ_{k,ℓ} E[f_{k−1} f_{ℓ−1} ΔB_{k−1} ΔB_{ℓ−1}] = Σ_k E[|f_{k−1}|²](t_k − t_{k−1}) = ∫_0^t E[|f_s(ω)|²] ds,

where we use

E[f_{k−1} f_{ℓ−1} ΔB_{k−1} ΔB_{ℓ−1}] = 0 if k ≠ ℓ, and = E[|f_{k−1}|²](t_k − t_{k−1}) if k = ℓ,
since f_{k−1} f_{ℓ−1} ΔB_{k−1} and ΔB_{ℓ−1} are independent if k < ℓ.

Define the class V[0, T] of processes f satisfying

(a) f is adapted and measurable (the mapping (t, ω) → f_t(ω) is measurable on the product space [0, T] × Ω with respect to the product σ-algebra B[0, T] × F);

(b) E[∫_0^T |f_t|² dt] < ∞.

V[0, T] is a Hilbert space with inner product

(f, g)_{V[0,T]} = E[∫_0^T f(t, ω) g(t, ω) dt].

It follows from the isometry that if {f^n} is an approximating sequence for f ∈ V[0, T], then

E[|I_t^n − I_t^m|²] = ∫_0^t E[|f_s^n − f_s^m|²] ds

for

I_t^n = ∫_0^t f_s^n dB_s.
Lemma (Density) For f ∈ V[0, T] there exists a sequence {f^n} of F_t-adapted elementary stochastic processes such that

∫_0^T E[|f_s^n − f_s|²] ds → 0 as n → ∞.  (6.6)

Proof: It suffices to prove the lemma for bounded processes, since if we define h_n(t, ω) by

h_n(t, ω) = −n if f(t, ω) < −n,  f(t, ω) if |f(t, ω)| ≤ n,  n if f(t, ω) > n,

then

∫_0^T E[|f_s(ω) − h_n(s, ω)|²] ds → 0 as n → ∞

by the Lebesgue dominated convergence theorem. First, suppose t → f_t(ω) is bounded and continuous. We let

f_t^n = Σ_k f_{t_{k−1}}(ω) χ_{[t_{k−1},t_k)}(t).

Since f_t^n(ω) converges to f_t(ω) a.e., (6.6) holds by the bounded convergence theorem. Next, if f_t is bounded, let g_t(ω) = (1/h) ∫_{t−h}^t f_s(ω) ds for h > 0; then

∫_0^T E|f_s − g_s|² ds → 0 as h → 0+.

Since t → g_t(ω) is bounded and continuous, (6.6) holds. By this lemma, we define:
Definition (Ito stochastic Integral) The Ito stochastic integral is defined for the class V[0, T] by

∫_0^t f_s dB_s = lim_{|P|→0} Σ_k f_{k−1}^n (B_{t_k} − B_{t_{k−1}}),

i.e., {I_t^n} is a Cauchy sequence in L²(Ω, F_t, P) and has a unique limit. If t → f_t(ω) is continuous,

∫_0^t f_s dB_s = lim_{|P|→0} Σ_k f(t_{k−1}, ω)(B_{t_k} − B_{t_{k−1}}).
Theorem (Ito stochastic Integral) The process

t → I_t(ω) = ∫_0^t f_s(ω) dB_s

satisfies

(i) E[I_t] = 0 and E[|I_t|²] = ∫_0^t E[|f_s|²] ds,
(ii) t → I_t(ω) is an F_t-martingale, and
(iii) t → I_t(ω) is continuous a.s.

Proof: For an elementary stochastic process f_t,

E[I_t | F_s] = I_s + Σ_{t_k > s} E[E[f_{k−1} ΔB_{k−1} | F_{t_{k−1}}] | F_s].

Since

E[E[f_{k−1} ΔB_{k−1} | F_{t_{k−1}}] | F_s] = E[f_{k−1} E[ΔB_{k−1} | F_{t_{k−1}}] | F_s] = 0,

we have E[I_t | F_s] = I_s, t ≥ s.
For (iii) we have

P(sup_{0≤t≤T} |I_t^n − I_t^m| ≥ δ) ≤ (1/δ²) E[|I_T^n − I_T^m|²] = (1/δ²) ∫_0^T E[|f_s^n − f_s^m|²] ds

by Doob's martingale inequality. Hence we may choose a subsequence n_k such that

P(sup_{0≤t≤T} |I_t^{n_{k+1}} − I_t^{n_k}| ≥ 2^{−k}) < 2^{−k}.

By the Borel-Cantelli lemma,

P(sup_{0≤t≤T} |I_t^{n_{k+1}} − I_t^{n_k}| > 2^{−k} for infinitely many k) = 0.

Thus, for a.e. ω there exists k_1(ω) such that

sup_{0≤t≤T} |I_t^{n_{k+1}} − I_t^{n_k}| ≤ 2^{−k} for all k ≥ k_1(ω),

and therefore I_t^{n_k}(ω) converges uniformly to I_t a.s. on [0, T]; i.e., for arbitrary ε > 0 there exists N(ω) such that for n_k ≥ N(ω)

sup_{0≤t≤T} |I_t^{n_k}(ω) − I_t(ω)| ≤ ε/3.

Since t → I_t^N(ω) is continuous, there exists δ(ω) such that for |t − s| ≤ δ(ω)

|I_t(ω) − I_s(ω)| ≤ |I_t^N(ω) − I_t(ω)| + |I_t^N(ω) − I_s^N(ω)| + |I_s^N(ω) − I_s(ω)| ≤ ε.
One can extend the Ito stochastic integral by replacing property (b) with the weaker assumption:

(b′) P(∫_0^T |f_t|² dt < ∞) = 1.

We denote by L_{a,T} the space of processes that satisfy properties (a) and (b′). The stochastic integral is extended to the space L_{a,T} by means of a localization argument. Suppose that f belongs to L_{a,T}. For each n ≥ 1 we define the stopping time

τ_n(ω) = inf{t ≥ 0 : ∫_0^t |f_s(ω)|² ds ≥ n},

where, by convention, τ_n = T if ∫_0^T |f_s|² ds < n. In this way we obtain a nondecreasing sequence of stopping times such that τ_n ↑ T. Furthermore,

t < τ_n ⇔ ∫_0^t |f_s|² ds < n.

Set f_t^{(n)} = f_t I_{[0,τ_n]}(t); then f^{(n)} ∈ V[0, T]. For m ≥ n, on the set {t ≤ τ_n},

∫_0^t f_s^{(m)} dB_s = ∫_0^t f_s^{(n)} dB_s,

since

∫_0^t f_s^{(m)} I_{[0,τ_n]} dB_s = ∫_0^{t∧τ_n} f_s^{(m)} dB_s.

As a consequence, there exists an adapted and continuous process, denoted by ∫_0^t f_s dB_s, such that for any n ≥ 1 and t ≤ τ_n,

∫_0^t f_s dB_s = ∫_0^t f_s^{(n)} dB_s.

The stochastic integral of processes in the space L_{a,T} is linear and has continuous trajectories. However, it may have infinite expectation and variance. Instead of the isometry property, there is a continuity property in probability, given by the following proposition.
Proposition 4 Suppose that f ∈ L_{a,T}. For all K, δ > 0 we have

P(|∫_0^T f_s dB_s| ≥ K) ≤ P(∫_0^T |f_s|² ds ≥ δ) + δ/K².
Proof: Consider the stopping time defined by

τ = inf{t ≥ 0 : ∫_0^t |f_s|² ds ≥ δ}.

Then we have

P(|∫_0^T f_s dB_s| ≥ K) ≤ P(∫_0^T |f_s|² ds ≥ δ) + P({|∫_0^T f_s dB_s| ≥ K} ∩ {∫_0^T |f_s|² ds ≤ δ}),

where

P({|∫_0^T f_s dB_s| ≥ K} ∩ {∫_0^T |f_s|² ds ≤ δ}) = P({|∫_0^T f_s dB_s| ≥ K} ∩ {τ = T})

= P({|∫_0^τ f_s dB_s| ≥ K} ∩ {τ = T}) ≤ (1/K²) E[|∫_0^τ f_s dB_s|²] = (1/K²) E[∫_0^τ |f_s|² ds] ≤ δ/K².
As a consequence of the above proposition, if f^{(n)} is a sequence of processes in the space L_{a,T} which converges to f ∈ L_{a,T} in probability,

P(∫_0^T |f_s^{(n)} − f_s|² ds > ε) → 0 as n → ∞

for all ε > 0, then

∫_0^T f_s^{(n)} dB_s → ∫_0^T f_s dB_s in probability.
6.4 Stratonovich integral

The Stratonovich stochastic integral

∫_0^T X_t ∘ dB_t : Ω → R

is defined to be the limit in probability of

Σ_{i=0}^{k−1} X_{(t_{i+1}+t_i)/2} (B_{t_{i+1}} − B_{t_i})

as the mesh of the partition P = {0 = t_0 < t_1 < · · · < t_k = T} of [0, T] tends to 0.
Examples (Stratonovich stochastic integral) Since

Σ_j ((B_{t_{j+1}} + B_{t_j})/2)(B_{t_{j+1}} − B_{t_j}) = Σ_j (1/2)(|B_{t_{j+1}}|² − |B_{t_j}|²) = (1/2)(|B_t|² − |B_0|²),

we have

∫_0^t B_s ∘ dB_s = (1/2)(|B_t|² − |B_0|²).
Conversion between Ito and Stratonovich integrals

The conversion is performed using the formula

∫_0^t f(B_s) ∘ dB_s = (1/2) ∫_0^t f′(B_s) ds + ∫_0^t f(B_s) dB_s,  (6.7)

where f is a continuously differentiable function and the last integral is the Ito integral. In fact, we have

f(B_{t_{j+1/2}}) − f(B_{t_j}) = f′(B_{t_j})(B_{t_{j+1/2}} − B_{t_j}) + o(|B_{t_{j+1/2}} − B_{t_j}|)

and

E[Σ_j f′(B_{t_j})(B_{t_{j+1/2}} − B_{t_j})(B_{t_{j+1}} − B_{t_j})] = E[(1/2) Σ_j f′(B_{t_j}) Δt_j].

Thus,

Σ_j f′(B_{t_j})(B_{t_{j+1/2}} − B_{t_j})(B_{t_{j+1}} − B_{t_j}) → (1/2) ∫_0^t f′(B_s) ds in L²(Ω, F, P),

since

Σ_j E[|f′(B_{t_j})((B_{t_{j+1/2}} − B_{t_j})(B_{t_{j+1}} − B_{t_j}) − Δt_j/2)|²] ≤ C Σ_j E[|f′(B_{t_j})|²](Δt_j)² → 0.
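A numerical sketch of the conversion formula (6.7) for the illustrative choice f(x) = x² (not a worked example from the text):

```python
# Check that the Stratonovich-type midpoint sum matches the Ito sum plus
# the correction (1/2)∫ f'(B_s) ds, for f(x) = x^2 on [0, 1].
import numpy as np

rng = np.random.default_rng(8)
n, T = 200_000, 1.0
dB = rng.normal(0.0, np.sqrt(T / n), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])
dt = T / n

f = lambda x: x**2
fprime = lambda x: 2 * x

strat = np.sum(f(0.5 * (B[:-1] + B[1:])) * dB)   # midpoint-type Stratonovich sum
ito = np.sum(f(B[:-1]) * dB)                     # left-point Ito sum
corr = 0.5 * np.sum(fprime(B[:-1]) * dt)         # (1/2)∫ f'(B_s) ds

print(strat, ito + corr)                         # should nearly agree
```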
Stratonovich Calculus Stratonovich integrals are defined such that the chain rule of ordinary calculus holds, i.e.,

f(X_t) − f(X_0) = ∫_0^t f′(X_s) ∘ dX_s.

Since

f′(X_{t_{j+1/2}})(X_{t_{j+1}} − X_{t_j}) = f(X_{t_{j+1}}) − f(X_{t_j}) + [f(X_{t_{j+1}}) − f(X_{t_{j+1/2}}) − f′(X_{t_{j+1/2}})(X_{t_{j+1}} − X_{t_{j+1/2}})] − [f(X_{t_j}) − f(X_{t_{j+1/2}}) − f′(X_{t_{j+1/2}})(X_{t_j} − X_{t_{j+1/2}})]

and

f(X_{t_{j+1}}) − f(X_{t_{j+1/2}}) − f′(X_{t_{j+1/2}})(X_{t_{j+1}} − X_{t_{j+1/2}}) = (1/2) f″(X_{t_{j+1/2}})(X_{t_{j+1}} − X_{t_{j+1/2}})² + O(|X_{t_{j+1}} − X_{t_{j+1/2}}|³),

f(X_{t_j}) − f(X_{t_{j+1/2}}) − f′(X_{t_{j+1/2}})(X_{t_j} − X_{t_{j+1/2}}) = (1/2) f″(X_{t_j})(X_{t_{j+1/2}} − X_{t_j})² + O(|X_{t_{j+1/2}} − X_{t_j}|³),

the two quadratic terms cancel in the limit. Thus, if X_t is an Ito process,

Σ_j f′(X_{t_{j+1/2}})(X_{t_{j+1}} − X_{t_j}) → f(X_t) − f(X_0) in L²(Ω, P)

as Δt → 0+.
6.5 Martingale representation

Martingale Representation Theorem Let F_t = σ(B_s, s ≤ t). For every square integrable, continuous F_t-martingale M_t there exists a unique f ∈ V(0, T) = {square integrable F_t-adapted processes on (0, T)} such that

M_t = E[M_0] + ∫_0^t f(s, ω) dB_s.
Proof: Step 1: Let {h_k(t)} be an orthonormal basis of L²(0, T). Define the exponential martingale

Y_k(t) = e^{∫_0^t h_k(s) dB_s − (1/2)∫_0^t |h_k(s)|² ds}.

That is, if dX_k = h_k dB_t − (|h_k|²/2) dt, then it follows from Ito's rule that Y_k = e^{X_k(t)} satisfies

dY_k(t) = Y_k(t)(dX_k + (1/2)|h_k|² dt) = h_k(t) Y_k(t) dB_t

and

Y_k(t) = 1 + ∫_0^t h_k(s) Y_k(s) dB_s.

Since

d(Y_k Y_j) = Y_j dY_k + Y_k dY_j + h_k h_j Y_k Y_j dt,

we have

E[Y_k(t) Y_j(t)] = 1 + ∫_0^t h_k(s) h_j(s) E[Y_k(s) Y_j(s)] ds.

Thus,

E[Y_k(T) Y_j(T)] = e^{∫_0^T h_k(s) h_j(s) ds} = 1 for k ≠ j

and

E[|Y_k(T) − 1|²] = E[Y_k(T)²] − 1 = e − 1,

since {h_k(t)} is an orthonormal basis in L²(0, T).
Step 2: Next, we prove that {1/√T, (Y_k(T) − 1)/√(e − 1), k ≥ 1} spans a dense subspace of L²(F_T, P). For h(t) = Σ_k h_k χ_{[t_{k−1},t_k)}(t),

∫_0^T h dB_t = Σ_k h_k (B_{t_k} − B_{t_{k−1}}) = Σ_k λ_k B_{t_k}

after summation by parts. Suppose g ∈ L²(F_T, P) satisfies

G(λ) = E[g e^{Σ_k λ_k B_{t_k}}] = 0
For h(t) =
for all λ. For arbitrary φ ∈ C0∞ (Rn )
n
E[φ(Bt1 , · · · , Btn )g] = E[((2π)− 2
∞
Z
e−i
P
y k B tk
φ̂(y) dy)g]
−∞
∞
Z
n
= (2π)− 2
φ̂(y)E[ge−i
P
yk Btk
] dy = 0,
−∞
where φ̂ is the Fourier transform of φ;
n
φ̂(y) = (2π)− 2
Z
∞
ei
P
xk yk
φ(x) dx.
−∞
Since a family of random variables {φ(Bt1 , · · · , Btn ), 0 =R t0 < t1 < · · R· tn ≤ T, φ ∈ C0∞ (Rn )} is
T
T
2
1
dense in L2 (FT , P ), thus the span of random variables {e 0 h(s) dBs − 2 0 |h(s)| ds , h ∈ L2 (0, T )}
is dense in L2 (FT , P ).
Hence, {1/√T, (Y_k(T) − 1)/√(e − 1), k ≥ 1} is an orthonormal basis in L²(Ω, F_T, P), and every F_T-measurable random variable F ∈ L²(Ω, F_T, P) has

F = α_0 + Σ_{k=1}^∞ α_k (Y_k(T) − 1) = E[F] + Σ_{k=1}^∞ ∫_0^T α_k h_k(s)Y_k(s) dB_s   (6.8)

= E[F] + ∫_0^T f(s, ω) dB_s,

where

f(s, ω) = Σ_{k=1}^∞ α_k h_k(s)Y_k(s) with α_k = (1/√(e − 1)) E[(Y_k(T) − 1)F].

By the isometry

E[|F|²] = |E[F]|² + E[∫_0^T |f(s, ω)|² ds],

and the representation (6.8) is unique.
Step 3 By Step 2, for t_1 ≤ t_2,

M_{t_1} = E[M_{t_2}|F_{t_1}] = E[M_0] + E[∫_0^{t_2} f^{(t_2)}(s, ω) dB_s|F_{t_1}]
= E[M_0] + ∫_0^{t_1} f^{(t_2)}(s, ω) dB_s = E[M_0] + ∫_0^{t_1} f^{(t_1)}(s, ω) dB_s.

Thus,

0 = E[|∫_0^{t_1} (f^{(t_1)}(s, ω) − f^{(t_2)}(s, ω)) dB_s|²] = ∫_0^{t_1} E[|f^{(t_1)}(s, ω) − f^{(t_2)}(s, ω)|²] ds,

and f^{(t_1)}(s, ω) = f^{(t_2)}(s, ω) = f(s, ω) almost surely.

6.6
Ito's stochastic calculus
In this section we discuss the Ito’s stochastic calculus.
Ito’s Lemma Let Xt be an Ito’s process, i.e.,
Z t
Z t
Xt = X0 +
u(s, ω) +
v(s, ω) dBs
0
0
68
where Ft -adapted processes u, v satisfy
P (|u(t, ω)| < ∞) = 1,
P (|vv t (t, ω)| < ∞) = 1.
For f ∈ C 1,2 ([0, T ] × Rd )
Z
f (Xt )−f (X0 ) =
0
t
n Z
n
X
∂
1 X t ∂2f
∂f
f (s, Xs ) ds+
(s, Xs )dXsj +
(s, Xs )(vv t )i,j (s, ω) ds.
∂t
∂x
2
∂x
∂x
j
i
j
i,j=1 0
i=1
Or, equivalently (increment form)
df (t, Xt ) =
n
n
X
1 X ∂2f
∂f
∂f
dt +
dXtj +
(t, Xt )(vv t )i,j (t, ω) dt.
∂t
∂x
2
∂x
∂x
j
i
j
i,j=1
j=1
(6.9)
Proof: For a positive integer n we define a stopping time τn by
Z t
Z t
τn = inf{t > 0 : |X0 +
v dBs | > n or |
u ds| > n}.
0
0
Then τn → ∞ as n → ∞ a.s.. Thus, it suffices to prove the formula for Xt∧τn and thus without
Rt
Rt
2
f
∂f
loss of generality we can assume that |X0 + 0 v dBs |, | 0 u ds| are bounded and f, ∂x
, ∂x∂i ∂x
j
j
are bounded and uniformly continuous. Note that by the mean value theorem
f (Xt ) − f (X0 ) =
N
n
X
X
∂f
∂f
[ (tk , Xtk−1 )(tk − tk−1 ) +
(Xtk−1 )(Xtk − Xtk−1 )
∂t
∂xj
i=0
k=1
+
n
n
1 X X ∂2f
(ξi,j )(Xtik − Xtik−1 )(Xtjk − Xtjk−1 )].
2 i=1 j=1 ∂xi ∂xj
By the definition of the Ito stochastic integral the second term of RHS converges to
n Z t
X
∂f
(Xs ) dXsj .
∂x
j
0
j=1
Note that
(Xtik − Xtik−1 )(Xtjk − Xtjk−1 ). = (uk−1 ∆tk−1 + vk−1 ∆Bk−1 )i (uk−1 ∆tk−1 + vk−1 ∆Bk−1 )j ,
where we assumed u, v are elementary processes.
6.7
Tanaka's formula

In this section we discuss Tanaka's formula for the one-dimensional Brownian motion. Let

g_ε(x) = |x| for |x| ≥ ε,  g_ε(x) = (1/2)(x²/ε + ε) for |x| ≤ ε.

By the Ito formula

g_ε(B_t) = g_ε(B_0) + ∫_0^t g_ε'(B_s) dB_s + (1/(2ε)) m(s ∈ [0, t] : B_s ∈ (−ε, ε)),

where m denotes Lebesgue measure. Since

∫_0^t (g_ε'(B_s) − sign(B_s)) dB_s = ∫_0^t (B_s/ε − sign(B_s)) I{|B_s| ≤ ε} dB_s

and by the isometry

E[|∫_0^t (B_s/ε − sign(B_s)) I{|B_s| ≤ ε} dB_s|²] ≤ ∫_0^t ∫_{−ε}^{ε} (1/√(2πs)) e^{−x²/(2s)} dx ds → 0 as ε → 0⁺,

we obtain

|B_t| = |B_0| + ∫_0^t sign(B_s) dB_s(ω) + L_t(ω)
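The occupation-time characterization of the local time can be illustrated numerically. In the sketch below (an illustration with arbitrary parameters, not from the text), L_t is estimated by (1/(2ε)) m{s ≤ t : |B_s| < ε} on simulated paths, and the sample mean is compared with E[L_t] = E[|B_t|] = √(2t/π), which follows from the formula above because the stochastic integral has mean zero.

```python
import numpy as np

# Estimate the local time at 0 via the occupation-time formula
#   L_t ~ (1/(2 eps)) * m{s in [0,t] : |B_s| < eps},
# and compare the sample mean with E[L_t] = E[|B_t|] = sqrt(2 t / pi).
rng = np.random.default_rng(1)
t, N, M, eps = 1.0, 20_000, 2_000, 0.05
dt = t / N
B = np.zeros(M)          # M independent Brownian paths, advanced in time
L = np.zeros(M)          # occupation-time estimate of the local time
for _ in range(N):
    B += np.sqrt(dt) * rng.normal(size=M)
    L += (np.abs(B) < eps) * dt / (2 * eps)
print(L.mean(), np.sqrt(2 * t / np.pi))
```

The estimator carries a small bias from the finite ε and time step, so only rough agreement should be expected.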
where L_t(ω), the local time of the Brownian motion, is defined by

L_t(ω) = lim_{ε→0⁺} (1/(2ε)) m(s ∈ [0, t] : B_s ∈ (−ε, ε)) in L²(Ω, F, P).

6.8
Feynman-Kac formula
The Feynman-Kac formula, named after Richard Feynman and Mark Kac, establishes a link between parabolic partial differential equations (PDEs) and stochastic processes. It offers a method of solving certain PDEs by simulating random paths of a stochastic process. Conversely, an important class of expectations of random processes can be computed by deterministic methods.
Theorem (Feynman-Kac Formula) For f ∈ C_0²(R^n), v ∈ C^{1,2}([0, T], R^n) satisfies

∂v/∂t + Av − q(x)v = 0,  v(T, x) = f(x)   (6.10)

if and only if

v(t, x) = E^{t,x}[e^{−∫_t^T q(X_s) ds} f(X_T)].

Proof: Let

Z_s = e^{−∫_t^s q(X_r) dr},

so that

dZ_s = −q(X_s)Z_s ds.

By Ito's formula

d(Z_s v(s, X_s)) = v dZ_s + Z_s ((∂v/∂s + Av) ds + ∇v(s, X_s) · σ(X_s) dB_s).

Thus,

E^{t,x}[Z_T v(T, X_T)] = v(t, x) + E^{t,x}[∫_t^T Z_s (∂v/∂s + Av − qv) ds],

which implies the representation.

Example Note that (6.10) is backward in time with terminal condition at t = T. Let u(t, x) = v(T − t, x). Then (6.10) is equivalent to the initial value problem for u:

∂u/∂t = Au − q(x)u,  u(0, x) = f(x).

Let X_t = x + σB_t. Then

∂u/∂t = (σ²/2)Δu − q(x)u,  u(0, x) = f(x)

has the solution representation

u(t, x) = E[e^{−∫_0^t q(X_s) ds} f(X_t)],

and, for q = 0,

u(t, x) = E[f(X_t)] = (√(2πt) σ)^{−n} ∫_{R^n} f(y) e^{−|y−x|²/(2σ²t)} dy.
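The representation can be tested by Monte Carlo. The sketch below is illustrative (not part of the text): q is taken constant and f(y) = y², choices for which u(t, x) = E[e^{−qt} f(x + σB_t)] has the closed form e^{−qt}(x² + σ²t).

```python
import numpy as np

# Monte Carlo check of the Feynman-Kac representation for constant q:
# u(t, x) = E[exp(-q t) f(x + sigma B_t)] with f(y) = y^2 has the closed
# form exp(-q t) * (x^2 + sigma^2 t).
rng = np.random.default_rng(2)
q, sigma, t, x, M = 0.3, 0.7, 1.5, 0.4, 400_000
X_t = x + sigma * rng.normal(0.0, np.sqrt(t), M)   # X_t = x + sigma * B_t
u_mc = np.exp(-q * t) * np.mean(X_t ** 2)
u_exact = np.exp(-q * t) * (x ** 2 + sigma ** 2 * t)
print(u_mc, u_exact)
```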
6.9
Exercises

Problem 1 Show (6.7).
Problem 2 Let B_t be a two-dimensional Brownian motion. Given ρ > 0, compute P(|B_t| < ρ).
Problem 3 Compute the mean and covariance of the geometric Brownian motion. Is it a Gaussian process?
Problem 4 Let B_t be a Brownian motion. Find the law of B_t conditioned on B_{t_1}, on B_{t_2}, and on (B_{t_1}, B_{t_2}), assuming t_1 < t < t_2.
Problem 5 Check if the following processes are martingales:

e^{λB_t − λ²t/2},  e^{t/2} cos(B_t),  (B_t + t)e^{−B_t − t/2},  B_1(t)B_2(t)B_3(t),

where B_1, B_2 and B_3 are independent Brownian motions.
7
Diffusion Process
When we model a stochastic process in the continuous time it is almost impossible to specify
in some reasonable manner a consistent set of finite dimensional distributions in general. The
one exception is the family of Gaussian processes with specified means and covariances. It
is much more natural and beneficial to take an evolutionary approach. For simplicity let us
consider the one dimensional case where we are trying to define a real valued stochastic process
Xt with continuous trajectories. The space C[0, T ] is the space on which we wish to construct
the measure P. We have the σ-fields F_t = σ(X_s, 0 ≤ s ≤ t) defined for t ≤ T. Let F = F_T,
the total σ-field. We try to specify the measure P by specifying the conditional distributions
P [Xt+h − Xt ∈ A|Ft ] of process {Xt } by the infinitesimal evolution of their mean and variance:
E[Xt+h − Xt |Ft ] = h b(t, ω) + o(h)
(7.1)
E[|Xt+h − Xt |2 |Ft ] = h a(t, ω) + o(h),
where for each t ≤ T the drift b(t, ω) and the variance a(t, ω) are Ft -measurable functions.
Since we insist on continuity of paths, this will force the distributions to be nearly Gaussian and
no additional specification should be necessary. Equations (7.1) are infinitesimal differential
relations and the integrated forms are precise mathematical statements. We will discuss the
approach by K. Ito that realizes the increments X_{t+h} − X_t as

X_{t+h} − X_t ∼ b(t, X_t)h + √(a(t, X_t)) (B_{t+h} − B_t),

and as h → 0, X_t defines a solution to the stochastic differential equation

X_t = X_0 + ∫_0^t b(s, X_s) ds + ∫_0^t √(a(s, X_s)) dB_s.
We need some definitions.
Definition (Progressively measurable) We say that a function f : [0, T ] × Ω → R is progressively measurable if, for every t ∈ [0, T ] the restriction of f to [0, t] × Ω is a measurable
function of t and ω on ([0, t] × Ω, B[0, t] × Ft ).
The condition is somewhat stronger than just demanding that for each t, f(t, ω) is F_t-measurable. The following facts hold.
(1) If f(t, x) is a measurable function of t and x, then f(t, X_t(ω)) is progressively measurable.
(2) If f (t, ω) is either left continuous (or right continuous) as function of t for every ω and if in
addition f (t, ω) is Ft measurable for every t, then f is progressively measurable.
(3) There is a sub σ-field Σ ⊂ B[0, T ]×F such that progressive measurability is just measurability
with respect to Σ. In particular standard operations performed on progressively measurable
functions yield progressively measurable functions.
We shall always assume that the functions b(t, ω) and a(t, ω) are progressively measurable. Let us suppose in addition that they are bounded functions. The boundedness will be relaxed at a later stage. We reformulate conditions (7.1) as:

M_1(t) = X_t − X_0 − ∫_0^t b(s, ω) ds  and  M_2(t) = M_1(t)² − ∫_0^t a(s, ω) ds
0
are martingales with respect to (Ω, Ft , P ). We can define a diffusion process corresponding to
a, b as a probability measure P on (Ω, F) such that relative to (Ω, Ft , P ) M1 (t) and M2 (t) are
martingales. If in addition we are given a probability measure µ as the initial distribution, i.e.
µ(A) = P (X0 ∈ A), then we can expect that P is determined by a, b and µ. We have seen that
if a = 1 and b = 0, with µ = δ0 , we obtain the standard Brownian motion Bt . If a = a(t, Xt ) and
b = b(t, Xt ), we expect that Xt is a Markov process, because the infinitesimal parameters depend
only on the current position and not on the past history. If there is no explicit dependence on
time, then the Markov process has stationary transition probabilities. Finally, if a(t, ω) =
a(t) is a deterministic function in t and b(t, ω) = b1 (t) + c(t)Xt , then P x is Gaussian, if µ is so.
Since the paths X_t are continuous we can establish that

Z_λ(t) = e^{λM_1(t) − (λ²/2)∫_0^t a(s,ω) ds} = e^{λ(X_t − X_0 − ∫_0^t b(s,ω) ds) − (λ²/2)∫_0^t a(s,ω) ds}
is a martingale with respect to (Ω, Ft , P ) for every real λ. We can also use for the definition of
a diffusion process corresponding to a, b the condition that Zλ (t) be a martingale with respect
to (Ω, Ft , P ) for every λ. If so, we do not have to assume that the paths were almost surely
continuous. (Ω, F_t, P) could be any probability space on which a stochastic process X_t is defined such that Z_λ(t) is a martingale for all λ. If C is an upper bound for a, it is easy to see that
E[e^{λ(M_1(t) − M_1(s))}] ≤ e^{Cλ²(t−s)/2}.
The lemma of Garsia-Rodemich-Rumsey will guarantee that the paths can be chosen to be
continuous.
In general, let (Ω, F, P) be a probability space. Let T be the interval [0, T] for some finite T or the infinite interval [0, ∞), and let F_t, t ∈ T, be sub σ-algebras such that F_s ⊂ F_t for s < t. We can assume without loss of generality that F = ∪_{t∈T} F_t. Let a stochastic process X_t with
values in Rn be given. Assume that it is progressively measurable with respect to (Ω, Ft ). We
generalize the definitions described above to diffusion processes with values in R^n. Given a positive semidefinite n × n matrix function a = (a_{i,j}) and an n-vector function b = (b_j), we define the operator

(L^{a,b} f)(x) = (1/2) Σ_{i,j} a_{i,j} (∂²f/∂x_i∂x_j) + Σ_j b_j (∂f/∂x_j).
If a = ai,j (t, ω) and b = bj (t, ω) are progressively measurable functions, we define
(Lt,ω f )(x) = (La(t,ω),b(t,ω) f )(x).
Then, we have the equivalent characterizations of the diffusion process:
Theorem 2 (Diffusion Process) The following definitions are equivalent. X_t is a diffusion process corresponding to bounded progressively measurable functions a(t, ω), b(t, ω), taking values in the space of symmetric positive semidefinite n × n matrices and in R^n respectively, if

(1) X_t has an almost surely continuous version and

Y(t) = X_t − X_0 − ∫_0^t b(s, ω) ds,  Z_{i,j}(t) = Y_i(t, ω)Y_j(t, ω) − ∫_0^t a_{i,j}(s, ω) ds

are (Ω, F_t, P) martingales.

(2) For every λ ∈ R^n

Z_λ(t, ω) = e^{(λ, Y(t,ω)) − (1/2)∫_0^t (λ, a(s,ω)λ) ds}

is an (Ω, F_t, P) martingale.

(3) For every λ ∈ R^n

X_λ(t, ω) = e^{i(λ, Y(t,ω)) + (1/2)∫_0^t (λ, a(s,ω)λ) ds}

is an (Ω, F_t, P) martingale.

(4) For every smooth bounded function f on R^n with two bounded continuous derivatives,

f(X_t) − f(X_0) − ∫_0^t (L_{s,ω} f)(X_s) ds is an (Ω, F_t, P) martingale.

(5) For every smooth bounded function φ on T × R^n with at least two bounded continuous x-derivatives and one bounded continuous t-derivative,

φ(t, X_t) − φ(0, X_0) − ∫_0^t (∂/∂t + L_{s,ω})φ(s, X_s) ds is an (Ω, F_t, P) martingale.

(6) For every smooth bounded function φ ∈ C^{1,2}(T, R^n),

exp[ φ(t, X_t) − φ(0, X_0) − ∫_0^t (∂/∂t + L_{s,ω})φ(s, X_s) ds − (1/2)∫_0^t (∇_x φ(s, X_s), a(s, ω)∇_x φ(s, X_s)) ds ]

is an (Ω, F_t, P) martingale.

(7) Same as (6) except that φ is replaced by ψ of the form ψ(t, x) = (λ, x) + φ(t, x), where φ is as in (6) and λ ∈ R^n is arbitrary.

Under any one of the above definitions, Y(t, ω) has an almost surely continuous version satisfying

P( sup_{0≤s≤t} |Y(s, ω) − Y(0, ω)| ≥ ℓ ) ≤ 2n e^{−ℓ²/(Ct)}
for some constant C depending only on the dimension n and the upper bound for a.
Proof: (3) Since

dZ_λ(t) = ((λ, dY_t) − (1/2)(λ, aλ) dt)Z_λ(t) + (1/2)(λ, aλ)Z_λ(t) dt = (λ, dY_t)Z_λ(t),

we have

Z_λ(t) − Z_λ(s) = ∫_s^t Z_λ(σ)(λ, dY_σ),

and thus Z_λ(t) is a martingale.

(4) Let us apply the above lemma with M_t = X_λ(t) and

A_t = e^{∫_0^t (i(λ, b_s) − (1/2)(λ, a_s λ)) ds}.

Then a simple computation yields

M_t A_t − M_0 A_0 − ∫_0^t M_s dA_s = e_λ(X_t − X_0) − 1 − ∫_0^t (L_{s,ω} e_λ)(X_s − X_0) ds,

where e_λ(x) = e^{i(λ,x)}. Multiplying this by e_λ(X_0), which is essentially a constant, we conclude that

e_λ(X_t) − e_λ(X_0) − ∫_0^t (L_{s,ω} e_λ)(X_s) ds

is a martingale. That is,

E[e^{i(λ, X_t − X_s)}|F_s] = 1 + ∫_s^t E[(i(λ, b(σ)) − (1/2)(λ, a(σ)λ)) e^{i(λ, X_σ − X_s)}|F_s] dσ.

If b and a are deterministic,

E[e^{i(λ, X_t − X_s)}|F_s] = e^{∫_s^t (i(λ, b(σ)) − (1/2)(λ, a(σ)λ)) dσ},

and X_t is a Gaussian process if X_0 is so.

(5) Note that

E[φ(t, X_t) − φ(s, X_s)|F_s] = E[φ(t, X_t) − φ(t, X_s)|F_s] + E[φ(t, X_s) − φ(s, X_s)|F_s]

= E[∫_s^t L_{σ,ω} φ(t, X_σ) dσ|F_s] + E[∫_s^t (∂/∂t)φ(σ, X_s) dσ|F_s]

= E[∫_s^t (∂/∂t + L_{σ,ω})φ(σ, X_σ) dσ|F_s] + J,

where

J = E[∫_s^t L_{σ,ω}(φ(t, X_σ) − φ(σ, X_σ)) dσ|F_s] + E[∫_s^t ((∂/∂t)φ(σ, X_s) − (∂/∂t)φ(σ, X_σ)) dσ|F_s]

= E[∫∫_{s≤u≤v≤t} L_{u,ω}(∂/∂t)φ(v, X_u) du dv − ∫∫_{s≤v≤u≤t} L_{v,ω}(∂/∂t)φ(u, X_v) du dv | F_s] = 0,

where we used the fact that the last two integrals are symmetric with respect to (u, v).
7.1
Exercises
Problem 1 Show that

M_t = u(t, X_t) − u(0, X_0) − ∫_0^t (∂/∂t + L)u(s, X_s) ds

is an F_t martingale. If we assume

∂u/∂t + Lu(t, x) = 0,  u(T, x) = f(x),

then show that u(t, x) = E^{t,x}[f(X_T)] = E[f(X_T)|X_t = x].
Problem 2 Show that

M_t = e^{−∫_0^t q(X_s) ds} u(t, X_t) − u(0, X_0) − ∫_0^t e^{−∫_0^s q(X_σ) dσ} (∂/∂t + L − q(X_s))u(s, X_s) ds

is an F_t martingale. If we assume

∂u/∂t + Lu(t, x) − q(x)u(t, x) = 0,  u(T, x) = f(x),

then show that u(t, x) = E^{t,x}[e^{−∫_t^T q(X_s) ds} f(X_T)] (Feynman-Kac formula).

Problem 3 Let

X_t = e^{rt + σB_t − σ²t/2}.

Show that

dX_t = rX_t dt + σX_t dB_t

and

Lf = rx f' + (σ²x²/2) f''.
If

u(t, x) = E^{t,x}[e^{−r(T−t)} f(X_T)],

then show that u satisfies the Black-Scholes equation

∂u/∂t + rx ∂u/∂x + (σ²x²/2) ∂²u/∂x² − ru = 0,  u(T, x) = max(0, K − x) = f(x)

(for the European put option).
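The Feynman-Kac representation behind this exercise can be verified numerically: simulating X_T from the SDE dX = rX dt + σX dB and discounting the payoff max(0, K − X_T) reproduces the closed-form Black-Scholes value of that payoff. The sketch below is illustrative (all parameter values are arbitrary choices) and uses the standard Black-Scholes put formula as the reference.

```python
import math
import numpy as np

# Monte Carlo value of E[e^{-rT} max(0, K - X_T)] for geometric Brownian
# motion X_T = S0 exp((r - sigma^2/2) T + sigma B_T), compared with the
# closed-form Black-Scholes put price.
rng = np.random.default_rng(3)
S0, K, r, sigma, T, M = 100.0, 100.0, 0.05, 0.2, 1.0, 1_000_000
Z = rng.normal(size=M)
X_T = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * Z)
put_mc = math.exp(-r * T) * np.mean(np.maximum(K - X_T, 0.0))

Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
d2 = d1 - sigma * math.sqrt(T)
put_bs = K * math.exp(-r * T) * Phi(-d2) - S0 * Phi(-d1)
print(put_mc, put_bs)
```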
8
Stochastic Differential equation
A stochastic differential equation models physical processes driven by random forces and random rate changes, and uncertainty in models and initial conditions. It has a wide class of applications in multidisciplinary science and engineering, and provides a mathematical tool to analyze concrete stochastic dynamics and to apply and develop probabilistic methods.
For example, consider the population growth model

dN/dt = a(t)N(t) + f(t),  N(0) = N_0,   (8.1)

where N(t) is the size of the population at time t, a(t) is the growth rate and f(t) is the generation rate of the population at time t. We introduce random environmental effects through
a(t) = r(t) + ”noise”,
f (t) = F (t) + ”noise”
and a random initial value N_0. In different applications (8.1) can be used to model, for example, chemical concentrations or quantities in mathematical finance. We will formulate such random equations as Ito stochastic differential equations and develop solution methods.
Consider the discrete dynamics for X_k, k ≥ 0:

X_{k+1} = X_k + b(X_k) Δt + σ(X_k)w_k,  X_0 = x,   (8.2)

where b(x) is the drift, σ(x) is the standard deviation, and the w_k = B_{t_{k+1}} − B_{t_k} are independent, identically distributed Gaussian random variables with law N(0, Δt). Equivalently, we have

X_n = x + Σ_{k=1}^n b(X_{k−1})Δt + Σ_{k=1}^n σ(X_{k−1})(B_{t_k} − B_{t_{k−1}}).

We will analyze the limit as the time-stepsize Δt → 0⁺: if we let

X_t^{Δt} = X_n on [t_n, t_{n+1}),

then {X_t^{Δt}} converges to the weak solution of the stochastic differential equation.
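The discrete dynamics (8.2) are exactly the Euler-Maruyama scheme. As a sketch (illustrative parameters, not from the text), it is applied here to the Ornstein-Uhlenbeck drift b(x) = −θx with constant σ, a case in which the mean and variance of X_t are known in closed form.

```python
import numpy as np

# Euler iteration (8.2) for dX = -theta X dt + sigma dB.
# Exact values: E[X_t] = x0 exp(-theta t),
#               Var[X_t] = sigma^2/(2 theta) (1 - exp(-2 theta t)).
rng = np.random.default_rng(4)
theta, sigma, x0, T, N, M = 1.0, 0.5, 1.0, 1.0, 500, 200_000
dt = T / N
X = np.full(M, x0)
for _ in range(N):
    X += -theta * X * dt + sigma * np.sqrt(dt) * rng.normal(size=M)
mean_exact = x0 * np.exp(-theta * T)
var_exact = sigma ** 2 / (2 * theta) * (1 - np.exp(-2 * theta * T))
print(X.mean(), mean_exact, X.var(), var_exact)
```

The small residual gap is the O(Δt) discretization bias plus Monte Carlo noise.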
First, we establish the existence of the strong solution to the stochastic differential equation
dX_t = b(t, X_t) dt + σ(t, X_t) dB_t

under the conditions:

H1) (Lipschitz) |b(t, x) − b(t, y)| + |σ(t, x) − σ(t, y)| ≤ D|x − y|;

H2) (Linear Growth) |b(t, x)| + |σ(t, x)| ≤ C(1 + |x|).
Ito’s Lemma Let a square integrable random variable X0 and Ft -Brownian motion Bt , t ≥ 0
be given and assume they are independent. Under conditions H1) and H2) there exists a unique
almost surely continuous measurable processesXt (ω) that satisfies
Z t
Z t
Xt (ω) = X0 (ω) +
b(s, Xs (ω)) ds +
σ(s, Xs (ω))dBs (ω).
(8.3)
0
0
Proof: (Uniqueness) Suppose X_t, X̂_t are two solutions. Then, we have

E[|X_t − X̂_t|²] = E[|X_0 − X̂_0 + ∫_0^t (b(s, X_s) − b(s, X̂_s)) ds + ∫_0^t (σ(s, X_s) − σ(s, X̂_s)) dB_s|²]

≤ 3E[|X_0 − X̂_0|²] + 3(1 + t)D² E[∫_0^t |X_s − X̂_s|² ds].

By Gronwall's inequality

E[|X_t − X̂_t|²] ≤ 3E[|X_0 − X̂_0|²] e^{3D²t(1+t)}.
(Existence) Consider the fixed point iteration

X_t^{k+1} = Φ(t, X_t^k) with X_t^0 = X_0,

where

Φ(t, X_t) = X_0 + ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dB_s.

Then,

E[|X_t^{k+1} − X_t^k|²] ≤ (1 + t)D² ∫_0^t E[|X_s^k − X_s^{k−1}|²] ds

and

E[|X_t^1 − X_t^0|²] ≤ 2C²t(1 + E[|X_0|²]).

By induction in k we have

E[|X_t^k − X_t^{k−1}|²] ≤ A^k t^k/k!   (8.4)

on t ∈ [0, T]. Thus, {X_t^k} is a Cauchy sequence in L²(Ω, F_t, P) and has a unique limit X_t(ω) = lim_{k→∞} X_t^k(ω), uniformly on [0, T]. By the martingale inequality

P( sup_{0≤s≤T} |X_s^{k+1} − X_s^k| ≥ 2^{−k} ) ≤ P( ∫_0^T |b(s, X_s^{k+1}) − b(s, X_s^k)|² ds ≥ 2^{−2k−2} ) + 2^{2k+2} E[∫_0^T |σ(s, X_s^{k+1}) − σ(s, X_s^k)|² ds].

From (8.4) and by the Borel-Cantelli lemma, X_t(ω) = lim_{k→∞} X_t^k(ω) a.s., uniformly on [0, T].

Theorem (Ito's Rule) Let X_t be an Ito diffusion process, i.e., X_t satisfies
dX_t = b(X_t) dt + σ(X_t) dB_t,

where b, σ are Lipschitz continuous. Then,

f(X_t) = f(X_0) + ∫_0^t ∇f(X_s) · (b(X_s) ds + σ(X_s) dB_s) + (1/2) Σ_{i,j} ∫_0^t a_{i,j}(X_s)(∂²f/∂x_i∂x_j)(X_s) ds,

where a(x) = σσᵗ. {X_t, t ≥ 0} is a Markov process and

f(X_t) − f(X_0) − ∫_0^t Af(X_s) ds

is an F_t-martingale. The generator A of {X_t} is given by

Af = Σ_j b_j(x)(∂f/∂x_j)(x) + (1/2) Σ_{i,j} a_{i,j}(x)(∂²f/∂x_i∂x_j)(x)

with dom(A) = C_0²(R^n).
Corollary

df(t, B_t) = (∂f/∂t + (1/2)Δf)(t, B_t) dt + ∇f(B_t) · dB_t,

and thus f(t, B_t) is a martingale if and only if ∂f/∂t + (1/2)Δf = 0.

8.1
Weak solution
The solution Xt (ω) to (8.3) under Ito’s condition is a strong solution in the sense that the
sample path of Brownian motion t → Bt (ω) is given in advance and then the sample solution is
constructed as an Ft adapted stochastic process.
Definition (Weak Solution) A triple of stochastic processes (X̃_t, B̃_t, H_t) is a weak solution to (8.3):

X̃_t = X_0 + ∫_0^t b(s, X̃_s) ds + ∫_0^t σ(s, X̃_s) dB̃_s,

if H_t is an increasing family of σ-algebras, X̃_t is H_t-adapted and B̃_t is an H_t-Brownian motion.
The uniqueness of the weak solution implies that any two solutions are identical in law, i.e.,
have the same finite dimensional distributions. From the point of view of stochastic modeling
the weak solution concept is more natural since we do not need to specify the Brownian motion
sample beforehand. As will be shown next, there are stochastic differential equations that have
no strong solution but a unique weak solution.
Example (Weak Solution) Consider the stochastic differential equation

dX_t = sign(X_t) dB_t.

We show that it does not have a strong solution. Suppose it has a strong solution X_t, i.e.,

X_t = ∫_0^t sign(X_s) dB_s.   (8.5)

Since |sign(X_t)| = 1, X_t is a Brownian motion. Since dB_t = sign(X_t) dX_t, we have

B_t = ∫_0^t sign(X_s) dX_s.

It follows from Tanaka's formula that

B_t = |X_t| − |X_0| − L_t(ω),

where L_t(ω) is the local time of the Brownian motion X_t. Thus, if H_t is the σ-algebra generated by {|X_s|, s ≤ t}, then B_t is H_t-measurable, so that F_t, the σ-algebra generated by {B_s, s ≤ t}, is contained in H_t. But a strong solution X_t must be F_t-adapted, which would give σ(X_s, s ≤ t) ⊂ H_t, while H_t is strictly contained in σ(X_s, s ≤ t); this yields a contradiction.

A weak solution to (8.5) is any Brownian motion X_t. In fact,

B̃_t = ∫_0^t sign(X_s) dX_s

is a Brownian motion. Since dB̃_t = sign(X_t) dX_t, we have dX_t = sign(X_t) dB̃_t.
Theorem (Yamada-Watanabe) Suppose the SDE has a weak solution and the solution is pathwise unique. Then the equation has a strong solution.

Theorem (Yamada-Watanabe II) Assume b and σ satisfy

|b(t, x_1) − b(t, x_2)| + |σ(t, x_1) − σ(t, x_2)| ≤ ρ(|x_1 − x_2|),

where ρ is continuous, increasing, convex with ρ(0) = 0, and

∫_0^ε ds/ρ(s) = ∞

for some ε > 0. Then the SDE has a unique strong solution.
8.2
Markov Property of Ito’s diffusion
Consider the SDE

dX_t = b(X_t) dt + σ(X_t) dB_t

with Lipschitz continuous b, σ. It follows from the existence and uniqueness theorem that there exists a unique strong solution X_t^{0,x} with X_0 = x, and

X_t^{0,x} = Φ(t, x, {B_s, s ≤ t})   (8.6)

is the unique pathwise solution that depends on the initial condition and the Brownian motion
path. For h > 0,

X_{t+h}^{0,x} = X_t^{0,x} + ∫_t^{t+h} b(X_s^{0,x}) ds + ∫_t^{t+h} σ(X_s^{0,x}) dB_s.

Let X̃_v = X_{t+v}^{0,x}, B̃_v = B_{t+v} − B_t and s = t + v. Then

X̃_h = X_t^{0,x} + ∫_0^h b(X̃_v) dv + ∫_0^h σ(X̃_v) dB̃_v,

i.e.,

X̃_h = Φ(h, X_t^{0,x}, {B̃_v, v ≤ h}).   (8.7)

Define

E^x[g(X_t)] = E[g(X_t^{0,x})]
for bounded continuous function g.
Theorem (Markov property) If {X_t, t ≥ 0} is an Ito diffusion process, then for h ≥ 0

E^x[g(X_{t+h})|F_t] = E^{X_t(ω)}[g(X_h)].

Proof: It follows from (8.6)-(8.7) that

E[g(X_{t+h}^{0,x})|F_t] = E[g(X_h^{0,y})]|_{y = X_t^{0,x}}.
Define a linear operator H_t by

(H_t g)(x) = E^x[g(X_t)].

Then, we have

Corollary For t, s ≥ 0, H_{t+s} = H_t H_s.

Proof: It follows from the Markov property that

H_{t+s} g(x) = E^x[g(X_{t+s})] = E^x[E[g(X_{t+s})|F_t]] = E^x[E^{X_t}[g(X_s)]] = E^x[(H_s g)(X_t)] = (H_t(H_s g))(x).
Theorem H_t is a strongly continuous semigroup on the space X of bounded uniformly continuous functions on R^n.

Proof: From Ito's lemma,

E[|X_t^{0,x} − X_t^{0,y}|²] ≤ M|x − y|².

Thus, for all y_n → x there exists a subsequence z_n of y_n such that X_t^{0,z_n} → X_t^{0,x} almost surely, and

lim_{n→∞} E[g(X_t^{0,z_n})] = E[g(X_t^{0,x})].

Moreover,

|X_t^{0,x} − x| → 0 almost surely as t → 0⁺,

and thus H_t g → g in X as t → 0⁺.
Definition (Generator) Define the generator A of the Ito diffusion process X_t by

(Aφ)(x) = lim_{t→0⁺} (E^x[φ(X_t)] − φ(x))/t = lim_{t→0⁺} (H_t φ − φ)/t,

with domain

dom(A) = {φ ∈ X : lim_{t→0⁺} (E^x[φ(X_t)] − φ(x))/t exists}.

It will be shown that:

Theorem (Generator)

Aφ = Σ_j b_j(x)(∂φ/∂x_j) + (1/2) Σ_{i,j} (σ(x)σ(x)ᵗ)_{i,j}(∂²φ/∂x_i∂x_j)

for φ ∈ C_0²(R^n).
For the time-dependent SDE (??) we consider the augmented system

dX_0 = dt,  dX_t = b(X_0, X_t) dt + σ(X_0, X_t) dB_t.

Thus, the generator is

Aφ(t, x) = (∂/∂t)φ(t, x) + A_x φ(t, x)

for φ ∈ C_0^{1,2}(R × R^n).
8.3
Dynkin's formula
In this section we discuss Dynkin's formula.
Theorem (Dynkyin’s Formula) Let Xt be an Ito’s process
dXt = u dt + v dBt .
Assume a stopping τ satisfies E[τ ] < ∞. Then, for f ∈ C02 (Rn )
Z τ
∂
E x [f (τ, Xτ )] = f (x) + E x [ ( + L(s, ω))f (s, Xs ) ds]
∂t
0
where
L(t, ω)f =
X
j
uj
∂f
1 X ∂2f
(t, Xt ) +
(t, Xt ).
∂xj
2 i,j ∂xi ∂xj
Proof: By the Ito’s formula
Z
f (Xτ ) = f (x) +
τ
(
0
∂
+ L(t, ω))f (s, Xs ) ds +
∂t
79
Z
τ
∇f (s, Xs ) · v dBs .
0
Since
Ex[
τ ∧k
Z
∇f (s, Xs ) · v dBs ] = 0.
0
and
E x [|
Z
τ ∧k
Z
τ
∇f (s, Xs ) · v dBs |2 ] = M 2 E[τ − τ ∧ k]| → 0
∇f (s, Xs ) · v dBs −
0
0
for some M as k → ∞, we have
Z
Ex[
τ
∇f (s, Xs ) · v dBs ] = 0.
0
Let Ω be a bounded open set in R^n and τ = τ_Ω the exit time from Ω. Then Dynkin's formula holds for f ∈ C²(R^n). Assume that there exists a function f such that

Af = 1 in Ω,  f|_{∂Ω} = 1.

Then, for τ = τ_Ω and x ∈ Ω we have

E^x[f(X_τ)] = f(x) + E^x[∫_0^τ Af(X_s) ds] = f(x) + E^x[τ],

and thus

E^x[τ] = 1 − f(x).
Example (Brownian Motion) Let X_t = x + B_t in R^n and f(x) = |x|². Define a stopping time τ by

τ = inf{t ≥ 0 : |X_t| = R},  |x| < R.

By Dynkin's formula

E^x[f(X_{τ∧k})] = f(x) + E^x[∫_0^{τ∧k} (1/2)Δf(X_s) ds] = |x|² + n E^x[τ∧k].

Thus, letting k → ∞,

E^x[τ] = (R² − |x|²)/n.
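A direct simulation illustrates this exit-time formula. The sketch below (illustrative step size and sample count; the time discretization adds a small positive bias of order √Δt at the boundary) estimates the mean exit time of planar Brownian motion started at the origin from the disk of radius R = 1, for which the formula gives (R² − 0)/2 = 1/2.

```python
import numpy as np

# Mean exit time of 2-d Brownian motion from the disk |x| < R, started at 0;
# Dynkin's formula gives E[tau] = (R^2 - |x|^2)/n = 1/2 here.
rng = np.random.default_rng(6)
R, dt, M = 1.0, 5e-4, 4_000
pos = np.zeros((M, 2))
tau = np.zeros(M)
alive = np.ones(M, dtype=bool)
t = 0.0
while alive.any():
    t += dt
    pos[alive] += np.sqrt(dt) * rng.normal(size=(alive.sum(), 2))
    exited = alive & (np.sum(pos ** 2, axis=1) >= R ** 2)
    tau[exited] = t
    alive &= ~exited
print(tau.mean())
```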
For n = 2 let f(x) = −log|x|, |x| ≥ R. Since Δf = 0,

E^x[f(X_{τ_k})] = f(x)

for τ_k = inf{t ≥ 0 : |X_t| = R or |X_t| = 2^k R}. For p_k = P^x(|X_{τ_k}| = R) and q_k = P^x(|X_{τ_k}| = 2^k R),

−log R · p_k − (log R + k log 2)q_k = −log|x|.

Thus, q_k → 0 as k → ∞ and P^x(τ < ∞) = 1. This implies the Brownian motion is recurrent. For n > 2 let f(x) = |x|^{2−n}. Since

R^{2−n} p_k + (2^k R)^{2−n} q_k = |x|^{2−n},

lim_{k→∞} p_k = P^x(τ < ∞) = (|x|/R)^{2−n}

and the Brownian motion is transient.
8.4
Martingale Problem
Given a second order elliptic linear operator L : C²(R^n) → C(R^n),

Lu = Tr(A(x)D²u(x)) + b(x) · ∇u(x) + c(x)u(x),

the martingale problem for L consists in finding, for each x ∈ R^n, a probability measure P^x over the space of all continuous functions X_t(ω) : [0, ∞) → R^n such that X_0 = x and for all f ∈ C²(R^n)

f(X_t) − ∫_0^t (Lf)(X_s) ds
is an Ft local martingale under P x . If for any x ∈ Rn there exists a unique P x satisfying the
above conditions we say that the martingale problem for L is well-posed.
Note that existence for the martingale problem implies existence of solutions to the Cauchy problem for the operator L with initial data in C(R^n); indeed, if f ∈ C²(R^n) then we can define

u(x, t) = E_{P^x}[f(X_t)].

Then it can be shown that u(x, t) is a classical solution to the Cauchy problem

∂u/∂t = Lu in R^n,  u(x, 0) = f(x).
Thus at first glance it would seem that well-posedness of the martingale problem for L is a stronger fact than well-posedness of the corresponding Cauchy problem, but often one can turn this around and show well-posedness of the martingale problem after understanding the Cauchy problem for L well enough. If A(x) = σσ(x)ᵗ, c(x) ≤ 0 and σ, b and c are Lipschitz continuous, then the martingale problem is well-posed for L. The martingale problem for

A = Σ_j b_j(x)(∂/∂x_j) + (1/2) Σ_{i,j} (σσᵗ)_{i,j}(∂²/∂x_i∂x_j)

has a solution if and only if the stochastic differential equation

dX_t = b(X_t) dt + σ(X_t) dB_t

has a weak solution. Moreover, this weak solution is a Markov process if and only if the martingale problem for A is well-posed. The case where the coefficients are merely continuous is much harder, and it is an important result of Stroock and Varadhan that in this case the martingale problem is still well-posed.
Theorem Let Xt is an Ito’s diffusion given by
dXt = b(Xt ) dt + σ(Xt ) dBt
and Yt is an Ito’s process given by
dYt = u(t, ω) dt + v(t, ω) dBt
Then Xt ∼ Yt if and only if
E x [u(t)|Yt ] = b(Yt ) and v(t, ω)v t (t, ω) = σ(Yt )σ(Yt )t
for a.a. (t, ω), where Yt is σ-algebra generated by {Ys , s ≤ t}.
Proof: Assume A is the generator ofXt
A=
X
bj (x)
j
∂
1X
∂2
+
(σ(x)σ(x)t )ij
∂xj
2 i,j
∂xi ∂xj
and define
Lf (t, ω) =
X
j
uj (t, ω)
1X
∂2f
∂f
(Yt ) +
(v(t, ω)v(t, ω)t )ij
(Yt )
∂xj
2 i,j
∂xi ∂xj
It follows from Ito's rule that

E[f(Y_t)|Y_s] = f(Y_s) + E[∫_s^t Lf(r, ω) dr|Y_s] + E[∫_s^t ∇fᵗ v dB_r|Y_s]
= f(Y_s) + E[∫_s^t E[Lf(r, ω)|Y_r] dr|Y_s]
= f(Y_s) + E[∫_s^t Af(Y_r) dr|Y_s].

Thus, if we define

M_t = f(Y_t) − ∫_0^t Af(Y_r) dr,

then

E[M_t|Y_s] = f(Y_s) + E[∫_s^t Af(Y_r) dr|Y_s] − E[∫_s^t Af(Y_r) dr|Y_s] − ∫_0^s Af(Y_r) dr = M_s.

Hence M_t is a martingale with respect to the σ-algebras Y_t. Since the martingale problem has a unique solution, X_t and Y_t have the same law.

A specific case of the Theorem is:
Corollary (Brownian Motion) Let Y_t be an Ito process defined by

dY_t = v(t, ω) dB_t.

Then Y_t ∼ B_t if and only if

v(t, ω)vᵗ(t, ω) = I_n for a.a. (t, ω).

Example (Bessel process) Let R_t(ω) = |B_t(ω)| in R^n, n ≥ 2. Then it satisfies

dR_t = (B_t · dB_t)/R_t + ((n − 1)/(2R_t)) dt.   (8.8)

From the Corollary,

B̂_t = ∫_0^t (B_s · dB_s)/R_s

is a Brownian motion. Thus, (8.8) is written as

dR_t = ((n − 1)/(2R_t)) dt + dB̂_t,

which has the generator

Af(x) = (1/2)f''(x) + ((n − 1)/(2x)) f'(x).

8.5
Girsanov Transform
In this section we discuss the Girsanov transform. In probability theory, the Girsanov theorem describes how the dynamics of stochastic processes change when the original measure is changed to an equivalent probability measure, i.e., one can shift the drift term of an Ito diffusion by a measure transform. The Girsanov theorem uses the measure change

(dν/dµ)(ω) = f(ω) ∈ L¹(Ω)
on a measure space (Ω, F), and we have:

Lemma (Measure Change) Assume

E_ν[|X|] = ∫_Ω |X(ω)|f(ω) dµ = E_µ[f|X|] < ∞.

Then we have

E_ν[X|H] E_µ[f|H] = E_µ[fX|H].

Proof: The lemma follows from the identities: for H ∈ H,

∫_H E_ν[X|H] dν = ∫_H X dν = ∫_H Xf dµ = ∫_H E_µ[fX|H] dµ

and

∫_H E_ν[X|H] dν = ∫_H E_ν[X|H]f dµ = ∫_H E_µ[E_ν[X|H]f|H] dµ = ∫_H E_ν[X|H] E_µ[f|H] dµ.
The standard form of the Girsanov theorem is:

Theorem I (Girsanov) Let Y_t be an Ito process defined by

dY_t = b(t, ω) dt + dB_t,

and let M_t be the exponential martingale

M_t = e^{−∫_0^t b(s,ω) dB_s − (1/2)∫_0^t |b(s,ω)|² ds}.

Define the measure Q on (Ω, F_T) by

dQ = M_t(ω) dP on F_t.

Then Y_t is an (F_t, Q)-Brownian motion.
Proof: Since

dM_t = −bM_t dB_t,

d(M_tY_t) = M_t(b dt + dB_t) − Y_t bM_t dB_t − bM_t dt = M_t(1 − bY_t) dB_t,   (8.9)

so M_tY_t is a martingale. Since

d(M_tY_t²) = Y_t² dM_t + 2Y_tM_t dY_t + M_t dt − 2bM_tY_t dt
= (−bM_t dB_t)Y_t² + 2Y_tM_t(b dt + dB_t) + M_t dt − 2bM_tY_t dt = (−bM_tY_t² + 2M_tY_t) dB_t + M_t dt,   (8.10)

we have

E[M_tY_t²|F_s] = M_sY_s² + (t − s)M_s.

Hence Y_t and Y_t² − t are (F_t, Q)-martingales, since

E_Q[Y_t|F_s] = E[M_tY_t|F_s]/E[M_t|F_s] = M_sY_s/M_s = Y_s

and

E_Q[Y_t² − t|F_s] = E[M_t(Y_t² − t)|F_s]/E[M_t|F_s] = (M_sY_s² − sM_s)/M_s = Y_s² − s.

By the Lévy characterization of Brownian motion, Y_t is an (F_t, Q)-Brownian motion.
Remark M_T dP = M_t dP on F_t, t ≤ T, i.e.,

∫ f M_T dP = E[M_T f] = E[E[M_T f|F_t]] = E[f E[M_T|F_t]] = E[fM_t] = ∫ f M_t dP

for all F_t-measurable f.
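Theorem I can be checked by a weighted Monte Carlo computation. In the sketch below (constant drift b and arbitrary parameters, purely illustrative) the sample mean of M_T h(Y_T) estimates E_Q[h(Y_T)]; the weights remove the drift, so the first two moments of Y_T under Q match those of a Brownian motion.

```python
import numpy as np

# Girsanov reweighting with constant drift b: under dQ = M_T dP the process
# Y_t = b t + B_t is a Q-Brownian motion, so E_Q[Y_T] = 0, E_Q[Y_T^2] = T.
rng = np.random.default_rng(7)
b, T, M = 0.8, 1.0, 1_000_000
B_T = rng.normal(0.0, np.sqrt(T), M)
Y_T = b * T + B_T
M_T = np.exp(-b * B_T - 0.5 * b ** 2 * T)   # density dQ/dP on F_T
m1 = np.mean(M_T * Y_T)        # estimates E_Q[Y_T],   should be ~ 0
m2 = np.mean(M_T * Y_T ** 2)   # estimates E_Q[Y_T^2], should be ~ T
print(m1, m2)
```

This reweighting identity E_Q[h(Y_T)] = E_P[M_T h(Y_T)] is also the basis of importance sampling in simulation.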
The drift shift given by the Girsanov theorem is:

Theorem II (Girsanov) Let X_t, Y_t be Ito processes defined by

dX_t = α(t, ω) dt + θ(t, ω) dB_t   (8.11)

and

dY_t = β(t, ω) dt + θ(t, ω) dB_t.   (8.12)

Assume that there exists a u(t, ω) such that

θ(t, ω)u(t, ω) = β(t, ω) − α(t, ω),

and assume E[e^{(1/2)∫_0^T |u(s,ω)|² ds}] < ∞ (Novikov condition). Let M_t be the exponential martingale

M_t = e^{−∫_0^t u(s,ω) dB_s − (1/2)∫_0^t |u(s,ω)|² ds}.

Define the measure Q on (Ω, F_T) by

dQ = M_t(ω) dP on F_t.

Then B̂_t = ∫_0^t u(s, ω) ds + B_t is an (F_t, Q)-Brownian motion and

dY_t = α(t, ω) dt + θ(t, ω) dB̂_t on (F_T, Q),

i.e., (Y_t, B̂_t) is a weak solution to (8.11) on (F_t, Q).

Proof: The theorem follows from

dY_t = β(t, ω) dt + θ(t, ω)(dB̂_t − u(t, ω) dt) = (β(t, ω) − θ(t, ω)u(t, ω)) dt + θ(t, ω) dB̂_t = α(t, ω) dt + θ(t, ω) dB̂_t.
The weak solution constructed via the Girsanov theorem is:

Theorem III (Girsanov) Let Y_t be an Ito diffusion process defined by

dY_t = b(t, Y_t) dt + σ(t, Y_t) dB_t.

Assume that there exists a u(t, ω) such that

σ(t, Y_t)u(t, ω) = b(t, Y_t) − a(t, Y_t),

and assume E[e^{(1/2)∫_0^T |u(s,ω)|² ds}] < ∞ (Novikov condition). Let M_t be the exponential martingale

M_t = e^{−∫_0^t u(s,ω) dB_s − (1/2)∫_0^t |u(s,ω)|² ds}.

Define the measure Q on (Ω, F_T) by

dQ = M_t(ω) dP on F_t.

Then

B̂_t = ∫_0^t u(s, ω) ds + B_t

is an (F_t, Q)-Brownian motion, and (Y_t, B̂_t) satisfies

dY_t = a(t, Y_t) dt + σ(t, Y_t) dB̂_t on (F_T, Q).
Example (Weak Solution) (1) One can construct a weak solution to

dX_t = a(t, X_t) dt + dB_t,   (8.13)

where a(t, x) is bounded continuous. Let

Y_t = x + B_t

(b = 0, σ = I). Let u = −a(t, Y_t) and

M_t = e^{∫_0^t a(s,Y_s) dB_s − (1/2)∫_0^t |a(s,Y_s)|² ds}

and

B̂_t = ∫_0^t u(s, Y_s) ds + B_t.

Then (Y_t, B̂_t) is a weak solution to

dX_t = a(t, X_t) dt + dB̂_t.

Also, for any weak solution X_t to (8.13) we have

E[f(X_{t_1}^x, . . ., X_{t_k}^x)] = E_Q[f(Y_{t_1}^x, . . ., Y_{t_k}^x)] = E_P[M_t f(B_{t_1}^x, . . ., B_{t_k}^x)]

for all bounded continuous functions f and 0 ≤ t_1 ≤ · · · ≤ t_k ≤ t.
(2) Let Y_t be the geometric Brownian motion:

Y_t = e^{∫_0^t σ(s) dB_s − (1/2)∫_0^t |σ(s)|² ds},

i.e.,

dY_t = σ(t)Y_t dB_t  (b = 0).

Let u(t, Y_t) = −a(t, Y_t)/(σ(t)Y_t) and define

M_t = e^{−∫_0^t u(s,Y_s) dB_s − (1/2)∫_0^t |u(s,Y_s)|² ds}

and

B̂_t = ∫_0^t u(s, Y_s) ds + B_t.

Then (Y_t, B̂_t) is a weak solution to

dX_t = a(t, X_t) dt + σ(t)X_t dB̂_t

on (F_t, Q).
(3) Let Y_t be the martingale

Y_t = x + ∫_0^t σ(s) dB_s,

i.e.,

dY_t = σ(t) dB_t  (b = 0).

Let u(t, Y_t) = −a(t, Y_t)/σ(t) and define

M_t = e^{−∫_0^t u(s,Y_s) dB_s − (1/2)∫_0^t |u(s,Y_s)|² ds}

and

B̂_t = ∫_0^t u(s, Y_s) ds + B_t.

Then (Y_t, B̂_t) is a weak solution to

dX_t = a(t, X_t) dt + σ(t) dB̂_t

on (F_t, Q).
8.6
Exercises
Problem 1 Check (8.9)–(8.10).
Problem 2 Consider the SDE
dXt = f (t, Xt ) dt + σ(t)Xt dBt
(1) Define
F_t = e^{−∫_0^t σ(s) dB_s + (1/2)∫_0^t |σ(s)|² ds}.
Show that d(Ft Xt ) = Ft f (t, Xt ) dt.
(2) Let Y_t(ω) be a solution to

(d/dt)Y_t(ω) = F_t(ω) f(t, F_t^{−1}(ω)Y_t(ω)).

Show that X_t = F_t^{−1}(ω)Y_t(ω) defines a solution to the SDE.
(3) If f(t, x) = r(t)x, then

X_t = X_0 e^{∫_0^t σ(s) dB_s + ∫_0^t (r(s) − (1/2)|σ(s)|²) ds}.

Derive a solution to

dX_t = X_t^γ dt + σX_t dB_t.
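For part (3), the closed-form solution can be checked by sampling. This sketch (illustrative, with constant r and σ; not from the text) uses X_t = X_0 exp(σB_t + (r − σ²/2)t), whose first moment is E[X_t] = X_0 e^{rt}.

```python
import numpy as np

# Sample the closed-form solution of dX = r X dt + sigma X dB and verify
# the first moment E[X_t] = X_0 exp(r t).
rng = np.random.default_rng(8)
r, sigma, x0, t, M = 0.1, 0.3, 1.0, 2.0, 1_000_000
B_t = rng.normal(0.0, np.sqrt(t), M)
X_t = x0 * np.exp(sigma * B_t + (r - 0.5 * sigma ** 2) * t)
print(X_t.mean(), x0 * np.exp(r * t))
```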
9
Stochastic Integral with respect to Martingale Process
Let (Ω, F, P) be a probability space and F_t a right continuous increasing family of sub σ-algebras (i.e., F_t = ∩_{s>t} F_s). Let M_t be a right continuous square integrable martingale. A process X_t is predictable if it is measurable with respect to the σ-algebra F_{t−} for each time t. Every process that is left continuous is a predictable process. For every square integrable F_t-adapted process Φ there exists a predictable Φ̃ ∈ L² such that Φ̃ is a modification of Φ. For example, we may take

Φ̃_t(ω) = lim sup_{h→0⁺} (1/h) ∫_{t−h}^t Φ_s(ω) ds.
One can define the stochastic integral
t
Z
Xt =
Hs dMs ,
(9.1)
0
where {Mt , t ≥ 0} is a square integrable martingale and {Ht , t ≥ 0} is a predictable process.
Definition Let L_0 be the set of bounded adapted processes such that

H_t = H_j on [t_j, t_{j+1}), with H_j F_{t_j}-measurable,

for some partition P = {0 = t_0 < t_1 < · · · } of the interval [0, T]. For H_t ∈ L_0 define

X_t = I(H_t) = Σ_{j=0}^{k−1} H_j (M_{t_{j+1}} − M_{t_j}) + H_k (M_t − M_{t_k}),  t ∈ [t_k, t_{k+1}).   (9.2)
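The elementary integral (the martingale transform) can be computed directly for a discrete-time martingale; the sketch below uses a scaled symmetric random walk for M and the F_{t_j}-measurable integrand H_j = sign(M_{t_j}) (both choices illustrative, not from the text).

```python
import numpy as np

# The elementary stochastic integral for a simple predictable process:
# X_T = sum_j H_j (M_{t_{j+1}} - M_{t_j}), H_j measurable at the left endpoint.
# M is a scaled random walk and H_j = sign(M_{t_j}) (illustrative choices).
rng = np.random.default_rng(2)
n_paths, n_steps = 50000, 100
dM = rng.choice([-1.0, 1.0], (n_paths, n_steps)) / np.sqrt(n_steps)
M = np.cumsum(dM, axis=1)
M = np.hstack([np.zeros((n_paths, 1)), M])   # M at t_0, ..., t_n

H = np.sign(M[:, :-1])                       # H_j depends only on M_{t_j}
H[H == 0.0] = 1.0
X_T = (H * dM).sum(axis=1)                   # martingale transform

print(X_T.mean(), (X_T**2).mean())           # mean ~ 0, E[X_T^2] ~ sum E[dM^2] = 1
```

The sample mean is near zero (X is a martingale started at 0) and the second moment is near Σ_j E[H_j² (ΔM_j)²] = 1, the discrete analogue of the isometry in Proposition 1 below.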
As in the discrete time case, we define the quadratic variation ⟨M⟩_t of {M_t, t ≥ 0} by

E[⟨M⟩_t − ⟨M⟩_s | F_s] = E[(M_t − M_s)² | F_s];

then

|M_t|² − ⟨M⟩_t

is a martingale and ⟨M⟩_t is a natural increasing predictable process. We can complete the space of simple predictable processes with respect to the norm

‖H‖² = E[ ∫_0^T |H_t|² d⟨M⟩_t ],

and the completion is called L²(⟨M⟩). Note that I is a linear operator on the subspace L_0 of simple predictable processes of L²(⟨M⟩), and it follows from the theorem for the martingale transform that

|X_t|² − ∫_0^t |H_s|² d⟨M⟩_s

is a martingale and

⟨X⟩_t = ∫_0^t |H_s|² d⟨M⟩_s.
Proposition 1 The stochastic integral ∫_0^t f_s dM_s for f ∈ L_0 is a square integrable martingale and satisfies

⟨∫_0^· f_s dM_s⟩_t = ∫_0^t f_s² d⟨M⟩_s,

E[ |∫_0^t f_s dM_s|² ] = E[ ∫_0^t f_s² d⟨M⟩_s ] = ‖f‖².
Proof: For t > s we may assume, without loss of generality, that both s and t belong to the partition P. Then

E[ |∫_s^t f_σ dM_σ|² | F_s ] = Σ_i E[ E[f_{t_i}² (M_{t_{i+1}} − M_{t_i})² | F_{t_i}] | F_s ]
 + 2 Σ_{k>ℓ} E[ E[f_{t_k} f_{t_ℓ} (M_{t_{k+1}} − M_{t_k})(M_{t_{ℓ+1}} − M_{t_ℓ}) | F_{t_ℓ}] | F_s ].

Here

E[f_{t_i}² (M_{t_{i+1}} − M_{t_i})² | F_{t_i}] = f_{t_i}² E[(M_{t_{i+1}} − M_{t_i})² | F_{t_i}]
 = f_{t_i}² E[M_{t_{i+1}}² − M_{t_i}² | F_{t_i}] = f_{t_i}² E[⟨M⟩_{t_{i+1}} − ⟨M⟩_{t_i} | F_{t_i}]

and

E[f_{t_k} f_{t_ℓ} (M_{t_{k+1}} − M_{t_k})(M_{t_{ℓ+1}} − M_{t_ℓ}) | F_{t_ℓ}]
 = E[f_{t_k} f_{t_ℓ} E[M_{t_{k+1}} − M_{t_k} | F_{t_k}] (M_{t_{ℓ+1}} − M_{t_ℓ}) | F_{t_ℓ}] = 0.

Thus,

E[ |∫_s^t f_σ dM_σ|² | F_s ] = Σ_i E[ f_{t_i}² (⟨M⟩_{t_{i+1}} − ⟨M⟩_{t_i}) | F_s ] = E[ ∫_s^t |f_σ|² d⟨M⟩_σ | F_s ],

which implies the claim. □

Definition For H ∈ L²(⟨M⟩) define

∫_0^t H_s dM_s = lim_n X_t^n = lim_n ∫_0^t H_s^n dM_s,

where H^n ∈ L_0 and ‖H^n − H‖ → 0 as n → ∞. From Proposition 1,

E[|X_T^n − X_T^m|²] = ‖H^n − H^m‖²,

and by the martingale inequality

E[ sup_{0≤s≤T} |X_s^n − X_s^m|² ] ≤ 4 E[|X_T^n − X_T^m|²].

Since L_0 is dense in L²(⟨M⟩), there exists a unique limit X_t of X_t^n in L²(⟨M⟩), and X_t^n, 0 ≤ t ≤ T, has a subsequence that converges uniformly a.s. to X_t (pathwise). Thus the limit X_t, 0 ≤ t ≤ T, defines the stochastic integral ∫_0^t H_s dM_s and is right continuous. That is, I is a bounded linear operator on L_0, and since L_0 is dense in L²(⟨M⟩), the stochastic integral (9.1) is the extension of (9.2) to L²(⟨M⟩). Moreover

|X_t|² − ∫_0^t |H_s|² d⟨M⟩_s

remains a martingale after the extension.
Remark (1) If M_t is continuous, then it is not necessary to assume that F_t is right continuous: let F_{t+} = ∩_{s>t} F_s. If M_t is an F_t continuous martingale, then M_t is also an F_{t+} martingale. The corresponding natural increasing process ⟨M⟩_t is F_{t+} adapted, but since ⟨M⟩_t is continuous, ⟨M⟩_t is F_t adapted. Hence M_t² − ⟨M⟩_t is an F_t continuous martingale.
(2) If M_t is continuous, then it is not necessary to assume that Φ_t is predictable, and ∫ Φ_s dM_s is a continuous F_t martingale for every square integrable F_t adapted process Φ_t.
(3) L²(⟨M⟩) is a Hilbert space with inner product

(f, g) = E[ ∫_0^T f_t g_t d⟨M⟩_t ].

If the original martingale M_t is almost surely continuous, then so is X_t. This is obvious if H_t is simple, by (9.2), and follows for general H_t from Doob's martingale inequality. That is,

P( sup_{0≤s≤T} |X_s^m − X_s^n| ≥ ε ) ≤ (1/ε²) ‖H^m − H^n‖².

Choose a sequence n_k such that

P( sup_{0≤s≤T} |X_s^{n_{k+1}} − X_s^{n_k}| ≥ 2^{−k} ) ≤ 2^{−k},

and thus

Σ_{k=1}^∞ P( sup_{0≤s≤T} |X_s^{n_{k+1}} − X_s^{n_k}| ≥ 2^{−k} ) < ∞.

By the Borel-Cantelli lemma,

P( sup_{0≤s≤T} |X_s^{n_{k+1}} − X_s^{n_k}| ≥ 2^{−k} for infinitely many k ) = 0.

So, for almost every ω, there exists k_1(ω) such that for all k ≥ k_1(ω)

sup_{0≤s≤T} |X_s^{n_{k+1}} − X_s^{n_k}| ≤ 2^{−k}.

Hence X_t(ω) = lim_{k→∞} X_t^{n_k}(ω) is continuous.
Example Let M_t = N_t − t for a Poisson process N_t. Then M_t and |M_t|² − t are martingales, and

X_t = ∫_0^t N_{s−} dM_s = Σ_{τ_j ≤ t} N(τ_j−) − ∫_0^t N_s ds.
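This jump-diffusion example is easy to simulate; the sketch below (rate 1 and horizon T = 5 are illustrative choices) computes X_T = Σ_{τ_j≤T} N(τ_j−) − ∫_0^T N_s ds over many paths and checks that its sample mean is near zero, as it should be for a martingale started at 0.

```python
import numpy as np

# Stochastic integral of N_{s-} against the compensated Poisson martingale
# M_t = N_t - t (rate 1, horizon T = 5; both illustrative):
# X_T = sum_{tau_j <= T} N(tau_j-) - int_0^T N_s ds, which has mean zero.
rng = np.random.default_rng(3)
T, n_paths = 5.0, 20000
vals = np.empty(n_paths)
for p in range(n_paths):
    # jump times of a rate-1 Poisson process up to T
    taus = np.cumsum(rng.exponential(1.0, 25))
    taus = taus[taus <= T]
    k = len(taus)
    jump_part = np.arange(k).sum()            # N(tau_j-) = j - 1 for j = 1..k
    # int_0^T N_s ds: N_s = j on [tau_j, tau_{j+1})
    grid = np.concatenate([[0.0], taus, [T]])
    integral = (np.arange(k + 1) * np.diff(grid)).sum()
    vals[p] = jump_part - integral
print(vals.mean())
```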
9.1 Generalized Ito's differential rule

Let X_t = X_0 + M_t + A_t, where M_t ∈ M²_c is a continuous (locally) square integrable martingale and A_t is a continuous process of bounded variation. Then we have Ito's differential rule:
Theorem For f ∈ C^{1,2}([0, T] × R^d)

f(t, X_t) − f(0, X_0) = ∫_0^t f_t(s, X_s) ds + Σ_{i=1}^d ∫_0^t f_{x_i}(s, X_s) dX_s^i + ½ Σ_{i,j=1}^d ∫_0^t f_{x_i x_j}(s, X_s) d⟨M^i, M^j⟩_s.

Or, equivalently (increment form),

df(t, X_t) = f_t dt + Σ_{i=1}^d f_{x_i}(t, X_t) dX_t^i + ½ Σ_{i,j=1}^d f_{x_i x_j}(t, X_t) d⟨M^i, M^j⟩_t.   (9.3)
Proof: For a positive integer n we define a stopping time τ_n by

τ_n = inf{t > 0 : |X_0 + M_t| > n or |A_t| > n}.

Then τ_n → ∞ as n → ∞ a.s. Thus it suffices to prove the formula for X_{t∧τ_n}, and so without loss of generality we may assume that |X_0 + M_t|, |A_t| are bounded and f, f_{x_i}, f_{x_i x_j} are bounded and uniformly continuous.
Note that by the mean value theorem

f(X_t) − f(X_0) = Σ_{k=0}^n Σ_{i=1}^d f_{x_i}(X_{t_k})(X^i_{t_{k+1}} − X^i_{t_k}) + ½ Σ_{k=0}^n Σ_{i,j=1}^d f_{x_i x_j}(ξ_k)(X^i_{t_{k+1}} − X^i_{t_k})(X^j_{t_{k+1}} − X^j_{t_k}).

By the definition of the stochastic integral, the first term on the right hand side converges to

Σ_{i=1}^d ∫_0^t ( f_{x_i}(X_s) dM_s^i + f_{x_i}(X_s) dA_s^i ).

The second term is a linear combination of sums of the form

Σ_k g(ξ_k)(M_{t_{k+1}} − M_{t_k})(N_{t_{k+1}} − N_{t_k}),
Σ_k g(ξ_k)(M_{t_{k+1}} − M_{t_k})(A_{t_{k+1}} − A_{t_k}),
Σ_k g(ξ_k)(A_{t_{k+1}} − A_{t_k})(C_{t_{k+1}} − C_{t_k}),

where M_t, N_t ∈ M²_c and A_t, C_t are continuous processes of bounded variation. Here

| Σ_k g(ξ_k)(M_{t_{k+1}} − M_{t_k})(A_{t_{k+1}} − A_{t_k}) | ≤ ‖g‖ sup_k |M_{t_{k+1}} − M_{t_k}| · Var(A)_t → 0

as |P| → 0, and similarly for the third sum. In the following theorem it will be shown that the first sum converges to ∫_0^t g(X_s) d⟨M, N⟩_s.
Lemma If |M_s| ≤ C for some C on [0, t], then E[|V_n|²] ≤ 12 C⁴, where

V_n = Σ_{k=0}^n (M_{t_{k+1}} − M_{t_k})².

Proof: It is easy to see that

|V_n|² = Σ_{k=0}^n (M_{t_{k+1}} − M_{t_k})⁴ + 2 Σ_{k=1}^n (V_n − V_{k−1})(M_{t_{k+1}} − M_{t_k})²

and

E[V_n − V_{k−1} | F_{t_k}] = E[ Σ_{i=k}^n (M_{t_{i+1}} − M_{t_i})² | F_{t_k} ] = E[(M_t − M_{t_k})² | F_{t_k}] ≤ 4C².

Thus,

E[ Σ_{k=1}^n (V_n − V_{k−1})(M_{t_{k+1}} − M_{t_k})² ] ≤ 4C² E[V_n] = 4C² E[M_t²] ≤ 4C⁴.

Also,

E[ Σ_{k=0}^n (M_{t_{k+1}} − M_{t_k})⁴ ] ≤ 4C² E[V_n] ≤ 4C⁴.

Combining the two bounds gives E[|V_n|²] ≤ 4C⁴ + 2 · 4C⁴ = 12C⁴. □
Theorem Let M_t and N_t be bounded continuous martingales. For a bounded uniformly continuous function g,

Σ_k g_k (M_{t_{k+1}} − M_{t_k})(N_{t_{k+1}} − N_{t_k}) → ∫_0^t g(X_s) d⟨M, N⟩_s in L¹(Ω),

where g_k = g(X_{t_k} + (1 − θ_k)(X_{t_{k+1}} − X_{t_k})) with θ_k ∈ [0, 1].
Proof: Let

I = Σ g(X_{t_k}) [ (M_{t_{k+1}} − M_{t_k})(N_{t_{k+1}} − N_{t_k}) − (⟨M, N⟩_{t_{k+1}} − ⟨M, N⟩_{t_k}) ],
J = Σ (g_k − g(X_{t_k}))(M_{t_{k+1}} − M_{t_k})(N_{t_{k+1}} − N_{t_k}),
K = Σ g(X_{t_k})(⟨M, N⟩_{t_{k+1}} − ⟨M, N⟩_{t_k}) − ∫_0^t g(X_s) d⟨M, N⟩_s.

We show that I, J, K → 0 as |P| → 0. Clearly E[|K|] → 0 as |P| → 0. Let

V_t = Σ_{t_{k+1}≤t} (M_{t_{k+1}} − M_{t_k})²,  W_t = Σ_{t_{k+1}≤t} (N_{t_{k+1}} − N_{t_k})².

Since

|J| ≤ sup_k |g_k − g(X_{t_k})| (V_t W_t)^{1/2},

we have from the Lemma

E[|J|] ≤ E[sup_k |g_k − g(X_{t_k})|²]^{1/2} E[V_t²]^{1/4} E[W_t²]^{1/4} ≤ √12 C² E[sup_k |g_k − g(X_{t_k})|²]^{1/2} → 0

as |P| → 0. For I let

I_i = Σ_{k=0}^{i−1} g(X_{t_k}) [ (M_{t_{k+1}} − M_{t_k})(N_{t_{k+1}} − N_{t_k}) − (⟨M, N⟩_{t_{k+1}} − ⟨M, N⟩_{t_k}) ].

Then (I_i, F_{t_i}) is a discrete-time martingale. Thus, by the same argument as in the proof of Proposition 1,

E[|I|²] = Σ_{k=0}^n E[ |g(X_{t_k})|² ( (M_{t_{k+1}} − M_{t_k})(N_{t_{k+1}} − N_{t_k}) − (⟨M, N⟩_{t_{k+1}} − ⟨M, N⟩_{t_k}) )² ],

and therefore

E[|I|²] ≤ 2‖g‖² Σ_{k=0}^n E[(M_{t_{k+1}} − M_{t_k})²(N_{t_{k+1}} − N_{t_k})²] + 2‖g‖² Σ_{k=0}^n E[(⟨M, N⟩_{t_{k+1}} − ⟨M, N⟩_{t_k})²].

Here

Σ_{k=0}^n E[(M_{t_{k+1}} − M_{t_k})²(N_{t_{k+1}} − N_{t_k})²] ≤ E[ sup_k (M_{t_{k+1}} − M_{t_k})² Σ_k (N_{t_{k+1}} − N_{t_k})² ]
 ≤ E[sup_k (M_{t_{k+1}} − M_{t_k})⁴]^{1/2} E[|W_t|²]^{1/2} → 0

as |P| → 0. Since ⟨M, N⟩_s, s ∈ [0, t], is bounded,

Σ_{k=0}^n E[(⟨M, N⟩_{t_{k+1}} − ⟨M, N⟩_{t_k})²] ≤ E[ sup_k |⟨M, N⟩_{t_{k+1}} − ⟨M, N⟩_{t_k}| · ‖⟨M, N⟩‖_t ] → 0
as |P| → 0. Thus E[|I|²] → 0. □

Theorem (Levy characterization of Brownian motion) Let X_t be a continuous F_t-adapted process. Then the following are equivalent:
(1) X_t is an F_t-Brownian motion.
(2) X_t is a square integrable martingale and ⟨X^i, X^j⟩_t = δ_{ij} t.
Proof: It suffices to prove that

E[e^{i(ξ, X_t − X_s)} | F_s] = e^{−½|ξ|²(t−s)}.

Applying Ito's formula to e^{i(ξ, X_t)},

e^{i(ξ, X_t)} − e^{i(ξ, X_s)} = ∫_s^t (iξ e^{i(ξ, X_σ)}, dX_σ) − ½ ∫_s^t |ξ|² e^{i(ξ, X_σ)} dσ.

Since X_t ∈ M²_c,

E[ ∫_s^t (iξ e^{i(ξ, X_σ)}, dX_σ) | F_s ] = 0.

Multiplying both sides by e^{−i(ξ, X_s)}, for A ∈ F_s,

E[e^{i(ξ, X_t − X_s)} χ_A] − P(A) = −½|ξ|² ∫_s^t E[e^{i(ξ, X_σ − X_s)} χ_A] dσ.

Thus, we obtain

E[e^{i(ξ, X_t − X_s)} χ_A] = P(A) e^{−½|ξ|²(t−s)}. □

9.2 Semimartingale
A stochastic process {X_t, t ≥ 0} is called a semimartingale if it can be decomposed as the sum of a local martingale and an adapted finite-variation process. Semimartingales are "good integrators", forming the largest class of processes with respect to which the Ito integral can be defined. The class of semimartingales is quite large (including, for example, all continuously differentiable processes, Brownian motion, and Poisson processes). Submartingales and supermartingales together represent a subset of the semimartingales. As in ordinary calculus, integration by parts is an important result in stochastic calculus. The integration by parts formula for the Ito integral differs from the standard result due to the inclusion of a quadratic covariation term. This term comes from the fact that Ito calculus deals with processes with nonzero quadratic variation, which only occurs for infinite variation processes (such as Brownian motion). If X and Y are semimartingales, then

X_t Y_t = X_0 Y_0 + ∫_0^t X_{s−} dY_s + ∫_0^t Y_{s−} dX_s + ⟨X, Y⟩_t,

where ⟨X, Y⟩ is the quadratic covariation process. The result is similar to the integration by parts theorem for the Riemann-Stieltjes integral but has an additional quadratic variation term.
Ito's lemma is the version of the chain rule or change of variables formula which applies to the Ito stochastic integral. It is one of the most powerful and frequently used theorems in stochastic calculus. For a continuous d-dimensional semimartingale X_t ∈ R^d and a twice continuously differentiable function f from R^d to R, it states that f(X_t) is a semimartingale and

df(X_t) = Σ_{i=1}^d f_i(X_t) dX_t^i + ½ Σ_{i,j=1}^d f_{i,j}(X_t) d⟨X^i, X^j⟩_t.

This differs from the chain rule used in standard calculus due to the term involving the quadratic covariation. The formula can be generalized to non-continuous semimartingales by adding a pure jump term to ensure that the jumps of the left and right hand sides agree.
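The role of the quadratic variation term can be seen numerically in the simplest case f(x) = x² with X = B a Brownian motion, where Ito's lemma gives B_t² = 2∫_0^t B_s dB_s + t. The sketch below checks this identity on a discretized path (step count is an illustrative choice).

```python
import numpy as np

# Numerical check of Ito's formula for f(x) = x^2 and X = B (Brownian motion):
# B_t^2 = 2 int_0^t B_s dB_s + t; the "+t" is the quadratic variation term.
rng = np.random.default_rng(4)
T, n = 1.0, 200000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.concatenate([[0.0], np.cumsum(dB)])

ito_integral = (B[:-1] * dB).sum()   # left-endpoint (Ito) Riemann sums
lhs = B[-1] ** 2
rhs = 2.0 * ito_integral + T
print(lhs, rhs, abs(lhs - rhs))
```

The residual lhs − rhs equals Σ (ΔB)² − T, which is small for fine partitions; this is exactly the convergence of the discrete quadratic variation established above.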
9.3 Exercises
Problem 1 Show (5.1)–(5.3).
Problem 2 Let {ξ_k, k ≥ 1} be a sequence of independent random variables with E[ξ_k] = 1. Show that X_n = Π_{k=1}^n ξ_k is a martingale with respect to F_n = σ(ξ_k, k ≤ n). Consider the case P(ξ_k = 0) = P(ξ_k = 2) = ½. Show that there is no integrable random variable ξ such that X_n = E[ξ | F_n].
Problem 3 Let {ξ_k} be a sequence of independent random variables with E[ξ_k] = 0 and Var(ξ_k) = σ_k². Define S_n = Σ_{k=1}^n ξ_k and F_n = σ(ξ_k, k ≤ n). Show the following generalization of Wald's identities for a stopping time τ: if E[Σ_{k=1}^τ |ξ_k|] < ∞ then E[S_τ] = 0, and if E[Σ_{k=1}^τ |ξ_k|²] < ∞ then E[S_τ²] = E[Σ_{k=1}^τ σ_k²].
Problem 4 Show (5.8) and (5.9).
Problem 5 Suppose {X_n} is a martingale satisfying E[|X_n|^p] < ∞ for some p > 1. Show

( E[(max_{0≤k≤n} |X_k|)^p] )^{1/p} ≤ (p/(p−1)) E[|X_n|^p]^{1/p}.

Hint: E[|ξ|^p] = p ∫_0^∞ t^{p−1} P(|ξ| > t) dt. Now use the maximal inequality for the submartingale |X_n|.
Problem 6 Show (5.10)–(5.11).
Problem 7 Show that X_t = e^{λB_t − λ²t/2} satisfies dX_t = λX_t dB_t. Find the generator of X_t.

10 Filtering Theory

10.1 Discrete time Filtering
We discuss the nonlinear filtering problem for the discrete-time signal system for an R^n-valued process x_k:

x_k = f(x_{k−1}) + w_k   (10.1)

and the observation process y_k ∈ R^p given by

y_k = h(x_k) + v_k,   (10.2)

where w_k and v_k are independent i.i.d. Gaussian random variables with covariances Q and R, respectively. We assume that the initial condition x_0 and w_k, v_k are independent random variables. The optimal nonlinear filtering problem is to find the conditional expectation E[x_k | Y_k] of the process x_k given the observation data Y_k = {y_j, 0 ≤ j ≤ k}. The conditional probability density function p_{k|k} of x_k given Y_k is updated by Bayes' formula:

p_{k|k−1}(x) = ∫_{R^n} (1/((2π)^n det Q)^{1/2}) e^{−½(x − f(t))^T Q^{−1}(x − f(t))} p_{k−1|k−1}(t) dt   (10.3)

and

p_{k|k}(x) = c e^{−½(y_k − h(x))^T R^{−1}(y_k − h(x))} p_{k|k−1}(x),   (10.4)

where p_{k|k−1} is the one-step prediction, i.e. the probability density function of x_k conditioned on Y_{k−1}, and c is a normalization constant. That is, the recursive filter (10.3)–(10.4) consists of the prediction step (10.3) and the correction step (10.4).
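In one dimension the recursion (10.3)–(10.4) can be carried out directly on a grid; the sketch below does so for an illustrative nonlinear model (the particular f, h, Q, R and simulated data are assumptions, not from the text).

```python
import numpy as np

# One-dimensional grid implementation of the recursion (10.3)-(10.4).
# f, h, Q, R and the simulated trajectory are illustrative choices.
rng = np.random.default_rng(5)
f = lambda x: 0.9 * x + 0.2 * np.sin(x)
h = lambda x: x
Q, R = 0.1, 0.2

x_grid = np.linspace(-6, 6, 601)
dx = x_grid[1] - x_grid[0]
gauss = lambda u, var: np.exp(-0.5 * u**2 / var) / np.sqrt(2 * np.pi * var)

p = gauss(x_grid, 1.0)                 # p_{0|0}: standard normal prior
x_true = rng.normal()
for k in range(20):
    x_true = f(x_true) + rng.normal(0, np.sqrt(Q))
    y = h(x_true) + rng.normal(0, np.sqrt(R))
    # prediction step (10.3): transition kernel integrated against p_{k-1|k-1}
    K = gauss(x_grid[:, None] - f(x_grid)[None, :], Q)
    p = K @ p * dx
    # correction step (10.4): multiply by the likelihood and renormalize
    p = p * gauss(y - h(x_grid), R)
    p /= p.sum() * dx

x_hat = (x_grid * p).sum() * dx        # conditional mean E[x_k | Y_k]
print(x_hat, x_true)
```

Such grid filters are exact up to discretization but scale poorly with dimension, which motivates the Gaussian approximations developed below.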
The most well known case in which (10.3)–(10.4) has an exact update is the Kalman filter for the linear Gaussian system. Consider the linear system

x_k = A_k x_{k−1} + b_k + w_k

and

y_k = H_k x_k + v_k.

If x_0 is Gaussian, then {(x_k, y_k)} is a Gaussian system. That is, p_{k−1|k−1} is Gaussian with mean x_{k−1|k−1} and covariance Σ_{k−1|k−1}:

p_{k−1|k−1}(t) = (1/((2π)^n det Σ_{k−1|k−1})^{1/2}) e^{−½(t − x_{k−1|k−1})^T Σ_{k−1|k−1}^{−1}(t − x_{k−1|k−1})}.

From the prediction step (10.3) we show that p_{k|k−1} is Gaussian with mean x_{k|k−1} and covariance Σ_{k|k−1} given by

x_{k|k−1} = A_k x_{k−1|k−1} + b_k,
Σ_{k|k−1} = A_k Σ_{k−1|k−1} A_k^T + Q_k.   (10.5)

In fact,

x_{k|k−1} = E[x_k | Y_{k−1}] = E[A_k x_{k−1} + b_k + w_k | Y_{k−1}] = A_k x_{k−1|k−1} + b_k

and

E[(x_k − x_{k|k−1})(x_k − x_{k|k−1})^T | Y_{k−1}] = E[A_k (x_{k−1} − x_{k−1|k−1})(x_{k−1} − x_{k−1|k−1})^T A_k^T | Y_{k−1}] + Q_k,

since w_k is independent of Y_{k−1}.
Define the innovation process

I_k = y_k − E[y_k | Y_{k−1}] = y_k − H_k x_{k|k−1}.

We show that the correction step (10.4) is equivalent to: p_{k|k} is Gaussian with mean x_{k|k} and covariance Σ_{k|k} given by

x_{k|k} = x_{k|k−1} + G_k (y_k − H_k x_{k|k−1}),
Σ_{k|k} = Σ_{k|k−1} − Σ_{k|k−1} H_k^T (H_k Σ_{k|k−1} H_k^T + R_k)^{−1} H_k Σ_{k|k−1},   (10.6)

where

G_k = Σ_{k|k−1} H_k^T (H_k Σ_{k|k−1} H_k^T + R_k)^{−1}

is called the Kalman filter gain. In fact, the innovation process is independent of Y_{k−1} and thus

x_{k|k} = E[x_k | Y_k] = E[x_k | Y_{k−1}] + G I_k for some G ∈ R^{n×p},

where

G E[I_k I_k^T] = E[(x_{k|k} − x_{k|k−1}) I_k^T] = E[x_k I_k^T].

Since

E[x_k I_k^T] = E[x_k (v_k + H_k (x_k − x_{k|k−1}))^T]
 = E[x_k (x_k − x_{k|k−1})^T] H_k^T
 = E[(x_k − x_{k|k−1})(x_k − x_{k|k−1})^T] H_k^T
 = Σ_{k|k−1} H_k^T

and, similarly,

E[I_k I_k^T] = R_k + H_k Σ_{k|k−1} H_k^T,

we obtain

G = G_k = Σ_{k|k−1} H_k^T (R_k + H_k Σ_{k|k−1} H_k^T)^{−1}.

Moreover,

E[(x_k − x_{k|k})(x_k − x_{k|k})^T] = E[(x_k − x_{k|k−1} − G I_k)(x_k − x_{k|k−1} − G I_k)^T]
 = E[(x_k − x_{k|k−1})(x_k − x_{k|k−1})^T] − G E[I_k I_k^T] G^T
 = Σ_{k|k−1} − Σ_{k|k−1} H_k^T (R_k + H_k Σ_{k|k−1} H_k^T)^{−1} H_k Σ_{k|k−1}.

In summary, the Kalman filter for the linear system consists of the predictor step

x_{k|k−1} = A_k x_{k−1|k−1} + b_k,
Σ_{k|k−1} = A_k Σ_{k−1|k−1} A_k^T + Q_k,

and the corrector step

G_k = Σ_{k|k−1} H_k^T (H_k Σ_{k|k−1} H_k^T + R_k)^{−1},
x_{k|k} = x_{k|k−1} + G_k (y_k − H_k x_{k|k−1}),
Σ_{k|k} = Σ_{k|k−1} − G_k H_k Σ_{k|k−1}.
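The predictor/corrector recursion above translates line by line into code; the sketch below runs it on a scalar system (the 1x1 matrices A, b, H, Q, R and the simulated data are illustrative assumptions).

```python
import numpy as np

# Kalman filter predictor/corrector steps for a scalar linear system
# (matrices taken 1x1 for illustration): x_k = A x_{k-1} + b + w_k,
# y_k = H x_k + v_k, with w ~ N(0, Q), v ~ N(0, R).
rng = np.random.default_rng(6)
A, b, H = np.array([[0.95]]), np.array([0.1]), np.array([[1.0]])
Q, R = np.array([[0.1]]), np.array([[0.2]])

x_true = np.array([0.0])
x_est, Sigma = np.array([0.0]), np.array([[1.0]])
for k in range(200):
    x_true = A @ x_true + b + rng.multivariate_normal([0.0], Q)
    y = H @ x_true + rng.multivariate_normal([0.0], R)
    # predictor step
    x_pred = A @ x_est + b
    S_pred = A @ Sigma @ A.T + Q
    # corrector step
    G = S_pred @ H.T @ np.linalg.inv(H @ S_pred @ H.T + R)
    x_est = x_pred + G @ (y - H @ x_pred)
    Sigma = S_pred - G @ H @ S_pred
print(x_est, x_true, Sigma)
```

The error covariance Σ_{k|k} converges to the fixed point of the discrete Riccati recursion, so the filter's confidence stabilizes regardless of the data.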
In the nonlinear case, given the density p_{k|k−1}, let

z_k = y_k − E[y_k | Y_{k−1}] = h(x_k) − ĥ_k + v_k,

where

ĥ_k = E[h(x_k) | Y_{k−1}] = ∫ h(x) p_{k|k−1}(x) dx.

Then, since v_k is independent of Y_{k−1},

E[(z_k, η)] = 0

for all Y_{k−1}-measurable η. Thus

x̂_{k|k} = x̂_{k|k−1} + G z_k

is the (linear) least squares estimate. Thus, if we let

Σ_{zz} = E[z_k z_k^T | Y_{k−1}] = R + ∫ (h(x) − ĥ_k)(h(x) − ĥ_k)^T p_{k|k−1}(x) dx

and

Σ_{xz} = E[(x_k − x̂_{k|k−1}) z_k^T | Y_{k−1}] = ∫ (x − x̂_{k|k−1})(h(x) − ĥ_k)^T p_{k|k−1}(x) dx,

then the gain G is given by

G = Σ_{xz} Σ_{zz}^{−1}.
10.2 Gaussian Filter
We consider a Gaussian approximation of the recursive formula (10.3)–(10.4) and develop Gaussian filters. We assume that p_{k−1|k−1} is a single Gaussian distribution with mean x_{k−1|k−1} and covariance P_{k−1|k−1}. Then we construct the Gaussian approximation of p_{k|k} in the following two steps. First, we consider the predictor step. We approximate p_{k|k−1} by the Gaussian distribution that has the same mean and covariance as p_{k|k−1}. By Fubini's theorem the mean and covariance of p_{k|k−1} are given by

E_{k|k−1}[x] = ∫_{R^n} x p_{k|k−1}(x) dx
 = ∫_{R^n} ( ∫_{R^n} x (1/((2π)^n det Q)^{1/2}) e^{−½(x − f(t))^T Q^{−1}(x − f(t))} dx ) p_{k−1|k−1}(t) dt
 = ∫_{R^n} f(t) p_{k−1|k−1}(t) dt

and

E_{k|k−1}[x x^T] = ∫_{R^n} x x^T p_{k|k−1}(x) dx
 = ∫_{R^n} ( ∫_{R^n} x x^T (1/((2π)^n det Q)^{1/2}) e^{−½(x − f(t))^T Q^{−1}(x − f(t))} dx ) p_{k−1|k−1}(t) dt
 = Q + ∫_{R^n} f(t) f(t)^T p_{k−1|k−1}(t) dt.

Remark 1.1: To derive the mean and covariance of p_{k|k−1}, we can also use (10.1) and the independence of w_k and x_{k−1}. For example,

E_{k|k−1}[x] = E[f(x_{k−1}) + w_k | Y_{k−1}] = E[f(x_{k−1}) | Y_{k−1}],

which is precisely the same as the above, though differently expressed.
Thus, if p_{k−1|k−1} is Gaussian with mean x_{k−1|k−1} and covariance P_{k−1|k−1}, then the Gaussian approximation of p_{k|k−1} has mean x_{k|k−1} and covariance P_{k|k−1} defined by

x_{k|k−1} = ∫_{R^n} f(t) (1/((2π)^n det P_{k−1|k−1})^{1/2}) e^{−½(t − x_{k−1|k−1})^T P_{k−1|k−1}^{−1}(t − x_{k−1|k−1})} dt   (10.7)

and

P_{k|k−1} = Q + ∫_{R^n} (f(t) − x_{k|k−1})(f(t) − x_{k|k−1})^T (1/((2π)^n det P_{k−1|k−1})^{1/2}) e^{−½(t − x_{k−1|k−1})^T P_{k−1|k−1}^{−1}(t − x_{k−1|k−1})} dt.   (10.8)
Next, we discuss the corrector step:

p_{k|k}(x) = c e^{−½(y − h(x))^T R^{−1}(y − h(x))} p_{k|k−1}(x),

where c is the normalization constant and we assume that p_{k|k−1}(x) is given by the Gaussian approximation defined by (10.7)–(10.8). Define the innovation process

I_k = y_k − E_{k|k−1}[y_k] = y_k − E_{k|k−1}[h(x_k)].

Here we again approximate the conditional distribution of (x_k, h(x_k)) given the observations up to time k − 1 by a Gaussian. That is, we approximate h(x_k) by its Gaussian approximation z. This means that the probability density function of z is given by the Gaussian distribution with mean ẑ and covariance P_zz defined by

ẑ = ∫_{R^n} h(t) (1/((2π)^n det P_{k|k−1})^{1/2}) e^{−½(t − x_{k|k−1})^T P_{k|k−1}^{−1}(t − x_{k|k−1})} dt   (10.9)

and

P_zz = ∫_{R^n} (h(t) − ẑ)(h(t) − ẑ)^T (1/((2π)^n det P_{k|k−1})^{1/2}) e^{−½(t − x_{k|k−1})^T P_{k|k−1}^{−1}(t − x_{k|k−1})} dt.   (10.10)

Finally, we construct the Gaussian approximation of p_{k|k} with mean x_{k|k} and covariance P_{k|k} defined by

x_{k|k} = x_{k|k−1} + L_k (y_k − ẑ)   (10.11)

and

P_{k|k} = P_{k|k−1} − L_k P_xz^T,   (10.12)

where the Kalman-filter gain L_k is defined by

L_k = P_xz (R + P_zz)^{−1}   (10.13)

and the covariance P_xz is defined by

P_xz = ∫_{R^n} (t − x_{k|k−1})(h(t) − ẑ)^T (1/((2π)^n det P_{k|k−1})^{1/2}) e^{−½(t − x_{k|k−1})^T P_{k|k−1}^{−1}(t − x_{k|k−1})} dt.   (10.14)
10.3 Central difference Filter
In order to implement the Gaussian filter we must develop approximation methods to evaluate the integrals (10.7)–(10.14). That is, we discuss approximation methods for integrals of the form

I = ∫_{R^n} F(t) (1/((2π)^n det Σ)^{1/2}) e^{−½(t − x̄)^T Σ^{−1}(t − x̄)} dt.   (10.15)

If we factor Σ = S^T S and change the coordinate of integration by t = S^T s + x̄, then

I = ∫_{R^n} F̃(s) (1/(2π)^{n/2}) e^{−½|s|²} ds   (10.16)

with F̃(s) = F(S^T s + x̄).
We consider the polynomial interpolation of F̃(s) by

P(s) = F̃(0) + Σ_{i=1}^n a_i s_i + ½ Σ_{i=1}^n H_{i,i} s_i²

with

a_i = (F̃(h e_i) − F̃(−h e_i)) / (2h),  H_{i,i} = (F̃(h e_i) − 2F̃(0) + F̃(−h e_i)) / h².

Thus,

I ∼ M = ∫_{R^n} P(s) (1/(2π)^{n/2}) e^{−½|s|²} ds = F̃(0) + ½ Σ_{i=1}^n H_{i,i}   (10.17)

and

∫_{R^n} (P_1(s) − M_1)(P_2(s) − M_2) (1/(2π)^{n/2}) e^{−½|s|²} ds
 = Σ_{i=1}^n ((F̃_1(h e_i) − F̃_1(−h e_i))/(2h)) ((F̃_2(h e_i) − F̃_2(−h e_i))/(2h)) + Σ_{i=1}^n ½ H^{(1)}_{i,i} H^{(2)}_{i,i}.   (10.18)

If we remove the second order correction term in (10.17)–(10.18), then this coincides with the extended Kalman filter with the central difference approximation of the Jacobian of F̃.
Remark The step size h > 0 can be selected so that the sampling points are in the domain of F and define an effective neighborhood of F about x̄. One may employ the Gauss-Hermite quadrature rule:

I = ∫_{R^n} F̃(s) (1/(2π)^{n/2}) e^{−½|s|²} ds ∼ Σ_{i_1=1}^m · · · Σ_{i_n=1}^m F̃(q_{i_1}, q_{i_2}, . . . , q_{i_n}) w_{i_1} w_{i_2} . . . w_{i_n},

where the quadrature points are q_i = √2 x_i, 1 ≤ i ≤ m, and the weights w_i are determined [?] as follows. Let J be the symmetric tri-diagonal matrix with zero diagonal and J_{i,i+1} = √(i/2), 1 ≤ i ≤ m − 1. Then {x_i} are the eigenvalues of J and w_i equals |(v_i)_1|², where (v_i)_1 is the first element of the i-th normalized eigenvector of J. The rule is exact for all polynomials of degree up to 2m − 1 and is highly accurate. In order to evaluate I we need m^n-point function evaluations. For example, for m = 3 we have

q_1 = −√3, q_2 = 0, q_3 = √3 and w_1 = w_3 = 1/6, w_2 = 2/3,

and I requires 9-point function evaluations for n = 2. Since the quadrature points are fixed, it is less flexible regarding scaling and constraints.
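The eigenvalue construction of the rule is a few lines of linear algebra; the sketch below builds the tri-diagonal matrix J described above and recovers the m = 3 points and weights quoted in the text.

```python
import numpy as np

# Quadrature points and weights from the tri-diagonal matrix J above:
# J has zero diagonal and J_{i,i+1} = sqrt(i/2); the points are
# q_i = sqrt(2) x_i for the eigenvalues x_i, and w_i = |(v_i)_1|^2.
def gauss_hermite(m):
    off = np.sqrt(np.arange(1, m) / 2.0)
    J = np.diag(off, 1) + np.diag(off, -1)
    x, V = np.linalg.eigh(J)               # eigenvalues ascending, vectors in columns
    return np.sqrt(2.0) * x, V[0, :] ** 2  # points q_i, weights w_i

q, w = gauss_hermite(3)
print(q, w)   # expect q = (-sqrt 3, 0, sqrt 3), w = (1/6, 2/3, 1/6)

# exactness up to degree 2m - 1 against the standard normal: E[s^4] = 3
print((w * q**4).sum())
```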
Central Difference Filter

Predictor Step Compute the factorization P_{k−1|k−1} = S^T S and set f̃(s) = f(S^T s + x_{k−1|k−1}), approximated by f̃(0) + Σ_{i=1}^n a_i s_i + ½ Σ_{i=1}^n H_{i,i} s_i². Update p_{k|k−1} = N(x_{k|k−1}, P_{k|k−1}) by

x_{k|k−1} = f̃(0) + ½ Σ_{i=1}^n H_{i,i},
P_{k|k−1} = Q + Σ_{i=1}^n a_i a_i^T + ½ Σ_{i=1}^n H_{i,i} H_{i,i}^T,

where a_i and H_{i,i} are the central difference approximations of the first and second order derivatives of f̃.

Corrector Step Compute the factorization P_{k|k−1} = S̃^T S̃ and set h̃(s) = h(S̃^T s + x_{k|k−1}), approximated by

h̃(0) + Σ_{i=1}^n ( b_i s_i + ½ G_{i,i} s_i² ).

Update p_{k|k} = N(x_{k|k}, P_{k|k}) by

x_{k|k} = x_{k|k−1} + L_k (y_k − z_k),
P_{k|k} = P_{k|k−1} − L_k P_xz^T,

where

z_k = h̃(0) + ½ Σ_{i=1}^n G_{i,i},
P_xz = S̃^T (b_1, · · · , b_n)^T,
P_zz = Σ_{i=1}^n b_i b_i^T + ½ Σ_{i=1}^n G_{i,i} G_{i,i}^T,
L_k = P_xz (R + P_zz)^{−1},

and b_i and G_{i,i} are the central difference approximations of the first and second order derivatives of h̃.
In the algorithm we avoid calculating the derivatives of f and h; instead, we use central differences. The operation count is (n + p)(2n + 1) function evaluations.
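One full predictor/corrector pass of the algorithm above can be written compactly; in the sketch below the two-dimensional model (f, h, Q, R, the prior, and the observation y) is an illustrative assumption.

```python
import numpy as np

# One predictor/corrector pass of the central difference filter above,
# for an illustrative two-dimensional model (f, h_obs, Q, R are assumptions).
def cd_coeffs(F, n, step=1.0):
    """Central difference first/second order coefficients of F at 0."""
    F0 = F(np.zeros(n))
    a, H = [], []
    for i in range(n):
        e = np.zeros(n); e[i] = step
        Fp, Fm = F(e), F(-e)
        a.append((Fp - Fm) / (2 * step))
        H.append((Fp - 2 * F0 + Fm) / step**2)
    return F0, a, H

f = lambda x: np.array([0.9 * x[0] + 0.1 * np.sin(x[1]), 0.8 * x[1]])
h_obs = lambda x: np.array([x[0] * x[1]])
Q, R = 0.05 * np.eye(2), np.array([[0.1]])
x_prev, P_prev = np.array([0.5, -0.2]), 0.3 * np.eye(2)
y = np.array([0.1])

# predictor step: factor P = S^T S and difference f~(s) = f(S^T s + x)
S = np.linalg.cholesky(P_prev).T
f0, a, Hd = cd_coeffs(lambda s: f(S.T @ s + x_prev), 2)
x_pred = f0 + 0.5 * sum(Hd)
P_pred = Q + sum(np.outer(v, v) for v in a) + 0.5 * sum(np.outer(v, v) for v in Hd)

# corrector step with h~(s) = h(S~^T s + x_pred)
St = np.linalg.cholesky(P_pred).T
h0, b, Gd = cd_coeffs(lambda s: h_obs(St.T @ s + x_pred), 2)
z = h0 + 0.5 * sum(Gd)
Pxz = St.T @ np.array(b)
Pzz = sum(np.outer(v, v) for v in b) + 0.5 * sum(np.outer(v, v) for v in Gd)
L = Pxz @ np.linalg.inv(R + Pzz)
x_new = x_pred + L @ (y - z)
P_new = P_pred - L @ Pxz.T
print(x_new, P_new)
```

Note that only function evaluations of f and h_obs are used, as the text emphasizes: 2n + 1 evaluations per component per step.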
10.4 Mixed Gaussian filters

In this section we discuss the mixed Gaussian filter. We approximate the conditional probability density p_{k−1|k−1} by a linear combination of multiple Gaussian distributions, i.e.,

p_{k−1|k−1}(x) = Σ_{i=1}^m α^{(i)}_{k−1|k−1} (1/((2π)^n det P^{(i)}_{k−1|k−1})^{1/2}) e^{−½(x − x^{(i)}_{k−1|k−1})^T (P^{(i)}_{k−1|k−1})^{−1}(x − x^{(i)}_{k−1|k−1})}.   (10.19)

Here we apply the Gaussian filter (10.7)–(10.14) to each Gaussian distribution N(x^{(i)}_{k−1|k−1}, P^{(i)}_{k−1|k−1}) and obtain the predictor update N(x^{(i)}_{k|k−1}, P^{(i)}_{k|k−1}). Each update is independent of the others and can be performed in parallel.
Next, we update the weights α^{(i)}_{k−1|k−1} for the new update p_{k|k−1}(x). We use the update via the collocation property:

α^{(i)}_{k|k−1} N(x^{(i)}_{k|k−1}, P^{(i)}_{k|k−1})(x^{(i)}_{k|k−1}) = p_{k|k−1}(x^{(i)}_{k|k−1}) = Σ_j α^{(j)}_{k−1|k−1} N(x^{(j)}_{k|k−1}, P^{(j)}_{k|k−1})(x^{(i)}_{k|k−1}).   (10.20)
Then we use the corrector step for each Gaussian distribution N(x^{(i)}_{k|k−1}, P^{(i)}_{k|k−1}). We discuss here three update formulae at the end of the corrector step. First, by equating the zeroth moment of each Gaussian distribution we obtain

α^{(i)}_{k|k} ∫_{R^n} (1/((2π)^n det P^{(i)}_{k|k})^{1/2}) e^{−½(x − x^{(i)}_{k|k})^T (P^{(i)}_{k|k})^{−1}(x − x^{(i)}_{k|k})} dx
 = α^{(i)}_{k|k−1} ∫_{R^n} (1/((2π)^n (det R det P^{(i)}_{k|k−1})^{1/2})) e^{−½((y − h(x))^T R^{−1}(y − h(x)) + (x − x^{(i)}_{k|k−1})^T (P^{(i)}_{k|k−1})^{−1}(x − x^{(i)}_{k|k−1}))} dx.

Here we approximate the right hand side by a Gaussian distribution as in (10.7)–(10.8) and obtain

α^{(i)}_{k|k} = α^{(i)}_{k|k−1} (1/((2π)^p det(R + P_zz))^{1/2}) e^{−½(y − ẑ)^T (R + P_zz)^{−1}(y − ẑ)},   (10.21)

where ẑ and P_zz are defined by (10.9)–(10.10); this is the update formula discussed in [?].
Next, we apply the collocation condition at x^{(i)}_{k|k}:

α^{(i)}_k (1/((2π)^n det P^{(i)}_{k|k})^{1/2})
 = Σ_j α^{(j)}_{k|k−1} (1/((2π)^n det P^{(j)}_{k|k−1})^{1/2}) e^{−½((y − h(x^{(i)}_{k|k}))^T R^{−1}(y − h(x^{(i)}_{k|k})) + (x^{(i)}_{k|k} − x^{(j)}_{k|k−1})^T (P^{(j)}_{k|k−1})^{−1}(x^{(i)}_{k|k} − x^{(j)}_{k|k−1}))}

to obtain the update

α^{(i)}_k = Σ_j α^{(j)}_{k|k−1} ((det P^{(i)}_{k|k})^{1/2} / (det P^{(j)}_{k|k−1})^{1/2}) e^{−½((y − h(x^{(i)}_{k|k}))^T R^{−1}(y − h(x^{(i)}_{k|k})) + (x^{(i)}_{k|k} − x^{(j)}_{k|k−1})^T (P^{(j)}_{k|k−1})^{−1}(x^{(i)}_{k|k} − x^{(j)}_{k|k−1}))}.   (10.22)
Finally, we discuss the simultaneous update of the weights. We determine the weights α^{(i)}_{k|k} by the L²-projection, i.e., α^{(i)}_{k|k}, 1 ≤ i ≤ m, minimize

∫_{R^n} | p_{k|k}(x) − Σ_{i=1}^m α^{(i)}_{k|k} (1/((2π)^n det P^{(i)}_{k|k})^{1/2}) e^{−½(x − x^{(i)}_{k|k})^T (P^{(i)}_{k|k})^{−1}(x − x^{(i)}_{k|k})} |² dx   (10.23)

over (R^+)^m, where p_{k|k}(x) is defined as in the corrector step (10.4) with

p_{k|k−1}(x) = Σ_{i=1}^m α^{(i)}_{k|k−1} (1/((2π)^n det P^{(i)}_{k|k−1})^{1/2}) e^{−½(x − x^{(i)}_{k|k−1})^T (P^{(i)}_{k|k−1})^{−1}(x − x^{(i)}_{k|k−1})}.
In order to perform the minimization (10.23) we need to evaluate integrals of the form

∫_{R^n} e^{−½(y − h(x))^T R^{−1}(y − h(x))} (1/((2π)^n det Σ)^{1/2}) e^{−½(x − x̄)^T Σ^{−1}(x − x̄)} dx,

which is relatively expensive. Hence we propose the minimization of the sum of collocation distances:

Σ_{i=1}^m | p_{k|k}(x^{(i)}_{k|k}) − Σ_{j=1}^m α^{(j)}_{k|k} (1/((2π)^n det P^{(j)}_{k|k})^{1/2}) e^{−½(x^{(i)}_{k|k} − x^{(j)}_{k|k})^T (P^{(j)}_{k|k})^{−1}(x^{(i)}_{k|k} − x^{(j)}_{k|k})} |²   (10.24)

over α ∈ (R^+)^m. Problem (10.24) is formulated as the quadratic programming problem

min ½ α^T A^T A α − (A^T b)^T α + (δ/2)|α|²  subject to α ≥ 0,   (10.25)

where δ > 0 is chosen so that the singularity of the matrix A^T A is avoided and the matrices (A, b) are defined by

A_{i,j} = (1/((2π)^n det P^{(j)}_{k|k})^{1/2}) e^{−½(x^{(i)}_{k|k} − x^{(j)}_{k|k})^T (P^{(j)}_{k|k})^{−1}(x^{(i)}_{k|k} − x^{(j)}_{k|k})},

b_i = Σ_{j=1}^m α^{(j)}_{k|k−1} (1/((2π)^n (det R det P^{(j)}_{k|k−1})^{1/2})) e^{−½((y − h(x^{(i)}_{k|k}))^T R^{−1}(y − h(x^{(i)}_{k|k})) + (x^{(i)}_{k|k} − x^{(j)}_{k|k−1})^T (P^{(j)}_{k|k−1})^{−1}(x^{(i)}_{k|k} − x^{(j)}_{k|k−1}))}.

Thus, we solve (10.25) to obtain the weights α^{(i)}_k at each corrector step by using an existing numerical optimization method (e.g., see [?]).
The theoretical foundation of the Gaussian sum approximation above is the reproducing kernel Hilbert space theory and the fact that any probability density function can be approximated as closely as desired by a Gaussian sum. More precisely, we state the following error estimate, the proof of which is found in [?].

Theorem 5.1 (Gaussian sum) Let M be a non-negative integer and N = 2M + 2. For any ε > 0 there exist D > 0 and a mask

{ α_j | j = (j_1, . . . , j_n) ∈ Z^n, |j| = Σ_{s=1}^n j_s ≤ M }

such that for any density function p(x) ∈ C^N(R^n) ∩ W^N_∞(R^n) and all h > 0 the estimate

|p(x) − p_h(x)| ≤ c_N h^N ( |p|_{W^N_∞(R^n)} + |p|_{W^{N−1}_∞(R^n)} )   (10.26)

holds, where p_h(x) is a linear combination of Gaussian distributions given by

p_h(x) = (1/√((2π)^n D h²)) Σ_{k∈Z^n} p_k e^{−|x − hk|²/(2Dh²)}

with

p_k = Σ_{|j|≤M} α̂_j p(h(k − j)),  α̂_j = h D^{−(n−1)/2} α_j,   (10.27)

and c_N is a constant independent of p(x). Here |·|_{W^N_∞(R^n)} and |·|_{W^{N−1}_∞(R^n)} are defined by

|p|_{W^N_∞(R^n)} = Σ_{|j|=N} |∂^j p|_{C(R^n)}

and

|p|_{W^{N−1}_∞(R^n)} = Σ_{|j|=0}^{N−1} |∂^j p| / j!,

respectively.
Roughly speaking, the estimate (10.26) shows that any probability density function p(x) ∈ C^N(R^n) ∩ W^N_∞(R^n) can be approximated by a sum of Gaussian distributions, each of whose components is given by

(1/√((2π)^n D h²)) e^{−|x − hk|²/(2Dh²)},   (10.28)

with order O(h^N) as h → 0.
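The simplest instance of this construction (M = 0, α_0 = 1, D = 1, in one dimension; these specializations are illustrative) is a Gaussian-kernel sum sampling p on the lattice {hk}. The sketch below approximates the standard normal density this way and checks that halving h shrinks the sup-norm error.

```python
import numpy as np

# One-dimensional illustration of (10.27)-(10.28) in the simplest case
# M = 0, alpha_0 = 1 (so alpha_hat_0 = h) and D = 1: p_h is a Gaussian
# kernel sum with centers on the lattice {hk} and bandwidth h.
def p_h(x, p, h, D=1.0, kmax=200):
    k = np.arange(-kmax, kmax + 1)
    centers = h * k
    weights = h * p(centers)                 # p_k = alpha_hat_0 p(hk)
    K = np.exp(-(x[:, None] - centers[None, :])**2 / (2 * D * h**2))
    return (K @ weights) / np.sqrt(2 * np.pi * D * h**2)

p = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # target density
x = np.linspace(-3, 3, 301)
err1 = np.max(np.abs(p_h(x, p, 0.2) - p(x)))
err2 = np.max(np.abs(p_h(x, p, 0.1) - p(x)))
print(err1, err2)   # err2 is markedly smaller, consistent with O(h^N)
```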
10.5 Continuous-Time Nonlinear Filtering
A signal X_t is an F_t-adapted Markov process. Let Y_t be an R^p-valued observation process given by

Y_t = ∫_0^t h(X_s) ds + V_t,   (10.29)

where V_t is a p-dimensional F_t-adapted Brownian motion and h_t = h(X_t) is a predictable process that contains the information of the signal process X_t and satisfies E[∫_0^T |h_t|² dt] < ∞. Let Y_t be the σ-algebra generated by the observations {Y_s, s ≤ t}. The conditional expectation X̂_t = E[X_t | Y_t] is the least mean square estimate of the signal X_t. Define the conditional probability distribution π_t by

π_t(φ) = E[φ(X_t) | Y_t]  for φ ∈ C_b²(R^d).   (10.30)

Then π_t satisfies the so-called Kushner-Stratonovich equation

π_t(φ) = π_0(φ) + ∫_0^t π_s(Aφ) ds + ∫_0^t ( π_s(hφ) − π_s(h)π_s(φ), dY_s − π_s(h) ds )   (10.31)

for φ ∈ dom(A), where A is the generator of the Markov process X_t. In this section we derive the filter equation (10.31) following Fujisaki-Kallianpur-Kunita [].
We define the innovation process I_t by

I_t = Y_t − ∫_0^t E[h_s | Y_s] ds.   (10.32)
Theorem 1 I_t is a Y_t-adapted Brownian motion.
Proof: For t ≥ s,

E[I_t − I_s | Y_s] = E[ ∫_s^t (h_σ − ĥ_σ) dσ | Y_s ] + E[V_t − V_s | Y_s] = 0.

Thus I_t is a Y_t martingale. Now, the quadratic variation of I_t coincides with that of V_t, since I_t is the sum of V_t and a process of bounded variation. That is,

⟨I^i, I^j⟩_t = ⟨V^i, V^j⟩_t = t δ_{ij}.

Thus the claim follows from Levy's theorem. □

Since I_t is Y_t measurable, σ(I_s, s ≤ t) ⊂ Y_t. If the two σ-algebras coincide for every t ≥ 0, then I_t has the innovation property. It has been shown that the innovation property is valid if h_t is independent of V_t and satisfies E[∫_0^T |h_t|² dt] < ∞. On the other hand, if h_t and V_t are not independent, the innovation property is not necessarily valid. Suppose that the innovation property is satisfied. Then by the martingale representation theorem every Y_t-square integrable martingale has the representation

M_t = M_0 + ∫_0^t (Φ_s, dI_s),

where Φ_s ∈ L²(⟨I⟩). We will show that this representation holds without assuming the innovation property. The claim is based on the Girsanov transformation.
Girsanov's Theorem Let B_t be a d-dimensional Brownian motion on (Ω, F, P) and F_t = σ(B_s, s ≤ t). Let g_t be an F_t-predictable process with ∫_0^t |g_s|² ds < ∞ a.s. Define

α_t = exp( ∫_0^t (g_s, dB_s) − ½ ∫_0^t |g_s|² ds )

and assume that E[α_t] = 1 holds for all t ≥ 0. Define the probability measure Q on F_T by

Q(A) = ∫_A α_t dP, A ∈ F_t.

Then: (1) If M_t is an (F_t, P)-local martingale, then

M̃_t = M_t − ⟨∫ (g_s, dB_s), M⟩_t

is an (F_t, Q)-local martingale. In particular, B̃_t = B_t − ∫_0^t g_s ds is an (F_t, Q) Brownian motion.
(2) Every square integrable (F_t, Q) martingale M̃_t is represented by

M̃_t = M̃_0 + ∫_0^t (Ψ_s, dB̃_s),

where Ψ_t is an F_t predictable process such that E_Q[∫_0^T |Ψ_t|² dt] < ∞.
Proof: Since dα_t = g_t α_t dB_t, by Ito's rule

α_t M̃_t = M̃_0 + ∫_0^t M̃_s dα_s + ∫_0^t α_s dM̃_s + ∫_0^t α_s g_s d⟨B, M⟩_s = M̃_0 + ∫_0^t g_s α_s M̃_s dB_s + ∫_0^t α_s dM_s.

Thus α_t M̃_t is an (F_t, P)-local martingale and therefore M̃_t is an (F_t, Q)-local martingale.
To prove the second assertion, we observe

α_t^{−1} = exp( −∫_0^t (g_s, dB̃_s) − ½ ∫_0^t |g_s|² ds )

and thus dP = α_t^{−1} dQ on F_t. Thus if M̃_t is an (F_t, Q) martingale, then

M_t = M̃_t + ⟨∫ (g_s, dB̃_s), M̃⟩_t

is an (F_t, P) martingale. By the martingale representation theorem

M_t = M_0 + ∫_0^t (Ψ_s, dB_s)

and thus

M̃_t = M̃_0 + ∫_0^t (Ψ_s, dB̃_s) + ∫_0^t (Ψ_s, g_s) ds − ⟨∫ (g_s, dB̃_s), M̃⟩_t.

We note that the first integral on the right hand side is an (F_t, Q) martingale. Thus the remaining term must also be a martingale; but since it is a continuous process of bounded variation, and a continuous martingale of bounded variation is zero, it vanishes. □
Corollary Let h(X_t) be an F_t-predictable process with ∫_0^t |h(X_s)|² ds < ∞ a.s. Define

β_t = exp( −∫_0^t (h(X_s), dV_s) − ½ ∫_0^t |h(X_s)|² ds )

and assume that E[β_t] = 1 holds for all t ≥ 0. Define the probability measure Q on F_T by

Q(A) = ∫_A β_t dP, A ∈ F_t.

Then Y_t = V_t + ∫_0^t h(X_s) ds is an (F_t, Q) Brownian motion.
Fujisaki-Kallianpur-Kunita Theorem Let Y_t be the σ-algebra generated by the observation process Y_t. Suppose that M_t is a square integrable Y_t martingale. Then there exists a Y_t predictable process Φ_t such that

M_t = M_0 + ∫_0^t (Φ_s, dI_s).

Proof: Let

β_t = exp( −∫_0^t (ĥ_s, dI_s) − ½ ∫_0^t |ĥ_s|² ds )

and define the probability measure by dQ = β_t dP on Y_t. Then I_t + ∫_0^t ĥ_s ds = Y_t is a (Y_t, Q) Brownian motion by the Girsanov theorem. Hence every square integrable (Y_t, Q) martingale is represented as a stochastic integral with respect to Y_t. Observe that

α_t = β_t^{−1} = exp( ∫_0^t (ĥ_s, dY_s) − ½ ∫_0^t |ĥ_s|² ds )

and dP = α_t dQ. Then by the Girsanov theorem I_t is the basis of (Y_t, P) martingales, i.e., every square integrable (Y_t, P) martingale is represented as a stochastic integral with respect to I_t. □

Let X_t be an Ito process

X_t = X_0 + ∫_0^t f_s ds + W_t,

where W_t is a square integrable (F_t, P) martingale. Then

M_t = X̂_t − X̂_0 − ∫_0^t f̂_s ds   (10.33)

is a Y_t martingale. In fact,

E[M_t − M_s | Y_s] = E[ X̂_t − X̂_s − ∫_s^t f̂_σ dσ | Y_s ] = E[ X_t − X_s − ∫_s^t f_σ dσ | Y_s ] = E[W_t − W_s | Y_s] = 0.

Thus, by the FKK theorem we have M_t = ∫_0^t (Φ_s, dI_s). We have the following representation of Φ_t.
Theorem 2

M_t = ∫_0^t ( E[X_s h(X_s) | Y_s] − E[X_s | Y_s] E[h(X_s) | Y_s] + E[D_s | Y_s], dY_s − E[h(X_s) | Y_s] ds ),

where D_t is the predictable process defined by ∫_0^t D_s ds = ⟨W, V⟩_t.
Proof: If we let φ_t = ∫_0^t (f_s − f̂_s) ds, then

M_t = W_t + φ_t + (X̂_t − X_t) + X_0 − X̂_0.

By the FKK theorem, for Z_t ∈ M² there exists g_s ∈ L² such that

Z_t = ∫_0^t (g_s, dI_s) = ∫_0^t (g_s, dV_s) + ∫_0^t (g_s, h_s − ĥ_s) ds = N_t + ψ_t.

Then

E[M_t Z_t] = E[(W_t + φ_t + X_0 − X̂_0) Z_t],

since X̂_t − X_t is orthogonal to the Y_t-measurable Z_t. By Ito's formula

(W_t + φ_t + X_0 − X̂_0) Z_t = (F_t martingale) + ∫_0^t (W_s + φ_s + X_0 − X̂_0) dψ_s + ∫_0^t Z_s dφ_s + ⟨W, N⟩_t.

Since W_t + φ_t + (X_0 − X̂_0) = X_t − X̂_0 − ∫_0^t f̂_s ds,

E[ ∫_0^t (W_s + φ_s + X_0 − X̂_0) dψ_s ] = E[ ∫_0^t X_s (g_s, h_s − ĥ_s) ds ] = ∫_0^t E[ (g_s, E[X_s h_s | Y_s] − X̂_s ĥ_s) ] ds.

Since

E[ ∫_0^t Z_s dφ_s ] = ∫_0^t E[Z_s (f_s − f̂_s)] ds = 0,

it follows that

E[M_t Z_t] = E[ ∫_0^t (g_s, E[X_s h_s | Y_s] − X̂_s ĥ_s + D̂_s) ds ]
for all Zt ∈ M2 , which proves the theorem. Let Xt be Ito’s diffusion process, i.e., dXt = f (Xt ) dt + σ(Xt ) dBt . By Ito’s formula for
φ ∈ Cb²(R^d),

φ(Xt) − φ(X0) = ∫_0^t Aφ(Xs) ds + W̃t,

where

W̃t = ∫_0^t Σ_{i,j} φ_{x_i}(Xs) σ_{i,j}(Xs) dB_s^j

is a (Ft, P)-martingale. It follows from Theorem 2 that for φ ∈ Cb²(R^d)

E[φ(Xt)|Yt] = E[φ(X0)] + ∫_0^t E[Aφ(Xs)|Ys] ds + ∫_0^t ( E[(Mφ)(Xs)|Ys] − E[φ(Xs)|Ys] E[h(Xs)|Ys], dYs − E[h(Xs)|Ys] ds ),   (10.34)

where

Mφ = h φ + Σ_{i,j} φ_{x_i} σ_{i,j} ⟨W^j, V⟩,

since

⟨W̃, V⟩t = ∫_0^t Σ_{i,j} φ_{x_i}(Xs) σ_{i,j}(Xs) d⟨W^j, V⟩s.
From the theorem, the optimal estimate X̂t of the state Xt given the observations Yt satisfies the optimal filter equation

dX̂t = \widehat{f(Xt)} dt + G(t)( dYt − \widehat{h(Xt)} dt ),   (10.35)

where

\widehat{φ(Xt)} = E[φ(Xt)|Yt] = ∫_{R^n} φ(x) π(t, x) dx

and

G(t) = \widehat{Xt h(Xt)^t} − X̂t \widehat{h(Xt)}^t,

where we assume Dt = 0. If the conditional density function π(t, x) exists, i.e. πt(φ) = ∫ φ(x) π(t, x) dx, then it satisfies Kushner's equation

dπ(t, x) = A(t)* π dt + M(t)π ( dYt − π(h) dt ),   (10.36)

where

M(t)π = (h − π(h))^t π.
Remark Let S be a complete metric space and B(S) the space of all bounded measurable functions on S. Let Xt be an S-valued Markov process, that is,

P(Xt ∈ B | σ(Xξ, ξ ≤ s)) = P(Xt ∈ B | σ(Xs))

for every Borel set B. Let P(s, x, t, E) be the transition probability distribution of Xt and define

P_s^t f = ∫_S P(s, x, t, dy) f(y),  f ∈ B(S).

Then P_s^t is a bounded linear operator on B(S). Let D be a subspace of B(S). A family of linear operators Ut defined on D and satisfying

P_s^t f − f = ∫_s^t P_s^σ Uσ f dσ,  for all f ∈ D,

is called the generator of P_s^t. Then, for f ∈ D,

Wt(f) = f(Xt) − f(X0) − ∫_0^t Us f(Xs) ds

is an Ft martingale. In fact, for t ≥ s,

E[Wt(f) − Ws(f) | Fs] = E[ f(Xt) − f(Xs) − ∫_s^t Uσ f(Xσ) dσ | Fs ] = E[f(Xt)|Fs] − f(Xs) − E[ ∫_s^t Uσ f(Xσ) dσ | Fs ].

Since by the Markov property E[f(Xt)|Fs] = P_s^t f(Xs),

E[f(Xt)|Fs] − f(Xs) = P_s^t f(Xs) − f(Xs) = ∫_s^t P_s^σ Uσ f(Xs) dσ = ∫_s^t E[ Uσ f(Xσ) | Fs ] dσ,

and therefore E[Wt(f) − Ws(f)|Fs] = 0.
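For a time-homogeneous finite-state chain these objects are concrete: P_s^t = exp((t − s)Q) for a rate matrix Q, with generator Ut = Q, and the defining identity P_s^t f − f = ∫_s^t P_s^σ Uσ f dσ can be checked numerically. A minimal sketch (the 3-state rate matrix and the function f are made-up examples):

```python
import numpy as np

# Numerical check of the generator identity P_s^t f - f = int_s^t P_s^sigma (Q f) dsigma
# for a time-homogeneous 3-state Markov chain, where P_s^t = exp((t - s) Q).
def expm(A, terms=40):
    """Matrix exponential via truncated power series (adequate for small A)."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

Q = np.array([[-1.0, 0.7, 0.3],
              [ 0.4, -0.9, 0.5],
              [ 0.2, 0.6, -0.8]])   # rows sum to zero (rate matrix)
f = np.array([1.0, -2.0, 0.5])
s, t = 0.0, 1.5

lhs = expm((t - s) * Q) @ f - f
# right-hand side: trapezoidal quadrature of sigma -> exp((sigma - s) Q) (Q f)
sigmas = np.linspace(s, t, 2001)
vals = np.array([expm((sig - s) * Q) @ (Q @ f) for sig in sigmas])
rhs = ((vals[:-1] + vals[1:]) / 2 * np.diff(sigmas)[:, None]).sum(axis=0)
```

The two sides agree up to quadrature error, which is the martingale identity above in its integrated form.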
Let Yt = ∫_0^t h(s, Xs) ds + Vt and assume that the processes Xt and Vt are independent. Then we have

\widehat{f(Xt)} − \widehat{f(X0)} = ∫_0^t E[ Us f(Xs) | Ys ] ds + ∫_0^t ( \widehat{fs hs} − f̂s ĥs, dIs ),

where fs = f(Xs) and hs = h(s, Xs). Define πt(B) = E[χB(Xt) | Yt] for B ∈ B(S). Now, suppose there exists a measure λ on (S, B(S)) such that for every t ≥ 0 P(Xt ∈ dx) is absolutely continuous with respect to λ(dx). Then πt is a probability measure, absolutely continuous with respect to λ a.s.; let ρt(x, ω) be the Radon-Nikodym derivative of πt, i.e. πt(B) = ∫_B ρt(x, ω) λ(dx). We define the dual operator Us* of Us by

∫_S Us* g(x) f(x) λ(dx) = ∫_S g(x) Us f(x) λ(dx).

It follows from (10.34) that if ρt(x) is in the domain of Ut* for t ≥ 0, then

ρt(x) = ρ0(x) + ∫_0^t Us* ρs(x) ds + ∫_0^t ( ρs(x)( h(s, x) − ∫_S h(s, y) ρs(y) λ(dy) ), dIs ).   (10.37)
Examples (1) For Ito's diffusion process Xt,

Ut = a_{i,j}(t, x) ∂²/∂x_i∂x_j + b_i(t, x) ∂/∂x_i

and

Ut* ρ = ∂²/∂x_i∂x_j ( a_{i,j} ρ ) − ∂/∂x_i ( b_i ρ ).

(2) Let Xt be a finite state Markov chain, i.e., S = {1, 2, · · · , n}, and

lim_{Δt→0⁺} ( P_t^{t+Δt}(i, j) − δ_{i,j} ) / Δt = q_{i,j}(t).

Then P_s^t(i, j) satisfies

P_s^t(i, j) = δ_{i,j} + ∫_s^t Σ_k P_s^σ(i, k) q_{k,j}(σ) dσ.

Thus, if we define

(Ut f)(i) = Σ_{j∈S} q_{i,j}(t) f(j),

then Ut is the generator of the Markov process Xt. Moreover,

(Ut* ρ)(i) = Σ_{j∈S} ρ(j) q_{j,i}(t)
and hence ρt(i) = P(Xt = i | Yt) satisfies

ρt(i) = ρ0(i) + ∫_0^t Σ_{j∈S} ρs(j) q_{j,i}(s) ds + ∫_0^t ( ρs(i)( h(s, i) − Σ_{j∈S} ρs(j) h(s, j) ), dIs ).   (10.38)

10.6
Zakai equation
Kushner's equation for the conditional probability π,

πt(φ) = π0(φ) + ∫_0^t πs(A(s)φ) ds + ∫_0^t ( πs(M(s)φ) − πs(φ) πs(h), dYs − πs(h) ds ),   (10.39)

for all φ ∈ C0²(R^n), is a stochastic PDE driven by the Brownian motion Yt with a cubic nonlinearity on (Ω, Yt, Q). If Xt is an Ito diffusion process, then for φ ∈ Cb²(R^d)

A(t)φ = Σ_{i,j} a_{i,j}(t, x) ∂²φ/∂x_i∂x_j + Σ_i b_i(t, x) ∂φ/∂x_i

and

M(t)φ = h(t, x) φ + Σ_{i,j} σ_{i,j}(t, x) ∂φ/∂x_i ⟨W^j, V⟩.
We will transform it to a linear stochastic PDE, the so-called Zakai equation.

Theorem (1) Suppose πt is a solution to Kushner's equation (10.39) and let

αt = exp( ∫_0^t (πs(h), dYs) − (1/2) ∫_0^t |πs(h)|² ds ).

Then ρt(φ) = πt(φ) αt satisfies the Zakai equation

ρt(φ) = ρ0(φ) + ∫_0^t ρs(A(s)φ) ds + ∫_0^t ( ρs(M(s)φ), dYs )   (10.40)

for all φ ∈ dom(A).
(2) If ρt is a solution to the Zakai equation (10.40), then πt(φ) = ρt(φ)/ρt(1) satisfies Kushner's equation.
Proof: (1) By Ito's formula,

d(πt(φ) αt) = πt(φ) dαt + αt dπt(φ) + d⟨α, π(φ)⟩t.

Since

πt(φ) dαt = ( πt(φ) πt(h) αt, dYt ),
αt dπt(φ) = πt(A(t)φ) αt dt + αt ( πt(M(t)φ) − πt(φ) πt(h), dYt − πt(h) dt ),
d⟨α, π(φ)⟩t = αt ( πt(h), πt(M(t)φ) − πt(φ) πt(h) ) dt,

ρt satisfies

dρt(φ) = ρt(A(t)φ) dt + ( ρt(M(t)φ), dYt ).

(2) From (10.40),

ρt(1) = ρ0(1) + ∫_0^t ( ρs(h), dYs ) = ρ0(1) + ∫_0^t ( πs(h) ρs(1), dYs ),

where ρt(M(t)1) = ρt(h) = πt(h) ρt(1). Hence ρt(1) is given by

ρt(1) = ρ0(1) exp( ∫_0^t (πs(h), dYs) − (1/2) ∫_0^t |πs(h)|² ds ).

Thus, βt = ρt(1)⁻¹ satisfies

dβt = −βt ( πt(h), dYt ) + |πt(h)|² βt dt.

By Ito's formula,

d(ρt(φ) βt) = ρt(φ) dβt + βt dρt(φ) + d⟨β, ρ(φ)⟩t.

Since

ρt(φ) dβt = −( πt(φ) πt(h), dYt ) + |πt(h)|² πt(φ) dt,
βt dρt(φ) = πt(A(t)φ) dt + ( πt(M(t)φ), dYt ),
d⟨β, ρ(φ)⟩t = −( πt(M(t)φ), πt(h) ) dt,

πt satisfies Kushner's equation. □

Remark The Zakai equation is the linear equation

dρ(t, x) = A*(t)ρ dt + M(t)ρ dYt

and is a Brownian-motion-driven Fokker-Planck equation on (Ω, Yt, Q).
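For the finite-state case (10.38) the Zakai dynamics can be simulated directly: the unnormalized density ρ obeys a linear SDE driven by the observation, and normalizing at the end recovers the conditional probabilities. A minimal Euler-Maruyama sketch (the 2-state rate matrix, observation values, and step size are illustrative choices, with unit observation noise):

```python
import numpy as np

# Euler-Maruyama sketch of the Zakai dynamics for a 2-state Markov chain
# observed in additive white noise: dY_t = h(X_t) dt + dV_t.
rng = np.random.default_rng(1)
Qm = np.array([[-1.0, 1.0],
               [ 1.5, -1.5]])          # rate (generator) matrix q_{ij}
h = np.array([0.0, 2.0])               # observation function h(i)
dt, n_steps = 1e-3, 2000

x = 0                                  # signal state
rho = np.array([0.5, 0.5])             # unnormalized conditional density
for _ in range(n_steps):
    # jump of the chain with probability -q_{xx} dt
    if rng.random() < -Qm[x, x] * dt:
        x = 1 - x
    dY = h[x] * dt + np.sqrt(dt) * rng.normal()
    # Zakai update: d rho_i = sum_j rho_j q_{ji} dt + rho_i h_i dY_t
    rho = rho + (Qm.T @ rho) * dt + rho * h * dY
pi = rho / rho.sum()                   # normalized (Kushner) filter
```

Normalizing only at the end, as in statement (2) of the theorem, avoids propagating the cubic nonlinearity of (10.39).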
10.7
Relation to the Discrete Time Optimal Filter
Consider the discrete-time observation

yk = h(X_{tk}) + Vk.

The optimal filter is given by

(∂/∂t) π(t, x) = A*(t)π on [tk, tk+1),  π(tk) = π(tk⁺),

π(t_{k+1}⁺, x) = c exp( −(yk − h(x))^t R⁻¹ (yk − h(x)) / 2 ) π(t_{k+1}, x).

10.8
Gaussian filter
We consider a Gaussian approximation of the optimal nonlinear filter (10.35) by approximating
the conditional probability density π by the normal distribution
N(t, x) = N(x̂(t), P(t))(x) = (2π)^{−n/2} (det P(t))^{−1/2} exp( −(1/2) (x − x̂(t))^t P(t)⁻¹ (x − x̂(t)) ).

Thus, we have the filter equation

dx̂(t)^{(1)} = \widehat{f(x(t))}^{(1)} dt + L^{(1)}(t) ( dy_t − \widehat{h(x(t))}^{(1)} dt ),   (10.41)

where

\widehat{f(x(t))}^{(1)} = ∫ f(x) N(t, x) dx,  \widehat{h(x(t))}^{(1)} = ∫ h(x) N(t, x) dx,

and

L^{(1)}(t) = ∫ ( x − x̂(t)^{(1)} )( h(x) − \widehat{h(x(t))}^{(1)} )^t N(t, x) dx.

In order to obtain the update for the covariance P(t) = P^{(1)}(t) = E[ (x(t) − x̂(t)^{(1)})(x(t) − x̂(t)^{(1)})^t ], we note that

d( x(t) − x̂(t)^{(1)} ) = ( f(x(t)) − \widehat{f(x(t))}^{(1)} ) dt − L^{(1)}(t) ( h(x(t)) − \widehat{h(x(t))}^{(1)} ) dt + σ dw1(t) − L^{(1)}(t) dw2(t).
If we assume h(x) = Hx, then L^{(1)} = P^{(1)} H^t.
By Ito's rule,

d( (x(t) − x̂(t))(x(t) − x̂(t))^t ) = d(x(t) − x̂(t)) (x(t) − x̂(t))^t + (x(t) − x̂(t)) d(x(t) − x̂(t))^t + ( Q + L^{(1)}(t) L^{(1)}(t)^t ) dt,

and we obtain

(d/dt) P^{(1)}(t) = F^{(1)}( x̂(t)^{(1)}, P^{(1)}(t) ) − L^{(1)}(t) L^{(1)}(t)^t + Q,   (10.42)

where

F^{(1)}( x̂(t)^{(1)}, P^{(1)}(t) ) = ∫ ( f(x) − \widehat{f(x(t))}^{(1)} )( x − x̂(t)^{(1)} )^t N(t, x) dx + ∫ ( x − x̂(t)^{(1)} )( f(x) − \widehat{f(x(t))}^{(1)} )^t N(t, x) dx.

Note that \widehat{f(x(t))}^{(1)} and \widehat{h(x(t))}^{(1)} depend on P^{(1)}(t), and thus (10.41) and (10.42) are coupled.
In order to have the filter equation separated from the covariance update, we consider

dx̂(t)^{(2)} = f( x̂(t)^{(2)} ) dt + L^{(2)}(t) ( dy_t − h( x̂(t)^{(2)} ) dt ),   (10.43)

(d/dt) P^{(2)}(t) = F^{(2)}( x̂(t)^{(2)}, P^{(2)}(t) ) − L^{(2)}(t) L^{(2)}(t)^t + Q,   (10.44)

where N(t, x) = N( x̂(t)^{(2)}, P^{(2)}(t) )(x) and

L^{(2)}(t) = ∫ ( x − x̂(t)^{(2)} )( h(x) − \widehat{h(x(t))}^{(2)} )^t N(t, x) dx,

F^{(2)}( x̂(t)^{(2)}, P^{(2)}(t) ) = ∫ ( f(x) − \widehat{f(x(t))}^{(2)} )( x − x̂(t)^{(2)} )^t N(t, x) dx + ∫ ( x − x̂(t)^{(2)} )( f(x) − \widehat{f(x(t))}^{(2)} )^t N(t, x) dx.
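When f(x) = Fx and h(x) = Hx the Gaussian integrals above are exact and (10.43)-(10.44) reduce to the Kalman-Bucy filter: L = PH^t and dP/dt = FP + PF^t − PH^tHP + Q (unit observation noise, R = I). A scalar sketch with illustrative coefficients, integrating the Riccati ODE by Euler's method and comparing with its stationary root:

```python
import numpy as np

# Scalar Kalman-Bucy covariance equation dp/dt = 2 F p - (H p)^2 + Q,
# the linear special case of (10.44); F, H, Q are illustrative values.
F, H, Q = -1.0, 1.0, 2.0
p, dt = 0.0, 1e-3
for _ in range(20_000):                 # Euler integration up to t = 20
    p = p + dt * (2 * F * p - (H * p) ** 2 + Q)

# positive root of the algebraic Riccati equation 2 F p - H^2 p^2 + Q = 0
p_stationary = (F + np.sqrt(F**2 + H**2 * Q)) / H**2
```

The ODE converges exponentially to the stationary covariance, so the Euler iterate lands on the algebraic root.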
Example f(x, a) = A(a)x, where a ∈ R^m represents the coefficients of the matrix A(a) ∈ R^{n×n}, i.e., ∂A/∂a = Ȧ ∈ R^{n×m}. Assume that N(t, m, Σ) is the normal distribution with mean m = (x̂, â) and covariance

Σ = [ Σ11  Σ12
      Σ21  Σ22 ].

Then,

\widehat{A(a)x} = A(â)x̂ + ∫ ( A(â)(x − x̂) + Ȧx̂(a − â) + (A(a) − A(â))(x − x̂) ) dN(t, m, Σ) = A(â)x̂ + (Ȧx̂)Σ21

and

∫ ( A(a)x − \widehat{A(a)x} )( x − x̂ )^t dN(t, m, Σ) = A(â)Σ11 + Ȧx̂ Σ21.

Thus, we obtain the system for (x̂, â, Σ):

dx̂ = ( A(â)x̂ + (Ȧx̂)Σ21 ) dt + Σ11 H^t R⁻¹ ( dy_t − Hx̂ dt ),

dâ = Σ21 H^t R⁻¹ ( dy_t − Hx̂ dt ),

(d/dt)Σ = J(x̂, â)Σ + ΣJ(x̂, â)^t − ΣH̃^t R⁻¹ H̃Σ + Q,  H̃ = [H 0],

where

J(x̂, â) = [ A(â)  Ȧx̂
             0     0  ].

In the stationary case, setting dΣ/dt = 0 and writing A = A(â), B = Ȧx̂, the blocks satisfy

AΣ11 + Σ11 A* + BΣ21 + Σ12 B* − Σ11 H^t R⁻¹H Σ11 + Q11 = 0,
AΣ12 + BΣ22 − Σ11 H^t R⁻¹H Σ12 + Q12 = 0,
−Σ21 H^t R⁻¹H Σ12 + Q22 = 0.
11
Optimal Stopping Problem and Variational inequality
The theory of optimal stopping is concerned with the problem of determining an optimal stopping time of the process Xt to maximize an expected reward or minimize an expected cost.
Optimal stopping problems can be found in areas of statistics, economics, and mathematical
finance (related to the pricing of American options). Optimal stopping problems can often
be written in the form of a Bellman equation, and are therefore often solved using dynamic
programming.
Definition A measurable function f is called supermeanvalued if f(x) ≥ E^x[ f(Xτ) ] for all stopping times τ and all x ∈ R^n. If in addition f is lower semicontinuous, then f is called superharmonic.
Remark (1) By Fatou's lemma,

f(x) ≤ E^x[ lim inf_{n→∞} f(Xτn) ] ≤ lim inf_{n→∞} E^x[ f(Xτn) ]

for any sequence τn of stopping times satisfying τn → 0⁺, if f is lower semicontinuous. Thus, if f is superharmonic,

f(x) = lim_{n→∞} E^x[ f(Xτn) ] for all x ∈ R^n.

(2) If f ∈ C²(R^n), it follows from Dynkin's formula that f is superharmonic if and only if

Af ≤ 0,

where A is the characteristic operator of the diffusion process Xt.
(3) If Xt = x + Bt is a Brownian motion, then the superharmonic functions defined above coincide with the superharmonic functions of classical potential theory.
Definition f is a superharmonic majorant of g if f ≥ g and f is superharmonic. The least superharmonic majorant ĝ of g is a superharmonic majorant such that ĝ ≤ f for every superharmonic majorant f of g.

Lemma (Construction of the least superharmonic majorant) Let g = g0 be a nonnegative lower semicontinuous function on R^n and define the sequence {gn} by

gn(x) = sup_{t∈Sn} E^x[ g_{n−1}(Xt) ],

where Sn = { k 2⁻ⁿ, 0 ≤ k ≤ 4ⁿ }. Then gn ↑ ĝ and ĝ is the least superharmonic majorant of g.
Proof: By definition {gn} is nondecreasing. Define g̃(x) = lim_{n→∞} gn(x). Then,

g̃(x) ≥ gn(x) ≥ E^x[ g_{n−1}(Xt) ] for all n and t ∈ Sn.

Thus,

g̃(x) ≥ lim_{n→∞} E^x[ g_{n−1}(Xt) ] = E^x[ g̃(Xt) ]

for all t ∈ S = ∪_{n=1}^∞ Sn. Since g̃ is a monotone limit of lower semicontinuous functions, g̃ is lower semicontinuous. For a fixed t ∈ R choose tn ∈ S with tn → t; then

g̃(x) ≥ lim inf_{n→∞} E^x[ g̃(Xtn) ] ≥ E^x[ lim inf_{n→∞} g̃(Xtn) ] ≥ E^x[ g̃(Xt) ]

by Fatou's lemma. Hence g̃ is an excessive function. It is shown in Dynkin [] that g̃ is then a superharmonic majorant of g. If f is any superharmonic majorant of g, then by induction one can show that

f(x) ≥ gn(x) for all n,

and thus f(x) ≥ g̃(x). Hence g̃ is the least superharmonic majorant. □
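The construction in the lemma has a computable discrete-time analogue: for a Markov chain with transition matrix P, iterating v ← max(g, Pv) from v = g increases monotonically to the least excessive majorant of g. A sketch for a symmetric random walk on a finite grid, using the reward from Remark (1) below as an illustrative g (on the truncated grid the iteration converges to the constant max g):

```python
import numpy as np

# Least excessive (superharmonic) majorant of g for a symmetric random walk
# on {0,...,n} with reflecting ends, via the monotone iteration v <- max(g, P v).
n = 50
x = np.linspace(-2.0, 2.0, n + 1)
g = x**2 / (1.0 + x**2)          # illustrative reward

P = np.zeros((n + 1, n + 1))     # transition matrix of the reflecting walk
for i in range(1, n):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[0, 1] = P[n, n - 1] = 1.0

v = g.copy()
for _ in range(60_000):
    v_new = np.maximum(g, P @ v)
    if np.max(np.abs(v_new - v)) < 1e-12:
        v = v_new
        break
    v = v_new
```

At the fixed point v ≥ g and v ≥ Pv, the two defining properties of an excessive majorant.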
Theorem (Optimal Stopping) Let V(x) be the optimal reward function and ĝ the least superharmonic majorant of a continuous reward function g ≥ 0. Then V(x) = ĝ(x). Define the continuation region

D = {x : V(x) > g(x)}.

If there exists an optimal stopping time τ*, then τ* ≥ τD for all x ∈ D and τ* = τD is optimal, i.e.,

V(x) = E^x[ g(XτD) ].

Proof: Since ĝ is a superharmonic majorant of g,

ĝ(x) ≥ E^x[ ĝ(Xτ) ] ≥ E^x[ g(Xτ) ].

Thus,

V(x) = sup_τ E^x[ g(Xτ) ] ≤ ĝ(x).

Assume g is bounded. For ε > 0 define

Dε = {x : g(x) < ĝ(x) − ε}

and

ĝε(x) = E^x[ ĝ(X_{τDε}) ].

Then ĝε is superharmonic. One can prove that

g(x) ≤ ĝε(x) + ε.

Thus ĝε + ε is a superharmonic majorant of g and

ĝ ≤ ĝε + ε = E^x[ ĝ(X_{τDε}) ] + ε ≤ E^x[ (g + ε)(X_{τDε}) ] + ε ≤ V(x) + 2ε,

which implies ĝ = V(x) since ε > 0 is arbitrary. Let x ∈ D and let τ be a stopping time with P^x(τ < τD) > 0. Since g(Xτ) < V(Xτ) if τ < τD and g ≤ V, we have

E^x[ g(Xτ) ] = ∫_{τ<τD} g(Xτ) dP^x + ∫_{τ≥τD} g(Xτ) dP^x < ∫_{τ<τD} V(Xτ) dP^x + ∫_{τ≥τD} V(Xτ) dP^x = E^x[ V(Xτ) ] ≤ V(x),

since V = ĝ is superharmonic. For x ∈ D we have

V(x) = E^x[ g(Xτ*) ] ≤ E^x[ ĝ(Xτ*) ] ≤ E^x[ ĝ(XτD) ] = E^x[ g(XτD) ] ≤ V(x).

For x ∈ ∂D assume τD > 0. Let τk be a sequence of stopping times such that 0 < τk < τD and τk → 0 a.s. as k → ∞. Since Xτk ∈ D and Xt is a strong Markov process, we have

E^x[ g(XτD) ] = E^x[ θτk g(XτD) ] = E^x[ E^{Xτk}[ g(XτD) ] ] = E^x[ V(Xτk) ].

Hence, since V is lower semicontinuous, by Fatou's lemma,

V(x) ≤ E^x[ lim inf_{k→∞} V(Xτk) ] ≤ lim inf_{k→∞} E^x[ V(Xτk) ] = E^x[ g(XτD) ].
Hence V(x) = E^x[ g(XτD) ]. □

Remark (1) Let Xt be a deterministic process and let

g(x) = x² / (1 + x²).

Then ĝ(x) = 1, but an optimal stopping time does not exist.
(2) If g ∈ C²(Ω), let

U = {Ag > 0}.

Then U ⊂ D. In fact, for x ∈ U, by Dynkin's formula,

V(x) ≥ E^x[ g(Xτ) ] = g(x) + E^x[ ∫_0^τ Ag(Xs) ds ] > g(x),

and thus U ⊂ D.
Example 1 (Running cost) Let τΩ be the exit time of Xt from an open set Ω in R^n. Consider the optimal stopping problem with a running cost of the form

V(x) = sup_τ E^x[ g(Xτ) + ∫_0^τ f(Xs) ds ]

over all stopping times 0 ≤ τ ≤ τΩ. Consider the augmented system:

dXt = b(Xt) dt + σ(Xt) dBt,  X0 = x,
dWt = f(Xt) dt,  W0 = 0.

Then we have

V(x) = sup_τ E^x[ g(Xτ) + Wτ ],

and the corresponding generator is given by

Aφ̃(x, w) = Aφ̃ + f(x) ∂φ̃/∂w.

For φ̃(x, w) = φ(x) + w we have

Aφ̃(x, w) = Aφ + f(x).
Theorem (Variational Inequality I) Let φ ∈ H²(Ω) satisfy

max( Aφ + f, g − φ ) = 0.

Let

D = {x ∈ Ω : φ(x) > g(x)}.

Then τD is an optimal stopping time and φ = V is the optimal value function, i.e.,

φ(x) = sup_{0≤τ≤τΩ} E^x[ g(Xτ) + ∫_0^τ f(Xs) ds ].

Proof: For ε > 0 let φε = ∫_Ω ε⁻ⁿ ζ( (x − y)/ε ) φ(y) dy be a mollification of φ. Then, by Dynkin's formula,

E^x[ φε(Xτ) ] = φε(x) + E^x[ ∫_0^τ Aφε(Xs) ds ].

Letting ε → 0⁺,

E^x[ φ(Xτ) ] = φ(x) + E^x[ ∫_0^τ Aφ(Xs) ds ],

since Aφε → Aφ a.e. in Ω. For x ∈ D and τ ≤ τD, since Aφ + f = 0 on D and φ ≥ g,

E^x[ g(Xτ) + ∫_0^τ f(Xs) ds ] ≤ E^x[ φ(Xτ) + ∫_0^τ f(Xs) ds ] = φ(x) + E^x[ ∫_0^τ (Aφ + f)(Xs) ds ] = φ(x),

with equality for τ = τD, and thus V(x) = φ(x). For x ∉ D, since

E^x[ φ(Xτ) + ∫_0^τ f(Xs) ds ] = φ(x) + E^x[ ∫_0^τ (Aφ + f)(Xs) ds ] ≤ φ(x)

and φ(x) = g(x), we again conclude
V(x) = φ(x). □

Example 2 (Time dependent case)

V(x) = sup_{0≤τ≤T} E^x[ e^{−∫_0^τ q(Xs) ds} g(Xτ) + ∫_0^τ e^{−∫_0^s q(Xr) dr} f(Xs) ds ] = sup_{0≤τ≤T} E^x[ Zτ g(Xτ) + Wτ ],

where for the augmented dynamics

dt = dt,
dXt = b(t, Xt) dt + σ(Xt) dBt,  X0 = x,
dZt = −q(Xt) Zt dt,  Z0 = 1,
dWt = Zt f(t, Xt) dt,  W0 = 0,

the generator is given by

Aφ̃(t, x, w, z) = ∂φ̃/∂t + Aφ̃ + z f(t, x) ∂φ̃/∂w − q(x) z ∂φ̃/∂z.

For φ̃(t, x, w, z) = z φ(t, x) + w, z > 0, we have

Aφ̃ = z( ∂φ/∂t + Aφ + f(t, x) − q(x)φ ).
Theorem (Variational Inequality II) Let v ∈ H^{1,2}((0, T) × Ω) satisfy

max( ∂v/∂t + Av − q(x)v + f(t, x), g − v ) = 0,  v(T, x) = g(x).

Let

D = {(t, x) : v(t, x) > g(x)}.

Then τD is an optimal stopping time and v is the optimal value function, i.e.,

v(t, x) = sup_{t≤τ≤T} E^{t,x}[ e^{−∫_t^τ q(Xs) ds} g(Xτ) + ∫_t^τ e^{−∫_t^s q(Xr) dr} f(s, Xs) ds ].
American Option

v(t, x) = sup_{t≤τ≤T} E^{t,x}[ e^{−r(τ−t)} ψ(Xτ) ]

for the stock process

dXt = rXt dt + σXt dBt.

Let ψ = (K − x)⁺ (put option) or ψ = (x − K)⁺ (call option), where K is the strike price and T > 0 is the maturity time. Then the Black-Scholes American option equation is given by

max( ∂v/∂t + rx vx + (σ²x²/2) vxx − r v, ψ − v ) = 0 a.e. in (0, T) × R,  v(T, x) = ψ(x).
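This variational inequality is usually solved numerically; the simplest scheme is a binomial-tree dynamic program, where taking the maximum of continuation value and payoff at every node mirrors max(·, ψ − v) = 0. A Cox-Ross-Rubinstein sketch for the American put (all parameters are illustrative choices):

```python
import numpy as np

# Binomial-tree (Cox-Ross-Rubinstein) dynamic program for the American put:
# at each node take max(continuation value, payoff (K - x)^+).
def american_put(S0, K, r, sigma, T, n):
    dt = T / n
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    q = (np.exp(r * dt) - d) / (u - d)        # risk-neutral up-probability
    disc = np.exp(-r * dt)
    # stock prices at maturity: S0 * u^(n-j) * d^j, j = 0..n
    S = S0 * u ** np.arange(n, -1, -1) * d ** np.arange(0, n + 1)
    v = np.maximum(K - S, 0.0)                # terminal payoff
    for _ in range(n):
        S = S[1:] * u                         # prices one step earlier
        cont = disc * (q * v[:-1] + (1 - q) * v[1:])
        v = np.maximum(cont, np.maximum(K - S, 0.0))   # early-exercise max
    return v[0]

price = american_put(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0, n=500)
```

For these parameters the American put is worth roughly half a point more than the European put, the premium for the right of early exercise.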
In general, consider a stochastic process Xt defined on the probability space (Ω, F, P) with filtration Ft and assume that Xt is adapted to the filtration. The optimal stopping problem is to find the stopping time τ* which maximizes the expected reward

V(t, x) = E^{t,x}[ g(Xτ*) ] = sup_{t≤τ≤T} E^{t,x}[ g(Xτ) ],

where V(t, x) is called the value function. Here T can take the value ∞. A more specific formulation is as follows. We consider an Ft-adapted strong Markov process X = {Xt, t ≥ 0}. Given continuous functions g, f, the optimal stopping problem is

V(x) = sup_{0≤τ≤T} E^{0,x}[ g(Xτ) + ∫_0^τ f(Xt) dt ].
Let Xt be a Levy process in R^n given by the SDE

dXt = b(Xt) dt + σ(Xt) dBt + ∫_{R^n} γ(Xt−, z) N̄(dt, dz),  X0 = x,

where Bt is an m-dimensional Brownian motion, N̄ is an l-dimensional compensated Poisson random measure, and b : R^n → R^n, σ : R^n → R^{n×m}, and γ : R^n × R^n → R^{n×l} are given functions such that a unique solution Xt exists. Let Ω ⊂ R^n be an open set and let τΩ be the exit time of Xt from Ω:

τΩ = inf{ t > 0 : Xt ∉ Ω }.

Consider

V(x) = sup_{τ≤τΩ} J^τ(x) = sup_{τ≤τΩ} E^{0,x}[ g(Xτ) + ∫_0^τ f(Xt) dt ].
It turns out that under some regularity conditions the following verification theorem holds.
If a function φ : Ω̄ → R satisfies
• φ ∈ C(Ω̄) ∩ C¹(Ω) ∩ C²(Ω \ ∂D), where the continuation region is D = {x ∈ Ω : φ(x) > g(x)},
• φ ≥ g on Ω, and
• Aφ + f ≤ 0 on Ω \ ∂D, where A is the infinitesimal generator of Xt,
then φ(x) ≥ V(x) for all x ∈ Ω̄. Moreover, if

Aφ + f = 0 on D,

then φ(x) = V(x) for all x ∈ Ω̄ and τ* = inf{ t > 0 : Xt ∉ D } is an optimal stopping time. These conditions can also be written in a more compact form (the integro-differential variational inequality):

max{ Aφ + f, g − φ } = 0 on Ω \ ∂D.
12 Stochastic Optimal Control and Hamilton-Jacobi-Bellman
equation
In this section we consider the stochastic optimal control problem.
12.1
Discrete time
Consider the discrete time control problem: minimize

J^u = E_{{wk}}[ Σ_{k=0}^{N−1} ( Q(Xk) + h(uk) ) + VN(XN) | X0 = x ]

over controls {uk}_{k=0}^{N−1} ∈ U, subject to

X_{n+1} = f(Xn, un, wn),  X0 = x,

where Xn is an S-valued process, uk is Fk-adapted, VN is a given terminal cost, and {wk} are independent, identically distributed random variables.
Let {Vn} be the value function

Vn(x) = min_{{uk}_{k=n}^{N−1} ∈ U} E[ Σ_{k=n}^{N−1} ( Q(Xk) + h(uk) ) + VN(XN) | Xn = x ].

The optimality principle is given by the so-called Bellman dynamic programming:

Vn(x) = min_{u∈U} E_w[ Q(x) + h(u) + V_{n+1}( f(x, u, w) ) ]   (12.1)

for n ≤ N − 1, and the optimal feedback law is

un(x) = argmin_{u∈U} E_w[ Q(x) + h(u) + V_{n+1}( f(x, u, w) ) ].
In fact, since {wk} are independent,

Vn(x) = min_{un} E_w[ Q(x) + h(un) + min_{{uk}_{k=n+1}^{N−1} ∈ U} E_{{wk}}[ Σ_{k=n+1}^{N−1} ( Q(Xk) + h(uk) ) + VN(XN) | X_{n+1} = f(x, un, w) ] ].
Example (Markov chain case) In the case of a finite state controlled Markov chain, let S = {1, 2, · · · , n} and let P_{ij}^u be the controlled transition probability. Then the DP principle (12.1) is written as

Vn(i) = min_{u∈U} { Q(i) + h(u) + Σ_j P_{ij}^u V_{n+1}(j) }.
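The finite-state recursion is a few lines of code: sweep n = N−1, ..., 0 and take the minimizing action at each state. A sketch for a 2-state chain with two actions (transition matrices, costs, and horizon are made-up), also re-evaluating the greedy policy to confirm it attains Vn:

```python
import numpy as np

# Backward dynamic programming (12.1) for a controlled 2-state Markov chain.
P_u = {0: np.array([[0.9, 0.1],
                    [0.5, 0.5]]),
       1: np.array([[0.2, 0.8],
                    [0.1, 0.9]])}     # transition matrix for each action u
Qc = np.array([2.0, 0.0])             # state cost Q(i): state 1 is cheap
hc = {0: 0.0, 1: 1.0}                 # control cost h(u): action 1 costs extra
N = 15

V = np.zeros((N + 1, 2))              # V[N] = 0 (zero terminal cost)
policy = np.zeros((N, 2), dtype=int)
for n in range(N - 1, -1, -1):
    for i in range(2):
        costs = [Qc[i] + hc[u] + P_u[u][i] @ V[n + 1] for u in (0, 1)]
        policy[n, i] = int(np.argmin(costs))
        V[n, i] = min(costs)

# evaluate the (time-varying) greedy policy by backward policy evaluation
W = np.zeros((N + 1, 2))
for n in range(N - 1, -1, -1):
    for i in range(2):
        u = policy[n, i]
        W[n, i] = Qc[i] + hc[u] + P_u[u][i] @ W[n + 1]
```

The policy-evaluation pass W reproduces V exactly, which is the content of the optimality principle.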
Example (Deterministic case) In the deterministic case the DP principle (12.1) reduces to

Vn(x) = min_{u∈U} { Q(x) + h(u) + V_{n+1}( f(x, u) ) }.
Consider the linear quadratic case: minimize

Σ_{k=0}^{N−1} [ (xk, Qxk) + (Ruk, uk) ] + (xN, GxN)

subject to

x_{n+1} = Axn + Bun.

Then

Vn(x) = (x, Pn x),

where the self-adjoint operator Pn on X is defined by

Pn = A* P_{n+1} A − A* P_{n+1} B ( R + B* P_{n+1} B )⁻¹ B* P_{n+1} A + Q

with PN = G. In fact,

(x, Qx) + (u, Ru) + ( Ax + Bu, P_{n+1}(Ax + Bu) ) = ( (R + B* P_{n+1} B) u, u ) + 2 ( B* P_{n+1} A x, u ) + (x, Qx) + ( P_{n+1} A x, A x )

is minimized at

u = −( R + B* P_{n+1} B )⁻¹ B* P_{n+1} A x,

and thus

Vn(x) = (x, Qx) + ( P_{n+1} A x, A x ) − ( (R + B* P_{n+1} B)⁻¹ B* P_{n+1} A x, B* P_{n+1} A x ) = (x, Pn x).
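The recursion for Pn and the optimality of the resulting feedback can be checked directly in the scalar deterministic case: the realized cost of u_n = −K_n x_n must equal V_0(x0) = P_0 x0², and any other control (here u = 0) must cost more. A sketch with made-up coefficients:

```python
import numpy as np

# Backward Riccati recursion P_n = A*P_{n+1}A - A*P_{n+1}B(R+B*P_{n+1}B)^{-1}B*P_{n+1}A + Q
# for a scalar deterministic linear-quadratic problem.
A, B, Q, R, G, N = 1.2, 1.0, 1.0, 1.0, 1.0, 20
P = np.empty(N + 1)
K = np.empty(N)
P[N] = G
for n in range(N - 1, -1, -1):
    K[n] = (A * P[n + 1] * B) / (R + B * P[n + 1] * B)
    P[n] = A * P[n + 1] * A - (A * P[n + 1] * B) ** 2 / (R + B * P[n + 1] * B) + Q

x0 = 1.0
# forward simulation under the optimal feedback law u_n = -K_n x_n
x, cost_opt = x0, 0.0
for n in range(N):
    u = -K[n] * x
    cost_opt += Q * x**2 + R * u**2
    x = A * x + B * u
cost_opt += G * x**2

# uncontrolled cost (u = 0) for comparison
x, cost_zero = x0, 0.0
for n in range(N):
    cost_zero += Q * x**2
    x = A * x
cost_zero += G * x**2
```

The closed-loop cost matches (x0, P0 x0) to rounding error, while the unstable open-loop system (A > 1) accumulates a far larger cost.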
12.2
Continuous time stochastic optimal control problem
Let Ω be an open set in R^n and let τ = τΩ be the exit time from Ω. We consider the stochastic optimal control problem:

min J^u(s, x) = E^{s,x}[ ∫_s^{T∧τ} f⁰(t, Xt, ut) dt + g(X_{T∧τ}) ]   (12.2)

subject to

dXt = b(Xt, ut) dt + σ(Xt, ut) dBt   (12.3)

over ut ∈ Uad, i.e., progressively measurable controls such that ut ∈ U a.s., where U is a closed convex set in R^m. Here we assume the running cost f⁰ and the terminal cost g are Lipschitz. Define the optimal value function V by

V(s, x) = inf_{u∈Uad} J^u(s, x).

Theorem (Hamilton-Jacobi-Bellman equation) Suppose V ∈ C^{1,2}((0, T) × Ω) satisfies

∂V/∂t (t, x) + min_{u∈U} { A^u V + f⁰(t, x, u) } = 0,  V(T, x) = g(x), x ∈ Ω, and V(t, x) = g(x), x ∈ ∂Ω,   (12.4)

where the generator A^u, u ∈ U, is given by

A^u V = b(t, x, u) · ∇V + (1/2) tr( σ(t, x, u) σ(t, x, u)^t ∇²V ).

Then V is the value function and the optimal control is the Markov control

u*_t = argmin_{u∈U} { A^u V(t, Xt) + f⁰(t, Xt, u) }.   (12.5)
Proof: By Ito's formula,

J^u(s, x) = V(s, x) + E^{s,x}[ ∫_s^{T∧τ} ( ∂V/∂t + A^{ut} V + f⁰(t, Xt, ut) ) dt ] ≥ V(s, x)

for all ut ∈ Uad. Moreover, equality holds if and only if ut satisfies (12.5). Note that

u* = argmin_{u∈U} { b(t, x, u) · ∇V(t, x) + (1/2) tr( σσ^t(t, x, u) ∇²V ) + f⁰(t, x, u) } = Ψ( t, x, ∇V(t, x), ∇²V(t, x) ),

so (12.5) is a Markov control. For example, consider the case when b(t, x, u) = f(x) + B(x)u, σ(t, x, u) = σ(t, x), and f⁰(t, x, u) = ℓ(x) + h(u). Then we have

u* = argmin_{u∈U} ( h(u) + ( ∇V(t, x), B(x)u ) ). □
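For h(u) = |u|²/2 and U = R^m this pointwise minimization has the closed form u* = −B(x)^t ∇V(t, x), which can be confirmed by brute force. A sketch with made-up values for B(x) and ∇V(t, x):

```python
import numpy as np

# Pointwise HJB minimization u* = argmin_u ( |u|^2/2 + (grad V, B u) ),
# whose closed form is u* = -B^t grad V, checked against a grid search.
Bx = np.array([[1.0, 0.0],
               [2.0, -1.0]])          # B(x) at some fixed x (made up)
gradV = np.array([0.5, -1.5])         # grad V(t, x) at that point (made up)
u_star = -Bx.T @ gradV                # closed-form minimizer

grid = np.linspace(-3.0, 3.0, 121)    # brute force over a grid around u_star
best_u, best_val = None, np.inf
for u1 in grid:
    for u2 in grid:
        u = np.array([u1, u2])
        val = 0.5 * u @ u + gradV @ (Bx @ u)
        if val < best_val:
            best_u, best_val = u, val
```

Because the objective is strictly convex in u, the grid minimum coincides with the closed-form minimizer when the grid contains it.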
Corollary (Discrete State) Consider the controlled Markov process on S = {1, · · · , n} with the transition probability P_{ij}^u. The value function V satisfies

(d/dt) V(t, i) + min_{u∈U} { f⁰(t, i, u) + Σ_j P_{ij}^u V(t, j) } = 0,

and the optimal control is given by

ut(i) = argmin_{u∈U} { f⁰(t, i, u) + Σ_j P_{ij}^u V(t, j) }.

In general, let {Xt} be a Markov process with the controlled generator A^u and minimize J^u(s, x) in (12.2) over admissible controls u. Then, by Dynkin's formula, the HJB theorem holds.
Example (Linear Quadratic Gaussian regulator problem)
Example (Optimal Portfolio selection)