Convergence of Markov Processes

Amanda Turner
University of Cambridge

Contents

1 Introduction
2 The Space DE [0, ∞)
  2.1 The Skorohod Topology
3 Convergence of Probability Measures
  3.1 The Prohorov Metric
  3.2 Examples
  3.3 The Skorohod Representation
4 Convergence of Finite Dimensional Distributions
5 Relative Compactness in DE [0, ∞)
  5.1 Prohorov's Theorem
  5.2 Compact Sets in DE [0, ∞)
  5.3 Some Useful Criteria
6 A Law of Large Numbers
  6.1 Preliminaries
  6.2 The Fluid Limit
  6.3 A Brief Look at the Exit Time
7 A Central Limit Theorem
  7.1 Relative Compactness
  7.2 Convergence of the Finite Dimensional Distributions
8 Applications
  8.1 Epidemics
  8.2 Logistic Growth
9 Conclusion
1 Introduction
This essay aims to give an account of the theory and applications of the convergence of stochastic
processes, and in particular Markov processes. This is developed as a generalisation of the
convergence of real-valued random variables using ideas mainly due to Prohorov and Skorohod.
Sections 2 to 5 cover the general theory, which is applied in Sections 6 to 8.
For random variables taking values in R, there are a number of types of convergence including almost sure convergence, convergence in probability and convergence in distribution. The
first two depend on the random variables being defined on the same probability space and are
consequently not sufficiently general. Convergence in distribution is weaker than the other two
types of convergence in the sense that if Xn → X almost surely or in probability, then Xn
converges to X in distribution. However, if Xn converges to X in distribution, then there exists
a probability space on which are defined random variables Yn and Y with the same distributions
as Xn and X such that Yn → Y almost surely, and hence in probability. In this way convergence
in distribution ‘incorporates’ the other types of convergence.
The notion of convergence for stochastic processes, that is random variables taking values
in some space of functions on [0, ∞), is even less straightforward. Once again, there is almost
sure convergence and convergence in probability, but one may also say that xn converges to x
if the finite-dimensional distributions converge, that is, if for any choice of times t1, . . . , tk,
(X^n_{t1}, . . . , X^n_{tk}) converges in distribution to (X_{t1}, . . . , X_{tk}). It turns out that a direct generalisation of convergence in distribution of random variables in R to stochastic processes in some
sense 'incorporates' all these types of convergence.
We restrict our attention to stochastic processes whose sample paths are right continuous
functions with left limits at every time point t, also known as cadlag functions. The reason for
this is that most processes which arise in applications have this property, and these functions
are reasonably well behaved. In order to be able to talk about almost sure convergence and
convergence in probability, we need a topology on the space of cadlag functions. It turns out to
be extremely difficult to construct a topology with useful properties on this space and Section
2 contains a very technical discussion of the construction and properties of such a topology, the
Skorohod topology.
For each stochastic process X, there is a unique probability measure on the space of cadlag
functions that characterises the distribution of X. As this probability measure is independent
of the probability space on which X is defined, it is easier to work with the probability measures
to obtain results on convergence. In Section 3 we construct a metric on the space of probability
measures that induces a topology equivalent to that generated by convergence in distribution of
the related stochastic processes. Using this we establish an equivalence between convergence in
distribution, almost sure convergence and convergence in probability.
Section 4 looks at the convergence of finite dimensional distributions. This viewpoint is used
to establish a key result, due to Prohorov, that states that a sequence of stochastic processes
converges if and only if it is relatively compact and the corresponding finite dimensional distributions converge. This gives the required equivalence between convergence in distribution and
convergence of finite dimensional distributions.
In order to apply the result from Section 4 in any practical situations, we need to have an
understanding of what it means for a sequence of stochastic processes to be relatively compact.
Prohorov’s Theorem, discussed in Section 5, establishes equivalent conditions in terms of the
compact sets on the space of cadlag functions. Characterising these compact sets in a way that
can be easily applied and putting these results together gives some useful necessary and sufficient
conditions for a family of stochastic processes to be relatively compact.
The remainder of the essay applies the general theory that has been built up so far to various
cases of interest. In Section 6 the Law of Large Numbers is generalised by showing that under
certain conditions a sequence of Markov jump processes can converge to the solution of an
ordinary differential equation. (The sense in which this is a generalisation of the Law of Large
Numbers is explained at the beginning of the section.)
This idea is developed further in Section 7, by showing that the fluctuations about this limit
converge in distribution to the solution of a stochastic differential equation, which generalises
the central limit theorem. Here the results from Section 4 and the characterisation of relative
compactness from Section 5 are applied to prove the convergence in distribution.
Finally, the applications of the large number and central limit results to some practical
situations are discussed. Particular mention is given to the application of these limit theorems
to population processes in biology.
The material in Sections 2 to 5 is broadly based on the approach of Ethier and Kurtz [4].
Sections 6 and 7 cover material from the paper of Darling and Norris [3], although the application
of the theorems from Section 4 and Section 5 is an extension of the results in this paper.
2 The Space DE [0, ∞)
Most stochastic processes arising in applications have right and left limits at each time point
for almost all sample paths. By convention we assume that the sample paths are in fact right
continuous where this can be done without changing the finite-dimensional distributions. For
this reason, the space of right continuous functions with left limits is of great importance and
in this section we explore its various properties and define a suitable metric on it. We conclude
the section by investigating the Borel σ-algebra that results from this metric.
Although in the applications to be discussed, the stochastic processes have sample paths
taking values in some subset of (Rd , | · |), where possible we establish results for processes with
sample paths taking values in a general metric space. Throughout this essay we shall denote
this metric space by (E, r), and define q to be the metric q = r ∧ 1.
Definition 2.1. DE [0, ∞) is the space of all right continuous functions x : [0, ∞) → E with left
limits, i.e. for each t ≥ 0, lim_{s↓t} x(s) = x(t) and lim_{s↑t} x(s) = x(t−) exists.
We begin with a result that shows that functions in DE [0, ∞) are fairly well behaved.
Lemma 2.2. If x ∈ DE [0, ∞), then x has at most countably many points of discontinuity.
Proof. The set of discontinuities of x is given by ⋃_{n=1}^∞ An, where An = {t > 0 : r(x(t), x(t−)) > 1/n},
so it is enough to show that each An is countable. Suppose we have distinct points t1, t2, . . . ∈
An with tm → t for some t, as m → ∞. By restricting to a subsequence if necessary, we may
assume that either tm ↑ t or tm ↓ t. Then lim_{m→∞} x(tm) = x(t−) = lim_{m→∞} x(tm−) or
lim_{m→∞} x(tm) = x(t) = lim_{m→∞} x(tm−), and so r(x(tm), x(tm−)) < 1/n for large enough m,
contradicting tm ∈ An. Therefore An has no limit points. But for each
T > 0, every sequence in the interval [0, T] has a convergent subsequence and so there are only
finitely many points of An in [0, T]. Hence An is countable, as required.
2.1 The Skorohod Topology
The results on convergence of probability measures that we shall prove in subsequent sections
are most applicable to complete separable metric spaces. For this reason we define a metric
on DE [0, ∞) under which it is separable and complete if (E, r) is separable and complete. In
particular DRd [0, ∞) will be separable and complete.
Definition 2.3. Let Λ′ be the collection of strictly increasing functions λ mapping [0, ∞) onto
[0, ∞) (in particular, λ(0) = 0, lim_{t→∞} λ(t) = ∞, and λ is continuous). Let Λ be the set of
Lipschitz continuous functions λ ∈ Λ′ such that

    γ(λ) = sup_{0≤s<t} |log((λ(t) − λ(s))/(t − s))| < ∞.

For x, y ∈ DE [0, ∞), define

    d(x, y) = inf_{λ∈Λ} [ γ(λ) ∨ ∫_0^∞ e^{−u} d(x, y, λ, u) du ],

where

    d(x, y, λ, u) = sup_{t≥0} q(x(t ∧ u), y(λ(t) ∧ u)).

The Skorohod topology is the topology induced on DE [0, ∞) by the metric d.
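Although d is defined as an infimum over all of Λ, evaluating the bracketed expression at any single time change already yields an upper bound on d(x, y), which is often enough to see the metric at work. The following Python sketch is an illustration added here, not part of the essay's development: it takes E = R with r(a, b) = |a − b|, restricts attention to the linear time changes λ(t) = ct (for which γ(λ) = |log c| exactly), truncates the integral at an arbitrary horizon T, and approximates the suprema on a grid.

```python
import math

def q(a, b):
    # q = r ∧ 1, with r the usual metric on E = R
    return min(abs(a - b), 1.0)

def d_lam_u(x, y, c, u, ts):
    # d(x, y, λ, u) = sup_{t≥0} q(x(t∧u), y(λ(t)∧u)) for λ(t) = c·t,
    # approximated by a maximum over the grid ts
    return max(q(x(min(t, u)), y(min(c * t, u))) for t in ts)

def skorohod_bound(x, y, c, T=10.0, n=300):
    # Evaluate γ(λ) ∨ ∫ e^{-u} d(x, y, λ, u) du at the single candidate
    # λ(t) = c·t; since d(x, y) is an infimum over Λ, any such evaluation
    # is an upper bound on d(x, y).
    ts = [T * k / n for k in range(n + 1)]
    du = T / n
    integral = sum(math.exp(-u) * d_lam_u(x, y, c, u, ts) * du for u in ts[1:])
    # the tail of the integral over u > T is at most e^{-T}, since q ≤ 1
    return max(abs(math.log(c)), integral + math.exp(-T))

# Two unit jumps at slightly different times: aligning the jumps with
# c = 1.1 gives a far smaller bound than the identity time change c = 1,
# which pays the full mismatch on [1, 1.1) at every relevant u.
x = lambda t: 1.0 if t >= 1.0 else 0.0
y = lambda t: 1.0 if t >= 1.1 else 0.0
```

Here c = 1.1 gives a bound of about |log 1.1| ≈ 0.095, while c = 1 gives roughly e^{−1}, illustrating why the infimum over time changes is essential.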
Proposition 2.4. The function d, defined above, is a metric on DE [0, ∞).
Proof. Suppose (xn)n≥1, (yn)n≥1 are sequences in DE [0, ∞). Then lim_{n→∞} d(xn, yn) = 0 if and
only if there exists a sequence (λn)n≥1 in Λ such that

    lim_{n→∞} γ(λn) = 0                                          (2.1)

and

    lim_{n→∞} µ{u ∈ [0, u0] : d(xn, yn, λn, u) ≥ ε} = 0          (2.2)

for every ε > 0 and u0 > 0, where µ is Lebesgue measure.
Now for all T > 0,

    T(e^{γ(λ)} − 1) = T(e^{sup_{0≤s<t} |log((λ(t)−λ(s))/(t−s))|} − 1)
                    = sup_{0≤s<t} T(e^{|log((λ(t)−λ(s))/(t−s))|} − 1)
                    ≥ sup_{0<t≤T} T(e^{|log(λ(t)/t)|} − 1)
                    = sup_{0<t≤T} T max{(λ(t) − t)/t, (t − λ(t))/λ(t)}
                    ≥ sup_{0<t≤T} max{λ(t) − t, t − λ(t)},

where the first inequality follows from setting s = 0 and bounding t by T, and the following
line follows by considering the cases log(λ(t)/t) ≥ 0 and < 0 separately. This gives us

    sup_{0≤t≤T} |λ(t) − t| ≤ T(e^{γ(λ)} − 1),                    (2.3)

by which (2.1) implies

    lim_{n→∞} sup_{0≤t≤T} |λn(t) − t| = 0                        (2.4)

for all T > 0.
Now suppose d(x, y) = 0. Then setting xn = x and yn = y for all n ∈ N, (2.2) and (2.4)
imply that x(t) = y(t) for almost all continuity points t of y. But by Lemma 2.2, y has at most
countably many points of discontinuity and so x(t) = y(t) for almost all points t and, as x and
y are right continuous, x = y.
Let x, y ∈ DE [0, ∞). Then since λ is bijective on [0, ∞),

    sup_{t≥0} q(x(t ∧ u), y(λ(t) ∧ u)) = sup_{t≥0} q(x(λ^{−1}(t) ∧ u), y(t ∧ u))

for all λ ∈ Λ and u ≥ 0, and so d(x, y, λ, u) = d(y, x, λ^{−1}, u). Also

    γ(λ) = sup_{0≤s<t} |log((λ(t) − λ(s))/(t − s))|
         = sup_{0≤s<t} |log((λ(λ^{−1}(t)) − λ(λ^{−1}(s)))/(λ^{−1}(t) − λ^{−1}(s)))|
         = sup_{0≤s<t} |log((t − s)/(λ^{−1}(t) − λ^{−1}(s)))|
         = sup_{0≤s<t} |log((λ^{−1}(t) − λ^{−1}(s))/(t − s))|
         = γ(λ^{−1})

for every λ ∈ Λ, and so d(x, y) = d(y, x).
To show that d is a metric it only remains to check the triangle inequality.
Let x, y, z ∈ DE [0, ∞), λ1, λ2 ∈ Λ, and u ≥ 0. Then

    sup_{t≥0} q(x(t ∧ u), z(λ2(λ1(t)) ∧ u))
        ≤ sup_{t≥0} q(x(t ∧ u), y(λ1(t) ∧ u)) + sup_{t≥0} q(y(λ1(t) ∧ u), z(λ2(λ1(t)) ∧ u))
        = sup_{t≥0} q(x(t ∧ u), y(λ1(t) ∧ u)) + sup_{t≥0} q(y(t ∧ u), z(λ2(t) ∧ u)),

i.e. d(x, z, λ2 ◦ λ1, u) ≤ d(x, y, λ1, u) + d(y, z, λ2, u). But λ2 ◦ λ1 ∈ Λ and

    γ(λ2 ◦ λ1) = sup_{0≤s<t} |log((λ2(λ1(t)) − λ2(λ1(s)))/(t − s))|
               = sup_{0≤s<t} |log((λ2(λ1(t)) − λ2(λ1(s)))/(λ1(t) − λ1(s))) + log((λ1(t) − λ1(s))/(t − s))|
               ≤ sup_{0≤s<t} |log((λ2(λ1(t)) − λ2(λ1(s)))/(λ1(t) − λ1(s)))| + sup_{0≤s<t} |log((λ1(t) − λ1(s))/(t − s))|
               = γ(λ2) + γ(λ1),

so we obtain d(x, z) ≤ d(x, y) + d(y, z), as required.
It is not immediately clear from the definition of the Skorohod topology under which conditions
sequences in DE [0, ∞) converge. The following two propositions establish necessary and
sufficient conditions for convergence in DE [0, ∞) which are easier to grasp intuitively.
Proposition 2.5. Let (xn)n≥1 and x be in DE [0, ∞). Then lim_{n→∞} d(xn, x) = 0 if and only if
there exists a sequence (λn)n≥1 in Λ such that (2.1) holds and

    lim_{n→∞} d(xn, x, λn, u) = 0 for all continuity points u of x.    (2.5)

In particular, lim_{n→∞} d(xn, x) = 0 implies that lim_{n→∞} xn(u) = lim_{n→∞} xn(u−) = x(u) for all
continuity points u of x.
Proof. By Lemma 2.2, x has only countably many discontinuity points and so, by the Reverse
Fatou Lemma and (2.5),

    lim_{n→∞} µ{u ∈ [0, u0] : d(xn, x, λn, u) ≥ ε}
        = lim sup_{n→∞} ∫ 1_{{u∈[0,u0] : d(xn,x,λn,u) ≥ ε}} dµ
        ≤ ∫ 1_{{u∈[0,u0] : lim sup_{n→∞} d(xn,x,λn,u) ≥ ε}} dµ
        ≤ µ{u ∈ [0, u0] : u is a discontinuity point of x}
        = 0,

i.e. (2.2) holds and so the conditions are sufficient.

Conversely, suppose that lim_{n→∞} d(xn, x) = 0 and that u is a continuity point of x. Then
there exists a sequence (λn)n≥1 in Λ such that (2.1) holds, and (2.2) holds with yn = x for all
n. In particular, there exists an increasing sequence (Nm)m≥1 such that for all n ≥ Nm

    µ{v ∈ (u, u + 1] : d(xn, x, λn, v) < 1/m} > 0.                (2.6)

Hence, for each Nm ≤ n < Nm+1, there exists un ∈ (u, u + 1] such that d(xn, x, λn, un) < 1/m.
By picking arbitrary values of un ∈ (u, u + 1] for n < N1, we obtain

    lim_{n→∞} sup_{t≥0} q(xn(t ∧ un), x(λn(t) ∧ un)) = lim_{n→∞} d(xn, x, λn, un) = 0.    (2.7)
Now

    d(xn, x, λn, u) = sup_{t≥0} q(xn(t ∧ u), x(λn(t) ∧ u))
                    ≤ sup_{t≥0} q(xn(t ∧ u), x(λn(t ∧ u) ∧ un))
                      + sup_{t≥0} q(x(λn(t ∧ u) ∧ un), x(λn(t) ∧ u)).

But

    sup_{t≥0} q(x(λn(t ∧ u) ∧ un), x(λn(t) ∧ u))
        = sup_{0≤t≤u} q(x(λn(t ∧ u) ∧ un), x(λn(t) ∧ u)) ∨ sup_{t>u} q(x(λn(t ∧ u) ∧ un), x(λn(t) ∧ u))
        = sup_{0≤t≤u} q(x(λn(t) ∧ un), x(λn(t) ∧ u)) ∨ sup_{t>u} q(x(λn(u) ∧ un), x(λn(t) ∧ u))
        = sup_{u≤s≤λn(u)∨u} q(x(s), x(u)) ∨ sup_{λn(u)∧u<s≤u} q(x(λn(u) ∧ un), x(s)),

where the third equality is obtained by setting s = λn(t) ∧ un in the first term and s = λn(t) ∧ u
in the second term. Hence

    d(xn, x, λn, u) ≤ sup_{0≤t≤u} q(xn(t ∧ un), x(λn(t) ∧ un))
                      + sup_{u≤s≤λn(u)∨u} q(x(s), x(u)) ∨ sup_{λn(u)∧u<s≤u} q(x(λn(u) ∧ un), x(s))    (2.8)

for each n. Thus lim_{n→∞} d(xn, x, λn, u) = 0 by (2.7), (2.4), and the continuity of x at u.
Proposition 2.6. Let (xn)n≥1 and x be in DE [0, ∞). Then lim_{n→∞} d(xn, x) = 0 if and only if
there exists a sequence (λn)n≥1 in Λ such that (2.1) holds and

    lim_{n→∞} sup_{0≤t≤T} r(xn(t), x(λn(t))) = 0                  (2.9)

for all T > 0.

Remark 2.7. The above proposition is equivalent to one with (2.9) replaced by

    lim_{n→∞} sup_{0≤t≤T} r(xn(λn(t)), x(t)) = 0.                 (2.10)
Proof. Suppose lim_{n→∞} d(xn, x) = 0. Then there exists a sequence (λn)n≥1 in Λ such that (2.1)
holds and (2.2) holds with yn = x for all n. In particular, by (2.6) with u = m, there exists a
sequence (un)n≥1 ⊂ (0, ∞) with un → ∞ and d(xn, x, λn, un) → 0, i.e.

    lim_{n→∞} sup_{t≥0} r(xn(t ∧ un), x(λn(t) ∧ un)) = 0.

But given T > 0, un ≥ T ∨ λn(T) for sufficiently large n (using (2.4)) and so the above equation
implies (2.9).

Conversely, suppose there exists a sequence (λn)n≥1 in Λ such that (2.1) and (2.9) hold.
Then for every continuity point u of x,

    sup_{0≤t≤u} q(xn(t ∧ un), x(λn(t) ∧ un)) = sup_{0≤t≤u} q(xn(t), x(λn(t))) ≤ sup_{0≤t≤u} r(xn(t), x(λn(t)))

for all n large enough that un > λn(u) ∨ u. And so, by (2.8) and the right continuity of x, (2.5)
holds. The result follows by Proposition 2.5.
Two simple examples of sequences that converge in (DE [0, ∞), d) may give some insight into
why the metric d is defined as above:
Example 2.8. Suppose (xn)n≥1 is a sequence in (DE [0, ∞), d) such that xn → x locally uniformly
for some x ∈ (DE [0, ∞), d). Then lim_{n→∞} sup_{0≤t≤T} r(xn(t), x(t)) = 0 and so, taking
λn(t) = t for all n in Proposition 2.6, xn → x with respect to d. Hence the Skorohod topology
is weaker than the (locally) uniform topology.
Example 2.9. Define a sequence (xn)n≥1 in (DE [0, ∞), d) by xn(s) = αn 1_{{tn ≤ s}}, where αn ∈ E
and tn ∈ [0, ∞). Suppose that tn → t and αn → α for some t ∈ [0, ∞) and α ∈ E. Intuitively
one would expect xn → x where x(s) = α 1_{{t ≤ s}}. However, in the locally uniform topology this
is not always the case; for example, if t < tn for all n and α ≠ 0, then xn(t) = 0 for all n,
whereas x(t) = α. In other words, the locally uniform topology is too strong. The functions
λn are introduced to allow small perturbations around the points of discontinuity of x. In this
example, if t ≠ 0, then tn > 0 for sufficiently large n and so we may set λn(s) = (t/tn)s. Then
lim_{n→∞} γ(λn) = lim_{n→∞} |log(t/tn)| = 0 and

    sup_{0≤s≤T} r(xn(s), x(λn(s))) = sup_{0≤s≤T} r(αn 1_{{tn ≤ s}}, α 1_{{tn ≤ s}}) → 0.

By Proposition 2.6, xn → x, as intuitively expected.
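The effect of the time change in Example 2.9 is easy to check numerically. The Python sketch below is an illustration added here, not part of the essay; it takes E = R with the usual metric and the particular choices α = t = 1, αn = tn = 1 + 1/n. The uniform distance from xn to x stays near |α| = 1 because the jump times never coincide, while the distance along λn(s) = (t/tn)s collapses to |αn − α| = 1/n.

```python
def sup_dist(f, g, T=2.0, n=4000):
    # approximate sup_{0<s<T} |f(s) - g(s)| using grid midpoints, which
    # avoids sampling exactly at the jump times (floating point ties there
    # would be delicate)
    return max(abs(f(s) - g(s)) for s in (T * (k + 0.5) / n for k in range(n)))

alpha, t = 1.0, 1.0
x = lambda s: alpha if s >= t else 0.0

def distances(n):
    an, tn = alpha + 1.0 / n, t + 1.0 / n       # alpha_n -> alpha, t_n -> t
    xn = lambda s: an if s >= tn else 0.0
    lam = lambda s: (t / tn) * s                # the time change of Example 2.9
    uniform = sup_dist(xn, x)                   # ~ alpha: jumps never line up
    time_changed = sup_dist(xn, lambda s: x(lam(s)))  # = |alpha_n - alpha|
    return uniform, time_changed
```

For instance, `distances(1000)` returns a uniform distance of 1.0 but a time-changed distance of only 0.001, exactly the behaviour the example predicts.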
We are now ready to prove that (DE [0, ∞), d) has the required properties. Note that while
separability is a topological property, completeness is a property of the metric.
Theorem 2.10. If E is separable, then DE [0, ∞) is separable. If the metric space (E, r) is
complete, then (DE [0, ∞), d) is complete.
Proof. Since E is separable, there exists a countable dense subset {α1, α2, . . .} ⊂ E. Let Γ be
the countable collection of elements of DE [0, ∞) of the form

    y(t) = α_{ik},  t_{k−1} ≤ t < tk, k = 1, . . . , n;    y(t) = α_{in},  t ≥ tn,    (2.11)

where 0 = t0 < t1 < · · · < tn are rationals, i1, . . . , in are positive integers and n ≥ 1. We shall
show that Γ is dense in DE [0, ∞). Given ε > 0, let x ∈ DE [0, ∞) and let T ∈ N be sufficiently
large that e^{−T} < ε/2. By Lemma 2.2, x has only finitely many points of discontinuity in the
interval (0, T), at s1, . . . , sm say, where 0 = s0 < s1 < · · · < sm < s_{m+1} = T. Since x has left
limits, for each j = 0, . . . , m, x is uniformly continuous on the interval [sj, s_{j+1}) and so there
exists some 0 < δj < s_{j+1} − sj such that if s, t ∈ [sj, s_{j+1}) and |s − t| < δj, then r(x(s), x(t)) < ε/4.
Let n ≥ 3T/(δ0 ∧ · · · ∧ δm) and let tk = kT/n for k = 0, . . . , n. For each k = 0, . . . , n − 1, there exists some
positive integer ik such that r(α_{ik}, x(tk)) < ε/4. Define y as in (2.11). For each j = 1, . . . , m with
nsj ∉ N, there exists some kj such that sj ∈ (t_{kj}, t_{kj+1}). Let

    t′_{kj} = (sj − e^{−ε} t_{kj+1})/(1 − e^{−ε}) ∈ (t_{kj−1}, sj)

and

    t′_{kj+1} = (sj − e^{ε} t_{kj+1})/(1 − e^{ε}) ∈ (t_{kj+1}, t_{kj+2}).

Note that, by the definition of n, |kj − ki| ≥ 3 for all i ≠ j and so the t′_{kj} are strictly increasing.
Let λ ∈ Λ be the strictly increasing piecewise linear function joining (t′_{kj}, t′_{kj}) to (t_{kj+1}, sj)
to (t′_{kj+1}, t′_{kj+1}) for those j = 1, . . . , m for which nsj ∉ N, and with gradient 1 otherwise. Then

    γ(λ) ≤ max_j |log((sj − t′_{kj})/(t_{kj+1} − t′_{kj}))| ∨ |log((t′_{kj+1} − sj)/(t′_{kj+1} − t_{kj+1}))| = ε

and, if t < T, then r(x(t), y(λ(t))) ≤ r(x(t), x(tk)) + r(x(tk), α_{ik}) < ε/2 for some k with
0 ≤ t − tk < T/n. Hence, by the definition of d,

    d(x, y) ≤ ε ∨ (ε/2 + e^{−T}) ≤ ε,

and so Γ is dense in DE [0, ∞).
To prove completeness, suppose that (xn)n≥1 is a Cauchy sequence in DE [0, ∞). There exist
1 ≤ N1 < N2 < · · · such that m, n ≥ Nk implies

    d(xn, xm) ≤ 2^{−k−1} e^{−k}.

Then if yk = x_{Nk} for k = 1, 2, . . ., there exist uk > k and λk ∈ Λ such that

    γ(λk) ∨ d(yk, y_{k+1}, λk, uk) ≤ 2^{−k}.

Let µ_{n,k} = λ_{k+n} ◦ · · · ◦ λ_k. Then γ(µ_{n,k}) ≤ Σ_{j=k}^{k+n} γ(λj) ≤ 2^{−k+1}. By a similar argument to that
used to prove (2.3), for any µ, λ ∈ Λ,

    e^{γ(µ)} ≥ sup_{t≠λ(t)} |(µ(λ(t)) − µ(t))/(λ(t) − t)|,

and so

    sup_{0≤t≤T} |µ(λ(t)) − µ(t)| ≤ sup_{0≤t≤T} |λ(t) − t| e^{γ(µ)}.

Hence, if n ≥ m, then

    sup_{0≤t≤T} |µ_{n,k}(t) − µ_{m,k}(t)| ≤ sup_{0≤t≤T} |µ_{n−m−1,k+m+1}(t) − t| e^{γ(µ_{m,k})}
                                         ≤ T(e^{γ(µ_{n−m−1,k+m+1})} − 1) e^{2^{−k+1}}
                                         ≤ T(e^{2^{−k−m}} − 1) e^{2^{−k+1}}
                                         → 0

as m → ∞, where the second inequality follows by (2.3). Hence µ_{n,k} converges uniformly on
compact intervals to a strictly increasing, Lipschitz continuous function µk, with γ(µk) ≤ 2^{−k+1},
i.e. µk ∈ Λ. Now

    sup_{t≥0} q(yk(µk^{−1}(t) ∧ uk), y_{k+1}(µ_{k+1}^{−1}(t) ∧ uk))
        = sup_{t≥0} q(yk(µk^{−1}(t) ∧ uk), y_{k+1}(λk(µk^{−1}(t)) ∧ uk))
        = sup_{t≥0} q(yk(t ∧ uk), y_{k+1}(λk(t) ∧ uk))
        = d(yk, y_{k+1}, λk, uk)
        ≤ 2^{−k}

for k = 1, 2, . . .. Since (E, r) is complete, zk = yk ◦ µk^{−1} ∈ DE [0, ∞) converges uniformly
on bounded intervals to a function y : [0, ∞) → E. As each zk ∈ DE [0, ∞), y ∈ DE [0, ∞)
(taking locally uniform limits preserves right continuity and the existence of left limits). Now
lim_{k→∞} γ(µk^{−1}) = lim_{k→∞} γ(µk) = 0 and

    lim_{k→∞} sup_{0≤t≤T} r(yk(µk^{−1}(t)), y(t)) = lim_{k→∞} sup_{0≤t≤T} r(zk(t), y(t)) = 0

for all T > 0 and so, by Proposition 2.6 (using Remark 2.7), lim_{k→∞} d(yk, y) = 0. Hence
(DE [0, ∞), d) is complete.
In order to study Borel probability measures on DE [0, ∞) it is important to know more about
SE, the Borel σ-algebra of DE [0, ∞). The following result states that SE is just the σ-algebra
generated by the coordinate random variables.

Proposition 2.11. For each t ≥ 0, define πt : DE [0, ∞) → E by πt(x) = x(t). Then

    SE ⊃ SE′ = σ(πt : 0 ≤ t < ∞).

If E is separable, then SE = SE′.
Proof. For each ε > 0, t ≥ 0, and f ∈ C̄(E), the space of real-valued bounded continuous
functions, define

    ftε(x) = (1/ε) ∫_t^{t+ε} f(πs(x)) ds.

Now suppose (xn)n≥1 is a sequence in DE [0, ∞) converging to x. Then there exists a sequence
(λn)n≥1 in Λ such that (2.1) and (2.10) hold. Then

    ftε(xn) = (1/ε) ∫ f(xn(λn(s))) 1_{{λn(t)≤s≤λn(t+ε)}} λ′n(s) ds
            → (1/ε) ∫ f(x(s)) 1_{{t≤s≤t+ε}} ds
            = ftε(x),

by dominated convergence, since xn(λn(s)) → x(s) uniformly on bounded intervals, f is bounded
and continuous, and λn(s) → s uniformly on bounded intervals, implying λ′n(s) → 1 for almost
every s. Hence ftε is a continuous function on DE [0, ∞) and so is Borel measurable. As
lim_{ε↓0} ftε(x) = f(πt(x)) for every x ∈ DE [0, ∞), f ◦ πt is Borel measurable for every f ∈ C̄(E)
and hence for every bounded measurable function f. In particular,

    πt^{−1}(Γ) = {x ∈ DE [0, ∞) : 1_Γ(πt(x)) = 1} ∈ SE

for all Borel subsets Γ ⊂ E, and hence SE ⊃ SE′.

Assume now that E is separable. Let n ≥ 1, let 0 = t0 < t1 < · · · < tn < t_{n+1} = ∞, and for
α0, α1, . . . , αn ∈ E define η(α0, α1, . . . , αn) ∈ DE [0, ∞) by

    η(α0, α1, . . . , αn)(t) = αi,   ti ≤ t < t_{i+1},   i = 0, 1, . . . , n.

Now

    d(η(α0, α1, . . . , αn), η(α′0, α′1, . . . , α′n)) ≤ max_{0≤i≤n} r(αi, α′i),

and so η is a continuous function from E^{n+1} into DE [0, ∞). Since E is separable, E^{n+1} is
separable and so there exists a countable dense subset A ⊂ E^{n+1}. For fixed z ∈ DE [0, ∞) and
ε > 0, the set

    Γ = {a ∈ E^{n+1} : d(z, η(a)) < ε}

is open by the continuity of η, and hence Γ = ⋃ {B(a, 1/n) : a ∈ A, n ≥ 1, B(a, 1/n) ⊂ Γ}
is a measurable subset of E^{n+1} with respect to the Borel product σ-algebra. So, since each
πt is SE′-measurable, d(z, η ◦ (π_{t0}, π_{t1}, . . . , π_{tn})) is an SE′-measurable function from DE [0, ∞)
into R. For m = 1, 2, . . ., define ηm as η but with n = m² and ti = i/m, i = 0, 1, . . .. By
an identical argument to that in the proof of the separability of DE [0, ∞) in Theorem 2.10,
d(x, ηm(π_{t0}(x), π_{t1}(x), . . . , π_{tn}(x))) → 0 as m → ∞ and hence

    lim_{m→∞} |d(z, ηm(π_{t0}(x), . . . , π_{tn}(x))) − d(z, x)| ≤ lim_{m→∞} d(x, ηm(π_{t0}(x), . . . , π_{tn}(x))) = 0

for every x ∈ DE [0, ∞). Therefore d(z, x) = lim_{m→∞} d(z, ηm(π_{t0}(x), . . . , π_{tn}(x))) is SE′-
measurable in x for fixed z ∈ DE [0, ∞) and, in particular, every open ball B(z, ε) = {x ∈
DE [0, ∞) : d(z, x) < ε} belongs to SE′. Since E (and hence, by Theorem 2.10, DE [0, ∞)) is
separable, SE′ contains all the open sets in DE [0, ∞) and hence contains SE.
3 Convergence of Probability Measures
In order to study the convergence of the distributions of stochastic processes, it is necessary to
understand the probability measures that characterise them. In this section we construct a
metric on the space of probability measures corresponding to the convergence, in distribution,
of the stochastic processes. Using this, we establish a relationship between convergence in
distribution and convergence in probability of processes defined on a common probability space.
This result is applied to sequences of Markov chains and diffusion processes to obtain some
simple conditions for convergence.
Where possible, results are proved for probability measures on a general metric space (S, d).
However, in practice, we generally take S = DE [0, ∞), and in particular DRd [0, ∞), with the
metric d defined in the previous section.
Notation 3.1. For a metric space (S, d):
B(S) is the σ-algebra of Borel subsets of S;
P(S) is the family of Borel probability measures on S;
C̄(S) is the space of real-valued bounded continuous functions on (S, d), with norm ‖f‖ = sup_{x∈S} |f(x)|;
C is the collection of closed subsets of S;
F^ε = {x ∈ S : inf_{y∈F} d(x, y) < ε}, where F ⊂ S.
Definition 3.2. A sequence (Pn)n≥1 in P(S) is said to converge weakly to P ∈ P(S) (denoted
Pn ⇒ P) if

    lim_{n→∞} ∫ f dPn = ∫ f dP    for all f ∈ C̄(S).

The distribution of an S-valued random variable X, denoted by PX^{−1}, is the element of P(S)
given by PX^{−1}(B) = P(X ∈ B), where P is the probability measure on the probability space
underlying X.

A sequence (Xn)n≥1 of S-valued random variables is said to converge in distribution to the
S-valued random variable X if PXn^{−1} ⇒ PX^{−1}, or equivalently, if

    lim_{n→∞} E(f(Xn)) = E(f(X))    for all f ∈ C̄(S).

This is denoted by Xn ⇒ X.
Remark 3.3. Note that this is a direct generalisation of the definition of convergence in distribution of a sequence of real-valued random variables (Xn )n≥1 , where we say that Xn ⇒ X if
limn→∞ E(f (Xn )) = E(f (X)) for all f ∈ C̄(R).
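This characterisation can be watched at work in a classical real-valued instance of weak convergence: if Xn ~ Binomial(n, λ/n) and X ~ Poisson(λ), then Xn ⇒ X, so E(f(Xn)) → E(f(X)) for every bounded continuous f. The Python sketch below is an illustration added here, not part of the essay; f(x) = 1/(1 + x) is one convenient bounded continuous test function, and both expectations are computed exactly from the probability mass functions.

```python
import math

def E_f_binomial(n, lam, f):
    # E f(X_n) for X_n ~ Binomial(n, lam/n), from the exact pmf
    p = lam / n
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) * f(k)
               for k in range(n + 1))

def E_f_poisson(lam, f, K=60):
    # E f(X) for X ~ Poisson(lam); the tail beyond K is negligible for
    # moderate lam since f is bounded
    term, total = math.exp(-lam), 0.0
    for k in range(K):
        total += term * f(k)
        term *= lam / (k + 1)
    return total

f = lambda x: 1.0 / (1.0 + x)   # bounded continuous: 0 < f <= 1
lam = 3.0
gaps = [abs(E_f_binomial(n, lam, f) - E_f_poisson(lam, f))
        for n in (10, 100, 1000)]
```

The gaps shrink as n grows; of course, convergence of E f for a single f is only one instance of the defining condition, which quantifies over all of C̄(R).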
3.1 The Prohorov Metric
We now define a metric ρ on P(S) with the property that a sequence of probability measures
converges with respect to ρ if and only if it converges weakly.
Definition 3.4. For P and Q ∈ P(S) the Prohorov metric is defined by
ρ(P, Q) = inf{ε > 0 : P (F ) ≤ Q(F ε ) + ε for all F ∈ C},
using the notation defined in 3.1.
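On a finite metric space every subset is closed, so ρ(P, Q) can in principle be computed directly from Definition 3.4 by checking all sets F. The Python sketch below is an illustration added here, not part of the essay: it searches for the infimum by bisection on ε, using the fact that the set of admissible ε is upward closed (F^ε grows with ε).

```python
from itertools import combinations

def prohorov(points, d, P, Q, tol=1e-6):
    # rho(P, Q) = inf{eps > 0 : P(F) <= Q(F^eps) + eps for all closed F};
    # on a finite space, F ranges over all (nonempty) subsets.
    idx = list(range(len(points)))
    subsets = [list(c) for r in range(1, len(idx) + 1)
               for c in combinations(idx, r)]

    def admissible(eps):
        for F in subsets:
            # F^eps = {x : inf_{y in F} d(x, y) < eps}
            F_eps = [x for x in idx
                     if min(d(points[x], points[y]) for y in F) < eps]
            if sum(P[x] for x in F) > sum(Q[x] for x in F_eps) + eps:
                return False
        return True

    # admissibility is monotone in eps, so bisect on [0, diameter + 1]
    lo = 0.0
    hi = 1.0 + max(d(points[a], points[b]) for a in idx for b in idx)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if admissible(mid) else (mid, hi)
    return hi

# two point masses at distance 1: rho = d(x, y) ∧ 1 = 1
pts = [0.0, 1.0]
dist = lambda a, b: abs(a - b)
rho_point_masses = prohorov(pts, dist, [1.0, 0.0], [0.0, 1.0])
```

For point masses δx and δy this recovers ρ = d(x, y) ∧ 1; the brute-force enumeration is of course exponential in the number of points and is meant only to make the definition concrete.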
In order to prove that ρ is a metric, the following lemma is needed.
Lemma 3.5. Let P, Q ∈ P(S) and α, β > 0. If

    P(F) ≤ Q(F^α) + β                                            (3.1)

for all F ∈ C, then

    Q(F) ≤ P(F^α) + β                                            (3.2)

for all F ∈ C.
Proof. Suppose F ∈ C. F^α is open, since if x ∈ F^α then there exists y ∈ F such that d(x, y) < α;
then d(z, y) ≤ d(z, x) + d(x, y) < α for all z ∈ B(x, α − d(x, y)), and so B(x, α − d(x, y)) ⊂ F^α.
Hence, if G = S \ F^α, then G ∈ C. Moreover F ⊂ S \ G^α, since if x ∈ G^α then there exists some y ∉ F^α
such that d(x, y) < α; then d(y, z) ≥ α for all z ∈ F, and so d(x, z) ≥ d(y, z) − d(x, y) > 0 for
all z ∈ F, i.e. x ∉ F. Substituting G into (3.1) gives

    P(F^α) = 1 − P(G) ≥ 1 − Q(G^α) − β ≥ Q(F) − β.
Proposition 3.6. The function ρ, defined above, is a metric on P(S).
Proof. By the above lemma,

    ρ(P, Q) = inf{ε > 0 : P(F) ≤ Q(F^ε) + ε for all F ∈ C}
            = inf{ε > 0 : Q(F) ≤ P(F^ε) + ε for all F ∈ C}
            = ρ(Q, P).

If ρ(P, Q) = 0, then there exists a sequence (εn)n≥1 with εn → 0 such that P(F) ≤ Q(F^{εn}) + εn
for all n. Letting n → ∞ and using the continuity of probability measures gives P(F) ≤ Q(F)
for all F ∈ C. By the above symmetry between P and Q, P(F) = Q(F) for all F ∈ C and hence
for all F ∈ B(S). Therefore ρ(P, Q) = 0 if and only if P = Q.
Finally, if P, Q, R ∈ P(S) and δ > 0, ε > 0 satisfy ρ(P, Q) < δ and ρ(Q, R) < ε, then

    P(F) ≤ Q(F^δ) + δ ≤ R((F^δ)^ε) + δ + ε ≤ R(F^{δ+ε}) + δ + ε

for all F ∈ C, so ρ(P, R) ≤ δ + ε and hence ρ(P, R) ≤ inf_{δ>ρ(P,Q)} δ + inf_{ε>ρ(Q,R)} ε = ρ(P, Q) +
ρ(Q, R), as required.
Theorem 3.7. Let (Pn )n≥1 be a sequence in P(S) and P ∈ P(S). If S is separable, then
limn→∞ ρ(Pn , P ) = 0 if and only if Pn ⇒ P .
Proof. Suppose lim_{n→∞} ρ(Pn, P) = 0. For each n, let εn = ρ(Pn, P) + 1/n. Given f ∈ C̄(S) with
f ≥ 0,

    ∫ f dPn = ∫_0^{‖f‖} Pn(f ≥ t) dt ≤ ∫_0^{‖f‖} P({f ≥ t}^{εn}) dt + εn ‖f‖

for every n and so, by dominated convergence,

    lim sup_{n→∞} ∫ f dPn ≤ lim_{n→∞} ∫_0^{‖f‖} P({f ≥ t}^{εn}) dt = ∫_0^{‖f‖} P(f ≥ t) dt = ∫ f dP.

Hence, for all f ∈ C̄(S),

    lim sup_{n→∞} ∫ (‖f‖ + f) dPn ≤ ∫ (‖f‖ + f) dP    and    lim sup_{n→∞} ∫ (‖f‖ − f) dPn ≤ ∫ (‖f‖ − f) dP.

Therefore

    ∫ f dP ≤ ‖f‖ − lim sup_{n→∞} ∫ (‖f‖ − f) dPn
           = lim inf_{n→∞} ∫ f dPn
           ≤ lim sup_{n→∞} ∫ f dPn
           = lim sup_{n→∞} ∫ (‖f‖ + f) dPn − ‖f‖
           ≤ ∫ f dP,

and so we must have equality throughout. Thus lim_{n→∞} ∫ f dPn = ∫ f dP for all f ∈ C̄(S), i.e.
Pn ⇒ P.
Conversely, suppose Pn ⇒ P. We first establish some preliminary results for open and closed
subsets of S. Let F ∈ C and for each ε > 0 define fε ∈ C̄(S) by

    fε(x) = (1 − d(x, F)/ε) ∨ 0,

where d(x, F) = inf_{y∈F} d(x, y). Then fε ≥ 1_F for all ε > 0 and so

    lim sup_{n→∞} Pn(F) ≤ lim_{n→∞} ∫ fε dPn = ∫ fε dP

for each ε > 0. Therefore, by dominated convergence,

    lim sup_{n→∞} Pn(F) ≤ lim_{ε→0} ∫ fε dP = P(F).

If G ⊂ S is open, then

    lim inf_{n→∞} Pn(G) = 1 − lim sup_{n→∞} Pn(G^c) ≥ 1 − P(G^c) = P(G).

Now let ε > 0. Since S is separable (note: this is the only point in the proof where we use
separability), there exists a countable dense subset {x1, x2, . . .} ⊂ S. Let Ei = B(xi, ε/4) for
i = 1, 2, . . .. Then P(⋃_{i=1}^∞ Ei) = P(S) = 1 and so there exists some smallest integer N such
that P(⋃_{i=1}^N Ei) > 1 − ε/2. Now let G be the collection of open sets of the form (⋃_{i∈I} Ei)^{ε/2},
where I ⊂ {1, . . . , N}. Since G is finite, by the above result on open sets, there exists some N0 such
that P(G) ≤ Pn(G) + ε/2 for all G ∈ G and n ≥ N0. Given F ∈ C, let

    F0 = ⋃ {Ei : 1 ≤ i ≤ N, Ei ∩ F ≠ ∅}.

Then F0^{ε/2} ∈ G and so

    P(F) ≤ P(F0) + P(⋃_{i=N+1}^∞ Ei)
         ≤ P(F0^{ε/2}) + ε/2
         ≤ Pn(F0^{ε/2}) + ε
         ≤ Pn(F^ε) + ε

for all n ≥ N0, where the first inequality is by F ⊂ F0 ∪ (⋃_{i=N+1}^∞ Ei), the second by the definition
of N, the third by the definition of N0, and the fourth because each Ei has diameter at most ε/2,
so that F0^{ε/2} ⊂ F^ε. Hence ρ(Pn, P) ≤ ε for each n ≥ N0, i.e. lim_{n→∞} ρ(Pn, P) = 0.
Definition 3.8. Let P, Q ∈ P(S). Define M(P, Q) to be the set of all µ ∈ P(S × S) with
marginals P and Q i.e. µ(A × S) = P (A) and µ(S × A) = Q(A) for all A ∈ B(S).
The following lemma provides a probabilistic interpretation of the Prohorov metric.

Lemma 3.9.

    ρ(P, Q) ≤ inf_{µ∈M(P,Q)} inf{ε > 0 : µ((x, y) : d(x, y) ≥ ε) ≤ ε}.
Proof. If for some ε > 0 and µ ∈ M(P, Q) we have

    µ((x, y) : d(x, y) ≥ ε) ≤ ε,

then

    P(F) = µ(F × S)
         ≤ µ((F × S) ∩ {(x, y) : d(x, y) < ε}) + µ((x, y) : d(x, y) ≥ ε)
         ≤ µ(S × F^ε) + ε
         = Q(F^ε) + ε

for all F ∈ C, and so ρ(P, Q) ≤ ε. The result follows.
In fact, in the case when S is separable, the inequality in the above lemma can be replaced
by an equality. (The proof is an immediate consequence of Lemma 3.11).
Proposition 3.10. Let (S, d) be separable. Suppose that Xn, n = 1, 2, . . ., and X are S-valued
random variables defined on the same probability space with distributions Pn, n = 1, 2, . . ., and
P respectively. If d(Xn, X) → 0 in probability as n → ∞, then Pn ⇒ P.

Proof. For n = 1, 2, . . ., let µn be the joint distribution of Xn and X. Then

    lim_{n→∞} µn((x, y) : d(x, y) ≥ ε) = lim_{n→∞} P(d(Xn, X) ≥ ε) = 0,

where P is the probability measure on the probability space underlying the Xn and X. By Lemma
3.9, lim_{n→∞} ρ(Pn, P) = 0 and, since S is separable, the result follows by Theorem 3.7.
3.2 Examples
Proposition 3.10 suggests a method of proving that a sequence of probability measures converges
weakly: construct random variables with the required distributions on a common probability
space and show that they converge in probability. We illustrate this method with three examples:
discrete time Markov chains, continuous time Markov chains and diffusion processes.
3.2.1 Discrete Time Markov Chains
Suppose that E is countable and that (X^N)_{N≥1} and X are discrete time Markov chains with
initial distributions (λ^N)_{N≥1} and λ, and transition matrices (P^N)_{N≥1} and P respectively. We
will show that if P^N → P and λ^N → λ uniformly, then X^N ⇒ X.

We construct random variables (Y^N)_{N≥1} and Y with the required distributions on the probability
space ([0, 1), B, µ), where B is the Borel σ-algebra and µ is Lebesgue measure. Without
loss of generality we may assume E = N (set the relevant probabilities to zero if E is finite).
For each ω ∈ [0, 1), construct a sequence a(ω) = (in)n≥0 of elements of N inductively as
follows:

Since Σ_{i=0}^∞ λi = 1, there exists a smallest i0 ∈ N such that ω < Σ_{i=0}^{i0} λi. Set a(ω)0 = i0,
and let µ_{i0} = Σ_{i=0}^{i0−1} λi.

Since µ_{i0} ≤ ω < µ_{i0} + λ_{i0}, and Σ_{i=0}^∞ p_{i0 i} = 1, there exists some smallest i1 such that
ω < µ_{i0} + λ_{i0} Σ_{i=0}^{i1} p_{i0 i}. Set a(ω)1 = i1, and let µ_{i0,i1} = µ_{i0} + λ_{i0} Σ_{i=0}^{i1−1} p_{i0 i}.

Suppose we have constructed a(ω)m = im and µ_{i0,...,im} for all m < n, such that µ_{i0,...,im} ≤
ω < µ_{i0,...,im} + λ_{i0} p_{i0 i1} ··· p_{im−1 im}. Since Σ_{i=0}^∞ p_{in−1 i} = 1, there exists some smallest in such
that ω < µ_{i0,...,in−1} + λ_{i0} p_{i0 i1} ··· p_{in−2 in−1} Σ_{i=0}^{in} p_{in−1 i}. Set a(ω)n = in, and let
µ_{i0,...,in} = µ_{i0,...,in−1} + λ_{i0} p_{i0 i1} ··· p_{in−2 in−1} Σ_{i=0}^{in−1} p_{in−1 i}.
Define a discrete time process (Yn )n≥0 on ([0, 1), B, µ) by setting Yn (ω) = a(ω)n . Then
    P(Y0 = i0, Y1 = i1, . . . , Yn = in) = µ(ω : µ_{i0,...,in} ≤ ω < µ_{i0,...,in} + λ_{i0} p_{i0 i1} ··· p_{in−1 in})
                                        = λ_{i0} p_{i0 i1} ··· p_{in−1 in},
and so (Yn )n≥0 is a Markov chain with initial distribution λ and transition matrix P . For each
N ∈ N, construct (Y^N_n)n≥0 with initial distribution λ^N and transition matrix P^N similarly.
Each µ_{i0,...,in} is a continuous function of finitely many entries of λ and P, and λ^N → λ and
P^N → P uniformly. Hence µ^N_{i0,...,in} → µ_{i0,...,in}. Therefore, if µ_{i0,...,in} < ω < µ_{i0,...,in+1},
then there exists some N0 ∈ N such that N ≥ N0 implies that

    µ^N_{i0,...,in} < ω < µ^N_{i0,...,in+1}
and hence Y^N_m(ω) = Ym(ω) for all m ≤ n. In other words, provided ω ≠ µ_{i0,...,in} for any i0, . . . , in,
Y^N(ω) → Y(ω). But only a countable number of elements of [0, 1) are equal to µ_{i0,...,in} for some
i0, . . . , in and so limN→∞ Y^N = Y almost surely. By Proposition 3.10, X^N ⇒ X as required.
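The nested-interval construction above is easy to mechanise. The following sketch (the helper name `chain_from_uniform` is my own, finite state spaces only) reads the first few states of the chain off a single ω ∈ [0, 1) by repeatedly subdividing the interval containing ω, first according to λ and then according to the relevant row of P:

```python
def chain_from_uniform(omega, lam, P, steps):
    """First steps+1 states Y_0(omega), ..., Y_steps(omega) of the chain with
    initial distribution lam and transition matrix P, read off a single
    omega in [0, 1) by nested subdivision of the unit interval."""
    lo, width = 0.0, 1.0   # current interval [lo, lo + width) containing omega
    path, probs = [], lam
    for _ in range(steps + 1):
        acc = lo
        for i, p in enumerate(probs):
            if omega < acc + width * p:   # omega falls in the i-th sub-interval
                path.append(i)
                lo, width = acc, width * p
                break
            acc += width * p
        probs = P[path[-1]]               # subdivide next by the row of P
    return path

lam = [0.5, 0.5]
P = [[0.7, 0.3], [0.4, 0.6]]
```

For example, the set of ω producing the cylinder (Y0, Y1) = (0, 1) is exactly [0.35, 0.5), of Lebesgue measure λ0 p01 = 0.15. Since the subdivision points depend continuously on finitely many entries of λ and P, perturbing those entries moves the path of a fixed ω only on a small exceptional set, which is the mechanism behind the convergence argument above.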
3.2.2 Continuous Time Markov Chains
Suppose that E is countable and that (X N )N ≥1 and X are continuous time Markov Chains with
initial distributions (λN )N ≥1 and λ, and generator matrices (QN )N ≥1 and Q respectively. We
will show that if QN → Q and λN → λ uniformly, then X N ⇒ X.
We construct random variables (Z N )N ≥1 and Z with the required distributions on a common
probability space (Ω, F, P), using a construction due to Norris [9]. As in the discrete case we
shall assume that E = N.
Let (ΠN )N ≥1 and Π be the jump matrices corresponding to the generator matrices (QN )N ≥1
and Q respectively. Since QN → Q uniformly, the corresponding jump matrices ΠN → Π
uniformly and, by the discrete case above, there exist discrete time Markov chains (Y N )N ≥1 and
Y with initial distributions (λN )N ≥1 and λ, and transition matrices (ΠN )N ≥1 and Π respectively,
such that limN →∞ Y N = Y almost surely. By discarding a set of measure zero if necessary, we
may assume that Y N (ω) → Y (ω) for all ω ∈ Ω. Let T1 , T2 , . . . be independent exponential
random variables of parameter 1, independent of (Y^N)N≥1 and Y. Defining q(i) = −q_{ii}, set
S_n = T_n / q(Y_{n−1}), J_n = S_1 + ··· + S_n, and

    Z_t = Y_n   if J_n ≤ t < J_{n+1} for some n,
          ∞     otherwise.
Then the S_n are independent exponential random variables with parameters q(Y_{n−1}) and so Z has
the required distribution. Define S^N_n, J^N_n and Z^N similarly for N ≥ 1. Since Y^N(ω) → Y(ω)
for all ω ∈ Ω, given ω ∈ Ω, for each n there exists some N_n such that N ≥ N_n implies that
Y^N_m(ω) = Ym(ω) for all m ≤ n. Then, since Q^N → Q, if N ≥ N_n, then

    S^N_m(ω) = T_m(ω) / q^N(Y^N_{m−1}(ω)) = T_m(ω) / q^N(Y_{m−1}(ω)) → T_m(ω) / q(Y_{m−1}(ω)) = S_m(ω)

for all m ≤ n + 1 and hence J^N_m(ω) → J_m(ω) for all m ≤ n + 1. By
the same argument as that used to prove that DE [0, ∞) is separable in Theorem 2.10, it follows
that d(Z N (ω), Z(ω)) → 0 as N → ∞, and so limN →∞ Z N = Z almost surely. By Proposition
3.10, X N ⇒ X, as required.
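The construction of Z from its jump chain and independent Exp(1) variables can be sketched as follows (the helper names are my own; the jump-chain path and the Exp(1) samples are supplied as lists):

```python
def jump_times(Y, T, q):
    """J_0 = 0 and J_n = S_1 + ... + S_n with S_n = T_n / q(Y_{n-1}),
    where T_1, T_2, ... are Exp(1) samples and q(i) = -q_ii."""
    J = [0.0]
    for n, Tn in enumerate(T, start=1):
        J.append(J[-1] + Tn / q(Y[n - 1]))
    return J

def Z_at(t, Y, J):
    """Z_t = Y_n if J_n <= t < J_{n+1} for some n, else None
    (t lies beyond the constructed portion of the path)."""
    for n in range(len(J) - 1):
        if J[n] <= t < J[n + 1]:
            return Y[n]
    return None

Y = [0, 1, 0]                          # a jump-chain path
q = lambda i: 2.0 if i == 0 else 1.0   # rates q(0) = 2, q(1) = 1
J = jump_times(Y, [1.0, 2.0], q)       # J = [0, 0.5, 2.5]
```

Scaling T_n by q(Y_{n−1}) makes the holding time in state i exponential with parameter q(i); if q^N → q, the jump times J^N computed from the same T_n and (eventually identical) jump chains converge to J, which is exactly the pointwise convergence used above.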
3.2.3 Diffusion Processes
Suppose that (an )n≥1 and a are bounded symmetric uniformly positive definite Lipschitz functions Rd → Rd ⊗ Rd , and that (bn )n≥1 and b are bounded Lipschitz functions Rd → Rd . Let
(X n )n≥1 and X be diffusion processes in Rd with diffusivities (an )n≥1 and a, and drifts (bn )n≥1
and b respectively, starting from x ∈ Rd . We shall show that if an → a and bn → b uniformly,
then X n ⇒ X.
Let (Bt )t≥0 be a Brownian Motion in Rd . We shall construct diffusions (Z n )n≥1 and Z, with
the required distributions, on the probability space underlying (Bt )t≥0 .
Since a(x) is symmetric positive definite for all x ∈ Rd , there exists a unique symmetric
positive definite map σ : Rd → Rd ⊗ (Rd )∗ such that σ(x)σ(x)∗ = a(x). Furthermore, since a
is bounded and Lipschitz, σ is bounded and Lipschitz. Similarly, there exist unique symmetric
positive definite bounded Lipschitz maps (σn )n≥1 and, since an → a uniformly, σn → σ uniformly. Assume that σ, σn , b and bn have Lipschitz constant K, independent of n. Since σ, σn ,
b and bn are Lipschitz, there exist continuous processes Z and Z n for n = 1, 2, . . ., adapted to
the filtration generated by (Bt )t≥0 satisfying
    dZ_t = σ(Z_t)dB_t + b(Z_t)dt,       Z_0 = x,
    dZ^n_t = σ_n(Z^n_t)dB_t + b_n(Z^n_t)dt,   Z^n_0 = x.
Z^n and Z have the required distributions. Using (x + y)² ≤ 2x² + 2y²,

    |Z^n_t − Z_t|² = |∫_0^t (σ_n(Z^n_s) − σ(Z_s))dB_s + ∫_0^t (b_n(Z^n_s) − b(Z_s))ds|²
                  ≤ 2|∫_0^t (σ_n(Z^n_s) − σ(Z_s))dB_s|² + 2|∫_0^t (b_n(Z^n_s) − b(Z_s))ds|².
By Doob’s L² inequality,

    E(sup_{s≤t} |∫_0^s (σ_n(Z^n_r) − σ(Z_r))dB_r|²) ≤ 4E(∫_0^t |σ_n(Z^n_s) − σ(Z_s)|² ds),
and by the Cauchy–Schwarz inequality,

    E(sup_{s≤t} |∫_0^s (b_n(Z^n_r) − b(Z_r))dr|²) ≤ tE(∫_0^t |b_n(Z^n_s) − b(Z_s)|² ds).
Now since K is a Lipschitz constant for σ_n,

    |σ_n(Z^n_r) − σ(Z_r)| ≤ |σ_n(Z^n_r) − σ_n(Z_r)| + |σ_n(Z_r) − σ(Z_r)| ≤ K|Z^n_r − Z_r| + ‖σ_n − σ‖,

and similarly

    |b_n(Z^n_r) − b(Z_r)| ≤ K|Z^n_r − Z_r| + ‖b_n − b‖.
Hence

    E(sup_{s≤t} |Z^n_s − Z_s|²) ≤ 8E(∫_0^t |σ_n(Z^n_s) − σ(Z_s)|² ds) + 2tE(∫_0^t |b_n(Z^n_s) − b(Z_s)|² ds)
        ≤ 16E(∫_0^t (K²|Z^n_s − Z_s|² + ‖σ_n − σ‖²)ds) + 4tE(∫_0^t (K²|Z^n_s − Z_s|² + ‖b_n − b‖²)ds)
        ≤ 16t‖σ_n − σ‖² + 4t²‖b_n − b‖² + (16 + 4t)K² ∫_0^t E(sup_{r≤s} |Z^n_r − Z_r|²)ds.
Given ε > 0 and T > 0, set c = (16 + 4T)K² and ε′ = εe^{−cT}. Since σ_n → σ and b_n → b
uniformly, there exists N ∈ N such that n ≥ N implies ‖σ_n − σ‖ < √(ε′/(32T)) and
‖b_n − b‖ < √(ε′/(8T²)). If f_n(t) = E(sup_{s≤t} |Z^n_s − Z_s|²), then

    f_n(t) ≤ ε′ + c ∫_0^t f_n(s)ds,

for all n ≥ N and 0 ≤ t ≤ T. By Gronwall’s Inequality (Lemma 6.9), f_n(t) ≤ ε′e^{ct} ≤ ε and so
E(sup_{s≤t} |Z^n_s − Z_s|²) → 0 as n → ∞. In particular, sup_{s≤t} |Z^n_s − Z_s| → 0 in probability. Taking
λ_n(t) = t for all t, Proposition 2.6 implies that d(Z^n, Z) → 0 in probability. By Proposition
3.10, X^n ⇒ X, as required.
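The key device in this argument is that Z^n and Z are driven by the same Brownian path, so their difference can be controlled pathwise. A minimal numerical sketch (Euler–Maruyama discretisation in one dimension, with my own helper names; this is an illustration, not the essay's construction) couples the two equations through shared Gaussian increments:

```python
import math, random

def coupled_em(sigma, b, sigma_n, b_n, x0, T, steps, seed=0):
    """Euler-Maruyama schemes for dZ = sigma(Z)dB + b(Z)dt and
    dZ^n = sigma_n(Z^n)dB + b_n(Z^n)dt driven by the SAME Brownian
    increments; returns sup over the grid of |Z^n - Z|."""
    rng = random.Random(seed)
    dt = T / steps
    z, zn, gap = x0, x0, 0.0
    for _ in range(steps):
        dB = rng.gauss(0.0, math.sqrt(dt))
        z, zn = (z + sigma(z) * dB + b(z) * dt,
                 zn + sigma_n(zn) * dB + b_n(zn) * dt)
        gap = max(gap, abs(zn - z))
    return gap

one, zero = (lambda z: 1.0), (lambda z: 0.0)
gap5 = coupled_em(one, zero, lambda z: 1.2, zero, 0.0, 1.0, 200)    # sigma_n = 1 + 1/5
gap50 = coupled_em(one, zero, lambda z: 1.02, zero, 0.0, 1.0, 200)  # sigma_n = 1 + 1/50
```

With constant coefficients the coupled gap is exactly |σ_n − σ| · sup|B| over the grid, so shrinking σ_n − σ by a factor of 10 shrinks the gap by the same factor; for Lipschitz coefficients the Gronwall bound above plays the analogous role.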
3.3 The Skorohod Representation
A converse to Proposition 3.10 exists, in the form of the Skorohod Representation. Before we
can prove this, we need the following lemma.
Lemma 3.11. Let S be separable. Let P, Q ∈ P(S), ε > 0 satisfy ρ(P, Q) < ε, and δ > 0.
Suppose that E1, . . . , EN ∈ B(S) are disjoint with diameters less than δ and that P(E0) ≤ δ,
where E0 = S \ ∪_{i=1}^N Ei. Then there exist constants c1, . . . , cN ∈ [0, 1] and independent random
variables X, Y0, . . . , YN (S-valued) and ξ ([0, 1]-valued) on some probability space (Ω, F, P) such
that X has distribution P, ξ is uniformly distributed on [0, 1],

    Y = Yi   on {X ∈ Ei, ξ ≥ ci}, i = 1, . . . , N,
        Y0   on {X ∈ E0} ∪ ∪_{i=1}^N {X ∈ Ei, ξ < ci}

has distribution Q,

    {d(X, Y) ≥ δ + ε} ⊂ {X ∈ E0} ∪ {ξ < max{ε/P(Ei) : P(Ei) > 0}},

and

    P(d(X, Y) ≥ δ + ε) ≤ δ + ε.
Proof. This lemma is not proved here as the proof is long and not very illuminating. The
interested reader is referred to pp. 97–101 of Ethier and Kurtz [4].
Theorem 3.12 (The Skorohod Representation). Let (S, d) be separable. Suppose Pn ,
n = 1, 2, . . . and P in P(S) satisfy Pn ⇒ P . Then there exists a probability space (Ω, F, P)
on which are defined S-valued random variables Xn , n = 1, 2, . . . and X with distributions Pn ,
n = 1, 2, . . ., and P respectively, such that limn→∞ Xn = X almost surely.
Proof. Let {x1, x2, . . .} be a dense subset of S. Then P(∪_{i=1}^∞ B(xi, 2^{−k})) = 1 for each k and
so, given ε > 0, there exist integers N1, N2, . . . such that

    P(∪_{i=1}^{Nk} B(xi, 2^{−k})) ≥ 1 − 2^{−k}

for k = 1, 2, . . .. Set E_i^{(k)} = B(xi, 2^{−k}) and E_0^{(k)} = S \ ∪_{i=1}^{Nk} E_i^{(k)}. Assume (without loss of
generality) that εk = min_{1≤i≤Nk} P(E_i^{(k)}) > 0. Define the sequence (kn)n≥1 by

    kn = 1 ∨ max{k ≥ 1 : ρ(Pn, P) < εk/k}.
Apply Lemma 3.11 with Q = Pn, ε = ε_{kn}/kn if kn > 1 and ε = ρ(Pn, P) + 1/n if kn = 1, δ = 2^{−kn},
Ei = E_i^{(kn)}, and N = N_{kn}, for n = 1, 2, . . .. Then there exists a probability space (Ω, F, P) on
which are defined S-valued random variables Y_0^{(n)}, . . . , Y_{Nkn}^{(n)}, n = 1, 2, . . ., a random variable
ξ, uniformly distributed on [0, 1], and an S-valued random variable X with distribution P, all
of which are independent, such that if the constants c_1^{(n)}, . . . , c_{Nkn}^{(n)} ∈ [0, 1], n = 1, 2, . . ., are
appropriately chosen, then the random variable

    Xn = Y_i^{(n)}   on {X ∈ E_i^{(kn)}, ξ ≥ c_i^{(n)}}, i = 1, . . . , N_{kn},
         Y_0^{(n)}   on {X ∈ E_0^{(kn)}} ∪ ∪_{i=1}^{Nkn} {X ∈ E_i^{(kn)}, ξ < c_i^{(n)}}

has distribution Pn and

    {d(Xn, X) ≥ 2^{−kn} + ε_{kn}/kn} ⊂ {X ∈ E_0^{(kn)}} ∪ {ξ < 1/kn}

if kn > 1, for n = 1, 2, . . ..
Since Pn ⇒ P, by Theorem 3.7, ρ(Pn, P) → 0. Hence, for each k ∈ N, ρ(Pn, P) < εk/k
for sufficiently large n, and so kn ≥ k for sufficiently large n. If Kn = min_{m≥n} km, then
limn→∞ Kn = ∞. However, for Kn > 1,

    P(∪_{m=n}^∞ {d(Xm, X) ≥ 2^{−km} + ε_{km}/km}) ≤ Σ_{k=Kn}^∞ P(X ∈ E_0^{(k)}) + P(ξ < 1/Kn)
                                                 ≤ 2^{−Kn+1} + 1/Kn → 0.

So limn→∞ Xn = X almost surely.
We conclude this section by giving an application of this theorem.
Corollary 3.13 (The Continuous Mapping Theorem). Let (S, d) and (S ′ , d′ ) be separable
metric spaces and let h : S → S ′ be Borel measurable. Suppose that Pn , n = 1, 2, . . . and P in
P(S) satisfy Pn ⇒ P , and define Qn , n = 1, 2, . . . and Q in P(S ′ ) by Qn = Pn h−1 , Q = P h−1 .
(By definition, P h−1 (B) = P (s ∈ S : h(s) ∈ B).) Let Ch be the set of points of S at which h is
continuous. If P (Ch ) = 1, then Qn ⇒ Q on S ′ .
Proof. By Theorem 3.12, there exists a probability space (Ω, F, P) on which are defined S-valued random variables Xn, n = 1, 2, . . ., and X with distributions Pn, n = 1, 2, . . ., and P
respectively, such that limn→∞ Xn = X almost surely. Since P(X ∈ Ch ) = P (Ch ) = 1, we have
limn→∞ h(Xn ) = h(X) almost surely, and so, by Proposition 3.10, Qn ⇒ Q in S ′ .
4 Convergence of Finite Dimensional Distributions
We now get to a key result, due to Prohorov, on characterizing convergent processes. This states
that a sequence of stochastic processes converges if and only if it is relatively compact and the
finite dimensional distributions converge.
This is a particularly useful method for checking the convergence of processes where the
limit has independent increments, as in this case computing the finite dimensional distributions
is relatively straightforward. (See Example 4.4 at the end of the section for an illustration).
Definition 4.1. Let {Xα} (where α ranges over some index set) be a family of stochastic processes with sample paths in DE [0, ∞), and let {Pα} ⊂ P(DE [0, ∞)) be the family of associated
probability distributions, i.e. Pα(B) = P(Xα ∈ B) for all B ∈ SE, where P is the probability
measure on the probability space underlying Xα and SE is the Borel σ-algebra of DE [0, ∞).
We say that {Xα} is relatively compact if {Pα} is (i.e. if the closure of {Pα} in P(DE [0, ∞)) is
compact).
Lemma 4.2. If X is a process with sample paths in DE [0, ∞), then the complement in [0, ∞)
of
D(X) = {t ≥ 0 : P(X(t) = X(t−)) = 1}
is at most countable.
Proof. For each ε > 0, δ > 0, and T > 0, let

    A(ε, δ, T) = {0 ≤ t ≤ T : P(r(X(t), X(t−)) ≥ ε) ≥ δ}.
Now if A(ε, δ, T) contains a sequence (tn)n≥1 of distinct points, then

    P(r(X(tn), X(tn−)) ≥ ε infinitely often) = P(∩_{n=1}^∞ ∪_{m=n}^∞ {r(X(tm), X(tm−)) ≥ ε})
        = lim_{n→∞} P(∪_{m=n}^∞ {r(X(tm), X(tm−)) ≥ ε})
        ≥ lim sup_{n→∞} P(r(X(tn), X(tn−)) ≥ ε)
        ≥ δ > 0,
contradicting the fact that for each x ∈ DE [0, ∞), r(x(t), x(t−)) ≥ ε for at most finitely many
t ∈ [0, T ] (see the proof of Lemma 2.2). Hence A(ε, δ, T ) is finite and so
    D(X)^c = {t ≥ 0 : P(r(X(t), X(t−)) > 0) > 0} = ∪_{n=1}^∞ ∪_{m=1}^∞ ∪_{N=1}^∞ A(1/n, 1/m, N)
is at most countable.
Theorem 4.3. Let E be separable and let Xn , n = 1, 2, . . ., and X be processes with sample
paths in DE [0, ∞).
(a) If Xn ⇒ X, then
(Xn (t1 ), . . . , Xn (tk )) ⇒ (X(t1 ), . . . , X(tk ))
(4.1)
for every finite set {t1 , . . . , tk } ⊂ D(X). Moreover, for each finite set {t1 , . . . , tk } ⊂ [0, ∞),
there exist sequences (tn1 )n≥1 in [t1 , ∞), . . . , (tnk )n≥1 in [tk , ∞) converging to t1 , . . . , tk ,
respectively, such that (Xn (tn1 ), . . . , Xn (tnk )) ⇒ (X(t1 ), . . . , X(tk )).
(b) If {Xn } is relatively compact and there exists a dense set D ⊂ [0, ∞) such that (4.1) holds
for every finite set {t1 , . . . , tk } ⊂ D, then Xn ⇒ X.
Proof. (a) Suppose that Xn ⇒ X. Using the Skorohod Representation (Theorem 3.12), there
exists a probability space on which are defined processes Yn , n = 1, 2, . . ., and Y with
sample paths in DE [0, ∞) and with the same distributions as Xn , n = 1, 2, . . ., and X, such
that limn→∞ d(Yn, Y) = 0 almost surely. If {t1, . . . , tk} ⊂ D(X) = D(Y), then, using the notation of
Proposition 2.11, (π_{t1}, . . . , π_{tk}) : DE [0, ∞) → E^k is continuous almost surely with respect
to the distribution of Y and so, by the Continuous Mapping Theorem (Corollary 3.13),

    lim_{n→∞} (Yn(t1), . . . , Yn(tk)) = lim_{n→∞} (π_{t1}, . . . , π_{tk})(Yn)
                                     = (π_{t1}, . . . , π_{tk})(Y)
                                     = (Y(t1), . . . , Y(tk)) almost surely.
The first conclusion follows by Proposition 3.10.
For the second conclusion, we observe that, by Lemma 4.2, for each finite set {t1, . . . , tk} ⊂
[0, ∞), there exist sequences (t_1^m)m≥1 in [t1, ∞) ∩ D(X), . . . , (t_k^m)m≥1 in [tk, ∞) ∩ D(X)
converging to t1, . . . , tk, respectively. Then, by the above result, (Xn(t_1^m), . . . , Xn(t_k^m)) ⇒
(X(t_1^m), . . . , X(t_k^m)) for each m ∈ N as n → ∞. Since the process X is right continuous,
(X(t_1^m), . . . , X(t_k^m)) → (X(t1), . . . , X(tk)) almost surely as m → ∞ and so
(Xn(t_1^n), . . . , Xn(t_k^n)) ⇒ (X(t1), . . . , X(tk)).
(b) Since {Xn } is relatively compact, the closure of {Pn } is compact and hence every subsequence of {Pn } has a convergent subsequence. It follows that every subsequence of {Xn }
has a convergent (in distribution) subsequence and so it is enough to show that every convergent subsequence of {Xn } converges in distribution to X. Restricting to a subsequence if
necessary, suppose that Xn ⇒ Y . We must show that X and Y have the same distribution.
Let {t1, . . . , tk} ⊂ D(Y) and f1, . . . , fk ∈ C̄(E), and choose sequences (t_1^n)n≥1 in D ∩ [t1, ∞),
. . . , (t_k^n)n≥1 in D ∩ [tk, ∞) converging to t1, . . . , tk, respectively. The map (x1, . . . , xk) ↦
∏_{i=1}^k fi(xi) is in C̄(E^k) and so (4.1) implies E(∏_{i=1}^k fi(Xn(t_i^m))) → E(∏_{i=1}^k fi(X(t_i^m))) as
n → ∞ for each m ≥ 1. Therefore there exist integers n1 < n2 < n3 < . . . such that

    |E(∏_{i=1}^k fi(X(t_i^m))) − E(∏_{i=1}^k fi(X_{nm}(t_i^m)))| < 1/m.                (4.2)

Now

    |E(∏_{i=1}^k fi(X(ti))) − E(∏_{i=1}^k fi(Y(ti)))|
        ≤ |E(∏_{i=1}^k fi(X(ti))) − E(∏_{i=1}^k fi(X(t_i^m)))|
        + |E(∏_{i=1}^k fi(X(t_i^m))) − E(∏_{i=1}^k fi(X_{nm}(t_i^m)))|
        + |E(∏_{i=1}^k fi(X_{nm}(t_i^m))) − E(∏_{i=1}^k fi(X_{nm}(ti)))|
        + |E(∏_{i=1}^k fi(X_{nm}(ti))) − E(∏_{i=1}^k fi(Y(ti)))|

for each m ≥ 1. All four terms on the right tend to zero as m → ∞, the first by the right
continuity of X, the second by (4.2), the third by the right continuity of X_{nm}, and the
fourth by (a), using the facts that X_{nm} ⇒ Y and {t1, . . . , tk} ⊂ D(Y). Hence

    E(∏_{i=1}^k fi(X(ti))) = E(∏_{i=1}^k fi(Y(ti)))                                    (4.3)

for all f1, . . . , fk ∈ C̄(E) and all {t1, . . . , tk} ⊂ D(Y) (and hence for all {t1, . . . , tk} ⊂ [0, ∞)
(Lemma 4.2 and right continuity of X and Y )). Now let
D = {A ∈ SE : E(1X∈A ) = E(1Y ∈A )} .
This is clearly a d-system. Also, since 1_A is in the closure of C̄(E) for all open sets A ⊂ E,
(4.3), together with the dominated convergence theorem, implies that D contains the π-system
consisting of all finite intersections of {π_t^{−1}(A) : t ∈ [0, ∞) and A ⊂ E is open}. By
Dynkin’s π-system Lemma, D contains the σ-algebra generated by the coordinate random
variables πt , and hence, by Proposition 2.11, SE itself. Thus X and Y have the same
distribution.
Example 4.4. Suppose (Xn)n≥1 is a relatively compact sequence of processes with sample
paths in DRd [0, ∞) having independent increments, and let X be a process in DRd [0, ∞) having independent increments. For simplicity assume that X(0) = Xn(0) = 0 for all n. Now
(Xn(t1), . . . , Xn(tk)) ⇒ (X(t1), . . . , X(tk)) for every finite set {t1, . . . , tk} ⊂ D(X) if and only if

    E(exp(i Σ_{j=1}^k ⟨θj, Xn(tj)⟩)) → E(exp(i Σ_{j=1}^k ⟨θj, X(tj)⟩))

for every k-tuple (θ1, . . . , θk) ∈ ((Rd)*)^k. Since Xn has independent increments,

    E(exp(i Σ_{j=1}^k ⟨θj, Xn(tj)⟩)) = ∏_{j=1}^k E(exp(i⟨θj′, Xn(tj) − Xn(tj−1)⟩))

where θj′ = Σ_{m=j}^k θm for j = 1, . . . , k. The same result holds for X and so, if Xn(t) − Xn(s) ⇒
X(t) − X(s) for all s, t, then the finite dimensional distributions of Xn converge in distribution
to those of X and hence Xn ⇒ X. Since this condition is clearly necessary, Xn ⇒ X if and only
if Xn (t) − Xn (s) ⇒ X(t) − X(s) for all s, t.
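For a concrete check of the telescoping identity behind θj′ = Σ_{m≥j} θm, one can compare the two sides for standard Brownian motion, where both are available in closed form: Σ_j θj B_{tj} is a mean-zero Gaussian with variance Σ_{j,k} θj θk min(tj, tk), while the increment factorisation gives ∏_j exp(−(θj′)²(tj − tj−1)/2). A small sketch (one dimension, with my own function names):

```python
import math

def cf_joint(thetas, times):
    """E exp(i sum_j theta_j B_{t_j}) from the covariance min(t_j, t_k);
    the linear combination is Gaussian, so the CF is exp(-variance/2)."""
    var = sum(a * b * min(s, t)
              for a, s in zip(thetas, times)
              for b, t in zip(thetas, times))
    return math.exp(-var / 2.0)

def cf_increments(thetas, times):
    """The same CF via independent increments, using theta'_j = sum_{m>=j} theta_m
    and Var(B_{t_j} - B_{t_{j-1}}) = t_j - t_{j-1} (with t_0 = 0)."""
    out, t_prev = 1.0, 0.0
    for j, t_j in enumerate(times):
        theta_p = sum(thetas[j:])
        out *= math.exp(-theta_p ** 2 * (t_j - t_prev) / 2.0)
        t_prev = t_j
    return out
```

The two computations agree for every choice of θ's and ordered times, which is exactly the factorisation over increments used in the example.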
5 Relative Compactness in DE [0, ∞)
In order to apply Theorem 4.3 in any practical cases, it is necessary to have an understanding of
the conditions a family of stochastic processes, or equivalently probability measures, must satisfy
to be relatively compact. In this section we establish some necessary and sufficient conditions
for relative compactness in DE [0, ∞) which will be useful in later sections.
5.1 Prohorov’s Theorem
Prohorov’s Theorem gives a characterisation of the compact subsets of P(S), where (S, d) is the
metric space of Section 3, by relating compactness to the notion of tightness.
Definition 5.1. A probability measure P ∈ P(S) is said to be tight if for each ε > 0 there
exists a compact set K ⊂ S such that P (K) ≥ 1 − ε.
A family of probability measures M ⊂ P(S) is tight if for each ε > 0 there exists a compact
set K ⊂ S such that inf P ∈M P (K) ≥ 1 − ε.
Theorem 5.2 (Prohorov’s Theorem). Let (S, d) be complete and separable, and let M ⊂
P(S). Then the following are equivalent:
(a) M is tight.
(b) For each ε > 0, there exists a compact set K ⊂ S such that

    inf_{P∈M} P(K^ε) ≥ 1 − ε,

where K^ε is as defined in Section 3.1.
(c) M is relatively compact.
Before we can prove this, we need two intermediate results.
Theorem 5.3. If S is separable, then P(S) is separable. If in addition (S, d) is complete, then
(P(S), ρ) is complete.
Proof. Since S is separable, there exists a countable dense subset {x1, x2, . . .} ⊂ S. Let δx denote
the element of P(S) with unit mass at x ∈ S. Fix x0 ∈ S. We shall show that the countable
set of probability measures of the form Σ_{i=0}^N ai δ_{xi}, with N finite, the ai rational and
Σ_{i=0}^N ai = 1, is dense in P(S). Given ε > 0, let P ∈ P(S). Since P(∪_{i=1}^∞ B(xi, ε/2)) = 1, there exists some
N < ∞ such that P(∪_{i=1}^N B(xi, ε/2)) ≥ 1 − ε/2. Set Ai = B(xi, ε/2) \ ∪_{j=1}^{i−1} B(xj, ε/2) for each
i ≤ N. Pick some m ∈ N with m ≥ 2N/ε. Let ai = ⌊mP(Ai)⌋/m ≤ P(Ai) for i = 1, . . . , N and define
a0 = 1 − Σ_{i=1}^N ai. Then the ai are rational and so Q = Σ_{i=0}^N ai δ_{xi} is of the required form. If
F ∈ C, then

    P(F) ≤ P(∪_{F∩Ai≠∅} Ai) + P((∪_{i=1}^N Ai)^c)
         ≤ Σ_{F∩Ai≠∅} ⌊mP(Ai)⌋/m + N/m + ε/2
         ≤ Q(F^ε) + ε.
Therefore ρ(P, Q) < ε and so P(S) is separable.
To prove completeness, suppose (Pn)n≥1 is a Cauchy sequence in P(S). By restricting
to a subsequence if necessary, we may assume that ρ(P_{n−1}, Pn) < 2^{−n} for each n ≥ 2. As
in the proof of separability, for each n = 2, 3, . . . there exist some Nn < ∞ and disjoint
sets E_1^{(n)}, . . . , E_{Nn}^{(n)} ∈ B(S) with diameters less than 2^{−n} such that P_{n−1}(E_0^{(n)}) ≤ 2^{−n}, where
E_0^{(n)} = S \ ∪_{i=1}^{Nn} E_i^{(n)}. By applying Lemma 3.11 successively for n = 2, 3, . . ., with P = P_{n−1},
Q = Pn, ε = δ = 2^{−n} and N = Nn, there exists a probability space (Ω, F, P) on which are defined
S-valued random variables Y_0^{(n)}, . . . , Y_{Nn}^{(n)}, for n = 2, 3, . . ., [0, 1]-valued random variables ξ^{(n)},
n = 2, 3, . . ., and an S-valued random variable X1 with distribution P1, such that if the constants
c_1^{(n)}, . . . , c_{Nn}^{(n)} ∈ [0, 1] are appropriately chosen, then the random variable

    Xn = Y_i^{(n)}   on {X_{n−1} ∈ E_i^{(n)}, ξ^{(n)} ≥ c_i^{(n)}}, i = 1, . . . , Nn,
         Y_0^{(n)}   on {X_{n−1} ∈ E_0^{(n)}} ∪ ∪_{i=1}^{Nn} {X_{n−1} ∈ E_i^{(n)}, ξ^{(n)} < c_i^{(n)}}

has distribution Pn, and

    P(d(X_{n−1}, Xn) ≥ 2^{−n+1}) ≤ 2^{−n+1}.

Then Σ_{n=2}^∞ P(d(X_{n−1}, Xn) ≥ 2^{−n+1}) < ∞ and, by the Borel–Cantelli Lemma, P(d(X_{n−1}, Xn) ≥
2^{−n+1} infinitely often) = 0. Hence

    P(Σ_{n=2}^∞ d(X_{n−1}, Xn) < ∞) = 1.

Since (S, d) is complete, limn→∞ Xn exists on this set. Setting X to be the value of the limit
where it exists, and 0 otherwise, limn→∞ Xn = X almost surely and so, by Proposition 3.10,
limn→∞ ρ(Pn, P) = 0, where P is the distribution of X.
Lemma 5.4. If (S, d) is complete and separable, then each P ∈ P(S) is tight.

Proof. Let {x1, x2, . . .} be a dense subset of S, and let P ∈ P(S). Then P(∪_{k=1}^∞ B(xk, 1/n)) = 1
for each n and so, given ε > 0, there exist integers N1, N2, . . . such that

    P(∪_{k=1}^{Nn} B(xk, 1/n)) ≥ 1 − ε/2^n

for n = 1, 2, . . .. Let K be the closure of ∩_{n≥1} ∪_{k=1}^{Nn} B(xk, 1/n). Then for each δ > 0, K can
be covered by Nn balls of radius δ, where n > 1/δ. Therefore K is totally bounded and hence
compact, and

    P(K) ≥ 1 − Σ_{n=1}^∞ [1 − P(∪_{k=1}^{Nn} B(xk, 1/n))] ≥ 1 − Σ_{n=1}^∞ ε/2^n = 1 − ε.
Proof of Theorem 5.2.

(a ⇒ b) Immediate.
(b ⇒ c) By Theorem 5.3, (P(S), ρ) is complete and hence the closure of M is complete. So it
suffices to show that M is totally bounded, i.e. given δ > 0, there exists a finite set N ⊂ P(S)
such that M ⊂ ∪_{P∈N} {Q : ρ(P, Q) < δ}.

Let 0 < ε < δ/2. Then there exists a compact set K ⊂ S such that (b) holds. By the
compactness of K, there exists a finite set {x1, . . . , xn} ⊂ K such that K^ε ⊂ ∪_{i=1}^n Bi, where
Bi = B(xi, 2ε). Fix x0 ∈ S and an integer m ≥ n/ε and let N be the finite collection of probability
measures of the form

    P = Σ_{i=0}^n (ki/m) δ_{xi},                                                      (5.1)

where the ki are integers with 0 ≤ ki ≤ m and Σ_{i=0}^n ki = m.

Given Q ∈ M, let ki = ⌊mQ(Ei)⌋ for i = 1, 2, . . . , n, where Ei = Bi \ ∪_{j=1}^{i−1} Bj, and let
k0 = m − Σ_{i=1}^n ki. Then, defining P by (5.1), we have

    Q(F) ≤ Q(∪_{F∩Ei≠∅} Ei) + Q((K^ε)^c)
         ≤ Σ_{F∩Ei≠∅} ⌊mQ(Ei)⌋/m + n/m + ε
         ≤ P(F^{2ε}) + 2ε

for all closed sets F ⊂ S. So ρ(P, Q) ≤ 2ε < δ, as required.
(c ⇒ a) Let ε > 0. Since M is relatively compact, it is totally bounded and hence, for each
n ∈ N, there exists a finite subset Nn ⊂ M such that M ⊂ ∪_{P∈Nn} {Q : ρ(P, Q) < ε/2^{n+1}}. Since
Nn is finite, by Lemma 5.4, for each n ∈ N we can choose a compact set Kn ⊂ S such that
P(Kn) ≥ 1 − ε/2^{n+1} for all P ∈ Nn. Given Q ∈ M, for each n ∈ N, there exists Pn ∈ Nn such
that

    Q(Kn^{ε/2^{n+1}}) ≥ Pn(Kn) − ε/2^{n+1} ≥ 1 − ε/2^n.

Letting K be the closure of ∩_{n≥1} Kn^{ε/2^{n+1}}, K is compact and

    Q(K) ≥ 1 − Σ_{n=1}^∞ ε/2^n = 1 − ε.

5.2 Compact Sets in DE [0, ∞)
To apply Prohorov’s Theorem (Theorem 5.2) to P(DE [0, ∞)), it is necessary to have a characterisation of the compact sets of DE [0, ∞). We first give conditions under which a collection of
step functions is compact.
Definition 5.5. Given a step function x ∈ DE [0, ∞), define s0(x) = 0 and, for k = 1, 2, . . .,
define sk(x) = inf{t > s_{k−1}(x) : x(t) ≠ x(t−)} if s_{k−1}(x) < ∞, and sk(x) = ∞ if s_{k−1}(x) = ∞.
Lemma 5.6. For Γ ⊂ E compact and δ > 0, define A(Γ, δ) to be the set of step functions
x ∈ DE [0, ∞) such that x(t) ∈ Γ for all t ≥ 0, and sk(x) − s_{k−1}(x) > δ for each k ≥ 1 for which
s_{k−1}(x) < ∞. Then the closure of A(Γ, δ) is compact.
Proof. It is enough to show that every sequence in A(Γ, δ) has a convergent subsequence. Suppose (xn)n≥1 is a sequence in A(Γ, δ). Either there exists a subsequence (x_n^{1a})n≥1 such that
s1(x_n^{1a}) < ∞ for all n, or there exists a subsequence (x_n^1)n≥1 such that s1(x_n^1) = ∞ for all n. In
the first case, there exists a subsequence (x_n^{1b})n≥1 of (x_n^{1a})n≥1 such that, for some t1 ∈ [δ, ∞],
limn→∞ s1(x_n^{1b}) = t1. If t1 < ∞, we can insist further that |log(t1/s1(x_n^{1b}))| < 1/n. Since Γ is compact,
the sequence (x_n^{1b})n≥1 has a subsequence (x_n^1)n≥1 such that limn→∞ x_n^1(s1(x_n^1)) = α1 for some
α1. In this way, we can construct a sequence of subsequences (x_n^1)n≥1 ⊇ (x_n^2)n≥1 ⊇ ··· such
that, for k = 1, 2, . . ., either

(a) sk(x_n^k) < ∞ for all n, there exists some tk ∈ [kδ, ∞] such that limn→∞ sk(x_n^k) = tk, with
|log((tk − t_{k−1})/(sk(x_n^k) − s_{k−1}(x_n^k)))| < 1/n if tk < ∞, and limn→∞ x_n^k(sk(x_n^k)) = αk for some αk, or

(b) sk(x_n^k) = ∞ for all n.

Let (ym)m≥1 be the subsequence of (xn)n≥1 defined by ym = x_m^m and define y ∈ DE [0, ∞) by
y(t) = αk, tk ≤ t < t_{k+1}, for k = 0, 1, . . ., where we take t0 = 0. Since sk(ym) − s_{k−1}(ym) > δ
for each k ≥ 1 and each m for which s_{k−1}(ym) < ∞, we may define λm ∈ Λ to be the piecewise
linear function that joins (s_{k−1}(ym), t_{k−1}) to (sk(ym), tk) if tk < ∞ and which has gradient 1
otherwise. Then

    γ(λm) = sup_k |log((tk − t_{k−1})/(sk(ym) − s_{k−1}(ym)))| < 1/m → 0,

and, for each t, there exists a k such that r(ym(t), y(λm(t))) = r(ym(sk(ym)), y(tk)) → 0. Thus
ym → y by Proposition 2.6.
The conditions for compactness will be stated in terms of the following modulus of continuity.
Definition 5.7. For x ∈ DE [0, ∞), δ > 0, and T > 0, define

    w′(x, δ, T) = inf_{{ti}} max_i sup_{s,t∈[t_{i−1},ti)} r(x(s), x(t)),

where {ti} ranges over all partitions of the form 0 = t0 < t1 < ··· < t_{n−1} < T ≤ tn with
min_{1≤i≤n}(ti − t_{i−1}) > δ and n ≥ 1. (The initially strange-looking condition t_{n−1} < T ≤ tn
allows us not to have to worry about the length of the final interval, in a partition of [0, T],
being smaller than δ. For example, partitions where each interval is the same length, but this
length does not divide T, are admissible.)

Note that w′(x, δ, T) is non-decreasing in δ and in T, and that

    w′(x, δ, T) ≤ w′(y, δ, T) + 2 sup_{0≤s<T+δ} r(x(s), y(s)).
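For intuition, the inner quantity in Definition 5.7 is easy to evaluate for piecewise-constant real-valued paths. The sketch below (hypothetical helper names, real-valued step paths only) computes max_i sup_{s,t∈[t_{i−1},ti)} |x(s) − x(t)| for a given admissible partition; for a step path whose inter-jump times exceed δ, the partition through the jump times witnesses w′(x, δ, T) = 0, while a misaligned partition sees the full jump size.

```python
def osc_over_partition(jumps, values, part):
    """max over cells [a, b) of the partition `part` of the oscillation
    sup_{s,t in [a,b)} |x(s) - x(t)|, for the cadlag step path taking
    values[k] on [jumps[k], jumps[k+1]) (jumps[0] = 0)."""
    def val(t):
        return values[max(k for k in range(len(jumps)) if jumps[k] <= t)]
    worst = 0.0
    for a, b in zip(part, part[1:]):
        # a step path only changes value at jump times, so sampling the
        # cell's left endpoint and its interior jump times suffices
        samples = [val(a)] + [val(t) for t in jumps if a < t < b]
        worst = max(worst, max(samples) - min(samples))
    return worst

jumps, values = [0.0, 1.0, 2.2], [0.0, 1.0, 0.0]
aligned = osc_over_partition(jumps, values, [0.0, 1.0, 2.2, 3.0])
misaligned = osc_over_partition(jumps, values, [0.0, 1.5, 3.0])
```

With δ = 0.5 and T = 3, both partitions are admissible (every cell is longer than δ and t_{n−1} < T ≤ tn); the aligned one gives oscillation 0, certifying w′(x, 0.5, 3) = 0, while the misaligned one gives the jump size 1.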
Before we can characterise the compact sets, we need to establish a few properties of
w′ (x, δ, T ).
Lemma 5.8.

(a) For each x ∈ DE [0, ∞) and T > 0, w′(x, δ, T) is right continuous in δ and

    lim_{δ→0} w′(x, δ, T) = 0.

(b) If (xn)n≥1 is a sequence in DE [0, ∞) and limn→∞ d(xn, x) = 0, then

    lim sup_{n→∞} w′(xn, δ, T) ≤ w′(x, δ, T + ε)

for every δ > 0, T > 0, and ε > 0.

(c) For each δ > 0 and T > 0, w′(x, δ, T) is Borel measurable in x.
Proof. (a) The right continuity follows from the fact that any partition 0 = t0 < t1 < ··· <
t_{n−1} < T ≤ tn with min_{1≤i≤n}(ti − t_{i−1}) > δ and n ≥ 1 also satisfies min_{1≤i≤n}(ti − t_{i−1}) > δ′
for δ′ = ½(δ + min_{1≤i≤n}(ti − t_{i−1})) > δ.
Let N ∈ N and define τ_0^N = 0 and, for k = 1, 2, . . .,

    τ_k^N = inf{t > τ_{k−1}^N : r(x(t), x(τ_{k−1}^N)) > 1/N}

if τ_{k−1}^N < ∞, and τ_k^N = ∞ if τ_{k−1}^N = ∞. Note that the sequence (τ_k^N)k≥0 is strictly increasing
(as long as its terms remain finite) by the right continuity of x, and for 0 < δ < min{τ_{k+1}^N −
τ_k^N : τ_k^N < T},

    w′(x, δ, T) ≤ max_i sup_{s,t∈[τ_i^N, τ_{i+1}^N)} (r(x(s), x(τ_i^N)) + r(x(τ_i^N), x(t))) ≤ 2/N.

Hence lim_{δ→0} w′(x, δ, T) = 0.
(b) Let (xn)n≥1 be a sequence in DE [0, ∞), x ∈ DE [0, ∞). Suppose δ > 0 and T > 0. If
limn→∞ d(xn, x) = 0, then by Proposition 2.6 there exists a sequence (λn)n≥1 in Λ such
that (2.1) and (2.9) hold when T is replaced by T + δ. For each n, let yn(t) = x(λn(t)) for
all t ≥ 0 and let δn = sup_{0≤t≤T}(λn(t + δ) − λn(t)). Then, for every ε > 0,

    lim sup_{n→∞} w′(xn, δ, T) ≤ lim sup_{n→∞} w′(yn, δ, T) + 2 lim sup_{n→∞} sup_{0≤s<T+δ} r(xn(s), x(λn(s)))
        ≤ lim sup_{n→∞} w′(x, δn, λn(T))
        ≤ lim_{n→∞} w′(x, δn ∨ δ, T + ε)
        = w′(x, δ, T + ε),

where the first inequality follows from the comment at the end of Definition 5.7, the second
follows by substituting λn(s), λn(t) for s, t in the definition of w′ and by w′ being non-decreasing in δ, the third follows by w′ being non-decreasing in δ and T and by λn(T) → T,
and the equality follows by w′ being right continuous in δ (part (a)).
(c) Define w′(x, δ, T+) = lim_{ε↓0} w′(x, δ, T + ε). This exists by the monotonicity of w′ in T.
Then if xn → x, by (b),

    lim sup_{n→∞} w′(xn, δ, T+) ≤ lim_{ε↓0} w′(x, δ, T + 2ε) = w′(x, δ, T+).

So w′(x, δ, T+) is upper semicontinuous and hence Borel measurable in x. The result follows by the observation that w′(x, δ, T) = lim_{ε↓0} w′(x, δ, (T − ε)+) for every x ∈ DE [0, ∞).
Theorem 5.9. Let (E, r) be complete. Then the closure of A ⊂ DE [0, ∞) is compact if and
only if the following two conditions hold:

(a) For every rational t ≥ 0, there exists a compact set Γt ⊂ E such that x(t) ∈ Γt for all
x ∈ A.

(b) For each T > 0,

    lim_{δ→0} sup_{x∈A} w′(x, δ, T) = 0.
Proof. Suppose A satisfies (a) and (b), and let l ≥ 1. Choose δl ∈ (0, 1) such that

    sup_{x∈A} w′(x, δl, l) ≤ 1/l

and ml ≥ 2 such that 1/ml < δl. Define Γ^{(l)} = ∪_{i=0}^{(l+1)ml} Γ_{i/ml}. Using the notation of Lemma 5.6,
let Al = A(Γ^{(l)}, δl).
Given x ∈ A, there is a partition 0 = t0 < t1 < ··· < t_{n−1} < l ≤ tn < l + 1 < t_{n+1} = ∞ with
min_{1≤i≤n}(ti − t_{i−1}) > δl such that

    max_{1≤i≤n} sup_{s,t∈[t_{i−1},ti)} r(x(s), x(t)) ≤ 2/l.

Define x′ ∈ Al by x′(t) = x((⌊ml ti⌋ + 1)/ml) for ti ≤ t < t_{i+1}, i = 0, 1, . . . , n. Then, by the
definition of ml, ti ≤ (⌊ml ti⌋ + 1)/ml ≤ ti + 1/ml < t_{i+1}, and sup_{0≤t<l} r(x′(t), x(t)) ≤ 2/l. Hence

    d(x′, x) ≤ ∫_0^∞ e^{−u} (sup_{t≥0} r(x′(t ∧ u), x(t ∧ u)) ∧ 1) du ≤ 2/l + e^{−l} < 3/l,

and so A ⊂ Al^{3/l}. Now l was arbitrary and so A ⊂ ∩_{l≥1} Al^{3/l}. By Lemma 5.6, Āl is compact for
each l ≥ 1 and hence A is totally bounded. It follows that A has compact closure, as required.
Conversely, suppose that A has compact closure. For each rational t ≥ 0, define Γt ⊂ E by
Γt = Āt , where At = {x(t) : x ∈ A}. In order to show that Γt is compact, it suffices to show
that every sequence in At has a convergent subsequence. Suppose (xn (t))n≥1 is a sequence in
At . Since A has compact closure, by restricting to a subsequence if necessary, we may assume
that xn → x, for some x ∈ DE [0, ∞). By Proposition 2.6, there exists a sequence (λn )n≥1
in Λ such that (2.1) and (2.9) hold. There is a subsequence λnr such that either λnr (t) ≥ t
for all nr , or λnr (t) < t for all nr . In the first case, r(xnr (t), x(t)) ≤ r(xnr (t), x(λnr (t))) +
r(x(λnr (t)), x(t)), the first term of which converges to 0 by (2.9) and the second term of which
converges to 0 by (2.4) and the right continuity of x. In the second case, r(xnr (t), x(t−)) ≤
r(xnr (t), x(λnr (t))) + r(x(λnr (t)), x(t−)), which converges to 0 similarly. Hence (xn (t))n≥1 has
a convergent subsequence, and (a) holds.
To see that (b) holds, suppose there exist η > 0, T > 0 and a sequence (xn)n≥1 in A such that
w′(xn, 1/n, T) ≥ η for all n. Since A has compact closure, we may assume that limn→∞ d(xn, x) = 0
for some x ∈ DE [0, ∞). By Lemma 5.8(b),
    η ≤ lim sup_{n→∞} w′(xn, δ, T) ≤ w′(x, δ, T + 1)
for all δ > 0. Letting δ → 0, by Lemma 5.8(a), the right hand side tends to zero, resulting in a
contradiction. Hence (b) holds.
5.3 Some Useful Criteria
We now combine the above characterisation with Prohorov’s Theorem (Theorem 5.2) to obtain
some useful criteria for relative compactness in DE [0, ∞).
Theorem 5.10. Let (E, r) be complete and separable, and let {Xα} be a family of stochastic
processes with sample paths in DE [0, ∞). Then {Xα} is relatively compact if and only if the
following two conditions hold:

(a) For every η > 0 and rational t ≥ 0, there exists a compact set Γ_{η,t} ⊂ E such that

    inf_α P(Xα(t) ∈ Γ_{η,t}^η) ≥ 1 − η.

(b) For every η > 0 and T > 0, there exists δ > 0 such that

    sup_α P(w′(Xα, δ, T) ≥ η) ≤ η.
(Note that, by Lemma 5.8, w′(x, δ, T) is Borel measurable and so the set {w′(X_α, δ, T) ≥ η} is measurable in the probability space underlying X_α. Hence it is legitimate to refer to the probability of this set.)
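Since both conditions hinge on the modulus w′, it may help to see it computed concretely. The following is an illustrative sketch (not part of the essay's development) for piecewise-constant real-valued paths: it brute-forces the infimum over partitions whose breakpoints are jump times, using the convention that partitions subdivide [0, T]. The function names are hypothetical.

```python
import itertools

def w_prime(jump_times, jump_values, x0, delta, T):
    """Modulus of cadlag continuity w'(x, delta, T) for a piecewise-constant
    real path: x(t) = x0 for t < jump_times[0], and x(t) = jump_values[i] on
    [jump_times[i], jump_times[i+1]).  Brute force over partitions of [0, T]
    with breakpoints at jump times (sufficient for step functions)."""
    def value_at(t):
        v = x0
        for s, y in zip(jump_times, jump_values):
            if s <= t:
                v = y
        return v

    candidates = [s for s in jump_times if 0.0 < s < T]
    best = float("inf")
    for k in range(len(candidates) + 1):
        for subset in itertools.combinations(candidates, k):
            pts = [0.0] + list(subset) + [T]
            if min(b - a for a, b in zip(pts, pts[1:])) <= delta:
                continue  # partition is not delta-sparse
            worst = 0.0
            for a, b in zip(pts, pts[1:]):
                # values taken on the half-open interval [a, b)
                vals = [value_at(a)] + [y for s, y in zip(jump_times, jump_values) if a < s < b]
                worst = max(worst, max(vals) - min(vals))
            best = min(best, worst)
    return best

# a single jump can always be split off, so w' = 0
print(w_prime([1.0], [1.0], 0.0, 0.4, 2.0))          # 0.0
# two jumps only 0.2 apart cannot be separated when delta = 0.4
print(w_prime([1.0, 1.2], [1.0, 2.0], 0.0, 0.4, 2.0))
```

This makes visible why w′, unlike the ordinary modulus of continuity, ignores isolated jumps but penalises jumps clustered closer than δ.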
Proof. If {X_α} is relatively compact, then since (D_E[0, ∞), d) is complete and separable (Theorem 2.10) we may apply Prohorov's Theorem (Theorem 5.2) to find a compact set K_η ⊂ D_E[0, ∞) such that inf_α P(X_α ∈ K_η) ≥ 1 − η. Since K_η is compact, by Theorem 5.9, for every rational t ≥ 0, there exists a compact set Γ_{η,t} such that x(t) ∈ Γ_{η,t} for all x ∈ K_η, i.e.

inf_α P(X_α(t) ∈ Γ_{η,t}) ≥ inf_α P(X_α ∈ K_η) ≥ 1 − η.

Also, by Theorem 5.9, lim_{δ→0} sup_{x∈K_η} w′(x, δ, T) = 0 and so, for each T > 0, there exists δ > 0 such that sup_{x∈K_η} w′(x, δ, T) < η. Then
sup_α P(w′(X_α, δ, T) ≥ η) ≤ sup_α P(X_α ∉ K_η) ≤ 1 − inf_α P(X_α ∈ K_η) ≤ η.

So (a) and (b) hold, and in fact Γ^η_{η,t} can be replaced by Γ_{η,t} in (a).
Conversely, let ε > 0, let T be a positive integer such that e^{−T} < ε/2, and choose δ > 0 such that (b) holds with η = ε/4. Let m > 1/δ and set Γ = ∪_{i=0}^{mT} Γ_{ε2^{−i−2}, i/m}. Then

inf_α P( X_α(i/m) ∈ Γ^{ε/4}, i = 0, 1, …, mT ) ≥ 1 − Σ_{i=0}^{mT} ( 1 − inf_α P( X_α(i/m) ∈ Γ^{ε2^{−i−2}}_{ε2^{−i−2}, i/m} ) ) ≥ 1 − Σ_{i=0}^{mT} ε2^{−i−2} ≥ 1 − ε/2.
Using the notation of Lemma 5.6, let A = A(Γ, δ). By the lemma, A has compact closure. Given x ∈ D_E[0, ∞) with w′(x, δ, T) < ε/4 and x(i/m) ∈ Γ^{ε/4} for i = 0, 1, …, mT, by the definition of w′ (Definition 5.7), there exists a partition 0 = t_0 < t_1 < ⋯ < t_{n−1} < T ≤ t_n such that min_{1≤i≤n}(t_i − t_{i−1}) > δ and

max_{1≤i≤n} sup_{s,t∈[t_{i−1},t_i)} r(x(s), x(t)) < ε/4,
and there exist y_i ∈ Γ such that r(x(i/m), y_i) < ε/4, for i = 0, 1, …, mT. Define x′ ∈ A by

x′(t) = y_{⌊mt_{i−1}⌋+1} for t_{i−1} ≤ t < t_i, i = 1, …, n − 1, and x′(t) = y_{⌊mt_{n−1}⌋+1} for t ≥ t_{n−1}.

Then, since m > 1/δ, for each i = 1, …, n, t_{i−1} ≤ (⌊mt_{i−1}⌋ + 1)/m < t_i, and so if t_{i−1} ≤ t < t_i ∧ T, then r(x(t), x′(t)) ≤ r(x(t), x((⌊mt_{i−1}⌋ + 1)/m)) + r(x((⌊mt_{i−1}⌋ + 1)/m), y_{⌊mt_{i−1}⌋+1}) < ε/2, and hence d(x, x′) < ε/2 + e^{−T} < ε, implying that x ∈ A^ε. Consequently,
inf_α P(X_α ∈ A^ε) ≥ inf_α P( X_α(i/m) ∈ Γ^{ε/4}, i = 0, 1, …, mT, and w′(X_α, δ, T) < ε/4 ) ≥ 1 − (ε/2 + ε/4) > 1 − ε.
Since Theorem 2.10 implies (DE [0, ∞), d) is complete and separable, by Prohorov’s Theorem
(Theorem 5.2), {Xα } is relatively compact.
Corollary 5.11. Let (E, r) be complete and separable, and let (Xn )n≥1 be a sequence of stochastic processes with sample paths in DE [0, ∞). Then (Xn )n≥1 is relatively compact if and only if
the following two conditions hold:
(a) For every η > 0 and rational t ≥ 0, there exists a compact set Γ_{η,t} ⊂ E such that

lim inf_{n→∞} P(X_n(t) ∈ Γ^η_{η,t}) ≥ 1 − η.

(b) For every η > 0 and T > 0, there exists δ > 0 such that

lim sup_{n→∞} P(w′(X_n, δ, T) ≥ η) ≤ η.
Proof. The conditions are necessary as an immediate consequence of Theorem 5.10.
For the sufficiency, fix η > 0, rational t ≥ 0, and T > 0. For every n ∈ N, by Lemma 5.4 there exists a compact set Γ_n ⊂ E such that P(X_n(t) ∈ Γ^η_n) ≥ 1 − η, and by Lemma 5.8(a) there exists δ_n > 0 such that P(w′(X_n, δ_n, T) ≥ η) ≤ η. By conditions (a) and (b) there exist a compact set Γ_0 ⊂ E, δ_0 > 0, and a positive integer n_0 such that

inf_{n≥n_0} P(X_n(t) ∈ Γ^η_0) ≥ 1 − η and sup_{n≥n_0} P(w′(X_n, δ_0, T) ≥ η) ≤ η.

We can replace n_0 by 1 in the above relations if we replace Γ_0 by Γ = ∪_{n=0}^{n_0−1} Γ_n and δ_0 by δ = ∧_{n=0}^{n_0−1} δ_n. The result follows by Theorem 5.10.

6 A Law of Large Numbers
We now change course slightly to discuss a generalisation of the Law of Large Numbers to a
sequence of Markov jump processes. We shall use this result in the next section when we apply
the theory we have built up so far to establishing a generalisation of the Central Limit Theorem
to Markov processes.
The Law of Large Numbers essentially demonstrates that the average behaviour of a sequence
of independent identically distributed random variables with finite means becomes deterministic
as the number of random variables becomes large. By viewing these random variables as the
jump sizes in a random walk, the value of
(X_1 + X_2 + ⋯ + X_N)/N
can be regarded as the position of the random walk at time 1 if the jump rate is increased and
the jump size is decreased, each by a factor of N (See Figure 1). In this setting, the Law of Large
Numbers can be interpreted as saying that in the limit as N → ∞ the position of a random walk
at time 1 becomes deterministic if the jump rate is increased, and the jump size is decreased,
each by a factor of N .
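This scaling is easy to see numerically. The sketch below (hypothetical names; jumps taken to be Exp(1) purely for illustration) uses the fact that the position at time 1 of the sped-up, shrunk walk is exactly the sample mean of the jumps, and its spread collapses as N grows.

```python
import random

def scaled_walk_at_time_one(N, rng):
    """(X_1 + ... + X_N)/N: the position at time 1 of a walk whose jump rate
    is increased by a factor N and whose jumps X_i/N are decreased by a
    factor N; here X_i ~ Exp(1), so the mean is 1."""
    return sum(rng.expovariate(1.0) for _ in range(N)) / N

rng = random.Random(0)
spreads = {}
for N in (10, 100, 10000):
    samples = [scaled_walk_at_time_one(N, rng) for _ in range(200)]
    spreads[N] = max(samples) - min(samples)
    print(N, round(sum(samples) / len(samples), 3), round(spreads[N], 3))
```

The printed spread shrinks roughly like 1/√N, which is the deterministic concentration that the fluid limit theorem below generalises.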
Now suppose that instead of a random walk we have a general pure jump Markov process
(defined below). We generalise the Law of Large Numbers by showing that, under certain
conditions, if we alter the jump rate and jump size as described above, then, in the limit as
N → ∞, the position of the jump process at time t is deterministic for all t. This deterministic
limit is known as the fluid limit.
Figure 1: Illustration of the scaling process for N = 3
6.1 Preliminaries
We start with a few definitions.
Definition 6.1. A stochastic process X = (X_t)_{t≥0} taking values in a subset I of R^d is a pure jump process if there exist some (possibly infinite) random times 0 = J_0 < J_1 < ⋯, with J_n ↑ ζ (where J_n = ∞ is allowed), and some I-valued process (Y_n)_{n∈Z^+} such that

X_t = Y_n if J_n ≤ t < J_{n+1}, and X_t = ∂ if t ≥ ζ,

where ζ (which may be infinite) is the explosion time of the process and ∂ is some cemetery state.
X is a pure jump Markov process with Lévy kernel K if it is a pure jump process and, for all n ∈ N,

P(J_n ∈ dt, ΔX_{J_n} ∈ dy | J_n > t, X_{J_{n−1}} = x) = K(x, dy)dt,

where ΔX_{J_n} = X_{J_n} − X_{J_{n−1}} is the displacement of the nth jump.
Definition 6.2. The jump measure µ of X on (0, ∞) × R^d is given by

µ = Σ_{n=1}^∞ δ_{(J_n, ΔX_{J_n})},

where δ_{(t,y)} denotes the unit mass at (t, y), i.e. ∫ f dµ = Σ_{n=1}^∞ f(J_n, ΔX_{J_n}).
We also introduce the random measure ν on (0, ∞) × Rd , given by
ν(dt, dy) = K(Xt− , dy)dt.
Definition 6.3. The Laplace transform of a pure jump Markov process with Lévy kernel K is given by

m(x, θ) = ∫_{R^d} e^{⟨θ,y⟩} K(x, dy).

We assume that, for some η_0 > 0,

sup_{x∈I} sup_{|θ|≤η_0} m(x, θ) ≤ C < ∞. (6.1)
The conditions for the fluid limit to exist will be formulated in terms of the Laplace transform,
so before we state the main theorem, we establish a few preliminary results.
Fix η ∈ (0, η0 ). We establish bounds on m′′ (x, θ) and m′′′ (x, θ) for |θ| ≤ η in the lemma
below, where ′ denotes differentiation in θ. Although we only use the first bound in the proof of
the fluid limit, the second bound will be useful in the next section and it is convenient to prove
them together.
Lemma 6.4. There exist A < ∞ and B < ∞ such that

|m′′(x, θ)| ≤ A and |m′′′(x, θ)| ≤ B (6.2)

for all x ∈ I and |θ| ≤ η.
Proof. Set δ = η_0 − |θ| ≥ η_0 − η > 0. Note that (δy)² ≤ e^{δy} + e^{−δy} for all y ∈ R. Then

|y|² = (y_1)² + ⋯ + (y_d)² ≤ δ^{−2} Σ_{i=1}^d (e^{δy_i} + e^{−δy_i}),

and so

∫_{R^d} |y|² e^{⟨θ,y⟩} K(x, dy) ≤ δ^{−2} Σ_{i=1}^d ( ∫_{R^d} e^{δy_i} e^{⟨θ,y⟩} K(x, dy) + ∫_{R^d} e^{−δy_i} e^{⟨θ,y⟩} K(x, dy) )
= δ^{−2} Σ_{i=1}^d ( ∫_{R^d} e^{⟨θ+δe_i,y⟩} K(x, dy) + ∫_{R^d} e^{⟨θ−δe_i,y⟩} K(x, dy) )
≤ 2Cd/δ²,

since |θ ± δe_i| ≤ |θ| + δ = η_0. We may now differentiate twice under the integral sign to see that m′′(x, θ) exists (m′(x, θ) exists since |y| ≤ ½(|y|² + 1)). Then for |u|, |v| ≤ 1,

|⟨u, m′′(x, θ)v⟩| = | ∫_{R^d} ⟨u, y⟩⟨y, v⟩ e^{⟨θ,y⟩} K(x, dy) | ≤ ∫_{R^d} |u| |y|² |v| e^{⟨θ,y⟩} K(x, dy) ≤ 2Cd/δ² ≤ 2Cd/(η_0 − η)².

So, setting A = 2Cd/(η_0 − η)² gives the required result.
Now, with δ as above, (δy)² ≤ (9/4)(e^{(2/3)δy} + e^{−(2/3)δy}) and so

|y|³ = ( (y_1)² + ⋯ + (y_d)² )^{3/2} ≤ (27/8) δ^{−3} ( Σ_{i=1}^d (e^{(2/3)δy_i} + e^{−(2/3)δy_i}) )^{3/2} ≤ (27/8) δ^{−3} (2d)^{1/2} Σ_{i=1}^d (e^{δy_i} + e^{−δy_i}),

where the last inequality follows by Jensen's Inequality and the concavity of x^{2/3}. Then, as above,

∫_{R^d} |y|³ e^{⟨θ,y⟩} K(x, dy) ≤ (27/8) δ^{−3} (2d)^{1/2} · 2Cd ≤ 27Cd^{3/2} / (2√2 (η_0 − η)³),

and so, by the same reasoning as above, m′′′(x, θ) exists and taking B = 27Cd^{3/2}/(2√2 (η_0 − η)³) gives the required result.
Proposition 6.5. Let a : Ω × (0, ∞) × R^d → R be a previsible process satisfying

E ∫_0^t ∫_{R^d} |a(s, y)| ν(ds, dy) < ∞

for all t. Then the following process is a martingale:

∫_0^t ∫_{R^d} a(s, y) (µ − ν)(ds, dy).

Proof. This result is well known.
In particular, by the proof of Lemma 6.4, |y| ≤ ½(|y|² + 1) ≤ ½( η_0^{−2} Σ_{i=1}^d (e^{⟨η_0 e_i,y⟩} + e^{⟨−η_0 e_i,y⟩}) + 1 ), and so (6.1) gives

E ∫_0^t ∫_{R^d} |y| ν(ds, dy) ≤ ½ ( 2d/η_0² + 1 ) Ct < ∞,

and hence we may take a(s, y) = y to get the martingale

M_t = ∫_0^t ∫_{R^d} y (µ − ν)(ds, dy). (6.3)
For θ ∈ (R^d)*, define

φ(x, θ) = ∫_{R^d} { e^{⟨θ,y⟩} − 1 − ⟨θ, y⟩ } K(x, dy).
Then φ ≥ 0 and, by the second-order mean value theorem,

φ(x, θ) = ∫_0^1 ⟨θ, m′′(x, rθ)θ⟩ (1 − r) dr.

Therefore, by Lemma 6.4,

φ(x, θ) ≤ ∫_0^1 |m′′(x, rθ)| |θ|² (1 − r) dr ≤ ½ A|θ|²,

for x ∈ I and |θ| ≤ η.
Definition 6.6. Let (θ_t)_{t≥0} be a previsible process in (R^d)* with |θ_t| ≤ η for all t. Define

Z_t = Z_t^θ = exp{ ∫_0^t ⟨θ_s, dM_s⟩ − ∫_0^t φ(X_s, θ_s) ds }.

Lemma 6.7. (Z_t)_{t≥0} is a martingale.
Proof. Since |θ_t| ≤ η for all t,

|Z_t| ≤ exp{ ∫_0^t |θ_s| |dM_s| + ∫_0^t |φ(X_s, θ_s)| ds }
≤ exp{ ∫_0^t ∫_{R^d} 2η|y| ν(ds, dy) + ∫_0^t ½ Aη² ds }
≤ exp{ ( 2d/η_0² + 1 ) ηCt + ½ Aη² t },

and so (Z_t)_{t≥0} is locally bounded. Now,

Z_t − Z_{t−} = Z_{t−} [ exp{ ∫_{R^d} ⟨θ_t, y⟩ (µ − ν)(dt, dy) − ∫_{R^d} { e^{⟨θ_t,y⟩} − 1 − ⟨θ_t, y⟩ } ν(dt, dy) } − 1 ]
= Z_{t−} [ Σ_{y:(J_n,ΔX_{J_n})=(t,y)} ( e^{⟨θ_t,y⟩} − 1 ) − ∫_{R^d} ( e^{⟨θ_t,y⟩} − 1 ) ν(dt, dy) ]
= ∫_{R^d} Z_{t−} ( e^{⟨θ_t,y⟩} − 1 )(µ − ν)(dt, dy).

Hence

Z_t = 1 + ∫_0^t ∫_{R^d} Z_{s−} ( e^{⟨θ_s,y⟩} − 1 )(µ − ν)(ds, dy),

and so (Z_t)_{t≥0} is a non-negative local martingale. Therefore there exist stopping times T_n → ∞ such that (Z_t^{T_n})_{t≥0} is a martingale and so, by Fatou's Lemma, E(Z_t) = E(lim inf_{n→∞} Z_t^{T_n}) ≤ lim inf_{n→∞} E(Z_t^{T_n}) = 1. Therefore

E ∫_0^t ∫_{R^d} | Z_{s−} ( e^{⟨θ_s,y⟩} − 1 ) | ν(ds, dy) ≤ E ∫_0^t Z_{s−} ∫_{R^d} ( e^{⟨θ_s,y⟩} + 1 ) K(X_{s−}, dy) ds ≤ E ∫_0^t Z_s ( m(X_s, θ_s) + m(X_s, 0) ) ds ≤ 2Ct < ∞.

By Proposition 6.5, (Z_t)_{t≥0} is a martingale.
Proposition 6.8 (The Exponential Martingale Inequality). For all δ ∈ (0, Aηt√d],

P( sup_{s≤t} |M_s| > δ ) ≤ 2d e^{−δ²/(2Adt)}.
Proof. Fix θ ∈ (R^d)* with |θ| = 1 and, for any δ′ > 0, consider the stopping time

T = inf{ t ≥ 0 : ⟨θ, M_t⟩ > δ′ }.

For any ε < η, by Lemma 6.7, (Z_t^{εθ})_{t≥0} is a martingale, where θ_t = θ for all t. Therefore, by the Optional Stopping Theorem, E(Z_{T∧t}^{εθ}) = E(Z_0^{εθ}) = 1. Since on the set {T ≤ t},

Z_{T∧t}^{εθ} ≥ exp{ ε⟨θ, M_T⟩ − ∫_0^T ½ Aε²|θ|² ds } ≥ e^{δ′ε − ½Atε²}

(using the right continuity of M to get ⟨θ, M_T⟩ ≥ δ′),

P( sup_{s≤t} ⟨θ, M_s⟩ > δ′ ) = P(T ≤ t) ≤ P( Z_{T∧t}^{εθ} ≥ e^{δ′ε − ½Atε²} ) ≤ e^{−δ′ε + ½Atε²} E(Z_{T∧t}^{εθ}) = e^{−δ′ε + ½Atε²},

where the second inequality is by Chebyshev's Inequality. In particular, when δ′ ≤ Atη, we can take ε = δ′/(At) to get

P( sup_{s≤t} ⟨θ, M_s⟩ > δ′ ) ≤ e^{−δ′²/(2At)}.

Now, if sup_{s≤t} |M_s| > δ, then sup_{s≤t} ⟨θ, M_s⟩ > δ/√d for one of θ = ±e_1, …, ±e_d. Setting δ′ = δ/√d ≤ Atη gives

P( sup_{s≤t} |M_s| > δ ) ≤ Σ_{i=1}^d P( sup_{s≤t} ⟨e_i, M_s⟩ > δ/√d ) + Σ_{i=1}^d P( sup_{s≤t} ⟨−e_i, M_s⟩ > δ/√d ) ≤ 2d e^{−δ²/(2Adt)}.
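As a sanity check on Proposition 6.8, the bound can be compared with a Monte Carlo estimate in the simplest case: a compensated unit-rate Poisson process in d = 1, whose kernel K(x, dy) = δ_1(dy) gives m(x, θ) = e^θ, so that |m′′(x, θ)| ≤ e =: A for |θ| ≤ η = 1. The sketch below is illustrative only; all names are hypothetical.

```python
import math, random

def sup_abs_compensated_poisson(rate, t_end, rng):
    """sup_{s <= t_end} |N_s - rate*s| for a unit-jump Poisson process.
    The path is linear between jumps, so the sup is attained just before
    or just after a jump, or at t_end."""
    t, n, sup = 0.0, 0, 0.0
    while True:
        gap = rng.expovariate(rate)
        if t + gap >= t_end:
            return max(sup, abs(n - rate * t_end))
        t += gap
        sup = max(sup, abs(n - rate * t), abs(n + 1 - rate * t))
        n += 1

rng = random.Random(0)
rate, t_end, delta, eta, d = 1.0, 1.0, 2.5, 1.0, 1
A = rate * math.e                                  # bound on m'' for |theta| <= 1
assert delta <= A * eta * t_end * math.sqrt(d)     # delta in the admissible range
trials = 4000
hits = sum(sup_abs_compensated_poisson(rate, t_end, rng) > delta
           for _ in range(trials))
bound = 2 * d * math.exp(-delta ** 2 / (2 * A * d * t_end))
print(hits / trials, round(bound, 3))              # empirical tail vs. the bound
```

The empirical tail probability comes out well below the bound, as expected: the inequality trades sharpness for uniformity over all kernels satisfying (6.1).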
Finally we state Gronwall’s Inequality, which will be useful in this section and the next.
Lemma 6.9 (Gronwall's Inequality). Let µ be a Borel measure on [0, ∞), let ε ≥ 0 and let f be a Borel measurable function that is bounded on bounded intervals and satisfies

0 ≤ f(t) ≤ ε + ∫_{[0,t)} f(s) µ(ds)

for t ≥ 0. Then

f(t) ≤ ε e^{µ[0,t)}.

In particular, if M > 0 and

0 ≤ f(t) ≤ ε + M ∫_0^t f(s) ds

for t ≥ 0, then

f(t) ≤ ε e^{Mt}.

Proof. This result is well known.
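A quick numerical sketch of the second statement (hypothetical names): on a grid of step h, the extremal f satisfying the hypothesis with equality is f_k = ε(1 + Mh)^k, which sits below εe^{Mt} at every grid point because 1 + x ≤ e^x.

```python
import math

def worst_case_f(eps, M, t_end, n):
    """Discretised extremal case of Gronwall: f_k = eps + M*h*sum_{j<k} f_j,
    the largest f allowed by the hypothesis on a grid of n steps."""
    h = t_end / n
    f, total = [], 0.0
    for _ in range(n + 1):
        fk = eps + M * h * total
        f.append(fk)
        total += fk
    return f, h

eps, M, t_end, n = 0.5, 2.0, 1.0, 1000
f, h = worst_case_f(eps, M, t_end, n)
# Gronwall's bound f(t) <= eps*exp(M*t) holds at every grid point
violations = sum(fk > eps * math.exp(M * k * h) for k, fk in enumerate(f))
print(violations, round(f[-1], 4), round(eps * math.exp(M * t_end), 4))
```

As n grows, f_n = ε(1 + Mh)^n increases to the bound εe^{Mt}, showing that the constant in the lemma cannot be improved.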
6.2 The Fluid Limit
Let (XtN )t≥0 be a sequence of pure jump Markov processes with Lévy kernels K N (x, dy) taking
values in some subsets I N of Rd . Let S be an open subset in Rd and set S N = I N ∩ S. We shall
study, under certain conditions, the limiting behaviour of (XtN )t≥0 as N → ∞, on compact time
intervals, up to the first time the process leaves S.
The introduction of S does not add any additional restrictions as we are free to take S = Rd .
However, in some situations our processes may all stop abruptly on leaving some open set U, i.e. K^N(x, dy) = 0 for all x ∉ U, and when this happens it is useful to restrict our attention to what happens before the processes leave U. In this case we may choose S to be a subset of U.
In other cases we may wish to take S to be small to make it easier to check that the various
convergence conditions are satisfied. The only restriction imposed on S is that the conjectured
limit path does not leave S in the relevant compact time interval.
Definition 6.10. Fix t_0 > 0 and set

T^N = inf{ t ≥ 0 : X^N_t ∉ S } ∧ t_0.

From now on we shall assume that X^N_t = X^N_{t∧T^N} for all t ≥ 0.
Let mN (x, θ) be the Laplace transform corresponding to Lévy kernel K N (x, dy) for x ∈ S N
and θ ∈ (Rd )∗ (see Definition 6.3). We assume that there is a limit kernel K(x, dy) defined for
x ∈ S with corresponding Laplace transform m(x, θ) with the following properties:
(a) There exists a constant η0 > 0 such that m(x, θ) is bounded for all x ∈ S and |θ| ≤ η0 .
(b) As N → ∞,

sup_{x∈S^N} sup_{|θ|≤η_0} | m^N(x, Nθ)/N − m(x, θ) | → 0. (6.4)
We give some justification of condition (6.4) with the following example.
Example 6.11. Consider the case outlined at the beginning of this chapter where X^N arises from X^1 by increasing the jump rate by a factor of N and decreasing the jump size by a factor of N. Then X^N_t = (1/N)(X^1_{Nt} − X^1_0) + X^N_0. So, setting K(x, dy) = K^1(x, dy), we get

K(x, dy)dt = P( J_1 ∈ dt, ΔX_{J_1} ∈ dy | J_1 > t, X_0 = x )
= P( N J^N_1 ∈ dt, N ΔX^N_{J^N_1} ∈ dy | N J^N_1 > t, X^N_0 = x )
= P( J^N_1 ∈ dt/N, ΔX^N_{J^N_1} ∈ dy/N | J^N_1 > t/N, X^N_0 = x )
= K^N(x, dy/N) dt/N.

Hence K^N(x, dy/N)/N = K(x, dy) and

m^N(x, Nθ)/N = (1/N) ∫_{R^d} e^{⟨Nθ,y⟩} K^N(x, dy) = ∫_{R^d} e^{⟨θ,y′⟩} K^N(x, dy′/N)/N = ∫_{R^d} e^{⟨θ,y′⟩} K(x, dy′) = m(x, θ)

(where y′ = Ny), i.e. (6.4) holds. Hence our fluid limit theorem will apply in this case (subject to a few additional constraints on K(x, dy)) and so we do have a generalisation of the Law of Large Numbers as claimed at the beginning of the section.
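The identity m^N(x, Nθ)/N = m(x, θ) can be checked exactly for a toy kernel with a single jump size, K(x, dy) = λδ_c(dy), so that K^N has rate Nλ and jump size c/N. The names and parameter values below are purely illustrative.

```python
import math

LAM, C_JUMP = 2.0, 1.5   # illustrative jump rate and jump size

def m_limit(theta):
    """m(x, theta) for the limit kernel K(x, dy) = LAM * delta_{C_JUMP}(dy)."""
    return LAM * math.exp(theta * C_JUMP)

def m_N(theta, N):
    """m^N(x, theta) for the rescaled kernel: rate N*LAM, jump size C_JUMP/N."""
    return N * LAM * math.exp(theta * C_JUMP / N)

for N in (1, 10, 1000):
    print(N, m_N(0.5 * N, N) / N, m_limit(0.5))   # equal for every N
```

For this kernel the convergence in (6.4) is not merely asymptotic but exact, which is what the change of variables y′ = Ny above shows in general.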
We also assume that there exists some x_0 ∈ S̄, the closure of S, such that, for all δ > 0,

lim sup_{N→∞} N^{−1} log P(|X^N_0 − x_0| ≥ δ) < 0. (6.5)

What this means is that there exists some γ_δ > 0 such that P(|X^N_0 − x_0| ≥ δ) < e^{−γ_δ N} for sufficiently large N, i.e. X^N_0 converges in probability to x_0 exponentially fast in N. This holds in the simple case X^N_0 = x^N_0 (deterministic) with x^N_0 → x_0 as N → ∞.
Set b(x) = m′ (x, 0), where once again ′ denotes differentiation in θ. We make the final
assumptions that b is Lipschitz on S and that S has a Lipschitz boundary so that b has an
extension to a Lipschitz vector field b̃ on Rd . Then there is a unique solution (xt )t≥0 to the
ordinary differential equation ẋt = b̃(xt ) starting from x0 .
Theorem 6.12 (Fluid Limit). Under the above assumptions, for all δ > 0,

lim sup_{N→∞} N^{−1} log P( sup_{t≤T^N} |X^N_t − x_t| ≥ δ ) < 0,

where T^N is defined in Definition 6.10.
Proof. By assumption (6.4), since m(x, θ) is bounded, there exists some constant C < ∞ such that, for all N ∈ N,

sup_{x∈S^N} sup_{|θ|≤η_0} m^N(x, Nθ)/N ≤ C.

Fix η ∈ (0, η_0). Then, by the proof of Lemma 6.4,

|m^{N}′′(x, θ)| ≤ 2NCd/(Nη_0 − Nη)² = A/N,

for all x ∈ S^N and |θ| ≤ Nη.
Set b^N(x) = m^{N}′(x, 0) and define (M^N_t)_{t≥0} by

X^N_t = X^N_0 + M^N_t + ∫_0^t b^N(X^N_s) ds.

Then

M^N_t = X^N_t − X^N_0 − ∫_0^t ∫_{R^d} y K^N(X^N_s, dy) ds = Σ_{J^N_n ≤ t} ΔX_{J^N_n} − ∫_0^t ∫_{R^d} y ν^N(ds, dy) = ∫_0^t ∫_{R^d} y (µ^N − ν^N)(ds, dy),

where µ^N and ν^N are defined as in Definition 6.2. M^N_t corresponds to the martingale M_t in (6.3) and hence we may apply Proposition 6.8 to M^N_t to conclude that, if ε_0 = (A/N) · Nη t_0 √d = Aηt_0√d > 0 and C_0 = max{2d, 2Adt_0}, then, for all ε ∈ (0, ε_0],

P( sup_{t≤T^N} |M^N_t| > ε ) ≤ 2d e^{−Nε²/(2AdT^N)} ≤ C_0 e^{−Nε²/C_0}. (6.6)
Given δ > 0, set ε = min{ (δ/3) e^{−Kt_0}, ε_0 }, where K is the Lipschitz constant of b̃. Let

Ω^N = { |X^N_0 − x_0| ≤ ε and sup_{t≤T^N} |M^N_t| ≤ ε }.

Then by (6.5) and (6.6),

P(Ω \ Ω^N) = P( |X^N_0 − x_0| > ε or sup_{t≤T^N} |M^N_t| > ε ) ≤ P(|X^N_0 − x_0| > ε) + P( sup_{t≤T^N} |M^N_t| > ε ) ≤ e^{−γ_ε N} + C_0 e^{−Nε²/C_0}

for large enough N, and, by L'Hôpital's Rule,

lim sup_{N→∞} N^{−1} log( e^{−γ_ε N} + C_0 e^{−Nε²/C_0} ) = lim sup_{N→∞} ( −γ_ε e^{−γ_ε N} − ε² e^{−Nε²/C_0} ) / ( e^{−γ_ε N} + C_0 e^{−Nε²/C_0} ) ≤ −min{γ_ε, ε²/C_0} < 0.

Hence

lim sup_{N→∞} N^{−1} log P(Ω \ Ω^N) < 0.
However, by (6.4), there exists N_1 ∈ N such that

sup_{x∈S^N} sup_{|θ|≤η_0} | m^N(x, Nθ)/N − m(x, θ) | ≤ ε²/(8At_0 d)

for all N ≥ N_1. Then for each N ≥ N_1, pick h ∈ (R^d)* in the same direction as b^N(x) − b(x) and with |h| = ε/(2At_0√d); note that |h| ≤ η by the definition of ε_0, since ε ≤ ε_0. Then

|b^N(x) − b(x)| = |⟨h, b^N(x) − b(x)⟩| / |h|
≤ (1/|h|) ( | ⟨Nh, b^N(x)⟩/N − (m^N(x, Nh) − m^N(x, 0))/N | + | m^N(x, Nh)/N − m(x, h) | + | m^N(x, 0)/N − m(x, 0) | + | m(x, h) − m(x, 0) − ⟨h, b(x)⟩ | )
≤ (1/|h|) ( φ^N(x, Nh)/N + 2 sup_{x∈S^N} sup_{|θ|≤η_0} | m^N(x, Nθ)/N − m(x, θ) | + |φ(x, h)| )
≤ (1/|h|) ( (1/N) ½ (A/N) N²|h|² + 2 ε²/(8At_0 d) + ½ A|h|² )
≤ ε/t_0. (6.7)
We note that

X^N_t − x_t = (X^N_0 − x_0) + M^N_t + ∫_0^t (b^N(X^N_s) − b(X^N_s)) ds + ∫_0^t (b̃(X^N_s) − b̃(x_s)) ds

for t ≤ T^N ≤ t_0. So, for N ≥ N_1, on Ω^N,

|X^N_t − x_t| ≤ 3ε + K ∫_0^t |X^N_s − x_s| ds,

which implies, by Gronwall's Inequality (Lemma 6.9), that sup_{t≤T^N} |X^N_t − x_t| ≤ 3ε e^{Kt_0} ≤ δ. Hence

lim sup_{N→∞} N^{−1} log P( sup_{t≤T^N} |X^N_t − x_t| ≥ δ ) ≤ lim sup_{N→∞} N^{−1} log P(Ω \ Ω^N) < 0,

as required.
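To illustrate Theorem 6.12, here is a simulation sketch (hypothetical names) for a toy density-dependent chain with jumps +1/N at rate Nx and −1/N at rate Nx², so that the limit drift is b(x) = x − x² and the fluid limit is the logistic ODE ẋ = x(1 − x); this anticipates the logistic growth application of Section 8.2.

```python
import math, random

def logistic_jump_path(N, x0, t_end, rng):
    """Gillespie simulation of the pure jump Markov chain with jumps +1/N at
    rate N*x and -1/N at rate N*x**2 (a toy logistic kernel)."""
    t, x = 0.0, x0
    times, states = [0.0], [x0]
    while True:
        up, down = N * x, N * x * x
        total = up + down
        if total <= 0.0:
            break                     # absorbed at 0
        t += rng.expovariate(total)   # exponential holding time
        if t >= t_end:
            break
        x += (1.0 if rng.random() < up / total else -1.0) / N
        times.append(t)
        states.append(x)
    return times, states

def logistic_ode(t, x0):
    """Exact solution of the fluid limit ODE dx/dt = x*(1 - x)."""
    return x0 / (x0 + (1.0 - x0) * math.exp(-t))

rng = random.Random(42)
N, x0, t_end = 5000, 0.2, 3.0
times, states = logistic_jump_path(N, x0, t_end, rng)
sup_err = max(abs(x - logistic_ode(t, x0)) for t, x in zip(times, states))
print(round(sup_err, 4))   # supremum distance to the fluid limit path
```

Rerunning with larger N shows the supremum distance shrinking, consistent with the exponentially small deviation probabilities in the theorem.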
Remark 6.13. A corresponding generalisation of the Law of Large Numbers holds for diffusion processes. Suppose X^1 is a diffusion process with diffusivity a_1 and drift b_1, satisfying the conditions of Section 3.2.3. For simplicity assume that X^1_0 = 0. Then there exists a Brownian motion B such that

dX^1_t = σ_1(X^1_t) dB_t + b_1(X^1_t) dt,

where σ_1(x)σ_1(x)* = a_1(x). Defining X^N as in Example 6.11, but with X^N_0 = 0, we get X^N_t = (1/N) X^1_{Nt} and so

dX^N_t = ( σ_1(N X^N_t)/√N ) dB′_t + b_1(N X^N_t) dt,

where B′_t = (1/√N) B_{Nt} is a Brownian motion. Hence X^N has diffusivity a^N(x) = a_1(Nx)/N and drift b^N(x) = b_1(Nx). Since a_1 is bounded, a^N → 0 uniformly as N → ∞. If there exists some Lipschitz function b such that b^N → b uniformly, let x_t be the solution to the ordinary differential equation dx_t/dt = b(x_t) with x_0 = 0. By Section 3.2.3, X^N_t → x_t in probability, generalising the Law of Large Numbers, as required.
6.3 A Brief Look at the Exit Time
We conclude this section by making some observations about the limiting distribution of the
exit time T N . This will enable us to make the statement of Theorem 6.12 more precise and will
also be useful in the next section.
Definition 6.14. Define

τ = inf{ t ≥ 0 : x_t ∉ S̄ } ∧ t_0,   T = { t ∈ [0, τ) : x_t ∉ S }.

We aim to show that

lim sup_{N→∞} N^{−1} log P( inf_{t∈T∪{τ}} |T^N − t| > δ ) < 0.
Set

Ω^N_δ = { sup_{t≤T^N} |X^N_t − x_t| < δ }

and

A_δ = { s ∈ [0, t_0] : inf_{y∈S^c} |x_s − y| < δ }.

Note that since S is open, by the right continuity of X^N, either T^N = t_0 or X^N_{T^N} ∉ S. Hence, on the set Ω^N_δ, T^N ∈ A_δ or T^N = t_0 (see Figure 2).

Figure 2: Illustration of the set A_δ
Note that if τ ≠ t_0, then, for 0 < ε < t_0 − τ, there exists some τ_ε ∈ [τ, τ + ε) such that x_{τ_ε} ∉ S̄. Since (S̄)^c is open, there exists δ > 0 such that B(x_{τ_ε}, δ) ∩ S̄ = ∅. Then on Ω^N_δ, X^N_{τ_ε} ∉ S and so T^N ≤ τ_ε < t_0. Hence T^N = t_0 on Ω^N_δ implies that τ = t_0.
Lemma 6.15. sup_{s∈A_δ} inf_{t∈T∪{τ}} |s − t| → 0 as δ → 0.
Proof. Suppose the contrary. Then there exists some ε > 0 such that, for each n ∈ N, there exists some s_n ∈ A_{1/n} with inf_{t∈T∪{τ}} |s_n − t| ≥ ε. By the definition of A_{1/n}, there is a sequence y_n ∈ S^c with |x_{s_n} − y_n| < 1/n. Since s_n is in the compact interval [0, t_0] for all n, there exists a convergent subsequence s_{n_r} → s for some s ≤ t_0. As x_t is continuous, x_{s_{n_r}} → x_s. Hence y_{n_r} = (y_{n_r} − x_{s_{n_r}}) + x_{s_{n_r}} → x_s and, since S^c is closed, x_s ∈ S^c, i.e. s ∈ T ∪ {τ}, contradicting our choice of s_n.
Proposition 6.16. For all δ > 0,

lim sup_{N→∞} N^{−1} log P( inf_{t∈T∪{τ}} |T^N − t| > δ ) < 0.
Proof. By the above lemma, given δ > 0, there exists an ε > 0 such that sup_{s∈A_ε} inf_{t∈T∪{τ}} |s − t| ≤ δ and such that, if τ ≠ t_0, B(x_{τ_η}, ε) ∩ S̄ = ∅ for some 0 < η < t_0 − τ. On Ω^N_ε, T^N ∈ A_ε or T^N = t_0. Hence inf_{t∈T∪{τ}} |T^N − t| ≤ δ and so { inf_{t∈T∪{τ}} |T^N − t| > δ } ⊂ Ω \ Ω^N_ε. By Theorem 6.12,

lim sup_{N→∞} N^{−1} log P( inf_{t∈T∪{τ}} |T^N − t| > δ ) ≤ lim sup_{N→∞} N^{−1} log P(Ω \ Ω^N_ε) = lim sup_{N→∞} N^{−1} log P( sup_{t≤T^N} |X^N_t − x_t| ≥ ε ) < 0.
In particular, if T is empty, then T^N → τ in probability, exponentially fast in N. Since X^N_t = X^N_{t∧T^N}, for all t ≥ T^N, |X^N_t − x_{t∧τ}| ≤ |X^N_{T^N} − x_{T^N}| + |x_{T^N} − x_{t∧τ}|. Now as x_t is continuous, there exists some ε_0 > 0 such that |T^N − τ| ≤ ε_0 implies that the second term is less than δ/2 for all t ≥ T^N. Choosing 0 < ε < δ/2 sufficiently small that Ω^N_ε ⊂ { |T^N − τ| ≤ ε_0 } gives

lim sup_{N→∞} N^{−1} log P( sup_{t≤t_0} |X^N_t − x_{t∧τ}| > δ ) ≤ lim sup_{N→∞} N^{−1} log P(Ω \ Ω^N_ε) < 0.

7 A Central Limit Theorem
The Central Limit Theorem states that if we have a sequence (X_n)_{n∈N} of independent identically distributed random variables with finite mean and variance, then

√N ( (X_1 + X_2 + ⋯ + X_N)/N − µ )

converges in distribution to a N(0, σ²) random variable, where σ² is the variance of each random variable, and µ is the mean of each random variable or, equivalently, the deterministic limit to which (X_1 + X_2 + ⋯ + X_N)/N converges by the Law of Large Numbers.
As in the previous section, we can view these random variables as the jump sizes in a random
walk, and so obtain a corresponding statement about the position of the random walk at time
1 if the jump rate is increased and the jump size is decreased, each by a factor of N .
In this section we generalise the Central Limit Theorem to Markov jump processes by showing
that, under certain conditions, if we alter the jump rate and jump size as described above to
obtain a sequence (XtN )t≥0 of jump processes, then,
√N ( X^N_t − x_t )
converges in distribution to a Gaussian process (the analogue of a Gaussian random variable
for processes), where xt is the deterministic fluid limit obtained in the previous section by
generalising the Law of Large Numbers.
We prove our result for the more general sequence of Markov jump processes described in
the previous section.
Definition 7.1. Define

γ^N_t = √N ( X^N_t − x_{t∧T^N} ),

where T^N is the exit time defined in Definition 6.10 and, as before, X^N_t = X^N_{t∧T^N}.
In addition to the assumptions of the previous section, we also assume that
(a) there exists some random variable γ_0 such that

γ^N_0 ⇒ γ_0, (7.1)

(b)

sup_{x∈S^N} √N |b^N(x) − b(x)| → 0, (7.2)

where, as before, b^N(x) = m^{N}′(x, 0) and b(x) = m′(x, 0),
(c) b is C¹ on S,
(d) a, defined by a(x) = m′′(x, 0), is Lipschitz on S.
Example 7.2. In the case where X^N arises from X^1 by increasing the jump rate by a factor of N and decreasing the jump size by a factor of N, by Example 6.11, m^N(x, Nθ)/N = m(x, θ) and so b^N(x) = b(x). Hence assumption (b) holds in this case and so our convergence theorem will apply (subject to a few additional constraints on K(x, dy)), and so we do have a generalisation of the Central Limit Theorem as claimed at the beginning of the section.
Note that

⟨u, a(x)v⟩ = ∫_{R^d} ⟨u, y⟩⟨y, v⟩ K(x, dy) = ⟨u, a(x)*v⟩

and

⟨v, a(x)v⟩ = ∫_{R^d} ⟨v, y⟩² K(x, dy) ≥ 0,

so a is symmetric positive definite and hence there exists σ, unique up to change of sign, such that σ(x)σ(x)* = a(x).
Definition 7.3. Let (γt )t≤τ be the unique solution to the linear stochastic differential equation
dγt = σ(xt )dBt + ∇b(xt )γt dt
starting from γ0 , where τ is defined in Definition 6.14 and B is a Brownian motion. Note that
the distribution of (γt )t≤τ does not depend on the choice of σ since −B and B have the same
distributions.
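As a sketch of Definition 7.3 (hypothetical names), the linear SDE can be solved by Euler-Maruyama along a fluid path. Here we take the illustrative scalar coefficients a(x) = x + x² and ∇b(x) = 1 − 2x, arising from a toy kernel with jumps +1 at rate x and −1 at rate x² (so b(x) = x − x², the logistic drift), and compare the Monte Carlo variance with the variance ODE V̇ = 2∇b(x_t)V + a(x_t).

```python
import math, random

def simulate_gamma_paths(n_paths, dt, t_end, x0, rng):
    """Euler-Maruyama for d(gamma) = sigma(x_t) dB_t + grad_b(x_t) gamma dt
    along the fluid path x_t, with a(x) = x + x**2 and grad_b(x) = 1 - 2x."""
    gammas = [0.0] * n_paths           # gamma_0 = 0 (non-random)
    x = x0
    for _ in range(int(t_end / dt)):
        sig = math.sqrt((x + x * x) * dt)
        g = 1.0 - 2.0 * x
        gammas = [gam + g * gam * dt + sig * rng.gauss(0.0, 1.0) for gam in gammas]
        x += x * (1.0 - x) * dt        # fluid-limit ODE step
    return gammas

def variance_ode(dt, t_end, x0):
    """V' = 2*grad_b(x_t)*V + a(x_t): variance of the Gaussian limit."""
    x, v = x0, 0.0
    for _ in range(int(t_end / dt)):
        v += (2.0 * (1.0 - 2.0 * x) * v + (x + x * x)) * dt
        x += x * (1.0 - x) * dt
    return v

rng = random.Random(7)
dt, t_end, x0 = 0.005, 2.0, 0.2
gammas = simulate_gamma_paths(1000, dt, t_end, x0, rng)
mean = sum(gammas) / len(gammas)
var = sum((g - mean) ** 2 for g in gammas) / len(gammas)
print(round(mean, 2), round(var, 2), round(variance_ode(dt, t_end, x0), 2))
```

With γ_0 = 0 the limit at each fixed time is centred Gaussian, so the sample mean stays near 0 and the sample variance tracks the solution of the variance ODE.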
As the limiting distribution of γ^N_t will depend on the limiting behaviour of T^N, we make the final assumptions that T, defined in Definition 6.14, is finite and that, for all t ∈ T, ∂S is C¹ at x_t with inward normal n_t, and P(⟨n_t, γ_t⟩ = 0) = 0.
Definition 7.4. Define

T = min{ t ∈ T : ⟨n_t, γ_t⟩ < 0 } ∧ τ.

From now on we assume that γ_t = γ_{t∧T}.
We aim to prove that γ^N_t ⇒ γ_t as N → ∞.
7.1 Relative Compactness
As (γt )t≥0 arises from a linear stochastic differential equation, it has independent increments
and so checking that the finite dimensional distributions of γ N converge to those of γ will be
relatively straightforward. Therefore it is convenient to use Theorem 4.3 to prove the convergence
in distribution of the above processes.
We establish the relative compactness of the sequence (γ N )N ∈N by checking the necessary
and sufficient conditions stated in Corollary 5.11, namely that
(a) For every ε > 0 and rational t ≥ 0, there exists a compact set Γ_{ε,t} ⊂ R^d such that

lim inf_{N→∞} P(γ^N_t ∈ Γ^ε_{ε,t}) ≥ 1 − ε.

(b) For every ε > 0 and T > 0, there exists δ > 0 such that

lim sup_{N→∞} P(w′(γ^N, δ, T) ≥ ε) ≤ ε,

where w′ is defined in Definition 5.7.
By conditioning on F0 = σ(γ0 ), if necessary, it is sufficient to consider the case when γ0 is
non-random. We make this assumption throughout the remainder of this section.
Lemma 7.5. For all ε > 0 there exists λ < ∞ such that, for all N ∈ N,

P( sup_{t≤t_0} |γ^N_t| ≥ λ ) < ε.
Proof. Given ε > 0, pick λ′ ≥ max{ |γ_0| + 1, √(C_0 log(2C_0/ε)) }, where C_0 is defined in (6.6). By (7.2), there exists some N_1 ∈ N such that N ≥ N_1 implies

√N |b^N(x) − b(x)| ≤ λ′/t_0

for all x ∈ S^N. By (7.1), there exists some N_2 ∈ N such that N ≥ N_2 implies

P(|γ^N_0| > λ′) ≤ P(|γ^N_0| > |γ_0| + 1) < ε/2.

By (6.6), there exists some N_3 ∈ N such that N ≥ N_3 implies that

P( √N sup_{t≤T^N} |M^N_t| > λ′ ) ≤ P( sup_{t≤T^N} |M^N_t| > √(C_0 log(2C_0/ε)/N) ) ≤ ε/2;

N_3 is chosen so that √(C_0 log(2C_0/ε)/N) ≤ ε_0, where ε_0 is defined in (6.6).
Now, by the definition of M^N_t,

γ^N_t = γ^N_0 + √N M^N_t + √N ∫_0^t (b^N(X^N_s) − b(X^N_s)) ds + √N ∫_0^t (b̃(X^N_s) − b̃(x_s)) ds,

and so, on the set

{ √N |b^N(x) − b(x)| ≤ λ′/t_0, |γ^N_0| ≤ λ′, √N sup_{t≤T^N} |M^N_t| ≤ λ′ },

|γ^N_t| ≤ λ′ + λ′ + t λ′/t_0 + √N K ∫_0^t |X^N_s − x_s| ds ≤ 3λ′ + K ∫_0^t |γ^N_s| ds,

where K is the Lipschitz constant of b̃. Hence, by Gronwall's Inequality (Lemma 6.9),

sup_{t≤T^N} |γ^N_t| ≤ 3λ′ e^{Kt_0},

and so, if N_0 = max{N_1, N_2, N_3}, then N ≥ N_0 implies that

P( sup_{t≤t_0} |γ^N_t| ≥ 3λ′ e^{Kt_0} ) ≤ P( |γ^N_0| > λ′ or √N sup_{t≤T^N} |M^N_t| > λ′ ) < ε.

Pick λ ≥ 3λ′ e^{Kt_0} sufficiently large that P( sup_{t≤t_0} |γ^N_t| ≥ λ ) < ε for all N < N_0. Then, for all N ∈ N, P( sup_{t≤t_0} |γ^N_t| ≥ λ ) < ε.
For all t, take Γ_{ε,t} = B̄(0, λ), the closed ball of radius λ in R^d. Then by the above lemma,

lim inf_{N→∞} P(γ^N_t ∈ Γ^ε_{ε,t}) ≥ 1 − ε,

and so condition (a) of Corollary 5.11 holds.
Lemma 7.6. For all ε > 0, there exists λ < ∞ such that, for all δ > 0, there exists N_δ < ∞ such that, for all N ≥ N_δ and all t ≤ t_0,

P( sup_{s≤t_0, t≤s≤t+δ} |γ^N_s − γ^N_t| > λ√δ ) < ε.
Proof. Given ε > 0, by Lemma 7.5 there exists λ_1 < ∞ such that

P( |γ^N_t| ≥ λ_1/(K√t_0) ) < ε/2

for all t ≤ t_0. By (6.6), setting λ_2 = √(2Ad log(4d/ε)), there exists N′_δ < ∞ such that N ≥ N′_δ implies that

P( √N sup_{t≤T^N∧δ} |M^N_t| > λ_2√δ ) ≤ 2d exp{ −N (λ_2√(δ/N))² / (2Ad(T^N ∧ δ)) } ≤ ε/2,

and hence

P( √N sup_{s≤t_0, t≤s≤t+δ} |M^N_s − M^N_t| > λ_2√δ ) < ε/2,

where the dependence of N′_δ on δ arises from needing λ_2√(δ/N) ≤ ε_0. Setting λ′ = max{λ_1, λ_2} and λ = 3λ′ e^{Kt_0}, by (7.2), there exists N_0 < ∞ such that N ≥ N_0 implies

√N |b^N(x) − b(x)| ≤ λ′/√t_0.

Now, by the definition of M^N_t, if s ≥ t,

γ^N_s − γ^N_t = √N (M^N_s − M^N_t) + √N ∫_t^s (b^N(X^N_u) − b(X^N_u)) du + √N ∫_t^s (b̃(X^N_u) − b̃(x_u)) du,

and

| √N ∫_t^s (b̃(X^N_u) − b̃(x_u)) du | ≤ K ∫_t^s |γ^N_u| du ≤ K ∫_t^s ( |γ^N_t| + |γ^N_u − γ^N_t| ) du.

Hence, on the set

{ √N |b^N(x) − b(x)| ≤ λ′/√t_0, |γ^N_t| ≤ λ′/(K√t_0), √N sup_{s≤t_0, t≤s≤t+δ} |M^N_s − M^N_t| ≤ λ′√δ },

if s ≤ t_0 and t ≤ s ≤ t + δ,

|γ^N_s − γ^N_t| ≤ λ′√δ + (s − t) λ′/√t_0 + K(s − t) λ′/(K√t_0) + K ∫_t^s |γ^N_u − γ^N_t| du ≤ 3λ′√δ + K ∫_t^s |γ^N_u − γ^N_t| du,

where, for the second inequality, δ may be taken to be at most t_0 without changing the admissible values of s. By Gronwall's Inequality (Lemma 6.9),

sup_{s≤t_0, t≤s≤t+δ} |γ^N_s − γ^N_t| ≤ 3λ′√δ e^{Kt_0} = λ√δ,

and so, if N_δ = max{N_0, N′_δ}, then N ≥ N_δ implies that

P( sup_{s≤t_0, t≤s≤t+δ} |γ^N_s − γ^N_t| ≥ λ√δ ) ≤ P( |γ^N_t| > λ′/(K√t_0) ) + P( √N sup_{s≤t_0, t≤s≤t+δ} |M^N_s − M^N_t| > λ′√δ ) ≤ ε.
Given ε > 0 and T > 0, set 2δ = (ε/λ)², where λ is as above. Then there exists a partition {t_i} of the form 0 = t_0 < t_1 < ⋯ < t_{n−1} < T ≤ t_n with t_i − t_{i−1} = 2δ for all i. Now

w′(γ^N, δ, T) ≤ max_i sup_{s,t∈[t_{i−1},t_i)} |γ^N_s − γ^N_t| ≤ sup_{s≤t_0, t≤s≤t+2δ} |γ^N_s − γ^N_t|,

and so

lim sup_{N→∞} P( w′(γ^N, δ, T) ≥ ε ) ≤ lim sup_{N→∞} P( sup_{s≤t_0, t≤s≤t+2δ} |γ^N_s − γ^N_t| ≥ λ√(2δ) ) ≤ ε.

Therefore condition (b) of Corollary 5.11 holds and hence the sequence (γ^N)_{N∈N} is relatively compact.
7.2 Convergence of the Finite Dimensional Distributions
It now remains to prove that, for all finite subsets {t_1, …, t_k} ⊂ [0, ∞), the finite dimensional distributions (γ^N_{t_1}, …, γ^N_{t_k}) ⇒ (γ_{t_1}, …, γ_{t_k}). As γ^N_t is stopped at T^N and γ_t is stopped at T, where T is defined in Definition 7.4, it is necessary to establish a preliminary result about the relationship between T^N and T when N is large.
Lemma 7.7.
(a) If T = 0, then P(T N = 0) → 1 as N → ∞.
(b) If T > 0 and τ1 is the smallest non-zero element of T , then for every t < τ1 , P(T N > t) → 1
as N → ∞.
Proof. (a) By the definition of T, T = 0 if and only if x_0 ∈ ∂S and ⟨n_0, γ_0⟩ < 0, where n_0 is the inward normal at x_0. Since γ^N_0 → γ_0 in probability (γ_0 is non-random), ⟨n_0, γ_0⟩ < 0 implies that P(⟨n_0, γ^N_0⟩ < 0) → 1 as N → ∞ and so P(⟨n_0, X^N_0 − x_0⟩ < 0) → 1 as N → ∞ (by the definition of γ^N_0). If x_0 ∈ ∂S, then this means that X^N_0 ∉ S provided |X^N_0 − x_0| is small enough, so that T^N = 0. Since X^N_0 → x_0 in probability, this implies that P(T^N = 0) → 1 as N → ∞.
(b) If T > 0, then, as above, either x_0 ∈ S, or ⟨n_0, γ_0⟩ > 0 (since P(⟨n_0, γ_0⟩ = 0) = 0). If x_0 ∈ S then, since T is finite, the result follows by Proposition 6.16. Suppose that x_0 ∈ ∂S and ⟨n_0, γ_0⟩ > 0. Since ∂S is C¹ at x_0, for all ε > 0, there exists δ(ε) > 0 such that, for all x ∈ S̄ with |x − x_0| ≤ δ(ε) and all v ∈ R^d,

|v| ≤ δ(ε) and ⟨n_0, v⟩ ≥ ε|v| ⇒ x + v ∈ S. (7.3)
We illustrate this condition in Figure 3.
Figure 3: Illustration of the case x_0 ∈ ∂S and ⟨n_0, γ_0⟩ > 0
Given ε > 0, let λ_1 be the λ from Lemma 7.5 corresponding to ε/3. Let λ_2 be the λ from Lemma 7.6 corresponding to ε/3. Since γ^N_0 → γ_0 in probability, there exists some N_1 < ∞ such that N ≥ N_1 implies

P( |γ^N_0 − γ_0| > ⟨n_0, γ_0⟩/4 ) < ε/3.

Let ε_1 = min{ ⟨n_0, γ_0⟩/(2λ_1), 1/λ_1, (⟨n_0, γ_0⟩/(4λ_2))² } and N_0 = max{N_1, N_{ε_1}}, where N_{ε_1} is defined as in Lemma 7.6. Let Ω be the set

{ sup_{t≤t_0} |γ^N_t| < min{ 1/ε_1, ⟨n_0, γ_0⟩/(2ε_1) }, sup_{t≤t_0∧ε_1} |γ^N_t − γ^N_0| ≤ ⟨n_0, γ_0⟩/4, |γ^N_0 − γ_0| ≤ ⟨n_0, γ_0⟩/4 }.
Then if N ≥ N_0 and t ≤ t_0 ∧ ε_1, on Ω,

⟨n_0, γ^N_t⟩ ≥ ⟨n_0, γ_0⟩ − |⟨n_0, γ^N_t − γ^N_0⟩| − |⟨n_0, γ^N_0 − γ_0⟩| ≥ ⟨n_0, γ_0⟩ − |γ^N_t − γ^N_0| − |γ^N_0 − γ_0| ≥ ⟨n_0, γ_0⟩ − ⟨n_0, γ_0⟩/4 − ⟨n_0, γ_0⟩/4 = ⟨n_0, γ_0⟩/2 ≥ ε_1 |γ^N_t|,

and

|γ^N_t| < 1/ε_1.

But

P(Ω^c) ≤ P( sup_{t≤t_0} |γ^N_t| ≥ λ_1 ) + P( sup_{t≤t_0∧ε_1} |γ^N_t − γ^N_0| > λ_2√ε_1 ) + P( |γ^N_0 − γ_0| > ⟨n_0, γ_0⟩/4 ),

which is less than ε. Hence

P( ⟨n_0, γ^N_t⟩ > ε_1|γ^N_t| and |γ^N_t| < 1/ε_1 ) ≥ 1 − ε.

By the continuity of x_t, there exists 0 < ε_2 < ε_1 ∧ τ such that t ≤ ε_2 implies |x_t − x_0| ≤ δ(ε_1). Set N′ ≥ max{N_0, (ε_1 δ(ε_1))^{−2}}. Then, for N ≥ N′ and t ≤ T^N ∧ ε_2, we have x_t ∈ S̄, |x_t − x_0| ≤ δ(ε_1), N^{−1/2}|γ^N_t| ≤ N^{−1/2}/ε_1 ≤ δ(ε_1) and ⟨n_0, γ^N_t⟩ > ε_1|γ^N_t|, all with probability exceeding 1 − ε. By (7.3), this implies that X^N_t = x_t + N^{−1/2}γ^N_t ∈ S, i.e. T^N ≠ t for all t ≤ ε_2. Hence P(T^N ≤ ε_2) < ε for all N ≥ N′, and so P(T^N = 0) → 0 as N → ∞. The result follows by Proposition 6.16.
Lemma 7.8. Suppose t ≤ t0 . Then γtN ⇒ γt as N → ∞.
Proof. If T = 0, then by Lemma 7.7, T N → 0 in probability as N → ∞. If t > 0, then, recalling
that γtN is stopped at T N and that γt is stopped at T , by Lemma 7.6 and (7.1),
P(|γtN − γt | ≤ ε)
P(|γTNN − γ0 | ≤ ε)
ε
ε
≥ P(|γTNN − γ0N | ≤ , |γ0N − γ0 | < and T N ≤ t)
2
2
→ 1.
=
So assume T > 0. By conditioning on σ(γτm ) if necessary, where τm ∈ T , we may assume
that t < τ1 , where τ1 is the smallest non-zero element of T .
Define $(\psi_t)_{t \leq \tau}$ in $\mathbb{R}^d \otimes (\mathbb{R}^d)^*$ by
\[
\dot{\psi}_t = \nabla b(x_t)\psi_t, \qquad \psi_0 = \mathrm{id}.
\]
Fix $\theta \in (\mathbb{R}^d)^*$ and set $\theta_t = (\psi_t^*)^{-1}\theta$. Then $\dot{\psi}_t^*\theta_t + \psi_t^*\dot{\theta}_t = 0$ ($\theta$ is constant) and so $\dot{\theta}_t = -(\psi_t^*)^{-1}(\dot{\psi}_t^*\theta_t) = -(\psi_t^*)^{-1}(\psi_t^*\nabla b(x_t)^*\theta_t) = -\nabla b(x_t)^*\theta_t$. By Itô's formula, and the definition of $\gamma_t$ (Definition 7.3),
\[
\begin{aligned}
d\langle\theta_t, \gamma_t\rangle &= \langle d\theta_t, \gamma_t\rangle + \langle\theta_t, d\gamma_t\rangle + 0 \\
&= -\langle\nabla b(x_t)^*\theta_t, \gamma_t\rangle\,dt + \langle\theta_t, \sigma(x_t)dB_t\rangle + \langle\theta_t, \nabla b(x_t)\gamma_t\rangle\,dt \\
&= \langle\theta_t, \sigma(x_t)dB_t\rangle,
\end{aligned}
\]
and so
\[
\langle\theta_t, \gamma_t\rangle \sim N\Big(\langle\theta, \gamma_0\rangle, \int_0^t \langle\theta_s, a(x_s)\theta_s\rangle\,ds\Big),
\]
i.e. $\gamma$ is a Gaussian process. On the other hand,
\[
d\gamma_t^N = \sqrt{N}\,dM_t^N + \sqrt{N}\big(b^N(X_t^N) - b(x_t)\big)\,dt,
\]
where $M_t^N$ is defined as in Theorem 6.12. As above,
\[
d\langle\theta_t, \gamma_t^N\rangle = \sqrt{N}\langle\theta_t, dM_t^N\rangle + R_t^{N,\theta}\,dt, \tag{7.4}
\]
where
\[
R_t^{N,\theta} = \sqrt{N}\big\langle\theta_t, b^N(X_t^N) - b(x_t) - \nabla b(x_t)(X_t^N - x_t)\big\rangle.
\]
By (7.2),
\[
\sup_{t \leq T^N} \sqrt{N}\,|b^N(X_t^N) - b(X_t^N)| \to 0.
\]
Given $\varepsilon > 0$, since $x_t \in S$ for all $t$ in the compact interval $[\varepsilon, \tau_1 - \varepsilon]$, there exists some $\delta' > 0$ such that the compact set $\{x \in \mathbb{R}^d : \inf_{t \in [\varepsilon, \tau_1 - \varepsilon]} |x - x_t| \leq \delta'\}$ is a subset of $S$. Then, since $b$ is $C^1$ on $S$, $\nabla b$ is uniformly continuous on compact sets and so there exists some $0 < \delta < \delta'$ such that $|x - y| < \delta$ implies that $|\nabla b(x) - \nabla b(y)| \leq \varepsilon$. Now suppose $|x - x_t| \leq \delta$. By the mean value theorem, there exists some $\xi \in B(x_t, |x - x_t|) \subset B(x_t, \delta)$ such that $b(x) - b(x_t) = \nabla b(\xi)(x - x_t)$. Then
\[
|b(x) - b(x_t) - \nabla b(x_t)(x - x_t)| = |\nabla b(\xi) - \nabla b(x_t)||x - x_t| \leq \varepsilon|x - x_t|.
\]
Hence, $|X_t^N - x_t| \leq \delta$ and $\varepsilon \leq t \leq \tau_1 - \varepsilon$ imply
\[
\sqrt{N}\,|b(X_t^N) - b(x_t) - \nabla b(x_t)(X_t^N - x_t)| \leq \varepsilon|\gamma_t^N|.
\]
$\theta_t$ is continuous and hence bounded, by $C_1$ say, on the compact interval $[0, \tau_1]$. Similarly $\nabla b(x_t)$ is bounded, by $C_2$ say. Given $\eta > 0$, by Lemma 7.5 there exists some $\lambda < \infty$ such that $P(\sup_{t \leq T^N} |\gamma_t^N| > \lambda) < \frac{\eta}{2}$. Given $t' < \tau_1$, pick $\varepsilon$ sufficiently small that $t' \leq \tau_1 - \varepsilon$ and $\varepsilon[C_1(\varepsilon + (K + C_2)\lambda) + t' C_1(1 + \lambda)] < \eta$. Then, if $\delta > 0$ is as above, on the set
\[
\Omega = \Big\{ \sup_{t \leq T^N} \sqrt{N}\,|b^N(X_t^N) - b(X_t^N)| < \varepsilon,\ \sup_{t \leq T^N} |X_t^N - x_t| \leq \delta,\ \sup_{t \leq T^N} |\gamma_t^N| \leq \lambda \Big\},
\]
we have
\[
|R_t^{N,\theta}| \leq C_1\Big(\sup_{t \leq t'} \sqrt{N}\,|b^N(X_t^N) - b(X_t^N)| + \big(K + \sup_{t \leq t_0} |\nabla b(x_t)|\big)\sup_{t \leq t'} |\gamma_t^N|\Big) \leq C_1(\varepsilon + (K + C_2)\lambda),
\]
and if $\varepsilon \leq t \leq t'$, then
\[
|R_t^{N,\theta}| \leq C_1\Big(\sup_{t \leq t'} \sqrt{N}\,|b^N(X_t^N) - b(X_t^N)| + \varepsilon \sup_{t \leq t'} |\gamma_t^N|\Big) \leq C_1\varepsilon(1 + \lambda).
\]
Hence
\[
\int_0^{t'} |R_t^{N,\theta}|\,dt = \int_0^\varepsilon |R_t^{N,\theta}|\,dt + \int_\varepsilon^{t'} |R_t^{N,\theta}|\,dt \leq \varepsilon\big[C_1(\varepsilon + (K + C_2)\lambda)\big] + (t' - \varepsilon)\big[C_1\varepsilon(1 + \lambda)\big] < \eta.
\]
Choosing $N$ sufficiently large that $P(\Omega) \geq 1 - \eta$,
\[
P\Big(\int_0^{t'} |R_t^{N,\theta}|\,dt \leq \eta\Big) \geq 1 - \eta,
\]
i.e. the integral converges to 0 in probability.
Therefore, returning to (7.4), in order to show $\gamma_t^N \Rightarrow \gamma_t$, it suffices to show, for all $\theta \in (\mathbb{R}^d)^*$ and all $t \leq \tau_1$, that
\[
\sqrt{N}\int_0^t \langle\theta_s, dM_s^N\rangle \to N\Big(0, \int_0^t \langle\theta_s, a(x_s)\theta_s\rangle\,ds\Big)
\]
in distribution. By Lévy's Continuity Theorem, it is sufficient to show, for all $\theta \in (\mathbb{R}^d)^*$ and all $t \leq \tau_1$, that $E(E_t^{N,\theta}) \to 1$ as $N \to \infty$, where
\[
E_t^{N,\theta} = \exp\Big\{i\sqrt{N}\int_0^t \langle\theta_s, dM_s^N\rangle + \frac{1}{2}\int_0^t \langle\theta_s, a(x_s)\theta_s\rangle\,ds\Big\}.
\]
Set $\tilde{m}^N(x, \theta) = m^N(x, i\theta)$, $\tilde{m}(x, \theta) = m(x, i\theta)$ and
\[
\tilde{\varphi}^N(x, \theta) = \int_{\mathbb{R}^d} \big(e^{i\langle\theta, y\rangle} - 1 - i\langle\theta, y\rangle\big)\,K^N(x, dy).
\]
Assuming that (6.4) holds for $\theta \in (\mathbb{C}^d)^*$, by the same argument as (6.7),
\[
\sup_{x \in S^N} \sup_{|\theta| \leq \eta} |\tilde{m}^{N\prime}(x, N\theta) - \tilde{m}'(x, \theta)| \to 0
\]
as $N \to \infty$, for all $\eta \leq \eta_0$.
as N → ∞, for all η ≤ η0 . By the second-order mean value theorem, if h ∈ (Rd )∗ , then
Z 1
m̃′′ (x, θ)h = m̃′ (x, θ + h) − m̃′ (x, θ) −
m̃′′′ (x, rθ)(h, h)(1 − r)dr,
0
and similarly for m̃N ′′ (x, N θ). By Lemma 6.4, |m̃′′′ (x, θ)| ≤ B, for all x ∈ S and |θ| ≤ η, and
η0 −η ε
B
N
N 2 |m̃N ′′′ (x, N θ)| ≤ N 2 N
N 3 = B, for all x ∈ S and |θ| ≤ η. Given ε > 0, let δ = min{ 2 , 2B }.
There exists N0 such that N ≥ N0 implies
sup
sup
x∈S N
η+η
|θ|≤ 2 0
|m̃N ′ (x, N θ) − m̃′ (x, θ)| <
εδ
.
4
If $N \geq N_0$, then
\[
\begin{aligned}
|N\tilde{m}^{N\prime\prime}(x, N\theta) - \tilde{m}''(x, \theta)|
&= \sup_{|h| = \delta} \frac{1}{|h|}\big|\big(N\tilde{m}^{N\prime\prime}(x, N\theta) - \tilde{m}''(x, \theta)\big)h\big| \\
&= \sup_{|h| = \delta} \frac{1}{|h|}\Big|\tilde{m}^{N\prime}(x, N(\theta + h)) - \tilde{m}'(x, \theta + h) - \tilde{m}^{N\prime}(x, N\theta) + \tilde{m}'(x, \theta) \\
&\qquad\qquad + \int_0^1 \big(\tilde{m}'''(x, \theta + rh) - N^2\tilde{m}^{N\prime\prime\prime}(x, N(\theta + rh))\big)(h, h)(1 - r)\,dr\Big| \\
&\leq \sup_{|h| = \delta} \frac{1}{|h|}\Big(2\sup_{x \in S^N} \sup_{|\theta| \leq \frac{\eta + \eta_0}{2}} |\tilde{m}^{N\prime}(x, N\theta) - \tilde{m}'(x, \theta)| + \frac{1}{2}\cdot 2B|h|^2\Big) \\
&< \frac{1}{\delta}\cdot\frac{2\varepsilon\delta}{4} + B\delta \leq \varepsilon.
\end{aligned}
\]
Hence, for all $\eta < \eta_0$, we have
\[
\sup_{x \in S^N} \sup_{|\theta| \leq \eta} |N\tilde{m}^{N\prime\prime}(x, N\theta) - \tilde{m}''(x, \theta)| \to 0.
\]
Since $a(x) = m''(x, 0) = -\tilde{m}''(x, 0)$,
\[
\tilde{\varphi}^N(x, \sqrt{N}\theta) + \frac{1}{2}\langle\theta, a(x)\theta\rangle = \int_0^1 \big(N\tilde{m}^{N\prime\prime}(x, \sqrt{N}r\theta) - \tilde{m}''(x, 0)\big)(\theta, \theta)(1 - r)\,dr,
\]
where the first term arises from the second-order mean value theorem. Hence, for all $\rho < \infty$,
\[
\begin{aligned}
\sup_{x \in S^N} \sup_{|\theta| \leq \rho} \Big|\tilde{\varphi}^N(x, \sqrt{N}\theta) + \frac{1}{2}\langle\theta, a(x)\theta\rangle\Big|
&\leq \frac{1}{2}\rho^2 \sup_{x \in S^N} \sup_{|\theta| \leq \rho} |N\tilde{m}^{N\prime\prime}(x, \sqrt{N}\theta) - \tilde{m}''(x, 0)| \\
&\leq \frac{1}{2}\rho^2 \Big(\sup_{x \in S^N} \sup_{|\theta| \leq \rho/\sqrt{N}} |N\tilde{m}^{N\prime\prime}(x, N\theta) - \tilde{m}''(x, \theta)| \\
&\qquad\quad + \sup_{x \in S^N} \sup_{|\theta| \leq \rho} |\tilde{m}''(x, \theta/\sqrt{N}) - \tilde{m}''(x, 0)|\Big) \to 0,
\end{aligned} \tag{7.5}
\]
the second term converging by the continuity of $\tilde{m}''(x, \theta)$ as a function of $\theta$.
Write $E_t^{N,\theta} = E_t^N = Z_t^N A_t^N B_t^N$, where
\[
\begin{aligned}
Z_t^N &= \exp\Big\{i\sqrt{N}\int_0^t \langle\theta_s, dM_s^N\rangle - \int_0^t \tilde{\varphi}^N(X_s^N, \sqrt{N}\theta_s)\,ds\Big\}, \\
A_t^N &= \exp\Big\{\int_0^t \Big(\tilde{\varphi}^N(X_s^N, \sqrt{N}\theta_s) + \frac{1}{2}\langle\theta_s, a(X_s^N)\theta_s\rangle\Big)\,ds\Big\}, \\
B_t^N &= \exp\Big\{\int_0^t \frac{1}{2}\langle\theta_s, (a(x_s) - a(X_s^N))\theta_s\rangle\,ds\Big\}.
\end{aligned}
\]
$(Z_{t \wedge T^N}^N)_{t \leq \tau}$ is a martingale as in Lemma 6.7, and so $E(Z_{t \wedge T^N}^N) = 1$ for all $N$. Fix $t \leq \tau$. Since $\theta_s$ is continuous, it is bounded on the compact interval $[0, t]$. By Lemma 6.4, $a(x)$ is bounded and so, for $s \leq t$, by (7.5), $\tilde{\varphi}^N(X_s^N, \sqrt{N}\theta_s)$, and hence $Z_{t \wedge T^N}^N$, is bounded, uniformly in $N$. Also by (7.5), $A_{t \wedge T^N}^N \to 1$ uniformly as $N \to \infty$. Since $a(x)$ is bounded, $B_{t \wedge T^N}^N$ is bounded uniformly in $N$, and since $a$ is Lipschitz and $X_t^N \to x_t$ in probability (Theorem 6.12), $B_{t \wedge T^N}^N$ converges to 1 in probability. Given $\varepsilon > 0$, suppose $|Z_{t \wedge T^N}^N|, |A_{t \wedge T^N}^N|, |B_{t \wedge T^N}^N| \leq M$ for some constant $M$, and that $N$ is sufficiently large that $P(|A_{t \wedge T^N}^N B_{t \wedge T^N}^N - 1| \geq \delta) \leq \frac{\varepsilon}{2(M^3 + 1)}$, where $\delta = \frac{\varepsilon}{2M}$.
Then, since $E(Z_{t \wedge T^N}^N) = 1$,
\[
\begin{aligned}
\big|E(Z_{t \wedge T^N}^N A_{t \wedge T^N}^N B_{t \wedge T^N}^N) - 1\big|
&= \big|E\big(Z_{t \wedge T^N}^N (A_{t \wedge T^N}^N B_{t \wedge T^N}^N - 1)\big)\big| \\
&\leq E\big(|Z_{t \wedge T^N}^N||A_{t \wedge T^N}^N B_{t \wedge T^N}^N - 1|\,1_{\{|A_{t \wedge T^N}^N B_{t \wedge T^N}^N - 1| < \delta\}}\big) \\
&\qquad + E\big((|Z_{t \wedge T^N}^N A_{t \wedge T^N}^N B_{t \wedge T^N}^N| + 1)\,1_{\{|A_{t \wedge T^N}^N B_{t \wedge T^N}^N - 1| \geq \delta\}}\big) \\
&\leq M\delta\, P(|A_{t \wedge T^N}^N B_{t \wedge T^N}^N - 1| < \delta) + (M^3 + 1)\, P(|A_{t \wedge T^N}^N B_{t \wedge T^N}^N - 1| \geq \delta) \\
&< \varepsilon.
\end{aligned}
\]
Hence $E(Z_{t \wedge T^N}^N A_{t \wedge T^N}^N B_{t \wedge T^N}^N) \to 1$ as $N \to \infty$. By Lemma 7.7, $P(T^N > t) \to 1$ for all $t < \tau_1$. Since $E_t^N = Z_t^N A_t^N B_t^N$ is bounded uniformly in $N$, by a similar argument to that above, $E(E_t^N) \to 1$ for all $t < \tau_1$, as required.
Lemma 7.9. For all finite subsets $\{t_1, \ldots, t_k\} \subset [0, \tau]$, $(\gamma_{t_1}^N, \ldots, \gamma_{t_k}^N) \Rightarrow (\gamma_{t_1}, \ldots, \gamma_{t_k})$.
Proof. The proof is by induction on $k$. The case $k = 1$ is proved in Lemma 7.8 above. Suppose we have proved the result for all $m < k$, where $k > 1$. Let $0 \leq t_1 < \cdots < t_k \leq \tau$. By Lévy's Continuity Theorem, it is sufficient to show that
\[
E\Big(\exp\Big\{i\sum_{j=1}^k \langle\theta_j, \gamma_{t_j}^N\rangle\Big\}\Big) \to E\Big(\exp\Big\{i\sum_{j=1}^k \langle\theta_j, \gamma_{t_j}\rangle\Big\}\Big)
\]
for every $k$-tuple $(\theta_1, \ldots, \theta_k) \in ((\mathbb{R}^d)^*)^k$. Let
\[
\beta^N = \sum_{j=1}^{k-2} \langle\theta_j, \gamma_{t_j}^N\rangle + \langle\theta_{k-1} + \theta_k, \gamma_{t_{k-1}}^N\rangle,
\]
and define $\beta$ similarly. By our inductive hypothesis, $\beta^N \Rightarrow \beta$ and so, conditional on $\mathcal{F}_{t_{k-1}} = \sigma(\gamma_t : 0 \leq t \leq t_{k-1})$, $\beta^N \to \beta$ in probability. Hence
\[
E\big(|e^{i(\beta^N - \beta)} - 1| \,\big|\, \mathcal{F}_{t_{k-1}}\big) \to 0
\]
as $N \to \infty$. Also, by an identical argument to Lemma 7.8,
\[
E\big(e^{i\langle\theta_k, \gamma_{t_k}^N - \gamma_{t_{k-1}}^N\rangle} \,\big|\, \mathcal{F}_{t_{k-1}}\big) \to E\big(e^{i\langle\theta_k, \gamma_{t_k} - \gamma_{t_{k-1}}\rangle} \,\big|\, \mathcal{F}_{t_{k-1}}\big)
\]
as $N \to \infty$. Hence
\[
\begin{aligned}
\Big|E\big(e^{i\beta^N} e^{i\langle\theta_k, \gamma_{t_k}^N - \gamma_{t_{k-1}}^N\rangle}\big) - E\big(e^{i\beta} e^{i\langle\theta_k, \gamma_{t_k} - \gamma_{t_{k-1}}\rangle}\big)\Big|
&= \Big|E\Big(E\big(e^{i\beta^N} e^{i\langle\theta_k, \gamma_{t_k}^N - \gamma_{t_{k-1}}^N\rangle} - e^{i\beta} e^{i\langle\theta_k, \gamma_{t_k} - \gamma_{t_{k-1}}\rangle} \,\big|\, \mathcal{F}_{t_{k-1}}\big)\Big)\Big| \\
&\leq E\Big(\Big|E\big(e^{i(\beta^N - \beta)} e^{i\langle\theta_k, \gamma_{t_k}^N - \gamma_{t_{k-1}}^N\rangle} - e^{i\langle\theta_k, \gamma_{t_k} - \gamma_{t_{k-1}}\rangle} \,\big|\, \mathcal{F}_{t_{k-1}}\big)\Big|\Big) \\
&\leq E\Big(E\big(|e^{i(\beta^N - \beta)} - 1| \,\big|\, \mathcal{F}_{t_{k-1}}\big)\Big) \\
&\qquad + E\Big(\Big|E\big(e^{i\langle\theta_k, \gamma_{t_k}^N - \gamma_{t_{k-1}}^N\rangle} - e^{i\langle\theta_k, \gamma_{t_k} - \gamma_{t_{k-1}}\rangle} \,\big|\, \mathcal{F}_{t_{k-1}}\big)\Big|\Big) \to 0.
\end{aligned}
\]
Therefore (γtN1 , . . . , γtNk ) ⇒ (γt1 , . . . γtk ), completing the inductive step.
Combining these results gives the following theorem.
Theorem 7.10. γ N ⇒ γ as N → ∞.
Proof. By Lemmas 7.5 and 7.6, the sequence (γ N )N ∈N is relatively compact and by Lemma 7.9,
the finite dimensional distributions of γ N converge in distribution to those of γ. The result
follows by Theorem 4.3.
The basic implication of this theorem is that $(X_t^N)_{t \geq 0}$ can be approximated (in distribution) by the Gaussian process $(x_t + N^{-1/2}\gamma_t)_{t \geq 0}$. This allows us to use properties of diffusion processes when investigating the behaviour of $X^N$ for large values of $N$.
8 Applications
The results from the previous two sections can be applied to obtain limiting results in a wide
range of areas, from random graphs and stochastic networks to perturbed dynamical systems.
We look at the applications to biology and specifically to population processes, discussing two
examples: epidemics and logistic growth.
8.1 Epidemics
Suppose we have a population consisting of N individuals. In the population, at any given time t,
there are a number of individuals, StN , who are susceptible to a particular disease, and a number
of individuals, ItN , who are infected by the disease and can pass it on. A susceptible individual
encounters diseased individuals at a rate proportional to the fraction of the total population
that is diseased, with proportionality constant λ. We assume that diseased individuals recover
and become immune independently of each other, at a rate µ. Therefore, (StN , ItN )t≥0 is a
continuous-time Markov chain, taking values in $(\mathbb{Z}^+)^2$, where
\[
(s, i) \to (s - 1, i + 1) \ \text{at rate } \lambda si/N, \qquad (s, i) \to (s, i - 1) \ \text{at rate } \mu i.
\]
We are interested in the proportion of susceptible and infected individuals for large values of N
as this is generally the situation under which an actual epidemic occurs.
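The jump chain just described can be simulated directly, one exponentially distributed holding time at a time. A minimal Gillespie-style sketch (the function name and the parameter values below are ours, chosen purely for illustration):

```python
import random

def simulate_epidemic(N, lam, mu, s0, i0, t_max, rng):
    """Jump chain: (s,i) -> (s-1,i+1) at rate lam*s*i/N, (s,i) -> (s,i-1) at rate mu*i."""
    s, i, t = s0, i0, 0.0
    while i > 0 and t < t_max:
        rate_inf = lam * s * i / N
        rate_rec = mu * i
        total = rate_inf + rate_rec
        t += rng.expovariate(total)      # exponential holding time at total rate
        if rng.random() < rate_inf / total:
            s, i = s - 1, i + 1          # infection
        else:
            i -= 1                       # recovery
    return s, i

rng = random.Random(0)
s_end, i_end = simulate_epidemic(N=500, lam=2.0, mu=0.5, s0=499, i0=1, t_max=100.0, rng=rng)
```

Running this for increasing $N$ is one way to see the fluid limit of Section 6 emerge empirically.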
Let $X_t^N = (S_t^N, I_t^N)/N$ be the proportions of susceptibles and infectives at time $t$. Since $S_t^N + I_t^N \leq N$ for all $t$, $X_t^N$ takes values in $U^N = \frac{1}{N}\mathbb{Z}^2 \cap U$, where $U$ is the unit square $[0, 1]^2$.
We are particularly interested in the proportion of individuals who have avoided being infected
by the time the last infective recovers and for this reason, we let S be the open set (−1, 2)×(0, 2).
(This choice of S is somewhat arbitrary, but has the property that the only way the process
can leave S is by the last infective recovering). Set S N = U N ∩ S, and for some fixed t0 , let
$T^N = \inf\{t \geq 0 : X_t^N \notin S\} \wedge t_0$. The transitions of $X^N$ are
\[
\Delta X_t^N = \begin{cases} \frac{(-1, 1)}{N} & \text{at rate } \lambda S_t^N I_t^N / N = N\lambda X_t^{N,1} X_t^{N,2}, \\[2pt] \frac{(0, -1)}{N} & \text{at rate } \mu I_t^N = N\mu X_t^{N,2}, \end{cases}
\]
and so $K^N(x, dy) = N\lambda x^1 x^2 \delta_{(-1,1)/N} + N\mu x^2 \delta_{(0,-1)/N}$. Then
\[
m^N(x, \theta) = \int_{\mathbb{R}^2} e^{\langle\theta, y\rangle}\,K^N(x, dy) = N\lambda x^1 x^2 e^{\frac{\theta_2 - \theta_1}{N}} + N\mu x^2 e^{-\frac{\theta_2}{N}},
\]
and so there exists a limit kernel $K(x, dy) = \lambda x^1 x^2 \delta_{(-1,1)} + \mu x^2 \delta_{(0,-1)}$, with corresponding Laplace transform $m(x, \theta) = \lambda x^1 x^2 e^{\theta_2 - \theta_1} + \mu x^2 e^{-\theta_2}$, with the properties:
(a) There exists a constant $\eta_0 = 1$ such that
\[
\sup_{x \in S} \sup_{|\theta| \leq 1} m(x, \theta) \leq 4\lambda e^2 + 2\mu e < \infty.
\]
(b)
\[
\sup_{x \in S^N} \sup_{|\theta| \leq \eta_0} \Big|\frac{m^N(x, N\theta)}{N} - m(x, \theta)\Big| = 0.
\]
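Property (b) holds with equality here because $m^N(x, N\theta)/N = m(x, \theta)$ exactly for this kernel. A quick numerical sanity check of this identity (parameter values illustrative):

```python
import math

def m_N(x, theta, N, lam, mu):
    # Laplace transform of the jump kernel K^N for the epidemic chain
    x1, x2 = x
    t1, t2 = theta
    return N*lam*x1*x2*math.exp((t2 - t1)/N) + N*mu*x2*math.exp(-t2/N)

def m(x, theta, lam, mu):
    # Laplace transform of the limit kernel K
    x1, x2 = x
    t1, t2 = theta
    return lam*x1*x2*math.exp(t2 - t1) + mu*x2*math.exp(-t2)

x, theta, lam, mu = (0.7, 0.3), (0.4, -0.2), 2.0, 0.5
for N in (10, 100, 1000):
    # m^N(x, N*theta)/N equals m(x, theta) exactly (up to floating point)
    lhs = m_N(x, (N*theta[0], N*theta[1]), N, lam, mu) / N
    assert abs(lhs - m(x, theta, lam, mu)) < 1e-12
```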
We also assume that there exists some $x_0 \in \bar{S}$, the closure of $S$, such that, for all $\delta > 0$,
\[
\limsup_{N \to \infty} N^{-1} \log P(|X_0^N - x_0| \geq \delta) < 0.
\]
Note that as $S_0^N + I_0^N = N$ for all $N$, $x_0 = (1 - \alpha, \alpha)$ for some $\alpha \in [0, 1]$. Set $b(x) = m'(x, 0) = (-\lambda x^1 x^2, \lambda x^1 x^2 - \mu x^2)$. On $S$,
\[
|\lambda(x^1 x^2 - y^1 y^2)| \leq \lambda(|x^1 - y^1||x^2| + |x^2 - y^2||y^1|) \leq 2\lambda(|x^1 - y^1| + |x^2 - y^2|),
\]
and
\[
|\lambda(x^1 x^2 - y^1 y^2) - \mu(x^2 - y^2)| \leq 2\lambda|x^1 - y^1| + (2\lambda + \mu)|x^2 - y^2| \leq (2\lambda + \mu)(|x^1 - y^1| + |x^2 - y^2|).
\]
Hence
\[
|b(x) - b(y)|^2 = |\lambda(x^1 x^2 - y^1 y^2)|^2 + |\lambda(x^1 x^2 - y^1 y^2) - \mu(x^2 - y^2)|^2 \leq (4\lambda^2 + (2\lambda + \mu)^2)(|x^1 - y^1| + |x^2 - y^2|)^2 \leq 2(4\lambda^2 + (2\lambda + \mu)^2)|x - y|^2,
\]
and so $b$ is Lipschitz on $S$. Therefore, there is a unique solution $(x_t)_{t \geq 0}$ to the ordinary differential equation $\dot{x}_t = b(x_t)$ starting from $x_0$, that is,
\[
\dot{x}_t^1 = -\lambda x_t^1 x_t^2, \qquad \dot{x}_t^2 = (\lambda x_t^1 - \mu) x_t^2.
\]
The sequence of jump processes $(X^N)_{N \geq 1}$ satisfies all the conditions needed to apply the Fluid Limit Theorem (Theorem 6.12) and so, for all $\delta > 0$,
\[
\limsup_{N \to \infty} N^{-1} \log P\Big(\sup_{t \leq T^N} |X_t^N - x_t| \geq \delta\Big) < 0.
\]
Let $\tau = \inf\{t \geq 0 : x_t^2 = 0\}$. Then $T = \{t \geq 0 : x_t \notin S\} = [\tau, \infty)$ (if $x_t^2 = 0$, then $\dot{x}_t^1 = \dot{x}_t^2 = 0$), and so $x_{t \wedge \tau} = x_t$ for all $t$. The argument in Proposition 6.16 then shows that
\[
\limsup_{N \to \infty} N^{-1} \log P\Big(\sup_{t \geq 0} |X_t^N - x_{t \wedge \tau}| > \delta\Big) < 0.
\]
By investigating the properties of $x_t$, we can get a good idea of the behaviour of $X_t^N$ for large values of $N$. We introduce a new variable $x_t^3$, the proportion of individuals that have recovered by time $t$, so that $x_t^1 + x_t^2 + x_t^3 = 1$ for all $t$, and hence $\dot{x}_t^3 = \mu x_t^2$. We shall discuss three questions:
(a) What conditions are needed for an epidemic to take place?
(b) What proportion of people will catch the disease and, in particular, will there be any
susceptibles left when the epidemic has died out?
(c) What is the time dependence of the epidemic?
The case $\alpha = 0$, when there are no infectives initially, is not interesting, as all the values remain constant over time, so we assume that $\alpha > 0$. By reversing time in the differential equations, it can be seen that this implies that $x_t^2 > 0$ for all $t$ (and so in fact $\tau = \infty$). An epidemic occurs when the proportion of infectives increases over some time period. Looking at the equation for $\dot{x}_t^2$, this happens if and only if $\lambda x_t^1 > \mu$ for some $t$. By the equation for $\dot{x}_t^1$, $x_t^1$ is a decreasing function of $t$, and so an epidemic occurs if and only if $\lambda x_0^1 > \mu$, or $\rho_0 = \frac{\lambda(1 - \alpha)}{\mu} > 1$. We call $\rho_0$ the reproductive rate.
By the equation for $\dot{x}_t^3$, $x_t^3$ is increasing, but bounded above by 1, and so must converge to a limit as $t \to \infty$. Hence $\dot{x}_t^3 \to 0$, and so $x_t^2 \to 0$ as $t \to \infty$. Therefore, the epidemic dies out in the limit. Now
\[
\frac{dx_t^1}{dx_t^3} = \frac{\dot{x}_t^1}{\dot{x}_t^3} = -\frac{\lambda}{\mu} x_t^1,
\]
and so $x_t^1 = x_0^1 e^{-\frac{\lambda}{\mu} x_t^3}$. At the end of the epidemic, $x_\infty^2 = 0$ and so $x_\infty^1 + x_\infty^3 = 1$. Hence the proportion of susceptibles at the end of the epidemic satisfies
\[
x_\infty^1 = (1 - \alpha)e^{-\frac{\lambda}{\mu}(1 - x_\infty^1)},
\]
which can be solved numerically. Often, the initial number of infectives is very small, so $1 - \alpha$ is approximately 1. If we let $\pi = 1 - \frac{x_\infty^1}{1 - \alpha}$ be the proportion of susceptibles ultimately infected, then substituting in the above equation gives
\[
\pi \approx 1 - e^{-\frac{\lambda(1 - \alpha)}{\mu}\pi} = 1 - e^{-\rho_0 \pi}.
\]
For a given value of $\rho_0$, this can be solved using Newton's method.
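For instance, applying Newton's method to $f(\pi) = \pi - 1 + e^{-\rho_0\pi}$, and starting at $\pi = 1$ to avoid the trivial root $\pi = 0$ (a sketch; the function name is ours):

```python
import math

def final_size(rho0, tol=1e-12):
    """Solve pi = 1 - exp(-rho0*pi) by Newton's method, for rho0 > 1."""
    p = 1.0  # start away from the trivial root pi = 0
    for _ in range(100):
        f = p - 1.0 + math.exp(-rho0 * p)
        fp = 1.0 - rho0 * math.exp(-rho0 * p)   # f'(p)
        step = f / fp
        p -= step
        if abs(step) < tol:
            break
    return p

pi_499 = final_size(4.99)   # rho0 = 4.99: almost everyone is ultimately infected
pi_149 = final_size(1.49)   # rho0 = 1.49: a sizeable fraction escapes infection
```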
Using the fact that $x_t^2 = 1 - x_t^1 - x_t^3$ and $x_t^1 = (1 - \alpha)e^{-\frac{\lambda}{\mu} x_t^3}$, the epidemic curve is given by
\[
\dot{x}_t^3 = \mu\big(1 - x_t^3 - (1 - \alpha)e^{-\frac{\lambda}{\mu} x_t^3}\big).
\]
For specified values of $\lambda$, $\mu$ and $\alpha$, this equation can be solved by numerical methods, which in turn gives values for $x_t^1$ and $x_t^2$. The graphs in Figure 4 (from [2]) show the course of a model epidemic for $\lambda = 0.001$, $\mu = 0.1$ in the cases $\alpha = 1/500$, i.e. $\rho_0 = 4.99 > 1$, and $\alpha = 1/150$, i.e. $\rho_0 = 1.49 > 1$.
Figure 4: Graphs of a model epidemic for ρ0 = 4.99 and ρ0 = 1.49
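One concrete way to produce such curves is a standard fourth-order Runge–Kutta step for $\dot{x}^3$, recovering $x^1$ and $x^2$ algebraically at each step. A sketch (we pick rate constants with $\lambda(1-\alpha)/\mu \approx 4.99$; the scaling of $\lambda$ differs from the raw per-individual rates quoted from [2]):

```python
import math

def epidemic_curve(lam, mu, alpha, t_max, dt=0.01):
    """Integrate dx3/dt = mu*(1 - x3 - (1-alpha)*exp(-(lam/mu)*x3)) by RK4,
    then recover x1 = (1-alpha)*exp(-(lam/mu)*x3) and x2 = 1 - x1 - x3."""
    def f(x3):
        return mu * (1.0 - x3 - (1.0 - alpha) * math.exp(-(lam / mu) * x3))
    x3, path = 0.0, []
    for step in range(int(t_max / dt)):
        k1 = f(x3)
        k2 = f(x3 + 0.5 * dt * k1)
        k3 = f(x3 + 0.5 * dt * k2)
        k4 = f(x3 + dt * k3)
        x3 += dt * (k1 + 2*k2 + 2*k3 + k4) / 6.0
        x1 = (1.0 - alpha) * math.exp(-(lam / mu) * x3)
        path.append((step * dt, x1, 1.0 - x1 - x3, x3))  # (t, x1, x2, x3)
    return path

path = epidemic_curve(lam=0.5, mu=0.1, alpha=1/500, t_max=100.0)
```

Plotting the second and third components of `path` against $t$ reproduces the qualitative shape of the curves in Figure 4.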
8.2 Logistic Growth
We now interpret $N$ as the area of a region occupied by a certain population. If the population size is $k$, then the population density is $k/N$. For simplicity, we assume that births and deaths occur singly. The rates of births and deaths should be approximately proportional to the population size. We assume, however, that crowding affects the birth and death rates, which therefore depend on the population density. The birth and death rates should therefore be of the form $\lambda(k/N)k$ and $\mu(k/N)k$, respectively, for some functions $\lambda$ and $\mu$. If $a$, $b$ and $c$ are non-negative constants, then one of the simplest such models is given by taking $\lambda(x) = a$ and $\mu(x) = b + cx$.
Let the Markov chain $(X_t^N)_{t \geq 0}$ be the population density at time $t$. Then $X^N$ takes values in $I^N = \frac{1}{N}\mathbb{Z}^+$, and
\[
\Delta X_t^N = \begin{cases} \frac{1}{N} & \text{at rate } N a X_t^N, \\[2pt] -\frac{1}{N} & \text{at rate } N b X_t^N + N c (X_t^N)^2. \end{cases}
\]
We are interested in whether the population exceeds a certain density or dies out and so, for fixed $R > 0$, we set $S = (0, R)$ and $S^N = I^N \cap S$. $X^N$ has Lévy kernel $K^N(x, dy) = N a x \delta_{1/N} + (N b x + N c x^2)\delta_{-1/N}$, and so
\[
m^N(x, \theta) = N a x e^{\frac{\theta}{N}} + (N b x + N c x^2)e^{-\frac{\theta}{N}}.
\]
There exists a limit kernel $K(x, dy) = a x \delta_1 + (b x + c x^2)\delta_{-1}$, with corresponding Laplace transform $m(x, \theta) = a x e^\theta + (b x + c x^2)e^{-\theta}$, with the required properties. Suppose there exists some $x_0 \in \bar{S}$, the closure of $S$, such that, for all $\delta > 0$,
\[
\limsup_{N \to \infty} N^{-1} \log P(|X_0^N - x_0| \geq \delta) < 0.
\]
Set $b(x) = m'(x, 0) = (a - b)x - cx^2$. Then $b$ is Lipschitz on $S$ and so there is a unique solution $(x_t)_{t \geq 0}$ to the ordinary differential equation $\dot{x}_t = (a - b)x_t - cx_t^2$ starting from $x_0$. Applying the Fluid Limit Theorem (Theorem 6.12), for all $\delta > 0$,
\[
\limsup_{N \to \infty} N^{-1} \log P\Big(\sup_{t \leq T^N} |X_t^N - x_t| \geq \delta\Big) < 0,
\]
where $T^N$ is defined as in Definition 6.10. We can solve the ordinary differential equation explicitly to get
\[
x_t = \frac{x_0(a - b)}{(a - b - cx_0)e^{(b - a)t} + cx_0} \quad \text{if } a \neq b, \qquad \text{or} \qquad x_t = \frac{x_0}{x_0 c t + 1} \quad \text{if } a = b,
\]
from which it is easy to see that, when $c \neq 0$, the population dies out as $t \to \infty$ if $a \leq b$, and the population density stabilises at $\frac{a - b}{c}$ if $a > b$. If $c = 0$, then the population dies out if $a < b$, it explodes (i.e. becomes infinite) if $a > b$, and the density is stable at $x_0$ if $a = b$. A graph of the case $a > b$, $c \neq 0$ is sketched in Figure 5.
Figure 5: A graph of the logistic growth model in the case a > b, c 6= 0
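The explicit solution can be checked against a direct numerical integration of the logistic equation (a sketch with illustrative constants $a$, $b$, $c$; the function names are ours):

```python
import math

def x_exact(t, x0, a, b, c):
    """Explicit solution of dx/dt = (a-b)*x - c*x^2 starting from x0."""
    if a != b:
        return x0 * (a - b) / ((a - b - c * x0) * math.exp((b - a) * t) + c * x0)
    return x0 / (x0 * c * t + 1.0)

def x_euler(t, x0, a, b, c, n=100000):
    # crude Euler scheme, for comparison only
    dt, x = t / n, x0
    for _ in range(n):
        x += dt * ((a - b) * x - c * x * x)
    return x

a, b, c, x0 = 2.0, 1.0, 1.0, 0.1
assert abs(x_exact(5.0, x0, a, b, c) - x_euler(5.0, x0, a, b, c)) < 1e-3
# as t grows the density stabilises at (a-b)/c, here 1.0
assert abs(x_exact(50.0, x0, a, b, c) - (a - b) / c) < 1e-6
```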
We now look at the fluctuations about this result. As in Section 7, let $\gamma_t^N = \sqrt{N}(X_t^N - x_t)$ and suppose that there exists a random variable $\gamma_0$ such that $\gamma_0^N \Rightarrow \gamma_0$. Let $a(x) = m''(x, 0) = (a + b)x + cx^2$. It is easy to check that all the necessary conditions for Theorem 7.10 hold. Hence $\gamma^N \Rightarrow \gamma$, where $\gamma$ is the solution to the stochastic differential equation
\[
d\gamma_t = \sigma(x_t)dB_t + \nabla b(x_t)\gamma_t\,dt = \sqrt{(a + b)x_t + cx_t^2}\,dB_t + (a - b - 2cx_t)\gamma_t\,dt.
\]
In all of the cases mentioned above, except $a > b$, $c = 0$, $x_t$ converges to a finite limit as $t \to \infty$. When this limit is zero, $\gamma_t$ behaves asymptotically like the solution to the differential equation
\[
dy_t = (a - b)y_t\,dt.
\]
That is, $y_t = y_0 e^{(a - b)t} \to 0$ as $t \to \infty$ when $a < b$, and so the fluctuations die out as well. When $x_t$ converges to $\frac{a - b}{c}$, we have $a > b$ and so $\gamma_t$ behaves asymptotically like the solution to the stochastic differential equation
\[
dY_t = \sqrt{\frac{2a(a - b)}{c}}\,dB_t - (a - b)Y_t\,dt,
\]
i.e. $Y$ is an Ornstein--Uhlenbeck process and so $Y_t \sim N\big(Y_0 e^{-(a - b)t}, \frac{a}{c}(1 - e^{-2(a - b)t})\big) \Rightarrow N(0, \frac{a}{c})$ as $t \to \infty$. Hence the fluctuations are asymptotically stable and $X_t^N$ behaves like an $N\big(\frac{a - b}{c}, \frac{a}{Nc}\big)$ random variable for large $t$ and $N$.
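This final claim is easy to probe by simulation: for large $t$ and $N$, the density of the birth--death chain should look like a draw from $N\big(\frac{a-b}{c}, \frac{a}{Nc}\big)$. A Gillespie-style sketch (constants and sample sizes are illustrative, and the run below is seeded):

```python
import random

def simulate_density(N, a, b, c, x0, t_max, rng):
    """Birth-death chain: k -> k+1 at rate a*k, k -> k-1 at rate (b + c*k/N)*k.
    Returns the density k/N at time t_max."""
    k, t = int(x0 * N), 0.0
    while k > 0 and t < t_max:
        birth = a * k
        death = (b + c * k / N) * k
        total = birth + death
        t += rng.expovariate(total)
        if t >= t_max:
            break
        k += 1 if rng.random() < birth / total else -1
    return k / N

rng = random.Random(1)
N, a, b, c = 400, 2.0, 1.0, 1.0
samples = [simulate_density(N, a, b, c, x0=1.0, t_max=5.0, rng=rng) for _ in range(50)]
mean = sum(samples) / len(samples)
# the theory predicts mean near (a-b)/c = 1, with std about sqrt(a/(N*c)) ~ 0.07
```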
9 Conclusion
In this essay, the notion of convergence for a sequence of stochastic processes has been discussed.
We followed the route of defining a topology on the space of cadlag functions, and a metric
on the space of probability measures that induced the same topology as that generated by
weak convergence. This enabled us to establish an equivalence between weak convergence,
convergence in probability and almost sure convergence, which we used to prove that, in the
simple cases of Markov chains and diffusion processes, the convergence of the generators implied
weak convergence. In Section 4, we proved a key result, due to Prohorov, which states that
a sequence of stochastic processes converges if and only if it is relatively compact and the
corresponding finite dimensional distributions converge. The remainder of the essay was spent
establishing a more workable form of this result, and applying it to generalise the Law of Large
Numbers and the Central Limit Theorem to Markov jump processes. Applications of these
theorems were also discussed.
There are a number of ways in which the key result mentioned above can be developed. One
idea is to show that any sequence of Markov processes converges if the corresponding sequence
of generators converges. Using the semigroup representation, it follows directly from the Markov
property that the finite dimensional distributions converge. The difficulty is showing the relative
compactness of the sequence of processes. This can be done using Aldous’ Criterion which is
proved by Kallenberg [8]. A full proof of the result, for Feller processes, is given in the same
reference.
A shortcoming of Prohorov's result is that in general it is very difficult, and sometimes impossible, to calculate the finite dimensional distributions of a process. Stroock and Varadhan formulated an alternative result which depends on the generators of the processes, and is consequently applicable in many more situations. It relies on the fact that if $X$ is a Markov process with sample paths in a metric space $(S, d)$ and with generator $A$, then
\[
f(X(t)) - \int_0^t Af(X(s))\,ds
\]
is a martingale for all $f \in \bar{C}(S)$. This can in fact be used to characterise the process $X$.
Stroock and Varadhan's result is stated as follows. Suppose that $(X_n)_{n \geq 1}$ is a relatively compact sequence of Markov processes, with generators $(A_n)_{n \geq 1}$ such that $A_n \to A$. If there exists a Markov process $X$ such that
\[
f(X(t)) - \int_0^t Af(X(s))\,ds
\]
is a martingale for all $f \in \bar{C}(S)$, then $X_n \Rightarrow X$. A comprehensive account of this result is given in [4] and in [7].
References
[1] Billingsley, P. Convergence of Probability Measures. Wiley, New York, 1999
[2] Brown, D. and Rothery, P. Models in Biology: Mathematics, Statistics and Computing.
Wiley, Chichester, 1993
[3] Darling, R.W.R. and Norris J.R. Vertex Identifiability in Large Random Hypergraphs. In
preparation, 2002
[4] Ethier, S.N. and Kurtz, T.G. Markov Processes: Characterization and Convergence. Wiley,
New York, 1986
[5] Freidlin, M.I. Markov Processes and Differential Equations: Asymptotic Problems. Birkhäuser Verlag, Basel, 1996
[6] Galambos, J. and Gani, J. Studies in Applied Probability. Applied Probability Trust, Oxford, 1994
[7] Jacod, J. and Shiryaev, A.N. Limit Theorems for Stochastic Processes. Springer-Verlag,
Berlin, 1987
[8] Kallenberg, O. Foundations of Modern Probability. Springer, New York, 1997
[9] Norris, J.R. Markov Chains. Cambridge University Press, Cambridge, 1997
[10] Rogers, L.C.G. and Williams, D. Diffusions, Markov processes, and Martingales, Vol. I.
Cambridge University Press, Cambridge, 2001