Lecture Notes - Applied Financial Mathematics

Continuous Time Finance
Lecture Notes by Ulrich Horst
The objective of this course is to give an introduction to the probabilistic techniques required to
understand some of the most widely used models of continuous mathematical finance. The course is
intended for graduate students in mathematics, but it might also be useful for students in economics
and operations research.
We start out discussing Brownian motion and continue with the pathwise Itô calculus and selected
topics in stochastic analysis such as Itô integrals, the change of variable formula (Girsanov’s theorem),
the martingale representation theorem and stochastic differential equations. They constitute the
building blocks of continuous time financial mathematics including the famous Black-Scholes option
pricing formula. Armed with the necessary results from stochastic calculus we then discuss the risk-neutral approach to pricing derivatives. In particular, we prove that the market is free of arbitrage
if and only if there exists an equivalent local martingale measure (EMM) and identify the class of
EMMs in a generalized Black-Scholes model. This first part of the course mainly follows the textbooks
by Lamberton & Lapeyre (1996) and Øksendal (2003). The former provides a more elementary
introduction to stochastic calculus but with a clear application to mathematical finance while the
latter provides a more detailed introduction into the theory of stochastic analysis for Brownian motion.
For a more elaborate discussion of Brownian motion and semi-martingale stochastic calculus we refer
to the books by Karatzas & Shreve (1988) and Revuz & Yor (1999); further aspects of discrete and
continuous time are discussed in the books by Shreve (2005a, 2005b); students with an interest in
economics are encouraged to also consult Duffie (1996) and Hull (2000).
The second part of the course provides an introduction to stochastic optimal control with applications to utility maximization and portfolio optimization. We discuss both the traditional Hamilton-Jacobi-Bellman approach, which derives value functions and optimal strategies through PDE methods, as well as more recent approaches using backward stochastic differential equations (BSDEs). This part of the course relies on Chapters 3 and 5 of the highly recommendable book of Pham (2009).
Contents

1 Preliminaries
  1.1 Stochastic processes and σ-fields
  1.2 Brownian motion
  1.3 Quadratic variation
  1.4 Sample path properties
    1.4.1 Markov property and nowhere differentiability
    1.4.2 Reflection principle and running maximum

2 Pathwise Itô calculus
  2.1 The basic Itô formula
  2.2 The quadratic variation of Itô integrals
  2.3 Application: The Bachelier Model

3 Stochastic Calculus for Brownian Motion
  3.1 The Itô Integral
    3.1.1 The Itô isometry
    3.1.2 Construction of stochastic integrals
    3.1.3 Properties of stochastic integrals
    3.1.4 Itô integral for a larger class of integrands
  3.2 Itô processes
  3.3 The martingale representation theorem
  3.4 Application: The Black Scholes Model
    3.4.1 Pricing derivatives
    3.4.2 Barrier Options

4 Topics in diffusion theory
  4.1 Stochastic differential equations
  4.2 Solutions to linear SDEs
    4.2.1 Stochastic Exponential SDEs
    4.2.2 General linear SDEs
  4.3 Change of variables and Girsanov's theorem

5 Risk neutral pricing
  5.1 The market model
  5.2 Equivalent martingale measures and absence of arbitrage
    5.2.1 Characterization of equivalent martingale measures
    5.2.2 The range of option prices
  5.3 Option Pricing and PDE

6 Stochastic Optimal Control
  6.1 Examples of stochastic optimization problems
    6.1.1 Portfolio allocation
    6.1.2 Production-consumption model
    6.1.3 Irreversible investment model
  6.2 Controlled diffusion processes
  6.3 Dynamic programming principle
  6.4 Hamilton-Jacobi-Bellman equation
    6.4.1 Formal derivation of HJB
    6.4.2 Verification theorem
  6.5 Application: Portfolio optimization for power utilities

7 Backward Stochastic Equations and Optimal Control
  7.1 Existence and uniqueness results
    7.1.1 Linear BSDE
    7.1.2 Comparison principle
    7.1.3 BSDE, PDE and nonlinear Feynman-Kac formula
  7.2 Control and BSDE
  7.3 Application: Portfolio optimization
    7.3.1 Exponential utility maximization with option payoff
    7.3.2 Mean-variance criterion for portfolio selection
1 Preliminaries
This section recalls the definition of stochastic processes in continuous time. We also introduce the
Brownian motion or Wiener process that is the core of most financial market models. We show that
typical Brownian sample paths are of unbounded variation and nowhere differentiable and give the
joint distribution of Brownian motion and its running maximum. We will need this distribution to
calculate the fair value of barrier options.
Further Reading: Much of this section is taken from Karatzas & Shreve (1988), Chapters 1 and 2.
1.1 Stochastic processes and σ-fields
Let (Ω, F, P) be a probability space. A real-valued stochastic process X = (Xt )t≥0 is a family of
random variables on (Ω, F, P) taking values in R. For each sample point ω ∈ Ω the function t → Xt (ω)
is called the sample path of the process X associated with ω. A stochastic process can hence be viewed
as a random draw of sample paths. Two processes X and Y defined on the same probability space are
modifications of each other if P[Xt = Yt ] = 1 for every t ≥ 0. The processes are called indistinguishable
if almost all their sample paths agree, i.e., P[Xt = Yt for all t ≥ 0] = 1. Evidently, if two processes
are indistinguishable, then they are modifications of each other. The following lemma establishes a
partial converse.
Lemma 1.1 If two processes X and Y are modifications of each other and have almost surely right-continuous sample paths, then they are indistinguishable.
We equip (Ω, F, P) with a filtration (Ft )t≥0 , i.e., with an increasing family of sub-σ-fields of F.
The filtration models the flow of information and Ft is interpreted as the set of events which are
observable up to time t ∈ [0, ∞). Unless stated otherwise we always assume that (Ft ) satisfies the
usual conditions of right continuity and completeness. This means that Ft = ∩s>t Fs and F0 contains
all P-null sets in F.
Several notions of measurability will be distinguished throughout this course.
Definition 1.2 Let X be a stochastic process on the filtered probability space (Ω, F, (Ft ), P).
(i) The process is called measurable if the map (t, ω) �→ Xt (ω) is (B[0, ∞) ⊗ F)-measurable.
(ii) The process is called progressively measurable if the restriction to the interval [0, t] is (B[0, t] ⊗
Ft )-measurable for every t ≥ 0.
(iii) The process is called adapted to the filtration (Ft ) if Xt is Ft -measurable for every t ≥ 0.
Evidently, a progressively measurable process is measurable and adapted. The converse is not
necessarily true. However, one has the following result.
Lemma 1.3 If almost all paths of an adapted process are right-continuous or almost all paths are
left-continuous, then the process is progressively measurable.
In almost all applications, one works with right-continuous processes and then there is no need to
distinguish between measurability and progressive measurability. In general, the following theorem
holds. It goes back to Chung & Doob.
Theorem 1.4 Let X be measurable and adapted to (Ft ). Then X has a progressively measurable
modification.
Throughout this course all equations and inequalities are understood in the almost sure sense. An
adapted process that satisfies Xt ∈ L1 (P) for all t is called a
• submartingale if E[Xt |Fs ] ≥ Xs for all t ≥ s;
• supermartingale if E[Xt |Fs ] ≤ Xs for all t ≥ s;
• martingale if E[Xt |Fs ] = Xs for all t ≥ s.
An adapted process X is called a Markov process if for all t > 0, s ≥ 0 and every bounded
real-valued function f on R we have
E[f (Xt+s )|Ft ] = E[f (Xt+s )|σ(Xt )].
Here σ(Xt ) denotes the σ-field generated by the random variable Xt . Intuitively, X has the Markov
property if the distribution of future states is completely determined by the present state. If X models
the price fluctuations of some risky asset the Markov property is equivalent to saying that the market
satisfies a weak form of the efficient market hypothesis: all available information is contained in the
current price.
A diffusion is a (strong) Markov process with continuous sample paths such that for all (t, x) the limits

µ(t, x) = lim_{h→0} (1/h) E[X(t + h) − X(t) | X(t) = x]

and

σ(t, x) = lim_{h→0} (1/h) E[(X(t + h) − X(t))² | X(t) = x]

exist. In this case µ is called the drift and σ the diffusion coefficient. The simplest and perhaps most important diffusion is a Brownian motion.
1
PRELIMINARIES
1.2
5
Brownian motion
The Brownian motion or Wiener process is the most important building block of continuous-time asset
pricing models.
Definition 1.5 A stochastic process W = (Wt )t≥0 on the probability space (Ω, F, P) is called a standard one-dimensional Brownian motion if the following conditions are satisfied:
(i) W0 = 0 P-a.s.
(ii) W has independent increments; that is, for all t > s ≥ 0 the increment

Wt − Ws

is independent of

(Wu )0≤u≤s .
(iii) The increments are stationary and normally distributed:
Wt+s − Wt ∼ N (0, s).
(iv) W has almost surely continuous sample paths.
Notice that the increments of a Brownian motion over a time period [t, t+s] are normally distributed
with mean zero and variance equal to the length s of the time interval. In particular
Wt ∼ N (0, t).
As for the covariance of the random variables Wt and Ws (t ≥ s) observe that

Cov(Wt , Ws ) = E[Wt · Ws ] − E[Wt ] · E[Ws ]
             = E[(Wt − Ws )Ws ] + E[Ws²]
             = E[Wt − Ws ] · E[Ws ] + E[Ws²]   (by (ii))
             = 0 + s   (by (iii))
             = s.

In other words, for all t, s ∈ R₊ we have that

Cov(Wt , Ws ) = min{s, t}    (1)
and all the finite-dimensional distributions of W are multivariate normal with mean zero and covariance matrix given by (1). A construction of a Wiener process is beyond the scope of this course; we
refer to Karatzas & Shreve (1988), Chapter 2, for constructions of Brownian motion.
Theorem 1.6 A standard Brownian motion exists.
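Although the construction of Brownian motion is omitted, the defining properties translate directly into a simulation recipe: sum independent N (0, dt) increments. The following sketch checks the covariance identity (1) empirically; the grid, horizon and sample size are illustrative choices, not from the notes.

```python
# Monte Carlo sketch: build standard Brownian paths from independent N(0, dt)
# increments and check Cov(W_t, W_s) = min(s, t). All numerical choices
# (grid, horizon, sample size, seed) are illustrative assumptions.
import numpy as np

def simulate_bm(n_paths, n_steps, T, seed=0):
    """Return an array of shape (n_paths, n_steps + 1) of Brownian paths with W_0 = 0."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    return np.concatenate([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1)

paths = simulate_bm(n_paths=50_000, n_steps=100, T=1.0)
s_idx, t_idx = 30, 70  # grid points s = 0.3 and t = 0.7
# The means vanish, so the sample mean of W_s * W_t estimates the covariance.
emp_cov = np.mean(paths[:, s_idx] * paths[:, t_idx])
print(emp_cov)  # close to min(0.3, 0.7) = 0.3
```

The same array of paths can be reused for the later sample-path experiments in this section.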
For a standard Wiener process W let (Ft ) be its canonical filtration, that is Ft = σ(Ws , s ≤ t). It
can be shown that the completed filtration satisfies the usual conditions. We will need the following
important property.
Proposition 1.7 Let W be a standard BM. Then (Wt ) and (Wt2 − t) are martingales with respect to
the canonical filtration.
Proof: Let t > s. Then by (ii) and (iii),

E[Wt |Fs ] = E[Wt − Ws + Ws |Fs ] = E[Wt − Ws |Fs ] + Ws = Ws .

For the second assertion we need to show that

E[Wt² − t|Fs ] = Ws² − s.

Since W is adapted this is equivalent to

E[Wt² − Ws²|Fs ] = t − s.

But

E[Wt² − Ws²|Fs ] = E[Wt²|Fs ] − Ws²
                = E[Wt²|Fs ] − 2Ws E[Wt |Fs ] + Ws²
                = E[(Wt − Ws )²|Fs ]
                = t − s.

This proves the assertion.    ✷
A famous characterization result of Brownian motion due to Paul Lévy states that the converse of
the preceding proposition is also true. We state the result without proof.
Theorem 1.8 A continuous real-valued process (Xt ) defined on a probability space (Ω, F, P) is a
Brownian motion if and only if both (Xt ) and (Xt2 − t) are martingales.
1.3 Quadratic variation
Let us fix some time T . In mathematical finance one often encounters integrals of the form
∫_0^T f(s) dWs (ω)    (2)
for some, say, deterministic, function f . Such integrals are natural continuous time versions of the
stochastic integrals considered in financial market models of discrete time. For a given sample path
W (ω) of the Brownian motion process the integral (2) would be well defined in the usual Riemann-Stieltjes sense if W (ω) were of bounded variation¹. Unfortunately, W (ω) typically has unbounded variation.
This calls for an alternative definition of integrals with respect to Wiener processes. We postpone the
introduction of such an integral to the next section and instead focus first on the quadratic variation of
Brownian sample paths.
¹ A function is of bounded variation if and only if it can be represented as the difference of two increasing functions; more on this later.
Definition 1.9 A partition Π of the time interval [0, T ] is a set of points 0 = t0 < t1 < · · · < tn = T . The mesh size is

|Π| := max_i |ti − ti−1 |.
We are now ready to introduce the (quadratic) variation of a real valued function X on [0, T ].
Definition 1.10 Let X : [0, T ] → R be a real-valued function. The variation V (X) of X is defined by

V (X) := sup { Σ_{i=1}^n |X(ti ) − X(ti−1 )| : Π = (t0 , . . . , tn ) is a partition }.

If V (X) < ∞, then X is of finite variation. The quadratic variation process along a partition Π is given by

Vt²(X, Π) = Σ_{ti ∈Π, ti ≤t} (X(ti ) − X(ti−1 ))²,   (t ∈ [0, T ]).

If (Πn ) is a sequence of partitions with |Πn | → 0 as n → ∞ and if for all t ∈ [0, T ] the limit

⟨X⟩t := lim_{n→∞} Vt²(X, Πn )

exists and if the map t → ⟨X⟩t is continuous, then X is said to have continuous quadratic variation ⟨X⟩ = (⟨X⟩t ) along (Πn ).
Notice that the quadratic variation process is increasing and that ⟨X⟩0 = 0. The following
proposition shows that X has a trivial quadratic variation process if it is of bounded variation.
Proposition 1.11 If X is continuous and of bounded variation, then its quadratic variation process
is trivial, i.e., identically equal to zero (along all partitions).
Proof: For any partition Πn = (t_0^n , . . . , t_{mn}^n ) of [0, T ] we have that

Σ_{ti^n ∈Πn , ti^n ≤t} (X(ti^n ) − X(ti−1^n ))² ≤ sup_i |X(ti^n ) − X(ti−1^n )| · Σ_{ti^n ∈Πn , ti^n ≤t} |X(ti^n ) − X(ti−1^n )|
                                             ≤ sup_i |X(ti^n ) − X(ti−1^n )| · V (X).

Since X is continuous and hence uniformly continuous on compact sets,

lim_{n→∞} sup_i |X(ti^n ) − X(ti−1^n )| = 0   if |Πn | → 0.    ✷
We recall from our calculus course that an integral ∫ f dX with respect to X can be defined in the Riemann-Stieltjes sense only if X is of bounded variation. In particular, the integral cannot be defined in the standard sense when X has a non-trivial quadratic variation process. For a typical path of a Brownian motion it turns out that

⟨W (ω)⟩t = t

for all times, so the integral (2) cannot be defined as a Riemann-Stieltjes integral.
Theorem 1.12 Let (Πn ) be a sequence of partitions of [0, T ] such that Σ_n |Πn | < ∞. Then

lim_{n→∞} Vt²(W (ω), Πn ) = t   for all t ∈ [0, T ],   P-a.s.
Proof: In proving the assertion we shall assume that t belongs to the partitions Πn . The arguments can be made precise using a standard ε-argument. In a first step we prove L²-convergence. To this end, observe that independence of the increments yields

E[ ( Σ_{ti ∈Πn } (Wti − Wti−1 )² − t )² ] = E[ ( Σ_{ti ∈Πn } ((Wti − Wti−1 )² − (ti − ti−1 )) )² ]
                                         = Σ_{ti ∈Πn } E[ ((Wti − Wti−1 )² − (ti − ti−1 ))² ].

Next, recall that E[Wt²] = t, so E[(Wt − Ws )²] = t − s and

E[ ((Wti − Wti−1 )² − (ti − ti−1 ))² ] = Var( (Wti − Wti−1 )² ).

Since

Wti − Wti−1 ∼ N (0, ti − ti−1 )

it follows that

Var( (Wti − Wti−1 )² ) = 2(ti − ti−1 )².

Hence

E[ ( Σ_{ti ∈Πn } (Wti − Wti−1 )² − t )² ] = 2 Σ_{ti ∈Πn } (ti − ti−1 )² ≤ 2 · |Πn | · t,

which converges to zero as n tends to infinity. L²-convergence implies almost sure convergence along a subsequence, but we actually need to show convergence of the entire sequence. For this we write

E[ Vt²(W, Πn ) ] = E[ Σ_{ti ∈Πn } (Wti − Wti−1 )² ] = Σ_{ti ∈Πn } E[(Wti − Wti−1 )²] = t.

Thus, by Chebyshev's inequality we have for every ε > 0 that

P[ |Vt²(W, Πn ) − t| > ε ] ≤ Var(Vt²(W, Πn )) / ε².

Since a Brownian sample path has independent increments,

Var(Vt²(W, Πn )) = Σ_{ti ∈Πn } Var( (Wti − Wti−1 )² ) = 2 Σ_{ti ∈Πn } (ti − ti−1 )² ≤ 2T |Πn |,

so

P[ |Vt²(W, Πn ) − t| > ε ] ≤ (2T /ε²) |Πn |.

Since Σ_n |Πn | < ∞ it follows from the Borel-Cantelli lemma given below that

P[ |Vt²(W, Πn ) − t| > ε infinitely often ] = 0.

This shows that for every t ∈ [0, T ] there exists a set Nt of measure zero such that

⟨W (ω)⟩t = t   for all ω ∉ Nt .

A standard "sandwich argument" shows that there is actually a set N of measure zero such that

⟨W (ω)⟩t = t   for all t ∈ [0, T ] and all ω ∉ N.    ✷
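The convergence in Theorem 1.12 is easy to observe numerically. The sketch below computes the sum of squared increments of one simulated Brownian path along finer and finer dyadic partitions of [0, 1]; the particular grid sizes and seed are illustrative assumptions.

```python
# Numerical illustration of Theorem 1.12: along dyadic partitions of [0, T] the
# realized quadratic variation of a Brownian path converges to T. The grid
# sizes and seed are illustrative assumptions.
import numpy as np

def quadratic_variation(path):
    """Sum of squared increments of a sampled path."""
    increments = np.diff(path)
    return np.sum(increments ** 2)

rng = np.random.default_rng(42)
T = 1.0
results = {}
for n_steps in (2 ** 8, 2 ** 12, 2 ** 16):  # dyadic partitions, mesh T / n_steps
    dt = T / n_steps
    path = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))])
    results[n_steps] = quadratic_variation(path)

for n_steps, qv in results.items():
    print(n_steps, qv)  # values approach T = 1 as the mesh shrinks
```

Note that the dyadic meshes satisfy the summability condition Σ_n |Πn | < ∞ of the theorem.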
The preceding proof used the following result, known as the Borel-Cantelli lemma.

Lemma 1.13 (Borel-Cantelli) Let {An }n∈N be a sequence of events defined on a probability space (Ω, F, P).

(i) If Σ_n P[An ] < ∞, then only finitely many An occur: P[only finitely many An occur] = 1.

(ii) If Σ_n P[An ] = ∞ and the events A1 , A2 , . . . are independent, then infinitely many An occur: P[infinitely many An occur] = 1.

1.4 Sample path properties
We close this section with a couple of theorems on important sample path properties of Brownian
motion.
1.4.1 Markov property and nowhere differentiability
We first show that Brownian motion possesses the Markov property. For this we recall that moment
generating functions uniquely determine probability distributions.
Theorem 1.14 Let W be a standard Brownian motion. Then for all t > s and every bounded function
f on R we have
E[f (Wt )|(Wu )u≤s ] = E[f (Wt )|Ws ].
Proof: We use the moment generating function to show that the conditional distribution of Wt+s given Ft is the same as that given Wt . Indeed, independence of Brownian increments along with the fact that Wt+s − Wt ∼ N (0, s) yields

E[e^{uW_{t+s}} |Ft ] = e^{uWt } E[e^{u(W_{t+s} −Wt )} |Ft ]
                    = e^{uWt } E[e^{u(W_{t+s} −Wt )} ]
                    = e^{uWt } e^{u²s/2}
                    = e^{uWt } E[e^{u(W_{t+s} −Wt )} |Wt ]
                    = E[e^{uW_{t+s}} |Wt ].    ✷
For a continuous function f on R we denote by D⁺f (t), D⁻f (t) the upper and by D₊f (t), D₋f (t) the lower Dini derivatives at t (from the right and from the left, respectively). The function f is differentiable at t if D⁺f (t) and D₊f (t), respectively D⁻f (t) and D₋f (t), are finite and equal. It turns out that a typical Brownian path is nowhere differentiable. This is an important property: if asset price processes were differentiable, the price evolution over small periods would be predictable and hence arbitrage opportunities would arise. The proof of nowhere differentiability of Brownian paths uses the following results.
Remark 1.15 (i) Recall that when {Cn } is a decreasing sequence of events, i.e., C1 ⊃ · · · ⊃ Cn ⊃ Cn+1 ⊃ · · · ⊃ C, then

P[C] = P[lim_{n→∞} Cn ] = lim_{n→∞} P[Cn ].
(ii) Recall that the countable union of P-null sets yields a set of P-measure zero.
We are now ready to show that a typical Brownian path is nowhere differentiable.
Theorem 1.16 Let W be a standard Brownian motion. A typical sample path is nowhere differentiable. More precisely, the set
{ω ∈ Ω : for each t ∈ [0, ∞) either D⁺Wt (ω) = ∞ or D₊Wt (ω) = −∞}
contains an event F with P[F ] = 1.
Proof: It is enough to consider the time interval [0, 1]. For fixed integers j, k ≥ 1 we define the set

Aj,k = ∪_{t∈[0,1]} ∩_{h∈[0,1/k]} {ω ∈ Ω : |Wt+h (ω) − Wt (ω)| ≤ jh}.

Then

{ω ∈ Ω : −∞ < D₊Wt (ω) ≤ D⁺Wt (ω) < ∞ for some t ∈ [0, 1]} ⊂ ∪_{j=1}^∞ ∪_{k=1}^∞ Aj,k .

We will show that for any pair (j, k) there exists a set C of P-measure zero such that Aj,k ⊂ C. Since there are only countably many pairs (j, k) ∈ N × N this shows that

P[ {ω ∈ Ω : −∞ < D₊Wt (ω) ≤ D⁺Wt (ω) < ∞ for some t ∈ [0, 1]} ] = 0.

Let us fix a sample path ω ∈ Aj,k and an integer n ≥ 4k. By definition of the set Aj,k there exists t ∈ [0, 1] such that

|Wt+h (ω) − Wt (ω)| ≤ jh   for all h ∈ [0, 1/k].

Next, we fix an integer i ∈ N with 1 ≤ i ≤ n such that

(i − 1)/n ≤ t ≤ i/n   and   (i + v)/n − t ≤ (v + 1)/n ≤ 1/k   for v = 1, 2, 3.

It follows that

|W(i+1)/n (ω) − Wi/n (ω)| ≤ |W(i+1)/n (ω) − Wt (ω)| + |Wi/n (ω) − Wt (ω)| ≤ 2j/n + j/n = 3j/n.

The crucial observation is now that the assumption ω ∈ Aj,k provides information about the size of the Brownian increment, not only over the interval [i/n, (i + 1)/n] but also over the neighboring intervals [(i + 1)/n, (i + 2)/n] and [(i + 2)/n, (i + 3)/n]. Indeed,

|W(i+2)/n (ω) − W(i+1)/n (ω)| ≤ |W(i+2)/n (ω) − Wt (ω)| + |W(i+1)/n (ω) − Wt (ω)| ≤ 3j/n + 2j/n = 5j/n

and

|W(i+3)/n (ω) − W(i+2)/n (ω)| ≤ |W(i+3)/n (ω) − Wt (ω)| + |W(i+2)/n (ω) − Wt (ω)| ≤ 4j/n + 3j/n = 7j/n.

With

C_i^{(n)} := ∩_{v=1}^{3} {ω ∈ Ω : |W(i+v)/n (ω) − W(i+v−1)/n (ω)| ≤ (2v + 1) j/n}

we therefore observe that Aj,k ⊂ ∪_{i=1}^{n} C_i^{(n)} holds for all n ≥ 4k. Since

√n (W(i+v)/n − W(i+v−1)/n ) =: Zv   (v = 1, 2, 3)

are standard normal random variables and one can easily verify a bound of the form

P[|Zv | ≤ ε] ≤ ε,

their independence yields

P[C_i^{(n)}] ≤ (3j/√n)(5j/√n)(7j/√n) = 105 j³ / n^{3/2}.

We have

Aj,k ⊂ C := ∩_{n≥4k} ∪_{i=1}^{n} C_i^{(n)}

and the assertion follows from Remark 1.15 above because

P[C] ≤ inf_{n≥4k} P[∪_{i=1}^{n} C_i^{(n)}] ≤ inf_{n≥4k} 105 j³ / n^{1/2} = 0.    ✷
1.4.2 Reflection principle and running maximum
Our next goal is to give the joint distribution of a Brownian motion and its running maximum; we
shall need this distribution to calculate the fair value of barrier options. To this end, we first repeat, in a non-rigorous way, the reflection principle arguments for Brownian motion that were given for the random walk in the course on discrete-time finance. This section is based on Shreve (2005), Chapter 3.
Let us fix a positive level m and a time t > 0. We wish to “count” the paths that reach level m
before or at time t, i.e., those paths for which
τm := inf{s ≥ 0 : Ws = m}
1
PRELIMINARIES
12
satisfies τm ≤ t. There are two types of such paths: those that reach m prior to t but are at some level w below m at time t, and those that are at or above m at time t. For each path that reaches m prior to time t but is at a level w below m at t there is a "reflected path" that is at level 2m − w at time t. This leads to the following reflection equality:

P[τm ≤ t, Wt ≤ w] = P[Wt ≥ 2m − w]   (w ≤ m, m > 0).
(This is derived by heuristic arguments; a more rigorous derivation using the Markov property is left
for student presentations.)
Theorem 1.17 For all m ≠ 0 the random variable τm has the cumulative distribution function

P[τm ≤ t] = (2/√(2π)) ∫_{|m|/√t}^{∞} e^{−y²/2} dy

and density

f_{τm}(t) = (|m| / (t√(2πt))) e^{−m²/(2t)}.
Proof: Let m > 0. We substitute w = m into the reflection equality to obtain

P[τm ≤ t, Wt ≤ m] = P[Wt ≥ m].

On the other hand, if Wt ≥ m, then τm ≤ t. In other words,

P[τm ≤ t, Wt ≥ m] = P[Wt ≥ m].

Adding these two equations, we obtain the cumulative distribution function of τm :

P[τm ≤ t] = P[τm ≤ t, Wt ≤ m] + P[τm ≤ t, Wt ≥ m] = 2P[Wt ≥ m] = (2/√(2πt)) ∫_{m}^{∞} e^{−z²/(2t)} dz.

If m < 0, then τm and τ|m| have the same distribution, and we obtain the density f_{τm} by differentiating with respect to t.    ✷
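Theorem 1.17 can be checked by simulation: the fraction of simulated paths whose running maximum reaches m by time t should match 2P[Wt ≥ m]. The time discretization slightly underestimates the continuous-time maximum, so only rough agreement is expected; all parameter values below are illustrative assumptions.

```python
# Monte Carlo sketch of Theorem 1.17: P[tau_m <= t] = 2 P[W_t >= m] for m > 0.
# The level m, grid sizes and seed are illustrative assumptions; the discrete
# maximum is biased slightly downward relative to the continuous one.
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(7)
m, t, n_paths, n_steps = 1.0, 1.0, 100_000, 1_000
dt = t / n_steps

W = np.zeros(n_paths)  # current value of each path
M = np.zeros(n_paths)  # running maximum of each path
for _ in range(n_steps):  # build paths incrementally to keep memory small
    W += rng.normal(0.0, np.sqrt(dt), n_paths)
    np.maximum(M, W, out=M)

p_hit = np.mean(M >= m)                 # fraction of paths with tau_m <= t
p_exact = 1.0 - erf(m / sqrt(2.0 * t))  # = 2 * P[W_t >= m] for W_t ~ N(0, t)
print(p_hit, p_exact)
```

Here 2P[Wt ≥ m] = 1 − erf(m/√(2t)) follows from the relation between the normal CDF and the error function.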
We define the running maximum of a Brownian motion by

Mt = max_{0≤s≤t} Ws .

Clearly Mt ≥ m if and only if τm ≤ t. This permits us to rewrite the reflection equality as

P[Mt ≥ m, Wt ≤ w] = P[Wt ≥ 2m − w]   (w ≤ m, m > 0).
From this we obtain the joint distribution of a Brownian motion and its running maximum.
Proposition 1.18 For all t > 0 the joint density of (Mt , Wt ) is

f_{Mt ,Wt }(m, w) = (2(2m − w)/√(2πt³)) exp( −(2m − w)²/(2t) )   (w ≤ m, m > 0).
Proof: Because

P[Mt ≥ m, Wt ≤ w] = ∫_{m}^{∞} ∫_{−∞}^{w} f_{Mt ,Wt }(x, y) dy dx

and

P[Wt ≥ 2m − w] = (1/√(2πt)) ∫_{2m−w}^{∞} e^{−z²/(2t)} dz,

we have that

∫_{m}^{∞} ∫_{−∞}^{w} f_{Mt ,Wt }(x, y) dy dx = (1/√(2πt)) ∫_{2m−w}^{∞} e^{−z²/(2t)} dz.

We differentiate first with respect to m to obtain

−∫_{−∞}^{w} f_{Mt ,Wt }(m, y) dy = −(2/√(2πt)) e^{−(2m−w)²/(2t)}.

Differentiating with respect to w yields the desired result.    ✷
2 Pathwise Itô calculus
Since Brownian sample paths are typically of unbounded variation, integrals of the form ∫ Wt (ω) dWt (ω) cannot be defined in the usual Riemann-Stieltjes sense. We illustrate this by the following example, where we approximate a Brownian path in two "reasonable" ways; the resulting integrals turn out to be quite different, though.
Example 2.1 Let W be a standard Wiener process. Both

φ1 (t, ω) = Σ_j W_{t_j}(ω) 1_{(t_j ,t_{j+1} ]}(t)

and

φ2 (t, ω) = Σ_j W_{t_{j+1}}(ω) 1_{(t_j ,t_{j+1} ]}(t)

"reasonably" approximate the sample path t → Wt (ω). Since φ1 and φ2 are piecewise constant, the integrals

∫_0^T φi (t, ω) dWt (ω)   (i = 1, 2)

make sense if we define ∫_s^u dWt = Wu − Ws . However,

E[ ∫_0^T φ1 (t, ω) dWt (ω) ] = Σ_j E[ W_{t_j}(W_{t_{j+1}} − W_{t_j}) ] = 0,

while

E[ ∫_0^T φ2 (t, ω) dWt (ω) ] = Σ_j E[ W_{t_{j+1}}(W_{t_{j+1}} − W_{t_j}) ] = Σ_j E[ (W_{t_{j+1}} − W_{t_j})² ] = T.
Remark 2.2 Notice that the integrand φ2 of the preceding example is not adapted to the filtration
generated by the Brownian motion W .
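Example 2.1 and Remark 2.2 can be reproduced numerically: along one simulated Brownian path, left-endpoint and right-endpoint Riemann sums for ∫ W dW differ by exactly the sum of squared increments, which approaches T rather than 0. The grid size and seed below are illustrative assumptions.

```python
# Numerical version of Example 2.1: left- vs right-endpoint sums for the
# "integral of W dW" along one Brownian path. Their difference equals the sum
# of squared increments, close to T. Grid size and seed are illustrative.
import numpy as np

rng = np.random.default_rng(1)
T, n_steps = 1.0, 100_000
dt = T / n_steps
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))])
dW = np.diff(W)

left_sum = np.sum(W[:-1] * dW)   # phi_1: evaluate W at the left endpoint t_j
right_sum = np.sum(W[1:] * dW)   # phi_2: evaluate W at the right endpoint t_{j+1}
gap = right_sum - left_sum       # equals the sum of (dW)^2, close to T = 1
print(left_sum, right_sum, gap)
```

The algebraic identity behind the gap is W_{t_{j+1}}ΔW − W_{t_j}ΔW = (ΔW)², summed over the partition.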
Our goal is now to give a first definition of integrals of smooth functions of a Wiener process with respect to Brownian motion. More generally, we shall define integrals of smooth transformations of continuous functions X : [0, T ] → Rⁿ of continuous quadratic variation with respect to X. It turns out that such integrals can be defined pathwise. We denote the components of X by X^i (i = 1, . . . , n).
Definition 2.3 Let X : [0, T ] → Rⁿ be given. The variation process associated with X^i : [0, T ] → R is defined by

V (X^i ) := sup { Σ_i |X^i (ti ) − X^i (ti−1 )| : Π = (t0 , . . . , tn ) is a partition }.

If V (X^i ) < ∞, then X^i is of finite variation. If X is continuous, then we say that it is of continuous co-variation if the limit

⟨X^j , X^k ⟩t = lim_{n→∞} Σ_{ti^n ∈Πⁿ, ti^n ≤t} ( X^j (ti^n ) − X^j (ti−1^n ) )( X^k (ti^n ) − X^k (ti−1^n ) )

exists for all j, k = 1, . . . , n along a sequence of partitions (Πⁿ) with |Πⁿ| → 0 as n → ∞ and the map t → ⟨X^j , X^k ⟩t is continuous.
Remark 2.4 Notice that we require the quadratic variation to exist only along a given sequence of
partitions. For the special case of Brownian sample paths we showed that the choice of partition is
irrelevant as long as the mesh sizes converge to zero sufficiently fast. In general, this need not be the
case.
The quadratic variation process ⟨X⟩ of a real-valued continuous function X : [0, T ] → R is defined by ⟨X⟩ := ⟨X, X⟩. It is non-negative, increasing and satisfies ⟨X⟩0 = 0. If X ∈ C¹ with X0 = 0, then

Xt = ∫_0^t X′(s) ds   and   V (X) = ∫_0^T |X′(s)| ds,

and ⟨X⟩ ≡ 0. The following properties of quadratic variation processes are easily verified.
Proposition 2.5 Let X, Y : [0, T ] → R be continuous and of continuous quadratic variation. Then the following holds:

(i) The co-variation ⟨X, Y ⟩ exists and is continuous if and only if ⟨X + Y ⟩ exists and is continuous.

(ii) ⟨·, ·⟩ defines a bilinear form.

(iii) The polarization identity holds:

⟨X, Y ⟩ = ½ ( ⟨X + Y ⟩ − ⟨X⟩ − ⟨Y ⟩ ).

(iv) The Cauchy-Schwarz inequality holds:

|⟨X, Y ⟩| ≤ √( ⟨X⟩ ⟨Y ⟩ ).
The polarization identity along with the Cauchy-Schwarz inequality yields the following corollary.

Corollary 2.6 Let X be continuous with continuous quadratic variation and let A be of bounded variation. Then ⟨X + A⟩ = ⟨X⟩.
Remark 2.7 We notice that the quadratic variation process ⟨X^i ⟩ is non-decreasing and hence of bounded variation. By the polarization identity, the co-variation process ⟨X^k , X^j ⟩ for k ≠ j can be expressed as the difference of two bounded variation functions, and hence ⟨X^k , X^j ⟩ is also of bounded variation. As a result, integrals with respect to d⟨X^i ⟩ and d⟨X^k , X^j ⟩ can be defined in the usual way.
2.1 The basic Itô formula
For a given partition Π = (ti ) of the interval [0, T ] and a given continuous function X : [0, T ] → Rⁿ of continuous quadratic variation we denote by

Δi X^j := X^j_{ti+1} − X^j_{ti}

the increments of X^j along Π. If X were of bounded variation, then for every continuously differentiable function f on Rⁿ we would have

f (Xt ) = f (X0 ) + ∫_0^t f ′(Xs ) dXs .

If ⟨X⟩ ≠ 0, i.e., X is of unbounded variation, then we need a generalization that accounts for the "many small fluctuations responsible for the quadratic variation". The Itô formula provides such a generalization. For its proof we need the following lemma.
Lemma 2.8 Let (Πn ) be a sequence of partitions of [0, T ] with |Πn | → 0 and let g : [0, T ] → R be continuous. Then

lim_{n→∞} Σ_{ti ∈Πn , ti ≤t} g(ti ) Δi X^j Δi X^k = ∫_0^t g(s) d⟨X^j , X^k ⟩s   for all t ∈ [0, T ].
We are now ready to state and prove a first version of Itô’s formula.
Theorem 2.9 (Itô formula) Let X : [0, T ] → Rⁿ be continuous with continuous quadratic variation ⟨X⟩ and let A : [0, T ] → Rᵐ be continuous and of bounded variation. Let F : Rᵐ × Rⁿ → R be of class C^{1,2}. Then

F (At , Xt ) = F (A0 , X0 ) + ∫_0^t ∇_a F (As , Xs ) dAs + ∫_0^t ∇_x F (As , Xs ) dXs
              + ½ Σ_{j,k=1}^{n} ∫_0^t (∂²F/∂x^j ∂x^k )(As , Xs ) d⟨X^j , X^k ⟩s    (3)

with

∫_0^t ∇_x F (As , Xs ) dXs := lim_{n→∞} Σ_{ti ∈Πn , ti ≤t} ∇_x F (A_{ti} , X_{ti} ) Δi X.
A special case of the preceding formula arises for At = t. In this case the mapping F is of the form f (t, Xt ).
Corollary 2.10 Let f : [0, T ] × R → R be of class C^{1,2} and let X : [0, T ] → R be continuous with continuous quadratic variation ⟨X⟩. Then

f (t, Xt ) − f (0, X0 ) = ∫_0^t ft (s, Xs ) ds + ∫_0^t fx (s, Xs ) dXs + ½ ∫_0^t fxx (s, Xs ) d⟨X⟩s .
Another immediate result is the product rule, which is obtained by applying the Itô formula to the function f (x, y) = xy.

Corollary 2.11 (Product Rule) Let (X, Y ) be continuous with continuous quadratic variation. Then

Xt Yt − X0 Y0 = ∫_0^t Xs dYs + ∫_0^t Ys dXs + ⟨X, Y ⟩t .

2.2 The quadratic variation of Itô integrals
If g : R → R is continuously differentiable and G is an antiderivative of g (G′ = g), then it follows from the Itô formula applied to G that the Itô integral

Yt := ∫_0^t g(Xs ) dXs

is well defined and that the mapping t → Yt is continuous. For many applications it will be important to identify the quadratic variation of Y . To this end, we first prove the following result.

Proposition 2.12 Let X be real-valued, continuous and of continuous quadratic variation and let g ∈ C¹. Then the mapping t → g(Xt ) is of continuous quadratic variation and

⟨g(X)⟩t = ∫_0^t (g′(Xs ))² d⟨X⟩s .
As a corollary to the preceding proposition we obtain a formula for the quadratic variation of Itô
integrals.
Corollary 2.13 Let X be real-valued, continuous and of continuous quadratic variation and g ∈ C¹. Then the process
\[
Y_t := \int_0^t g(X_s)\,dX_s
\]
has quadratic variation
\[
\langle Y\rangle_t = \int_0^t g^2(X_s)\,d\langle X\rangle_s.
\]
The Itô formula also allows us to evaluate the integral of smooth functions of BM w.r.t. the Wiener
process.
Example 2.14 Let W be a standard Brownian motion and F : R → R be twice continuously differentiable. Then
\[
F(W_t) = F(0) + \int_0^t F'(W_s)\,dW_s + \frac12 \int_0^t F''(W_s)\,ds. \tag{4}
\]
If F(x) = xⁿ, then, for any continuous process X with continuous quadratic variation,
\[
X_t^n = X_0^n + n \int_0^t X_s^{n-1}\,dX_s + \frac{n(n-1)}{2} \int_0^t X_s^{n-2}\,d\langle X\rangle_s.
\]
In particular, we obtain a closed form expression for the integral of a Brownian path with respect to itself:
\[
W_t^2 = W_0^2 + 2\int_0^t W_s\,dW_s + \int_0^t d\langle W\rangle_s = 2\int_0^t W_s\,dW_s + t.
\]
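The identity above can be illustrated numerically. The following sketch (not part of the notes; grid size and seed are arbitrary choices) simulates a Brownian path on a fine grid and checks that the left-point Itô sums 2·Σ W_{t_i} Δ_i W reproduce W_t² − t up to the discretization error coming from the quadratic variation of the path.

```python
import numpy as np

# Numerical illustration: on a fine grid the left-point Itô sums
# 2 * sum W_{t_i} (W_{t_{i+1}} - W_{t_i}) approximate W_t^2 - t.
rng = np.random.default_rng(0)
n, t = 200_000, 1.0
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate(([0.0], np.cumsum(dW)))  # W_0 = 0

ito_sum = np.sum(W[:-1] * dW)               # sum W_{t_i} * Delta_i W
lhs = W[-1] ** 2
rhs = 2 * ito_sum + t
print(abs(lhs - rhs))  # small for fine grids
```

The residual lhs − rhs equals Σ(Δ_i W)² − t, whose standard deviation shrinks like 1/√n, in line with the quadratic-variation results of Section 2.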
The following example is of particular interest in mathematical finance.
Example 2.15 (Geometric Brownian Motion) Given a BM W and constants µ ∈ R and σ > 0, the geometric Brownian motion is given by the process
\[
S_t = S_0 \exp\Big(\sigma W_t + \big(\mu - \tfrac12\sigma^2\big)t\Big).
\]
In order to write this process in differential form we apply Itô's formula to the exponential function x ↦ eˣ and to the process
\[
X_t = \ln S_0 + \big(\mu - \tfrac12\sigma^2\big)t + \sigma W_t.
\]
In view of Corollary 2.6 we have that ⟨X⟩ₜ = σ²t, so Itô's formula yields
\[
\begin{aligned}
S_t = S_0 \exp(X_t) &= S_0 + \int_0^t S_u\,dX_u + \frac12 \int_0^t S_u\,d\langle X\rangle_u \\
&= S_0 + \int_0^t S_u\Big(\big(\mu - \tfrac12\sigma^2\big)\,du + \sigma\,dW_u\Big) + \frac12 \int_0^t \sigma^2 S_u\,du \\
&= S_0 + \int_0^t \mu S_u\,du + \int_0^t \sigma S_u\,dW_u.
\end{aligned}
\]
In differential form the geometric Brownian motion can be written as
\[
\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t.
\]
Thus, if we assume that a stock price grows at an average rate µ and that the growth rates fluctuate around their average according to σWₜ, then the price dynamics follow a geometric Brownian motion. This is the assumption underlying the Black-Scholes option pricing model. The Bachelier model instead assumes that asset prices follow a Brownian motion with drift. The Bachelier model is perhaps the simplest model within which to illustrate the two main approaches to derivative pricing: the PDE approach and the probabilistic (risk-neutral) approach. We discuss the Bachelier model in the following section.
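The derivation of Example 2.15 can be checked numerically. The sketch below (not from the notes; the parameters and seed are arbitrary) simulates the exact geometric Brownian motion path on a fine grid and verifies that it matches its Itô decomposition S_T ≈ S₀ + Σ µS_u Δu + Σ σS_u Δ_u W computed with left-endpoint sums.

```python
import numpy as np

# Compare the exact GBM path with its Itô decomposition
#   S_T = S_0 + int mu*S du + int sigma*S dW,
# both integrals approximated by left-endpoint sums.
rng = np.random.default_rng(1)
mu, sigma, s0 = 0.05, 0.2, 1.0
n, t = 200_000, 1.0
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate(([0.0], np.cumsum(dW)))
times = np.linspace(0.0, t, n + 1)

S = s0 * np.exp(sigma * W + (mu - 0.5 * sigma**2) * times)  # exact path
ito_decomp = s0 + np.sum(mu * S[:-1] * dt) + np.sum(sigma * S[:-1] * dW)
print(abs(S[-1] - ito_decomp))  # small for fine grids
```

The two quantities agree up to an error that vanishes as the mesh goes to zero, which is exactly the content of the chain of equalities in Example 2.15.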
2.3 Application: The Bachelier Model
Let us denote by Xₜ the price of some risky asset ("stock") at time t ≥ 0 and assume that the asset pays no dividends and that there is no storage cost. The Bachelier model assumes that the stock price process is given by
\[
X_t = X_0 + mt + \sigma W_t, \tag{5}
\]
where m ∈ R is the drift parameter, σ > 0 is the volatility parameter and W is a Brownian motion.
Our results on the quadratic variation of Itô integrals imply
\[
\langle X\rangle_t = \langle \sigma W\rangle_t = \sigma^2 t.
\]
In the sequel we discuss two approaches to valuing and hedging, within the Bachelier model, a derivative security with payoff profile
\[
H(\omega) = f(X_T(\omega)).
\]
The PDE Approach. Our first approach is based on PDE methods. The idea is to consider the terminal value problem
\[
\begin{cases}
\tfrac12 \sigma^2 F_{xx} + F_t = 0, & (t, x) \in [0, T) \times \mathbb{R} \\
F(T, x) = f(x).
\end{cases} \tag{6}
\]
If this PDE admits a classical solution F , i.e. a solution of class C 1,2 , then F satisfies the assumption
of Corollary 2.10 and a replicating strategy for H can be derived by invoking Itô’s formula.
Lemma 2.16 Let F ∈ C^{1,2} solve the terminal value problem (6). Then it holds for almost all ω ∈ Ω that
\[
H(\omega) := f(X_T(\omega)) = F(0, X_0) + \int_0^T F_x(s, X_s(\omega))\,dX_s(\omega). \tag{7}
\]
The preceding result states that by following the trading strategy Fx (t, Xt ) the writer of an option
can eliminate all the risk associated with issuing H. Implementing this strategy requires the initial
investment F (0, X0 ), the fair (arbitrage-free) price of the contingent claim H. In the sequel we are
going to show how to solve the PDE in (6). The PDE is very similar to the heat equation:
\[
\tfrac12 F_{xx} - F_t = 0, \qquad (t, x) \in [0, \infty) \times \mathbb{R}. \tag{8}
\]
It is easy to check that
\[
P(t, x) = \frac{1}{\sqrt{2\pi t}} \exp\Big(-\frac{x^2}{2t}\Big) \tag{9}
\]
solves the heat equation. This function is called a fundamental solution. It has the following important properties.
Proposition 2.17 (Smoothing Properties of the Heat Kernel) Let P(t, x) be the fundamental solution to the heat equation and g ∈ C_b(R). Then the function
\[
u(t, x) := \int_{\mathbb{R}} g(y)\,P(t, x - y)\,dy
\]
satisfies the following properties:

(i) It belongs to the class C^∞((0, ∞) × R).

(ii) It satisfies the heat equation on (0, ∞) × R.

(iii) lim_{(t,x)→(0,y), t>0} u(t, x) = g(y).
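That P from (9) really solves the heat equation can be verified by a quick finite-difference sanity check (an illustration, not part of the notes; the evaluation point and step size are arbitrary):

```python
import math

# Check numerically that the heat kernel P(t, x) from (9) satisfies
# P_t = (1/2) P_xx, using central finite differences at a sample point.
def P(t, x):
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

t0, x0, h = 1.0, 0.5, 1e-4
P_t = (P(t0 + h, x0) - P(t0 - h, x0)) / (2.0 * h)
P_xx = (P(t0, x0 + h) - 2.0 * P(t0, x0) + P(t0, x0 - h)) / h**2
print(abs(P_t - 0.5 * P_xx))  # zero up to discretization error
```

The central differences agree to several digits; the residual is of order h² plus floating-point cancellation in the second difference.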
With minor modifications, the proof of the preceding proposition can be extended to continuous functions f : R → R that satisfy the growth condition
\[
|f(x)| \le \alpha\big(1 + e^{C|x|}\big)^2 \tag{10}
\]
for some C, α > 0. Reversing time and scaling it by σ² therefore yields the following result (the proof is left as an exercise).
Theorem 2.18 Suppose that f : R → R satisfies (10). Then the function
\[
F(t, x) := \int_{\mathbb{R}} f(y)\,P_{\sigma^2(T-t)}(x - y)\,dy, \qquad (t, x) \in [0, T) \times \mathbb{R} \tag{11}
\]
belongs to C^∞([0, T) × R), satisfies lim_{t→T} F(t, x) = f(x) and hence solves the terminal value problem (6).
To conclude, we obtained the no-arbitrage price F(0, x) at time t = 0 and the replicating strategy F_x in terms of one-dimensional integrals. The function F in (11) is given by
\[
\begin{aligned}
F(t, x) &= \int_{\mathbb{R}} f(y)\,\frac{1}{\sigma\sqrt{2\pi(T-t)}}\,\exp\Big(-\frac{(x-y)^2}{2\sigma^2(T-t)}\Big)\,dy \\
&= \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f\big(x + \sigma z\sqrt{T-t}\big)\,\exp\Big(-\frac{z^2}{2}\Big)\,dz \\
&= \tilde{\mathbb{E}}\big[f\big(x + \sigma\sqrt{T-t}\,Z\big)\big],
\end{aligned} \tag{12}
\]
where Z ∼ N(0, 1) under some probability measure P̃; for specific payoffs f(·), this integral expression can be evaluated in closed form. It is important to notice that the representation of F involves the volatility parameter σ but not the drift parameter m. We shall comment further on this below.
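To make (12) concrete, pick the call payoff f(x) = (x − K)⁺ (this specific payoff is our choice, not taken from the notes). For it, the Gaussian integral in (12) evaluates in closed form, and the result can be cross-checked against a direct Monte Carlo estimate of E[f(x + σ√(T−t) Z)]:

```python
import math
import numpy as np

# Bachelier call price: evaluating (12) for f(x) = max(x - K, 0) gives
#   F = (x - K) N(d) + sigma*sqrt(tau) * phi(d),  d = (x - K)/(sigma*sqrt(tau)).
def bachelier_call(x, K, sigma, tau):
    s = sigma * math.sqrt(tau)
    d = (x - K) / s
    N = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))       # standard normal cdf
    phi = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    return (x - K) * N + s * phi

x, K, sigma, tau = 100.0, 100.0, 5.0, 1.0
rng = np.random.default_rng(2)
Z = rng.standard_normal(1_000_000)
mc = np.mean(np.maximum(x + sigma * math.sqrt(tau) * Z - K, 0.0))
print(bachelier_call(x, K, sigma, tau), mc)
```

As the representation (12) predicts, neither expression involves the drift m; at the money the price reduces to σ√(T−t)/√(2π).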
The probabilistic approach. Thus far we have solved the pricing and hedging problem invoking only PDE methods; no reference to probability was made. In order to take a more probabilistic approach, let us assume that σ ≠ 0 and rewrite the process (Xₜ) as
\[
X_t = X_0 + \sigma W_t^*, \tag{13}
\]
where Wₜ* = Wₜ + (m/σ)t. The process W is a Brownian motion with respect to P and the filtration (Fₜ). We are now going to show that W* is a Brownian motion with respect to the same filtration (Fₜ) but a different (equivalent) probability measure P* ∼ P. For this, we recall the following lemma.
Lemma 2.19 (Bayes' Formula) Let P̃ ∼ P be a probability measure on (Ω, F, (Fₜ), P) with density process Zₜ := dP̃/dP|_{Fₜ} > 0. Then,
\[
H_t := \tilde{\mathbb{E}}_t[H] = \frac{\mathbb{E}_t[H Z_T]}{\mathbb{E}_t[Z_T]} \qquad \text{for } H \ge 0 \text{ or } H Z_T \in L^1(\mathbb{P}), \tag{14}
\]
where Eₜ[·] = E[·|Fₜ] denotes the conditional expectation given Fₜ.
We are now ready to state and prove a first version of the change of measure formula on the Wiener
space, known as the Cameron-Martin-Girsanov formula.
Theorem 2.20 (Cameron-Martin-Girsanov) The following statements hold:

i) There exists an equivalent probability measure P* ∼ P such that Wₜ* = Wₜ + (m/σ)t (t ∈ [0, T]) is a P*-Brownian motion.

ii) The measure P* has the density
\[
Z := \frac{dP^*}{dP}\Big|_{F_T} = \exp\Big(\alpha W_T - \frac12 \alpha^2 T\Big)
\]
with respect to P, where α = −m/σ.
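Theorem 2.20 can be illustrated by Monte Carlo (a sketch, not from the notes; parameters and seed are arbitrary): reweighting expectations under P by the density Z turns the drifted process W_T* into a centred Gaussian, so for instance E*[(W_T*)²] = E[Z (W_T*)²] = T.

```python
import numpy as np

# Girsanov check: with alpha = -m/sigma and Z = exp(alpha*W_T - alpha^2*T/2),
# the reweighted drifted process W*_T = W_T + (m/sigma)*T has the moments of
# a centred Brownian motion, e.g. E[Z * (W*_T)^2] = T.
rng = np.random.default_rng(3)
m, sigma, T = 1.0, 2.0, 1.0
alpha = -m / sigma
n = 1_000_000

W_T = rng.normal(0.0, np.sqrt(T), n)           # P-Brownian motion at time T
W_star = W_T + (m / sigma) * T                 # drifted process
Z = np.exp(alpha * W_T - 0.5 * alpha**2 * T)   # density dP*/dP on F_T

print(np.mean(Z))               # ~1: Z is a probability density
print(np.mean(Z * W_star**2))   # ~T: second moment under P*
```

The first average confirms that Z integrates to one; the second shows the drift has been removed under P*, exactly as in part i) of the theorem.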
In view of the preceding theorem we can write the NA-price (12) in the Bachelier model as
\[
F(t, x) = \mathbb{E}^*_t\big[f\big(x + \sigma(W_T^* - W_t^*)\big)\big].
\]
That is,
\[
\begin{aligned}
F(t, X_t(\omega)) &= \mathbb{E}^*\big[f\big(x + \sigma(W_T^* - W_t^*)\big)\big]\big|_{x = X_t(\omega)} \\
&= \mathbb{E}^*_t\big[f\big(X_t + \sigma(W_T^* - W_t^*)\big)\big] \\
&= \mathbb{E}^*_t\big[f\big(X_t + (X_T - X_t)\big)\big] \\
&= \mathbb{E}^*_t\big[f(X_T)\big],
\end{aligned} \tag{15}
\]
just as in the classical actuarial approach, but under the "risk neutral" probability measure P*; in general P* ≠ P, unless m = 0. In view of (13) and (15) the price process (Xₜ) of the underlying and the derivative price process (F(t, Xₜ)) are martingales under P* with respect to the original filtration.
3 Stochastic Calculus for Brownian Motion
The Itô formula allows us to calculate integrals of smooth functions of Brownian motion W with
respect to Brownian motion in terms of ordinary Riemann integrals. For more general integrands, i.e.,
for integrands which are measurable with respect to the filtration generated by W one needs a more
advanced machinery, known as Itô calculus.
Further Reading: With the exception of the proof of the martingale representation theorem, this section mainly follows Chapters 3 and 4 of Øksendal (2003). For a more general theory of stochastic integration see also the book of Karatzas & Shreve (1988), Chapter 3.
3.1 The Itô Integral
For our financial market models we will need to consider limits of integrals of the form
\[
\sum_{t_i} \phi_{t_{i-1}}(\omega)\big(W_{t_i}(\omega) - W_{t_{i-1}}(\omega)\big). \tag{16}
\]
The question is then for which functions (trading strategies) φ a limit in (16) can be taken and, if so, in which sense, i.e., in the almost sure sense, in probability, or in L². In constructing the stochastic integral with respect to a Wiener process W we shall work on a Wiener basis, i.e., assume that (Fₜ) is the filtration generated by W, augmented by the null sets.
Definition 3.1 Let V be the class of all functions f(t, ω) : [0, ∞) × Ω → R such that

(i) The mapping (t, ω) ↦ f(t, ω) is progressively measurable.

(ii) The random variable f(t, ·) is Fₜ-measurable.

(iii) The mapping f is square integrable with respect to P ⊗ λ, that is,
\[
\mathbb{E}\Big[\int_0^T f^2(t, \omega)\,dt\Big] < \infty.
\]
Notice that the function φ₁ of Example 2.1 belongs to V while φ₂ does not. For a function f ∈ V we are now going to define the Itô integral (up to some time T)
\[
I[f](\omega) := \int_0^T f(t, \omega)\,dW_t(\omega).
\]
The definition is obvious for elementary functions, i.e., functions of the form
\[
\phi(t, \omega) = \sum_j e_j(\omega)\,1_{(t_j, t_{j+1}]}(t)
\]
where 0 ≤ t₀ < t₁ < · · · < tₙ ≤ T and eⱼ is F_{tⱼ}-measurable and bounded. For any such function we put
\[
\int_0^T \phi(t, \omega)\,dW_t(\omega) := \sum_j e_j(\omega)\big(W_{t_{j+1}} - W_{t_j}\big)(\omega).
\]

3.1.1 The Itô isometry
It was Itô's fundamental insight that one should not proceed in a pathwise way to define stochastic integrals. Instead, one should take a functional-analytic point of view, applying Hilbert space theory. The key insight is that the norm of an elementary function equals the norm of its stochastic integral:
\[
\Big\| \int \phi\,dW \Big\|_{L^2(\mathbb{P})} = \|\phi\|_{L^2(\mathbb{P}\otimes\lambda)}.
\]
The extension of the class of integrands from the set of elementary functions to V is based on this so-called Itô isometry. L²-spaces are complete. This, together with the isometry between the space of stochastic integrals (equipped with the L²(P)-norm) and the space of integrands (equipped with the L²(P ⊗ λ)-norm), allows for an extension of the integral w.r.t. Brownian motion beyond simple processes.
Lemma 3.2 (Itô isometry) Let φ be a bounded elementary function. Then
\[
\mathbb{E}\Big[\Big(\int_0^T \phi(t, \omega)\,dW_t(\omega)\Big)^2\Big] = \mathbb{E}\Big[\int_0^T \phi^2(t, \omega)\,dt\Big].
\]
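The isometry can be checked by simulation for a concrete elementary adapted integrand (our own choice, not from the notes): take φ equal to sign(W_{t_j}) on each interval (t_j, t_{j+1}]. Then φ² = 1, so both sides of the isometry equal T.

```python
import numpy as np

# Monte Carlo check of the Itô isometry for the elementary adapted
# integrand phi_t = sign(W_{t_j}) on (t_j, t_{j+1}]; since phi^2 = 1,
# E[(int phi dW)^2] should equal T.
rng = np.random.default_rng(4)
paths, steps, T = 200_000, 50, 1.0
dt = T / steps
dW = rng.normal(0.0, np.sqrt(dt), (paths, steps))
W = np.cumsum(dW, axis=1) - dW             # left endpoints W_{t_j}, with W_0 = 0

phi = np.sign(W)
phi[phi == 0.0] = 1.0                      # convention sign(0) := 1, still F_{t_j}-measurable
stoch_int = np.sum(phi * dW, axis=1)       # sum e_j (W_{t_{j+1}} - W_{t_j})

lhs = np.mean(stoch_int**2)                # estimates E[(int phi dW)^2]
rhs = T                                    # E[int phi^2 dt] = T
print(lhs, rhs)
```

Note that φ looks ahead at nothing: each eⱼ = sign(W_{tⱼ}) is F_{tⱼ}-measurable, which is exactly the adaptedness required of elementary integrands.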
3.1.2 Construction of stochastic integrals
In order to extend the stochastic integral to functions belonging to V we first show that the set of
simple processes is dense in V. Subsequently we use the isometry to define the integral as the L2 -limit
of the integrals of the approximating simple processes.
Lemma 3.3 Let g ∈ V be bounded. Then there exists a sequence of simple processes φₙ such that
\[
\lim_{n\to\infty} \mathbb{E}\Big[\int_0^T (g - \phi_n)^2\,dt\Big] = 0.
\]
We are now ready to finish the construction of the stochastic integral. If f ∈ V is bounded and continuous, the first part of the preceding lemma proves that there exists a sequence of simple processes φₙ with φₙ → f in L²(P ⊗ λ). It follows from the Itô isometry that the sequence {∫₀ᵀ φₙ dW}ₙ∈ℕ forms a Cauchy sequence in L²(P):
\[
\lim_{m,n\to\infty} \mathbb{E}\Big[\Big(\int_0^T \phi_m(t,\cdot)\,dW_t - \int_0^T \phi_n(t,\cdot)\,dW_t\Big)^2\Big] = \lim_{m,n\to\infty} \mathbb{E}\Big[\int_0^T \big(\phi_m(t,\cdot) - \phi_n(t,\cdot)\big)^2\,dt\Big] = 0.
\]
Since L²(P) is complete, the sequence {∫₀ᵀ φₙ dW}ₙ∈ℕ converges. We define the integral of f with respect to BM as the unique limit, denoted ∫₀ᵀ f dW. If f is bounded but discontinuous we approximate it by continuous functions as in part (ii) of the lemma. Finally, if f is only square integrable, then we use the approximation
\[
h_n(t, \omega) =
\begin{cases}
-n & \text{if } f(t, \omega) < -n \\
f(t, \omega) & \text{if } -n \le f(t, \omega) \le n \\
n & \text{if } f(t, \omega) > n.
\end{cases}
\]
Overall, we have, for any f ∈ V, defined the Itô integral
\[
I[f](\omega) := \int_0^T f(t, \omega)\,dW_t
\]
as an L²-limit of stochastic integrals of elementary functions with respect to Brownian motion.
Remark 3.4 Let {φₙ} and {ψₙ} be two sequences approximating f in the L²-sense. Then the isometry for elementary functions yields
\[
\mathbb{E}\Big[\Big(\int_0^T \phi_n\,dW_t - \int_0^T \psi_n\,dW_t\Big)^2\Big] = \mathbb{E}\Big[\int_0^T (\phi_n - \psi_n)^2\,dt\Big] \to 0,
\]
so the definition of I[f] does not depend on the approximating sequence.
As an immediate corollary from the construction of the stochastic integral we obtain an isometry
for functions belonging to the class V.
Corollary 3.5 (Itô isometry) Let f ∈ V. Then
\[
\mathbb{E}\Big[\Big(\int_0^T f(t, \omega)\,dW_t\Big)^2\Big] = \mathbb{E}\Big[\int_0^T f^2(t, \omega)\,dt\Big].
\]
Example 3.6 (i) It follows from the construction of the Itô integral that
\[
\int_0^t s\,dW_s = t W_t - \int_0^t W_s\,ds.
\]
Indeed, let us define the simple functions
\[
\phi_n(s) = \sum_j \frac{j}{n}\,1_{[\frac{j}{n}, \frac{j+1}{n})}(s)
\]
approximating the identity function and put sⱼⁿ = j/n on [j/n, (j+1)/n) and sⱼⁿ = 0 elsewhere. We have that
\[
\int_0^t s\,dW_s = \lim_{n\to\infty} \int_0^t \phi_n\,dW = \lim_{n\to\infty} \sum_j s_j^n\,\Delta W_j.
\]
Furthermore,
\[
\sum_j s_j^n\,\Delta W_j = \sum_j \Delta(s_j^n W_j) - \sum_j W_{\frac{j+1}{n}}\,\Delta s_j^n.
\]
The first term on the right hand side of the preceding equation converges to tWₜ while the second term is an ordinary Riemann sum that approximates the integral ∫₀ᵗ W_s ds.
(ii) The integral ∫ W_s dW_s has finite first and second moments because
\[
\mathbb{E}\Big[\int_0^t W_s\,dW_s\Big] = 0 \quad\text{and}\quad \mathbb{E}\Big[\Big(\int_0^t W_s\,dW_s\Big)^2\Big] = \int_0^t \mathbb{E}[W_s^2]\,ds = \int_0^t s\,ds = \frac12 t^2.
\]
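Part (i) of Example 3.6 can be illustrated pathwise (a numerical sketch, not part of the notes): on a fine grid, the left-point sums Σ s_i Δ_i W agree with tW_t minus a Riemann sum for ∫₀ᵗ W_s ds, up to an error that vanishes with the mesh.

```python
import numpy as np

# Pathwise check of int_0^t s dW_s = t*W_t - int_0^t W_s ds:
# both sides are approximated by sums on the same fine grid.
rng = np.random.default_rng(5)
n, t = 200_000, 1.0
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate(([0.0], np.cumsum(dW)))
s = np.linspace(0.0, t, n + 1)

lhs = np.sum(s[:-1] * dW)                  # sum s_i (W_{t_{i+1}} - W_{t_i})
rhs = t * W[-1] - np.sum(W[:-1] * dt)      # t*W_t minus Riemann sum of W
print(abs(lhs - rhs))  # vanishes as the mesh goes to zero
```

The discrepancy is exactly the summation-by-parts remainder, of order dt on a single path, mirroring the Abel-summation argument given above.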
3.1.3 Properties of stochastic integrals
The following properties of the Itô integral are easily verified for elementary functions and hence for all functions f ∈ V:

(i) For any 0 ≤ c ≤ T we have ∫₀ᵀ f dW = ∫₀ᶜ f dW + ∫_c^T f dW.

(ii) Itô integrals are linear: for any constant c and all f, g ∈ V we have ∫(cf + g) dW = c∫f dW + ∫g dW.

(iii) Expected values of Itô integrals are zero: E[∫ f dW] = 0; see also Theorem 3.7 below.

(iv) Itô integrals are adapted: the random variable ∫₀ᵗ f dW is Fₜ-measurable.
Another important property of Itô integrals is the fact that they form martingales.

Theorem 3.7 Let f ∈ V. Then the stochastic process
\[
\Big(\int_0^t f\,dW\Big)_{t\ge 0}
\]
is an (Fₜ)-martingale.
It turns out that Itô integrals can be chosen to depend continuously on the upper integral boundary.
The proof uses Doob’s martingale inequality. It allows for an estimate of the running maximum of a
continuous martingale in terms of its p-th moment at the terminal time.
Theorem 3.8 (Doob's martingale inequality) Let (Mₜ) be a martingale defined on (Ω, F, P) with almost surely continuous sample paths. For all p ≥ 1, T ≥ 0 and λ > 0 we have
\[
\mathbb{P}\Big[\sup_{0\le t\le T} |M_t| > \lambda\Big] \le \frac{1}{\lambda^p}\,\mathbb{E}\big[|M_T|^p\big].
\]
We are now ready to show that Itô integrals depend continuously on the upper integral bound.

Theorem 3.9 Let f ∈ V. Then there exists a t-continuous version of the process
\[
\Big(\int_0^t f\,dW_s\Big)_{0\le t\le T}.
\]

3.1.4 Itô integral for a larger class of integrands
It will sometimes be necessary to consider Itô integrals for a more general class of integrands. For instance, let W be a Brownian motion defined on (Ω, F, P) with canonical filtration (Fₜ). Below we shall see that under certain conditions there exists an equivalent probability measure P* ∼ P and a process W* that is a Brownian motion w.r.t. P* and (Fₜ) but whose canonical filtration (Fₜ*) may be strictly smaller than (Fₜ). For such a process we will need to define integrals ∫ f dW* for (Fₜ)-adapted integrands (i.e. for trading strategies that are measurable w.r.t. the original filtration, but not necessarily measurable w.r.t. the smaller filtration (Fₜ*)). In order to define such integrals we introduce the following class of functions.
Definition 3.10 Let W be a Wiener process defined on (Ω, F, P). Let (Gₜ) be an increasing family of σ-fields such that W is (Gₜ)-adapted. We denote by W the class of all functions f : [0, T] × Ω → R that satisfy the following conditions:

(i) f is progressively measurable;

(ii) f(t, ·) is Gₜ-measurable;

(iii) P[∫₀ᵀ f²(s, ω) ds < ∞] = 1.
If f ∈ W, then one can show that there exist step functions fₙ such that
\[
\int_0^t |f_n - f|^2\,ds \to 0
\]
in probability. For such a sequence one has that ∫₀ᵗ fₙ dW converges in probability to a limit, denoted ∫ f dW.
Remark 3.11 Loosely speaking, one extends the integral w.r.t. BM to adapted processes f that satisfy
\[
\mathbb{E}\Big[\int_0^{\tau_n} f_s^2\,ds\Big] < \infty
\]
along a "localizing sequence" (τₙ), i.e., a sequence of increasing stopping times that satisfies τₙ → ∞. As before, there exists a t-continuous version. However, (∫₀· f dW) will in general not be a martingale any more; typically, it will only be a local martingale in the sense of the following definition.
Definition 3.12 An (Fₜ)-adapted process (Zₜ) is called a local martingale with respect to (Fₜ) if there exists a sequence of (Fₜ)-stopping times τₙ such that
\[
\tau_n \to \infty \quad \text{a.s. as } n \to \infty
\]
and (Z_{t∧τₙ} − Z₀) is an (Fₜ)-martingale for every n.
Note that no integrability assumption is imposed on Z₀. Because of this, it need not be possible to 'reduce' a local martingale to a true martingale.
Lemma 3.13 A local martingale (Zt ) that is bounded from below and satisfies E[Z0 ] < ∞ is a supermartingale.
The following lemma states a condition under which a local martingale is a (true) martingale.
Lemma 3.14 Let (Zₜ) be a local martingale with respect to (Fₜ). Then (Zₜ) is a martingale if the process is L¹-bounded in the sense that
\[
\sup_{s\le t} |Z_s| \in L^1 \quad \text{for all } t > 0.
\]
3.2 Itô processes
In Section 2 we introduced the Itô formula in its simplest form, which allows us to evaluate integrals
\[
\int f(W_s)\,dW_s
\]
when f : R → R is a sufficiently smooth function. The preceding section introduced stochastic integrals
\[
\int f(s, \omega)\,dW_s
\]
for more general integrands. The definition of this integral can easily be extended to higher dimensions. To this end, let
\[
W = (W^1, \ldots, W^n)
\]
be a vector of n independent standard Brownian motions (the assumption of independence can be dropped). For a function v : R × Ω → R^{n×n} we put
\[
\int_0^t v\,dW = \int_0^t
\begin{pmatrix}
v_{1,1} & \cdots & v_{1,n} \\
\vdots & \ddots & \vdots \\
v_{n,1} & \cdots & v_{n,n}
\end{pmatrix}
\begin{pmatrix}
dW_s^1 \\ \vdots \\ dW_s^n
\end{pmatrix}
\]
and denote again by V the class of all functions for which this definition makes sense. An Itô process
will now be defined as the sum of an Itô integral plus an absolutely continuous process of finite
variation.
Definition 3.15 A stochastic process X = (Xₜ) on (Ω, F, P) of the form
\[
X_t = X_0 + \int_0^t u(s, \omega)\,ds + \int_0^t v(s, \omega)\,dW_s(\omega), \tag{17}
\]
where v ∈ V and u is adapted and satisfies the integrability condition
\[
\int_0^t |u(s, \omega)|\,ds < \infty \quad \text{for all } t
\]
almost surely, is called an Itô process.
Example 3.16 The simplest possible Itô process is a Brownian motion with drift and volatility:
\[
dX_t = \mu\,dt + \sigma\,dW_t
\]
for some constants µ ∈ R and σ > 0.
The generalized Itô formula allows for an evaluation of smooth functions of Itô processes. Its proof
follows from minor modifications of the arguments given in the proof of Theorem 2.9.
Theorem 3.17 Let X be an Itô process as defined by (17) and let
\[
g(t, x) = (g_1(t, x), \ldots, g_p(t, x))
\]
be continuously differentiable with respect to the time variable and twice continuously differentiable with respect to the space variable. Then
\[
Y(t, \omega) := g(t, X_t(\omega))
\]
defines an Itô process with
\[
dY_k(t, \omega) = \frac{\partial g_k}{\partial t}(t, X_t)\,dt + \sum_i \frac{\partial g_k}{\partial x_i}(t, X_t)\,dX_t^i + \frac12 \sum_{i,j} \frac{\partial^2 g_k}{\partial x_i \partial x_j}(t, X_t)\,dX_t^i\,dX_t^j,
\]
where dWⁱ dWʲ = δ_{i,j} dt and dWⁱ dt = dt dWⁱ = dt dt = 0.
The following introduces a stochastic process that is the building block of the Black-Scholes option pricing model.

Example 3.18 (Geometric Brownian motion) Given a BM W, an initial value S₀ and constants µ ∈ R and σ > 0, the geometric Brownian motion process is defined by
\[
S_t = S_0 \exp\Big(\sigma W_t + \big(\mu - \tfrac12\sigma^2\big)t\Big).
\]
In terms of the Itô processes Xₜ = σWₜ and Yₜ = (µ − ½σ²)t and the twice continuously differentiable function G(x, y) = S₀ exp(x + y) we have that
\[
S_t = G(X_t, Y_t).
\]
An application of the generalized Itô formula (using that all partial derivatives of G equal G itself) yields
\[
\begin{aligned}
S_t &= S_0 + \int_0^t G(X_s, Y_s)\,dX_s + \int_0^t G(X_s, Y_s)\,dY_s + \frac12 \int_0^t G(X_s, Y_s)\,\sigma^2\,ds \\
&= S_0 + \int_0^t G(X_s, Y_s)\,\sigma\,dW_s + \int_0^t G(X_s, Y_s)\big(\mu - \tfrac12\sigma^2\big)\,ds + \frac12 \int_0^t G(X_s, Y_s)\,\sigma^2\,ds \\
&= S_0 + \int_0^t \sigma S_s\,dW_s + \int_0^t \mu S_s\,ds.
\end{aligned}
\]
In other words, the geometric Brownian motion process solves the stochastic differential equation
\[
dS_t = S_t\big(\sigma\,dW_t + \mu\,dt\big).
\]
The next example illustrates the link between Itô processes and partial differential equations.
Example 3.19 (Brownian motion and the reverse heat equation) Suppose that the function F solves the heat equation
\[
F_t(t, x) + \frac12 F_{xx}(t, x) = 0 \tag{18}
\]
on some bounded domain D ⊂ R with (smooth) boundary condition h(t, x). Let W be a one-dimensional standard Brownian motion and denote by
\[
\tau := \inf\{t : W_t \notin D\} \wedge T
\]
the minimum of the first exit time of W from D and some terminal time T. The function F satisfies the boundary condition
\[
F(\tau, W_\tau) = h(\tau, W_\tau) \quad \text{on } \partial D.
\]
By Itô's formula we have
\[
F(t, W_t) = F(0, 0) + \int_0^t F_t(s, W_s)\,ds + \int_0^t F_x(s, W_s)\,dW_s + \frac12 \int_0^t F_{xx}(s, W_s)\,ds
\]
up to the random time τ. Since F satisfies (18) we obtain
\[
F(\tau, W_\tau) = F(0, 0) + \int_0^\tau F_x(s, W_s)\,dW_s.
\]
If F_x is sufficiently smooth, then the process (∫₀· F_x(s, W_s) dW_s) is a martingale and
\[
\mathbb{E}\Big[\int_0^\tau F_x(s, W_s)\,dW_s\Big] = 0.
\]
Similarly, we can start the Brownian motion at time t in x. If we denote by P_{t,x} the resulting distribution of the Wiener process and by E_{t,x} the expected value with respect to P_{t,x}, the boundary condition yields
\[
F(t, x) = \mathbb{E}_{t,x}\big[h(\tau, W_\tau)\big].
\]
Hence we obtain a probabilistic representation of the solution to the heat equation with boundary condition h.
Example 3.20 We claim that the process
\[
X_t = e^{\frac{t}{2}} \sin(W_t)
\]
is a martingale. Indeed, by Itô's formula we have
\[
d\big(e^{\frac{t}{2}} \sin(W_t)\big) = \frac12 e^{\frac{t}{2}} \sin(W_t)\,dt + e^{\frac{t}{2}} \cos(W_t)\,dW_t + \frac12 e^{\frac{t}{2}} \big(-\sin(W_t)\big)\,dt = e^{\frac{t}{2}} \cos(W_t)\,dW_t.
\]
Hence (Xₜ) can be represented as a stochastic integral with respect to Brownian motion. As such it is a martingale.
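The martingale property of Example 3.20 can be probed by Monte Carlo (a sketch, not from the notes; the times s, t and the conditioning value w are arbitrary choices): conditioning on W_s = w, the conditional expectation E[X_t | F_s] should equal X_s = e^{s/2} sin(w).

```python
import math
import numpy as np

# Check E[X_t | W_s = w] = X_s for X_t = exp(t/2) * sin(W_t):
# given W_s = w, W_t = w + increment with increment ~ N(0, t - s).
rng = np.random.default_rng(6)
s, t, w = 0.5, 1.5, 0.7
n = 1_000_000

W_t = w + rng.normal(0.0, math.sqrt(t - s), n)
cond_mean = math.exp(t / 2.0) * np.mean(np.sin(W_t))
X_s = math.exp(s / 2.0) * math.sin(w)
print(cond_mean, X_s)
```

Analytically, E[sin(w + N(0, t−s))] = sin(w) e^{−(t−s)/2}, so the factor e^{t/2} exactly restores e^{s/2} sin(w), which is what the simulation confirms.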
The generalized Itô formula also yields the following integration by parts formula; the proof is
straightforward and left as an exercise.
Proposition 3.21 (Integration-by-parts formula) Let X and Y be Itô processes on R. Then the following integration-by-parts formula holds:
\[
\int X_s\,dY_s = X_t Y_t - X_0 Y_0 - \int Y_s\,dX_s - \int dX_s\,dY_s.
\]
We close this section with a brief discussion of the quadratic variation of an Itô process
\[
dX_t = v(t, \omega)\,dW_t
\]
on the time interval [0, T]. By Theorem 3.9 the process X can be chosen to be continuous in t. The quadratic variation of X is defined as
\[
\langle X(\omega)\rangle_t = \lim \sum_{i=1}^n |X_{t_i^n}(\omega) - X_{t_{i-1}^n}(\omega)|^2,
\]
where {tᵢⁿ}ᵢ₌₁ⁿ is a partition of [0, T] for every n ∈ N. The limit is in probability, taken over all partitions with mesh sizes tending to zero as n → ∞. For a simple integrand v it is easily verified that
\[
\langle X(\omega)\rangle_t = \int_0^t v^2(s, \omega)\,ds.
\]
The standard approximation of general integrands by simple processes shows that this identity carries
over to v ∈ V. From this we immediately obtain the following result.
Theorem 3.22 (Quadratic variation of Itô processes) Let X be an Itô process. The quadratic variation of X is given by
\[
\langle X(\omega)\rangle_t = \int_0^t \|v(s, \omega)\|^2\,ds.
\]
In particular, the quadratic variation comes solely from the Itô integral ∫ v dW and not from the ordinary integral ∫ u ds.
3.3 The martingale representation theorem
By Theorem 3.7, Itô integrals are martingales with respect to the canonical filtration generated by the Wiener process W. In a first step we establish the Itô representation theorem. It says that every square integrable F_T-measurable random variable F can be written as a stochastic integral.
Theorem 3.23 Assume that the filtration is generated by a Wiener process and let F ∈ L²(F_T). Then there exists a unique process f ∈ V such that
\[
F(\omega) = \mathbb{E}[F] + \int_0^T f(t, \omega)\,dW_t(\omega).
\]
For the proof we need two results. The first deals with orthogonal complements of stable sets.

Lemma 3.24 Let M₀² be the class of square integrable martingales starting in zero and A be a stable subset of M₀². Let
\[
A^\perp := \{M \in M_0^2 : \mathbb{E}[M_\infty N_\infty] = 0 \text{ for all } N \in A\}.
\]
Then M · N is a martingale for all M ∈ A^⊥ and N ∈ A.
The second result that we shall need is the complex version of the Stone-Weierstrass theorem.
Lemma 3.25 (Stone-Weierstrass, complex version) Let K be a locally compact Hausdorff space.
Let A be a complex sub-algebra of C(K, C) equipped with the sup-norm that contains all constants, is
stable with respect to complex conjugation, and separates points. Then A is dense in C(K, C).
The proof of the martingale representation theorem may seem hard if you are not familiar with
functional analysis style arguments. The basic idea is to first describe the set of random variables
for which the desired representation does not hold as the orthogonal complement to a certain class of
measurable functions and then to show that this class is so large that the orthogonal complement is
actually trivial.
We are now ready to state and prove the martingale representation theorem.
Theorem 3.26 Let (Mₜ) be an (Fₜ)-martingale with Mₜ ∈ L² for all t. Then there exists a unique (in L²(P ⊗ λ)) process g(t, ω) such that g ∈ V and
\[
M_t = \mathbb{E}[M_0] + \int_0^t g(s, \omega)\,dW_s \qquad \mathbb{P}\text{-a.s. for all } t \ge 0.
\]
The martingale representation theorem is a fairly abstract result. A closed form expression for the integrand can only be obtained in special cases, some of which we discuss below. We refer to Øksendal (2003), p. 60 for more examples.
Example 3.27 (i) Let F = W₁² and consider the martingale
\[
M_t := \mathbb{E}[F \mid F_t] = W_t^2 + (1 - t).
\]
By Itô's formula
\[
W_t^2 = t + 2\int_0^t W_s\,dW_s,
\]
so that we obtain the following representation of (Mₜ):
\[
M_t = 1 + 2\int_0^t W_s\,dW_s.
\]

(ii) Let F = ∫₀ᵀ W_s ds. The integration by parts formula yields
\[
\int_0^T W_s\,ds = T W_T - \int_0^T s\,dW_s = \int_0^T (T - s)\,dW_s.
\]
(iii) Let F = exp(W_T). Since
\[
d\,\exp\Big(W_t - \frac{t}{2}\Big) = \exp\Big(W_t - \frac{t}{2}\Big)\,dW_t
\]
we have
\[
\exp\Big(W_T - \frac{T}{2}\Big) = 1 + \int_0^T \exp\Big(W_t - \frac{t}{2}\Big)\,dW_t
\]
and hence
\[
F = \exp\Big(\frac{T}{2}\Big) + \int_0^T \exp\Big(W_t + \frac{T - t}{2}\Big)\,dW_t.
\]
(iv) Let F = sin(W_T). By Example 3.20 above,
\[
d\big(e^{\frac{t}{2}} \sin(W_t)\big) = e^{\frac{t}{2}} \cos(W_t)\,dW_t,
\]
or
\[
e^{\frac{T}{2}} \sin(W_T) = \int_0^T e^{\frac{t}{2}} \cos(W_t)\,dW_t.
\]
Hence
\[
F = \sin(W_T) = \int_0^T e^{\frac{t-T}{2}} \cos(W_t)\,dW_t.
\]

3.4 Application: The Black-Scholes Model
This section provides a first analysis of the Black-Scholes option pricing model. The assumption is that the stock price process (Sₜ) follows a geometric Brownian motion with drift µ and volatility σ. That is,
\[
dS_t = S_t(\mu\,dt + \sigma\,dW_t).
\]
We denote by Sₜ⁰ the price of a riskless bond at time t that pays interest at a rate r ≥ 0, so
\[
dS_t^0 = S_t^0\,r\,dt.
\]
3.4.1 Pricing derivatives
Let us now consider some examples of financial derivatives that one can price explicitly within the Black-Scholes framework.

European stock options. Our first goal is to price a European contingent claim
\[
H = h(S_T)
\]
on the risky asset with maturity T. To this end, we denote by φ(t, Sₜ) and η(t, Sₜ) the number of stocks and bonds, respectively, an investor holds at time t. The value of her portfolio is hence given by
\[
V(t, S_t) = S_t \cdot \phi(t, S_t) + S_t^0 \cdot \eta(t, S_t).
\]
We assume that an agent can trade continuously, so her gains from trading follow the stochastic process
\[
G_t = \int_0^t \phi(u, S_u)\,dS_u + \int_0^t \eta(u, S_u)\,dS_u^0.
\]
We also assume that the trading strategy (φ, η) is self-financing so that V(t, Sₜ) equals the initial investment plus the gains from trading in the stock and bond market:
\[
V(t, S_t) = V(0, S_0) + G_t.
\]
The trading strategy (φ, η) replicates the claim H if by following (φ, η) the issuer can meet his payment obligations at maturity:
\[
V(T, S_T) = H.
\]
It turns out that the replicating strategy can be characterized as the solution to a linear partial differential equation.

Theorem 3.28 (PDE for the replicating strategy) Let V : [0, T] × R₊ → R be of class C^{1,2} and a solution to the terminal value problem
\[
V_t(t, S) + \frac12 \sigma^2 S^2 V_{SS}(t, S) + r S V_S(t, S) = r V(t, S) \quad\text{with}\quad V(T, \cdot) = h. \tag{19}
\]
Then the trading strategy
\[
\phi(t, S_t) = V_S(t, S_t) \quad\text{and}\quad \eta(t, S_t) = \frac{V(t, S_t) - \phi(t, S_t) S_t}{S_t^0}
\]
along with the portfolio value V(t, S) defines a self-financing trading strategy replicating H.
The preceding theorem states that by following the trading strategy (φ, η) the writer of the option H can "eliminate" all the risk associated with issuing it. Implementing this strategy requires an initial investment V(0, S₀), the claim's "fair" value. It turns out that for a European call option with strike K, i.e. for the claim
\[
h(S_T) = (S_T - K)^+,
\]
this value can be given in closed form. The following lemma shows that pricing an option in the Black-Scholes model boils down to solving the (time-)reversed heat equation on [0, T] × R with a terminal condition given by the payoff function at maturity.
Lemma 3.29 Let τ(t) = σ²(T − t). Define
\[
z(t, S) = \ln S + \Big(r - \frac12\sigma^2\Big)(T - t)
\]
and denote by u the solution to the heat equation
\[
u_t = \frac12 u_{xx}
\]
with initial condition
\[
u(0, z) = (e^z - K)^+.
\]
Then the function
\[
C(t, S) = e^{-r(T-t)}\,u(\tau(t), z(t, S))
\]
solves the terminal value problem (19) for a European call option.
From a previous example we know that the solution to the heat equation with initial condition u(0, z) = u₀(z) equals
\[
u(t, z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty u_0(z + x\sqrt{t})\,e^{-x^2/2}\,dx. \tag{20}
\]
Hence a straightforward calculation yields the Black-Scholes price of a European call option:
\[
C_{BS}(t, S, \sigma, r, K, T) = S N(d_1) - e^{-r(T-t)} K N(d_2), \tag{21}
\]
where N(·) denotes the cumulative distribution function of the standard normal distribution and
\[
d_1 = \frac{\ln(S/K) + (r + \frac12\sigma^2)(T - t)}{\sigma\sqrt{T - t}}, \qquad d_2 = d_1 - \sigma\sqrt{T - t}.
\]
The hedge portfolios in the stock and bond market are given in terms of the quantities d_{1,2} as
\[
\phi = \frac{\partial C_{BS}}{\partial S} = N(d_1) \quad\text{and}\quad \eta = -e^{-rT} K N(d_2).
\]
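Formula (21) translates directly into code. The sketch below (an illustration, not from the notes) implements the call price using `math.erf` for the normal cdf, and obtains the put price via put-call parity:

```python
import math

# Black-Scholes call price (21); N is the standard normal cdf,
# written via math.erf to keep the example dependency-free.
def N(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(t, S, sigma, r, K, T):
    tau = T - t
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * N(d1) - math.exp(-r * tau) * K * N(d2)

def bs_put(t, S, sigma, r, K, T):
    # put-call parity: P = C - S + K * exp(-r * (T - t))
    return bs_call(t, S, sigma, r, K, T) - S + K * math.exp(-r * (T - t))

price = bs_call(0.0, 100.0, 0.2, 0.05, 100.0, 1.0)
print(price)
```

Note that µ appears nowhere in the code, only σ, r, K and the time to maturity, mirroring the drift-independence discussed in Remark 3.31 below.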
Remark 3.30 One can show that u in (20) is continuously differentiable with respect to the time variable and twice continuously differentiable with respect to the space variable if u₀ is continuous. Differentiability conditions on u₀ are not needed (typically u₀ is not differentiable!). Continuity of the initial (terminal) condition is enough to guarantee that u ∈ C^{1,2}. This is due to the smoothing property of the normal density, and it justifies, a posteriori, the application of Itô's formula when calculating the hedge portfolio.
The derivative of the option value with respect to the current asset price is called the option delta. It specifies the number of shares the issuer needs to hold in her portfolio to eliminate the risk associated with her short position in the option. The second derivative with respect to S is called the option gamma. It yields a measure for the dependence of the hedge portfolio on asset prices. The derivatives with respect to the interest rate, the volatility and the time to maturity are typically referred to as the rho, vega, and theta, respectively.
Remark 3.31 (i) Notice that the option value is independent of the drift µ. Notice furthermore
that so far we did not consider any measure change. Our current approach is based solely on the
idea of constructing a hedge portfolio that replicates the option’s payoff at maturity. Below we
consider a more advanced method to option pricing based on the concept of risk neutral valuation.
(ii) Traders are concerned about the hedge portfolio - not the option value. As a result they are
concerned about the option gamma. They typically try to hold a hedge portfolio with a small
gamma as this limits the transactions (and hence the transaction costs) in the stock market
necessary to hold a riskless (hedge) portfolio.
(iii) For a given market price of an option, the implied volatility is defined as the volatility that makes
the observed market price the Black-Scholes price. The Black-Scholes model assumes that the
implied volatility is independent of the strike and time to maturity. In reality, however, one typically
observes a volatility smile: the implied volatility is increasing in |S − K|, i.e., increasing in the
option's "moneyness". It also depends on the time to maturity, generating what is known as an implied
volatility surface.
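Numerically, the implied volatility is the root of price(σ) = observed market price. Since the Black-Scholes call price is strictly increasing in σ (positive vega), bisection finds it reliably; a minimal sketch with illustrative function names:

```python
from math import log, sqrt, exp, erf

def bs_call_price(S, K, r, sigma, tau):
    N = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return S * N(d1) - K * exp(-r * tau) * N(d1 - sigma * sqrt(tau))

def implied_vol(price, S, K, r, tau, lo=1e-6, hi=5.0, tol=1e-10):
    # The price is monotone in sigma, so there is a unique root in [lo, hi];
    # bisect until the bracket is smaller than tol.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call_price(S, K, r, mid, tau) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```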
Options on options. We close this section with two examples where the Black-Scholes pricing
formula yields closed form solutions for prices of options on options.
Example 3.32 A Chooser Option gives its holder the right to choose at some future date T0 whether
the option is to be a call or put option with strike K and time to expiry (T − T0 ). The payoff at time
T0 is
CT0 = max {CBS (T0 , ST0 , σ, r, K, T − T0 ), PBS (T0 , ST0 , σ, r, K, T − T0 )}
while the payoff at maturity is
CT = (ST − K)+ 1A + (K − ST )+ 1Ac
where
A := {CBS (T0 , ST0 , σ, r, K, T − T0 ) > PBS (T0 , ST0 , σ, r, K, T − T0 )}.
By the call-put parity

P_BS = C_BS − S_{T0} + K e^{−r(T−T0)}

we see that

C_{T0} = C_BS + (K e^{−r(T−T0)} − S_{T0})^+.
Thus, the payoff at time T0 equals those of a portfolio comprising a call option and put option with
different strikes and maturities and the value of a chooser option is its discounted Black-Scholes price.
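The decomposition above translates directly into a pricing routine: the chooser is a call with strike K and maturity T plus a put with strike K e^{−r(T−T0)} and maturity T0. A sketch under these assumptions (function names are ours), with the put obtained from put-call parity:

```python
from math import log, sqrt, exp, erf

def _N(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, r, sigma, tau):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return S * _N(d1) - K * exp(-r * tau) * _N(d1 - sigma * sqrt(tau))

def bs_put(S, K, r, sigma, tau):
    # Put-call parity: P = C - S + K e^{-r tau}.
    return bs_call(S, K, r, sigma, tau) - S + K * exp(-r * tau)

def chooser_price(S, K, r, sigma, T0, T):
    # Time-0 value: a call with strike K maturing at T plus a put with
    # strike K e^{-r (T - T0)} maturing at the choice date T0.
    return bs_call(S, K, r, sigma, T) + bs_put(S, K * exp(-r * (T - T0)), r, sigma, T0)
```

As T0 approaches T the chooser degenerates into a straddle (call plus put with the same strike and maturity), which provides a simple consistency check.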
Example 3.33 A Compound Option is an option with strike K0 and maturity T0 on an option with
strike K and maturity T . For instance, a “call on a call” gives its holder the right to buy a call option
with strike K and maturity T at some time T0 < T at a predetermined price K0 . Its value at time T0
is
C_{T0} = (C_BS(T0, S_{T0}, σ, r, K, T − T0) − K0)^+,

so that its value at time 0 equals

C_0 = e^{−r T0} ∫_{x0}^∞ ( g(x) N(d1(x)) − K e^{−r(T−T0)} N(d2(x)) − K0 ) ϕ(x) dx
where ϕ denotes the density of the standard normal distribution,
g(x) := S0 exp( σ √T0 x + (r − σ²/2) T0 ),

d_{1,2}(x) = ( ln(g(x)/K) + (r ± σ²/2)(T − T0) ) / ( σ √(T − T0) )
and
x0 = inf { x : C_BS(T0, g(x), σ, r, K, T − T0) ≥ K0 }.
3.4.2 Barrier Options
So far we considered only European-style derivatives, i.e., derivatives whose payoff depends only on
the price of the underlying at maturity. By contrast, the payoff of a barrier option depends on whether
the price of the underlying reaches a certain level before maturity. Most barrier options are either
knock-ins or knock-outs. For instance, a down-and-in put with strike K and barrier B pays
C_di^put = (K − ST)^+  if min_{0≤t≤T} St ≤ B,   and   C_di^put = 0  otherwise.
An up-and-out call with strike K and barrier B corresponds to
C_uo^call = (ST − K)^+  if max_{0≤t≤T} St < B,   and   C_uo^call = 0  otherwise.
Just like European options, within the Black-Scholes framework barrier options can be priced using
PDE methods. The difference is that one obtains an additional boundary condition that captures the
option’s payoff when the asset price hits the threshold level B.
Proposition 3.34 (PDE for up-and-out call options) Consider an up-and-out call option on S whose
payoff at maturity is given by
H = (ST − K)^+  if max_{0≤t≤T} St < B,   and   H = 0  otherwise.
The boundary value problem for the option value reads
Vt + (1/2) σ² S² V_SS + r S V_S = r V   on [0, T] × (−∞, B)    (22)
with terminal and boundary condition given by, respectively,
V(T, S) = (S − K)^+   and   V(t, B) = 0.    (23)
This PDE is not as straightforward as the PDE for European options. The solution can nonetheless
be obtained invoking probabilistic methods (the solution can be represented in terms of Brownian exit
times). In order to illustrate the main idea, let us assume that
St = e^{Wt}.
4 TOPICS IN DIFFUSION THEORY 35
The option payoff is
VT = (e^{WT} − K) 1_{WT ≥ k, MT ≤ b}
where MT = max0≤t≤T Wt and
k = log K and b = log B.
In the section on risk neutral pricing we show that within the current setting the fair value of the
option is
Vt = E[VT | Ft].    (24)
This can be computed using the joint distribution of a Brownian motion and its running maximum
derived above. In order to see the link to the above boundary value problem, let
τB = inf{t ≤ T : St = B}   (τB := ∞ if max_{0≤t≤T} St < B).
On {0 ≤ t ≤ τB } the option has not been knocked out and (Vt∧τB )0≤t≤T is a martingale because τB
is a stopping time. Moreover, from (24) we obtain the following result.
Lemma 3.35 The process (Vt ) is a Markov process up to τB , i.e.,
Vt = v(t, St)   on {0 ≤ t ≤ τB}.
Due to the smoothing properties of the heat kernel, v satisfies the assumptions of the generalized
Itô formula (v(t, St ) is the option value under the assumption that the option has not been knocked
out). We can therefore compute the differential and invoke the martingale property on {0 ≤ t ≤ τB }
to deduce that
vt(t, x) + (1/2) x² v_{xx}(t, x) = 0   on [0, T] × [0, B].
This holds because (t, St ) can reach any point in [0, T ] × [0, B] before the option knocks out. We refer
to Chapter 18 of the book by Björk (2004) for specific formulae for barrier options.
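Absent the closed-form expressions from Björk (2004), the up-and-out call can be approximated by crude Monte Carlo under risk-neutral dynamics. The sketch below (function name and parameters are ours) discretizes the path and knocks the payoff out once the grid crosses B; it slightly overestimates survival, since crossings between grid points are missed:

```python
import random
from math import exp, sqrt

def up_and_out_call_mc(S0, K, B, r, sigma, T, n_steps=200, n_paths=20000, seed=1):
    # Crude Monte Carlo under risk-neutral dynamics dS = r S dt + sigma S dW;
    # the payoff is knocked out once the discretized path reaches B.
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        S, alive = S0, True
        for _ in range(n_steps):
            S *= exp((r - 0.5 * sigma**2) * dt + sigma * sqrt(dt) * rng.gauss(0, 1))
            if S >= B:
                alive = False
                break
        if alive:
            total += max(S - K, 0.0)
    return exp(-r * T) * total / n_paths
```

By construction the result is bounded above by the corresponding vanilla call price, since the knock-out can only remove payoff.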
4 Topics in diffusion theory
So far, we used only PDE methods to analyze pricing and hedging problems. In order to study
equivalent martingale measures and the risk-neutral approach to pricing and hedging in continuous
time, we need to introduce the change of measure formula. This requires some knowledge of stochastic
differential equations and stochastic exponentials.
Further Reading: Much of this section (with the exception of the proof of the Novikov condition)
is based on Chapters 5.1 and 8.6 of Øksendal (2003).
4.1 Stochastic differential equations
Let W be a Brownian motion defined on some probability space (Ω, F, P). An equation of the form
dXt = b(t, Xt) dt + σ(t, Xt) dWt    (25)
4
TOPICS IN DIFFUSION THEORY
36
is called a stochastic differential equation (SDE) with coefficients b(·, ·) and σ(·, ·). We already solved
the SDE
dXt = Xt (µdt + σdWt )
using Itô’s formula. Xt turned out to be a function of Wt and t, i.e., a strong solution in the sense of
the following definition.
Definition 4.1 The process X is called a strong solution to the SDE (25) if for all t ≥ 0 the following
holds:
(i) Xt is a function of t and the Brownian path up to time t, i.e., Xt is of the form
Xt = F (t, (Ws )0≤s≤t ) .
(ii) The integrals ∫_0^t b(s, Xs) ds and ∫_0^t σ(s, Xs) dWs exist.
(iii) The process satisfies the integral equation

Xt = x + ∫_0^t b(s, Xs) ds + ∫_0^t σ(s, Xs) dWs.
Example 4.2 Consider the SDE dXt = αt dWt for some deterministic function α ∈ C^1. Then

Xt = x + ∫_0^t αs dWs.
The integration by parts formula yields

Xt = x + αt Wt − ∫_0^t Ws α's ds
so X is a strong solution.
Example 4.3 An Ornstein-Uhlenbeck process takes the form
dXt = −αXt dt + σdWt
for some positive constants α and σ. Such a process fluctuates around zero (at least in the long run),
due to the minus sign in front of the drift term. To solve this SDE we consider the process
Yt = Xt eαt
so
dYt = eαt dXt + αeαt Xt dt.
Hence
dYt = σeαt dWt .
In view of the previous example this SDE has a strong solution and

Xt = e^{−αt} ( x + σ ∫_0^t e^{αs} dWs ).
4
TOPICS IN DIFFUSION THEORY
37
An Ornstein-Uhlenbeck process is a Gaussian process because Xt follows a normal distribution. Specifically, if we define γ := σ/√(2α), then the martingale property of stochastic integrals implies

Xt ∼ N( x e^{−αt}, γ² (1 − e^{−2αt}) ).
Replacing the constant x by a random variable X0 that is independent of the Brownian motion and satisfies X0 ∼ N(0, γ²) yields Xt ∼ N(0, γ²).
Thus, unlike Brownian motion, which asymptotically fluctuates between infinity and negative infinity (remember the law of the iterated logarithm), an Ornstein-Uhlenbeck process admits a stationary distribution.
The covariance function of the stationary process is given by
ρ(s, t) = γ² ( e^{−α|t−s|} − e^{−α(t+s)} ).
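The Gaussian transition law N(x e^{−αt}, γ²(1 − e^{−2αt})) makes the Ornstein-Uhlenbeck process easy to simulate exactly, with no discretization error. A small numerical check of the mean and variance formulas (function and parameter names are ours):

```python
import random
from math import exp, sqrt
from statistics import fmean, pvariance

def ou_exact_step(x, alpha, sigma, dt, rng):
    # One exact transition of dX = -alpha X dt + sigma dW:
    # X_{t+dt} | X_t = x  ~  N(x e^{-alpha dt}, gamma^2 (1 - e^{-2 alpha dt})),
    # where gamma^2 = sigma^2 / (2 alpha).
    mean = x * exp(-alpha * dt)
    var = sigma**2 / (2 * alpha) * (1 - exp(-2 * alpha * dt))
    return mean + sqrt(var) * rng.gauss(0, 1)

rng = random.Random(0)
alpha, sigma, t = 1.0, 0.5, 5.0
samples = [ou_exact_step(2.0, alpha, sigma, t, rng) for _ in range(50000)]
gamma2 = sigma**2 / (2 * alpha)
```

For large t the empirical mean decays toward 0 and the empirical variance approaches the stationary value γ².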
A strong solution does not always exist. An example is the famous Tanaka equation
dXt = sign(Xt) dWt,   where sign(x) = +1 if x ≥ 0 and sign(x) = −1 if x < 0.
To deal with such equations one works with a weaker solution concept, called weak solutions on which
we shall not elaborate in this course. Instead we move on to the following existence and uniqueness
result for SDEs. Its proof is similar to the standard Picard iteration in the theory of ordinary
differential equations, so we only give an outline.
Theorem 4.4 Let T > 0 and let b, σ : [0, T ] × R → R be measurable functions that satisfy the linear
growth condition
|b(t, x)| + |σ(t, x)| ≤ C(1 + |x|) (x ∈ R, t ∈ [0, T ])
along with the Lipschitz condition
|b(t, x) − b(t, y)| + |σ(t, x) − σ(t, y)| ≤ D|x − y|.
Let Z be a square integrable random variable that is independent of W . Then the SDE
dXt = b(t, Xt) dt + σ(t, Xt) dWt   on [0, T]
with initial condition X0 = Z has a unique (up to indistinguishability) continuous strong solution.
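Under the conditions of Theorem 4.4 the solution can be approximated by the Euler-Maruyama scheme, which replaces the integrals by increments over a time grid. A sketch (names are ours), tested on geometric Brownian motion, where E[XT] = x e^{µT} is known in closed form:

```python
import random
from math import sqrt, exp
from statistics import fmean

def euler_maruyama(x0, b, s, T, n_steps, rng):
    # Euler-Maruyama scheme for dX = b(t, X) dt + s(t, X) dW:
    # X_{k+1} = X_k + b dt + s sqrt(dt) * N(0, 1).
    dt = T / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        x += b(t, x) * dt + s(t, x) * sqrt(dt) * rng.gauss(0, 1)
        t += dt
    return x

# Geometric Brownian motion dX = mu X dt + sig X dW has E[X_T] = x0 e^{mu T}.
rng = random.Random(42)
mu, sig, T = 0.1, 0.2, 1.0
paths = [euler_maruyama(1.0, lambda t, x: mu * x, lambda t, x: sig * x, T, 100, rng)
         for _ in range(10000)]
```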
It turns out that strong solutions to stochastic differential equations have the Markov property. We
state this result without further proof.
Theorem 4.5 Let the coefficients of the SDE
dXt = b(t, Xt )dt + σ(t, Xt )dWt
satisfy the conditions of Theorem 4.4 and let (Ft ) be the canonical filtration:
Ft = σ(Xs : 0 ≤ s ≤ t).
It holds for all t1 ≤ t2 ≤ T that
P[Xt2 ≤ y | Ft1] = P[Xt2 ≤ y | Xt1]   P-a.s.
4.2 Solutions to linear SDEs
As with ordinary differential equations, stochastic differential equations can rarely be solved in closed
form. A closed form solution can be obtained for linear SDEs, i.e., for equations of the form
dXt = (αt + βt Xt) dt + (γt + δt Xt) dWt    (26)
where the coefficients are adapted stochastic processes and continuous in the time variable. The special
case where α ≡ 0 ≡ δ corresponds to a generalization of the Ornstein-Uhlenbeck process studied in
Example 4.3.
4.2.1 Stochastic Exponential SDEs
When αt = 0 and γt = 0 the linear SDE (26) simplifies to
dUt = βt Ut dt + δt Ut dWt.    (27)
This SDE is of the form
dUt = Ut dYt
where the Itô process (Yt ) is defined by
dYt = βt dt + δt dWt
with Y0 = 0.
The solution of the SDE (27) is called the stochastic exponential of Y , denoted
Ut = U0 E(Y )t .
It will be key to the famous Girsanov theorem analyzed in the next section. By direct calculation we
verify that
Ut = U0 exp( Yt − (1/2)⟨Y⟩t ).

Ut is always positive if U0 is. Since the bracket ⟨Y⟩ of Y is given by ⟨Y⟩t = ∫_0^t δs² ds, we obtain that

Ut = U0 exp( ∫_0^t (βs − (1/2) δs²) ds + ∫_0^t δs dWs ).    (28)
It is checked directly by Itô’s formula that this is indeed a solution to (27). Obviously, U is a strictly
positive local martingale for β = 0 (as a stochastic integral w.r.t. Brownian motion). Conversely, every
continuous strictly positive local martingale can be represented as a stochastic exponential.
Lemma 4.6 Let U be a strictly positive local martingale. Then there exists a local martingale L such
that U = E(L).
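Formula (28) can be sanity-checked by Monte Carlo: for β = 0 and constant δ the stochastic exponential is E(δW)T = exp(δ WT − ½δ²T), and its expectation should equal 1 (the martingale property of the stochastic exponential without drift). A sketch with parameters of our choosing:

```python
import random
from math import exp, sqrt
from statistics import fmean

def stochastic_exponential(delta, WT, T):
    # E(Y)_T for Y_t = delta * W_t (beta = 0): formula (28) reduces to
    # exp(delta * W_T - 0.5 * delta^2 * T).
    return exp(delta * WT - 0.5 * delta**2 * T)

rng = random.Random(7)
delta, T = 0.3, 2.0
# W_T ~ N(0, T), so sample it directly.
samples = [stochastic_exponential(delta, sqrt(T) * rng.gauss(0, 1), T)
           for _ in range(100000)]
```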
4.2.2 General linear SDEs
The case of general linear SDEs can be reduced to stochastic exponentials by a variation of constants
formula. More precisely, we look for solutions of the form
Xt = Ut Vt,   where   dVt = at dt + bt dWt
and (Ut ) satisfies (27). If we put U0 = 1 and V0 = X0 and take the differential of the product Ut Vt
we see that
bt Ut = γt and at Ut = αt − δt γt .
Using the representation (28) for Ut , this determines the functions at and bt . Thus V (t) is obtained
and X(t) is found to be
Xt = Ut ( X0 + ∫_0^t (αs − δs γs)/Us ds + ∫_0^t (γs/Us) dWs ).
Example 4.7 Let X satisfy the SDE
dXt = at Xt dt + dWt .
To solve this equation using our general formula we first need to solve for the U process, i.e., solve
dUt = at Ut dt,   U0 = 1.

We obtain Ut = exp( ∫_0^t as ds ), so

Xt = e^{∫_0^t as ds} ( X0 + ∫_0^t e^{−∫_0^s au du} dWs ).
Example 4.8 (Brownian bridge) A linear SDE of the form
dXt = ((b − Xt)/(T − t)) dt + dWt
starting in X0 = a is called a Brownian bridge and is of major importance in many financial market
models of insider trading. With
αt = b/(T − t),   βt = −1/(T − t),   γt = 1   and   δt = 0
the solution is given by
Xt = a (1 − t/T) + b (t/T) + (T − t) ∫_0^t 1/(T − s) dWs.
We leave it as an exercise to prove that
lim_{t→T} (T − t) ∫_0^t 1/(T − s) dWs = 0
almost surely, so that XT = b. Thus we know not only the starting point but also the end point of the
process.
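The Brownian bridge SDE can be simulated with an Euler scheme, stopping one step short of T where the drift explodes; the simulated paths should then concentrate near the prescribed endpoint b. A sketch under these assumptions (names and parameters are ours):

```python
import random
from math import sqrt
from statistics import fmean

def brownian_bridge_path(a, b, T, n_steps, rng):
    # Euler scheme for dX = (b - X)/(T - t) dt + dW, stopping one step
    # short of T where the drift blows up.
    dt = T / n_steps
    x, t = a, 0.0
    for _ in range(n_steps - 1):
        x += (b - x) / (T - t) * dt + sqrt(dt) * rng.gauss(0, 1)
        t += dt
    return x  # value at time T - dt, which should be close to b

rng = random.Random(3)
ends = [brownian_bridge_path(0.0, 1.0, 1.0, 500, rng) for _ in range(2000)]
```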
4.3 Change of variables and Girsanov's theorem
We have seen that changes of measures play a key role in pricing and hedging financial derivatives in
discrete time models. In this section we discuss a change of measure formula (Girsanov’s theorem) that
is of fundamental importance in continuous time financial mathematics. Loosely speaking Girsanov’s
theorem states that if we change the drift coefficient of an Itô process the distribution of the process
will not change dramatically. In fact, the law of the new process is absolutely continuous with respect
to the law of the old process and we can explicitly calculate the density process. To make this more
precise we fix a filtered probability space (Ω, (Ft )0≤t≤T , P) carrying a standard Brownian motion W
and recall that when the measure Q on (Ω, F) is absolutely continuous with respect to P on F = FT ,
denoted Q ≪ P, then there exists a random variable

Z := dQ/dP
called the Radon-Nikodym derivative such that
Q(A) = ∫_A Z(ω) dP(ω)    (A ∈ FT).
Furthermore, the restriction Q|Ft of Q to Ft is absolutely continuous with respect to the restriction
P|Ft of P to Ft and the density process
Zt := d(Q|Ft)/d(P|Ft)
is a uniformly integrable martingale. As always, we take càdlàg versions of Z. The first result shows
how local martingales under P and Q are related.
Lemma 4.9 Consider the stopping time R := inf{t ≥ 0 | Zt = 0}. Then:

(i) R = +∞ a.s. under Q;

(ii) for a non-negative adapted process U and s < t we have

E^Q_s[Ut] = 1_{Zs ≠ 0} (1/Zs) E^P_s[Ut Zt];

(iii) for any F-adapted process Y , if Y Z is a local martingale under P up to time R, then Y is a Q-local
martingale.
Let us now state a first version of Girsanov’s theorem. The proof uses Lévy’s characterization of
Brownian motion stated in Theorem 1.8.
Theorem 4.10 (Girsanov Theorem I) Let (αt ) be an adapted process and Y be the solution to the
SDE
dYt = αt dt + dWt    (t ∈ [0, T]).

Let the process M be the stochastic exponential of −∫_0^· αs dWs, i.e.,

Mt = exp( −∫_0^t αs dWs − (1/2) ∫_0^t αs² ds ).
If M is a martingale with respect to (Ft ) and P, then Y is a standard Brownian motion with respect
to the measure Q ∼ P defined by
dQ = MT dP.
We notice that stochastic exponentials are positive continuous local martingales and hence supermartingales but not martingales in general. A sufficient condition to guarantee that (Mt ) is a martingale is the Novikov condition.
Theorem 4.11 (Novikov Condition) The condition
E[ exp( (1/2) ∫_0^T αs² ds ) ] < ∞
implies that M is a martingale. In particular, M is a martingale whenever the drift process of Y is
bounded.
The proof uses a clever application of the Hölder inequality. The following lemma is key.
Lemma 4.12 Let p, q ∈ (1, ∞) with 1/p + 1/q = 1. If

sup { E[ exp( (√p / (2(√p − 1))) Mτ ) ] : τ stopping time, τ ≤ c } < ∞   for all c > 0,

then E(M) is an L^q-martingale.
We are now ready to prove the Novikov condition.
Example 4.13 Suppose that (Yt) is a Brownian motion on [0, T] with drift µ ∈ R, i.e., that

dYt = µ dt + dWt    (t ∈ [0, T]).

Then (Yt) is a Brownian motion with respect to the measure Q where

dQ = e^{−µ WT − (1/2) µ² T} dP.
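Example 4.13 can be verified numerically: weighting samples of YT = µT + WT by the density MT = exp(−µWT − ½µ²T) should reproduce moments of a standard Brownian motion, e.g. E^Q[YT²] = T. A sketch with parameters of our choosing:

```python
import random
from math import exp, sqrt
from statistics import fmean

# Under P, Y_T = mu*T + W_T. Girsanov: weighting by
# M_T = exp(-mu * W_T - mu^2 * T / 2) makes Y a standard Q-Brownian motion,
# so the weighted mean of Y_T^2 should be E_Q[Y_T^2] = T.
rng = random.Random(11)
mu, T, n = 0.8, 1.0, 200000
weighted = []
for _ in range(n):
    WT = sqrt(T) * rng.gauss(0, 1)
    YT = mu * T + WT
    MT = exp(-mu * WT - 0.5 * mu**2 * T)
    weighted.append(MT * YT**2)
```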
Similar arguments as in the proof of Theorem 4.10 yield the following generalization.
Theorem 4.14 (Girsanov Theorem II) Let Yt ∈ Rn be an Itô process of the form
dYt = βt dt + θt dWt
where (Wt ) is an m-dimensional Brownian motion and βt ∈ Rn and θt ∈ Rn,m are adapted stochastic
processes. Suppose there exist processes ut ∈ W^m and αt ∈ W^n such that
θt(ω) ut(ω) = βt(ω) − αt(ω).    (29)

Put

Mt := exp( −∫_0^t us dWs − (1/2) ∫_0^t |us|² ds )    (0 ≤ t ≤ T)

and

dQ = MT dP.
Assume that M is a P-martingale. Then Q is a probability measure and
Ŵt = ∫_0^t us ds + Wt    (0 ≤ t ≤ T)

is a Q-Brownian motion. In terms of Ŵ the process Y has the stochastic integral representation
dYt = αt dt + θt dŴt .
We note again that the Novikov condition
E^P[ exp( (1/2) ∫_0^T |us|² ds ) ] < ∞
is sufficient to guarantee that (Mt ) is a martingale. We also point out that when n = m and θ is
invertible, then the process (ut ) satisfying (29) is uniquely given by
ut (ω) = θt−1 (ω)[βt (ω) − αt (ω)].
In applications in finance one usually chooses α to be equal to zero. In this case Y has the representation
dYt = θt dŴt
under Q. In particular, Y is a Q-martingale and Q is called an equivalent martingale measure for Y .
Example 4.15 Suppose that (Yt) is a Brownian motion on [0, T] with drift µ ∈ R and volatility
σ > 0, i.e., that

dYt = µ dt + σ dWt    (t ∈ [0, T]).

Then Yt = σ Ŵt for a Q-Brownian motion Ŵ, where the measure Q is given by

dQ/dP = e^{−(µ/σ) WT − (1/2)(µ/σ)² T}.
Remark 4.16 The assumption that θ is invertible has an economic interpretation. If Y is a vector
of asset prices and θ is not invertible, then one of the assets can be represented as a combination of
the others including the risk free bond. Hence that asset is redundant. As a result, we have more
Brownian motions, i.e., sources of uncertainty than independent assets and the market is incomplete.
Let us return to the Black-Scholes option pricing model where asset prices follow the geometric
Brownian motion
dSt = St (µ dt + σ dWt)    (0 ≤ t ≤ T)
defined on some probability space (Ω, F, P). Let r > 0 be the risk-free interest rate, denote by (S̃t )
the process of discounted stock prices,
S̃t = e−rt St ,
and introduce a probability measure P∗ ∼ P with density
MT := exp( −∫_0^T ((µ − r)/σ) dWs − (1/2) ∫_0^T ((µ − r)/σ)² ds ).
5 RISK NEUTRAL PRICING 43
Under the equivalent martingale measure P∗ ∼ P the process of discounted asset prices has the
representation
dS̃t = σ S̃t dŴt .
Thus, (S̃t ) is a P∗ -martingale. Hence the Itô representation stated in Theorem 3.23 asserts that any
contingent claim F ∈ L2 (P∗ , FT ) can be represented as a stochastic integral with respect to (S̃t ):
F = E*[F] + ∫_0^T ψu dS̃u.

In the following section we will identify (ψs) as a trading strategy that replicates F and E*[F] as the
arbitrage-free price of F . We also generalize the PDE characterization of option prices beyond the
framework of the basic Black-Scholes model.
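The representation of the price as a P*-expectation suggests a simple numerical check: simulating ST under the risk-neutral drift r and discounting the average payoff should recover the Black-Scholes price. A sketch (sample sizes and seeds are ours):

```python
import random
from math import log, sqrt, exp, erf
from statistics import fmean

def bs_call(S, K, r, sigma, T):
    N = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return S * N(d1) - K * exp(-r * T) * N(d1 - sigma * sqrt(T))

# Under P*, S_T = S_0 exp((r - sigma^2/2) T + sigma W_T): the drift mu has
# been replaced by r, and the discounted expected payoff gives the price.
rng = random.Random(5)
S0, K, r, sigma, T, n = 100.0, 105.0, 0.03, 0.2, 1.0, 200000
payoffs = [max(S0 * exp((r - 0.5 * sigma**2) * T + sigma * sqrt(T) * rng.gauss(0, 1)) - K, 0.0)
           for _ in range(n)]
mc_price = exp(-r * T) * fmean(payoffs)
```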
5 Risk neutral pricing
We are now ready to address the problem of pricing derivative securities in models with several assets
whose price processes follow Itô processes. Throughout we work on a stochastic basis (Ω, F, (Ft ), P)
where (Ft ) is the filtration generated by an n-dimensional Brownian motion W augmented by the
null-sets.
5.1 The market model
The financial market consists of d ≤ n risky assets with price processes S^1, ..., S^d and one risk-free
asset with price process S^0. For simplicity we assume that the risk-free rate is r = 0 (this can be achieved
by passing to discounted terms) so St^0 ≡ 1. The price processes of the risky assets follow the dynamics
dSt^i = St^i dRt^i    (30)
where the returns process R = (Ri )di=1 is an Itô process given by
dRt = γt dt + σt dWt
with adapted processes γ and σ taking values in Rd and Rd×n , respectively. We assume that the
model parameters satisfy the following condition.
Assumption 5.1 The processes γ and σ satisfy P[ ∫_0^T |γs| ds < ∞ ] = 1 and σ ∈ V, respectively.
Furthermore, there exists an adapted and (t, ω)-measurable process ξ that satisfies the Novikov condition
such that

σt(ω) ξt(ω) = γt(ω)   P ⊗ λ-a.s.    (31)
In view of the preceding assumption, we can rewrite the returns process as
dRt = σt (ξt dt + dWt ) := σt dŴt .
Example 5.2 Condition (31) is satisfied if the volatility matrix σ is of full rank, i.e. det(σt σt*) ≠ 0
dP ⊗ dt-a.s., where A* denotes the transpose of the matrix A. In this case

ξ := σ*(σ σ*)^{−1} γ.    (32)
We parameterize strategies by the amounts of money invested in each risky asset. Together with
an initial capital this will define a value process by the self-financing requirement.
Definition 5.3 (Self-financing Trading Strategies) A self-financing trading strategy ϕ is an adapted
process ϕ = (ϕ^i)_{i=1}^d on [0, T] such that

∫_0^T |ϕt* σt|² dt < ∞   a.s.    (33)
The wealth process V = V^ϕ associated with a self-financing strategy is defined by
dVt = ϕ∗t dRt .
The model is said to allow arbitrage if there exists a self-financing strategy ϕ such that the
associated value process satisfies
V0^ϕ = 0   and   P[VT^ϕ ≥ 0] = 1 with P[VT^ϕ > 0] > 0.
Without further assumptions on the trading strategies it will be impossible to exclude arbitrage.
We therefore introduce the notion of admissible trading strategies.
Definition 5.4 A self-financing trading strategy ϕ is called admissible if the following conditions hold:
(i) The associated value process V ϕ is bounded from below, that is V ϕ ≥ −C for some C ∈ R
(“bounded credit limit”).
(ii) The process ξ is uniformly bounded (in t and ω) and ϕ satisfies the integrability condition

E[ ∫_0^T |ϕt* σt|² dt ] < ∞.    (34)
From now on, we consider only admissible strategies.
5.2 Equivalent martingale measures and absence of arbitrage
Just like in discrete time models there is a close link between the absence of arbitrage and the existence
of (local) equivalent martingale measures.
Definition 5.5 We say that P∗ is an equivalent (local) martingale measure if P∗ and P are equivalent
and (discounted) asset prices are (local) martingales under P∗ .
The link between the existence of an equivalent martingale measure and the absence of arbitrage
(or, more generally, the condition of "no free lunch with vanishing risk") can be stated in great generality. We settle for the following result, whose proof needs the Burkholder-Davis-Gundy inequality.
Lemma 5.6 (Burkholder-Davis-Gundy Inequality) For a continuous local martingale M starting
at 0 and for any p ∈ (0, ∞), there exist constants cp, Cp > 0 such that

cp E[ ⟨M⟩_∞^{p/2} ] ≤ E[ sup_{t≥0} |Mt|^p ] ≤ Cp E[ ⟨M⟩_∞^{p/2} ].
Theorem 5.7 Under Assumption 5.1 the following holds:
(i) There exists a local equivalent martingale measure.
(ii) The model is free of arbitrage.
One can also prove a partial converse of the above result.
Proposition 5.8 Suppose that the model is free of arbitrage. Then there exists an Ft-adapted, (t, ω)-measurable process ξ such that

σt(ω) ξt(ω) = γt(ω)   P ⊗ λ-a.s.
We say that the market model is complete if any contingent claim H ∈ L2 (P) is attainable, i.e., if
there exists an admissible trading strategy ϕ and a real number z such that
H = z + ∫_0^T ϕu dSu.
Proposition 5.9 The market is complete if and only if n = d and the volatility matrix σ has a
left-inverse σ ∗ , i.e.,
σt∗ σt = I a.s.
In this case, there exists a unique replicating strategy for any claim H ∈ L2 .
5.2.1
Characterization of equivalent martingale measures
In general, the above market model is incomplete and so there exist many EMMs. However, it turns
out that the EMMs can be conveniently characterized. To this end, it will be useful to re-parameterize
strategies. The value process associated with ϕ satisfies
dVtϕ = ϕ∗t σt dŴt = (σt∗ ϕt )∗ dŴt
so the range Ct = Im σt* of the matrix σt* is important for the set of attainable payoffs. In the sequel
we assume that the matrix-valued process σt* σt is a.s. invertible. As a result, we obtain that ξ lies in
the range of σ*.
Indeed, the equation

σ* z = ξ

is solved by

z = σ(σ* σ)^{−1} ξ

(we omit the time variable). Furthermore, for all y ∈ Ker σ one has that

⟨σ* z, y⟩ = ⟨z, σ y⟩ = 0    (35)
so Im σt∗ ⊥ Kerσt . Let us now recall that one class of EMMs has already been constructed. It was
defined in terms of the density
ZT := E( −∫_0^· ξt dWt )_T
where ξ satisfies Assumption 5.1. In order to construct further martingale measures, let Q ∼ P, with
density process

Z = E( ∫_0^· λt dWt )_T

for an adapted process λ such that ∫_0^T |λt|² dt < ∞, and introduce the Q-Brownian motion

W^Q = W − ∫_0^· λt dt.

Since

σt dŴt = σt(ξt dt + dWt) = σt( (ξt + λt) dt + dWt^Q )
we see that Q is an EMM if and only if σ(ξ + λ) = 0 (dP ⊗ dt a.e.). That is,
λt = λt^Q = −ξt + ηt,   with   ηt = ηt^Q ∈ Ker σt ≡ Ct^⊥.
Furthermore, since ξt ∈ Ct one has η · ξ = η ∗ ξ = 0 (dP ⊗ dt a.e.), due to (35) and hence any such
EMM Q must have a density process of the form
Zt^Q = dQ/dP |_{Ft} = E( ∫_0^· λs^Q dWs )_t = E( −∫_0^· ξs dWs )_t · E( ∫_0^· ηs dWs )_t.    (36)
Thus, we have shown the following result.
Proposition 5.10 (Characterization of the EMM) (i) Any EMM Q has a density process of
the form (36) with λ = −ξ + η where −ξt = Π_{Ct}(λt), ηt = Π_{Ct^⊥}(λt) satisfying

∫_0^T |λt|² dt = ∫_0^T |ξt|² dt + ∫_0^T |ηt|² dt < ∞.

In particular, η · ξ ≡ 0, and λ, η and ξ are unique dP ⊗ dt a.e.

(ii) In turn, any predictable λ with Π_{Ct}(λt) = −ξt (dP ⊗ dt a.e.) and ∫_0^T |λt|² dt < ∞ and such that
Z := E( ∫_0^· λt dWt ) is a martingale, defines an equivalent local martingale measure Q via (36).
5.2.2 The range of option prices
We know from discrete time finance that the set of arbitrage free option prices is essentially identical
to the set of expected option payoffs under the equivalent martingale measures. A similar result holds
within the Black-Scholes framework. In order to see this, let us consider a European option on a
contingent claim H. The maximal price the buyer is willing to pay is
p(H) := sup {y : there exists an admissible ϕ with V0ϕ = 0 s.t. − y + VTϕ ≥ −H a.s.}
On the other hand, the minimal price the seller is willing to accept is
q(H) := inf {z : there exists an admissible ϕ with V0ϕ = 0 s.t. z + VTϕ ≥ H a.s.}
If p(H) = q(H), then this common value is the option price. In general, we have the following result.
Proposition 5.11 Let Q be an equivalent martingale measure such that

dQ/dP = exp( −∫_0^T ξs dWs − (1/2) ∫_0^T |ξs|² ds )

where ξ satisfies the Novikov condition and σt ξt = γt. Then

ess inf H ≤ p(H) ≤ E^Q[H] ≤ q(H) ≤ ∞.
5.3 Option Pricing and PDE
We have seen that the value of a European option in the geometric Brownian motion model can be
calculated using partial differential equations. In this section we extend this approach to more general
diffusion models. Specifically, we assume that asset prices follow the diffusion
dSt^x = b(t, St^x) dt + σ(t, St^x) dWt   with S0^x = x,    (37)
where W is a d-dimensional Brownian motion and the drift and diffusion coefficients satisfy the
assumptions of Theorem 4.4. We assume that the model is complete and denote by P∗ the unique
equivalent martingale measure.
The infinitesimal generator at time t of (St) is the differential operator At that acts on all sufficiently smooth functions u : R → R according to
(At u)(x) = (1/2) σ²(t, x) ∂²u/∂x² + b(t, x) ∂u/∂x.
For the time-homogeneous case σ(t, x) ≡ σ(x) and b(t, x) ≡ b(x) this reduces to

(Au)(x) = (d/dt) E[u(St^x)] |_{t=0}.
The operator ∂/∂t + At is called the Dynkin operator. It arises naturally when solving stochastic
+ At is called the Dynkin operator. It arises naturally when solving stochastic
differential equations. In fact, we have the following result whose proof is an immediate consequence
of Itô’s formula.
Theorem 5.12 Let u : [0, T ] × R → R be twice continuously differentiable with respect to the state
variable and continuously differentiable with respect to the time variable. If u has bounded derivatives
with respect to x, then the process (Mt ) defined by
Mt = u(t, St^x) − ∫_0^t ( ∂u/∂t + Av u )(v, Sv^x) dv
is a martingale.
In order to deal with discounted quantities we also state a slightly more general result in the
following proposition.
Proposition 5.13 Under the assumptions of Theorem 5.12 and if r(t, x) is a bounded continuous
function the following process is a martingale:
Mt = e^{−∫_0^t r(s, Ss^x) ds} u(t, St^x) − ∫_0^t e^{−∫_0^v r(s, Ss^x) ds} ( ∂u/∂t + Av u − r u )(v, Sv^x) dv.
6 STOCHASTIC OPTIMAL CONTROL 48
As in the benchmark geometric Brownian motion model the option value is given by
Vt = E*[ e^{−∫_t^T r(u, Su^x) du} f(ST^x) | Ft ]

and we can prove that

Vt = F(t, St^x)

where the function F is defined by

F(t, x) = E*[ e^{−∫_t^T r(u, Su^{t,x}) du} f(ST^{t,x}) ]
when we denote by (Sut,x ) the unique strong solution of (37) that starts from x at time t. The following
result characterizes the function F as a solution of a partial differential equation.
Theorem 5.14 Let u be as in Theorem 5.12. If u satisfies the boundary value problem

∂u/∂t + At u = 0   for all (t, x) ∈ [0, T] × R    (38)

and

u(T, x) = f(x)   for all x ∈ R,    (39)

then

u(t, x) = F(t, x).
The previous result reduces the problem of pricing a European contingent claim to solving a terminal
value problem. For this terminal value problem to have a unique classical solution one needs to impose
some regularity assumptions on the volatility coefficients, and the operator At often needs to be elliptic.
6 Stochastic Optimal Control
In a complete financial market model each contingent claim under some conditions can be replicated
by continuously trading in the stock and bond market. In an incomplete market, a perfect hedge may
not be possible and a hedging error may occur. A hedging error also occurs in a complete market if
the writer of an option invests less than the fair value in hedging. The problem of quantile hedging
consists in finding the optimal strategy that maximizes the probability of a perfect hedge subject to
capital constraints. This is a typical example of a stochastic optimal control problem.
Example (“Quantile Hedging”) Consider a complete financial market model with a single risky
asset and unique equivalent martingale measure P∗ . The fair value of a contingent claim H ∈ L2 is
given by E∗ [H]. If the writer of the claim faces a capital constraint in the sense that he cannot (or
does not want to) invest more than v < E∗ [H] Euros for hedging, then a perfect hedge is not possible.
For a self-financing trading strategy ξ, let us thus denote by (Vtξ ) the associated value process with
initial investment V0ξ and introduce the set
S := {ξ : ξ self-financing, square integrable with V0^ξ ≤ v}
of trading strategies that require an initial investment of no more than v. The problem of quantile
hedging consists of
sup_{ξ∈S} P[VT^ξ ≥ H].
Further Reading: This section is based on Chapters 2 and 3 of Pham (2009).
6.1 Examples of stochastic optimization problems
In the sequel we present several examples of stochastic optimization problems arising in economics
and finance.
6.1.1 Portfolio allocation
We consider a financial market consisting of a riskless asset with strictly positive price process S 0
representing the savings account, and n risky assets of price process S representing stocks. An agent
may invest in this market at any time t, holding αt shares of the n risky assets. Denoting
by Xt her wealth at time t, the number of shares invested in the savings account at time t
is (Xt − αt · St )/St0 . The self-financed wealth process evolves according to
dXt = (Xt − αt · St) (dSt^0 / St^0) + αt · dSt.
The control is a process α valued in A, a subset of Rⁿ. The portfolio allocation problem is to choose
the best investment in these assets. A classical approach for describing the behavior and preferences of
agents and investors is the expected utility criterion. It relies on the theory of choice under uncertainty:
the agent compares random incomes for which he knows the probability distributions. Under some
conditions on the preferences, von Neumann and Morgenstern show that they can be represented
through the expectation of some function U , called utility. The random income X is preferred to a
random income X′ if E[U (X)] ≥ E[U (X′)]. The utility function U is nondecreasing and concave, this
last feature reflecting the risk aversion of the agent.
In this portfolio allocation context, the criterion consists of maximizing the expected utility from
terminal wealth on a finite horizon T < ∞:
sup_α E[U(XT)]
A class of possible utility functions is

U(x) = (x^p − 1)/p  for x ≥ 0,   U(x) = −∞  for x < 0,
with 0 < p < 1, the limiting case p = 0 corresponding to a logarithmic utility function: U(x) = ln x,
x > 0. These popular utility functions are called CRRA (Constant Relative Risk Aversion) since the
relative risk aversion, defined by η = −x U″(x)/U′(x), is constant in this case and equal to 1 − p.
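The constancy of the relative risk aversion can be checked numerically: for the CRRA utility above, a finite-difference estimate of η(x) = −x U″(x)/U′(x) should return 1 − p at every x > 0. A minimal sketch (function names are ours):

```python
def crra_utility(x, p):
    # U(x) = (x^p - 1)/p for x > 0 and 0 < p < 1; the p -> 0 limit is log x.
    return (x**p - 1) / p

def relative_risk_aversion(U, x, h=1e-4):
    # eta(x) = -x U''(x) / U'(x), estimated with central differences.
    u1 = (U(x + h) - U(x - h)) / (2 * h)
    u2 = (U(x + h) - 2 * U(x) + U(x - h)) / h**2
    return -x * u2 / u1
```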
6.1.2 Production-consumption model
We consider the following model for a production unit. The capital value Kt at time t evolves according
to the investment rate It in capital and the price St per unit of capital by
dKt = Kt (dSt / St) + It dt.
The debt Lt of this production unit evolves in terms of the interest rate r, the consumption rate Ct
and the productivity rate Pt of capital:
dLt = rLt dt − (Kt /St ) dPt + (It + Ct ) dt.
We choose a dynamics model for (St , Pt ):
dSt = µSt dt + σ1 St dWt1 ,
dPt = bdt + σ2 dWt2
where (W 1 , W 2 ) is a two-dimensional Brownian motion on a filtered probability space (Ω, F, F =
(Ft ), P) and µ, b, σ1 , σ2 are constants, σ1 , σ2 > 0. The net value of this production unit is
X t = Kt − L t
We impose the constraints
Kt ≥ 0, Ct ≥ 0, Xt > 0, t ≥ 0
We denote by kt = Kt /Xt and ct = Ct /Xt the control variables for investment and consumption. The
dynamics of the controlled system is then governed by:
dXt = Xt [ kt (µ − r + b/St ) + (r − ct ) ] dt + kt Xt σ1 dWt1 + kt (Xt /St ) σ2 dWt2 ,
dSt = µSt dt + σ1 St dWt1 .
Given a discount factor β > 0 and a utility function U , the objective is to determine the optimal
investment and consumption for the production unit:
sup_{(k,c)} E[ ∫_0^∞ e^{−βt} U (ct Xt ) dt ].

6.1.3 Irreversible investment model
We consider a firm producing goods (electricity, oil, etc.). The firm may increase its production
capacity by transferring capital from one activity sector to another. The controlled dynamics of
its production capacity then evolves according to
dXt = Xt (−δdt + σdWt ) + αt dt.
Here, δ ≥ 0 is the depreciation rate of the production capacity, σ > 0 is the volatility, and αt dt is the number
of capital units obtained by the firm at cost λαt dt, where λ > 0 is interpreted as a conversion factor from
one activity sector to another. The control α is valued in R+ . This is an irreversible model for the
capital expansion of the firm. The profit function of the company is an increasing, concave function
Π from R+ into R, and the optimization problem for the firm is
sup_α E[ ∫_0^∞ e^{−βt} (Π(Xt ) − λαt ) dt ].

6.2 Controlled diffusion processes
We consider a control model where the state of the system is governed by a stochastic differential
equation (SDE) valued in Rn :
dXs = b(Xs , αs )ds + σ(Xs , αs )dWs ,
(40)
where W is a d-dimensional Brownian motion on a filtered probability space (Ω, F, (Ft ), P) satisfying
the usual conditions.
Assumption 6.1 We assume that the following conditions are satisfied.
(i) The control α = (αs ) is a progressively measurable process taking values in some set A ⊂ Rm .
(ii) The measurable functions b : Rn × A → Rn and σ : Rn × A → Rn×d satisfy a Lipschitz
condition in x, uniformly in a:
|b(x, a) − b(y, a)| + |σ(x, a) − σ(y, a)| ≤ K|x − y| for all x, y ∈ Rn , a ∈ A. (41)
In the sequel, for 0 ≤ t ≤ T < ∞, we denote by Tt,T the set of stopping times valued in [t, T ] and
by A the set of control processes α such that
E[ ∫_0^T |b(0, αt )|2 + |σ(0, αt )|2 dt ] < ∞. (42)
The conditions (41) and (42) ensure for all α ∈ A and for any initial condition (t, x) ∈ [0, T ] × Rn ,
the existence and uniqueness of a strong solution to the SDE (with random coefficients) (40) starting
from x at s = t. We denote by {Xst,x , s ∈ [t, T ]} this solution with a.s. continuous paths. Under the
above conditions on b, σ and α we also have:
E[ sup_{s∈[t,T ]} |Xst,x |2 ] < ∞ (43)
lim_{h↓0+} E[ sup_{s∈[t,t+h]} |Xst,x − x|2 ] = 0 (44)
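The strong solution of (40) can be approximated by the Euler-Maruyama scheme. The sketch below uses hypothetical coefficients b(x, a) = a − x and σ(x, a) = 0.2(1 + a), chosen only so that the Lipschitz condition (41) holds, together with a constant control, for which X is an Ornstein-Uhlenbeck-type process with a known mean:

```python
import numpy as np

# Hypothetical coefficients satisfying the Lipschitz condition (41); not from the notes.
def b(x, a):
    return a - x

def sigma(x, a):
    return 0.2 * (1 + a)

def euler_maruyama(x0, control, t, T, n_steps, n_paths, seed=0):
    """Simulate X_s^{t,x} on [t, T] under a Markovian control a(s, x)."""
    rng = np.random.default_rng(seed)
    dt = (T - t) / n_steps
    X = np.full(n_paths, float(x0))
    s = t
    for _ in range(n_steps):
        a = control(s, X)
        dW = rng.normal(scale=np.sqrt(dt), size=n_paths)
        X = X + b(X, a) * dt + sigma(X, a) * dW
        s += dt
    return X

# Constant control a = 0.5: X is then an Ornstein-Uhlenbeck-type process with
# E[X_1^{0,1}] = 0.5 + 0.5 * exp(-1).
XT = euler_maruyama(x0=1.0, control=lambda s, x: 0.5, t=0.0, T=1.0,
                    n_steps=200, n_paths=50_000)
assert np.isfinite((XT**2).mean())   # the second moment is finite, as in (43)
assert abs(XT.mean() - (0.5 + 0.5 * np.exp(-1.0))) < 0.01
```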
Now, let f : [0, T ] × Rn × A → R and g : Rn → R be two measurable functions describing running
and terminal revenues, respectively. We suppose that
(i) g is lower-bounded, or
(ii) g satisfies a quadratic growth condition: |g(x)| ≤ C(1 + |x|2 ) for all x ∈ Rn ,
for some constant C independent of x. (45)
For (t, x) ∈ [0, T ] × Rn , we denote by A(t, x) the subset of controls α of A such that
E[ ∫_t^T |f (s, Xst,x , αs )| ds ] < ∞, (46)
and we assume that A(t, x) is not empty for all (t, x) ∈ [0, T ] × Rn . We can then define under (45)
the gain function:
J(t, x, α) := E[ ∫_t^T f (s, Xst,x , αs ) ds + g(XTt,x ) ],
for all (t, x) ∈ [0, T ] × Rn and α ∈ A(t, x). The objective is to maximize the function J over control
processes. To this end, we introduce the associated value function:
v(t, x) := sup_{α∈A(t,x)} J(t, x, α).
Definition 6.2
(i) We say that α̂ ∈ A(t, x) is an optimal control if J(t, x, α̂) = v(t, x).
(ii) A control α is called a Markovian control if it has the form αs = a(s, Xst,x ) for some measurable
function a from [0, T ] × Rn into A.
In the sequel, we shall implicitly assume that the value function v is measurable in its arguments.
This point is not trivial a priori, but can be proven using the so-called measurable selection theorem.
6.3 Dynamic programming principle
The dynamic programming principle (DPP) is a fundamental principle in the theory of stochastic
control. It states that the optimization problem can be split into two parts: an optimal control on the
whole time interval [t, T ] may be obtained by first searching for an optimal control from time θ given
the state value Xθt,x , i.e. computing v(θ, Xθt,x ), and then maximizing over controls on [t, θ] the quantity
E[ ∫_t^θ f (s, Xst,x , αs ) ds + v(θ, Xθt,x ) ].
Theorem 6.3 Let (t, x) ∈ [0, T ] × Rn . Then
v(t, x) = sup_{α∈A(t,x)} sup_{θ∈Tt,T } E[ ∫_t^θ f (s, Xst,x , αs ) ds + v(θ, Xθt,x ) ]
= sup_{α∈A(t,x)} inf_{θ∈Tt,T } E[ ∫_t^θ f (s, Xst,x , αs ) ds + v(θ, Xθt,x ) ] (47)

6.4 Hamilton-Jacobi-Bellman equation
The Hamilton-Jacobi-Bellman equation (HJB) is the infinitesimal version of the dynamic programming
principle: it describes the local behavior of the value function when we send the stopping time θ ∈ Tt,T
to t.
6.4.1 Formal derivation of HJB
Let us consider the time θ = t + h with h > 0 and a control α ∈ A(t, x). According to DPP:
v(t, x) ≥ E[ ∫_t^{t+h} f (s, Xst,x , αs ) ds + v(t + h, X_{t+h}^{t,x} ) ] (48)
By assuming that v is smooth enough, we may apply Itô’s formula:
v(t + h, X_{t+h}^{t,x} ) = v(t, x) + ∫_t^{t+h} ( ∂v/∂t + L^{αs} v )(s, Xst,x ) ds + (local) martingale,
where the operator L^a (for a ∈ A) is defined by
L^a v := b(x, a) · Dx v + (1/2) tr(σ(x, a)σ ′ (x, a)Dx2 v).
Substituting into (48) we get
E[ ∫_t^{t+h} ( ( ∂v/∂t + L^{αs} v )(s, Xst,x ) + f (s, Xst,x , αs ) ) ds ] ≤ 0.
Dividing by h, letting h → 0, and choosing the constant control αs ≡ a, we obtain
∂v/∂t (t, x) + L^a v(t, x) + f (t, x, a) ≤ 0
for all a ∈ A. Choosing instead an optimal control α∗ ∈ A(t, x) s.t.
v(t, x) = E[ ∫_t^{t+h} f (s, Xst,x , αs∗ ) ds + v(t + h, X_{t+h}^{t,x} ) ],
we would have obtained
∂v/∂t (t, x) + L^{αt∗} v(t, x) + f (t, x, αt∗ ) = 0
after h → 0. This suggests that v should satisfy
∂v/∂t (t, x) + sup_{a∈A} [L^a v(t, x) + f (t, x, a)] = 0 for all (t, x) ∈ [0, T ) × Rn .
We often rewrite this PDE in the form
− ∂v/∂t (t, x) − H(t, x, Dx v(t, x), Dx2 v(t, x)) = 0 for all (t, x) ∈ [0, T ) × Rn , (49)
where for (t, x, p, M ) ∈ [0, T ] × Rn × Rn × Sn :
H(t, x, p, M ) = sup_{a∈A} [ b(x, a) · p + (1/2) tr(σ(x, a)σ ′ (x, a)M ) + f (t, x, a) ].
This function is called the Hamiltonian of the associated control problem. The equation (49) is called
the dynamic programming equation or the Hamilton-Jacobi-Bellman equation. The terminal condition
associated to this PDE is
v(T, x) := g(x) for all x ∈ Rn ,
which results from the very definition of the value function v considered at the horizon date T .
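As a small numerical aside (a sketch with made-up one-dimensional coefficients, not taken from the notes), the supremum defining the Hamiltonian can be evaluated by brute force over a discretized control set and compared with the interior maximizer of the resulting concave quadratic:

```python
import numpy as np

# One-dimensional sketch with illustrative coefficients b(x, a) = a, sigma(x, a) = a,
# f = 0 and A = [0, 2]: the bracket is a*p + 0.5*a**2*M, concave whenever M < 0.
def hamiltonian(p, M, A_grid):
    return (A_grid * p + 0.5 * A_grid**2 * M).max()

A_grid = np.linspace(0.0, 2.0, 100_001)
p, M = 1.0, -2.0
H_num = hamiltonian(p, M, A_grid)
# Interior maximiser a* = -p/M = 0.5, so H = -p**2 / (2*M) = 0.25.
assert abs(H_num - 0.25) < 1e-8
```

For multidimensional control sets this brute-force search becomes expensive, but it remains a useful first check against a closed-form maximizer.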
6.4.2 Verification theorem
The crucial step in the classical approach to dynamic programming consists in proving that, given a
smooth solution to the HJB equation, this candidate coincides with the value function. This result is
called the verification theorem, and it allows us to exhibit as a byproduct an optimal Markovian control.
The proof relies essentially on Itô’s formula.
Theorem 6.4 Let w ∈ C 1,2 ([0, T ) × Rn ) ∩ C 0 ([0, T ] × Rn ) satisfy the quadratic growth condition
|w(t, x)| ≤ C(1 + |x|2 ) for all (t, x) ∈ [0, T ] × Rn .
(i) Suppose that
− ∂w/∂t (t, x) − sup_{a∈A} [L^a w(t, x) + f (t, x, a)] ≥ 0 for all (t, x) ∈ [0, T ) × Rn , (50)
w(T, x) ≥ g(x) for all x ∈ Rn . (51)
Then w ≥ v on [0, T ] × Rn .
(ii) Suppose in addition that w(T, ·) = g, and there exists a measurable function α̂(·, ·) valued in
A such that
− ∂w/∂t (t, x) − sup_{a∈A} {L^a w(t, x) + f (t, x, a)}
= − ∂w/∂t (t, x) − L^{α̂(t,x)} w(t, x) − f (t, x, α̂(t, x)) = 0 for all (t, x) ∈ [0, T ) × Rn , (52)
and s.t. the SDE
dXs = b(Xs , α̂(s, Xs ))ds + σ(Xs , α̂(s, Xs ))dWs
admits a unique solution, denoted by X̂st,x , given an initial condition Xt = x, and the process
{α̂(s, X̂st,x ), s ∈ [t, T ]} lies in A(t, x).
Then w = v on [0, T ] × Rn and α̂ is an optimal Markovian control.
6.5 Application: Portfolio optimization for power utilities

We consider again the example described in Section 6.1.1 in the framework of the Black-Scholes-Merton
model over a finite horizon T . An agent invests at any time t a portion αt of his wealth in a
stock of price S (governed by a geometric Brownian motion) and 1 − αt in a bond of price S 0 with
interest rate r. The investor faces the portfolio constraint that at any time t, αt is valued in A, a
compact convex subset of R. His wealth process evolves according to the SDE
dXt = (Xt (1 − αt )/St0 ) dSt0 + (Xt αt /St ) dSt
= Xt (αt µ + (1 − αt )r) dt + Xt αt σ dWt .
Given a portfolio strategy α ∈ A we denote by X t,x the corresponding wealth process starting from
an initial capital Xt = x at time t > 0. The agent wants to maximize the expected utility from
terminal wealth at time T . For an increasing concave utility function U the value function of the
utility maximization problem is then defined by
v(t, x) = sup_{α∈A} E[U (XTt,x )], (t, x) ∈ [0, T ] × R+ . (53)
The HJB equation for the stochastic control problem (53) is then
− ∂w/∂t − sup_{a∈A} [L^a w(t, x)] = 0, (54)
together with the terminal condition
w(T, x) = U (x), x ∈ R+ . (55)
Here, L^a w(t, x) = x(aµ + (1 − a)r) ∂w/∂x + (1/2) x2 a2 σ 2 ∂ 2 w/∂x2 . It turns out that for the particular case of power
utility functions of CRRA type, as considered originally by Merton,
U (x) = x^p /p, x ≥ 0, 0 < p < 1,
one can find explicitly a smooth solution to the above problem. We are looking for a candidate solution
in the form
w(t, x) = φ(t)U (x)
for some positive function φ. By substituting into (54)-(55), we derive that φ should satisfy the
ordinary differential equation
φ′ (t) + ρφ(t) = 0, φ(T ) = 1,
where
ρ = p sup_{a∈A} [ a(µ − r) + r − (1/2) a2 (1 − p)σ 2 ].
We obtain φ(t) = exp(ρ(T − t)). Hence, the function
w(t, x) = exp(ρ(T − t))U (x), (t, x) ∈ [0, T ] × R+ ,
is a smooth solution to (54)-(55). Furthermore, the function A ∋ a ↦ a(µ − r) + r − (1/2) a2 (1 − p)σ 2
is strictly concave on the closed convex set A, and thus attains its maximum at some constant â. By
construction, â attains supa∈A [La w(t, x)]. Moreover, the wealth process associated to the constant
control â
dXt = Xt (âµ + (1 − â)r)dt + Xt âσdWt
admits a unique solution given an initial condition. From the verification Theorem 6.4, this proves that
the value function to the utility maximization problem (53) is equal to w, and the optimal proportion
of wealth to invest in stock is constant, given by â. Finally notice that when A is large enough, the
values â and ρ are explicitly given by
â = (µ − r)/(σ 2 (1 − p)) and ρ = ((µ − r)2 /(2σ 2 )) · p/(1 − p) + rp.
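Under illustrative market parameters (the values of µ, r, σ and p below are arbitrary), a grid search over the bracket reproduces the closed-form â and ρ:

```python
import numpy as np

# Illustrative parameters (not from the notes).
mu, r, sigma, p = 0.08, 0.02, 0.2, 0.5

# Grid search over the bracket a(mu - r) + r - 0.5 a**2 (1 - p) sigma**2.
a_grid = np.linspace(-5.0, 5.0, 1_000_001)
objective = a_grid * (mu - r) + r - 0.5 * a_grid**2 * (1 - p) * sigma**2
a_num = a_grid[objective.argmax()]
rho_num = p * objective.max()

# Closed-form values, valid when A is large enough to contain a_hat.
a_hat = (mu - r) / (sigma**2 * (1 - p))                     # = 3.0
rho = (mu - r)**2 / (2 * sigma**2) * p / (1 - p) + r * p    # = 0.055

assert abs(a_num - a_hat) < 1e-4
assert abs(rho_num - rho) < 1e-8
```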
7 Backward Stochastic Equations and Optimal Control
This chapter is an introduction to the theory of BSDEs and its applications. The theory has become very
popular, and is an important field of research due to its connections with stochastic control, mathematical
finance and partial differential equations. BSDEs provide a probabilistic representation of nonlinear PDEs.
As a consequence, BSDEs can also be used to design numerical algorithms for nonlinear PDEs.
Further Reading: This section is based on Chapter 6 of Pham (2009).
7.1 Existence and uniqueness results
Let W = (Wt )t∈[0,T ] be a standard d-dimensional Brownian motion on a filtered probability space
(Ω, F, F, P), where F = (Ft )t∈[0,T ] is the natural filtration of W , and T is a fixed finite horizon.
We denote by S2 (0, T ) the set of real-valued progressively measurable processes Y such that
E[ sup_{t∈[0,T ]} |Yt |2 ] < ∞
and H2 (0, T )d the set of Rd -valued progressively measurable processes Z such that
E[ ∫_0^T |Zt |2 dt ] < ∞.
We are given a pair (ξ, f ), called the terminal condition and generator (or driver), satisfying:
• (A) ξ ∈ L2 (Ω, FT , P; R)
• (B) f : Ω × [0, T ] × R × Rd → R s.t.
– f (·, t, y, z) written for simplicity f (t, y, z), is progressively measurable for all y, z
– f (t, 0, 0) ∈ H2 (0, T )
– f satisfies a uniform Lipschitz condition in (y, z), i.e. there exists a constant Cf such that
|f (t, y1 , z1 ) − f (t, y2 , z2 )| ≤ Cf (|y1 − y2 | + |z1 − z2 |) for all y1 , y2 ∈ R, z1 , z2 ∈ Rd , dt ⊗ P a.e.
We consider the (one-dimensional) backward stochastic differential equation (BSDE):
−dYt = f (t, Yt , Zt )dt − Zt · dWt , YT = ξ. (56)
The second equality is called the terminal condition.
Definition 7.1 A solution to the BSDE (56) is a pair (Y, Z) ∈ S2 (0, T ) × H2 (0, T )d satisfying
Yt = ξ + ∫_t^T f (s, Ys , Zs ) ds − ∫_t^T Zs · dWs , t ∈ [0, T ].
We prove an existence and uniqueness result for the above BSDE.
Theorem 7.2 Given a pair (ξ, f ) satisfying (A) and (B), there exists a unique solution (Y, Z) to the
BSDE (56).
7.1.1 Linear BSDE
We consider the particular case where the generator f is linear in y and z. The linear BSDE is written
in the form
−dYt = (At Yt + Zt · Bt + Ct )dt − Zt · dWt , YT = ξ,
(57)
where A and B are bounded progressively measurable processes valued in R and Rd , respectively, and C is
a real-valued process in H2 (0, T ). We can solve this BSDE explicitly.
Proposition 7.3 The unique solution (Y, Z) to the linear BSDE (57) is given by
Γt Yt = E[ ΓT ξ + ∫_t^T Γs Cs ds | Ft ], (58)
where Γ is the adjoint (or dual) process, i.e., the solution to the linear SDE
dΓt = Γt (At dt + Bt · dWt ), Γ0 = 1.

7.1.2 Comparison principle
We state a very useful comparison principle for BSDEs.
Theorem 7.4 Let (ξ 1 , f 1 ) and (ξ 2 , f 2 ) be two pairs of terminal conditions and generators satisfying
conditions (A) and (B), let (Y 1 , Z 1 ), (Y 2 , Z 2 ) be the solutions to their corresponding BSDEs. Suppose
that:
• ξ 1 ≤ ξ 2 a.s.
• f 1 (t, Yt1 , Zt1 ) ≤ f 2 (t, Yt1 , Zt1 ) dt ⊗ dP a.e.
• f 2 (t, Yt1 , Zt1 ) ∈ H2 (0, T ).
Then Yt1 ≤ Yt2 for all t ∈ [0, T ], a.s.
Furthermore, if, in addition, Y01 = Y02 , then Yt1 = Yt2 , t ∈ [0, T ] a.s. and thus Z 1 = Z 2 a.e. In
particular, if P(ξ 1 < ξ 2 ) > 0 or f 1 (t, Yt1 , Zt1 ) < f 2 (t, Yt1 , Zt1 ) on a set of strictly positive measure
dt ⊗ dP, then Y01 < Y02 .
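As a sanity check on the explicit formula (58) of Section 7.1.1, the sketch below simulates the adjoint process Γ for constant coefficients At ≡ a, Bt ≡ b, Ct ≡ c and a deterministic terminal condition ξ = y (all illustrative choices, not from the notes); since E[Γs ] = e^{as}, Y0 then has a closed form to compare against:

```python
import numpy as np

# Constant coefficients and a deterministic terminal condition (illustrative).
# Then Gamma_t = exp((a - b**2/2) t + b W_t), E[Gamma_s] = exp(a s), and (58)
# at t = 0 gives Y_0 = y * exp(a T) + c * (exp(a T) - 1) / a.
a, b, c, y, T = 0.1, 0.2, 0.5, 1.0, 1.0
n_paths, n_steps = 50_000, 100
dt = T / n_steps

rng = np.random.default_rng(1)
t_grid = dt * np.arange(1, n_steps + 1)
W = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps)).cumsum(axis=1)
Gamma = np.exp((a - 0.5 * b**2) * t_grid + b * W)     # exact simulation of Gamma
Y0_mc = (y * Gamma[:, -1] + c * Gamma.sum(axis=1) * dt).mean()

Y0_exact = y * np.exp(a * T) + c * (np.exp(a * T) - 1.0) / a
assert abs(Y0_mc - Y0_exact) < 0.01
```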
7.1.3 BSDE, PDE and nonlinear Feynman-Kac formula
According to the Feynman-Kac formula, the solution to the linear parabolic PDE
− ∂v/∂t − Lv − f (t, x) = 0, (t, x) ∈ [0, T ) × Rn , (59)
v(T, x) = g(x), x ∈ Rn , (60)
has the representation
v(t, x) = E[ ∫_t^T f (s, Xst,x ) ds + g(XTt,x ) ],
where L is the second-order operator
Lv = b(x) · Dx v + (1/2) tr(σ(x)σ ′ (x)Dx2 v),
and {Xst,x , s ∈ [t, T ]} is the solution to the SDE
dXs = b(Xs ) ds + σ(Xs ) dWs , s ∈ [t, T ], Xt = x.
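A minimal Monte Carlo illustration of this representation (with the illustrative choices b = 0, σ = 1, f = 0 and g(x) = x2 , i.e. the heat equation, not from the notes):

```python
import numpy as np

# With b = 0, sigma = 1, f = 0, g(x) = x**2, we have X_s^{t,x} = x + (W_s - W_t)
# and the representation gives v(t, x) = E[g(X_T^{t,x})] = x**2 + (T - t),
# the classical solution of v_t + 0.5 v_xx = 0, v(T, x) = x**2.
def v_mc(t, x, T, n_paths, seed=2):
    rng = np.random.default_rng(seed)
    XT = x + rng.normal(scale=np.sqrt(T - t), size=n_paths)
    return (XT**2).mean()

t, x, T = 0.25, 1.0, 1.0
assert abs(v_mc(t, x, T, n_paths=1_000_000) - (x**2 + (T - t))) < 0.01
```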
In this section we study an extension of the Feynman-Kac formula for semilinear PDEs of the form
− ∂v/∂t − Lv − f (t, x, v, σ ′ Dx v) = 0, (t, x) ∈ [0, T ) × Rn , (61)
v(T, x) = g(x), x ∈ Rn , (62)
with Rn -valued b and Rn×d -valued σ, both satisfying a Lipschitz condition on Rn . The function f is
continuous on [0, T ] × Rn × R × Rd and satisfies a linear growth condition in (x, y, z) and a Lipschitz
condition in (y, z), uniformly in (t, x). The continuous function g satisfies a linear growth condition.
We shall represent the solution to this PDE by means of the BSDE
−dYs = f (s, Xs , Ys , Zs ) ds − Zs · dWs , s ∈ [0, T ], YT = g(XT ), (63)
and the forward SDE valued in Rn :
dXs = b(Xs )ds + σ(Xs )dWs
(64)
(this is also called a Forward-Backward SDE). By a standard estimate on the second moment of X,
we see that the terminal condition and the generator of the BSDE (63) satisfy the conditions (A) and
(B) stated in Section 7.1. Thus there exists a unique solution (Y, Z) ∈ S2 (0, T ) × H2 (0, T )d satisfying
(63). By the Markov property of the diffusion X we notice that Yt = v(t, Xt ), where
v(t, x) := Ytt,x
is a deterministic function of (t, x) in [0, T ] × Rn , {Xst,x , s ∈ [t, T ]} is the solution to the SDE (64)
starting from x at t, and {(Yst,x , Zst,x ), s ∈ [t, T ]} is the solution to the BSDE (63) with Xs = Xst,x ,
s ∈ [t, T ]. We call this framework a Markovian case for the BSDE.
The next result is analogous to the verification theorem for Hamilton-Jacobi-Bellman equations,
and shows that a classical solution to the semilinear PDE provides a solution to the BSDE.
Theorem 7.5 Let v ∈ C 1,2 ([0, T )×Rn )∩C 0 ([0, T ]×Rn ) be a classical solution to (61)-(62), satisfying
a linear growth condition and such that for some positive constants C, q: |Dx v(t, x)| ≤ C(1 + |x|q )
for all x ∈ Rn . Then, the pair (Y, Z) defined by
Yt = v(t, Xt ), Zt = σ ′ (Xt )Dx v(t, Xt ), t ∈ [0, T ],
is the solution to the BSDE (63).
(Note that Xst,x and also g(XTt,x ) are independent of Ft ; therefore Yst,x , Zst,x must be independent of Ft
as well, according to the fixed point construction in Section 7.1. Hence Ytt,x is deterministic.)
7.2 Control and BSDE
Theorem 7.6 Let (Y, Z) and (Y α , Z α ), α ∈ A, be the solutions to the BSDEs with data (ξ, f ) and
(ξ α , f α ), where A is a set of progressively measurable processes and all pairs (ξ, f ), (ξ α , f α ), α ∈ A,
satisfy (A) and (B) in Section 7.1. Suppose that there exists some α̂ ∈ A such that
f (t, Yt , Zt ) = ess inf_{α∈A} f α (t, Yt , Zt ) = f α̂ (t, Yt , Zt ), dt ⊗ P a.e.
ξ = ess inf_{α∈A} ξ α = ξ α̂ .
Then Yt = ess inf_{α∈A} Ytα = Ytα̂ , t ∈ [0, T ] a.s.
In Chapter 6 we studied how to solve a stochastic control problem by the dynamic programming
method. We present here an alternative approach, called the Pontryagin maximum principle, based
on optimality conditions for controls.
We consider the framework of a stochastic control problem as defined in Chapter 6: let X be a
controlled diffusion on Rn governed by
dXs = b(Xs , αs )ds + σ(Xs , αs )dWs , (65)
where W is a d-dimensional standard Brownian motion, and α ∈ A, the control process, is a progressively measurable process valued in A. The gain functional to maximize is
J(α) = E[ ∫_0^T f (t, Xt , αt ) dt + g(XT ) ],
where f : [0, T ] × Rn × A → R is continuous in (t, x) for all a in A, g : Rn → R is a concave C 1
function, and f , g satisfy a quadratic growth condition in x.
We define the generalized Hamiltonian H : [0, T ] × Rn × A × Rn × Rn×d → R by
H(t, x, a, y, z) = b(x, a) · y + tr(σ ′ (x, a)z) + f (t, x, a),
and we assume that H is differentiable in x with derivative denoted by Dx H. We consider for each
α ∈ A, the BSDE, called the adjoint equation:
−dYt = Dx H(t, Xt , αt , Yt , Zt )dt − Zt dWt , YT = Dx g(XT ). (66)
Theorem 7.7 Let α̂ ∈ A and X̂ be the associated controlled diffusion. Suppose that there exists a
solution (Ŷ , Ẑ) to the associated BSDE (66) such that
H(t, X̂t , α̂t , Ŷt , Ẑt ) = max_{a∈A} H(t, X̂t , a, Ŷt , Ẑt ), t ∈ [0, T ] a.s. (67)
and
(x, a) ↦ H(t, x, a, Ŷt , Ẑt ) is a concave function (68)
for all t ∈ [0, T ]. Then α̂ is an optimal control, i.e.
J(α̂) = sup_{α∈A} J(α).
We conclude this section by providing the connection between maximum principle and dynamic programming. The value function of the stochastic control problem considered above is defined by
v(t, x) = sup_{α∈A} E[ ∫_t^T f (s, Xst,x , αs ) ds + g(XTt,x ) ], (69)
where {Xst,x , s ∈ [t, T ]} is the solution to (65) starting from x at t. Recall that the associated
Hamilton-Jacobi-Bellman equation is
− ∂v/∂t − sup_{a∈A} [G(t, x, a, Dx v, Dx2 v)] = 0, (70)
where for (t, x, a, p, M ) ∈ [0, T ] × Rn × A × Rn × Sn ,
G(t, x, a, p, M ) = b(x, a) · p + (1/2) tr(σσ ′ (x, a)M ) + f (t, x, a).
Theorem 7.8 Suppose that v ∈ C 1,3 ([0, T )×Rn )∩C 0 ([0, T ]×Rn ), and there exists an optimal control
α̂ ∈ A to (69) with associated controlled diffusion X̂. Then
G(t, X̂t , α̂t , Dx v(t, X̂t ), Dx2 v(t, X̂t )) = max_{a∈A} G(t, X̂t , a, Dx v(t, X̂t ), Dx2 v(t, X̂t )), (71)
and the pair
(Ŷt , Ẑt ) = (Dx v(t, X̂t ), Dx2 v(t, X̂t )σ(X̂t , α̂t )) (72)
is the solution to the adjoint BSDE (66).
7.3 Application: Portfolio optimization

7.3.1 Exponential utility maximization with option payoff
We consider a financial market with one riskless asset of price S 0 = 1 and one risky asset of price
process
dSt = St (bt dt + σt dWt ),
where W is a standard Brownian motion on (Ω, F, F = (Ft ), P) equipped with the natural filtration F
of W , b and σ are two bounded progressively measurable processes, and σt ≥ ε for all t, a.s., for some ε > 0.
An agent, starting from a capital x, invests an amount αt at any time t in the risky asset. His wealth
process, controlled by α, is given by
Xtx,α = x + ∫_0^t αu dSu /Su = x + ∫_0^t αu (bu du + σu dWu ).
We denote by A the set of progressively measurable processes α valued in R such that ∫_0^T |αt |2 dt < ∞
a.s. and X x,α is lower-bounded. The agent must provide at maturity T an option payoff represented by
a bounded FT -measurable random variable ξ. Given his risk aversion, characterized by an exponential
utility
U (x) = − exp(−ηx), x ∈ R, η > 0,
the objective of the agent is to solve the maximization problem:
v(x) = sup_{α∈A} E[U (XTx,α − ξ)]. (73)
The approach adopted here for determining the value function v and the optimal control α̂ is quite
general, and is based on the following argument. We construct a family of processes (Jtα )t∈[0,T ] , α ∈ A,
satisfying the properties:
(i) JTα = U (XTx,α − ξ) for all α ∈ A
(ii) J0α is a constant independent of α ∈ A
(iii) J α is a supermartingale for all α ∈ A, and there exists an α̂ ∈ A such that J α̂ is a martingale
Indeed, in this case, for such α̂, we have for any α ∈ A,
E[U (XTx,α − ξ)] = E[JTα ] ≤ J0α = J0α̂ = E[JTα̂ ] = E[U (XTx,α̂ − ξ)]
which proves that α̂ is an optimal control and v(x) = J0α̂ .
We construct such a family (J α ) in the form
Jtα = U (Xtx,α − Yt ), t ∈ [0, T ], α ∈ A,
with (Y, Z) solution to the BSDE
Yt = ξ + ∫_t^T f (s, Zs ) ds − ∫_t^T Zs dWs , t ∈ [0, T ],
where f is a generator to be determined. The conditions (i) and (ii) are clearly satisfied. In order to
satisfy condition (iii), we shall exploit the particular structure of the exponential utility function U .
Indeed, by substituting the definitions of X x,α and Y into U (Xtx,α − Yt ) we obtain:
Jtα = − exp(−η(Xtx,α − Yt )) = Mtα Ctα ,
where M α is the local martingale given by
Mtα = exp(−η(x − Y0 )) exp( − ∫_0^t η(αu σu − Zu ) dWu − (1/2) ∫_0^t |η(αu σu − Zu )|2 du ),
and
Ctα = − exp( η ∫_0^t ρ(u, αu , Zu ) du ),
with
ρ(t, a, z) = (η/2) |aσt − z|2 − a bt − f (t, z).
We are then looking for a generator f such that the process (Ctα ) is nonincreasing for all α ∈ A, and
constant for some α̂ ∈ A. In other words, the problem is reduced to finding f such that
ρ(t, αt , Zt ) ≥ 0, t ∈ [0, T ], for all α ∈ A, and
ρ(t, α̂t , Zt ) = 0, t ∈ [0, T ].
These two conditions are satisfied for
f (t, z) := −z bt /σt − (1/(2η)) |bt /σt |2 (74)
and
α̂t := (1/σt ) ( Zt + (1/η) bt /σt ), t ∈ [0, T ], (75)
which can be seen by rewriting ρ in the form
ρ(t, a, z) = (η/2) | aσt − z − (1/η) bt /σt |2 − z bt /σt − (1/(2η)) |bt /σt |2 − f (t, z).

Theorem 7.9 The value function to problem (73) is equal to
v(x) = U (x − Y0 ) = − exp(−η(x − Y0 )),
where (Y, Z) is the solution to the BSDE
Yt = ξ + ∫_t^T f (s, Zs ) ds − ∫_t^T Zs dWs , t ∈ [0, T ],
with the generator f given by (74). Furthermore, an optimal control α̂ is given by (75).
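The algebra behind (74) and (75) can be checked numerically: with the generator f of (74) and the control α̂ of (75), ρ vanishes at α̂ and is nonnegative for every other control value (the constants below are illustrative placeholders, not from the notes):

```python
import numpy as np

# Illustrative constants (not from the notes).
eta, b_t, sigma_t, z = 2.0, 0.06, 0.3, 0.4

f_t = -z * b_t / sigma_t - (b_t / sigma_t)**2 / (2.0 * eta)   # generator (74)
alpha_hat = (z + b_t / (eta * sigma_t)) / sigma_t             # control (75)

def rho(a):
    # rho(t, a, z) = (eta/2)|a sigma_t - z|^2 - a b_t - f(t, z)
    return 0.5 * eta * (a * sigma_t - z)**2 - a * b_t - f_t

a_grid = np.linspace(-10.0, 10.0, 200_001)
assert abs(rho(alpha_hat)) < 1e-10      # rho vanishes at alpha_hat
assert (rho(a_grid) >= -1e-9).all()     # and is nonnegative elsewhere
```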
7.3.2 Mean-variance criterion for portfolio selection
We consider a Black-Scholes financial model. There is one riskless asset of price process
dSt0 = rSt0 dt,
and one stock of price process
dSt = St (bdt + σdWt ),
with constants b > r and σ > 0. An agent invests at any time t an amount αt in the stock, and his
wealth process is governed by
dXt = αt dSt /St + (Xt − αt ) dSt0 /St0 = [rXt + αt (b − r)]dt + σαt dWt , X0 = x.
We denote by A the set of progressively measurable processes α valued in R, such that
E[ ∫_0^T |αt |2 dt ] < ∞.
The mean-variance criterion for portfolio selection consists in minimizing the variance of the wealth
under the constraint that its expectation is equal to a given constant:
V (m) = inf_{α∈A} {Var(XT ) : E[XT ] = m}, m ∈ R.
We shall see later, by the Lagrangian method, that this problem is reduced to the resolution of an
auxiliary control problem
Ṽ (λ) = inf_{α∈A} E[(XT − λ)2 ], λ ∈ R.
We shall solve this auxiliary problem by the stochastic maximum principle described in Section 7.2.
In this case, the Hamiltonian H takes the form:
H(x, a, y, z) = [rx + a(b − r)]y + σaz.
The adjoint BSDE (66) is written for any α ∈ A as
−dYt = rYt dt − Zt dWt ,
YT = 2(XT − λ).
Let α̂ ∈ A be a candidate for the optimal control, and X̂, (Ŷ , Ẑ) the corresponding processes. Then,
H(x, a, Ŷt , Ẑt ) = rxŶt + a[(b − r)Ŷt + σ Ẑt ].
Since this expression is linear in a, we see that conditions (67) and (68) will be satisfied iff
(b − r)Ŷt + σ Ẑt = 0,
t ∈ [0, T ] a.s.
We are looking for the solution (Ŷ , Ẑ) to the adjoint BSDE in the form
Ŷt = ϕ(t)X̂t + ψ(t),
for some deterministic C 1 functions ϕ and ψ. By substituting into the adjoint BSDE and using the
definition of X we observe through comparison of the finite variation and martingale parts:
ϕ′ (t)X̂t + ϕ(t)(rX̂t + α̂t (b − r)) + ψ ′ (t) = −r(ϕ(t)X̂t + ψ(t)), (76)
ϕ(t)σ α̂t = Ẑt .
We also have the terminal conditions ϕ(T ) = 2 and ψ(T ) = −2λ.
By using (b − r)Ŷt + σ Ẑt = 0 and ϕ(t)σ α̂t = Ẑt we have
α̂t = (r − b)Ŷt /(σ 2 ϕ(t)) = (r − b)(ϕ(t)X̂t + ψ(t))/(σ 2 ϕ(t)).
On the other hand, using (76), we also have
α̂t = [ (ϕ′ (t) + 2rϕ(t))X̂t + ψ ′ (t) + rψ(t) ] / ((r − b)ϕ(t)).
By comparing the two expressions for α̂ we obtain the ODEs
ϕ′ (t) + ( 2r − (b − r)2 /σ 2 ) ϕ(t) = 0, ϕ(T ) = 2,
ψ ′ (t) + ( r − (b − r)2 /σ 2 ) ψ(t) = 0, ψ(T ) = −2λ,
whose explicit solutions are
ϕ(t) = 2 exp( ( 2r − (b − r)2 /σ 2 ) (T − t) ),
ψ(t) = −2λ exp( ( r − (b − r)2 /σ 2 ) (T − t) ).
With this choice of ϕ and ψ the corresponding processes (Ŷ , Ẑ) solve the adjoint BSDE and the
conditions for the maximum principle in Theorem 7.7 are satisfied. We therefore have an optimal
control given by
α̂λ (t, x) = (r − b)(ϕ(t)x + ψ(t)) / (σ 2 ϕ(t)),
written in Markovian form.
The value function Ṽ (λ) = E[(XT − λ)2 ], with X given by the control α̂λ , can be calculated explicitly
using Itô’s formula and the definition of α̂λ :
Ṽ (λ) = exp( −((b − r)2 /σ 2 ) T ) (λ − erT x)2 , λ ∈ R.
Finally we show how the two optimization problems associated with V and Ṽ are related.
Proposition 7.10 We have the two conjugate relations
Ṽ (λ) = inf_{m∈R} [V (m) + (m − λ)2 ], λ ∈ R, (77)
V (m) = sup_{λ∈R} [Ṽ (λ) − (m − λ)2 ], m ∈ R. (78)
For any m ∈ R, the optimal control for V (m) is α̂λm , where λm attains the maximum in (78), i.e.
λm = ( m − x exp( ( r − (b − r)2 /σ 2 ) T ) ) / ( 1 − exp( −((b − r)2 /σ 2 ) T ) ). (79)
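The formula (79) for λm can be cross-checked against a direct numerical maximization of the concave map λ ↦ Ṽ (λ) − (m − λ)2 in (78), using the explicit form of Ṽ (the parameters below are illustrative, not from the notes):

```python
import numpy as np

# Illustrative parameters (not from the notes).
b, r, sigma, T, x, m = 0.08, 0.02, 0.3, 1.0, 1.0, 1.5
k = (b - r)**2 / sigma**2

def V_tilde(lam):
    return np.exp(-k * T) * (lam - np.exp(r * T) * x)**2

# Closed-form maximiser (79) of lambda -> V_tilde(lambda) - (m - lambda)**2.
lam_m = (m - x * np.exp((r - k) * T)) / (1.0 - np.exp(-k * T))

lam_grid = np.linspace(lam_m - 20.0, lam_m + 20.0, 400_001)
g = V_tilde(lam_grid) - (m - lam_grid)**2    # concave since exp(-kT) < 1
lam_num = lam_grid[g.argmax()]
assert abs(lam_num - lam_m) < 1e-3
```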
References
Björk, T. (2004): Arbitrage Theory in Continuous Time. Oxford University Press, Oxford.
Duffie, D. (1996): Dynamic Asset Pricing Theory. Princeton University Press, Princeton, NJ.
Hull, J. (2000): Options, Futures and other Derivatives, 4th edition. Prentice Hall, London.
Karatzas, I. & S.E. Shreve (1988): Brownian Motion and Stochastic Calculus. Springer, Berlin.
Lamberton, D. & B. Lapeyre (1996): Stochastic Calculus Applied to Finance. Chapman & Hall,
London.
Øksendal, B. (2003): Stochastic Differential Equations: An Introduction with Applications. Springer,
Berlin.
Pham, H. (2009): Continuous-time Stochastic Control and Optimization with Financial Applications.
Springer, Berlin.
Revuz, D. & M. Yor (1999): Continuous Martingales and Brownian Motion, 3rd edition. Springer,
London.
Shreve, S.E. (2005a): Stochastic Calculus for Finance I: The Binomial Asset Pricing Model.
Springer, Berlin.
Shreve, S.E. (2005b): Stochastic Calculus for Finance II: Continuous-Time Models. Springer, Berlin.