
Stochastic Optimal Control
Lecture 6: DPP and HJB for Stopping Times
Álvaro Cartea, University of Oxford
January 23, 2017
Overview of course¹

- Deterministic dynamic optimisation
- Stochastic dynamic optimisation
  - Diffusions and Jumps
  - Infinitesimal generators
- Dynamic programming principle
  - Diffusions
  - Jumps
  - Stopping times
- Examples:
  - Merton portfolio problems
  - Optimal execution of blocks of shares
  - others

¹ Preliminary. Contents may change as we go along.
Setup

Let $X = (X_t)_{0 \le t \le T}$ denote a vector-valued process of dimension $m$ satisfying the SDE
$$dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t + \gamma(t, X_t)\,dN_t,$$
where $\mu : [0, T] \times \mathbb{R}^m \to \mathbb{R}^m$, $\sigma : [0, T] \times \mathbb{R}^m \to \mathbb{R}^{m \times m}$, $\gamma : [0, T] \times \mathbb{R}^m \to \mathbb{R}^{m \times m}$, $W$ is an $m$-dimensional Brownian motion with independent components, and $N$ is an $m$-dimensional counting process with intensities $\lambda(t, X_t)$. The filtration $\mathbb{F}$ is the natural one generated by $X$, and the generator $\mathcal{L}_t$ of the process acts on twice-differentiable functions as follows:
$$\mathcal{L}_t h(t, x) = \mu(t, x) \cdot D_x h(t, x) + \tfrac{1}{2}\operatorname{Tr}\!\left(\sigma(t, x)\,\sigma(t, x)'\,D^2_x h(t, x)\right) + \sum_{j=1}^{m} \lambda_j(t, x)\left[\, h\big(t, x + \gamma^{\cdot j}(t, x)\big) - h(t, x) \,\right].$$
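The dynamics above are straightforward to simulate, which is handy later for sanity-checking candidate stopping rules and value functions. Below is a minimal Euler-type sketch for the scalar case ($m = 1$); the coefficient functions `mu`, `sigma`, `gamma`, `lam` and all parameter values are illustrative assumptions, not part of the lecture.

```python
# Minimal Euler-type simulation of dX_t = mu dt + sigma dW_t + gamma dN_t (scalar case).
# All coefficient functions and parameter values below are illustrative assumptions.
import numpy as np

def simulate_path(x0, T, n_steps, mu, sigma, gamma, lam, rng=None):
    """Simulate one path of the jump-diffusion on [0, T] with n_steps Euler steps."""
    if rng is None:
        rng = np.random.default_rng()
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        t = i * dt
        dW = rng.normal(0.0, np.sqrt(dt))            # Brownian increment
        dN = rng.poisson(lam(t, x[i]) * dt)          # counting-process increment
        x[i + 1] = x[i] + mu(t, x[i]) * dt + sigma(t, x[i]) * dW + gamma(t, x[i]) * dN
    return x

# Example: geometric-type drift/volatility with a constant downward jump and intensity.
path = simulate_path(
    x0=1.0, T=1.0, n_steps=1_000,
    mu=lambda t, x: 0.05 * x, sigma=lambda t, x: 0.2 * x,
    gamma=lambda t, x: -0.1 * x, lam=lambda t, x: 2.0,
)
print(path[-1])
```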
The agent's performance criterion, for each $\mathbb{F}$-stopping time $\tau \in \mathcal{T}_{[t,T]}$, is given by
$$H^{\tau}(t, x) = \mathbb{E}_{t,x}\left[\, G(X_\tau) \,\right], \tag{1a}$$
where $G(X_\tau)$ is the reward upon exercise, and she seeks to find the value function
$$H(t, x) = \sup_{\tau \in \mathcal{T}_{[t,T]}} H^{\tau}(t, x), \tag{1b}$$
and the stopping time $\tau^*$ which attains the supremum, if it exists.
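For any fixed (generally suboptimal) stopping rule, the criterion (1a) can be estimated by Monte Carlo, and the best rule within a family gives a lower bound on the value function (1b). A small sketch follows, assuming a one-dimensional geometric Brownian motion for $X$, the call-style reward $G(x) = \max(x - K, 0)$, and a simple threshold rule; all names and parameter values are illustrative.

```python
# Monte Carlo estimate of the criterion (1a) for a fixed stopping rule: stop the first
# time X crosses a barrier b, otherwise stop at T. The GBM dynamics, the reward
# G(x) = max(x - K, 0), and all parameter values are illustrative assumptions.
import numpy as np

def estimate_H_tau(x0, K, b, T, drift, sigma, n_paths=10_000, n_steps=250, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0)
    payoff = np.full(n_paths, np.nan)                 # NaN marks paths that have not stopped
    for _ in range(n_steps):
        alive = np.isnan(payoff)
        z = rng.normal(size=alive.sum())
        x[alive] *= np.exp((drift - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z)
        hit = alive.copy()
        hit[alive] = x[alive] >= b                    # paths that just crossed the barrier
        payoff[hit] = np.maximum(x[hit] - K, 0.0)     # reward G(X_tau) collected at tau
    left = np.isnan(payoff)
    payoff[left] = np.maximum(x[left] - K, 0.0)       # remaining paths stop at T
    return payoff.mean()

# Each barrier defines one admissible tau; the best of them is a lower bound on H(0, x0).
for b in (105.0, 115.0, 125.0):
    print(b, estimate_H_tau(x0=100.0, K=100.0, b=b, T=1.0, drift=0.05, sigma=0.2))
```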
The agent is deciding between stopping 'now' and receiving the reward $G$, or continuing to wait in the hope of receiving a larger reward in the future. She should continue to wait at the point $(t, x) \in [0, T] \times \mathbb{R}^m$ as long as the value function has not attained the value $G(x)$ there.

This motivates the definition of the stopping region $\mathcal{S}$, which we define as
$$\mathcal{S} = \left\{\, (t, x) \in [0, T] \times \mathbb{R}^m : H(t, x) = G(x) \,\right\}.$$
- Note that whenever $(t, x) \in \mathcal{S}$, i.e. if the state of the system lies in $\mathcal{S}$, it is optimal to stop immediately, since the agent cannot improve beyond $G$ by the definition of the value function in (1).
- The complement of this region, $\mathcal{S}^c$, is known as the continuation region.
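In numerical work the stopping region is typically read off a computed value function on a grid. A tiny sketch, assuming $H$ and $G$ are stored as NumPy arrays ($H$ indexed by time and state, $G$ by state); the array layout is an illustrative assumption.

```python
# Identify the stopping region S = {(t, x) : H(t, x) = G(x)} on a grid, up to a tolerance.
# The array layout (H indexed by (time, state), G by state) is an illustrative assumption.
import numpy as np

def stopping_region(H, G, tol=1e-8):
    """Boolean mask: True where it is optimal to stop; its complement is the continuation region."""
    return np.isclose(H, G[np.newaxis, :], atol=tol)
```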
DPP for optimal stopping

Theorem (Dynamic Programming Principle for Stopping Problems).
The value function (1) satisfies the DPP
$$H(t, x) = \sup_{\tau \in \mathcal{T}_{[t,T]}} \mathbb{E}_{t,x}\left[\, G(X_\tau)\,\mathbf{1}_{\tau < \theta} + H(\theta, X_\theta)\,\mathbf{1}_{\tau \ge \theta} \,\right], \tag{2}$$
for all $(t, x) \in [0, T] \times \mathbb{R}^m$ and all stopping times $\theta \le T$.

- If the optimal stopping time $\tau^*$ occurs prior to $\theta$, then the agent's value function equals the reward at $\tau^*$.
- If the agent has not stopped by $\theta$, then at $\theta$ she receives the value function evaluated at the current state of the system.
HJB for optimal stopping

Theorem (Dynamic Programming Equation for Stopping Problems).
Assume that the value function $H(t, x)$ is once differentiable in $t$ and that all second-order derivatives in $x$ exist, i.e. $H \in C^{1,2}([0, T], \mathbb{R}^m)$, and that $G : \mathbb{R}^m \to \mathbb{R}$ is continuous. Then $H$ solves the variational inequality, also known as an obstacle problem, or free-boundary problem:
$$\max\left\{\, \partial_t H + \mathcal{L}_t H \,,\; G - H \,\right\} = 0, \quad \text{on } D, \tag{3}$$
where $D = [0, T] \times \mathbb{R}^m$.
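Equation (3) also suggests a simple numerical scheme: step the value function backward in time with the generator and project onto the obstacle $G$ at every step. A minimal explicit finite-difference sketch for a one-dimensional pure-diffusion special case follows; the dynamics, the put-style reward, the grid, and all parameter values are illustrative assumptions rather than the general $m$-dimensional setting of the theorem.

```python
# A minimal explicit finite-difference sketch for the obstacle problem (3) in one spatial
# dimension: step backward in time with the generator, then project onto the obstacle G.
# The diffusion, the put-style reward, the grid and all parameters are illustrative.
import numpy as np

drift, vol, K, T = 0.05, 0.2, 100.0, 1.0        # illustrative model and reward parameters
x = np.linspace(1.0, 300.0, 301)                # spatial grid for the state X
dx = x[1] - x[0]
n_t = 20_000                                    # many small steps for explicit-scheme stability
dt = T / n_t

G = np.maximum(K - x, 0.0)                      # reward G(x) (the obstacle)
H = G.copy()                                    # terminal condition H(T, x) = G(x)

for _ in range(n_t):
    # generator term L_t H = drift*x*H_x + 0.5*vol^2*x^2*H_xx, via central differences
    H_x = (H[2:] - H[:-2]) / (2.0 * dx)
    H_xx = (H[2:] - 2.0 * H[1:-1] + H[:-2]) / dx**2
    LH = drift * x[1:-1] * H_x + 0.5 * vol**2 * x[1:-1] ** 2 * H_xx
    H_new = H.copy()
    H_new[1:-1] = H[1:-1] + dt * LH             # one explicit backward step of dH/dt + LH = 0
    H = np.maximum(H_new, G)                    # projection: enforce H >= G at every step

print(H[np.searchsorted(x, 100.0)])             # approximate value H(0, x) near x = 100
```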
Example: American option
Perpetual call option

Here we price an American call with infinite horizon, i.e. expiry $T = \infty$. Assume the stock price $S$ satisfies the SDE
$$\frac{dS_t}{S_t} = (r - D_0)\,dt + \sigma\,dW_t. \tag{4}$$
The agent's value function is
$$V(S) = \sup_{\tau \in \mathcal{T}} \mathbb{E}\left[\, e^{-r\tau} \max(S_\tau - K,\, 0) \,\right], \tag{5}$$
where $\mathcal{T}$ is the set of $\mathbb{F}$-stopping times bounded by $T$ (which in this case is infinity), and $K$ is the strike price of the option.

Assuming the DPP holds, the value function solves
$$\max\left\{\, -r V + \mathcal{L}_t V \,,\; G - V \,\right\} = 0, \quad \text{on } D, \tag{6}$$
where $D = [0, T] \times \mathbb{R}_+$, $G(S) = \max(S - K,\, 0)$ is the exercise reward, and the infinitesimal generator is
$$\mathcal{L}_t = (r - D_0)\, S\, \partial_S + \tfrac{1}{2}\,\sigma^2 S^2\, \partial_{SS}.$$
Hold region

When it is not optimal to exercise (for a call this is the region $S < S^*$), we solve
$$\tfrac{1}{2}\,\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + (r - D_0)\, S \frac{\partial V}{\partial S} - r V = 0. \tag{7}$$
Substituting the trial solution $V(S) = S^{\beta}$ into (7) gives the characteristic equation
$$\tfrac{1}{2}\,\sigma^2 \beta(\beta - 1) + (r - D_0)\,\beta - r = 0,$$
with roots
$$\beta_{\pm} = \frac{1}{2}\left(1 - \frac{2(r - D_0)}{\sigma^2}\right) \pm \frac{1}{2}\sqrt{\left(1 - \frac{2(r - D_0)}{\sigma^2}\right)^{2} + \frac{8r}{\sigma^2}}\,.$$
Hence the solution to (7) is given by the linear combination
$$V(S) = A\, S^{\beta_+} + B\, S^{\beta_-},$$
where $A$ and $B$ are constants.
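As a quick symbolic sanity check (assuming SymPy is available), one can verify that $V(S) = S^{\beta}$ satisfies (7) precisely when $\beta$ is a root of the characteristic quadratic above:

```python
# Symbolic sanity check: V(S) = S**beta solves the ODE (7) exactly when beta is a root of
# the characteristic quadratic. SymPy is assumed to be available.
import sympy as sp

S, beta, r, D0, sigma = sp.symbols('S beta r D0 sigma', positive=True)
V = S**beta
ode = sp.Rational(1, 2) * sigma**2 * S**2 * sp.diff(V, S, 2) \
      + (r - D0) * S * sp.diff(V, S) - r * V
quadratic = sp.Rational(1, 2) * sigma**2 * beta * (beta - 1) + (r - D0) * beta - r

# The ODE applied to S**beta factors as S**beta times the quadratic in beta.
print(sp.expand(ode - quadratic * V))   # prints 0
```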
Boundary conditions

- At the boundary the agent exercises the option: $C^{A}(S^*) = \max(S^* - K,\, 0)$.
- We also know that as $S \to 0$, $C^{A}(S) \to 0$.
- We must therefore rule out the solution containing the negative root $\beta_-$, i.e. set $B = 0$.
- At exercise we have that
$$C(S^*) = S^* - K \quad\Longleftrightarrow\quad A\,(S^*)^{\beta_+} = S^* - K.$$
The holder chooses the exercise boundary $S^*$ to make
$$A = \frac{S^* - K}{(S^*)^{\beta_+}}$$
as large as possible, so that the option value is maximal. Hence she solves
$$\max_{S^*} \; \frac{S^* - K}{(S^*)^{\beta_+}}\,. \tag{8}$$
The first-order condition is
$$\frac{(S^*)^{\beta_+} - \beta_+\,(S^* - K)\,(S^*)^{\beta_+ - 1}}{(S^*)^{2\beta_+}} = 0, \tag{9}$$
which simplifies to
$$S^* - \beta_+\,(S^* - K) = 0. \tag{10}$$
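Solving the simplified first-order condition (10) symbolically (again assuming SymPy) recovers the exercise boundary stated on the next slide; the symbol names are arbitrary.

```python
# Solve the first-order condition (10) for the exercise boundary S*.
import sympy as sp

S_star, K, beta_p = sp.symbols('S_star K beta_p')
foc = S_star - beta_p * (S_star - K)      # left-hand side of (10)
# The unique root equals beta_p * K / (beta_p - 1), matching the next slide.
print(sp.solve(sp.Eq(foc, 0), S_star))
```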
Therefore
$$S^* = \frac{\beta_+}{\beta_+ - 1}\, K,$$
and
$$A = K^{1 - \beta_+}\, \frac{(\beta_+ - 1)^{\beta_+ - 1}}{\beta_+^{\beta_+}}\,.$$
Finally, the value of the perpetual call is
$$C(S) =
\begin{cases}
\dfrac{K}{\beta_+ - 1} \left( \dfrac{\beta_+ - 1}{\beta_+}\, \dfrac{S}{K} \right)^{\beta_+}, & \text{for } S < S^*, \\[2mm]
S - K, & \text{for } S \ge S^*.
\end{cases}$$
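A minimal numeric sketch of the closed-form result above; the parameter values ($r$, $D_0$, $\sigma$, $K$) are illustrative choices, not taken from the slides (note that $D_0 > 0$ is needed for $\beta_+ > 1$ and hence a finite $S^*$).

```python
# Numeric sketch of the perpetual American call derived above. Parameter values are
# illustrative; D0 > 0 is assumed so that beta_+ > 1 and the boundary S* is finite.
import math

def perpetual_call(S, K, r, D0, sigma):
    """Perpetual American call: A*S**beta_+ below the boundary S*, intrinsic value above."""
    a = 0.5 * (1.0 - 2.0 * (r - D0) / sigma**2)
    beta_plus = a + math.sqrt(a**2 + 2.0 * r / sigma**2)   # positive root beta_+
    S_star = beta_plus / (beta_plus - 1.0) * K              # optimal exercise boundary S*
    A = (S_star - K) / S_star**beta_plus                    # coefficient from value matching
    return A * S**beta_plus if S < S_star else S - K

# Example values; below the boundary S* the price exceeds the intrinsic value max(S - K, 0).
for S in (80.0, 120.0, 200.0):
    print(S, perpetual_call(S, K=100.0, r=0.05, D0=0.02, sigma=0.2))
```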