Solving Hamilton-Jacobi-Bellman equations by combining a max

Solving Hamilton-Jacobi-Bellman equations by
combining a max-plus linear approximation and
a probabilistic numerical method
Marianne Akian
INRIA Saclay - Île-de-France and CMAP École polytechnique CNRS
RICAM Workshop: Numerical methods for Hamilton-Jacobi equations
in optimal control and related fields
Linz, November 21-25, 2016
Joint work with Eric Fodjo, see arXiv:1605.02816
A finite horizon diffusion control problem involving
“discrete” and “continuum” controls
The state ξs ∈ Rd satisfies the stochastic differential equation
dξs = f µs (ξs , us )ds + σ µs (ξs , us )dWs ,
where (Ws )s≥0 is a d-dimensional Brownian motion,
µ := (µs )0≤s≤T , and u := (us )0≤s≤T are admissible control processes,
µs ∈ M a finite set and us ∈ U ⊂ Rp .
The problem consists in maximizing the finite horizon discounted payoff
(δ m ≥ 0):
hR
Rs µ
τ
T
J(t, x, µ, u) := E t e − t δ (ξτ ,uτ )dτ `µs (ξs , us )ds
i
RT µ
τ
+e − t δ (ξτ ,uτ )dτ ψ(ξT ) | ξt = x .
The Hamilton-Jacobi-Bellman (HJB) equation
Define the value function v : [0, T ] × Rd → R as:
v (t, x) = sup J(t, x, µ, u) .
µ,u
Under suitable assumptions, it is the unique (continuous) viscosity solution
of the HJB equation
2
− ∂v
∂t − H(x, v (t, x), Dv (t, x), D v (t, x)) = 0,
v (T , x) = ψ(x),
x ∈ Rd , t ∈ [0, T ),
x ∈ Rd ,
satisfying also some growth condition at infinity (in space), where the
Hamiltonian H : Rd × R × Rd × Sd → R is given by:
H(x, r , p, Γ) := max Hm (x, r , p, Γ) ,
m∈M
with
Hm (x, r , p, Γ) := maxu∈U
n
1
2
tr σ m (x, u)σ m (x, u)T Γ
o
+f m (x, u) · p − δ m (x, u)r + `m (x, u) .
Standard grid based discretizations solving HJB equations suffer the curse
of dimensionality malediction: for an error of , the computing time of finite
difference or finite element methods is at least in the order of (1/)d/2 .
Some possible curse of dimensionality-free methods:
I
I
Idempotent methods introduced by McEneaney (2007) in the
deterministic case, and by McEneaney, Kaise and Han (2011) in the
stochastic case.
Probabilistic numerical methods based on a backward stochastic
differential equation interpretation of the HJB equation, simulations
and regressions:
I
I
I
I
Quantization Bally, Pagès (2003) for stopping time problems.
Introduction of a new process without control: Bouchard, Touzi (2004)
when σ does not depend on control; Cheridito, Soner, Touzi and Victoir
(2007) and Fahim, Touzi and Warin (2011) in the fully-nonlinear case.
Control randomization: Kharroubi, Langrené, Pham (2013).
Fixed point iterations: Bender, Zhang (2008) for semilinear PDE (which
are not HJB equations).
The idempotent method of McEneaney, Kaise and Han
Given m and u, denote by ξˆm,u the Euler discretization of the process ξ
with time step h:
ξˆm,u (t + h) = ξˆm,u (t) + f m (ξˆm,u (t), u)h + σ m (ξˆm,u (t), u)(Wt+h − Wt ) .
Define the dynamic programming operators:
n
o
m
m
Tt,h
(φ)(x) =sup h`m (x, u) + e −hδ (x,u) E φ(ξˆm,u (t + h)) | ξˆm,u (t) = x
,
u∈U
and
m
Tt,h (φ)(x) = max Tt,h
(φ)(x) .
m∈M
The HJB equation can be discretized in time by:
v h (t, x) = Tt,h (v h (t + h, ·))(x),
t ∈ Th := {0, h, 2h, . . . , T − h} .
Under appropriate assumptions, this scheme converges to the solution of
HJB eq. when h goes to zero.
I
m and T
In the deterministic case (σ m = 0), Tt,h
t,h are max-plus linear:
v h (t + h, x) = maxi=1,...,N (λi + qit+h (x)) ∀x ⇒
v h (t, x) = maxi=1,...,N (λi + qit (x)) ∀x with qit = Tt,h (qit+h ) .
I
I
I
I
We only need to compute the effect of the dynamic programming
operator on the finite basis qiT , i = 1, . . . , N, for instance by
computing their projection on a fixed basis (see Fleming and
McEneaney (2000) and A.,Gaubert,Lakoua (2008)).
However, the qiT are difficult to compute in general, or the size of the
basis need to be exponential in d.
m (q) is a quadratic form when q is a quadratic form, and if it easy
If Tt,h
to compute (for instance when the Hm correspond to linear quadratic
problems), and if the qiT are quadratic forms, then the qit are finite
suppremum of quadratic forms easy to compute (see McEneaney
(2006)). The number of quadratic forms for v h (0, ·) is exponential in
the number of time step only.
This idea was extended to the stochastic case by McEneaney, Kaise
and Han (2011).
Theorem (McEneaney, Kaise and Han (2011))
Assume δ m = 0, σ m is constant, f m is affine, `m is concave quadratic (with
respect to (x, u)), and ψ is the supremum of a finite number of concave
quadratic forms. Then, for all t ∈ Th , there exists a set Zt and a map
gt : Rd × Zt → R such that for all z ∈ Zt , gt (·, z) is a concave quadratic
form and
v h (t, x) = sup gt (x, z) .
z∈Zt
Moreover, the sets Zt satisfy
Zt = M × {z̄t+h : W → Zt+h | Borel measurable} ,
where W = Rd is the space of values of the Brownian process.
I
Here a concave quadratic form is any map Rd → R of the form
1
d
x 7→ q(x, z) := x T Qx+b·x+c, with z = (Q, b, c) ∈ Qd = S−
d ×R ×R
2
I
The proof uses the max-plus (infinite) distributivity property.
I
In the deterministic case, the sets Zt are finite, and their cardinality is
exponential in time: #Zt = M × #Zt+h = · · · = M Nt × #ZT with
M = #M and Nt = (T − t)/h.
I
In the stochastic case, the sets Zt are infinite as soon as t < T .
I
If the Brownian process is discretized in space, then W can be replaced
by the finite subset with fixed cardinality p, and the sets Zt become
finite.
I
Nevertheless, their cardinality increases doubly exponentially in time:
p Nt −1
#Zt = M × (#Zt+h )p = · · · = M p−1 × (#ZT )p
(p = 2 for the Bernouilli discretization).
Nt
where p ≥ 2
I
Then, McEneaney, Kaise and Han proposed to apply a pruning method
to reduce at each time step t ∈ Th the cardinality of Zt .
I
In this talk, we shall replace pruning by random sampling.
I
The idea is to use only quadratic forms that are optimal in the points
of a sample of the process.
Consider the case with no continuous control u and no discount factor.
I Then ξˆm (t + h) = S m (ξˆm (t), Wt+h − Wt ) with
t,h
m
St,h
(x, w ) = x + f m (x)h + σ m (x)w .
and
m
m
Tt,h
(φ)(x) = h`m (x) + E φ(St,h
(x, Wt+h − Wt )) .
I
I
d
Assume that φ(x) = maxz∈Zt+h q(x, z), Zt+h ⊂ Qd = S−
d × R × R,
1 T
and q(x, z) := 2 x Qx + b · x + c, z = (Q, b, c) ∈ Qd .
Then, for each x ∈ Rd , there exists z̄xm : W → Zt+h measurable s.t.
m (x, W
m
m
φ(St,h
t+h − Wt )) = q St,h (x, Wt+h − Wt ), z̄x (Wt+h − Wt ) .
I
Moreover, under the previous assumptions on `m , f m and σ m , we have,
for all x 0 ∈ Rd ,
m
E h`m (x 0 ) + q St,h
(x 0 , Wt+h − Wt ), z̄xm (Wt+h − Wt ) = q(x 0 , zxm )
for some zxm ∈ Qd , and so
m
Tt,h
(φ)(x) = q(x, zxm ) = sup q(x, zxm0 ) .
x 0 ∈Rd
The sampling algorithm
I
Let M = #M and choose N = (Nin , Nrg ) giving size of samples.
I
Choose ZT ⊂ Qd such that |ψ(x) − maxz∈ZT q(x, z)| ≤ . Define
v h,N (T , x) = maxz∈ZT q(x, z), for x ∈ Rd .
Construct a sample of ((ξˆm (0))m∈M , (Wt+h − Wt )t∈Th ) of size Nin
indexed by ω ∈ ΩNin := {1, . . . , Nin }, and deduce ξˆm (t, ω), m ∈ M.
For t = T − h, T − 2h, . . . , 0 do:
I
I
1. For each ω ∈ ΩNin and m ∈ M, denote x = ξˆm (t, ω), and construct a
subsample of size Nrg of elements (ωi , ωi0 ) of (ΩNin )2 , i ∈ ΩNrg . Let
z̄xm : W → Zt+h (as above), be computed at the points
(Wt+h − Wt )(ωi0 ) only.
Consider
m
q̃(x 0 , w ) = h`m (x 0 ) + q St,h
(x 0 , w ), z̄xm (w ) .
Approximate zxm such that q(x 0 , zxm ) = E[q̃(x 0 , Wt+h − Wt )] by doing a
regression of q̃(ξˆm (t), Wt+h − Wt ) using a (usual) basis of quadratic
forms of ξˆm (t), and the sample (ξˆm (t, ωi ), (Wt+h − Wt )(ωi0 )), i ∈ ΩNrg .
2. Let Zt be the set of the parameters zxm ∈ Qd of all the quadratic forms
obtained in Step 2. Define v h,N (t, x) = maxz∈Zt q(x, z).
The sampling algorithm
I
Several subsamplings of elements (ωi , ωi0 ), i ∈ ΩNrg , of (ΩNin )2 have
been tested:
1. the initial sampling: Nrg = Nin and ωi = ωi0 = i.
2. Nrg = Nx × Nw , choose, once for all ω ∈ ΩNin and m ∈ M, a random
sampling of size Nx of the elements of ΩNin and an independent random
sampling of size Nw and make the product: so ωi and ωi0 are
independent.
3. Do as in Method 2, but choose different samplings for each ω ∈ ΩNin
and m ∈ M, independently.
4. Do as in Method 2, but replace the second random sampling of ΩNin by
ΩNin itself, so Nw = Nin .
5. Do as in Method 2, but replace both random samplings of ΩNin by ΩNin
2
itself, so Nrg = Nin
.
I
For each time step t ∈ Th , we have #Zt ≤ M × Nin , and the number
of computations is in O((M × Nin )2 × Nrg ) and in
O((M × Nin )2 × Nw ) for subsamplings 2,3,4.
We consider a simple problem of pricing and hedging an option with
uncertain volatility and two underlying processes, tested in Kharroubi,
Langrené, Pham (2013).
I
I
d = 2, M = {ρmin , ρmax } with −1 ≤ ρmin ≤ ρmax ≤ 1 .
The dynamics of the processes are given, for all m ∈ M, by f m = 0
and
σ 1 ξ1
0
m
√
, ξ = (ξ1 , ξ2 ) ∈ R2 ,
σ (ξ) =
σ2 mξ2 σ2 1 − m2 ξ2
with σ1 , σ2 > 0, that is
dξi = σi ξi dBi ,
I
hdB1 , dB2 i = µs ds .
The final reward is given by
ψ(ξ) = (ξ1 − ξ2 − K1 )+ − (ξ1 − ξ2 − K2 )+ ,
with x + = max(x, 0), K1 < K2 .
ξ = (ξ1 , ξ2 ) ∈ R2 ,
I
We restrict the state space to the set of ξ ∈ R2+ s.t.
ξ1 − ξ2 ∈ [−100, 100].
I
We take σ1 = 0.4, σ2 = 0.3, K1 = −5, K2 = 5, T = 0.25, and fix the
time discretization step to h = 0.01.
I
M = {ρ} with ρ = −0.8 or ρ = 0.8, or M = {−0.8, 0.8}.
I
When M = {ρ}, the value function is known, so we can compute the
error.
I
We tested in that case each subsampling method: Method 1 (initial
sampling) gives very bad results even at time T − h. Method 5 need
too much space and time even for Nin = 1000. Method 2 seems to be
the best one.
ρ
-0.8
0.8
-0.8
0.8
-0.8
0.8
-0.8
0.8
-0.8
0.8
-0.8
0.8
Nin
1000
1000
1000
1000
1000
1000
100
100
100
100
100
100
Nrg
10000
10000
1000
1000
1000
1000
1000
1000
10000
10000
1000
1000
Nx
10
10
10
10
10
10
10
10
10
10
10
10
Nw
1000
1000
100
100
100
100
100
100
1000
1000
100
100
Nm
2
2
2
2
3
3
2
2
2
2
4
4
e∞
0.521
0.157
0.75
0.36
3.48
3.05
1.95
1.81
2.09
1.79
2.15
1.80
e1
0.173
0.074
0.41
0.11
1.92
0.81
0.46
0.33
0.53
0.36
0.55
0.39
Table : Sup-norm and normalized `1 norm of the error on the value function at
time t = 0, and states ξ2 = 50 and ξ1 ∈ [20, 80], denoted e∞ and e1 resp., when
M = ρ, and the subsampling number Nm is used.
Figure : Value function obtained at t = 0, and ξ2 = 50 as a function of
ξ1 ∈ [20, 80]. Here Nin = 1000, Nrg = Nx × Nw , Nx = 10, Nw = 1000 and the
subsampling method 2 is used. In blue, M = {−0.8}, in green M = {0.8}, and in
black M = {−0.8, 0.8}.
In the general case:
I
Discretize the continuous control u and apply the previous method
would be too expensive since one need to simulate a process for each
m and u.
I
So we need to consider simulations associated to a process
independent of the control u at least.
I
The probabilistic method of Fahim, Touzi and Warin (2011) uses the
simulation of one process only.
I
We shall use the simulation of one process of each discrete control m.
The algorithm of Fahim, Touzi and Warin (2011)
For a fixed m ∈ M, consider the equation:
−
∂v
− Hm (x, v (t, x), Dv (t, x), D 2 v (t, x)) = 0,
∂t
x ∈ Rd , t ∈ [0, T ) .
Decompose the Hamiltonian Hm as Hm = Lm + G m with
Lm (x, r , p, Γ) :=
1
tr (am (x)Γ) + f m (x) · p ,
2
am (x) = σ m (x)σ m (x)T and G m such that ∂Γ G m is positive semidefinite
(that is G m is elliptic).
The idea of the algorithm: if Xtm is the diffusion with generator Lm , and
v m the viscosity solution of the above equation, then Yt = v m (Xtm ) is a
backward process with drift −G m (Xtm , Yt , Zt , Γt ).
Denote by X̂ m the Euler discretization of Xtm :
X̂ m (t + h) = X̂ m (t) + f m (X̂ m (t))h + σ m (X̂ m (t))(Wt+h − Wt ) .
Then, v m is approximated (Fahim, Touzi and Warin) by v m,h satisfying:
m
v m,h (t, x) = Tt,h
(v m,h (t + h, ·))(x),
t ∈ Th ,
and
m
0
0
1
2
Tt,h
(φ)(x) = Dm,t,h
(φ)(x)+hG m (x, Dm,t,h
(φ)(x), Dm,t,h
(φ)(x), Dm,t,h
(φ)(x))
i
with Dm,t,h
(φ) being the approximation of the ith derivative of φ given by
i
i
Dm,t,h
(φ)(x) = E(φ(X̂ m (t + h))Pm,t,x,h
(Wt+h − Wt ) | X̂ m (t) = x) ,
i
where, for all m, t, x, h, i, Pm,t,x,h
is a polynomial of degree i with values in
an appropriate finite dimensional space. For instance, when σ m is the
identity:
0
Pm,t,x,h
= 1,
1
Pm,t,x,h
(w ) =
w
,
h
2
Pm,t,x,h
(w ) =
ww T − hI
.
h2
I
The time approximation of (Fahim, Touzi and Warin) works when
tr(am (x)−1 ∂Γ G m ) ≤ 1 and G m is Lipschitz continuous. Indeed, this
m is a monotone operator over the set of Lipschitz
implies that Tt,h
continuous functions Rd → R:
m
m
φ ≤ ψ =⇒ Tt,h
(φ) ≤ Tt,h
(ψ) .
I
m is further approximated by using a regression scheme.
Tt,h
I
Under some technical assumptions, the above algorithm converges.
I
The above assumptions do not allow in general to handle the case of
the Hamiltonian H directly, since it may be nonsmooth and with
noncomparable diffusion coefficients.
I
Note also that theoretically, the sample size to obtain the convergence
of the estimator is at least in the order of 1/hd/2 . Also the dimension
of the linear regression space should be in this order. The curse of
dimensionality persists, but in practice a much smaller sample size is
sufficient.
Combining idempotent and probabilistic algorithms
I
When the above assumptions are satisfied for the Hm , one consider
the following mixed scheme:
v h (t, x) = Tt,h (v h (t + h, ·))(x),
t ∈ Th ,
with
m
Tt,h (φ)(x) = max Tt,h
(φ)(x) ,
m∈M
and
m
Tt,h
as above, for each m ∈ M.
I
The mixed scheme consists also in the approximation of the above
v h (t, ·) as a finite supremum of quadratic forms, using a sampling
algorithm similar to the previous one, where we apply a regression over
i
the set of quadratic forms to compute Dm,t,h
(φ), when φ is the
supremum of quadratic forms.
I
To do this, we need a result comparable to the one of McEneaney,
Kaise and Han (2011) with the new operator Tt,h .
I
We know that Tt,h is monotone.
The following result generalizes the max-plus distributivity. It says that any
monotone continuous map distributes over max.
Theorem
Let W = Rd and D be the set of measurable functions W → R with at
most exponential growth rate, and and let G be a monotone additively
α-subhomogeneous operator from D → R, for some constant α > 0. Let
(Z , A) be a measurable space, and let W be endowed with its Borel
σ-algebra. Let φ : W × Z → R be a measurable map such that for all
z ∈ Z , φ(·, z) is continuous and belongs to D. Let v ∈ D be such that
v (W ) = supz∈Z φ(W , z). Assume that v is continuous. Then,
G (v ) = sup G (φ̄z̄ )
z̄∈Z
where φ̄z̄ : W → R, W 7→ φ(W , z̄(W )), and
Z = {z̄ : W → Z , measurable and such that φ̄z̄ ∈ D}.
m preserve random quadratic forms, the conclusion
Then, if the operators Tt,h
of the theorem of McEneaney, Kaise and Han (2011) holds for the mixed
scheme.
And a sampling algorithm can be applied to compute the sets Zt such that
v h (t, x) = maxz∈Zt q(x, z).
Moreover, the previous result and method can also be applied to HJB-Isaacs
equations of zero sum games, as soon as the above property holds.
I
However, the assumptions necessary for the scheme of (Fahim, Touzi
and Warin) to work may be difficult to obtain for the Hamiltonians Hm
too: linear quadratic problems have unbounded coefficients, and in
financial applications, the diffusions coefficients are highly
unnoncomparable.
I
Recently, we managed to get rid of these assumptions by using
approximations of the first and second derivative of the function vh by
using different polynomials in (Wt+h − Wt )+ and (Wt+h − Wt )− and
Wt+h − Wt .
I
We hope to use this new scheme to obtain more appealing nunmerical
results in the future.
Conclusion
I
I
I
I
I
We proposed an algorithm to solve HJB equations, combining ideas
included in the idempotent algorithm of McEneaney, Kaise and Han
(2011) and in the probabilistic numerical scheme of Fahim, Touzi and
Warin (2011).
The advantages with respect to the pure probabilistic scheme are that
the regression estimation is over a linear space of small dimension, and
that one may bypass in some cases the restrictive condition of
application of the method.
The advantages with respect to the pure idempotent scheme is that
one may avoid the pruning step: the number of quadratic forms
generated by the algorithm is linear with respect to the sampling size
times the number of discrete controls.
The theoretical results suggest that it can also be applied to Isaacs
equations of zero-sum games.
We find only recently an improvement of the probabilistic scheme of
Fahim, Touzi and Warin (2011), which will allows us to do numerical
tests in high dimension and with fully nonlinear Hamiltonians.