Stochastic Optimal Control
Lecture 6: DPP and HJB for Stopping Times
Álvaro Cartea, University of Oxford
January 23, 2017

Overview of course¹
- Deterministic dynamic optimisation
- Stochastic dynamic optimisation
  - Diffusions and jumps
  - Infinitesimal generators
  - Dynamic programming principle
    - Diffusions
    - Jumps
    - Stopping times
  - Examples:
    - Merton portfolio problems
    - Optimal execution of blocks of shares
    - others

¹ Preliminary. Contents may change as we go along.

Setup

For this purpose, let X = (X_t)_{0≤t≤T} denote a vector-valued process of dimension m satisfying the SDE

  dX_t = μ(t, X_t) dt + σ(t, X_t) dW_t + γ(t, X_t) dN_t ,

where μ : [0,T] × R^m → R^m, σ : [0,T] × R^m → R^{m×m}, γ : [0,T] × R^m → R^{m×m}, W is an m-dimensional Brownian motion with independent components, and N is an m-dimensional counting process with intensities λ(t, X_t). The filtration F is the natural one generated by X, and the generator L_t of the process acts on twice-differentiable functions as follows:

  L_t h(t, x) = μ(t, x) · D_x h(t, x) + ½ Tr[ σ(t, x) σ(t, x)' D_x² h(t, x) ]
                + Σ_{j=1}^{m} λ_j(t, x) [ h(t, x + γ_{·j}(t, x)) − h(t, x) ] .

The agent then has a performance criterion, for each F-stopping time τ ∈ T_{[t,T]}, given by

  H^τ(t, x) = E_{t,x}[ G(X_τ) ] ,   (1a)

where G(X_τ) is the reward upon exercise, and she seeks to find the value function

  H(t, x) = sup_{τ ∈ T_{[t,T]}} H^τ(t, x) ,   (1b)

and the stopping time τ* which attains the supremum, if it exists.

The agent is attempting to decide between stopping 'now' and receiving the reward G, or continuing to wait in the hope of receiving a larger reward in the future. She should continue to wait at the point (t, x) ∈ [0,T] × R^m as long as the value function has not attained the value G(x) there. This motivates the definition of the stopping region S, defined as

  S = { (t, x) ∈ [0,T] × R^m : H(t, x) = G(x) } .

- Whenever (t, x) ∈ S, i.e. the state of the system lies in S, it is optimal to stop immediately, since the agent cannot improve beyond G by the definition of the value function in (1).
- The complement of this region, S^c, is known as the continuation region.

DPP for optimal stopping

Theorem (Dynamic Programming Principle for Stopping Problems). The value function (1) satisfies the DPP

  H(t, x) = sup_{τ ∈ T_{[t,T]}} E_{t,x}[ G(X_τ) 1_{τ<θ} + H(θ, X_θ) 1_{τ≥θ} ] ,   (2)

for all (t, x) ∈ [0,T] × R^m and all stopping times θ ≤ T.

- If the optimal stopping time τ* occurs prior to θ, then the agent's value function equals the reward at τ*.
- If the agent has not stopped by θ, then at θ she receives the value function evaluated at the current state of the system.

HJB for optimal stopping

Theorem (Dynamic Programming Equation for Stopping Problems). Assume that the value function H(t, x) is once differentiable in t and that all second-order derivatives in x exist, i.e. H ∈ C^{1,2}([0,T], R^m), and that G : R^m → R is continuous. Then H solves the variational inequality (also known as an obstacle problem, or free-boundary problem)

  max{ ∂_t H + L_t H , G − H } = 0 ,  on D ,   (3)

where D = [0,T] × R^m.

Example: American option

Perpetual call option

Here we price an American call with infinite horizon, i.e. expiry T = ∞. Assume the stock price S satisfies the SDE

  dS_t / S_t = (r − D_0) dt + σ dW_t .   (4)

The agent's value function is

  V(S) = sup_{τ ∈ T} E[ e^{−rτ} max(S_τ − K, 0) ] ,   (5)

where T is the set of F-stopping times bounded by the expiry (here infinite), and K is the strike price of the option.
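The sketch below is not from the slides; it illustrates the performance criterion (1a)/(5) numerically by estimating E[ e^{−rτ} max(S_τ − K, 0) ] via Monte Carlo for one particular stopping rule, namely exercise at the first time S reaches a trial threshold b. Comparing estimates across thresholds mimics the supremum over stopping times in (1b). The parameter values, horizon truncation, time step, and function name are illustrative assumptions only.

```python
import numpy as np

def mc_threshold_value(S0, K, r, D0, sigma, b, T=30.0, dt=0.01, n_paths=10_000, seed=0):
    """Monte Carlo estimate of E[ e^{-r tau} max(S_tau - K, 0) ] for the stopping
    rule tau = first time S reaches the threshold b (horizon truncated at T)."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    S = np.full(n_paths, float(S0))
    alive = np.ones(n_paths, dtype=bool)       # paths that have not stopped yet
    payoff = np.zeros(n_paths)                 # discounted payoff; 0 if b is never reached
    for i in range(1, n_steps + 1):
        z = rng.standard_normal(n_paths)
        # exact GBM step for dS/S = (r - D0) dt + sigma dW, as in (4)
        S *= np.exp((r - D0 - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z)
        hit = alive & (S >= b)                 # first passage above the trial threshold
        payoff[hit] = np.exp(-r * i * dt) * np.maximum(S[hit] - K, 0.0)
        alive &= ~hit
        if not alive.any():
            break
    return payoff.mean()

# Comparing trial thresholds mimics the supremum in (1b)/(5);
# the numbers below are illustrative assumptions, not values from the lecture.
for b in (150.0, 300.0):
    print(b, mc_threshold_value(S0=90.0, K=100.0, r=0.05, D0=0.02, sigma=0.2, b=b))
```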
Assume the DPP holds; then the value function solves

  max{ −rV + L_t V , G − V } = 0 ,  on D ,   (6)

where D = [0,T] × R_+ and the infinitesimal generator is

  L_t = (r − D_0) S ∂_S + ½ σ² S² ∂_{SS} .

Hold region

When it is not optimal to exercise (for a call this is the region S < S_*), we solve

  ½ σ² S² ∂²V/∂S² + (r − D_0) S ∂V/∂S − rV = 0 .   (7)

The trial solution V(S) = S^β leads to a characteristic equation whose roots are

  β_± = ½ ( 1 − 2(r − D_0)/σ² ) ± ½ sqrt( ( 1 − 2(r − D_0)/σ² )² + 8r/σ² ) .

Hence the solution to (7) is given by the linear combination

  V(S) = A S^{β_+} + B S^{β_−} ,

where A and B are constants.

Boundary conditions

- At the boundary the agent exercises the option: C^A(S_*) = max(S_* − K, 0).
- We also know that C^A(S) → 0 as S → 0.
- We must rule out the solution that contains the negative root β_−: since β_− < 0, the term B S^{β_−} would blow up as S → 0, so B = 0.
- At exercise we have C(S_*) = S_* − K, i.e.

  A S_*^{β_+} = S_* − K .

The holder must choose

  A = (S_* − K) / S_*^{β_+}

such that the option value is maximal. Hence

  max_{S_*} (S_* − K) / S_*^{β_+} .   (8)

The first-order condition is

  [ S_*^{β_+} − β_+ (S_* − K) S_*^{β_+ − 1} ] / S_*^{2β_+} = 0 ,   (9)

that is,

  1 − β_+ (S_* − K) S_*^{−1} = 0 .   (10)

Therefore

  S_* = β_+ / (β_+ − 1) · K ,  and  A = K^{1−β_+} (β_+ − 1)^{β_+ − 1} / β_+^{β_+} .

Finally, the value of the perpetual call is

  C(S) =  K/(β_+ − 1) · [ (β_+ − 1) S / (β_+ K) ]^{β_+}   for S < S_* ,
          S − K                                           for S ≥ S_* .
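As a quick numerical companion (not part of the slides), the following Python sketch evaluates the closed-form perpetual call derived above: it computes β_+, the exercise threshold S_*, the coefficient A, and C(S). Parameter values are illustrative assumptions; note that the formula requires β_+ > 1 (e.g. D_0 > 0), otherwise it is never optimal to exercise.

```python
import math

def perpetual_call(S, K, r, D0, sigma):
    """Closed-form perpetual American call, following the formulas in the slides.
    Requires beta_plus > 1 (e.g. D0 > 0); otherwise exercise is never optimal."""
    k = 1.0 - 2.0 * (r - D0) / sigma**2
    beta_plus = 0.5 * k + 0.5 * math.sqrt(k**2 + 8.0 * r / sigma**2)
    S_star = beta_plus / (beta_plus - 1.0) * K       # optimal exercise threshold
    A = (S_star - K) / S_star**beta_plus             # coefficient of S^{beta_plus}
    if S < S_star:
        return A * S**beta_plus                      # continuation (hold) region
    return S - K                                     # stopping region: exercise now

# Illustrative parameters (assumptions, not from the lecture)
print(perpetual_call(S=90.0, K=100.0, r=0.05, D0=0.02, sigma=0.2))
```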