
Introduction to Stochastic Optimization
Part 4: Multi-stage decision problems
Georg Ch. Pflug
April 23, 2009
The problem
ξ = (ξ1, . . . , ξT) is a multivariate time series process (e.g. future interest rates, future asset prices, future demands, etc.).
At times t = 0, t = 1, . . . , t = T − 1, we make decisions x0, . . . , xT−1.
Example: dynamic portfolio management
[Figure: the efficient frontier (risk vs. mu) of six optimal dynamic portfolios ("Efficient frontier multirisk dynport"), together with the wealth trajectories of each portfolio over the time stages 0-3. The frontier points are:
1: mu = 11.9597, risk = 0.02972
2: mu = 11.9945, risk = 0.03434
3: mu = 12.0294, risk = 0.04099
4: mu = 12.0642, risk = 0.05019
5: mu = 12.099, risk = 0.06436
6: mu = 12.1339, risk = 0.08031]
An efficient frontier using the (negative) multiperiod AV@R as risk
functional
Scenario processes
Instead of a single scenario variable ξ, let us now consider a scenario process ξ = (ξ1, . . . , ξT). A decision model is called multi-period if the scenario process has more than one period, and it is called multi-stage if the decisions are to be made at different times, say at times 0, 1, 2, . . .
[Diagram: decisions and observations alternate in time:]
stage 0 (t = 0): decision x0
period 1: observation of the r.v. ξ1
stage 1 (t = 1): decision x1
period 2: observation of the r.v. ξ2
stage 2 (t = 2): decision x2
period 3: observation of the r.v. ξ3
stage 3 (t = 3): decision x3
Tree processes and filtrations
A stochastic process (νt) is called a tree process if the sigma-algebras generated by νt and by (ν1, ν2, . . . , νt) coincide:

σ(νt) = σ(ν1, . . . , νt).

Let Ft = σ(νt). Then F = (F1, . . . , FT) is a filtration, i.e. Ft ⊆ Ft+1. Conversely, every filtration is generated by a tree process. We use the notation

ξt ◁ Ft if ξt is Ft-measurable, and
ξ ◁ F if (ξt) is adapted to the filtration F, i.e. ξt ◁ Ft for all t.
The scenario process (ξt) is adapted to F if and only if the ξt's are functions of νt, i.e.

ξt = ft(νt).

Decisions must be made in the indicated order: first choose x0, then observe ξ1, then choose x1, and so on. Hence the decision sequence (xt) must also be adapted to the filtration F. The requirement

x ◁ F

is called the non-anticipativity constraint. Information is a resource: the value of perfect information can be interpreted as the shadow cost of the lack of information.
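On a finite tree, non-anticipativity can be checked mechanically: whenever two scenarios coincide up to time t, the decisions x0, . . . , xt taken along them must coincide as well. A minimal Python sketch (the scenario table is purely hypothetical):

```python
from itertools import combinations

# Decisions per scenario path: (xi_1, ..., xi_T) -> (x_0, ..., x_{T-1}).
decisions = {
    (1.0, 2.0): (0.5, 0.7),
    (1.0, 3.0): (0.5, 0.7),   # shares xi_1 = 1.0, so x_0 and x_1 must agree
    (2.0, 2.0): (0.5, 0.1),   # x_0 precedes any observation: equal everywhere
}

def non_anticipative(decisions):
    """x_t may depend on (xi_1, ..., xi_t) only: whenever two scenarios
    agree up to time t, their decisions x_0, ..., x_t agree as well."""
    for (s1, x1), (s2, x2) in combinations(decisions.items(), 2):
        for t in range(min(len(x1), len(x2))):
            if s1[:t] == s2[:t] and x1[t] != x2[t]:
                return False
    return True

print(non_anticipative(decisions))   # True for the table above
```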
Tree processes on finite probability spaces are trees
[Figure: a tree process on a finite probability space, drawn as a tree. The root ν0 branches with probabilities 0.5, 0.3 and 0.2 to the nodes 1, 2 and 3 (the values of ν1); these branch further to the leaves 4-9 (the values of ν2):]

node 1 (p = 0.5): → leaf 4 (0.4): ω1, P{ω1} = 0.2; → leaf 5 (0.6): ω2, P{ω2} = 0.3
node 2 (p = 0.3): → leaf 6 (1.0): ω3, P{ω3} = 0.3
node 3 (p = 0.2): → leaf 7 (0.4): ω4, P{ω4} = 0.08; → leaf 8 (0.2): ω5, P{ω5} = 0.04; → leaf 9 (0.4): ω6, P{ω6} = 0.08
For a given tree, denote by n− the predecessor and by n+ the set of successors of a node n.
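In code, such a finite tree is conveniently stored by recording for every node its predecessor n− together with the conditional transition probability; the successor sets n+ then follow. A minimal sketch using the node numbering of the figure above:

```python
from collections import defaultdict

# node -> (predecessor n-, conditional probability of the arc from n-)
pred = {1: (0, 0.5), 2: (0, 0.3), 3: (0, 0.2),
        4: (1, 0.4), 5: (1, 0.6), 6: (2, 1.0),
        7: (3, 0.4), 8: (3, 0.2), 9: (3, 0.4)}

succ = defaultdict(list)              # n+ : the set of successors of n
for n, (m, _) in pred.items():
    succ[m].append(n)

def path_prob(n):
    """Unconditional probability of a node: the product of the
    conditional probabilities along its predecessor path."""
    p = 1.0
    while n in pred:
        n, q = pred[n]
        p *= q
    return p

print(path_prob(8))   # leaf 8 = omega_5: 0.2 * 0.2 = 0.04
```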
The planning horizon is T, that is, the last decision to be taken is xT−1. The "success" (cash-flow, income, change of wealth) Yt within period t is a function Ht of all observations and decisions before stage t, that is,

Yt = Ht(x0, ξ1, . . . , xt−1, ξt).

The objective of multi-stage financial optimization problems is to maximize the acceptability of the whole operation under the non-anticipativity constraints and possibly some additional operating constraints:

Maximize in x = (x0, x1, . . . , xT−1): A[Y1, . . . , YT; F0, . . . , FT−1]
where Yt = Ht(x0, ξ1, . . . , xt−1, ξt),
subject to the non-anticipativity constraint x ◁ F
and possibly some further operational constraints.    (1)
State-space models
Sometimes we may identify states zt, such as wealth, the current portfolio, etc.
[Figure: the dynamics of a state-space decision model. At t = 0 the decision x0 is taken in the state z0; the r.v. ξ1 then produces the state z1; at t = 1 the decision x1 is taken, the r.v. ξ2 drives the next transition, and so on.]
The multi-stage problem in state-space formulation reads

Maximize in x: A[H1(z1), H2(z2), . . . , HT(zT); F]
under the system dynamics zt = gt(zt−1, xt−1, ξt)
with initial condition z0,
subject to the constraints xt ∈ X(zt), t = 1, . . . , T − 1,
and the non-anticipativity constraints x ◁ F.
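Given concrete gt, the dynamics can be rolled forward along any scenario path. A minimal sketch; the linear wealth dynamics g(t, z, x, ξ) = z·ξ + x used here are purely hypothetical:

```python
def simulate(z0, g, x, xi):
    """Roll the dynamics z_t = g_t(z_{t-1}, x_{t-1}, xi_t) forward
    along one scenario path xi = (xi_1, ..., xi_T)."""
    z = [z0]
    for t in range(1, len(xi) + 1):
        z.append(g(t, z[-1], x[t - 1], xi[t - 1]))
    return z

# Hypothetical wealth dynamics: wealth scaled by the return xi_t,
# plus a contribution x_{t-1} decided one stage earlier.
g = lambda t, z, x, xi: z * xi + x
print(simulate(100.0, g, x=[10.0, 0.0], xi=[1.05, 0.98]))
# -> [100.0, 115.0, 112.7] (up to rounding)
```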
Multi-period acceptability and risk functionals
Let Y1, . . . , YT be a discounted cash-flow ("success") process adapted to (Ft). An acceptability functional A(Y1, . . . , YT; F1, . . . , FT) is a real-valued functional defined on a space of adapted processes (Yt) ◁ F with the following properties:

- Information monotonicity: if Ft ⊆ F′t for all t, then
  A(Y1, . . . , YT; F0, . . . , FT−1) ≤ A(Y1, . . . , YT; F′0, . . . , F′T−1).
- Multi-period translation equivariance: if ct ◁ Ft−1, then
  A(Y1, . . . , Yt + ct, . . . , YT; F1, . . . , FT) = E(ct) + A(Y1, . . . , YT; F1, . . . , FT).
- Monotonicity: if Yt(1) ≤ Yt(2) a.s. for t = 1, . . . , T, then
  A(Y1(1), . . . , YT(1); F1, . . . , FT) ≤ A(Y1(2), . . . , YT(2); F1, . . . , FT).
- Concavity: (Y1, . . . , YT) ↦ A(Y1, . . . , YT; F0, . . . , FT−1) is concave.
Examples for multi-period acceptability functionals
The multi-period conditional distortion functional:

A(Y1, . . . , YT; F) = Σt=1,...,T wt E[AJt(Yt | Ft−1)].

Special case: the multi-period AV@R:

AV@Rβ,w(Y1, . . . , YT; F) = Σt=1,...,T wt E[AV@Rβt(Yt | Ft−1)].

Here (wt) are some weights measuring the relative importance of "successes" at different times.
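For discrete distributions both building blocks are elementary: AV@R at level β (in its acceptability form) is the mean of the lower β-tail, and the conditional AV@R at stage t is computed node by node and then averaged. A sketch for a two-stage tree; the tree data, the levels βt and the weights wt are hypothetical:

```python
def avar(values, probs, beta):
    """Acceptability form of AV@R: the mean of the lower beta-tail,
    so larger values mean more acceptable."""
    pairs = sorted(zip(values, probs))        # outcomes in ascending order
    mass, acc = 0.0, 0.0
    for v, p in pairs:
        take = min(p, beta - mass)            # fill the tail up to mass beta
        acc += take * v
        mass += take
        if mass >= beta:
            break
    return acc / beta

# Two-stage tree: (node probability, Y_1 value, [(cond. prob, Y_2 value), ...])
tree = [(0.5,  1.0, [(0.5, 2.0), (0.5, 0.0)]),
        (0.5, -1.0, [(0.4, 1.0), (0.6, -2.0)])]

def multiperiod_avar(tree, betas, weights):
    """w_1 * AV@R(Y_1 | F_0) + w_2 * E[AV@R(Y_2 | F_1)]."""
    stage1 = avar([y for _, y, _ in tree], [p for p, _, _ in tree], betas[0])
    stage2 = sum(p * avar([y for _, y in kids], [q for q, _ in kids], betas[1])
                 for p, _, kids in tree)
    return weights[0] * stage1 + weights[1] * stage2

print(multiperiod_avar(tree, betas=[0.5, 0.5], weights=[0.5, 0.5]))  # -1.0
```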
Distances for stochastic processes
If (Ξ1, d1) and (Ξ2, d2) are Polish spaces, then so is the Cartesian product Ξ1 × Ξ2 with the metric

d²((u1, u2), (v1, v2)) = d1(u1, v1) + d2(u2, v2).

Consider some metric d on Rm which makes it Polish (it need not be the Euclidean one). Writing P1(Ξ) for the Borel probability measures on Ξ with finite first moment (metrized by the Kantorovich distance), we define the following spaces:

Ξ1 = (Rm, d)
Ξ2 = (Rm × P1(Ξ1), d²) = (Rm × P1(Rm, d), d²)
Ξ3 = (Rm × P1(Ξ2), d²) = (Rm × P1(Rm × P1(Rm, d), d²), d²)
...
ΞT = (Rm × P1(ΞT−1), d²)

All spaces Ξ1, . . . , ΞT are Polish spaces and they may carry probability distributions.
Nested distributions
Definition. A Borel probability distribution P with finite first
moment on ΞT is called a nested distribution of depth T .
For any nested distribution P, there is an embedded multivariate
distribution P. The projection from the nested distribution to the
embedded distribution is not injective. Notation for discrete
distributions:
"
probabilities:
values:
0.3
0.4
0.3
3.0
1.0
5.0
#
"
0.4
0.3
0.3
1.0
5.0
3.0
#
"
0.1
0.2
0.4
0.3
3.0
3.0
1.0
5.0
#
Left: A valid distribution. Middle: the same distribution. Right:
Not a valid distribution
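This convention can be checked mechanically: a list of (probability, value) pairs is a valid discrete distribution only if the atoms are pairwise distinct, and two lists denote the same distribution iff they coincide after sorting by value. A sketch:

```python
def is_valid(probs, values):
    """A valid discrete distribution: distinct atoms, total mass one."""
    return len(set(values)) == len(values) and abs(sum(probs) - 1.0) < 1e-9

def canonical(probs, values):
    """Sort the atoms by value so equal distributions compare equal."""
    return tuple(sorted(zip(values, probs)))

left   = ([0.3, 0.4, 0.3], [3.0, 1.0, 5.0])
middle = ([0.4, 0.3, 0.3], [1.0, 5.0, 3.0])
right  = ([0.1, 0.2, 0.4, 0.3], [3.0, 3.0, 1.0, 5.0])

assert is_valid(*left) and is_valid(*middle)
assert canonical(*left) == canonical(*middle)   # the same distribution
assert not is_valid(*right)                     # the atom 3.0 appears twice
```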
Examples for nested distributions
[Figure: a tree of depth 2. The root branches with probabilities 0.5, 0.2 and 0.3 to the stage-1 values 2.4, 3.0 and 3.0; the node 2.4 branches with probabilities 0.6 and 0.4 to the stage-2 values 1.0 and 5.1, the first node 3.0 branches with probabilities 0.4, 0.2 and 0.4 to 6.0, 4.7 and 3.3, and the second node 3.0 leads with probability 1.0 to 2.8.]

In nested notation, this distribution is

P = [ 0.2, [3.0; [0.4, 0.2, 0.4; 6.0, 4.7, 3.3]];
      0.3, [3.0; [1.0; 2.8]];
      0.5, [2.4; [0.6, 0.4; 1.0, 5.1]] ]
The embedded multivariate, but non-nested, distribution of the scenario process can be obtained from it:

[ probabilities:    0.08        0.04        0.08        0.3         0.3         0.2
  values (ξ1, ξ2):  (3.0, 6.0)  (3.0, 4.7)  (3.0, 3.3)  (3.0, 2.8)  (2.4, 1.0)  (2.4, 5.1) ]
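The projection onto the embedded distribution is a straightforward recursion: walk the tree, multiply the conditional probabilities along each path and record the value path. A sketch that reproduces the example above, with subtrees encoded as (value, [(probability, subtree), ...]):

```python
# The nested distribution above: [(prob, (value, children)), ...]
nested = [
    (0.2, (3.0, [(0.4, (6.0, [])), (0.2, (4.7, [])), (0.4, (3.3, []))])),
    (0.3, (3.0, [(1.0, (2.8, []))])),
    (0.5, (2.4, [(0.6, (1.0, [])), (0.4, (5.1, []))])),
]

def embed(children, prob=1.0, path=()):
    """Yield (probability, value path) for every scenario path."""
    for q, (value, kids) in children:
        if kids:
            yield from embed(kids, prob * q, path + (value,))
        else:
            yield prob * q, path + (value,)

for p, path in sorted(embed(nested)):
    print(round(p, 2), path)   # 0.04 (3.0, 4.7) ... 0.3 (3.0, 2.8)
```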
Minimal filtrations

Left (not a valid nested distribution: its two components are identical atoms and must be merged):

[ 0.5, [0; [0.5, 0.5; 0.0, 1.0]];
  0.5, [0; [0.5, 0.5; 0.0, 1.0]] ]

Right (a valid one):

[ 1.0, [0; [0.5, 0.5; 0.0, 1.0]] ]
This fact leads to the concept of minimal filtrations.
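Minimal reduction can be automated: canonicalize each subtree bottom-up and merge siblings whose value and (already reduced) sub-distribution coincide, adding their probabilities. A sketch in the same encoding, applied to the left-hand example above:

```python
def reduce_tree(children):
    """Merge sibling subtrees carrying the same value and the same
    already-reduced sub-distribution; their probabilities are added."""
    merged = {}
    for q, (value, kids) in children:
        key = (value, reduce_tree(kids))
        merged[key] = merged.get(key, 0.0) + q
    return tuple(sorted((q, key) for key, q in merged.items()))

# Both components are identical, so they collapse into one atom of mass 1.
tree = [(0.5, (0, [(0.5, (0.0, [])), (0.5, (1.0, []))])),
        (0.5, (0, [(0.5, (0.0, [])), (0.5, (1.0, []))]))]
print(reduce_tree(tree))
# ((1.0, (0, ((0.5, (0.0, ())), (0.5, (1.0, ()))))),)
```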
Example
[Figure: a binary tree process of depth 3 whose inner nodes all carry the value 0 and whose leaves carry the values 1, 1, 1, 0, 1, 0, 0, 0.] This tree process is already minimal.

[Figure: a binary tree process of depth 3 whose inner nodes all carry the value 0 and whose leaves carry the alternating values 1, 0, 1, 0, 1, 0, 1, 0, shown next to its reduction: a single path of zeros ending in the two values 1 and 0, each with probability 0.5.]
Left: this tree process is not minimal. Right: its minimal reduction.
Main Results
Theorem. Under a mild condition on the basic problem (compound convexity of R), we may w.l.o.g. reduce the problem to solutions which are measurable w.r.t. the minimal filtration.

We define the nested distance between two nested distributions P and P̃ as the Kantorovich distance of these distributions on the metric space ΞT.

Theorem. Let P, P̃ be nested distributions and P, P̃ be the pertaining multi-period distributions. Then

dKA(P, P̃) ≤ dKA(P, P̃),

where the left-hand side is the distance of the multi-period distributions and the right-hand side is the nested distance.

Theorem. If, for the objective R[H(x0, ξ1, . . . , xT−1, ξT)], R and H are Lipschitz, then the approximation error satisfies

e(Opt, Õpt) ≤ C · dKA(P, P̃).
Example for the nested distance
Let P be the nested distribution of the bivariate normal

P = N( (0, 0), [1 1; 1 2] )

and let P̃ be the discrete nested distribution

P̃ = [ 0.30345, [−1.029; [0.30345, 0.3931, 0.30345; −2.058, −1.029, 0.0]];
      0.3931,  [ 0.0;   [0.30345, 0.3931, 0.30345; −1.029, 0.0, 1.029]];
      0.30345, [ 1.029; [0.30345, 0.3931, 0.30345; 0.0, 1.029, 2.058]] ]
[Figure: 3-D bar plots of the two-stage distributions of P (discretized) and P̃.]

The nested distance is d(P, P̃) = 0.82.
The distance of the multiperiod distributions is d(P, P̃) = 0.68.
[Figure: 3-D bar plots for a second discrete approximation P̃ of the same P.]

The nested distance is d(P, P̃) = 1.12.
The distance of the multiperiod distributions is d(P, P̃) = 0.67.
The distance algorithm
Let ν(1) and ν(2) be two tree processes and let the scenario processes (the node values) be ξt(1) = ft(1)(νt(1)) resp. ξt(2) = ft(2)(νt(2)). The distance between the two nested distributions is defined by a backward recursion, through a distance d′t between the nodes of process 1 and process 2 at each stage t.

Let Nt(1) resp. Nt(2) be the node sets at stage t of the tree processes ν(1) resp. ν(2). We define the distances d′t on Nt(1) × Nt(2) for each t. The final value of the distance is the distance between the roots of the two processes.

To calculate the distance, we have to solve Σt=1,...,T #(Nt(1)) · #(Nt(2)) linear optimization problems, where #(Nt(i)) is the number of nodes of process i at stage t.
- End of recursion. For the last stage T, let, for (u, v) ∈ NT(1) × NT(2),

  d′T(u, v) := ρ(fT(1)(u), fT(2)(v)),

  where ρ is any metric on Rm.

- Backward recursion step. Suppose that the distances d′t+1 have been defined on Nt+1(1) × Nt+1(2). Then, for each pair (u, v) ∈ Nt(1) × Nt(2), the conditional distributions of νt+1(1) | νt(1) = u resp. νt+1(2) | νt(2) = v are well defined. Let now

  d′t(u, v) := ρ(ft(1)(u), ft(2)(v)) + d(νt+1(1) | νt(1) = u, νt+1(2) | νt(2) = v),

  where the distance between the conditional distributions of νt+1(1) and νt+1(2) is the Kantorovich distance with ground distance d′t+1.

- Final result. For t = 1, the node sets N1(1) resp. N1(2) are singletons and contain only the roots. The distance d′1 between these roots is the nested distance between the trees.
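The recursion translates directly into code. A minimal sketch, assuming equal-depth trees in the (value, [(probability, child), ...]) encoding used earlier, SciPy's LP solver for the transport subproblems, and ρ(a, b) = |a − b|:

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich(p, q, cost):
    """Kantorovich/transport distance between weight vectors p and q
    for a given cost matrix, via one linear program."""
    m, n = len(p), len(q)
    c = np.asarray(cost, dtype=float).ravel()   # transport plan, row-major
    A_eq, b_eq = [], []
    for i in range(m):                          # row sums equal p
        row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1.0
        A_eq.append(row); b_eq.append(p[i])
    for j in range(n):                          # column sums equal q
        col = np.zeros(m * n); col[j::n] = 1.0
        A_eq.append(col); b_eq.append(q[j])
    return linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                   bounds=(0, None), method="highs").fun

def nested_distance(u, v, rho=lambda a, b: abs(a - b)):
    """d'_t(u, v) = rho(value_u, value_v) + transport distance of the
    successor distributions, costed by d'_{t+1} (backward recursion)."""
    (val_u, kids_u), (val_v, kids_v) = u, v
    base = rho(val_u, val_v)
    if not kids_u or not kids_v:                # last stage reached
        return base
    cost = [[nested_distance(cu, cv, rho) for _, cv in kids_v]
            for _, cu in kids_u]
    return base + kantorovich([p for p, _ in kids_u],
                              [q for q, _ in kids_v], cost)

tree1 = (0.0, [(0.5, (1.0, [])), (0.5, (-1.0, []))])
tree2 = (0.0, [(1.0, (0.0, []))])
print(nested_distance(tree1, tree2))            # 0 + 0.5*1 + 0.5*1 = 1.0
```

Since every node pair at stage t + 1 has a unique pair of parents, each pair is evaluated exactly once, matching the LP count stated above.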
Further examples
[Figure: three scenario trees (tree 1, tree 2, tree 3), each over the time stages 0-3.]

trees 1 and 2: nested distance 3.90, multi-period distance 3.48
trees 1 and 3: nested distance 2.52, multi-period distance 1.77
trees 2 and 3: nested distance 3.79, multi-period distance 3.44
The value of information
Let νt be a tree process and Ft = σ(νt) = σ(ν1, . . . , νt) be the σ-algebra generated by νt. We will work with the standard filtration

F = (F0, . . . , FT)    (2)

resp. the clairvoyant's filtration

F^T = (FT, . . . , FT).    (3)

The standard tree (left) represents the filtration F. The clairvoyant expansion (right) represents the clairvoyant's filtration F^T.
With the short notation H(x, ξ) for H(x0, ξ1, x1, . . . , xT−1, ξT), we may reformulate the multi-stage decision problem as

A0 := max{ A[H(x, ξ)] : x ◁ F }.    (4)

The clairvoyant's problem is

C0 := max{ A[H(x, ξ)] : x ◁ F^T },    (5)

where F^T is the clairvoyant's filtration (3). Notice that the "condition" x ◁ F^T is in fact no restriction at all.

If the functional A is pointwise monotonic, one may interchange the order of the maximization and the application of the functional in the clairvoyant's problem, i.e.

max{ A[H(x, ξ)] : x ◁ F^T } = A[ max{ H(x, ξ) : x ◁ F^T } ].    (6)
Let H̄ denote the inner function in (6), i.e.

H̄(ω1, . . . , ωT) = max{ H(x0, ω1, x1, . . . , xT−1, ωT) : xt ∈ X }.    (7)

The clairvoyant knows this function and obtains the objective value

C0 = A[H̄(ξ1, . . . , ξT)].

It is evident that C0 ≥ A0, since the feasible set of (4) is contained in the feasible set of (5). We may call the difference

D0 = C0 − A0

the value of perfect information. D0 is a measure of multi-period deviation risk.
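For A = E both quantities can be computed on a toy example by brute force: A0 optimizes a single here-and-now decision, while C0 averages the scenario-wise optima. A sketch with a hypothetical newsvendor-type payoff H(x, ξ) = min(x, ξ) − 0.5x:

```python
# Hypothetical demand scenarios (probability, xi) and candidate orders x.
scenarios = [(0.3, 1.0), (0.4, 2.0), (0.3, 3.0)]
orders = [k / 10 for k in range(41)]

def H(x, xi):
    """Sell min(x, xi) at unit price, pay 0.5 per unit ordered."""
    return min(x, xi) - 0.5 * x

# A0: one non-anticipative decision, fixed before xi is observed.
A0 = max(sum(p * H(x, xi) for p, xi in scenarios) for x in orders)
# C0: the clairvoyant chooses x after seeing xi (x is F^T-measurable).
C0 = sum(p * max(H(x, xi) for x in orders) for p, xi in scenarios)

print(A0, C0, C0 - A0)   # 0.7, 1.0, so D0 = 0.3 >= 0
```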
The conditional problem
For any possible value v of the tree process νt, denote by P^{νt=v} the conditional probability given the event {νt = v}. Under P^{νt=v} the variables ν1, . . . , νt−1 sit on the predecessors of v, i.e. their distributions are concentrated on singletons. The variables νt+1, . . . , νT sit only on those nodes which are successors of the node v. In particular, conditioning on a terminal node v leads to a deterministic process, the predecessor path of v.

The same decision problem which is solved on the original tree can also be solved on all conditional subtrees. By the notation A_P[·] we express the fact that the probability measure P governs the tree process. Define now

At(v) = max{ A_{P^{νt=v}}[H(x, ξ)] : x ◁ F }
Ct(v) = max{ A_{P^{νt=v}}[H̄(x, ξ)] : x ◁ F^T },

where H̄ is given by (7). By this construction, (At) and (Ct) are processes which depend on the tree process, i.e. are adapted to F.
The value-of-perfect-information process
Define

Dt = Ct − At

as the value-of-perfect-information process. It is a nonnegative process which describes the evolution of risk in time.

A is called compound convex if

A(Y) ≤ E[A(Y | F)].

Lemma.
(i) If A is compound convex, then the clairvoyant process (Ct) is a submartingale.
(ii) If A is compound linear, then (Ct) is a martingale.
(iii) If A is compound convex, then the optimality process (At) is a submartingale.
(iv) If A is compound linear (in particular, if A is the expectation), then the value-of-information process (Dt) is a supermartingale.
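Part (ii) can be sanity-checked numerically for A = E: then Ct(v) = E[H̄ | νt = v], and the tower property of conditional expectation makes (Ct) a martingale. A sketch on a hypothetical two-stage tree:

```python
# Two-stage tree: (node probability, [(cond. probability, Hbar at leaf), ...])
tree = [(0.5, [(0.4, 3.0), (0.6, 1.0)]),
        (0.5, [(0.5, 2.0), (0.5, 0.0)])]

# C_1 at each stage-1 node: conditional expectation of Hbar over its leaves.
C1 = [sum(q * h for q, h in kids) for _, kids in tree]
# C_0 at the root: E[C_1 | F_0], which is exactly the tower property.
C0 = sum(p * c for (p, _), c in zip(tree, C1))

print(C1, C0)   # [1.8, 1.0] and 1.4: E[C_1] = C_0, one martingale step
```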