Online Appendices

Appendix A: Proofs
Proof of Lemma 1: Clearly, formulation (2) is a special case of formulation (1). To show the opposite direction, we transform formulation (1) into (2), whose coefficients and states are represented by hat symbols. Let $\hat{n} = np$, $\hat{m} = m + p - 1$, and divide stage $i$ into $p$ sub-stages $\{i_1, \ldots, i_p\}$. At the start of stage $i$, let $\hat{x}_{i_1} \equiv (x_i, y_{i_1}) \in \mathbb{R}^{m+p-1}$ and $y_{i_1} = 0$. The multivariate action $u_i$ can be decomposed as follows. For $j = 1, \ldots, p - 1$, let $\hat{u}_{i_j} = u_i(j)$ (the $j$th component of $u_i$) and $\hat{x}_{i_{j+1}} = (x_i, y_{i_{j+1}})$, where $y_{i_{j+1}}(k) = y_{i_j}(k)$ for $k \neq j$ and $y_{i_{j+1}}(j) = \hat{u}_{i_j}$. In other words, we record the action $u$ by the auxiliary state vector $y$. Meanwhile, let $\hat{S}_{i_j} = S_i$ and $\hat{C}_{i_j} = 0$. For $j = p$, let $\hat{u}_{i_p} = u_i(p)$ and let $\hat{A}_{i_p}$, $\hat{B}_{i_p}$, and $\hat{C}_{i_p}$ be the same as $A_i$, $B_i$, and $C_i$, except that $y_{i_p} = u_i(1 : p - 1)$ is now the state rather than the action. Hence, stage $i_p$ of the new formulation is equivalent to stage $i$ of formulation (1). In terms of the constraint, $F_{S_i,i}' x_i \le D_{S_i,i}' u_i \le G_{S_i,i}' x_i$ is considered only at sub-stage $i_{j'}$, where $u_i(i_{j'})$ is the last component appearing in the constraint, i.e., $D_{S_i,i}(i_{j'}) \neq 0$, while $D_{S_i,i}(i_{j'+1}) = \cdots = D_{S_i,i}(i_p) = 0$. This naturally defines a constraint at sub-stage $i_{j'}$, as the first $j' - 1$ components of $u_i$ are recorded by the endogenous state $y_{i_{j'}}$. It is easy to see that the transformed problem, in the form of (2), is equivalent to the original problem. $\square$
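For concreteness, the following Python sketch illustrates the state-augmentation construction on the dynamics only (the helper name `split_stage` and the array layout are ours, for illustration; the proof above is more general):

```python
import numpy as np

def split_stage(A, B, p):
    """Split one stage with a p-dimensional action u into p scalar-action
    sub-stages, recording the first p-1 components of u in auxiliary
    states y, as in the proof of Lemma 1.
    A: (m, m) and B: (m, p) define x' = A x + B u.
    Returns a list of (A_hat, B_hat) pairs on the augmented state (x, y)."""
    m = A.shape[0]
    mh = m + p - 1                       # augmented dimension m_hat
    stages = []
    for j in range(p - 1):               # sub-stages i_1, ..., i_{p-1}
        A_hat = np.eye(mh)               # carry (x, y) forward unchanged
        B_hat = np.zeros((mh, 1))
        B_hat[m + j, 0] = 1.0            # record u_hat in y(j)
        stages.append((A_hat, B_hat))
    A_hat = np.zeros((mh, mh))           # sub-stage i_p: original dynamics
    A_hat[:m, :m] = A
    A_hat[:m, m:] = B[:, :p - 1]         # recorded components act as states
    B_hat = np.zeros((mh, 1))
    B_hat[:m, 0] = B[:, p - 1]           # last component is the real action
    stages.append((A_hat, B_hat))
    return stages

# Example: m = 2 states, p = 3 action components.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.5, 0.0, 0.2], [0.0, 0.3, 0.1]])
x, u = np.array([1.0, -1.0]), np.array([0.4, -0.2, 0.7])
z = np.concatenate([x, np.zeros(2)])     # augmented state (x, y), y = 0
for (Ah, Bh), uj in zip(split_stage(A, B, 3), u):
    z = Ah @ z + Bh.flatten() * uj
assert np.allclose(z[:2], A @ x + B @ u)  # p sub-stages reproduce one stage
```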
Lemma 2. If (4) holds, then $V_i(x_i, S_i, u_i)$ is strictly convex in $u_i$ for any $(x_i, S_i) \in \mathbb{R}^m \times \{s_1, \ldots, s_l\}$ and $i = 0, \ldots, n - 1$.
Proof of Lemma 2: We show that for any $\lambda \in (0, 1)$ and two feasible actions $u_i^{(1)}$ and $u_i^{(2)}$, $\lambda V_i(x_i, S_i, u_i^{(1)}) + (1 - \lambda) V_i(x_i, S_i, u_i^{(2)}) \ge V_i(x_i, S_i, \lambda u_i^{(1)} + (1 - \lambda) u_i^{(2)})$. For any feasible policy $\{u_t\}_{t=i}^{n}$, the realized cost-to-go is convex by the condition. Because expectation is simply a weighted sum of all realizations, the expected cost-to-go is also convex. Let $\{u_t^{(1)}\}_{t=i+1}^{n}$ and $\{u_t^{(2)}\}_{t=i+1}^{n}$ be the corresponding policies that minimize the cost starting from $i + 1$ after $u_i^{(1)}$ and $u_i^{(2)}$ are taken at $i$. We can define a policy $\{\lambda u_t^{(1)} + (1 - \lambda) u_t^{(2)}\}_{t=i}^{n}$. Clearly, it is a feasible and non-anticipating policy. Its expected cost-to-go is less than $\lambda V_i(x_i, S_i, u_i^{(1)}) + (1 - \lambda) V_i(x_i, S_i, u_i^{(2)})$ by the convexity shown above. Because it is a member of all feasible policies with $u_i = \lambda u_i^{(1)} + (1 - \lambda) u_i^{(2)}$, the expected cost is greater than or equal to $V_i(x_i, S_i, \lambda u_i^{(1)} + (1 - \lambda) u_i^{(2)})$. This completes the proof. $\square$

Proof of Theorem 1:
Now, we prove Theorem 1 by backward induction, naturally leading to the partitioning algorithm. For $i = n$, it is obvious that in the interior of the region $(G_{S_n,n})' x_n \le (F_{S_n,n})' x_n$, the problem is infeasible. We rearrange the objective function of the last stage into $a_{S_n,n} u_n^2 + (b_{S_n,n})' x_n u_n + x_n' c_{S_n,n} x_n$, where $a \in \mathbb{R}$, $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^{m \times m}$. By Lemma 2, $a_{S_n,n} > 0$, and the unconstrained minimizer is $\tilde{u}_n^*(x_n, S_n) = -(b_{S_n,n})' x_n / 2a_{S_n,n}$, a linear function of the state $x_n$. For each $k = 1, \ldots, l$ and $S_n = s_k$, we partition the feasible state space by the linear boundaries $\tilde{u}_n^* \le (F_{S_n,n})' x_n$, $(F_{S_n,n})' x_n \le \tilde{u}_n^* \le (G_{S_n,n})' x_n$ and $(G_{S_n,n})' x_n \le \tilde{u}_n^*$. In each region, we let $u_n^* = (F_{S_n,n})' x_n$, $u_n^* = \tilde{u}_n^*$ and $u_n^* = (G_{S_n,n})' x_n$, respectively, and compute the quadratic value function. For differentiability, it is sufficient to check the boundaries, e.g., $(F_{S_n,n} + b_{S_n,n}/2a_{S_n,n})' x_n = 0$. One can easily verify that the value functions of the two neighboring regions differ by $a_{S_n,n} x_n' L L' x_n$, where $L = F_{S_n,n} + b_{S_n,n}/2a_{S_n,n}$. Therefore, they have equal gradients at the boundary, and $J_n$ is differentiable if $(G_{S_n,n} - F_{S_n,n})' x_n > 0$.
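In other words, the base case clips the unconstrained linear minimizer to the interval $[(F_{S_n,n})' x_n, (G_{S_n,n})' x_n]$. A minimal numerical sketch (the function name and coefficient values are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def last_stage_policy(x, a, b, F, G):
    """Base case of the partitioning algorithm: minimize a*u^2 + (b'x)*u
    subject to F'x <= u <= G'x, with a > 0 (Lemma 2).
    Returns the optimal action and the index of the region containing x."""
    u_tilde = -(b @ x) / (2.0 * a)       # unconstrained linear minimizer
    lo, hi = F @ x, G @ x
    if hi < lo:
        raise ValueError("infeasible state: (G - F)'x < 0")
    if u_tilde <= lo:                    # region 1: lower constraint binds
        return lo, 1
    if u_tilde >= hi:                    # region 3: upper constraint binds
        return hi, 3
    return u_tilde, 2                    # region 2: interior minimizer

# The three regions are separated by the linear boundaries
# (F + b/2a)'x = 0 and (G + b/2a)'x = 0, so the partition is polyhedral.
a, b = 0.5, np.array([1.0, -0.5])
F, G = np.array([0.0, 0.0]), np.array([1.0, 1.0])
for x in [np.array([1.0, 0.2]), np.array([0.1, 1.5]), np.array([-0.5, 2.0])]:
    print(x, last_stage_policy(x, a, b, F, G))
```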
Suppose that the result holds for $i + 1$. At time $i$, we define a refinement $\{P_r^{i+1}\}_{r=1}^{n_{i+1}}$ of all partitions $\{P_r^{k,i+1}\}_{r=1}^{n_{k,i+1}}$, $k = 1, \ldots, l$, i.e., two points belong to the same region in $\{P_r^{i+1}\}_{r=1}^{n_{i+1}}$ if and only if they belong to the same region of all $l$ partitions. For $S_i = s_j$, in the feasible region $(F_{s_j,i} - G_{s_j,i})' x_i \le 0$, consider the set of $(x_i, u_i)$ that transit into the region $P_r^{i+1}$ contained in $P_{r_k}^{k,i+1}$ for $k = 1, \ldots, l$. We require $P_{r_k}^{k,i+1}$ to be a feasible region if $P_{jk}^{(i)} > 0$. The unconstrained minimizer $\tilde{u}_i^*$ is given by a scalar quadratic program:
$$\tilde{u}_i^* = \arg\min_{u_i \in \mathbb{R}} \left\{ \begin{pmatrix} u_i \\ x_i \end{pmatrix}' C_{s_j,i} \begin{pmatrix} u_i \\ x_i \end{pmatrix} + \sum_{k=1}^{l} P_{jk}^{(i)}\, x_{i+1}' N_{r_k}^{k,i+1} x_{i+1} \right\} = \arg\min_{u_i \in \mathbb{R}} \left\{ a_{s_j,i} u_i^2 + (b_{s_j,i})' x_i u_i + x_i' c_{s_j,i} x_i \right\},$$
when we substitute in $x_{i+1} = A_{s_j,i} x_i + B_{s_j,i} u_i$. Because $a$ is positive, as implied by Lemma 2, $\tilde{u}_i^* = -(b_{s_j,i})' x_i / 2a_{s_j,i}$ is a linear function of $x_i$. Because of the differentiability of $J_{i+1}$ and, thus, $V_i(\cdot, s_j, \cdot)$, the minimum of the Bellman equation is attained if and only if one of the following occurs: (1) $\tilde{u}_i^* \le (F_{s_j,i})' x_i$ and $u_i^* = (F_{s_j,i})' x_i$; (2) $(F_{s_j,i})' x_i \le \tilde{u}_i^* \le (G_{s_j,i})' x_i$ and $u_i^* = \tilde{u}_i^*$; or (3) $(G_{s_j,i})' x_i \le \tilde{u}_i^*$ and $u_i^* = (G_{s_j,i})' x_i$. Each corresponds to a region of the state space bounded by the following:

(1) $(A_{s_j,i} + B_{s_j,i} (F_{s_j,i})') x_i \in P_{r_k}^{k,i+1}$, $-(b_{s_j,i})' x_i / 2a_{s_j,i} \le (F_{s_j,i})' x_i$, $(F_{s_j,i} - G_{s_j,i})' x_i \le 0$;

(2) $(A_{s_j,i} - B_{s_j,i} (b_{s_j,i})' / 2a_{s_j,i}) x_i \in P_{r_k}^{k,i+1}$, $(F_{s_j,i})' x_i \le -(b_{s_j,i})' x_i / 2a_{s_j,i} \le (G_{s_j,i})' x_i$;

(3) $(A_{s_j,i} + B_{s_j,i} (G_{s_j,i})') x_i \in P_{r_k}^{k,i+1}$, $(G_{s_j,i})' x_i \le -(b_{s_j,i})' x_i / 2a_{s_j,i}$, $(F_{s_j,i} - G_{s_j,i})' x_i \le 0$;

for all $k = 1, \ldots, l$ such that $P_{jk}^{(i)} > 0$. By the induction hypothesis, $P_{r_k}^{k,i+1}$ is a polyhedron. Hence, the boundaries above are linear in $x_i$. We substitute $u_i^*$, which is linear in $x_i$, into the Bellman equation and compute the value function, which is quadratic in $x_i$.
We then show that the regions defined above for all possible $(r_1, \ldots, r_l)$ are disjoint and form a partition of $\mathbb{R}^m$. If the intersection of two regions had interior points, then for any such interior point there would exist $u_i^1 \neq u_i^2$, both local minimizers, bringing $x_i$ to $x_{i+1}^1$ and $x_{i+1}^2$ in different regions at $i + 1$ for at least one $S_{i+1} = s_k$. However, this possibility is ruled out by Lemma 2 because of the strict convexity of $V_i$ in $u_i$. Moreover, the points that are not covered by any region listed above must be infeasible.
Finally, we show that $J_i(\cdot, s_j)$ is differentiable in the interior of the feasible set of the state space for all $j = 1, \ldots, l$. If a boundary $L' x_i = 0$ is of the type $(F_{S_i,i} + b_{S_i,i}/2a_{S_i,i})' x_i = 0$ or $(G_{S_i,i} + b_{S_i,i}/2a_{S_i,i})' x_i = 0$, the differentiability can be shown as in the case $i = n$. If the boundary is inherited from $i + 1$, then because of the differentiability of $J_{i+1}$, the value function is differentiable at the boundary. Suppose that $J_i(\cdot, s_j)$ is non-differentiable at some $x_i$. Then, we can always find a boundary that contains $x_i$, and the gradients of the value function of the two neighboring regions are not equal at the boundary. This cannot occur because the boundary must be of either or both types. Therefore, we have completed the inductive step. $\square$
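The bookkeeping of the inductive step, forming the common refinement of the $l$ polyhedral partitions, can be sketched as follows. The data structure (each region stored as a list of halfspaces $a' x \le 0$) and the sampling-based nonemptiness test are our simplifying assumptions; an exact implementation would solve a small LP per candidate region.

```python
import itertools
import numpy as np

def common_refinement(partitions, num_samples=20000, seed=0):
    """Common refinement of l polyhedral partitions of R^m.
    Each partition is a list of regions; a region is an (n_c, m) array A
    meaning {x : A x <= 0}.  A candidate intersection is kept if some
    sampled point lies in it (cheap, approximate nonemptiness check)."""
    rng = np.random.default_rng(seed)
    m = partitions[0][0].shape[1]
    pts = rng.normal(size=(num_samples, m))
    refinement = []
    for combo in itertools.product(*partitions):   # one region per partition
        A = np.vstack(combo)                        # intersect halfspaces
        if (pts @ A.T <= 1e-12).all(axis=1).any():  # nonempty (sampled)
            refinement.append(A)
    return refinement

# Two partitions of R^2 by the lines x1 = 0 and x2 = 0:
P1 = [np.array([[ 1.0, 0.0]]), np.array([[-1.0, 0.0]])]   # x1 <= 0, x1 >= 0
P2 = [np.array([[ 0.0, 1.0]]), np.array([[ 0.0,-1.0]])]   # x2 <= 0, x2 >= 0
print(len(common_refinement([P1, P2])))   # 4 quadrants
```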
Proof of Proposition 1: Note that the Bellman equation is in the following form:
$$J_n(x_n, d_n, Q_n) = \left( d_n + \frac{x_n}{2Q_n} \right) x_n,$$
$$J_i(x_i, d_i, Q_i) = \min_{0 \le u_i \le x_i} \left\{ \left( d_i + \frac{u_i}{2Q_i} \right) u_i + \mathrm{E}_i\!\left[ J_{i+1}\!\left( x_i - u_i,\ e^{-\rho_i \Delta t}\!\left( d_i + \frac{u_i}{Q_i} \right),\ Q_{i+1} \right) \right] \right\},$$
$$u_n^*(x_n, d_n, Q_n) = x_n,$$
$$u_i^*(x_i, d_i, Q_i) = \arg\min_{0 \le u_i \le x_i} \left\{ \left( d_i + \frac{u_i}{2Q_i} \right) u_i + \mathrm{E}_i\!\left[ J_{i+1}\!\left( x_i - u_i,\ e^{-\rho_i \Delta t}\!\left( d_i + \frac{u_i}{Q_i} \right),\ Q_{i+1} \right) \right] \right\},$$
where $\mathrm{E}_i[\cdot] = \mathrm{E}[\cdot \,|\, \mathcal{F}_i]$. Similarly, for (7), we have
$$\bar{J}_n(x_n, d_n, Q_n, M_n) = \left( M_n + d_n + \frac{x_n}{2Q_n} \right) x_n,$$
$$\bar{J}_i(x_i, d_i, Q_i, M_i) = \min_{0 \le u_i \le x_i} \left\{ \left( M_i + d_i + \frac{u_i}{2Q_i} \right) u_i + \mathrm{E}_i\!\left[ \bar{J}_{i+1}\!\left( x_i - u_i,\ e^{-\rho_i \Delta t}\!\left( d_i + \frac{u_i}{Q_i} \right),\ Q_{i+1}, M_{i+1} \right) \right] \right\},$$
$$\bar{u}_n^*(x_n, d_n, Q_n, M_n) = x_n,$$
$$\bar{u}_i^*(x_i, d_i, Q_i, M_i) = \arg\min_{0 \le u_i \le x_i} \left\{ \left( M_i + d_i + \frac{u_i}{2Q_i} \right) u_i + \mathrm{E}_i\!\left[ \bar{J}_{i+1}\!\left( x_i - u_i,\ e^{-\rho_i \Delta t}\!\left( d_i + \frac{u_i}{Q_i} \right),\ Q_{i+1}, M_{i+1} \right) \right] \right\}.$$
For parts (i) and (ii), we use backward induction to show that
$$\bar{J}_i(x_i, d_i, Q_i, M_i) = J_i(x_i, d_i, Q_i) + x_i M_i, \qquad \bar{u}_i^*(x_i, d_i, Q_i, M_i) = u_i^*(x_i, d_i, Q_i).$$
At time $t_n$, we have
$$\bar{J}_n(x_n, d_n, Q_n, M_n) = \left( d_n + \frac{x_n}{2Q_n} \right) x_n + x_n M_n = J_n(x_n, d_n, Q_n) + x_n M_n,$$
$$\bar{u}_n^*(x_n, d_n, Q_n, M_n) = x_n = u_n^*(x_n, d_n, Q_n),$$
and the result holds. Suppose that at stage $i + 1$, we have
$$\bar{J}_{i+1}(x_{i+1}, d_{i+1}, Q_{i+1}, M_{i+1}) = J_{i+1}(x_{i+1}, d_{i+1}, Q_{i+1}) + x_{i+1} M_{i+1},$$
$$\bar{u}_{i+1}^*(x_{i+1}, d_{i+1}, Q_{i+1}, M_{i+1}) = u_{i+1}^*(x_{i+1}, d_{i+1}, Q_{i+1}).$$
Then, according to the Bellman equation, we have
$$\begin{aligned}
\bar{J}_i(x_i, d_i, Q_i, M_i) &= \min_{0 \le u_i \le x_i} \left\{ \left( M_i + d_i + \frac{u_i}{2Q_i} \right) u_i + \mathrm{E}_i\!\left[ \bar{J}_{i+1}\!\left( x_i - u_i,\ e^{-\rho_i \Delta t}\!\left( d_i + \frac{u_i}{Q_i} \right),\ Q_{i+1}, M_{i+1} \right) \right] \right\} \\
&= \min_{0 \le u_i \le x_i} \left\{ \left( d_i + \frac{u_i}{2Q_i} \right) u_i + u_i M_i + \mathrm{E}_i\!\left[ J_{i+1}\!\left( x_i - u_i,\ e^{-\rho_i \Delta t}\!\left( d_i + \frac{u_i}{Q_i} \right),\ Q_{i+1} \right) \right] + \mathrm{E}_i\!\left[ (x_i - u_i) M_{i+1} \right] \right\} \\
&= \min_{0 \le u_i \le x_i} \left\{ \left( d_i + \frac{u_i}{2Q_i} \right) u_i + u_i M_i + \mathrm{E}_i\!\left[ J_{i+1}\!\left( x_i - u_i,\ e^{-\rho_i \Delta t}\!\left( d_i + \frac{u_i}{Q_i} \right),\ Q_{i+1} \right) \right] + (x_i - u_i) M_i \right\} \\
&= \min_{0 \le u_i \le x_i} \left\{ \left( d_i + \frac{u_i}{2Q_i} \right) u_i + \mathrm{E}_i\!\left[ J_{i+1}\!\left( x_i - u_i,\ e^{-\rho_i \Delta t}\!\left( d_i + \frac{u_i}{Q_i} \right),\ Q_{i+1} \right) \right] \right\} + x_i M_i \\
&= J_i(x_i, d_i, Q_i) + x_i M_i,
\end{aligned}$$
because $M_i$ is a martingale. Given that the optimization problem of $u_i$ does not involve $M_i$, we can deduce that $u_i^*$ and $\bar{u}_i^*$ must be equal. Thus, the result is proved by induction.
For part (iii), we first show that $J_i(x_i, d_i, Q_i)$ is an increasing function of $d_i$ by backward induction: It is straightforward for $t_n$; the inductive step is given by the Bellman equation
$$J_i(x_i, d_i, Q_i) = \min_{0 \le u_i \le x_i} \left\{ \left( d_i + \frac{u_i}{2Q_i} \right) u_i + \mathrm{E}_i\!\left[ J_{i+1}\!\left( x_i - u_i,\ e^{-\rho_i \Delta t}\!\left( d_i + \frac{u_i}{Q_i} \right),\ Q_{i+1} \right) \right] \right\}.$$
Note that the expectation is increasing in $d_i$ by induction. Thus, $J_i$ increases in $d_i$. Next, in the above Bellman equation, $\rho_i$ appears only in $d_{i+1}$. As $J$ is increasing in $d$, it must be decreasing in $\rho$.
For part (iv), we use backward induction to prove a stronger result: $J_i(x_i, d_i, Q_i') \le J_i(x_i, d_i, Q_i)$ for $Q_i' \ge Q_i$. Note that at time $t_n$, $J_n(x_n, d_n, Q_n) = (d_n + x_n/2Q_n)\, x_n$ is a decreasing function of $Q_n$. If this is true for $i + 1$, then for $i$, we have
$$\begin{aligned}
J_i(x_i, d_i, Q_i) &= \min_{0 \le u_i \le x_i} \left\{ \left( d_i + \frac{u_i}{2Q_i} \right) u_i + \sum_{j=1}^{l} \mathrm{P}(Q_{i+1} = s_j \,|\, Q_i)\, J_{i+1}\!\left( x_i - u_i,\ e^{-\rho_i \Delta t}\!\left( d_i + \frac{u_i}{Q_i} \right),\ s_j \right) \right\} \\
&\ge \min_{0 \le u_i \le x_i} \left\{ \left( d_i + \frac{u_i}{2Q_i'} \right) u_i + \sum_{j=1}^{l} \mathrm{P}(Q_{i+1} = s_j \,|\, Q_i)\, J_{i+1}\!\left( x_i - u_i,\ e^{-\rho_i \Delta t}\!\left( d_i + \frac{u_i}{Q_i'} \right),\ s_j \right) \right\}.
\end{aligned}$$
By induction, $J_{i+1}$ is a decreasing function of $Q_{i+1}$. Thus,
$$\sum_{j=1}^{l} \left( \mathrm{P}(Q_{i+1} = s_j \,|\, Q_i) - \mathrm{P}(Q_{i+1} = s_j \,|\, Q_i') \right) J_{i+1}\!\left( x_i - u_i,\ e^{-\rho_i \Delta t}\!\left( d_i + \frac{u_i}{Q_i'} \right),\ s_j \right) \ge 0.$$
Therefore, we have shown that $J_i(x_i, d_i, Q_i) \ge J_i(x_i, d_i, Q_i')$ for stage $i$ and have thus completed the proof. $\square$
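Proposition 1 can also be checked numerically on a toy instance. The sketch below (our own brute-force discretization with illustrative parameters; $M$ is taken to be a constant, the simplest martingale) evaluates both recursions and verifies the identity $\bar{J}_0 = J_0 + x_0 M_0$ of part (i):

```python
import numpy as np

# Toy instance (illustrative parameters, not the paper's calibration).
n, rho, dt = 2, 0.5, 0.25
states = (0.8, 1.2)                      # market-depth states s_1, s_2
P = np.array([[0.7, 0.3], [0.4, 0.6]])   # transition matrix of Q
U = np.linspace(0.0, 1.0, 41)            # fractions of x to trade

def J(i, x, d, q):
    """Bellman recursion for J_i(x_i, d_i, Q_i) with 0 <= u_i <= x_i."""
    if i == n:
        return (d + x / (2.0 * states[q])) * x        # u_n^* = x_n
    best = np.inf
    for u in U * x:
        cost = (d + u / (2.0 * states[q])) * u
        d1 = np.exp(-rho * dt) * (d + u / states[q])
        best = min(best, cost + sum(P[q, k] * J(i + 1, x - u, d1, k)
                                    for k in range(2)))
    return best

def J_bar(i, x, d, q, M):
    """Same recursion for (7); M is a constant martingale here."""
    if i == n:
        return (M + d + x / (2.0 * states[q])) * x
    best = np.inf
    for u in U * x:
        cost = (M + d + u / (2.0 * states[q])) * u
        d1 = np.exp(-rho * dt) * (d + u / states[q])
        best = min(best, cost + sum(P[q, k] * J_bar(i + 1, x - u, d1, k, M)
                                    for k in range(2)))
    return best

x0, d0, M0 = 1.0, 0.1, 0.05
assert np.isclose(J_bar(0, x0, d0, 0, M0), J(0, x0, d0, 0) + x0 * M0)
print("J_bar = J + x*M verified on the grid")
```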
Proof of Proposition 2: Note that the realized cost can be expressed as follows:
$$\begin{aligned}
\sum_{i=0}^{n} u_i \left( d_i + \frac{u_i}{2Q_i} \right) &= \sum_{i=0}^{n} u_i \left( \exp\!\left( -\sum_{j=0}^{i-1} \rho_j \Delta t \right) d_0 + \sum_{j=0}^{i-1} \exp\!\left( -\sum_{k=j}^{i-1} \rho_k \Delta t \right) \frac{u_j}{Q_j} + \frac{u_i}{2Q_i} \right) \\
&= \sum_{i=0}^{n} \frac{u_i^2}{2Q_i} + \sum_{i<j} \exp\!\left( -\sum_{k=i}^{j-1} \rho_k \Delta t \right) \frac{u_i u_j}{Q_i} + d_0 \sum_{i=0}^{n} \exp\!\left( -\sum_{j=0}^{i-1} \rho_j \Delta t \right) u_i.
\end{aligned}$$
If we index the rows and columns from $0$, then the symmetric quadratic matrix $H \in \mathbb{R}^{(n+1)\times(n+1)}$ is of the form $H_{ii} = 1/2Q_i$ and $H_{ij} = \exp(-\sum_{k=i}^{j-1} \rho_k \Delta t)/2Q_i$ for $0 \le i < j \le n$. If $Q_i > \exp(-2\rho_i \Delta t)\, Q_{i+1}$ for $i = 0, 1, \ldots, n - 1$, then we can perform the Cholesky decomposition of $H$: Let $L$ be a lower triangular matrix with
$$L_{00} = \sqrt{\frac{1}{2Q_0}}, \qquad L_{ii} = \sqrt{\frac{1}{2Q_i} - e^{-2\rho_{i-1}\Delta t}\, \frac{1}{2Q_{i-1}}}, \qquad L_{ij} = e^{-\rho_{i-1}\Delta t} L_{i-1,j},$$
for $0 \le j < i \le n$. It is easy to confirm that $LL' = H$. Because $L_{ii} > 0$, $H$ is positive definite. Therefore, if $s_1 > \exp(-2\rho_i \Delta t)\, s_l$, then the realized cost is always convex in $\{u_i\}_{i=0}^{n}$. The same argument holds for the realized cost starting from stage $i$ for $i = 0, \ldots, n - 1$. Thus, using Lemma 2, we have completed the proof. $\square$
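The closed-form factor can be verified numerically. In the sketch below, $Q$ is generated so that the condition $Q_i > \exp(-2\rho_i \Delta t) Q_{i+1}$ holds; the parameter values are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, dt = 5, 0.1                            # stages 0, ..., n
rho = rng.uniform(0.1, 0.5, size=n)       # resilience rates rho_0..rho_{n-1}
# Enforce Q_i > exp(-2 rho_i dt) Q_{i+1} so that H is positive definite.
Q = np.empty(n + 1)
Q[n] = 1.0
for i in range(n - 1, -1, -1):
    Q[i] = np.exp(-2 * rho[i] * dt) * Q[i + 1] * rng.uniform(1.05, 1.5)

# H_ii = 1/(2 Q_i), H_ij = exp(-sum_{k=i}^{j-1} rho_k dt)/(2 Q_i) for i < j.
H = np.diag(1.0 / (2.0 * Q))
for i in range(n + 1):
    for j in range(i + 1, n + 1):
        H[i, j] = H[j, i] = np.exp(-rho[i:j].sum() * dt) / (2.0 * Q[i])

# Closed-form lower-triangular factor from the proof.
L = np.zeros_like(H)
L[0, 0] = np.sqrt(1.0 / (2.0 * Q[0]))
for i in range(1, n + 1):
    L[i, :i] = np.exp(-rho[i - 1] * dt) * L[i - 1, :i]
    L[i, i] = np.sqrt(1/(2*Q[i]) - np.exp(-2*rho[i-1]*dt) / (2*Q[i-1]))

assert np.allclose(L @ L.T, H)                   # LL' = H
assert np.allclose(np.linalg.cholesky(H), L)     # matches numpy's factor
print("H is positive definite; the realized cost is convex in u")
```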
Proof of Proposition 3: Part (i) is straightforward because the set of all admissible deterministic policies is a subset of $\Theta$. For part (ii), let $\{\tilde{q}_0, \tilde{q}_1, \ldots, \tilde{q}_n\}$ be a deterministic vector valued in the space $\{s_1, \ldots, s_l\}^{n+1}$. We construct a sequence of Markov chains $\{Q_i^{(k)}\}_{0 \le i \le n}$ that converge to $\{\tilde{q}_0, \tilde{q}_1, \ldots, \tilde{q}_n\}$, with volatility $\sigma^{(k)}$. The initial exogenous states $Q_0^{(k)}$ are equal to $\tilde{q}_0$, and their parameters satisfy the following:
$$\alpha = \min_{0 \le i \le n-1}\ \min_{s_j \neq \tilde{q}_{i+1}}\ \left( s_j - \tilde{q}_i - \theta(\mu_i - \tilde{q}_i)\Delta t \right)^2 - \left( \tilde{q}_{i+1} - \tilde{q}_i - \theta(\mu_i - \tilde{q}_i)\Delta t \right)^2 > 0.$$
The condition guarantees that the deterministic chain $\{\tilde{q}_0, \tilde{q}_1, \ldots, \tilde{q}_n\}$ is the only likely sample path in $\{Q_i^{(k)}\}_{0 \le i \le n}$. Let $C_{\mathrm{det}}^{(k)}$ and $C_{\mathrm{sto}}^{(k)}$ be the optimal costs of the deterministic formulation and the MDP formulation when the market depth follows $\{Q_i^{(k)}\}_{0 \le i \le n}$. We will show that if $\sigma^{(k)} \to 0$ as $k \to \infty$, then $\lim_{k \to \infty} C_{\mathrm{det}}^{(k)} - C_{\mathrm{sto}}^{(k)} = 0$.
First, we compute the likelihood of $\{\tilde{q}_0, \tilde{q}_1, \ldots, \tilde{q}_n\}$:
$$\begin{aligned}
&\mathrm{P}\left( \{Q_0^{(k)}, Q_1^{(k)}, \ldots, Q_n^{(k)}\} = \{\tilde{q}_0, \tilde{q}_1, \ldots, \tilde{q}_n\} \right) \\
&\quad = \mathrm{P}\left( Q_1^{(k)} = \tilde{q}_1 \,|\, Q_0^{(k)} = \tilde{q}_0 \right) \mathrm{P}\left( Q_2^{(k)} = \tilde{q}_2 \,|\, Q_1^{(k)} = \tilde{q}_1 \right) \cdots \mathrm{P}\left( Q_n^{(k)} = \tilde{q}_n \,|\, Q_{n-1}^{(k)} = \tilde{q}_{n-1} \right) \\
&\quad = \prod_{i=0}^{n-1} \frac{\exp\!\left( -\left( \tilde{q}_{i+1} - \tilde{q}_i - \theta(\mu_i - \tilde{q}_i)\Delta t \right)^2 / 2(\sigma^{(k)} \tilde{q}_i)^2 \Delta t \right)}{\sum_{j=1}^{l} \exp\!\left( -\left( s_j - \tilde{q}_i - \theta(\mu_i - \tilde{q}_i)\Delta t \right)^2 / 2(\sigma^{(k)} \tilde{q}_i)^2 \Delta t \right)} \\
&\quad = \prod_{i=0}^{n-1} \left( \sum_{j=1}^{l} \exp\!\left( \frac{\left( \tilde{q}_{i+1} - \tilde{q}_i - \theta(\mu_i - \tilde{q}_i)\Delta t \right)^2 - \left( s_j - \tilde{q}_i - \theta(\mu_i - \tilde{q}_i)\Delta t \right)^2}{2(\sigma^{(k)} \tilde{q}_i)^2 \Delta t} \right) \right)^{-1} \\
&\quad \ge \left( \frac{1}{1 + (l-1)\exp\!\left( -\alpha / 2(\sigma^{(k)} \max_j\{s_j\})^2 \Delta t \right)} \right)^{n} = 1 - O\!\left( \exp\!\left( -\frac{\alpha}{2(\sigma^{(k)} \max_j\{s_j\})^2 \Delta t} \right) \right),
\end{aligned}$$
as $k \to \infty$. Therefore, the Markov chain $\{Q_i^{(k)}\}$ converges to the deterministic chain in probability.
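The transition probabilities in this product have a softmax form over the discretized mean-reverting dynamics. The sketch below (placeholder parameters chosen so that the given path is the modal one) illustrates how the path probability approaches one as $\sigma^{(k)} \to 0$:

```python
import numpy as np

def kernel(q, states, theta, mu, sigma, dt):
    """P(Q_{i+1} = s_j | Q_i = q): softmax weights
    exp(-(s_j - q - theta*(mu - q)*dt)^2 / (2*(sigma*q)^2*dt))."""
    z = -(states - q - theta * (mu - q) * dt) ** 2 / (2 * (sigma * q) ** 2 * dt)
    w = np.exp(z - z.max())              # numerically stabilized softmax
    return w / w.sum()

states = np.array([0.5, 1.0, 1.5, 2.0])
theta, mu, dt = 0.8, 2.0, 1.0
path = [1.0, 2.0, 2.0, 2.0]              # modal path for these parameters
for sigma in [0.5, 0.2, 0.05]:           # sigma^(k) -> 0
    p = 1.0
    for q, q_next in zip(path[:-1], path[1:]):
        j = int(np.argmin(np.abs(states - q_next)))
        p *= kernel(q, states, theta, mu, sigma, dt)[j]
    print(sigma, p)                       # path probability tends to 1
```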
Next, consider the deterministic optimization problem of $\{u_i\}$:
$$\begin{aligned}
\min\quad & \sum_{i=0}^{n} u_i \left( d_i + \frac{u_i}{2\tilde{q}_i} \right) \\
\text{s.t.}\quad & \sum_{i=0}^{n} u_i = X, \quad u_i \ge 0, \quad i = 0, 1, \ldots, n, \\
& d_{i+1} = e^{-\rho_i \Delta t} \left( d_i + \frac{u_i}{\tilde{q}_i} \right).
\end{aligned}$$
The objective function is a quadratic function of $u_i$, and the constraints form a closed set. Hence, its minimum can be achieved. Denote the optimal cost and an optimal solution as $\tilde{C}$ and $\{\tilde{u}_i\}$. Let $\{u_i^{(k)}\}$ be any policy of $\Theta$. We use the notation $u_i^{(k)}(\omega)$ to emphasize the stochastic nature of the policy. Now, we have
$$\begin{aligned}
&\mathrm{E}\left[ \sum_{i=0}^{n} u_i^{(k)}(\omega) \left( d_i(\omega) + \frac{u_i^{(k)}(\omega)}{2Q_i^{(k)}(\omega)} \right) \right] \\
&\quad = \mathrm{E}\left[ \sum_{i=0}^{n} u_i^{(k)}(\omega) \left( d_i(\omega) + \frac{u_i^{(k)}(\omega)}{2Q_i^{(k)}(\omega)} \right) \mathbf{1}\!\left( \{Q_0^{(k)}, Q_1^{(k)}, \ldots, Q_n^{(k)}\} = \{\tilde{q}_0, \tilde{q}_1, \ldots, \tilde{q}_n\} \right) \right] \\
&\qquad + \mathrm{E}\left[ \sum_{i=0}^{n} u_i^{(k)}(\omega) \left( d_i(\omega) + \frac{u_i^{(k)}(\omega)}{2Q_i^{(k)}(\omega)} \right) \mathbf{1}\!\left( \{Q_0^{(k)}, Q_1^{(k)}, \ldots, Q_n^{(k)}\} \neq \{\tilde{q}_0, \tilde{q}_1, \ldots, \tilde{q}_n\} \right) \right] \\
&\quad \ge \mathrm{E}\left[ \sum_{i=0}^{n} u_i^{(k)}(\omega) \left( d_i(\omega) + \frac{u_i^{(k)}(\omega)}{2\tilde{q}_i} \right) \mathbf{1}\!\left( \{Q_0^{(k)}, Q_1^{(k)}, \ldots, Q_n^{(k)}\} = \{\tilde{q}_0, \tilde{q}_1, \ldots, \tilde{q}_n\} \right) \right] \\
&\quad \ge \tilde{C}\, \mathrm{E}\left[ \mathbf{1}\!\left( \{Q_0^{(k)}, Q_1^{(k)}, \ldots, Q_n^{(k)}\} = \{\tilde{q}_0, \tilde{q}_1, \ldots, \tilde{q}_n\} \right) \right] \\
&\quad \ge \tilde{C} \left( 1 - O\!\left( \exp\!\left( -\frac{\alpha}{2(\sigma^{(k)} \max\{s_j\})^2 \Delta t} \right) \right) \right).
\end{aligned}$$
Therefore, the optimal policy in $\Theta$ can achieve no better optimum than $\tilde{C}$ when $k$ goes to infinity. However, if we let $u_i^{(k)}(\omega) = \tilde{u}_i$, which is a deterministic policy and a special member of $\Theta$, then the objective function is
$$\begin{aligned}
\mathrm{E}\left[ \sum_{i=0}^{n} \tilde{u}_i \left( d_i(\omega) + \frac{\tilde{u}_i}{2Q_i^{(k)}(\omega)} \right) \right] &= \mathrm{E}\left[ \sum_{i=0}^{n} \tilde{u}_i \left( d_i(\omega) + \frac{\tilde{u}_i}{2Q_i^{(k)}(\omega)} \right) \mathbf{1}\!\left( \{Q_0^{(k)}, Q_1^{(k)}, \ldots, Q_n^{(k)}\} = \{\tilde{q}_0, \tilde{q}_1, \ldots, \tilde{q}_n\} \right) \right] \\
&\qquad + \mathrm{E}\left[ \sum_{i=0}^{n} \tilde{u}_i \left( d_i(\omega) + \frac{\tilde{u}_i}{2Q_i^{(k)}(\omega)} \right) \mathbf{1}\!\left( \{Q_0^{(k)}, Q_1^{(k)}, \ldots, Q_n^{(k)}\} \neq \{\tilde{q}_0, \tilde{q}_1, \ldots, \tilde{q}_n\} \right) \right] \\
&\le \tilde{C} + O\!\left( \exp\!\left( -\frac{\alpha}{2(\sigma^{(k)} \max\{s_j\})^2 \Delta t} \right) \right).
\end{aligned}$$
The second term in the last inequality is due to the boundedness of all variables $u$, $d$ and $q$. Thus, we have shown that $\lim_{k\to\infty} C_{\mathrm{sto}}^{(k)} = \tilde{C}$. Because the policy $\{\tilde{u}_i\}_{i=0}^{n}$ is deterministic, the same argument holds for $C_{\mathrm{det}}^{(k)}$, and we can obtain $\lim_{k\to\infty} C_{\mathrm{det}}^{(k)} = \tilde{C}$. Hence, we have proven that the upper bound converges to the optimal value of the original problem. $\square$
The Upper Bound for the Complexity of Both Applications (Section 2.4): To show that $(2l)^{n+1}$ is an upper bound for the LOB application, note that the terminal stage has two regions in a partition. Because $m = 2$, the refinement of $l$ partitions, each having $k$ regions, generates $lk$ regions. For each such region, wait/buy gives two options. Therefore, backward induction multiplies the complexity by $2l$ at each stage and leads to the upper bound.
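The counting argument amounts to a one-line recursion; the following sketch of the bound (not of the algorithm itself) makes it explicit:

```python
def lob_region_bound(n, l):
    """Region-count recursion for the LOB bound: 2 regions at the terminal
    stage; each backward step refines l partitions (factor l) and splits
    every region into wait/buy (factor 2)."""
    regions = 2
    for _ in range(n):
        regions *= 2 * l
    return regions

for n in range(1, 5):
    l = 3
    assert lob_region_bound(n, l) <= (2 * l) ** (n + 1)
    print(n, lob_region_bound(n, l), (2 * l) ** (n + 1))
```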
To show the bound for the complexity of renewable electricity management, note that the constraint is present only at the terminal stage, when there are two regions. Thus, only the refinement of $l$ partitions creates new regions, which multiplies the complexity by $l$ at each stage. In the case of binomial noise, we have $m = 3$ and $l = 4$, leading to the bound $2l^n$. $\square$

Appendix B: Details of the Numerical Examples in the Paper
When random noise is unobserved: Without loss of generality, consider the following formulation:
$$\min_{\{u_i\}_{i=0}^{n}} \quad \mathrm{E}\left[ \sum_{i=0}^{n} \begin{pmatrix} u_i \\ x_i \end{pmatrix}' C_i \begin{pmatrix} u_i \\ x_i \end{pmatrix} \right] \tag{10}$$
$$\text{s.t.} \quad F_i' x_i \le u_i \le G_i' x_i, \quad i = 0, \ldots, n,$$
$$x_{i+1} = A_i x_i + B_i u_i + \epsilon_i, \quad i = 0, \ldots, n - 1,$$
where $\epsilon_i$ is a Markov chain, and only its distribution conditional on $\epsilon_{i-1}$ (not its realization) is known at stage $i$. This fits into the usual setting of random noise. We next show how (10) can be transformed into (2). Consider the following formulation:
$$\min_{\{u_i\}_{i=0}^{2n}} \quad \mathrm{E}\left[ \sum_{i=0}^{n} \begin{pmatrix} u_{2i} \\ x_{2i} \end{pmatrix}' C_i \begin{pmatrix} u_{2i} \\ x_{2i} \end{pmatrix} \right] \tag{11}$$
$$\text{s.t.} \quad F_{i/2}' x_i \le u_i \le G_{i/2}' x_i, \quad i = 0, 2, \ldots, 2n,$$
$$0 \le u_i \le 0, \quad i = 1, 3, \ldots, 2n - 1,$$
$$x_{i+1} = A_{i/2} x_i + B_{i/2} u_i, \quad i = 0, 2, \ldots, 2n - 2,$$
$$x_{i+1} = x_i + \epsilon_{(i-1)/2}, \quad i = 1, 3, \ldots, 2n - 1,$$
where $\epsilon_{(i-1)/2}$ is observed at $i$ if $i$ is odd, as in formulation (2). In formulation (11), we divide each decision stage $i$ in (10) into a pre-decision stage $2i$, when the decision is made, and a post-decision stage $2i + 1$, when the randomness is realized. It is not difficult to show that (11) is equivalent to (10): Their costs and dynamics are the same considering the even stages of (11); in terms of the information structure, when decision $u_{2i}$ is made in (11), the decision-maker observes $\epsilon_{i-1}$ and knows only the distribution of $\epsilon_i$. Therefore, the setting of unobserved random noises can be incorporated.
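The pre-/post-decision splitting can be mechanized as follows (the dictionary-based stage representation is an assumption made for illustration). Simulating both formulations with the same noise realizations confirms that the even stages of (11) reproduce (10):

```python
import numpy as np

def split_pre_post(A_list, B_list):
    """Expand dynamics x' = A_i x + B_i u + eps_i (formulation (10)) into
    2n alternating stages as in (11): even stages apply (A_i, B_i) with no
    noise; odd stages add the realized noise with the action pinned to 0
    (the constraint 0 <= u <= 0)."""
    stages = []
    for A, B in zip(A_list, B_list):
        stages.append({"A": A, "B": B, "noise": False})          # stage 2i
        stages.append({"A": np.eye(A.shape[0]),
                       "B": np.zeros_like(B), "noise": True})    # stage 2i+1
    return stages

rng = np.random.default_rng(0)
n, m = 4, 2
A_list = [np.eye(m) for _ in range(n)]
B_list = [np.array([[1.0], [0.5]]) for _ in range(n)]
eps = rng.normal(scale=0.1, size=(n, m))
u = rng.uniform(size=n)

x = np.zeros(m)                           # formulation (10)
for i in range(n):
    x = A_list[i] @ x + B_list[i][:, 0] * u[i] + eps[i]

z = np.zeros(m)                           # formulation (11)
for i, st in enumerate(split_pre_post(A_list, B_list)):
    if st["noise"]:
        z = st["A"] @ z + eps[i // 2]     # odd stage: noise enters, u = 0
    else:
        z = st["A"] @ z + st["B"][:, 0] * u[i // 2]
assert np.allclose(x, z)                   # even stages of (11) match (10)
```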
The Matrix Formulation of Example 1: The convexity condition, Assumption 1, is clearly satisfied, since the stagewise cost is convex. The matrix $A \in \mathbb{R}^{11\times 11}$ does not depend on $i$ and $S_i$: $A(1,1) = A(2,2) = A(j, j+1) = 1$ for $j = 2, \ldots, 10$. For the matrix $B \in \mathbb{R}^{11}$, $B_{S_i,i}(2) = 1$ if $S_i = 1$ and $B_{S_i,i}(11) = 1$ if $S_i = 9$. The matrix $C \in \mathbb{R}^{12\times 12}$ is independent of $i$ and $S_i$, with $C(1,1) = C(2,2) = C(3,3) = 1$ and $C(3,2) = C(2,3) = 1$. The matrix $F = 0$, and there is no upper bound. All unstated components of the matrices are zero. To incorporate the terminal cost $c(x_3) = \sum_{i=3}^{12} (x_i - 1)^2$, we can use the terminal value function $V_3(x) = c(x_3 = x) = (x - 1)^2 + (x + d_1 - 2)^2 + \cdots + (x + d_1 + \cdots + d_9 - 10)^2$, which is quadratic in the endogenous state.
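For reference, these matrices can be assembled directly. The sketch below mirrors the description above, with the text's 1-based indices shifted to Python's 0-based convention; storing one $B$ per exogenous state in a dictionary is our layout assumption:

```python
import numpy as np

def example1_matrices():
    """Coefficient matrices of Example 1 as described above (0-based):
    A(1,1)=A(2,2)=1 and A(j,j+1)=1 for j=2..10; B(2)=1 if S_i=1 and
    B(11)=1 if S_i=9; C(1,1)=C(2,2)=C(3,3)=1, C(3,2)=C(2,3)=1; F=0;
    all unstated entries are zero."""
    A = np.zeros((11, 11))
    A[0, 0] = A[1, 1] = 1.0
    for j in range(1, 10):                    # text's j = 2, ..., 10
        A[j, j + 1] = 1.0
    B = {1: np.zeros(11), 9: np.zeros(11)}    # one B per exogenous state
    B[1][1] = 1.0                             # B(2) = 1 when S_i = 1
    B[9][10] = 1.0                            # B(11) = 1 when S_i = 9
    C = np.zeros((12, 12))
    C[0, 0] = C[1, 1] = C[2, 2] = 1.0
    C[1, 2] = C[2, 1] = 1.0
    F = np.zeros(11)
    return A, B, C, F

A, B, C, F = example1_matrices()
print(A.shape, C.shape)                       # (11, 11) (12, 12)
```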
The MDP Formulation of the Electricity Management Problem: We use the pre- and post-decision formulation for unobserved random noise. Let the Markov chain $S_i$ be i.i.d. draws of $(\epsilon^{(1)}, \epsilon^{(2)})$, with four states $s_1 = (1, 1)$, $s_2 = (1, -1)$, $s_3 = (-1, 1)$ and $s_4 = (-1, -1)$, from the distribution $\mathrm{P}(s_1) = \mathrm{P}(s_4) = (1 + \rho)/4$ and $\mathrm{P}(s_2) = \mathrm{P}(s_3) = (1 - \rho)/4$. Let $x = (x, p, d, 1)$, where $1$ represents a state variable that always equals $1$. For $i = 0, 2, 4, \ldots, 2n - 2$,
$$A_{s_j,i} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & \mu \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad B_{s_j,i} = \begin{pmatrix} 1 \\ \nu \\ 0 \\ 0 \end{pmatrix}, \qquad C_{s_j,i} = \begin{pmatrix} 0 & 0 & 1/2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 1/2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix},$$
and there is no constraint (no boundaries associated with $F$ or $G$ are generated in the partition). For $i = 1, 3, \ldots, 2n - 1$,
$$A_{s_j,i} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & \sigma_p\, s_j(1)\sqrt{\Delta t} \\ 0 & 0 & 1 & \sigma_d\, s_j(2)\sqrt{\Delta t} \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad B_{s_j,i} = 0_4, \qquad C_{s_j,i} = 0_{5\times 5}, \qquad F_{s_j,i} = 0_4, \qquad G_{s_j,i} = 0_4.$$
For $i = 2n$,
$$C_{s_j,i} = \begin{pmatrix} \eta & \eta & 0 & -\eta & 0 \\ \eta & \eta & 0 & -\eta & 0 \\ 0 & 0 & 0 & 0 & 0 \\ -\eta & -\eta & 0 & \eta & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix},$$
and the constraint is $\hat{u} \ge 0$. Then, our algorithm can be applied to solve the optimal trading/production problem for the power producer. To reduce complexity, one can also replace the states $d$ and $x$ with their difference $d - x$.
Although we divide a stage into pre- and post-decision stages, the computational cost is significantly less than $(m, 2n, l)$ because $\epsilon_i$ is white noise: at even stages in (11), there is no dependence on the previous information, and no further partitions are generated. In fact, the complexity is still $(m, n, l)$.
Random Coefficient Matrices in the Randomized Computational Experiments (Section 2.4.1): $A$ is the identity matrix of size $m$ for all $(s_j, i)$; the components of $B_{s_j,i}$ are drawn independently and uniformly from $[-1, 1]$; for $C$, we first generate a matrix $L \in \mathbb{R}^{(m+1)\times(m+1)}$ whose components are independent and uniformly distributed in $[-1, 1]$, and then let $C = L^{\mathrm{T}} L$; $F$ is a zero vector of size $m$; the components of $G$ are drawn independently and uniformly from $[0, 1]$; and the components of the transition probability matrix are independent and uniformly distributed in $[0, 1]$ and then normalized by their row sums. Note that for $B$, $C$ and $G$, we generate $l$ sets of coefficient matrices (one for each exogenous state). The coefficients are assumed to be constant over time. We use special forms of $A$ and $F$; otherwise, the problem is likely to be infeasible in the whole state space even for a small $n$.
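This generator is straightforward to implement; a sketch (the function name and array layout are ours):

```python
import numpy as np

def random_coefficients(m, l, rng=None):
    """Random instance generator described above: A = I_m; B and G uniform;
    C = L'L (positive semidefinite by construction, so the stagewise cost
    is convex); F = 0; row-normalized uniform transition matrix.
    One (B, C, G) set per exogenous state, constant over time."""
    rng = rng or np.random.default_rng()
    A = np.eye(m)
    F = np.zeros(m)
    B = [rng.uniform(-1.0, 1.0, size=m) for _ in range(l)]
    G = [rng.uniform(0.0, 1.0, size=m) for _ in range(l)]
    C = []
    for _ in range(l):
        L = rng.uniform(-1.0, 1.0, size=(m + 1, m + 1))
        C.append(L.T @ L)                      # PSD quadratic cost
    P = rng.uniform(0.0, 1.0, size=(l, l))
    P /= P.sum(axis=1, keepdims=True)          # rows sum to one
    return A, B, C, F, G, P

A, B, C, F, G, P = random_coefficients(m=4, l=3, rng=np.random.default_rng(7))
assert np.all(np.linalg.eigvalsh(C[0]) >= -1e-12)   # C is PSD
assert np.allclose(P.sum(axis=1), 1.0)
```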
The Random Experiments for the Applications (Section 2.4.1): For the LOB application, for given $m$, $n$ and $l$, we randomly generate $\rho \sim U(0, 6)$, $T \sim U(0, 5)$, $x_0 \sim U(0, 10000)$, $\sigma \sim U(0, 3)$ and $\theta \sim U(0, 5)$, where $U(a, b)$ denotes a random variable uniformly distributed on $[a, b]$. In the electricity application, $m = 3$ ($(d, x)$ collapsing to the single state $d - x$, the price $p$ and the constant state $1$) and $l = 4$. We randomly generate $T \sim U(1, 24)$, $\sigma_p \sim U(0, 1)$, $\sigma_d \sim U(0, 1000)$, $\mu \sim U(0, 0.2)$, $\eta \sim U(0, 100)$ and $\rho \sim U(0, 0.8)$, and let $\Delta t = 0.2$ and $\nu = 4 \times 10^{-5}$.