Binary Decision Rules for Multistage Adaptive Mixed-Integer Optimization
Dimitris Bertsimas∗
Angelos Georghiou†
August 20, 2014
Abstract

Decision rules provide a flexible toolbox for solving computationally demanding, multistage adaptive optimization problems. There is a plethora of real-valued decision rules that are highly scalable and achieve good quality solutions. On the other hand, existing binary decision rule structures tend to produce good quality solutions at the expense of limited scalability, and are typically confined to worst-case optimization problems. To address these issues, we first propose a linearly parameterised binary decision rule structure and derive the exact reformulation of the decision rule problem. In the cases where the resulting optimization problem grows exponentially with respect to the problem data, we provide a systematic methodology that trades off scalability and optimality, resulting in practical binary decision rules. We also apply the proposed binary decision rules to the class of problems with random recourse and show that they have complexity similar to that of the fixed-recourse problems. Our numerical results demonstrate the effectiveness of the proposed binary decision rules and show that they are (i) highly scalable and (ii) provide high quality solutions.

Keywords. Adaptive Optimization, Binary Decision Rules, Mixed-Integer Optimization.
1 Introduction
In this paper, we use robust optimization techniques to develop efficient solution methods for the class
of multistage adaptive mixed-integer optimization problems. The one-stage variant of the problem can
be described as follows: Given matrices A ∈ Rm×k , B ∈ Rm×q , C ∈ Rn×k , D ∈ Rq×k , H ∈ Rm×k and a
probability measure Pξ supported on set Ξ for the uncertain vector ξ ∈ Rk , we are interested in choosing
∗ Sloan School of Management and Operations Research Center, Massachusetts Institute of Technology, USA, E-mail: [email protected].
† Automatic Control Laboratory, ETH Zurich, Switzerland, E-mail: [email protected], [email protected]
n real-valued functions x(∙) ∈ Rk,n and q binary functions y(∙) ∈ Bk,q in order to solve:

    minimize    Eξ[ (Cξ)ᵀx(ξ) + (Dξ)ᵀy(ξ) ]
    subject to  x(∙) ∈ Rk,n,  y(∙) ∈ Bk,q,                              (1)
                Ax(ξ) + By(ξ) ≤ Hξ,    ∀ξ ∈ Ξ.
Here, Rk,n denotes the space of all real-valued functions from Rk to Rn , and Bk,q the space of binary
functions from Rk to {0, 1}q . Problem (1) has a long history both from the theoretical and practical
point of view, with applications in many fields such as engineering [22, 31], operations management
[5, 29], and finance [13, 14]. From a theoretical point of view, Dyer and Stougie [18] have shown that computing the optimal solution for the class of Problems (1) involving only real-valued decisions is #P-hard, while their multistage variants are believed to be “computationally intractable already when medium-accuracy solutions are sought” [30]. In view of these complexity results, there is a need for computationally efficient solution methods that possibly sacrifice optimality for tractability. To quote
Shapiro and Nemirovski [30], “. . . in actual applications it is better to pose a modest and achievable goal
rather than an ambitious goal which we do not know how to achieve.”
A drastic simplification that achieves this goal is to use decision rules. This functional approximation
restricts the infinite space of the adaptive decisions x(∙) and y(∙) to admit pre-specified structures,
allowing the use of numerical solution methods. Decision rules for real-valued functions have been used since 1974 [19]. Nevertheless, their potential was not fully exploited until recently, when new advances in robust optimization provided the tools to reformulate the decision rule problem as tractable convex optimization problems [4, 7]. Robust optimization techniques were first used by Ben-Tal et al. [6] to reformulate the linear decision rule problem. In this work, real-valued functions are parameterised as linear functions of the random variables, i.e., x(ξ) = xᵀξ for x ∈ Rk. The simple structure of linear decision rules offers the scalability needed to tackle multistage adaptive problems. Even though linear decision rules are known to be optimal in a number of problem instances [1, 11, 23], their simple structure generically sacrifices a significant amount of optimality in return for scalability. To gain back some
degree of optimality, attention was focused on the construction of non-linear decision rules. Inspired by the linear decision rule structure, the real-valued adaptive decisions are parameterised as x(ξ) = xᵀL(ξ), x ∈ Rk′, where L : Rk → Rk′, k′ ≥ k, is a non-linear operator defining the structure of the decision rule. This approximation provides significant improvements in solution quality, while retaining in large part the favourable scalability properties of the linear decision rules. Table 1 summarizes non-linear decision rules from the literature that are parameterized as x(ξ) = xᵀL(ξ), together with the complexity of the induced convex optimization problems. A notable exception of real-valued decision rules
Real-Valued Decision Rule Structures

x(ξ) = xᵀL(ξ)                        Computational Burden    Reference
Linear                               LOP                     [6, 22, 26, 31]
Piecewise linear                     LOP                     [17, 16, 20, 21]
Multilinear                          LOP                     [20]
Quadratic                            SOCOP                   [3, 20]
Power, Monomial, Inverse monomial    SOCOP                   [20]
Polynomial                           SDOP                    [2, 12]
Table 1: The table summarizes the literature on real-valued decision rule structures for which the resulting semi-infinite optimization problem can be reformulated exactly using robust optimization techniques. The computational burden refers to the structure of the resulting optimization problems for the case where the uncertainty set is described by a polyhedral set, with Linear Optimization Problems denoted by LOP, Second-Order Cone Optimization Problems denoted by SOCOP, and Semi-Definite Optimization Problems denoted by SDOP.
that are alternatively parameterized as x(ξ) = max{x1ᵀξ, . . . , xPᵀξ} − max{x̄1ᵀξ, . . . , x̄Pᵀξ}, with xp, x̄p ∈ Rk, is proposed by Bertsimas and Georghiou [10] in the framework of worst-case adaptive optimization. This structure offers near-optimal designs but requires the solution of mixed-integer optimization problems.
In contrast to the plethora of decision rule structures for real-valued decisions, the literature on
discrete decision rules is somewhat limited. There are, however, three notable exceptions that deal with
the case of worst-case adaptive optimization. The first one is the work of Bertsimas and Caramanis [8], where integer decision rules are parameterized as y(ξ) = yᵀ⌈ξ⌉, y ∈ Zk, where ⌈∙⌉ is the component-wise ceiling function. In this work, the resulting semi-infinite optimization problem is further approximated and solved using a randomized algorithm [15], providing only probabilistic guarantees on the feasibility of
the solution. The second binary decision rule structure was proposed by Bertsimas and Georghiou [10],
where the decision rule is parameterized as

    y(ξ) = 1,  if max{y1ᵀξ, . . . , yPᵀξ} − max{ȳ1ᵀξ, . . . , ȳPᵀξ} ≤ 0,        (2)
           0,  otherwise,
with yp, ȳp ∈ Rk, p = 1, . . . , P. This binary decision rule structure offers near-optimal designs at the expense of scalability. Finally, in the recent work by Hanasusanto et al. [24], the binary decision rule is restricted to the so-called “K-adaptable structure”, resulting in binary adaptable policies with K contingency plans. This approximation is an adaptation of the work presented in Bertsimas and Caramanis [9], and its use is limited to two-stage adaptive problems.
The goal of this paper is to develop binary decision rule structures that can be used together with
the real-valued decision rules listed in Table 1 for solving multistage adaptive mixed-integer optimization
problems. The proposed methodology is inspired by the work of Bertsimas and Caramanis [8], and uses
the tools developed in Georghiou et al. [20] to robustly reformulate the problem into a finite dimensional
mixed-integer linear optimization problem. The main contributions of this paper can be summarized as
follows.
1. We propose a linearly parameterized binary decision rule structure, and derive the exact reformulation of the decision rule problem that achieves the best binary decision rule. We prove that for general polyhedral uncertainty sets and arbitrary decision rule structures, the decision rule problem is computationally intractable, resulting in mixed-integer linear optimization problems whose size can grow exponentially with respect to the problem data. To remedy this exponential growth, we use ideas similar to those discussed in Georghiou et al. [20] to provide a systematic trade-off between scalability and optimality, resulting in practical binary decision rules.
2. We apply the proposed binary decision rules to the class of problems with random recourse, i.e., to problem instances where the technology matrix B(ξ) is a given function of the uncertain vector ξ. We provide the exact reformulation of the binary decision rule problem, and we show that the resulting mixed-integer linear optimization problem has complexity similar to that of the fixed-recourse case.
3. We demonstrate the effectiveness of the proposed methods in the context of a multistage inventory
control problem (fixed-recourse problem), and a multistage knapsack problem (random-recourse
problem). We show that for the inventory control problem, we are able to solve problem instances
with 50 time stages and 200 binary recourse decisions, while for the knapsack problem we are able
to solve problem instances with 25 time stages and 1300 binary recourse decisions.
The rest of this paper is organized as follows. In Section 2, we outline our approach for binary decision
rules in the context of one-stage adaptive optimization problems with fixed-recourse, and in Section 3
we discuss the extension to random-recourse problems. In Section 4, we extend the proposed approach
to multistage adaptive optimization problems and in Section 5, we present our computational results.
Throughout the paper, we focus on problem instances involving only binary decision rules for ease of exposition. The extension to problem instances involving both real-valued and binary recourse decisions follows along the same lines and is thus omitted.
Notation. We denote scalar quantities by lowercase, non-boldface symbols and vector quantities by lowercase, boldface symbols, e.g., x ∈ R and x ∈ Rn, respectively. Similarly, scalar- and vector-valued functions are denoted by x(∙) ∈ R and x(∙) ∈ Rn, respectively. Matrices are denoted by uppercase symbols, e.g., A ∈ Rn×m. We model uncertainty by a probability space (Rk, B(Rk), Pξ) and denote the elements of the sample space Rk by ξ. The Borel σ-algebra B(Rk) is the set of events that are assigned probabilities by the probability measure Pξ. The support Ξ of Pξ represents the smallest closed subset
of Rk which has probability 1, and Eξ(∙) denotes the expectation operator with respect to Pξ. Tr(A) denotes the trace of a square matrix A ∈ Rn×n and 1(∙) is the indicator function. Finally, ek denotes the kth canonical basis vector, while e denotes the vector whose components are all ones. In both cases, the dimension will be clear from the context.
2 Binary Decision Rules for Fixed-Recourse Problems
In this section, we present our approach for one-stage adaptive optimization problems with fixed recourse,
involving only binary decisions. Given matrices B ∈ Rm×q , D ∈ Rq×k , H ∈ Rm×k and a probability
measure Pξ supported on set Ξ for the uncertain vector ξ ∈ Rk , we are interested in choosing binary
functions y(∙) ∈ Bk,q in order to solve:
    minimize    Eξ[ (Dξ)ᵀy(ξ) ]
    subject to  y(∙) ∈ Bk,q,                                            (3)
                By(ξ) ≤ Hξ,    ∀ξ ∈ Ξ.
Here, we assume that the uncertainty set Ξ is a non-empty, convex and compact polyhedron

    Ξ = { ξ ∈ Rk : ∃ζ ∈ Rv such that Wξ + Uζ ≥ h, ξ1 = 1 },            (4)
where W ∈ Rl×k , U ∈ Rl×v and h ∈ Rl . Moreover, we assume that the linear hull of Ξ spans Rk . The
parameter ξ1 is set equal to 1 without loss of generality as it allows us to represent affine functions of
the non-degenerate outcomes (ξ2 , . . . , ξk ) in a compact manner as linear functions of ξ = (ξ1 , . . . , ξk ).
If the distribution governing the uncertainty ξ is unknown or if the decision maker is very risk-averse,
then one might choose to alternatively minimize the worst-case costs with respect to all possible scenarios
ξ ∈ Ξ. This can be achieved, by replacing the objective function in Problem (3) with the worst-case ob
jective maxξ ∈Ξ (Dξ)> y(ξ) . By introducing the auxiliary variable τ ∈ R and an epigraph formulation,
we can equivalently write the worst-case problem as the following adaptive robust optimization problem.
minimize
subject to
τ

τ ∈ R, y(∙) ∈ Bk,q , 




>
(Dξ) y(ξ) ≤ τ,





By(ξ) ≤ Hξ,
∀ξ ∈ Ξ.
(5)
Notice that ξ appears on the left-hand side of the constraint (Dξ)ᵀy(ξ) ≤ τ, multiplying the binary decisions y(ξ). Therefore, Problem (5) is an instance of the class of problems with random recourse, which will be investigated in Section 3.
Problem (3) involves a continuum of decision variables and inequality constraints. Therefore, in
order to make the problem amenable to numerical solutions, there is a need for suitable functional
approximations for y(∙).
2.1 Structure of Binary Decision Rules
In this section, we present the structure of the binary decision rules. We restrict the feasible region of
the binary functions y(∙) ∈ Bk,q to admit the following piecewise constant structure:

    y(ξ) = Y G(ξ),   Y ∈ Zq×g,   0 ≤ Y G(ξ) ≤ e,   ∀ξ ∈ Ξ,             (6a)

where G : Rk → {0, 1}g is a piecewise constant function:

    G1(ξ) := 1,   Gi(ξ) := 1(αiᵀξ ≥ βi),   i = 2, . . . , g,            (6b)
for given αi ∈ Rk and βi ∈ R, i = 2, . . . , g. Here, we assume that both the sets {ξ ∈ Ξ : αiᵀξ ≥ βi} and {ξ ∈ Ξ : αiᵀξ ≤ βi} have non-empty interior for all i = 2, . . . , g, and that αiᵀξ − βi ≠ αjᵀξ − βj for all ξ ∈ Ξ, where i, j ∈ {1, . . . , g}, i ≠ j. Since G(∙) is a piecewise constant function, y(ξ) = Y G(ξ), Y ∈ Zq×g, can give rise to an arbitrary integer decision rule. By imposing the additional constraint 0 ≤ Y G(ξ) ≤ e for all ξ ∈ Ξ, y(∙) is restricted to the space of binary decision rules.
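As a concrete illustration, structure (6) can be prototyped in a few lines. The following sketch (Python with NumPy; the instance data αi, βi and the coefficient matrix Y are hypothetical choices, not taken from the paper) builds G(∙) from (6b) and checks the binarity constraint 0 ≤ Y G(ξ) ≤ e on a grid of scenarios.

```python
import numpy as np

def make_G(alphas, betas):
    """Build G : R^k -> {0,1}^g from (6b): G_1 = 1, G_i(xi) = 1(alpha_i^T xi >= beta_i)."""
    def G(xi):
        return np.array([1] + [int(a @ xi >= b) for a, b in zip(alphas, betas)])
    return G

# Hypothetical instance with k = 2, g = 3: xi = (1, xi_2) and two hyperplanes.
alphas = [np.array([0.0, 1.0]), np.array([0.0, -1.0])]   # alpha_2, alpha_3
betas = [0.0, 0.5]                                       # beta_2, beta_3
G = make_G(alphas, betas)

Y = np.array([[0, 1, 0]])   # a candidate coefficient matrix Y in Z^{q x g}, q = 1

def y(xi):
    return Y @ G(xi)        # decision rule y(xi) = Y G(xi) from (6a)

# Check the binarity constraint 0 <= Y G(xi) <= e on a grid of scenarios.
for xi2 in np.linspace(-1.0, 1.0, 21):
    val = y(np.array([1.0, xi2]))
    assert np.all(0 <= val) and np.all(val <= 1)
```

Note that binarity of Y G(ξ) depends on the interplay between Y and G(∙); the check above is exactly what the constraint 0 ≤ Y G(ξ) ≤ e enforces over all of Ξ.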
Applying decision rule (6) to Problem (3) yields the following semi-infinite problem, which involves a finite number of decision variables Y ∈ Zq×g and an infinite number of constraints:

    minimize    Eξ[ (Dξ)ᵀY G(ξ) ]
    subject to  Y ∈ Zq×g,
                B Y G(ξ) ≤ Hξ,                                          (7)
                0 ≤ Y G(ξ) ≤ e,    ∀ξ ∈ Ξ.

Notice that all decision variables appear linearly in Problem (7). Nevertheless, the objective and constraints are non-linear functions of ξ.
The following example illustrates the use of the binary decision rule (6) in a simple instance of Problem (3), motivating the choice of the vectors αi ∈ Rk and βi ∈ R, i = 2, . . . , g, and showing the relationship between Problems (3) and (7).
Example 1 Consider the following instance of Problem (3):

    minimize    Eξ[ y(ξ) ]
    subject to  y(∙) ∈ B2,1,                                            (8)
                y(ξ) ≥ ξ2,    ∀ξ ∈ Ξ,

where Pξ is a uniform distribution supported on Ξ = { (ξ1, ξ2) ∈ R2 : ξ1 = 1, ξ2 ∈ [−1, 1] }. The optimal solution of Problem (8) is y*(ξ) = 1(ξ2 ≥ 0), achieving an optimal value of 1/2. One can attain the same solution by solving the following semi-infinite problem, where G(∙) is defined to be G1(ξ) = 1, G2(ξ) = 1(ξ2 ≥ 0), i.e., α2 = (0, 1)ᵀ, β2 = 0 and g = 2, see Figure 1:

    minimize    Eξ[ yᵀG(ξ) ]
    subject to  y ∈ Z2,
                yᵀG(ξ) ≥ ξ2,                                            (9)
                0 ≤ yᵀG(ξ) ≤ 1,    ∀ξ ∈ Ξ.

The optimal solution of Problem (9) is y* = (0, 1)ᵀ, achieving an optimal value of 1/2, and is equivalent to the optimal binary decision rule in Problem (8).
Figure 1: Plot of the piecewise constant function G2(ξ) = 1(ξ2 ≥ 0) over Ξ. This structure of G(∙) gives rise to piecewise constant decision rules y(ξ) = yᵀG(ξ), for some y ∈ R2. If one imposes the constraints y ∈ Z2 and 0 ≤ yᵀG(ξ) ≤ 1, y(∙) is restricted to the space of binary decision rules induced by G.
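The claims of Example 1 are easy to check numerically. The sketch below (Python with NumPy; the sample size is an arbitrary choice) verifies that the candidate y* = (0, 1)ᵀ is feasible for Problem (9) on sampled scenarios and that its objective value is close to the optimal value 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)

def G(xi):
    # Lifting of Example 1: G_1 = 1, G_2(xi) = 1(xi_2 >= 0).
    return np.array([1, int(xi[1] >= 0)])

y = np.array([0, 1])  # candidate optimal solution y* of Problem (9)

# Sample xi = (1, xi_2) with xi_2 ~ Uniform[-1, 1].
xis = [np.array([1.0, u]) for u in rng.uniform(-1.0, 1.0, 50_000)]

# Feasibility: y^T G(xi) >= xi_2 and 0 <= y^T G(xi) <= 1 on every sample.
assert all((0 <= y @ G(xi) <= 1) and (y @ G(xi) >= xi[1]) for xi in xis)

# Monte Carlo estimate of the objective E[y^T G(xi)]; the optimum of (9) is 1/2.
estimate = np.mean([y @ G(xi) for xi in xis])
assert abs(estimate - 0.5) < 0.02
```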
In the following section, we present robust reformulations of the semi-infinite Problem (7).
2.2 Computing the Best Binary Decision Rule
In this section, we present an exact reformulation of Problem (7). The solution of the reformulated problem will achieve the best binary decision rule associated with structure (6). The main idea used in this section is to map Problem (7) to an equivalent lifted adaptive optimization problem on a higher-dimensional probability space. This will allow us to represent the non-convex constraints with respect to ξ in Problem (7) as linear constraints in the lifted adaptive optimization problem. The relation between the uncertain parameters in the original and the lifted problems is determined through the piecewise constant operator G(∙). By taking advantage of the linear structure of the lifted problem, we will employ linear duality arguments to reformulate the semi-infinite structure of the lifted problem into a finite dimensional mixed-integer linear optimization problem. The work presented in this section uses the tools developed by Georghiou et al. [20, Section 3].
We define the non-linear operator

    L : Rk → Rk′,    L(ξ) = (ξᵀ, G(ξ)ᵀ)ᵀ,                              (10)

where k′ = k + g, and define the projection matrices Rξ ∈ Rk×k′ and RG ∈ Rg×k′ such that

    Rξ L(ξ) = ξ,    RG L(ξ) = G(ξ).
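In code, the lifting (10) and the projections Rξ, RG amount to a concatenation and two selection matrices. A minimal sketch (Python with NumPy, using the two-dimensional G(∙) of Example 1 as a stand-in):

```python
import numpy as np

k, g = 2, 2
kp = k + g   # lifted dimension k' = k + g

def G(xi):
    # Stand-in G from Example 1: G_1 = 1, G_2(xi) = 1(xi_2 >= 0).
    return np.array([1.0, float(xi[1] >= 0)])

def L(xi):
    # Lifting operator (10): stack xi on top of G(xi).
    return np.concatenate([xi, G(xi)])

# Projection matrices recovering xi and G(xi) from L(xi).
R_xi = np.hstack([np.eye(k), np.zeros((k, g))])   # R_xi in R^{k x k'}
R_G = np.hstack([np.zeros((g, k)), np.eye(g)])    # R_G in R^{g x k'}

xi = np.array([1.0, 0.3])
assert np.allclose(R_xi @ L(xi), xi)
assert np.allclose(R_G @ L(xi), G(xi))
```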
We will refer to L(∙) as the lifting operator. Using L(∙) and Rξ, RG, we can rewrite Problem (7) as the following optimization problem:

    minimize    Eξ[ (D Rξ L(ξ))ᵀ Y RG L(ξ) ]
    subject to  Y ∈ Zq×g,
                B Y RG L(ξ) ≤ H Rξ L(ξ),                                (11)
                0 ≤ Y RG L(ξ) ≤ e,    ∀ξ ∈ Ξ.
Notice that this simple reformulation still has an infinite number of constraints. Nevertheless, due to the structure of Rξ and RG, the constraints are now linear with respect to L(ξ). The objective function of Problem (11) can be further reformulated as

    Eξ[ (D Rξ L(ξ))ᵀ Y RG L(ξ) ] = Tr( M Rξᵀ Dᵀ Y RG ),

where M ∈ Rk′×k′, M = Eξ(L(ξ)L(ξ)ᵀ), is the second-order moment matrix of the uncertain vector L(ξ). In some special cases where Pξ has a simple structure, e.g., Pξ is a uniform distribution, and G(∙) is not too complicated, M can be calculated analytically. If this is not the case, an arbitrarily good approximation of M can be calculated using Monte Carlo simulation. This is not computationally demanding as it does not involve solving an optimization problem, and can be done offline.
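The Monte Carlo estimation of M mentioned above can be sketched as follows (Python with NumPy, again for the Example 1 lifting; the sample size is an arbitrary choice). For ξ2 ~ Uniform[−1, 1] two entries of M are known in closed form, which gives a sanity check:

```python
import numpy as np

rng = np.random.default_rng(1)

# Lifting for Example 1: L(xi) = (xi_1, xi_2, G_1(xi), G_2(xi)) with G_2 = 1(xi_2 >= 0).
N = 200_000
xi2 = rng.uniform(-1.0, 1.0, N)
Lmat = np.column_stack([np.ones(N), xi2, np.ones(N), (xi2 >= 0).astype(float)])

# Monte Carlo estimate of the second-order moment matrix M = E[L(xi) L(xi)^T].
M = (Lmat.T @ Lmat) / N

# Sanity checks against entries known analytically for xi_2 ~ Uniform[-1, 1]:
# E[xi_2^2] = 1/3 and E[xi_2 * 1(xi_2 >= 0)] = 1/4.
assert abs(M[1, 1] - 1/3) < 0.01
assert abs(M[1, 3] - 1/4) < 0.01
```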
0
We now define the uncertain vector ξ 0 = (ξ10> , ξ20> )> ∈ Rk , such that ξ 0 := L(ξ) = (ξ > , G(ξ)> )> . ξ 0
0
0 has probability measure Pξ 0 which is defined on the space Rk , B Rk
and is completely determined
by the probability measure Pξ through the relation
Pξ 0 (B 0 ) := Pξ
0
{ξ ∈ Rk : L(ξ) ∈ B 0 }
∀B 0 ∈ B(Rk ).
(12)
o
n
0
Ξ0 := L(Ξ) = ξ 0 ∈ Rk : ∃ξ ∈ Ξ such that L(ξ) = ξ 0 ,
o
n
0
= ξ 0 ∈ Rk : Rξ ξ 0 ∈ Ξ, L(RG ξ 0 ) = ξ 0 ,
(13)
We also introduce the probability measure support
and the expectation operator Eξ 0 (∙) with respect to the probability measure Pξ 0 . We will refer to Pξ
0
and
Ξ0 as the lifted probability measure and uncertainty set, respectively. Notice that although Ξ is a convex
polyhedron, Ξ0 can be highly non-convex due to the non-linear nature of L(∙). Using the definition of ξ 0
we introduce the following lifted adaptive optimization problem :
minimize
Tr M 0 Rξ> D > Y RG
subject to
Y ∈ Zq×g ,
0






0
B Y RG ξ ≤ HRξ ξ ,





0 ≤ Y RG ξ ≤ e,
0
0
0
(14)
∀ξ 0 ∈ Ξ0 ,
where M 0 ∈ Rk ×k , M 0 = Eξ 0 (ξ 0 ξ 0> ) is the second order moment matrix associated with ξ 0 . Since there
is an one-to-one mapping between Pξ and Pξ 0 , M = M 0 .
Proposition 1 Problems (11) and (14) are equivalent in the following sense: both problems have the
same optimal value, and there is a one-to-one mapping between feasible and optimal solutions in both
problems.
Proof See [20, Proposition 3.6 (i)].
The lifted uncertainty set Ξ′ is an open set due to the discontinuous nature of G(∙). This can be problematic in an optimization framework. We now define Ξ̄′ := cl(Ξ′) to be the closure of the set Ξ′, and introduce the following variant of Problem (14):

    minimize    Tr( M′ Rξᵀ Dᵀ Y RG )
    subject to  Y ∈ Zq×g,
                B Y RG ξ′ ≤ H Rξ ξ′,                                    (15)
                0 ≤ Y RG ξ′ ≤ e,    ∀ξ′ ∈ Ξ̄′.
The following proposition demonstrates that Problems (14) and (15) are in fact equivalent.
Proposition 2 Problems (14) and (15) are equivalent in the following sense: both problems have the
same optimal value, and there is a one-to-one mapping between feasible and optimal solutions in both
problems.
Proof To prove the assertion, it is sufficient to show that Problems (14) and (15) have the same feasible region. Notice that since the constraints are linear in ξ′, the semi-infinite constraints in Problem (14) can be equivalently written as

    (H Rξ − B Y RG) ∈ (cone(Ξ′)*)m,
    (Y RG) ∈ (cone(Ξ′)*)q,
    (e e1ᵀ − Y RG) ∈ (cone(Ξ′)*)q,

where cone(Ξ′)* is the dual cone of cone(Ξ′). Similarly, the semi-infinite constraints in Problem (15) can be equivalently written as

    (H Rξ − B Y RG) ∈ (cone(Ξ̄′)*)m,
    (Y RG) ∈ (cone(Ξ̄′)*)q,
    (e e1ᵀ − Y RG) ∈ (cone(Ξ̄′)*)q.

From [28, Corollary 6.21], we have that cone(Ξ′)* = cone(Ξ̄′)*, and therefore we can conclude that the feasible regions of Problems (14) and (15) coincide.
Problem (15) is linear in both the decision variables Y and the uncertain vector ξ′. Despite this nice bilinear structure, in the following we demonstrate that Problems (11) and (15) are generically intractable for decision rules of type (6).

Theorem 1 Problems (11) and (15), defined through L(∙) in (10) and G(∙) in (6b), are NP-hard even when Y ∈ Zq×g is relaxed to Y ∈ Rq×g.
Proof See Appendix A.
Theorem 1 provides a rather disappointing result on the complexity of Problems (11) and (15). Therefore, unless P = NP, there is no algorithm that solves generic problems of type (11) and (15) in polynomial time. This difficulty stems from the generic structure of G(∙) in (6b), combined with a generic polyhedral uncertainty set Ξ, resulting in highly non-convex sets Ξ̄′. Nevertheless, in the following we derive the exact polyhedral representation of conv(Ξ̄′), allowing us to use linear duality arguments from [4, 7] to reformulate Problem (15) into a mixed-integer linear optimization problem. We remark that Theorem 1 also covers decision rule problems involving real-valued, piecewise constant decision rules constructed using (6b), and demonstrates that these problems are also computationally intractable.
0
We construct the convex hull of Ξ by taking convex combinations of its extreme points, denoted by
0
0
ext(Ξ ). These points are either the extreme points of Ξ 0 , or the limit points Ξ \Ξ0 . We construct these
points as follows: First, using the definition of G(∙) we partition Ξ into P polyhedra, Ξp . Then, for all
extreme points of Ξp , ξ ∈ ext(Ξp ), we calculate the one-side limit point at L(ξ). The set of all one-side
0
limit points will coincides with the set of extreme points of Ξ , see Figure 3. The partitions of Ξ can be
formulated as follows:
Ξp =



 ξ ∈ Ξ : α> ξ ≥ βi , i ∈ Gp ⊆ {2, . . . , g}, 

i


α>
i ξ ≤ βi , i ∈ {2, . . . , g}\Gp


,
p = 1, . . . , P.
(16)
Here, Gp is constructed such that the number of partitions P is maximized, while ensuring that all Ξp are (a) non-empty, (b) any partition pair Ξi and Ξj, i ≠ j, can overlap only on one of their facets, and (c) Ξ = ∪_{p=1}^{P} Ξp, see Figure 2.

Figure 2: The uncertainty set Ξ is partitioned into polyhedra Ξp, p = 1, . . . , 4, using the hyperplanes α1ᵀξ = β1 and α2ᵀξ = β2, such that Ξ = Ξ1 ∪ . . . ∪ Ξ4.

Depending on the choices of αi and βi, the number of partitions can be up to P = 2^g. We define V and V(p) such that

    V := ∪_{p=1}^{P} V(p),    V(p) := ext(Ξp), p = 1, . . . , P.        (17)
Due to the discontinuity of G(∙), the set of points {ξ′ ∈ Rk′ : ξ′ = L(ξ), ξ ∈ V} does not contain the points Ξ̄′\Ξ′. To construct these points, we now define the one-sided limit points L̂p(ξ) = (ξᵀ, Ĝp(ξ)ᵀ)ᵀ for all ξ ∈ V(p) and each partition p = 1, . . . , P. Here, Ĝp : Rk → Rg is given by

    Ĝp(ξ) := lim_{u ∈ Ξp, u → ξ} G(u),    ∀ξ ∈ V(p), p = 1, . . . , P.  (18)

From definition (6b), for each partition p, G(ξ) is constant for all ξ in the interior of Ξp, which is denoted by int(Ξp). Therefore, for each ξ̃ ∈ V(p), the one-sided limit Ĝp(ξ̃) is equal to G(ξ) for all ξ ∈ int(Ξp). Furthermore, since each partition Ξp spans Rk, there exist at least k + 1 vertices in Ξp and, therefore, at least k + 1 one-sided limit points Ĝp(ξ). From definition (18), we have that the points L̂p(ξ) are the extreme points of Ξ̄′ for all ξ ∈ V(p) and p = 1, . . . , P, see Figure 3.
Figure 3: Convex hull representation of Ξ̄′ induced by the lifting ξ′ = (ξ′1, ξ′2) = L(ξ) = (ξ, G(ξ))ᵀ, where G(ξ) = 1(ξ ≥ 5) and ξ ∈ [0, 10]. Here, V = {ext(Ξ1), ext(Ξ2)} = {0, 5, 10}. The convex hull is constructed by taking convex combinations of the points L(0), L(5), L(10) (black dots) and the point (5, Ĝ1(5))ᵀ (white dot), where Ĝ1(5) is the one-sided limit at ξ = 5 using partition Ξ1.
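For the one-dimensional example of Figure 3, the vertex set of conv(Ξ̄′) can be enumerated directly. The following sketch (plain Python) lists, for each partition, the one-sided limit points L̂p(v) at the partition vertices and recovers the four extreme points shown in the figure:

```python
# One-dimensional example of Figure 3: G(xi) = 1(xi >= 5) on Xi = [0, 10],
# partitioned into Xi_1 = [0, 5] and Xi_2 = [5, 10].
partitions = {1: [0.0, 5.0], 2: [5.0, 10.0]}  # partition p -> its vertices ext(Xi_p)

def G_hat(p, v):
    # One-sided limit (18): G is constant on the interior of each partition,
    # so the limit only depends on which partition we approach from.
    return 0.0 if p == 1 else 1.0

# The extreme points of cl(Xi') are the one-sided limit points (v, G_hat_p(v))
# over all partition vertices; note the two distinct points above xi = 5.
vertices = sorted({(v, G_hat(p, v)) for p, vs in partitions.items() for v in vs})
assert vertices == [(0.0, 0.0), (5.0, 0.0), (5.0, 1.0), (10.0, 1.0)]
```

The point (5.0, 0.0) is the white dot of Figure 3; the remaining three are the black dots L(0), L(5), L(10).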
The following proposition gives the polyhedral representation of the convex hull of Ξ̄′.

Proposition 3 Let L(∙) be given by (10) and G(∙) be defined in (6b). Then, the tight, closed, outer approximation of the convex hull of Ξ̄′ is given by the following polyhedron:

    conv(Ξ̄′) = { ξ′ = (ξ′1ᵀ, ξ′2ᵀ)ᵀ ∈ Rk′ : ∃ζp(v) ∈ R+, ∀v ∈ V(p), p = 1, . . . , P, such that

                 Σ_{p=1}^{P} Σ_{v∈V(p)} ζp(v) = 1,

                 ξ′1 = Σ_{p=1}^{P} Σ_{v∈V(p)} ζp(v) v,                  (19)

                 ξ′2 = Σ_{p=1}^{P} Σ_{v∈V(p)} ζp(v) Ĝp(v) }.
Proof The proof is split in two parts: First, we prove that L(Ξ) ⊆ conv(Ξ̄′) holds, and then we show that conv(cl(L(Ξ))) ⊇ conv(Ξ̄′) is true as well, concluding that conv(cl(L(Ξ))) = conv(Ξ̄′).

To prove the assertion L(Ξ) ⊆ conv(Ξ̄′), pick any ξ ∈ Ξ. By construction, ξ belongs to some partition of Ξ, say Ξp, for which L(ξ) is an element of Ξ′. There are two possibilities: (a) ξ lies on the boundary of the partition and hence on one of its facets, or (b) ξ lies in the interior of Ξp. If ξ lies on a facet, then since Ξp spans Rk, Carathéodory's Theorem implies that there are k extreme points of the facet, v1, . . . , vk ∈ V(p), such that for some nonnegative weights δ(vi) with Σ_{i=1}^{k} δ(vi) = 1, we have ξ = Σ_{i=1}^{k} δ(vi) vi. Since G(∙) is constant on each facet, the k one-sided limit points Ĝp(vi), v1, . . . , vk ∈ V(p), attain the same value as G(ξ). Therefore, we have

    G( Σ_{i=1}^{k} δ(vi) vi ) = Σ_{i=1}^{k} δ(vi) Ĝp(vi).

Hence, setting ζp(vi) = δ(vi), i = 1, . . . , k, in the definition of the convex hull, with the rest of the ζ(v) equal to zero, proves part (a) of the assertion. If ξ lies in the interior of Ξp, then again by Carathéodory's Theorem there are k + 1 extreme points of Ξp such that for some nonnegative weights δ(vi), v1, . . . , vk+1 ∈ V(p), with Σ_{i=1}^{k+1} δ(vi) = 1, we have ξ = Σ_{i=1}^{k+1} δ(vi) vi. By construction, there also exist k + 1 one-sided limit points in the collection {Ĝp(v) : v ∈ V(p)} that attain the same value as G(ξ) for all ξ ∈ int(Ξp). Thus, we have that

    G( Σ_{i=1}^{k+1} δ(vi) vi ) = Σ_{i=1}^{k+1} δ(vi) Ĝp(vi).

Setting ζp(vi) = δ(vi), i = 1, . . . , k + 1, with the rest of the ζ(v) equal to zero, proves part (b) of the assertion.
To prove the second part of the assertion, conv(cl(L(Ξ))) ⊇ conv(Ξ̄′), fix ξ′ ∈ Ξ̄′. By construction, there exist ζp(v) ∈ R+, p = 1, . . . , P, v ∈ V, such that

    Σ_{p=1}^{P} Σ_{v∈V(p)} ζp(v) = 1,
    ξ′1 = Σ_{p=1}^{P} Σ_{v∈V(p)} ζp(v) v,
    ξ′2 = Σ_{p=1}^{P} Σ_{v∈V(p)} ζp(v) Ĝp(v).

This implies that

    (ξ′1ᵀ, ξ′2ᵀ)ᵀ = Σ_{p=1}^{P} Σ_{v∈V(p)} ζp(v) (vᵀ, Ĝp(v)ᵀ)ᵀ,

that is, ξ′ is a convex combination of either corner points or limit points of L(Ξ). Therefore, conv(Ξ̄′) is equal to conv(cl(L(Ξ))) almost surely. This concludes the proof.
The convex hull (19) is a closed and bounded polyhedral set described by a finite set of linear inequalities. Therefore, one can rewrite (19) as the following polyhedron

    conv(Ξ̄′) = { ξ′ ∈ Rk′ : ∃ζ′ ∈ Rv′ such that W′ξ′ + U′ζ′ ≥ h′ },    (20)

where W′ ∈ Rl′×k′, U′ ∈ Rl′×v′ and h′ ∈ Rl′ are completely determined through Proposition 3.
The following proposition captures the essence of robust optimization and provides the tools for reformulating the infinite number of constraints in Problem (15), see [4, 7]. The proof is repeated here to keep the paper self-contained.

Proposition 4 For any Z ∈ Rm×k′ and conv(Ξ̄′) given by (20), the following statements are equivalent:

(i) Zξ′ ≥ 0 for all ξ′ ∈ conv(Ξ̄′);

(ii) ∃ Λ ∈ R+^{m×l′} with ΛW′ = Z, ΛU′ = 0, Λh′ ≥ 0.

Proof We denote by Zμᵀ the μth row of the matrix Z. Then, statement (i) is equivalent to

    Zξ′ ≥ 0 for all ξ′ ∈ conv(Ξ̄′)
    ⇔  0 ≤ min_{ξ′} { Zμᵀξ′ : ∃ζ′ ∈ Rv′, W′ξ′ + U′ζ′ ≥ h′ },               ∀μ = 1, . . . , m
    ⇔  0 ≤ max_{Λμ ∈ Rl′} { h′ᵀΛμ : W′ᵀΛμ = Zμ, U′ᵀΛμ = 0, Λμ ≥ 0 },       ∀μ = 1, . . . , m      (21)
    ⇔  ∃ Λμ ∈ Rl′ with W′ᵀΛμ = Zμ, U′ᵀΛμ = 0, h′ᵀΛμ ≥ 0, Λμ ≥ 0,           ∀μ = 1, . . . , m.

The equivalence in the third line follows from linear duality. Interpreting Λμᵀ as the μth row of a new matrix Λ ∈ Rm×l′ shows that the last line in (21) is equivalent to assertion (ii). Thus, the claim follows.
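The certificate direction (ii) ⇒ (i) of Proposition 4 is easy to test numerically: for any Λ ≥ 0 with ΛW′ = Z and Λh′ ≥ 0, one has Zξ′ = ΛW′ξ′ ≥ Λh′ ≥ 0 on the polyhedron. The sketch below (Python with NumPy) uses the unit box as a toy polytope with no auxiliary variable ζ′ (so U′ = 0 and the ζ′-related conditions drop out); the particular Λ is a hypothetical choice:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy polytope {xi : W xi >= h}: the unit box [0, 1]^3.
k = 3
W = np.vstack([np.eye(k), -np.eye(k)])            # xi >= 0 and -xi >= -1
h = np.concatenate([np.zeros(k), -np.ones(k)])

# Certificate direction of Proposition 4: pick Lambda >= 0 with Lambda h >= 0
# and set Z = Lambda W; then Z xi >= 0 must hold over the whole polytope,
# since Z xi = Lambda W xi >= Lambda h >= 0.
Lam = rng.uniform(0.0, 1.0, (4, 2 * k))
Lam[:, k:] = 0.0                                  # ensures Lambda h = 0 >= 0
Z = Lam @ W

assert np.all(Lam @ h >= 0)
for _ in range(1000):
    xi = rng.uniform(0.0, 1.0, k)                 # an arbitrary point of the box
    assert np.all(Z @ xi >= -1e-9)
```

The converse direction is where linear duality does the work: feasibility of the dual LP in (21) certifies non-negativity of Zξ′ over the entire polyhedron without enumerating its points.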
Using Proposition 4 together with the polyhedron (20), we can now reformulate Problem (15) into the following mixed-integer linear optimization problem:

    minimize    Tr( M′ Rξᵀ Dᵀ Y RG )
    subject to  Y ∈ Zq×g,  Λ ∈ R+^{m×l′},  Γ ∈ R+^{q×l′},  Θ ∈ R+^{q×l′},
                B Y RG + ΛW′ = H Rξ,    ΛU′ = 0,    Λh′ ≥ 0,
                Y RG = ΓW′,             ΓU′ = 0,    Γh′ ≥ 0,            (22)
                e e1ᵀ − Y RG = ΘW′,     ΘU′ = 0,    Θh′ ≥ 0.

Here, Λ ∈ R+^{m×l′}, Γ ∈ R+^{q×l′} and Θ ∈ R+^{q×l′} are the auxiliary variables associated with the constraints B Y RG ξ′ ≤ H Rξ ξ′, 0 ≤ Y RG ξ′ and Y RG ξ′ ≤ e, respectively. We emphasize that Problem (22) is the exact reformulation of Problem (15), since (19) is a tight outer approximation of the convex hull of Ξ̄′. Therefore, the solution of Problem (22) achieves the best binary decision rule associated with G(∙) in (6b). Notice that the size of Problem (22) grows quadratically with respect to the number of binary decisions q and the number of constraints l′ of conv(Ξ̄′). However, l′ can be very large as it depends on the cardinality of V, which is constructed using the extreme points of the P = 2^g partitions Ξp. Therefore, the size of Problem (22) can grow exponentially with respect to the description of Ξ and the complexity of the binary decision rule, g, defined in (6).
In the following, we present instances of Ξ and G(∙) for which the proposed solution method results in mixed-integer optimization problems whose size grows only polynomially with respect to the input parameters.
2.3 Scalable Binary Decision Rule Structures
In this section, we derive a scalable mixed-integer linear optimization reformulation of Problem (7) by
considering simplified instances of the uncertainty set Ξ and binary decision rules y(∙). We will show
that for the structure of Ξ and y(∙) considered in this section, the size of the resulting mixed-integer
linear optimization problem grows polynomially in the size of the original Problem (3) as well as the
description of the binary decision rules. We consider two cases: (i) uncertainty sets that can be described
as hyperrectangles and (ii) general polyhedral uncertainty sets.
We now consider case (i), and define the uncertainty set of Problem (3) to have the following hyperrectangular structure:
Ξ = ξ ∈ Rk : l ≤ ξ ≤ u, ξ1 = 1 ,
(23)
for given l, u ∈ Rk . We restrict the feasible region of the binary functions y(∙) ∈ Bk,q to admit the
piecewise constant structure (6a), but now G : Rk → {0, 1}g is written in the form
G(∙) = (G1 (∙), G2,1 (∙), . . . , G2,r (∙), . . . , Gk,r (∙))> ,
with g = 1 + (k − 1)r. Here, G1 : Rk → {0, 1} and Gi,j : Rk → {0, 1}, for j = 1, . . . , r, i = 2, . . . , k, are
piecewise constant functions given by
G1(ξ) := 1,    Gi,j(ξ) := 1(ξi ≥ βi,j),    j = 1, . . . , r, i = 2, . . . , k,    (24)
for fixed βi,j ∈ R, j = 1, . . . , r, i = 2, . . . , k. By construction, we assume that li < βi,1 < · · · < βi,r < ui,
for all i = 2, . . . , k. Structure (24) gives rise to binary decision rules that are discontinuous along the axes of the uncertainty set, with r discontinuities in each direction. Notice that (24) is a special case of (6b), where the vectors α are replaced by ei. One can easily modify (24) so that the number of discontinuities r differs across directions ξi. We will refer to βi,1, . . . , βi,r as the breakpoints in direction ξi.
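To make structure (24) concrete, the following minimal Python sketch (names and data are our own, purely illustrative) evaluates the feature vector G(ξ) = (G1(ξ), G2,1(ξ), . . . , Gk,r(ξ))> for given axis-aligned breakpoints:

```python
# Evaluate the piecewise constant features of (24): G1 = 1 and
# G_{i,j}(xi) = 1(xi_i >= beta_{i,j}) for breakpoints in each direction.
def G(xi, breakpoints):
    # xi[0] is the degenerate component xi_1 = 1;
    # breakpoints[0] holds the breakpoints for xi_2, and so on.
    g = [1]                                    # G1(xi) := 1
    for i, betas in enumerate(breakpoints, start=1):
        for beta in betas:
            g.append(1 if xi[i] >= beta else 0)
    return g

# k = 3 with r = 2 breakpoints per direction (illustrative values).
breakpoints = [[1.0, 2.0],   # direction xi_2
               [0.5, 1.5]]   # direction xi_3
print(G([1.0, 1.7, 0.4], breakpoints))   # -> [1, 1, 0, 0, 0]
```

A binary decision rule of type (6a) is then obtained as y(ξ) = Y G(ξ) for an integer matrix Y.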
Problem (7) retains the same semi-infinite structure under (23) and (24). In the following, we use the same arguments as in Section 2.2 to (a) redefine Problem (7) in a higher dimensional space and (b) construct the convex hull of the lifted uncertainty set and use it together with Proposition 4 to reformulate the infinite number of constraints of the lifted problem.
We define the non-linear lifting operator L : Rk → Rk′ to be L(∙) = (L1(∙)>, . . . , Lk(∙)>)> such that

L1 : Rk → R2,      L1(ξ) = (ξ1, G1(ξ))>,
Li : Rk → Rr+1,    Li(ξ) = (ξi, Gi(ξ)>)>,   i = 2, . . . , k,    (25)

with k′ = 2 + (k − 1)(r + 1). Here, with slight abuse of notation, Gi(∙) = (Gi,1(∙), . . . , Gi,r(∙))> for
i = 2, . . . , k. Notice that L(∙) is separable in each component Li (∙), since Gi (∙) only involves ξi . We also
introduce matrices Rξ = (Rξ,1, . . . , Rξ,k) ∈ Rk×k′ with Rξ,1 ∈ Rk×2 and Rξ,i ∈ Rk×(r+1), i = 2, . . . , k, such that

Rξ L(ξ) = ξ,    Rξ,i Li(ξ) = ei ξi,   i = 1, . . . , k,    (26a)

and RG = (RG,1, . . . , RG,k) ∈ Rg×k′ with RG,1 ∈ Rg×2 and RG,i ∈ Rg×(r+1), i = 2, . . . , k, such that

RG L(ξ) = G(ξ),    RG,i Li(ξ) = (0, . . . , Gi(ξ)>, . . . , 0)>,   i = 1, . . . , k.    (26b)
As in Section 2.2, using the definition of L(∙) and Rξ , RG in (25) and (26), respectively, we can rewrite
Problem (7) into Problem (11).
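The bookkeeping in (25)–(26) can be sketched for k = 2 and r = 1: the lifting appends the breakpoint indicator to each component, and the retraction Rξ simply selects the original coordinates back out of L(ξ). A purely illustrative sketch, with Rξ realized as an index selection rather than an explicit matrix:

```python
# Separable lifting L of (25) for k = 2, r = 1:
# L(xi) = (xi_1, G1, xi_2, G_{2,1}), so k' = 2 + (k - 1)(r + 1) = 4.
def lift(xi, beta):
    L1 = [xi[0], 1]                            # L1(xi) = (xi_1, G1(xi)), G1 := 1
    L2 = [xi[1], 1 if xi[1] >= beta else 0]    # L2(xi) = (xi_2, G_{2,1}(xi))
    return L1 + L2

def retract(lifted):
    # The identity R_xi L(xi) = xi of (26a), as an index selection.
    return [lifted[0], lifted[2]]

xi = [1.0, 0.7]
assert retract(lift(xi, beta=1.5)) == xi
```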
We now define the uncertain vector ξ′ = (ξ1′>, . . . , ξk′>)> ∈ Rk′ such that ξ′ = L(ξ) and ξi′ = Li(ξ) for i = 1, . . . , k. ξ′ has probability measure Pξ′ defined using (12), and the corresponding support is given by

Ξ′ = L(Ξ) = {ξ′ ∈ Rk′ : l ≤ Rξ ξ′ ≤ u, L(Rξ ξ′) = ξ′}.    (27)
Notice that since both Ξ and L(∙) are separable in each component ξi, Ξ′ can be written in the following equivalent form:

Ξ′ = L(Ξ) = {(ξ1′>, . . . , ξk′>)> ∈ Rk′ : ξi′ ∈ Ξi′, i = 1, . . . , k},

where Ξi′ = Li(Ξ). Therefore, we can express conv(cl(Ξ′)) as

conv(cl(Ξ′)) = {(ξ1′>, . . . , ξk′>)> ∈ Rk′ : ξi′ ∈ conv(cl(Ξi′)), i = 1, . . . , k}
             = {(ξ1′>, . . . , ξk′>)> ∈ Rk′ : ξi′ ∈ conv(Ξi′), i = 1, . . . , k}.    (28)

It is thus sufficient to derive a closed-form representation for the marginal convex hulls conv(Ξi′).
We now construct the polyhedral representation of conv(Ξi′). As before, conv(Ξi′) will be constructed by taking convex combinations of its extreme points. Set conv(Ξ1′) has the trivial representation conv(Ξ1′) = {ξ1′ ∈ R2 : ξ1′ = e}. We define the sets Vi = {li, βi,1, . . . , βi,r, ui}, i = 2, . . . , k, and introduce the following partitions for each dimension of Ξ:

Ξi,1 := {ξi ∈ R : li ≤ ξi ≤ βi,1},
Ξi,p := {ξi ∈ R : βi,p−1 ≤ ξi ≤ βi,p},   p = 2, . . . , r,
Ξi,r+1 := {ξi ∈ R : βi,r ≤ ξi ≤ ui},
                                          i = 2, . . . , k.    (29)

Therefore, Vi(1) = {li, βi,1}, Vi(p) = {βi,p−1, βi,p}, p = 2, . . . , r, and Vi(r + 1) = {βi,r, ui}. It is easy to see that the points L(ei ξi), ξi ∈ Vi, are extreme points of conv(Ξi′); see Figure 4. Notice that Gi(ei ξi) is constant for all ξi ∈ int(Ξi,p), p = 1, . . . , r + 1, but can attain different values on the boundaries of the partitions.
To this end, for each partition and ξi ∈ Vi(p), we define the one-sided limit points Ĝi,p(ei ξi) ∈ Rr such that

Ĝi,p(ei ξi) = lim_{u∈Ξi,p, u→ξi} Gi(ei u),   ∀ξi ∈ Vi(p), p = 1, . . . , r + 1,    (30)

for all i = 2, . . . , k. Gi(ei ξi) is constant for all ξi ∈ int(Ξi,p) and each partition p. Therefore, for each ξ̃i ∈ Vi(p), the one-sided limit Ĝi,p(ei ξ̃i) is equal to Gi(ei ξi) for all ξi ∈ int(Ξi,p).
Figure 4: Convex hull representation of Ξi′ induced by the lifting ξi′ = Li(ξ) = (ξi, Gi(ξ)>)>, where Gi(ξ) = (1(ξi ≥ 1), 1(ξi ≥ 2))> and ξi ∈ [0, 3]. Here, Vi = {0, 1, 2, 3}. The convex hull is constructed by taking convex combinations of the points L(0), L(1), L(2), L(3) (black dots) and the points (1, Ĝi,1(1))>, (2, Ĝi,2(2))> (white dots), where Ĝi,1(1) and Ĝi,2(2) are the one-sided limit points at ξi = 1 and ξi = 2, respectively.
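The construction in Figure 4 can be reproduced numerically. The sketch below (illustrative code of our own) enumerates the 2r + 2 extreme points of conv(Ξi′) for ξi ∈ [0, 3] with breakpoints {1, 2}: the lifted grid points plus the one-sided limit points at each breakpoint.

```python
# Extreme points of the marginal convex hull conv(Xi'_i): lift every point
# of V_i = {l, beta_1, ..., beta_r, u}, then add the one-sided limits of G_i
# approached from the partition below each breakpoint.
def extreme_points(l, u, betas):
    pts = set()
    for v in [l] + betas + [u]:
        g = tuple(1 if v >= b else 0 for b in betas)     # G_i(e_i v)
        pts.add((v,) + g)
    for beta in betas:
        g = tuple(1 if beta > b else 0 for b in betas)   # limit from the left
        pts.add((beta,) + g)
    return sorted(pts)

pts = extreme_points(0, 3, [1, 2])
print(len(pts))   # -> 6, i.e. 2r + 2 with r = 2
```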
The following proposition gives the polyhedral representation of each conv(Ξi′), i = 2, . . . , k. A visual representation of conv(Ξi′) is depicted in Figure 4.

Proposition 5 For each i = 2, . . . , k, let Li(∙) be given by (25) and Gi(∙) be defined in (24). Then,
the tight, closed outer approximation of the convex hull of Ξi′ is given by the following polyhedron:

conv(Ξi′) = { ξi′ = (ξi,1′, ξi,2′>)> ∈ Rr+1 : ∃ζi,p(v) ∈ R+, ∀v ∈ Vi(p), p = 1, . . . , r + 1, such that
    ∑_{p=1}^{r+1} ∑_{v∈Vi(p)} ζi,p(v) = 1,
    ξi,1′ = ∑_{p=1}^{r+1} ∑_{v∈Vi(p)} ζi,p(v) v,
    ξi,2′ = ∑_{p=1}^{r+1} ∑_{v∈Vi(p)} ζi,p(v) Ĝi,p(ei v) }.    (31)
Proof Proposition 5 can be proven using similar arguments as in Proposition 3.
Set conv(Ξi′) has 2r + 2 extreme points. Using the extreme point representation (31), conv(Ξi′) can be represented through 4r + 4 constraints. Therefore, conv(Ξ′) = ×_{i=1}^{k} conv(Ξi′) can be represented using a total of 2 + k(4r + 4) constraints, i.e., its size grows quadratically in the dimension k of Ξ and the complexity r of the binary decision rule. One can now use the polyhedral description (31) together with Proposition 4 to reformulate Problem (15) into a mixed-integer linear optimization problem which can be expressed in the form (22). The size of the mixed-integer linear optimization problem will be polynomial in q, m, k and r. We emphasize that since the convex hull representation (31) is exact, the solution of Problem (22) achieves the best binary decision rule associated with G(∙) defined in (24).
We now consider case (ii), and assume that Ξ is a generic set of the type (4) and G(∙) is given by (24). From Section 2, we know that the number of constraints needed to describe the polyhedral representation of conv(Ξ′) grows exponentially with the description of Ξ and the complexity of the decision rule. In the following, we present a systematic way to construct a tractable outer approximation of conv(Ξ′). Using this outer approximation to reformulate the lifted problem (11) will not yield the best binary decision rule structure associated with the function G(∙), but rather a conservative approximation. Nevertheless, the size of the resulting mixed-integer linear optimization problem will only grow polynomially with respect to the constraints of Ξ and the complexity of the binary decision rule.
To this end, let {ξ ∈ Rk : l ≤ ξ ≤ u} be the smallest hyperrectangle containing Ξ. We have

Ξ = {ξ ∈ Rk : ∃ζ ∈ Rv such that W ξ + U ζ ≥ h, ξ1 = 1}
  = {ξ ∈ Rk : ∃ζ ∈ Rv such that W ξ + U ζ ≥ h, ξ1 = 1, l ≤ ξ ≤ u},

which implies that the lifted uncertainty set Ξ′ = L(Ξ) can be expressed as Ξ′ = Ξ1′ ∩ Ξ2′, where

Ξ1′ := {ξ′ ∈ Rk′ : ∃ζ ∈ Rv such that W Rξ ξ′ + U ζ ≥ h, ξ1′ = 1},
Ξ2′ := {ξ′ ∈ Rk′ : l ≤ Rξ ξ′ ≤ u, L(Rξ ξ′) = ξ′}.    (32)
Notice that Ξ2′ has exactly the same structure as (27). We thus conclude that

Ξ̂′ := Ξ1′ ∩ conv(Ξ2′) ⊇ conv(Ξ′),    (33)

and therefore Ξ̂′ can be used as an outer approximation of conv(Ξ′). Since conv(Ξ2′) can be written as the polyhedron induced by Proposition 5, the set Ξ̂′ has a total of (l + 1) + (2 + k(4r + 4)) constraints. As before, one can now use the polyhedral description of Ξ̂′ together with Proposition 4 to reformulate Problem (15) into a mixed-integer linear optimization problem which can be expressed in the form (22). The size of the mixed-integer linear optimization problem will be polynomial in q, m, k, l and r. Since Ξ̂′ ⊇ conv(Ξ′), the solution of Problem (22) might not achieve the best binary decision rule induced by (24), but rather a conservative approximation. Nevertheless, using this systematic decomposition of the uncertainty set, we can efficiently apply binary decision rules to problems where the uncertainty set has an arbitrary convex polyhedral structure.
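The smallest hyperrectangle {ξ : l ≤ ξ ≤ u} containing a polyhedral Ξ can be found with 2k small linear programs, minimizing and maximizing each coordinate. A sketch under two assumptions of our own: SciPy is available, and the set is written in the inequality form {ξ : Aξ ≤ b} rather than the paper's W ξ + U ζ ≥ h representation.

```python
import numpy as np
from scipy.optimize import linprog

def bounding_box(A, b):
    """Componentwise bounds l, u of the polytope {xi : A xi <= b}."""
    k = A.shape[1]
    free = [(None, None)] * k        # linprog otherwise defaults to xi >= 0
    l, u = np.empty(k), np.empty(k)
    for i in range(k):
        c = np.zeros(k); c[i] = 1.0
        l[i] = linprog(c, A_ub=A, b_ub=b, bounds=free).fun    # min xi_i
        u[i] = -linprog(-c, A_ub=A, b_ub=b, bounds=free).fun  # max xi_i
    return l, u

# Example: the simplex {xi >= 0, xi_1 + xi_2 <= 1}.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])
l, u = bounding_box(A, b)   # l = [0, 0], u = [1, 1]
```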
One can use G(∙) given by (24) together with this systematic decomposition of Ξ to achieve the same binary decision rule structures as those offered by G(∙) defined in (6b). We will illustrate this through the following example. Let us assume that the problem at hand requires a generic Ξ of type (4), and we want to parameterize the binary decision rule as a linear function of G(ξ) = 1(α>ξ ≥ β) for some α ∈ Rk and β ∈ R. We can introduce an additional random variable ξ̃ in Ξ such that ξ̃ = α>ξ; ξ̃ is completely determined by the random variables already present in Ξ. The modified support can now be written as follows:
Ξ = {(ξ, ξ̃) ∈ Rk+1 : ∃ζ ∈ Rv, W ξ + U ζ ≥ h, ξ1 = 1, ξ̃ = α>ξ}
  = {(ξ, ξ̃) ∈ Rk+1 : ∃ζ ∈ Rv, W ξ + U ζ ≥ h, ξ1 = 1, ξ̃ = α>ξ, l ≤ ξ ≤ u, l̃ ≤ ξ̃ ≤ ũ},    (34)
where {(ξ, ξ̃) ∈ Rk+1 : l ≤ ξ ≤ u, l̃ ≤ ξ̃ ≤ ũ} is the smallest hyperrectangle containing Ξ. We can now use

G(ξ, ξ̃) = 1(ξ̃ ≥ β),

which is an instance of (24), to achieve the same structure as G(ξ) = 1(α>ξ ≥ β). Defining appropriate L(∙) and Rξ, RG, we can use the following decomposition of the lifted uncertainty set, Ξ′ = Ξ1′ ∩ Ξ2′, such that
Ξ1′ := {(ξ′, ξ̃′) ∈ Rk′+1 : ∃ζ ∈ Rv such that W Rξ ξ′ + U ζ ≥ h, ξ1′ = 1, ξ̃′ = α>Rξ ξ′},
Ξ2′ := {(ξ′, ξ̃′) ∈ Rk′+1 : l ≤ Rξ ξ′ ≤ u, l̃ ≤ ξ̃′ ≤ ũ, L(Rξ (ξ′, ξ̃′)) = (ξ′, ξ̃′)}.
By defining Ξ̂′ := Ξ1′ ∩ conv(Ξ2′) ⊇ conv(Ξ′), we can reformulate Problem (15) into a mixed-integer linear optimization problem with a polynomial number of decision variables and constraints with respect to the input data. Once more, we emphasize that this systematic decomposition might not achieve the best binary decision rule induced by G(∙), but rather a conservative approximation. Nevertheless, we can now achieve the same flexibility for the binary decision rules as that offered by (6b) without the exponential growth in the size of the resulting problem.
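The augmentation can be checked numerically: the general cut indicator 1(α>ξ ≥ β) coincides with the axis-aligned indicator of (24) applied to the appended coordinate ξ̃ = α>ξ. A small, purely illustrative sketch:

```python
# General indicator versus its axis-aligned emulation on xi_tilde = alpha^T xi.
def G_general(xi, alpha, beta):
    return 1 if sum(a * x for a, x in zip(alpha, xi)) >= beta else 0

def G_augmented(xi, alpha, beta):
    xi_tilde = sum(a * x for a, x in zip(alpha, xi))   # appended random variable
    return 1 if xi_tilde >= beta else 0                # an instance of (24)

alpha, beta = [0.0, 1.0, -1.0], 0.5
for xi in [[1.0, 2.0, 1.0], [1.0, 0.3, 0.2], [1.0, 1.0, 0.5]]:
    assert G_general(xi, alpha, beta) == G_augmented(xi, alpha, beta)
```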
3 Binary Decision Rules for Random-Recourse Problems
In this section, we present our approach for one-stage adaptive optimization problems with random
recourse. The solution method for this class of problems is an adaptation of the solution method presented in Section 2.2. Given a function B : Rk → Rm×q and matrices D ∈ Rq×k, H ∈ Rm×k, and a probability
measure Pξ supported on set Ξ for the uncertain vector ξ ∈ Rk , we are interested in choosing binary
functions y(∙) ∈ Bk,q in order to solve:
minimize    Eξ[(Dξ)> y(ξ)]
subject to  y(∙) ∈ Bk,q,
            B(ξ) y(ξ) ≤ Hξ,   ∀ξ ∈ Ξ,    (35)
where Ξ is a generic polyhedron of type (4). We assume that the recourse matrix B(ξ) depends linearly on the uncertain parameters. Therefore, the μth row of B(ξ) is representable as ξ>Bμ for matrices Bμ ∈ Rk×q with μ = 1, . . . , m. Problem (35) can therefore be written as the following problem:
minimize    Eξ[(Dξ)> y(ξ)]
subject to  y(∙) ∈ Bk,q,
            ξ>Bμ y(ξ) ≤ Hμ>ξ,   μ = 1, . . . , m,   ∀ξ ∈ Ξ,    (36)
where Hμ> denotes the μth row of matrix H. Problem (36) involves a continuum of decision variables
and inequality constraints. Therefore, in order to make the problem amenable to numerical solutions,
we restrict y(∙) to admit structure (6). Applying the binary decision rules (6) to Problem (36) yields the
following semi-infinite problem, which involves a finite number of decision variables Y ∈ Zq×g , and an
infinite number of constraints:
minimize    Eξ[ξ>D>Y G(ξ)]
subject to  Y ∈ Zq×g,
            ξ>Bμ Y G(ξ) ≤ Hμ>ξ,   μ = 1, . . . , m,
            0 ≤ Y G(ξ) ≤ e,   ∀ξ ∈ Ξ.    (37)
Notice that all decision variables appear linearly in Problem (37). Nevertheless, the objective and constraints are non-linear functions of ξ.
We now define the non-linear lifting operator L : Rk → Rk′ such that

L(ξ) = (ξ>, G(ξ)>, (G(ξ)ξ1)>, . . . , (G(ξ)ξk)>)>,    (38)

with k′ = k + g + gk. Note that including both components G(ξ) and G(ξ)ξ1 in L(ξ) is redundant, as by construction ξ1 = 1. Nevertheless, in the following we adopt formulation (38) for ease of exposition. We also define matrices Rξ ∈ Rk×k′ and RG ∈ Rg×k′ such that

Rξ L(ξ) = ξ,    RG L(ξ) = G(ξ).    (39)
In addition, we define F : Rk×g → Rk′, which expresses the quadratic polynomials ξ>Bμ Y G(ξ) as linear functions of L(ξ) through the following constraints:

y′μ = F(Bμ Y),    y′μ> L(ξ) = ξ>Bμ Y G(ξ),   y′μ ∈ Rk′,   μ = 1, . . . , m,   ∀ξ ∈ Ξ.    (40)

The constraints y′μ = F(Bμ Y), μ = 1, . . . , m, are linear, since F(∙) effectively reorders the entries of the matrix Bμ Y into the vector y′μ. Using (38), (39) and (40), Problem (37) can now be rewritten as the following optimization problem:
minimize    Tr(M′ Rξ> D> Y RG)
subject to  Y ∈ Zq×g,
            y′μ> L(ξ) ≤ Hμ> Rξ L(ξ),   μ = 1, . . . , m,
            y′μ = F(Bμ Y),  y′μ ∈ Rk′,   μ = 1, . . . , m,
            0 ≤ Y RG L(ξ) ≤ e,   ∀ξ ∈ Ξ.    (41)
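The reordering F(∙) of (40) can be illustrated directly: placing the entries of Bμ Y against the G(ξ)ξi blocks of L(ξ) turns the quadratic form ξ>Bμ Y G(ξ) into a linear function of L(ξ). A sketch with small illustrative dimensions of our own choosing:

```python
# L(xi) per (38): (xi, G, G*xi_1, ..., G*xi_k) flattened into one list.
def lift(xi, G):
    L = list(xi) + list(G)
    for x in xi:
        L += [g * x for g in G]
    return L

def F(M, k, g):
    # Reorder the k x g matrix M into y' with y'^T L(xi) = xi^T M G(xi):
    # zero weights on the xi and G blocks, M_ij on the (G(xi) xi_i)_j entries.
    y = [0.0] * (k + g)
    for i in range(k):
        y += [M[i][j] for j in range(g)]
    return y

k, g = 2, 3
xi, G = [1.0, 2.0], [1, 0, 1]
M = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]          # plays the role of B_mu Y
quad = sum(xi[i] * M[i][j] * G[j] for i in range(k) for j in range(g))
lin = sum(a * b for a, b in zip(F(M, k, g), lift(xi, G)))
assert abs(quad - lin) < 1e-12
```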
Problem (41) has similar structure as Problem (11), i.e., both Y and L(ξ) appear linearly in the constraints. Therefore, in the following we redefine Problem (41) in a higher dimensional space and apply
Proposition 4 to reformulate the problem into a mixed-integer optimization problem.
We now define the uncertain vector ξ′ = (ξ1,1′>, ξ1,2′>, ξ2,1′>, . . . , ξ2,k′>)> ∈ Rk′ such that ξ′ = L(ξ), with ξ1,1′ = ξ, ξ1,2′ = G(ξ), and ξ2,i′ = G(ξ)ξi for i = 1, . . . , k. ξ′ has probability measure Pξ′ defined as in (12) and support Ξ′ = L(Ξ). Problem (41) can now be written as the equivalent semi-infinite problem defined on the lifted space ξ′:
minimize    Tr(M′ Rξ> D> Y RG)
subject to  Y ∈ Zq×g,
            y′μ> ξ′ ≤ Hμ> Rξ ξ′,   μ = 1, . . . , m,
            y′μ = F(Bμ Y),  y′μ ∈ Rk′,   μ = 1, . . . , m,
            0 ≤ Y RG ξ′ ≤ e,   ∀ξ′ ∈ Ξ′.    (42)
We will construct the convex hull of Ξ′ in the same way as in Section 2.2. Notice that conv(Ξ′) will have the same number of extreme points as the convex hull defined through the lifting L(ξ) = (ξ>, G(ξ)>)>; see Figure 5. By defining the partitions Ξp of Ξ as in (16), and Ĝp(ξ) as in (18), it is easy to see that conv(Ξ′) can be described as the convex combinations of the following points:


 ξ



 G

b
 p (ξ) 


b

Gp (ξ)ξ1  ,




..


.




b
Gp (ξ)ξk
ξ ∈ V (p), p = 1, . . . , P.
The following proposition gives the polyhedral representation of the convex hull of Ξ′.
Figure 5: Visualization of L(ξ) = (ξ1, ξ2, G(ξ))> (left) and L̃(ξ) = (ξ1, ξ2, G(ξ)ξ2)> (right), where G(ξ) = 1(ξ1 ≥ 2), for ξ1 ∈ [0, 4] and ξ2 ∈ [−2, 2]. Here, V = {(0, −2), (0, 2), (2, −2), (2, 2), (4, −2), (4, 2)}. The convex hull of L(ξ) is constructed by taking convex combinations of L(ξ) for all ξ ∈ V and of (2, −2, Ĝ((2, −2)))>, (2, 2, Ĝ((2, 2)))>, and the convex hull of L̃(ξ) is constructed by taking convex combinations of L̃(ξ) for all ξ ∈ V and of (2, −2, Ĝ((2, −2))(−2))>, (2, 2, Ĝ((2, 2))2)>.
Proposition 6 Let L(∙) be defined in (38) and G(∙) be defined in (6b). Then, the tight, closed outer approximation of the convex hull of Ξ′ is given by the following polyhedron:

conv(Ξ′) = { ξ′ = (ξ1,1′>, ξ1,2′>, ξ2,1′>, . . . , ξ2,k′>)> ∈ Rk′ : ∃ζp(v) ∈ R+, ∀v ∈ V(p), p = 1, . . . , P, such that
    ∑_{p=1}^{P} ∑_{v∈V(p)} ζp(v) = 1,
    ξ1,1′ = ∑_{p=1}^{P} ∑_{v∈V(p)} ζp(v) v,
    ξ1,2′ = ∑_{p=1}^{P} ∑_{v∈V(p)} ζp(v) Ĝp(v),
    ξ2,i′ = ∑_{p=1}^{P} ∑_{v∈V(p)} ζp(v) Ĝp(v) vi,   i = 1, . . . , k }.    (43)
Proof Proposition 6 can be proven using similar arguments as in Proposition 3.
The description of polyhedron (43) inherits the same exponential complexity as (19). That is, the number of constraints in (43) depends on the cardinality of V, which, in the worst case, is constructed using the extreme points of the P = 2^g partitions of Ξ. Nevertheless, description (43) provides a tight outer approximation of conv(Ξ′).
Using Proposition 4 together with polyhedron (43), we can now reformulate Problem (42) into the following mixed-integer linear optimization problem:
minimize    Tr(M′ Rξ> D> Y RG)
subject to  Y ∈ Zq×g,  Γ ∈ R+q×l′,  Θ ∈ R+q×l′,
            y′μ ∈ Rk′,  λμ ∈ R+1×l′,   μ = 1, . . . , m,
            y′μ> + λμ W′ = Hμ> Rξ,  λμ U′ = 0,  λμ h′ ≥ 0,   μ = 1, . . . , m,
            y′μ = F(Bμ Y),   μ = 1, . . . , m,
            Y RG = Γ W′,  Γ U′ = 0,  Γ h′ ≥ 0,
            e e1> − Y RG = Θ W′,  Θ U′ = 0,  Θ h′ ≥ 0.    (44)

Here, the auxiliary variables λμ ∈ R+1×l′ are associated with the constraints y′μ>ξ′ ≤ Hμ>Rξ ξ′, for μ = 1, . . . , m, and Γ ∈ R+q×l′ and Θ ∈ R+q×l′ with the constraints 0 ≤ Y RG ξ′ and Y RG ξ′ ≤ e, respectively.
Problem (44) is the exact reformulation of Problem (42), since (43) is a tight outer approximation of the convex hull of Ξ′, and thus the solution of Problem (44) achieves the best binary decision rule associated with G(∙) in (6b). Nevertheless, the size of Problem (44) is affected by the exponential growth of the constraints in conv(Ξ′). One can mitigate this exponential growth by considering simplified structures of Ξ and G(∙), following guidelines similar to those discussed in Section 2.3.
4 Binary Decision Rules for Multistage Problems
In this section, we extend the methodology presented in Section 2 to cover multistage adaptive optimization problems with fixed-recourse. The mathematical formulations presented can easily be adapted to
random-recourse problems by following the guidelines of Section 3.
The dynamic decision process considered can be described as follows: A decision maker first observes
an uncertain parameter ξ1 ∈ Rk1 and then takes a binary decision y1 (ξ1 ) ∈ {0, 1}q1 . Subsequently,
a second uncertain parameter ξ2 ∈ Rk2 is revealed, in response to which the decision maker takes a
second decision y2 (ξ1 , ξ2 ) ∈ {0, 1}q2 . This sequence of alternating observations and decisions extends
over T time stages. To enforce the non-anticipative structure of the decisions, a decision taken at
stage t can only depend on the observed parameters up to and including stage t, i.e., yt(ξ^t), where ξ^t = (ξ1>, . . . , ξt>)> ∈ Rk^t, with k^t = ∑_{s=1}^{t} ks. For consistency with the previous sections, and with
slight abuse of notation, we assume that k1 = 1 and ξ1 = 1. As before, setting ξ1 = 1 is a non-restrictive
assumption which allows us to represent affine functions of the non-degenerate outcomes (ξ2>, . . . , ξt>)> in a
compact manner as linear functions of (ξ1>, . . . , ξt>)>. We denote by ξ = (ξ1>, . . . , ξT>)> ∈ Rk the vector of all uncertain parameters, where k = k^T. Finally, we denote by Bk^t,qt the space of binary functions from Rk^t to {0, 1}qt.
Given matrices Bts ∈ Rmt×qs, Dt ∈ Rqt×k^t and Ht ∈ Rmt×k^t, and a probability measure Pξ supported on the set Ξ given by (4) for the uncertain vector ξ ∈ Rk, we are interested in choosing binary functions yt(∙) ∈ Bk^t,qt in order to solve:

minimize    Eξ[ ∑_{t=1}^{T} (Dt ξ^t)> yt(ξ^t) ]
subject to  yt(∙) ∈ Bk^t,qt,
            ∑_{s=1}^{t} Bts ys(ξ^s) ≤ Ht ξ^t,   t = 1, . . . , T,  ∀ξ ∈ Ξ.    (45)
Problem (45) involves a continuum of decision variables and inequality constraints. Therefore, in order to make the problem amenable to numerical solutions, we restrict the feasible region of the binary functions yt(∙) ∈ Bk^t,qt to admit the following piecewise constant structure:

yt(ξ) = Yt G^t(ξ),  Yt ∈ Zqt×g^t,   0 ≤ Yt G^t(ξ) ≤ e,   t = 1, . . . , T,  ∀ξ ∈ Ξ,    (46a)

where G^t : Rk^t → {0, 1}g^t can be expressed as G^t(∙) = (G1(∙)>, . . . , Gt(∙)>)>, and Gt : Rk^t → {0, 1}gt are the piecewise constant functions:

G1(ξ1) := 1,    Gt,i(ξ) := 1(αt,i>ξ^t ≥ βt,i),   i = 1, . . . , gt,  t = 2, . . . , T,    (46b)
for given αt,i ∈ Rk^t and βt,i ∈ R, i = 1, . . . , gt, t = 2, . . . , T. Here, we assume that the sets {ξ ∈ Ξ : αt,i>ξ^t ≥ βt,i} and {ξ ∈ Ξ : αt,i>ξ^t ≤ βt,i} are non-empty for all i = 1, . . . , gt, t = 2, . . . , T, and that αt,i>ξ^t − βt,i ≠ αt,j>ξ^t − βt,j for all ξ ∈ Ξ, where i, j ∈ {1, . . . , gt}, i ≠ j, and t = 2, . . . , T. The dimension of G^t(∙) is g^t, with g^t = ∑_{s=1}^{t} gs. The non-anticipativity of yt(∙) is ensured by restricting G^t(∙) to depend only on random parameters up to and including stage t. Applying the decision rules (46) to Problem (45) yields the following semi-infinite problem:
minimize    Eξ[ ∑_{t=1}^{T} (Dt ξ^t)> Yt G^t(ξ) ]
subject to  Yt ∈ Zqt×g^t,
            ∑_{s=1}^{t} Bts Ys G^s(ξ) ≤ Ht ξ^t,
            0 ≤ Yt G^t(ξ) ≤ e,   t = 1, . . . , T,  ∀ξ ∈ Ξ.    (47)
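The non-anticipativity enforced by (46) can be sketched as follows: the stage-t decision yt = Yt G^t(ξ^t) is computed from the observation history ξ^t only. The tiny instance below, with a single breakpoint for ξ2, is purely illustrative:

```python
def G_stage(xi_s, betas):
    # Stage-s features of (46b) with axis-aligned cuts alpha = e_s.
    return [1 if xi_s >= b else 0 for b in betas]

def decision(t, history, Y, beta):
    """Evaluate y_t = Y_t G^t(xi^t) using only (xi_1, ..., xi_t)."""
    feats = [1]                              # G1(xi_1) := 1
    for s in range(2, t + 1):                # stages 2, ..., t
        feats += G_stage(history[s - 1], beta[s - 2])
    return [sum(y * f for y, f in zip(row, feats)) for row in Y[t - 1]]

beta = [[1.0]]                 # one breakpoint in direction xi_2
Y = [[[1]],                    # Y_1 acts on G^1 = (1)
     [[0, 1]]]                 # Y_2 acts on G^2 = (1, 1(xi_2 >= 1))
assert decision(2, [1.0, 1.5], Y, beta) == [1]   # high demand observed
assert decision(2, [1.0, 0.5], Y, beta) == [0]   # low demand observed
```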
We can now use the lifting techniques to first express Problem (47) in a higher dimensional space, compute the convex hull associated with the non-convex uncertainty set of the lifted problem, and finally reformulate the semi-infinite structure using Proposition 4. We define the lifting L^t : Rk^t → Rk′^t, such that L^t(∙) = (L1(∙)>, . . . , Lt(∙)>)>, where

Lt : Rk^t → Rkt′,    Lt(ξ) = (ξt>, Gt(ξ)>)>,   t = 1, . . . , T.    (48a)

Here, kt′ = kt + gt. By convention, L(∙) = (L1(∙)>, . . . , LT(∙)>)> and k′^t = ∑_{s=1}^{t} ks′, with k′ = k′^T. In addition, we define matrices Rξ,t ∈ Rk^t×k′ and RG,t ∈ Rg^t×k′ such that

Rξ,t L(ξ) = ξ^t,    RG,t L(ξ) = G^t(ξ),   t = 1, . . . , T.    (48b)
Using (48), Problem (47) can now be rewritten in the following form:

minimize    Eξ[ ∑_{t=1}^{T} (Dt Rξ,t L(ξ))> Yt RG,t L(ξ) ]
subject to  Yt ∈ Zqt×g^t,
            ∑_{s=1}^{t} Bts Ys RG,s L(ξ) ≤ Ht Rξ,t L(ξ),
            0 ≤ Yt RG,t L(ξ) ≤ e,   t = 1, . . . , T,  ∀ξ ∈ Ξ.    (49)

Problem (49) has a similar structure to Problem (11), i.e., both Yt and L(ξ) appear linearly in the constraints. Therefore, in the following we redefine Problem (49) in a higher dimensional space and apply Proposition 4 to reformulate the problem into a mixed-integer optimization problem.
We now define the uncertain vector ξ′ = (ξ1′>, . . . , ξT′>)> ∈ Rk′ such that ξ′ = L(ξ) and ξt′ = Lt(ξ) for t = 1, . . . , T. ξ′ has probability measure Pξ′ defined using (12) and support Ξ′ = L(Ξ). Using the definition of ξ′, the lifted semi-infinite problem is given as follows:

minimize    Eξ′[ ∑_{t=1}^{T} (Dt Rξ,t ξ′)> Yt RG,t ξ′ ]
subject to  Yt ∈ Zqt×g^t,
            ∑_{s=1}^{t} Bts Ys RG,s ξ′ ≤ Ht Rξ,t ξ′,
            0 ≤ Yt RG,t ξ′ ≤ e,   t = 1, . . . , T,  ∀ξ′ ∈ Ξ′.    (50)
0
The definition of L(∙) in (48a) share almost identical structure as L(∙) in (10). Therefore, the convex
0
hull of Ξ can be expressed as a slight variant of the polyhedron given in (19), and will constitute a
26
0
tight outer approximation of conv(Ξ ). Using Proposition 4 together with the polyhedral representation
0
of conv(Ξ ), we can now reformulate Problem (42) into the following mixed-integer linear optimization
problem.
minimize    ∑_{t=1}^{T} Tr(M′ Rξ,t> Dt> Yt RG,t)
subject to  Yt ∈ Zqt×g^t,  Λt ∈ R+mt×l′,  Γt ∈ R+qt×l′,  Θt ∈ R+qt×l′,
            ∑_{s=1}^{t} Bts Ys RG,s + Λt W′ = Ht Rξ,t,  Λt U′ = 0,  Λt h′ ≥ 0,
            Yt RG,t = Γt W′,  Γt U′ = 0,  Γt h′ ≥ 0,
            e e1> − Yt RG,t = Θt W′,  Θt U′ = 0,  Θt h′ ≥ 0,   t = 1, . . . , T,    (51)

where M′ ∈ Rk′×k′, M′ = Eξ′(ξ′ ξ′>). Once more, we emphasize that Problem (51) is the exact reformulation of Problem (50). Therefore, the solution of Problem (51) achieves the best binary decision rule associated with G(∙) in (6b), at the cost of exponential growth of its size with respect to the description of Ξ and the complexity of the binary decision rules (46). One can mitigate this exponential growth by considering simplified structures of Ξ and G(∙), following guidelines similar to those discussed in Section 2.3.
5 Computational Results
In this section, we apply the proposed binary decision rules to a multistage inventory control problem
and to a multistage knapsack problem. In both problems, the structure of the binary decision rules
is constructed using G(∙) defined in (24), where the breakpoints βi,1 , . . . , βi,r are placed equidistantly
within the marginal support of each random parameter ξi . To improve the scalability of the solution
method, we further restrict the structure of the binary decision rules to be

y(ξ) = Y G(ξ),  Y ∈ {−1, 0, 1}q×g,   0 ≤ Y G(ξ) ≤ e,   ∀ξ ∈ Ξ.    (52)
All of our numerical results are carried out using the IBM ILOG CPLEX 12.5 optimization package on an Intel Core i5-3570 CPU at 3.40GHz machine with 8GB RAM [25].
5.1 Multistage Inventory Control
In this case study, we consider a single item inventory control problem where the ordering decisions
are discrete. This problem can be formulated as an instance of the multistage adaptive optimization
problem with fixed-recourse, and can be described as follows. At the beginning of each time period
t ∈ T := {2, . . . , T }, the decision maker observes the product demand ξt that needs to be satisfied
robustly. This demand can be served in two ways: (i) by pre-ordering at stage 1 a maximum of N lots,
each delivering a fixed quantity qz at the beginning of period t, for a unit cost of cz ; (ii) or by placing an
order for a maximum of N lots, each delivering immediately a fixed quantity qy , for a unit cost of cy , with
cz < cy . For each lot n = 1, . . . , N , the pre-ordering binary decisions delivered at stage t are denoted by
zn,t ∈ {0, 1}, and the recourse binary ordering decisions are denoted by yn,t (∙) ∈ Bkt ,1 . If the ordered
quantity is greater than the demand, the excess units are stored in a warehouse, incurring a unit holding
cost ch, and can be used to serve future demand. The level of available inventory at each period is given by It(∙) ∈ Rk^t,1. In addition, the cumulative volume of pre-orders ∑_{s=1}^{t} ∑_{n=1}^{N} zn,s must not exceed the ordering budget Btot,t. The decision maker wishes to determine the orders zn,t and yn,t(∙) that minimize the total ordering and holding costs associated with the worst-case demand realization over the planning horizon T. The problem can be formulated as the following multistage adaptive optimization problem:
minimize    max_{ξ∈Ξ}  ∑_{t∈T} ∑_{n=1}^{N} ( cz qz zn,t + cy qy yn,t(ξ^t) + ch It(ξ^t) )
subject to  It(∙) ∈ Rk^t,1,  zn,t ∈ {0, 1},  yn,t(∙) ∈ Bk^t,1,   ∀n = 1, . . . , N,
            It(ξ^t) = It−1(ξ^{t−1}) + ∑_{n=1}^{N} ( qz zn,t + qy yn,t(ξ^t) ) − ξt,
            It(ξ^t) ≥ 0,
            ∑_{s=1}^{t} ∑_{n=1}^{N} zn,s ≤ Btot,t,   ∀t ∈ T,  ∀ξ ∈ Ξ.    (53)

The uncertainty set Ξ is given by

Ξ := {ξ ∈ RT : ξ1 = 1, l ≤ ξ ≤ u}.
We emphasize that since we are interested in minimizing the total costs associated with the worst-case demand realization, the model does not require us to specify a distribution for the uncertain demand. Moreover, notice that the real-valued inventory decision It(ξ^t) can be eliminated from formulation (53) by substituting It(ξ^t) = I1 + ∑_{s=1}^{t} ( ∑_{n=1}^{N} ( qz zn,s + qy yn,s(ξ^s) ) − ξs ) in both the objective and constraints, thus eliminating the need to further approximate It(∙) using decision rules.
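The substitution that eliminates It(∙) is a plain cumulative-balance identity, sketched below on a two-period toy instance (illustrative names and data):

```python
# Inventory balance of (53): I_t = I_{t-1} + delivered orders - demand.
def inventory(I1, qz, qy, z, y, xi):
    """Inventory path implied by pre-orders z, recourse orders y, demands xi."""
    levels, I = [], I1
    for z_t, y_t, demand_t in zip(z, y, xi):
        I += qz * sum(z_t) + qy * sum(y_t) - demand_t
        levels.append(I)
    return levels

# Two periods, N = 2 lots, lot sizes qz = qy = 5, demands 6 and 9.
levels = inventory(0.0, 5.0, 5.0,
                   z=[[1, 0], [0, 0]], y=[[1, 0], [1, 1]], xi=[6.0, 9.0])
print(levels)   # -> [4.0, 5.0]; both levels nonnegative, so demand is met
```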
For our computational experiments, we randomly generated 30 instances of Problem (53). The parameters are chosen using a uniform distribution from the following sets: advance and instant ordering costs are chosen from cz ∈ [0, 5] and cy ∈ [0, 10], respectively, such that cz < cy, and holding costs are elements of ch ∈ [0, 5]. The bounds for each random parameter are chosen from li ∈ [0, 5] and ui ∈ [10, 15] for i = 2, . . . , T. The cumulative ordering budget equals Btot,t = 10(t − 1), for t = 2, . . . , T. We also assume that qz = qy = 15/N, and that the initial inventory level equals zero, i.e., I1 = 0.
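The instance generation described above can be sketched as follows, assuming NumPy; the seeding and the particular way we enforce cz < cy (sampling cz below min(cy, 5)) are our own choices, not specified in the text:

```python
import numpy as np

def generate_instance(T, N, rng):
    cy = rng.uniform(0, 10)                     # instant ordering cost
    cz = rng.uniform(0, min(cy, 5.0))           # advance ordering cost, cz < cy
    ch = rng.uniform(0, 5)                      # holding cost
    l = np.r_[1.0, rng.uniform(0, 5, T - 1)]    # demand bounds; xi_1 = 1
    u = np.r_[1.0, rng.uniform(10, 15, T - 1)]
    B_tot = {t: 10 * (t - 1) for t in range(2, T + 1)}
    return dict(cz=cz, cy=cy, ch=ch, l=l, u=u, B_tot=B_tot,
                qz=15 / N, qy=15 / N, I1=0.0)

inst = generate_instance(T=10, N=2, rng=np.random.default_rng(0))
```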
Average Objective Value Improvements, N = 2

      Global optimality      1% optimality        5% optimality
T     r=1   r=5   r=9        r=1   r=5   r=9      r=1   r=5   r=9
10    15%   17%   17%        14%   17%   17%      14%   16%   16%
15    16%   18%   18%        16%   18%   18%      16%   17%   17%
20    18%   19%   20%        18%   19%   20%      17%   18%   19%

Average Solution Time, N = 2

      Global optimality             1% optimality            5% optimality
T     r=1    r=5     r=9            r=1    r=5    r=9        r=1    r=5   r=9
10    <1sec  4sec    42sec          <1sec  4sec   15sec      <1sec  1sec  8sec
15    <1sec  166sec  983sec         <1sec  16sec  83sec      <1sec  3sec  10sec
20    1sec   688sec  5580sec        <1sec  46sec  710sec     <1sec  9sec  92sec

Table 2: Comparison of the average improvement of adaptive versus static binary decision rules using the performance measure (Non-Adapt. − Adapt.)/Non-Adapt. (top), and average solution time of the binary decision rules (bottom).
In the first test series, we compared the performance of the binary decision rules versus the non-adaptive binary decisions for the randomly generated instances. We consider binary decision rules for yn,t(∙) that have r ∈ {1, 5, 9} breakpoints in each direction ξi, and planning horizons T ∈ {10, 15, 20}. Moreover, we consider three different termination criteria for the mixed-integer linear optimization problems: (i) global optimality, (ii) 1% optimality and (iii) 5% optimality. All non-adaptive problems were solved to global optimality. Table 2 presents the results for the case where N = 2, in terms of the average improvement of the binary decision rules over the static binary decisions. The improvement is defined as the quantity (Non-Adapt. − Adapt.)/Non-Adapt., where Non-Adapt. and Adapt. are the objective values of the non-adaptive and adaptive binary decision rule problems, respectively. The first observation is the significant improvement over the non-adaptive decisions, and the improvement in solution quality as the complexity of the binary decision rules increases. As expected, solving the problems to global optimality achieves a slightly better objective value compared to the relaxed termination criteria. Nevertheless, the computational burden is substantially larger, rendering the decision rules impractical for T > 20 and r > 9. Using the problem instances of this test series, we also compared the binary decision rule structures given in (6a) and (52), i.e., the decision variables in the mixed-integer linear optimization problems were Y ∈ Zq×g and Y ∈ {−1, 0, 1}q×g, respectively. In both cases, the optimal solution was identical, but the solution time for the case Y ∈ Zq×g was substantially longer, rendering the decision rules impractical for T > 15 and r > 5.
Binary Decision Rule Structure (2) vs. (6), N = 2

      Average Performance          Design          Solution Time
T     r=0   r=1   r=5   r=9        Time            r=0    r=1    r=5    r=9
2     19%   3%    2%    1%         <1sec           <1sec  <1sec  <1sec  <1sec
4     29%   16%   12%   10%        3sec            <1sec  <1sec  <1sec  <1sec
6     52%   34%   27%   26%        607sec          <1sec  <1sec  <1sec  <1sec
8     60%   40%   29%   28%        1628sec         <1sec  <1sec  1sec   1sec
10    47%   38%   31%   30%        16613sec        <1sec  <1sec  1sec   5sec

Table 3: Comparison between the binary decision rules discussed in Bertsimas and Georghiou [10] (Design), given by (2) with P = 1, and the proposed decision rules (Adaptive). The performance is defined as the quantity (Adapt. − Design)/Design. The proposed decision rule problems have great scalability properties but do not perform as well as binary decision rules of type (2).
In our second test series, we investigate the relative performance between the proposed decision rules and the binary decision rules discussed in Bertsimas and Georghiou [10]. In particular, we used decision rules of type (2) with P = 1, and the corresponding decision rule problem was solved using Algorithm 2 with parameters δ = 0.01, β = 10−4, δ̂ = 0.01 and β̂ = 10−4; see [10, Section 2.3] for more details. The termination criterion for all mixed-integer linear optimization problems was fixed to 5%. The results are presented in Table 3, and constructed using the randomly generated instances for the cases where N = 2, T ∈ {2, . . . , 10} and r ∈ {0, 1, 5, 9}. The performance is defined as the quantity (Adapt. − Design)/Design, where Adapt. and Design are the objective values of the proposed binary decision rule structure and the decision rule structure given by (2), respectively. As expected, structure (2) is much more flexible than the proposed structure, producing high quality designs. This is the case because the discontinuities of the binary decision rule are decided endogenously by the optimization problem. However, this flexibility comes at a high computational cost, rendering decision rules (2) impractical for problem instances with T > 10. On the other hand, the proposed decision rules are highly scalable, providing good quality solutions to particularly challenging problem instances within a few seconds.
In our third test series, we investigate the scalability properties of the proposed binary decision
rules and their relative performance. In this test series, the termination criterion of the mixed-integer
linear optimization problems was fixed to 5%. The results are presented in Figure 6 and Table 4, and
constructed using the randomly generated instances for the cases where N ∈ {2, 3, 4}, T ∈ {10, . . . , 50}
and r ∈ {1, 3, 5}. As expected, solution quality increases as the decision rules become more flexible.
It is interesting to see that these improvements saturate for r > 3, getting diminishing returns for the
computational effort required.
[Figure 6: three panels (N = 2, 3, 4), each plotting the average improvement (%) against the time stages 10–50 for r = 1, 3, 5.]

Figure 6: Comparison of average performance of adaptive versus static binary decision rules
for N ∈ {2, 3, 4} and r ∈ {1, 3, 5}, using the performance measure (Non-Adapt. − Adapt.)/Non-Adapt.
Average Solution Time

              N = 2                       N = 3                        N = 4
  T     r=1    r=3     r=5        r=1    r=3      r=5        r=1     r=3       r=5
 10   <1sec  <1sec    1sec      <1sec  <1sec     2sec      <1sec   <1sec     4sec
 20   <1sec   1sec    9sec      <1sec   2sec    48sec      <1sec    5sec    57sec
 30   <1sec   5sec   63sec      <1sec   7sec   224sec      <1sec   31sec   380sec
 40   <1sec   9sec  114sec       1sec  12sec   453sec       1sec   95sec  4615sec
 50    1sec  12sec  225sec       1sec  36sec  1951sec       1sec  357sec 19298sec

Table 4: Average solution time for N ∈ {2, 3, 4} and r ∈ {1, 3, 5}.
5.2 Multistage Knapsack Problem
In this case study, we consider a multistage knapsack problem, which can be formulated as an instance of
the multistage adaptive optimization problem with random recourse and binary recourse decisions. In the
classic one-stage knapsack problem, the decision maker is given N items, each weighing wn, n = 1, . . . , N,
and a knapsack that can carry a maximum weight w. The decision maker wants to maximize the number
of items that fit into the knapsack, by choosing a binary decision y_n ∈ {0, 1} for each item n, while keeping
the total weight of the items in the knapsack, Σ_{n=1}^{N} w_n y_n, below w. In the multistage variant of the
problem, at the beginning of each time period t ∈ T := {1, . . . , T}, the decision maker is given N items,
each weighing w_{n,t}(ξ_t), n = 1, . . . , N, and an empty knapsack with capacity w. The weights w_{n,t}(ξ_t)
are assumed to be linear functions of the random parameter ξ_t that is realized at stage t. Upon observing
the weight of each item, the decision maker then decides the subset of the N items that can fit in the
knapsack. The items left behind can be taken at later stages. We denote by y_{n,t,s}(·) ∈ B_{k_s,1} the binary
decision corresponding to the item with weight w_{n,t}(ξ_t), taking value 1 if the item is placed in the knapsack
at stage s, s ≥ t, and zero otherwise. The decision maker wishes to maximize the expected number of
items that fit into the knapsacks over the planning horizon T by determining the collection of items
y_{n,t,s}(·). The problem can be formulated as the following multistage optimization problem with random
recourse:

    maximize     E_ξ [ Σ_{n=1}^{N} Σ_{t∈T} Σ_{s=t}^{T} y_{n,t,s}(ξ^s) ]
    subject to   y_{n,s,t}(·) ∈ B_{k_t,1},   s = 1, . . . , t,  n = 1, . . . , N,
                 Σ_{s=1}^{t} Σ_{n=1}^{N} w_{n,s}(ξ_s) y_{n,s,t}(ξ^t) ≤ w,            ∀t ∈ T, ∀ξ ∈ Ξ.    (54)
                 Σ_{s=t}^{T} y_{n,t,s}(ξ^s) ≤ 1,   n = 1, . . . , N,
Here, the second set of constraints ensures that at each stage t, the total weight of the knapsack is below
its capacity w, while the last set of constraints ensures that an item can be carried in the knapsack
at most once over the time horizon. The random vector ξ follows a beta distribution, Beta(α, β), and the
uncertainty set Ξ is given by

    Ξ := { ξ ∈ R^T : ξ_1 = 1, 0 ≤ ξ ≤ e }.
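The classic one-stage building block described at the start of this section can be sketched by exhaustive enumeration (the weights and capacity below are made-up example data; the paper's experiments instead solve the multistage problem (54) with CPLEX):

```python
from itertools import product

def one_stage_knapsack(weights, capacity):
    """Brute-force the classic one-stage knapsack from the text:
    maximize the number of items packed, subject to the total weight
    of the chosen items not exceeding the capacity."""
    best_count, best_choice = 0, tuple(0 for _ in weights)
    for y in product((0, 1), repeat=len(weights)):
        total = sum(w * yi for w, yi in zip(weights, y))
        if total <= capacity and sum(y) > best_count:
            best_count, best_choice = sum(y), y
    return best_count, best_choice
```

For example, with N = 4 items and capacity w = N/2 = 2 as in the experiments, `one_stage_knapsack([0.6, 0.9, 0.7, 0.5], 2.0)` packs three items (e.g. the subset of weight 0.6 + 0.9 + 0.5 = 2.0), since all four items together weigh 2.7 > 2.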
For our computational experiments we randomly generated 30 instances of Problem (54). The parameters
are randomly chosen using a uniform distribution from the following sets: the weight of item n
at stage t is given by w_{n,t}(ξ_t) = ŵ_{n,t} ξ_t, where ŵ_{n,t} ∈ [0.5, 1], and the parameters of the beta distribution
are chosen from α ∈ (0, 5] and β ∈ (0, 5]. The capacity of the knapsack is set to w = N/2.
In the first test series, we compared the performance of the binary decision rules with the non-adaptive
binary decisions for the randomly generated instances. We consider binary decision rules that
have r ∈ {1, 2, 3} breakpoints in each direction ξ_i, for problem instances with N ∈ {2, 3, 4} and planning
horizons T ∈ {3, . . . , 15}. The termination criterion of the mixed-integer linear optimization problems
was fixed to 5%. Figure 7 depicts the average improvement in objective value of the binary decision
rules over the static binary decisions. As before, the improvement in objective value is defined as
the quantity (Non-Adapt. − Adapt.)/Non-Adapt. The results show a substantial improvement in solution quality for all
problem instances N ∈ {2, 3, 4}, with the most flexible decision rule structure achieving the best results.
For the time horizons and complexity of the binary decision rules considered, all problem instances were
solved within 20 minutes. The solution time for instances with T > 15 and r > 3 was substantially
longer.
In the second test series, we improve the scalability of the solution method by restricting the information
available to the binary decision rules. To this end, we define η_{s,t} = (ξ_s, ξ_t)^⊤, and we restrict
the recourse decisions y_{n,t,s}(ξ^s) to have the structure y_{n,t,s}(η_{s,t}) for all n = 1, . . . , N, s = t, . . . , T and
t = 1, . . . , T. The number of integer decision variables needed to approximate y_{n,t,s}(ξ^s) using the binary
decision rule structure (52) and (24) is s(r + 1), while for y_{n,t,s}(η_{s,t}) the number of integer decision
variables required is 2(r + 1). We will refer to η_{s,t} = (ξ_s, ξ_t)^⊤ as the partial information available to
the decision rule at stage s. Table 5 compares the solution quality and computation time of the full
information and partial information structures for problem instances with N = 4, r ∈ {1, 2, 3} and
T ∈ {5, . . . , 25}. The solution quality of the full information structure is slightly better, but the solution
method suffers with respect to scalability. On the other hand, using the partial information structure
we can solve much bigger problem instances within a very reasonable amount of time. We note that the
problem instance with N = 4 and T = 25 involves a total of 1300 binary decision rules.

[Figure 7: three panels (N = 2, 3, 4), each plotting the average improvement (%) against the time stages 3–15 for r = 1, 2, 3.]

Figure 7: Comparison of average performance of adaptive versus static binary decision
rules for N ∈ {2, 3, 4} and r ∈ {1, 2, 3}, using the performance measure (Non-Adapt. − Adapt.)/Non-Adapt.
All problems were solved within 20 minutes.
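As a sanity check on these counts, a short sketch (the helper name is ours; it only restates the index ranges n = 1, . . . , N, t = 1, . . . , T, s = t, . . . , T and the per-rule variable counts s(r + 1) and 2(r + 1) from the text):

```python
def decision_rule_counts(N, T, r):
    """Count the binary decision rules y_{n,t,s} and the integer
    variables they require: s(r+1) per rule under full information,
    2(r+1) per rule under the partial-information restriction."""
    rules = N * T * (T + 1) // 2          # n = 1..N, t = 1..T, s = t..T
    full = N * sum(s * (r + 1) for t in range(1, T + 1)
                   for s in range(t, T + 1))
    partial = rules * 2 * (r + 1)
    return rules, full, partial
```

For N = 4 and T = 25 this gives 4 · 25 · 26 / 2 = 1300 decision rules, matching the count quoted above; with r = 1 the partial-information structure needs only 2(r + 1) · 1300 = 5200 integer variables.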
Average Objective Value Improvements, N = 4

             Full Information              Partial Information
  T       r=1      r=2      r=3         r=1      r=2      r=3
  5     32.5%    34.8%    35.7%       30.5%    32.5%    34.2%
 10     28.3%    32.0%    32.1%       27.9%    30.3%    31.5%
 15     28.9%    31.5%    32.6%       26.9%    30.1%    31.2%
 20     28.3%      n/a      n/a       27.6%    30.9%    31.7%
 25     26.8%      n/a      n/a       26.0%    31.0%    31.9%

Average Solution Time, N = 4

             Full Information              Partial Information
  T       r=1      r=2      r=3         r=1      r=2       r=3
  5     <1sec     1sec     1sec       <1sec    <1sec      1sec
 10      4sec     7sec    25sec        2sec     4sec      6sec
 15     27sec    52sec   364sec       13sec    42sec     28sec
 20     76sec      n/a      n/a       34sec   169sec    431sec
 25    396sec      n/a      n/a      162sec   449sec  11349sec

Table 5: Comparison of average performance and average computational times for binary
decision rules with full and partial information structure, using the performance measure
(Non-Adapt. − Adapt.)/Non-Adapt.
6 Conclusions
In this paper, we present linearly parameterised binary decision rule structures that can be used in
conjunction with the real-valued decision rules appearing in the literature for solving multistage adaptive
mixed-integer optimization problems. We provide a systematic way to reformulate the binary decision
rule problem as a finite-dimensional mixed-integer linear optimization problem, and we identify instances where the size of this problem grows polynomially with respect to the input data. The theory
presented covers both fixed-recourse and random-recourse problems. Our numerical results demonstrate
the effectiveness of the proposed binary decision rules and show that they are (i) highly scalable and (ii)
provide significant improvements in solution quality.
Appendix A  Technical Proofs
In this section, we will provide the proof for Theorem 1. To do this, we need the following auxiliary
results given in Lemmas 1, 2 and 3. Lemma 1 gives the complexity of the integer feasibility problem,
which will be used in Lemmas 2 and 3 to prove that checking feasibility of a semi-infinite constraint
involving indicator functions is NP-hard.
Lemma 1 (Integer feasibility problem) The following decision problem is NP-hard:

Instance. W ∈ R^{l×k}, h ∈ R^l and N ∈ Z_+ with N < k.
Question. Is there ξ ∈ {0, 1}^k such that Wξ ≥ h and Σ_{i=1}^{k} ξ_i = N?    (55)

Proof See [27].
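For intuition, the decision problem (55) can be checked by exhaustive enumeration on tiny instances; the exponential dependence on k is precisely what the lemma says cannot be avoided in general (the instance data in the usage below is made up):

```python
from itertools import product

def integer_feasibility(W, h, N):
    """Exhaustively check the decision problem of Lemma 1: is there a
    binary vector xi with W xi >= h (componentwise) and sum(xi) == N?
    Exponential in k = len(W[0]); for illustration only."""
    k = len(W[0])
    for xi in product((0, 1), repeat=k):
        if sum(xi) != N:
            continue
        if all(sum(Wij * xj for Wij, xj in zip(row, xi)) >= hi
               for row, hi in zip(W, h)):
            return True
    return False
```

For instance, with W = [[1, 1, 0]], h = [2] and N = 2 the vector ξ = (1, 1, 0) is feasible, while W = [[1, 0, 0]], h = [2], N = 1 admits no binary solution.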
Lemma 2 (Equivalent integer feasibility problem) The following decision problem is NP-hard:

Instance. W ∈ R^{l×k}, h ∈ R^l and N ∈ Z_+ with N < k.
Question. Is there ξ ∈ [0, 2]^k such that Wξ ≥ h, Σ_{i=1}^{k} ξ_i = N and Σ_{i=1}^{k} 1(ξ_i ≥ 1) = N?    (56)
Proof In the following, we will show that if there exists ξ that satisfies the assertion, then ξ must lie
in {0, 1}^k. If that is the case, then decision problem (56) reduces to decision problem (55),
which from Lemma 1 we know to be NP-hard. We will prove this by contradiction. Assume, without loss
of generality, that there exists ξ with ξ_1 > 1 that satisfies Wξ ≥ h, Σ_{i=1}^{k} ξ_i = N and Σ_{i=1}^{k} 1(ξ_i ≥ 1) = N.
Since ξ_1 > 1, we have Σ_{i=2}^{k} ξ_i < N − 1, and since 1(ξ_i ≥ 1) ≤ ξ_i for all ξ_i ∈ [0, 2], it follows that
Σ_{i=2}^{k} 1(ξ_i ≥ 1) < N − 1 and hence Σ_{i=1}^{k} 1(ξ_i ≥ 1) < N, which contradicts the
assumption that Σ_{i=1}^{k} 1(ξ_i ≥ 1) = N. Now, assume that there exists ξ with 0 < ξ_1 < 1 that satisfies
Wξ ≥ h, Σ_{i=1}^{k} ξ_i = N and Σ_{i=1}^{k} 1(ξ_i ≥ 1) = N. Since 0 < ξ_1 < 1, we have 1(ξ_1 ≥ 1) = 0 and
Σ_{i=2}^{k} 1(ξ_i ≥ 1) ≤ Σ_{i=2}^{k} ξ_i < N, so that Σ_{i=1}^{k} 1(ξ_i ≥ 1) < N, which
again contradicts the assumption that Σ_{i=1}^{k} 1(ξ_i ≥ 1) = N. Therefore, we conclude that ξ ∈ {0, 1}^k,
and thus decision problem (56) is NP-hard.
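The contradiction argument can be illustrated numerically (the example vectors below are ours, with k = 4 and N = 2): a non-binary entry in ξ necessarily lowers the indicator sum below N.

```python
def indicator_sum(xi):
    """Compute sum_i 1(xi_i >= 1), the count of entries at least 1."""
    return sum(1 for x in xi if x >= 1)

# Any xi in [0,2]^k with sum(xi) == N achieves indicator_sum(xi) == N
# only if xi is binary; perturbing one entry away from {0, 1} breaks it.
binary = [1.0, 1.0, 0.0, 0.0]   # xi in {0,1}^4, sum = N = 2
over   = [1.5, 0.5, 0.0, 0.0]   # xi_1 > 1,     sum still 2
under  = [0.5, 1.0, 0.5, 0.0]   # 0 < xi_1 < 1, sum still 2
```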
The following lemma provides the key ingredient for proving Theorem 1.
Lemma 3 The following decision problem is NP-hard:

Instance. A convex polytope Ξ ⊂ R^k and τ ∈ R.
Question. Do all ξ ∈ Ξ satisfy Σ_{i=1}^{k} 1(ξ_i ≥ 1) ≤ τ?    (57)
Proof We will show that an instance of decision problem (57) can be reduced to an equivalent integer
feasibility problem (56). Consider the instance τ = N − 1, with N < k, and let Ξ be given by

    Ξ := { ξ ∈ [0, 2]^k : Wξ ≥ h, Σ_{i=1}^{k} ξ_i = N }.    (58)

The decision problem (57) then evaluates to true if and only if there is no

    ξ ∈ [0, 2]^k such that Wξ ≥ h, Σ_{i=1}^{k} ξ_i = N, Σ_{i=1}^{k} 1(ξ_i ≥ 1) = N.    (59)

Checking the latter assertion is equivalent to checking whether decision problem (56) holds. Notice that,
by construction, there is no ξ ∈ Ξ such that Σ_{i=1}^{k} 1(ξ_i ≥ 1) > N. Using Lemma 2, we can thus deduce that
(57) is an NP-hard problem.
We now have all the ingredients to prove Theorem 1.
Proof of Theorem 1 Let Ξ be a convex polytope given by

    Ξ = { ξ ∈ [0, 2]^k : Wξ ≥ h, Σ_{i=1}^{k} ξ_i = N },

and denote by P_ξ the uniform distribution on Ξ. We define the following instance of (3):

    minimize     0
    subject to   y(·) ∈ B_{k,k},
                 ξ_i − 1 ≤ y_i(ξ) ≤ ξ_i,  i = 1, . . . , k,      ∀ξ ∈ Ξ.    (60)
                 Σ_{i=1}^{k} y_i(ξ) ≤ N − 1,

The optimal solution of Problem (60) is given by y_i^*(ξ) := 1(ξ_i ≥ 1), i = 1, . . . , k. We thus have

    Σ_{i=1}^{k} 1(ξ_i ≥ 1) ≤ N − 1  ∀ξ ∈ Ξ   ⟺   Problem (60) is feasible.
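That the indicator y_i^*(ξ) = 1(ξ_i ≥ 1) satisfies the bound constraints ξ_i − 1 ≤ y_i(ξ) ≤ ξ_i of (60) can be verified pointwise over [0, 2] (a quick numerical sketch; the grid resolution is arbitrary):

```python
def sandwich_holds(steps=200):
    """Check xi - 1 <= 1(xi >= 1) <= xi on a grid over [0, 2], i.e. that
    the indicator function obeys the bound constraints of Problem (60)."""
    for i in range(steps + 1):
        x = 2.0 * i / steps
        y = 1 if x >= 1 else 0
        if not (x - 1 <= y <= x):
            return False
    return True
```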
Hence, Lemma 3 implies that checking the feasibility of Problem (60) is NP-hard. We now set α_i = e_i
and β_i = 1, for i = 2, . . . , k, in the description of G(·) in (6b). By construction, there exists Y ∈ {0, 1}^{k×g}
such that y^*(ξ) = Y G(ξ), which achieves the same objective value in all of Problems (11), (15) and (60).
The above arguments allow us to conclude that

    Σ_{i=1}^{k} 1(ξ_i ≥ 1) ≤ N − 1  ∀ξ ∈ Ξ   ⟺   Problem (11) is feasible   ⟺   Problem (15) is feasible,

where the second equivalence follows from Propositions 1 and 2. Lemma 3 thus implies that Problems (11)
and (15) are NP-hard, even when Y ∈ Z^{k×g} is relaxed to Y ∈ R^{k×g}.
References
[1] B.D.O. Anderson and J.B. Moore. Optimal Control: Linear Quadratic Methods. Prentice Hall, 1990.
[2] D. Bampou and D. Kuhn. Scenario-free stochastic programming with polynomial decision rules.
50th IEEE Conference on Decision and Control and European Control Conference, Orlando, FL,
USA, pages 7806–7812, December 2011.
[3] A. Ben-Tal and D. den Hertog. Immunizing conic quadratic optimization problems against implementation errors, volume 2011-060. Tilburg University, 2011.
[4] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski. Robust Optimization. Princeton University Press,
2009.
[5] A. Ben-Tal, B. Golany, and S. Shtern. Robust multi-echelon multi-period inventory control. European Journal of Operational Research, 199(3):922–935, 2009.
[6] A. Ben-Tal, A. Goryashko, E. Guslitzer, and A. Nemirovski. Adjustable robust solutions of uncertain
linear programs. Mathematical Programming, 99(2):351–376, 2004.
[7] A. Ben-Tal and A. Nemirovski. Robust convex optimization. Mathematics of Operations Research,
23(4):769–805, 1998.
[8] D. Bertsimas and C. Caramanis. Adaptability via sampling. In 46th IEEE Conference on Decision
and Control, New Orleans, LA, USA, pages 4717–4722, December 2007.
[9] D. Bertsimas and C. Caramanis. Finite adaptability in multistage linear optimization. IEEE Transactions on Automatic Control, 55(12):2751–2766, 2010.
[10] D. Bertsimas and A. Georghiou. Design of near optimal decision rules in multistage adaptive mixed-integer optimization. Submitted for publication, available via Optimization-Online, 2013.
[11] D. Bertsimas, D.A. Iancu, and P.A. Parrilo. Optimality of affine policies in multi-stage robust
optimization. Mathematics of Operations Research, 35(2):363–394, 2010.
[12] D. Bertsimas, D.A. Iancu, and P.A. Parrilo. A hierarchy of near-optimal policies for multistage
adaptive optimization. IEEE Transactions on Automatic Control, 56(12):2809–2824, December
2011.
[13] G. C. Calafiore. Multi-period portfolio optimization with linear control policies. Automatica, 44(10):2463–2473, 2008.
[14] G. C. Calafiore. An affine control method for optimal dynamic asset allocation with transaction
costs. SIAM Journal on Control and Optimization, 48(4):2254–2274, 2009.
[15] C. Caramanis. Adaptable optimization: Theory and algorithms. PhD thesis, Massachusetts Institute
of Technology, 2006.
[16] X. Chen, M. Sim, P. Sun, and J. Zhang. A linear decision-based approximation approach to stochastic programming. Operations Research, 56(2):344–357, 2008.
[17] X. Chen and Y. Zhang. Uncertain linear programs: Extended affinely adjustable robust counterparts.
Operations Research, 57(6):1469–1482, 2009.
[18] M. Dyer and L. Stougie. Computational complexity of stochastic programming problems. Mathematical Programming, 106(3):423–432, 2006.
[19] S. J. Garstka and R. J.-B. Wets. On decision rules in stochastic programming. Mathematical Programming, 7(1):117–143, 1974.
[20] A. Georghiou, W. Wiesemann, and D. Kuhn. Generalized decision rule approximations for stochastic
programming via liftings. Mathematical Programming, to appear, 2014.
[21] J. Goh and M. Sim. Distributionally robust optimization and its tractable approximations. Operations Research, 58(4):902–917, 2010.
[22] P. J. Goulart, E. C. Kerrigan, and J. M. Maciejowski. Optimization over state feedback policies for robust control with constraints. Automatica, 42(4):523–533, 2006.
[23] C. E. Gounaris, W. Wiesemann, and C. A. Floudas. The robust capacitated vehicle routing problem
under demand uncertainty. Operations Research, 61(3):677–693, 2013.
[24] G. A. Hanasusanto, D. Kuhn, and W. Wiesemann. Two-stage robust integer programming. Submitted for publication, available via Optimization-Online, 2014.
[25] IBM ILOG CPLEX. http://www.ibm.com/software/integration/optimization/cplex-optimizer.
[26] D. Kuhn, W. Wiesemann, and A. Georghiou. Primal and dual linear decision rules in stochastic
and robust optimization. Mathematical Programming, 130(1):177–209, 2011.
[27] C. H. Papadimitriou. On the complexity of integer programming. Journal of the Association for
Computing Machinery, 28(4):765–768, 1981.
[28] R.T. Rockafellar and R.J.B. Wets. Variational Analysis. Springer, 1997.
[29] C.-T. See and M. Sim. Robust approximation to multiperiod inventory management. Operations
Research, 58(3):583–594, 2010.
[30] A. Shapiro and A. Nemirovski. On complexity of stochastic programming problems. Applied Optimization, 99(1):111–146, 2005.
[31] J. Skaf and S. P. Boyd. Design of affine controllers via convex optimization. IEEE Transactions on
Automatic Control, 55(11):2476–2487, 2010.