Binary Decision Rules for Multistage Adaptive Mixed-Integer Optimization

Dimitris Bertsimas∗  Angelos Georghiou†

August 20, 2014

Abstract

Decision rules provide a flexible toolbox for solving computationally demanding multistage adaptive optimization problems. There is a plethora of real-valued decision rules that are highly scalable and achieve good-quality solutions. On the other hand, existing binary decision rule structures tend to produce good-quality solutions at the expense of limited scalability, and are typically confined to worst-case optimization problems. To address these issues, we first propose a linearly parameterized binary decision rule structure and derive the exact reformulation of the decision rule problem. In the cases where the resulting optimization problem grows exponentially with respect to the problem data, we provide a systematic methodology that trades off scalability and optimality, resulting in practical binary decision rules. We also apply the proposed binary decision rules to the class of problems with random recourse and show that they have a similar complexity to the fixed-recourse problems. Our numerical results demonstrate the effectiveness of the proposed binary decision rules and show that they are (i) highly scalable and (ii) provide high-quality solutions.

Keywords. Adaptive optimization, binary decision rules, mixed-integer optimization.

1 Introduction

In this paper, we use robust optimization techniques to develop efficient solution methods for the class of multistage adaptive mixed-integer optimization problems. The one-stage variant of the problem can be described as follows. Given matrices A ∈ R^{m×n}, B ∈ R^{m×q}, C ∈ R^{n×k}, D ∈ R^{q×k}, H ∈ R^{m×k} and a probability measure P_ξ supported on a set Ξ for the uncertain vector ξ ∈ R^k, we are interested in choosing n real-valued functions x(·) ∈ R^{k,n} and q binary functions y(·) ∈ B^{k,q} in order to solve:

  minimize    E_ξ [ (Cξ)^⊤ x(ξ) + (Dξ)^⊤ y(ξ) ]
  subject to  x(·) ∈ R^{k,n},  y(·) ∈ B^{k,q},
              Ax(ξ) + By(ξ) ≤ Hξ   ∀ξ ∈ Ξ.        (1)

Here, R^{k,n} denotes the space of all real-valued functions from R^k to R^n, and B^{k,q} the space of binary functions from R^k to {0,1}^q. Problem (1) has a long history from both the theoretical and the practical point of view, with applications in many fields such as engineering [22, 31], operations management [5, 29], and finance [13, 14]. From a theoretical point of view, Dyer and Stougie [18] have shown that computing the optimal solution for the class of Problems (1) involving only real-valued decisions is #P-hard, while their multistage variants are believed to be "computationally intractable already when medium-accuracy solutions are sought" [30]. In view of these complexity results, there is a need for computationally efficient solution methods that possibly sacrifice optimality for tractability. To quote Shapiro and Nemirovski [30], ". . . in actual applications it is better to pose a modest and achievable goal rather than an ambitious goal which we do not know how to achieve." A drastic simplification that achieves this goal is to use decision rules. This functional approximation restricts the infinite space of the adaptive decisions x(·) and y(·) to admit pre-specified structures, allowing the use of numerical solution methods. Decision rules for real-valued functions have been used since 1974 [19]. Nevertheless, their potential was not fully exploited until recently, when new advances in robust optimization provided the tools to reformulate the decision rule problem as tractable convex optimization problems [4, 7]. Robust optimization techniques were first used by Ben-Tal et al. [6] to reformulate the linear decision rule problem.

∗ Sloan School of Management and Operations Research Center, Massachusetts Institute of Technology, USA. E-mail: [email protected].
† Automatic Control Laboratory, ETH Zurich, Switzerland. E-mail: [email protected], [email protected].
In this work, real-valued functions are parameterized as linear functions of the random variables, i.e., x(ξ) = x^⊤ ξ for x ∈ R^k. The simple structure of linear decision rules offers the scalability needed to tackle multistage adaptive problems. Even though linear decision rules are known to be optimal in a number of problem instances [1, 11, 23], their simple structure generically sacrifices a significant amount of optimality in return for scalability. To gain back some degree of optimality, attention was focused on the construction of non-linear decision rules. Inspired by the linear decision rule structure, the real-valued adaptive decisions are parameterized as x(ξ) = x^⊤ L(ξ), x ∈ R^{k′}, where L : R^k → R^{k′}, k′ ≥ k, is a non-linear operator defining the structure of the decision rule. This approximation provides significant improvements in solution quality, while retaining in large part the favourable scalability properties of the linear decision rules. Table 1 summarizes non-linear decision rules from the literature that are parameterized as x(ξ) = x^⊤ L(ξ), together with the complexity of the induced convex optimization problems.

Real-Valued Decision Rule Structure x(ξ) = x^⊤ L(ξ)   Computational Burden   Reference
Linear                                                LOP                    [6, 22, 26, 31]
Piecewise linear                                      LOP                    [17, 16, 20, 21]
Multilinear                                           LOP                    [20]
Quadratic                                             SOCOP                  [3, 20]
Power, Monomial, Inverse monomial                     SOCOP                  [20]
Polynomial                                            SDOP                   [2, 12]

Table 1: The table summarizes the literature on real-valued decision rule structures for which the resulting semi-infinite optimization problem can be reformulated exactly using robust optimization techniques. The computational burden refers to the structure of the resulting optimization problems for the case where the uncertainty set is described by a polyhedral set, with Linear Optimization Problems denoted by LOP, Second-Order Cone Optimization Problems denoted by SOCOP, and Semi-Definite Optimization Problems denoted by SDOP.

A notable exception of real-valued decision rules is the alternative parameterization x(ξ) = max{x_1^⊤ ξ, . . . , x_P^⊤ ξ} − max{x̄_1^⊤ ξ, . . . , x̄_P^⊤ ξ}, x_p, x̄_p ∈ R^k, proposed by Bertsimas and Georghiou [10] in the framework of worst-case adaptive optimization. This structure offers near-optimal designs but requires the solution of mixed-integer optimization problems. In contrast to the plethora of decision rule structures for real-valued decisions, the literature on discrete decision rules is somewhat limited. There are, however, three notable exceptions that deal with the case of worst-case adaptive optimization. The first one is the work of Bertsimas and Caramanis [8], where integer decision rules are parameterized as y(ξ) = y^⊤ ⌈ξ⌉, y ∈ Z^k, where ⌈·⌉ is the component-wise ceiling function. In this work, the resulting semi-infinite optimization problem is further approximated and solved using a randomized algorithm [15], providing only probabilistic guarantees on the feasibility of the solution. The second binary decision rule structure was proposed by Bertsimas and Georghiou [10], where the decision rule is parameterized as

  y(ξ) = 1  if max{y_1^⊤ ξ, . . . , y_P^⊤ ξ} − max{ȳ_1^⊤ ξ, . . . , ȳ_P^⊤ ξ} ≤ 0,        (2)
         0  otherwise,

with y_p, ȳ_p ∈ R^k, p = 1, . . . , P. This binary decision rule structure offers near-optimal designs at the expense of scalability. Finally, in the recent work by Hanasusanto et al. [24], the binary decision rule is restricted to the so-called "K-adaptable structure", resulting in binary adaptable policies with K contingency plans.
This approximation is an adaptation of the work presented in Bertsimas and Caramanis [9], and its use is limited to two-stage adaptive problems. The goal of this paper is to develop binary decision rule structures that can be used together with the real-valued decision rules listed in Table 1 for solving multistage adaptive mixed-integer optimization problems. The proposed methodology is inspired by the work of Bertsimas and Caramanis [8], and uses the tools developed in Georghiou et al. [20] to robustly reformulate the problem into a finite-dimensional mixed-integer linear optimization problem. The main contributions of this paper can be summarized as follows.

1. We propose a linearly parameterized binary decision rule structure, and derive the exact reformulation of the decision rule problem that achieves the best binary decision rule. We prove that for general polyhedral uncertainty sets and arbitrary decision rule structures, the decision rule problem is computationally intractable, resulting in mixed-integer linear optimization problems whose size can grow exponentially with respect to the problem data. To remedy this exponential growth, we use similar ideas to those discussed in Georghiou et al. [20] to provide a systematic trade-off between scalability and optimality, resulting in practical binary decision rules.

2. We apply the proposed binary decision rules to the class of problems with random recourse, i.e., to problem instances where the technology matrix B(ξ) is a given function of the uncertain vector ξ. We provide the exact reformulation of the binary decision rule problem, and we show that the resulting mixed-integer linear optimization problem has a similar complexity to the fixed-recourse case.

3. We demonstrate the effectiveness of the proposed methods in the context of a multistage inventory control problem (fixed-recourse problem), and a multistage knapsack problem (random-recourse problem).
We show that for the inventory control problem, we are able to solve problem instances with 50 time stages and 200 binary recourse decisions, while for the knapsack problem we are able to solve problem instances with 25 time stages and 1300 binary recourse decisions.

The rest of this paper is organized as follows. In Section 2, we outline our approach for binary decision rules in the context of one-stage adaptive optimization problems with fixed recourse, and in Section 3 we discuss the extension to random-recourse problems. In Section 4, we extend the proposed approach to multistage adaptive optimization problems, and in Section 5, we present our computational results. Throughout the paper, we focus on problem instances involving only binary decision rules, for ease of exposition. The extension to problem instances involving both real-valued and binary recourse decisions can be easily extrapolated, and is thus omitted.

Notation. We denote scalar quantities by lowercase, non-boldface symbols and vector quantities by lowercase, boldface symbols, e.g., x ∈ R and x ∈ R^n, respectively. Similarly, scalar- and vector-valued functions are denoted by x(·) ∈ R and x(·) ∈ R^n, respectively. Matrices are denoted by uppercase symbols, e.g., A ∈ R^{n×m}. We model uncertainty by a probability space (R^k, B(R^k), P_ξ) and denote the elements of the sample space R^k by ξ. The Borel σ-algebra B(R^k) is the set of events that are assigned probabilities by the probability measure P_ξ. The support Ξ of P_ξ represents the smallest closed subset of R^k which has probability 1, and E_ξ(·) denotes the expectation operator with respect to P_ξ. Tr(A) denotes the trace of a square matrix A ∈ R^{n×n}, and 1(·) is the indicator function. Finally, e_k denotes the kth canonical basis vector, while e denotes the vector whose components are all ones. In both cases, the dimension will be clear from the context.
2 Binary Decision Rules for Fixed-Recourse Problems

In this section, we present our approach for one-stage adaptive optimization problems with fixed recourse, involving only binary decisions. Given matrices B ∈ R^{m×q}, D ∈ R^{q×k}, H ∈ R^{m×k} and a probability measure P_ξ supported on a set Ξ for the uncertain vector ξ ∈ R^k, we are interested in choosing binary functions y(·) ∈ B^{k,q} in order to solve:

  minimize    E_ξ [ (Dξ)^⊤ y(ξ) ]
  subject to  y(·) ∈ B^{k,q},
              By(ξ) ≤ Hξ   ∀ξ ∈ Ξ.        (3)

Here, we assume that the uncertainty set Ξ is a non-empty, convex and compact polyhedron

  Ξ = { ξ ∈ R^k : ∃ζ ∈ R^v such that Wξ + Uζ ≥ h, ξ_1 = 1 },        (4)

where W ∈ R^{l×k}, U ∈ R^{l×v} and h ∈ R^l. Moreover, we assume that the linear hull of Ξ spans R^k. The parameter ξ_1 is set equal to 1 without loss of generality, as it allows us to represent affine functions of the non-degenerate outcomes (ξ_2, . . . , ξ_k) in a compact manner as linear functions of ξ = (ξ_1, . . . , ξ_k). If the distribution governing the uncertainty ξ is unknown, or if the decision maker is very risk-averse, then one might choose to alternatively minimize the worst-case cost with respect to all possible scenarios ξ ∈ Ξ. This can be achieved by replacing the objective function in Problem (3) with the worst-case objective max_{ξ∈Ξ} (Dξ)^⊤ y(ξ). By introducing the auxiliary variable τ ∈ R and an epigraph formulation, we can equivalently write the worst-case problem as the following adaptive robust optimization problem:

  minimize    τ
  subject to  τ ∈ R,  y(·) ∈ B^{k,q},
              (Dξ)^⊤ y(ξ) ≤ τ,  By(ξ) ≤ Hξ   ∀ξ ∈ Ξ.        (5)

Notice that ξ appears on the left-hand side of the constraint (Dξ)^⊤ y(ξ) ≤ τ, multiplying the binary decisions y(ξ). Therefore, Problem (5) is an instance of the class of problems with random recourse, which will be investigated in Section 3. Problem (3) involves a continuum of decision variables and inequality constraints.
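As an aside, membership of a given scenario ξ in a polyhedral uncertainty set of the form (4) reduces to a small feasibility LP over the auxiliary variable ζ. The sketch below illustrates this for a two-dimensional set of the form (4); the particular matrices W, U, h are our own illustrative choice, not taken from the paper (numpy and scipy assumed):

```python
import numpy as np
from scipy.optimize import linprog

def in_uncertainty_set(xi, W, U, h, tol=1e-9):
    """Check xi in Xi = {xi : exists zeta with W xi + U zeta >= h, xi_1 = 1}
    by solving an LP feasibility problem over zeta (a sketch)."""
    if abs(xi[0] - 1.0) > tol:
        return False
    # Feasibility: find zeta with U zeta >= h - W xi, i.e. -U zeta <= W xi - h.
    res = linprog(c=np.zeros(U.shape[1]), A_ub=-U, b_ub=W @ xi - h,
                  bounds=[(None, None)] * U.shape[1], method="highs")
    return res.status == 0

# Hypothetical instance: Xi = {(1, xi2) : xi2 in [-1, 1]}, written with a
# dummy zeta variable (v = 1, zero column in U).
W = np.array([[0.0, 1.0], [0.0, -1.0]])   # encodes xi2 >= -1 and -xi2 >= -1
U = np.zeros((2, 1))
h = np.array([-1.0, -1.0])
print(in_uncertainty_set(np.array([1.0, 0.5]), W, U, h))   # inside the set
print(in_uncertainty_set(np.array([1.0, 1.5]), W, U, h))   # outside the set
```

Such a check is useful, for example, for validating sampled scenarios before estimating expectations over Ξ.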
Therefore, in order to make the problem amenable to numerical solution, there is a need for suitable functional approximations of y(·).

2.1 Structure of Binary Decision Rules

In this section, we present the structure of the binary decision rules. We restrict the feasible region of the binary functions y(·) ∈ B^{k,q} to admit the following piecewise constant structure:

  y(ξ) = Y G(ξ),  Y ∈ Z^{q×g},  0 ≤ Y G(ξ) ≤ e,  ∀ξ ∈ Ξ,        (6a)

where G : R^k → {0,1}^g is a piecewise constant function:

  G_1(ξ) := 1,  G_i(ξ) := 1 ( α_i^⊤ ξ ≥ β_i ),  i = 2, . . . , g,        (6b)

for given α_i ∈ R^k and β_i ∈ R, i = 2, . . . , g. Here, we assume that both the sets {ξ ∈ Ξ : α_i^⊤ ξ ≥ β_i} and {ξ ∈ Ξ : α_i^⊤ ξ ≤ β_i} have non-empty interior for all i = 2, . . . , g, and that α_i^⊤ ξ − β_i ≠ α_j^⊤ ξ − β_j for all ξ ∈ Ξ, where i, j ∈ {1, . . . , g}, i ≠ j. Since G(·) is a piecewise constant function, y(ξ) = Y G(ξ), Y ∈ Z^{q×g}, can give rise to an arbitrary integer decision rule. By imposing the additional constraint 0 ≤ Y G(ξ) ≤ e for all ξ ∈ Ξ, y(·) is restricted to the space of binary decision rules. Applying decision rules (6) to Problem (3) yields the following semi-infinite problem, which involves a finite number of decision variables Y ∈ Z^{q×g} and an infinite number of constraints:

  minimize    E_ξ [ (Dξ)^⊤ Y G(ξ) ]
  subject to  Y ∈ Z^{q×g},
              B Y G(ξ) ≤ Hξ,  0 ≤ Y G(ξ) ≤ e   ∀ξ ∈ Ξ.        (7)

Notice that all decision variables appear linearly in Problem (7). Nevertheless, the objective and constraints are non-linear functions of ξ. The following example illustrates the use of the binary decision rule (7) in a simple instance of Problem (3), motivating the choice of the vectors α_i ∈ R^k and scalars β_i ∈ R, i = 2, . . . , g, and showing the relationship between Problems (3) and (7).

Example 1 Consider the following instance of Problem (3):

  minimize    E_ξ [ y(ξ) ]
  subject to  y(·) ∈ B^{2,1},
              y(ξ) ≥ ξ_2   ∀ξ ∈ Ξ,        (8)

where P_ξ is a uniform distribution supported on Ξ = { (ξ_1, ξ_2) ∈ R^2 : ξ_1 = 1, ξ_2 ∈ [−1, 1] }.
The optimal solution of Problem (8) is y*(ξ) = 1(ξ_2 ≥ 0), achieving an optimal value of 1/2. One can attain the same solution by solving the following semi-infinite problem, where G(·) is defined by G_1(ξ) = 1, G_2(ξ) = 1(ξ_2 ≥ 0), i.e., α_2 = (0, 1)^⊤, β_2 = 0 and g = 2, see Figure 1:

  minimize    E_ξ [ y^⊤ G(ξ) ]
  subject to  y ∈ Z^2,
              y^⊤ G(ξ) ≥ ξ_2,  0 ≤ y^⊤ G(ξ) ≤ 1   ∀ξ ∈ Ξ.        (9)

The optimal solution of Problem (9) is y* = (0, 1)^⊤, achieving an optimal value of 1/2, and is equivalent to the optimal binary decision rule in Problem (8).

Figure 1: Plot of the piecewise constant function G_2(ξ) = 1(ξ_2 ≥ 0). This structure of G(·) gives rise to piecewise constant decision rules y(ξ) = y^⊤ G(ξ) for some y ∈ R^2. If one imposes the constraints y ∈ Z^2 and 0 ≤ y^⊤ G(ξ) ≤ 1, y(·) is restricted to the space of binary decision rules induced by G.

In the following section, we present robust reformulations of the semi-infinite Problem (7).

2.2 Computing the Best Binary Decision Rule

In this section, we present an exact reformulation of Problem (7). The solution of the reformulated problem achieves the best binary decision rule associated with structure (6). The main idea used in this section is to map Problem (7) to an equivalent lifted adaptive optimization problem on a higher-dimensional probability space. This allows us to represent the constraints of Problem (7), which are non-convex with respect to ξ, as linear constraints in the lifted adaptive optimization problem. The relation between the uncertain parameters in the original and the lifted problems is determined through the piecewise constant operator G(·). By taking advantage of the linear structure of the lifted problem, we will employ linear duality arguments to reformulate the semi-infinite structure of the lifted problem into a finite-dimensional mixed-integer linear optimization problem. The work presented in this section uses the tools developed by Georghiou et al. [20, Section 3].
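Before building the lifting machinery, the optimal rule of Example 1 can be sanity-checked by simulation. The following minimal sketch (numpy assumed) evaluates y(ξ) = y^⊤ G(ξ) with the optimal coefficients y* = (0, 1)^⊤ on uniform samples of ξ_2, verifies feasibility, and estimates the objective:

```python
import numpy as np

# Decision rule of Example 1: G(xi) = (1, 1(xi_2 >= 0)) and y* = (0, 1).
def G(xi2):
    return np.array([1.0, 1.0 if xi2 >= 0 else 0.0])

y_star = np.array([0.0, 1.0])

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=100_000)      # xi_2 ~ Uniform[-1, 1]
values = np.array([y_star @ G(x2) for x2 in samples])

assert np.all(values >= samples)                    # feasibility: y(xi) >= xi_2
print(values.mean())                                # approx. 1/2, the optimal value
```

The simulated objective agrees with the analytical optimal value 1/2, since y(ξ) = 1 exactly when ξ_2 ≥ 0, an event of probability 1/2.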
We define the non-linear operator

  L : R^k → R^{k′},  L(ξ) = ( ξ^⊤, G(ξ)^⊤ )^⊤,        (10)

where k′ = k + g, and define the projection matrices R_ξ ∈ R^{k×k′} and R_G ∈ R^{g×k′} such that R_ξ L(ξ) = ξ and R_G L(ξ) = G(ξ). We will refer to L(·) as the lifting operator. Using L(·) and R_ξ, R_G, we can rewrite Problem (7) as the following optimization problem:

  minimize    E_ξ [ (D R_ξ L(ξ))^⊤ Y R_G L(ξ) ]
  subject to  Y ∈ Z^{q×g},
              B Y R_G L(ξ) ≤ H R_ξ L(ξ),  0 ≤ Y R_G L(ξ) ≤ e   ∀ξ ∈ Ξ.        (11)

Notice that this simple reformulation still has an infinite number of constraints. Nevertheless, due to the structure of R_ξ and R_G, the constraints are now linear with respect to L(ξ). The objective function of Problem (11) can be further reformulated as

  E_ξ [ (D R_ξ L(ξ))^⊤ Y R_G L(ξ) ] = Tr ( M R_ξ^⊤ D^⊤ Y R_G ),

where M ∈ R^{k′×k′}, M = E_ξ ( L(ξ) L(ξ)^⊤ ), is the second-order moment matrix of the uncertain vector L(ξ). In some special cases where P_ξ has a simple structure, e.g., P_ξ is a uniform distribution, and G(·) is not too complicated, M can be calculated analytically. If this is not the case, an arbitrarily good approximation of M can be calculated using Monte Carlo simulation. This is not computationally demanding, as it does not involve an optimization problem and can be done offline. We now define the uncertain vector ξ′ = (ξ′_1^⊤, ξ′_2^⊤)^⊤ ∈ R^{k′} such that ξ′ := L(ξ) = (ξ^⊤, G(ξ)^⊤)^⊤. ξ′ has probability measure P_ξ′, which is defined on the space (R^{k′}, B(R^{k′})) and is completely determined by the probability measure P_ξ through the relation

  P_ξ′(B′) := P_ξ ( { ξ ∈ R^k : L(ξ) ∈ B′ } )   ∀B′ ∈ B(R^{k′}).        (12)

We also introduce the probability measure support

  Ξ′ := L(Ξ) = { ξ′ ∈ R^{k′} : ∃ξ ∈ Ξ such that L(ξ) = ξ′ }
             = { ξ′ ∈ R^{k′} : R_ξ ξ′ ∈ Ξ, L(R_ξ ξ′) = ξ′ },        (13)

and the expectation operator E_ξ′(·) with respect to the probability measure P_ξ′. We will refer to P_ξ′ and Ξ′ as the lifted probability measure and uncertainty set, respectively.
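For instance, for the lifting of Example 1 (k = 2, g = 2, so k′ = 4), the moment matrix M = E_ξ(L(ξ)L(ξ)^⊤) can be estimated by the straightforward sampling mentioned above; a sketch, assuming numpy:

```python
import numpy as np

# Monte Carlo estimate of M = E[L(xi) L(xi)^T] for the lifting of Example 1:
# L(xi) = (xi_1, xi_2, G_1(xi), G_2(xi)) with G_1 = 1, G_2 = 1(xi_2 >= 0),
# and xi_2 uniform on [-1, 1].
def L(xi2):
    return np.array([1.0, xi2, 1.0, 1.0 if xi2 >= 0 else 0.0])

rng = np.random.default_rng(1)
samples = rng.uniform(-1.0, 1.0, size=200_000)
M = np.mean([np.outer(L(x2), L(x2)) for x2 in samples], axis=0)
print(np.round(M, 2))
# E.g. M[1, 3] = E[xi_2 * 1(xi_2 >= 0)] should be close to 1/4,
# and M[1, 1] = E[xi_2^2] close to 1/3.
```

For this simple distribution the entries can of course also be computed analytically; the simulation route becomes relevant when P_ξ or G(·) is more complicated.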
Notice that although Ξ is a convex polyhedron, Ξ′ can be highly non-convex due to the non-linear nature of L(·). Using the definition of ξ′, we introduce the following lifted adaptive optimization problem:

  minimize    Tr ( M′ R_ξ^⊤ D^⊤ Y R_G )
  subject to  Y ∈ Z^{q×g},
              B Y R_G ξ′ ≤ H R_ξ ξ′,  0 ≤ Y R_G ξ′ ≤ e   ∀ξ′ ∈ Ξ′,        (14)

where M′ ∈ R^{k′×k′}, M′ = E_ξ′ ( ξ′ ξ′^⊤ ), is the second-order moment matrix associated with ξ′. Since there is a one-to-one mapping between P_ξ and P_ξ′, M = M′.

Proposition 1 Problems (11) and (14) are equivalent in the following sense: both problems have the same optimal value, and there is a one-to-one mapping between feasible and optimal solutions in both problems.

Proof See [20, Proposition 3.6 (i)].

The lifted uncertainty set Ξ′ is not a closed set, due to the discontinuous nature of G(·). This can be problematic in an optimization framework. We therefore define Ξ̄′ := cl(Ξ′) to be the closure of the set Ξ′, and introduce the following variant of Problem (14):

  minimize    Tr ( M′ R_ξ^⊤ D^⊤ Y R_G )
  subject to  Y ∈ Z^{q×g},
              B Y R_G ξ′ ≤ H R_ξ ξ′,  0 ≤ Y R_G ξ′ ≤ e   ∀ξ′ ∈ Ξ̄′.        (15)

The following proposition demonstrates that Problems (14) and (15) are in fact equivalent.

Proposition 2 Problems (14) and (15) are equivalent in the following sense: both problems have the same optimal value, and there is a one-to-one mapping between feasible and optimal solutions in both problems.

Proof To prove the assertion, it is sufficient to show that Problems (14) and (15) have the same feasible region. Notice that since the constraints are linear in ξ′, the semi-infinite constraints in Problem (14) can be equivalently written as

  (H R_ξ − B Y R_G) ∈ (cone(Ξ′)*)^m,  (Y R_G) ∈ (cone(Ξ′)*)^q,  (e e_1^⊤ − Y R_G) ∈ (cone(Ξ′)*)^q,

where cone(Ξ′)* is the dual cone of cone(Ξ′). Similarly, the semi-infinite constraints in Problem (15) can be equivalently written as

  (H R_ξ − B Y R_G) ∈ (cone(Ξ̄′)*)^m,  (Y R_G) ∈ (cone(Ξ̄′)*)^q,  (e e_1^⊤ − Y R_G) ∈ (cone(Ξ̄′)*)^q.
From [28, Corollary 6.21], we have that cone(Ξ′)* = cone(Ξ̄′)*, and therefore we can conclude that the feasible regions of Problems (14) and (15) coincide.

Problem (15) is linear in both the decision variables Y and the uncertain vector ξ′. Despite this nice bilinear structure, in the following we demonstrate that Problems (11) and (15) are generically intractable for decision rules of type (6).

Theorem 1 Problems (11) and (15), defined through L(·) in (10) and G(·) in (6b), are NP-hard even when Y ∈ Z^{q×g} is relaxed to Y ∈ R^{q×g}.

Proof See Appendix A.

Theorem 1 provides a rather disappointing result on the complexity of Problems (11) and (15). Therefore, unless P = NP, there is no algorithm that solves generic problems of type (11) and (15) in polynomial time. This difficulty stems from the generic structure of G(·) in (6b), combined with a generic polyhedral uncertainty set Ξ, resulting in highly non-convex sets Ξ̄′. Nevertheless, in the following we derive the exact polyhedral representation of conv(Ξ̄′), allowing us to use linear duality arguments from [4, 7] to reformulate Problem (15) into a mixed-integer linear optimization problem. We remark that Theorem 1 also covers decision rule problems involving real-valued, piecewise constant decision rules constructed using (6b), and demonstrates that these problems are also computationally intractable.

We construct the convex hull of Ξ̄′ by taking convex combinations of its extreme points, denoted by ext(Ξ̄′). These points are either extreme points of Ξ′, or limit points in Ξ̄′\Ξ′. We construct these points as follows. First, using the definition of G(·), we partition Ξ into P polyhedra Ξ_p. Then, for all extreme points of Ξ_p, ξ ∈ ext(Ξ_p), we calculate the one-sided limit point of L at ξ. The set of all one-sided limit points coincides with the set of extreme points of Ξ̄′, see Figure 3. The partitions of Ξ can be formulated as follows:

  Ξ_p = { ξ ∈ Ξ : α_i^⊤ ξ ≥ β_i, i ∈ G_p,  α_i^⊤ ξ ≤ β_i, i ∈ {2, . . . , g}\G_p },  p = 1, . . . , P,        (16)

where G_p ⊆ {2, . . . , g}. Here, G_p is constructed such that the number of partitions P is maximized, while ensuring that (a) all Ξ_p are non-empty, (b) any partition pair Ξ_i and Ξ_j, i ≠ j, can overlap only on one of their facets, and (c) Ξ = ∪_{p=1}^P Ξ_p, see Figure 2. Depending on the choices of α_i and β_i, the number of partitions can be up to P = 2^g.

Figure 2: The uncertainty set Ξ is partitioned into the polyhedra Ξ_p, p = 1, . . . , 4, using the hyperplanes α_1^⊤ ξ = β_1 and α_2^⊤ ξ = β_2, such that Ξ = Ξ_1 ∪ . . . ∪ Ξ_4.

We define V and V(p) such that

  V := ∪_{p=1}^P V(p),  V(p) := ext(Ξ_p),  p = 1, . . . , P.        (17)

Due to the discontinuity of G(·), the set of points {ξ′ ∈ R^{k′} : ξ′ = L(ξ), ξ ∈ V} does not contain the points Ξ̄′\Ξ′. To construct these points, we now define the one-sided limit points L̂_p(ξ) = (ξ^⊤, Ĝ_p(ξ)^⊤)^⊤ for all ξ ∈ V(p) and each partition p = 1, . . . , P. Here, Ĝ_p : R^k → R^g is given by

  Ĝ_p(ξ) := lim_{u ∈ Ξ_p, u → ξ} G(u),  ∀ξ ∈ V(p),  p = 1, . . . , P.        (18)

From definition (6b), for each partition p, G(ξ) is constant for all ξ in the interior of Ξ_p, which is denoted by int(Ξ_p). Therefore, for each ξ̃ ∈ V(p), the one-sided limit Ĝ_p(ξ̃) is equal to G(ξ) for all ξ ∈ int(Ξ_p). Furthermore, since each partition Ξ_p spans R^k, there exist at least k + 1 vertices in Ξ_p and, therefore, at least k + 1 one-sided limit points Ĝ_p(ξ). From definition (18), we have that the points L̂_p(ξ) are the extreme points of Ξ̄′ for all ξ ∈ V(p) and p = 1, . . . , P, see Figure 3.

Figure 3: Convex hull representation of Ξ̄′ induced by the lifting ξ′ = (ξ′_1, ξ′_2) = L(ξ) = (ξ, G(ξ))^⊤, where G(ξ) = 1(ξ ≥ 5) and ξ ∈ [0, 10]. Here, V = {ext(Ξ_1), ext(Ξ_2)} = {0, 5, 10}. The convex hull is constructed by taking convex combinations of the points L(0), L(5), L(10) (black dots) and the point (5, Ĝ_1(5))^⊤ (white dot), where Ĝ_1(5) is the one-sided limit point at ξ = 5 using partition Ξ_1.

The following proposition gives the polyhedral representation of the convex hull of Ξ̄′.

Proposition 3 Let L(·) be given by (10) and G(·) be defined in (6b). Then, the tight, closed, outer approximation for the convex hull of Ξ̄′ is given by the following polyhedron:

  conv(Ξ̄′) = { ξ′ = (ξ′_1^⊤, ξ′_2^⊤)^⊤ ∈ R^{k′} : ∃ζ_p(v) ∈ R_+, ∀v ∈ V(p), p = 1, . . . , P, such that
                Σ_{p=1}^P Σ_{v∈V(p)} ζ_p(v) = 1,
                ξ′_1 = Σ_{p=1}^P Σ_{v∈V(p)} ζ_p(v) v,
                ξ′_2 = Σ_{p=1}^P Σ_{v∈V(p)} ζ_p(v) Ĝ_p(v) }.        (19)

Proof The proof is split in two parts: first, we prove that L(Ξ) ⊆ conv(Ξ̄′) holds, and then we show that conv(cl(L(Ξ))) ⊇ conv(Ξ̄′) is true as well, concluding that conv(cl(L(Ξ))) = conv(Ξ̄′). To prove the assertion L(Ξ) ⊆ conv(Ξ̄′), pick any ξ ∈ Ξ. By construction, ξ belongs to some partition of Ξ, say Ξ_p, for which L(ξ) is an element of Ξ′. There are two possibilities: (a) ξ lies on the boundary of the partition and hence on one of its facets, or (b) ξ lies in the interior of Ξ_p. If ξ lies on a facet, then, since Ξ_p spans R^k, Carathéodory's Theorem implies that there are k extreme points of the facet, v_1, . . . , v_k ∈ V(p), such that for some δ(v) ∈ R_+^k with Σ_{i=1}^k δ(v_i) = 1, we have ξ = Σ_{i=1}^k δ(v_i) v_i. Since G(·) is constant on each facet, the k one-sided limit points Ĝ_p(v_i), v_1, . . . , v_k ∈ V(p), attain the same value as G(ξ). Therefore, we have

  G ( Σ_{i=1}^k δ(v_i) v_i ) = Σ_{i=1}^k δ(v_i) Ĝ_p(v_i).

Hence, setting ζ_p(v_i) = δ(v_i), i = 1, . . . , k, in the definition of the convex hull, with the rest of the ζ(v) equal to zero, proves part (a) of the assertion. If ξ lies in the interior of Ξ_p, then again by Carathéodory's Theorem there are k + 1 extreme points of Ξ_p such that for some δ(v) ∈ R_+^{k+1} and v_1, . . . , v_{k+1} ∈ V(p) with Σ_{i=1}^{k+1} δ(v_i) = 1, we have ξ = Σ_{i=1}^{k+1} δ(v_i) v_i.
By construction, there also exist k + 1 one-sided limit points in the collection {Ĝ_p(v) : v ∈ V(p)} that attain the same value as G(ξ) for all ξ ∈ int(Ξ_p). Thus, we have that

  G ( Σ_{i=1}^{k+1} δ(v_i) v_i ) = Σ_{i=1}^{k+1} δ(v_i) Ĝ_p(v_i).

Setting ζ_p(v_i) = δ(v_i), i = 1, . . . , k + 1, with the rest of the ζ(v) equal to zero, proves part (b) of the assertion. To prove the second part of the assertion, conv(cl(L(Ξ))) ⊇ conv(Ξ̄′), fix any ξ′ in the polyhedron on the right-hand side of (19). By construction, there exist ζ_p(v) ∈ R_+, p = 1, . . . , P, v ∈ V, such that

  Σ_{p=1}^P Σ_{v∈V(p)} ζ_p(v) = 1,  ξ′_1 = Σ_{p=1}^P Σ_{v∈V(p)} ζ_p(v) v,  ξ′_2 = Σ_{p=1}^P Σ_{v∈V(p)} ζ_p(v) Ĝ_p(v).

This implies that

  ( ξ′_1^⊤, ξ′_2^⊤ )^⊤ = Σ_{p=1}^P Σ_{v∈V(p)} ζ_p(v) ( v^⊤, Ĝ_p(v)^⊤ )^⊤,

that is, ξ′ is a convex combination of either corner points or limit points of L(Ξ). Therefore, conv(Ξ̄′) is equal to conv(cl(L(Ξ))). This concludes the proof.

The convex hull (19) is a closed and bounded polyhedral set described by a finite set of linear inequalities. Therefore, one can rewrite (19) as the following polyhedron:

  conv(Ξ̄′) = { ξ′ ∈ R^{k′} : ∃ζ′ ∈ R^{v′} such that W′ ξ′ + U′ ζ′ ≥ h′ },        (20)

where W′ ∈ R^{l′×k′}, U′ ∈ R^{l′×v′} and h′ ∈ R^{l′} are completely determined through Proposition 3. The following proposition captures the essence of robust optimization and provides the tools for reformulating the infinite number of constraints in Problem (15), see [4, 7]. The proof is repeated here to keep the paper self-contained.

Proposition 4 For any Z ∈ R^{m×k′} and conv(Ξ̄′) given by (20), the following statements are equivalent:
(i) Z ξ′ ≥ 0 for all ξ′ ∈ conv(Ξ̄′);
(ii) ∃Λ ∈ R_+^{m×l′} with ΛW′ = Z, ΛU′ = 0, Λh′ ≥ 0.

Proof We denote by Z_μ^⊤ the μth row of the matrix Z.
Then, statement (i) is equivalent to

  Z ξ′ ≥ 0 for all ξ′ ∈ conv(Ξ̄′)
  ⇐⇒  0 ≤ min_{ξ′} { Z_μ^⊤ ξ′ : ∃ζ′ ∈ R^{v′}, W′ ξ′ + U′ ζ′ ≥ h′ }   ∀μ = 1, . . . , m
  ⇐⇒  0 ≤ max_{Λ_μ ∈ R^{l′}} { h′^⊤ Λ_μ : W′^⊤ Λ_μ = Z_μ, U′^⊤ Λ_μ = 0, Λ_μ ≥ 0 }   ∀μ = 1, . . . , m
  ⇐⇒  ∃Λ_μ ∈ R^{l′} with W′^⊤ Λ_μ = Z_μ, U′^⊤ Λ_μ = 0, h′^⊤ Λ_μ ≥ 0, Λ_μ ≥ 0   ∀μ = 1, . . . , m.        (21)

The equivalence in the third line follows from linear duality. Interpreting Λ_μ^⊤ as the μth row of a new matrix Λ ∈ R^{m×l′} shows that the last line in (21) is equivalent to assertion (ii). Thus, the claim follows.

Using Proposition 4 together with the polyhedron (20), we can now reformulate Problem (15) into the following mixed-integer linear optimization problem:

  minimize    Tr ( M′ R_ξ^⊤ D^⊤ Y R_G )
  subject to  Y ∈ Z^{q×g},  Λ ∈ R_+^{m×l′},  Γ ∈ R_+^{q×l′},  Θ ∈ R_+^{q×l′},
              B Y R_G + Λ W′ = H R_ξ,  Λ U′ = 0,  Λ h′ ≥ 0,
              Y R_G = Γ W′,  Γ U′ = 0,  Γ h′ ≥ 0,
              e e_1^⊤ − Y R_G = Θ W′,  Θ U′ = 0,  Θ h′ ≥ 0.        (22)

Here, Λ ∈ R_+^{m×l′}, Γ ∈ R_+^{q×l′} and Θ ∈ R_+^{q×l′} are the auxiliary variables associated with the constraints B Y R_G ξ′ ≤ H R_ξ ξ′, 0 ≤ Y R_G ξ′ and Y R_G ξ′ ≤ e, respectively. We emphasize that Problem (22) is the exact reformulation of Problem (15), since (19) is a tight outer approximation of the convex hull of Ξ̄′. Therefore, the solution of Problem (22) achieves the best binary decision rule associated with G(·) in (6b). Notice that the size of Problem (22) grows quadratically with respect to the number of binary decisions q and the number l′ of constraints of conv(Ξ̄′). However, l′ can be very large, as it depends on the cardinality of V, which is constructed using the extreme points of the P = 2^g partitions Ξ_p. Therefore, the size of Problem (22) can grow exponentially with respect to the description of Ξ and the complexity g of the binary decision rule defined in (6).
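The duality argument in Proposition 4 can be illustrated numerically. On a toy polytope of our own choosing (with U′ = 0), the sketch below minimizes each row Z_μ^⊤ ξ′ by linear programming to check statement (i), and recovers a valid multiplier Λ_μ for statement (ii) from the LP dual values (numpy and scipy assumed; Wp, hp, Z are hypothetical stand-ins for W′, h′, Z):

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance: the polytope {xi : Wp xi >= hp} is the unit box [0, 1]^2,
# and Z xi >= 0 holds on it.
Wp = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
hp = np.array([0.0, -1.0, 0.0, -1.0])
Z = np.array([[1.0, 1.0], [2.0, 0.0]])

for z in Z:
    # Statement (i), row-wise: 0 <= min { z^T xi : Wp xi >= hp }.
    res = linprog(c=z, A_ub=-Wp, b_ub=-hp,
                  bounds=[(None, None)] * 2, method="highs")
    assert res.status == 0 and res.fun >= -1e-9
    # Statement (ii): the dual multipliers of the inequality constraints give
    # Lambda_mu >= 0 with Wp^T Lambda_mu = z and hp^T Lambda_mu = res.fun >= 0.
    lam = -res.ineqlin.marginals
    assert np.all(lam >= -1e-9)
    assert np.allclose(Wp.T @ lam, z)
    assert hp @ lam >= -1e-9
print("Proposition 4 verified on the toy instance")
```

The sign flip `lam = -res.ineqlin.marginals` reflects that SciPy reports nonpositive marginals for upper-bound constraints in a minimization, whereas Proposition 4 states the multipliers in nonnegative form.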
In the following, we present instances of Ξ and G(·) for which the proposed solution method results in mixed-integer optimization problems whose size grows only polynomially with respect to the input parameters.

2.3 Scalable Binary Decision Rule Structures

In this section, we derive a scalable mixed-integer linear optimization reformulation of Problem (7) by considering simplified instances of the uncertainty set Ξ and the binary decision rules y(·). We will show that for the structures of Ξ and y(·) considered in this section, the size of the resulting mixed-integer linear optimization problem grows polynomially in the size of the original Problem (3) as well as in the description of the binary decision rules. We consider two cases: (i) uncertainty sets that can be described as hyperrectangles, and (ii) general polyhedral uncertainty sets. We now consider case (i), and define the uncertainty set of Problem (3) to have the following hyperrectangular structure:

  Ξ = { ξ ∈ R^k : l ≤ ξ ≤ u, ξ_1 = 1 },        (23)

for given l, u ∈ R^k. We restrict the feasible region of the binary functions y(·) ∈ B^{k,q} to admit the piecewise constant structure (6a), but now G : R^k → {0,1}^g is written in the form G(·) = (G_1(·), G_{2,1}(·), . . . , G_{2,r}(·), . . . , G_{k,r}(·))^⊤, with g = 1 + (k − 1)r. Here, G_1 : R^k → {0,1} and G_{i,j} : R^k → {0,1}, for j = 1, . . . , r, i = 2, . . . , k, are piecewise constant functions given by

  G_1(ξ) := 1,  G_{i,j}(ξ) := 1 ( ξ_i ≥ β_{i,j} ),  j = 1, . . . , r,  i = 2, . . . , k,        (24)

for fixed β_{i,j} ∈ R, j = 1, . . . , r, i = 2, . . . , k. By construction, we assume that l_i < β_{i,1} < . . . < β_{i,r} < u_i for all i = 2, . . . , k. Structure (24) gives rise to binary decision rules that are discontinuous along the axes of the uncertainty set, and have r discontinuities in each direction. Notice that (24) is a special case of (6b), where the vectors α_i are replaced by the canonical basis vectors e_i. One can easily modify (24) such that the number of discontinuities r is different for each direction ξ_i.
We will refer to β_{i,1},…,β_{i,r} as the breakpoints in direction ξ_i. Problem (7) retains the same semi-infinite structure under (23) and (24). In the following, we use the same arguments as in Section 2.2 to (a) redefine Problem (7) in a higher dimensional space and (b) construct the convex hull of the lifted uncertainty set and use it together with Proposition 4 to reformulate the infinite number of constraints of the lifted problem.

We define the non-linear lifting operator L : R^k → R^{k'} to be L(·) = (L_1(·)^⊤,…,L_k(·)^⊤)^⊤ such that

    L_1 : R^k → R^2,      L_1(ξ) = (ξ_1, G_1(ξ))^⊤,
    L_i : R^k → R^{r+1},  L_i(ξ) = (ξ_i, G_i(ξ)^⊤)^⊤,   i = 2,…,k,   (25)

with k' = 2 + (k−1)(r+1). Here, with slight abuse of notation, G_i(·) = (G_{i,1}(·),…,G_{i,r}(·))^⊤ for i = 2,…,k. Notice that L(·) is separable in each component L_i(·), since G_i(·) only involves ξ_i. We also introduce matrices R_ξ = (R_{ξ,1},…,R_{ξ,k}) ∈ R^{k×k'} with R_{ξ,1} ∈ R^{k×2}, R_{ξ,i} ∈ R^{k×(r+1)}, i = 2,…,k, such that

    R_ξ L(ξ) = ξ,   R_{ξ,i} L_i(ξ) = e_i ξ_i,   i = 1,…,k,   (26a)

and R_G = (R_{G,1},…,R_{G,k}) ∈ R^{g×k'} with R_{G,1} ∈ R^{g×2}, R_{G,i} ∈ R^{g×(r+1)}, i = 2,…,k, such that

    R_G L(ξ) = G(ξ),   R_{G,i} L_i(ξ) = (0,…,G_i(ξ)^⊤,…,0)^⊤,   i = 1,…,k.   (26b)

As in Section 2.2, using the definition of L(·) and R_ξ, R_G in (25) and (26), respectively, we can rewrite Problem (7) as Problem (11). We now define the uncertain vector ξ' = (ξ'^⊤_1,…,ξ'^⊤_k)^⊤ ∈ R^{k'} such that ξ' = L(ξ) and ξ'_i = L_i(ξ) for i = 1,…,k. ξ' has probability measure P_{ξ'} defined using (12), and its support is given by

    Ξ' = L(Ξ) = { ξ' ∈ R^{k'} : l ≤ R_ξ ξ' ≤ u, L(R_ξ ξ') = ξ' }.   (27)

Notice that since both Ξ and L(·) are separable in each component ξ_i, Ξ' can be written in the following equivalent form:

    Ξ' = L(Ξ) = { (ξ'^⊤_1,…,ξ'^⊤_k)^⊤ ∈ R^{k'} : ξ'_i ∈ Ξ'_i, i = 1,…,k },

where Ξ'_i = L_i(Ξ).
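The lifting (25) simply stacks each coordinate next to its own indicator features. A small numpy sketch, with hypothetical helper names and the same illustrative breakpoint data as before:

```python
import numpy as np

def lift(xi, breakpoints):
    """Sketch of the lifting (25): L_1(xi) = (xi_1, G_1(xi)) with G_1 := 1,
    and L_i(xi) = (xi_i, G_i(xi)) for i = 2..k, stacked into one vector of
    dimension k' = 2 + (k - 1)(r + 1).  Names and layout are illustrative."""
    blocks = [np.array([xi[0], 1.0])]  # L_1
    for i in sorted(breakpoints):
        gi = [1.0 if xi[i - 1] >= b else 0.0 for b in breakpoints[i]]
        blocks.append(np.array([xi[i - 1]] + gi))  # L_i = (xi_i, G_i(xi))
    return np.concatenate(blocks)

# k = 3, r = 2: k' = 2 + (3 - 1)(2 + 1) = 8
xi_lift = lift(np.array([1.0, 1.2, 0.7]), {2: [1.0, 2.0], 3: [0.5, 1.5]})
print(xi_lift)  # -> [1.  1.  1.2 1.  0.  0.7 1.  0. ]
```

The separability noted above is visible in the code: block i is computed from ξ_i alone, so the support of the lifted vector factorizes across the blocks.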
Therefore, we can express conv(cl(Ξ')) as

    conv(cl(Ξ')) = { (ξ'^⊤_1,…,ξ'^⊤_k)^⊤ ∈ R^{k'} : ξ'_i ∈ conv(cl(Ξ'_i)), i = 1,…,k }
                 = { (ξ'^⊤_1,…,ξ'^⊤_k)^⊤ ∈ R^{k'} : ξ'_i ∈ conv(Ξ'_i), i = 1,…,k }.   (28)

It is thus sufficient to derive a closed-form representation for the marginal convex hulls conv(Ξ'_i).

We now construct the polyhedral representation of conv(Ξ'_i). As before, conv(Ξ'_i) will be constructed by taking convex combinations of its extreme points. Set conv(Ξ'_1) has the trivial representation conv(Ξ'_1) = {ξ'_1 ∈ R^2 : ξ'_1 = e}. We define the sets V_i = {l_i, β_{i,1},…,β_{i,r}, u_i}, i = 2,…,k, and introduce the following partitions for each dimension of Ξ:

    Ξ_{i,1}   := {ξ_i ∈ R : l_i ≤ ξ_i ≤ β_{i,1}},
    Ξ_{i,p}   := {ξ_i ∈ R : β_{i,p−1} ≤ ξ_i ≤ β_{i,p}},   p = 2,…,r,      (29)
    Ξ_{i,r+1} := {ξ_i ∈ R : β_{i,r} ≤ ξ_i ≤ u_i},          i = 2,…,k.

Therefore, V_i(1) = {l_i, β_{i,1}}, V_i(p) = {β_{i,p−1}, β_{i,p}}, p = 2,…,r, and V_i(r+1) = {β_{i,r}, u_i}. It is easy to see that the points L(e_i ξ_i), ξ_i ∈ V_i, are extreme points of conv(Ξ'_i), see Figure 4. Notice that G_i(e_i ξ_i) is constant for all ξ_i ∈ int(Ξ_{i,p}), p = 1,…,r+1, but can attain different values on the boundaries of the partitions. To this end, for each partition and ξ_i ∈ V_i(p), we define the one-sided limit points Ĝ_{i,p}(e_i ξ_i) ∈ R^r such that

    Ĝ_{i,p}(e_i ξ_i) = lim_{u ∈ Ξ_{i,p}, u → ξ_i} G_i(e_i u),   ∀ξ_i ∈ V_i(p), p = 1,…,r+1,   (30)

for all i = 2,…,k. G_i(e_i ξ_i) is constant for all ξ_i ∈ int(Ξ_{i,p}) and each partition p. Therefore, for each ξ̃_i ∈ V_i(p), the one-sided limit Ĝ_{i,p}(e_i ξ̃_i) is equal to G_i(e_i ξ_i) for all ξ_i ∈ int(Ξ_{i,p}).

Figure 4: Convex hull representation of Ξ'_i induced by the lifting ξ'_i = L_i(ξ) = (ξ_i, G_i(ξ)^⊤)^⊤, where G_i(ξ) = (1(ξ_i ≥ 1), 1(ξ_i ≥ 2))^⊤ and ξ_i ∈ [0, 3]. Here, V_i = {0, 1, 2, 3}. The convex hull is constructed by taking convex combinations of the points L(0), L(1), L(2), L(3) (black dots), and the points (1, Ĝ_{i,1}(1)^⊤)^⊤ and (2, Ĝ_{i,2}(2)^⊤)^⊤ (white dots), where Ĝ_{i,1}(1) and Ĝ_{i,2}(2) are the one-sided limit points at ξ_i = 1 and ξ_i = 2, respectively.

The following proposition gives the polyhedral representation of each conv(Ξ'_i), i = 2,…,k. A visual representation of conv(Ξ'_i) is depicted in Figure 4.

Proposition 5 For each i = 2,…,k, let L_i(·) be given by (25) and G_i(·) be defined in (24). Then, the tight, closed, outer approximation of the convex hull of Ξ'_i is given by the following polyhedron:

    conv(Ξ'_i) = { ξ'_i = (ξ'_{i,1}, ξ'^⊤_{i,2})^⊤ ∈ R^{r+1} : ∃ζ_{i,p}(v) ∈ R_+, ∀v ∈ V_i(p), p = 1,…,r+1, such that
        Σ_{p=1}^{r+1} Σ_{v ∈ V_i(p)} ζ_{i,p}(v) = 1,
        ξ'_{i,1} = Σ_{p=1}^{r+1} Σ_{v ∈ V_i(p)} ζ_{i,p}(v) v,                         (31)
        ξ'_{i,2} = Σ_{p=1}^{r+1} Σ_{v ∈ V_i(p)} ζ_{i,p}(v) Ĝ_{i,p}(e_i v) }.

Proof Proposition 5 can be proven using similar arguments as in Proposition 3.

Set conv(Ξ'_i) has 2r + 2 extreme points. Using the extreme point representation (31), conv(Ξ'_i) can be represented through 4r + 4 constraints. Therefore, conv(Ξ') = ×_{i=1}^k conv(Ξ'_i) can be represented using a total of 2 + k(4r + 4) constraints, i.e., its size grows quadratically in the dimension k of Ξ and the complexity r of the binary decision rule. One can now use the polyhedral description (31) together with Proposition 4 to reformulate Problem (15) into a mixed-integer linear optimization problem of the form (22). The size of the mixed-integer linear optimization problem will be polynomial in q, m, k and r. We emphasize that since the convex hull representation (31) is exact, the solution of Problem (22) achieves the best binary decision rule associated with G(·) defined in (24).
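The contrast between the two constraint counts is easy to tabulate. A minimal sketch of the arithmetic (the formulas are from the text; the function names are ours):

```python
def hyperrect_constraints(k, r):
    """Constraints describing conv(Xi') under the axis-aligned rule (24):
    2 + k(4r + 4), polynomial in k and r."""
    return 2 + k * (4 * r + 4)

def generic_partitions(g):
    """Worst-case number of partitions P = 2^g under the generic rule (6b),
    exponential in the rule complexity g."""
    return 2 ** g

k, r = 10, 5
print(hyperrect_constraints(k, r))           # 242 constraints
print(generic_partitions(1 + (k - 1) * r))   # 2^46 partitions in the worst case
```

Even for this modest instance (k = 10 random parameters, r = 5 breakpoints per direction) the generic construction would require enumerating on the order of 2^46 partitions, while the structured construction needs only a few hundred constraints.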
We now consider case (ii), and assume that Ξ is a generic set of the type (4) and G(·) is given by (24). From Section 2, we know that the number of constraints needed to describe the polyhedral representation of conv(Ξ') grows exponentially with the description of Ξ and the complexity of the decision rule. In the following, we present a systematic way to construct a tractable outer approximation of conv(Ξ'). Using this outer approximation to reformulate the lifted problem (11) will not yield the best binary decision rule associated with function G(·), but rather a conservative approximation. Nevertheless, the size of the resulting mixed-integer linear optimization problem will only grow polynomially with respect to the constraints of Ξ and the complexity of the binary decision rule. To this end, let {ξ ∈ R^k : l ≤ ξ ≤ u} be the smallest hyperrectangle containing Ξ. We have

    Ξ = { ξ ∈ R^k : ∃ζ ∈ R^v such that Wξ + Uζ ≥ h, ξ_1 = 1 }
      = { ξ ∈ R^k : ∃ζ ∈ R^v such that Wξ + Uζ ≥ h, ξ_1 = 1, l ≤ ξ ≤ u },

which implies that the lifted uncertainty set Ξ' = L(Ξ) can be expressed as Ξ' = Ξ'_1 ∩ Ξ'_2, where

    Ξ'_1 := { ξ' ∈ R^{k'} : ∃ζ ∈ R^v such that W R_ξ ξ' + U ζ ≥ h, ξ'_1 = 1 },
    Ξ'_2 := { ξ' ∈ R^{k'} : l ≤ R_ξ ξ' ≤ u, L(R_ξ ξ') = ξ' }.                       (32)

Notice that Ξ'_2 has exactly the same structure as (27). We thus conclude that

    Ξ̂' := Ξ'_1 ∩ conv(Ξ'_2) ⊇ conv(Ξ'),   (33)

and therefore, Ξ̂' can be used as an outer approximation of conv(Ξ'). Since conv(Ξ'_2) can be written as the polyhedron induced by Proposition 5, set Ξ̂' has a total of (l + 1) + (2 + k(4r + 4)) constraints. As before, one can now use the polyhedral description of Ξ̂' together with Proposition 4 to reformulate Problem (15) into a mixed-integer linear optimization problem of the form (22). The size of the mixed-integer linear optimization problem will be polynomial in q, m, k, l and r.
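The smallest hyperrectangle containing a polyhedral Ξ can be obtained by solving 2k small linear programs, one minimizing and one maximizing each coordinate over Ξ. A sketch using `scipy.optimize.linprog` (the data is an illustrative triangle, not from the paper):

```python
import numpy as np
from scipy.optimize import linprog

def bounding_box(W, h):
    """Smallest hyperrectangle {l <= xi <= u} containing {xi : W xi >= h},
    computed coordinate by coordinate via linear programming."""
    k = W.shape[1]
    l, u = np.empty(k), np.empty(k)
    for i in range(k):
        c = np.zeros(k)
        c[i] = 1.0
        # linprog takes A_ub x <= b_ub, so W xi >= h becomes -W xi <= -h
        lo = linprog(c, A_ub=-W, b_ub=-h, bounds=[(None, None)] * k)
        hi = linprog(-c, A_ub=-W, b_ub=-h, bounds=[(None, None)] * k)
        l[i], u[i] = lo.fun, -hi.fun
    return l, u

# triangle {xi : xi_1 >= 0, xi_2 >= 0, xi_1 + xi_2 <= 1}
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
h = np.array([0.0, 0.0, -1.0])
l, u = bounding_box(W, h)
print(l, u)  # l = [0, 0], u = [1, 1]
```

These bounds l, u are exactly the data needed to set up Ξ'_2 in (32).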
Since Ξ̂' ⊇ conv(Ξ'), the solution of Problem (22) might not achieve the best binary decision rule induced by (24), but rather a conservative approximation. Nevertheless, using this systematic decomposition of the uncertainty set, we can efficiently apply binary decision rules to problems where the uncertainty set has an arbitrary convex polyhedral structure.

One can use G(·) given by (24) together with this systematic decomposition of Ξ to achieve the same binary decision rule structures as those offered by G(·) defined in (6b). We illustrate this through the following example. Let us assume that the problem at hand requires a generic Ξ of type (4), and that we want to parameterize the binary decision rule as a linear function of G(ξ) = 1(α^⊤ξ ≥ β) for some α ∈ R^k and β ∈ R. We can introduce an additional random variable ξ̃ in Ξ such that ξ̃ = α^⊤ξ. ξ̃ is completely determined by the random variables already present in Ξ. The modified support can now be written as follows:

    Ξ = { (ξ, ξ̃) ∈ R^{k+1} : ∃ζ ∈ R^v, Wξ + Uζ ≥ h, ξ_1 = 1, ξ̃ = α^⊤ξ }
      = { (ξ, ξ̃) ∈ R^{k+1} : ∃ζ ∈ R^v, Wξ + Uζ ≥ h, ξ_1 = 1, ξ̃ = α^⊤ξ, l ≤ ξ ≤ u, l̃ ≤ ξ̃ ≤ ũ },   (34)

where {(ξ, ξ̃) ∈ R^{k+1} : l ≤ ξ ≤ u, l̃ ≤ ξ̃ ≤ ũ} is the smallest hyperrectangle containing Ξ. We can now use

    G(ξ, ξ̃) = 1(ξ̃ ≥ β),

which is an instance of (24), to achieve the same structure as G(ξ) = 1(α^⊤ξ ≥ β). Defining appropriate L(·) and R_ξ, R_G, we can use the following decomposition of the lifted uncertainty set Ξ' = Ξ'_1 ∩ Ξ'_2, such that

    Ξ'_1 := { (ξ', ξ̃') ∈ R^{k'+1} : ∃ζ ∈ R^v such that W R_ξ ξ' + U ζ ≥ h, ξ'_1 = 1, ξ̃' = α^⊤ R_ξ ξ' },
    Ξ'_2 := { (ξ', ξ̃') ∈ R^{k'+1} : l ≤ R_ξ ξ' ≤ u, l̃ ≤ ξ̃' ≤ ũ, L(R_ξ (ξ', ξ̃')) = (ξ', ξ̃') }.

By defining Ξ̂' := Ξ'_1 ∩ conv(Ξ'_2) ⊇ conv(Ξ'), we can reformulate Problem (15) into a mixed-integer linear optimization problem with a polynomial number of decision variables and constraints with respect to the input data.
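The augmentation step in (34) — appending ξ̃ = α^⊤ξ as an extra coordinate, encoded as two inequalities — can be sketched as follows. The helper name and data are illustrative, not the paper's:

```python
import numpy as np

def augment_with_projection(W, h, alpha):
    """Append xi_tilde = alpha^T xi as an extra coordinate of the set
    {xi : W xi >= h}, encoding the equality as a pair of inequalities
    (a sketch of the construction in (34))."""
    m, k = W.shape
    W_aug = np.hstack([W, np.zeros((m, 1))])
    row = np.concatenate([alpha, [-1.0]])   # alpha^T xi - xi_tilde >= 0
    W_aug = np.vstack([W_aug, row, -row])   # ... and <= 0
    h_aug = np.concatenate([h, [0.0, 0.0]])
    return W_aug, h_aug

# xi >= 0 in R^2, augmented with xi_tilde = xi_1 + 2 xi_2
W_aug, h_aug = augment_with_projection(np.eye(2), np.zeros(2), np.array([1.0, 2.0]))
point = np.array([1.0, 1.0, 3.0])      # consistent: xi_tilde = 1 + 2 = 3
print(np.all(W_aug @ point >= h_aug))  # True
```

Any point with an inconsistent last coordinate violates one of the two appended rows, so the augmented set contains exactly the graph of ξ̃ = α^⊤ξ over the original set, as required for the breakpoint rule 1(ξ̃ ≥ β) to reproduce 1(α^⊤ξ ≥ β).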
Once more, we emphasize that this systematic decomposition might not achieve the best binary decision rule induced by G(·), but rather a conservative approximation. Nevertheless, we can now achieve the same flexibility for the binary decision rules as that offered by (6b) without the exponential growth in the size of the resulting problem.

3 Binary Decision Rules for Random-Recourse Problems

In this section, we present our approach for one-stage adaptive optimization problems with random recourse. The solution method for this class of problems is an adaptation of the solution method presented in Section 2.2. Given a function B : R^k → R^{m×q}, matrices D ∈ R^{q×k}, H ∈ R^{m×k} and a probability measure P_ξ supported on set Ξ for the uncertain vector ξ ∈ R^k, we are interested in choosing binary functions y(·) ∈ B^{k,q} in order to solve:

    minimize    E_ξ( (Dξ)^⊤ y(ξ) )
    subject to  y(·) ∈ B^{k,q},                        (35)
                B(ξ) y(ξ) ≤ Hξ,   ∀ξ ∈ Ξ,

where Ξ is a generic polyhedron of type (4). We assume that the recourse matrix B(ξ) depends linearly on the uncertain parameters. Therefore, the μth row of B(ξ) is representable as ξ^⊤ B_μ for matrices B_μ ∈ R^{k×q} with μ = 1,…,m. Problem (35) can therefore be written as the following problem:

    minimize    E_ξ( (Dξ)^⊤ y(ξ) )
    subject to  y(·) ∈ B^{k,q},                        (36)
                ξ^⊤ B_μ y(ξ) ≤ H_μ^⊤ ξ,   μ = 1,…,m,   ∀ξ ∈ Ξ,

where H_μ^⊤ denotes the μth row of matrix H. Problem (36) involves a continuum of decision variables and inequality constraints. Therefore, in order to make the problem amenable to numerical solution, we restrict y(·) to admit structure (6). Applying the binary decision rules (6) to Problem (36) yields the following semi-infinite problem, which involves a finite number of decision variables Y ∈ Z^{q×g} and an infinite number of constraints:

    minimize    E_ξ( ξ^⊤ D^⊤ Y G(ξ) )
    subject to  Y ∈ Z^{q×g},
                ξ^⊤ B_μ Y G(ξ) ≤ H_μ^⊤ ξ,   μ = 1,…,m,   (37)
                0 ≤ Y G(ξ) ≤ e,   ∀ξ ∈ Ξ.

Notice that all decision variables appear linearly in Problem (37).
Nevertheless, the objective and constraints are non-linear functions of ξ. We now define the non-linear lifting operator L : R^k → R^{k'} such that

    L(ξ) = (ξ^⊤, G(ξ)^⊤, G(ξ)^⊤ ξ_1, …, G(ξ)^⊤ ξ_k)^⊤,   (38)

with k' = k + g + gk. Note that including both components G(ξ) and G(ξ)ξ_1 in L(ξ) is redundant, as by construction ξ_1 = 1. Nevertheless, in the following we adopt formulation (38) for ease of exposition. We also define matrices R_ξ ∈ R^{k×k'} and R_G ∈ R^{g×k'} such that

    R_ξ L(ξ) = ξ,   R_G L(ξ) = G(ξ).   (39)

In addition, we define F : R^{k×g} → R^{k'} that expresses the quadratic polynomials ξ^⊤ B_μ Y G(ξ) as linear functions of L(ξ) through the following constraints:

    y'_μ = F(B_μ Y),   y'^⊤_μ L(ξ) = ξ^⊤ B_μ Y G(ξ),   y'_μ ∈ R^{k'},   μ = 1,…,m,   ∀ξ ∈ Ξ.   (40)

The constraints y'_μ = F(B_μ Y), μ = 1,…,m, are linear, since F(·) effectively reorders the entries of matrix B_μ Y into the vector y'_μ. Using (38), (39) and (40), Problem (37) can now be rewritten as the following optimization problem:

    minimize    Tr(M R_ξ^⊤ D^⊤ Y R_G)
    subject to  Y ∈ Z^{q×g},
                y'^⊤_μ L(ξ) ≤ H_μ^⊤ R_ξ L(ξ),   μ = 1,…,m,
                y'_μ = F(B_μ Y),   y'_μ ∈ R^{k'},   μ = 1,…,m,   (41)
                0 ≤ Y R_G L(ξ) ≤ e,   ∀ξ ∈ Ξ.

Problem (41) has a similar structure as Problem (11), i.e., both Y and L(ξ) appear linearly in the constraints. Therefore, in the following we redefine Problem (41) in a higher dimensional space and apply Proposition 4 to reformulate the problem into a mixed-integer optimization problem.

We define the uncertain vector ξ' = (ξ'^⊤_{1,1}, ξ'^⊤_{1,2}, ξ'^⊤_{2,1}, …, ξ'^⊤_{2,k})^⊤ ∈ R^{k'} such that ξ' = L(ξ), with ξ'_{1,1} = ξ, ξ'_{1,2} = G(ξ), and ξ'_{2,i} = G(ξ)ξ_i for i = 1,…,k. ξ' has probability measure P_{ξ'} defined as in (12) and support Ξ' = L(Ξ). Problem (41) can now be written as the equivalent semi-infinite problem defined on the lifted space ξ':

    minimize    Tr(M' R_ξ^⊤ D^⊤ Y R_G)
    subject to  Y ∈ Z^{q×g},
                y'^⊤_μ ξ' ≤ H_μ^⊤ R_ξ ξ',   μ = 1,…,m,
                y'_μ = F(B_μ Y),   y'_μ ∈ R^{k'},   μ = 1,…,m,   (42)
                0 ≤ Y R_G ξ' ≤ e,   ∀ξ' ∈ Ξ'.

We will construct the convex hull of Ξ' in the same way as in Section 2.2. Notice that conv(Ξ') will have the same number of extreme points as the convex hull defined through the lifting L(ξ) = (ξ^⊤, G(ξ)^⊤)^⊤, see Figure 5. By defining the partitions Ξ_p of Ξ as in (16), and Ĝ_p(ξ) as in (18), it is easy to see that conv(Ξ') can be described as a convex combination of the following points:

    (ξ^⊤, Ĝ_p(ξ)^⊤, Ĝ_p(ξ)^⊤ ξ_1, …, Ĝ_p(ξ)^⊤ ξ_k)^⊤,   ξ ∈ V(p),   p = 1,…,P.

The following proposition gives the polyhedral representation of the convex hull of Ξ'.

Figure 5: Visualization of L(ξ) = (ξ_1, ξ_2, G(ξ))^⊤ (left) and L̃(ξ) = (ξ_1, ξ_2, G(ξ)ξ_2)^⊤ (right), where G(ξ) = 1(ξ_1 ≥ 2), for ξ_1 ∈ [0, 4] and ξ_2 ∈ [−2, 2]. Here, V = {(0, −2), (0, 2), (2, −2), (2, 2), (4, −2), (4, 2)}. The convex hull of L(ξ) is constructed by taking convex combinations of L(ξ) for all ξ ∈ V and the points (2, −2, Ĝ((2, −2)))^⊤ and (2, 2, Ĝ((2, 2)))^⊤, and the convex hull of L̃(ξ) is constructed by taking convex combinations of L̃(ξ) for all ξ ∈ V and the points (2, −2, Ĝ((2, −2))(−2))^⊤ and (2, 2, Ĝ((2, 2))2)^⊤.

Proposition 6 Let L(·) be defined in (38) and G(·) be defined in (6b). Then, the tight, closed, outer approximation of the convex hull of Ξ' is given by the following polyhedron:

    conv(Ξ') = { ξ' = (ξ'^⊤_{1,1}, ξ'^⊤_{1,2}, ξ'^⊤_{2,1}, …, ξ'^⊤_{2,k})^⊤ ∈ R^{k'} : ∃ζ_p(v) ∈ R_+, ∀v ∈ V(p), p = 1,…,P, such that
        Σ_{p=1}^P Σ_{v ∈ V(p)} ζ_p(v) = 1,
        ξ'_{1,1} = Σ_{p=1}^P Σ_{v ∈ V(p)} ζ_p(v) v,
        ξ'_{1,2} = Σ_{p=1}^P Σ_{v ∈ V(p)} ζ_p(v) Ĝ_p(v),                               (43)
        ξ'_{2,i} = Σ_{p=1}^P Σ_{v ∈ V(p)} ζ_p(v) Ĝ_p(v) v_i,   i = 1,…,k }.

Proof Proposition 6 can be proven using similar arguments as in Proposition 3.

The description of polyhedron (43) inherits the same exponential complexity as (19).
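The reordering map F(·) of (40) admits a one-line realization once the block layout of the lifting (38) is fixed. A numpy sketch with illustrative data (the helper names are ours):

```python
import numpy as np

def F(BY):
    """Sketch of the reordering map F of (40): zero out the positions of the
    (xi, G(xi)) blocks and place the entries of the k x g matrix B_mu @ Y
    against the G(xi) xi_i blocks of the lifting (38), so that
    F(BY) . L(xi) = xi^T (BY) G(xi)."""
    k, g = BY.shape
    return np.concatenate([np.zeros(k + g), BY.reshape(-1)])

def L(xi, G):
    """The lifting (38): (xi, G(xi), G(xi) xi_1, ..., G(xi) xi_k)."""
    return np.concatenate([xi, G] + [G * x for x in xi])

BY = np.array([[1.0, 2.0], [3.0, 4.0]])    # illustrative k = g = 2 data
xi, G = np.array([1.0, 2.0]), np.array([1.0, 0.0])
print(F(BY) @ L(xi, G), xi @ BY @ G)       # both evaluate to 7.0
```

The identity holds because entry (i, j) of B_μ Y multiplies ξ_i G_j(ξ), which is exactly one coordinate of the lifted vector; no products of decision variables ever appear, which is what keeps (41) linear in Y and y'_μ jointly.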
That is, the number of constraints in (43) depends on the cardinality of V, which in the worst case is constructed using the extreme points of the P = 2^g partitions of Ξ. Nevertheless, the description (43) provides a tight outer approximation of conv(Ξ').

Using Proposition 4 together with the polyhedron (43), we can now reformulate Problem (42) into the following mixed-integer linear optimization problem:

    minimize    Tr(M' R_ξ^⊤ D^⊤ Y R_G)
    subject to  Y ∈ Z^{q×g}, Γ ∈ R_+^{q×l'}, Θ ∈ R_+^{q×l'}, y'_μ ∈ R^{k'}, λ_μ ∈ R_+^{1×l'},
                y'^⊤_μ + λ_μ W' = H_μ^⊤ R_ξ,  λ_μ U' = 0,  λ_μ h' ≥ 0,
                y'_μ = F(B_μ Y),   μ = 1,…,m,                                     (44)
                Y R_G = Γ W',  Γ U' = 0,  Γ h' ≥ 0,
                e e_1^⊤ − Y R_G = Θ W',  Θ U' = 0,  Θ h' ≥ 0.

Here, the auxiliary variables λ_μ ∈ R_+^{1×l'} are associated with the constraints y'^⊤_μ ξ' ≤ H_μ^⊤ R_ξ ξ', μ = 1,…,m, and Γ ∈ R_+^{q×l'} and Θ ∈ R_+^{q×l'} with the constraints 0 ≤ Y R_G ξ' and Y R_G ξ' ≤ e, respectively. Problem (44) is the exact reformulation of Problem (42), since (43) is a tight outer approximation of the convex hull of Ξ', and thus the solution of Problem (44) achieves the best binary decision rule associated with G(·) in (6b). Nevertheless, the size of Problem (44) is affected by the exponential growth of the constraints in conv(Ξ'). One can mitigate this exponential growth by considering simplified structures of Ξ and G(·), following similar guidelines as those discussed in Section 2.3.

4 Binary Decision Rules for Multistage Problems

In this section, we extend the methodology presented in Section 2 to cover multistage adaptive optimization problems with fixed recourse. The mathematical formulations presented can easily be adapted to random-recourse problems by following the guidelines of Section 3. The dynamic decision process considered can be described as follows: A decision maker first observes an uncertain parameter ξ_1 ∈ R^{k_1} and then takes a binary decision y_1(ξ_1) ∈ {0,1}^{q_1}.
Subsequently, a second uncertain parameter ξ_2 ∈ R^{k_2} is revealed, in response to which the decision maker takes a second decision y_2(ξ_1, ξ_2) ∈ {0,1}^{q_2}. This sequence of alternating observations and decisions extends over T time stages. To enforce the non-anticipative structure of the decisions, a decision taken at stage t can only depend on the observed parameters up to and including stage t, i.e., y_t(ξ^t), where ξ^t = (ξ_1^⊤,…,ξ_t^⊤)^⊤ ∈ R^{k^t}, with k^t = Σ_{s=1}^t k_s. For consistency with the previous sections, and with slight abuse of notation, we assume that k_1 = 1 and ξ_1 = 1. As before, setting ξ_1 = 1 is a non-restrictive assumption which allows us to represent affine functions of the non-degenerate outcomes (ξ_2^⊤,…,ξ_t^⊤)^⊤ in a compact manner as linear functions of (ξ_1^⊤,…,ξ_t^⊤)^⊤. We denote by ξ = (ξ_1^⊤,…,ξ_T^⊤)^⊤ ∈ R^k the vector of all uncertain parameters, where k = k^T. Finally, we denote by B^{k^t,q_t} the space of binary functions from R^{k^t} to {0,1}^{q_t}.

Given matrices B_{ts} ∈ R^{m_t×q_s}, D_t ∈ R^{q_t×k^t} and H_t ∈ R^{m_t×k^t}, and a probability measure P_ξ supported on set Ξ given by (4) for the uncertain vector ξ ∈ R^k, we are interested in choosing binary functions y_t(·) ∈ B^{k^t,q_t} in order to solve:

    minimize    E_ξ( Σ_{t=1}^T (D_t ξ^t)^⊤ y_t(ξ^t) )
    subject to  y_t(·) ∈ B^{k^t,q_t},                                  (45)
                Σ_{s=1}^t B_{ts} y_s(ξ^s) ≤ H_t ξ^t,   t = 1,…,T,   ∀ξ ∈ Ξ.

Problem (45) involves a continuum of decision variables and inequality constraints. Therefore, in order to make the problem amenable to numerical solution, we restrict the feasible region of the binary functions y_t(·) ∈ B^{k^t,q_t} to admit the following piecewise constant structure:

    y_t(ξ) = Y_t G^t(ξ),   Y_t ∈ Z^{q_t×g^t},   0 ≤ Y_t G^t(ξ) ≤ e,   t = 1,…,T,   ∀ξ ∈ Ξ,   (46a)

where G^t : R^{k^t} → {0,1}^{g^t} can be expressed as G^t(·) = (G_1(·)^⊤,…,G_t(·)^⊤)^⊤, and G_t : R^{k^t} → {0,1}^{g_t} are the piecewise constant functions

    G_1(ξ_1) := 1,   G_{t,i}(ξ) := 1(α_{t,i}^⊤ ξ^t ≥ β_{t,i}),   i = 1,…,g_t,   t = 2,…,T,   (46b)

for given α_{t,i} ∈ R^{k^t} and β_{t,i} ∈ R, i = 1,…,g_t, t = 2,…,T. Here, we assume that the sets {ξ ∈ Ξ : α_{t,i}^⊤ ξ^t ≥ β_{t,i}} and {ξ ∈ Ξ : α_{t,i}^⊤ ξ^t ≤ β_{t,i}} are non-empty for all i = 1,…,g_t, t = 2,…,T, and that α_{t,i}^⊤ ξ^t − β_{t,i} ≠ α_{t,j}^⊤ ξ^t − β_{t,j} for all ξ ∈ Ξ, where i, j ∈ {1,…,g_t}, i ≠ j, and t = 2,…,T. The dimension of G^t(·) is g^t, with g^t = Σ_{s=1}^t g_s. The non-anticipativity of y_t(·) is ensured by restricting G^t(·) to depend only on random parameters up to and including stage t. Applying decision rules (46) to Problem (45) yields the following semi-infinite problem:

    minimize    E_ξ( Σ_{t=1}^T (D_t ξ^t)^⊤ Y_t G^t(ξ) )
    subject to  Y_t ∈ Z^{q_t×g^t},
                Σ_{s=1}^t B_{ts} Y_s G^s(ξ) ≤ H_t ξ^t,   t = 1,…,T,   (47)
                0 ≤ Y_t G^t(ξ) ≤ e,   ∀ξ ∈ Ξ.

We can now use the lifting techniques to first express Problem (47) in a higher dimensional space, compute the convex hull associated with the non-convex uncertainty set of the lifted problem, and finally reformulate the semi-infinite structure using Proposition 4. We define the lifting L^t : R^{k^t} → R^{k'^t}, such that L^t(·) = (L_1(·)^⊤,…,L_t(·)^⊤)^⊤, where

    L_t : R^{k^t} → R^{k'_t},   L_t(ξ) = (ξ_t^⊤, G_t(ξ)^⊤)^⊤,   t = 1,…,T.   (48a)

Here, k'_t = k_t + g_t. By convention, L(·) = (L_1(·)^⊤,…,L_T(·)^⊤)^⊤ and k'^t = Σ_{s=1}^t k'_s, with k' = k'^T. In addition, we define matrices R_{ξ,t} ∈ R^{k^t×k'} and R_{G,t} ∈ R^{g^t×k'} such that

    R_{ξ,t} L(ξ) = ξ^t,   R_{G,t} L(ξ) = G^t(ξ),   t = 1,…,T.   (48b)

Using (48), Problem (47) can now be rewritten in the following form:

    minimize    E_ξ( Σ_{t=1}^T (D_t R_{ξ,t} L(ξ))^⊤ Y_t R_{G,t} L(ξ) )
    subject to  Y_t ∈ Z^{q_t×g^t},
                Σ_{s=1}^t B_{ts} Y_s R_{G,s} L(ξ) ≤ H_t R_{ξ,t} L(ξ),   t = 1,…,T,   (49)
                0 ≤ Y_t R_{G,t} L(ξ) ≤ e,   ∀ξ ∈ Ξ.

Problem (49) has a similar structure as Problem (11), i.e., both Y_t and L(ξ) appear linearly in the constraints.
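Non-anticipativity in (46) follows purely from the data layout: the stage-t features only read coordinates of ξ^t. A minimal sketch, with an illustrative data layout of ours (each `alphas[s]` vector has length s, so it can only touch ξ_1,…,ξ_s):

```python
import numpy as np

def G_stack(xi, alphas, betas, t):
    """Sketch of G^t in (46): the constant stage-1 feature followed by the
    indicators 1(alpha_{s,i}^T xi^s >= beta_{s,i}) of all stages s <= t.
    Because alphas[s] has length s, G^t depends on xi_1..xi_t alone."""
    feats = [1.0]  # G_1 := 1
    for s in range(2, t + 1):
        for a, b in zip(alphas[s], betas[s]):
            feats.append(1.0 if a @ xi[:len(a)] >= b else 0.0)
    return np.array(feats)

alphas = {2: [np.array([0.0, 1.0])], 3: [np.array([0.0, 1.0, 1.0])]}
betas = {2: [1.0], 3: [2.5]}
xi = np.array([1.0, 1.5, 2.0])
print(G_stack(xi, alphas, betas, 2))  # -> [1. 1.]
print(G_stack(xi, alphas, betas, 3))  # -> [1. 1. 1.]
```

Changing ξ_3 leaves G^2 unchanged, which is exactly the non-anticipativity requirement on the stage-2 decision y_2(ξ^2) = Y_2 G^2(ξ).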
Therefore, in the following we redefine Problem (49) in a higher dimensional space and apply Proposition 4 to reformulate the problem into a mixed-integer optimization problem. We now define the uncertain vector ξ' = (ξ'^⊤_1,…,ξ'^⊤_T)^⊤ ∈ R^{k'} such that ξ' = L(ξ) and ξ'_t = L_t(ξ) for t = 1,…,T. ξ' has probability measure P_{ξ'} defined using (12), and support Ξ' = L(Ξ). Using the definition of ξ', the lifted semi-infinite problem is given as follows:

    minimize    E_{ξ'}( Σ_{t=1}^T (D_t R_{ξ,t} ξ')^⊤ Y_t R_{G,t} ξ' )
    subject to  Y_t ∈ Z^{q_t×g^t},
                Σ_{s=1}^t B_{ts} Y_s R_{G,s} ξ' ≤ H_t R_{ξ,t} ξ',   t = 1,…,T,   (50)
                0 ≤ Y_t R_{G,t} ξ' ≤ e,   ∀ξ' ∈ Ξ'.

The definition of L(·) in (48a) shares an almost identical structure with L(·) in (10). Therefore, the convex hull of Ξ' can be expressed as a slight variant of the polyhedron given in (19), and will constitute a tight outer approximation of conv(Ξ'). Using Proposition 4 together with the polyhedral representation of conv(Ξ'), we can now reformulate Problem (50) into the following mixed-integer linear optimization problem:

    minimize    Σ_{t=1}^T Tr(M' R_{ξ,t}^⊤ D_t^⊤ Y_t R_{G,t})
    subject to  Y_t ∈ Z^{q_t×g^t}, Λ_t ∈ R_+^{m_t×l'}, Γ_t ∈ R_+^{q_t×l'}, Θ_t ∈ R_+^{q_t×l'},
                Σ_{s=1}^t B_{ts} Y_s R_{G,s} + Λ_t W' = H_t R_{ξ,t},  Λ_t U' = 0,  Λ_t h' ≥ 0,
                Y_t R_{G,t} = Γ_t W',  Γ_t U' = 0,  Γ_t h' ≥ 0,                      (51)
                e e_1^⊤ − Y_t R_{G,t} = Θ_t W',  Θ_t U' = 0,  Θ_t h' ≥ 0,   t = 1,…,T,

where M' ∈ R^{k'×k'}, M' = E_{ξ'}(ξ' ξ'^⊤). Once more, we emphasize that Problem (51) is the exact reformulation of Problem (50). Therefore, the solution of Problem (51) achieves the best binary decision rule associated with G(·) in (6b), at the cost of the exponential growth of its size with respect to the description of Ξ and the complexity of the binary decision rules (46). One can mitigate this exponential growth by considering simplified structures of Ξ and G(·), following similar guidelines as those discussed in Section 2.3.
5 Computational Results

In this section, we apply the proposed binary decision rules to a multistage inventory control problem and to a multistage knapsack problem. In both problems, the structure of the binary decision rules is constructed using G(·) defined in (24), where the breakpoints β_{i,1},…,β_{i,r} are placed equidistantly within the marginal support of each random parameter ξ_i. To improve the scalability of the solution method, we further restrict the structure of the binary decision rules to be

    y(ξ) = Y G(ξ),   Y ∈ {−1, 0, 1}^{q×g},   0 ≤ Y G(ξ) ≤ e,   ∀ξ ∈ Ξ.   (52)

All of our numerical results are carried out using the IBM ILOG CPLEX 12.5 optimization package on an Intel Core i5-3570 CPU at 3.40GHz machine with 8GB RAM [25].

5.1 Multistage Inventory Control

In this case study, we consider a single-item inventory control problem where the ordering decisions are discrete. This problem can be formulated as an instance of the multistage adaptive optimization problem with fixed recourse, and can be described as follows. At the beginning of each time period t ∈ T := {2,…,T}, the decision maker observes the product demand ξ_t that needs to be satisfied robustly. This demand can be served in two ways: (i) by pre-ordering at stage 1 a maximum of N lots, each delivering a fixed quantity q_z at the beginning of period t, for a unit cost of c_z; or (ii) by placing an order for a maximum of N lots, each delivering immediately a fixed quantity q_y, for a unit cost of c_y, with c_z < c_y. For each lot n = 1,…,N, the pre-ordering binary decisions delivered at stage t are denoted by z_{n,t} ∈ {0,1}, and the recourse binary ordering decisions are denoted by y_{n,t}(·) ∈ B^{k^t,1}. If the ordered quantity is greater than the demand, the excess units are stored in a warehouse, incurring a unit holding cost c_h, and can be used to serve future demand. The level of available inventory at each period is given by I_t(·) ∈ R^{k^t,1}.
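The equidistant breakpoint placement described above is easy to make concrete. A minimal sketch (the function name is ours):

```python
def equidistant_breakpoints(l_i, u_i, r):
    """r breakpoints placed equidistantly strictly inside [l_i, u_i], as used
    to instantiate the rule structure (24) in the experiments."""
    step = (u_i - l_i) / (r + 1)
    return [l_i + j * step for j in range(1, r + 1)]

# marginal support [0, 10] with r = 4 breakpoints
print(equidistant_breakpoints(0.0, 10.0, 4))  # -> [2.0, 4.0, 6.0, 8.0]
```

By construction the breakpoints satisfy l_i < β_{i,1} < … < β_{i,r} < u_i, as required by (24).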
In addition, the cumulative volume of pre-orders Σ_{s=1}^t Σ_{n=1}^N z_{n,s} must not exceed the ordering budget B^{tot,t}. The decision maker wishes to determine the orders z_{n,t} and y_{n,t}(·) that minimize the total ordering and holding costs associated with the worst-case demand realization over the planning horizon T. The problem can be formulated as the following multistage adaptive optimization problem:

    minimize    max_{ξ∈Ξ} Σ_{t∈T} ( Σ_{n=1}^N ( c_z q_z z_{n,t} + c_y q_y y_{n,t}(ξ^t) ) + c_h I_t(ξ^t) )
    subject to  I_t(·) ∈ R^{k^t,1},  z_{n,t} ∈ {0,1},  y_{n,t}(·) ∈ B^{k^t,1},   ∀n = 1,…,N,
                I_t(ξ^t) = I_{t−1}(ξ^{t−1}) + Σ_{n=1}^N ( q_z z_{n,t} + q_y y_{n,t}(ξ^t) ) − ξ_t,
                I_t(ξ^t) ≥ 0,                                                     (53)
                Σ_{s=1}^t Σ_{n=1}^N z_{n,s} ≤ B^{tot,t},   ∀t ∈ T,   ∀ξ ∈ Ξ.

The uncertainty set Ξ is given by

    Ξ := { ξ ∈ R^T : ξ_1 = 1, l ≤ ξ ≤ u }.

We emphasize that since we are interested in minimizing the total costs associated with the worst-case demand realization, the model does not require us to specify a distribution for the uncertain demand. Moreover, notice that the real-valued inventory decision I_t(ξ^t) can be eliminated from formulation (53) by substituting I_t(ξ^t) = I_1 + Σ_{s=1}^t ( Σ_{n=1}^N ( q_z z_{n,s} + q_y y_{n,s}(ξ^s) ) − ξ_s ) in both the objective and constraints, thus eliminating the need to further approximate I_t(·) using decision rules.

For our computational experiments we randomly generated 30 instances of Problem (53). The parameters are randomly chosen using a uniform distribution from the following sets: advance and instant ordering costs are chosen from c_z ∈ [0, 5] and c_y ∈ [0, 10], respectively, such that c_z < c_y, and holding costs are elements of c_h ∈ [0, 5]. The bounds for each random parameter are chosen from l_i ∈ [0, 5] and u_i ∈ [10, 15] for i = 2,…,T. The cumulative ordering budget equals B^{tot,t} = 10(t−1) for t = 2,…,T. We also assume that q_z = q_y = 15/N, and that the initial inventory level equals zero, i.e., I_1 = 0.
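The substitution that eliminates I_t(·) from (53) can be checked numerically: unrolling the inventory recursion gives the closed form used in the substitution. A small numpy sketch with illustrative data of ours (T = 2 stages, N = 2 lots, q_z = q_y = 2):

```python
import numpy as np

def inventory_recursive(z, y, xi, qz, qy, I1=0.0):
    """The recursion of (53): I_t = I_{t-1} + sum_n (qz z_{n,t} + qy y_{n,t}) - xi_t."""
    I, out = I1, []
    for t in range(len(xi)):
        I = I + qz * z[t].sum() + qy * y[t].sum() - xi[t]
        out.append(I)
    return np.array(out)

def inventory_closed_form(z, y, xi, qz, qy, I1=0.0):
    """The substitution: I_t = I_1 + sum_{s<=t} (orders_s - xi_s)."""
    per_stage = qz * z.sum(axis=1) + qy * y.sum(axis=1) - xi
    return I1 + np.cumsum(per_stage)

z = np.array([[1.0, 0.0], [0.0, 1.0]])   # illustrative pre-orders per stage
y = np.array([[1.0, 0.0], [1.0, 1.0]])   # illustrative recourse orders per stage
xi = np.array([3.0, 4.0])                # illustrative demand path
print(inventory_recursive(z, y, xi, 2.0, 2.0))    # -> [1. 3.]
print(inventory_closed_form(z, y, xi, 2.0, 2.0))  # -> [1. 3.]
```

Since the two expressions agree for every demand path, the state variable can be dropped and only the binary orders need decision-rule approximation, exactly as argued above.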
Table 2: Comparison of average improvement of adaptive versus static binary decision rules using the performance measure (Non-Adapt. − Adapt.)/Non-Adapt. (top), and average solution time of binary decision rules (bottom).

Average Objective Value Improvements, N = 2

         Global optimality        1% optimality           5% optimality
  T      r=1    r=5    r=9        r=1    r=5    r=9        r=1    r=5    r=9
  10     15%    17%    17%        14%    17%    17%        14%    16%    16%
  15     16%    18%    18%        16%    18%    18%        16%    17%    17%
  20     18%    19%    20%        18%    19%    20%        17%    18%    19%

Average Solution Time, N = 2

         Global optimality            1% optimality           5% optimality
  T      r=1    r=5     r=9          r=1    r=5    r=9        r=1    r=5   r=9
  10     <1sec  4sec    42sec        <1sec  4sec   15sec      <1sec  1sec  8sec
  15     <1sec  166sec  983sec       <1sec  16sec  83sec      <1sec  3sec  10sec
  20     1sec   688sec  5580sec      <1sec  46sec  710sec     <1sec  9sec  92sec

In the first test series, we compared the performance of the binary decision rules versus the non-adaptive binary decisions for the randomly generated instances. We consider binary decision rules for y_{n,t}(·) that have r ∈ {1, 5, 9} breakpoints in each direction ξ_i, and planning horizons T ∈ {10, 15, 20}. Moreover, we consider three different termination criteria for the mixed-integer linear optimization problems: (i) global optimality, (ii) 1% optimality and (iii) 5% optimality. All non-adaptive problems were solved to global optimality. Table 2 presents the results for the case where N = 2, in terms of the average improvement of the binary decision rules versus the static binary decisions. The improvement is defined as the quantity (Non-Adapt. − Adapt.)/Non-Adapt., where Non-Adapt. and Adapt. are the objective values of the non-adaptive and adaptive binary decision rule problems, respectively. The first observation is the significant improvement over the non-adaptive decisions, and the improvement in solution quality as the complexity of the binary decision rules increases. As expected, solving the problems to global optimality achieves a slightly better objective value compared to the relaxed termination criteria cases.
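The performance measure used throughout the tables is a one-liner; a minimal sketch (the function name is ours):

```python
def improvement(non_adapt, adapt):
    """The performance measure of Table 2: (Non-Adapt. - Adapt.) / Non-Adapt."""
    return (non_adapt - adapt) / non_adapt

# e.g. a non-adaptive worst-case cost of 100 reduced to 83 by the decision rule
print(round(100 * improvement(100.0, 83.0)))  # -> 17 (percent)
```

Since both problems are minimizations, a positive value means the adaptive rule strictly improves on the static decisions.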
Nevertheless, the computational burden is substantially larger, rendering the decision rules impractical for T > 20 and r > 9. Using the problem instances of this test series, we also compared the binary decision rule structures given in (6a) and (52), i.e., the decision variables in the mixed-integer linear optimization problems were Y ∈ Z^{q×g} and Y ∈ {−1, 0, 1}^{q×g}, respectively. In both cases, the optimal solution was identical, but the solution time for the case Y ∈ Z^{q×g} was substantially longer, rendering the decision rules impractical for T > 15 and r > 5.

Table 3: Comparison between the binary decision rules discussed in Bertsimas and Georghiou [10] (Design), given by (2) with P = 1, and the proposed decision rules (Adaptive). The performance is defined as the quantity (Adapt. − Design)/Design. The proposed decision rule problems have great scalability properties but do not perform as well compared to binary decision rules of type (2).

Binary Decision Rule Structure (2) vs. (6), N = 2

       Average Performance                        Solution Time
  T    r=0    r=1    r=5    r=9     Design        r=0    r=1    r=5    r=9
  2    19%    3%     2%     1%      <1sec         <1sec  <1sec  <1sec  <1sec
  4    29%    16%    12%    10%     3sec          <1sec  <1sec  <1sec  <1sec
  6    52%    34%    27%    26%     607sec        <1sec  <1sec  <1sec  <1sec
  8    60%    40%    29%    28%     1628sec       <1sec  <1sec  1sec   1sec
  10   47%    38%    31%    30%     16613sec      <1sec  <1sec  1sec   5sec

In our second test series, we investigate the relative performance between the proposed decision rules and the binary decision rules discussed in Bertsimas and Georghiou [10]. In particular, we used decision rules of type (2) with P = 1, and the corresponding decision rule problem was solved using Algorithm 2 with parameters δ = 0.01, β = 10^{−4}, δ̂ = 0.01 and β̂ = 10^{−4}, see [10, Section 2.3] for more details. The termination criterion for all mixed-integer linear optimization problems was fixed to 5%. The results are presented in Table 3, and constructed using the randomly generated instances for the cases where N = 2, T ∈ {2,…,10} and r ∈ {0, 1, 5, 9}.
The performance is defined as the quantity (Adapt. − Design)/Design, where Adapt. and Design are the objective values of the proposed binary decision rule structure and the decision rule structure given by (2), respectively. As expected, structure (2) is much more flexible than the proposed structure, producing high quality designs. This is the case as the discontinuity of the binary decision rule is decided endogenously by the optimization problem. However, this flexibility comes at a high computational cost, rendering decision rules (2) impractical for problem instances with T > 10. On the other hand, the proposed decision rules are highly scalable, providing good quality solutions to particularly challenging problem instances within a few seconds.

In our third test series, we investigate the scalability properties of the proposed binary decision rules and their relative performance. In this test series, the termination criterion of the mixed-integer linear optimization problems was fixed to 5%. The results are presented in Figure 6 and Table 4, and constructed using the randomly generated instances for the cases where N ∈ {2, 3, 4}, T ∈ {10,…,50} and r ∈ {1, 3, 5}. As expected, solution quality increases as the decision rules become more flexible. It is interesting to see that these improvements saturate for r > 3, yielding diminishing returns for the computational effort required.

Figure 6: Comparison of average performance of adaptive versus static binary decision rules for N ∈ {2, 3, 4} and r ∈ {1, 3, 5}, using the performance measure (Non-Adapt. − Adapt.)/Non-Adapt. The three panels plot the average improvement (%) against the number of time stages for N = 2, N = 3 and N = 4, respectively.
Average Solution Time

             N = 2                    N = 3                    N = 4
  T    r=1    r=3     r=5      r=1    r=3      r=5      r=1    r=3      r=5
 10  <1sec  <1sec    1sec    <1sec  <1sec     2sec    <1sec  <1sec     4sec
 20  <1sec   1sec    9sec    <1sec   2sec    48sec    <1sec   5sec    57sec
 30  <1sec   5sec   63sec    <1sec   7sec   224sec    <1sec  31sec   380sec
 40  <1sec   9sec  114sec     1sec  12sec   453sec     1sec  95sec  4615sec
 50   1sec  12sec  225sec     1sec  36sec  1951sec     1sec 357sec 19298sec

Table 4: Average solution time for N ∈ {2, 3, 4} and r ∈ {1, 3, 5}.

5.2 Multistage Knapsack Problem

In this case study, we consider a multistage knapsack problem, which can be formulated as an instance of the multistage adaptive optimization problem with random recourse and binary recourse decisions. In the classic one-stage knapsack problem, the decision maker is given N items, each weighing w_n, n = 1, . . . , N, and a knapsack that can carry a maximum weight w. The decision maker wants to maximize the number of items that fit into the knapsack, by choosing binary decisions y_n ∈ {0, 1} for each item n, while keeping the total weight of the items in the knapsack, Σ_{n=1}^{N} w_n y_n, below w. In the multistage variant of the problem, at the beginning of each time period t ∈ T := {1, . . . , T}, the decision maker is given N items, each weighing w_{n,t}(ξ_t), n = 1, . . . , N, and an empty knapsack with capacity w. The weights w_{n,t}(ξ_t) are assumed to be linear functions of the random parameter ξ_t that realizes at stage t. Upon observing the weight of each item, the decision maker then decides the subset of the N items that can fit in the knapsack. The items left behind can be taken at later stages. We denote by y_{n,t,s}(·) ∈ B^{k_s,1} the binary decision corresponding to the item with weight w_{n,t}(ξ_t), taking value 1 if the item is fitted into the knapsack at stage s, s ≥ t, and zero otherwise. The decision maker wishes to maximize the expected number of items that can fit into the knapsacks over the planning horizon T by determining the collection of items y_{n,t,s}(·).
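To make the one-stage problem above concrete, the following sketch solves it by exhaustive enumeration. This is a toy illustration for tiny N only (in practice a mixed-integer solver such as CPLEX [25] would be used), and the function name and interface are our own.

```python
from itertools import product

def one_stage_knapsack(weights, capacity):
    """Choose y_n in {0, 1} to maximise the number of packed items
    subject to sum_n w_n * y_n <= capacity, by enumerating all 2^N
    selections.  Exponential in N: a sanity check, not a method."""
    best_count, best_y = 0, tuple(0 for _ in weights)
    for y in product((0, 1), repeat=len(weights)):
        weight = sum(w * yn for w, yn in zip(weights, y))
        if weight <= capacity and sum(y) > best_count:
            best_count, best_y = sum(y), y
    return best_count, best_y
```

With weights drawn as in the experiments below (scaled coefficients in [0.5, 1]) and capacity w = N/2, roughly half of the items typically fit at each stage.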
The problem can be formulated as the following multistage optimization problem with random recourse:

maximize    E_ξ [ Σ_{n=1}^{N} Σ_{t∈T} Σ_{s=t}^{T} y_{n,t,s}(ξ^s) ]

subject to  y_{n,t,s}(·) ∈ B^{k_s,1},   s = t, . . . , T,  n = 1, . . . , N,  t ∈ T,

            Σ_{s=1}^{t} Σ_{n=1}^{N} w_{n,s}(ξ_s) y_{n,s,t}(ξ^t) ≤ w,   ∀t ∈ T,  ∀ξ ∈ Ξ,        (54)

            Σ_{s=t}^{T} y_{n,t,s}(ξ^s) ≤ 1,   n = 1, . . . , N,  t ∈ T,  ∀ξ ∈ Ξ.

Here, the second set of constraints ensures that at each stage t, the total weight of the knapsack is below its capacity w, while the last set of constraints ensures that an item can be carried in the knapsack only once over the time horizon. The random vector ξ follows a beta distribution, Beta(α, β), and the uncertainty set Ξ is given by Ξ := {ξ ∈ R^T : ξ_1 = 1, 0 ≤ ξ ≤ e}.

For our computational experiments we randomly generated 30 instances of Problem (54). The parameters are randomly chosen using a uniform distribution from the following sets: the weight of item n at stage t is given by w_{n,t}(ξ_t) = ŵ_{n,t} ξ_t, where ŵ_{n,t} ∈ [0.5, 1], and the parameters of the beta distribution are chosen from α ∈ (0, 5] and β ∈ (0, 5]. The capacity of the knapsack is set to w = N/2.

In the first test series, we compared the performance of the binary decision rules versus the non-adaptive binary decisions for the randomly generated instances. We consider binary decision rules that have r ∈ {1, 2, 3} breakpoints in each direction ξ_i, for problem instances with N ∈ {2, 3, 4} and planning horizons T ∈ {3, . . . , 15}. The termination criterion of the mixed-integer linear optimization problems was fixed to 5%. Figure 7 depicts the average improvement in objective value of the binary decision rules versus the static binary decisions. As before, the improvement in objective value is defined as the quantity (Non-Adapt. − Adapt.)/Non-Adapt. The results show a substantial improvement in solution quality for all problem instances N ∈ {2, 3, 4}, with the most flexible decision rule structure achieving the best results.
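The two constraint families of Problem (54) can be checked mechanically along a single scenario, which is useful when validating a candidate policy by simulation. The sketch below is a minimal illustration assuming the experimental weight model w_{n,t}(ξ_t) = ŵ_{n,t} ξ_t; the array layout y[n][t][s] (item n released at stage t, packed at stage s) is our own convention, not the paper's notation.

```python
import numpy as np

def satisfies_54(y, w_hat, xi, capacity):
    """Check, along one scenario xi, the stagewise capacity constraints
    sum_{s<=t} sum_n w_{n,s}(xi_s) * y_{n,s,t} <= capacity for all t,
    and the packed-at-most-once constraints sum_{s>=t} y_{n,t,s} <= 1."""
    N, T = w_hat.shape
    # Capacity at stage t: items released at any stage s <= t and
    # packed at stage t contribute their realized weight.
    for t in range(T):
        load = sum(w_hat[n, s] * xi[s] * y[n][s][t]
                   for s in range(t + 1) for n in range(N))
        if load > capacity + 1e-9:
            return False
    # Each item may be packed at most once over the horizon.
    for n in range(N):
        for t in range(T):
            if sum(y[n][t][s] for s in range(t, T)) > 1:
                return False
    return True
```

Sampling ξ from the Beta(α, β) model above and averaging the packed-item count over scenarios would then give a Monte Carlo estimate of the objective of (54) for any fixed policy.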
For the time horizon and complexity of the binary decision rules considered, all problem instances were solved within 20 minutes. The solution time for instances with T > 15 and r > 3 was substantially longer.

In the second test series, we improve the scalability of the solution method by restricting the information available to the binary decision rules. To this end, we define η_{s,t} = (ξ_s, ξ_t)^T, and we restrict the recourse decisions y_{n,t,s}(ξ^s) to have structure y_{n,t,s}(η_{s,t}) for all n = 1, . . . , N, s = t, . . . , T and t = 1, . . . , T. The number of integer decision variables needed to approximate y_{n,t,s}(ξ^s) using the binary decision rule structure (52) and (24) is s(r + 1), while for y_{n,t,s}(η_{s,t}) the number of integer decision variables required is 2(r + 1). We will refer to η_{s,t} = (ξ_s, ξ_t)^T as the partial information available to the decision rule at stage s.

[Figure: three panels (N = 2, N = 3, N = 4) plotting average improvement (%) against the number of time stages T ∈ {3, . . . , 15} for r ∈ {1, 2, 3}.]

Figure 7: Comparison of average performance of adaptive versus static binary decision rules for N ∈ {2, 3, 4} and r ∈ {1, 2, 3}, using the performance measure (Non-Adapt. − Adapt.)/Non-Adapt. All problems were solved within 20 minutes.

Table 5 compares the solution quality and computation time of the full information and partial information structures for problem instances with N = 4, r ∈ {1, 2, 3} and T ∈ {5, . . . , 25}. The solution quality of the full information structure is slightly better, but the solution method suffers with respect to scalability. On the other hand, using the partial information structure we can solve much bigger problem instances within a very reasonable amount of time. We note that the problem instance with N = 4 and T = 25 involves a total of 1300 binary decision rules.
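The reduction from s(r + 1) to 2(r + 1) integer variables per decision rule compounds quickly over the horizon. The short sketch below tallies both counts for a given instance size, following the figures stated in the text; the function name and interface are ours.

```python
def decision_rule_counts(N, T, r, partial=False):
    """Count the binary decision rules y_{n,t,s} (t = 1..T, s = t..T,
    n = 1..N) and the integer variables parameterising them:
    s*(r+1) per rule under full information, versus 2*(r+1) under
    the partial information eta_{s,t} = (xi_s, xi_t)."""
    rules, int_vars = 0, 0
    for t in range(1, T + 1):
        for s in range(t, T + 1):
            rules += N
            int_vars += N * (2 * (r + 1) if partial else s * (r + 1))
    return rules, int_vars
```

For the largest instance reported in Table 5 (N = 4, T = 25), this reproduces the 1300 decision rules noted above; at r = 1 the partial information structure uses 5,200 integer variables against 44,200 under full information.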
Average Objective Value Improvements, N = 4

        Full Information           Partial Information
  T    r=1     r=2     r=3       r=1     r=2     r=3
  5   32.5%   34.8%   35.7%     30.5%   32.5%   34.2%
 10   28.3%   32.0%   32.1%     27.9%   30.3%   31.5%
 15   28.9%   31.5%   32.6%     26.9%   30.1%   31.2%
 20   28.3%    n/a     n/a      27.6%   30.9%   31.7%
 25   26.8%    n/a     n/a      26.0%   31.0%   31.9%

Average Solution Time, N = 4

        Full Information           Partial Information
  T    r=1     r=2     r=3       r=1     r=2      r=3
  5   <1sec    1sec    1sec     <1sec   <1sec     1sec
 10    4sec    7sec   25sec      2sec    4sec     6sec
 15   27sec   52sec  364sec     13sec   42sec    28sec
 20   76sec    n/a     n/a      34sec  169sec   431sec
 25  396sec    n/a     n/a     162sec  449sec 11349sec

Table 5: Comparison of average performance and average computational times for binary decision rules with full and partial information structure, using the performance measure (Non-Adapt. − Adapt.)/Non-Adapt.

6 Conclusions

In this paper, we present linearly parameterised binary decision rule structures that can be used in conjunction with real-valued decision rules appearing in the literature, for solving multistage adaptive mixed-integer optimization problems. We provide a systematic way to reformulate the binary decision rule problem into a finite dimensional mixed-integer linear optimization problem, and we identify instances where the size of this problem grows polynomially with respect to the input data. The theory presented covers both fixed-recourse and random-recourse problems. Our numerical results demonstrate the effectiveness of the proposed binary decision rules and show that they are (i) highly scalable and (ii) provide significant improvements in solution quality.

Appendix A Technical Proofs

In this section, we provide the proof of Theorem 1. To do this, we need the following auxiliary results given in Lemmas 1, 2 and 3. Lemma 1 gives the complexity of the integer feasibility problem, which will be used in Lemmas 2 and 3 to prove that checking feasibility of a semi-infinite constraint involving indicator functions is NP-hard.
Lemma 1 (Integer feasibility problem) The following decision problem is NP-hard:

Instance.  W ∈ R^{l×k}, h ∈ R^l and N ∈ Z_+ with N < k.
Question.  Is there ξ ∈ {0, 1}^k such that W ξ ≥ h, Σ_{i=1}^{k} ξ_i = N?        (55)

Proof  See [27].

Lemma 2 (Equivalent integer feasibility problem) The following decision problem is NP-hard:

Instance.  W ∈ R^{l×k}, h ∈ R^l and N ∈ Z_+ with N < k.
Question.  Is there ξ ∈ [0, 2]^k such that W ξ ≥ h, Σ_{i=1}^{k} ξ_i = N, Σ_{i=1}^{k} 1(ξ_i ≥ 1) = N?        (56)

Proof  In the following, we show that if there exists ξ that satisfies the assertion, then ξ must lie in {0, 1}^k. If that is the case, then decision problem (56) reduces to decision problem (55), which from Lemma 1 we know to be NP-hard. We prove this by contradiction. Assume, without loss of generality, that there exists ξ with ξ_1 > 1 that satisfies W ξ ≥ h, Σ_{i=1}^{k} ξ_i = N, Σ_{i=1}^{k} 1(ξ_i ≥ 1) = N. Since ξ_1 > 1, we have Σ_{i=2}^{k} ξ_i < N − 1, so at most N − 2 of the components ξ_2, . . . , ξ_k can be greater than or equal to 1; hence Σ_{i=1}^{k} 1(ξ_i ≥ 1) ≤ 1 + (N − 2) = N − 1, which contradicts the assumption that Σ_{i=1}^{k} 1(ξ_i ≥ 1) = N. Now, assume that there exists ξ with 0 < ξ_1 < 1 that satisfies W ξ ≥ h, Σ_{i=1}^{k} ξ_i = N, Σ_{i=1}^{k} 1(ξ_i ≥ 1) = N. Since 0 < ξ_1 < 1, we have 1(ξ_1 ≥ 1) = 0, and since Σ_{i=2}^{k} ξ_i = N − ξ_1 < N, at most N − 1 of the components ξ_2, . . . , ξ_k can be greater than or equal to 1; hence Σ_{i=1}^{k} 1(ξ_i ≥ 1) ≤ N − 1, which again contradicts the assumption that Σ_{i=1}^{k} 1(ξ_i ≥ 1) = N. Therefore, we conclude that ξ ∈ {0, 1}^k, and thus decision problem (56) is NP-hard.

The following lemma provides the key ingredient for proving Theorem 1.

Lemma 3 The following decision problem is NP-hard:

Instance.  A convex polytope Ξ ⊂ R^k and τ ∈ R.
Question.  Do all ξ ∈ Ξ satisfy Σ_{i=1}^{k} 1(ξ_i ≥ 1) ≤ τ?        (57)

Proof  We will show that an instance of decision problem (57) can be reduced to an equivalent integer feasibility problem (56). Consider the instance τ = N − 1, with N < k, and let Ξ be given by

Ξ := { ξ ∈ [0, 2]^k : W ξ ≥ h, Σ_{i=1}^{k} ξ_i = N }.        (58)

Therefore, decision problem (57) evaluates to true if there is no ξ ∈ [0, 2]^k such that

W ξ ≥ h,  Σ_{i=1}^{k} ξ_i = N,  Σ_{i=1}^{k} 1(ξ_i ≥ 1) = N.
(59)

Checking the latter assertion is equivalent to checking if decision problem (56) holds. Notice that, by construction, there is no ξ such that Σ_{i=1}^{k} 1(ξ_i ≥ 1) > N. Using Lemma 2, we can thus deduce that (57) is an NP-hard problem.

We now have all the ingredients to prove Theorem 1.

Proof of Theorem 1  Let Ξ be a convex polytope given by

Ξ = { ξ ∈ [0, 2]^k : W ξ ≥ h, Σ_{i=1}^{k} ξ_i = N },

and denote by P_ξ the uniform distribution on Ξ. We define the following instance of (3):

minimize    0
subject to  y(·) ∈ B^{k,k},
            ξ_i − 1 ≤ y_i(ξ) ≤ ξ_i,  ∀i = 1, . . . , k,  ∀ξ ∈ Ξ,        (60)
            Σ_{i=1}^{k} y_i(ξ) ≤ N − 1,  ∀ξ ∈ Ξ.

The optimal solution of Problem (60) is given by y_i^*(ξ) := 1(ξ_i ≥ 1), i = 1, . . . , k. We thus have

Σ_{i=1}^{k} 1(ξ_i ≥ 1) ≤ N − 1  ∀ξ ∈ Ξ   ⟺   Problem (60) is feasible.

Hence, Lemma 3 implies that checking the feasibility of Problem (60) is NP-hard. We now set α_i = e_i and β_i = 1, for i = 2, . . . , k, in the description of G(·) in (6b). By construction, there exists Y ∈ {0, 1}^{k×g} such that y^*(ξ) = Y G(ξ), that achieves the same objective value in all Problems (11), (15) and (60). The above arguments allow us to conclude that

Σ_{i=1}^{k} 1(ξ_i ≥ 1) ≤ N − 1  ∀ξ ∈ Ξ   ⟺   Problem (11) is feasible   ⟺   Problem (15) is feasible,

where the second equivalence follows from Propositions 1 and 2. Lemma 3 implies that Problems (11) and (15) are NP-hard, even when Y ∈ Z^{k×g} is relaxed to Y ∈ R^{k×g}.

References

[1] B.D.O. Anderson and J.B. Moore. Optimal Control: Linear Quadratic Methods. Prentice Hall, 1990.
[2] D. Bampou and D. Kuhn. Scenario-free stochastic programming with polynomial decision rules. In 50th IEEE Conference on Decision and Control and European Control Conference, Orlando, FL, USA, pages 7806–7812, December 2011.
[3] A. Ben-Tal and D. den Hertog. Immunizing conic quadratic optimization problems against implementation errors, volume 2011-060. Tilburg University, 2011.
[4] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski. Robust Optimization. Princeton University Press, 2009.
[5] A. Ben-Tal, B. Golany, and S. Shtern.
Robust multi-echelon multi-period inventory control. European Journal of Operational Research, 199(3):922–935, 2009.
[6] A. Ben-Tal, A. Goryashko, E. Guslitzer, and A. Nemirovski. Adjustable robust solutions of uncertain linear programs. Mathematical Programming, 99(2):351–376, 2004.
[7] A. Ben-Tal and A. Nemirovski. Robust convex optimization. Mathematics of Operations Research, 23(4):769–805, 1998.
[8] D. Bertsimas and C. Caramanis. Adaptability via sampling. In 46th IEEE Conference on Decision and Control, New Orleans, LA, USA, pages 4717–4722, December 2007.
[9] D. Bertsimas and C. Caramanis. Finite adaptability in multistage linear optimization. IEEE Transactions on Automatic Control, 55(12):2751–2766, 2010.
[10] D. Bertsimas and A. Georghiou. Design of near optimal decision rules in multistage adaptive mixed-integer optimization. Submitted for publication, available via Optimization-Online, 2013.
[11] D. Bertsimas, D.A. Iancu, and P.A. Parrilo. Optimality of affine policies in multi-stage robust optimization. Mathematics of Operations Research, 35(2):363–394, 2010.
[12] D. Bertsimas, D.A. Iancu, and P.A. Parrilo. A hierarchy of near-optimal policies for multistage adaptive optimization. IEEE Transactions on Automatic Control, 56(12):2809–2824, December 2011.
[13] G. C. Calafiore. Multi-period portfolio optimization with linear control policies. Automatica, 44(10):2463–2473, 2008.
[14] G. C. Calafiore. An affine control method for optimal dynamic asset allocation with transaction costs. SIAM Journal on Control and Optimization, 48(4):2254–2274, 2009.
[15] C. Caramanis. Adaptable optimization: Theory and algorithms. PhD thesis, Massachusetts Institute of Technology, 2006.
[16] X. Chen, M. Sim, P. Sun, and J. Zhang. A linear decision-based approximation approach to stochastic programming. Operations Research, 56(2):344–357, 2008.
[17] X. Chen and Y. Zhang. Uncertain linear programs: Extended affinely adjustable robust counterparts.
Operations Research, 57(6):1469–1482, 2009.
[18] M. Dyer and L. Stougie. Computational complexity of stochastic programming problems. Mathematical Programming, 106(3):423–432, 2006.
[19] S. J. Garstka and R. J.-B. Wets. On decision rules in stochastic programming. Mathematical Programming, 7(1):117–143, 1974.
[20] A. Georghiou, W. Wiesemann, and D. Kuhn. Generalized decision rule approximations for stochastic programming via liftings. Mathematical Programming, to appear, 2014.
[21] J. Goh and M. Sim. Distributionally robust optimization and its tractable approximations. Operations Research, 58(4):902–917, 2010.
[22] P. J. Goulart, E. C. Kerrigan, and J. M. Maciejowski. Optimization over state feedback policies for robust control with constraints. Automatica, 42(4):523–533, 2006.
[23] C. E. Gounaris, W. Wiesemann, and C. A. Floudas. The robust capacitated vehicle routing problem under demand uncertainty. Operations Research, 61(3):677–693, 2013.
[24] G. A. Hanasusanto, D. Kuhn, and W. Wiesemann. Two-stage robust integer programming. Submitted for publication, available via Optimization-Online, 2014.
[25] IBM ILOG CPLEX. http://www.ibm.com/software/integration/optimization/cplex-optimizer.
[26] D. Kuhn, W. Wiesemann, and A. Georghiou. Primal and dual linear decision rules in stochastic and robust optimization. Mathematical Programming, 130(1):177–209, 2011.
[27] C. H. Papadimitriou. On the complexity of integer programming. Journal of the Association for Computing Machinery, 28(4):765–768, 1981.
[28] R.T. Rockafellar and R.J.B. Wets. Variational Analysis. Springer, 1997.
[29] C.-T. See and M. Sim. Robust approximation to multiperiod inventory management. Operations Research, 58(3):583–594, 2010.
[30] A. Shapiro and A. Nemirovski. On complexity of stochastic programming problems. Applied Optimization, 99(1):111–146, 2005.
[31] J. Skaf and S. P. Boyd. Design of affine controllers via convex optimization.
IEEE Transactions on Automatic Control, 55(11):2476–2487, 2010.