CHAPTER 9
Integer Programming
An integer linear program (ILP) is, by definition, a linear program with the additional constraint that all variables take integer values:
(9.1)   max c^T x   s.t.   Ax ≤ b and x integral
Integrality restrictions occur in many situations. For example, the products in a
linear production model (cf. p. 81) might be “indivisible goods” that can only
be produced in integer multiples of one unit. Many problems in operations research and combinatorial optimization can be formulated as ILPs. As integer
programming is NP-hard (see Section 8.3), every NP-problem can in principle be
formulated as an ILP. In fact, such problems usually admit many different ILP
formulations. Finding a particularly well-suited formulation is often a decisive step
towards the solution of a problem.
9.1. Formulating an Integer Program
In this section we present a number of (typical) examples of problems with their
corresponding ILP formulations.
Graph Coloring. Let us start with the combinatorial problem of coloring the
nodes of a graph G = (V, E) so that no two adjacent nodes receive the same color
and as few colors as possible are used (cf. Section 8.1). This problem occurs in
many applications. For example, the nodes may represent “jobs” that can each
be executed in one unit of time. An edge joining two nodes may indicate that
the corresponding jobs cannot be executed in parallel (because they use perhaps
common resources). In this interpretation, the graph G would be the conflict
graph of the given set of jobs. The minimum number of colors needed to color its
nodes equals the number of time units necessary to execute all jobs.
Formulating the node coloring problem as an ILP, we assume V = {1, …, n}
and that we have n colors at our disposal. We introduce binary variables y_k,
k = 1, …, n, to indicate whether color k is used (y_k = 1) or not (y_k = 0). Furthermore, we introduce binary variables x_ik to indicate whether node i receives color k.
The resulting ILP is

(9.2)   min Σ_{k=1}^n y_k   s.t.
        (1)  Σ_{k=1}^n x_ik = 1              i = 1, …, n
        (2)  x_ik − y_k ≤ 0                  i, k = 1, …, n
        (3)  x_ik + x_jk ≤ 1                 {i, j} ∈ E, k = 1, …, n
        (4)  0 ≤ x_ik, y_k ≤ 1
        (5)  x_ik, y_k ∈ Z
The constraints (4) and (5) ensure that the x_ik and y_k are binary variables. The
constraints (1)–(3) guarantee (in this order) that each node is colored, that node i
receives color k only if color k is used at all, and that any two adjacent nodes have
different colors.
EX. 9.1. Show: If the integrality constraint (5) is removed, the resulting linear program
has optimum value equal to 1.
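To make (9.2) concrete, here is a small sketch that builds and solves the coloring ILP with the open-source PuLP modeling package (the choice of PuLP and of its default solver is our assumption, not part of the text):

```python
# A sketch of formulation (9.2), assuming the PuLP package (pip install pulp).
import pulp

def color_graph(n, edges):
    """Color nodes 0..n-1 so that adjacent nodes differ, minimizing colors."""
    prob = pulp.LpProblem("graph_coloring", pulp.LpMinimize)
    y = [pulp.LpVariable(f"y_{k}", cat="Binary") for k in range(n)]    # color k used?
    x = {(i, k): pulp.LpVariable(f"x_{i}_{k}", cat="Binary")
         for i in range(n) for k in range(n)}                          # node i gets color k?
    prob += pulp.lpSum(y)                                              # objective: colors used
    for i in range(n):
        prob += pulp.lpSum(x[i, k] for k in range(n)) == 1             # (1) each node colored
        for k in range(n):
            prob += x[i, k] - y[k] <= 0                                # (2) only used colors
    for i, j in edges:
        for k in range(n):
            prob += x[i, k] + x[j, k] <= 1                             # (3) adjacent nodes differ
    prob.solve()
    return {i: next(k for k in range(n) if x[i, k].value() > 0.5) for i in range(n)}

print(color_graph(3, [(0, 1), (1, 2), (0, 2)]))   # a triangle needs three colors
```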
The Traveling Salesman Problem (TSP). This is one of the best-known combinatorial optimization problems: There are n towns and a "salesman", located
in town 1, who is to visit each of the other n − 1 towns exactly once and then
return home. The tour (traveling salesman tour) has to be chosen so that the total distance traveled is minimized. To model this problem, consider the so-called
complete graph K_n = (V, E) with n = |V| pairwise adjacent nodes. With respect to a given cost (distance) function c : E → R, we then seek
to find a Hamilton circuit C ⊆ E, i.e., a circuit including every node, of minimal
cost.
An ILP formulation can be obtained as follows. We introduce binary variables
x_ik (i, k = 1, …, n) to indicate whether node i is the kth node visited. In addition,
we introduce variables y_e, e ∈ E, to record whether edge e is traversed:

(9.3)   min Σ_{e∈E} c_e y_e   s.t.
        x_11 = 1
        Σ_{k=1}^n x_ik = 1                   i = 1, …, n
        Σ_{i=1}^n x_ik = 1                   k = 1, …, n
        Σ_{e∈E} y_e = n
        x_{i,k−1} + x_jk − y_e ≤ 1           e = {i, j}, k ≥ 2
        x_in + x_11 − y_e ≤ 1                e = {i, 1}
        0 ≤ x_ik, y_e ≤ 1
        x_ik, y_e ∈ Z
EX. 9.2. Show that each feasible solution of (9.3) corresponds to a Hamilton circuit and
conversely.
In computational practice, other TSP formulations have proved more efficient.
To derive an alternative formulation, consider first the following simple program
with edge variables y_e, e ∈ E:

(9.4)   min c^T y   s.t.
        y(δ(i)) = 2        i = 1, …, n
        0 ≤ y ≤ 1
        y integral

(Recall our shorthand notation y(δ(i)) = Σ_{e∈δ(i)} y_e for the sum of all y-values on
edges incident with node i.)
ILP (9.4) does not describe our problem correctly: We still must rule out solutions corresponding to disjoint circuits that cover all nodes. We achieve this by
adding more inequalities, so-called subtour elimination constraints. To simplify
the notation, we write, for y ∈ R^E and two disjoint subsets S, T ⊆ V,

        y(S : T) = Σ_{e={i,j}, i∈S, j∈T} y_e

The subtour elimination constraints

        y(S : S̄) ≥ 2

make sure that there will be at least two edges in the solution that lead from a
proper nonempty subset S ⊂ V to its complement S̄ = V \ S. So the corresponding
tour is connected. A correct ILP formulation is thus given by

(9.5)   min c^T y   s.t.
        y(δ(i)) = 2        i = 1, …, n
        y(S : S̄) ≥ 2       ∅ ⊂ S ⊂ V
        0 ≤ y ≤ 1
        y integral
Note the contrast to our first formulation (9.3): ILP (9.5) has exponentially many
constraints, one for each proper subset S ⊂ V. If n = 30, there are more than 2^30
constraints. Yet, the way to solve (9.5) in practice is to add even more constraints!
This approach of adding so-called cutting planes is presented in Sections 9.2 and
9.3 below.
REMARK. The mere fact that (9.5) has exponentially many constraints does not prevent
us from solving it (without the integrality constraints) efficiently (cf. Section 10.6.2).
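In practice, (9.5) is handled by separation: solve the problem on a subset of the constraints, look for a violated subtour elimination constraint, add it, and repeat. The following sketch (our own illustration, not part of the text) performs the separation step for an integral y by extracting one connected component of the chosen edges:

```python
# Separation of subtour elimination constraints for an integral y (a sketch).
def find_violated_subtour(n, chosen_edges):
    """chosen_edges: the edges e with y_e = 1. Returns a proper nonempty
    subset S of {0,...,n-1} whose constraint y(S : V-S) >= 2 is violated,
    or None if the chosen edges connect all nodes (i.e., form one tour)."""
    adj = {i: [] for i in range(n)}
    for i, j in chosen_edges:
        adj[i].append(j)
        adj[j].append(i)
    seen, stack = {0}, [0]
    while stack:                         # depth-first search from node 0
        for j in adj[stack.pop()]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    if len(seen) < n:                    # y decomposes into disjoint subtours:
        return seen                      # add the cut y(S : V-S) >= 2 for S = seen
    return None
```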
Maximum Clique. This is another well-studied combinatorial problem, which
we will use as a case study for integer programming techniques later. Consider
again the complete graph K_n = (V, E) on n nodes. This time, there are weights
c ∈ R^V and d ∈ R^E given on both the vertices and the edges. We look for a set
C ⊆ V that maximizes the total weight of vertices and induced edges:

(9.6)   max_{C⊆V} c(C) + d(E(C))

As K_n = (V, E) is the complete graph, each C ⊆ V is a clique (set of pairwise
adjacent nodes). Therefore, we call (9.6) the maximum weighted clique problem.
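For small instances, (9.6) can be solved by complete enumeration, a handy reference point when testing the ILP formulations that follow. A sketch:

```python
# Brute-force solution of (9.6) for small n (our own illustration).
from itertools import combinations

def max_weighted_clique(n, c, d):
    """c[i]: vertex weights; d[i, j]: edge weights, keyed with i < j."""
    best_value, best_set = 0.0, ()       # C = empty set has value 0
    for size in range(1, n + 1):
        for C in combinations(range(n), size):
            value = sum(c[i] for i in C) \
                  + sum(d[i, j] for i, j in combinations(C, 2))
            if value > best_value:
                best_value, best_set = value, C
    return best_value, best_set
```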
EX. 9.3. Given a graph G = (V, E′) with E′ ⊆ E, choose c = 1 and

        d_e = 0 if e ∈ E′,   d_e = −n otherwise.

Show: With these parameters (for K_n = (V, E)), (9.6) reduces to the problem of finding
a clique C in G of maximum cardinality.
Problem (9.6) admits a rather straightforward ILP formulation:

(9.7)   max c^T x + d^T y   s.t.
        y_e − x_i ≤ 0          e ∈ E, i ∈ e
        x_i + x_j − y_e ≤ 1    e = {i, j} ∈ E
        0 ≤ x, y ≤ 1
        x, y integer

A vector (x, y) with all components x_i, y_e ∈ {0, 1} that satisfies the constraints of
(9.7) is the so-called (vertex-edge) incidence vector of the clique

        C = {i ∈ V | x_i = 1}

In other words, x ∈ R^V is the incidence vector of C and y ∈ R^E is the incidence
vector of E(C).
REMARK. The reader may have noticed that all ILPs we have formulated so far are
binary programs, i.e., the variables are restricted to take values in {0, 1} only. This is
no accident: The majority of integer optimization problems can be cast in this
setting. But of course, there are also others (e.g., the integer linear production model
mentioned in the introduction to this chapter).
9.2. Cutting Planes I
Consider the integer linear program

(9.8)   max c^T x   s.t.   Ax ≤ b and x integral
For the following structural analysis it is important (see Ex. 9.4) to assume that A
and b are rational, i.e., A ∈ Q^{m×n} and b ∈ Q^m. In this case, the polyhedron

(9.9)   P = {x ∈ R^n | Ax ≤ b}

is a rational polyhedron (cf. Section 3.6). The set of integer vectors in P is a
discrete set, whose convex hull we denote by

(9.10)   P_I = conv {x ∈ Z^n | Ax ≤ b}

Solving (9.8) is equivalent to maximizing c^T x over the convex set P_I (Why?).
Below, we shall prove that P_I is also a polyhedron and derive a system of inequalities describing P_I. We thus show how (at least in principle) the original problem
(9.8) can be reduced to a linear program.
EX. 9.4. Give an example of a (non-rational) polyhedron P ⊆ R^n such that the set P_I is
not a polyhedron.
PROPOSITION 9.1. Let P ⊆ R^n be a rational polyhedron. Then P_I is also a
rational polyhedron. In case P_I ≠ ∅, its recession cone equals that of P.

Proof. The claim is trivial if P is bounded (as P then contains only finitely many
integer points and the result follows by virtue of the discussion in Section 3.6). By
the Weyl–Minkowski Theorem 3.2, a rational polyhedron generally decomposes
into

        P = conv V + cone W

with finite sets of rational vectors V ⊆ Q^n and W ⊆ Q^n. By scaling, if necessary,
we may assume that W ⊆ Z^n. Denote by V and W also the matrices whose columns
are the vectors in V and W, respectively. Thus each x ∈ P can be written as

        x = Vλ + Wμ,   where λ, μ ≥ 0 and 1^T λ = 1.

Let ⌊μ⌋ be the integral part of μ ≥ 0 (obtained by rounding down each component μ_i ≥ 0 to the next integer ⌊μ_i⌋). Splitting μ into its integral part ⌊μ⌋ and its
non-integral part μ̃ = μ − ⌊μ⌋ yields

        x = Vλ + Wμ̃ + W⌊μ⌋ = x̃ + W⌊μ⌋

with ⌊μ⌋ ≥ 0 integral and x̃ ∈ P̃, where

        P̃ = {Vλ + Wμ | λ ≥ 0, 1^T λ = 1, 0 ≤ μ ≤ 1}

Because W ⊆ Z^n, x is integral if and only if x̃ is integral. Hence

        P ∩ Z^n = (P̃ ∩ Z^n) + {Wz | z ≥ 0 integral}

Taking convex hulls on both sides, we find (cf. Ex. 9.5)

        P_I = conv(P̃ ∩ Z^n) + cone W

Since P̃ is bounded, P̃ ∩ Z^n is finite. So the claim follows as before. □
EX. 9.5. Show: conv(V + W) = conv V + conv W for all V, W ⊆ R^n.
We next want to derive a system of inequalities describing P_I. There is no loss of
generality when we assume P to be described by a system Ax ≤ b with A and b
integral. The idea now is to derive new inequalities that are valid for P_I (but not
necessarily for P) and to add these to the system Ax ≤ b. Such inequalities are
called cutting planes as they "cut off" parts of P that are guaranteed to contain no
integral points.

Consider an inequality c^T x ≤ δ that is valid for P. If c ∈ Z^n but δ ∉ Z, then
each integral x ∈ P ∩ Z^n obviously satisfies the stronger inequality c^T x ≤ ⌊δ⌋.
Recall from Corollary 2.6 that valid inequalities for P can be derived from the
system Ax ≤ b by taking nonnegative linear combinations. We therefore consider
inequalities of the form

(9.11)   (y^T A)x ≤ y^T b,   y ≥ 0
If y^T A ∈ Z^n, then every x ∈ P ∩ Z^n (and hence every x ∈ P_I) satisfies

(9.12)   (y^T A)x ≤ ⌊y^T b⌋

We say that (9.12) arises from (9.11) by rounding (if y^T A ∈ Z^n). In particular, we
regain the original inequalities Ax ≤ b by taking for y all unit vectors. We conclude

        P_I ⊆ P′ = {x ∈ R^n | (y^T A)x ≤ ⌊y^T b⌋ whenever y ≥ 0 and y^T A ∈ Z^n} ⊆ P

Searching for inequalities of type (9.12) with y^T A ∈ Z^n, we may restrict ourselves
to 0 ≤ y ≤ 1. Indeed, each y ≥ 0 splits into its integral part z = ⌊y⌋ ≥ 0 and its
non-integral part ỹ = y − z, with ỹ^T A = y^T A − z^T A ∈ Z^n. The inequality (9.12)
is then implied by the two inequalities

(9.13)   (z^T A)x ≤ z^T b   and   (ỹ^T A)x ≤ ⌊ỹ^T b⌋

(Recall that we assume A and b to be integral.) The first inequality in (9.13) is
implied by Ax ≤ b. To describe P′, it thus suffices to augment the system Ax ≤ b
by all inequalities of the type (9.12) with 0 ≤ y ≤ 1, which describes

(9.14)   P′ = {x ∈ R^n | (y^T A)x ≤ ⌊y^T b⌋ whenever 0 ≤ y ≤ 1 and y^T A ∈ Z^n}

by a finite number of inequalities (see Ex. 9.6) and thus exhibits P′ as a polyhedron.
EX. 9.6. Show: There are only finitely many vectors y^T A ∈ Z^n with 0 ≤ y ≤ 1.

EX. 9.7. Show: P ⊆ Q implies P′ ⊆ Q′. (In particular, P′ depends only on P and not
on the particular system Ax ≤ b describing P.)
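The rounding step from (9.11) to (9.12) is purely mechanical once a multiplier vector y is chosen. A small sketch in exact rational arithmetic (our own illustration; it assumes A and b integral and y^T A integral):

```python
# Chvátal-Gomory rounding (9.12): from y >= 0 with y^T A integral, derive
# the cut (y^T A) x <= floor(y^T b). Exact arithmetic via Fraction.
from fractions import Fraction
from math import floor

def cg_cut(A, b, y):
    """A: integer matrix (list of rows), b: integers, y: nonnegative Fractions."""
    m, n = len(A), len(A[0])
    a = [sum(y[i] * A[i][j] for i in range(m)) for j in range(n)]   # y^T A
    assert all(coef.denominator == 1 for coef in a), "y^T A must be integral"
    beta = floor(sum(y[i] * b[i] for i in range(m)))                # floor(y^T b)
    return [int(coef) for coef in a], beta

# From 2x1 + 2x2 <= 3 with y = (1/2) we obtain the cut x1 + x2 <= 1:
print(cg_cut([[2, 2]], [3], [Fraction(1, 2)]))                      # ([1, 1], 1)
```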
Iterating the above construction, we obtain the so-called Gomory sequence

(9.15)   P ⊇ P′ ⊇ P″ ⊇ ⋯ ⊇ P^(k) ⊇ ⋯ ⊇ P_I

Remarkably (cf. Gomory [34], and also Chvátal [9]), Gomory sequences are always finite:

THEOREM 9.1. The Gomory sequence is finite in the sense that P^(t) = P_I holds
for some t ∈ N.
Before giving the proof, let us examine in geometric terms what it means to pass
from P to P′. Consider an inequality

        (y^T A)x ≤ y^T b

with y ≥ 0 and y^T A ∈ Z^n. Assume that the components of y^T A have greatest
common divisor d = 1 (otherwise replace y by d^{−1}y). Then the equation

        (y^T A)x = ⌊y^T b⌋

admits an integral solution x ∈ Z^n (cf. Ex. 9.8). Hence passing from P to P′
amounts to shifting all supporting hyperplanes H of P "towards" P_I until they
"touch" Z^n in some point x (not necessarily in P_I).

FIGURE 9.1. Moving a cutting plane towards P_I

EX. 9.8. Show: An equation c^T x = γ with c ∈ Z^n, γ ∈ Z admits an integer solution if and
only if the greatest common divisor of the components of c divides γ. (Hint: Section 2.3)
The crucial step in proving Theorem 9.1 is the observation that the Gomory sequence (9.15) induces Gomory sequences on all faces of P simultaneously. More
precisely, assume F ⊆ P is a proper face. From Section 3.6, we know that
F = P ∩ H holds for some rational hyperplane

        H = {x ∈ R^n | (y^T A)x = y^T b}

with y ∈ Q^m_+ (and hence y^T A ∈ Q^n and y^T b ∈ Q).

LEMMA 9.1. F = P ∩ H implies F′ = P′ ∩ H.

Proof. From Ex. 9.7 we conclude F′ ⊆ P′. Since, furthermore, F′ ⊆ F ⊆ H
holds, we conclude F′ ⊆ P′ ∩ H. To prove the converse inclusion, note that F is
the solution set of

        Ax ≤ b
        y^T Ax = y^T b

Scaling y if necessary, we may assume that y^T A and y^T b are integral. By definition, F′ is described by the inequalities

(9.16)   (w^T A + λ y^T A)x ≤ ⌊w^T b + λ y^T b⌋

with w ≥ 0, λ ∈ R (not sign-restricted) and w^T A + λ y^T A ∈ Z^n. We show that
each inequality (9.16) is also valid for P′ ∩ H (and hence P′ ∩ H ⊆ F′).

If λ < 0, observe that for x ∈ H (and hence for x ∈ P′ ∩ H) the inequality (9.16)
remains unchanged if we increase λ by an integer k ∈ N: Since x satisfies y^T Ax =
y^T b ∈ Z, both the left and right hand side will increase by k y^T b if λ is increased to
λ + k. Hence we can assume λ ≥ 0 without loss of generality. If λ ≥ 0, however,
(9.16) is easily recognized as an inequality of type (9.12). (Take ȳ = w + λy ≥ 0.)
So the inequality is valid for P′ and hence for P′ ∩ H. □
We are now prepared for the

Proof of Theorem 9.1. In case P = {x ∈ R^n | Ax = b} is an affine subspace, the
claim follows from Corollary 2.2 (cf. Ex. 9.9). In general, P is presented in the
form

(9.17)   Ax = b
         A′x ≤ b′

with n − d equalities A_{i·}x = b_i and s ≥ 0 facet-inducing (i.e., irredundant) inequalities A′_{j·}x ≤ b′_j.

CASE 1: P_I = ∅. Let us argue by induction on s ≥ 0. If s = 0, P is an affine
subspace and the claim is true. If s ≥ 1, we remove the last inequality A′_{s·}x ≤ b′_s
in (9.17) and let Q ⊆ R^n be the corresponding polyhedron. By induction, we then
have Q^(t) = Q_I for some t ∈ N. Now P_I = ∅ implies

        Q_I ∩ {x ∈ R^n | A′_{s·}x ≤ b′_s} = ∅

Since P^(t) ⊆ Q^(t) and (trivially) P^(t) ⊆ {x ∈ R^n | A′_{s·}x ≤ b′_s}, we conclude that
P^(t) = ∅ holds, too.
CASE 2: P_I ≠ ∅. We proceed now by induction on dim P. If dim P = 0, P = {p}
is an affine subspace and the claim is true. In general, since P_I is a polyhedron,
we can represent it as

        Ax = b
        Cx ≤ d

with C and d integral.

We show that each inequality c^T x ≤ δ of the system Cx ≤ d will eventually become valid for some P^(t), t ∈ N (which establishes the claim immediately). So fix
an inequality c^T x ≤ δ. Since P and P_I (and hence all P^(t)) have identical recession
cones by Proposition 9.1, the values

        δ_t = max {c^T x | x ∈ P^(t)}

are finite for each t ∈ N. The sequence (δ_t) is decreasing. Indeed, from the definition of the Gomory sequence we conclude that δ_{t+1} ≤ ⌊δ_t⌋. Hence the sequence
(δ_t) reaches its limit

        δ̄ := lim_{t→∞} δ_t ≥ δ

in finitely many steps. If δ̄ = δ, there is nothing left to prove. Suppose therefore
δ̄ = δ_t > δ and consider the face

        F := {x ∈ P^(t) | c^T x = δ̄}

Then F_I must be empty since every x ∈ P_I ⊇ F_I satisfies c^T x ≤ δ < δ̄. If c^T ∈
row A, then c^T x is constant on P ⊇ P^(t) ⊇ P_I, so δ̄ > δ is impossible. Hence
c^T ∉ row A, i.e., dim F < dim P. By induction, we conclude from Lemma 9.1

        F^(k) = P^(t+k) ∩ {x ∈ R^n | c^T x = δ̄} = ∅

for some finite k. Hence δ_{t+k} < δ̄, a contradiction. □
EX. 9.9. Assume P = {x ∈ R^n | Ax = b}. Show that either P = P_I or P′ = P_I = ∅.
(Hint: Corollary 2.2 and Proposition 9.1)
EX. 9.10 (Matching Polytopes). Let G = (V, E) be a graph with an even number of
nodes. A perfect matching in G is a set of pairwise disjoint edges covering all nodes.
Perfect matchings in G are in one-to-one correspondence with integral (and hence binary)
vectors x ∈ R^E satisfying the constraints

(1)  x(δ(i)) = 1,   i ∈ V
(2)  0 ≤ x ≤ 1.

Let P ⊆ R^E be the polytope described by these constraints. The associated polytope P_I
is called the matching polytope of G. Thus P_I is the convex hull of (incidence vectors
of) perfect matchings in G. (For example, if G consists of two disjoint triangles, we have
R^E ≅ R^6, P = {½ · 1} and P_I = ∅.)

To construct the Gomory polytope P′, consider some S ⊆ V. When we add the constraints
(1) for i ∈ S, every edge e = {i, j} with i, j ∈ S occurs twice. So the resulting equation is

(1′)  x(δ(S)) + 2x(E(S)) = |S|

(Recall that E(S) ⊆ E is the set of edges induced by S.) On the other hand, (2) implies

(2′)  x(δ(S)) ≥ 0.

From (1′) and (2′) we conclude that x(E(S)) ≤ ½|S| is valid for P. Hence for S ⊆ V

(3)  x(E(S)) ≤ ⌊½|S|⌋

is valid for P′. It can be shown (cf. [12]) that the inequalities (1)–(3) describe P_I. So
P′ = P_I and the Gomory sequence has length 1.
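The two-triangles example can be verified numerically: the fractional point x = ½ · 1 satisfies (1) and (2) but violates the cut (3) for S equal to one triangle (a small sanity check of our own):

```python
# Two disjoint triangles on nodes 0..5 with x_e = 1/2 on every edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
x = {e: 0.5 for e in edges}

for i in range(6):                        # constraint (1): x(delta(i)) = 1
    assert sum(v for e, v in x.items() if i in e) == 1.0

S = {0, 1, 2}                             # one triangle, |S| odd
x_ES = sum(v for (i, j), v in x.items() if i in S and j in S)
print(x_ES, "<=", len(S) // 2)            # 1.5 <= 1 is false: cut (3) cuts off x
```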
Gomory's Cutting Plane Method. Theorem 9.1 tells us that – at least in principle
– integer programs can be solved by repeated application of linear programming.
Conceptually, Gomory's method works as follows. Start with the integer linear
program

(9.18)   max c^T x   s.t.   Ax ≤ b, x integral

and solve its LP-relaxation, which is obtained by dropping the integrality constraint:

(9.19)   max c^T x   s.t.   Ax ≤ b

So c^T x is maximized over P = {x ∈ R^n | Ax ≤ b}. If the optimal solution is
integral, the problem is solved. Otherwise, determine P′ and maximize c^T x over
P′, etc.
Unfortunately, this approach is hopelessly inefficient. In practice, if the optimum x*
of (9.19) is non-integral, one tries to find cutting planes (i.e., valid inequalities for
P_I that "cut off" a part of P containing x*) right away, adds these to the
system Ax ≤ b, and then solves the new system, etc. This procedure is generally
known as the cutting plane method for integer linear programs.

Of particular interest in this context are cutting planes that are best possible in
the sense that they cut off as much of P as possible. Ideally, one would like to
add inequalities that define facets of P_I. Numerous classes of such facet defining
cutting planes for various types of problems have been published in the literature.
In Section 9.3, we discuss some techniques for deriving such cutting planes.
9.3. Cutting Planes II
The cutting plane method has been successfully applied to many types of problems. The most extensively studied problem in this context is the traveling salesman problem (see, e.g., [12] for a detailed exposition). Here, we will take the max
clique problem from Section 9.1 as our guiding example, trying to indicate some
general techniques for deriving cutting planes. Moreover, we take the opportunity
to explain how even more general (seemingly nonlinear) integer programs can be
formulated as ILPs.
The following unconstrained quadratic boolean (i.e., binary) problem was studied
in Padberg [64] with respect to a symmetric matrix Q = (q_ij) ∈ R^{n×n}:

(9.20)   max Σ_{i,j=1}^n q_ij x_i x_j,   x_i ∈ {0, 1}

As x_i · x_i = x_i holds for a binary variable x_i, the essential nonlinear terms in the
objective function are the terms q_ij x_i x_j (i ≠ j). These may be linearized with the
help of new variables y_ij = x_i x_j. Since x_i x_j = x_j x_i, it suffices to introduce just
n(n−1)/2 new variables y_e, one for each edge e = {i, j} ∈ E in the complete
graph K_n = (V, E) with V = {1, …, n}.

The salient point is the fact that the nonlinear equation y_e = x_i x_j is equivalent
to the three linear inequalities

        y_e ≤ x_i,   y_e ≤ x_j   and   x_i + x_j − y_e ≤ 1

if x_i, x_j and y_e are binary variables.
With c_i = q_ii and d_e = q_ij + q_ji for e = {i, j} ∈ E, problem (9.20) can thus be
written as an integer linear program:

(9.21)   max Σ_{i=1}^n c_i x_i + Σ_{e∈E} d_e y_e   s.t.
         y_e − x_i ≤ 0          e ∈ E, i ∈ e
         x_i + x_j − y_e ≤ 1    e = {i, j} ∈ E
         0 ≤ x_i, y_e ≤ 1
         x_i, y_e integer.

Note that (9.21) is precisely our ILP formulation (9.7) of the weighted max clique
problem.

Let P ⊆ R^{V∪E} be the polytope defined by the inequality constraints of (9.21).
As we have seen in Section 9.1, P_I is then the convex hull of the (vertex-edge)
incidence vectors (x, y) ∈ R^{V∪E} of cliques (subsets) C ⊆ V.
The polytope P ⊆ R^{V∪E} is easily seen to have full dimension n + n(n−1)/2 (because,
e.g., x = ½ · 1 and y = ⅓ · 1 yields an interior point (x, y) of P). Even P_I is
full-dimensional (see Ex. 9.11).

EX. 9.11. Show: R^{V∪E} is the affine hull of the incidence vectors of the cliques of sizes
0, 1 and 2.
What cutting planes can we construct for P_I? By "inspection", we find that for
any three vertices i, j, k ∈ V and the corresponding edges e, f, g ∈ E, the following
triangle inequality

(9.22)   x_i + x_j + x_k − y_e − y_f − y_g ≤ 1

holds for any clique incidence vector (x, y) ∈ R^{V∪E}.

EX. 9.12. Show: (9.22) can also be derived from the inequalities describing P by rounding.
This idea can be generalized. To this end, we extend our general shorthand notation and write for (x, y) ∈ R^{V∪E} and S ⊆ V:

        x(S) = Σ_{i∈S} x_i   and   y(S) = Σ_{e∈E(S)} y_e

For example, (9.22) now simply becomes: x(S) − y(S) ≤ 1 for |S| = 3.

For every S ⊆ V and integer λ ∈ N, consider the following clique inequality

(9.23)   λ x(S) − y(S) ≤ λ(λ+1)/2
PROPOSITION 9.2. Each clique inequality is valid for P_I.

Proof. Let (x, y) ∈ R^{V∪E} be the incidence vector of some clique C ⊆ V. We must
show that (x, y) satisfies (9.23) for each S ⊆ V and λ ∈ N. Let s = |S ∩ C|. Then
x(S) = s and y(S) = s(s−1)/2. Hence

        λ(λ+1)/2 − λ x(S) + y(S) = [λ(λ+1) − 2λs + s(s−1)]/2 = (λ−s)(λ−s+1)/2,

which is nonnegative since both λ and s are integers. □
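Proposition 9.2 can also be double-checked by brute force for small n (our own illustration):

```python
# Verify the clique inequalities (9.23) on all clique incidence vectors, n = 5.
from itertools import combinations

n = 5
for r in range(n + 1):
    for C in map(set, combinations(range(n), r)):      # clique C
        for m in range(n + 1):
            for S in map(set, combinations(range(n), m)):
                s = len(S & C)                          # x(S) = s
                yS = s * (s - 1) // 2                   # y(S): edges of C inside S
                for lam in range(1, n + 1):
                    assert lam * s - yS <= lam * (lam + 1) // 2
print("all clique inequalities (9.23) hold")
```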
A further class of inequalities can be derived similarly. For any two disjoint subsets S, T ⊆ V, the associated cut inequality is

(9.24)   x(S) + y(S) + y(T) − y(S : T) ≥ 0

(Recall from Section 9.1 that y(S : T) denotes the sum of all y-values on edges
joining S and T.)

PROPOSITION 9.3. Each cut inequality is valid for P_I.

Proof. Assume that (x, y) ∈ R^{V∪E} is the clique incidence vector of C ⊆ V. With
s = |C ∩ S| and t = |C ∩ T|, we then find

        x(S) + y(S) + y(T) − y(S : T) = s + s(s−1)/2 + t(t−1)/2 − st
                                      = (s−t)(s−t+1)/2 ≥ 0   □
Multiplying a valid inequality with a variable x_i ≥ 0, we obtain a new (nonlinear!) inequality. We can linearize it by introducing new variables as explained at
the beginning of this section. Alternatively, we may simply use linear (lower or
upper) bounds for the nonlinear terms, thus weakening the resulting inequality.
For example, multiplying a clique inequality (9.23) by 2x_i, i ∈ S, yields

        2λ Σ_{j∈S} x_i x_j − 2x_i y(S) ≤ λ(λ+1) x_i

Because of x_i y(S) ≤ y(S), x_i² = x_i and x_i x_j = y_e, e = {i, j} ∈ E, the following
so-called i-clique inequality

(9.25)   2λ y(i : S \ {i}) − 2y(S) − λ(λ−1) x_i ≤ 0

must be valid for P_I. (This may also be verified directly.)
REMARK. Most of the above inequalities actually define facets of P_I. Consider, e.g.,
for some λ, 1 ≤ λ ≤ n − 2, the clique inequality

        λ x(S) − y(S) ≤ λ(λ+1)/2,

which is satisfied with equality by all incidence vectors of cliques C ⊆ V with |C ∩ S| = λ
or |C ∩ S| = λ + 1. Let H ⊆ R^{V∪E} be the affine hull of all these incidence vectors.
To prove that the clique inequality is facet defining, one has to show

        dim H = dim P_I − 1,

i.e., that H is a hyperplane in R^{V∪E}. This is not too hard to do. (In the special case S = V and
λ = 1, it follows readily from Ex. 9.11.)
The cutting plane method suffers from a difficulty we have not mentioned so far.
Suppose we try to solve an integer linear program, starting with its LP-relaxation
and repeatedly adding cutting planes. In each step, we then face the problem of
finding a suitable cutting plane that cuts off the current non-integral optimum.
This problem is generally difficult. E.g., for the max clique problem one can
show that it is NP-hard to check whether a given (x*, y*) ∈ R^{V∪E} satisfies all
clique inequalities and, if not, to find a violated one to cut off (x*, y*).

Moreover, one usually has only a limited number of different classes (types) of
cutting planes to work with. In the max clique problem, for example, we could
end up with a solution (x*, y*) that satisfies all clique, i-clique and cut inequalities
and yet is non-integral: the original system and these three classes of cutting
planes by no means describe P_I completely.
The situation in practice, however, is often not so bad. Quite efficient heuristics
can be designed that frequently succeed in finding cutting planes of a special type.
Macambira and de Souza [57], for example, solve max clique instances with up to
50 nodes using the above clique and cut inequalities and some more sophisticated
generalizations thereof.
Furthermore, even when a given problem is not solved completely by cutting
planes, the computation is not futile: Typically, the (non-integral) optimum
obtained after having added hundreds of cutting planes provides a rather tight
estimate of the true integer optimum. Such estimates are extremely valuable in a
branch and bound method for solving ILPs as discussed in Section 9.4 below. For
example, the combination of cutting planes and a branch and bound procedure has
solved instances of the TSP with several thousand nodes to optimality (cf. [12]).
9.4. Branch and Bound
Any linear maximization program (ILP) with binary variables x_1, …, x_n can in
principle be solved by complete enumeration: Check all 2^n possible solutions for
feasibility and compare their objective values. To do this in a systematic fashion,
one constructs an associated tree of subproblems as follows. Fixing, say, the first
variable x_1 to either x_1 = 0 or x_1 = 1, we generate two subproblems (ILP | x_1 = 0)
and (ILP | x_1 = 1). These two subproblems are said to be obtained from (ILP) by
branching on x_1.

Clearly, an optimal solution of (ILP) can be inferred by solving the two subproblems. Repeating the above branching step, we can build a binary tree whose nodes
correspond to subproblems obtained by fixing some variables to be 0 or 1. (The
term binary refers here to the fact that each node in the tree has exactly two lower
neighbors.) The resulting tree may look as indicated in Figure 9.2 below.
FIGURE 9.2. A branch and bound tree: (ILP) branches into (ILP | x_1 = 0) and
(ILP | x_1 = 1), and (ILP | x_1 = 0) branches further into (ILP | x_1 = 0, x_3 = 0)
and (ILP | x_1 = 0, x_3 = 1).
Having constructed the complete tree, we could solve (ILP) bottom up and inspect
the 2^n leaves of the tree, which correspond to "trivial" (all variables fixed) problems. In contrast to this solution by complete enumeration, branch and bound
aims at building only a small part of the tree, leaving most of the "lower part"
unexplored. This approach is suggested by the following two obvious facts:

• If we can solve a particular subproblem, say (ILP | x_1 = 0, x_3 = 1), directly (e.g., by cutting planes), there is no need to inspect the subproblems in the branch below (ILP | x_1 = 0, x_3 = 1) in the tree.

• If we obtain an upper bound U(x_1 = 0, x_3 = 1) for the subproblem
(ILP | x_1 = 0, x_3 = 1) that is less than the objective value of some known
feasible solution of the original (ILP), then (ILP | x_1 = 0, x_3 = 1) offers
no optimal solution.
Only if neither of these circumstances occurs do we have to explore the subtree
rooted at (ILP | x_1 = 0, x_3 = 1) for possible optimal solutions. We do this by
branching at (ILP | x_1 = 0, x_3 = 1) and creating two new subproblems in the
search tree. An efficient branch and bound procedure tries to avoid such branching steps as much as possible. To this end, one needs efficient algorithms that
produce

(1) "good" feasible solutions of the original (ILP), and
(2) tight upper bounds for the subproblems.

There is a trade-off between the quality of the feasible solutions and upper bounds
on the one hand and the size of the search tree we have to build on the other. As
a rule of thumb, "good" solutions should be almost optimal and bounds should
differ from the true optimum by less than 10%.
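The following sketch implements the scheme for a binary maximization ILP, with the LP-relaxation as the bounding procedure; it searches depth first and branches on the most fractional variable (our own illustration, assuming SciPy's linprog as the LP solver):

```python
# Depth-first branch and bound for max{c^T x | Ax <= b, x in {0,1}^n} (a sketch).
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A, b):
    n = len(c)
    best = {"value": -np.inf, "x": None}           # incumbent feasible solution

    def node(lo, hi):                              # bounds encode the fixed variables
        res = linprog(-np.asarray(c, float), A_ub=A, b_ub=b,
                      bounds=list(zip(lo, hi)), method="highs")
        if res.status != 0:                        # infeasible subproblem: prune
            return
        x, upper = res.x, -res.fun                 # LP optimum = upper bound
        if upper <= best["value"] + 1e-9:          # bound test: cannot beat incumbent
            return
        frac = np.abs(x - np.round(x))
        i = int(np.argmax(frac))                   # most fractional variable
        if frac[i] < 1e-9:                         # LP optimum integral: new incumbent
            best["value"], best["x"] = upper, np.round(x)
            return
        for v in (0.0, 1.0):                       # branch on x_i = 0 and x_i = 1
            lo2, hi2 = list(lo), list(hi)
            lo2[i] = hi2[i] = v
            node(lo2, hi2)

    node([0.0] * n, [1.0] * n)
    return best

# Tiny knapsack: max 5x1 + 4x2 + 3x3  s.t.  2x1 + 3x2 + x3 <= 4
print(branch_and_bound([5, 4, 3], [[2, 3, 1]], [4]))   # optimum 8 at x = (1, 0, 1)
```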
Algorithms for computing good feasible solutions usually depend very much on
the particular problem at hand. So there is little to say in general. Quite often,
however, simple and fast heuristic procedures for almost optimal solutions can
be found. Such algorithms, called heuristics for short, are known for many
problem types. They come with no guarantee of success, but work well in practice.
REMARK [LOCAL SEARCH]. In the max clique problem the following simple local
search often yields surprisingly good solutions: We start with some C ⊆ V and check
whether the removal of some node i ∈ C or the addition of some node j ∉ C yields an
improvement. If so, we remove (add) the corresponding node and continue this way until
no such improvement is possible (in which case we stop with the current local optimum
C ⊆ V). This procedure may be repeated with different initial solutions C ⊆ V.
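A sketch of this local search for the weighted objective (9.6) (our own illustration; d is keyed by sorted vertex pairs):

```python
# Local search for max c(C) + d(E(C)) by single-node add/remove moves (a sketch).
def local_search(n, c, d, C=frozenset()):
    def gain(i):                         # objective change when toggling node i
        delta = c[i] + sum(d[min(i, j), max(i, j)] for j in C if j != i)
        return delta if i not in C else -delta
    improved = True
    while improved:
        improved = False
        for i in range(n):
            if gain(i) > 1e-12:          # add i if i is not in C, else remove i
                C = C | {i} if i not in C else C - {i}
                improved = True
    return C                             # a local optimum
```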
Computing good upper bounds is usually more difficult. Often, one just solves
the corresponding LP-relaxations. If these are too weak, one can try to improve
them by adding cutting planes as outlined in Section 9.3. An alternative is to
obtain upper bounds from Lagrangian relaxation (see Section 9.5 below).
Search and Branching Strategies. For the practical execution of a branch and
bound algorithm, one needs to specify how one should proceed. Suppose, for
example, that we are in a situation as indicated in Figure 9.2, i.e., that we have
branched from (ILP) on variable x_1 and from (ILP | x_1 = 0) on variable x_3. We
then face the question which subproblem to consider next: either (ILP | x_1 = 1)
or one of the subproblems of (ILP | x_1 = 0). There are two possible (extremal)
strategies: We either always go to one of the "lowest" (most restricted) subproblems or to one of the "highest" (least restricted) subproblems. The latter strategy,
choosing (ILP | x_1 = 1) in our case, is called breadth first search, while the former strategy is referred to as depth first search, as it moves down the search tree
as fast as possible.

A second question concerns the way of branching itself. If LP-relaxation or cutting planes are used for computing upper bounds, we obtain a fractional optimum
x* each time we try to solve a subproblem. A commonly used branching rule
then branches on the most fractional x*_i. In the case of (0,1)-variables, this rule
branches on the variable x_i for which x*_i is closest to 1/2. In concrete applications,
we may have an idea about the "relevance" of the variables. We may then alternatively decide to branch on the most relevant variable x_i. Advanced software
packages for integer programming allow the user to specify the branching process
and support various upper bounding techniques.
REMARK. The branch and bound approach can easily be extended to general integer
problems. Instead of fixing a variable x_i to either 0 or 1, we may restrict it to x_i ≤ ρ_i or
x_i ≥ ρ_i + 1 for suitable ρ_i ∈ Z. Indeed, the general idea is to partition a given subproblem
into a number of (possibly more than just two) subproblems of similar type.
9.5. Lagrangian Relaxation
In Section 5.1, Lagrangian relaxation was introduced as a means for calculating
upper bounds for optimization problems. There, one "relaxes" (dualizes) some
(in)equality constraints by adding them to the objective function using Lagrangian
multipliers y ≥ 0 (in case of inequality constraints) to obtain an upper bound L(y).
The crucial question is which constraints to dualize. The more constraints are
dualized, the weaker the bound becomes. On the other hand, dualizing more
constraints facilitates the computation of L(y). There is a trade-off between the
quality of the bounds we obtain and the effort necessary for their computation.
Generally, one would dualize only the "difficult" constraints, i.e., those that are
difficult to deal with directly (see Section 9.5.2 for an example).
Held and Karp [39] were the first to apply the idea of Lagrangian relaxation to
integer linear programs. Assume that we are given an integer program

(9.26)   max {c^T x | Ax ≤ b, Bx ≤ d, x ∈ Z^n}

for given integral matrices A, B and vectors b, c, d, and let z*_IP be the optimum
value of (9.26). Dualizing the constraints Ax − b ≤ 0 with multipliers u ≥ 0
yields the upper bound

(9.27)   L(u) = max {c^T x − u^T(Ax − b) | Bx ≤ d, x ∈ Z^n}
              = u^T b + max {(c^T − u^T A)x | Bx ≤ d, x ∈ Z^n}

and thus the Lagrangian dual problem

(9.28)   z*_D = min_{u≥0} L(u)
EX. 9.13. Show that L(u) is an upper bound on z*_IP for every u ≥ 0.
It is instructive to compare (9.28) with the linear programming relaxation

(9.29)   z*_LP = max {c^T x | Ax ≤ b, Bx ≤ d},

which we obtain by dropping the integrality constraint x ∈ Z^n. We find that Lagrangian relaxation approximates the true optimum z*_IP at least as well:

THEOREM 9.2. z*_IP ≤ z*_D ≤ z*_LP.
Proof. The first inequality is clear (cf. Ex. 9.13). The second one follows from
the fact that the Lagrangian dual of a linear program equals the linear programming dual. Formally, we may derive the second inequality by applying linear
programming duality twice:

    z*_D = min_{u≥0} L(u)
         = min_{u≥0} [u^T b + max_x {(c^T − u^T A)x | Bx ≤ d, x ∈ Z^n}]
         ≤ min_{u≥0} [u^T b + max_x {(c^T − u^T A)x | Bx ≤ d}]
         = min_{u≥0} [u^T b + min_v {d^T v | v^T B = c^T − u^T A, v ≥ 0}]
         = min_{u,v} {u^T b + v^T d | u^T A + v^T B = c^T, u ≥ 0, v ≥ 0}
         = max_x {c^T x | Ax ≤ b, Bx ≤ d} = z*_LP   □
REMARK. As the proof of Theorem 9.2 shows, z*_D = z*_LP holds if and only if the integrality constraint x ∈ Z^n is redundant in the Lagrangian dual problem defining z*_D. In this
case, the Lagrangian dual is said to have the integrality property (cf. Geoffrion [29]).
It turns out that solving the Lagrangian dual problem amounts to minimizing a
"piecewise linear" function of a certain type. We say that a function f : R^n → R
is piecewise linear convex if f is obtained as the maximum of a finite number of
affine functions f_i : R^n → R (cf. Figure 9.3 below). (General convex functions
are discussed in Chapter 10.)

FIGURE 9.3. A piecewise linear convex function f(u) = max{f_i(u) | 1 ≤ i ≤ k}
PROPOSITION 9.4. Let U be the set of vectors u ≥ 0 such that

(9.30)   L(u) = u^T b + max_x {(c^T − u^T A)x | Bx ≤ d, x ∈ Z^n} < ∞

Then L is a piecewise linear convex function on U.

Proof. For fixed u ≥ 0, the maximum in (9.30) is obtained by maximizing the linear
function f(x) = (c^T − u^T A)x over

        P_I = conv {x | Bx ≤ d, x ∈ Z^n} = conv V + cone E,

say, with finite sets V ⊆ Z^n and E ⊆ Z^n (cf. Proposition 9.1). If L(u) < ∞, the
maximum in (9.30) is attained at some v ∈ V (Why?). Hence

        L(u) = u^T b + max {(c^T − u^T A)v | v ∈ V},

exhibiting the restriction of L to U as the maximum of the finitely many affine
functions

        ℓ_i(u) = u^T (b − A v_i) + c^T v_i,   v_i ∈ V   □
9.5.1. Solving the Lagrangian Dual. After these structural investigations,
let us address the problem of computing (at least approximately) the best possible
upper bound L(u*), i.e., solving the Lagrangian dual

        z*_D = min_{u≥0} L(u)

To this end, we assume that we can evaluate (i.e., efficiently solve) for any given
u ≥ 0:

(9.31)   L(u) = max {c^T x − u^T(Ax − b) | Bx ≤ d, x ∈ Z^n}

REMARK. In practice this means that the constraints we dualize (Ax ≤ b) have to be
chosen appropriately so that the resulting L(u) is easy to evaluate (otherwise we obviously cannot expect to solve the problem min L(u)).

Suppose x̄ ∈ Z^n is an optimal solution of (9.31), so that L(u) = c^T x̄ − u^T(Ax̄ − b).
We then seek some ũ ≥ 0 such that L(ũ) < L(u). Since x̄ is a feasible solution of
the maximization problem in (9.31), L(ũ) < L(u) implies

(9.32)   c^T x̄ − ũ^T(Ax̄ − b) ≤ L(ũ) < L(u) = c^T x̄ − u^T(Ax̄ − b)

and hence

        (ũ − u)^T (Ax̄ − b) > 0.
The Subgradient Method. The preceding argument suggests to try a vector ũ =
u + Δu with

        Δu = ũ − u = θ(Ax̄ − b)   for some small step size θ > 0.

Of course, we also want to have ũ = u + Δu ≥ 0. So we simply replace any
negative component by 0, i.e., we project the resulting vector ũ onto the set R^m_+ of
feasible multipliers and obtain

(9.33)   ũ = max{0, u + θ(Ax̄ − b)}   (componentwise)

REMARK. This procedure appears intuitively reasonable: As our step size θ is small,
a negative component can only occur if u_i ≈ 0 and A_{i·}x̄ < b_i. This means that we do
not need to enforce the constraint A_{i·}x̄ ≤ b_i by assigning a large penalty (Lagrangian
multiplier) to it. Consequently, we try ũ_i = 0.

The above procedure is the subgradient method (cf. also Section 5.2.3) for solving the Lagrangian dual: We start with some u_0 ≥ 0 and compute a sequence
u_1, u_2, … by iterating the above step with step sizes θ_1, θ_2, ….

The appropriate choice of the step size θ_i is a delicate problem – both in theory
and in practice. A basic result states that convergence takes place (in the sense of
Theorem 11.6) provided

        lim_{i→∞} θ_i = 0   and   Σ_{i=0}^∞ θ_i = ∞
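A sketch of the projected subgradient iteration (9.33), with the divergent-series step sizes θ_i = θ_0/(i + 1) (our own illustration; solve_inner is an assumed problem-specific oracle for (9.31)):

```python
# Projected subgradient method for the Lagrangian dual min_{u>=0} L(u) (a sketch).
import numpy as np

def subgradient(A, b, solve_inner, u0, iterations=100, theta0=1.0):
    """solve_inner(u) must return a maximizer x of (9.31) and the value L(u)."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    u = np.asarray(u0, float)
    best_u, best_L = u.copy(), np.inf
    for i in range(iterations):
        x, L_u = solve_inner(u)
        if L_u < best_L:                              # keep the best bound found
            best_u, best_L = u.copy(), L_u
        theta = theta0 / (i + 1)                      # theta_i -> 0, sum theta_i = inf
        u = np.maximum(0.0, u + theta * (A @ x - b))  # projected step (9.33)
    return best_u, best_L
```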
9.5.2. Max Clique Revisited. How could Lagrangian relaxation be applied
to the max clique problem? The first (and most crucial) step is to establish an
appropriate ILP formulation of the max clique problem. This formulation should
be such that dualizing a suitable subset of constraints yields upper bounds that are
reasonably tight and efficiently computable. A bit of experimenting reveals our
original formulation (9.7) resp. (9.21) to be inadequate. Below, we shall derive
an alternative formulation that turns out to work better.

We start by passing from the underlying complete graph K_n = (V, E) to the complete directed graph D_n = (V, A), replacing each edge e = {i, j} ∈ E by two
oppositely directed arcs (i, j) ∈ A and (j, i) ∈ A. To avoid confusion with the notation, we will always indicate whether a pair i, j is considered as an ordered or
unordered pair and write (i, j) ∈ A or {i, j} ∈ E, resp. With each arc (i, j) ∈ A, we
associate a binary variable y_ij. The original edge weights d_e, e ∈ E, are equally
replaced by arc weights q_ij = q_ji = d_e/2 (e = {i, j} ∈ E).
The original ILP formulation (9.7) can now be equivalently replaced by

(9.34)   max c^T x + q^T y   s.t.
         (1)  x_i + x_j − ½(y_ij + y_ji) ≤ 1     {i, j} ∈ E
         (2)  y_ij − y_ji = 0                     {i, j} ∈ E
         (3)  y_ij − x_i ≤ 0                      (i, j) ∈ A
         (4)  x ∈ {0, 1}^V,  y ∈ {0, 1}^A

REMARK. (9.34) is a "directed version" of (9.7). The cliques (subsets) C ⊆ V are now
in one-to-one correspondence with the feasible solutions of (9.34), namely the vertex-arc
incidence vectors (x, y) ∈ {0, 1}^{V∪A}, defined by x_i = 1 if i ∈ C and y_ij = 1 if i, j ∈ C.
The directed version (9.34) offers the following advantage over the formulation
(9.7): After dualizing constraints (1) and (2) in (9.34), the remaining constraints
(3) and (4) imply no "dependence" between different nodes i and j (i.e., y_ij = 1
implies x_i = 1 but not x_j = 1). The resulting Lagrangian relaxation can therefore
be solved quite easily (cf. Ex. 9.14).
EX. 9.14. Using Lagrangian multipliers u ∈ R^E_+ for dualizing constraints (1) and unrestricted multipliers v ∈ R^E for dualizing the equality constraints (2) in (9.34), one obtains

    L(u, v) = max c^T x + q^T y + Σ_{{i,j}∈E} u_ij [1 − x_i − x_j + ½(y_ij + y_ji)]
                                + Σ_{{i,j}∈E} v_ij (y_ij − y_ji)

subject to (3)–(4) from (9.34).

So for given u ∈ R^E_+ and v ∈ R^E, computing L(u, v) amounts to solving a problem of the
following type (with suitable c̃ ∈ R^V and q̃ ∈ R^A):

    max c̃^T x + q̃^T y   subject to (3)–(4) from (9.34)

Show: A problem of the latter type is easy to solve because the constraints (3)–(4) imply
no "dependence" between different nodes i and j.

(Hint: For i ∈ V, let P_i = {j ∈ V | q̃_ij > 0}. Set x_i = 1 if c̃_i + Σ_{j∈P_i} q̃_ij > 0.)
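A sketch of the node-by-node evaluation suggested by the hint (our own illustration; c_t and q_t stand for c̃ and q̃):

```python
# Maximize c~^T x + q~^T y subject to (3)-(4) of (9.34): y_ij <= x_i, all binary.
def solve_relaxed_clique(n, c_t, q_t):
    """c_t[i]: node profits; q_t[i, j]: arc profits. Nodes decide independently."""
    x, y, value = [0] * n, {}, 0.0
    for i in range(n):
        P_i = [j for j in range(n) if j != i and q_t[i, j] > 0]
        profit = c_t[i] + sum(q_t[i, j] for j in P_i)
        if profit > 0:                    # open node i and its profitable arcs
            x[i] = 1
            for j in P_i:
                y[i, j] = 1
            value += profit
    return value, x, y
```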
Unfortunately, the Lagrangian bounds we obtain from the dualization of the constraints (1) and (2) in (9.34) are too weak to be useful in practice. To derive tighter
bounds, we want to add more constraints to (9.34) while keeping the enlarged system still efficiently solvable after dualizing constraints (1) and (2). It turns out that
one can add “directed versions” (cf. below) of the clique inequalities (9.23) and
the i-clique inequalities (9.25) for S = V without complicating things too much.
The resulting formulation of the max clique problem is

(9.35)   max c^T x + q^T y   s.t.
         (1)  x_i + x_j − ½(y_ij + y_ji) ≤ 1            {i, j} ∈ E
         (2)  y_ij − y_ji = 0                            {i, j} ∈ E
         (3)  y_ij − x_i ≤ 0                             (i, j) ∈ A
         (4)  2λ x(V) − y(V) ≤ λ(λ+1)                    λ = 1, …, n
         (5)  2λ y(δ⁺(i)) − y(V) − λ(λ−1) x_i ≤ 0        i ∈ V, λ = 1, …, n
         (6)  x ∈ {0, 1}^V,  y ∈ {0, 1}^A

where, in constraints (4) and (5), we used the straightforward extension of our
general shorthand notation:

        y(V) = Σ_{(i,j)∈A} y_ij   and   y(δ⁺(i)) = Σ_{j≠i} y_ij

Constraints (4) and (5) are "directed versions" of the original clique and i-clique
inequalities (9.23) and (9.25).
EX. 9.15. Show that every incidence vector (x, y) ∈ R^{V∪A} of a set (clique) C ⊆ V
satisfies the constraints in (9.35). (Hint: Section 9.3)
To dualize constraints (1) and (2) in (9.35), we introduce Lagrangian multipliers
u ∈ R^E_+ for the inequality constraints (1) and unrestricted multipliers v ∈ R^E for
the equality constraints (2). So we obtain for L(u, v) the expression

    max c^T x + q^T y + Σ_{{i,j}∈E} u_ij [1 − x_i − x_j + ½(y_ij + y_ji)]
                      + Σ_{{i,j}∈E} v_ij (y_ij − y_ji)

subject to (3)–(6) from (9.35).

Given u ∈ R^E_+ and v ∈ R^E, the computation of L(u, v) amounts to solving a problem of the following type (for suitable c̃ ∈ R^V and q̃ ∈ R^A):

(9.36)   max c̃^T x + q̃^T y   subject to (3)–(6) from (9.35)

The integer linear program (9.36) appears to be more difficult, but can still be
solved quickly.
For p = 0, …, n, we determine the best solution satisfying x(V) = p as follows:
For p = 0, set x = y = 0. Given p ≥ 1, we choose for each i ∈ V the p − 1
most profitable arcs in δ⁺(i), i.e., those with the highest q̃-values. Suppose their
q̃-values sum up to q̃_i for i ∈ V. We then let x_i = 1 for the p largest values of
c̃_i + q̃_i. If x_i = 1, we let y_ij = 1 for the p − 1 most profitable arcs in δ⁺(i).
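In code, this enumeration over p reads as follows (a sketch; c_t and q_t again stand for c̃ and q̃):

```python
# Solve (9.36) by enumerating p = x(V), as described above (a sketch).
def solve_936(n, c_t, q_t):
    best_value, best_p = 0.0, 0                      # p = 0: x = y = 0, value 0
    for p in range(1, n + 1):
        profits = []
        for i in range(n):
            arcs = sorted((q_t[i, j] for j in range(n) if j != i), reverse=True)
            profits.append(c_t[i] + sum(arcs[:p - 1]))   # p-1 best arcs out of i
        profits.sort(reverse=True)
        value = sum(profits[:p])                     # open the p best nodes
        if value > best_value:
            best_value, best_p = value, p
    return best_value, best_p
```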
The optimal solution is then the best one found for p = 0, …, n. This follows
from

LEMMA 9.2. Let (x, y) ∈ {0, 1}^{V∪A}. Then (x, y) is a feasible solution of (9.36) if
and only if there exists some p ∈ {0, …, n} such that

(i)  x(V) = p   and   (ii)  y(δ⁺(i)) = p − 1 if x_i = 1,  y(δ⁺(i)) = 0 if x_i = 0   (i ∈ V)

Proof. Assume first that (x, y) satisfies (i) and (ii). Then (x, y) satisfies the constraints (3) and (6) of (9.35). Constraint (4) reduces to

(4′)   2λp − p(p−1) ≤ λ(λ+1),

which holds for all λ, p ∈ Z since (λ−p)² + (λ−p) ≥ 0. Constraint (5) is
certainly satisfied if x_i = 0 (due to (ii)). For x_i = 1, constraint (5) becomes

        2λ(p−1) − p(p−1) ≤ λ(λ−1),

which is (4′) again.

Conversely, assume that (x, y) is feasible for (9.36) and let p = x(V) = Σ_{i∈V} x_i.
Consider the constraints (5) of (9.36) for those i with x_i = 1. Adding the corresponding inequalities for any λ, we find

        2λ y(V) − p y(V) − p λ(λ−1) ≤ 0

Taking λ = p, we conclude y(V) ≤ p(p−1). On the other hand, letting λ = p in (4), we have

        2p² − y(V) ≤ p(p+1)

and hence

        y(V) ≥ p(p−1),

which proves y(V) = p(p−1). Substituting the latter equality into (5) (with
λ = p) and dividing by p, we deduce for i ∈ V with x_i = 1:

        2 y(δ⁺(i)) ≤ (p−1) + (p−1) x_i = 2(p−1)

In view of constraint (3) in (9.35), we thus have the inequalities

        y(δ⁺(i)) ≤ p − 1 if x_i = 1   and   y(δ⁺(i)) = 0 if x_i = 0.

Since y(V) = p(p−1), actually equality must hold. □
EX. 9.16. The Lagrangian bounds L(u, v) we obtain when solving (9.36) as explained
above are generally better than the bound produced by the LP-relaxation of (9.36). Consider, for example, the complete directed graph D_4 = (V, A) with c̃ = 0 ∈ R^V and symmetric arc weights q̃_ij = q̃_ji as indicated in Figure 9.4 below.

An optimum integral solution of (9.36) can be obtained as follows: Choose any set C ⊆ V
with |C| = 3. Set x_i = 1 if i ∈ C. Furthermore, for each i ∈ C choose two arcs in δ⁺(i)
with weight q̃_ij = 1. Set y_ij = 1 on these two arcs. This solution guarantees an objective
function value q̃^T y = 6 (so the duality gap is zero).
In contrast, the LP-relaxation of (9.36) is solved by x_1 = x_4 = 1, x_2 = x_3 = 2/3, y_12 =
y_13 = y_42 = y_43 = 1 and y_21 = y_23 = y_24 = y_31 = y_32 = y_34 = 2/3, with an objective
value of 8. So Lagrangian relaxation (in this example) provides strictly better bounds than
LP-relaxation. In other words, problem formulation (9.36) does not have the integrality
property (cf. p. 197).

FIGURE 9.4. The complete directed graph D_4: all arcs have weight 1 except
the two arcs (1, 4) and (4, 1) of weight −100.
Our Lagrangian relaxation of the max clique problem makes use of cutting planes
by adding them to the constraints. This approach works well as long as we can
deal with these additional constraints directly. If we wanted to add other cutting
planes (say triangle inequalities), solving (9.36) with these additional constraints
would become a lot more difficult.
An alternative procedure would add such constraints and dualize them immediately. The resulting Lagrangian bound may then again be computed by solving a
problem of type (9.36) (with a modified objective function). This approach has
proved rather promising in practice (cf. [43]).
9.6. Dualizing the Binary Constraints
As we have seen, Lagrangian relaxation is a technique to get rid of difficult inequality or equality constraints by dualizing them. Can we do something similar
with the binary constraints? The answer is yes, and the reason is simple: A binary constraint xi ∈ {0 1} can be equivalently written as an equality constraint
x2i − xi = 0, which we dualize as usual.
Note, however that dualizing the quadratic equation x 2i − xi = 0 necessarily results
in a quadratic term in the Lagrangian function. We illustrate this approach in the
case of the maximum clique problem – or, equivalently, the unconstrained quadratic binary optimization problem from Section 9.3 (see Lemaréchal and Oustry
[52] for other examples and more details of this technique in general).
Let Q ∈ R^{n×n} be a symmetric matrix and reconsider the unconstrained quadratic
boolean problem

(9.37)   max {x^T Qx | x ∈ {0, 1}^n}

Dualizing the constraints x_i² − x_i = 0 with Lagrangian multipliers u_i ∈ R, we
obtain the Lagrangian bound

(9.38)   L(u) = max_{x∈R^n} x^T Qx + Σ_i u_i (x_i² − x_i)

Letting U ∈ R^{n×n} denote the diagonal matrix with diagonal u ∈ R^n, we can write

(9.39)   L(u) = max_x x^T (Q + U)x − u^T x

Evaluating L(u) amounts to solving the unconstrained quadratic optimization
problem (9.39). Ex. 9.17 shows how to accomplish this.

EX. 9.17. For fixed u ∈ R^n, consider the function

        f(x) = x^T (Q + U)x − u^T x

Show: If x^T (Q + U)x > 0 holds for some x ∈ R^n, then f has no finite maximum.

Assume that x^T (Q + U)x ≤ 0 always holds (i.e., Q + U is negative semidefinite). Show:
x̄ is optimal for f if and only if ∇f(x̄) = 2x̄^T (Q + U) − u^T = 0^T. (Hint: Section 10.3)

So f has a finite maximum if and only if Q + U is negative semidefinite and ∇f(x) = 0^T
has a solution. The maximum is attained at each x̄ ∈ R^n satisfying 2(Q + U)x̄ = u, which
implies

        L(u) = max_x f(x) = ½ x̄^T u − u^T x̄ = −½ u^T x̄
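Numerically, Ex. 9.17 translates into a short evaluation routine (our own sketch using NumPy; the tolerance is an assumption of the sketch):

```python
# Evaluate L(u) of (9.39): finite iff Q + U is negative semidefinite and
# 2(Q + U)x = u is solvable; then L(u) = -(1/2) u^T x (a sketch).
import numpy as np

def evaluate_L(Q, u, tol=1e-9):
    M = Q + np.diag(u)
    if np.linalg.eigvalsh(M).max() > tol:          # not negative semidefinite
        return np.inf
    x, *_ = np.linalg.lstsq(2.0 * M, u, rcond=None)
    if np.linalg.norm(2.0 * M @ x - u) > tol:      # gradient equation unsolvable
        return np.inf
    return -0.5 * float(u @ x)
```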
The Lagrangian dual min_u L(u) is called the semidefinite relaxation of the primal
(9.37), as it can be reformulated as follows (with u ∈ R^n, r ∈ R):

    min_u L(u) = min_{u,r} {r | L(u) ≤ r}
               = min_{u,r} {r | x^T (Q + U)x − u^T x ≤ r for all x ∈ R^n}
               = min_{u,r} {r | (1, x^T) S(r, u) (1, x)^T ≤ 0 for all x ∈ R^n}
               = min_{u,r} {r | S(r, u) is negative semidefinite},

where

    S(r, u) = [ −r     −½ u^T ]
              [ −½ u   Q + U  ]

Only the last step needs further explanation, which is given in Ex. 9.18 below.
EX. 9.18. Show for any S ∈ R^{(n+1)×(n+1)}:

    (1, x^T) S (1, x)^T ≤ 0 for all x ∈ R^n   ⟺   z^T S z ≤ 0 for all z ∈ R^{n+1}.
Our reformulation of the Lagrangian dual via

(9.40)   min_u L(u) = min_{r,u} r   s.t.   S(r, u) negative semidefinite

is a special case of a semidefinite program (optimizing a linear objective under
linear and semidefinite constraints, see also Section 12.6).
REMARK. To understand how (and why) problem (9.40) can be solved at least approximately, consider the following "cutting plane approach": We first replace the condition
of semidefiniteness for S = S(r, u) by a finite number of linear inequalities

(9.41)   a^T S a ≤ 0,   a ∈ A,

for some finite set A ⊆ R^{n+1}. Note that, for each fixed a ∈ A, the inequality a^T S a ≤ 0 is
a linear inequality with variables r and u.

We then minimize r subject to the constraints (9.41). If the solution provides us with r and u
such that S(r, u) is negative semidefinite, we have found a solution. Otherwise, if a^T S a > 0
holds for some a ∈ R^{n+1}, we add a to A (i.e., we add a violated inequality) and solve the
modified problem, etc. (Note that we can check whether S = S(r, u) is negative semidefinite
with the diagonalization algorithm from Section 2.1. This also provides us with a suitable
vector a in case S is not negative semidefinite.)

The theoretical aspects of this approach will be discussed in the context of the ellipsoid
method in Section 10.6. In practice, analogues of the interior point method for linear
programs (cf. Chapter 6) solve semidefinite programs more efficiently.
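The separation step of the remark – test negative semidefiniteness and, on failure, produce a violating vector a – can be sketched with an eigenvalue decomposition in place of the diagonalization algorithm of Section 2.1:

```python
# Separation for the constraint "S negative semidefinite" (a sketch using NumPy):
# return None if S is negative semidefinite, else a vector a with a^T S a > 0.
import numpy as np

def separate(S, tol=1e-9):
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    k = int(np.argmax(eigenvalues))
    if eigenvalues[k] <= tol:
        return None                      # S is (numerically) negative semidefinite
    return eigenvectors[:, k]            # a^T S a = lambda_max > 0: add cut (9.41)
```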
We want to emphasize that the approach of dualizing the binary constraints in a
general integer program

    max c^T x   s.t.   Ax ≤ b, x ∈ {0, 1}^n

is limited. If we dualize only the binary constraints x_i² − x_i = 0 using Lagrangian
multipliers u_i ∈ R, the Lagrangian function becomes

    L(u) = max x^T Ux + (c − u)^T x   s.t.   Ax ≤ b

In contrast to (9.38), this is a quadratic optimization problem with inequality constraints, which is in general difficult (NP-hard, cf. Section 8.3).