1 Complexity of the Linear Programming Feasibility Problem

1.1 Linear Programming
We start with a brief review of the basic concepts of linear programming; cf. [1]. Linear programs are usually stated in the following standard form
(P)  min cT x  subject to  Ax = b , x ≥ 0,
where A ∈ Rm×n , b ∈ Rm , c ∈ Rn are the given data and we look for an optimal
vector x ∈ Rn . Here and in what follows x ≥ 0 means xi ≥ 0 for all i and x > 0
means xi > 0 for all i. We call (P) feasible if there exists x ∈ Rn such that
Ax = b, x ≥ 0. A feasible problem (P) is called bounded if the minimum of the
objective function is finite. Otherwise it is called unbounded.
Given the data A, b, c we consider the closely related optimization problem
(D)  max bT y  subject to  AT y ≤ c,
where y ∈ Rm . This problem is called the dual problem of (P). Here we can
also speak about feasibility and boundedness.
Suppose now that (P) and (D) are both feasible, say
Ax = b , x ≥ 0 , AT y ≤ c for some x ∈ Rn , y ∈ Rm .
Introducing the vector s := c − AT y of slack variables we have AT y + s = c and
s ≥ 0. Then
(1.1)  cT x − bT y = (sT + y T A)x − bT y = sT x + y T (Ax − b) = sT x ≥ 0.
It follows that (P) and (D) are both bounded and max bT y ≤ min cT x. The
fundamental duality theorem of linear programming states that actually equality
holds. A proof can be found e.g. in [1].
Theorem 1.1 (Duality Theorem of Linear Programming)
(i) The problem (P) is bounded iff (D) is bounded. In this case the objective
values of (P) and (D) are equal.
(ii) If (P) is unbounded then (D) is infeasible. If (D) is unbounded, then (P)
is infeasible.
Often it is useful to attempt to solve (P) and (D) simultaneously. Consider
the polyhedral set F of feasible points z = (x, y, s) ∈ Rn+m+n satisfying
(1.2)  Ax = b , AT y + s = c , x ≥ 0 , s ≥ 0.
We note that F is convex. It follows from (1.1) and the Duality Theorem 1.1
that for (x, y, s) ∈ F, x is an optimal solution of (P) and y is an optimal solution
of (D) iff the complementary slackness condition
(1.3)  xi si = 0 , i = 1, 2, . . . , n
holds.
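The identities (1.1) and (1.3) are easy to check numerically. The following sketch (illustrative only; the tiny instance is our own choice, and any LP solver returning primal and dual optima would do) solves (P) and (D) with scipy's linprog and verifies that the duality gap equals sT x and that complementary slackness holds at optimality.

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance of (P): min c^T x  s.t.  Ax = b, x >= 0   (data chosen for illustration)
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([1.0, 2.0, 3.0])

# Solve the primal (P) and, separately, the dual (D): max b^T y  s.t.  A^T y <= c, y free.
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method="highs")
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * 2, method="highs")

x, y = primal.x, dual.x
s = c - A.T @ y                       # slack vector of (D); s >= 0 by dual feasibility

print("duality gap c^T x - b^T y :", c @ x - b @ y)   # equals s^T x by (1.1), here ~0
print("s^T x                     :", s @ x)
print("x_i * s_i                 :", x * s)           # all ~0 at optimality, cf. (1.3)
```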
1.2 Primal-Dual Interior Point Methods: Basic Ideas
The most common method to solve linear programs is Dantzig’s simplex method.
This method relies on the geometry of the polyhedron of solutions and follows
a path of vertices on the boundary of the polyhedron. By contrast, interior
point methods follow a path in the interior of the polyhedron, hence the name.
The path is a nonlinear curve that is approximately followed by a variant of
Newton’s method.
More specifically, primal-dual interior point methods search for solutions of
the system
(1.4)  Ax = b , AT y + s = c , x ≥ 0 , s ≥ 0 , x1 s1 = 0, . . . , xn sn = 0
by following a certain curve in the strictly feasible set F ◦ ⊆ Rn+m+n defined by
(1.5)  Ax = b , AT y + s = c , x > 0 , s > 0.
Note that (1.4) is only mildly nonlinear (quadratic equations xi si = 0). It is
the nonnegativity constraints that form the main source of difficulty. For a
parameter µ > 0 we add now to (1.5) the additional constraints
(1.6)  x1 s1 = µ , . . . , xn sn = µ.
One calls µ the duality measure. Under certain genericity assumptions, there
is exactly one strictly feasible solution ζµ ∈ F ◦ satisfying (1.6) and such that
ζ = (x, y, s) = limµ→0 ζµ exists. Then it is clear that ζ ∈ F and xi si = 0 for
all i. Hence ζ is a desired solution of the primal-dual optimization problem.
We postpone the proof of the next theorem to Section 1.3.
Theorem 1.2 Suppose that m ≤ n, rk A = m and that there is a strictly feasible
point, i.e., F ◦ ≠ ∅. Then for all µ > 0 there exists a uniquely determined point
ζµ = (xµ , y µ , sµ ) ∈ F ◦ such that xµ_i sµ_i = µ for i = 1, 2, . . . , n.
[Figure 1.1: Central path C and central neighborhood N (β)]
Definition 1.3 The central path C of the primal-dual optimization problem
given by A, b, c is the set
C = {ζµ : µ > 0}.
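To build intuition, the following sketch computes ζµ for a tiny instance by solving the defining equations Ax = b, AT y + s = c, xi si = µ directly with scipy's fsolve (the instance and this solution approach are our own illustrative choices, not part of the text's development); as µ decreases, the points visibly approach an optimal primal-dual pair.

```python
import numpy as np
from scipy.optimize import fsolve

# Tiny instance (our own choice): A = [1 1], b = 1, c = (1, 2), so n = 2, m = 1.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])

def central_path_point(mu, guess):
    """Solve Ax = b, A^T y + s = c, x_i s_i = mu for z = (x, y, s)."""
    def G(z):
        x, y, s = z[:2], z[2:3], z[3:]
        return np.concatenate([A @ x - b, A.T @ y + s - c, x * s - mu])
    return fsolve(G, guess)

guess = np.array([0.5, 0.5, 0.0, 1.0, 2.0])    # a strictly feasible point as start
for mu in [1.0, 0.1, 0.01, 0.001]:
    z = central_path_point(mu, guess)
    print(f"mu = {mu:6.3f}   x = {z[:2]}   y = {z[2]: .4f}   s = {z[3:]}")
    guess = z                                   # warm-start the next, smaller mu

# As mu -> 0 the points zeta_mu approach x = (1, 0), y = 1, s = (0, 1), an optimal
# primal-dual pair satisfying x_i s_i = 0.
```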
Suppose we know ζµ0 for some µ0 > 0. The basic idea is to choose a sequence
of parameters µ0 > µ1 > µ2 > . . . converging to zero and to successively compute approximations zk of ζk := ζµk for k = 0, 1, 2, . . . until a certain accuracy
is reached. In most cases one chooses µk = σ^k µ0 with a centering parameter
σ ∈ (0, 1).
It is useful to define a duality measure for any z = (x, y, s) ∈ F ◦ by
µ(z) := (1/n) ∑_{i=1}^n xi si = (1/n) sT x.
How can we get the approximations zk ? This is based on Newton’s method, one
of the most fundamental methods in computational mathematics. Consider the
map F : Rn+m+n → Rn+m+n ,
z = (x, y, s) 7→ F (z) = (AT y + s − c, Ax − b, x1 s1 , . . . , xn sn ).
We note that {ζµ } = F −1 (0, 0, µe), where e := (1, . . . , 1) ∈ Rn , by Theorem 1.2.
The Jacobian matrix of F at z equals
DF (z) = \begin{pmatrix} 0 & A^T & I \\ A & 0 & 0 \\ S & 0 & X \end{pmatrix},
where here and in the following we set
S = diag(s1 , . . . , sn ) , X = diag(x1 , . . . , xn ).
Depending on the context, z, e etc. should be interpreted as column vectors.
Lemma 1.4 If m ≤ n and rk A = m, then DF (z) is invertible, provided si xi ≠ 0 for all i.
Proof. By elementary column operations we can bring the matrix DF (z) to the form
\begin{pmatrix} D & A^T & I \\ A & 0 & 0 \\ 0 & 0 & X \end{pmatrix},
where D = diag(−s1 x1^{−1} , . . . , −sn xn^{−1} ). It is therefore sufficient to show that the matrix \begin{pmatrix} D & A^T \\ A & 0 \end{pmatrix} is invertible. Such matrices are said to be of Karush-Kuhn-Tucker type. Suppose that \begin{pmatrix} D & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 0, that is, Dx + AT y = 0, Ax = 0. It follows that
0 = \begin{pmatrix} x \\ y \end{pmatrix}^T \begin{pmatrix} D & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}^T \begin{pmatrix} Dx + AT y \\ 0 \end{pmatrix} = xT Dx + (Ax)T y = xT Dx.
As D is negative definite, it follows that x = 0. Hence AT y = 0. Therefore, y = 0, as rk A = m. □
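Lemma 1.4 is easy to probe numerically. The following sketch (hypothetical random data, numpy only) assembles DF (z) as a block matrix and confirms that it has full rank when rk A = m and x, s have no zero entries.

```python
import numpy as np

def jacobian_DF(A, x, s):
    """Assemble DF(z) = [[0, A^T, I], [A, 0, 0], [S, 0, X]] as a dense block matrix."""
    m, n = A.shape
    X, S = np.diag(x), np.diag(s)
    return np.block([
        [np.zeros((n, n)), A.T,              np.eye(n)        ],
        [A,                np.zeros((m, m)), np.zeros((m, n)) ],
        [S,                np.zeros((n, m)), X                ],
    ])

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))    # a random A generically has rk A = m
x = rng.uniform(0.5, 2.0, n)       # x > 0
s = rng.uniform(0.5, 2.0, n)       # s > 0, hence s_i x_i != 0 for all i

DF = jacobian_DF(A, x, s)
print("rank of A      :", np.linalg.matrix_rank(A))                # expect m = 3
print("DF(z) full rank:", np.linalg.matrix_rank(DF) == 2 * n + m)  # expect True
```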
We continue with the description of the basic algorithmic idea. Choose
µk = σ k µ0 and set ζk = ζµk . Then F (ζk ) = (0, 0, µk e) for all k ∈ N. A first
order approximation gives
(1.7)  F (ζk+1 ) ≈ F (ζk ) + DF (ζk )(ζk+1 − ζk ).
Suppose that zk = (x, y, s) ∈ F ◦ is an approximation of ζk . Then F (zk ) =
(0, 0, x1 s1 , . . . , xn sn ) = (0, 0, X S e). We obtain from (1.7), replacing the unknown ζk by zk ,
(0, 0, µk+1 e)T = F (ζk+1 ) ≈ F (zk ) + DF (zk )(ζk+1 − zk ).
This leads to the definition
(1.8)  zk+1 := zk + DF (zk )−1 (0, 0, µk+1 e − X S e)T
of the approximation of ζk+1 . This vector is well defined due to Lemma 1.4.
Put zk+1 = zk + (∆x, ∆y, ∆s). Then
(AT ∆y + ∆s, A∆x, S∆x + X∆s) = DF (zk )(∆x, ∆y, ∆s)T = (0, 0, µk+1 e − X S e)T ,
hence AT ∆y + ∆s = 0, A∆x = 0, which implies AT (y + ∆y) + (s + ∆s) = c,
A(x + ∆x) = b. We have shown that zk+1 satisfies the equalities in (1.5). We will
see that, by a suitable choice of the parameter σ, one can achieve that zk+1
also satisfies the strict inequalities in (1.5), that is, zk+1 ∈ F ◦ .
Summarizing, the framework for a primal-dual interior point method is the
following:
Algorithm 1.1: Primal-Dual IPM
Input: A ∈ Rm×n , b ∈ Rm , c ∈ Rn s.t. rk A = m ≤ n.
Choose starting point z0 = (x0 , y 0 , s0 ) ∈ F ◦ with duality measure µ0 .
Choose centering parameter σ ∈ (0, 1).
for k = 0, 1, 2, . . .
Solve
\begin{pmatrix} 0 & A^T & I \\ A & 0 & 0 \\ S^k & 0 & X^k \end{pmatrix} \begin{pmatrix} ∆x^k \\ ∆y^k \\ ∆s^k \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ σ^{k+1} µ0 e − X^k S^k e \end{pmatrix},
where X^k = diag(x^k_1 , . . . , x^k_n ), S^k = diag(s^k_1 , . . . , s^k_n ).
Set
(x^{k+1} , y^{k+1} , s^{k+1} ) = (x^k , y^k , s^k ) + (∆x^k , ∆y^k , ∆s^k ).
until some stopping criterion is met
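Below is a minimal numpy sketch of Algorithm 1.1 on a tiny instance (the data, the starting point, and the stopping rule µ(zk ) ≤ ε are our own illustrative choices). The Newton step is taken toward σ times the current duality measure, which coincides with σ^{k+1} µ0 in exact arithmetic when z0 has duality measure µ0.

```python
import numpy as np

def ipm_step(A, x, y, s, sigma):
    """One Newton step of the primal-dual IPM: solve the linear system of
    Algorithm 1.1 with right-hand side (0, 0, sigma * mu(z) e - XSe)."""
    m, n = A.shape
    mu = s @ x / n
    DF = np.block([
        [np.zeros((n, n)), A.T,              np.eye(n)        ],
        [A,                np.zeros((m, m)), np.zeros((m, n)) ],
        [np.diag(s),       np.zeros((n, m)), np.diag(x)       ],
    ])
    rhs = np.concatenate([np.zeros(n + m), sigma * mu * np.ones(n) - x * s])
    d = np.linalg.solve(DF, rhs)
    return x + d[:n], y + d[n:n + m], s + d[n + m:]

# Toy data (our choice); the starting point is strictly feasible and lies in N(1/4).
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0])
x, y, s = np.array([0.5, 0.5]), np.array([-2.0]), np.array([3.0, 4.0])
assert np.allclose(A @ x, b) and np.allclose(A.T @ y + s, c)   # z_0 in F°

n = A.shape[1]
sigma = 1 - 0.25 / np.sqrt(n)     # centering parameter with xi = 1/4, cf. Theorem 1.11
eps = 1e-8                        # stopping criterion: duality measure at most eps

for k in range(200):
    if s @ x / n <= eps:
        break
    x, y, s = ipm_step(A, x, y, s, sigma)

print("iterations:", k, "   mu(z_k) =", s @ x / n)
print("x =", x, "  y =", y, "  s =", s)   # approaches the optimum x=(1,0), y=1, s=(0,1)
```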
1.3 Existence and uniqueness of central path
We provide here the proof of the fundamental Theorem 1.2, following [4].
Lemma 1.5 Suppose that F ◦ ≠ ∅. Then for all K ∈ R the set
{(x, s) ∈ Rn × Rn | ∃y ∈ Rm (x, y, s) ∈ F, sT x ≤ K}
is bounded.
Proof. Let (x̄, ȳ, s̄) ∈ F ◦ . For any (x, y, s) ∈ F we have Ax̄ = b and Ax = b,
hence A(x̄ − x) = 0. Similarly, AT (ȳ − y) + (s̄ − s) = 0. This implies
(s̄ − s)T (x̄ − x) = −(ȳ − y)T A(x̄ − x) = 0.
Assuming sT x ≤ K we get
sT x̄ + s̄T x ≤ K + s̄T x̄.
The quantity ξ := mini min{x̄i , s̄i } is positive by assumption. We therefore get
ξeT (x + s) ≤ K + s̄T x̄,
hence ξ −1 (K + s̄T x̄) is an upper bound on xi and si for all i. □
Fix µ > 0 and consider the barrier function
(1.9)  f : H◦ → R ,  f (x, s) = (1/µ) sT x − ∑_{j=1}^n ln(xj sj ),
defined on the projection H◦ of F ◦ :
H◦ := {(x, s) ∈ Rn × Rn | ∃y ∈ Rm (x, y, s) ∈ F ◦ }.
Note that H◦ is convex as F ◦ is convex. Moreover, f (x, s) approaches ∞ whenever any of the products xj sj approaches zero.
Lemma 1.6
1. f is strictly convex.
2. f is bounded from below.
3. For all κ ∈ R there exist 0 < α < β such that
{(x, s) ∈ H◦ | f (x, s) ≤ κ} ⊆ [α, β]n × [α, β]n .
Proof. (1) Consider the function g : Rn_+ × Rn_+ → R, g(x, s) = −∑_{j=1}^n ln(xj sj ).
We have ∂²g/∂xj² = xj^{−2} , ∂²g/∂sj² = sj^{−2} , and all other second order derivatives of g
j
vanish. The Hessian of g is therefore positive definite and hence g is strictly
convex (cf. [2]). In particular, the restriction of g to H◦ is strictly convex as
well.
We claim that the restriction of sT x to H◦ is linear. In fact, if x̄ is such that
Ax̄ = b, then for any (x, s) ∈ H◦ with y ∈ Rm such that (x, y, s) ∈ F ◦
sT x = cT x − bT y = cT x − x̄T AT y = cT x − x̄T (c − s) = cT x + x̄T s − x̄T c,
which is linear in (x, s). This proves the first assertion.
(2) We write
f (x, s) = ∑_{j=1}^n h(xj sj /µ) + n − n ln µ,
where
h(t) := t − ln t − 1.
It is clear that h is strictly convex on (0, ∞) and limt→0 h(t) = ∞, limt→∞ h(t) =
∞. Moreover, h(t) ≥ 0 for t ∈ (0, ∞) with equality iff t = 1. Using this, we get
f (x, s) ≥ n − n ln µ,
which shows the second assertion.
(3) Suppose (x, s) ∈ H◦ with f (x, s) ≤ κ for some κ. Then, for all j,
h(xj sj /µ) ≤ κ − n + n ln µ =: κ̄.
From the properties of h it follows that there exist 0 < α1 < β1 such that
h^{−1} ((−∞, κ̄]) ⊆ [α1 , β1 ],
hence µα1 ≤ xj sj ≤ µβ1 . Applying Lemma 1.5 with K = nµβ1 shows that
there is some β such that xj ≤ β, sj ≤ β. Hence xj ≥ µα1 β −1 , sj ≥ µα1 β −1 ,
which proves the third assertion with α = µα1 β −1 .
□
Suppose that F ◦ ≠ ∅. Lemma 1.6(3) implies that f achieves its minimum
in H◦ . Moreover, the minimizer is unique as f is strictly convex. We shall
denote this minimizer by (xµ , sµ ). We note that if rk A = m ≤ n, then y µ is
uniquely determined by the condition AT y µ + sµ = c.
To complete the argument, we will show that xi si = µ, i = 1, 2, . . . , n, are
exactly the first order conditions characterizing local minima of the function f .
(Note that a local minimum of f is a global minimum by the strict convexity
of f .)
We recall a well-known fact about Lagrange multipliers from analysis.
Let g, h1 , . . . , hm : U → R be differentiable functions defined on the open subset
U ⊆ Rn . Suppose that u ∈ U is a local minimum of g under the constraints h1 =
0, . . . , hm = 0. Then, if the gradients ∇h1 , . . . , ∇hm are linearly independent
at u, there exist Lagrange multipliers λ1 , . . . , λm ∈ R such that
(1.10)  ∇g(u) + λ1 ∇h1 (u) + . . . + λm ∇hm (u) = 0.
We apply this fact to the problem
min f (x, s)
s.t. Ax = b , AT y + s = c , x > 0 , s > 0.
Suppose that (x, y, s) is a local minimum of f . The linear independence condition holds due to Lemma 1.4 and our assumption rk A = m ≤ n. By (1.10)
there are Lagrange multipliers v ∈ Rm , w ∈ Rn such that
(1.11)  µ−1 s − X −1 e + AT v = 0 , Aw = 0 , µ−1 x − S −1 e + w = 0.
(Here we have used that ∂f /∂x = µ−1 s − X −1 e, ∂f /∂y = 0, ∂f /∂s = µ−1 x − S −1 e.) The last two conditions imply
A(µ−1 x − S −1 e) = 0.
Hence (µ−1 Xe − S −1 e)T AT v = 0. With the first condition we get
(µ−1 Xe − S −1 e)T (µ−1 Se − X −1 e) = 0.
Therefore
0 = (µ−1 Xe − S −1 e)T (X −1/2 S 1/2 )(X 1/2 S −1/2 )(µ−1 Se − X −1 e) = ‖µ−1 (XS)1/2 e − (XS)−1/2 e‖2 .
This implies XSe = µe, hence (x, y, s) lies on the central path C.
Conversely, suppose that (x, y, s) ∈ F ◦ satisfies XSe = µe. Put v = 0 and
w = 0. Then the first order conditions (1.11) are satisfied. Since f is strictly
convex, (x, s) is a global minimum of f . By the previously shown uniqueness,
we have (x, s) = (xµ , sµ ). This completes the proof of Theorem 1.2.
□
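As a sanity check of this characterization (purely illustrative; the toy instance and the crude grid search are our own choices), one can minimize the barrier f over the feasible set of the small instance used earlier and confirm that the minimizer satisfies xi si ≈ µ:

```python
import numpy as np

mu = 0.5   # fixed central path parameter (illustrative value)

# Feasible points of the toy instance A = [1 1], b = 1, c = (1, 2) are parameterized by
# x = (t, 1 - t) with 0 < t < 1 and s = c - A^T y = (1 - y, 2 - y) with y < 1.
t = np.linspace(1e-3, 1 - 1e-3, 1000)
y = np.linspace(-4.0, 1 - 1e-3, 1000)
T, Y = np.meshgrid(t, y)

X1, X2 = T, 1 - T
S1, S2 = 1 - Y, 2 - Y
f = (S1 * X1 + S2 * X2) / mu - np.log(X1 * S1) - np.log(X2 * S2)   # barrier (1.9)

i, j = np.unravel_index(np.argmin(f), f.shape)
x = np.array([X1[i, j], X2[i, j]])
s = np.array([S1[i, j], S2[i, j]])
print("grid minimizer:  x =", x, "  y =", Y[i, j], "  s =", s)
print("x_i * s_i      =", x * s, "  (both close to mu =", mu, ")")
```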
1.4 Analysis of IPM
Recall the following useful conventions: For a vector u ∈ Rd we denote by U
the matrix U = diag(u1 , . . . , ud ). Moreover, e stands for the vector (1, . . . , 1)T
of the corresponding dimension. Note that U e = u. The Euclidean norm of
u is denoted by ‖u‖. We write ‖U ‖ for the operator norm and ‖U ‖F for the
Frobenius norm of U .
Lemma 1.7 Let u, v ∈ Rd be such that uT v ≥ 0. Then
‖U V e‖ ≤ (1/2) ‖u + v‖2 .
Proof. We have
‖U V e‖ = ‖U v‖ ≤ ‖U ‖ ‖v‖ ≤ ‖U ‖F ‖v‖ = ‖u‖ ‖v‖.
Moreover, as uT v ≥ 0,
‖u‖ ‖v‖ ≤ (1/2)(‖u‖2 + ‖v‖2 ) ≤ (1/2)(‖u‖2 + 2uT v + ‖v‖2 ) = (1/2)‖u + v‖2 . □
We remark that with a little more work, the factor 2−1 in Lemma 1.7 can
be improved to 2−3/2 , see [3, Lemma 14.1].
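A quick numerical check of Lemma 1.7 (illustrative only; the random data are our own choice):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(5):
    u = rng.standard_normal(8)
    v = rng.standard_normal(8)
    if u @ v < 0:
        v = -v                          # enforce the hypothesis u^T v >= 0
    lhs = np.linalg.norm(u * v)         # ||UVe|| = ||(u_1 v_1, ..., u_d v_d)||
    rhs = 0.5 * np.linalg.norm(u + v) ** 2
    print(f"||UVe|| = {lhs:8.4f}   <=   ||u + v||^2 / 2 = {rhs:8.4f}")
```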
Let A ∈ Rm×n , b ∈ Rm , c ∈ Rn such that rk A = m ≤ n. Moreover, let
z = (x, y, s) ∈ F ◦ , that is
Ax = b , AT y + s = c , x > 0 , s > 0.
We consider one step of Algorithm 1.1, the primal-dual IPM, with centering
parameter σ ∈ (0, 1). That is, we set µ := µ(z) = (1/n) sT x, and we define ∆z = (∆x, ∆y, ∆s) by
(1.12)  \begin{pmatrix} 0 & A^T & I \\ A & 0 & 0 \\ S & 0 & X \end{pmatrix} \begin{pmatrix} ∆x \\ ∆y \\ ∆s \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ σµe − XSe \end{pmatrix},
and put
z̃ = (x̃, ỹ, s̃) = (x, y, s) + (∆x, ∆y, ∆s).
Lemma 1.8
1. ∆sT ∆x = 0
2. µ(z̃) = σµ(z)
3. z̃ ∈ F ◦ if x̃ > 0, s̃ > 0.
Proof. (1) By definition of ∆z = (∆x, ∆y, ∆s) we have
(1.13)  AT ∆y + ∆s = 0 , A∆x = 0 , S∆x + X∆s = σµe − XSe.
Therefore,
∆sT ∆x = −∆y T A∆x = 0.
(2) The third equation in (1.13) implies sT ∆x + xT ∆s = nσµ − xT s. Therefore,
s̃T x̃ = (sT + ∆sT )(x + ∆x) = sT x + ∆sT x + sT ∆x + ∆sT ∆x = nσµ.
This means µ(z̃) = (1/n) s̃T x̃ = σµ.
(3) We already verified at the end of Section 1.2 (by a straightforward calculation) that z̃ satisfies the equality constraints in (1.5).
□
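For a concrete check of assertions (1) and (2) (a sketch on the same toy data as in the Algorithm 1.1 sketch above; the numbers are our own choices), one Newton step of (1.12) indeed gives ∆sT ∆x = 0 and µ(z̃) = σµ(z):

```python
import numpy as np

# One step of (1.12) on the toy data A = [1 1], b = 1, c = (1, 2), starting from the
# strictly feasible point used earlier.
A = np.array([[1.0, 1.0]])
x, y, s = np.array([0.5, 0.5]), np.array([-2.0]), np.array([3.0, 4.0])
n, sigma = 2, 0.9
mu = s @ x / n

DF = np.block([
    [np.zeros((2, 2)), A.T,              np.eye(2)        ],
    [A,                np.zeros((1, 1)), np.zeros((1, 2)) ],
    [np.diag(s),       np.zeros((2, 1)), np.diag(x)       ],
])
d = np.linalg.solve(DF, np.concatenate([np.zeros(3), sigma * mu * np.ones(2) - x * s]))
dx, dy, ds = d[:2], d[2:3], d[3:]

print("ds^T dx        =", ds @ dx)                          # Lemma 1.8(1): equals 0
print("mu(z~) / mu(z) =", (s + ds) @ (x + dx) / (n * mu))   # Lemma 1.8(2): equals sigma
```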
A remaining issue is how to achieve that x̃ > 0, s̃ > 0 by a suitable choice
of the centering parameter σ.
Definition 1.9 Let β > 0. The central neighborhood N (β) is defined as the set
of strictly feasible points z = (x, y, s) ∈ F ◦ such that
‖XSe − µ(z)e‖ ≤ βµ(z).
The central neighborhood is a neighborhood of the central path C in F ◦ that
becomes smaller when µ(z) approaches zero (cf. Figure 1.1).
In the following we set β = 1/4 and write N := N (1/4).
Lemma 1.10 Let z = (x, y, s) ∈ N and ∆z = (∆x, ∆y, ∆s) be defined by (1.12)
with respect to σ = 1 − ξ/√n with 0 ≤ ξ ≤ 1/4. Then z̃ = z + ∆z satisfies z̃ ∈ N .
Proof. By (1.12) we have
(1.14)  XSe + X∆s + S∆x = σµe,
which implies
X̃ S̃e = XSe + X∆s + S∆x + ∆X∆Se = ∆X∆Se + σµe.
Moreover, by Lemma 1.8(2), µ(z̃) = σµ. We therefore need to show that
(1.15)  ‖X̃ S̃e − σµe‖ = ‖∆X∆Se‖ ≤ βµ(z̃) = βσµ.
To do so note first that z ∈ N implies |xi si − µ| ≤ βµ for all i, hence
(1.16)  (1 − β)µ ≤ xi si ≤ (1 + β)µ.
By (1.14) we have X∆s + S∆x = σµe − XSe. Setting D := X 1/2 S −1/2 we get
(1.17)  D∆s + D−1 ∆x = (XS)−1/2 (σµe − XSe).
Because (D−1 ∆x)T (D∆s) = ∆sT ∆x = 0 (cf. Lemma 1.8(1)) we can apply
Lemma 1.7 with u = D−1 ∆x and v = D∆s to obtain
‖∆X∆Se‖ = ‖(D−1 ∆X)(D∆S)e‖
  ≤ 2−1 ‖D−1 ∆x + D∆s‖2
  ≤ 2−1 ‖(XS)−1/2 (σµe − XSe)‖2          by (1.17)
  ≤ (2µ(1 − β))−1 ‖σµe − XSe‖2           by (1.16)
  ≤ (2µ(1 − β))−1 (‖µe − XSe‖ + ‖µ(σ − 1)e‖)2
  ≤ (2µ(1 − β))−1 (βµ + (1 − σ)µ√n)2     by Def. 1.9
  ≤ (2(1 − β))−1 (β + ξ)2 µ              by def. of σ.
A small calculation shows that
(2(1 − β))−1 (β + ξ)2 ≤ β(1 − ξ) ≤ β(1 − ξ/√n)
for β = 1/4 and 0 ≤ ξ ≤ 1/4. This proves (1.15).
We still need to show that z̃ ∈ F ◦ . For this, by Lemma 1.8(3), it is sufficient
to prove that x̃, s̃ > 0. Inequality (1.15) implies x̃i s̃i ≥ (1 − β)σµ > 0. Suppose
we had x̃i ≤ 0 or s̃i ≤ 0 for some i. Then x̃i < 0 and s̃i < 0 which implies
|∆xi | > xi and |∆si | > si . But then,
βµ > βσµ ≥ ‖∆X∆Se‖ ≥ |∆xi ∆si | > xi si ≥ (1 − β)µ,
where the second inequality uses (1.15) and the last uses (1.16). Hence β ≥ 1/2, a contradiction. □
Theorem 1.11 On an input (A, b, c) ∈ Rm×n × Rm × Rn with rk A = m ≤ n and
the choice of the centering parameter σ = 1 − ξ/√n with ξ ∈ (0, 1/4], Algorithm 1.1
produces on a strictly feasible starting point z0 in the central neighborhood N =
N (1/4) a sequence of iterates zk ∈ N such that µ(zk ) = σ^k µ(z0 ) for k ∈ N. We
therefore have, for all ε > 0,
µ(zk ) ≤ ε  for  k ≥ (√n/ξ) ln(µ(z0 )/ε).
Proof. It suffices to show the last assertion. This follows from the implication
k ≥ a−1 ln B ⇒ (1 − a)^k ≤ 1/B
for 0 < a < 1, B > 0. (Use ln(1 − a) ≤ −a to show this.) □
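As a worked instance of the bound (the numbers below are chosen purely for illustration):

```python
import numpy as np

n, xi = 10_000, 0.25      # problem size and centering offset (illustrative values)
mu0, eps = 1.0, 1e-8      # initial duality measure and target accuracy
k = int(np.ceil(np.sqrt(n) / xi * np.log(mu0 / eps)))
print("iterations guaranteed by Theorem 1.11:", k)   # about 7369
```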
Bibliography
[1] Dimitris Bertsimas and John Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1997.
[2] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, 2004.
[3] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer-Verlag, New York, 1999.
[4] Stephen J. Wright. Primal-Dual Interior-Point Methods. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997.