Chapter 4. Duality in convex optimization

Version: 10-10-2015
4.1 Lagrangean duality
Recall the program (J = {1, . . . , m})

(CO)   min f(x)  s.t.  x ∈ F = {x ∈ C and g_j(x) ≤ 0, j ∈ J}

with convex f, g_j, a convex set C, and minimal value v(CO).
Def.4.1 We introduce the Lagrangean function of (CO):

L(x, y) := f(x) + ∑_{j=1}^m y_j g_j(x),   x ∈ R^n, y ∈ R^m, y ≥ 0.
Note that the Lagrangean function L(x, y) is convex in x
(for fixed y) and (affine) linear in y (for fixed x).
Lagrangean dual problem
Defining

ψ(y) := inf_{x∈C} { f(x) + ∑_{j=1}^m y_j g_j(x) }   for y ∈ R^m,

the Lagrangean dual problem of (CO) is given by:

(D)   sup_{y≥0} ψ(y)
The (sup-)value of (D) is denoted by v(D).
Note: By definition, ψ(y) is concave if −ψ(y) is convex, so that max_y ψ(y) ≡ min_y −ψ(y) is a convex problem.
L.4.2 [KRT,L.3.2] (without any convexity assumption on (CO))
The function ψ(y) is concave, and thus (D), the Lagrangean dual of (CO), is a convex program.
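A short argument (a standard sketch, using only the definition of ψ): for each fixed x, the map y ↦ L(x, y) is affine in y, and ψ is the pointwise infimum of this family. Hence for u, v ∈ R^m and λ ∈ [0, 1]:

ψ(λu + (1−λ)v) = inf_{x∈C} [ λ L(x, u) + (1−λ) L(x, v) ]
               ≥ λ inf_{x∈C} L(x, u) + (1−λ) inf_{x∈C} L(x, v) = λ ψ(u) + (1−λ) ψ(v).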
Note further that for any fixed x:

sup_{y≥0} { f(x) + ∑_{j=1}^m y_j g_j(x) } = { f(x)   if g_j(x) ≤ 0, ∀j
                                            { ∞      otherwise
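The case distinction is immediate: if g_j(x) > 0 for some j, letting y_j → ∞ drives the supremum to ∞; if g_j(x) ≤ 0 for all j, every term y_j g_j(x) is ≤ 0, so the supremum equals f(x) and is attained at y = 0.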
So (CO) can be written equivalently as:

(CO)   inf_{x∈C} sup_{y≥0} L(x, y)

and by definition the dual (D) reads:

(D)   sup_{y≥0} inf_{x∈C} L(x, y)
Th.4.3 [KRT;Th.3.4,Cor.3.5] (weak duality)
(without convexity assumptions)
For any y ≥ 0 and x ∈ F we have: ψ(y) ≤ f(x). So

sup_{y≥0} ψ(y) ≤ inf {f(x) | x ∈ F},  i.e.,  v(D) ≤ v(CO).
In particular: if equality holds for some ȳ ≥ 0, i.e., if

ψ(ȳ) = inf_{x∈C} { f(x) | g_j(x) ≤ 0, j ∈ J },

then ȳ is a solution of (D).
If we have ψ(ȳ) = f(x̄) for ȳ ≥ 0, x̄ ∈ F, then ȳ, x̄ are optimal solutions of (D), (CO), respectively.
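The proof is a one-line chain (a sketch using only the definitions above): for x ∈ F and y ≥ 0,

ψ(y) = inf_{x'∈C} L(x', y) ≤ L(x, y) = f(x) + ∑_{j=1}^m y_j g_j(x) ≤ f(x),

since y_j ≥ 0 and g_j(x) ≤ 0 for all j.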
Def.4.4 The value v(CO) − v(D) ≥ 0 is called the duality gap.
(This gap may be positive; see the examples later on.)
We speak of strong duality if v(D) = v(CO) holds.
Alternative form of weak duality:

sup_{y≥0} inf_{x∈C} L(x, y) ≤ inf_{x∈C} sup_{y≥0} L(x, y)   (?)
Ex.4.1 Let X ⊂ R^n, Y ⊂ R^m and let g : X × Y → R be any function g(x, y) defined for x ∈ X, y ∈ Y. Show that the following inequality is always true:

sup_{y∈Y} inf_{x∈X} g(x, y) ≤ inf_{x∈X} sup_{y∈Y} g(x, y)   (??)
Remark:
Equality in (?) and (??) holds only under extra conditions. In (?) we “need” Slater’s condition w.r.t. the feasible set F.
In (??) we “need” convexity conditions on g and X, Y, as well as Slater’s condition.
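A hint for Ex.4.1 (a standard one-line argument, added here for convenience): for any x̂ ∈ X, ŷ ∈ Y,

inf_{x∈X} g(x, ŷ) ≤ g(x̂, ŷ) ≤ sup_{y∈Y} g(x̂, y).

The left-hand side does not depend on x̂ and the right-hand side does not depend on ŷ; taking the sup over ŷ on the left and the inf over x̂ on the right yields (??).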
Th.4.5 (strong duality)
“stronger than” [KRT,Th.3.6]
Suppose (CO) satisfies the Slater condition (so v(CO) < ∞). Then

sup_{y≥0} ψ(y) = inf_{x∈C} { f(x) | g_j(x) ≤ 0, j ∈ J }.

If moreover v(CO) is bounded from below (i.e., v(CO) is finite), then the dual optimal value is attained.
Proof. (case −∞ < v(CO) < ∞) Let a := v(CO) be the optimal value of (CO). From the Farkas Lemma (Th.3.12) it follows that there exists a vector ȳ = (ȳ_1, . . . , ȳ_m) ≥ 0 such that

L(x, ȳ) = f(x) + ∑_{j=1}^m ȳ_j g_j(x) ≥ a  ∀x ∈ C,  i.e.,  inf_{x∈C} L(x, ȳ) ≥ a.

By the definition of ψ this implies ψ(ȳ) ≥ a. Using the weak duality theorem, it follows that a ≤ ψ(ȳ) ≤ v(D) ≤ v(CO) = a, and thus ψ(ȳ) = v(D) = v(CO). So ȳ is a maximizer of (D).
Note that the existence of a minimizer of (the primal) (CO)
is not assured (and not needed) in Th. 4.5.
Ex.4.2 Consider

(CO)   min { f(x) := x_1² + x_2² | x_1 + x_2 ≥ 1 },   x = (x_1, x_2) ∈ R².

Compute the solutions of the dual and the primal program and show that v(CO) = v(D) = 1/2 holds.
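A sketch of the computation (so the claimed value can be checked): with the constraint written as 1 − x_1 − x_2 ≤ 0,

ψ(y) = inf_{x∈R²} { x_1² + x_2² + y(1 − x_1 − x_2) }.

Setting the gradient (in x) to zero gives x_1 = x_2 = y/2, so ψ(y) = y − y²/2. Maximizing over y ≥ 0 yields ȳ = 1 with ψ(1) = 1/2, and the primal minimizer is x̄ = (1/2, 1/2) with f(x̄) = 1/2.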
Example. (No Slater but without duality gap)
(CO)   min x  s.t.  x² ≤ 0,  x ∈ R

This (CO) problem is not Slater regular and has solution x̄ = 0 with v(CO) = 0. On the other hand, we have

ψ(y) = inf_{x∈R} (x + y x²) = { −1/(4y)   for y > 0
                              { −∞        for y = 0.

We see that ψ(y) < 0 for all y ≥ 0. One has sup {ψ(y) | y ≥ 0} = 0.
So the Lagrange-dual (D) has the same optimal value as the
primal problem. In spite of the lack of Slater regularity there is
no duality gap; however a solution of (D) doesn’t exist.
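The value −1/(4y) is elementary calculus: for y > 0 the minimizer of x + yx² satisfies 1 + 2yx = 0, i.e., x = −1/(2y), giving ψ(y) = −1/(2y) + 1/(4y) = −1/(4y); for y = 0 the infimum of x over R is −∞. Since −1/(4y) ↑ 0 as y → ∞, the supremum 0 is not attained.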
4.2 Saddle point duality and KKT duality
Recall the convex program

(CO)   min f(x)  s.t.  x ∈ F = {x ∈ C and g_j(x) ≤ 0, j ∈ J}

with Lagrangean function: L(x, y) = f(x) + ∑_{j∈J} y_j g_j(x).

Def.4.6 [KRT,Def. 2.26]
A vector pair (x̄, ȳ), with x̄ ∈ C and ȳ ≥ 0, is called a saddle point of the Lagrange function L(x, y) if

L(x̄, y) ≤ L(x̄, ȳ) ≤ L(x, ȳ),   ∀x ∈ C, ∀y ≥ 0.
L.4.7
[KRT,L.2.28]
A saddle point (x̄, ȳ) of L(x, y) satisfies the strong duality relation

sup_{y≥0} inf_{x∈C} L(x, y) = L(x̄, ȳ) = inf_{x∈C} sup_{y≥0} L(x, y).

Moreover, if (x̄, ȳ) is a saddle point of L, then x̄ is a minimizer of (CO) and ȳ is a maximizer of (D).
Proof: see Extraproof.pdf
Th.4.8
[KRT,Th.2.30]
Let the problem (CO) satisfy the Slater condition. Then the vector x̄ is a minimizer of (CO) if and only if there is a vector ȳ such that (x̄, ȳ) is a saddle point of the Lagrange function L.
Proof: see Extraproof.pdf
Back to the case: f, g_j ∈ C¹, C = R^n. We are now able to prove the converse of Th.3.7.

Ex.4.3 (without Slater condition) Show that the pair (x̄, ȳ) (x̄ ∈ F, ȳ ≥ 0) satisfies the KKT condition for (CO) if and only if (x̄, ȳ) is a saddle point of L.
PROOF “⇐”: Let (x̄, ȳ) be a saddle point, i.e., for all x ∈ R^n, y ≥ 0 we have

f(x̄) + ∑_j y_j g_j(x̄) ≤ f(x̄) + ∑_j ȳ_j g_j(x̄) ≤ f(x) + ∑_j ȳ_j g_j(x)   (?)

In the left “≤”: Letting y_j → ∞ yields g_j(x̄) ≤ 0. So x̄ ∈ F.
Choosing y = 0 gives f(x̄) ≤ f(x̄) + ∑_j ȳ_j g_j(x̄) ≤ f(x̄) (use ȳ_j ≥ 0, g_j(x̄) ≤ 0). So we have ∑_j ȳ_j g_j(x̄) = 0 and thus ȳ_j g_j(x̄) = 0 ∀j (complementarity conditions).

In the right-hand “≤”: It shows that x̄ is a solution of the unconstrained minimization problem min_{x∈R^n} L(x, ȳ). So the KKT condition must hold for x̄:

0 = ∇_x L(x̄, ȳ) = ∇f(x̄) + ∑_j ȳ_j ∇g_j(x̄).
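For the “⇒” direction, a standard sketch (added here; the details are the exercise): suppose (x̄, ȳ) satisfies the KKT conditions, i.e., x̄ ∈ F, ȳ ≥ 0, ȳ_j g_j(x̄) = 0 ∀j, and ∇_x L(x̄, ȳ) = 0. Since f, g_j are convex and ȳ ≥ 0, the function L(·, ȳ) is convex, so ∇_x L(x̄, ȳ) = 0 implies L(x̄, ȳ) ≤ L(x, ȳ) for all x ∈ R^n (the right “≤”). For the left “≤”: for any y ≥ 0,

L(x̄, y) = f(x̄) + ∑_j y_j g_j(x̄) ≤ f(x̄) = f(x̄) + ∑_j ȳ_j g_j(x̄) = L(x̄, ȳ),

using g_j(x̄) ≤ 0 and complementarity.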
Cor.4.9
[KRT,Cor.2.33]
Let C = R^n and let f, g_j ∈ C¹ be convex functions. Suppose (CO) satisfies the Slater condition. Then x̄ is a minimizer of (CO) if and only if for some ȳ ≥ 0 the pair (x̄, ȳ) satisfies the KKT condition.
Proof. “⇐”: By Theorem 3.7.
“⇒”: x̄ minimizer ⇒ (Th.4.8) there exists a saddle point (x̄, ȳ), ȳ ≥ 0 ⇒ (Ex.4.3) (x̄, ȳ) satisfies KKT.
4.3 The Wolfe-dual
We reconsider (CO) with C = R^n and convex f, g_j ∈ C¹. Its Lagrangean dual:

(D)   sup_{y≥0} ψ(y),   where ψ(y) := inf_{x∈R^n} L(x, y)

and L(x, y) = f(x) + ∑_{j=1}^m y_j g_j(x).

Recall: For fixed y, x̄ is a minimizer of L(·, y) iff ∇_x L(x̄, y) = 0 (by convexity). In that case inf_{x∈R^n} L(x, y) = L(x̄, y). So (provided the infimum defining ψ(y) is attained) the Lagrangean dual takes the form:

(WD)   sup_{x,y}  f(x) + ∑_{j=1}^m y_j g_j(x)
       s.t.  y ≥ 0  and  ∇_x L(x, y) ≡ ∇f(x) + ∑_{j=1}^m y_j ∇g_j(x) = 0
Def.4.10 (WD) is called the Wolfe-dual of (CO). (Note that, in general, (WD) is not a convex problem.)
Th.4.11
[KRT,Th.3.9]
(weak duality)
If x̂ is feasible for (CO) and (x, y) is a feasible point of (WD), then

L(x, y) ≤ f(x̂).

If L(x, y) = f(x̂) holds, then (x, y) is a solution of (WD) and x̂ is a minimizer of (CO).
Proof: see Extraproof.pdf.
From Cor.4.9 (or Th.4.8) we obtain:

Th.4.12 [KRT,Th.3.10] (partial strong duality)
Let (CO) satisfy the Slater condition. If the feasible point x̄ ∈ F is a minimizer of (CO), then there exists ȳ ≥ 0 such that (x̄, ȳ) is an optimal solution of (WD) and L(x̄, ȳ) = f(x̄).
Proof: see Extraproof.pdf.
Rem. Only under additional assumptions (e.g., second-order conditions) does the converse of Th.4.11 hold.
Rem. The proof of Th.4.11 also shows: If (x̄, ȳ) is a KKT point (or, equivalently, a saddle point of L(x, y)), then (x̄, ȳ) is a maximizer of (WD).
Ex.4.4 Consider the convex optimization problem

(CO)   min_{x∈R²} x_1 + e^{x_2}   s.t.  3x_1 − 2e^{x_2} ≥ 10,  x_2 ≥ 0.

Look at the Wolfe dual and apply the weak duality result to show that x̄ = (4, 0) is a solution of (CO).
(Remark: A substitution z = e^{x_2} ≥ 1 would lead to a linear problem.)
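A quick numerical sanity check for Ex.4.4 (a sketch, not part of the course material; it assumes numpy/scipy are installed, and the starting point x0 is an arbitrary feasible guess):

    # Minimize x1 + exp(x2) s.t. 3*x1 - 2*exp(x2) >= 10, x2 >= 0.
    # SLSQP treats 'ineq' constraints as fun(x) >= 0.
    import numpy as np
    from scipy.optimize import minimize

    f = lambda x: x[0] + np.exp(x[1])
    cons = [{'type': 'ineq', 'fun': lambda x: 3*x[0] - 2*np.exp(x[1]) - 10},
            {'type': 'ineq', 'fun': lambda x: x[1]}]
    res = minimize(f, x0=np.array([5.0, 0.5]), constraints=cons, method='SLSQP')
    print(res.x, res.fun)   # approx. (4, 0) with value 5 = f(4, 0)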
General remarks on duality:
The Wolfe-duality concept and duality via saddle points are restricted to the case where a solution x̄ of the primal program (CO) exists.
The Lagrange duality approach is more general: the existence of a minimizer of (CO) is not required.
Wolfe’s dual is often useful for finding a suitable form of the dual of a convex program.
[KRT,Sec.3.4] gives some examples with a positive duality gap. The following example in particular shows that Wolfe and Lagrangean duality are not (always) the same:

Ex.4.5 Consider

(CO)   min_{x∈R²} e^{−x_2}   s.t.  √(x_1² + x_2²) ≤ x_1

Show: The feasible set is F = {(x_1, x_2) | x_2 = 0, x_1 ≥ 0} (all feasible points are minimizers) and

v(WD) = −∞ < v(D) = 0 < v(CO) = 1.
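A hint for the feasible set (a short sketch): √(x_1² + x_2²) ≤ x_1 forces x_1 ≥ 0, and squaring both sides gives x_2² ≤ 0, i.e., x_2 = 0. On F the objective is constantly e^{−0} = 1, so every feasible point is a minimizer and v(CO) = 1.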
4.4 Examples of dual problems
The Wolfe dual can be used to find the Lagrangean dual.
Duality for Linear Optimization (LP)
Let A ∈ R^{m×n}, b ∈ R^m and c, x ∈ R^n. Consider the LP:

(P)   max_y { b^T y | A^T y ≤ c }.

The corresponding (Wolfe-) dual is

(D)   min_x { c^T x | Ax = b, x ≥ 0 }.
Ex.4.6 Show that (D) is also the Lagrangean dual of (P).
Rem. In LP, since the Slater condition is equivalent to the feasibility of (P), strong (Lagrangean) duality holds whenever (P) is feasible.
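A minimal numerical illustration of this remark (a sketch, not from the slides; it assumes numpy/scipy and uses a small instance A ∈ R^{2×3} chosen only for illustration):

    # Solve (P): max b^T y s.t. A^T y <= c, and (D): min c^T x s.t. Ax = b, x >= 0,
    # and compare the optimal values (LP strong duality).
    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])
    b = np.array([1.0, 1.0])
    c = np.array([1.0, 2.0, 1.0])

    # (P): linprog minimizes, so maximize b^T y by minimizing -b^T y;
    # the y-variables are free (linprog's default bounds would be >= 0).
    resP = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * 2)

    # (D): min c^T x s.t. Ax = b, x >= 0
    resD = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3)

    print(-resP.fun, resD.fun)   # both 2.0: v(P) = v(D)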
Duality for convex quadratic programming
Consider, with positive semidefinite Q ∈ R^{n×n}, the quadratic program

(P)   min_x { c^T x + ½ x^T Q x | Ax ≥ b, x ≥ 0 },

with linear constraints and convex objective function. The Wolfe-dual is:

(D)   max_{y≥0, x} { b^T y − ½ x^T Q x | A^T y − Qx ≤ c }.

Proof: Writing (P) as

min_{x∈R^n} { c^T x + ½ x^T Q x | b − Ax ≤ 0, −x ≤ 0 },

the Wolfe-dual becomes:

max_{y≥0, s≥0, x} { c^T x + ½ x^T Q x + y^T (b − Ax) − s^T x | c + Qx − A^T y − s = 0 }.
Substituting c = −Qx + A^T y + s in the objective, we get

c^T x + ½ x^T Q x + y^T (b − Ax) − s^T x
  = (−Qx + A^T y + s)^T x + ½ x^T Q x + y^T (b − Ax) − s^T x
  = (−Qx)^T x + ½ x^T Q x + y^T b
  = b^T y − ½ x^T Q x.

Hence the problem simplifies to

max_{y≥0, s≥0, x} { b^T y − ½ x^T Q x | c + Qx − A^T y − s = 0 }.

By eliminating s this becomes

max_{y≥0, x} { b^T y − ½ x^T Q x | A^T y − Qx ≤ c }.
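As a sanity check one can solve a tiny instance of (P) and of its Wolfe-dual (D) numerically and compare values (a sketch; it assumes the cvxpy package, which is not part of the course). With Q = 2I, c = 0, A = (1 1), b = 1, problem (P) is exactly Ex.4.2, so both values should be 1/2:

    # Primal (P): min c^T x + 1/2 x^T Q x  s.t.  Ax >= b, x >= 0
    # Wolfe dual (D): max b^T y - 1/2 x^T Q x  s.t.  A^T y - Qx <= c, y >= 0
    import numpy as np
    import cvxpy as cp

    Q = 2.0 * np.eye(2)          # positive semidefinite
    c = np.zeros(2)
    A = np.array([[1.0, 1.0]])   # single constraint x1 + x2 >= 1
    b = np.array([1.0])

    x = cp.Variable(2)
    vP = cp.Problem(cp.Minimize(c @ x + 0.5 * cp.quad_form(x, Q)),
                    [A @ x >= b, x >= 0]).solve()

    y, xd = cp.Variable(1), cp.Variable(2)
    vD = cp.Problem(cp.Maximize(b @ y - 0.5 * cp.quad_form(xd, Q)),
                    [A.T @ y - Q @ xd <= c, y >= 0]).solve()

    print(vP, vD)   # both approx. 0.5, matching Ex.4.2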
A last remark on convex programs. The constraints in (CO) should be such that they define a convex feasible set. So for an inequality constraint g(x) ≤ 0 we assume that g is convex.

What about equality constraints?
Affine linear conditions a^T x + b = 0 define an affine linear (thus convex) set.
The solution sets of nonlinear (non-affine linear) equality constraints h(x) = 0 are (in general) not convex; e.g., h(x) = x_1² + x_2² − 1 = 0 defines the unit circle, which is not a convex set.
So, in convex programs the equality constraints are (always) given by affine linear functions.