Alternative ways to formulate optimization problems
Dimitar Dimitrov
Örebro University
May, 2011
1 / 32
Topics addressed in this material
Convex optimization problems
Change of variable
Eliminating linear equality constraints
Introducing linear equality constraints
Introducing linear inequality constraints (epigraph form)
Slack variables
Optimality conditions for convex problems (with differentiable objective)
Sufficiency of the first-order conditions under convexity
Slater’s constraint qualification
We focus only on convex optimization problems; however, most concepts can be
extended to general problems. Instead of describing each case in a general setting, we
use linear and quadratic programming problems to illustrate all concepts. We follow
the exposition in [1].
Convex optimization problems
Convex optimization problem (standard form)

    minimize_x   f0(x)
    subject to   fi(x) ≤ 0,   i = 1, . . . , mi,
                 Cx = d,      C ∈ R^(me×n),

where fi(x), i = 0, . . . , mi are convex functions. Note that the equality constraints of
a convex optimization problem can only be affine functions of x.
The feasible set of a convex optimization problem is convex

This is indeed the case because
- since fi, i = 0, . . . , mi are convex functions, their domains dom(fi) are convex
  sets, hence the domain of the problem is convex, since it is given by their
  intersection

      D = ⋂_{i=0}^{mi} dom(fi)

- the intersection of D with the convex sets {x : fi(x) ≤ 0}, i = 1, . . . , mi, and the
  me hyperplanes {x : Cx = d} is a convex set
Convex optimization problem: minimize convex function over a convex set
Some notes are due:
- If the inequality constraints of a convex optimization problem are given in the
  form

      fi(x) ≥ 0,   i = 1, . . . , mi,

  then the functions fi(x) are concave.
- In order to simplify notation, sometimes we write convex optimization problems
  not in standard form. For example, when there are only simple bounds (also
  called box constraints) on the decision variables, we usually use

      minimize_x   f0(x)
      subject to   li ≤ xi ≤ ui,   i = 1, . . . , n,

  where li and ui stand for the ith lower and upper bounds, respectively. The
  above problem can be expressed in standard form as follows

      minimize_x   f0(x)
      subject to   xi − ui ≤ 0,   i = 1, . . . , n,
                   li − xi ≤ 0,   i = 1, . . . , n.
When the optimization problem involves the maximization of a concave function
over a convex set, we still refer to it as a convex optimization problem.
Algebraic specification of the feasible set
As we already discussed (in lecture 2, kkt.pdf, pp. 27), the geometry of the feasible
set can be defined with alternative algebraic representations. For example the
following two sets have the same geometry
    V1 = {(x1, x2) : x1 x2² ≤ 0},
    V2 = {(x1, x2) : x1 ≤ 0}.

As another example, consider

    W1 = {(x1, x2) : (x1 + x2)² = 0},
    W2 = {(x1, x2) : x1 + x2 = 0}.
Let f0 be a convex function. Then, according to [1], pp. 137, the following distinction
should be made (even though, in both problems, we minimize a convex function over
a convex set)
Non-convex optimization problem

    minimize_{x1,x2}   f0(x1, x2)
    subject to         x1 x2² ≤ 0,
                       (x1 + x2)² = 0.

Convex optimization problem

    minimize_{x1,x2}   f0(x1, x2)
    subject to         x1 ≤ 0,
                       x1 + x2 = 0.
The (“pedantic”) reasoning is that f1(x1, x2) = x1 x2² is not a convex function, and
h1(x1, x2) = (x1 + x2)² is not an affine function (as required for an equality
constraint in the standard form of a convex optimization problem). We will revisit
this issue when we discuss Slater’s constraint qualification.
Change of variable
We illustrate the idea on an example. Consider the following optimization problem
Quadratic program (QP)

    minimize_{x∈Rⁿ}   f(x) = (1/2) xᵀHx + xᵀg
    subject to        Ax ≤ b,   A ∈ R^(mi×n),
                      Cx = d,   C ∈ R^(me×n),

where H ∈ S^n_++ is assumed.
Recall that every positive definite matrix can be expressed as the product of a lower
and an upper triangular matrix, i.e., H = LᵀL, where L ∈ R^(n×n) is upper triangular
(check Matlab’s function chol). Hence, we can express the objective function as
    f(x) = (1/2) xᵀLᵀLx + xᵀg.

Let us introduce a new variable v, such that

    v := Lx,   and hence   x = L⁻¹v.
Change of variable (continued)
Substituting x = L⁻¹v in the QP leads to

QP with variable v

    minimize_{v∈Rⁿ}   f̃(v) = (1/2) vᵀv + vᵀg̃
    subject to        Ãv ≤ b,
                      C̃v = d,

where

    g̃ = L⁻ᵀg,   Ã = AL⁻¹,   C̃ = CL⁻¹.
Once the solution v ⋆ is found, the solution x⋆ of the original problem can be
recovered as x⋆ = L−1 v ⋆ . We can say that in “some sense” the two problems are
“equivalent”.
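This “equivalence” can be checked numerically. The sketch below (assuming NumPy is available; the problem data is made up for illustration) verifies that the unconstrained minimizers of the two formulations coincide via x⋆ = L⁻¹v⋆.

```python
# Sketch of the change of variable v = L x for a small QP, assuming
# H is positive definite; matrix values below are made up for illustration.
import numpy as np

H = np.array([[4.0, 1.0], [1.0, 3.0]])   # positive definite Hessian
g = np.array([1.0, -2.0])

# numpy returns a lower-triangular factor L0 with H = L0 @ L0.T,
# so L = L0.T plays the role of the upper-triangular factor (H = L.T @ L).
L = np.linalg.cholesky(H).T
Linv = np.linalg.inv(L)

g_tilde = Linv.T @ g                     # g~ = L^{-T} g

# Unconstrained minimizers must match: x* = -H^{-1} g, v* = -g~, x* = L^{-1} v*
x_star = -np.linalg.solve(H, g)
v_star = -g_tilde
x_recovered = Linv @ v_star
print(np.allclose(x_star, x_recovered))  # True
```

The same recovery x⋆ = L⁻¹v⋆ applies when constraints are present, since the feasible sets are mapped onto each other by v = Lx.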
Observation

From the above example it becomes clear that any QP with H ∈ S^n_++ can be
expressed as an “equivalent” QP whose Hessian is the identity matrix (such a problem
is commonly referred to as a least-distance problem).

We will revisit the above observation when we discuss Newton’s method and in the
context of (active-set) methods for solving QPs.
Example (change of variable)

[Figure: two plots at different scales. The left one depicts the level sets of the
objective function and the feasible set P = {x : Ax ≤ b} of a QP with H = LᵀL, where

    L = [2 0; 0 4],

together with the solution x⋆ = (0.45, −1.45) and the normalized negative gradient
−∇f(x⋆)/‖∇f(x⋆)‖₂. The right one depicts the problem after the change of variable
v = Lx, with feasible set P̃ = {v : AL⁻¹v ≤ b}, solution v⋆ = (0.9, −5.8), and the
corresponding normalized negative gradient. Note that x⋆ = L⁻¹v⋆.]
Change of variable (another example)
Quadratic program with variable x

    minimize_{x∈Rⁿ}   (1/2) xᵀHx + xᵀg
    subject to        Ax ≤ b,
                      Cx = d,

where H ∈ S^n_++ is assumed.
Quadratic program with variable z

Performing a change of variable z = x + H⁻¹g (i.e., shifting x by the negative of its
unconstrained minimizer −H⁻¹g; hence, x = z − H⁻¹g) leads to

    minimize_{z∈Rⁿ}   (1/2) zᵀHz
    subject to        Az ≤ b + AH⁻¹g,
                      Cz = d + CH⁻¹g.

We dropped the constant −(1/2) gᵀH⁻¹g from the objective function, as it does not
depend on z.
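As a quick sanity check of this shift, the sketch below (made-up H and g, assuming NumPy) verifies that the two objectives differ only by the dropped constant −(1/2) gᵀH⁻¹g.

```python
# Numerical check of the shift z = x + H^{-1} g: the two objectives differ
# only by the constant -1/2 g^T H^{-1} g; the data below is made up.
import numpy as np

H = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, -2.0])
Hinv_g = np.linalg.solve(H, g)

def f(x):
    return 0.5 * x @ H @ x + x @ g

def f_shifted(z):
    return 0.5 * z @ H @ z

x = np.array([0.7, -1.3])        # arbitrary test point
z = x + Hinv_g
const = -0.5 * g @ Hinv_g
print(np.isclose(f(x), f_shifted(z) + const))  # True
```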
Eliminating linear equality constraints
We illustrate the idea on an example. Consider the following QP

    minimize_{x∈Rⁿ}   f(x) = (1/2) xᵀHx + xᵀg
    subject to        Ax ≤ b,   A ∈ R^(mi×n),
                      Cx = d,   C ∈ R^(me×n),

where rank(C) = me < n is assumed.
We can eliminate the equality constraints by observing that x is a solution of the
system of linear equations Cx = d if and only if

    x ∈ {Zxz + x0 : xz ∈ R^z},

where the columns of Z ∈ R^(n×z) span the null space of C (with z = n − me) and x0
is any vector that satisfies Cx0 = d (i.e., some particular solution of Cx = d). This
can be verified by noting that

    C(Zxz + x0) = (CZ)xz + Cx0 = 0 + d = d.

Essentially, we want to reformulate the original QP into a QP with variable xz (which
has a smaller dimension).
Eliminating linear equality constraints (continued)
Substituting x = Zxz + x0 in the QP we obtain

    minimize_{xz∈R^z}   f̃(xz) = (1/2)(Zxz + x0)ᵀH(Zxz + x0) + (Zxz + x0)ᵀg
    subject to          A(Zxz + x0) ≤ b,

which can be simplified to

    minimize_{xz∈R^z}   f̃(xz) = (1/2) xzᵀH̃xz + xzᵀg̃
    subject to          Ãxz ≤ b̃,

where H̃ = ZᵀHZ, g̃ = ZᵀHx0 + Zᵀg, Ã = AZ, and b̃ = b − Ax0.
Above we used that x0 is a constant. H̃ is called the projected Hessian matrix.
Even though the QP with the equality constraints eliminated has a smaller dimension,
it is not necessarily easier to solve (compared to the original problem). For example,
when the Hessian matrix H has some sparsity pattern, it is very likely that after
forming H̃, this sparsity pattern would be lost, which could ruin the efficiency of the
algorithm that solves it.
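The elimination procedure can be sketched as follows for an equality-constrained QP (the inequality constraints are omitted for brevity); the use of SciPy's `null_space` and the problem data are illustrative assumptions.

```python
# Sketch of eliminating Cx = d via a null-space basis Z, reducing the QP
# to an unconstrained problem in x_z; the data below is made up.
import numpy as np
from scipy.linalg import null_space

H = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 2.0]])
g = np.array([1.0, -1.0, 0.5])
C = np.array([[1.0, 1.0, 1.0]])             # one equality constraint
d = np.array([1.0])

Z = null_space(C)                           # columns span N(C)
x0, *_ = np.linalg.lstsq(C, d, rcond=None)  # particular solution of C x0 = d

H_proj = Z.T @ H @ Z                        # projected Hessian
g_proj = Z.T @ (H @ x0 + g)

xz = -np.linalg.solve(H_proj, g_proj)       # minimizer in the reduced space
x = Z @ xz + x0

print(np.allclose(C @ x, d))                # feasibility: True
# Stationarity: the gradient H x + g must be orthogonal to N(C)
print(np.allclose(Z.T @ (H @ x + g), 0))    # True
```

Note that `H_proj` is dense here even though H has zeros, illustrating the loss of sparsity mentioned above.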
Example 1 (eliminating linear equality constraints)

[Figure: level sets of the objective function, the equality-constraint line, the solution
x⋆, and the normalized negative gradient −∇f(x⋆)/‖∇f(x⋆)‖₂.]

We use the same objective function as in the previous example, and consider only the
following equality constraint

    [6 1] x = −1,   i.e., C = [6 1], d = −1,   hence   Z = [−0.17; 1],   z = 1.

We choose x0 = (0, −1). Therefore H̃ = 15.68, g̃ = 17.10, xz⋆ = −1.09.
Example 2 (eliminating linear equality constraints)
As another example, consider the following discrete-time system

    xk+1 = Axk + Buk,   k = 0, . . . , N − 1,   x0 is a known initial state,

where xk ∈ Rⁿ and uk ∈ Rᵐ are the state and control input at discrete time k. A is
the “dynamics matrix” and B is the “input matrix”.
The problem: find u0 , . . . , uN−1 so that
x1 , . . . , xN are “small” (we aim at a good regulation)
u0 , . . . , uN−1 are “small” (we want to use small input effort)
In general these are two competing objectives, since a “large” control input can drive
the state to zero (i.e., regulate) fast.
We define what “small” is, with the following cost function

    J(x1, . . . , xN, u0, . . . , uN−1) = Σ_{k=1}^{N−1} xkᵀQxk + xNᵀQf xN + Σ_{k=0}^{N−1} ukᵀRuk,

where Q and Qf are symmetric positive semidefinite matrices that represent the state
cost and final state cost, respectively, while R is a symmetric positive definite matrix
representing the input cost. Q, Qf and R are assumed to be given.

The problem: find u0, . . . , uN−1 so that J(x1, . . . , xN, u0, . . . , uN−1) is minimized.
This is called a Linear Quadratic Regulator (LQR) problem.
    J(x1, . . . , xN, u0, . . . , uN−1) = Σ_{k=1}^{N−1} xkᵀQxk + xNᵀQf xN + Σ_{k=0}^{N−1} ukᵀRuk.

- the first term measures state deviation
- the second term measures final state deviation
- the last term measures input size

Using vx = (x1, . . . , xN) ∈ R^(nN) and vu = (u0, . . . , uN−1) ∈ R^(mN), we can rewrite
the above objective function as

    J(vx, vu) = vxᵀHxvx + vuᵀHuvu,

where Hx = blkdiag(Q, . . . , Q, Qf) and Hu = blkdiag(R, . . . , R) are block-diagonal.
Next, we present two ways to formulate the LQR problem.
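The block-diagonal matrices Hx and Hu can be assembled directly; the sketch below uses SciPy's `block_diag` with made-up dimensions and weights.

```python
# Assemble H_x = blkdiag(Q, ..., Q, Qf) and H_u = blkdiag(R, ..., R);
# n, m, N and the weight matrices below are illustrative.
import numpy as np
from scipy.linalg import block_diag

n, m, N = 2, 1, 4
Q  = np.eye(n)
Qf = 10.0 * np.eye(n)
R  = 0.1 * np.eye(m)

H_x = block_diag(*([Q] * (N - 1) + [Qf]))   # (nN x nN)
H_u = block_diag(*([R] * N))                # (mN x mN)

print(H_x.shape, H_u.shape)  # (8, 8) (4, 4)
```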
Formulation 1

    minimize_{vx,vu}   J(vx, vu) = vxᵀHxvx + vuᵀHuvu
    subject to         xk+1 = Axk + Buk,   k = 0, . . . , N − 1,
                       x0 is a known initial state.

The equality constraints can be expressed in matrix form as

    Cx vx + Cu vu = d,

where

    Cx = [ −I            ]      Cu = [ B         ]      d = [ −Ax0 ]
         [  A  −I        ]           [    B      ]          [   0  ]
         [      ⋱   ⋱    ]           [       ⋱   ]          [   ⋮  ]
         [        A   −I ],          [         B ],         [   0  ].

Note that
- both vu and vx are variables in the above problem
- both the Hessian and the matrix of the equality constraints have a very “nice”
  sparsity pattern (for more information see [3])
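A sketch of assembling Cx, Cu and d (assuming NumPy/SciPy; the system data is made up), checked against a forward simulation of the dynamics:

```python
# Build the sparse-structured constraint matrices C_x, C_u and the
# right-hand side d of Formulation 1; system data below is made up.
import numpy as np
from scipy.linalg import block_diag

n, m, N = 2, 1, 4
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
x0 = np.array([1.0, 0.0])

# C_x: -I on the block diagonal, A on the first block subdiagonal
C_x = block_diag(*([-np.eye(n)] * N))
for k in range(1, N):
    C_x[k*n:(k+1)*n, (k-1)*n:k*n] = A

C_u = block_diag(*([B] * N))
d = np.concatenate([-A @ x0, np.zeros(n * (N - 1))])

# Check: a trajectory simulated forward satisfies C_x v_x + C_u v_u = d
rng = np.random.default_rng(0)
v_u = rng.standard_normal(m * N)
xs, xk = [], x0
for k in range(N):
    xk = A @ xk + B @ v_u[k*m:(k+1)*m]
    xs.append(xk)
v_x = np.concatenate(xs)
print(np.allclose(C_x @ v_x + C_u @ v_u, d))  # True
```

In practice these matrices would be stored in a sparse format; dense arrays are used here only to keep the sketch short.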
Formulation 2

An alternative approach is to express vx as a function of vu and eliminate it from the
cost function. This can be achieved by iterating the dynamical equations N steps into
the future as follows

    vx = W vu + w,

where

    W = [ B                          ]        [ A   ]
        [ AB         B               ]    w = [ A²  ] x0.
        [ ⋮          ⋮      ⋱        ]        [ ⋮   ]
        [ A^(N−1)B   A^(N−2)B  …  B  ],       [ A^N ]

Substituting vx in J(vx, vu) leads to

    J̃(vu) = vuᵀHvu + vuᵀw̃,

where H = WᵀHxW + Hu and w̃ = 2WᵀHxw (a constant term was dropped).

Note that two drawbacks of this approach are:
- forming the matrix H requires two matrix-matrix multiplications
- H is in general dense, hence without any structure to exploit, the cost of
  minimizing J̃(vu) increases (compared to Formulation 1)
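Formulation 2 can be sketched as follows (same made-up data as above; the double loop builds the block lower-triangular W):

```python
# Condensed LQR: build W, w so that v_x = W v_u + w, then form the
# dense condensed Hessian; data below is made up for illustration.
import numpy as np
from scipy.linalg import block_diag

n, m, N = 2, 1, 4
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
x0 = np.array([1.0, 0.0])
Q, Qf, R = np.eye(n), 10.0 * np.eye(n), 0.1 * np.eye(m)

H_x = block_diag(*([Q] * (N - 1) + [Qf]))
H_u = block_diag(*([R] * N))

# W has block (i, j) equal to A^(i-j) B for j <= i; w stacks A^(i+1) x0
W = np.zeros((n * N, m * N))
for i in range(N):
    for j in range(i + 1):
        W[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B
w = np.concatenate([np.linalg.matrix_power(A, i + 1) @ x0 for i in range(N)])

# Condensed cost: J~(v_u) = v_u^T H v_u + v_u^T w~ (constant dropped)
H_cond = W.T @ H_x @ W + H_u          # dense in general
w_tilde = 2.0 * W.T @ H_x @ w

# Sanity check: v_x from forward simulation equals W v_u + w
v_u = np.ones(m * N)
xk, xs = x0, []
for k in range(N):
    xk = A @ xk + B @ v_u[k*m:(k+1)*m]
    xs.append(xk)
print(np.allclose(np.concatenate(xs), W @ v_u + w))  # True
```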
Introducing linear equality constraints
In some cases, it might be more convenient or even recommended to introduce
equality constraints to a given problem. Some of the reasons might be
- we might have to put the problem in standard form (possibly required by an
  off-the-shelf solver)
- it could be desirable to introduce sparsity patterns that can be exploited by a
  particular algorithm. As an illustrative (and rather trivial) example, consider the
  nonnegative least-squares problem (with C ∈ R^(m×n), m < n, rank(C) = m)

      minimize_x   ‖Cx − d‖₂²
      subject to   x ≥ 0.

  The variable x appears both in the objective function and the inequality
  constraints. We could reformulate it as

      minimize_{x,r}   ‖r‖₂²
      subject to       r = Cx − d,
                       x ≥ 0.

  Now, the objective function and inequality constraints involve different
  optimization variables (which are coupled through the equality constraints).
- we might want to obtain additional insight into the problem (the next example
  demonstrates that)
Example (idealized planar pendulum) [2], pp. 270

The equations governing the motion of a mass with position (x, y) on the plane (only
subject to the gravitational force) are given by

    [m 0] [ẍ]   [ 0]   [0]
    [0 m] [ÿ] + [mg] = [0],                                  (1)
      H     q̈     n

i.e., Hq̈ + n = 0. When the mass is connected via a “massless” rod (with length ℓ) to
the origin of the inertial frame, the resulting system is commonly known as an
idealized pendulum. Since its motion is subject to the constraint

    x² + y² = ℓ²,

its equations of motion are usually formulated with respect to the angle θ, which
leads to an unconstrained problem. Here, we formulate the equations with respect
to q̈.

[Figure: a mass at position (x, y), attached by a rod of length ℓ to the origin, with
angle θ and gravity force mg.]

In order to explicitly account for the constraints, we first note that (1) is the solution
of the following QP

    minimize_{q̈∈R²}   (1/2) q̈ᵀHq̈ + q̈ᵀn,

where H is positive definite, hence, there is a unique solution q̈⋆.
In order to introduce the equality constraint in the QP with variables (ẍ, ÿ), we
differentiate it (with respect to time)

    [x y] q̇ = 0,   i.e.,   Cq̇ = 0,   with   C = [x y],   q̇ = (ẋ, ẏ).

Differentiating (with respect to time) one more time leads to

    Cq̈ = −Ċq̇ =: d.

QP with equality constraints

    minimize_{q̈∈R²}   (1/2) q̈ᵀHq̈ + q̈ᵀn
    subject to        Cq̈ = d.

Optimality conditions

    Hq̈⋆ + n + Cᵀν⋆ = 0,
    Cq̈⋆ = d.

Or in matrix form

    [H  Cᵀ] [q̈⋆]   [−n]
    [C  0 ] [ν⋆] = [ d].

We can obtain an explicit expression for ν⋆ and q̈⋆ by simply using block elimination

    ν⋆ = −(CH⁻¹Cᵀ)⁻¹(d + CH⁻¹n),
    q̈⋆ = −H⁻¹(n + Cᵀν⋆).

Even though we have to solve a larger system of linear equations, we now have an
explicit measure of the tension in the rod supporting the mass, namely ‖Cᵀν⋆‖₂.
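The block-elimination formulas can be checked against a direct solve of the full KKT system; the pendulum-like data below is made up (a unit mass on a unit rod, with a velocity chosen tangent to the constraint):

```python
# Block-elimination solve of the equality-constrained QP's KKT system,
# with made-up pendulum-like data (mass at (x, y), rod constraint).
import numpy as np

mass, grav = 1.0, 9.81
H = mass * np.eye(2)
n_vec = np.array([0.0, mass * grav])
x, y = 0.6, -0.8                  # point on a rod of length 1
xd, yd = 1.0, 0.75                # tangent velocity: x*xd + y*yd = 0
C = np.array([[x, y]])
d = np.array([-(xd**2 + yd**2)])  # d = -Cdot qdot = -(xdot^2 + ydot^2)

# nu* = -(C H^-1 C^T)^-1 (d + C H^-1 n),  qdd* = -H^-1 (n + C^T nu*)
S = C @ np.linalg.solve(H, C.T)
nu = -np.linalg.solve(S, d + C @ np.linalg.solve(H, n_vec))
qdd = -np.linalg.solve(H, n_vec + C.T @ nu)

# Must agree with the full KKT system solved at once
K = np.block([[H, C.T], [C, np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([-n_vec, d]))
print(np.allclose(sol, np.concatenate([qdd, nu])))  # True
```

The rod tension discussed above would then be ‖Cᵀν⋆‖₂, i.e. `np.linalg.norm(C.T @ nu)`.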
Epigraph form (introducing linear inequality constraints)

Problem in standard form

    minimize_x   f0(x)                                       (2)
    subject to   fi(x) ≤ 0,   i = 1, . . . , mi,
                 Cx = d,      C ∈ R^(me×n).

Problem in epigraph form

    minimize_{x,t}   t                                       (3)
    subject to       f0(x) − t ≤ 0,
                     fi(x) ≤ 0,   i = 1, . . . , mi,
                     Cx = d,      C ∈ R^(me×n).

[Figure: the epigraph epi(f0) over dom(f0), with the point (x⋆, f0(x⋆)) marked.]

- The two problems are “equivalent”, because (x⋆, t⋆) is optimal for (3) if and only
  if x⋆ is optimal for (2) and t⋆ = f0(x⋆).
- Geometric interpretation: the epigraph form is an optimization problem in the
  “epigraph space” (x, t), subject to the constraints on x.
- Since the new objective t is a linear function of (x, t), both the new objective
  function and the new inequality constraint f0(x) − t ≤ 0 are convex.
- Every convex optimization problem can be transformed into a problem with a
  linear objective function.
Example (epigraph form)

[Figure: the epigraph of f0(x) = |x| (bounded by the lines t = x and t = −x), the
constraint x = 1, the feasible region, level sets of t, and the solution (x⋆, t⋆).]

Problem in standard form

    minimize_x   f0(x) = |x|
    subject to   −x + 1 ≤ 0.

Problem in epigraph form

    minimize_{x,t}   t
    subject to       |x| − t ≤ 0,
                     −x + 1 ≤ 0.

Problem as an LP

    minimize_{x,t}   t
    subject to       x − t ≤ 0,
                     −x − t ≤ 0,
                     −x + 1 ≤ 0.
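The LP above can be handed to any LP solver; a sketch using SciPy's `linprog` (note that the default variable bounds must be relaxed, since x and t are free variables):

```python
# Epigraph LP for: minimize |x| subject to x >= 1.
# With variables (x, t) the solution is x* = t* = 1.
import numpy as np
from scipy.optimize import linprog

c = np.array([0.0, 1.0])            # minimize t
A_ub = np.array([[ 1.0, -1.0],      #  x - t <= 0
                 [-1.0, -1.0],      # -x - t <= 0
                 [-1.0,  0.0]])     # -x + 1 <= 0, i.e. -x <= -1
b_ub = np.array([0.0, 0.0, -1.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None), (None, None)])  # x and t are free
print(np.allclose(res.x, [1.0, 1.0], atol=1e-6))    # True
```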
Slack variables

An inequality constraint

    fi(x) ≤ 0

holds if and only if there exists si ≥ 0 that satisfies

    fi(x) + si = 0.

If we use the notation si ∈ R+, it is clear that

    {x : fi(x) + si = 0 for some si ∈ R+} = {x : fi(x) ∈ −R+} = {x : fi(x) ≤ 0}.

A problem in standard form is “equivalent” to

    minimize_{x,s}   f0(x)
    subject to       fi(x) + si = 0,   i = 1, . . . , mi,
                     si ≥ 0,           i = 1, . . . , mi,
                     Cx = d,           C ∈ R^(me×n),

where s = (s1, . . . , smi). Hence, it is always possible to replace each inequality
constraint with an equality constraint and a nonnegativity constraint (at the expense
of introducing an additional decision variable).

Using slack variables is sometimes necessary in order to put a given optimization
problem in standard form (recall the example about “polyhedron in standard form”).
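The slack-variable rewrite is particularly simple for affine constraints Ax ≤ b, where it reads [A I](x, s) = b with s ≥ 0; a small made-up check:

```python
# Rewrite A x <= b as A x + s = b, s >= 0, i.e. [A  I][x; s] = b;
# the data below is made up for illustration.
import numpy as np

A = np.array([[1.0, 2.0], [-1.0, 0.5]])
b = np.array([3.0, 1.0])

A_eq = np.hstack([A, np.eye(2)])     # [A  I]

x = np.array([0.5, 0.5])             # a feasible point: A x <= b
s = b - A @ x                        # slacks, nonnegative iff x is feasible
xs = np.concatenate([x, s])

print(np.all(s >= 0), np.allclose(A_eq @ xs, b))  # True True
```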
Optimality conditions for convex problems (with differentiable objective)

Assume a convex problem with a differentiable objective function f0.
Recall that for any x1, x2 ∈ dom(f0),

    f0(x2) ≥ f0(x1) + ∇f0(x1)ᵀ(x2 − x1).

Let us denote the feasible set by S (assumed to be nonempty). For simplicity we
assume that the domain of the problem is D = Rⁿ.

    S = {x : fi(x) ≤ 0, i = 1, . . . , mi,   ciᵀx = di, i = 1, . . . , me}.

x1 is optimal if and only if the following two conditions hold
- x1 ∈ S
- ∇f0(x1)ᵀ(x2 − x1) ≥ 0 for all x2 ∈ S

Note that, if ∇f0(x1)ᵀ(x2 − x1) ≥ 0 for some x1, x2 ∈ S, then

    f0(x2) − f0(x1) ≥ ∇f0(x1)ᵀ(x2 − x1) ≥ 0,

hence, f0(x2) ≥ f0(x1).
Geometric interpretation 1: optimality conditions for a convex problem

If x⋆ is optimal and ∇f0(x⋆) ≠ 0, then −∇f0(x⋆) defines a supporting hyperplane
to the feasible set at x⋆. If S = Rⁿ, ∇f0(x⋆)ᵀ(x − x⋆) ≥ 0 is satisfied for all
x ∈ Rⁿ if and only if ∇f0(x⋆) = 0.

[Figure: a convex feasible set S with the optimal point x⋆ on its boundary and the
normalized negative gradient −∇f(x⋆)/‖∇f(x⋆)‖₂, which defines a supporting
hyperplane at x⋆.]
Geometric interpretations 2, 3: optimality conditions for a convex problem

Recall that if S ⊆ Rⁿ is a closed convex (nonempty) set, then for every v ∈ Rⁿ, there
exists a unique vector x ∈ S that solves the following problem

    minimize_x   ‖x − v‖₂
    subject to   x ∈ S.

We called the solution x⋆ the projection of v on S and denoted it by (see lecture 3,
convex.pdf)

    x⋆ = projS(v).

x⋆ is optimal if and only if

    x⋆ = projS(x⋆ − α∇f(x⋆)),   for any α ≥ 0.

In words, the above condition states that x⋆ is optimal if and only if a step in the
steepest descent direction followed by a Euclidean projection onto S leaves us where
we started [4], pp. 91.

Another projection (actually directional derivative) related interpretation is:
if ∇f0(x⋆)ᵀ(x − x⋆) ≥ 0 for all x ∈ S, then

    ∇f0(x⋆)ᵀx ≥ ∇f0(x⋆)ᵀx⋆,   for all x ∈ S,

i.e., x⋆ minimizes the linear function ∇f0(x⋆)ᵀx over S.
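The fixed-point condition can be verified on a box-constrained quadratic, where projS is coordinatewise clipping (made-up data; for a diagonal H the solution is simply the clipped unconstrained minimizer):

```python
# Check the fixed-point condition x* = proj_S(x* - a * grad f(x*)) for a
# box-constrained quadratic; proj_S is coordinatewise clipping. Data made up.
import numpy as np

H = np.array([[2.0, 0.0], [0.0, 2.0]])
g = np.array([-6.0, -2.0])            # unconstrained minimizer at (3, 1)
lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])

def grad(x):
    return H @ x + g

def proj(v):
    return np.clip(v, lo, hi)

# For a diagonal H the box-constrained solution is the clipped
# unconstrained minimizer (the problem separates per coordinate)
x_star = proj(np.linalg.solve(H, -g))  # (1, 1)

for alpha in (0.1, 1.0, 10.0):
    print(np.allclose(proj(x_star - alpha * grad(x_star)), x_star))  # True
```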
From ∇f0(x⋆)ᵀ(x − x⋆) ≥ 0 to KKT (linear equality constraints)

Let x⋆ be optimal with respect to the following convex problem

    minimize_x   f0(x)
    subject to   Cx = d,   C ∈ R^(me×n).

It is straightforward to derive the stationarity condition (see lecture 1, intro.pdf)
starting from

    ∇f0(x⋆)ᵀ(x − x⋆) ≥ 0,   for all x that satisfy Cx = d.        (4)

- First, note that since x⋆ and any feasible x satisfy C(x − x⋆) = 0, we can
  rewrite (4) as

      ∇f0(x⋆)ᵀv ≥ 0,   for all v ∈ N(C).                          (5)

- Since if v ∈ N(C), so does −v, we conclude that condition (5) can be satisfied
  only if

      ∇f0(x⋆) ⊥ N(C).                                             (6)

- Since R(Cᵀ) ⊥ N(C) (the row space is the orthogonal complement of the null
  space), we conclude that

      ∇f0(x⋆) ∈ R(Cᵀ),                                            (7)

  or equivalently

      ∇f0(x⋆) + Cᵀν⋆ = 0,   for some ν⋆ ∈ R^(me).
From first-order conditions to ∇f0(x⋆)ᵀ(x − x⋆) ≥ 0

Consider the following convex problem

    minimize_x   f0(x)
    subject to   fi(x) ≤ 0,   i = 1, . . . , mi.

Assume that the pair (x⋆, λ⋆) satisfies

    fi(x⋆) ≤ 0,          i = 1, . . . , mi,
    λi⋆ ≥ 0,             i = 1, . . . , mi,
    λi⋆ fi(x⋆) = 0,      i = 1, . . . , mi,
    ∇f0(x⋆) + Σ_{i=1}^{mi} λi⋆ ∇fi(x⋆) = 0.

We want to show that the above conditions imply ∇f0(x⋆)ᵀ(x − x⋆) ≥ 0 for all
feasible x (see [1], Exercise 5.31).

From convexity of fi and feasibility of x, we have

    0 ≥ fi(x) ≥ fi(x⋆) + ∇fi(x⋆)ᵀ(x − x⋆),   i = 1, . . . , mi.

Multiplying by λi⋆ ≥ 0 and summing over all constraints leads to

    0 ≥ Σ_{i=1}^{mi} λi⋆ (fi(x⋆) + ∇fi(x⋆)ᵀ(x − x⋆))
      = Σ_{i=1}^{mi} λi⋆ fi(x⋆) + Σ_{i=1}^{mi} λi⋆ ∇fi(x⋆)ᵀ(x − x⋆)
      = 0 − ∇f0(x⋆)ᵀ(x − x⋆),

hence ∇f0(x⋆)ᵀ(x − x⋆) ≥ 0. Note that above we used in turn: (i) the feasibility of x,
(ii) convexity of the inequality constraints, (iii) nonnegativity of the Lagrange
multipliers, (iv) the complementarity condition, and (v) the stationarity condition.
Sufficiency of the first-order conditions under convexity

First-order conditions

    fi(x⋆) ≤ 0,        i = 1, . . . , mi,     primal feasibility condition
    hi(x⋆) = 0,        i = 1, . . . , me,     primal feasibility condition
    λi⋆ ≥ 0,           i = 1, . . . , mi,     dual feasibility condition
    λi⋆ fi(x⋆) = 0,    i = 1, . . . , mi,     complementarity condition
    ∇f0(x⋆) + Σ_{i=1}^{mi} λi⋆ ∇fi(x⋆) + Σ_{i=1}^{me} νi⋆ ∇hi(x⋆) = 0,   stationarity condition.

KKT (necessary) conditions for x⋆ to be a local minimizer
- x⋆ is regular (i.e., the gradients of the active constraints at x⋆ are linearly
  independent)
- there exist unique Lagrange multipliers λ⋆ = (λ1⋆, . . . , λmi⋆), ν⋆ = (ν1⋆, . . . , νme⋆)
  that satisfy the first-order conditions

In general, the KKT conditions are not sufficient for local optimality.
However, if the optimization problem is convex, the first-order conditions (alone) are
sufficient for global optimality (see [4], pp. 133), i.e.,

    the optimization problem is convex, and
    the first-order conditions hold at x⋆      ⇒   x⋆ is a global minimizer.
First-order conditions are not necessary even for convex problems

Consider the convex optimization problem [4], pp. 135

    minimize_{x1,x2}   x1
    subject to         x1² + x2 ≤ 0,
                       x2 ≥ 0.

0 is the only feasible point (hence x⋆ = 0). It is easy to check that the following
system is unsolvable

    [1]   [0   0] [λ1]   [0]
    [0] + [1  −1] [λ2] = [0],      λ ≥ 0.

Clearly, in the above example, the LICQ (Linear Independence Constraint
Qualification) does not hold (i.e., 0 is not a regular point).

So far we used only the LICQ when formulating KKT conditions. As was mentioned
in lecture 2 (kkt.pdf), it can be rather restrictive in some cases. That is why
alternative constraint qualifications have been developed. Next, we define the
so-called Slater’s constraint qualification.
Slater’s constraint qualification

Slater’s condition [5], pp. 331

Let x⋆ be a local minimizer of the problem

    minimize_x   f0(x)
    subject to   fi(x) ≤ 0,   i = 1, . . . , mi,
                 hi(x) = 0,   i = 1, . . . , me,

where fi, i = 0, . . . , mi are continuously differentiable functions from Rⁿ to R, and
hi, i = 1, . . . , me are affine. If the functions fi, i = 1, . . . , mi are convex and there
exists x̃ satisfying

    fi(x̃) < 0,   ∀i ∈ A(x⋆),

then x⋆ satisfies the first-order conditions. The differentiability assumption above is
made because the first-order conditions involve the gradients of fi, i = 0, . . . , mi.
Recall that A(x⋆) is the set containing the indexes of the active constraints at x⋆.

When dealing with a convex problem, Slater’s condition holds if there is a feasible
point that strictly satisfies the inequality constraints (such a point is often referred
to as strictly feasible).

To summarize [4], pp. 134

For a feasible point x⋆ of a convex optimization problem to be globally optimal, it is
both necessary and sufficient that the first-order optimality conditions hold, provided
that there exists x̃ satisfying fi(x̃) < 0, ∀i ∈ A(x⋆).
Algebraic specification of the feasible set (again)

Recall that we defined a convex optimization problem not only as a problem where a
convex function is minimized over a convex set. We require in addition that the
feasible set is described specifically by a set of inequalities fi(x) ≤ 0, i = 1, . . . , mi,
involving convex functions, and a set of linear (well, actually affine) equality
constraints [1], pp. 138.

An attempt at giving a plausible reasoning (for the additional requirement)

I think that if the additional requirement were not included, it would be more difficult
(not possible?) to state a general result (such as the one on the previous slide). At
least not using Slater’s CQ, because it is explicit regarding the functions defining the
feasible set.

When dealing with a convex problem, Slater’s CQ is not very restrictive (in fact
rather “convenient”), and if a problem is assumed to satisfy it, one can make very
general statements regarding its properties (as we will see in the context of
Lagrangian duality as well).
[1] S. Boyd and L. Vandenberghe, “Convex Optimization,” Cambridge, 2004.
[2] R. M. Murray, Z. Li, and S. S. Sastry, “A Mathematical Introduction to Robotic
    Manipulation,” CRC Press, 1993.
[3] http://www.stanford.edu/class/ee363/
[4] N. Andréasson, A. Evgrafov, and M. Patriksson, “An Introduction to Continuous
    Optimization: Foundations and Fundamental Algorithms,” 2005.
[5] D. P. Bertsekas, “Nonlinear Programming,” Athena Scientific, (3rd print) 2008.