Classical Optimization Theory

Quadratic forms
Let

$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
Let A be an n×n symmetric matrix. Then the quadratic form

$$Q(X) = X^T A X = \sum_{i=1}^{n} a_{ii} x_i^2 + 2 \sum_{1 \le i < j \le n} a_{ij} x_i x_j$$
(or the matrix A) is called:
Positive semidefinite if $X^T A X \ge 0$ for all X.
Positive definite if $X^T A X > 0$ for all X ≠ 0.
Negative semidefinite if $X^T A X \le 0$ for all X.
Negative definite if $X^T A X < 0$ for all X ≠ 0.
A necessary and sufficient condition for A (or Q(X)) to be:
Positive definite (semidefinite) is that all n leading principal minors of A are > 0 (≥ 0).
Negative definite (semidefinite) is that the kth principal minor of A has the sign of (-1)^k, k = 1, 2, …, n (the kth principal minor of A is zero or has the sign of (-1)^k, k = 1, 2, …, n).
Here the kth principal minor is the k×k determinant

$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} \end{vmatrix}$$
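As an illustration, here is a minimal sketch (assuming NumPy; the matrix A and the helper name are my own, not from the text) that codes the test directly from the definition of the leading principal minors:

```python
import numpy as np

def leading_principal_minors(A):
    """Return [det(A[:1, :1]), det(A[:2, :2]), ..., det(A)]."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
minors = leading_principal_minors(A)
print(np.round(minors, 6))                 # [2. 3. 4.] -> all > 0
if all(m > 0 for m in minors):
    print("A is positive definite")        # holds for this example
elif all((-1)**(k + 1) * m > 0 for k, m in enumerate(minors)):
    print("A is negative definite")        # kth minor has the sign of (-1)^k
```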
Let f(X) = f(x1, x2, …, xn) be a real-valued function of the n variables x1, x2, …, xn (we assume that f(X) is at least twice differentiable). A point X0 is said to be a local maximum of f(X) if there exists an ε > 0 such that

$$f(X_0 + h) \le f(X_0) \quad \text{for all } |h_j| < \varepsilon.$$

(Here $h = (h_1, h_2, \ldots, h_n)$, and thus $X_0 + h = (x_1^0 + h_1, x_2^0 + h_2, \ldots, x_n^0 + h_n)$.)
X0 is said to be a local minimum of f(X) if there exists an ε > 0 such that $f(X_0 + h) \ge f(X_0)$ for all $|h_j| < \varepsilon$.
X0 is called an absolute maximum or a global maximum of f(X) if $f(X) \le f(X_0)$ for all X.
X0 is called an absolute minimum or a global minimum if $f(X) \ge f(X_0)$ for all X.
Theorem
A necessary condition for X0 to be an optimum point of f(X) is that $\nabla f(X_0) = 0$ (that is, all the first-order partial derivatives $\partial f / \partial x_i$ are zero at X0).

Definition
A point X0 for which $\nabla f(X_0) = 0$ is called a stationary point of f(X) (a potential candidate for a local maximum or local minimum).
Theorem
Let X0 be a stationary point of f(X). A sufficient condition for X0 to be a local minimum of f(X) is that the Hessian matrix H(X0) is positive definite; for X0 to be a local maximum of f(X), that the Hessian matrix H(X0) is negative definite.
Here H(X) is the n×n matrix whose (i, j)th entry is the second partial derivative $\partial^2 f / \partial x_i \partial x_j$:
$$H(X) = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}$$
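For concreteness, here is a small SymPy sketch (the example function is my own, not from the text) that finds a stationary point and applies this sufficient condition:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
f = x1**2 + x1*x2 + 2*x2**2              # example function, not from the text

grad = [sp.diff(f, v) for v in (x1, x2)]
H = sp.hessian(f, (x1, x2))              # matrix of second partial derivatives

stationary = sp.solve(grad, [x1, x2], dict=True)[0]   # here x1 = x2 = 0
H0 = H.subs(stationary)
print(stationary, H0.is_positive_definite)  # True, so X0 is a local minimum
```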
Problem 3, Set 20.1A, page 705
Find the stationary points of the function
f(x1, x2, x3) = 2x1x2x3 - 4x1x3 - 2x2x3 + x1² + x2² + x3² - 2x1 - 4x2 + 4x3,
and hence find the extrema of f(X).

$$\frac{\partial f}{\partial x_1} = 2x_2 x_3 - 4x_3 + 2x_1 - 2$$
$$\frac{\partial f}{\partial x_2} = 2x_1 x_3 - 2x_3 + 2x_2 - 4$$
$$\frac{\partial f}{\partial x_3} = 2x_1 x_2 - 4x_1 - 2x_2 + 2x_3 + 4$$
Setting all the above partial derivatives equal to zero gives
x2x3 - 2x3 + x1 = 1   (1)
x1x3 - x3 + x2 = 2   (2)
x1x2 - 2x1 - x2 + x3 = -2   (3)
From (2), x2 = 2 + x3 - x1x3.
Substituting in (3) for x2, we get
2x1 + x1x3 - x1²x3 - 2x1 - 2 - x3 + x1x3 + x3 = -2,
or x1x3(2 - x1) = 0.
Therefore x1 = 0 or x3 = 0 or x1 = 2.

Case (i): x1 = 0
(1) implies x2x3 - 2x3 = 1   (4)
(2) implies x2 - x3 = 2   (5)
(3) implies -x2 + x3 = -2, same as (5).
(4) using (5) gives x3(2 + x3) - 2x3 = 1, or x3² = 1, so x3 = ±1.
Sub-case (i): x3 = 1 gives (using (5)) x2 = 3.
Sub-case (ii): x3 = -1 gives (using (5)) x2 = 1.
Therefore, we get two stationary points: (0, 3, 1) and (0, 1, -1).

Case (ii): x3 = 0
(1) gives x1 = 1.
(2) gives x2 = 2.
(3) gives x1x2 - 2x1 - x2 = -2, which is satisfied (2 - 2 - 2 = -2).
Therefore, we get the stationary point (1, 2, 0).

Case (iii): x1 = 2
(1) gives x2x3 - 2x3 = -1   (6)
(2) gives x2 + x3 = 2   (7)
(3) gives x2 + x3 = 2, same as (7).
(6) using (7) gives x3(2 - x3) - 2x3 = -1, or x3² = 1, therefore x3 = ±1.
Sub-case (i): x3 = 1 gives (using (7)) x2 = 1.
Sub-case (ii): x3 = -1 gives (using (7)) x2 = 3.
Therefore, we get two stationary points: (2, 1, 1) and (2, 3, -1).
The Hessian matrix is

$$H(X) = \begin{pmatrix} 2 & 2x_3 & 2x_2 - 4 \\ 2x_3 & 2 & 2x_1 - 2 \\ 2x_2 - 4 & 2x_1 - 2 & 2 \end{pmatrix}$$
Point        Principal minors   Nature
(0, 3, 1)    2, 0, -32          Saddle point
(0, 1, -1)   2, 0, -32          Saddle point
(1, 2, 0)    2, 4, 8            Local minimum
(2, 1, 1)    2, 0, -32          Saddle point
(2, 3, -1)   2, 0, -32          Saddle point
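The classification can be double-checked numerically. Here is a sketch (assuming NumPy; the `hessian` helper is mine, not from the text) that evaluates the Hessian above at each stationary point and inspects its eigenvalues:

```python
import numpy as np

def hessian(x1, x2, x3):
    """Hessian of f from Problem 3, Set 20.1A."""
    return np.array([[2,        2*x3,     2*x2 - 4],
                     [2*x3,     2,        2*x1 - 2],
                     [2*x2 - 4, 2*x1 - 2, 2       ]], dtype=float)

for p in [(0, 3, 1), (0, 1, -1), (1, 2, 0), (2, 1, 1), (2, 3, -1)]:
    ev = np.linalg.eigvalsh(hessian(*p))
    if np.all(ev > 0):
        nature = "local min"
    elif np.all(ev < 0):
        nature = "local max"
    else:
        nature = "saddle point"     # mixed signs: indefinite Hessian
    print(p, np.round(ev, 3), nature)
```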
Convex Sets
A set is convex if every line segment connecting any two points in the set is contained entirely within the set.
- Ex: a polyhedron
- Ex: a ball
An extreme point of a convex set is any point that is not on any line segment connecting any other two points of the set.
The intersection of convex sets is a convex set.
Why do we care so much about convexity? Because it guarantees that a local optimum is a global optimum. This is significant because all of our basic optimization algorithms search for local optima. Those that try harder to find global optima generally just run the underlying algorithms several times, starting at different solutions.
Possible Optimal Solutions to Convex NLPs (not occurring at corner points)
[Figure: four panels, each showing a feasible region, objective function level curves, and the optimal solution: linear objective with nonlinear constraints; nonlinear objective with linear constraints; nonlinear objective with nonlinear constraints; nonlinear objective with linear constraints. In each case the optimal solution need not occur at a corner point.]
Local vs. Global Optimal Solutions for Nonconvex NLPs
[Figure: a nonconvex feasible region in the (x1, x2) plane with labeled points A-G; several are local optimal solutions, while one is both a local and a global optimal solution.]
Definition:
A function f(X) = f(x1, x2, …, xn) of n variables is said to be convex if for each pair of points X, Y on the graph, the line segment joining these two points lies entirely above or on the graph, i.e.,

f((1 - λ)X + λY) ≤ (1 - λ)f(X) + λf(Y) for 0 ≤ λ ≤ 1.

f is called concave (strictly concave) if -f is convex (strictly convex).
Convexity test for a function of one variable:
f is convex if $d^2 f / dx^2 \ge 0$;
f is concave if $d^2 f / dx^2 \le 0$.
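A tiny SymPy check of this test on an example function of my own choosing:

```python
import sympy as sp

x = sp.Symbol("x", real=True)
f = x**4 + 3*x**2                 # example function, not from the text
d2 = sp.diff(f, x, 2)             # 12*x**2 + 6
# The second derivative is never negative on the reals, so f is convex.
print(d2, sp.solveset(d2 < 0, x, sp.S.Reals))   # EmptySet -> convex
```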
Convexity test for functions of two variables:

Quantity           Convex   Strictly convex   Concave   Strictly concave
fxx fyy - (fxy)²   ≥ 0      > 0               ≥ 0       > 0
fxx                ≥ 0      > 0               ≤ 0       < 0
fyy                ≥ 0      > 0               ≤ 0       < 0
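And correspondingly for two variables, a sketch on my own example function:

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = x**2 + x*y + y**2             # example function, not from the text

fxx = sp.diff(f, x, 2)            # 2
fyy = sp.diff(f, y, 2)            # 2
fxy = sp.diff(f, x, y)            # 1
disc = sp.simplify(fxx*fyy - fxy**2)   # 3
# All three quantities are > 0 everywhere, so f is strictly convex.
print(fxx, fyy, disc)
```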
Constrained optimization
Karush-Kuhn-Tucker (KKT) conditions:
Consider the problem
maximize z = f(X) = f(x1, x2, …, xn)
subject to g(X) ≤ 0, i.e., g1(X) ≤ 0, g2(X) ≤ 0, …, gm(X) ≤ 0
(the non-negativity restrictions, if any, are included in the above).
We define the Lagrangean function

L(X, S, λ) = f(X) - λ[g(X) + S²]
= f(X) - [λ1{g1(X) + s1²} + λ2{g2(X) + s2²} + … + λm{gm(X) + sm²}]

(where s1², s2², …, sm² are the nonnegative slack variables added to g1(X) ≤ 0, …, gm(X) ≤ 0 to make them equalities).

The KKT necessary conditions for optimality are given by

$$\lambda \ge 0,$$
$$\frac{\partial L}{\partial X} = \nabla f(X) - \lambda \nabla g(X) = 0,$$
$$\frac{\partial L}{\partial s_i} = -2\lambda_i s_i = 0, \quad i = 1, 2, \ldots, m,$$
$$\frac{\partial L}{\partial \lambda_i} = -\left(g_i(X) + s_i^2\right) = 0, \quad i = 1, 2, \ldots, m.$$

These are equivalent to the following conditions:

$$\lambda \ge 0,$$
$$\nabla f(X) - \lambda \nabla g(X) = 0,$$
$$\lambda_i g_i(X) = 0, \quad i = 1, 2, \ldots, m,$$
$$g_i(X) \le 0, \quad i = 1, 2, \ldots, m.$$
In scalar notation, this is given by

$$\lambda_i \ge 0, \quad i = 1, 2, \ldots, m,$$
$$\frac{\partial f}{\partial x_j} - \lambda_1 \frac{\partial g_1}{\partial x_j} - \lambda_2 \frac{\partial g_2}{\partial x_j} - \cdots - \lambda_m \frac{\partial g_m}{\partial x_j} = 0, \quad j = 1, 2, \ldots, n,$$
$$\lambda_i g_i(X) = 0, \quad i = 1, 2, \ldots, m,$$
$$g_i(X) \le 0, \quad i = 1, 2, \ldots, m.$$
Sufficiency of the KKT conditions:

                        Required conditions
Sense of optimization   Objective function   Solution space
maximization            concave              convex set
minimization            convex               convex set
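As a hedged numerical illustration of this table (assuming SciPy; the example problem and all names here are mine, not from the text), the sketch below maximizes a concave objective over a convex set, so the KKT point found is the global maximum:

```python
# Maximize F(x) = -(x1-1)^2 - (x2-2)^2 (concave) subject to x1 + x2 <= 1
# (a convex set), so by the table above the KKT conditions are sufficient.
import numpy as np
from scipy.optimize import minimize

F = lambda x: -((x[0] - 1)**2) - (x[1] - 2)**2
g = lambda x: 1 - x[0] - x[1]          # feasible iff g(x) >= 0, i.e. x1+x2 <= 1

res = minimize(lambda x: -F(x), x0=[0.0, 0.0],
               constraints=[{"type": "ineq", "fun": g}])
x1, x2 = res.x                          # expect the global maximum (0, 1)

# Recover the multiplier from stationarity in x1:
# dF/dx1 - lam * d(x1 + x2 - 1)/dx1 = -2*(x1 - 1) - lam = 0
lam = -2 * (x1 - 1)
print(np.round(res.x, 6), round(lam, 6))   # approx (0, 1) with lam = 2 >= 0
```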
Economic Interpretation of Lagrange Multipliers
As with LPs, there is actually a whole area of duality theory that corresponds to NLPs. In this vein, we can view the Lagrange multipliers as shadow prices for the constraints in an NLP (corresponding to the y vector in LP).
KKT Conditions
KKT conditions may not lead directly to a very efficient algorithm for solving NLPs. However, they do have a number of benefits:
They give insight into what optimal solutions to NLPs look like.
They provide a way to set up and solve small problems.
They provide a method to check solutions to large problems.
The Lagrange multiplier values can be seen as shadow prices of the constraints.
Problem: Use the KKT conditions to derive an optimal solution for the following problem:
maximize f(x1, x2) = x1 + 2x2 - x2³
subject to x1 + x2 ≤ 1
x1 ≥ 0
x2 ≥ 0

Solution: Here there are three constraints, namely
g1(x1, x2) = x1 + x2 - 1 ≤ 0
g2(x1, x2) = -x1 ≤ 0
g3(x1, x2) = -x2 ≤ 0
Hence the KKT conditions become

$$\frac{\partial f}{\partial x_1} - \lambda_1 \frac{\partial g_1}{\partial x_1} - \lambda_2 \frac{\partial g_2}{\partial x_1} - \lambda_3 \frac{\partial g_3}{\partial x_1} = 0$$
$$\frac{\partial f}{\partial x_2} - \lambda_1 \frac{\partial g_1}{\partial x_2} - \lambda_2 \frac{\partial g_2}{\partial x_2} - \lambda_3 \frac{\partial g_3}{\partial x_2} = 0$$

λ1 g1(x1, x2) = 0
λ2 g2(x1, x2) = 0
λ3 g3(x1, x2) = 0
g1(x1, x2) ≤ 0
g2(x1, x2) ≤ 0
g3(x1, x2) ≤ 0
and λ1 ≥ 0, λ2 ≥ 0, λ3 ≥ 0.

(Note: f is concave, the gi are convex, and this is a maximization problem, so these KKT conditions are sufficient at the optimum point.)
That is:
1 - λ1 + λ2 = 0   (1)
2 - 3x2² - λ1 + λ3 = 0   (2)
λ1(x1 + x2 - 1) = 0   (3)
λ2 x1 = 0   (4)
λ3 x2 = 0   (5)
x1 + x2 - 1 ≤ 0   (6)
x1 ≥ 0   (7)
x2 ≥ 0   (8)
λ1 ≥ 0   (9)
λ2 ≥ 0   (10)
λ3 ≥ 0   (11)
(1) gives λ1 = 1 + λ2 ≥ 1 > 0 (using (10)).
Hence (3) gives x1 + x2 = 1.   (12)
Thus x1 and x2 cannot both be zero. So let x1 > 0; then (4) gives λ2 = 0, and therefore λ1 = 1.
If now x2 = 0, then (2) gives 2 - 0 - 1 + λ3 = 0, or λ3 = -1 < 0, which is not possible.
Therefore x2 > 0; hence (5) gives λ3 = 0, and then (2) gives x2² = 1/3, so x2 = 1/√3, and so x1 = 1 - 1/√3.
Max f = 1 - 1/√3 + 2/√3 - 1/(3√3) = 1 + 2/(3√3).
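As a check, a small SymPy sketch (my own; the variable names are not from the text) can solve equations (1)-(5) and filter by the feasibility and sign conditions (6)-(11):

```python
import sympy as sp

x1, x2, l1, l2, l3 = sp.symbols("x1 x2 l1 l2 l3", real=True)

eqs = [1 - l1 + l2,              # (1)
       2 - 3*x2**2 - l1 + l3,    # (2)
       l1*(x1 + x2 - 1),         # (3)
       l2*x1,                    # (4)
       l3*x2]                    # (5)

for sol in sp.solve(eqs, [x1, x2, l1, l2, l3], dict=True):
    primal = sol[x1] >= 0 and sol[x2] >= 0 and sol[x1] + sol[x2] <= 1
    dual = sol[l1] >= 0 and sol[l2] >= 0 and sol[l3] >= 0
    if primal and dual:          # should recover x2 = 1/sqrt(3)
        f = sol[x1] + 2*sol[x2] - sol[x2]**3
        print(sol, "f =", sp.simplify(f))
```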
Maximize f(X) = 20x1 + 10x2
subject to x1² + x2² ≤ 1
x1 + 2x2 ≤ 2
x1 ≥ 0, x2 ≥ 0

[Figure: the feasible region, with the line x1 + 2x2 = 2 meeting the unit circle at (0, 1) and (4/5, 3/5).]

The KKT conditions become
20 - 2λ1x1 - λ2 + λ3 = 0
10 - 2λ1x2 - 2λ2 + λ4 = 0
λ1(x1² + x2² - 1) = 0
λ2(x1 + 2x2 - 2) = 0
λ3 x1 = 0
λ4 x2 = 0
x1² + x2² ≤ 1
x1 + 2x2 ≤ 2
x1 ≥ 0
x2 ≥ 0
λ1 ≥ 0, λ2 ≥ 0, λ3 ≥ 0, λ4 ≥ 0
From the figure it is clear that max f occurs at a point (x1, x2) where x1, x2 > 0. Hence λ3 = 0 and λ4 = 0.
Suppose x1 + 2x2 - 2 ≠ 0. Then λ2 = 0, and we get
20 - 2λ1x1 = 0
10 - 2λ1x2 = 0,
i.e., λ1x1 = 10 and λ1x2 = 5. Since λ1 ≠ 0, the constraint x1² + x2² = 1 is active; squaring and adding, we get λ1² = 125, so λ1 = 5√5.
Therefore x1 = 2/√5, x2 = 1/√5, and f = 50/√5 ≈ 22.36 > 22.
If instead λ2 ≠ 0, then x1 + 2x2 - 2 = 0, and with x1² + x2² = 1 also active we are at an intersection point:
either x1 = 0, x2 = 1, f = 10, or x1 = 4/5, x2 = 3/5, f = 22.
Therefore max f occurs at x1 = 2/√5, x2 = 1/√5.
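A quick numeric comparison of the three candidate points (a sketch assuming NumPy) confirms the choice:

```python
import numpy as np

f = lambda x1, x2: 20*x1 + 10*x2
candidates = [(2/np.sqrt(5), 1/np.sqrt(5)),   # circle active, line inactive
              (0.0, 1.0),                     # circle and line both active
              (4/5, 3/5)]                     # circle and line both active
for x1, x2 in candidates:
    feasible = x1**2 + x2**2 <= 1 + 1e-9 and x1 + 2*x2 <= 2 + 1e-9
    print((round(x1, 4), round(x2, 4)), feasible, round(f(x1, x2), 4))
# (0.8944, 0.4472) gives f = 22.3607 = 50/sqrt(5), the maximum.
```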
Problem: Use the KKT conditions to derive an optimal solution for the following problem:
minimize f(x1, x2) = x1² + x2
subject to x1² + x2² ≤ 9
x1 + x2 ≤ 1

Solution: Here there are two constraints, namely
g1(x1, x2) = x1² + x2² - 9 ≤ 0
g2(x1, x2) = x1 + x2 - 1 ≤ 0

Thus the KKT conditions are:
(1) λ1 ≤ 0, λ2 ≤ 0 (as it is a minimization problem)
(2) 2x1 - 2λ1x1 - λ2 = 0
    1 - 2λ1x2 - λ2 = 0
(3) λ1(x1² + x2² - 9) = 0
    λ2(x1 + x2 - 1) = 0
(4) x1² + x2² ≤ 9
    x1 + x2 ≤ 1
Now if λ1 = 0, then (from (2)) λ2 = 1 > 0, which is not possible. Hence λ1 ≠ 0, and so
x1² + x2² = 9.   (5)
Assume λ2 = 0. Then the first equation of (2) gives 2x1(1 - λ1) = 0; since λ1 < 0, 1 - λ1 > 0, so x1 = 0.
From (5), we get x2 = ±3. The second equation of (2) gives (with λ2 = 0) λ1 = 1/(2x2); since λ1 < 0, we need x2 = -3, so λ1 = -1/6.
Thus the optimal solution is x1 = 0, x2 = -3, λ1 = -1/6, λ2 = 0.
The optimal value is f = 0 + (-3) = -3.
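As a numerical sanity check (a sketch assuming SciPy; the starting point is arbitrary):

```python
# Minimize f = x1^2 + x2 subject to the two constraints above; the solver
# should land on the KKT point (0, -3) with f = -3 derived by hand.
from scipy.optimize import minimize

f = lambda x: x[0]**2 + x[1]
cons = [{"type": "ineq", "fun": lambda x: 9 - x[0]**2 - x[1]**2},  # g1(X) <= 0
        {"type": "ineq", "fun": lambda x: 1 - x[0] - x[1]}]        # g2(X) <= 0
res = minimize(f, x0=[1.0, 0.0], constraints=cons)
print(res.x, res.fun)    # expect approximately [0, -3] and -3
```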
[Simplex tableau iterations over the basic variables r, x1, x2, R1, R2, S1, S2 omitted.]
Using Excel Solver, the optimum solution is: x1 = 1.604651, x2 = 4.395349, and the maximum is z = 230.7209.