SE524/EC524 Optimization Theory and Methods

Yannis Paschalidis
[email protected], http://ionia.bu.edu/
Department of Electrical and Computer Engineering,
Division of Systems Engineering,
and Center for Information and Systems Engineering,
Boston University
Lecture 15: Outline
1. Introduction to Nonlinear Programming (NLP).
2. Some NLP formulations.
3. Unconstrained optimization.
4. Gradient methods.
5. Stepsize selection.
Yannis Paschalidis, Boston University
SE524/EC524: Lecture 15
Some background material for NLP
Norms || · || on Rⁿ.
Euclidean norm: ||x|| = √(x′x).
Open Ball around a with radius r : {y | ||y − a|| < r }.
A ⊂ Rn is compact iff closed and bounded (Heine-Borel).
Consider a function f : A → R:
continuous at x ∈ A if limy→x f (y) = f (x).
right-continuous if limy↓x f (y) = f (x).
left-continuous if limy↑x f (y) = f (x).
lower-semicontinuous if f (x) ≤ lim inf k→∞ f (xk ) for every
sequence xk → x.
upper-semicontinuous if f (x) ≥ lim supk→∞ f (xk ) for every
sequence xk → x.
coercive if limk→∞ f (xk ) = ∞ for every sequence satisfying
||xk || → ∞.
Some background material for NLP (cont.)
Theorem
(Weierstrass) Let A ⊂ Rⁿ where A is closed and non-empty. Let
f : A → R be lower-semicontinuous for all x ∈ A.
(a) If A is compact then ∃x ∈ A s.t. f(x) = inf_{z∈A} f(z).
(b) If f is coercive, then ∃x ∈ A s.t. f(x) = inf_{z∈A} f(z).
Gradient:
f : Rⁿ → R ⇒ ∇f(x) = (∂f(x)/∂x₁, . . . , ∂f(x)/∂xₙ).
f = (f1 , . . . , fm ) : Rn → Rm ⇒ ∇f (x) = [∇f1 (x) · · · ∇fm (x)].
Hessian:
f : Rⁿ → R ⇒ ∇²f(x) = ∇(∇f(x)) = [∂²f(x)/∂xᵢ∂xⱼ].
Taylor expansion:
f(x + y) = f(x) + y′∇f(x) + ½ y′∇²f(x)y + o(||y||²).
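The Taylor expansion above can be checked numerically. The sketch below uses an assumed quadratic f (chosen for illustration, not from the lecture), so the second-order expansion is exact while the first-order one errs by O(||y||²):

```python
import numpy as np

# Assumed sample function: f(x) = x1^2 + 3*x1*x2 + 2*x2^2 (quadratic)
def f(x):
    return x[0]**2 + 3*x[0]*x[1] + 2*x[1]**2

def grad_f(x):
    # Analytic gradient (∂f/∂x1, ∂f/∂x2)
    return np.array([2*x[0] + 3*x[1], 3*x[0] + 4*x[1]])

def hess_f(x):
    # Constant Hessian of the quadratic
    return np.array([[2.0, 3.0], [3.0, 4.0]])

x = np.array([1.0, -2.0])
y = 1e-3 * np.array([0.7, -0.3])   # small perturbation

# First-order Taylor model: error is O(||y||^2)
lin = f(x) + y @ grad_f(x)
# Second-order Taylor model: exact here, since f is quadratic
quad = lin + 0.5 * y @ hess_f(x) @ y
print(abs(f(x + y) - lin))   # tiny, on the order of ||y||^2
print(abs(f(x + y) - quad))  # essentially zero
```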
Formulation and definitions
Unconstrained optimization problem:
min f (x)
s.t. x ∈ Rn
x∗ is a local minimum if ∃ε > 0 s.t. f(x∗) ≤ f(x) ∀x with
||x − x∗|| < ε.
x∗ is a global minimum if f (x∗ ) ≤ f (x) ∀x ∈ Rn .
Necessary Conditions
Proposition
Let x∗ be an unconstrained local min and f : Rⁿ → R continuously
differentiable in an open set S containing x∗. Then

∇f(x∗) = 0.   (1st order)

If f : Rⁿ → R is twice continuously differentiable within S then

∇²f(x∗) ⪰ 0.   (2nd order)
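Both conditions can be verified numerically for a simple case. The sketch below assumes the quadratic f(x) = ½x′Qx − b′x (a hypothetical example), whose stationary point solves Qx = b and whose Hessian is Q:

```python
import numpy as np

# Assumed quadratic: f(x) = 1/2 x'Qx - b'x with Q symmetric positive definite
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

grad = lambda x: Q @ x - b       # ∇f(x)
x_star = np.linalg.solve(Q, b)   # stationary point: ∇f(x*) = 0 (1st order)

print(grad(x_star))              # ≈ [0, 0]
eigvals = np.linalg.eigvalsh(Q)  # Hessian ∇²f(x*) = Q
print(eigvals)                   # all positive ⇒ ∇²f(x*) ⪰ 0 holds (2nd order)
```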
Convexity
Proposition
Let f : C → R be convex over the convex set C ⊂ Rn .
(a) A local min of f over C is also a global min over C. If f is
strictly convex, ∃ at most one global min.
(b) If f is convex and C is open, the condition

∇f(x∗) = 0

is necessary and sufficient for x∗ ∈ C to be a global min of
f over C.
Sufficient Conditions
Proposition
Let f : Rn → R be twice continuously differentiable in an open set
S ⊂ Rn . Let also x∗ ∈ S s.t.
∇f (x∗ ) = 0,
∇2 f (x∗ ) ≻ 0.
Then x∗ is a strict unconstrained local min of f, that is, ∃γ, ε > 0
s.t.

f(x) ≥ f(x∗) + (γ/2)||x − x∗||²,   ∀x with ||x − x∗|| < ε.
Gradient Methods
Generic gradient method:
xk+1 = xk + αk dk
such that if ∇f(xk) ≠ 0 then dk is chosen so that
∇f(xk)′dk < 0 (descent direction).
An interesting class of gradient methods:
xk+1 = xk − αk Dk ∇f (xk ).
Steepest descent: Dk = I.
Newton’s method: Dk = (∇2 f (xk ))−1 .
Diagonally scaled steepest descent:
Dk = diag( (∂²f(xk)/(∂x₁)²)⁻¹, . . . , (∂²f(xk)/(∂xₙ)²)⁻¹ ).
Modified Newton’s method: Dk = (∇2 f (x0 ))−1 .
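The generic iteration can be made concrete with steepest descent (Dk = I). The sketch below uses an assumed quadratic f and a constant stepsize, both illustrative choices:

```python
import numpy as np

# Steepest descent: x^{k+1} = x^k - α^k ∇f(x^k), i.e. D^k = I.
# Assumed objective: f(x) = 1/2 (4 x1^2 + x2^2); stepsize chosen for illustration.
def grad_f(x):
    return np.array([4 * x[0], x[1]])

x = np.array([2.0, 2.0])
alpha = 0.2                      # constant stepsize s
for k in range(100):
    d = -grad_f(x)               # descent direction: ∇f(x)'d = -||∇f(x)||² < 0
    x = x + alpha * d
print(x)  # approaches the minimizer (0, 0)
```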
Least squares problems
min f(x) = ½ ||g(x)||²
s.t. x ∈ Rⁿ.
Note that
∇f(x) = ∇g(x) g(x).
Gauss-Newton method for least squares:
xk+1 = xk − αk (∇g(xk)∇g(xk)′)⁻¹ ∇g(xk) g(xk).
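A minimal sketch of this iteration, assuming the linear residual g(x) = Ax − c (a hypothetical example; with ∇g the n×m matrix whose columns are the ∇gᵢ, one full step lands on the least-squares solution):

```python
import numpy as np

# Assumed linear residual: g(x) = A x - c, an overdetermined system (m=3, n=2)
A = np.array([[1.0, 2.0], [3.0, 1.0], [0.0, 1.0]])
c = np.array([1.0, 2.0, 3.0])

def g(x):
    return A @ x - c

Dg = A.T      # ∇g(x): n x m matrix [∇g1(x) ... ∇gm(x)], constant here

x = np.zeros(2)
alpha = 1.0   # full Gauss-Newton step
for k in range(1):
    # Solve (∇g ∇g') step = ∇g g(x), i.e. the normal equations
    step = np.linalg.solve(Dg @ Dg.T, Dg @ g(x))
    x = x - alpha * step
print(x)  # least-squares solution of A x ≈ c
```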
Stepsize Selection
Minimization rule:
f(xk + αk dk) = min_{α≥0} f(xk + αdk).
Limited minimization rule:
f(xk + αk dk) = min_{α∈[0,s]} f(xk + αdk).
Constant stepsize: αk = s.
Diminishing stepsize:
αk → 0 with ∑_{k=0}^∞ αk = ∞.
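The constant and diminishing rules can be contrasted on a simple example. The sketch below assumes the scalar objective f(x) = x²/2 (so ∇f(x) = x) and the diminishing sequence αk = 1/(k+2), which satisfies αk → 0 and ∑ αk = ∞:

```python
# Assumed objective f(x) = x^2 / 2, so the gradient is simply x
grad = lambda x: x

x_const, x_dimin = 5.0, 5.0
for k in range(200):
    x_const -= 0.1 * grad(x_const)              # constant stepsize s = 0.1
    x_dimin -= (1.0 / (k + 2)) * grad(x_dimin)  # diminishing: αk → 0, Σ αk = ∞

print(x_const, x_dimin)  # both approach the minimizer x* = 0
```

The diminishing iterate contracts by (k+1)/(k+2) per step, so it still converges, just more slowly than the geometric rate of the constant stepsize.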