Problem: Unconstrained minimization I

Consider the function f : x ∈ R^d → f(x) ∈ R. We want to find x̂ such that:

    x̂ = arg min_x f(x)

Assuming f is differentiable (and convex), a necessary (and sufficient) condition for x̂ to be optimal is:

    ∇f(x̂) = 0

Sometimes this minimum cannot be found analytically, so instead we aim at designing an algorithm that generates a sequence of points x(0), x(1), ..., x(k) that converges towards x̂.

Reference: Convex Optimization, Chapter 9, S. Boyd & L. Vandenberghe, Cambridge University Press, 2004.
Problem: Unconstrained minimization II

Examples:

Quadratic minimization can be solved analytically:

    x̂ = arg min_x f(x) = ‖y − Ax‖²

Unconstrained geometric programming:

    x̂ = arg min_x f(x) = log( Σ_{i=1}^m exp(a_i^T x + b_i) )

The optimality condition is:

    ∇f(x̂) = (1 / Σ_{i=1}^m exp(a_i^T x̂ + b_i)) · Σ_{i=1}^m exp(a_i^T x̂ + b_i) a_i = 0

Descent methods

Theorem (General descent method)
Given a starting point x(0)
Repeat
    1. Determine a descent direction ∆x(k)
    2. Line search: choose a step size t(k) > 0
    3. Update: x(k+1) = x(k) + t(k) ∆x(k)
Until a stopping criterion is satisfied.

The method thus requires several choices: a starting point x(0), a descent direction ∆x(k), a step size t(k), and a stopping criterion.
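The general descent loop above can be sketched directly in code. This is a minimal sketch, not the definitive implementation: the fixed step size 0.4, the tolerance, and the example quadratic are assumptions for illustration, and the `step_size` callable stands in for whichever line search is used.

```python
def descent(x, grad_f, step_size, tol=1e-8, max_iter=1000):
    """Generic descent loop: direction = -gradient, step chosen by `step_size`."""
    for _ in range(max_iter):
        g = grad_f(x)
        # Stopping criterion: gradient norm small enough.
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        dx = [-gi for gi in g]                         # descent direction (steepest descent)
        t = step_size(x, dx)                           # line search chooses t > 0
        x = [xi + t * dxi for xi, dxi in zip(x, dx)]   # update x(k+1) = x(k) + t dx
    return x

# Assumed example: f(x) = (x1 - 1)^2 + (x2 + 2)^2, with a fixed step t = 0.4.
xmin = descent([0.0, 0.0],
               lambda x: [2 * (x[0] - 1), 2 * (x[1] + 2)],
               lambda x, dx: 0.4)
```

The minimizer of the example is (1, −2), which the loop approaches geometrically.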
Choice of the step size t(k) I

When updating the current position:
    small steps: inefficient
    large steps: potentially bad results

1. Exact line search: t is chosen to minimize f along the ray {x + t∆x | t ≥ 0}:

    t = arg min_{s ≥ 0} f(x + s∆x)

Exercise: Show that ∇f(x(k) + t∆x(k))^T ∆x(k) = 0 when t is fixed by exact line search.
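For a quadratic f the exact step has a closed form, which makes the exercise easy to check numerically. A small sketch, assuming the example f(x) = x1² + 5·x2² (not from the slides):

```python
def grad(x):
    # f(x) = x1^2 + 5 x2^2  ->  grad f = (2 x1, 10 x2)
    return [2 * x[0], 10 * x[1]]

x = [2.0, 1.0]
g = grad(x)
dx = [-gi for gi in g]                        # steepest-descent direction

# Writing f = (1/2) x^T A x with A = diag(2, 10), minimizing phi(s) = f(x + s dx)
# gives the exact step t = -(grad f(x) . dx) / (dx^T A dx).
num = -(g[0] * dx[0] + g[1] * dx[1])
den = 2 * dx[0] ** 2 + 10 * dx[1] ** 2
t = num / den

x_new = [x[0] + t * dx[0], x[1] + t * dx[1]]
g_new = grad(x_new)
slope = g_new[0] * dx[0] + g_new[1] * dx[1]   # = grad f(x + t dx)^T dx, should vanish
```

At the exact-line-search step the directional derivative `slope` is zero, which is precisely the claim of the exercise.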
Choice of the step size t(k) II

Remark: most of the time you need numerical methods to find the root of φ′(s), with

    φ(s) = f(x + s∆x)

Among them:
    1. Dichotomous and golden-section search
    2. Bisection
    3. Newton's method

Choice of the step size t(k) III

2. Backtracking line search: since most line searches are inexact in practice, the step length is usually chosen to approximately minimize f, or simply to reduce f enough. Backtracking line search is one of them:

Theorem (Backtracking line search)
Given a descent direction ∆x for f at x, and two constants α ∈ ]0; 0.5[, β ∈ ]0; 1[:
    Given t := 1
    while f(x + t∆x) > f(x) + α t ∇f(x)^T ∆x,  t := β t
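The backtracking rule above translates almost line for line into code. A minimal sketch, assuming the common parameter choices α = 0.3 and β = 0.8 (values within the stated ranges, not fixed by the slides) and an assumed test quadratic:

```python
def backtracking(f, grad_fx, x, dx, alpha=0.3, beta=0.8):
    """Shrink t until the sufficient-decrease condition holds:
    f(x + t dx) <= f(x) + alpha * t * grad_f(x)^T dx."""
    t = 1.0
    fx = f(x)
    slope = sum(g * d for g, d in zip(grad_fx, dx))   # grad f(x)^T dx (< 0 for a descent dir)
    while f([xi + t * di for xi, di in zip(x, dx)]) > fx + alpha * t * slope:
        t *= beta                                     # geometric shrinking of the step
    return t

# Usage on f(x) = x1^2 + 5 x2^2 with the steepest-descent direction.
f = lambda x: x[0] ** 2 + 5 * x[1] ** 2
x = [2.0, 1.0]
g = [2 * x[0], 10 * x[1]]
t = backtracking(f, g, x, [-gi for gi in g])
```

The loop always terminates for a descent direction, since the condition holds for all sufficiently small t.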
Choice of a descent direction ∆x I

Definition (Gradient)
With x ∈ R^d, x = (x1, x2, ..., xd)^T, the gradient is defined by:

    ∇f(x) = ( ∂f(x)/∂x1 , ... , ∂f(x)/∂xd )^T

Choice of a descent direction ∆x II

A natural choice for the search direction is the negative gradient ∆x = −∇f(x); the resulting algorithm is then called the gradient descent algorithm.
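Putting the pieces together — the negative-gradient direction, a backtracking step, and a gradient-norm stopping test — gives gradient descent. A minimal sketch; the tolerance, the α and β values, and the quadratic test function are assumptions for illustration:

```python
def gradient_descent(f, grad_f, x, eps=1e-6, alpha=0.3, beta=0.8, max_iter=5000):
    for _ in range(max_iter):
        g = grad_f(x)
        if sum(gi * gi for gi in g) ** 0.5 < eps:       # stop when the gradient norm is small
            break
        dx = [-gi for gi in g]                          # descent direction: -grad f(x)
        t, fx = 1.0, f(x)
        slope = sum(gi * di for gi, di in zip(g, dx))
        while f([xi + t * di for xi, di in zip(x, dx)]) > fx + alpha * t * slope:
            t *= beta                                   # backtracking line search
        x = [xi + t * di for xi, di in zip(x, dx)]
    return x

# Assumed example: f(x) = x1^2 + 5 x2^2, minimized at the origin.
x_hat = gradient_descent(lambda x: x[0] ** 2 + 5 * x[1] ** 2,
                         lambda x: [2 * x[0], 10 * x[1]],
                         [2.0, 1.0])
```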
A common choice of stopping criterion is a small gradient norm: stop once ‖∇f(x(k+1))‖ < ε.

Newton's method I

Definition (Hessian matrix)
The Hessian is the d × d matrix of second partial derivatives:

    H_f(x) = ∇²f(x) = [ ∂²f/∂x1∂x1  ···  ∂²f/∂x1∂xd ]
                      [      ⋮               ⋮       ]
                      [ ∂²f/∂xd∂x1  ···  ∂²f/∂xd∂xd ]

Theorem (Newton's method)

    x(k+1) = x(k) − [∇²f(x(k))]⁻¹ ∇f(x(k))

or, rewritten using the Hessian matrix,

    x(k+1) = x(k) − [H_f(x(k))]⁻¹ ∇f(x(k))
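The Newton update can be sketched for a 2-D function; inverting the 2 × 2 Hessian by hand keeps the example dependency-free. The test function f(x) = x1² + 5·x2² is an assumed example, chosen because on a quadratic a single Newton step reaches the minimizer exactly:

```python
def newton_step(grad, hess, x):
    """One Newton update x+ = x - H^{-1} grad, with the 2x2 solve written out."""
    (a, b), (c, d) = hess
    det = a * d - b * c
    # Solve H s = grad by Cramer's rule, then step x+ = x - s.
    s0 = (d * grad[0] - b * grad[1]) / det
    s1 = (-c * grad[0] + a * grad[1]) / det
    return [x[0] - s0, x[1] - s1]

# f(x) = x1^2 + 5 x2^2: grad f = (2 x1, 10 x2), Hessian = diag(2, 10).
x = [3.0, -2.0]
x_new = newton_step([2 * x[0], 10 * x[1]], [[2.0, 0.0], [0.0, 10.0]], x)
```

Here `x_new` lands on the minimizer (0, 0) in one step, illustrating why Newton's method is so fast near the optimum, where f is well approximated by its quadratic model.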
Newton's method II

Exercise: Prove Newton's method (i.e. derive the update from the second-order Taylor expansion of f around x(k)).

Alternatives:
    Quasi-Newton methods
    ...

Applications

Finding a minimum or a maximum has many applications; we cite here just one, the mean-shift approach.
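As a rough illustration of the mean-shift idea (a hedged 1-D sketch under assumed data and bandwidth, not taken from the slides): each iteration moves the current point to the kernel-weighted mean of the data, which climbs the kernel density estimate towards a mode.

```python
import math

def mean_shift_1d(x, data, bandwidth=1.0, iters=50):
    """Repeatedly move x to the Gaussian-kernel weighted mean of `data`."""
    for _ in range(iters):
        w = [math.exp(-((x - d) / bandwidth) ** 2 / 2) for d in data]
        x = sum(wi * di for wi, di in zip(w, data)) / sum(w)
    return x

# Two assumed clusters around 0 and 10; starting near the first one,
# mean shift settles on that cluster's mode.
data = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]
mode = mean_shift_1d(2.0, data)
```

Because the weights decay with distance, the far cluster contributes almost nothing and the iterate converges to the mode near 0.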