Numerical Optimization: Introduction and
gradient-based methods
Master 2 Recherche LRI
Apprentissage Statistique et Optimisation
Anne Auger
Inria Saclay-Ile-de-France
November 2011
http://tao.lri.fr/tiki-index.php?page=Courses
1 Numerical optimization
    A few examples
      Data fitting, regression
      In machine learning
      Black-box optimization
    Different notions of optimum
    Typical difficulties in optimization
    Deterministic vs stochastic - local versus global methods
2 Mathematical tools and optimality conditions
    First order differentiability and gradients
    Second order differentiability and Hessian
    Optimality conditions for unconstrained optimization
3 Gradient based optimization algorithms
    Root finding methods (1-D optimization)
    Relaxation algorithm
    Descent methods
      Gradient descent, Newton descent, BFGS
    Trust regions methods
Numerical optimization
Numerical or continuous optimization
Unconstrained optimization
Optimize a function whose parameters to optimize are “continuous” (live in Rⁿ):

    min_{x∈Rⁿ} f(x)

n: dimension of the problem
  corresponds to the dimension of the Euclidean vector space Rⁿ
Maximization vs Minimization
  Maximize f = Minimize −f
Numerical optimization
A few examples
Analytical functions
Convex quadratic function:

    f(x) = ½ xᵀAx + bᵀx + c              (1)
         = ½ (x − x₀)ᵀA(x − x₀) + c      (2)

where A ∈ Rⁿˣⁿ is symmetric positive definite, b ∈ Rⁿ, c ∈ R.
Exercise
Express x0 in (2) as a function of A and b. Express the minimum of f .
For n = 2, plot the level sets of a convex-quadratic function where level
sets are defined as the sets
Lc = {x ∈ Rn |f (x) = c}.
Data fitting - Data calibration
Objective
Given a sequence of data points (xᵢ, yᵢ) ∈ Rᵖ × R, i = 1, …, N, find a model “y = f(x)” that explains the data
  e.g. experimental measurements in biology, chemistry, ...
In general, choice of a parametric model or family of functions (f_θ)_{θ∈Rⁿ}
  use of expertise for choosing the model, or only simple models (linear, quadratic) are affordable
Try to find the parameter θ ∈ Rⁿ fitting best to the data

Fitting best to the data
Minimize the quadratic error:

    min_{θ∈Rⁿ} Σᵢ₌₁ᴺ |f_θ(xᵢ) − yᵢ|²
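For a model that is linear in its parameters, the quadratic-error minimization above reduces to a linear least-squares problem. A minimal sketch, assuming a hypothetical quadratic model f_θ(x) = θ₀x² + θ₁x + θ₂ and synthetic noiseless data (the parameter values are illustrative, not from the slides):

```python
import numpy as np

# Synthetic 1-D data generated from a quadratic model with known parameters
# (theta_true is an assumption chosen for illustration).
x = np.linspace(-2, 2, 50)
theta_true = np.array([2.0, -1.0, 0.5])
y = theta_true[0] * x**2 + theta_true[1] * x + theta_true[2]

# Design matrix for f_theta(x) = theta_0 x^2 + theta_1 x + theta_2;
# minimizing sum_i |f_theta(x_i) - y_i|^2 is then linear least squares.
Phi = np.column_stack([x**2, x, np.ones_like(x)])
theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

With noiseless data the recovered `theta_hat` matches `theta_true` up to numerical precision.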
Optimization and machine learning
(Simple) Linear regression
Given a set of data (examples) {yᵢ, xᵢ₁, …, xᵢₚ}, i = 1 … N, and writing Xᵢᵀ = (xᵢ₁, …, xᵢₚ):

    min_{w∈Rᵖ, β∈R} Σᵢ₌₁ᴺ |wᵀXᵢ + β − yᵢ|² = min ‖X̃w − y‖²

same as data fitting with a linear model, i.e. f_(w,β)(x) = wᵀx + β, θ = (w, β) ∈ Rᵖ⁺¹
Optimization and Machine Learning (cont.)
Clustering
k-means
Given a set of observations (x₁, …, xₚ) where each observation is an n-dimensional real vector, k-means clustering aims to partition the p observations into k sets (k ≤ p), S = {S₁, S₂, …, Sₖ}, so as to minimize the within-cluster sum of squares:

    minimize_{μ₁,…,μₖ} Σᵢ₌₁ᵏ Σ_{xⱼ∈Sᵢ} ‖xⱼ − μᵢ‖²
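The standard way to attack this objective is Lloyd's algorithm, alternating an assignment step and a center-update step; each step cannot increase the within-cluster sum of squares. A minimal sketch (not the course's reference implementation; initialization by random sampling of the data is an assumption):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd iteration minimizing the within-cluster
    sum of squares (a sketch, not production code)."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]  # initial centers
    for _ in range(iters):
        # assignment step: each point goes to its nearest center
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # update step: each center becomes the mean of its cluster
        for i in range(k):
            if np.any(labels == i):
                mu[i] = X[labels == i].mean(axis=0)
    return mu, labels

# two well-separated blobs; Lloyd recovers the blob partition
X = np.array([[0., 0.], [0.5, 0.], [0., 0.5],
              [10., 10.], [10.5, 10.], [10., 10.5]])
mu, labels = kmeans(X, 2)
```

Lloyd's algorithm only finds a local optimum of the k-means objective, which is why practical codes restart it from several initializations.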
Black-box optimization
Optimization of well placement
Different notions of optimum
local versus global minimum
local minimum x*: for all x in a neighborhood of x*, f(x) ≥ f(x*)
global minimum x*: for all x, f(x) ≥ f(x*)
essential infimum of a function:
  Given a measure μ on Rⁿ, the essential infimum of f is defined as

      ess inf f = sup{b ∈ R : μ({x : f(x) < b}) = 0}
important to keep in mind in the context of stochastic optimization
algorithms
Typical difficulties in optimization
What Makes a Function Difficult to Solve?
ruggedness
  non-smooth, discontinuous, multimodal, and/or noisy function
dimensionality
  (considerably) larger than three
non-separability
  dependencies between the objective variables
ill-conditioning
  a narrow ridge

(figure: a rugged 1-D landscape, cut from a 3-D example, solvable with an evolution strategy)
Numerical optimization
Typical difficulties in optimization
Curse of Dimensionality
The term Curse of dimensionality (Richard Bellman) refers to problems
caused by the rapid increase in volume associated with adding extra
dimensions to a (mathematical) space.
Example: Consider placing 100 points onto a real interval, say [0, 1]. Getting similar coverage, in terms of distance between adjacent points, of the 10-dimensional space [0, 1]¹⁰ would require 100¹⁰ = 10²⁰ points. The 100 points now appear as isolated points in a vast empty space.
Consequently, a search policy (e.g. exhaustive search) that is valuable in
small dimensions might be useless in moderate or large dimensional search
spaces.
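The arithmetic of the example above can be checked directly: keeping a spacing of m points per axis, a d-dimensional grid needs m**d points.

```python
# m points per axis at a fixed spacing on [0, 1] require m**d points
# to cover [0, 1]^d at the same spacing.
m, d = 100, 10
points_needed = m ** d
print(points_needed)  # 100**10 == 10**20
```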
Separable Problems
Definition (Separable Problem)
A function f is separable if
    arg min_{(x₁,…,xₙ)} f(x₁, …, xₙ) = (arg min_{x₁} f(x₁, …), …, arg min_{xₙ} f(…, xₙ))

⇒ it follows that f can be optimized in a sequence of n independent
1-D optimization processes

Example: Additively decomposable functions

    f(x₁, …, xₙ) = Σᵢ₌₁ⁿ fᵢ(xᵢ)

(figure: level sets of the Rastrigin function)
Non-Separable Problems
Building a non-separable problem from a separable one¹,²
Rotating the coordinate system
  f : x ↦ f(x) separable
  f : x ↦ f(Rx) non-separable, with R a rotation matrix

(figure: level sets of a separable function, before and after rotation by R)

¹ Hansen, Ostermeier, Gawelczyk (1995). On the adaptation of arbitrary normal mutation distributions in evolution strategies: The generating set adaptation. Sixth ICGA, pp. 57-64, Morgan Kaufmann
² Salomon (1996). “Reevaluating Genetic Algorithm Performance under Coordinate Rotation of Benchmark Functions; A survey of some theoretical and practical aspects of genetic algorithms.” BioSystems, 39(3):263-278
Ill-Conditioned Problems
Curvature of level sets
Consider the convex-quadratic function

    f(x) = ½ (x − x*)ᵀH(x − x*) = ½ Σᵢ hᵢᵢ xᵢ² + ½ Σᵢ≠ⱼ hᵢⱼ xᵢxⱼ

H is the Hessian matrix of f, symmetric positive definite
  gradient direction: −f′(x)ᵀ
  Newton direction: −H⁻¹f′(x)ᵀ
Ill-conditioning means squeezed level sets (high curvature); in the pictured example the condition number equals nine. Condition numbers up to 10¹⁰ are not unusual in real-world problems.
If H ≈ I (small condition number of H), first order information (e.g. the gradient) is sufficient. Otherwise second order information (estimation of H⁻¹) is necessary.
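The difference between the two directions can be seen numerically on a small quadratic. A sketch, where the Hessian H and the minimizer xs are illustrative assumptions (diagonal H, condition number 100):

```python
import numpy as np

# Gradient vs Newton direction on an ill-conditioned convex quadratic
# f(x) = 1/2 (x - xs)^T H (x - xs).
H = np.diag([1.0, 100.0])      # condition number 100
xs = np.zeros(2)               # the minimizer
x = np.array([1.0, 1.0])

grad = H @ (x - xs)                       # gradient of f at x
newton_step = -np.linalg.solve(H, grad)   # -H^{-1} grad

# one Newton step lands exactly on the optimum of a quadratic,
# while -grad points in a quite different direction when H is far from I
```

The assertion that the Newton step reaches the optimum in one move holds only because f is exactly quadratic; on general functions the Newton direction is just a better-scaled descent direction.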
What Makes a Function Difficult to Solve?
. . . and what can be done

The Problem → The Approach in ESs and continuous EDAs

Ruggedness → non-local policy, large sampling width (step-size): as large as possible while preserving a reasonable convergence speed; stochastic, non-elitistic, population-based method; recombination operator serves as repair mechanism

Dimensionality, Non-Separability → exploiting the problem structure: locality, neighborhood, encoding

Ill-conditioning → second order approach: changes the neighborhood metric
Numerical optimization
Deterministic vs stochastic - local versus global methods
Different optimization methods
Deterministic methods
local methods
Convex optimization methods - gradient-based methods
  most often require gradients of the functions
  converge to local optima; fast if the function satisfies the right assumptions (smooth enough)
deterministic methods that do not require convexity: simplex method, pattern search methods
Different optimization methods (cont.)
Stochastic methods
global methods
use of randomness to be able to escape local optima
pure random search, simulated annealing (SA)
not reasonable methods in practice (too slow)
genetic algorithms
not powerful for numerical optimization, originally introduced for
binary search spaces {0, 1}n
evolution strategies
powerful for numerical optimization
Remarks
This list cannot be exhaustive
The classification is not that clear-cut
  methods combining deterministic and stochastic components do of course exist
Mathematical tools and optimality conditions
First order differentiability and gradients
Differentiability
Generalization of the derivative in 1-D
Given a complete normed vector space (E, ‖·‖) (a Banach space), consider f : U ⊂ E → R with U an open subset of E.
f is differentiable at x ∈ U if there exists a continuous linear form Df_x such that

    f(x + h) = f(x) + Df_x(h) + o(‖h‖)    (3)

Df_x is the differential of f at x

Exercise
Consider E = Rⁿ with the scalar product ⟨x, y⟩ = xᵀy. Let a ∈ Rⁿ; show that f(x) = ⟨a, x⟩ is differentiable and compute its differential.
Gradient
If the norm ‖·‖ comes from a scalar product, i.e. ‖x‖ = √⟨x, x⟩ (the Banach space E is then called a Hilbert space), the gradient of f at x, denoted ∇f(x), is defined as the element of E such that

    Df_x(h) = ⟨∇f(x), h⟩    (4)

(Riesz representation theorem)

Taylor formula - order one
Replacing the differential by (4) in (3) we obtain the Taylor formula:

    f(x + h) = f(x) + ⟨∇f(x), h⟩ + o(‖h‖)
Gradient (cont)
Exercise
Compute the gradient of the function f(x) = ⟨a, x⟩.

Exercise
Level sets are the sets defined as

    L_c = {x ∈ Rⁿ | f(x) = c}.

Let x₀ ∈ L_c ≠ ∅. Show that ∇f(x₀) is orthogonal to the level set at x₀.
Gradient
Examples
  if f(x) = ⟨a, x⟩, ∇f(x) = a
  in Rⁿ, if f(x) = xᵀAx, then ∇f(x) = (A + Aᵀ)x
    particular case: if f(x) = ‖x‖², then ∇f(x) = 2x
  in R, ∇f(x) = f′(x)
  in (Rⁿ, ‖·‖₂), where ‖x‖₂ = √⟨x, x⟩ is the Euclidean norm,

      ∇f(x) = (∂f/∂x₁(x), …, ∂f/∂xₙ(x))ᵀ

    the left-hand side being the “natural” gradient (Amari, 2000) and the right-hand side the “vanilla” gradient

Attention: this equality will not hold for all norms. This has some importance in machine learning; see the natural gradient topic introduced by Amari.
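The closed form ∇f(x) = (A + Aᵀ)x from the examples above can be checked against a finite-difference approximation of the partial derivatives. A minimal sketch; the test matrix A and point x are illustrative assumptions:

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central finite-difference approximation of the coordinate-wise
    ('vanilla') gradient."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

A = np.array([[2.0, 1.0], [0.0, 3.0]])   # deliberately non-symmetric
f = lambda x: x @ A @ x
x = np.array([1.0, -2.0])
# should match the closed form (A + A^T) x from the slide
g = num_grad(f, x)
```

Using a non-symmetric A makes the check discriminating: the frequent mistake ∇f(x) = 2Ax would fail it.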
Mathematical tools and optimality conditions
Second order differentiability and Hessian
Second order differentiability
(first order) differential: gives a linear local approximation
second order differential: gives a quadratic local approximation
Definition: second order differentiability
f : U ⊂ E → R is differentiable at the second order at x ∈ U if it is differentiable in a neighborhood of x and if u ↦ Df_u is differentiable at x
Second order differentiability (cont.)
Other definition
f : U ⊂ E → R is differentiable at the second order at x ∈ U iff there exist a continuous linear application Df_x and a bilinear symmetric continuous application D²f_x such that

    f(x + h) = f(x) + Df_x(h) + ½ D²f_x(h, h) + o(‖h‖²)

In a Hilbert space (E, ⟨·,·⟩):

    D²f_x(h, h) = ⟨∇²f(x)(h), h⟩

where ∇²f(x) : E → E is a symmetric continuous operator.
Hessian matrix
In (Rⁿ, ⟨x, y⟩ = xᵀy), ∇²f(x) is represented by a symmetric matrix called the Hessian matrix. It can be computed as

    ∇²f = ( ∂²f/∂x₁²      ∂²f/∂x₁∂x₂   ···   ∂²f/∂x₁∂xₙ )
          ( ∂²f/∂x₂∂x₁    ∂²f/∂x₂²     ···   ∂²f/∂x₂∂xₙ )
          ( ⋮              ⋮            ⋱     ⋮          )
          ( ∂²f/∂xₙ∂x₁    ∂²f/∂xₙ∂x₂   ···   ∂²f/∂xₙ²    )

Exercise
In Rⁿ, compute the Hessian matrix of f(x) = ½ xᵀAx.
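The matrix of second partials can also be estimated by finite differences, which is handy for checking a hand-computed Hessian. A sketch on a hypothetical smooth function f(x) = sin(x₁) + x₁x₂² (chosen so as not to spoil the exercise above):

```python
import numpy as np

def num_hessian(f, x, h=1e-5):
    """Finite-difference Hessian via second-order central differences
    (a sketch; symmetric by construction)."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

f = lambda x: np.sin(x[0]) + x[0] * x[1] ** 2
x = np.array([0.5, 1.5])
H = num_hessian(f, x)
# closed form: [[-sin(x1), 2*x2], [2*x2, 2*x1]]
H_exact = np.array([[-np.sin(0.5), 3.0], [3.0, 1.0]])
```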
Mathematical tools and optimality conditions
Optimality conditions for unconstrained optimization
Optimality condition
First order necessary condition
for 1-D optimization problems f : R → R
  Assume f is differentiable
  x* is a local optimum ⇒ f′(x*) = 0
    not a sufficient condition: consider f(x) = x³
    proof via Taylor formula: f(x* + h) = f(x*) + f′(x*)h + o(|h|)
  Points y such that f′(y) = 0 are called critical or stationary points.

Generalization
If f : E → R with (E, ⟨·,·⟩) a Hilbert space is differentiable,
  x* is a local minimum of f ⇒ ∇f(x*) = 0.
    proof via Taylor formula
Optimality conditions
Second order necessary and sufficient conditions
If f is twice continuously differentiable:
  if x* is a local optimum, then ∇f(x*) = 0 and ∇²f(x*) is positive semi-definite
  if ∇f(x*) = 0 and ∇²f(x*) is positive definite, then there is an α > 0 such that, for all x in a neighborhood of x*, f(x) ≥ f(x*) + α‖x − x*‖²
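These conditions translate directly into a numerical check: verify that the gradient vanishes and that the Hessian's eigenvalues are all positive. A sketch on a hypothetical test function f(x) = (x₁ − 1)² + 2(x₂ + ½)², whose gradient and Hessian are written out by hand:

```python
import numpy as np

# second-order sufficient condition at a candidate point
grad = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 0.5)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])

xstar = np.array([1.0, -0.5])
is_stationary = np.allclose(grad(xstar), 0)              # ∇f(x*) = 0
is_pos_def = np.all(np.linalg.eigvalsh(hess(xstar)) > 0)  # ∇²f(x*) ≻ 0
# both together => x* is a strict local minimizer
```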
Gradient based optimization algorithms
Optimization algorithms
Iterative procedures that generate a sequence (xn )n∈N where at each
time step n, xn is the estimate of the optimum.
Convergence
No exact result in general; however, the algorithms aim at converging to optima (local or global), i.e.

    ‖xₙ − x*‖ → 0 as n → ∞

convergence speed of the algorithm = how fast ‖xₙ − x*‖ goes to zero
Equivalently, given a fixed precision ε, the algorithms aim at approaching the optimum x* with precision ε in finite time
running time = how many iterations are needed to reach the precision ε
Gradient based optimization algorithms
Root finding methods (1-D optimization)
Newton-Raphson method
Method to find a root of a function f : R → R
  linear approximation of f: f(x + h) = f(x) + h f′(x) + o(|h|)
  at first order, f(x + h) = 0 leads to h = −f(x)/f′(x)
Algorithm: xₙ₊₁ = xₙ − f(xₙ)/f′(xₙ)
  quadratic convergence: |xₙ₊₁ − x*| ≤ μ|xₙ − x*|²
    provided the initial point is close enough to the root x* and there is no other critical point in a neighborhood of x*
  secant method: replaces the derivative computation by its finite-difference approximation

Application to the minimization of f
  try to solve the equation f′(x) = 0
  algorithm: xₙ₊₁ = xₙ − f′(xₙ)/f″(xₙ)
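The root-finding iteration above fits in a few lines. A minimal sketch (stopping rule and iteration cap are assumptions, not from the slides), applied to f(x) = x² − 2:

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson iteration x_{n+1} = x_n - f(x_n)/f'(x_n) (a sketch)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# root of f(x) = x^2 - 2, starting near the positive root
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
```

The quadratic convergence shows up as a doubling of correct digits per iteration once the iterate is close to the root; started far away, or near a point where f′ vanishes, the iteration can diverge.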
Bisection method
Another method to find the root of a 1-D function
  bisects an interval and then selects the subinterval in which a root must lie
  the method is guaranteed to converge if f is continuous and the initial points bracket the solution

      |xₙ − x*| ≤ |a₁ − b₁| / 2ⁿ

(figure: bisection steps, Wikipedia image)
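A minimal sketch of the bisection loop, using the interval-halving bound above as the stopping criterion (the tolerance value is an assumption):

```python
def bisection(f, a, b, tol=1e-10):
    """Bisection: halve [a, b], keeping the half that contains the sign
    change (assumes f(a) * f(b) < 0)."""
    assert f(a) * f(b) < 0, "initial points must bracket a root"
    while b - a > tol:
        m = (a + b) / 2
        if f(a) * f(m) <= 0:   # sign change in the left half
            b = m
        else:                  # sign change in the right half
            a = m
    return (a + b) / 2

root = bisection(lambda x: x * x - 2, 0.0, 2.0)
```

Unlike Newton's method it only converges linearly (one bit of accuracy per step), but it never diverges as long as the initial bracket is valid.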
Gradient based optimization algorithms
Relaxation algorithm
Relaxation algorithm
Naive approach for optimization of f : Rn → R
Algorithm
successive optimization w.r.t. each variable
  1. choose an initial point x⁰, k = 1
  2. while not happy
       for i = 1, …, n
           xᵢᵏ⁺¹ = arg min_{x∈R} f(x₁ᵏ⁺¹, …, xᵢ₋₁ᵏ⁺¹, x, xᵢ₊₁ᵏ, …, xₙᵏ)
             use a 1-D minimization procedure
       endfor
       k = k + 1

Critics
  Very bad for non-separable problems
  Not invariant under a change of coordinate system
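The scheme above can be sketched directly, with golden-section search standing in for the 1-D minimization procedure (the search interval, tolerance, and sweep count are assumptions; on the separable test function below, one sweep already reaches the optimum):

```python
import numpy as np

def argmin_1d(g, a=-10.0, b=10.0, tol=1e-8):
    """Golden-section search for the minimizer of a unimodal 1-D
    function on [a, b] (stands in for any 1-D minimizer)."""
    phi = (5 ** 0.5 - 1) / 2
    while b - a > tol:
        c, d = b - phi * (b - a), a + phi * (b - a)
        if g(c) <= g(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def relaxation(f, x0, sweeps=10):
    """Successive 1-D minimization with respect to each variable."""
    x = np.array(x0, dtype=float)
    for _ in range(sweeps):
        for i in range(len(x)):
            x[i] = argmin_1d(
                lambda t, i=i: f(np.concatenate([x[:i], [t], x[i + 1:]])))
    return x

f = lambda x: (x[0] - 1) ** 2 + 3 * (x[1] + 2) ** 2   # separable
xmin = relaxation(f, [5.0, 5.0])
```

On a strongly non-separable (e.g. rotated ill-conditioned) function, the same loop zig-zags and may stall, which is exactly the criticism stated above.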
Gradient based optimization algorithms
Descent methods
Descent methods
General principle
1. choose an initial point x⁰, k = 1
2. while not happy
   2.1 choose a descent direction dₖ ≠ 0
   2.2 choose a step-size σₖ > 0
   2.3 set xₖ₊₁ = xₖ + σₖ dₖ, k = k + 1
Remaining questions
How to choose dk ?
How to choose σk ?
Gradient descent
Rationale: dk = −∇f (xk ) is a descent direction
Indeed, for f differentiable,

    f(x − σ∇f(x)) = f(x) − σ‖∇f(x)‖² + o(σ) < f(x)

for σ small enough.

Step-size
  optimal step-size:

      σₖ = arg min_σ f(xₖ − σ∇f(xₖ))    (5)

  in general, precise minimization of (5) is expensive and unnecessary; instead, a line search algorithm executes a limited number of trial steps until a loose approximation of the minimum is found
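One common way to realize such a loose line search is backtracking with the Armijo sufficient-decrease test. A sketch (the Armijo constant 1e-4, halving factor, and iteration budget are assumptions, not prescribed by the slides):

```python
import numpy as np

def gradient_descent(f, grad, x0, iters=100):
    """Gradient descent with a backtracking (Armijo) line search."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        sigma = 1.0
        # halve the step until a sufficient decrease is obtained
        while f(x - sigma * g) > f(x) - 1e-4 * sigma * (g @ g):
            sigma *= 0.5
        x = x - sigma * g
    return x

f = lambda x: (x[0] - 3) ** 2 + 10 * (x[1] + 1) ** 2
grad = lambda x: np.array([2 * (x[0] - 3), 20 * (x[1] + 1)])
xmin = gradient_descent(f, grad, [0.0, 0.0])
```

The sufficient-decrease test is what makes the trial steps "loose": it accepts any step that reduces f by a fixed fraction of what the linear model predicts, rather than solving (5) precisely.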
Gradient descent
Convergence rate on a convex-quadratic function
  On f(x) = ½ xᵀAx − bᵀx + c, with A a symmetric positive definite matrix and Q = cond(A), the gradient descent algorithm with optimal step-size satisfies

      f(xₖ) − min f ≤ ((Q − 1)/(Q + 1))²ᵏ [f(x₀) − min f]

  The convergence rate depends on the starting point; however, ((Q − 1)/(Q + 1))²ᵏ gives the correct description of the convergence rate for almost all starting points.
  Very slow convergence for ill-conditioned problems
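The bound can be observed numerically: on a quadratic, the optimal step-size has the closed form σ = gᵀg / gᵀAg, so exact line search is cheap. A sketch with an assumed diagonal A of condition number Q = 10 and an assumed starting point:

```python
import numpy as np

# gradient descent with exact (optimal) step-size on f(x) = 1/2 x^T A x,
# whose minimum value is 0
A = np.diag([1.0, 10.0])
Q = 10.0
f = lambda x: 0.5 * x @ A @ x

x = np.array([1.0, 1.0])
f0 = f(x)
for k in range(1, 30):
    g = A @ x
    sigma = (g @ g) / (g @ A @ g)   # exact line search on a quadratic
    x = x - sigma * g
    # the bound ((Q-1)/(Q+1))^(2k) [f(x0) - min f] holds at every step
    assert f(x) <= ((Q - 1) / (Q + 1)) ** (2 * k) * f0 + 1e-15
```

With Q = 10 the per-step factor is (9/11)² ≈ 0.67; with Q = 10⁶ it would be ≈ 1 − 4·10⁻⁶, which is the "very slow convergence" the slide warns about.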
Newton algorithm, BFGS
Newton method
  descent direction: −[∇²f(xₖ)]⁻¹∇f(xₖ), the Newton direction
  points towards the optimum on f(x) = xᵀAx + bᵀx + c
  However, the Hessian matrix is expensive to compute in general, and its inversion is also not immediate

Broyden-Fletcher-Goldfarb-Shanno (BFGS) method
  constructs, in an iterative manner using solely the gradient, an approximation of the Newton direction −[∇²f(xₖ)]⁻¹∇f(xₖ)
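In practice one rarely codes BFGS by hand; SciPy ships an implementation that needs only function values and gradients. A sketch on the classic Rosenbrock function (the test function and starting point are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

# BFGS needs only the gradient; the inverse-Hessian approximation is
# built up internally from successive gradient differences.
f = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
grad = lambda x: np.array([
    -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
    200 * (x[1] - x[0] ** 2),
])
res = minimize(f, x0=np.array([-1.0, 1.0]), jac=grad, method="BFGS")
```

The minimizer of this function is (1, 1); BFGS reaches it in a few dozen iterations, whereas plain gradient descent struggles on the narrow curved valley.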
Gradient based optimization algorithms
Trust regions methods
Trust regions methods
General idea
  instead of choosing a descent direction and minimizing along this direction, trust-region algorithms construct a model mₖ of the objective function (usually a quadratic model)
  because the model may not be a good approximation of f far away from the current iterate xₖ, the search for a minimizer of the model is restricted to a region (the trust region) around the current iterate
  the trust region is usually a ball, whose radius is adapted (shrunk if no better solution is found)
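SciPy's `trust-ncg` method implements this idea: at each iterate it approximately minimizes a quadratic model within an adaptively sized region. A sketch on a hypothetical smooth convex test function (function, derivatives, and starting point are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2) ** 2 + (x[0] - 2) ** 4 + 5 * (x[1] + 1) ** 2
grad = lambda x: np.array([2 * (x[0] - 2) + 4 * (x[0] - 2) ** 3,
                           10 * (x[1] + 1)])
hess = lambda x: np.array([[2 + 12 * (x[0] - 2) ** 2, 0.0],
                           [0.0, 10.0]])

# trust-region Newton-CG: quadratic model from grad/hess, minimized
# within a ball whose radius the algorithm grows or shrinks
res = minimize(f, x0=np.zeros(2), jac=grad, hess=hess, method="trust-ncg")
```

The minimizer here is (2, −1). Compared with a pure line-search Newton method, the trust region makes the iteration robust when the quadratic model is poor far from the solution.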
Was not handled in this class
but you might want to have a look at it
Constrained optimization

    min_{x∈Rⁿ} f(x)
    subject to cᵢ(x) = 0, i = 1, …, p    (equality constraints)
    and bⱼ(x) ≥ 0, j = 1, …, k    (inequality constraints)