A Method for the Minimization of
Piecewise Smooth Objective Functions
Sabrina Fiege¹, Andreas Griewank², Andrea Walther¹
¹ Paderborn University, Germany
² Yachaytech, Ecuador
7th International Conference on Algorithmic Differentiation
September 12-15, 2016
Christ Church Oxford, UK
Our Optimization Approach
Our goal: Locate local optima of a piecewise smooth function by
successive approximation by piecewise linear models (⇒ piecewise linearization) and
explicit handling of the kink structure in the PL model.
Assumptions:
We consider Lipschitzian piecewise smooth functions f : R^n → R.
All nondifferentiabilities are incorporated by abs(); in particular
    min(u, v) = (v + u − abs(v − u))/2,
    max(u, v) = (v + u + abs(v − u))/2,
and complementarity conditions are covered.
Handling of abs() is included in the algorithmic differentiation tool ADOL-C.
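As a minimal illustration (not part of the slides), the example f(x1, x2) = max{x2² − max{x1, 0}, 0} used later in the talk can be taped with ADOL-C so that every kink enters through fabs(); the function name half_pipe, the tape number, and the enableMinMaxUsingAbs() call are assumptions to be checked against the installed ADOL-C version.

```cpp
// Sketch only: tapes the PS example f(x1,x2) = max{x2^2 - max{x1,0}, 0}
// using max(u,v) = (u + v + abs(v - u))/2, so every kink is recorded via fabs().
// Tape number and enableMinMaxUsingAbs() are assumptions, not taken from the slides.
#include <adolc/adolc.h>
#include <cmath>

static const short TAPE = 1;   // illustrative tape number

template <typename T>
T half_pipe(const T* x) {
    T inner = (x[0] + fabs(x[0])) / 2.0;      // max{x1, 0}
    T outer = x[1] * x[1] - inner;            // x2^2 - max{x1, 0}
    return (outer + fabs(outer)) / 2.0;       // max{. , 0}
}

void record_tape(const double x0[2]) {
    enableMinMaxUsingAbs();                   // assumed ADOL-C switch for abs-based taping
    trace_on(TAPE);
    adouble ax[2], ay;
    double y;
    ax[0] <<= x0[0];                          // mark independents
    ax[1] <<= x0[1];
    ay = half_pipe(ax);
    ay >>= y;                                 // mark dependent
    trace_off();
}
```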
Previous work:
A. Griewank, A. Walther, S. Fiege, T. Bosse: Lipschitz Optimization based on gray-box piecewise linearization, Mathematical Programming, 2015.
Observations

Solving min f(x) with f PL is not easy:
Global minimization is NP-hard.
Steepest descent with exact line search may fail.
Zeno behaviour is possible, i.e., a solution trajectory with an infinite number of direction changes in a finite amount of time.

[Figures: graph of a piecewise linear example f(x, y); nondifferentiable points of f in the (x1, x2)-plane with the selection functions f−2(x), f−1(x), f0(x), f1(x), f2(x) and the starting point x0 = (9, −3).]

J.-B. Hiriart-Urruty, C. Lemaréchal: Convex Analysis and Minimization Algorithms I, Springer, 1993.
Outline
1 Motivation
2 AD Drivers
3 Lipschitzian Piecewise Smooth Minimization
    Minimization of Piecewise Smooth Functions
    Minimization of Piecewise Linear Functions
    Convergence results
    Numerical Results
4 Conclusion and Outlook
Adapted Evaluation Procedure for PS Objectives
v_{i−n} = x_i,                        i = 1, ..., n
z_i     = ψ_i((v_j)_{j≺i}),
σ_i     = sign(z_i),                  i = 1, ..., s
v_i     = σ_i z_i = abs(z_i),
y       = ψ_s((v_j)_{j≺s})

Table: Reduced evaluation procedure

s ∈ N is the number of evaluations of the absolute value function.
σ ∈ {−1, 0, 1}^s is called the signature vector.
z ∈ R^s is called the switching vector.
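As a small illustration (not on the slide): f(x1, x2) = max(x1, x2) = (x1 + x2 + abs(x2 − x1))/2 fits this scheme with s = 1; the single switching variable is z_1 = x_2 − x_1, its sign gives σ_1 = sign(z_1), v_1 = abs(z_1), and the result is y = (x_1 + x_2 + v_1)/2.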
Selection Functions and Limiting Gradients
PS functions can be represented by selection functions fσ as
    f(x) ∈ {fσ(x) : σ ∈ E ⊂ {−1, 0, 1}^s},
where each fσ is continuously differentiable on an open neighborhood.
The Clarke subdifferential is given by
    ∂f(x) ≡ conv(∂^L f(x))   with   ∂^L f(x) ≡ {∇fσ(x) : fσ(x) = f(x)},
where the elements of ∂^L f(x) are called limiting gradients.

If f is piecewise smooth, the Clarke subdifferential is given by
    ∂f(x) ≡ conv{∇fσ(x) | σ ∈ E}.
⇒ ∂f(x) can be generated by the gradients of finitely many selection functions.
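As a one-dimensional illustration (not on the slide): for f(x) = abs(x) the essentially active selection functions are f_{+1}(x) = x and f_{−1}(x) = −x, so the limiting gradients at x = 0 are ∂^L f(0) = {−1, +1} and the Clarke subdifferential is their convex hull, ∂f(0) = [−1, 1].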
Directionally Active Gradient
A directionally active gradient g is given by
    g ≡ g(x; d) ∈ ∂^L f(x)   such that   f'(x; d) = g^T d,
and g(x; d) equals ∇fσ(x) of a locally differentiable selection function fσ.
AD driver:
    directional_active_gradient(tag,n,x,d,g,σg)
Returns g(x; d) and σg ≡ σ(x; d) at a given point x and a given direction d.
ADOL-C: https://projects.coin-or.org/ADOL-C
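A hedged sketch of how this driver might be called; the argument order follows the slide, while the buffer types (in particular for σg) are assumptions to be checked against the ADOL-C header.

```cpp
// Sketch only: queries a directionally active gradient via the driver named
// on the slide; argument order follows the slide, types are assumptions.
#include <adolc/adolc.h>
#include <vector>

void active_gradient_demo(short tape, int n, double* x, double* d) {
    int s = get_num_switches(tape);       // number of abs() calls on the tape
    std::vector<double> g(n);             // g(x; d), a limiting gradient
    std::vector<short>  sigma_g(s);       // signature sigma(x; d)
    directional_active_gradient(tape, n, x, d, g.data(), sigma_g.data());
    // g can now be used, e.g., as a descent-direction candidate at x.
}
```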
Piecewise Linearization
Construction of a tangent approximation for each elemental function:

Δv_i = Δv_j ± Δv_k                           for v_i = v_j ± v_k
Δv_i = v_j ∗ Δv_k + v_k ∗ Δv_j               for v_i = v_j ∗ v_k
Δv_i = ϕ_i'((v_j)_{j≺i}) ∗ Δ(v_j)_{j≺i}      for v_i = ϕ_i((v_j)_{j≺i}) ≠ abs(v_j)
Δv_i = abs(v_j + Δv_j) − v_i                 for v_i = abs(v_j)

One obtains the piecewise linearization
    f_{PL,x}(Δx) = f(x) + Δf(x; Δx)
of the original PS function f at the point x with the argument Δx.
A. Griewank: On stable piecewise linearization and generalized algorithmic differentiation, Optimization Methods & Software, 28(6), 1139–1178, 2013.
Piecewise Linearization: AD Drivers
zos_pl_forward(tag,1,n,1,x,y,z)
Evaluates y = f_{PL,x}(x) and the switching variables z with v_i = abs(z_i), i = 1, ..., s.

s = get_num_switches(tag)
Returns the number s of evaluations of the absolute value function.

fos_pl_forward(tag,1,n,x,deltax,y,deltay,z,deltaz)
Computes the increment Δy = Δf(x; Δx) and the increment Δz of the switching variables.
ADOL-C: https://projects.coin-or.org/ADOL-C
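A hedged usage sketch combining the three drivers above; the argument order is copied from the slide, while the buffer types and passing scalar outputs by address are assumptions.

```cpp
// Sketch only: evaluates the function and its piecewise linearization with the
// drivers listed above; argument order copied from the slide, types assumed.
#include <adolc/adolc.h>
#include <vector>

void pl_forward_demo(short tape, int n, double* x, double* dx) {
    int s = get_num_switches(tape);             // number of abs() evaluations
    double y = 0.0, dy = 0.0;
    std::vector<double> z(s), dz(s);
    // y = f_PL,x(x) and the switching variables z
    zos_pl_forward(tape, 1, n, 1, x, &y, z.data());
    // dy = Delta f(x; dx) and the increments dz of the switching variables
    fos_pl_forward(tape, 1, n, x, dx, &y, &dy, z.data(), dz.data());
    double f_pl = y + dy;   // f_PL,x(dx) = f(x) + Delta f(x; dx)
    (void)f_pl;
}
```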
The Abs-normal Form for PL Functions
Definition Abs-normal form for PL F
z
y
Z
∈ Rs×n ,
=
L
c1
c2
+
: Rn → R
Z
L
T
T
a
∈ Rs×s , a ∈ Rn ,
b
b
∈ Rs
x
|z |
c1
∈ Rs ,
c2
∈R
L is stricly lower triangular
Σ ≡ diag(σ) and |z | = Σ · z
PL function fPL approximation of PS function.
PL fPL,x
≡y
can be written as abs-normal form.
Andreas Griewank. On stable piecewise linearization and generalized algorithmic differentiation,
Optimization Methods & Software, 28(6), 1139–1178 2013.
Take the first row, solve it for z, and plug the result into the second row:

    fσ(x) ≡ y = c2 + b^T Σ (I − LΣ)^{−1} c1 + (a^T + b^T Σ (I − LΣ)^{−1} Z) x
              = γσ + gσ x,

with γσ ≡ c2 + b^T Σ (I − LΣ)^{−1} c1 and gσ ≡ a^T + b^T Σ (I − LΣ)^{−1} Z.

Computing the abs-normal form once provides all selection functions fσ of the PL model by choosing the corresponding signature matrix Σ.
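As a complement to the closed-form expression, the PL model can also be evaluated directly from its abs-normal data: since L is strictly lower triangular, z follows by forward substitution and (I − LΣ)^{−1} is never formed explicitly. The following is a minimal sketch under the assumptions of dense row-major storage and the hypothetical helper name eval_abs_normal.

```cpp
// Sketch only: evaluates y = f_PL,x at the argument x from the abs-normal data.
// z_i = c1_i + (Z x)_i + sum_{j<i} L_ij |z_j| is resolved by forward substitution.
#include <cmath>
#include <vector>

double eval_abs_normal(int n, int s,
                       const std::vector<double>& Z,   // s x n, row-major
                       const std::vector<double>& L,   // s x s, strictly lower triangular
                       const std::vector<double>& a,   // length n
                       const std::vector<double>& b,   // length s
                       const std::vector<double>& c1,  // length s
                       double c2,
                       const std::vector<double>& x)   // length n
{
    std::vector<double> z(s), absz(s);
    for (int i = 0; i < s; ++i) {
        double zi = c1[i];
        for (int j = 0; j < n; ++j) zi += Z[i * n + j] * x[j];
        for (int j = 0; j < i; ++j) zi += L[i * s + j] * absz[j];  // only j < i
        z[i]    = zi;
        absz[i] = std::fabs(zi);            // |z_i| = sigma_i * z_i
    }
    double y = c2;
    for (int j = 0; j < n; ++j) y += a[j] * x[j];
    for (int i = 0; i < s; ++i) y += b[i] * absz[i];
    return y;                                // y = f_PL,x(x)
}
```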
AD driver:
    abs_normal(tag,n,x,sigma,y,z,c1,c2,a,b,Z,L)
Computes a PL model for a given PS function f and a given point x.
Remark: c1, c2, a, b, Z and L depend only on the PS function f.
ADOL-C: https://projects.coin-or.org/ADOL-C
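A hedged allocation-and-call sketch for this driver; only the argument order is taken from the slide, while the matrix layout (ADOL-C's myalloc2) and passing scalar outputs by address are assumptions that should be checked against the installed ADOL-C header.

```cpp
// Sketch only: obtains the abs-normal data of f_PL,x at the point x.
#include <adolc/adolc.h>

void abs_normal_demo(short tape, int n, double* x) {
    int s = get_num_switches(tape);
    double y = 0.0, c2 = 0.0;
    short*   sigma = new short[s];
    double*  z  = myalloc1(s);
    double*  c1 = myalloc1(s);
    double*  a  = myalloc1(n);
    double*  b  = myalloc1(s);
    double** Z  = myalloc2(s, n);      // s x n
    double** L  = myalloc2(s, s);      // s x s, strictly lower triangular
    abs_normal(tape, n, x, sigma, &y, z, c1, &c2, a, b, Z, L);
    // c1, c2, a, b, Z, L now describe the PL model at x (see the remark above).
    myfree2(L); myfree2(Z);
    myfree1(b); myfree1(a); myfree1(c1); myfree1(z);
    delete[] sigma;
}
```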
LiPsMin
Lipschitzian Piecewise Smooth Minimization
LiPsMin(x, q): Preconditions: x ∈ R^n, κ > 0, q ≥ 0 sufficiently large.

Set x^0 = x and q^0 = q.
For k = 0, 1, 2, ...
  1. Generate the local model f_{PL,x^k}(·) at the current iterate x^k.
  2. Use PLMin(x^k, Δx^k, q^k) to solve the local overestimated problem
         Δx^k = argmin_{Δx} f_{PL,x^k}(Δx) + ½ (1 + κ) q^k ‖Δx‖².
  3. Update x^{k+1} = x^k + Δx^k.
  4. Set q^{k+1} = max{q̂^{k+1}, μ q^k + (1 − μ) q̂^{k+1}, q^0} with μ ∈ [0, 1] and
         q̂^{k+1} := q̂(x^k, Δx^k) ≡ 2 |f(x^{k+1}) − f_{PL,x^k}(Δx^k)| / ‖Δx^k‖².
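A minimal sketch of step 4, the update of the proximal weight q (not from the slides; variable names are illustrative, the formulas are transcribed from the algorithm above):

```cpp
// Sketch only: step 4 of LiPsMin as stated above.
#include <algorithm>
#include <cmath>

double update_q(double f_new,       // f(x^{k+1})
                double f_model,     // f_PL,x^k(Delta x^k)
                double step_norm2,  // ||Delta x^k||^2
                double q_k,         // q^k
                double q_0,         // q^0
                double mu)          // mu in [0, 1]
{
    double q_hat = 2.0 * std::fabs(f_new - f_model) / step_norm2;     // q-hat^{k+1}
    return std::max({q_hat, mu * q_k + (1.0 - mu) * q_hat, q_0});     // q^{k+1}
}
```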
Example: f : R² → R, f(x1, x2) = max{x2² − max{x1, 0}, 0}

[Figures: surface plots for k = 0 and k = 1; in each step the local QP in x^k is solved and yields the new iterate x^{k+1} = x^k + Δx^k, etc.]
Solution of the PL Function by PLMin()

PLMin(x, Δx, q): Preconditions: x, Δx ∈ R^n, q ≥ 0.

Set Δx_0 = 0. Identify σ = σ(x).
For j = 0, 1, 2, ...
  1. Determine the solution δx of the local overestimated QP on P_{σ_j}.
  2. Update Δx_{j+1} = Δx_j + δx.
  3. Compute the descent direction d by d(x) = shortest(qx, G).
  4. If ‖d‖ = 0: STOP.
  5. Identify the new polyhedron P_{σ_{j+1}} by the direction d.
Return Δx = Δx_j.

[Figures: polyhedral decomposition of the argument space illustrating Step 1 (solve the local QP) and Step 5 (identify the new polyhedron).]
Convergence of PLMin()
The argument space is divided into only finitely many polyhedra.
The function value is decreased each time we switch from one polyhedron to another.
The algorithm must reach a stationary point x̂ after finitely many steps.
Convergence of LiPsMin
Assumptions:
The PS function f is bounded below and has a compact level set N_0 for the starting point x^0.
f_{PL,x} is a second-order approximation of f of the form
    f(x + Δx) = f(x) + Δf(x; Δx) + ½ q̃ ‖Δx‖²
              ≤ f(x) + Δf(x; Δx) + ½ q̂ ‖Δx‖²   with   q̂ ≡ |q̃| ≥ 0.
There exists a monotonic mapping q̄(ρ) : [0, ∞) → [0, ∞) such that for all x ∈ N_0 and Δx ∈ R^n
    2 |f(x + Δx) − f(x) − Δf(x; Δx)| / ‖Δx‖² ≤ q̄(‖Δx‖).
Convergence of LiPsMin
Under these general assumptions, one has:
a) The sequence of steps {Δx^k}_{k∈N} exists.
b) The sequences {Δx^k}_{k∈N} and {q̂^k}_{k∈N} are uniformly bounded.
c) The sequence {q^k}_{k∈N} is bounded.

Convergence of LiPsMin:
Let f : R^n → R be a piecewise smooth function as described on the previous slide which has a bounded level set N_0 = {x ∈ R^n | f(x) ≤ f(x^0)}, with x^0 the starting point of the generated sequence of iterates {x^k}_{k∈N}. Then a cluster point x* of the infinite sequence {x^k}_{k∈N} generated by LiPsMin exists, and this cluster point is Clarke stationary.
Idea of proof:
The local overestimated problem solved by PLMin yields
    Δx^k = argmin_s ( Δf(x^k; s) + ½ (1 + κ) q^k ‖s‖² )
with a negative optimal value.
Combined with the quadratic approximation and q̄ = lim sup_{k→∞} q^{k+1}, one obtains
    f(x^k + Δx^k) − f(x^k) ≤ (½ q^{k+1} − (1 + κ) q^k) ‖Δx^k‖² ≤ (½ q̄ − (1 + κ) q^k) ‖Δx^k‖².
There is a subsequence of {Δx^k} tending to 0.
There is a subsequence of {x^k} converging to a cluster point x* since N_0 is compact.
If f̂_x is Clarke stationary at a Δx for one q ≥ 0, then the piecewise smooth function f is Clarke stationary at x.
Test function: MAXQUAD, PS and convex

    f(x) = max_{1≤i≤5} ( x^T A^i x − x^T b^i )

with
    A^i_{kj} = A^i_{jk} = e^{j/k} cos(jk) sin(i)   for j < k,
    A^i_{jj} = (j/10) |sin(i)| + Σ_{k≠j} |A^i_{jk}|,
    b^i_j    = e^{j/i} sin(ij).

Dimension: n = 10
Initial point: x^0 = 0
Optimal value: f(x*) = −0.8414083

             f*           #f     #g     #iter
  LiPsMin    -0.8414293   144    518    46
  HANSO      -0.8413940   2955   2955   65 + 3GS
  MPBNGC     -0.8414083   40     40     39

[Figure: log(f(x^k)) over the iterations for LiPsMin, HANSO, and MPBNGC.]
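For reference, a plain-double transcription of the MAXQUAD formulas above, assembling A^i and b^i on the fly; the function name maxquad is an illustrative choice and this is only a sketch, while the version actually taped for the experiments would use adoubles and an abs-based max.

```cpp
// Sketch only: MAXQUAD as reconstructed from the formulas above
// (i = 1..5, j, k = 1..n, n = 10). Reference double implementation.
#include <algorithm>
#include <cmath>
#include <limits>

double maxquad(const double* x, int n = 10) {
    double f = -std::numeric_limits<double>::infinity();
    for (int i = 1; i <= 5; ++i) {
        double quad = 0.0, lin = 0.0;              // x^T A^i x and x^T b^i
        for (int j = 1; j <= n; ++j) {
            double Ajj = j / 10.0 * std::fabs(std::sin((double)i));
            double row = 0.0;                      // Sum_{k != j} A^i_jk x_k
            for (int k = 1; k <= n; ++k) {
                if (k == j) continue;
                int lo = std::min(j, k), hi = std::max(j, k);
                double Ajk = std::exp((double)lo / hi) * std::cos((double)(lo * hi))
                             * std::sin((double)i);
                Ajj += std::fabs(Ajk);             // diagonal term Sum_{k != j} |A^i_jk|
                row += Ajk * x[k - 1];
            }
            quad += x[j - 1] * (Ajj * x[j - 1] + row);
            lin  += std::exp((double)j / i) * std::sin((double)(i * j)) * x[j - 1];
        }
        f = std::max(f, quad - lin);
    }
    return f;
}
```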
Test function: Chained Crescent I, PS and nonconvex

    f(x) = max{ Σ_{i=1}^{n−1} ( x_i² + (x_{i+1} − 1)² + x_{i+1} − 1 ),
                Σ_{i=1}^{n−1} ( −x_i² − (x_{i+1} − 1)² + x_{i+1} + 1 ) }

Dimension: n = 100
Initial point: x_i^0 = −1.5 when mod(i, 2) = 1, x_i^0 = 2.0 when mod(i, 2) = 0
Optimal value: f(x*) = 0

             f*        #f    #g    #iter
  LiPsMin    6.8e-13   237   471   79
  HANSO      1.1e-15   703   703   47 + 3GS
  MPBNGC     4.2e-09   96    96    66

[Figure: log(f(x^k)) over log(k) for LiPsMin, HANSO, and MPBNGC.]
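A short plain-double transcription of this objective (sketch only; the function name is illustrative). Writing the outer max through abs() matches the assumption that all kinks enter via the absolute value, so the same code taped with adoubles would record exactly one switching variable.

```cpp
// Sketch only: Chained Crescent I as defined above; max{s1, s2} via abs().
#include <cmath>

double chained_crescent1(const double* x, int n) {
    double s1 = 0.0, s2 = 0.0;
    for (int i = 0; i + 1 < n; ++i) {
        s1 +=  x[i] * x[i] + (x[i + 1] - 1.0) * (x[i + 1] - 1.0) + x[i + 1] - 1.0;
        s2 += -x[i] * x[i] - (x[i + 1] - 1.0) * (x[i + 1] - 1.0) + x[i + 1] + 1.0;
    }
    return (s1 + s2 + std::fabs(s2 - s1)) / 2.0;   // max{s1, s2}
}
```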
Test function: Chained LQ, PS and convex

    f(x) = Σ_{i=1}^{n−1} max{ −x_i − x_{i+1}, −x_i − x_{i+1} + x_i² + x_{i+1}² − 1 }

Initial point: x_i^0 = −0.5
Optimal value: f(x*) = −(n − 1)√2

             n     f*        #f      #g      #iter
  LiPsMin    5     -5.657    126     443     82
             10    -12.728   31      127     18
             20    -26.87    34      280     20
  HANSO      5     -5.657    778     778     51 + 3GS
             10    -12.728   3920    3920    100 + 3GS
             20    -26.87    18548   18548   165 + 3GS
  MPBNGC     5     -5.657    88      88      51
             10    -12.728   123     123     106
             20    -26.87    1011    1011    1000
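A short plain-double transcription of this objective (sketch only; the function name is illustrative); each of the n − 1 maxima is expressed through abs(), so a taped adouble version would have s = n − 1 switching variables.

```cpp
// Sketch only: Chained LQ as defined above; max(u, v) = (u + v + abs(v - u))/2.
#include <cmath>

double chained_lq(const double* x, int n) {
    double f = 0.0;
    for (int i = 0; i + 1 < n; ++i) {
        double u = -x[i] - x[i + 1];
        double v = -x[i] - x[i + 1] + x[i] * x[i] + x[i + 1] * x[i + 1] - 1.0;
        f += (u + v + std::fabs(v - u)) / 2.0;     // max{u, v}
    }
    return f;
}
```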
Test function: Number of Active Faces, PS and nonconvex

    f(x) = max_{1≤i≤n} { g(−Σ_{j=1}^{n} x_j), g(x_i) }   with   g(y) = ln(|y| + 1)

Initial point: x_i^0 = −1
Optimal value: f(x*) = 0

             n     f*        #f     #g     #iter
  LiPsMin    5     8.9e-16   5      7      2
             10    7.8e-15   7      8      3
             20    3.0e-14   9      9      4
  HANSO      5     1.3e-5    24     24     11
             10    8.4e-5    23     23     11
             20    3.2e-5    25     25     9
  MPBNGC     5     0         18     18     15
             10    1e-11     1000   1000   994
             20    1e-11     1000   1000   991
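A short plain-double transcription of this objective (sketch only; the function name is illustrative); a taped version would also express ln(|y| + 1) and the outer max through abs().

```cpp
// Sketch only: Number of Active Faces as defined above.
#include <algorithm>
#include <cmath>

double number_of_active_faces(const double* x, int n) {
    auto g = [](double y) { return std::log(std::fabs(y) + 1.0); };
    double sum = 0.0;
    for (int j = 0; j < n; ++j) sum += x[j];       // Sum_{j=1}^{n} x_j
    double f = g(-sum);
    for (int i = 0; i < n; ++i) f = std::max(f, g(x[i]));
    return f;
}
```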
Conclusion and Outlook
AD drivers provided by ADOL-C
Minimization method for Lipschitzian PS functions: LiPsMin
Convergence theory
Numerical results
Future Work:
Termination criteria based on new optimality conditions
Thank you for your attention! Questions?