Finding saddle points of mountain pass type with quadratic models on affine spaces

C.H. Jeffrey Pang
Applied Mathematics Instructor, Massachusetts Institute of Technology
August 18 2011, MOPTA 2011
Definition of Mountain Pass

[Figure: surface plot of f with two low regions separated by a ridge; the saddle point lies on the optimal path.]

Mountain pass value: inf_{p ∈ Γ} max_{0 ≤ t ≤ 1} f(p(t)), where f : X → R, and
Γ is the set of paths p : [0, 1] → R^n s.t. p(0) = a, p(1) = b.
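The min-max definition above can be illustrated on a toy double-well function. The function f, the endpoints, and the one-parameter path family below are all illustrative choices, not from the talk; a minimal sketch:

```python
import numpy as np

# f has two minima at (±1, 0) separated by a saddle at the origin with f = 1.
f = lambda x, y: (x**2 - 1)**2 + y**2

# One-parameter family of paths from a = (-1, 0) to b = (1, 0):
# p_c(t) = (2t - 1, c * sin(pi * t)).  A toy stand-in for minimizing over all
# paths p in Γ; each path's cost is the maximum of f along it.
t = np.linspace(0.0, 1.0, 401)
cs = np.linspace(-2.0, 2.0, 81)
maxima = np.array([f(2*t - 1, c * np.sin(np.pi * t)).max() for c in cs])
best = maxima.argmin()
best_c, best_val = cs[best], maxima[best]
# The min-max is attained by the straight path (c = 0), and the highest point
# on that path is the saddle at the origin with value f(0, 0) = 1.
```

The inner `max` is the highest point on one path; the outer `argmin` over the family mimics the infimum over Γ.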
Mountain Pass Problem in Chemistry

- Computational chemistry:
  - Least energy to transition between 2 stable states.
  - Highest point on path gives intermediate state.
- Importance long established in chemistry curriculum.
  - Foundations by Marcelin.
  - Important work in (Eyring-Polanyi '31) and (Pelzer-Wigner '32).
- Numerous methods:
  - Nudged Elastic Band, Dimer method, Ridge method, Conjugate Peak Refinement, Drag Method, String method, Method of Dewar-Healy-Stewart, Step and Slide, Elastic String method (Moré-Munson '04), etc.
- Software:
  - Gaussian and VASP
Mountain Pass Problem in Numerical PDE

- Numerical solution of PDEs:
  - Weak solutions are saddle points (∇J = 0) of J : H → R.
  - PDE: −∆x(y) = F′(x(y)) ∀y ∈ Ω, with x(y) = 0 ∀y ∈ ∂Ω.
  - Functional: J(x) := (1/2)‖x‖² − ∫_Ω F(x(y)) dy.
  - Critical points that are maxima/minima are easily found.
  - Saddle points found by mountain pass (max on path).
- Applications:
  - Periodic solutions of a boundary value problem modeling a suspension bridge (Feng '94; Lazer-McKenna '91)
  - A system of Ginzburg-Landau type equations in a thin film model of superconductivity (Glotov-McKenna '08)
  - Choreographical 3-body problem (Arioli-Barutello-Terracini '06)
  - Cylinder buckling (Horák-Lord-Peletier '06)
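The correspondence between critical points of J and solutions of the PDE can be checked in one dimension with finite differences. The grid size, the choice F(u) = u³, and the discretization below are illustrative assumptions, not from the talk:

```python
import numpy as np

# 1-D sketch: J(x) = 1/2 ||x'||^2 - \int F(x) with zero boundary values, so
# that gradJ(x) = 0 is exactly the discretized PDE -x'' = F'(x).
# F(u) = u^3 is an assumed example nonlinearity.
m = 50                       # interior grid points on (0, 1)
h = 1.0 / (m + 1)
A = (np.diag(2.0 * np.ones(m))
     + np.diag(-np.ones(m - 1), 1)
     + np.diag(-np.ones(m - 1), -1)) / h     # stiffness matrix for 1/2||x'||^2

F  = lambda u: u**3
Fp = lambda u: 3 * u**2

J     = lambda x: 0.5 * x @ A @ x - h * np.sum(F(x))
gradJ = lambda x: A @ x - h * Fp(x)
# (A x)_i = h F'(x_i) rearranges to (-x_{i-1} + 2x_i - x_{i+1}) / h^2 = F'(x_i).

# Sanity check: gradJ matches a central finite difference of J in coordinate i.
rng = np.random.default_rng(2)
x = rng.standard_normal(m)
e, i = 1e-6, 7
fd = (J(x + e * np.eye(m)[i]) - J(x - e * np.eye(m)[i])) / (2 * e)
```

Critical points of the discrete J are thus exactly solutions of the discrete PDE, which is why saddle points of J correspond to weak solutions.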
Notable work

Critical point existence theorems
- Mountain Pass Theorem (Ambrosetti-Rabinowitz '73)
- Multidimensional Mountain Pass Thm (Rabinowitz '77)
Numerical implementations for PDEs
- First implementation by (Choi-McKenna '93).
- Morse index 2 by (Ding-Costa-Chen '99).
- General Morse index by (Li-Zhou '01).
- Constrained Optimization by (Horák '04)
Current Numerical Approaches

- Many methods from Computational Chemistry.
- inf_{p ∈ Γ} max_{0 ≤ t ≤ 1} f(p(t)): suggests min-max over paths.
- Many algorithms (see survey in Jabri '03) discretize the path.
- Other strategies (Horák '04, Barutello-Terracini '07).
Idea 1: Closest pts to components of level sets

- Suppose l_i ↗ f(x̄), where ∇f(x̄) = 0.
- Idea 1: Find closest points in components of {x : f(x) ≤ l_i}. (Miron-Fichthorn '01, Lewis-Pang '09)
- Note the bottleneck between the two components near the saddle point.
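Idea 1 can be seen on a small model problem. The quadratic f below and the brute-force boundary scan are illustrative choices, not the talk's algorithm:

```python
import numpy as np

# Model problem f(x1, x2) = -x1^2 + x2^2: saddle at the origin, f(xbar) = 0.
# For each l < 0 the sublevel set {f <= l} has two components, with boundaries
# x1 = ±sqrt(x2^2 - l).  As l increases to 0, the closest points between the
# components converge to the saddle point.
s = np.linspace(-3.0, 3.0, 1201)                # parametrize boundaries by x2
results = []
for l in [-1.0, -0.25, -0.04]:
    right = np.stack([np.sqrt(s**2 - l), s], axis=1)    # component with x1 > 0
    left = np.stack([-np.sqrt(s**2 - l), s], axis=1)    # component with x1 < 0
    D = np.linalg.norm(right[:, None, :] - left[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmin(D), D.shape)
    x, y = right[i], left[j]
    results.append((np.linalg.norm(x - y), (x + y) / 2))
# Distances shrink like 2*sqrt(-l); the midpoints stay at the saddle (0, 0).
```

The midpoint of the closest pair is already the saddle point here, which is the estimate the algorithm uses in general.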
Comparison to optimization

Global problems
- Global optimization provably difficult
- Global mountain pass provably difficult
Local problems
- Local optimization:
  - Speed of convergence to local minimizer
- Local mountain pass:
  - Speed of convergence to saddle point
  - Guess mountain pass after saddle point identified.

This talk: Only the local problem
Idea 2: Quadratic models

Assumptions:
- f : R^n → R (domain = R^n)
- Hessian is invertible
  - Mountain pass case: one −ve eigenvalue, rest +ve.
- f quadratic: f(x) = ½ x^T Hx + g^T x + c
- Near x̄, f(x) = ½ x^T Hx + g^T x + c + o(‖x − x̄‖²).
- Like conjugate gradients, drop o(·) term & hope for best.

Idea 2:
- Use quadratic expression f(x) = ½ x^T Hx + g^T x + c.
- Basis of fast convergence in Newton, Quasi-Newton, and Conjugate gradient methods.
- Used to prove superlinear convergence of an algorithm in (Lewis-Pang '09).
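For a quadratic with invertible Hessian, the saddle point is available in closed form, which is what makes the quadratic model useful. A minimal sketch (the particular eigenvalues and random data are illustrative assumptions):

```python
import numpy as np

# A quadratic f(x) = 1/2 x^T H x + g^T x + c with invertible H has a unique
# critical point x* = -H^{-1} g; with one negative and the rest positive
# eigenvalues, x* is a saddle point of mountain pass type.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthogonal basis
H = Q @ np.diag([-1.0, 2.0, 3.0]) @ Q.T            # eigenvalues -1, 2, 3
g = rng.standard_normal(3)

x_star = np.linalg.solve(H, -g)                    # solves grad f = Hx + g = 0
eigs = np.sort(np.linalg.eigvalsh(H))              # signature: one -ve, rest +ve
```

Dropping the o(·) term near a nondegenerate critical point reduces the local problem to exactly this computation on the model.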
Linear model on affine spaces

Proposition
For f(x) = Ax + b,
the values f(v_i) at d + 1 points {v_1, . . . , v_{d+1}}
determine f on the d-dimensional affine space.

Quadratic model on affine spaces

Proposition
For f(x) = ½ x^T Ax + b^T x + c,
the values f(v_i) and gradients ∇f(v_i) at d + 1 points {v_1, . . . , v_{d+1}}
determine f on the d-dimensional affine space.
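The second proposition can be verified numerically: parametrize the affine span of the d + 1 points, fit the restricted quadratic from the sampled values and projected gradients, and check it reproduces f on the whole affine space. The dimensions and random data below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 2
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
b = rng.standard_normal(n); c = 0.7
f  = lambda x: 0.5 * x @ A @ x + b @ x + c
gf = lambda x: A @ x + b

# d + 1 = 3 affinely independent points; affine space = v0 + span(U).
V = rng.standard_normal((d + 1, n))
v0, U = V[0], (V[1:] - V[0]).T                  # U is n x d

# The restricted function g(t) = f(v0 + U t) is a quadratic in t in R^d with
# 6 coefficients (P11, P12, P22, q1, q2, r) for d = 2.  Each sample point
# contributes its value f(v_i) and its projected gradient U^T grad f(v_i).
T = np.vstack([np.zeros(d), np.eye(d)])         # t-coordinates of v_0, v_1, v_2
rows, rhs = [], []
for t, v in zip(T, V):
    t1, t2 = t
    rows.append([0.5*t1*t1, t1*t2, 0.5*t2*t2, t1, t2, 1.0]); rhs.append(f(v))
    g = U.T @ gf(v)                             # gradient of the restriction
    rows.append([t1, t2, 0.0, 1.0, 0.0, 0.0]); rhs.append(g[0])
    rows.append([0.0, t1, t2, 0.0, 1.0, 0.0]); rhs.append(g[1])
coef, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
P = np.array([[coef[0], coef[1]], [coef[1], coef[2]]])
q, r = coef[3:5], coef[5]

# The recovered model reproduces f at an arbitrary point of the affine space.
t = rng.standard_normal(d)
err = abs(0.5 * t @ P @ t + q @ t + r - f(v0 + U @ t))
```

The 9 equations (3 values + 3 projected gradients of 2 components) are consistent for a quadratic f, so the 6 coefficients are determined, exactly as the proposition states.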
Closest points between components

[Figure: iterates x1, x2, x3 in one component and y1, y2, y3 in the other, converging to the closest pair, with midpoint z near the saddle point.]

- Want to find closest points between components.
- Midpoint of closest points estimates saddle point.
Description of inner algorithm for f : R^n → R

Let ∇f(x̄) = 0, and let l be s.t. l < f(x̄).
(Example: f(x) = −x1² + x2² + x3²)

To find closest pts in components of {x | f(x) ≤ l}:
1. Take 2 points x̃_j and ỹ_j in separate components.
2. Get affine space thru x̃_j & ỹ_j using ∇f(x̃_j) & ∇f(ỹ_j).
3. Find formula of f there.
4. Get closer iterates x̃_{j+1} and ỹ_{j+1} and return to step 1.
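One pass of steps 1-4 can be sketched on the talk's example. The starting points are arbitrary choices, the second direction is taken to be the mean gradient (the (MG) choice introduced below), and the closest-points subproblem on the plane is solved by brute-force sampling rather than by the quadratic formula a real implementation would use:

```python
import numpy as np

# Steps 1-4 on f(x) = -x1^2 + x2^2 + x3^2 with l = -1.  The closest points
# between the components of {f <= l} are (±1, 0, 0), midpoint at the saddle 0.
f  = lambda x: -x[..., 0]**2 + x[..., 1]**2 + x[..., 2]**2
gf = lambda x: np.array([-2*x[0], 2*x[1], 2*x[2]])
l = -1.0

xt = np.array([2.0, 1.0, 0.0])     # step 1: points in the two components
yt = np.array([-2.0, -0.5, 0.0])
assert f(xt) <= l and f(yt) <= l

# Step 2: affine plane through xt spanned by (yt - xt) and the mean gradient.
u = yt - xt
u /= np.linalg.norm(u)
v = 0.5 * (gf(xt) + gf(yt))
v -= (v @ u) * u                   # orthonormalize against u
v /= np.linalg.norm(v)

# Steps 3-4: sample the plane, keep near-boundary points of {f <= l}, and take
# the closest pair between the two components (separated by the sign of x1).
s, t = np.meshgrid(np.linspace(-1, 6, 141), np.linspace(-4, 4, 161))
pts = xt + s[..., None] * u + t[..., None] * v
vals = f(pts)
band = pts[(vals <= l) & (vals >= l - 0.2)]
pos, neg = band[band[:, 0] > 0], band[band[:, 0] < 0]
D = np.linalg.norm(pos[:, None, :] - neg[None, :, :], axis=2)
i, j = np.unravel_index(np.argmin(D), D.shape)
x_new, y_new = pos[i], neg[j]      # closer iterates for the next pass
mid = (x_new + y_new) / 2          # estimates the saddle point
```

Already after one pass the pair is close to (±1, 0, 0), the optimal distance 2, with midpoint near the saddle at the origin.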
Other details

- Affine space through x̃_j and ỹ_j need not be 2-d.
- Like conjugate gradients, consider bigger and bigger subspaces from all previously evaluated points.
  - (3D) New affine space from old affine space & {∇f(x̃_j), ∇f(ỹ_j)}.
  - (MG) Use {½[∇f(x̃_j) + ∇f(ỹ_j)]} instead.
Other details

An outer algorithm finds l_i s.t. l_i → f(x̄), where ∇f(x̄) = 0.
- In general (non-quadratic) case, as l_i → f(x̄), iterates approach x̄, so better approx. for ∇²f(x̄).
- If f quadratic, then (3D) & (MG) independent of l_i.
  (i.e., midpoints of x̃_j and ỹ_j independent of l_i)
- In fact, (MG) equivalent to conjugate gradients.
Another choice of affine space through x̃_j and ỹ_j

(MV) Use ∇f(x̃_j) or ∇f(ỹ_j), depending on which linearization
  {x | f(x) ≤ l} ≈ {x | ∇f(x̃_j)(x − x̃_j) ≤ 0} near x̃_j
  or {x | ∇f(ỹ_j)(x − ỹ_j) ≤ 0} near ỹ_j
predicts better reduction of dist. btw components.

Yet another choice of second direction

Use intersection of tangent spaces.
(MD) 2nd direction via closest pt from midpoint to intersection.
- Close to optimality, tangent spaces nearly parallel.
- Must use other methods for numerical stability.
Measures for the algorithm

Four criteria for measuring the algorithm:

(DI) Distance between iterates:
  ‖x̃_j − ỹ_j‖ minus optimal distance between components.
(VO) Violation of optimality condition:
  1 − min{ ⟨∇f(x_i)/‖∇f(x_i)‖, (y_i − x_i)/‖y_i − x_i‖⟩, ⟨∇f(y_i)/‖∇f(y_i)‖, (x_i − y_i)/‖x_i − y_i‖⟩ }.
(NG) Norm of gradient ∇f(½(x̃_j + ỹ_j)).
(DS) Distance from saddle point to ½(x̃_j + ỹ_j).
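The (VO) measure is straightforward to compute; at the optimal closest points each gradient is parallel to the segment joining the pair, so both inner products equal 1 and the violation is 0. A minimal sketch on an assumed 2-D example (not from the talk):

```python
import numpy as np

f  = lambda x: -x[0]**2 + x[1]**2          # example with saddle at the origin
gf = lambda x: np.array([-2*x[0], 2*x[1]])

def vo(x, y):
    """(VO): 1 minus the worse of the two alignment inner products.
    Zero exactly when both gradients align with the joining segment."""
    d = (y - x) / np.linalg.norm(y - x)
    a = (gf(x) / np.linalg.norm(gf(x))) @ d
    b = (gf(y) / np.linalg.norm(gf(y))) @ (-d)
    return 1.0 - min(a, b)

opt = vo(np.array([1.0, 0.0]), np.array([-1.0, 0.0]))   # optimal pair for l=-1
off = vo(np.array([1.1, 0.3]), np.array([-1.0, -0.2]))  # perturbed pair
```

Here `opt` is 0 (the pair (±1, 0) is the true closest pair for l = −1) while the perturbed pair gives a strictly positive violation.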
Summary:
1. (3D) performs best.
2. For other methods,
  - (MV) better for (DI) and (VO)
  - (MG) better for (NG) and (DS)
  - (MD) good when tangent planes not nearly parallel
Performance of strategies for (DI) and (VO)

Numerical experiment: Test convergence on f(x) = ½ x^T Dx with random starting points and D.

[Figure: log-scale convergence plots over 20 iterations of (MV), (MG), (MD) then (MV), and (3D) then (MV); left panel: (DI), distance between iterates minus optimal distance; right panel: (VO), 1 − violation in optimality condition.]

- (3D) best strategy
- (MV) better than (MG) for objectives (DI) and (VO)
- (MD) switching to (MV) catches up with (MV)
Performance of strategies for (NG) and (DS)

Numerical experiment: Test convergence on f(x) = ½ x^T Dx with random starting points and D.

[Figure: log-scale convergence plots over 20 iterations of (MV), (MG), (MD) then (MG), and (3D) then (MG); left panel: (NG), norm of gradient; right panel: (DS), distance to true saddle point.]

- (3D) best strategy
- (MG) better than (MV) for objectives (NG) and (DS)
- In fact, (MD) better than (MG) in short term
Conclusion

In progress (with J. Brereton and J.H. Chen):
- Details for global mountain pass algorithm (motivated by quadratic approximation)
- Implementation

Website: http://math.mit.edu/~chj2pang
Preprint: Finding saddle points of mountain pass type with quadratic models on affine spaces (arXiv:1107.4287)
The end. Thanks!