Indian Journal of Chemistry
Vol. 53A, Aug-Sept 2014, pp. 1036-1042
A gradient-based trust radius update suitable for saddle point and
transition state optimization
Paul W Ayers* & Sandra Rabi
Department of Chemistry & Chemical Biology, McMaster University, Hamilton, Ontario L8S 4M1, Canada
Email: [email protected]
Received 23 April 2014; accepted 25 April 2014
A new trust radius method, appropriate for saddle point optimization, is proposed. The idea is to ensure that the
gradients from the quadratic model and the exact computed gradients are similar in direction and magnitude. For transition
state optimization, the new trust radius method is found to work somewhat better than more traditional methods based on
the value of the objective function.
Keywords: Theoretical chemistry, Saddle point optimization, Transition state optimization, Trust radius method
For most chemical reactions, good estimates for the
equilibrium constants, branching ratios, reaction rates,
and other thermodynamic and kinetic characteristics
can be inferred from knowledge of the location,
height, and vibrational frequencies of the minima
(reactant, product, reactive intermediates) and
1st-order saddle points (transition states) through
which the reaction coordinate passes.1 Finding the
locations of stationary points on molecular potential
energy surfaces is, therefore, the starting point for
most computational studies of chemical thermodynamics and kinetics.2-5
Finding minima on the potential energy surface is
relatively straightforward: starting from an initial guess
for the stable molecular structure, one descends the
potential energy surface until no further descent is
possible. This is always a local minimum on the
potential energy surface. Finding transition states is
more complicated because one may need to ascend or
descend, depending on the starting structure.
Furthermore, once one finds a stationary point, one must
ensure that it has just one imaginary vibrational
frequency (one negative curvature direction). For this
reason, there is continuing interest in developing
methods for locating transition states on potential energy
surfaces, from our research group and from others.4-34
If one has a reasonable initial guess for the
geometry of a minimum, then it is reasonable to use a
(quasi-)Newton method to optimize the solution.35-37
In this method, one uses the Taylor expansion for the
potential energy surface around a point, truncated at
second order (Eq. 1),
U(q + \Delta q) \approx \Upsilon(q + \Delta q) \equiv U(q) + \nabla U(q) \cdot \Delta q + \tfrac{1}{2} (\Delta q)^{T} H(q)\, \Delta q    … (1)
and the Taylor expansion for the gradient of the
molecular potential energy, truncated at first order
(Eq. 2),
\nabla U(q + \Delta q) \approx \gamma(q + \Delta q) \equiv \nabla U(q) + H(q)\, \Delta q    … (2)
to make an improved guess for the energy. The Hessian
matrix, H, can be computed exactly as the second-derivative matrix of the potential energy (Eq. 3),
H(q) = \nabla \nabla^{T} U(q)    … (3)
but to reduce the computational expense of the
calculation, the Hessian is typically approximated by
a quasi-Newton method. Setting the approximate
gradient, γ, to zero gives a linear equation for the
(quasi-)Newton step, ∆q.
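As a minimal sketch (Python/NumPy; the function names are illustrative and not taken from the authors' code), setting the approximate gradient of Eq. (2) to zero gives the linear system H(q)Δq = −∇U(q), whose solution is the (quasi-)Newton step:

    import numpy as np

    def newton_step(gradient, hessian):
        """Solve H(q) dq = -grad U(q), i.e., set the approximate gradient of Eq. (2) to zero."""
        return np.linalg.solve(hessian, -gradient)

    def quadratic_model(energy, gradient, hessian, dq):
        """Second-order Taylor model for the energy at q + dq, Eq. (1)."""
        return energy + gradient @ dq + 0.5 * dq @ (hessian @ dq)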
The simplest form of Newton’s method is to
compute ∆q, then update the position of the atomic
nuclei to q + ∆q. Evaluating the potential energy,
gradient, and Hessian for the new geometry allows
one to compute a new ∆q, which defines the second
iteration of the geometry optimizer. Ideally, this
method could be iterated to convergence (which
occurs when the gradient of the potential energy
surface is zero). Unfortunately, this approach is prone
to limit cycles; ensuring convergence requires
controlling the step-size. That is, one does not always
take the full step ∆q; sometimes one takes a shorter
step, or even a step in a somewhat different direction,
to ensure that every step brings one closer to a solution.
For geometry optimization methods in quantum
chemistry, the stepsize is usually controlled by
performing a line search38-40 or by a trust radius
method. Trust radius methods are found to be more
efficient in most cases, and the rational function
optimization (RFO8,16,41-45) and trust region image
method (TRIM42,43,46) are popular within quantum
chemistry. In our studies, we have found the TRIM
method to be most efficient, and that is what we will
use to test the method developed in this paper.
Trust Radius Methods
Traditionally, trust radius methods were designed
for minimization. The idea—based on Eq. (1)—is that
when the quadratic model for the energy closely
resembles the actual energy at a new point, we can
trust the Newton step, and larger steps should be
allowed. Conversely, if the quadratic model for the
energy is inaccurate, we should not trust the new step,
and the size of the step should be reduced.
(In particular, the trust radius is always reduced when
the quadratic model predicts an energy decrease
(increase) but the energy actually increases
(decreases)). This intuition is quantified in Algorithm 1.47
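Algorithm 1 itself is given as a boxed listing in the original article. Purely as an illustration of the idea just described, the following sketch implements a standard energy-based update; the threshold names φgood and φbad and the scaling factor κ follow the parameters quoted later in the Computational Tests section, but the detailed accept/reject logic of Algorithm 1 is not reproduced here.

    def update_trust_radius_energy(tau, de_actual, de_predicted,
                                   phi_good=2/3, phi_bad=1/3, kappa=2.0):
        """Energy-based trust radius update (illustrative sketch, not the article's Algorithm 1).

        rho compares the computed change in the energy with the change predicted by
        the quadratic model of Eq. (1); rho < 0 means the model predicted a decrease
        (increase) while the energy actually increased (decreased).
        """
        rho = de_actual / de_predicted
        if rho < phi_bad:          # poor model (or wrong sign): shrink the trust radius
            return tau / kappa
        if rho > phi_good:         # accurate model: allow longer steps
            return tau * kappa
        return tau                 # otherwise leave the trust radius unchanged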
Once the trust radius, τ, has been selected, one
determines a new energy by minimizing the quadratic
model for the energy subject to the constraint that the
step size is no larger than the trust radius, i.e.,

\Delta q = \arg\min_{\{\Delta q \,:\, \|\Delta q\| \leq \tau\}} \Upsilon(q + \Delta q)    … (4)
where Υ ( q + ∆q ) is the quadratic model for the
energy (cf. Eq. (1)).
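One common way to impose the constraint in Eq. (4) for minimization is a Levenberg-Marquardt-type level shift: solve (H + λI)Δq = −∇U(q) with λ ≥ 0 increased until the step fits inside the trust radius. The sketch below is only an illustration of that idea (the present work uses the trust region image method for saddle points, as described later); the names are not from the authors' code.

    import numpy as np

    def trust_region_step(gradient, hessian, tau, n_bisect=60):
        """Approximately solve Eq. (4) for minimization by bisecting on the level
        shift lam in (H + lam*I) dq = -g.  Intended for positive-definite H;
        saddle-point searches require an image- or RFO-type construction instead.
        """
        n = len(gradient)
        dq = np.linalg.solve(hessian, -gradient)
        if np.linalg.norm(dq) <= tau:
            return dq                              # the full Newton step is already short enough
        lo, hi = 0.0, 1.0
        while np.linalg.norm(np.linalg.solve(hessian + hi * np.eye(n), -gradient)) > tau:
            hi *= 2.0                              # find a shift large enough to satisfy the constraint
        for _ in range(n_bisect):                  # shrink the bracket [lo, hi] around the optimal shift
            mid = 0.5 * (lo + hi)
            step = np.linalg.solve(hessian + mid * np.eye(n), -gradient)
            lo, hi = (mid, hi) if np.linalg.norm(step) > tau else (lo, mid)
        return np.linalg.solve(hessian + hi * np.eye(n), -gradient)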
This approach is highly effective for geometry
minimization, but it is not obviously appropriate for
transition state optimization, where a desirable step
(i.e., a ∆q that brings one closer to the transition state
structure) could either increase or decrease the
energy. In our earlier work, we have used a slight
revision of the preceding algorithm (see Algorithm 2) in
this case. (Algorithm 2 is a refinement of the method
in Ref. 48.) An important feature of this method is
that very short steps are always accepted.

Algorithm 1: Energy-based trust radius update for minimization

Algorithm 2: Energy-based trust radius for saddle points. [The trust radius updating procedure is unchanged from Algorithm 1. The method for accepting/rejecting steps is similar to Algorithm 1, but line 3 becomes a condition on the gradient and a minimum stepsize is imposed (lines 11-16)]

If one is in a region of the potential energy surface that is so far
from a transition state structure that all the
eigenvalues of the Hessian are positive (i.e., all
normal vibrational modes have real frequencies), then
one must take an uphill step that increases the
gradient to move into a region of the potential energy
surface with an imaginary vibrational frequency.
Since the quadratic model should be accurate when
steps are very short, in Algorithm 2, one always
accepts very short steps.
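Algorithm 2 is likewise given as a listing in the original article; the sketch below captures only the accept/reject test described in the text, namely that a step is kept if it reduces the gradient norm or if it is very short. The remaining details of Algorithm 2 are not reproduced.

    import numpy as np

    def accept_step_saddle(grad_old, grad_new, dq, tau_min):
        """Accept/reject test in the spirit of Algorithm 2 (sketch, not the verbatim listing)."""
        if np.linalg.norm(dq) <= tau_min:
            return True                            # very short steps are always accepted
        # otherwise require progress toward the saddle point: a smaller gradient norm
        return np.linalg.norm(grad_new) < np.linalg.norm(grad_old)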
For transition state optimization, the criterion for a
desirable step is not whether the change in energy
resembles the approximate change in the energy from
the quadratic model; desirable steps are those that
reduce the gradient of the potential energy. Since the
effectiveness of a transition state optimization method
is predicated not on its ability to effectively optimize
the energy, but on its ability to minimize the norm of
the gradient, we decided to design a trust radius
method based on how accurately the linear approximation to the gradient, Eq. (2), reproduces the true gradient. Since the gradient is a vector, not a scalar,
we consider the approximate gradient, γ, to be
accurate if the predicted change in the gradient is
close to the actual change in the gradient,
\cos(\theta) = \frac{\left[\gamma(q + \Delta q) - \nabla U(q)\right] \cdot \left[\nabla U(q + \Delta q) - \nabla U(q)\right]}{\left\|\gamma(q + \Delta q) - \nabla U(q)\right\| \, \left\|\nabla U(q + \Delta q) - \nabla U(q)\right\|}    … (5)
This leads to a gradient-based algorithm for updating
the trust radius. (See Algorithm 3).
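In code, the two vectors compared in Eq. (5) are the predicted change in the gradient, H(q)Δq, and the computed change; a minimal sketch (names illustrative):

    import numpy as np

    def gradient_alignment(grad_old, grad_new, hessian, dq):
        """cos(theta) of Eq. (5): alignment of the predicted and actual changes in the gradient."""
        predicted = hessian @ dq                   # gamma(q + dq) - grad U(q), from Eq. (2)
        actual = grad_new - grad_old               # grad U(q + dq) - grad U(q)
        return (predicted @ actual) / (np.linalg.norm(predicted) * np.linalg.norm(actual))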
The rationale for the condition used by Algorithm 3
to measure the alignment between the approximate and
exact change in ∇U is perhaps not immediately
apparent. As the dimensionality increases, it becomes
increasingly unlikely for two vectors to be nearly
aligned. Specifically, the probability distribution for the
angle between two d-dimensional random vectors is
p_d(\theta) = \frac{\Gamma\left(\tfrac{d}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\tfrac{d-1}{2}\right)} \sin^{d-2}(\theta)    … (6)
and the average cosine of the angle between two
vectors is
\left\langle \cos(\theta_{12}) \right\rangle = \left\langle \frac{\mathbf{v}_1 \cdot \mathbf{v}_2}{\|\mathbf{v}_1\|\,\|\mathbf{v}_2\|} \right\rangle = \frac{(d-2)!!}{(d-1)!!} \times \begin{cases} \tfrac{2}{\pi} & d \text{ even} \\ 1 & d \text{ odd} \end{cases} \;\sim\; d^{-1/2}    … (7)
Algorithm 3Gradient-based trust radius update for saddle points. [The method for accepting/rejecting a step is exactly the same as for
Algorithm 2, except that the gradient-based trust radius update is used]
Therefore, the cosine of the angle between the approximate and exact changes in the gradient tends to decrease as the number of atoms increases (see Fig. 1). (This is an example of the curse of dimensionality: vectors in high-dimensional spaces are almost always nearly orthogonal.) For this
reason, we measure the alignment between the vectors
by comparing to a function, cx(d), which is the value
of cos(θ) such that x% of pairs of vectors will have a
cosine greater than cx(d), where d = 3Natoms − 6 (or 3Natoms − 5 for linear molecules) is the dimensionality of the molecular potential energy surface.
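The full Algorithm 3 appears as a listing in the original article. Purely as an illustration, the sketch below combines a direction test (the cosine of Eq. (5) compared against cx(d) and cy(d)) with a test on the relative magnitudes of the predicted and actual changes in the gradient, using the parameter names κ, φgood, φbad, x and y introduced later in the Computational Tests section. The exact way Algorithm 3 combines these tests, and the use here of an exact Beta-distribution inversion for cx(d) instead of the fit of Table A1, are assumptions of this sketch.

    import numpy as np
    from scipy.stats import beta

    def c_threshold(x_percent, d):
        """c_x(d): x% of random d-dimensional vector pairs have a cosine larger than this value.

        Uses the exact distribution of the cosine: for random unit vectors,
        (1 + cos(theta))/2 follows a Beta((d-1)/2, (d-1)/2) distribution.  The
        article uses the asymptotic fit of Eqs (A4)-(A5) instead; the two agree closely.
        """
        a = (d - 1) / 2.0
        return 2.0 * beta.ppf(1.0 - x_percent / 100.0, a, a) - 1.0

    def update_trust_radius_gradient(tau, grad_old, grad_new, hessian, dq, d,
                                     kappa=2.0, phi_good=0.8, phi_bad=0.2,
                                     x=10.0, y=40.0, tau_min=None, tau_max=None):
        """Gradient-based trust radius update in the spirit of Algorithm 3 (sketch only)."""
        predicted = hessian @ dq                   # predicted change in the gradient, Eq. (2)
        actual = grad_new - grad_old               # computed change in the gradient
        norm_pred, norm_act = np.linalg.norm(predicted), np.linalg.norm(actual)
        ratio = min(norm_pred, norm_act) / max(norm_pred, norm_act)
        cos_theta = (predicted @ actual) / (norm_pred * norm_act)
        if ratio > phi_good and cos_theta > c_threshold(x, d):
            tau *= kappa                           # model gradient accurate in both length and direction
        elif ratio < phi_bad or cos_theta < c_threshold(y, d):
            tau /= kappa                           # model gradient poor: take shorter steps
        if tau_min is not None:
            tau = max(tau, tau_min)
        if tau_max is not None:
            tau = min(tau, tau_max)
        return tau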
Computational Tests
We tested the ability of energy-based (Algorithm 2)
and gradient-based (Algorithm 3) trust radius
optimization methods to locate the transition states of
five reactions from the benchmark set of Baker and
Chan (see Fig. 2).49 Our transition state finding
algorithm50 uses the Bofill quasi-Newton update,42 but
uses finite differences to update key internal
coordinates, as suggested in Ref. 48. The method uses
a nonredundant system of delocalized internal
coordinates51,52 generated from a set of redundant
internal coordinates53 that is generated by a method
inspired by the Dalton program.54

Fig. 1: The probability distribution function for the angle between two random vectors in (3Natoms − 6) dimensions, plotted versus the number of atoms. This explains why, as the number of atoms increases, the cosine of the angle between the vectors representing the approximate and exact changes in the gradient, Eq. (5), decreases.

The trust radius
image method42,43,46 is used to determine the direction
of a step with the specified trust radius. Energies and
gradients were evaluated using the Gaussian
program,55 using the Hartree-Fock method with the
6-31++G(d,p) basis set.
Fig. 2The five reactions used to test the trust radius
optimization methods in this paper. The transition states were
optimized at the HF/6-31++G(d,p) level, and then five random
perturbations of the correct transition state structures were made
to generate five initial geometries for each of the five reactions.
As an initial trust radius, we chose
\tau_{\text{initial}} = 0.35 \, N_{\text{atoms}} \text{ a.u.}    … (8)
We believe that the maximum and minimum trust
radii,
\tau_{\text{max}} = N_{\text{atoms}} \text{ a.u.}    … (9)

\tau_{\text{min}} = \tfrac{1}{10} \, N_{\text{atoms}} \text{ a.u.}    … (10)
are appropriate for molecular geometry optimization
problems because the quadratic model for the energy
should be accurate if none of the Cartesian
coordinates change by more than about 0.1 a.u.
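A trivial but concrete reading of Eqs (8)-(10), with Natoms the number of atoms:

    def trust_radius_bounds(n_atoms):
        """Initial, maximum, and minimum trust radii of Eqs (8)-(10), in atomic units."""
        return 0.35 * n_atoms, 1.0 * n_atoms, 0.1 * n_atoms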
For each reaction, we randomly perturbed the
internal coordinates associated with bond-forming/
breaking so that root-mean-square distance between the
perturbed structure and the exact transition state was 0.4 a.u. Choosing five different perturbations for each of
the five reactions gave a total of 25 transition state
optimizations, which may be enough to make tentative
conclusions about the trust radius methods presented
here. For each trust radius method, we considered
several different choices for the defining parameters.
We took three choices for the factor used to increase
the trust radius when the quadratic model for the
potential energy surface is accurate, κ = 1.5, 2.0, and
4.0. For the energy-based trust radius, the parameters
used to assess the similarity between the quadratic
model for the change in energy and the calculated
change in energy were φgood = 0.6, 0.7, 0.8, and 0.9 and
φbad = 0.1, 0.2, 0.3, and 0.4 (cf. Algorithm 1). For the
gradient-based trust radius, the parameters used to
assess the similarity between the predicted change in
the magnitude of the gradient and the actual change
were φgood = 0.6, 0.75, and 0.9 and φbad = 0.1, 0.25, and
0.4. The parameters used to assess the similarity
between the predicted direction of the change in
gradient and the actual direction-of-change were
x = 5%, 10%, 20%, and 30% and y = 30%, 40%, and
45% (cf. Algorithm 3). Recall that cz(d) is defined such
that z% of pairs of d-dimensional vectors have a value
of cos(θ) less than cz(d). We examined all possible
choices for these parameters for five initial guess
geometries for each of the five reactions in Fig. 2, for a
total of 9,300 transition state optimizations.
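The figure of 9,300 follows from counting the parameter combinations listed above; a quick check:

    from itertools import product

    kappas = [1.5, 2.0, 4.0]
    energy_sets = list(product(kappas, [0.6, 0.7, 0.8, 0.9], [0.1, 0.2, 0.3, 0.4]))
    gradient_sets = list(product(kappas, [0.6, 0.75, 0.9], [0.1, 0.25, 0.4],
                                 [5, 10, 20, 30], [30, 40, 45]))
    n_sets = len(energy_sets) + len(gradient_sets)     # 48 + 324 = 372 parameter sets
    print(n_sets * 25)                                 # 25 initial geometries each -> 9300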
Remarkably, for every choice of parameters, all
twenty-five transition state optimizations converged
when the gradient-based trust radius method was used.
(98% of the calculations using the energy-based trust
radius converged. Every calculation with κ < 4
converged with the energy-based trust radius, implying
that κ = 4 increases the trust radius too aggressively for
the energy-based method, but not the gradient-based
approach.) Disregarding the results for the energy-based
trust radius with κ = 4, the computational cost of the
two methods is very similar. Specifically, averaging
over all five initial guesses for all five reactions, the
gradient-based method required between 20.5 and 23.3
gradient evaluations (21.6, on average), depending on
the parameters used in the trust radius updating
procedure. The energy-based method required between
21.2 and 23.0 gradient evaluations (22.0, on average).
We conclude that both methods are relatively
insensitive to the parameters used. In the future, we
should perform more exhaustive tests, on larger sets of
reactions and with more challenging initial guess
geometries, so that we can determine optimal trust
radius parameters. However, in our ongoing work, we
have found the following parameter-sets to be
effective, though they are certainly not optimal. For
the energy-based trust radius: κ = 2.0, φgood = 2/3, and
φbad = 1/3. For the gradient-based trust radius, κ = 2.0,
φgood = 0.8, φbad = 0.2, x = 10%, y = 40%. With these
parameters, we see indications that Algorithm 3 is
more robust and more efficient than Algorithm 2, but
more systematic choices are needed. (It could, for
example, be that the parameters for the gradient-based
trust radius update are quite good, while the
parameters for the energy-based trust radius update
are far from optimal.)
Conclusions
We have developed a new gradient-based trust radius
method (Algorithm 3), for use in transition state
optimization and our preliminary results indicate that
for transition state optimization, the gradient-based
trust radius method is remarkably robust, and slightly
more robust than the more conventional trust radius
updating method, which is based on the accuracy of the quadratic approximation to the objective function (Algorithm 2). The basic idea in the gradient-based method is that one compares the approximate
gradient (computed from the first-order Taylor series,
Eq. (2)) to the exactly computed gradient, and
increases (decreases) the trust radius when the
gradients are similar (dissimilar) in direction and
length. Preliminary, nonsystematic tests indicate that
the gradient-based trust radius update is also effective
for geometry minimization.
Acknowledgement
PWA and SR were supported by a Discovery Grant
and a Canada Graduate Scholarship from the Natural Sciences and Engineering Research Council
(NSERC), Canada. PWA acknowledges helpful
discussions with Dr Steven K Burger on trust radius
methods. The mathematical work was completed by
PWA during a fervidly productive visit to
Prof. Patricio Fuentealba and Prof. Carlos Cardenas at
the Universidad de Chile, Santiago, Chile.
References
1 McQuarrie D A, in, (Harper-Collins, New York), 1976.
2 Schlegel H B, J Comput Chem, 24 (2003) 1514.
3 Wales D J, Energy Landscapes, (Cambridge University
Press, Cambridge, UK) 2003.
4 Schlegel H B, Wiley Interdiscip Rev Comput Mol Sci,
1 (2011) 790.
5 Liu Y L, Burger S K, Dey B K, Sarkar U, Janicki M &
Ayers P W, The Fast-Marching Method for Determining
Chemical Reaction Mechanisms in Complex Systems, in
Quantum Biochemistry, edited by C F Matta, (Wiley-VCH,
Boston) 2010.
6 Halgren T A & Lipscomb W N, Chem Phys Lett, 49 (1977)
225.
7 Crippen G M & Scheraga H A, Arch Biochem Biophys, 144
(1971) 462.
8 Banerjee A, Adams N, Simons J & Shepard R, J Phys Chem,
89 (1985) 52.
9 Cerjan C J & Miller W H, J Chem Phys, 75 (1981) 2800.
10 Dey B K & Ayers P W, Mol Phys, 104 (2006) 541.
11 Dey B K, Janicki M R & Ayers P W, J Chem Phys, 121 (2004)
6667.
12 Burger S K & Ayers P W, J Chem Phys, 133 (2010) 034116.
13 Burger S K & Ayers P W, J Chem Theory Comp, 6 (2010)
1490.
14 Burger S K, Liu Y L, Sarkar U & Ayers P W, J Chem Phys,
130 (2009) 024103.
15 Henkelman G & Jonsson H, J Chem Phys, 111 (1999) 7010.
16 Heyden A, Bell A T & Keil F J, J Chem Phys, 123 (2005).
17 Peters B, Heyden A, Bell A T & Chakraborty A, J Chem Phys,
120 (2004) 7877.
18 Quapp W, J Theoret Comput Chem, 8 (2009) 101.
19 Quapp W, J Chem Phys, 122 (2005) 174106.
20 Hirsch M & Quapp W, J Comput Chem, 23 (2002) 887.
21 Quapp W, Hirsch M, Imig O & Heidrich D, J Comput Chem,
19 (1998) 1087.
22 Quapp W, Chem Phys Lett, 253 (1996) 286.
23 Bofill J M & Anglada J M, Theor Chem Acc, 105 (2001) 463.
24 Ohno K & Maeda S, Phys Scr, 78 (2008) 058122.
25 Ohno K & Maeda S, Chem Phys Lett, 384 (2004) 277.
26 Peng C Y & Schlegel H B, Israel J Chem, 33 (1993) 449.
27 E W N, Ren W Q & Vanden-Eijnden E, J Chem Phys, 126
(2007) 164103.
28 E W, Ren W Q & Vanden-Eijnden E, Phys Rev B, 66 (2002).
29 Ayala P Y & Schlegel H B, J Chem Phys, 107 (1997) 375.
30 Burger S K & Yang W, J Chem Phys, 127 (2007) 164107.
31 Burger S K & Yang W T, J Chem Phys, 124 (2006) 054109.
32 Koslover E F & Wales D J, J Chem Phys, 127 (2007).
33 Poppinger D, Chem Phys Lett, 35 (1975) 550.
34 Liu Y L, Burger S K & Ayers P W, J Math Chem, 49 (2011)
1915.
35 Fletcher R, Practical Methods of Optimization, 2nd Edn,
(Wiley, Chichester; New York) 1987.
36 Dennis J E & Schnabel R B, Numerical Methods for
Unconstrained Optimization and Nonlinear Equations,
(Prentice-Hall, NJ) 1983.
37 Nocedal J & Wright S J, in Springer Series in Operations
Research, (Springer, New York), 1999.
38 Armijo L, Pacific J Math, 16 (1966) 1.
39 Wolfe P, SIAM Rev, 11 (1969) 226.
40 Wolfe P, SIAM Rev, 13 (1971) 185.
41 Baker J, J Comput Chem, 7 (1986) 385.
42 Bofill J M, J Comput Chem, 15 (1994) 1.
43 Culot P, Dive G, Nguyen V H & Ghuysen J M, Theor Chim
Act, 82 (1992) 189.
44 Besalu E & Bofill J M, Theor Chem Acc, 100 (1998) 265.
45 Anglada J M & Bofill J M, Int J Quantum Chem, 62 (1997) 153.
46 Helgaker T, Chem Phys Lett, 182 (1991) 503.
47 Nocedal J & Wright S J, Numerical Optimization, (Springer-Verlag, New York) 1999.
48 Burger S K & Ayers P W, J Chem Phys, 132 (2010) 234110.
49 Baker J & Chan F R, J Comput Chem, 17 (1996) 888.
50 Rabi S, Verstraelen T & Ayers P W (submitted).
51 Baker J, Kessi A & Delley B, J Chem Phys, 105 (1996) 192.
52 Baker J, Kinghorn D & Pulay P, J Chem Phys, 110 (1999)
4986.
53 Pulay P & Fogarasi G, J Chem Phys, 96 (1992) 2856.
54 Bakken V & Helgaker T, J Chem Phys, 117 (2002) 9160.
55 Gaussian 09, (Gaussian Inc, Wallingford CT), 2009.
Appendix
Derivation of the criterion for measuring alignment
between d-dimensional vectors.
Consider two vectors in d dimensions. The direction of
the first vector can be chosen to define the z-axis. The
probability that the angle between the vectors is less than
θmax can be evaluated by integration in hyperspherical
coordinates,
p_d(\theta_{\max}) = \frac{\int_0^{\theta_{\max}} \int_0^{\pi} \cdots \int_0^{\pi} \int_0^{2\pi} \sin^{d-2}(\theta_1)\,\sin^{d-3}(\theta_2) \cdots \sin^{2}(\theta_{d-3})\,\sin(\theta_{d-2})\, d\phi\, d\theta_{d-2} \cdots d\theta_2\, d\theta_1}{\int_0^{\pi} \int_0^{\pi} \cdots \int_0^{\pi} \int_0^{2\pi} \sin^{d-2}(\theta_1)\,\sin^{d-3}(\theta_2) \cdots \sin^{2}(\theta_{d-3})\,\sin(\theta_{d-2})\, d\phi\, d\theta_{d-2} \cdots d\theta_2\, d\theta_1}    … (A1)

This integral can be expressed in terms of hypergeometric functions,

p_d(\theta_{\max}) = \frac{1}{2} - \frac{1}{\sqrt{\pi}}\,\frac{\Gamma\left(\tfrac{d}{2}\right)}{\Gamma\left(\tfrac{d-1}{2}\right)}\,\cos(\theta_{\max})\; {}_2F_1\!\left(\tfrac{1}{2}, \tfrac{3}{2}-\tfrac{d}{2}; \tfrac{3}{2}; \cos^2(\theta_{\max})\right)    … (A2)

or, most simply, in terms of regularized beta functions,

p_d(\theta_{\max}) = \tfrac{1}{2}\left[1 - I_{\cos^2(\theta_{\max})}\!\left(\tfrac{1}{2}, \tfrac{d-1}{2}\right)\right] = \tfrac{1}{2}\, I_{\sin^2(\theta_{\max})}\!\left(\tfrac{d-1}{2}, \tfrac{1}{2}\right)    … (A3)

The expressions based on regularized beta functions assume that 0 ≤ θmax ≤ π/2. We now define the inverse of this function. That is, we define the function cx(d) such that x% of d-dimensional vector-pairs have a value of cos(θmax) greater than cx(d). The fundamental dependence of Eq. (A3) on cos²(θmax) suggests the asymptotic form

\left(c_x(d)\right)^2 \approx \frac{a_x}{d} + \frac{b_x}{d^2}    … (A4)

However, for convenience, we use the less fundamental form,

c_x(d) \approx \left(\frac{a_x}{d} + \frac{b_x}{d^2}\right)^{1/2}    … (A5)

Table A1 gives the values of ax and bx that were determined by fitting to the values of (cx(d))² in high numbers of dimensions. Figure A1 shows the quality of the fit, which is remarkably accurate even when the number of dimensions is small.

Fig. A1: The fit of cx(d) to the asymptotic expression, Eqs (A4)-(A5), for x = 10%, 25%, and 40%. The fit is remarkably accurate, even for just three atoms.

Table A1: Consider a pair of d-dimensional vectors, where d is large. x% of vector-pairs in d dimensions have a cosine between them that is greater than cx(d) ≈ (ax d⁻¹ + bx d⁻²)^{1/2}. See Eqs (A4) and (A5) and the preceding discussion.

x (%)    ax        bx
5        2.7055    0.4
10       1.6424    1.1
15       1.0742    1.0
20       0.7084    0.81
25       0.4549    0.58
30       0.2750    0.37
35       0.1484    0.21
40       0.0642    0.094
45       0.0158    0.024
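As a check on the table, the fitted form of Eq. (A5) can be compared with an exact inversion of Eq. (A3). The sketch below uses the standard fact that, for random unit vectors in d dimensions, (1 + cos θ)/2 follows a Beta((d−1)/2, (d−1)/2) distribution; this identity and the scipy call are not part of the original article.

    import numpy as np
    from scipy.stats import beta

    A_COEFF = {5: 2.7055, 10: 1.6424, 15: 1.0742, 20: 0.7084, 25: 0.4549,
               30: 0.2750, 35: 0.1484, 40: 0.0642, 45: 0.0158}   # a_x from Table A1
    B_COEFF = {5: 0.4, 10: 1.1, 15: 1.0, 20: 0.81, 25: 0.58,
               30: 0.37, 35: 0.21, 40: 0.094, 45: 0.024}         # b_x from Table A1

    def c_fit(x, d):
        """c_x(d) from the asymptotic fit of Eq. (A5) and Table A1."""
        return np.sqrt(A_COEFF[x] / d + B_COEFF[x] / d ** 2)

    def c_exact(x, d):
        """Exact c_x(d): invert P(cos(theta) > c) = x/100 for random d-dimensional vectors."""
        a = (d - 1) / 2.0
        return 2.0 * beta.ppf(1.0 - x / 100.0, a, a) - 1.0

    for x in (10, 25, 40):                   # the cases plotted in Fig. A1
        d = 3 * 10 - 6                       # e.g. a ten-atom, nonlinear molecule
        print(x, round(c_fit(x, d), 4), round(c_exact(x, d), 4))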