Indian Journal of Chemistry
Vol. 53A, Aug-Sept 2014, pp. 1036-1042

A gradient-based trust radius update suitable for saddle point and transition state optimization

Paul W Ayers* & Sandra Rabi
Department of Chemistry & Chemical Biology, McMaster University, Hamilton, Ontario L8S 4M1, Canada
Email: [email protected]

Received 23 April 2014; accepted 25 April 2014

A new trust radius method, appropriate for saddle point optimization, is proposed. The idea is to ensure that the gradients from the quadratic model and the exactly computed gradients are similar in direction and magnitude. For transition state optimization, the new trust radius method is found to work somewhat better than the more traditional methods that are based on the value of the objective function.

Keywords: Theoretical chemistry, Saddle point optimization, Transition state optimization, Trust radius method

For most chemical reactions, good estimates for the equilibrium constants, branching ratios, reaction rates, and other thermodynamic and kinetic characteristics can be inferred from knowledge of the location, height, and vibrational frequencies of the minima (reactant, product, reactive intermediates) and first-order saddle points (transition states) through which the reaction coordinate passes.1 Finding the locations of stationary points on molecular potential energy surfaces is, therefore, the starting point for most computational studies of chemical thermodynamics and kinetics.2-5

Finding minima on the potential energy surface is relatively straightforward: starting from an initial guess for the stable molecular structure, one descends the potential energy surface until no further descent is possible. The result is always a local minimum on the potential energy surface. Finding transition states is more complicated because one may need to ascend or descend, depending on the starting structure. Furthermore, once one finds a stationary point, one must ensure that it has exactly one imaginary vibrational frequency (one negative curvature direction). For this reason, there is continuing interest in developing methods for locating transition states on potential energy surfaces, from our research group and from others.4-34

If one has a reasonable initial guess for the geometry of a minimum, then it is reasonable to use a (quasi-)Newton method to optimize the solution.35-37 In this method, one uses the Taylor expansion for the potential energy surface around a point, truncated at second order (Eq. 1),

U(q + ∆q) ≈ Υ(q + ∆q) ≡ U(q) + ∇U(q)·∆q + ½ (∆q)^T H(q) (∆q)    … (1)

and the Taylor expansion for the gradient of the molecular potential energy, truncated at first order (Eq. 2),

∇U(q + ∆q) ≈ γ(q + ∆q) ≡ ∇U(q) + H(q) (∆q)    … (2)

to make an improved guess for the energy. The Hessian matrix, H, can be computed exactly as the second-derivative matrix of the potential energy (Eq. 3),

H(q) = ∇ ∇^T U(q)    … (3)

but to reduce the computational expense of the calculation, the Hessian is typically approximated by a quasi-Newton method. Setting the approximate gradient, γ, to zero gives a linear equation for the (quasi-)Newton step, ∆q. The simplest form of Newton's method is to compute ∆q, then update the position of the atomic nuclei to q + ∆q. Evaluating the potential energy, gradient, and Hessian for the new geometry allows one to compute a new ∆q, which defines the second iteration of the geometry optimizer.
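To make the (quasi-)Newton step concrete, the short sketch below solves the linear equation obtained by setting the approximate gradient γ of Eq. (2) to zero. It is our illustrative Python/NumPy sketch of the textbook step, not the optimizer used in this work; the toy surface and the function name newton_step are ours.

```python
import numpy as np

def newton_step(grad, hessian):
    """Solve H(q) dq = -grad U(q) for the (quasi-)Newton step dq (Eq. 2 with gamma = 0)."""
    # A plain linear solve is adequate for this illustration; near a transition state the
    # Hessian has one negative eigenvalue, so production codes use shifted/RFO/TRIM steps.
    return np.linalg.solve(hessian, -grad)

# Toy two-dimensional example: U(q) = q1^2 - q2^2 has a first-order saddle point at the origin.
q = np.array([0.3, -0.2])
grad = np.array([2.0 * q[0], -2.0 * q[1]])      # exact gradient of the toy surface
hess = np.array([[2.0, 0.0], [0.0, -2.0]])      # exact Hessian of the toy surface
dq = newton_step(grad, hess)
print(q + dq)   # one full Newton step lands exactly on the saddle point [0, 0]
```

Because a bare Newton step has no notion of how far the quadratic model can be trusted, some form of step control (a line search or a trust radius, as discussed next) is needed in practice.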
Ideally, this method could be iterated to convergence (which occurs when the gradient of the potential energy surface is zero). Unfortunately, this approach is prone to limit cycles; ensuring convergence requires controlling the step-size. That is, one does not always take the full step ∆q; sometimes one takes a shorter step, or even a step in a somewhat different direction, to ensure that every step brings one closer to a solution. For geometry optimization methods in quantum chemistry, the step-size is usually controlled by performing a line search38-40 or by a trust radius method. Trust radius methods are found to be more efficient in most cases, and the rational function optimization (RFO) method8,16,41-45 and the trust region image method (TRIM)42,43,46 are popular within quantum chemistry. In our studies, we have found the TRIM method to be most efficient, and that is what we will use to test the method developed in this paper.

Trust Radius Methods

Traditionally, trust radius methods were designed for minimization. The idea—based on Eq. (1)—is that when the quadratic model for the energy closely resembles the actual energy at a new point, we can trust the Newton step, and larger steps should be allowed. Conversely, if the quadratic model for the energy is inaccurate, we should not trust the new step, and the size of the step should be reduced. (In particular, the trust radius is always reduced when the quadratic model predicts an energy decrease (increase) but the energy actually increases (decreases).) This intuition is quantified in Algorithm 1.47 Once the trust radius, τ, has been selected, one determines a new step by minimizing the quadratic model for the energy subject to the constraint that the step size is no larger than the trust radius, i.e.,

∆q = arg min { Υ(q + ∆q) : ‖∆q‖ ≤ τ }    … (4)

where Υ(q + ∆q) is the quadratic model for the energy (cf. Eq. (1)). This approach is highly effective for geometry minimization, but it is not obviously appropriate for transition state optimization, where a desirable step (i.e., a ∆q that brings one closer to the transition state structure) could either increase or decrease the energy. In our earlier work, we have used a slight revision of the preceding algorithm (see Algorithm 2) in this case. (Algorithm 2 is a refinement of the method in Ref. 48.)

Algorithm 1: Energy-based trust radius update for minimization.

Algorithm 2: Energy-based trust radius for saddle points. [The trust radius updating procedure is unchanged from Algorithm 1. The method for accepting/rejecting steps is similar to Algorithm 1, but line 3 becomes a condition on the gradient and a minimum step-size is imposed (lines 11-16).]

An important feature of this method is that very short steps are always accepted. If one is in a region of the potential energy surface that is so far from a transition state structure that all the eigenvalues of the Hessian are positive (i.e., all normal vibrational modes have real frequencies), then one must take an uphill step that increases the gradient to move into a region of the potential energy surface with an imaginary vibrational frequency. Since the quadratic model should be accurate when steps are very short, in Algorithm 2, one always accepts very short steps.
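Algorithms 1 and 2 themselves appear in the original figures and are not reproduced in this text, so the following Python sketch only illustrates the generic energy-based idea described above: compare the computed energy change with the change predicted by the quadratic model, then grow or shrink the trust radius. The parameter names (κ, φ_good, φ_bad, τ_min, τ_max) mirror those discussed later in the paper, but the accept/shrink/grow logic shown is a standard textbook choice, not the authors' exact listing.

```python
def energy_based_update(dE_actual, dE_model, step_norm, tau,
                        phi_good=2/3, phi_bad=1/3, kappa=2.0,
                        tau_min=0.1, tau_max=1.0):
    """One energy-based trust radius update in the spirit of Algorithm 1.

    dE_actual : computed change in the energy for the trial step
    dE_model  : change predicted by the quadratic model, Eq. (1)
    tau_min / tau_max here are fixed placeholders; in the paper they depend on the
    molecule (Eqs 9-10).  Returns (accept_step, new_tau).
    """
    # Ratio of actual to predicted energy change; a negative ratio means the model
    # predicted a decrease (increase) but the energy actually increased (decreased).
    ratio = dE_actual / dE_model if dE_model != 0.0 else 1.0
    if ratio >= phi_good:
        return True, min(kappa * tau, tau_max)         # trustworthy model: grow the radius
    if ratio <= phi_bad:
        return False, max(step_norm / kappa, tau_min)  # untrustworthy model: shrink and retry
    return True, tau                                   # intermediate agreement: keep the radius
```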
For transition state optimization, the criterion for a desirable step is not whether the change in energy resembles the approximate change in the energy from the quadratic model; desirable steps are those that reduce the gradient of the potential energy. Since the effectiveness of a transition state optimization method is predicated not on its ability to effectively optimize the energy, but on its ability to minimize the norm of the gradient, we decided to design a trust radius method based on how accurately the linear approximation to the gradient, Eq. (2), reproduces the true gradient. Since the gradient is a vector, not a scalar, we consider the approximate gradient, γ, to be accurate if the predicted change in the gradient is close to the actual change in the gradient,

cos(θ) = [(γ(q + ∆q) − ∇U(q)) · (∇U(q + ∆q) − ∇U(q))] / [‖γ(q + ∆q) − ∇U(q)‖ ‖∇U(q + ∆q) − ∇U(q)‖]    … (5)

This leads to a gradient-based algorithm for updating the trust radius (see Algorithm 3).

Algorithm 3: Gradient-based trust radius update for saddle points. [The method for accepting/rejecting a step is exactly the same as for Algorithm 2, except that the gradient-based trust radius update is used.]

The rationale for the condition used by Algorithm 3 to measure the alignment between the approximate and exact change in ∇U is perhaps not immediately apparent. As the dimensionality increases, it becomes increasingly unlikely for two vectors to be nearly aligned. Specifically, the probability distribution for the angle between two d-dimensional random vectors is

p_d(θ) = [Γ(d/2) / (√π Γ((d−1)/2))] sin^(d−2)(θ)    … (6)

and the average cosine of the angle between two vectors is

⟨|cos(θ_12)|⟩ = ⟨|v_1·v_2| / (‖v_1‖ ‖v_2‖)⟩ = [(d−2)!!/(d−1)!!] × {2/π if d is even; 1 if d is odd} ∼ d^(−1/2)    … (7)

Therefore, the cosine of the angle between the approximate and exact change in the gradient tends to decrease as the number of atoms increases (see Fig. 1). (This is an example of the curse of dimension: vectors in high-dimensional spaces are almost always nearly orthogonal.) For this reason, we measure the alignment between the vectors by comparing to a function, c_x(d), which is the value of cos(θ) such that x% of pairs of random vectors have a cosine greater than c_x(d), where d = 3N_atoms − 6 (or 3N_atoms − 5 for linear molecules) is the dimensionality of the molecular potential energy surface.

Fig. 1: The probability distribution function for the angle between two random vectors in (3N_atoms − 6) dimensions, plotted versus the number of atoms. This explains why, as the number of atoms increases, the cosine of the angle between the vectors representing the approximate and exact change in the gradient, Eq. (5), decreases.

Computational Tests

We tested the ability of the energy-based (Algorithm 2) and gradient-based (Algorithm 3) trust radius optimization methods to locate the transition states of five reactions from the benchmark set of Baker and Chan (see Fig. 2).49 Our transition state finding algorithm50 uses the Bofill quasi-Newton update,42 but uses finite differences to update key internal coordinates, as suggested in Ref. 48. The method uses a nonredundant system of delocalized internal coordinates51,52 generated from a set of redundant internal coordinates53 that is generated by a method inspired by the Dalton program.54 The trust region image method (TRIM)42,43,46 is used to determine the direction of a step with the specified trust radius.
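To spell out the test in Eq. (5) and the dimension-dependent alignment threshold used by Algorithm 3, a minimal Python sketch is given below. It is ours, not the listing in Algorithm 3: the helper c_threshold(x, d), assumed to return c_x(d) (see the Appendix), and the particular grow/shrink rule are illustrative, and the magnitude comparison (φ_good/φ_bad) that Algorithm 3 also applies is omitted for brevity.

```python
import numpy as np

def gradient_cosine(grad_old, grad_new, grad_model_new):
    """cos(theta) between the predicted and the actual change in the gradient, Eq. (5)."""
    d_model = grad_model_new - grad_old   # gamma(q + dq) - grad U(q), from Eq. (2)
    d_exact = grad_new - grad_old         # grad U(q + dq) - grad U(q), computed exactly
    return float(np.dot(d_model, d_exact)
                 / (np.linalg.norm(d_model) * np.linalg.norm(d_exact)))

def update_trust_radius(cos_theta, d, tau, step_norm, c_threshold,
                        x=10.0, y=40.0, kappa=2.0, tau_min=0.1, tau_max=1.0):
    """Direction part of a gradient-based update in the spirit of Algorithm 3.

    c_threshold(x, d) is an assumed helper returning c_x(d), the cosine that only x% of
    random d-dimensional vector pairs exceed.  tau_min / tau_max are placeholders here;
    the paper chooses them per molecule (Eqs 9-10).
    """
    if cos_theta >= c_threshold(x, d):            # aligned far better than chance: grow
        return min(kappa * tau, tau_max)
    if cos_theta <= c_threshold(y, d):            # barely better than random: shrink
        return max(step_norm / kappa, tau_min)
    return tau                                    # intermediate alignment: leave unchanged
```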
Energies and gradients were evaluated using the Gaussian program,55 with the Hartree-Fock method and the 6-31++G(d,p) basis set.

Fig. 2: The five reactions used to test the trust radius optimization methods in this paper.

The transition states were optimized at the HF/6-31++G(d,p) level, and then five random perturbations of the correct transition state structures were made to generate five initial geometries for each of the five reactions. As an initial trust radius, we chose

τ_initial = 0.35 N_atoms a.u.    … (8)

We believe that the maximum and minimum trust radii,

τ_max = N_atoms a.u.    … (9)

τ_min = (1/10) N_atoms a.u.    … (10)

are appropriate for molecular geometry optimization problems because the quadratic model for the energy should be accurate if none of the Cartesian coordinates change by more than about 0.1 a.u. For each reaction, we randomly perturbed the internal coordinates associated with bond forming/breaking so that the root-mean-square distance between the perturbed structure and the exact transition state was 0.4 a.u. Choosing five different perturbations for each of the five reactions gave a total of 25 transition state optimizations, which may be enough to make tentative conclusions about the trust radius methods presented here.

For each trust radius method, we considered several different choices for the defining parameters. We took three choices for the factor used to increase the trust radius when the quadratic model for the potential energy surface is accurate, κ = 1.5, 2.0, and 4.0. For the energy-based trust radius, the parameters used to assess the similarity between the quadratic model for the change in energy and the calculated change in energy were φ_good = 0.6, 0.7, 0.8, and 0.9 and φ_bad = 0.1, 0.2, 0.3, and 0.4 (cf. Algorithm 1). For the gradient-based trust radius, the parameters used to assess the similarity between the predicted change in the magnitude of the gradient and the actual change were φ_good = 0.6, 0.75, and 0.9 and φ_bad = 0.1, 0.25, and 0.4. The parameters used to assess the similarity between the predicted direction of the change in gradient and the actual direction of change were x = 5%, 10%, 20%, and 30% and y = 30%, 40%, and 45% (cf. Algorithm 3). Recall that c_z(d) is defined such that only z% of pairs of d-dimensional vectors have a value of cos(θ) greater than c_z(d). We examined all possible choices for these parameters for five initial guess geometries for each of the five reactions in Fig. 2, for a total of 9,300 transition state optimizations.

Remarkably, for every choice of parameters, all twenty-five transition state optimizations converged when the gradient-based trust radius method was used. (98% of the calculations using the energy-based trust radius converged; every calculation with κ < 4 converged, implying that κ = 4 increases the trust radius too aggressively for the energy-based method, but not for the gradient-based approach.) Disregarding the results for the energy-based trust radius with κ = 4, the computational cost of the two methods is very similar. Specifically, averaging over all five initial guesses for all five reactions, the gradient-based method required between 20.5 and 23.3 gradient evaluations (21.6, on average), depending on the parameters used in the trust radius updating procedure. The energy-based method required between 21.2 and 23.0 gradient evaluations (22.0, on average). We conclude that both methods are relatively insensitive to the parameters used.
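The total of 9,300 optimizations quoted above follows directly from the parameter grids listed in this section; a quick arithmetic check:

```python
# Energy-based grid: 3 kappa x 4 phi_good x 4 phi_bad = 48 parameter sets
energy_sets = 3 * 4 * 4
# Gradient-based grid: 3 kappa x 3 phi_good x 3 phi_bad x 4 x-values x 3 y-values = 324 sets
gradient_sets = 3 * 3 * 3 * 4 * 3
runs_per_set = 5 * 5        # five perturbed starting geometries for each of five reactions
print((energy_sets + gradient_sets) * runs_per_set)   # 9300 transition state optimizations
```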
In the future, we should perform more exhaustive tests, on larger sets of reactions and with more challenging initial guess geometries, so that we can determine optimal trust radius parameters. However, in our ongoing work, we have found the following parameter sets to be effective, though they are certainly not optimal. For the energy-based trust radius: κ = 2.0, φ_good = 2/3, and φ_bad = 1/3. For the gradient-based trust radius: κ = 2.0, φ_good = 0.8, φ_bad = 0.2, x = 10%, and y = 40%. With these parameters, we see indications that Algorithm 3 is more robust and more efficient than Algorithm 2, but more systematic parameter choices are needed. (It could, for example, be that the parameters for the gradient-based trust radius update are quite good, while the parameters for the energy-based trust radius update are far from optimal.)

Conclusions

We have developed a new gradient-based trust radius method (Algorithm 3) for use in transition state optimization, and our preliminary results indicate that for transition state optimization, the gradient-based trust radius method is remarkably robust, and slightly more robust than the more conventional trust radius updating method based on the accuracy of the quadratic approximation to the objective function (Algorithm 2). The basic idea in the gradient-based method is that one compares the approximate gradient (computed from the first-order Taylor series, Eq. (2)) to the exactly computed gradient, and increases (decreases) the trust radius when the gradients are similar (dissimilar) in direction and length. Preliminary, nonsystematic tests indicate that the gradient-based trust radius update is also effective for geometry minimization.

Acknowledgement

PWA and SR were supported by a Discovery Grant and a Canada Graduate Scholarship from the Natural Sciences and Engineering Research Council (NSERC), Canada. PWA acknowledges helpful discussions with Dr Steven K Burger on trust radius methods. The mathematical work was completed by PWA during a fervidly productive visit to Prof. Patricio Fuentealba and Prof. Carlos Cardenas at the Universidad de Chile, Santiago, Chile.

References
1 McQuarrie D A, Statistical Mechanics, (Harper-Collins, New York) 1976.
2 Schlegel H B, J Comput Chem, 24 (2003) 1514.
3 Wales D J, Energy Landscapes, (Cambridge University Press, Cambridge, UK) 2003.
4 Schlegel H B, Wiley Interdisc Rev-Comput Mol Sci, 1 (2011) 790.
5 Liu Y L, Burger S K, Dey B K, Sarkar U, Janicki M & Ayers P W, The Fast-Marching Method for Determining Chemical Reaction Mechanisms in Complex Systems, in Quantum Biochemistry, edited by C F Matta, (Wiley-VCH, Boston) 2010.
6 Halgren T A & Lipscomb W N, Chem Phys Lett, 49 (1977) 225.
7 Crippen G M & Scheraga H A, Arch Biochem Biophys, 144 (1971) 462.
8 Banerjee A, Adams N, Simons J & Shepard R, J Phys Chem, 89 (1985) 52.
9 Cerjan C J & Miller W H, J Chem Phys, 75 (1981) 2800.
10 Dey B K & Ayers P W, Mol Phys, 104 (2006) 541.
11 Dey B K, Janicki M R & Ayers P W, J Chem Phys, 121 (2004) 6667.
12 Burger S K & Ayers P W, J Chem Phys, 133 (2010) 034116.
13 Burger S K & Ayers P W, J Chem Theory Comp, 6 (2010) 1490.
14 Burger S K, Liu Y L, Sarkar U & Ayers P W, J Chem Phys, 130 (2009) 024103.
15 Henkelman G & Jonsson H, J Chem Phys, 111 (1999) 7010.
16 Heyden A, Bell A T & Keil F J, J Chem Phys, 123 (2005).
17 Peters B, Heyden A, Bell A T & Chakraborty A, J Chem Phys, 120 (2004) 7877.
18 Quapp W, J Theoret Comput Chem, 8 (2009) 101.
19 Quapp W, J Chem Phys, 122 (2005) 174106.
20 Hirsch M & Quapp W, J Comput Chem, 23 (2002) 887.
21 Quapp W, Hirsch M, Imig O & Heidrich D, J Comput Chem, 19 (1998) 1087.
22 Quapp W, Chem Phys Lett, 253 (1996) 286.
23 Bofill J M & Anglada J M, Theor Chem Acc, 105 (2001) 463.
24 Ohno K & Maeda S, Phys Scr, 78 (2008) 058122.
25 Ohno K & Maeda S, Chem Phys Lett, 384 (2004) 277.
26 Peng C Y & Schlegel H B, Israel J Chem, 33 (1993) 449.
27 E W N, Ren W Q & Vanden-Eijnden E, J Chem Phys, 126 (2007) 164103.
28 E W, Ren W Q & Vanden-Eijnden E, Phys Rev B, 66 (2002).
29 Ayala P Y & Schlegel H B, J Chem Phys, 107 (1997) 375.
30 Burger S K & Yang W, J Chem Phys, 127 (2007) 164107.
31 Burger S K & Yang W T, J Chem Phys, 124 (2006) 054109.
32 Koslover E F & Wales D J, J Chem Phys, 127 (2007).
33 Poppinger D, Chem Phys Lett, 35 (1975) 550.
34 Liu Y L, Burger S K & Ayers P W, J Math Chem, 49 (2011) 1915.
35 Fletcher R, Practical Methods of Optimization, 2nd Edn, (Wiley, Chichester; New York) 1987.
36 Dennis J E & Schnabel R B, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, (Prentice-Hall, NJ) 1983.
37 Nocedal J & Wright S J, in Springer Series in Operations Research, (Springer, New York) 1999.
38 Armijo L, Pacific J Math, 16 (1966) 1.
39 Wolfe P, SIAM Rev, 11 (1969) 226.
40 Wolfe P, SIAM Rev, 13 (1971) 185.
41 Baker J, J Comput Chem, 7 (1986) 385.
42 Bofill J M, J Comput Chem, 15 (1994) 1.
43 Culot P, Dive G, Nguyen V H & Ghuysen J M, Theor Chim Acta, 82 (1992) 189.
44 Besalu E & Bofill J M, Theor Chem Acc, 100 (1998) 265.
45 Anglada J M & Bofill J M, Int J Quantum Chem, 62 (1997) 153.
46 Helgaker T, Chem Phys Lett, 182 (1991) 503.
47 Nocedal J & Wright S J, Numerical Optimization, (Springer-Verlag, New York) 1999.
48 Burger S K & Ayers P W, J Chem Phys, 132 (2010) 234110.
49 Baker J & Chan F R, J Comput Chem, 17 (1996) 888.
50 Rabi S, Verstraelen T & Ayers P W (submitted).
51 Baker J, Kessi A & Delley B, J Chem Phys, 105 (1996) 192.
52 Baker J, Kinghorn D & Pulay P, J Chem Phys, 110 (1999) 4986.
53 Pulay P & Fogarasi G, J Chem Phys, 96 (1992) 2856.
54 Bakken V & Helgaker T, J Chem Phys, 117 (2002) 9160.
55 Gaussian 09, (Gaussian Inc, Wallingford CT) 2009.

Appendix

Derivation of the criterion for measuring alignment between d-dimensional vectors

Consider two vectors in d dimensions. The direction of the first vector can be chosen to define the z-axis. The probability that the angle between the vectors is less than θ_max can be evaluated by integration in hyperspherical coordinates,

p_d(θ_max) = [∫_0^θ_max ∫_0^π … ∫_0^π ∫_0^2π sin^(d−2)(θ_1) sin^(d−3)(θ_2) … sin^2(θ_(d−3)) sin(θ_(d−2)) dφ dθ_(d−2) … dθ_2 dθ_1] / [∫_0^π ∫_0^π … ∫_0^π ∫_0^2π sin^(d−2)(θ_1) sin^(d−3)(θ_2) … sin^2(θ_(d−3)) sin(θ_(d−2)) dφ dθ_(d−2) … dθ_2 dθ_1]    … (A1)

This integral can be expressed in terms of hypergeometric functions,

p_d(θ_max) = 1/2 − cos(θ_max) [Γ(d/2) / (√π Γ((d−1)/2))] ₂F₁(1/2, 3/2 − d/2; 3/2; cos^2(θ_max))    … (A2)

or, most simply, in terms of regularized beta functions,

p_d(θ_max) = ½ [1 − I_(cos^2(θ_max))(1/2, (d−1)/2)] = ½ I_(sin^2(θ_max))((d−1)/2, 1/2)    … (A3)
The expressions based on regularized beta functions assume that 0 ≤ θ_max ≤ π/2. We now define the inverse of this function. That is, we define the function, c_x(d), such that only x% of d-dimensional vector-pairs have a value of cos(θ) greater than c_x(d). The fundamental dependence of Eq. (A3) on cos^2(θ_max) suggests the asymptotic form

(c_x(d))^2 ≈ a_x/d + b_x/d^2    … (A4)

However, for convenience, we use the less fundamental form,

c_x(d) ≈ √(a_x/d + b_x/d^2)    … (A5)

Table A1 gives values of a_x and b_x that were determined by fitting to the values of (c_x(d))^2 in high numbers of dimensions. Figure A1 shows the quality of the fit, which is remarkably accurate even when the number of dimensions is small.

Fig. A1: The fit of c_x(d) to the asymptotic expression in Eq. (A5) for x = 10%, 25%, and 40%. The fit is remarkably accurate, even for just three atoms.

Table A1: Consider a pair of d-dimensional vectors, where d is large. x% of vector-pairs in d dimensions have a cosine between them that is greater than c_x(d) = √(a_x d^−1 + b_x d^−2). See Eq. (A5) and the preceding discussion.

x (%)    a_x       b_x
5        2.7055    0.4
10       1.6424    1.1
15       1.0742    1.0
20       0.7084    0.81
25       0.4549    0.58
30       0.2750    0.37
35       0.1484    0.21
40       0.0642    0.094
45       0.0158    0.024
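The appendix formulas translate directly into a few lines of code. The sketch below, which is ours and not part of the original paper, inverts Eq. (A3) with SciPy's regularized incomplete beta function to obtain c_x(d) exactly and compares the result with the fitted form of Eq. (A5) using a few of the Table A1 coefficients; it makes it easy to check the accuracy of the fit shown in Fig. A1 for any d.

```python
import numpy as np
from scipy.special import betaincinv

# A few (a_x, b_x) pairs copied from Table A1
TABLE_A1 = {5: (2.7055, 0.4), 10: (1.6424, 1.1), 25: (0.4549, 0.58), 40: (0.0642, 0.094)}

def c_exact(x_percent, d):
    """Exact c_x(d): only x% of random d-dimensional vector pairs have cos(theta) > c_x(d).

    Eq. (A3) gives P(theta < theta_max) = 0.5 * I_{sin^2(theta_max)}((d-1)/2, 1/2); setting
    this probability to x/100 and inverting the regularized beta function gives sin^2(theta_max).
    """
    sin2 = betaincinv((d - 1) / 2.0, 0.5, 2.0 * x_percent / 100.0)
    return np.sqrt(1.0 - sin2)        # cos(theta_max); valid for x <= 50%

def c_fit(x_percent, d):
    """Fitted c_x(d) of Eq. (A5) with the Table A1 coefficients."""
    a, b = TABLE_A1[x_percent]
    return np.sqrt(a / d + b / d ** 2)

d = 3 * 12 - 6                        # dimensionality for a 12-atom (nonlinear) molecule
for x in sorted(TABLE_A1):
    print(f"x = {x:2d}%   exact {c_exact(x, d):.4f}   fit {c_fit(x, d):.4f}")
```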