Optimization Methods and Software, Vol. 19, Nos. 3–4, June–August 2004, pp. 247–265

THEORETICAL EFFICIENCY OF A NEW INEXACT METHOD OF TANGENT HYPERBOLAS

NAIYANG DENG^a and HAIBIN ZHANG^b,*

^a College of Sciences, China Agricultural University, Beijing 100083, China; ^b Department of Applied Mathematics, Beijing University of Technology, Beijing 100022, China

(Received 6 February 2003; Revised 20 September 2003; In final form 10 January 2004)

A new improved method of tangent hyperbolas is established in this article. It is shown that, for middle- and large-scale unconstrained optimization problems, its average computation cost per iteration is much lower than that of the improved method of tangent hyperbolas (IMTH), while the rapid local convergence rate is maintained.

Keywords: Unconstrained optimization problems; Preconditioned conjugate gradient method (PCG); Improved method of tangent hyperbolas; Automatic differentiation

AMS Subject Classification: 65K05; 90C30

* Corresponding author. Fax: (8610)67391738; E-mail: [email protected]

ISSN 1055-6788 print; ISSN 1029-4937 online © 2004 Taylor & Francis Ltd. DOI: 10.1080/10556780410001683087

1 INTRODUCTION

Consider the middle- and large-scale unconstrained optimization problem

    min_{x ∈ R^n} f(x).                                                        (1)

Suppose that problem (1) satisfies the following standard assumptions.

ASSUMPTION (A1) f is four-times continuously differentiable in a neighbourhood of the local minimum x*, which minimizes f(x).

ASSUMPTION (A2) The Hessian ∇²f(x*) is symmetric positive definite.

In this article, we are concerned with third-order methods [see e.g. Refs. 8, 10]. Just as Newton's method approximates the gradient of the objective function by a linear function with the same slope at the current iterate, the third-order methods approximate the gradient by a 'parabola' with the same slope and curvature at the current iterate. The most famous third-order method is the improved method of tangent hyperbolas (Algorithm IMTH):

    x_{k+1} = x_k + s_k^1 + s_k^2,                                             (2)

where s_k^1 and s_k^2 are the solutions to the following Newton equation and Newton-like equation, respectively:

    ∇²f(x_k) s_k^1 = −∇f(x_k),                                                 (3)

    ∇²f(x_k) s_k^2 = −(1/2) ∇³f(x_k) s_k^1 s_k^1.                              (4)

The cost of one step of Algorithm IMTH consists of two parts:

(1) Evaluating the gradient ∇f(x_k), the Hessian ∇²f(x_k), and the term ∇³f(x_k) s_k^1 s_k^1.
(2) Solving the Newton equation (3) and the Newton-like equation (4).

Our aim is to improve Algorithm IMTH further by reducing its computation cost while keeping the same convergence rate. The following two strategies are used:

(1) Automatic differentiation (AD). AD is an efficient differentiation technique whose results are exact up to rounding error. It has significant advantages over hand-coding, finite-difference approximation, and symbolic differentiation [see Ref. 14]. The forward mode and the reverse mode of AD were apparently first proposed in Ref. [15] and Ref. [11], respectively. Since the 1990s, AD has developed rapidly, see e.g. Refs. [2,7]. One of its most important applications is to improve optimization algorithms by computing the relevant derivative information efficiently; a small forward-mode sketch is given after this list.
(2) Preconditioned conjugate gradient (PCG) method. Computing the exact solution to a linear system by a direct method such as Cholesky factorization (CF) may be expensive for middle- and large-scale problems. It is reasonable to use an iterative method to solve the system only approximately, see e.g. Refs. [4,5,12,13]. Here the PCG method is selected.
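To make the AD strategy concrete, the following minimal sketch illustrates forward-mode AD with dual numbers: propagating a tangent direction through the evaluation of f yields the directional derivative ∇f(x)ᵀẋ exactly, up to rounding error. The class and the test function here are illustrative only; they are not the Algorithms AD1–AD3 used later in the paper.

# Minimal forward-mode AD with dual numbers (an illustration only,
# not the AD1/AD2/AD3 algorithms of Section 2.1).  Each Dual carries
# a value and the tangent obtained along a chosen direction xdot.
class Dual:
    def __init__(self, value, tangent=0.0):
        self.value, self.tangent = value, tangent

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.tangent * other.value + self.value * other.tangent)

    __rmul__ = __mul__


def directional_derivative(f, x, xdot):
    """Return (f(x), gradient(f)(x) . xdot) from one forward sweep."""
    duals = [Dual(xi, di) for xi, di in zip(x, xdot)]
    out = f(duals)
    return out.value, out.tangent


# Hypothetical test function f(x) = x0^2 * x1 + x1:
f = lambda x: x[0] * x[0] * x[1] + x[1]
val, dirderiv = directional_derivative(f, [1.0, 2.0], [1.0, 0.0])
# dirderiv equals the partial derivative with respect to x0 at (1, 2), i.e. 4.

One forward sweep costs a small constant multiple of a function evaluation, which is the kind of bound exploited in the cost analysis of Section 4.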
Since only the local behaviour of the algorithm is of concern, we restrict our discussion to a sufficiently small neighbourhood of the solution to Eq. (1). Thus it can be assumed that the norms of the right-hand sides of Eqs. (3) and (4) are small, e.g. less than 1. It should be noted that the idea of applying the PCG method of Ref. [5] to a third-order method was mentioned briefly in Ref. [3].

Using the above two strategies, a new improved method of tangent hyperbolas (NIMTH), called Algorithm NIMTH, is established in this article. It is well known that optimization algorithms equipped with AD are more efficient than the same algorithms with other differentiation techniques. Therefore, in this article we only show the superiority of our new Algorithm NIMTH over the original Algorithm IMTH when AD is used in both of them.

In this article, the following notations and conventions are used: ẋ ∈ R^n denotes a directional vector. ∇f(x), ∇²f(x), and ∇³f(x) are, respectively, the gradient, the Hessian matrix, and the third-order derivative tensor of the function f(x). κ(A) denotes the condition number of the matrix A. ⌊·⌋ denotes the largest integer not larger than its argument, and ⌈·⌉ denotes the smallest integer not smaller than its argument. Finally, we make the convention that Σ_{t=a}^{b} (···) = 0 when a > b.

This article is organized as follows. An Algorithm Model NIMTH(p) with a parameter p is established in Section 2. The local convergence and the efficiency of the algorithm model are given in Sections 3 and 4, respectively. In Section 5, we derive our new Algorithm NIMTH from Algorithm Model NIMTH(p) and show that it is much more efficient than the original Algorithm IMTH for middle- and large-scale problems.

2 ALGORITHM MODEL NIMTH(p)

Starting from Algorithm IMTH, we now establish Algorithm Model NIMTH(p), where p is a parameter. Its characteristics are: the derivative information is calculated by AD, and the linear systems of equations are solved by either CF or the PCG method. So we first give the AD algorithms and the PCG method, and then propose the algorithm model.

2.1 Automatic Differentiation Algorithms

The gradient, the directional gradient, and the second-order directional gradient of the function can be evaluated by AD, see e.g. Ref. [7]. Two AD algorithms, Algorithm AD1 and Algorithm AD2(m), from Ref. [6] will be used later. The former will be used to evaluate a gradient ∇f(x); the latter will be used to evaluate m Hessian-vector products ∇²f(x)ẋ_i, i = 1, ..., m, where ẋ_i ∈ R^n and the parameter m is any positive integer. In order to evaluate ∇³f(x)ẋẋ by AD, the following algorithm is also proposed.

ALGORITHM AD3
Step 0. Set x, ẋ ∈ R^n.
Step 1. Calculate ∇²f(x)ẋ by Algorithm AD2(1).
Step 2. Evaluate ∇³f(x)ẋẋ using forward propagation of tangents.

2.2 Algorithm PCG

For solving the linear equation system

    ∇²f(x)s = b,                                                               (5)

the following Algorithm PCG(M, ∇f(x), b, l) is constructed, where M is the preconditioner and l is the maximum number of subiterations; a short code sketch is given after the algorithm.

ALGORITHM PCG(M, ∇f(x), b, l)
Step 0. Initial Data. Set s_0 = 0, r_0 = b, i = 1.
Step 1. Termination Test. If

    ‖r_{i−1}‖ ≤ ‖b‖³  or  i − 1 = l,                                           (6)

then terminate the iteration by taking s̄ = s_{i−1}.
Step 2. Subiteration:
(a) Solve the equation Mz = r_{i−1} for z and set t_{i−1} = zᵀ r_{i−1}. If i = 1, then set β = 0 and q = z; else set β = t_{i−1}/t_{i−2} and q = z + βq. Evaluate

    w = ∇²f(x)q                                                                (7)

by Algorithm AD2(1) with ẋ_1 = q.
(b) Set s_i = s_{i−1} + αq, where α = t_{i−1}/(qᵀw); set r_i = r_{i−1} − αw.
Step 3. Set i = i + 1 and go to Step 1.
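The following sketch mirrors Algorithm PCG under two assumptions that are not part of the paper: the Hessian-vector product is supplied as a callable (standing in for Algorithm AD2(1)), and the preconditioner M is given as a dense symmetric positive definite matrix. The function name and interface are illustrative.

import numpy as np

def pcg(hess_vec, M, b, l, tol_power=3):
    """Sketch of Algorithm PCG(M, grad f(x), b, l).

    hess_vec : callable q -> Hessian(x) @ q  (plays the role of Algorithm AD2(1))
    M        : preconditioner (dense SPD matrix; in NIMTH(p) it is B_k)
    b        : right-hand side of the linear system Hessian(x) s = b
    l        : maximum number of subiterations
    """
    n = b.shape[0]
    s = np.zeros(n)              # Step 0: s_0 = 0
    r = b.copy()                 # r_0 = b
    q = np.zeros(n)
    t_prev = None
    for i in range(1, l + 1):
        # Step 1, test (6): stop when ||r_{i-1}|| <= ||b||^3; the bound i-1 = l
        # is enforced by the loop range itself.
        if np.linalg.norm(r) <= np.linalg.norm(b) ** tol_power:
            break
        # Step 2(a): preconditioning and search-direction update
        z = np.linalg.solve(M, r)
        t = z @ r
        beta = 0.0 if t_prev is None else t / t_prev
        q = z + beta * q
        w = hess_vec(q)          # Eq. (7): one Hessian-vector product
        # Step 2(b): update iterate and residual
        alpha = t / (q @ w)
        s = s + alpha * q
        r = r - alpha * w
        t_prev = t               # Step 3: next subiteration
    return s

# Hypothetical usage on a small quadratic model with Hessian A and b = -g:
A = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
s_bar = pcg(lambda q: A @ q, M=np.eye(2), b=-g, l=2)

Each subiteration requires one Hessian-vector product and one preconditioner solve, which is what the cost bound of Section 4.1.3 counts.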
It should be noted that, according to the test criterion (6), the number of PCG subiterations is less than or equal to l. In fact, if the number of subiterations were allowed to reach n, the solution would be exact, as if it resulted from a CF.

2.3 Algorithm Model NIMTH(p)

Now we are in a position to propose the new algorithm model. Its basic structure is as follows: every CF step is followed by p PCG steps, where p is a parameter.

ALGORITHM MODEL NIMTH(p)
Step 0. Initial Data. Set the initial point x_0 ∈ R^n and the parameter p. If p ≥ 1, set the maximum numbers of subiterations

    l_1^N = 2·3, l_2^N = 2·3², ..., l_p^N = 2·3^p  and  l_1^H = 3, l_2^H = 3², ..., l_p^H = 3^p,

where the superscripts N and H refer, respectively, to the Newton equation (3) and the Newton-like equation (4). Set k = 0.
Step 1. Evaluate ∇f(x_k) by Algorithm AD1. If ∇f(x_k) = 0, then terminate the iteration by taking x* = x_k.
Step 2. Switch Test. If k is divisible by p + 1 with no remainder, go to Step 3; otherwise, go to Step 4.
Step 3. CF Step. Evaluate ∇²f(x_k) by Algorithm AD2(n) with ẋ_i = e_i, i = 1, ..., n, where e_i is the ith Cartesian basis vector in R^n. If p > 0, set B_k = ∇²f(x_k). Solve the Newton equation (3) for s_k^1 by the CF ∇²f(x_k) = L_k D_k L_kᵀ; evaluate ∇³f(x_k) s_k^1 s_k^1 by Algorithm AD3 with x = x_k and ẋ = s_k^1; then solve the Newton-like equation (4) for s_k^2 with the same factorization. Set m = 0 and go to Step 5.
Step 4. PCG Step. Set m = m + 1 and

    M = B_k.                                                                   (8)

Find the approximate solutions s_k^1 and s_k^2 to the Newton equation (3) and the Newton-like equation (4) by Algorithm PCG(M, ∇f(x_k), −∇f(x_k), l_m^N) and Algorithm PCG(M, ∇f(x_k), −(1/2)∇³f(x_k) s_k^1 s_k^1, l_m^H), respectively.
Step 5. Update Solution Estimate. Set x_{k+1} = x_k + s_k^1 + s_k^2. Set k = k + 1 and go to Step 1.

3 CONVERGENCE ANALYSIS OF ALGORITHM MODEL NIMTH(p)

In this section, the local convergence of the new algorithm model is given.

DEFINITION 3.1 Let both x_CF and x_c be near the solution x* to Eq. (1). The progress index ν = ν[x_CF, x_c] from x_CF to x_c with respect to x* is defined as

    ν = ν[x_CF, x_c] = ln‖x_c − x*‖ / ln‖x_CF − x*‖,                           (9)

where x_CF denotes the point obtained by the CF step and x_c denotes the current iterate obtained from a PCG step between one CF step and the next CF step. The progress index ν roughly reflects the convergence order.

LEMMA 3.1 Assume that x_+ = x_CF + s_CF^1 + s_CF^2, where s_CF^1 and s_CF^2 are the solutions to the Newton equation

    ∇²f(x_CF) s^1 = −∇f(x_CF)

and the Newton-like equation

    ∇²f(x_CF) s^2 = −(1/2) ∇³f(x_CF) s^1 s^1,

respectively. Then there exists δ ∈ (0, 1) such that for the solution x* to Eq. (1), when ‖x_CF − x*‖ ≤ δ, we have

    ν[x_CF, x_+] ≥ 3 + θ_1,                                                    (10)

where θ_1 = ln C_1 / ln‖x_CF − x*‖ and C_1 > 1 is a constant.

Proof In fact, by Assumption (A1), we have

    ∇f(x*) − ∇f(x_CF) = ∇²f(x_CF)(x* − x_CF) + (1/2)∇³f(x_CF)(x* − x_CF)(x* − x_CF) + O(‖x* − x_CF‖³).   (11)

Since ∇f(x*) = 0, we conclude

    ‖∇f(x_CF)‖ = O(‖x_CF − x*‖).                                               (12)

By 0 = ∇f(x*) = ∇f(x_CF) + ∇²f(x_CF)(x* − x_CF) + O(‖x* − x_CF‖²), we get ∇f(x_CF) + ∇²f(x_CF)(x* − x_CF) = O(‖x* − x_CF‖²). Since x_CF is sufficiently near x*, Assumptions (A1) and (A2), together with ‖∇²f(x_CF)⁻¹‖ = O(1), yield ∇²f(x_CF)⁻¹[∇f(x_CF) + ∇²f(x_CF)(x* − x_CF)] = O(‖x* − x_CF‖²). Therefore,

    x* − x_CF + ∇²f(x_CF)⁻¹∇f(x_CF) = O(‖x* − x_CF‖²).                         (13)

From Eqs.
(11)–(13), we have 1 2 x+ − x ∗ = xCF + sCF + sCF − x∗ = xCF − ∇ 2 f (xCF )−1 ∇f (xCF ) 1 − ∇ 2 f (xCF )−1 ∇ 3 (xCF )[−∇ 2 f (xCF )−1 ∇f (xCF )]2 − x ∗ 2 = ∇ 2 f (xCF )−1 {∇ 2 f (xCF )(xCF − x ∗ ) − ∇f (xCF ) 1 − ∇ 3 f (xCF )[−∇ 2 f (xCF )−1 ∇f (xCF )]2 } 2 1 3 = ∇ 2 f (xCF )−1 ∇ f (xCF )(x ∗ − xCF )2 + O(x ∗ − xCF 3 ) 2 1 − ∇ 2 f (xCF )−1 ∇ 3 f (xCF )[−∇ 2 f (xCF )−1 ∇f (xCF )]2 2 (13) 252 N. DENG AND H. ZHANG = 1 2 ∇ f (xCF )−1 ∇ 3 f (xCF )[x ∗ − xCF − ∇ 2 f (xCF )−1 ∇f (xCF )] 2 × [x ∗ − xCF + ∇ 2 f (xCF )−1 ∇f (xCF )] + O(x ∗ − xCF 3 ) = 1 2 ∇ f (xCF )−1 ∇ 3 f (xCF )O(x ∗ − xCF )O(x ∗ − xCF )2 ) 2 + O(x ∗ − xCF 3 ) = O(x ∗ − xCF 3 ). In other word, there exists a constant C1 > 1 such that x+ − x ∗ ≤ C1 xCF − x ∗ 3 . Thus, Eq. (10) is obtained from Eq. (9). LEMMA 3.2 (14) Let the progress index ν from xCF to xc with respect to x ∗ satisfy ν = ν[xCF , xc ] ≥ 1. (15) Assume that s̄ 1 and s̄ 2 are obtained by Algorithm PCG (∇ 2 f (xCF ), ∇f (xCF ), −∇f (xCF ), lmN ) and Algorithm PCG (∇ 2 f (xCF ), ∇f (xCF ), −(1/2)∇ 3 f (xCF )s̄ 1 s̄ 1 , lmH ), respectively. Then there exists δ ∈ (0, 1), such that when xCF − x ∗ ≤ δ, the residuals r(s̄ 1 ) = ∇ 2 f (xc )s̄ 1 + ∇f (xc ) and 1 r (s̄ 2 ) = ∇ 2 f (xc )s̄ 2 + ∇ 3 f (xc )s̄ 1 s̄ 1 , 2 respectively, satisfy N r(s 1 ) ≤ ω ∇f (xc )1+min{2,lm /ν} , (16) and H r (s 2 ) ≤ ω ∇f (xc )2+min{1,lm /ν} , (17) where ω > 0 and w > 0 are constants. Proof We first prove the inequality (16). Without loss of generality, we assume ∇f (xc ) < 1. According to termination condition (6), there are two possibilities: either r(s 1 ) ≤ ∇f (xc )3 , (18) or the number of the subiterations is lmN , i.e. s̄ 1 = sl1N . m (19) The former case implies that the inequality (16) is true with ω = 1 and 1 + ( lmN / max{ν, 3m } ≤ 3. So we only need to show Eq. (16) for the latter case. First, we estimate the difference between (∇ 2 f (xc ))ij and (∇ 2 f (xCF ))ij , where (·)ij is the element in the ith row and the j th column of the matrix. Noticing Assumption (A2), INEXACT METHOD OF TANGENT HYPERBOLAS 253 Definition 3.1 and Eq. (15), we have that when δ is small enough, there exists a constant ω1 > 0, such that 1 1 ω1 xc − xCF ≤ ω1 (xc − x ∗ + xCF − x ∗ ) 2 2 1 = ω1 (xc − x ∗ + xc − x ∗ 1/ν ) ≤ ω1 xc − x ∗ 1/ν (20) 2 |(∇ 2 f (xc ))ij − (∇ 2 f (xCF ))ij | ≤ On the other hand, by Assumptions (A1) and (A2), we conclude that when δ is small enough, ∇f (xc ) = ∇f (xc ) − ∇f (x ∗ ) ≥ 1 ω2 xc − x ∗ 2 (21) where ω2 = min{λmin , 1}, and λmin is the smallest eigenvalue of the matrix ∇ 2 f (x ∗ ). It follows from Eqs. (20) and (21) that |(∇ 2 f (xc ))ij − (∇ 2 f (xCF ))ij | ≤ ω1 2 ω2 1/ν ∇f (xc )1/ν ≤ ω3 ∇f (xc )1/ν . (22) where ω3 = 2ω1 /ω2 , which is independent of ν. Let A = (∇ 2 f (xCF ))−1/2 ∇ 2 f (xc )(∇ 2 f (xCF ))−1/2 = I + (∇ 2 f (xCF ))−1/2 [∇ 2 f (xc ) − ∇ 2 f (xCF )](∇ 2 f (xCF ))−1/2 . (23) It is easy to see that when δ is small enough for all i, j , |(∇ 2 f (xCF )−1/2 )ij | ≤ ω4 , (24) def where ω4 = max{|(∇ 2 f (x ∗ )−1/2 )ij ||i, j = 1, 2, . . . , n} and > 1 is a positive constant. Therefore, from Eqs. (22) and (24), |((∇ 2 f (xCF ))−1/2 [∇ 2 f (xc ) − ∇ 2 f (xCF )](∇ 2 f (xCF ))−1/2 )ij | ≤ ω5 ∇f (xc )1/ν n (25) where ω5 = n3 ω3 ω42 . By Gerschgoring Theorem, Eqs. (22) and (24), we can get that for any eigenvalue λ of A, there exists a aii (a diagonal entry of A), such that |λ − aii | ≤ (n − 1) By Eq. (24), we have |aii − 1| ≤ ω5 ∇f (xc )1/ν . 
n ω5 ∇f (xc )1/ν , n therefore, |λ − 1| ≤ |λ − aii | + |aii − 1| ≤ ω5 ∇f (xc )1/ν , that is, 1 − ω5 ∇f (xc )1/ν ≤ λ ≤ 1 + ω5 ∇f (xc )1/ν . (26) 254 N. DENG AND H. ZHANG Therefore, the condition number of A, κ(A) satisfies κ(A) = λmax (A) 1 + ω5 ∇f (xc )1/ν . ≤ λmin (A) 1 − ω5 ∇f (xc )1/ν (27) Thus by Eq. (27), noticing limδ→0 ∇f (xc ) = 0, we conclude that when δ is small enough, κ(A) ≤ 2(1 − ω5 ), κ(A) − 1 ≤ (28) 2ω5 ∇f (xc ) ≤ 4ω5 ∇f (xc )1/ν . 1 − ω5 ∇f (xc )1/ν 1/ν (29) Now, we consider to solve the linear system Aŝ = −b (30) using conjugate gradient method, where A is defined by Eq. (23) and b = ∇ 2 f (xCF )−1/2 ∇f (xc ). (31) Let the initial ŝ0 = 0 and ŝ be the approximate solution obtained after l N subiterations. Let us estimate the residual r̂(ŝl N ) = Aŝl N + b. (32) In fact, from the Lemma 2.3.2 in Ref. [9], we have and r̂(ŝl N ) ŝ N − ŝ ∗ A ≤ (κ(A))1/2 l r̂(0) 0 − ŝ ∗ A (33) l N ŝl − ŝ ∗ A (κ(A))1/2 − 1 ≤ 2 . 0 − ŝ ∗ A (κ(A))1/2 + 1 (34) where ŝ ∗ is the exact solution to Eq. (30). Combining Eqs. (33), (34), (28), and (29) we have r̂(ŝl N ) ≤ 2r̂(0)(κ(A)) 1/2 (κ(A))1/2 − 1 (κ(A))1/2 + 1 ≤ 2r̂(0)(κ(A))1/2 (κ(A) − 1)l l N N N ≤ 2r̂(0)(2(1 + ω5 ))1/2 (4ω5 )l ∇f (xc )l lN N /ν ≤ 2∇ 2 f (xCF )−1/2 (2(1 + ω5 ))1/2 (4ω5 ) ∇f (xc ) ≤ ω6 ∇f (xc )1+l N /ν , N where ω6 = 4(2(1 + ω5 ))1/2 (4ω5 )l ∇ 2 f (x ∗ )−1/2 . Considering the relationships sl N = −∇ 2 f (xc )−1 ∇f (xc ), ŝl N = (∇ 2 f (xCF ))1/2 sl N , (35) 1+l N /ν INEXACT METHOD OF TANGENT HYPERBOLAS 255 and r(sl N ) = (∇ 2 f (xCF ))1/2 r̂(sl N ), we obtain r(sl N ) ≤ ω∇f (xc )1+l N /ν , (36) where ω = 2ω6 (∇ 2 f (x ∗ ))1/2 . For the solution s̄ 2 to Newton-like equation, similarly we can prove that, after executing l H subiterations of the PCG, similar to Eq. (35), the residual satisfies H r (sl H ) ≤ 2r (0)(2(1 + ω5 ))1/2 (4ω5 )l ∇f (xc )l H /ν . (37) Since r (0) = 1 3 1 ∇ f (xc )sl N sl N ≤ ∇ 3 f (xc )sl N 2 , 2 2 and Assumptions (A1) and (A2), there exists a constant C such that r (0) ≤ C ∇f (xc )2 . Therefore, r (sl H ) ≤ ω ∇f (xc )2+l H /ν , where ω is a constant. Therefore from Eqs. (36) and (38), we get the conclusion. (38) LEMMA 3.3 Suppose that ν ≥ 1 and x+ = xc + s̄ 1 + s̄ 2 , where ν, s̄ 1 , and s̄ 2 are defined in Lemma 3.2. Then there exists δ ∈ (0, 1), such that for the solution x ∗ to Eq. (1), when xCF − x ∗ ≤ δ and xc − x ∗ ≤ δ, we have ν[xCF , x+ ] ≥ ν + θ2 , (39) = min{3, 1 + min{2, lmN /ν}, 2 + min{1, lmH /ν}}, (40) where def θ2 = ln C2 < 0, ln xCF − x ∗ (41) and C2 > 1 is a constant. Proof From the definition of the residual r(s̄ 1 ) and s̄ 2 , we have s̄ 1 = −∇ 2 f (xc )−1 ∇f (xc ) + ∇ 2 f (xc )−1 r(s̄ 1 ), (42) 1 s̄ 2 = − ∇ 2 f (xc )−1 ∇ 3 f (xc )s̄ 1 s̄ 1 + ∇ 2 f (xc )−1 r(s̄ 2 ). 2 (43) and 256 N. DENG AND H. ZHANG Suppose 1 B1 = xc − ∇ 2 f (xc )−1 ∇f (xc ) − ∇ 2 f (xc )−1 ∇ 3 f (xc ) 2 × [−∇ 2 f (xc )−1 ∇f (xc )]2 − x ∗ −1 −1 (44) −1 B2 = ∇ f (xc ) r(s̄ ) + ∇ f (xc ) ∇ f (xc )[∇ f (xc ) ∇f (xc )] 2 1 2 3 2 1 × [∇ 2 f (xc )−1 r(s̄ 1 )] − ∇ 2 f (xc )−1 ∇ 3 f (xc )[∇ 2 f (xc )−1 r(s̄ 1 )]2 2 B3 = ∇ 2 f (xc )−1 (s̄ 1 ). Then (45) (46) x+ − x ∗ = xc + s̄ 1 + s̄ 2 − x ∗ = B1 + B2 + B3. (47) By Assumptions (A1) and (A2), Lemma 3.2, we can see that there exist constants M1 , M2 , and M3 that are only dependent on x ∗ , such that the following three inequalities hold: B1 ≤ M1 xc − x ∗ 3 , B2 ≤ M2 ∇f (xc ) (48) 1+l N / max{ν,3m } , (49) H / max{ν,3m } , (50) B3 ≤ M3 ∇f (xc )2+l where the proof of Eq. (48) can be obtained by the proof of Lemma 3.1 with setting xc = xCF , while Eqs. 
(49) and (50) are not difficult to prove. Therefore, by Eqs. (47)–(50) x+ − x ∗ ≤ C2 xc − x ∗ , (51) where and C2 are defined in Eqs. (40) and (41). Notice that ν[xCF , x+ ] = ν[xc , x+ ]ν, we can get Eq. (39) from Eq. (51). For convenience, in expression we rewrite the sequence {xk } generated by Algorithm Model NIMTH(p) as: CF {x , x CF , . . . , xjCF when p = 0 (p+1) , . . . , }, 0(p+1) 1(p+1) CF PCG (52) {xk } = {x0(p+1) , x0(p+1)+1 , . . . , CF PCG CF xj (p+1) , xjPCG (p+1)+1 , . . . , xj (p+1)+p , x(j +1)(p+1) , . . . , }, when p > 0, where the notations CF or PCG are to show which step is executed at the corresponding iterate. CF CF We call the iterations to obtain x(j +1)(p+1) from xj (p+1) as j th cycle of the algorithm model. LEMMA 3.4 Consider the sequence (52) generated by Algorithm Model NIMTH(p). Then there exists δ ∈ (0, 1) such that, when x0 − x ∗ < δ, we have, for the solution x ∗ to Eq. (1) and for any j , ν[xjCF (p+1) , xj (p+1)+q+1 ] ≥ 3 + 1 t=1 ltN + 3q+1 − 1 θ > 1 for q = 0, 1, . . . , p, 2 (53) where θ = min{θ1 , θ2 }, while θ1 and θ2 are defined in Lemma 3.1 and Lemma 3.3, respectively. Proof We consider the two cases p = 0 and p > 0 separately. For the former case, by ∗ Lemma 3.1, we have that when δ is small enough and xjCF (p+1) − x ≤ δ, CF CF CF ν[xjCF (p+1) , x(j +1)(p+1) ] = ν[xj , xj +1 ] ≥ 3 + θ1 ≥ 3 + θ > 1. (54) INEXACT METHOD OF TANGENT HYPERBOLAS 257 For the latter case, it is sufficient to prove by induction that when δ is small enough and ∗ xjCF (p+1) − x ≤ δ, Eq. (54) and ∗ xjCF (p+1)+q+1 − x ≤ δ, q = 0, 1, . . . , p. (55) are valid. First, note that when q = 0, Eqs. (53) and (55) can be obtained by Lemma 3.1 directly. Second, we assume the validity of Eqs. (53) and (55) with q = i − 1 and prove their validity ∗ with q = i. In fact, by Lemma 3.3 we can conclude that when xjCF (p+1) − x ≤ δ, PCG ν[xjCF (p+1) , xj (p+1)+i+1 ] ≥ min{α1 , α2 , α3 }, (56) where α1 = PCG ν[xjCF (p+1) , xj (p+1)+i ] + min 1, PCG ν[xjCF (p+1) , xj (p+1)+i ] PCG α2 = 2ν[xjCF (p+1) , xj (p+1)+i ] + min 1, 3i PCG ν[xjCF (p+1) , xj (p+1)+i ] liN + θ2 , 3i liH + θ2 , PCG α3 = 3ν[xjCF (p+1) , xj (p+1)+i ] + θ2 . It is easy to see from the induction assumption that PCG i ν[xjCF (p+1) , xj (p+1)+i ] ≥ 3 1 − 1/3i 3i − 1 θ = 3i 1 + θ . 2 2 Thus, α1 ≥ 3 + i−1 ltN t=1 =3+ ≥3+ 3i − 1 1 − 1/3i + θ + 1+ θ liN + θ 2 2 1 − 1/3i N ltN + 1 + li θ 2 t=1 i i ltN + t=1 3i+1 − 1 θ, 2 (57) 1 − 1/3i H 3i − 1 α2 ≥ 2 3 + θ + 1+ li + θ + 2 2 t=1 i−1 i−1 = 3+ ltN + 3 + ltN + liH i−1 ltN t=1 + (3i − 1) + =3+ i t=1 ltN + t=1 1 − 1/3i H li + 1 2 3i+1 − 1 θ, 2 θ (58) 258 N. DENG AND H. ZHANG 3i − 1 α3 ≥ 3 3 + + θ +θ 2 t=1 i−1 i−1 3i − 1 N N =3+ lt + 2 3 + lt + 3 θ +θ 2 t=1 t=1 =3+ i−1 i t=1 ltN ltN + 3i+1 − 1 θ. 2 (59) The validity of Eq. (53) follows from Eqs. (56) and (59). Combining Eqs. (53) with the condition ∗ xjCF (p+1) − x ≤ δ yields Eq. (53). THEOREM 3.1 Algorithm Model NIMTH(p) is locally convergent. Furthermore, there exists a constant C3 > 0, such that CF ∗ CF ∗ 3 x(j +1)(p+1) − x ≤ C3 xj (p+1) − x Proof 4 p+1 . (60) Setting q = p in Lemma 3.4, the conclusion is obtained. EFFICIENCY ANALYSIS OF ALGORITHM MODEL NIMTH( p) In this section, we will analyze the efficiency of Algorithm Model NIMTH(p) after examining the computation cost. 4.1 The Computation Cost of Algorithm Model NIMTH( p) 4.1.1 The Computation Cost of Automatic Differentiation Algorithms Consider the computation cost AAD1 to evaluate a gradient ∇f by Algorithm AD1. By Eq. (32) in Ref. [7, Chap. 
3], Section 3.4, we have QAD1 ≤ 4Qf , (61) Qf = the computation cost to evaluate a function value f (x). (62) where The computation cost QAD2 (m) of Algorithm AD2 consists of two parts: the cost QAD1 to evaluate a gradient ∇f and the cost QHV (m) to evaluate m Hessian-vector products after ∇f (x) is evaluated. Using Eq. (14) in Ref. [7, Chap. 3], Section 3.2, we have QHV (m) ≤ 1.5mQAD1 ≤ 6mQf . (63) QAD2 (m) = QAD1 + QHV (m) ≤ (1 + 1.5m)QAD1 ≤ (4 + 6m)Qf , (64) Therefore, where the last inequality comes from Eq. (61). INEXACT METHOD OF TANGENT HYPERBOLAS 259 Let QAD3 be the computation cost involved in Algorithm AD3. Using Eq. (14) in Ref. [7, Chap. 3], Section 3.2, we arrive at QAD3 ≤ 20Qf . 4.1.2 (65) The Computation Cost of a CF Step The computation cost of a CF step with an extra gradient evaluation consists of four parts: (1) Evaluating a gradient ∇f and a Hessian ∇ 2 f . This is completed by using Algorithm AD2(n) with setting ẋi = ei , i = 1, . . . , n, where ei is the i-th Cartesian basic vector in R n . The corresponding computation cost is QAD2 (n). (2) Solving the Newton equation by CF. Denoting the corresponding computation cost as Q− D, we have 1 3 3 2 2 Q− (66) D = n + n − n, 6 2 3 where only the multiplicative operations are considered. (3) Evaluating a tensor–vector product ∇ 3 f s 1 s 1 . This is completed by using Algorithm AD3 with x = xk , ẋ = s 1 , where s 1 is the solution of the Newton equation (3). The corresponding computation cost is QAD3 − QAD1 . (4) Solving the Newton-like equation. The corresponding multiplicative computation cost is n2 . Thus, by Eqs. (64) and (63), the total computation cost of one CF step with an extra gradient evaluation is AD1 2 2 QAD2 (n) + QAD3 − QAD1 + Q− + QHV (n) + QAD3 − QAD1 + Q− D +n =Q D +n 2 = QHV (n) + QAD3 + Q− D +n ≤ (6n + 20)Qf + QD , (67) 1 3 5 2 2 n + n − n. 6 2 3 (68) where 2 QD = Q− D +n = 4.1.3 The Computation Cost of the PCG Steps For the PCG steps, examine the tth (1 ≤ t ≤ p) PCG step after a CF step. According to Algorithm Model NIMTH(p), the subiteration number in the tth PCG step is not greater than 3t+1 . Therefore, the computation cost of the tth PCG step consists of two parts: (1) Evaluating a gradient ∇f , a tensor–vector product (1/2)∇ 3 f (x)qq and at most 3t+1 Hessian-vector products ∇ 2 f q. Using Algorithm AD2(m) with m ≤ 3t+1 , the computation cost is not greater than QAD2 (3t+1 ). (2) Executing at most 3t+1 subiterations. Denote QI = QI (n) = n2 + 6n + 2 (69) as the multiplicative computation cost in one PCG subiteration. The computation cost is not greater than 3t+1 QI . 260 N. DENG AND H. ZHANG Thus, by Eq. (64), the total computation cost of p PCG steps with p extra gradient evaluations has the upper bound p [Q AD3 +Q AD2 (3 t+1 ) + (3 t+1 t=1 1 p+2 2 Qf )QI ] ≤ 20p + 6 −3 3 2 1 + (3p+2 − 32 )QI . 2 4.1.4 (70) The Computation Cost of Algorithm Model NIMTH( p) CF CF Let W [xjCF (p+1) , x(j +1)(p+1) ] be the computation cost of the j th cycle from xj (p+1) to CF x(j +1)(p+1) . Combining Eqs. (67) and (70) and defining def 1 σ = σ (p) = (3p+2 − 32 ), 2 (71) we conclude that the average computation cost in the p + 1 steps of j th cycle satisfies CF W [xjCF (p+1) , x(j +1)(p+1) ] p+1 ≤ (6n + 20p + 6σ + 20)Qf + QD + σ QI def = w(n, Qf , p). p+1 (72) 4.2 The Efficiency Coefficient of Algorithm Model NIMTH( p) Now we estimate the efficiency of Algorithm Model NIMTH(p). Here we cite a definition of the efficiency coefficient given by Ref. [1]. 
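Before that, the cost bound just derived can be made concrete with a small numerical sketch. It is an illustration only: it assumes the quantities Q_f, Q_D, Q_I and σ(p) defined in Eqs. (62), (68), (69) and (71), tabulates the average per-step cost w(n, Q_f, p) of Eq. (72) over the nonnegative integers p, and reports the minimizer, anticipating the choice of p* made in Section 5. The function names are hypothetical.

def sigma(p):
    # Eq. (71): sigma(p) = (1/2) * (3**(p+2) - 3**2)
    return 0.5 * (3 ** (p + 2) - 9)

def w(n, Qf, p):
    # Eq. (72): average cost of one step in a cycle of one CF step plus p PCG steps
    QD = n ** 3 / 6 + 2.5 * n ** 2 - 2 * n / 3     # Eq. (68): CF cost (multiplications)
    QI = n ** 2 + 6 * n + 2                        # Eq. (69): one PCG subiteration
    s = sigma(p)
    return ((6 * n + 20 * p + 6 * s + 20) * Qf + QD + s * QI) / (p + 1)

def best_p(n, Qf, p_max=15):
    # Scan the nonnegative integers, as in problem (76)-(77) of Section 5
    return min(range(p_max + 1), key=lambda p: w(n, Qf, p))

# Example: for n = 1000 and a function whose evaluation costs Qf = 5n
# multiplications, best_p reports the cycle length that minimizes the
# average per-step cost.
print(best_p(1000, 5 * 1000))

Because sigma(0) = 0, the case p = 0 reduces to the CF-only cost of Eqs. (67)-(68), i.e. to Algorithm IMTH.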
DEFINITION 4.1 Consider the sequence {xk } generated by Algorithm Model NIMTH(p) and given in Eq. (52). The efficiency coefficient of the algorithm is defined by (p) = lim inf j →∞ {xk } CF ln ν[xjCF (p+1) , x(j +1)(p+1) ] CF W [xjCF (p+1) , x(j +1)(p+1) ] . (73) THEOREM 4.1 Suppose that the efficiency coefficient of Algorithm Model NIMTH(p) is defined by Eq. (73). Then it satisfies (p) ≥ (p + 1) ln 3 CF W [xjCF (p+1) , x(j +1)(p+1) ] ≥ ln 3 def = (p), w(n, Qf , p) (74) where w(n, Qf , p) is defined by Eqs. (72), (71), (68), and (69). Proof By Lemma 3.4, we conclude that when x0 is close to the solution x ∗ to Eq. (1) enough, the sequence (52) generated by Algorithm Model NIMTH(p) satisfies Eq. (53). Using the estimate (53) with q = p, we have CF ν[xjCF (p+1) , x(j +1)(p+1) ] ≥3+ p t=1 2 · 3t + 3p+1 − 2 3p+1 − 2 θ = 3p+1 + θ. 2 2 Then the conclusion (74) comes from Eqs. (73), (75), and (72). (75) INEXACT METHOD OF TANGENT HYPERBOLAS 5 261 NEW ALGORITHM AND ITS THEORETICAL ADVANTAGES Our Algorithm NIMTH is derived by specifying the parameter p in Algorithm Model NIMTH(p). Theorem 4.1 shows that (p) is a lower bound of the efficiency coefficient of Algorithm Model NIMTH(p). It is natural to select p by maximizing (p) or by minimizing w(n, Qf , p) defined by Eq. (72). This leads to our Algorithm NIMTH: 5.1 Algorithm NIMTH Algorithm NIMTH is obtained from Algorithm Model NIMTH(p) by specifying p = p ∗ , where p∗ is the solution to the one-dimensional optimization problem: min w(n, Qf , p) = (6n + 20p + 6σ + 20)Qf + QD + σ QI , p+1 s. t. p is a nonnegative integer, (76) (77) where Qf , QD , QI , and σ = σ (p) are defined by Eqs. (62), (68), (69), and (71). In order to compare Algorithm NIMTH with the original Algorithm IMTH, note that the latter can also be obtained from Algorithm Model NIMTH (p) by specifying p = 0. Therefore, according to Theorem 4.1, the following efficiency ratio reflects, in some sense the improvement of Algorithm NIMTH over Algorithm IMTH – the larger this efficiency ratio, the more superior Algorithm NIMTH. DEFINITION 5.1 The efficiency ratio of Algorithm NIMTH versus Algorithm IMTH is defined as w(n, Qf , 0) (p ∗ ) γ (n, Qf ) = R(n, Qf , p∗ ) = = . (78) (0) w(n, Qf , p∗ ) In order to estimate the efficiency ratio, the following two lemmas are needed. LEMMA 5.1 When n ≥ 100, the solution p ∗ to the one-dimensional optimization problem (76) and (77) satisfies: 3((2/3)n − 3) (1/9)n − 3 ∗ < 3p +1 < . 3 ln(((1/9)n − 3)/e) ln(((2/3)n − 3)/e) − ln ln(((2/3)n − 3)/e) (79) Proof Suppose p1 is the solution to the optimization problem min w(n, Qf , p) p≥0 (80) with the continuous variable p, where w(n, Qf , p) is defined by Eq. (72). Obviously, the solution p∗ = p(n, Qf ) to Eqs. (76) and (77) and the solution p1 = p1 (n, Qf ) have the relationship p1 ≤ p∗ ≤ p1 . By ∂w(n, Qf , p1 ) = 0, ∂p1 (81) 262 N. DENG AND H. ZHANG we have (3p1 +1 /e) ln(3p1 +1 /e) = (β − 3)/e, (82) 2 (QD + 6nQf ) . 3 (QI + 6Qf ) (83) where β = β(n, Qf ) = Obviously, Eq. (82) has a unique solution p1 when n ≥ 100 and Qf ≥ 0. Now let us estimate p1 . By the monotonicity of p1 with respect to β in Eq. (82) and the monotonicity of β with respect to n and Qf in Eq. (83), it is not difficult to see that when n ≥ 100 and Qf ≥ 0, p1 (n, Qf ) is increasing with respect to both n and Qf . When n ≥ 100, e2 + 3 < 2n 2 < β = β(n, Qf ) < n, 36 3 (84) In addition, notice that the function 3p1 +1 3p1 +1 ln e e is increasing when p1 ≥ 0. Hence, by Eqs. 
(84) and (82) we conclude that when n ≥ 100, p1 satisfies β −3 β −3 < 3p1 +1 < . (85) ln((β − 3)/e) ln((β − 3)/e) − ln ln((β − 3)/e) This leads to, by Eq. (81), that when n ≥ 100, σ1 (β) < 3p where σ1 (β) = and σ2 (β) = ∗ +1 < σ2 (β), (86) β −3 , 3 ln((β − 3)/e) 3(β − 3) . 3 ln((β − 3)/e) − ln ln((β − 3)/e) Thus, by Eq. (84) and the monotonicity of σ1 (β) and σ2 (β) with respect to β, Eq. (79) is obtained. LEMMA 5.2 The solution p ∗ to the optimization problem (76) and (77) satisfies: (1) When n ≥ 100, p∗ ≥ 1 (87) (2) When n → +∞, ∗ lim sup 3p +1 ≤ 2, n/ ln n lim inf 3p +1 1 ≥ . n/ ln n 27 n→+∞ (88) ∗ n→+∞ (89) INEXACT METHOD OF TANGENT HYPERBOLAS (3) When n → +∞, p∗ ∼ Proof ln n . ln 3 263 (90) In fact, QD > 9Q1 when n ≥ 100 and Qf ≥ 0. So w(n, Qf , 0) − w(n, Qf , 1) = 1 [(QD − 9QI ) + (6n − 54)Qf ] > 0. 2 Therefore, w(n, Qf , 0) > w(n, Qf , 1). Thus Eq. (87) is proved. By Eq. (79), it is easy to see the validity of Eqs. (88)–(90). (91) The next remark is concerned with the maximum number in the PCG step in our Algorithm. Remark 5.1 Suppose n ≥ 100, it is shown by Lemma 5.2 that p∗ ≥ 1. Therefore, w(n, Qf , p∗ − 1) > w(n, Qf , p∗ ). This yields that the computation cost of the p ∗ th PCG step is less than that of a CF step. ∗ ∗ lpN∗ QI + lpH∗ QI = 2 · 3p QI + 3p QI < QD , where lpN∗ and lpH∗ are defined in Step 0 in Algorithm NIMTH. So the subiteration number in the p∗ th PCG step satisfies that lpN∗ + lpH∗ < QD (1/6)n3 + (5/2)n2 − (2/3)n = < n. QI n2 + 6n + 2 Therefore, we have lpN∗ + lpH∗ < n. Notice that max{l1N + l1H , l2N + l2H , . . . , lpN∗ + lpH∗ } = lpN∗ + lpH∗ , we conclude that the maximum of the subiteration numbers in the PCG steps in a circle is less than n. In practice, the maximum is much less than n. This is the reason that leads to the efficiency of our algorithm. THEOREM 5.1 When n ≥ 100, the efficiency coefficient γ (n, Qf ) satisfies: (1) (2) (3) (4) γ (n, Qf ) > 1. When fixed n, γ (n, Qf ) is strictly increasing with respect to Qf ≥ 0. γ (n, 0) are strictly increasing with respect to n. When n → +∞, for all Qf > 0 γ (n, Qf ) > γ (n, 0) ∼ ln n . ln 3 264 N. DENG AND H. ZHANG Proof In the following, n ≥ 100 and Qf ≥ 0 are always assumed. (1) By Lemma 5.2 (1), p∗ ≥ 1 when n ≥ 100. Therefore, by Eq. (91) γ (n, Qf ) = R(n, Qf , p∗ ) = w(n, Qf , 0) w(n, Qf , 0) ≥ > 1. ∗ w(n, Qf , p ) w(n, Qf , 1) (92) (2) Define c(p) = p/σ (p), (93) where σ (p) is defined in Eq. (71). σ (p) and c(p) are functions with respect to continuous variable p ≥ 1. Because c (p) = p (1/2)(3p+2 − 32 ) = 2(3p − 1 − p3p ln 3) < 0, 9(3p − 1)2 by Eq. (87), we can get c(p∗ ) = Therefore, 6 + 20c(p ∗ ) 6 + 20/9 QI ≤ < , 6n + 20 6n + 20 QD that is, Thus, p∗ p∗ 1 1 = ≤ = . ∗ σ σ (p ∗ ) σ (1) 9 6σ ∗ + 20p ∗ σ ∗ QI . < 6n + 20 QD (94) 20 + 6n + 20p ∗ + 6σ ∗ Q D + σ ∗ QI . < (1 + p ∗ )(20 + 6n) (1 + p ∗ )QD (95) Therefore, it is not difficult to get that when fixed n and Qf 1 > Qf ≥ 0, (p∗ + 1)((20 + 6n)Qf + QD ) (6n + 20 + 20p ∗ + 6σ ∗ )Qf + QD + σ ∗ QI (p ∗ + 1)((20 + 6n)Qf 1 + QD ) , (6n + 20 + 20p ∗ + 6σ ∗ )Qf 1 + QD + σ ∗ QI (96) and, by Eq. (78), γ (n, Qf ) = R(n, Qf , p∗ ) < R(n, Qf 1 , p∗ ). (97) def If p1∗ = p(n, Qf 1 ), then R(n, Qf 1 , p∗ ) ≤ R(n, Qf 1 , p1∗ ) = γ (n, Qf 1 ). (98) Combining Eqs. (97) and (98), we have, when fixed n and Qf 1 > Qf ≥ 0, γ (n, Qf ) < γ (n, Qf 1 ). The conclusion (1) is obtained. (99) INEXACT METHOD OF TANGENT HYPERBOLAS 265 (3) Denoting T (n) = QD /QI , then it is easy to prove T (n) < T (n + 1). (100) By Eqs. 
(100) and (78) and γ (n, 0) = (p0∗ (n) + 1)QD p0∗ (n) + 1 = , ∗ QD + σ0 (n)QI 1 + σ0∗ (n)/T (n) (101) where p0∗ (n) = p(n, 0), σ0∗ (n) = σ (p0∗ (n)), we have γ (n, 0) < p0∗ (n) + 1 p0∗ (n + 1) + 1 < = γ (n + 1, 0). ∗ 1 + σ0 (n)/T (n + 1) 1 + σ0∗ (n + 1)/T (n + 1) The conclusion (2) is obtained. (4) By Eqs. (101), (90), (88), and (89), the validity of the conclusion (3) is obtained. Theorem 5.3 is also supported by our preliminary numerical experiments. The detail is omitted. Acknowledgments The work was supported by the National Science Foundation of China (Grant No. 10071094), Research Grants Council of the Hong Kong Special Administrative Region, China (Grant CityU 1066/00P), and the Talent Foundation of Beijing (Grant No. Kw0603200352). References [1] R. Brent (1973). Some efficient algorithms for solving systems of nonlinear equation. SIAM J. Numerical Anal., 10, 327–344. [2] M. Bartholomew-Biggs, S. Brown, B. Christianson and L.C.W. Dixon (2000). Automatic differentiation of algorithms. J. Comput. Appl. Math., 12, 171–190. [3] L.C.W. Dixon (2000). On the Deng-Wang theorem. OR Trans., 4, 42–48. [4] R. Dembo, S. Eisenstat and T. Steihaug (1982). Inexact Newton method. SIAM J. Numerical Anal., 19, 400–408. [5] N.Y. Deng and Z.Z. Wang (2000). Theoretical efficiency of an inexact Newton method. J. Optim. Theor. Appl., 105, 97–112. [6] N.Y. Deng, H.B. Zhang and C.H. Zhang (2001). Further improvement of the Newton- PCG algorithm with automatic differentiation. Optim. Methods Software, 16, 151–178. [7] A. Griewank (2000). Evaluating derivatives principles and techniques of algorithmic differentiation. Frontiers in Appl. Math., Vol. 19, SIAM, Philadephia. [8] R. Jackson and G. McCormick (1986). The Poliyadic Structure of Factorable Functions Ten-sors with Applications to High-Order Minimization Techniques. J. Optim. Techniques, J. Optim. Theor. Appl., 51(1), 63–94. [9] C. Kelly (1995). Iterative Methods for Linear and Nonlinear Equations, SIAM, Philadelphia. [10] R. Kalaba and A. Tischler (1983). A generalized Newton algorithm using high order derivatives. J. Optim. Theor. Appl., 39, 1–17. [11] G. Ostrovskii, Yu. Volin and W. Borisov (1971). Über die Berechnung von Ableitungen. Wissenschaftliche Zeitschrift der Technischen Hochschule für Chemie, Leuna-Merseburg, 13(4), 382–384. [12] A.H. Sherman (1978). On Newton-iterative methods for the solution of systems of nonlinear equations. SIAM J. Numer. Anal., 15, 755–771. [13] P.L. Toint (1981). Towards an Efficient Sparsity Exploiting Newton Method for Minimization, Sparse Matrices and Their Uses, In: I.S. Duff (Ed.), Academic Press, London, England, pp. 57–88. [14] J.E. Tolsma and P.I. Barton (1998). On computational differentiation. Comput. Chem. Eng., 22(4/5), 475–490. [15] R. Wengert (1964). A simple automatic derivative evaluation program. Commun. ACM, 7(8), 463–464.