Chebyshev expansion methods for electronic structure calculations on large molecular systems

Roi Baer and Martin Head-Gordon
Department of Chemistry, University of California, Berkeley, California 94720 and Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720

(Received 17 July 1997; accepted 15 September 1997)

The Chebyshev polynomial expansion of the one-electron density matrix (DM) in electronic structure calculations is studied, extended in several ways, and benchmark demonstrations are applied to large saturated hydrocarbon systems, using a tight-binding method. We describe a flexible tree code for the sparse numerical algebra. We present an efficient method to locate the chemical potential. A reverse summation of the expansion is found to significantly improve numerical speed. We also discuss the use of Chebyshev expansions as analytical tools to estimate the range and sparsity of the DM and the overlap matrix. Using these analytical estimates, a comparison with other linear scaling algorithms and their applicability to various systems is considered. © 1997 American Institute of Physics. [S0021-9606(97)03947-0]

I. INTRODUCTION

Many theories of molecular electronic structure employ the concept of an effective one-electron Hamiltonian. Kohn and Sham, using the theorems of Hohenberg and Kohn, rigorously proved the existence of an effective one-electron Hamiltonian of the form1

H = -\frac{\hbar^2}{2m_e}\nabla^2 + v(\mathbf{r}),   (1.1)

for determining the ground state energy of a many-electron system. The normalized eigenstates ψ_i of this Hamiltonian describe the exact ground state one-electron density (we ignore spin for simplicity):

\rho(\mathbf{r}) = \sum_i n_i |\psi_i(\mathbf{r})|^2,   (1.2)

where n_i is the occupancy of the ith state: equal to 0 or 1, according to the rule that for 2N_e electrons the N_e lowest eigenstates are populated and the other states are vacant. Definitions can also be devised for an odd number of electrons.
The occupation numbers may be viewed as the eigenvalues of a more general operator, the Kohn–Sham density matrix (KSDM):

\rho(\mathbf{r},\mathbf{r}') = \sum_i n_i\, \psi_i(\mathbf{r})\, \psi_i^*(\mathbf{r}'),   (1.3)

which is idempotent:

\int \rho(\mathbf{r},\mathbf{r}'')\, \rho(\mathbf{r}'',\mathbf{r}')\, d\mathbf{r}'' = \rho(\mathbf{r},\mathbf{r}').   (1.4)

The ground state energy of the system is then formulated in terms of the density matrix. Likewise, in Hartree–Fock theory, the effective Hamiltonian is the Fock matrix, and a Hartree–Fock density matrix (HFDM) can be defined in the same manner as for the Kohn–Sham theory.

J. Chem. Phys. 107 (23), 15 December 1997

This rigorous scheme has also been the basis for constructing new semiempirical tight-binding models,2 where there too the ground state energy is determined by calculating the idempotent density matrix. Hartree–Fock and DF theories are self-consistent field (SCF) theories, and once the density matrix is calculated, a new Hamiltonian is constructed from it. In real space this step turns out to be computationally intensive, and for small systems has O(N_e^4) complexity. However, new theoretical developments first introduced in Ref. 3 and later developed further4-7 have overcome this obstacle, achieving linear scaling in this aspect of the computation and paving the way for large-system SCF calculations. Thus, the computational bottleneck shifts to the calculation of the DM from the given Hamiltonian.

Exceedingly successful approaches to treating large systems are the plane-wave total energy and Car–Parrinello approaches.8,9 These methods are capable of dealing with a number of atoms presently on the order of hundreds. It is difficult to extend the methods beyond this size, primarily due to the O(M N_e^2) scaling, where N_e is the number of electrons in a unit cell and M the number of plane waves. Recently, it was pointed out10-12 (and also see Ref. 13 for earlier ideas) that by invoking a basis of localized functions, instead of a plane-wave basis, it is possible to develop methods that scale linearly with system size.
Pursuing this idea, several algorithms for dealing with different aspects of electronic-structure calculations have been developed. The most established are the methods based on searching for a DM that minimizes a generalized energy functional that includes terms encouraging DM idempotency.14-18 In these methods, the minimization process requires a calculation of a power of the density matrix. Thus, in the LNV method,14,15 the power F^2, where F is the density matrix, needs to be calculated. The method of Hernandez et al.17,18 requires the calculation of F^3, and the Kohn method16 requires a calculation of F^4. We name these "F×F methods." Other linear scaling methods have been proposed,11,12,19-21 mostly based on an orbital approach, but we shall not explicitly consider these in this paper, which concentrates on the one-electron density matrix.

A different approach for calculating the DM is a direct extraction of it from the Hamiltonian, without a search for a functional minimum. Such a method has been proposed by Goedecker and Colombo22 and is based on a polynomial expansion of the DM. In particular, Goedecker et al.
used Chebyshev polynomials in these expansions.23,24 This approach was also recently applied to tight-binding models by Voter et al.25

Chebyshev expansions have been very successful in quantum dynamical calculations ever since their introduction to the field by Kosloff and Tal-Ezer.26 They have been used for expanding various functions of the Hamiltonian, including the evolution operator in time-dependent reactive scattering27 and in molecular spectroscopy,28 the Green's function for reactive scattering,29 and filtering methods for dissipative tunneling.30,31 Recently, Kouri et al.32 used a Chebyshev polynomial expansion of the Heaviside function, formally equivalent to the DM, for plane-wave DFT calculations.

This paper is intended to further establish the Chebyshev expansion method of Refs. 22 and 24 in several respects. First, we introduce several important improvements to the method. We describe a tree code for representing the sparse column vectors being computed, taking full advantage of the fact that each column starts off very narrow and broadens gradually. In particular, it is shown that the Chebyshev series may be summed in reverse, and this, in conjunction with our tree code, increases the efficiency of the calculation by large factors, without sacrificing precision and without imposing any a priori cutoff radii around atoms. We also discuss how to perform an efficient search for the chemical potential, using special properties of the expansion. Next, we investigate the dependence of the new linear scaling methods on system geometry and accuracy constraints. In this respect it is shown how the analytical properties of the Chebyshev expansion can be used to determine the DM sparsity before attempting any actual calculation. We then use such estimates to evaluate general properties of several linear scaling methods.
This analysis leads to nontrivial results showing that different linear scaling methods can have different scaling properties with respect to the dimensionality of the system and to accuracy. Finally, we present several results on hydrocarbon sheets and chains, demonstrating the linear scaling properties of the Chebyshev method and our theoretical estimates.

The structure of the paper follows. We define some technical terms and notation and briefly describe the Chebyshev expansion method and several improvements in Sec. II. In Sec. III we use the Chebyshev expansion as an analytical tool for studying the locality and sparsity of the density matrix. In Sec. IV we produce theoretical estimates for the numerical work needed to calculate the DM; estimates are given for the Chebyshev expansion method and for the F×F methods. Examples of the performance of the Chebyshev method within a tight-binding model of hydrocarbon systems in one and two dimensions are given in Sec. V. A summary of our findings is given in Sec. VI.

II. GENERAL FRAMEWORK

A. Breadth and effective dimension

The theory is formulated in a basis of N functions localized in R space, |α⟩ (α = 1,...,N), and its dual ⟨ᾱ| (⟨ᾱ|β⟩ = δ_{αβ}). In this space, single-electron wave functions φ(r) are represented by column vectors v with coefficients v_α = ⟨ᾱ|φ⟩. The Hamiltonian is represented by the sparse matrix H_{αβ} = ⟨ᾱ|Ĥ|β⟩. Note that in this representation the Hamiltonian matrix is not Hermitian; however, its eigenvalues are all real and equal to the orbital energies. An alternative approach would be to use the matrix S^{-1/2} for defining the dual basis, in which case all matrices are Hermitian.

We define the breadth B(v) of a vector v as the number of its nonzero elements. The breadth B(H) of a matrix H is defined as the maximum of the breadths of its columns. Consider an extended system of sites, each interacting with a finite number of near neighbors.
The breadth B(H) of the Hamiltonian matrix equals the maximal number of such near-neighbor interactions. The matrix H^2 will connect a given site to the near neighbors of its near neighbors. Thus, if the system is a chain of sites, H^2 will have a breadth of 2B(H) and, in general,

B(H^m) = m\, B(H)   (1-D chain),   (2.1)

while, if the system is a two-dimensional (2-D) sheet, the number of interacting sites grows as the square of the number of Hamiltonian applications in Eq. (2.1):

B(H^m) = m^2\, B(H)   (2-D sheet).   (2.2)

In general, we define an effective dimension of the connectivity of the system as (strictly speaking, m must be much smaller than the system size N; thus, in practice d is defined by a sufficiently large m that is still much smaller than N):

d = \lim_{m\to\infty} \frac{\log[B(H^m)/B(H)]}{\log m}.   (2.3)

The effective dimension of a system is close in meaning to, yet different from, the usual concept of dimension. The usual concept refers to the minimal dimension of the Cartesian space in which the geometric structure of the molecule and its electrons can be embedded.

The breadth of a matrix or a vector is important in numerical applications, since algorithms can be constructed to explicitly take advantage of the fact that it is finite. Under these algorithms, the numerical work (in terms of CPU time, say) associated with the application of the Hamiltonian matrix to a vector of breadth B(v) is

J(Hv) ≈ a\, B(H)\, B(v),   (2.4)

where a is a hardware-dependent constant. Thus, in general,

J(H^m v) ≈ m^d\, J(Hv) = a\, m^d\, B(H)\, B(v).   (2.5)

The cost of the matrix product H^m H^n can be estimated by considering the calculation to be N products of a matrix of breadth B(H^m) = m^d B(H) and a vector of breadth B(H^n) = n^d B(H); thus

J(H^m H^n) ≈ a\, (mn)^d\, [B(H)]^2\, N.   (2.6)
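As a concrete illustration of Eqs. (2.1) and (2.4) (our sketch, not from the paper), the following Python/NumPy fragment measures the breadth of powers of a nearest-neighbor chain Hamiltonian; for a tridiagonal H the interior columns of H^m fill a band of 2m+1 sites, i.e., linear growth consistent with Eq. (2.1) up to an O(1) constant.

```python
import numpy as np

def breadth(M):
    # B(M): maximal number of nonzero elements over the columns of M
    return int(np.max(np.count_nonzero(M, axis=0)))

# Nearest-neighbor 1-D chain: tridiagonal Hamiltonian, B(H) = 3
N = 101
H = np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)

Hm = np.eye(N)
for m in range(1, 6):
    Hm = Hm @ H
    print(m, breadth(Hm))   # interior columns fill 2m + 1 sites
```

For a 2-D square lattice the same experiment gives quadratic growth, in line with Eq. (2.2).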
These results are valid in the limit of large n and m, with N large compared to them both. We see that the construction of H^m, for a given m, involves numerical work that scales linearly with the system size. These equalities are correct for finite interaction range and otherwise exact numerical computations.

In actual calculations the interactions are not of finite range, but only finite precision is needed. Thus, a different type of breadth should be defined: the breadth B_D(v) (where v is normalized) is the number of elements with magnitude greater than 10^{-D}. For a matrix H, the breadth B_D(H) is defined as the maximal breadth of its columns, after the matrix has been normalized so that its eigenvalues are all smaller than unity [we will shortly discuss this issue in Eq. (2.11)]. The finite precision criterion usually allows for smaller rates of increase of the breadth B_D(H^m) with m. In fact, we demonstrate in Sec. III that typically B_D(H^m) ≈ m^{d/2} B_D(H). This rule does not contradict Eq. (2.3), which serves as a definition of the effective dimension, and where exact arithmetic is used with a finite-band Hamiltonian.

The parameter β, called the inverse temperature, controls the proximity of the FDM to the true DM. If the HOMO–LUMO gap is of size δε, the DM can be approached to an accuracy of 10^{-D} by choosing β large enough that

\beta\, \delta\epsilon / 2 ≈ D \log 10.   (2.9)

For metals, or systems with a zero HOMO–LUMO gap, β can be chosen to describe the system at a physical temperature.

We now briefly describe the Chebyshev polynomial expansion of the operator F(Ĥ). This operator is written as a series of Chebyshev polynomials:

F(\hat H) = \sum_{n=0}^{P-1} a_n(\beta_s, \mu_s)\, T_n(\hat H_s).   (2.10)

The symbols in this equation are all defined below. P is the expansion length, and H_s is a shifted and scaled Hamiltonian, constructed so that its eigenvalues are contained in the interval [-1, 1].
To be specific, we define E_max and E_min as the largest and smallest eigenvalues of H; thus

H_s = \frac{H - \bar E}{\Delta E},   (2.11)

where

\bar E = \frac{E_{\max} + E_{\min}}{2}; \qquad \Delta E = \frac{E_{\max} - E_{\min}}{2}.   (2.12)

B. Chebyshev expansion of the density matrix

In order to take advantage of the sparsity of the one-electron density matrix (DM), it is formulated as a power series in the Hamiltonian matrix. An efficient and powerful way to achieve this is by using the Chebyshev expansion, first proposed by Goedecker et al.22,24 We now briefly describe this approach. Formally, the DM is given by ρ_{αβ} = ⟨ᾱ|θ(ε_F − Ĥ)|β⟩, where θ(x) is the Heaviside step function and ε_F is determined by the requirement that the number of occupied states equals the number of electrons 2N_e: tr[ρ̂] = 2N_e. Kouri et al.32 have used a Chebyshev expansion of this Heaviside function. However, due to the nonanalytic nature of the Heaviside function, we have found it difficult to control the convergence. This is caused by the tendency of the Chebyshev expansion to spread errors evenly over the entire interpolation interval, whereas it is essential to localize the error of the approximation in the band gap. Thus we follow Goedecker et al.,22-24 who used a Chebyshev polynomial expansion of the Fermi–Dirac density matrix (FDM), given by

F(\hat H) = \frac{1}{1 + e^{\beta(\hat H - \mu)}}.   (2.7)

Similarly, we define a scaled inverse temperature,

\beta_s = \beta\, \Delta E,   (2.13)

and a scaled and shifted chemical potential,

\mu_s = (\mu - \bar E)/\Delta E.   (2.14)

T_n(x) = cos(n cos^{-1} x) is the nth Chebyshev polynomial, and the expansion coefficients are defined by

a_n(\beta_s,\mu_s) = \frac{2 - \delta_{n0}}{\pi} \int_{-1}^{1} \frac{T_n(x)}{\sqrt{1 - x^2}}\, \frac{1}{1 + e^{\beta_s (x - \mu_s)}}\, dx,   (2.15)

and are calculated numerically by substituting x = cos θ and integrating using the fast Fourier transform.

In a local basis, the nth column of the density matrix, ρ^n, can be obtained by operating on the nth unit vector v^n (a column of zeros with 1 in the nth place) with the expansion of Eq. (2.10), where the operator Ĥ is represented by the matrix H_{αβ} = ⟨ᾱ|Ĥ|β⟩.
As a result, ρ^n takes the form

\rho^n = \sum_{m=0}^{P-1} a_m(\beta_s,\mu_s)\, v^n_m,   (2.16)

where, based on the Chebyshev polynomial recursion

T_{m+1}(x) = 2x\, T_m(x) - T_{m-1}(x),   (2.17)

the v^n_m are defined by

v^n_0 = v^n, \quad v^n_1 = H v^n, \quad v^n_{m+1} = 2 H v^n_m - v^n_{m-1}.   (2.18)

Here μ, called the chemical potential, is defined by the number of electrons:

\mathrm{tr}[F(\hat H)] = N_e.   (2.8)

It can be shown33 that the Chebyshev expansion converges uniformly and geometrically, and that it is the best polynomial expansion in the minimax sense, meaning a minimal largest error throughout the interpolation interval for a given polynomial order. When the expansion is truncated at some finite length P, the truncation error is relatively smooth and uniform throughout the interpolated interval. In the Appendix we show that the order of the polynomial is related to the scaled inverse temperature by

P ≈ \tfrac{2}{3}\, (D - 1)\, \beta_s,   (2.19)

where D is the numerical accuracy, in terms of the number of significant figures.

We should mention that since the columns of the density matrix are constructed independently, the algorithm is naturally and efficiently parallelizable. Furthermore, for almost all applications the entire density matrix is never needed at once, and large memory allocations can be avoided by organizing the computation so that the DM is used column by column.

C. Efficiently locating the chemical potential

The column vectors of Eq. (2.18) are calculated without reference to the chemical potential (or temperature). These vectors can then be used with different expansion coefficients for several simultaneous DM calculations. This enables us to perform a calculation of several DMs, each corresponding to a different trial chemical potential, at a cost that is only a small fraction larger than that of a single DM computation.
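The recursion of Eq. (2.18) is straightforward to implement. The sketch below (Python/NumPy, dense matrices for clarity; a real implementation would use the sparse trimmed columns of Sec. II D, and the function names are ours) computes the coefficients of Eq. (2.15) by Gauss–Chebyshev quadrature and sums Eq. (2.16) for one column.

```python
import numpy as np

def chebyshev_coeffs(beta_s, mu_s, P, K=4096):
    # Gauss-Chebyshev quadrature for Eq. (2.15): x = cos(theta) turns the
    # integral into a discrete cosine sum over K sample points.
    theta = np.pi * (np.arange(K) + 0.5) / K
    f = 1.0 / (1.0 + np.exp(beta_s * (np.cos(theta) - mu_s)))
    return np.array([(2.0 - (n == 0)) / K * (f * np.cos(n * theta)).sum()
                     for n in range(P)])

def dm_column(H_s, a, n):
    # Forward summation of Eq. (2.16) with the recursion of Eq. (2.18)
    v_prev = np.zeros(H_s.shape[0])
    v_prev[n] = 1.0                      # v_0^n = nth unit vector
    v_curr = H_s @ v_prev                # v_1^n = H_s v^n
    col = a[0] * v_prev + a[1] * v_curr
    for m in range(2, len(a)):
        v_prev, v_curr = v_curr, 2.0 * (H_s @ v_curr) - v_prev
        col = col + a[m] * v_curr
    return col
```

For a small tridiagonal H_s the result can be checked against direct diagonalization, with the truncation error decaying geometrically in P as stated above.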
This greatly facilitates the search for the chemical potential, which is determined by Eq. (2.8).

D. Exploiting sparsity and dynamical sparsity

As will be discussed at later stages of this paper, the ground state density matrix of a nonmetallic system has a finite breadth, largely independent of the system size. The same situation prevails for metallic systems at nonzero temperatures. Thus, much like the Hamiltonian, the density matrix itself is sparse once the system gets large enough. For calculations of energies and forces on such systems, it is essential to take explicit account of this in the algorithms.

Most conventional sparse matrix methods rely on a definition of the sparsity before the computation is started. Once such a definition is known, it is possible to use various sparse matrix indexing schemes.34 One way to determine the sparsity in advance is by defining a localization volume around each atom, beyond which DM correlations are neglected.14,24 When implementing the Chebyshev method, however, it is beneficial not to use a predefined sparsity, and instead to take advantage of an additional property: dynamical sparsity. As explained in Sec. II B, the computation starts off from a very localized column v^n_0 = v^n (the nth unit vector) and at every step operates once with the Hamiltonian H_s on the vector of the previous step. Thus, the column vectors acquire larger and larger breadth as the calculation proceeds, reaching full breadth only at the late stages (the last 10% of the P expansion iterations). To take account of this dynamic broadening of the column vectors, a special sparse linear algebra algorithm must be developed for representing the columns, the matrices, and the algebraic operations. We present such an algorithm, which has the additional feature of being very flexible: it does not require any a priori form of sparsity to be imposed. Instead, it allows vectors to be very narrow or wide, as dictated by the evolving computation.
The important step in achieving these features is the use of tree structures for representing column vectors. We chose to work with a binary tree. In our method, the breadth of the column vectors is allowed to grow or shrink by a process of trimming the tree as the computation proceeds. The trimming is done according to an accuracy threshold, which acts much like the digital precision of the computer: numbers with magnitude less than the threshold are considered arbitrary and are zeroed after each iteration. A full account of the details of the method will be published elsewhere,35 and here we only briefly describe the central idea.

Consider a one-dimensional system with seven atoms, partitioned in space into boxes labeled A–D, as shown in Diagram 1.

Diagram 1.

Assume for simplicity that every atom has one orbital, so that a state vector of the system is a column vector of length 7, where the nth element C_n is the probability amplitude for the electron to be in the orbital of atom n. In a tree based on the partitioning A–D, this column vector is represented as shown in Diagram 2. The data are organized in a way that encourages the following property: the larger the spatial distance between two atoms, the earlier in the tree hierarchy they branch. If the column vector C represents a column of the DM then, due to sparsity, orbitals centered on two very distant atoms will generally not be simultaneously occupied. The benefit of the tree structure is that if, for example, all the coefficients C_4–C_7 are zero, then this information is stored in a single zero flag associated with the node designated by an asterisk in the diagram (every node in the tree has such a zero flag). Thus, whenever two columns having zeros in elements C_4–C_7 are added, the addition operation is performed only for the elements in the left-hand part of the tree.
Similar considerations apply to other algebraic operations.

Diagram 2.

The structure of the tree can efficiently zero large, spatially contiguous parts of the wave function by setting the zero flag of the relevant node. Thus the important process of trimming the tree as the computation proceeds is very efficient. The binary tree codes can also be used to efficiently treat two- and three-dimensional systems, as will be described in a future publication.35

E. Reverse Chebyshev summation

The Chebyshev series may be summed in reverse order, starting from the small coefficients and working toward the large ones. Thus, the broadening of the column vectors, as the Hamiltonian is applied to them, is delayed as much as possible to the later stages. The tree codes of the previous section can take advantage of this, and the performance of the method increases by dramatically large factors. The reverse summation is based on the Clenshaw summation method.36 The calculation proceeds by constructing a series of column vectors w^n_m, for m = P-1,...,0, by the recursion

w^n_m = 2 H w^n_{m+1} - w^n_{m+2} + a_m v^n,   (2.20)

starting with w^n_P = w^n_{P+1} = 0. The Chebyshev approximation to the nth column ρ^n is then

\rho^n = \sum_{m=0}^{P-1} a_m v^n_m
       = \sum_{m=0}^{P-1} T_m(H)\, (w^n_m - 2H w^n_{m+1} + w^n_{m+2})
       = \sum_{m=2}^{P-1} [T_m(H) - 2H\, T_{m-1}(H) + T_{m-2}(H)]\, w^n_m + T_0(H)\, w^n_0 + T_1(H)\, w^n_1 - 2H\, T_0(H)\, w^n_1
       = w^n_0 - H w^n_1,   (2.21)

where the bracketed combination vanishes by the recursion (2.17), T_0(H) = 1, and T_1(H) = H. As mentioned above, the advantage of the reverse summation is that the very small elements are summed first, and these may be efficiently trimmed without loss of accuracy, so that the broadening of the columns is delayed as much as possible to the final summations.

The deficiency of the reverse summation is that the possibility of calculating the density matrices for many chemical potentials and temperatures from one expansion, as described in Sec. II C, is now lost.

III. DM STRUCTURE IN R SPACE
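A dense-matrix sketch of the reverse (Clenshaw) summation, Eqs. (2.20) and (2.21), is given below (Python/NumPy; illustrative only, since the efficiency gain appears only with the trimmed tree representation, which this sketch omits).

```python
import numpy as np

def dm_column_reverse(H_s, a, n):
    # Clenshaw (reverse) summation, Eqs. (2.20)-(2.21):
    # run m = P-1,...,0 with w_P = w_{P+1} = 0, then rho^n = w_0 - H_s w_1.
    N = H_s.shape[0]
    v = np.zeros(N)
    v[n] = 1.0                 # unit vector v^n
    w_next = np.zeros(N)       # w_{m+1}
    w_next2 = np.zeros(N)      # w_{m+2}
    for m in range(len(a) - 1, -1, -1):
        w = 2.0 * (H_s @ w_next) - w_next2 + a[m] * v
        w_next2, w_next = w_next, w
    # after the loop, w_next = w_0 and w_next2 = w_1
    return w_next - H_s @ w_next2
```

For a diagonal H_s the nth column reduces to the scalar Chebyshev sum evaluated at the nth eigenvalue, which gives a quick correctness check against `numpy.polynomial.chebyshev.chebval`.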
Comparison of the forward and reverse summations of Sec. II E shows that the numerical work for the latter is only a small fraction of that for the former (usually a reduction of execution times by a factor of more than 5).

In this section, we analyze the locality and sparsity of the density matrix in R space, using the fact that to a given precision the density matrix involves a finite Chebyshev expansion. In Sec. III A the locality of the DM in R space is discussed in general. Basis-independent results are derived, showing that the spatial range of the DM is inversely proportional to the square root of the HOMO–LUMO gap. In Sec. III C we discuss the sparsity of the DM in a given local basis set.

A. DM locality for insulators

Insulators are characterized by the existence of a HOMO–LUMO gap δε that is quite independent of the system size. In metals, the gap usually shrinks as the system grows, and metals are therefore excluded from the following ground state discussion; finite temperature calculations in metals are considered in Sec. III B. The FDM, F̂_{β,μ} = {1 + exp[β(Ĥ − μ)]}^{-1}, is essentially equivalent to the ground state DM (of either KS or HF Hamiltonians) when μ is taken at the center of the gap and β is given by Eq. (2.9). For ground state calculations to a precision 10^{-D}, therefore,

\beta_s ≈ 2 D \log 10\, \frac{\Delta E}{\delta\epsilon}.   (3.1)

Using Eq. (2.19), the Chebyshev expansion can be truncated at the length

P = 3 D (D - 1)\, \frac{\Delta E}{\delta\epsilon}   (3.2)

(where we approximated (4/3) log 10 ≈ 3).

The fact that the Chebyshev expansion is of finite length can also be used as a theoretical tool for studying the properties of the ground state DM of insulators. We show that the spatial range of the FDM (and thus of the DM) is inversely proportional to the square root of the HOMO–LUMO energy gap δε. The discussion is rather qualitative, but it allows us to draw general conclusions for a wide variety of systems.
The range of the FDM is a loosely defined quantity since, in general, the function ⟨r'|F̂_{β,μ}|r⟩ depends on both r and r'. However, when |r − r'| is large the exact functional dependence is not important, and one is interested in determining the spatial range W beyond which the value of ⟨r'|F̂_{β,μ}|r⟩ may be neglected, i.e., whenever |r − r'| > W.

The system can be represented using a finite basis of Gaussians G_r of range σ, centered on a three-dimensional mesh of points r. The mesh spacing is a, of the same order as σ. The overlap matrix is

S_{r'r} = ⟨G_{r'}|G_r⟩ = e^{-(r - r')^2/2\sigma^2},   (3.3)

and the dual biorthonormal basis is defined by

⟨Ḡ_r| = \sum_{r'} (S^{-1})_{rr'}\, ⟨G_{r'}|.   (3.4)

We state results for a finite basis and then take the limit to an infinite delta-function basis by indefinitely decreasing the mesh spacing a and the range of the Gaussians σ (keeping a/σ constant). It may be assumed that at large distance a given basis function and its dual have essentially the same functional behavior, so that the (scaled) Hamiltonian matrix elements take the following form for large separations:

⟨Ḡ_{r'}|Ĥ_s|G_r⟩ ≈ e^{-(r - r')^2/2\sigma^2},   (3.5)

where the prefactor of the exponent has been dropped since, due to the locality of the interactions, it has only a weak (nonexponential) dependence on |r − r'| when the latter is large. The long-range matrix elements of Ĥ_s^2 can be estimated by the Gaussian composition rule as

⟨Ḡ_{r'}|Ĥ_s^2|G_r⟩ = \sum_x ⟨Ḡ_{r'}|Ĥ_s|G_x⟩⟨Ḡ_x|Ĥ_s|G_r⟩ ≈ e^{-(r - r')^2/4\sigma^2}.   (3.6)

Thus, the spatial range of Ĥ_s^2 is √2 σ and, repeatedly using the Gaussian composition rule P−1 times, the range of Ĥ_s^P is shown to be √P σ. Using the expression of Eq. (3.2)
for the expansion length P, the range of the density matrix is then given by

W(\hat F) ≈ \sqrt{3 D (D - 1)\, \sigma^2\, \frac{\Delta E}{\delta\epsilon}}.   (3.7)

This equation depends on two representation-dependent parameters: the spatial range σ of the basis functions G_r, and the eigenvalue range of the Hamiltonian matrix, ΔE = (E_max − E_min)/2. For small enough σ, E_min is influenced by the minimal values of the potential energy at mesh points close to atomic centers, and E_max is dominated by the maximal values of the kinetic energy. This is seen by considering the simple Gaussian integrals

E_{\min} ≈ ⟨\bar G_r| -\frac{Z_{\max} e^2}{\hat r} |G_r⟩ ≈ -\frac{Z_{\max} e^2}{\sigma}; \qquad E_{\max} ≈ ⟨\bar G_r| -\frac{\hbar^2}{2 m_e}\nabla^2 |G_r⟩ ≈ \frac{\hbar^2}{2 m_e \sigma^2}.   (3.8)

The relations of Eq. (3.8) implicitly assume that the exact Kohn–Sham exchange-correlation potential is no more singular than the kinetic energy and the Coulomb potentials. This indeed seems to be the case in practical applications. Thus, for very small σ we find that overall ΔE is dominated by the kinetic energy term, ΔE ≈ E_max/2, and therefore (note: taking the limit ΔE → ∞ does not alter the estimate of the polynomial length because, as is shown in the Appendix, the estimate of Eq. (3.2) depends on the condition of Eq. (A8), and this condition is better satisfied the larger E_max is)

\lim_{\sigma \to 0} \sigma \sqrt{\Delta E} = \hbar / \sqrt{4 m_e}.   (3.9)

Finally, plugging this result into Eq. (3.7), the spatial range can be estimated by the following representation-independent expression:

W(\hat F) ≈ \sqrt{\frac{3 D (D - 1)\, \hbar^2}{4 m_e\, \delta\epsilon}}.   (3.10)

This result agrees with the estimate of Kohn for one-dimensional periodic systems,37 according to which the spatial range is proportional to δε^{-1/2}. The arguments we present can be considered a generalization of Kohn's theorem to systems of any dimension. Furthermore, for nonperiodic systems, although Eq. (3.10) probably overestimates the range of the DM, it establishes a finite range for it, a conclusion derived also in Refs. 38 and 39.

FIG. 1. The DM range of a metal (squares) and a small-gap insulator with δε = 0.01 a.u. (diamonds). Ranges are for D = 3.

B.
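As a worked example of Eqs. (3.2) and (3.10) (our numbers, in atomic units with ħ = m_e = 1): for D = 3 and a gap δε = 0.01 a.u., Eq. (3.10) gives W ≈ √450 ≈ 21 bohr, and with ΔE = 1 a.u. Eq. (3.2) gives P = 1800.

```python
import math

def expansion_length(D, dE, de):
    # Eq. (3.2): P = 3 D (D - 1) * (Delta E / delta_eps)
    return 3 * D * (D - 1) * dE / de

def dm_range(D, de):
    # Eq. (3.10) in atomic units (hbar = m_e = 1)
    return math.sqrt(3 * D * (D - 1) / (4 * de))

print(expansion_length(3, 1.0, 0.01))  # 1800.0
print(dm_range(3, 0.01))               # ~21.2 bohr
```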
DM locality for metals at finite temperature

A further generalization of the Kohn theorem is possible, following a similar line of reasoning, for a finite temperature system, where the Kohn–Sham procedure still holds but the exchange-correlation potential is now temperature dependent.40 Assuming this new potential introduces no singularities larger in σ^{-1} than the σ^{-2} of the kinetic energy, we immediately obtain, from the discussion in the previous section and from Eq. (2.19),

W(\hat F) ≈ \sqrt{P}\, \sigma = \sqrt{\frac{\hbar^2 (D - 1)\, \beta}{3 m_e}}.   (3.11)

This result is especially applicable to metals, since for insulators it grossly overestimates the range unless the temperature is very high. The relation greatly resembles the thermal de Broglie wavelength λ = √(h^2 β / 3 m_e), obtained by combining the free-particle de Broglie relation with the thermal kinetic energy expression (3/2) k_B T. However, Eq. (3.11) has been derived without explicitly assuming the electron to be free. Our estimates of the range of the DM for a metal and for a small band-gap insulator or semiconductor are given in Fig. 1.

C. DM sparsity in a local basis

In this section we use the results obtained in Sec. III A to deduce several characteristics of a representation-dependent DM. Since we wish to focus on sparsity, we transform the results of the previous section into a "breadth" of the DM or FDM, in a sense similar to that defined in Sec. II. We define the breadth of a vector v to a given precision 10^{-D}, B_D(v), as the number of elements with magnitude exceeding 10^{-D}. The breadth B_D(H_s) of a (scaled) matrix H_s is the maximal breadth of its columns. After every numerical matrix operation we zero (trim) the elements with magnitude smaller than 10^{-D}, thus keeping the vector breadth as small as possible and increasing the efficiency of matrix and vector manipulations.
The general results of the preceding section motivate our claim that the breadth, to precision D, of H_s^n grows approximately at a rate proportional to n^{d/2}; to be specific, we assume that a reasonable estimate is

B_D(H_s^n) ≈ n^{d/2}\, B_D(H_s).   (3.12)

Here d, the effective dimension, is defined by Eq. (2.3). Note that Eq. (2.3) does not contradict Eq. (3.12), because of the difference in the definitions of breadth. The FDM is approximated to precision D by a polynomial in the Hamiltonian of order P, so the breadth of the FDM is estimated by

B_D(F) ≈ P^{d/2}\, B_D(H),   (3.13)

where P is given by Eq. (3.2) for ground state calculations and by Eq. (2.19) for finite temperature calculations. This is an upper estimate, based on the worst-case assumption that no consistent cancellations occur in the expansion. In this matter, we refer the reader to the closing remarks of Sec. III B.

We now estimate the breadth B_D(H) of the Hamiltonian matrix itself. Since the dual basis ⟨ᾱ| is obtained from the original basis through the metric S_{αβ} = ⟨α|β⟩, namely,

⟨\bar α| = \sum_{\beta} (S^{-1})_{αβ}\, ⟨β|,   (3.14)

the Hamiltonian matrix is given by

H_{αβ} = \sum_{γ} (S^{-1})_{αγ}\, ⟨γ|Ĥ|β⟩.   (3.15)

The breadth of the matrix H will therefore be

B_D(H) ≈ B_D(S^{-1}).   (3.16)

We estimate the breadth of S^{-1} by determining the length L of its Chebyshev expansion (see the Appendix):

L ≈ \tfrac{1}{2}\, D \sqrt{C} \log 10.   (3.17)

The breadth of the inverse overlap matrix is therefore

B_D(S^{-1}) ≈ L^{d/2}\, B_D(S).   (3.18)

This also serves as an estimate of the breadth of the Hamiltonian, and we can write

B_D(H) ≈ L^{d/2}\, B_D(S).   (3.19)

Inserting this expression into Eq. (3.13), we obtain the breadth of the FDM as

B_D(F) ≈ (P L)^{d/2}\, B_D(S).   (3.20)

IV. ESTIMATES OF ALGORITHMIC COMPLEXITIES

In this section we use the estimates of the breadth of the density matrix to determine the scaling properties of two categories of approaches to linear scaling.
Before we continue, however, it is important to devote a few words to the definition of the accuracy $D$ of a calculation. Measuring accuracy in terms of the error in the total energy, as is done in several recent papers, may be misleading: this error does not clearly indicate the quality of the calculated DM for purposes other than total-energy estimation. For example, since energy-minimization algorithms zero the first-order error in the total energy, these approaches tend to give high accuracy for the total energy even when the DM is relatively poorly determined. When the electronic structure calculation is aimed, as it usually is, at a dynamical application (i.e., calculating forces), it is the error in the DM that matters. Thus, unless only the structure is important, a suitable definition of the precision should be based on the violation of DM idempotency and of commutativity with the Hamiltonian, and not on the trace of the DM with the Hamiltonian. For example, in the case of an orthonormal basis we define the precision by

$10^{-D} = \max\left(\dfrac{\sqrt{\mathrm{tr}[(F^2-F)^2]}}{\mathrm{tr}\,F},\ \dfrac{\sqrt{\mathrm{tr}([H_s,F]^2)}}{\mathrm{tr}\,F}\right)$.  (4.1)

A. F×F methods

A number of approaches with linear scaling complexity for the calculation of the DM have been put forth by several groups, such as Li, Nunes, and Vanderbilt (LNV),[14] Hernandez et al.,[17,18] and Kohn.[16] These algorithms involve the minimization of a functional of the DM, constructed to ensure idempotency. The minimization process consists of a sequence of $M_n$ evaluations of a power of the density matrix, $F^n$, where $n = 2, 3, 4$ in the LNV, Hernandez et al., and Kohn approaches, respectively. By idempotency $F^n \approx F$, so the computation of $F^n$ requires $n-1$ multiplications of matrices similar in sparsity to $F$; hence our name "F×F methods" for these approaches. It follows that the numerical work required by an F×F method is [see Eqs. (2.6) and (3.13)]

$J = \alpha M_n B_D(F)^2 N \approx \alpha M_n (PL)^d B_D(S)^2 N$,  (4.2)
where $M_n$ is the number of F×F operations required to reach the minimum and determine the DM to a precision $D$.

Some of the F×F methods do not explicitly require the calculation of the inverse overlap matrix $S^{-1}$; as a result, the matrix $F$ is a modified density matrix, not exactly equal to the density matrix as we have defined it. However, as pointed out by Nunes and Vanderbilt,[15] the breadth of the modified DM is still comparable to the range of the original DM, which does include the $S^{-1}$ term, so the resulting numerical labor can still be estimated as in Eq. (4.2).

FIG. 2. Numerical work vs error norm of the DM [see Eq. (4.1)]. Circles (LNV) and squares (Chebyshev) are calculation results, while the lines have the slopes given by the equations in the text. LNV calculations for a 3D system were not performed because of CPU memory limitations.

B. The Chebyshev expansion method

The Chebyshev expansion of the FDM also constitutes a linearly scaling algorithm. The numerical work consists of applying the Chebyshev series, of length $P$, to each of the $N$ basis functions. The numerical work needed to calculate the $n$th column of the density matrix is [see Eq. (2.18)]

$J(F_n) \approx \sum_{m=0}^{P} J(H v_n^m)$.  (4.3)

The breadth of the Chebyshev vectors is

$B_D(v_n^m) \approx B_D(H^m v) = m^{d/2} B_D(H)$.  (4.4)

Thus, the work in Eq. (4.3) becomes

$J(F_n) \approx \sum_{m=0}^{P-1} \alpha\, m^{d/2} B_D(H)^2 \approx \alpha P^{d/2+1} B_D(H)^2$.  (4.5)

The total work for calculating the density matrix by the Chebyshev method is therefore

$J \approx \alpha P^{d/2+1} L^d B_D(S)^2 N$.  (4.6)

Comparing this result with the corresponding estimate for the F×F methods [Eq. (4.2)], the latter scale as $P^d$ while the Chebyshev method scales as $P^{d/2+1}$.
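To make Eqs. (4.1) and (4.3) concrete, here is a dense-matrix sketch (ours, not the paper's sparse tree code) of the DM-quality metric and of building one DM column with the Chebyshev recursion $v^{m+1} = 2H_s v^m - v^{m-1}$, trimming small elements after every step. The Fermi-function coefficient routine is an assumed standard Chebyshev–Gauss quadrature:

```python
import numpy as np

def dm_precision(F, Hs):
    """Error norm of Eq. (4.1), orthonormal basis: the larger of the idempotency
    violation and the commutator with the scaled Hamiltonian.  For real symmetric
    matrices we evaluate tr(C C^T), equal to -tr(C^2) for the antisymmetric C."""
    E = F @ F - F
    C = Hs @ F - F @ Hs
    trF = np.trace(F)
    return max(np.sqrt(np.trace(E @ E.T)) / trF,
               np.sqrt(np.trace(C @ C.T)) / trF)

def fermi_cheb_coeffs(P, beta_s, mu_s=0.0):
    """Chebyshev coefficients of f(x) = 1/(1 + exp(beta_s*(x - mu_s))) on [-1, 1]."""
    k = np.arange(P)
    x = np.cos(np.pi * (k + 0.5) / P)
    f = 1.0 / (1.0 + np.exp(beta_s * (x - mu_s)))
    a = np.array([(2.0 / P) * (f @ np.cos(m * np.pi * (k + 0.5) / P))
                  for m in range(P)])
    a[0] *= 0.5
    return a

def cheb_dm_column(Hs, n, a, D):
    """Column n of the FDM: F e_n ~ sum_m a_m T_m(Hs) e_n, with trimming."""
    tol = 10.0**-D
    v0 = np.zeros(Hs.shape[0]); v0[n] = 1.0    # T_0(Hs) e_n
    v1 = Hs @ v0                               # T_1(Hs) e_n
    col = a[0] * v0 + a[1] * v1
    for am in a[2:]:
        v0, v1 = v1, 2.0 * (Hs @ v1) - v0      # Chebyshev recursion
        v1[np.abs(v1) < tol] = 0.0             # trim: keeps the vectors sparse
        col += am * v1
    return col
```

For a Hamiltonian scaled so its spectrum lies inside $[-1,1]$ and a sufficiently long series, the column agrees with the finite-temperature DM obtained by direct diagonalization.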
This difference stems from the fact that the Chebyshev method involves operating $P$ times with the relatively small-breadth Hamiltonian on $N$ vectors of breadth $P^{d/2}$, while the F×F methods involve the multiplication of two matrices, each of breadth $P^{d/2}$.

C. Case study: Numerical work versus DM accuracy

In order to check the results of Eqs. (4.2) and (4.6), we timed the calculations for reaching a precision $D$ using both the LNV and the Chebyshev methods. The calculations were performed on a tight-binding cubic-lattice model having $10^d$ sites ($d = 1, 2, 3$ is the dimensionality) and a nearest-neighbor spacing of 4 a.u. The parametrization of the model was based on the Hamiltonian of Ref. 41 for carbon, but the following changes were made to simplify the interactions and to achieve a large band gap and a smaller spectral range: only two electrons were allocated to each atom and, separately for each dimension $d$, we changed the magnitude of the Slater–Koster parameter $V_{ss\sigma}$ until a band gap of $\delta\varepsilon = 0.1$ a.u. was achieved and the spectral range $\Delta E$ was in the range 0.25–0.3 a.u. Both methods were applied using the sparse-matrix tree code described in Sec. II D. In the LNV implementation the conjugate-gradient method was used for minimizing the energy, starting from $F = \tfrac{1}{2}\hat I$. The results are shown in Fig. 2, where the numerical work, in CPU time, is plotted against the error norm of Eq. (4.1). The lines in the figure are those determined from Eqs. (4.2) and (4.6) [using the relation between the expansion length $P$ and the accuracy $D$, Eq. (3.2)]. The theoretical estimates are seen to be in reasonable accordance with the actual results.

V. APPLICATIONS: TIGHT-BINDING SYSTEMS

In this section we provide examples of the performance of the Chebyshev expansion in a tight-binding model for hydrocarbons. We use the model of Davidson and Pickett,[42] including the modifications of Horsfield et al.,[41] yielding a local-charge-neutrality tight-binding method. Two cases are considered: $d=1$ and $d=2$ systems. The CPU times reported refer to calculations on a DEC-3000 workstation with a single 175 MHz processor and 128 MB of RAM.

A. d = 1 system: Saturated carbon chain (CnH2n+2)

This system is characterized by a large HOMO–LUMO gap of $\delta\varepsilon = 0.3$ a.u. The spectral range of the tight-binding Hamiltonian is $\Delta E = 1.7$ a.u. We timed the performance for varying system sizes and three precision values $D = 3, 4, 5$ (with corresponding expansion lengths $P = 90, 190, 360$). Note that the dimension of the full matrices is $N = 6n+2$, where $n$ is the number of carbon atoms. The results are shown in Fig. 3. The turnover size (the system size at which conventional diagonalization gives performance comparable to the linear scaling method) is about $n = 50$, 70, and 120 for accuracies $D = 3$, 4, and 5, respectively.

FIG. 3. CPU times for calculating the density matrix of a $d=1$ hydrocarbon chain CnH2n+2 using a tight-binding Hamiltonian, as a function of system size, for the Chebyshev method (diamonds: $D=3$; triangles: $D=4$; dots: $D=5$) and direct diagonalization (squares).

B. d = 2 system: Carbon sheet saturated with hydrogen

Here, too, the hydrogen saturation enables a large band gap of $\delta\varepsilon = 0.17$ a.u. The energy range is $\Delta E \approx 2$ a.u. We report in Fig. 4 the results for $D = 3$ and $D = 4$ (Chebyshev expansion lengths $P = 160$ and 360). As the system grows, the number of hydrogen atoms per carbon atom approaches 1 (for a small system, boundary effects are noticeable and some carbon atoms are saturated by two hydrogen atoms), so the matrix dimension is about $N = 5n$.

FIG. 4. CPU times for calculating the density matrix of a 2D carbon sheet saturated with hydrogen, using a tight-binding Hamiltonian, as a function of system size and accuracy, for the Chebyshev method (dots: $D=3$; triangles: $D=4$) and direct diagonalization (squares).
The turnover sizes are $n = 130$ for $D = 3$ and $n = 280$ for $D = 4$.

VI. CONCLUSIONS

In this paper we analyzed linear scaling algorithms for electronic structure calculations, focusing on one specific method, the Chebyshev expansion of the DM. For that method we have given rules for selecting the various parameters, based on the required accuracy and on known properties of the system. We have also shown how to speed up the application of the method, first by representing vectors as binary trees trimmed according to a threshold accuracy criterion, and then by performing a reverse summation. We have also pointed out how to search efficiently for the chemical potential by calculating the DM for several chemical potentials and temperatures in one forward summation.

One conclusion is that linear scaling methods are especially useful for large tight-binding Hamiltonian calculations. For ab initio calculations, where an overlap matrix is present, the methods are limited to systems of low effective dimensionality ($d < 2$) and to large-gap (or high-temperature) higher-dimensional systems. The Chebyshev expansion method is shown to be a strong competitor to the LNV-type methods that have emerged recently, especially for systems with effective dimensionality larger than 1, where we have given arguments why this should be so.

ACKNOWLEDGMENTS

This work was supported by the Laboratory Directed Research and Development Program of Lawrence Berkeley Laboratory under US-DOE Contract No. DE-AC03-76SF00098. R.B. wishes to thank D. Neuhauser for helpful discussions. M.H.G. acknowledges a Packard Fellowship (1995–2000).

APPENDIX: EXPANSION LENGTHS FOR THE DENSITY AND OVERLAP MATRICES

In this appendix we use the mathematical theory of Chebyshev expansion convergence to estimate the lengths of the Chebyshev expansion series for the DM and for the overlap matrix.
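The central scaling derived below, an expansion length growing linearly in $\beta_s$ for the Fermi function, can also be probed numerically. The sketch below is our own construction (coefficients via Chebyshev–Gauss quadrature): it measures the index beyond which all Chebyshev coefficients drop below $10^{-D}$, and the length roughly quadruples when $\beta_s$ is quadrupled:

```python
import numpy as np

def cheb_length(f, D, P_max=1024):
    """Smallest truncation length beyond which every Chebyshev coefficient
    of f on [-1, 1] is below 10**-D (coefficients by Chebyshev-Gauss quadrature)."""
    k = np.arange(P_max)
    theta = np.pi * (k + 0.5) / P_max
    fx = f(np.cos(theta))
    a = np.array([(2.0 / P_max) * (fx @ np.cos(m * theta)) for m in range(P_max)])
    big = np.nonzero(np.abs(a) > 10.0**-D)[0]
    return int(big[-1]) + 1 if big.size else 0

lengths = {}
for beta_s in (10.0, 20.0, 40.0):
    fermi = lambda x, b=beta_s: 1.0 / (1.0 + np.exp(b * x))   # mu_s = 0
    lengths[beta_s] = cheb_length(fermi, 5)
print(lengths)   # required length grows roughly linearly in beta_s
```

This linear growth in $\beta_s$ is the behavior the pole analysis below predicts: the Fermi function's nearest pole sits a distance $\xi = \pi/\beta_s$ from the real axis, and the convergence rate of the expansion is set by that distance.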
The convergence properties of the Chebyshev polynomial expansion of a given function $f(x)$ on the interval $[-1,1]$ are controlled by the singularities, near the real axis, of its analytic continuation $f(z)$ in the complex plane. The theory is well established (see, for example, Ref. 33), and only the essentials are summarized here. Let us associate with each positive number $\rho$ an ellipse with foci at $z = \pm 1$, given by the following parametrized curve in the complex plane:

$z_\rho(\theta) = \dfrac{\rho+\rho^{-1}}{2}\cos\theta + i\,\dfrac{\rho-\rho^{-1}}{2}\sin\theta$,  (A1)

where the parameter $\theta$ varies in the interval $[0, 2\pi]$. Let $\rho$ be the largest number for which $f(z)$ is analytic in the complex domain enclosed by the ellipse $z_\rho$ (since $z_\rho$ is the same ellipse as $z_{\rho^{-1}}$, $\rho$ is not less than 1). Then the coefficients $a_n$ in the expansion of the function $f(x)$, $x\in[-1,1]$, satisfy[33]

$|a_n| \le \dfrac{2M}{\rho^n}$,  (A2)

where $M$ is the maximal value of $|f(z_\rho)|$.

For the case $f(z) = 1/(1+e^{\beta_s(z-\mu_s)})$, the singularities are at

$z_m = \mu_s + (2m+1)\dfrac{\pi}{\beta_s}\,i$,  (A3)

where $m$ is any integer. Thus, the largest ellipse enclosing an analytic domain for $f$ is determined by the location of

$z_0 = \mu_s + i\xi, \qquad \xi = \pi/\beta_s$.  (A4)

The largest ellipse not containing the $z_0$ singularity gives

$\rho_{\max} = a + b$,  (A5)

with

$a^2 = 1 + b^2 = \dfrac{\mu^2+\xi^2+1+\sqrt{(\mu^2+\xi^2+1)^2-4\mu^2}}{2}$.  (A6)

Assuming very small $\xi$, we neglect $\xi^4$ and write

$a^2 = 1 + b^2 \approx \dfrac{\mu_s^2+\xi^2+1+\sqrt{(1-\mu_s^2)^2+2\xi^2(\mu_s^2+1)}}{2}$.  (A7)

Now, for all cases of relevance,

$4\xi^2 \ll 1-\mu_s^2$,  (A8)

so the following estimate is obtained:

$a^2 = 1 + b^2 \approx 1 + \dfrac{\xi^2}{2}$.  (A9)

Therefore $\rho_{\max} = a + b \approx 1 + \xi/\sqrt{2}$, and

$\log\rho_{\max} \approx \xi/\sqrt{2}$.  (A10)

Using Eq. (A2) and assuming a required precision of $10^{-D}$, $P$ must be large enough that

$\dfrac{2M}{\rho_{\max}^{P}} < 10^{-D}$.  (A11)

Taking the logarithm, rearranging, and using Eq. (A10), the resulting estimate is

$P \approx \dfrac{\sqrt{2}\,D\log 10}{\xi} \approx D\beta_s$.  (A12)

The linear relation between $P$ and $\beta$ was checked in numerical tests; empirically we found a somewhat tighter limit,

$P \approx \tfrac{3}{2}(D-1)\beta_s$.  (A13)

The computation of the inverse overlap matrix $S^{-1}$ (where the eigenvalues of $S$ are assumed all positive) can also be performed using a Chebyshev expansion. Performing a similar analysis for the (suitably scaled) function $1/z$, it can be shown that the pole nearest to the interpolation interval is at $z_0 = (1+C)/(1-C)$, where $C$ is the condition number of the overlap matrix (the ratio of the largest to the smallest eigenvalue); thus the appropriate $\rho_{\max}$ is the solution of the equation $(\rho+\rho^{-1})/2 = -z_0$, and the resulting estimate of the series length $L$ necessary for achieving a precision $10^{-D}$ is obtained, for large $C$, as

$L \approx \tfrac{1}{2}\sqrt{C}\,D\log 10$.  (A14)

Notice that when a geometric expansion for the overlap matrix is used,

$S^{-1} \approx \sum_{n=0}^{L_G-1}(1-S)^n$,  (A15)

its required length for a given precision $D$ is

$L_G \approx D\,C\log 10$.  (A16)

Thus the Chebyshev expansion length is substantially less sensitive to the condition number of the matrix.

1. P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964); W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965); L. J. Sham and W. Kohn, Phys. Rev. 145, 561 (1966).
2. A. P. Sutton, M. W. Finnis, D. G. Pettifor, and Y. Ohta, J. Phys. C 21, 35 (1988).
3. C. A. White, B. G. Johnson, P. M. W. Gill, and M. Head-Gordon, Chem. Phys. Lett. 230, 8 (1994); C. A. White, B. G. Johnson, P. M. W. Gill, and M. Head-Gordon, ibid. 2453, 268 (1997).
4. C. A. White, B. G. Johnson, P. M. W. Gill, and M. Head-Gordon, Chem. Phys. Lett. 253, 268 (1996).
5. J. C. Burant, G. E. Scuseria, and M. J. Frisch, J. Chem. Phys. 105, 8969 (1996).
6. M. C. Strain, G. E. Scuseria, and M. J. Frisch, Science 271, 5245 (1996).
7. E. Schwegler, M. Challacombe, and M. Head-Gordon, J. Chem. Phys. 106, 9708 (1997).
8. M. C. Payne, M. P. Teter, D. C. Allen, T. A. Arias, and J. D. Joannopoulos, Rev. Mod. Phys. 64, 1045 (1992).
9. R. Car and M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985).
10. W. T. Yang, Phys. Rev. Lett. 66, 1438 (1991).
11. F. Mauri, G. Galli, and R. Car, Phys. Rev. B 47, 9973 (1993).
12. W. Kohn, Chem. Phys. Lett. 208, 167 (1993).
13. P. W. Anderson, Phys. Rev. Lett. 21, 13 (1968).
14. X.-P. Li, R. W. Nunes, and D. Vanderbilt, Phys. Rev. B 47, 10891 (1993).
15. R. W. Nunes and D. Vanderbilt, Phys. Rev. B 50, 17611 (1994).
16. W. Kohn, Phys. Rev. Lett. 76, 3168 (1996).
17. E. Hernandez and M. J. Gillan, Phys. Rev. B 51, 10157 (1995).
18. E. Hernandez, M. J. Gillan, and C. M. Goringe, Phys. Rev. B 53, 7147 (1996).
19. P. Ordejon, D. A. Drabold, R. M. Martin, and M. P. Grumbach, Phys. Rev. B 51, 1456 (1995).
20. E. B. Stechel, A. R. Williams, and P. J. Feibelman, Phys. Rev. B 49, 10088 (1994).
21. G. Galli and M. Parrinello, Phys. Rev. Lett. 69, 3547 (1992).
22. S. Goedecker and L. Colombo, Phys. Rev. Lett. 73, 122 (1994).
23. S. Goedecker, J. Comput. Phys. 118, 261 (1995).
24. S. Goedecker and M. Teter, Phys. Rev. B 51, 9455 (1995).
25. A. F. Voter, J. D. Kress, and R. N. Silver, Phys. Rev. B 53, 12733 (1996).
26. R. Kosloff and H. Tal-Ezer, Chem. Phys. Lett. 127, 223 (1986).
27. D. Neuhauser and M. Baer, J. Chem. Phys. 90, 4351 (1989).
28. R. Baer and R. Kosloff, Chem. Phys. Lett. 200, 183 (1992).
29. D. K. Hoffman, Y. Huang, W. Zhu, and D. J. Kouri, J. Chem. Phys. 101, 1242 (1994).
30. R. Baer, Y. Zeiri, and R. Kosloff, Phys. Rev. B 54, 5287 (1996).
31. R. Baer and R. Kosloff, J. Chem. Phys. 106, 8862 (1997).
32. D. J. Kouri, Y. Huang, and D. K. Hoffman, J. Phys. Chem. 100, 7903 (1996).
33. T. J. Rivlin, Chebyshev Polynomials: From Approximation Theory to Algebra and Number Theory (Wiley, New York, 1990).
34. R. P. Tewarson, Sparse Matrices (Academic Press, New York, 1973).
35. R. Baer and M. Head-Gordon, to be published.
36. C. W. Clenshaw, Mathematical Tables, National Physical Laboratory Vol. 5 (HM Stationery Office, London, 1962).
37. W. Kohn, Phys. Rev. 115, 809 (1959).
38. A. Nenciu and G. Nenciu, Phys. Rev. B 47, 10112 (1993).
39. W. Kohn and R. J. Onffroy, Phys. Rev. B 8, 2485 (1973).
40. A. K. Rajagopal, Adv. Chem. Phys. 41, 59 (1980).
41. A. P. Horsfield, P. D. Godwin, D. G. Pettifor, and A. P. Sutton, Phys. Rev. B 54, 15773 (1996).
42. B. N. Davidson and W. E. Pickett, Phys. Rev. B 49, 11253 (1994).