Chebyshev expansion methods for electronic structure calculations on large molecular systems
Roi Baer and Martin Head-Gordon
Department of Chemistry, University of California, Berkeley, California 94720 and Chemical Sciences
Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720
(Received 17 July 1997; accepted 15 September 1997)
The Chebyshev polynomial expansion of the one-electron density matrix (DM) in electronic structure calculations is studied, extended in several ways, and benchmark demonstrations are applied to large saturated hydrocarbon systems, using a tight-binding method. We describe a flexible tree code for the sparse numerical algebra. We present an efficient method to locate the chemical potential. A reverse summation of the expansion is found to significantly improve numerical speed. We also discuss the use of Chebyshev expansions as analytical tools to estimate the range and sparsity of the DM and the overlap matrix. Using these analytical estimates, a comparison with other linear scaling algorithms and their applicability to various systems is considered.
© 1997 American Institute of Physics. [S0021-9606(97)03947-0]
I. INTRODUCTION
Many theories of molecular electronic structure employ the concept of an effective one-electron Hamiltonian. Kohn and Sham, using the theorems of Hohenberg and Kohn, rigorously proved the existence of an effective one-electron Hamiltonian of the form1

$$H = -\frac{\hbar^2}{2m_e}\nabla^2 + v(\mathbf{r}), \qquad (1.1)$$
for determining the ground state energy of a many-electron system. The normalized eigenstates $\psi_i$ of this Hamiltonian describe the exact ground state one-electron density (we ignore spin for simplicity):

$$\rho(\mathbf{r}) = \sum_i n_i |\psi_i(\mathbf{r})|^2, \qquad (1.2)$$
where $n_i$ is the occupancy of the ith state: equal to 0 or 1, according to the rule that for $2N_e$ electrons the $N_e$ lowest eigenstates are populated and the other states are vacant. Definitions can also be devised for an odd number of electrons. The occupation numbers may be viewed as the eigenvalues of a more general operator, the Kohn–Sham density matrix (KSDM):

$$\rho(\mathbf{r},\mathbf{r}') = \sum_i n_i \psi_i(\mathbf{r})\psi_i^*(\mathbf{r}'), \qquad (1.3)$$
which is idempotent:

$$\int \rho(\mathbf{r},\mathbf{r}'')\,\rho(\mathbf{r}'',\mathbf{r}')\,d\mathbf{r}'' = \rho(\mathbf{r},\mathbf{r}'). \qquad (1.4)$$
The ground state energy of the system is then formulated in terms of the density matrix. Likewise, in Hartree–Fock theory, the effective Hamiltonian is the Fock matrix, and a Hartree–Fock density matrix (HFDM) can be defined in the same manner as for the Kohn–Sham theory.
J. Chem. Phys. 107 (23), 15 December 1997
This rigorous scheme has also been the basis for constructing new semiempirical tight-binding models,2 where the ground state energy is likewise determined by calculating the idempotent density matrix.
Hartree–Fock and DF theories are self-consistent field (SCF) theories: once the density matrix is calculated, a new Hamiltonian is constructed from it. In real space this step turns out to be computationally intensive, and even for small systems has $O(N_e^4)$ complexity. However, new theoretical developments, first introduced in Ref. 3 and later developed further,4–7 have overcome this obstacle, achieving linear scaling in this aspect of the computation and paving the way for large system SCF calculations.
Thus, the computational bottleneck shifts to the calculation of the DM from the given Hamiltonian. Exceedingly successful approaches to treating large systems are the plane-wave total energy and Car–Parrinello approaches.8,9 These methods are capable of dealing with a number of atoms presently on the order of hundreds. It is difficult to extend the methods beyond this size, primarily due to the $O(MN_e^2)$ scaling, where $N_e$ is the number of electrons in a unit cell and $M$ the number of plane waves.
Recently, it was pointed out10–12 (see also Ref. 13 for earlier ideas) that by invoking a basis of localized functions, instead of a plane-wave basis, it is possible to develop methods that scale linearly with system size. Pursuing this idea, several algorithms dealing with different aspects of electronic-structure calculations have been developed. The most established are the methods based on searching for a DM that minimizes a generalized energy functional that includes terms encouraging DM idempotency.14–18 In these methods, the minimization process requires the calculation of a power of the density matrix. Thus, in the LNV method,14,15 the power $F^2$, where $F$ is the density matrix, needs to be calculated. The method of Hernandez et al.17,18 requires the calculation of $F^3$, and the Kohn method16 requires $F^4$. We name these ''F×F methods.'' Other linear scaling methods have been proposed,11,12,19–21 mostly based
on an orbital approach, but we shall not explicitly consider
these in this paper, which concentrates on the one-electron
density matrix.
A different approach is to extract the DM directly from the Hamiltonian, without searching for a functional minimum. Such a method has been proposed by Goedecker and Colombo22 and is based on a polynomial expansion of the DM. In particular, Goedecker et al. used Chebyshev polynomials in these expansions.23,24 This approach was also recently applied to tight-binding models by Voter et al.25
Chebyshev expansions have been very successful in quantum dynamical calculations ever since their introduction to the field by Kosloff and Tal-Ezer.26 The methods have been used for expanding various functions of the Hamiltonian, including the evolution operator in time-dependent reactive scattering27 and in molecular spectroscopy,28 the Green's function for reactive scattering,29 and filtering methods for dissipative tunneling.30,31 Recently, Kouri et al.32 used a Chebyshev polynomial expansion of the Heaviside function, formally equivalent to the DM, for plane-wave DFT calculations.
This paper is intended to further establish the Chebyshev expansion method of Refs. 22 and 24 in several respects. First, we introduce several important improvements to the method. We describe a tree code for representing the sparse column vectors being computed, taking full advantage of the fact that each column starts off very narrow and broadens gradually. In particular, it is shown that the Chebyshev series may be summed in reverse, and this, in conjunction with our tree code, increases the efficiency of the calculation by large factors, without sacrificing precision and without imposing any a priori cutoff radii around atoms. We also discuss how to perform an efficient search for the chemical potential, using special properties of the expansion.
Next, we investigate how the new linear scaling methods depend on system geometry and accuracy constraints. In this respect it is shown how the analytical properties of the Chebyshev expansion can be used to determine the DM sparsity before attempting any actual calculation. We then use such estimates to evaluate general properties of several linear scaling methods. This analysis leads to nontrivial results showing that different linear scaling methods can have different scaling properties with respect to the dimensionality of the system and to accuracy.
Finally, we present several results on hydrocarbon sheets
and chains, demonstrating the linear scaling properties of the
Chebyshev method and our theoretical estimates.
The structure of the paper is as follows. In Sec. II we define some technical terms and notation and briefly describe the Chebyshev expansion method and several improvements. In Sec. III we use the Chebyshev expansion as an analytical tool for studying the locality and sparsity of the density matrix. In Sec. IV we produce theoretical estimates of the numerical work needed to calculate the DM; estimates are given for the Chebyshev expansion method and for F×F methods. Examples of the performance of the Chebyshev method within a tight-binding model of hydrocarbon systems in one and two dimensions are given in Sec. V. A summary of our findings is given in Sec. VI.
II. GENERAL FRAMEWORK
A. Breadth and effective dimension
The theory is formulated in a basis of N functions localized in R space, $|a\rangle$ $(a = 1,\dots,N)$, and its dual $\langle\bar{b}|$ $(\langle\bar{b}|a\rangle = \delta_{ab})$. In this space, single-electron wave functions $\phi(\mathbf{r})$ are represented by column vectors $\mathbf{v}$ with coefficients $v_a = \langle\bar{a}|\phi\rangle$. The Hamiltonian is represented by the sparse matrix $H_{ab} = \langle\bar{a}|\hat{H}|b\rangle$. Note that in this representation the Hamiltonian matrix is not Hermitian; however, its eigenvalues are all real and equal to the orbital energies. An alternative approach would be to use the matrix $S^{-1/2}$ for defining the dual basis, in which case all matrices are Hermitian.
We define the breadth B(v) of a vector v as the number
of its nonzero elements. The breadth B(H) of a matrix H is
defined as the maximum of the breadth of its columns.
Consider an extended system of sites, each interacting with a finite number of near neighbors. The breadth B(H) of the Hamiltonian matrix equals the maximal number of such near-neighbor interactions. The matrix $H^2$ will connect to a given site the near neighbors of the near neighbors. Thus, if the system is a chain of sites, $H^2$ will have a breadth of 2B(H) and, in general,

$$B(H^m) = mB(H) \quad \text{(1-D chain)}, \qquad (2.1)$$

while, if the system is a two-dimensional (2-D) sheet, the number of interacting sites grows as the square of the number of Hamiltonian applications in Eq. (2.1):

$$B(H^m) = m^2 B(H) \quad \text{(2-D sheet)}. \qquad (2.2)$$
In general, we define an effective dimension of the connectivity of the system (strictly speaking, m must be much smaller than the system size N; thus in practice d is defined by a sufficiently large m that is still much smaller than N):

$$d = \lim_{m\to\infty} \frac{\log[B(H^m)/B(H)]}{\log m}. \qquad (2.3)$$
The effective dimension of a system is close in meaning to, yet different from, the usual concept of dimension, which refers to the minimal dimension of the Cartesian space in which the geometric structure of the molecule and its electrons can be embedded.
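As an illustration, the breadth growth and the effective-dimension estimate of Eq. (2.3) can be checked numerically for a nearest-neighbor chain (a minimal sketch with made-up matrix elements, not the paper's tight-binding model):

```python
import numpy as np

def breadth(M, tol=0.0):
    # B(M): maximum number of per-column entries with magnitude above tol.
    return int(np.max(np.count_nonzero(np.abs(M) > tol, axis=0)))

# Nearest-neighbor Hamiltonian of a 1-D chain (illustrative parameters).
n = 200
H = np.diag(np.full(n, -0.5)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)

m = 8
Hm = np.linalg.matrix_power(H, m)  # band of half-width m: B(H^m) = 2m + 1
# Effective-dimension estimate of Eq. (2.3); approaches d = 1 slowly as m grows.
d_eff = np.log(breadth(Hm) / breadth(H)) / np.log(m)
```

For a 2-D square-lattice connectivity the same estimate approaches d = 2 instead.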
The breadth of a matrix or a vector is important in numerical applications, since algorithms can be constructed to explicitly take advantage of the fact that it is finite. Under these algorithms, the numerical work (in terms of CPU time, say) associated with the application of the Hamiltonian matrix to a vector of breadth $B(\mathbf{v})$ is

$$J(H\mathbf{v}) \approx \alpha B(H)B(\mathbf{v}), \qquad (2.4)$$

where $\alpha$ is a hardware-dependent constant. Thus, in general,

$$J(H^m \mathbf{v}) \approx m^d J(H\mathbf{v}) = \alpha m^d B(H) B(\mathbf{v}). \qquad (2.5)$$
The calculation of the matrix product $H^m H^n$ can be estimated by considering the calculation to be N products of a matrix of breadth $B(H^m) = m^d B(H)$ and a vector of breadth $B(H^n) = n^d B(H)$; thus

$$J(H^m H^n) \approx \alpha (mn)^d [B(H)]^2 N. \qquad (2.6)$$

These results are valid in the limit of large n and m, with N large compared to them both. We see that the construction of $H^m$, for a given m, involves numerical work that scales linearly with the system size.
These equalities are correct for finite interaction range and otherwise exact numerical computations. In actual calculations the interactions are not of finite range, but only a finite precision is needed. Thus, a different type of breadth should be defined: the breadth $B_D(\mathbf{v})$ (where $\mathbf{v}$ is normalized) is the number of elements with magnitude greater than $10^{-D}$. For a matrix H, the breadth $B_D(H)$ is defined as the maximal breadth of its columns, after the matrix has been normalized so that its eigenvalues are all smaller than unity [we will shortly discuss this issue in Eq. (2.11)]. The finite precision criterion usually allows for smaller rates of increase of the breadth $B_D(H^m)$ with m. In fact, we demonstrate in Sec. III that typically $B_D(H^m) \approx m^{d/2} B_D(H)$. This rule does not contradict Eq. (2.3), which serves as a definition of the effective dimension and where exact arithmetic is used with a finite-band Hamiltonian.
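The slower, finite-precision growth $B_D(H_s^m) \approx m^{d/2} B_D(H_s)$ can be seen in a chain model once small elements are discarded (an illustrative sketch; the scaled Hamiltonian here is simply half the adjacency matrix):

```python
import numpy as np

def breadth_thr(M, D):
    # B_D(M): max per-column count of entries with magnitude above 10**-D.
    return int(np.max(np.count_nonzero(np.abs(M) > 10.0 ** -D, axis=0)))

# Scaled chain Hamiltonian with eigenvalues strictly inside [-1, 1].
n = 400
Hs = (np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)) / 2.0

D = 3
b16 = breadth_thr(np.linalg.matrix_power(Hs, 16), D)
b64 = breadth_thr(np.linalg.matrix_power(Hs, 64), D)
# Quadrupling m roughly doubles the thresholded breadth, i.e. m**(d/2) with
# d = 1, in contrast to the exact-arithmetic growth B(H^m) = m B(H).
```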
B. Chebyshev expansion of the density matrix

In order to take advantage of the sparsity of the one-electron density matrix (DM), it is formulated as a power series in the Hamiltonian matrix. An efficient and powerful way to achieve this is by using the Chebyshev expansion, first proposed by Goedecker et al.22,24 We now briefly describe this approach.

Formally, the DM is given by $\rho_{ab} = \langle\bar{a}|\theta(\epsilon_F - \hat{H})|b\rangle$, where $\theta(x)$ is the Heaviside step function and $\epsilon_F$ is determined by the requirement that the number of occupied states accounts for the number of electrons $2N_e$: $\mathrm{tr}[\hat\rho] = 2N_e$. Kouri et al.32 have used a Chebyshev expansion of this Heaviside function. However, due to the nonanalytic nature of the Heaviside function, we have found it difficult to control the convergence. This is caused by the tendency of the Chebyshev expansion to spread errors evenly over the entire interpolation interval, whereas it is essential to localize the error of the approximation in the band gap. Thus we follow Goedecker et al.,22–24 who used a Chebyshev polynomial expansion of the Fermi–Dirac density matrix (FDM), given by

$$F(\hat{H}) = \frac{1}{1 + e^{\beta(\hat{H} - \mu)}}. \qquad (2.7)$$

Here $\mu$, called the chemical potential, is defined by the number of electrons,

$$\mathrm{tr}[F(\hat{H})] = N_e. \qquad (2.8)$$

The parameter $\beta$, called the inverse temperature, controls the proximity of the FDM to the true DM. If the HOMO–LUMO gap is of size $\delta\varepsilon$, the DM can be approached to an accuracy of $10^{-D}$ by choosing $\beta$ large enough that

$$\beta\,\delta\varepsilon/2 \approx D \log 10. \qquad (2.9)$$

For metals, or systems with a zero HOMO–LUMO gap, $\beta$ can be chosen to describe the system at a physical temperature. We now briefly describe the Chebyshev polynomial expansion of the operator $F(\hat{H})$. This operator is written as a series of Chebyshev polynomials:

$$F(\hat{H}) = \sum_{n=0}^{P-1} a_n(\beta_s, \mu_s)\, T_n(\hat{H}_s). \qquad (2.10)$$

The symbols in this equation are all defined below. P is the expansion length and $H_s$ is a shifted and scaled Hamiltonian, constructed so that its eigenvalues are contained in the interval $[-1, 1]$. To be specific, we define $E_{\max}$ and $E_{\min}$ to be the largest and smallest eigenvalues of H; thus

$$H_s = \frac{H - \bar{E}}{\Delta E}, \qquad (2.11)$$

where

$$\bar{E} = \frac{E_{\max} + E_{\min}}{2}; \qquad \Delta E = \frac{E_{\max} - E_{\min}}{2}. \qquad (2.12)$$

Similarly, we define a scaled inverse temperature,

$$\beta_s = \beta \Delta E, \qquad (2.13)$$

and a scaled-shifted chemical potential,

$$\mu_s = (\mu - \bar{E})/\Delta E. \qquad (2.14)$$

$T_n(x) = \cos(n \cos^{-1} x)$ is the nth Chebyshev polynomial, and the expansion coefficients are defined by

$$a_n(\beta_s, \mu_s) = \frac{2 - \delta_{n0}}{\pi} \int_{-1}^{1} \frac{T_n(x)}{\sqrt{1 - x^2}}\, \frac{1}{1 + e^{\beta_s(x - \mu_s)}}\, dx, \qquad (2.15)$$

calculated numerically by substituting $x = \cos\theta$ and integrating using the fast Fourier transform.

In a local basis, the nth column of the density matrix $\rho^n$ can be obtained by operating on the nth unit vector $\mathbf{v}^n$ (a column of zeros with 1 in the nth place) with the expansion of Eq. (2.10), where the operator $\hat{H}$ is represented by the matrix $H_{ab} = \langle\bar{a}|\hat{H}|b\rangle$. As a result, $\rho^n$ takes the form

$$\rho^n = \sum_{m=0}^{P-1} a_m(\beta_s, \mu_s)\, \mathbf{v}^n_m, \qquad (2.16)$$

where, based on the Chebyshev polynomial recursion,

$$T_{m+1}(x) = 2xT_m(x) - T_{m-1}(x), \qquad (2.17)$$
the $\mathbf{v}^n_m$ are defined by

$$\mathbf{v}^n_0 = \mathbf{v}^n, \quad \mathbf{v}^n_1 = H_s \mathbf{v}^n, \quad \mathbf{v}^n_{m+1} = 2H_s \mathbf{v}^n_m - \mathbf{v}^n_{m-1}. \qquad (2.18)$$
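The coefficient integral of Eq. (2.15) reduces, under $x = \cos\theta$, to a cosine transform that can be evaluated by simple quadrature (or an FFT). A minimal numpy sketch with illustrative values of $\beta_s$ and $\mu_s$ (not the paper's benchmark parameters):

```python
import numpy as np

def chebyshev_fermi_coeffs(P, beta_s, mu_s):
    """Coefficients a_n of Eq. (2.15) after the substitution x = cos(theta).

    Midpoint quadrature in theta is equivalent to a discrete cosine
    transform, which an FFT-based implementation would exploit."""
    N = 2 * P                              # quadrature points (illustrative)
    theta = np.pi * (np.arange(N) + 0.5) / N
    f = 1.0 / (1.0 + np.exp(beta_s * (np.cos(theta) - mu_s)))
    a = (2.0 / N) * np.cos(np.outer(np.arange(P), theta)) @ f
    a[0] /= 2.0                            # the (2 - delta_n0) prefactor
    return a

beta_s, mu_s, P = 20.0, 0.1, 200           # illustrative parameters
a = chebyshev_fermi_coeffs(P, beta_s, mu_s)

# Reconstruct the Fermi function at a test point from the truncated series.
x = 0.5
approx = a @ np.cos(np.arange(P) * np.arccos(x))
exact = 1.0 / (1.0 + np.exp(beta_s * (x - mu_s)))
```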
It can be shown33 that the Chebyshev expansion converges uniformly and geometrically, and that it is the best polynomial expansion in the minimax sense, meaning a minimal largest error throughout the interpolation interval for a given order of the polynomial. When the expansion is truncated at some finite length P, the truncation error is relatively smooth and uniform throughout the interpolated interval. In the Appendix we show that the order of the polynomial is related to the scaled inverse temperature by

$$P \approx \tfrac{2}{3}(D - 1)\beta_s, \qquad (2.19)$$

where D is the numerical accuracy, in terms of the number of significant figures.
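A small dense sketch (illustrative chain Hamiltonian, not the paper's benchmark systems) of building one DM column by the forward recursion of Eq. (2.18), checked against the reverse Clenshaw summation described in Sec. II E and against direct diagonalization:

```python
import numpy as np

def fermi_coeffs(P, beta_s, mu_s):
    # Chebyshev coefficients of the Fermi function, Eq. (2.15), by quadrature.
    N = 2 * P
    theta = np.pi * (np.arange(N) + 0.5) / N
    f = 1.0 / (1.0 + np.exp(beta_s * (np.cos(theta) - mu_s)))
    a = (2.0 / N) * np.cos(np.outer(np.arange(P), theta)) @ f
    a[0] /= 2.0
    return a

n, P, beta_s, mu_s = 30, 150, 10.0, 0.0      # illustrative parameters
Hs = (np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)) / 2.0
a = fermi_coeffs(P, beta_s, mu_s)
e = np.zeros(n); e[n // 2] = 1.0             # unit vector v^n

# Forward summation, Eq. (2.18): v_{m+1} = 2 Hs v_m - v_{m-1}.
v_prev, v_cur = e.copy(), Hs @ e
rho_f = a[0] * v_prev + a[1] * v_cur
for m in range(2, P):
    v_prev, v_cur = v_cur, 2.0 * Hs @ v_cur - v_prev
    rho_f = rho_f + a[m] * v_cur

# Reverse (Clenshaw) summation, Eqs. (2.20)-(2.21).
w1 = np.zeros(n); w2 = np.zeros(n)           # w_{m+1}, w_{m+2}
for m in range(P - 1, -1, -1):
    w1, w2 = 2.0 * Hs @ w1 - w2 + a[m] * e, w1
rho_r = w1 - Hs @ w2                          # Eq. (2.21)

# Reference column from direct diagonalization.
eps, U = np.linalg.eigh(Hs)
F = (U * (1.0 / (1.0 + np.exp(beta_s * (eps - mu_s))))) @ U.T
```

Both summations agree to machine precision; the reverse form is the one that cooperates with tree trimming.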
We should mention that since the columns of the density matrix are constructed independently, the algorithm is naturally and efficiently parallelizable. Furthermore, for almost all applications the entire density matrix is never needed at once, and large memory allocations can be avoided by organizing the computation so that the DM is used column by column.
C. Efficiently locating the chemical potential
The column vectors of Eq. (2.18) are calculated without reference to the chemical potential (or temperature). These vectors can then be used with different expansion coefficients for several simultaneous DM calculations. This enables us to perform a calculation of several DMs, each corresponding to a different trial chemical potential, at a cost that is only a small fraction larger than that of a single DM computation. This greatly facilitates the search for the chemical potential, which is determined by Eq. (2.8).
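A sketch of such a search: the Chebyshev traces $t_m = \mathrm{tr}\,T_m(H_s)$ are computed once, and only the cheap coefficients $a_n(\beta_s,\mu_s)$ are recomputed inside a bisection on Eq. (2.8) (illustrative half-filled chain; in the actual method the $t_m$ would come from the sparse column recursion rather than dense matrices):

```python
import numpy as np

def fermi_coeffs(P, beta_s, mu_s):
    # Chebyshev coefficients of the Fermi function, Eq. (2.15), by quadrature.
    N = 2 * P
    theta = np.pi * (np.arange(N) + 0.5) / N
    f = 1.0 / (1.0 + np.exp(beta_s * (np.cos(theta) - mu_s)))
    a = (2.0 / N) * np.cos(np.outer(np.arange(P), theta)) @ f
    a[0] /= 2.0
    return a

n, Ne = 20, 10                               # half filling (illustrative)
Hs = (np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)) / 2.0

# Chebyshev traces t_m = tr T_m(Hs): computed once, reused for every mu.
P, beta_s = 300, 30.0
T0, T1 = np.eye(n), Hs.copy()
t = [np.trace(T0), np.trace(T1)]
for _ in range(P - 2):
    T0, T1 = T1, 2.0 * Hs @ T1 - T0
    t.append(np.trace(T1))
t = np.array(t)

def electron_count(mu_s):
    # tr F(H) for this trial chemical potential; only coefficients change.
    return fermi_coeffs(P, beta_s, mu_s) @ t

# Bisection on Eq. (2.8), tr F(H) = Ne; monotone increasing in mu.
lo, hi = -1.0, 1.0
for _ in range(40):
    mid = 0.5 * (lo + hi)
    if electron_count(mid) < Ne:
        lo = mid
    else:
        hi = mid
mu_star = 0.5 * (lo + hi)
```

By the particle-hole symmetry of this toy chain, the bisection lands at the center of the gap, $\mu_s \approx 0$.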
D. Exploiting sparsity and dynamical sparsity
As will be discussed at later stages of this paper, the ground state density matrix of a nonmetallic system has a finite breadth, largely independent of the system size. The same situation prevails for metallic systems at nonzero temperatures. Thus, much like the Hamiltonian, the density matrix itself is sparse once the system gets large enough. For calculations of energies and forces on such systems, it is essential to take account of this explicitly in the algorithms.

Most conventional sparse matrix methods rely on a definition of the sparsity before the computation is started. Once such a definition is known, it is possible to use various sparse matrix indexing schemes.34 One way to determine the sparsity in advance is by defining a localization volume around each atom, beyond which DM correlations are neglected.14,24 When implementing the Chebyshev method it is, however, beneficial not to use a predefined sparsity and instead to take advantage of an additional property: dynamical sparsity. As explained in Sec. II B, the computation starts off from a very localized column, $(\mathbf{v}^n_0)_a = \delta_{na}$, and at every step operates once with the Hamiltonian $H_s$ on the vector of the previous step. Thus, the column vectors acquire larger and larger breadth as the calculation proceeds, reaching full breadth only at the late stages (the last ~10% of the expansion iterations P).
To take account of this dynamic broadening of the column vector, a special sparse linear algebra algorithm must be developed for representing the columns, the matrices, and the algebraic operations. We present such an algorithm, which has the additional feature of being very flexible and does not require any a priori form of sparsity to be imposed. Instead, it allows vectors to be very narrow or wide, as dictated by the evolving computation. The important step in achieving these features is to use tree structures for representing column vectors. We chose to work with a binary tree. In our method, the breadth of the column vectors is allowed to grow or shrink by a process of trimming the tree as the computation proceeds. The trimming is done according to an accuracy threshold, which acts much like the digital precision of the computer: numbers with a magnitude less than the threshold are considered arbitrary and zeroed after each iteration. A full account of the details of the method will be published elsewhere,35 and here we only briefly describe the central idea.
Consider a one-dimensional system with seven atoms, partitioned in space into boxes labeled A, B, C, and D, as shown in Diagram 1.

Diagram 1.
Assume for simplicity that every atom has one orbital, so that a state vector of the system is a column vector of length 7, where the nth element $C_n$ is the probability amplitude for the electron to be in the orbital of atom n. In a tree based on the partitioning A–D, this column vector is represented as shown in Diagram 2. The data are organized in a way that encourages the following property: the larger the spatial distance between two atoms, the earlier in the tree hierarchy they branch.
If the column vector C represents a column of the DM then, due to sparsity, orbitals centered on two very distant atoms will generally not be simultaneously occupied. The benefit of the tree structure is that if, for example, all the coefficients $C_4$–$C_7$ are zero, then this information is stored in a single zero flag associated with the node designated by an asterisk in the diagram (every node in the tree has such a zero flag). Thus, whenever two columns having zeros in elements $C_4$–$C_7$ are added, the addition operation is performed only for the elements on the left-hand part of the tree. Similar considerations apply to other algebraic operations.
Diagram 2.

The structure of the tree can efficiently zero large, spatially contiguous parts of the wave function by setting the zero flag of the relevant node. Thus the important process of trimming the tree as the computation proceeds is very efficient. The binary tree codes can also be used to efficiently treat two- and three-dimensional systems, as will be described in a future publication.35

E. Reverse Chebyshev summation

The Chebyshev series may be summed in reverse order, starting from the small coefficients and working toward the large ones. Thus, the broadening of the column vectors, as the Hamiltonian is applied to them, is delayed to later stages as much as possible. The tree codes of the previous section can take advantage of this, and the performance of the method increases by dramatically large factors. The reverse summation is based on the Clenshaw summation method.36 The calculation proceeds by constructing a series of column vectors $\mathbf{w}^n_m$, for $m = P-1,\dots,0$, by the recursion

$$\mathbf{w}^n_m = 2H_s \mathbf{w}^n_{m+1} - \mathbf{w}^n_{m+2} + a_m \mathbf{v}^n, \qquad (2.20)$$

starting with $\mathbf{w}^n_P = \mathbf{w}^n_{P+1} = 0$. The Chebyshev approximation to the nth column $\rho^n$ is then

$$\rho^n = \sum_{m=0}^{P-1} a_m \mathbf{v}^n_m = \sum_{m=0}^{P-1} \left(T_m \mathbf{w}^n_m - 2H_s T_m \mathbf{w}^n_{m+1} + T_m \mathbf{w}^n_{m+2}\right)$$
$$= \sum_{m=2}^{P-1} \left(T_m - 2H_s T_{m-1} + T_{m-2}\right)\mathbf{w}^n_m + T_0 \mathbf{w}^n_0 + T_1 \mathbf{w}^n_1 - 2H_s T_0 \mathbf{w}^n_1$$
$$= \mathbf{w}^n_0 - H_s \mathbf{w}^n_1, \qquad (2.21)$$

where $T_m$ stands for $T_m(H_s)$, and the recursion of Eq. (2.17) annihilates each term of the remaining sum. As mentioned above, the advantage of reverse summation is that the very small elements are summed first, and these may be efficiently trimmed without loss of accuracy, so that the broadening of the columns is delayed as much as possible to the final summations. Comparison of the forward and reverse summations shows that the numerical work for the latter is only a small fraction of that of the former (usually a reduction of execution times by a factor of more than 5).

The deficiency of the reverse summation is that the possibility of calculating the density matrices for many chemical potentials and temperatures in one expansion, as described in Sec. II C, is now lost.

III. DM STRUCTURE IN R SPACE
In this section we analyze the locality and sparsity of the density matrix in R space, using the fact that to a given precision the density matrix involves a finite Chebyshev expansion. In Sec. III A the locality of the DM in R space is discussed in general. Basis-independent results are derived, showing that the spatial range of the DM is inversely proportional to the square root of the HOMO–LUMO gap. In Sec. III C we discuss the sparsity of the DM in a given local basis set.
A. DM locality for insulators
Insulators are characterized by the existence of a HOMO–LUMO gap $\delta\varepsilon$ that is quite independent of the system size. In metals, the gap usually shrinks as the system grows, and metals are therefore excluded from the following ground state discussion. Finite temperature calculations in metals are considered in Sec. III B.
The FDM $\hat{F}_{\beta,\mu} = \{1 + \exp[\beta(\hat{H} - \mu)]\}^{-1}$ is essentially equivalent to the ground state DM (of either KS or HF Hamiltonians) when $\mu$ is taken at the center of the gap and $\beta$ is given by Eq. (2.9) for ground state calculations to a precision $10^{-D}$; thus,

$$\beta_s \approx 2D\,\frac{\Delta E}{\delta\varepsilon}\,\log 10. \qquad (3.1)$$
Using Eq. (2.19), the Chebyshev expansion can be truncated at the following length:

$$P = 3D(D-1)\,\frac{\Delta E}{\delta\varepsilon} \qquad (3.2)$$

(where we approximated $\tfrac{4}{3}\log 10 \approx 3$). The fact that the Chebyshev expansion is of finite length can also be used as a theoretical tool for studying the properties of the ground state DM of insulators. We show that the spatial range of the FDM (and thus of the DM) is inversely proportional to the square root of the HOMO–LUMO energy gap $\delta\varepsilon$.
The discussion is rather qualitative, but allows us to draw general conclusions for a wide variety of systems. The range of the FDM is a loosely defined quantity since, in general, the function $\langle\mathbf{r}'|\hat{F}_{\beta,\mu}|\mathbf{r}\rangle$ depends on both $\mathbf{r}$ and $\mathbf{r}'$. However, when $|\mathbf{r} - \mathbf{r}'|$ is large the exact functional dependence is not important, and one is interested in determining the spatial range W beyond which the value of $\langle\mathbf{r}'|\hat{F}_{\beta,\mu}|\mathbf{r}\rangle$ may be neglected, i.e., whenever $|\mathbf{r} - \mathbf{r}'| > W$.
The system can be represented using a finite basis of Gaussians $G_{\mathbf{r}}$ of range $\sigma$, centered on a three-dimensional mesh of points $\mathbf{r}$. The mesh spacing is a, of the same order as $\sigma$. The overlap matrix is

$$S_{\mathbf{r}'\mathbf{r}} = \langle G_{\mathbf{r}'}|G_{\mathbf{r}}\rangle = e^{-(\mathbf{r} - \mathbf{r}')^2/2\sigma^2}, \qquad (3.3)$$

and the dual biorthonormal basis is defined by
$$\langle\bar{G}_{\mathbf{r}}| = \sum_{\mathbf{r}'} (S^{-1})_{\mathbf{r}\mathbf{r}'} \langle G_{\mathbf{r}'}|. \qquad (3.4)$$
We state results for a finite basis and then take the limit of an infinite delta-function basis by indefinitely decreasing the mesh spacing a and the range of the Gaussians $\sigma$ (keeping $a/\sigma$ constant). It may be assumed that at large distance a given basis function and its dual have essentially the same functional behavior, so that the (scaled) Hamiltonian matrix elements take the following form for large separations:

$$\langle\bar{G}_{\mathbf{r}'}|\hat{H}_s|G_{\mathbf{r}}\rangle \approx e^{-(\mathbf{r} - \mathbf{r}')^2/2\sigma^2}, \qquad (3.5)$$
where the prefactor of the exponent has been dropped since, due to the locality of the interactions, it has only a weak (nonexponential) dependence on $|\mathbf{r} - \mathbf{r}'|$ when the latter is large. The long-range matrix elements of $\hat{H}_s^2$ can be estimated by a Gaussian composition rule as

$$\langle\bar{G}_{\mathbf{r}'}|\hat{H}_s^2|G_{\mathbf{r}}\rangle = \sum_{\mathbf{x}} \langle\bar{G}_{\mathbf{r}'}|\hat{H}_s|G_{\mathbf{x}}\rangle\langle\bar{G}_{\mathbf{x}}|\hat{H}_s|G_{\mathbf{r}}\rangle \approx e^{-(\mathbf{r} - \mathbf{r}')^2/4\sigma^2}. \qquad (3.6)$$

Thus, the spatial range of $\hat{H}_s^2$ is $\sqrt{2}\,\sigma$, and, repeatedly using the Gaussian composition rule P−1 times, the range of $H_s^P$ is shown to be $\sqrt{P}\,\sigma$. Using the expression of Eq. (3.2) for the expansion length P, the range of the density matrix is then given by

$$W(\hat{F}) \approx \sqrt{3D(D-1)\,\sigma^2\,\frac{\Delta E}{\delta\varepsilon}}. \qquad (3.7)$$

This equation depends on two representation-dependent parameters: the spatial range $\sigma$ of the basis functions $G_{\mathbf{r}}$, and the eigenvalue range of the Hamiltonian matrix $\Delta E = (E_{\max} - E_{\min})/2$.

For small enough $\sigma$, $E_{\min}$ is influenced by the minimal values of the potential energy at mesh points close to atomic centers, and $E_{\max}$ is dominated by the maximal values of the kinetic energy. This is seen by considering the simple Gaussian integrals:

$$E_{\min} \approx \left\langle\bar{G}_{\mathbf{r}}\left|-\frac{Z_{\max}e^2}{\hat{r}}\right|G_{\mathbf{r}}\right\rangle \approx -\frac{Z_{\max}e^2}{\sigma}; \qquad E_{\max} \approx \left\langle\bar{G}_{\mathbf{r}}\left|-\frac{\hbar^2}{2m_e}\nabla^2\right|G_{\mathbf{r}}\right\rangle \approx \frac{\hbar^2}{2m_e\sigma^2}. \qquad (3.8)$$

The relations of Eq. (3.8) implicitly assume that the exact Kohn–Sham exchange-correlation potential is no more singular than the kinetic energy and Coulomb potentials. This indeed seems to be the case in practical applications.

Thus, for very small $\sigma$ we find that overall $\Delta E$ is dominated by the kinetic energy term, $\Delta E \approx E_{\max}/2$, and therefore (note: taking the limit $\Delta E \to \infty$ does not alter the estimate of the polynomial length because, as is shown in the Appendix, the estimate of Eq. (3.2) depends on the condition of Eq. (A8), and this condition is better satisfied the larger $E_{\max}$ is):

$$\lim_{\sigma\to 0} \sigma\sqrt{\Delta E} = \hbar/\sqrt{4m_e}. \qquad (3.9)$$

FIG. 1. The DM range of a metal (squares) and a small gap insulator with $\delta\varepsilon = 0.01$ a.u. (diamonds). Ranges are for D = 3.

Finally, plugging this result into Eq. (3.7), the spatial range can be estimated by the following representation-independent expression:

$$W(\hat{F}) \approx \sqrt{3D(D-1)\,\frac{\hbar^2}{4m_e\,\delta\varepsilon}}. \qquad (3.10)$$
This result agrees with the estimate of Kohn for one-dimensional periodic systems,37 according to which the spatial range is proportional to $\delta\varepsilon^{-1/2}$. The arguments we present can be considered a generalization of Kohn's theorem to systems of any dimension. Furthermore, for nonperiodic systems, although Eq. (3.10) probably overestimates the range of the DM, it establishes a finite range for it, a conclusion derived also in Refs. 38 and 39.
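An illustrative numerical check on a simple alternating-site tight-binding chain (our own toy model): the DM column decays exponentially, and its range shrinks as the gap opens. Note that a lattice model need not saturate the $\delta\varepsilon^{-1/2}$ estimate, which is an upper bound:

```python
import numpy as np

def dm_range(gap, n=400, thresh=1e-6):
    """Ground-state DM of a chain with alternating site energies +-gap/2;
    returns the distance beyond which the central DM column stays below thresh."""
    H = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    H += np.diag(np.tile([gap / 2.0, -gap / 2.0], n // 2))
    eps, U = np.linalg.eigh(H)
    occ = (eps < 0.0).astype(float)          # mu = 0 lies in the gap
    F = (U * occ) @ U.T
    col = np.abs(F[:, n // 2])
    above = np.nonzero(col > thresh)[0]
    return int(np.max(np.abs(above - n // 2)))

r_small_gap, r_large_gap = dm_range(0.5), dm_range(2.0)
# The range is finite and decreases as the HOMO-LUMO gap grows.
```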
B. DM locality for metals at finite temperature
A further generalization of the Kohn theorem is possible, following a similar line of reasoning for a finite temperature system, where the Kohn–Sham procedure still holds but the exchange-correlation potential is now temperature dependent.40 Assuming this new potential introduces no larger singularities in $\sigma^{-1}$ than the $\sigma^{-2}$ singularity of the kinetic energy, we immediately obtain, from the discussion in the previous section and from Eq. (2.19),

$$W(\hat{F}) \approx \sqrt{P}\,\sigma = \sqrt{\frac{\hbar^2}{3m_e}(D-1)\beta}. \qquad (3.11)$$

This result is especially applicable to metals, since for insulators it grossly overestimates the range unless the temperature is very high. The relation greatly resembles the de Broglie thermal wavelength $\lambda = \sqrt{h^2\beta/3m_e}$, obtained by combining the free-particle de Broglie relation with the thermal kinetic energy expression $\tfrac{3}{2}k_B T$. However, Eq. (3.11) has been derived without explicitly assuming the electron to be free. Our estimate of the range of the DM for a metal and a small band gap insulator or semiconductor is given in Fig. 1.
C. DM sparsity in a local basis
In this section we use the results obtained in Sec. III A to deduce several characteristics of a representation-dependent DM. Since we would like to focus on sparsity, we transform the results of the previous section into statements about the ''breadth'' of the DM or FDM, in a sense similar to that defined in Sec. II. We define the breadth of a vector $\mathbf{v}$ to a given precision $10^{-D}$, $B_D(\mathbf{v})$, as the number of elements with magnitude exceeding $10^{-D}$. The breadth $B_D(H_s)$ of a (scaled) matrix $H_s$ is the maximal breadth of its columns. After every numerical matrix operation we zero (trim) the elements with magnitude smaller than $10^{-D}$, thus keeping the vector breadth as small as possible and increasing the efficiency of matrix and vector manipulations.
The general results of the preceding section motivate our claim that the breadth, to precision D, of $H_s^n$ grows approximately at a rate proportional to $n^{d/2}$; to be specific, we assume that a reasonable estimate is

$$B_D(H_s^n) \approx n^{d/2} B_D(H_s). \qquad (3.12)$$

Here d, the effective dimension, is defined by Eq. (2.3). Note that Eq. (2.3) does not contradict Eq. (3.12), because of the difference in the definitions of breadth.

The FDM is approximated to precision D by a polynomial in the Hamiltonian of order P, so the breadth of the FDM is estimated by

$$B_D(F) \approx P^{d/2} B_D(H), \qquad (3.13)$$

where P is given by Eq. (3.2) for ground state calculations and by Eq. (2.19) for finite temperature calculations. This is an upper estimate, based on the worst-case assumption that no consistent cancellations occur in the expansion. In this matter, we refer the reader to the closing remarks of Sec. III B.
We now estimate the breadth $B_D(H)$ of the Hamiltonian matrix itself. Since the dual basis $\langle\bar{a}|$ is obtained from the original basis by the metric $S_{ab} = \langle a|b\rangle$, namely,

$$\langle\bar{a}| = \sum_b (S^{-1})_{ab} \langle b|, \qquad (3.14)$$

the Hamiltonian matrix is defined as

$$H_{ab} = \sum_c (S^{-1})_{ac} \langle c|\hat{H}|b\rangle. \qquad (3.15)$$
The breadth of the matrix H will therefore be

$$B_D(H) \approx B_D(S^{-1}). \qquad (3.16)$$
We estimate the breadth of $S^{-1}$ by determining the length L of its Chebyshev expansion (see the Appendix):

$$L \approx \tfrac{1}{2}\, D\sqrt{C}\, \log 10. \qquad (3.17)$$
The breadth of the inverse overlap matrix is therefore

$$B_D(S^{-1}) \approx L^{d/2} B_D(S). \qquad (3.18)$$
This also serves as an estimate of the breadth of the Hamiltonian, and we can write

$$B_D(H) \approx L^{d/2} B_D(S). \qquad (3.19)$$
Inserting this expression into Eq. (3.13), we obtain the breadth of the FDM as

B_\Delta(F) \approx (PL)^{d/2} B_\Delta(S).   (3.20)
IV. ESTIMATES OF ALGORITHMIC COMPLEXITIES
In this section we use the estimates of the breadth of the
density matrix to determine the scaling properties of two
categories of approaches to linear scaling.
Before we continue, however, we feel it is important to devote a few words to the definition of the accuracy Δ of the calculation. Measuring the accuracy of a calculation in terms of the error in the total energy, as is done in several recent papers, may be misleading: this error does not clearly indicate the quality of the calculated DM for purposes other than total-energy estimation. For example, since energy-minimization algorithms zero the first-order error in the total energy, these approaches will tend to give high accuracy for the total energy even when the DM is relatively poorly determined. When the electronic structure calculation is aimed, as it usually is, at a dynamical application (i.e., calculating forces), it is the error in the DM that is important. Thus, unless only structure is important, a suitable definition of the precision should be based on the violation of DM idempotency and of commutativity with the Hamiltonian, and not on the trace of the product of the DM and the Hamiltonian. For example, in the case of an orthonormal basis we define the precision by
10^{-\Delta} = \max\left( \frac{\sqrt{\mathrm{tr}[(F^2-F)^2]}}{\mathrm{tr}\,F},\ \frac{\sqrt{\mathrm{tr}([H_s,F]^2)}}{\mathrm{tr}\,F} \right).   (4.1)
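This precision measure can be sketched as follows (assuming real symmetric matrices; the commutator of symmetric matrices is antisymmetric, so $\mathrm{tr}([H_s,F]^2)$ is negative and its magnitude is taken under the square root):

```python
import numpy as np

def dm_error_norm(F, Hs):
    """Precision measure of Eq. (4.1): the larger of the trace-normalized
    idempotency violation and commutation violation of the DM."""
    idem = F @ F - F                        # idempotency residual F^2 - F
    comm = Hs @ F - F @ Hs                  # commutator [Hs, F]
    e_idem = np.sqrt(np.trace(idem @ idem)) / np.trace(F)
    e_comm = np.sqrt(abs(np.trace(comm @ comm))) / np.trace(F)
    return max(e_idem, e_comm)

# An exact zero-temperature DM (projector onto occupied eigenvectors)
# should give an error near machine precision.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
H = 0.5 * (A + A.T)
w, V = np.linalg.eigh(H)
F_exact = V[:, :3] @ V[:, :3].T
print(dm_error_norm(F_exact, H))
```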
A. F×F methods

A number of approaches having linear scaling complexity for calculation of the DM have been put forth by several groups, such as Li, Nunes, and Vanderbilt (LNV),14 Hernandez et al.,17,18 and Kohn.16 These algorithms involve a minimization of a functional of the DM, constructed to ensure idempotency.

The minimization process is composed of a sequence of M calculations of a power of the density matrix, $F^n$, where n = 2, 3, 4 in the LNV, Hernandez et al., and Kohn approaches, respectively. By idempotency, $F^n \approx F$, so the computation of $F^n$ requires n−1 multiplications of matrices similar in sparsity to F; hence our name F×F methods for these approaches. It follows that the numerical work required for an F×F method is [see Eqs. (2.6) and (3.13)]

J = \alpha M_n B_\Delta(F)^2 N \approx \alpha M_n (PL)^d B_\Delta(S)^2 N,   (4.2)

where $M_n$ is the number of F×F operations required to reach the minimum and determine the DM to a precision Δ.
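The cost structure of an F×F operation can be illustrated with the simplest such step, a McWeeny-style purification iteration with trimming (shown only as a representative of the F×F matrix products; the LNV scheme itself minimizes a constrained functional, but its dominant cost is the same kind of sparse matrix-matrix product):

```python
import numpy as np

def fxf_step(F, delta):
    """One purification iteration F -> 3F^2 - 2F^3.  The matrix-matrix
    products dominate the cost of any F x F method; elements below
    10**(-delta) are trimmed after the operation."""
    F2 = F @ F
    Fn = 3.0 * F2 - 2.0 * (F2 @ F)
    Fn[np.abs(Fn) < 10.0 ** (-delta)] = 0.0
    return Fn

# Starting from a guess whose eigenvalues lie in [0, 1] and straddle 1/2,
# iteration drives the occupations to 0 and 1 (illustrative 4-level model).
F = 0.5 * (np.eye(4) - 0.5 * np.diag([-1.0, -0.5, 0.5, 1.0]))
for _ in range(30):
    F = fxf_step(F, delta=12)
print(np.diag(F))   # occupied levels -> 1, vacant levels -> 0
```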
J. Chem. Phys., Vol. 107, No. 23, 15 December 1997
Some of the F×F methods do not explicitly require the calculation of the inverse overlap matrix $S^{-1}$; as a result, the matrix F is a modified density matrix, not exactly equal to the density matrix as we have defined it. However, as pointed out by Nunes and Vanderbilt,15 the breadth of the modified DM is still comparable to the range of the original DM, which does include the $S^{-1}$ term, so the resulting numerical labor can still be estimated as shown in Eq. (4.2).
B. The Chebyshev expansion method

The Chebyshev expansion of the FDM also constitutes a linearly scaling algorithm. The numerical work involved consists of applying the Chebyshev series, of length P, to each of the N basis functions. The numerical work needed to calculate the nth column of the density matrix is [see Eq. (2.18)]

J(F_n) \approx \sum_{m=0}^{P} J(H v_n^m).   (4.3)

FIG. 2. Numerical work vs error norm of the DM [see Eq. (4.1)]. Circles (LNV) and squares (Chebyshev) are calculation results, while the lines have the slopes given by the equations in the text. LNV calculations for a 3D system were not performed due to CPU memory limitations.
The breadth of the Chebyshev vectors is

B_\Delta(v_n^m) \approx B_\Delta(H^m v) = m^{d/2} B_\Delta(H).   (4.4)

Thus, the work in Eq. (4.3) becomes

J(F_n) \approx \sum_{m=0}^{P-1} \alpha\, m^{d/2} B_\Delta(H)^2 \approx \alpha P^{d/2+1} B_\Delta(H)^2.   (4.5)
The total work for calculating the density matrix in the Chebyshev method is therefore

J \approx \alpha P^{d/2+1} L^d B_\Delta(S)^2 N.   (4.6)

Comparing this result with the corresponding estimate for the F×F methods [Eq. (4.2)], it is seen that the latter scale as $P^d$ while the Chebyshev method scales as $P^{d/2+1}$. This difference stems from the fact that the Chebyshev method involves operating P times with the relatively small-breadth Hamiltonian on N vectors of breadth $P^{d/2}$, while the F×F methods involve multiplication of two matrices, of breadth $P^{d/2}$ each.
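The column-by-column construction described above can be sketched as follows. The Chebyshev coefficients of the Fermi function are obtained here by standard Gauss–Chebyshev quadrature (a generic construction with illustrative parameters; the paper's sparse tree storage is replaced by dense arrays with trimming):

```python
import numpy as np

def fermi_chebyshev_coeffs(beta_s, mu_s, P, K=2000):
    """Chebyshev coefficients of f(x) = 1/(1 + exp(beta_s (x - mu_s)))
    on [-1, 1], computed by Gauss-Chebyshev quadrature."""
    theta = (np.arange(K) + 0.5) * np.pi / K
    f = 1.0 / (1.0 + np.exp(beta_s * (np.cos(theta) - mu_s)))
    return np.array([(2.0 - (m == 0)) / K * np.sum(f * np.cos(m * theta))
                     for m in range(P)])

def fdm_column(Hs, a, e_col, delta):
    """Column sum_m a[m] T_m(Hs) e_col via the two-term recurrence
    T_{m+1} = 2 Hs T_m - T_{m-1}, trimming each Chebyshev vector.
    Hs must be scaled so that its spectrum lies inside [-1, 1]."""
    tol = 10.0 ** (-delta)
    v_prev = e_col.copy()                    # T_0(Hs) e
    v_curr = Hs @ e_col                      # T_1(Hs) e
    col = a[0] * v_prev + a[1] * v_curr
    for m in range(2, len(a)):
        v_next = 2.0 * (Hs @ v_curr) - v_prev
        v_next[np.abs(v_next) < tol] = 0.0   # trim to keep the breadth small
        col += a[m] * v_next
        v_prev, v_curr = v_curr, v_next
    return col
```

For a small test Hamiltonian, a column computed this way agrees with the exact finite-temperature DM obtained by diagonalization.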
C. Case study: Numerical work versus DM accuracy

In order to check the results of Eqs. (4.2) and (4.6) we have timed the calculations required to reach a precision Δ using both the LNV method and the Chebyshev method. The calculations were performed on a tight-binding cubic lattice model having $10^d$ sites (d = 1, 2, 3 is the dimensionality) and a nearest-neighbor spacing of 4 a.u. The parametrization of the model was based on the Hamiltonian of Ref. 41 for carbon, but the following changes were made to simplify the interactions and to achieve a large band gap and a smaller spectral range: only two electrons were allocated to each atom, and separately for each dimension d we changed the magnitude of the Slater–Koster parameter $V_{ss\sigma}$ until a band gap of δε = 0.1 a.u. was achieved and the spectral range ΔE was in the range 0.25–0.3 a.u.

Both methods were applied using the sparse matrix tree code described in Sec. II D. In the LNV implementation the conjugate gradients method was used for minimizing the energy, starting from $F = \tfrac{1}{2}\hat I$. The results are shown in Fig. 2, where the numerical work, in CPU time, is plotted against the error norm of Eq. (4.1). The lines in the figure are those determined from Eqs. (4.2) and (4.6) [using the relation between the expansion length P and the accuracy Δ, Eq. (3.2)]. It is seen that the theoretical estimates are in reasonable accordance with the actual results.

V. APPLICATIONS: TIGHT-BINDING SYSTEMS

In this section we provide examples of the performance of the Chebyshev expansion in a tight-binding model for hydrocarbons. We use the model of Davidson and Pickett,42 including the modifications of Horsfield et al.,41 yielding a local-charge-neutrality tight-binding method. Two cases are considered: d = 1 and d = 2 systems. CPU times reported refer to calculations on a DEC-3000 workstation with a single 175 MHz processor and 128 MB of RAM.

FIG. 3. CPU times for calculating the density matrix of a d = 1 hydrocarbon chain $C_nH_{2n+2}$ using a tight-binding Hamiltonian, as a function of system size, for the Chebyshev method (diamonds: Δ = 3, triangles: Δ = 4, dots: Δ = 5) and direct diagonalization (squares).

FIG. 4. CPU times for calculating the density matrix of a 2-D carbon sheet saturated with hydrogen, using a tight-binding Hamiltonian, as a function of system size and accuracy, for the Chebyshev method (dots: Δ = 3, triangles: Δ = 4) and direct diagonalization (squares).
A. d = 1 system: Saturated carbon chain ($C_nH_{2n+2}$)

This system is characterized by a large HOMO–LUMO gap of δε = 0.3 a.u. The spectral range of the tight-binding Hamiltonian is ΔE = 1.7 a.u. We timed the performance for varying system sizes and three precision values Δ = 3, 4, 5 (with corresponding expansion lengths P = 90, 190, 360). Note that the dimension of the full matrices is N = 6n+2, where n is the number of carbon atoms. The results are shown in Fig. 3. It is seen that the turnover size (the system size at which conventional diagonalization gives performance comparable to the linear scaling method) is at about n = 50, 70, and 120 for corresponding accuracies Δ = 3, 4, and 5.
B. d = 2 system: Carbon sheet saturated with hydrogen

Here, too, the hydrogen saturation enables a large band gap, of δε = 0.17 a.u. The energy range is ΔE ≈ 2 a.u. We report, in Fig. 4, the results for Δ = 3 and Δ = 4 (Chebyshev expansion lengths P = 160 and 360, respectively). As the system gets larger, the number of hydrogen atoms per carbon atom approaches 1 (for a small system, boundary effects are noticeable and some carbon atoms are saturated by two hydrogen atoms), so the matrix dimension is about N = 5n. The turnover sizes are n = 130 for Δ = 3 and n = 280 for Δ = 4.
VI. CONCLUSIONS
In this paper we analyzed linear scaling algorithms for electronic structure calculations and focused on one specific method, the Chebyshev expansion of the DM. For that method we have given rules for selecting the various parameters, based on the accuracy required and known properties of the system. We have also shown how to speed up the application of the method, first by representing vectors as binary trees and trimming the trees according to a threshold accuracy criterion, and then by performing a reverse summation. We have also pointed out how to efficiently search for the chemical potential by calculating the DM for several chemical potentials and temperatures in one forward summation.

One conclusion is that the linear scaling methods are especially useful for large tight-binding Hamiltonian calculations. For ab initio calculations, where an overlap matrix is present, the methods are rather limited to systems of low effective dimensionality (d < 2) and to large-gap (or high-temperature) higher-dimensional systems.

The Chebyshev expansion method is shown to be a strong competitor to the LNV-type methods that have emerged recently. This is especially so for systems with effective dimensionality larger than 1, and we have given arguments why this should be so.
ACKNOWLEDGMENTS
This work was supported by the Laboratory Directed Research and Development Program of Lawrence Berkeley Laboratory under U.S. DOE Contract No. DE-AC03-76SF00098. R.B. wishes to thank D. Neuhauser for helpful discussions. M.H.G. acknowledges a Packard Fellowship (1995–2000).
APPENDIX: EXPANSION LENGTHS FOR THE
DENSITY AND OVERLAP MATRICES
In this appendix we use the mathematical theory of
Chebyshev expansion convergence to estimate the length of
the Chebyshev expansion series for the DM and the overlap
matrix.
The convergence properties of the Chebyshev polynomial expansion of a given function f(x) in the interval [−1, 1] are controlled by the singularities near the real axis of its analytic continuation f(z) in the complex plane. The theory is well established (see, for example, Ref. 33), and only the essentials will be summarized here. Let us associate with each positive number ρ an ellipse with foci at z = ±1, given by the following parametrized curve in the complex plane:

z_\rho(\theta) = \frac{\rho+\rho^{-1}}{2}\cos\theta + i\,\frac{\rho-\rho^{-1}}{2}\sin\theta,   (A1)
where the parameter θ varies in the interval [0, 2π]. Let ρ be the largest number for which f(z) is analytic in the complex domain encircled by the ellipse $z_\rho$ (since $z_\rho$ is the same ellipse as $z_{\rho^{-1}}$, ρ is not less than 1). Then the coefficients $a_n$ in the expansion of the function f(x), x ∈ [−1, 1], satisfy33

|a_n| \le \frac{2M}{\rho^n},   (A2)

where M is the maximal value of $|f(z_\rho)|$.
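The bound (A2) can be checked numerically by computing the Chebyshev coefficients of the Fermi function (Gauss–Chebyshev quadrature, illustrative β_s, μ_s = 0) and comparing their decay rate with the conservative estimate log ρ_max ≈ ξ/√2 of Eq. (A10); the measured decay should be at least that fast:

```python
import numpy as np

beta_s = 10.0
xi = np.pi / beta_s                # distance of the nearest pole, Eq. (A4)
rho = 1.0 + xi / np.sqrt(2.0)      # conservative decay base, Eq. (A10)

# Chebyshev coefficients of f(x) = 1/(1 + exp(beta_s x)) by quadrature.
K, P = 4000, 90
theta = (np.arange(K) + 0.5) * np.pi / K
f = 1.0 / (1.0 + np.exp(beta_s * np.cos(theta)))
a = np.array([(2.0 - (m == 0)) / K * np.sum(f * np.cos(m * theta))
              for m in range(P)])

# With mu_s = 0 the even-order coefficients vanish by symmetry, so the
# decay rate is fitted on the odd-order coefficients only.
n_odd = np.arange(31, P, 2)
rate = -np.polyfit(n_odd, np.log(np.abs(a[n_odd])), 1)[0]
print(rate, np.log(rho))   # measured decay rate vs. the (A10) estimate
```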
For the case of $f(z) = 1/(1+e^{\beta_s(z-\mu_s)})$, the singularities are at

z_m = \mu_s + \frac{(2m+1)\pi}{\beta_s}\, i,   (A3)

where m is any integer. Thus, the largest ellipse encircling an analytic domain for f is determined by the location of $z_0 = \mu_s + i\xi$, where

\xi = \pi/\beta_s.   (A4)

The largest ellipse not containing the $z_0$ singularity is defined by

\rho_{\max} = a + b,   (A5)

with

a^2 = 1+b^2 = \frac{\mu_s^2+\xi^2+1+\sqrt{(\mu_s^2+\xi^2+1)^2-4\mu_s^2}}{2}.   (A6)

Assuming very small ξ, we neglect $\xi^4$ and write

a^2 = 1+b^2 \approx \frac{\mu_s^2+\xi^2+1+\sqrt{(1-\mu_s^2)^2+2\xi^2(\mu_s^2+1)}}{2}.   (A7)

Now, for all cases of relevance,

4\xi^2 \ll 1-\mu_s^2,   (A8)

so the following estimate is obtained:

a^2 = 1+b^2 \approx 1+\frac{\xi^2}{2}.   (A9)

Therefore $\rho_{\max} = a+b \approx 1+\xi/\sqrt{2}$, and

\log\rho_{\max} \approx \xi/\sqrt{2}.   (A10)

Using Eq. (A2) and assuming a required precision of $10^{-\Delta}$, P must be large enough that

\frac{2M}{\rho_{\max}^P} < 10^{-\Delta}.   (A11)

Taking a logarithm, rearranging, and using Eq. (A10), the resulting estimate is

P > \frac{\sqrt{2}\,\Delta\log 10}{\xi} \approx \Delta\beta_s.   (A12)

The linear relation between P and $\beta_s$ was checked in numerical tests; empirically we found a somewhat tighter limit:

P \approx \tfrac{3}{2}(\Delta-1)\beta_s.   (A13)

The computation of the inverse overlap matrix $S^{-1}$ (where the eigenvalues of S are assumed all positive) can also be performed using a Chebyshev expansion. Performing a similar analysis for the function $f(z) = 1/z$, it can be shown, after proper scaling, that the pole nearest the interpolation interval is at $z_0 = (1+C)/(1-C)$, where C is the condition number of the overlap matrix (the ratio of its largest to smallest eigenvalue). The appropriate $\rho_{\max}$ is then the solution of the equation $(\rho+\rho^{-1})/2 = -z_0$, and the resulting estimate of the series length L necessary for achieving a precision $10^{-\Delta}$ is, for large C,

L \approx \tfrac{1}{2}\sqrt{C}\,\Delta\log 10.   (A14)

Notice that when a geometrical expansion for the overlap matrix is used,

S^{-1} \approx \sum_{n=0}^{L_G-1}(1-S)^n,   (A15)

its required length for a given precision Δ is

L_G \approx \Delta C \log 10.   (A16)

Thus the Chebyshev expansion length is substantially less sensitive to the condition number of the matrix.

1 P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964); W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965); L. J. Sham and W. Kohn, Phys. Rev. 145, 561 (1966).
2 A. P. Sutton, M. W. Finnis, D. G. Pettifor, and Y. Ohta, J. Phys. C 21, 35 (1988).
3 C. A. White, B. G. Johnson, P. M. W. Gill, and M. Head-Gordon, Chem. Phys. Lett. 230, 8 (1994); C. A. White, B. G. Johnson, P. M. W. Gill, and M. Head-Gordon, ibid. 253, 268 (1997).
4 C. A. White, B. G. Johnson, P. M. W. Gill, and M. Head-Gordon, Chem. Phys. Lett. 253, 268 (1996).
5 J. C. Burant, G. E. Scuseria, and M. J. Frisch, J. Chem. Phys. 105, 8969 (1996).
6 M. C. Strain, G. E. Scuseria, and M. J. Frisch, Science 271, 51 (1996).
7 E. Schwegler, M. Challacombe, and M. Head-Gordon, J. Chem. Phys. 106, 9708 (1997).
8 M. C. Payne, M. P. Teter, D. C. Allan, T. A. Arias, and J. D. Joannopoulos, Rev. Mod. Phys. 64, 1045 (1992).
9 R. Car and M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985).
10 W. T. Yang, Phys. Rev. Lett. 66, 1438 (1991).
11 F. Mauri, G. Galli, and R. Car, Phys. Rev. B 47, 9973 (1993).
12 W. Kohn, Chem. Phys. Lett. 208, 167 (1993).
13 P. W. Anderson, Phys. Rev. Lett. 21, 13 (1968).
14 X.-P. Li, R. W. Nunes, and D. Vanderbilt, Phys. Rev. B 47, 10 891 (1993).
15 R. W. Nunes and D. Vanderbilt, Phys. Rev. B 50, 17 611 (1994).
16 W. Kohn, Phys. Rev. Lett. 76, 3168 (1996).
17 E. Hernandez and M. J. Gillan, Phys. Rev. B 51, 10 157 (1995).
18 E. Hernandez, M. J. Gillan, and C. M. Goringe, Phys. Rev. B 53, 7147 (1996).
19 P. Ordejon, D. A. Drabold, R. M. Martin, and M. P. Grumbach, Phys. Rev. B 51, 1456 (1995).
20 E. B. Stechel, A. R. Williams, and P. J. Feibelman, Phys. Rev. B 49, 10 088 (1994).
21 G. Galli and M. Parrinello, Phys. Rev. Lett. 69, 3547 (1992).
22 S. Goedecker and L. Colombo, Phys. Rev. Lett. 73, 122 (1994).
23 S. Goedecker, J. Comput. Phys. 118, 261 (1995).
24 S. Goedecker and M. Teter, Phys. Rev. B 51, 9455 (1995).
25 A. F. Voter, J. D. Kress, and R. N. Silver, Phys. Rev. B 53, 12 733 (1996).
26 R. Kosloff and H. Tal-Ezer, Chem. Phys. Lett. 127, 223 (1986).
27 D. Neuhauser and M. Baer, J. Chem. Phys. 90, 4351 (1989).
28 R. Baer and R. Kosloff, Chem. Phys. Lett. 200, 183 (1992).
29 D. K. Hoffman, Y. Huang, W. Zhu, and D. J. Kouri, J. Chem. Phys. 101, 1242 (1994).
30 R. Baer, Y. Zeiri, and R. Kosloff, Phys. Rev. B 54, 5287 (1996).
31 R. Baer and R. Kosloff, J. Chem. Phys. 106, 8862 (1997).
32 D. J. Kouri, Y. Huang, and D. K. Hoffman, J. Phys. Chem. 100, 7903 (1996).
33 T. J. Rivlin, Chebyshev Polynomials: From Approximation Theory to Algebra and Number Theory (Wiley, New York, 1990).
34 R. P. Tewarson, Sparse Matrices (Academic Press, New York, 1973).
35 R. Baer and M. Head-Gordon, to be published.
36 C. W. Clenshaw, Mathematical Tables, National Physical Laboratory Vol. 5 (HM Stationery Office, London, 1962).
37 W. Kohn, Phys. Rev. 115, 809 (1959).
38 A. Nenciu and G. Nenciu, Phys. Rev. B 47, 10 112 (1993).
39 W. Kohn and R. J. Onffroy, Phys. Rev. B 8, 2485 (1973).
40 A. K. Rajagopal, Adv. Chem. Phys. 41, 59 (1980).
41 A. P. Horsfield, P. D. Godwin, D. G. Pettifor, and A. P. Sutton, Phys. Rev. B 54, 15 773 (1996).
42 B. N. Davidson and W. E. Pickett, Phys. Rev. B 49, 11 253 (1994).