K4 supplementary material

I. APPENDIX A: SEGMENTED CONTRACTION ALGORITHM
This section describes the segmented contraction version of the presented algorithm. For
each pair of generic basis set functions, all required powers of the Gaussian exponents can be
precomputed and reused in every ERI batch involving that pair.
function PrecomputePowers
    for j = 1 → Kb do
        for i = 1 → Ka do
            for all required p do
                V_{ij,p} = d^A_i / (2η_{ij})^p
            end for
        end for
        for all required b do
            U_{j,b} = d^B_j (2β_j)^b
        end for
    end for
    for l = 1 → Kd do
        for k = 1 → Kc do
            for all required q do
                T_{kl,q} = d^C_k / (2ζ_{kl})^q
            end for
        end for
        for all required d do
            S_{l,d} = d^D_l (2γ_l)^d
        end for
    end for
end function
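The precomputation can be sketched in Python as follows. This is a minimal illustration, not the pilot code: it assumes the listing reads V_{ij,p} = d^A_i / (2η_{ij})^p and U_{j,b} = d^B_j (2β_j)^b, it takes the pair exponents η_{ij} as given input rather than deriving them, and the function and argument names are hypothetical. The same helper would serve for the ket-side tables T and S.

```python
def precompute_pair_tables(d_first, d_second, pair_exp, second_exp, inv_powers, powers):
    """Build the two tables for one pair of contracted functions.

    Assumed reading of PrecomputePowers (hypothetical helper):
      V[i, j, p] = d_first[i] / (2 * pair_exp[i][j]) ** p
      U[j, b]    = d_second[j] * (2 * second_exp[j]) ** b
    """
    V, U = {}, {}
    for j in range(len(d_second)):
        for i in range(len(d_first)):
            for p in inv_powers:
                V[i, j, p] = d_first[i] / (2.0 * pair_exp[i][j]) ** p
        for b in powers:
            U[j, b] = d_second[j] * (2.0 * second_exp[j]) ** b
    return V, U


# Toy data: two primitives on the first center, one on the second.
d_a, d_b = [0.5, 1.5], [2.0]
beta = [0.25]
eta = [[1.0], [2.0]]  # hypothetical pair exponents, taken as given
V, U = precompute_pair_tables(d_a, d_b, eta, beta, inv_powers=(0, 1, 2), powers=(0, 1))
```

Because the tables depend only on the pair of functions, they can be built once and looked up in every ERI batch that involves that pair.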
The accumulation of kernels in the segmented contraction version of the K4 algorithm
proceeds as follows:
function SegmentedContraction
    compute frame of reference
    compute interatomic vectors in the frame of reference
    for all (b, p, d, q, m) do
        _{bp}(0)_{dq}^{(m)} ← 0
    end for
    for l = 1 → Kd do
        for all (b, p, q, m) do
            _{bp}(0}_q^{(m)} ← 0
        end for
        for k = 1 → Kc do
            for all (b, p, m) do
                _{bp}(0]^{(m)} ← 0
            end for
            for j = 1 → Kb do
                for all (p, m) do
                    _p{0]^{(m)} ← 0
                end for
                for i = 1 → Ka do
                    compute F_m(z)
                    compute [0]^{(m)}
                    for all (p, m) do
                        _p{0]^{(m)} ← _p{0]^{(m)} + V_{ij,p} ∗ [0]^{(m)}
                    end for
                end for
                for all (b, p, m) do
                    _{bp}(0]^{(m)} ← _{bp}(0]^{(m)} + U_{j,b} ∗ _p{0]^{(m)}
                end for
            end for
            for all (b, p, q, m) do
                _{bp}(0}_q^{(m)} ← _{bp}(0}_q^{(m)} + T_{kl,q} ∗ _{bp}(0]^{(m)}
            end for
        end for
        for all (b, p, d, q, m) do
            _{bp}(0)_{dq}^{(m)} ← _{bp}(0)_{dq}^{(m)} + S_{l,d} ∗ _{bp}(0}_q^{(m)}
        end for
    end for
end function
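The nested accumulation above can be sketched in Python as follows. This is a toy illustration under stated assumptions: `prim(i, j, k, l, m)` is a hypothetical stand-in for the primitive kernel [0]^(m) (the Boys function and geometry are not modelled), the tables V, U, T, S are arbitrary numbers, and dictionaries replace the arrays a real implementation would use. The point is only that each table is applied once per loop level, and that the result equals the direct fourfold sum over primitives.

```python
from itertools import product


def segmented_contraction(prim, V, U, T, S, K, ps, bs, qs, ds, ms):
    """Accumulate kernels level by level, mirroring SegmentedContraction."""
    Ka, Kb, Kc, Kd = K
    out = {key: 0.0 for key in product(bs, ps, ds, qs, ms)}       # bp(0)dq
    for l in range(Kd):
        acc_k = {key: 0.0 for key in product(bs, ps, qs, ms)}     # bp(0}q
        for k in range(Kc):
            acc_j = {key: 0.0 for key in product(bs, ps, ms)}     # bp(0]
            for j in range(Kb):
                acc_i = {key: 0.0 for key in product(ps, ms)}     # p{0]
                for i in range(Ka):
                    for p, m in product(ps, ms):
                        acc_i[p, m] += V[i, j, p] * prim(i, j, k, l, m)
                for b, p, m in product(bs, ps, ms):
                    acc_j[b, p, m] += U[j, b] * acc_i[p, m]
            for b, p, q, m in product(bs, ps, qs, ms):
                acc_k[b, p, q, m] += T[k, l, q] * acc_j[b, p, m]
        for b, p, d, q, m in product(bs, ps, ds, qs, ms):
            out[b, p, d, q, m] += S[l, d] * acc_k[b, p, q, m]
    return out


def prim(i, j, k, l, m):
    """Hypothetical smooth stand-in for the primitive kernel [0]^(m)."""
    return 1.0 / (1 + i + 2 * j + 3 * k + 4 * l + m)


K = (2, 3, 2, 2)
ps, bs, qs, ds, ms = (0, 1), (0, 1), (0, 1), (0,), (0, 1, 2)
V = {(i, j, p): 0.5 ** p * (1 + i + j) for i in range(2) for j in range(3) for p in ps}
U = {(j, b): 1.5 ** b * (1 + j) for j in range(3) for b in bs}
T = {(k, l, q): 0.25 ** q * (2 + k + l) for k in range(2) for l in range(2) for q in qs}
S = {(l, d): 2.0 ** d * (1 + l) for l in range(2) for d in ds}

batch = segmented_contraction(prim, V, U, T, S, K, ps, bs, qs, ds, ms)
```

In this sketch the multiplication by U happens once per (j, k, l) rather than once per primitive quartet, and similarly for T and S, which is where the nesting saves work relative to the direct sum.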
After the accumulation of kernels, these are transformed into the final ERI batch by the
twelve-step sequence described below.
function Transform
    apply 1CRR over z
    apply ⟨bra| CTE over z
    apply |ket⟩ CTE over z
    apply 1CRR over y
    apply 1CRR over x
    apply |ket⟩ CTE over y
    apply |ket⟩ CTE over x
    apply ⟨bra| CTE over y
    apply ⟨bra| CTE over x
    apply ⟨bra| HRR over z
    apply |ket⟩ HRR over z
    apply |ket⟩ HRR over y
    rotate the integral batch to the original frame
end function
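One way to organize such a fixed sequence in code is as an ordered table of (transform, axis) pairs that a driver walks through. The sketch below shows only that control flow: the kernels are hypothetical placeholders that record their invocation and return the batch unchanged, since the recurrence relations themselves are defined in the main text and not reproduced here.

```python
# Twelve steps in the order listed in Transform; names are labels only.
TRANSFORM_SEQUENCE = [
    ("1CRR", "z"), ("bra CTE", "z"), ("ket CTE", "z"),
    ("1CRR", "y"), ("1CRR", "x"),
    ("ket CTE", "y"), ("ket CTE", "x"),
    ("bra CTE", "y"), ("bra CTE", "x"),
    ("bra HRR", "z"), ("ket HRR", "z"), ("ket HRR", "y"),
]


def run_transform(batch, kernels, rotate_back=None):
    """Apply each step in order, then optionally rotate to the original frame."""
    for name, axis in TRANSFORM_SEQUENCE:
        batch = kernels[name](batch, axis)
    if rotate_back is not None:
        batch = rotate_back(batch)
    return batch


def make_logging_kernels(log):
    """Placeholder kernels (hypothetical): record the call, leave the batch as is."""
    def make(name):
        def kernel(batch, axis):
            log.append((name, axis))
            return batch
        return kernel
    return {name: make(name) for name, _ in TRANSFORM_SEQUENCE}


log = []
result = run_transform({"toy": 1.0}, make_logging_kernels(log))
```

Leaving `rotate_back` as `None` corresponds to skipping the final rotation of the batch.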
As pointed out earlier, the rotation of the batch can be skipped and absorbed into other
parts of the calculation (i.e., the density matrix contraction step).
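Why the rotation can be absorbed is easiest to see in a two-index analogue. The sketch below is a toy under stated assumptions: a 3x3 matrix stands in for the integral batch and a single orthogonal matrix R for the frame rotation, whereas the actual batch is a four-index quantity. Contracting the back-rotated batch with the density gives the same number as contracting the unrotated batch with a density rotated into the local frame.

```python
import math


def matmul(X, Y):
    """Plain-Python matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def contract(X, Y):
    """Full contraction sum_{mu,nu} X[mu][nu] * Y[mu][nu]."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


R = rotation_z(0.7)
A_local = [[1.0, 0.2, 0.0], [0.2, 2.0, 0.5], [0.0, 0.5, 3.0]]  # toy batch, local frame
D = [[0.9, 0.1, 0.3], [0.1, 0.4, 0.2], [0.3, 0.2, 0.7]]        # toy density, original frame

# Route 1: rotate the batch back to the original frame, then contract.
e_rotate_batch = contract(matmul(matmul(R, A_local), transpose(R)), D)

# Route 2: leave the batch alone and rotate the density into the local frame.
e_rotate_density = contract(A_local, matmul(matmul(transpose(R), D), R))
```

Both routes evaluate the same sum, so whichever object is cheaper to rotate can carry the transformation.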
II. APPENDIX B: GENERAL CONTRACTION ALGORITHM
This section describes the general contraction version of the presented algorithm. The
precomputed quantities are the same as in the segmented contraction case, except that they
are computed for every individual contracted function.
function GeneralContraction
    compute frame of reference
    compute interatomic vectors in the frame of reference
    for ja = 1 → Ja do
        for jb = 1 → Jb do
            for jc = 1 → Jc do
                for jd = 1 → Jd do
                    for all (b, p, d, q, m) do
                        _{bp}(0)_{dq}^{(m) ja jb jc jd} ← 0
                    end for
                end for
            end for
        end for
    end for
    for l = 1 → Kd do
        for ja = 1 → Ja do
            for jb = 1 → Jb do
                for jc = 1 → Jc do
                    for all (b, p, q, m) do
                        _{bp}(0}_q^{(m) ja jb jc} ← 0
                    end for
                end for
            end for
        end for
        for k = 1 → Kc do
            for ja = 1 → Ja do
                for jb = 1 → Jb do
                    for all (b, p, m) do
                        _{bp}(0]^{(m) ja jb} ← 0
                    end for
                end for
            end for
            for j = 1 → Kb do
                for ja = 1 → Ja do
                    for all (p, m) do
                        _p{0]^{(m) ja} ← 0
                    end for
                end for
                for i = 1 → Ka do
                    compute F_m(z)
                    compute [0]^{(m)}
                    for ja = 1 → Ja do
                        for all (p, m) do
                            _p{0]^{(m) ja} ← _p{0]^{(m) ja} + V^{ja}_{ij,p} ∗ [0]^{(m)}
                        end for
                    end for
                end for
                for ja = 1 → Ja do
                    for jb = 1 → Jb do
                        for all (b, p, m) do
                            _{bp}(0]^{(m) ja jb} ← _{bp}(0]^{(m) ja jb} + U^{jb}_{j,b} ∗ _p{0]^{(m) ja}
                        end for
                    end for
                end for
            end for
            for ja = 1 → Ja do
                for jb = 1 → Jb do
                    for jc = 1 → Jc do
                        for all (b, p, q, m) do
                            _{bp}(0}_q^{(m) ja jb jc} ← _{bp}(0}_q^{(m) ja jb jc} + T^{jc}_{kl,q} ∗ _{bp}(0]^{(m) ja jb}
                        end for
                    end for
                end for
            end for
        end for
        for ja = 1 → Ja do
            for jb = 1 → Jb do
                for jc = 1 → Jc do
                    for jd = 1 → Jd do
                        for all (b, p, d, q, m) do
                            _{bp}(0)_{dq}^{(m) ja jb jc jd} ← _{bp}(0)_{dq}^{(m) ja jb jc jd} + S^{jd}_{l,d} ∗ _{bp}(0}_q^{(m) ja jb jc}
                        end for
                    end for
                end for
            end for
        end for
    end for
end function
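A toy Python sketch of the same loop structure is given below. As before, `prim` is a hypothetical stand-in for the primitive kernel [0]^(m) and the tables hold arbitrary numbers; the only difference from the segmented case assumed here is that each table carries an extra contracted-function index (ja, jb, jc or jd). The call counter illustrates that the primitive kernel is evaluated once per primitive quartet and m, independently of how many contracted functions share the primitives.

```python
from itertools import product


def general_contraction(prim, V, U, T, S, K, J, ps, bs, qs, ds, ms):
    """Nested accumulation with one extra contraction index per center."""
    Ka, Kb, Kc, Kd = K
    A, B, C, D = (range(n) for n in J)
    out = {key: 0.0 for key in product(bs, ps, ds, qs, ms, A, B, C, D)}
    for l in range(Kd):
        acc_k = {key: 0.0 for key in product(bs, ps, qs, ms, A, B, C)}
        for k in range(Kc):
            acc_j = {key: 0.0 for key in product(bs, ps, ms, A, B)}
            for j in range(Kb):
                acc_i = {key: 0.0 for key in product(ps, ms, A)}
                for i in range(Ka):
                    for m in ms:
                        kernel = prim(i, j, k, l, m)  # evaluated once, shared by every ja
                        for p, ja in product(ps, A):
                            acc_i[p, m, ja] += V[ja, i, j, p] * kernel
                for b, p, m, ja, jb in acc_j:
                    acc_j[b, p, m, ja, jb] += U[jb, j, b] * acc_i[p, m, ja]
            for b, p, q, m, ja, jb, jc in acc_k:
                acc_k[b, p, q, m, ja, jb, jc] += T[jc, k, l, q] * acc_j[b, p, m, ja, jb]
        for b, p, d, q, m, ja, jb, jc, jd in out:
            out[b, p, d, q, m, ja, jb, jc, jd] += S[jd, l, d] * acc_k[b, p, q, m, ja, jb, jc]
    return out


calls = []


def prim(i, j, k, l, m):
    """Hypothetical stand-in for [0]^(m); records each evaluation."""
    calls.append((i, j, k, l, m))
    return 1.0 / (1 + i + 2 * j + 3 * k + 4 * l + m)


K, J = (2, 2, 2, 2), (2, 1, 2, 1)
ps, bs, qs, ds, ms = (0, 1), (0,), (0, 1), (0,), (0, 1)
V = {(ja, i, j, p): (1 + ja) * 0.5 ** p * (1 + i + j)
     for ja in range(2) for i in range(2) for j in range(2) for p in ps}
U = {(jb, j, b): 1.0 + j for jb in range(1) for j in range(2) for b in bs}
T = {(jc, k, l, q): (2 - jc) * 0.25 ** q * (1 + k + l)
     for jc in range(2) for k in range(2) for l in range(2) for q in qs}
S = {(jd, l, d): 1.0 + l for jd in range(1) for l in range(2) for d in ds}

batch = general_contraction(prim, V, U, T, S, K, J, ps, bs, qs, ds, ms)
```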
The corresponding transforms are performed for each individual 4-tuple of indices
(ja, jb, jc, jd), exactly as in the segmented contraction scheme.
III. APPENDIX C: PILOT CODE TIMINGS
TABLE I. Timing comparison of the segmented contraction algorithm for basis sets with sp shells.^{a,b}

Basis                                          STO-6G^c                           6-31+G*^d
Code^{e,f}                          FIESTA+K4 DALTON 2011     speedup   FIESTA+K4 DALTON 2011     speedup
C20 (bowl isomer)                      29.986      335.28       11.18      264.55      588.72        2.22
C20 (cage isomer)                      28.186      358.44       12.72      260.41      613.24        2.35
C7H4 heptahexaene                      1.0165      8.7380        8.60      4.5453      12.044        2.65
C7H4 heptatriyne                       1.0182      8.6200        8.46      4.4984      11.926        2.65
C6H12 2,3-dimethylbut-2-ene            2.6250      15.746        6.00      4.9243      15.258        3.10
C12H24 octamethylcyclobutane           41.358      228.49        5.52      90.137      231.95        2.57
C5NOH5 2-hydroxypyridine               1.1412      10.825        9.49      4.6403      13.760        2.97
C5NOH5 2-hydropyridone                 1.2294      10.799        8.78      4.6209      13.594        2.94
CN2H2 diazomethane                     0.0461      0.4170        9.04      0.1935      0.4920        2.54
C3N2H6 1-pyrazoline                    0.5460      4.4940        8.23      1.5379      4.9320        3.21
C8H10 p-xylene                         3.8057      27.160        7.14      11.232      30.506        2.72
C16H16 [2,2]paracyclophane             46.821      330.56        7.06      174.20      431.49        2.48
C12H12 (D6h cage)                      14.669      114.29        7.79      53.170      140.39        2.64
C12H12 (isomer 2)                      15.134      106.95        7.07      54.349      135.17        2.49
C9O3 (isomer 1)                        3.6873      42.681       11.58      31.782      72.967        2.30
C9O3 (isomer 2)                        3.6335      42.724       11.76      32.952      72.970        2.21
C8H18 n-octane                         10.228      48.771        4.77      19.680      47.490        2.41
C8H18 2,2,3,3-tetramethylbutane        10.355      56.938        5.50      19.235      53.263        2.77

^a Times (in seconds) refer to the combined ERI evaluation and density matrix contraction steps.
^b All prescreening options were turned off.
^c H: [6s]/(1s); C, O, N: [12s6p]/(2s1p) (Ref. 1).
^d H: [4s]/(2s); C, O, N: [11s5p1d]/(4s3p1d) (Ref. 2).
^e All code was compiled using GCC 4.3.0 with '-O3 -march=native -ffast-math'. Vectorization was disabled.
^f All tests were run on one core of an AMD Phenom 9750 with 8 GB of RAM, running 64-bit Linux 2.6.27.
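The speedup columns appear to be the ratio of the DALTON 2011 time to the FIESTA+K4 time, so values above 1 favor FIESTA+K4 and values below 1 favor DALTON. The short check below recomputes the ratio for three STO-6G rows of Table I; the variable names are illustrative, and agreement is only expected up to the rounding of the tabulated times.

```python
# (molecule, FIESTA+K4 time / s, DALTON 2011 time / s, tabulated speedup), from Table I, STO-6G
TABLE_I_STO6G = [
    ("C20 (bowl isomer)", 29.986, 335.28, 11.18),
    ("C20 (cage isomer)", 28.186, 358.44, 12.72),
    ("C6H12 2,3-dimethylbut-2-ene", 2.6250, 15.746, 6.00),
]


def speedup(t_k4, t_dalton):
    """Speedup of FIESTA+K4 over DALTON, assumed to be the plain ratio of times."""
    return t_dalton / t_k4


recomputed = {name: speedup(t_k4, t_dalton) for name, t_k4, t_dalton, _ in TABLE_I_STO6G}
for name, _, _, tabulated in TABLE_I_STO6G:
    print(f"{name:30s} recomputed {recomputed[name]:6.2f}   tabulated {tabulated:6.2f}")
```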
TABLE II. Timing comparison of the segmented contraction algorithm for double zeta polarized basis sets.^{a,b}

Basis                                         Def2-SVP^c                        Sapporo DZP^d
Code^{e,f}                          FIESTA+K4 DALTON 2011     speedup   FIESTA+K4 DALTON 2011     speedup
C20 (bowl isomer)                      128.79      269.55        2.09      182.21      537.12        2.95
C20 (cage isomer)                      128.01      276.57        2.16      180.14      544.25        3.02
C7H4 heptahexaene                      3.4595      8.6840        2.51      5.3543      18.217        3.40
C7H4 heptatriyne                       3.4386      8.6460        2.51      5.3718      18.080        3.37
C6H12 2,3-dimethylbut-2-ene            8.0382      23.754        2.96      14.222      50.037        3.52
C12H24 octamethylcyclobutane           142.62      371.68        2.61      246.40      779.04        3.16
C5NOH5 2-hydroxypyridine               4.1096      10.656        2.59      6.3892      22.391        3.51
C5NOH5 2-hydropyridone                 4.1109      10.630        2.59      6.3832      22.335        3.50
CN2H2 diazomethane                     0.1999      0.3740        1.87      0.2823      0.7920        2.81
C3N2H6 1-pyrazoline                    1.7868      5.2870        2.96      2.9733      11.063        3.72
C8H10 p-xylene                         12.985      34.120        2.63      21.653      72.384        3.34
C16H16 [2,2]paracyclophane             175.43      416.91        2.38      280.95      861.48        3.07
C12H12 (D6h cage)                      54.264      133.70        2.46      86.823      276.51        3.18
C12H12 (isomer 2)                      53.899      130.38        2.42      86.902      275.72        3.17
C9O3 (isomer 1)                        15.009      33.635        2.24      20.943      66.906        3.19
C9O3 (isomer 2)                        15.205      33.443        2.20      21.403      66.901        3.13
C8H18 n-octane                         33.284      85.758        2.58      59.608      184.27        3.09
C8H18 2,2,3,3-tetramethylbutane        34.065      93.026        2.73      59.519      196.05        3.29

^a Times (in seconds) refer to the combined ERI evaluation and density matrix contraction steps.
^b All prescreening options were turned off.
^c H: [4s1p]/(2s1p); C, O, N: [7s4p1d]/(3s2p1d) (Ref. 3).
^d H: [4s3p]/(2s1p) (Refs. 4, 5); C, O, N: [9s4p2d]/(3s2p1d) (Refs. 6, 7).
^e All code was compiled using GCC 4.3.0 with '-O3 -march=native -ffast-math'. Vectorization was disabled.
^f All tests were run on one core of an AMD Phenom 9750 with 8 GB of RAM, running 64-bit Linux 2.6.27.
TABLE III. Timing comparison of the segmented contraction algorithm for triple zeta polarized basis sets.^{a,b}

Basis                                        Def2-TZVPP^c                       Sapporo TZP^d
Code^{e,f}                          FIESTA+K4 DALTON 2011     speedup   FIESTA+K4 DALTON 2011     speedup
C20 (bowl isomer)                      6649.9      4994.1        0.75      6406.9      9413.7        1.47
C20 (cage isomer)                      6671.6      5186.8        0.78      6434.2      9548.3        1.48
C7H4 heptahexaene                      175.64      165.21        0.94      176.88      323.31        1.83
C7H4 heptatriyne                       175.16      161.73        0.92      177.40      318.90        1.80
C6H12 2,3-dimethylbut-2-ene            398.43      445.78        1.12      426.43      895.33        2.10
C12H24 octamethylcyclobutane           6974.5      6890.7        0.99      7218.1       13809        1.91
C5NOH5 2-hydroxypyridine               210.51      202.15        0.96      215.09      391.02        1.82
C5NOH5 2-hydropyridone                 210.18      202.18        0.96      212.44      392.04        1.85
CN2H2 diazomethane                     7.8574      6.7630        0.86      7.8487      13.269        1.69
C3N2H6 1-pyrazoline                    87.608      99.521        1.14      90.691      192.21        2.12
C8H10 p-xylene                         650.17      636.60        0.98      668.69      1276.8        1.91
C16H16 [2,2]paracyclophane             8865.5      7725.6        0.87      8869.2       15264        1.72
C12H12 (D6h cage)                      2714.8      2541.8        0.94      2716.1      4943.7        1.82
C12H12 (isomer 2)                      2684.5      2446.0        0.91      2737.6      4827.5        1.76
C9O3 (isomer 1)                        802.75      623.18        0.78      754.58      1199.9        1.59
C9O3 (isomer 2)                        813.52      629.97        0.77      752.12      1189.4        1.58
C8H18 n-octane                         1610.6      1597.7        0.99      1705.1      3363.5        1.97
C8H18 2,2,3,3-tetramethylbutane        1643.1      1742.0        1.06      1707.8      3480.7        2.04

^a Times (in seconds) refer to the combined ERI evaluation and density matrix contraction steps.
^b All prescreening options were turned off.
^c H: [5s2p1d]/(3s2p1d); C, O, N: [11s6p2d1f]/(5s3p2d1f) (Ref. 3).
^d H: [6s3p2d]/(3s2p1d) (Refs. 4, 5); C, O, N: [10s5p3d2f]/(4s3p2d1f) (Refs. 6, 8).
^e All code was compiled using GCC 4.3.0 with '-O3 -march=native -ffast-math'. Vectorization was disabled.
^f All tests were run on one core of an AMD Phenom 9750 with 8 GB of RAM, running 64-bit Linux 2.6.27.
TABLE IV. Timing comparison of the algorithm for Sadlej-pVTZ and ANO TZP basis sets.^{a,b}

Basis                                       Sadlej-pVTZ^c                         ANO TZP^d
Code^{e,f}                          FIESTA+K4 DALTON 2011     speedup   FIESTA+K4 DALTON 2011     speedup
C20 (bowl isomer)                      1241.1      3226.1        2.60       10972       60669        5.53
C20 (cage isomer)                      1274.9      3353.1        2.63       10977       61015        5.56
C7H4 heptahexaene                      34.487      95.062        2.76      331.20      1899.6        5.74
C7H4 heptatriyne                       34.368      92.898        2.70      332.97      1895.2        5.69
C6H12 2,3-dimethylbut-2-ene            68.959      213.02        3.09      825.12      4386.9        5.32
C12H24 octamethylcyclobutane           1554.7      3319.7        2.14       14101       67360        4.78
C5NOH5 2-hydroxypyridine               40.902      111.53        2.73      403.33      2244.2        5.56
C5NOH5 2-hydropyridone                 40.893      112.61        2.75      402.14      2221.2        5.52
CN2H2 diazomethane                     1.6894      3.9090        2.31      11.880      76.081        6.40
C3N2H6 1-pyrazoline                    18.168      51.228        2.82      178.73      1016.8        5.69
C8H10 p-xylene                         138.38      328.17        2.37      1287.7      6700.6        5.20
C16H16 [2,2]paracyclophane             1841.5      4184.3        2.27       16393       81825        4.99
C12H12 (D6h cage)                      560.98      1361.8        2.43      5150.8       26258        5.10
C12H12 (isomer 2)                      561.29      1337.7        2.38      5083.7       26193        5.15
C9O3 (isomer 1)                        145.15      401.93        2.77      1366.4      7816.0        5.72
C9O3 (isomer 2)                        145.67      402.32        2.76      1358.4      7780.7        5.73
C8H18 n-octane                         368.33      774.06        2.10      3330.6       16489        4.95
C8H18 2,2,3,3-tetramethylbutane        371.98      811.10        2.18      3343.2       16860        5.04

^a Times (in seconds) refer to the combined ERI evaluation and density matrix contraction steps.
^b All prescreening options were turned off.
^c H: [6s4p]/(3s2p); C, O, N: [10s6p4d]/(5s3p2d) (Refs. 9–13).
^d H: [8s4p3d]/(3s2p1d); C, O, N: [14s9p4d3f]/(4s3p2d1f) (Refs. 14–16).
^e All code was compiled using GCC 4.3.0 with '-O3 -march=native -ffast-math'. Vectorization was disabled.
^f All tests were run on one core of an AMD Phenom 9750 with 8 GB of RAM, running 64-bit Linux 2.6.27.
REFERENCES
1. W. Hehre, R. Stewart, and J. Pople, J. Chem. Phys. 51, 2657 (1969).
2. W. Hehre, R. Ditchfield, and J. Pople, J. Chem. Phys. 56, 2257 (1972).
3. F. Weigend and R. Ahlrichs, Phys. Chem. Chem. Phys. 7, 3297 (2005).
4. H. Yamamoto and O. Matsuoka, Bull. Univ. Electro. Comm. 5, 23 (1992).
5. T. Noro, M. Sekiya, and T. Koga, Theor. Chem. Acc. 109, 85 (2003).
6. T. Noro, M. Sekiya, and T. Koga, Theor. Chem. Acc. 98, 25 (1997).
7. H. Moriyama and H. Tatewaki, unpublished.
8. H. Tatewaki and T. Koga, J. Chem. Phys. 104, 8493 (1996).
9. A. Sadlej, Collect. Czech. Chem. Commun. 53 (1988).
10. A. Sadlej and M. Urban, J. Mol. Struct. 234, 147 (1991).
11. A. Sadlej, Theor. Chim. Acta 79, 123 (1992).
12. A. Sadlej, Theor. Chim. Acta 81, 45 (1992).
13. A. Sadlej, Theor. Chim. Acta 81, 339 (1992).
14. P. Widmark, P. Malmqvist, and B. Roos, Theor. Chim. Acta 77, 291 (1990).
15. P. Widmark, B. Persson, and B. Roos, Theor. Chim. Acta 79, 419 (1991).
16. R. Pou-Amerigo, M. Merchan, I. Nebot-Gil, P. Widmark, and B. Roos, Theor. Chim. Acta 92, 149 (1995).