Production of gauge configurations for lattice QCD at zero and non-zero temperature
A. Bazavov¹, N. Brambilla², P. Petreczky³, A. Vairo² and J. Weber²
¹ University of Iowa
² Physik Department, Technische Universität München, Garching
³ Brookhaven National Lab
KONWIHR Kick-off meeting, 08/06/2015
Goal of lattice QCD
Solve the QCD path integral by a first-principles calculation
$$Z_\mathrm{QCD} \propto \int \mathcal{D}A\,\mathcal{D}\psi\,\mathcal{D}\bar\psi\; e^{iS^g[A;g]} \prod_f e^{iS^q[A,\psi,\bar\psi;m_f]}$$
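Compute observables from the path integral as
$$\langle O\rangle \propto \int \mathcal{D}A\,\mathcal{D}\psi\,\mathcal{D}\bar\psi\; O[A,\psi,\bar\psi]\, e^{iS^g[A;g]} \prod_f e^{iS^q[A,\psi,\bar\psi;m_f]}$$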
Path integral not well-defined as it stands ⇒ finite space-time lattice provides UV and IR cutoffs
Estimate with a finite set of configurations ⇒ importance sampling
⇒ Needs a real, positive weight: analytic continuation to Euclidean space-time
$$iS^g_M[A_M;g] \to -S^g_E[A_E;g], \qquad iS^q_M[A_M,\psi_M,\bar\psi_M;m_f] \to -S^q_E[A_E,\psi_E,\bar\psi_E;m_f]$$
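To make the continuation step concrete, consider a single scalar degree of freedom φ with potential V (a toy illustration chosen here, not the QCD actions themselves). Substituting t = −iτ, so that dt = −i dτ and (∂_t φ)² = −(∂_τ φ)², gives

$$iS_M = i\int dt\,\left[\tfrac{1}{2}(\partial_t\phi)^2 - V(\phi)\right] \;\xrightarrow{\;t=-i\tau\;}\; -\int d\tau\,\left[\tfrac{1}{2}(\partial_\tau\phi)^2 + V(\phi)\right] = -S_E$$

so the oscillating phase e^{iS_M} turns into the real, positive weight e^{−S_E} that importance sampling requires.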
Generic HMC update algorithm
Classical solution minimises the Euclidean action ⇒ use e^{−S} as statistical weight
$$\langle O\rangle \propto \int \mathcal{D}A\,\mathcal{D}\psi\,\mathcal{D}\bar\psi\; O[A,\psi,\bar\psi]\, e^{-S^g[A;g]} \prod_f e^{-S^q[A,\psi,\bar\psi;m_f]}$$
Start from an arbitrary field configuration, update with a Markov process
⇒ Thermalised configurations that fluctuate around the classical solution
Ergodicity: always accept favourable configurations,
but also accept unfavourable configurations with non-zero probability
⇒ Better: propose new configurations with high acceptance probability
HMC: heatbath update combined with a molecular dynamics (MD) trajectory
Global Monte Carlo accept/reject step at the end of each MD trajectory
⇒ Discretisation errors of the MD trajectory are corrected in the global accept/reject step
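To make these HMC ingredients concrete, here is a minimal sketch on a toy Gaussian action S[q] = Σ_i q_i²/2 that stands in for the QCD action (an assumption for illustration; there are no gauge or fermion fields). It shows the heatbath for the momenta, one leapfrog MD trajectory, and the global Metropolis accept/reject step. The names `action`, `force`, `leapfrog`, `hmc_trajectory` and `run_hmc` are hypothetical.

```python
import math
import random


def action(q, mass2=1.0):
    # Toy Euclidean action S[q] = sum_i mass2 * q_i^2 / 2 (a free Gaussian "field")
    return 0.5 * mass2 * sum(x * x for x in q)


def force(q, mass2=1.0):
    # MD force F[q] = -dS/dq
    return [-mass2 * x for x in q]


def hamiltonian(q, p, mass2=1.0):
    # MD Hamiltonian H = P^2/2 + S[Q]
    return 0.5 * sum(x * x for x in p) + action(q, mass2)


def leapfrog(q, p, dt, n_steps, mass2=1.0):
    """Reversible, area-preserving leapfrog integration of one MD trajectory."""
    q, p = list(q), list(p)
    p = [pi + 0.5 * dt * fi for pi, fi in zip(p, force(q, mass2))]  # initial half kick
    for step in range(n_steps):
        q = [qi + dt * pi for qi, pi in zip(q, p)]  # drift
        w = dt if step < n_steps - 1 else 0.5 * dt  # full kicks, final half kick
        p = [pi + w * fi for pi, fi in zip(p, force(q, mass2))]
    return q, p


def hmc_trajectory(q, rng, dt, n_steps, mass2=1.0):
    p = [rng.gauss(0.0, 1.0) for _ in q]  # heatbath: fresh Gaussian momenta
    h_old = hamiltonian(q, p, mass2)
    q_new, p_new = leapfrog(q, p, dt, n_steps, mass2)
    delta_h = hamiltonian(q_new, p_new, mass2) - h_old
    # Global Metropolis accept/reject corrects the MD discretisation error
    if delta_h <= 0.0 or rng.random() < math.exp(-delta_h):
        return q_new, True
    return q, False


def run_hmc(n_traj, n_sites=16, dt=0.1, n_steps=10, n_therm=100, seed=1):
    rng = random.Random(seed)
    q = [3.0] * n_sites  # arbitrary start, far from equilibrium
    accepted, q2_sum, n_meas = 0, 0.0, 0
    for traj in range(n_therm + n_traj):
        q, acc = hmc_trajectory(q, rng, dt, n_steps)
        if traj >= n_therm:  # measure only after thermalisation
            accepted += acc
            q2_sum += sum(x * x for x in q) / n_sites
            n_meas += 1
    return accepted / n_meas, q2_sum / n_meas
```

For this toy action the exact result is ⟨q_i²⟩ = 1/mass2, which gives a simple check of `run_hmc`; increasing `dt` lowers the acceptance rate, illustrating the trade-off between MD discretisation error and cost.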
Discrete molecular dynamics
Evolve in fictitious MD time τ with the equations of motion (EoM) from the MD Hamiltonian
$$H[P,Q] = \frac{P^2}{2} + S^\mathrm{eff}[Q]$$
Effective action S^eff[Q]: both the gauge action and terms from the quark action
Conjugate momenta P generated in the heatbath step at the start of each trajectory
$$\dot P = -\frac{\partial S^\mathrm{eff}[Q]}{\partial Q} = F[Q], \qquad \dot Q = P$$
Thermalisation through heatbath (MD uses a reversible, symplectic integrator)
Integrate along the MD trajectory using discrete MD time steps ∆τ
Balance MD integrators between discretisation error and numerical cost
Gauge force significantly larger and cheaper than fermion force
⇒ Integrate the gauge force with smaller steps than the fermion force (→ MULTI-STEP)
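A minimal sketch of the multi-step idea, assuming a single degree of freedom with a large "gauge-like" force and a small "fermion-like" force (both toy harmonic forces; `K_FAST`, `K_SLOW`, `f_fast`, `f_slow` are hypothetical names). The slow force is applied on the outer step dt, while the fast force is integrated with dt/n_inner in a nested leapfrog (Sexton-Weingarten nesting). Which integrator scheme the production code uses is not stated here.

```python
K_FAST = 100.0  # large, cheap "gauge-like" force constant (toy value)
K_SLOW = 1.0    # small, expensive "fermion-like" force constant (toy value)


def f_fast(q):
    return -K_FAST * q


def f_slow(q):
    return -K_SLOW * q


def energy(q, p):
    # MD Hamiltonian H = p^2/2 + S_fast + S_slow
    return 0.5 * p * p + 0.5 * (K_FAST + K_SLOW) * q * q


def nested_leapfrog(q, p, dt, n_outer, n_inner, counts):
    """Two-level leapfrog over one MD trajectory.

    The slow force is applied with outer step dt; the fast force is integrated
    with the finer step dt / n_inner inside each outer step. Adjacent half kicks
    are not merged, to keep the structure visible.
    """
    h = dt / n_inner
    for _ in range(n_outer):
        p += 0.5 * dt * f_slow(q)
        counts["slow"] += 1
        for _ in range(n_inner):  # inner leapfrog replaces the outer drift
            p += 0.5 * h * f_fast(q)
            q += h * p
            p += 0.5 * h * f_fast(q)
            counts["fast"] += 2
        p += 0.5 * dt * f_slow(q)
        counts["slow"] += 1
    return q, p
```

Only the cheap force is evaluated n_inner times more often; the number of expensive force evaluations is fixed by the outer step, and the nested scheme stays reversible because both levels are symmetric.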
Dealing with fermions as sea quarks
MD force needs derivative of action wrt. fields (Q ↔ {A, ψ, ψ̄})
Fermion fields are Grassmann-valued, difficult to represent directly on a computer
⇒ Formally integrate out quarks ⇒ determinant of Dirac operator M
$$\langle O\rangle \propto \int \mathcal{D}A\; O[A]\, e^{-S^g[A;g]} \prod_f \det M[A;m_f]$$
Explicit valence quarks (in O) need propagators (inverse Dirac operator M^{-1})
Direct calculation of the fermion determinant prohibitively expensive
⇒ Use pseudofermions: write the determinant as a bosonic Gaussian integral with the inverse fermion matrix (Dirac op.) in the exponent
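$$\det M[A;m_f] \propto \int \mathcal{D}\phi\,\mathcal{D}\phi^\dagger\, \exp\left(-\phi^\dagger \left(M^\dagger[A;m_f]\right)^{-1} \left(M[A;m_f]\right)^{-1} \phi\right)$$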
However: inversion of the fermion matrix is the most expensive operation
Estimate the φ, φ† integration with stochastic noise in the heatbath step
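The pseudofermion heatbath can be sketched as follows, assuming a small real toy matrix M = m + D (a hypothetical 1D nearest-neighbour operator with antisymmetric hopping, not HISQ) and a dense Gaussian-elimination solve standing in for the CG inversion. Drawing χ with weight exp(−χᵀχ) and setting φ = Mχ produces φ with weight exp(−φᵀ(Mᵀ)^{-1}M^{-1}φ); evaluating the pseudofermion action on that φ costs one linear solve and returns χᵀχ.

```python
import math
import random


def toy_dirac_matrix(n_sites, mass):
    """Hypothetical toy M = m + D with real antisymmetric periodic hopping (n_sites >= 3)."""
    m = [[0.0] * n_sites for _ in range(n_sites)]
    for x in range(n_sites):
        m[x][x] = mass
        m[x][(x + 1) % n_sites] += 0.5
        m[x][(x - 1) % n_sites] -= 0.5
    return m


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def solve(m, b):
    """Gaussian elimination with partial pivoting (stands in for the CG inversion)."""
    n = len(b)
    a = [list(row) + [bi] for row, bi in zip(m, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def pseudofermion_heatbath(m, rng):
    # chi ~ exp(-chi^T chi): real Gaussian with variance 1/2 per component
    chi = [rng.gauss(0.0, math.sqrt(0.5)) for _ in range(len(m))]
    phi = matvec(m, chi)  # phi = M chi has weight exp(-phi^T (M M^T)^{-1} phi)
    return phi, chi


def pseudofermion_action(m, phi):
    # S_pf = phi^T (M^T)^{-1} M^{-1} phi = |M^{-1} phi|^2: one solve per evaluation
    y = solve(m, phi)
    return sum(v * v for v in y)
```

In the real-valued toy the Gaussian integral gives |det M| rather than det M; the structure of the heatbath (noise in, one multiplication by M) is the point here.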
Highly improved staggered quarks
(Highly Improved) Staggered Quarks (HISQ) have a residual chiral
symmetry; only one of the four spin components of the Dirac field is kept per site
⇒ Dirac operator naïvely a factor four cheaper than standard implementations
⇒ Chiral corrections O(a m_f) suppressed to O(m_f² a²)
Staggered propagator has four poles ⇒ four mass-degenerate quarks ("tastes")
Non-degeneracy at finite cutoff ⇒ taste-breaking (reduced for HISQ)
HISQ action uses fat links and long links, up to seven-link hopping
terms with smeared and reunitarised (U(3) or SU(3)) gauge fields
⇒ Remove the spurious tastes with the fourth root of the quark determinant
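Schematically, writing M_stag for the staggered operator and M_1 for a single-taste Dirac operator (notation introduced here only for illustration), the rooting argument reads

$$\det M_\mathrm{stag}[A;m_f] \;\xrightarrow{\;a\to 0\;}\; \left(\det M_1[A;m_f]\right)^4 \quad\Longrightarrow\quad \left(\det M_\mathrm{stag}[A;m_f]\right)^{1/4} \;\xrightarrow{\;a\to 0\;}\; \det M_1[A;m_f]$$

At finite lattice spacing the four tastes are not exactly degenerate, so the rooted determinant differs from a single-taste determinant by taste-breaking effects, which HISQ reduces.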
Hasenbusch preconditioning
Low quark masses: fermion force fluctuates strongly ⇒ low acceptance
⇒ Decrease size of fluctuations with a preconditioner
⇒ Hasenbusch preconditioning with large mass parameter m_h
$$\det M[A;m_f]^r = \det\left(\frac{M[A;m_f]}{M[A;m_h]}\right)^r \det M[A;m_h]^r$$
In our 2+1 flavour setup: the Hasenbusch quark with mass m_h enters three times (once per sea flavour)
$$\sqrt[4]{\det M[A;m_l]^2\, M[A;m_s]^1} = \sqrt[4]{\det \frac{M[A;m_l]^2\, M[A;m_s]^1}{M[A;m_h]^3}}\;\sqrt[4]{\det M[A;m_h]^3}$$
⇒ Three extra inversions for Hasenbusch quarks (→ MULTI-STEP)
Possibility to use strange quark as Hasenbusch quark for light quarks?
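Why Hasenbusch preconditioning helps can be illustrated on a toy free staggered spectrum (an assumption; the interacting HISQ spectrum is not modelled). Since D is anti-Hermitian, M†M = m² − D² has eigenvalues m² + λ². The condition number of the light-quark kernel then factorises exactly into that of the ratio kernel M_h(M_l†M_l)^{-1}M_h† and that of M_h†M_h, so a suitable m_h (here the hypothetical choice m_h = √(m_l λ_max)) splits one badly conditioned factor into two milder ones. The last lines check the 2+1 flavour split of exponents on the same toy spectrum, using log det(M†M).

```python
import math


def free_staggered_eigs(n_sites):
    """Toy spectrum: eigenvalues of a free 1D staggered D are i*sin(2*pi*k/n)."""
    return [math.sin(2.0 * math.pi * k / n_sites) for k in range(n_sites)]


def mdagm_eigs(mass, lams):
    # D anti-Hermitian => M^dag M = m^2 - D^2 has eigenvalues m^2 + lambda^2
    return [mass * mass + lam * lam for lam in lams]


def condition_number(eigs):
    return max(eigs) / min(eigs)


def log_det(eigs):
    return sum(math.log(e) for e in eigs)


lams = free_staggered_eigs(64)
m_l, m_s = 0.005, 0.1                 # toy light and strange masses
lam_max = max(abs(lam) for lam in lams)
m_h = math.sqrt(m_l * lam_max)        # balances both factors for m_l << m_h << lam_max

e_l = mdagm_eigs(m_l, lams)
e_s = mdagm_eigs(m_s, lams)
e_h = mdagm_eigs(m_h, lams)
# Kernel of the ratio pseudofermion M_h (M_l^dag M_l)^{-1} M_h^dag (all operators commute here)
e_ratio = [h / l for h, l in zip(e_h, e_l)]

kappa_l = condition_number(e_l)
kappa_h = condition_number(e_h)
kappa_ratio = condition_number(e_ratio)

# 2+1 split: fourth root of (light^2 * strange) = ratio part times Hasenbusch part
lhs = 0.25 * (2 * log_det(e_l) + log_det(e_s))
rhs = 0.25 * (2 * log_det(e_l) + log_det(e_s) - 3 * log_det(e_h)) + 0.25 * 3 * log_det(e_h)
```

With these toy values both `kappa_h` and `kappa_ratio` are of order √`kappa_l`, which is the mechanism behind the smaller force fluctuations; the price is the extra inversions with mass m_h.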
RHMC (Rational Hybrid Monte Carlo) for staggered sea quarks
Approximate the fourth root with a high-order rational function, computed
at high precision using the GMP library; coefficients are read back from files
Coefficients depend on bare quark masses, but not on gauge fields
Coefficients serve as constant shifts for massless staggered Dirac op.
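The production coefficients come from an optimised high-order rational approximation (in RHMC these are usually obtained with the Remez algorithm). As a swapped-in, simpler construction that needs many more poles, the sketch below discretises the integral representation x^{−α} = (sin πα/π) ∫₀^∞ t^{−α}/(t + x) dt with the trapezoidal rule in s = ln t. It yields the same partial-fraction form r(x) = Σ_k w_k/(x + β_k) with positive weights and positive constant shifts β_k, which depend only on the spectral range [x_min, x_max] (set by the bare quark masses), not on the gauge field. The function names and parameters (`eps`, `h`) are hypothetical.

```python
import math


def rational_coefficients(alpha, x_min, x_max, eps=1e-9, h=0.5):
    """Weights w_k and shifts b_k with sum_k w_k / (x + b_k) ~ x**(-alpha), 0 < alpha < 1.

    Trapezoidal rule for x^-alpha = sin(pi*alpha)/pi * int ds e^{(1-alpha)s} / (e^s + x);
    the cut-offs s_min, s_max keep the truncated tails below ~eps on [x_min, x_max].
    """
    pref = math.sin(math.pi * alpha) / math.pi
    s_min = math.log(x_min) - math.log(1.0 / eps) / (1.0 - alpha)
    s_max = math.log(x_max) + math.log(1.0 / eps) / alpha
    n = int(math.ceil((s_max - s_min) / h))
    weights, shifts = [], []
    for k in range(n + 1):
        s = s_min + k * h
        weights.append(pref * h * math.exp((1.0 - alpha) * s))
        shifts.append(math.exp(s))  # constant shifts, positive by construction
    return weights, shifts


def evaluate(weights, shifts, x):
    # r(x) = sum_k w_k / (x + b_k); for a matrix argument each term is one shifted solve
    return sum(w / (x + b) for w, b in zip(weights, shifts))
```

Applied to an operator such as M†M, each term w_k (M†M + β_k)^{-1} is a shifted inversion, which is why all terms can be handled together by a multi-shift solver.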
Multi-shift CG for inversion [by B. Jegerlehner, hep-lat/9612014],
refine multi-shift solutions with single-shift CG if necessary
Use QOPQDP library [USQCD consortium]: multi-/single-shift CG
MPI/OpenMP hybrid implementations
Even-odd preconditioning
Optimised communications
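Even-odd preconditioning exploits the fact that the staggered D only connects even and odd sites. A minimal sketch, assuming a hypothetical free 1D toy operator M = m + D with (Dψ)(x) = (ψ(x+1) − ψ(x−1))/2 on a periodic lattice with an even number of sites: eliminating the odd sites leaves the even-site system (m² − D_eo D_oe) x_e = m b_e − D_eo b_o, which is half the size and symmetric positive definite (because D_oe = −D_eoᵀ), so plain CG applies. The odd sites then follow from x_o = (b_o − D_oe x_e)/m.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def d_eo(psi_o):
    # Odd -> even hopping: even site 2i sees odd sites 2i+1 (index i) and 2i-1 (index i-1, periodic)
    return [0.5 * (psi_o[i] - psi_o[i - 1]) for i in range(len(psi_o))]


def d_oe(psi_e):
    # Even -> odd hopping: odd site 2j+1 sees even sites 2j+2 (index j+1) and 2j (index j)
    n = len(psi_e)
    return [0.5 * (psi_e[(j + 1) % n] - psi_e[j]) for j in range(n)]


def apply_full(mass, x):
    # Full operator M = m + D on the interleaved lattice (used only for checking)
    n = len(x)
    return [mass * x[s] + 0.5 * (x[(s + 1) % n] - x[s - 1]) for s in range(n)]


def cg(apply_a, b, tol=1e-12, max_iter=1000):
    """Plain conjugate gradient for a symmetric positive-definite operator."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    stop = tol * tol * dot(b, b)
    for _ in range(max_iter):
        if rr <= stop:
            break
        ap = apply_a(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def solve_even_odd(mass, b):
    """Solve M x = b via the even-site Schur complement (m^2 - D_eo D_oe) x_e = m b_e - D_eo b_o."""
    b_e, b_o = b[0::2], b[1::2]
    rhs = [mass * be - h for be, h in zip(b_e, d_eo(b_o))]

    def schur(v):
        return [mass * mass * vi - w for vi, w in zip(v, d_eo(d_oe(v)))]

    x_e = cg(schur, rhs)
    x_o = [(bo - h) / mass for bo, h in zip(b_o, d_oe(x_e))]
    x = [0.0] * len(b)
    x[0::2], x[1::2] = x_e, x_o
    return x
```

The CG works only on the even half of the lattice, and the odd-site reconstruction costs a single hopping application.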
Initially intended: Disjoint Additive Schwarz Preconditioning (DASP)
⇒ Breaks the structure of constant shifts, not compatible with multi-shift CG
⇒ DASP implementation put into the single-shift CG code (for valence quarks)
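A sketch of the multi-shift CG idea, following the recurrences of the Jegerlehner paper cited above: all shifted systems (A + σ)x_σ = b share one Krylov space, so each iteration needs only one application of A, and the shifted residuals stay collinear with the base residual (r_σ = ζ_σ r). Assumptions: A is a generic real symmetric positive-definite operator (here a hypothetical 1D Laplacian-like toy, `apply_toy`), the base system is the unshifted A, and all σ ≥ 0. Dropping already converged shifts and the single-shift refinement mentioned on the slide are not included.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def multishift_cg(apply_a, b, shifts, tol=1e-10, max_iter=500):
    """Solve (A + sigma) x_sigma = b for every sigma in shifts using one Krylov space."""
    n = len(b)
    xs = [[0.0] * n for _ in shifts]
    ps = [list(b) for _ in shifts]      # shifted search directions
    zeta_old = [1.0] * len(shifts)      # zeta_{k-1}
    zeta = [1.0] * len(shifts)          # zeta_k, with r_sigma = zeta * r
    r, p = list(b), list(b)
    rr = dot(r, r)
    stop = tol * tol * rr
    alpha_old, beta_old = 1.0, 0.0
    for _ in range(max_iter):
        ap = apply_a(p)                 # the only operator application per iteration
        alpha = rr / dot(p, ap)
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        for i, sigma in enumerate(shifts):
            denom = (alpha * beta_old * (zeta_old[i] - zeta[i])
                     + zeta_old[i] * alpha_old * (1.0 + sigma * alpha))
            zeta_new = zeta[i] * zeta_old[i] * alpha_old / denom
            alpha_s = alpha * zeta_new / zeta[i]
            beta_s = beta * (zeta_new / zeta[i]) ** 2
            xs[i] = [xi + alpha_s * pi for xi, pi in zip(xs[i], ps[i])]
            ps[i] = [zeta_new * ri + beta_s * pi for ri, pi in zip(r, ps[i])]
            zeta_old[i], zeta[i] = zeta[i], zeta_new
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        alpha_old, beta_old, rr = alpha, beta, rr_new
        if rr <= stop:  # |zeta| <= 1 for sigma >= 0, so shifted systems are converged too
            break
    return xs


def apply_toy(v):
    # Hypothetical SPD toy operator: 1D Dirichlet Laplacian plus 0.01 on the diagonal
    n = len(v)
    return [2.01 * v[i] - (v[i - 1] if i > 0 else 0.0) - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]
```

Adding a shift only costs vector updates, which is why constant shifts (as in the rational approximation) fit this solver, whereas a preconditioner that breaks the shift structure does not.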
Calculation of the fermion and gauge force
Compute the discretised gauge-field derivative from M(M†M)^{-1}φ
⇒ For HISQ, the different levels of smearing and reunitarisation all contribute (chain rule)
Also use QOPQDP library [USQCD consortium]
MPI/OpenMP hybrid implementation
Probably algorithmically optimised to the utmost, but needs closer look
Extensive use of QOPQDP library: machine-dependent (AVX, AVX2) tuning of the
underlying QLA (QCD Linear Algebra) and QDP (QCD Data Parallel) layers
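For the pseudofermion action S_pf = φ†(M†M)^{-1}φ the derivative with respect to a link parameter u follows from X = (M†M)^{-1}φ and Y = M X = M(M†M)^{-1}φ: dS_pf/du = −X†(dM†/du M + M† dM/du)X, which for real matrices becomes −2 Yᵀ(dM/du)X. The sketch assumes a hypothetical real 1D toy operator with a single real "link" u on the hop between sites 2 and 3 (no smearing, so the HISQ chain rule through smeared links is not modelled) and checks the formula against a central finite difference. For this staggered-like toy M†M = MM†, matching the form of the pseudofermion weight used earlier.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[dot(row, col) for col in bt] for row in a]


def matvec(m, v):
    return [dot(row, v) for row in m]


def solve(m, b):
    """Gaussian elimination with partial pivoting (stands in for the CG inversion)."""
    n = len(b)
    a = [list(row) + [bi] for row, bi in zip(m, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def toy_dirac(n, mass, u):
    # Hypothetical toy M(u) = m + D(u); the hop between sites 2 and 3 carries the real link u
    m = [[0.0] * n for _ in range(n)]
    for x in range(n):
        m[x][x] = mass
        hop = u if x == 2 else 1.0
        m[x][(x + 1) % n] += 0.5 * hop
        m[(x + 1) % n][x] -= 0.5 * hop
    return m


def dirac_derivative(n):
    # dM/du for toy_dirac: only the two entries of the u-dependent hop survive
    d = [[0.0] * n for _ in range(n)]
    d[2][3], d[3][2] = 0.5, -0.5
    return d


def pf_action(mat, phi):
    # S_pf = phi^T (M^T M)^{-1} phi
    return dot(phi, solve(matmul(transpose(mat), mat), phi))


def pf_force(mat, dmat, phi):
    x = solve(matmul(transpose(mat), mat), phi)  # X = (M^T M)^{-1} phi
    y = matvec(mat, x)                            # Y = M (M^T M)^{-1} phi
    return -2.0 * dot(y, matvec(dmat, x))         # dS/du = -2 Y^T dM X
```

Only one inversion (for X) is needed per force evaluation; Y then costs a single multiplication by M.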
Summary
Heatbath step: thermalisation through
stochastic conjugate momenta and pseudofermions
Molecular dynamics evolution: accounts for sea quarks
Rational function approximation for fourth root of quark determinant
Hasenbusch preconditioning
Multi-shift CG
Even-odd preconditioning
Improvement possible via:
MULTI-STEP molecular dynamics evolution
Optimise step sizes for Hasenbusch preconditioning
Machine dependent tuning (AVX,AVX2) of time-critical linear algebra
Global accept/reject step: correct for errors of MD time evolution
Good control of molecular dynamics evolution to keep acceptance high
Beyond the current project: MULTI-STEP may be very valuable
for state-of-the-art simulations with dynamical charm quarks
Charm quarks are heavy (am_c ∼ 1): use an extra mass-dependent correction in the Naik term
Expect benefits by using different MD step size for charm
Expect speedup from Disjoint Additive Schwarz Preconditioning
Machine-dependent tuning affects these simulations as well as ours