LULESH execution with Unums in C/C++

CROSS Research Symposium
Scott Lloyd, Markus Schordan, Dan Quinlan
October 25, 2016
LLNL-PRES-705910
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore
National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
Current Unum Implementation
• Developed at LLNL
  - Open source: http://github.com/LLNL/unum
• Unum precision can be changed at run time
  - Can range in size from a few bits up to thousands
• Convert between unums and primitive C types
  - Unums ↔ integers, floating-point numbers
• Relational operations
  - Standard less than, greater than, equal
  - Others to detect overlapping or disjoint intervals
• Arithmetic operations
  - Add, subtract, multiply, divide, square, square root, negate, absolute value
• C++ wrapper class for the unum library
  - Allows standard mathematical operators to be used with unums
• Instrumentation
  - Collect precision and data movement statistics
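The interval relations mentioned above (less than, overlapping, disjoint) can be sketched on a toy closed interval. This is only an illustration: `Interval`, `strictly_less`, `overlaps`, and `disjoint` are hypothetical names, not the unum library's types or functions, and the open/closed endpoint handling that real ubounds carry is ignored here.

```cpp
#include <cassert>

// Hypothetical closed interval [lo, hi]; not the unum library's ubnd type.
struct Interval {
    double lo;
    double hi;
};

// True when every value in a is less than every value in b.
bool strictly_less(const Interval& a, const Interval& b) {
    assert(a.lo <= a.hi && b.lo <= b.hi);
    return a.hi < b.lo;
}

// True when the two intervals share at least one value.
bool overlaps(const Interval& a, const Interval& b) {
    assert(a.lo <= a.hi && b.lo <= b.hi);
    return a.lo <= b.hi && b.lo <= a.hi;
}

// True when the two intervals have no value in common.
bool disjoint(const Interval& a, const Interval& b) {
    return !overlaps(a, b);
}
```

With intervals, "less than" is no longer the exact complement of "greater than or equal": two overlapping intervals are neither strictly less nor strictly greater, which is why the extra relations are useful.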
Future Work for Unum Library
• Additional arithmetic operations
  - Power
• Transcendental functions
  - Exponential, logarithmic, trigonometric functions
• Fused operations
  - Dot product, multiply-accumulate, summation
• Complex FFT
  - Use fused operations to improve accuracy
• Performance improvement
  - Optimize as needed to meet research goals
• Documentation
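Why a fused dot product can improve accuracy is easiest to see with ordinary floats. The sketch below is not the planned unum implementation; it only approximates the idea of "round once at the end" by accumulating in double and converting to float after the loop. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product that rounds to float after every multiply-add step.
float dot_rounded_each_step(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

// Dot product that keeps a wider accumulator and rounds to float once.
// A double accumulator stands in for the exact accumulator of a true fused operation.
float dot_rounded_once(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double acc = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    }
    return static_cast<float>(acc);
}
```

For the inputs `{1e8f, 1.0f, -1e8f}` and `{1.0f, 1.0f, 1.0f}`, the step-by-step version loses the 1.0 when it is added to 1e8, while the round-once version keeps it.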
Unum Library Module Map, Future Work in Red
[Figure: diagram of the unum library modules and the names shown in each. Contents as labeled on the slide:]
unum.h: unum, ubnd, unum_set(), unum_set_ui(), unum_set_si(), unum_set_d(), unum_set_str(), unum_get_si(), unum_get_d(), unum_get_str(), unum_cmp(), unum_add(), unum_sub(), unum_mul(), unum_div(), unum_pow(), unum_sq(), unum_sqrt(), unum_neg(), unum_abs(), unum_guess(), unum_nbits(), ubnd_*(), Transcendental, Fused, ComplexFFT
ubnd.h: ltuQ(), gtuQ(), nequQ(), nnequQ(), sameuQ(), spanszerouQ(), intersectuQ(), plusu(), minusu(), timesu(), divideu(), powu(), squareu(), sqrtu(), negateu(), absu(), expu(), logu(), cosu(), sinu(), tanu(), cotu()
gbnd.h: ltgQ(), gtgQ(), neqgQ(), nneqgQ(), samegQ(), spanszerogQ(), intersectgQ(), plusg(), minusg(), timesg(), divideg(), powg(), squareg(), sqrtg(), negateg(), absg(), expg(), logg(), cosg(), sing(), tang(), cotg()
ulayer.h: unum_t, ubnd_t, unum_init(), ubnd_init(), unum_clear(), ubnd_clear()
glayer.h: gnum_t, gbnd_t, gnum_init(), gbnd_init(), gnum_clear(), gbnd_clear()
hlayer.h: scan_*(), print_*(), uview_*(), view_uenv()
conv.h and support.h: utag_t, *2g() (si, ui, d, f), g2*(), *2un() (si, ui, d), un2*(), *2ub() (si, ui, d), ub2*(), unum2g(), ubnd2g(), u2g(), g2u(), u2f(), f2u(), utag(), signmask(), bigu(), scale(), ne(), inexQ(), infuQ(), nanuQ(), unify(), smartunify(), guessu(), promotef(), promotee(), promote(), demotef(), demotee()
uenv.h: MAX_*SIZE, variables, init_uenv(), set_uenv()
mpx.h: MPX_VAR, mpx_t, mpx_*()
gmp.h: mpn_*(), mpz_*(), mpf_*()
gmp_aux.h: mpn_*shift(), mpn_*bit(), mp*_import_b(), mp*_export_b()
Other files shown: unumxx.h, unum.c, ubnd.c, gbnd.c, ulayer.c, glayer.c, hlayer.c, conv.c, support.c, uenv.c, gmp.c, gmp_macro.h, gmp_aux.c
Main interfaces to unum functionality
• unumxx.h
  - ‘C++’ wrapper class that allows unums to be used with the standard arithmetic operators
      ubnd_c a, b, c;
      c = a + b;
• unum.h
  - ‘C’ function interface that stores unums in a variable-length byte array
  - char ub[ubnd_sz];
      UBND_VAR(a); UBND_VAR(b); UBND_VAR(c); // macro
      ubnd_add(c, a, b);
• ubnd.h
  - ‘C’ function interface that stores unums in a data structure
  - typedef struct {unsigned char p; unum_s *l; unum_s *r;} ubnd_s;
      UB_VAR(a); UB_VAR(b); UB_VAR(c); // macro
      plusu(c, a, b);
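The relationship between the C function interfaces and the C++ wrapper can be illustrated with a toy stand-in. Everything in this sketch (`toy_bound`, `toy_add`, `ToyBound`) is hypothetical and is not the unum library's code; it only shows the general pattern of a destination-first C function being forwarded through an overloaded operator. Outward rounding of the endpoints is deliberately left out.

```cpp
#include <cassert>

// Hypothetical C-style bound type; a stand-in, not the library's ubnd_s.
struct toy_bound {
    double lo;
    double hi;
};

// C-style interface: destination first, then the two operands.
void toy_add(toy_bound* c, const toy_bound* a, const toy_bound* b) {
    assert(c != nullptr && a != nullptr && b != nullptr);
    c->lo = a->lo + b->lo;
    c->hi = a->hi + b->hi;
}

// C++ wrapper: owns a toy_bound and forwards operator+ to the C-style function.
class ToyBound {
public:
    ToyBound(double lo, double hi) : b_{lo, hi} {}
    double lo() const { return b_.lo; }
    double hi() const { return b_.hi; }

    friend ToyBound operator+(const ToyBound& x, const ToyBound& y) {
        ToyBound r(0.0, 0.0);
        toy_add(&r.b_, &x.b_, &y.b_);
        return r;
    }

private:
    toy_bound b_;
};
```

With a wrapper of this shape, application code can keep expressions such as `c = a + b` unchanged while the number type underneath is swapped.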
LULESH Proxy Application
• Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics
• Describes the motion of materials relative to each other when subject to forces
• Partitions the spatial problem domain into a collection of volumetric elements defined by a mesh
Modifications made to LULESH for unums
Automatic rounding
• Use ubound class to represent reals
  - typedef ubnd_c Real_t; // floating point representation
• Use macro to specify real literals with a string
  - #define RLIT(arg) Real_t(#arg)
  - Avoid converting to double first
• Time step
  - Time values are exact
  - Use guess function on time and delta time values
  - Better to take the minimum of delta time
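The point of the string-literal macro can be shown without the unum library. In the sketch below, `ToyReal`, `TOY_RLIT`, and `via_double` are hypothetical names: `ToyReal` merely records the text it is given, standing in for a number class that parses decimal strings itself, and `via_double` shows what the same literal looks like after it has passed through a double.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a real-number class built from decimal text.
struct ToyReal {
    std::string text;
    explicit ToyReal(const char* s) : text(s ? s : "") {}
};

// Same shape as the RLIT macro on the slide: #arg turns the literal into a string.
#define TOY_RLIT(arg) ToyReal(#arg)

// Prints a double with 20 significant digits, exposing its binary rounding error.
std::string via_double(double v) {
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "%.20g", v);
    assert(n > 0 && n < static_cast<int>(sizeof buf));
    return std::string(buf);
}
```

`TOY_RLIT(0.1).text` is exactly the characters `0.1`, whereas `via_double(0.1)` is a longer digit string because 0.1 has no exact binary representation. A number type with adjustable precision can therefore round the literal itself instead of inheriting the double's rounding.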
Selective rounding
• Divide by zero
  - ∆x = (xt2 - xt1) may result in an interval containing zero
  - Add a small value "ptiny" to make it non-zero
  - Use guess function to collapse the interval
• Negative volume internal self-check
  - Use guess function on volume variables
• Clipping, maximum and minimum
  - clipl(arg, limit) and cliph(arg, limit)
• Comparisons, specify lower or upper endpoint
  - cmpe(arg1, ep1, arg2, ep2) returns -1, 0, +1
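The selective-rounding helpers listed above can be sketched on a toy interval. The slide gives only the names and argument lists, so the semantics below are assumptions: `guess` is taken to collapse an interval to its midpoint, `clipl` and `cliph` to clamp from below and above, and `cmpe` to compare one chosen endpoint of each argument. `Interval`, `Endpoint`, and `collapse_nonzero` are hypothetical names, and none of this is the library's or the modified LULESH's actual code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical closed interval; not the unum library's ubnd type.
struct Interval {
    double lo;
    double hi;
};

enum class Endpoint { Lower, Upper };

// Assumed meaning of "guess": collapse the interval to its midpoint.
double guess(const Interval& x) {
    assert(x.lo <= x.hi);
    return 0.5 * (x.lo + x.hi);
}

bool contains_zero(const Interval& x) {
    return x.lo <= 0.0 && 0.0 <= x.hi;
}

// Collapse a possible divisor and, if the result is exactly zero, add ptiny.
double collapse_nonzero(const Interval& x, double ptiny) {
    assert(ptiny > 0.0);
    double g = guess(x);
    return (g == 0.0) ? g + ptiny : g;
}

// Assumed meaning: raise both endpoints to at least limit.
Interval clipl(const Interval& x, double limit) {
    return {std::max(x.lo, limit), std::max(x.hi, limit)};
}

// Assumed meaning: lower both endpoints to at most limit.
Interval cliph(const Interval& x, double limit) {
    return {std::min(x.lo, limit), std::min(x.hi, limit)};
}

// Compare the selected endpoint of a with the selected endpoint of b: -1, 0, or +1.
int cmpe(const Interval& a, Endpoint ea, const Interval& b, Endpoint eb) {
    double x = (ea == Endpoint::Lower) ? a.lo : a.hi;
    double y = (eb == Endpoint::Lower) ? b.lo : b.hi;
    return (x > y) - (x < y);
}
```

Choosing the endpoint explicitly lets each conditional in the application state whether it should be decided pessimistically or optimistically when its operands are intervals.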
IEEE - Exponent: 11 bits Fraction: 52 bits
Unum - Exponent: 8 bits Fraction: 16 bits
Unum - Exponent: 8 bits Fraction: 8 bits
Round (guess) after each operation
No intervals
LULESH statistics, problem size 5^3

Precision  Bits per Number      Symmetry Test        Operation Result Type (unum)
           Average       Max    Relative Difference  Exact    Inexact  Pair
env 3,2    Time step increment too small. Need more precision. Delta time: 4.96e-05 at time step 11
env 3,3    16.6 (69%)    24     4.88e-02             100.0%   0.0%     0.0%
env 3,4    24.0 (73%)    33     2.47e-03             100.0%   0.0%     0.0%
float      32            32     1.69e-06             100.0%   N/A      N/A
env 3,5    38.6 (77%)    50     1.84e-08             100.0%   0.0%     0.0%
double     64            64     1.72e-14             100.0%   N/A      N/A
env 3,6    65.7 (79%)    83     5.28e-18             100.0%   0.0%     0.0%
long dbl   80            80     3.21e-18             100.0%   N/A      N/A
env 3,7    118.7 (80%)   148    1.74e-37             100.0%   0.0%     0.0%
Example env 3,5: up to 2^3 exponent bits, up to 2^5 fraction bits
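The Max column in the table is consistent with the usual unum layout for an environment {ess, fss}: one sign bit, up to 2^ess exponent bits, up to 2^fss fraction bits, one uncertainty bit, and the two size fields of ess and fss bits. A minimal sketch of that arithmetic follows; `max_unum_bits` is a hypothetical helper name, not a function from the unum library.

```cpp
#include <cassert>

// Largest number of bits a single unum can occupy in environment {ess, fss}:
// sign + exponent + fraction + ubit + exponent-size field + fraction-size field.
unsigned max_unum_bits(unsigned ess, unsigned fss) {
    assert(ess < 16 && fss < 16);  // keep the shifts well inside the range of unsigned
    unsigned max_exponent_bits = 1u << ess;
    unsigned max_fraction_bits = 1u << fss;
    return 1u + max_exponent_bits + max_fraction_bits + 1u + ess + fss;
}
```

For env 3,5 this gives 1 + 8 + 32 + 1 + 3 + 5 = 50 bits, and the same formula reproduces 24, 33, 83, and 148 for env 3,3, 3,4, 3,6, and 3,7.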
[Figure: Round (guess) after each operation, no intervals; annotated "Avg 66"]
Round (guess) on each assignment
All expressions are calculated with intervals
LULESH statistics, problem size 5^3

Precision  Bits per Number      Symmetry Test        Operation Result Type (unum)
           Average       Max    Relative Difference  Exact    Inexact  Pair
env 3,2    Time step increment too small. Need more precision. Delta time: 5.72e-05 at time step 10
env 3,3    18.4 (77%)    24     1.42                 64.8%    23.3%    11.9%
env 3,4    27.6 (84%)    33     1.35e-04             55.2%    28.5%    16.3%
float      32            32     1.69e-06             100%     N/A      N/A
env 3,5    45.2 (90%)    50     6.06e-09             47.6%    33.0%    19.4%
double     64            64     1.72e-14             100%     N/A      N/A
env 3,6    77.2 (93%)    83     9.54e-19             46.2%    34.2%    19.6%
long dbl   80            80     3.21e-18             100%     N/A      N/A
env 3,7    139.3 (94%)   148    3.26e-38             46.2%    34.2%    19.6%
Example env 3,5: up to 2^3 exponent bits, up to 2^5 fraction bits
[Figure: Round (guess) on each assignment, all expressions calculated with intervals; annotated "Avg 77"]
Round (guess) select expressions
Only time and volume expressions are rounded, all others are calculated with intervals
LULESH statistics, problem size 5^3

Precision  Bits per Number      Symmetry Test        Operation Result Type (unum)
           Average       Max    Relative Difference  Exact    Inexact  Pair
env 3,2    Time step increment too small. Need more precision. Delta time: 2.86e-05 at time step 4
env 3,3    Extreme value encountered (6.78e+38,Inf) at time step 22
env 3,4    Extreme value encountered (6.81e+38,Inf) at time step 35
float      32            32     1.69e-06             100%     N/A      N/A
env 3,5    73.4 (147%)   50     2.15e-02             11.5%    7.4%     81.1%
double     64            64     1.72e-14             100%     N/A      N/A
env 3,6    125 (151%)    83     1.62e-14             11.5%    7.4%     81.1%
long dbl   80            80     3.21e-18             100%     N/A      N/A
env 3,7    228 (154%)    148    8.88e-34             11.6%    7.3%     81.1%
Example env 3,5: up to 2^3 exponent bits, up to 2^5 fraction bits
[Figure: Round (guess) select expressions, only time and volume expressions rounded; annotated "Avg 125"]
Summary
• LULESH ran successfully with unums using automatic rounding
  - Little change required to the source code
  - Unum environment can be set from the command line
  - Unums can help in finding the needed precision for an application
  - Variable length encoding can reduce bits moved by 10-20%
  - Better accuracy (based on symmetry) for a given number of bits
• LULESH running with intervals is still a work in progress
  - Results are not definitive but indicate trends
  - Consider division with interval containing zero
  - Effort required to interpret conditionals with respect to intervals
  - Mesh node values (boxes) tend to widen with time
  - Higher precision allows longer run before encountering extreme values
  - Unum pairs (ubounds) needed about 80% of the time
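The widening of interval values over time can be reproduced with a few lines of plain interval arithmetic. The sketch below is a generic illustration, not LULESH or unum code, and `Interval`, `add`, `sub`, and `advance` are hypothetical names. It repeatedly applies x = (x + d) - d, which leaves x unchanged in real arithmetic but widens the interval on every step, because interval subtraction cannot know that both occurrences of d are the same quantity.

```cpp
#include <cassert>

// Hypothetical closed interval; not the unum library's ubnd type.
struct Interval {
    double lo;
    double hi;
};

double width(const Interval& x) {
    assert(x.lo <= x.hi);
    return x.hi - x.lo;
}

Interval add(const Interval& a, const Interval& b) {
    return {a.lo + b.lo, a.hi + b.hi};
}

Interval sub(const Interval& a, const Interval& b) {
    return {a.lo - b.hi, a.hi - b.lo};
}

// Applies x = (x + d) - d for the given number of steps.
Interval advance(Interval x, const Interval& d, int steps) {
    for (int i = 0; i < steps; ++i) {
        x = sub(add(x, d), d);
    }
    return x;
}
```

Starting from the exact value [1, 1] with d = [0, 0.5], each step adds twice the width of d to the width of x, so four steps already give a width of 4. Rounding selected values back to a single number, as the guess function does, is one way to keep such growth from reaching extreme values.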