
A Fast Instruction Set Simulator for RISC-V
Esperanto Technologies, Inc.
5th RISC-V Workshop
November 30, 2016
Esperanto Technologies: A Fast ISA Simulator for RISC-V
1
5th RISC-V Workshop Nov 30, 2016
Background
Esperanto is a stealth mode startup designing chips with RISC-V.
Esperanto wanted a fast RISC-V ISA simulator capable of:
• Running large applications with minimal (<5x) slowdown
• Running a large number of threads with good scalability
• Providing flexibility in testing instruction extensions
• Providing flexibility in gathering performance data
Fast simulation is a key productivity tool
• Gives a fast compile, run and test/debug loop prior to silicon
• Current simulators were judged to be too slow
• Undertook project to modify Eltechs ExaGear to run RISC-V
Motivation
Evaluated two existing options: QEMU and Spike
Would either be sufficient?
We compiled several tests from the Spec2006 benchmark for both x86-64 and RISC-V
with the same compiler version and options: GCC 6.1.0, -O2
Comparison of native, QEMU and Spike runtimes (in seconds)
Benchmark      x86-64 native   QEMU system mode   Spike pk
099.go         8.6             585                2,294
401.bzip2      475             Failed             41,043
410.bwaves     444             Failed             48,859
416.gamess     66*             Failed             16,045
445.gobmk      64*             2,145              10,778
435.gromacs    464             26,040             Failed
* Only one subtask
Result of Spike and QEMU evaluation
• Spike is a very slow simulator: 86x to 267x slower than native
• Spike user mode (pk) requires static binaries, which is not applicable in some cases
• QEMU system mode was not fast enough: 34x to 68x slower than native
• QEMU had no user mode support for RISC-V when we started the evaluation
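The slowdown ranges quoted on this slide follow directly from the runtime comparison. As a minimal sketch of that arithmetic (the dictionary layout and the function name `slowdowns` are illustrative choices, not part of the original work), each simulator's runtime is divided by the native runtime and failed runs are skipped:

```python
# Runtimes in seconds taken from the comparison slide; None marks a failed run.
runtimes = {
    "099.go":      {"native": 8.6, "qemu": 585,   "spike": 2294},
    "401.bzip2":   {"native": 475, "qemu": None,  "spike": 41043},
    "410.bwaves":  {"native": 444, "qemu": None,  "spike": 48859},
    "416.gamess":  {"native": 66,  "qemu": None,  "spike": 16045},
    "445.gobmk":   {"native": 64,  "qemu": 2145,  "spike": 10778},
    "435.gromacs": {"native": 464, "qemu": 26040, "spike": None},
}


def slowdowns(simulator):
    """Return {benchmark: simulator time / native time}, skipping failed runs."""
    return {
        name: times[simulator] / times["native"]
        for name, times in runtimes.items()
        if times[simulator] is not None
    }


for sim in ("qemu", "spike"):
    ratios = slowdowns(sim)
    low, high = min(ratios.values()), max(ratios.values())
    print(f"{sim}: {low:.0f}x to {high:.0f}x slower than native")
```

The extremes come from 445.gobmk and 099.go for QEMU, and from 401.bzip2 and 099.go for Spike.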
Fast Simulator Design
User mode approach
“User mode” emulation means we only need to run
the RISC-V instructions in the application, not the OS.
The OS still runs compiled to the native hardware.
User mode approach advantages:
• Faster emulation (does not need sophisticated software MMU techniques)
• Does not need a stable RISC-V kernel right now
• Kernel code does not blur performance results
User mode approach
[Diagram: the "RISC-V World" (RISC-V Apps on top of RISC-V Libs) runs on the Fast Simulator, side by side with x86 Apps on x86 Libs; both stacks run on x86-64 Linux on x86-64 native hardware.]
User mode key points
• Fast Simulator can only execute Linux RISC-V ELF binaries in user mode
• RISC-V code is translated into corresponding x86-64 instructions
• RISC-V Linux system calls are translated into corresponding x86-64 Linux system calls
• RISC-V applications can only see a RISC-V world (some kind of chroot), but can
communicate with the host via the kernel (for example, using sockets)
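To make the system call translation and the chroot-like view concrete, here is a minimal sketch, not the simulator's actual code. The syscall numbers are the standard Linux ones for RISC-V and x86-64; the names `translate_syscall`, `translate_path` and the directory `GUEST_ROOT` are hypothetical, and a real translator would also have to convert argument structures where the two ABIs differ.

```python
import posixpath

# Linux syscall numbers for a handful of calls on each architecture.
RISCV_SYSCALLS = {56: "openat", 57: "close", 63: "read", 64: "write", 93: "exit"}
X86_64_SYSCALLS = {"read": 0, "write": 1, "close": 3, "exit": 60, "openat": 257}

# Hypothetical host directory that holds the RISC-V world (apps and libs).
GUEST_ROOT = "/opt/riscv-root"


def translate_syscall(riscv_number):
    """Map a RISC-V Linux syscall number to the x86-64 Linux number."""
    name = RISCV_SYSCALLS.get(riscv_number)
    if name is None:
        raise NotImplementedError(f"unsupported RISC-V syscall {riscv_number}")
    return X86_64_SYSCALLS[name]


def translate_path(guest_path):
    """Confine a guest path to GUEST_ROOT, in the spirit of a chroot."""
    # Normalizing as an absolute path drops any ".." that would climb
    # above the guest root.
    normalized = posixpath.normpath("/" + guest_path.lstrip("/"))
    return GUEST_ROOT + normalized


print(translate_syscall(64))             # RISC-V write -> x86-64 write
print(translate_path("/lib/libc.so.6"))  # path inside the RISC-V world
```

Sockets need no path rewriting, which is consistent with the slide's note that guest applications can still reach the host through the kernel.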
Fast Simulator Translation Flow
[Flow diagram: RISC-V APPLICATION -> RISC-V traces -> FAST TRANSLATION ("Was this trace previously translated?") -> RUNNING TRANSLATED TRACE, with a CACHE of translated traces]
1. Check if the trace was previously translated
2. If the trace was not translated, then translate it (perform optimizations here)
3. Run translated trace on x86
4. Save translated trace in cache
5. If trace was translated before, restore it from cache
6. Run restored trace on x86
Fast Simulator Translation Flow Key Points
• Fast Simulator translates traces
• All compiled traces are saved in a Translation Cache and are then reused
• Several optimizations are applied, including efficient register allocation,
peephole optimization, dynamic jump cache, etc.
• Floating point calculations directly use hardware FPU and FPU registers
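The check, translate, save and reuse cycle of the translation flow can be sketched as a small dispatch loop. This is a toy model under stated assumptions: Python closures stand in for translated x86-64 code, the guest program is a made-up countdown loop, and the names `TranslationCache`, `toy_translate` and `run` are hypothetical.

```python
class TranslationCache:
    """Maps a guest trace address to its translated (host) code."""

    def __init__(self, translate):
        self._translate = translate
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, pc):
        code = self._cache.get(pc)       # was this trace translated before?
        if code is None:
            self.misses += 1
            code = self._translate(pc)   # translate (and optimize) the trace
            self._cache[pc] = code       # save the translated trace
        else:
            self.hits += 1               # restore it from the cache
        return code


def toy_translate(pc):
    """Stand-in translator for a two-trace guest program."""
    if pc == 0:
        def trace(state):
            state["counter"] = 3
            state["acc"] = 0
            return 1                     # next guest trace to run
    elif pc == 1:
        def trace(state):
            state["counter"] -= 1
            state["acc"] += 2
            return 1 if state["counter"] > 0 else None
    else:
        raise ValueError(f"no guest code at {pc}")
    return trace


def run(cache, entry_pc):
    state, pc = {}, entry_pc
    while pc is not None:
        pc = cache.lookup(pc)(state)     # run the (new or restored) trace
    return state


cache = TranslationCache(toy_translate)
print(run(cache, 0), cache.hits, cache.misses)
```

The loop body at address 1 is translated once and then served from the cache on every later iteration, which is where a translating simulator recovers its translation cost.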
Fast Simulator Design
[Diagram: frontends and backends connected through a shared IR component]
Frontends: New Arch, x86-32, ARM v7
IR Component
Backends: ARM v7, ARM v8, x86-64
Fast Simulator Design
[Diagram: the same structure with RISC-V 64bit as the new frontend]
Frontends: RISC-V 64bit, x86-32, ARM v7
IR Component
Backends: ARM v7, ARM v8, x86-64
Fast Simulator Design Benefits
• The x86-64 backend was implemented previously and we just reused it
• The Intermediate Representation (IR) was also reused
• All reused components are very reliable due to exhaustive testing in previous products
• For the first reasonable version we only had to implement the RISC-V frontend
and tune the other components
• Many optimizations worked without changes
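The benefit of the frontend/IR/backend split is that a new guest ISA only needs a new frontend. The sketch below illustrates the idea with a hypothetical two-operation IR (`IROp`), a RISC-V frontend that decodes only ADD and ADDI from their standard 32-bit encodings, and a toy backend that prints pseudo-assembly over virtual registers. None of these names or the IR shape come from the original simulator, and a real x86-64 backend would perform register allocation and emit machine code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IROp:
    """Hypothetical architecture-neutral IR operation."""
    op: str
    dst: int
    src1: int
    src2: int = 0
    imm: int = 0


def riscv_frontend(word):
    """Decode one 32-bit RISC-V instruction (ADD or ADDI only) into IR."""
    opcode = word & 0x7F
    rd = (word >> 7) & 0x1F
    funct3 = (word >> 12) & 0x7
    rs1 = (word >> 15) & 0x1F
    rs2 = (word >> 20) & 0x1F
    funct7 = (word >> 25) & 0x7F
    if opcode == 0x13 and funct3 == 0:                  # ADDI
        imm = (word >> 20) & 0xFFF
        if imm & 0x800:                                 # sign-extend 12 bits
            imm -= 0x1000
        return IROp("add_imm", rd, rs1, imm=imm)
    if opcode == 0x33 and funct3 == 0 and funct7 == 0:  # ADD
        return IROp("add", rd, rs1, rs2)
    raise NotImplementedError(hex(word))


def text_backend(ir):
    """Toy backend: render IR as pseudo-assembly over virtual registers."""
    if ir.op == "add_imm":
        return f"v{ir.dst} = v{ir.src1} + {ir.imm}"
    if ir.op == "add":
        return f"v{ir.dst} = v{ir.src1} + v{ir.src2}"
    raise NotImplementedError(ir.op)


def translate(words, frontend, backend):
    """Any frontend can be paired with any backend through the IR."""
    return [backend(frontend(w)) for w in words]


# addi x1, x0, 5 ; addi x2, x0, 7 ; add x3, x1, x2
program = [0x00500093, 0x00700113, 0x002081B3]
for line in translate(program, riscv_frontend, text_backend):
    print(line)
```

Because `translate` only touches the IR, swapping `riscv_frontend` for another decoder or `text_backend` for another emitter leaves the rest of the pipeline unchanged.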
Development time frame
• About two months for a first reliable version able to run the full Spec2006 benchmark
• About one month of performance tuning of the optimizations to get the presented numbers
As a Result
• Fast development
• High reliability
• High performance
Performance results
Performance results
• Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz
• GCC 6.1.0, -O2 optimization level
• RISC-V toolchain as available on 24 Oct 2016
• QEMU 2.7.50 (RISC-V user mode appears!)
• Spec2006 benchmarks
CINT2006
Fast Sim/x86-64 performance comparison
Spec name        Fast Sim (s)   x86-64 (s)   Ratio
400.perlbench    1168           359          3.25
401.bzip2        1188           475          2.50
403.gcc          890            318          2.80
429.mcf          469            376          1.25
445.gobmk        1473           447          3.30
456.hmmer        1363           427          3.19
458.sjeng        1745           494          3.53
462.libquantum   787            550          1.43
464.h264ref      1986           537          3.70
471.omnetpp      719            383          1.88
473.astar        721            401          1.80
483.xalancbmk    828            301          2.75
GeoMean                                      2.47
CFP2006
Fast Sim/x86-64 performance comparison
Spec name        x86-64 (s)   Fast Sim (s)   Ratio
410.bwaves       444          1213           2.73
416.gamess       66           235            3.56
433.milc         441          1134           2.57
434.zeusmp       411          1938           4.72
435.gromacs      464          1360           2.93
436.cactusADM    763          4756           6.23
437.leslie3d     432          1595           3.69
444.namd         363          1263           3.48
447.dealII       310          956            3.08
450.soplex       314          667            2.12
453.povray       169          737            4.36
454.calculix     710          4149           5.84
459.GemsFDTD     516          1126           2.18
465.tonto        656          2126           3.24
470.lbm          490          1244           2.54
481.wrf          589          2280           3.87
482.sphinx3      733          3704           5.05
GeoMean                                      3.48
CINT2006
QEMU user mode/Fast Sim performance comparison
Spec name        Fast Sim (s)   QEMU user mode (s)   Ratio
400.perlbench    1168           3334                 2.85
401.bzip2        1188           1478                 1.24
403.gcc          890            1766                 1.98
429.mcf          469            506                  1.08
445.gobmk        1473           2385                 1.62
456.hmmer        1363           1609                 1.18
458.sjeng        1745           3042                 1.74
462.libquantum   787            921                  1.17
464.h264ref      1986           4492                 2.26
471.omnetpp      719            2050                 2.85
473.astar        721            971                  1.35
483.xalancbmk    828            2060                 2.49
GeoMean                                              1.71
CFP2006
QEMU user mode/Fast Sim performance comparison
Spec name        Fast Sim (s)   QEMU user mode (s)   Ratio
410.bwaves       1213           10127                8.35
416.gamess       235            1362                 5.80
433.milc         1134           9306                 8.21
434.zeusmp       1938           10104                5.21
435.gromacs      1360           16519                12.15
436.cactusADM    4756           24871                5.23
437.leslie3d     1595           9292                 5.83
444.namd         1263           17074                13.52
447.dealII       956            4987                 5.22
450.soplex       667            1739                 2.61
453.povray       737            4325                 5.87
454.calculix     4149           47720                11.50
459.GemsFDTD     1126           12004                10.66
465.tonto        2126           16752                7.88
470.lbm          1244           14201                11.42
481.wrf          2280           17689                7.76
482.sphinx3      3704           24365                6.58
GeoMean                                              7.29
CINT2006
Spike pk/Fast Sim performance comparison
Spec name        Fast Sim (s)   Spike pk (s)   Ratio
400.perlbench    1168           55111          47.18
401.bzip2        1188           41040          34.55
403.gcc          890            failed         -
429.mcf          469            4762           10.15
445.gobmk        1473           71583          48.60
456.hmmer        1363           32425          23.79
458.sjeng        1745           102748         58.88
462.libquantum   787            10245          13.02
464.h264ref      1986           failed         -
471.omnetpp      719            19965          27.77
473.astar        721            11651          16.16
483.xalancbmk    828            failed         -
GeoMean                                        26.56
CFP2006
Spike pk/Fast Sim performance comparison
Spec name        Fast Sim (s)   Spike pk (s)   Ratio
410.bwaves       1213           48859          40.28
416.gamess       235            16045          68.28
433.milc         1134           29221          25.77
434.zeusmp       1938           35506          18.32
435.gromacs      1360           failed         -
436.cactusADM    4756           199003         41.84
437.leslie3d     1595           26769          16.78
444.namd         1263           44916          35.56
447.dealII       956            24788          25.93
450.soplex       667            7861           11.79
453.povray       737            37731          51.20
454.calculix     4149           159790         38.51
459.GemsFDTD     1126           30716          27.28
465.tonto        2126           91514          43.05
470.lbm          1244           39644          31.87
481.wrf          2280           failed         -
482.sphinx3      3704           failed         -
GeoMean                                        30.92
Performance summary
• Fast Simulator is only 2.5 times slower than native on SpecInt2006 and 3.5 times
slower on SpecFPU2006 (3x on average and 6.2x in the worst case)
• Fast Simulator is 1.7 times faster than QEMU user mode on SpecInt2006 and 7.3 times
faster on SpecFPU2006 (4x on average and up to 13.5x on some tests)
• Fast Simulator is ~30 times faster than Spike
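The headline figures are geometric means of per-benchmark runtime ratios. As a worked example, the sketch below recomputes the CINT2006 Fast Sim versus native figure from the reported runtimes; the helper name `geometric_mean` and the data layout are illustrative, and the result should land close to the 2.47 reported for CINT2006.

```python
import math

# (Fast Sim seconds, native x86-64 seconds) per CINT2006 benchmark.
cint2006 = {
    "400.perlbench": (1168, 359),
    "401.bzip2": (1188, 475),
    "403.gcc": (890, 318),
    "429.mcf": (469, 376),
    "445.gobmk": (1473, 447),
    "456.hmmer": (1363, 427),
    "458.sjeng": (1745, 494),
    "462.libquantum": (787, 550),
    "464.h264ref": (1986, 537),
    "471.omnetpp": (719, 383),
    "473.astar": (721, 401),
    "483.xalancbmk": (828, 301),
}


def geometric_mean(values):
    """n-th root of the product, computed via logarithms for stability."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = {name: fast / native for name, (fast, native) in cint2006.items()}
worst = max(ratios, key=ratios.get)
print(f"GeoMean slowdown: {geometric_mean(ratios.values()):.2f}")
print(f"Worst case: {worst} at {ratios[worst]:.2f}x")
```

A geometric mean is the usual choice for averaging ratios because a single long-running benchmark cannot dominate it the way it would dominate an arithmetic mean of runtimes.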
Why is Fast Simulator so fast?
Use of a “performance oriented” architecture:
• Compiler style Intermediate Representation
(allows implementing more optimizations)
• Smart trace collection
• Optimized x86-64 backend
• Many optimization tricks in other components are enabled by default
• Hardware floating point emulation
Good Multithreading Scalability
Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz, 64 cores
GeoBenchmark:
$ mig_benchmark 1001 1001 1
Number of   x86-64              Fast Sim
threads     seconds    Ratio    seconds    Ratio
1           32.262     1        154.477    1
2           16.136     2.00     77.322     2.00
4           8.084      3.99     38.811     3.98
8           4.07       7.93     19.525     7.91
16          2.049      15.75    9.799      15.76
32          1.051      30.70    5.082      30.40
64          0.536      60.19    2.614      59.10
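The Ratio columns are speedups relative to the single-thread run. A short sketch of that calculation, plus the parallel efficiency (speedup divided by thread count), which the slide does not report and is added here only as an illustration; the function names are hypothetical:

```python
threads = [1, 2, 4, 8, 16, 32, 64]
native_s = [32.262, 16.136, 8.084, 4.07, 2.049, 1.051, 0.536]
fast_sim_s = [154.477, 77.322, 38.811, 19.525, 9.799, 5.082, 2.614]


def speedups(times):
    """Speedup of each run relative to the single-thread run."""
    return [times[0] / t for t in times]


def efficiencies(times, thread_counts):
    """Parallel efficiency: speedup divided by the number of threads."""
    return [s / n for s, n in zip(speedups(times), thread_counts)]


for n, nat, sim in zip(threads, speedups(native_s), speedups(fast_sim_s)):
    print(f"{n:>2} threads: native {nat:6.2f}x, Fast Sim {sim:6.2f}x")

print(f"Fast Sim efficiency at 64 threads: "
      f"{efficiencies(fast_sim_s, threads)[-1]:.1%}")
```

The two speedup columns track each other closely, which suggests the simulator adds little scaling overhead of its own on this benchmark.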
Future plans
Still a proprietary simulator under development
Future plans may include:
• Additional optimizations to improve performance
• Improved floating point precision support
• Support for additional extensions
• Support for additional metrics
• Improved debugging options
• System mode?
Summary
• Internal working name is ExaGear-RVx
• ExaGear-RVx is an extremely fast simulator
• Only 2.5x (int) to 3.5x (float) slower than native execution
• Up to 10+ times faster (4x on average) than QEMU user mode
• Good multithreading scalability
• Industrial level reliability
• Reasonable design flexibility: allows quick support for hardware ISA changes