A Fast Instruction Set Simulator for RISC-V

Esperanto Technologies, Inc.
5th RISC-V Workshop
November 30, 2016

Background

Esperanto is a stealth mode startup designing chips with RISC-V.

Esperanto wanted a fast RISC-V ISA simulator capable of:
• Running large applications with minimal (<5x) slowdown
• Running a large number of threads with good scalability
• Providing flexibility in testing instruction extensions
• Providing flexibility in gathering performance data

Fast simulation is a key productivity tool:
• Gives a fast compile, run and test/debug loop prior to silicon
• Current simulators were judged to be too slow
• Undertook project to modify Eltechs ExaGear to run RISC-V

Motivation

Evaluated two existing options: QEMU and Spike. Would either be sufficient?
We compiled several tests from the Spec2006 benchmark suite for both x86-64 and RISC-V with the same compiler version and options: GCC 6.1.0 -O2

Comparison of native, QEMU and Spike runtimes (times in seconds)

Benchmark      x86-64 native   QEMU system mode   Spike pk
099.go         8.6             585                2,294
401.bzip2      475             Failed             41,043
410.bwaves     444             Failed             48,859
416.gamess     66*             Failed             16,045
445.gobmk      64*             2,145              10,778
435.gromacs    464             26,040             Failed

* Only one subtask

Result of Spike and QEMU evaluation

• Spike is a very slow simulator: 86x to 267x slower than native
• Spike user mode (pk) requires static binaries, so it is not applicable in some cases
• QEMU system mode was not fast enough:
  • 34x to 68x slower than native
• QEMU had no user mode support for RISC-V when we started the evaluation

Fast Simulator Design

User mode approach

“User mode” emulation means we only need to run the RISC-V instructions in the application, not the OS. The OS still runs natively, compiled for the host hardware.

User mode approach advantages:
• Faster emulation (does not need sophisticated software MMU techniques)
• Does not need a stable RISC-V kernel right now
• Kernel code does not blur performance results

User mode approach

[Diagram: software stack. Labels: RISC-V World (RISC-V Apps, RISC-V Libs, Fast Simulator); x86 Apps, x86 Libs; x86-64 Linux; x86-64 Native Hardware]

User mode key points

• Fast Simulator can only execute Linux RISC-V ELF binaries in user mode
• RISC-V code is translated into corresponding x86-64 instructions
• RISC-V Linux system calls are translated into corresponding x86-64 Linux system calls
• RISC-V applications can only see a RISC-V world (some kind of chroot), but can communicate with the host via the kernel (for example, using sockets)

Fast Simulator Translation Flow

[Diagram labels: RISC-V APPLICATION, RISC-V traces, "Was this trace previously translated?", FAST TRANSLATION (perform optimizations here), CACHE, RUNNING TRANSLATED TRACE]

1. Check if the trace was previously translated
2. If the trace was not translated, then translate it
3. Run translated trace on x86
4. Save translated trace in cache
5. If the trace was translated before, restore it from cache
6. Run restored trace on x86

Fast Simulator Translation Flow Key Points

• Fast Simulator translates traces
• All compiled traces are saved in a Translation Cache and are then reused
• Several optimizations are applied, including efficient register allocation, peephole optimization, dynamic jump cache, etc.
• Floating point calculations directly use hardware FPU and FPU registers

Fast Simulator Design

[Diagram: Frontends (New Arch, x86-32, ARM v7) -> IR Component -> Backends (ARM v7, ARM v8, x86-64)]

Fast Simulator Design

[Diagram: Frontends (RISC-V 64bit, x86-32, ARM v7) -> IR Component -> Backends (ARM v7, ARM v8, x86-64)]

Fast Simulator Design Benefits

• The x86-64 backend was implemented previously and we just reused it
• The Intermediate Representation (IR) was also reused
• All reused components are very reliable due to exhaustive testing in previous products
• For the first reasonable version we had to implement only the RISC-V frontend and tune other components
• Many optimizations worked without changes

Development time frame

• About 2 months for a first reliable version that is able to run the full Spec2006 benchmark
• About one month of performance tuning the optimizations to get the presented numbers

As a Result

• Fast development
• High reliability
• High performance

Performance results

• Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz
• GCC 6.1.0, -O2 optimization level
• RISC-V toolchain available on 24 OCT 2016
• QEMU 2.7.50 (RISC-V user mode appears!)
• Spec2006 benchmarks

CINT2006 Fast Sim/x86-64 performance comparison (times in seconds)

Spec name        Fast Sim   x86-64   Ratio
400.perlbench    1168       359      3.25
401.bzip2        1188       475      2.50
403.gcc          890        318      2.80
429.mcf          469        376      1.25
445.gobmk        1473       447      3.30
456.hmmer        1363       427      3.19
458.sjeng        1745       494      3.53
462.libquantum   787        550      1.43
464.h264ref      1986       537      3.70
471.omnetpp      719        383      1.88
473.astar        721        401      1.80
483.xalancbmk    828        301      2.75
GeoMean                              2.47

CFP2006 Fast Sim/x86-64 performance comparison (times in seconds)

Spec name        x86-64   Fast Sim   Ratio
410.bwaves       444      1213       2.73
416.gamess       66       235        3.56
433.milc         441      1134       2.57
434.zeusmp       411      1938       4.72
435.gromacs      464      1360       2.93
436.cactusADM    763      4756       6.23
437.leslie3d     432      1595       3.69
444.namd         363      1263       3.48
447.dealII       310      956        3.08
450.soplex       314      667        2.12
453.povray       169      737        4.36
454.calculix     710      4149       5.84
459.GemsFDTD     516      1126       2.18
465.tonto        656      2126       3.24
470.lbm          490      1244       2.54
481.wrf          589      2280       3.87
482.sphinx3      733      3704       5.05
GeoMean                              3.48

CINT2006 QEMU user mode/Fast Sim performance comparison (times in seconds)

Spec name        Fast Sim   QEMU user mode   Ratio
400.perlbench    1168       3334             2.85
401.bzip2        1188       1478             1.24
403.gcc          890        1766             1.98
429.mcf          469        506              1.08
445.gobmk        1473       2385             1.62
456.hmmer        1363       1609             1.18
458.sjeng        1745       3042             1.74
462.libquantum   787        921              1.17
464.h264ref      1986       4492             2.26
471.omnetpp      719        2050             2.85
473.astar        721        971              1.35
483.xalancbmk    828        2060             2.49
GeoMean                                      1.71

CFP2006 QEMU user mode/Fast Sim performance comparison (times in seconds)

Spec name        Fast Sim   QEMU user mode   Ratio
410.bwaves       1213       10127            8.35
416.gamess       235        1362             5.80
433.milc         1134       9306             8.21
434.zeusmp       1938       10104            5.21
435.gromacs      1360       16519            12.15
436.cactusADM    4756       24871            5.23
437.leslie3d     1595       9292             5.83
444.namd         1263       17074            13.52
447.dealII       956        4987             5.22
450.soplex       667        1739             2.61
453.povray       737        4325             5.87
454.calculix     4149       47720            11.50
459.GemsFDTD     1126       12004            10.66
465.tonto        2126       16752            7.88
470.lbm          1244       14201            11.42
481.wrf          2280       17689            7.76
482.sphinx3      3704       24365            6.58
GeoMean                                      7.29

CINT2006 Spike pk/Fast Sim performance comparison (times in seconds)

Spec name        Fast Sim   Spike pk   Ratio
400.perlbench    1168       55111      47.18
401.bzip2        1188       41040      34.55
403.gcc          890        failed
429.mcf          469        4762       10.15
445.gobmk        1473       71583      48.60
456.hmmer        1363       32425      23.79
458.sjeng        1745       102748     58.88
462.libquantum   787        10245      13.02
464.h264ref      1986       failed
471.omnetpp      719        19965      27.77
473.astar        721        11651      16.16
483.xalancbmk    828        failed
GeoMean                                26.56

CFP2006 Spike pk/Fast Sim performance comparison (times in seconds)

Spec name        Fast Sim   Spike pk   Ratio
410.bwaves       1213       48859      40.28
416.gamess       235        16045      68.28
433.milc         1134       29221      25.77
434.zeusmp       1938       35506      18.32
435.gromacs      1360       failed
436.cactusADM    4756       199003     41.84
437.leslie3d     1595       26769      16.78
444.namd         1263       44916      35.56
447.dealII       956        24788      25.93
450.soplex       667        7861       11.79
453.povray       737        37731      51.20
454.calculix     4149       159790     38.51
459.GemsFDTD     1126       30716      27.28
465.tonto        2126       91514      43.05
470.lbm          1244       39644      31.87
481.wrf          2280       failed
482.sphinx3      3704       failed
GeoMean                                30.92

Performance summary

• Fast Simulator is only 2.5 times slower than native on SpecInt2006 and 3.5 times slower on SpecFPU2006 (3x on average and 6.2x in the worst case)
• Fast Simulator is 1.7 times faster than QEMU user mode on SpecInt2006 and 7.3 times faster on SpecFPU2006 (4x on average and up to 13.5x on some tests)
• Fast Simulator is ~30 times faster than Spike

Why is Fast Simulator so fast?

Use of a “performance oriented” architecture:
• Compiler style Intermediate Representation (allows implementing more optimizations)
• Smart trace collection
• Optimized x86-64 backend
• Many optimization tricks in other components are enabled by default
• Hardware floating point emulation

Good Multithreading Scalability

Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz, 64 cores
GeoBenchmark: $ mig_benchmark 1001 1001 1

Number of threads   x86-64 seconds   Ratio   Fast Sim seconds   Ratio
1                   32.262           1       154.477            1
2                   16.136           2.00    77.322             2.00
4                   8.084            3.99    38.811             3.98
8                   4.07             7.93    19.525             7.91
16                  2.049            15.75   9.799              15.76
32                  1.051            30.70   5.082              30.40
64                  0.536            60.19   2.614              59.10

Future plans

Still a proprietary simulator under development

Future plans may include:
• Additional optimizations to improve performance
• Improved floating point precision support
• Support for additional extensions
• Support for additional metrics
• Improved debugging options
• System mode?

Summary

• Internal working name is ExaGear-RVx
• ExaGear-RVx is an extremely fast simulator
• Only 2.5x (int) to 3.5x (float) slower than native execution
• Up to 10+ times faster (4x on average) than QEMU user mode
• Good multithreading scalability
• Industrial level reliability
• Reasonable design flexibility: allows quick support for hardware ISA changes
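
Appendix: translation flow sketch

The six-step translation flow described earlier (check the cache, translate on a miss, save, run, and restore on later visits) can be illustrated with a small Python sketch. This is not ExaGear-RVx code. The names `translate_trace`, `TranslationCache` and `simulate`, and the toy opcodes (`addi`, `dec_bnez`, `jump`, `halt`), are hypothetical and invented for illustration; the "translation" here only builds a Python closure, whereas the real simulator emits optimized x86-64 code.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy guest code: each trace is a list of (opcode, operand) pairs.
# This is NOT real RISC-V encoding; it only stands in for a guest trace.
Instr = Tuple[str, int]
Regs = Dict[str, int]
# A translated trace mutates the registers and returns the next guest
# address to run, or None when the program is finished.
Translated = Callable[[Regs], Optional[int]]


def translate_trace(instrs: List[Instr]) -> Translated:
    """Stand-in for step 2: turn a guest trace into a host-callable."""
    def execute(regs: Regs) -> Optional[int]:
        for op, arg in instrs:
            if op == "addi":        # add an immediate to a0
                regs["a0"] += arg
            elif op == "dec_bnez":  # decrement t0, branch to arg if non-zero
                regs["t0"] -= 1
                if regs["t0"] != 0:
                    return arg
            elif op == "jump":      # unconditional jump to another trace
                return arg
            elif op == "halt":
                return None
        return None
    return execute


class TranslationCache:
    """Maps a trace's guest start address to its translated form."""

    def __init__(self, program: Dict[int, List[Instr]]) -> None:
        self.program = program
        self.cache: Dict[int, Translated] = {}
        self.translations = 0  # cache misses that required translation
        self.hits = 0          # traces restored from the cache

    def lookup(self, pc: int) -> Translated:
        trace = self.cache.get(pc)                      # step 1: check cache
        if trace is None:
            trace = translate_trace(self.program[pc])   # step 2: translate
            self.cache[pc] = trace                      # step 4: save
            self.translations += 1
        else:
            self.hits += 1                              # step 5: restore
        return trace


def simulate(program: Dict[int, List[Instr]], entry: int, regs: Regs) -> TranslationCache:
    tc = TranslationCache(program)
    pc: Optional[int] = entry
    while pc is not None:
        pc = tc.lookup(pc)(regs)                        # steps 3 and 6: run
    return tc


# A loop trace at address 0 that runs three times, then a final trace at 8.
program = {
    0: [("addi", 5), ("dec_bnez", 0), ("jump", 8)],
    8: [("addi", 100), ("halt", 0)],
}
regs = {"a0": 0, "t0": 3}
tc = simulate(program, 0, regs)
print(regs["a0"], tc.translations, tc.hits)
```

In this toy run the loop trace is translated once and then restored from the cache on the next two iterations, which is the reuse that the Translation Cache key point describes. Optimizations such as register allocation or the dynamic jump cache are deliberately left out of the sketch.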