Design-Space Exploration ASCI Spring School 2017 Lab Joost Hoozemans Jeroen van Straten Challenge the future 1 VLIW Processor Challenge the future 2 Design-Space Exploration • Find the optimal configuration given a set of benchmarks • “Optimal” : design choices • Metrics: Performance, Energy, Area Challenge the future 3 Design-Space Exploration Challenge the future 4 Goals • Generate & evaluate design points (configurations) • Using the tools and some scripting • Prune design-space using common sense design choices • Identify most promising candidates • Pareto points • Perform multi-objective optimization (Trade-off) • Highest performance • Smallest area • Discuss your findings in a report Challenge the future 5 Design-Space Exploration • Find the optimal configuration given a set of benchmarks • “Optimal” : design choices • Metrics: Performance, Energy, Area Challenge the future 6 Design-Space Exploration • Find the optimal configuration given a set of benchmarks • “Optimal” : design choices • Metrics: Performance, Energy, Area • Platform: VEX simulator Challenge the future 7 VEX : VLIW Example • Family of VLIW processors • Parameters • Issue width (max nr. of operations/cycle) • Functional units (ALU, Multipliers) • Clustering • Number of registers • Cache sizes • Compiled simulator • Generates a simulation of your core running a program Challenge the future 8 Issue width MUL MEM ALU Challenge the future 9 Challenge the future 10 Challenge the future 11 Programs: Powerstone • Simple benchmark programs • Different application domains (embedded, image processing, general-purpose) Challenge the future 12 Processor core description: Machine configuration file • Example in workspace • REG: Resources (functional units) • DEL: Delays (pipeline) • REG: Number of Registers Challenge the future 13 Lab materials •Lab manual • And this presentation – more slides with hints •The HP VEX toolchain •Script to facilitate: • Generation of machine configurations • Area and energy estimation • Build & run benchmarks •Download link: on ASCI webpage Challenge the future 14 Lab guides Jeroen van Straten [email protected] Joost Hoozemans [email protected] Challenge the future 15 http://rvex.ewi.tudelft.nl Challenge the future 16 Data Cache Pipeline Instruction Cache Register File Challenge the future 17 Data Cache F D E1 E2 WB Instruction Cache Register File Challenge the future 18 Data Cache F D ALU WB Instruction Cache Register File Challenge the future 19 Data Cache F D MEM WB Instruction Cache Register File Challenge the future 20 Data Cache F D MUL WB Instruction Cache Register File Challenge the future 21 Data Cache ALU F D MEM WB MUL Instruction Cache Register File Challenge the future 22 Data Cache add r1 = r2, r3 ;; add r4 = r5, r6 ;; add r7 = r8, r9 ;; add r10 = r 11, r12 ;; mul r13 = r14, r15 ;; Ld r20 = 0x1234 ;; ALU F D MEM WB MUL Register File Challenge the future 23 Data Cache Instruction Cache F D MEM WB F D MUL WB F D WB ALU Register File Challenge the future 24 Data Cache add r1 = r2, r3 Ld r20 = 0x1234 mul r13 = r14, r15 ;; add r4 = r5, r6 ;; add r7 = r8, r9 ;; add r10 = r 11, r12 ;; F D MEM WB F D MUL WB F D WB ALU Register File Challenge the future 25 Data Cache MEM F D ALU WB MUL MEM Instruction Cache F D ALU WB MUL MEM F D ALU WB MUL Register File Challenge the future 26 Data Cache add r1 = r2, r3 Ld r20 = 0x1234 mul r13 = r14, r15 ;; add r4 = r5, r6 add r7 = r8, r9 add r10 = r 11, r12 ;; MEM F D ALU WB MUL MEM F D ALU WB MUL MEM F D ALU WB MUL Register File Challenge the future 27 Data Cache MEM F D ALU WB MUL Instruction Cache F D ALU WB Branch F D ALU WB Register File Challenge the future 28 Data Cache add r1 = r2, r3 Ld r20 = 0x1234 mul r13 = r14, r15 ;; add r4 = r5, r6 add r7 = r8, r9 add r10 = r 11, r12 ;; MEM F D ALU WB MUL F D ALU WB Branch F D ALU WB Register File Challenge the future 29 Intermezzo: VEX clusters ● A cluster is a section of a VLIW processor that has its own register file and resources ● Only cluster 0 can execute control operations (branch, goto, call, etc.) ● multicluster != multicore 30 Intermezzo: VEX clusters ● 4-issue without clusters: Instruction bundle ALU, BR ALU ALU ALU 64 registers 8 read, 4 write per cycle 31 Intermezzo: VEX clusters ● 4-issue divided into 2 clusters Smaller/faster register file Instruction bundle ALU, BR ALU, C 32 registers 4R, 2W per cycle ALU, C ALU 32 registers 4R, 2W per cycle 32 Intermezzo: VEX ● ci at the start of the line denotes cluster ● So does the first number in the register names ;; c0 sub c0=c1 mov $r0.8 $r0.3 = $r0.13, $r0.15 = $r1.5 c0 c0 c1 $r0.15 = $r0.15, 32 $r0.14 = $r0.14, $r0.8 $r1.2 = $r1.6, ~0x3f ;; add shl add ;; 33
© Copyright 2026 Paperzz