Kickoff slides

Design-Space Exploration
ASCI Spring School 2017 Lab
Joost Hoozemans
Jeroen van Straten
Challenge the future
1
VLIW Processor
Challenge the future
2
Design-Space Exploration
• Find the optimal configuration given a set of benchmarks
• “Optimal” : design choices
• Metrics: Performance, Energy, Area
Challenge the future
3
Design-Space Exploration
Challenge the future
4
Goals
• Generate & evaluate design points (configurations)
• Using the tools and some scripting
• Prune design-space using common sense design choices
• Identify most promising candidates
• Pareto points
• Perform multi-objective optimization (Trade-off)
• Highest performance
• Smallest area
• Discuss your findings in a report
Challenge the future
5
Design-Space Exploration
• Find the optimal configuration given a set of benchmarks
• “Optimal” : design choices
• Metrics: Performance, Energy, Area
Challenge the future
6
Design-Space Exploration
• Find the optimal configuration given a set of benchmarks
• “Optimal” : design choices
• Metrics: Performance, Energy, Area
• Platform: VEX simulator
Challenge the future
7
VEX : VLIW Example
• Family of VLIW processors
• Parameters
• Issue width (max nr. of operations/cycle)
• Functional units (ALU, Multipliers)
• Clustering
• Number of registers
• Cache sizes
• Compiled simulator
• Generates a simulation of your core running a program
Challenge the future
8
Issue width
MUL
MEM
ALU
Challenge the future
9
Challenge the future
10
Challenge the future
11
Programs: Powerstone
• Simple benchmark programs
• Different application domains (embedded, image processing,
general-purpose)
Challenge the future
12
Processor core description:
Machine configuration file
• Example in workspace
• REG: Resources (functional units)
• DEL: Delays (pipeline)
• REG: Number of Registers
Challenge the future
13
Lab materials
•Lab manual
• And this presentation – more slides with hints
•The HP VEX toolchain
•Script to facilitate:
• Generation of machine configurations
• Area and energy estimation
• Build & run benchmarks
•Download link: on ASCI webpage
Challenge the future
14
Lab guides
Jeroen van Straten
[email protected]
Joost Hoozemans
[email protected]
Challenge the future
15
http://rvex.ewi.tudelft.nl
Challenge the future
16
Data
Cache
Pipeline
Instruction
Cache
Register
File
Challenge the future
17
Data
Cache
F
D
E1
E2
WB
Instruction
Cache
Register
File
Challenge the future
18
Data
Cache
F
D
ALU
WB
Instruction
Cache
Register
File
Challenge the future
19
Data
Cache
F
D
MEM
WB
Instruction
Cache
Register
File
Challenge the future
20
Data
Cache
F
D
MUL
WB
Instruction
Cache
Register
File
Challenge the future
21
Data
Cache
ALU
F
D
MEM
WB
MUL
Instruction
Cache
Register
File
Challenge the future
22
Data
Cache
add r1 = r2, r3
;;
add r4 = r5, r6
;;
add r7 = r8, r9
;;
add r10 = r 11, r12
;;
mul r13 = r14, r15
;;
Ld r20 = 0x1234
;;
ALU
F
D
MEM
WB
MUL
Register
File
Challenge the future
23
Data
Cache
Instruction
Cache
F
D
MEM
WB
F
D
MUL
WB
F
D
WB
ALU
Register
File
Challenge the future
24
Data
Cache
add r1 = r2, r3
Ld r20 = 0x1234
mul r13 = r14, r15
;;
add r4 = r5, r6
;;
add r7 = r8, r9
;;
add r10 = r 11, r12
;;
F
D
MEM
WB
F
D
MUL
WB
F
D
WB
ALU
Register
File
Challenge the future
25
Data
Cache
MEM
F
D
ALU
WB
MUL
MEM
Instruction
Cache
F
D
ALU
WB
MUL
MEM
F
D
ALU
WB
MUL
Register
File
Challenge the future
26
Data
Cache
add r1 = r2, r3
Ld r20 = 0x1234
mul r13 = r14, r15
;;
add r4 = r5, r6
add r7 = r8, r9
add r10 = r 11, r12
;;
MEM
F
D
ALU
WB
MUL
MEM
F
D
ALU
WB
MUL
MEM
F
D
ALU
WB
MUL
Register
File
Challenge the future
27
Data
Cache
MEM
F
D
ALU
WB
MUL
Instruction
Cache
F
D
ALU
WB
Branch
F
D
ALU
WB
Register
File
Challenge the future
28
Data
Cache
add r1 = r2, r3
Ld r20 = 0x1234
mul r13 = r14, r15
;;
add r4 = r5, r6
add r7 = r8, r9
add r10 = r 11, r12
;;
MEM
F
D
ALU
WB
MUL
F
D
ALU
WB
Branch
F
D
ALU
WB
Register
File
Challenge the future
29
Intermezzo: VEX clusters
● A cluster is a section of a VLIW
processor that has its own register file
and resources
● Only cluster 0 can execute control
operations (branch, goto, call, etc.)
● multicluster != multicore
30
Intermezzo: VEX clusters
● 4-issue without clusters:
Instruction bundle
ALU, BR
ALU
ALU
ALU
64 registers
8 read, 4 write per cycle
31
Intermezzo: VEX clusters
● 4-issue divided into 2 clusters
 Smaller/faster register file
Instruction bundle
ALU, BR
ALU, C
32 registers
4R, 2W per
cycle
ALU, C
ALU
32 registers
4R, 2W per
cycle
32
Intermezzo: VEX
● ci at the start of the line denotes cluster
● So does the first number in the register
names
;;
c0
sub
c0=c1 mov
$r0.8
$r0.3
= $r0.13, $r0.15
= $r1.5
c0
c0
c1
$r0.15 = $r0.15, 32
$r0.14 = $r0.14, $r0.8
$r1.2 = $r1.6, ~0x3f
;;
add
shl
add
;;
33