91.5x122 cm Poster Template

Control-Flow Decoupling
Rami Sheikh, James Tuck, Eric Rotenberg
North Carolina State University
Motivation
 Single-thread performance is important for single- and multithreaded applications.
 Per-core energy consumption is at a premium.
 Better branch handling is a BIG win: improves performance,
reduces energy and enables memory latency tolerance.
baseline
baseline + perfect prediction
2.5
2
slice
Energy Reduction
65%
128
Haswell
96
Sandy
Bridge
Nehalem
0
384
………...
BQ miss
slice
EX
IF
branch
IF
………….…
IF
EX
Speculate
or Stall
Other interesting aspects of CFD:
 Supports partially separable branches
 Supports nested branches through multi-level decoupling
 Overheads can be significantly reduced through value
communication (called CFD+ in the paper)
Future Generations
168 192 256
Window Size
IF
branch
1
0.5
Uncommon Case
BQ hit
68%
67%
63%
65%
67%
69%
1.5
Execution
Scenarios
Common Case
Conroe
Instructions per Cycle (IPC)
3
512
CFD Compiler Implementation in GCC
Interesting Observation
A third of mispredictions come from separable
branches:
 The branch has a large CD region
(if-conversion not profitable).
 The branch does not depend on its own CD
instructions via a loop-carried data
dependence.
branch-slice
branch
controldependent
region
Results
Applying CFD manually:
1.34
1.43
1.18
1.17
1.02
1.13
1.14
1.07
1.06
1.02 1.01
Normalized Energy
Speedup
Control-Flow Decoupling (CFD)
Key idea: separate the loop into two loops:
 The first contains only the branch’s predicate computation.
 The second contains the branch and its control-dependent
instructions.
1.6
1.4
1.2
1.0
0.8
0.6
0.4
0.2
0.0
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
0.97
0.85
0.91 1.00 0.92
0.96
0.79
0.81
0.63 0.59 0.61
CFD Loops
branch-slice
Push_BQ
Applying CFD automatically (compiler):
1.6
branch-slice
1.2
Speedup
branch
controlcontroldependent
dependent
region
Automated
0.9
1.4
BQ
Branch_on_BQ
Manual
1.0
BQ
drives
fetch
1.0
0.8
0.6
0.4
controldependent
region
Manual
Automated
0.8
Normalized Energy
Original Loop
0.7
0.6
0.5
0.4
0.3
0.2
0.2
0.1
0.0
0.0
Conclusion
slice
IF
branch
Original
Problem #1
…..…..…. EX
…..…..….
IF
No fetch
separation:
need branch
prediction
EX
CFD
slice
slice
slice
IF
…..…
…….….
independent
work in between
slice
branch
TEMPLATE DESIGN © 2008
www.PosterPresentations.com
IF
…..…..….
IF
…..…..….
IF
CFD provides:
• Fetch separation
• Mechanism to comm.
predicates to Fetch Unit
EX
EX
…..…..….
EX
BQ
Problem #2
EX
No mechanism to comm.
predicates to Fetch Unit
IF
…..…
branch
branch
EX
branch
IF
…..…..….
IF
EX
…..…..….
IF
EX
…..…..….
EX
 A third of mispredictions come from
16.3%
separable branches.
 CFD is a software/hardware collabor18.7%
ation for exploiting separability with
low complexity and high efficacy.
27.2%
 CFD is comparable to if-conversion in
terms of number of static branches
and MPKI contribution.
Separable
Hammock
Inseparable
37.8%
Not Analyzed