Control-Flow Decoupling Rami Sheikh, James Tuck, Eric Rotenberg North Carolina State University Motivation Single-thread performance is important for single- and multithreaded applications. Per-core energy consumption is at a premium. Better branch handling is a BIG win: improves performance, reduces energy and enables memory latency tolerance. baseline baseline + perfect prediction 2.5 2 slice Energy Reduction 65% 128 Haswell 96 Sandy Bridge Nehalem 0 384 ………... BQ miss slice EX IF branch IF ………….… IF EX Speculate or Stall Other interesting aspects of CFD: Supports partially separable branches Supports nested branches through multi-level decoupling Overheads can be significantly reduced through value communication (called CFD+ in the paper) Future Generations 168 192 256 Window Size IF branch 1 0.5 Uncommon Case BQ hit 68% 67% 63% 65% 67% 69% 1.5 Execution Scenarios Common Case Conroe Instructions per Cycle (IPC) 3 512 CFD Compiler Implementation in GCC Interesting Observation A third of mispredictions come from separable branches: The branch has a large CD region (if-conversion not profitable). The branch does not depend on its own CD instructions via a loop-carried data dependence. branch-slice branch controldependent region Results Applying CFD manually: 1.34 1.43 1.18 1.17 1.02 1.13 1.14 1.07 1.06 1.02 1.01 Normalized Energy Speedup Control-Flow Decoupling (CFD) Key idea: separate the loop into two loops: The first contains only the branch’s predicate computation. The second contains the branch and its control-dependent instructions. 1.6 1.4 1.2 1.0 0.8 0.6 0.4 0.2 0.0 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 0.97 0.85 0.91 1.00 0.92 0.96 0.79 0.81 0.63 0.59 0.61 CFD Loops branch-slice Push_BQ Applying CFD automatically (compiler): 1.6 branch-slice 1.2 Speedup branch controlcontroldependent dependent region Automated 0.9 1.4 BQ Branch_on_BQ Manual 1.0 BQ drives fetch 1.0 0.8 0.6 0.4 controldependent region Manual Automated 0.8 Normalized Energy Original Loop 0.7 0.6 0.5 0.4 0.3 0.2 0.2 0.1 0.0 0.0 Conclusion slice IF branch Original Problem #1 …..…..…. EX …..…..…. IF No fetch separation: need branch prediction EX CFD slice slice slice IF …..… …….…. independent work in between slice branch TEMPLATE DESIGN © 2008 www.PosterPresentations.com IF …..…..…. IF …..…..…. IF CFD provides: • Fetch separation • Mechanism to comm. predicates to Fetch Unit EX EX …..…..…. EX BQ Problem #2 EX No mechanism to comm. predicates to Fetch Unit IF …..… branch branch EX branch IF …..…..…. IF EX …..…..…. IF EX …..…..…. EX A third of mispredictions come from 16.3% separable branches. CFD is a software/hardware collabor18.7% ation for exploiting separability with low complexity and high efficacy. 27.2% CFD is comparable to if-conversion in terms of number of static branches and MPKI contribution. Separable Hammock Inseparable 37.8% Not Analyzed
© Copyright 2026 Paperzz