Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Rakesh Kumar, Keith Farkas, Norman P. Jouppi, Partha Ranganathan, Dean M. Tullsen
University of California, San Diego
MICRO 2003
Speaker: Chun-Chung Chen
Outline
• Introduction
• Motivation
• Architecture
• Experiment Results
• Conclusion
Motivation
• Processors are projected to consume 300 W by 2015
• Existing CMP designs use only homogeneous cores
• Applications with high ILP benefit from wider cores,
while applications with low ILP use less power on narrower
cores with little loss in performance
• No need to design cores from scratch, because existing
Alpha cores implement practically the same ISA
General Idea
• Single-ISA heterogeneous multi-core architecture
– A mechanism to reduce power dissipation
• System software dynamically chooses the most power-efficient
core that satisfies a given performance constraint
→ Power efficiency
Modeling of CPU Cores
• EV4: Alpha 21064
• EV5: Alpha 21164
• EV6: Alpha 21264
• EV8-: single-threaded version of Alpha 21464
• Assumptions
– Only one application runs at a time, on only one core
– Unused cores are completely powered down (therefore no leakage)
Cores, cont.
• All cores are assumed to be implemented in 0.10 micron
technology
• The four cores are assumed to have private L1 data and
instruction caches and to share a common L2 cache, phase-locked loop circuitry, and pins.
• All cores run at 2.1 GHz.
• ISA differences are handled in one of two ways: either programs are compiled
to the least common denominator (the EV4), or software traps
are used on the older cores.
Modeling of Power
Core Switching
• Switching done at the operating system level
• A switch involves a cache flush and saving and
restoring the user state on the cores
• A core is estimated to power up in ~1000
cycles at 2.1 GHz
• Switching overhead turns out to be negligible (~1%)
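The arithmetic behind the power-up estimate can be sketched as follows. The clock rate and cycle count come from the slides; the interval length in the usage lines is a made-up value chosen only to show how an overhead percentage would be computed, and the function names are illustrative. Power-up is only one part of a switch, so this does not reproduce the ~1% figure.

```python
# Figures taken from the slides.
CLOCK_HZ = 2.1e9         # all cores run at 2.1 GHz
POWER_UP_CYCLES = 1000   # estimated cycles to power up a core


def power_up_seconds(cycles=POWER_UP_CYCLES, clock_hz=CLOCK_HZ):
    """Wall-clock time needed to power up a core."""
    return cycles / clock_hz


def overhead_fraction(switch_cycles, interval_cycles):
    """Share of an interval spent switching rather than doing useful work."""
    return switch_cycles / interval_cycles


if __name__ == "__main__":
    print(f"power-up time: {power_up_seconds() * 1e9:.0f} ns")
    # Hypothetical interval length (not from the slides).
    print(f"overhead: {overhead_fraction(POWER_UP_CYCLES, 100_000):.1%}")
```

At 2.1 GHz, 1000 cycles is a little under half a microsecond.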
Variation in Power & Performance
• Benchmark: applu
• Relative performance of the cores varies between
phases.
Switching Algorithms:
Oracle-based dynamic switching using energy heuristic
• With oracle knowledge of power requirements
and performance potential, choose the core that
would have the lowest energy consumption, as
long as it performs within 10% of EV8-
[Figure: applu]
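A minimal sketch of the energy heuristic, under stated assumptions: "within 10% of EV8-" is read as achieving at least 90% of the EV8- instruction rate in the interval, and energy is compared per instruction (watts divided by instructions per second). The `CoreSample` type, the function names, and the numbers in the usage lines are illustrative and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CoreSample:
    name: str
    ips: float    # instructions per second in this interval
    watts: float  # average power in this interval


def energy_per_instruction(sample):
    """Joules per instruction: power divided by instruction rate."""
    return sample.watts / sample.ips


def pick_lowest_energy(samples, reference="EV8-", max_slowdown=0.10):
    """Lowest-energy core among those within max_slowdown of the reference."""
    ref = next(s for s in samples if s.name == reference)
    floor = ref.ips * (1.0 - max_slowdown)
    eligible = [s for s in samples if s.ips >= floor]
    return min(eligible, key=energy_per_instruction)


# Made-up numbers for one interval, for illustration only.
interval = [
    CoreSample("EV4", 1.0e9, 5.0),
    CoreSample("EV5", 1.5e9, 10.0),
    CoreSample("EV6", 1.9e9, 20.0),
    CoreSample("EV8-", 2.0e9, 60.0),
]
choice = pick_lowest_energy(interval)
```

With these invented values only EV6 and EV8- meet the performance floor, and EV6 uses less energy per instruction, so it is selected. The reference core is always eligible, so the selection never comes up empty.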
Switching Algorithms:
Oracle-based dynamic switching using energy-delay heuristic
• Oracle chooses the core that has the lowest
energy-delay product.
• Choose the core that would maximize
IPS²/Watt, as long as it performs within 50%
of EV8-
[Figure: applu]
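The two bullets describe the same criterion. For a fixed number of instructions N, delay is N/IPS and energy is Watt × N/IPS, so energy × delay = N² × Watt/IPS², and maximizing IPS²/Watt minimizes the energy-delay product. A minimal sketch follows, assuming "within 50% of EV8-" means at least half of the EV8- instruction rate. The function names, the dictionary layout, and the numbers are illustrative only.

```python
def ips2_per_watt(ips, watts):
    """Figure of merit: higher means a lower energy-delay product."""
    return ips * ips / watts


def pick_best_energy_delay(cores, reference="EV8-", max_slowdown=0.50):
    """cores maps a core name to (ips, watts) for the current interval."""
    ref_ips = cores[reference][0]
    floor = ref_ips * (1.0 - max_slowdown)
    eligible = {name: v for name, v in cores.items() if v[0] >= floor}
    return max(eligible, key=lambda name: ips2_per_watt(*eligible[name]))


# Made-up numbers for one interval, for illustration only.
interval = {
    "EV4": (0.9e9, 2.0),
    "EV5": (1.5e9, 10.0),
    "EV6": (1.9e9, 20.0),
    "EV8-": (2.0e9, 60.0),
}
best = pick_best_energy_delay(interval)
```

In this invented interval EV4 has the best IPS²/Watt but falls below the performance floor, so the constraint is what steers the choice to EV5.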
Switching Algorithms:
Realistic Dynamic Switching
• Every 100 time intervals, one or more cores are sampled for five
intervals each.
• Neighbor
– One of the neighboring cores is chosen at random to be sampled
• Neighbor-global
– Similar to neighbor, except that selection is based on the accumulated energy-delay product.
• Random
– A core is chosen at random to be sampled
• All
– All cores are sampled
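The sampling step of these heuristics can be sketched as below. Several points are assumptions rather than details from the slides: cores are ordered EV4, EV5, EV6, EV8-, and "neighboring" means adjacent in that order; the currently running core is not re-sampled; and neighbor-global is treated like neighbor because the sketch covers only which cores get sampled, not how the winner is picked. All names are illustrative.

```python
import random

CORES = ["EV4", "EV5", "EV6", "EV8-"]  # assumed ordering, smallest to largest
SAMPLE_PERIOD = 100   # sampling is triggered every 100 intervals (from the slides)
SAMPLE_LENGTH = 5     # each sampled core runs for five intervals (from the slides)


def is_sampling_point(interval_index, period=SAMPLE_PERIOD):
    """True when a new round of sampling should start."""
    return interval_index % period == 0


def cores_to_sample(policy, current, rng=random):
    """Return the cores to try next, given the policy and the running core."""
    i = CORES.index(current)
    others = [c for c in CORES if c != current]
    if policy in ("neighbor", "neighbor-global"):
        neighbors = [CORES[j] for j in (i - 1, i + 1) if 0 <= j < len(CORES)]
        return [rng.choice(neighbors)]
    if policy == "random":
        return [rng.choice(others)]
    if policy == "all":
        return others
    raise ValueError(f"unknown policy: {policy}")
```

For example, `cores_to_sample("neighbor", "EV4")` can only return EV5, since EV4 has a single neighbor under the assumed ordering.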
Realistic Dynamic Switching Results
• Results shown normalized
to EV8- performance
• Performance degradation
of realistic schemes is less
than in oracle-based
schemes
• Realistic schemes resulted
in more core switching
Conclusion
• Realistic dynamic switching algorithms show a
decrease in energy and energy-delay with only a
small decrease in performance.
• Single-ISA heterogeneous multi-core processors
built from existing core designs may be a way to curb
power usage.