
LEMap: Controlling Leakage in Large Chip-multiprocessor Caches via Profile-guided Virtual Address Translation
Jugash Chandarlapati
Mainak Chaudhuri
Indian Institute of Technology, Kanpur
Motivation
37% L2 cache energy
10% dead time per page
Low Energy Map
Motivation



Past work has exploited this dead time at cache block grain
For large last-level caches, the bookkeeping overhead becomes enormous
Good news: large potential for dead time exploitation at page grain
– By-product: smart involvement of the OS

Design a smart VA-to-PA mapping that clusters virtual pages accessed together in time, so that the average size of the idle region increases
Highlights

Three major contributions
– First proposal to exploit smart virtual address translation schemes for region-based leakage control in large multi-banked shared CMP NUCAs
– A new application-directed page placement system call to realize the leakage-aware translation
– 7% total system energy saving, 50% L2 cache energy saving, and 52% L2 cache power saving for an 8-core CMP with a 16 MB shared L2 cache at 65 nm on selected SPLASH-2, SPEC OMP, and DIS applications
LEMap: Basic idea
[Figure: Baseline, showing one time window. Cores C0 to C7 connect through a crossbar to L2 banks B0 to B15; labels mark the subbanks and the L2 bank control. Idle subbank: drowsy.]
LEMap: Basic idea
[Figure: LEMap, showing one time window. Cores C0 to C7 connect through a crossbar to L2 banks B0 to B15; labels mark the subbanks and the L2 bank control. Idle subbank: drowsy.]
LEMap: Basic idea

Map a cluster of virtual pages that are accessed together onto a few subbanks
– Improves the effectiveness of the low-power drowsy mode due to a larger number of idle subbanks
– Can power down a subbank after the last access to the cluster of virtual pages mapped onto it
– Take care of proximity by choosing a subbank for a cluster of virtual pages such that average access latency is minimized (important for NUCAs)
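
The proximity bullet above can be illustrated with a minimal sketch. It assumes a per-core, per-subbank latency table and per-core access counts for the cluster are available from the profile; the function name `pick_subbank`, the table layout, and the numbers are hypothetical and not taken from the talk.

```python
def pick_subbank(access_counts, latency, free_subbanks):
    """Return the free subbank with the lowest access-weighted average latency.

    access_counts: {core_id: L2 accesses the cluster receives from that core}
    latency:       latency[core_id][subbank_id], in cycles (hypothetical table)
    free_subbanks: iterable of candidate subbank ids
    """
    total = sum(access_counts.values())

    def avg_latency(subbank):
        # Weight each core's latency to this subbank by how often it accesses the cluster.
        return sum(n * latency[core][subbank]
                   for core, n in access_counts.items()) / total

    return min(free_subbanks, key=avg_latency)


# Toy example: 2 cores, 3 subbanks; core 0 is close to subbank 0, core 1 to subbank 2.
latency = {0: [10, 20, 30], 1: [30, 20, 10]}
counts = {0: 90, 1: 10}          # the cluster is accessed mostly by core 0
best = pick_subbank(counts, latency, [0, 1, 2])
```

In this toy case the cluster lands on the subbank nearest to the core that accesses it most; a real placement would also have to respect subbank capacity.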
Implementing LEMap


Collect a (core id, virtual page id, timestamp) tuple for each L2 cache access via a profile run
Cluster the virtual pages based on timestamp using a hierarchical agglomerative clustering algorithm
– Grow the birth and death times of a cluster gradually until the cluster size exceeds subbank size

Map each cluster on a physical subbank via an application-directed page placement system call that takes a vector of VPNs
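
The profiling and clustering steps above can be sketched as follows. This is an illustrative approximation, not the authors' algorithm: it assumes a page's birth and death are its first and last profiled access, merges only clusters that are adjacent in birth order, picks the merge with the shortest combined lifetime, and stops before a cluster would exceed the subbank capacity. The names `page_lifetimes` and `cluster_pages` are hypothetical. The page placement system call is not sketched because its interface is not given here.

```python
def page_lifetimes(trace):
    """trace: iterable of (core_id, vpn, timestamp). Returns {vpn: (birth, death)}."""
    life = {}
    for _core, vpn, ts in trace:
        birth, death = life.get(vpn, (ts, ts))
        life[vpn] = (min(birth, ts), max(death, ts))
    return life


def cluster_pages(lifetimes, pages_per_subbank):
    """Greedy agglomerative clustering of pages by lifetime.

    Each cluster is [birth, death, [vpns]]. Assumption: only neighbours in
    birth order are merged, and a merge is allowed only if the result still
    fits in one subbank.
    """
    clusters = sorted(([b, d, [vpn]] for vpn, (b, d) in lifetimes.items()),
                      key=lambda c: (c[0], c[1]))
    while True:
        best = None  # (merged lifetime span, index of left cluster)
        for i in range(len(clusters) - 1):
            left, right = clusters[i], clusters[i + 1]
            if len(left[2]) + len(right[2]) > pages_per_subbank:
                continue
            span = max(left[1], right[1]) - min(left[0], right[0])
            if best is None or span < best[0]:
                best = (span, i)
        if best is None:
            return clusters
        i = best[1]
        left, right = clusters[i], clusters.pop(i + 1)
        clusters[i] = [min(left[0], right[0]), max(left[1], right[1]),
                       left[2] + right[2]]


# Toy profile: pages 1 and 2 are live early, pages 7 and 8 are live late.
trace = [(0, 1, 0), (0, 2, 1), (1, 1, 5), (0, 2, 6),
         (2, 7, 100), (2, 8, 101), (2, 7, 110)]
clusters = cluster_pages(page_lifetimes(trace), pages_per_subbank=2)
```

Each resulting cluster carries the list of VPNs that would be handed to the placement call, together with the time window during which its subbank must stay awake.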
Simulation results




Done on an 8-core CMP with a 16 MB shared L2 cache with leakage controlled at 128 KB subbank grain (16 banks)
Models dynamic and leakage (gate and subthreshold) power of all on-chip components at 65 nm, including the memory controller (leakage model extracted from HSpice simulations)
Models DRAM dynamic power following Micron technical notes
Executes eight explicitly parallel shared memory applications drawn from the SPLASH-2, SPEC OMP, and DIS suites
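
As a quick check of what the cache configuration above implies, the following arithmetic derives the number of leakage-control regions. The 4 KB page size is an assumption for illustration only; the talk does not state a page size.

```python
KB = 1024
MB = 1024 * KB

l2_size = 16 * MB        # shared L2 capacity (from the slide)
subbank_size = 128 * KB  # leakage-control grain (from the slide)
num_banks = 16           # from the slide
page_size = 4 * KB       # ASSUMPTION: not given in the talk

num_subbanks = l2_size // subbank_size         # independently controlled regions
subbanks_per_bank = num_subbanks // num_banks  # regions inside each bank
pages_per_subbank = subbank_size // page_size  # capacity of one region in pages

print(num_subbanks, subbanks_per_bank, pages_per_subbank)  # prints: 128 8 32
```

Under these numbers the hardware tracks idleness for 128 regions rather than for every cache block, which is where the reduction in bookkeeping comes from.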
Simulation results: L2 cache power
52% L2 cache power saving
Simulation results: L2 cache energy
50% L2 cache energy saving
Simulation results: Total energy
7% total energy saving
Simulation results: Execution time
3% execution time loss on average
Summary




A novel virtual-to-physical address translation mechanism to control leakage in large shared caches in CMPs
Uses profile information to optimize virtual page to physical subbank placement in parallel programs
Controls leakage at subbank grain to reduce the timekeeping overhead of drowsy mode
Achieves 50% L2 cache energy saving and 7% total energy saving on eight benchmark programs compared to the drowsy baseline
Acknowledgment


Intel: graduate fellowship
IBM: faculty award
THANK YOU!