LEMap: Controlling Leakage in Large Chip-multiprocessor Caches via Profile-guided Virtual Address Translation

Jugash Chandarlapati
Mainak Chaudhuri
Indian Institute of Technology, Kanpur

Motivation
[Figure annotations: 37% L2 cache energy; 10% dead time per page]

Low Energy Map

Motivation
Past work has exploited this dead time at cache block grain
For large last-level caches the bookkeeping overhead becomes enormous
Good news: large potential of dead time exploitation at page grain
  – By-product: smart involvement of OS
Design smart VA to PA mapping to cluster virtual pages accessed together in time so that average size of idle region increases

Highlights
Three major contributions
  – First proposal to exploit smart virtual address translation schemes for region-based leakage control in large multi-banked shared CMP NUCAs
  – A new application-directed page placement system call to realize the leakage-aware translation
  – 7% total system energy saving, 50% L2 cache energy saving, 52% L2 cache power saving for an 8-core CMP with a 16 MB shared L2 cache at 65 nm on selected SPLASH-2, SPEC OMP, and DIS applications

LEMap: Basic idea
[Figure: Baseline. Cores C0 to C7, crossbar, L2 banks B0 to B15 with L2 bank control and subbanks. Idle subbank: drowsy. Showing one time window.]

LEMap: Basic idea
[Figure: LEMap. Cores C0 to C7, crossbar, L2 banks B0 to B15 with L2 bank control and subbanks. Idle subbank: drowsy. Showing one time window.]

LEMap: Basic idea
Map a cluster of virtual pages that are accessed together onto a few subbanks
  – Improves effectiveness of low power drowsy mode due to larger number of idle subbanks
  – Can power down a subbank after the last access to the cluster of virtual pages mapped onto it
  – Take care of proximity by choosing a subbank for a cluster of virtual pages such that average access latency is minimized (important for NUCAs)

Implementing LEMap
Collect (core id, virtual page id, timestamp) tuple for each L2 cache access via a profile run
Cluster the virtual pages based on timestamp using a hierarchical agglomerative clustering algorithm
  – Grow the birth and death times of a cluster gradually until the cluster size exceeds subbank size
Map each cluster on a physical subbank via application-directed page placement system call that takes a vector of VPNs

Simulation results
Done on an 8-core CMP with a 16 MB shared L2 cache with leakage controlled at 128 KB subbank grain (16 banks)
Models dynamic and leakage (gate and subthreshold) power of all on-chip components at 65 nm including memory controller (leakage model extracted from HSpice simulations)
Models DRAM dynamic power following Micron technical notes
Executes eight explicitly parallel shared memory applications drawn from SPLASH-2, SPEC OMP, and DIS suites

Simulation results: L2 cache power
52% L2 cache power saving

Simulation results: L2 cache energy
50% L2 cache energy saving

Simulation results: Total energy
7% total energy saving

Simulation results: Execution time
3% loss on average

Summary
A novel virtual to physical address translation mechanism to control leakage in large shared caches in CMP
Uses profile information to optimize virtual page to physical subbank placement in parallel programs
Controls leakage at subbank grain to reduce timekeeping overhead of drowsy
Achieves 50% L2 cache energy saving and 7% total energy saving on eight benchmark programs compared to drowsy

Acknowledgment
Intel: graduate fellowship
IBM: faculty award

THANK YOU!
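
The profile-and-cluster steps listed under "Implementing LEMap" can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Access`, `page_lifetimes`, `cluster_pages`, and `capacity_pages` are hypothetical, and the merge rule (repeatedly join the two clusters whose combined birth-to-death span is smallest) is an assumed, simplified stand-in for the hierarchical agglomerative clustering named in the slides. Unlike the slides, which grow a cluster until its size exceeds the subbank size, this sketch stops merging just before the limit would be exceeded. Proximity-aware subbank selection is not modeled.

```python
from collections import namedtuple

# One profiled L2 access: (core id, virtual page number, timestamp).
Access = namedtuple("Access", ["core_id", "vpn", "timestamp"])


def page_lifetimes(trace):
    """Return {vpn: (birth, death)}: first and last access time of each page."""
    lifetimes = {}
    for a in trace:
        birth, death = lifetimes.get(a.vpn, (a.timestamp, a.timestamp))
        lifetimes[a.vpn] = (min(birth, a.timestamp), max(death, a.timestamp))
    return lifetimes


def cluster_pages(lifetimes, capacity_pages):
    """Greedy agglomerative clustering of pages by lifetime (assumed rule).

    Each cluster is a tuple (birth, death, [vpns]). At every step the pair
    whose merged lifetime span is smallest is joined, as long as the merged
    cluster holds at most capacity_pages pages.
    """
    clusters = [(b, d, [vpn]) for vpn, (b, d) in sorted(lifetimes.items())]
    while True:
        best = None  # (merged span, index i, index j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                bi, di, pi = clusters[i]
                bj, dj, pj = clusters[j]
                if len(pi) + len(pj) > capacity_pages:
                    continue  # merged cluster would not fit in one subbank
                span = max(di, dj) - min(bi, bj)
                if best is None or span < best[0]:
                    best = (span, i, j)
        if best is None:
            return clusters  # no feasible merge is left
        _, i, j = best
        bi, di, pi = clusters[i]
        bj, dj, pj = clusters[j]
        merged = (min(bi, bj), max(di, dj), pi + pj)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)


if __name__ == "__main__":
    # Tiny synthetic trace: pages 1 and 2 are live early, pages 3 and 4 late.
    trace = [
        Access(0, 1, 0), Access(1, 2, 2), Access(0, 1, 10), Access(1, 2, 9),
        Access(2, 3, 100), Access(2, 4, 101), Access(3, 3, 110), Access(3, 4, 108),
    ]
    for birth, death, vpns in cluster_pages(page_lifetimes(trace), capacity_pages=2):
        print(f"VPN vector {vpns}: live from {birth} to {death}")
```

Each resulting VPN list corresponds to the vector that the slides say is passed to the page placement system call. With 4 KB pages (an assumption, the slides do not state the page size), a 128 KB subbank would give `capacity_pages = 32`.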