Adaptive Subset Based Replacement Policy for High Performance Caching

Liqiang He, Yan Sun, Chaozhong Zhang
College of Computer Science, Inner Mongolia University
Hohhot, Inner Mongolia, P. R. China
2010-06-20
JWAC-1: Cache Replacement Championship, ISCA-2010

Background
• The cache replacement policy plays an important role in cache design.
• The LRU policy is widely used in today's microprocessors.
• The LLC has poor locality because the L1 already filters out temporal locality.
• LRU causes thrashing when the working set is larger than the cache.

Possible solution
• If the working set is larger than the cache, retain part of the working set [Qureshi, et al, ISCA'07].
• Record part of a longer cache access history.
• How do we do it? Group the blocks of a cache set and keep part of the access history in each group.
• Inspired by the thread migration paper of Pierre at HPCA'04.
[Figure: L2 blocks for C0, C1, …, Cn and groups g0, g1, …, gn.]

Overview
• Proposal: Subset Based Replacement Policy (SRP).
• SRP reduces misses by retaining part of a longer access history in the groups.
• But static SRP is not suitable for all programs.
• To adapt to the diversity of programs and to behavior changes inside a program, we propose the Adaptive SRP policy (ASRP).
• ASRP obtains a 4.5% geometric average miss reduction over LRU.
Outline
• Introduction
• Static Subset Based Replacement Policy
• Adaptive Subset Based Replacement Policy
• Summary

Static Subset Based Replacement Policy
• A cache set is divided into subsets.
• Active subset: accepts insertions. The other subsets are non-active.
• Each subset keeps a local LRU stack.

Insertion scheme in SRP
• Blocks in the active subset, MRU to LRU: a b c d
• After a reference to 'i': a b c i
• Insertion only occurs in the active subset.
• Choose the victim at the LRU position. Do NOT promote the new block to MRU.

Operation on cache hit in SRP
• A hit can occur in any subset (active or non-active).
• Blocks in the subset, MRU to LRU: a b c d
• After a reference to 'c': c a b d
• Move the hit block to the local MRU position.

Changing of active subset
• When the number of misses in a set exceeds a threshold X, change the active subset.
• Thus:
  A. X consecutive misses are forced to replace only blocks in the active subset.
  B. With N subsets, a subset can become active again ONLY after (N-1)*X misses.
  C. The greater the value of X, the longer the blocks in non-active subsets can stay in a set.

Thrashing access pattern in SRP
• Assume the working set is 24 blocks and the LLC is 16-way, with 4 subsets and 4 blocks per subset.
• Access sequence: b1 b2 b3 … b24, with X = 6.
• Blocks in a set with SRP: {b2 b3 b4 b6} {b8 b9 b10 b12} {b14 b15 b16 b18} {b20 b21 b22 b24}
• Blocks in a set with LRU: b9 … b24
• When b2, b3, b4, b6, and b8 are accessed again, SRP hits but LRU misses.
[Figure: local LRU stacks of Subset 0 and Subset 1, MRU to LRU. The LRU position of Subset 0 holds b1, then b5, then b6; the LRU position of Subset 1 holds b7, then b11, then b12.]

Case study of thrashing workload
• Different static thresholds have different abilities to reduce misses.
[Figure: misses per 1K instructions (4 to 7.5) versus threshold (1 to 2k) for SRP and LRU.]

Hardware implementation
[Figure: hardware implementation, MRU to LRU.]

Results
[Figure: improvement of misses over LRU (%), range 0.6 to 1.8, for thresholds 2, 4, and 8. Benchmarks: astar, bwaves, bzip2, cactusADM, calculix, dealII, gamess, gcc, GemsFDTD, gobmk, gromacs, h264ref, hmmer, lbm, leslie3d, libquantum, mcf, milc, namd, omnetpp, perlbench, povray, sjeng, soplex, sphinx3, tonto, xalancbmk, zeusmp, and the average.]
• SRP reduces misses for thrashing workloads but increases them for LRU-friendly ones.
• No single threshold is suitable for all benchmarks.

Outline
• Introduction
• Static Subset Based Replacement Policy
• Adaptive Subset Based Replacement Policy
• Summary

Adaptive SRP policy
• Different programs prefer different thresholds.
• In the ASRP policy, victim selection and the insertion policy are the same as in SRP.
• The ONLY difference: the threshold is selected dynamically from a pool of values, according to which one causes the fewest misses.
• The maximum threshold is 128.
• Pick eight values: 2^0, 2^1, …, 2^7.
• Apply the best threshold value to the cache.

ASRP policy via "Set Dueling"
• Divide the cache into two types of sets: sampling sets (eight thresholds * 4 sets per threshold) and follower sets.
• Eight counters: a miss in one of threshold X's sampling sets increments counter_x.
• The counters decide the threshold for the follower sets: the threshold whose counter has the smallest value.
[Figure: Thres-2^0-sets through Thres-2^7-sets add their misses to Cntr_0 through Cntr_7; the counters select the threshold used by the follower sets.]

Resetting mechanism
• Purpose: to avoid the accumulated effect of a large value in a specific Cntr_x.
• Record the number of times the same threshold is selected by the follower sets.
• When that number exceeds a threshold, reset all the Cntr_x.
[Figure: the selected threshold is compared with last_follow; if equal (Y), global_follow is incremented, otherwise (N) it is decremented. When global_follow exceeds a threshold, Cntr_0 through Cntr_7 are reset.]

Budget
• 45K bits in total: only 70% of the budget used by the LRU policy, and 35% of the total budget provided by this championship.

Results
[Figure: improvement of misses over LRU (%), range 0.8 to 1.6, for DIP and ASRP across the same benchmarks, with the average.]
• For a 1MB 16-way LLC.
• ASRP gets a geometric average speedup of 4.5% over LRU.

Analysis
[Figure: misses per 1K instructions versus threshold (1 to 2k) for SRP, LRU, and ASRP. Panels: xalancbmk (4 to 7.5) and GemsFDTD (7 to 7.8).]
• The sampling mechanism does help ASRP find the best thresholds for different programs.

Conclusion
• Keeping part of the working set in the cache helps reduce misses when the cache suffers from thrashing.
• Keeping part of a longer access history helps SRP capture the frequently used blocks more accurately.
• Different programs, and different phases of a program, prefer different thresholds to maximize cache hits.
• "Set Dueling" helps ASRP dynamically select a suitable threshold.
• The experimental results show the effectiveness of the ASRP policy.

Thank you! Any questions?
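The threshold selection and resetting mechanism from the ASRP slides can be illustrated with a small Python sketch. This is a reading of the slides, not the authors' hardware: the class name `ThresholdSelector`, the `reset_limit` value, the floor of zero on `global_follow`, and the point at which the reset check runs are all assumptions made for illustration. Only the eight thresholds (2^0 to 2^7), the per-threshold miss counters, and the `last_follow` / `global_follow` names come from the deck.

```python
# Hypothetical sketch of ASRP threshold selection with the resetting mechanism.
THRESHOLDS = [2 ** i for i in range(8)]  # 1, 2, 4, ..., 128 (from the slides)


class ThresholdSelector:
    def __init__(self, reset_limit=64):
        # One miss counter per candidate threshold (Cntr_0 .. Cntr_7).
        self.miss_counters = [0] * len(THRESHOLDS)
        self.last_follow = None         # index chosen by the followers last time
        self.global_follow = 0          # how persistently the same index wins
        self.reset_limit = reset_limit  # assumed value; not given in the slides

    def record_sample_miss(self, index):
        """A miss occurred in a sampling set that uses THRESHOLDS[index]."""
        self.miss_counters[index] += 1

    def follower_threshold(self):
        """Return the threshold for follower sets: the one with fewest misses."""
        best = min(range(len(THRESHOLDS)), key=self.miss_counters.__getitem__)
        if best == self.last_follow:
            self.global_follow += 1     # "Y": same threshold selected again
        else:
            # "N": a different threshold; floor at zero is an assumption.
            self.global_follow = max(0, self.global_follow - 1)
        self.last_follow = best
        if self.global_follow > self.reset_limit:
            # Clear all counters so an old large value stops dominating.
            self.miss_counters = [0] * len(THRESHOLDS)
            self.global_follow = 0
        return THRESHOLDS[best]


selector = ThresholdSelector(reset_limit=2)
for index in range(len(THRESHOLDS)):
    if index != 3:                      # every threshold except 8 sees misses
        selector.record_sample_miss(index)
chosen = [selector.follower_threshold() for _ in range(4)]
print(chosen, selector.miss_counters)
```

In this toy run the threshold 8 is chosen each time because its sampling sets recorded no misses, and the fourth consecutive selection pushes `global_follow` past the assumed limit, which clears the counters.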
Result on multi-core processor
[Figure: improvement of misses over LRU (%) for DIP and ASRP, two panels, on multi-program mixes drawn from astar, bwaves, bzip2, GemsFDTD, gobmk, hmmer, omnetpp, sphinx3, and xalancbmk.]

Case study of LRU-friendly workload
[Figure: misses per 1K instructions (7 to 7.8) versus threshold (1 to 2k) for SRP and LRU.]

Explanation of active subset changing

A simple example of SRP policy
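The static SRP rules (insertion at the LRU position of the active subset, promotion to the local MRU position on a hit, and rotation of the active subset after X misses) can be sketched for a single cache set as follows. This is a minimal model under stated assumptions, not the authors' implementation: the class names `SRPSet` and `LRUSet` are hypothetical, empty ways are assumed to fill toward the MRU end, and the active subset is assumed to rotate round-robin once X misses have been counted. The parameters reproduce the thrashing example from the slides (16 ways, 4 subsets, X = 6, blocks b1 to b24).

```python
# Hypothetical single-set model of static SRP, compared with plain LRU.
class SRPSet:
    def __init__(self, ways=16, subsets=4, threshold=6):
        self.per_subset = ways // subsets
        # Each subset is a local LRU stack: index 0 is MRU, last is LRU.
        self.stacks = [[] for _ in range(subsets)]
        self.threshold = threshold      # X in the slides
        self.active = 0                 # index of the active subset
        self.misses = 0                 # misses since the last switch

    def access(self, block):
        """Return True on a hit, False on a miss."""
        for stack in self.stacks:       # a hit may occur in any subset
            if block in stack:
                stack.remove(block)
                stack.insert(0, block)  # move to the local MRU position
                return True
        stack = self.stacks[self.active]
        if len(stack) < self.per_subset:
            stack.insert(0, block)      # assumption: empty ways fill toward MRU
        else:
            stack[-1] = block           # replace the LRU block, do not promote
        self.misses += 1
        if self.misses >= self.threshold:
            self.misses = 0             # assumption: round-robin rotation
            self.active = (self.active + 1) % len(self.stacks)
        return False


class LRUSet:
    def __init__(self, ways=16):
        self.ways = ways
        self.stack = []                 # index 0 is MRU

    def access(self, block):
        hit = block in self.stack
        if hit:
            self.stack.remove(block)
        elif len(self.stack) == self.ways:
            self.stack.pop()            # evict the LRU block
        self.stack.insert(0, block)
        return hit


srp, lru = SRPSet(), LRUSet()
for b in range(1, 25):                  # first pass over b1 .. b24
    srp.access(b)
    lru.access(b)
first_pass_contents = [sorted(stack) for stack in srp.stacks]
replay = [2, 3, 4, 6, 8]                # the blocks re-accessed in the slides
srp_hits = sum(srp.access(b) for b in replay)
lru_hits = sum(lru.access(b) for b in replay)
print(first_pass_contents, srp_hits, lru_hits)
```

Under these assumptions the subsets end the first pass holding the same blocks as the thrashing slide lists for SRP, and the replayed blocks hit in the SRP model while missing in the LRU model.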