Hawkeye: Leveraging Belady's Algorithm for Improved Cache Replacement
Akanksha Jain, Calvin Lin
June 25, 2017

Our Strategy
• Based on the Hawkeye Cache [ISCA '16]

Performance Improvement (Config 1)
[Figure: speedup (%) over the SPEC benchmarks (calculix through milc); Hawkeye averages 4.3%]

Performance Improvement With Prefetcher (Config 2)
[Figure: speedup (%) over the same benchmarks; Hawkeye averages -1.2%]

Today's Talk
• Understand why Hawkeye does poorly with prefetches
• Suggest solutions to address these issues

The Hawkeye Cache
• Based on Belady's optimal solution
  – Evict the line that is reused furthest in the future
  – Provably optimal, but requires future knowledge

The Hawkeye Cache
• We cannot look into the future
• But we can apply the OPT algorithm to past events to learn how OPT behaves
• If past behavior predicts the future, then this solution should approach OPT
[Figure: a predictor observes past behavior on a timeline to answer "What should I evict?" at the current time]

The Hawkeye Cache
[Figure: the PC-tagged cache access stream feeds OPTgen, which computes OPT's decisions for the past; a PC-based predictor learns those OPT hit/miss decisions and sets the insertion priority for the last-level cache]

The Hawkeye Cache
[Figure: the same block diagram, annotated with two questions]
• How should OPTgen deal with prefetches?
• How should the predictor deal with prefetches?

Prefetch-Aware Hawkeye
• How should OPTgen deal with prefetches?
  – Key challenge: redundant prefetches and inaccurate prefetches
• How should the predictor deal with prefetches?
  – Separate predictors for demand and prefetches

Background: OPTgen
[Figure: the Hawkeye block diagram again, highlighting OPTgen, which computes OPT's decisions for the past]

Background: OPTgen
• Linear-time algorithm that reproduces OPT's solution for the past
  – 100% accurate with no resource constraints (assuming no prefetches)
  – 95.5% accurate with sampling (assuming no prefetches)
• Each line's demand on the cache is represented by a liveness interval
• Lines are allocated cache capacity in the order in which they are reused

Background: OPTgen
• For each reuse, would this line have been a hit or a miss with OPT?
• Is there room for C? Yes → C is a hit
[Figure: access stream on a timeline, cache capacity = 2]

Background: OPTgen
• For each reuse, would this line have been a hit or a miss with OPT?
• Is there room for D? Yes → D is a hit
[Figure: same timeline, cache capacity = 2]

Background: OPTgen
• For each reuse, would this line have been a hit or a miss with OPT?
• Is there room for A? Yes → A is a hit
[Figure: same timeline, cache capacity = 2]

Background: OPTgen
• For each reuse, would this line have been a hit or a miss with OPT?
• There is no room for B → B is a miss
[Figure: same timeline, cache capacity = 2]

Liveness Intervals
• What defines liveness intervals?
• In the absence of prefetching, endpoints are demand accesses
  – D-D intervals (reuse of demand accesses)
• In the presence of prefetching, four kinds of intervals are possible
  – D-D, D-P, P-D, P-P intervals
  – The original solution did not distinguish among these

OPTgen With Prefetches
• What if we had an inaccurate prefetch to C? (Prefetch to C → never used)
[Figure: the timeline with the never-used prefetch to C inserted, cache capacity = 2]

OPTgen With Prefetches
• What if we had an inaccurate prefetch to C?
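The OPTgen bookkeeping on the Background slides can be sketched as a small occupancy-vector simulation. This is a minimal illustration for a single cache set, not the talk's implementation; the function name and the access stream are hypothetical.

```python
def optgen(accesses, capacity):
    """Reproduce OPT's hit/miss decisions for one cache set.

    A reuse is an OPT hit iff every time step in its liveness
    interval still has spare capacity; on a hit, the interval's
    demand is recorded in the occupancy vector.
    """
    occupancy = [0] * len(accesses)  # cache demand at each time step
    last_seen = {}                   # address -> time of previous access
    decisions = []
    for t, addr in enumerate(accesses):
        if addr in last_seen:
            start = last_seen[addr]
            # Is there room for this line over its whole liveness interval?
            if all(occupancy[i] < capacity for i in range(start, t)):
                for i in range(start, t):
                    occupancy[i] += 1
                decisions.append("hit")
            else:
                decisions.append("miss")
        else:
            decisions.append("miss")  # first touch: cold miss
        last_seen[addr] = t
    return decisions

# Hypothetical stream with capacity 2: the reuses of C and D fit,
# but the reuses of A and B find no room and are OPT misses.
print(optgen(list("BACDCDAB"), capacity=2))
# → ['miss', 'miss', 'miss', 'miss', 'hit', 'hit', 'miss', 'miss']
```

Because intervals are settled in reuse order and each time step is touched a bounded number of times, this matches the linear-time claim on the slide.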
[Figure: the inaccurate prefetch to C (never used) creates a D-P interval on the timeline, cache capacity = 2]

OPTgen With Prefetches
• D-P intervals are undesirable because they create unwanted demand on the cache (here they turn A into a miss)
[Figure: the D-P interval crowds out A on the timeline, cache capacity = 2]

OPTgen With Prefetches
• P-P intervals also create unwanted demand on the cache without resulting in hits
[Figure: a chain of redundant prefetches to C creates P-P intervals on the timeline, cache capacity = 2]

OPTgen With Prefetches
• One solution: only consider liveness intervals that end with demand accesses
  – D-D intervals (demand reuse)
  – P-D intervals (useful prefetch)
• Ignore liveness intervals that end with a prefetch
  – D-P intervals (redundant prefetch, potentially inaccurate)
  – P-P intervals (redundant prefetch, potentially inaccurate)

Complication
• Ignoring redundant prefetches (*-P intervals) maximizes cache efficiency at the expense of memory traffic

Memory Traffic Overhead
• Ignoring *-P intervals generates additional memory traffic
[Figure: with every redundant prefetch to C evicted, the stream issues 4 memory requests]

Memory Traffic Overhead
• Caching *-P intervals reduces memory traffic
[Figure: with the redundant prefetches to C kept in the cache, the stream issues only 1 memory request]

Complication
• Ignoring redundant intervals maximizes cache efficiency at the expense of memory traffic
• There is a trade-off between cache efficiency and memory traffic
• Our solution: ignore long *-P intervals, but cache short *-P intervals

Performance Improvement With Prefetcher (Config 2)
[Figure: speedup (%) over the SPEC benchmarks; prefetch-aware Hawkeye averages 2.3%]
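The prefetch-aware policy above (model intervals that end in a demand access; ignore long *-P intervals but cache short ones) can be sketched as follows. The cutoff value and all names here are hypothetical, chosen only to illustrate the decision structure.

```python
# Hypothetical cutoff separating "short" from "long" *-P intervals.
SHORT_INTERVAL_CUTOFF = 8

def classify_interval(start_is_prefetch, end_is_prefetch, length):
    """Decide how OPTgen should treat one liveness interval.

    D-D and P-D intervals end in a demand access, so caching them
    yields demand hits and they are modeled normally.  D-P and P-P
    intervals are redundant prefetches: ignoring them frees cache
    space but costs extra memory traffic, so only short ones are
    kept in the cache.
    """
    if not end_is_prefetch:
        return "model"   # D-D (demand reuse) or P-D (useful prefetch)
    if length <= SHORT_INTERVAL_CUTOFF:
        return "cache"   # short D-P/P-P: absorb it to save traffic
    return "ignore"      # long D-P/P-P: don't let it occupy the cache

print(classify_interval(True, False, 20))   # P-D, useful prefetch: model
print(classify_interval(False, True, 3))    # short D-P: cache
print(classify_interval(True, True, 100))   # long P-P: ignore
```

The cutoff is exactly the knob that trades cache efficiency against memory traffic on the Complication slides: a smaller cutoff ignores more redundant prefetches, a larger one absorbs more of them.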
Performance Improvement With Prefetcher (Config 2)
[Figure: speedup (%) over the SPEC benchmarks; SHiP averages 2.1% vs. 2.3% for Hawkeye]

Performance Improvement (Config 1)
[Figure: speedup (%) over the SPEC benchmarks; SHiP averages 3.4% vs. 4.3% for Hawkeye]

Multi-Core Results (Config 3)
[Figure: fair speedup (%) across multi-core workload mixes; SHiP averages 7.2% vs. 9.3% for Hawkeye]

Conclusions
• The Hawkeye Cache
  – Performs well in the absence of prefetching
  – Matches state-of-the-art solutions in the presence of prefetching

Conclusions
• Hawkeye doesn't really do better than SHiP in the presence of prefetching
• What is the optimal caching solution in the presence of prefetching?
  – If we knew the answer, then a Hawkeye-like solution should do well

Thank You
• Questions?
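As backup material: the Belady's MIN rule the talk builds on ("evict the line that is reused furthest in the future") can be sketched for a demand-only stream. This is an illustration under that simplifying assumption, not the talk's implementation, and the function names are made up.

```python
def belady_victim(cache, future):
    """Pick the victim: the cached line reused furthest in the future."""
    def next_use(line):
        try:
            return future.index(line)   # distance to the next reuse
        except ValueError:
            return float("inf")         # never reused again: ideal victim
    return max(cache, key=next_use)

def simulate_min(accesses, capacity):
    """Count the hits Belady's optimal policy gets on a single set."""
    cache, hits = set(), 0
    for t, addr in enumerate(accesses):
        if addr in cache:
            hits += 1
        else:
            if len(cache) == capacity:
                cache.remove(belady_victim(cache, accesses[t + 1:]))
            cache.add(addr)
    return hits

# Same hypothetical stream as the OPTgen sketch, capacity 2:
print(simulate_min(list("BACDCDAB"), capacity=2))  # → 2 hits
```

As the slides note, this policy is provably optimal but needs future knowledge, which is why Hawkeye instead runs it over the past (via OPTgen) and learns its decisions.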