Hawkeye - Cache Replacement Championship

Hawkeye: Leveraging Belady’s Algorithm
for Improved Cache Replacement
Akanksha Jain
Calvin Lin
June 25, 2017
Our Strategy
•  Based on the Hawkeye Cache [ISCA ‘16]
Performance Improvement (Config 1)
[Figure: bar chart of speedup (%) for 18 SPEC CPU2006 benchmarks and their average; Hawkeye's average speedup is 4.3%.]
Performance Improvement With Prefetcher (Config 2)
[Figure: bar chart of speedup (%) for the same benchmarks with a prefetcher enabled; Hawkeye's average speedup drops to -1.2%.]
Today’s Talk
•  Understand why Hawkeye performs poorly in the presence of prefetching
•  Suggest solutions to address these issues
The Hawkeye Cache
•  Based on Belady’s optimal solution
–  Evict the line that is re-used furthest in the future
–  Provably optimal, but requires future knowledge
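Belady's rule can be sketched in a few lines of Python. This is an illustrative simulation only, not the hardware mechanism; the function names (`belady_victim`, `simulate_opt`) are mine, not the paper's:

```python
def belady_victim(cache, future):
    """Belady's MIN: evict the cached line whose next use lies
    furthest in the future (or that is never used again)."""
    victim, furthest = None, -1
    for line in cache:
        try:
            dist = future.index(line)      # steps until next reuse
        except ValueError:
            return line                    # never reused: evict now
        if dist > furthest:
            victim, furthest = line, dist
    return victim

def simulate_opt(trace, capacity):
    """Count the hits OPT achieves on an access trace."""
    cache, hits = set(), 0
    for i, line in enumerate(trace):
        if line in cache:
            hits += 1
        else:
            if len(cache) == capacity:
                cache.remove(belady_victim(cache, trace[i + 1:]))
            cache.add(line)
    return hits
```

For example, on the stream B, A, C, C, D, D, A, B with a 2-line cache, this yields 3 hits (the reuses of C, D, and A) and a miss on B's reuse.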
The Hawkeye Cache
•  We cannot look into the future
•  But we can apply the OPT algorithm to past events to learn how OPT behaves
•  If past behavior predicts the future, then this solution should approach OPT
[Diagram: a timeline; a predictor trained on past behavior answers "What should I evict?" for future accesses.]
The Hawkeye Cache
[Diagram: the PC and cache access stream feed OPTgen, which computes OPT's decisions for the past (OPT hit/miss); a PC-based predictor learns past OPT decisions and assigns insertion priorities in the last-level cache.]
The Hawkeye Cache
[Diagram: the same pipeline, annotated with two open questions: How should OPTgen deal with prefetches? How should the predictor deal with prefetches?]
Prefetch-Aware Hawkeye
•  How should OPTgen deal with prefetches?
–  Key challenge: redundant prefetches and inaccurate prefetches
•  How should the predictor deal with prefetches?
–  Separate predictors for demand and prefetches
Background: OPTgen
[Diagram: the Hawkeye pipeline again, with OPTgen highlighted; OPTgen computes OPT's decisions for the past.]
Background: OPTgen
•  A linear-time algorithm that reproduces OPT's solution for the past
–  100% accurate with no resource constraints (assuming no prefetches)
–  95.5% accurate with sampling (assuming no prefetches)
•  Each line's demand on the cache is represented by a liveness interval
•  Lines are allocated cache capacity in the order that they are reused
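The liveness-interval idea can be sketched with an occupancy vector: a reuse is an OPT hit exactly when every time step of its interval still has spare capacity. The following is a simplified single-set Python sketch, not the real hardware OPTgen (which works on sampled sets with bounded history); the names are illustrative:

```python
def optgen(trace, capacity):
    """Replay a trace and reproduce OPT's hit/miss decisions.

    occupancy[t] counts how many lines OPT keeps cached across time
    step t.  A reuse interval (previous access -> now) becomes a hit
    only if occupancy stays below capacity over the whole interval;
    lines that cannot be cached are bypassed."""
    occupancy = [0] * len(trace)
    last_access = {}               # line -> time of its previous access
    hits = 0
    for t, line in enumerate(trace):
        if line in last_access:
            start = last_access[line]
            if all(occupancy[i] < capacity for i in range(start, t)):
                hits += 1          # OPT would have kept this line
                for i in range(start, t):
                    occupancy[i] += 1
        last_access[line] = t
    return hits
```

On the stream B, A, C, C, D, D, A, B with capacity 2 this reports 3 hits (C, D, A) and a miss for B, matching a direct simulation of Belady's algorithm.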
Background: OPTgen
•  For each reuse, would this line have been a hit or miss with OPT?
•  Is there room for C? Yes → Hit
[Diagram: access stream B, A, C, C, D, D, A, B on a timeline, cache capacity = 2; C's reuse interval has spare capacity, so OPT would have kept C.]
Background: OPTgen
•  For each reuse, would this line have been a hit or miss with OPT?
•  Is there room for D? Yes → Hit
[Diagram: same stream; D's reuse interval has spare capacity, so OPT would have kept D.]
Background: OPTgen
•  For each reuse, would this line have been a hit or miss with OPT?
•  Is there room for A? Yes → Hit
[Diagram: same stream; A's reuse interval has spare capacity, so OPT would have kept A.]
Background: OPTgen
•  For each reuse, would this line have been a hit or miss with OPT?
•  Is there room for B? No → Miss
[Diagram: same stream; B's reuse interval is full at some point, so OPT would not have kept B.]
Liveness Intervals
•  What defines liveness intervals?
•  In the absence of prefetching, endpoints are demand accesses
–  D-D intervals (reuse of demand accesses)
•  In the presence of prefetching, four kinds of intervals are possible
–  D-D, D-P, P-D, P-P intervals
–  The original solution did not distinguish among these
OPTgen With Prefetches
•  What if we had an inaccurate prefetch to C?
[Diagram: a prefetch to C is inserted into the stream; the prefetched line is never used.]
OPTgen With Prefetches
•  What if we had an inaccurate prefetch to C?
[Diagram: the inaccurate prefetch creates a liveness interval that ends with a prefetch; the prefetched line is never used.]
OPTgen With Prefetches
•  D-P intervals are undesirable because they create unwanted demand on the cache
[Diagram: the never-used D-P interval occupies capacity, turning A's reuse into a miss.]
OPTgen With Prefetches
•  P-P intervals also create unwanted demand on the cache without resulting in hits
[Diagram: repeated prefetches to C form a chain of P-P intervals that occupy capacity; the final prefetched copy is never used.]
OPTgen With Prefetches
•  One solution: only consider liveness intervals that end with a demand access
–  D-D intervals (demand reuse)
–  P-D intervals (useful prefetch)
•  Ignore liveness intervals that end with a prefetch
–  D-P intervals (redundant prefetch, potentially inaccurate)
–  P-P intervals (redundant prefetch, potentially inaccurate)
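This rule amounts to a small classifier over the access stream. A hedged Python sketch follows; representing accesses as (line, is_prefetch) pairs is my own convention for illustration, not the paper's:

```python
def intervals(trace):
    """Yield (line, kind, start, end) for every reuse interval in a
    trace of (line, is_prefetch) pairs.  kind is 'D-D', 'D-P', 'P-D',
    or 'P-P', depending on how the interval opens and closes."""
    last = {}                      # line -> (time, was_prefetch)
    for t, (line, is_pf) in enumerate(trace):
        if line in last:
            start, was_pf = last[line]
            kind = ('P' if was_pf else 'D') + '-' + ('P' if is_pf else 'D')
            yield line, kind, start, t
        last[line] = (t, is_pf)

def trains_optgen(kind):
    """Only intervals ending with a demand access (D-D, P-D) are
    considered; *-P intervals (redundant prefetches) are ignored."""
    return kind.endswith('D')
```

For example, a demand access to C followed by a redundant prefetch and a later demand reuse produces a D-P interval (ignored) and a P-D interval (a useful prefetch, considered).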
Complication
•  Ignoring redundant prefetches (*-P intervals) maximizes cache efficiency at the expense of memory traffic
Memory Traffic Overhead
•  Ignoring *-P intervals generates additional memory traffic
[Diagram: with the prefetched line not cached, each repeated prefetch to C goes to memory: 4 memory requests in the example.]
Memory Traffic Overhead
•  Caching *-P intervals reduces memory traffic
[Diagram: with the prefetched line kept in the cache, the repeated prefetches hit: 1 memory request in the example.]
Complication
•  Ignoring redundant intervals maximizes cache efficiency at the expense of memory traffic
•  There is a trade-off between cache efficiency and memory traffic
•  Our solution: ignore long *-P intervals, but cache short *-P intervals
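The trade-off can be captured with a single length threshold. A minimal sketch, where the threshold value is purely illustrative (the slides do not give the actual cutoff):

```python
def consider_for_caching(kind, interval_length, threshold=32):
    """*-D intervals (demand reuse, useful prefetch) go to OPTgen as
    usual; *-P intervals (redundant prefetches) are cached only when
    short, trading a little cache capacity for reduced memory traffic.
    The default threshold of 32 is an illustrative guess, not the
    authors' value."""
    if kind.endswith('D'):
        return True                 # demand reuse or useful prefetch
    return interval_length <= threshold
```

A long *-P interval is thus dropped (saving capacity), while a short one is cached (saving a memory request).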
Performance Improvement With Prefetcher (Config 2)
[Figure: bar chart of speedup (%) for the SPEC benchmarks and their average; Hawkeye's average speedup is now 2.3%.]
Performance Improvement With Prefetcher (Config 2)
[Figure: the same chart with SHiP added for comparison; SHiP averages 2.1% vs. Hawkeye's 2.3%.]
Performance Improvement (Config 1)
[Figure: bar chart of speedup (%) for the SPEC benchmarks and their average, without a prefetcher; SHiP averages 3.4% vs. Hawkeye's 4.3%.]
Multi-Core Results (Config 3)
[Figure: fair speedup (%) plotted for multi-core workload mixes (x-axis 0 to 300); SHiP averages 7.2%, Hawkeye 9.3%.]
Conclusions
•  The Hawkeye Cache
–  Performs well in the absence of prefetching
–  Matches state-of-the-art solutions in the presence of prefetching
Conclusions
•  Hawkeye does not significantly outperform SHiP in the presence of prefetching
•  What is the optimal caching solution in the presence of prefetching?
–  If we knew the answer, then a Hawkeye-like solution should do well
Thank You
•  Questions?