Presentation - Cache Replacement Championship

Cache Replacement Policy Based on Expected Hit Count
A. Vakil-Ghahani
S. Mahdizadeh
M. Lotfi-Namin
M. Bakhshalipour
P. Lotfi-Kamran
H. Sarbazi-Azad
CRC-2: The 2nd Cache Replacement Championship
ISCA 2017
25 June 2017
Intro
Problem
• Off-chip accesses stall the processor for hundreds of cycles
• Limited cache size
  o Latency
  o Limited silicon area
One Solution
• Improving replacement policy
Replacement Policy
• Determine a victim in the case of a conflict
• Most locality is captured by first level caches
◦ Simple approaches like LRU
◦ Inefficient for LLC
• Need a more accurate replacement policy
◦ Better approximation of Belady's MIN
◦ Last level cache
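
To make the gap between LRU and Belady's MIN concrete, here is a minimal sketch that counts hits for both policies on a toy trace. The fully associative cache, the cyclic trace, and the function name `simulate` are illustrative assumptions, not part of the presented work; MIN needs future knowledge, so it serves only as an offline reference.

```python
def simulate(trace, capacity, policy):
    """Count hits in a fully associative cache of `capacity` blocks.

    policy is "lru" or "min" (Belady's MIN, which looks into the future).
    """
    cache = []  # resident blocks, ordered least- to most-recently used
    hits = 0
    for i, block in enumerate(trace):
        if block in cache:
            hits += 1
            cache.remove(block)
            cache.append(block)  # move to the most-recently-used position
            continue
        if len(cache) == capacity:
            if policy == "lru":
                victim = cache[0]
            else:
                # MIN: evict the block whose next use is farthest away.
                future = trace[i + 1:]

                def next_use(b):
                    return future.index(b) if b in future else float("inf")

                victim = max(cache, key=next_use)
            cache.remove(victim)
        cache.append(block)
    return hits


# A cyclic working set one block larger than the cache makes LRU thrash.
trace = list("abcd") * 3
lru_hits = simulate(trace, 3, "lru")
min_hits = simulate(trace, 3, "min")
print(lru_hits, min_hits)
```

On this trace LRU never hits, while MIN keeps part of the working set resident. A practical LLC policy tries to close that gap without knowing the future.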
Observation
• Blocks with more remaining hits tend to be re-referenced sooner
[Figure: average reuse distance (y-axis, 20 to 100) versus number of remaining hits (x-axis, 1 to 7+) for lbm, libquantum, mcf, and xalanc]
The Proposal: Expected Hit Count (EHC)
Evict the block with the minimum expected remaining hit
count
Resident blocks and incoming block:

Tag   Predicted remaining hit count
A     0
B     5
C     2
D     1
E     1

Comparator output (block with minimum predicted hit count): A
EHC
• Hit-count predictor
  • Hit counter per block
    • Number of hits to the block since it entered the cache
  • Store recent hit counts in a table (HHT)
    o Set-associative structure
    o LRU replacement policy
    o Indexed by block's tag
    o To save area
• Use information from HHT for selecting a victim
• Need a baseline policy
  o DRRIP
  o Good performance
  o Low area overhead
HHT Structure
• Set-associative, 16-way
• LRU Replacement Policy
• Hit Count Array: a FIFO queue that stores the last two observed hit counts
HHT organization: 128 sets x 16 ways
HHT entry: VALID | LRU RECENCY (4-bit) | TAG (20-bit) | Hit Count Array: HIT COUNTER (3-bit), HIT COUNTER (3-bit)
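
The structure above can be sketched in software as follows. This is a minimal model under stated assumptions: the class name `HHT`, the methods `record` and `lookup`, and the choice to refresh LRU recency on lookups are hypothetical; only the geometry (128 sets, 16 ways, two 3-bit hit counters per entry, LRU replacement) comes from the slide.

```python
from collections import OrderedDict, deque

NUM_SETS = 128      # from the slide
NUM_WAYS = 16       # from the slide
HIT_COUNT_MAX = 7   # a 3-bit counter saturates at 7
HISTORY_LEN = 2     # the Hit Count Array keeps the last two hit counts


class HHT:
    """Set-associative table of recent hit counts with LRU replacement."""

    def __init__(self):
        # One OrderedDict per set: key = HHT tag, value = FIFO of hit counts.
        # Dictionary order doubles as LRU order (first item = least recent).
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def record(self, index, tag, hit_count):
        """Store a hit count observed for the block identified by (index, tag)."""
        entries = self.sets[index]
        if tag not in entries:
            if len(entries) == NUM_WAYS:
                entries.popitem(last=False)  # evict the LRU entry
            entries[tag] = deque(maxlen=HISTORY_LEN)
        # deque(maxlen=2) drops the oldest count automatically (FIFO).
        entries[tag].append(min(hit_count, HIT_COUNT_MAX))
        entries.move_to_end(tag)  # mark as most recently used

    def lookup(self, index, tag):
        """Return stored hit counts (oldest first), or None on an HHT miss."""
        entries = self.sets[index]
        if tag not in entries:
            return None
        entries.move_to_end(tag)  # assumption: lookups refresh recency
        return list(entries[tag])


hht = HHT()
hht.record(3, 0xABC, 2)
hht.record(3, 0xABC, 5)
print(hht.lookup(3, 0xABC))
```

How the two stored counts are combined into a single expected hit count is not specified on this slide, so the sketch simply returns both.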
Updating Metadata
• Saturation of hit count of a block
• Eviction of a block
[Figure: when a cache block is evicted or its hit counter saturates, its hit count is written to the HHT. Each cache block holds Tag, Data, RRPV, and Hit Cnt. The 7 least-significant bits of the block's tag form the HHT index; the next 20 bits form the HHT tag.]
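
The index/tag split used when writing a hit count to the HHT can be sketched as below. The function name `hht_index_and_tag` is hypothetical; the bit widths (7 index bits taken from the least-significant end of the cache tag, followed by 20 HHT tag bits) are read from the slide.

```python
HHT_INDEX_BITS = 7   # 7 LSBs of the cache tag select one of 128 HHT sets
HHT_TAG_BITS = 20    # the next 20 bits are stored as the HHT tag


def hht_index_and_tag(cache_tag):
    """Split a cache block's tag into (HHT set index, HHT tag)."""
    index = cache_tag & ((1 << HHT_INDEX_BITS) - 1)
    tag = (cache_tag >> HHT_INDEX_BITS) & ((1 << HHT_TAG_BITS) - 1)
    return index, tag


# Example: a cache tag whose low 7 bits are all ones and whose upper part is 5.
index, tag = hht_index_and_tag((5 << 7) | 0x7F)
print(index, tag)  # 127 5
```

Any cache-tag bits above these 27 are dropped in this sketch, so two blocks that differ only in those bits would share an HHT entry.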
EHC: Victim Selection
Calculate value for each block
[Figure: each resident block (A to D) carries a HitCnt and an RRPV; the HHT maps tags to expected hit counts (for example, A: 4 and D: 2); the incoming block E has an RRPV and a default HitCnt of 0]
Predicted Remaining HC = |EHC - HitCnt|, where EHC is the block's expected hit count from the HHT
Value = Predicted Remaining HC - RRPV
Example
Block  HitCnt  RRPV  EHC (from HHT)  Value = |EHC - HitCnt| - RRPV
A      2       5     4               |4 - 2| - 5 = -3
B      2       5     no entry        |2 - 2| - 5 = -5
C      1       0     0               |0 - 1| - 0 = 1
D      2       0     2               |2 - 2| - 0 = 0
E      0       6     7               |7 - 0| - 6 = 1

Minimum value: B (-5), so evict B
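
The victim-selection rule (Value = |EHC - HitCnt| - RRPV, evict the minimum) can be sketched as below. The per-block numbers are read off the example slide; the pairing of values to blocks and the minus signs are a reconstruction from the flattened figure. The names `block_value` and `select_victim`, and breaking ties by listing order, are assumptions.

```python
def block_value(expected_hc, hit_cnt, rrpv):
    """Value = predicted remaining hit count minus RRPV."""
    predicted_remaining = abs(expected_hc - hit_cnt)
    return predicted_remaining - rrpv


def select_victim(blocks):
    """blocks maps tag -> (expected_hc, hit_cnt, rrpv).

    Returns the tag with the minimum value and the per-block values.
    Ties go to the block listed first (an assumption of this sketch).
    """
    values = {tag: block_value(*fields) for tag, fields in blocks.items()}
    victim = min(values, key=values.get)
    return victim, values


# (EHC, HitCnt, RRPV) per block, as reconstructed from the example.
blocks = {
    "A": (4, 2, 5),
    "B": (2, 2, 5),
    "C": (0, 1, 0),
    "D": (2, 2, 0),
    "E": (7, 0, 6),  # incoming block: hit count starts at 0
}
victim, values = select_victim(blocks)
print(victim)   # B
print(values)   # {'A': -3, 'B': -5, 'C': 1, 'D': 0, 'E': 1}
```

Subtracting RRPV lets the baseline policy's prediction lower the value of blocks it already considers distant re-references, which is why B, with a high RRPV and no predicted remaining hits, is chosen.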
Why baseline?
• HHT cannot predict:
  • New incoming blocks (compulsory misses)
  • Old blocks without any entry in HHT
• LRU, SRRIP or DRRIP as baseline
• DRRIP has low area and good performance
EHC: Victim Selection (cont.)
Evict the block with the lowest value, or bypass it in an exclusive cache
[Figure: the block with the minimum value is either replaced (a resident block) or bypassed (the incoming block)]
Methodology
● CRC framework based on CMPsim
● Single-core and four-core
● Core parameters
○ 6-stage pipeline, 256-entry ROB
○ L1 (I&D): 32 KB, 8-way
○ Private L2: 256 KB, 8-way
○ Shared LLC: 2MB per core, 16-way
● Benchmarks
○ SPEC CPU2006
Prior Replacement Policies
• DRRIP
  ◦ Assigns re-reference interval prediction value (RRPV) for each block
  ◦ Evicts block with maximum RRPV
• SHiP
  ◦ Classifies blocks into two categories: good blocks and bad blocks
  ◦ Enhances DRRIP by predicting dead-on-arrival blocks
• EVA
  ◦ Reconciles hit probability and expected lifetime by measuring time in cache as forgone hits
  ◦ Evicts candidate with lowest EVA
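
As a point of comparison, the RRPV-based victim search shared by the RRIP family can be sketched as below. This is a minimal sketch of the victim search only: it does not model DRRIP's set dueling between insertion policies. The function name `rrip_victim` is hypothetical, and the 3-bit RRPV width follows the hardware overhead figures in this deck.

```python
RRPV_MAX = 7  # maximum value of a 3-bit RRPV


def rrip_victim(rrpvs):
    """Return the way to evict from one set.

    If no block has RRPV_MAX, every block is aged (RRPV incremented)
    until one does; the list is updated in place.
    """
    while RRPV_MAX not in rrpvs:
        for way in range(len(rrpvs)):
            rrpvs[way] += 1
    return rrpvs.index(RRPV_MAX)


rrpvs = [2, 5, 0, 3]
way = rrip_victim(rrpvs)
print(way, rrpvs)  # 1 [4, 7, 2, 5]
```

A block with a high RRPV is predicted to be re-referenced in the distant future, so it is the preferred victim.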
Trace Results
[Figure: MPKI reduction over LRU (y-axis, -30% to 30%) for DRRIP, SHiP, EVA, and EHC on bzip, cactusADM, mcf, xalanc, and their average. Callouts: up to 24%; 11% on average.]
Cycle-Accurate Simulation
[Figure: performance normalized to LRU (y-axis, 0.95 to 1.2) for DRRIP, SHiP, EVA, and EHC. Single-core: cactusADM, mcf, xalanc, Average. Multi-core: Mix1, Mix2, Average. Callouts: "No clue to determine the dead blocks"; "Binary classification".]
Hardware overhead
• 3-bit RRPV per block (baseline overhead)
• 3-bit hit count per block
• 2K entries in HHT
• Each HHT entry:
  • 20-bit tag
  • 2×3-bit hit count
  • 4-bit LRU recency
  • A valid bit
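
The storage cost implied by these fields can be checked with a short calculation. The 64-byte block size is an assumption (it is not stated on the slides); with it, the totals come out to the 31.75KB and 19.75KB figures quoted in the conclusions for a 2MB cache.

```python
CACHE_BYTES = 2 * 1024 * 1024   # 2MB LLC, from the slides
BLOCK_BYTES = 64                # assumption: not stated on the slides
HHT_ENTRIES = 2 * 1024          # 2K entries (128 sets x 16 ways)

RRPV_BITS = 3                   # per cache block (baseline DRRIP state)
HIT_COUNT_BITS = 3              # per cache block
HHT_ENTRY_BITS = 20 + 2 * 3 + 4 + 1   # tag + two hit counts + LRU + valid


def bits_to_kb(bits):
    return bits / 8 / 1024


num_blocks = CACHE_BYTES // BLOCK_BYTES
rrpv_kb = bits_to_kb(num_blocks * RRPV_BITS)
hit_count_kb = bits_to_kb(num_blocks * HIT_COUNT_BITS)
hht_kb = bits_to_kb(HHT_ENTRIES * HHT_ENTRY_BITS)

total_kb = rrpv_kb + hit_count_kb + hht_kb
over_baseline_kb = hit_count_kb + hht_kb   # everything except the RRPVs

print(rrpv_kb, hit_count_kb, hht_kb)   # 12.0 12.0 7.75
print(total_kb, over_baseline_kb)      # 31.75 19.75
```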
Hardware optimization
HHT with 1K entries performs well too
[Figure: IPC normalized to LRU (y-axis, 0.9 to 1.15) for a 1K-entry and a 2K-entry HHT]
Conclusions
EHC:
• Low-cost yet effective replacement policy
• Evicts the block predicted to have the minimum expected remaining hit count
• Cache-like structure for HHT
• 31.75KB area overhead for a 2MB cache
• 19.75KB area overhead over baseline (DRRIP)
• 3.4% Performance improvement over baseline
Thank you for your attention!