JWAC-1: Cache Replacement Championship

Adaptive Subset Based Replacement
Policy for High Performance Caching
Liqiang He, Yan Sun, Chaozhong Zhang
College of Computer Science, Inner Mongolia University
Hohhot, Inner Mongolia, P. R. China
2010-06-20
ISCA-2010
Background
• The cache replacement policy plays an important role in cache design.
• The LRU policy is widely used in today's microprocessors.
• The LLC sees poor locality because the L1 already filters out temporal locality.
• LRU causes thrashing when the working set is larger than the cache.
Possible solution
• If the working set is larger than the cache, retain part of the working set [Qureshi, et al, ISCA’07].
• Record part of a longer cache access history.
How do we do it?
• Divide a cache set into groups and keep part of the access history in each group.
• Inspired by Pierre's thread migration paper at HPCA’04.
[Figure: L2 caches of cores C0, C1, ..., Cn alongside groups g0, g1, ..., gn]
Overview
Proposal: Subset Based Replacement Policy (SRP)
SRP reduces misses by retaining part of a longer access history in the groups.
However, static SRP does not suit all programs.
To adapt to the diversity of programs and to behavior changes within a program, we propose the Adaptive SRP policy (ASRP).
ASRP achieves a geometric-mean miss reduction of 4.5% over LRU.
Outline

Introduction

Static Subset Based Replacement Policy

Adaptive Subset Based Replacement Policy

Summary
Static Subset Based Replacement Policy
[Figure: a cache set divided into subsets, each with a local LRU stack. One subset is active and accepts insertions; the other subsets are non-active.]
Insertion scheme in SRP
Blocks in the active subset, from MRU to LRU: a b c d
After a reference to ‘i’ (a miss): a b c i
• Insertion only occurs in the active subset.
• The victim is chosen at the LRU position. The new block is NOT promoted to MRU.
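The insertion step above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `srp_insert` and the list representation (index 0 is MRU, the last index is LRU) are assumptions made here.

```python
def srp_insert(active_stack, new_block):
    """Insert new_block into a full active subset.

    active_stack is a list ordered from MRU (index 0) to LRU (last index).
    The block at the LRU position is the victim. The new block takes its
    slot and stays at the LRU position (no promotion to MRU).
    Returns the evicted block.
    """
    victim = active_stack[-1]
    active_stack[-1] = new_block
    return victim


# The example from the slide: stack a b c d, reference to 'i'.
stack = ["a", "b", "c", "d"]
evicted = srp_insert(stack, "i")
print(stack, "evicted:", evicted)
```

Because the new block stays at the LRU position, the next miss in the same subset replaces it again unless it is hit first, which is what protects the other blocks in the subset.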
Operation on cache hit in SRP
A hit may occur in any subset (active or non-active).
Local stack, from MRU to LRU: a b c d
After a reference to ‘c’ (a hit): c a b d
• The block moves to the MRU position of its own subset.
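A minimal sketch of the hit path, under the assumption that each subset is a Python list ordered from MRU to LRU. The name `srp_hit` is hypothetical and not taken from the slides.

```python
def srp_hit(subsets, block):
    """Look for block in every subset, active or not.

    On a hit the block moves to the MRU position (index 0) of the subset
    that holds it; the other subsets are left untouched.
    Returns True on a hit and False on a miss.
    """
    for stack in subsets:
        if block in stack:
            stack.remove(block)
            stack.insert(0, block)
            return True
    return False


# The example from the slide: stack a b c d, reference to 'c'.
subsets = [["a", "b", "c", "d"], ["e", "f", "g", "h"]]
was_hit = srp_hit(subsets, "c")
print(was_hit, subsets)
```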
Changing of active subset
When the number of misses in a set exceeds a threshold X, the active subset changes.
Thus:
A. X consecutive misses are forced to replace only blocks in the active subset.
B. With N subsets, a subset can become active again ONLY after (N-1)*X misses.
C. The greater the value of X, the longer blocks in non-active subsets can stay in the set.
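Property B can be checked with a small sketch of the per-set miss counter. The class name `ActiveSubsetSelector` is hypothetical, and the sketch assumes the active subset rotates round-robin after exactly X misses, following statement A; the slides do not spell out the exact comparison.

```python
class ActiveSubsetSelector:
    """Per-set miss counter that rotates the active subset (a sketch)."""

    def __init__(self, num_subsets, threshold):
        self.num_subsets = num_subsets
        self.threshold = threshold
        self.active = 0
        self.misses = 0

    def on_miss(self):
        """Return the subset that absorbs this miss, then update the counter."""
        target = self.active
        self.misses += 1
        if self.misses == self.threshold:   # assumption: switch after X misses
            self.misses = 0
            self.active = (self.active + 1) % self.num_subsets
        return target


N, X = 4, 6
selector = ActiveSubsetSelector(N, X)
targets = [selector.on_miss() for _ in range(N * X + 1)]
print(targets)
```

With N = 4 and X = 6, subset 0 absorbs the first 6 misses, is left alone for the next (N-1)*X = 18 misses, and becomes active again on the 25th miss.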
Thrashing access pattern in SRP
Assume the working set is 24 blocks (b1 b2 b3 ... b24, accessed in sequence), the LLC is 16-way with 4 subsets of 4 blocks each, and X = 6.
Blocks in a set with SRP: b2b3b4b6 b8b9b10b12 b14b15b16b18 b20b21b22b24
[Figure: local LRU stacks of Subset 0 and Subset 1. Subset 0 holds b4 b3 b2 from the MRU position down, and its LRU position is filled by b1, then b5, then b6. Subset 1 holds b10 b9 b8, and its LRU position is filled by b7, then b11, then b12.]
Blocks in a set with LRU: b9 ... b24
When b2 b3 b4 b6 b8 are accessed again, SRP hits but LRU misses.
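The example can be reproduced with a short simulation. This is a sketch under stated assumptions rather than the authors' code: the class names `SRPSet` and `LRUSet` are made up here, the active subset switches after exactly X misses, and cold fills into empty ways are stacked toward the MRU position (the slides do not state this, but it is what the stack shown for Subset 0 implies).

```python
from collections import OrderedDict


class SRPSet:
    """One cache set under static SRP; each subset is a list, MRU first."""

    def __init__(self, num_subsets=4, subset_size=4, threshold=6):
        self.subsets = [[] for _ in range(num_subsets)]
        self.subset_size = subset_size
        self.threshold = threshold
        self.active = 0
        self.misses = 0

    def access(self, block):
        for stack in self.subsets:          # a hit may occur in any subset
            if block in stack:
                stack.remove(block)
                stack.insert(0, block)      # move to local MRU
                return True
        stack = self.subsets[self.active]   # misses only touch the active subset
        if len(stack) < self.subset_size:
            stack.insert(0, block)          # ASSUMPTION: cold fill goes toward MRU
        else:
            stack[-1] = block               # replace LRU, no promotion
        self.misses += 1
        if self.misses == self.threshold:   # ASSUMPTION: switch after X misses
            self.misses = 0
            self.active = (self.active + 1) % len(self.subsets)
        return False


class LRUSet:
    """One cache set under plain LRU; the OrderedDict keeps oldest first."""

    def __init__(self, ways=16):
        self.ways = ways
        self.blocks = OrderedDict()

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)  # evict the least recently used
        self.blocks[block] = True
        return False


srp, lru = SRPSet(), LRUSet()
for b in range(1, 25):                       # first pass over b1 ... b24
    srp.access(b)
    lru.access(b)

srp_contents = [sorted(stack) for stack in srp.subsets]
probe = [2, 3, 4, 6, 8]
srp_hits = sum(srp.access(b) for b in probe)
lru_hits = sum(lru.access(b) for b in probe)
print(srp_contents)
print("SRP hits:", srp_hits, "LRU hits:", lru_hits)
```

Under these assumptions the first pass leaves the four subsets holding the blocks listed on the slide, and the five probe accesses all hit under SRP while all missing under LRU.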
Case Study of thrashing workload
Different static thresholds differ in their ability to reduce misses.
[Figure: misses per 1K instructions versus threshold (1 to 2k) for SRP and LRU]
Hardware implementation
[Figure: hardware implementation diagram, with the MRU and LRU positions labeled]
Results
[Figure: improvement of misses over LRU (%) for each benchmark and the average, for SRP with thresholds 2, 4, and 8]
• SRP reduces misses for thrashing workloads but increases them for LRU-friendly ones.
• No single threshold is suitable for all benchmarks.
Outline

Introduction

Static Subset Based Replacement Policy

Adaptive Subset Based Replacement Policy

Summary
Adaptive SRP policy
Different programs prefer different thresholds.
In the ASRP policy:
• Victim selection and insertion are the same as in SRP.
• The ONLY difference: the threshold is selected dynamically from a pool of values, according to which one causes the fewest misses.
  • The maximum threshold is 128.
  • Eight values are picked: 2^0, 2^1, ..., 2^7.
  • The best threshold value is applied to the cache.
ASRP policy via “Set Dueling”
[Figure: the sampling sets Thres-2^0-sets, Thres-2^1-sets, ..., Thres-2^7-sets each add their misses to one of the counters Cntr_0 ... Cntr_7; the counters select the threshold used by the follower sets]
Divide the cache sets into two types:
• Sampling sets (eight thresholds * 4 sets per threshold)
• Follower sets
Eight counters: a miss in one of threshold X's sampling sets increments counter_x.
The counters decide the threshold for the follower sets: the threshold whose counter has the smallest value.
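The selection logic can be sketched as follows. The slides give the eight thresholds and four sampling sets per threshold; everything else here is an assumption for illustration: 1024 sets (a 1MB 16-way cache with 64-byte lines), the names `sampler_index` and `SetDueling`, and the evenly strided placement of the sampling sets.

```python
THRESHOLDS = [2 ** k for k in range(8)]   # 1, 2, 4, ..., 128
SETS_PER_THRESHOLD = 4
NUM_SETS = 1024                           # ASSUMPTION: 1MB, 16-way, 64-byte lines


def sampler_index(set_index):
    """Hypothetical mapping from a set to the threshold it samples.

    Returns an index into THRESHOLDS for a sampling set, or None for a
    follower set. Sampling sets are spread evenly across the cache.
    """
    stride = NUM_SETS // (len(THRESHOLDS) * SETS_PER_THRESHOLD)
    if set_index % stride == 0:
        return (set_index // stride) % len(THRESHOLDS)
    return None


class SetDueling:
    def __init__(self):
        self.counters = [0] * len(THRESHOLDS)

    def on_miss(self, set_index):
        """Count the miss only if it happened in a sampling set."""
        k = sampler_index(set_index)
        if k is not None:
            self.counters[k] += 1

    def follower_threshold(self):
        """Threshold whose sampling sets have seen the fewest misses."""
        best = min(range(len(self.counters)), key=lambda k: self.counters[k])
        return THRESHOLDS[best]


# Toy run: every set misses once, except the sets that sample threshold 8.
duel = SetDueling()
for s in range(NUM_SETS):
    if sampler_index(s) != 3:
        duel.on_miss(s)
print(duel.counters, duel.follower_threshold())
```

In the toy run the counter for threshold 8 stays at zero, so the follower sets would adopt a threshold of 8.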
Resetting mechanism
To avoid the cumulative effect of a large value in a specific Cntr_x:
• Record the number of times the same threshold is selected by the follower sets.
• When this count exceeds a threshold, reset all the Cntr_x counters.
[Figure: the selected threshold is compared with last_follow; if they are equal (Y), global_follow is incremented (++), otherwise (N) it is decremented (--); when global_follow exceeds a threshold, Cntr_0 ... Cntr_7 are reset]
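One possible reading of the resetting mechanism is sketched below. The names `last_follow` and `global_follow` come from the slide; the class name `FollowerSelector`, the reset limit of 4, clamping `global_follow` at zero, and clearing it after a reset are all assumptions, since the slides do not give these details.

```python
class FollowerSelector:
    """Picks the follower threshold and resets stale counters (a sketch)."""

    def __init__(self, num_counters=8, reset_limit=4):
        self.counters = [0] * num_counters
        self.last_follow = None
        self.global_follow = 0
        self.reset_limit = reset_limit       # hypothetical value

    def select(self):
        """Return the index of the counter with the fewest misses."""
        choice = min(range(len(self.counters)), key=lambda k: self.counters[k])
        if choice == self.last_follow:
            self.global_follow += 1          # same threshold chosen again
        else:
            # ASSUMPTION: the decrement saturates at zero
            self.global_follow = max(0, self.global_follow - 1)
        self.last_follow = choice
        if self.global_follow > self.reset_limit:
            self.counters = [0] * len(self.counters)
            self.global_follow = 0           # ASSUMPTION: start counting afresh
        return choice


selector = FollowerSelector()
selector.counters = [5, 1, 7, 9, 3, 8, 6, 4]
first_five = [selector.select() for _ in range(5)]
counters_before_reset = list(selector.counters)
sixth = selector.select()
print(first_five, sixth, selector.counters)
```

In this run the same counter wins six times in a row, so the sixth selection pushes `global_follow` past the limit and all counters are cleared.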
Budget
• 45K bits in total
• Only 70% of the budget used by the LRU policy, and 35% of the total budget provided by this championship
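The 70% figure can be sanity-checked with a little arithmetic. The sketch below assumes the 1MB 16-way LLC from the results, a 64-byte line size (not stated on the slides), LRU stored as log2(ways) bits per line, and K meaning 1024.

```python
CACHE_BYTES = 1 << 20          # 1MB LLC
WAYS = 16
LINE_BYTES = 64                # ASSUMPTION: line size is not given on the slides

lines = CACHE_BYTES // LINE_BYTES             # 16384 cache lines
lru_bits_per_line = (WAYS - 1).bit_length()   # 4 bits encode a 16-deep stack position
lru_bits = lines * lru_bits_per_line          # 65536 bits = 64K bits

asrp_bits = 45 * 1024                         # "45K bits" from the slide
ratio_to_lru = asrp_bits / lru_bits
implied_total_kbits = 45 / 0.35               # total budget implied by the 35% figure

print(f"ASRP / LRU storage: {ratio_to_lru:.1%}")
print(f"implied championship budget: about {implied_total_kbits:.0f}K bits")
```

Under these assumptions LRU needs 64K bits, so 45K bits is about 70% of it, consistent with the slide, and the 35% figure implies a total budget of roughly 129K bits.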
Results
[Figure: improvement of misses over LRU (%) for each benchmark and the average, for DIP and ASRP]
For a 1MB 16-way LLC, ASRP achieves a geometric-mean speedup of 4.5% over LRU.
Analysis
[Figure: misses per 1K instructions versus threshold (1 to 2k) for SRP, LRU, and ASRP; two panels: xalancbmk and GemsFDTD]
The sampling mechanism helps ASRP find the best threshold for each program.
Conclusion
• Keeping part of the working set in the cache helps reduce misses when the cache suffers from thrashing.
• Keeping part of a longer access history helps SRP capture the frequently used blocks more accurately.
• Different programs, and different phases of a program, prefer different thresholds to maximize cache hits.
• “Set Dueling” helps ASRP dynamically select a suitable threshold.
• The experimental results show the effectiveness of the ASRP policy.
Thank you!
Any questions?
Result on multi-core processor
[Figure: improvement of misses over LRU (%) for DIP and ASRP on multi-program workload mixes; two panels]
Case Study of LRU-friendly workload
[Figure: misses per 1K instructions versus threshold (1 to 2k) for SRP and LRU]
Explanation of active subset changing
A simple example of SRP policy