Insertion Policy Selection Using Decision Tree Analysis
Samira Khan, Daniel A. Jiménez
University of Texas at San Antonio
Motivation




L1 and L2 filter the cache accesses
Last Level Cache (LLC) does not have much temporal locality
A large fraction of blocks brought into the cache are never accessed again (zero reuse lines)
For SPEC CPU 2006 benchmarks, on average 60.18% of lines are never accessed again while they are in the LLC
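
As a rough illustration of the zero reuse metric above, the sketch below counts lines that receive no hit while resident in a single LRU-managed set. The function name, the single-set model, and the choice to count lines still resident at the end of the trace are assumptions made for illustration; they are not taken from the slides.

```python
from collections import OrderedDict


def zero_reuse_fraction(trace, num_ways):
    """Fraction of inserted lines that never hit while resident in one LRU set."""
    resident = OrderedDict()  # address -> hit count; the last item is MRU
    inserted = 0
    zero_reuse = 0
    for addr in trace:
        if addr in resident:
            resident[addr] += 1
            resident.move_to_end(addr)  # promote to MRU on a hit
        else:
            if len(resident) == num_ways:
                _, hits = resident.popitem(last=False)  # evict the LRU line
                if hits == 0:
                    zero_reuse += 1
            resident[addr] = 0
            inserted += 1
    # Assumption: lines still resident at the end with no hit also count.
    zero_reuse += sum(1 for hits in resident.values() if hits == 0)
    return zero_reuse / inserted if inserted else 0.0
```

A pure streaming trace, in which no address repeats, gives a fraction of 1.0 under this definition.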
Motivation


No cache bursts in LLC
Only a small portion of hits occur near the MRU position
Goal
Get rid of zero reuse lines as early as possible
Keep lines in cache for sufficient time to get the first hit
Minimal change to LRU policy
Use as little space as possible

Insertion Position Selection

Find the optimal insertion position
Zero reuse lines will get evicted earlier
Most of the non-zero reuse lines should stay in the cache until their first hit
This gets rid of zero reuse lines and makes space for useful lines
Use Decision Tree Analysis via set dueling to find the position
The decision tree chooses which insertion positions to set duel
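
To make the idea of an insertion position concrete, here is a minimal sketch of one cache set modelled as a recency stack. The class name `RecencyStackSet` and the index convention (0 is MRU, the last index is LRU) are assumptions for illustration. Only the insertion step differs from plain LRU: hits are still promoted to MRU and the victim is still the line at the LRU position.

```python
class RecencyStackSet:
    """One cache set as a recency stack: index 0 is MRU, the last index is LRU."""

    def __init__(self, num_ways, insert_index):
        self.num_ways = num_ways
        self.insert_index = insert_index  # 0 = MRU, num_ways - 1 = LRU
        self.stack = []

    def access(self, tag):
        """Return True on a hit, False on a miss."""
        if tag in self.stack:
            self.stack.remove(tag)
            self.stack.insert(0, tag)  # unchanged LRU promotion: hits move to MRU
            return True
        if len(self.stack) == self.num_ways:
            self.stack.pop()  # the victim is always the line at the LRU position
        # Clamp the index so a partly filled set still accepts the new line.
        self.stack.insert(min(self.insert_index, len(self.stack)), tag)
        return False
```

With a full 4-way set, inserting at index 3 means a new line displaces only the current LRU line, whereas inserting at index 0 pushes every resident line one step closer to eviction.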
[Figure: candidate insertion positions in the recency stack: MRU, nearMRU, middle, nearLRU, LRU]
For 400.perlbench, 66.67% of lines brought into the cache are never accessed again, and 73.03% of hits occur between the MRU and middle positions
Decision tree:
Set dueling between middle and MRU pos
    MRU pos winner: set dueling between nearMRU and MRU pos
        nearMRU pos winner: insert pos nearMRU
        MRU pos winner: insert pos MRU
    middle pos winner: set dueling between LRU and middle pos
        LRU pos winner: insert pos LRU
        middle pos winner: set dueling between nearLRU and middle pos
            nearLRU pos winner: insert pos nearLRU
            middle pos winner: insert pos middle
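
The decision tree can be read as a short sequence of pairwise duels. The sketch below follows the tree as reconstructed from the diagram labels. The function name and the `duel` callback are hypothetical, and how a duel is actually decided in hardware is left to the caller. The miss counts in the usage lines are made-up numbers, not measurements.

```python
def select_insert_position(duel):
    """Walk the decision tree; duel(a, b) returns whichever of a or b currently wins."""
    if duel("middle", "MRU") == "MRU":
        # The MRU side wins: refine between nearMRU and MRU.
        return duel("nearMRU", "MRU")
    # Middle beat MRU: check whether inserting at LRU does even better.
    if duel("LRU", "middle") == "LRU":
        return "LRU"
    # Middle beat LRU as well: refine between nearLRU and middle.
    return duel("nearLRU", "middle")


# Hypothetical miss counts per insertion position, for illustration only.
hypothetical_misses = {"MRU": 120, "nearMRU": 100, "middle": 90, "nearLRU": 80, "LRU": 95}


def fewer_misses(a, b):
    return a if hypothetical_misses[a] <= hypothetical_misses[b] else b


print(select_insert_position(fewer_misses))  # nearLRU
```

At most three duels are needed to pick one of the five positions, and positions on the losing side of the first duel are never tried.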
Adaptive Multi Set Dueling

Current multi set dueling






Has one leader set for each insertion policy
Partial follower sets follow the winning policy
Policies set duel in a tournament manner
Not scalable
Leader sets running the losing policies hurt performance
Adaptive multi set dueling



Leader sets adaptively choose the policy
No need for partial follower sets
Scalable
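
For readers new to set dueling, the sketch below shows a single saturating policy-selection counter for two policies, using the convention from the backup diagram (-1 if pa wins, +1 if pb wins) and the 10-bit width from the space overhead table. It illustrates only a basic two-policy duel. It is not the adaptive multi set dueling scheme, whose use of psel1, psel2 and the switched bit is not detailed in these slides. The class name, the starting value, the tie-breaking rule, and what counts as a "win" are assumptions.

```python
class PselCounter:
    """Saturating policy-selection counter shared by two dueling policies."""

    def __init__(self, bits=10):
        self.max_value = (1 << bits) - 1
        self.midpoint = 1 << (bits - 1)
        self.value = self.midpoint  # assumption: start at the midpoint

    def record_win(self, policy):
        # Convention from the diagram: -1 if pa wins, +1 if pb wins.
        if policy == "pa":
            self.value = max(0, self.value - 1)
        elif policy == "pb":
            self.value = min(self.max_value, self.value + 1)
        else:
            raise ValueError(f"unknown policy: {policy!r}")

    def winner(self):
        # Assumption: values at or above the midpoint favour pb.
        return "pb" if self.value >= self.midpoint else "pa"
```

Follower sets would consult `winner()` when inserting a line, while leader sets report outcomes through `record_win()`.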
Result
Space Overhead
Space overhead for a 1 MB, 16-way set-associative LLC

Parameter                       Storage         Total Storage
LRU overhead per line           4 bits          1024 * 16 * 4 bits = 8 KB
Set type per set                2 bits          1024 * 2 = 2048 bits
Two counters (psel1 & psel2)    10 bits each    20 bits
One counter (switched)          1 bit           1 bit
Total                                           8 KB + 2069 bits
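
The totals in the table can be checked with a few lines of arithmetic. The function name and dictionary keys below are hypothetical. The per-item sizes, the 1024 sets, and the 16 ways come from the table.

```python
def llc_storage_bits(num_sets=1024, num_ways=16):
    """Storage totals in bits, using the per-item sizes from the table."""
    lru_bits = num_sets * num_ways * 4  # 4 LRU bits per line (already needed by LRU)
    set_type_bits = num_sets * 2        # 2 set-type bits per set
    psel_bits = 2 * 10                  # psel1 and psel2, 10 bits each
    switched_bits = 1                   # the one-bit "switched" counter
    extra_bits = set_type_bits + psel_bits + switched_bits
    return {"lru": lru_bits, "extra": extra_bits}


totals = llc_storage_bits()
print(totals["lru"] // (8 * 1024), "KB of LRU state")  # 8 KB of LRU state
print(totals["extra"], "extra bits")                   # 2069 extra bits
```

The 2069 extra bits are the cost added on top of the LRU state that a conventional LRU cache already stores.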
Conclusion

Insertion Position Selection using Decision Tree Analysis
Requires minimal change to LRU
Needs only 2069 bits of extra space
Chooses the best insertion position adaptively
Gets rid of zero reuse lines without any storage-hungry predictor
Makes multi set dueling scalable
Questions
Zero Reuse Lines in SPEC CPU 2006
Adaptive Multi Set Dueling
[Figure: leader sets in the current multi set dueling scheme versus leader sets in the adaptive multi set dueling scheme, drawn over all sets in the LLC. Labels: policies pa to ph and pα; follower sets φab, φcd, φef, φgh; counters pselab, pselcd, pselef, pselgh, psel1, psel2; counter update -1 if pa wins, +1 if pb wins.]
Result
[Figure: legend entries MRU, nearMRU, middle, nearLRU, LRU, psel1, psel2]