Temporal Distribution Based Software Cache
Partition To Reduce I-Cache Misses
Xiaomi An, Jiqiang Song, Wendong Wang
SimpLight Nanoelectronics Ltd
2008/03/24
Outline
Traditional code layout optimizations
Code layout optimizations in the Open64 compiler
Temporal distribution based software cache partition to reduce I-cache misses
Future work
SimpLight Confidential Patent pending
Traditional code layout optimizations
Code layout optimization changes how code is organized in memory.
Main benefits of code layout:
• Improve branch prediction through the placement of basic blocks
• Reduce I-cache misses by changing how code maps onto the cache (mainly compulsory misses and conflict misses)
• Fit code into a complex memory hierarchy (e.g. scratch-pad memory plus cache)
Traditional code layout optimizations
Representation of temporal relationships:
• Control flow graph with edge frequencies
• Weighted call graph
• Temporal relation graph
Consideration of cache architecture:
• Linearize code without considering the cache architecture (Pettis and Hansen)
• Distribute temporally interleaved code onto different cache lines (Hashemi, Gloy, etc.)
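To make the Pettis and Hansen approach concrete, here is a simplified Python sketch of greedy procedure ordering on a weighted call graph. It repeatedly merges the two chains joined by the largest total call weight and, of the four possible concatenations, keeps the one that places the endpoints of the heaviest single call edge between them closest together. The function name `pettis_hansen_order` and the tie-breaking rules are our own assumptions; this illustrates the technique and is not the exact procedure from the original paper.

```python
from itertools import product


def pettis_hansen_order(call_counts):
    """Simplified Pettis-Hansen style procedure ordering (illustrative sketch).

    call_counts maps (caller, callee) to a call count; direction is ignored,
    so the input is treated as an undirected weighted call graph.
    """
    weight = {}
    for (a, b), w in call_counts.items():
        if a != b:  # self-recursion does not affect placement
            key = frozenset((a, b))
            weight[key] = weight.get(key, 0) + w

    procs = sorted({p for pair in call_counts for p in pair})
    chain = {p: [p] for p in procs}  # chain id -> ordered procedures
    owner = {p: p for p in procs}    # procedure -> id of the chain holding it

    while True:
        # Total call weight between every pair of distinct chains.
        between = {}
        for pair, w in weight.items():
            a, b = sorted(pair)
            ca, cb = owner[a], owner[b]
            if ca != cb:
                k = tuple(sorted((ca, cb)))
                between[k] = between.get(k, 0) + w
        if not between:
            break  # every connected component is now a single chain

        # Merge the most strongly connected pair of chains (ties: smallest ids).
        x, y = min(between, key=lambda k: (-between[k], k))

        # The heaviest single edge between the two chains decides orientation.
        best_w, best_a, best_b = -1, None, None
        for pair, w in weight.items():
            p, q = sorted(pair)
            if {owner[p], owner[q]} == {x, y} and w > best_w:
                best_w = w
                best_a, best_b = (p, q) if owner[p] == x else (q, p)

        # Try all four concatenations and keep the one that puts the pair closest.
        cx, cy = chain[x], chain[y]
        options = [u + v for u, v in product((cx, cx[::-1]), (cy, cy[::-1]))]
        chain[x] = min(options, key=lambda c: abs(c.index(best_a) - c.index(best_b)))
        del chain[y]
        for p in cy:
            owner[p] = x

    # Unconnected chains are simply concatenated in chain-id order.
    return [p for cid in sorted(chain) for p in chain[cid]]
```

For example, `pettis_hansen_order({("A", "B"): 10, ("C", "D"): 9, ("A", "C"): 3})` merges A with B, then C with D, and finally reverses the first chain so that A sits next to C, giving `['B', 'A', 'C', 'D']`. Note that nothing in this ordering looks at cache size or line mapping, which is the limitation the slide attributes to the linearizing approach.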
Code layout optimizations in the Open64 compiler
Profile-based basic block reordering and procedure splitting in CG
• Based on the control flow graph with edge frequencies
• Algorithm based on Pettis and Hansen
Procedure reordering in IPA
• Based on the weighted call graph with call-edge frequencies
• A variant of the Pettis and Hansen algorithm
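The basic block side can be illustrated with a minimal sketch of Pettis and Hansen style bottom-up chaining: edges are visited in decreasing frequency, and two chains are joined only when the edge runs from the tail of one chain to the head of another, so the hottest successor becomes the fall-through block. The helper `bottom_up_chains` is hypothetical and is not Open64's CG code; the final chain order is also simplified (entry chain first, remaining chains by head block name) rather than ordered by inter-chain edge weights as in the original algorithm.

```python
def bottom_up_chains(edge_counts, entry):
    """Order basic blocks by greedy bottom-up chaining (simplified sketch).

    edge_counts maps (src, dst) control-flow edges to execution counts.
    """
    blocks = {entry} | {blk for edge in edge_counts for blk in edge}
    chain_of = {blk: [blk] for blk in blocks}  # block -> its (shared) chain list

    # Hottest edges first; ties broken by edge name for determinism.
    for (src, dst), _count in sorted(edge_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        head_chain, tail_chain = chain_of[src], chain_of[dst]
        # Join only tail -> head, and never put anything before the entry block.
        if (head_chain is not tail_chain and head_chain[-1] == src
                and tail_chain[0] == dst and dst != entry):
            head_chain.extend(tail_chain)
            for blk in tail_chain:
                chain_of[blk] = head_chain

    # Collect each distinct chain once.
    chains, seen = [], set()
    for blk in sorted(blocks):
        c = chain_of[blk]
        if id(c) not in seen:
            seen.add(id(c))
            chains.append(c)

    # Simplified chain order: entry chain first, the rest by head block name.
    chains.sort(key=lambda c: (c[0] != entry, c[0]))
    return [blk for c in chains for blk in c]
```

With `{("E", "A"): 90, ("E", "B"): 10, ("A", "C"): 90, ("B", "C"): 10}` and entry `"E"`, the hot path E, A, C becomes one fall-through chain and the cold block B is moved to the end: `['E', 'A', 'C', 'B']`.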
Software cache partition
What is software cache partition?
• Through code layout optimization, different code blocks are mapped to different regions of the I-cache.
Benefits of software cache partition
• Reduce cache misses
• Remove interference between multiple programs without additional hardware support (embedded systems)
• Soft implementation of scratch-pad memory on top of the I-cache
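Partitioning is possible because the cache set a piece of code lands in is fixed by its address: with `line_size`-byte lines and `num_sets` sets, an address maps to set (address / line_size) mod num_sets. The sketch below assumes a hypothetical 4 KB direct-mapped I-cache (32-byte lines, 128 sets) and arbitrary example addresses, and checks which sets two code regions occupy. For a set-associative cache the same formula applies with num_sets = capacity / (line_size * ways).

```python
def sets_touched(start, size, line_size=32, num_sets=128):
    """Return the cache set indices occupied by code bytes [start, start + size)."""
    first_line = start // line_size
    last_line = (start + size - 1) // line_size
    return {line % num_sets for line in range(first_line, last_line + 1)}


# Hypothetical placement: video code in sets 0-63, audio code in sets 64-95.
video = sets_touched(0x8000, 2048)
audio = sets_touched(0x9800, 1024)
print(min(video), max(video), min(audio), max(audio))  # 0 63 64 95
print(video.isdisjoint(audio))                         # True
```

Because the two regions never share a set, neither program can evict the other's code, which is the software-only isolation the slide describes.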
Benefits of software cache partition (1)
Remove interference between multiple programs and avoid additional hardware support
[Figure: the I-cache divided into a region for the video app and a region for the audio app]
The I-cache is partitioned according to the performance demand and code locality of the video application and the audio application.
Benefits of software cache partition (2)
Soft implementation of scratch-pad memory on top of the I-cache
[Figure: the I-cache divided into a region for code with real-time requirements and a region for other code]
The I-cache is partitioned to guarantee that code with real-time requirements is not replaced once it has been brought into the cache.
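One way to realize this guarantee is to reserve a range of cache sets for the real-time code and lay out all other code so that it never maps into those sets. The sketch below is a hypothetical bump allocator, not the authors' method: it assumes the real-time code occupies sets 0 to `reserved - 1`, that set index = (address / line_size) mod num_sets, and that every other block fits into the unreserved sets. When a block would spill into the reserved sets, it is moved to the next cache-sized window.

```python
def place_outside_reserved(block_sizes, base, line_size, num_sets, reserved):
    """Return start addresses for blocks so that none maps to sets [0, reserved)."""
    window = line_size * num_sets                 # bytes that cover every set once
    reserved_bytes = reserved * line_size         # start of the free sets in a window
    free_bytes = window - reserved_bytes
    addr = base
    placements = []
    for size in block_sizes:
        if size > free_bytes:
            raise ValueError("block larger than the unreserved part of the cache")
        off = addr % window
        if off < reserved_bytes:                  # inside reserved sets: skip ahead
            addr += reserved_bytes - off
            off = reserved_bytes
        if off + size > window:                   # would wrap into reserved sets
            addr += window - off + reserved_bytes
        placements.append(addr)
        addr += size
    return placements


# Hypothetical cache: 32-byte lines, 8 sets, sets 0-1 reserved for real-time code.
print(place_outside_reserved([100, 100, 50], 0x1000, 32, 8, 2))  # [4160, 4416, 4516]
```

The gaps this leaves in the address space are the memory cost of the guarantee; on a set-associative cache, keeping other code out of the reserved sets likewise prevents it from evicting the real-time code.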
Benefits of software cache partition (3)
Reduce I-cache misses
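Runtime trace of code blocks, where (UV)^5 denotes UV repeated five times:
ABCDEF (UV)^5 ABCDEF (PQ)^5 ABCDEF (XY)^5 ABCDEF
Layout 1 (24 misses): the six cache lines hold A/E, B/F, C, D, U/P/X and V/Q/Y
Layout 2 (18 misses): the six cache lines hold A, B, C, D, E/U/P/X and F/V/Q/Y
(Blocks separated by "/" share a cache line.)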
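The two miss counts can be checked with a tiny simulation. This sketch assumes what the layouts imply: a six-line direct-mapped I-cache that starts empty, with every code block occupying exactly one cache line and blocks listed together sharing a line.

```python
def count_misses(trace, layout):
    """Count misses in a direct-mapped cache; layout maps block -> cache line index."""
    resident = {}  # cache line index -> block currently held in that line
    misses = 0
    for block in trace:
        line = layout[block]
        if resident.get(line) != block:
            misses += 1
            resident[line] = block
    return misses


def layout_from(groups):
    """Each string lists the blocks that share one cache line."""
    return {blk: i for i, group in enumerate(groups) for blk in group}


TRACE = "ABCDEF" + "UV" * 5 + "ABCDEF" + "PQ" * 5 + "ABCDEF" + "XY" * 5 + "ABCDEF"
layout1 = layout_from(["AE", "BF", "C", "D", "UPX", "VQY"])
layout2 = layout_from(["A", "B", "C", "D", "EUPX", "FVQY"])
print(count_misses(TRACE, layout1), count_misses(TRACE, layout2))  # 24 18
```

In layout 1 each ABCDEF pass after the first costs 4 misses because A/E and B/F evict each other; in layout 2 only E and F miss, since they are evicted once by each short UV, PQ or XY burst.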
Temporal distribution based layout of code blocks in the partitioned cache
Selecting good candidates to hold cache lines exclusively
• Criteria: hot, dense, and with a good temporal distribution
Mapping into the I-cache:
[Figure: I-cache mapping. Hot, dense blocks with good regularity hold cache lines exclusively; hot blocks with good locality and cold blocks share cache lines.]
Temporal distribution
Temporal locality and temporal regularity
Trace: ABCDEF (UV)^5 ABCDEF (PQ)^5 ABCDEF (XY)^5 ABCDEF
A, B, C, D, E and F have good temporal regularity: their references are distributed uniformly along the trace.
U, V, P, Q, X and Y have good temporal locality: their references are concentrated in a short stretch of the trace, i.e. the reference distribution is strongly skewed.
Our mapping (18 misses in total): A, B, C and D each hold a cache line exclusively, while E shares a cache line with U, P and X, and F shares a cache line with V, Q and Y.
Quantification of temporal distribution
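• Variance of reuse distance
$$\mathrm{Var}(RD_A) = \frac{1}{N_A - 1} \sum_{i=2}^{N_A} \left( P_A(i) - P_A(i-1) - ARD_A \right)^2$$
where P_A(i) is the trace position of the i-th reference to block A, N_A is the number of references to A, and ARD_A is the average reuse distance of A.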
• Temporal distribution
$$TD(A) = \frac{LR_A / L^2}{1 + \mathrm{Var}(RD_A)^2}$$
• Weighted temporal distribution
$$WTD(A) = N_A \cdot TD(A)$$
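The variance-of-reuse-distance term can be computed directly from a trace. This sketch assumes 0-based trace positions and takes ARD_A to be the mean of the gaps P_A(i) - P_A(i-1); LR_A and L are not defined on the slide, so TD(A) itself is not computed. Blocks referenced only once are skipped because they have no reuse distance. The name `reuse_stats` is hypothetical.

```python
from collections import defaultdict


def reuse_stats(trace):
    """Return {block: (N_A, ARD_A, Var(RD_A))} for blocks referenced at least twice."""
    positions = defaultdict(list)  # block -> P_A(1..N_A), 0-based positions
    for pos, block in enumerate(trace):
        positions[block].append(pos)

    stats = {}
    for block, p in positions.items():
        n = len(p)
        if n < 2:
            continue  # no reuse distance is defined for a single reference
        gaps = [p[i] - p[i - 1] for i in range(1, n)]
        ard = sum(gaps) / (n - 1)                        # average reuse distance
        var = sum((g - ard) ** 2 for g in gaps) / (n - 1)
        stats[block] = (n, ard, var)
    return stats


TRACE = "ABCDEF" + "UV" * 5 + "ABCDEF" + "PQ" * 5 + "ABCDEF" + "XY" * 5 + "ABCDEF"
stats = reuse_stats(TRACE)
print(stats["A"], stats["U"])  # (4, 16.0, 0.0) (5, 2.0, 0.0)
```

On the slide's trace both the regular block A and the bursty block U have zero variance, so the variance alone does not separate them; TD(A) combines it with LR_A.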
Iterative partition and layout
Func Partition(RB, IRB)
    // Sort nodes in RB by instruction density, highest density first
    Sort nodes in RB by instruction density
    RB_SIZE = Calc_rb_size(RB)
    IRB_SIZE = Calc_irb_size(IRB)
    // Adjust the partition until both parts fit in the cache
    while (RB_SIZE + IRB_SIZE > CACHE_SIZE) {
        Adjust(RB, IRB)
        RB_SIZE = Calc_rb_size(RB)
        IRB_SIZE = Calc_irb_size(IRB)
    }
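The slide does not define RB, IRB, `Calc_rb_size`, `Calc_irb_size` or `Adjust`, so the runnable sketch below fills them in with explicitly hypothetical choices: RB holds blocks that get exclusive cache lines (its size is the sum of their sizes), IRB holds blocks that overlap in one shared region (its size is that of the largest IRB block), and `adjust` demotes the least dense RB block into IRB. It shows the shape of the iterative loop, not the authors' actual heuristics.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    size: int              # size in cache lines
    instr_density: float   # hypothetical density metric


def calc_rb_size(rb):
    # Assumption: every RB block occupies its own cache lines.
    return sum(b.size for b in rb)


def calc_irb_size(irb):
    # Assumption: IRB blocks overlap in a shared region as large as the biggest one.
    return max((b.size for b in irb), default=0)


def adjust(rb, irb):
    # Assumption: demote the least dense RB block (last after sorting) to IRB.
    irb.append(rb.pop())


def partition(rb, irb, cache_size):
    rb.sort(key=lambda b: b.instr_density, reverse=True)  # highest density first
    while calc_rb_size(rb) + calc_irb_size(irb) > cache_size:
        if not rb:
            raise ValueError("shared region alone exceeds the cache size")
        adjust(rb, irb)
    return rb, irb
```

The explicit guard is a design choice of this sketch: without it, the loop would never terminate if the shared region alone could not fit in the cache.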
Experiments and results (1)
[Figure: bar chart of I-cache miss reduction (0% to 100%) for H264 enc, H264 dec, AVSM dec, MPEG4 dec and G729.A under three configurations: BB reorder; BB reorder + layout; BB reorder + pu split + layout]
Cumulative effect of optimizations on I-cache miss reduction
Experiments and results (2)
[Figure: bar chart of I-cache miss reduction (0% to 100%) for H264 enc, H264 dec, AVSM dec, MPEG4 dec and G729.A, comparing TD, PH and TRG]
Reduction of I-cache misses by TD (temporal distribution), PH (Pettis and Hansen) and TRG (temporal relation graph).
Experiments and results (3)
[Figure: bar chart of I-cache miss reduction (0% to 80%) comparing TD, PH and TRG for the inputs HE:stefan, HE:akiyo, HE:football, HD:stefan, HD:akiyo and HD:football]
H264 codec I-cache miss reduction by TD, PH and TRG with various inputs
Future work
Improve the current iterative partition algorithm
Incorporate more cache configuration parameters into the layout algorithm, e.g. cache line size, L2 cache …
Develop an effective software cache partition method for multi-threaded programs on our memory hierarchy
Thank You!