Presentation - IBM Research

Code Optimization Technologies
IBM Research – Haifa
Code Alignment for Architectures with Pipeline Group Dispatching
Helena Kosachevsky, Gadi Haber, Omer Boehm
© 2010 IBM Corporation
Agenda
• Background
• Code alignment algorithm
– General concepts, code chains
– Genetic algorithm
• Code alignment for Power 6
– Architecture specifics
– Evaluation strategies
• Results
Background
• Proper code placement strongly impacts
– instruction cache performance
– branch prediction
– the instruction fetch mechanism
• Previous work
– few authors treat code alignment as the placement of code chains without reordering, using padding of a chosen size
– alignment is typically used as a complementary optimization, with mixed results
• We propose a generic, profile-guided optimization algorithm that produces a stable performance gain
Code Alignment Algorithm – general concepts
• A code chain is a code sequence that executes more or less contiguously, with no significant differences in the execution frequency of its instructions
• A chain satisfies one of the following properties:
– It terminates with an unconditional jump or a branch via register
– It terminates with a conditional branch whose fall-through is taken infrequently
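The two termination properties can be sketched as a single pass over basic blocks in layout order. This is only an illustration under stated assumptions, not the FDPR-Pro implementation: the `Block` record, the terminator labels, and the 10% fall-through threshold are all hypothetical names and values introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    # Hypothetical labels: "fallthrough", "cond", "jump", "indirect".
    terminator: str
    # Profile-derived probability that execution falls through to the next block.
    fallthrough_prob: float = 1.0


def ends_chain(block, threshold=0.10):
    """True if the block satisfies one of the two chain-termination properties.

    The threshold for "taken infrequently" is an assumption of this sketch.
    """
    if block.terminator in ("jump", "indirect"):
        return True  # unconditional jump or branch via register
    if block.terminator == "cond" and block.fallthrough_prob < threshold:
        return True  # conditional branch with a rarely taken fall-through
    return False


def split_into_chains(blocks, threshold=0.10):
    """Group consecutive blocks into chains, closing a chain at each terminator."""
    chains, current = [], []
    for block in blocks:
        current.append(block)
        if ends_chain(block, threshold):
            chains.append(current)
            current = []
    if current:  # trailing blocks that never hit a terminating branch
        chains.append(current)
    return chains
```

For example, blocks `a` (fall-through), `b` (conditional, fall-through probability 0.9), and `c` (jump) would form one chain, because only `c` ends it.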
Code Alignment Algorithm – working on chains
• Aligns each chain by inserting non-executable padding between the chains
• Working on chains rather than basic blocks limits code inflation
• Profile data lets the algorithm focus on frequently executed chains, avoiding long run time and code inflation
• Tries to determine the best position for each given chain
Code Chains and Around
[Figure: memory layout with address marks 0x100, 0x120, 0x140, 0x160 and the instruction buffer boundary indicated. Chain 1 starts at an alignment offset of 3 instructions; a gap of 4 instructions separates Chain 1 from Chain 2.]
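The relation between an alignment offset and the gap inserted before a chain can be written as a short calculation. The sketch below assumes 4-byte instructions and an 8-instruction instruction buffer; the function name and its parameters are hypothetical and chosen for the example.

```python
INSN_BYTES = 4    # assumed fixed instruction width
IBUFF_INSNS = 8   # assumed instructions per instruction buffer


def padding_for_offset(prev_end_addr, offset):
    """Padding (in instructions) to insert after the previous chain so that
    the next chain starts `offset` instructions past an ibuff boundary.

    prev_end_addr is the address just past the last instruction of the
    previous chain.
    """
    if not 0 <= offset < IBUFF_INSNS:
        raise ValueError("offset must be in 0..7")
    current_slot = (prev_end_addr // INSN_BYTES) % IBUFF_INSNS
    return (offset - current_slot) % IBUFF_INSNS
```

If the previous chain ends at 0x130 (slot 4 of its buffer), placing the next chain at offset 0 requires `padding_for_offset(0x130, 0)`, that is, 4 padding instructions, so the next chain starts at 0x140.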
Code Alignment Algorithm – filtering alignment options
• The algorithm works in phases; in each phase a different measure determines the best alignment alternatives
• First, an initial set of alignment options is defined
• This set is then filtered in several steps, with a different filter at each step
• These filters, or evaluation strategies, are specific to the architecture and model how performance depends on code placement
• The strategies are applied in a predefined priority order. Each filter is applied only to the options that survived the previous filters, so its result refines the previous result rather than overriding it
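The phased filtering can be sketched as a loop over penalty functions in priority order. This is a minimal illustration, assuming that each filter keeps exactly the options with the lowest penalty among the current survivors; the slides do not say how many options a filter keeps, and the penalty values below are made-up toy numbers.

```python
def refine(options, strategies):
    """Apply penalty functions in priority order.

    Each strategy sees only the survivors of the previous ones and keeps
    those with the lowest penalty (an assumption of this sketch).
    """
    survivors = list(options)
    for penalty in strategies:
        if len(survivors) <= 1:
            break  # nothing left to refine
        scores = {opt: penalty(opt) for opt in survivors}
        best = min(scores.values())
        survivors = [opt for opt in survivors if scores[opt] == best]
    return survivors


# Toy penalties per alignment offset (made-up numbers).
first = {0: 5, 1: 4, 2: 4, 3: 6, 4: 4, 5: 7, 6: 5, 7: 4}
second = {0: 0, 1: 3, 2: 1, 3: 0, 4: 1, 5: 0, 6: 0, 7: 2}

result = refine(range(8), [first.get, second.get])  # [2, 4]
```

Offset 0 has the lowest penalty under the second strategy, but it was already eliminated by the first, higher-priority one, so it cannot come back: the second filter only refines the first filter's result.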
Power 6 Pipeline
The generic pipeline stages of instruction processing:
• Fetch: Instructions are copied from the instruction cache or memory into the fetch buffer.
• Decode: Instructions in the fetch buffer are interpreted.
• Dispatch: Instructions are sent to the appropriate execution units.
• Execute: The operations indicated by the instructions are carried out in the execution units.
• Complete: At the end of execution, the result of instructions can be forwarded to other pending instructions while the result awaits write back.
• Write Back: The results of execution are written to the architected register, cache or memory in program order, and any exceptions are recognized.
Alignment for Power 6
In-order architecture, static dispatch grouping
Very sensitive to code alignment
Alignment for Power 6 – architecture specifics
• The fetch buffer contains 8 instructions
• Instructions that are not to be executed are discarded
• “Good” instructions are delivered for dispatch
• Dispatch groups are formed; one group is executed each cycle
• A new dispatch group starts on the instruction buffer boundary
Alignment for Power 6 – evaluation strategies
Start from 8 possible alignment options (offsets 0 to 7 within the instruction buffer). Filter them by:
1. dispatch groups
- minimize the number of dispatch groups formed within the chain, weighted by their execution frequency
Alignment Evaluation Using Grouping Analysis
[Figure: a chain placed at each of the 8 alignment offsets (address marks 0x00, 0x20, 0x40, 0x60), with grouping analysis performed for each placement.]
offset   groups
0        8
1        8
2        7
3        6
4        7
5        7
6        7
7        6
Penalty for an offset is the sum of the execution counters of each created group.
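The grouping analysis can be sketched with a deliberately simplified model. It assumes that a dispatch group ends only at an instruction buffer boundary or when it reaches a size limit, and that a group's execution counter is the profile count of its first instruction. The limit `max_group=5` and all function names are placeholders for the example; the actual Power 6 group formation rules are more involved and are not modeled here.

```python
IBUFF_INSNS = 8  # instructions per instruction buffer


def form_groups(n_insns, offset, max_group=5):
    """Split instruction indices 0..n_insns-1 into dispatch groups for a
    chain that starts `offset` slots into an instruction buffer."""
    groups, current = [], []
    for i in range(n_insns):
        slot = (offset + i) % IBUFF_INSNS
        # A new group starts on the ibuff boundary or when the group is full.
        if current and (slot == 0 or len(current) == max_group):
            groups.append(current)
            current = []
        current.append(i)
    if current:
        groups.append(current)
    return groups


def grouping_penalty(counts, offset, max_group=5):
    """Sum of the execution counters of each created group, taking the
    count of a group's first instruction as its counter (an assumption)."""
    groups = form_groups(len(counts), offset, max_group)
    return sum(counts[group[0]] for group in groups)


def best_offsets(counts, max_group=5):
    """Offsets with the lowest grouping penalty among the 8 options."""
    penalties = {off: grouping_penalty(counts, off, max_group)
                 for off in range(IBUFF_INSNS)}
    best = min(penalties.values())
    return [off for off, p in penalties.items() if p == best]
```

Under this model, a 4-instruction chain executed 10 times per instruction has penalty 10 at offsets 0 to 4, where it fits in one buffer, and penalty 20 at offsets 5 to 7, where the buffer boundary splits it into two groups.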
Alignment for Power 6 – evaluation strategies
2. hot targets
- align the targets of frequently executed branch instructions, that is, targets with high incoming control flow
- Best case – the hot target is placed at the beginning of the instruction buffer (ibuff)
- Worst case – the first instruction of the hot target is the last instruction of the ibuff
  - this forms a dispatch group with 1 instruction
  - it is the only executed instruction of that ibuff
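One way to turn the best and worst cases into a score is to penalize each hot target by its distance from the start of its instruction buffer, weighted by how often control arrives there. The linear weighting, the function name, and the `(index, count)` input format are assumptions of this sketch; the slides only fix the two extremes.

```python
IBUFF_INSNS = 8  # instructions per instruction buffer


def hot_target_penalty(chain_offset, targets):
    """Penalty of placing a chain at `chain_offset` slots into an ibuff.

    targets: list of (index_in_chain, incoming_count) pairs for the hot
    branch targets inside the chain. Slot 0 (best case) costs nothing;
    slot 7 (worst case) costs the most.
    """
    total = 0
    for index, count in targets:
        slot = (chain_offset + index) % IBUFF_INSNS
        total += slot * count
    return total


def best_hot_target_offsets(targets):
    """Offsets with the lowest hot-target penalty among the 8 options."""
    penalties = {off: hot_target_penalty(off, targets)
                 for off in range(IBUFF_INSNS)}
    best = min(penalties.values())
    return [off for off, p in penalties.items() if p == best]
```

For a chain whose only hot target is its fourth instruction (index 3), the preferred chain offset is 5, since that places the target on the next ibuff boundary.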
Alignment Evaluation by Aligned Hot Targets
[Figure: Chain 1 and Chain 2 between address marks 0x100 and 0x160. A gap is inserted between the chains to place the hot (frequently taken) target on the ibuff boundary.]
Alignment for Power 6 – evaluation strategies
Other possible strategies:
• Branch instruction alignment
• Dispatch stall reduction
Results
• The algorithm was implemented in IBM FDPR-Pro, a profile-based post-link optimizer
• In some cases of extremely bad initial code alignment, an improvement of up to 40% is achieved
• Stable performance gain on SPEC 2006 INT64 benchmarks running on AIX 6.1 on Power 6: applied on top of the standard FDPR-Pro O3 optimization set, the algorithm showed up to 5% improvement
Results – SPEC 2006
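[Figure: bar chart "Alignment - % improvement over base". Y-axis from -2 to 12. Series: O3 with no alignment, O3 with new alignment, SD %. X-axis benchmarks, as recovered from the labels: sjeng, namd, mcf, lbm, hmmer, h264ref, gromacs, cactusADM, bzip2, bwaves. Bar values are not recoverable from the extracted text.]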
Thanks!