
Evaluation of the Gini-index for Studying Branch Prediction Features
Veerle Desmet
Lieven Eeckhout
Koen De Bosschere
A simple prediction example
[Diagram: features such as outlook, t°, windy, season, ... feed a prediction mechanism that outputs an umbrella prediction; the mechanism learns from past observations. Goal = prediction accuracy of 100%.]
A simple prediction example
• Daily prediction
• Binary prediction: yes or no
• Outcome known in the evening
• Prediction strategies:
– No umbrella needed in summer, yes otherwise
• Easy, not very accurate
– Based on humidity and temperature
• More complex, very accurate
Predicting
• How to improve prediction accuracy?
• Shortcomings of existing models?
– Feature set
– Prediction mechanism
– Implementation limits
– ...
• This talk:
evaluation of prediction features
for branch prediction
Program execution
• Phases during instruction execution:
Fetch, Decode, Execute, Write Back
• Fetch = read next instruction
• Decode = analyze type and read operands
• Execute = perform the computation
• Write Back = write result
[Diagram: the instruction R1=R2+R3 is fetched as an addition, its operands 4 and 3 are read during decode, the computation is executed, and the write-back leaves R1 containing 7.]
Pipelined architectures
Parallel versus sequential:
[Diagram: the stages Fetch, Decode, Execute, and Write Back operate in parallel on a stream of instructions (R1=R2+R3, R5=R2+1, R4=R3-1, R7=2*R1, R5=R6, R1=4, ...), each instruction advancing one stage per cycle.]
• Constant flow of instructions possible
• Faster applications
• Limitation due to branches
Branches
• Branches determine program
flow or execution path
• Introduce 2 bubbles affecting
pipeline throughput
[Diagram: the conditional branch "test R1=0" selects between two paths: if not taken, R7=2*R1 follows; if taken, R7=0 and R2=R2-1 follow. In the pipeline, the instructions after the test cannot be fetched until the test resolves in the Execute stage, leaving two bubbles.]
Solution
• 1 out of 8 instructions is a branch
• Waiting for the outcome of branches
seriously limits the amount of parallelism
• Worse with an increasing number of pipeline stages
– Pentium 4: up to 20 stages
⇒ Predict the outcome of the branch
Branch prediction
• Fetch those instructions that
are likely to be executed
• Correct prediction eliminates
bubbles
[Diagram: the same control-flow example, now with the branch predicted taken: R2=R2-1 and R7=0 are fetched right after the test, so a correct prediction keeps the pipeline full.]
Branch prediction
• Prediction for each branch execution
• Binary prediction: taken or not-taken
• Outcome known after the test is executed
• Prediction strategies:
– Many predictors in literature
– Static versus dynamic
Static branch prediction
• BTFNT: Backward Taken, Forward Not Taken
– Loops (e.g. for, while)
– Analogy: no umbrella needed in summer
• Based on the type of test in the branch
– Branch-if-equal is mostly not-taken
– Analogy: no umbrella needed on Sundays
• Easy, prediction fixed at compile-time
• Prediction accuracy: about 75%
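As a sketch (not from the talk itself), the BTFNT heuristic fits in a single comparison; the addresses below are hypothetical:

    def predict_btfnt(branch_pc: int, target_pc: int) -> bool:
        # Backward branches (loop back-edges) are predicted taken,
        # forward branches are predicted not-taken.
        return target_pc < branch_pc

    assert predict_btfnt(0x4008, 0x4000)      # loop back-edge: taken
    assert not predict_btfnt(0x4008, 0x4010)  # forward skip: not-taken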
Dynamic branch prediction
• Bimodal
• Global
• Gshare
• Local
Simulations:
• SimpleScalar/Alpha
• SPEC2000 integer benchmarks
• 250M branches
Bimodal branch predictor
[Diagram: the branch address indexes a table of 2-bit saturating counters; the counter value (e.g. 3) gives the prediction (e.g. taken) and is updated with the actual outcome (e.g. taken). Analogy: averaging the outcomes from previous years.]
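A minimal sketch of such a bimodal predictor, assuming 2-bit saturating counters and an illustrative table size (neither is fixed by the slide):

    class BimodalPredictor:
        def __init__(self, index_bits=10):
            self.mask = (1 << index_bits) - 1
            self.counters = [1] * (1 << index_bits)  # 2-bit counters, weakly not-taken

        def predict(self, branch_pc):
            # Counter values 2 and 3 mean "predict taken".
            return self.counters[branch_pc & self.mask] >= 2

        def update(self, branch_pc, taken):
            i = branch_pc & self.mask
            if taken:
                self.counters[i] = min(self.counters[i] + 1, 3)  # saturate at 3
            else:
                self.counters[i] = max(self.counters[i] - 1, 0)  # saturate at 0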
Global branch predictor
[Diagram: a global history register with the outcomes of the most recent branches (e.g. 0111, becoming 1111 after a taken outcome) indexes the table of 2-bit saturating counters; the counter (e.g. 3) gives the prediction (e.g. taken) and is updated with the outcome. Analogy: averaging the outcomes of the last days.]
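The global variant keeps the same counters but indexes them with a history register; a sketch assuming 4 history bits, as in the slide's 0111 example:

    class GlobalPredictor:
        def __init__(self, history_bits=4):
            self.mask = (1 << history_bits) - 1
            self.history = 0b0111  # outcomes of the last 4 branches
            self.counters = [1] * (1 << history_bits)

        def predict(self):
            return self.counters[self.history] >= 2

        def update(self, taken):
            c = self.counters[self.history]
            self.counters[self.history] = min(c + 1, 3) if taken else max(c - 1, 0)
            # Shift the newest outcome in: 0111 becomes 1111 after a taken branch.
            self.history = ((self.history << 1) | int(taken)) & self.mask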
Gshare branch predictor
[Diagram: the branch address is XORed with the global history (e.g. 1010) to index the 2-bit saturating counters; the counter (e.g. 2) gives the prediction (e.g. taken) and is updated with the outcome. Used in the AMD K6.]
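The gshare index simply combines the two previous indices; a sketch using the slide's 4-bit history 1010 and a hypothetical branch address:

    def gshare_index(branch_pc, global_history, index_bits=4):
        # XORing address and history spreads branches that share a
        # history pattern over different counters, reducing aliasing.
        mask = (1 << index_bits) - 1
        return (branch_pc ^ global_history) & mask

    print(format(gshare_index(0b0110, 0b1010), '04b'))  # -> 1100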
Local branch predictor
[Diagram: the branch address indexes a table of per-branch local histories (e.g. 1111); the local history in turn indexes the 2-bit saturating counters (e.g. 2), which give the prediction. Analogy: record the outcomes of the same day in previous years and average over identical day histories.]
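A sketch of the two-level structure, with illustrative table sizes: a first table keeps a short history per branch, and that history selects the saturating counter:

    class LocalPredictor:
        def __init__(self, addr_bits=10, history_bits=4):
            self.addr_mask = (1 << addr_bits) - 1
            self.hist_mask = (1 << history_bits) - 1
            self.histories = [0] * (1 << addr_bits)    # level 1: per-branch history
            self.counters = [1] * (1 << history_bits)  # level 2: 2-bit counters

        def predict(self, branch_pc):
            h = self.histories[branch_pc & self.addr_mask]
            return self.counters[h] >= 2

        def update(self, branch_pc, taken):
            a = branch_pc & self.addr_mask
            h = self.histories[a]
            c = self.counters[h]
            self.counters[h] = min(c + 1, 3) if taken else max(c - 1, 0)
            self.histories[a] = ((h << 1) | int(taken)) & self.hist_mask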
Accuracy versus storage
[Chart: prediction accuracy (%) from 75 to 100 versus predictor size in bytes (log scale, 1 to 100,000) for the gshare, local, bimodal, and global predictors.]
Branch prediction strategies
• All use a saturating-counter mechanism
• All use tables of limited size
– leading to so-called aliasing
• Different prediction features
• Accuracies up to 95%
• Further improvement?
• Predictive power of features?
Feature selection
[Diagram: which features should feed the prediction mechanism?]
• Which features are relevant?
• Fewer features
– require less storage
– faster prediction
Systematic feature evaluation
• Feature = input to predictor
• Power of features
– predictor size not fixed
– prediction strategy not fixed
• Decision trees:
– Selects a feature
– Splits the observations
– Recursive algorithm
– Easy to understand
Decision Tree Construction
Past observations (features and outcome):

Outlook    t°    windy | umbrella?
sunny      high  no    | no
sunny      low   yes   | yes
overcast   high  no    | no
overcast   low   no    | no
overcast   high  yes   | yes
overcast   low   yes   | yes
rain       low   no    | yes
rain       high  yes   | yes

Resulting prediction mechanism:
windy = yes → YES
windy = no  → outlook = sunny    → NO
              outlook = overcast → NO
              outlook = rain     → YES
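The constructed tree amounts to a small lookup; this sketch reproduces the table above exactly:

    def predict_umbrella(outlook, temperature, windy):
        # Tree from the slide: split on 'windy' first, then on 'outlook'
        # for the non-windy days; temperature is never consulted.
        if windy:
            return True
        return outlook == "rain"

    observations = [
        ("sunny", "high", False, False), ("sunny", "low", True, True),
        ("overcast", "high", False, False), ("overcast", "low", False, False),
        ("overcast", "high", True, True), ("overcast", "low", True, True),
        ("rain", "low", False, True), ("rain", "high", True, True),
    ]
    # The tree classifies every past observation correctly:
    assert all(predict_umbrella(o, t, w) == y for o, t, w, y in observations)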
Gini-index
Metric for the partition purity of a data set S:

Gini(S) = 1 − p0² − p1²

where pi is the relative frequency of class i in S.
For binary prediction: minimum 0, maximum 0.5.
The higher the Gini-index, the more difficult to predict.
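A direct transcription into code, with the two boundary cases:

    def gini(labels):
        # Gini(S) = 1 - p0^2 - p1^2 for a binary-labelled set S.
        p1 = sum(labels) / len(labels)
        return 1 - (1 - p1) ** 2 - p1 ** 2

    print(gini([True, True, True, True]))    # 0.0: pure, trivial to predict
    print(gini([True, True, False, False]))  # 0.5: maximally impure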
Finding good split points
• If data set S is split into two subsets S0 and S1
with sizes N0 and N1 (N = N0 + N1):

Ginisplit(S) = (N0/N)·Gini(S0) + (N1/N)·Gini(S1)

• Feature with the lowest Ginisplit is chosen
• Extensible to non-binary features
• Looking for features with a low Ginisplit-index,
i.e. features with good predictive power
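Applied to the weather table of the construction slide, this sketch confirms that "windy" makes a much better first split than temperature:

    def gini(labels):
        p1 = sum(labels) / len(labels)
        return 1 - (1 - p1) ** 2 - p1 ** 2

    def gini_split(s0, s1):
        # Ginisplit(S) = N0/N * Gini(S0) + N1/N * Gini(S1)
        n = len(s0) + len(s1)
        return len(s0) / n * gini(s0) + len(s1) / n * gini(s1)

    # Umbrella labels of the 8 observations, split on 'windy':
    windy_yes = [True, True, True, True]    # every windy day needed one
    windy_no = [False, False, False, True]  # only the rainy day did
    print(gini_split(windy_yes, windy_no))  # 0.1875: low, good split

    # Splitting on temperature mixes the classes badly:
    t_high = [False, False, True, True]
    t_low = [True, False, True, True]
    print(gini_split(t_high, t_low))        # 0.4375: high, poor split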
Ginisplit-index
Individual feature bits
[Chart: Ginisplit-index (0 to 0.5) per individual feature bit for global history, branch address, gshare-index, target direction, branch type, ending type, successor basic block, and local history, grouped into dynamic and static features.]
Individual features
• Local history bits very good
– perfect local history uses branch address
• Static features powerful
– non-binary
– except target direction
– known at compile-time
• Looking for good feature combinations...
Ginisplit-index
Features as used in predictors
[Chart: Ginisplit-index (0 to 0.5) versus feature length (0 to 20 bits) for the gshare-index, branch address, global history, and local history.]
Features as used in predictors
• Static features better at small lengths
• Longer features do better
• A few local history bits are enough
• Same behaviour as the accuracy curves
– a low Gini-index implies high accuracy
• Independent of predictor size
• Independent of prediction strategy
Remark
• Limitation of decision
trees: outliers
– majority vote
– clean data
Outlook  t°    windy | umbrella?
sunny    high  no    | no
sunny    high  no    | yes   ← outlier
sunny    high  no    | no
• Keep implementation in
mind
Conclusion
• Need for accurate branch prediction in
modern microprocessors
• Towards systematic predictor
development
– Selecting features
– Predictive power of features
• Gini-index useful for studying branch
prediction features
– without fixing any predictor aspect
Thanks for Listening