VLSI Project:
Neural-Network-Based Branch Prediction

Alexander Zlotnik
Marcel Apfelbaum
Supervised by: Michael Behar, Spring 2005
Introduction

• Branch prediction has always been a "hot" topic:
  about 20% of all instructions are branches
• A correct prediction makes execution faster
• A misprediction has a high cost

• Classic predictors are based on 2-bit counter state machines:

  [State diagram: four saturating-counter states
   00 SNT (strongly not-taken), 01 WNT (weakly not-taken),
   10 WT (weakly taken), 11 ST (strongly taken);
   a taken branch moves the counter toward ST,
   a not-taken branch moves it toward SNT.]
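The classic scheme above can be sketched in C; this is a minimal illustration (not the project's code), with state encodings taken from the diagram:

```c
#include <assert.h>

/* 2-bit saturating counter states, as in the diagram:
   00 = strongly not-taken (SNT), 01 = weakly not-taken (WNT),
   10 = weakly taken (WT),        11 = strongly taken (ST).   */
typedef unsigned char counter2_t;

/* Predict taken when the counter is in one of the two "taken" states. */
static int predict(counter2_t c) { return c >= 2; }

/* Saturating update: move toward ST on taken, toward SNT on not-taken. */
static counter2_t update(counter2_t c, int taken) {
    if (taken  && c < 3) c++;
    if (!taken && c > 0) c--;
    return c;
}
```

The hysteresis of the two weak states is what makes the counter tolerate a single anomalous outcome (e.g. a loop exit) without flipping its prediction.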
VLSI Project Spring 2005
Introduction (cont.)

• Modern predictors are two-level and use 2-bit counters together with branch history (local\global)
• Known problems are:
  • Memory size is exponential in the history length
  • Too long a history can cause errors
• Recent studies explore branch prediction using neural networks
Project Objective

• Develop a mechanism for branch prediction
• Explore the practicability and applicability of such a mechanism, and measure its success rates
• Use a known neural-network technology: the perceptron
• Compare and analyze it against "old" predictors
Project Requirements

• Develop for the SimpleScalar platform to simulate OOOE processors
• Run the developed predictor on accepted benchmarks
• Implement in the C language
• No hardware-component equivalence needed; software implementation only
Background and Theory
Perceptron
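The figure for this slide did not survive extraction. The perceptron output it depicts is the standard weighted sum over the history bits (with inputs $x_i \in \{-1, +1\}$ and a bias weight $w_0$ whose input is fixed at 1):

```latex
y_{\mathrm{out}} = w_0 + \sum_{i=1}^{n} w_i x_i
```

The branch is predicted taken when $y_{\mathrm{out}}$ is non-negative, as the Algorithm slides below state.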
Background and Theory (cont.)

Perceptron Training
Let Θ = training threshold,
    t = 1 if the branch was taken, or -1 otherwise,
    x = history vector.

if (sign(yout) != t) or (|yout| <= Θ) then
    for i := 0 to n do
        wi := wi + t*xi
    end for
end if
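The training rule above can be sketched directly in C. This is an illustrative translation of the pseudocode, with an arbitrary example threshold; the weight and input conventions (w[0] is the bias, x[0] == 1, other x[i] in {-1, +1}) follow the slides:

```c
#include <stdlib.h>

#define THETA 8   /* training threshold (illustrative value) */

/* One perceptron training step, following the pseudocode above.
   w[0..n] are the weights (w[0] is the bias), x[0..n] the inputs
   with x[0] == 1 and x[i] in {-1, +1}; t is +1 for taken, -1 otherwise.
   y_out is the output the predictor computed for this branch. */
static void train(int *w, const int *x, int n, int y_out, int t) {
    int sign = (y_out >= 0) ? 1 : -1;
    if (sign != t || abs(y_out) <= THETA) {
        for (int i = 0; i <= n; i++)
            w[i] += t * x[i];
    }
}
```

Note that training fires not only on a misprediction but also on a correct prediction whose magnitude is still below Θ, so weights keep growing until the perceptron is confident.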
Development Stages

1. Studying the background
2. Learning the SimpleScalar platform
3. Coding a "dummy" predictor and using it to make sure we understand how branch prediction is handled in the SimpleScalar platform
4. Coding the perceptron predictor itself
5. Coding a perceptron behavior revealer
6. Benchmarking (smart environment)
7. A special study of our suggestion regarding perceptron predictor performance
Principles

• Branch prediction needs a learning methodology; NN provides it based on inputs and outputs (pattern recognition)
• As the history grows, the data structures of our predictor grow only linearly
• We use a perceptron to learn correlations between particular branch outcomes in the global history and the behavior of the current branch. These correlations are represented by the weights. The larger the weight, the stronger the correlation, and the more that particular branch in the history contributes to the prediction of the current branch. The input to the bias weight is always 1, so instead of learning a correlation with a previous branch outcome, the bias weight learns the bias of the branch, independent of the history.
Design and Implementation
Hardware budget

• History length
  A longer history length -> fewer perceptrons fit in the budget
• Threshold
  The threshold is a parameter of the perceptron training algorithm, used to decide whether the predictor needs more training.
• Representation of weights
  Weights are signed integers.
  Number of bits = 1 + floor(log2(Θ)).
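The budget trade-off above can be made concrete. A small sketch (illustrative, not from the slides; it assumes the bit-width rule just given and one bias weight per perceptron):

```c
#include <assert.h>

/* Bits per weight = 1 + floor(log2(theta)), per the rule above. */
static int weight_bits(int theta) {
    int bits = 1;
    while (theta > 1) { theta >>= 1; bits++; }
    return bits;
}

/* Each perceptron stores h history weights plus one bias weight, so a
   fixed budget of `budget_bits` bits holds this many table entries.   */
static int table_entries(long budget_bits, int h, int theta) {
    return (int)(budget_bits / ((long)(h + 1) * weight_bits(theta)));
}
```

Dividing a fixed budget this way shows the slide's point directly: growing h shrinks the perceptron table.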
Algorithm

• Fetch stage
1. The branch address is hashed to produce an index i ∈ 0..n-1 into the table of perceptrons.
2. The i-th perceptron is fetched from the table into a vector register of weights, P.
3. The value of y is computed as the dot product of P and the global history register.
4. The branch is predicted not taken when y is negative, and taken otherwise.
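The four fetch-stage steps can be sketched in C. Table size, history length, and the hash function here are illustrative assumptions, not the project's actual parameters:

```c
/* Fetch-stage sketch, following steps 1-4 above (illustrative sizes). */
#define NPERCEPTRONS 1024   /* table entries (assumption) */
#define HLEN         15     /* global history length (assumption) */

static int table[NPERCEPTRONS][HLEN + 1];  /* weights; [0] is the bias */
static int ghr[HLEN + 1];                  /* history as +1/-1; ghr[0] == 1 */

/* 1. Hash the branch address to an index into the perceptron table. */
static unsigned hash_pc(unsigned long pc) {
    return (unsigned)((pc >> 2) % NPERCEPTRONS);
}

/* 2-4. Fetch the i-th perceptron, compute y as the dot product of its
   weights P with the history register, and predict taken iff y >= 0. */
static int predict_branch(unsigned long pc, int *y_out) {
    int *P = table[hash_pc(pc)];
    int y = 0;
    for (int i = 0; i <= HLEN; i++)
        y += P[i] * ghr[i];
    *y_out = y;
    return y >= 0;  /* taken when y is not negative */
}
```

The computed y is kept around because the execution stage (next slide) needs its magnitude to decide whether to keep training.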
Algorithm (cont.)

• Execution stage
1. Once the actual outcome of the branch becomes known, the training algorithm uses this outcome and the value of y to update the weights in P (training).
2. P is written back to the i-th entry in the table.
Simulation Results

• In all configurations the perceptron-based predictor outperformed GSHARE
• Simulations were run over the VPR, Perl, and Parser benchmarks from ss_spec2k
Simulation Results (cont.)
[Charts: prediction rate on VPR.
 "GSHARE on VPR": x-axis is GHr size and memory, from "8, 512" to "17, 262144"; prediction rate grows from roughly 0.93 to 0.98.
 "Neural on VPR": x-axis is GHr/perceptrons and memory, from "15/64, 5760" to "15/2048, 184320"; prediction rate grows to roughly 0.988.]
Simulation Results (cont.)
[Charts: instructions per cycle (IPC) on VPR.
 "GSHARE on VPR": x-axis is GHr size and memory, from "8, 512" to "17, 262144"; IPC grows from roughly 1.80 to 1.88.
 "NEURAL on VPR": x-axis is GHr/perceptrons and memory, from "15/64, 5760" to "15/2048, 184320"; IPC is roughly 1.93-1.94.]
Simulation Results (cont.)
[Chart: "Perceptron Prediction by GHr": prediction rate vs. GHr size (10 to 30), with one curve per perceptron-table size (64, 256, 1024, and 2048); prediction rate ranges roughly 0.98-0.992.]
Special Problems

• Software simulation of hardware
  • Solved by utilizing the existing data structures of SimpleScalar
• Compiling self-written programs for SimpleScalar
  • After several weeks of hard work we decided to use accepted benchmarks instead
Summary

• We implemented a different branch prediction mechanism and obtained exciting results
• A hardware implementation of the mechanism is hard, but possible
• A longer history in the perceptron helps to get better predictions