Lecture 8. Branch Prediction COSC6385

COSC6385 Advanced Computer Architecture
Lecture 8. Branch Prediction
Instructor: Weidong Shi (Larry), PhD
Computer Science Department
University of Houston
Predict What?
• Direction (1-bit)
 Single direction for unconditional jumps and calls/returns
 Binary for conditional branches
• Target (32-bit or 64-bit addresses)
 Some are easy
   • One: Unidirectional jumps
   • Two: Fall through (Not Taken) vs. Taken
 Many: Function pointer or indirect jump (e.g. jr r31)
Categorizing Branches
Frequency of branch instructions (Source: H&P using Alpha)

Branch type          SPEC2000INT   SPEC2000FP
Conditional Branch   75%           82%
Jump                 6%            10%
Call/Return          19%           8%
Branch Misprediction
[Figure: 20-stage pipeline, stages numbered 1 to 20: PC, Next PC, Fetch, Drive, Alloc, Rename, Queue, Schedule, Dispatch, Reg File, Exec, Flags, Br Resolve]
• Single issue: the branch outcome is known only at Br Resolve, at the end of the pipeline
• Mispredict: flush the entailed instructions and refetch
• Fetch the correct path
• 8-issue superscalar processor (worst case): each flushed stage may hold up to 8 instructions
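To give a rough sense of scale for the worst case, the sketch below bounds the work thrown away by one mispredict. It assumes (this is not stated on the slide) that every stage ahead of the resolving stage is full of wrong-path instructions. The names `wasted_slots`, `resolve_stage` and `issue_width` are hypothetical.

```c
/* Rough upper bound on wrong-path instructions squashed by one mispredict.
 * Assumption: all stages before the resolving stage are full, and each
 * holds issue_width instructions fetched after the branch. */
unsigned wasted_slots(unsigned resolve_stage, unsigned issue_width)
{
    if (resolve_stage == 0)
        return 0;                       /* guard against unsigned wrap-around */
    return (resolve_stage - 1) * issue_width;
}
```

Under this model, a branch resolved in stage 20 costs up to `wasted_slots(20, 1)` = 19 instructions on a single-issue machine and up to `wasted_slots(20, 8)` = 152 on an 8-issue machine.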
Why Branch is Predictable?
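if (aa==2)
    aa = 0;
if (bb==2)
    bb = 0;
if (aa!=bb)
    ….

    addi r2, r0, 2
    bne  r10, r2, L_bb      # aa in r10: skip the assignment if aa != 2
    xor  r10, r10, r10      # aa = 0
L_bb:
    bne  r11, r2, L_xx      # bb in r11: skip the assignment if bb != 2
    xor  r11, r11, r11      # bb = 0
L_xx:
    beq  r10, r11, L_exit   # skip the body if aa == bb
    …
L_exit:

for (i=0; i<100; i++) {
    ….
}

    addi r10, r0, 100       # loop bound
    addi r1, r0, 0          # i = 0
L1:
    ……
    ……
    addi r1, r1, 1          # i++
    bne  r1, r10, L1        # loop back while i != 100
    ……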
Control Speculation
• Execute instructions beyond a branch before the
branch is resolved → Performance
• Speculative execution
• What if mis-speculated? Need
 Recovery mechanism
 Squash instructions on the incorrect path
• Branch prediction: Dynamic vs. Static
• What to predict?
Static Branch Prediction
• Uni-directional, always predict taken (or not
taken)
• Backward taken, Forward not taken
 Need offset information (when?)
• Compiler hints with branch annotation
 When will the info be available? Post-decode?
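The two hardware-only static schemes above can be sketched in a few lines. This is an illustration, not a definitive implementation: it assumes the branch carries a signed PC-relative offset, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unidirectional scheme: every branch is predicted taken. */
bool predict_always_taken(void)
{
    return true;
}

/* Backward taken, forward not taken.
 * A negative PC-relative offset means the target lies before the branch,
 * which is typical of a loop back-edge. */
bool predict_btfn(int32_t branch_offset)
{
    return branch_offset < 0;
}
```

The second function makes the "when?" question concrete: the offset is only known once the instruction bits have been fetched and at least partially decoded.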
Simplest Dynamic Branch Predictor
• Prediction based on latest outcome
• Index by some bits in the branch PC
 Aliasing
• How accurate?

for (i=0; i<100; i++) {
    ….
}

0x40010100        addi r10, r0, 100
0x40010104        addi r1, r0, 0
0x40010108  L1:   ……
                  ……
0x40010A04        addi r1, r1, 1
0x40010A08        bne  r1, r10, L1
                  ……

[Figure: 1-bit Branch History Table indexed by the branch PC; entries shown: NT, T, T, NT, T, …, T, NT, NT]
Typical Table Organization
[Figure: PC (32 bits) → Hash → N bits → table of 2^N entries → Prediction; Actual outcome → FSM Update Logic → table update]
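The hash in this organization can be as simple as selecting N low-order PC bits. The sketch below assumes 4-byte aligned instructions and N = 10; both values, and the name `bht_index`, are illustrative choices rather than anything fixed by the slide.

```c
#include <stdint.h>

#define INDEX_BITS 10u   /* N: assumed, giving a table of 2^10 entries */

/* Hypothetical hash: drop the 2 byte-offset bits of an aligned PC,
 * then keep the next N bits as the table index. */
uint32_t bht_index(uint32_t pc)
{
    return (pc >> 2) & ((1u << INDEX_BITS) - 1u);
}
```

Because only N bits are kept, two different branches can map to the same entry: with these settings, `0x40010A08` and `0x40011A08` both index entry `0x282`, which is the aliasing problem.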
Simplest Dynamic Branch Predictor
for (i=0; i<100; i++) {
    if (a[i] == 0) {
        …
    }
    …
}

0x40010100        addi r10, r0, 100
0x40010104        addi r1, r0, 0
0x40010108  L1:   add  r21, r20, r1
0x4001010c        lw   r2, (r21)
0x40010110        beq  r2, r0, L2
                  ……
0x40010210        j    L3
            L2:   ………
0x40010B0c  L3:   addi r1, r1, 1
0x40010B10        bne  r1, r10, L1

[Figure: 1-bit Branch History Table indexed by the branch PC; entries shown: NT, T, T, NT, T, …, T, NT, NT]
FSM of the Simplest Predictor
• A 2-state machine
• Changes its mind fast
[Figure: state 0 (predict not taken) and state 1 (predict taken); if the branch is taken, go to state 1; if the branch is not taken, go to state 0]
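The two-state machine amounts to remembering the last outcome. A minimal sketch, with the type and function names invented for illustration:

```c
#include <stdbool.h>

/* 1-bit FSM: the state is the latest outcome (true = taken). */
typedef struct {
    bool last_taken;
} OneBitPredictor;

/* Predict whatever the branch did last time. */
bool one_bit_predict(const OneBitPredictor *p)
{
    return p->last_taken;
}

/* A single contrary outcome flips the state, hence "changes its mind fast". */
void one_bit_update(OneBitPredictor *p, bool taken)
{
    p->last_taken = taken;
}
```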
Example Using 1-bit Branch History Table
for (i=0; i<4; i++) {
    ….
}

    addi r10, r0, 4
    addi r1, r0, 0
L1:
    ……
    addi r1, r1, 1
    bne  r1, r10, L1

Pred state:  0    1    1    1    1    0    1    1    1    1    0
Actual:      T    T    T    T    NT   T    T    T    T    NT   T
Correct?     no   yes  yes  yes  no   no   yes  yes  yes  no   no

Accuracy = 60% (3 of every 5 predictions correct)
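The 60% figure can be checked by replaying the outcome pattern from the trace (T T T T NT, repeated) through a 1-bit predictor. The helper below is a sketch; `one_bit_hits` is a hypothetical name and the trace is supplied by the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count correct predictions of a 1-bit predictor over a trace.
 * outcomes[i] is true for taken; state is the initial table bit. */
size_t one_bit_hits(const bool *outcomes, size_t n, bool state)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (state == outcomes[i])
            hits++;
        state = outcomes[i];    /* remember the latest outcome */
    }
    return hits;
}
```

Starting from state 0 and feeding two passes of T T T T NT gives 6 hits out of 10: the predictor misses once on loop exit and once more on re-entry.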
2-bit Saturating Up/Down Counter Predictor
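• MSB: Direction bit
• LSB: Hysteresis bit
• States
 11 / ST: Strongly Taken (predict taken)
 10 / WT: Weakly Taken (predict taken)
 01 / WN: Weakly Not Taken (predict not taken)
 00 / SN: Strongly Not Taken (predict not taken)
• Taken: count up, saturating at 11
• Not Taken: count down, saturating at 00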
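The counter's behavior can be written as two small functions. This is a sketch with hypothetical names; the encoding 0 = SN, 1 = WN, 2 = WT, 3 = ST follows the state labels above.

```c
#include <stdbool.h>
#include <stdint.h>

/* The MSB (direction bit) alone decides the prediction. */
bool counter_predict(uint8_t counter)
{
    return (counter & 2u) != 0;
}

/* Count up on taken, down on not taken, saturating at 3 and 0. */
uint8_t counter_update(uint8_t counter, bool taken)
{
    if (taken)
        return counter < 3 ? (uint8_t)(counter + 1) : 3;
    return counter > 0 ? (uint8_t)(counter - 1) : 0;
}
```

From ST, one not-taken outcome only moves the counter to WT, so the prediction stays "taken"; it takes two in a row to flip the direction bit.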
Example Using 2-bit up/down Counter
for (i=0; i<4; i++) {
    ….
}

    addi r10, r0, 4
    addi r1, r0, 0
L1:
    ……
    addi r1, r1, 1
    bne  r1, r10, L1

State legend: 3 = ST, 2 = WT, 1 = WN, 0 = SN

Pred state:  1    2    3    3    3    2    3    3    3    3    2
Actual:      T    T    T    T    NT   T    T    T    T    NT   T
Correct?     no   yes  yes  yes  no   yes  yes  yes  yes  no   yes

Accuracy = 80% (4 of every 5 predictions correct after warm-up)
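The same replay can be done for the 2-bit counter. The sketch below uses a hypothetical helper, `two_bit_hits`, and takes the initial counter value as a parameter so that warm-up and steady state can be compared.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count correct predictions of one 2-bit saturating counter over a trace.
 * counter starts in 0..3 (0 = SN, 1 = WN, 2 = WT, 3 = ST). */
size_t two_bit_hits(const bool *outcomes, size_t n, unsigned counter)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        bool predicted_taken = counter >= 2;
        if (predicted_taken == outcomes[i])
            hits++;
        if (outcomes[i]) {
            if (counter < 3) counter++;     /* saturate at ST */
        } else {
            if (counter > 0) counter--;     /* saturate at SN */
        }
    }
    return hits;
}
```

With two passes of T T T T NT, starting from WN (1) gives 7 hits out of 10 because of the cold first prediction; starting from WT (2) gives 8 out of 10, which is the 80% steady-state rate. Only the loop exit is mispredicted.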
Bimodal Branch Prediction
[Figure: N bits of the PC address index a table of 2^N entries, each entry holding a 2-bit counter; the selected counter gives the prediction; FSM update logic uses the actual outcome for the table update]
• 2^N entries addressed by N bits of the PC
• Each entry keeps a counter (2-bit or more) for prediction
• Counter update: the same as the 2-bit counter
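Putting the table and the counters together gives a small bimodal predictor. The sketch below assumes N = 4, 4-byte instructions, and counters initialized to weakly not taken; these settings and all identifiers are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIMODAL_BITS 4u                      /* N, kept tiny for illustration */
#define BIMODAL_SIZE (1u << BIMODAL_BITS)    /* 2^N entries */

typedef struct {
    uint8_t counter[BIMODAL_SIZE];           /* 2-bit counters, values 0..3 */
} Bimodal;

/* Assumed index function: N bits of the PC above the byte offset. */
static uint32_t bimodal_index(uint32_t pc)
{
    return (pc >> 2) & (BIMODAL_SIZE - 1u);
}

void bimodal_init(Bimodal *b)
{
    for (uint32_t i = 0; i < BIMODAL_SIZE; i++)
        b->counter[i] = 1;                   /* weakly not taken */
}

bool bimodal_predict(const Bimodal *b, uint32_t pc)
{
    return b->counter[bimodal_index(pc)] >= 2;
}

void bimodal_update(Bimodal *b, uint32_t pc, bool taken)
{
    uint8_t *c = &b->counter[bimodal_index(pc)];
    if (taken) {
        if (*c < 3) (*c)++;
    } else {
        if (*c > 0) (*c)--;
    }
}
```

A typical use is to call `bimodal_predict` at fetch and `bimodal_update` with the same PC once the branch resolves.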
Global vs. Local Branch History
• Local Behavior
 What is the predicted direction of Branch A given the
outcomes of previous instances of Branch A?
• Global Behavior
 What is the predicted direction of Branch Z given the
outcomes of all* previous branches A, B, …, X and Y?
* number of previous branches tracked limited by the history
length
Branch Correlation
Code Snippet
if (aa==2)        // b1
    aa = 0;
if (bb==2)        // b2
    bb = 0;
if (aa!=bb) {     // b3
    …….
}

[Figure: decision tree over b1 and b2 leading to b3; 1 (T), 0 (NT)]

Path   b1-b2   aa      bb
A      1-1     aa=0    bb=0
B      1-0     aa=0    bb≠2
C      0-1     aa≠2    bb=0
D      0-0     aa≠2    bb≠2
• Branch direction
 Not independent
 Correlated to the path taken
• Example: on path 1-1 (aa=0 and bb=0), the outcome of b3 is
known beforehand, since aa!=bb must be false
• Track path using a 2-bit register
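The correlation can be checked directly by running the snippet and recording which way each condition went. In this sketch an outcome of 1 means the `if` condition was true, following the 1 (T) labeling in the figure; `Outcomes`, `run_snippet` and `path_of` are hypothetical names.

```c
#include <stdbool.h>

typedef struct {
    bool b1, b2, b3;    /* true when the corresponding condition holds */
} Outcomes;

/* Execute the code snippet and record the three branch outcomes. */
Outcomes run_snippet(int aa, int bb)
{
    Outcomes o;
    o.b1 = (aa == 2);
    if (o.b1) aa = 0;
    o.b2 = (bb == 2);
    if (o.b2) bb = 0;
    o.b3 = (aa != bb);
    return o;
}

/* 2-bit path register: b1 in the high bit, b2 in the low bit. */
unsigned path_of(Outcomes o)
{
    return ((unsigned)o.b1 << 1) | (unsigned)o.b2;
}
```

Whenever `path_of` returns 3 (path 1-1), `b3` is false regardless of the input values. On the other three paths `b3` still depends on the data, for example `run_snippet(2, 5)` and `run_snippet(2, 0)` both follow path 1-0 but disagree on `b3`.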
Why Global Correlations Exist
• Example: related branch conditions
A:  p = findNode(foo);
    if ( p is parent )
        do something;
    do other stuff; /* may contain more branches */
B:  if ( p is a child )
        do something else;

 Outcome of the second branch is always the opposite of the first branch
Global Branch History Register
Code Snippet
if (aa==2)        // b1
    aa = 0;
if (bb==2)        // b2
    bb = 0;
if (aa!=bb) {     // b3
    …….
}

Branch    Actual   BHR after
(start)            000
b1        T        001
b2        T        011
b3        NT       110
• An N-bit Shift Register
• Shift-in branch outcomes
 1 taken
 0  not taken
• First-in First-Out
• BHR can be
 Global
 Local (Per-address)
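The shift-register update is a one-liner. The sketch below assumes the history fits in a 32-bit word with at most 31 history bits; `bhr_shift` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shift one branch outcome into an n_bits-wide history register.
 * Assumes 1 <= n_bits <= 31. The oldest outcome falls off the top. */
uint32_t bhr_shift(uint32_t bhr, bool taken, unsigned n_bits)
{
    uint32_t mask = (1u << n_bits) - 1u;
    return ((bhr << 1) | (taken ? 1u : 0u)) & mask;
}
```

With a 3-bit register starting at 000, shifting in T, T, NT produces 001, 011, 110, matching the sequence for the code snippet.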
Local Branch History Register
for (i=0; i<4; i++) {
    ….
}

    addi r10, r0, 4
    addi r1, r0, 0
L1:
    ……
    addi r1, r1, 1
    bne  r1, r10, L1

Actual    BHR after
(start)   000
T         001
T         011
T         111
T         111
NT        110
T         101
T         011
T         111
T         111
NT        110
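A local (per-address) scheme keeps one such history register per branch, selected by PC bits, so that other branches do not disturb the pattern. The following is a minimal sketch; the table size, the index function and all identifiers are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define LHT_ENTRIES  16u   /* assumed number of per-address history registers */
#define HISTORY_BITS 3u    /* history length used in the trace */

typedef struct {
    uint8_t history[LHT_ENTRIES];
} LocalHistoryTable;

/* Assumed index function: low PC bits above the byte offset. */
static uint32_t lht_index(uint32_t pc)
{
    return (pc >> 2) & (LHT_ENTRIES - 1u);
}

uint8_t lht_read(const LocalHistoryTable *t, uint32_t pc)
{
    return t->history[lht_index(pc)];
}

/* Shift the outcome into the history register owned by this branch only. */
void lht_update(LocalHistoryTable *t, uint32_t pc, bool taken)
{
    uint8_t *h = &t->history[lht_index(pc)];
    *h = (uint8_t)(((*h << 1) | (taken ? 1u : 0u)) & ((1u << HISTORY_BITS) - 1u));
}
```

Feeding the loop branch the outcomes T T T T NT twice reproduces the history values 001, 011, 111, 111, 110, 101, 011, 111, 111, 110, even if updates for an unrelated branch at another address are interleaved.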