class5-pipeline-b.ppt

CS:APP Chapter 4
Computer Architecture: Pipelined Implementation, Part II
Randal E. Bryant
Carnegie Mellon University
http://csapp.cs.cmu.edu
CS:APP
Overview
Make the pipelined processor work!
Data Hazards
  - An instruction with register R as a source follows shortly after an instruction with register R as its destination
  - This is a common condition, so we don’t want to slow down the pipeline
Control Hazards
  - Mispredicted conditional branch
      Our design predicts all branches as being taken
      Naïve pipeline executes two extra instructions
  - Getting the return address for a ret instruction
      Naïve pipeline executes three extra instructions
Making Sure It Really Works
  - What if multiple special cases happen simultaneously?

Pipeline Stages
Fetch
  - Select current PC
  - Read instruction
  - Compute incremented PC
Decode
  - Read program registers
Execute
  - Operate ALU
Memory
  - Read or write data memory
Write Back
  - Update register file
[Figure: five-stage pipeline block diagram with pipeline registers F, D, E, M, W. Signal labels: f_PC, predPC, valP; icode, ifun, rA, rB, valC; d_srcA, d_srcB; valA, valB; aluA, aluB; Bch, CC, valE; M_icode, M_Bch, M_valA; Addr, Data; valM; W_icode, W_valE, W_valM, W_dstE, W_dstM.]

PIPE- Hardware
  - Pipeline registers hold intermediate values from instruction execution
Forward (Upward) Paths
  - Values are passed from one stage to the next
  - Values cannot jump past stages
      e.g., valC passes through decode
[Figure: PIPE- hardware structure. Select PC chooses f_PC from predPC, M_valA, and W_valM; the register file is written through ports E and M from W_valE and W_valM.]

Data Dependencies: 2 Nop’s
# demo-h2.ys
Cycle                     1  2  3  4  5  6  7  8  9  10
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: nop                      F  D  E  M  W
0x00d: nop                         F  D  E  M  W
0x00e: addl %edx,%eax                 F  D  E  M  W
0x010: halt                              F  D  E  M  W
Cycle 6
  W: R[%eax] ← 3
  D: valA ← R[%edx] = 10
     valB ← R[%eax] = 0    Error

Data Dependencies: No Nop
# demo-h0.ys
Cycle                     1  2  3  4  5  6  7  8
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: addl %edx,%eax           F  D  E  M  W
0x00e: halt                        F  D  E  M  W
Cycle 4
  M: M_valE = 10
     M_dstE = %edx
  E: e_valE ← 0 + 3 = 3
     E_dstE = %eax
  D: valA ← R[%edx] = 0    Error
     valB ← R[%eax] = 0    Error

Stalling for Data Dependencies
# demo-h2.ys
Cycle                     1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: nop                      F  D  E  M  W
0x00d: nop                         F  D  E  M  W
       bubble                               E  M  W
0x00e: addl %edx,%eax                 F  D  D  E  M  W
0x010: halt                              F  F  D  E  M  W
  - If an instruction follows too closely after one that writes a register, slow it down
  - Hold the instruction in decode
  - Dynamically inject a nop into the execute stage

Stall Condition
Source Registers
  - srcA and srcB of the current instruction in the decode stage
Destination Registers
  - dstE and dstM fields
  - Instructions in the execute, memory, and write-back stages
Special Case
  - Don’t stall for register ID 8
      Indicates absence of register operand
[Figure: PIPE- hardware with the srcA/srcB fields of decode and the dstE/dstM fields of pipeline registers E, M, and W highlighted.]

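The stall condition can be sketched in Python. This is an illustrative model, not the processor's HCL: the function name `needs_stall` and its list-based interface are assumptions made for this example. The register ID 8 for "no register" comes from the slide, and the IDs for %eax and %edx follow the standard Y86 numbering.

```python
RNONE = 8  # register ID that indicates "no register operand"

# Standard Y86 register IDs used in the examples
EAX, ECX, EDX, EBX = 0, 1, 2, 3


def needs_stall(src_a, src_b, dest_regs):
    """Return True if a decode-stage source matches a pending destination.

    dest_regs holds the dstE and dstM IDs of the instructions currently
    in the execute, memory, and write-back stages.
    """
    pending = {r for r in dest_regs if r != RNONE}
    return any(s != RNONE and s in pending for s in (src_a, src_b))


# addl %edx,%eax in decode while irmovl $3,%eax is in write-back
print(needs_stall(EDX, EAX, [RNONE, RNONE, RNONE, RNONE, EAX, RNONE]))

# nop in decode: no source registers, so it never stalls
print(needs_stall(RNONE, RNONE, [EAX, RNONE, RNONE, RNONE, RNONE, RNONE]))
```

Filtering out `RNONE` on both sides is what implements the special case: an instruction without a register operand must not stall just because another instruction also has an unused destination field.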
Detecting Stall Condition
# demo-h2.ys
[Pipeline diagram: addl %edx,%eax is held in decode during cycles 6 and 7, halt is held in fetch, and one bubble passes through execute, memory, and write-back.]
Cycle 6
  W: W_dstE = %eax
     W_valE = 3
  D: srcA = %edx
     srcB = %eax

Stalling X3
# demo-h0.ys
Cycle                     1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
       bubble                         E  M  W
       bubble                            E  M  W
       bubble                               E  M  W
0x00c: addl %edx,%eax           F  D  D  D  D  E  M  W
0x00e: halt                        F  F  F  F  D  E  M  W
Cycle 4
  E: E_dstE = %eax
  D: srcA = %edx, srcB = %eax
Cycle 5
  M: M_dstE = %eax
  D: srcA = %edx, srcB = %eax
Cycle 6
  W: W_dstE = %eax
  D: srcA = %edx, srcB = %eax

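The two stalling examples suggest a simple rule for a pipeline that only stalls (no forwarding): the reader must wait in decode while the writer is in execute, memory, or write-back. A minimal sketch, assuming that rule; the function name is hypothetical:

```python
def bubbles_without_forwarding(distance):
    """Bubbles needed when a reader follows a register writer.

    distance = 1 means the reader is the very next instruction.
    Assumes a stall-only pipeline, as in the examples above.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    # Writer is in E, M, or W when the reader first reaches decode
    # for distances 1, 2, and 3 respectively.
    return max(0, 4 - distance)


# gap = number of instructions between writer and reader
for gap in range(4):
    print(gap, bubbles_without_forwarding(gap + 1))
```

With no instructions in between (demo-h0.ys) this gives three bubbles, and with two nops in between (demo-h2.ys) it gives one, matching the diagrams.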
What Happens When Stalling?
# demo-h0.ys
0x000: irmovl $10,%edx
0x006: irmovl $3,%eax
0x00c: addl %edx,%eax
0x00e: halt
Cycle  Write Back        Memory            Execute          Decode   Fetch
4      (empty)           irmovl $10,%edx   irmovl $3,%eax   addl     halt
5      irmovl $10,%edx   irmovl $3,%eax    bubble           addl     halt
6      irmovl $3,%eax    bubble            bubble           addl     halt
7      bubble            bubble            bubble           addl     halt
8      bubble            bubble            addl             halt     (empty)
  - Stalling instruction is held back in the decode stage
  - Following instruction stays in the fetch stage
  - Bubbles are injected into the execute stage
      Like dynamically generated nop’s
      Move through later stages

Implementing Stalling
Pipeline Control
  - Combinational logic detects the stall condition
  - Sets mode signals for how pipeline registers should update
[Figure: pipe control logic block. Inputs: D_icode, d_srcA, d_srcB, E_dstE, E_dstM, M_dstE, M_dstM, W_dstE, W_dstM. Outputs: F_stall, D_stall, E_bubble.]

Pipeline Register Modes
Before the rising clock edge, the register outputs x and its input is y.
Mode     stall  bubble  Output after rising clock
Normal   0      0       y
Stall    1      0       x (unchanged)
Bubble   0      1       nop

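The three register modes can be modeled with a small Python class. This is a sketch for illustration only: the class name `PipeReg` and its `clock` method are assumptions, not part of the original design files.

```python
class PipeReg:
    """Toy model of a pipeline register with stall and bubble inputs."""

    def __init__(self, nop_value):
        self.nop_value = nop_value  # state that represents a nop
        self.output = nop_value

    def clock(self, new_input, stall=False, bubble=False):
        """Apply one rising clock edge and return the new output."""
        if stall and bubble:
            # Asserting both is treated as a pipeline error
            raise ValueError("pipeline error: stall and bubble both set")
        if stall:
            return self.output  # keep the old state
        self.output = self.nop_value if bubble else new_input
        return self.output


reg = PipeReg("nop")
print(reg.clock("x"))               # normal: loads x
print(reg.clock("y", stall=True))   # stall: still x
print(reg.clock("y", bubble=True))  # bubble: becomes nop
```

Raising an error when both signals are set mirrors the later discussion of control combinations, where a simultaneous stall and bubble on one register is signaled as a pipeline error.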
Data Forwarding
Naïve Pipeline
  - Register isn’t written until completion of the write-back stage
  - Source operands are read from the register file in the decode stage
      The value needs to be in the register file at the start of the stage
Observation
  - The value is generated in the execute or memory stage
Trick
  - Pass the value directly from the generating instruction to the decode stage
  - The value needs to be available at the end of the decode stage

Data Forwarding Example
# demo-h2.ys
Cycle                     1  2  3  4  5  6  7  8  9  10
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: nop                      F  D  E  M  W
0x00d: nop                         F  D  E  M  W
0x00e: addl %edx,%eax                 F  D  E  M  W
0x010: halt                              F  D  E  M  W
  - irmovl in write-back stage
  - Destination value in W pipeline register
  - Forward as valB for decode stage
Cycle 6
  W: R[%eax] ← 3
     W_dstE = %eax
     W_valE = 3
  D: srcA = %edx
     srcB = %eax
     valA ← R[%edx] = 10
     valB ← W_valE = 3

Bypass Paths
Decode Stage
  - Forwarding logic selects valA and valB
  - Normally from the register file
  - Forwarding: get valA or valB from a later pipeline stage
Forwarding Sources
  - Execute: valE
  - Memory: valE, valM
  - Write back: valE, valM
[Figure: pipeline block diagram with a Forward block in decode fed by e_valE, m_valM, M_valE, W_valM, and W_valE.]

Data Forwarding Example #2
# demo-h0.ys
Cycle                     1  2  3  4  5  6  7  8
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: addl %edx,%eax           F  D  E  M  W
0x00e: halt                        F  D  E  M  W
Register %edx
  - Generated by ALU during previous cycle
  - Forward from memory as valA
Register %eax
  - Value just generated by ALU
  - Forward from execute as valB
Cycle 4
  M: M_dstE = %edx
     M_valE = 10
  E: E_dstE = %eax
     e_valE ← 0 + 3 = 3
  D: srcA = %edx
     srcB = %eax
     valA ← M_valE = 10
     valB ← e_valE = 3

Implementing Forwarding
  - Add additional feedback paths from the E, M, and W pipeline registers into the decode stage
  - Create logic blocks to select from multiple sources for valA and valB in the decode stage
[Figure: PIPE hardware with forwarding. Blocks Sel+Fwd A and Fwd B in decode receive e_valE, m_valM, M_valE, W_valM, and W_valE in addition to the register file outputs.]

Implementing Forwarding
[Figure: decode-stage forwarding logic (Sel+Fwd A, Fwd B) with inputs e_valE, m_valM, M_valE, W_valM, W_valE, and the register file outputs.]
## What should be the A value?
int new_E_valA = [
    # Use incremented PC
    D_icode in { ICALL, IJXX } : D_valP;
    # Forward valE from execute
    d_srcA == E_dstE : e_valE;
    # Forward valM from memory
    d_srcA == M_dstM : m_valM;
    # Forward valE from memory
    d_srcA == M_dstE : M_valE;
    # Forward valM from write back
    d_srcA == W_dstM : W_valM;
    # Forward valE from write back
    d_srcA == W_dstE : W_valE;
    # Use value read from register file
    1 : d_rvalA;
];
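The priority order in this selection logic can be illustrated with a short Python sketch. The helper `forward` and its list-of-pairs interface are assumptions for this example, and the sketch omits the valP case for call and jump instructions. Like the HCL, it simply takes the first matching source, so the most recent (earliest pipeline stage) value wins.

```python
RNONE = 8          # "no register" ID
EAX, EDX = 0, 2    # standard Y86 register IDs


def forward(src, rval, sources):
    """Pick the operand value for register src.

    sources: (dst_register, value) pairs in priority order,
    newest instruction first. Falls back to rval, the value
    read from the register file.
    """
    for dst, value in sources:
        if src == dst:
            return value
    return rval


# Cycle 4 of demo-h0.ys: irmovl $3,%eax in execute,
# irmovl $10,%edx in memory, nothing useful in write-back.
sources = [
    (EAX, 3),      # E_dstE, e_valE
    (RNONE, 0),    # M_dstM, m_valM
    (EDX, 10),     # M_dstE, M_valE
    (RNONE, 0),    # W_dstM, W_valM
    (RNONE, 0),    # W_dstE, W_valE
]
val_a = forward(EDX, 0, sources)  # srcA = %edx
val_b = forward(EAX, 0, sources)  # srcB = %eax
print(val_a, val_b)
```

The register file would return 0 for both operands in this cycle, so the forwarded values 10 and 3 are what make the addition correct.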
Limitation of Forwarding
# demo-luh.ys
Cycle                       1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $128,%edx     F  D  E  M  W
0x006: irmovl $3,%ecx          F  D  E  M  W
0x00c: rmmovl %ecx,0(%edx)        F  D  E  M  W
0x012: irmovl $10,%ebx               F  D  E  M  W
0x018: mrmovl 0(%edx),%eax              F  D  E  M  W    # Load %eax
0x01e: addl %ebx,%eax                      F  D  E  M  W    # Use %eax
0x020: halt                                   F  D  E  M  W
Load-use dependency
  - Value needed by end of decode stage in cycle 7
  - Value read from memory in memory stage of cycle 8
Cycle 7
  M: M_dstE = %ebx
     M_valE = 10
  D: valA ← M_valE = 10
     valB ← R[%eax] = 0    Error
Cycle 8
  M: M_dstM = %eax
     m_valM ← M[128] = 3

Avoiding Load/Use Hazard
# demo-luh.ys
Cycle                       1  2  3  4  5  6  7  8  9  10 11 12
0x000: irmovl $128,%edx     F  D  E  M  W
0x006: irmovl $3,%ecx          F  D  E  M  W
0x00c: rmmovl %ecx,0(%edx)        F  D  E  M  W
0x012: irmovl $10,%ebx               F  D  E  M  W
0x018: mrmovl 0(%edx),%eax              F  D  E  M  W    # Load %eax
       bubble                                    E  M  W
0x01e: addl %ebx,%eax                      F  D  D  E  M  W    # Use %eax
0x020: halt                                   F  F  D  E  M  W
  - Stall the using instruction for one cycle
  - It can then pick up the loaded value by forwarding from the memory stage
Cycle 8
  W: W_dstE = %ebx
     W_valE = 10
  M: M_dstM = %eax
     m_valM ← M[128] = 3
  D: valA ← W_valE = 10
     valB ← m_valM = 3

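Why one stall cycle is enough can be seen by comparing when each kind of result first becomes forwardable. The sketch below assumes the forwarding sources listed earlier (an ALU result is available as e_valE while its instruction is in execute, a loaded value as m_valM while its instruction is in memory). The names `READY_DISTANCE` and `stalls_with_forwarding` are hypothetical.

```python
# How many stages ahead of decode the producer must be before
# its result can be forwarded: 1 = execute, 2 = memory.
READY_DISTANCE = {"alu": 1, "load": 2}


def stalls_with_forwarding(producer_kind, distance):
    """Stall cycles for a consumer `distance` instructions after the producer."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return max(0, READY_DISTANCE[producer_kind] - distance)


# demo-h0.ys: addl directly after irmovl, ALU result forwarded from execute
print(stalls_with_forwarding("alu", 1))

# demo-luh.ys: addl directly after mrmovl, loaded value not ready yet
print(stalls_with_forwarding("load", 1))
```

Only the load followed immediately by a use yields a nonzero result, which is the single case the load/use stall has to cover.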
Detecting Load/Use Hazard
Condition         Trigger
Load/Use Hazard   E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }
[Figure: PIPE hardware with E_icode, E_dstM, d_srcA, and d_srcB highlighted.]

Control for Load/Use Hazard
# demo-luh.ys
[Pipeline diagram: addl %ebx,%eax is held in decode and halt is held in fetch for one extra cycle (cycle 8) while a bubble enters execute behind mrmovl 0(%edx),%eax.]
  - Stall instructions in fetch and decode stages
  - Inject bubble into execute stage
Condition         F      D      E       M       W
Load/Use Hazard   stall  stall  bubble  normal  normal

Branch Misprediction Example
# demo-j.ys
0x000:    xorl %eax,%eax
0x002:    jne t              # Not taken
0x007:    irmovl $1, %eax    # Fall through
0x00d:    nop
0x00e:    nop
0x00f:    nop
0x010:    halt
0x011: t: irmovl $3, %edx    # Target (Should not execute)
0x017:    irmovl $4, %ecx    # Should not execute
0x01d:    irmovl $5, %edx    # Should not execute
  - Should only execute the first 7 instructions (through halt)

Handling Misprediction
# demo-j.ys
Cycle                     1  2  3  4  5  6  7  8  9  10
0x000: xorl %eax,%eax     F  D  E  M  W
0x002: jne target            F  D  E  M  W                 # Not taken
0x011: t: irmovl $2,%edx        F  D                       # Target
       bubble                         E  M  W
0x017: irmovl $3,%ebx              F                       # Target+1
       bubble                         D  E  M  W
0x007: irmovl $1,%eax                 F  D  E  M  W        # Fall through
0x00d: nop                               F  D  E  M  W
Predict branch as taken
  - Fetch 2 instructions at target
Cancel when mispredicted
  - Detect branch not-taken in execute stage
  - On following cycle, replace instructions in execute and decode by bubbles
  - No side effects have occurred yet

Detecting Mispredicted Branch
Condition             Trigger
Mispredicted Branch   E_icode == IJXX && !e_Bch
[Figure: PIPE hardware with E_icode and e_Bch highlighted.]

Control for Misprediction
# demo-j.ys
[Pipeline diagram: the two instructions fetched at the branch target are replaced by bubbles in execute and decode in cycle 5, and fetch continues with the fall-through instruction.]
Condition             F       D       E       M       W
Mispredicted Branch   normal  bubble  bubble  normal  normal

Return Example
# demo-retb.ys
0x000:    irmovl Stack,%esp   # Initialize stack pointer
0x006:    call p              # Procedure call
0x00b:    irmovl $5,%esi      # Return point
0x011:    halt
0x020: .pos 0x20
0x020: p: irmovl $-1,%edi     # procedure
0x026:    ret
0x027:    irmovl $1,%eax      # Should not be executed
0x02d:    irmovl $2,%ecx      # Should not be executed
0x033:    irmovl $3,%edx      # Should not be executed
0x039:    irmovl $4,%ebx      # Should not be executed
0x100: .pos 0x100
0x100: Stack:                 # Stack: Stack pointer
  - Previously executed three additional instructions

Correct Return Example
# demo-retb
0x026: ret                F  D  E  M  W
       bubble                F  D  E  M  W
       bubble                   F  D  E  M  W
       bubble                      F  D  E  M  W
0x00b: irmovl $5,%esi                 F  D  E  M  W    # Return
  - As ret passes through the pipeline, stall at the fetch stage
      While ret is in the decode, execute, and memory stages
  - Inject bubble into decode stage
  - Release stall when ret reaches the write-back stage
When ret is in write-back
  W: valM = 0x0b
  F: valC ← 5
     rB ← %esi

Detecting Return
Condition        Trigger
Processing ret   IRET in { D_icode, E_icode, M_icode }
[Figure: PIPE hardware with D_icode, E_icode, and M_icode highlighted.]

Control for Return
# demo-retb
[Pipeline diagram: three bubbles follow ret through the pipeline, then irmovl $5,%esi is fetched from the return address 0x00b.]
Condition        F      D       E       M       W
Processing ret   stall  bubble  normal  normal  normal

Special Control Cases
Detection
Condition             Trigger
Processing ret        IRET in { D_icode, E_icode, M_icode }
Load/Use Hazard       E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }
Mispredicted Branch   E_icode == IJXX && !e_Bch
Action (on next cycle)
Condition             F       D       E       M       W
Processing ret        stall   bubble  normal  normal  normal
Load/Use Hazard       stall   stall   bubble  normal  normal
Mispredicted Branch   normal  bubble  bubble  normal  normal

Implementing Pipeline Control
  - Combinational logic generates pipeline control signals
  - Action occurs at start of following cycle
[Figure: pipe control logic block. Inputs: D_icode, d_srcA, d_srcB, E_icode, E_dstM, e_Bch, M_icode. Outputs: F_stall, D_stall, D_bubble, E_bubble.]

Initial Version of Pipeline Control
bool F_stall =
    # Conditions for a load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode };

bool D_stall =
    # Conditions for a load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

bool D_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Bch) ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode };

bool E_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Bch) ||
    # Load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };
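For experimenting with these equations outside a simulator, the same logic can be transcribed into Python. This is a sketch under stated assumptions: instruction codes and registers are represented as plain strings, and the function name `pipe_control` and its dictionary result are choices made for this example rather than part of the HCL.

```python
def pipe_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch):
    """Initial version of the pipeline control signals."""
    load_use = E_icode in {"IMRMOVL", "IPOPL"} and E_dstM in {d_srcA, d_srcB}
    ret = "IRET" in {D_icode, E_icode, M_icode}
    mispredict = E_icode == "IJXX" and not e_Bch
    return {
        "F_stall": load_use or ret,
        "D_stall": load_use,
        "D_bubble": mispredict or ret,
        "E_bubble": mispredict or load_use,
    }


# mrmovl 0(%edx),%eax in execute, addl %ebx,%eax in decode
print(pipe_control("IOPL", "IMRMOVL", "IIRMOVL",
                   "%eax", "%ebx", "%eax", False))

# Not-taken jne in execute
print(pipe_control("IIRMOVL", "IJXX", "IOPL",
                   "RNONE", "RNONE", "%edx", False))
```

The first call asserts F_stall, D_stall, and E_bubble, and the second asserts D_bubble and E_bubble, which correspond to the Load/Use Hazard and Mispredicted Branch rows of the action table.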
Control Combinations
Case         M      E       D
Load/use            Load    Use
Mispredict          JXX
ret 1                       ret
ret 2               ret     bubble
ret 3        ret    bubble  bubble
  - Special cases that can arise on the same clock cycle
Combination A (Mispredict with ret 1)
  - Not-taken branch
  - ret instruction at branch target
Combination B (Load/use with ret 1)
  - Instruction that reads from memory to %esp
  - Followed by ret instruction

Control Combination A
[Figure: JXX in execute with ret in decode (Mispredict combined with ret 1).]
Condition             F       D       E       M       W
Processing ret        stall   bubble  normal  normal  normal
Mispredicted Branch   normal  bubble  bubble  normal  normal
Combination           stall   bubble  bubble  normal  normal
  - Should handle as mispredicted branch
  - Stalls F pipeline register
  - But PC selection logic will be using M_valA anyhow

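The reason the extra stall of F is harmless can be sketched with a model of the Select PC block. The hardware diagrams show this block taking M_valA, W_valM, and predPC as inputs; the exact selection conditions below are an assumption based on that structure, not something stated on these slides, and the function name is hypothetical.

```python
def select_pc(M_icode, M_Bch, M_valA, W_icode, W_valM, F_predPC):
    """Assumed PC selection: recover from misprediction or ret first."""
    if M_icode == "IJXX" and not M_Bch:
        return M_valA    # fall-through address of a not-taken branch
    if W_icode == "IRET":
        return W_valM    # return address read from the stack
    return F_predPC      # otherwise use the predicted PC


# Combination A, one cycle later: the not-taken jne from demo-j.ys is in
# memory with fall-through address 0x007. F was stalled, so predPC still
# holds a stale value (hypothetical 0x012 here), but it is ignored.
print(hex(select_pc("IJXX", False, 0x007, "IOPL", 0, 0x012)))
```

Because the mispredicted-branch case overrides predPC, it does not matter whether F held its old value or loaded a new one.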
Control Combination B
[Figure: Load in execute with Use in decode, where the using instruction is ret (Load/use combined with ret 1).]
Condition         F      D               E       M       W
Processing ret    stall  bubble          normal  normal  normal
Load/Use Hazard   stall  stall           bubble  normal  normal
Combination       stall  bubble + stall  bubble  normal  normal
  - Would attempt to bubble and stall pipeline register D
  - Signaled by processor as pipeline error

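Both combinations can be checked mechanically by merging rows of the action table. The following is a rough sketch: the dictionary `ACTIONS` restates the table from the slides, while the function `combine` and the condition keys are hypothetical names chosen for this example.

```python
# Action on each pipeline register for each special condition
ACTIONS = {
    "ret":        {"F": "stall",  "D": "bubble", "E": "normal", "M": "normal", "W": "normal"},
    "load_use":   {"F": "stall",  "D": "stall",  "E": "bubble", "M": "normal", "W": "normal"},
    "mispredict": {"F": "normal", "D": "bubble", "E": "bubble", "M": "normal", "W": "normal"},
}


def combine(*conditions):
    """Merge the requested actions per register; conflicts are joined with +."""
    merged = {}
    for reg in "FDEMW":
        requested = sorted({ACTIONS[c][reg] for c in conditions} - {"normal"})
        merged[reg] = " + ".join(requested) if requested else "normal"
    return merged


print(combine("ret", "mispredict"))  # Combination A
print(combine("ret", "load_use"))    # Combination B
```

Combination A merges cleanly, while Combination B produces `bubble + stall` for register D, which is the conflicting request described above.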
Handling Control Combination B
Condition         F      D       E       M       W
Processing ret    stall  bubble  normal  normal  normal
Load/Use Hazard   stall  stall   bubble  normal  normal
Combination       stall  stall   bubble  normal  normal
  - Load/use hazard should get priority
  - ret instruction should be held in decode stage for an additional cycle

Corrected Pipeline Control Logic
bool D_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Bch) ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode }
    # but not condition for a load/use hazard
    && !(E_icode in { IMRMOVL, IPOPL }
         && E_dstM in { d_srcA, d_srcB });

Condition         F      D       E       M       W
Processing ret    stall  bubble  normal  normal  normal
Load/Use Hazard   stall  stall   bubble  normal  normal
Combination       stall  stall   bubble  normal  normal
  - Load/use hazard should get priority
  - ret instruction should be held in decode stage for an additional cycle

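The effect of the correction on Combination B can be checked with a small Python comparison of the two versions of D_bubble. This is a sketch: instruction codes and registers are plain strings, the function name and the `corrected` flag are assumptions for this example, and the test case relies on ret reading %esp for both source operands, as in the Y86 design.

```python
def decode_controls(D_icode, E_icode, M_icode, E_dstM,
                    d_srcA, d_srcB, e_Bch, corrected):
    """Return (D_stall, D_bubble) for the initial or corrected logic."""
    load_use = E_icode in {"IMRMOVL", "IPOPL"} and E_dstM in {d_srcA, d_srcB}
    ret = "IRET" in {D_icode, E_icode, M_icode}
    mispredict = E_icode == "IJXX" and not e_Bch
    D_stall = load_use
    if corrected:
        # ret only bubbles D when there is no load/use hazard
        D_bubble = mispredict or (ret and not load_use)
    else:
        D_bubble = mispredict or ret
    return D_stall, D_bubble


# Combination B: mrmovl loading %esp in execute, ret in decode
combo_b = dict(D_icode="IRET", E_icode="IMRMOVL", M_icode="INOP",
               E_dstM="%esp", d_srcA="%esp", d_srcB="%esp", e_Bch=False)
print(decode_controls(**combo_b, corrected=False))
print(decode_controls(**combo_b, corrected=True))
```

The initial logic asserts both signals for register D, while the corrected logic asserts only the stall, so ret stays in decode for one more cycle.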
Pipeline Summary
Data Hazards
  - Most handled by forwarding
      No performance penalty
  - Load/use hazard requires a one-cycle stall
Control Hazards
  - Cancel instructions when a mispredicted branch is detected
      Two clock cycles wasted
  - Stall fetch stage while ret passes through pipeline
      Three clock cycles wasted
Control Combinations
  - Must analyze carefully
  - First version had subtle bug
      Only arises with unusual instruction combination
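The penalties in this summary can be turned into a rough cycles-per-instruction estimate. The sketch below is only illustrative: the instruction frequencies passed in the example call are hypothetical values chosen for the arithmetic, not measurements from these slides, and the function name is an assumption.

```python
def estimated_cpi(load_freq, load_use_rate, branch_freq,
                  mispredict_rate, ret_freq):
    """CPI = 1 + average penalty cycles per instruction."""
    load_penalty = load_freq * load_use_rate * 1        # one-cycle stall
    branch_penalty = branch_freq * mispredict_rate * 2  # two cycles wasted
    ret_penalty = ret_freq * 3                          # three cycles wasted
    return 1.0 + load_penalty + branch_penalty + ret_penalty


# Hypothetical mix: 25% loads (20% followed by a use), 20% conditional
# branches (40% mispredicted), 2% ret instructions.
cpi = estimated_cpi(0.25, 0.20, 0.20, 0.40, 0.02)
print(f"{cpi:.2f}")
```

Under these assumed frequencies the branch term dominates, which suggests that prediction accuracy would matter more than the load/use stall for this design.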