CS:APP Chapter 4
Computer Architecture
Pipelined
Implementation
Part II
Randal E. Bryant
Carnegie Mellon University
http://csapp.cs.cmu.edu
CS:APP
Overview
Make the pipelined processor work!
Data Hazards
Instruction having register R as source follows shortly after
instruction having register R as destination
Common condition, don’t want to slow down pipeline
Control Hazards
Mispredict conditional branch
Our design predicts all branches as being taken
Naïve pipeline executes two extra instructions
Getting return address for ret instruction
Naïve pipeline executes three extra instructions
Making Sure It Really Works
What if multiple special cases happen simultaneously?

Pipeline Stages
Fetch
Select current PC
Read instruction
Compute incremented PC
Decode
Read program registers
Execute
Operate ALU
Memory
Read or write data memory
Write Back
Update register file
[Figure: block diagram of the five stages (Fetch, Decode, Execute, Memory, Write back) and the signals passed between them, e.g. predPC, valP, d_srcA, d_srcB, valA, valB, aluA, aluB, valE, valM]

PIPE- Hardware
Pipeline registers hold intermediate values from instruction execution
Forward (Upward) Paths
Values passed from one stage to next
Cannot jump past stages
e.g., valC passes through decode
[Figure: PIPE- hardware structure, with pipeline registers F, D, E, M, and W separating the fetch, decode, execute, memory, and write-back stages]

Data Dependencies: 2 Nop’s
# demo-h2.ys            1  2  3  4  5  6  7  8  9  10
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
0x00c: nop                    F  D  E  M  W
0x00d: nop                       F  D  E  M  W
0x00e: addl %edx,%eax               F  D  E  M  W
0x010: halt                            F  D  E  M  W
Cycle 6
W: R[%eax] ← 3
D: valA ← R[%edx] = 10
D: valB ← R[%eax] = 0 (Error)

Data Dependencies: No Nop
# demo-h0.ys            1  2  3  4  5  6  7  8
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
0x00c: addl %edx,%eax         F  D  E  M  W
0x00e: halt                      F  D  E  M  W
Cycle 4
M: M_valE = 10, M_dstE = %edx
E: e_valE ← 0 + 3 = 3, E_dstE = %eax
D: valA ← R[%edx] = 0 (Error)
D: valB ← R[%eax] = 0 (Error)

Stalling for Data Dependencies
# demo-h2.ys            1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
0x00c: nop                    F  D  E  M  W
0x00d: nop                       F  D  E  M  W
       bubble                             E  M  W
0x00e: addl %edx,%eax               F  D  D  E  M  W
0x010: halt                            F  F  D  E  M  W
If instruction follows too closely after one that writes register, slow it down
Hold instruction in decode
Dynamically inject nop into execute stage

Stall Condition
Source Registers
srcA and srcB of current instruction in decode stage
Destination Registers
dstE and dstM fields
Instructions in execute, memory, and write-back stages
Special Case
Don’t stall for register ID 8
Indicates absence of register operand
[Figure: PIPE- hardware structure highlighting d_srcA and d_srcB in decode and the dstE and dstM fields of pipeline registers E, M, and W]

Detecting Stall Condition
# demo-h2.ys            1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
0x00c: nop                    F  D  E  M  W
0x00d: nop                       F  D  E  M  W
       bubble                             E  M  W
0x00e: addl %edx,%eax               F  D  D  E  M  W
0x010: halt                            F  F  D  E  M  W
Cycle 6
W: W_dstE = %eax, W_valE = 3
D: srcA = %edx, srcB = %eax

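The stall check described above can be sketched in Python. This is only an illustration, not the processor's HCL: the function name `needs_stall`, the list argument, and the numeric register IDs are assumptions made for the example. Only the special treatment of register ID 8 is taken from the slides.

```python
RNONE = 8  # register ID 8 marks "no register operand" (from the Stall Condition slide)

# Register IDs below are an assumption for this sketch (usual Y86 numbering).
EAX, ECX, EDX, EBX = 0, 1, 2, 3


def needs_stall(d_srcA, d_srcB, pending_dests):
    """Return True if a decode-stage source matches a pending destination.

    pending_dests holds the dstE and dstM fields of the instructions
    currently in the execute, memory, and write-back stages.
    """
    sources = {d_srcA, d_srcB} - {RNONE}
    dests = set(pending_dests) - {RNONE}
    return bool(sources & dests)


# Cycle 6 of demo-h2.ys: addl %edx,%eax is in decode while
# irmovl $3,%eax is in write-back (W_dstE = %eax); the nops have no destinations.
cycle6_stall = needs_stall(EDX, EAX, [RNONE, RNONE, RNONE, RNONE, EAX, RNONE])
```

With these inputs the sketch reports a stall, matching the single bubble shown for cycle 6.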
Stalling X3
# demo-h0.ys            1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
       bubble                       E  M  W
       bubble                          E  M  W
       bubble                             E  M  W
0x00c: addl %edx,%eax         F  D  D  D  D  E  M  W
0x00e: halt                      F  F  F  F  D  E  M  W
Cycle 4
E: E_dstE = %eax
D: srcA = %edx, srcB = %eax
Cycle 5
M: M_dstE = %eax
D: srcA = %edx, srcB = %eax
Cycle 6
W: W_dstE = %eax
D: srcA = %edx, srcB = %eax

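The number of bubbles in the two stalling examples follows from simple arithmetic, sketched below in Python. The sketch assumes stalling is the only remedy (no forwarding) and that nothing else delays either instruction. The name `stall_cycles` and the notion of "distance" (how many instructions after the writer the reader was fetched) are introduced here for illustration. A writer that decodes in cycle c reaches write-back in cycle c + 3, so the reader may not leave decode before cycle c + 4.

```python
def stall_cycles(distance):
    """Bubbles needed when a reader follows a register writer.

    distance: 1 if the reader is the very next instruction,
              2 if one instruction separates them, and so on.
    Without stalls the reader would decode in cycle c + distance,
    but it must wait until cycle c + 4.
    """
    return max(0, 4 - distance)


# demo-h0.ys: addl immediately follows irmovl $3,%eax
bubbles_h0 = stall_cycles(1)
# demo-h2.ys: two nops separate irmovl $3,%eax from addl
bubbles_h2 = stall_cycles(3)
```

These give three bubbles for demo-h0.ys and one for demo-h2.ys, matching the diagrams.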
What Happens When Stalling?
# demo-h0.ys
0x000: irmovl $10,%edx
0x006: irmovl $3,%eax
0x00c: addl %edx,%eax
0x00e: halt
Cycle       4                5                6                7                8
Write Back                   irmovl $10,%edx  irmovl $3,%eax   bubble           bubble
Memory      irmovl $10,%edx  irmovl $3,%eax   bubble           bubble           bubble
Execute     irmovl $3,%eax   bubble           bubble           bubble           addl %edx,%eax
Decode      addl %edx,%eax   addl %edx,%eax   addl %edx,%eax   addl %edx,%eax   halt
Fetch       halt             halt             halt             halt
Stalling instruction held back in decode stage
Following instruction stays in fetch stage
Bubbles injected into execute stage
Like dynamically generated nop’s
Move through later stages

Implementing Stalling
Pipeline Control
Combinational logic detects stall condition
Sets mode signals for how pipeline registers should update
[Figure: pipe control logic block taking d_srcA, d_srcB, D_icode, E_dstE, E_dstM, M_dstE, M_dstM, W_dstE, and W_dstM as inputs and producing F_stall, D_stall, and E_bubble]

Pipeline Register Modes
(register currently holds x, input is y)
Mode     stall  bubble  Output after rising clock
Normal   0      0       y
Stall    1      0       x
Bubble   0      1       nop

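The three update modes can be modeled with a small Python class. This is a behavioral sketch only: the class name `PipeRegister`, the string `"nop"` standing in for the bubble state, and the decision to raise an exception when both signals are set are assumptions made for illustration.

```python
NOP = "nop"  # stand-in for the state a bubble loads into the register


class PipeRegister:
    """Behavioral model of one pipeline register with stall and bubble inputs."""

    def __init__(self, state=NOP):
        self.output = state

    def clock(self, new_input, stall=False, bubble=False):
        """Apply one rising clock edge and return the new output."""
        if stall and bubble:
            # Assumption: treat asserting both signals as an error.
            raise ValueError("pipeline error: stall and bubble both set")
        if stall:
            pass                      # Stall: keep the old state x
        elif bubble:
            self.output = NOP         # Bubble: load a nop
        else:
            self.output = new_input   # Normal: load the input y
        return self.output


normal_out = PipeRegister("x").clock("y")
stall_out = PipeRegister("x").clock("y", stall=True)
bubble_out = PipeRegister("x").clock("y", bubble=True)
```

Each call starts from a register holding `x` with input `y`, so the three results correspond to the Normal, Stall, and Bubble cases.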
Data Forwarding
Naïve Pipeline
Register isn’t written until completion of write-back stage
Source operands read from register file in decode stage
Needs to be in register file at start of stage
Observation
Value generated in execute or memory stage
Trick
Pass value directly from generating instruction to decode stage
Needs to be available at end of decode stage

Data Forwarding Example
# demo-h2.ys            1  2  3  4  5  6  7  8  9  10
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
0x00c: nop                    F  D  E  M  W
0x00d: nop                       F  D  E  M  W
0x00e: addl %edx,%eax               F  D  E  M  W
0x010: halt                            F  D  E  M  W
irmovl in writeback stage
Destination value in W pipeline register
Forward as valB for decode stage
Cycle 6
W: R[%eax] ← 3, W_dstE = %eax, W_valE = 3
D: srcA = %edx, srcB = %eax
D: valA ← R[%edx] = 10
D: valB ← W_valE = 3

Bypass Paths
Decode Stage
Forwarding logic selects valA and valB
Normally from register file
Forwarding: get valA or valB from later pipeline stage
Forwarding Sources
Execute: valE
Memory: valE, valM
Write back: valE, valM
[Figure: stage diagram with bypass paths e_valE, M_valE, m_valM, W_valE, and W_valM feeding the Forward block in the decode stage]

Data Forwarding Example #2
# demo-h0.ys            1  2  3  4  5  6  7  8
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
0x00c: addl %edx,%eax         F  D  E  M  W
0x00e: halt                      F  D  E  M  W
Register %edx
Generated by ALU during previous cycle
Forward from memory as valA
Register %eax
Value just generated by ALU
Forward from execute as valB
Cycle 4
M: M_dstE = %edx, M_valE = 10
E: E_dstE = %eax, e_valE ← 0 + 3 = 3
D: srcA = %edx, srcB = %eax
D: valA ← M_valE = 10
D: valB ← e_valE = 3

Implementing Forwarding
Add additional feedback paths from E, M, and W pipeline registers into decode stage
Create logic blocks to select from multiple sources for valA and valB in decode stage
[Figure: PIPE hardware structure with "Sel+Fwd A" and "Fwd B" blocks in the decode stage fed by e_valE, M_valE, m_valM, W_valE, and W_valM]

Implementing Forwarding
[Figure: decode, execute, memory, and write-back portion of the PIPE hardware, showing the forwarding inputs to "Sel+Fwd A" and "Fwd B"]
## What should be the A value?
int new_E_valA = [
# Use incremented PC
D_icode in { ICALL, IJXX } : D_valP;
# Forward valE from execute
d_srcA == E_dstE : e_valE;
# Forward valM from memory
d_srcA == M_dstM : m_valM;
# Forward valE from memory
d_srcA == M_dstE : M_valE;
# Forward valM from write back
d_srcA == W_dstM : W_valM;
# Forward valE from write back
d_srcA == W_dstE : W_valE;
# Use value read from register file
1 : d_rvalA;
];
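The priority order in the HCL case expression for `new_E_valA` can be mirrored in Python to check the forwarding examples by hand. This is a sketch, not the simulator's code: the function name `select_val`, the string icodes, the register-name strings, and the list-of-pairs argument are assumptions made here. The first matching case wins, exactly as in the HCL.

```python
def select_val(D_icode, D_valP, d_src, d_rval, forward_sources):
    """Pick the operand value for decode, following the HCL case order.

    forward_sources: (destination register, value) pairs in priority order:
    (E_dstE, e_valE), (M_dstM, m_valM), (M_dstE, M_valE),
    (W_dstM, W_valM), (W_dstE, W_valE).
    """
    if D_icode in ("ICALL", "IJXX"):
        return D_valP                 # use incremented PC
    for dst, val in forward_sources:
        if d_src == dst:
            return val                # forward from the earliest matching stage
    return d_rval                     # use value read from register file


RNONE = "none"  # placeholder for "no destination" in this sketch

# Cycle 4 of demo-h0.ys: E holds irmovl $3,%eax, M holds irmovl $10,%edx.
sources = [("%eax", 3), (RNONE, 0), ("%edx", 10), (RNONE, 0), (RNONE, 0)]
valA = select_val("IOPL", 0x00e, "%edx", 0, sources)
valB = select_val("IOPL", 0x00e, "%eax", 0, sources)
```

Here `valA` is forwarded from `M_valE` and `valB` from `e_valE`, as in Data Forwarding Example #2. The same function is reused for `valB` only for brevity; the slides give HCL for `valA` alone.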
Limitation of Forwarding
# demo-luh.ys                           1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $128,%edx                 F  D  E  M  W
0x006: irmovl $3,%ecx                      F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                   F  D  E  M  W
0x012: irmovl $10,%ebx                           F  D  E  M  W
0x018: mrmovl 0(%edx),%eax # Load %eax              F  D  E  M  W
0x01e: addl %ebx,%eax # Use %eax                       F  D  E  M  W
0x020: halt                                               F  D  E  M  W
Load-use dependency
Value needed by end of decode stage in cycle 7
Value read from memory in memory stage of cycle 8
Cycle 7
M: M_dstE = %ebx, M_valE = 10
D: valA ← M_valE = 10
D: valB ← R[%eax] = 0 (Error)
Cycle 8
M: M_dstM = %eax, m_valM ← M[128] = 3

Avoiding Load/Use Hazard
# demo-luh.ys                           1  2  3  4  5  6  7  8  9  10 11 12
0x000: irmovl $128,%edx                 F  D  E  M  W
0x006: irmovl $3,%ecx                      F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                   F  D  E  M  W
0x012: irmovl $10,%ebx                           F  D  E  M  W
0x018: mrmovl 0(%edx),%eax # Load %eax              F  D  E  M  W
       bubble                                                E  M  W
0x01e: addl %ebx,%eax # Use %eax                       F  D  D  E  M  W
0x020: halt                                               F  F  D  E  M  W
Stall using instruction for one cycle
Can then pick up loaded value by forwarding from memory stage
Cycle 8
W: W_dstE = %ebx, W_valE = 10
M: M_dstM = %eax, m_valM ← M[128] = 3
D: valA ← W_valE = 10
D: valB ← m_valM = 3

Detecting Load/Use Hazard
Condition        Trigger
Load/Use Hazard  E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }
[Figure: PIPE hardware structure highlighting E_icode and E_dstM in the execute stage and d_srcA and d_srcB in the decode stage]

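The load/use trigger translates almost directly into a Python predicate. The sketch below uses string icodes and register names, which are assumptions for readability; the logic itself follows the trigger condition given above.

```python
def load_use_hazard(E_icode, E_dstM, d_srcA, d_srcB):
    """True when the instruction in execute loads a register that
    the instruction in decode wants to read."""
    return E_icode in ("IMRMOVL", "IPOPL") and E_dstM in (d_srcA, d_srcB)


# Cycle 7 of demo-luh.ys: mrmovl 0(%edx),%eax is in execute,
# addl %ebx,%eax is in decode.
hazard_cycle7 = load_use_hazard("IMRMOVL", "%eax", "%ebx", "%eax")

# An irmovl in execute never triggers the condition; its result
# is available for forwarding as e_valE.
no_hazard = load_use_hazard("IIRMOVL", "none", "%ebx", "%eax")
```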
Control for Load/Use Hazard
# demo-luh.ys                           1  2  3  4  5  6  7  8  9  10 11 12
0x000: irmovl $128,%edx                 F  D  E  M  W
0x006: irmovl $3,%ecx                      F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                   F  D  E  M  W
0x012: irmovl $10,%ebx                           F  D  E  M  W
0x018: mrmovl 0(%edx),%eax # Load %eax              F  D  E  M  W
       bubble                                                E  M  W
0x01e: addl %ebx,%eax # Use %eax                       F  D  D  E  M  W
0x020: halt                                               F  F  D  E  M  W
Stall instructions in fetch and decode stages
Inject bubble into execute stage
Condition            F       D       E       M       W
Load/Use Hazard      stall   stall   bubble  normal  normal

Branch Misprediction Example
demo-j.ys
0x000:    xorl %eax,%eax
0x002:    jne t              # Not taken
0x007:    irmovl $1, %eax    # Fall through
0x00d:    nop
0x00e:    nop
0x00f:    nop
0x010:    halt
0x011: t: irmovl $3, %edx    # Target (Should not execute)
0x017:    irmovl $4, %ecx    # Should not execute
0x01d:    irmovl $5, %edx    # Should not execute
Should only execute first 7 instructions (through halt)

Handling Misprediction
# demo-j.ys                             1  2  3  4  5  6  7  8  9  10
0x000: xorl %eax,%eax                   F  D  E  M  W
0x002: jne target # Not taken              F  D  E  M  W
0x011: t: irmovl $2,%edx # Target             F  D
       bubble                                       E  M  W
0x017: irmovl $3,%ebx # Target+1                 F
       bubble                                       D  E  M  W
0x007: irmovl $1,%eax # Fall through                F  D  E  M  W
0x00d: nop                                             F  D  E  M  W
Predict branch as taken
Fetch 2 instructions at target
Cancel when mispredicted
Detect branch not-taken in execute stage
On following cycle, replace instructions in execute and decode by bubbles
No side effects have occurred yet

Detecting Mispredicted Branch
Condition            Trigger
Mispredicted Branch  E_icode = IJXX & !e_Bch
[Figure: PIPE hardware structure highlighting E_icode and e_Bch in the execute stage]

Control for Misprediction
# demo-j.ys                             1  2  3  4  5  6  7  8  9  10
0x000: xorl %eax,%eax                   F  D  E  M  W
0x002: jne target # Not taken              F  D  E  M  W
0x011: t: irmovl $2,%edx # Target             F  D
       bubble                                       E  M  W
0x017: irmovl $3,%ebx # Target+1                 F
       bubble                                       D  E  M  W
0x007: irmovl $1,%eax # Fall through                F  D  E  M  W
0x00d: nop                                             F  D  E  M  W
Condition            F       D       E       M       W
Mispredicted Branch  normal  bubble  bubble  normal  normal

Return Example
demo-retb.ys
0x000:    irmovl Stack,%esp  # Initialize stack pointer
0x006:    call p             # Procedure call
0x00b:    irmovl $5,%esi     # Return point
0x011:    halt
0x020: .pos 0x20
0x020: p: irmovl $-1,%edi    # procedure
0x026:    ret
0x027:    irmovl $1,%eax     # Should not be executed
0x02d:    irmovl $2,%ecx     # Should not be executed
0x033:    irmovl $3,%edx     # Should not be executed
0x039:    irmovl $4,%ebx     # Should not be executed
0x100: .pos 0x100
0x100: Stack:                # Stack: Stack pointer
Previously executed three additional instructions

Correct Return Example
# demo-retb
0x026: ret                        F  D  E  M  W
       bubble                        F  D  E  M  W
       bubble                           F  D  E  M  W
       bubble                              F  D  E  M  W
0x00b: irmovl $5,%esi # Return                F  D  E  M  W
As ret passes through pipeline, stall at fetch stage
While in decode, execute, and memory stage
Inject bubble into decode stage
Release stall when reach write-back stage
When ret is in write-back
W: valM = 0x0b
F: valC ← 5, rB ← %esi

Detecting Return
Condition       Trigger
Processing ret  IRET in { D_icode, E_icode, M_icode }
[Figure: PIPE hardware structure highlighting the icode fields of pipeline registers D, E, and M]

Control for Return
# demo-retb
0x026: ret                        F  D  E  M  W
       bubble                        F  D  E  M  W
       bubble                           F  D  E  M  W
       bubble                              F  D  E  M  W
0x00b: irmovl $5,%esi # Return                F  D  E  M  W
Condition            F       D       E       M       W
Processing ret       stall   bubble  normal  normal  normal

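A few lines of Python are enough to see why handling ret this way costs three cycles. The sketch tracks only the icodes held in pipeline registers D, E, and M, applies the "Processing ret" actions each cycle (bubble into D, normal update of E and M), and counts the cycles during which fetch stays stalled. The function name and the string icodes are assumptions made for the example.

```python
def ret_stall_cycles():
    """Count cycles for which fetch is stalled while one ret drains."""
    D, E, M = "IRET", "INOP", "INOP"   # ret has just entered decode
    cycles = 0
    while "IRET" in (D, E, M):         # trigger: IRET in { D_icode, E_icode, M_icode }
        cycles += 1
        # F stalls, D receives a bubble, E and M take their normal inputs.
        D, E, M = "INOP", D, E
    return cycles


wasted = ret_stall_cycles()
```

The loop runs once for each stage ret occupies (decode, execute, memory), giving the three bubbles shown in the diagram.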
Special Control Cases
Detection
Condition            Trigger
Processing ret       IRET in { D_icode, E_icode, M_icode }
Load/Use Hazard      E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }
Mispredicted Branch  E_icode = IJXX & !e_Bch
Action (on next cycle)
Condition            F       D       E       M       W
Processing ret       stall   bubble  normal  normal  normal
Load/Use Hazard      stall   stall   bubble  normal  normal
Mispredicted Branch  normal  bubble  bubble  normal  normal

Implementing Pipeline Control
Combinational logic generates pipeline control signals
Action occurs at start of following cycle
[Figure: pipe control logic block taking D_icode, d_srcA, d_srcB, E_icode, E_dstM, e_Bch, and M_icode as inputs and producing F_stall, D_stall, D_bubble, and E_bubble]

Initial Version of Pipeline Control
bool F_stall =
# Conditions for a load/use hazard
E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
# Stalling at fetch while ret passes through pipeline
IRET in { D_icode, E_icode, M_icode };
bool D_stall =
# Conditions for a load/use hazard
E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };
bool D_bubble =
# Mispredicted branch
(E_icode == IJXX && !e_Bch) ||
# Stalling at fetch while ret passes through pipeline
IRET in { D_icode, E_icode, M_icode };
bool E_bubble =
# Mispredicted branch
(E_icode == IJXX && !e_Bch) ||
# Load/use hazard
E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB};
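For experimenting with this first version of the control logic, the four HCL equations can be transliterated into Python. The function name `initial_control`, the string icodes and register names, and the returned dictionary are assumptions made for this sketch; the Boolean structure follows the HCL above.

```python
def initial_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch):
    """Initial (uncorrected) pipeline control signals as a dict of booleans."""
    load_use = E_icode in ("IMRMOVL", "IPOPL") and E_dstM in (d_srcA, d_srcB)
    ret = "IRET" in (D_icode, E_icode, M_icode)
    mispredict = E_icode == "IJXX" and not e_Bch
    return {
        "F_stall": load_use or ret,
        "D_stall": load_use,
        "D_bubble": mispredict or ret,
        "E_bubble": mispredict or load_use,
    }


# A load into %esp in execute, followed by ret in decode
# (ret reads %esp as both srcA and srcB).
signals = initial_control("IRET", "IMRMOVL", "INOP", "%esp", "%esp", "%esp", e_Bch=True)
```

For this input both `D_stall` and `D_bubble` come out true, which is the conflicting case the deck goes on to analyze as Control Combination B.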
Control Combinations
Pipeline state   M       E       D
Load/use                 Load    Use
Mispredict               JXX
ret 1                            ret
ret 2                    ret     bubble
ret 3            ret     bubble  bubble
Combination A: Mispredict together with ret 1
Combination B: Load/use together with ret 1
Special cases that can arise on same clock cycle
Combination A
Not-taken branch
ret instruction at branch target
Combination B
Instruction that reads from memory to %esp
Followed by ret instruction

Control Combination A
Pipeline state   M       E       D
Mispredict               JXX
ret 1                            ret
Condition            F       D       E       M       W
Processing ret       stall   bubble  normal  normal  normal
Mispredicted Branch  normal  bubble  bubble  normal  normal
Combination          stall   bubble  bubble  normal  normal
Should handle as mispredicted branch
Stalls F pipeline register
But PC selection logic will be using M_valA anyhow

Control Combination B
Pipeline state   M       E       D
Load/use                 Load    Use
ret 1                            ret
Condition            F       D               E       M       W
Processing ret       stall   bubble          normal  normal  normal
Load/Use Hazard      stall   stall           bubble  normal  normal
Combination          stall   bubble + stall  bubble  normal  normal
Would attempt to bubble and stall pipeline register D
Signaled by processor as pipeline error

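Merging the per-condition action rows can be automated to spot conflicts of this kind. The Python sketch below is an illustration only: the table `ACTIONS`, the function `combine`, and the `"conflict"` marker are names invented here, with the rows copied from the action table for the three special cases.

```python
# Actions for pipeline registers F, D, E, M, W, one row per condition.
ACTIONS = {
    "ret":        ("stall",  "bubble", "normal", "normal", "normal"),
    "load_use":   ("stall",  "stall",  "bubble", "normal", "normal"),
    "mispredict": ("normal", "bubble", "bubble", "normal", "normal"),
}


def combine(*conditions):
    """Merge the actions of simultaneous conditions, register by register."""
    merged = []
    for column in zip(*(ACTIONS[c] for c in conditions)):
        wanted = set(column) - {"normal"}
        if len(wanted) > 1:
            merged.append("conflict")      # e.g. stall and bubble together
        else:
            merged.append(wanted.pop() if wanted else "normal")
    return tuple(merged)


combination_a = combine("mispredict", "ret")
combination_b = combine("load_use", "ret")
```

Combination A merges cleanly, while Combination B reports a conflict for register D, where one condition asks for a stall and the other for a bubble.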
Handling Control Combination B
Pipeline state   M       E       D
Load/use                 Load    Use
ret 1                            ret
Condition            F       D       E       M       W
Processing ret       stall   bubble  normal  normal  normal
Load/Use Hazard      stall   stall   bubble  normal  normal
Combination          stall   stall   bubble  normal  normal
Load/use hazard should get priority
ret instruction should be held in decode stage for additional cycle

Corrected Pipeline Control Logic
bool D_bubble =
# Mispredicted branch
(E_icode == IJXX && !e_Bch) ||
# Stalling at fetch while ret passes through pipeline
IRET in { D_icode, E_icode, M_icode }
# but not condition for a load/use hazard
&& !(E_icode in { IMRMOVL, IPOPL }
&& E_dstM in { d_srcA, d_srcB });
Condition            F       D       E       M       W
Processing ret       stall   bubble  normal  normal  normal
Load/Use Hazard      stall   stall   bubble  normal  normal
Combination          stall   stall   bubble  normal  normal
Load/use hazard should get priority
ret instruction should be held in decode stage for additional cycle

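The effect of the correction can be checked with a Python transliteration of the control signals, using the corrected `D_bubble` together with the other three equations from the initial version. As before, the function name, the string icodes, and the register-name strings are assumptions made for this sketch.

```python
def corrected_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch):
    """Pipeline control signals with the corrected D_bubble equation."""
    load_use = E_icode in ("IMRMOVL", "IPOPL") and E_dstM in (d_srcA, d_srcB)
    ret = "IRET" in (D_icode, E_icode, M_icode)
    mispredict = E_icode == "IJXX" and not e_Bch
    return {
        "F_stall": load_use or ret,
        "D_stall": load_use,
        # ret only bubbles D when no load/use hazard is present
        "D_bubble": mispredict or (ret and not load_use),
        "E_bubble": mispredict or load_use,
    }


# Combination B: load into %esp in execute, ret in decode.
combo_b = corrected_control("IRET", "IMRMOVL", "INOP", "%esp", "%esp", "%esp", e_Bch=True)
# One cycle later the load has moved to memory and a bubble sits in execute.
next_cycle = corrected_control("IRET", "INOP", "IMRMOVL", "none", "%esp", "%esp", e_Bch=True)
```

In the first cycle register D stalls without a bubble, holding ret in decode; in the following cycle the ordinary ret handling (stall F, bubble D) takes over.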
Pipeline Summary
Data Hazards
Most handled by forwarding
No performance penalty
Load/use hazard requires one cycle stall
Control Hazards
Cancel instructions when a mispredicted branch is detected
Two clock cycles wasted
Stall fetch stage while ret passes through pipeline
Three clock cycles wasted
Control Combinations
Must analyze carefully
First version had subtle bug
Only arises with unusual instruction combination
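The penalties listed in this summary can be folded into a rough cycles-per-instruction estimate. The Python sketch below assumes an ideal CPI of 1.0 plus one, two, and three bubble cycles for each load/use hazard, mispredicted branch, and ret. The function name `estimate_cpi` and the event frequencies in the example are illustrative assumptions, not measurements from the slides.

```python
def estimate_cpi(load_use_freq, mispredict_freq, ret_freq):
    """Estimate CPI from the fraction of instructions hitting each penalty.

    Each argument is the fraction of all executed instructions that
    incur the corresponding special case.
    """
    LOAD_USE_PENALTY = 1    # one cycle stall
    MISPREDICT_PENALTY = 2  # two clock cycles wasted
    RET_PENALTY = 3         # three clock cycles wasted
    return (1.0
            + LOAD_USE_PENALTY * load_use_freq
            + MISPREDICT_PENALTY * mispredict_freq
            + RET_PENALTY * ret_freq)


# Hypothetical workload: 5% of instructions cause a load/use stall,
# 8% are mispredicted branches, 2% are ret instructions.
example_cpi = estimate_cpi(0.05, 0.08, 0.02)
```

With these made-up frequencies the estimate is about 1.27 cycles per instruction; different workloads would shift the balance among the three terms.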