EE 457 Unit 6a Pipelining Introduction More Pipelining Examples

Pipelining Introduction
• Consider a drink bottling plant
– Filling the bottle = 3 sec.
– Placing the cap = 3 sec.
– Labeling = 3 sec.
• Would you want…
– Machine 1 = Does all three (9 secs.), outputs the bottle, repeats…
– Machine 2 = Divided into three parts (one for each step), passing bottles between them
Basic Pipelining Techniques
• Machine _____ offers ability to __________
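[Figure: Machine 1 is a single unit that fills, caps, and labels (Filler + Capper + Label, 3 + 3 + 3 sec); Machine 2 chains three units: Filler (3 sec), Place Cap (3 sec), Labeler (3 sec).]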
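The difference between the two machines can be checked numerically. The sketch below is illustrative only: the function name `total_time` and the 100-bottle batch are assumptions, and it assumes Machine 2 passes bottles between its parts with no extra delay.

```python
def total_time(num_bottles, step_secs, pipelined):
    """Seconds needed to finish num_bottles given the per-step times."""
    if num_bottles == 0:
        return 0
    per_bottle = sum(step_secs)
    if not pipelined:
        # Machine 1: every step finishes for one bottle before the next starts.
        return num_bottles * per_bottle
    # Machine 2: the first bottle takes the full time; after that one bottle
    # completes each time the slowest step finishes.
    return per_bottle + (num_bottles - 1) * max(step_secs)


steps = [3, 3, 3]  # fill, cap, label (seconds, from the slide)
machine1 = total_time(100, steps, pipelined=False)
machine2 = total_time(100, steps, pipelined=True)
print(machine1, machine2)  # 900 306
```

Each bottle still spends 9 seconds in either machine; only the rate at which finished bottles come out changes.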
More Pipelining Examples
• Freshman/Sophomore/Junior/Senior
• Wash/Dry/Fold
– Would you buy a combo washer + dryer unit that does both operations in the same tank??
• Car Assembly Line
Summing Elements
• Consider adding an array of 4-bit numbers:
– Z[i] = A[i] + B[i]
– Delay: 10 ns Mem. Access (read or write), 10 ns each FA
– Clock cycle time = _____________________________________
[Figure: index i (labeled 5 ns) addresses AMEM, BMEM, and ZMEM; A[3:0] and B[3:0] feed a 4-bit ripple-carry adder built from four full adders (FA) with a carry-in of 0, and the sum Z[3:0] is written to ZMEM.]
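To make the adder concrete, here is a minimal functional sketch of the four chained full adders. The array contents are made-up test values, the function names are hypothetical, and the delay estimate covers only one memory read, four FAs, and one memory write using the delays listed on the slide. It deliberately leaves out the 5 ns label on the index path, so treat it as one reading of the figure rather than the answer to the blank.

```python
def full_adder(x, y, c_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (c_in & (x ^ y))
    return s, c_out


def ripple_add4(a, b):
    """4-bit ripple-carry add; the carry out of bit 3 is dropped (Z is 4 bits)."""
    carry = 0
    z = 0
    for bit in range(4):
        s, carry = full_adder((a >> bit) & 1, (b >> bit) & 1, carry)
        z |= s << bit
    return z


MEM_NS = 10  # memory read or write, from the slide
FA_NS = 10   # each full adder, from the slide
# Assumption: read, then four FAs in series, then write, all in one cycle.
path_ns = MEM_NS + 4 * FA_NS + MEM_NS

AMEM = [3, 7, 9, 15]  # hypothetical data
BMEM = [4, 8, 6, 1]   # hypothetical data
ZMEM = [ripple_add4(a, b) for a, b in zip(AMEM, BMEM)]
print(ZMEM, path_ns)  # [7, 15, 15, 0] 60
```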
Pipelined Adder
• If we assume that the pipeline registers are ideal (0 ns additional delay) we can clock the pipe every __ ns. Speedup = ____
[Figure: the adder datapath with Pipeline Registers (Stage Latches) S1/S2, S2/S3, S3/S4, S4/S5, and S5/S6 separating the AMEM/BMEM reads, the four full adders (carries C1 to C4, sum bits S0 to S3), and the ZMEM write; delay labels are 5 ns on the index i path and 10 ns on the memories.]
Pipelining Effects on Clock Period
• Rather than just try to balance delay we could consider making more stages
– Divide long stage into multiple stages
– In Example 3, clock period could be _________________
– Time through the pipeline (latency) is still 20 ns, but we’ve doubled our throughput (1 result every __ ns rather than every 10 or 15 ns)
– Note: There is a small time overhead to adding a pipeline register/stage (i.e. can’t go crazy adding stages)
Ex. 1: Unbalanced stage delay, Clock Period = ___ns
Ex. 2: Balanced stage delay, Clock Period = __ns (____ speedup)
Ex. 3: Break long stage into multiple stages, Clock period = ___(___ speedup)
[Figure: three example pipelines; the stage delay labels include 15 ns, 10 ns and 10 ns, and four 5 ns stages.]
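The rule behind these examples is that the slowest stage sets the clock period. The sketch below is a hedged illustration: the 5 ns + 15 ns split for Ex. 1 is an assumption inferred from the 20 ns total latency, speedups are measured against Ex. 1 (also an assumption), and `reg_overhead_ns` is a hypothetical parameter standing in for the register overhead mentioned in the note.

```python
def clock_period(stage_delays_ns, reg_overhead_ns=0):
    """The slowest stage (plus any register overhead) sets the clock period."""
    return max(stage_delays_ns) + reg_overhead_ns


examples = {
    "Ex. 1 (unbalanced)": [5, 15],        # assumed split of the 20 ns of logic
    "Ex. 2 (balanced)": [10, 10],
    "Ex. 3 (more stages)": [5, 5, 5, 5],
}

baseline = clock_period(examples["Ex. 1 (unbalanced)"])
for name, delays in examples.items():
    period = clock_period(delays)
    print(f"{name}: period = {period} ns, "
          f"speedup vs Ex. 1 = {baseline / period:.1f}x")
```

Passing a nonzero `reg_overhead_ns` shows why adding stages has diminishing returns: the overhead is paid once per stage and does not shrink as the logic is divided further.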
To Register or Latch?
• What should we use for pipeline stages
– Registers [edge-sensitive] …or…
– Latches [level-sensitive]
• Latches may allow data to _________________
• Answer: __________________
[Figure: stages S1, S2, and S3 separated by elements labeled "Register or Latch".]
But Can We Latch?
• We can latch if we run the latches on opposite phases of the clock or have a so-called _________________
– Because each latch runs on the opposite phase data can only move one step before being stopped by a latch that is in hold (off) mode
• You may learn more about this in EE577a or EE560 (a technique known as Slack Borrowing & Time Stealing)
[Figure: half-stages S1b, S2a, S2b, S3a, and S3b separated by latches clocked alternately on Φ and ~Φ.]
Pipelining Introduction
• Implementation technique that _________execution of multiple instructions at once
• Improves ___________ rather than single-instruction execution latency
• ______________ stage determines clock cycle time [e.g. a 30 min. wash cycle but 1 hour dry time means _________ per load]
• In the case of perfectly balanced stages:
– Speedup =
• A 5-stage pipelined CPU may not realize this 5x speedup b/c…
– The stages may not be perfectly balanced
– The overhead of filling up the pipe initially
– The overhead (setup time and clock-to-Q delay) of the stage registers
– Inability to keep the pipe full due to branches & data hazards
Non-Pipelined Execution
Instruction  Fetch (I-MEM)  Reg. Read  ALU Op.  Data Mem  Reg. Write  Total Time
Load         10 ns          5 ns       10 ns    10 ns     5 ns        40 ns
Store
R-Type
Branch
Jump
[Figure: timeline in which LW $5,100($2), LW $7,40($6), and LW $8,24($6) run back to back, each taking 40 ns for Fetch, Reg, ALU, Data, Reg.]
3 Instructions = 3*40 ns
Pipelined Execution
[Figure: timeline marked every 10 ns from 10 ns to 80 ns; LW $5,100($2), LW $7,40($6), and LW $8,24($6) each pass through Fetch, Reg, ALU, Data, Reg, with each instruction starting 10 ns after the previous one.]
• Notice that even though the register access only takes 5 ns it is allocated a 10 ns slot in the pipeline
• Total time for these 3 pipelined instructions =
– 70 ns = ___ ns for 1st instruc + _____ for the remaining instructions to complete
• The speedup looks like it is only 120 ns / 70 ns = 1.7x
• But consider 1003 instructions: ____________________________
– The overhead of filling the pipeline is ___________ over steady-state execution when the pipeline is full
Pipelined Timing
• Execute n instructions using a k stage datapath
– i.e. Multicycle CPU w/ k steps or single cycle CPU w/ clock cycle k times slower
• w/o pipelining: ___________
• w/ pipelining: ____________
– ___ cycles for 1st instruc. + ____ cycles for n-1 instrucs.
– Assumes we keep the pipeline full
Cycle  Fetch (10ns)  Decode (10ns)  Exec. (10ns)  Mem. (10ns)  WB (10ns)
C1     ADD
C2     SUB           ADD
C3     LW            SUB            ADD
C4     SW            LW             SUB           ADD
C5     AND           SW             LW            SUB          ADD
C6     OR            AND            SW            LW           SUB
C7     XOR           OR             AND           SW           LW
C8                   XOR            OR            AND          SW
C9                                  XOR           OR           AND
C10                                               XOR          OR
C11                                                            XOR
Pipeline Filling (C1 to C4), Pipeline Full (C5 to C7), Pipeline Emptying (C8 to C11)
7 Instrucs. = 11 clocks
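The cycle counts on these two slides can be reproduced with a short sketch. It assumes the pipeline stays full (no hazards), uses the 10 ns stage slot and 40 ns non-pipelined instruction time from the LW example, and the function names are hypothetical.

```python
def unpipelined_cycles(n, k):
    """n instructions, each taking all k steps before the next begins."""
    return n * k


def pipelined_cycles(n, k):
    """k cycles for the first instruction, then one more per remaining one."""
    return k + (n - 1)


STAGES = ["Fetch", "Decode", "Exec.", "Mem.", "WB"]


def occupancy(instrs, stages=STAGES):
    """One row per clock cycle listing the instruction in each stage."""
    k = len(stages)
    rows = []
    for cycle in range(pipelined_cycles(len(instrs), k)):
        row = []
        for s in range(k):
            i = cycle - s  # instruction i reaches stage s in cycle i + s
            row.append(instrs[i] if 0 <= i < len(instrs) else "")
        rows.append(row)
    return rows


STAGE_NS = 10        # pipeline slot per stage
UNPIPELINED_NS = 40  # one LW without pipelining
for n in (3, 1003):
    before = n * UNPIPELINED_NS
    after = pipelined_cycles(n, len(STAGES)) * STAGE_NS
    print(n, before, after, round(before / after, 2))
# 3 120 70 1.71
# 1003 40120 10070 3.98

for c, row in enumerate(occupancy(["ADD", "SUB", "LW", "SW", "AND", "OR", "XOR"]), 1):
    print(f"C{c:<3}" + "".join(f"{name:<6}" for name in row))
```

With these numbers the speedup approaches 4x rather than 5x because the 5 ns register accesses are padded to full 10 ns slots in the pipeline.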
Designing the Pipelined Datapath
• To pipeline a datapath in five stages means five instructions will be executing (“in-flight”) during any single clock cycle
• Resources cannot be shared between stages because there may always be an instruction wanting to use the resource
– Each stage needs its own resources
– The single-cycle CPU datapath also matches this concept of no shared resources
– We can simply divide the single-cycle CPU into stages
Single-Cycle CPU Datapath
[Figure: single-cycle CPU datapath divided into Fetch (IF), Decode (ID), Exec. (EX), Mem, and WB: PC with +4 and branch adders, I-Cache, Register File addressed by instruction bits [25:21], [20:16], and [15:11], Sign Extend and Sh. Left 2 on bits [15:0], ALU with ALU control driven by INST[5:0] and ALUOp[1:0], and D-Cache; control signals shown are PCSrc, Branch, RegWrite, RegDst, ALUSrc, MemRead, MemWrite, and MemtoReg.]
Information Flow in a Pipeline
• Data or control information should flow only in the forward direction in a linear pipeline
– Non-linear pipelines, where information is fed back into a previous stage, occur in more complex pipelines such as floating point dividers
• The CPU pipeline is like a buffet line or cafeteria where people cannot revisit a previous serving station without disrupting the smooth flow of the line
[Figure: Buffet Line]
Register File
• Don’t we have a non-linear flow when we write a value back to the register file?
– An instruction in WB is re-using the register file in the ID stage
– Actually we are utilizing different ________of the register file
  • ID stage ___________ register values
  • WB stage __________ register value
– Like a buffet line with _________ at one station
• Only an issue if WB to same register as being read
• Register file can be designed to do “internal forwarding” where the data being written is immediately ______ out as the ___________________
[Figure: pipeline diagram over CC1 to CC8 (IM, Reg, ALU, DM, Reg) in which LW $5,100($2) writes $5 in WB during the same clock cycle that ADD $3,$4,$5 reads $5 in ID.]
Pipelining the Fetch Phase
• Note that to keep the pipeline full we have to fetch a new instruction every clock cycle
• Thus we have to perform PC = PC + 4 every clock cycle
• Thus there shall be no pipelining registers in the datapath responsible for PC = PC + 4
• Support for branch/jump warrants a lengthy discussion which we will perform later
[Figure: Fetch stage with the PC, a +4 adder, a 2-to-1 mux, the I-Cache, and the Stage Register.]
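One way to picture internal forwarding in the register file is a behavioral model in which a read of the register being written in the same cycle returns the new data. This is only a sketch under that assumption; the class and method names are hypothetical and it says nothing about how a real register file is built.

```python
class RegisterFile:
    """32-entry register file model with internal forwarding."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs

    def clock(self, read1, read2, reg_write=False, write_reg=0, write_data=0):
        """Model one clock cycle and return (read data 1, read data 2)."""
        writing = reg_write and write_reg != 0  # register $0 stays 0 in MIPS

        def read(num):
            # Internal forwarding: a read of the register being written
            # sees the new data instead of the stale stored value.
            if writing and num == write_reg:
                return write_data
            return self.regs[num]

        outputs = (read(read1), read(read2))
        if writing:
            self.regs[write_reg] = write_data
        return outputs


rf = RegisterFile()
rf.regs[4] = 7
# Same cycle: LW in WB writes $5 while ADD $3,$4,$5 in ID reads $4 and $5.
print(rf.clock(read1=4, read2=5, reg_write=True, write_reg=5, write_data=99))  # (7, 99)
```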
Pipeline Packing List
• Just as when you go on a trip you have to pack everything you need _____________, so in pipelines you have to take all the control and data you will need with you down the pipeline
Basic 5 Stage Pipeline
• Compute the size of each pipeline register (find the max. info needed for any instruction in each stage)
• To simplify, just consider LW/SW (Ignore control signals)
LW $10,40($1): Op = 35, rs=1, rt=10, immed.=40
SW $15,100($2): Op = 43, rs=2, rt=15, immed.=100
[Figure: five-stage pipelined datapath (Fetch, Decode, Exec., Mem, WB) with the Instruction Register (Instruc = 32 bits) after Fetch and a Pipeline Stage Register after Decode, Exec., and Mem; the Register File read ports take rs and rt, the write port takes rt/rd, and the 16-bit immediate is sign-extended to 32 bits.]
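The field values listed for LW and SW can be packed into 32-bit instruction words using the standard MIPS I-type layout (opcode in bits 31-26, rs in 25-21, rt in 20-16, immediate in 15-0). The helper names below are hypothetical; this is a small sketch of what the Instruction Register holds for each example.

```python
def encode_i_type(op, rs, rt, immed):
    """Pack MIPS I-type fields into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (immed & 0xFFFF)


def decode_i_type(word):
    """Split a 32-bit word back into (op, rs, rt, immed) without sign extension."""
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, word & 0xFFFF


lw = encode_i_type(op=35, rs=1, rt=10, immed=40)   # LW $10,40($1)
sw = encode_i_type(op=43, rs=2, rt=15, immed=100)  # SW $15,100($2)
print(f"{lw:#010x} {sw:#010x}")  # 0x8c2a0028 0xac4f0064
print(decode_i_type(lw))         # (35, 1, 10, 40)
```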
Basic 5 Stage Pipeline
• There is a bug in the load instruction implementation
• Which register is written with the data read from memory?
• We need to preserve the ______________ number by carrying it through the pipeline with us
• In general this is true for all signals needed later in the pipe
[Figure: the five-stage pipelined datapath with LW $10,40($1) followed by SW $15,100($2); the Register File write register number is taken from the Instruction Register in Decode.]
PIPELINE CONTROL
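A small sketch can show what goes wrong. It assumes that while instruction i is in WB (stage 5), the Instruction Register in Decode (stage 2) holds instruction i + 3. The four-instruction program reuses LW/SW examples from earlier slides, but their ordering here is hypothetical, as are the function names.

```python
# (assembly text, rt field) for a short LW/SW-only program
program = [
    ("LW $10,40($1)", 10),
    ("SW $15,100($2)", 15),
    ("LW $7,40($6)", 7),
    ("LW $8,24($6)", 8),
]

DECODE, WB = 1, 4  # zero-based stage positions in the five-stage pipeline


def write_reg_from_decode(program, i):
    """Buggy design: the write register number comes from the instruction
    that is sitting in Decode when instruction i reaches WB."""
    j = i + (WB - DECODE)
    return program[j][1] if j < len(program) else None


def write_reg_carried(program, i):
    """Fixed design: instruction i's own register number travels with it."""
    return program[i][1]


print(write_reg_from_decode(program, 0))  # 8  (wrong register for the first LW)
print(write_reg_carried(program, 0))      # 10
```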
Pipeline Control Overview
• We will just consider basic (simple) pipeline control and deal with problems related to branch and data hazards later
• It is assumed that the PC and pipeline registers update on each clock cycle so no separate write enable signals are needed for these registers
[Figure: five-stage pipelined datapath (Fetch, Decode, Exec., Mem, WB) with the Instruction Register and three Pipeline Stage Registers between the stages.]
Stage Control
• Instruction Fetch: The control signals to read instruction memory and to write the PC are always asserted, so there is nothing special to control in this pipeline stage
• ID/RF: As in the previous stage, the same thing happens at every clock cycle, so there are no optional control lines to set
• Execution: The signals to be set are RegDst, ALUOp/Func, and ALUSrc. These signals select whether the result of the instruction is written into the register specified by bits 20-16 (for a load) or bits 15-11 (for an R-format instruction), specify the ALU operation, and determine whether the second operand of the ALU will be a register or a sign-extended immediate
• Memory Stage: The control lines set in this stage are Branch, MemRead, and MemWrite. These signals are set for the BEQ, LW, and SW instructions respectively
• WriteBack: The two control lines are RegWrite, which writes the chosen register, and MemToReg, which decides between the ALU result or memory value as the write data
Control Signal Generation
• Recall from the Single-Cycle CPU
discussion that there is no state machine
control, but a simple translator
(combinational logic) to translate the 6-bit opcode into these 9 control signals
• Since the datapaths of the single-cycle
and pipelined CPU are essentially the
same, so is the control
• The main difference is that the control
signals are generated in one clock cycle
and used in a subsequent cycle (later
pipeline stage)
• We can produce all our signals in the
________ and use the pipeline registers
to store and pass them to the
_______________ stage
Instruction  RegDst  ALUSrc  ALUOp[1:0]  Func[5:0]  Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format     1       0       10          …          0       0        0         1         0
LW           0       1       00          X          0       1        0         1         1
SW           X       1       00          X          0       0        1         0         X
Beq          X       0       01          X          1       0        0         0         X
Control Signals per Stage
• How many control signals are needed in each stage?
[Figure: Decode stage with the Control block driven by instruction bits [31:26]; its outputs include RegDst, ALUSrc, ALUOp[1:0], MemtoReg, and RegWrite. The Register File is addressed by bits [25:21], [20:16], and [15:11] (selected by RegDst), and bits [15:0] feed the Sign Extend unit.]
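The grouping of control signals by consuming stage can be written out as a lookup table. The sketch below uses the standard MIPS control values for these four instruction classes and the EX/MEM/WB grouping given under Stage Control; the names `CONTROL`, `by_stage`, and `bits_per_stage` are hypothetical, and `None` stands for a don't-care (X). Func[5:0] is omitted because it comes from the instruction itself rather than from the opcode decoder.

```python
from collections import namedtuple

Control = namedtuple(
    "Control", "RegDst ALUSrc ALUOp Branch MemRead MemWrite RegWrite MemtoReg")
X = None  # don't care

CONTROL = {
    "R-format": Control(1, 0, 0b10, 0, 0, 0, 1, 0),
    "LW":       Control(0, 1, 0b00, 0, 1, 0, 1, 1),
    "SW":       Control(X, 1, 0b00, 0, 0, 1, 0, X),
    "BEQ":      Control(X, 0, 0b01, 1, 0, 0, 0, X),
}

STAGE_SIGNALS = {
    "EX":  ("RegDst", "ALUSrc", "ALUOp"),
    "MEM": ("Branch", "MemRead", "MemWrite"),
    "WB":  ("RegWrite", "MemtoReg"),
}
WIDTH = {"ALUOp": 2}  # every other signal is 1 bit wide


def by_stage(kind):
    """Split one instruction's control word into per-stage groups."""
    word = CONTROL[kind]._asdict()
    return {stage: {name: word[name] for name in names}
            for stage, names in STAGE_SIGNALS.items()}


def bits_per_stage():
    """Number of control bits each stage consumes."""
    return {stage: sum(WIDTH.get(name, 1) for name in names)
            for stage, names in STAGE_SIGNALS.items()}


print(bits_per_stage())       # {'EX': 4, 'MEM': 3, 'WB': 2}
print(by_stage("LW")["MEM"])  # {'Branch': 0, 'MemRead': 1, 'MemWrite': 0}
```

The three counts add up to the 9 control bits produced from the opcode.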
Basic 5 Stage Pipeline
• Control is generated in the decode stage and passed along to consuming stages through stage registers
[Figure: five-stage pipelined datapath with the Control block in Decode. The stage register after Decode carries the Ex, Mem, and WB control groups, the one after Exec. carries Mem and WB, and the one after Mem carries WB. Ex = ALUSrc, RegDst, ALUOp, (Func); Mem = Branch, MemRead, MemWrite; WB = RegWrite, MemToReg.]
Exercise:
• On copies of this sheet, show this sequence executing on the pipeline:
– LW $10,40($1)
=> SUB $11,$2,$3
=> AND $12,$4,$5
=> OR $13,$6,$7
=> ADD $14,$8,$9
[Figure: the same five-stage pipelined datapath with control, left blank for the exercise.]
Review
• Although an instruction can begin at each clock cycle, an individual
instruction still takes five clock cycles
• Note that it takes four clock cycles before the five-stage pipeline is
operating at full efficiency
• Register write-back is controlled by the WB stage even though the register
file is located in the ID stage; the correct write register ID is carried down
the pipeline with the instruction data
• When a stage is inactive, the values of the control lines are deasserted
(shown as 0's) to prevent anything harmful from occurring
• No state machine is needed; sequencing of the control signals follows
simply from the pipeline itself (i.e. control signals are produced initially
but delayed by the stage registers until the correct stage / clock cycle for
application of that signal)
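The last two review points can be illustrated with a shift-register style model: whatever is decoded in one cycle simply moves one stage register to the right on each clock. In this sketch the instruction names stand in for their control words, "0" marks deasserted control in an inactive stage, and the function name is hypothetical. The sequence is the one from the exercise.

```python
BUBBLE = "0"  # deasserted control lines for an inactive stage


def trace_control(instrs, k=5):
    """Per clock cycle, list whose control each of the k stages is using."""
    regs = [BUBBLE] * (k - 1)  # contents of the k-1 stage registers
    trace = []
    # Keep clocking after the last fetch so the pipeline can drain.
    for fetched in list(instrs) + [BUBBLE] * (k - 1):
        trace.append([fetched] + regs)  # stage 1 sees the newly fetched item
        regs = [fetched] + regs[:-1]    # clock edge: everything moves one stage
    return trace


sequence = ["LW", "SUB", "AND", "OR", "ADD"]
for cycle, stages in enumerate(trace_control(sequence), start=1):
    print(f"CC{cycle}: " + " ".join(f"{s:<4}" for s in stages))
```

No sequencing logic appears anywhere in the model; the ordering of control signals falls out of the registers shifting once per clock.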
ADDITIONAL REFERENCE
LW $t1,4($s0): Fetch
• Fetch LW and increment PC
[Figure: five-stage pipelined datapath with the Fetch stage active.]
LW $t1,4($s0): Decode
• Decode instruction and fetch operands
[Figure: five-stage pipelined datapath with the Decode stage active. Callouts: LW $t1,4($s0) machine code, $s0 # (register read), $t1 #.]
LW $t1,4($s0): Execute
• Add offset 4 to $s0 value
[Figure: five-stage pipelined datapath with the Exec. stage active. Stage register callout: $t1 # / Offset=0x00000004 / $s0 value.]
LW $t1,4($s0): Memory
• Read word from memory
[Figure: five-stage pipelined datapath with the Mem stage active. Stage register callout: $t1 # / Address.]
LW $t1,4($s0): Writeback
• Write word to $t1
[Figure: five-stage pipelined datapath with the WB stage active. Stage register callout: $t1 # / Data read from memory.]
LW $t1,4($s0)
• Fetch LW
• Decode instruction and fetch operands
• Add offset 4 to $s0
• Read word from memory
• Write word to $t1
[Figure: five-stage pipelined datapath with one callout per stage summarizing the steps above.]
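The LW walkthrough can be followed in code by tracking what each stage register carries, as named in the figure callouts. This is a hedged sketch: the register and memory contents are made-up values, the dictionary keys are hypothetical names, and the memory is modeled as a simple address-to-word mapping.

```python
def run_lw(regs, memory, rt, rs, offset):
    """Follow LW rt,offset(rs) through Decode, Exec., Mem, and WB."""
    # Decode: read the base register; carry the rt number and offset along.
    id_ex = {"rt": rt, "offset": offset, "rs_value": regs[rs]}
    # Exec.: add the offset to the base register value.
    ex_mem = {"rt": id_ex["rt"], "address": id_ex["rs_value"] + id_ex["offset"]}
    # Mem: read the word at the computed address.
    mem_wb = {"rt": ex_mem["rt"], "data": memory[ex_mem["address"]]}
    # WB: write the loaded word using the rt number carried down the pipe.
    regs[mem_wb["rt"]] = mem_wb["data"]
    return id_ex, ex_mem, mem_wb


regs = {"$s0": 0x1000, "$t1": 0}  # hypothetical register contents
memory = {0x1004: 0x12345678}     # hypothetical memory contents
for stage_reg in run_lw(regs, memory, rt="$t1", rs="$s0", offset=4):
    print(stage_reg)
print(hex(regs["$t1"]))  # 0x12345678
```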
ADD $t4,$t5,$t6: Fetch
• Fetch ADD and increment PC
[Figure: five-stage pipelined datapath with the Fetch stage active.]
ADD $t4,$t5,$t6: Decode
• Decode instruction and fetch operands
[Figure: five-stage pipelined datapath with the Decode stage active. Callouts: ADD $t4,$t5,$t6 machine code, $t5 # (register read), $t6 # (register read), $t4 #.]
ADD $t4,$t5,$t6: Execute
• Add $t5 + $t6
[Figure: five-stage pipelined datapath with the Exec. stage active. Stage register callout: $t4 # / $t6 value / $t5 value.]
ADD $t4,$t5,$t6: Memory
• Just pass sum through
[Figure: five-stage pipelined datapath with the Mem stage active. Stage register callout: $t4 # / Sum of $t5 + $t6.]
ADD $t4,$t5,$t6: Writeback
• Write sum to $t4
[Figure: five-stage pipelined datapath with the WB stage active. Stage register callout: $t4 # / Sum of $t5 + $t6.]
ADD $t4,$t5,$t6
• Fetch ADD
• Decode instruction and fetch operands
• Add $t5 + $t6
• Just pass sum through
• Write sum to $t4
[Figure: five-stage pipelined datapath with one callout per stage summarizing the steps above.]
OLD PIPELINING
Basic 5 Stage Pipeline
• Control is generated in the decode stage and passed along to consuming stages through stage registers
[Figure: five-stage pipelined datapath (Fetch, Decode, Exec., Mem, WB) with the Instruction Register and Pipeline Stage Registers, drawn without the control block.]
Basic 5 Stage Pipeline
• Control is generated in the decode stage and passed along to consuming stages through stage registers
[Figure: five-stage pipelined datapath with the Control block in Decode. The stage registers carry the Ex, Mem, and WB control groups (Ex = ALUSrc, RegDst, ALUOp, (Func); Mem = Branch, MemRead, MemWrite; WB = RegWrite, MemToReg), dropping each group once its stage has used it.]