
Intro to Computer Org.
Pipelining, Part 1
Pushing Performance Further

With the multicycle datapath, time
can be saved for some instructions
by breaking each instruction into
multiple pieces.

Some instructions need fewer cycles
than others, which lets us avoid padding
every instruction out to the time of the
longest one, as the single-cycle datapath must.
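As a rough sketch of why this helps, here is an illustrative comparison. The stage latencies and stage lists below are made-up values for illustration, not figures from the lecture:

```python
# Illustrative stage latencies in picoseconds (made-up values).
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Which stages each instruction class actually needs.
NEEDS = {
    "R-type": ["IF", "ID", "EX", "WB"],
    "lw":     ["IF", "ID", "EX", "MEM", "WB"],
    "sw":     ["IF", "ID", "EX", "MEM"],
    "beq":    ["IF", "ID", "EX"],
}

# Single-cycle: every instruction pays for the worst-case path (lw here).
single_cycle_ps = sum(STAGE_PS[s] for s in NEEDS["lw"])  # 800 ps

# Multicycle: the clock matches the slowest single stage, and each
# instruction spends only the cycles it needs.
cycle_ps = max(STAGE_PS.values())  # 200 ps
for instr, stages in NEEDS.items():
    print(f"{instr}: single-cycle {single_cycle_ps} ps, "
          f"multicycle {len(stages) * cycle_ps} ps")
```

With these numbers, a branch finishes in 600 ps on the multicycle design versus 800 ps on the single-cycle one; a load actually gets slower, which is exactly why the savings apply only to "some instructions."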
Pushing Performance Further

Note that with the multicycle datapath,
we are still only executing one
instruction at a time.
Pushing Performance Further

Observing the individual steps needed
for each instruction, we noted that not
every part of the datapath is used in
every cycle.

One classmate mentioned that after
the second stage, we really should be
able to load up a new instruction.
Pushing Performance Further

If we wish to improve performance,
what properties of a CPU can we
seek to modify?
- Instruction Count
- Clock rate / GHz
- Cycles Per Instruction (CPI)

Pushing Performance Further

If we wish to improve performance,
what properties of a CPU can we
seek to modify?

- Instruction Count
  - Dependent on the compiler.
- Clock rate / GHz
- Cycles Per Instruction (CPI)

Pushing Performance Further

If we wish to improve performance,
what properties of a CPU can we
seek to modify?
- Instruction Count
- Clock rate / GHz
  - May be fundamentally limited.
- Cycles Per Instruction (CPI)
Pushing Performance Further

If we wish to improve performance,
what properties of a CPU can we
seek to modify?
- Instruction Count
- Clock rate / GHz
- Cycles Per Instruction (CPI)
  - Can we do more instructions per cycle?
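These three factors combine in the classic performance equation, CPU time = instruction count x CPI / clock rate. A quick illustration with made-up numbers:

```python
def cpu_time(instr_count, cpi, clock_hz):
    """CPU time = instruction count x CPI / clock rate."""
    return instr_count * cpi / clock_hz

# Made-up program: 1 billion instructions on a 1 GHz machine.
ic = 1_000_000_000
multicycle = cpu_time(ic, 4.0, 1e9)  # CPI of 4 -> 4.0 seconds
pipelined  = cpu_time(ic, 1.0, 1e9)  # CPI of 1 -> 1.0 second
print(multicycle / pipelined)        # 4.0x speedup
```

Cutting any one of the three factors, with the other two held fixed, cuts execution time by the same proportion.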
Pushing Performance Further

If we wished to reduce the CPI
from that of the multicycle
datapath, what could we do?

Merging cycles together would lower
the CPI, but the merged cycles would
take longer, reducing the clock rate.
Pushing Performance Further

“Computer performance is
characterized by the amount of
useful work accomplished by a
computer system compared to the
time and resources used.”
-Wikipedia article
Pushing Performance Further

If we can’t make the “useful work”
faster… what if we could do more of
it at the same time?
Pushing Performance Further

In technical terms: if you cannot
improve the latency, try improving
the throughput.
- Latency – the time required to complete a single task.
- Throughput – the number of tasks completed per unit time.
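A small numeric illustration of the distinction (all numbers invented):

```python
# One task takes 90 minutes start to finish (latency), but if a new task
# can be started every 30 minutes, tasks complete at a rate of 2 per hour
# (throughput) even though no single task finishes any sooner.
latency_min = 90
start_interval_min = 30

throughput_per_hour = 60 / start_interval_min
print(latency_min, throughput_per_hour)  # 90 2.0
```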

Pushing Performance Further
If we can’t make the “useful work”
faster… what if we could do more of
it at the same time?
- A classic example: doing laundry.
- When multiple loads must be done, the only interference between loads of laundry comes from the different time requirements of the different steps.
- One dryer can only handle one load at a time.
Sequential Laundry

[Chart: four loads A–D run back-to-back in 30-minute steps from 6 PM to 2 AM (8 hours total).]
Pipelined Laundry

[Chart: the same four loads A–D overlapped, a new load starting every 30 minutes; everything finishes in 3.5 hours.]
The same amount of work is
done in far less time… but
each load takes the same
amount of time!
- Latency is the same.
- Throughput is greater.
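The timing in the two laundry charts can be reproduced with a quick sketch, assuming four 30-minute steps per load to match the charts:

```python
def sequential_time(loads, steps, step_min):
    # Each load finishes all of its steps before the next load begins.
    return loads * steps * step_min

def pipelined_time(loads, steps, step_min):
    # The first load takes `steps` slots; each later load adds one slot.
    return (steps + loads - 1) * step_min

# Four loads, four 30-minute steps each, as in the charts.
print(sequential_time(4, 4, 30))  # 480 minutes = 8 hours
print(pipelined_time(4, 4, 30))   # 210 minutes = 3.5 hours
```

Note that each individual load still occupies 4 x 30 = 120 minutes either way; only the total time for all four loads shrinks.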
Pipelining Our Datapath
- The pipelined datapath will look very similar to the single-cycle datapath.
- However, the cycle divisions will come directly from the multicycle datapath.

Pipelining Our Datapath
1. Instruction Fetching (IF)
2. Instruction Decoding (ID)
3. Execute [ALU, branching] (EX)
4. Memory Access (MEM)
5. Writeback to the register file (WB)
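The classic pipeline diagram, with each instruction entering one cycle behind the previous one, can be generated by a short sketch:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_rows(n_instructions):
    """One row per instruction; '..' marks cycles it isn't in the pipeline."""
    cycles = n_instructions + len(STAGES) - 1
    return [
        [STAGES[c - i] if 0 <= c - i < len(STAGES) else ".."
         for c in range(cycles)]
        for i in range(n_instructions)
    ]

for i, row in enumerate(pipeline_rows(3)):
    print(f"I{i}: " + " ".join(f"{s:>3}" for s in row))
```

For three instructions this prints a staircase: I0 runs IF ID EX MEM WB, I1 starts one cycle later, I2 one cycle after that, for 7 cycles total.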
Pipelining Our Datapath

These stages always come in the
same order on the multicycle
datapath.

However, some instructions did not
need certain stages.
Pipelining Our Datapath

R-type instructions
1. Instruction Fetching (IF)
2. Instruction Decoding (ID)
3. Execute [ALU, branching] (EX)
4. Memory Access (MEM) [not needed]
5. Writeback to the register file (WB)
Pipelining Our Datapath

Store Word
1. Instruction Fetching (IF)
2. Instruction Decoding (ID)
3. Execute [ALU, branching] (EX)
4. Memory Access (MEM)
5. Writeback to the register file (WB) [not needed]
Pipelining Our Datapath

Branching / Jumping
1. Instruction Fetching (IF)
2. Instruction Decoding (ID)
3. Execute [ALU, branching] (EX)
4. Memory Access (MEM) [not needed]
5. Writeback to the register file (WB) [not needed]
Pipelining Our Datapath

For pipelining, the unneeded stages
will still “run.” They just won’t do
anything useful.
- Each section of the datapath can only be used by one instruction at a time, so we must keep the slot occupied to keep everything synchronized.

Single-Cycle Datapath…

Single-Cycle -> Pipelining

[Diagram: the single-cycle datapath divided into five pipeline stages: IF | ID | EX | MEM | WB.]
Pipelined Datapath

- Each tall bar represents a large set of pipeline registers that holds every needed value in place for the next clock cycle.
- We can’t refer back to hardware from prior stages, because it will be holding data for other instructions.
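A toy model of those register bars: on each clock edge every inter-stage register latches simultaneously, shifting each instruction's state one stage to the right. The register names follow the usual IF/ID, ID/EX, EX/MEM, MEM/WB convention; the "state" is reduced to just the instruction name for clarity.

```python
# Model the four pipeline register sets as slots between stages; each
# clock edge shifts every instruction's state one slot to the right.
pipeline_regs = {"IF/ID": None, "ID/EX": None, "EX/MEM": None, "MEM/WB": None}

def clock_edge(regs, fetched_instr):
    """All four register sets latch simultaneously on the clock edge."""
    return {
        "IF/ID":  fetched_instr,
        "ID/EX":  regs["IF/ID"],
        "EX/MEM": regs["ID/EX"],
        "MEM/WB": regs["EX/MEM"],
    }

for instr in ["add", "lw", "sw", "beq"]:
    pipeline_regs = clock_edge(pipeline_regs, instr)
print(pipeline_regs)
# {'IF/ID': 'beq', 'ID/EX': 'sw', 'EX/MEM': 'lw', 'MEM/WB': 'add'}
```

Each stage reads only from the register set immediately to its left, never from upstream hardware, which is exactly why prior-stage hardware is free to work on a different instruction.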
Pipelined Datapath

Pipelined Instructions

[Diagram: instructions overlapping in the pipeline; blue marks the parts of the datapath actively in use.]
Impact On Performance

If we examine the impact this will
have on performance, under ideal
circumstances…
- Each instruction now takes 5 cycles.
- We can run 5 instructions in parallel.
- CPI = 1.

Impact On Performance

If we examine the impact this will
have on performance, under ideal
circumstances…
- CPI = 1.
- If we still have the same clock cycle time as we did for the multicycle datapath…
- 5x speedup over the single-cycle datapath!

Impact On Performance

If we examine the impact this will
have on performance, under ideal
circumstances…
- CPI = 1.
- If we still have the same clock cycle time as we did for the multicycle datapath…
- 4.12x speedup over the multicycle datapath!
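Taking the slide's figures as givens, the arithmetic can be checked directly. Note an implicit assumption: a 4.12x speedup at CPI = 1 and an unchanged clock implies the multicycle design averaged about 4.12 cycles per instruction.

```python
# Idealized check of the quoted speedups.
multi_period  = 1.0                # multicycle clock period, arbitrary units
single_period = 5 * multi_period   # single-cycle clock spans all 5 stages
multi_cpi     = 4.12               # implied average CPI of the multicycle design
pipe_cpi      = 1.0

# Time per instruction = CPI x clock period.
speedup_vs_single = (1.0 * single_period) / (pipe_cpi * multi_period)
speedup_vs_multi  = (multi_cpi * multi_period) / (pipe_cpi * multi_period)
print(speedup_vs_single, speedup_vs_multi)  # 5.0 4.12
```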

Impact On Performance

If we examine the impact this will
have on performance, under ideal
circumstances…

What does this mean?

Coming soon to a lecture near you!
Pipelining Control Logic
- The needed control logic is very similar to that used for the single-cycle datapath.
- The key difference: we must save the control signals ahead of time, so that each stage later applies the states intended for its own instruction.
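A minimal sketch of the idea: control is generated once at decode, and only the slice each later stage needs is carried forward with the instruction. The signal names here are illustrative, not a complete control set.

```python
# Per-instruction control settings (illustrative subset of signals).
CONTROL = {
    "add": {"RegWrite": 1, "MemRead": 0, "MemWrite": 0},
    "lw":  {"RegWrite": 1, "MemRead": 1, "MemWrite": 0},
    "sw":  {"RegWrite": 0, "MemRead": 0, "MemWrite": 1},
}

def decode(instr):
    """ID stage: generate all control signals up front, then split them
    into the pieces each downstream stage will need."""
    signals = dict(CONTROL[instr])
    return {
        "EX":  {},  # (ALU controls would go here)
        "MEM": {k: signals[k] for k in ("MemRead", "MemWrite")},
        "WB":  {"RegWrite": signals["RegWrite"]},
    }

print(decode("lw"))
```

The MEM slice rides through the ID/EX and EX/MEM registers before being used, and the WB slice rides one register further, so each stage always sees the settings decoded for the instruction it currently holds.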

Pipelined Datapath