Intro to Computer Org.
Pipelining, Part 1

Pushing Performance Further

With the multicycle datapath, time can be saved for some instructions by breaking each instruction into multiple pieces. Some instructions need fewer cycles than others, letting us avoid padding every instruction out to the time of the longest one, as the single-cycle datapath must.

Note that with the multicycle datapath, we are still executing only one instruction at a time.

Observing the individual steps needed for each instruction, we noted that not all parts of the datapath are needed in every cycle. One classmate mentioned that after the second stage, we really should be able to load up a new instruction.

If we wish to improve performance, what properties of a CPU can we seek to modify?

1. Instruction count: largely dependent on the compiler.
2. Clock rate (GHz): may be fundamentally limited.
3. Cycles per instruction (CPI): can we do more instructions per cycle?

If we wished to reduce the CPI from that of the multicycle datapath, what could we do? Merging cycles together would increase the clock cycle time, lowering the clock rate.
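These three factors combine in the classic CPU performance equation: execution time = instruction count x CPI x clock cycle time. A minimal sketch of the trade-off, using made-up example numbers (not figures from this course):

```python
def execution_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = IC x CPI x cycle time = IC x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical numbers for illustration only: a single-cycle design
# with a slow clock but CPI = 1, versus a multicycle design with a
# fast clock but more cycles per instruction on average.
ic = 1_000_000
single_cycle = execution_time(ic, cpi=1, clock_rate_hz=200e6)
multi_cycle = execution_time(ic, cpi=4.12, clock_rate_hz=1e9)

print(f"single-cycle: {single_cycle * 1e3:.2f} ms")
print(f"multi-cycle:  {multi_cycle * 1e3:.2f} ms")
print(f"speedup: {single_cycle / multi_cycle:.2f}x")
```

Lowering any one factor helps only if the others do not rise to compensate, which is exactly the tension the questions above are probing.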
Pushing Performance Further

"Computer performance is characterized by the amount of useful work accomplished by a computer system compared to the time and resources used." (Wikipedia)

If we can't make the "useful work" faster... what if we could do more of it at the same time? In technical terms: if you cannot improve the latency, try improving the throughput.

- Latency: the time required to complete a single task.
- Throughput: the number of tasks completed per unit time.

A classic example: doing laundry. When multiple loads must be done, the only interference between loads comes from the different time requirements of the different steps. One dryer can only handle one load at a time.

[Figure: Sequential laundry. Loads A through D run one after another from 6 PM to 2 AM; every 30-minute step of one load finishes before the next load begins.]

[Figure: Pipelined laundry. Loads A through D overlap: as soon as load A leaves the washer, load B enters it, and so on. All four loads finish hours earlier.]

The same amount of work is done in far less time... but each load takes the same amount of time! Latency is the same. Throughput is greater.

Pipelining Our Datapath

The pipelined datapath will look very similar to the single-cycle datapath. However, the cycle divisions will come directly from the multi-cycle datapath:

1. Instruction Fetch (IF)
2. Instruction Decode (ID)
3. Execute [ALU, branching] (EX)
4. Memory Access (MEM)
5. Writeback to the register file (WB)

Each of these stages always comes in the same order on the multi-cycle datapath. However, some instructions did not need certain stages:

- R-type instructions: IF, ID, EX, WB (no memory access).
- Store word: IF, ID, EX, MEM (no writeback).
- Branching / jumping: IF, ID, EX (no memory access or writeback).

For pipelining, the unneeded stages will still "run"; they just won't do anything useful. Each section of the datapath can only be used by one instruction at a time, so we must keep the slot occupied to keep things synchronized.

Pipelined Datapath

[Figure: the single-cycle datapath divided into five pipeline stages: IF, ID, EX, MEM, WB.]

Each giant bar in the diagram represents a large set of registers that holds every needed value in place for the next clock cycle. We can't refer back to hardware from prior stages because it will be holding data for other instructions.

[Figure: Pipelined instructions over successive cycles; blue = being actively used.]

Impact On Performance

If we examine the impact this will have on performance, under ideal circumstances:

- Each instruction now takes 5 cycles, but we can run 5 instructions in parallel, so CPI = 1.
- If we still have the same clock cycle time as we did for the multi-cycle datapath: a 5x speedup over the single-cycle datapath, and a 4.12x speedup over the multi-cycle datapath!

What does this mean? Coming soon to a lecture near you!

Pipelining Control Logic

The needed control logic is very similar to that which was used for the single-cycle datapath.
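The ideal speedup above follows from simple cycle counting: with k pipeline stages and n instructions, the pipelined machine needs k + (n - 1) cycles (k cycles for the first instruction to fill the pipe, then one completion per cycle), versus n x k cycles when each instruction uses the whole datapath alone. A sketch, assuming a 5-stage pipeline and ignoring hazards:

```python
def unpipelined_cycles(n, stages=5):
    # Each instruction occupies the datapath for all `stages` cycles
    # before the next one may begin.
    return n * stages

def pipelined_cycles(n, stages=5):
    # The first instruction takes `stages` cycles to complete; after
    # that, one instruction finishes every cycle.
    return stages + (n - 1)

n = 1_000_000
print(pipelined_cycles(n) / n)                      # CPI approaches 1
print(unpipelined_cycles(n) / pipelined_cycles(n))  # speedup approaches 5
```

Note the parallel with the laundry example: each individual instruction still spends 5 cycles in the pipe (latency unchanged), but one instruction completes per cycle (throughput multiplied), so CPI approaches 1 as n grows.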
The key difference: we must compute the control signals ahead of time and save them, carrying each stage's control lines along in the pipeline registers until the cycle in which that stage needs them.

Pipelined Datapath
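One way to picture "saving the control lines ahead of time": the ID stage generates all of an instruction's control signals at once, and the signals destined for later stages simply ride along in the pipeline registers, with each register dropping the bundle its stage just consumed. A simplified sketch (the signal names here are illustrative, not the exact control lines from lecture):

```python
# Illustrative per-opcode control bundles; names are hypothetical.
CONTROL = {
    "lw":  {"ALUOp": "add", "MemRead": 1, "MemWrite": 0, "RegWrite": 1},
    "sw":  {"ALUOp": "add", "MemRead": 0, "MemWrite": 1, "RegWrite": 0},
    "add": {"ALUOp": "add", "MemRead": 0, "MemWrite": 0, "RegWrite": 1},
}

def decode(opcode):
    """ID stage: produce ALL control signals now, grouped by the
    pipeline stage that will eventually consume them."""
    c = CONTROL[opcode]
    return {
        "EX":  {"ALUOp": c["ALUOp"]},
        "MEM": {"MemRead": c["MemRead"], "MemWrite": c["MemWrite"]},
        "WB":  {"RegWrite": c["RegWrite"]},
    }

# The ID/EX pipeline register carries all three bundles; the EX/MEM
# register drops the EX bundle; MEM/WB keeps only the WB bundle.
id_ex = decode("lw")
ex_mem = {k: v for k, v in id_ex.items() if k in ("MEM", "WB")}
mem_wb = {k: v for k, v in ex_mem.items() if k == "WB"}
print(mem_wb)  # {'WB': {'RegWrite': 1}}
```

This is why the control signals cannot be read "live" from the decoder at each stage: by the time a `lw` reaches WB, the decoder is already busy decoding an instruction fetched several cycles later.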