Goal: Describe Pipelining

A pipeline is like a conveyor belt: multiple instructions can be executed at the same time, each one in a different stage.

Laundry is like pipelining:
1. Put a load of dirty clothing in the washer
2. Move the wet wash to the dryer
3. Move the dry wash onto a table and fold
4. Put the clothes away in a closet

For a single load, you can't do it any faster. But what if you have 3 washes (whites, colors, delicates)?

Computer Architecture - Pipelining

Pipelined Laundry

Each wash takes 2 hours to pass through all four stages (30 minutes per stage). 4 washes done one after another take 8 hours. 4 pipelined washes take 3.5 hours, because the stages are overlapped. Pipelined laundry is potentially 4 times faster.

[Figure: two timelines from 6 PM to 2 AM showing task order A, B, C, D, sequential and pipelined.]

Pipelining Instructions

Executing instructions is performed in stages:
1. Fetch the instruction from memory
2. Decode the instruction and read the registers
3. Execute the operation or calculate an address
4. Access a word in data memory
5. Write back the result into a register

While the first instruction is being decoded, the second instruction is already being fetched. While the first instruction is executed, the second is decoded and the third ...

Single-Cycle vs. Pipelined Performance

A lw takes 8ns. Each pipeline cycle is 2ns long (the time of the longest stage, memory access).

[Figure: single-cycle execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Each instruction passes through instruction fetch, Reg, ALU, data access, Reg, and a new instruction starts every 8 ns.]
[Figure: pipelined execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Each instruction passes through instruction fetch, Reg, ALU, data access, Reg in 2 ns stages, and a new instruction starts every 2 ns.]

Speedup of Pipelining

Assuming ideal conditions:

Δt_pipelined = Δt_nonpipelined / number of pipe stages

where Δt is the time between instructions.

Time between instructions: 8ns. Number of pipe stages: 5 -> 8ns/5 = 1.6ns.
But a pipeline stage cannot be shorter than the longest stage, 2ns, so the best possible speedup is 8ns/2ns = 4.0.
So why is the speedup for the three lw instructions only 24ns/14ns = 1.7 and not 4.0? With so few instructions, the time to fill the pipeline dominates.

For 1003 instructions:
single-cycle: 1000*8ns + 24ns = 8,024ns
pipelined: 1000*2ns + 14ns = 2,014ns
speedup: 8,024ns / 2,014ns = 3.98

The MIPS instruction set was designed for pipelining:

Each instruction is the same size, so an instruction is fetched every cycle. In the 80x86, instruction lengths vary from 1 to 17 bytes. Some instructions can be fetched in 1 cycle and others cannot, which complicates things.

MIPS has few instruction formats, and in each format the source register fields are in the same place. Registers can be read at the same time the control is determining the type of instruction. If this weren't so, an extra pipeline stage would be needed to read the registers.

Only load and store instructions access memory. If ALU operations could access memory, stages 3 and 4 would have to be expanded.

Pipeline Hazards

When the next instruction can't execute in the next clock cycle, we say that a hazard has occurred. There are 3 types of hazards:

Structural Hazards: The same unit is needed by two instructions.
Control Hazards: The next instruction isn't known yet.
Data Hazards: An instruction depends on the result of a previous instruction.

Structural Hazards

Laundry example:
A washer/dryer combination is used. Both the wash and dry stages use the same unit.
The folding table has all your school work on it.
Instruction example:
A single memory is used. The Fetch and Memory stages can't be executed at the same time.
The register file can't be read and written in the same cycle.

Solution:
Two memories, one for data and one for instructions.
Enable the register file to be read and written simultaneously.

Control Hazards

Laundry example:
Washing filthy uniforms. We might have to add more soap and wash again. Only after the dry stage can we tell whether the uniforms are clean or more soap is needed.

Instruction example:
A branch instruction is being decoded. The next instruction fetched might be the wrong one. Even with a dedicated ALU in the decode stage, we still lose a cycle.

Solution:
Stall: Wait until the branch direction is known and then continue fetching. This is known as a pipeline stall or bubble.

Control Hazard Example

If the branch test fails, the lw instruction will be executed with a delay of one cycle. In many processors a delay of 2 cycles is necessary.

[Figure: pipelined execution of add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0). The beq starts 2 ns after the add, and the lw starts 4 ns after the beq.]

This solution is too slow; we need a faster one. What if we predict the result of the branch?

Branch Prediction

Laundry Solution:
While drying the first load of uniforms, wash the second load. If the first load isn't clean enough, rewash the first and second loads.
Instruction Solution:
Predict that all branches are not taken. Fetch the next instruction. If the branch is taken (misprediction), stall.

[Figure: branch not taken: add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0), each starting 2 ns after the previous one.]
[Figure: branch taken: add $4, $5, $6; beq $1, $2, 40; a bubble; then or $7, $8, $9, which starts 4 ns after the beq.]

3rd Solution: Delayed Decision

Laundry Solution:
When drying the first uniform load, wash a regular load.

Instruction Solution:
Switch around the order of instructions. After the branch instruction, execute an instruction that isn't dependent on the branch.

[Figure: pipelined execution of beq $1, $2, 40; add $4, $5, $6 (delayed branch slot); lw $3, 300($0), each starting 2 ns after the previous one.]

Data Hazards

Laundry example:
The first load is mainly socks. Every sock has its pair in the second load. We can't fold the first load until the second load is dried.

Instruction example:
    add $s0, $t0, $t1
    sub $t2, $s0, $t3
The 2nd instruction is dependent on the 1st. Only during the 5th stage is the result written back into $s0.

Solution:
Stall: Wait until the 1st instruction ends. This results in 3 bubbles, which is too long to wait.
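The 3-bubble count above can be checked with a small cycle calculation. This is only a sketch under a stated assumption: a register written in the 5th stage becomes readable in the cycle after write-back, and the dependent instruction reads its registers in the 2nd stage. The function name `stall_bubbles` and its parameters are hypothetical, chosen for this illustration.

```python
def stall_bubbles(distance, write_stage=5, read_stage=2):
    """Bubbles needed before a dependent instruction, without any bypassing.

    distance: how many instructions after the producer the consumer is
              issued (1 means the very next instruction).
    Assumption: the producer enters stage 1 in cycle 1, so it is in stage s
    during cycle s, and its result is readable only from the cycle after
    write_stage.
    """
    if distance < 1:
        raise ValueError("the consumer must come after the producer")
    # First cycle in which the written register can be read.
    earliest_read_cycle = write_stage + 1
    # Cycle in which the consumer would read its registers with no stall:
    # it enters stage 1 in cycle 1 + distance.
    unstalled_read_cycle = distance + read_stage
    return max(0, earliest_read_cycle - unstalled_read_cycle)


if __name__ == "__main__":
    for d in (1, 2, 3, 4):
        print(f"consumer {d} instruction(s) later: {stall_bubbles(d)} bubble(s)")
```

Under this model, the sub directly after the add needs 3 bubbles, and the penalty shrinks by one for each independent instruction placed between them.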
Forwarding

The result is already calculated in the 3rd stage, so why wait? Forward it directly to the instruction that needs it:

[Figure: pipeline diagram of add $s0, $t0, $t1 followed by sub $t2, $s0, $t3 through stages IF, ID, EX, MEM, WB, with the time axis in ns.]

In the case of an R-format instruction following a load, a bubble is added (this is called a load-use data hazard):

[Figure: pipeline diagram of lw $s0, 20($t1) followed by sub $t2, $s0, $t3 through stages IF, ID, EX, MEM, WB, with a bubble inserted between the two instructions.]

Code Reordering

Find the hazard:
    # $t1 is the address of v[k]
    lw $t0, 0($t1)    # $t0 = v[k]
    lw $t2, 4($t1)    # $t2 = v[k+1]
    sw $t2, 0($t1)    # v[k] = $t2
    sw $t0, 4($t1)    # v[k+1] = $t0

Solution: Reorder the instructions:
    lw $t0, 0($t1)    # $t0 = v[k]
    lw $t2, 4($t1)    # $t2 = v[k+1]
    sw $t0, 4($t1)    # v[k+1] = $t0
    sw $t2, 0($t1)    # v[k] = $t2
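The effect of the reordering can be illustrated with a small hazard counter. This is a minimal sketch, not a pipeline simulator: it assumes forwarding is available, charges one bubble whenever the instruction immediately after a lw reads the loaded register (a sw that stores that register counts as a use here), and ignores every other kind of hazard. The names `Instr`, `load_use_stalls` and `pipeline_cycles` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                    # mnemonic, e.g. "lw" or "sw"
    dest: Optional[str]        # register written, or None
    sources: Tuple[str, ...]   # registers read


def load_use_stalls(program: List[Instr]) -> int:
    """Count bubbles caused by a lw directly followed by a user of its result."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        if prev.op == "lw" and prev.dest in curr.sources:
            stalls += 1
    return stalls


def pipeline_cycles(program: List[Instr], stages: int = 5) -> int:
    """Cycles to run the program: pipeline fill, one per instruction, plus bubbles."""
    return stages + len(program) - 1 + load_use_stalls(program)


# The swap of v[k] and v[k+1] from the slide; $t1 holds the address of v[k].
original = [
    Instr("lw", "$t0", ("$t1",)),        # $t0 = v[k]
    Instr("lw", "$t2", ("$t1",)),        # $t2 = v[k+1]
    Instr("sw", None, ("$t2", "$t1")),   # v[k] = $t2
    Instr("sw", None, ("$t0", "$t1")),   # v[k+1] = $t0
]
# Reordered version: the two stores are exchanged.
reordered = [original[0], original[1], original[3], original[2]]

if __name__ == "__main__":
    print("original: ", load_use_stalls(original), "stall(s),",
          pipeline_cycles(original), "cycles")
    print("reordered:", load_use_stalls(reordered), "stall(s),",
          pipeline_cycles(reordered), "cycles")
```

In this model the original order has one load-use bubble, between the second lw and the sw of $t2, and exchanging the two stores removes it without changing what is stored.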