Lecture 7 Dynamic Scheduling Dr. Norafida Ithnin Advanced Computer System & Architecture Dynamic vs. Static Scheduling • Data hazards in a program to cause a processor to stall. • With static scheduling the compiler tries to reorder these instructions during compile time to reduce pipeline stalls. – Uses less hardware – Can use more powerful algorithms • With dynamic scheduling the hardware tries to rearrange the instructions during run-time to reduce pipeline stalls. – Simpler compiler – Handles dependencies not known at compile time – Allows code compiled for a different machine to run efficiently. Out-Of-Order Execution • In our previous model of DLX, all instructions executed in the order that they appear • This can lead to unnecessary stalls DIVD FO, F2, F4 ADDD F10, F0, F8 SUBD F12, F8, F14 • SUBD stalls waiting for the ADDD to go first, even though SUBD does not have a data dependency. • With out-of-order execution, the SUBD is allowed to executed before the add – This can lead to out-of order completion, which can cause WAW and WAR hazards Scoreboarding • The scoreboard implements a centralized control scheme that – Detects all resource and data hazards – Allows instructions to execute out-of-order when no resource hazards or data dependencies • First implemented in 1964 by the CDC 6600, which had 18 separate functional units – 4 FP units (2 multiply, 1 add, 1 divide) – 7 memory units (5 loads, 2 stores) – 7 integer units (add, shift, logical, compare, etc.) • Dynamic DLX (much simpler) – 2 FP multiply (10 EX cycles) – 1 FP add (2 EX cycles) – 1 FP divide (40 EX cycles) – 1 integer unit (1 EX cycle) Out-of-Order Execution • Out-of-order execution divides ID stage: 1. Issue—decode instructions, check for structural hazards 2. Read operands—wait until no data hazards, then read operands • Scoreboards allow instruction to execute whenever 1 & 2 hold, not waiting for prior instructions • CDC 6600: In order issue, out of order execution, out of order commit (also called completion) Scoreboard Implications • Out-of-order completion can lead to WAR and WAW hazards? • Solution for WAW – Detect WAW hazard before reading operands – Stall write until other instruction completes • Solutions for WAR – Detect WAR hazards before writing back to the register files and stall the write back • This scoreboard does not take advantage of forwarding, since it waits until both results are written back to the register file • Scoreboard replaces ID, EX, WB with 4 stages Four Stages of Scoreboard Control • Issue (ID1) – decode instructions – check for structural and WAW hazards – stall until structural and WAW hazards are resolved • Read operands (ID2) – wait until no RAW hazards – then read operands • Execution (EX) – operate on operands – may be multiple cycles - notify scoreboard when done • Write result (WB) – finish execution – stall if WAR hazard Three Parts of the Scoreboard 1. Instruction status—which of 4 steps the instruction is in: ID1, ID1, EX, or WB. 2. Functional unit status—Indicates the state of the functional unit (FU). 9 fields for each functional unit Busy—Indicates whether the unit is busy or not Op—Operation to perform in the unit (e.g., + or –) Fi—Destination register Fj, Fk—Source-register numbers Qj, Qk—Functional units producing source registers Fj, Fk Rj, Rk—Flags indicating when Fj, Fk are ready 3. Register result status—Indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register Scoreboard Example for DLX Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add Divide Register result status Clock FU Issue Read Execution Write operandscompleteResult Busy No No No No No Op dest Fi F0 F2 F4 S1 Fj S2 Fk FU for j FU for k Fj? Qj Qk Rj Fk? Rk F6 F8 F10 F30 F12 ... Scoreboard Example Cycle 1 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add Divide Register result status Clock 1 FU Issue 1 Read ExecutionW rite operandscompleteResult Busy Yes No No No No Op Load dest Fi F6 F0 F2 F4 S1 Fj S2 Fk R2 F6 F8 F10 Integer FU for j FU for k Fj? Qj Qk Rj F12 ... Fk? Rk Yes F30 Scoreboard Example Cycle 2 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 Clock F0 2 Busy Op Yes Load No No No No FU • Issue 2nd LD? F2 dest Fi F6 S1 Fj S2 Fk R2 F4 F6 F8 F10 Integer FU for j FU for k Fj? Qj Qk Rj F12 ... Fk? Rk Yes F30 Scoreboard Example Cycle 3 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 Clock F0 3 • Busy Op Yes Load No No No No FU Issue MULT? F2 dest Fi F6 S1 Fj S2 Fk R2 F4 F6 F8 F10 Integer FU for j FU for k Fj? Qj Qk Rj F12 ... Fk? Rk Yes F30 Scoreboard Example Cycle 4 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 Clock F0 4 FU Busy Op Yes Load No No No No F2 dest Fi F6 S1 Fj S2 Fk R2 F4 F6 F8 F10 Integer FU for j FU for k Fj? Qj Qk Rj F12 ... Fk? Rk Yes F30 Scoreboard Example Cycle 5 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 Clock F0 5 FU Busy Op Yes Load No No No No F2 Integer dest Fi F2 S1 Fj S2 Fk R3 FU for j FU for k Fj? Qj Qk Rj F4 F6 F8 F10 F12 ... Fk? Rk Yes F30 Scoreboard Example Cycle 6 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 6 Clock F0 6 FU Busy Op Yes Load Yes Mult No No No F2 Mult1 Integer dest Fi F2 F0 S1 Fj F4 F6 F8 F10 F2 S2 Fk R3 F4 FU for j FU for k Fj? Qj Qk Rj Integer F12 No Fk? Rk Yes Yes ... F30 Scoreboard Example Cycle 7 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 6 7 Busy Yes Yes No Yes No Clock F0 7 • FU Op Load Mult dest Fi F2 F0 S1 Fj F2 S2 Fk R3 F4 Sub F8 F6 F2 F2 F4 F6 F8 F10 Mult1 Integer Read multiply operands? Add FU for j FU for k Fj? Qj Qk Rj Integer No Fk? Rk Yes Yes Integer Yes No F12 F30 ... Scoreboard Example Cycle 8a Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 6 7 8 Busy Yes Yes No Yes Yes Clock F0 8 FU Op Load Mult dest Fi F2 F0 S1 Fj F2 S2 Fk R3 F4 Sub Div F8 F10 F6 F0 F2 F6 F2 F4 F6 F8 F10 Mult1 Integer FU for j FU for k Fj? Qj Qk Rj Integer Mult1 Add Divide No Fk? Rk Yes Yes Integer Yes No No Yes F12 F30 ... Scoreboard Example Cycle 8b Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 7 8 Busy No Yes No Yes Yes Clock F0 8 FU Mult1 Op dest Fi S1 Fj S2 Fk FU for j FU for k Fj? Qj Qk Rj Mult F0 F2 F4 Yes Yes Sub Div F8 F10 F6 F0 F2 F6 Yes No Yes Yes F2 F4 F6 F8 F10 ... F30 Mult1 Add Divide F12 Fk? Rk Scoreboard Example Cycle 9 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer 10 Mult1 Mult2 2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 7 9 8 Busy No Yes No Yes Yes Clock F0 9 FU Mult1 Op dest Fi S1 Fj S2 Fk FU for j FU for k Fj? Qj Qk Rj Mult F0 F2 F4 Yes Yes Sub Div F8 F10 F6 F0 F2 F6 Yes No Yes Yes F2 F4 F6 F8 F10 ... F30 Mult1 Add Divide F12 Fk? Rk Scoreboard Example Cycle 11 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer 8 Mult1 Mult2 0 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 7 9 11 8 Busy No Yes No Yes Yes Clock F0 11 FU Mult1 Op dest Fi S1 Fj S2 Fk FU for j FU for k Fj? Qj Qk Rj Mult F0 F2 F4 Yes Yes Sub Div F8 F10 F6 F0 F2 F6 Yes No Yes Yes F2 F4 F6 F8 F10 ... F30 Mult1 Add Divide F12 Fk? Rk Scoreboard Example Cycle 12 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer 7 Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 7 9 11 12 8 Clock F0 12 • FU Busy Op No Yes Mult No No Yes Div F2 dest Fi S1 Fj S2 Fk F0 F2 F4 F10 F0 F6 F4 F6 F8 F10 Mult1 Read operands for DIVD? FU for j FU for k Fj? Qj Qk Rj Mult1 Divide F12 Fk? Rk Yes Yes No Yes ... F30 Scoreboard Example Cycle 13 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer 6 Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 7 9 11 12 8 13 dest S1 S2 Busy Op Fi Fj Fk No Yes Mult F0 F2 F4 No Yes Add F6 F8 F2 Yes Div F10 F0 F6 Clock F0 13 FU Mult1 F2 F4 FU for j FU for k Fj? Qj Qk Rj Mult1 F6 F8 F10 Add Divide F12 Fk? Rk Yes Yes Yes No Yes Yes ... F30 Scoreboard Example Cycle 14 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer 5 Mult1 Mult2 2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 7 9 11 12 8 13 14 dest S1 S2 Busy Op Fi Fj Fk No Yes Mult F0 F2 F4 No Yes Add F6 F8 F2 Yes Div F10 F0 F6 Clock F0 14 FU Mult1 F2 F4 FU for j FU for k Fj? Qj Qk Rj Mult1 F6 F8 F10 Add Divide F12 Fk? Rk Yes Yes Yes No Yes Yes ... F30 Scoreboard Example Cycle 15 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer 4 Mult1 Mult2 1 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 7 9 11 12 8 13 14 dest S1 S2 Busy Op Fi Fj Fk No Yes Mult F0 F2 F4 No Yes Add F6 F8 F2 Yes Div F10 F0 F6 Clock F0 15 FU Mult1 F2 F4 FU for j FU for k Fj? Qj Qk Rj Mult1 F6 F8 F10 Add Divide F12 Fk? Rk Yes Yes Yes No Yes Yes ... F30 Scoreboard Example Cycle 16 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer 3 Mult1 Mult2 0 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 7 9 11 12 8 13 14 16 dest S1 S2 Busy Op Fi Fj Fk No Yes Mult F0 F2 F4 No Yes Add F6 F8 F2 Yes Div F10 F0 F6 Clock F0 16 FU Mult1 F2 F4 FU for j FU for k Fj? Qj Qk Rj Mult1 F6 F8 F10 Add Divide F12 Fk? Rk Yes Yes Yes No Yes Yes ... F30 Scoreboard Example Cycle 17 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer 2 Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 7 9 11 12 8 13 14 16 dest S1 S2 Busy Op Fi Fj Fk No Yes Mult F0 F2 F4 No Yes Add F6 F8 F2 Yes Div F10 F0 F6 Clock F0 17 • FU F2 F4 Mult1 Write result of ADDD? FU for j FU for k Fj? Qj Qk Rj Mult1 F6 F8 F10 Add Divide F12 Fk? Rk Yes Yes Yes No Yes Yes ... F30 Scoreboard Example Cycle 18 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer 1 Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 7 9 11 12 8 13 14 16 dest S1 S2 Busy Op Fi Fj Fk No Yes Mult F0 F2 F4 No Yes Add F6 F8 F2 Yes Div F10 F0 F6 Clock F0 18 FU Mult1 F2 F4 FU for j FU for k Fj? Qj Qk Rj Mult1 F6 F8 F10 Add Divide F12 Fk? Rk Yes Yes Yes No Yes Yes ... F30 Scoreboard Example Cycle 19 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer 0 Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 19 7 9 11 12 8 13 14 16 dest S1 S2 Busy Op Fi Fj Fk No Yes Mult F0 F2 F4 No Yes Add F6 F8 F2 Yes Div F10 F0 F6 Clock F0 19 FU Mult1 F2 F4 FU for j FU for k Fj? Qj Qk Rj Mult1 F6 F8 F10 Add Divide F12 Fk? Rk Yes Yes Yes No Yes Yes ... F30 Scoreboard Example Cycle 20 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 19 20 7 9 11 12 8 13 14 16 dest S1 S2 Busy Op Fi Fj Fk No No No Yes Add F6 F8 F2 Yes Div F10 F0 F6 Clock F0 20 FU F2 F4 FU for j FU for k Fj? Qj Qk Rj F6 F8 F10 Add Divide F12 Fk? Rk Yes Yes Yes Yes ... F30 Scoreboard Example Cycle 21 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 19 20 7 9 11 12 8 21 13 14 16 dest S1 S2 Busy Op Fi Fj Fk No No No Yes Add F6 F8 F2 Yes Div F10 F0 F6 Clock F0 21 FU F2 F4 FU for j FU for k Fj? Qj Qk Rj F6 F8 F10 Add Divide F12 Fk? Rk Yes Yes Yes Yes ... F30 Scoreboard Example Cycle 22 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add 40 Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 19 20 7 9 11 12 8 21 13 14 16 22 dest S1 S2 Busy Op Fi Fj Fk No No No No Yes Div F10 F0 F6 Clock F0 22 FU F2 F4 FU for j FU for k Fj? Qj Qk Rj F6 F8 F10 Divide F12 Fk? Rk Yes Yes ... F30 Scoreboard Example Cycle 61 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add 0 Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 19 20 7 9 11 12 8 21 61 13 14 16 22 dest S1 S2 Busy Op Fi Fj Fk No No No No Yes Div F10 F0 F6 Clock F0 61 FU F2 F4 FU for j FU for k Fj? Qj Qk Rj F6 F8 F10 Divide F12 Fk? Rk Yes Yes ... F30 Scoreboard Example Cycle 62 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Functional unit status Time Name Integer Mult1 Mult2 Add 0 Divide Register result status Read Execution W rite Issue operandscompleteResult 1 2 3 4 5 6 7 8 6 9 19 20 7 9 11 12 8 21 61 62 13 14 16 22 dest S1 S2 Busy Op Fi Fj Fk No No No No No Clock F0 62 FU F2 F4 FU for j FU for k Fj? Qj Qk Rj F6 F8 F10 F12 ... Fk? Rk F30 CDC 6000 Scoreboard Summary • Speedup from scoreboard – 1.7 for FORTRAN programs – 2.5 for hand-coded assembly language programs – Effects of modern compilers? • Hardware – Scoreboard hardware approximately same as one FPU – Main cost was buses (4x’s normal amount) – Could be more severe for modern processors • Limitations – No forwarding logic – Limited to instructions in basic block – Stalls for WAW hazards – Wait for WAR hazards before WB Tomasulo Algorithm for Dynamic Scheduling • For IBM 360/91 in 1967 - about 3 years after CDC 6600 • Goal: High performance without special compilers • Differences between IBM 360 & CDC 6600 – IBM has only 2 register specifiers/instr vs. 3 in CDC 6600 – IBM has register-memory instructions – IBM has 4 FP registers vs. 8 in CDC 6600 – IBM has pipelined functional units (3 adds, 2 multiplies) • Tomasulo algorithm is designed to handle name dependencies (WAW and WAR hazards) efficiently SUB F1, F2, F0 DIVF F2, F3 , F2 ADDF F3, F0, F0 MULF F3, F1, F1 Tomasulo Algorithm • Differences from Scoreboarding – Distributed hazard detection and control (through reservation stations) – Results are bypassed to function units – Common data bus (CDB) broadcasts results to all FUs. – HW renaming of registers to avoid WAR, WAW hazards – Load and Stores treated as FUs as well – Registers in instructions replaced by pointers to reservation station buffers • Lead to concepts used in Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, … Tomasulo Organization FP Op Queue FP Registers Load Buffer Common Data Bus FP Add Res. Station Store Buffer FP Mul Res. Station Reservation Station Components Op—Operation to perform in the unit (e.g., + or –) Qj, Qk—Reservation stations producing source registers Vj, Vk—Value of Source operands Rj, Rk—Flags indicating when Vj, Vk are ready Busy—Indicates reservation station and FU is busy Register result status—Indicates which functional unit will write each register, if one exists. Blank when no pending instructions that will write that register. Three Stages of Tomasulo Algorithm 1. Issue—get instruction from FP Op Queue If reservation station free, the scoreboard issues instr & sends operands (renames registers). 2. Execution—operate on operands (EX) When both operands ready then execute; if not ready, watch CDB for result 3. Write result—finish execution (WB) Write on Common Data Bus to all awaiting units; mark reservation station available. Tomasulo Example Cycle 0 Instruction status Instruction j k Issue LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 No Register result status Execution complete S1 Vj S2 Vk RS for j Qj RS for k Qk Clock F2 F4 F6 F8 0 F0 FU Write Result Load1 Load2 Load3 Busy No No No Address F10 F12 ... F30 Tomasulo Example Cycle 1 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 No Register result status Execution complete S1 Vj S2 Vk RS for j Qj RS for k Qk Clock F2 F4 F6 F8 1 F0 FU Write Result Load1 Load2 Load3 Load1 Busy No Yes No No Address 34+R2 F10 F12 ... F30 Tomasulo Example Cycle 2 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 2 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 No Register result status Execution complete S1 Vj S2 Vk RS for j Qj RS for k Qk Clock F2 F4 F6 F8 2 F0 FU Write Result Load1 Load2 Load3 Load2 Load1 Busy Yes Yes No Address 34+R2 45+R3 F10 F12 ... F30 Tomasulo Example Cycle 3 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 2 MULTDF0 F2 F4 3 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 Yes MULTD 0 Mult2 No Register result status Execution complete 3 Write Result S1 Vj S2 Vk RS for j Qj R(F4) Load2 Clock F0 F2 F4 F6 Mult1 Load2 3 FU Load1 Load2 Load3 Load1 Busy Yes Yes No Address 34+R2 45+R3 F10 F12 ... RS for k Qk F8 F30 Tomasulo Example Cycle 4 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 Yes 0 Add2 No Add3 No 0 Mult1 Yes 0 Mult2 No Register result status Clock 4 FU Issue 1 2 3 4 Execution complete 3 Write Result 4 Load1 Load2 Load3 S1 Op Vj SUBD M(34+R2) S2 Vk RS for j Qj MULTD R(F4) Load2 F4 F6 F8 M(34+R2) Add1 F0 F2 Mult1 Load2 Busy No Yes No Address F10 F12 ... 45+R3 RS for k Qk Load2 F30 Tomasulo Example Cycle 5 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 Yes 0 Add2 No Add3 No 0 Mult1 Yes 0 Mult2 Yes Register result status Clock 5 FU Issue 1 2 3 4 5 Execution complete 3 5 Write Result 4 Load1 Load2 Load3 Busy No Yes No Address F12 ... S1 Op Vj SUBD M(34+R2) S2 Vk RS for j Qj MULTD DIVD R(F4) M(34+R2) Load2 Mult1 F4 F6 F8 F10 M(34+R2) Add1 Mult2 F0 F2 Mult1 Load2 45+R3 RS for k Qk Load2 F30 Tomasulo Example Cycle 6 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 2 Add1 Yes 0 Add2 Yes Add3 No 10 Mult1 Yes 0 Mult2 Yes Register result status Clock 6 FU Issue 1 2 3 4 5 6 Execution complete 3 5 Write Result 4 6 Load1 Load2 Load3 RS for j Qj Busy No No No Address F12 ... S1 Op Vj SUBD M(34+R2) ADDD S2 Vk M(45+R3) M(45+R3) RS for k Qk MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 F10 Mult1 M(45+R3) Add2 Add1 Mult2 Add1 F30 Tomasulo Example Cycle 7 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 1 Add1 Yes 0 Add2 Yes Add3 No 9 Mult1 Yes 0 Mult2 Yes Register result status Clock 7 FU Issue 1 2 3 4 5 6 Execution complete 3 5 Write Result 4 6 Load1 Load2 Load3 RS for j Qj Busy No No No Address F12 ... S1 Op Vj SUBD M(34+R2) ADDD S2 Vk M(45+R3) M(45+R3) RS for k Qk MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 F10 Mult1 M(45+R3) Add2 Add1 Mult2 Add1 F30 Tomasulo Example Cycle 8 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 Yes 0 Add2 Yes Add3 No 8 Mult1 Yes 0 Mult2 Yes Register result status Clock 8 FU Issue 1 2 3 4 5 6 Execution complete 3 5 Write Result 4 6 Load1 Load2 Load3 Busy No No No Address F12 ... 8 S1 Op Vj SUBD M(34+R2) ADDD S2 Vk M(45+R3) M(45+R3) RS for j Qj RS for k Qk MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 F10 Mult1 M(45+R3) Add2 Add1 Mult2 Add1 F30 Tomasulo Example Cycle 9 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 Yes Add3 No 7 Mult1 Yes 0 Mult2 Yes Register result status Clock 9 FU Issue 1 2 3 4 5 6 Op Execution complete 3 5 Write Result 4 6 8 S1 Vj Load1 Load2 Load3 Busy No No No Address F10 F12 ... 9 S2 Vk RS for j Qj RS for k Qk ADDD M()–M() M(45+R3) MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) Add2 M()–M() Mult2 F30 Tomasulo Example Cycle 10 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 2 Add2 Yes Add3 No 67 Mult1 Yes 0 Mult2 Yes Register result status Clock 10 FU Issue 1 2 3 4 5 6 Op Execution complete 3 5 Write Result 4 6 8 S1 Vj Load1 Load2 Load3 Busy No No No Address F10 F12 ... 9 S2 Vk RS for j Qj RS for k Qk ADDD M()–M() M(45+R3) MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) Add2 M()–M() Mult2 F30 Tomasulo Example Cycle 11 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 1 Add2 Yes Add3 No 5 Mult1 Yes 0 Mult2 Yes Register result status Clock 11 FU Issue 1 2 3 4 5 6 Op Execution complete 3 5 Write Result 4 6 8 S1 Vj Load1 Load2 Load3 Busy No No No Address F10 F12 ... 9 S2 Vk RS for j Qj RS for k Qk ADDD M()–M() M(45+R3) MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) Add2 M()–M() Mult2 F30 Tomasulo Example Cycle 12 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 Yes Add3 No 4 Mult1 Yes 0 Mult2 Yes Register result status Clock 12 FU Issue 1 2 3 4 5 6 Op Execution complete 3 5 Write Result 4 6 8 Load1 Load2 Load3 Busy No No No Address F10 F12 ... 9 12 S1 Vj S2 Vk RS for j Qj RS for k Qk ADDD M()–M() M(45+R3) MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) Add2 M()–M() Mult2 F30 Tomasulo Example Cycle 13 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No Add3 No 3 Mult1 Yes 0 Mult2 Yes Register result status Clock 13 FU Issue 1 2 3 4 5 6 Execution complete 3 5 Write Result 4 6 8 9 12 13 Address F10 F12 ... S2 Vk RS for j Qj MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) (M–M)+M() M()–M() Mult2 Op S1 Vj Load1 Load2 Load3 Busy No No No RS for k Qk F30 Tomasulo Example Cycle 14 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No Add3 No 2 Mult1 Yes 0 Mult2 Yes Register result status Clock 14 FU Issue 1 2 3 4 5 6 Execution complete 3 5 Write Result 4 6 8 9 12 13 Address F10 F12 ... S2 Vk RS for j Qj MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) (M–M)+M() M()–M() Mult2 Op S1 Vj Load1 Load2 Load3 Busy No No No RS for k Qk F30 Tomasulo Example Cycle 15 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No Add3 No 1 Mult1 Yes 0 Mult2 Yes Register result status Clock 15 FU Issue 1 2 3 4 5 6 Execution complete 3 5 Write Result 4 6 8 9 12 13 Address F10 F12 ... S2 Vk RS for j Qj MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) (M–M)+M() M()–M() Mult2 Op S1 Vj Load1 Load2 Load3 Busy No No No RS for k Qk F30 Tomasulo Example Cycle 16 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No Add3 No 0 Mult1 Yes 0 Mult2 Yes Register result status Clock 16 FU Issue 1 2 3 4 5 6 Execution complete 3 5 16 8 Write Result 4 6 Address F10 F12 ... 9 12 13 S2 Vk RS for j Qj MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) (M–M)+M() M()–M() Mult2 Op S1 Vj Load1 Load2 Load3 Busy No No No RS for k Qk F30 Tomasulo Example Cycle 17 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 2 MULTDF0 F2 F4 3 SUBD F8 F6 F2 4 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 Yes DIVD Register result status Execution complete 3 5 16 8 Clock 17 FU Write Result 4 6 17 9 12 Load1 Load2 Load3 Busy No No No Address F10 F12 ... 13 S1 Vj S2 Vk M*F4 M(34+R2) F0 F2 F4 M*F4 M(45+R3) RS for j Qj RS for k Qk F6 F8 (M–M)+M() M()–M() Mult2 F30 Tomasulo Example Cycle 18 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 2 MULTDF0 F2 F4 3 SUBD F8 F6 F2 4 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 40 Mult2 Yes DIVD Register result status Execution complete 3 5 16 8 Clock 18 FU Write Result 4 6 17 9 12 Load1 Load2 Load3 Busy No No No Address F10 F12 ... 13 S1 Vj S2 Vk M*F4 M(34+R2) F0 F2 F4 M*F4 M(45+R3) RS for j Qj RS for k Qk F6 F8 (M–M)+M() M()–M() Mult2 F30 Tomasulo Example Cycle 57 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 2 MULTDF0 F2 F4 3 SUBD F8 F6 F2 4 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 1 Mult2 Yes DIVD Register result status Execution complete 3 5 16 8 Clock 57 FU Write Result 4 6 17 9 12 Load1 Load2 Load3 Busy No No No Address F10 F12 ... 13 S1 Vj S2 Vk M*F4 M(34+R2) F0 F2 F4 M*F4 M(45+R3) RS for j Qj RS for k Qk F6 F8 (M–M)+M() M()–M() Mult2 F30 Tomasulo Example Cycle 58 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 2 MULTDF0 F2 F4 3 SUBD F8 F6 F2 4 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 Yes DIVD Register result status Execution complete 3 5 16 8 58 12 S1 Vj Write Result 4 6 17 9 M*F4 M(34+R2) Clock F0 F2 F4 M*F4 M(45+R3) 58 FU Load1 Load2 Load3 Busy No No No Address F10 F12 ... 13 S2 Vk RS for j Qj RS for k Qk F6 F8 (M–M)+M() M()–M() Mult2 F30 Tomasulo Example Cycle 59 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 2 MULTDF0 F2 F4 3 SUBD F8 F6 F2 4 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 No Register result status Execution complete 3 5 16 8 58 12 S1 Vj Write Result 4 6 17 9 59 13 S2 Vk RS for j Qj RS for k Qk Clock F0 F2 F4 F6 F8 M*F4 M(45+R3) (M–M)+M() M()–M() M*F4/M 59 FU Load1 Load2 Load3 Busy No No No Address F10 F12 ... F30 Tomasulo Summary • Advantages – Prevents register from being the bottleneck – Eliminates WAR, WAW hazards – Allows loop unrolling in HW • Common Data Bus – Broadcasts results to multiple instructions – Central bottleneck • Lasting Contributions – Dynamic scheduling – Register renaming – Load/store disambiguation Discussions
© Copyright 2026 Paperzz