Dynamic ILP: Scoreboard Professor Alvin R. Lebeck Computer Science 220 / ECE 252 Fall 2008 Precise Exceptions • What is precise? – All instructions older than interrupt are complete – All instructions younger than interrupt haven’t started • Simultaneous interrupts/exceptions • Odd bits of state • Out of order writes © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 2 Review: CPI < 1 • Superscalar: Issue & execute > 1 instruction/cycle • In-order processor – Instructions move from D to E in program order – May finish execute out of order • problem spots – fetch, branch prediction trace cache? – decode (N2 dependence cross-check) – execute (N2 bypass) clustering? © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 3 Can we do better? • Problem: Stall in ID stage if any data hazard. MULD ADDD ADDD ADDD F2, F3, F4 F1, F2, F3 F3, F4, F5 F1, F4, F5 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Long latency… Computer Science 220 4 HW Schemes: Instruction Parallelism • Why in HW at run time? – Works when can’t know dependencies – Simpler Compiler – Code for one machine runs well on another machine • Key Idea: Allow instructions behind stall to proceed DIVD F0, F2, F4 ADD F10, F0, F8 SUBD F8, F8, F14 – Enables out-of-order execution => out-of-order completion – ID stage check for both structural & data dependencies © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 5 HW Schemes: Instruction Parallelism • Out-of-order execution divides ID stage: 1. Issue: decode instructions, check for structural hazards 2. Read: operands wait until no data hazards, then read operands • Scoreboards allow instruction to execute whenever 1 & 2 hold, not waiting for prior instructions © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 6 Scoreboard Implications • Out-of-order completion => WAR, WAW hazards? • Solutions for WAR – Queue both the operation and copies of its operands – Read registers only during Read Operands stage • For WAW, must detect hazard: stall until other completes • Need to have multiple instructions in execution phase => multiple execution units or pipelined execution units • Scoreboard keeps track of dependencies, state or operations • Scoreboard replaces ID, EX, WB with 4 stages © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 7 Four Stages of Scoreboard Control 1. Issue: decode instructions & check for structural hazards (ID1, sometimes called dispatch) If a functional unit for the instruction is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure. If a structural or WAW hazard exists, then the instruction issue stalls, and no further instructions will issue until these hazards are cleared. 2. Read operands: wait until no data hazards, then read operands (ID2, sometimes called issue) A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit. When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order. © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 8 Four Stages of Scoreboard Control 3. Execution: operate on operands The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution. 4. Write Result: finish execution (WB) Once the scoreboard is aware that the functional unit has completed execution, the scoreboard checks for WAR hazards. If none, it writes results. If WAR, then it stalls the instruction. Example: DIVD F0,F2,F4 ADDD F10,F0,F8 SUBD F8,F8,F14 CDC 6600 scoreboard would stall SUBD until ADDD reads operands © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 9 Three Parts of the Scoreboard 1. Instruction status: which of 4 steps the instruction is in 2. Functional unit status: Indicates the state of the functional unit (FU). 9 fields for each functional unit Busy--Indicates whether the unit is busy or not Op--Operation to perform in the unit (e.g., + or -) Fi--Destination register Fj, Fk--Source-register numbers Qj, Qk--Functional units producing source registers Fj, Fk Rj, Rk--Flags indicating when Fj, Fk are ready 3. Register result status: Indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 10 Scoreboard Example Cycle 1 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Integer Yes Load Mult1 No Mult2 No Add No Divide No Register Result Status CLOCK F0 1 FU Issue 1 Fi F6 F2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Fj Read Execution Write Operand Complete Result Fk R2 F4 Qj F6 Int Qk F8 Computer Science 220 F10 Rj Rk Yes F12 … F31 11 Scoreboard Example Cycle 2 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Integer Yes Load Mult1 No Mult2 No Add No Divide No Register Result Status CLOCK F0 2 FU Issue 1 Fi F6 F2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Fj Read Execution Write Operand Complete Result 2 Fk R2 F4 Qj F6 Int Qk F8 Computer Science 220 F10 Rj Rk Yes F12 … F31 12 Scoreboard Example Cycle 3 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Integer Yes Load Mult1 No Mult2 No Add No Divide No Register Result Status CLOCK F0 3 FU Issue 1 Fi F6 F2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Fj Read Execution Write Operand Complete Result 2 3 Fk R2 F4 Qj F6 Int Qk F8 Computer Science 220 F10 Rj Rk Yes F12 … F31 13 Scoreboard Example Cycle 4 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Integer Yes Load Mult1 No Mult2 No Add No Divide No Register Result Status CLOCK F0 4 FU Issue 1 Fi F6 F2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Fj Read Execution Write Operand Complete Result 2 3 4 Fk R2 F4 Qj F6 Int Qk F8 Computer Science 220 F10 Rj Rk Yes F12 … F31 14 Scoreboard Example Cycle 5 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Integer Yes Load Mult1 No Mult2 No Add No Divide No Register Result Status CLOCK F0 5 FU Issue 1 5 Fi F2 F2 Int © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Fj Read Execution Write Operand Complete Result 2 3 4 Fk R3 F4 Qj F6 Qk F8 Computer Science 220 F10 Rj Rk Yes F12 … F31 15 Scoreboard Example Cycle 6 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Integer Yes Load Mult1 Yes Mult Mult2 No Add No Divide No Register Result Status CLOCK F0 6 FU Mul1 Issue 1 5 6 Fi F2 F0 F2 Int © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Fj Read Execution Write Operand Complete Result 2 3 4 6 Fk R3 F4 F2 F4 Qj Qk Integer F6 F8 Computer Science 220 Rj No F10 Rk Yes Yes F12 … F31 16 Scoreboard Example Cycle 7 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Integer Yes Load Mult1 Yes Mult Mult2 No Add Yes Sub Divide No Register Result Status CLOCK F0 7 FU Mul1 Issue 1 5 6 7 Fi F2 F0 F2 Fk R3 F4 F8 F6 F2 F2 Int © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Fj Read Execution Write Operand Complete Result 2 3 4 6 7 F4 Qj Qk Integer No Int F6 F8 Add Computer Science 220 Rj F10 Rk Yes Yes Yes No F12 … F31 17 Scoreboard Example Cycle 8a Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Fi Integer Yes Load F2 Mult1 Yes Mult F0 Mult2 No Add Yes Sub F8 Divide Yes Div F10 Register Result Status CLOCK F0 F2 8 FU Mul1 Int © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 5 6 7 8 Fj Read Execution Write Operand Complete Result 2 3 4 6 7 F2 Fk R3 F4 F6 F0 F2 F6 F4 Qj Qk Integer No Int Mult1 F6 F8 Add Computer Science 220 Rj F10 Div Rk Yes Yes Yes No No Yes F12 … F31 18 Scoreboard Example Cycle 8b Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Fi Integer No Mult1 Yes Mult F0 Mult2 No Add Yes Sub F8 Divide Yes Div F10 Register Result Status CLOCK F0 F2 8 FU Mul1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Read Execution Write Issue Operand Complete Result 1 2 3 4 5 6 7 8 6 7 8 Fj Fk F2 F4 F6 F0 F2 F6 F4 Qj Qk Rj Rk Yes Yes Int Mult1 F6 F8 Add Computer Science 220 F10 Div Yes Yes No Yes F12 … F31 19 Scoreboard Example Cycle 9 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Fi Integer No Mult1 Yes Mult F0 Mult2 No Add Yes Sub F8 Divide Yes Div F10 Register Result Status CLOCK F0 F2 9 FU Mul1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 5 6 7 8 Read Execution Write Operand Complete Result 2 3 4 6 7 8 9 9 Fj Fk F2 F4 F6 F0 F2 F6 F4 Qj Qk Rj Rk Yes Yes Int Mult1 F6 F8 Add Computer Science 220 F10 Div Yes Yes No Yes F12 … F31 20 Scoreboard Example Cycle 11 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Fi Integer No Mult1 Yes Mult F0 Mult2 No Add Yes Sub F8 Divide Yes Div F10 Register Result Status CLOCK F0 F2 11 FU Mul1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 5 6 7 8 Read Execution Write Operand Complete Result 2 3 4 6 7 8 9 9 11 Fj Fk F2 F4 F6 F0 F2 F6 F4 Qj Qk Rj Rk Yes Yes Int Mult1 F6 F8 Add Computer Science 220 F10 Div Yes Yes No Yes F12 … F31 21 Scoreboard Example Cycle 12 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Issue 1 5 6 7 8 Read Execution Write Operand Complete Result 2 3 4 6 7 8 9 9 11 12 Functional Unit Status Name Busy Op Fi Fj Fk Qj Integer No Mult1 Yes Mult F0 F2 F4 Mult2 No Add No Divide Yes Div F10 F0 F6 Mult1 Register Result Status CLOCK F0 F2 F4 F6 F8 12 FU Mul1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Qk Computer Science 220 Rj Rk Yes Yes No Yes F10 Div F12 … F31 22 Scoreboard Example Cycle 13 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Fi Integer No Mult1 Yes Mult F0 Mult2 No Add Yes Ad F6 Divide Yes Div F10 Register Result Status CLOCK F0 F2 13 FU Mul1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 5 6 7 8 13 Read Execution Write Operand Complete Result 2 3 4 6 7 8 9 9 11 12 Fk F2 F4 Yes Yes F8 F0 F2 F6 Yes Yes No Yes F4 Qj Qk Mult1 F6 Add F8 Computer Science 220 F10 Div Rj Rk Fj F12 … F31 23 Scoreboard Example Cycle 14 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Fi Integer No Mult1 Yes Mult F0 Mult2 No Add Yes Ad F6 Divide Yes Div F10 Register Result Status CLOCK F0 F2 14 FU Mul1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 5 6 7 8 13 Read Execution Write Operand Complete Result 2 3 4 6 7 8 9 9 11 12 14 Fk F2 F4 Yes Yes F8 F0 F2 F6 Yes Yes No Yes F4 Qj Qk Mult1 F6 Add F8 Computer Science 220 F10 Div Rj Rk Fj F12 … F31 24 Scoreboard Example Cycle 15 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Fi Integer No Mult1 Yes Mult F0 Mult2 No Add Yes Add F6 Divide Yes Div F10 Register Result Status CLOCK F0 F2 15 FU Mul1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 5 6 7 8 13 Read Execution Write Operand Complete Result 2 3 4 6 7 8 9 9 11 12 14 Fk F2 F4 Yes Yes F8 F0 F2 F6 Yes Yes No Yes F4 Qj Qk Mult1 F6 Add F8 Computer Science 220 F10 Div Rj Rk Fj F12 … F31 25 Scoreboard Example Cycle 16 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Fi Integer No Mult1 Yes Mult F0 Mult2 No Add Yes Add F6 Divide Yes Div F10 Register Result Status CLOCK F0 F2 16 FU Mul1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 5 6 7 8 13 Read Execution Write Operand Complete Result 2 3 4 6 7 8 9 9 11 12 14 16 Fk F2 F4 Yes Yes F8 F0 F2 F6 Yes Yes No Yes F4 Qj Qk Mult1 F6 Add F8 Computer Science 220 F10 Div Rj Rk Fj F12 … F31 26 Scoreboard Example Cycle 17 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Fi Integer No Mult1 Yes Mult F0 Mult2 No Add Yes Add F6 Divide Yes Div F10 Register Result Status CLOCK F0 F2 17 FU Mul1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 5 6 7 8 13 Read Execution Write Operand Complete Result 2 3 4 6 7 8 9 9 11 12 14 16 Fk F2 F4 Yes Yes F8 F0 F2 F6 Yes Yes No Yes F4 Qj Qk Mult1 F6 Add F8 Computer Science 220 F10 Div Rj Rk Fj F12 … F31 27 Scoreboard Example Cycle 18 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Fi Integer No Mult1 Yes Mult F0 Mult2 No Add Yes Add F6 Divide Yes Div F10 Register Result Status CLOCK F0 F2 18 FU Mul1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 5 6 7 8 13 Read Execution Write Operand Complete Result 2 3 4 6 7 8 9 9 11 12 14 16 Fk F2 F4 Yes Yes F8 F0 F2 F6 Yes Yes No Yes F4 Qj Qk Mult1 F6 Add F8 Computer Science 220 F10 Div Rj Rk Fj F12 … F31 28 Scoreboard Example Cycle 19 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Fi Integer No Mult1 Yes Mult F0 Mult2 No Add Yes Add F6 Divide Yes Div F10 Register Result Status CLOCK F0 F2 19 FU Mul1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 5 6 7 8 13 Read Execution Write Operand Complete Result 2 3 4 6 7 8 9 19 9 11 12 14 16 Fk F2 F4 Yes Yes F8 F0 F2 F6 Yes Yes No Yes F4 Qj Qk Mult1 F6 Add F8 Computer Science 220 F10 Div Rj Rk Fj F12 … F31 29 Scoreboard Example Cycle 20 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Integer No Mult1 No Mult2 No Add Yes Add Divide Yes Div Register Result Status CLOCK F0 20 FU Issue 1 5 6 7 8 13 Fi Read Execution Write Operand Complete Result 2 3 4 6 7 8 9 19 20 9 11 12 14 Fj Fk F6 F8 F10 F0 F2 F6 F2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz F4 16 Qj Qk F8 Computer Science 220 Rk Yes Yes Yes Yes Mult1 F6 Add Rj F10 Div F12 … F31 30 Scoreboard Example Cycle 21 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Integer No Mult1 No Mult2 No Add Yes Add Divide Yes Div Register Result Status CLOCK F0 21 FU Issue 1 5 6 7 8 13 Fi Read Execution Write Operand Complete Result 2 3 4 6 7 8 9 19 20 9 11 12 21 14 16 Fj Fk F6 F8 F10 F0 F2 F6 F2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz F4 Qj Qk F8 Computer Science 220 Rk Yes Yes Yes Yes Mult1 F6 Add Rj F10 Div F12 … F31 31 Scoreboard Example Cycle 22 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Integer No Mult1 No Mult2 No Add No Divide Yes Div Register Result Status CLOCK F0 22 FU Issue 1 5 6 7 8 13 Fi Read Execution Write Operand Complete Result 2 3 4 6 7 8 9 19 20 9 11 12 21 14 16 22 Fj Fk Qj F10 F0 F6 Mult1 F2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz F4 F6 Qk F8 Computer Science 220 Rj 40 cycle Divide Rk Yes Yes F10 Div F12 … F31 32 Scoreboard Example Cycle 61 Instruction Status Instruction j LD F6 34+ LD F2 45+ MULT F0 F2 SUBD F8 F6 DIVD F10 F0 ADDD F6 F8 k R2 R3 F4 F2 F6 F2 Functional Unit Status Name Busy Op Integer No Mult1 No Mult2 No Add No Divide Yes Div Register Result Status CLOCK F0 61 FU Issue 1 5 6 7 8 13 Fi Read Execution Write Operand Complete Result 2 3 4 6 7 8 9 19 20 9 11 12 21 61 14 16 22 Fj Fk Qj F10 F0 F6 Mult1 F2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz F4 F6 Qk F8 Computer Science 220 Rj Rk Yes Yes F10 Div F12 … F31 33 Scoreboard Summary • Speedup 1.7 from compiler; 2.5 by hand BUT slow memory (no cache) • Limitations of 6600 scoreboard – – – – – No forwarding Limited to instructions in basic block (small window) Number of functional units (structural hazards) Wait for WAR hazards Prevent WAW hazards • How to design a datapath that eliminates these problems? © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 34 Tomasulo’s Algorithm: Another Dynamic Scheme • For IBM 360/91 about 3 years after CDC 6600 • Goal: High Performance without special compilers • Differences between IBM 360 & CDC 6600 ISA – IBM has only 2 register specifiers/instr vs. 3 in CDC 6600 – IBM has 4 FP registers vs. 8 in CDC 6600 • Differences between Tomasulo Algorithm & Scoreboard – Control & buffers distributed with Function Units vs. centralized in scoreboard; called “reservation stations” – Register specifiers in instructions replaced by pointers to reservation station buffer (Everything can be solved with level of indirection!) – HW renaming of registers to avoid WAR, WAW hazards – Common Data Bus broadcasts results to all FUs – Load and Stores treated as FUs as well © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 35 Tomasulo Organization From Memory Load Buffers From Instruction Unit FP Registers FP op queue Operand Bus Store Buffers To Memory FP multipliers FP adders Common Data Bus (CDB) © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 36 Reservation Station Components Op—Operation to perform in the unit (e.g., + or –) Qj, Qk—Reservation stations producing source registers Vj, Vk—Value of Source operands Rj, Rk—Flags indicating when Vj, Vk are ready Busy—Indicates reservation station and FU is busy Register result status—Indicates which functional unit will write each register, if one exists. Blank when no pending instructions that will write that register. © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 37 Three Stages of Tomasulo Algorithm 1. Issue—get instruction from FP Op Queue If reservation station free, the scoreboard issues instr & sends operands (renames registers). 2. Execution—operate on operands (EX) When both operands ready then execute; if not ready, watch CDB for result 3. Write result—finish execution (WB) Write on Common Data Bus to all awaiting units; mark reservation station available. © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 38 Tomasulo Example Cycle 0 Instruction status Instruction j k Issue LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 No Register result status Execution complete S1 Vj S2 Vk RS for j Qj RS for k Qk Clock F2 F4 F6 F8 0 F0 Write Result Load1 Load2 Load3 Busy No No No Address F10 F12 ... F30 FU © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 39 Tomasulo Example Cycle 1 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 No Register result status Execution complete S1 Vj S2 Vk RS for j Qj RS for k Qk Clock F2 F4 F6 F8 1 F0 Write Result Load1 Load2 Load3 FU © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Busy No Yes No No Address 34+R2 F10 F12 ... F30 Load1 Computer Science 220 40 Tomasulo Example Cycle 2 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 2 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 No Register result status Execution complete S1 Vj S2 Vk RS for j Qj RS for k Qk Clock F2 F4 F6 F8 2 F0 FU Write Result Load1 Load2 Load3 Load2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Busy Yes Yes No Address 34+R2 45+R3 F10 F12 ... F30 Load1 Computer Science 220 41 Tomasulo Example Cycle 3 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 2 MULTDF0 F2 F4 3 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 Yes MULTD 0 Mult2 No Register result status Execution complete 3 Write Result S1 Vj S2 Vk RS for j Qj R(F4) Load2 Clock F0 F2 F4 F6 Mult1 Load2 3 FU © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Load1 Load2 Load3 Busy Yes Yes No Address 34+R2 45+R3 F10 F12 ... RS for k Qk F8 F30 Load1 Computer Science 220 42 Tomasulo Example Cycle 4 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 Yes 0 Add2 No Add3 No 0 Mult1 Yes 0 Mult2 No Register result status Clock 4 FU Issue 1 2 3 4 Execution complete 3 Write Result 4 Load1 Load2 Load3 S1 Op Vj SUBD M(34+R2) S2 Vk RS for j Qj MULTD R(F4) Load2 F4 F6 F8 M(34+R2) Add1 F0 F2 Mult1 Load2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 Busy No Yes No Address F10 F12 ... 45+R3 RS for k Qk Load2 F30 43 Tomasulo Example Cycle 5 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 Yes 0 Add2 No Add3 No 0 Mult1 Yes 0 Mult2 Yes Register result status Clock 5 FU Issue 1 2 3 4 5 Execution complete 3 5 Write Result 4 Load1 Load2 Load3 Busy No Yes No Address F12 ... S1 Op Vj SUBD M(34+R2) S2 Vk RS for j Qj MULTD DIVD R(F4) M(34+R2) Load2 Mult1 F4 F6 F8 F10 M(34+R2) Add1 Mult2 F0 F2 Mult1 Load2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 45+R3 RS for k Qk Load2 F30 44 Tomasulo Example Cycle 6 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 2 Add1 Yes 0 Add2 Yes Add3 No 10 Mult1 Yes 0 Mult2 Yes Register result status Clock 6 FU Issue 1 2 3 4 5 6 Execution complete 3 5 Write Result 4 6 Load1 Load2 Load3 Address F12 ... S1 Op Vj SUBD M(34+R2) ADDD S2 Vk M(45+R3) M(45+R3) MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 F10 Mult1 M(45+R3) Add2 Add1 Mult2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz RS for j Qj Busy No No No RS for k Qk Add1 Computer Science 220 F30 45 Tomasulo Example Cycle 7 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 1 Add1 Yes 0 Add2 Yes Add3 No 9 Mult1 Yes 0 Mult2 Yes Register result status Clock 7 FU Issue 1 2 3 4 5 6 Execution complete 3 5 Write Result 4 6 Load1 Load2 Load3 Address F12 ... S1 Op Vj SUBD M(34+R2) ADDD S2 Vk M(45+R3) M(45+R3) MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 F10 Mult1 M(45+R3) Add2 Add1 Mult2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz RS for j Qj Busy No No No RS for k Qk Add1 Computer Science 220 F30 46 Tomasulo Example Cycle 8 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 Yes 0 Add2 Yes Add3 No 8 Mult1 Yes 0 Mult2 Yes Register result status Clock 8 FU Issue 1 2 3 4 5 6 Execution complete 3 5 Write Result 4 6 Load1 Load2 Load3 Busy No No No Address F12 ... 8 S1 Op Vj SUBD M(34+R2) ADDD S2 Vk M(45+R3) M(45+R3) MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 F10 Mult1 M(45+R3) Add2 Add1 Mult2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz RS for j Qj RS for k Qk Add1 Computer Science 220 F30 47 Tomasulo Example Cycle 9 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 Yes Add3 No 7 Mult1 Yes 0 Mult2 Yes Register result status Clock 9 FU Issue 1 2 3 4 5 6 Op Execution complete 3 5 Write Result 4 6 8 S1 Vj Load1 Load2 Load3 Busy No No No Address F10 F12 ... 9 S2 Vk RS for j Qj RS for k Qk ADDD M()–M() M(45+R3) MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) Add2 M()–M() Mult2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 F30 48 Tomasulo Example Cycle 10 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 2 Add2 Yes Add3 No 67 Mult1 Yes 0 Mult2 Yes Register result status Clock 10 FU Issue 1 2 3 4 5 6 Op Execution complete 3 5 Write Result 4 6 8 S1 Vj Load1 Load2 Load3 Busy No No No Address F10 F12 ... 9 S2 Vk RS for j Qj RS for k Qk ADDD M()–M() M(45+R3) MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) Add2 M()–M() Mult2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 F30 49 Tomasulo Example Cycle 11 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 1 Add2 Yes Add3 No 5 Mult1 Yes 0 Mult2 Yes Register result status Clock 11 FU Issue 1 2 3 4 5 6 Op Execution complete 3 5 Write Result 4 6 8 S1 Vj Load1 Load2 Load3 Busy No No No Address F10 F12 ... 9 S2 Vk RS for j Qj RS for k Qk ADDD M()–M() M(45+R3) MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) Add2 M()–M() Mult2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 F30 50 Tomasulo Example Cycle 12 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 Yes Add3 No 4 Mult1 Yes 0 Mult2 Yes Register result status Clock 12 FU Issue 1 2 3 4 5 6 Op Execution complete 3 5 Write Result 4 6 8 Load1 Load2 Load3 Busy No No No Address F10 F12 ... 9 12 S1 Vj S2 Vk RS for j Qj RS for k Qk ADDD M()–M() M(45+R3) MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) Add2 M()–M() Mult2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 F30 51 Tomasulo Example Cycle 13 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No Add3 No 3 Mult1 Yes 0 Mult2 Yes Register result status Clock 13 FU Issue 1 2 3 4 5 6 Execution complete 3 5 Write Result 4 6 8 9 12 13 Address F10 F12 ... S2 Vk RS for j Qj MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) (M–M)+M() M()–M() Mult2 Op S1 Vj Load1 Load2 Load3 Busy No No No © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 RS for k Qk F30 52 Tomasulo Example Cycle 14 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No Add3 No 2 Mult1 Yes 0 Mult2 Yes Register result status Clock 14 FU Issue 1 2 3 4 5 6 Execution complete 3 5 Write Result 4 6 8 9 12 13 Address F10 F12 ... S2 Vk RS for j Qj MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) (M–M)+M() M()–M() Mult2 Op S1 Vj Load1 Load2 Load3 Busy No No No © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 RS for k Qk F30 53 Tomasulo Example Cycle 15 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No Add3 No 1 Mult1 Yes 0 Mult2 Yes Register result status Clock 15 FU Issue 1 2 3 4 5 6 Execution complete 3 5 Write Result 4 6 8 9 12 13 Address F10 F12 ... S2 Vk RS for j Qj MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) (M–M)+M() M()–M() Mult2 Op S1 Vj Load1 Load2 Load3 Busy No No No © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 RS for k Qk F30 54 Tomasulo Example Cycle 16 Instruction status Instruction j k LD F6 34+ R2 LD F2 45+ R3 MULTDF0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No Add3 No 0 Mult1 Yes 0 Mult2 Yes Register result status Clock 16 FU Issue 1 2 3 4 5 6 Execution complete 3 5 16 8 Write Result 4 6 Address F10 F12 ... 9 12 13 S2 Vk RS for j Qj MULTD M(45+R3) DIVD R(F4) M(34+R2) Mult1 F0 F2 F4 F6 F8 Mult1 M(45+R3) (M–M)+M() M()–M() Mult2 Op S1 Vj Load1 Load2 Load3 Busy No No No © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 RS for k Qk F30 55 Tomasulo Example Cycle 17 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 2 MULTDF0 F2 F4 3 SUBD F8 F6 F2 4 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 Yes DIVD Register result status Execution complete 3 5 16 8 Clock 17 FU Write Result 4 6 17 9 12 Address F10 F12 ... 13 S1 Vj S2 Vk M*F4 M(34+R2) F0 F2 F4 M*F4 M(45+R3) © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Load1 Load2 Load3 Busy No No No RS for j Qj RS for k Qk F6 F8 (M–M)+M() M()–M() Mult2 Computer Science 220 F30 56 Tomasulo Example Cycle 18 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 2 MULTDF0 F2 F4 3 SUBD F8 F6 F2 4 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 40 Mult2 Yes DIVD Register result status Execution complete 3 5 16 8 Clock 18 FU Write Result 4 6 17 9 12 Address F10 F12 ... 13 S1 Vj S2 Vk M*F4 M(34+R2) F0 F2 F4 M*F4 M(45+R3) © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Load1 Load2 Load3 Busy No No No RS for j Qj RS for k Qk F6 F8 (M–M)+M() M()–M() Mult2 Computer Science 220 F30 57 Tomasulo Example Cycle 57 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 2 MULTDF0 F2 F4 3 SUBD F8 F6 F2 4 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 1 Mult2 Yes DIVD Register result status Execution complete 3 5 16 8 Clock 57 FU Write Result 4 6 17 9 12 Address F10 F12 ... 13 S1 Vj S2 Vk M*F4 M(34+R2) F0 F2 F4 M*F4 M(45+R3) © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Load1 Load2 Load3 Busy No No No RS for j Qj RS for k Qk F6 F8 (M–M)+M() M()–M() Mult2 Computer Science 220 F30 58 Tomasulo Example Cycle 58 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 2 MULTDF0 F2 F4 3 SUBD F8 F6 F2 4 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 Yes DIVD Register result status Execution complete 3 5 16 8 58 12 S1 Vj Write Result 4 6 17 9 M*F4 M(34+R2) Clock F0 F2 F4 M*F4 M(45+R3) 58 FU © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Load1 Load2 Load3 Busy No No No Address F10 F12 ... 13 S2 Vk RS for j Qj RS for k Qk F6 F8 (M–M)+M() M()–M() Mult2 Computer Science 220 F30 59 Tomasulo Example Cycle 59 Instruction status Instruction j k Issue LD F6 34+ R2 1 LD F2 45+ R3 2 MULTDF0 F2 F4 3 SUBD F8 F6 F2 4 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 No Register result status Execution complete 3 5 16 8 58 12 S1 Vj Write Result 4 6 17 9 59 13 S2 Vk RS for j Qj RS for k Qk Clock F0 F2 F4 F6 F8 M*F4 M(45+R3) (M–M)+M() M()–M() M*F4/M 59 FU © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Load1 Load2 Load3 Computer Science 220 Busy No No No Address F10 F12 ... F30 60 Tomasulo vs. Scoreboard • Is tomasulo better? • Finish in 59 cycles vs. 61 for scoreboard, why? • We do reach the divide 3 cycles earlier… Simultaneous read of operand for SUBD and MULT © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 61 Tomasulo Loop Example Loop: LD MULTD SD SUBI BNEZ F0 F4 F4 R1 R1 0 F0 0 R1 Loop R1 F2 R1 #8 • Multiply takes 4 clocks • Loads may have cache misses © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 62 Loop Example Cycle 0 Instruction status Instruction j k iteration LD F0 0 R1 1 MULTDF4 F0 F2 1 SD F4 0 R1 1 LD F0 0 R1 2 MULTDF4 F0 F2 2 SD F4 0 R1 2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 No 0 Mult2 No Register result status Clock 0 R1 80 F0 Issue S1 Vj F2 Execution Write complete Result S2 Vk F4 Busy Address No No No Qi No No No Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: LD F0 MULTDF4 SD F4 SUBI R1 BNEZ R1 F6 F8 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Qi © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 63 Loop Example Cycle 1 Instruction status Instruction j k iteration LD F0 0 R1 1 MULTDF4 F0 F2 1 SD F4 0 R1 1 LD F0 0 R1 2 MULTDF4 F0 F2 2 SD F4 0 R1 2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 No 0 Mult2 No Register result status Clock 1 R1 80 F0 Qi Issue 1 S1 Vj F2 Execution Write complete Result S2 Vk F4 Busy Address Yes 80 No No Qi No No No Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: LD F0 MULTDF4 SD F4 SUBI R1 BNEZ R1 F6 F8 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Load1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 64 Loop Example Cycle 2 Instruction status Instruction j k iteration LD F0 0 R1 1 MULTDF4 F0 F2 1 SD F4 0 R1 1 LD F0 0 R1 2 MULTDF4 F0 F2 2 SD F4 0 R1 2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes MULTD 0 Mult2 No Register result status Clock 2 R1 80 F0 Qi Load1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 2 S1 Vj Execution Write complete Result S2 Vk R(F2) F2 F4 Busy Address Yes 80 No No Qi No No No Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: LD F0 MULTDF4 SD F4 Load1 SUBI R1 BNEZ R1 F6 F8 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult1 Computer Science 220 65 Loop Example Cycle 3 Instruction status Instruction j k iteration LD F0 0 R1 1 MULTDF4 F0 F2 1 SD F4 0 R1 1 LD F0 0 R1 2 MULTDF4 F0 F2 2 SD F4 0 R1 2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes MULTD 0 Mult2 No Register result status Clock 3 R1 80 F0 Qi Load1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 2 3 S1 Vj Execution Write complete Result S2 Vk R(F2) F2 F4 Busy Address Yes 80 No No Qi Yes 80 Mult1 No No Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: LD F0 MULTDF4 SD F4 Load1 SUBI R1 BNEZ R1 F6 F8 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult1 Computer Science 220 66 Loop Example Cycle 4 Instruction status Instruction j k iteration LD F0 0 R1 1 MULTDF4 F0 F2 1 SD F4 0 R1 1 LD F0 0 R1 2 MULTDF4 F0 F2 2 SD F4 0 R1 2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes MULTD 0 Mult2 No Register result status Clock 4 R1 72 F0 Qi Load1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 2 3 S1 Vj Execution Write complete Result S2 Vk R(F2) F2 F4 Busy Address Yes 80 No No Qi Yes 80 Mult1 No No Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: LD F0 MULTDF4 SD F4 Load1 SUBI R1 BNEZ R1 F6 F8 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult1 Computer Science 220 67 Loop Example Cycle 5 Instruction status Instruction j k iteration LD F0 0 R1 1 MULTDF4 F0 F2 1 SD F4 0 R1 1 LD F0 0 R1 2 MULTDF4 F0 F2 2 SD F4 0 R1 2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes MULTD 0 Mult2 No Register result status Clock 5 R1 72 F0 Qi Load1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 2 3 S1 Vj Execution Write complete Result S2 Vk R(F2) F2 F4 Busy Address Yes 80 No No Qi Yes 80 Mult1 No No Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: LD F0 MULTDF4 SD F4 Load1 SUBI R1 BNEZ R1 F6 F8 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult1 Computer Science 220 68 Loop Example Cycle 6 Instruction status Instruction j k iteration LD F0 0 R1 1 MULTDF4 F0 F2 1 SD F4 0 R1 1 LD F0 0 R1 2 MULTDF4 F0 F2 2 SD F4 0 R1 2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes MULTD 0 Mult2 No Register result status Clock 6 R1 72 F0 Qi Load1 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 2 3 6 S1 Vj Execution Write complete Result S2 Vk R(F2) F2 F4 Busy Address Yes 80 Yes 72 No Qi Yes 80 Mult1 No No Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: LD F0 MULTDF4 SD F4 Load1 SUBI R1 BNEZ R1 F6 F8 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult1 Computer Science 220 69 Loop Example Cycle 7 Instruction status Instruction j k LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes 0 Mult2 Yes Register result status Clock 7 R1 72 iteration 1 1 1 2 2 2 S1 Vj Op MULTD MULTD F0 Qi Issue 1 2 3 6 7 Load2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz F2 Execution Write complete Result Busy Address Yes 80 Yes 72 No Qi Yes 80 Mult1 No No R(F2) R(F2) Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: LD F0 MULTDF4 SD F4 Load1 SUBI R1 Load2 BNEZ R1 F4 F6 S2 Vk F8 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult2 Computer Science 220 70 Loop Example Cycle 8 Instruction status Instruction j k LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes 0 Mult2 Yes Register result status Clock 8 R1 72 iteration 1 1 1 2 2 2 Op MULTD MULTD F0 Qi Issue 1 2 3 6 7 8 S1 Vj Load2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz F2 Execution Write complete Result Busy Address Yes 80 Yes 72 No Qi Yes 80 Mult1 Yes 72 Mult2 No R(F2) R(F2) Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: LD F0 MULTDF4 SD F4 Load1 SUBI R1 Load2 BNEZ R1 F4 F6 S2 Vk F8 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult2 Computer Science 220 71 Loop Example Cycle 9 Instruction status Instruction j k LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes 0 Mult2 Yes Register result status Clock 9 R1 64 iteration 1 1 1 2 2 2 Op MULTD MULTD F0 Qi Issue 1 2 3 6 7 8 S1 Vj Load2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz F2 Execution Write complete Result 9 Load1 Load2 Load3 Store1 Store2 Store3 S2 RS for j RS for k Vk Qj Qk R(F2) R(F2) Load1 Load2 F4 F6 F8 Busy Address Yes 80 Yes 72 No Qi Yes 80 Mult1 Yes 72 Mult2 No Code: LD F0 MULTDF4 SD F4 SUBI R1 BNEZ R1 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult2 Computer Science 220 72 Loop Example Cycle 10 Instruction status Instruction j k LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No 0 Add3 No 4 Mult1 Yes 0 Mult2 Yes Register result status Clock 10 R1 64 Qi iteration 1 1 1 2 2 2 Op Issue 1 2 3 6 7 8 S1 Vj Execution Write complete Result 9 10 Load1 Load2 Load3 10 Store1 Store2 Store3 S2 RS for j RS for k Vk Qj Qk MULTD MULTD M(80) R(F2) R(F2) Load2 F0 F2 F6 Load2 © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz F4 F8 Busy Address No Yes 72 No Qi Yes 80 Mult1 Yes 72 Mult2 No Code: LD F0 MULTDF4 SD F4 SUBI R1 BNEZ R1 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult2 Computer Science 220 73 Loop Example Cycle 11 Instruction status Instruction j k LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No 0 Add3 No 3 Mult1 Yes 4 Mult2 Yes Register result status Clock 11 R1 64 iteration 1 1 1 2 2 2 Op Issue 1 2 3 6 7 8 S1 Vj Execution Write complete Result 9 10 Load1 Load2 Load3 10 11 Store1 Store2 Store3 S2 RS for j RS for k Vk Qj Qk MULTD MULTD M(80) R(F2) M(72) R(F2) F0 F2 Qi © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz F4 F6 F8 Busy Address No No Yes 64 Qi Yes 80 Mult1 Yes 72 Mult2 No Code: LD F0 MULTDF4 SD F4 SUBI R1 BNEZ R1 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult2 Computer Science 220 74 Loop Example Cycle 12 Instruction status Instruction j k LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No 0 Add3 No 2 Mult1 Yes 3 Mult2 Yes Register result status Clock 12 R1 64 iteration 1 1 1 2 2 2 Op Issue 1 2 3 6 7 8 S1 Vj Execution Write complete Result 9 10 Load1 Load2 Load3 10 11 Store1 Store2 Store3 S2 RS for j RS for k Vk Qj Qk MULTD MULTD M(80) R(F2) M(72) R(F2) F0 F2 Qi © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz F4 F6 F8 Busy Address No No Yes 64 Qi Yes 80 Mult1 Yes 72 Mult2 No Code: LD F0 MULTDF4 SD F4 SUBI R1 BNEZ R1 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult2 Computer Science 220 75 Loop Example Cycle 13 Instruction status Instruction j k LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No 0 Add3 No 1 Mult1 Yes 2 Mult2 Yes Register result status Clock 13 R1 64 iteration 1 1 1 2 2 2 Op Issue 1 2 3 6 7 8 S1 Vj Execution Write complete Result 9 10 Load1 Load2 Load3 10 11 Store1 Store2 Store3 S2 RS for j RS for k Vk Qj Qk MULTD MULTD M(80) R(F2) M(72) R(F2) F0 F2 Qi © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz F4 F6 F8 Busy Address No No Yes 64 Qi Yes 80 Mult1 Yes 72 Mult2 No Code: LD F0 MULTDF4 SD F4 SUBI R1 BNEZ R1 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult2 Computer Science 220 76 Loop Example Cycle 14 Instruction status Instruction j k LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 LD F0 0 R1 MULTDF4 F0 F2 SD F4 0 R1 Reservation Stations Time Name Busy 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes 1 Mult2 Yes Register result status Clock 14 R1 64 iteration 1 1 1 2 2 2 Op Issue 1 2 3 6 7 8 S1 Vj Execution Write complete Result 9 10 Load1 14 Load2 Load3 10 11 Store1 Store2 Store3 S2 RS for j RS for k Vk Qj Qk MULTD MULTD M(80) R(F2) M(72) R(F2) F0 F2 Qi © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz F4 F6 F8 Busy Address No No Yes 64 Qi Yes 80 Mult1 Yes 72 Mult2 No Code: LD F0 MULTDF4 SD F4 SUBI R1 BNEZ R1 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult2 Computer Science 220 77 Loop Example Cycle 15 Instruction status Instruction j k iteration LD F0 0 R1 1 MULTDF4 F0 F2 1 SD F4 0 R1 1 LD F0 0 R1 2 MULTDF4 F0 F2 2 SD F4 0 R1 2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 No 0 Mult2 Yes MULTD Register result status Clock 15 R1 64 F0 Qi © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 2 3 6 7 8 S1 Vj Execution Write complete Result 9 10 Load1 14 15 Load2 Load3 10 11 Store1 15 Store2 Store3 S2 RS for j RS for k Vk Qj Qk M(72) R(F2) F2 F4 F6 F8 Busy Address No No Yes 64 Qi Yes 80 M(80)*R(F2) Yes 72 Mult2 No Code: LD F0 MULTDF4 SD F4 SUBI R1 BNEZ R1 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult2 Computer Science 220 78 Loop Example Cycle 16 Instruction status Instruction j k iteration LD F0 0 R1 1 MULTDF4 F0 F2 1 SD F4 0 R1 1 LD F0 0 R1 2 MULTDF4 F0 F2 2 SD F4 0 R1 2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes MULTD 0 Mult2 No Register result status Clock 16 R1 64 F0 Qi © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 2 3 6 7 8 S1 Vj F2 Execution Write complete Result 9 10 Load1 14 15 Load2 Load3 10 11 Store1 15 16 Store2 Store3 S2 RS for j RS for k Vk Qj Qk R(F2) Load3 F4 F6 F8 Busy Address No No Yes 64 Qi Yes 80 M(80)*R(F2) Yes 72 M(72)*R(72) No Code: LD F0 MULTDF4 SD F4 SUBI R1 BNEZ R1 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult1 Computer Science 220 79 Loop Example Cycle 17 Instruction status Instruction j k iteration LD F0 0 R1 1 MULTDF4 F0 F2 1 SD F4 0 R1 1 LD F0 0 R1 2 MULTDF4 F0 F2 2 SD F4 0 R1 2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes MULTD 0 Mult2 No Register result status Clock 17 R1 64 F0 Qi © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 2 3 6 7 8 S1 Vj F2 Execution Write complete Result 9 10 Load1 14 15 Load2 Load3 10 11 Store1 15 16 Store2 Store3 S2 RS for j RS for k Vk Qj Qk R(F2) Load3 F4 F6 F8 Busy Address No No Yes 64 Qi Yes 80 M(80)*R(F2) Yes 72 M(72)*R(72) Yes 64 Mult1 Code: LD F0 MULTDF4 SD F4 SUBI R1 BNEZ R1 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult1 Computer Science 220 80 Loop Example Cycle 18 Instruction status Instruction j k iteration LD F0 0 R1 1 MULTDF4 F0 F2 1 SD F4 0 R1 1 LD F0 0 R1 2 MULTDF4 F0 F2 2 SD F4 0 R1 2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes MULTD 0 Mult2 No Register result status Clock 18 R1 56 F0 Qi © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 2 3 6 7 8 S1 Vj Execution Write complete Result 9 10 14 15 18 10 11 15 16 S2 Vk R(F2) F2 F4 Busy Address No No Yes 64 Qi Yes 80 M(80)*R(F2) Yes 72 M(72)*R(72) Yes 64 Mult1 Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: LD F0 MULTDF4 SD F4 Load3 SUBI R1 BNEZ R1 F6 F8 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult1 Computer Science 220 81 Loop Example Cycle 19 Instruction status Instruction j k iteration LD F0 0 R1 1 MULTDF4 F0 F2 1 SD F4 0 R1 1 LD F0 0 R1 2 MULTDF4 F0 F2 2 SD F4 0 R1 2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes MULTD 0 Mult2 No Register result status Clock 19 R1 56 F0 Qi © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 2 3 6 7 8 S1 Vj Execution Write complete Result 9 10 14 15 18 19 10 11 15 16 S2 Vk R(F2) F2 F4 Busy Address No No Yes 64 Qi No Yes 72 M(72)*R(72) Yes 64 Mult1 Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: LD F0 MULTDF4 SD F4 Load3 SUBI R1 BNEZ R1 F6 F8 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult1 Computer Science 220 82 Loop Example Cycle 20 Instruction status Instruction j k iteration LD F0 0 R1 1 MULTDF4 F0 F2 1 SD F4 0 R1 1 LD F0 0 R1 2 MULTDF4 F0 F2 2 SD F4 0 R1 2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes MULTD 0 Mult2 No Register result status Clock 20 R1 56 F0 Qi © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 2 3 6 7 8 S1 Vj Execution Write complete Result 9 10 14 15 18 19 10 11 15 16 20 S2 RS for j Vk Qj R(F2) F2 F4 Busy Address No No Yes 64 Qi No Yes 72 M(72)*R(72) Yes 64 Mult1 Load1 Load2 Load3 Store1 Store2 Store3 RS for k Qk Code: LD F0 MULTDF4 SD F4 Load3 SUBI R1 BNEZ R1 F6 F8 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult1 Computer Science 220 83 Loop Example Cycle 21 Instruction status Instruction j k iteration LD F0 0 R1 1 MULTDF4 F0 F2 1 SD F4 0 R1 1 LD F0 0 R1 2 MULTDF4 F0 F2 2 SD F4 0 R1 2 Reservation Stations Time Name Busy Op 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 Yes MULTD 0 Mult2 No Register result status Clock 21 R1 56 F0 Qi © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Issue 1 2 3 6 7 8 S1 Vj Execution Write complete Result 9 10 14 15 18 19 10 11 15 16 20 21 S2 RS for j Vk Qj R(F2) F2 F4 Busy Address No No Yes 64 Qi No No Yes 64 Mult1 Load1 Load2 Load3 Store1 Store2 Store3 RS for k Qk Code: LD F0 MULTDF4 SD F4 Load3 SUBI R1 BNEZ R1 F6 F8 0 R1 F0 F2 0 R1 R1 #8 Loop F10 F12 ... F30 Mult1 Computer Science 220 84 Tomasulo Summary • • • • Prevents Register as bottleneck Avoids WAR, WAW hazards of Scoreboard Allows loop unrolling in HW Not limited to basic blocks (provided branch prediction) • Lasting Contributions – Dynamic scheduling – Register renaming – Load/store disambiguation © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 85 Next Time • Tomasulo’s Algorithm © 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz Computer Science 220 86
© Copyright 2026 Paperzz