Advanced Pipelining
Out of Order Processors
COMP25212

Overview and Learning Outcomes
• Find out how modern processors work
• Understand the evolution of processors
• Learn how out-of-order processors can improve processor performance
• Discover architectural solutions to support and improve out-of-order execution

Remember from last week
• Classic 5-stage pipeline
• Control Hazards
• Data Hazards
• Instruction Level Parallelism
• Superscalar
• Out of order execution

Classic 5-stage pipeline
[Figure: Fetch Logic, Decode Logic, Exec Logic, Mem Logic, Write Logic, with Inst Cache and Data Cache]
• A single execution flow

Modern Pipelines
[Figure: Inst Cache, Fetch, Decode, then parallel functional units (Add1; Mul1, Mul2, Mul3; Div1, Div2, Div3; Ld1, Ld2), each ending in Write Back]
• Many execution flows

In ARM Processors
[Figure: In-order processor; Out of order processor]

Out of Order Execution
• The original order in a program is not preserved
• Processors execute instructions as input data becomes available
• Pipeline stalls due to conflicted instructions are avoided by processing instructions which are able to run immediately
• Take advantage of ILP
• Instructions per cycle increases

Conflicted Instructions
• Cache misses: long wait before finishing execution
• Structural Hazard: the required resources are not available
• Data hazard: dependencies between instructions

Structural Hazards
• Functional Units are typically not pipelined
• This means only one instruction can use them at once
• If all suitable Functional Units for executing an instruction are busy, then the instruction cannot be executed

Data dependencies
• True dependency: Read-after-write (RAW)
    r1 <- r2 op r3
    r4 <- r1 op r5
• Anti-dependency: Write-after-read (WAR)
    r1 <- r2 op r3
    r2 <- r4 op r5
• Output dependency: Write-after-write (WAW)
    r1 <- r2 op r3
    …
    r1 <- r4 op r5

Dynamic Scheduling
• Key
Idea: Allow instructions behind a stall to proceed.
  => Instructions execute in parallel. There are multiple execution units, so use them.
    DIVD F0, F2, F4
    ADDD F10, F0, F8
    SUBD F12, F8, F14
  Even though ADDD stalls, the SUBD has no dependencies and can run.
  – Enables out-of-order execution => out-of-order completion
Dynamic pipeline scheduling overcomes the limitations of in-order pipelined execution by allowing out-of-order instruction execution

Out of Order Execution with Scoreboard

Scoreboard
• The scoreboard is a centralized hardware mechanism
  – Instructions are executed as soon as their operands are available and there are no hazard conditions
• It dynamically constructs the dependency graph in hardware for a window of instructions as they are issued in program order
• The scoreboard is a “data structure” that provides the information necessary for all pieces of the processor to work together (In Appendix A.8)
CDC6600 (1963)

The Key idea of Scoreboards
• Out-of-order execution divides the ID stage:
  1. Issue: decode instructions, check for structural hazards
  2. Read operands: wait until no data hazards, then read operands
• Scoreboards allow an instruction to execute whenever 1 & 2 hold, not waiting for prior instructions
• We will use in-order issue, out-of-order execution, out-of-order commit (also called completion)

Typical Scoreboard Structure
[Figure: typical scoreboard structure]

Stages of a Scoreboard Pipeline
[Figure: Issue, Read Operands, then parallel execution units (Execute Integer, Execute FP Multiplication, Execute FP Multiplication, Execute FP Add, Execute FP Division), each followed by Write Back]

1. Issue: decode instructions & check for structural & WAW hazards (ID). Always done in program order.
   If a functional unit for the instruction is free (no structural hazards) and no other active instruction has the same destination register (no WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure.
   If a structural or WAW hazard exists, then the instruction issue stalls, and no further instructions will issue until these hazards are cleared.
2. Read operands: wait until no data hazards, then read operands (RO). Can be done out of program order.
   A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit (no RAW).
   When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution.
   The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.
3. Execution: operate on operands (EX)
   The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.
4. Write result: finish execution (WB)
   Once the scoreboard knows that the functional unit has completed execution, it checks for WAR hazards. If there are none, it writes the result. If there is a WAR hazard, it stalls the instruction.
   Example:
     DIVD F0,F2,F4
     ADDD F10,F0,F8
     SUBD F8,F8,F14
   Scoreboard would stall SUBD until ADDD reads operands

Information within the Scoreboard
1. Instruction status: which of 4 stages the instruction is in
2. Functional unit status: Indicates the state of the functional unit (FU). 9 fields for each functional unit
   Busy: Indicates whether the unit is being used or not
   Op: Operation to perform in the unit (e.g., + or –)
   Fi: Destination register
   Fj, Fk: Source-register numbers
   Qj, Qk: Functional units producing source registers Fj, Fk
   Rj, Rk: Flags indicating when Fj, Fk are ready. Set to No after operands are read.
3. Register result status: Indicates which functional unit will write each register, if one exists.
   Blank when no pending instructions will write that register

Information within the Scoreboard

Instruction status (instruction stream)
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2
LD    F2  45+ R3
MULTD F0  F2  F4
SUBD  F8  F6  F2
DIVD  F10 F0  F6
ADDD  F6  F8  F2
Scoreboard only records the status. We will show the times for each stage, for convenience.

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No
Functional Units: 1 Integer, 2 Multiplication, 1 Addition, 1 Division
Time: FU count down
Fi: dest (destination register)
Fj, Fk: S1, S2 (source operands)
Qj, Qk: FU for j, FU for k (which FU will produce each operand)
Rj, Rk: Fj?, Fk? (operand ready?)

Register result status
Clock  F0  F2  F4  F6  F8  F10  F12  ...  F30
FU
Clock: clock cycle counter
FU: which FU will write in each register?
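The bookkeeping described above (nine status fields per functional unit plus the register result status) can be sketched in Python. This is a minimal illustration under stated assumptions, not the CDC6600 hardware: the names FUStatus, Scoreboard, can_issue, can_read and can_write are hypothetical, and the WAR test treats Rj/Rk = Yes as "operand ready but not yet read".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FUStatus:
    """The nine functional unit status fields (illustrative names)."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # units producing Fj, Fk (None: no pending producer)
    qk: Optional[str] = None
    rj: bool = False           # operand ready and not yet read
    rk: bool = False


class Scoreboard:
    def __init__(self, unit_names):
        self.fu = {name: FUStatus() for name in unit_names}
        self.result = {}  # register result status: register -> unit that will write it

    def can_issue(self, unit, dest):
        # Issue needs a free unit (no structural hazard) and no pending write
        # to the same destination register (no WAW hazard).
        return not self.fu[unit].busy and dest not in self.result

    def issue(self, unit, op, dest, src_j, src_k):
        qj, qk = self.result.get(src_j), self.result.get(src_k)
        self.fu[unit] = FUStatus(True, op, dest, src_j, src_k,
                                 qj, qk, qj is None, qk is None)
        self.result[dest] = unit

    def can_read(self, unit):
        # Both operands must be available (no RAW hazard).
        f = self.fu[unit]
        return f.rj and f.rk

    def read_operands(self, unit):
        f = self.fu[unit]
        f.rj = f.rk = False   # set to No after operands are read
        f.qj = f.qk = None

    def can_write(self, unit):
        # WAR check: no other unit still has to read the old value of Fi.
        fi = self.fu[unit].fi
        return not any((f.fj == fi and f.rj) or (f.fk == fi and f.rk)
                       for name, f in self.fu.items() if name != unit)

    def write_result(self, unit):
        # Wake up units waiting on this result, then free the unit and register.
        for f in self.fu.values():
            if f.qj == unit:
                f.qj, f.rj = None, True
            if f.qk == unit:
                f.qk, f.rk = None, True
        del self.result[self.fu[unit].fi]
        self.fu[unit] = FUStatus()
```

A load has only one register operand, so in this sketch it is issued with src_j set to None, for example sb.issue("Integer", "Load", "F2", None, "R3").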
A Scoreboard Example
The following code is run on a scoreboard pipeline with:

Functional Unit (FU)       # of FUs  EX cycles
Integer Mem                1         1
Floating Point Multiply    2         10
Floating Point Add         1         2
Floating Point Divide      1         40

L.D   F6, 34(R2)
L.D   F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Functional units are not pipelined

Dependency Graph For Example Code
Example Code
1 L.D   F6, 34(R2)
2 L.D   F2, 45(R3)
3 MUL.D F0, F2, F4
4 SUB.D F8, F6, F2
5 DIV.D F10, F0, F6
6 ADD.D F6, F8, F2

Data Dependence (RAW): (1, 4) (1, 5) (2, 3) (2, 4) (2, 6) (3, 5) (4, 6)
Output Dependence (WAW): (1, 6)
Anti-dependence (WAR): (5, 6)
[Figure: dependency graph over instructions 1 to 6, with edges for real data dependence (RAW), anti-dependence (WAR) and output dependence (WAW)]

Scoreboard Example Cycle 1
• Issue LD #1

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1
LD    F2  45+ R3
MULTD F0  F2  F4
SUBD  F8  F6  F2
DIVD  F10 F0  F6
ADDD  F6  F8  F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -   R2  -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status (Clock 1): F6 Integer

Scoreboard Example Cycle 2
• LD #1 reads operands
• LD #2 can’t issue since integer unit is busy
• MULT can’t issue because we require in-order issue
• Pipeline stalls

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2
LD    F2  45+ R3
MULTD F0  F2  F4
SUBD  F8  F6  F2
DIVD  F10 F0  F6
ADDD  F6  F8  F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -   R2  -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status (Clock 2): F6 Integer

Scoreboard Example Cycle 3
• LD #1 completes

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3
LD    F2  45+ R3
MULTD F0  F2  F4
SUBD  F8  F6  F2
DIVD  F10 F0  F6
ADDD  F6  F8  F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -   R2  -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status (Clock 3): F6 Integer

Scoreboard Example Cycle 4
• LD #1 writes back and frees Integer FU and register F6

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3
MULTD F0  F2  F4
SUBD  F8  F6  F2
DIVD  F10 F0  F6
ADDD  F6  F8  F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status (Clock 4): no pending writes

Scoreboard Example Cycle 5
• Issue LD #2 since integer unit is now free

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5
MULTD F0  F2  F4
SUBD  F8  F6  F2
DIVD  F10 F0  F6
ADDD  F6  F8  F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -   R3  -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status (Clock 5): F2 Integer

Scoreboard Example Cycle 6
• Issue MULT

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6
MULTD F0  F2  F4    6
SUBD  F8  F6  F2
DIVD  F10 F0  F6
ADDD  F6  F8  F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -   R3  -        -        -    Yes
-     Mult1    Yes   Mult  F0   F2  F4  Integer  -        No   Yes
-     Mult2    No
-     Add      No
-     Divide   No

Register result status (Clock 6): F0 Mult1, F2 Integer

Scoreboard Example Cycle 7
• MULT can’t read its operands (F2) because LD #2 hasn’t finished
• SUBD is issued

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7
MULTD F0  F2  F4    6
SUBD  F8  F6  F2    7
DIVD  F10 F0  F6
ADDD  F6  F8  F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -   R3  -        -        -    Yes
-     Mult1    Yes   Mult  F0   F2  F4  Integer  -        No   Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6  F2  -        Integer  Yes  No
-     Divide   No

Register result status (Clock 7): F0 Mult1, F2 Integer, F8 Add

Scoreboard Example Cycle 8a
• MULT and SUBD both waiting for F2
• DIVD issues

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7
MULTD F0  F2  F4    6
SUBD  F8  F6  F2    7
DIVD  F10 F0  F6    8
ADDD  F6  F8  F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -   R3  -        -        -    Yes
-     Mult1    Yes   Mult  F0   F2  F4  Integer  -        No   Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6  F2  -        Integer  Yes  No
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status (Clock 8): F0 Mult1, F2 Integer, F8 Add, F10 Divide

Scoreboard Example Cycle 8b
• LD #2 writes F2

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6
SUBD  F8  F6  F2    7
DIVD  F10 F0  F6    8
ADDD  F6  F8  F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status (Clock 8): F0 Mult1, F8 Add, F10 Divide

Scoreboard Example Cycle 9
• Now MULT and SUBD can both read F2

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9
SUBD  F8  F6  F2    7      9
DIVD  F10 F0  F6    8
ADDD  F6  F8  F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
10    Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
2     Add      Yes   Sub   F8   F6  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status (Clock 9): F0 Mult1, F8 Add, F10 Divide

Scoreboard Example Cycle 10
• MULT and SUB continue operation

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9
SUBD  F8  F6  F2    7      9
DIVD  F10 F0  F6    8
ADDD  F6  F8  F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
9     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
1     Add      Yes   Sub   F8   F6  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status (Clock 10): F0 Mult1, F8 Add, F10 Divide

Scoreboard Example Cycle 11
• ADDD cannot be issued because add unit is busy
• SUBD completes

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9
SUBD  F8  F6  F2    7      9              11
DIVD  F10 F0  F6    8
ADDD  F6  F8  F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
8     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
0     Add      Yes   Sub   F8   F6  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status (Clock 11): F0 Mult1, F8 Add, F10 Divide

Scoreboard Example Cycle 12
• SUBD finishes
• DIVD waits for F0

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9
SUBD  F8  F6  F2    7      9              11                  12
DIVD  F10 F0  F6    8
ADDD  F6  F8  F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
7     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
-     Add      No
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status (Clock 12): F0 Mult1, F10 Divide

Scoreboard Example Cycle 13
• ADDD issues

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9
SUBD  F8  F6  F2    7      9              11                  12
DIVD  F10 F0  F6    8
ADDD  F6  F8  F2    13

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
6     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status (Clock 13): F0 Mult1, F6 Add, F10 Divide

Scoreboard Example Cycle 14
• MULT and ADDD continue their operation

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9
SUBD  F8  F6  F2    7      9              11                  12
DIVD  F10 F0  F6    8
ADDD  F6  F8  F2    13     14

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
5     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
2     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status (Clock 14): F0 Mult1, F6 Add, F10 Divide

Scoreboard Example Cycle 15
• Nearly there…

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9
SUBD  F8  F6  F2    7      9              11                  12
DIVD  F10 F0  F6    8
ADDD  F6  F8  F2    13     14

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
4     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
1     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status (Clock 15): F0 Mult1, F6 Add, F10 Divide

Scoreboard Example Cycle 16
• ADDD completes execution

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9
SUBD  F8  F6  F2    7      9              11                  12
DIVD  F10 F0  F6    8
ADDD  F6  F8  F2    13     14             16

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
3     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
0     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status (Clock 16): F0 Mult1, F6 Add, F10 Divide

Scoreboard Example Cycle 17
• ADDD can’t write because of WAR with DIVD (DIVD has not yet read F6)
• ADDD stalls write back

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9
SUBD  F8  F6  F2    7      9              11                  12
DIVD  F10 F0  F6    8
ADDD  F6  F8  F2    13     14             16

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
2     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status (Clock 17): F0 Mult1, F6 Add, F10 Divide

Scoreboard Example Cycle 18
• MULT still continues its execution

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9
SUBD  F8  F6  F2    7      9              11                  12
DIVD  F10 F0  F6    8
ADDD  F6  F8  F2    13     14             16

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
1     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status (Clock 18): F0 Mult1, F6 Add, F10 Divide

Scoreboard Example Cycle 19
• MULT completes execution

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9              19
SUBD  F8  F6  F2    7      9              11                  12
DIVD  F10 F0  F6    8
ADDD  F6  F8  F2    13     14             16

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
0     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status (Clock 19): F0 Mult1, F6 Add, F10 Divide

Scoreboard Example Cycle 20
• MULT writes and frees FU and register F0

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9              19                  20
SUBD  F8  F6  F2    7      9              11                  12
DIVD  F10 F0  F6    8
ADDD  F6  F8  F2    13     14             16

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  -        -        Yes  Yes

Register result status (Clock 20): F6 Add, F10 Divide

Scoreboard Example Cycle 21
• DIVD can read operands

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9              19                  20
SUBD  F8  F6  F2    7      9              11                  12
DIVD  F10 F0  F6    8      21
ADDD  F6  F8  F2    13     14             16

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  -        -        Yes  Yes

Register result status (Clock 21): F6 Add, F10 Divide

Scoreboard Example Cycle 22
• Now ADDD can write since WAR removed
• ADD FU and register F6 freed

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9              19                  20
SUBD  F8  F6  F2    7      9              11                  12
DIVD  F10 F0  F6    8      21
ADDD  F6  F8  F2    13     14             16                  22

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
40    Divide   Yes   Div   F10  F0  F6  -        -        Yes  Yes

Register result status (Clock 22): F10 Divide

39 cycles later…

Scoreboard Example Cycle 61
• DIVD completes execution

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9              19                  20
SUBD  F8  F6  F2    7      9              11                  12
DIVD  F10 F0  F6    8      21             61
ADDD  F6  F8  F2    13     14             16                  22

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
0     Divide   Yes   Div   F10  F0  F6  -        -        Yes  Yes

Register result status (Clock 61): F10 Divide

Scoreboard Example Cycle 62
• DIVD writes back and frees resources
• Execution complete
• In-order issue, out-of-order execution, out-of-order completion

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6  34+ R2    1      2              3                   4
LD    F2  45+ R3    5      6              7                   8
MULTD F0  F2  F4    6      9              19                  20
SUBD  F8  F6  F2    7      9              11                  12
DIVD  F10 F0  F6    8      21             61                  62
ADDD  F6  F8  F2    13     14             16                  22

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status (Clock 62): no pending writes

Summary
• Techniques to deal with data hazards in instruction pipelines by:
  – Result forwarding to reduce or eliminate RAW hazards
  – Hazard detection hardware to stall the pipeline during hazards
  – Compiler-based static scheduling to separate the dependent instructions, minimizing actual hazard-prevention stalls in scheduled code (will discuss in detail next week.)
  – A hardware-based mechanism that rearranges instruction execution order to reduce stalls dynamically at runtime (dynamic scheduling)
    » Better dynamic exploitation of instruction-level parallelism (ILP)

Limitations of Scoreboard
• The amount of parallelism available among the instructions (chosen from the same basic block)
• The number of scoreboard entries (the size of the scoreboard determines the size of the window)
• The number and types of functional units (structural hazards increase when out-of-order execution is used)
• The presence of anti-dependences and output dependences leads to WAR and WAW stalls.
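
The stage times in the scoreboard example can be cross-checked with a small timing model. The sketch below is not a cycle-accurate scoreboard: it assumes one issue per cycle in program order, that a result can be read and its unit reused in the cycle after write back, and it simplifies the WAR test to "every earlier instruction that reads the destination has already read its operands". The names Instr, PROGRAM and schedule are hypothetical.

```python
from dataclasses import dataclass

# Latencies and unit counts from the example: Integer 1, Multiply 10 (two units),
# Add 2, Divide 40. Functional units are not pipelined.
LATENCY = {"int": 1, "mult": 10, "add": 2, "div": 40}
UNITS = {"int": 1, "mult": 2, "add": 1, "div": 1}


@dataclass(frozen=True)
class Instr:
    unit: str
    dest: str
    srcs: tuple


PROGRAM = [
    Instr("int", "F6", ("R2",)),         # L.D   F6, 34(R2)
    Instr("int", "F2", ("R3",)),         # L.D   F2, 45(R3)
    Instr("mult", "F0", ("F2", "F4")),   # MUL.D F0, F2, F4
    Instr("add", "F8", ("F6", "F2")),    # SUB.D F8, F6, F2
    Instr("div", "F10", ("F0", "F6")),   # DIV.D F10, F0, F6
    Instr("add", "F6", ("F8", "F2")),    # ADD.D F6, F8, F2
]


def schedule(program):
    """Return (issue, read, exec complete, write) cycles for each instruction."""
    free_at = {u: [1] * n for u, n in UNITS.items()}  # first free cycle per unit copy
    done = []  # (instr, issue, read, exec_done, write) of earlier instructions
    prev_issue = 0
    for ins in program:
        copies = free_at[ins.unit]
        k = copies.index(min(copies))
        # Issue: in order, needs a free unit (structural) and no pending WAW.
        waw = [w + 1 for j, _, _, _, w in done if j.dest == ins.dest]
        issue = max([prev_issue + 1, copies[k]] + waw)
        # Read operands: wait for the latest earlier producer of each source (RAW).
        raw = []
        for s in ins.srcs:
            writers = [w + 1 for j, _, _, _, w in done if j.dest == s]
            raw += writers[-1:]
        read = max([issue + 1] + raw)
        exec_done = read + LATENCY[ins.unit]
        # Write result: wait until earlier readers of the destination have read (WAR).
        war = [r + 1 for j, _, r, _, _ in done if ins.dest in j.srcs]
        write = max([exec_done + 1] + war)
        copies[k] = write + 1
        done.append((ins, issue, read, exec_done, write))
        prev_issue = issue
    return [t[1:] for t in done]


for row in schedule(PROGRAM):
    print(row)
```

Under these assumptions the last row printed is (13, 14, 16, 22): ADD.D cannot issue until SUB.D frees the add unit, and its write back is held until DIV.D has read F6, matching the stall discussed in cycles 17 to 22.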