Lecture 1

Advanced Pipelining
Out of Order Processors
COMP25212
Overview and Learning Outcomes
• Find out how modern processors work
• Understand the evolution of processors
• Learn how out-of-order processors can
improve processor performance
• Discover architectural solutions to support
and improve out-of-order execution
Remember from last week
• Classic 5-stage pipeline
• Control Hazards
• Data Hazards
• Instruction Level Parallelism
• Superscalar
• Out of order execution
Classic 5-stage pipeline
[Figure: Fetch Logic → Decode Logic → Exec Logic → Mem Logic → Write Logic; the Inst Cache feeds Fetch and the Data Cache serves Mem]
• A single execution flow
Modern Pipelines
• Many execution flows
[Figure: Inst Cache → Fetch → Decode → Functional Units, each followed by its own Write Back: Add1; Mul1, Mul2, Mul3; Div1, Div2, Div3; Ld1, Ld2]
In ARM Processors
In-order processor
Out of order processor
Out of Order Execution
• The original order in a program is not
preserved
• Processors execute instructions as input data
becomes available
• Pipeline stalls due to conflicted instructions
are avoided by processing instructions which
are able to run immediately
• Take advantage of ILP
• Instructions per cycle increases
Conflicted Instructions
• Cache misses: long wait before finishing
execution
• Structural Hazard: the required resources are
not available
• Data hazard: dependencies between
instructions
Structural Hazards
• Functional Units are typically not pipelined
• This means only one instruction can use
them at once
• If all suitable Functional Units for executing
an instruction are busy, then the instruction
cannot be executed
Data dependencies
• True dependency: Read-after-write (RAW)
r1 <- r2 op r3
r4 <- r1 op r5
• Anti-dependency: Write-after-read (WAR)
r1 <- r2 op r3
r2 <- r4 op r5
• Output dependency: Write-after-write (WAW)
r1 <- r2 op r3
…
r1 <- r4 op r5
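The three cases above can be told apart by comparing the destination and source registers of two instructions. The following is a minimal sketch, not part of the original slides; the names `Instr` and `classify` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def classify(first: Instr, second: Instr) -> Set[str]:
    """Dependences of `second` on `first`; `first` is earlier in program order."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")    # true dependency: second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")    # anti-dependency: second overwrites what first reads
    if first.dest == second.dest:
        found.add("WAW")    # output dependency: both write the same register
    return found


# The three register patterns shown on the slide.
raw = classify(Instr("r1", ("r2", "r3")), Instr("r4", ("r1", "r5")))
war = classify(Instr("r1", ("r2", "r3")), Instr("r2", ("r4", "r5")))
waw = classify(Instr("r1", ("r2", "r3")), Instr("r1", ("r4", "r5")))
print(raw, war, waw)
```

A pair of instructions can fall into more than one class at once, which is why the sketch returns a set.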
Dynamic Scheduling
• Key idea: allow instructions behind a stall to proceed,
so that instructions execute in parallel.
There are multiple execution units, so use them.
DIVD F0, F2, F4
ADDD F10, F0, F8
SUBD F12, F8, F14
Even though ADDD stalls (it waits for F0 from DIVD), the
SUBD has no dependencies and can run.
– Enables out-of-order execution => out-of-order completion
Dynamic pipeline scheduling overcomes the limitations of in-order
pipelined execution by allowing out-of-order instruction execution
Out of Order Execution with Scoreboard
Scoreboard
• The scoreboard is a centralized hardware
mechanism
– Instructions are executed as soon as their operands are
available and there are no hazard conditions
• It dynamically constructs the dependency graph
in hardware for a window of instructions as they
are issued in program order
• The scoreboard is a “data structure” that provides
the information necessary for all pieces of the
processor to work together
(In Appendix A.8)
CDC6600
(1963)
The Key idea of Scoreboards
• Out-of-order execution divides the ID stage in two:
1. Issue: decode instructions, check for structural
hazards
2. Read operands: wait until there are no data hazards,
then read operands
• Scoreboards allow an instruction to execute
whenever 1 and 2 hold, without waiting for prior
instructions
• We will use in-order issue, out-of-order
execution and out-of-order commit (also called
completion)
Typical Scoreboard Structure
Stages of a Scoreboard Pipeline
[Figure: Issue → Read Operands → one Execute stage per functional unit (Integer, FP Multiplication, FP Multiplication, FP Add, FP Division), each followed by its own Write Back]
Stages of a Scoreboard Pipeline
1. Issue: decode instructions and check for structural and WAW
hazards (ID). Always done in program order.
If a functional unit for the instruction is free (no structural hazards)
and no other active instruction has the same destination register
(no WAW), the scoreboard issues the instruction to the functional
unit and updates its internal data structure.
If a structural or WAW hazard exists, then the instruction issue stalls,
and no further instructions will issue until these hazards are
cleared.
2. Read operands: wait until no data hazards, then read operands
(RO). Can be done out of program order.
A source operand is available if no earlier issued active instruction is
going to write it, or if the register containing the operand is being
written by a currently active functional unit (no RAW).
When the source operands are available, the scoreboard tells the
functional unit to proceed to read the operands from the registers
and begin execution. The scoreboard resolves RAW hazards
dynamically in this step, and instructions may be sent into
execution out of order.
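The issue and read-operands checks can be sketched as two small functions. This is an illustrative sketch under assumptions, not the hardware described in the lecture: the scoreboard is modelled with plain dictionaries, `try_issue` and `can_read_operands` are hypothetical names, and the Qj/Qk/Rj/Rk fields are the ones described on the "Information within the Scoreboard" slides.

```python
from typing import Dict, List, Optional, Tuple


def try_issue(fu: Dict[str, dict], reg_writer: Dict[str, str], units: List[str],
              op: str, dest: str, srcs: Tuple[Optional[str], Optional[str]]) -> Optional[str]:
    """Issue stage: return the unit chosen, or None if the instruction must stall."""
    if dest in reg_writer:                       # WAW: an active instruction writes dest
        return None
    free = [u for u in units if not fu[u]["Busy"]]
    if not free:                                 # structural hazard: no suitable unit free
        return None
    unit = free[0]
    fj, fk = srcs
    qj, qk = reg_writer.get(fj), reg_writer.get(fk)   # units still producing the sources
    fu[unit] = {"Busy": True, "Op": op, "Fi": dest, "Fj": fj, "Fk": fk,
                "Qj": qj, "Qk": qk, "Rj": qj is None, "Rk": qk is None}
    reg_writer[dest] = unit                      # register result status
    return unit


def can_read_operands(entry: dict) -> bool:
    """Read-operands stage: both sources must be ready (no RAW hazard left)."""
    return entry["Rj"] and entry["Rk"]


# State loosely based on cycle 6 of the lecture example: the Integer unit is loading F2.
fu = {"Integer": {"Busy": True, "Fi": "F2"}, "Mult1": {"Busy": False}, "Mult2": {"Busy": False}}
reg_writer = {"F2": "Integer"}

mult_unit = try_issue(fu, reg_writer, ["Mult1", "Mult2"], "Mult", "F0", ("F2", "F4"))
waw_stall = try_issue(fu, reg_writer, ["Mult1", "Mult2"], "Mult", "F0", ("F4", "F4"))
struct_stall = try_issue(fu, reg_writer, ["Integer"], "Load", "F6", (None, "R2"))
print(mult_unit, waw_stall, struct_stall, can_read_operands(fu["Mult1"]))
```

The multiply issues to Mult1 but cannot read its operands yet because Qj records that the Integer unit still has to write F2. The operand dependences are captured at issue time, so writers that issue later do not affect them.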
Stages of a Scoreboard Pipeline
3. Execution —operate on operands (EX)
The functional unit begins execution upon receiving
operands. When the result is ready, it notifies the
scoreboard that it has completed execution.
4. Write result —finish execution (WB)
Once the scoreboard knows that the functional unit has
completed execution, it checks for WAR hazards. If there
are none, it writes the result. If there is a WAR hazard,
it stalls the instruction.
Example:
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F8,F8,F14
The scoreboard would stall the SUBD write to F8 until ADDD has read its operands
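The WAR check at write-back can be sketched as follows. This is a hedged illustration rather than the lecture's implementation: `can_write_result` is a hypothetical name, the functional unit table is a plain dictionary, and two add units (`Add1`, `Add2`) are assumed so that ADDD and SUBD can both be active, as the three-instruction example requires.

```python
from typing import Dict


def can_write_result(unit: str, fu: Dict[str, dict]) -> bool:
    """True if `unit` may write its result, i.e. no other active unit still
    has to read the old value of the destination register (no WAR hazard)."""
    dest = fu[unit]["Fi"]
    for name, other in fu.items():
        if name == unit or not other["Busy"]:
            continue
        # Rj/Rk set means: operand is ready but has not been read yet.
        if other["Fj"] == dest and other["Rj"]:
            return False
        if other["Fk"] == dest and other["Rk"]:
            return False
    return True


# DIVD F0,F2,F4 is executing; ADDD F10,F0,F8 waits for F0; SUBD F8,F8,F14 has finished.
fu = {
    "Divide": {"Busy": True, "Fi": "F0", "Fj": "F2", "Fk": "F4", "Rj": False, "Rk": False},
    "Add1": {"Busy": True, "Fi": "F10", "Fj": "F0", "Fk": "F8", "Rj": False, "Rk": True},
    "Add2": {"Busy": True, "Fi": "F8", "Fj": "F8", "Fk": "F14", "Rj": False, "Rk": False},
}
before = can_write_result("Add2", fu)   # ADDD has not read F8 yet

fu["Add1"]["Rk"] = False                # ADDD has now read its operands
after = can_write_result("Add2", fu)
print(before, after)
```

The check relies on the Rj/Rk flags being cleared once operands are read, as stated in the description of the functional unit status fields.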
Information within the Scoreboard
1. Instruction status—which of 4 stages the instruction is in
2. Functional unit status—Indicates the state of the functional
unit (FU). 9 fields for each functional unit
Busy—Indicates whether the unit is being used or not
Op—Operation to perform in the unit (e.g., + or –)
Fi—Destination register
Fj, Fk—Source-register numbers
Qj, Qk—Functional units producing source registers
Fj, Fk
Rj, Rk—Flags indicating when Fj, Fk are ready. Set
to No after operands are read.
3. Register result status—Indicates which functional unit will
write each register, if one exists. Blank when no pending
instructions will write that register
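The three tables can be written down as a small data structure. The sketch below is only one possible encoding and is not taken from the lecture; the class and field names (`FunctionalUnitStatus`, `Scoreboard`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, Optional


@dataclass
class FunctionalUnitStatus:
    """The nine fields kept for each functional unit."""
    busy: bool = False            # Busy
    op: Optional[str] = None      # Op
    fi: Optional[str] = None      # Fi: destination register
    fj: Optional[str] = None      # Fj: first source register
    fk: Optional[str] = None      # Fk: second source register
    qj: Optional[str] = None      # Qj: unit producing Fj, if any
    qk: Optional[str] = None      # Qk: unit producing Fk, if any
    rj: bool = False              # Rj: Fj ready and not yet read
    rk: bool = False              # Rk: Fk ready and not yet read


@dataclass
class Scoreboard:
    instruction_status: Dict[str, str] = field(default_factory=dict)   # instruction -> stage
    functional_units: Dict[str, FunctionalUnitStatus] = field(default_factory=dict)
    register_result: Dict[str, str] = field(default_factory=dict)      # register -> writing unit


sb = Scoreboard(functional_units={
    name: FunctionalUnitStatus() for name in ("Integer", "Mult1", "Mult2", "Add", "Divide")
})

# State after the first load of the lecture example has been issued.
sb.instruction_status["LD F6"] = "Issue"
sb.functional_units["Integer"] = FunctionalUnitStatus(
    busy=True, op="Load", fi="F6", fk="R2", rk=True)
sb.register_result["F6"] = "Integer"

print(len(fields(FunctionalUnitStatus)))
```

A register that is absent from `register_result` corresponds to a blank entry in the register result status table.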
Information within the Scoreboard
Instruction stream and instruction status
(The scoreboard only records the status. We will show the times for each stage, for convenience.)
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   -      -              -                   -
LD    F2      45+  R3   -      -              -                   -
MULTD F0      F2   F4   -      -              -                   -
SUBD  F8      F6   F2   -      -              -                   -
DIVD  F10     F0   F6   -      -              -                   -
ADDD  F6      F8   F2   -      -              -                   -
Functional unit status
(Fi = dest, Fj = S1, Fk = S2, Qj = FU for j, Qk = FU for k, Rj = Fj ready?, Rk = Fk ready?)
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No
Register result status
Clock  F0  F2  F4  F6  F8  F10  F12  ...  F30
-      -   -   -   -   -   -    -         -
Information within the Scoreboard
Annotations on the functional unit status table:
• Functional units: 1 Integer, 2 Multiplication, 1 Addition, 1 Division
• Time: FU count down
• Fi: destination registers
• Fj, Fk: source operands
• Qj, Qk: which FU will produce each operand
• Rj, Rk: ready?
Information within the Scoreboard
Annotations on the register result status table:
• Clock: clock cycle counter
• FU row: which FU will write in each register?
A Scoreboard Example
The following code is run on a scoreboard pipeline with:
Functional Unit (FU)      # of FUs  EX cycles
Integer Mem               1         1
Floating Point Multiply   2         10
Floating Point Add        1         2
Floating Point Divide     1         40
Functional units are not pipelined
L.D   F6, 34(R2)
L.D   F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2
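The whole example can be reproduced with a short cycle-driven simulation. The sketch below is a simplified model written for these notes, not the CDC 6600 logic: it tracks only the four timestamps per instruction, assumes that every decision in a cycle is based on the state left by the previous cycle, and uses hypothetical names (`Instr`, `simulate`, `LATENCY`, `UNITS`). The latencies and unit counts are those of the table above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# EX cycles per operation class and the (non-pipelined) units that serve it.
LATENCY = {"Load": 1, "Mult": 10, "Add": 2, "Div": 40}
UNITS = {"Load": ["Integer"], "Mult": ["Mult1", "Mult2"], "Add": ["Add"], "Div": ["Divide"]}


@dataclass
class Instr:
    kind: str
    dest: str
    srcs: Tuple[str, ...]
    unit: Optional[str] = None
    producers: Tuple["Instr", ...] = ()   # active instructions whose results we wait for
    issue: Optional[int] = None
    read: Optional[int] = None
    done: Optional[int] = None
    write: Optional[int] = None


def simulate(program: List[Instr]) -> List[tuple]:
    clock, next_issue = 0, 0
    while any(i.write is None for i in program):
        clock += 1
        # Snapshot of the state left by the previous cycle.
        active = [i for i in program if i.issue is not None and i.write is None]
        busy = {i.unit for i in active}

        # Write result: stall while an earlier instruction still has to read dest (WAR).
        for i in active:
            if i.done is not None and i.done < clock:
                war = any(o.issue < i.issue and o.read is None and i.dest in o.srcs
                          for o in active)
                if not war:
                    i.write = clock

        # Read operands: all producers must have written in an earlier cycle (RAW).
        for i in active:
            if i.read is None and all(p.write is not None and p.write < clock
                                      for p in i.producers):
                i.read = clock
                i.done = clock + LATENCY[i.kind]

        # Issue in program order: needs a free unit (structural) and no active
        # instruction with the same destination (WAW).
        if next_issue < len(program):
            cand = program[next_issue]
            free = [u for u in UNITS[cand.kind] if u not in busy]
            if free and all(a.dest != cand.dest for a in active):
                cand.unit, cand.issue = free[0], clock
                cand.producers = tuple(a for a in active if a.dest in cand.srcs)
                next_issue += 1
    return [(i.issue, i.read, i.done, i.write) for i in program]


program = [
    Instr("Load", "F6", ("R2",)),
    Instr("Load", "F2", ("R3",)),
    Instr("Mult", "F0", ("F2", "F4")),
    Instr("Add", "F8", ("F6", "F2")),    # SUB.D runs on the FP add unit
    Instr("Div", "F10", ("F0", "F6")),
    Instr("Add", "F6", ("F8", "F2")),
]
rows = simulate(program)
for row in rows:
    print(row)
```

Each printed tuple is (issue, read operands, execution complete, write result) for one instruction, in program order, and can be compared with the instruction status table built up cycle by cycle in the following slides.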
Dependency Graph For Example Code
Example Code
1  L.D   F6, 34(R2)
2  L.D   F2, 45(R3)
3  MUL.D F0, F2, F4
4  SUB.D F8, F6, F2
5  DIV.D F10, F0, F6
6  ADD.D F6, F8, F2
Data Dependence (real data dependence, RAW):
(1, 4) (1, 5) (2, 3) (2, 4)
(2, 6) (3, 5) (4, 6)
Output Dependence (WAW):
(1, 6)
Anti-dependence (WAR):
(5, 6)
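The dependence pairs can be recomputed mechanically from the register names. The sketch below is an illustration with hypothetical names (`PROGRAM`, `dependences`); it compares every pair of instructions and, as a simplifying assumption, does not check whether a register is redefined in between. With that rule it also reports (4, 6) as an anti-dependence, because SUB.D reads F6 and ADD.D later writes it, in addition to the (5, 6) pair listed on the slide.

```python
from typing import Dict, List, Tuple

# (mnemonic, destination, sources) for instructions 1..6 of the example.
PROGRAM = [
    ("L.D", "F6", ("R2",)),
    ("L.D", "F2", ("R3",)),
    ("MUL.D", "F0", ("F2", "F4")),
    ("SUB.D", "F8", ("F6", "F2")),
    ("DIV.D", "F10", ("F0", "F6")),
    ("ADD.D", "F6", ("F8", "F2")),
]


def dependences(program) -> Dict[str, List[Tuple[int, int]]]:
    """Return the RAW, WAW and WAR pairs (earlier, later), numbered from 1."""
    deps = {"RAW": [], "WAW": [], "WAR": []}
    for a in range(len(program)):
        _, dest_a, srcs_a = program[a]
        for b in range(a + 1, len(program)):
            _, dest_b, srcs_b = program[b]
            if dest_a in srcs_b:
                deps["RAW"].append((a + 1, b + 1))
            if dest_a == dest_b:
                deps["WAW"].append((a + 1, b + 1))
            if dest_b in srcs_a:
                deps["WAR"].append((a + 1, b + 1))
    return deps


deps = dependences(PROGRAM)
for kind, pairs in deps.items():
    print(kind, pairs)
```

In the scoreboard run that follows, SUB.D reads its operands long before ADD.D finishes, so only the (5, 6) anti-dependence ever causes a stall.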
Scoreboard Example Cycle 1
• Issue LD #1
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      -              -                   -
LD    F2      45+  R3   -      -              -                   -
MULTD F0      F2   F4   -      -              -                   -
SUBD  F8      F6   F2   -      -              -                   -
DIVD  F10     F0   F6   -      -              -                   -
ADDD  F6      F8   F2   -      -              -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -   R2  -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No
Register result status
Clock  F0  F2  F4  F6       F8  F10  F12  ...  F30
1      -   -   -   Integer  -   -    -         -
Scoreboard Example Cycle 2
• LD #1 reads operands
• LD #2 can't issue since the integer unit is busy
• MULT can't issue because we require in-order issue
• The pipeline stalls
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              -                   -
LD    F2      45+  R3   -      -              -                   -
MULTD F0      F2   F4   -      -              -                   -
SUBD  F8      F6   F2   -      -              -                   -
DIVD  F10     F0   F6   -      -              -                   -
ADDD  F6      F8   F2   -      -              -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -   R2  -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No
Register result status
Clock  F0  F2  F4  F6       F8  F10  F12  ...  F30
2      -   -   -   Integer  -   -    -         -
Scoreboard Example Cycle 3
• LD #1 completes
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   -
LD    F2      45+  R3   -      -              -                   -
MULTD F0      F2   F4   -      -              -                   -
SUBD  F8      F6   F2   -      -              -                   -
DIVD  F10     F0   F6   -      -              -                   -
ADDD  F6      F8   F2   -      -              -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -   R2  -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No
Register result status
Clock  F0  F2  F4  F6       F8  F10  F12  ...  F30
3      -   -   -   Integer  -   -    -         -
Scoreboard Example Cycle 4
• LD #1 writes back and frees the Integer FU and register F6
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   -      -              -                   -
MULTD F0      F2   F4   -      -              -                   -
SUBD  F8      F6   F2   -      -              -                   -
DIVD  F10     F0   F6   -      -              -                   -
ADDD  F6      F8   F2   -      -              -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No
Register result status
Clock  F0  F2  F4  F6  F8  F10  F12  ...  F30
4      -   -   -   -   -   -    -         -
Scoreboard Example Cycle 5
• Issue LD #2 since the integer unit is now free
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      -              -                   -
MULTD F0      F2   F4   -      -              -                   -
SUBD  F8      F6   F2   -      -              -                   -
DIVD  F10     F0   F6   -      -              -                   -
ADDD  F6      F8   F2   -      -              -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -   R3  -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No
Register result status
Clock  F0  F2       F4  F6  F8  F10  F12  ...  F30
5      -   Integer  -   -   -   -    -         -
Scoreboard Example Cycle 6
• Issue MULT
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              -                   -
MULTD F0      F2   F4   6      -              -                   -
SUBD  F8      F6   F2   -      -              -                   -
DIVD  F10     F0   F6   -      -              -                   -
ADDD  F6      F8   F2   -      -              -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -   R3  -        -        -    Yes
-     Mult1    Yes   Mult  F0   F2  F4  Integer  -        No   Yes
-     Mult2    No
-     Add      No
-     Divide   No
Register result status
Clock  F0     F2       F4  F6  F8  F10  F12  ...  F30
6      Mult1  Integer  -   -   -   -    -         -
Scoreboard Example Cycle 7
• MULT can't read its operands (F2) because LD #2 hasn't finished
• SUBD is issued
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   -
MULTD F0      F2   F4   6      -              -                   -
SUBD  F8      F6   F2   7      -              -                   -
DIVD  F10     F0   F6   -      -              -                   -
ADDD  F6      F8   F2   -      -              -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -   R3  -        -        -    Yes
-     Mult1    Yes   Mult  F0   F2  F4  Integer  -        No   Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6  F2  -        Integer  Yes  No
-     Divide   No
Register result status
Clock  F0     F2       F4  F6  F8   F10  F12  ...  F30
7      Mult1  Integer  -   -   Add  -    -         -
Scoreboard Example Cycle 8a
• MULT and SUBD are both waiting for F2
• DIVD issues
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   -
MULTD F0      F2   F4   6      -              -                   -
SUBD  F8      F6   F2   7      -              -                   -
DIVD  F10     F0   F6   8      -              -                   -
ADDD  F6      F8   F2   -      -              -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -   R3  -        -        -    Yes
-     Mult1    Yes   Mult  F0   F2  F4  Integer  -        No   Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6  F2  -        Integer  Yes  No
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status
Clock  F0     F2       F4  F6  F8   F10     F12  ...  F30
8      Mult1  Integer  -   -   Add  Divide  -         -
Scoreboard Example Cycle 8b
• LD #2 writes F2
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      -              -                   -
SUBD  F8      F6   F2   7      -              -                   -
DIVD  F10     F0   F6   8      -              -                   -
ADDD  F6      F8   F2   -      -              -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status
Clock  F0     F2  F4  F6  F8   F10     F12  ...  F30
8      Mult1  -   -   -   Add  Divide  -         -
Scoreboard Example Cycle 9
• Now MULT and SUBD can both read F2
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              -                   -
SUBD  F8      F6   F2   7      9              -                   -
DIVD  F10     F0   F6   8      -              -                   -
ADDD  F6      F8   F2   -      -              -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
10    Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
2     Add      Yes   Sub   F8   F6  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status
Clock  F0     F2  F4  F6  F8   F10     F12  ...  F30
9      Mult1  -   -   -   Add  Divide  -         -
Scoreboard Example Cycle 10
• MULT and SUBD continue operation
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              -                   -
SUBD  F8      F6   F2   7      9              -                   -
DIVD  F10     F0   F6   8      -              -                   -
ADDD  F6      F8   F2   -      -              -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
9     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
1     Add      Yes   Sub   F8   F6  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status
Clock  F0     F2  F4  F6  F8   F10     F12  ...  F30
10     Mult1  -   -   -   Add  Divide  -         -
Scoreboard Example Cycle 11
• ADDD cannot be issued because the add unit is busy
• SUBD completes execution
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              -                   -
SUBD  F8      F6   F2   7      9              11                  -
DIVD  F10     F0   F6   8      -              -                   -
ADDD  F6      F8   F2   -      -              -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
8     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
0     Add      Yes   Sub   F8   F6  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status
Clock  F0     F2  F4  F6  F8   F10     F12  ...  F30
11     Mult1  -   -   -   Add  Divide  -         -
Scoreboard Example Cycle 12
• SUBD finishes
• DIVD waits for F0
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              -                   -
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      -              -                   -
ADDD  F6      F8   F2   -      -              -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
7     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
-     Add      No
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status
Clock  F0     F2  F4  F6  F8  F10     F12  ...  F30
12     Mult1  -   -   -   -   Divide  -         -
Scoreboard Example Cycle 13
• ADDD issues
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              -                   -
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      -              -                   -
ADDD  F6      F8   F2   13     -              -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
6     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status
Clock  F0     F2  F4  F6   F8  F10     F12  ...  F30
13     Mult1  -   -   Add  -   Divide  -         -
Scoreboard Example Cycle 14
• MULT and ADDD continue their operation
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              -                   -
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      -              -                   -
ADDD  F6      F8   F2   13     14             -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
5     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
2     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status
Clock  F0     F2  F4  F6   F8  F10     F12  ...  F30
14     Mult1  -   -   Add  -   Divide  -         -
Scoreboard Example Cycle 15
• Nearly there…
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              -                   -
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      -              -                   -
ADDD  F6      F8   F2   13     14             -                   -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
4     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
1     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status
Clock  F0     F2  F4  F6   F8  F10     F12  ...  F30
15     Mult1  -   -   Add  -   Divide  -         -
Scoreboard Example Cycle 16
• ADDD completes execution
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              -                   -
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      -              -                   -
ADDD  F6      F8   F2   13     14             16                  -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
3     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
0     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status
Clock  F0     F2  F4  F6   F8  F10     F12  ...  F30
16     Mult1  -   -   Add  -   Divide  -         -
Scoreboard Example Cycle 17
• ADDD can't write F6 because of a WAR hazard with DIVD, which has not yet read F6
• ADDD stalls write back
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              -                   -
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      -              -                   -
ADDD  F6      F8   F2   13     14             16                  -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
2     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status
Clock  F0     F2  F4  F6   F8  F10     F12  ...  F30
17     Mult1  -   -   Add  -   Divide  -         -
Scoreboard Example Cycle 18
• MULT still continues its execution
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              -                   -
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      -              -                   -
ADDD  F6      F8   F2   13     14             16                  -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
1     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status
Clock  F0     F2  F4  F6   F8  F10     F12  ...  F30
18     Mult1  -   -   Add  -   Divide  -         -
Scoreboard Example Cycle 19
• MULT completes execution
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              19                  -
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      -              -                   -
ADDD  F6      F8   F2   13     14             16                  -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
0     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status
Clock  F0     F2  F4  F6   F8  F10     F12  ...  F30
19     Mult1  -   -   Add  -   Divide  -         -
Scoreboard Example Cycle 20
• MULT writes and frees its FU and register F0
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              19                  20
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      -              -                   -
ADDD  F6      F8   F2   13     14             16                  -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  -        -        Yes  Yes
Register result status
Clock  F0  F2  F4  F6   F8  F10     F12  ...  F30
20     -   -   -   Add  -   Divide  -         -
Scoreboard Example Cycle 21
• DIVD can read operands
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              19                  20
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      21             -                   -
ADDD  F6      F8   F2   13     14             16                  -
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  -        -        Yes  Yes
Register result status
Clock  F0  F2  F4  F6   F8  F10     F12  ...  F30
21     -   -   -   Add  -   Divide  -         -
Scoreboard Example Cycle 22
• Now ADDD can write since the WAR hazard is removed
• The Add FU and register F6 are freed
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              19                  20
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      21             -                   -
ADDD  F6      F8   F2   13     14             16                  22
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
40    Divide   Yes   Div   F10  F0  F6  -        -        Yes  Yes
Register result status
Clock  F0  F2  F4  F6  F8  F10     F12  ...  F30
22     -   -   -   -   -   Divide  -         -
39 cycles later…
Scoreboard Example Cycle 61
• DIVD completes execution
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              19                  20
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      21             61                  -
ADDD  F6      F8   F2   13     14             16                  22
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
0     Divide   Yes   Div   F10  F0  F6  -        -        Yes  Yes
Register result status
Clock  F0  F2  F4  F6  F8  F10     F12  ...  F30
61     -   -   -   -   -   Divide  -         -
Scoreboard Example Cycle 62
• DIVD writes back and frees resources
• Execution complete
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              19                  20
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      21             61                  62
ADDD  F6      F8   F2   13     14             16                  22
Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No
Register result status
Clock  F0  F2  F4  F6  F8  F10  F12  ...  F30
62     -   -   -   -   -   -    -         -
Scoreboard Example Cycle 62
The final instruction status table shows:
• In-order issue
• Out-of-order execution
• Out-of-order completion
Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              19                  20
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      21             61                  62
ADDD  F6      F8   F2   13     14             16                  22
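The three observations can be checked directly from the final timings. The sketch below only sorts the numbers from the instruction status table at cycle 62; the dictionary keys and variable names are hypothetical labels chosen for illustration.

```python
# (issue, read operands, execution complete, write result) at the end of the run.
timeline = {
    "LD F6": (1, 2, 3, 4),
    "LD F2": (5, 6, 7, 8),
    "MULTD": (6, 9, 19, 20),
    "SUBD": (7, 9, 11, 12),
    "DIVD": (8, 21, 61, 62),
    "ADDD": (13, 14, 16, 22),
}

program_order = list(timeline)
issue_order = sorted(timeline, key=lambda name: timeline[name][0])
read_order = sorted(timeline, key=lambda name: timeline[name][1])
completion_order = sorted(timeline, key=lambda name: timeline[name][3])

total_cycles = max(times[3] for times in timeline.values())
ipc = len(timeline) / total_cycles   # instructions per cycle for this short run

print(issue_order == program_order)  # in-order issue
print(read_order)                    # ADDD starts before DIVD
print(completion_order)              # SUBD finishes before MULTD, ADDD before DIVD
print(total_cycles, round(ipc, 3))
```

The low instructions-per-cycle figure for this run is dominated by the 40-cycle divide at the end of the dependence chain.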
Summary
• Techniques to deal with data hazards in instruction
pipelines:
– Result forwarding to reduce or eliminate RAW hazards
– Hazard detection hardware to stall the pipeline during
hazards
– Compiler-based static scheduling to separate dependent
instructions, minimizing actual hazard-prevention stalls in
scheduled code (will discuss in detail next week)
– Dynamic scheduling: a hardware-based mechanism that
rearranges instruction execution order at runtime to
reduce stalls
» Better dynamic exploitation of instruction-level
parallelism (ILP)
Limitations of Scoreboard
• The amount of parallelism available among the
instructions (chosen from the same basic block)
• The number of scoreboard entries (the size of the
scoreboard determines the size of the window)
• The number and types of functional units (structural
hazards increase when out-of-order execution is
used)
• The presence of anti-dependences and output
dependences, which lead to WAR and WAW stalls