Read operands for DIVD?

Lecture 7
Dynamic Scheduling
Dr. Norafida Ithnin
Advanced Computer System &
Architecture
Dynamic vs. Static Scheduling
• Data hazards in a program to cause a processor to stall.
• With static scheduling the compiler tries to reorder these
instructions during compile time to reduce pipeline stalls.
– Uses less hardware
– Can use more powerful algorithms
• With dynamic scheduling the hardware tries to rearrange the
instructions during run-time to reduce pipeline stalls.
– Simpler compiler
– Handles dependencies not known at compile time
– Allows code compiled for a different machine to run
efficiently.
Out-Of-Order Execution
• In our previous model of DLX, all instructions executed in the
order that they appear
• This can lead to unnecessary stalls
DIVD
FO, F2, F4
ADDD
F10, F0, F8
SUBD
F12, F8, F14
• SUBD stalls waiting for the ADDD to go first, even though
SUBD does not have a data dependency.
• With out-of-order execution, the SUBD is allowed to executed
before the add
– This can lead to out-of order completion, which can cause
WAW and WAR hazards
Scoreboarding
• The scoreboard implements a centralized control scheme that
– Detects all resource and data hazards
– Allows instructions to execute out-of-order when no resource hazards
or data dependencies
• First implemented in 1964 by the CDC 6600, which had 18 separate
functional units
– 4 FP units (2 multiply, 1 add, 1 divide)
– 7 memory units (5 loads, 2 stores)
– 7 integer units (add, shift, logical, compare, etc.)
• Dynamic DLX (much simpler)
– 2 FP multiply (10 EX cycles)
– 1 FP add (2 EX cycles)
– 1 FP divide (40 EX cycles)
– 1 integer unit (1 EX cycle)
Out-of-Order Execution
• Out-of-order execution divides ID stage:
1. Issue—decode instructions, check for structural hazards
2. Read operands—wait until no data hazards, then read operands
• Scoreboards allow instruction to execute whenever
1 & 2 hold, not waiting for prior instructions
• CDC 6600: In order issue, out of order execution, out
of order commit (also called completion)
Scoreboard Implications
• Out-of-order completion can lead to WAR and WAW hazards?
• Solution for WAW
– Detect WAW hazard before reading operands
– Stall write until other instruction completes
• Solutions for WAR
– Detect WAR hazards before writing back to the register files
and stall the write back
• This scoreboard does not take advantage of forwarding, since it
waits until both results are written back to the register file
• Scoreboard replaces ID, EX, WB with 4 stages
Four Stages of Scoreboard Control
• Issue (ID1)
– decode instructions
– check for structural and WAW hazards
– stall until structural and WAW hazards are resolved
• Read operands (ID2)
– wait until no RAW hazards
– then read operands
• Execution (EX)
– operate on operands
– may be multiple cycles - notify scoreboard when done
• Write result (WB)
– finish execution
– stall if WAR hazard
Three Parts of the Scoreboard
1. Instruction status—which of 4 steps the instruction is in: ID1, ID1, EX, or WB.
2. Functional unit status—Indicates the state of the functional unit (FU). 9 fields
for each functional unit
Busy—Indicates whether the unit is busy or not
Op—Operation to perform in the unit (e.g., + or –)
Fi—Destination register
Fj, Fk—Source-register numbers
Qj, Qk—Functional units producing source registers Fj, Fk
Rj, Rk—Flags indicating when Fj, Fk are ready
3. Register result status—Indicates which functional unit will write each register, if
one exists. Blank when no pending instructions will write that register
Scoreboard Example for DLX
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
Divide
Register result status
Clock
FU
Issue
Read Execution
Write
operandscompleteResult
Busy
No
No
No
No
No
Op
dest
Fi
F0
F2
F4
S1
Fj
S2
Fk
FU for j FU for k Fj?
Qj
Qk
Rj
Fk?
Rk
F6
F8
F10
F30
F12
...
Scoreboard Example Cycle 1
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
Divide
Register result status
Clock
1
FU
Issue
1
Read
ExecutionW rite
operandscompleteResult
Busy
Yes
No
No
No
No
Op
Load
dest
Fi
F6
F0
F2
F4
S1
Fj
S2
Fk
R2
F6 F8 F10
Integer
FU for j FU for k Fj?
Qj
Qk
Rj
F12
...
Fk?
Rk
Yes
F30
Scoreboard Example Cycle 2
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
Clock
F0
2
Busy Op
Yes Load
No
No
No
No
FU
• Issue 2nd LD?
F2
dest
Fi
F6
S1
Fj
S2
Fk
R2
F4
F6 F8 F10
Integer
FU for j FU for k Fj?
Qj
Qk
Rj
F12
...
Fk?
Rk
Yes
F30
Scoreboard Example Cycle 3
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
Clock
F0
3
•
Busy Op
Yes Load
No
No
No
No
FU
Issue MULT?
F2
dest
Fi
F6
S1
Fj
S2
Fk
R2
F4
F6 F8 F10
Integer
FU for j FU for k Fj?
Qj
Qk
Rj
F12
...
Fk?
Rk
Yes
F30
Scoreboard Example Cycle 4
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
Clock
F0
4
FU
Busy Op
Yes Load
No
No
No
No
F2
dest
Fi
F6
S1
Fj
S2
Fk
R2
F4
F6 F8 F10
Integer
FU for j FU for k Fj?
Qj
Qk
Rj
F12
...
Fk?
Rk
Yes
F30
Scoreboard Example Cycle 5
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
Clock
F0
5
FU
Busy Op
Yes Load
No
No
No
No
F2
Integer
dest
Fi
F2
S1
Fj
S2
Fk
R3
FU for j FU for k Fj?
Qj
Qk
Rj
F4
F6 F8 F10
F12
...
Fk?
Rk
Yes
F30
Scoreboard Example Cycle 6
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
6
Clock
F0
6
FU
Busy Op
Yes Load
Yes Mult
No
No
No
F2
Mult1 Integer
dest
Fi
F2
F0
S1
Fj
F4
F6 F8 F10
F2
S2
Fk
R3
F4
FU for j FU for k Fj?
Qj
Qk
Rj
Integer
F12
No
Fk?
Rk
Yes
Yes
...
F30
Scoreboard Example Cycle 7
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
6
7
Busy
Yes
Yes
No
Yes
No
Clock
F0
7
•
FU
Op
Load
Mult
dest
Fi
F2
F0
S1
Fj
F2
S2
Fk
R3
F4
Sub
F8
F6
F2
F2
F4
F6 F8 F10
Mult1 Integer
Read multiply operands?
Add
FU for j FU for k Fj?
Qj
Qk
Rj
Integer
No
Fk?
Rk
Yes
Yes
Integer Yes
No
F12
F30
...
Scoreboard Example Cycle 8a
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
6
7
8
Busy
Yes
Yes
No
Yes
Yes
Clock
F0
8
FU
Op
Load
Mult
dest
Fi
F2
F0
S1
Fj
F2
S2
Fk
R3
F4
Sub
Div
F8
F10
F6
F0
F2
F6
F2
F4
F6 F8 F10
Mult1 Integer
FU for j FU for k Fj?
Qj
Qk
Rj
Integer
Mult1
Add Divide
No
Fk?
Rk
Yes
Yes
Integer Yes
No
No
Yes
F12
F30
...
Scoreboard Example Cycle 8b
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
7
8
Busy
No
Yes
No
Yes
Yes
Clock
F0
8
FU
Mult1
Op
dest
Fi
S1
Fj
S2
Fk
FU for j FU for k Fj?
Qj
Qk
Rj
Mult
F0
F2
F4
Yes
Yes
Sub
Div
F8
F10
F6
F0
F2
F6
Yes
No
Yes
Yes
F2
F4
F6 F8 F10
...
F30
Mult1
Add Divide
F12
Fk?
Rk
Scoreboard Example Cycle 9
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
10 Mult1
Mult2
2 Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
7
9
8
Busy
No
Yes
No
Yes
Yes
Clock
F0
9
FU
Mult1
Op
dest
Fi
S1
Fj
S2
Fk
FU for j FU for k Fj?
Qj
Qk
Rj
Mult
F0
F2
F4
Yes
Yes
Sub
Div
F8
F10
F6
F0
F2
F6
Yes
No
Yes
Yes
F2
F4
F6 F8 F10
...
F30
Mult1
Add Divide
F12
Fk?
Rk
Scoreboard Example Cycle 11
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
8 Mult1
Mult2
0 Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
7
9
11
8
Busy
No
Yes
No
Yes
Yes
Clock
F0
11
FU
Mult1
Op
dest
Fi
S1
Fj
S2
Fk
FU for j FU for k Fj?
Qj
Qk
Rj
Mult
F0
F2
F4
Yes
Yes
Sub
Div
F8
F10
F6
F0
F2
F6
Yes
No
Yes
Yes
F2
F4
F6 F8 F10
...
F30
Mult1
Add Divide
F12
Fk?
Rk
Scoreboard Example Cycle 12
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
7 Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
7
9
11
12
8
Clock
F0
12
•
FU
Busy Op
No
Yes Mult
No
No
Yes Div
F2
dest
Fi
S1
Fj
S2
Fk
F0
F2
F4
F10
F0
F6
F4
F6 F8 F10
Mult1
Read operands for DIVD?
FU for j FU for k Fj?
Qj
Qk
Rj
Mult1
Divide
F12
Fk?
Rk
Yes
Yes
No
Yes
...
F30
Scoreboard Example Cycle 13
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
6 Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
7
9
11
12
8
13
dest
S1 S2
Busy Op
Fi
Fj
Fk
No
Yes Mult
F0
F2
F4
No
Yes Add
F6
F8
F2
Yes Div
F10
F0
F6
Clock
F0
13
FU
Mult1
F2
F4
FU for j FU for k Fj?
Qj
Qk
Rj
Mult1
F6 F8 F10
Add
Divide
F12
Fk?
Rk
Yes
Yes
Yes
No
Yes
Yes
...
F30
Scoreboard Example Cycle 14
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
5 Mult1
Mult2
2 Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
7
9
11
12
8
13
14
dest
S1 S2
Busy Op
Fi
Fj
Fk
No
Yes Mult
F0
F2
F4
No
Yes Add
F6
F8
F2
Yes Div
F10
F0
F6
Clock
F0
14
FU
Mult1
F2
F4
FU for j FU for k Fj?
Qj
Qk
Rj
Mult1
F6 F8 F10
Add
Divide
F12
Fk?
Rk
Yes
Yes
Yes
No
Yes
Yes
...
F30
Scoreboard Example Cycle 15
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
4 Mult1
Mult2
1 Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
7
9
11
12
8
13
14
dest
S1 S2
Busy Op
Fi
Fj
Fk
No
Yes Mult
F0
F2
F4
No
Yes Add
F6
F8
F2
Yes Div
F10
F0
F6
Clock
F0
15
FU
Mult1
F2
F4
FU for j FU for k Fj?
Qj
Qk
Rj
Mult1
F6 F8 F10
Add
Divide
F12
Fk?
Rk
Yes
Yes
Yes
No
Yes
Yes
...
F30
Scoreboard Example Cycle 16
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
3 Mult1
Mult2
0 Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
7
9
11
12
8
13
14
16
dest
S1 S2
Busy Op
Fi
Fj
Fk
No
Yes Mult
F0
F2
F4
No
Yes Add
F6
F8
F2
Yes Div
F10
F0
F6
Clock
F0
16
FU
Mult1
F2
F4
FU for j FU for k Fj?
Qj
Qk
Rj
Mult1
F6 F8 F10
Add
Divide
F12
Fk?
Rk
Yes
Yes
Yes
No
Yes
Yes
...
F30
Scoreboard Example Cycle 17
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
2 Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
7
9
11
12
8
13
14
16
dest
S1 S2
Busy Op
Fi
Fj
Fk
No
Yes Mult
F0
F2
F4
No
Yes Add
F6
F8
F2
Yes Div
F10
F0
F6
Clock
F0
17
•
FU
F2
F4
Mult1
Write result of ADDD?
FU for j FU for k Fj?
Qj
Qk
Rj
Mult1
F6 F8 F10
Add
Divide
F12
Fk?
Rk
Yes
Yes
Yes
No
Yes
Yes
...
F30
Scoreboard Example Cycle 18
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
1 Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
7
9
11
12
8
13
14
16
dest
S1 S2
Busy Op
Fi
Fj
Fk
No
Yes Mult
F0
F2
F4
No
Yes Add
F6
F8
F2
Yes Div
F10
F0
F6
Clock
F0
18
FU
Mult1
F2
F4
FU for j FU for k Fj?
Qj
Qk
Rj
Mult1
F6 F8 F10
Add
Divide
F12
Fk?
Rk
Yes
Yes
Yes
No
Yes
Yes
...
F30
Scoreboard Example Cycle 19
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
0 Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
19
7
9
11
12
8
13
14
16
dest
S1 S2
Busy Op
Fi
Fj
Fk
No
Yes Mult
F0
F2
F4
No
Yes Add
F6
F8
F2
Yes Div
F10
F0
F6
Clock
F0
19
FU
Mult1
F2
F4
FU for j FU for k Fj?
Qj
Qk
Rj
Mult1
F6 F8 F10
Add
Divide
F12
Fk?
Rk
Yes
Yes
Yes
No
Yes
Yes
...
F30
Scoreboard Example Cycle 20
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
19
20
7
9
11
12
8
13
14
16
dest
S1 S2
Busy Op
Fi
Fj
Fk
No
No
No
Yes Add
F6
F8
F2
Yes Div
F10
F0
F6
Clock
F0
20
FU
F2
F4
FU for j FU for k Fj?
Qj
Qk
Rj
F6 F8 F10
Add
Divide
F12
Fk?
Rk
Yes
Yes
Yes
Yes
...
F30
Scoreboard Example Cycle 21
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
19
20
7
9
11
12
8
21
13
14
16
dest
S1 S2
Busy Op
Fi
Fj
Fk
No
No
No
Yes Add
F6
F8
F2
Yes Div
F10
F0
F6
Clock
F0
21
FU
F2
F4
FU for j FU for k Fj?
Qj
Qk
Rj
F6 F8 F10
Add
Divide
F12
Fk?
Rk
Yes
Yes
Yes
Yes
...
F30
Scoreboard Example Cycle 22
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
40 Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
19
20
7
9
11
12
8
21
13
14
16
22
dest
S1 S2
Busy Op
Fi
Fj
Fk
No
No
No
No
Yes Div
F10
F0
F6
Clock
F0
22
FU
F2
F4
FU for j FU for k Fj?
Qj
Qk
Rj
F6 F8 F10
Divide
F12
Fk?
Rk
Yes
Yes
...
F30
Scoreboard Example Cycle 61
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
0 Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
19
20
7
9
11
12
8
21
61
13
14
16
22
dest
S1 S2
Busy Op
Fi
Fj
Fk
No
No
No
No
Yes Div
F10
F0
F6
Clock
F0
61
FU
F2
F4
FU for j FU for k Fj?
Qj
Qk
Rj
F6 F8 F10
Divide
F12
Fk?
Rk
Yes
Yes
...
F30
Scoreboard Example Cycle 62
Instruction status
Instruction
j
k
LD
F6
34+ R2
LD
F2
45+ R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Functional unit status
Time Name
Integer
Mult1
Mult2
Add
0 Divide
Register result status
Read
Execution
W rite
Issue operandscompleteResult
1
2
3
4
5
6
7
8
6
9
19
20
7
9
11
12
8
21
61
62
13
14
16
22
dest
S1 S2
Busy Op
Fi
Fj
Fk
No
No
No
No
No
Clock
F0
62
FU
F2
F4
FU for j FU for k Fj?
Qj
Qk
Rj
F6 F8 F10
F12
...
Fk?
Rk
F30
CDC 6000 Scoreboard Summary
• Speedup from scoreboard
– 1.7 for FORTRAN programs
– 2.5 for hand-coded assembly language programs
– Effects of modern compilers?
• Hardware
– Scoreboard hardware approximately same as one FPU
– Main cost was buses (4x’s normal amount)
– Could be more severe for modern processors
• Limitations
– No forwarding logic
– Limited to instructions in basic block
– Stalls for WAW hazards
– Wait for WAR hazards before WB
Tomasulo Algorithm for Dynamic
Scheduling
• For IBM 360/91 in 1967 - about 3 years after CDC 6600
• Goal: High performance without special compilers
• Differences between IBM 360 & CDC 6600
– IBM has only 2 register specifiers/instr vs. 3 in CDC 6600
– IBM has register-memory instructions
– IBM has 4 FP registers vs. 8 in CDC 6600
– IBM has pipelined functional units (3 adds, 2 multiplies)
• Tomasulo algorithm is designed to handle name dependencies (WAW and WAR
hazards) efficiently
SUB
F1, F2, F0
DIVF
F2, F3 , F2
ADDF
F3, F0, F0
MULF
F3, F1, F1
Tomasulo Algorithm
• Differences from Scoreboarding
– Distributed hazard detection and control (through
reservation stations)
– Results are bypassed to function units
– Common data bus (CDB) broadcasts results to all FUs.
– HW renaming of registers to avoid WAR, WAW hazards
– Load and Stores treated as FUs as well
– Registers in instructions replaced by pointers to reservation
station buffers
• Lead to concepts used in Alpha 21264, HP 8000, MIPS 10000,
Pentium II, PowerPC 604, …
Tomasulo Organization
FP Op Queue
FP
Registers
Load
Buffer
Common
Data
Bus
FP Add
Res.
Station
Store
Buffer
FP Mul
Res.
Station
Reservation Station Components
Op—Operation to perform in the unit (e.g., + or –)
Qj, Qk—Reservation stations producing source
registers
Vj, Vk—Value of Source operands
Rj, Rk—Flags indicating when Vj, Vk are ready
Busy—Indicates reservation station and FU is busy
Register result status—Indicates which functional
unit will write each register, if one exists. Blank when
no pending instructions that will write that register.
Three Stages of Tomasulo
Algorithm
1. Issue—get instruction from FP Op Queue
If reservation station free, the scoreboard
issues instr & sends operands (renames
registers).
2. Execution—operate on operands (EX)
When both operands ready then execute;
if not ready, watch CDB for result
3. Write result—finish execution (WB)
Write on Common Data Bus to all awaiting
units; mark reservation station available.
Tomasulo Example Cycle 0
Instruction status
Instruction
j
k
Issue
LD
F6
34+
R2
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Execution
complete
S1
Vj
S2
Vk
RS for j
Qj
RS for k
Qk
Clock
F2
F4
F6
F8
0
F0
FU
Write
Result
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
F30
Tomasulo Example Cycle 1
Instruction status
Instruction
j
k
Issue
LD
F6
34+
R2
1
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Execution
complete
S1
Vj
S2
Vk
RS for j
Qj
RS for k
Qk
Clock
F2
F4
F6
F8
1
F0
FU
Write
Result
Load1
Load2
Load3
Load1
Busy
No
Yes
No
No
Address
34+R2
F10
F12 ...
F30
Tomasulo Example Cycle 2
Instruction status
Instruction
j
k
Issue
LD
F6
34+
R2
1
LD
F2
45+
R3
2
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Execution
complete
S1
Vj
S2
Vk
RS for j
Qj
RS for k
Qk
Clock
F2
F4
F6
F8
2
F0
FU
Write
Result
Load1
Load2
Load3
Load2
Load1
Busy
Yes
Yes
No
Address
34+R2
45+R3
F10
F12 ...
F30
Tomasulo Example Cycle 3
Instruction status
Instruction
j
k
Issue
LD
F6
34+
R2
1
LD
F2
45+
R3
2
MULTDF0
F2
F4
3
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Execution
complete
3
Write
Result
S1
Vj
S2
Vk
RS for j
Qj
R(F4)
Load2
Clock
F0
F2
F4
F6
Mult1
Load2
3
FU
Load1
Load2
Load3
Load1
Busy
Yes
Yes
No
Address
34+R2
45+R3
F10
F12 ...
RS for k
Qk
F8
F30
Tomasulo Example Cycle 4
Instruction status
Instruction
j
k
LD
F6
34+
R2
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy
0 Add1 Yes
0 Add2 No
Add3 No
0 Mult1 Yes
0 Mult2 No
Register result status
Clock
4
FU
Issue
1
2
3
4
Execution
complete
3
Write
Result
4
Load1
Load2
Load3
S1
Op
Vj
SUBD M(34+R2)
S2
Vk
RS for j
Qj
MULTD
R(F4)
Load2
F4
F6
F8
M(34+R2)
Add1
F0
F2
Mult1
Load2
Busy
No
Yes
No
Address
F10
F12 ...
45+R3
RS for k
Qk
Load2
F30
Tomasulo Example Cycle 5
Instruction status
Instruction
j
k
LD
F6
34+
R2
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy
0 Add1 Yes
0 Add2 No
Add3 No
0 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
5
FU
Issue
1
2
3
4
5
Execution
complete
3
5
Write
Result
4
Load1
Load2
Load3
Busy
No
Yes
No
Address
F12 ...
S1
Op
Vj
SUBD M(34+R2)
S2
Vk
RS for j
Qj
MULTD
DIVD
R(F4)
M(34+R2)
Load2
Mult1
F4
F6
F8
F10
M(34+R2)
Add1
Mult2
F0
F2
Mult1
Load2
45+R3
RS for k
Qk
Load2
F30
Tomasulo Example Cycle 6
Instruction status
Instruction
j
k
LD
F6
34+
R2
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy
2 Add1 Yes
0 Add2 Yes
Add3 No
10 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
6
FU
Issue
1
2
3
4
5
6
Execution
complete
3
5
Write
Result
4
6
Load1
Load2
Load3
RS for j
Qj
Busy
No
No
No
Address
F12 ...
S1
Op
Vj
SUBD M(34+R2)
ADDD
S2
Vk
M(45+R3)
M(45+R3)
RS for k
Qk
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
F10
Mult1
M(45+R3)
Add2
Add1
Mult2
Add1
F30
Tomasulo Example Cycle 7
Instruction status
Instruction
j
k
LD
F6
34+
R2
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy
1 Add1 Yes
0 Add2 Yes
Add3 No
9 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
7
FU
Issue
1
2
3
4
5
6
Execution
complete
3
5
Write
Result
4
6
Load1
Load2
Load3
RS for j
Qj
Busy
No
No
No
Address
F12 ...
S1
Op
Vj
SUBD M(34+R2)
ADDD
S2
Vk
M(45+R3)
M(45+R3)
RS for k
Qk
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
F10
Mult1
M(45+R3)
Add2
Add1
Mult2
Add1
F30
Tomasulo Example Cycle 8
Instruction status
Instruction
j
k
LD
F6
34+
R2
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy
0 Add1 Yes
0 Add2 Yes
Add3 No
8 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
8
FU
Issue
1
2
3
4
5
6
Execution
complete
3
5
Write
Result
4
6
Load1
Load2
Load3
Busy
No
No
No
Address
F12 ...
8
S1
Op
Vj
SUBD M(34+R2)
ADDD
S2
Vk
M(45+R3)
M(45+R3)
RS for j
Qj
RS for k
Qk
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
F10
Mult1
M(45+R3)
Add2
Add1
Mult2
Add1
F30
Tomasulo Example Cycle 9
Instruction status
Instruction
j
k
LD
F6
34+
R2
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy
0 Add1 No
0 Add2 Yes
Add3 No
7 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
9
FU
Issue
1
2
3
4
5
6
Op
Execution
complete
3
5
Write
Result
4
6
8
S1
Vj
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
9
S2
Vk
RS for j
Qj
RS for k
Qk
ADDD M()–M()
M(45+R3)
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
Mult1
M(45+R3)
Add2
M()–M() Mult2
F30
Tomasulo Example Cycle 10
Instruction status
Instruction
j
k
LD
F6
34+
R2
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy
0 Add1 No
2 Add2 Yes
Add3 No
67 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
10
FU
Issue
1
2
3
4
5
6
Op
Execution
complete
3
5
Write
Result
4
6
8
S1
Vj
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
9
S2
Vk
RS for j
Qj
RS for k
Qk
ADDD M()–M()
M(45+R3)
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
Mult1
M(45+R3)
Add2
M()–M() Mult2
F30
Tomasulo Example Cycle 11
Instruction status
Instruction
j
k
LD
F6
34+
R2
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy
0 Add1 No
1 Add2 Yes
Add3 No
5 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
11
FU
Issue
1
2
3
4
5
6
Op
Execution
complete
3
5
Write
Result
4
6
8
S1
Vj
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
9
S2
Vk
RS for j
Qj
RS for k
Qk
ADDD M()–M()
M(45+R3)
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
Mult1
M(45+R3)
Add2
M()–M() Mult2
F30
Tomasulo Example Cycle 12
Instruction status
Instruction
j
k
LD
F6
34+
R2
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy
0 Add1 No
0 Add2 Yes
Add3 No
4 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
12
FU
Issue
1
2
3
4
5
6
Op
Execution
complete
3
5
Write
Result
4
6
8
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
9
12
S1
Vj
S2
Vk
RS for j
Qj
RS for k
Qk
ADDD M()–M()
M(45+R3)
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
Mult1
M(45+R3)
Add2
M()–M() Mult2
F30
Tomasulo Example Cycle 13
Instruction status
Instruction
j
k
LD
F6
34+
R2
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy
0 Add1 No
0 Add2 No
Add3 No
3 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
13
FU
Issue
1
2
3
4
5
6
Execution
complete
3
5
Write
Result
4
6
8
9
12
13
Address
F10
F12 ...
S2
Vk
RS for j
Qj
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
Mult1
M(45+R3)
(M–M)+M()
M()–M() Mult2
Op
S1
Vj
Load1
Load2
Load3
Busy
No
No
No
RS for k
Qk
F30
Tomasulo Example Cycle 14
Instruction status
Instruction
j
k
LD
F6
34+
R2
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy
0 Add1 No
0 Add2 No
Add3 No
2 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
14
FU
Issue
1
2
3
4
5
6
Execution
complete
3
5
Write
Result
4
6
8
9
12
13
Address
F10
F12 ...
S2
Vk
RS for j
Qj
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
Mult1
M(45+R3)
(M–M)+M()
M()–M() Mult2
Op
S1
Vj
Load1
Load2
Load3
Busy
No
No
No
RS for k
Qk
F30
Tomasulo Example Cycle 15
Instruction status
Instruction
j
k
LD
F6
34+
R2
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy
0 Add1 No
0 Add2 No
Add3 No
1 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
15
FU
Issue
1
2
3
4
5
6
Execution
complete
3
5
Write
Result
4
6
8
9
12
13
Address
F10
F12 ...
S2
Vk
RS for j
Qj
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
Mult1
M(45+R3)
(M–M)+M()
M()–M() Mult2
Op
S1
Vj
Load1
Load2
Load3
Busy
No
No
No
RS for k
Qk
F30
Tomasulo Example Cycle 16
Instruction status
Instruction
j
k
LD
F6
34+
R2
LD
F2
45+
R3
MULTDF0
F2
F4
SUBD F8
F6
F2
DIVD F10 F0
F6
ADDD F6
F8
F2
Reservation Stations
Time Name Busy
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
16
FU
Issue
1
2
3
4
5
6
Execution
complete
3
5
16
8
Write
Result
4
6
Address
F10
F12 ...
9
12
13
S2
Vk
RS for j
Qj
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
Mult1
M(45+R3)
(M–M)+M()
M()–M() Mult2
Op
S1
Vj
Load1
Load2
Load3
Busy
No
No
No
RS for k
Qk
F30
Tomasulo Example Cycle 17
Instruction status
Instruction
j
k
Issue
LD
F6
34+
R2
1
LD
F2
45+
R3
2
MULTDF0
F2
F4
3
SUBD F8
F6
F2
4
DIVD F10 F0
F6
5
ADDD F6
F8
F2
6
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 Yes DIVD
Register result status
Execution
complete
3
5
16
8
Clock
17
FU
Write
Result
4
6
17
9
12
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
13
S1
Vj
S2
Vk
M*F4
M(34+R2)
F0
F2
F4
M*F4
M(45+R3)
RS for j
Qj
RS for k
Qk
F6
F8
(M–M)+M()
M()–M() Mult2
F30
Tomasulo Example Cycle 18
Instruction status
Instruction
j
k
Issue
LD
F6
34+
R2
1
LD
F2
45+
R3
2
MULTDF0
F2
F4
3
SUBD F8
F6
F2
4
DIVD F10 F0
F6
5
ADDD F6
F8
F2
6
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
40 Mult2 Yes DIVD
Register result status
Execution
complete
3
5
16
8
Clock
18
FU
Write
Result
4
6
17
9
12
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
13
S1
Vj
S2
Vk
M*F4
M(34+R2)
F0
F2
F4
M*F4
M(45+R3)
RS for j
Qj
RS for k
Qk
F6
F8
(M–M)+M()
M()–M() Mult2
F30
Tomasulo Example Cycle 57
Instruction status
Instruction
j
k
Issue
LD
F6
34+
R2
1
LD
F2
45+
R3
2
MULTDF0
F2
F4
3
SUBD F8
F6
F2
4
DIVD F10 F0
F6
5
ADDD F6
F8
F2
6
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
1 Mult2 Yes DIVD
Register result status
Execution
complete
3
5
16
8
Clock
57
FU
Write
Result
4
6
17
9
12
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
13
S1
Vj
S2
Vk
M*F4
M(34+R2)
F0
F2
F4
M*F4
M(45+R3)
RS for j
Qj
RS for k
Qk
F6
F8
(M–M)+M()
M()–M() Mult2
F30
Tomasulo Example Cycle 58
Instruction status
Instruction
j
k
Issue
LD
F6
34+
R2
1
LD
F2
45+
R3
2
MULTDF0
F2
F4
3
SUBD F8
F6
F2
4
DIVD F10 F0
F6
5
ADDD F6
F8
F2
6
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 Yes DIVD
Register result status
Execution
complete
3
5
16
8
58
12
S1
Vj
Write
Result
4
6
17
9
M*F4
M(34+R2)
Clock
F0
F2
F4
M*F4
M(45+R3)
58
FU
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
13
S2
Vk
RS for j
Qj
RS for k
Qk
F6
F8
(M–M)+M()
M()–M() Mult2
F30
Tomasulo Example Cycle 59
Instruction status
Instruction
j
k
Issue
LD
F6
34+
R2
1
LD
F2
45+
R3
2
MULTDF0
F2
F4
3
SUBD F8
F6
F2
4
DIVD F10 F0
F6
5
ADDD F6
F8
F2
6
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Execution
complete
3
5
16
8
58
12
S1
Vj
Write
Result
4
6
17
9
59
13
S2
Vk
RS for j
Qj
RS for k
Qk
Clock
F0
F2
F4
F6
F8
M*F4
M(45+R3)
(M–M)+M()
M()–M() M*F4/M
59
FU
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
F30
Tomasulo Summary
• Advantages
– Prevents register from being the bottleneck
– Eliminates WAR, WAW hazards
– Allows loop unrolling in HW
• Common Data Bus
– Broadcasts results to multiple instructions
– Central bottleneck
• Lasting Contributions
– Dynamic scheduling
– Register renaming
– Load/store disambiguation
Discussions