Lecture 12: Dynamic Branch Prediction, Superscalar, VLIW

Dynamic ILP: Scoreboard
Professor Alvin R. Lebeck
Computer Science 220 / ECE 252
Fall 2008
Precise Exceptions
• What is precise?
– All instructions older than interrupt are complete
– All instructions younger than interrupt haven’t started
• Simultaneous interrupts/exceptions
• Odd bits of state
• Out of order writes
© 2008 Lebeck, Sorin, Roth, Hill, Wood,
Sohi, Smith, Vijaykumar, Lipasti, Katz
Computer Science 220
Review: CPI < 1
• Superscalar: Issue & execute > 1 instruction/cycle
• In-order processor
– Instructions move from D to E in program order
– May finish execute out of order
• Problem spots
– fetch, branch prediction => trace cache?
– decode (N^2 dependence cross-check)
– execute (N^2 bypass) => clustering?
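To make the N^2 growth of the decode cross-check concrete, here is a minimal sketch that counts register comparisons for an N-wide issue group. It assumes two source registers and one destination per instruction, with every source compared against the destination of every older instruction in the same group. The name `cross_checks` is illustrative and not part of the lecture.

```python
def cross_checks(n, sources_per_instr=2):
    """Comparisons needed to find RAW dependences inside an n-wide group."""
    # Instruction i (0-based) compares each source with the i older destinations.
    return sum(sources_per_instr * i for i in range(n))


for n in (1, 2, 4, 8):
    print(n, cross_checks(n))
```

Under these assumptions the count is N(N-1), so doubling the issue width roughly quadruples the comparator count.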
Can we do better?
• Problem: Stall in ID stage if any data hazard.
MULD  F2, F3, F4    (Long latency…)
ADDD  F1, F2, F3
ADDD  F3, F4, F5
ADDD  F1, F4, F5
HW Schemes: Instruction Parallelism
• Why in HW at run time?
– Works when dependencies can’t be known at compile time
– Simpler Compiler
– Code for one machine runs well on another machine
• Key Idea: Allow instructions behind stall to proceed
DIVD  F0, F2, F4
ADDD  F10, F0, F8
SUBD  F8, F8, F14
– Enables out-of-order execution => out-of-order completion
– ID stage checks for both structural & data dependencies
HW Schemes: Instruction Parallelism
• Out-of-order execution divides ID stage:
1. Issue: decode instructions, check for structural hazards
2. Read operands: wait until no data hazards, then read operands
• Scoreboards allow an instruction to execute whenever 1 & 2 hold, not waiting for prior instructions
Scoreboard Implications
• Out-of-order completion => WAR, WAW hazards?
• Solutions for WAR
– Queue both the operation and copies of its operands
– Read registers only during Read Operands stage
• For WAW, must detect hazard: stall until the other instruction completes
• Need to have multiple instructions in execution phase => multiple execution units or pipelined execution units
• Scoreboard keeps track of dependencies and the state of operations
• Scoreboard replaces ID, EX, WB with 4 stages
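The three hazard kinds can be read directly off the register names of an instruction pair. The sketch below is a minimal illustration, assuming each instruction is written as a `(dest, src1, src2)` tuple; the helper `hazards` is a hypothetical name, not something defined in the lecture. It is applied to the DIVD/ADDD/SUBD sequence used on these slides.

```python
def hazards(earlier, later):
    """Hazard kinds that `later` has with respect to `earlier` (program order)."""
    e_dest, *e_srcs = earlier
    l_dest, *l_srcs = later
    found = set()
    if e_dest in l_srcs:
        found.add("RAW")   # later reads what earlier writes
    if l_dest in e_srcs:
        found.add("WAR")   # later overwrites what earlier still has to read
    if l_dest == e_dest:
        found.add("WAW")   # both write the same register
    return found


divd = ("F0", "F2", "F4")
addd = ("F10", "F0", "F8")
subd = ("F8", "F8", "F14")

print(hazards(divd, addd))   # ADDD needs F0 from DIVD
print(hazards(addd, subd))   # SUBD overwrites F8, which ADDD still reads
print(hazards(divd, subd))   # independent, so SUBD may run ahead of DIVD
```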
Four Stages of Scoreboard Control
1. Issue: decode instructions & check for structural
hazards (ID1, sometimes called dispatch)
If a functional unit for the instruction is free and no other active
instruction has the same destination register (WAW), the scoreboard
issues the instruction to the functional unit and updates its internal
data structure. If a structural or WAW hazard exists, then the
instruction issue stalls, and no further instructions will issue until
these hazards are cleared.
2. Read operands: wait until no data hazards, then
read operands (ID2, sometimes called issue)
A source operand is available if no earlier issued active instruction is
going to write it, or if the register containing the operand is being
written by a currently active functional unit. When the source
operands are available, the scoreboard tells the functional unit to
proceed to read the operands from the registers and begin execution.
The scoreboard resolves RAW hazards dynamically in this step, and
instructions may be sent into execution out of order.
Four Stages of Scoreboard Control
3. Execution: operate on operands
The functional unit begins execution upon receiving operands. When the
result is ready, it notifies the scoreboard that it has completed execution.
4. Write Result: finish execution (WB)
Once the scoreboard is aware that the functional unit has completed
execution, the scoreboard checks for WAR hazards. If none, it writes
results. If WAR, then it stalls the instruction.
Example:
DIVD  F0, F2, F4
ADDD  F10, F0, F8
SUBD  F8, F8, F14
CDC 6600 scoreboard would stall SUBD until ADDD reads operands
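The four stages can be sketched as a small cycle-by-cycle simulator. This is a simplified model under stated assumptions, not the CDC 6600 control logic: the latencies (integer 1, add 2, multiply 10, divide 40) and unit counts are assumed values, each stage takes effect one cycle after the condition it waits for, and the names `Instr` and `simulate` are illustrative. The program is the six-instruction sequence used in the scoreboard example slides.

```python
from dataclasses import dataclass, field

# Assumed execution latencies (cycles) and number of units per class.
LATENCY = {"Integer": 1, "Add": 2, "Mult": 10, "Divide": 40}
UNITS = {"Integer": 1, "Add": 1, "Mult": 2, "Divide": 1}


@dataclass
class Instr:
    name: str
    unit: str
    dest: str
    srcs: tuple
    issue: int = 0    # cycle in which each stage happened; 0 means "not yet"
    read: int = 0
    done: int = 0
    write: int = 0
    producers: list = field(default_factory=list)


def simulate(program):
    free = dict(UNITS)   # free functional units per class
    writer = {}          # register result status: register -> pending Instr
    pc = 0               # next instruction to issue (in program order)
    cycle = 0
    while any(i.write == 0 for i in program):
        cycle += 1
        # 1. Issue: needs a free unit and no pending write to dest (WAW).
        if pc < len(program):
            i = program[pc]
            if free[i.unit] > 0 and i.dest not in writer:
                i.issue = cycle
                i.producers = [writer[s] for s in i.srcs if s in writer]
                writer[i.dest] = i
                free[i.unit] -= 1
                pc += 1
        # 2. Read operands: all producers wrote in an earlier cycle (RAW).
        for i in program[:pc]:
            if i.issue < cycle and not i.read and all(p.write for p in i.producers):
                i.read = cycle
                i.done = cycle + LATENCY[i.unit]   # 3. execution completes here
        # 4. Write result: stall while an older instruction that names dest as
        #    a source has not read its operands before this cycle (WAR).
        for n, i in enumerate(program[:pc]):
            if i.done and i.done < cycle and not i.write:
                war = any(o.read in (0, cycle) and i.dest in o.srcs
                          for o in program[:n])
                if not war:
                    i.write = cycle
                    free[i.unit] += 1
                    del writer[i.dest]
    return [(i.name, i.issue, i.read, i.done, i.write) for i in program]


program = [
    Instr("LD F6", "Integer", "F6", ("R2",)),
    Instr("LD F2", "Integer", "F2", ("R3",)),
    Instr("MULT F0", "Mult", "F0", ("F2", "F4")),
    Instr("SUBD F8", "Add", "F8", ("F6", "F2")),
    Instr("DIVD F10", "Divide", "F10", ("F0", "F6")),
    Instr("ADDD F6", "Add", "F6", ("F8", "F2")),
]
timeline = simulate(program)
for row in timeline:
    print(row)   # (name, issue, read operands, execution complete, write result)
```

In this model the second LD waits for the single integer unit (structural hazard), MULT and SUBD wait for F2 (RAW), and ADDD finishes executing early but holds its write to F6 until DIVD has read F6 (WAR).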
Three Parts of the Scoreboard
1. Instruction status: which of the 4 steps the instruction is in
2. Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit:
Busy--Indicates whether the unit is busy or not
Op--Operation to perform in the unit (e.g., + or -)
Fi--Destination register
Fj, Fk--Source-register numbers
Qj, Qk--Functional units producing source registers Fj, Fk
Rj, Rk--Flags indicating when Fj, Fk are ready
3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register
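As a rough illustration of parts 2 and 3, the sketch below keeps one record with the nine fields per functional unit and a dictionary for the register result status, then fills them in at issue time. The class `FUStatus`, the function `issue`, and the use of `None` for blank entries are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FUStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # units producing Fj, Fk (None = no producer)
    qk: Optional[str] = None
    rj: bool = False           # Fj, Fk ready and not yet read
    rk: bool = False


def issue(fu_name, op, fi, fj, fk, fus, result):
    """Record an issued instruction; return False on a structural or WAW hazard."""
    if fus[fu_name].busy or fi in result:
        return False
    qj, qk = result.get(fj), result.get(fk)
    fus[fu_name] = FUStatus(True, op, fi, fj, fk, qj, qk,
                            fj is not None and qj is None,
                            fk is not None and qk is None)
    result[fi] = fu_name
    return True


fus = {name: FUStatus() for name in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
result = {}   # register result status: register -> unit that will write it

issue("Integer", "Load", "F2", None, "R3", fus, result)   # LD   F2, 45+(R3)
issue("Mult1", "Mult", "F0", "F2", "F4", fus, result)     # MULT F0, F2, F4
print(fus["Mult1"])
print(result)
```

After these two calls the Mult1 entry records that F2 will come from the Integer unit (Qj) and is not yet ready (Rj), while F4 is ready (Rk).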
Scoreboard Example Cycle 1
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      -             -                   -
LD    F2      45+  R3   -      -             -                   -
MULT  F0      F2   F4   -      -             -                   -
SUBD  F8      F6   F2   -      -             -                   -
DIVD  F10     F0   F6   -      -             -                   -
ADDD  F6      F8   F2   -      -             -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F6   -    R2   -        -        -    Yes
Mult1    No
Mult2    No
Add      No
Divide   No
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
1      -      -        -   Integer  -    -       -    …  -
Scoreboard Example Cycle 2
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             -                   -
LD    F2      45+  R3   -      -             -                   -
MULT  F0      F2   F4   -      -             -                   -
SUBD  F8      F6   F2   -      -             -                   -
DIVD  F10     F0   F6   -      -             -                   -
ADDD  F6      F8   F2   -      -             -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F6   -    R2   -        -        -    Yes
Mult1    No
Mult2    No
Add      No
Divide   No
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
2      -      -        -   Integer  -    -       -    …  -
Scoreboard Example Cycle 3
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   -
LD    F2      45+  R3   -      -             -                   -
MULT  F0      F2   F4   -      -             -                   -
SUBD  F8      F6   F2   -      -             -                   -
DIVD  F10     F0   F6   -      -             -                   -
ADDD  F6      F8   F2   -      -             -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F6   -    R2   -        -        -    Yes
Mult1    No
Mult2    No
Add      No
Divide   No
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
3      -      -        -   Integer  -    -       -    …  -
Scoreboard Example Cycle 4
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   -      -             -                   -
MULT  F0      F2   F4   -      -             -                   -
SUBD  F8      F6   F2   -      -             -                   -
DIVD  F10     F0   F6   -      -             -                   -
ADDD  F6      F8   F2   -      -             -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F6   -    R2   -        -        -    Yes
Mult1    No
Mult2    No
Add      No
Divide   No
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
4      -      -        -   Integer  -    -       -    …  -
Scoreboard Example Cycle 5
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      -             -                   -
MULT  F0      F2   F4   -      -             -                   -
SUBD  F8      F6   F2   -      -             -                   -
DIVD  F10     F0   F6   -      -             -                   -
ADDD  F6      F8   F2   -      -             -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F2   -    R3   -        -        -    Yes
Mult1    No
Mult2    No
Add      No
Divide   No
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
5      -      Integer  -   -        -    -       -    …  -
Scoreboard Example Cycle 6
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             -                   -
MULT  F0      F2   F4   6      -             -                   -
SUBD  F8      F6   F2   -      -             -                   -
DIVD  F10     F0   F6   -      -             -                   -
ADDD  F6      F8   F2   -      -             -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F2   -    R3   -        -        -    Yes
Mult1    Yes   Mult  F0   F2   F4   Integer  -        No   Yes
Mult2    No
Add      No
Divide   No
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
6      Mult1  Integer  -   -        -    -       -    …  -
Scoreboard Example Cycle 7
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   -
MULT  F0      F2   F4   6      -             -                   -
SUBD  F8      F6   F2   7      -             -                   -
DIVD  F10     F0   F6   -      -             -                   -
ADDD  F6      F8   F2   -      -             -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F2   -    R3   -        -        -    Yes
Mult1    Yes   Mult  F0   F2   F4   Integer  -        No   Yes
Mult2    No
Add      Yes   Sub   F8   F6   F2   -        Integer  Yes  No
Divide   No
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
7      Mult1  Integer  -   -        Add  -       -    …  -
Scoreboard Example Cycle 8a
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   -
MULT  F0      F2   F4   6      -             -                   -
SUBD  F8      F6   F2   7      -             -                   -
DIVD  F10     F0   F6   8      -             -                   -
ADDD  F6      F8   F2   -      -             -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Yes   Load  F2   -    R3   -        -        -    Yes
Mult1    Yes   Mult  F0   F2   F4   Integer  -        No   Yes
Mult2    No
Add      Yes   Sub   F8   F6   F2   -        Integer  Yes  No
Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
8      Mult1  Integer  -   -        Add  Divide  -    …  -
Scoreboard Example Cycle 8b
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      -             -                   -
SUBD  F8      F6   F2   7      -             -                   -
DIVD  F10     F0   F6   8      -             -                   -
ADDD  F6      F8   F2   -      -             -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
Mult2    No
Add      Yes   Sub   F8   F6   F2   -        Integer  Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
8      Mult1  -        -   -        Add  Divide  -    …  -
Scoreboard Example Cycle 9
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      9             -                   -
SUBD  F8      F6   F2   7      9             -                   -
DIVD  F10     F0   F6   8      -             -                   -
ADDD  F6      F8   F2   -      -             -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
Mult2    No
Add      Yes   Sub   F8   F6   F2   -        Integer  Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
9      Mult1  -        -   -        Add  Divide  -    …  -
Scoreboard Example Cycle 11
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      9             -                   -
SUBD  F8      F6   F2   7      9             11                  -
DIVD  F10     F0   F6   8      -             -                   -
ADDD  F6      F8   F2   -      -             -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
Mult2    No
Add      Yes   Sub   F8   F6   F2   -        Integer  Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
11     Mult1  -        -   -        Add  Divide  -    …  -
Scoreboard Example Cycle 12
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      9             -                   -
SUBD  F8      F6   F2   7      9             11                  12
DIVD  F10     F0   F6   8      -             -                   -
ADDD  F6      F8   F2   -      -             -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
Mult2    No
Add      No
Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
12     Mult1  -        -   -        -    Divide  -    …  -
Scoreboard Example Cycle 13
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      9             -                   -
SUBD  F8      F6   F2   7      9             11                  12
DIVD  F10     F0   F6   8      -             -                   -
ADDD  F6      F8   F2   13     -             -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
Mult2    No
Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
13     Mult1  -        -   Add      -    Divide  -    …  -
Scoreboard Example Cycle 14
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      9             -                   -
SUBD  F8      F6   F2   7      9             11                  12
DIVD  F10     F0   F6   8      -             -                   -
ADDD  F6      F8   F2   13     14            -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
Mult2    No
Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
14     Mult1  -        -   Add      -    Divide  -    …  -
Scoreboard Example Cycle 15
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      9             -                   -
SUBD  F8      F6   F2   7      9             11                  12
DIVD  F10     F0   F6   8      -             -                   -
ADDD  F6      F8   F2   13     14            -                   -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
Mult2    No
Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
15     Mult1  -        -   Add      -    Divide  -    …  -
Scoreboard Example Cycle 16
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      9             -                   -
SUBD  F8      F6   F2   7      9             11                  12
DIVD  F10     F0   F6   8      -             -                   -
ADDD  F6      F8   F2   13     14            16                  -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
Mult2    No
Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
16     Mult1  -        -   Add      -    Divide  -    …  -
Scoreboard Example Cycle 17
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      9             -                   -
SUBD  F8      F6   F2   7      9             11                  12
DIVD  F10     F0   F6   8      -             -                   -
ADDD  F6      F8   F2   13     14            16                  -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
Mult2    No
Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
17     Mult1  -        -   Add      -    Divide  -    …  -
Scoreboard Example Cycle 18
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      9             -                   -
SUBD  F8      F6   F2   7      9             11                  12
DIVD  F10     F0   F6   8      -             -                   -
ADDD  F6      F8   F2   13     14            16                  -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
Mult2    No
Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
18     Mult1  -        -   Add      -    Divide  -    …  -
Scoreboard Example Cycle 19
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      9             19                  -
SUBD  F8      F6   F2   7      9             11                  12
DIVD  F10     F0   F6   8      -             -                   -
ADDD  F6      F8   F2   13     14            16                  -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
Mult2    No
Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
19     Mult1  -        -   Add      -    Divide  -    …  -
Scoreboard Example Cycle 20
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      9             19                  20
SUBD  F8      F6   F2   7      9             11                  12
DIVD  F10     F0   F6   8      -             -                   -
ADDD  F6      F8   F2   13     14            16                  -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    No
Mult2    No
Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1    -        Yes  Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
20     -      -        -   Add      -    Divide  -    …  -
Scoreboard Example Cycle 21
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      9             19                  20
SUBD  F8      F6   F2   7      9             11                  12
DIVD  F10     F0   F6   8      21            -                   -
ADDD  F6      F8   F2   13     14            16                  -
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    No
Mult2    No
Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
Divide   Yes   Div   F10  F0   F6   Mult1    -        Yes  Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
21     -      -        -   Add      -    Divide  -    …  -
Scoreboard Example Cycle 22
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      9             19                  20
SUBD  F8      F6   F2   7      9             11                  12
DIVD  F10     F0   F6   8      21            -                   -
ADDD  F6      F8   F2   13     14            16                  22
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    No
Mult2    No
Add      No
Divide   Yes   Div   F10  F0   F6   Mult1    -        Yes  Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
22     -      -        -   -        -    Divide  -    …  -
(40 cycle Divide)
Scoreboard Example Cycle 61
Instruction Status
Instruction   j    k    Issue  Read Operand  Execution Complete  Write Result
LD    F6      34+  R2   1      2             3                   4
LD    F2      45+  R3   5      6             7                   8
MULT  F0      F2   F4   6      9             19                  20
SUBD  F8      F6   F2   7      9             11                  12
DIVD  F10     F0   F6   8      21            61                  -
ADDD  F6      F8   F2   13     14            16                  22
Functional Unit Status
Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  No
Mult1    No
Mult2    No
Add      No
Divide   Yes   Div   F10  F0   F6   Mult1    -        Yes  Yes
Register Result Status
Clock  F0     F2       F4  F6       F8   F10     F12  …  F31
61     -      -        -   -        -    Divide  -    …  -
Scoreboard Summary
• Speedup 1.7 from compiler; 2.5 by hand, BUT slow memory (no cache)
• Limitations of 6600 scoreboard
– No forwarding
– Limited to instructions in basic block (small window)
– Number of functional units (structural hazards)
– Wait for WAR hazards
– Prevent WAW hazards
• How to design a datapath that eliminates these problems?
Tomasulo’s Algorithm: Another Dynamic Scheme
• For IBM 360/91 about 3 years after CDC 6600
• Goal: High Performance without special compilers
• Differences between IBM 360 & CDC 6600 ISA
– IBM has only 2 register specifiers/instr vs. 3 in CDC 6600
– IBM has 4 FP registers vs. 8 in CDC 6600
• Differences between Tomasulo Algorithm & Scoreboard
– Control & buffers distributed with Functional Units vs. centralized in scoreboard; called “reservation stations”
– Register specifiers in instructions replaced by pointers to reservation station buffer (Everything can be solved with a level of indirection!)
– HW renaming of registers to avoid WAR, WAW hazards
– Common Data Bus broadcasts results to all FUs
– Loads and Stores treated as FUs as well
Tomasulo Organization
[Figure: block diagram. Labels: From Instruction Unit, FP op queue, FP Registers, From Memory, Load Buffers, Store Buffers, To Memory, Operand Bus, FP adders, FP multipliers, Common Data Bus (CDB)]
Reservation Station Components
Op--Operation to perform in the unit (e.g., + or -)
Qj, Qk--Reservation stations producing source registers
Vj, Vk--Values of source operands
Rj, Rk--Flags indicating when Vj, Vk are ready
Busy--Indicates reservation station and FU are busy
Register result status--Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
Three Stages of Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
If a reservation station is free, issue the instruction & send operands (renames registers).
2. Execution: operate on operands (EX)
When both operands are ready, execute; if not ready, watch CDB for result.
3. Write result: finish execution (WB)
Write on Common Data Bus to all awaiting units; mark reservation station available.
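The issue and write-result stages can be sketched with a few lines of bookkeeping. This is a minimal illustration under stated assumptions, not the IBM 360/91 design: values are symbolic strings, loads are reduced to a destination tag, execution timing is left out, and the names `Station`, `operand`, `issue`, `issue_load`, and `broadcast` are invented for this example. The call sequence mirrors the order of events in the first six cycles of the example that follows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None   # operand values (symbolic strings in this sketch)
    vk: Optional[str] = None
    qj: Optional[str] = None   # tags of the stations that will produce them
    qk: Optional[str] = None


regs = {f"F{n}": f"R(F{n})" for n in range(0, 32, 2)}   # symbolic register values
status = {}                                             # register -> producing tag
stations = {name: Station() for name in ("Add1", "Add2", "Add3", "Mult1", "Mult2")}


def operand(reg):
    """Return (value, tag): the value if available, else the tag to wait for."""
    tag = status.get(reg)
    return (None, tag) if tag else (regs[reg], None)


def issue(tag, op, dest, src_j, src_k):
    """1. Issue: claim a free station and rename the operands."""
    assert not stations[tag].busy, "structural hazard: station in use"
    (vj, qj), (vk, qk) = operand(src_j), operand(src_k)
    stations[tag] = Station(True, op, vj, vk, qj, qk)
    status[dest] = tag          # later readers of dest now wait on this tag


def issue_load(tag, dest):
    """Loads only need a destination tag in this sketch (address omitted)."""
    status[dest] = tag


def broadcast(tag, value):
    """3. Write result: put (tag, value) on the CDB for every waiting unit."""
    for rs in stations.values():
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    for reg in [r for r, t in status.items() if t == tag]:
        regs[reg] = value
        del status[reg]
    if tag in stations:
        stations[tag] = Station()   # mark the reservation station available


issue_load("Load1", "F6")
issue_load("Load2", "F2")
issue("Mult1", "MULTD", "F0", "F2", "F4")
broadcast("Load1", "M(34+R2)")
issue("Add1", "SUBD", "F8", "F6", "F2")
issue("Mult2", "DIVD", "F10", "F0", "F6")
issue("Add2", "ADDD", "F6", "F8", "F2")
broadcast("Load2", "M(45+R3)")
print(stations["Mult2"])   # DIVD already holds the old F6 value in vk
print(status)              # F6 is now renamed to Add2
```

Because DIVD copied the old value of F6 into its station when it issued, ADDD is free to write F6 before DIVD executes; that copy is what removes the WAR stall the scoreboard needed.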
Tomasulo Example Cycle 0
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   -      -                   -
LD    F2      45+  R3   -      -                   -
MULTD F0      F2   F4   -      -                   -
SUBD  F8      F6   F2   -      -                   -
DIVD  F10     F0   F6   -      -                   -
ADDD  F6      F8   F2   -      -                   -
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
-     Add3   No
0     Mult1  No
0     Mult2  No
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
0      -      -         -   -          -        -        -    ...  -
Tomasulo Example Cycle 1
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      -                   -
LD    F2      45+  R3   -      -                   -
MULTD F0      F2   F4   -      -                   -
SUBD  F8      F6   F2   -      -                   -
DIVD  F10     F0   F6   -      -                   -
ADDD  F6      F8   F2   -      -                   -
Load buffers
Name   Busy  Address
Load1  Yes   34+R2
Load2  No
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
-     Add3   No
0     Mult1  No
0     Mult2  No
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
1      -      -         -   Load1      -        -        -    ...  -
Tomasulo Example Cycle 2
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      -                   -
LD    F2      45+  R3   2      -                   -
MULTD F0      F2   F4   -      -                   -
SUBD  F8      F6   F2   -      -                   -
DIVD  F10     F0   F6   -      -                   -
ADDD  F6      F8   F2   -      -                   -
Load buffers
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
-     Add3   No
0     Mult1  No
0     Mult2  No
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
2      -      Load2     -   Load1      -        -        -    ...  -
Tomasulo Example Cycle 3
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   -
LD    F2      45+  R3   2      -                   -
MULTD F0      F2   F4   3      -                   -
SUBD  F8      F6   F2   -      -                   -
DIVD  F10     F0   F6   -      -                   -
ADDD  F6      F8   F2   -      -                   -
Load buffers
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
-     Add3   No
0     Mult1  Yes   MULTD  -          R(F4)      Load2          -
0     Mult2  No
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
3      Mult1  Load2     -   Load1      -        -        -    ...  -
Tomasulo Example Cycle 4
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   4
LD    F2      45+  R3   2      -                   -
MULTD F0      F2   F4   3      -                   -
SUBD  F8      F6   F2   4      -                   -
DIVD  F10     F0   F6   -      -                   -
ADDD  F6      F8   F2   -      -                   -
Load buffers
Name   Busy  Address
Load1  No
Load2  Yes   45+R3
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   Yes   SUBD   M(34+R2)   -          -              Load2
0     Add2   No
-     Add3   No
0     Mult1  Yes   MULTD  -          R(F4)      Load2          -
0     Mult2  No
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
4      Mult1  Load2     -   M(34+R2)   Add1     -        -    ...  -
Tomasulo Example Cycle 5
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   4
LD    F2      45+  R3   2      5                   -
MULTD F0      F2   F4   3      -                   -
SUBD  F8      F6   F2   4      -                   -
DIVD  F10     F0   F6   5      -                   -
ADDD  F6      F8   F2   -      -                   -
Load buffers
Name   Busy  Address
Load1  No
Load2  Yes   45+R3
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   Yes   SUBD   M(34+R2)   -          -              Load2
0     Add2   No
-     Add3   No
0     Mult1  Yes   MULTD  -          R(F4)      Load2          -
0     Mult2  Yes   DIVD   -          M(34+R2)   Mult1          -
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
5      Mult1  Load2     -   M(34+R2)   Add1     Mult2    -    ...  -
Tomasulo Example Cycle 6
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   4
LD    F2      45+  R3   2      5                   6
MULTD F0      F2   F4   3      -                   -
SUBD  F8      F6   F2   4      -                   -
DIVD  F10     F0   F6   5      -                   -
ADDD  F6      F8   F2   6      -                   -
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
2     Add1   Yes   SUBD   M(34+R2)   M(45+R3)   -              -
0     Add2   Yes   ADDD   -          M(45+R3)   Add1           -
-     Add3   No
10    Mult1  Yes   MULTD  M(45+R3)   R(F4)      -              -
0     Mult2  Yes   DIVD   -          M(34+R2)   Mult1          -
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
6      Mult1  M(45+R3)  -   Add2       Add1     Mult2    -    ...  -
Tomasulo Example Cycle 7
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   4
LD    F2      45+  R3   2      5                   6
MULTD F0      F2   F4   3      -                   -
SUBD  F8      F6   F2   4      -                   -
DIVD  F10     F0   F6   5      -                   -
ADDD  F6      F8   F2   6      -                   -
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
1     Add1   Yes   SUBD   M(34+R2)   M(45+R3)   -              -
0     Add2   Yes   ADDD   -          M(45+R3)   Add1           -
-     Add3   No
9     Mult1  Yes   MULTD  M(45+R3)   R(F4)      -              -
0     Mult2  Yes   DIVD   -          M(34+R2)   Mult1          -
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
7      Mult1  M(45+R3)  -   Add2       Add1     Mult2    -    ...  -
Tomasulo Example Cycle 8
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   4
LD    F2      45+  R3   2      5                   6
MULTD F0      F2   F4   3      -                   -
SUBD  F8      F6   F2   4      8                   -
DIVD  F10     F0   F6   5      -                   -
ADDD  F6      F8   F2   6      -                   -
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   Yes   SUBD   M(34+R2)   M(45+R3)   -              -
0     Add2   Yes   ADDD   -          M(45+R3)   Add1           -
-     Add3   No
8     Mult1  Yes   MULTD  M(45+R3)   R(F4)      -              -
0     Mult2  Yes   DIVD   -          M(34+R2)   Mult1          -
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
8      Mult1  M(45+R3)  -   Add2       Add1     Mult2    -    ...  -
Tomasulo Example Cycle 9
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   4
LD    F2      45+  R3   2      5                   6
MULTD F0      F2   F4   3      -                   -
SUBD  F8      F6   F2   4      8                   9
DIVD  F10     F0   F6   5      -                   -
ADDD  F6      F8   F2   6      -                   -
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   Yes   ADDD   M()-M()    M(45+R3)   -              -
-     Add3   No
7     Mult1  Yes   MULTD  M(45+R3)   R(F4)      -              -
0     Mult2  Yes   DIVD   -          M(34+R2)   Mult1          -
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
9      Mult1  M(45+R3)  -   Add2       M()-M()  Mult2    -    ...  -
Tomasulo Example Cycle 10
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   4
LD    F2      45+  R3   2      5                   6
MULTD F0      F2   F4   3      -                   -
SUBD  F8      F6   F2   4      8                   9
DIVD  F10     F0   F6   5      -                   -
ADDD  F6      F8   F2   6      -                   -
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   No
2     Add2   Yes   ADDD   M()-M()    M(45+R3)   -              -
-     Add3   No
6     Mult1  Yes   MULTD  M(45+R3)   R(F4)      -              -
0     Mult2  Yes   DIVD   -          M(34+R2)   Mult1          -
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
10     Mult1  M(45+R3)  -   Add2       M()-M()  Mult2    -    ...  -
Tomasulo Example Cycle 11
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   4
LD    F2      45+  R3   2      5                   6
MULTD F0      F2   F4   3      -                   -
SUBD  F8      F6   F2   4      8                   9
DIVD  F10     F0   F6   5      -                   -
ADDD  F6      F8   F2   6      -                   -
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   No
1     Add2   Yes   ADDD   M()-M()    M(45+R3)   -              -
-     Add3   No
5     Mult1  Yes   MULTD  M(45+R3)   R(F4)      -              -
0     Mult2  Yes   DIVD   -          M(34+R2)   Mult1          -
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
11     Mult1  M(45+R3)  -   Add2       M()-M()  Mult2    -    ...  -
Tomasulo Example Cycle 12
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   4
LD    F2      45+  R3   2      5                   6
MULTD F0      F2   F4   3      -                   -
SUBD  F8      F6   F2   4      8                   9
DIVD  F10     F0   F6   5      -                   -
ADDD  F6      F8   F2   6      12                  -
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   Yes   ADDD   M()-M()    M(45+R3)   -              -
-     Add3   No
4     Mult1  Yes   MULTD  M(45+R3)   R(F4)      -              -
0     Mult2  Yes   DIVD   -          M(34+R2)   Mult1          -
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
12     Mult1  M(45+R3)  -   Add2       M()-M()  Mult2    -    ...  -
Tomasulo Example Cycle 13
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   4
LD    F2      45+  R3   2      5                   6
MULTD F0      F2   F4   3      -                   -
SUBD  F8      F6   F2   4      8                   9
DIVD  F10     F0   F6   5      -                   -
ADDD  F6      F8   F2   6      12                  13
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
-     Add3   No
3     Mult1  Yes   MULTD  M(45+R3)   R(F4)      -              -
0     Mult2  Yes   DIVD   -          M(34+R2)   Mult1          -
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
13     Mult1  M(45+R3)  -   (M-M)+M()  M()-M()  Mult2    -    ...  -
Tomasulo Example Cycle 14
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   4
LD    F2      45+  R3   2      5                   6
MULTD F0      F2   F4   3      -                   -
SUBD  F8      F6   F2   4      8                   9
DIVD  F10     F0   F6   5      -                   -
ADDD  F6      F8   F2   6      12                  13
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
-     Add3   No
2     Mult1  Yes   MULTD  M(45+R3)   R(F4)      -              -
0     Mult2  Yes   DIVD   -          M(34+R2)   Mult1          -
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
14     Mult1  M(45+R3)  -   (M-M)+M()  M()-M()  Mult2    -    ...  -
Tomasulo Example Cycle 15
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   4
LD    F2      45+  R3   2      5                   6
MULTD F0      F2   F4   3      -                   -
SUBD  F8      F6   F2   4      8                   9
DIVD  F10     F0   F6   5      -                   -
ADDD  F6      F8   F2   6      12                  13
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
-     Add3   No
1     Mult1  Yes   MULTD  M(45+R3)   R(F4)      -              -
0     Mult2  Yes   DIVD   -          M(34+R2)   Mult1          -
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
15     Mult1  M(45+R3)  -   (M-M)+M()  M()-M()  Mult2    -    ...  -
Tomasulo Example Cycle 16
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   4
LD    F2      45+  R3   2      5                   6
MULTD F0      F2   F4   3      16                  -
SUBD  F8      F6   F2   4      8                   9
DIVD  F10     F0   F6   5      -                   -
ADDD  F6      F8   F2   6      12                  13
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
-     Add3   No
0     Mult1  Yes   MULTD  M(45+R3)   R(F4)      -              -
0     Mult2  Yes   DIVD   -          M(34+R2)   Mult1          -
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
16     Mult1  M(45+R3)  -   (M-M)+M()  M()-M()  Mult2    -    ...  -
Tomasulo Example Cycle 17
Instruction status
Instruction   j    k    Issue  Execution complete  Write Result
LD    F6      34+  R2   1      3                   4
LD    F2      45+  R3   2      5                   6
MULTD F0      F2   F4   3      16                  17
SUBD  F8      F6   F2   4      8                   9
DIVD  F10     F0   F6   5      -                   -
ADDD  F6      F8   F2   6      12                  13
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations
Time  Name   Busy  Op     Vj (S1)    Vk (S2)    Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
-     Add3   No
0     Mult1  No
0     Mult2  Yes   DIVD   M*F4       M(34+R2)   -              -
Register result status
Clock  F0     F2        F4  F6         F8       F10      F12  ...  F30
17     M*F4   M(45+R3)  -   (M-M)+M()  M()-M()  Mult2    -    ...  -
Tomasulo Example Cycle 18
Instruction status
Instruction
j
k
Issue
LD
F6
34+
R2
1
LD
F2
45+
R3
2
MULTDF0
F2
F4
3
SUBD F8
F6
F2
4
DIVD F10 F0
F6
5
ADDD F6
F8
F2
6
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
40 Mult2 Yes DIVD
Register result status
Execution
complete
3
5
16
8
Clock
18
FU
Write
Result
4
6
17
9
12
Address
F10
F12 ...
13
S1
Vj
S2
Vk
M*F4
M(34+R2)
F0
F2
F4
M*F4
M(45+R3)
Load1
Load2
Load3
Busy
No
No
No
RS for j
Qj
RS for k
Qk
F6
F8
(M–M)+M()
M()–M() Mult2
F30
Tomasulo Example Cycle 57
Instruction status
Instruction
j
k
Issue
LD
F6
34+
R2
1
LD
F2
45+
R3
2
MULTDF0
F2
F4
3
SUBD F8
F6
F2
4
DIVD F10 F0
F6
5
ADDD F6
F8
F2
6
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
1 Mult2 Yes DIVD
Register result status
Execution
complete
3
5
16
8
Clock
57
FU
Write
Result
4
6
17
9
12
Address
F10
F12 ...
13
S1
Vj
S2
Vk
M*F4
M(34+R2)
F0
F2
F4
M*F4
M(45+R3)
Load1
Load2
Load3
Busy
No
No
No
RS for j
Qj
RS for k
Qk
F6
F8
(M–M)+M()
M()–M() Mult2
F30
Tomasulo Example Cycle 58
Instruction status
Instruction
j
k
Issue
LD
F6
34+
R2
1
LD
F2
45+
R3
2
MULTDF0
F2
F4
3
SUBD F8
F6
F2
4
DIVD F10 F0
F6
5
ADDD F6
F8
F2
6
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
Add3 No
0 Mult1 No
0 Mult2 Yes DIVD
Register result status
Execution
complete
3
5
16
8
58
12
S1
Vj
Write
Result
4
6
17
9
M*F4
M(34+R2)
Clock
F0
F2
F4
M*F4
M(45+R3)
58
FU
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
13
S2
Vk
RS for j
Qj
RS for k
Qk
F6
F8
(M–M)+M()
M()–M() Mult2
F30
Tomasulo Example Cycle 59
Instruction status
Instruction    j     k     Issue   Execution complete   Write Result
LD     F6      34+   R2    1       3                    4
LD     F2      45+   R3    2       5                    6
MULTD  F0      F2    F4    3       16                   17
SUBD   F8      F6    F2    4       8                    9
DIVD   F10     F0    F6    5       58                   59
ADDD   F6      F8    F2    6       12                   13
Load buffers
Name    Busy   Address
Load1   No
Load2   No
Load3   No
Reservation Stations
Time   Name    Busy   Op   S1 Vj   S2 Vk   RS for j Qj   RS for k Qk
0      Add1    No
0      Add2    No
       Add3    No
0      Mult1   No
0      Mult2   No
Register result status
Clock        F0     F2         F4   F6          F8        F10      F12 ... F30
59      FU   M*F4   M(45+R3)        (M–M)+M()   M()–M()   M*F4/M
Tomasulo vs. Scoreboard
• Is Tomasulo better?
• Finish in 59 cycles vs. 61 for scoreboard, why?
• We do reach the divide 3 cycles earlier…
Simultaneous read of operands for SUBD and MULTD
Tomasulo Loop Example
Loop: LD     F0, 0(R1)
      MULTD  F4, F0, F2
      SD     F4, 0(R1)
      SUBI   R1, R1, #8
      BNEZ   R1, Loop
• Multiply takes 4 clocks
• Loads may have cache misses
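As a reference point for the cycle-by-cycle slides, here is a minimal Python sketch of what the loop computes when executed strictly in program order. The starting value R1 = 80 is taken from the Cycle 0 slide. The memory contents, the value of F2, and the names `run_loop` and `memory` are assumptions made only for illustration.

```python
def run_loop(mem, r1, f2):
    """Sequential meaning of the five-instruction loop (no timing modeled)."""
    mem = dict(mem)              # work on a copy of the assumed memory image
    iterations = 0
    while True:
        f0 = mem[r1]             # LD    F0, 0(R1)
        f4 = f0 * f2             # MULTD F4, F0, F2
        mem[r1] = f4             # SD    F4, 0(R1)
        r1 -= 8                  # SUBI  R1, R1, #8
        iterations += 1
        if r1 == 0:              # BNEZ  R1, Loop (falls through when R1 == 0)
            break
    return mem, iterations


# Assumed memory image: one double at every 8-byte address from 0 to 80.
memory = {addr: float(addr + 1) for addr in range(0, 88, 8)}
result, n = run_loop(memory, r1=80, f2=2.0)
```

Each iteration touches a different address (80, 72, 64, ...), so successive iterations are independent apart from the reuse of the names F0 and F4. That reuse is what the reservation-station tags remove in the slides that follow.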
Loop Example Cycle 0
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock
0
R1
80
F0
Issue
S1
Vj
F2
Execution Write
complete Result
S2
Vk
F4
Busy Address
No
No
No
Qi
No
No
No
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Qi
Loop Example Cycle 1
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock
1
R1
80
F0
Qi
Issue
1
S1
Vj
F2
Execution Write
complete Result
S2
Vk
F4
Busy Address
Yes
80
No
No
Qi
No
No
No
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Load1
Loop Example Cycle 2
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
2
R1
80
F0
Qi
Load1
Issue
1
2
S1
Vj
Execution Write
complete Result
S2
Vk
R(F2)
F2
F4
Busy Address
Yes
80
No
No
Qi
No
No
No
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load1
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
Loop Example Cycle 3
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
3
R1
80
F0
Qi
Load1
Issue
1
2
3
S1
Vj
Execution Write
complete Result
S2
Vk
R(F2)
F2
F4
Busy Address
Yes
80
No
No
Qi
Yes
80 Mult1
No
No
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load1
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
Loop Example Cycle 4
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
4
R1
72
F0
Qi
Load1
Issue
1
2
3
S1
Vj
Execution Write
complete Result
S2
Vk
R(F2)
F2
F4
Busy Address
Yes
80
No
No
Qi
Yes
80 Mult1
No
No
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load1
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
Loop Example Cycle 5
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
5
R1
72
F0
Qi
Load1
Issue
1
2
3
S1
Vj
Execution Write
complete Result
S2
Vk
R(F2)
F2
F4
Busy Address
Yes
80
No
No
Qi
Yes
80 Mult1
No
No
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load1
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
Loop Example Cycle 6
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
6
R1
72
F0
Qi
Load1
Issue
1
2
3
6
S1
Vj
Execution Write
complete Result
S2
Vk
R(F2)
F2
F4
Busy Address
Yes
80
Yes
72
No
Qi
Yes
80 Mult1
No
No
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load1
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
Loop Example Cycle 7
Instruction status
Instruction
j
k
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
Reservation Stations
Time Name Busy
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
7
R1
72
iteration
1
1
1
2
2
2
S1
Vj
Op
MULTD
MULTD
F0
Qi
Issue
1
2
3
6
7
Load2
F2
Execution Write
complete Result
Busy Address
Yes
80
Yes
72
No
Qi
Yes
80 Mult1
No
No
R(F2)
R(F2)
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load1
SUBI R1
Load2
BNEZ R1
F4
F6
S2
Vk
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
Loop Example Cycle 8
Instruction status
Instruction
j
k
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
Reservation Stations
Time Name Busy
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
8
R1
72
iteration
1
1
1
2
2
2
Op
MULTD
MULTD
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Load2
F2
Execution Write
complete Result
Busy Address
Yes
80
Yes
72
No
Qi
Yes
80 Mult1
Yes
72 Mult2
No
R(F2)
R(F2)
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load1
SUBI R1
Load2
BNEZ R1
F4
F6
S2
Vk
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
Loop Example Cycle 9
Instruction status
Instruction
j
k
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
Reservation Stations
Time Name Busy
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
9
R1
64
iteration
1
1
1
2
2
2
Op
MULTD
MULTD
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Load2
F2
Execution Write
complete Result
9
Load1
Load2
Load3
Store1
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
R(F2)
R(F2)
Load1
Load2
F4
F6
F8
Busy Address
Yes
80
Yes
72
No
Qi
Yes
80 Mult1
Yes
72 Mult2
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
Loop Example Cycle 10
Instruction status
Instruction
j
k
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
Reservation Stations
Time Name Busy
0 Add1 No
0 Add2 No
0 Add3 No
4 Mult1 Yes
0 Mult2 Yes
Register result status
Clock
10
R1
64
Qi
iteration
1
1
1
2
2
2
Op
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
Load1
Load2
Load3
10
Store1
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
MULTD
MULTD
M(80) R(F2)
R(F2)
Load2
F0
F2
F6
Load2
F4
F8
Busy Address
No
Yes
72
No
Qi
Yes
80 Mult1
Yes
72 Mult2
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
Loop Example Cycle 11
Instruction status
Instruction
j
k
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
Reservation Stations
Time Name Busy
0 Add1 No
0 Add2 No
0 Add3 No
3 Mult1 Yes
4 Mult2 Yes
Register result status
Clock
11
R1
64
iteration
1
1
1
2
2
2
Op
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
Load1
Load2
Load3
10
11
Store1
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
MULTD
MULTD
M(80) R(F2)
M(72) R(F2)
F0
F2
Qi
F4
F6
F8
Busy Address
No
No
Yes
64 Qi
Yes
80 Mult1
Yes
72 Mult2
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
Loop Example Cycle 12
Instruction status
Instruction
j
k
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
Reservation Stations
Time Name Busy
0 Add1 No
0 Add2 No
0 Add3 No
2 Mult1 Yes
3 Mult2 Yes
Register result status
Clock
12
R1
64
iteration
1
1
1
2
2
2
Op
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
Load1
Load2
Load3
10
11
Store1
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
MULTD
MULTD
M(80) R(F2)
M(72) R(F2)
F0
F2
Qi
F4
F6
F8
Busy Address
No
No
Yes
64 Qi
Yes
80 Mult1
Yes
72 Mult2
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
Loop Example Cycle 13
Instruction status
Instruction
j
k
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
Reservation Stations
Time Name Busy
0 Add1 No
0 Add2 No
0 Add3 No
1 Mult1 Yes
2 Mult2 Yes
Register result status
Clock
13
R1
64
iteration
1
1
1
2
2
2
Op
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
Load1
Load2
Load3
10
11
Store1
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
MULTD
MULTD
M(80) R(F2)
M(72) R(F2)
F0
F2
Qi
F4
F6
F8
Busy Address
No
No
Yes
64 Qi
Yes
80 Mult1
Yes
72 Mult2
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
Loop Example Cycle 14
Instruction status
Instruction
j
k
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
LD
F0
0 R1
MULTDF4
F0 F2
SD
F4
0 R1
Reservation Stations
Time Name Busy
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes
1 Mult2 Yes
Register result status
Clock
14
R1
64
iteration
1
1
1
2
2
2
Op
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
Load1
14
Load2
Load3
10
11
Store1
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
MULTD
MULTD
M(80) R(F2)
M(72) R(F2)
F0
F2
Qi
F4
F6
F8
Busy Address
No
No
Yes
64 Qi
Yes
80 Mult1
Yes
72 Mult2
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
Loop Example Cycle 15
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 No
0 Mult2 Yes MULTD
Register result status
Clock
15
R1
64
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
Load1
14
15
Load2
Load3
10
11
Store1
15
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
M(72) R(F2)
F2
F4
F6
F8
Busy Address
No
No
Yes
64 Qi
Yes
80 M(80)*R(F2)
Yes
72 Mult2
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
Loop Example Cycle 16
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
16
R1
64
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
F2
Execution Write
complete Result
9
10
Load1
14
15
Load2
Load3
10
11
Store1
15
16
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
R(F2)
Load3
F4
F6
F8
Busy Address
No
No
Yes
64 Qi
Yes
80 M(80)*R(F2)
Yes
72 M(72)*R(F2)
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
Loop Example Cycle 17
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
17
R1
64
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
F2
Execution Write
complete Result
9
10
Load1
14
15
Load2
Load3
10
11
Store1
15
16
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
R(F2)
Load3
F4
F6
F8
Busy Address
No
No
Yes
64 Qi
Yes
80 M(80)*R(F2)
Yes
72 M(72)*R(F2)
Yes
64 Mult1
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
Loop Example Cycle 18
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
18
R1
56
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
14
15
18
10
11
15
16
S2
Vk
R(F2)
F2
F4
Busy Address
No
No
Yes
64 Qi
Yes
80 M(80)*R(F2)
Yes
72 M(72)*R(F2)
Yes
64 Mult1
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load3
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
Loop Example Cycle 19
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
19
R1
56
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
14
15
18
19
10
11
15
16
S2
Vk
R(F2)
F2
F4
Busy Address
No
No
Yes
64 Qi
No
Yes
72 M(72)*R(F2)
Yes
64 Mult1
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load3
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
Loop Example Cycle 20
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
20
R1
56
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
14
15
18
19
10
11
15
16
20
S2
RS for j
Vk
Qj
R(F2)
F2
F4
Busy Address
No
No
Yes
64 Qi
No
Yes
72 M(72)*R(F2)
Yes
64 Mult1
Load1
Load2
Load3
Store1
Store2
Store3
RS for k
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load3
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
Loop Example Cycle 21
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
21
R1
56
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
14
15
18
19
10
11
15
16
20
21
S2
RS for j
Vk
Qj
R(F2)
F2
F4
Busy Address
No
No
Yes
64 Qi
No
No
Yes
64 Mult1
Load1
Load2
Load3
Store1
Store2
Store3
RS for k
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load3
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
Tomasulo Summary
• Prevents the register file from becoming a bottleneck
• Avoids the WAR and WAW hazards of the Scoreboard
• Allows loop unrolling in HW
• Not limited to basic blocks (provided branch prediction)
• Lasting Contributions
– Dynamic scheduling
– Register renaming
– Load/store disambiguation
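The renaming idea behind the WAR/WAW bullet can be sketched in a few lines of Python. This is an illustrative model, not the hardware design from the slides: the class name `RegisterStatus`, its methods, and the register values are assumptions. The tags (Mult2, Add1, ...) are the reservation-station names used in the example, and for simplicity F0 is treated as already available when the DIVD issues.

```python
class RegisterStatus:
    """Toy register-result-status table: each register holds a value or a pending tag."""

    def __init__(self, values):
        self.value = dict(values)  # last value written to each register
        self.qi = {}               # register -> tag of the station that will write it

    def read(self, reg):
        # A pending register is read as a tag; otherwise its value is copied.
        if reg in self.qi:
            return ("tag", self.qi[reg])
        return ("val", self.value[reg])

    def issue(self, tag, dest, srcs):
        # Sources are captured before the destination is renamed to this tag.
        operands = [self.read(s) for s in srcs]
        self.qi[dest] = tag
        return operands

    def broadcast(self, tag, result):
        # Only registers still waiting on this exact tag accept the result.
        for reg in [r for r, t in self.qi.items() if t == tag]:
            self.value[reg] = result
            del self.qi[reg]


regs = RegisterStatus({"F0": 1.0, "F2": 2.0, "F6": 6.0, "F8": 8.0, "F10": 0.0})

# WAR: DIVD F10,F0,F6 copies F6 at issue, so the later ADDD F6,F8,F2 may write first.
div_ops = regs.issue("Mult2", "F10", ["F0", "F6"])
regs.issue("Add1", "F6", ["F8", "F2"])
regs.broadcast("Add1", 10.0)
war_ok = div_ops[1] == ("val", 6.0) and regs.value["F6"] == 10.0

# WAW: two pending writers of F6; only the most recently issued tag may update it.
regs.issue("Add2", "F6", ["F8", "F2"])
regs.issue("Add3", "F6", ["F8", "F2"])
regs.broadcast("Add2", 111.0)      # stale writer: F6 is now waiting on Add3
stale_ignored = regs.value["F6"] == 10.0
regs.broadcast("Add3", 222.0)
waw_ok = stale_ignored and regs.value["F6"] == 222.0
```

Because operands are copied (or replaced by a tag) at issue, a later instruction overwriting the same register name cannot disturb an earlier reader. Because a register accepts only the result whose tag it currently holds, an older writer finishing late cannot clobber a newer value.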
Next Time
• Tomasulo’s Algorithm