Lecture 11: Instruction Level Parallelism and the

Tomasulo’s Algorithm
RHK.S96 1
Tomasulo Organization
RHK.S96 2
Reservation Station Components
Op—Operation to perform in the unit (e.g., + or –)
Qj, Qk—Reservation stations producing source
registers
Vj, Vk—Value of Source operands
Rj, Rk—Flags indicating when Vj, Vk are ready
Busy—Indicates reservation station and FU is busy
Register result status—Indicates which functional
unit will write each register, if one exists. Blank when
no pending instructions that will write that register.
RHK.S96 3
Three Stages of Tomasulo Algorithm
1. Issue—get instruction from FP Op Queue
If reservation station free, the scoreboard issues instr &
sends operands (renames registers).
2. Execution—operate on operands (EX)
When both operands ready then execute;
if not ready, watch CDB for result
3. Write result—finish execution (WB)
Write on Common Data Bus to all awaiting units;
mark reservation station available.
RHK.S96 4
Tomasulo Example Cycle 0
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
0 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Write
Result
Load1
Load2
Load3
Busy Op
No
No
No
No
No
Clock
0
Issue
Execution
complete
F0
S1
Vj
S2
Vk
RS for j
Qj
RS for k
Qk
F2
F4
F6
F8
Busy
No
No
No
Address
F10
F12 ...
F30
FU
RHK.S96 5
Tomasulo Example Cycle 1
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
0 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Busy Op
No
No
No
No
No
Clock
1
Issue
1
F0
FU
Execution
complete
Write
Result
Load1
Load2
Load3
S1
Vj
S2
Vk
RS for j
Qj
RS for k
Qk
F2
F4
F6
F8
Busy
No
Yes
No
No
Address
34+R2
F10
F12 ...
F30
Load1
RHK.S96 6
Tomasulo Example Cycle 2
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
0 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Busy Op
No
No
No
No
No
Clock
2
Issue
1
2
F0
FU
Execution
complete
Write
Result
Load1
Load2
Load3
S1
Vj
S2
Vk
RS for j
Qj
RS for k
Qk
F2
F4
F6
F8
Load2
Busy
Yes
Yes
No
Address
34+R2
45+R3
F10
F12 ...
F30
Load1
RHK.S96 7
Tomasulo Example Cycle 3
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
0 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Busy Op
No
No
No
Yes MULTD
No
Clock
3
Issue
1
2
3
FU
Execution
complete
3
Write
Result
S1
Vj
S2
Vk
RS for j
Qj
R(F4)
Load2
F4
F6
F0
F2
Mult1
Load2
Load1
Load2
Load3
Busy
Yes
Yes
No
Address
34+R2
45+R3
F10
F12 ...
RS for k
Qk
F8
F30
Load1
RHK.S96 8
Tomasulo Example Cycle 4
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
0 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Busy Op
Yes SUBD
No
No
Yes MULTD
No
Clock
4
Issue
1
2
3
4
FU
Execution
complete
3
S1
Vj
M(34+R2)
F0
F2
Mult1
Load2
Write
Result
4
Load1
Load2
Load3
S2
Vk
RS for j
Qj
R(F4)
Load2
F4
F6
F8
M(34+R2)
Add1
Busy
No
Yes
No
Address
F10
F12 ...
45+R3
RS for k
Qk
Load2
F30
RHK.S96 9
Tomasulo Example Cycle 5
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
0 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Busy
Yes
No
No
Yes
Yes
Clock
5
FU
Issue
1
2
3
4
5
Op
SUBD
Execution
complete
3
5
Write
Result
4
S1
Vj
M(34+R2)
S2
Vk
RS for j
Qj
R(F4)
M(34+R2)
Load2
Mult1
F4
F6
M(34+R2)
MULTD
DIVD
F0
F2
Mult1
Load2
Busy
No
Yes
No
Address
F8
F10
F12 ...
Add1
Mult2
Load1
Load2
Load3
45+R3
RS for k
Qk
Load2
F30
RHK.S96 10
Tomasulo Example Cycle 6
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
2 Add1
0 Add2
Add3
10 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Busy
Yes
Yes
No
Yes
Yes
Clock
6
FU
Issue
1
2
3
4
5
6
Op
SUBD
ADDD
Execution
complete
3
5
Write
Result
4
6
S1
Vj
M(34+R2)
S2
Vk
M(45+R3)
M(45+R3)
Load1
Load2
Load3
RS for j
Qj
Busy
No
No
No
Address
F12 ...
RS for k
Qk
Add1
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
F10
Mult1
M(45+R3)
Add2
Add1
Mult2
F30
RHK.S96 11
Tomasulo Example Cycle 7
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
1 Add1
0 Add2
Add3
9 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Busy
Yes
Yes
No
Yes
Yes
Clock
7
FU
Issue
1
2
3
4
5
6
Op
SUBD
ADDD
Execution
complete
3
5
Write
Result
4
6
S1
Vj
M(34+R2)
S2
Vk
M(45+R3)
M(45+R3)
Load1
Load2
Load3
RS for j
Qj
Busy
No
No
No
Address
F12 ...
RS for k
Qk
Add1
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
F10
Mult1
M(45+R3)
Add2
Add1
Mult2
F30
RHK.S96 12
Tomasulo Example Cycle 8
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
8 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Busy
Yes
Yes
No
Yes
Yes
Clock
8
FU
Issue
1
2
3
4
5
6
Op
SUBD
ADDD
Execution
complete
3
5
Write
Result
4
6
Load1
Load2
Load3
Busy
No
No
No
Address
F12 ...
8
S1
Vj
M(34+R2)
S2
Vk
M(45+R3)
M(45+R3)
RS for j
Qj
RS for k
Qk
Add1
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
F10
Mult1
M(45+R3)
Add2
Add1
Mult2
F30
RHK.S96 13
Tomasulo Example Cycle 9
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
7 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Busy
No
Yes
No
Yes
Yes
Clock
9
FU
Issue
1
2
3
4
5
6
Op
Execution
complete
3
5
Write
Result
4
6
8
S1
Vj
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
9
S2
Vk
RS for j
Qj
RS for k
Qk
ADDD M()–M()
M(45+R3)
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
Mult1
M(45+R3)
Add2
M()–M() Mult2
F30
RHK.S96 14
Tomasulo Example Cycle 10
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
2 Add2
Add3
67 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Busy
No
Yes
No
Yes
Yes
Clock
10
FU
Issue
1
2
3
4
5
6
Op
Execution
complete
3
5
Write
Result
4
6
8
S1
Vj
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
9
S2
Vk
RS for j
Qj
RS for k
Qk
ADDD M()–M()
M(45+R3)
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
Mult1
M(45+R3)
Add2
M()–M() Mult2
F30
RHK.S96 15
Tomasulo Example Cycle 11
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
1 Add2
Add3
5 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Busy
No
Yes
No
Yes
Yes
Clock
11
FU
Issue
1
2
3
4
5
6
Op
Execution
complete
3
5
Write
Result
4
6
8
S1
Vj
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
9
S2
Vk
RS for j
Qj
RS for k
Qk
ADDD M()–M()
M(45+R3)
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
Mult1
M(45+R3)
Add2
M()–M() Mult2
F30
RHK.S96 16
Tomasulo Example Cycle 12
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
4 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Busy
No
Yes
No
Yes
Yes
Clock
12
FU
Issue
1
2
3
4
5
6
Op
Execution
complete
3
5
Write
Result
4
6
8
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
9
12
S1
Vj
S2
Vk
RS for j
Qj
RS for k
Qk
ADDD M()–M()
M(45+R3)
MULTD M(45+R3)
DIVD
R(F4)
M(34+R2)
Mult1
F0
F2
F4
F6
F8
Mult1
M(45+R3)
Add2
M()–M() Mult2
F30
RHK.S96 17
Tomasulo Example Cycle 13
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
3 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
9
12
13
Busy Op
No
No
No
Yes MULTD M(45+R3)
Yes DIVD
FU
Write
Result
4
6
8
S1
Vj
Clock
13
Issue
1
2
3
4
5
6
Execution
complete
3
5
F0
F2
Mult1
M(45+R3)
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
S2
Vk
RS for j
Qj
RS for k
Qk
R(F4)
M(34+R2)
Mult1
F4
F6
F8
(M–M)+M()
M()–M() Mult2
F30
RHK.S96 18
Tomasulo Example Cycle 14
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
2 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
9
12
13
Busy Op
No
No
No
Yes MULTD M(45+R3)
Yes DIVD
FU
Write
Result
4
6
8
S1
Vj
Clock
14
Issue
1
2
3
4
5
6
Execution
complete
3
5
F0
F2
Mult1
M(45+R3)
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
S2
Vk
RS for j
Qj
RS for k
Qk
R(F4)
M(34+R2)
Mult1
F4
F6
F8
(M–M)+M()
M()–M() Mult2
F30
RHK.S96 19
Tomasulo Example Cycle 15
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
1 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
9
12
13
Busy Op
No
No
No
Yes MULTD M(45+R3)
Yes DIVD
FU
Write
Result
4
6
8
S1
Vj
Clock
15
Issue
1
2
3
4
5
6
Execution
complete
3
5
F0
F2
Mult1
M(45+R3)
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
S2
Vk
RS for j
Qj
RS for k
Qk
R(F4)
M(34+R2)
Mult1
F4
F6
F8
(M–M)+M()
M()–M() Mult2
F30
RHK.S96 20
Tomasulo Example Cycle 16
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
0 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Busy Op
No
No
No
Yes MULTD M(45+R3)
Yes DIVD
FU
Write
Result
4
6
F0
F2
Mult1
M(45+R3)
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
9
12
S1
Vj
Clock
16
Issue
1
2
3
4
5
6
Execution
complete
3
5
16
8
13
S2
Vk
RS for j
Qj
RS for k
Qk
R(F4)
M(34+R2)
Mult1
F4
F6
F8
(M–M)+M()
M()–M() Mult2
F30
RHK.S96 21
Tomasulo Example Cycle 17
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
0 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
FU
Write
Result
4
6
17
9
12
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
13
S1
Vj
S2
Vk
M*F4
M(34+R2)
F0
F2
F4
M*F4
M(45+R3)
Busy Op
No
No
No
No
Yes DIVD
Clock
17
Issue
1
2
3
4
5
6
Execution
complete
3
5
16
8
RS for j
Qj
RS for k
Qk
F6
F8
(M–M)+M()
M()–M() Mult2
F30
RHK.S96 22
Tomasulo Example Cycle 18
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
0 Mult1
40 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
FU
Write
Result
4
6
17
9
12
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
13
S1
Vj
S2
Vk
M*F4
M(34+R2)
F0
F2
F4
M*F4
M(45+R3)
Busy Op
No
No
No
No
Yes DIVD
Clock
18
Issue
1
2
3
4
5
6
Execution
complete
3
5
16
8
RS for j
Qj
RS for k
Qk
F6
F8
(M–M)+M()
M()–M() Mult2
F30
RHK.S96 23
Tomasulo Example Cycle 57
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
0 Mult1
1 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
FU
Write
Result
4
6
17
9
12
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
13
S1
Vj
S2
Vk
M*F4
M(34+R2)
F0
F2
F4
M*F4
M(45+R3)
Busy Op
No
No
No
No
Yes DIVD
Clock
57
Issue
1
2
3
4
5
6
Execution
complete
3
5
16
8
RS for j
Qj
RS for k
Qk
F6
F8
(M–M)+M()
M()–M() Mult2
F30
RHK.S96 24
Tomasulo Example Cycle 58
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
0 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Write
Result
4
6
17
9
M*F4
M(34+R2)
F0
F2
F4
M*F4
M(45+R3)
Issue
1
2
3
4
5
6
Busy Op
No
No
No
No
Yes DIVD
Clock
58
Execution
complete
3
5
16
8
58
12
S1
Vj
FU
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
13
S2
Vk
RS for j
Qj
RS for k
Qk
F6
F8
(M–M)+M()
M()–M() Mult2
F30
RHK.S96 25
Tomasulo Example Cycle 59
Instruction status
Instruction
j
LD
F6
34+
LD
F2
45+
MULTDF0
F2
SUBD F8
F6
DIVD F10 F0
ADDD F6
F8
Reservation Stations
Time Name
0 Add1
0 Add2
Add3
0 Mult1
0 Mult2
Register result status
k
R2
R3
F4
F2
F6
F2
Write
Result
4
6
17
9
59
13
S2
Vk
RS for j
Qj
RS for k
Qk
F0
F2
F4
F6
F8
M*F4
M(45+R3)
(M–M)+M()
M()–M() M*F4/M
Issue
1
2
3
4
5
6
Busy Op
No
No
No
No
No
Clock
59
Execution
complete
3
5
16
8
58
12
S1
Vj
FU
Load1
Load2
Load3
Busy
No
No
No
Address
F10
F12 ...
F30
RHK.S96 26
Tomasulo Loop Example
Loop: LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
0
F0
0
R1
Loop
R1
F2
R1
#8
• Multiply takes 4 clocks
• Load have cache misses
RHK.S96 27
Loop Example Cycle 0
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock
0
R1
80
F0
Issue
S1
Vj
F2
Execution Write
complete Result
S2
Vk
F4
Busy Address
No
No
No
Qi
No
No
No
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Qi
RHK.S96 28
Loop Example Cycle 1
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 No
0 Mult2 No
Register result status
Clock
1
R1
80
F0
Qi
Issue
1
S1
Vj
F2
Execution Write
complete Result
S2
Vk
F4
Busy Address
Yes
80
No
No
Qi
No
No
No
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Load1
RHK.S96 29
Loop Example Cycle 2
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
2
R1
80
F0
Qi
Load1
Issue
1
2
S1
Vj
Execution Write
complete Result
S2
Vk
R(F2)
F2
F4
Busy Address
Yes
80
No
No
Qi
No
No
No
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load1
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
RHK.S96 30
Loop Example Cycle 3
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
3
R1
80
F0
Qi
Load1
Issue
1
2
3
S1
Vj
Execution Write
complete Result
S2
Vk
R(F2)
F2
F4
Busy Address
Yes
80
No
No
Qi
Yes
80 Mult1
No
No
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load1
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
RHK.S96 31
Loop Example Cycle 4
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
4
R1
72
F0
Qi
Load1
Issue
1
2
3
S1
Vj
Execution Write
complete Result
S2
Vk
R(F2)
F2
F4
Busy Address
Yes
80
No
No
Qi
Yes
80 Mult1
No
No
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load1
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
RHK.S96 32
Loop Example Cycle 5
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
5
R1
72
F0
Qi
Load1
Issue
1
2
3
S1
Vj
Execution Write
complete Result
S2
Vk
R(F2)
F2
F4
Busy Address
Yes
80
No
No
Qi
Yes
80 Mult1
No
No
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load1
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
RHK.S96 33
Loop Example Cycle 6
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
6
R1
72
F0
Qi
Load1
Issue
1
2
3
6
S1
Vj
Execution Write
complete Result
S2
Vk
R(F2)
F2
F4
Busy Address
Yes
80
Yes
72
No
Qi
Yes
80 Mult1
No
No
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load1
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
RHK.S96 34
Loop Example Cycle 7
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 Yes MULTD
Register result status
Clock
7
R1
72
F0
Qi
Load2
Issue
1
2
3
6
7
S1
Vj
F2
Execution Write
complete Result
Busy Address
Yes
80
Yes
72
No
Qi
Yes
80 Mult1
No
No
R(F2)
R(F2)
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load1
SUBI R1
Load2
BNEZ R1
F4
F6
S2
Vk
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
RHK.S96 35
Loop Example Cycle 8
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 Yes MULTD
Register result status
Clock
8
R1
72
F0
Qi
Load2
Issue
1
2
3
6
7
8
S1
Vj
F2
Execution Write
complete Result
Busy Address
Yes
80
Yes
72
No
Qi
Yes
80 Mult1
Yes
72 Mult2
No
R(F2)
R(F2)
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load1
SUBI R1
Load2
BNEZ R1
F4
F6
S2
Vk
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
RHK.S96 36
Loop Example Cycle 9
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 Yes MULTD
Register result status
Clock
9
R1
64
F0
Qi
Load2
Issue
1
2
3
6
7
8
S1
Vj
F2
Execution Write
complete Result
9
Load1
Load2
Load3
Store1
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
R(F2)
R(F2)
Load1
Load2
F4
F6
F8
Busy Address
Yes
80
Yes
72
No
Qi
Yes
80 Mult1
Yes
72 Mult2
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
RHK.S96 37
Loop Example Cycle 10
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
4 Mult1 Yes MULTD
0 Mult2 Yes MULTD
Register result status
Clock
10
R1
64
F0
Qi
Load2
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
Load1
Load2
Load3
10
Store1
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
M(80) R(F2)
R(F2)
Load2
F2
F6
F4
F8
Busy Address
No
Yes
72
No
Qi
Yes
80 Mult1
Yes
72 Mult2
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
RHK.S96 38
Loop Example Cycle 11
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
3 Mult1 Yes MULTD
4 Mult2 Yes MULTD
Register result status
Clock
11
R1
64
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
Load1
Load2
Load3
10
11
Store1
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
M(80) R(F2)
M(72) R(F2)
F2
F4
F6
F8
Busy Address
No
No
Yes
64 Qi
Yes
80 Mult1
Yes
72 Mult2
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
RHK.S96 39
Loop Example Cycle 12
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
2 Mult1 Yes MULTD
3 Mult2 Yes MULTD
Register result status
Clock
12
R1
64
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
Load1
Load2
Load3
10
11
Store1
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
M(80) R(F2)
M(72) R(F2)
F2
F4
F6
F8
Busy Address
No
No
Yes
64 Qi
Yes
80 Mult1
Yes
72 Mult2
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
RHK.S96 40
Loop Example Cycle 13
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
1 Mult1 Yes MULTD
2 Mult2 Yes MULTD
Register result status
Clock
13
R1
64
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
Load1
Load2
Load3
10
11
Store1
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
M(80) R(F2)
M(72) R(F2)
F2
F4
F6
F8
Busy Address
No
No
Yes
64 Qi
Yes
80 Mult1
Yes
72 Mult2
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
RHK.S96 41
Loop Example Cycle 14
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
1 Mult2 Yes MULTD
Register result status
Clock
14
R1
64
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
Load1
14
Load2
Load3
10
11
Store1
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
M(80) R(F2)
M(72) R(F2)
F2
F4
F6
F8
Busy Address
No
No
Yes
64 Qi
Yes
80 Mult1
Yes
72 Mult2
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
RHK.S96 42
Loop Example Cycle 15
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 No
0 Mult2 Yes MULTD
Register result status
Clock
15
R1
64
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
Load1
14
15
Load2
Load3
10
11
Store1
15
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
M(72) R(F2)
F2
F4
F6
F8
Busy Address
No
No
Yes
64 Qi
Yes
80 M(80)*R(F2)
Yes
72 Mult2
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult2
RHK.S96 43
Loop Example Cycle 16
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
16
R1
64
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
F2
Execution Write
complete Result
9
10
Load1
14
15
Load2
Load3
10
11
Store1
15
16
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
R(F2)
Load3
F4
F6
F8
Busy Address
No
No
Yes
64 Qi
Yes
80 M(80)*R(F2)
Yes
72 M(72)*R(72)
No
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
RHK.S96 44
Loop Example Cycle 17
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
17
R1
64
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
F2
Execution Write
complete Result
9
10
Load1
14
15
Load2
Load3
10
11
Store1
15
16
Store2
Store3
S2
RS for j RS for k
Vk
Qj
Qk
R(F2)
Load3
F4
F6
F8
Busy Address
No
No
Yes
64 Qi
Yes
80 M(80)*R(F2)
Yes
72 M(72)*R(72)
Yes
64 Mult1
Code:
LD
F0
MULTDF4
SD
F4
SUBI R1
BNEZ R1
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
RHK.S96 45
Loop Example Cycle 18
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
18
R1
56
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
14
15
18
10
11
15
16
S2
Vk
R(F2)
F2
F4
Busy Address
No
No
Yes
64 Qi
Yes
80 M(80)*R(F2)
Yes
72 M(72)*R(72)
Yes
64 Mult1
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load3
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
RHK.S96 46
Loop Example Cycle 19
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
19
R1
56
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
14
15
18
19
10
11
15
16
S2
Vk
R(F2)
F2
F4
Busy Address
No
No
Yes
64 Qi
No
Yes
72 M(72)*R(72)
Yes
64 Mult1
Load1
Load2
Load3
Store1
Store2
Store3
RS for j RS for k
Qj
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load3
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
RHK.S96 47
Loop Example Cycle 20
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
20
R1
56
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
14
15
18
19
10
11
15
16
20
S2
RS for j
Vk
Qj
R(F2)
F2
F4
Busy Address
No
No
Yes
64 Qi
No
Yes
72 M(72)*R(72)
Yes
64 Mult1
Load1
Load2
Load3
Store1
Store2
Store3
RS for k
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load3
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
RHK.S96 48
Loop Example Cycle 21
Instruction status
Instruction
j
k
iteration
LD
F0
0 R1
1
MULTDF4
F0 F2
1
SD
F4
0 R1
1
LD
F0
0 R1
2
MULTDF4
F0 F2
2
SD
F4
0 R1
2
Reservation Stations
Time Name Busy Op
0 Add1 No
0 Add2 No
0 Add3 No
0 Mult1 Yes MULTD
0 Mult2 No
Register result status
Clock
21
R1
56
F0
Qi
Issue
1
2
3
6
7
8
S1
Vj
Execution Write
complete Result
9
10
14
15
18
19
10
11
15
16
20
21
S2
RS for j
Vk
Qj
R(F2)
F2
F4
Busy Address
No
No
Yes
64 Qi
No
No
Yes
64 Mult1
Load1
Load2
Load3
Store1
Store2
Store3
RS for k
Qk
Code:
LD
F0
MULTDF4
SD
F4
Load3
SUBI R1
BNEZ R1
F6
F8
0 R1
F0 F2
0 R1
R1 #8
Loop
F10 F12 ... F30
Mult1
RHK.S96 49
Tomasulo Summary
•
•
•
•
Prevents Register as bottleneck
Avoids WAR, WAW hazards of Scoreboard
Allows loop unrolling in HW
Not limited to basic blocks (provided branch
prediction)
• Lasting Contributions
– Dynamic scheduling
– Register renaming
– Load/store disambiguation
• Next: More branch prediction
RHK.S96 50