Pipelining More on LLP, Dynamic Scheduling

MS108 Computer System I
Lecture 7
Tomasulo’s Algorithm
Prof. Xiaoyao Liang
2015/4/10
Tomasulo’s Algorithm
• From IBM 360/91
• Goal: High Performance using a limited number of
registers without a special compiler
– 4 double-precision FP registers on 360
– Uses register renaming
• Why Study a 1966 Computer?
– The descendants of this include: Alpha 21264, HP
PA-8000, MIPS R10000, Pentium III, PowerPC 604, …
Tomasulo Algorithm
• Control & buffers are distributed with Functional Units (FU)
– FU buffers called “reservation stations (RS)”
– Contain information about instructions, including operands
– More reservation stations than registers, so can do
optimizations compilers can’t
• Registers in instructions replaced by values or pointers to
reservation stations
– form of register renaming
– avoids WAR, WAW hazards
• Results pass to waiting FUs directly from the RS, not through
the registers (equivalent of forwarding). A Common Data Bus (CDB)
broadcasts results to all FUs (their RSes)
• Loads and Stores are treated as FUs with RSes as well
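The renaming described above can be sketched in a few lines of Python. This is an illustration only, not the 360/91 logic: the names `regs`, `reg_status`, `rename_source`, and `issue`, and the register values, are assumptions made for the example.

```python
# Hypothetical sketch of issue-time register renaming (all names are assumptions).
regs = {"F0": 1.0, "F2": 2.0, "F4": 4.0}   # made-up architectural register values
reg_status = {}                             # register -> tag of the RS that will write it

def rename_source(reg):
    """Return ('tag', rs_name) if a write is pending, else ('value', current value)."""
    if reg in reg_status:
        return ("tag", reg_status[reg])
    return ("value", regs[reg])

def issue(rs_name, dest, src1, src2):
    """Rename both sources first, then claim the destination for this station."""
    operands = (rename_source(src1), rename_source(src2))
    reg_status[dest] = rs_name
    return operands

first = issue("Mult1", "F4", "F0", "F2")   # copies F0 and F2 by value
second = issue("Add1", "F2", "F0", "F0")   # later writer of F2 (WAR with `first`)
third = issue("Add2", "F4", "F2", "F0")    # second writer of F4 (WAW with `first`)
```

Because `first` already holds the old value of F2, the later write by Add1 cannot disturb it (no WAR stall). Because only the newest tag is kept for F4, the result of Mult1 would no longer be written to F4 (no WAW stall).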
Tomasulo Organization
[Figure: Tomasulo organization. Blocks shown: FP Op Queue; FP Registers; Load Buffers (Load1 to Load6, from memory); Store Buffers (to memory); Reservation Stations Add1 to Add3 for the FP adders and Mult1, Mult2 for the FP multipliers; Common Data Bus (CDB).]
Tomasulo Organization
Reservation Station Components
• Busy: Indicates reservation station or FU is busy
• Op: Operation to perform in the unit (e.g., + or –)
• Vj, Vk: Values of source operands
• Qj, Qk: Reservation stations producing source
registers (value to be written)
– Note: Qj, Qk = 0 => ready
• A: Effective address
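The fields listed above map naturally onto a small record type. The sketch below is one possible encoding, not the lecture's: the class name, the lowercase field names, and the use of `None` in place of "Q = 0" are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    """One reservation station entry (field names follow the slide, lowercased)."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None    # value of source operand j, once known
    vk: Optional[float] = None    # value of source operand k, once known
    qj: Optional[str] = None      # producing station for j; None stands for "Qj = 0"
    qk: Optional[str] = None      # producing station for k; None stands for "Qk = 0"
    a: Optional[int] = None       # effective address (loads and stores)

    def ready(self) -> bool:
        """An entry may execute when it is busy and waits on no producer."""
        return self.busy and self.qj is None and self.qk is None

mult1 = ReservationStation("Mult1", busy=True, op="MULTD", vk=2.0, qj="Load1")
waiting = mult1.ready()            # still waiting on Load1
mult1.vj, mult1.qj = 3.0, None     # Load1's value arrives
runnable = mult1.ready()
```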
Tomasulo Organization
• Register result status: Qi
– Indicates which functional unit will write each register,
if one exists. Blank when no pending instruction
will write that register
• Common data bus
– Normal data bus: data + destination (“go to” bus)
– CDB: data + source (“come from” bus)
• 64 bits of data + 4 bits of Functional Unit source address
• Write if matches expected Functional Unit (produces result)
• Does the broadcast
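A minimal sketch of the "come from" broadcast follows. The bus carries the source tag plus the data, and each listener decides for itself whether to capture the value. The function name `broadcast`, the dictionary layout, and the sample values are assumptions for illustration.

```python
def broadcast(tag, value, stations, reg_status, regs):
    """Put (source tag, value) on the bus; every listener compares the tag itself."""
    captured = []
    for rs in stations:
        hit = False
        for q, v in (("qj", "vj"), ("qk", "vk")):
            if rs[q] == tag:              # this operand was waiting on the sender
                rs[v], rs[q] = value, None
                hit = True
        if hit:
            captured.append(rs["name"])
    for reg, owner in list(reg_status.items()):
        if owner == tag:                  # register still expects this unit's result
            regs[reg] = value
            del reg_status[reg]
    return captured

stations = [
    {"name": "Mult1", "vj": None, "vk": 2.0, "qj": "Load1", "qk": None},
    {"name": "Add1", "vj": 1.0, "vk": None, "qj": None, "qk": "Load1"},
    {"name": "Add2", "vj": None, "vk": 1.0, "qj": "Mult1", "qk": None},
]
reg_status = {"F0": "Load1", "F4": "Mult1"}
regs = {"F0": 0.0, "F4": 0.0}
captured = broadcast("Load1", 3.5, stations, reg_status, regs)
```

Both stations waiting on Load1 pick up the value in the same broadcast, while Add2, which waits on Mult1, is untouched.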
Three Stages of Tomasulo Algorithm
• 1. Issue—get instruction from FP Op Queue
– If reservation station free (no structural hazard),
control issues the instruction & sends operands
(renames registers).
• 2. Execute—operate on operands (EX)
– When both operands ready then execute;
if not ready, watch Common Data Bus for result
• 3. Write result—finish execution (WB)
– Write on Common Data Bus to all awaiting units;
mark reservation station available
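The three stages can be tied together in a toy cycle-by-cycle simulator. This is a sketch under stated assumptions, not the real hardware: the latencies, the station pool, the three-instruction program, and the timing rules (execution starts the cycle after issue or after the last operand arrives; one CDB write per cycle, the cycle after execution completes) are all choices made for the example.

```python
from collections import deque

LATENCY = {"ADDD": 2, "MULTD": 4}                      # assumed execution latencies
POOL = {"ADDD": ["Add1", "Add2"], "MULTD": ["Mult1"]}  # assumed reservation stations

def simulate(program, regs):
    queue, stations, reg_status, times = deque(program), {}, {}, []
    cycle = 0
    while queue or stations:
        cycle += 1
        # 1. Issue (in order): needs a free station of the right kind.
        if queue:
            op, dest, s1, s2 = queue[0]
            free = [n for n in POOL[op] if n not in stations]
            if free:
                queue.popleft()
                rs = {"op": op, "done": None, "earliest": cycle + 1,
                      "rec": [cycle, None, None]}
                for v, q, src in (("vj", "qj", s1), ("vk", "qk", s2)):
                    rs[q] = reg_status.get(src)            # tag if a write is pending
                    rs[v] = None if rs[q] else regs[src]   # otherwise copy the value
                reg_status[dest] = free[0]                 # rename the destination
                stations[free[0]] = rs
                times.append(rs["rec"])
        # 2. Execute: start once both operands are present.
        for rs in stations.values():
            if (rs["done"] is None and not rs["qj"] and not rs["qk"]
                    and cycle >= rs["earliest"]):
                rs["done"] = cycle + LATENCY[rs["op"]] - 1
                rs["rec"][1] = rs["done"]
        # 3. Write result: broadcast on the CDB and free the station.
        finished = [n for n, rs in stations.items()
                    if rs["done"] is not None and rs["done"] < cycle]
        if finished:
            tag = min(finished, key=lambda n: stations[n]["done"])
            rs = stations.pop(tag)
            value = rs["vj"] + rs["vk"] if rs["op"] == "ADDD" else rs["vj"] * rs["vk"]
            rs["rec"][2] = cycle
            for other in stations.values():
                for v, q in (("vj", "qj"), ("vk", "qk")):
                    if other[q] == tag:
                        other[v], other[q] = value, None
                        other["earliest"] = cycle + 1
            for reg, owner in list(reg_status.items()):
                if owner == tag:
                    regs[reg] = value
                    del reg_status[reg]
    return times   # one [issue, exec complete, write result] record per instruction

program = [("MULTD", "F4", "F0", "F2"),
           ("ADDD", "F6", "F4", "F2"),    # RAW on F4
           ("ADDD", "F2", "F0", "F0")]    # WAR on F2 with the instructions above
regs = {"F0": 3.0, "F2": 2.0, "F4": 0.0, "F6": 0.0}
times = simulate(program, regs)
```

In this run the third instruction finishes before the second even though it issued later, and the second still adds the old value of F2 because that value was copied into its reservation station at issue.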
Tomasulo Loop Example
Loop:  LD     F0, 0(R1)
       MULTD  F4, F0, F2
       SD     F4, 0(R1)
       SUBI   R1, R1, #8
       BNEZ   R1, Loop
• This time assume multiply takes 4 clock cycles in the
execution stage
• Assume 1st load takes 8 clock cycles (L1 cache miss) in
the execution stage; the 2nd load (hit) completes 1 cycle after the 1st
• Assume store takes 3 cycles in the execution stage
• For clarity, will not show clocks for SUBI, BNEZ
• Show about 2 iterations
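The timing assumptions above are enough to work out the instruction status columns by hand. The sketch below does that arithmetic in Python. The issue cycles (1, 2, 3, 6, 7, 8) are taken from the tables in this example. The rules "execution starts the cycle after the operand is written" and "the 2nd load starts only after the 1st load finishes executing" are assumptions chosen to match those tables, and all names are hypothetical.

```python
MISS, HIT, MULT, STORE = 8, 1, 4, 3   # execution latencies from the bullets above

# (label, issue cycle, latency, (producer index, event the start waits for) or None)
program = [
    ("LD    iter 1", 1, MISS, None),
    ("MULTD iter 1", 2, MULT, (0, "write")),    # needs F0 from the first load
    ("SD    iter 1", 3, STORE, (1, "write")),   # needs F4 from the first multiply
    ("LD    iter 2", 6, HIT, (0, "exec")),      # assumption: waits for the first load
    ("MULTD iter 2", 7, MULT, (3, "write")),
    ("SD    iter 2", 8, STORE, (4, "write")),
]

def schedule(program):
    rows = []
    for label, issue, latency, dep in program:
        ready = issue
        if dep is not None:
            index, event = dep
            ready = max(ready, rows[index][event])
        start = ready + 1                       # begin the cycle after becoming ready
        exec_done = start + latency - 1
        rows.append({"label": label, "issue": issue,
                     "exec": exec_done, "write": exec_done + 1})
    return rows

rows = schedule(program)
for r in rows:
    print(f'{r["label"]}  issue={r["issue"]:2d}  exec={r["exec"]:2d}  write={r["write"]:2d}')
```

Under these assumptions the first iteration completes execution at cycles 9, 14, 18 and the second at 10, 15, 19, with each result written one cycle later.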
Loop Example using simplified presentation for
load/store components
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1
1     MULTD  F4    F0  F2
1     SD     F4    0   R1
2     LD     F0    0   R1
2     MULTD  F4    F0  F2
2     SD     F4    0   R1
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   No
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
0      80  Qi
• ITER column: iteration count
• Added store buffers
• Code: the instruction loop
• R1: value of register used for address, iteration control
Loop Example Cycle 1
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   Yes   80
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
1      80  Qi  Load1
Loop Example Cycle 2
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1
1     MULTD  F4    F0  F2  2
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   Yes   80
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   Multd         R(F2)  Load1
      Mult2  No
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
2      80  Qi  Load1      Mult1
Loop Example Cycle 3
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1
1     MULTD  F4    F0  F2  2
1     SD     F4    0   R1  3
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   Yes   80
Load2   No
Load3   No
Store1  Yes   80    Mult1
Store2  No
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   Multd         R(F2)  Load1
      Mult2  No
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
3      80  Qi  Load1      Mult1
Loop Example Cycle 4
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1
1     MULTD  F4    F0  F2  2
1     SD     F4    0   R1  3
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   Yes   80
Load2   No
Load3   No
Store1  Yes   80    Mult1
Store2  No
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   Multd         R(F2)  Load1
      Mult2  No
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
4      80  Qi  Load1      Mult1
• Dispatching SUBI instruction (not in FP queue)
Loop Example Cycle 5
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1
1     MULTD  F4    F0  F2  2
1     SD     F4    0   R1  3
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   Yes   80
Load2   No
Load3   No
Store1  Yes   80    Mult1
Store2  No
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   Multd         R(F2)  Load1
      Mult2  No
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
5      72  Qi  Load1      Mult1
• And BNEZ instruction (not in FP queue)
Loop Example Cycle 6
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1
1     MULTD  F4    F0  F2  2
1     SD     F4    0   R1  3
2     LD     F0    0   R1  6
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80    Mult1
Store2  No
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   Multd         R(F2)  Load1
      Mult2  No
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
6      72  Qi  Load2      Mult1
• Notice that F0 never sees Load from location 80
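The Cycle 6 remark that F0 never sees the load from location 80 can be replayed in a few lines. This is a hedged sketch: the variable names, the helper `write_result`, and the loaded value 80.5 are made up for illustration.

```python
regs = {"F0": 0.0}                    # made-up initial register contents
reg_status = {}                       # register -> tag of the pending writer
mult1 = {"vj": None, "qj": None}      # the first MULTD's j operand

reg_status["F0"] = "Load1"            # cycle 1: LD F0,0(R1) issues to Load1
mult1["qj"] = reg_status["F0"]        # cycle 2: MULTD records the tag Load1
reg_status["F0"] = "Load2"            # cycle 6: second LD F0 overwrites the tag

def write_result(tag, value):
    """Broadcast on the CDB: update waiting operands and matching registers only."""
    if mult1["qj"] == tag:
        mult1["vj"], mult1["qj"] = value, None
    for reg in [r for r, t in reg_status.items() if t == tag]:
        regs[reg] = value
        del reg_status[reg]

write_result("Load1", 80.5)           # hypothetical contents of M[80]
```

Mult1 receives the value from Load1, but the register file does not: F0 is already reserved for Load2, so the older write is simply dropped (no WAW hazard).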
Loop Example Cycle 7
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1
1     MULTD  F4    F0  F2  2
1     SD     F4    0   R1  3
2     LD     F0    0   R1  6
2     MULTD  F4    F0  F2  7
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80    Mult1
Store2  No
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   Multd         R(F2)  Load1
      Mult2  Yes   Multd         R(F2)  Load2
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
7      72  Qi  Load2      Mult2
• Register file completely detached from computation
• First and second iterations completely overlapped
Loop Example Cycle 8
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1
1     MULTD  F4    F0  F2  2
1     SD     F4    0   R1  3
2     LD     F0    0   R1  6
2     MULTD  F4    F0  F2  7
2     SD     F4    0   R1  8
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80    Mult1
Store2  Yes   72    Mult2
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   Multd         R(F2)  Load1
      Mult2  Yes   Multd         R(F2)  Load2
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
8      72  Qi  Load2      Mult2
Loop Example Cycle 9
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1      9
1     MULTD  F4    F0  F2  2
1     SD     F4    0   R1  3
2     LD     F0    0   R1  6
2     MULTD  F4    F0  F2  7
2     SD     F4    0   R1  8
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80    Mult1
Store2  Yes   72    Mult2
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   Multd         R(F2)  Load1
      Mult2  Yes   Multd         R(F2)  Load2
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
9      72  Qi  Load2      Mult2
• Load1 completing: who is waiting?
• Note: Dispatching SUBI
Loop Example Cycle 10
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1      9          10
1     MULTD  F4    F0  F2  2
1     SD     F4    0   R1  3
2     LD     F0    0   R1  6      10
2     MULTD  F4    F0  F2  7
2     SD     F4    0   R1  8
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   No
Load2   Yes   72
Load3   No
Store1  Yes   80    Mult1
Store2  Yes   72    Mult2
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
4     Mult1  Yes   Multd  M[80]  R(F2)
      Mult2  Yes   Multd         R(F2)  Load2
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
10     64  Qi  Load2      Mult2
• Load2 completing: who is waiting?
• Note: Dispatching BNEZ
Loop Example Cycle 11
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1      9          10
1     MULTD  F4    F0  F2  2
1     SD     F4    0   R1  3
2     LD     F0    0   R1  6      10         11
2     MULTD  F4    F0  F2  7
2     SD     F4    0   R1  8
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80    Mult1
Store2  Yes   72    Mult2
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
3     Mult1  Yes   Multd  M[80]  R(F2)
4     Mult2  Yes   Multd  M[72]  R(F2)
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
11     64  Qi  Load3      Mult2
• Next load in sequence
Loop Example Cycle 12
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1      9          10
1     MULTD  F4    F0  F2  2
1     SD     F4    0   R1  3
2     LD     F0    0   R1  6      10         11
2     MULTD  F4    F0  F2  7
2     SD     F4    0   R1  8
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80    Mult1
Store2  Yes   72    Mult2
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
2     Mult1  Yes   Multd  M[80]  R(F2)
3     Mult2  Yes   Multd  M[72]  R(F2)
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
12     64  Qi  Load3      Mult2
• Why not issue third multiply?
Loop Example Cycle 13
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1      9          10
1     MULTD  F4    F0  F2  2
1     SD     F4    0   R1  3
2     LD     F0    0   R1  6      10         11
2     MULTD  F4    F0  F2  7
2     SD     F4    0   R1  8
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80    Mult1
Store2  Yes   72    Mult2
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
1     Mult1  Yes   Multd  M[80]  R(F2)
2     Mult2  Yes   Multd  M[72]  R(F2)
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
13     64  Qi  Load3      Mult2
• Why not issue third store?
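The questions on the Cycle 12 and Cycle 13 slides both come down to the issue rule: an instruction issues only if a reservation station of its kind is free, and issue is in order, so nothing behind a stalled instruction can issue either. A sketch with hypothetical names (`try_issue`, `pools`, `busy`), using the station occupancy shown in the tables:

```python
from collections import deque

def try_issue(queue, busy, pools):
    """Issue the head of the queue if a station of its kind is free.

    Never looks past the head: a stalled head blocks everything behind it.
    """
    if not queue:
        return None
    op = queue[0]
    for name in pools[op]:
        if not busy[name]:
            busy[name] = True
            queue.popleft()
            return name
    return None                                  # structural hazard: stall

pools = {"MULTD": ["Mult1", "Mult2"], "SD": ["Store1", "Store2", "Store3"]}
busy = {"Mult1": True, "Mult2": True,
        "Store1": True, "Store2": True, "Store3": False}
queue = deque(["MULTD", "SD"])                   # third iteration, after its LD

cycle12 = try_issue(queue, busy, pools)   # both multiply stations are busy
cycle13 = try_issue(queue, busy, pools)   # SD has a free buffer but is behind MULTD
busy["Mult1"] = False                     # Mult1 writes its result in cycle 15
cycle16 = try_issue(queue, busy, pools)
cycle17 = try_issue(queue, busy, pools)
```

The third multiply gets Mult1 once it is freed, and the third store follows into Store3 one cycle later, which matches the Cycle 16 and Cycle 17 tables.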
Loop Example Cycle 14
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1      9          10
1     MULTD  F4    F0  F2  2      14
1     SD     F4    0   R1  3
2     LD     F0    0   R1  6      10         11
2     MULTD  F4    F0  F2  7
2     SD     F4    0   R1  8
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80    Mult1
Store2  Yes   72    Mult2
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
0     Mult1  Yes   Multd  M[80]  R(F2)
1     Mult2  Yes   Multd  M[72]  R(F2)
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
14     64  Qi  Load3      Mult2
• Mult1 completing. Who is waiting?
Loop Example Cycle 15
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1      9          10
1     MULTD  F4    F0  F2  2      14         15
1     SD     F4    0   R1  3
2     LD     F0    0   R1  6      10         11
2     MULTD  F4    F0  F2  7      15
2     SD     F4    0   R1  8
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80    [80]*R2
Store2  Yes   72    Mult2
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
0     Mult2  Yes   Multd  M[72]  R(F2)
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
15     64  Qi  Load3      Mult2
• Mult2 completing. Who is waiting?
Loop Example Cycle 16
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1      9          10
1     MULTD  F4    F0  F2  2      14         15
1     SD     F4    0   R1  3
2     LD     F0    0   R1  6      10         11
2     MULTD  F4    F0  F2  7      15         16
2     SD     F4    0   R1  8
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80    [80]*R2
Store2  Yes   72    [72]*R2
Store3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
4     Mult1  Yes   Multd         R(F2)  Load3
      Mult2  No
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
16     64  Qi  Load3      Mult1
Loop Example Cycle 17
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1      9          10
1     MULTD  F4    F0  F2  2      14         15
1     SD     F4    0   R1  3
2     LD     F0    0   R1  6      10         11
2     MULTD  F4    F0  F2  7      15         16
2     SD     F4    0   R1  8
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80    [80]*R2
Store2  Yes   72    [72]*R2
Store3  Yes   64    Mult1
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   Multd         R(F2)  Load3
      Mult2  No
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
17     64  Qi  Load3      Mult1
Loop Example Cycle 18
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1      9          10
1     MULTD  F4    F0  F2  2      14         15
1     SD     F4    0   R1  3      18
2     LD     F0    0   R1  6      10         11
2     MULTD  F4    F0  F2  7      15         16
2     SD     F4    0   R1  8
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80    [80]*R2
Store2  Yes   72    [72]*R2
Store3  Yes   64    Mult1
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   Multd         R(F2)  Load3
      Mult2  No
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
18     64  Qi  Load3      Mult1
Loop Example Cycle 19
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1      9          10
1     MULTD  F4    F0  F2  2      14         15
1     SD     F4    0   R1  3      18         19
2     LD     F0    0   R1  6      10         11
2     MULTD  F4    F0  F2  7      15         16
2     SD     F4    0   R1  8      19
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   No
Load2   No
Load3   Yes   64
Store1  No
Store2  Yes   72    [72]*R2
Store3  Yes   64    Mult1
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   Multd         R(F2)  Load3
      Mult2  No
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
19     56  Qi  Load3      Mult1
Loop Example Cycle 20
Instruction status:
ITER  Instruction  j   k   Issue  Exec Comp  Write Result
1     LD     F0    0   R1  1      9          10
1     MULTD  F4    F0  F2  2      14         15
1     SD     F4    0   R1  3      18         19
2     LD     F0    0   R1  6      10         11
2     MULTD  F4    F0  F2  7      15         16
2     SD     F4    0   R1  8      19         20
Load and store buffers:
Buffer  Busy  Addr  Qk
Load1   Yes   56
Load2   No
Load3   Yes   64
Store1  No
Store2  No
Store3  Yes   64    Mult1
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   Multd         R(F2)  Load3
      Mult2  No
Code:  LD F0,0(R1);  MULTD F4,F0,F2;  SD F4,0(R1);  SUBI R1,R1,#8;  BNEZ R1,Loop
Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
20     56  Qi  Load1      Mult1
• Once again: In-order issue, out-of-order execution and out-of-order
completion.
Why can Tomasulo overlap
iterations of loops?
• Register renaming
– Multiple iterations use different physical
destinations for registers (dynamic loop unrolling).
• Reservation stations
– Buffer old values of registers - avoiding the WAR
stall that we saw in the scoreboard.
• Other perspective: Tomasulo builds data flow
dependency graph on the fly.
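The "data flow graph on the fly" view can be made concrete: if each issued instruction records which station will produce each of its sources, the producer-to-consumer pairs are exactly the edges of the graph. The sketch below uses the station assignments from the loop example; the function name and the tuple layout are assumptions, and it ignores completion (tags are never cleared), which is enough to show the dependences.

```python
def dataflow_edges(issued):
    """issued: list of (station, destination register or None, source registers),
    in program order. Returns (producer station, consumer station) pairs."""
    producer = {}      # register -> station that will write it (the renaming map)
    edges = []
    for station, dest, sources in issued:
        for reg in sources:
            if reg in producer:
                edges.append((producer[reg], station))
        if dest is not None:
            producer[dest] = station          # later readers see the newest writer
    return edges

# Two iterations of the loop; the integer register R1 is left out for brevity.
issued = [
    ("Load1", "F0", []),
    ("Mult1", "F4", ["F0", "F2"]),
    ("Store1", None, ["F4"]),
    ("Load2", "F0", []),
    ("Mult2", "F4", ["F0", "F2"]),
    ("Store2", None, ["F4"]),
]
edges = dataflow_edges(issued)
```

The result is two independent chains, Load1 to Mult1 to Store1 and Load2 to Mult2 to Store2. No edge connects the iterations even though both use F0 and F4, which is why they can overlap.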
Tomasulo’s scheme offers 2 major
advantages
(1) The distribution of the hazard detection logic
– Distributed reservation stations and the CDB
– If multiple instructions are waiting on a single result,
they can all be released simultaneously
by a broadcast on the CDB
– If a centralized register file were used, the units
would have to read their results from the registers
when register buses are available.
(2) The elimination of stalls for WAW and WAR
hazards