
Lecture 11: Modern Superscalar Processor Models
Generic Superscalar Models, Issue Queue-based Pipeline, Multiple-Issue Design
Generic Superscalar Processor Models
[Figure: two generic superscalar pipelines.
Issue queue based: Fetch -> Rename -> Wakeup/Select (schedule) -> Regfile -> FU + bypass (execute) -> D-cache -> commit
Reservation based (already studied): Fetch -> Rename -> ROB/Reg -> Wakeup/Select (schedule) -> FU + bypass (execute) -> D-cache -> commit]
Revised from Palacharla, PhD thesis, 1998
Issue Queue Based Pipeline
Fetch -> Rename -> Issue -> Reg-read -> Execute -> Writeback/Commit
- Core structure: register mapping table
- Rename: translate architectural registers into physical registers
- Issue: send instructions out to register read and then to execution
- Commit: process mis-predictions/exceptions, update the committed register mapping
- Why study? Used in Alpha 21264, MIPS R10000, Intel P4
Compare Reservation Station and Issue Queue
Pipeline stage sequence
  RS: IF -> REN -> REG/ROB -> SCHD -> …
  IQ: IF -> REN -> SCHD -> REG -> …
Mapping table vs. status table
  RS: the status table points each architectural register to either the architectural register file or a ROB entry
  IQ: always renames to a physical register
Register file
  RS: the architectural register file stores the architectural state
  IQ: physical register file only; there is no architectural register file! The mapping table determines the architectural state
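The stage-order difference shows up in how a source operand is resolved. Below is a minimal sketch, with hypothetical helper names and state: in the RS design the status table is consulted before scheduling, and the operand is either a value (from the ARF or a finished ROB entry) or a ROB tag to wait on; in the IQ design rename only yields a physical register number, and the value is read after the instruction is selected.

```python
def rs_read_operand(reg, status, arf, rob):
    """RS style (hypothetical helper): the status table says where the value lives."""
    tag = status.get(reg)                     # ROB index of the pending writer, if any
    if tag is None:
        return ("value", arf[reg])            # no pending writer: value from the ARF (Vj/Vk)
    if rob[tag]["done"]:
        return ("value", rob[tag]["value"])   # writer finished: copy the value out of the ROB
    return ("tag", tag)                       # still pending: wait on the ROB tag (Qj/Qk)


def iq_read_operand(reg, mapping):
    """IQ style (hypothetical helper): always a physical register; value read after issue."""
    return ("phys", mapping[reg])


# Hypothetical state: R1 has no pending writer, R2 is being written by ROB entry 7.
arf = {1: 4000, 2: 200}
status = {2: 7}
rob = {7: {"done": False, "value": None}}
print(rs_read_operand(1, status, arf, rob))   # ('value', 4000)
print(rs_read_operand(2, status, arf, rob))   # ('tag', 7)
print(iq_read_operand(2, {1: 1, 2: 32}))      # ('phys', 32)
```

The RS path copies values around (ARF or ROB into the station); the IQ path never does, which is the "no copying" advantage of the issue-queue design.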
Compare Reservation Station and Issue Queue
Station/queue entry fields
  RS: busy, fu, op, Qj, Qk, Vj, Vk
  IQ: busy, fu, op, Pj, Pk, ReadyJ, ReadyK
ROB
  RS: stores register values
  IQ: no register contents
Pros and cons of IQ:
  Good: no copying between the ROB and the register file
  Good: efficient use of registers
  Bad: complex mapping table design
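An IQ entry holds only tags (Pj, Pk) and ready bits, never operand values. A minimal sketch of such an entry with tag-broadcast wakeup, assuming a simplistic select policy that picks every ready entry in queue order (real select logic is priority- and port-limited):

```python
from dataclasses import dataclass


@dataclass
class IQEntry:
    """Issue-queue entry with the slide's fields; pj/pk are physical register tags, not values."""
    busy: bool
    fu: str
    op: str
    pj: int
    pk: int
    ready_j: bool
    ready_k: bool

    def wakeup(self, tag: int) -> None:
        # A completing instruction broadcasts its destination tag; matching sources become ready.
        if self.pj == tag:
            self.ready_j = True
        if self.pk == tag:
            self.ready_k = True

    def can_issue(self) -> bool:
        return self.busy and self.ready_j and self.ready_k


def select(queue: list) -> list:
    # Simplest possible select policy (an assumption): every ready entry, in queue order.
    return [e for e in queue if e.can_issue()]


queue = [
    IQEntry(True, "ALU", "ADD", pj=32, pk=3, ready_j=False, ready_k=True),  # waits on P32
    IQEntry(True, "ALU", "ADD", pj=1, pk=3, ready_j=True, ready_k=True),
]
print(len(select(queue)))  # 1
for entry in queue:
    entry.wakeup(32)       # the producer of P32 completes
print(len(select(queue)))  # 2
```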
Register Mapping Table
- Records the mapping from virtual (architectural) registers to physical registers
- The mapping is stored in RAM or CAM memories
Example, arch reg (virtual) => phy reg:
  R1 => P3
  R2 => P10
  R3 => P6
  R4 => P8
  R5 => P12
  …
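A RAM-style table is simply an array indexed by the architectural register number. The sketch below pairs it with a FIFO free list (an assumption; any free-register pool works) and renames one instruction against the mapping from the slide. Sources are read before the destination is remapped, and the old physical register is returned so it can be recorded for later freeing or rollback. The free registers P13 and P14 are hypothetical.

```python
from collections import deque


class RamMapTable:
    """RAM-style mapping table: indexed directly by the architectural register number."""

    def __init__(self, initial, free):
        self.table = dict(initial)   # arch reg number -> phys reg number
        self.free = deque(free)      # free physical registers (FIFO free list, an assumption)

    def lookup(self, arch):
        return self.table[arch]      # one RAM read per source operand

    def rename_dest(self, arch):
        """Map a destination to a fresh physical register; return (new, old)."""
        if not self.free:
            raise RuntimeError("free list empty: rename must stall")
        new = self.free.popleft()
        old = self.table[arch]
        self.table[arch] = new       # one RAM write per destination
        return new, old


# Mapping from the slide; the free registers P13 and P14 are hypothetical.
mt = RamMapTable({1: 3, 2: 10, 3: 6, 4: 8, 5: 12}, free=[13, 14])
srcs = (mt.lookup(1), mt.lookup(2))   # ADD R4, R1, R2 reads R1 and R2 first...
dest = mt.rename_dest(4)              # ...then renames R4
print(srcs, dest)                     # (3, 10) (13, 8)
```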
Register Renaming Examples
Loop:
LW R2, 0(R1)
ADD R2, R2, 1
SW R2, 0(R1)
ADD R1, R1, 4
BNE R2, R3, LOOP
LW returns 100, R1=4000
Renamed dynamic instructions:
…
BNE P2, P3, LOOP
LW P32, 0(P1)
ADD P33, P32, 1
SW P33, 0(P1)
ADD P34, P1, 4
BNE P33, P3, LOOP
…
Assume that when the first BNE is renamed, R1-R31 are mapped to P1-P31 and P32-P127 are free
The first BNE may be predicted either correctly or incorrectly
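The renaming above can be reproduced mechanically. This sketch encodes the loop body as (format, destination, sources) tuples, a representation chosen here for illustration, and renames it starting from the stated state (R1-R31 to P1-P31, P32-P127 free). Since BNE compares R2, whose latest producer is the ADD writing P33, its renamed form reads P33.

```python
from collections import deque

# Loop body after the first BNE: (format, destination, sources), architectural numbers.
LOOP = [
    ("LW  {d}, 0({s0})", 2, [1]),            # LW  R2, 0(R1)
    ("ADD {d}, {s0}, 1", 2, [2]),            # ADD R2, R2, 1
    ("SW  {s0}, 0({s1})", None, [2, 1]),     # SW  R2, 0(R1)
    ("ADD {d}, {s0}, 4", 1, [1]),            # ADD R1, R1, 4
    ("BNE {s0}, {s1}, LOOP", None, [2, 3]),  # BNE R2, R3, LOOP
]


def rename(program, mapping, free):
    renamed = []
    for fmt, dest, srcs in program:
        # Sources are looked up before the destination is remapped,
        # so ADD R2, R2, 1 reads the previous producer of R2.
        names = {f"s{i}": f"P{mapping[r]}" for i, r in enumerate(srcs)}
        if dest is not None:
            phys = free.popleft()
            mapping[dest] = phys
            names["d"] = f"P{phys}"
        renamed.append(fmt.format(**names))
    return renamed


mapping = {r: r for r in range(1, 32)}   # R1-R31 -> P1-P31
free = deque(range(32, 128))             # P32-P127 free
for line in rename(LOOP, mapping, free):
    print(line)
```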
Register Mapping Status
Mapping after renaming each instruction (R3 => P3, R4 => P4, R5 => P5, … throughout):
  After first BNE:       R1 => P1,  R2 => P2
  After LW:              R1 => P1,  R2 => P32
  After ADD R2, R2, 1:   R1 => P1,  R2 => P33
  After SW:              R1 => P1,  R2 => P33  (no change)
  After ADD R1, R1, 4:   R1 => P34, R2 => P33
Physical register values:
  P1=4000, P2=200, …, P32=100, P33=101, P34=4004, …
At commit (possible sequence):
  P1=4000, P2=200, …, P32=100, P33=?,   P34=4004
  P1=4000, P2=200, …, P32=100, P33=101, P34=4004
  P1=4000, P2=200, …, P32=100, P33=101, P34=4004
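The "P33=?" state arises when ADD R1 (writing P34) executes before ADD R2 (writing P33). A small sketch, assuming the load returns 100 as stated and ignoring SW/BNE because they write no register, shows that any issue order respecting the renamed dependences yields the same final physical register values:

```python
# Renamed register-writing instructions from the example: (dest, function, sources).
# SW and BNE write no register and are left out. LW's loaded value (100) is taken from the slide.
RENAMED = [
    ("P32", lambda: 100, []),              # LW  P32, 0(P1)
    ("P33", lambda a: a + 1, ["P32"]),     # ADD P33, P32, 1
    ("P34", lambda a: a + 4, ["P1"]),      # ADD P34, P1, 4
]


def initial_prf():
    # None plays the role of "?": the value has not been produced yet.
    return {"P1": 4000, "P2": 200, "P32": None, "P33": None, "P34": None}


def execute(order):
    prf = initial_prf()
    snapshots = []
    for i in order:
        dest, fn, srcs = RENAMED[i]
        if any(prf[s] is None for s in srcs):
            raise ValueError(f"{dest} issued before its operands were ready")
        prf[dest] = fn(*(prf[s] for s in srcs))
        snapshots.append(dict(prf))
    return snapshots


ooo = execute([0, 2, 1])                  # ADD P34 runs before ADD P33
print(ooo[1])                             # P33 still None ("?"), P34 already 4004
print(ooo[-1] == execute([0, 1, 2])[-1])  # True: same final values as program order
```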
Commit and Rollback
[Figure: the mapping states from the previous example in program order, from the commit point (oldest) to the rename point (newest):
  R1 => P1,  R2 => P2    (commit point)
  R1 => P1,  R2 => P32
  R1 => P1,  R2 => P33
  R1 => P1,  R2 => P33
  R1 => P34, R2 => P33   (rename point)
  R3 => P3, R4 => P4, R5 => P5, … in every state
Register values: P1=4000, P2=200, …, P32=100, P33=?, P34=4004]
- Commit successful: make the next mapping status the committed mapping status, and free the physical register that the committed instruction's destination previously mapped to
- Mis-prediction/exception: flush the pipeline and discard all later mappings
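The commit/rollback rule can be modeled conceptually by keeping one mapping snapshot per renamed instruction, as the figure does. This is only a model (real hardware cannot afford a full copy per instruction) and the class name is hypothetical. Returning the destinations of squashed instructions to the free list is omitted for brevity.

```python
class MappingHistory:
    """Conceptual model: one mapping snapshot per renamed instruction, oldest first."""

    def __init__(self, committed):
        self.states = [dict(committed)]   # states[0] is the committed mapping (commit point)
        self.freed = []                   # physical registers returned to the free list

    def rename(self, arch=None, phys=None):
        new = dict(self.states[-1])       # the newest state is the rename point
        if arch is not None:              # stores and branches leave the mapping unchanged
            new[arch] = phys
        self.states.append(new)

    def commit(self):
        old, nxt = self.states[0], self.states[1]
        for arch, phys in old.items():
            if nxt[arch] != phys:         # the committed instruction overwrote this mapping
                self.freed.append(phys)
        self.states.pop(0)                # the next state becomes the committed one

    def flush_after(self, n):
        """Mis-prediction/exception: keep the committed state plus the n oldest states after it."""
        del self.states[n + 1:]


h = MappingHistory({1: 1, 2: 2, 3: 3})
h.rename(2, 32)      # LW  R2 -> P32
h.rename(2, 33)      # ADD R2 -> P33
h.rename()           # SW: no change
h.rename(1, 34)      # ADD R1 -> P34
h.commit()           # LW commits: P2 is freed
print(h.freed)       # [2]
h.flush_after(2)     # squash everything after SW
print(h.states[-1])  # {1: 1, 2: 33, 3: 3}
```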
Program Execution Correctness
- Only committed instructions write to registers and memory
  Yes, from the programmer's viewpoint: only committed instructions' register outputs become visible
- Maintain correct data flow: a child instruction always uses the values from its parents
  Yes, in renamed form, and not affected by speculative execution
- A register/memory location receives the value of the last write
  Yes, from the programmer's viewpoint: the architectural mapping status is updated in program order
Note: memory correctness is not affected
Mapping Table Design – MIPS R10000
[Figure: mapping tables (the current mapping; saved copies "Mapping after Br1" to "Mapping after Br4"; the committed mapping) beside a branch stack holding Alternative PC1 to Alternative PC4]
RAM-based structure:
- Automatic, parallel saving of the current mapping when a branch is renamed
- On mis-prediction: immediately restore the mapping saved for that branch, flush the pipeline, and restart fetch at the alternative PC
- On commit of a branch instruction: make the corresponding mapping the committed one
- Stall if the branch stack is full
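The branch stack can be sketched as a bounded list of checkpoints, each holding a copy of the mapping and the alternative PC. This assumes depth 4 as drawn in the figure and follows the slide's policy of promoting a branch's mapping at commit. Branch ids and PCs below are hypothetical.

```python
class BranchStack:
    """Shadow-mapping checkpoints per unresolved branch; depth 4 as in the slide's figure."""

    def __init__(self, mapping, depth=4):
        self.current = dict(mapping)
        self.committed = dict(mapping)
        self.depth = depth
        self.entries = []                    # (branch id, saved mapping, alternative PC), oldest first

    def rename_branch(self, branch_id, alt_pc):
        if len(self.entries) == self.depth:
            return False                     # branch stack full: rename stalls
        self.entries.append((branch_id, dict(self.current), alt_pc))
        return True

    def mispredict(self, branch_id):
        for i, (bid, saved, alt_pc) in enumerate(self.entries):
            if bid == branch_id:
                self.current = dict(saved)   # restore the mapping saved at this branch
                del self.entries[i + 1:]     # younger branches' checkpoints are discarded
                return alt_pc                # fetch restarts at the alternative PC
        raise KeyError(branch_id)

    def commit_branch(self, branch_id):
        bid, saved, _ = self.entries.pop(0)  # branches commit in program order
        assert bid == branch_id
        self.committed = saved


bs = BranchStack({1: 1, 2: 2})
bs.rename_branch("Br1", alt_pc=0x100)
bs.current[2] = 32                   # an instruction after Br1 renames R2
bs.rename_branch("Br2", alt_pc=0x200)
bs.current[2] = 33
print(hex(bs.mispredict("Br1")))     # 0x100
print(bs.current)                    # {1: 1, 2: 2}
```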
Mapping Table Design – MIPS R10000
How about precise exceptions?
- We cannot preserve every mapping status for every instruction
- Solution: record the change of mapping in the ROB
  - ROB entry contains: destination architectural register, renamed physical register, old renamed physical register
  - On exception: roll back the mapping one instruction at a time, in reverse program order, four instructions per cycle
  - Slow, but how frequent are exceptions?
- Note: branch mis-prediction has fast recovery
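The ROB-based rollback can be sketched as walking the ROB entries from the youngest back to the faulting instruction and restoring each old physical register. The cycle count assumes 4 entries undone per cycle, as stated. The example state is the loop body from the renaming example with an exception assumed on the LW.

```python
import math


def rollback(mapping, rob_entries, width=4):
    """
    Undo renames recorded in the ROB, youngest first, as on an exception.
    rob_entries: (dest_arch, new_phys, old_phys) from the faulting instruction to the ROB tail,
    in program order; dest_arch is None for instructions without a register result.
    Returns the cycles spent, assuming `width` entries are undone per cycle.
    """
    for dest_arch, new_phys, old_phys in reversed(rob_entries):
        if dest_arch is None:
            continue
        assert mapping[dest_arch] == new_phys   # the entry's own mapping must still be current
        mapping[dest_arch] = old_phys
    return math.ceil(len(rob_entries) / width)


# The loop body from the renaming example, with the exception raised by LW.
mapping = {1: 34, 2: 33, 3: 3}
rob = [(2, 32, 2), (2, 33, 32), (None, None, None), (1, 34, 1)]
print(rollback(mapping, rob), mapping)   # 1 {1: 1, 2: 2, 3: 3}
```

The cost grows with the number of in-flight instructions, which is why this path is acceptable for rare exceptions but not for frequent branch mis-predictions.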
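Mapping Table Design – Alpha 21264
[Figure: CAM structure with one row per physical register p0 … pk. Each row holds an architectural register number and one valid bit per mapping column (committed mapping, current mapping), e.g. p0: 1 1, p1: 1 0, p2: 0 1, …, pk: 1 1. A lookup selects the row that matches and is valid.]
- Associative search on the architectural register index; the matching row is output as a physical register index (through an encoder)
- One column of valid bits represents one mapping; a column is allocated at rename to each instruction with a register output
- Each destination renaming changes one pair of valid bits (the old mapping's bit is cleared, the new mapping's bit is set)
- Fast recovery, even on exceptions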
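A CAM-style table inverts the RAM organization: each physical register row stores which architectural register it holds, and a column of valid bits says which rows are live in a given mapping. The sketch below assumes the caller supplies a free physical register that is not valid in any live column. Recovery just discards younger columns, so no ROB walk is needed.

```python
class CamMapTable:
    """CAM-style mapping table: one row per physical register, one valid-bit column per mapping."""

    def __init__(self, n_phys, initial):
        self.tag = [None] * n_phys          # architectural register number held in each row
        valid = [False] * n_phys
        for arch, phys in initial.items():
            self.tag[phys] = arch
            valid[phys] = True
        self.columns = [valid]              # columns[0]: committed mapping; columns[-1]: current

    def lookup(self, arch):
        current = self.columns[-1]
        # Associative search: every row compares its tag; "match and valid" selects one row,
        # and the row's position (through an encoder) is the physical register number.
        hits = [p for p, t in enumerate(self.tag) if t == arch and current[p]]
        assert len(hits) == 1
        return hits[0]

    def rename_dest(self, arch, free_phys):
        column = list(self.columns[-1])     # new column for this instruction
        column[self.lookup(arch)] = False   # clear the old mapping's valid bit...
        self.tag[free_phys] = arch
        column[free_phys] = True            # ...and set the new one: one pair of bits changes
        self.columns.append(column)

    def commit(self):
        self.columns.pop(0)                 # the next column becomes the committed mapping

    def recover(self, keep):
        del self.columns[keep + 1:]         # drop younger columns; no walk through the ROB


cam = CamMapTable(8, {1: 1, 2: 2, 3: 3})
cam.rename_dest(2, 5)
cam.rename_dest(2, 6)
print(cam.lookup(2))   # 6
cam.recover(0)         # exception on the first renamed instruction
print(cam.lookup(2))   # 2
```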
Multiple Issue Pipelines
- Each pipeline stage accepts k instructions per cycle: a k-issue processor
  - Alpha 21264: 4-issue
  - MIPS R10000: 4-issue
  - Intel P4: 3-issue
- Memory structures (e.g., the mapping table and register file) must have multiple ports, proportional to the issue width!
- What if the k instructions at rename depend on one another? Dependence check logic is needed!
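To see why port counts scale with issue width, here is a back-of-the-envelope count for a RAM mapping table. It assumes every instruction has 2 register sources and 1 destination, and that each destination's old mapping is also read so it can be recorded in the ROB (as in the R10000 exception scheme). Register-file ports, which scale the same way, are not counted.

```python
def map_table_ports(k, srcs=2, dests=1, read_old_dest=True):
    """
    Back-of-the-envelope port count for a RAM mapping table renaming k instructions per cycle.
    Assumes every instruction has `srcs` register sources and `dests` destinations, and that
    the old mapping of each destination is read for the ROB (read_old_dest).
    """
    reads = k * srcs + (k * dests if read_old_dest else 0)
    writes = k * dests
    return reads, writes


# prints: 1 (3, 1), then 3 (9, 3), then 4 (12, 4)
for k in (1, 3, 4):
    print(k, map_table_ports(k))
```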
Dependence Check Logic
[Figure: four instructions renamed in the same cycle. Their fields Rs0 Rt0 Rd0, Rs1 Rt1 Rd1, Rs2 Rt2 Rd2, Rs3 Rt3 Rd3 index the mapping table in parallel, producing Ps0 Ps1 Pd0, Ps0 Ps1 Pd1, Ps0 Ps1 Pd2, Ps0 Ps1 Pd3; no dependence check yet]
- Any change to the first renaming?
- What is the change to the second one? The third and fourth ones?
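One way to answer these questions is with a sketch of the check, assuming every instruction in the group has a destination and that the k new physical registers are already allocated. Instruction 0 keeps its table lookup unchanged. Instruction i compares each source with the destinations of the i older instructions in the group, and the youngest match overrides the table value. For k = 4 that is 2 × (0 + 1 + 2 + 3) = 12 comparators. The table update also needs a priority rule: if two instructions in the group write the same register, the youngest wins. The example group is hypothetical.

```python
def rename_group(group, table, new_phys):
    """
    Rename k instructions in one cycle with intra-group dependence checking.
    group:    (rs, rt, rd) architectural numbers, oldest first; every instruction is
              assumed to have a destination.
    table:    mapping table (arch -> phys) as of the start of the cycle; updated in place.
    new_phys: k free physical registers, new_phys[i] for instruction i (assumed pre-allocated).
    """
    renamed = []
    for i, (rs, rt, _) in enumerate(group):
        ps, pt = table[rs], table[rt]        # all k table reads happen in parallel
        for j in range(i):                   # compare with every older destination in the group
            if group[j][2] == rs:
                ps = new_phys[j]             # later (younger) matches override earlier ones
            if group[j][2] == rt:
                pt = new_phys[j]
        renamed.append((ps, pt, new_phys[i]))
    for i, (_, _, rd) in enumerate(group):   # youngest writer of each register wins
        table[rd] = new_phys[i]
    return renamed


# Hypothetical group: R3=R1+R2; R4=R3+R1; R3=R4+R3; R5=R3+R4
table = {r: r for r in range(1, 6)}
print(rename_group([(1, 2, 3), (3, 1, 4), (4, 3, 3), (3, 4, 5)], table, [32, 33, 34, 35]))
# [(1, 2, 32), (32, 1, 33), (33, 32, 34), (34, 33, 35)]
```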