COSC6385 Advanced Computer Architecture
Lecture 12. ROB and Precise Interrupts
Instructor: Weidong Shi (Larry), PhD
Computer Science Department
University of Houston
Precise Interrupt
Out-of-Order Execution
• We’re now executing instructions in data-flow order
  - Great! More performance
• But outside world can’t know about this
  - Must maintain illusion of sequentiality
Issue with Imprecise Interrupt
Left side:
    lw  r5, 8(r10)
    add r10, r9, r8
    add r12, r10, r7
Right side:
L1: add r3, r1, r2
    add r4, r1, r4
    add r2, r4, r4
Figure labels (right side): Instruction Page Fault; End of Non-Resident Page X; Start of Resident Page X+1
• add instructions take one cycle
• E.g.,
  - Load (left side) induces a “data page fault”
  - Add (right side) induces an “instruction page fault”
• If out-of-order completion is allowed
  - r10, r12 (or r2, r4) … will be modified
  - Wrong values will be used by the re-issued load
• Interrupt classes
  - Program interrupts (exceptions or traps)
  - External interrupts (asynchronous)
Precise Interrupts
• To reflect a sequential architecture model
  - Serially correct (think about a single-issue, non-pipelined processor)
• Keep “Precise State” of an execution
  - All instructions before the interrupted instruction must be completed
  - The state should appear as if no instruction after the interrupted instruction had issued
  - The interrupted PC should be presented to the interrupt handler (restartable)
• Similar to branch misprediction handling
• Out-of-order execution makes the ordering hard
  - Undo what comes after an interrupt
Why Support Precise Interrupts
• Need to maintain a precise state (for recovery)
• Software debugging
• I/O or timer interrupts
• Virtual memory (page fault)
• Instruction emulation
• Virtual machines
Support Precise Interrupt
• Buffer results
• Can reconstruct the scenario (state) as sequential execution
• Restart from saved PC with saved PC state
Reorder Buffer (ROB) [Smith & Pleszkun ’85, ’88]
• Architectural Register File (ARF) keeps the “in-order state”
• Reorder Buffer (ROB)
  - A circular buffer
  - Contains all in-flight instructions
  - Buffers the “lookahead state”
  - In-order allocation/deallocation with head/tail pointers
• When an exception occurs
  - Halt instruction issue
  - Revert to the in-order state using the RF and discard ROB results
• Also used for branch misprediction recovery
• Pentium Pro/II/III integrates the physical register file within the ROB
• Pentium 4 decouples the ROB and the physical register file
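The head/tail bookkeeping described above can be sketched in a few lines of Python. This is a toy model rather than any real processor's design: the class name `ReorderBuffer`, its methods, and the `Entry` fields are hypothetical names chosen for illustration, and the ARF is modeled as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    pc: int
    reg_dst: str
    done: bool = False
    value: Optional[int] = None


class ReorderBuffer:
    """Toy circular ROB: allocate at the tail, retire from the head."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0    # oldest in-flight instruction
        self.tail = 0    # next slot to be allocated
        self.count = 0   # number of in-flight instructions

    def allocate(self, pc, reg_dst):
        if self.count == len(self.slots):
            raise RuntimeError("ROB full: instruction issue must stall")
        idx = self.tail
        self.slots[idx] = Entry(pc, reg_dst)
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return idx

    def complete(self, idx, value):
        # Results may arrive out of order; they stay in the ROB for now.
        entry = self.slots[idx]
        entry.done, entry.value = True, value

    def commit(self, arf):
        """Retire finished entries from the head; return how many retired."""
        retired = 0
        while self.count and self.slots[self.head].done:
            entry = self.slots[self.head]
            arf[entry.reg_dst] = entry.value
            self.slots[self.head] = None
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
            retired += 1
        return retired

    def flush(self):
        """Discard all lookahead state, e.g. when an exception is taken."""
        self.slots = [None] * len(self.slots)
        self.head = self.tail = self.count = 0


arf = {"R1": 1, "R2": 2}
rob = ReorderBuffer(size=4)
a = rob.allocate(0xA000, "R1")
b = rob.allocate(0xA004, "R2")
rob.complete(b, 4)                # the younger instruction finishes first
retired_early = rob.commit(arf)   # nothing retires: the head is not done
rob.complete(a, 11)
retired_late = rob.commit(arf)    # both retire, in program order
```

The ARF only changes inside `commit`, so it always holds the in-order state, while `flush` drops whatever lookahead state the ROB was buffering.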
Reorder Buffer (with physical registers)
ROB entry fields: V | Spec? | Done? | PC | Exp event | RegDst | Data (physical register)
Head: oldest instruction
Tail: next instruction to be allocated
Sandy Bridge: 168-entry ROB
Handling Precise Interrupts
Reorder Buffer:
        V  Spec?  Done?  PC     Exp event  RegDst  Data  Instruction
Head    1  0      1      xA000  0000       R1      11    R1=R1+10
        1  0      0      xA004  0000       R2      -     R2=R2*2
        1  0      0      xA008  0000       FR1     -     FR1=FR2/0.0
Tail
The head entry is done, so it retires (V goes from 1 to 0) and its data is written to the ARF.
ARF: R1=11, R2=2, R3=3, R4=4, ..., R31
Handling Precise Interrupts
Reorder Buffer:
        V  Spec?  Done?  PC     Exp event  RegDst  Data  Instruction
        0                                                (retired)
Head    1  0      0      xA004  0000       R2      -     R2=R2*2
        1  0      0      xA008  0000       FR1     -     FR1=FR2/0.0
        1  0      0      xA00C  0000       R3      -     R3=R3+1
Tail
ARF: R1=11, R2=2, R3=3, R4=4, ..., R31
Handling Precise Interrupts
Reorder Buffer:
        V  Spec?  Done?  PC     Exp event  RegDst  Data  Instruction
        0                                                (retired)
Head    1  0      0      xA004  0000       R2      -     R2=R2*2
        1  0      0      xA008  0000       FR1     -     FR1=FR2/0.0
        1  0      1      xA00C  0000       R3      4     R3=R3+1
        1  0      0      xA010  0000       R4      -     R4=R4*2
Tail
ARF: R1=11, R2=2, R3=3, R4=4, ..., R31
Handling Precise Interrupts
Reorder Buffer:
        V  Spec?  Done?  PC     Exp event  RegDst  Data  Instruction
        0                                                (retired)
Head    1  0      1      xA004  0000       R2      4     R2=R2*2
        1  0      0      xA008  0010       FR1     -     FR1=FR2/0.0
        1  0      1      xA00C  0000       R3      4     R3=R3+1
        1  0      1      xA010  0000       R4      8     R4=R4*2
        1  0      0      xA014  0000       FR4     -     FR4=FR4*2.0
Tail
ARF: R1=11, R2=2 (updated to 4 as the head entry retires), R3=3, R4=4, ..., R31
Handling Precise Interrupts
Reorder Buffer:
        V  Spec?  Done?  PC     Exp event  RegDst  Data  Instruction
        0                                                (retired)
Head    0  0      1      xA004  0000       R2      4     R2=R2*2       (retiring)
        1  0      0      xA008  0010       FR1     -     FR1=FR2/0.0
        1  0      1      xA00C  0000       R3      4     R3=R3+1
        1  0      1      xA010  0000       R4      8     R4=R4*2
        1  0      0      xA014  0000       FR4     -     FR4=FR4*2.0
Tail
ARF: R1=11, R2=4, R3=3, R4=4, ..., R31
Handling Precise Interrupts
Reorder Buffer:
        V  Spec?  Done?  PC     Exp event  RegDst  Data  Instruction
        0                                                (retired)
        0                                                (retired)
Head    1  0      0      xA008  0010       FR1     -     FR1=FR2/0.0   <- Exception detected
        1  0      1      xA00C  0000       R3      4     R3=R3+1
        1  0      1      xA010  0000       R4      8     R4=R4*2
        1  0      0      xA014  0000       FR4     -     FR4=FR4*2.0
Tail
• The values 4 and 8 were not committed into the RF
• Back up the “PC” and the current RF
• Depending on the exception, the process will either abort or execution will resume from this excepting instruction
ARF: R1=11, R2=4, R3=3, R4=4, ..., R31
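The walkthrough above can be replayed with a short Python sketch. The PCs, destinations, and data values are the ones shown on the slides; the dictionary layout and the function name `retire` are hypothetical, and the sketch collapses the cycle-by-cycle animation into the final ROB contents as a simplifying assumption.

```python
# ROB contents, oldest first. "result" is the value waiting in the ROB;
# "exc" marks a recorded exception event (0010 on the slides).
rob = [
    {"pc": 0xA000, "dst": "R1",  "done": True,  "exc": False, "result": 11},
    {"pc": 0xA004, "dst": "R2",  "done": True,  "exc": False, "result": 4},
    {"pc": 0xA008, "dst": "FR1", "done": False, "exc": True,  "result": None},
    {"pc": 0xA00C, "dst": "R3",  "done": True,  "exc": False, "result": 4},
    {"pc": 0xA010, "dst": "R4",  "done": True,  "exc": False, "result": 8},
    {"pc": 0xA014, "dst": "FR4", "done": False, "exc": False, "result": None},
]


def retire(rob, arf):
    """Retire from the head in order; on an exception at the head,
    discard the rest of the ROB and return the excepting PC."""
    while rob:
        head = rob[0]
        if head["exc"]:
            saved_pc = head["pc"]
            rob.clear()        # lookahead results never reach the ARF
            return saved_pc
        if not head["done"]:
            break              # head still executing: wait
        arf[head["dst"]] = head["result"]
        rob.pop(0)
    return None


arf = {"R1": 1, "R2": 2, "R3": 3, "R4": 4}
saved_pc = retire(rob, arf)
```

After the call, `saved_pc` is the PC of the divide and the ARF holds only the results of the two older instructions; the finished values 4 and 8 for R3 and R4 are dropped with the rest of the ROB.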
Handling Speculative Execution
Reorder Buffer:
        V  Spec?  Done?  PC     Exp event  RegDst  Data  Instruction
Head    1  0      0      xB000  0000       R1      -     R1=R1+10
        1  0      0      xB004  0000       -       -     BEQ R1, R0, L1
Tail
ARF: R1=1, R2=2, R3=3, R4=4, ..., R31
Handling Speculative Execution
Reorder Buffer:
        V  Spec?  Done?  PC     Exp event  RegDst  Data  Instruction
Head    1  0      0      xB000  0000       R1      -     R1=R1+10
        1  0      0      xB004  0000       -       -     BEQ R1, R0, L1
        1  1      1      xC100  0000       R2      32    R2=R3 << 2
        1  1      0      xC104  0000       R1      -     R1=R2*R3
        1  1      0      xD2AC  0000       -       -     BEQ R3, R0, L1
        1  1      1      xD2B0  0000       R1      28    R1=R7+1
Tail
BEQ R1, R0, L1 is predicted TAKEN
ARF: R1=1, R2=2, R3=3, R4=4, ..., R31
Handling Speculative Execution
Reorder Buffer:
        V  Spec?  Done?  PC     Exp event  RegDst  Data  Instruction
Head    1  0      0      xB004  0000       -       -     BEQ R1, R0, L1   <- BEQ misprediction
        1  1      1      xC100  0000       R2      32    R2=R3 << 2
        1  1      0      xC104  0000       R1      -     R1=R2*R3
        1  1      0      xD2AC  0000       -       -     BEQ R3, R0, L1
        1  1      1      xD2B0  0000       R1      28    R1=R7+1
Tail
BEQ R1, R0, L1 is resolved, actually NOT TAKEN!
ARF: R1=11, R2=2, R3=3, R4=4, ..., R31
Handling Speculative Execution
Reorder Buffer:
        V  Spec?  Done?  PC     Exp event  RegDst  Data  Instruction
Head    1  0      0      xB004  0000       -       -     BEQ R1, R0, L1
Tail
Retire branch, clear all entries after the mis-speculated branch
ARF: R1=11, R2=2, R3=3, R4=4, ..., R31
Handling Speculative Execution
Reorder Buffer:
        V  Spec?  Done?  PC     Exp event  RegDst  Data  Instruction
Head    1  0      0      xB008  0000       R2      -     R2=R5 << 4
Tail
Continue execution from the correct path (fall through in this case)
ARF: R1=11, R2=2, R3=3, R4=4, ..., R31
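The recovery steps in this walkthrough (retire the branch, clear every entry after it, refetch from the fall-through path) can be sketched as follows. The entries mirror the ROB contents from the slides; the list-of-dicts layout and the function name `recover_from_mispredict` are hypothetical.

```python
def recover_from_mispredict(rob, branch_pc):
    """Retire the resolved branch and clear every entry allocated after it.
    Returns the squashed wrong-path entries."""
    idx = next(i for i, e in enumerate(rob) if e["pc"] == branch_pc)
    squashed = rob[idx + 1:]
    del rob[idx:]    # the branch retires; younger entries are discarded
    return squashed


rob = [
    {"pc": 0xB004, "inst": "BEQ R1, R0, L1", "spec": False},
    {"pc": 0xC100, "inst": "R2=R3 << 2",     "spec": True},
    {"pc": 0xC104, "inst": "R1=R2*R3",       "spec": True},
    {"pc": 0xD2AC, "inst": "BEQ R3, R0, L1", "spec": True},
    {"pc": 0xD2B0, "inst": "R1=R7+1",        "spec": True},
]

squashed = recover_from_mispredict(rob, branch_pc=0xB004)

# Fetch resumes on the correct (fall-through) path.
rob.append({"pc": 0xB008, "inst": "R2=R5 << 4", "spec": False})
```

None of the squashed entries ever wrote the ARF, so discarding them is enough to restore the register state; the speculative data values 32 and 28 simply disappear.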
RAT Recovery
• ARF state corresponds to the state prior to the oldest non-committed instruction
• As instructions are processed, the RAT corresponds to the register mapping after the most recently renamed instruction
• On a branch misprediction, wrong-path instructions are flushed from the machine
• The RAT is left with an invalid set of mappings corresponding to the wrong-path instruction state
(Figure: instruction stream with the ARF and RAT positions marked; after the mispredicted br, the RAT state is “?!?”)
Solution: Stall and Drain
• Allow all instructions to execute and commit; ARF corresponds to the last committed instruction
• ARF now corresponds to the state right before the next instruction to be renamed (foo)
• Reset RAT so that all mappings refer to the ARF
• Resume renaming the new correct-path instructions from fetch
• Pros: Very simple to implement
• Cons: Performance loss due to stalls
  - Correct-path instructions from fetch can’t rename because the RAT is wrong
(Figure: instruction stream with br, the flushed wrong path marked X, and foo, with the ARF and RAT positions marked)
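A minimal sketch of stall-and-drain, under simplifying assumptions: the RAT is modeled as a dictionary from architectural register to a tag string, `"ARF"` means "read the value from the ARF", and `in_flight` holds the results of the remaining correct-path instructions. All of these names and the tag strings are hypothetical.

```python
def stall_and_drain(in_flight, arf, rat):
    """in_flight: (dst, value) results of correct-path instructions that
    have not committed yet, oldest first."""
    # 1. Renaming is stalled while the remaining instructions commit.
    for dst, value in in_flight:
        arf[dst] = value
    in_flight.clear()
    # 2. The ARF now holds the state right before the next instruction
    #    to be renamed, so every mapping can simply point at the ARF.
    for reg in rat:
        rat[reg] = "ARF"


arf = {"R1": 1, "R2": 2}
rat = {"R1": "ROB7", "R2": "ROB9"}   # R2 was renamed by a wrong-path instruction
in_flight = [("R1", 11)]             # older than the branch, still in flight

stall_and_drain(in_flight, arf, rat)
```

The simplicity comes from never having to undo individual mappings; the cost is that no new instruction can be renamed until the drain has finished.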
Another Solution: Checkpointing
• At each branch, make a copy of the RAT (register mapping at the time of the branch)
• On a misprediction:
  1. Flush wrong-path instructions
  2. Deallocate RAT checkpoints
  3. Recover RAT from checkpoint
  4. Resume renaming
(Figure: instruction stream br, br, br, br, foo with a RAT copy at each branch plus the current RAT; labels: ARF, Checkpoint Free Pool)
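The checkpoint scheme can be sketched as a RAT that keeps a stack of copies, one per in-flight branch. The class and method names are hypothetical, physical register tags such as `"P10"` are made up for the example, and the sketch assumes that "deallocate RAT checkpoints" means releasing the checkpoints of the mispredicted branch and of all younger branches.

```python
class CheckpointedRAT:
    def __init__(self, regs):
        self.rat = {r: "ARF" for r in regs}
        self.checkpoints = []   # (branch_id, RAT copy), oldest first

    def rename(self, dst, tag):
        self.rat[dst] = tag

    def on_branch(self, branch_id):
        # Copy the mapping as it stands at the time of the branch.
        self.checkpoints.append((branch_id, dict(self.rat)))

    def on_mispredict(self, branch_id):
        # Pop from the youngest checkpoint back to the mispredicted branch;
        # younger checkpoints belong to the wrong path and are released.
        while self.checkpoints:
            bid, snapshot = self.checkpoints.pop()
            if bid == branch_id:
                self.rat = snapshot
                return
        raise KeyError(branch_id)


rat = CheckpointedRAT(["R1", "R2"])
rat.rename("R1", "P10")
rat.on_branch("br1")       # checkpoint taken: R1 -> P10, R2 -> ARF
rat.rename("R2", "P11")    # wrong-path rename
rat.on_branch("br2")       # wrong-path branch
rat.rename("R1", "P12")    # wrong-path rename
rat.on_mispredict("br1")   # br1 resolved as mispredicted
```

Recovery is a copy rather than a drain, so renaming can resume immediately; the price is storage for one RAT copy per in-flight branch.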
Commit Illustrated
• Make instruction execution “visible” to the outside world
  - “Commit” the changes to the architected state
(Figure: ROB holding instructions A, B, C, D, E, F, G, H, J, K in program order, with completion marks; results are written back (WB) to the ARF at commit.)
Outside World “sees”:
  A executed
  B executed
  C executed
  D executed
  E executed
Instructions execute out of program order, but outside world still “believes” it’s in-order
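To illustrate the point, the sketch below feeds an out-of-order completion sequence into an in-order commit rule and records what the outside world would see. The function name `visible_order` and the particular completion order are made up for the example.

```python
def visible_order(program_order, completion_order):
    """Return the order in which instructions become architecturally visible.

    An instruction commits only when it and every older instruction
    have finished executing."""
    done = set()
    committed = []
    head = 0    # index of the oldest uncommitted instruction
    for inst in completion_order:
        done.add(inst)
        while head < len(program_order) and program_order[head] in done:
            committed.append(program_order[head])
            head += 1
    return committed


program_order = ["A", "B", "C", "D", "E"]
completion_order = ["C", "A", "B", "E", "D"]   # out-of-order execution
seen = visible_order(program_order, completion_order)
```

Here C finishes first but stays hidden until A and B are done, so `seen` comes out in program order regardless of the completion order.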