Performance Implications of Faults in Prediction Arrays
Nikolas Ladas, Yiannakis Sazeides (University of Cyprus)
Veerle Desmet (Ghent University)
DFR'10, HiPEAC 2010
Pisa, Italy - 24/1/2010
Motivation
● Technology scaling: opportunities and challenges
● Reliability and computing tomorrow
  • Failures will not be exceptional
  • Various sources of failures:
    • Manufacturing: imperfections, process variation
    • Physical phenomena: soft errors, wear-out
    • Power constraints: operation below Vcc-min
● Key challenge: provide reliable operation with little or no performance degradation in the presence of faults, using low-overhead solutions
Nikolas Ladas 24/1/2010
Architectural vs Non-Architectural Faults
● So far, research has mainly focused on correctness
● Emphasis on architectural structures, e.g. caches, registers, buses, ALUs
● However, faults can also occur in non-architectural structures, e.g. predictor and replacement arrays
● Faults in non-architectural structures may degrade performance
● Not an issue for soft errors
● Can be a problem for persistent faults: wear-out, process variation, operation below Vcc-min

Non-architectural Resources

Arrays:
• line predictor
• branch direction predictor
• return-address stack
• indirect jump predictor
• memory dependence predictor
• way, hit/miss, bank predictors
• replacement arrays (various caches)
• hysteresis arrays (various predictors)

Non-arrays:
• branch target address adder
• memory prefetch adder
• ...

[Chart: EV6-like core array-bit breakdown — roughly 90% architectural, 10% non-architectural]
This talk…
● Quantify the performance implications of faults in non-architectural array structures
● Identify which non-architectural array structures are the most sensitive to faults
● Do we need to worry about protecting these structures?
Outline
● Fault model / experimental framework
● Performance implications of faults when all non-architectural arrays are faulty
● Criticality of the non-architectural arrays studied
● Fault semantics
● Conclusions and future direction
Faults and Arrays
● Faults may occur in different parts of an array
● We only consider cell faults

[Figure: SRAM array schematic — a wordline decoder driving wordlines (WL), bitline pairs (BL/BL') with bitline drivers, and the cell grid]
Array Fault Modeling Key Parameters
● Number of faults:
  • consider % of cells that are faulty: 0.125 and 0.5
  • understand performance trends with an increasing number of faults
● Fault locations:
  • consider random fault locations, each affecting 1 cell
  • try to capture average behavior
● Model for each fault:
  • each faulty cell is randomly set to either stuck-at-1 or stuck-at-0
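The fault model above can be sketched in a few lines; `make_fault_map` and `read_bit` are hypothetical helper names, and the 4K-entry, 1-bit/entry array in the example is illustrative rather than one of the studied structures:

```python
import random

def make_fault_map(num_bits, faulty_fraction, seed=0):
    """Randomly pick cells and assign each a stuck-at value (0 or 1)."""
    rng = random.Random(seed)
    num_faulty = round(num_bits * faulty_fraction)
    cells = rng.sample(range(num_bits), num_faulty)
    # map: cell index -> stuck-at value
    return {c: rng.randrange(2) for c in cells}

def read_bit(stored, index, fault_map):
    """Reading a faulty cell returns its stuck-at value, not the stored bit."""
    return fault_map.get(index, stored[index])

# Example: a 4096-bit array with 0.5% faulty cells
faults = make_fault_map(4096, 0.005)
stored = [0] * 4096
assert len(faults) == round(4096 * 0.005)  # ~20 faulty cells
```

Each experiment in the study fixes one such fault map for a run, so repeated runs with the same map are deterministic.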
Processor Model
• EV7-like processor with a 15-stage pipeline
  • 4-way out-of-order, mispredictions resolved at commit
• Non-architectural arrays considered:
  • Line predictor array: 4K entries, 11 bits/entry
  • Line predictor hysteresis array: 4K entries, 2 bits/entry
  • LRU array for 2-way 64KB 64B/block I$: 512 entries, 1 bit/entry
  • LRU array for 2-way 64KB 64B/block D$: 512 entries, 1 bit/entry
  • Gshare direction predictor: 32K entries, 2 bits/entry
  • Return address stack: 16 entries, 31 bits/entry
  • Memory dependence predictor (load-wait): 1024 entries, 1 bit/entry
• sim-alpha simulator
• SPEC CPU 2000 benchmarks, 100 M instructions (representative regions)
Experiments
● Baseline performance: runs with no faults
● For experiments with faults:
  • for each run, all faulty arrays have the same % of faulty bits: 0.125 or 0.5
  • ALL experiments are performed using the same 100 randomly generated fault maps (50 for each % of faulty bits)

Faulty bits per structure:

Structure (size in bits)                    0.125%   0.5%
Gshare direction predictor (65536)              82    328
Line predictor array (45056)                    56    225
Line predictor hysteresis array (8192)          10     41
Memory dependence predictor (1024)               1      5
2-way 64KB 64B/block I$ LRU array (512)          1      3
2-way 64KB 64B/block D$ LRU array (512)          1      3
Return address stack (496)                       1      3
Performance with 0.125% Faulty Bits (all arrays faulty)
Performance with 0.5% of Faulty Bits (all arrays faulty)
Observations with all arrays faulty
• Performance degradation is substantial even with a small % of faulty bits
• Both INT and FP benchmarks can degrade

                        0.125%   0.5%
  Average degradation     1%     3.5%
  Max degradation        39%     53%

• Degradation is benchmark-specific:
  • instruction mix (different number and type of vulnerable instructions)
  • programs with high prediction accuracy are more vulnerable than those with low accuracy
  • when few array entries are accessed by a program, it takes a large number of faults for faulty entries to be accessed
  • some benchmarks are memory-dominated
• Worst-case degradation is much greater than the average
  • will cause performance variation between otherwise identical cores/chips
• Are all bits equally vulnerable? Which unit(s) matter the most?
Performance for Each Structure
(0.125% faulty bits)
26 benchmarks x 50 experiments for each section
Performance for Each Structure
(0.5% faulty bits)
26 benchmarks x 50 experiments for each section
Observations
• For the processor configuration used in this study, the various non-architectural units are not equally vulnerable to the same fraction of faults
• The RAS and BPRED are the most sensitive to faults
• The line predictor and load-wait predictor degrade performance significantly at 0.5% faults
• The 2-way I$ and D$ are not sensitive, even at 0.5% faults in the LRU array
Reasons for Variable Vulnerability Across Units
● Semantics of faults vary across units
  • some faults cause a pipeline flush, others delay the execution of an instruction, others cause a one-cycle bubble
  • faults causing delays can be less severe, since they can be hidden in the shadow of a misprediction or by out-of-order execution
● Units with typically higher accuracy are more vulnerable (RAS and conditional predictor)
● Even within a unit, faults can have different semantics
Semantics of Faults for a 2-bit Replacement

State   Action
0x      Replace (R)
1x      No replace (N)

Fault-free sequence: 00 R, 01 R, 11 N, 10 N. Depending on the stuck-at value (0/1) of the faulty bit, the state machine can degenerate into "Always Replace" (stuck in the 0x states) or "Never Replace" (stuck in the 1x states).
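A minimal sketch of this state machine, assuming the action is determined by the high-order bit (0x replaces, 1x does not); the function name and interface are invented for illustration:

```python
def replace_decision(state, stuck_bit=None, stuck_val=None):
    """2-bit replacement state: high bit 0 -> Replace, high bit 1 -> No replace.

    stuck_bit: position of the faulty cell (1 = high-order bit), or None.
    stuck_val: the value the faulty cell is stuck at (0 or 1).
    """
    if stuck_bit is not None:
        if stuck_val:
            state |= (1 << stuck_bit)    # stuck-at-1 forces the bit to 1
        else:
            state &= ~(1 << stuck_bit)   # stuck-at-0 forces the bit to 0
    return "R" if state < 2 else "N"     # 0x -> Replace, 1x -> No replace

states = [0b00, 0b01, 0b11, 0b10]
print([replace_decision(s) for s in states])                            # fault-free
print([replace_decision(s, stuck_bit=1, stuck_val=0) for s in states])  # Always Replace
print([replace_decision(s, stuck_bit=1, stuck_val=1) for s in states])  # Never Replace
```

With the high-order bit stuck, every state collapses onto one side of the replace/no-replace decision, which is exactly the "Always Replace" / "Never Replace" degeneration described above.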
Repair Mechanism: XOR Remapping

● Access map: counts accesses per entry during an interval
● Fault map: indicates which entries are faulty (can be determined at manufacturing test, or at very coarse intervals using BIST)
● Remap the index using XOR to minimize faulty accesses
● At regular intervals, search for the optimal XOR value using the access map and fault map

[Figure: example access map and fault map, before and after remapping with XOR value 1; faulty accesses: 143]
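The search for the optimal XOR value can be sketched as an exhaustive scan, assuming the index is simply XORed with a constant and the array size is a power of two; the function names and the 8-entry access/fault maps below are hypothetical, not the slide's example:

```python
def faulty_accesses(access, fault, x):
    """Accesses that land on a faulty physical entry when index i maps to i ^ x."""
    return sum(a for i, a in enumerate(access) if fault[i ^ x])

def best_xor(access, fault):
    """Exhaustively try every XOR value and keep the one with the fewest faulty accesses."""
    n = len(access)  # must be a power of two so i ^ x stays in range
    return min(range(n), key=lambda x: faulty_accesses(access, fault, x))

# Hypothetical 8-entry array: entry 2 is faulty and heavily accessed
access = [0, 5, 100, 0, 3, 0, 20, 1]
fault  = [False, False, True, False, False, False, False, False]
x = best_xor(access, fault)
print(x, faulty_accesses(access, fault, 0), faulty_accesses(access, fault, x))
```

Here remapping steers the hot logical entry away from the faulty physical entry, so the 100 faulty accesses at XOR 0 drop to 0 at the chosen XOR value; a hardware version would compute the same minimum from the access-map counters at interval boundaries.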
Results
• 26 benchmarks x 10 fault maps per category
• Remapping recovers most of the performance degradation
• It is possible to make things worse by remapping when there is no need
Summary-Conclusions
● Faults in non-architectural arrays can degrade processor performance
● Not all faults are equally important; fault semantics vary
  • RAS and the conditional branch predictor are the most critical
● Faults can cause performance non-determinism across otherwise identical chips, or between the cores of the same chip
Future Work
● Develop an analytical model to predict the performance distribution for a given failure rate
● Understand the implications of faults for other architectural and non-architectural structures
Acknowledgments
● Costas Kourougiannis
● Funding: University of Cyprus, Ghent University, HiPEAC, Intel
Thanks!
BACKUP SLIDES
Fault Semantics
● Line Predictor Array:
  • incorrect prediction
  • conditionals and returns get corrected within a cycle; indirects are resolved much later
● Line Predictor Hysteresis Array:
  • always update the prediction on a misprediction
  • never update
● Gshare Direction Predictor:
  • faulty entries always predict taken or always not-taken
  • incorrect prediction that gets resolved late (25% chance of being lucky)
● 2-way 64KB 64B/block I$ and D$ LRU arrays:
  • converts sets with a faulty LRU bit into direct-mapped sets; more misses, but they can be hidden
● Return address stack:
  • return misprediction is resolved late
● Memory dependence predictor (load-wait):
  • independent load waits (common case: we should not wait); can be partially hidden
  • dependent load does not wait (this should rarely be a serious problem)
Processor Pipeline
[Figure: processor front-end pipeline — line predictor (NLS_PC), branch predictor, RAS, instruction cache (hit, or miss to L2), and PC-update adders; fetch, slot, writeback, and commit stages]
Line predictor Logical structure
[Figure: line predictor logical structure — 2-way, 512-set instruction cache, each way split into sub-banks sb0-sb3 holding inst0-inst3; per-line TAG, predecode bits, and valid sub-bank bits]
Functional Faults and Array Logical View
● Not practical to study faults at the physical level
● Functional models: abstractions that ease the study of faults
● Fault locations: cell, input address, input/output data
● We only consider cell faults

[Figure: array logical view — a row address selects a cell; data_in is written, an output bit is read]
BIST for Detecting Faults and Updating Fault Map
Example Remapping Search Algo
Interleaved vs Non-Interleaved Design Style (1)
● Each array wordline contains many entries
● In the physical implementation, entries are bit-interleaved
  • more area-efficient
Interleaved vs Non-Interleaved Design Style (2)
● But a cluster of faults affects more entries in an interleaved design
● For architectural structures:
  • soft errors: prefer interleaved
  • hard errors: map to a spare, or disable the block/set
● For non-architectural structures:
  • soft errors: no need for protection
  • hard errors: prefer non-interleaved (if area is not an issue)
4K LP: No Interleaving vs Interleaving (average random)
Random results without and with remapping
Expected Invariants
● With increasing faults, more performance degradation
● Frequently accessed entries are more critical than less-accessed entries
● A cell stuck-at-1 is more critical if the bits stored in the cell are biased towards zero
Worst-case - Hit rate
Random results without and with remapping