Designing and Testing Fault-Tolerant Techniques for SRAM-based FPGAs

Paper by F.L. Kastensmidt, G. Neuberger, L. Carro, R. Reis
Talk by Nick Boyd

Exploring techniques for detecting and
dealing with radiation-induced faults in
FPGAs

Why?
- Drive to use commercial off-the-shelf (COTS) FPGAs to minimize cost and development time (for space applications)
- As technology scales down, radiation-induced faults become an issue even at ground level

Incident radiation deposits energy in the silicon
- The resulting electron-hole pairs and secondary ionizations produce a transient current pulse

The pulse can change a ‘0’ to a ‘1’ or vice versa, often called a “bit flip”

In combinational logic: Single Event Transient (SET)

In sequential logic (or memory): Single Event Upset (SEU)
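As a minimal sketch (Python; the helper name `inject_seu` and the 8-bit width are illustrative assumptions), an SEU in a stored word can be modelled as an XOR with a one-hot mask, which inverts exactly one bit:

```python
def inject_seu(word: int, bit: int, width: int = 8) -> int:
    """Return `word` with one bit inverted, modelling a single event upset.

    `inject_seu` and the default 8-bit width are hypothetical choices.
    """
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    # XOR with a one-hot mask flips exactly one stored bit: 0 -> 1 or 1 -> 0.
    return word ^ (1 << bit)


register = 0b1010_0101
upset = inject_seu(register, 1)
print(f"{register:08b} -> {upset:08b}")  # 10100101 -> 10100111
```

Applying the same flip twice restores the original value, which is why a second upset in the same bit can silently "repair" the first.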

In FPGAs there are further considerations:
- SEUs in the configuration SRAM (logic, routing)
- SETs in the combinational FPGA fabric
- SEUs in BlockRAM
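To illustrate why an SEU in the configuration SRAM matters, here is a hypothetical Python model of a 2-input lookup table (LUT) whose truth table is held in four configuration bits. The bit ordering and function names are assumptions for illustration, not any real device's bitstream layout. A single upset changes the implemented logic function itself, not just one data value:

```python
def lut_eval(config: int, a: int, b: int) -> int:
    """Evaluate a 2-input LUT whose truth table is stored in 4 config bits."""
    index = (a << 1) | b            # the inputs select one configuration bit
    return (config >> index) & 1


def truth_table(config: int) -> list:
    """Outputs for (a, b) = (0,0), (0,1), (1,0), (1,1)."""
    return [lut_eval(config, a, b) for a in (0, 1) for b in (0, 1)]


AND_CONFIG = 0b1000                    # only index 3 (a=1, b=1) outputs 1
upset_config = AND_CONFIG ^ (1 << 1)   # SEU flips the bit selected by (a=0, b=1)

print(truth_table(AND_CONFIG))    # [0, 0, 0, 1]  -> AND gate
print(truth_table(upset_config))  # [0, 1, 0, 1]  -> output now simply equals b
```

Unlike an upset in a user register, the altered function persists until the configuration bit is rewritten.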

Effects of SEUs in the configuration fabric

TMR = Triple Modular Redundancy

Logic is triplicated and results are accepted by majority vote

Everything is tripled, including combinational logic, sequential logic, routing, and I/O
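A minimal Python sketch of the voting step, assuming a bitwise 2-of-3 majority voter; `run_tmr`, `square8`, and the fault-injection parameters are hypothetical names used only to show one corrupted copy being outvoted:

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant results."""
    return (a & b) | (a & c) | (b & c)


def run_tmr(module, x, upset_mask=0, faulty_copy=None):
    """Evaluate three copies of `module` and vote on their outputs.

    upset_mask / faulty_copy (hypothetical parameters) flip bits in the
    output of one copy to emulate an SEU or SET in that redundant domain.
    """
    outputs = [module(x) for _ in range(3)]
    if faulty_copy is not None:
        outputs[faulty_copy] ^= upset_mask
    return majority(*outputs)


def square8(x: int) -> int:
    """Hypothetical 8-bit module being protected."""
    return (x * x) & 0xFF


print(run_tmr(square8, 7))                                         # 49
print(run_tmr(square8, 7, upset_mask=0b0001_0000, faulty_copy=2))  # 49: upset outvoted
```

The voter masks any single faulty copy; if two copies are corrupted in the same bit, the majority itself is wrong.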

Benefits
- Able to detect and correct an SEU/SET anywhere in the FPGA
- No performance penalty

Drawbacks
- Very large area/resource penalty (particularly problematic for I/O pads)

A new technique proposed by the authors of
this paper

DMR-CED: Double Modular Redundancy with
Concurrent Error Detection

Motivation: find a technique that is as reliable as TMR at detecting and correcting errors, but with less area overhead

CED = Concurrent Error Detection

Exploits a property of the logic block to detect errors

Time-redundant examples:
- bit-wise inversion
- re-computing with shifted operands (RESO)
- re-computing with swapped operands (REWSO)
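As a sketch of bit-wise inversion (sometimes called alternating logic), the Python below checks a 1-bit full adder, chosen here only as an example. Both of its outputs are self-dual, so complementing all inputs must complement all outputs; computing once with true inputs and once with inverted inputs, then comparing, exposes a fault. The stuck-at-0 sum fault and all function names are hypothetical:

```python
from itertools import product


def full_adder(a, b, cin):
    """Fault-free 1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def full_adder_sum_stuck0(a, b, cin):
    """Hypothetical faulty adder whose sum output is stuck at 0."""
    _, cout = full_adder(a, b, cin)
    return 0, cout


def inversion_check(adder, a, b, cin):
    """Time-redundant bit-wise inversion check; True means no error detected."""
    first = adder(a, b, cin)                  # pass 1: true inputs
    s2, c2 = adder(1 - a, 1 - b, 1 - cin)     # pass 2: complemented inputs
    second = (1 - s2, 1 - c2)                 # complement the outputs back
    return first == second                    # self-dual outputs must match


vectors = list(product((0, 1), repeat=3))
print(all(inversion_check(full_adder, *v) for v in vectors))             # True
print(any(inversion_check(full_adder_sum_stuck0, *v) for v in vectors))  # False
```

The second print shows the stuck-at fault is flagged on every input vector, because a constant output cannot also be complemented.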

The result is calculated from the direct input and stored

The input is then encoded, and a new result is calculated and decoded

The two outputs are compared; they should be equal
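The steps above can be sketched in Python using RESO as the encoding, with a left shift as the encoder and a right shift as the decoder. The adder model, its 10-bit datapath, and the `stuck_bit` fault are assumptions made for illustration; the fault is modelled only at the adder output:

```python
WIDTH = 8
# Hypothetical adder datapath with two spare bits so (a << 1) + (b << 1) fits.
MASK = (1 << (WIDTH + 2)) - 1


def adder(a, b, stuck_bit=None):
    """Adder model; `stuck_bit` (hypothetical) forces one output bit to 0."""
    result = (a + b) & MASK
    if stuck_bit is not None:
        result &= ~(1 << stuck_bit)
    return result


def reso_check(a, b, stuck_bit=None):
    """Return (result, ok) using re-computing with shifted operands."""
    direct = adder(a, b, stuck_bit)             # 1) compute from direct input, store
    shifted = adder(a << 1, b << 1, stuck_bit)  # 2) encode (shift left), recompute
    decoded = shifted >> 1                      # 3) decode (shift right)
    return direct, direct == decoded            # 4) compare: should be equal


print(reso_check(100, 27))            # (127, True)
print(reso_check(8, 0, stuck_bit=3))  # (0, False): fault detected
```

Shifting moves each result bit onto a different physical adder bit in the second pass, so a fault at a fixed position corrupts the two results differently.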

How can we use CED?
- Duplicate only the combinational logic
- Use CED to determine the faulty module, only when the two copies disagree
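A possible sketch of the DMR-CED decision rule in Python, assuming the protected block is an adder and RESO serves as its CED check; `make_adder`, `reso_ok`, and `dmr_ced` are hypothetical names, not the authors' implementation. CED is consulted only when the two copies disagree:

```python
def make_adder(stuck_bit=None):
    """Hypothetical adder module (10-bit datapath), optionally with a
    stuck-at-0 output bit to emulate a fault in that copy."""
    def add(a, b):
        s = (a + b) & 0x3FF
        if stuck_bit is not None:
            s &= ~(1 << stuck_bit)
        return s
    return add


def reso_ok(module, a, b):
    """CED by re-computing with shifted operands: True if the module passes."""
    return module(a, b) == module(a << 1, b << 1) >> 1


def dmr_ced(module_a, module_b, a, b):
    """Return (output, disagreement_flag) from two copies plus CED arbitration."""
    out_a, out_b = module_a(a, b), module_b(a, b)
    if out_a == out_b:
        return out_a, False          # copies agree: CED is not consulted
    if reso_ok(module_a, a, b):      # disagreement: CED picks the healthy copy
        return out_a, True
    if reso_ok(module_b, a, b):
        return out_b, True
    raise RuntimeError("both copies failed CED")


good, bad = make_adder(), make_adder(stuck_bit=2)
print(dmr_ced(good, good, 3, 5))  # (8, False)
print(dmr_ced(bad, good, 4, 0))   # (4, True): faulty copy A identified by CED
```

Only two copies of the logic exist; the extra cost is the CED check and the second (shifted) computation, which is where the speed penalty comes from.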

Three sample sequential circuits were tested:
- 8-bit multiplier
- 8-bit ALU
- FIR filter

In each generated circuit, every node was replaced with a multiplexer that passes either the ‘correct’ value, ‘0’, or ‘1’

This makes it possible to simulate every possible SEU
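A software analogue of this fault-injection setup, written in Python under stated assumptions: the netlist is a hypothetical full adder rather than one of the tested circuits, and each node passes through a three-way selector standing in for the inserted multiplexer. Sweeping every node and forced value over every input vector enumerates the possible single faults:

```python
# Hypothetical netlist (full adder); entries are listed in topological order,
# and Python dicts preserve insertion order.
NETLIST = {
    "x1":   lambda v: v["a"] ^ v["b"],
    "sum":  lambda v: v["x1"] ^ v["cin"],
    "g":    lambda v: v["a"] & v["b"],
    "p":    lambda v: v["x1"] & v["cin"],
    "cout": lambda v: v["g"] | v["p"],
}
OUTPUTS = ("sum", "cout")


def simulate(inputs, fault_node=None, fault_sel="correct"):
    """Evaluate NETLIST; `fault_node` is routed through a 3-way mux."""
    v = dict(inputs)
    for node, fn in NETLIST.items():
        value = fn(v)
        if node == fault_node:
            # mux: pass the correct value, a forced '0', or a forced '1'
            value = {"correct": value, "0": 0, "1": 1}[fault_sel]
        v[node] = value
    return tuple(v[o] for o in OUTPUTS)


def fault_campaign():
    """Count (node, forced value, input) cases whose outputs differ from golden."""
    vectors = [dict(a=a, b=b, cin=c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    errors = total = 0
    for node in NETLIST:
        for sel in ("0", "1"):
            for vec in vectors:
                total += 1
                if simulate(vec, node, sel) != simulate(vec):
                    errors += 1
    return errors, total
```

Forcing a node to the value it already has produces no error, so only some of the injected cases actually disturb the outputs; those are the ones a protection scheme must detect or correct.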

Benefits
- Reduces the area required for combinational logic (significantly, in some cases)

Drawbacks
- Significantly more complicated because of the CED
- The CED scheme must be chosen and optimized for each combinational circuit being protected
- Speed reduced by as much as 50%

Reasonably well written and complete

Necessary to read the references to understand the minutiae of the underlying principles

DMR-CED is probably useful only under very specific conditions