Fault Tolerant Four-State Logic by Using Self-Healing Cells
Martin Delvai
Institute of Computer Engineering
Vienna University of Technology, Austria
[email protected]
Thomas Panhofer, Werner Friesenbichler
Austrian Aerospace GmbH
{thomas.panhofer, werner.friesenbichler}@space.at
Abstract— The trend towards higher integration and faster operating speed leads to decreasing feature sizes and lower supply voltages in modern integrated circuits. These properties make the circuits more error-prone, so applications demanding high reliability, e.g. space missions, require a fault tolerant implementation. In previous work we presented a concept for obtaining fault tolerant digital circuits by using asynchronous Four-State Logic (FSL). This type of logic already exhibits a high degree of fault tolerance, since most faults simply halt the circuit (deadlock). The remaining types of faults are handled by temporal redundancy. Adding a deadlock detection unit and introducing the concept of Self-Healing Cells (SHCs) leads to a highly reliable circuit that is able to tolerate even multiple faults. However, our experiments revealed that some specific fault constellations neither cause a deadlock nor are detected by a redundant calculation. We present two improved ways of error detection, which allow even these types of faults to be captured. Further, we compare the size of an SHC with the achieved fault tolerance with respect to multiple faults.

I. INTRODUCTION
During its lifetime, a circuit might be exposed to various types of faults. In particular for applications where repair is not feasible, e.g. space missions, high reliability is an important issue that requires error mitigation and failure handling techniques. A robust architecture with an increased inherent fault tolerance is therefore advantageous. Asynchronous Four-State Logic (FSL) [1] shows this property and additionally provides a more deterministic behavior in the presence of errors than conventional single-rail logic. The concept we propose tolerates both transient and permanent faults. It is based on the fact that FSL circuits tend to stop (deadlock) in the presence of errors. Combining a deadlock detection unit with Self-Healing Cells (SHCs) allows the circuit to be re-routed after a deadlock. The flexibility of SHCs makes it possible to recover even from multiple faults. Due to the handshaking in FSL, no data is lost in case of a deadlock. Thus, after completion of the mitigation process, the operation continues without any additional actions. Unfortunately, some faults produce a wrong result without leading to a deadlock. These faults are more severe because they cannot be detected as simply. Just adding temporal redundancy does not cure the problem, due to the vulnerability to token and synchronization errors [2]. A more evolved architectural design of FSL is presented to mitigate these faults.
The paper starts with related work in Section II. It briefly introduces the principle, the robustness aspects and the underlying fault hypothesis of FSL in Section III before it moves on to the self-healing approach in Section IV. Transient and permanent fault mitigation are treated in Sections V and VI, respectively. Finally, Section VII draws a conclusion.

II. RELATED WORK
Different methods based on classic QDI logic using a 4-phase protocol have been proposed to mitigate transient faults [3], [4] and to detect permanent faults [5], [6]. Although easier to implement, this protocol requires spacers to separate the particular data words, which reduces the speed of the circuit. Often the investigations end with the enforcement of a deadlock, and there is only little investigation of how to recover from such a state [7]. To recover from permanent errors, run-time reconfiguration can be performed. Genetic algorithms [8] autonomously evolve a working circuit; however, the result is rather unpredictable. Reverse-engineering the bitstream of an FPGA to manipulate the circuit systematically is possible [9], but, as with the genetic algorithms, this seems not to be acceptable for high-reliability applications. The only known tool provided by FPGA manufacturers for manipulating synthesized circuits is JBits [10] from Xilinx. However, it is restricted to the Virtex-II series and is Java based, which means it cannot be used e.g. for space applications. Using different pre-synthesized implementations of the same circuit [11] overcomes these problems, but requires large memories to store the bitstreams. All these methods can only be applied to FPGAs, whereas our approach can also be used in ASICs. Further, the actual configuration of a circuit can be easily determined, leading to a predictable behavior.
978-1-4244-2658-4/08/$25.00 ©2008 IEEE
III. FOUR STATE LOGIC – FSL
A. Concept of FSL
FSL belongs to the class of Quasi Delay Insensitive (QDI) circuits [12] and uses a two-phase handshake protocol. Consecutive data words are separated by using two alternating, diverse code sets, ϕ0 and ϕ1, called phases. Fig. 1 shows the encoding of, and the transitions between, the boolean values TRUE/FALSE, denoted as 'h'/'l' in phase ϕ0 and 'H'/'L' in ϕ1. Each logic value is encoded by the two signal rails a and b [13].

Fig. 1. FSL encoding and state transitions
  logic state    ϕ0 (a,b)    ϕ1 (a,b)
  "FALSE"        l (0,0)     L (0,1)
  "TRUE"         h (1,1)     H (1,0)

Combinational FSL logic only calculates a new output if the inputs are consistent, i.e. have the same phase. Otherwise the old output value is preserved. Assuming an appropriate design of the circuit [14], it operates glitch-free. This fact is important: due to the absence of a clock signal, the storage elements in an FSL circuit have to derive the trigger event for storing new data from the data encoding itself. The glitch-free operation ensures that consistent data is also valid data, so both combinational logic and registers derive their trigger from a consistency check of the input data. Registers control the data flow. After the data has been captured, an acknowledge signal is generated, which requests the preceding stage to issue new data.
The data path of FSL circuits is modeled similarly to a micropipeline [15] and is shown in Fig. 2. The monitoring and watchdog units are not part of the micropipeline concept but are required for deadlock detection, see Section VI.

Fig. 2. FSL pipeline structure (latch stages with phase detectors Φ, control units Ctrl and combinational logic f(x), linked by the handshake signals pass and done; monitoring unit and watchdog)

B. Robustness of FSL Circuits
To assess the robustness we consider permanent, transient and delay faults. FSL is largely immune to delay faults, provided the basic FSL gates are internally not affected by delay constraints [14]. Delay faults degrade the performance of the circuit but do not affect correct data processing.
For transient faults, we distinguish a settled state, where all inputs are consistent, and a transient state, where the inputs are inconsistent. In the settled state, a single fault at any input produces an inconsistent input vector. This vector does not propagate because FSL circuits do not react to inconsistent data. As long as the transient fault persists, the circuit is halted. In the transient state, a fault can cause an error only if it generates a consistent input vector. Assuming an n-bit vector, this requires that n − 1 signals have already acquired the new value and that the fault affects the remaining, slowest signal, in particular the rail that is not going to change anyway. Compared to conventional single-rail logic, this reduces the probability that a transient fault yields an error by a factor of 1/(2n).
Permanent faults are crucial for asynchronous logic because they may result in permanently inconsistent signal vectors and thus in a deadlock. A detailed analysis of the robustness of quasi delay-insensitive circuits can be found in [16].

C. Fault Hypothesis in FSL
Permanent, transient and delay faults may trigger delay, token or synchronization errors. Delay errors suspend the execution of a circuit. The circuit may resume when the fault has vanished. However, if a specific amount of time is exceeded, the circuit is considered to be halted forever, which is called deadlock. A token error describes the corruption of data, where data moving through an asynchronous channel is referred to as a token. Synchronization errors are related to faults in the handshaking process and may lead to an unintended token consumption. Token and synchronization errors do not stop the circuit but generate wrong data [2].

IV. SELF-HEALING APPROACH
Primarily, our approach aims to mitigate faults by using FSL logic, which already widely tolerates delay faults [14]. Transient and permanent faults require special attention.
In QDI circuits permanent faults cause a deadlock, which simplifies their detection but requires a costly repair procedure. We developed Self-Healing Cells (SHCs), which allow recovery from permanent errors by duplicating the internal logical structure and providing a flexible routing. Each SHC constitutes a fault containment region. Thus, in contrast to a simple duplication of the entire target device, the finer grained SHC approach is able to tolerate multiple faults.
Transient faults either suspend the circuit's operation or trigger wrong data. In case the transient fault leads to a deadlock, the same recovery mechanism as for permanent faults can be applied. However, if the transient fault generates wrong data, additional measures are required. A first approach to detect such faults was presented in [2]; that work, however, also identified some weaknesses. Section V provides two possible improvements to overcome these shortcomings.

V. TRANSIENT ERROR RECOVERY
Initially, we added temporal redundancy [2] to detect transient faults not mitigated by FSL. Each operation is performed twice, once in ϕ0 and once in ϕ1. The results are compared at the logical level, and if they differ, a third calculation is initiated to identify the correct result. Due to the alternating phases, permanent errors are also detected, which are invisible to conventional temporally redundant systems. The main drawback of this method is that it requires two operations, which reduces the operational speed by 50%. However, due to the 2-phase protocol, the throughput is still as high as for classical QDI circuits using a 4-phase protocol.

A. Fault Injection
Fault injection experiments were performed using a pipelined 4-bit ripple carry adder designed in FSL, as shown in Fig. 3. For simplicity, both operands are fetched at the same time using an 8-bit wide register. Transient faults were injected into the combinational logic of the adder as well as into the operand and result registers, on all signals and at all time instants of the simulation. The particular components were simulated with arbitrary propagation delays to obtain a realistic behavior. The fault duration was selected longer than the slowest propagation delay in the circuit, so that the logic function could not suppress the transient. The fault injection experiments were performed with the plain FSL circuit and again with the same circuit applying temporal redundancy.
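A small behavioral model illustrates how the size N of the redundant data set (see subsection V-B and Table I) interacts with a synchronization error that swallows two consecutive tokens. This is a sketch under stated assumptions, not the authors' tool flow. Tokens are represented by their sequence numbers, and the computed result is taken to be the token itself. A synchronization error is modelled as the silent removal of two consecutive tokens. The functions `redundant_schedule`, `drop` and `first_mismatch` are hypothetical names. The phase alternation and the third (voting) calculation are not modelled.

```python
from typing import List, Optional, Sequence


def redundant_schedule(tokens: Sequence[int], n: int) -> List[int]:
    """Issue every group of n tokens twice (two redundant data sets).

    n=1 gives 1, 1, 2, 2, ...; n=4 gives 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, ...
    """
    stream: List[int] = []
    for start in range(0, len(tokens), n):
        group = list(tokens[start:start + n])
        stream.extend(group + group)  # original pass followed by redundant pass
    return stream


def drop(stream: Sequence[int], index: int, count: int = 2) -> List[int]:
    """Model a synchronization error that silently consumes `count`
    consecutive tokens starting at position `index`."""
    return list(stream[:index]) + list(stream[index + count:])


def first_mismatch(results: Sequence[int], n: int) -> Optional[int]:
    """Compare both halves of every complete redundant set (2n results).

    Returns the index of the first set whose halves differ, or None if all
    complete sets agree. A trailing incomplete set is still pending and is
    not judged yet.
    """
    set_len = 2 * n
    for k in range(len(results) // set_len):
        chunk = list(results[k * set_len:(k + 1) * set_len])
        if chunk[:n] != chunk[n:]:
            return k
    return None


# Results are modelled by the token numbers themselves (identity function).
tokens = list(range(1, 9))
s1 = redundant_schedule(tokens, 1)
print(first_mismatch(drop(s1, 2), 1))  # None: the pair 2, 2 vanished undetected
s4 = redundant_schedule(tokens, 4)
print(first_mismatch(drop(s4, 2), 4))  # 0: the shifted set is flagged
```

With N = 1, losing two consecutive tokens that happen to form one complete set leaves every remaining pair consistent. With N = 4, the same loss shifts the alignment of the set, and the comparison fails.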
Fig. 3. 4-bit FSL pipeline adder (SOURCE, 8-bit operand register, 4-bit adder, 5-bit result register holding the 4-bit sum and the carry, SINK; handshake signals Done and Pass)

TABLE I
4-BIT PIPELINE ADDER WITH TIME REDUNDANCY
                            Size of redundant data set
                            N=0             N=1             N=4
Faults injected             34230 100.0%    34230 100.0%    34230 100.0%
Mitigated by FSL            31916  93.2%    31916  93.2%    31903  93.2%
Errors                       2314   6.8%     2314   6.8%     2327   6.8%
  Deadlocks                   533   1.6%      533   1.6%      613   1.8%
  Wrong data                 1781   5.2%     1781   5.2%     1714   5.0%
Mitigated by time red.          0   0.0%     1698   5.0%     1714   5.0%
Undetected failures          1781   5.2%       83   0.2%        0   0.0%

The results shown in Table I are grouped according to the size of the redundant data set N, see also subsection V-B. For N = 0 no redundant calculation is performed, while N = 1 uses a simple time redundancy scheme in which each operation is repeated. In both cases, 2314 (6.8%) of the injected faults lead to an error, of which 533 (1.6%) produced a deadlock. In the pure FSL circuit, 1781 errors (5.2% of the injected faults) generate wrong data, which leads to undetected failures in the absence of any control mechanism. However, even with a redundant calculation, 83 (0.2%) errors remained undetected. A closer examination revealed that these failures were triggered by synchronization errors that consume two consecutive tokens forming one complete redundant data set.
Such a behavior is simulated in Fig. 4, which shows a 4-bit wide pipeline with three stages. At time t2, the pipeline is filled with the sequence 'HLLH', 'hllh' and 'HHHH', where two consecutive tokens form one redundant data set. A transient fault on done2 (from t3 to t4) consumes the set 'HHHH'/'hhhh' without passing it through the pipeline. As a result, the complete set vanishes from the data sequence.

Fig. 4. Token vanishing due to a synchronization error (waveforms of fsdatain, fsdata1 to fsdata3 and done1 to done3 over the instants t0 to t5)

This token vanishing stems from the 2-phase protocol in FSL, which reacts to every transition. A pulse on the acknowledge signal creates two events and thus two synchronization errors. If the data source of the affected register is fast enough to provide a new token for each transition, two consecutive tokens will be consumed. If these tokens form a redundant data set, both errors will remain undetected.

B. Grouped Data Sets
The operations to be performed are grouped into two redundant data sets, each comprising N operations. The sets are processed consecutively and the results are checked for equality. For N = 1, each operation is simply repeated, i.e. the data series is {1, 1, 2, 2, ...}, where the numbers denote the particular tokens in the series. The length of the redundant data set should be kept short to allow fast error detection. The fault injection experiment was repeated with a redundant set size of N = 4, where the data series is {1, 2, 3, 4, 1, 2, 3, 4, 5, 6, ...}.
As shown in Table I, no undetected failures were observed with N = 4. The number of errors is not identical to the simulation with N = 1, because re-arranging the sequence of the input data determines whether a given injected fault flips a signal rail or not. However, the variation is only minor.
Increasing the redundant data set size N allowed all faults to be detected. However, the whole set then has to be stored during the redundant calculation, and the comparison becomes more complicated. Therefore, another mechanism to avoid synchronization errors is presented.

C. Acknowledge Synchronization
To harden FSL against synchronization errors, the sensitivity of the handshake to transient faults has to be removed. For this purpose the rail synchronization technique [4] was adapted. The idea is to join two asynchronous channels and to ensure that neither channel can memorize data unless valid data is present in the other channel. Due to this interlocking, an error must affect both channels to propagate. A transient fault on one channel is blocked as long as the other channel does not contain valid data. In case the other channel already contains a valid token, the transient fault is memorized but produces an invalid token. The circuit in [4] has been developed for standard QDI logic, which uses a 4-phase protocol and distinguishes valid and invalid tokens. The rail synchronization also depends on minor timing assumptions. It assumes that the correct rail of the faulty channel has enough time to propagate to the storage element and to generate the invalid token before the erroneous value is acknowledged. Otherwise, a token error is produced.
FSL uses a 2-phase protocol and has no invalid tokens. Further, an FSL register already comprises a synchronization mechanism at its input. In contrast to the method in [4], all input bits are synchronized, not only two adjacent ones. This property makes an FSL register largely immune to token errors triggered by transient faults at the register inputs, since the fault will only be memorized if all bits agree on the same phase. The wider the data bus, the higher this immunity. If the register comprises only one or two bits, there is no difference from the rail synchronization method.
The original FSL lacks a synchronization mechanism for the acknowledge signal. In our approach the register is split into two parts, as depicted in Fig. 5. Each part provides an acknowledge signal, and both signals are joined at the preceding pipeline stage. The register will only accept new data if both acknowledge inputs have the same value. Thus, a transient fault occurring on only one of the two acknowledge lines will be blocked. We modified the registers in Fig. 3 accordingly and repeated the fault injection with the input data from Table I.

Fig. 5. FSL Register with Acknowledge Synchronization (the register is split into two latch groups whose done signals drive the acknowledge inputs Ack1 and Ack2 of the control unit Ctrl in the preceding stage)
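The effect of joining the two acknowledge lines can be sketched at the signal level. The sketch assumes that the join behaves like a Muller C-element: the joined acknowledge follows its inputs only when both agree, and otherwise holds its value. This is our modelling assumption, not a gate-level description of the register in Fig. 5. The helper names `transitions` and `joined_ack` are hypothetical. In the 2-phase protocol every level change of the acknowledge counts as one event, i.e. one consumed token.

```python
from typing import Iterable, List


def transitions(signal: Iterable[int]) -> int:
    """Count level changes; in a 2-phase protocol each change is one event."""
    values = list(signal)
    return sum(1 for prev, cur in zip(values, values[1:]) if prev != cur)


def joined_ack(ack1: Iterable[int], ack2: Iterable[int], initial: int = 0) -> List[int]:
    """Join two acknowledge lines in the manner of a Muller C-element:
    the joined value only follows the inputs when both agree, otherwise
    it holds its previous value."""
    state = initial
    joined: List[int] = []
    for a, b in zip(ack1, ack2):
        if a == b:
            state = a
        joined.append(state)
    return joined


glitch = [0, 1, 0, 0]  # transient pulse on one acknowledge line only
quiet = [0, 0, 0, 0]
print(transitions(glitch))                     # 2: two events, two tokens consumed
print(transitions(joined_ack(glitch, quiet)))  # 0: the pulse is blocked
# A legitimate acknowledge: both halves switch, possibly with some skew.
print(transitions(joined_ack([0, 0, 1, 1], [0, 1, 1, 1])))  # 1
```

In this model a pulse confined to one line never reaches the joined output. However, a glitch that coincides with a legitimate transition on the other line would still pass, which mirrors the residual timing assumption discussed for rail synchronization.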
TABLE II
4-BIT PIPELINE ADDER WITH TIME REDUNDANCY AND ACKNOWLEDGE SYNCHRONIZATION
                            redundant calculation with N = 1
                            w/o ack sync    with ack sync
Faults injected             34230 100.0%    38220 100.0%
Mitigated by FSL            31916  93.2%    35554  93.0%
Errors                       2314   6.8%     2666   7.0%
  Deadlocks                   533   1.6%     1218   3.2%
  Wrong data                 1781   5.2%     1448   3.8%
Mitigated by time red.       1698   5.0%     1448   3.8%
Undetected failures            83   0.2%        0   0.0%

As Table II shows, no undetected failures occurred, even with the simple time redundancy scheme N = 1. Note that the circuit under test was simulated with arbitrary propagation delays that were set rather high. In particular, the skew between the individual bits of a register was chosen artificially large to provoke token errors. A probability assessment of these types of errors in real circuits is currently ongoing.

VI. PERMANENT ERROR RECOVERY
For deadlock detection a watchdog unit is added, as shown in Fig. 2. It monitors the phase detectors and triggers a reconfiguration if they do not change their state within a defined time interval. Moreover, the states of the phase detectors can be used to gain information about the fault location. The watchdog timeout can be chosen several orders of magnitude higher than the circuit processing time. Thus, this timing assumption is no practical limitation to the QDI concept.

A. Self-Healing Cells
To recover from deadlocks we introduced Self-Healing Cells (SHCs). An SHC is an internally redundant circuit, such as the 2-input FSL AND gate in Fig. 6a, which allows defective circuit parts to be bypassed by re-routing the internal signals. The routing is controlled by the reconfiguration inputs 'R1' and 'R2'. One SHC is able to tolerate at least one fault, either internal or at its external interfaces. Any circuit of arbitrary complexity can be implemented as an SHC, ranging from low level gates (e.g. AND, OR) up to complex circuits (e.g. a complete processor).
Designing a low level gate as an SHC provides a high degree of fault tolerance but also increases the relative hardware overhead. Implementing more complex functions as SHCs reduces that overhead but decreases the capability to mitigate faults. Each SHC can recover from at least one fault. Depending on the internal fault distribution, there is a high probability that even multiple faults can be recovered.
The watchdog unit used for deadlock detection can only determine the faulty pipeline stage, not the faulty gate itself. Thus the configuration inputs are changed using a "trial-and-error" method. In our prototype circuit, the configuration pattern is generated by a counter. As soon as a configuration is found that resumes the circuit operation, the circuit is assumed to have been repaired correctly. If the new configuration does not remove the deadlock, the reconfiguration process goes on. Note that larger circuits require a more sophisticated reconfiguration controller, because a simple counter would lead to long reconfiguration times.
Currently our approach considers only combinational circuits. However, a permanent fault or a bit-flip in a register behaves like a permanent fault that affects the interconnections between a register and the combinational logic [17], i.e. an external interface of an SHC. Such a fault can be mitigated by appropriate re-routing.
During a deadlock, the input values of gates, pipelines, etc. are kept valid, since the subsequent circuit stage cannot consume the token. Thus, after a working configuration has been found, the circuit autonomously continues its operation without loss or corruption of data. Notice that this ability is specific to QDI circuits.

Fig. 6. Self-Healing Cells of Different Complexity: (a) Self-Healing Basic Gate (two FSL AND gates, nominal and redundant, selected by R1, R2), (b) Self-Healing Half-Adder (SH-AND and SH-XOR with R1 to R4), (c) Complex Self-Healing Cell (Full-Adder; FSL FA 1 and FSL FA 2 selected by R1, R2)

B. Comparison of Different SHC Sizes
To compare the resource overhead versus the fault tolerance of different SHC sizes and complexities, two implementation extremes of a 1-bit full adder have been designed. Circuit A is built from low level SHCs (Fig. 6a), as shown for a half adder in Fig. 6b. In circuit B (Fig. 6c), two complete full adders were implemented in the SHC, which basically act as hot-redundant circuits.
Obviously, the fine granular version A built from basic gates occupies more resources than the coarse granular circuit B. In turn, circuit A shows a much higher degree of fault tolerance than circuit B, as will be shown later.
To reduce simulation time, the simulation was performed using a 1-bit full adder instead of the 4-bit adder. A 4-bit adder consists of cascaded 1-bit adders, and faults on the external interfaces have been considered. Therefore, the results obtained for the 1-bit adder are also valid for the 4-bit adder.
The goal of the simulation was to prove that a working configuration can be found for any injected stuck-at fault. Transient faults are handled by the concept described in Section V and thus have not been considered in this simulation. We randomly injected permanent stuck-at-0 and stuck-at-1 faults on all internal and external signals, including the reconfiguration inputs. Then we applied all valid input stimuli. Since there are three inputs (a, b, cin), this results in 8 combinations for each phase, or 16 combinations in total. Each signal was subjected at least once to both stuck-at faults. Due to the storage elements in the FSL gates, the circuit behavior in the presence of errors depends on the history. To account for this dependency, we performed five independent simulation runs for each fault configuration and took the mean values. In total we simulated 17600 fault conditions for circuit A and 22720 for circuit B. The result of the adder (sum and carry) was compared to the expected value. The circuit was defined to be working if at least one of the two redundant outputs showed the correct result. If both outputs were wrong, the reconfiguration inputs were counted up until a correct result was obtained. The development of a decision algorithm to determine the correct output is currently ongoing.
The simulations revealed that for both circuits every single fault can be repaired, independent of its location. For multiple faults there is a high probability of repair; however, depending on the fault location in the circuit, this cannot be guaranteed. If, e.g., two faults affect both redundant paths within an SHC, the circuit will fail. The same applies if a signal and its associated reconfiguration input are affected by a permanent fault at the same time. The summary of the simulations is presented in Table III. The number of signals was extracted from the behavioral design. To obtain the resource occupation of the two circuits, both designs were synthesized for a Xilinx Virtex-4.

TABLE III
COMPARISON OF SHCS WITH DIFFERENT COMPLEXITY
(Circuit A: 17600 fault conditions; Circuit B: 22720 fault conditions)
                               circuit A    circuit B    A/B
                               gate SHC     FA SHC       comparison
number of signals                 142          110         +29.1%
number of reconfig. inputs         10            2        +400.0%
equivalent gate count             580          412         +40.8%
failed with 1 fault              0.0%         0.0%          -0.0%
failed with 2 faults             2.1%         5.0%         -57.2%
failed with 3 faults             5.5%        14.3%         -61.9%
failed with 4 faults            11.0%        22.8%         -51.8%
failed with 5 faults            16.0%        30.5%         -47.4%
failed with 6 faults            22.0%        39.0%         -43.7%
failed with 7 faults            26.8%        45.2%         -40.6%
failed with 8 faults            32.5%        49.9%         -34.9%
failed with 9 faults            38.1%        54.7%         -30.4%
failed with 10 faults           42.3%        59.2%         -28.5%
failed with 11 faults           46.0%        62.5%         -26.5%

Even with 11 simultaneously injected faults, i.e. about 10% of the circuit's signals defective, about 54% (circuit A) and 37.5% (circuit B) of the fault constellations could still be repaired. The resource overhead is approximately 40% higher with gate-level SHCs than with the full adder implemented as a single SHC. However, as can be seen in Fig. 7, the gain in fault tolerance of circuit A compared to circuit B is also significant. This holds in particular for a small number of faults, where the probability of multiple faults within the same SHC is low.
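The argument that finer-grained SHCs fail less often can be illustrated with a deliberately simplified combinatorial model. This is our own assumption and not the simulation behind Table III. Each of `cells` SHCs consists of two redundant halves with `signals_per_half` signals each. The model places `faults` permanent faults on distinct signals uniformly at random. A constellation is counted as failed if some cell has faults in both halves. Faults on reconfiguration inputs and interfaces, as well as the history-dependent FSL behavior, are ignored. The function name is hypothetical.

```python
from math import comb


def shc_failure_probability(cells: int, signals_per_half: int, faults: int) -> float:
    """Probability that `faults` distinct, uniformly placed faults leave at
    least one SHC with faults in both of its redundant halves.

    Simplified, hypothetical model: identical cells, two halves of equal size,
    no faults on reconfiguration inputs or interfaces.
    """
    s = signals_per_half
    n_signals = 2 * cells * s
    if faults > n_signals:
        raise ValueError("more faults than signals")
    total = comb(n_signals, faults)
    # ways[j]: placements of j faults over the cells processed so far such
    # that no cell has faults in both of its halves
    ways = [1] + [0] * faults
    for _ in range(cells):
        new = [0] * (faults + 1)
        for used, count in enumerate(ways):
            if count == 0:
                continue
            for j in range(faults - used + 1):
                # j faults in this cell are harmless only if all sit in one half
                per_cell = 1 if j == 0 else 2 * comb(s, j)
                new[used + j] += count * per_cell
        ways = new
    return 1 - ways[faults] / total


# Same total of 20 signals: five small SHCs versus one large SHC.
print(round(shc_failure_probability(5, 2, 2), 3))   # 0.105 (= 2/19)
print(round(shc_failure_probability(1, 10, 2), 3))  # 0.526 (= 10/19)
```

The absolute values do not reproduce Table III. Only the trend carries over: with the same number of signals, splitting the circuit into more, smaller SHCs lowers the chance that two faults hit both halves of one cell.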
Fig. 7. Fail-Cases and Gain of Fault-Tolerance vs. Number of Faults (percentage of failed conditions for the gate-level SHC and the full-adder SHC, and the gain of fault tolerance, over 1 to 11 simultaneously injected faults)

C. Reconfiguration Control
To evaluate whether the fault can be repaired, we compared the output with the expected value. This method cannot be used in a real application, because the expected output is unknown. However, this information is not required within our approach. If a signal in the nominal or redundant data path is affected by a fault, the output generated from this signal may (i) carry the right phase encoding but a faulty logical value (e.g. "high" instead of "low"), or (ii) carry a wrong phase encoding (e.g. ϕ1 instead of ϕ0). The first case is detected by the redundant calculation process described previously and does not require a reconfiguration. In the second case the circuit automatically stops its execution, since a combinational gate, or at least the register at the end of the data path in question, will never receive a consistent input. A deadlock occurs and the watchdog unit triggers the reconfiguration process. This process reconfigures the circuit so that the affected signal is excluded from the data path. Subsequently the circuit continues its execution. Thus, identifying a working configuration does not require any additional means either. Once a correct configuration is selected, the circuit starts to work autonomously; otherwise the next configuration is applied.

D. Simulation of a Deadlock Recovery
A simple implementation of the deadlock recovery is shown in Fig. 8. The watchdog counter is reset by the phase detectors in the FSL registers. If it wraps around, a new configuration is requested by asserting Req. The reconfiguration unit comprises a counter that is incremented with each request. After the new configuration has been applied, the Ack signal is asserted, which resets the deadlock detector. If the configuration was successful, the circuit's operation continues, which prevents any further requests. If not, a new setting is requested each time the deadlock timeout expires, until a working configuration is found.
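The interplay of FSL encoding, deadlock and counter-driven reconfiguration described in Sections VI-A, VI-C and VI-D can be sketched as a small behavioral model. Several assumptions apply. The class `SelfHealingAnd` and the function `recover` are hypothetical. The configuration vector uses one select bit per SHC, whereas the SHCs in Fig. 6 use two reconfiguration inputs each. A permanent fault is modelled by pinning the output rails of one copy. Only wrong-phase outputs (case (ii)) are modelled, since right-phase errors are left to time redundancy.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Rails = Tuple[int, int]  # (rail a, rail b)

# FSL code table of Fig. 1: (phase, logic value) -> (a, b)
ENCODE = {
    (0, False): (0, 0),  # 'l'
    (0, True): (1, 1),   # 'h'
    (1, False): (0, 1),  # 'L'
    (1, True): (1, 0),   # 'H'
}
DECODE = {rails: key for key, rails in ENCODE.items()}


@dataclass
class SelfHealingAnd:
    """Hypothetical SHC: a nominal and a redundant FSL AND gate.
    A permanent fault is modelled by pinning one copy's output rails."""
    stuck_nom: Optional[Rails] = None
    stuck_red: Optional[Rails] = None

    def evaluate(self, x: Rails, y: Rails, use_red: bool) -> Optional[Rails]:
        (px, vx), (py, vy) = DECODE[x], DECODE[y]
        if px != py:
            return None  # inconsistent inputs: no new output is produced
        stuck = self.stuck_red if use_red else self.stuck_nom
        return stuck if stuck is not None else ENCODE[(px, vx and vy)]


def recover(cells: Sequence[SelfHealingAnd],
            inputs: Sequence[Tuple[Rails, Rails]],
            phase: int, config: int = 0,
            max_requests: int = 64) -> Tuple[int, int]:
    """Trial-and-error reconfiguration driven by a deadlock watchdog.

    Bit i of `config` selects the redundant copy of cell i. The stage
    captures data only if every output carries the expected phase; otherwise
    the watchdog times out, asserts Req and the counter moves on.
    Returns (working configuration, number of requests issued).
    """
    for requests in range(max_requests + 1):
        outputs: List[Optional[Rails]] = [
            cell.evaluate(x, y, bool((config >> i) & 1))
            for i, (cell, (x, y)) in enumerate(zip(cells, inputs))
        ]
        if all(o is not None and DECODE[o][0] == phase for o in outputs):
            return config, requests  # register captures data, pipeline resumes
        config = (config + 1) % (1 << len(cells))  # Req: counter increments
    raise RuntimeError("no working configuration found")


l, h = ENCODE[(0, False)], ENCODE[(0, True)]
# Inputs l and h in phase 0, as for HA0_AND2_NOM in Fig. 9; the nominal copy
# of cell 1 is stuck at (1, 0), i.e. 'H' of the wrong phase.
cells = [SelfHealingAnd(), SelfHealingAnd(stuck_nom=(1, 0)), SelfHealingAnd()]
print(recover(cells, [(l, h)] * 3, phase=0))  # (2, 2)
```

A plain counter keeps the sketch close to the prototype. As noted in Section VI-A, the number of trials grows with the width of the configuration vector, so larger circuits would need a smarter search.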
Fig. 8. Deadlock Detection and Reconfiguration (pipeline of SHCs with nominal (nom) and redundant (red) paths through f(x) and registers (Reg); Deadlock Detector and Reconfiguration Unit connected by Req and Ack)

Permanent faults were applied to the 4-bit ripple carry adder in Fig. 3, but now each basic gate of the adder is constructed as an SHC. The circuit comprises 34 reconfiguration inputs, and initially all SHCs are set to their nominal inputs. Fig. 9 shows the simulation of a permanent fault injected into the carry bit calculation of the LSB.

Fig. 9. Reconfiguration Simulation (nominal (NOM) and redundant (RED) operands datax, datay and dataout; inputs a1, b1, a2, b2 and outputs c1, c2 of gate HA0_AND2_NOM; deadlock detect signal; reconfiguration vector r stepping from 000000EFE via 000000EFF to 000000F00)

The inputs of gate HA0_AND2_NOM are a1 = 00 and b1 = 11, which nominally results in an output of c1 = 00. Due to the permanent fault, the output is stuck at c1 = 10, which generates a wrong phase and produces a deadlock. Examining the reconfiguration inputs shows that bits 8 to 11 have to be set to logic 1 to select the redundant carry bit. To save time, the simulation starts with a reconfiguration setting of 0x000000EFE. The circuit is halted due to the permanent fault, and the deadlock unit starts generating requests for the reconfiguration unit. When the reconfiguration input is set to 0x000000F00, the redundant carry bit is selected. This bit holds the correct value 00, and the circuit resumes its operation, as indicated by the activity on the data lines.

VII. CONCLUSION
This paper illustrates a self-healing approach for asynchronous circuits based on FSL. It combines hardware and time redundancy and is able to tolerate transient as well as multiple permanent faults.
Various fault injection simulations have shown that transient faults are either mitigated by the inherent fault tolerant properties of FSL, end up in a deadlock, or can be detected via time redundancy. It has been shown that all permanent single faults can be repaired by designing the basic circuit elements as self-healing cells. Further, this concept provides a high probability of recovering even from multiple faults.

REFERENCES
[1] A. J. McAuley, "Four state asynchronous architectures," IEEE Transactions on Computers, vol. 41, no. 2, pp. 129–142, February 1992.
[2] W. Friesenbichler, T. Panhofer, and M. Delvai, "Improving fault tolerance by using reconfigurable asynchronous circuits," in Proceedings of the 11th Workshop on Design and Diagnostics of Electronic Circuits and Systems (DDECS'08), March 2008, pp. 267–270.
[3] W. Jang and A. J. Martin, "SEU-tolerant QDI circuits," in Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2005, pp. 156–165.
[4] Y. Monnet, M. Renaudin, and R. Leveugle, "Hardening techniques against transient faults for asynchronous circuits," in Proceedings of the 11th IEEE International On-Line Testing Symposium (IOLTS'05), 2005, pp. 129–134.
[5] C. LaFrieda and R. Manohar, "Fault detection and isolation techniques for quasi delay-insensitive circuits," in Proceedings of the 2004 International Conference on Dependable Systems and Networks (DSN'04). Washington, DC, USA: IEEE Computer Society, 2004, p. 41.
[6] S. Peng and R. Manohar, "Efficient failure detection in pipelined asynchronous circuits," in Proceedings of the 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT'05), 2005, pp. 484–493.
[7] S. Peng and R. Manohar, "Fault tolerant asynchronous adder through dynamic self-reconfiguration," in Proceedings of the 2005 International Conference on Computer Design (ICCD'05), 2005, pp. 171–179.
[8] R. S. Oreifej, R. N. Al-Haddad, H. Tan, and R. F. DeMara, "Layered approach to intrinsic evolvable hardware using direct bitstream manipulation of Virtex II Pro devices," in Proceedings of the 17th International Conference on Field Programmable Logic and Applications, 2007.
[9] S. Raaijmakers and S. Wong, "Run-time partial reconfiguration for removal, placement and routing on the Virtex-II Pro," in Proceedings of the 17th International Conference on Field Programmable Logic and Applications, 2007.
[10] S. A. Guccione, D. Levi, and P. Sundararajan, "JBits: A Java-based interface to FPGA hardware," in Proceedings of the 2nd Annual Military and Aerospace Applications of Programmable Devices and Technologies Conference (MAPLD), 1999.
[11] W.-J. Huang and E. J. McCluskey, "Column-based precompiled configuration techniques for FPGA fault tolerance," in Proceedings of the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'01), 2001.
[12] J. Sparsø and S. Furber, Eds., Principles of Asynchronous Circuit Design - A Systems Perspective. Kluwer Academic Publishers, 2001.
[13] M. Delvai and A. Steininger, "Asynchronous logic design - from concepts to implementation," in Proceedings of the 3rd International Conference on Cybernetics and Information Technologies, Systems and Applications, vol. 1, July 2006.
[14] W. Huber, "Design of an asynchronous processor based on code alternation logic - exploration of delay insensitivity," Ph.D. dissertation, Vienna University of Technology, 2005.
[15] I. E. Sutherland, "Micropipelines," Communications of the ACM, vol. 32, no. 6, pp. 720–738, 1989.
[16] S. Piestrak and T. Nanya, "Towards totally self-checking delay-insensitive systems," in Digest of Papers, Twenty-Fifth International Symposium on Fault-Tolerant Computing (FTCS-25), June 27–30, 1995, pp. 228–237.
[17] O. A. Petlin and S. B. Furber, "Built-in self-testing of micropipelines," in Proceedings of the Third International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC '97), April 1997.