First MEDIAN Workshop 2012
Soft Error Propagation and Correlation Estimation in
Combinational Network
Liang Chen · Mehdi B. Tahoori
Liang Chen
Karlsruhe Institute of Technology, Karlsruhe, Germany
E-mail: [email protected]
Mehdi B. Tahoori
Karlsruhe Institute of Technology, Karlsruhe, Germany
E-mail: [email protected]
Abstract Soft errors are becoming one of the major reliability concerns in the nano era. Considering the correlation effect in error propagation is essential for accurate error rate estimation. This paper proposes a novel method to address this correlation issue in combinational circuits. Based on the concept of the error propagation function and the super-gate idea, the error probability and correlation problems are converted into the computation of normal signal probability and correlation. The experimental results show that, compared with Monte-Carlo simulation, our approach is 866x faster, while the average error probability inaccuracy is below 0.007.
Keywords Soft Error · Error Probability · Logical Masking · Signal Correlation
1 Introduction
Radiation-induced soft errors are becoming a significant reliability issue in the nano era [1]. Not only the extent but also the criticality of soft errors is increasing [6]. Therefore, it is essential to consider this impact in early design phases for cost-effective error mitigation.
As sequential circuit elements and even combinational logic gates are becoming as vulnerable to soft errors as memories [6], the intrinsic irregularity and functional complexity of logic circuits make the modeling of error propagation and error probability estimation more sophisticated than for memory cells. Even a single event transient caused by a particle strike at circuit level can propagate to multiple outputs, and at higher abstraction levels it can manifest as multiple correlated errors (bit flips). Modeling this correlation effect is essential for accurate error rate estimation at both gate level and hierarchical RTL; consequently, the selective protection of vulnerable gates or blocks can efficiently lead to robust system design.
This paper focuses on the logical masking effect of soft errors in combinational circuits. It proposes a novel method based on the concept of the error propagation function and the super-gate idea to unify the treatment of signal probability and error probability, which efficiently addresses the internal correlation among different wires. In addition, semi-super-gate simplification is utilized to speed up the computation. The method therefore provides a concise and efficient solution with much potential for further extensions (e.g. analyzing multiple upsets and block-level error propagation). The simulation results show that our method is 866x faster than Monte-Carlo simulation, while the average inaccuracy of error probability estimation is below 0.007.
The rest of this paper is organized as follows. Section 2 discusses our proposed error propagation model, the super-gate idea and the adopted correlation model. Section 3 introduces the procedures to calculate the gate error probabilities using the super gate formula method. Section 4 describes the experimental results and finally Section 5 concludes this paper.
2 Proposed Error Propagation Model
2.1 Error Propagation in Combinational Network
Soft errors are modeled as bit flips in this paper. To illustrate the basic idea, we take the Boolean AND function l = ij as the running example. For ij = 01, the output l is erroneous (0→1) only when i is erroneous (0→1) and j is error-free. The other three cases can be analyzed similarly.
If the notation $x_f$ is used to indicate whether the signal x is faulty, i.e. $x_f = 1$ means a bit-flip occurs on signal x, the four cases mentioned above can be expressed by the so-called Error Propagation Function (EPF):
$$l_f = i\bar{j}\,\bar{i}_f j_f + \bar{i}\bar{j}\, i_f j_f + \bar{i} j\, i_f \bar{j}_f + i j\,(i_f + j_f) \tag{1}$$
If the error-free function l = ij is combined with the EPF, a four-input, two-output super-gate can be constructed corresponding to the primitive AND gate, as illustrated in Figure 1.
Fig. 1 Super-gate concept
Here $i_f$, $j_f$, $l_f$ are just virtual error signals to facilitate the modeling of error propagation, and this representation has several important features:
– It significantly simplifies the propagation and correlation modeling of the bit-flips in a combinational network: it does not differentiate between the two kinds of bit flips, 0→1 and 1→0, as is done in [2].
– The super-gate concept is independent of any specific algorithm: it is just a Boolean function and can be analyzed using the well-researched methods for signal probability, switching activity, etc.
– This concept is easy to extend to more complicated error modeling: it can be adjusted for single bit-flips, multiple bit-flips, permanent faults, etc.
With regard to the properties of the virtual error signals, their interpretation is different:
– Logic value: '1' is interpreted as a bit-flip occurring in the corresponding signal and '0' means error-free;
– Signal probability (SP): interpreted as the error probability of the corresponding signal.
For the sake of simplicity, two-input AND, OR and INV gates are chosen as the primitive gates that constitute arbitrary combinational networks, similar to the treatment in [2]. Any complex gate in the design library can be modeled as a macro-gate consisting of the basic AND/OR/INV gates. Each macro-gate is considered as a single node in the netlist graph, while the probability and correlation calculation can be carried out using a simple extension of the basic gate rules.
This modeling technique doubles the number of signals that need to be considered. However, on one hand this additional complexity is unavoidable to model the propagation of bit-flip errors and their complicated correlations; on the other hand, this problem can be alleviated by exploring the properties of the EPF, as discussed later.
Fig. 2 Typical structures for correlation calculation: (a) fanout node, (b) AND gate
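As an illustration of Equation (1), the following minimal sketch enumerates all 16 input patterns of the AND super-gate and compares the EPF with a direct simulation that applies the bit flips to the inputs. The function names are hypothetical and chosen only for this example; they are not taken from the authors' implementation.

```python
from itertools import product

def and_epf(i, j, i_f, j_f):
    """Equation (1): returns 1 if the output of l = i AND j is flipped."""
    ni, nj, nif, njf = 1 - i, 1 - j, 1 - i_f, 1 - j_f
    return ((i & nj & nif & j_f) | (ni & nj & i_f & j_f)
            | (ni & j & i_f & njf) | (i & j & (i_f | j_f)))

def and_super_gate(i, j, i_f, j_f):
    """Four-input, two-output super-gate: returns (l, l_f)."""
    return i & j, and_epf(i, j, i_f, j_f)

def flipped_by_simulation(i, j, i_f, j_f):
    """Reference model: flip the inputs, then compare good and bad outputs."""
    good = i & j
    bad = (i ^ i_f) & (j ^ j_f)
    return good ^ bad

# Collect every input pattern on which the EPF and the simulation disagree.
mismatches = [bits for bits in product((0, 1), repeat=4)
              if and_epf(*bits) != flipped_by_simulation(*bits)]
print("mismatching input patterns:", mismatches)
```

The list of mismatches is empty, i.e. the EPF describes exactly the patterns for which the faulty and fault-free outputs differ, regardless of the flip direction.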
2.2 Signal Correlation
In typical digital circuits, there are two kinds of correlation: temporal correlation and spatial correlation [4].
Temporal correlation is always related to the historical
trends of bit streams, which is beyond the scope of this
paper. Only spatial correlation due to the reconvergent
fanout will be considered in this paper.
2.2.1 Correlation Model
To model the spatial correlation, we adopt the Correlation Coefficient Method (CCM) [3] from the signal probability community. With the notation of signal probability P(i = 1) = p(i), the correlation coefficient of signals i, j is defined as
$$C_{i,j} = C_{j,i} = \frac{p(ij)}{p(i)\,p(j)} = \frac{p(i|j)}{p(i)} = \frac{p(j|i)}{p(j)} \tag{2}$$
where p(ij) is the joint probability P(i = 1, j = 1), and p(i|j) is the conditional probability P(i = 1 | j = 1). From this definition, it can be derived that when these two signals are uncorrelated, $C_{i,j} = 1$.
For the primitive two-input AND gate, given p(i), p(j) and $C_{i,j}$, we can exactly calculate the signal probability of the gate output l using the following formula:
$$p(l) = p(i)\,p(j)\,C_{i,j}, \qquad 0 \le C_{i,j} \le \frac{1}{p(i)\,p(j)} \tag{3}$$
The correlation coefficients between different signals
can be analytically computed for all structural cases
in the combinational network. Two typical cases are
illustrated in Figure 2 and the correlation formulas are
Cl,m = 1/p(i) and Cl,m = Ci,h Cj,h , respectively.
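To make Equations (2) and (3) concrete, here is a small sketch for the fanout case of Figure 2(a). It assumes that both branches l and m carry the stem signal i, so that p(lm) = p(i); the helper names and the value p(i) = 0.25 are chosen for illustration only.

```python
def correlation(p_i, p_j, p_ij):
    """Equation (2): C_ij = p(ij) / (p(i) p(j))."""
    return p_ij / (p_i * p_j)

def and_probability(p_i, p_j, c_ij=1.0):
    """Equation (3): p(l) = p(i) p(j) C_ij for l = i AND j."""
    return p_i * p_j * c_ij

p_i = 0.25                              # hypothetical stem probability

# Fanout branches l and m are copies of i, hence p(lm) = p(i).
c_lm = correlation(p_i, p_i, p_i)       # equals 1 / p(i)

# Feeding both branches into one AND gate must reproduce p(i).
p_correlated = and_probability(p_i, p_i, c_lm)

# Ignoring the correlation (C = 1) underestimates the probability.
p_independent = and_probability(p_i, p_i)

print(c_lm, p_correlated, p_independent)
```

With the correlation coefficient the AND of two reconvergent copies of i has probability p(i), whereas the independence assumption would give p(i) squared.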
2.2.2 Accuracy Issue
In Figure 2(b) it is assumed
$$C_{ij,h} \approx C_{i,h}\,C_{j,h} \tag{4}$$
Therefore, the second and higher order correlations among multiple signals are not considered in CCM. The signal probability estimation in [3] and the switching activity estimation in [4] show that this first-order approximation can provide accurate results in practice. However, neglecting high-order correlations may push a gate output probability outside the [0, 1] bound. Therefore, Inequality (3) is used to limit $C_{i,j}$ to avoid probability overflow.
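The limiting step can be sketched as follows. The numbers are made up for illustration: they stand for an approximated coefficient that has drifted too high after several propagation steps, not for values from the paper, and the function name is hypothetical.

```python
def clamp_correlation(c_ij, p_i, p_j):
    """Limit C_ij to the range allowed by Inequality (3)."""
    upper = 1.0 / (p_i * p_j)
    return min(max(c_ij, 0.0), upper)

# Hypothetical inputs: two highly probable signals and an
# over-estimated correlation coefficient.
p_i, p_j, c_est = 0.9, 0.9, 1.5

p_unclamped = p_i * p_j * c_est                     # exceeds 1
c_ok = clamp_correlation(c_est, p_i, p_j)
p_clamped = p_i * p_j * c_ok                        # back inside [0, 1]

print(p_unclamped, c_ok, p_clamped)
```

Without the clamp the AND output probability would be about 1.215; with it the value saturates at 1 (up to floating-point rounding).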
Actually, from our observation the upper bound in Inequality (3) is rather loose. Revisiting the definition of the correlation coefficient in Equation (2), we have:
$$C_{i,j} = \frac{p(i|j)}{p(i)} = \frac{p(j|i)}{p(j)} \le \min\left\{\frac{1}{p(i)}, \frac{1}{p(j)}\right\} \tag{5}$$
This new inequality gives a tighter upper bound and therefore provides better error bounding in the propagation, especially for signals with high-order correlation.
3 Error Probability Estimation
3.1 EPF Exploration
Although we can replace each gate with its corresponding super gate and then process them by CCM, this straightforward method introduces unnecessary graph transformation effort. One promising alternative is to derive signal probability and correlation propagation formulas for super gate pairs.
It is possible to use the basic signal probability and correlation rules in Section 2.2 to manually derive these formulas, but this process is tedious, error-prone and actually not necessary. To facilitate the following analysis, the EPF of the AND gate l = ij is rewritten as
$$l_f = \underbrace{i\bar{j}\,\bar{i}_f j_f}_{l_{f_0}} + \underbrace{\bar{i}\bar{j}\, i_f j_f}_{l_{f_1}} + \underbrace{\bar{i} j\, i_f \bar{j}_f}_{l_{f_2}} + \underbrace{i j\,(i_f + j_f)}_{l_{f_3}} \tag{6}$$
One important property of this equation is that the four terms $l_{f_0}$, $l_{f_1}$, $l_{f_2}$, $l_{f_3}$ are mutually exclusive:
$$p(l_{f_i} l_{f_j}) = 0, \quad i, j = 0, 1, 2, 3 \text{ and } i \ne j \tag{7}$$
$$p(l_f) = p(l_{f_0}) + p(l_{f_1}) + p(l_{f_2}) + p(l_{f_3}) \tag{8}$$
$$p(l\,l_{f_i}) = 0, \quad i = 0, 1, 2 \tag{9}$$
For the calculation of the probability $p(l_f)$, each of the above four terms can be easily calculated using the basic gate rules, as every term contains only four variables. For the correlation coefficient $C_{l,l_f}$, instead of using CCM on a gate-level implementation of the EPF, we turn to the correlation coefficient definition in Equation (2) to derive this value:
$$C_{l,l_f} = \frac{p(l\,l_f)}{p(l)\,p(l_f)} = \frac{p[l\,(l_{f_0} + l_{f_1} + l_{f_2} + l_{f_3})]}{p(l)\,p(l_f)} = \frac{p(l\,l_{f_0}) + p(l\,l_{f_1}) + p(l\,l_{f_2}) + p(l\,l_{f_3})}{p(l)\,p(l_f)} = \frac{p(l\,l_{f_3})}{p(l)\,p(l_f)} = \frac{p(l_{f_3})}{p(l)\,p(l_f)} \tag{10}$$
where all three signal probabilities $p(l_{f_3})$, $p(l_f)$ and $p(l)$ are easy to obtain, as discussed above. The general formulas for typical structures, e.g. the Super-AND/wire pair, can be derived in a similar way. Furthermore, since De Morgan's law gives $l = i + j \equiv \overline{\bar{i}\,\bar{j}}$, the above Super-AND correlation rules can be easily extended to Super-OR cases with minor modification.
3.2 Semi-super-gate Concept Integration
In the scope of soft errors, if the single fault assumption is used, only the gates in the fanout cone of the error site will be influenced and contribute to the error probabilities of primary outputs, as illustrated in Figure 3. Therefore, only considering error propagation in the fanout cone has a large benefit with regard to runtime reduction, especially for error sites near primary outputs, because their fanout cones are rather small compared with the entire circuit.
Fig. 3 Error propagation path
By investigating this new scheme further, we discover that for the gates at the boundary of the fanout cone, only one of the two inputs of an AND or OR gate can be erroneous, i.e. the other input is definitely error-free. This observation is very useful as it further reduces complexity and runtime. Recalling the EPF of the AND gate in Equation (1), and assuming the input i is error-free, i.e. $i_f = 0$, this EPF can be simplified as follows:
$$l_f = i\bar{j}\,j_f + i j\,j_f = i\,j_f \tag{11}$$
The complex super-gate EPF collapses to only two primitive AND gates, called a semi-super gate: one for the error-free function, the other for error propagation. The derivation of the corresponding correlation formulas is straightforward and omitted here for brevity. A better calculation scheme is proposed and illustrated in Figure 3:
– Original gate: calculation with the original signal probability formula, as gates $U_{gi}$, i = 0, 1, 2, 3;
– Semi-super gate: calculation with the semi-super gate formula, as gates $U_{si}$, i = 0, 1, 2;
– Super gate: calculation with the super-gate formula, as gates $U_{fi}$, i = 0, 1, 2.
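The algebraic properties used for the AND super-gate can be checked by brute force. The sketch below enumerates all input patterns, confirms that the four EPF terms are mutually exclusive, that only the last term overlaps with l, and that the EPF reduces to i·j_f when input i is error-free. It then compares the correlation coefficient computed from the definition with the shortcut p(l_f3) / (p(l) p(l_f)). The input probabilities and the assumption that all four inputs are independent are chosen only for this example.

```python
from itertools import product

def epf_terms(i, j, i_f, j_f):
    """The four product terms l_f0 .. l_f3 of the AND-gate EPF."""
    ni, nj, nif, njf = 1 - i, 1 - j, 1 - i_f, 1 - j_f
    return (i & nj & nif & j_f,
            ni & nj & i_f & j_f,
            ni & j & i_f & njf,
            i & j & (i_f | j_f))

def weight(bits, probs):
    """Probability of one input pattern, assuming independent inputs."""
    w = 1.0
    for b, p in zip(bits, probs):
        w *= p if b else 1.0 - p
    return w

# Hypothetical probabilities of (i, j, i_f, j_f) being 1.
probs = (0.5, 0.5, 0.1, 0.2)

p_l = p_lf = p_l_lf = p_lf3 = 0.0
for bits in product((0, 1), repeat=4):
    i, j, i_f, j_f = bits
    terms = epf_terms(*bits)
    l, l_f = i & j, max(terms)
    assert sum(terms) <= 1                       # terms never overlap
    assert all((l & t) == 0 for t in terms[:3])  # only l_f3 overlaps with l
    if i_f == 0:
        assert l_f == (i & j_f)                  # semi-super gate reduction
    w = weight(bits, probs)
    p_l += w * l
    p_lf += w * l_f
    p_l_lf += w * (l & l_f)
    p_lf3 += w * terms[3]

c_definition = p_l_lf / (p_l * p_lf)   # directly from the definition
c_shortcut = p_lf3 / (p_l * p_lf)      # using only p(l_f3), p(l), p(l_f)
print(p_lf, c_definition, c_shortcut)
```

For these inputs both routes give the same coefficient (2.0), with p(l_f) = 0.14.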
The corresponding error probability estimation flow
is described as follows:
1. Graph setup: the original gate-level netlist is parsed,
and topologically levelized;
2. Signal probability calculation: the signal probability
of each gate and the correlation coefficients are calculated and recorded;
3. Gate calculation type determination: for a specific error site, a graph algorithm is used to obtain the fanout cone; then all gates at the same or a higher level than the error site are tagged as one of the three types: original, semi-super or super gate;
4. Error probability calculation: CCM is used to traverse the circuit level by level from the error site, applying the formula that matches each gate type. The signal probabilities and correlation coefficients at different levels obtained in Step 2 can be reused for different error sites.
The error probability of primary output $PO_i$, denoted $p_e(PO_i)$, is the signal probability of its error signal $PO_{i_f}$, i.e. $p_e(PO_i) = p(PO_{i_f})$.
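Steps 1 and 3 of the flow above can be sketched on a toy netlist. Everything here is hypothetical: the netlist, the dictionary format and the function names are invented for the example, and the tagging rule (no possibly erroneous input: original; some: semi-super; all: super) is one reading of the description, not the authors' code.

```python
from collections import deque

# Hypothetical netlist: gate output -> (gate type, list of input signals).
# Signals that never appear as keys (a, b, c) are primary inputs.
NETLIST = {
    "g1": ("AND", ["a", "b"]),
    "g2": ("OR",  ["b", "c"]),
    "g3": ("AND", ["g1", "g2"]),
    "g4": ("INV", ["g2"]),
    "g5": ("OR",  ["g3", "g4"]),
}

def levelize(netlist):
    """Level 0 for primary inputs, 1 + max input level for each gate."""
    level = {}
    def visit(sig):
        if sig not in netlist:
            return 0
        if sig not in level:
            level[sig] = 1 + max(visit(s) for s in netlist[sig][1])
        return level[sig]
    for gate in netlist:
        visit(gate)
    return level

def fanout_cone(netlist, site):
    """All signals reachable from the error site (breadth-first search)."""
    fanout = {}
    for gate, (_, inputs) in netlist.items():
        for sig in inputs:
            fanout.setdefault(sig, []).append(gate)
    cone, queue = {site}, deque([site])
    while queue:
        for gate in fanout.get(queue.popleft(), []):
            if gate not in cone:
                cone.add(gate)
                queue.append(gate)
    return cone

def tag_gates(netlist, site):
    """Tag every gate at or above the site's level with its formula type."""
    level = levelize(netlist)
    site_level = level.get(site, 0)
    cone = fanout_cone(netlist, site)
    tags = {}
    for gate, (_, inputs) in netlist.items():
        if gate == site or level[gate] < site_level:
            continue
        faulty = sum(sig in cone for sig in inputs)  # possibly erroneous inputs
        if faulty == 0:
            tags[gate] = "original"
        elif faulty < len(inputs):
            tags[gate] = "semi-super"
        else:
            tags[gate] = "super"
    return tags

tags = tag_gates(NETLIST, "g2")
print(tags)
```

With the error site at g2, gate g1 lies outside the fanout cone, g3 sits on its boundary with one error-free input, and g4 and g5 receive only possibly erroneous inputs.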
4 Experimental Results
The proposed approach was implemented in C++ and
experiments were performed for several 74 series and
ISCAS’85 benchmarks on a workstation with Intel Xeon
E5540 2.53GHz and 16GB RAM.
The benchmarks are synthesized using the primitive
gates. Exhaustive (number of primary inputs < 20) or
statistical Monte-Carlo (MC) simulation [5] is used for
comparison. The confidence level is set to 95% and MC
simulation is terminated when the absolute width of the
confidence interval is smaller than 0.005 for the error
probability of each primary output (PO). As our motivation is analyzing correlated error propagation from
logic level to higher abstraction levels, individual error
probabilities rather than an overall reliability metric
are preferred. Therefore, for each error site, the error
probabilities of all POs are calculated and compared
one by one with MC simulation. Similar to CCM [3],
we report the maximum absolute error (MAX), average
absolute error (AVG) and root mean square (RMS) error. The relative error is deliberately excluded due to
its misleading meaning for small probability values.
Here the common assumption of random and independent PIs for combinational circuit is used. Nevertheless, the proposed method can handle realistic correlated inputs because it uses both probabilities and correlations to capture the signal statistics level by level.
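The Monte-Carlo stopping rule described above can be sketched as follows. This is only an illustration under stated assumptions: it uses a normal-approximation confidence interval (z = 1.96 for 95%), a fixed batch size and a sample cap, none of which are specified in the text, and the trial models a single AND gate with a bit flip on input i rather than a benchmark circuit.

```python
import math
import random

def mc_estimate(trial, width=0.005, z=1.96, batch=1000,
                min_samples=10_000, max_samples=10_000_000, seed=0):
    """Sample until the confidence interval is narrower than `width`."""
    rng = random.Random(seed)
    hits = n = 0
    while n < max_samples:
        hits += sum(trial(rng) for _ in range(batch))
        n += batch
        p = hits / n
        ci_width = 2.0 * z * math.sqrt(p * (1.0 - p) / n)
        if n >= min_samples and ci_width < width:
            break
    return hits / n, n

def and_gate_trial(rng):
    """One trial: random independent inputs, bit flip injected on input i."""
    i = rng.random() < 0.5
    j = rng.random() < 0.5
    good = i and j
    bad = (not i) and j
    return good != bad          # True if the error reaches the output

p_err, samples = mc_estimate(and_gate_trial)
print(p_err, samples)
```

For this toy case the exact error probability is p(j) = 0.5, since a flip on i reaches the output only when j = 1; the estimate should land close to that value.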
Table 1 shows that the AVG and RMS errors are very small, on average below 0.007 and 0.02, respectively. The MAX error, representing the worst-case scenario, is 0.1283 on average. Please note that our reported inaccuracy values are based on a node-by-node comparison of all primary outputs for all error sites, rather than one overall accuracy metric for the whole circuit. Taking this into consideration, the MAX errors are still in a reasonable range. For runtime, on average our approach is 866x faster than the Monte-Carlo simulation.
Table 1 Runtime and accuracy for error probability estimation with super gate formula (SGF) method
                   Runtime (s)                      Analytical Inaccuracy
Bench     Gates    MC          SGF       Speedup    MAX       AVG       RMS
74182     21       0.43        0.0055    78         0.0000    0.0000    0.0000
74L85     52       8.46        0.0324    261        0.0764    0.0035    0.0107
74283     61       3.00        0.0407    74         0.1235    0.0066    0.0215
74181     106      212.59      0.1537    1383       0.2741    0.0116    0.0263
c432      232      5412.58     4.3       1268       0.2482    0.0278    0.0515
c499      616      11112.80    31.3      355        0.0053    0.0007    0.0011
c880      459      30885.10    9.8       3165       0.2410    0.0023    0.0134
c1355     629      17587.50    51.5      341        0.0582    0.0022    0.0043
average   -        -           -         866        0.1283    0.0068    0.0161
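The "average" row of Table 1 appears to be the arithmetic mean of the per-benchmark columns. The short check below recomputes it from the listed values; the variable names are chosen for this example.

```python
# Per-benchmark values copied from Table 1:
# (speedup, MAX, AVG, RMS)
rows = {
    "74182": (78,   0.0000, 0.0000, 0.0000),
    "74L85": (261,  0.0764, 0.0035, 0.0107),
    "74283": (74,   0.1235, 0.0066, 0.0215),
    "74181": (1383, 0.2741, 0.0116, 0.0263),
    "c432":  (1268, 0.2482, 0.0278, 0.0515),
    "c499":  (355,  0.0053, 0.0007, 0.0011),
    "c880":  (3165, 0.2410, 0.0023, 0.0134),
    "c1355": (341,  0.0582, 0.0022, 0.0043),
}

def column_mean(index):
    """Arithmetic mean of one column over all benchmarks."""
    values = [row[index] for row in rows.values()]
    return sum(values) / len(values)

mean_speedup = column_mean(0)
mean_max, mean_avg, mean_rms = column_mean(1), column_mean(2), column_mean(3)
print(round(mean_speedup), round(mean_max, 4),
      round(mean_avg, 4), round(mean_rms, 4))
```

The means reproduce the reported 866x speedup and the average MAX, AVG and RMS errors of 0.1283, 0.0068 and 0.0161.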
5 Conclusion
Soft errors are becoming a major reliability issue in the nano era. It is necessary to handle the error correlation effect for more accurate error estimation. This paper proposed a novel approach to consider both signal and error correlations in a unified way. The concepts of the error propagation function and the super gate were conceived to address the error probability and correlation problem with signal probability and correlation techniques. Experimental results showed that our approach is 866x faster than Monte-Carlo simulation, while the average inaccuracy of error probability estimation is below 0.007.
Acknowledgement
This work was partly supported by the German Research Foundation (DFG) as part of the national focal program "Dependable Embedded Systems" (SPP-1500, http://spp1500.ira.uka.de/).
References
1. Baumann, R.: Soft errors in advanced computer systems. IEEE Design and Test of Computers 22(3), 258–266
(2005)
2. Choudhury, M., Mohanram, K.: Reliability analysis of
logic circuits. IEEE Tran. on Computer-Aided Design of
Integrated Circuits and Systems 28(3), 392–405 (2009)
3. Ercolani, S., Favalli, M., Damiani, M., Olivo, P., Ricco, B.:
Estimate of signal probability in combinational logic networks. In: European Test Conference, pp. 132–138 (1989)
4. Marculescu, R., Marculescu, D., Pedram, M.: Probabilistic
modeling of dependencies during switching activity analysis. IEEE Tran. on Computer-Aided Design of Integrated
Circuits and Systems 17(2), 73–83 (1998)
5. Rubinstein, R., Kroese, D.: Simulation and the Monte
Carlo method. John Wiley & Sons (2008)
6. Shivakumar, P., Kistler, M., Keckler, S., Burger, D., Alvisi,
L.: Modeling the effect of technology trends on the soft
error rate of combinational logic. In: Intl. Conf. on Dependable Systems and Networks, pp. 389–398 (2002)