First MEDIAN Workshop 2012

Soft Error Propagation and Correlation Estimation in Combinational Network

Liang Chen · Mehdi B. Tahoori

Liang Chen
Karlsruhe Institute of Technology, Karlsruhe, Germany
E-mail: [email protected]

Mehdi B. Tahoori
Karlsruhe Institute of Technology, Karlsruhe, Germany
E-mail: [email protected]

Abstract Soft errors are becoming one of the major reliability concerns in the nano era. Considering the correlation effect in error propagation is essential for accurate error rate estimation. This paper proposes a novel method to address this correlation issue in combinational circuits. Based on the concept of the error propagation function and the super-gate idea, the error probability and correlation problems are converted into the computation of normal signal probability and correlation. The experimental results show that, compared with Monte-Carlo simulation, our approach is 866x faster, while the average error probability inaccuracy is below 0.007.

Keywords Soft Error · Error Probability · Logical Masking · Signal Correlation

1 Introduction

Radiation-induced soft errors are becoming a significant reliability issue in the nano era [1]. Not only the extent but also the criticality of soft errors is increasing [6]. Therefore, it is essential to consider this impact in early design phases for cost-effective error mitigation.

As sequential circuit elements and even combinational logic gates are becoming as vulnerable to soft errors as memories [6], the intrinsic irregularity and functional complexity of logic circuits make the modeling of error propagation and error probability estimation more sophisticated than for memory cells. Even a single event transient caused by a particle strike at circuit level can propagate to multiple outputs, and at higher abstraction levels it can manifest as multiple correlated errors (bit flips). Modeling this correlation effect is essential for accurate error rate estimation at both gate level and hierarchical RTL; consequently, the selective protection of vulnerable gates or blocks can efficiently lead to robust system design.

This paper focuses on the logical masking effect of soft errors in combinational circuits. It proposes a novel method based on the concept of the error propagation function and the super-gate idea to unify the treatment of signal probability and error probability, which efficiently addresses the internal correlation among different wires. In addition, semi-super gate simplification is utilized to speed up the computation. The method therefore provides a concise and efficient solution with much potential for further extensions (e.g. analyzing multiple upsets and block-level error propagation). The simulation results show that our method is 866x faster than Monte-Carlo simulation, while the average inaccuracy of error probability estimation is below 0.007.

The rest of this paper is organized as follows. Section 2 discusses our proposed error propagation model, the super-gate idea and the adopted correlation model. Section 3 introduces the procedures to calculate the gate error probabilities using the super gate formula method. Section 4 describes the experimental results and finally Section 5 concludes this paper.

2 Proposed Error Propagation Model

2.1 Error Propagation in Combinational Network

Soft errors are modeled as bit flips in our paper. To illustrate the basic idea, we take the boolean AND function $l = ij$ as the running example. For $ij = 01$, only when $i$ is erroneous (0→1) and $j$ is error-free is the output $l$ erroneous (0→1). The other three cases can be analyzed similarly.

If the notation $x_f$ is used to indicate whether the signal $x$ is faulty, i.e. $x_f = 1$ means a bit-flip occurs on signal $x$, the four cases mentioned above can be expressed by the so-called Error Propagation Function (EPF):

$$ l_f = i\,\bar{j}\,\bar{i}_f\,j_f + \bar{i}\,\bar{j}\,i_f\,j_f + \bar{i}\,j\,i_f\,\bar{j}_f + i\,j\,(i_f + j_f) \tag{1} $$

If the error-free function $l = ij$ is combined with the EPF, a four-input, two-output super-gate can be constructed corresponding to the primitive AND gate, as illustrated in Figure 1.

Fig. 1 Super-gate concept

Here $i_f$, $j_f$, $l_f$ are just virtual error signals that facilitate the modeling of error propagation, and this representation has several important features:

– It significantly simplifies the propagation and correlation modeling of bit-flips in a combinational network: it does not differentiate the two kinds of bit flips, 0→1 and 1→0, as in [2].
– The super-gate concept is independent of any specific algorithm: it is just a boolean function and can be analyzed using the well-researched methods for signal probability, switching activity, etc.
– The concept is easy to extend to more complicated error modeling: it can be adjusted for single bit-flips, multiple bit-flips, permanent faults, etc.

The virtual error signals are interpreted differently from normal signals:

– Logic value: '1' is interpreted as a bit-flip occurring in the corresponding signal and '0' means error-free;
– Signal probability (SP): interpreted as the error probability of the corresponding signal.

For the sake of simplicity, two-input AND, OR and INV gates are chosen as the primitive gates that constitute arbitrary combinational networks, similar to the treatment in [2]. Any complex gate in the design library can be modeled as a macro-gate consisting of the basic AND/OR/INV gates. Each macro-gate is considered as a single node in the netlist graph, while the probability and correlation calculation can be carried out using a simple extension of the basic gate rules.

This modeling technique doubles the number of signals that need to be considered. However, on one hand this additional complexity is unavoidable when modeling the propagation of bit-flip errors and their complicated correlations; on the other hand, the problem can be alleviated by exploring the properties of the EPF, as discussed later.

2.2 Signal Correlation

In typical digital circuits, there are two kinds of correlation: temporal correlation and spatial correlation [4]. Temporal correlation is related to the historical trends of bit streams, which is beyond the scope of this paper. Only spatial correlation due to reconvergent fanout is considered in this paper.

2.2.1 Correlation Model

To model the spatial correlation, we adopt the Correlation Coefficient Method (CCM) [3] from the signal probability community. With the notation of signal probability $P(i = 1) = p(i)$, the correlation coefficient of signals $i$, $j$ is defined as

$$ C_{i,j} = C_{j,i} = \frac{p(ij)}{p(i)\,p(j)} = \frac{p(i|j)}{p(i)} = \frac{p(j|i)}{p(j)} \tag{2} $$

where $p(ij)$ is the joint probability $P(i = 1, j = 1)$, and $p(i|j)$ is the conditional probability $P(i = 1 | j = 1)$. From this definition, it can be derived that when the two signals are uncorrelated, $C_{i,j} = 1$.

For the primitive two-input AND gate, given $p(i)$, $p(j)$ and $C_{i,j}$ we can exactly calculate the signal probability of the gate output $l$ using the following formula:

$$ p(l) = p(i)\,p(j)\,C_{i,j}, \qquad 0 \le C_{i,j} \le \frac{1}{p(i)\,p(j)} \tag{3} $$

The correlation coefficients between different signals can be analytically computed for all structural cases in the combinational network. Two typical cases are illustrated in Figure 2, and the correlation formulas are $C_{l,m} = 1/p(i)$ and $C_{l,h} = C_{i,h}\,C_{j,h}$, respectively.

Fig. 2 Typical structures for correlation calculation: (a) Fanout node, (b) AND gate

2.2.2 Accuracy Issue

In Figure 2(b) it is assumed that

$$ C_{ij,h} \approx C_{i,h}\,C_{j,h} \tag{4} $$

Therefore, the second and higher order correlations among multiple signals are not considered in CCM.
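As a small illustration of Equations (2) and (3) and of the fanout rule, the following Python sketch evaluates a hypothetical three-input circuit with a reconvergent fanout: i = a AND b, j = a AND c, l = i AND j. Python is used only for brevity and is not the implementation reported in this paper; the circuit, the helper names and the input probabilities of 0.5 are assumptions made for this example.

```python
from itertools import product


def signal_prob(f, n):
    """Exact probability that f(...) == 1 over n independent, uniform inputs."""
    total = sum(f(*bits) for bits in product((0, 1), repeat=n))
    return total / 2 ** n


def corr(p_ij, p_i, p_j):
    """Correlation coefficient of Eq. (2): C = p(ij) / (p(i) p(j))."""
    return p_ij / (p_i * p_j)


def and_prob(p_i, p_j, c_ij):
    """AND-gate output probability of Eq. (3): p(l) = p(i) p(j) C."""
    return p_i * p_j * c_ij


# Assumed primary-input probabilities (independent inputs).
p_a = p_b = p_c = 0.5

# Fanout rule: the two branches of stem a have C = 1 / p(a).
c_branches = 1.0 / p_a

# First-order AND rule applied twice: C_{i,j} is the product of the pairwise
# coefficients between the fanins of i and the fanins of j. Only the two
# branches of a are correlated; every other pair is independent (C = 1).
c_ij_ccm = c_branches * 1.0 * 1.0 * 1.0

p_i = and_prob(p_a, p_b, 1.0)
p_j = and_prob(p_a, p_c, 1.0)
p_l_ccm = and_prob(p_i, p_j, c_ij_ccm)   # with correlation
p_l_indep = and_prob(p_i, p_j, 1.0)      # correlation ignored

# Exhaustive reference values for comparison.
p_l_exact = signal_prob(lambda a, b, c: (a & b) & (a & c), 3)
c_ij_exact = corr(
    p_l_exact,
    signal_prob(lambda a, b, c: a & b, 3),
    signal_prob(lambda a, b, c: a & c, 3),
)

print(p_l_ccm, p_l_exact, p_l_indep, c_ij_exact)
```

In this particular example the first-order rule reproduces the exact coefficient, whereas ignoring the correlation underestimates p(l) by a factor of two; in larger circuits the first-order rule is only an approximation.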
The signal probability estimation in [3] and the switching activity estimation in [4] show that this first-order approximation can provide accurate results in practice. However, neglecting high-order correlations may push a gate output probability outside the [0, 1] bound. Therefore, Inequality (3) is used to limit $C_{i,j}$ to avoid probability overflow.

Actually, from our observation the upper bound in Inequality (3) is rather loose. Revisiting the definition of the correlation coefficient in Equation (2), we have:

$$ C_{i,j} = \frac{p(i|j)}{p(i)} = \frac{p(j|i)}{p(j)} \le \min\left\{ \frac{1}{p(i)}, \frac{1}{p(j)} \right\} \tag{5} $$

This new inequality gives a tighter upper bound and therefore provides better error bounding in the propagation, especially for signals with high-order correlation.

3 Error Probability Estimation

3.1 EPF Exploration

Although we can replace each gate with its corresponding super gate and then process them by CCM, this straightforward method introduces unnecessary graph transformation effort. One promising alternative is to derive signal probability and correlation propagation formulas for super gate pairs. It is possible to use the basic signal probability and correlation rules in Section 2.2 to manually derive these formulas, but this process is tedious, error-prone and actually not necessary.

To facilitate the following analysis, the EPF of the AND gate is rewritten as

$$ l_f = \underbrace{i\,\bar{j}\,\bar{i}_f\,j_f}_{l_{f0}} + \underbrace{\bar{i}\,\bar{j}\,i_f\,j_f}_{l_{f1}} + \underbrace{\bar{i}\,j\,i_f\,\bar{j}_f}_{l_{f2}} + \underbrace{i\,j\,(i_f + j_f)}_{l_{f3}} \tag{6} $$

One important property of this equation is that the four terms $l_{f0}$, $l_{f1}$, $l_{f2}$, $l_{f3}$ are mutually exclusive:

$$ p(l_{fi}\,l_{fj}) = 0, \qquad i, j = 0, 1, 2, 3 \text{ and } i \ne j \tag{7} $$

$$ p(l_f) = p(l_{f0}) + p(l_{f1}) + p(l_{f2}) + p(l_{f3}) \tag{8} $$

$$ p(l\,l_{fi}) = 0, \qquad i = 0, 1, 2 \tag{9} $$

Equation (9) holds because $l = ij$ is 0 in each of the first three terms.

For the calculation of the probability $p(l_f)$, each of the above four terms can be easily calculated using the basic gate rules, as every term contains only four variables. For the correlation coefficient $C_{l,l_f}$, instead of using CCM on a gate-level implementation of the EPF, we turn to the definition of the correlation coefficient in Equation (2) to derive this value:

$$ C_{l,l_f} = \frac{p(l\,l_f)}{p(l)\,p(l_f)} = \frac{p[\,l\,(l_{f0} + l_{f1} + l_{f2} + l_{f3})\,]}{p(l)\,p(l_f)} = \frac{p(l\,l_{f0}) + p(l\,l_{f1}) + p(l\,l_{f2}) + p(l\,l_{f3})}{p(l)\,p(l_f)} = \frac{p(l\,l_{f3})}{p(l)\,p(l_f)} = \frac{p(l_{f3})}{p(l)\,p(l_f)} \tag{10} $$

where the last step uses the fact that $l_{f3} = 1$ implies $l = 1$, and all three signal probabilities $p(l_{f3})$, $p(l_f)$ and $p(l)$ are easy to obtain, as discussed above. The general formulas for typical structures, e.g. the Super-AND/wire pair, can be derived in a similar way. Furthermore, since by De Morgan's law $l = i + j \equiv \overline{\bar{i}\,\bar{j}}$, the above Super-AND correlation rules can be easily extended to Super-OR cases with minor modification.

3.2 Semi-super-gate Concept Integration

In the scope of soft errors, if the single fault assumption is used, only the gates in the fanout cone of the error site are influenced and contribute to the error probabilities of primary outputs, as illustrated in Figure 3. Therefore, considering error propagation only in the fanout cone greatly reduces runtime, especially for error sites near primary outputs, because their fanout cones are rather small compared with the entire circuit.

Fig. 3 Error propagation path

By investigating this scheme further, we discover that for the gates at the boundary of the fanout cone, only one of the two inputs of an AND or OR gate can be erroneous, i.e. the other input is definitely error-free. This observation is very useful as it further reduces complexity and runtime. Recalling the EPF of the AND gate in Equation (1), and assuming the input $i$ is error-free, i.e. $i_f = 0$, this EPF can be simplified as follows:

$$ l_f = i\,\bar{j}\,j_f + i\,j\,j_f = i\,j_f \tag{11} $$

The complex super-gate EPF collapses to only two primitive AND gates, called a semi-super gate: one for the error-free function, the other for error propagation. The derivation of the corresponding correlation formulas is straightforward and omitted here for brevity.

A better calculation scheme is proposed and illustrated in Figure 3:

– Original gate: calculation with the original signal probability formula, as gates $U_{gi}$, $i = 0, 1, 2, 3$;
– Semi-super gate: calculation with the semi-super gate formula, as gates $U_{si}$, $i = 0, 1, 2$;
– Super gate: calculation with the super-gate formula, as gates $U_{fi}$, $i = 0, 1, 2$.

The corresponding error probability estimation flow is described as follows:

1. Graph setup: the original gate-level netlist is parsed and topologically levelized;
2. Signal probability calculation: the signal probability of each gate and the correlation coefficients are calculated and recorded;
3. Gate calculation type determination: for a specific error site, a graph algorithm is used to obtain the fanout cone, then all the gates at the same or larger levels with regard to the error site are tagged as one of the three types: original, semi-super and super gate;
4. Error probability calculation: CCM is used to traverse from the error site level by level using the different formulas.

The signal probabilities and correlation coefficients at different levels obtained in Step 2 can be reused for different error sites. The error probability of primary output $p_e(PO_i)$ is the signal probability of the error signal $PO_{if}$, i.e. $p_e(PO_i) = p(PO_{if})$.

4 Experimental Results

The proposed approach was implemented in C++ and experiments were performed for several 74 series and ISCAS’85 benchmarks on a workstation with an Intel Xeon E5540 at 2.53 GHz and 16 GB RAM. The benchmarks are synthesized using the primitive gates.
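To make the exhaustive reference used in this section concrete, the following Python sketch enumerates all input vectors of a tiny netlist, injects a bit-flip at one error site and counts how often each primary output differs from its error-free value. It is a minimal sketch and not the C++ implementation described above; the netlist, the names `NETLIST`, `evaluate`, `error_probabilities` and `joint_error_probability`, and the assumption that the flip occurs with certainty at the chosen site under uniform, independent primary inputs are all hypothetical choices for illustration.

```python
from itertools import product

# Hypothetical netlist in topological order: name -> (operation, fanins).
NETLIST = {
    "g1": ("AND", ("a", "b")),
    "g2": ("INV", ("c",)),
    "g3": ("OR", ("g1", "g2")),   # primary output
    "g4": ("AND", ("g1", "c")),   # primary output
}
INPUTS = ("a", "b", "c")
OUTPUTS = ("g3", "g4")


def evaluate(pi_values, flip_site=None):
    """Evaluate the netlist; optionally flip the value at flip_site."""
    values = dict(pi_values)
    if flip_site in values:           # error site is a primary input
        values[flip_site] ^= 1
    for name, (op, fanins) in NETLIST.items():
        args = [values[f] for f in fanins]
        if op == "AND":
            v = args[0] & args[1]
        elif op == "OR":
            v = args[0] | args[1]
        else:                         # INV
            v = 1 - args[0]
        if name == flip_site:         # error site is a gate output
            v ^= 1
        values[name] = v
    return values


def error_vectors(site):
    """Yield, per input vector, which primary outputs are erroneous."""
    for bits in product((0, 1), repeat=len(INPUTS)):
        pis = dict(zip(INPUTS, bits))
        good = evaluate(pis)
        bad = evaluate(pis, flip_site=site)
        yield {po: good[po] ^ bad[po] for po in OUTPUTS}


def error_probabilities(site):
    """Error probability of every primary output for one error site."""
    n_vectors = 2 ** len(INPUTS)
    counts = dict.fromkeys(OUTPUTS, 0)
    for errs in error_vectors(site):
        for po in OUTPUTS:
            counts[po] += errs[po]
    return {po: counts[po] / n_vectors for po in OUTPUTS}


def joint_error_probability(site, po_x, po_y):
    """Probability that both outputs are erroneous at the same time."""
    n_vectors = 2 ** len(INPUTS)
    return sum(e[po_x] & e[po_y] for e in error_vectors(site)) / n_vectors


print(error_probabilities("g1"))
print(joint_error_probability("g1", "g3", "g4"))
```

For the error site g1 in this example, both outputs are erroneous exactly when c = 1, so the joint error probability equals each individual error probability rather than their product. This is the kind of correlated multi-output error that the per-output comparison in this section is meant to capture.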
Exhaustive simulation (when the number of primary inputs is below 20) or statistical Monte-Carlo (MC) simulation [5] is used for comparison. The confidence level is set to 95% and MC simulation is terminated when the absolute width of the confidence interval is smaller than 0.005 for the error probability of each primary output (PO).

As our motivation is analyzing correlated error propagation from logic level to higher abstraction levels, individual error probabilities rather than an overall reliability metric are preferred. Therefore, for each error site, the error probabilities of all POs are calculated and compared one by one with MC simulation. Similar to CCM [3], we report the maximum absolute error (MAX), average absolute error (AVG) and root mean square (RMS) error. The relative error is deliberately excluded because it is misleading for small probability values. Here the common assumption of random and independent primary inputs (PIs) for combinational circuits is used. Nevertheless, the proposed method can handle realistic correlated inputs because it uses both probabilities and correlations to capture the signal statistics level by level.

Table 1 Runtime and accuracy for error probability estimation with super gate formula (SGF) method

                  Runtime (s)                       Analytical inaccuracy
Bench     Gates   MC          SGF      Speedup      MAX      AVG      RMS
74182     21      0.43        0.0055   78           0.0000   0.0000   0.0000
74L85     52      8.46        0.0324   261          0.0764   0.0035   0.0107
74283     61      3.00        0.0407   74           0.1235   0.0066   0.0215
74181     106     212.59      0.1537   1383         0.2741   0.0116   0.0263
c432      232     5412.58     4.3      1268         0.2482   0.0278   0.0515
c499      616     11112.80    31.3     355          0.0053   0.0007   0.0011
c880      459     30885.10    9.8      3165         0.2410   0.0023   0.0134
c1355     629     17587.50    51.5     341          0.0582   0.0022   0.0043
average   -       -           -        866          0.1283   0.0068   0.0161

Table 1 shows that the AVG and RMS errors are very small, on average below 0.007 and 0.02, respectively. The MAX error, representing the worst-case scenario, is 0.1283 on average. Please note that our reported inaccuracy values are based on a node-by-node comparison of all primary outputs for all error sites, rather than on one overall accuracy metric for the whole circuit. Taking this into consideration, the MAX errors are still in a reasonable range. For runtime, on average our approach is 866x faster than the Monte-Carlo simulation.

5 Conclusion

Soft errors are becoming a major reliability issue in the nano era. It is necessary to handle the error correlation effect for more accurate error estimation. This paper proposed a novel approach to consider both signal and error correlations in a unified way. The concepts of error propagation function and super gate were conceived to address the error probability and correlation problem with signal probability and correlation techniques. Experimental results showed that our approach is 866x faster than Monte-Carlo simulation, while the average inaccuracy of error probability estimation is below 0.007.

Acknowledgement This work was partly supported by the German Research Foundation (DFG) as part of the national focal program “Dependable Embedded Systems” (SPP-1500, http://spp1500.ira.uka.de/).

References

1. Baumann, R.: Soft errors in advanced computer systems. IEEE Design and Test of Computers 22(3), 258–266 (2005)
2. Choudhury, M., Mohanram, K.: Reliability analysis of logic circuits. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems 28(3), 392–405 (2009)
3. Ercolani, S., Favalli, M., Damiani, M., Olivo, P., Ricco, B.: Estimate of signal probability in combinational logic networks. In: European Test Conference, pp. 132–138 (1989)
4. Marculescu, R., Marculescu, D., Pedram, M.: Probabilistic modeling of dependencies during switching activity analysis. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems 17(2), 73–83 (1998)
5. Rubinstein, R., Kroese, D.: Simulation and the Monte Carlo method. John Wiley & Sons (2008)
6. Shivakumar, P., Kistler, M., Keckler, S., Burger, D., Alvisi, L.: Modeling the effect of technology trends on the soft error rate of combinational logic. In: Intl. Conf. on Dependable Systems and Networks, pp. 389–398 (2002)